Intelligence Squared Debate Dataset
==============================================

This dataset contains a collection of transcripts and metadata for debates from the series "Intelligence Squared Debates" (IQ2) [1], held in the US from September 2006 to September 2015. For each debate, the transcript of each turn is given, along with information such as voting results pre- and post-debate, and audience reaction markers. There are 108 debates with an average of 117 utterances per debate.

URL: http://tisjune.github.io/research/iq2

Authors: Justine Zhang
         Ravi Kumar
         Sujith Ravi
         Cristian Danescu-Niculescu-Mizil

Contact: Justine Zhang

Last updated: April 5, 2016
Version: 1.0

The dataset is further described in our paper:

    Conversational flow in Oxford-style debates
    Justine Zhang, Ravi Kumar, Sujith Ravi, Cristian Danescu-Niculescu-Mizil
    Proceedings of NAACL, 2016. Short paper.

Files
-----

* iq2_data_release.json - a JSON file containing the dataset
* README.txt - this readme

Code forthcoming.

Debate description
------------------

IQ2 is a series of public debates which follow the Oxford style and are recorded live. In each debate, teams of 2 to 3 experts debate for or against a motion and attempt to sway the audience to take their position. Each debate has a moderator (John Donvan for the most recent debates).

Debates consist of 3 rounds:

* In the introduction, debaters are given 7 minutes each to make opening statements and lay out their points.
* In the discussion, debaters take questions from the moderator and audience, and also respond to attacks from the other team. This round lasts 30 minutes.
* In the conclusion, debaters are given 2 minutes each to make final remarks.

Before the debate, the live audience votes on whether they are for, against, or undecided on the motion. After the debate, the audience votes again, and the difference between post- and pre-debate votes (the "delta") is computed. The side with the higher delta wins the debate.

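The winner rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of the dataset release: the function names vote_deltas and winner are hypothetical, the vote percentages are copied from the example entry in the Format section, and the handling of equal deltas as a "tie" is an assumption, since the description above does not say how ties are resolved.

```python
def vote_deltas(pre, post):
    """Return the post-minus-pre change in vote share for each side."""
    return {side: post[side] - pre[side] for side in ("for", "against")}


def winner(pre, post):
    """Return the side with the higher delta ('tie' is an assumed convention)."""
    deltas = vote_deltas(pre, post)
    if deltas["for"] == deltas["against"]:
        return "tie"
    return max(deltas, key=deltas.get)


# Vote percentages from the "Millennials Don't Stand A Chance" example entry.
pre = {"against": 47.0, "for": 18.0, "undecided": 35.0}
post = {"against": 52.0, "for": 38.0, "undecided": 10.0}

print(vote_deltas(pre, post))  # {'for': 20.0, 'against': 5.0}
print(winner(pre, post))       # for
```

In this example the "for" side ends with fewer votes than "against" after the debate, but wins because it gained more.
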
Format
------

The dataset is a JSON file:

>>> import json
>>> with open("iq2_data_release.json", "r") as f:
...     debates = json.load(f)
...

It is structured as a dictionary with entries for each of 108 debates. This is an example of one such entry, with fields explained:

>>> debate = debates['040914%20Millennials']
    # the debate pdf can be found at
    # intelligencesquaredus.org/images/debates/past/transcripts/,
    # eg http://intelligencesquaredus.org/images/debates/past/transcripts/040914%20Millennials.pdf
>>> debate
{
    'title': "Millennials Don't Stand A Chance",  # debate title given by IQ2
    'date': 'Wednesday, April 9, 2014',  # date the debate took place
    'url': 'http://intelligencesquaredus.org/debates/past-debates/item/1019-millennials-dont-stand-a-chance',
        # the URL of the page containing debate information
    'results':  # debate results
    {
        'breakdown':
            # each position1_position2 number is the percentage of audience members
            # who voted for position1 before the debate and position2 after.
            # Note this breakdown is not available for the earliest debates.
        {
            'against_against': 29.0,
            'against_for': 11.0,
            'against_undecided': 5.0,
            'for_against': 5.0,
            'for_for': 10.0,
            'for_undecided': 0.0,
            'undecided_against': 16.0,
            'undecided_for': 16.0,
            'undecided_undecided': 9.0
        },
        'post': {'against': 52.0, 'for': 38.0, 'undecided': 10.0},  # vote breakdown post-debate
        'pre': {'against': 47.0, 'for': 18.0, 'undecided': 35.0}  # vote breakdown pre-debate
    },
    'speakers':  # debaters
    {
        'against':  # list of speakers on the 'against' side
        [
            {
                'name': 'David D. Burstein',  # speaker name
                'bio': ...,  # long bio that is provided on the IQ2 website
                'bio_short': 'Author, [...]'  # short bio provided on the IQ2 website
            },
            ...
        ],
        'for': [...],  # similar to against
        'moderator': ...
            # similar format as speaker listings; however this is a single entry
            # describing the moderator as opposed to a list of speakers.
    },
    'summary': ...,  # transcript of the debate provided on the IQ2 website
    'transcript':  # list of turns taken by speakers in the debate
    [
        ...,
        {
            'nontext': { 'laughter': [[0,3]], ... },
                # dictionary denoting annotations in the transcript.
                # these annotations correspond to either audience reactions like laughter,
                # or to positions where the speech is unclear or some other difficulty
                # with transcription occurred.
                # the format is a dictionary where keys correspond to the text
                # in the annotation and values are lists of pairs of form
                # [index of paragraph, index of annotation in words in paragraph].
                # for instance, in the example, the annotation 'laughter' occurred
                # in this turn's paragraphs[0] (that is, debate['transcript'][i]['paragraphs'][0]
                # for a turn at index i), right before the 3rd word.
            'paragraphs': [ ... ],
                # list of text for each paragraph in this turn.
                # paragraph boundaries are chosen by the transcribers.
            'segment': 0,
                # which part (intro, discussion, conclusion) the turn comes from:
                # 0=intro, 1=discussion, 2=conclusion
            'speaker': 'John Donvan',  # name of the speaker
            'speakertype': 'mod'
                # side the speaker is on. "for" and "against" denote debaters;
                # "mod" denotes moderator, "host" denotes host of the debate,
                # "unknown" denotes other speakers (generally audience members)
        },
        ...
    ]
}

References
----------

[1] http://www.intelligencesquaredus.org/
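
Example: counting audience reactions
------------------------------------

As a rough illustration of the transcript format described in the Format section, the sketch below tallies 'nontext' annotations per speaker type. It is not part of the dataset release: the function name count_annotations is hypothetical, and sample_transcript is made-up data in the documented shape (the speaker names, paragraph text, and the 'applause' key are placeholders, not real entries). With the real file, debate['transcript'] could be passed in the same way.

```python
from collections import Counter


def count_annotations(transcript):
    """Count nontext annotations, keyed by (speakertype, annotation text)."""
    counts = Counter()
    for turn in transcript:
        # Each value is a list of [paragraph index, word index] pairs,
        # so its length is the number of times the annotation occurred.
        for label, positions in turn.get("nontext", {}).items():
            counts[(turn["speakertype"], label)] += len(positions)
    return counts


# Hypothetical two-turn transcript in the documented shape (not real data).
sample_transcript = [
    {
        "nontext": {"laughter": [[0, 3]]},
        "paragraphs": ["placeholder text for the first paragraph"],
        "segment": 0,
        "speaker": "Moderator",
        "speakertype": "mod",
    },
    {
        "nontext": {"laughter": [[0, 1], [1, 0]], "applause": [[1, 2]]},
        "paragraphs": ["placeholder paragraph one", "placeholder paragraph two"],
        "segment": 1,
        "speaker": "Speaker A",
        "speakertype": "for",
    },
]

counts = count_annotations(sample_transcript)
for (speakertype, label), n in sorted(counts.items()):
    print(speakertype, label, n)
```

The sketch only counts occurrences and does not place annotations back into the paragraph text, so it makes no assumption about how word indices are counted.
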