Why do you focus on lower house elections?
Every democratic country vests important, if not full sovereign, authority in its lower house of parliament or of the legislature. It is the one consistent institution that is elected across all of the world’s democracies. Since CLEA was created to enable comparison across countries and within countries over time, data on this one common institution and its elections offer a reasonable focus of study.
What principles have you used to construct CLEA?
The central priority of CLEA is to preserve information on elections for use by current and future generations of researchers, which entails amassing source materials from countries around the world, assembling the relevant results they contain, and consolidating these data into an archive. We pursue the most comprehensive coverage possible given data availability, without imposing geographic, temporal, or political restrictions. The data posted in CLEA are of high quality and have the kind of integrity researchers need to draw accurate inferences. The collection encompasses all sovereign nations (including the micro-states) and self-governing territories. The current coverage has a substantial representation of cases from most regions of the world, and from both developed and developing societies. The reach extends back as far as possible, even to elections from pre-independence periods. CLEA has been developed specifically to make the data easy to access and use. One of its most distinctive features is a design oriented toward comparative research. To strengthen the functionality and appeal of CLEA and keep it on the cutting edge, we regularly pursue innovations and special features, including those that our users suggest.
Are countries excluded for any reason (size, type of electoral system, problems with elections, surrounding political conditions, etc.)?
Countries with blatantly uncompetitive elections and rubber-stamp plebiscites are excluded (e.g., North Korea), but otherwise we include data from any country holding lower house elections. This includes marginally democratic countries such as Pakistan or Singapore.
Why do you include elections that were boycotted, disputed, suspended and/or annulled?
The data from those elections can provide important information for researchers studying those countries. We include with the dataset a codebook which provides information for researchers about the context of each election, and they can then make an informed decision about whether to include those data in their analysis.
Why/when does CLEA include results from political entities that are not sovereign countries (e.g., Anguilla)?
We see legitimate reasons why elections from those entities might be interesting and useful for researchers.
How has the dataset been constructed?
The CLEA team has progressively and painstakingly accumulated a massive volume of election results from hundreds of primary sources (e.g., online and print reports from election authorities) and secondary sources (independent websites and research publications), supplemented with direct contributions of data collected privately by at least two dozen other scholars. To ensure that our data collection is as comprehensive, organized, and systematic as possible, we developed a complete list of all lower house legislative elections using a variety of sources (e.g., Megan Reif’s eDates dataset). The list is updated regularly to include the latest elections. A running inventory is maintained of what type of data has been located (i.e., constituency level, other sub-national, aggregate national, none) for each election on the list. Our efforts to fill in the cases involve a combination of online and bibliographic searches, some of which have led to loans of material from libraries or other institutions, and inquiries to official authorities and independent contacts. Special attention is paid to catching new releases of election results. Research assistants scour available websites to be sure to capture the data before they are taken down. All sources we find are archived right away in digital format for purposes of preservation. Online material is downloaded and saved in the original file formats. Print material is scanned to PDF files.
What is the process for making data part of CLEA?
In general, we take data in whatever form we can find them and then reformat and clean them (check for errors and fill in missing data when possible). For some countries this is straightforward and takes limited time and specialized expertise. But the process can be involved, difficult, and time-consuming. Consider the example of the Hungarian parliamentary elections. The country has a complex three-tier system comprising 176 single-member local constituencies, 20 multi-member regional constituencies, and an overall national list. Originally, we obtained results for Hungary’s post-communist elections from a secondary source that was found to be riddled with errors. Many of the errors were corrected by consulting other sources, but we opted to remove the 1998 election results during the fourth release until a more accurate record could be found. The Hungarian National Election Office (NEO) posts results for elections since 1998 on its website. Yet this material is in Hungarian and remained beyond our reach to process until recent improvements in web-based translation programs. At that point, we tasked three undergraduate students with scraping the single-member constituency results from NEO’s reports for the 1998 parliamentary elections. This entailed copying the results from tables in 371 separate URLs, since constituencies were formatted on separate pages and nearly every constituency had a second-round runoff. The results were pasted into an Excel worksheet. The data then required reformatting to fit the CLEA template. Next, a graduate student checked for errors, which were sent back for the students to correct. Afterwards, the file was sent to the CLEA computer programmer, who checked for both source errors (e.g., votes outside the possible range) and production errors (e.g., identical values across multiple districts), assigned missing values, and calculated several additional variables, which took seven hours of effort.
Thus, one solitary election consumed many person-hours. Though this case is atypical, we offer the illustration to give a sense of the challenges and manpower requirements involved.
How do you determine party codes?
This is actually a tricky issue. The basic principle is to try to give a unique code to each political party that has a consistent label and organizational basis. We cannot always know about changes on either dimension without a significant level of country and party expertise. So, for a hypothetical example, a Social Democrat party in 2000 may not be the same organizationally as a Social Democrat party in 2008 in a given country. Alternatively, the Social Democrat party of 2000 could be the same party organizationally as the New Democrat party in 2005. Our rule of thumb is to give the same party code as long as the name remains consistent in consecutive elections. This assumes that labels mean something and that the same labels signify continuity over time. If, for example, a party skips an election and then runs again under the same name as in a previous election (but not the immediately prior one), we start the party over again with a new code. We recognize that this can be problematic for some purposes, and it requires the researcher to know something about the parties to connect the two different party codes across time. But it avoids the problem of having to separate out parties that share a party code because they have the same name but are in fact different parties organizationally. A classic example is the Republican label in American history. The modern Republicans have had the same party code since 1854, but a different code from the Republicans who ran candidates as early as 1800 and died out a few decades later.
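The rule of thumb above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not CLEA's actual procedure: the function name, the use of sequential integer codes, and the party names are all hypothetical, and the input is assumed to be one country's elections in chronological order.

```python
from typing import Dict, List

def assign_party_codes(elections: List[List[str]]) -> List[Dict[str, int]]:
    """Assign codes per election for one country (hypothetical helper).

    `elections` holds the party names contesting each election, in
    chronological order. A name keeps its code only if it also appeared
    in the immediately preceding election; otherwise it gets a new code.
    """
    next_code = 1
    previous: Dict[str, int] = {}
    result: List[Dict[str, int]] = []
    for names in elections:
        current: Dict[str, int] = {}
        for name in names:
            if name in current:
                continue  # ignore duplicate names within one election
            if name in previous:
                # Same name in consecutive elections: keep the code.
                current[name] = previous[name]
            else:
                # New name, or a name returning after a skipped election.
                current[name] = next_code
                next_code += 1
        result.append(current)
        previous = current
    return result

# Hypothetical example: "Social Democrat" skips the second election.
codes = assign_party_codes([
    ["Social Democrat", "Liberal"],
    ["Liberal"],
    ["Social Democrat", "Liberal"],
])
print(codes)
```

In this made-up example the Liberal party keeps one code throughout, while the Social Democrat label receives a fresh code when it returns after the gap, which is exactly the situation where a researcher would need to link the two codes by hand.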
How is the turnout variable calculated?
Turnout is the fraction of eligible voters who vote in a given constituency. We calculate this variable by dividing the total number of votes cast in the constituency (VOT1) by the number of eligible voters in the constituency (PEV1). Because countries use a variety of methods to calculate turnout, we do not report the turnout rates listed in official election reports as these are not consistent across countries.
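As a minimal sketch of this calculation, the Python function below divides VOT1 by PEV1. The variable names VOT1 and PEV1 come from the text; the function name, the treatment of missing or non-positive values, and the sample figures are assumptions made for illustration.

```python
from typing import Optional

def turnout(vot1: Optional[float], pev1: Optional[float]) -> Optional[float]:
    """Return VOT1 / PEV1, or None when the ratio cannot be computed."""
    # Assumption: missing inputs or a non-positive electorate yield None.
    if vot1 is None or pev1 is None or pev1 <= 0:
        return None
    return vot1 / pev1

# Hypothetical constituency: 30,000 votes cast, 50,000 eligible voters.
example = turnout(30_000, 50_000)
print(example)  # 0.6
```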
Why in some constituencies does the total of the votes received by all the candidates/parties exceed the number of valid votes?
In some countries valid votes are reported as the valid number of ballots cast, but voters may make multiple candidate or party selections on the ballot. Thus, the sum of votes for candidates and/or parties will exceed the number of valid votes.
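A small worked example may make the arithmetic concrete. The figures below are invented, and the two-selection limit is an assumption chosen for illustration rather than a description of any particular country's rules.

```python
# Hypothetical constituency where each ballot may carry up to two selections.
valid_ballots = 1_000
max_selections_per_ballot = 2
candidate_votes = {"A": 700, "B": 650, "C": 400}

# Sum of candidate votes counts selections, not ballots.
total_candidate_votes = sum(candidate_votes.values())

# The sum can exceed the number of valid ballots...
exceeds_valid = total_candidate_votes > valid_ballots
# ...but not the number of ballots times the selections allowed on each.
within_upper_bound = (
    total_candidate_votes <= valid_ballots * max_selections_per_ballot
)

print(total_candidate_votes, exceeds_valid, within_upper_bound)  # 1750 True True
```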
Why are the constituencies given different numbers for each election?
Countries sometimes change the boundaries of constituencies but often keep the same name, so we cannot be sure that a constituency of the same name covers the same geographic area over time. We thus chose to “time stamp” the constituencies by assigning numbers per election. In many cases, however (e.g., the United States), constituencies keep the same number between redistrictings, owing to recent efforts to standardize constituency names.
What if I find a mistake in CLEA?
Please email us right away with information about the mistake: firstname.lastname@example.org. The research team is committed to correcting mistakes. In some instances, when the errors were sufficiently serious, we have pulled data from the website and then restored them after corrections. With every release we include errata detailing any errors found and the steps taken to correct them.
Do you make any changes in the data between releases?
We are always improving the quality of our data, and known errors that may hamper data analysis are posted on the errata page between releases. Between releases we fix these errors, and in some cases we remove a previously released election from the website if the repairs would significantly delay a forthcoming release.
If I have data to contribute, whom do I contact?
Of what benefit is it to add data to CLEA?
First, you are contributing to a public good for social scientists, government researchers, reporters, and the general public. Second, you will receive credit for contributing the data on the CLEA website. Third, we ask all users to cite the people who contributed the data when those users incorporate that particular country’s data into their research.
The dataset lacks results from the election in year X in country Y -- do you have the information, and if so, when do you intend to post it?
Our archive contains the results for many elections that are not available on the website, but these may not be posted for several reasons. First, the results may be in a format that requires manual entry (e.g., a hard-copy data book or scanned images) and, depending on the complexity of the electoral system, may involve several person-hours for a single election. In these cases, the research team continues to look for electronic resources that can be formatted much more quickly into the CLEA template. Second, official sources, such as the election commission or interior ministry, may post results that are not at the level of representation for the lower chamber (i.e., where votes are translated into seats). In such situations we continue to look for constituency-level results from alternative sources, such as newspapers. Third, translation issues may slow the data cleaning process, particularly the completion of the official party list.
We do not make public our timeline for posting specific elections because unforeseen circumstances, such as those described above, may result in the removal of a case from a forthcoming release, but we are happy to share materials in our archives upon request.