Frequently Asked Questions

What is THEA, and how does it relate to CLEA?

THEA (The Elections Archive) is a repository of subnational election results. CLEA (Constituency-Level Elections Archive) has traditionally focused solely on results reported at the constituency level, whereas THEA compiles elections reported at various subnational levels, including but not limited to elections reported at the constituency level. Another difference between THEA and CLEA is that CLEA only includes data from national and supra-national elections, whereas THEA also includes selected results from smaller geographic units (e.g., provinces, cities). However, the products that formed CLEA will continue to be released under THEA’s auspices: national lower house, upper house, and European Parliamentary elections reported at the constituency level; Geo-Referenced Electoral Districts (GRED); and party nationalization measures for the lower and upper houses. In the future, THEA may include more than election results, such as images of ballots or campaign posters.

What principles have you used to construct THEA?

The central priority of THEA is to preserve information on elections for use by current and future generations of researchers, which entails amassing source materials from countries around the world, assembling relevant results they contain, and consolidating these data into an archive. We pursue the most comprehensive coverage possible given data availability without imposing geographic, temporal, or political restrictions. The data posted in THEA are of high quality and have the kind of integrity researchers need to draw accurate inferences. The collection aims to encompass all sovereign nations (including the micro-states) and self-governing territories. The current coverage has a substantial representation of cases from most regions of the world, and from both developed and developing societies. The reach extends back as far as possible, even to elections from pre-independence periods. THEA has been developed specifically to make the data easy to access and use. One of THEA’s most distinctive features is a design oriented toward comparative research. To strengthen the functionality and appeal of THEA and keep it on the cutting edge, we regularly pursue innovations and special features, including innovations that our users suggest.

Are countries excluded for any reasons (size, type of electoral system, problems with elections, surrounding political conditions, etc.)?

Elections that are blatantly uncompetitive and rubber-stamp plebiscites are excluded (e.g., North Korea), but otherwise we have data from any polity holding elections. This includes marginally democratic elections, such as those in Pakistan or Singapore.

Why do you include elections that were boycotted, disputed, suspended and/or annulled?

The data from those elections can provide important information for researchers studying the areas where they occurred. We include with the dataset a codebook which provides information for researchers about the context of each election, and they can then make an informed decision about whether to include those data in their analysis.

Why/when does THEA include results from political entities that are not sovereign countries (e.g., Anguilla)?

We see legitimate reasons why elections from those entities might be interesting and useful for researchers.

How has the dataset been constructed?

The THEA team has progressively and painstakingly accumulated a massive volume of election results from hundreds of primary (e.g., online and print reports from election authorities) and secondary (e.g., independent websites and research publications) sources, supplemented with direct contributions of data collected privately by at least two dozen other scholars. Special attention is paid to catching new releases of election results. Research assistants scour available websites to be sure to capture the data before they are taken down. Where feasible, we archive sources in a digital format for purposes of preservation.

What is the process for making data part of THEA?

In general, we take data in whatever form we can find it and then reformat it and clean it (check for errors and fill in missing data when possible). For some countries this is straightforward and takes limited time and specialized expertise. But the process can be involved, difficult, and time-consuming. Consider the example of the Hungarian parliamentary elections. The country has a complex three-tier system composed of 176 single-member local constituencies, 20 single-member regional constituencies, and an overall national list. Originally, we obtained results for Hungary’s post-communist elections from a secondary source that was found to be riddled with errors. Many of the errors were corrected by consulting other sources, but we opted to remove the 1998 election results during the fourth release until a more accurate record could be found. The Hungarian National Election Office (NEO) has posted results since 1998 on its website. Yet this material is in Hungarian and remained beyond our reach to process until recent improvements in web-based translation programs. At that point, we tasked three undergraduate students with scraping the single-member constituency results from NEO’s reports for the 1998 parliamentary elections. This entailed copying the results from tables at 371 separate URLs, since constituencies were formatted on separate pages and nearly every constituency had a second-round runoff. The results were pasted into an Excel worksheet. The data then required reformatting to fit the CLEA template. Next, a graduate student checked for errors, which were sent back for the students to correct. Afterwards, the file was sent to the CLEA computer programmer, who checked for both source errors (e.g., votes outside the possible range) and production errors (e.g., identical values across multiple districts), assigned missing values, and calculated several additional variables, which took seven hours of effort.
Thus, one solitary election consumed many person-hours. Though this case is atypical, we offer the illustration to give a sense of the challenges and manpower requirements involved.
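
As a rough illustration of the two kinds of checks mentioned above (votes outside the possible range, and identical values across multiple districts), here is a minimal Python sketch. It is not the actual program used by the team. The row layout and the field names (district, candidate, votes, eligible) are hypothetical, and the range check assumes each voter casts at most one vote per candidate.

```python
from collections import defaultdict

# Hypothetical layout: one row per candidate per district.
rows = [
    {"district": "A", "candidate": "X", "votes": 1200, "eligible": 5000},
    {"district": "A", "candidate": "Y", "votes": 900, "eligible": 5000},
    {"district": "B", "candidate": "X", "votes": 1200, "eligible": 4800},
    {"district": "B", "candidate": "Y", "votes": 900, "eligible": 4800},
    {"district": "C", "candidate": "X", "votes": 7000, "eligible": 5200},
    {"district": "C", "candidate": "Y", "votes": 800, "eligible": 5200},
]


def find_out_of_range(rows):
    """Possible source errors: a vote count that is negative or larger
    than the number of eligible voters (assumes one vote per candidate)."""
    return [r for r in rows if r["votes"] < 0 or r["votes"] > r["eligible"]]


def find_identical_districts(rows):
    """Possible production errors: districts whose full sequence of vote
    counts is identical, which may indicate a copy-and-paste slip."""
    votes_by_district = defaultdict(list)
    for r in rows:
        votes_by_district[r["district"]].append(r["votes"])

    districts_by_votes = defaultdict(list)
    for district, votes in votes_by_district.items():
        districts_by_votes[tuple(votes)].append(district)

    return [ds for ds in districts_by_votes.values() if len(ds) > 1]


suspicious_rows = find_out_of_range(rows)
duplicate_groups = find_identical_districts(rows)
print(suspicious_rows)
print(duplicate_groups)
```

Flagged rows are only candidates for review. Two districts can legitimately report the same figures, so a match of this kind calls for a look at the source rather than an automatic correction.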

How do you determine party codes?

This is actually a tricky issue. The basic principle is to try to give a unique code to each political party that has a relatively consistent label and organizational basis. We cannot always know about changes on either dimension without a significant level of country and party expertise. So, for a hypothetical example, a Social Democrat party in 2000 may not be the same organizationally as a Social Democrat party in 2008 in a given country. Alternatively, the Social Democrat party in 2000 could be the same party organizationally as the New Democrat party in 2005. This can be problematic for some purposes, we recognize, and it requires the researcher to know something about the parties to connect the two different party codes across time. But it avoids the problem of having to separate out parties that share a party code because they have the same name but are in fact different parties organizationally. A classic example is the Republican label in American history. The modern Republicans have had the same party code since 1854, but a different code from the Republicans who ran candidates as early as 1800 and died out a few decades later. Our rule of thumb is to give the same party code as long as the name remains consistent in consecutive elections, or if a party of the same name does not appear for one election but then reappears in the following election. If a party does not appear for two consecutive elections, but then a party of the same name appears (e.g., Labour Party appears in 2000, but not in 2004 and 2008, and then Labour Party appears again in 2012), it receives a new party code.
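
The rule of thumb can be sketched in a few lines of Python. This is only an illustration of the name-based rule as described above, not the procedure actually used to code the data. The function name, the input layout, and the sequential integer codes are all assumptions, and the sketch ignores the organizational judgment that real coding requires.

```python
def assign_party_codes(elections):
    """Assign illustrative party codes by name continuity.

    `elections` is a chronologically ordered list of (year, party_names).
    A name keeps its code if it was absent from at most one election;
    after two or more consecutive absences it receives a new code.
    Returns a dict mapping (year, party_name) to an integer code.
    """
    next_code = 1
    last_seen = {}  # party name -> (code, index of last election contested)
    codes = {}

    for index, (year, names) in enumerate(elections):
        for name in sorted(names):
            if name in last_seen:
                code, last_index = last_seen[name]
                missed = index - last_index - 1  # elections skipped in between
            else:
                code, missed = None, None

            if code is None or missed >= 2:
                code = next_code  # new party, or name returns after a long gap
                next_code += 1

            last_seen[name] = (code, index)
            codes[(year, name)] = code

    return codes


# The example from the text: Labour Party in 2000, absent in 2004 and 2008,
# present again in 2012, so the 2012 entry receives a new code.
long_gap = assign_party_codes(
    [(2000, {"Labour Party"}), (2004, set()), (2008, set()), (2012, {"Labour Party"})]
)

# A single missed election does not trigger a new code.
short_gap = assign_party_codes(
    [(2000, {"Labour Party"}), (2004, set()), (2008, {"Labour Party"})]
)

print(long_gap)
print(short_gap)
```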

How is the turnout variable calculated?

Turnout is the fraction of eligible voters who vote in a given constituency. We calculate this variable by dividing the total number of votes cast in the constituency (VOT1) by the number of eligible voters in the constituency (PEV1). Because countries use a variety of methods to calculate turnout, we do not report the turnout rates listed in official election reports as these are not consistent across countries.
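
For readers who want to reproduce the calculation, a minimal Python sketch follows. VOT1 and PEV1 are the variable names given above. The function name and the use of None for missing or unusable inputs are assumptions made for illustration and may not match the missing-value conventions in the codebook.

```python
from typing import Optional


def turnout(vot1: Optional[float], pev1: Optional[float]) -> Optional[float]:
    """Return VOT1 / PEV1 as a fraction.

    Returns None when either value is missing or when PEV1 is not
    positive (an assumption of this sketch, not a codebook rule).
    """
    if vot1 is None or pev1 is None or pev1 <= 0:
        return None
    return vot1 / pev1


# 45,000 votes cast among 60,000 eligible voters
example = turnout(45_000, 60_000)
print(example)  # 0.75
```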

Why in some constituencies does the total of the votes received by all the candidates/parties exceed the number of valid votes?

In some countries valid votes are reported as the valid number of ballots cast, but voters may make multiple candidate or party selections on the ballot. Thus, the sum of votes for candidates and/or parties will exceed the number of valid votes.

Why are the constituencies given different numbers for each election?

Countries sometimes change the boundaries of constituencies but keep the same name, so we cannot be sure if a given constituency of the same name over time is actually the same geographic area. We thus chose to “time stamp” the constituencies by assigning numbers per election. In many cases (e.g., the United States), however, constituencies keep the same number between redistrictings, owing to recent efforts to standardize constituency names.

What if I find a mistake in THEA?

Please email us right away with information about the mistake: [email protected]. The research team is committed to correcting mistakes. In some instances, when the errors were sufficiently serious, we have pulled data from the website and restored them after making corrections.

Do you make any changes in the data between releases?

We are always improving the quality of our data, and known errors that may hamper data analysis are posted in between releases on the errata page. In between releases we fix these errors, and in some cases, we remove a previously released election from the website if the repairs would significantly delay a forthcoming release.

I have data I would like to contribute. How can I do that?

You can email us at [email protected], or contact any member of the THEA leadership team.

Of what benefit is it to add data to THEA?

You are contributing to a public good for social scientists, government researchers, reporters, and the general public. You will also receive credit for contributing the data in the THEA codebook.

The dataset lacks results from the election in year X in country Y -- do you have the information, and if so when do you intend to post it?

Our archive contains the results for many elections that are not available on the website; these may not be posted for several reasons. First, the results may be in a format that requires manual entry (e.g., a hard copy data book, scanned images), and depending on the complexity of the electoral system, may involve several person-hours for a single election. In these cases, the research team often instead opts to continue to look for electronic resources that can be formatted much more quickly into the THEA template. Second, we sometimes obtain sources that do not provide enough information for us to code to THEA specifications (e.g., they only include winners of a given election rather than all candidates). Third, translation issues may slow the data cleaning process, particularly the completion of the official party list. We do not make public our timeline for posting specific elections because unforeseen circumstances, such as those described above, may result in the removal of a case from a forthcoming release, but we are happy to share materials in our archives upon request.