Marius Niculae

This post analyses a published research article, “Tobacco Control in the EU-15: The Role of Member States and the European Union”, written by Donley T. Studlar, Kyle Christensen and Arnita Sitasari and first published in the Journal of European Public Policy, Volume 18, Issue 5, 2011. The post is framed as a critical “referee report” on the possible flaws of that manuscript.

One of the most important lessons of the last US presidential election concerns the importance of research design in statistics. Nate Silver, a renowned statistician, was heavily criticized for his strong confidence in cold hard data. He was also accused of political partisanship because he refused to consider the influence that political debates might have on the final electoral outcome, and he refused to account for the ways in which ideology could affect his research model: “All Nate’s doing is averaging polls and counting electoral votes right? That’s the secret sauce”. Despite this, Nate Silver correctly called all 50 states in the US Electoral College.

In other words, by using a subset of the data he was able to make an accurate prediction for the whole electorate and to argue for a clear relationship between polling data and election results. As Silver showed, one of the great advantages of statistics, if correctly used, is its capacity to describe accurately the main tendencies within a sample or within the population as a whole. Another role of statistics, which in Silver’s case brought him accusations of using a skewed model, is to help our intuition: it enables us to determine the extent to which the available data support or contradict our hypothesis. By using a simple research model for his forecast, Silver was able to produce highly accurate results and to steadily narrow the gap produced by errors.

Nate Silver’s case is a valuable critical model for my effort to point out some hidden hazards of the research design discussed here. The first step is to test its internal and external validity against the most common pitfalls of a research paper: non-falsifiable or unclear hypotheses, poor operationalization, failure to address rival explanations, non-representative samples, selection on the dependent variable, and data mining. During my analysis I will also refer to other concepts such as omitted-variable bias and outliers.

First, the abstract offers some initial hints about the manuscript: “This paper uses statistical models to test explanations of tobacco control policy across 15 EU member states adopting instruments of Comprehensive Tobacco Control Policy (CTCP) over two decades. Socioeconomic modernization, economic interest groups and domestic political factors all play a role in policy. Although there is declining influence of pro-tobacco domestic constituencies, adoption of CTCP is still inhibited by corporatist practices in member states. Vertical policy diffusion through the EU has aided domestic sources of policy adoption, making tobacco control policy one of multi-level governance and enhancing its comprehensiveness”[1].

Arguing for causality requires a theory and a research model. Otherwise, the relationship between the dependent variable and the independent variables cannot be correctly defined. In other words, the lack of a strong theory and a clear mechanism leaves room for a lurking variable to step in as a third factor and create the illusion of a direct causal relationship between unrelated variables. I will come back to this topic later on, as I advance with my analysis.

Defining the concept is a basic step in constructing causal explanations. It allows researchers to extract an idea or an element thought to be essential for the entire population[2]. Unfortunately, without a clear research model there is no foundation for sound operationalization. This latter process must result in hard data, which are a prerequisite for good independent variables that allow the empirical measurement of the concept.

Leaving behind the question marks raised by the abstract and turning to the introduction, one finds that the “paper attempts to ascertain what social, economical and political factors help explain the progressive adoption of more CTCP in 15 EU member countries over the past two decades, including the role of the EU in their adoption”[3]. As King et al. put it, a successful project is one that explains a lot with a little, or aims to use a single explanatory variable to explain numerous observations on dependent variables [4]. However, in doing so there is a clear risk of omitted-variable bias, which can undermine the entire estimation process. Omitted-variable bias occurs when a research model leaves out one or more important causal factors that are correlated with the variables it does include.
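To make the mechanics of omitted-variable bias concrete, here is a minimal simulation sketch. It does not use the manuscript's data: the variables `x`, `y` and `z`, the effect sizes (1.0, 2.0, 0.8) and the sample size are all invented for illustration. The true effect of `x` on `y` is fixed at 1.0, and a third factor `z` drives both.

```python
import random


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x


def omitted_variable_demo(n=5000, seed=0):
    """Return (naive_slope, controlled_slope) for simulated data."""
    rng = random.Random(seed)
    # z is the hypothetical omitted factor; it influences both x and y.
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [0.8 * zi + rng.gauss(0, 1) for zi in z]
    # By construction the true effect of x on y is 1.0.
    y = [1.0 * xi + 2.0 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

    # Model that leaves z out: the slope on x absorbs part of z's effect.
    naive = ols_slope(x, y)

    # Model that controls for z: remove the part of x and y explained by z,
    # then regress the leftovers on each other (Frisch-Waugh-Lovell theorem).
    b_xz = ols_slope(z, x)
    b_yz = ols_slope(z, y)
    x_resid = [xi - b_xz * zi for xi, zi in zip(x, z)]
    y_resid = [yi - b_yz * zi for yi, zi in zip(y, z)]
    controlled = ols_slope(x_resid, y_resid)
    return naive, controlled


naive, controlled = omitted_variable_demo()
print(f"slope with z omitted:    {naive:.2f}")
print(f"slope controlling for z: {controlled:.2f}")
```

Under these assumed numbers the expected bias is 2.0 × 0.8 / 1.64 ≈ 0.98, so the model that omits `z` should report a slope close to 2 while the controlled model should recover a value close to the true 1.0.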

In this specific case, I think the authors were so preoccupied with avoiding such traps that they overlooked a rather obvious one: building a model that tries to explain a lot with a lot [6]. Furthermore, it is not clear to what degree the EU acts as a non-representative sample for the research model: “Since the mid-1980s the EU has provided an additional arena in which struggle over tobacco control policy has occurred, making it subject to federal or quasi-federal practices of shared sovereignty and potential for policy transfer either coercively or voluntary”[7]. A non-representative sample generates a skewed model, especially in research defined as “an early cross-national quantitative” study that is trying to “establish the plausibility of different potential influences in a multivariate model”[8].

Moreover, the authors try to build sound research while remaining hesitant about the role of the multiple institutions involved in decision making at EU level, and while using blurry concepts such as “historical-cultural-linguistic similarities captured in the concept of families of nations”[9].

Another flaw of the proposed research design lies in the huge amount of literature the authors review. First of all, it would be almost impossible for a peer reviewer to recreate the entire research in a reasonable amount of time, even by confining himself to the available bibliography. Aggregating data from the OECD, the World Bank, the EU Commission, EUR-Lex, WHO Europe or the IMF can be a highly difficult process and can bring into the discussion data which are not really relevant for the research design.

Second, Donley T. Studlar, one of the researchers behind this study, is cited more than 10 times for studies in which he was either the main author or part of the research team. Finding this pattern very suspicious, I turned to the Google Scholar search engine to look deeper into the background of this paper. My intuition proved correct. The study “Tobacco Control in the EU-15: The Role of Member States and the European Union” is cited only once, by the same D. T. Studlar, in another research paper that has most of the flaws identified in my current analysis [10].

Thirdly, there is a clear bias towards generalizing findings on the basis of previous experience. In decision-making theory this is called anchoring, and it goes against one of the most important traits of statistics: helping us make sense of data collection and observations. Statistics allows us to avoid jumping to conclusions and to be cautious about the extent to which we can generalize from our always limited experience [11].

As a logical consequence, the above findings made me pay attention to any possible hints of data mining. I noticed that the authors tried to confirm results already obtained in previous studies: “The results here differ somewhat from previous analysis based on examining individual instruments of policy across states, excluding the EU variable, rather than the policy aggregates of CTCP (Studlar 2009)”. The problem is that the authors do not test their hypotheses against different data sets. In other words, they use data mining to construct independent variables that match what they want to demonstrate. As a consequence, their research design is biased.
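Why testing against a different data set matters can be sketched with a small simulation. Everything in it is hypothetical: the outcome and the 200 candidate predictors are pure random noise, the 15 "training" observations merely echo the size of an EU-15 sample, and the 0.514 threshold is the approximate two-sided 5% critical value of Pearson's r for 15 observations. The sketch picks the candidate that fits best and then checks it on observations that played no part in the selection.

```python
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def data_mining_demo(n_train=15, n_test=15, n_candidates=200, seed=1):
    """Return (best training r, same predictor's holdout r, number of spurious hits)."""
    rng = random.Random(seed)
    n = n_train + n_test
    # Outcome and candidate predictors are unrelated noise by construction.
    outcome = [rng.gauss(0, 1) for _ in range(n)]
    candidates = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n_candidates)]

    # Mining step: correlate every candidate with the outcome on the training rows.
    train_r = [pearson_r(c[:n_train], outcome[:n_train]) for c in candidates]
    best = max(range(n_candidates), key=lambda i: abs(train_r[i]))

    # Validation step: re-check the chosen predictor on rows not used for selection.
    holdout_r = pearson_r(candidates[best][n_train:], outcome[n_train:])

    # Candidates that would pass a conventional 5% test despite being noise.
    n_hits = sum(abs(r) > 0.514 for r in train_r)
    return train_r[best], holdout_r, n_hits


best_r, holdout_r, n_hits = data_mining_demo()
print(f"best training correlation: {best_r:.2f}")
print(f"same predictor on holdout: {holdout_r:.2f}")
print(f"noise predictors passing the 5% threshold: {n_hits}")
```

With this many candidates and so few observations, some pure-noise predictor will almost always look "significant" in the data used to choose it. Only the holdout figure gives an honest reading, which is the reason a hypothesis should be tested on data other than the data that suggested it.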

The authors of the study derive the independent variables from the dependent variable: “The dependent variable in the model is an indexed score of non-tax tobacco score ranging from 0 to 34(…) based on studies of relevant instruments in tobacco control (…) the independent variables in the models comprise a list of factors that might affect policy adoption.” More exactly, they are trying to explain the adoption of increasingly restrictive tobacco control policies in Western democracies by using only variables directly linked with this policy. A basic rule of thumb in research design is to allow variation on the dependent variable in order to obtain valuable observations and plausible inferences. One way the authors could do this is to disaggregate their theory about tobacco control policy into smaller observations and to supplement their research with a few less detailed cases based on secondary data. Good examples would be to analyse what is happening at regional level in the field of tobacco control, to include the effects of the black market, or to break the data down by gender, age, education or health status.
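The rule of thumb about variation on the dependent variable can be illustrated with a short simulation. The data are invented and have nothing to do with the manuscript's tobacco control index: `y` depends on `x` with a true slope of 1.0, and the "selected" sample keeps only the cases whose outcome exceeds a hypothetical cutoff, as if one studied only the strictest regulators.

```python
import random


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x


def selection_demo(n=5000, cutoff=1.0, seed=2):
    """Return (full-sample slope, slope after selecting on y, cases kept)."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    # By construction the true effect of x on y is 1.0.
    y = [1.0 * xi + rng.gauss(0, 1) for xi in x]

    full = ols_slope(x, y)

    # Selecting on the dependent variable: keep only cases with a high outcome.
    kept = [(xi, yi) for xi, yi in zip(x, y) if yi > cutoff]
    x_sel = [xi for xi, _ in kept]
    y_sel = [yi for _, yi in kept]
    selected = ols_slope(x_sel, y_sel)
    return full, selected, len(kept)


full, selected, n_kept = selection_demo()
print(f"slope in the full sample:      {full:.2f}")
print(f"slope after selecting on y:    {selected:.2f} ({n_kept} cases kept)")
```

In this sketch the truncated sample understates the effect of `x` considerably, because restricting the range of the outcome removes much of the variation the regression needs. The estimate from the full sample stays close to the true value.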

Considering the above, one can conclude that the proposed hypotheses are built on the wrong set of data, as they try to demonstrate causality rather than argue for it. Hypotheses should be as specific as possible, should specify a clear pattern between the variables, and must be empirically testable. This allows the researcher to know whether he is wrong or not, a question everyone should ask himself while writing a research paper.

As a final remark on the relevance of this research manuscript for revealing relevant data about tobacco control policy in Western democracies, I let the text speak for itself: “In order effectively to determine what impacts the development of non-tax tobacco policy in the EU-15, our analysis incorporates a set of pooled cross-sectional time-series models (…) Our design incorporates Prais Winston regression panel corrected standard errors, in conjunction with a correction for a first order auto-regressive function”.

In order to judge the validity of a research manuscript, a peer reviewer must be able to comprehend it. One cannot picture different outcomes for research designs built outside a clear methodological approach. According to Drezner, if we cannot picture that, then we can discard the paper as an agitprop effort rather than a piece of research [12].


ALLISON David Paul, Multiple Regression: A Primer, ed. Pine Forge Press, Thousand Oaks, 1999

KING Gary, KEOHANE O. Robert, VERBA Sidney, Designing Social Inquiry, Scientific Inference in Qualitative Research, ed. Princeton University Press, New Jersey, 1994

MEIER J. Kenneth, BRUDNEY L. Jeffrey, BOHTE John, Applied Statistics for Public and Non-Profit Administration, Ed. Thomson/Wadsworth, Belmont, 2009

ROWNTREE Derek, Statistics without tears, An introduction for non mathematicians, Ed. Penguin Books, London, 1981

STUDLAR T Donley, CHRISTENSEN Kyle, SITASARI, Arnita, Tobacco Control in the EU-15: The Role of Member States and the European Union, Journal of European Public Policy, Volume 18, Issue 5, 2011


[2] MEIER J., Kenneth, BRUDNEY L., Jeffrey, BOHTE, John, Applied Statistics…, op. cit., p. 34

[3] STUDLAR T Donley, CHRISTENSEN Kyle, SITASARI, Arnita, Tobacco Control in the EU-15: The Role of Member States and the European Union, p.2

[4] KING, Gary, KEOHANE O., Robert, VERBA, Sidney, Designing Social Inquiry, Scientific Inference in Qualitative Research, ed. Princeton University Press, New Jersey, 1994, p. 123

[5] ALLISON, David Paul, Multiple Regression: A Primer, ed. Pine Forge Press, Thousand Oaks, 1999, p. 50

[6] idem

[7] STUDLAR T Donley, CHRISTENSEN Kyle, SITASARI, Arnita, Tobacco Control in the EU-15: The Role of Member States and the European Union, p. 3

[8] idem, p. 10

[9] idem, p.4


[11] ROWNTREE, Derek, Statistics without tears, An introduction for non mathematicians, Ed. Penguin Books, London, 1981, p.15

[12] DREZNER W., Daniel


