Working papers

[12] “The Effect of Omitted Variables on the Sign of Regression Coefficients”, with Alexandre Poirier (August 2022; latest draft: Feb 2023)

Abstract
  • Omitted variables are a common concern in empirical research. We distinguish between two ways omitted variables can affect baseline estimates: by driving them to zero or by reversing their sign. We show that, depending on how the impact of omitted variables is measured, it can be substantially easier for omitted variables to flip coefficient signs than to drive them to zero. Consequently, results which are considered robust to being "explained away" by omitted variables are not necessarily robust to sign changes. We show that this behavior occurs with "Oster's delta" (Oster 2019), a commonly reported measure of regression coefficient robustness to the presence of omitted variables. Specifically, we show that any time this measure is large, suggesting that omitted variables may be unimportant, a much smaller value reverses the sign of the parameter of interest. Relatedly, we show that selection-bias-adjusted estimands can be extremely sensitive to the choice of the sensitivity parameter. In particular, researchers commonly compute a bias adjustment under the assumption that Oster's delta equals one. Under the alternative assumption that delta is very close to one, but not exactly equal to one, we show that the bias can instead be arbitrarily large. To address these concerns, we propose a modified measure of robustness that accounts for such sign changes, and discuss best practices for assessing sensitivity to omitted variables. We demonstrate this sign-flipping behavior in an empirical application to social capital and the rise of the Nazi party, where we show how it can overturn conclusions about robustness, and how our proposed modifications can be used to regain robustness. We analyze three additional empirical applications as well. We implement our proposed methods in the companion Stata module regsensitivity for easy use in practice.
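
To fix ideas, here is a minimal sketch of the textbook omitted-variables algebra behind this discussion; the notation is illustrative and not necessarily the paper's:

  % Long model: Y = beta*X + gamma*W + u, where W is omitted from the short regression.
  % The short-regression coefficient converges to beta plus a bias term:
  \[
    \hat{\beta}_{\mathrm{short}} \;\overset{p}{\longrightarrow}\; \beta + \gamma\,\frac{\operatorname{Cov}(X, W)}{\operatorname{Var}(X)}.
  \]
  % A bias of magnitude |beta| with opposing sign drives the estimate to zero;
  % any larger opposing bias flips its sign, which is the distinction the paper studies.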

To install the companion Stata module, type ssc install regsensitivity, all from within Stata. Type help regsensitivity for syntax and instructions. Also see our vignette for a walkthrough. All files are also available on our GitHub repo.
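
For convenience, the commands from the note above collected in one block (these are exactly the commands stated there):

  * Install the regsensitivity module and all companion files from SSC
  ssc install regsensitivity, all

  * Display syntax and usage instructions
  help regsensitivity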

[11] “Assessing Omitted Variable Bias when the Controls are Endogenous”, with Paul Diegert and Alexandre Poirier (June 2022; latest draft: May 2023)

Abstract
  • Omitted variables are one of the most important threats to the identification of causal effects. Several widely used approaches, including Oster (2019), assess the impact of omitted variables on empirical conclusions by comparing measures of selection on observables with measures of selection on unobservables. These approaches either (1) assume the omitted variables are uncorrelated with the included controls, an assumption that is often considered strong and implausible, or (2) use a method called residualization to avoid this assumption. In our first contribution, we develop a framework for objectively comparing sensitivity parameters. We use this framework to formally prove that the residualization method generally leads to incorrect conclusions about robustness. In our second contribution, we then provide a new approach to sensitivity analysis that avoids this critique, allows the omitted variables to be correlated with the included controls, and lets researchers calibrate sensitivity parameters by comparing the magnitude of selection on observables with the magnitude of selection on unobservables as in previous methods. We illustrate our results in an empirical study of the effect of historical American frontier life on modern cultural beliefs. Finally, we implement these methods in the companion Stata module regsensitivity for easy use in practice.

See [12] for Stata module installation instructions.

Published papers

[10] “Choosing Exogeneity Assumptions in Potential Outcome Models”, with Alexandre Poirier (May 2022), Econometrics Journal (forthcoming); supersedes our previous paper “Interpreting Quantile Independence”

Abstract
  • There are many kinds of exogeneity assumptions. How should researchers choose among them? When exogeneity is imposed on an unobservable like a potential outcome, we argue that the form of exogeneity should be chosen based on the kind of selection on unobservables it allows. Consequently, researchers can assess the plausibility of any exogeneity assumption by studying the distributions of treatment given the unobservables that are consistent with that assumption. We use this approach to study two common exogeneity assumptions: quantile and mean independence. We show that both assumptions require a kind of non-monotonic relationship between treatment and the potential outcomes. We discuss how to assess the plausibility of this kind of treatment selection. We also show how to define a new and weaker version of quantile independence that allows for monotonic treatment selection. We then show the implications of the choice of exogeneity assumption for identification. We apply these results in an empirical illustration of the effect of child soldiering on wages.

[9] “Assessing Sensitivity to Unconfoundedness: Estimation and Inference”, with Alexandre Poirier and Linqi Zhang (December 2020), Journal of Business & Economic Statistics (forthcoming)

Abstract
  • This paper provides a set of methods for quantifying the robustness of treatment effects estimated using the unconfoundedness assumption (also known as selection on observables or conditional independence). Specifically, we estimate and conduct inference on bounds on various treatment effect parameters, like the average treatment effect (ATE) and the average effect of treatment on the treated (ATT), under nonparametric relaxations of the unconfoundedness assumption indexed by a scalar sensitivity parameter c. These relaxations allow for limited selection on unobservables, depending on the value of c. For large enough c, these bounds equal the no-assumptions bounds. Using a non-standard bootstrap method, we show how to construct confidence bands for these bound functions which are uniform over all values of c. We illustrate these methods with an empirical application to the effects of the National Supported Work Demonstration program. We implement these methods in a companion Stata module for easy use in practice.

To install the companion Stata module, type ssc install tesensitivity from within Stata. Type help tesensitivity for syntax and instructions. Also see our vignette for a walkthrough. All files are also available on our GitHub repo.
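
The same commands in copy-and-paste form (exactly as stated above):

  * Install the tesensitivity module from SSC
  ssc install tesensitivity

  * Display syntax and usage instructions
  help tesensitivity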

[8] “ivcrc: An Instrumental Variables Estimator for the Correlated Random Coefficients Model” (2022), with David Benson and Alexander Torgovitsky (Preprint), The Stata Journal

Abstract
  • We present the ivcrc command, which implements an instrumental variables (IV) estimator for the linear correlated random coefficients (CRC) model. This model is a natural generalization of the standard linear IV model that allows for endogenous, multivalued treatments and unobserved heterogeneity in treatment effects. The proposed estimator uses recent semiparametric identification results that allow for flexible functional forms and permit instruments that may be binary, discrete, or continuous. The command also allows for the estimation of varying coefficients regressions, which are closely related in structure to the proposed IV estimator. We illustrate this IV estimator and the ivcrc command by estimating the returns to education in the National Longitudinal Survey of Young Men.
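
As a rough sketch, an invocation might look like the following, patterned on Stata's usual (endogenous = instruments) syntax; the variable names and syntax details here are illustrative assumptions, so see help ivcrc or the article for the documented interface:

  * Hypothetical call: returns to education in the NLS Young Men sample.
  * Variable names (lwage, exper, educ, nearc4) and the exact syntax are
  * assumptions for illustration, not the module's documented interface.
  ivcrc lwage exper (educ = nearc4)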

[7] “Salvaging Falsified Instrumental Variable Models” (2021), with Alexandre Poirier (Supplemental appendix; Replication files; Dec 2018 draft; Journal link), Econometrica

Abstract
  • What should researchers do when their baseline model is falsified? We recommend reporting the set of parameters that are consistent with minimally nonfalsified models. We call this the falsification adaptive set (FAS). This set generalizes the standard baseline estimand to account for possible falsification. Importantly, it does not require the researcher to select or calibrate sensitivity parameters. In the classical linear IV model with multiple instruments, we show that the FAS has a simple closed-form expression that only depends on a few 2SLS coefficients. We apply our results to an empirical study of roads and trade. We show how the FAS complements traditional overidentification tests by summarizing the variation in estimates obtained from alternative nonfalsified models.

[6] “Inference on Breakdown Frontiers” (2020), with Alexandre Poirier (Supplemental appendix; Replication files; May 2017 draft; arXiv drafts), Quantitative Economics

Abstract
  • A breakdown frontier is the boundary between the set of assumptions which lead to a specific conclusion and those which do not. In a potential outcomes model with a binary treatment, we consider two conclusions: first, that the ATE is at least a specific value (e.g., nonnegative), and second, that the proportion of units who benefit from treatment is at least a specific value (e.g., at least 50%). For these conclusions, we derive the breakdown frontier for two kinds of assumptions: one which indexes deviations from random assignment of treatment, and one which indexes deviations from rank invariance. These classes of assumptions nest both the point identifying assumptions of random assignment and rank invariance and, at the opposite end, no constraints on treatment selection or on the dependence structure between potential outcomes. This frontier provides a quantitative measure of robustness of conclusions to deviations from the point identifying assumptions. We derive root-N-consistent sample analog estimators for these frontiers. We then provide two asymptotically valid bootstrap procedures for constructing lower uniform confidence bands for the breakdown frontier. As a measure of robustness, estimated breakdown frontiers and their corresponding confidence bands can be presented alongside traditional point estimates and confidence intervals obtained under point identifying assumptions. We illustrate this approach in an empirical application to the effect of child soldiering on wages. We find that the conclusions we consider are fairly robust to failure of rank invariance when random assignment holds, but they are much more sensitive to both assumptions for small deviations from random assignment.
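
Schematically, and in notation of my own rather than the paper's: if deviations from random assignment and from rank invariance are indexed by a pair of sensitivity parameters, the frontier is the boundary of the region where the conclusion survives:

  % A = set of sensitivity-parameter pairs (c, t) under which the conclusion holds,
  % e.g., the conclusion that ATE >= 0. The breakdown frontier is its boundary:
  \[
    A = \{\, (c, t) : \text{the conclusion holds under deviations indexed by } (c, t) \,\}, \qquad \mathrm{BF} = \partial A.
  \]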

[5] “A Practical Guide to Compact Infinite Dimensional Parameter Spaces” (2019), with Joachim Freyberger (Supplemental appendix; First version), Econometric Reviews

Abstract
  • We gather and review general compactness results for many commonly used parameter spaces in nonparametric estimation, and we provide several new results. We consider three kinds of functions: (1) functions with bounded domains which satisfy standard norm bounds, (2) functions with bounded domains which do not satisfy standard norm bounds, and (3) functions with unbounded domains. In all three cases we provide two kinds of results, compact embedding and closedness, which together allow one to show that parameter spaces defined by a strong norm bound are compact under a consistency norm. We illustrate how these results are typically used in econometrics by considering two common settings: nonparametric mean regression and nonparametric instrumental variables estimation.
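
For concreteness, one classical instance of the compact-embedding-plus-closedness pattern described above, stated here as an illustration (a standard Arzelà–Ascoli consequence) rather than as a result taken from the paper:

  % On a compact domain D and for 0 < alpha <= 1, consider the Holder ball
  \[
    \Theta_B = \{\, f : \|f\|_{C^{0,\alpha}(D)} \le B \,\}.
  \]
  % The Holder norm bound gives equiboundedness and equicontinuity, so the ball is
  % relatively compact under the consistency norm \|\cdot\|_\infty (compact embedding),
  % and the bound is preserved under uniform limits (closedness); hence Theta_B is
  % compact in (C^0(D), \|\cdot\|_\infty).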

[4] “Identification of Treatment Effects under Conditional Partial Independence” (2018), with Alexandre Poirier (Journal link), Econometrica

Abstract
  • Conditional independence of treatment assignment from potential outcomes is a commonly used but nonrefutable assumption. We derive identified sets for various treatment effect parameters under nonparametric deviations from this conditional independence assumption. These deviations are defined via a conditional treatment assignment probability, which makes it straightforward to interpret. Our results can be used to assess the robustness of empirical conclusions obtained under the baseline conditional independence assumption.
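
One way to write such a deviation, in my paraphrase of the paper's conditional treatment assignment probability approach (sometimes called conditional c-dependence):

  % T is the binary treatment, Y(t) a potential outcome, X observed covariates.
  % The deviation from conditional independence is bounded by a scalar c:
  \[
    \sup_{y}\, \bigl|\, \Pr(T = 1 \mid X = x,\, Y(t) = y) - \Pr(T = 1 \mid X = x) \,\bigr| \le c \quad \text{for each } x.
  \]
  % c = 0 recovers conditional independence; larger c permits larger deviations.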

Note 1: See here for an erratum correcting Proposition 5.

Note 2: See our paper [9], Masten, Poirier, and Zhang (2023), for the corresponding estimation and inference theory, as well as the companion Stata module: ssc install tesensitivity. Full installation instructions appear under [9] above.

[3] “Random Coefficients on Endogenous Variables in Simultaneous Equations Models” (2018) (Supplemental appendix; Replication files; 2015 preprint; 2013 preprint), The Review of Economic Studies

Abstract
  • This paper considers a classical linear simultaneous equations model with random coefficients on the endogenous variables. Simultaneous equations models are used to study social interactions, strategic interactions between firms, and market equilibrium. Random coefficient models allow for heterogeneous marginal effects. I show that random coefficient seemingly unrelated regression models with common regressors are not point identified, which implies random coefficient simultaneous equations models are not point identified. Important features of these models, however, can be identified. For two-equation systems, I give two sets of sufficient conditions for point identification of the coefficients’ marginal distributions conditional on exogenous covariates. The first allows for small support continuous instruments under tail restrictions on the distributions of unobservables which are necessary for point identification. The second requires full support instruments, but allows for nearly arbitrary distributions of unobservables. I discuss how to generalize these results to many equation systems, where I focus on linear-in-means models with heterogeneous endogenous social interaction effects. I give sufficient conditions for point identification of the distributions of these endogenous social effects. I propose a consistent nonparametric kernel estimator for these distributions based on the identification arguments. I apply my results to the Add Health data to analyze peer effects in education.

[2] “Identification of Instrumental Variable Correlated Random Coefficients Models” (2016), with Alexander Torgovitsky (Preprint; 2014 draft (longer)), The Review of Economics and Statistics

Companion Stata module ivcrc is available at our GitHub repo, which includes installation instructions.
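
A hypothetical installation command using Stata's net install; the URL below is a placeholder rather than the repo's actual path, which the README provides:

  * Placeholder URL: substitute the repo's actual raw path before running.
  net install ivcrc, from("https://raw.githubusercontent.com/<user>/<repo>/master/")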

Abstract
  • We study identification and estimation of the average partial effect in an instrumental variable correlated random coefficients model with continuously distributed endogenous regressors. This model allows treatment effects to be correlated with the level of treatment. The main result shows that the average partial effect is identified by averaging coefficients obtained from a collection of ordinary linear regressions that condition on different realizations of a control function. These control functions can be constructed from binary or discrete instruments which may affect the endogenous variables heterogeneously. Our results suggest a simple estimator that can be implemented with a companion Stata module.

[1] “A Specification Test for Discrete Choice Models” (2013), with Mark Chicu, Economics Letters

Abstract
  • In standard discrete choice models, adding options cannot increase the choice probability of an existing alternative. We use this observation to construct a simple nonparametric specification test by exploiting variation in the choice sets individuals face. We use a multiple testing procedure to determine the particular kind of choice sets that produce violations. We apply these tests to the 1896 US House of Representatives election and reject commonly used discrete choice voting models.

Other papers

“Partial Independence in Nonseparable Models”, with Alexandre Poirier (June 2016); portions of this paper appear in [4] and [10]

Abstract
  • We analyze identification of nonseparable models under three kinds of exogeneity assumptions weaker than full statistical independence. The first is based on quantile independence. Selection on unobservables drives deviations from full independence. We show that such deviations based on quantile independence require non-monotonic and oscillatory propensity scores. Our second and third approaches are based on a distance-from-independence metric, using either a conditional cdf or a propensity score. Under all three approaches we obtain simple analytical characterizations of identified sets for various parameters of interest. We do this in three models: the exogenous regressor model of Matzkin (2003), the instrumental variable model of Chernozhukov and Hansen (2005), and the binary choice model with nonparametric latent utility of Matzkin (1992).

“How Should the Graduate Economics Core be Changed?” (2011), with Jose Miguel Abito, Katarina Borovickova, Hays Golden, Jacob Goldin, Miguel Morin, Alexandre Poirier, Vincent Pons, Israel Romem, Tyler Williams, and Chamna Yoon, The Journal of Economic Education

Abstract
  • The authors present suggestions by graduate students from a range of economics departments for improving the first-year core sequence in economics. The students identified a number of elements that should be added to the core: more training in building microeconomic models, a discussion of the methodological foundations of model-building, more emphasis on institutions to motivate and contextualize macroeconomic models, and greater focus on econometric practice rather than theory. The authors hope that these suggestions will encourage departments to take a fresh look at the content of the first-year core.