Title: | Specification Tests for Parametric Propensity Score Models |
---|---|
Description: | The propensity score is one of the most widely used tools in studying the causal effect of a treatment, intervention, or policy. Given that the propensity score is usually unknown, it has to be estimated, implying that the reliability of many treatment effect estimators depends on the correct specification of the (parametric) propensity score. This package implements the data-driven nonparametric diagnostic tools for detecting propensity score misspecification proposed by Sant'Anna and Song (2019) <doi:10.1016/j.jeconom.2019.02.002>. |
Authors: | Pedro H. C. Sant'Anna <[email protected]>, Xiaojun Song <[email protected]> |
Maintainer: | Pedro H. C. Sant'Anna <[email protected]> |
License: | GPL-2 |
Version: | 0.1.3.900 |
Built: | 2025-02-17 04:29:39 UTC |
Source: | https://github.com/pedrohcgs/pstest |
The propensity score is one of the most widely used tools in studying the causal effect of a treatment, intervention, or policy. Given that the propensity score is usually unknown, it has to be estimated, implying that the reliability of many treatment effect estimators depends on the correct specification of the (parametric) propensity score. This package provides data-driven nonparametric diagnostic tools for detecting propensity score misspecification.
This R package implements the class of specification tests for the propensity score proposed in Sant'Anna and Song (2016), ‘Specification Tests for the Propensity Score’, available at Pedro H. C. Sant'Anna's webpage, http://sites.google.com/site/pedrohcsantanna/.
In short, this package implements Kolmogorov-Smirnov and Cramer-von Mises type tests for parametric propensity score models with either a logistic ('logit') or a normal ('probit') link function. Critical values are computed with the assistance of a multiplier bootstrap.
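To make the multiplier bootstrap less abstract, the sketch below draws the two kinds of bootstrap multipliers named in the `dist` argument, 'Mammen' and 'Rademacher'. The helper `draw_multipliers()` is hypothetical and is not part of pstest; it only illustrates the standard two-point distributions, both of which have mean 0 and variance 1.

```r
# Hypothetical helper (not a pstest function): draw n i.i.d. bootstrap
# multipliers with mean 0 and variance 1.
draw_multipliers <- function(n, dist = c("Mammen", "Rademacher")) {
  dist <- match.arg(dist)
  if (dist == "Rademacher") {
    # -1 or +1 with equal probability
    return(sample(c(-1, 1), size = n, replace = TRUE))
  }
  # Mammen's two-point distribution
  k1 <- (1 - sqrt(5)) / 2               # negative support point
  k2 <- (1 + sqrt(5)) / 2               # positive support point
  p1 <- (sqrt(5) + 1) / (2 * sqrt(5))   # P(V = k1)
  sample(c(k1, k2), size = n, replace = TRUE, prob = c(p1, 1 - p1))
}

set.seed(1)
v <- draw_multipliers(1e5, "Mammen")
# Sample moments should be close to the targets of 0 and 1
c(mean = mean(v), var = var(v))
```

In a multiplier bootstrap, draws like these perturb the summands of the test statistic, so the model does not have to be re-estimated in each replicate.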
The tests are based on the integrated conditional moment approach, where the weight function used is based on an orthogonal projection onto the tangent space of nuisance parameters. As a result, the tests (a) enjoy improved power properties, (b) do not suffer from the 'curse of dimensionality' when the vector of covariates is high-dimensional, (c) are fully data-driven, (d) do not require tuning parameters such as bandwidths, and (e) are able to detect a broad class of local alternatives converging to the null at the parametric rate. These features make the tests well suited to applied work.
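As a rough illustration of the integrated conditional moment idea, the sketch below computes Kolmogorov-Smirnov and Cramer-von Mises functionals of a residual-marked empirical process with an indicator weight. It is a deliberately simplified version: it omits the projection onto the tangent space of the nuisance parameters, so it is not the statistic that pstest computes. The function name `icm_stats()` is hypothetical.

```r
# Simplified illustration only (hypothetical helper, not the pstest statistic):
# no projection is applied, so estimation effects are ignored.
icm_stats <- function(d, pscore) {
  n <- length(d)
  e <- d - pscore                       # residuals of the propensity score model
  # ind[i, j] = 1 if pscore[i] <= pscore[j]
  ind <- outer(pscore, pscore, "<=")
  # Process evaluated at u = pscore[j]: n^(-1/2) * sum_i e[i] * 1{pscore[i] <= u}
  Rn <- colSums(ind * e) / sqrt(n)
  list(ks = max(abs(Rn)),               # Kolmogorov-Smirnov type functional
       cvm = mean(Rn^2))                # Cramer-von Mises type functional
}

set.seed(1234)
x <- rnorm(200)
d <- rbinom(200, 1, plogis(0.5 * x))
fit <- glm(d ~ x, family = binomial(link = "logit"))
stats <- icm_stats(d, fit$fitted.values)
stats
```

Large values of either functional point to systematic patterns in the residuals. The projection used by the package is what makes bootstrap critical values valid when the propensity score parameters are estimated.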
It is worth stressing that this package implements, in a unified manner, a large class of specification tests, depending on the chosen weight function:
‘ind’ - the indicator weight function. This is the default.
‘exp’ - the exponential weight function.
‘logistic’ - the logistic weight function.
‘sin’ - the sine weight function.
‘sincos’ - the sine and cosine weight function.
Different weight functions have different power properties; being able to choose among them therefore gives us flexibility to direct power in desired directions.
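One way to compare the weight functions is to run the test once per label and collect the bootstrap p-values. The sketch below does this on simulated data; the data-generating process, the sample size, and `nboot = 200` are illustrative assumptions, and the call to `pstest()` is skipped if the package is not installed.

```r
# Weight-function labels accepted by the w argument
weights <- c("ind", "exp", "logistic", "sin", "sincos")

# Simulated data with a correctly specified logit model (illustrative only)
set.seed(1234)
n <- 200
x1 <- rnorm(n)
x2 <- runif(n)
d <- rbinom(n, 1, plogis(-0.5 + x1 + x2))
fit <- glm(d ~ x1 + x2, family = binomial(link = "logit"), x = TRUE)

if (requireNamespace("pstest", quietly = TRUE)) {
  # One row per weight function; columns hold the KS and CvM p-values
  pvals <- t(sapply(weights, function(w) {
    out <- pstest::pstest(d = fit$y, pscore = fit$fitted.values,
                          xpscore = fit$x, model = "logit",
                          w = w, dist = "Mammen", nboot = 200)
    c(ks = out$pvks, cvm = out$pvcvm)
  }))
  print(pvals)
}
```

Because the simulated model is correctly specified, none of the tests should reject systematically; differences across rows reflect the different directions in which each weight function has power.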
Sant'Anna, Pedro H. C., and Song, Xiaojun (2016), Specification Tests for the Propensity Score, available at http://sites.google.com/site/pedrohcsantanna/.
pstest computes Kolmogorov-Smirnov and Cramer-von Mises type tests for the null hypothesis that a parametric model for the propensity score is correctly specified. For details of the testing procedure, see Sant'Anna and Song (2016), 'Specification Tests for the Propensity Score'.
pstest(d, pscore, xpscore, pscore.model = NULL, model = "logit", w = "ind", dist = "Mammen", nboot = 1000, cores = 1, chunk = 1000)
d |
a vector containing the binary treatment indicator. |
pscore |
a vector containing the estimated propensity scores. |
xpscore |
a matrix (or data frame) containing the covariates (and their transformations) included in the propensity score estimation. It should also include the constant term. |
pscore.model |
if you set model = "het.probit", pscore.model is the entire hetglm object. Default for pscore.model is NULL. |
model |
a description of the functional form (link function) used to estimate the propensity score. The alternatives are: 'logit' (default), 'probit', and 'het.probit'. |
w |
a description of which weight function the projection is based on. The alternatives are 'ind' (default), 'exp', 'logistic', 'sin', and 'sincos'. |
dist |
a description of which distribution to use during the bootstrap. The alternatives are 'Mammen' (default), and 'Rademacher'. |
nboot |
number of bootstrap replicates to perform. Default is 1,000. |
cores |
number of cores to use during the bootstrap. Default is 1. If cores is greater than 1, the bootstrap is conducted using parLapply instead of an lapply-type call. |
chunk |
a value that determines the size of each 'tile'. This argument is used to split the original data into chunks, saving memory. Default value is 1,000. If the pstest function throws a memory error, you should choose a smaller value for chunk. |
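The first three arguments can be assembled from a fitted glm object. The sketch below, on simulated data with hypothetical variable names, shows one way to do so and checks that the covariate matrix carries the constant term and any transformations used in the model formula.

```r
# Simulated data (variable names and coefficients are illustrative assumptions)
set.seed(42)
dat <- data.frame(x1 = rnorm(150), x2 = runif(150))
dat$treat <- rbinom(150, 1, plogis(-0.3 + 0.8 * dat$x1 + 0.5 * dat$x2))

# Logit propensity score model that includes a transformation of x1
ps_fit <- glm(treat ~ x1 + I(x1^2) + x2, data = dat,
              family = binomial(link = "logit"))

d       <- dat$treat            # binary treatment indicator
pscore  <- fitted(ps_fit)       # estimated propensity scores
xpscore <- model.matrix(ps_fit) # covariates, with "(Intercept)" and I(x1^2)

colnames(xpscore)
```

These objects can then be passed as `pstest(d = d, pscore = pscore, xpscore = xpscore, model = "logit")`. Fitting with `x = TRUE` and using the `x` component of the fit gives the same matrix.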
a list containing the Kolmogorov-Smirnov and Cramer-von Mises test statistics for the null hypothesis of a correctly specified propensity score model (kstest and cvmtest, respectively), and their associated bootstrapped p-values (pvks and pvcvm, respectively). All inputs are also returned.
Sant'Anna, Pedro H. C., and Song, Xiaojun (2019), Specification Tests for the Propensity Score, Journal of Econometrics, vol. 210 (2), p. 379-404.
# Example based on simulated data
library(pstest)
# Simulate vector of covariates
set.seed(1234)
x1 <- runif(100)
x2 <- rt(100, 5)
x3 <- rpois(100, 3)
# Generate treatment status based on a probit specification
treat <- (x1 + x2 + x3 >= rnorm(100, 4, 5))
# Estimate the correctly specified propensity score based on probit;
# x = TRUE keeps the model matrix (including the constant) in the fit
pscore <- stats::glm(treat ~ x1 + x2 + x3, family = binomial(link = "probit"), x = TRUE)
# Test the correct specification of the estimated propensity score, using
# the weight function 'ind', and bootstrap based on 'Mammen'.
pstest(d = pscore$y, pscore = pscore$fitted.values, xpscore = pscore$x,
       model = "probit", w = "ind", dist = "Mammen")
# Alternatively, one can use the 'sin' weight function
pstest(d = pscore$y, pscore = pscore$fitted.values, xpscore = pscore$x,
       model = "probit", w = "sin", dist = "Mammen")