Title: | Doubly Robust Difference-in-Differences Estimators |
---|---|
Description: | Implements the locally efficient doubly robust difference-in-differences (DiD) estimators for the average treatment effect proposed by Sant'Anna and Zhao (2020) <doi:10.1016/j.jeconom.2020.06.003>. The estimator combines inverse probability weighting and outcome regression estimators (also implemented in the package) to form estimators with more attractive statistical properties. Two different estimation methods can be used to estimate the nuisance functions. |
Authors: | Pedro H. C. Sant'Anna [aut, cre, cph], Jun Zhao [aut] |
Maintainer: | Pedro H. C. Sant'Anna <[email protected]> |
License: | GPL-3 |
Version: | 1.2.1 |
Built: | 2024-11-15 06:18:58 UTC |
Source: | https://github.com/pedrohcgs/drdid |
drdid
is used to compute the locally efficient doubly robust estimators for the ATT
in difference-in-differences (DiD) setups. It can be used with panel or stationary repeated cross section data.
Data should be store in "long" format.
drdid( yname, tname, idname, dname, xformla = NULL, data, panel = TRUE, estMethod = c("imp", "trad"), weightsname = NULL, boot = FALSE, boot.type = c("weighted", "multiplier"), nboot = 999, inffunc = FALSE )
drdid( yname, tname, idname, dname, xformla = NULL, data, panel = TRUE, estMethod = c("imp", "trad"), weightsname = NULL, boot = FALSE, boot.type = c("weighted", "multiplier"), nboot = 999, inffunc = FALSE )
yname |
The name of the outcome variable. |
tname |
The name of the column containing the time periods. |
idname |
The name of the column containing the unit id name. |
dname |
The name of the column containing the treatment group (=1 if observation is treated in the post-treatment, =0 otherwise) |
xformla |
A formula for the covariates to include in the model. It should be of the form |
data |
The name of the data.frame that contains the data. |
panel |
Whether or not the data is a panel dataset. The panel dataset should be provided in long format – that
is, where each row corresponds to a unit observed at a particular point in time. The default is TRUE.
When |
estMethod |
the method to estimate the nuisance parameters. The default is "imp" which uses weighted least squares to estimate the outcome regressions and inverse probability tilting to the estimate the the propensity score, leading to the improved locally efficient DR DiD estimator proposed by Sant'Anna and Zhao (2020). The other alternative is "trad", which then uses OLS to estimate outcome regressions and maximum likelihood to estimate propensity score. This leads to the "traditional" locally efficient DR DiD estimator proposed by Sant'Anna and Zhao (2020). |
weightsname |
The name of the column containing the sampling weights. If NULL, then every observation has the same weights. The weights are normalized and therefore enforced to have mean 1 across all observations. |
boot |
Logical argument to whether bootstrap should be used for inference. Default is |
boot.type |
Type of bootstrap to be performed (not relevant if |
nboot |
Number of bootstrap repetitions (not relevant if boot = |
inffunc |
Logical argument to whether influence function should be returned. Default is |
When panel data are available (panel = TRUE
), the drdid
function implements the
locally efficient doubly robust difference-in-differences (DiD) estimator for the average treatment effect
on the treated (ATT) defined in equation (3.1) in Sant'Anna and Zhao (2020). This estimator makes use of
a logistic propensity score model for the probability of being in the treated group,
and of a linear regression model for the outcome evolution among the comparison units.
When only stationary repeated cross-section data are available (panel = FALSE
), the drdid
function
implements the locally efficient doubly robust difference-in-differences (DiD) estimator for the
average treatment effect on the treated (ATT) defined in equation (3.4) in Sant'Anna and Zhao (2020).
This estimator makes use of a logistic propensity score model for the probability of being in the
treated group, and of (separate) linear regression models for the outcome of both treated and comparison units,
in both pre and post-treatment periods.
When one sets estMethod = "imp"
(the default), the nuisance parameters (propensity score and
outcome regression parameters) are estimated using the methods described in Sections 3.1 and 3.2 of
Sant'Anna and Zhao (2020). In short, the propensity score parameters are estimated using the inverse
probability tilting estimator proposed by Graham, Pinto and Pinto (2012), and the outcome
regression coefficients are estimated using weighted least squares,where the weights depend on
the propensity score estimates; see Sant'Anna and Zhao (2020) for details.
When one sets estMethod = "trad"
, the propensity score parameters are estimated using maximum
likelihood, and the outcome regression coefficients are estimated using ordinary least squares.
The main advantage of using estMethod = "imp"
is that the resulting estimator is not only
locally efficient and doubly robust for the ATT, but it is also doubly robust for inference;
see Sant'Anna and Zhao (2020) for details.
A list containing the following components:
ATT |
The DR DiD point estimate |
se |
The DR DiD standard error |
uci |
Estimate of the upper bound of a 95% CI for the ATT |
lci |
Estimate of the lower bound of a 95% CI for the ATT |
boots |
All Bootstrap draws of the ATT, in case bootstrap was used to conduct inference. Default is NULL |
att.inf.func |
Estimate of the influence function. Default is NULL |
ps.flag |
Convergence Flag for the propensity score estimation
(only active if |
call.param |
The matched call. |
argu |
Some arguments used in the call (panel, estMethod, boot, boot.type, nboot, type="dr") |
Graham, Bryan, Pinto, Cristine, and Egel, Daniel (2012), "Inverse Probability Tilting for Moment Condition Models with Missing Data." Review of Economic Studies, vol. 79 (3), pp. 1053-1079, doi:10.1093/restud/rdr047
Sant'Anna, Pedro H. C. and Zhao, Jun. (2020), "Doubly Robust Difference-in-Differences Estimators." Journal of Econometrics, Vol. 219 (1), pp. 101-122, doi:10.1016/j.jeconom.2020.06.003
# ----------------------------------------------- # Panel data case # ----------------------------------------------- # Form the Lalonde sample with CPS comparison group eval_lalonde_cps <- subset(nsw_long, nsw_long$treated == 0 | nsw_long$sample == 2) # Further reduce sample to speed example set.seed(123) unit_random <- sample(unique(eval_lalonde_cps$id), 5000) eval_lalonde_cps <- eval_lalonde_cps[eval_lalonde_cps$id %in% unit_random,] # ----------------------------------------------- # Implement improved DR locally efficient DiD with panel data drdid(yname="re", tname = "year", idname = "id", dname = "experimental", xformla= ~ age+ educ+ black+ married+ nodegree+ hisp+ re74, data = eval_lalonde_cps, panel = TRUE) #Implement "traditional" DR locally efficient DiD with panel data drdid(yname="re", tname = "year", idname = "id", dname = "experimental", xformla= ~ age+ educ+ black+ married+ nodegree+ hisp+ re74, data = eval_lalonde_cps, panel = TRUE, estMethod = "trad") # ----------------------------------------------- # Repeated cross section case # ----------------------------------------------- # use the simulated data provided in the package #Implement "improved" DR locally efficient DiD with repeated cross-section data drdid(yname="y", tname = "post", idname = "id", dname = "d", xformla= ~ x1 + x2 + x3 + x4, data = sim_rc, panel = FALSE, estMethod = "imp") #Implement "traditional" DR locally efficient DiD with repeated cross-section data drdid(yname="y", tname = "post", idname = "id", dname = "d", xformla= ~ x1 + x2 + x3 + x4, data = sim_rc, panel = FALSE, estMethod = "trad")
# ----------------------------------------------- # Panel data case # ----------------------------------------------- # Form the Lalonde sample with CPS comparison group eval_lalonde_cps <- subset(nsw_long, nsw_long$treated == 0 | nsw_long$sample == 2) # Further reduce sample to speed example set.seed(123) unit_random <- sample(unique(eval_lalonde_cps$id), 5000) eval_lalonde_cps <- eval_lalonde_cps[eval_lalonde_cps$id %in% unit_random,] # ----------------------------------------------- # Implement improved DR locally efficient DiD with panel data drdid(yname="re", tname = "year", idname = "id", dname = "experimental", xformla= ~ age+ educ+ black+ married+ nodegree+ hisp+ re74, data = eval_lalonde_cps, panel = TRUE) #Implement "traditional" DR locally efficient DiD with panel data drdid(yname="re", tname = "year", idname = "id", dname = "experimental", xformla= ~ age+ educ+ black+ married+ nodegree+ hisp+ re74, data = eval_lalonde_cps, panel = TRUE, estMethod = "trad") # ----------------------------------------------- # Repeated cross section case # ----------------------------------------------- # use the simulated data provided in the package #Implement "improved" DR locally efficient DiD with repeated cross-section data drdid(yname="y", tname = "post", idname = "id", dname = "d", xformla= ~ x1 + x2 + x3 + x4, data = sim_rc, panel = FALSE, estMethod = "imp") #Implement "traditional" DR locally efficient DiD with repeated cross-section data drdid(yname="y", tname = "post", idname = "id", dname = "d", xformla= ~ x1 + x2 + x3 + x4, data = sim_rc, panel = FALSE, estMethod = "trad")
drdid_imp_panel
is used to compute the locally efficient doubly robust estimators for the ATT
in difference-in-differences (DiD) setups with panel data. The resulting estimator is also doubly robust
for inference; see Section 3.1 of Sant'Anna and Zhao (2020).
drdid_imp_panel( y1, y0, D, covariates, i.weights = NULL, boot = FALSE, boot.type = "weighted", nboot = NULL, inffunc = FALSE )
drdid_imp_panel( y1, y0, D, covariates, i.weights = NULL, boot = FALSE, boot.type = "weighted", nboot = NULL, inffunc = FALSE )
y1 |
An |
y0 |
An |
D |
An |
covariates |
An |
i.weights |
An |
boot |
Logical argument to whether bootstrap should be used for inference. Default is FALSE. |
boot.type |
Type of bootstrap to be performed (not relevant if |
nboot |
Number of bootstrap repetitions (not relevant if |
inffunc |
Logical argument to whether influence function should be returned. Default is FALSE. |
The drdid_imp_panel
function implements the locally efficient doubly robust difference-in-differences (DiD)
estimator for the average treatment effect on the treated (ATT) defined in equation (3.1)
in Sant'Anna and Zhao (2020). This estimator makes use of a logistic propensity score model for the probability
of being in the treated group, and of a linear regression model for the outcome evolution among the comparison units.
The nuisance parameters (propensity score and outcome regression parameters) are estimated using the methods described in Sections 3.1 of Sant'Anna and Zhao (2020). In short, the propensity score parameters are estimated using the inverse probability tilting estimator proposed by Graham, Pinto and Pinto (2012), and the outcome regression coefficients are estimated using weighted least squares,where the weights depend on the propensity score estimates; see Sant'Anna and Zhao (2020) for details.
The resulting estimator is not only locally efficient and doubly robust for the ATT, but it is also doubly robust for inference; see Sant'Anna and Zhao (2020) for details.
A list containing the following components:
ATT |
The DiD point estimate. |
se |
The DiD standard error. |
uci |
The upper bound of the 95% CI for the ATT. |
lci |
The lower bound of the 95% CI for the ATT |
boots |
All Bootstrap draws of the ATT, in case bootstrap was used to conduct inference. Default is |
ps.flag |
Convergence Flag for the propensity score estimation: =0 if |
att.inf.func |
Estimate of the influence function. Default is NULL |
call.param |
The matched call. |
argu |
Some arguments used (explicitly or not) in the call (panel = TRUE, estMethod = "imp", boot, boot.type, nboot, type="dr") |
Graham, Bryan, Pinto, Cristine, and Egel, Daniel (2012), "Inverse Probability Tilting for Moment Condition Models with Missing Data." Review of Economic Studies, vol. 79 (3), pp. 1053-1079, doi:10.1093/restud/rdr047
Sant'Anna, Pedro H. C. and Zhao, Jun. (2020), "Doubly Robust Difference-in-Differences Estimators." Journal of Econometrics, Vol. 219 (1), pp. 101-122, doi:10.1016/j.jeconom.2020.06.003
# Form the Lalonde sample with CPS comparison group eval_lalonde_cps <- subset(nsw, nsw$treated == 0 | nsw$sample == 2) # Further reduce sample to speed example set.seed(123) unit_random <- sample(1:nrow(eval_lalonde_cps), 5000) eval_lalonde_cps <- eval_lalonde_cps[unit_random,] # Select some covariates covX = as.matrix(cbind(1, eval_lalonde_cps$age, eval_lalonde_cps$educ, eval_lalonde_cps$black, eval_lalonde_cps$married, eval_lalonde_cps$nodegree, eval_lalonde_cps$hisp, eval_lalonde_cps$re74)) # Implement improved DR locally efficient DiD with panel data drdid_imp_panel(y1 = eval_lalonde_cps$re78, y0 = eval_lalonde_cps$re75, D = eval_lalonde_cps$experimental, covariates = covX)
# Form the Lalonde sample with CPS comparison group eval_lalonde_cps <- subset(nsw, nsw$treated == 0 | nsw$sample == 2) # Further reduce sample to speed example set.seed(123) unit_random <- sample(1:nrow(eval_lalonde_cps), 5000) eval_lalonde_cps <- eval_lalonde_cps[unit_random,] # Select some covariates covX = as.matrix(cbind(1, eval_lalonde_cps$age, eval_lalonde_cps$educ, eval_lalonde_cps$black, eval_lalonde_cps$married, eval_lalonde_cps$nodegree, eval_lalonde_cps$hisp, eval_lalonde_cps$re74)) # Implement improved DR locally efficient DiD with panel data drdid_imp_panel(y1 = eval_lalonde_cps$re78, y0 = eval_lalonde_cps$re75, D = eval_lalonde_cps$experimental, covariates = covX)
drdid_imp_rc
is used to compute the locally efficient doubly robust estimators for the ATT
in difference-in-differences (DiD) setups with stationary repeated cross-sectional data. The resulting estimator is
also doubly robust for inference; see Section 3.2 of Sant'Anna and Zhao (2020).
drdid_imp_rc( y, post, D, covariates, i.weights = NULL, boot = FALSE, boot.type = "weighted", nboot = NULL, inffunc = FALSE )
drdid_imp_rc( y, post, D, covariates, i.weights = NULL, boot = FALSE, boot.type = "weighted", nboot = NULL, inffunc = FALSE )
y |
An |
post |
An |
D |
An |
covariates |
An |
i.weights |
An |
boot |
Logical argument to whether bootstrap should be used for inference. Default is FALSE. |
boot.type |
Type of bootstrap to be performed (not relevant if |
nboot |
Number of bootstrap repetitions (not relevant if |
inffunc |
Logical argument to whether influence function should be returned. Default is FALSE. |
The drdid_imp_rc
function implements the locally efficient doubly robust difference-in-differences (DiD)
estimator for the average treatment effect on the treated (ATT) defined in equation (3.4)
in Sant'Anna and Zhao (2020). This estimator makes use of a logistic propensity score model for the probability
of being in the treated group, and of (separate) linear regression models for the outcome of both
treated and comparison units, in both pre and post-treatment periods.
The nuisance parameters (propensity score and outcome regression parameters) are estimated using the methods described in Sections 3.2 of Sant'Anna and Zhao (2020). In short, the propensity score parameters are estimated using the inverse probability tilting estimator proposed by Graham, Pinto and Pinto (2012), and the outcome regression coefficients are estimated using weighted least squares,where the weights depend on the propensity score estimates; see Sant'Anna and Zhao (2020) for details.
The resulting estimator is not only locally efficient and doubly robust for the ATT, but it is also doubly robust for inference; see Sant'Anna and Zhao (2020) for details.
A list containing the following components:
ATT |
The DR DiD point estimate |
se |
The DR DiD standard error |
uci |
Estimate of the upper bound of a 95% CI for the ATT |
lci |
Estimate of the lower bound of a 95% CI for the ATT |
boots |
All Bootstrap draws of the ATT, in case bootstrap was used to conduct inference. Default is NULL |
ps.flag |
Convergence Flag for the propensity score estimation: =0 if |
att.inf.func |
Estimate of the influence function. Default is NULL |
call.param |
The matched call. |
argu |
Some arguments used (explicitly or not) in the call (panel = FALSE, estMethod = "imp", boot, boot.type, nboot, type="dr") |
Graham, Bryan, Pinto, Cristine, and Egel, Daniel (2012), "Inverse Probability Tilting for Moment Condition Models with Missing Data." Review of Economic Studies, vol. 79 (3), pp. 1053-1079, doi:10.1093/restud/rdr047
Sant'Anna, Pedro H. C. and Zhao, Jun. (2020), "Doubly Robust Difference-in-Differences Estimators." Journal of Econometrics, Vol. 219 (1), pp. 101-122, doi:10.1016/j.jeconom.2020.06.003
# use the simulated data covX = as.matrix(cbind(1, sim_rc[,5:8])) # Implement the improved, locally efficient DR DiD estimator drdid_imp_rc(y = sim_rc$y, post = sim_rc$post, D = sim_rc$d, covariates= covX)
# use the simulated data covX = as.matrix(cbind(1, sim_rc[,5:8])) # Implement the improved, locally efficient DR DiD estimator drdid_imp_rc(y = sim_rc$y, post = sim_rc$post, D = sim_rc$d, covariates= covX)
drdid_imp_rc1
is used to compute the doubly robust estimators for the ATT
in difference-in-differences (DiD) setups with stationary repeated cross-sectional data. The resulting estimator is also doubly robust for inference,
though it is not locally efficient; see Section 3.2 of Sant'Anna and Zhao (2020).
drdid_imp_rc1( y, post, D, covariates, i.weights = NULL, boot = FALSE, boot.type = "weighted", nboot = NULL, inffunc = FALSE )
drdid_imp_rc1( y, post, D, covariates, i.weights = NULL, boot = FALSE, boot.type = "weighted", nboot = NULL, inffunc = FALSE )
y |
An |
post |
An |
D |
An |
covariates |
An |
i.weights |
An |
boot |
Logical argument to whether bootstrap should be used for inference. Default is FALSE. |
boot.type |
Type of bootstrap to be performed (not relevant if |
nboot |
Number of bootstrap repetitions (not relevant if |
inffunc |
Logical argument to whether influence function should be returned. Default is FALSE. |
The drdid_imp_rc1
function implements the doubly robust difference-in-differences (DiD)
estimator for the average treatment effect on the treated (ATT) defined in equation (3.3)
in Sant'Anna and Zhao (2020). This estimator makes use of a logistic propensity score model for the probability
of being in the treated group, and of (separate) linear regression models for the outcome among the comparison units in both pre and
post-treatment time periods. Importantly, this estimator is not locally efficient for the ATT.
The nuisance parameters (propensity score and outcome regression parameters) are estimated using the methods described in Sections 3.2 of Sant'Anna and Zhao (2020). In short, the propensity score parameters are estimated using the inverse probability tilting estimator proposed by Graham, Pinto and Pinto (2012), and the outcome regression coefficients are estimated using weighted least squares,where the weights depend on the propensity score estimates; see Sant'Anna and Zhao (2020) for details.
The resulting estimator is not only doubly robust for the ATT, but it is also doubly robust for inference. However, we stress that it is not locally efficient; see Sant'Anna and Zhao (2020) for details.
A list containing the following components:
ATT |
The DR DiD point estimate |
se |
The DR DiD standard error |
uci |
Estimate of the upper bound of a 95% CI for the ATT |
lci |
Estimate of the lower bound of a 95% CI for the ATT |
boots |
All Bootstrap draws of the ATT, in case bootstrap was used to conduct inference. Default is NULL |
ps.flag |
Convergence Flag for the propensity score estimation: =0 if |
att.inf.func |
Estimate of the influence function. Default is NULL |
call.param |
The matched call. |
argu |
Some arguments used (explicitly or not) in the call (panel = FALSE, estMethod = "imp2", boot, boot.type, nboot, type="dr") |
Graham, Bryan, Pinto, Cristine, and Egel, Daniel (2012), "Inverse Probability Tilting for Moment Condition Models with Missing Data." Review of Economic Studies, vol. 79 (3), pp. 1053-1079, doi:10.1093/restud/rdr047
Sant'Anna, Pedro H. C. and Zhao, Jun. (2020), "Doubly Robust Difference-in-Differences Estimators." Journal of Econometrics, Vol. 219 (1), pp. 101-122, doi:10.1016/j.jeconom.2020.06.003
# use the simulated data provided in the package covX = as.matrix(cbind(1, sim_rc[,5:8])) # Implement the improved DR DiD estimator (but not locally efficient!) drdid_imp_rc1(y = sim_rc$y, post = sim_rc$post, D = sim_rc$d, covariates= covX)
# use the simulated data provided in the package covX = as.matrix(cbind(1, sim_rc[,5:8])) # Implement the improved DR DiD estimator (but not locally efficient!) drdid_imp_rc1(y = sim_rc$y, post = sim_rc$post, D = sim_rc$d, covariates= covX)
drdid_panel
is used to compute the locally efficient doubly robust estimators for the ATT
in difference-in-differences (DiD) setups with panel data.
drdid_panel( y1, y0, D, covariates, i.weights = NULL, boot = FALSE, boot.type = "weighted", nboot = NULL, inffunc = FALSE )
drdid_panel( y1, y0, D, covariates, i.weights = NULL, boot = FALSE, boot.type = "weighted", nboot = NULL, inffunc = FALSE )
y1 |
An |
y0 |
An |
D |
An |
covariates |
An |
i.weights |
An |
boot |
Logical argument to whether bootstrap should be used for inference. Default is FALSE. |
boot.type |
Type of bootstrap to be performed (not relevant if |
nboot |
Number of bootstrap repetitions (not relevant if |
inffunc |
Logical argument to whether influence function should be returned. Default is FALSE. |
The drdid_panel
function implements the locally efficient doubly robust difference-in-differences (DiD)
estimator for the average treatment effect on the treated (ATT) defined in equation (3.1)
in Sant'Anna and Zhao (2020). This estimator makes use of a logistic propensity score model for the probability
of being in the treated group, and of a linear regression model for the outcome evolution among the comparison units.
The propensity score parameters are estimated using maximum likelihood, and the outcome regression coefficients are estimated using ordinary least squares.
A list containing the following components:
ATT |
The DR DiD point estimate. |
se |
The DR DiD standard error. |
uci |
Estimate of the upper bound of a 95% CI for the ATT. |
lci |
Estimate of the lower bound of a 95% CI for the ATT. |
boots |
All Bootstrap draws of the ATT, in case bootstrap was used to conduct inference. Default is NULL. |
att.inf.func |
Estimate of the influence function. Default is NULL. |
call.param |
The matched call. |
argu |
Some arguments used (explicitly or not) in the call (panel = TRUE, estMethod = "trad", boot, boot.type, nboot, type="dr") |
Sant'Anna, Pedro H. C. and Zhao, Jun. (2020), "Doubly Robust Difference-in-Differences Estimators." Journal of Econometrics, Vol. 219 (1), pp. 101-122, doi:10.1016/j.jeconom.2020.06.003
# Form the Lalonde sample with CPS comparison group (data in wide format) eval_lalonde_cps <- subset(nsw, nsw$treated == 0 | nsw$sample == 2) # Further reduce sample to speed example set.seed(123) unit_random <- sample(1:nrow(eval_lalonde_cps), 5000) eval_lalonde_cps <- eval_lalonde_cps[unit_random,] # Select some covariates covX = as.matrix(cbind(1, eval_lalonde_cps$age, eval_lalonde_cps$educ, eval_lalonde_cps$black, eval_lalonde_cps$married, eval_lalonde_cps$nodegree, eval_lalonde_cps$hisp, eval_lalonde_cps$re74)) # Implement traditional DR locally efficient DiD with panel data drdid_panel(y1 = eval_lalonde_cps$re78, y0 = eval_lalonde_cps$re75, D = eval_lalonde_cps$experimental, covariates = covX)
# Form the Lalonde sample with CPS comparison group (data in wide format) eval_lalonde_cps <- subset(nsw, nsw$treated == 0 | nsw$sample == 2) # Further reduce sample to speed example set.seed(123) unit_random <- sample(1:nrow(eval_lalonde_cps), 5000) eval_lalonde_cps <- eval_lalonde_cps[unit_random,] # Select some covariates covX = as.matrix(cbind(1, eval_lalonde_cps$age, eval_lalonde_cps$educ, eval_lalonde_cps$black, eval_lalonde_cps$married, eval_lalonde_cps$nodegree, eval_lalonde_cps$hisp, eval_lalonde_cps$re74)) # Implement traditional DR locally efficient DiD with panel data drdid_panel(y1 = eval_lalonde_cps$re78, y0 = eval_lalonde_cps$re75, D = eval_lalonde_cps$experimental, covariates = covX)
drdid_rc
is used to compute the locally efficient doubly robust estimators for the ATT
in difference-in-differences (DiD) setups with stationary repeated cross-sectional data.
drdid_rc( y, post, D, covariates, i.weights = NULL, boot = FALSE, boot.type = "weighted", nboot = NULL, inffunc = FALSE )
drdid_rc( y, post, D, covariates, i.weights = NULL, boot = FALSE, boot.type = "weighted", nboot = NULL, inffunc = FALSE )
y |
An |
post |
An |
D |
An |
covariates |
An |
i.weights |
An |
boot |
Logical argument to whether bootstrap should be used for inference. Default is FALSE. |
boot.type |
Type of bootstrap to be performed (not relevant if |
nboot |
Number of bootstrap repetitions (not relevant if |
inffunc |
Logical argument to whether influence function should be returned. Default is FALSE. |
The drdid_rc
function implements the locally efficient doubly robust difference-in-differences (DiD)
estimator for the average treatment effect on the treated (ATT) defined in equation (3.4)
in Sant'Anna and Zhao (2020). This estimator makes use of a logistic propensity score model for the probability
of being in the treated group, and of (separate) linear regression models for the outcome of both treated and comparison units,
in both pre and post-treatment periods.
The propensity score parameters are estimated using maximum likelihood, and the outcome regression coefficients are estimated using ordinary least squares; see Sant'Anna and Zhao (2020) for details.
A list containing the following components:
ATT |
The DR DiD point estimate |
se |
The DR DiD standard error |
uci |
Estimate of the upper bound of a 95% CI for the ATT |
lci |
Estimate of the lower bound of a 95% CI for the ATT |
boots |
All Bootstrap draws of the ATT, in case bootstrap was used to conduct inference. Default is NULL |
att.inf.func |
Estimate of the influence function. Default is NULL |
call.param |
The matched call. |
argu |
Some arguments used (explicitly or not) in the call (panel = TRUE, estMethod = "trad", boot, boot.type, nboot, type="dr") |
Sant'Anna, Pedro H. C. and Zhao, Jun. (2020), "Doubly Robust Difference-in-Differences Estimators." Journal of Econometrics, Vol. 219 (1), pp. 101-122, doi:10.1016/j.jeconom.2020.06.003
# use the simulated data provided in the package covX = as.matrix(cbind(1, sim_rc[,5:8])) # Implement the 'traditional' locally efficient DR DiD estimator drdid_rc(y = sim_rc$y, post = sim_rc$post, D = sim_rc$d, covariates= covX)
# use the simulated data provided in the package covX = as.matrix(cbind(1, sim_rc[,5:8])) # Implement the 'traditional' locally efficient DR DiD estimator drdid_rc(y = sim_rc$y, post = sim_rc$post, D = sim_rc$d, covariates= covX)
drdid_rc1
is used to compute the doubly robust estimators for the ATT
in difference-in-differences (DiD) setups with stationary repeated cross-sectional data. The resulting estimator is
not locally efficient; see Section 3.2 of Sant'Anna and Zhao (2020).
drdid_rc1( y, post, D, covariates, i.weights = NULL, boot = FALSE, boot.type = "weighted", nboot = NULL, inffunc = FALSE )
drdid_rc1( y, post, D, covariates, i.weights = NULL, boot = FALSE, boot.type = "weighted", nboot = NULL, inffunc = FALSE )
y |
An |
post |
An |
D |
An |
covariates |
An |
i.weights |
An |
boot |
Logical argument to whether bootstrap should be used for inference. Default is FALSE. |
boot.type |
Type of bootstrap to be performed (not relevant if |
nboot |
Number of bootstrap repetitions (not relevant if |
inffunc |
Logical argument to whether influence function should be returned. Default is FALSE. |
The drdid_rc1
function implements the doubly robust difference-in-differences (DiD)
estimator for the average treatment effect on the treated (ATT) defined in equation (3.3)
in Sant'Anna and Zhao (2020). This estimator makes use of a logistic propensity score model for the probability
of being in the treated group, and of (separate) linear regression models for the outcome among the comparison units in both pre and
post-treatment time periods. Importantly, this estimator is not locally efficient for the ATT.
The propensity score parameters are estimated using maximum likelihood, and the outcome regression coefficients are estimated using ordinary least squares.
The resulting estimator is not not locally efficient; see Sant'Anna and Zhao (2020) for details.
A list containing the following components:
ATT |
The DR DiD point estimate |
se |
The DR DiD standard error |
uci |
Estimate of the upper bound of a 95% CI for the ATT |
lci |
Estimate of the lower bound of a 95% CI for the ATT |
boots |
All Bootstrap draws of the ATT, in case bootstrap was used to conduct inference. Default is NULL |
att.inf.func |
Estimate of the influence function. Default is NULL |
call.param |
The matched call. |
argu |
Some arguments used (explicitly or not) in the call (panel = FALSE, estMethod = "trad2", boot, boot.type, nboot, type="dr") |
Sant'Anna, Pedro H. C. and Zhao, Jun. (2020), "Doubly Robust Difference-in-Differences Estimators." Journal of Econometrics, Vol. 219 (1), pp. 101-122, doi:10.1016/j.jeconom.2020.06.003
# use the simulated data provided in the package covX = as.matrix(cbind( 1, sim_rc[,5:8])) # Implement the 'traditional' DR DiD estimator (not locally efficient!) drdid_rc1(y = sim_rc$y, post = sim_rc$post, D = sim_rc$d, covariates= covX)
# use the simulated data provided in the package covX = as.matrix(cbind( 1, sim_rc[,5:8])) # Implement the 'traditional' DR DiD estimator (not locally efficient!) drdid_rc1(y = sim_rc$y, post = sim_rc$post, D = sim_rc$d, covariates= covX)
ipw_did_panel
is used to compute inverse probability weighted (IPW) estimators for the ATT
in difference-in-differences (DiD) setups with panel data. IPW weights are not normalized to sum up to one,
that is, the estimator is of the Horwitz-Thompson type.
ipw_did_panel( y1, y0, D, covariates, i.weights = NULL, boot = FALSE, boot.type = "weighted", nboot = NULL, inffunc = FALSE )
ipw_did_panel( y1, y0, D, covariates, i.weights = NULL, boot = FALSE, boot.type = "weighted", nboot = NULL, inffunc = FALSE )
y1 |
An |
y0 |
An |
D |
An |
covariates |
An |
i.weights |
An |
boot |
Logical argument to whether bootstrap should be used for inference. Default is FALSE. |
boot.type |
Type of bootstrap to be performed (not relevant if |
nboot |
Number of bootstrap repetitions (not relevant if |
inffunc |
Logical argument to whether influence function should be returned. Default is FALSE. |
A list containing the following components:
ATT |
The IPW DiD point estimate. |
se |
The IPW DiD standard error |
uci |
Estimate of the upper bound of a 95% CI for the ATT |
lci |
Estimate of the lower bound of a 95% CI for the ATT |
boots |
All Bootstrap draws of the ATT, in case bootstrap was used to conduct inference. Default is NULL |
att.inf.func |
Estimate of the influence function. Default is NULL |
call.param |
The matched call. |
argu |
Some arguments used (explicitly or not) in the call (panel = TRUE, normalized = FALSE, boot, boot.type, nboot, type="ipw") |
Abadie, Alberto (2005), "Semiparametric Difference-in-Differences Estimators", Review of Economic Studies, vol. 72(1), p. 1-19, doi:10.1111/0034-6527.00321
Sant'Anna, Pedro H. C. and Zhao, Jun. (2020), "Doubly Robust Difference-in-Differences Estimators." Journal of Econometrics, Vol. 219 (1), pp. 101-122, doi:10.1016/j.jeconom.2020.06.003
# Form the Lalonde sample with CPS comparison group eval_lalonde_cps <- subset(nsw, nsw$treated == 0 | nsw$sample == 2) # Further reduce sample to speed example set.seed(123) unit_random <- sample(1:nrow(eval_lalonde_cps), 5000) eval_lalonde_cps <- eval_lalonde_cps[unit_random,] # Select some covariates covX = as.matrix(cbind(1, eval_lalonde_cps$age, eval_lalonde_cps$educ, eval_lalonde_cps$black, eval_lalonde_cps$married, eval_lalonde_cps$nodegree, eval_lalonde_cps$hisp, eval_lalonde_cps$re74)) # Implement (unnormalized) IPW DiD with panel data ipw_did_panel(y1 = eval_lalonde_cps$re78, y0 = eval_lalonde_cps$re75, D = eval_lalonde_cps$experimental, covariates = covX)
# Form the Lalonde sample with CPS comparison group eval_lalonde_cps <- subset(nsw, nsw$treated == 0 | nsw$sample == 2) # Further reduce sample to speed example set.seed(123) unit_random <- sample(1:nrow(eval_lalonde_cps), 5000) eval_lalonde_cps <- eval_lalonde_cps[unit_random,] # Select some covariates covX = as.matrix(cbind(1, eval_lalonde_cps$age, eval_lalonde_cps$educ, eval_lalonde_cps$black, eval_lalonde_cps$married, eval_lalonde_cps$nodegree, eval_lalonde_cps$hisp, eval_lalonde_cps$re74)) # Implement (unnormalized) IPW DiD with panel data ipw_did_panel(y1 = eval_lalonde_cps$re78, y0 = eval_lalonde_cps$re75, D = eval_lalonde_cps$experimental, covariates = covX)
ipw_did_rc
is used to compute inverse probability weighted (IPW) estimators for the ATT
in difference-in-differences (DiD) setups with stationary cross-sectional data. IPW weights are not normalized
to sum up to one, that is, the estimator is of the Horwitz-Thompson type.
ipw_did_rc( y, post, D, covariates, i.weights = NULL, boot = FALSE, boot.type = "weighted", nboot = NULL, inffunc = FALSE )
ipw_did_rc( y, post, D, covariates, i.weights = NULL, boot = FALSE, boot.type = "weighted", nboot = NULL, inffunc = FALSE )
y |
An |
post |
An |
D |
An |
covariates |
An |
i.weights |
An |
boot |
Logical argument to whether bootstrap should be used for inference. Default is FALSE. |
boot.type |
Type of bootstrap to be performed (not relevant if |
nboot |
Number of bootstrap repetitions (not relevant if |
inffunc |
Logical argument to whether influence function should be returned. Default is FALSE. |
A list containing the following components:
ATT |
The IPW DiD point estimate. |
se |
The IPW DiD standard error |
uci |
Estimate of the upper bound of a 95% CI for the ATT |
lci |
Estimate of the lower bound of a 95% CI for the ATT |
boots |
All Bootstrap draws of the ATT, in case bootstrap was used to conduct inference. Default is NULL |
att.inf.func |
Estimate of the influence function. Default is NULL |
call.param |
The matched call. |
argu |
Some arguments used (explicitly or not) in the call (panel = FALSE, normalized = FALSE, boot, boot.type, nboot, type="ipw") |
Abadie, Alberto (2005), "Semiparametric Difference-in-Differences Estimators", Review of Economic Studies, vol. 72(1), p. 1-19, doi:10.1111/0034-6527.00321
Sant'Anna, Pedro H. C. and Zhao, Jun. (2020), "Doubly Robust Difference-in-Differences Estimators." Journal of Econometrics, Vol. 219 (1), pp. 101-122, doi:10.1016/j.jeconom.2020.06.003
# use the simulated data provided in the package covX = as.matrix(cbind(1, sim_rc[,5:8])) # Implement unnormalized IPW DiD estimator ipw_did_rc(y = sim_rc$y, post = sim_rc$post, D = sim_rc$d, covariates= covX)
# use the simulated data provided in the package covX = as.matrix(cbind(1, sim_rc[,5:8])) # Implement unnormalized IPW DiD estimator ipw_did_rc(y = sim_rc$y, post = sim_rc$post, D = sim_rc$d, covariates= covX)
ipwdid
computes the inverse probability weighted estimators for the average treatment effect
on the treated in difference-in-differences (DiD) setups. It can be used with panel or stationary repeated
cross-sectional data, with or without normalized (stabilized) weights. See Abadie (2005) and Sant'Anna and
Zhao (2020) for details.
ipwdid( yname, tname, idname, dname, xformla = NULL, data, panel = TRUE, normalized = TRUE, weightsname = NULL, boot = FALSE, boot.type = c("weighted", "multiplier"), nboot = 999, inffunc = FALSE )
ipwdid( yname, tname, idname, dname, xformla = NULL, data, panel = TRUE, normalized = TRUE, weightsname = NULL, boot = FALSE, boot.type = c("weighted", "multiplier"), nboot = 999, inffunc = FALSE )
yname |
The name of the outcome variable. |
tname |
The name of the column containing the time periods. |
idname |
The name of the column containing the unit id name. |
dname |
The name of the column containing the treatment group (=1 if observation is treated in the post-treatment, =0 otherwise) |
xformla |
A formula for the covariates to include in the model. It should be of the form |
data |
The name of the data.frame that contains the data. |
panel |
Whether or not the data is a panel dataset. The panel dataset should be provided in long format – that
is, where each row corresponds to a unit observed at a particular point in time. The default is TRUE.
When |
normalized |
Logical argument to whether IPW weights should be normalized to sum up to one. Default is |
weightsname |
The name of the column containing the sampling weights. If NULL, then every observation has the same weights. The weights are normalized and therefore enforced to have mean 1 across all observations. |
boot |
Logical argument to whether bootstrap should be used for inference. Default is |
boot.type |
Type of bootstrap to be performed (not relevant if |
nboot |
Number of bootstrap repetitions (not relevant if boot = |
inffunc |
Logical argument to whether influence function should be returned. Default is |
The ipwdid
function implements the
inverse probability weighted (IPW) difference-in-differences (DiD) estimator for the average treatment effect
on the treated (ATT) proposed by Abadie (2005) (normalized = FALSE
) or Hajek-type version
defined in equations (4.1) and (4.2) in Sant'Anna and Zhao (2020), when either panel data or
stationary repeated cross-sectional data are available. This estimator makes use of
a logistic propensity score model for the probability of being in the treated group, and the propensity score
parameters are estimated via maximum likelihood.
A list containing the following components:
ATT |
The IPW DiD point estimate |
se |
The IPW DiD standard error |
uci |
Estimate of the upper bound of a 95% CI for the ATT |
lci |
Estimate of the lower bound of a 95% CI for the ATT |
boots |
All Bootstrap draws of the ATT, in case bootstrap was used to conduct inference. Default is NULL |
att.inf.func |
Estimate of the influence function. Default is NULL |
call.param |
The matched call. |
argu |
Some arguments used in the call (panel, normalized, boot, boot.type, nboot, type=="ipw") |
Abadie, Alberto (2005), "Semiparametric Difference-in-Differences Estimators", Review of Economic Studies, vol. 72(1), p. 1-19, doi:10.1111/0034-6527.00321
Sant'Anna, Pedro H. C. and Zhao, Jun. (2020), "Doubly Robust Difference-in-Differences Estimators." Journal of Econometrics, Vol. 219 (1), pp. 101-122, doi:10.1016/j.jeconom.2020.06.003
# ----------------------------------------------- # Panel data case # ----------------------------------------------- # Form the Lalonde sample with CPS comparison group eval_lalonde_cps <- subset(nsw_long, nsw_long$treated == 0 | nsw_long$sample == 2) # Further reduce sample to speed example set.seed(123) unit_random <- sample(unique(eval_lalonde_cps$id), 5000) eval_lalonde_cps <- eval_lalonde_cps[eval_lalonde_cps$id %in% unit_random,] # Implement IPW DiD with panel data (normalized weights) ipwdid(yname="re", tname = "year", idname = "id", dname = "experimental", xformla= ~ age+ educ+ black+ married+ nodegree+ hisp+ re74, data = eval_lalonde_cps, panel = TRUE) # ----------------------------------------------- # Repeated cross section case # ----------------------------------------------- # use the simulated data provided in the package #Implement IPW DiD with repeated cross-section data (normalized weights) # use Bootstrap to make inference with 199 bootstrap draws (just for illustration) ipwdid(yname="y", tname = "post", idname = "id", dname = "d", xformla= ~ x1 + x2 + x3 + x4, data = sim_rc, panel = FALSE, boot = TRUE, nboot = 199)
# ----------------------------------------------- # Panel data case # ----------------------------------------------- # Form the Lalonde sample with CPS comparison group eval_lalonde_cps <- subset(nsw_long, nsw_long$treated == 0 | nsw_long$sample == 2) # Further reduce sample to speed example set.seed(123) unit_random <- sample(unique(eval_lalonde_cps$id), 5000) eval_lalonde_cps <- eval_lalonde_cps[eval_lalonde_cps$id %in% unit_random,] # Implement IPW DiD with panel data (normalized weights) ipwdid(yname="re", tname = "year", idname = "id", dname = "experimental", xformla= ~ age+ educ+ black+ married+ nodegree+ hisp+ re74, data = eval_lalonde_cps, panel = TRUE) # ----------------------------------------------- # Repeated cross section case # ----------------------------------------------- # use the simulated data provided in the package #Implement IPW DiD with repeated cross-section data (normalized weights) # use Bootstrap to make inference with 199 bootstrap draws (just for illustration) ipwdid(yname="y", tname = "post", idname = "id", dname = "d", xformla= ~ x1 + x2 + x3 + x4, data = sim_rc, panel = FALSE, boot = TRUE, nboot = 199)
nsw
contains all the subsamples of from the
National Supported Work (NSW) Demonstration analyzed used by Smith and Todd (2005)
in their paper "Does matching overcome LaLonde's critique of nonexperimental estimators?".
nsw
nsw
A data frame in "wide" format with 19204 observations on the following and 14 variables:
an indicator variable for treatment status. Missing if not part of the NSW experimental sample
age in years.
years of schooling.
indicator variable for blacks.
indicator variable for martial status.
indicator variable for high school diploma.
indicator variable for inclusion in Dehejia and Wahba sample. Missing if not part of the experimental sample
real earnings in 1974 (pre-treatment).
real earnings in 1975 (pre-treatment).
real earnings in 1978 (post-treatment).
indicator variable for Hispanics.
indicator variable for inclusion in the early random assignment sample in Smith and Todd (2005). Missing if not part of the experimental sample
1 if NSW (experimental sample), 2 if CPS comparison group, 3 if PSID comparison group.
1 if in experimental sample, 0 otherwise.
https://dataverse.harvard.edu/file.xhtml?persistentId=doi:10.7910/DVN/23407/DYEWLO&version=1.0.
Diamond, Alexis, and Sekhon, Jasjeet S. (2013), 'Genetic Matching for Estimating Causal Effects: A General Multivariate Matching Method for Achieving Balance in Observational Studies' Review of Economics and Statistics, vol. 95 , pp. 932-945, doi:10.1162/REST_a_00318
Smith, Jeffrey, and Todd, Petra (2005), Does matching overcome LaLonde's critique of nonexperimental estimators?' Journal of Econometrics, vol. 125, pp. 305-353, doi:10.1016/j.jeconom.2004.04.011
nsw_long
is the same dataset as nsw
but in a long format.
nsw_long
nsw_long
A data frame in "long" format with 38408 observations on the following and 15 variables:
unique identifier for each cross-sectional unit (worker).
year. 1975 is the pre-treatment and 1978 is the post-treatment
an indicator variable for treatment status. Missing if not part of the NSW experimental sample.
age in years.
years of schooling.
indicator variable for blacks.
indicator variable for martial status.
indicator variable for high school diploma.
indicator variable for inclusion in Dehejia and Wahba sample. Missing if not part of the experimental sample
real earnings in 1974 (pre-treatment).
indicator variable for Hispanics.
indicator variable for inclusion in the early random assignment sample in Smith and Todd (2005). Missing if not part of the experimental sample
1 if NSW (experimental sample), 2 if CPS comparison group, 3 if PSID comparison group.
real earnings (outcome of interest).
1 if in experimental sample, 0 otherwise.
https://dataverse.harvard.edu/file.xhtml?persistentId=doi:10.7910/DVN/23407/DYEWLO&version=1.0.
Diamond, Alexis, and Sekhon, Jasjeet S. (2013), 'Genetic Matching for Estimating Causal Effects: A General Multivariate Matching Method for Achieving Balance in Observational Studies' Review of Economics and Statistics, vol. 95 , pp. 932-945, doi:10.1162/REST_a_00318
Smith, Jeffrey, and Todd, Petra (2005), Does matching overcome LaLonde's critique of nonexperimental estimators?' Journal of Econometrics, vol. 125, pp. 305-353, doi:10.1016/j.jeconom.2004.04.011
ordid
computes the outcome regressions estimators for the average treatment effect on the
treated in difference-in-differences (DiD) setups. It can be used with panel or repeated cross section data.
See Sant'Anna and Zhao (2020) for details.
ordid( yname, tname, idname, dname, xformla = NULL, data, panel = TRUE, weightsname = NULL, boot = FALSE, boot.type = c("weighted", "multiplier"), nboot = 999, inffunc = FALSE )
ordid( yname, tname, idname, dname, xformla = NULL, data, panel = TRUE, weightsname = NULL, boot = FALSE, boot.type = c("weighted", "multiplier"), nboot = 999, inffunc = FALSE )
yname |
The name of the outcome variable. |
tname |
The name of the column containing the time periods. |
idname |
The name of the column containing the unit id name. |
dname |
The name of the column containing the treatment group (=1 if observation is treated in the post-treatment, =0 otherwise) |
xformla |
A formula for the covariates to include in the model. It should be of the form |
data |
The name of the data.frame that contains the data. |
panel |
Whether or not the data is a panel dataset. The panel dataset should be provided in long format – that
is, where each row corresponds to a unit observed at a particular point in time. The default is TRUE.
When |
weightsname |
The name of the column containing the sampling weights. If NULL, then every observation has the same weights. The weights are normalized and therefore enforced to have mean 1 across all observations. |
boot |
Logical argument to whether bootstrap should be used for inference. Default is |
boot.type |
Type of bootstrap to be performed (not relevant if |
nboot |
Number of bootstrap repetitions (not relevant if boot = |
inffunc |
Logical argument to whether influence function should be returned. Default is |
The ordid
function implements
outcome regression difference-in-differences (DiD) estimator for the average treatment effect
on the treated (ATT) defined in equation (2.2) of Sant'Anna and Zhao (2020). The estimator follows the same spirit
of the nonparametric estimators proposed by Heckman, Ichimura and Todd (1997), though here the the outcome regression
models are assumed to be linear in covariates (parametric).
The nuisance parameters (outcome regression coefficients) are estimated via ordinary least squares.
A list containing the following components:
ATT |
The OR DiD point estimate |
se |
The OR DiD standard error |
uci |
Estimate of the upper bound of a 95% CI for the ATT |
lci |
Estimate of the lower bound of a 95% CI for the ATT |
boots |
All Bootstrap draws of the ATT, in case bootstrap was used to conduct inference. Default is NULL |
att.inf.func |
Estimate of the influence function. Default is NULL |
call.param |
The matched call. |
argu |
Some arguments used in the call (panel, normalized, boot, boot.type, nboot, type=="or") |
Heckman, James J., Ichimura, Hidehiko, and Todd, Petra E. (1997),"Matching as an Econometric Evaluation Estimator: Evidence from Evaluating a Job Training Programme", Review of Economic Studies, vol. 64(4), p. 605–654, doi:10.2307/2971733.
Sant'Anna, Pedro H. C. and Zhao, Jun. (2020), "Doubly Robust Difference-in-Differences Estimators." Journal of Econometrics, Vol. 219 (1), pp. 101-122, doi:10.1016/j.jeconom.2020.06.003
# ----------------------------------------------- # Panel data case # ----------------------------------------------- # Form the Lalonde sample with CPS comparison group eval_lalonde_cps <- subset(nsw_long, nsw_long$treated == 0 | nsw_long$sample == 2) # Further reduce sample to speed example set.seed(123) unit_random <- sample(unique(eval_lalonde_cps$id), 5000) eval_lalonde_cps <- eval_lalonde_cps[eval_lalonde_cps$id %in% unit_random,] # Implement OR DiD with panel data ordid(yname="re", tname = "year", idname = "id", dname = "experimental", xformla= ~ age+ educ+ black+ married+ nodegree+ hisp+ re74, data = eval_lalonde_cps, panel = TRUE) # ----------------------------------------------- # Repeated cross section case # ----------------------------------------------- # use the simulated data provided in the package # Implement OR DiD with repeated cross-section data # use Bootstrap to make inference with 199 bootstrap draws (just for illustration) ordid(yname="y", tname = "post", idname = "id", dname = "d", xformla= ~ x1 + x2 + x3 + x4, data = sim_rc, panel = FALSE, boot = TRUE, nboot = 199)
# ----------------------------------------------- # Panel data case # ----------------------------------------------- # Form the Lalonde sample with CPS comparison group eval_lalonde_cps <- subset(nsw_long, nsw_long$treated == 0 | nsw_long$sample == 2) # Further reduce sample to speed example set.seed(123) unit_random <- sample(unique(eval_lalonde_cps$id), 5000) eval_lalonde_cps <- eval_lalonde_cps[eval_lalonde_cps$id %in% unit_random,] # Implement OR DiD with panel data ordid(yname="re", tname = "year", idname = "id", dname = "experimental", xformla= ~ age+ educ+ black+ married+ nodegree+ hisp+ re74, data = eval_lalonde_cps, panel = TRUE) # ----------------------------------------------- # Repeated cross section case # ----------------------------------------------- # use the simulated data provided in the package # Implement OR DiD with repeated cross-section data # use Bootstrap to make inference with 199 bootstrap draws (just for illustration) ordid(yname="y", tname = "post", idname = "id", dname = "d", xformla= ~ x1 + x2 + x3 + x4, data = sim_rc, panel = FALSE, boot = TRUE, nboot = 199)
reg_did_panel
computes the outcome regressions estimators for the average treatment effect on the
treated in difference-in-differences (DiD) setups with panel data.
reg_did_panel( y1, y0, D, covariates, i.weights = NULL, boot = FALSE, boot.type = "weighted", nboot = NULL, inffunc = FALSE )
reg_did_panel( y1, y0, D, covariates, i.weights = NULL, boot = FALSE, boot.type = "weighted", nboot = NULL, inffunc = FALSE )
y1 |
An |
y0 |
An |
D |
An |
covariates |
An |
i.weights |
An |
boot |
Logical argument to whether bootstrap should be used for inference. Default is FALSE. |
boot.type |
Type of bootstrap to be performed (not relevant if |
nboot |
Number of bootstrap repetitions (not relevant if |
inffunc |
Logical argument to whether influence function should be returned. Default is FALSE. |
The reg_did_panel
function implements
outcome regression difference-in-differences (DiD) estimator for the average treatment effect
on the treated (ATT) defined in equation (2.2) of Sant'Anna and Zhao (2020) when panel data are available.
The estimator follows the same spirit of the nonparametric estimators proposed by Heckman, Ichimura and Todd (1997),
though here the the outcome regression models are assumed to be linear in covariates (parametric),
The nuisance parameters (outcome regression coefficients) are estimated via ordinary least squares.
A list containing the following components:
ATT |
The OR DiD point estimate |
se |
The OR DiD standard error |
uci |
Estimate of the upper bound of a 95% CI for the ATT |
lci |
Estimate of the lower bound of a 95% CI for the ATT |
boots |
All Bootstrap draws of the ATT, in case bootstrap was used to conduct inference. Default is NULL |
att.inf.func |
Estimate of the influence function. Default is NULL |
call.param |
The matched call. |
argu |
Some arguments used (explicitly or not) in the call (panel = TRUE, boot, boot.type, nboot, type="or") |
Heckman, James J., Ichimura, Hidehiko, and Todd, Petra E. (1997),"Matching as an Econometric Evaluation Estimator: Evidence from Evaluating a Job Training Programme", Review of Economic Studies, vol. 64(4), p. 605–654, doi:10.2307/2971733.
Sant'Anna, Pedro H. C. and Zhao, Jun. (2020), "Doubly Robust Difference-in-Differences Estimators." Journal of Econometrics, Vol. 219 (1), pp. 101-122, doi:10.1016/j.jeconom.2020.06.003
# Form the Lalonde sample with CPS comparison group eval_lalonde_cps <- subset(nsw, nsw$treated == 0 | nsw$sample == 2) # Further reduce sample to speed example set.seed(123) unit_random <- sample(1:nrow(eval_lalonde_cps), 5000) eval_lalonde_cps <- eval_lalonde_cps[unit_random,] # Select some covariates covX = as.matrix(cbind(1, eval_lalonde_cps$age, eval_lalonde_cps$educ, eval_lalonde_cps$black, eval_lalonde_cps$married, eval_lalonde_cps$nodegree, eval_lalonde_cps$hisp, eval_lalonde_cps$re74)) # Implement OR DiD with panel data reg_did_panel(y1 = eval_lalonde_cps$re78, y0 = eval_lalonde_cps$re75, D = eval_lalonde_cps$experimental, covariates = covX)
# Form the Lalonde sample with CPS comparison group eval_lalonde_cps <- subset(nsw, nsw$treated == 0 | nsw$sample == 2) # Further reduce sample to speed example set.seed(123) unit_random <- sample(1:nrow(eval_lalonde_cps), 5000) eval_lalonde_cps <- eval_lalonde_cps[unit_random,] # Select some covariates covX = as.matrix(cbind(1, eval_lalonde_cps$age, eval_lalonde_cps$educ, eval_lalonde_cps$black, eval_lalonde_cps$married, eval_lalonde_cps$nodegree, eval_lalonde_cps$hisp, eval_lalonde_cps$re74)) # Implement OR DiD with panel data reg_did_panel(y1 = eval_lalonde_cps$re78, y0 = eval_lalonde_cps$re75, D = eval_lalonde_cps$experimental, covariates = covX)
reg_did_rc
computes the outcome regressions estimators for the average treatment effect on the
treated in difference-in-differences (DiD) setups with stationary repeated cross-sectional data.
reg_did_rc( y, post, D, covariates, i.weights = NULL, boot = FALSE, boot.type = "weighted", nboot = NULL, inffunc = FALSE )
reg_did_rc( y, post, D, covariates, i.weights = NULL, boot = FALSE, boot.type = "weighted", nboot = NULL, inffunc = FALSE )
y |
An |
post |
An |
D |
An |
covariates |
An |
i.weights |
An |
boot |
Logical argument to whether bootstrap should be used for inference. Default is FALSE. |
boot.type |
Type of bootstrap to be performed (not relevant if |
nboot |
Number of bootstrap repetitions (not relevant if |
inffunc |
Logical argument to whether influence function should be returned. Default is FALSE. |
The reg_did_rc
function implements
outcome regression difference-in-differences (DiD) estimator for the average treatment effect
on the treated (ATT) defined in equation (2.2) of Sant'Anna and Zhao (2020) when stationary repeated cross-sectional
data are available. The estimator follows the same spirit of the nonparametric estimators proposed by Heckman, Ichimura and Todd (1997),
though here the the outcome regression models are assumed to be linear in covariates (parametric),
The nuisance parameters (outcome regression coefficients) are estimated via ordinary least squares.
A list containing the following components:
ATT |
The OR DiD point estimate |
se |
The OR DiD standard error |
uci |
Estimate of the upper bound of a 95% CI for the ATT |
lci |
Estimate of the lower bound of a 95% CI for the ATT |
boots |
All Bootstrap draws of the ATT, in case bootstrap was used to conduct inference. Default is NULL |
att.inf.func |
Estimate of the influence function. Default is NULL |
call.param |
The matched call. |
argu |
Some arguments used (explicitly or not) in the call (panel = FALSE, boot, boot.type, nboot, type="or") |
Heckman, James J., Ichimura, Hidehiko, and Todd, Petra E. (1997),"Matching as an Econometric Evaluation Estimator: Evidence from Evaluating a Job Training Programme", Review of Economic Studies, vol. 64(4), p. 605–654, doi:10.2307/2971733.
Sant'Anna, Pedro H. C. and Zhao, Jun. (2020), "Doubly Robust Difference-in-Differences Estimators." Journal of Econometrics, Vol. 219 (1), pp. 101-122, doi:10.1016/j.jeconom.2020.06.003
# use the simulated data provided in the package covX = as.matrix(cbind(1, sim_rc[,5:8])) # Implement OR DiD estimator reg_did_rc(y = sim_rc$y, post = sim_rc$post, D = sim_rc$d, covariates= covX)
# use the simulated data provided in the package covX = as.matrix(cbind(1, sim_rc[,5:8])) # Implement OR DiD estimator reg_did_rc(y = sim_rc$y, post = sim_rc$post, D = sim_rc$d, covariates= covX)
sim_rc
contains a simulated dataset following the DGP1 in Sant'Anna
and Zhao (2020).
sim_rc
sim_rc
A data frame in "long" format with 1000 observations on the following and 8 variables:
unique identifier for each cross-sectional unit.
an indicator variable for post-treatment period (1 if post, 0 if pre treatment period).
outcome of interest
an indicator variable for treatment group. Equal to 1 if experience treatment in the post-treatment period; equal to 0 if never experience treatment.
Covariate z1 in Sant'Anna and Zhao(2020)
Covariate z2 in Sant'Anna and Zhao(2020)
Covariate z3 in Sant'Anna and Zhao(2020)
Covariate z4 in Sant'Anna and Zhao(2020)
Sant'Anna and Zhao (2020)
Sant'Anna, Pedro H. C. and Zhao, Jun. (2020), "Doubly Robust Difference-in-Differences Estimators." Journal of Econometrics, Vol. 219 (1), pp. 101-122, doi:10.1016/j.jeconom.2020.06.003
std_ipw_did_panel
is used to compute inverse probability weighted (IPW) estimators for the ATT
in difference-in-differences (DiD) setups with panel data. IPW weights are normalized to sum up to one, that is,
the estimator is of the Hajek type.
std_ipw_did_panel( y1, y0, D, covariates, i.weights = NULL, boot = FALSE, boot.type = "weighted", nboot = NULL, inffunc = FALSE )
std_ipw_did_panel( y1, y0, D, covariates, i.weights = NULL, boot = FALSE, boot.type = "weighted", nboot = NULL, inffunc = FALSE )
y1 |
An |
y0 |
An |
D |
An |
covariates |
An |
i.weights |
An |
boot |
Logical argument to whether bootstrap should be used for inference. Default is FALSE. |
boot.type |
Type of bootstrap to be performed (not relevant if |
nboot |
Number of bootstrap repetitions (not relevant if |
inffunc |
Logical argument to whether influence function should be returned. Default is FALSE. |
A list containing the following components:
ATT |
The IPW DiD point estimate. |
se |
The IPW DiD standard error |
uci |
Estimate of the upper bound of a 95% CI for the ATT |
lci |
Estimate of the lower bound of a 95% CI for the ATT |
boots |
All Bootstrap draws of the ATT, in case bootstrap was used to conduct inference. Default is NULL |
att.inf.func |
Estimate of the influence function. Default is NULL |
call.param |
The matched call. |
argu |
Some arguments used (explicitly or not) in the call (panel = TRUE, normalized = TRUE, boot, boot.type, nboot, type="ipw") |
Abadie, Alberto (2005), "Semiparametric Difference-in-Differences Estimators", Review of Economic Studies, vol. 72(1), p. 1-19, doi:10.1111/0034-6527.00321.
Sant'Anna, Pedro H. C. and Zhao, Jun. (2020), "Doubly Robust Difference-in-Differences Estimators." Journal of Econometrics, Vol. 219 (1), pp. 101-122, doi:10.1016/j.jeconom.2020.06.003
# Form the Lalonde sample with CPS comparison group eval_lalonde_cps <- subset(nsw, nsw$treated == 0 | nsw$sample == 2) # Further reduce sample to speed example set.seed(123) unit_random <- sample(1:nrow(eval_lalonde_cps), 5000) eval_lalonde_cps <- eval_lalonde_cps[unit_random,] # Select some covariates covX = as.matrix(cbind(1, eval_lalonde_cps$age, eval_lalonde_cps$educ, eval_lalonde_cps$black, eval_lalonde_cps$married, eval_lalonde_cps$nodegree, eval_lalonde_cps$hisp, eval_lalonde_cps$re74)) # Implement normalized IPW DiD with panel data std_ipw_did_panel(y1 = eval_lalonde_cps$re78, y0 = eval_lalonde_cps$re75, D = eval_lalonde_cps$experimental, covariates = covX)
# Form the Lalonde sample with CPS comparison group eval_lalonde_cps <- subset(nsw, nsw$treated == 0 | nsw$sample == 2) # Further reduce sample to speed example set.seed(123) unit_random <- sample(1:nrow(eval_lalonde_cps), 5000) eval_lalonde_cps <- eval_lalonde_cps[unit_random,] # Select some covariates covX = as.matrix(cbind(1, eval_lalonde_cps$age, eval_lalonde_cps$educ, eval_lalonde_cps$black, eval_lalonde_cps$married, eval_lalonde_cps$nodegree, eval_lalonde_cps$hisp, eval_lalonde_cps$re74)) # Implement normalized IPW DiD with panel data std_ipw_did_panel(y1 = eval_lalonde_cps$re78, y0 = eval_lalonde_cps$re75, D = eval_lalonde_cps$experimental, covariates = covX)
std_ipw_did_rc
is used to compute inverse probability weighted (IPW) estimators for the ATT
in DID setups with stationary repeated cross-sectional data. IPW weights are normalized to sum up to one, that is,
the estimator is of the Hajek type.
std_ipw_did_rc( y, post, D, covariates, i.weights = NULL, boot = FALSE, boot.type = "weighted", nboot = NULL, inffunc = FALSE )
std_ipw_did_rc( y, post, D, covariates, i.weights = NULL, boot = FALSE, boot.type = "weighted", nboot = NULL, inffunc = FALSE )
y |
An |
post |
An |
D |
An |
covariates |
An |
i.weights |
An |
boot |
Logical argument to whether bootstrap should be used for inference. Default is FALSE. |
boot.type |
Type of bootstrap to be performed (not relevant if |
nboot |
Number of bootstrap repetitions (not relevant if |
inffunc |
Logical argument to whether influence function should be returned. Default is FALSE. |
A list containing the following components:
ATT |
The IPW DID point estimate. |
se |
The IPW DID standard error |
uci |
Estimate of the upper bound of a 95% CI for the ATT |
lci |
Estimate of the lower bound of a 95% CI for the ATT |
boots |
All Bootstrap draws of the ATT, in case bootstrap was used to conduct inference. Default is NULL |
att.inf.func |
Estimate of the influence function. Default is NULL |
call.param |
The matched call. |
argu |
Some arguments used (explicitly or not) in the call (panel = FALSE, normalized = TRUE, boot, boot.type, nboot, type="ipw") |
Abadie, Alberto (2005), "Semiparametric Difference-in-Differences Estimators", Review of Economic Studies, vol. 72(1), p. 1-19, doi:10.1111/0034-6527.00321.
Sant'Anna, Pedro H. C. and Zhao, Jun. (2020), "Doubly Robust Difference-in-Differences Estimators." Journal of Econometrics, Vol. 219 (1), pp. 101-122, doi:10.1016/j.jeconom.2020.06.003
# use the simulated data provided in the package covX = as.matrix(cbind(1, sim_rc[,5:8])) # Implement normalized IPW DID estimator std_ipw_did_rc(y = sim_rc$y, post = sim_rc$post, D = sim_rc$d, covariates= covX)
# use the simulated data provided in the package covX = as.matrix(cbind(1, sim_rc[,5:8])) # Implement normalized IPW DID estimator std_ipw_did_rc(y = sim_rc$y, post = sim_rc$post, D = sim_rc$d, covariates= covX)
twfe_did_panel
is used to compute linear two-way fixed effects estimators for the ATT
in difference-in-differences (DiD) setups with panel data. As illustrated by Sant'Anna and Zhao (2020),
this estimator generally do not recover the ATT. We encourage empiricists to adopt alternative specifications.
twfe_did_panel( y1, y0, D, covariates, i.weights = NULL, boot = FALSE, boot.type = "weighted", nboot = NULL, inffunc = FALSE )
twfe_did_panel( y1, y0, D, covariates, i.weights = NULL, boot = FALSE, boot.type = "weighted", nboot = NULL, inffunc = FALSE )
y1 |
An |
y0 |
An |
D |
An |
covariates |
An |
i.weights |
An |
boot |
Logical argument to whether bootstrap should be used for inference. Default is FALSE. |
boot.type |
Type of bootstrap to be performed (not relevant if |
nboot |
Number of bootstrap repetitions (not relevant if |
inffunc |
Logical argument to whether influence function should be returned. Default is FALSE. |
A list containing the following components:
ATT |
The TWFE DiD point estimate |
se |
The TWFE DiD standard error |
uci |
Estimate of the upper bound of a 95% CI for the TWFE parameter. |
lci |
Estimate of the lower bound of a 95% CI for the TWFE parameter. |
boots |
All Bootstrap draws of the ATT, in case bootstrap was used to conduct inference. Default is NULL |
att.inf.func |
Estimate of the influence function. Default is NULL |
# Form the Lalonde sample with CPS comparison group eval_lalonde_cps <- subset(nsw, nsw$treated == 0 | nsw$sample == 2) # Further reduce sample to speed example set.seed(123) unit_random <- sample(1:nrow(eval_lalonde_cps), 5000) eval_lalonde_cps <- eval_lalonde_cps[unit_random,] # Select some covariates covX = as.matrix(cbind(1, eval_lalonde_cps$age, eval_lalonde_cps$educ, eval_lalonde_cps$black, eval_lalonde_cps$married, eval_lalonde_cps$nodegree, eval_lalonde_cps$hisp, eval_lalonde_cps$re74)) # Implement TWFE DiD with panel data twfe_did_panel(y1 = eval_lalonde_cps$re78, y0 = eval_lalonde_cps$re75, D = eval_lalonde_cps$experimental, covariates = covX)
# Form the Lalonde sample with CPS comparison group eval_lalonde_cps <- subset(nsw, nsw$treated == 0 | nsw$sample == 2) # Further reduce sample to speed example set.seed(123) unit_random <- sample(1:nrow(eval_lalonde_cps), 5000) eval_lalonde_cps <- eval_lalonde_cps[unit_random,] # Select some covariates covX = as.matrix(cbind(1, eval_lalonde_cps$age, eval_lalonde_cps$educ, eval_lalonde_cps$black, eval_lalonde_cps$married, eval_lalonde_cps$nodegree, eval_lalonde_cps$hisp, eval_lalonde_cps$re74)) # Implement TWFE DiD with panel data twfe_did_panel(y1 = eval_lalonde_cps$re78, y0 = eval_lalonde_cps$re75, D = eval_lalonde_cps$experimental, covariates = covX)
twfe_did_rc
is used to compute linear two-way fixed effects estimators for the ATT
in difference-in-differences (DiD) setups with stationary repeated cross-sectional data. As illustrated
by Sant'Anna and Zhao (2020),this estimator generally do not recover the ATT. We encourage empiricists
to adopt alternative specifications.
twfe_did_rc( y, post, D, covariates = NULL, i.weights = NULL, boot = FALSE, boot.type = "weighted", nboot = NULL, inffunc = FALSE )
twfe_did_rc( y, post, D, covariates = NULL, i.weights = NULL, boot = FALSE, boot.type = "weighted", nboot = NULL, inffunc = FALSE )
y |
An |
post |
An |
D |
An |
covariates |
An |
i.weights |
An |
boot |
Logical argument to whether bootstrap should be used for inference. Default is FALSE. |
boot.type |
Type of bootstrap to be performed (not relevant if |
nboot |
Number of bootstrap repetitions (not relevant if |
inffunc |
Logical argument to whether influence function should be returned. Default is FALSE. |
A list containing the following components:
ATT |
The TWFE DiD point estimate |
se |
The TWFE DiD standard error |
uci |
Estimate of the upper bound of a 95% CI for the TWFE parameter. |
lci |
Estimate of the lower bound of a 95% CI for the TWFE parameter. |
boots |
All Bootstrap draws of the ATT, in case bootstrap was used to conduct inference. Default is NULL |
att.inf.func |
Estimate of the influence function. Default is NULL |
# use the simulated data provided in the package covX = as.matrix(sim_rc[,5:8]) # Implement TWFE DiD estimator (you probably should consider something else....) twfe_did_rc(y = sim_rc$y, post = sim_rc$post, D = sim_rc$d, covariates= covX)
# use the simulated data provided in the package covX = as.matrix(sim_rc[,5:8]) # Implement TWFE DiD estimator (you probably should consider something else....) twfe_did_rc(y = sim_rc$y, post = sim_rc$post, D = sim_rc$d, covariates= covX)