Install Package

knitr::opts_chunk$set(echo = TRUE)

remotes::install_github("MartinSpindler/hdm", 
                        ref = "p-refactor-autodml-pd", force = TRUE)
library(hdm)
library(data.table)
library(plm)

Install Data

We will be using the Grunfeld data (https://www.statsmodels.org/dev/datasets/generated/grunfeld.html).

We treat “investment” as the outcome of interest, with “capital” as the treatment, and “value” as a control variable. This is just to showcase the package useage. Our parameter of interest is the Average Partial Derviative (APD).

\[APD = \frac{\partial \text{investment}_{it}}{\partial \text{capital}_{it}}\]

Number of observations - 220 (20 years for 11 firms)

Number of variables - 5

Variables name definitions::

invest  - Gross investment in 1947 dollars
value   - Market value as of Dec. 31 in 1947 dollars
capital - Stock of plant and equipment in 1947 dollars
firm    - General Motors, US Steel, General Electric, Chrysler,
        Atlantic Refining, IBM, Union Oil, Westinghouse, Goodyear,
        Diamond Match, American Steel
year    - 1935 - 1954
data(Grunfeld, package = "plm")

# make sure your data is balanced, package only works for balanced data 
DT_balanced <- make.pbalanced(Grunfeld, index = c("firm", "year"), 
                     balance.type = "shared.individuals")

Running the analysis

We are allowing our regressors to be polynomial of degree 2. I set there to be no interactions. The unit is firm, so there will be fixed effects \(\alpha_i\) for each firm \(i\). So the model we are running is a regularized version of

\[inv_{it} = \alpha_i + \beta_1 \text{captial}_{it} + \beta_2 \text{captial}^2_{it} + \beta_3 \text{value}_{it} + \beta_4 \text{value}^2_{it}\]

# cleans data for DML regression, creates correct inputs
data_base <- DataAPDAutoDML(x = c('value'), d = "capital", 
                       y = "inv" , data = DT_balanced, poly_order = 2,
                       interactions = FALSE, unit = "firm", 
                       time = "year")
# runs the DML analysis
auto_base= rlassoAutoDML(data_base, est_type = "APD")

We can get our estimate of the APD via

summary(auto_base)
## Estimation and significance testing of the treatment effect 
## Type: APD 
##      coeff.      se. t-value p-value    
## TE 0.101685 0.009631   10.56  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Given our model it will be

\[APD = \frac{\partial \text{investment}_{it}}{\partial \text{capital}_{it}} = \beta_1 + \beta_2 \times 2 E[\text{captial}_{it}] \]