
Using R in Econometrics

Description

We will use the online spending application. First, create the variable yspend (10,000 observations with spending for each household) and the sparse matrix xweb, following the lecture notes (you can just copy the code from the lecture notes, and you don’t need to include this data preparation step when turning in the problem set).

1. (45 points) Recall that lasso selects a sparse model by zeroing out covariates. We will run a bootstrap experiment to see whether this model selection procedure is stable across different samples. In general, one should be cautious when applying the bootstrap to the lasso (bootstrap confidence intervals can fail, with high probability, to cover the true parameter), but here we are only using it to get a sense of the stability of lasso model selection.

(a) (10 points) Run a lasso regression of log(yspend) on xweb, using 5-fold cross-validation to pick λ. Report the chosen λ (you can use the command modelname$lambda.min), and report the indices of the nonzero coefficients (you don’t have to report their names).
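As a rough illustration of the workflow in part (a), here is a minimal sketch using cv.gamlr. It assumes the gamlr and Matrix packages are installed, and it uses simulated stand-ins for yspend and xweb (the sizes n and p, the coefficient vector beta, and the seed are all made up for the example), so the numbers it produces say nothing about the real data.

```r
library(Matrix)  # sparse matrix support
library(gamlr)   # cv.gamlr: lasso path with cross-validation

set.seed(1)

# Simulated stand-ins for the lecture-note objects (assumption, not the real data)
n <- 500; p <- 50
xweb <- rsparsematrix(n, p, density = 0.1)            # hypothetical sparse design
beta <- c(rep(1, 5), rep(0, p - 5))                   # hypothetical true coefficients
yspend <- exp(as.matrix(xweb %*% beta)[, 1] + rnorm(n))  # strictly positive "spending"

# Lasso regression of log(yspend) on xweb with 5-fold cross-validation
cvfit <- cv.gamlr(xweb, log(yspend), nfold = 5)

# Lambda that minimizes the cross-validated error
cvfit$lambda.min

# Coefficients at lambda.min; row 1 is the intercept, so drop it
# before computing indices so they line up with the columns of xweb
b <- as.matrix(coef(cvfit, select = "min"))[-1, 1]
nonzero <- unname(which(b != 0))
nonzero
```

Dropping the intercept row first means the reported indices refer to columns of xweb; whether to count the intercept is a reporting choice, not something the assignment specifies.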

(b) (10 points) Draw a single “bootstrap” sample (same sample size as the original sample) by sampling the rows of yspend and xweb jointly with replacement. Run the lasso regression from part (a) on this bootstrap sample.
i. Report the indices of the nonzero coefficients for the bootstrap sample.
ii. Report the indices of the coefficients that are nonzero only for the bootstrap sample.
iii. Report the indices of the coefficients that are nonzero only for the original sample.
iv. Report the indices of the coefficients that are nonzero for both samples.
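One possible way to organize part (b) is sketched below. It again relies on simulated stand-ins for yspend and xweb, and the helper lasso_support is a hypothetical name introduced only for this example. The key points are that the same row indices are used for both the outcome and the covariates, and that the three comparisons reduce to base R set operations.

```r
library(Matrix)
library(gamlr)

set.seed(2)

# Simulated stand-ins for the lecture-note objects (assumption, not the real data)
n <- 500; p <- 50
xweb <- rsparsematrix(n, p, density = 0.1)
beta <- c(rep(1, 5), rep(0, p - 5))
yspend <- exp(as.matrix(xweb %*% beta)[, 1] + rnorm(n))

# Hypothetical helper: indices of nonzero lasso coefficients at lambda.min
lasso_support <- function(x, y) {
  cvfit <- cv.gamlr(x, y, nfold = 5)
  b <- as.matrix(coef(cvfit, select = "min"))[-1, 1]  # drop intercept
  unname(which(b != 0))
}

# Support selected on the original sample
orig <- lasso_support(xweb, log(yspend))

# Bootstrap sample: draw n row indices with replacement and apply the
# SAME indices to both the outcome and the covariate matrix
idx <- sample(n, n, replace = TRUE)
boot <- lasso_support(xweb[idx, ], log(yspend[idx]))

only_boot <- setdiff(boot, orig)    # nonzero only in the bootstrap sample
only_orig <- setdiff(orig, boot)    # nonzero only in the original sample
both      <- intersect(orig, boot)  # nonzero in both samples

list(boot = boot, only_boot = only_boot, only_orig = only_orig, both = both)
```

Because both the bootstrap draw and the fold assignment inside cv.gamlr are random, the selected sets change with the seed.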

(c) (25 points) Based on these results, does the set of nonzero coefficients selected by the lasso seem to be stable across random draws of the data? Discuss.

(d) Optional bonus question: repeat part (a) by manually coding the 5-fold cross-validation to choose the optimal λ, following the algorithm on p.74 of the lec04 slides. You will have to use glmnet instead of gamlr, since gamlr does not let us specify the value of λ. Use glmnet(xweb,log(yspend),lambda=somevalue). For the sequence of λ, start from λ_1 = 0.232476092 and consider 100 values, where λ_t = 0.9545485 · λ_(t−1) for t = 2, ..., 100. Note: the λ found in this exercise may not exactly equal the one found in (a). Why?
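The following is a generic sketch of hand-coded K-fold cross-validation over the λ grid described in part (d). It is not necessarily identical to the algorithm on the slides, which is not reproduced here. It assumes the glmnet and Matrix packages are installed and, as before, uses simulated stand-ins for yspend and xweb. It passes the whole decreasing λ sequence to glmnet in one call per fold rather than looping over single values.

```r
library(Matrix)
library(glmnet)

set.seed(3)

# Simulated stand-ins for the lecture-note objects (assumption, not the real data)
n <- 500; p <- 50
xweb <- rsparsematrix(n, p, density = 0.1)
beta <- c(rep(1, 5), rep(0, p - 5))
yspend <- exp(as.matrix(xweb %*% beta)[, 1] + rnorm(n))
y <- log(yspend)

# Lambda grid from the problem statement: 100 values,
# lambda_1 = 0.232476092 and lambda_t = 0.9545485 * lambda_(t-1)
lambdas <- 0.232476092 * 0.9545485^(0:99)

# Randomly assign each observation to one of K = 5 folds
K <- 5
fold <- sample(rep(1:K, length.out = n))

# Out-of-sample mean squared error for each fold (rows) and lambda (columns)
mse <- matrix(NA_real_, nrow = K, ncol = length(lambdas))
for (k in 1:K) {
  train <- fold != k
  fit <- glmnet(xweb[train, ], y[train], lambda = lambdas)
  # Predictions on the held-out fold: one column per lambda
  pred <- as.matrix(predict(fit, newx = xweb[!train, ], s = lambdas))
  mse[k, ] <- colMeans((y[!train] - pred)^2)
}

# Average over folds and pick the lambda with the smallest CV error
cv_mse <- colMeans(mse)
lambda_best <- lambdas[which.min(cv_mse)]
lambda_best
```

The fold assignment comes from sample(), so rerunning with a different seed can move lambda_best to a neighboring grid value.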
