- Least Absolute Shrinkage and Selection Operator = LASSO
- A supervised machine learning method for prediction.
- Helps when the aim is to select the best subset of predictors for an outcome.
- Determines which predictors are relevant for an outcome by applying a penalty (**lambda**) to the OLS least squares. The penalty causes some coefficients to **shrink** to zero, excluding them from the model. As lambda increases, more variables get excluded.
- Results in a **parsimonious** model
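For reference, the linear lasso objective adds an L1 penalty (scaled by lambda) to the OLS sum of squared residuals; at lambda = 0 it reduces to OLS, and as lambda grows, more coefficients are forced exactly to zero:

```
\hat{\beta}_{\text{lasso}} = \arg\min_{\beta} \; \sum_{i=1}^{N} \left( y_i - \mathbf{x}_i' \beta \right)^2 + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
```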

**Cross-validation**

- A resampling technique for selecting observations used to build a model within the training dataset
- CV is done within the TRAIN dataset only
- Can be done k times; e.g., 10-fold cross-validation
- Helps generate a model that is more realistic for new cases, by allowing the model to learn from the underlying distribution
- Prevents **overfitting**
- Running the model k times allows us to choose the model with the best lambda or AIC/BIC

By default, Stata fits up to 100 models over a grid of lambdas, starting from the largest lambda. The model with the largest out-of-sample R-squared and the minimum CV mean prediction error gets selected by cross-validation.
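As a minimal sketch of this selection step (the outcome `y` and predictors `x1-x20` are placeholder names), the grid of lambdas and the selected model can be inspected with the commands listed below:

```
* 10-fold cross-validated linear lasso; y and x1-x20 are placeholders
lasso linear y x1-x20, selection(cv) rseed(1234) folds(10)
lassoknots   // lambdas tried, number of nonzero coefficients, CV error
lassoinfo    // summary of the fit, including the selected lambda
```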

**LASSO commands**

- **splitsample**: generate training and validation/testing/hold-out sample sets
- Estimation: **lasso**, **elasticnet**, **sqrtlasso**
- Selection methods:
  - cross-validation
  - adaptive
  - plugin
  - customized
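A hedged sketch of the split-then-estimate workflow (variable names are placeholders):

```
* Split the data into two equal samples: 1 = training, 2 = testing
splitsample, generate(sample) nsplit(2) rseed(1234)
* Fit candidate estimators on the training half only
lasso linear y x1-x20 if sample==1, selection(cv) rseed(1234)
elasticnet linear y x1-x20 if sample==1, selection(cv) rseed(1234)
```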

- Graph:
  - **cvplot**: cross-validation plot
  - **bicplot**: BIC plot
  - **coefpath**: coefficient path

- Exploratory tools:
  - **lassocoef**: display lasso coefficients
  - **lassoinfo**: summary of lasso fitting
  - **lassoknots**: detailed table of knots
  - **lassoselect**: manually select a tuning parameter

- Prediction:
  - **lassogof**: evaluate in-sample and out-of-sample prediction
  - **predict**: predictions for linear, binary, count, and survival data

**SSC add-on methods**

- calibrationbelt – GiViTi calibration belt: a test and plot for model validation comparing the observed and predicted probability of the outcome. It gives a test statistic and a p-value in the plot. A large (non-significant) p-value indicates no statistically significant difference between the model predictions and the 45-degree line; the 45-degree line indicates that the observed and predicted rates are the same.
- cvauroc – AUC and discrimination performance of the model – displays the AUC at each fold and the mean AUC
  **Rule of thumb: cvAUC of 0.5 = same as chance; AUC > 0.7 = good model; > 0.8 = strong model; 1 = perfect fit**
- rocreg – alternative way to estimate the AUC – uses bootstrap replication

**Standard lasso estimation commands**

- lasso
- cvplot
- lassoknots
- lassoselect
- lassocoef
- lassogof
- bicplot

#### Lasso inference commands

- dsregress, poregress, xporegress
- dslogit pologic xpologit
- dspoisson, popoisson, xpopoisson

ds refers to double-selection lasso regression

po refers to partialing-out lasso regression

xpo refers to cross-fit partialing-out lasso regression
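A minimal double-selection sketch using the cattaneo2 example data that appears in the command sequence below; the variable of interest (`i.mbsmoke`) and the control list are illustrative choices, not a recommended specification:

```
webuse cattaneo2, clear
* Inference on the coefficient of mbsmoke; lasso selects among the controls
dsregress bweight i.mbsmoke, controls(i.mmarried i.fbaby c.mage c.medu)
```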

## Predict after LASSO

Two options:

- **Penalized:** coefficient-based prediction (the default) – penalized coefficients are used to calculate predictions. Penalized coefficients are those estimated by lasso in the calculation of the lasso penalty.
- **Postselection:** specifies that postselection coefficients be used to calculate predictions. Postselection coefficients are calculated by taking the variables selected by lasso and refitting the model with the appropriate ordinary estimator: linear regression for linear models, logistic regression for logit models, probit regression for probit models, Poisson regression for poisson models, and Cox regression for cox models.

It has been mentioned that in the **linear** model, **postselection coefficients tend to be less biased** and may have better out-of-sample prediction performance than the penalized coefficients: http://fmwww.bc.edu/RePEc/scon2019/chicago19_Liu.pdf
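As a sketch (variable names assumed), both prediction types can be requested after a lasso logit fit:

```
* Default: predicted probabilities from penalized coefficients
predict double pr_pen, pr
* Predicted probabilities from postselection (refit) coefficients
predict double pr_post, pr postselection
```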

**Sample command sequence**

```
splitsample , generate(sample) nsplit(2) rseed(1234)
keep if sample==1
lasso logit outcome predictor1 predictor2 predictor3 i.predictor4, selection(cv) rseed(1234) folds(10)
est store model1
lassocoef model1, display(coef, penalized) sort(coef, penalized)
predict double outcome_predicted, pr
calibrationbelt outcome outcome_predicted, devel("internal") clevel1(0.95) clevel2(0.99) maxDeg(4) thres(0.95)
cvauroc outcome outcome_predicted, kfold(10) seed(1972) fit detail graphlowess
rocreg outcome outcome_predicted, bseed(123456)
******************** Example from StataCorp Youtube video
lasso logit outcome predictor1 predictor2 predictor3 i.predictor4, selection(cv) rseed(1234) folds(10)
est store model1
cvplot // Cross-validation plot - shows at what value of lambda the cross-validation function is minimized
est store cv
lassoknots, display(nonzero osr2 bic) // displays info about all models fit during CV
* Select a specific model based on BIC or Number of Coef criteria
lassoselect id = 4 // Lowest BIC
cvplot
est store minBIC
** Adaptive LASSO model
lasso logit outcome predictor1 predictor2 predictor3 i.predictor4, selection(adaptive) rseed(1234) folds(10)
est store model1
est store adaptive
** Compare variables included in various models, with largest standardized coefficients displayed at top
lassocoef cv minBIC adaptive, sort(coef, standardized) nofvlabel
** Goodness of Fit of model on the test sample
lassogof cv minBIC adaptive, over(sample) postselection
* Can choose the model with minimum mean square error and largest r-square in testing dataset
********************************** LASSO INFERENCE
webuse cattaneo2
dsregress .........
```


**Sources**

The Stata Blog » An introduction to the lasso in Stata

The Stata Blog » Using the lasso for inference in high-dimensional models

Using lasso and related estimators for prediction (stata.com)

Lasso for prediction and model selection | Stata

Predicting the individualized risk of poor adherence to ART medication among adolescents living with HIV in Uganda: the Suubi+Adherence study – PMC (nih.gov) – calibrationbelt

http://fmwww.bc.edu/RePEc/scon2019/chicago19_Liu.pdf