Epidemiology & Technology

LASSO in Stata

  • LASSO = Least Absolute Shrinkage and Selection Operator
  • A supervised machine learning method for prediction.
  • Helps when the aim is to select the best subset of predictors for an outcome.
  • Determines which predictors are relevant for an outcome by adding a penalty to the least-squares (or likelihood) criterion. The penalty is the sum of the absolute values of the coefficients, weighted by a tuning parameter (lambda). This shrinks some coefficients to exactly zero, excluding them from the model.
  • As lambda increases, more variables get excluded.
  • Results in a parsimonious model.
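
For reference, the penalized criterion described above can be written in its usual textbook form for a linear outcome. The notation (N observations, p candidate predictors, intercept \beta_0) is introduced here for illustration only. Stata's implementation additionally allows per-coefficient penalty weights and uses a likelihood-based criterion for logit, probit and Poisson models.

```latex
\hat{\beta}^{\text{lasso}}
  = \arg\min_{\beta_0,\,\beta}
    \left\{
      \frac{1}{2N} \sum_{i=1}^{N} \left( y_i - \beta_0 - x_i^{\top}\beta \right)^2
      + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
    \right\}
```

With lambda = 0 the criterion reduces to ordinary least squares. As lambda grows, the second term dominates and more of the \beta_j are set exactly to zero.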


  • Cross-validation (CV) is a resampling technique: the training dataset is repeatedly split into subsets of observations, with the model fit on some and checked on the rest
  • CV is done within the TRAIN dataset only
  • Can be done with k folds, e.g. 10-fold cross-validation
  • Helps generate a model that is more realistic for new cases
    • by allowing the model to learn from the underlying distribution
    • Prevents overfitting
  • Fitting the model across the k folds allows us to choose the model with the best lambda or AIC/BIC

By default, Stata selects the lambda that minimizes the cross-validation function (the CV mean prediction error).
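
A minimal sketch of cross-validated selection on a built-in example dataset. The dataset (lbw) and the choice of predictors are assumptions made for illustration and are not part of the original analysis.

```stata
* Illustrative only: lbw is a Stata example dataset, not the data from this post
webuse lbw, clear

* Logistic lasso; lambda is chosen by 10-fold cross-validation
lasso logit low age lwt i.race smoke ptl ht ui ftv, selection(cv, folds(10)) rseed(1234)

* Plot the CV function against lambda; the selected lambda is marked on the plot
cvplot

* Table of knots: the lambdas at which variables enter or leave the model
lassoknots
```

The output header of lasso reports the selected lambda and the number of nonzero coefficients, and the lassoknots table flags the selected row.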


  • lasso:
    • By default, Stata fits up to 100 models with varying lambdas.
    • Cross-validation selects the model with the minimum CV mean prediction error, which is also the model with the largest out-of-sample R-squared.
  • cvplot: plots the cross-validation function against lambda
  • lassocoef: lists the variables (and optionally the coefficients) selected in one or more stored models
  • predict: generates predictions (e.g. predicted probabilities) from the selected model
  • calibrationbelt (community-contributed command) – GiViTi calibration belt: a test and plot for model validation that compares the observed and predicted probabilities of the outcome. The plot reports a test statistic and a p-value. The 45-degree line indicates that the observed and predicted rates are the same. A large (non-significant) p-value indicates no statistically significant difference between the model predictions and the 45-degree line, which is what we want.
  • cvauroc (community-contributed command) – AUC and discrimination performance of the model; displays the AUC at each fold and the mean AUC
  • Rule of thumb: cvAUC of 0.5 = same as chance, AUC > 0.7 = good model, > 0.8 = strong model, 1 = perfect fit
  • rocreg – alternative way to estimate the AUC; uses bootstrap replications
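
As a quick check of discrimination that needs no add-on commands, the AUC can also be computed with the official roctab command. This is a sketch that swaps in roctab for cvauroc. The dataset, predictors and the variable name phat are assumptions for illustration.

```stata
* Illustrative only: lbw is a Stata example dataset, not the data from this post
webuse lbw, clear

* sample==1 is used for training, sample==2 for testing
splitsample, generate(sample) nsplit(2) rseed(1234)

* Fit the lasso on the training half only (default selection is cross-validation)
lasso logit low age lwt i.race smoke ptl ht ui ftv if sample==1, rseed(1234)

* Predicted probability of the outcome for every observation
predict double phat, pr

* AUC in the training half, then in the held-out half
roctab low phat if sample==1
roctab low phat if sample==2
```

A test-sample AUC that is much lower than the training-sample AUC suggests overfitting.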

Standard Lasso estimation commands

  • lasso
  • cvplot
  • lassoknots
  • lassoselect
  • lassocoef
  • lassogof
  • bicplot
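
A short sketch of BIC-based selection using some of the commands listed above. It assumes a Stata release that supports selection(bic) and bicplot (Stata 17 or newer, as far as I know). The dataset and predictors are again illustrative assumptions.

```stata
* Illustrative only: lbw is a Stata example dataset, not the data from this post
webuse lbw, clear

* Choose lambda by minimizing the BIC instead of using cross-validation
lasso logit low age lwt i.race smoke ptl ht ui ftv, selection(bic)

* Plot the BIC against lambda
bicplot

* Number of nonzero coefficients and BIC at each knot
lassoknots, display(nonzero bic)

* Postselection (unpenalized) coefficients of the selected variables
lassocoef, display(coef, postselection)
```

No random-number seed is needed here because BIC selection does not involve random folds.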

Lasso Inference commands

  • dsregress, poregress, xporegress
  • dslogit, pologit, xpologit
  • dspoisson, popoisson, xpopoisson

ds refers to double-selection lasso regression

po refers to partialing-out lasso regression

xpo refers to cross-fit partialing-out lasso regression
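
A minimal sketch of the inference commands. The variable of interest (mbsmoke) and the list of controls are assumptions chosen for illustration. They are not taken from the original analysis or from the elided dsregress command in the sample sequence.

```stata
* Illustrative only: the variable of interest and the controls are assumptions
webuse cattaneo2, clear

* Candidate controls from which the lasso may select
local controls mage medu fage fedu nprenatal i.mmarried i.fbaby i.prenatal1

* Double-selection: effect of maternal smoking on birthweight
dsregress bweight mbsmoke, controls(`controls')

* Cross-fit partialing-out with 5 cross-fitting folds
xporegress bweight mbsmoke, controls(`controls') xfolds(5) rseed(12345)
```

Only the coefficient on the variable of interest is reported with a standard error and confidence interval. The controls are treated as nuisance variables and get no inference.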

Sample command sequence

* Split the data into two random halves; sample==1 is the training set
splitsample , generate(sample) nsplit(2) rseed(1234)
keep if sample==1
* Logistic lasso with lambda chosen by 10-fold cross-validation
lasso logit outcome predictor1 predictor2 predictor3 i.predictor4, selection(cv, folds(10)) rseed(1234)
est store model1
lassocoef model1, display(coef, penalized) sort(coef, penalized)
predict double outcome_predicted, pr
* calibrationbelt and cvauroc are community-contributed commands and must be installed first
calibrationbelt outcome outcome_predicted, devel("internal") clevel1(0.95) clevel2(0.99) maxDeg(4) thres(0.95)
cvauroc outcome outcome_predicted, kfold(10) seed(1972) fit detail graphlowess
rocreg outcome outcome_predicted, bseed(123456)

******************** Example from StataCorp YouTube video
* Assumes both samples from splitsample are still in memory (no "keep if sample==1")
lasso logit outcome predictor1 predictor2 predictor3 i.predictor4 if sample==1, selection(cv, folds(10)) rseed(1234)
est store cv
cvplot // Cross-validation plot: shows the value of lambda at which the cross-validation function is minimized
lassoknots, display(nonzero osr2 bic) // displays information about the models fit during CV
* Select a specific model based on BIC or number-of-coefficients criteria
lassoselect id = 4 // ID with the lowest BIC in the lassoknots table (4 in this example)
est store minBIC

** Adaptive LASSO model
lasso logit outcome predictor1 predictor2 predictor3 i.predictor4 if sample==1, selection(adaptive, folds(10)) rseed(1234)
est store adaptive

** Compare variables included in the various models, with the largest standardized coefficients displayed at the top
lassocoef cv minBIC adaptive, sort(coef, standardized) nofvlabel

** Goodness of Fit of model on the test sample
lassogof cv minBIC adaptive, over(sample) postselection
* Choose the model with the best fit in the testing sample: minimum mean squared error and largest R-squared for linear models, minimum deviance and largest deviance ratio for logit models

********************************** LASSO INFERENCE
webuse cattaneo2
dsregress .........



The Stata Blog » An introduction to the lasso in Stata

The Stata Blog » Using the lasso for inference in high-dimensional models

Using lasso and related estimators for prediction (stata.com)

Lasso | Stata

Lasso for prediction and model selection | Stata

lasso18 (stata.com)

Applying Machine Learning Techniques in Stata to Predict Health Outcomes Using HIV-related Data – YouTube

Predicting the individualized risk of poor adherence to ART medication among adolescents living with HIV in Uganda: the Suubi+Adherence study – PMC (nih.gov)

Lasso for prediction and model selection (youtube.com)
