Epidemiology & Technology

LASSO in Stata

  • LASSO = Least Absolute Shrinkage and Selection Operator
  • A supervised machine learning method for prediction.
  • Helps when the aim is to select the best subset of predictors for an outcome.
  • Determines which predictors are relevant for an outcome by adding a penalty on the absolute size of the coefficients, scaled by a tuning parameter (lambda), to the least-squares objective. This shrinks some coefficients to exactly zero, excluding them from the model.
  • As lambda increases, more variables get excluded.
  • Results in a parsimonious model.
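
The penalty described above can be written out for the linear case. This is one common textbook form, offered as a sketch: the scaling constant in front of the sum of squares differs between software packages, and the notation (N observations, p candidate predictors, tuning parameter lambda >= 0) is assumed here rather than taken from the post.

```latex
\hat{\beta}(\lambda) = \arg\min_{\beta_0,\,\beta}
  \left\{ \frac{1}{2N} \sum_{i=1}^{N} \bigl( y_i - \beta_0 - \mathbf{x}_i^{\top}\beta \bigr)^2
  + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \right\}
```

With lambda = 0 this reduces to ordinary least squares. As lambda grows, the absolute-value penalty pushes more coefficients to exactly zero.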

Cross-validation

  • A resampling technique: the training dataset is split into k folds, and the model is repeatedly fit on k-1 folds and evaluated on the fold left out
  • CV is done within the TRAIN dataset only
  • The number of folds k can be chosen; e.g. 10-fold cross-validation
  • Helps generate a model that is more realistic for new cases
    • by allowing the model to learn from the underlying distribution
    • by preventing overfitting
  • Evaluating each candidate lambda across the k folds allows us to choose the model with the best lambda; AIC/BIC are alternative criteria for choosing among models

By default, Stata fits up to 100 models over a grid of lambdas, starting at the largest lambda (at which no variables are selected) and working down. Cross-validation then selects the model with the minimum CV mean prediction error, which is also the model with the largest out-of-sample R-squared.
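
The selection behaviour described above can be inspected directly. The following is a minimal sketch, assuming Stata 16 or later; the built-in auto dataset and the chosen predictors are stand-ins for illustration only, not the data used in this post.

```stata
* Sketch only: auto.dta and these predictors are placeholders
sysuse auto, clear

* 10-fold CV over the default lambda grid; rseed() makes the fold assignment reproducible
lasso linear price mpg headroom trunk weight length turn displacement gear_ratio i.foreign, ///
    selection(cv, folds(10)) rseed(1234)

* Table of knots: number of nonzero coefficients, CV mean prediction error,
* and out-of-sample R-squared at each listed lambda
lassoknots, display(nonzero cvmpe osr2)

* Plot the CV function against lambda, with the selected lambda marked
cvplot
```

In the lassoknots table, the row flagged as selected should be the one with the smallest CV mean prediction error. Here folds() is given as a suboption of selection(cv), which is where the Stata manual documents it.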

LASSO commands

  • splitsample: to generate training and validation/testing/hold-out sample sets
  • Estimation:
    • lasso
    • elasticnet
    • sqrtlasso
    • Selection methods
      • cross-validation
      • adaptive
      • plugin
      • customized
  • Graph:
    • cvplot: cross-validation plot
    • bicplot
    • coefpath: coefficient path
  • Exploratory tools:
    • lassocoef: display lasso coefficients
    • lassoinfo: summary of lasso fitting
    • lassoknots: detailed table of knots
    • lassoselect: manually select a tuning parameter
  • Prediction
    • lassogof: evaluate in-sample and out-of-sample prediction
    • predict: prediction for linear, binary, count, survival data
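
Several of the commands listed above fit together into a short train/test workflow. This is a hedged sketch using the built-in auto dataset as a placeholder; the 75/25 split and the variable name sample are illustrative choices, not recommendations.

```stata
* Sketch only: auto.dta and the 75/25 split are placeholders
sysuse auto, clear

* sample = 1 (about 75%, training) or 2 (about 25%, testing)
splitsample, generate(sample) split(0.75 0.25) rseed(1234)

* Fit the lasso on the training sample only
lasso linear price mpg headroom trunk weight length turn displacement gear_ratio i.foreign ///
    if sample == 1, selection(cv) rseed(1234)

* Summary of the fit: selection method, selected lambda, number of selected variables
lassoinfo

* Coefficient paths as the penalty changes
coefpath

* Goodness of fit reported separately for the training and testing samples
lassogof, over(sample)
```

Restricting estimation with if sample == 1, rather than dropping the test observations, keeps them available for lassogof, over(sample).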

SSC add-on based methods

  • calibrationbelt – GiViTi calibration belt, test, and plot for model validation, comparing the observed and predicted probabilities of the outcome. The plot reports a test statistic and a p-value. The 45-degree line indicates that the observed and predicted rates are the same. A large (non-significant) p-value indicates no statistically significant difference between the model predictions and the 45-degree line, which is what we want.
  • cvauroc – AUC and discrimination performance of the model – displays the AUC at each fold and the mean AUC
  • Rule of thumb: cvAUC of 0.5 = same as chance, AUC > 0.7 = good model, > 0.8 = strong model, 1 = perfect discrimination
  • rocreg – alternative way to estimate AUC – uses bootstrap replication (this one is an official Stata command, not an SSC add-on)
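
Installing the two user-written commands is a one-time step. The lines below assume both packages are distributed through SSC under these names, as the heading suggests; if one is not found there, search or findit can locate another source.

```stata
* Assumption: both packages are available from SSC under these names
ssc install calibrationbelt, replace
ssc install cvauroc, replace

* Confirm that Stata can find the installed commands
which calibrationbelt
which cvauroc
```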

Standard Lasso estimation commands

  • lasso
  • cvplot
  • lassoknots
  • lassoselect
  • lassocoef
  • lassogof
  • bicplot

Lasso Inference commands

  • dsregress, poregress, xporegress
  • dslogit, pologit, xpologit
  • dspoisson, popoisson, xpopoisson

ds refers to double-selection lasso regression

po refers to partialing-out lasso regression

xpo refers to cross-fit partialing-out lasso regression
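
The three families share the same basic syntax: the variable of interest goes after the outcome, and the candidate controls go in controls(). A minimal sketch, again using the built-in auto dataset purely as a placeholder (treating mpg as the variable of interest is an arbitrary illustrative choice):

```stata
* Sketch only: auto.dta and the roles assigned to its variables are placeholders
sysuse auto, clear

* Double selection: lasso chooses which controls to keep; inference is on mpg
dsregress price mpg, controls(weight length turn displacement headroom trunk gear_ratio i.foreign)

* Partialing out: same model, different estimator
poregress price mpg, controls(weight length turn displacement headroom trunk gear_ratio i.foreign)

* Cross-fit partialing out: xfolds() sets the number of cross-fitting folds,
* and rseed() makes the fold assignment reproducible
xporegress price mpg, controls(weight length turn displacement headroom trunk gear_ratio i.foreign) ///
    xfolds(5) rseed(1234)
```

Only the coefficient on the variable of interest is reported with a standard error; the controls are treated as nuisance variables.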

Predict after LASSO

Two options:

  • Penalized (default): predictions are calculated from the penalized coefficients, that is, those estimated by lasso in the calculation of the lasso penalty.
  • Postselection: predictions are calculated from the postselection coefficients. These are obtained by taking the variables selected by lasso and refitting the model with the appropriate ordinary estimator: linear regression for linear models, logistic regression for logit models, probit regression for probit models, Poisson regression for poisson models, and Cox regression for cox models.

It has been noted that in the linear model, postselection coefficients tend to be less biased and may have better out-of-sample prediction performance than the penalized coefficients: http://fmwww.bc.edu/RePEc/scon2019/chicago19_Liu.pdf
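
Both sets of predictions can be generated from the same fitted lasso and compared. A minimal sketch on the built-in auto dataset (a placeholder; the new variable names price_pen and price_post are hypothetical):

```stata
* Sketch only: auto.dta is a placeholder; price_pen and price_post are made-up names
sysuse auto, clear
lasso linear price mpg headroom trunk weight length turn displacement gear_ratio i.foreign, rseed(1234)

* Predictions from the penalized (shrunken) coefficients: the default
predict double price_pen, penalized

* Predictions from the selected variables refit by ordinary linear regression
predict double price_post, postselection

* Goodness of fit under each set of coefficients
lassogof, penalized
lassogof, postselection
```

These comparisons are in-sample. To judge out-of-sample performance, the same lassogof calls would be run with over() on a variable marking the training and testing samples.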

Sample command sequence

splitsample , generate(sample) nsplit(2) rseed(1234)
keep if sample==1
lasso logit outcome predictor1 predictor2 predictor3 i.predictor4, selection(cv) rseed(1234) folds(10)
est store model1
lassocoef model1, display(coef, penalized) sort(coef, penalized)
predict double outcome_predicted, pr


calibrationbelt outcome outcome_predicted, devel("internal") clevel1(0.95) clevel2(0.99) maxDeg(4) thres(0.95)
cvauroc outcome outcome_predicted, kfold(10) seed(1972) fit detail graphlowess
rocreg outcome outcome_predicted, bseed(123456)


******************** Example from StataCorp Youtube video
lasso logit outcome predictor1 predictor2 predictor3 i.predictor4, selection(cv) rseed(1234) folds(10)
est store model1
cvplot // Cross-validation plot: shows the value of lambda at which the cross-validation function is minimized
est store cv
lassoknots, display(nonzero osr2 bic) // displays info about all models fit during CV
* Select a specific model based on BIC or number-of-coefficients criteria
lassoselect id = 4 // ID read from the lassoknots table; here, the model with the lowest BIC
cvplot
est store minBIC

**  Adaptive LASSO model
lasso logit outcome predictor1 predictor2 predictor3 i.predictor4, selection(adaptive) rseed(1234) folds(10)
est store model1
est store  adaptive

** Compare variables included in various models, with largest standardized coefficients displayed at top
lassocoef cv minBIC adaptive, sort(coef, standardized) nofvlabel

** Goodness of Fit of model on the test sample
lassogof cv minBIC adaptive, over(sample) postselection
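 * Can choose the model with the smallest prediction error (mean squared error for linear models, deviance for logit models) and the largest R-squared (deviance ratio for logit models) in the testing dataset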



********************************** LASSO INFERENCE
webuse cattaneo2
dsregress .........


Sources

The Stata Blog » An introduction to the lasso in Stata

The Stata Blog » Using the lasso for inference in high-dimensional models

Using lasso and related estimators for prediction (stata.com)

Lasso | Stata

Lasso for prediction and model selection | Stata

lasso18 (stata.com)

Applying Machine Learning Techniques in Stata to Predict Health Outcomes Using HIV-related Data – YouTube

Predicting the individualized risk of poor adherence to ART medication among adolescents living with HIV in Uganda: the Suubi+Adherence study – PMC (nih.gov) – calibrationbelt

http://fmwww.bc.edu/RePEc/scon2019/chicago19_Liu.pdf

Lasso for prediction and model selection (youtube.com)
