Age-Sex Direct Standardization in Stata

When using person level / individual level data (not aggregated data), quick reference commands are:

  • Age standardization within sex
    • proportion msvi, stdize(ageGrp) stdweight(per) over(sex)
      • per = percent of population in age group calculated separately for males and females
    • dstdize msvi p ageGrp if sex==1, by(sex) using(stdPop) print AND dstdize msvi p ageGrp if sex==2, by(sex) using(stdPop) print
      • p = 1 in main dataset, and
      • p = population size for age-sex group in using/standar population dataset
  • Age-sex standardized prevalence of full sample
    • dstdize msvi p ageGrp sex , by(byVar) using(stdPop) print
      • byVar = a constant within sample
  • Incorrect usage
    • proportion msvi, stdize(ageGrp sex) stdweight(per)
    • dstdize msvi p ageGrp , by(sex) using(stdPop) print
    • dstdize msvi p ageGrp sex , by(sex) using(stdPop) print

Datasets

Main dataset

Working with individual level data, we can use the dstdize command to generate direct standadrized by age-sex results. Variables to be created in main dataset:

  • Variable p=1 to represent each individual
  • ageGrp
  • sex

Also, here msvi is my outcome variable coded as 0 = no outcome /1 = outcome observed. Listing of this file

* Example generated by -dataex-. * To install: ssc install dataex. clear input long id float p byte(ageGrp sex) float msvi 103845 1 1 1 0 102853 1 1 1 0 103914 1 1 1 0 103936 1 1 1 0 102329 1 1 1 0 100825 1 1 1 1 103638 1 1 1 0 103608 1 1 1 0 101622 1 1 1 0 104438 1 2 1 0 103501 1 2 1 0 104502 1 2 1 0 101033 1 2 1 0 104232 1 2 1 0 104856 1 2 1 0 104713 1 2 1 0 100355 1 2 1 0 104630 1 2 1 0 102349 1 2 1 0 100646 1 2 1 0 100452 1 2 1 1 100116 1 2 1 0 100622 1 2 1 0 100321 1 2 1 0 100501 1 2 1 0 102826 1 2 1 0 100956 1 2 1 0 104755 1 2 1 0 102945 1 2 1 0 104201 1 2 1 0 100918 1 2 1 0 101403 1 2 1 0 101311 1 3 1 0 101501 1 3 1 0 100809 1 3 1 0 103517 1 3 1 0 102358 1 3 1 0 101503 1 3 1 0 100915 1 3 1 0 100505 1 3 1 0 104141 1 3 1 0 102020 1 3 1 0 100827 1 3 1 1 101028 1 3 1 0 103101 1 3 1 0 100944 1 3 1 0 104901 1 3 1 0 101103 1 3 1 0 101535 1 1 2 0 100122 1 1 2 0 104508 1 1 2 0 101035 1 1 2 0 102520 1 1 2 0 104617 1 1 2 0 101732 1 1 2 0 103252 1 1 2 0 104313 1 1 2 0 102212 1 1 2 0 102029 1 1 2 0 102145 1 1 2 0 101606 1 1 2 0 101204 1 1 2 0 101409 1 1 2 0 102238 1 1 2 0 103733 1 1 2 0 102030 1 1 2 0 104014 1 1 2 1 100652 1 1 2 0 102720 1 2 2 0 100654 1 2 2 0 103934 1 2 2 1 104655 1 2 2 1 104709 1 2 2 0 100946 1 2 2 0 103901 1 2 2 0 102524 1 2 2 0 102636 1 2 2 1 101017 1 2 2 0 100939 1 2 2 0 100921 1 2 2 0 101358 1 2 2 0 100222 1 2 2 0 100931 1 2 2 1 103640 1 2 2 0 104344 1 2 2 0 100439 1 2 2 0 101858 1 2 2 0 103122 1 2 2 0 103120 1 2 2 1 103809 1 2 2 0 101723 1 2 2 0 102232 1 2 2 0 100217 1 2 2 0 104925 1 3 2 1 103229 1 3 2 0 102730 1 3 2 0 103527 1 3 2 1 102551 1 3 2 0 103011 1 3 2 0 101533 1 3 2 1 end label values ageGrp ageGrp label def ageGrp 1 "50-59", modify label def ageGrp 2 "60-69", modify label def ageGrp 3 "70-79", modify label values sex sexlb label def sexlb 1 "Male", modify label def sexlb 2 "Female", modify label values msvi yn label def yn 0 "No", modify label def yn 1 "Yes", modify label var ageGrp "Age" label var sex "Gender" label var msvi "Mod.-Sev. VI" save prevData.dta, replace table ageGrp msvi sex ------------------------------------ | Gender and Mod.-Sev. VI | -- Male -- - Female - Age | No Yes No Yes ----------+------------------------- 50-59 | 8 1 19 1 60-69 | 22 1 20 5 70-79 | 15 1 4 3 ------------------------------------ tab ageGrp sex, col nof | Gender Age | Male Female | Total -----------+----------------------+---------- 50-59 | 18.75 38.46 | 29.00 60-69 | 47.92 48.08 | 48.00 70-79 | 33.33 13.46 | 23.00 -----------+----------------------+---------- Total | 100.00 100.00 | 100.00
Code language: JavaScript (javascript)

In another stdPop.dta file, save the following data about the standard population:

  • p = population size of that age-sex group.
  • ageGrp
  • sex
Standard Population dataset

In another stdPop.dta file, save the following data about the standard population:

  • p = population size of that age-sex group.
  • ageGrp
  • sex

Standard data example

* Example generated by -dataex-. To install: ssc install dataex clear input byte(ageGrp sex) double p 1 1 70193 2 1 41400 3 1 21287 1 2 61583 2 2 38746 3 2 21172 end label values ageGrp ageGrp label def ageGrp 1 "50-59", modify label def ageGrp 2 "60-69", modify label def ageGrp 3 "70-79", modify label def ageGrp 4 ">=80", modify label values sex sexlb label def sexlb 1 "Male", modify label def sexlb 2 "Female", modify label var ageGrp "Age" label var sex "Gender" label var p "Each Dist age-group wise male, female population" ************* bysort sex: egen per = pc(p) save stdPop.dta, replace
Code language: Stata (stata)

As we can see, there are differnces in the age distrbution of males and females in sample compred to the standard population.

Age sex structures of the two datasets

use prevData.dta, clear tab ageGrp sex, col nof | Gender Age | Male Female | Total -----------+----------------------+---------- 50-59 | 18.75 38.46 | 29.00 60-69 | 47.92 48.08 | 48.00 70-79 | 33.33 13.46 | 23.00 -----------+----------------------+---------- Total | 100.00 100.00 | 100.00
Code language: Stata (stata)

Standard data example

use stdPop.dta, clear table ageGrp sex , c(min per) format(%5.2f) -------------------------- | Gender Age | Male Female ----------+--------------- 50-59 | 52.82 50.69 60-69 | 31.16 31.89 70-79 | 16.02 17.43 --------------------------
Code language: Stata (stata)

As we can see, there are differnces in the age distrbution of males and females in sample compred to the standard population.

Age Standardization with-in Sex

use prevData.dta, clear merge m:1 ageGrp sex using stdPop.dta * CRUDE PREVALENCE proportion msvi, over(sex) -------------------------------------------------------------- | Logit Over | Proportion Std. Err. [95% Conf. Interval] -------------+------------------------------------------------ Yes | Male | .0625 .0349386 .020012 .1787423 Female | .1730769 .0524627 .0918451 .3022431 * AGE STANDARDIZATION WITHIN EACH SEX * CORRECT WAY proportion msvi, stdize(ageGrp) stdweight(per) over(sex) | Logit Over | Proportion Std. Err. [95% Conf. Interval] -------------+------------------------------------------------ Yes | Male | .0822521 .0577207 .0192773 .2900981 Female | .1638016 .0482005 .088793 .2825279 * Males dstdize msvi p age if sex==1, by(sex) using(stdPop) print Summary of Study Populations: sex N Crude Adj_Rate Confidence Interval -------------------------------------------------------------------------- 1 48 0.062500 0.082252 [ 0.000000, 0.195383] * Females dstdize msvi p age if sex==2, by(sex) using(stdPop) print Summary of Study Populations: sex N Crude Adj_Rate Confidence Interval -------------------------------------------------------------------------- 2 52 0.173077 0.163802 [ 0.069330, 0.258273]
Code language: Stata (stata)

Age-sex standardized [full population]

dstdize requires a by(var) mandatorily since usually the point of teh command is to compare prevalence across different populations. If if we are not interested in comparing across populations but just in estimating for our one population, what to do? Is actually quite simple. Create a new variable balled byVar (or any name that you wish) and make it constant across your entire sample.

gen byVar=1 dstdize msvi p ageGrp sex , by(byVar) using(stdPop) print byVar N Crude Adj_Rate Confidence Interval -------------------------------------------------------------------------- 1 100 0.120000 0.121203 [ 0.046850, 0.195556]
Code language: Stata (stata)

Wrong commands

*** WRONG COMMANDS that may seem correct when you are typing them out. **** Pay careful attention to the standard population output and **** these proportions being applied to the stratum-specific prevalences dstdize msvi p ageGrp , by(sex) using(stdPop) print sex N Crude Adj_Rate Confidence Interval -------------------------------------------------------------------------- 1 48 0.062500 0.081689 [ 0.000000, 0.193017] 2 52 0.173077 0.160447 [ 0.067534, 0.253361] ** Here , the population age disribution (not the age-sex distribution) is ** being applied to both sexes ** ALSO WRONG dstdize msvi p ageGrp sex , by(sex) using(stdPop) print sex N Crude Adj_Rate Confidence Interval -------------------------------------------------------------------------- Male 48 0.062500 0.042966 [ 0.000000, 0.102061] Female 52 0.173077 0.078237 [ 0.033115, 0.123360] * The sum of standard population distrbution of males or females in not 100% * but closer to 50%. * This would explain almost half the prevalence estimate observed here * comapred to corerct results
Code language: Stata (stata)