
Analysis of Categorical Outcome Data across Groups in Stata

December 3, 2021

Situation: yes/no or other categorical outcomes being compared across groups

Data Preparation

0-1 coding: Ensure that categorical outcome and exposure variables are coded as 0 = no, 1 = yes. This is not required for the chi-square test or logistic regression, but it is a prerequisite for Stata's epidemiological commands cc and cs, which report results such as risk differences, odds ratios and risk ratios.
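As a minimal sketch, assuming a hypothetical yes/no variable pneumo12 that was entered as 1 = yes, 2 = no, recode can create a 0/1 copy without overwriting the original (the data and variable names are made up for illustration):

```stata
* Hypothetical data: pneumo12 coded 1 = yes, 2 = no
clear
input id pneumo12
1 1
2 2
3 1
4 2
5 2
end

* Create a 0/1 copy: 1 (yes) stays 1, 2 (no) becomes 0
recode pneumo12 (1 = 1) (2 = 0), generate(pneumonia)

* Label the new codes so tables stay readable
label define noyes 0 "No" 1 "Yes"
label values pneumonia noyes

* Cross-check the old coding against the new one
tab pneumo12 pneumonia
```

Keeping the original variable makes it easy to verify the recode with the final cross-tabulation.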

Ensure that categorical groups are coded in consecutive integers. For example, 0 = illiterate, 1 = primary school, 3 = middle school is bad; 0 = illiterate, 1 = primary school, 2 = middle school is good. This matters for commands that use the numeric codes as scores, such as the trend test in tabodds. To check the coding, use codebook or tab1 var, nolabel.
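A sketch of the check-and-fix step, using a hypothetical variable educ that was entered with a gap (0, 1, 3):

```stata
* Hypothetical data: educ coded 0 = illiterate, 1 = primary, 3 = middle (gap at 2)
clear
input id educ
1 0
2 1
3 3
4 1
5 3
end

* Show the raw numeric codes rather than any value labels
tab1 educ, nolabel

* Close the gap so the codes run 0, 1, 2
recode educ (3 = 2)

* Confirm the new coding
codebook educ
```

recode changes only the numbers; if educ carries value labels, the label for the new code 2 has to be redefined as well.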

Understand the data

A key first step is to see which groups have a higher or lower proportion of the outcome. The col option shows the percentage of each outcome within each group.

tab outcomeVar groupVar, col

Check whether the two groups have the same proportion of the outcome (prtest with by() needs a group variable with exactly two levels):

prtest outcomeVar, by(groupVar)
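When only summary counts are available, the immediate form prtesti runs the same test. A sketch with hypothetical counts (30 outcomes among 100 in one group, 18 among 120 in the other); the count option tells Stata that the second and fourth numbers are counts rather than proportions:

```stata
* Hypothetical summary data: group 1 has 30 outcomes among 100, group 2 has 18 among 120
prtesti 100 30 120 18, count

* Stored results include the two proportions and the z statistic
display "p1 = " r(P_1) "   p2 = " r(P_2) "   z = " r(z)
```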

Hypothesis testing using chi-square

tab outcomeVar groupVar, col chi

Use Fisher's exact test (the exact option) if one or more cells have an expected count below 5. The expected option displays the expected counts so you can check this.

tab outcomeVar groupVar, col chi exact
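A sketch on a small hypothetical table where some expected counts fall below 5. The data are entered as cell counts and expanded to one row per person; expected prints the expected counts under the observed ones, and exact adds Fisher's exact test:

```stata
* Hypothetical small table entered as cell counts
clear
input group outcome n
0 0 8
0 1 2
1 0 3
1 1 6
end
* One row per person
expand n
drop n

* Expected counts under independence (two are below 5 here)
tab outcome group, expected

* Chi-square and Fisher's exact test side by side
tab outcome group, col chi2 exact
display "Fisher's exact p = " r(p_exact)
```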

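If you are wondering why the chi-square test and prtest give the same p-value, that is expected: for a 2×2 table, the Pearson chi-square statistic equals the square of the z statistic from prtest. The advantage of prtest is that it also reports the 95% CIs of each proportion and of their difference.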
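To see the equivalence for yourself, here is a sketch on hypothetical 2×2 data (30% vs 15% with the outcome): the squared z from prtest matches the Pearson chi-square from tabulate.

```stata
* Hypothetical 2x2 data entered as cell counts, expanded to one row per person
clear
input group outcome n
0 0 70
0 1 30
1 0 102
1 1 18
end
expand n
drop n

* Two-proportion z test
prtest outcome, by(group)
scalar z_prtest = r(z)

* Pearson chi-square on the same table
tab outcome group, chi2
scalar chi2_tab = r(chi2)

* For a 2x2 table these agree: chi-square = z squared
display "z^2 = " z_prtest^2 "   chi2 = " chi2_tab
```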

Comparing yes/no outcomes across two groups only

Odds ratio calculation: cc outcomeVar groupVar
Alternatively: logit outcomeVar i.groupVar, or
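A sketch with hypothetical case-control counts (20 of 60 cases and 30 of 140 controls exposed) showing that cc and logit give the same odds ratio, equal to the cross-product (20 × 110) / (40 × 30):

```stata
* Hypothetical counts: case (0/1) by exposed (0/1)
clear
input exposed case n
1 1 20
0 1 40
1 0 30
0 0 110
end
expand n
drop n

* Odds ratio from the epidemiological table command (case variable first)
cc case exposed
scalar or_cc = r(or)

* Same odds ratio from logistic regression
logit case i.exposed, or
scalar or_logit = exp(_b[1.exposed])

display "cc: " or_cc "   logit: " or_logit "   by hand: " (20*110)/(40*30)
```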

Risk ratio calculation: cs outcomeVar groupVar (cs also reports the risk difference)
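The immediate form csi takes the four cell counts directly, in the order exposed cases, unexposed cases, exposed noncases, unexposed noncases. A sketch with hypothetical counts (risk 30/100 among the exposed, 18/120 among the unexposed):

```stata
* Hypothetical counts: 30 cases and 70 noncases exposed, 18 cases and 102 noncases unexposed
csi 30 18 70 102

* Risk ratio = (30/100) / (18/120) = 2; risk difference = 0.30 - 0.15 = 0.15
display "RR = " r(rr) "   RD = " r(rd)
```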

Try these out!

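preserve
* Load a dataset with 0/1 variables pneumonia and vaccine (file name not shown)
use , clear
describe pneumonia vaccine
codebook pneumonia vaccine
count
tab pneumonia vaccine, col
tab pneumonia vaccine, col chi
prtest pneumonia, by(vaccine)
* Odds ratio, then risk ratio and risk difference
cc pneumonia vaccine
cs pneumonia vaccine
logit pneumonia i.vaccine, or
restore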

Comparing yes/no outcomes across three or more groups

In this case, we can use Mantel-Haenszel techniques: tabodds and mhodds are your friends. Alternatively, you could simply run a logistic regression (logit outcomeVar i.groupVar, or).
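A sketch with a hypothetical three-level group (coded 0, 1, 2) showing the three approaches side by side; the counts are made up for illustration:

```stata
* Hypothetical data: outcome (0/1) across groups 0, 1, 2, entered as cell counts
clear
input group outcome n
0 0 80
0 1 20
1 0 70
1 1 30
2 0 55
2 1 45
end
expand n
drop n

* Odds in each group, with a test of homogeneity and a score test for trend
tabodds outcome group

* Odds ratios relative to the first (lowest) group
tabodds outcome group, or

* Mantel-Haenszel odds ratio comparing group 2 with group 0
mhodds outcome group, compare(2,0)

* Logistic regression: odds ratios for groups 1 and 2 versus group 0
logit outcome i.group, or
```

With no adjustment variables, mhodds and logit agree on the crude odds ratio; listing confounders after the group variable in mhodds (or adding them as covariates in logit) gives adjusted estimates.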