# Analysis of Categorical Outcome Data across Groups in Stata

Situation: yes/no (or other categorical) outcomes compared across groups.

## Data Preparation

0-1 coding: Ensure that binary (yes/no) outcome and exposure variables are coded as `0 = no, 1 = yes`. This is not required for chi-square tests or logistic regression, but it is a prerequisite for the epidemiological commands `cc` and `cs`, which report odds ratios, risk ratios, risk differences and related measures.
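A minimal sketch of getting a yes/no variable into 0/1 form. It assumes a hypothetical variable `smoke12` that arrives coded `1 = yes, 2 = no` (a common layout in survey exports); the data are made up.

```stata
* Hypothetical variable smoke12 coded 1 = yes, 2 = no (made-up data)
clear
input byte smoke12
1
2
2
1
.
end

* Recode 2 ("no") to 0; 1 stays 1 and missing stays missing
recode smoke12 (2 = 0), generate(smoker)

* Attach value labels so tables still read no/yes
label define noyes 0 "no" 1 "yes"
label values smoker noyes

* Cross-check old against new coding, including missing values
tab smoke12 smoker, missing
```

Creating a new variable with `generate()` rather than overwriting the original keeps the raw coding available for the cross-check table.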

Ensure that categorical groups are coded in increments of 1: `0 = illiterate, 1 = primary school, 3 = middle school` is bad; `0 = illiterate, 1 = primary school, 2 = middle school` is good. To check the coding, use `codebook var` or `tab1 var, nolabel`.
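A sketch of checking and closing a gap in the codes, using a made-up education variable `educ` (a hypothetical name) that reproduces the bad `0, 1, 3` coding above:

```stata
* Hypothetical education variable with a gap in its codes (made-up data)
clear
input byte educ
0
1
3
3
1
end
label define educlbl 0 "illiterate" 1 "primary school" 3 "middle school"
label values educ educlbl

* Show the numeric codes rather than the labels: 0, 1, 3 reveals the gap
tab1 educ, nolabel

* Close the gap by recoding 3 to 2 in a new variable
recode educ (3 = 2), generate(educ2)
label define educlbl2 0 "illiterate" 1 "primary school" 2 "middle school"
label values educ2 educlbl2

* Cross-check old against new coding
tab educ educ2
```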

## Understand the data

A key first step is to see which participant groups have higher or lower levels of the outcome. Column percentages (`col`) give the distribution of the outcome within each group:

`tab outcomeVar groupVar, col`
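To see what the column percentages look like, here is a small made-up dataset with hypothetical variables `ill` and `exposed`, entered as counts and then expanded to one row per person:

```stata
* Made-up counts: 20/50 ill among exposed, 10/50 ill among unexposed
clear
input byte ill byte exposed int n
1 1 20
0 1 30
1 0 10
0 0 40
end

* One row per participant
expand n
drop n

* Column percentages: share ill within each exposure group
tab ill exposed, col
```

With these counts the table shows 40% of the exposed group and 20% of the unexposed group with the outcome.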

## Check whether two groups have the same proportion of the outcome

`prtest outcomeVar, by(groupVar)`
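`prtest` also has an immediate form, `prtesti`, that works from summary numbers instead of data in memory. The sketch below uses made-up group sizes and proportions and reproduces the z statistic by hand, assuming the usual two-sample z test with the pooled proportion in the standard error:

```stata
* Immediate form: group 1 has 120 obs with proportion .25,
* group 2 has 80 obs with proportion .40 (made-up numbers)
prtesti 120 .25 80 .40
scalar z_pr = r(z)

* The same z statistic by hand, using the pooled proportion
scalar p_pool = (120*.25 + 80*.40) / (120 + 80)
scalar z_hand = (.25 - .40) / sqrt(p_pool*(1 - p_pool)*(1/120 + 1/80))
display "prtesti z = " z_pr "   by hand = " z_hand
```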

Hypothesis testing using chi-square:

`tab outcomeVar groupVar, col chi2`

Use Fisher's exact test (the `exact` option) if one or more cells have an expected count below 5. Add the `expected` option to display the expected counts:

`tab outcomeVar groupVar, col chi2 exact`
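A sketch with made-up counts small enough that several expected counts fall below 5. The variables `ill`, `exposed` and the frequency weight `n` are hypothetical:

```stata
* Made-up small table: expected counts for the ill row are 3.5
clear
input byte ill byte exposed int n
1 1 6
0 1 2
1 0 1
0 0 7
end

* expected shows expected counts; exact adds Fisher's exact test
tab ill exposed [fweight = n], expected chi2 exact
scalar chi2_stat = r(chi2)
scalar p_pearson = r(p)
scalar p_fisher  = r(p_exact)
```

In this table the Fisher exact p-value is noticeably larger than the Pearson chi-square p-value, which is why the exact test is preferred when cells are sparse.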

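Wondering why you get the same p-value from the chi-square test and `prtest`? That is expected: for a 2 × 2 table, the Pearson chi-square statistic equals the square of the z statistic from `prtest`. The advantage of `prtest` is that it also reports 95% CIs for the proportions and their difference.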
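The equivalence can be checked directly. A sketch on made-up data (hypothetical variables `ill` and `exposed`) that compares the Pearson chi-square with the square of the `prtest` z statistic, and the chi-square p-value with the two-sided normal p-value for z:

```stata
* Made-up data: 20/50 ill among exposed, 10/50 among unexposed
clear
input byte ill byte exposed int n
1 1 20
0 1 30
1 0 10
0 0 40
end
expand n
drop n

* Pearson chi-square (no continuity correction)
tab ill exposed, chi2
scalar chi2_stat = r(chi2)
scalar p_chi2 = r(p)

* Two-sample test of proportions
prtest ill, by(exposed)
scalar z_pr = r(z)

display "chi2 = " chi2_stat "   z^2 = " z_pr^2
display "p (chi2) = " p_chi2 "   p (z) = " 2*normal(-abs(z_pr))
```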

## Comparing yes/no outcome across two groups only

Odds ratio: `cc outcomeVar groupVar` or `logit outcomeVar i.groupVar, or`
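A sketch of the odds ratio three ways on the same made-up 2 × 2 table (20 exposed and 10 unexposed cases; 30 exposed and 40 unexposed non-cases): by hand as ad/bc, with the immediate command `cci`, and with `logit`. The variable names are hypothetical:

```stata
* By hand: (exposed cases x unexposed non-cases) / (unexposed cases x exposed non-cases)
display "OR by hand = " (20*40)/(10*30)

* cci order: exposed cases, unexposed cases, exposed non-cases, unexposed non-cases
cci 20 10 30 40
scalar or_cc = r(or)

* Same table as frequency-weighted data for logit
clear
input byte ill byte exposed int n
1 1 20
1 0 10
0 1 30
0 0 40
end
logit ill i.exposed [fweight = n], or
scalar or_logit = exp(_b[1.exposed])
```

All three should give about 2.67, because a logistic model with a single binary predictor reproduces the sample odds ratio.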

Risk ratio and risk difference: `cs outcomeVar groupVar`
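The risk ratio compares the proportions with the outcome rather than the odds. A sketch with the immediate command `csi`, using the same made-up counts (20/50 exposed and 10/50 unexposed with the outcome):

```stata
* csi order: exposed cases, unexposed cases, exposed non-cases, unexposed non-cases
csi 20 10 30 40
scalar rr_cs = r(rr)
scalar rd_cs = r(rd)

* By hand: risk in exposed = 20/50, risk in unexposed = 10/50
display "RR by hand = " (20/50)/(10/50) "   RD by hand = " 20/50 - 10/50
```

Here the risk ratio is 0.40 / 0.20 = 2 and the risk difference is 0.20, while the odds ratio for the same table is about 2.67; the two measures diverge when the outcome is common.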

Try these out!

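```stata
preserve
* Stata Press example dataset (requires internet access)
use https://www.stata-press.com/data/r17/pneumoniacrt, clear
describe pneumonia vaccine
codebook pneumonia vaccine
count
* outcome by group, with column percentages
tab pneumonia vaccine, col
* Pearson chi-square test
tab pneumonia vaccine, col chi2
* two-sample test of proportions, with 95% CIs
prtest pneumonia, by(vaccine)
* odds ratio and risk ratio (outcome and exposure coded 0/1)
cc pneumonia vaccine
cs pneumonia vaccine
logit pneumonia i.vaccine, or
restore
```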

## Comparing yes/no outcome across three or more groups

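In this case, we can use Mantel-Haenszel techniques, and `tabodds` and `mhodds` are your friends: `tabodds` reports the odds of the outcome at each level of the group variable, with tests of homogeneity and linear trend, while `mhodds` estimates Mantel-Haenszel odds ratios, optionally adjusted for other variables. Alternatively, you can run a logistic regression with the group as a factor variable: `logit outcomeVar i.groupVar, or`.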
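A sketch for a three-level group variable with made-up counts. The variables `ill`, `grp` (coded 0/1/2) and the frequency weight `n` are hypothetical; the `mhodds` line compares level 2 with level 0:

```stata
* Made-up counts for three groups
clear
input byte ill byte grp int n
1 0 10
0 0 40
1 1 20
0 1 30
1 2 30
0 2 20
end

* Odds in each group, tests of homogeneity and linear trend
tabodds ill grp [fweight = n]

* Mantel-Haenszel odds ratio, group 2 versus group 0
mhodds ill grp [fweight = n], compare(2, 0)

* Logistic regression: odds ratios relative to the base level (group 0)
logit ill i.grp [fweight = n], or
scalar or_1v0 = exp(_b[1.grp])
scalar or_2v0 = exp(_b[2.grp])
```

The group odds are 10/40, 20/30 and 30/20, so the odds ratios against group 0 are about 2.67 and 6.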