Logistic regression model with Python

Objective: Assess the association between income per person, alcohol consumption and breast cancer rate using logistic regression.

Independent variables: Income per person and alcohol consumption. Income per person is the 2010 Gross Domestic Product per capita in constant 2000 US$. Alcohol consumption is the 2008 recorded and estimated average consumption of pure alcohol per adult (age 15+), in liters.

Dependent variable: Breast cancer rate, measured as the number of new breast cancer cases per 100,000 females in 2002.

Data management: All variables of interest are continuous quantitative variables, so each was recoded into two categories (binary). A breast cancer rate less than or equal to 20 per 100,000 females was coded as a low breast cancer rate (0), and a rate greater than 20 per 100,000 females was coded as a high rate (1).

Income per person less than or equal to 5,000 US$ was coded as low income per person (0), and income greater than 5,000 US$ was coded as high income per person (1).

Alcohol consumption less than or equal to 5 liters was coded as low alcohol consumption (0), and consumption greater than 5 liters was coded as high alcohol consumption (1).
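The recoding rules above can be sketched with pandas as follows. This is only an illustration, not the code used for the analysis: the column names (incomeperperson, alcconsumption, breastcancerper100th), the helper binarize and the four toy rows are assumptions made for the example. Missing values are kept as missing instead of being silently coded as 0.

```python
import numpy as np
import pandas as pd


def binarize(series, threshold):
    """Code values > threshold as 1 and values <= threshold as 0.

    Missing or non-numeric entries stay missing (NaN).
    Hypothetical helper, not taken from the original analysis.
    """
    values = pd.to_numeric(series, errors="coerce")
    return (values > threshold).astype(float).where(values.notna())


# Toy rows for illustration only; these are NOT the real data.
df = pd.DataFrame({
    "incomeperperson": [1200.0, 5000.0, 27000.0, None],
    "alcconsumption": [2.5, 5.0, 11.3, 7.1],
    "breastcancerper100th": [18.0, 20.0, 85.2, 30.1],
})

# Thresholds as described in the text: 5,000 US$, 5 liters, 20 per 100,000.
df["income_high"] = binarize(df["incomeperperson"], 5000)
df["alcohol_high"] = binarize(df["alcconsumption"], 5)
df["cancer_high"] = binarize(df["breastcancerper100th"], 20)

print(df[["income_high", "alcohol_high", "cancer_high"]])
```

Note that a value exactly equal to the threshold (for example 5,000 US$) falls in the low category, matching the "less than or equal to" rule.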

Python code:

The Python code written to perform the analysis is accessible here.




The bivariate analysis of the association between income per person and breast cancer rate shows that the odds of having a high breast cancer rate were about 16 times greater for countries with high income per person (OR = 15.78, 95% CI (3.64 – 68.31), p-value < 0.001). After controlling for alcohol consumption, countries with high income per person still had 12 times greater odds of having a high breast cancer rate (OR = 12.00, 95% CI (2.72 – 52.85), p-value = 0.001). Adjusting for income, countries with high alcohol consumption also had about 3 times greater odds of having a high breast cancer rate (OR = 3.02, 95% CI (1.34 – 6.85), p-value = 0.008).

These results support the hypothesis of an association between income and breast cancer rate. There was no evidence that alcohol consumption confounds this relationship.
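For readers who want to see how odds ratios, confidence intervals and p-values of this kind are typically obtained, here is a minimal sketch using the statsmodels formula API. It is not the original analysis code: the data are synthetic (randomly generated with made-up effect sizes), the variable names income_high, alcohol_high and cancer_high are assumptions, and the helper odds_ratios is hypothetical. The odds ratio is the exponential of the fitted coefficient, and the 95% CI is the exponential of the coefficient's confidence limits.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic binary data for illustration only (NOT the real data).
rng = np.random.default_rng(0)
n = 200
income_high = rng.integers(0, 2, n)
alcohol_high = rng.integers(0, 2, n)
# Made-up true effects on the log-odds scale.
log_odds = -1.5 + 2.0 * income_high + 1.0 * alcohol_high
cancer_high = rng.binomial(1, 1.0 / (1.0 + np.exp(-log_odds)))
df = pd.DataFrame({
    "income_high": income_high,
    "alcohol_high": alcohol_high,
    "cancer_high": cancer_high,
})


def odds_ratios(formula, data):
    """Fit a logistic regression and return OR, 95% CI and p-value per predictor."""
    model = smf.logit(formula, data=data).fit(disp=0)
    ci = model.conf_int()  # columns 0 and 1 hold the lower and upper limits
    table = pd.DataFrame({
        "OR": np.exp(model.params),
        "CI lower": np.exp(ci[0]),
        "CI upper": np.exp(ci[1]),
        "p-value": model.pvalues,
    })
    return table.drop(index="Intercept")


# Bivariate (crude) model: income only.
crude = odds_ratios("cancer_high ~ income_high", df)
# Adjusted model: income controlling for alcohol consumption.
adjusted = odds_ratios("cancer_high ~ income_high + alcohol_high", df)

print(crude.round(2))
print(adjusted.round(2))
```

Comparing the income row of the crude table with the same row of the adjusted table is one way to see how much the estimate changes once alcohol consumption is added to the model.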
