The Coursera course I am taking is Regression Modeling in Practice, and this week (Week 2) covers the basics of linear regression. I decided to use the GapMinder dataset and run linear regression models to assess the association between urbanicity and breast cancer rate. Urbanicity is the 2008 urban population (% of total), where urban population refers to people living in urban areas as defined by national statistical offices (calculated using World Bank population estimates and urban ratios from the United Nations World Urbanization Prospects). Breast cancer rate is the number of new breast cancer cases per 100,000 females in 2002.
The code I wrote is accessible here
Output 1: Mean urban rate and Centered mean urban rate
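As a minimal sketch of the centering step behind this output: the mean is subtracted from every observation, so the centered variable has a mean of (numerically) zero. The values and the variable names `urbanrate` and `urbanrate_c` below are made up for illustration; they are not taken from the GapMinder data or from my linked code.

```python
from statistics import mean

# Hypothetical urban-rate values (% of total population), for illustration only.
# These are NOT GapMinder observations.
urbanrate = [24.0, 46.5, 65.2, 88.9, 30.4]

# Centering: subtract the sample mean from every observation.
urban_mean = mean(urbanrate)
urbanrate_c = [x - urban_mean for x in urbanrate]

# The centered variable should average out to zero (up to floating-point error).
print(f"mean of urbanrate:          {urban_mean:.4f}")
print(f"mean of centered urbanrate: {mean(urbanrate_c):.4f}")
```

Checking that the centered mean is essentially zero is a quick way to confirm the centering was done on the same set of rows that goes into the regression.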
Figure 1: Scatterplot for the Association Between (Non-Centered) Urbanicity and Breast Cancer Rate
Figure 2: Scatterplot for the Association Between Mean-Centered Urbanicity and Breast Cancer Rate
Output 2: Regression Model for the Association Between (Non-Centered) Urbanicity and Breast Cancer Rate
Output 3: Regression Model for the Association Between Mean-Centered Urbanicity and Breast Cancer Rate
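To illustrate how the two regression models relate, here is a small sketch that fits a least-squares line by hand to made-up numbers, once with the raw predictor and once with the centered one. The helper `ols_fit` and all data values are hypothetical and only stand in for the real analysis; this is not the code linked above.

```python
from statistics import mean

def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical values for illustration only (NOT GapMinder observations).
urbanrate = [24.0, 46.5, 65.2, 88.9, 30.4]      # % of total population
breastcancer = [18.0, 30.5, 45.0, 70.2, 22.3]   # new cases per 100,000 females

# Center the predictor around its mean.
urbanrate_c = [x - mean(urbanrate) for x in urbanrate]

b0, b1 = ols_fit(urbanrate, breastcancer)        # non-centered model
b0_c, b1_c = ols_fit(urbanrate_c, breastcancer)  # centered model

print(f"non-centered: intercept = {b0:.4f}, slope = {b1:.4f}")
print(f"centered:     intercept = {b0_c:.4f}, slope = {b1_c:.4f}")
```

In this sketch the slope is the same in both fits, while the intercept of the centered model equals the mean of the response. That is the main reason to center: the intercept becomes the expected breast cancer rate at the average urban rate rather than at an urban rate of 0%.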
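Results: Urbanicity and breast cancer rate are significantly and positively associated (Beta = 0.5616, p < 0.001). In other words, each additional percentage point of urban population is associated with about 0.56 more new breast cancer cases per 100,000 females.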