Machine Learning: k-means cluster analysis with Python

A k-means cluster analysis was conducted to identify underlying subgroups of adolescents based on the similarity of their responses on 18 variables representing characteristics that could affect adolescents' self-esteem. Clustering variables included gender, ethnicity (Hispanic, White, Black, Native American, Asian), age, two binary variables measuring whether or not the adolescent had ever used …
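
As a rough illustration of the clustering step described above, here is a minimal k-means sketch in plain Python. It is not the code from the post: the six two-dimensional points are made up, the function names are hypothetical, and the centroids are seeded with a simple farthest-point heuristic (an assumption chosen to keep the example deterministic, not the usual random initialization).

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100):
    """Toy k-means: alternate assignment and centroid update until stable."""
    # Assumption: deterministic farthest-point seeding instead of random starts.
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(farthest)

    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels, centroids


# Made-up data: two visibly separate groups of observations.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

In a real analysis with mixed binary and quantitative variables, the clustering variables would normally be standardized first so that no single variable dominates the distance calculation.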

Machine Learning: Lasso Regression Analysis with Python

A lasso regression analysis was conducted to identify a subset of variables from a pool of 23 categorical and quantitative predictor variables that best predicted a quantitative response variable measuring adolescents’ grade point average (GPA). Categorical predictors included gender and a series of 5 binary categorical variables for race and ethnicity (Hispanic, White, Black, Native …
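
To show how lasso selects a subset of predictors, the sketch below runs coordinate descent with soft-thresholding on a tiny made-up dataset. It is an illustration under stated assumptions, not the analysis from the post: the two predictors are already centered (so no intercept is fitted), the objective is taken to be (1/2n)·||y − Xb||² + alpha·||b||₁, and all names and numbers are hypothetical.

```python
def soft_threshold(value, alpha):
    """Shrink value toward zero by alpha; values within [-alpha, alpha] become 0."""
    if value > alpha:
        return value - alpha
    if value < -alpha:
        return value + alpha
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimize (1/2n)*||y - Xb||^2 + alpha*||b||_1 one coefficient at a time."""
    n, p = len(X), len(X[0])
    coefs = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # sum of squares of feature j
            for i in range(n):
                pred_without_j = sum(X[i][m] * coefs[m] for m in range(p) if m != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            coefs[j] = soft_threshold(rho / n, alpha) / (z / n)
    return coefs


# Made-up, centered predictors: x0 is a strong predictor, x1 a weak one.
X = [[-2.0, 1.0], [-1.0, -1.0], [0.0, 0.0], [1.0, -1.0], [2.0, 1.0]]
y = [-5.8, -3.2, 0.0, 2.8, 6.2]  # generated as 3*x0 + 0.2*x1

coefs = lasso_coordinate_descent(X, y, alpha=0.5)
print(coefs)
```

With this penalty the strong predictor keeps a shrunken coefficient, while the weak predictor's coefficient is set exactly to zero, which is the behavior that lets lasso act as a variable-selection method.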

Machine Learning: Building a Random Forest with Python

Random forest analysis was performed to evaluate the importance of a series of explanatory variables in predicting regular smoking among adolescents, a binary categorical response variable. The following explanatory variables were included as possible contributors to a random forest evaluating the response variable: gender, age, race/ethnicity (Hispanic, White, Black, Native American and Asian), alcohol …

Machine Learning: Growing a Decision Tree with Python

Decision tree analysis was performed to test nonlinear relationships among a series of explanatory variables and a binary, categorical response variable. The training sample and the test sample were set at a ratio of 40/60. For the present analyses, the maximum number of nodes was limited to 5. The following explanatory variables were included as …
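
The 40/60 split mentioned above can be sketched as follows. This is a minimal illustration, not the post's code: it reads the ratio in the order stated (40% training, 60% test), which is an assumption, and the function name, seed, and the 100 placeholder rows are all hypothetical.

```python
import random


def split_train_test(rows, train_fraction=0.4, seed=123):
    """Shuffle a copy of the rows, then cut it into training and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder data: 100 row identifiers standing in for observations.
rows = list(range(100))
train, test = split_train_test(rows)
print(len(train), len(test))
```

The tree would then be fitted on `train` only and evaluated on `test`, so that the reported accuracy reflects observations the model has not seen.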

Logistic regression model with Python

Objective: Assess the association between income per person, alcohol consumption and cancer rate using logistic regression. Independent variables: Income per person and alcohol consumption. Income per person: 2010 Gross Domestic Product per capita in constant 2000 US$. Alcohol consumption: 2008 alcohol consumption per adult (age 15+) liters, recorded and estimated average alcohol consumption, adult (15+) …

Multiple Regression and Regression Diagnostics with Python

Objective: Perform multiple regression modeling to identify indicators associated with breast cancer, and conduct regression diagnostics on the model. Indicators of interest are: urbanization rate, life expectancy, CO2 emission, income per person, alcohol consumption and employment rate. The dependent variable is breast cancer rate, which is the 2002 breast cancer new cases per …