An Algorithm for Predicting Lifetime Risk of Maternal Death from the World Development Indicators

Final report prepared for the Data Analysis and Interpretation Specialization

A Specialization Certificate authorized by Wesleyan University and offered through Coursera

Introduction

This project aims to identify the best predictors of lifetime risk of maternal death (%) among the World Development Indicators. The World Development Indicators are the World Bank's collection of development indicators, compiled from officially recognized international sources. The collection presents the most current and accurate global development data available, and includes national, regional and global estimates.

Maternal mortality is unacceptably high. About 830 women die from pregnancy- or childbirth-related complications around the world every day. According to the World Health Organization (WHO), almost all of these deaths occur in low-resource settings, and most could be prevented. In 2000, the UN Millennium Declaration identified improvement of maternal health as one of the eight fundamental goals for furthering human development. As part of Millennium Development Goal 5, the UN established the target of reducing the maternal mortality ratio by three-quarters between 1990 and 2015 for all national and regional populations.

Given the importance granted to maternal health in the global health and international development arena, it is relevant to identify the major Development Indicators that influence maternal mortality.

Method

Sample

This capstone data set is a subset of data extracted from the primary World Bank collection of development indicators. The sample consists of N=248 countries for the years 2012 and 2013.

Data management, variables and measures

The outcome of interest in this project is lifetime risk of maternal death (%). The world development indicators of interest used as predictors are presented in Table 1.

GDP per capita (current US$), Crude birth rate (per 1,000 people), Total fertility rate, and Strength of legal rights index were each split into two categories of equal size. The lower and higher categories were labeled low and high, or weak and strong for the legal rights index. Observations with missing values were dropped. The Python code written for the data management and analysis is accessible on GitHub.
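The recoding and missing-value handling described above can be sketched as follows. This is a minimal illustration, not the author's actual code: the data frame is synthetic, the helper name split_low_high is hypothetical, and the use of pandas qcut for the equal-size split is an assumption.

```python
import numpy as np
import pandas as pd

def split_low_high(series, labels=("low", "high")):
    """Split a quantitative series at its median into two equal-size categories."""
    return pd.qcut(series, q=2, labels=list(labels))

# Synthetic stand-in for the World Bank extract (values are made up).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x142_2013": rng.lognormal(9, 1.2, 12),  # GDP per capita (current US$)
    "x125_2013": rng.uniform(1.3, 7.6, 12),  # total fertility rate
})
df.loc[3, "x142_2013"] = np.nan              # simulate one missing value

# Drop observations with missing values, then recode into two categories.
clean = df.dropna().copy()
clean["gdp_cat"] = split_low_high(clean["x142_2013"])
clean["fertility_cat"] = split_low_high(clean["x125_2013"])
```

With an odd number of observations the two categories differ in size by one, since the median observation falls in the lower category.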

Table 1. World Development Indicators for the study on predictors of lifetime risk of maternal death. Source: The World Bank Group.

Number Variable name Variable description
1 x100_2013 Death rate, crude (per 1,000 people)
2 x11_2013 Adjusted net national income per capita (current US$)
3 x121_2013 Exports of goods and services (% of GDP)
4 x125_2013 Fertility rate, total (births per woman)
5 x139_2013 GDP at market prices (current US$)
6 x140_2013 GDP growth (annual %)
7 x142_2013 GDP per capita (current US$)
8 x143_2013 GDP per capita growth (annual %)
9 x149_2013 Health expenditure per capita (current US$)
10 x14_2013 Adjusted savings: consumption of fixed capital (% of GNI)
11 x150_2013 Health expenditure, total (% of GDP)
12 x155_2013 Improved sanitation facilities (% of population with access)
13 x156_2013 Improved water source (% of population with access)
14 x157_2013 Incidence of tuberculosis (per 100,000 people)
15 x169_2013 Labor force, female (% of total labor force)
16 x16_2013 Adjusted savings: education expenditure (% of GNI)
17 x171_2013 Life expectancy at birth, female (years)
18 x172_2013 Life expectancy at birth, male (years)
19 x173_2013 Life expectancy at birth, total (years)
20 x190_2013 Mortality rate, infant (per 1,000 live births)
21 x191_2013 Mortality rate, neonatal (per 1,000 live births)
22 x192_2013 Mortality rate, under-5 (per 1,000)
23 x1_2012 Access to electricity (% of population)
24 x204_2013 Out-of-pocket health expenditure (% of total expenditure on health)
25 x205_2012 Percentage of students in primary education who are female (%)
26 x67_2012 Cause of death, by communicable diseases and maternal, prenatal and nutrition conditions (% of total)
27 x68_2012 Cause of death, by injury (% of total)
28 x69_2012 Cause of death, by non-communicable diseases (% of total)
29 x58_2013 Birth rate, crude (per 1,000 people)
30 x29_2013 Age dependency ratio (% of working-age population)
31 x283_2013 Urban population (% of total)
32 x275_2013 Survival to age 65, male (% of cohort)
33 x274_2013 Survival to age 65, female (% of cohort)
34 x267_2013 Strength of legal rights index (0=weak to 12=strong)
35 x25_2013 Adolescent fertility rate (births per 1,000 women ages 15-19)
36 x223_2013 Population, female (% of total)
37 x222_2013 Population, ages 15-64 (% of total)
38 x221_2013 Population, ages 0-14 (% of total)

Analyses

As all predictors and the lifetime risk of maternal death response variable are quantitative, their distributions were evaluated by calculating summary statistics (mean, standard deviation, minimum and maximum). Frequency distributions of the variables recoded into categories were not examined, because the categories are of equal size by construction.
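A minimal sketch of how such summary statistics could be computed with pandas is shown below. The helper name describe_variables and the example values are hypothetical; the standard deviation here is the sample standard deviation (pandas default), which is an assumption about what the report used.

```python
import pandas as pd

def describe_variables(df):
    """Return count, mean, standard deviation, minimum and maximum per variable."""
    return df.agg(["count", "mean", "std", "min", "max"]).T

# Tiny made-up example with two of the variable names from Table 1.
example = pd.DataFrame({
    "x125_2013": [1.5, 2.5, 3.5],     # total fertility rate
    "x58_2013": [10.0, 20.0, 30.0],   # crude birth rate
})
summary = describe_variables(example)
print(summary)
```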

Scatter plots were drawn to visualize the bivariate associations between individual predictors and the risk of maternal death response variable. Analysis of variance (ANOVA) was used to test bivariate associations between the categorized predictors and the lifetime risk of maternal death response variable.
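To illustrate the ANOVA step, the sketch below computes the F statistic, p-value and R2 for a two-category predictor. It uses scipy.stats.f_oneway, which is an assumption (the original analysis may have used a different library), and the group values are made up.

```python
import numpy as np
from scipy import stats

def one_way_anova(groups):
    """Return F statistic, p-value and R^2 (between-group share of total variance)."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    f_stat, p_value = stats.f_oneway(*groups)
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_total = ((all_values - grand_mean) ** 2).sum()
    return f_stat, p_value, ss_between / ss_total

# Hypothetical risk of maternal death (%) in low- and high-fertility countries.
low = [0.1, 0.2, 0.3]
high = [1.0, 1.2, 1.4]
f_stat, p_value, r_squared = one_way_anova([low, high])
```

For a two-group comparison, R2 and F carry the same information: F = R2 / (1 - R2) × (n - 2), where n is the number of observations.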

Lasso regression with the least angle regression selection algorithm was used to identify the subset of variables that best predicted lifetime risk of maternal death. The lasso regression model was estimated on a training data set consisting of a random sample of 60% of the observations, and a test data set comprised the remaining 40%. Before conducting the lasso regression analysis, all predictor variables were standardized to have mean = 0 and standard deviation = 1. Cross validation was performed using k-fold cross validation with 10 folds. The change in the cross validation mean squared error at each step was used to identify the best subset of predictor variables. Predictive accuracy was measured by the mean squared error of the prediction algorithm estimated on the training data when applied to observations in the test data set.
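The procedure described above can be sketched with scikit-learn as follows. This is an illustrative outline under stated assumptions, not the author's code: the data are synthetic, the random seeds are arbitrary, and LassoLarsCV is assumed as the implementation of lasso with least angle regression and k-fold cross validation.

```python
import numpy as np
from sklearn.linear_model import LassoLarsCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 55 observations, 8 predictors, 2 of which matter.
rng = np.random.default_rng(42)
X = rng.normal(size=(55, 8))
y = 0.8 * X[:, 0] - 0.5 * X[:, 3] + rng.normal(scale=0.3, size=55)

# Standardize all predictors to mean 0 and standard deviation 1
# (done before splitting, mirroring the order given in the text).
X_std = StandardScaler().fit_transform(X)

# Random 60% training / 40% test split.
X_train, X_test, y_train, y_test = train_test_split(
    X_std, y, test_size=0.4, random_state=123
)

# Lasso via least angle regression; penalty chosen by 10-fold cross validation.
model = LassoLarsCV(cv=10).fit(X_train, y_train)

selected = np.flatnonzero(model.coef_)        # indices of retained predictors
train_mse = mean_squared_error(y_train, model.predict(X_train))
test_mse = mean_squared_error(y_test, model.predict(X_test))
r2_train = model.score(X_train, y_train)      # share of variance explained
print(selected, train_mse, test_mse, r2_train)
```

Predictors whose coefficient is shrunk to exactly zero are dropped from the model; comparing train_mse with test_mse indicates how well the fitted model generalizes.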

Categorized variables were used for ANOVA and box plot visualization only, and were not included in the lasso regression. Their original quantitative values were used for the lasso regression instead.

Results

Descriptive Statistics

A total of 55 countries were included in the analysis after dropping observations with missing values.

Table 2 shows descriptive statistics for lifetime risk of maternal death and the quantitative predictors. The 2013 average lifetime risk of maternal death was 0.5297% (SD = 1.0433%), with a minimum of 0.0043% and a maximum of 4.7250%.

Table 2: Descriptive statistics for data analytic variables

Variable Label Count (n) Mean Std Dev Min Max
x174_2013 Risk_of_maternal_death 55 0.52970239 1.043256 0.004329 4.725049
x100_2013 Death rate /1,000 55 8.129709091 2.375154 3.392 14.111
x11_2013 Net national income /capita (US$) 55 14724.46547 16264.72 162.6867 71105.36
x121_2013 Exports (% of GDP) 55 44.15138561 21.60125 6.307895 106.6796
x125_2013 Total fertility rate 55 2.491927273 1.332382 1.28 7.623
x139_2013 GDP (US$) 55 4.06086E+11 7.81E+11 1.62E+09 3.75E+12
x140_2013 GDP growth (annual %) 55 2.969904857 3.37084 -5.35675 11.64458
x142_2013 GDP /capita (US$) 55 18234.68801 20053.73 239.8697 84669.29
x143_2013 GDP /capita growth (%) 55 1.819958812 2.854032 -5.13648 9.659583
x149_2013 Health expend. /capita (US$) 55 1614.056856 2092.634 26.21477 9276.473
x14_2013 Adjusted savings(% of GNI) 55 12.98534114 5.471415 2.131745 23.83965
x150_2013 Total health expend. (% of GDP) 55 7.627847627 2.314408 1.978438 12.8853
x155_2013 Improved sanitation (%) 55 80.43454545 27.10311 10.5 100
x156_2013 Improved water source (%) 55 91.25090909 13.23896 53.4 100
x157_2013 Incidence of TB /100,000 55 74.05636364 113.5403 3.6 665
x169_2013 Female labor force (%) 55 42.05226108 8.335438 15.1653 51.44402
x16_2013 Adj. sav. educ. expend. (% of GNI) 55 4.720654736 1.6099 1.1 8.3
x171_2013 Female life expect. (years) 55 76.48958182 7.894863 56.205 85.5
x172_2013 Male life expect. (years) 55 71.30623636 7.187317 53.934 81.8
x173_2013 Total life expect. (years) 55 73.83758004 7.47068 55.0418 83.11707
x190_2013 Infant mortality rate /‰ LB 55 18.08181818 18.84818 1.6 69.9
x191_2013 Neonatal mortality rate /‰ LB 55 10.52181818 9.681592 1 36.8
x192_2013 Under-5 mortality rate /‰ LB 55 24.30363636 28.68151 2.1 104.8
x1_2012 Access to electric. (%) 55 86.45913236 26.34191 9.8 100
x204_2013 O-of-p health expend. (% of total) 55 29.39492839 15.81812 5.394363 73.79183
x205_2012 %female students primary educ. (%) 55 48.28414133 1.392425 40.71588 50.45235
x67_2012 C of death: mat., prenat., nutri. (%) 55 17.47818182 20.1268 1.4 67.6
x68_2012 C of death: injury (%) 55 8.310909091 3.997299 3.4 18.4
x69_2012 C of death: NCD (%) 55 74.21272727 21.95457 24.7 93
x58_2013 Crude birth rate /‰ people 55 18.8218 10.32416 7.9 49.661
x29_2013 Age dependency ratio (%) 55 57.23838888 16.25614 34.85127 112.3096
x283_2013 Urban population (%) 55 61.41727273 19.82994 15.944 97.776
x275_2013 Male survival to age 65 (%) 55 73.84794436 11.91491 46.67865 90.24135
x274_2013 Female survival to age 65 (%) 55 82.75748327 11.31914 50.68189 93.82151
x267_2013 Strength rights index 55 4.981818182 2.54217 0 11
x25_2013 Ado. fertility rate 55 42.24563636 42.49686 3.2754 206.045
x223_2013 Female population (%) 55 50.52907693 1.313537 43.4732 53.29677
x222_2013 Population, 15-64 (%) 55 64.18742482 5.846812 47.10103 74.15577
x221_2013 Population, 0-14 (%) 55 25.31552138 10.77501 13.0939 50.33574

Bivariate Analyses

Visualizations of the associations between the quantitative predictors of interest and the risk of maternal death are presented in Figures 1 to 8. Box plots and ANOVA show that the risk of maternal death is: 1) significantly higher in the high crude birth rate category (F = 9.229, p = 0.00480, R2 = 0.229, Figure 9); 2) significantly higher in the high fertility rate category (F = 9.229, p = 0.00480, R2 = 0.229, Figure 10); 3) significantly higher in the low GDP per capita category (F = 9.260, p = 0.00474, R2 = 0.230, Figure 11); 4) not significantly different as a function of Strength of legal rights, split into two categories (F = 2.211, p = 0.147, R2 = 0.067, Figure 12).

Lasso Regression Analysis

Of the 38 predictor variables, 17 were retained in the selected model. Total fertility rate, Access to electricity (%), Adolescent fertility rate, Female survival to age 65 (%), and Cause of death by injury (%) were the five predictors most strongly associated with the risk of maternal death (Table 3).

Table 3: Predictors of maternal death and their regression coefficients

Rank Variable Label Coefficient Orientation of the association
1 x125_2013 Total fertility rate 0.34053 Positive
2 x1_2012 Access to electricity (%) -0.33209 Negative
3 x25_2013 Adolescent fertility rate 0.31192 Positive
4 x274_2013 Female survival to age 65 (%) -0.29956 Negative
5 x68_2012 Cause of death: injury (%) -0.19320 Negative
6 x157_2013 Incidence of TB per 100,000 0.18341 Positive
7 x191_2013 Neonatal mortality rate (‰ live birth) -0.17137 Negative
8 x143_2013 GDP per capita growth (%) -0.12596 Negative
9 x205_2012 %female students in primary educ. (%) -0.09713 Negative
10 x150_2013 Total health expenditure (% of GDP) 0.06920 Positive
11 x169_2013 Female labor force (%) -0.04516 Negative
12 x100_2013 Death rate per 1,000 0.02954 Positive
13 x14_2013 Adjusted savings (% of GNI) 0.01909 Positive
14 x283_2013 Urban population (%) -0.01645 Negative
15 x121_2013 Exports (% of GDP) 0.01477 Positive
16 x156_2013 Improved water source (%) -0.01112 Negative
17 x16_2013 Adj. saving: educ. expenditure (% of GNI) 0.00972 Positive
18 x11_2013 Net national income per capita (US$) 0
19 x139_2013 GDP (US$) 0
20 x140_2013 GDP growth (annual %) 0
21 x142_2013 GDP per capita (US$) 0
22 x149_2013 Health expenditure capita (US$) 0
23 x155_2013 Improved sanitation (%) 0
24 x171_2013 Female life expect. (years) 0
25 x172_2013 Male life expect. (years) 0
26 x173_2013 Total life expect. (years) 0
27 x190_2013 Infant mortality rate /‰ LB 0
28 x192_2013 Under-5 mortality rate /‰ LB 0
29 x204_2013 Out-of-pocket health expend. (% of total) 0
30 x221_2013 Population, 0-14 (%) 0
31 x222_2013 Population, 15-64 (%) 0
32 x223_2013 Female population (%) 0
33 x267_2013 Strength rights index 0
34 x275_2013 Male survival to age 65 (%) 0
35 x29_2013 Age dependency ratio (%) 0
36 x58_2013 Crude birth rate /‰ people 0
37 x67_2012 Cause of death: mat., prenat., nutri. (%) 0
38 x69_2012 Cause of death: NCD (%) 0
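Because all predictors were standardized before estimation, the absolute values of the coefficients are directly comparable, which is what the ranking in Table 3 relies on. The short sketch below reproduces that ranking from the 17 non-zero coefficients reported in the table; the variable names in the code (coefficients, ranked, top_five) are illustrative.

```python
# Non-zero lasso coefficients as reported in Table 3.
coefficients = {
    "x125_2013": 0.34053, "x1_2012": -0.33209, "x25_2013": 0.31192,
    "x274_2013": -0.29956, "x68_2012": -0.19320, "x157_2013": 0.18341,
    "x191_2013": -0.17137, "x143_2013": -0.12596, "x205_2012": -0.09713,
    "x150_2013": 0.06920, "x169_2013": -0.04516, "x100_2013": 0.02954,
    "x14_2013": 0.01909, "x283_2013": -0.01645, "x121_2013": 0.01477,
    "x156_2013": -0.01112, "x16_2013": 0.00972,
}

# Rank predictors by the absolute size of their standardized coefficient.
ranked = sorted(coefficients.items(), key=lambda kv: abs(kv[1]), reverse=True)
top_five = [name for name, _ in ranked[:5]]

# Split by the orientation of the association.
positive = [name for name, c in coefficients.items() if c > 0]
negative = [name for name, c in coefficients.items() if c < 0]
print(top_five, len(positive), len(negative))
```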

Total fertility rate, Adolescent fertility rate, Incidence of TB per 100,000, Total health expenditure (% of GDP), Death rate per 1,000, Adjusted savings (% of GNI), Exports (% of GDP), and Adjusted saving: educ. expenditure (% of GNI) were positively associated with the risk of maternal death. Access to electricity (%), Female survival to age 65 (%), Cause of death: injury (%), Neonatal mortality rate (‰ live birth), GDP per capita growth (%), %female students in primary educ. (%), Female labor force (%), Urban population (%), and Improved water source (%) were negatively associated with the risk of maternal death.

Figure 13. Partial view of the lasso regression output

Together, these 17 predictors accounted for 68.54% of the variance in the risk of maternal death (Figure 13).

The mean squared error (MSE) for the test data (MSE = 0.1872) is about 46 times greater than the MSE for the training data (MSE = 0.0040), suggesting that predictive accuracy declined markedly when the lasso regression algorithm developed on the training data set was applied to predict the risk of maternal death in the test data set. The progression of the regression coefficients along the lasso path and the mean squared error on each fold are presented in Figure 14 and Figure 15.
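As a quick arithmetic check, the ratio of the two reported MSE values can be computed directly. The inputs are the rounded figures given in the text, so the result is approximate.

```python
train_mse = 0.0040   # MSE on the training data, as reported
test_mse = 0.1872    # MSE on the test data, as reported

ratio = test_mse / train_mse
print(round(ratio, 1))  # about 46.8
```

A larger MSE means larger prediction errors, so a test MSE far above the training MSE indicates that the model fits the training countries much better than unseen ones.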

Comments and Conclusion

This project used lasso regression analysis to identify the subset of 2012 or 2013 World Development Indicators that best predicts the risk of maternal death among 55 countries. The risk of maternal death ranges from 0.0043% to 4.7250% with a mean of 0.5297% (SD = 1.0433%), suggesting that there was considerable variability in the risk across countries.

The lasso regression analysis selected 17 of the 38 World Development Indicators for the final model. These 17 predictors are: Total fertility rate, Adolescent fertility rate, Incidence of TB per 100,000, Total health expenditure (% of GDP), Death rate per 1,000, Adjusted savings (% of GNI), Exports (% of GDP), Adjusted saving: educ. expenditure (% of GNI), Access to electricity (%), Female survival to age 65 (%), Cause of death: injury (%), Neonatal mortality rate (‰ live birth), GDP per capita growth (%), %female students in primary educ. (%), Female labor force (%), Urban population (%), and Improved water source (%). They accounted for 68.54% of the observed variability in the risk of maternal death.

The other 21 predictors were excluded: Net national income per capita (US$), GDP (US$), GDP growth (annual %), GDP per capita (US$), Health expenditure capita (US$), Improved sanitation (%), Female life expect. (years), Male life expect. (years), Total life expect. (years), Infant mortality rate /‰ LB, Under-5 mortality rate /‰ LB, Out-of-pocket health expend. (% of total), Population, 0-14 (%), Population, 15-64 (%), Female population (%), Strength rights index, Male survival to age 65 (%), Age dependency ratio (%), Crude birth rate /‰ people, Cause of death: mat., prenat., nutri. (%), and Cause of death: NCD (%).

The five strongest predictors of risk of maternal death are: Total fertility rate, Access to electricity (%), Adolescent fertility rate, Female survival to age 65 (%), and Cause of death by injury (%).

On the one hand, the risk of maternal death increases as a function of Total fertility rate, Adolescent fertility rate, Incidence of TB per 100,000, Total health expenditure (% of GDP), Death rate per 1,000, Adjusted savings (% of GNI), Exports (% of GDP), and Adjusted saving: educ. expenditure (% of GNI). On the other hand, the risk decreases as a function of Access to electricity (%), Female survival to age 65 (%), Cause of death: injury (%), Neonatal mortality rate (‰ live birth), GDP per capita growth (%), %female students in primary educ. (%), Female labor force (%), Urban population (%), and Improved water source (%).

There was a 46-fold increase in the MSE when the lasso regression algorithm estimated on the training set was used to predict the risk of maternal death in the test data set. This suggests that the predictive accuracy of the algorithm may be unstable in future samples of countries.

The results of this project suggest that efforts to reduce Total fertility rate, Adolescent fertility rate, Incidence of TB per 100,000, Total health expenditure (% of GDP), Death rate per 1,000, Adjusted savings (% of GNI), Exports (% of GDP), and Adjusted saving: educ. expenditure (% of GNI) could contribute to reducing the risk of maternal mortality. Efforts designed to increase Access to electricity (%), Female survival to age 65 (%), Cause of death: injury (%), Neonatal mortality rate (‰ live birth), GDP per capita growth (%), %female students in primary educ. (%), Female labor force (%), Urban population (%), and Improved water source (%) could also contribute to reducing the risk of maternal mortality.

This project developed an algorithm for predicting the risk of maternal death that accounts for 68.54% of the variance in that risk. It provides more information on the World Development Indicators that are most likely to have a significant impact on the risk of maternal death. However, some limitations should be taken into account when considering reduction of maternal death based on the results of this project: 1) the algorithm has a high MSE on the test data set, 2) the sample size is small (55 countries), and 3) the indicators are from 2012 or 2013 only. Another weakness of this project is that a large number of development-related indicators were not included in the algorithm. It is possible that the factors identified as important predictors of risk of maternal death among the set of predictors analyzed here are confounded by other factors not considered in this analysis. As a result, some factors identified here as the strongest predictors may not emerge as important when other factors are taken into consideration. Thus, future efforts to develop a solid predictive algorithm for the risk of maternal death should expand the algorithm by adding more indicators to the statistical model.

 

 
