# Mortality in the United States and Its Causes

This chapter explores vital statistics for the United States of America. The Centers for Disease Control and Prevention (CDC) maintains several datasets of national vital statistics, including records of deaths organized by year. Each record includes age, gender, race, cause of death, and other details. The analysis below uses data for the year 2016.

# The Human Lifespan

Figure 1 shows the distribution of age at death for all records. As expected, the distribution is left-skewed: most deaths occur at older ages, with a long tail toward younger ages. The leftmost bar, which counts deaths before the first birthday (infant mortality), stands out.

Figure 1: Distribution of Age at Death
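The direction of skew in a histogram like Figure 1 can also be checked numerically: the sample skewness is negative for a left-skewed distribution (long tail toward small values) and positive for a right-skewed one. Below is a minimal stdlib Python sketch of the adjusted Fisher-Pearson coefficient, assuming ages arrive as a plain list of numbers; the function name `sample_skewness` is illustrative and not part of the original analysis.

```python
import math


def sample_skewness(values):
    """Adjusted Fisher-Pearson sample skewness G1.

    Negative values indicate a left-skewed distribution (long tail toward
    small values); positive values indicate a right-skewed one.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    g1 = m3 / m2 ** 1.5
    # Small-sample adjustment used by many statistics packages.
    return math.sqrt(n * (n - 1)) / (n - 2) * g1
```

A single very young age among many old ones, as in `sample_skewness([1, 70, 80, 85, 88, 90, 92])`, pulls the result below zero.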

Considering only the records in this first bar, a second histogram is constructed showing age at death in months. Figure 2 shows that infant deaths are most frequent shortly after birth and decline sharply thereafter.

Figure 2: Infant Mortality

Next, the rightmost bars of the histogram are considered. These bars contain records of people aged 100 or older. These records are grouped by gender and race and displayed in a bar plot, where the y-axis gives the percentage of each race's deaths that occurred at age 100 or older.

Figure 3: Percentage of Centenarians by Race

Figure 3 shows two things. First, the majority of people who live at least 100 years are women: females account for roughly 82% of centenarian deaths. Second, people of some races are more likely to survive their first century; Japanese and Chinese people are markedly more likely to do so.
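A sketch of the calculation behind Figure 3, under two assumptions: each record is reduced to a hypothetical `(age_years, sex, race)` tuple with sex coded `"M"`/`"F"`, and the y-axis is the percentage of each race's deaths that occurred at age 100 or older. The function name is illustrative.

```python
from collections import Counter


def centenarian_summary(records):
    """Summarize deaths at age 100 or older.

    records: iterable of (age_years, sex, race) tuples (hypothetical layout).
    Returns (pct_by_race, female_share), where pct_by_race maps each race to
    the percentage of that race's deaths that occurred at age >= 100, and
    female_share is the fraction of all centenarian deaths that are female.
    """
    total_by_race = Counter()
    cent_by_race = Counter()
    cent_by_sex = Counter()
    for age, sex, race in records:
        total_by_race[race] += 1
        if age >= 100:
            cent_by_race[race] += 1
            cent_by_sex[sex] += 1
    pct_by_race = {race: 100.0 * cent_by_race[race] / n
                   for race, n in total_by_race.items()}
    n_cent = sum(cent_by_sex.values())
    female_share = cent_by_sex["F"] / n_cent if n_cent else float("nan")
    return pct_by_race, female_share
```

Dividing by each race's own total, rather than by the number of centenarians overall, is what allows small groups to show a high percentage.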

Next, records of all ages are grouped by gender, and the distributions for men and women are plotted as both a line chart and a bar chart. Figure 4 shows that men tend to die earlier than women: male deaths outnumber female deaths from the late teenage years through adulthood, and female deaths outnumber male deaths only near the end of the human lifespan.

Figure 4: Age Distribution at Death by Gender

Comparing average age at death, men die roughly 6.6 years younger than women. The difference is highly significant: a Welch’s t-test for the difference of means gives $t \approx -303$.
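Welch’s test compares two means without assuming equal variances, using $t = (\bar{x}_a - \bar{x}_b)/\sqrt{s_a^2/n_a + s_b^2/n_b}$ and the Welch-Satterthwaite approximation for the degrees of freedom. The sketch below computes both; the negative sign reported above is consistent with passing men's ages as `a` and women's as `b` (an assumption about ordering). A library routine such as `scipy.stats.ttest_ind(a, b, equal_var=False)` also returns the p-value; with more than a million records per group, $|t| \approx 303$ corresponds to a p-value that is effectively zero.

```python
import math
import statistics


def welch_t_test(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Does not assume equal variances. A negative t means mean(a) < mean(b).
    """
    na, nb = len(a), len(b)
    va = statistics.variance(a)  # sample variance (n - 1 denominator)
    vb = statistics.variance(b)
    se2_a, se2_b = va / na, vb / nb  # squared standard errors of each mean
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df
```

For example, `welch_t_test([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])` gives `t = -2.0` with 8 degrees of freedom.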

Next, the mean and standard deviation of age at death are computed for each race. The results are shown in a bar plot.

Figure 5: Age at Death by Race

Figure 5 shows that white and Asian people live longer than people of other races on average. Japanese people have the longest average lifespan and the lowest standard deviation, which suggests that relatively few Japanese people die early in life.

Figure 6: Distribution of Age at Death by Race

This is confirmed by plotting the distribution of several races side by side. Figure 6 shows that relatively fewer Japanese people die before reaching the end of the human lifespan.
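Because racial groups differ greatly in size, side-by-side distributions are only comparable after each group's histogram is scaled to sum to 1. Below is a minimal sketch, assuming hypothetical `(group, age_years)` pairs and 10-year bins chosen for illustration; the original figure's bin width is not stated.

```python
from collections import Counter, defaultdict


def normalized_histograms(records, bin_width=10):
    """Per-group age-at-death histograms, each scaled to sum to 1.

    records: iterable of (group, age_years) pairs (hypothetical layout).
    Returns {group: {bin_start: fraction_of_group_deaths}}. Normalizing
    lets groups of very different sizes be compared on one axis.
    """
    counts = defaultdict(Counter)
    for group, age in records:
        # Map each age to the lower edge of its bin, e.g. 85 -> 80.
        counts[group][int(age // bin_width) * bin_width] += 1
    result = {}
    for group, c in counts.items():
        n = sum(c.values())
        result[group] = {b: c[b] / n for b in sorted(c)}
    return result
```

A group with relatively few early deaths shows small fractions in the low bins, even if its absolute counts there are comparable to another group's.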

# Manner of Death

Next, the manner of death is explored. The dataset classifies the manner of death into 7 categories. The categories and their counts are listed in Table 1.

| Manner of Death | Count |
|---|---|
| Natural | 2212118 |
| Unspecified | 294239 |
| Accident | 160768 |
| Suicide | 45155 |
| Homicide | 20544 |
| Unknown | 12467 |
| TBD | 4573 |

Table 1: Deaths by Manner of Death
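The counts in Table 1 are easier to compare as shares of the total. This short sketch uses only the counts listed in the table; the dictionary and function names are illustrative. Natural causes account for roughly 80% of the 2,749,864 records.

```python
# Counts copied from Table 1.
MANNER_COUNTS = {
    "Natural": 2212118,
    "Unspecified": 294239,
    "Accident": 160768,
    "Suicide": 45155,
    "Homicide": 20544,
    "Unknown": 12467,
    "TBD": 4573,
}


def percent_shares(counts):
    """Convert raw counts to percentages of the total."""
    total = sum(counts.values())
    return {k: 100.0 * v / total for k, v in counts.items()}


shares = percent_shares(MANNER_COUNTS)
for manner, pct in shares.items():
    print(f"{manner:<12}{pct:6.2f}%")
```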

The average age at death for each category is shown in Figure 7. Deaths from natural causes have the highest average age at death, and homicides the lowest.

Figure 7: Average Age at Death by Manner of Death

Next, the records are grouped by manner of death and race, and bar charts are constructed for accidents, suicides, and homicides. The y-axis gives the percentage of each race's deaths attributed to the given manner.

Figure 8: Manner of Death by Race

The chart for homicides shows that Japanese and Chinese people have the smallest share of deaths due to homicide among all races. This likely contributes to the longevity of these groups, since homicide typically occurs earlier in life (Figure 7 shows homicide has the lowest average age at death of any manner). Conversely, homicide accounts for the largest share of deaths among Black people, which likely contributes to their relatively shorter average lifespan.
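The percentages in Figure 8 use each race's total deaths, of any manner, as the denominator. A sketch of that normalization, assuming hypothetical `(race, manner)` pairs with manner labels matching Table 1; the function name and default manners are illustrative.

```python
from collections import Counter, defaultdict


def manner_share_by_race(records, manners=("Accident", "Suicide", "Homicide")):
    """Percentage of each race's deaths attributed to each selected manner.

    records: iterable of (race, manner) pairs (hypothetical layout).
    The denominator is all deaths for the race, of any manner.
    """
    totals = Counter()
    by_race = defaultdict(Counter)
    for race, manner in records:
        totals[race] += 1
        by_race[race][manner] += 1
    # Counter returns 0 for manners never seen in a race.
    return {race: {m: 100.0 * by_race[race][m] / n for m in manners}
            for race, n in totals.items()}
```

Because the denominator includes natural deaths, a group whose members mostly die of old age shows small percentages for every violent manner.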

# Underlying Cause of Death

Next, the underlying cause of death is explored. Each record is labeled with an ICD-10 code indicating the underlying cause of death. Figure 9 shows the cumulative percentage of records accounted for by the most frequent causes. A small number of causes are responsible for a large share of deaths: well over 60% of all deaths result from fewer than 50 causes.

Figure 9: Cumulative Percentage of Deaths by Top Diseases
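The curve in Figure 9 is a running total over causes sorted by frequency. Below is a sketch, assuming the underlying-cause codes are available as a flat list of strings, one per death record; both function names are hypothetical.

```python
from collections import Counter
from itertools import accumulate


def cumulative_percentage(codes):
    """Cumulative share of deaths covered by the top-k causes, k = 1, 2, ...

    codes: iterable of underlying-cause ICD-10 codes, one per death record.
    Returns a list where element k-1 is the percentage of all deaths
    accounted for by the k most frequent codes.
    """
    counts = Counter(codes)
    total = sum(counts.values())
    ranked = [n for _, n in counts.most_common()]  # descending frequency
    return [100.0 * c / total for c in accumulate(ranked)]


def causes_needed(cum_pct, threshold):
    """Smallest number of top causes whose cumulative share reaches threshold."""
    for k, pct in enumerate(cum_pct, start=1):
        if pct >= threshold:
            return k
    return None
```

Calling `causes_needed(cumulative_percentage(codes), 60)` gives the number of causes needed to cover 60% of deaths.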

Next, the records are grouped by ICD-10 code and the count for each code is computed. The result is shown as a bar chart in Figure 10. Table 2 lists the corresponding ICD-10 codes, together with the mean and standard deviation of age at death and the count for each.

Figure 10: Leading Causes of Death

| ICD-10 | Mean Age | Std. Age | Count | Description |
|---|---|---|---|---|
| I251 | 79.9 | 12.9 | 161079 | Atherosclerotic Heart Disease |
| C349 | 71.7 | 11.0 | 146786 | Malignant Neoplasm: Bronchus or Lung |
| J449 | 77.2 | 11.0 | 116117 | Chronic Obstructive Pulmonary Disease |
| G309 | 86.9 | 7.7 | 113096 | Alzheimer Disease |
| I219 | 74.5 | 14.0 | 107594 | Acute Myocardial Infarction |
| F03 | 87.4 | 7.9 | 100901 | Dementia |
| I500 | 83.6 | 11.6 | 64439 | Congestive Heart Failure |
| I250 | 71.9 | 14.8 | 62909 | Atherosclerotic Cardiovascular Disease |
| I64 | 81.2 | 12.1 | 61818 | Stroke |
| J189 | 79.7 | 14.3 | 42189 | Pneumonia |

Table 2: Leading Causes of Death
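A table like Table 2 can be assembled by grouping records by ICD-10 code, computing the mean and sample standard deviation of age for each code, and keeping the most frequent codes. The `(icd10_code, age_years)` record layout and the function name below are hypothetical simplifications, not the CDC file format.

```python
import statistics
from collections import defaultdict


def leading_causes(records, top_n=10):
    """Rows (code, mean age, std age, count) for the top_n most frequent codes.

    records: iterable of (icd10_code, age_years) pairs (hypothetical layout).
    Codes with a single record get a standard deviation of 0.0.
    """
    ages = defaultdict(list)
    for code, age in records:
        ages[code].append(age)
    rows = []
    for code, a in ages.items():
        sd = statistics.stdev(a) if len(a) > 1 else 0.0
        rows.append((code, statistics.fmean(a), sd, len(a)))
    rows.sort(key=lambda r: r[3], reverse=True)  # most frequent first
    return rows[:top_n]
```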

Heart disease accounts for the largest number of deaths. Atherosclerosis, the build-up of plaque on the arterial walls, is involved in several of the leading causes of death. Lung cancer and COPD are also responsible for a sizable portion of the records. Both of these pulmonary conditions are strongly associated with smoking.

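Next, causes of death among those under the age of 50 are explored. A similar bar chart and table are constructed from these records. In Table 3, only the Count column is restricted to deaths under 50; the age statistics are identical to the all-ages values for each code (for example, I219 has a mean age of 74.5 in both Table 2 and Table 3).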

Figure 11: Leading Causes of Death under 50

| ICD-10 | Mean Age | Std. Age | Count | Description |
|---|---|---|---|---|
| X42 | 40.8 | 13.0 | 19167 | Accidental Poisoning by and Exposure to Narcotics |
| X44 | 42.2 | 13.5 | 16872 | Accidental Poisoning by and Exposure to Unspecified Drugs |
| X95 | 32.3 | 13.3 | 11466 | Assault by Unspecified Firearm Discharge |
| X70 | 40.0 | 16.7 | 8425 | Intentional Self-Harm by Hanging, Strangulation and Suffocation |
| V892 | 43.0 | 21.5 | 7900 | Person Injured in a Motor-Vehicle Accident |
| X74 | 50.1 | 19.6 | 6826 | Intentional Self-Harm by Unspecified Firearm Discharge |
| I219 | 74.5 | 14.0 | 5375 | Acute Myocardial Infarction |
| C509 | 68.7 | 14.9 | 4880 | Malignant Neoplasm: Breast |
| R99 | 55.9 | 28.8 | 4873 | Other Ill-Defined and Unspecified Causes of Mortality |
| I250 | 71.9 | 14.8 | 4284 | Atherosclerotic Cardiovascular Disease |

Table 3: Leading Causes of Death Under 50

The leading causes of death among those under 50 are mostly not disease processes: drug overdose, homicide, suicide, and motor-vehicle accidents occupy the top six positions. The only diseases in the top 10 are breast cancer and heart disease.
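The under-50 ranking only requires filtering records before counting. Below is a minimal sketch, assuming hypothetical `(icd10_code, age_years)` pairs and a strict `age < 50` cutoff; the function name is illustrative.

```python
from collections import Counter


def leading_causes_under(records, age_limit=50, top_n=10):
    """Most frequent underlying-cause codes among deaths below age_limit.

    records: iterable of (icd10_code, age_years) pairs (hypothetical layout).
    Returns [(code, count_under_limit), ...], most frequent first; ties keep
    the order in which codes were first seen.
    """
    counts = Counter(code for code, age in records if age < age_limit)
    return counts.most_common(top_n)
```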

Next, deaths caused by cancer are considered for all ages. Lung cancer accounts for by far the largest number of cancer deaths, more than three times as many as the next-leading cancer, pancreatic cancer. The large number of deaths due to pancreatic cancer is presumably due to the difficulty of treating it. The prognosis for breast cancer is better, though it is a more common disease.

Figure 12: Leading Causes of Death by Cancer

| ICD-10 | Mean Age | Std. Age | Count | Description |
|---|---|---|---|---|
| C349 | 71.7 | 11.0 | 146786 | Malignant Neoplasm: Bronchus or Lung |
| C259 | 71.8 | 11.9 | 42121 | Malignant Neoplasm: Pancreas |
| C509 | 68.7 | 14.9 | 41913 | Malignant Neoplasm: Breast |
| C189 | 72.0 | 14.0 | 39249 | Malignant Neoplasm: Colon |
| C61 | 78.6 | 10.5 | 30396 | Malignant Neoplasm: Prostate |
| C80 | 72.4 | 13.4 | 27845 | Malignant Neoplasm: Unspecified Site |
| C679 | 77.7 | 11.6 | 16586 | Malignant Neoplasm: Bladder |
| C719 | 64.1 | 16.1 | 15303 | Malignant Neoplasm: Brain |
| C159 | 69.5 | 11.9 | 15285 | Malignant Neoplasm: Esophagus |
| C56 | 69.8 | 13.0 | 14242 | Malignant Neoplasm: Ovary |

Table 4: Leading Causes of Death by Cancer
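In ICD-10, malignant neoplasms occupy codes C00–C97, so cancer deaths can be selected by code before ranking. Below is a sketch, assuming codes are stored without the dot (e.g. `C349` for C34.9) as they appear in the tables above; the function names are illustrative.

```python
from collections import Counter


def is_malignant_neoplasm(code):
    """True for ICD-10 codes C00-C97 (malignant neoplasms).

    Codes are assumed to be stored without a dot, e.g. "C349" for C34.9.
    Benign and in-situ neoplasms (D codes) are excluded.
    """
    return (len(code) >= 3 and code[0] == "C"
            and code[1:3].isdigit() and int(code[1:3]) <= 97)


def leading_cancers(codes, top_n=10):
    """Most frequent malignant-neoplasm codes, most frequent first."""
    counts = Counter(c for c in codes if is_malignant_neoplasm(c))
    return counts.most_common(top_n)
```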

Next, methods of suicide are considered. A similar table and bar chart are constructed from suicide records only.

Figure 13: Most Common Methods of Suicide

| ICD-10 | Mean Age | Std. Age | Count | Description |
|---|---|---|---|---|
| X74 | 50.1 | 19.6 | 13948 | Intentional Self-Harm by Unspecified Firearm Discharge |
| X70 | 40.0 | 16.7 | 11682 | Intentional Self-Harm by Hanging, Strangulation and Suffocation |
| X72 | 50.3 | 19.8 | 6116 | Intentional Self-Harm by Handgun Discharge |
| X64 | 50.3 | 15.1 | 3241 | Intentional Self-Poisoning by and Exposure to Unspecified Drugs |
| X73 | 47.7 | 19.5 | 2892 | Intentional Self-Harm by Rifle, Shotgun and Larger Firearm Discharge |
| X67 | 47.9 | 16.7 | 1369 | Intentional Self-Poisoning by Exposure to Other Gases (e.g., Carbon Monoxide, Helium) |
| X80 | 43.3 | 18.0 | 1123 | Intentional Self-Harm by Jumping from a High Place |
| X61 | 49.1 | 15.5 | 1064 | Intentional Self-Poisoning by Exposure to Sedatives |

Table 5: Most Common Methods of Suicide

The most common method of suicide, by a wide margin, is firearm discharge. Hanging is also prevalent. Intentional poisoning is a distant third.
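The firearm total combines three codes from Table 5 (X72, X73, X74), and the poisoning total combines three others (X61, X64, X67). The sketch below aggregates the Table 5 rows by broad mechanism. The mapping follows the ICD-10 intentional self-harm block X60–X84; the grouping labels are illustrative.

```python
from collections import Counter

# Counts copied from Table 5 (top eight suicide codes only).
SUICIDE_COUNTS = {
    "X74": 13948, "X70": 11682, "X72": 6116, "X64": 3241,
    "X73": 2892, "X67": 1369, "X80": 1123, "X61": 1064,
}


def mechanism(code):
    """Map an ICD-10 intentional self-harm code to a broad mechanism.

    Ranges follow the ICD-10 X60-X84 block: X60-X69 self-poisoning,
    X70 hanging/strangulation/suffocation, X72-X74 firearms.
    """
    n = int(code[1:3])
    if 60 <= n <= 69:
        return "Poisoning"
    if n == 70:
        return "Hanging/suffocation"
    if 72 <= n <= 74:
        return "Firearm"
    return "Other"


totals = Counter()
for code, count in SUICIDE_COUNTS.items():
    totals[mechanism(code)] += count
print(totals.most_common())
```

Using only the eight rows of Table 5, this yields 22,956 firearm deaths, 11,682 by hanging or suffocation, 5,674 by self-poisoning, and 1,123 in the remaining category (jumping).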

# Education Levels

Finally, records are grouped by education level. Education level is recorded as a categorical variable with 9 categories corresponding to educational milestones. The categories are described in Table 6.

| Category | Description |
|---|---|
| 1 | 8th Grade or Less |
| 2 | 9th–12th Grade, No Diploma |
| 3 | High School Graduate or GED Completed |
| 4 | Some College Credit, but No Degree |
| 5 | Associate Degree |
| 6 | Bachelor’s Degree |
| 7 | Master’s Degree |
| 8 | Doctorate or Professional Degree |
| 9 | Unknown |

Table 6: Education Levels with Categorical Labels

Category labels increase with the level of education. Records with unknown education level (category 9) are discarded; they account for less than 2% of all records.

Figure 14 shows a bar chart of the average age at death for each group. People who complete at least a bachelor’s degree live longer on average.

Figure 14: Average Age at Death by Education Level

The bar chart also suggests a modest increasing trend with education. To explore this trend further, the average age at death for each education level is plotted against its category label, a trend line is fit to these points, and the coefficient of determination is computed.

Figure 15: Relationship Between Education and Lifespan

The $R^{2}$ of the fit is 0.641, and the overall F-statistic of the model is 12.512 with a p-value of 0.008. The coefficient of education level (the slope) is 0.907 and is significant. It indicates that average lifespan increases by roughly 0.9 years for each educational milestone completed.
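Below is a sketch of the least-squares fit described above. It assumes `x` holds the education category labels and `y` the corresponding average ages at death; no values are supplied here, since the per-category averages are not listed in the text. For a single predictor, $F = \frac{R^2}{1 - R^2}(n - 2)$, and $F$ equals the square of the slope's t statistic, so the model F-test and the slope's significance test give the same p-value.

```python
import statistics


def simple_ols(x, y):
    """Least-squares fit y = b0 + b1*x with R^2 and the overall F statistic.

    For a single predictor, F = R^2 / (1 - R^2) * (n - 2), and F equals the
    square of the slope's t statistic, so both tests give the same p-value.
    """
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx             # slope
    b0 = my - b1 * mx          # intercept
    ss_tot = sum((yi - my) ** 2 for yi in y)
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    r2 = 1.0 - ss_res / ss_tot
    f = (ss_tot - ss_res) / (ss_res / (n - 2))  # 1 and n - 2 degrees of freedom
    return b0, b1, r2, f
```

The p-value would then come from an F distribution with 1 and $n - 2$ degrees of freedom.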