Mortality in the United States and Its Causes

This chapter explores vital statistics for the United States of America. The Centers for Disease Control and Prevention (CDC) maintains several datasets of national vital statistics. These datasets contain death records organized by year. Each record includes age, gender, race, cause of death, and other details. The analysis here uses the data for 2016.

On Forename Popularity in the USA

This chapter considers forenames in the USA. The United States Social Security Administration (SSA) publishes a dataset derived from Social Security records. For each first name and birth year, the dataset gives the number of records that exist.

A Statistical Analysis of Facial Attractiveness

Figure: An intermediate activation volume produced by a convolutional neural network that predicts a person's attractiveness.

Does beauty truly lie in the eye of its beholder? This chapter explores the complex array of factors that influence facial attractiveness to answer that question or at least to understand it better.

Visualizing Bitcoin Wealth Distribution

This post explores the distribution of wealth among nonempty addresses on the Bitcoin network.

All addresses on the Bitcoin network are queried. The number of addresses with at least one satoshi is 24,473,765 at the time of the query. The resulting addresses are sorted by the amount of Bitcoin they contain. The list is divided into quantiles and the wealth of each quantile is plotted in a bar plot.
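The sort-and-split step described above can be sketched in a few lines of Python. This is only an illustration, not the code used in the post: the function name `quantile_wealth`, the sample balances, and the choice of five groups are all assumptions made for the example.

```python
def quantile_wealth(balances, n_quantiles=10):
    """Sort balances ascending, split them into n_quantiles groups of
    (nearly) equal size, and return the total balance of each group."""
    ordered = sorted(balances)
    n = len(ordered)
    totals = []
    for q in range(n_quantiles):
        # Integer arithmetic keeps the groups contiguous and non-overlapping.
        start = q * n // n_quantiles
        stop = (q + 1) * n // n_quantiles
        totals.append(sum(ordered[start:stop]))
    return totals


# Hypothetical balances in satoshi; real data would come from the network query.
balances = [1_000_000, 5, 100, 10_000, 1, 500, 50, 5_000, 10, 1_000]
totals = quantile_wealth(balances, n_quantiles=5)

# Share of all wealth held by each group, poorest group first.
shares = [t / sum(totals) for t in totals]
```

The resulting `totals` (or `shares`) list holds one value per quantile, so it could be passed directly to a bar-plot call in a plotting library.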

Binary Classification with Artificial Neural Networks using Python and TensorFlow

This post introduces the TFANN module and its use for classification problems. The TFANN module is available on GitHub. The name TFANN is an abbreviation for TensorFlow Artificial Neural Network. TensorFlow is an open-source library for data flow programming. Because TensorFlow programs are expressed as computational graphs, using the library can be challenging at times. The TFANN module provides several classes that allow interaction with the TensorFlow API through familiar object-oriented programming paradigms.

Using Random Forests and Wordclouds to Visualize Feature Importance in Document Classification

What characteristics do the works of famous authors have that make them unique? This post uses ensemble methods and wordclouds to explore just that.

Project Gutenberg offers a large number of freely available works from many famous authors. The dataset for this post consists of books, taken from Project Gutenberg, written by each of the following authors:

  • Austen
  • Dickens
  • Dostoyevsky
  • Doyle
  • Dumas
  • Stevenson
  • Stoker
  • Tolstoy
  • Twain
  • Wells

Analysis of Historical Weather Data for Los Angeles, CA

This post explores historical weather data for Los Angeles, California from 1906 to the present using Pandas and Matplotlib. The data was collected from the National Centers for Environmental Information website. An order must be placed through the website to obtain a (temporary) link to download the data.
