Using Random Forests and Wordclouds to Visualize Feature Importance in Document Classification

What characteristics make the works of famous authors unique? This post uses ensemble methods and wordclouds to explore that question.

Project Gutenberg offers a large number of freely available works from many famous authors. The dataset for this post consists of books, taken from Project Gutenberg, written by each of the following authors:

  • Austen
  • Dickens
  • Dostoyevsky
  • Doyle
  • Dumas
  • Stevenson
  • Stoker
  • Tolstoy
  • Twain
  • Wells



Analysis of Historical Weather Data for Los Angeles, CA

This post explores historical weather data for Los Angeles, California from 1906 to the present using Pandas and Matplotlib. The data in this post was collected from the National Centers for Environmental Information website. An order must be placed through the website to obtain a (temporary) link to download the data.
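A minimal sketch of how such data might be loaded and summarized with Pandas follows. The inline CSV is a made-up sample standing in for the downloaded file, and the column names `DATE`, `TMAX`, and `TMIN` are assumptions about its layout, not details confirmed by the post.

```python
import io

import pandas as pd

# Hypothetical sample rows; a real analysis would read the downloaded file.
csv_text = """DATE,TMAX,TMIN
1906-01-01,65,48
1906-07-01,84,60
1907-01-01,67,50
1907-07-01,88,62
"""

# Parse the date column and use it as the index.
weather = pd.read_csv(io.StringIO(csv_text), parse_dates=["DATE"], index_col="DATE")

# Average the daily highs and lows within each calendar year.
yearly_mean = weather.groupby(weather.index.year)[["TMAX", "TMIN"]].mean()
print(yearly_mean)
```

With Matplotlib installed, calling `yearly_mean.plot()` would draw both yearly averages as lines on one chart.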
