Using Random Forests and Wordclouds to Visualize Feature Importance in Document Classification

What characteristics do the works of famous authors have that make them unique? This post uses ensemble methods and wordclouds to explore just that.

Project Gutenberg offers a large number of freely available works from many famous authors. The dataset for this post consists of books, taken from Project Gutenberg, written by each of the following authors:

  • Austen
  • Dickens
  • Dostoyevsky
  • Doyle
  • Dumas
  • Stevenson
  • Stoker
  • Tolstoy
  • Twain
  • Wells

Continue reading “Using Random Forests and Wordclouds to Visualize Feature Importance in Document Classification”

Analysis of Historical Weather Data for Los Angeles, CA

This post explores historical weather data from Los Angeles, California over the period of 1906 to the present using Pandas and Matplotlib. The data in the post was collected from the National Centers for Environmental Information website. An order must be placed through the website to obtain a (temporary) link to download the data.

Continue reading “Analysis of Historical Weather Data for Los Angeles, CA”

Stock Market Prediction in Python Part 2

This post is part of a series on artificial neural networks (ANN) in TensorFlow and Python.

  1. Stock Market Prediction Using Multi-Layer Perceptrons With TensorFlow
  2. Stock Market Prediction in Python Part 2
  3. Visualizing Neural Network Performance on High-Dimensional Data
  4. Image Classification Using Convolutional Neural Networks in TensorFlow

This post revisits the problem of predicting stock prices based on historical stock data using TensorFlow that was explored in a previous post. In the previous post, stock price was predicted solely based on the date. First, the date was converted to a numerical value in LibreOffice, then the resulting integer value was read into a matrix using numpy. As stated in the post, this method was not meant to be indicative of how actual stock prediction is done. This post aims to slightly improve upon the previous model and explore new features in tensorflow and Anaconda python. The corresponding source code is available here.

Note: See a later post Visualizing Neural Network Performance on High-Dimensional Data for code to help visualize neural network learning and performance.

Continue reading “Stock Market Prediction in Python Part 2”