Using Random Forests and Wordclouds to Visualize Feature Importance in Document Classification

What characteristics do the works of famous authors have that make them unique? This post uses ensemble methods and wordclouds to explore just that.

Project Gutenberg offers a large number of freely available works by many famous authors. The dataset for this post consists of Project Gutenberg books by each of the following authors:

  • Austen
  • Dickens
  • Dostoyevsky
  • Doyle
  • Dumas
  • Stevenson
  • Stoker
  • Tolstoy
  • Twain
  • Wells

Visualizing Neural Network Performance on High-Dimensional Data

This post is part of a series on artificial neural networks (ANNs) in TensorFlow and Python.

  1. Stock Market Prediction Using Multi-Layer Perceptrons With TensorFlow
  2. Stock Market Prediction in Python Part 2
  3. Visualizing Neural Network Performance on High-Dimensional Data
  4. Image Classification Using Convolutional Neural Networks in TensorFlow

This post presents a short script that plots neural network performance on high-dimensional binary data using Matplotlib in Python. Binary vectors, that is, vectors containing only 0s and 1s, can be useful for representing categorical data or discrete phenomena. The code in this post is available on GitHub.
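
To make the point about categorical data concrete, here is a minimal sketch of one-hot encoding, one common way to turn categories into binary vectors. The helper `one_hot` and the color categories are hypothetical and chosen only for illustration; they are not taken from the post's script.

```python
def one_hot(value, categories):
    """Return a binary vector with a single 1 at the position of `value`."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if category == value else 0 for category in categories]


# Hypothetical categorical feature with three possible values.
colors = ["red", "green", "blue"]

# Each observation becomes a vector containing only 0s and 1s.
encoded = [one_hot(color, colors) for color in ["green", "blue", "red"]]
print(encoded)  # [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
```

Concatenating such vectors for several categorical features gives the kind of high-dimensional binary input the post refers to.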
