I am a big fan of the CountVectorizer class in scikit-learn. With a robust and easy interface that produces (sparse!) matrices, what’s not to love? Well, it’s… pretty… slow…
The performance is okay for tens of MB of text, but GBs take minutes or more. It turns out that CountVectorizer is implemented in pure Python. The functions are single-threaded too. It seems like low-hanging fruit. Just whip up some parallel C++, right? Well, not quite, but I’m getting ahead of myself.
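For readers who have not used it, here is a rough idea of what count vectorization computes. This is a minimal pure-Python sketch, not scikit-learn's code and not TVLib's: the function name `count_vectorize` and the dict-per-row sparse layout are my own assumptions for illustration. The token pattern and the alphabetically sorted vocabulary mirror scikit-learn's defaults.

```python
import re
from collections import Counter

# scikit-learn's default token pattern: words of two or more characters.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def count_vectorize(docs):
    """Hypothetical helper: return (vocab, rows).

    vocab maps each term to a column index (terms sorted alphabetically).
    rows[i] is a sparse row for document i: {column index: count}.
    """
    # Lowercase and tokenize every document.
    tokenized = [TOKEN_PATTERN.findall(doc.lower()) for doc in docs]

    # Build the vocabulary from all distinct terms.
    terms = sorted({tok for toks in tokenized for tok in toks})
    vocab = {term: idx for idx, term in enumerate(terms)}

    # Count term occurrences per document, storing only non-zero entries.
    rows = [dict(Counter(vocab[tok] for tok in toks)) for toks in tokenized]
    return vocab, rows


docs = ["the cat sat", "the cat saw the dog"]
vocab, rows = count_vectorize(docs)
print(vocab)
print(rows)
```

Every step here runs in a single Python thread, one token at a time, which is the kind of work that gets slow once the input reaches gigabytes.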
Continue reading “TVLib: A C++ Text Vectorization Library with Python Bindings”
There is a dataset on Kaggle that contains questions taken from Stack Overflow about the Python programming language. This post briefly explores portions of the dataset.
Continue reading “A Look at Stack Overflow Questions about Python”
This post is an introduction to using the TFANN module for classification problems. The TFANN module is available here on GitHub. The name TFANN is an abbreviation for TensorFlow Artificial Neural Network. TensorFlow is an open-source library for data flow programming. Due to the nature of computational graphs, using TensorFlow can be challenging at times. The TFANN module provides several classes that allow for interaction with the TensorFlow API using familiar object-oriented programming paradigms.
Continue reading “Binary Classification with Artificial Neural Networks using Python and TensorFlow”
In this post, deep learning neural networks are applied to the problem of predicting Bitcoin and other cryptocurrency prices. A chartist approach is taken to predict future values; the network makes predictions based on historical trends in the price and trading volume. A 1D convolutional neural network (CNN) transforms an input volume consisting of historical prices from several major cryptocurrencies into future price information.
Continue reading “Cryptocurrency Price Prediction Using Deep Learning in TensorFlow”
In this post, deep learning neural networks are applied to the problem of optical character recognition (OCR) using Python and TensorFlow. This post makes use of TensorFlow and the convolutional neural network class available in the TFANN module. The full source code from this post is available here.
Continue reading “Deep Learning OCR using TensorFlow and Python”
What characteristics do the works of famous authors have that make them unique? This post uses ensemble methods and wordclouds to explore just that.
Project Gutenberg offers a large number of freely available works from many famous authors. The dataset for this post consists of books, taken from Project Gutenberg, written by each of the following authors:
Continue reading “Using Random Forests and Wordclouds to Visualize Feature Importance in Document Classification”
This post is the fourth part of a series on creating an AI for the game Path of Exile © (PoE).
- A Deep Learning Based AI for Path of Exile: A Series
- Calibrating a Projection Matrix for Path of Exile
- PoE AI Part 3: Movement and Navigation
- PoE AI Part 4: Real-Time Screen Capture and Plumbing
- AI Plays Path of Exile Part 5: Real-Time Obstacle and Enemy Detection using CNNs in TensorFlow
As discussed in the first post of this series, the AI program takes a screenshot of the game and uses it to form predictions that are then used to update its internal state. In this post, efficient methods for capturing images of the game screen are explored.
Continue reading “PoE AI Part 4: Real-time Screen Capture and Plumbing”