I am a big fan of the CountVectorizer class in scikit-learn. With a robust and easy interface that produces (sparse!) matrices, what’s not to love? Well, it’s… pretty… slow…
The performance is okay for tens of megabytes of text, but gigabytes take minutes or more. It turns out that CountVectorizer is implemented in pure Python. The functions are single-threaded too. It seems like low-hanging fruit. Just whip up some parallel C++, right? Well, not quite, but I’m getting ahead of myself.
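For readers unfamiliar with count vectorization, here is a minimal pure-Python sketch of the idea: tokenize each document, build a vocabulary, and store only the non-zero counts per row. It is not scikit-learn's implementation and not TVLib's; the function name `count_vectorize` and the dict-per-row sparse layout are assumptions made for illustration. The regular expression has the same shape as scikit-learn's documented default `token_pattern` (tokens of two or more word characters).

```python
import re
from collections import Counter

# Tokens of two or more word characters (illustrative choice).
TOKEN_PATTERN = re.compile(r"\b\w\w+\b")

def count_vectorize(docs):
    """Return (vocabulary, rows).

    vocabulary maps each term to a column index (terms sorted alphabetically).
    rows[i] is a sparse row for document i: {column index: count}.
    """
    tokenized = [TOKEN_PATTERN.findall(doc.lower()) for doc in docs]
    terms = sorted({tok for toks in tokenized for tok in toks})
    vocabulary = {term: idx for idx, term in enumerate(terms)}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        # Only non-zero entries are stored, which is what makes the result sparse.
        rows.append({vocabulary[tok]: n for tok, n in counts.items()})
    return vocabulary, rows

docs = ["the cat sat", "the cat saw the dog"]
vocabulary, rows = count_vectorize(docs)
print(vocabulary)
print(rows)
```

Every step here runs in a single Python thread, one document at a time, which hints at why this kind of loop becomes slow on gigabytes of text.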
Continue reading “TVLib: A C++ Text Vectorization Library with Python Bindings”
This chapter explores recessions in the United States of America. Datasets are collected from a variety of locations including the Federal Reserve Economic Data (FRED) and from the website of Yale professor and Nobel laureate Dr. Robert J. Shiller. A classifier model is constructed which predicts recessions and this model is analyzed for useful insights.
Continue reading “On the Analysis and Prediction of Recessions in the USA”
In this post, an approach to detecting malware using machine learning is presented. System call activity is processed and analyzed by a classification model to detect the presence of malicious applications.
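As a rough sketch of the idea, the snippet below trains a logistic regression classifier with plain stochastic gradient descent on system call frequency vectors. Everything specific in it is an assumption for illustration: the three feature columns, the toy numbers, and the helper names `train_logistic` and `predict` are made up and do not come from the post's dataset or code.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by stochastic gradient descent on the log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of the log loss with respect to the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    """Return True when the model scores x as malicious (probability >= 0.5)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5

# Toy data (made up): each row holds the relative frequencies of three
# hypothetical system calls observed for one application.
X = [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1], [0.1, 0.2, 0.7], [0.2, 0.1, 0.7]]
y = [0, 0, 1, 1]  # 0 = benign, 1 = malicious

w, b = train_logistic(X, y)
predictions = [predict(w, b, xi) for xi in X]
print(predictions)
```

In a real setting the feature vectors would be far wider (one column per system call or per call sequence) and the model would be evaluated on held-out applications rather than on its own training rows.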
Continue reading “Malware Detection and Classification using Logistic Regression”
CMoerae is a cryptocurrency dashboard application. The dashboard displays predictions and market information for 20 of the most popular cryptocurrencies. CMoerae uses machine learning to make up-to-date predictions based on recent market data. The model is similar to that of my Twitter bot RoboInsights.
Continue reading “Introducing CMoerae: A Cryptocurrency Dashboard Application”
An intermediate activation volume produced by a convolutional neural network predicting the attractiveness of a person.
Does beauty truly lie in the eye of its beholder? This chapter explores the complex array of factors that influence facial attractiveness to answer that question or at least to understand it better.
Continue reading “A Statistical Analysis of Facial Attractiveness”
This post is an introduction to using the TFANN module for classification problems. The TFANN module is available on GitHub. The name TFANN is an abbreviation for TensorFlow Artificial Neural Network. TensorFlow is an open-source library for data flow programming. Due to the nature of computational graphs, using TensorFlow can be challenging at times. The TFANN module provides several classes that allow for interaction with the TensorFlow API using familiar object-oriented programming paradigms.
Continue reading “Binary Classification with Artificial Neural Networks using Python and TensorFlow”
In this post, deep learning neural networks are applied to the problem of predicting Bitcoin and other cryptocurrency prices. A chartist approach is taken to predict future values; the network makes predictions based on historical trends in the price and trading volume. A 1D convolutional neural network (CNN) transforms an input volume consisting of historical prices from several major cryptocurrencies into future price information.
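To make the 1D convolution concrete, here is a small pure-Python sketch of a single filter sliding over a multi-channel price history, where each channel is one cryptocurrency. It is not the network from the post: the function name `conv1d`, the toy prices, and the kernel weights are assumptions chosen so the arithmetic is easy to follow, and there is no nonlinearity or training. As in most deep learning libraries, the operation is technically a cross-correlation (the kernel is not flipped).

```python
def conv1d(inputs, kernel, bias=0.0):
    """Apply one 1D filter with 'valid' padding.

    inputs: list of channels, each a list of T values (one channel per coin).
    kernel: list of channels, each a list of K weights.
    Returns a list of T - K + 1 output values.
    """
    T = len(inputs[0])
    K = len(kernel[0])
    out = []
    for t in range(T - K + 1):
        acc = bias
        # Sum over every channel and every position inside the window.
        for channel, weights in zip(inputs, kernel):
            for k in range(K):
                acc += channel[t + k] * weights[k]
        out.append(acc)
    return out

# Toy input (made up): two coins, four time steps each.
prices = [
    [1.0, 2.0, 3.0, 4.0],
    [10.0, 10.0, 20.0, 20.0],
]
# One filter: a backward difference on coin 1, a two-step average on coin 2.
kernel = [
    [1.0, -1.0],
    [0.5, 0.5],
]

features = conv1d(prices, kernel)
print(features)  # [9.0, 14.0, 19.0]
```

A trained CNN stacks many such filters, learns their weights from data, and inserts nonlinearities between layers.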
Continue reading “Cryptocurrency Price Prediction Using Deep Learning in TensorFlow”