TVLib: A C++ Text Vectorization Library with Python Bindings

I am a big fan of the CountVectorizer class in scikit-learn [1]. With a robust and easy interface that produces (sparse!) matrices, what’s not to love? Well, it’s… pretty… slow…

The performance is okay for tens of MB of text, but GBs take minutes or more. It turns out that CountVectorizer is implemented in pure Python. The functions are single-threaded too. It seems like low-hanging fruit. Just whip up some parallel C++, right? Well, not quite, but I’m getting ahead of myself.
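For readers who have not used it, here is a minimal pure-Python sketch of what count vectorization computes: a sorted vocabulary plus, for each document, a sparse row of term counts. The function name `count_vectorize` and the dict-based row format are made up for illustration; this is neither scikit-learn's implementation nor TVLib's. The token pattern is meant to mirror scikit-learn's documented default (lowercased tokens of two or more word characters).

```python
import re
from collections import Counter


def count_vectorize(docs):
    """Return (vocab, rows); rows[i] maps column index -> count for docs[i].

    Hypothetical helper for illustration only.
    """
    # Assumed tokenization: lowercase, tokens of two or more word characters.
    token_re = re.compile(r"\b\w\w+\b")
    tokenized = [token_re.findall(doc.lower()) for doc in docs]

    # Column indices follow the sorted order of the distinct terms.
    terms = sorted({tok for toks in tokenized for tok in toks})
    vocab = {term: idx for idx, term in enumerate(terms)}

    # Store only non-zero counts, so each row is sparse.
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append({vocab[tok]: n for tok, n in counts.items()})
    return vocab, rows


docs = ["the cat sat", "the cat saw the dog"]
vocab, rows = count_vectorize(docs)
print(vocab)
print(rows)
```

Every step here runs in a single Python loop over tokens, which gives a feel for why a pure-Python vectorizer becomes slow once the input reaches gigabytes.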

Malware Detection and Classification using Logistic Regression

In this post, an approach to detecting malware using machine learning is presented. System call activity is processed and analyzed by a classification model to detect the presence of malicious applications.
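As a rough sketch of the idea, the snippet below turns system call traces into relative-frequency features and fits a logistic regression with plain stochastic gradient descent. Everything in it is an assumption made for illustration: the toy traces, the frequency features, the helper names, and the hyperparameters are not taken from the post, which may use a different representation and training procedure.

```python
import math
from collections import Counter


def featurize(trace, vocab):
    """Relative frequency of each known system call in one trace."""
    counts = Counter(trace)
    total = len(trace) or 1
    return [counts[name] / total for name in vocab]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by stochastic gradient descent on the log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            err = p - yi  # gradient of the log loss w.r.t. the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def predict(w, b, x):
    """Return 1 (malicious) if the predicted probability is at least 0.5."""
    return int(sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) >= 0.5)


# Hypothetical toy traces; label 1 = malicious, 0 = benign.
traces = [
    ["open", "read", "close"],
    ["open", "read", "read", "close"],
    ["open", "write", "connect", "send"],
    ["connect", "send", "send", "write"],
]
labels = [0, 0, 1, 1]

vocab = sorted({call for trace in traces for call in trace})
X = [featurize(trace, vocab) for trace in traces]
w, b = train_logreg(X, labels)
predictions = [predict(w, b, x) for x in X]
print(predictions)
```

System calls that never appear in the training traces are ignored by `featurize`, which is a simplification a real detector would need to revisit.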

Binary Classification with Artificial Neural Networks using Python and TensorFlow

This post is an introduction to using the TFANN module for classification problems. The TFANN module is available on GitHub. The name TFANN is an abbreviation for TensorFlow Artificial Neural Network. TensorFlow is an open-source library for data flow programming. Because models must be expressed as computational graphs, using TensorFlow can be challenging at times. The TFANN module provides several classes that allow for interaction with the TensorFlow API using familiar object-oriented programming paradigms.

Cryptocurrency Price Prediction Using Deep Learning in TensorFlow

In this post, deep learning neural networks are applied to the problem of predicting Bitcoin and other cryptocurrency prices. A chartist approach is taken to predict future values; the network makes predictions based on historical trends in the price and trading volume. A 1D convolutional neural network (CNN) transforms an input volume consisting of historical prices from several major cryptocurrencies into future price information.
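To make the 1D convolution concrete, here is a small pure-Python sketch of a single convolutional filter sliding over a multi-channel price history, where each channel is one cryptocurrency. The function name `conv1d`, the toy prices, and the kernel values are all hypothetical; a trained network would learn many such filters and stack several layers, and the post's actual architecture is not reproduced here.

```python
def conv1d(series, kernels, bias=0.0):
    """Apply one filter to a multi-channel series with no padding, stride 1.

    series:  list of channels, each a list of T values
    kernels: one kernel of width K per channel
    Returns a list of T - K + 1 outputs.
    """
    length = len(series[0])
    width = len(kernels[0])
    out = []
    for t in range(length - width + 1):
        acc = bias
        # Sum the windowed products across every channel.
        for channel, kernel in zip(series, kernels):
            for k in range(width):
                acc += channel[t + k] * kernel[k]
        out.append(acc)
    return out


# Hypothetical price histories for two coins over five time steps.
prices = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [10.0, 10.0, 20.0, 20.0, 30.0],
]
# Hypothetical filter: price change of coin 1 plus half the latest price of coin 2.
kernels = [
    [-1.0, 1.0],
    [0.0, 0.5],
]
features = conv1d(prices, kernels)
print(features)  # [6.0, 11.0, 11.0, 16.0]
```

Each output mixes a short window of history from every coin, which is how one filter can pick up a trend that spans several currencies.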

What is a Bitcoin Worth, Anyway?

Is the recent surge in Bitcoin’s price a speculative bubble?

By definition, an economic bubble is a situation in which an asset is traded within a price range that far exceeds its intrinsic value. So, the question is: what is the intrinsic value of Bitcoin? The purpose of this post is to explain some of the technical details of Bitcoin so as to gain a better idea of its value.

Deep Learning OCR using TensorFlow and Python

In this post, deep learning neural networks are applied to the problem of optical character recognition (OCR) using Python and TensorFlow. The model is built with the convolutional neural network class available in the TFANN module. The full source code from this post is available here.
