I am a big fan of the CountVectorizer class in scikit-learn . With a robust and easy interface that produces (sparse!) matrices, what’s not to love? Well, it’s… pretty… slow…
The performance is okay for 10s of MB of text, but GBs take minutes or more. It terms out that CountVectorizer is implemented in pure Python. The functions are single threaded too. It seems like low-hanging fruit. Just whip up some parallel C++, right? Well, not quite, but I’m getting ahead of myself.
Continue reading “TVLib: A C++ Text Vectorization Library with Python Bindings”
In this post, an approach to detecting malware using machine learning is presented. System call activity is processed and analyzed by a classification model to detect the presence of malicious applications.
Continue reading “Malware Detection and Classification using Logistic Regression”
There is a dataset on Kaggle that contains questions taken from Stack Overflow about the Python programming language. This post briefly explores portions of the dataset.
Continue reading “A Look at Stack Overflow Questions about Python”
This post is an introduction to using the TFANN module for classification problems. The TFANN module is available here on GitHub. The name TFANN is an abbreviation for TensorFlow Artificial Neural Network. TensorFlow is an open-source library for data flow programming. Due to the nature of computational graphs, using TensorFlow can be challenging at times. The TFANN module provides several classes that allow for interaction with the TensorFlow API using familiar object-oriented programming paradigms.
Continue reading “Binary Classification with Artificial Neural Networks using Python and TensorFlow”
In this post, deep learning neural networks are applied to the problem of predicting Bitcoin and other cryptocurrency prices. A chartist approach is taken to predict future values; the network makes predictions based on historical trends in the price and trading volume. A 1D convolutional neural network (CNN) transforms an input volume consisting of historical prices from several major cryptocurrencies into future price information.
Continue reading “Cryptocurrency Price Prediction Using Deep Learning in TensorFlow”
Is the recent surge in Bitcoin’s price a speculative bubble?
By definition, an economic bubble is a situation in which an asset is traded within a price range that far exceeds its intrinsic value. So, the question is: what is the intrinsic value of Bitcoin? The purpose of this post is to explain some of the technical details of Bitcoin so as to gain a better idea of its value.
Continue reading “What is a Bitcoin Worth, Anyway?”
In this post, deep learning neural networks are applied to the problem of optical character recognition (OCR) using Python and TensorFlow. This post makes use of TensorFlow and the convolutional neural network class available in the TFANN module. The full source code from this post is available here.
Continue reading “Deep Learning OCR using TensorFlow and Python”