Maps are a fundamental data structure. Their prevalence is a testament to their importance. Indeed, many search problems can be reduced to the construction of an appropriate map. However, a search problem occasionally arises that is difficult to solve, at least directly, with a map. The interval query problem is one such problem.
Datasets containing nonhomogenous groups of samples present a challenge to linear models. In particular, such datasets violate the assumption that there is a linear relationship between the independent and dependent variables. If the data is grouped into distinct clusters, linear models may predict responses that fall in between the clusters. These predictions can be quite far from the targets depending on how the data is structured. In this post, a method is presented for automatically handling nonhomogenous datasets using linear models.
I am a big fan of the CountVectorizer class in scikit-learn . With a robust and easy interface that produces (sparse!) matrices, what’s not to love? Well, it’s… pretty… slow…
The performance is okay for 10s of MB of text, but GBs take minutes or more. It terms out that CountVectorizer is implemented in pure Python. The functions are single threaded too. It seems like low-hanging fruit. Just whip up some parallel C++, right? Well, not quite, but I’m getting ahead of myself.
In this post, an approach to detecting malware using machine learning is presented. System call activity is processed and analyzed by a classification model to detect the presence of malicious applications.
There is a dataset on Kaggle that contains questions taken from Stack Overflow about the Python programming language. This post briefly explores portions of the dataset.
This post is an introduction to using the TFANN module for classification problems. The TFANN module is available here on GitHub. The name TFANN is an abbreviation for TensorFlow Artificial Neural Network. TensorFlow is an open-source library for data flow programming. Due to the nature of computational graphs, using TensorFlow can be challenging at times. The TFANN module provides several classes that allow for interaction with the TensorFlow API using familiar object-oriented programming paradigms.
In this post, deep learning neural networks are applied to the problem of predicting Bitcoin and other cryptocurrency prices. A chartist approach is taken to predict future values; the network makes predictions based on historical trends in the price and trading volume. A 1D convolutional neural network (CNN) transforms an input volume consisting of historical prices from several major cryptocurrencies into future price information.