Text Mining Online Reviews for Sentiment Analysis


This post introduces several basic text mining techniques, with sample implementations using the Scikit-learn library in Anaconda Python.

Introduction

In data science and machine learning, it is often difficult to extract useful features from raw data. Textual data presents an interesting challenge in this regard, especially given its abundance on the internet. Because of its complexity, natural language is often not directly suited to training a classifier or regressor model. The following section discusses several simple ways to extract useful features from raw text. The dataset of raw text used in this post can be found here.
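One of the simplest ways to turn raw text into features is the bag-of-words model: each document becomes a vector of token counts over a shared vocabulary. Scikit-learn provides this as `CountVectorizer` in `sklearn.feature_extraction.text`. The sketch below is a plain-Python approximation of the idea, assuming a naive lowercase tokenizer (Scikit-learn's default tokenization differs); the two reviews in the usage lines are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())

def build_vocabulary(documents):
    """Map every distinct token in the corpus to a fixed column index (sorted order)."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}

def bag_of_words(documents, vocab):
    """Turn each document into a row of token counts indexed by the vocabulary."""
    vectors = []
    for doc in documents:
        row = [0] * len(vocab)
        for tok, count in Counter(tokenize(doc)).items():
            if tok in vocab:  # tokens unseen when the vocabulary was built are ignored
                row[vocab[tok]] = count
        vectors.append(row)
    return vectors

# Hypothetical reviews, for illustration only.
reviews = ["Great product, works great", "Terrible product"]
vocab = build_vocabulary(reviews)
print(vocab)                          # → {'great': 0, 'product': 1, 'terrible': 2, 'works': 3}
print(bag_of_words(reviews, vocab))   # → [[2, 1, 0, 1], [0, 1, 1, 0]]
```

Each column counts one word, so word order is discarded; that loss is the main limitation of the model and the usual motivation for n-gram features or TF-IDF weighting.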


Multimodal Biometrics for Enhanced Mobile Device Security

This post refers to a contributed article I co-authored, recently published in Communications of the ACM, Vol. 59 No. 4, Pages 58-65. The article, which I worked on while in graduate school, describes the advantages of using multimodal biometrics to secure mobile devices such as cell phones and tablets. It presents an Android implementation of such a multimodal biometric system, along with results and conclusions. The full article is available at this link.


Stock Market Prediction Using Multi-Layer Perceptrons With TensorFlow


In this post, a multi-layer perceptron (MLP) class based on the TensorFlow library is discussed. The class is then applied to the problem of stock prediction given historical data. Note: This post is not meant to characterize how stock prediction is actually done; it is intended to demonstrate the TensorFlow library and MLPs. Update: See part 2 of this series for more examples of using Python and TensorFlow for stock prediction. Update 2: See a later post, Visualizing Neural Network Performance on High-Dimensional Data, for code to help visualize neural network learning and performance.

Data Setup

The data used in this post was collected from finance.yahoo.com. It consists of historical stock data for Yahoo Inc. from 12 April 1996 to 19 April 2016 and can be downloaded as a CSV file from the provided link. To pre-process the data for the neural network, first transform the dates into integer values using LibreOffice’s DATEVALUE function. A screenshot of the transformed data can be seen as follows:
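The same date conversion can be done in Python instead of a spreadsheet. The sketch below assumes LibreOffice Calc's default null date of 30 December 1899, under which `DATEVALUE` counts days elapsed since that date, and assumes the CSV's dates are written as `YYYY-MM-DD`; check both against your own file and settings.

```python
from datetime import date

# Assumption: LibreOffice Calc's default null date (30 December 1899); DATEVALUE
# then returns the number of days elapsed since that date.
NULL_DATE = date(1899, 12, 30)

def parse_iso(text):
    """Parse a 'YYYY-MM-DD' string (the assumed format of the CSV's Date column)."""
    return date.fromisoformat(text)

def date_to_serial(d):
    """Return the integer day count DATEVALUE would produce for date d."""
    return (d - NULL_DATE).days

print(date_to_serial(parse_iso("1996-04-12")))  # → 35167
print(date_to_serial(parse_iso("2016-04-19")))  # → 42479
```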


CombinoChord: A Guitar Chord Generator App


This post is concerned with an approach to generating guitar chord fingerings given run-time parameters describing the guitar configuration and the player’s hand. The approach is expected to run acceptably on an Android mobile device, respond promptly to user input, and assign high heuristic scores to conventional fingerings. The core source code described in this post is available at the following git repository. The app is available for download on the Google Play Store.

Problem Significance

The non-trivial nature of this problem stems from the way in which guitars are constructed. A brute-force approach is unsatisfactory because there is a large number of possible candidates, the vast majority of which are anatomically impossible or produce incorrect notes. Consider enumerating every possible way in which a player could place his or her fingers (excluding the thumb). Because each finger may optionally form a barre, the number of candidates to consider is:
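To make the scale of brute force concrete, the sketch below enumerates every per-string choice (muted, open, or a fret up to `max_fret`) and keeps only the shapes that sound exactly the requested pitch classes within a hypothetical `max_span` fret stretch. It does not model individual fingers or barres, so it is a simplified illustration of why unfiltered enumeration is wasteful, not the post's algorithm; standard tuning expressed as MIDI note numbers is an assumed run-time parameter.

```python
from itertools import product

STANDARD_TUNING = (40, 45, 50, 55, 59, 64)  # E2 A2 D3 G3 B3 E4 as MIDI note numbers

def sounded_pitch_classes(frets, tuning):
    """Pitch classes (0 = C) of every played string; None marks a muted string."""
    return {(open_note + f) % 12 for open_note, f in zip(tuning, frets) if f is not None}

def fret_span(frets):
    """Distance between the lowest and highest fretted (non-open, non-muted) positions."""
    fretted = [f for f in frets if f]  # drops both None (muted) and 0 (open)
    return max(fretted) - min(fretted) if fretted else 0

def brute_force_voicings(chord, tuning=STANDARD_TUNING, max_fret=4, max_span=3):
    """Try every per-string choice; keep shapes sounding exactly `chord` within the span."""
    choices = [None] + list(range(max_fret + 1))
    total, kept = 0, []
    for frets in product(choices, repeat=len(tuning)):
        total += 1
        if sounded_pitch_classes(frets, tuning) == chord and fret_span(frets) <= max_span:
            kept.append(frets)
    return total, kept

E_MAJOR = {4, 8, 11}  # E, G#, B
total, kept = brute_force_voicings(E_MAJOR, max_fret=3)
print(total)                        # → 15625
print((0, 2, 2, 1, 0, 0) in kept)   # → True (the open E major shape)
```

Even with `max_fret=3` there are 5**6 = 15,625 raw candidates; with `max_fret=12` the count grows to 14**6 = 7,529,536 before any anatomical check is applied.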


Multi-Layer Perceptrons and Back-Propagation; a Derivation and Implementation in Python


Artificial neural networks have regained popularity in machine learning circles with recent advances in deep learning. Deep learning techniques trace their origins back to the concept of back-propagation in multi-layer perceptron (MLP) networks, the topic of this post.

Multi-Layer Perceptron Networks for Regression

An MLP network consists of layers of artificial neurons connected by weighted edges. Neurons are denoted n_{ij} for the j-th neuron in the i-th layer of the MLP, numbered from left to right and top to bottom. Inputs are fed into the leftmost layer and propagate through the network along the weighted edges until reaching the final, or output, layer. An example of an MLP network can be seen below in Figure 1.
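Propagation through the network amounts to repeated weighted sums followed by an activation. The pure-Python sketch below assumes each non-input layer is stored as a weight matrix `W` and bias vector `b`, where row `j` of `W` holds the weights on the edges into the j-th neuron of that layer. The tanh hidden activation and linear output neuron are illustrative choices for regression, not necessarily those of the post.

```python
import math

def forward(x, layers, activation=math.tanh):
    """Propagate input vector x through the network, left to right.

    `layers` is a list of (W, b) pairs for every layer after the input layer.
    W[j][k] is the weight on the edge from neuron k of the previous layer to
    neuron j of this layer. Hidden layers apply `activation`; the output layer
    is left linear, a common choice for regression.
    """
    a = list(x)
    for i, (W, b) in enumerate(layers):
        # Weighted sum of the previous layer's outputs plus a bias, per neuron j.
        z = [sum(w_jk * a_k for w_jk, a_k in zip(row, a)) + b_j for row, b_j in zip(W, b)]
        is_output = i == len(layers) - 1
        a = z if is_output else [activation(v) for v in z]
    return a

layers = [([[1.0, 1.0], [1.0, -1.0]], [0.0, 0.0]),  # hidden layer: 2 inputs -> 2 neurons
          ([[2.0, 3.0]], [1.0])]                     # output layer: 2 neurons -> 1
print(forward([1.0, 2.0], layers, activation=lambda v: v))  # → [4.0]
```

With an identity activation the example is easy to check by hand: the hidden outputs are 3 and -1, and the output is 2(3) + 3(-1) + 1 = 4.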

Eigenfaces versus Fisherfaces on the Faces94 Database with Scikit-Learn


In this post, two basic facial recognition techniques are compared on the Faces94 database. Images from the Faces94 database are 180 by 200 pixels in resolution and were taken while the subjects were speaking, to induce variations in the images. To train a classifier with the images, the raw pixel information is extracted, converted to grayscale, and flattened into vectors of dimension 180 \times 200 = 36000. This experiment uses 12 subjects from the database, with 20 images per subject. Each subject's images are stored in a separate directory that contains only those 20 image files.
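The grayscale-and-flatten step can be sketched in plain Python. The luminance weights below are the ITU-R BT.601 coefficients that Pillow documents for `convert("L")`; they are an assumption here, since the post does not say which conversion it uses. Images are assumed to be lists of rows of `(R, G, B)` tuples.

```python
def to_grayscale(rgb_rows):
    """Convert rows of (R, G, B) tuples to luminance (assumed BT.601 weights)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row] for row in rgb_rows]

def flatten(image):
    """Concatenate the rows into one feature vector, row by row."""
    return [value for row in image for value in row]

# A blank white 180 x 200 image (200 rows of 180 pixels), for illustration.
white = [[(255, 255, 255)] * 180 for _ in range(200)]
vector = flatten(to_grayscale(white))
print(len(vector))  # → 36000
```

Stacking the vectors for 12 subjects with 20 images each gives a 240 × 36000 data matrix, one row per image, which is the input both Eigenfaces and Fisherfaces operate on.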

Wine Classification Using Linear Discriminant Analysis with Python and SciKit-Learn


In this post, a classifier is constructed that determines to which cultivar a specific wine sample belongs. Each sample consists of a vector \textbf{v} of 13 attributes of the wine, that is, \textbf{v} \in \mathbb{R}^{13}. The attributes are as follows:

  1. Alcohol
  2. Malic acid
  3. Ash
  4. Alcalinity of ash
  5. Magnesium
  6. Total phenols
  7. Flavanoids
  8. Nonflavanoid phenols
  9. Proanthocyanins
  10. Color intensity
  11. Hue
  12. OD280/OD315 of diluted wines
  13. Proline

Based on these attributes, the goal is to identify from which of three cultivars the data originated. The data set is available at the UCI Machine Learning Repository. Three sample rows from the data set are shown below.
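Loading the data is the first step before fitting LDA. The sketch below assumes the UCI file's layout (no header row, cultivar label in the first column, then the 13 attributes in the order listed above) and computes the per-cultivar mean vectors, one of the quantities LDA is built from. The rows in the usage lines are hypothetical placeholders, not real wine measurements.

```python
import csv
import io

ATTRIBUTES = [
    "Alcohol", "Malic acid", "Ash", "Alcalinity of ash", "Magnesium",
    "Total phenols", "Flavanoids", "Nonflavanoid phenols", "Proanthocyanins",
    "Color intensity", "Hue", "OD280/OD315 of diluted wines", "Proline",
]

def load_wine(text):
    """Split each CSV row into a cultivar label and a 13-attribute vector.

    Assumes the label is the first column and the attributes follow in the
    order of ATTRIBUTES, with no header row.
    """
    labels, vectors = [], []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        values = [float(x) for x in row[1:]]
        if len(values) != len(ATTRIBUTES):
            raise ValueError(f"expected {len(ATTRIBUTES)} attributes, got {len(values)}")
        labels.append(int(row[0]))
        vectors.append(values)
    return labels, vectors

def class_means(labels, vectors):
    """Mean attribute vector for each cultivar label."""
    sums, counts = {}, {}
    for y, v in zip(labels, vectors):
        acc = sums.setdefault(y, [0.0] * len(v))
        for k, x in enumerate(v):
            acc[k] += x
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

# Hypothetical placeholder rows, not real measurements.
sample = "\n".join(["1," + ",".join(["1"] * 13),
                    "1," + ",".join(["3"] * 13),
                    "2," + ",".join(["5"] * 13)])
labels, vectors = load_wine(sample)
print(labels)                                # → [1, 1, 2]
print(class_means(labels, vectors)[1][:3])   # → [2.0, 2.0, 2.0]
```

Scikit-learn's `LinearDiscriminantAnalysis` (in `sklearn.discriminant_analysis`) can then be fit on `vectors` and `labels` to build the classifier itself.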