Updated April 8th, 2020
I’ve spent a disturbing amount of time trying to come up with a decent model for the COVID-19 pandemic. The big challenge is how little good data there is: pretty much all available data is riddled with confounding variables and bias. There is a long list of challenges, but the ones I find most daunting are that:
Continue reading “CoVID-19 Projections using Kernel SVR and Death Rate Analysis”
The following plots explore the influence of two economic factors during recessions on the S&P 500 index: unemployment and gross domestic product (GDP). A linear model is constructed to predict the low of the S&P 500 index during a given recession using two transformed variables derived from the maximum unemployment and GDP differentials for that recession.
Figure 1: Multiple Regression Analysis of Past Recessions
Predictions for the decrease in GDP and the unemployment level are taken to be 25% and 20%, respectively, based on a Goldman Sachs forecast and an estimate from Mnuchin. A ridge regression model is fit, with the regularization weight chosen by grid search and leave-one-out validation.
Past recessions with more similarity both temporally and in terms of the current estimates are given more weight. By so doing, the model focuses on past recessions that are more similar in nature to the current situation. In the above plots, the weight of each recession is indicated by the size of the scatter plot marker.
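The procedure described above can be sketched with scikit-learn. The data, feature transforms, and similarity weights below are illustrative placeholders (the post's actual transformed variables and weighting scheme are not shown here); only the fitting machinery — ridge regression with per-recession sample weights, regularization strength chosen by grid search with leave-one-out validation — follows the description.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, LeaveOneOut

# Hypothetical data: one row per past recession. Columns stand in for the
# transformed max-unemployment and GDP differentials (illustrative values).
X = np.array([
    [0.25, 0.13], [0.09, 0.03], [0.11, 0.05], [0.06, 0.02],
    [0.10, 0.04], [0.08, 0.03], [0.11, 0.05],
])
# Target: fractional S&P 500 drawdown during each recession (illustrative).
y = np.array([0.86, 0.28, 0.36, 0.17, 0.27, 0.20, 0.57])
# Hypothetical weights emphasizing recessions more similar to the current one.
weights = np.array([1.5, 0.5, 0.8, 0.4, 0.7, 0.9, 1.3])

# Choose the regularization strength by grid search with leave-one-out validation.
search = GridSearchCV(
    Ridge(),
    param_grid={"alpha": np.logspace(-4, 2, 25)},
    cv=LeaveOneOut(),
    scoring="neg_mean_squared_error",
)
# sample_weight is forwarded to Ridge.fit for each training split,
# so more similar recessions pull harder on the fit.
search.fit(X, y, sample_weight=weights)

best = search.best_estimator_
# Predict the drawdown for a hypothetical transformed scenario input.
print(best.predict([[0.20, 0.25]]))
```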
Note: This post is for informational purposes only and does not constitute financial, professional, or any other form of advice.
Decision trees are a simple yet powerful method of machine learning. A binary tree is constructed in which the leaf nodes represent predictions and the internal nodes are decision points. Thus, paths from the root to the leaves represent sequences of decisions that result in an ultimate prediction.
Decision trees can also be used in hierarchical models. For instance, the leaves can instead represent subordinate models. A path from the root to a leaf node is then a sequence of decisions that results in a prediction made by a subordinate model, which is only responsible for the samples that fall within its leaf.
This post presents an approach for a hierarchical decision tree model with subordinate linear regression models.
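A minimal sketch of the hierarchical idea: partition the feature space with a shallow tree, then fit one linear model per leaf. Note this sketch uses scikit-learn's standard variance-reduction splitting criterion, not the correlation criterion the post develops, and the piecewise-linear data is invented for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(400, 1))
# Piecewise-linear target: a different linear regime on each side of x = 0.
y = np.where(X[:, 0] < 0, 2 * X[:, 0] + 1, -0.5 * X[:, 0] + 4)
y = y + rng.normal(0, 0.1, 400)

# A shallow tree partitions the feature space into leaves.
tree = DecisionTreeRegressor(max_depth=2).fit(X, y)
leaf_ids = tree.apply(X)

# Fit one subordinate linear model on the samples in each leaf.
leaf_models = {
    leaf: LinearRegression().fit(X[leaf_ids == leaf], y[leaf_ids == leaf])
    for leaf in np.unique(leaf_ids)
}

def predict(X_new):
    """Route each sample through the tree, then predict with its leaf's model."""
    leaves = tree.apply(X_new)
    return np.array([
        leaf_models[leaf].predict(x.reshape(1, -1))[0]
        for leaf, x in zip(leaves, X_new)
    ])

print(predict(np.array([[-1.0], [1.0]])))
```

Because each leaf's samples lie within a single linear regime, the subordinate models recover the local slopes that a single global fit (or the tree's constant leaf values) would blur.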
Continue reading “Applying Correlation as a Criterion in Hierarchical Decision Trees”
Datasets containing nonhomogeneous groups of samples present a challenge to linear models. In particular, such datasets violate the assumption of a linear relationship between the independent and dependent variables. If the data is grouped into distinct clusters, a linear model may predict responses that fall in between the clusters; depending on how the data is structured, these predictions can be quite far from the targets. In this post, a method is presented for automatically handling nonhomogeneous datasets using linear models.
Continue reading “A Method for Addressing Nonhomogeneous Data using an Implicit Hierarchical Linear Model”
A problem that frequently arises when applying linear models is multicollinearity: the phenomenon in which one or more features in the data matrix can be accurately predicted by a linear model involving the other features. The consequences of multicollinearity include numerical instability due to ill-conditioning and difficulty in interpreting the regression coefficients. An approach to decorrelating features using the Gram-Schmidt process is presented.
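The core of the idea fits in a short NumPy sketch (using the modified Gram-Schmidt variant for numerical stability; the example data, with a third feature that is nearly a linear combination of the first two, is invented for illustration):

```python
import numpy as np

def gram_schmidt(X):
    """Orthogonalize the columns of X in order: each column has its
    projection onto every earlier orthogonalized column subtracted off."""
    Q = np.zeros_like(X, dtype=float)
    for j in range(X.shape[1]):
        v = X[:, j].astype(float)
        for i in range(j):
            v = v - (Q[:, i] @ v) / (Q[:, i] @ Q[:, i]) * Q[:, i]
        Q[:, j] = v
    return Q

# Example: the third feature is almost a linear combination of the first two.
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 2))
X = np.column_stack([A, A @ [1.0, -2.0] + rng.normal(0, 0.01, 100)])

Q = gram_schmidt(X)
# Off-diagonal entries of Q.T @ Q are now ~0: the columns are orthogonal.
print(np.round(Q.T @ Q, 6))
```

One caveat worth noting: orthogonality equals zero correlation only if the columns are centered first, so mean-center the features before orthogonalizing if zero pairwise correlation is the goal.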
Continue reading “Decorrelating Features using the Gram-Schmidt Process”
This chapter explores recessions in the United States of America. Datasets are collected from a variety of sources, including the Federal Reserve Economic Data (FRED) and the website of Yale professor and Nobel laureate Dr. Robert J. Shiller. A classifier model that predicts recessions is constructed and analyzed for useful insights.
Continue reading “On the Analysis and Prediction of Recessions in the USA”
Recently, I have been experimenting with windowing functions for time series data. While trying out my code, I came up with a nice and (somewhat) thought-provoking plot.
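The basic windowing operation behind a plot like this is a trailing moving average. A minimal sketch with pandas (the random-walk series below is a stand-in for the real, inflation-adjusted S&P Composite data the post actually uses):

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for a real (inflation-adjusted) price series.
rng = np.random.default_rng(0)
prices = pd.Series(100 + np.cumsum(rng.normal(0.1, 1.0, 500)))

# Trailing 12-period moving average: the mean of each window of 12 observations
# (the first 11 entries are NaN, since those windows are incomplete).
ma_12 = prices.rolling(window=12).mean()

# The same thing by hand, as a convolution with a flat kernel.
kernel = np.ones(12) / 12
ma_np = np.convolve(prices.to_numpy(), kernel, mode="valid")
```

Both routes give identical values where the window is full; `rolling` just keeps the result aligned with the original index, which is convenient for plotting against the price series.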
Continue reading “Real S&P Composite Moving Average”