nicholastsmith

Posted on February 16, 2023February 16, 2023 by Nicholas T Smith

The Matrix-Vector Product and the SVD

The singular value decomposition (SVD) allows one to re-write a given matrix as a sum of rank one matrices. Specifically, using the SVD, one may re-write a given matrix A as follows:

$\mathbf{A}=\mathbf{U}\mathbf{\Sigma}\mathbf{V}^T = \sum\limits_{i=1}^{n}{\mathbf{U}_i\mathbf{\Sigma}_{ii}\mathbf{V}_i^T}$ ,

where $\mathbf{V}_i^T$ is the transpose of the i-th column of V. Further, the Eckart-Young-Mirsky theorem proves that the best rank k approximation to the matrix A is found by summing only the first k elements of the right-hand sum.
Read more

Posted on December 23, 2022 by Nicholas T Smith

Cause of Death Records in 2021

This post explores Multiple Cause-of-Death records for the year 2021, taken from the U.S. Division of Vital Statistics. Specifically, the impact of the second full year of the coronavirus pandemic is explored in more detail.
Read more

Posted on December 19, 2022 by Nicholas T Smith

Increases in Circulatory Death During the Coronavirus Pandemic

This post takes a closer look at Multiple Cause-of-Death records during the first year of the Coronavirus Pandemic. In this post, changes in mortality records involving the circulatory system (i.e. ICD-10 codes starting with I) are analyzed in more detail. These codes cover commonly occurring diseases like heart attacks, strokes, and other disease of the cardiovascular and, more broadly, the circulatory system.

Posted on December 6, 2022December 6, 2022 by Nicholas T Smith

Cause of Death in the USA: 1959-2020

This post takes another look at Mortality Multiple Cause-of-Death records from the U.S. Division of Vital Statistics. Previous posts have analyzed records for the year 2016 and also deaths due to Influenza & Pneumonia in 2017. In this post, records from 1959 to the 2020, the latest year currently available, are analyzed.
Read more

Posted on November 3, 2022December 11, 2022 by Nicholas T Smith

Embedding Recipes using Kernel PCA

The previous post discusses Kernel PCA and recipes, or formulae, for deriving new kernels from known good kernels. This post applies these approaches to generate vector embeddings in a specific domain: culinary recipes. The idea is to find a low-dimensional representation of recipes such that points in the embedding space are neighbors to similar recipes.
Read more

Posted on October 31, 2022November 1, 2022 by Nicholas T Smith

Kernel Recipes and Kernel PCA

One strength of kernel methods is their ability to operate directly on non-numerical objects like sets. As seen in the previous post, the Jaccard index on sets satisfies Mercer’s condition and thus is a valid kernel. The process of proving a similarity measure is a valid kernel is somewhat involved, but thankfully several theorems can be employed to get more mileage out of the set of known good kernels. This post outlines some recipes for producing new valid kernels and introduces a method for obtaining numerical representations of samples using kernel methods.
Read more

Posted on October 7, 2022 by Nicholas T Smith

The Jaccard Kernel and its Implied Feature Space

Kernel methods leverage the kernel trick to implicitly perform inner products in often high and even infinite dimensional feature spaces. For instance, the radial basis function (RBF) kernel can be shown to map to an infinite-dimensional feature space. In general, if a similarity function can be shown to satisfy Mercer’s Condition, one may operate on the finite-dimensional Gram matrix induced by the function, while receiving the benefit of mathematical guarantees about the implied transformation.

Posted on September 8, 2022 by Nicholas T Smith

On The Importance of Centering in PCA

The previous post presents methods for efficiently performing principal component analysis (PCA) on certain rectangular sparse matrices. Since routines for performing the singular value decomposition (SVD) on sparse matrices are readily available (e.g. svds and TruncatedSVD), it is reasonable to investigate the influence centering has on the resulting transformation.
Read more

Posted on September 5, 2022December 11, 2022 by Nicholas T Smith

Weighted Sparse PCA for Rectangular Matrices

Consider a sparse mxn rectangular matrix $\mathbf{X}$ , where either $m >> n$ or $m << n$ . Performing a principal component analysis (PCA) on $\mathbf{X}$ involves computing the eigenvectors of its covariance matrix. This is often accomplished using the singular value decomposition (SVD) of the centered matrix $\mathbf{X}-\mathbf{U}$ . But, with large sparse matrices, this centering step is frequently intractable. If it is tractable, however, to compute the eigendecomposition of either an mxm or an nxn dense matrix in memory, other approaches are possible.

Posted on July 8, 2022February 8, 2023 by Nicholas T Smith

Weighted PCA

Consider an m by n matrix $\mathbf{A}$ and an m by 1 vector of integers $\mathbf{w}$ . Now, consider the matrix $\mathbf{R}$ , where $\mathbf{R}$ is formed by taking $\mathbf{A_i}$ , that is the i-th row of $\mathbf{A}$ , and repeating it $\mathbf{w_i}$ times. This new matrix $\mathbf{R}$ has dimension s by n, where

$s = \sum_{i=1}^{m}{\mathbf{w_i}}$ .

If $s >> m$ , then it is undesirable to compute the full matrix $\mathbf{R}$ and perform principal component analysis (PCA) on it. Instead, the highly structured form of $\mathbf{R}$ , suggests there should be a more efficient method to perform this decomposition. This method is known as weighted PCA and is the topic of this post.