Monthly Archives: December 2014

Biyaheng Panulat 2014

Assalamu Alaykum! (Peace be upon you all!)Last October 2014 I attended the “Biyaheng Panulat 2014” wherein famous Filipino writers and composers came to our campus to, well, supposedly give inspiration to those who are aspiring to be writers and compos…

Seda Centrio CDO Hotel New Year Countdown

Ring in the New Year 2015 at Seda Centrio, Cagayan de Oro’s premiere hotel. A special New Year’s Eve Countdown Party will be held from 7PM to 2AM and for only Php 1,200 nett per person, you get access to a carefully prepared cocktail buffet, live music (c/o local singers …

Handmade acoustic guitars

Whether you play guitar for friends or as a professional, great sound quality can make a real difference. Guitar players of all kinds, from folk to bluegrass to jazz, understand the beautiful sounds that can come from acoustic handmade guitars. Those who make acoustic guitars can be true craftsmen. They …

Almost

“Almost”by Ahmad “Anakiluh” HajiriCamera used: Canon EOS 500DF/22, 1/3200. ISO-100Photo taken in BASECO, Manila.”There are dreams that we can almost reach, yet still unreachable. We just have to keep on believing and dreaming, and sooner we will reach …

UNYPAD-Maguindanao Cluster-1 Chapter implements capacity building program

By Sittie Saada S. Sampayan COTABATO City (December 26, 2014) – The Maguindanao Cluster-1 chapter of the United Youth for Peace and Development (UNYPAD) organized a seminar-workshop on Islamic Leadership and Management, which was actively participated in by the 18 youth leaders of the UNYPAD coming from municipalities of Parang, Barira, Buldon, Matanog, Sultan Mastura

Public speaking training for UNYPAD MSU-Chapter held

“We have a lot of ideas in mind but when we’re in the front of people, we’re mentally blocked and we’re having a stage fright,” expressed by one of the participants in Seminar-Workshop on public speaking organized by the UNYPAD school-based chapter of Mindanao State University-Maguindnao. Yusoph Sabpa, president of UNYPAD MSU-Chapter said that every

Upcoming projects

In sha Allah:December 27, 2014New Muslim Care Philippine’sFeed a Child. Touch a Life Program.for more info please click the image or visit the NMCP’s FB page.2015TOWARDS PEACE 2015.Visit the FB page for more info: Towards Peace 2015March 1, 2015 in Man…

R: Principal Component Analysis on Imaging

Ever wonder what’s the mathematics behind face recognition on most gadgets like digital camera and smartphones? Well for most part it has something to do with statistics. One statistical tool that is capable of doing such feature is the Principal Component Analysis (PCA). In this post, however, we will not do (sorry to disappoint you) face recognition as we reserve this for future post while I’m still doing research on it. Instead, we go through its basic concept and use it for data reduction on spectral bands of the image using R.

Let’s view it mathematically

Consider a line $L$ in a parametric form described as a set of all vectors $k\cdot\mathbf{u}+\mathbf{v}$ parameterized by $k\in \mathbb{R}$, where $\mathbf{v}$ is a vector orthogonal to a normalized vector $\mathbf{u}$. Below is the graphical equivalent of the statement:

So if given a point $\mathbf{x}=[x_1,x_2]^T$, the orthogonal projection of this point on the line $L$ is given by $(\mathbf{u}^T\mathbf{x})\mathbf{u}+\mathbf{v}$. Graphically, we mean

$Proj$ is the projection of the point $\mathbf{x}$ on the line, where the position of it is defined by the scalar $\mathbf{u}^{T}\mathbf{x}$. Therefore, if we consider $\mathbf{X}=[X_1, X_2]^T$ be a random vector, then the random variable $Y=\mathbf{u}^T\mathbf{X}$ describes the variability of the data on the direction of the normalized vector $\mathbf{u}$. So that $Y$ is a linear combination of $X_i, i=1,2$. The principal component analysis identifies a linear combinations of the original variables $\mathbf{X}$ that contain most of the information, in the sense of variability, contained in the data. The general assumption is that useful information is proportional to the variability. PCA is used for data dimensionality reduction and for interpretation of data. (Ref 1. Bajorski, 2012)

To better understand this, consider two dimensional data set, below is the plot of it along with two lines ($L_1$ and $L_2$) that are orthogonal to each other:

If we project the points orthogonally to both lines we have,

So that if normalized vector $\mathbf{u}_1$ defines the direction of $L_1$, then the variability of the points on $L_1$ is described by the random variable $Y_1=\mathbf{u}_1^T\mathbf{X}$. Also if $\mathbf{u}_2$ is a normalized vector that defines the direction of $L_2$, then the variability of the points on this line is described by the random variable $Y_2=\mathbf{u}_2^T\mathbf{X}$. The first principal component is one with maximum variability. So in this case, we can see that $Y_2$ is more variable than $Y_1$, since the points projected on $L_2$ are more dispersed than in $L_1$. In practice, however, the linear combinations $Y_i = \mathbf{u}_i^T\mathbf{X}, i=1,2,\cdots,p$ is maximized sequentially so that $Y_1$ is the linear combination of the first principal component, $Y_2$ is the linear combination of the second principal component, and so on. Further, the estimate of the direction vector $\mathbf{u}$ is simply the normalized eigenvector $\mathbf{e}$ of the variance-covariance matrix $\mathbf{\Sigma}$ of the original variable $\mathbf{X}$. And the variability explained by the principal component is the corresponding eigenvalue $\lambda$. For more details on theory of PCA refer to (Bajorski, 2012) at Reference 1 below.

As promised we will do dimensionality reduction using PCA. We will use the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) data from (Barjorski, 2012), you can use other locations of AVIRIS data that can be downloaded here. However, since for most cases the AVIRIS data contains thousands of bands so for simplicity we will stick with the data given in (Bajorski, 2012) as it was cleaned reducing to 152 bands only.

What is spectral bands?

In imaging, spectral bands refer to the third dimension of the image usually denoted as $\lambda$. For example, RGB image contains red, green and blue bands as shown below along with the first two dimensions $x$ and $y$ that define the resolution of the image.

These are few of the bands that are visible to our eyes, there are other bands that are not visible to us like infrared, and many other in electromagnetic spectrum. That is why in most cases AVIRIS data contains huge number of bands each captures different characteristics of the image. Below is the proper description of the data.

Data

The Airborne Visible/Infrared Imaging Spectrometer (AVIRIS), is a sensor collecting spectral radiance in the range of wavelengths from 400 to 2500 nm. It has been flown on various aircraft platforms, and many images of the Earth’s surface are available. A 100 by 100 pixel AVIRIS image of an urban area in Rochester, NY, near the Lake Ontario shoreline is shown below. The scene has a wide range of natural and man-made material including a mixture of commercial/warehouse and residential neighborhoods, which adds a wide range of spectral diversity. Prior to processing, invalid bands (due to atmospheric water absorption) were removed, reducing the overall dimensionality to 152 bands. This image has been used in Bajorski et al. (2004) and Bajorski (2011a, 2011b). The first 152 values in the AVIRIS Data represent the spectral radiance values (a spectral curve) for the top left pixel. This is followed by spectral curves of the pixels in the first row, followed by the next row, and so on. (Ref. 1 Bajorski, 2012)

To load the data, run the following code:

Above code uses EBImage package, and can be installed from my previous post.

Why do we need to reduce the dimension of the data?

Before we jump in to our analysis, in case you may ask why? Well sometimes it’s just difficult to do analysis on high dimensional data, especially on interpreting it. This is because there are dimensions that aren’t significant (like redundancy) which adds to our problem on the analysis. So in order to deal with this, we remove those nuisance dimension and deal with the significant one.

To perform PCA in R, we use the function princomp as seen below:

The structure of princomp consist of a list shown above, we will give description to selected outputs. Others can be found in the documentation of the function by executing ?princomp.

  • sdev – standard deviation, the square root of the eigenvalues $\lambda$ of the variance-covariance matrix $\mathbf{\Sigma}$ of the data, dat.mat;
  • loadings – eigenvectors $\mathbf{e}$ of the variance-covariance matrix $\mathbf{\Sigma}$ of the data, dat.mat;
  • scores – the principal component scores.

Recall that the objective of PCA is to find for a linear combination $Y=\mathbf{u}^T\mathbf{X}$ that will maximize the variance $Var(Y)$. So that from the output, the estimate of the components of $\mathbf{u}$ is the entries of the loadings which is a matrix of eigenvectors, where the columns corresponds to the eigenvectors of the sequence of principal components, that is if the first principal component is given by $Y_1=\mathbf{u}_1^T\mathbf{X}$, then the estimate of $\mathbf{u}_1$ which is $\mathbf{e}_1$ (eigenvector) is the set of coefficients obtained from the first column of the loadings. The explained variability of the first principal component is the square of the first standard deviation sdev, the explained variability of the second principal component is the square of the second standard deviation sdev, and so on. Now let’s interpret the loadings (coefficients) of the first three principal components. Below is the plot of this,

Base above, the coefficients of the first principal component (PC1) are almost all negative. A closer look, the variability in this principal component is mainly explained by the weighted average of radiance of the spectral bands 35 to 100. Analogously, PC2 mainly represents the variability of the weighted average of radiance of spectral bands 1 to 34. And further, the fluctuation of the coefficients of PC3 makes it difficult to tell on which bands greatly contribute on its variability. Aside from examining the loadings, another way to see the impact of the PCs is through the impact plot where the impact curve $\sqrt{\lambda_j}\mathbf{e}_j$ are plotted, I want you to explore that.

Moving on, let’s investigate the percent of variability in $X_i$ explained by the $j$th principal component, below is the formula of this, \begin{equation}\nonumber \frac{\lambda_j\cdot e_{ij}^2}{s_{ii}}, \end{equation} where $s_{ii}$ is the estimated variance of $X_i$. So that below is the percent of explained variability in $X_i$ of the first three principal components including the cumulative percent variability (sum of PC1, PC2, and PC3),

For the variability of the first 33 bands, PC2 takes on about 90 percent of the explained variability as seen in the above plot. And still have great contribution further to 102 to 152 bands. On the other hand, from bands 37 to 100, PC1 explains almost all the variability with PC2 and PC3 explain 0 to 1 percent only. The sum of the percentage of explained variability of these principal components is indicated as orange line in the above plot, which is the cumulative percent variability.

To wrap up this section, here is the percentage of the explained variability of the first 10 PCs.

PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10
Table 1: Variability Explained by the First Ten Principal Components for the AVIRIS data.
82.057 17.176 0.320 0.182 0.094 0.065 0.037 0.029 0.014 0.005

Above variability were obtained by noting that the variability explained by the principal component is simply the eigenvalue (square of the sdev) of the variance-covariance matrix $\mathbf{\Sigma}$ of the original variable $\mathbf{X}$, hence the percentage of variability explained by the $j$th PC is equal to its corresponding eigenvalue $\lambda_j$ divided by the overall variability which is the sum of the eigenvalues, $\sum_{j=1}^{p}\lambda_j$, as we see in the following code,

Stopping Rules

Given the list of percentage of variability explained by the PCs in Table 1, how many principal components should we take into account that would best represent the variability of the original data? To answer that, we introduce the following stopping rules that will guide us on deciding the number of PCs:

  1. Scree plot;
  2. Simple fare-share;
  3. Broken-stick; and,
  4. Relative broken-stick.

The scree plot is the plot of the variability of the PCs, that is the plot of the eigenvalues. Where we look for an elbow or sudden drop of the eigenvalues on the plot, hence for our example we have

Therefore, we need return the first two principal components based on the elbow shape. However, if the eigenvalues differ by order of magnitude, it is recommended to use the logarithmic scale which is illustrated below,

Unfortunately, sometimes it won’t work as we can see here, it’s just difficult to determine where the elbow is. The succeeding discussions on the last three stopping rules are based on (Bajorski, 2012). The simple fair-share stopping rule identifies the largest $k$ such that $\lambda_k$ is larger than its fair share, that is larger than $(\lambda_1+\lambda_2+\cdots+\lambda_p)/p$. To illustrate this, consider the following:

Thus, we need to stop at second principal component.

If one was concerned that the above method produces too many principal components, a broken-stick rule could be used. The rule is that it identifies the principal components with largest $k$ such that $\lambda_j/(\lambda_1+\lambda_2+\cdots +\lambda_p)>a_j$, for all $j\leq k$, where \begin{equation}\nonumber a_j = \frac{1}{p}\sum_{i=j}^{p}\frac{1}{i},\quad j =1,\cdots, p. \end{equation} Let’s try it,

Above result coincides with the first two stopping rule. The draw back of simple fair-share and broken-stick rules is that it do not work well when the eigenvalues differ by orders of magnitude. In such case, we then use the relative broken-stick rule, where we analyze $\lambda_j$ as the first eigenvalue in the set $\lambda_j\geq \lambda_{j+1}\geq\cdots\geq\lambda_{p}$, where $j < p$. The dimensionality $k$ is chosen as the largest value such that $\lambda_j/(\lambda_j+\cdots +\lambda_p)>b_j$, for all $j\leq k$, where \begin{equation}\nonumber b_j = \frac{1}{p-j+1}\sum_{i=1}^{p-j+1}\frac{1}{i}. \end{equation} Applying this to the data we have,

According to the numerical output, the first 34 principal components are enough to represent the variability of the original data.

Principal Component Scores

The principal component scores is the resulting new data set obtained from the linear combinations $Y_j=\mathbf{e}_j(\mathbf{x}-\bar{\mathbf{x}}), j = 1,\cdots, p$. So that if we use the first three stopping rules, then below is the scores (in image) of PC1 and PC2,

If we base on the relative broken-stick rule then we return the first 34 PCs, and below is the corresponding scores (in image).

Click on the image to zoom in.

Residual Analysis

Of course when doing PCA there are errors to be considered unless one would return all the PCs, but that would not make any sense because why would someone apply PCA when you still take into account all the dimensions? An overview of the errors in PCA without going through the theory is that, the overall error is simply the excluded variability explained by the $k$th to $p$th principal components, $k>j$.

Reference

MILF community leaders attend 3-day seminar-workshop

By Phix Samsaraji ZAMBOANGA City (December 25, 2014) – Hundreds of political leaders and members of the Moro Islamic Liberation Front (MILF) in Zamboanga City participated in the 3-day seminar workshop  on Bangsamoro Governance, Basic Organizing, and advocacy on Bangsamoro Basic Law (BBL) conducted by the Bangsamoro Leadership and Management Institute (BLMI) in partnership with

Becoming Filipino Kyle Jennermann has a message for Mindanaoan.com readers

A lot of my readers have been asking me about Kyle Jennermann, the 27 year old Canadian who wants to become Filipino. Yes, you read that right. Kyle, who apparently has fallen madly in love with our country, has been chronicling his fun and happy escapades as he tries to become Pinoy …

DERMAX Laser Center CDO to open January 2015

DERMAX Laser Center, known for their innovative, science-based and technology-focused skin care products and procedures such as laser treatments for permanent hair removal and skin discolorations, will open its very first branch in Cagayan de Oro on January 8, 2015 at Level 3, Centrio Mall. The DERMAX Laser Center brand …

Pancake House re-opens at Limketkai Mall CDO

Just writing about this is making me swirl with happiness! Pancake House has finally re-opened along the North Concourse of Limketkai Mall, Cagayan de Oro City! After a temporary halt in operations a few months ago (due to the expansion of the newly-opened Globe Gen3 store), Pancake House CDO is …

Trial

Bismillah

Category: Uncategorized

Mindanaoan.com declared top CDO blog by ICT Awards

It is with utmost humility that I announce that Mindanaoan.com has just been declared as a “Top CDO Blog” in the recently concluded 2014 ICT Awards held at the Activity Center, Centrio Mall, Cagayan de Oro, Philippines. The ICT Awards, organized by the CDO ICT Business Council, Cagayan de Oro’s main …