## Coffee and feelm.

Assalamu Alaykum! (Peace be upon you all!) First, a friendly warning: If you came here hoping to find a helpful tip on how to survive the climb to Mt. Palay-Palay, better known as Pico de Loro (“Parrot’s beak”), you are in the WRONG PLACE! Please find anoth…

Assalamu Alaykum! (Peace be upon you all!) Last October 2014 I attended the “Biyaheng Panulat 2014” wherein famous Filipino writers and composers came to our campus to, well, supposedly give inspiration to those who are aspiring to be writers and compos…

Whether you play guitar for friends or as a professional, great sound quality can make a real difference. Guitar players of all kinds, from folk to bluegrass to jazz, understand the beautiful sounds that can come from acoustic handmade guitars. Those who make acoustic guitars can be true craftsmen. They …

A cat in Ueno.

This is so cute. Kitties desperately wanting fish. Well, create a diversion and then go for the goods. #cat

Chewy is grumpy this chilly morning.

“Almost” by Ahmad “Anakiluh” Hajiri. Camera used: Canon EOS 500D, F/22, 1/3200, ISO-100. Photo taken in BASECO, Manila. “There are dreams that we can almost reach, yet still unreachable. We just have to keep on believing and dreaming, and sooner we will reach …

By Sittie Saada S. Sampayan COTABATO City (December 26, 2014) – The Maguindanao Cluster-1 chapter of the United Youth for Peace and Development (UNYPAD) organized a seminar-workshop on Islamic Leadership and Management, which was actively participated in by the 18 youth leaders of the UNYPAD coming from municipalities of Parang, Barira, Buldon, Matanog, Sultan Mastura

“We have a lot of ideas in mind but when we’re in the front of people, we’re mentally blocked and we’re having a stage fright,” said one of the participants in the seminar-workshop on public speaking organized by the UNYPAD school-based chapter of Mindanao State University-Maguindanao. Yusoph Sabpa, president of UNYPAD MSU-Chapter said that every

In sha Allah: December 27, 2014: New Muslim Care Philippine’s Feed a Child. Touch a Life Program. For more info please click the image or visit the NMCP’s FB page. 2015: TOWARDS PEACE 2015. Visit the FB page for more info: Towards Peace 2015. March 1, 2015 in Man…

Ever wondered what mathematics lies behind face recognition on gadgets like digital cameras and smartphones? For the most part it has something to do with statistics. One statistical tool capable of this is Principal Component Analysis (PCA). In this post, however, we will not do face recognition (sorry to disappoint you); we reserve that for a future post while I’m still doing research on it. Instead, we go through the basic concept of PCA and use it for data reduction on the spectral bands of an image using R.
### Let’s view it mathematically

### What are spectral bands?

### Why do we need to reduce the dimension of the data?

### Stopping Rules

### Principal Component Scores

### Residual Analysis

### Reference

Consider a line $L$ in a parametric form described as a set of all vectors $k\cdot\mathbf{u}+\mathbf{v}$ parameterized by $k\in \mathbb{R}$, where $\mathbf{v}$ is a vector orthogonal to a normalized vector $\mathbf{u}$. Below is the graphical equivalent of the statement:

Given a point $\mathbf{x}=[x_1,x_2]^T$, the orthogonal projection of this point onto the line $L$ is $(\mathbf{u}^T\mathbf{x})\mathbf{u}+\mathbf{v}$. Graphically, we mean

$Proj$ is the projection of the point $\mathbf{x}$ on the line, and its position along the line is given by the scalar $\mathbf{u}^{T}\mathbf{x}$. Therefore, if we let $\mathbf{X}=[X_1, X_2]^T$ be a random vector, then the random variable $Y=\mathbf{u}^T\mathbf{X}$ describes the variability of the data in the direction of the normalized vector $\mathbf{u}$, and $Y$ is a linear combination of $X_i, i=1,2$. *The principal component analysis identifies a linear combinations of the original variables $\mathbf{X}$ that contain most of the information, in the sense of variability, contained in the data. The general assumption is that useful information is proportional to the variability. PCA is used for data dimensionality reduction and for interpretation of data. (Ref 1. Bajorski, 2012)*
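The projection formula can be checked numerically. The following is a minimal sketch in base R; the values of `u`, `v` and `x` are hypothetical and chosen only so that `u` has unit length and `v` is orthogonal to it.

```r
# Hypothetical example values: any unit vector u and any v orthogonal to u would do.
u <- c(3, 4) / 5          # normalized direction vector, ||u|| = 1
v <- 2 * c(-4, 3) / 5     # offset vector, orthogonal to u
x <- c(1, 7)              # the point to project

k <- sum(u * x)           # scalar position u^T x along the line
proj <- k * u + v         # orthogonal projection of x onto the line L

# For an orthogonal projection, the residual x - proj is orthogonal to u.
resid_dot_u <- sum((x - proj) * u)
print(proj)
print(resid_dot_u)
```

If the formula holds, `resid_dot_u` is zero up to floating-point error, which is what makes the projection orthogonal.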

To better understand this, consider a two-dimensional data set. Below is its plot along with two lines ($L_1$ and $L_2$) that are orthogonal to each other:

If we project the points orthogonally to both lines we have,

If the normalized vector $\mathbf{u}_1$ defines the direction of $L_1$, then the variability of the points on $L_1$ is described by the random variable $Y_1=\mathbf{u}_1^T\mathbf{X}$. Likewise, if $\mathbf{u}_2$ is a normalized vector that defines the direction of $L_2$, then the variability of the points on this line is described by the random variable $Y_2=\mathbf{u}_2^T\mathbf{X}$. The first principal component is the one with maximum variability. In this case, we can see that $Y_2$ is more variable than $Y_1$, since the points projected on $L_2$ are more dispersed than those on $L_1$. In practice, however, the variances of the linear combinations $Y_i = \mathbf{u}_i^T\mathbf{X}, i=1,2,\cdots,p$ are maximized sequentially, so that $Y_1$ is the linear combination of the first principal component, $Y_2$ is the linear combination of the second principal component, and so on. Further, the estimate of the direction vector $\mathbf{u}$ is simply the normalized eigenvector $\mathbf{e}$ of the variance-covariance matrix $\mathbf{\Sigma}$ of the original variable $\mathbf{X}$, and the variability explained by the principal component is the corresponding eigenvalue $\lambda$. For more details on the theory of PCA, refer to (Bajorski, 2012) at Reference 1 below.
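The link between eigenvectors and variability can be illustrated on simulated data. This is only a sketch: the two correlated variables below are made up, and `eigen` is applied to the sample covariance matrix as an estimate of $\mathbf{\Sigma}$.

```r
set.seed(1)
n <- 500
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.3)   # simulated, correlated with x1
X <- cbind(x1, x2)

S <- cov(X)          # sample variance-covariance matrix
eig <- eigen(S)      # eigenvalues in decreasing order, unit-length eigenvectors

e1 <- eig$vectors[, 1]
e2 <- eig$vectors[, 2]
Y1 <- as.vector(X %*% e1)   # first principal component, Y1 = e1^T X
Y2 <- as.vector(X %*% e2)   # second principal component, Y2 = e2^T X

# The variance of each component equals its eigenvalue,
# and the two components are uncorrelated.
print(c(var(Y1), eig$values[1]))
print(c(var(Y2), eig$values[2]))
print(cov(Y1, Y2))
```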

As promised, we will do dimensionality reduction using PCA. We will use the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) data from (Bajorski, 2012); you can use other locations of AVIRIS data that can be downloaded here. However, since in most cases AVIRIS data contains hundreds of bands, for simplicity we will stick with the data given in (Bajorski, 2012), which was cleaned and reduced to 152 bands only.

In imaging, spectral bands refer to the third dimension of the image, usually denoted as $\lambda$. For example, an RGB image contains red, green and blue bands, as shown below, along with the first two dimensions $x$ and $y$ that define the resolution of the image.

These are a few of the bands that are visible to our eyes; there are other bands that are not visible to us, such as infrared and many others in the electromagnetic spectrum. That is why in most cases AVIRIS data contains a large number of bands, each capturing different characteristics of the image. Below is the proper description of the data.
### Data

The Airborne Visible/Infrared Imaging Spectrometer (AVIRIS), is a sensor collecting spectral radiance in the range of wavelengths from 400 to 2500 nm. It has been flown on various aircraft platforms, and many images of the Earth’s surface are available. A 100 by 100 pixel AVIRIS image of an urban area in Rochester, NY, near the Lake Ontario shoreline is shown below. The scene has a wide range of natural and man-made material including a mixture of commercial/warehouse and residential neighborhoods, which adds a wide range of spectral diversity. Prior to processing, invalid bands (due to atmospheric water absorption) were removed, reducing the overall dimensionality to 152 bands. This image has been used in Bajorski et al. (2004) and Bajorski (2011a, 2011b). The first 152 values in the AVIRIS Data represent the spectral radiance values (a spectral curve) for the top left pixel. This is followed by spectral curves of the pixels in the first row, followed by the next row, and so on. (Ref. 1 Bajorski, 2012)

To load the data, run the following code:

The code above uses the EBImage package; installation instructions can be found in my previous post.
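The storage layout described in the data section (one spectral curve per pixel, pixels ordered row by row) can be sketched with a toy example. This is not the original loading code: the sizes and the stand-in vector `vals` are hypothetical, and only the name `dat.mat` is taken from this post. For the real image the sizes would be 100 by 100 pixels and 152 bands.

```r
# Toy sizes (assumption); the real image is 100 x 100 pixels with 152 bands.
n_row <- 2
n_col <- 2
n_bands <- 3

# Stand-in for the values read from the data file (hypothetical).
vals <- seq_len(n_row * n_col * n_bands)

# Each consecutive run of n_bands values is one pixel's spectral curve,
# so filling by row gives one pixel per row and one band per column.
dat.mat <- matrix(vals, ncol = n_bands, byrow = TRUE)

# A single band can be put back into image shape; pixels are stored
# row by row, hence byrow = TRUE again.
band1 <- matrix(dat.mat[, 1], nrow = n_row, ncol = n_col, byrow = TRUE)

print(dat.mat)
print(band1)
```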

Before we jump into our analysis, you may ask why we need it. Sometimes it is simply difficult to do analysis on high-dimensional data, especially when interpreting it. This is because some dimensions are not significant (for example, redundant ones), which adds to the difficulty of the analysis. To deal with this, we remove those nuisance dimensions and work with the significant ones.

To perform PCA in R, we use the function `princomp`, as seen below:

The structure of `princomp` consists of the list shown above. We describe selected outputs here; the others can be found in the documentation of the function by executing `?princomp`.

- `sdev` – standard deviation, the square root of the eigenvalues $\lambda$ of the variance-covariance matrix $\mathbf{\Sigma}$ of the data, `dat.mat`;
- `loadings` – eigenvectors $\mathbf{e}$ of the variance-covariance matrix $\mathbf{\Sigma}$ of the data, `dat.mat`;
- `scores` – the principal component scores.
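These outputs can be inspected on a small simulated matrix. The sketch below does not use the AVIRIS data; `dat.toy` is a hypothetical stand-in. One detail worth knowing is that `princomp` computes the covariance with divisor $n$ rather than $n-1$, so the comparison rescales `cov` accordingly.

```r
set.seed(42)
# Simulated stand-in for dat.mat: 200 observations, 4 variables (assumption).
dat.toy <- matrix(rnorm(200 * 4), ncol = 4) %*% diag(c(3, 2, 1, 0.5))

pca <- princomp(dat.toy)
print(names(pca))        # components of the returned list

n <- nrow(dat.toy)
S_n <- cov(dat.toy) * (n - 1) / n   # covariance with divisor n, as princomp uses
eig <- eigen(S_n)

# sdev^2 matches the eigenvalues of the covariance matrix.
print(cbind(sdev_squared = unname(pca$sdev^2), eigenvalue = eig$values))

# The columns of loadings match the eigenvectors (possibly up to sign).
L <- unclass(pca$loadings)
print(max(abs(abs(L) - abs(eig$vectors))))

# scores has one row per observation and one column per component.
print(dim(pca$scores))
```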

Recall that the objective of PCA is to find a linear combination $Y=\mathbf{u}^T\mathbf{X}$ that maximizes the variance $Var(Y)$. From the output, the estimates of the components of $\mathbf{u}$ are the entries of `loadings`, a matrix whose columns are the eigenvectors for the successive principal components. That is, if the first principal component is $Y_1=\mathbf{u}_1^T\mathbf{X}$, then the estimate of $\mathbf{u}_1$, namely the eigenvector $\mathbf{e}_1$, is the set of coefficients in the first column of `loadings`. The variability explained by the first principal component is the square of the first entry of `sdev`, the variability explained by the second principal component is the square of the second entry of `sdev`, and so on. Now let’s interpret the loadings (coefficients) of the first three principal components. Below is the plot of this,

Based on the plot above, the coefficients of the first principal component (PC1) are almost all negative. On a closer look, the variability in this principal component is mainly explained by the weighted average of the radiance of spectral bands 35 to 100. Analogously, PC2 mainly represents the variability of the weighted average of the radiance of spectral bands 1 to 34. The fluctuation of the coefficients of PC3, however, makes it difficult to tell which bands contribute most to its variability. Aside from examining the loadings, another way to see the impact of the PCs is through the *impact plot*, where the *impact curves* $\sqrt{\lambda_j}\mathbf{e}_j$ are plotted; I want you to explore that.

Moving on, let’s investigate the percent of variability in $X_i$ explained by the $j$th principal component, which is given by \begin{equation}\nonumber \frac{\lambda_j\cdot e_{ij}^2}{s_{ii}}, \end{equation} where $s_{ii}$ is the estimated variance of $X_i$. Below is the percent of explained variability in $X_i$ for the first three principal components, including the cumulative percent variability (sum of PC1, PC2, and PC3),

For the first 33 bands, PC2 accounts for about 90 percent of the explained variability, as seen in the above plot. It also contributes substantially for bands 102 to 152. On the other hand, for bands 37 to 100, PC1 explains almost all the variability, with PC2 and PC3 explaining only 0 to 1 percent. The sum of the percentages of explained variability of these principal components, which is the cumulative percent variability, is shown as the orange line in the above plot.
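The formula $\lambda_j e_{ij}^2/s_{ii}$ can be tried on simulated data. This is a sketch under the assumption that the eigen decomposition and the variances $s_{ii}$ come from the same covariance matrix; the data are made up.

```r
set.seed(7)
# Simulated data with three correlated variables (assumption).
mix <- matrix(c(2, 1, 0,
                0, 1, 1,
                0, 0, 0.5), nrow = 3)
X <- matrix(rnorm(300 * 3), ncol = 3) %*% mix

S <- cov(X)
eig <- eigen(S)
lambda <- eig$values      # lambda_j
E <- eig$vectors          # E[i, j] = e_ij, column j is the j-th eigenvector
s_ii <- diag(S)           # estimated variance of each X_i

# pct[i, j] = lambda_j * e_ij^2 / s_ii: share of Var(X_i) explained by PC j.
pct <- sweep(E^2, 2, lambda, "*") / s_ii

print(round(100 * pct, 2))
# Over all p components the shares for each variable add up to 1.
print(rowSums(pct))
```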

To wrap up this section, here is the percentage of the explained variability of the first 10 PCs.

Table 1: Variability (in percent) Explained by the First Ten Principal Components for the AVIRIS data.

| PC1 | PC2 | PC3 | PC4 | PC5 | PC6 | PC7 | PC8 | PC9 | PC10 |
|---|---|---|---|---|---|---|---|---|---|
| 82.057 | 17.176 | 0.320 | 0.182 | 0.094 | 0.065 | 0.037 | 0.029 | 0.014 | 0.005 |

The above percentages were obtained by noting that the variability explained by a principal component is simply the eigenvalue (square of the `sdev`) of the variance-covariance matrix $\mathbf{\Sigma}$ of the original variable $\mathbf{X}$. Hence the percentage of variability explained by the $j$th PC is equal to its corresponding eigenvalue $\lambda_j$ divided by the overall variability, which is the sum of the eigenvalues, $\sum_{j=1}^{p}\lambda_j$, as we see in the following code,
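A minimal sketch of that computation on simulated data follows. It is not the original code for the AVIRIS data, and `dat.toy` is a hypothetical stand-in for `dat.mat`.

```r
set.seed(3)
# Simulated stand-in for dat.mat: 500 observations, 5 variables (assumption).
dat.toy <- matrix(rnorm(500 * 5), ncol = 5) %*% diag(c(5, 2, 1, 0.5, 0.1))

pca <- princomp(dat.toy)
lambda <- pca$sdev^2                    # eigenvalues = squared standard deviations
pct_var <- 100 * lambda / sum(lambda)   # percent of variability per component

print(round(pct_var, 3))
print(round(cumsum(pct_var), 3))        # cumulative percent
```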

Given the list of percentage of variability explained by the PCs in Table 1, how many principal components should we take into account that would best represent the variability of the original data? To answer that, we introduce the following stopping rules that will guide us on deciding the number of PCs:

- Scree plot;
- Simple fair-share;
- Broken-stick; and,
- Relative broken-stick.

The scree plot is the plot of the variability of the PCs, that is, the plot of the eigenvalues, in which we look for an elbow or sudden drop of the eigenvalues. For our example we have

Therefore, based on the elbow shape, we retain the first two principal components. However, if the eigenvalues differ by orders of magnitude, it is recommended to use the logarithmic scale, which is illustrated below,

Unfortunately, this does not always work: as we can see here, it is difficult to determine where the elbow is. The discussion of the last three stopping rules that follows is based on (Bajorski, 2012). The *simple fair-share* stopping rule identifies the largest $k$ such that $\lambda_k$ is larger than its fair share, that is, larger than $(\lambda_1+\lambda_2+\cdots+\lambda_p)/p$. To illustrate this, consider the following:

Thus, we need to stop at second principal component.
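The fair-share rule can also be checked directly from Table 1. The sketch below is my own illustration and relies on two observations: the percentages are proportional to the eigenvalues, so an eigenvalue exceeds its fair share exactly when its percentage exceeds $100/p$; and with $p=152$ the components beyond the tenth explain no more than PC10 does, so they cannot exceed the fair share either.

```r
# Percentages from Table 1 (first ten principal components).
pct <- c(82.057, 17.176, 0.320, 0.182, 0.094, 0.065, 0.037, 0.029, 0.014, 0.005)
p <- 152                      # number of bands, hence number of components

fair_share <- 100 / p         # fair share expressed in percent
k_fair <- max(which(pct > fair_share))

print(round(fair_share, 3))
print(k_fair)
```

The rule keeps two components here, in agreement with the conclusion above.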

If one is concerned that the above method produces too many principal components, a *broken-stick rule* can be used. The rule identifies the largest $k$ such that $\lambda_j/(\lambda_1+\lambda_2+\cdots +\lambda_p)>a_j$, for all $j\leq k$, where \begin{equation}\nonumber a_j = \frac{1}{p}\sum_{i=j}^{p}\frac{1}{i},\quad j =1,\cdots, p. \end{equation} Let’s try it,
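One way to try it without the full set of eigenvalues is to use the proportions from Table 1 as $\lambda_j/(\lambda_1+\cdots+\lambda_p)$. This sketch is my own illustration; since the rule stops at the first $j$ that fails the inequality, the first ten components are enough here.

```r
# Percentages from Table 1 (first ten principal components).
pct <- c(82.057, 17.176, 0.320, 0.182, 0.094, 0.065, 0.037, 0.029, 0.014, 0.005)
p <- 152
prop <- pct / 100             # lambda_j / (lambda_1 + ... + lambda_p)

# a_j = (1/p) * sum_{i = j}^{p} 1/i, computed for all j with a reversed cumulative sum.
a <- rev(cumsum(rev(1 / seq_len(p)))) / p

# Keep components as long as every one so far exceeds its threshold a_j.
exceeds <- prop > a[seq_along(prop)]
k_bs <- if (all(exceeds)) length(prop) else which(!exceeds)[1] - 1

print(round(a[1:3], 4))
print(k_bs)
```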

The above result coincides with the first two stopping rules. The drawback of the simple fair-share and broken-stick rules is that they do not work well when the eigenvalues differ by orders of magnitude. In such a case, we use the *relative broken-stick* rule, where we analyze $\lambda_j$ as the first eigenvalue in the set $\lambda_j\geq \lambda_{j+1}\geq\cdots\geq\lambda_{p}$, where $j < p$. The dimensionality $k$ is chosen as the largest value such that $\lambda_j/(\lambda_j+\cdots +\lambda_p)>b_j$, for all $j\leq k$, where \begin{equation}\nonumber b_j = \frac{1}{p-j+1}\sum_{i=1}^{p-j+1}\frac{1}{i}. \end{equation} Applying this to the data we have,

According to the numerical output, the first 34 principal components are enough to represent the variability of the original data.
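The result of 34 components depends on all 152 eigenvalues of the AVIRIS data, which are not listed in this post, so it cannot be re-derived here. The rule itself can still be sketched as a small function. Both the function name `relative_broken_stick` and the toy eigenvalues are hypothetical.

```r
# Relative broken-stick rule: largest k such that, for every j <= k,
# lambda_j / (lambda_j + ... + lambda_p) > b_j.
relative_broken_stick <- function(lambda) {
  p <- length(lambda)
  k <- 0
  for (j in seq_len(p - 1)) {          # the rule is defined for j < p
    m <- p - j + 1                     # number of eigenvalues from j to p
    b_j <- sum(1 / seq_len(m)) / m     # b_j = (1/m) * sum_{i = 1}^{m} 1/i
    if (lambda[j] / sum(lambda[j:p]) > b_j) {
      k <- j
    } else {
      break                            # stop at the first j that fails
    }
  }
  k
}

# Toy eigenvalues in decreasing order (assumption, not the AVIRIS values).
lambda_toy <- c(100, 10, 1, 0.9, 0.8)
k_rbs <- relative_broken_stick(lambda_toy)
print(k_rbs)
```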

The principal component scores are the new data set obtained from the linear combinations $Y_j=\mathbf{e}_j^T(\mathbf{x}-\bar{\mathbf{x}}), j = 1,\cdots, p$. So if we use the first three stopping rules, below are the scores (as images) of PC1 and PC2,

If we instead follow the relative broken-stick rule, we retain the first 34 PCs; below are the corresponding scores (as images).
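The definition of the scores can be verified against the `scores` element returned by `princomp`. The sketch below uses simulated data rather than the AVIRIS image: it centers the data at the column means and multiplies by the loadings.

```r
set.seed(11)
# Simulated data: 100 observations, 3 variables (assumption).
X <- matrix(rnorm(100 * 3), ncol = 3) %*% diag(c(3, 1, 0.3))

pca <- princomp(X)
E <- unclass(pca$loadings)               # columns are the eigenvectors e_j

# Y_j = e_j^T (x - xbar), computed for every observation at once.
X_centered <- sweep(X, 2, colMeans(X))   # subtract the column means
scores_manual <- X_centered %*% E

# Largest absolute difference from the scores computed by princomp.
max_diff <- max(abs(scores_manual - pca$scores))
print(max_diff)

# Retaining only the first two components means keeping two columns.
scores_2 <- pca$scores[, 1:2]
print(dim(scores_2))
```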


Of course, when doing PCA there are errors to consider unless one retains all the PCs, but that would defeat the purpose: why apply PCA if you still take all the dimensions into account? Without going through the theory, the overall error is simply the variability explained by the excluded components: if the first $j$ principal components are retained, it is the sum of the eigenvalues $\lambda_k$ for $k=j+1,\cdots,p$.
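This statement about the error can be checked numerically. The following sketch uses simulated data and a hypothetical choice of two retained components; it reconstructs the centered data from the retained components and compares the residual variability with the sum of the excluded eigenvalues.

```r
set.seed(5)
# Simulated data: 400 observations, 4 variables (assumption).
X <- matrix(rnorm(400 * 4), ncol = 4) %*% diag(c(4, 2, 0.5, 0.2))
n <- nrow(X)
p <- ncol(X)

Xc <- sweep(X, 2, colMeans(X))     # centered data
eig <- eigen(cov(X))

n_keep <- 2                        # number of retained components (hypothetical)
E_keep <- eig$vectors[, 1:n_keep, drop = FALSE]

# Reconstruction of the centered data from the retained components only.
X_hat <- Xc %*% E_keep %*% t(E_keep)
resid <- Xc - X_hat

# Total residual variability, with the same divisor (n - 1) as cov().
err <- sum(resid^2) / (n - 1)
excluded <- sum(eig$values[(n_keep + 1):p])

print(c(residual = err, excluded_eigenvalues = excluded))
```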

Aslan can’t get enough of spaghetti.

After I made the ‘big move’ late in 2013, I got to spend most of 2014 in Mindanao. Let me share with you how it was for me.. in black and white.

By Phix Samsaraji ZAMBOANGA City (December 25, 2014) – Hundreds of political leaders and members of the Moro Islamic Liberation Front (MILF) in Zamboanga City participated in the 3-day seminar workshop on Bangsamoro Governance, Basic Organizing, and advocacy on Bangsamoro Basic Law (BBL) conducted by the Bangsamoro Leadership and Management Institute (BLMI) in partnership with

Orange Shroom by Ahmad “Anakiluh” Musahari

“When the sky meets the ground”. Photo by Ahmad “Anakiluh” Musahari. Somewhere in Ternate, Cavite going to Mt. Pico de Loro. Camera used: Canon EOS 500D, F/22, 1/3200s, ISO 100. Focal length: 18mm

I want this Tausug delicacy.

Assalamu Alaykum! (Peace be upon you all!) Just to let you all know, I am auto-scheduling this post to publish later. I am writing this at 9PM of December 20 and hopefully this will be published later, early morning the next day in sha Allah. I have to …

Assalamu Alaykum! (Peace be upon you all!) No, this is not about my first born son. We are still way too early for that! haha. This is about the first baby boy I delivered (or helped or assisted in delivery) during our 24-hour duty at the Labor Room-Del…

DERMAX Laser Center, known for their innovative, science-based and technology-focused skin care products and procedures such as laser treatments for permanent hair removal and skin discolorations, will open its very first branch in Cagayan de Oro on January 8, 2015 at Level 3, Centrio Mall. The DERMAX Laser Center brand …

Assalamu Alaykum (Peace be upon you all!) Our one-month rotation with the department of Obstetrics and Gynecology for our ICC Training (OBGYN 250) finally ended today. It was indeed a wonderful experience, that 30-day madness trying to remember all thos…

Lovers in Makassar.

Bismillah

Yakan wedding procession happening now.

Flags at half mast. ARMM mourns the loss of National Artist Abdulmari Imao of Sulu.

It is with utmost humility that I announce that Mindanaoan.com has just been declared as a “Top CDO Blog” in the recently concluded 2014 ICT Awards held at the Activity Center, Centrio Mall, Cagayan de Oro, Philippines. The ICT Awards, organized by the CDO ICT Business Council, Cagayan de Oro’s main …