How Google Used Eigenvalues and Eigenvectors to Rank the Entire Internet

5–8 minutes

Introduction

When students first learn eigenvalues and eigenvectors, they often encounter them as abstract mathematical concepts. They learn how to compute them, solve homework problems, and pass exams, but many are left wondering:

What are eigenvalues and eigenvectors actually used for?

One of the most famous applications in modern history comes from Google.

Google’s original search algorithm, called PageRank, used eigenvectors to determine the importance of webpages. At a time when the internet was growing rapidly, Google needed a way to decide which webpages deserved to appear first in search results.

Remarkably, the solution came from linear algebra.

In this article, we will see how Google transformed the internet into a giant matrix and how the dominant eigenvector of that matrix became the foundation of webpage ranking.


The Search Problem

Imagine searching for:

“Learn Bayesian Statistics”

Millions of webpages contain those words.

Which webpage should appear first?

Early search engines mostly relied on keyword matching.

The idea was simple:

  • Count how often the search terms appear.
  • Pages containing more occurrences rank higher.

Unfortunately, this system was easy to manipulate.

A webpage owner could simply repeat:

“Bayesian Statistics Bayesian Statistics Bayesian Statistics…”

hundreds of times.

The page might rank highly despite having little useful content.

Google needed a better approach.


Google’s Fundamental Insight

Larry Page and Sergey Brin realized something important.

Instead of asking:

What does a webpage say about itself?

they asked:

What do other webpages say about it?

Suppose:

  • Harvard links to a webpage.
  • MIT links to a webpage.
  • Stanford links to a webpage.

That webpage is probably useful.

A hyperlink can therefore be viewed as a vote.

The more links a page receives, the more important it may be.

However, another problem immediately appears.

Should every vote count equally?

A vote from Harvard should probably matter more than a vote from an unknown personal blog.

This creates a recursive definition.

A page is important if important pages link to it.

But how do we know those pages are important?

Because important pages link to them.

This circular relationship turns out to be exactly what eigenvectors solve.


Representing the Web as a Graph

Consider four webpages:

  • Page A
  • Page B
  • Page C
  • Page D

Suppose:

  • A links to B and C
  • B links to C
  • C links to A
  • D links to C

Graphically:

A → B

A → C

B → C

C → A

D → C

Mathematicians call this a directed graph.

The internet itself can be viewed as an enormous directed graph containing billions of webpages and hyperlinks.


Converting the Graph into a Matrix

Linear algebra works with matrices.

We therefore represent the graph using an adjacency matrix.

Rows represent pages receiving links.

Columns represent pages sending links.

$$A=\begin{bmatrix}0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0\end{bmatrix}$$

Interpretation:

$$A_{31}=1$$

means Page A links to Page C.

Similarly:

$$A_{21}=1$$

means Page A links to Page B.

Every hyperlink on the internet can be encoded in a matrix.


Assigning Importance Scores

Suppose every page receives an importance score.

Let:

$$x=\begin{bmatrix}x_A \\ x_B \\ x_C \\ x_D\end{bmatrix}$$

where:

  • $$x_A$$ = importance of Page A
  • $$x_B$$ = importance of Page B
  • $$x_C$$ = importance of Page C
  • $$x_D$$ = importance of Page D

Google’s key idea was:

The importance of a page should equal the importance flowing into it from other pages.

Mathematically:

$$Ax=x$$

This equation says that when the matrix acts on the importance vector, the result is the same vector.

Rearranging:

$$(A-I)x=0$$

This is an eigenvector equation.


What Is an Eigenvector?

An eigenvector is a special vector that keeps the same direction when multiplied by a matrix.

The defining equation is:

$$Av=\lambda v$$

where:

  • $$A$$ is a matrix
  • $$v$$ is an eigenvector
  • $$\lambda$$ is an eigenvalue

Most vectors change direction after matrix multiplication.

Eigenvectors do not.

They are only scaled by the eigenvalue.


A Simple Example

Consider:

$$A=\begin{bmatrix}2 & 0 \\ 0 & 3\end{bmatrix}$$

Take:

$$v=\begin{bmatrix}1 \\ 0\end{bmatrix}$$

Then:

$$Av=\begin{bmatrix}2 \\ 0\end{bmatrix}$$

Notice:

$$Av=2v$$

Thus:

$$\lambda=2$$

is an eigenvalue.

Similarly:

$$v=\begin{bmatrix}0 \\ 1\end{bmatrix}$$

gives:

$$Av=\begin{bmatrix}0 \\ 3\end{bmatrix}=3v$$

and therefore:

$$\lambda=3$$

is another eigenvalue.


Why Eigenvectors Matter for Google

Google wanted webpage rankings that would eventually stabilize.

Imagine importance flowing through hyperlinks.

Pages pass importance to other pages.

Those pages pass importance onward.

After enough iterations, the ranking should settle into a stable pattern.

That stable pattern is an eigenvector.

The eigenvector contains the relative importance of every webpage.


The Random Surfer Model

Google introduced a beautiful interpretation called the Random Surfer Model.

Imagine a user browsing the internet.

The user:

  1. Opens a webpage.
  2. Clicks a random hyperlink.
  3. Repeats forever.

Question:

After a very long time, where will this user spend most of their time?

Pages visited frequently should receive higher rankings.

This transforms the problem into a Markov chain.


Building the Probability Matrix

Suppose Page A links to two pages.

Each outgoing link receives probability:

$$\frac12$$

Instead of counting links, we use probabilities.

The transition matrix becomes:

$$P=\begin{bmatrix}0 & 0 & 1 & 0 \\ \frac12 & 0 & 0 & 0 \\ \frac12 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0\end{bmatrix}$$

Each column sums to one.

The matrix describes how a random surfer moves through the web.


The PageRank Vector

Let:

$$r=\begin{bmatrix}r_A \\ r_B \\ r_C \\ r_D\end{bmatrix}$$

represent the long-run probabilities of visiting each page.

Eventually the probabilities stabilize.

At equilibrium:

$$Pr=r$$

Compare this with:

$$Av=\lambda v$$

We obtain:

$$Pr=1r$$

Thus:

$$r$$

is an eigenvector associated with:

$$\lambda=1$$

This eigenvector becomes the PageRank vector.


Why Does Eigenvalue 1 Appear?

Suppose the rankings have stabilized.

After another click, the probabilities should remain unchanged.

Therefore:

$$Pr=r$$

The vector is unchanged.

The scaling factor is:

$$1$$

which means:

$$\lambda=1$$

represents equilibrium.


Computing the PageRank Vector

Google cannot directly solve systems involving billions of webpages.

Instead, it uses repeated multiplication.

Start with:

$$r_0=\begin{bmatrix}0.25 \\ 0.25 \\ 0.25 \\0.25\end{bmatrix}$$

Then compute:

$$r_1=Pr_0$$

$$r_2=Pr_1$$

$$r_3=Pr_2$$

and continue.

Eventually:

$$r_k \rightarrow r$$

The process converges to the dominant eigenvector.

This procedure is called Power Iteration.


The Damping Factor

A practical problem appears.

Some webpages have no outgoing links.

Others form loops.

A surfer could become trapped.

Google solved this by introducing a damping factor.

Assume:

  • 85% of the time the user follows a hyperlink.
  • 15% of the time the user jumps randomly to another page.

The PageRank equation becomes:

$$r=\alpha Pr+(1-\alpha)\frac{1}{N}e$$

where:

  • $$\alpha=0.85$$
  • $$N$$ = total number of webpages
  • $$e$$ = vector of ones

This guarantees a unique solution.


Connections to Statistics

Eigenvalues and eigenvectors appear throughout statistics.

Principal Component Analysis

PCA finds eigenvectors of covariance matrices.

These eigenvectors identify directions of maximum variance.

Factor Analysis

Many factor analysis methods rely on eigenvalue decompositions.

Spectral Clustering

Graph clustering methods use eigenvectors extensively.

Markov Chains

Long-run equilibrium distributions correspond to eigenvectors associated with eigenvalue one.

PageRank itself is fundamentally a Markov chain problem.


Applications Beyond Google

Eigenvalues and eigenvectors appear in:

  • Machine Learning
  • Recommendation Systems
  • Finance
  • Social Network Analysis
  • Epidemiology
  • Image Compression
  • Engineering
  • Quantum Mechanics

They are among the most important concepts in applied mathematics.


The Big Intuition

One of the best ways to think about an eigenvector is:

An eigenvector represents a stable pattern hidden inside a system.

For Google:

  • The system was the internet.
  • The hidden pattern was webpage importance.

The dominant eigenvector revealed that pattern.


Conclusion

Google transformed the internet into a giant matrix.

Hyperlinks became votes.

Browsing became a Markov process.

Importance became an eigenvector.

At the heart of PageRank lies one elegant equation:

$$Av=\lambda v$$

What began as a problem of ranking webpages became a beautiful application of linear algebra.

The next time you study eigenvalues and eigenvectors, remember that these concepts helped organize billions of webpages and changed the way humanity accesses information.

References

Page, L., Brin, S., Motwani, R., & Winograd, T. (1999). The PageRank Citation Ranking: Bringing Order to the Web.

Langville, A., & Meyer, C. (2012). Google’s PageRank and Beyond.

Strang, G. (2016). Introduction to Linear Algebra.

Meyer, C. (2000). Matrix Analysis and Applied Linear Algebra.

Leave a Reply

Discover more from Nerdish.Org

Subscribe now to keep reading and get access to the full archive.

Continue reading