I'm David and this is my blog.


Friday, April 20, 2007

Anatomy of a Search Engine

Today's search engines are pretty impressive if you think about it. Type in a keyword or three and you're just a couple of clicks away from the information you want. Of course, the internet is young, and it was only yesterday that Google and Yahoo! were born.

In the net's early years, there were Gopher and WAIS. Beyond those, people relied on printed directories of websites, advertising by email, and word of mouth. Since it's always good to know more about the tools we use, I'm going to dissect the search engine and make its anatomy our subject today.

Crawl before you walk

The first step in creating a search engine is to build a list of webpages. This is done through a process called crawling, a name taken from the metaphor of a spider crawling across her web. The work is done by one or more computers running web spider, or robot, software.

These robots' only job is to constantly load and save webpages. How do they find which webpages to save? Naturally, by following links from other webpages! It sounds simple, but here's the hard part: since the internet has more than 8 billion pages, it takes a lot of computers and a lot of bandwidth to do a timely and effective crawl.
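To make the "follow the links" idea concrete, here is a minimal breadth-first crawler in Python. It's a sketch under heavy assumptions: `fetch` is a hypothetical stand-in for a real HTTP client, and the little in-memory `web` dictionary plays the part of the internet. A real crawler would also have to honor robots.txt, pace its requests, and spread the work across many machines.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl starting at `seed`; returns {url: html} for saved pages.

    `fetch` is a hypothetical stand-in for an HTTP client: it takes a URL and
    returns the page's HTML, or None if the page can't be loaded.
    """
    saved = {}
    queue = deque([seed])
    seen = {seed}
    while queue and len(saved) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # dead link: skip it
        saved[url] = html  # "load and save" the page
        parser = LinkExtractor()
        parser.feed(html)
        parser.close()
        for href in parser.links:
            # Resolve relative links and drop #fragments so duplicates collapse.
            link, _fragment = urldefrag(urljoin(url, href))
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return saved


# A tiny in-memory "internet" standing in for real HTTP fetches.
web = {
    "http://a.example/": '<a href="/b">B</a> <a href="http://c.example/">C</a>',
    "http://a.example/b": '<a href="/">home</a>',
    "http://c.example/": "no links here",
}
pages = crawl("http://a.example/", web.get)
print(sorted(pages))  # ['http://a.example/', 'http://a.example/b', 'http://c.example/']
```

The queue gives breadth-first order, so pages close to the seed get saved first, and the `seen` set keeps the robot from looping forever on pages that link back to each other.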

Index the text


So you've managed to crawl the entire internet, and you now have billions of webpages saved on your computers. You could try to search this mess directly, but it would take a really, really long time to get results. That's where indexing comes in. Indexing these pages means building lists of words, numbers, and phrases and keeping track of everywhere each one appears.
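Here's a toy version of that indexing step, usually called an inverted index. It's a minimal sketch, assuming a simple tokenizer (lowercase letters and digits only) and three made-up pages; the function names are my own, not any particular search engine's. Recording positions, not just pages, is what later lets you match whole phrases.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each term to {url: [positions where the term appears on that page]}."""
    index = defaultdict(lambda: defaultdict(list))
    for url, text in pages.items():
        for position, term in enumerate(tokenize(text)):
            index[term][url].append(position)
    return index


# Three made-up pages standing in for the crawled web.
pages = {
    "recipes.example/chili": "Chili with red pepper and black pepper",
    "garden.example/peppers": "How to grow a pepper plant",
    "news.example/weather": "Sunny and 75 degrees",
}
index = build_index(pages)
print(dict(index["pepper"]))
# {'recipes.example/chili': [3, 6], 'garden.example/peppers': [4]}
```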

For example, part of our search engine would be a list of every occurrence of the word "pepper". Once you have lists like that for every term you run across, people can start entering words and phrases, and you can hand them links to the webpages that contain their terms. In the world of search engine technology, this brings us up to about 1995.
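Answering a query then becomes a lookup instead of a scan. The sketch below assumes an index shaped like term → {page: [positions]}, written out by hand for two hypothetical pages, and uses plain AND semantics: a page has to contain every word. For a phrase, it also checks that the words show up at consecutive positions.

```python
def search(index, query):
    """Return pages containing every word of the query (AND semantics), sorted."""
    terms = query.lower().split()
    if not terms:
        return []
    # Each term's posting list is the set of pages it appears on.
    postings = [set(index.get(term, {})) for term in terms]
    return sorted(set.intersection(*postings))


def phrase_search(index, phrase):
    """Return pages where the phrase's words appear at consecutive positions."""
    terms = phrase.lower().split()
    matches = []
    for url in search(index, phrase):
        for start in index[terms[0]][url]:
            # The i-th word of the phrase must sit at position start + i.
            if all(start + i in index[term][url] for i, term in enumerate(terms)):
                matches.append(url)
                break
    return matches


# A hand-written index: term -> {page: [positions]} (hypothetical pages).
index = {
    "red": {"recipes.example/chili": [2]},
    "black": {"recipes.example/chili": [5]},
    "pepper": {"recipes.example/chili": [3, 6], "garden.example/peppers": [4]},
}
print(search(index, "pepper"))               # ['garden.example/peppers', 'recipes.example/chili']
print(search(index, "red pepper"))           # ['recipes.example/chili']
print(phrase_search(index, "black pepper"))  # ['recipes.example/chili']
print(phrase_search(index, "pepper red"))    # []
```

Intersecting the per-word page sets is cheap because each list is already built; the expensive work happened once, at indexing time.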

A sorted affair

Since 1995, the key advances made by all the search engines have been in sorting. Sorting the results returned from an index is hard when a common search like "pepper" may match a million or even a billion pages. This is where a reputation system and click-tracking come in. A reputation system like Google's PageRank gives Google a single number per page that it can compare across sites to decide which ones to show you.

You don't want to see a page full of ads when you search for "pepper", and PageRank helps Google make that call. PageRank is based on the number and quality of the links that point to a particular web page. Take this PageRank, combine it with Google's records of what people clicked on in the past for words like "pepper", and you start to have a pretty good search engine.
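Google doesn't publish exactly how it computes PageRank today, but the textbook idea fits in a few lines. This is a minimal power-iteration sketch, assuming the commonly cited damping factor of 0.85 and a made-up four-page link graph; pages with no outgoing links share their score evenly with everyone. A page's score comes from the scores of the pages linking to it, each divided by how many links that page hands out, which is one way to read "number and quality of links".

```python
def pagerank(links, damping=0.85, iterations=100):
    """Approximate PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    Returns a dict of page -> score; the scores sum to 1.
    """
    # Every page that appears as a source or a target gets a score.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal scores
    for _ in range(iterations):
        # Each page gets a small baseline share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page with no outgoing links: spread its vote over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank


links = {
    "blog": ["recipes"],
    "news": ["recipes"],
    "recipes": ["blog"],
    "adfarm": ["recipes"],  # hypothetical ad-stuffed page nobody links to
}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{page:8} {score:.3f}")
```

In this graph nobody links to `adfarm`, so it ends up with only the baseline score, while `recipes`, which three pages point to, comes out on top.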

One word: advertising

Having good search will bring traffic in droves. Once you have all this traffic, it's time to monetize. This is where Google has really shone. They created a small ad format and a real-time auction that shows the ads you're most likely to click on. Every time you click, Google makes some money. If the ad is displayed on a website other than Google's, the owner of that website gets some cash too.
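Google doesn't publish the exact rules of its auction, so treat this as a simplified sketch of the generalized second-price design it's commonly described as using. Ads are ranked by bid times estimated click-through rate, and each winner pays, per click, just enough to stay ahead of the ad below it plus one cent. The advertisers, bids, click rates, and the one-cent reserve price are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    advertiser: str
    bid: float  # most the advertiser will pay per click, in dollars
    ctr: float  # estimated probability that a viewer clicks this ad


def run_auction(ads, slots):
    """Rank ads by bid * ctr; each winner pays just enough per click to beat the next ad."""
    ranked = sorted(ads, key=lambda ad: ad.bid * ad.ctr, reverse=True)
    results = []
    for i, ad in enumerate(ranked[:slots]):
        if i + 1 < len(ranked):
            nxt = ranked[i + 1]
            # Smallest price p with p * ad.ctr >= nxt.bid * nxt.ctr, plus a penny.
            price = nxt.bid * nxt.ctr / ad.ctr + 0.01
        else:
            price = 0.01  # hypothetical reserve price when no ad is ranked below
        results.append((ad.advertiser, round(min(price, ad.bid), 2)))
    return results


ads = [Ad("A", 2.00, 0.01), Ad("B", 1.00, 0.03), Ad("C", 1.50, 0.01)]
print(run_auction(ads, slots=2))  # [('B', 0.68), ('A', 1.51)]
```

B wins the top slot with the lower bid because people are expected to click its ad three times as often, and it pays well under its $1.00 bid. Ranking by bid times click rate is what lets the auction favor the ads you're most likely to click.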

I don't have to tell you how powerful Google's position is. It's like the hypothalamus of the internet, helping to direct attention efficiently. If you understand this article and have the desire, you can be the next Google. They'll be unseated sooner or later, so it might as well be you raking it in, right? Short of that, I hope you now understand our tool of tools, the search engine, a little better and can use it to find what you're looking for more easily.

2 Comments:

  • Cool, great job explaining Google's algorithm in a couple paragraphs. Time for me to unseat Google by undercutting their advertising by a penny.

    By Blogger qian, At 4/27/07 1:19 PM  

  • Thanks qian. I like your penny-off plan. When you make your billions, please hook me up with some stock options for uhhh consulting!

    By Blogger David, At 5/6/07 5:09 PM  
