Tuesday, March 16, 2010

How Search Engines Work

By Justin Harrison

Sometimes referred to as 'spiders' or 'crawlers', automated search engine robots seek out and gather web pages that search engines later serve to users. Just how do they accomplish this, and why does it matter? What is the real purpose of these robots?

A search engine robot is a fairly simple program with only a basic ability to understand web pages. Spiders have real limitations: they cannot interpret frames, Flash movies, images, or JavaScript; they cannot enter password-protected areas or click buttons; and they can be stopped by dynamically generated URLs and JavaScript navigation. What they can do is read HTML, retrieve the data it contains, and follow the links it points to as they travel the web.
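As a rough illustration, here is what that kind of link extraction can look like in Python, using only the standard library's html.parser. The LinkExtractor class and the sample page are made up for this sketch and are not any real engine's code.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag encountered in the HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


page = '<html><body><a href="/about">About</a> <a href="http://example.com">Home</a></body></html>'
extractor = LinkExtractor()
extractor.feed(page)
print(extractor.links)  # ['/about', 'http://example.com']
```

Notice that the parser sees only markup: anything behind a login form, a button click, or a script is simply invisible to it, which is exactly the limitation described above.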

The 'submit URL' function places your URL into a list of URLs the robots are going to explore. Even if you never submit your URL directly, robots will usually find your site by following links from pages they already know about. That's why building visibility through a web of inbound links is important.

The robot collects the links on every page it visits and then follows them to further pages; this is how it gets around the World Wide Web, moving from one place to another link by link.
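A minimal sketch of that loop, assuming a tiny in-memory 'web' in place of real pages and HTTP requests: the robot seeds its frontier with one submitted URL, then visits pages breadth-first, queueing every link it has not seen before. The WEB dictionary and its URLs are purely illustrative.

```python
from collections import deque

# A tiny in-memory "web" standing in for real pages and HTTP fetches.
WEB = {
    "http://example.com/": ["http://example.com/a", "http://example.com/b"],
    "http://example.com/a": ["http://example.com/b"],
    "http://example.com/b": ["http://example.com/"],
}

def crawl(seed_url):
    """Breadth-first crawl: visit the seed, then every page reachable from it."""
    frontier = deque([seed_url])  # the 'submit URL' list the robot works through
    seen = {seed_url}             # avoid queueing or visiting a page twice
    order = []
    while frontier:
        url = frontier.popleft()
        order.append(url)
        for link in WEB.get(url, []):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return order

print(crawl("http://example.com/"))
# ['http://example.com/', 'http://example.com/a', 'http://example.com/b']
```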

When the robots return, the information they gathered is assimilated into the search engine's database. A complex algorithm interprets this data, and web sites are ranked according to how relevant they are to the various topics people search for. Some bots are quite easy to notice: Google's is the appropriately named Googlebot, while Inktomi uses a more ambiguously named bot called Slurp. Others may be difficult to identify at all.
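Bots typically announce themselves in the User-Agent header of each request, which is how names like Googlebot and Slurp show up in server logs. Here is a small sketch of matching a User-Agent string against known bot names; the sample strings and the short KNOWN_BOTS table are assumptions for illustration, not an exhaustive or authoritative list.

```python
# Map a recognizable substring of a bot's User-Agent to the engine behind it.
KNOWN_BOTS = {
    "Googlebot": "Google",
    "Slurp": "Inktomi",
}

def identify_bot(user_agent):
    """Return the engine behind a User-Agent string, or None if unrecognized."""
    for marker, engine in KNOWN_BOTS.items():
        if marker in user_agent:
            return engine
    return None

print(identify_bot("Mozilla/5.0 (compatible; Googlebot/2.1)"))  # Google
print(identify_bot("Mozilla/5.0 (Windows NT 6.1)"))             # None
```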

Once in the database, the information becomes part of the search engine's directory and ranking process. Indexing reflects how the engine's engineers have decided to evaluate the information the spiders return. When you enter a query, the engine runs several calculations behind the scenes to determine which of the indexed sites you're most likely looking for, selects the best matches, and displays them. Spiders crawl websites over and over again, so the database stays as up to date as possible.
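To give a feel for what indexing means in practice, here is a toy inverted index in Python: each word maps to the set of documents containing it, and a query is answered by ranking documents on how many query words they contain. Real engines use far more elaborate scoring; the documents and the counting rule here are invented for the sketch.

```python
from collections import defaultdict

docs = {
    "page1": "search engine robots crawl the web",
    "page2": "robots follow links between pages",
    "page3": "cooking recipes and kitchen tips",
}

# Build the inverted index: word -> set of documents containing that word.
index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.split():
        index[word].add(doc_id)

def search(query):
    """Rank documents by the number of query words they contain."""
    scores = defaultdict(int)
    for word in query.split():
        for doc_id in index.get(word, ()):
            scores[doc_id] += 1
    return sorted(scores, key=scores.get, reverse=True)

print(search("robots crawl"))  # ['page1', 'page2'] — page1 matches both words
```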

Databases are updated periodically: robots revisit your pages regularly to find any changes, so that the latest information is available to searchers. How often a robot visits depends on how the particular search engine is set up, and this varies from engine to engine. If your website is down or experiencing heavy traffic when a robot arrives, the page it is trying to fetch may be unreachable, and the site may not be re-indexed on that pass. When this happens, the robot will simply revisit your site later, in the hope that it has become accessible again; how soon it returns depends on how frequently it normally visits your site.
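The revisit behavior described above might be sketched like this. The fetch_page, reindex, and schedule_retry functions are hypothetical stand-ins for the engine's real machinery, and the one-hour retry delay is an arbitrary assumption.

```python
import time

RETRY_DELAY = 60 * 60  # assumed: wait an hour before retrying a failed fetch

def revisit(url, fetch_page, reindex, schedule_retry):
    """One revisit pass: refresh the index on success, retry later on failure."""
    try:
        html = fetch_page(url)  # may raise if the site is down or overloaded
    except OSError:
        schedule_retry(url, time.time() + RETRY_DELAY)  # try again later
        return                  # the old index entry is left untouched
    reindex(url, html)          # success: refresh the stored copy of the page
```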
