Search engine “spiders” are robots that seek out webpages to display in search engines. Below we’ll discuss how they work and why they’re important.

Robots actually have the same basic functionality that earlier browsers had.  Just like these early browsers, search engine robots do not have the ability to do certain things. Robots cannot get past password protected areas. They do not understand frames, Flash movies, nor Images or JavaScript. Even if you use a robot, you have to click the buttons on your website. They can cease to function while using JavaScript navigation or when indexing a dynamically generated URL. A search engine robot retrieves data and finds information and links on the web.

The ‘submit url’ function places the url into a list of urls the robots are going to explore. Even without submitting your url directly, robots will try to find your site by following links. That’s why building visibility through a web of links is seo important.

By collecting and following links, robots manage tn transport themselves all over the internet. Think of it as an internet equivalent of the roads we use in our lives. Robots travel on the roads and read the signposts so they know what leads to where.

To ensure that searchers get the right results with the most relevant response to their query, quick calculations are done to see that this happens. Server logs and log statistics program results can be checked by the user to see what pages have been visited and how often. Some robots may be easy to identify such as Google’s “Googlebot”, while less well-known ones such as Inktomi’s “Slurp” are not easily identifiable. Some robots even appear to be human-powered browsers.

There may be robots that you do not want to visit your website such as aggressive bandwidth grabbing robots and others. The ability to identify individual robots and the number of their visits is useful. Information on the undesirable robots is helpful also. IP names and addresses of search engine robots are listed at the end of this article in a resources section. These robots read the pages on your website by visiting your page and looking at the text that is visible on the page, and then looks at the source code tags such as title tags, meta tags and others. They look at the hyperlinks on your page. From these links, the search engine robot can determine what your page is about. Each search engine has its own algorithm to determine what is important. Information is indexed and delivered to the search engine’s database according to how the robot has been set up through the search engine.

If you’re interested in seeing which pages the spiders have visited on your website, you can check your server logs or the results from your log statistics. From this information you’ll know which spiders have visited, where they went, when they came, and which pages they crawl most often. Some are easy to identify, such as Google’s “Googlebot,” while others are harder: “Slurp” from Inktomi, for example. In addition to identifying which spiders visit, you can also find if any spiders are draining your bandwidth so that you can block them from your site. The internet has plenty of information on identifying these bad bots. There are also certain things can prevent good spiders from crawling your site, such as the site being down or huge amounts of traffic. This can prevent your site from being re-indexed, though most spiders will eventually come by again to try re-accessing the page.


Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.