A robot is a computer program that automatically reads web pages and follows every link it finds. Robots exist to gather information. The best-known robots, including those discussed in this article, work for search engines, indexing as much of the information available on the web as they can.
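To make "follows every link it finds" more concrete, here is a minimal Python sketch of the link-gathering step, using only the standard library's `html.parser` and `urllib.parse` modules. The class `LinkExtractor`, the helper `extract_links`, and the example URLs are hypothetical names chosen for illustration. A real search engine robot would also fetch pages over HTTP, handle errors, and do a great deal more.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag, resolved against the page's URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Turn relative links like "/about.html" into absolute URLs.
                    self.links.append(urljoin(self.base_url, value))


def extract_links(base_url, html):
    """Return every link found in one page (hypothetical helper)."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


page = '<p><a href="/about.html">About</a> <a href="https://example.org/">Partner</a></p>'
print(extract_links("https://example.com/index.html", page))
# ['https://example.com/about.html', 'https://example.org/']
```

Each link collected this way becomes a candidate page for the robot to visit next.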
A spider is a robot designed to crawl the web. The smartest way to keep spiders coming back to your site consistently is to focus on content freshness. Add new articles, pages, FAQs, and other useful information to your website on a steady schedule rather than in occasional major updates. Instead of adding 20 new articles all at once, try adding one article a day, every day, and watch what happens.
If the robots can find your site but can't make sense of it, you may need to look at the content and technology used on your pages. Frames, Flash, dynamically generated pages, and invalid HTML source code can all cause problems when a search engine robot tries to access your web pages. Although some search engines (e.g. Google and AllTheWeb) are beginning to index dynamically generated pages and Flash, relying on these technologies can still keep your pages from being indexed by search engine robots.
The whole process begins when a web page is submitted to a search engine. The submitted URL is added to the queue of websites that the search engine spider will visit. Submission is optional, however, because most spiders can find a page on their own if other websites link to it. This is why it is a good idea to build reciprocal links with other websites.
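The queue described above can be sketched as a breadth-first crawl. This is a simplified illustration, not how any particular search engine actually schedules its spider: the dictionary `WEB` is a hypothetical stand-in for fetching real pages, and `crawl` is an illustrative name. Notice that `https://d.example/` is reached even though only `https://a.example/` was submitted, which is why links from other sites make submission optional.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the URLs its page links to.
# A real spider would fetch each page over HTTP and extract its links instead.
WEB = {
    "https://a.example/": ["https://b.example/", "https://c.example/"],
    "https://b.example/": ["https://c.example/"],
    "https://c.example/": ["https://d.example/"],
    "https://d.example/": [],
}


def crawl(submitted_urls, web=WEB, limit=100):
    """Visit pages breadth-first, starting from the submitted URLs."""
    queue = deque(submitted_urls)  # URLs waiting to be visited
    seen = set(submitted_urls)     # URLs already queued, so none is visited twice
    visited = []
    while queue and len(visited) < limit:
        url = queue.popleft()
        visited.append(url)
        # Every newly discovered link joins the back of the queue.
        for link in web.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


print(crawl(["https://a.example/"]))
# ['https://a.example/', 'https://b.example/', 'https://c.example/', 'https://d.example/']
```

The `limit` parameter is a hypothetical safeguard; real crawlers apply far more elaborate politeness and priority rules.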
Site redirects can also raise red flags for search robots. A site may need to redirect visitors temporarily while it updates or modifies its pages, but such redirects should stay in place only for a very short time; redirects left up long term can make a site look invalid to the robot. Finally, the more links other websites point at your site, the more highly search engines are likely to regard it.
When the search engine robot visits your page, it looks at the visible text on the page, the contents of the various tags in your page's source code (title tag, meta tags, etc.), and the hyperlinks on your page. From the words and links it finds, the search engine decides what your page is about. Many factors go into deciding what "matters", and each search engine uses its own algorithm to evaluate and process the information. Depending on how the search engine has configured its robot, the information is then indexed and stored in the search engine's database.
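As a rough sketch of the first half of that process, the following Python code pulls the title, named meta tags, visible text, and links out of a page with the standard `html.parser` module. `PageReader` and the sample page are hypothetical. Deciding which of these words matter, and how much, is the part each search engine keeps in its own algorithm, and it is not attempted here.

```python
from html.parser import HTMLParser


class PageReader(HTMLParser):
    """Records the title, named meta tags, visible text, and links of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}     # e.g. {"description": "..."}
        self.text = []     # visible text fragments, in page order
        self.links = []    # href values of <a> tags
        self._open = None  # "title", "script", or "style" while inside one

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("title", "script", "style"):
            self._open = tag
        elif tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == self._open:
            self._open = None

    def handle_data(self, data):
        # Text inside <script> or <style> matches neither branch, so it is
        # ignored: visitors never see it on the page.
        if self._open == "title":
            self.title += data.strip()
        elif self._open is None and data.strip():
            self.text.append(data.strip())


page = """<html><head><title>Blue Widgets</title>
<meta name="description" content="Hand-made blue widgets.">
<style>p { color: blue; }</style></head>
<body><h1>Blue Widgets</h1><p>Order online at <a href="/shop">our shop</a></p>
<script>var x = 1;</script></body></html>"""

reader = PageReader()
reader.feed(page)
reader.close()
print(reader.title)  # Blue Widgets
print(reader.meta)   # {'description': 'Hand-made blue widgets.'}
print(reader.text)   # ['Blue Widgets', 'Order online at', 'our shop']
print(reader.links)  # ['/shop']
```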
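The robots.txt file matters because it tells spiders and crawlers which parts of a website they may crawl: you can allow or disallow crawling of the whole site or of particular pages. Some site owners use it to keep crawlers away from pages they would rather not see in search results. Be aware, though, that robots.txt is only a request that well-behaved robots choose to respect. It does not stop anyone from visiting a page, and because the file itself is public, listing a confidential URL in it can actually advertise that URL. A disallowed page can also still show up in search results if other sites link to it. Genuinely confidential data should be protected with a password or other access control on the server, not with robots.txt.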
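Python's standard library includes `urllib.robotparser`, which shows how a well-behaved crawler checks robots.txt before fetching a page. The robots.txt contents and the `ExampleBot` user agent below are made up for illustration. Keep in mind that this check runs inside the crawler; nothing on the server enforces it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for example.com: every robot is asked to stay
# out of the /drafts/ directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /drafts/
"""

parser = RobotFileParser()
# parse() accepts the file's lines directly, so no network access is needed.
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("ExampleBot", "https://example.com/drafts/plan.html"))     # False
print(parser.can_fetch("ExampleBot", "https://example.com/articles/robots.html"))  # True
```

In a live crawler you would normally call `set_url()` with the site's robots.txt address and then `read()` instead of `parse()`.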
Article Directory: http://www.articlecube.com