
Search Engine Spiders And Your Robots.txt File

by: eatmoreherbs



In this article we will discuss search engine spiders and what they do. You will also learn how to create a robots.txt file and why you might need one.

Search engine spiders are automated software programs that crawl the Web looking for pages to feed to search engines. They are also called crawlers, robots and bots. Spiders are among the most useful programs on the Internet and a key part of how search engines operate. They allow your site to be found by the millions of people who use search engines. Feed the spiders right and they will tell the search engines about your site.

How Spiders Work

A search engine is an index to the Internet: it points to relevant web sites depending on your search. To build that index, search engines need a tool that can visit web sites, navigate them, decide what each site is about and add that data to the search engine.

Spiders are essentially programs that "crawl" sites and report their findings back to the search engine that runs them. Their job is to make it easy for your site to get listed in search engines.

Spiders work by finding links to web sites, visiting those web sites, going through the content of a web site and then reporting the content of the site back to the database of the search engine they work for. From there, the information is added to the search engine, and the site then shows up in search results.
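The link-finding step described above can be sketched in a few lines of Python. This is only an illustration, not how any particular search engine works: the class name LinkCollector, the helper extract_links and the example.com addresses are all made up for this example, and a real spider would also download pages and honor robots.txt.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own address.
                self.links.append(urljoin(self.base_url, value))


def extract_links(base_url, html):
    """Return every link found in the given HTML text (hypothetical helper)."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


# A tiny made-up page standing in for a downloaded web page.
sample_page = '<p><a href="/about.html">About</a> <a href="http://example.org/">Other</a></p>'
found_links = extract_links("http://example.com/index.html", sample_page)
print(found_links)
```

Each link found this way becomes another page for the spider to visit, which is how it moves from site to site.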

The robots.txt file

By defining a few rules in a robots.txt file, you can tell robots not to crawl certain directories or files within your site. Web sites do not absolutely need a robots.txt file; they can get along just fine without one. Most spiders, however, look for a robots.txt file as soon as they arrive on your site. Take a look at your site statistics. If your statistics have a "files not found" section, you may see many entries where spiders failed to find robots.txt on your site.
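If your statistics package does not show these misses, a rough count can be taken from the raw access log. The sketch below assumes the log uses the common Apache-style format; the function name count_missing_robots and the sample log lines are invented for illustration, and your server's format may differ.

```python
import re

# Matches the request path and status code in a Common Log Format line.
# The format is an assumption; adjust the pattern to your own server's logs.
REQUEST_PATTERN = re.compile(r'"(?:GET|HEAD) (\S+) [^"]*" (\d{3})')


def count_missing_robots(log_lines):
    """Count requests for /robots.txt that ended in a 404 (not found)."""
    count = 0
    for line in log_lines:
        match = REQUEST_PATTERN.search(line)
        if match and match.group(1) == "/robots.txt" and match.group(2) == "404":
            count += 1
    return count


# Made-up sample lines, not taken from any real log.
sample_log = [
    '192.0.2.1 - - [10/Oct/2008:13:55:36 -0700] "GET /robots.txt HTTP/1.0" 404 209',
    '192.0.2.1 - - [10/Oct/2008:13:55:37 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.7 - - [10/Oct/2008:14:02:11 -0700] "GET /robots.txt HTTP/1.1" 404 209',
]
missing = count_missing_robots(sample_log)
print(missing)
```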

The default behavior is to allow all unless you have a Disallow for that resource. If you wish to exclude some of your pages from search engine indexing, this is the tool approved by the search engines. Creating a robots.txt file that guides spiders is simple.

If you want to allow the spiders to crawl your site but exclude directories of your choice, copy and paste the following into a blank txt file:

User-agent: *
Disallow: /directory1/
Disallow: /directory2/
Disallow: /directory3/
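
Before uploading, you can check how a rule-following robot would read these lines. The sketch below uses Python's standard urllib.robotparser module; the robot name "ExampleBot" and the example.com addresses are placeholders, and real spiders may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# The same rules as above, held in a string instead of a file.
rules = """\
User-agent: *
Disallow: /directory1/
Disallow: /directory2/
Disallow: /directory3/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# "ExampleBot" is a hypothetical robot name used only for this check.
blocked = parser.can_fetch("ExampleBot", "http://example.com/directory1/page.html")
allowed = parser.can_fetch("ExampleBot", "http://example.com/index.html")
print(blocked, allowed)
```

A page inside an excluded directory comes back as not fetchable, while the rest of the site stays open to the robot.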

To exclude files of your choice, type in the path to the files you want to exclude:

User-agent: *
Disallow: /directory1/page1.html
Disallow: /directory2/page2.html
Disallow: /directory3/page3.html

To exclude all the search engine spiders from your entire web site, copy and paste the following into the txt file:

User-agent: *
Disallow: /

This will keep a specific search engine spider from indexing your site:

User-agent: Name_of_Robot
Disallow: /

To allow a single robot and exclude all other robots:

User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
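
The effect of these two records can be checked the same way with Python's standard urllib.robotparser module. This is a sketch only: "SomeOtherBot" is a made-up robot name, the example.com address is a placeholder, and the two records are separated here by a blank line.

```python
from urllib.robotparser import RobotFileParser

# One record for Googlebot (nothing disallowed) and one for everyone else.
rules = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

page = "http://example.com/index.html"
googlebot_allowed = parser.can_fetch("Googlebot", page)
# "SomeOtherBot" is a hypothetical name standing in for any other robot.
other_allowed = parser.can_fetch("SomeOtherBot", page)
print(googlebot_allowed, other_allowed)
```

An empty Disallow line means nothing is off limits for that robot, so only the named robot gets through.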

There can only be one robots.txt on a site, and you may not have blank lines within a record (blank lines separate one record from the next). Once you have it the way you want, save the file with the name "robots" and the extension .txt. Upload the file to the root directory of your site, that is, the directory where your home page or index page is. Put the robots.txt file right alongside the index file.
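
Because the file always lives in the root directory, its address can be worked out from any page on the site. The short Python sketch below shows the idea; the function name robots_txt_url and the example.com address are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the address where a spider would look for robots.txt."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


location = robots_txt_url("http://example.com/directory1/page1.html?x=1")
print(location)
```

If opening that address in your browser shows your rules, the file is in the right place.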


About the Author

Sign up for the Web Success Weekly Email. Learn simple, step-by-step methods to get your business online and making money, the easy way: http://websuccess.info/ By Harvey Lew Robinson: http://websuccess.info/seo/spiders.html







Please note: This article is free to reprint but all links must remain active.




Copyright 2007-2008 www.ReadEZArchive.com - Article Directory and Publishing Resource Center