Excerpted from DR. WEBSITE (http://webdeveloper.internet.com/drweb/)
How to Tell the Search Bots Where to Look on Your Site
You will need to create a text file called robots.txt and place it at the root level of your server; you can include syntax in this file to tell robots that they are barred from accessing all or certain parts of your server. Well-behaved robots that adhere to the robots exclusion standard will search for this file upon visiting your site.
Here's an example of what your robots.txt could include:
User-agent: *
Disallow: /tmp
Disallow: /personal/topsecret
In the first line, the asterisk indicates that these limitations are directed at all robots; you could also include the names of robots here if you only wanted to allow or disallow specific ones.
The second and third lines instruct robots not to visit any URL on the site whose path begins with /tmp or /personal/topsecret.
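To see how a well-behaved robot might apply these rules, here is a minimal Python sketch using the standard library's urllib.robotparser module. The host www.example.com, the robot name ExampleBot, and the helper allowed() are made up for illustration; real robots may interpret the file somewhat differently.

```python
from urllib.robotparser import RobotFileParser

# The same rules as the example robots.txt above.
rules = """\
User-agent: *
Disallow: /tmp
Disallow: /personal/topsecret
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())  # parse from memory, no network access


def allowed(path, agent="ExampleBot"):
    """Return True if the hypothetical robot may fetch the given path."""
    return parser.can_fetch(agent, "http://www.example.com" + path)


for path in ("/index.html", "/tmp/scratch.html", "/personal/topsecret/plan.html"):
    print(path, allowed(path))
```

Because the parser compares path prefixes, a rule such as "Disallow: /tmp" also covers a path like /tmpfiles/a.html, so choose your Disallow values with care.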
To see how web sites use the robots.txt file, point your browser at any top level site, for instance:
http://www.whitehouse.gov/robots.txt
http://www.sun.com/robots.txt
To create your own robots.txt file, use a basic text editor (rather than a word processor) and follow the examples that you find on other web sites.
Using META Tags to communicate with webbots
Keyword and Description attributes
Chances are that if you manually code your Web pages, you're aware of the "keywords" and "description" attributes. These allow the search engines to easily index your page using the keywords you specifically give them, along with a description of the site that you yourself get to write. You use the keywords attribute to tell the search engines which keywords to use, like this:
<META NAME="keywords" CONTENT="life, universe, mankind, plants, relationships, the meaning of life, science">
By the way, don't think you can "spike" the keywords by repeating the same word over and over, as most search engines have refined their spiders to ignore such spam. Using the META description attribute, you add your own description for your page:
<META NAME="description" CONTENT="This page is about the meaning of life, the universe, mankind and plants.">
Make sure that you use several of your keywords in your description. While you are at it, you may want to include the same description enclosed in comment tags, just for the spiders that do not look at META tags. To do that, just use the regular comment tags, like this:
<!-- This page is about the meaning of life, the universe, mankind and plants. -->
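As a rough illustration of how an indexing program might read the keywords and description tags, here is a minimal Python sketch built on the standard library's html.parser module. The class MetaCollector, the helper read_meta(), and the sample page are hypothetical; no real search engine is claimed to work this way.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect NAME/CONTENT pairs from META tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        fields = dict(attrs)
        name = fields.get("name")
        content = fields.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


def read_meta(html):
    collector = MetaCollector()
    collector.feed(html)
    return collector.meta


page = """<html><head><title>Meaning of Life</title>
<META NAME="keywords" CONTENT="life, universe, mankind, plants">
<META NAME="description" CONTENT="This page is about the meaning of life, the universe, mankind and plants.">
</head><body></body></html>"""

meta = read_meta(page)
keywords = [word.strip() for word in meta["keywords"].split(",")]
print(keywords)
print(meta["description"])
```

Running a check like this on your own pages is a quick way to confirm that the tags say what you intended before a spider reads them.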
The following is an example of a META description from the Smithsonian Institution web...
<html> <head> <title>Smithsonian Institution</title><meta NAME="description" CONTENT="The Smithsonian Institution is composed of sixteen museums and galleries and the National Zoo and numerous research facilities in the United States and abroad."></head>
Controlling webbot behavior at the file level
If you want some control over what spiders index, but prefer not to rely on the site-wide rules in the robots.txt file, the robots META attribute lets you set the policy page by page. In its complete form, it looks like the following:
<META NAME="robots" CONTENT="all | none | index | noindex | follow | nofollow">
The default for the robots attribute is "all", which allows the page to be indexed and its links to be followed. "None" tells the spider not to index the page, and not to follow the hyperlinks on the page to other pages. "Index" indicates that this page may be indexed by the spider, while "follow" means that the spider is free to follow the links from this page to other pages. The "no" forms reverse these meanings, thus this META tag:
<META NAME="robots" CONTENT="noindex">
would tell the spider not to index this page, but would allow it to follow subsidiary links and index those pages. "nofollow" would allow the page itself to be indexed, but the links could not be followed. For more information on the robots META attribute, visit the http://www.W3.org web site for authoritative documentation on robots and the META tags associated with optimizing pages for search engines.
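The meanings described above can be summarized in a short Python sketch that turns a robots CONTENT value into two yes/no answers: may the page be indexed, and may its links be followed. The function name robots_directives() is hypothetical, and the sketch assumes that several values may be combined with commas (for example "noindex, nofollow"); individual spiders may differ in the details.

```python
def robots_directives(content):
    """Return (may_index, may_follow) for a robots META CONTENT value."""
    may_index, may_follow = True, True  # the default is "all"
    for token in content.lower().split(","):
        token = token.strip()
        if token == "all":
            may_index, may_follow = True, True
        elif token == "none":
            may_index, may_follow = False, False
        elif token == "index":
            may_index = True
        elif token == "noindex":
            may_index = False
        elif token == "follow":
            may_follow = True
        elif token == "nofollow":
            may_follow = False
        # Unknown or empty tokens are ignored in this sketch.
    return may_index, may_follow


for value in ("all", "none", "noindex", "nofollow"):
    print(value, robots_directives(value))
```

Here "noindex" yields (False, True): the page stays out of the index while its links remain open to the spider, matching the description above.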
Placement of META tags
META tags should always be placed in the head of the HTML document, between the <HEAD> and </HEAD> tags and before the BODY tag. This is very important with framed pages, as a lot of developers tend to forget to include them on individual framed pages. Remember, if you only use META tags on the frameset pages, you'll be missing a large number of potential hits.
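If you maintain many pages, a small script can flag META tags that have strayed outside the head. The following Python sketch is one possible approach using the standard library's html.parser module; the class MetaPlacementChecker and the function check_meta_placement() are hypothetical names, and the two sample pages are made up for illustration.

```python
from html.parser import HTMLParser


class MetaPlacementChecker(HTMLParser):
    """Sort META tag names by whether they appear inside <HEAD>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.inside = []   # META names found between <HEAD> and </HEAD>
        self.outside = []  # META names found anywhere else

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "meta":
            name = dict(attrs).get("name") or ""
            (self.inside if self.in_head else self.outside).append(name)

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def check_meta_placement(html):
    checker = MetaPlacementChecker()
    checker.feed(html)
    return checker.inside, checker.outside


good_page = '<html><head><meta name="description" content="x"></head><body></body></html>'
bad_page = '<html><head><title>t</title></head><body><meta name="keywords" content="a"></body></html>'

print(check_meta_placement(good_page))
print(check_meta_placement(bad_page))
```

Run against each framed page as well as the frameset page, a check like this makes it harder to overlook a page that is missing its tags.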
References:
http://www.vancouver-webpages.com/META/
http://www.htdig.org/meta.html
http://www.kollar.com/robots.html
http://wdvl.com/Search/Meta/Tag.html