Elementary SEO: All about robots.txt


SEO is about making your pages rank higher in search results. But there are certain pages on your website that you don't want users to reach from search results. The robots.txt file tells search engines not to crawl such pages, which helps keep them out of search results.

Search engines use bots, or robots, to crawl websites and learn about them, so that they know which websites should show up for a particular keyword. Whenever such a bot arrives at a website, the first thing it looks for is the robots.txt file, because it contains instructions from the website's owner. Now, there are good bots and bad bots. Bad bots in particular, such as malware bots on the lookout for security vulnerabilities, pay no attention to the robots.txt file.

What is the role of robots.txt?

It contains two important pieces of information: which bots are allowed to crawl the website, and which pages on the site should not be crawled.

How to create a robots.txt?

It can be created using any text editor. The file name is case-sensitive, so it should be lower-case only. The robots.txt file should be placed in the root folder of your website, along with the index or welcome page, so that its path is always www.yourdomainname.com/robots.txt.

It usually has two directives. User-agent specifies the bot to which the following instructions apply, and Disallow specifies the pages that are restricted.

A simple example of a robots.txt file is as below.

User-agent: *
Disallow: /

In the example above, the "*" beside User-agent means the following instructions apply to every kind of bot that lands on the site. The "/" beside Disallow means that everything under the root folder is restricted, so no page on the site should be crawled, by any bot.
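You can check how a crawler library interprets those two lines with Python's built-in robots.txt parser. This is a quick sketch using `urllib.robotparser` from the standard library; the example.com URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Feed the two-line example directly instead of fetching it from a site.
parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Disallow: /",
])

# Every path is blocked for every user agent.
print(parser.can_fetch("Googlebot", "https://www.example.com/about.html"))  # False
print(parser.can_fetch("AnyBot", "https://www.example.com/"))  # False
```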

Here are a few examples. To permit select bots, and keep the rest out,

User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
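To confirm that the two groups above behave as intended — Googlebot allowed in, everyone else kept out — the same standard-library parser can be fed both groups at once (bot names and URL here are illustrative):

```python
from urllib.robotparser import RobotFileParser

# Parse both rule groups: an empty Disallow permits everything for Googlebot,
# while the "*" group blocks all other bots.
parser = RobotFileParser()
parser.parse([
    "User-agent: Googlebot",
    "Disallow:",
    "",
    "User-agent: *",
    "Disallow: /",
])

print(parser.can_fetch("Googlebot", "https://www.example.com/page.html"))  # True
print(parser.can_fetch("SomeOtherBot", "https://www.example.com/page.html"))  # False
```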

To restrict select directories on a website from being crawled, the commands would be,

User-agent: *
Disallow: /directory/

To block files of a specific type,

User-agent: Googlebot
Disallow: /*.gif$
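Note that the "*" and "$" wildcards are an extension supported by Google and some other major crawlers, not part of the original robots.txt convention, so not every bot honors them. To illustrate the matching semantics, here is a hypothetical helper (my own sketch, not a standard-library feature) that converts such a pattern into a regular expression:

```python
import re

def robots_pattern_to_regex(pattern: str) -> "re.Pattern":
    # '*' matches any run of characters; a trailing '$' anchors the end of the path.
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return re.compile("^" + regex + ("$" if anchored else ""))

rule = robots_pattern_to_regex("/*.gif$")
print(bool(rule.match("/images/photo.gif")))   # True  - ends in .gif
print(bool(rule.match("/images/photo.png")))   # False - different extension
```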

To block a single file, such as a specific image from Google Images,

User-agent: Googlebot-Image
Disallow: /images/dogs.jpg

Alternate method – META TAG:
You can also include a robots <meta> tag in the header of every page on your site. The syntax is,

<META NAME="ROBOTS" CONTENT="NOINDEX, FOLLOW">
<META NAME="ROBOTS" CONTENT="INDEX, NOFOLLOW">

The FOLLOW / NOFOLLOW value applies to the links on that page: with NOFOLLOW, bots should not follow any links on the page. If no meta tag is included, INDEX and FOLLOW are assumed, so there is no need to mention them explicitly.
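If you ever want to check a page's robots meta tag programmatically, the standard library's `html.parser` is enough. The class name below is my own, and the markup fed to it is just the example tag from above:

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect the CONTENT value of any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            self.directives.append(attrs.get("content", ""))

parser = RobotsMetaParser()
parser.feed('<META NAME="ROBOTS" CONTENT="NOINDEX, FOLLOW">')
print(parser.directives)  # ['NOINDEX, FOLLOW']
```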

With Zoho Sites, you can access the crawler specification on the SEO settings page.

The commands entered here will automatically be saved in the robots.txt file of your website, which can be accessed at www.yourdomainname.com/robots.txt.