Getting to Know robots.txt

We all love visitors to our websites. We love the recognition, the sense of connectedness and the thrill of fame on whatever scale we happen to achieve. There is also a fascination with the numbers, pure numbers – where visitors came from, how they found our sites and what they are interested in. All this is good. The tingle of recognition ignites our passions and makes us want to improve: to make our sites better for our visitors, our customers and, yes, even our communities.

But how do people find your site? How can you make sure that Google, Yahoo and all those other search engines are indexing your site in the right way? For example, you might want only Google to index your site. Or you might want to index everything except your photo gallery.

OR better yet … you might want to STOP search engines from indexing your site while you are still tweaking it prior to launch.

If you are using WordPress, you can ask search engines not to index your site with the click of a button: tick “Discourage search engines from indexing this site” under Settings > Reading.

But the rest of us have to rely on the Robots Exclusion Protocol.

Why use robots.txt?

Because the vast majority of us find websites through search, careful management of robots.txt can have two particular benefits for marketers:

  1. Messaging – you can steer bots toward the pages you want indexed, so that the pages and descriptions people see in search results tell your story – even before your customers get to your site.
  2. Customer experience – craft and refine your site by focusing on how people first discover it, engaging your customer’s imagination early.

Of course, you may just want to funnel traffic from particular engines, or exclude indexing altogether.

What is robots.txt?

As the name suggests, robots.txt is a text file designed for automated web robots. A well-behaved bot reads it before crawling anything else on your website, and it tells the bot which parts of the site to scan and which to leave alone. Of course, if the person who wrote the bot chooses to ignore robots.txt, there is not much you can do about it (beyond putting a password on your site).
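To picture how a bot uses robots.txt in practice, here is a minimal sketch of the check a well-behaved bot performs before fetching pages, written in Python with the standard-library urllib.robotparser module. The robots.txt content, the /drafts/ folder, the bot name ExampleBot, the helper polite_filter and the example.com URLs are all made up for illustration. The check is entirely voluntary: nothing in the protocol stops a bot from skipping it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration; a real bot would download it
# from the root of the site before crawling anything else.
ROBOTS_TXT = """\
User-agent: *
Disallow: /drafts/
"""

def polite_filter(robots_txt: str, user_agent: str, urls: list[str]) -> list[str]:
    """Return only the URLs that robots.txt allows user_agent to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]

candidates = [
    "https://www.example.com/",
    "https://www.example.com/drafts/new-launch.html",
    "https://www.example.com/about.html",
]
print(polite_filter(ROBOTS_TXT, "ExampleBot", candidates))
# ['https://www.example.com/', 'https://www.example.com/about.html']
```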

Where do you put robots.txt?

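It’s very simple. Put your robots.txt file in the top-level directory of your website, the same place as your home page, so that it can be found at an address like www.example.com/robots.txt. Bots will not look for it in a subfolder.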
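Because bots only look for the file at the root of a site, a crawler can work out where robots.txt lives from any page address. A minimal Python sketch using only urllib.parse from the standard library (the function name robots_url_for and the example URLs are hypothetical):

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url.

    Only the scheme and host are kept; the page's own path, query
    and fragment are discarded because robots.txt lives at the root.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url_for("https://www.example.com/photos/2019/beach.html?size=large"))
# https://www.example.com/robots.txt
```

The same logic explains why a blog on a separate subdomain, such as blog.example.com, needs its own robots.txt.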

What’s in robots.txt?

On the first line of robots.txt, a User-agent line specifies which bot you are targeting. Using * means the rules apply to ALL bots (strictly, to every bot that is not named in a group of its own).

On each following line, you specify the “rules” for crawling your site, one Disallow line per path. For example, you may want to block your personal photo gallery and the directory that holds your website’s scripts (cgi-bin). If so, your robots.txt file would look something like this:

User-agent: *
Disallow: /cgi-bin/
Disallow: /photos/
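You can sanity-check rules like these before publishing them. The sketch below feeds the example file to Python's standard-library urllib.robotparser and asks whether a hypothetical bot called ExampleBot may fetch a few sample paths on the placeholder domain www.example.com. Disallow values are matched as path prefixes, so /photos/ blocks everything beneath that directory.

```python
from urllib.robotparser import RobotFileParser

# The cgi-bin/photos example from the text.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /photos/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Sample paths are invented for illustration.
for path in ["/", "/about.html", "/photos/holiday.jpg", "/cgi-bin/contact.pl"]:
    url = "https://www.example.com" + path
    verdict = "allowed" if parser.can_fetch("ExampleBot", url) else "blocked"
    print(path, verdict)
# / allowed
# /about.html allowed
# /photos/holiday.jpg blocked
# /cgi-bin/contact.pl blocked
```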

Blocking all search bots in robots.txt

User-agent: *
Disallow: /
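Because “/” is a prefix of every path on your site, this single rule blocks everything. A quick check with Python's standard-library urllib.robotparser (the bot names and URLs are just examples; any name gets the same answer):

```python
from urllib.robotparser import RobotFileParser

BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(BLOCK_ALL.splitlines())

# "Disallow: /" matches every path, so every URL is off limits.
for agent in ["Googlebot", "Bingbot"]:
    print(agent, parser.can_fetch(agent, "https://www.example.com/"))
# Googlebot False
# Bingbot False
```

This is the kind of file you might publish while you are still tweaking a site before launch; remember to remove or relax it when you go live.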

Allowing only ONE search bot in robots.txt

User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
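In this pattern the named group has an empty Disallow line, which blocks nothing, while the * group blocks everything for every other bot. Google's crawler identifies itself as Googlebot, so that is the name used below. A short check with Python's standard-library urllib.robotparser (the photo URL is a made-up example):

```python
from urllib.robotparser import RobotFileParser

ONLY_GOOGLEBOT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ONLY_GOOGLEBOT.splitlines())

url = "https://www.example.com/photos/holiday.jpg"
for agent in ["Googlebot", "Bingbot", "DuckDuckBot"]:
    # Googlebot matches its own group; everyone else falls through to "*".
    print(agent, parser.can_fetch(agent, url))
# Googlebot True
# Bingbot False
# DuckDuckBot False
```

Python's parser matches user-agent names loosely (a case-insensitive substring test), and each search engine applies its own matching rules, so treat this as a sanity check rather than a guarantee of how every bot will behave.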

Learning more about robots.txt

There are a few other tips and tricks that you can use to direct search bots to optimise your messaging and your customer’s experience. Check out the robotstxt.org website for more details.


This is part of my Digital 101 series – designed to explain the technical aspects of the digital landscape in a way that helps marketers do their job better.