Prevent Google From Indexing Your WordPress Admin Folder With X-Robots-Tag

WordPress Security

I recently wrote an article for State of Digital where I lamented the default security features in WordPress. Since it is such a popular content management system, WordPress is targeted by hackers more than any other website platform.

WordPress websites are subjected to hacking attempts every single day. According to Wordfence’s March 2017 attack report, there were over 32 million attempted brute force attacks against WordPress sites in that month alone.

Out of the box, WordPress has some severe security flaws leaving it vulnerable to brute force attacks. One of these flaws is how WordPress prevents search engines like Google from crawling back-end administration files: through a simple robots.txt disallow rule.

User-agent: *
Disallow: /wp-admin/

 
While at first glance this may seem perfectly sensible, it is in fact a terrible solution. There are two major issues with the robots.txt disallow rule:

  1. Because a website’s robots.txt file is publicly viewable, a disallow rule points hackers to your login folder.
  2. A disallow rule doesn’t actually prevent search engines from showing blocked pages in their search results.

I don’t recommend using robots.txt blocking as a method to protect login folders. Instead, there are other, more elegant ways of ensuring your admin folders are secure and cannot be crawled and indexed by search engines.
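
To illustrate the first issue, here is a minimal Python sketch of how little effort it takes to read a robots.txt file and list the folders it tries to hide. The `disallowed_paths` helper and the example.com URLs are hypothetical, and the rules are pasted in as a string rather than downloaded from a real site.

```python
from urllib.robotparser import RobotFileParser

# The default WordPress rules quoted above, as any visitor can read them.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
"""


def disallowed_paths(robots_txt):
    """Return every path named in a Disallow rule (hypothetical helper)."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# The standard library parser shows what a well-behaved crawler concludes.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(disallowed_paths(ROBOTS_TXT))
print(parser.can_fetch("Googlebot", "https://example.com/wp-admin/"))
```

The rule only asks polite crawlers to stay away. It hands the folder name to anyone who reads the file, and it says nothing about whether a blocked URL may appear in search results.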

X-Robots-Tag HTTP Header

In the context of SEO, the parts of an HTTP exchange most people have heard of are the status code and the User-Agent request header. But there are also HTTP response headers which can be utilised by clever SEOs and web developers to optimise how search engines interact with a website, such as Cache-Control headers and the X-Robots-Tag header.

The X-Robots-Tag is an HTTP header that informs search engine crawlers (‘robots’) how they should treat the page being requested. It’s this tag that can be used as a very effective way to prevent login folders and other sensitive information from being shown in Google’s search results.

Search engines like Google support the X-Robots-Tag HTTP header and will comply with the directives given by this header. The directives the X-Robots-Tag header can provide are almost identical to the directives enabled by the meta robots tag.

But, unlike the meta robots tag, the X-Robots-Tag header doesn’t require the inclusion of an HTML meta tag on every affected page on your site. Additionally, you can configure the X-Robots-Tag HTTP header to work for files where you can’t include a meta tag, such as PDF files and Word documents.

With a few simple lines of text in your website’s Apache .htaccess configuration file, we can prevent search engines from including sensitive pages and folders in their search results. This relies on Apache’s mod_headers module being enabled, since that module provides the Header directive.

For example, with the following lines of text in the website’s .htaccess file, we can prevent all PDF and Word document files from being indexed by Google:

<Files ~ "\.(pdf|doc|docx)$">
 Header set X-Robots-Tag "noindex, nofollow"
</Files>
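
To get a feel for which files that rule would catch, the pattern can be tried against a few file names. This is only a rough Python sketch: Apache uses PCRE rather than Python’s re module (a simple pattern like this one behaves the same in both), and the `gets_header` helper and the file names are made up for illustration.

```python
import re

# Same pattern as in the <Files> rule above.
PATTERN = re.compile(r"\.(pdf|doc|docx)$")


def gets_header(filename):
    """Return True if the file name matches the pattern (hypothetical helper)."""
    return PATTERN.search(filename) is not None


for name in ["brochure.pdf", "contract.docx", "index.php",
             "report.PDF", "pdf-guide.html"]:
    print(name, gets_header(name))
```

In this sketch an uppercase extension such as report.PDF is not matched, so it is worth testing how your own server treats such files before relying on the rule.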

 
It’s always a good idea to configure your website this way, to prevent potentially sensitive documents from appearing in Google’s search results. The question is, can we use the X-Robots-Tag header to protect a WordPress website’s admin folder?

X-Robots-Tag and /wp-admin

From the site’s root .htaccess file, the X-Robots-Tag header can’t be applied to an entire folder in one go. Apache doesn’t allow <Directory> or <Location> sections inside .htaccess files, so there the header can only be scoped with <Files> rules, which match file names rather than folders.

Yet, because all of WordPress’s back-end functionality exists within the /wp-admin folder (or whichever folder you may have changed that to), we can create a separate .htaccess file for that folder to ensure the X-Robots-Tag HTTP header is served with all webpages in that folder. All we need to do is create a new .htaccess file containing the following rule:

Header set X-Robots-Tag "noindex, nofollow"

 
We then use our preferred FTP programme to upload this .htaccess file to the /wp-admin folder, and voila. Every page in the /wp-admin section will now serve the X-Robots-Tag HTTP header with the ‘noindex, nofollow’ directives, which tells search engines not to index the WordPress admin pages. Bear in mind that a crawler only sees this header when it is allowed to request the page, so the header has no effect on URLs that are still disallowed in robots.txt.

You can also upload an .htaccess file configured to serve X-Robots-Tag headers to any folder on your website that you want to protect this way. For example, you might have a folder where you store sensitive documents you want to share with specific third parties, but don’t want search engines to see. Or if you run a different CMS, you can use this to protect that system’s back-end folders from getting indexed.

To check whether a page on your site serves the X-Robots-Tag HTTP header, you can use a browser plugin like Live HTTP Headers [Firefox] or Ayima Redirect Path [Chrome], which will show you a webpage’s full HTTP response.

I would strongly recommend you check several different types of pages on your site after you’ve implemented the X-Robots-Tag HTTP header, because a small error can result in every page on your website serving that header. And that would be a Bad Thing.
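
If you would rather script this check than click through pages in a browser, something along these lines could work. It is a sketch using only Python’s standard library, not a definitive tool. The names `NoRedirect`, `parse_directives`, `fetch_x_robots_tag` and `check_urls` are my own, it assumes the server answers HEAD requests, and it does not handle the form of the header that prefixes directives with a crawler name.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib following redirects, so we inspect the URL we asked for."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError for the 3xx response


def parse_directives(header_values):
    """Split X-Robots-Tag header values into a set of lowercase directives."""
    directives = set()
    for value in header_values:
        for part in value.split(","):
            part = part.strip().lower()
            if part:
                directives.add(part)
    return directives


def fetch_x_robots_tag(url, timeout=10):
    """Request `url` and return the directives in its X-Robots-Tag headers."""
    opener = urllib.request.build_opener(NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        response = opener.open(request, timeout=timeout)
    except urllib.error.HTTPError as error:
        # Redirects and error responses (3xx, 4xx, 5xx) still carry headers.
        response = error
    try:
        values = response.headers.get_all("X-Robots-Tag") or []
    finally:
        response.close()
    return parse_directives(values)


def check_urls(urls):
    """Map each URL to the sorted list of X-Robots-Tag directives it serves."""
    return {url: sorted(fetch_x_robots_tag(url)) for url in urls}
```

Passing `check_urls` a handful of URLs, say your homepage, a blog post and a URL inside /wp-admin, gives a quick side-by-side view of which pages serve the header and which don’t.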

To check if Google has indexed webpages on your site in the /wp-admin folder, you can do a search with advanced operators like this:

site:website.com inurl:wp-admin

This will then give a search result listing all pages on website.com that have ‘wp-admin’ anywhere in the URL. If all is well, you should get zero results.

[Screenshot: using Google to find admin files on your site]

The X-Robots-Tag HTTP header is a simple and more robust approach to securing your WordPress login folders, and can also help optimise how search engines crawl and index your webpages.

While it adds to your security, it’s by no means the only thing you need to do to secure your site. Always make sure you have plenty of security measures in place – such as basic authentication in addition to your CMS login – and install a plugin like Wordfence or Sucuri to add extra layers of protection.

If you liked this post, please share it on social media. You might also like to read this post about protecting your staging environments.


Comments

  1. Nice article. Let’s assume my WordPress installation is:
    Website.com
    and I have a folder below it:
    Website.com/contenttemporary
    and in the subfolders I store my temporary content, like:
    Website.com/contenttemporary/article1
    If I want to prevent Google from indexing and following all articles at /contenttemporary/article1-n,
    what can I do?


    1. Yep. The fact that WP puts the AJAX functionality in the admin folder is in itself a huge issue. Personally, I don’t think blocking admin-ajax.php for search engines is a problem, mostly because search engines like Google stopped crawling AJAX recently anyway.


  2. What is the surround of the

    Header set X-Robots-Tag “noindex, nofollow”

    as in the prior example:

    Header set X-Robots-Tag “noindex, nofollow”

    because I assume just putting in a Header set without the wrap around isn’t going to help.


  3. Hi Barry!

    Previously GSC reported blocked resource error (only on smartphone) for “wp-admin/admin-ajax.php” because it was blocked in robots.txt (admin-ajax.php is needed for most WordPress sites but Googlebot can’t handle it.)

    I allowed it in robots.txt a few weeks ago but now I get a soft 404 error for wp-admin/admin-ajax.php

    What should we do with that?

    I just applied what you wrote and set the X-Robots-Tag “noindex, nofollow” for admin-ajax.php

    Don’t know if it’s enough to prevent Google to show it up in GSC. These errors show up since MFI, no problems for desktop.

