- How To Get In To Google News – My Moz Whiteboard Friday
- Technical SEO in the Real World
In September 2018 I gave a talk at the awesome Learn Inbound conference in Dublin, where I was privileged to be part of a speaker lineup that included Britney Muller, Wil Reynolds, Ian Lurie, Aleyda Solis, Paddy Moogan, Laura Crimmons, Jon Myers, and many more excellent speakers. My talk was about some of the more interesting technical SEO conundrums I’ve encountered over the years. The folks at Learn Inbound recorded the talk and have made it available for viewing: I gave an updated version of this talk two weeks later at BrightonSEO, so if you missed either one of those you can now watch it back for yourself.
- Google AMP Can Go To Hell
Let’s talk about Accelerated Mobile Pages, or AMP for short. AMP is a Google pet project that purports to be “an open-source initiative aiming to make the web better for all”. While there is a lot of emphasis on the official AMP site about its open source nature, the fact is that over 90% of contributions to this project come from Google employees, and it was initiated by Google. So let’s be real: AMP is a Google project.

Google is also the reason AMP sees any kind of adoption at all. Basically, Google has forced websites – specifically news publishers – to create AMP versions of their articles. For publishers, AMP is not optional; without AMP, a publisher’s articles will be extremely unlikely to appear in the Top Stories carousel on mobile search in Google. And due to the popularity of mobile search compared to desktop search, visibility in Google’s mobile search results is a must for publishers that want to survive in this era of diminishing revenue and fierce online competition for eyeballs.

If publishers had a choice, they’d ignore AMP entirely. It already takes a lot of resources to keep a news site running smoothly and performing well. AMP adds the extra burden of creating separate AMP versions of articles, and keeping these articles compliant with the ever-evolving standard. So AMP is being kept alive artificially. AMP survives not because of its merits as a project, but because Google forces websites to either adopt AMP or forego large amounts of potential traffic.

And Google is not satisfied with that. No, Google wants more from AMP. A lot more.

Search Console Messages

Yesterday some of my publishing clients received these messages from Google Search Console: Take a good look at those messages. A very good look.
These are the issues that Google sees with the AMP versions of these websites:

- “The AMP page is missing all navigational features present in the canonical page, such as a table of contents and/or hamburger menu.”
- “The canonical page allows users to view and add comments, but the AMP article does not. This is often considered missing content by users.”
- “The canonical URL allows users to share content directly to diverse social media platforms. This feature is missing on the AMP page.”
- “The canonical page contains a media carousel that is missing or broken in the AMP version of the page.”

Basically, any difference between the AMP version and the regular version of a page is seen as a problem that needs to be fixed. Google wants the AMP version to be 100% identical to the canonical version of the page. Yet due to the restrictive nature of AMP, putting these features into an article’s AMP version is not easy. It requires a lot of development resources to make this happen and appease Google. It basically means developers have to do all the work they already put into building the normal version of the site all over again, specifically for the AMP version.

Canonical AMP

The underlying message is clear: Google wants full equivalency between the AMP and canonical URLs. Every element that is present on a website’s regular version should also be present on its AMP version: every navigation item, every social media sharing button, every comment box, every image gallery. Google wants publishers’ AMP versions to look, feel, and behave exactly like the regular version of the website.

What is the easiest, most cost-efficient, least problematic method of doing this? Yes, you guessed it – just build your entire site in AMP. Rather than create two separate versions of your site, why not build the whole site in AMP and drastically reduce the cost of keeping your site up and running?
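To illustrate the two-version setup being discussed (a sketch using a made-up example.com URL, not markup from any of the sites mentioned): a paired AMP implementation ties the two versions of an article together with rel annotations in the head of each page:

```html
<!-- On the regular (canonical) article page: -->
<link rel="amphtml" href="https://example.com/article/amp/">

<!-- On the AMP version of the same article: -->
<link rel="canonical" href="https://example.com/article/">
```

These two links are the easy part. Everything else – navigation, comments, sharing buttons, media carousels – has to be rebuilt separately within AMP’s restrictions, which is exactly the duplicated effort that makes maintaining paired AMP pages so expensive.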
Google doesn’t quite come out and say this explicitly, but they’ve been hinting at it for quite a while. It was part of the discussion at AMP Conf 2018 in Amsterdam, and these latest Search Console messages are not-so-subtle hints at publishers: fully embracing AMP as the default front-end codebase for their websites is the path of least resistance.

That’s what Google wants. They want websites to become fully AMP, every page AMP compliant and adhering to the limitations of the AMP standard.

The Google-Shaped Web

The web is a messy, complicated place. Since the web’s inception developers have played fast and loose with official standards, and web browsers like Netscape and Internet Explorer added to this mess by introducing their own unofficial technologies to help advance the web’s capabilities. The end result is an enormously diverse and anarchic free-for-all where almost no two websites use the same code. It’s extremely rare to find websites that look good, have great functionality, and are fully W3C compliant.

For a search engine like Google, whose entire premise is based on understanding what people have published on the web, this is a huge challenge. Google’s crawlers and indexers have to be very forgiving and process a lot of junk to be able to find and index content on the web. And as the web continues to evolve and becomes more complex, Google struggles more and more with this.

For years Google has been nudging webmasters to create better websites – ‘better’ meaning ‘easier for Google to understand’. Technologies like XML sitemaps and schema.org structured data are strongly supported by Google because they make the search engine’s life easier. Other initiatives like disavow files and rel=nofollow help Google keep its link graph clean and free from egregious spam. All the articles published on Google’s developer website are intended to ensure the chaotic, messy web becomes more like a clean, easy-to-understand web.

In other words, a Google-shaped web.
This is a battle Google has been fighting for decades. And the latest weapon in Google’s arsenal is AMP. Websites built entirely in AMP are a total wet dream for Google. AMP pages are fast to load (so fast to crawl), easy to understand (thanks to mandatory structured data), and devoid of any unwanted clutter or mess (as that breaks the standard).

An AMPified web makes Google’s life so much easier. They would no longer struggle to crawl and index websites, would require significantly less effort to extract meaningful content from webpages, and would be able to rank the best possible pages in any given search result.

Moreover, AMP allows Google to basically take over hosting the web as well. The Google AMP Cache will serve AMP pages instead of a website’s own hosting environment, and also allow Google to perform their own optimisations to further enhance user experience. As a side benefit, it also allows Google full control over content monetisation. No more rogue ad networks, no more malicious ads, all monetisation approved and regulated by Google. If anything happens that falls outside of the AMP standard’s restrictions, the page in question simply becomes AMP-invalid and is ejected from the AMP cache – and subsequently from Google’s results. At that point the page might as well not exist any more.

Neat. Tidy. Homogenous. Google-shaped.

Dance, Dance for Google

Is this what we want? Should we just succumb to Google’s desires and embrace AMP, hand over control of our websites and content to Google? Yes, we’d be beholden to what Google deems acceptable and publishable, but at least we’d get to share in the spoils. Google makes so much money, plenty of companies would be happy feeding off the crumbs that fall from Google’s richly laden table.

It would be easy, wouldn’t it? Just do what Google tells you to. Stop struggling with tough decisions, just let go of the reins and dance to Google’s fiddle. Dance, dance like your company’s life depends on it.
Because it does.

You know what I say to that? No. Google can go to hell.

Who are they to decide how the web should work? They didn’t invent it, they didn’t popularise it – they got filthy rich off of it, and think that gives them the right to tell the web what to do. “Don’t wear that dress,” Google is saying, “it makes you look cheap. Wear this instead, nice and prim and tidy.” F#&! you Google, and f#&! the AMP horse you rode in on.

This is the World Wide Web – not the Google Wide Web. We will do as we damn well please. It’s not our job to please Google and make our websites nice for them. No, they’ve got this the wrong way round – it’s their job to make sense of our websites, because without us Google wouldn’t exist.

Google has built their entire empire on the backs of other people’s effort. People use Google to find content on the web. Google is just a doorman, not the destination. Yet the search engine has epic delusions of grandeur and has started to believe they are the destination, that they are the gatekeepers of the web, that they should dictate how the web evolves. Take your dirty paws off our web, Google. It’s not your plaything, it belongs to everyone.

Fight Back

Some of my clients will ask me what to do with those messages. I will tell them to delete them. Ignore Google’s nudging, pay no heed.

Google is going to keep pushing. I expect those messages to turn into warnings, and eventually become full-fledged errors that render pages invalid under the AMP standard. Google wants a cleaner, tidier, less diverse web, and they will use every weapon at their disposal to accomplish that. Canonical AMP is just one of those weapons, and they have plenty more. Their partnership with the web’s most popular CMS, for example, is worth keeping an eye on.

The easy thing to do is to simply obey. Do what Google says. Accept their proclamations and jump when they tell you to.

Or you could fight back. You could tell them to stuff it, and find ways to undermine their dominance.
Use a different search engine, and convince your friends and family to do the same. Write to your elected officials and ask them to investigate Google’s monopoly. Stop using the Chrome browser. Ditch your Android phone. Turn off Google’s tracking of your every move. And, for goodness sake, disable AMP on your website. Don’t feed the monster – fight it.
- Google News vs Donald Trump: Bias in Google’s Algorithms?
This morning US president Donald Trump sent out a few tweets about Google News. Since optimising news publishers for Google News is one of my key specialities as a provider of SEO services, this piqued my interest more than a little.

In his tweets, Trump accuses Google News of having a liberal anti-Trump bias: “96% of results on “Trump News” are from National Left-Wing Media, very dangerous. Google & others are suppressing voices of Conservatives and hiding information and news that is good. They are controlling what we can & cannot see. This is a very serious situation-will be addressed!”

The source of Trump’s information regarding Google News’s perceived bias is the right-wing blog PJ Media, who published a story about the sites that Google News lists when searching for ‘Trump’ in Google and selecting the ‘News’ tab in search results. According to PJ Media, “Not a single right-leaning site appeared on the first page of search results.” This is the chart that PJ Media used to determine if a listed news site is right-wing or left-wing:

Putting aside the questionable accuracy of this chart and the tiny sample size of PJ Media’s research, there is a valid underlying question: can algorithms be truly neutral?

Google News vs Regular Google Search

First of all we need to be clear about what we mean when we say ‘Google News’. Google’s search ecosystem is vast, complex, and intricately linked. Originally, Google News was a separate search vertical that allowed people to search for news stories. It was soft-launched in beta in 2002 and officially launched in 2006.

Then, in 2007, came Universal Search. Google started combining results from different verticals – images, videos, news, shopping – with its regular web search results. This was the start of Google’s SERPs as we know them today: rich results pages where pages from the web are combined with relevant news stories, images, and knowledge graph information. This is still the norm today.
Take, for example, Google’s regular web search result for ‘trump’: In just this one search result we have a knowledge panel on the right with information on Trump, related movies & TV shows, Trump’s official social media profiles, and a ‘People also search for’ box. In the main results area we have a Top Stories carousel followed by recent tweets from Donald Trump, relevant videos, a ‘People also ask’ box of related searches, a box with other US presidents, another box with political leaders, and a box with people relevant to Trump’s wife Ivana. And amidst all this there are nine ‘regular’ web search results.

While Trump’s official website is listed, it’s not the first regular result and the page is dominated by results from publishers: The Guardian, BBC, The Independent, Washington Post, The Atlantic, Vanity Fair, and NY Magazine. There’s a reason publishers tend to dominate such search results – I’ve given conference talks about that topic – but believe it or not, that’s not where news websites get the majority of their Google traffic from. Nor is the news.google.com vertical a particularly large source of traffic: it only accounts for approximately 3% of traffic to news sites.

So where does publishers’ search traffic come from? Well, news publishers depend almost entirely on the Top Stories carousel for their search traffic. Especially on mobile devices (which is where the majority of Google searches happen) the Top Stories carousel is a very dominant feature of the results page. According to research from Searchmetrics, this Top Stories box appears in approximately 11.5% of all Google searches, which amounts to billions of search results pages every single day.

This is why news publishers work so very hard to appear in that Top Stories carousel, even when it means implementing technologies like AMP, which are contrary to most news organisations’ core principles but are a requirement for appearing in Top Stories on mobile.
Of course, search is not the only source of traffic for news publishers, but it is by far the largest. News publishers don’t really have much of a choice: they either play by Google’s rules to try and claim visibility in Google News, or try and survive on the scraps that fall from Google’s table.

For me the interesting question is not ‘is Google News biased?’ but ‘how does Google select Top Stories?’ The answer to that question has three main elements: technology, relevancy, and authority.

The Technology of Google News & Top Stories

The technical aspects of ranking in the Top Stories carousel are fairly straightforward, but by no means simple.

First of all, the news site has to be included in the Google News index. This is not optional – according to NewsDashboard, over 99% of articles shown in Top Stories are from websites that are included in the Google News index. Because this news index is manually maintained, there is an immediate opportunity for accusations of bias. The people responsible for curating the Google News index make decisions about which websites are okay and which aren’t, and this cannot be a ‘neutral’ and ‘objective’ process because people aren’t neutral and objective. Every news site is accepted or rejected on the basis of a human decision. As all human decisions are subject to bias – especially unconscious bias – this makes the initial approval process already a subjective one.

Secondly, the news site needs to have certain technical elements in place to allow Google News to quickly crawl and index new articles. This includes structured data markup for your articles, and a means of letting Google know you have new articles (usually through a news-specific XML sitemap). Both of these technologies are heavily influenced by Google: schema.org is a joint project from Google, Bing and Yahoo, and the sitemaps protocol is entirely dependent on search engines like Google for its existence.
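As an illustration of that second requirement (a minimal sketch using the standard Google News sitemap namespace; the URL, publication name, and dates are made up for the example): a news-specific XML sitemap announces fresh articles like this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
  <url>
    <loc>https://example.com/news/some-article/</loc>
    <news:news>
      <news:publication>
        <news:name>Example News</news:name>
        <news:language>en</news:language>
      </news:publication>
      <news:publication_date>2018-08-28T09:00:00+01:00</news:publication_date>
      <news:title>Some Article Headline</news:title>
    </news:news>
  </url>
</urlset>
```

Unlike a regular XML sitemap, Google’s guidance is for a news sitemap to only list recent articles (roughly the last two days’ worth), so it stays small and can be crawled very frequently.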
Thirdly, you need to have valid AMP versions of your articles. Some may see this as an optional aspect, but really, without AMP a news site will not appear in Top Stories on mobile search results. This presents such a catastrophic loss of potential search traffic that it’s economically unfeasible for news websites to forego AMP. While AMP is presented as an open source project, in reality the vast majority of its code is written by Google engineers. At last count, over 90% of the AMP code comes from Googlers. So let’s be honest, AMP is a Google project.

This gives Google full technical control over Google News and Top Stories – in Google’s own crawling, indexing, and ranking systems, as well as the technologies that news publishers need to adopt to be considered for Google News. Publishers don’t have all that much freedom in designing their tech stack if they want to have any hopes of getting traffic from Google.

Ranking in Top Stories

The other aspects of ranking in Google News and Top Stories are about the news site’s editorial choices. While historically the Top Stories algorithm has been quite simplistic and easy to manipulate, that’s less the case nowadays. Since the powerful backlash against holocaust denial stories appearing in Google News, the search engine has started putting more resources into its News division, with a newly launched Google News vertical as the result.

The algorithms that decide which stories show up in any given Top Stories carousel take a number of aspects into consideration:

- Is the article relevant for this query?
- Is it a recently published or updated article?
- Is it original content?
- Is the publisher known to write about this topic?
- Is the publisher trustworthy and reliable?

In Google News there is also a certain amount of personalisation, where Google’s users will see more stories from publishers that they prefer or that are seen as geographically relevant (for example because it’s a newspaper local to the story’s focus).
And of course, a lot of the ranking of any given news article depends on how well the article has been optimised for search. A classic example is Angelina Jolie’s column for the New York Times about her double mastectomy – if you search for ‘angelina jolie mastectomy’ her column doesn’t rank at all, and at the time it didn’t appear in any Top Stories carousel. What you see are loads of other articles written about her mastectomy, but the actual column that kicked off the story is nowhere to be found. One look at the article in question should tell you why: it’s entirely unoptimised for the most relevant searches that people might type into Google.

Some journalism purists might argue that tweaking an article’s headline and content for maximum visibility in Google News is a pollution of their craft. Yet journalists seem to have no qualms about optimising headlines for maximum visibility at news stands. News publishers have always tried to grab people’s attention with headlines and introduction text, and doing this for Google News is simply an extension of that practice.

Yet even with the best optimised content, news publishers are entirely dependent on Google’s interpretations of their writing. It’s Google’s algorithms that decide if and where an article appears in the Top Stories carousel.

Algorithms Are Never Neutral

According to Google, the new version of Google News uses artificial intelligence: “The reimagined Google News uses a new set of AI techniques to take a constant flow of information as it hits the web, analyze it in real time and organize it into storylines.” This seems like an attempt at claiming neutrality by virtue of machines making the decisions, not humans. But this doesn’t stand up to scrutiny. All algorithmic evaluations are the result of human decisions. Algorithms are coded by people, and that means they will carry some measure of those people’s own unconscious biases and perceptions.
No matter how hard Google tries to make algorithms ‘neutral’, it’s impossible to achieve real neutrality in any algorithm. When Google’s algorithm decides that the story from Site A should appear first in the Top Stories carousel, and a similar story from Site B should be way down at the end of the carousel (or not in there at all), that is the result of countless human decisions – some large, some small – about what constitutes relevancy and trustworthiness.

Even with a diverse base of employees from all different backgrounds and walks of life, creating neutral algorithms is immensely challenging. Senior engineers’ decisions will almost always outweigh junior staff’s decisions, and some people’s biases will be represented in those editorial decisions that are made about how an algorithm ranks content.

And here Google can be very rightfully accused: it has an incredibly homogenous employee base. Ironically, while Google reports on its employees’ ethnicity and gender, it doesn’t report on political leanings – which is what sparked the furore about their lack of diversity in the first place. So we have no way of really knowing if Google’s engineers come from varied political backgrounds. This leaves Google wide open to criticisms of bias, and it’ll be very hard to dismiss those concerns.

Is Google News Biased?

To return to the question of bias in Google News, does Donald Trump have a point?
The PJ Media article that sparked the controversy is deeply flawed and entirely unrepresentative, but there are other sources that point towards a left-leaning bias in Google News. Yet simply looking at Google News search results and evaluating their diversity of opinion is a dangerous approach, because it fails to look at the underlying dependencies that go into creating that result in the first place: the technological demands placed on news publishers, the skill of individual journalists to optimise their articles for Google News, and the ability of news organisations to break stories and set the news agenda.

And we can’t leave out the fact that Google openly admits to making editorial decisions in Google News. Yes, actual people choosing stories to show up for trending topics. From its own relevant support documentation on curation:

The choice of language is very interesting: by using phrases like ‘empirical signals’ and ‘algorithmically populated’ Google intends to create the perception that these human curators have no real editorial influence over what is shown in Google News. Yet, even if we accept the notion that Google’s curators are able to make neutral decisions (which we shouldn’t), we know that algorithms are not neutral themselves, and – risking treading on philosophical grounds – there’s no such thing as ‘empirical signals’ when it comes to news.

Despite its efforts with the Google News Initiative, Google has done little to alleviate legitimate fears of bias in its ranking algorithms. In fact, due to its near full control over the entire process, Google leaves itself very susceptible to accusations of bias in Google News. With Google’s astonishing dominance in search with over 86% worldwide market share, this does beg the question: Can we trust Google? Should we?
- Technical SEO Masterclass in London
- Polemic Digital backs Glentoran FC for the 2018/19 season
Polemic Digital’s logo features on the back of Glentoran’s home and away shirts for the 2018/19 season

While Polemic Digital works with clients across the globe such as News UK, Seven West Media, Fox News, and Mail Online, we’re a key part of East Belfast’s thriving business community and have been based in the City East Business Centre since our inception. Last month we agreed a partnership with Glentoran FC, the iconic East Belfast football club, to become one of the club’s sponsors, with our company’s logo featuring on the back of the players’ 2018/19 shirts.

Commenting on this partnership with Glentoran FC, Polemic Digital’s founder Barry Adams said: “The people and businesses of East Belfast have supported and inspired us to take pride in working hard and achieving great results. Glentoran embodies this spirit of teamwork and the will to win against the odds. In our conversations with Simon Wallace we quickly recognised the kindred spirit shared by Glentoran and Polemic Digital, and we’re proud to become part of the club’s long and celebrated history.”

Glentoran’s Simon Wallace pictured with Barry Adams from Polemic Digital at The Oval.

Simon Wallace, commercial manager at Glentoran FC, added: “It’s great to have a successful local business like Polemic Digital partner with Glentoran and support our club. As a small local firm, Polemic manages to punch above its weight locally and internationally, which mirrors the drive and ambition that Glentoran FC has shown throughout the years.”

“We’re looking forward to a great season in the league,” Barry Adams continued. “Glentoran is such an iconic club, we’re fully behind the team and hope for a successful season.”

Read more about Polemic Digital here, and visit the Glentoran website at www.glentoran.com.
- Polemic Digital wins Best SEO Campaign at the 2018 DANI Awards
On Friday 13 April the 2018 DANI Awards were held in Whitla Hall at Queen’s University Belfast. Since 2010 the DANI Awards have celebrated the great work in digital done in Northern Ireland, and this year was the biggest event yet with more award submissions than ever before.

We were up for three awards and, with a shortlist full of great companies and exciting projects, we knew there was going to be tough competition in every category. The one we most looked forward to was of course the Best SEO Campaign award. In an ever-changing industry where many agencies chase after the latest hype, we are unashamedly an SEO-only agency. It’s the one thing we do, and we try to do it as well as it can be done. So we were very happy and honoured to win Best SEO Campaign and take home the prize!

It was an evening to celebrate – not only did we win, we also saw many of our friends in the Northern Irish digital industry pick up awards! Huge congratulations to the folks at The Tomorrow Lab, Digital 24, Loud Mouth Media, and Fathom, and especially to Emma Gribben for winning Young Digital Person of the Year! (I interviewed Emma as part of my NI Digital Experts series, read her story here.)

We won the award for our work with TheSun.co.uk, and sometimes people ask what makes for an award-winning campaign. How do the judges decide what’s worthy of recognition, and are the winners really deserving of it? I don’t usually share specific client results, but since this was such a great project to work on and has been written about before, I feel it’s okay to share a bit about this project. Since the launch of the new TheSun.co.uk site in 2016, search visibility growth has been astonishing and the site has been going from strength to strength. This is what the Sistrix graph of an award-winning campaign looks like:

While I collected the award, it’s really for the combined efforts of everyone involved in SEO at The Sun.
They are some of the smartest and most driven people I’ve ever had the privilege of working with. Lately many news sites have had to deal with significant algorithm updates from Google that had a profound impact on the industry. These are the types of challenges that I thrive on. Hopefully we can continue the site’s stellar growth and demonstrate the power of an all-encompassing approach to SEO.
- Polemic Digital shortlisted for three 2018 DANI Awards
Since their inception in 2010, the DANI Awards have celebrated the best and brightest of the Northern Irish digital scene. In a previous life, when I was with Pierce Communications, we won two DANI Awards in 2012 for our campaigns for Emo Oil and Total Produce. And in 2014, I achieved a great personal honour by winning Digital Industries Person of the Year at that year’s DANI Awards.

Pursuing awards is not something we actively engage in at Polemic Digital. Awards are a dime a dozen, and we have little faith in the majority of them. So far we have chosen to only enter the renowned UK Search Awards, with some measure of success. This year we decided to also enter the DANI Awards, because we felt our client projects were achieving a level of success that warranted recognition. And that gamble seems to have paid off, as all three of the client projects we’ve entered have been shortlisted! Our work is competing in the following categories:

- Best SEO Campaign for our work with TheSun.co.uk
- Best Campaign in Retail for our work with SkirtingsRUs.co.uk
- Best Campaign in Healthcare for our work with DocklandsDental.ie

All three of these websites have achieved considerable SEO success since we started working with them, and to be shortlisted for the 2018 DANI Awards shows that we’re at the forefront of the Northern Irish digital scene. Hopefully on the night itself we’ll come away with some silverware. It’ll be a great event regardless, celebrating the awesome work that’s being done in our wee country.

I’m also very pleased to see people and companies we consider friends of Polemic Digital also shortlisted at this year’s awards. Good luck to the folks at The Tomorrow Lab, Digital 24, Loud Mouth Media, and Fathom, and especially to Emma Gribben who’s shortlisted for Best Young Digital Person!
- View Source: Why it Still Matters and How to Quickly Compare it to a Rendered DOM
- Polemic Digital shortlisted for two 2017 UK Search Awards
Last year, we entered the UK Search Awards for the very first time in our existence. In a marketplace where award ceremonies are a dime a dozen, the UK Search Awards have always stood out as something special. The judging panel on these awards is second to none, and we knew that our work was going to be judged on merit alone – and not the size of our sponsorship budget.

So in 2016, with two and a half years of business under our belt, we more or less wanted to see where we stood in the crowded SEO landscape in the UK. We submitted a few projects to the awards, and were delighted to find ourselves shortlisted in three award categories. We never expected to win that year. After all, we were just a small two-person business in Belfast, and we were competing against some of the UK’s biggest and most established agencies and brands. So when we ended up winning two awards, we were stunned and amazed.

Polemic Digital’s 2016 UK Search Awards

This year we decided to enter again. While the business has evolved somewhat this last year, focusing primarily on SEO audits, SEO training, and specialised SEO for news publishers, we had a few ongoing projects we were proud of and hoped the judges might consider favourably. So when last week the shortlist was announced, we were eager to see if we’d made the cut. And indeed, we did! Polemic Digital is shortlisted in two categories:

- Best Use of Search – Retail
- Best Small SEO Agency

While we won last year’s Best Small SEO Agency award, in which our tiny two-person agency went up against outfits that had up to 25 members of staff, we feel our chances to extend our winning streak are quite small; every year the competition gets tougher, with more companies submitting more projects to the awards. This year, the shortlist boasts a truly outstanding selection of agencies and projects.
Still, even if we leave empty-handed, the awards night on November 30 in London will be another superb event celebrating all that is awesome about the search industry in the UK. Our local friends at Loud Mouth Media are once again on the shortlist, continuing their success as Northern Ireland’s finest PPC agency. And many of our agency friends in the UK, such as Marketing Signals, Branded3, Verve Search, MediaVision, BlueGlass, 10 Yetis, Screaming Frog, and many more are also shortlisted for awards. So it’ll be an amazing night, no matter what. Just to be shortlisted among the UK’s biggest and best is all we ever wanted, so we’re already considering this mission accomplished! Update: we didn’t win any awards but had a great night nonetheless. Congratulations to all the winners, well-deserved!
- Technical SEO Training
- Prevent Google From Indexing Your WordPress Admin Folder With X-Robots-Tag
I recently wrote an article for State of Digital where I lamented the default security features in WordPress. Since it is such a popular content management system, WordPress is targeted by hackers more than any other website platform. WordPress websites are subjected to hacking attempts every single day. According to Wordfence's March 2017 attack report, there were over 32 million attempted brute force attacks against WordPress sites in that month alone.

Out of the box, WordPress has some severe security flaws that leave it vulnerable to brute force attacks. One of these flaws is how WordPress prevents search engines like Google from crawling back-end administration files: through a simple robots.txt disallow rule.

User-agent: *
Disallow: /wp-admin/

While at first glance this may seem perfectly sensible, it is in fact a terrible solution. There are two major issues with the robots.txt disallow rule:

- Because a website's robots.txt file is publicly viewable, a disallow rule points hackers straight to your login folder.
- A disallow rule doesn't actually prevent search engines from showing blocked pages in their search results.

I don't recommend using robots.txt blocking as a method to protect secure login folders. Instead there are other, more elegant ways of ensuring your admin folders are secure and cannot be crawled and indexed by search engines.

X-Robots-Tag HTTP Header

In the context of SEO, the best-known elements of an HTTP exchange are the status code and the User-Agent header. But there are other HTTP headers which can be utilised by clever SEOs and web developers to optimise how search engines interact with a website, such as Cache-Control headers and the X-Robots-Tag header.

The X-Robots-Tag is an HTTP response header that informs search engine crawlers ('robots') how they should treat the page being requested. It's this header that can be used as a very effective way to prevent login folders and other sensitive information from being shown in Google's search results.
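To illustrate the first issue: anyone can fetch a site's robots.txt and extract its "hidden" folders with a couple of lines of code. A minimal sketch (the robots.txt content is hardcoded here for illustration, rather than fetched from a live site):

```python
# Sketch: how trivially a robots.txt Disallow rule reveals "hidden" folders.
robots_txt = """User-agent: *
Disallow: /wp-admin/
"""

# Pull out every disallowed path - exactly what an attacker's script would do.
hidden = [line.split(":", 1)[1].strip()
          for line in robots_txt.splitlines()
          if line.lower().startswith("disallow")]

print(hidden)  # ['/wp-admin/']
```

The disallow rule intended to keep the admin folder out of search results doubles as a signpost for anyone probing the site.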
Search engines like Google support the X-Robots-Tag HTTP header and will comply with the directives given by this header. The directives the X-Robots-Tag header can provide are almost identical to those enabled by the meta robots tag. But, contrary to the meta robots tag, the X-Robots-Tag header doesn't require the inclusion of an HTML meta tag on every affected page of your site. Additionally, you can configure the X-Robots-Tag HTTP header to work for files where you can't include a meta tag, such as PDF files and Word documents.

With a few simple lines of text in your website's Apache htaccess configuration file, we can prevent search engines from including sensitive pages and folders in their search results. For example, with the following lines in the website's htaccess file, we can prevent all PDF and Word document files from being indexed by Google:

<FilesMatch "\.(pdf|doc|docx)$">
Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>

It's always a good idea to configure your website this way, to prevent potentially sensitive documents from appearing in Google's search results. The question is, can we use the X-Robots-Tag header to protect a WordPress website's admin folder?

X-Robots-Tag and /wp-admin

The X-Robots-Tag doesn't allow us to protect entire folders in one go. Unfortunately, due to Apache htaccess restrictions, the header can only be triggered by rules that apply to file types, not to entire folders on your site. Yet, because all of WordPress's back-end functionality exists within the /wp-admin folder (or whichever folder you may have changed that to), we can create a separate htaccess file for that folder to ensure the X-Robots-Tag HTTP header is served for all webpages in that folder. All we need to do is create a new htaccess file containing the following rule:

Header set X-Robots-Tag "noindex, nofollow"

We then use our preferred FTP programme to upload this .htaccess file to the /wp-admin folder, and voilà.
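To make concrete what that rule produces on the wire, here's a minimal sketch using Python's standard library (not Apache – just an illustration) of a server that attaches the X-Robots-Tag header to every response under /wp-admin, mimicking the effect of the folder-level htaccess file:

```python
# Sketch: mimic the /wp-admin/.htaccess rule by attaching the X-Robots-Tag
# header to every response whose path falls under /wp-admin.
from http.server import BaseHTTPRequestHandler, HTTPServer
import threading
import urllib.request


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        if self.path.startswith("/wp-admin"):
            # Same directive the htaccess file sets for the folder.
            self.send_header("X-Robots-Tag", "noindex, nofollow")
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(b"<html><body>page</body></html>")

    def log_message(self, *args):
        pass  # keep demo output quiet


if __name__ == "__main__":
    # Start the server on a random free port and fetch an admin page.
    server = HTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    resp = urllib.request.urlopen("http://127.0.0.1:%d/wp-admin/index.php" % port)
    print(resp.headers.get("X-Robots-Tag"))  # noindex, nofollow
    server.shutdown()
```

Pages outside /wp-admin are served without the header, which is exactly the behaviour we want from the real htaccess setup: only the admin folder is marked noindex.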
Every page in the /wp-admin section will now serve the X-Robots-Tag HTTP header with the 'noindex, nofollow' directives. This will ensure the WordPress admin pages will never be indexed by search engines.

You can also upload such an htaccess file configured to serve X-Robots-Tag headers to any folder on your website that you want to protect this way. For example, you might have a folder where you store sensitive documents you want to share with specific third parties, but don't want search engines to see. Or if you run a different CMS, you can use this to protect that system's back-end folders from getting indexed.

To check whether a page on your site serves the X-Robots-Tag HTTP header, you can use a browser plugin like HTTP Header Spy [Firefox] or Ayima Redirect Path [Chrome], which will show you a webpage's full HTTP response. I would strongly recommend you check several different types of pages on your site after you've implemented the X-Robots-Tag HTTP header, because a small error can result in every page on your website serving that header. And that would be a Bad Thing.

To check if Google has indexed webpages on your site in the /wp-admin folder, you can do a search with advanced operators like this:

site:website.com inurl:wp-admin

This will give a search result listing all pages on website.com that have 'wp-admin' anywhere in the URL. If all is well, you should get zero results.

The X-Robots-Tag HTTP header is a simple and more robust approach to securing your WordPress login folders, and can also help optimise how search engines crawl and index your webpages. While it adds to your security, it's by no means the only thing you need to do to secure your site. Always make sure you have plenty of security measures in place – such as basic authentication in addition to your CMS login – and install a plugin like Wordfence or Sucuri to add extra layers of protection.

If you liked this post, please share it on social media.
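As a sketch of the basic-authentication measure mentioned above: the same /wp-admin/.htaccess file can combine the noindex header with a password prompt using Apache's standard auth directives. The AuthUserFile path below is a placeholder – use a real path outside your web root, and note that the Header directive requires mod_headers:

```apache
# /wp-admin/.htaccess - keep admin pages out of search results
# and require a password before WordPress's own login even loads.
Header set X-Robots-Tag "noindex, nofollow"

AuthType Basic
AuthName "Restricted area"
# Placeholder path; point this at your actual htpasswd file.
AuthUserFile /home/example/.htpasswd
Require valid-user
```

This way a brute force attack has to get through two separate password layers, and search engines never see the admin pages at all.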
You might also like to read this post about protecting your staging environments.