This morning US president Donald Trump sent out a few tweets about Google News. Since optimising news publishers for Google News is one of my key specialities as a provider of SEO services, this piqued my interest more than a little.
In his tweets, Trump accuses Google News of having a liberal anti-Trump bias:
“96% of results on “Trump News” are from National Left-Wing Media, very dangerous. Google & others are suppressing voices of Conservatives and hiding information and news that is good. They are controlling what we can & cannot see. This is a very serious situation-will be addressed!”
The source of Trump’s information regarding Google News’s perceived bias is the right-wing blog PJ Media, which published a story about the sites that Google News lists when you search for ‘Trump’ in Google and select the ‘News’ tab in search results. According to PJ Media,
“Not a single right-leaning site appeared on the first page of search results.”
This is the chart that PJ Media used to determine if a listed news site is right-wing or left-wing:
Political bias in news organisations. Source: SharylAttkisson.com
Putting aside the questionable accuracy of this chart and the tiny sample size of PJ Media’s research, there is a valid underlying question: can algorithms be truly neutral?
Google News vs Regular Google Search
First of all we need to be clear about what we mean when we say ‘Google News’. Google’s search ecosystem is vast, complex, and intricately linked.
Originally, Google News was a separate search vertical that allowed people to search for news stories. It was soft-launched in beta in 2002 and officially launched in 2006.
Then, in 2007, came Universal Search. Google started combining results from different verticals – images, videos, news, shopping – with its regular web search results. This was the start of Google’s SERPs as we know them today: rich results pages where pages from the web are combined with relevant news stories, images, and knowledge graph information.
This is still the norm today. Take, for example, Google’s regular web search result for ‘trump’:
Top half of Google’s SERP for ‘trump’
In just this one results page we have a knowledge panel on the right with information on Trump, related movies & TV shows, Trump’s official social media profiles, and a ‘People also search for’ box.
In the main results area we have a Top Stories carousel followed by recent tweets from Donald Trump, relevant videos, a ‘People also ask’ box of related searches, a box with other US presidents, another box with political leaders, and a box with people relevant to Trump’s ex-wife Ivana.
And amidst all this there are nine ‘regular’ web search results. While Trump’s official website is listed, it’s not the first regular result and the page is dominated by results from publishers: The Guardian, BBC, The Independent, Washington Post, The Atlantic, Vanity Fair, and NY Magazine.
There’s a reason publishers tend to dominate such search results – I gave a talk about that topic in Paris last year – but believe it or not, that’s not where news websites get the majority of their Google traffic from. Nor is the news.google.com vertical a particularly large source of traffic: it only accounts for approximately 2% of traffic to news sites. So where does publishers’ search traffic come from?
Well, news publishers depend almost entirely on the Top Stories carousel for their search traffic:
Especially on mobile devices (which is where the majority of Google searches happen) the Top Stories carousel is a very dominant feature of the results page:
According to research from Searchmetrics, this Top Stories box appears in approximately 11.5% of all Google searches. With Google handling several billion searches a day, that amounts to hundreds of millions of search results pages every single day.
This is why news publishers work so very hard to appear in that Top Stories carousel, even when it means implementing technologies like AMP, which runs contrary to most news organisations’ core principles but is a requirement for appearing in Top Stories on mobile.
Of course, search is not the only source of traffic for news publishers, but it is by far the largest:
Traffic sources for news publishers. Source: Parse.ly.
News publishers don’t really have much of a choice: they either play by Google’s rules to try and claim visibility in Google News, or try and survive on the scraps that fall from Google’s table.
For me the interesting question is not ‘is Google News biased?’ but ‘how does Google select Top Stories?’
The answer to that question has three main elements: technology, relevancy, and authority.
The Technology of Google News & Top Stories
The technical aspects of ranking in the Top Stories carousel are fairly straightforward, but by no means simple.
First of all, the news site has to be included in the Google News index. This is not optional – according to NewsDashboard, over 99% of articles shown in Top Stories are from websites that are included in the Google News index.
Because this news index is manually maintained, there is an immediate opportunity for accusations of bias. The people responsible for curating the Google News index make decisions about which websites are okay and which aren’t, and this cannot be a ‘neutral’ and ‘objective’ process because people aren’t neutral and objective. Every acceptance or rejection of a news site rests on a human decision.
As all human decisions are subject to bias – especially unconscious bias – this makes the initial approval process already a subjective one.
Secondly, the news site needs to have certain technical elements in place to allow Google News to quickly crawl and index new articles. This includes structured data markup for your articles, and a means of letting Google know you have new articles (usually through a news-specific XML sitemap).
Both of these technologies are heavily influenced by Google: schema.org is a joint project from Google, Bing and Yahoo, and the sitemaps protocol is entirely dependent on search engines like Google for its existence.
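To make the two requirements above concrete, here is a minimal sketch in Python of what a publisher’s CMS might generate: schema.org NewsArticle markup (as JSON-LD) and a single-entry news sitemap. The article record, the function names, and the example.com URLs are all invented for illustration. The property and element names follow the public schema.org vocabulary and Google’s news sitemap namespace, but treat this as a starting point and check it against the current documentation rather than as a definitive implementation.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical article record; every value here is invented for illustration.
article = {
    "url": "https://example.com/news/sample-story",
    "title": "Sample headline for an example story",
    "published": "2018-08-28T09:30:00+00:00",
    "author": "Jane Doe",
    "publication": "Example Times",
    "language": "en",
}


def news_article_jsonld(a):
    """Return schema.org NewsArticle markup as a JSON-LD string.

    The result would be embedded in the article page inside a
    <script type="application/ld+json"> tag.
    """
    data = {
        "@context": "https://schema.org",
        "@type": "NewsArticle",
        "mainEntityOfPage": a["url"],
        "headline": a["title"],
        "datePublished": a["published"],
        "author": {"@type": "Person", "name": a["author"]},
        "publisher": {"@type": "Organization", "name": a["publication"]},
    }
    return json.dumps(data, indent=2)


SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NEWS_NS = "http://www.google.com/schemas/sitemap-news/0.9"


def news_sitemap(articles):
    """Return a news-specific XML sitemap listing the given articles."""
    # Register prefixes so the output uses xmlns="..." and xmlns:news="...".
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("news", NEWS_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for a in articles:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = a["url"]
        news = ET.SubElement(url, f"{{{NEWS_NS}}}news")
        pub = ET.SubElement(news, f"{{{NEWS_NS}}}publication")
        ET.SubElement(pub, f"{{{NEWS_NS}}}name").text = a["publication"]
        ET.SubElement(pub, f"{{{NEWS_NS}}}language").text = a["language"]
        ET.SubElement(news, f"{{{NEWS_NS}}}publication_date").text = a["published"]
        ET.SubElement(news, f"{{{NEWS_NS}}}title").text = a["title"]
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(news_article_jsonld(article))
    print(news_sitemap([article]))
```

In a real setup the sitemap would be regenerated whenever a new article is published, since its whole purpose is to tell Google quickly that there is something new to crawl.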
Thirdly, you need to have valid AMP versions of your articles. Some may see this as optional, but in practice a news site without AMP will not appear in Top Stories on mobile search results. This represents such a catastrophic loss of potential search traffic that it’s economically unfeasible for news websites to forgo AMP.
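As a rough illustration of how an AMP version is tied to an article: the regular page points to its AMP counterpart with a `<link rel="amphtml">` tag in the HTML head. The Python sketch below only checks whether such a link is present; it does not validate the AMP page itself, which is a separate step. The class name, function name, and sample HTML are made up for the example.

```python
from html.parser import HTMLParser


class AmpLinkFinder(HTMLParser):
    """Remember the href of the first <link rel="amphtml"> tag seen."""

    def __init__(self):
        super().__init__()
        self.amp_url = None

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel may hold several space-separated values; compare case-insensitively.
        rels = (attr.get("rel") or "").lower().split()
        if "amphtml" in rels and self.amp_url is None:
            self.amp_url = attr.get("href")


def find_amp_url(html):
    """Return the AMP URL advertised by a page, or None if there is none."""
    finder = AmpLinkFinder()
    finder.feed(html)
    return finder.amp_url


# Hypothetical article page, invented for illustration.
sample_page = """
<html><head>
<link rel="canonical" href="https://example.com/news/sample-story">
<link rel="amphtml" href="https://example.com/news/sample-story/amp">
</head><body><h1>Sample</h1></body></html>
"""

if __name__ == "__main__":
    print(find_amp_url(sample_page))
```

A check like this could run across a site’s recent articles to spot pages that were published without an AMP counterpart.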
While AMP is presented as an open source project, in reality the vast majority of its code is written by Google engineers. At last count, over 90% of the AMP code comes from Googlers. So let’s be honest, AMP is a Google project.
This gives Google full technical control over Google News and Top Stories – in Google’s own crawling, indexing, and ranking systems, as well as the technologies that news publishers need to adopt to be considered for Google News. Publishers don’t have all that much freedom in designing their tech stack if they want to have any hopes of getting traffic from Google.
Ranking in Top Stories
The other aspects of ranking in Google News and Top Stories are about the news site’s editorial choices. While historically the Top Stories algorithm has been quite simplistic and easy to manipulate, that’s less the case nowadays.
Since the powerful backlash against Holocaust denial stories appearing in Google News, the search engine has started putting more resources into its News division, with a newly launched Google News vertical as the result.
The algorithms that decide which stories show up in any given Top Stories carousel take a number of aspects into consideration:
- Is the article relevant for this query?
- Is it a recently published or updated article?
- Is it original content?
- Is the publisher known to write about this topic?
- Is the publisher trustworthy and reliable?
In Google News there is also a certain amount of personalisation, where Google’s users will see more stories from publishers that they prefer or that are seen as geographically relevant (for example because it’s a newspaper local to the story’s focus).
And of course, the ranking of any given news article depends heavily on how well the article has been optimised for search. A classic example is Angelina Jolie’s column for the New York Times about her double mastectomy – if you search for ‘angelina jolie mastectomy’ her column doesn’t rank at all, and at the time it didn’t appear in any Top Stories carousel. What you see are loads of other articles written about her mastectomy, but the actual column that kicked off the story is nowhere to be found.
One look at the article in question should tell you why: it’s entirely unoptimised for the most relevant searches that people might type into Google.
Some journalism purists might argue that tweaking an article’s headline and content for maximum visibility in Google News is a pollution of their craft. Yet journalists seem to have no qualms about optimising headlines for maximum visibility at news stands. News publishers have always tried to grab people’s attention with headlines and introduction text, and doing this for Google News is simply an extension of that practice.
Yet even with the best optimised content, news publishers are entirely dependent on Google’s interpretations of their writing. It’s Google’s algorithms that decide if and where an article appears in the Top Stories carousel.
Algorithms Are Never Neutral
According to Google, the new version of Google News uses artificial intelligence:
“The reimagined Google News uses a new set of AI techniques to take a constant flow of information as it hits the web, analyze it in real time and organize it into storylines.”
This seems like an attempt at claiming neutrality by virtue of machines making the decisions, not humans. But this doesn’t stand up to scrutiny.
All algorithmic evaluations are the result of human decisions. Algorithms are coded by people, and that means they will carry some measure of those people’s own unconscious biases and perceptions. No matter how hard Google tries to make algorithms ‘neutral’, it’s impossible to achieve real neutrality in any algorithm.
When Google’s algorithm decides that the story from Site A should appear first in the Top Stories carousel, and a similar story from Site B should be way down at the end of the carousel (or not in there at all), that is the result of countless human decisions – some large, some small – about what constitutes relevancy and trustworthiness.
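A toy example makes this tangible. The Python sketch below is emphatically not Google’s algorithm: the signal names are loosely borrowed from the list of ranking aspects earlier in this article, and every score and weight is invented. It only shows that the same two stories swap places depending on which weights a person chose to put in.

```python
from dataclasses import dataclass


@dataclass
class Article:
    """A story with made-up signal scores between 0 and 1."""
    site: str
    relevance: float
    freshness: float
    originality: float
    topical_authority: float
    trust: float


def score(article, weights):
    """Weighted sum of the signals; the weights are a human choice."""
    return sum(weights[name] * getattr(article, name) for name in weights)


def rank(articles, weights):
    """Return site names ordered from highest to lowest score."""
    ordered = sorted(articles, key=lambda a: score(a, weights), reverse=True)
    return [a.site for a in ordered]


# Hypothetical stories: Site A is fast and original, Site B is established.
site_a = Article("Site A", relevance=0.9, freshness=0.9, originality=0.8,
                 topical_authority=0.4, trust=0.5)
site_b = Article("Site B", relevance=0.8, freshness=0.5, originality=0.5,
                 topical_authority=0.9, trust=0.9)

# Two equally defensible weightings, both invented for this example.
freshness_first = {"relevance": 0.3, "freshness": 0.3, "originality": 0.2,
                   "topical_authority": 0.1, "trust": 0.1}
authority_first = {"relevance": 0.2, "freshness": 0.1, "originality": 0.1,
                   "topical_authority": 0.3, "trust": 0.3}

if __name__ == "__main__":
    print(rank([site_a, site_b], freshness_first))  # Site A ranks first
    print(rank([site_a, site_b], authority_first))  # Site B ranks first
```

Neither weighting is ‘wrong’, and that is exactly the point: someone has to decide how much freshness matters compared to trust, and someone has to decide how trust is measured in the first place.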
Even with a diverse base of employees from all different backgrounds and walks of life, creating neutral algorithms is immensely challenging. Senior engineers’ decisions will almost always outweigh junior staff’s decisions, and some people’s biases will be represented in those editorial decisions that are made about how an algorithm ranks content.
And here Google can quite rightly be accused: it has an incredibly homogeneous employee base.
Ironically, while Google reports on its employees’ ethnicity and gender, it doesn’t report on political leanings – which is what sparked the furore about their lack of diversity in the first place. So we have no way of really knowing if Google’s engineers come from varied political backgrounds.
This leaves Google wide open to criticisms of bias, and it’ll be very hard to dismiss those concerns.
Is Google News Biased?
To return to the question of bias in Google News, does Donald Trump have a point? The PJ Media article that sparked the controversy is deeply flawed and entirely unrepresentative, so the question remains unanswered.
Simply looking at Google News search results and evaluating their diversity of opinion is also a dangerous approach, because it fails to look at the underlying dependencies that go into creating that result in the first place: the technological demands placed on news publishers, the skill of individual journalists to optimise their articles for Google News, the ability of news organisations to break stories and set the news agenda, and of course the editorial decisions that have gone into Google’s ranking algorithms.
And we can’t leave out the fact that Google openly admits to making editorial decisions in Google News. Yes, actual people choosing stories to show up for trending topics. From its own relevant support documentation on curation:
The choice of language is very interesting: by using phrases like ‘empirical signals’ and ‘algorithmically populated’ Google intends to create the perception that these human curators have no real editorial influence over what is shown in Google News. Yet, even if we accept the notion of Google’s curators being able to make neutral decisions (which we don’t), we know that the algorithms themselves are not neutral, and – at the risk of treading on philosophical ground – there’s no such thing as ‘empirical signals’ when it comes to news.
Despite its efforts with the Google News Initiative, Google has done little to alleviate legitimate fears of bias in its ranking algorithms. In fact, due to its near full control over the entire process, Google leaves itself very susceptible to accusations of bias in Google News.
With Google’s astonishing dominance in search, at over 86% worldwide market share, this does raise the question: Can we trust Google? Should we?