View Source: Why It Still Matters and How to Quickly Compare It to a Rendered DOM

Some SEOs say we should stop looking at a webpage's raw HTML source code. These SEOs are wrong, and here's why.

SEOs love to jump on bandwagons. Since the dawn of the industry, SEO practitioners have found hills to die on – from doorway pages to keyword density to PageRank Sculpting to Google Plus.

One of the latest hypes has been ‘rendered DOM’; basically, the fully rendered version of a webpage with all client-side code executed. When Google published details about their web rendering service last year, some SEOs were quick to proclaim that only fully rendered pages mattered. In fact, some high profile SEOs went as far as saying that “view source is dead” and that the rendered DOM is the only thing an SEO needs to look at.

These people would be wrong, of course.

Such proclamations stem from a fundamental ignorance about how search engines work. Yes, the rendered DOM is what Google will eventually use to index a webpage’s content. But the indexer is only part of the search engine. There are other aspects of a search engine that are just as important, and that don’t necessarily look at a webpage’s rendered DOM.

One such element is the crawler. This is the first point of contact between a webpage and a search engine. And, guess what, the crawler doesn’t render pages. I’ve explained the difference between crawling and indexing before, so make sure to read that.

Because JavaScript and SEO is such a hot topic at the moment, there are plenty of smart folks conducting tests to see exactly how putting content into JavaScript affects crawling, indexing, and ranking. So far we’ve learned that JavaScript can hinder crawling, and that indexing of JS-enabled content is often delayed.

So we know the crawler only sees a page’s raw HTML. And we strongly suspect that Google has a multilayered indexing approach that first uses a webpage’s raw HTML before it gets around to rendering the page and extracting that version’s content. In a nutshell, a webpage’s raw source code still matters. In fact, it matters a lot.

View Source

I’ve found it useful to compare a webpage’s raw HTML source code to the fully rendered version. Such a comparison enables me to evaluate the differences and look at any potential issues that might occur with crawling and indexing.

For example, there could be some links to deeper pages that are only visible once the page is completely rendered. These links would not be seen by the crawler, so we can expect a delay to the crawling and indexing of those deeper pages.

Or we could find that a piece of JavaScript manipulates the DOM and makes changes to the page’s content. For example, I’ve seen comment plugins insert new <h1> heading tags onto a page, causing all kinds of on-page issues.
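If you want to check for exactly those two issues programmatically, here’s a minimal sketch in Python using BeautifulSoup. It assumes you’ve already saved the raw and rendered HTML to two local files – raw.html and rendered.html are just hypothetical names, and the sections below show how to obtain both versions.

    # Minimal sketch: flag links and <h1> tags that only exist after rendering.
    # Assumes raw.html and rendered.html have been saved locally (see below).
    # Requires the beautifulsoup4 package.
    from bs4 import BeautifulSoup

    def load(path):
        with open(path, encoding="utf-8") as f:
            return BeautifulSoup(f.read(), "html.parser")

    raw = load("raw.html")            # the source code the crawler sees
    rendered = load("rendered.html")  # the computed DOM after rendering

    raw_links = {a["href"] for a in raw.find_all("a", href=True)}
    rendered_links = {a["href"] for a in rendered.find_all("a", href=True)}

    print("Links only present after rendering:")
    for href in sorted(rendered_links - raw_links):
        print("  " + href)

    print("h1 count, raw vs rendered:",
          len(raw.find_all("h1")), "vs", len(rendered.find_all("h1")))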

So let me show you how I quickly compare a webpage’s raw HTML with the fully rendered version.

HTML Source

Getting a webpage’s HTML source code is pretty easy: use the ‘view source’ feature in your browser (Ctrl+U in Chrome) to look at a page’s source code – or right-click and select ‘View Source’ – then copy & paste the entire code into a new text file.
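If you’d rather script this step, the sketch below grabs the raw server response with Python’s requests library. The URL and output filename are placeholders, and bear in mind that the response served to a script can differ from what your browser receives (for example, if the site varies content by user agent).

    # Minimal sketch: save a page's raw HTML, i.e. what the crawler fetches.
    # Requires the requests package; URL and filename are placeholders.
    import requests

    url = "https://www.example.com/"  # placeholder URL

    response = requests.get(url, timeout=30)
    response.raise_for_status()

    with open("raw.html", "w", encoding="utf-8") as f:
        f.write(response.text)

    print("Saved %d characters of raw HTML" % len(response.text))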

Rendered Code

Extracting the fully rendered version of a webpage’s code is a bit more work. In Chrome, you can open the browser’s DevTools with the Ctrl+Shift+I shortcut, or right-click and select ‘Inspect Element’.

In this view, make sure you’re on the Elements tab. There, right-click on the opening <html> tag of the code, and select Copy > Copy outerHTML.

DevTools - Copy OuterHTML

You can then paste this into a new text file as well.
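As a scripted alternative to copying outerHTML by hand, a headless browser can capture the rendered DOM for you. This sketch uses Playwright – my choice, not something the manual steps above require – and it comes with the same caveat as DevTools: it’s your local rendering, not Google’s.

    # Minimal sketch: capture the rendered DOM with a headless browser.
    # Requires the playwright package (plus its browser binaries installed).
    # Like DevTools, this reflects *your* rendering, not Google's.
    from playwright.sync_api import sync_playwright

    url = "https://www.example.com/"  # placeholder URL

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")  # let client-side code finish
        rendered_html = page.content()            # serialised DOM after rendering
        browser.close()

    with open("rendered.html", "w", encoding="utf-8") as f:
        f.write(rendered_html)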

With Chrome DevTools you get the computed DOM as your own version of Chrome has rendered it. That may include code manipulations from your browser plugins, and your version of Chrome will likely differ from the one used by Google’s web rendering service.

To analyse potential rendering issues in Google search, you need the code from the computed DOM as Google’s indexer sees it.

For this, you can use Google’s Rich Results Testing tool. This is the only tool that currently renders webpages the same way as Google’s indexer, and has a ‘View Source Code’ button that allows you to see – and copy – the fully rendered HTML:

Google Rich Results Test

Compare Raw HTML to Rendered HTML

To compare the two versions of a webpage’s code, I use Diff Checker. There are other tools available, so use whichever you prefer. I like Diff Checker because it’s free and it visually highlights the differences.

Just copy the two versions into the two Diff Checker fields and click the ‘Find Difference’ button. The output will look like this:

Diff Checker output

In many cases, you’ll get loads of meaningless differences such as removed spaces and closing slashes. To clean things up, you can do a find & replace on the text file where you saved the raw HTML, for example to replace all instances of ‘/>’ with just ‘>’. Then, when you run the comparison again, you’ll get much cleaner output:

Diff Checker output - clean

Now you can easily spot any meaningful differences between the two versions, and evaluate if these differences could cause problems for crawling and indexing.

This will highlight where JavaScript or other client-side code has manipulated the page content, and allows you to judge whether those changes will meaningfully impact the page’s SEO.
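If you prefer to run the comparison locally rather than in Diff Checker, here’s a minimal sketch using Python’s difflib, with the ‘/>’ to ‘>’ normalisation from above built in. The filenames are placeholders for wherever you saved the two versions.

    # Minimal sketch: diff the raw and rendered HTML locally with difflib,
    # normalising self-closing slashes first to cut down the noise.
    import difflib

    def normalise(path):
        with open(path, encoding="utf-8") as f:
            text = f.read()
        text = text.replace("/>", ">")  # drop meaningless closing slashes
        return [line.rstrip() for line in text.splitlines()]

    diff = difflib.unified_diff(
        normalise("raw.html"),
        normalise("rendered.html"),
        fromfile="raw.html",
        tofile="rendered.html",
        lineterm="",
    )

    for line in diff:
        print(line)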

Unminify

Sometimes a webpage’s source code will be minified, which strips out spaces, tabs, and line breaks to save bytes. This leads to big walls of text that can be very hard, if not impossible, to analyse:

Minified HTML code

In that case, I use unminify.com to restore the line breaks and indentation and turn it back into a clearly readable piece of source code. This then helps with identifying problems when you use Diff Checker to compare the two versions.

Unminified HTML code

Comparing two neatly formatted pieces of code is much easier and allows you to quickly focus on areas of the code that are genuinely different – which indicates that either the browser or a piece of client-side code has manipulated the page in some way.
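You can also unminify locally; the sketch below uses BeautifulSoup’s prettify() as a stand-in for unminify.com. Again, the filenames are placeholders.

    # Minimal sketch: unminify a saved HTML file by re-indenting it,
    # one tag per line, so the subsequent diff is easier to read.
    # Requires the beautifulsoup4 package.
    from bs4 import BeautifulSoup

    with open("raw.html", encoding="utf-8") as f:
        soup = BeautifulSoup(f.read(), "html.parser")

    with open("raw-pretty.html", "w", encoding="utf-8") as f:
        f.write(soup.prettify())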

Google Fetch & Render

The importance of a webpage’s raw HTML code for SEO is implied by Google itself. In Search Console’s ‘Fetch as Google’ feature, there are two options for looking at a webpage:

GSC Fetch and Render

These two options highlight the different ways in which Google’s systems will interact with a webpage:

  • Fetch: how the crawler sees the page
  • Fetch and Render: how Google’s indexer will eventually render the page

Because Google’s crawler doesn’t fully render webpages, the raw HTML source code will continue to be an important aspect of any holistic analysis of a webpage’s SEO. Failure to take the source code into account will leave you open to a whole range of rookie mistakes.

What tools do you use to compare raw HTML and rendered code? Share your own tips and tricks in the comments.

Update: It’s worth noting that the Sitebulb SEO crawler now has a built-in feature that allows you to visually compare a webpage’s raw HTML and rendered DOM.

Comments

  1. YES – I freaking love everything about this. The number of SEOs I hear going on about how “tech stack doesn’t matter cos Google reads everything anyway” is so frustrating these days. Proper analysis is still essential to make sure everything can be parsed and read before you even talk about indexing! Sharing so hard my mouse button nearly broke. :)

  2. I’m using view source and some ChromeDevTools features daily, but it’s the first time I read about “Copy outerHTML”.

    This is gold, thanks so much!

  3. Hey Barry,

    Great Post

With SEORadar, we store and archive both the HTML and the DOM. We also just created a fetch vs. rendered diff utility for URLs we are monitoring (but it’s not free). However, after reading your post, I think it would be a great idea to break it out of our app as a separate free tool, and we’re looking into doing exactly that.

A really useful read and something I had to deal with last year with the dev team. Always handy to have a resource like this to back up comments and people’s thoughts.

Hi Bill, yes that would certainly speed up indexing of that particular page’s content. Due to the limits on the number of Fetch & Renders you can do in GSC, it’s not recommended to rely on this feature for an entire website. :)

Thank you for this great article. Something didn’t sit right with me about relying on the search engines to see “everything”, and I figured the view source is a simpler version of what a search engine sees. This article backs up my hypothesis.

Wow, I love what you have discussed here. I am a fan of checking view source and, thanks to this article, you’ve proved that it still matters. Thanks for the detailed tips and for suggesting the Sitebulb SEO crawler.
