- Jan 17, 2018

View Source: Why it Still Matters and How to Quickly Compare it to a Rendered DOM

SEOs love to jump on bandwagons. Since the dawn of the industry, SEO practitioners have found hills to die on – from doorway pages to keyword density to PageRank Sculpting to Google Plus.

One of the latest hypes has been ‘rendered DOM’; basically, the fully rendered version of a webpage with all client-side code executed.

When Google published details about their web rendering service last year, some SEOs were quick to proclaim that only fully rendered pages mattered. In fact, some high-profile SEOs went as far as saying that “view source is dead” and that the rendered DOM is the only thing an SEO needs to look at.

These people would be wrong, of course.

Such proclamations stem from a fundamental ignorance about how search engines work. Yes, the rendered DOM is what Google will eventually use to index a webpage’s content. But the indexer is only part of the search engine. There are other aspects of a search engine that are just as important, and that don’t necessarily look at a webpage’s rendered DOM.

One such element is the crawler. This is the first point of contact between a webpage and a search engine. And, guess what, the crawler doesn’t render pages. I’ve explained the difference between crawling and indexing before, so make sure to read that.

Due to the popularity of JavaScript and SEO at the moment, there are plenty of smart folks conducting tests to see exactly how putting content into JavaScript affects crawling, indexing, and ranking. So far we’ve learned that JavaScript can hinder crawling, and that indexing of JS-enabled content is often delayed.

So we know the crawler only sees a page’s raw HTML. And we strongly suspect that Google has a multilayered indexing approach that first uses a webpage’s raw HTML before it gets around to rendering the page and extracting that version’s content. In a nutshell, a webpage’s raw source code still matters. In fact, it matters a lot.

I’ve found it useful to compare a webpage’s raw HTML source code to the fully rendered version. Such a comparison enables me to evaluate the differences and look at any potential issues that might occur with crawling and indexing.

For example, there could be some links to deeper pages that are only visible once the page is completely rendered. These links would not be seen by the crawler, so we can expect a delay to the crawling and indexing of those deeper pages.

Or we could find that a piece of JavaScript manipulates the DOM and makes changes to the page’s content. For example, I’ve seen comment plugins insert new <h1> heading tags onto a page, causing all kinds of on-page issues.
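Both of the checks above (links that only exist after rendering, and extra <h1> tags added by scripts) can be roughly automated once you have the two versions of the code saved. The sketch below uses Python's built-in html.parser module. The PageFacts class, the page_facts helper, and the two sample strings are hypothetical names and data I made up for illustration; in practice you would load your own saved files instead.

```python
from html.parser import HTMLParser


class PageFacts(HTMLParser):
    """Collect link targets and count <h1> tags in an HTML string."""

    def __init__(self):
        super().__init__()
        self.links = set()
        self.h1_count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.add(href)
        elif tag == "h1":
            self.h1_count += 1


def page_facts(html):
    """Parse an HTML string and return the populated PageFacts parser."""
    parser = PageFacts()
    parser.feed(html)
    parser.close()
    return parser


# Hypothetical sample markup standing in for the two saved files.
raw_html = '<html><body><h1>Title</h1><a href="/about">About</a></body></html>'
rendered_html = (
    '<html><body><h1>Title</h1><a href="/about">About</a>'
    '<a href="/deep/page">Deep page</a><h1>Comments</h1></body></html>'
)

raw = page_facts(raw_html)
rendered = page_facts(rendered_html)

# Links the crawler cannot see in the raw HTML.
links_only_after_rendering = rendered.links - raw.links
# Headings that client-side code added.
extra_h1_tags = rendered.h1_count - raw.h1_count

print("Links only in rendered DOM:", sorted(links_only_after_rendering))
print("Extra <h1> tags after rendering:", extra_h1_tags)
```

This only compares href values as literal strings, so relative and absolute versions of the same URL would count as different links.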

So let me show you how I quickly compare a webpage’s raw HTML with the fully rendered version.

HTML Source

Getting a webpage’s HTML source code is pretty easy: use the ‘view source’ feature in your browser (Ctrl+U in Chrome, or right-click and select ‘View page source’) to look at a page’s source code, then copy and paste the entire code into a new text file.
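If you would rather script this step, a plain HTTP request gives you roughly the same thing as ‘view source’: the HTML as the server sends it, with no JavaScript executed. This is a minimal sketch using Python's standard urllib module. The function names, the User-Agent string, and the raw.html filename are my own assumptions, and a server may respond differently to a script than to your browser.

```python
from urllib.request import Request, urlopen


def fetch_raw_html(url, user_agent="Mozilla/5.0 (compatible; raw-html-check)"):
    """Return the HTML a server sends for `url`, with no JavaScript executed."""
    # The User-Agent value is an arbitrary placeholder, not a real crawler's.
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=10) as response:
        # Fall back to UTF-8 when the response does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def save_raw_html(url, path="raw.html"):
    """Fetch `url` and write the raw HTML to a text file for later comparison."""
    html = fetch_raw_html(url)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(html)
    return html
```

Calling save_raw_html("https://www.example.com/") would write that page's raw HTML to raw.html in the current folder.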

Rendered Code

Extracting the fully rendered version of a webpage’s code is a bit more work. In Chrome, you can open the browser’s DevTools with the Ctrl+Shift+I shortcut, or right-click and select ‘Inspect’.

In this view, make sure you’re on the Elements tab. There, right-click on the opening <html> tag of the code, and select Copy > Copy outerHTML.

You can then paste this into a new text file as well.

With Chrome DevTools you get the computed DOM as your own version of Chrome has rendered it. That may include code manipulations from your browser extensions, and your Chrome version will likely differ from the one Google uses to render the page. While Google now has their evergreen rendering engine that uses the latest version of Chrome, it’s unlikely Google will process all the client-side code the same way as your browser does. Google’s rendering of a page can run into limits on both time and CPU cycles, so your own browser’s version of the rendered code is likely to be different from Google’s.

To analyse potential issues with rendering of the code in Google search, you will need the code from the computed DOM as Google’s indexer sees it.

For this, you can use Google’s Rich Results Test tool. This tool renders webpages the same way as Google’s indexer, and has a ‘View Source Code’ button that allows you to see, and copy, the fully rendered HTML.

Compare Raw HTML to Rendered HTML

To compare the two versions of a webpage’s code, I use Diff Checker. There are other tools available, so use whichever you prefer. I like Diff Checker because it’s free and it visually highlights the differences.

Just copy the two versions into the two Diff Checker fields and click the ‘Find Difference’ button. The output highlights the lines that differ between the two versions.

In many cases, you’ll get loads of meaningless differences such as removed spaces and closing slashes. To clean things up, you can do a find & replace on the text file where you saved the raw HTML, for example to replace all instances of ‘/>’ with just ‘>’. Then, when you run the comparison again, you’ll get much cleaner output.
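The same find & replace can be scripted so you can apply it to both files in one go. Here is a minimal sketch with Python's re module. The normalise_markup name and the sample string are hypothetical, and the extra whitespace cleanup goes slightly beyond the simple ‘/>’ replacement described above. Be aware that it also touches any ‘/>’ that appears inside inline scripts or visible text.

```python
import re


def normalise_markup(html):
    """Remove cosmetic differences that clutter a raw-vs-rendered diff."""
    # Turn self-closing tags such as <br /> or <img ... /> into <br> and <img ...>.
    html = re.sub(r"\s*/>", ">", html)
    # Collapse runs of spaces and tabs into a single space.
    html = re.sub(r"[ \t]+", " ", html)
    # Drop trailing spaces at the end of each line.
    html = re.sub(r" +$", "", html, flags=re.MULTILINE)
    return html


# Hypothetical sample showing the kind of noise this removes.
sample = '<meta charset="utf-8" />\n<br/>\n<img   src="a.png"  />  '
cleaned = normalise_markup(sample)
print(cleaned)
```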

Now you can easily spot any meaningful differences between the two versions, and evaluate if these differences could cause problems for crawling and indexing.

This will highlight where JavaScript or other client-side code has manipulated the page content, and allow you to judge whether those changes will meaningfully affect the page’s SEO.
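If you prefer to stay out of the browser, Python's standard difflib module can produce a similar comparison on the command line. This is only a sketch of the idea, not a replacement for a visual diff tool. The diff_html function, the file labels, and the sample markup are assumptions for illustration.

```python
import difflib


def diff_html(raw_html, rendered_html):
    """Return unified-diff lines between raw and rendered markup."""
    return list(
        difflib.unified_diff(
            raw_html.splitlines(),
            rendered_html.splitlines(),
            fromfile="raw.html",
            tofile="rendered.html",
            lineterm="",
        )
    )


# Hypothetical sample: a script has injected a link into an empty container.
raw_html = '<h1>Title</h1>\n<div id="app"></div>'
rendered_html = '<h1>Title</h1>\n<div id="app"><a href="/deep/page">Deep page</a></div>'

diff_lines = diff_html(raw_html, rendered_html)
for line in diff_lines:
    print(line)
```

Lines starting with a minus sign exist only in the raw HTML, and lines starting with a plus sign exist only in the rendered version.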

DirtyMarkup Formatter

When you do your first Diff Checker comparison, you’ll quickly find that it’s not always very useful. When a page is rendered by Google, a lot of unnecessary HTML is stripped (such as closing slashes in HTML tags) and a general cleanup of the code happens.

Sometimes a webpage’s source code will be minified, which removes all spaces and tabs to save bytes. This leads to big walls of text that can be very hard, if not impossible, to analyse.

For this reason, I always run both the raw HTML and the fully rendered code through the same code cleanup tool. I like to use DirtyMarkup Formatter for this.

By running both the HTML source and the rendered DOM through the same cleanup tool, you end up with code on both sides of the comparison that has identical formatting. This then helps with identifying problems when you use Diff Checker to compare the two versions.
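To illustrate why identical formatting matters, here is a deliberately crude stand-in for a proper formatter: it puts every tag on its own line and strips indentation, so a minified file and a neatly indented file end up looking the same. The one_tag_per_line function and the sample strings are hypothetical. This is not how DirtyMarkup works, and it will also reflow whitespace inside <pre> blocks and inline scripts.

```python
import re


def one_tag_per_line(html):
    """Crudely reformat markup so every tag starts on its own line."""
    # Break between adjacent tags, discarding whitespace that sat between them.
    html = re.sub(r">\s*<", ">\n<", html)
    # Strip indentation and blank lines so both versions share one layout.
    lines = (line.strip() for line in html.splitlines())
    return "\n".join(line for line in lines if line)


# Hypothetical samples: the same markup, minified and indented.
minified = "<ul><li>One</li><li>Two</li></ul>"
indented = "<ul>\n  <li>One</li>\n  <li>Two</li>\n</ul>\n"

print(one_tag_per_line(minified))
print(one_tag_per_line(minified) == one_tag_per_line(indented))
```

Because both inputs come out with the same layout, any line a diff tool still flags is a genuine difference in the markup rather than a formatting artefact.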

Comparing two neatly formatted pieces of code is much easier and allows you to quickly focus on areas of the code that are genuinely different – which indicates that either the browser or a piece of client-side code has manipulated the page in some way.

Built-In Comparison

If all of the above sounds like a lot of manual effort, you’re right. That’s why SEO tool vendors like DeepCrawl, Screaming Frog, and Sitebulb now have built-in comparison features for HTML and rendered versions of each crawled page on your site.

I still prefer to look at manual comparisons of key pages for every site that I audit. It’s not that I don’t trust the tools, but there’s a risk in only looking at websites through the lens of SEO tools. Nothing beats proper manual analysis of a webpage when it comes to finding SEO issues and making informed, actionable recommendations.