View Source: Why it Still Matters and How to Quickly Compare it to a Rendered DOM

Some SEOs say we should stop looking at a webpage's raw HTML source code. These SEOs are wrong, and here's why.

SEOs love to jump on bandwagons. Since the dawn of the industry, SEO practitioners have found hills to die on – from doorway pages to keyword density to PageRank Sculpting to Google Plus.

One of the latest hypes has been ‘rendered DOM’; basically, the fully rendered version of a webpage with all client-side code executed. When Google published details about their web rendering service last year, some SEOs were quick to proclaim that only fully rendered pages mattered. In fact, some high profile SEOs went as far as saying that “view source is dead” and that the rendered DOM is the only thing an SEO needs to look at.

These people would be wrong, of course.

Such proclamations stem from a fundamental ignorance about how search engines work. Yes, the rendered DOM is what Google will eventually use to index a webpage’s content. But the indexer is only part of the search engine. There are other aspects of a search engine that are just as important, and that don’t necessarily look at a webpage’s rendered DOM.

One such element is the crawler. This is the first point of contact between a webpage and a search engine. And, guess what, the crawler doesn’t render pages. I’ve explained the difference between crawling and indexing before, so make sure to read that.

Due to the popularity of JavaScript and SEO at the moment, there are plenty of smart folks conducting tests to see exactly how putting content into JavaScript affects crawling, indexing, and ranking. So far we’ve learned that JavaScript can hinder crawling, and that indexing of JS-enabled content is often delayed.

So we know the crawler only sees a page’s raw HTML. And I suspect that Google has a multilayered indexing approach that first uses a webpage’s raw HTML before it gets around to rendering the page and extracting that version’s content. In a nutshell, a webpage’s raw source code still matters. In fact, it matters a lot.

View Source

I’ve found it useful to compare a webpage’s raw HTML source code to the fully rendered version. Such a comparison enables me to evaluate the differences and look at any potential issues that might occur with crawling and indexing.

For example, there could be some links to deeper pages that are only visible once the page is completely rendered. These links would not be seen by the crawler, so we can expect a delay to the crawling and indexing of those deeper pages.
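If you'd rather script this particular check, here is a minimal sketch using only Python's standard library. It assumes you already have both versions of the page as strings; the class and function names and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.add(href)


def extract_links(html):
    """Return the set of href values found in a piece of HTML."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


def links_only_in_rendered(raw_html, rendered_html):
    """Return links present in the rendered DOM but absent from the raw HTML."""
    return sorted(extract_links(rendered_html) - extract_links(raw_html))


# Hypothetical sample markup: the second link is injected client-side.
raw = '<ul><li><a href="/category/">Category</a></li></ul>'
rendered = (
    '<ul><li><a href="/category/">Category</a></li>'
    '<li><a href="/category/deep-page/">Deep page</a></li></ul>'
)
print(links_only_in_rendered(raw, rendered))  # ['/category/deep-page/']
```

Note that this compares href values exactly as written, so a relative link in one version and the absolute form of the same URL in the other would show up as a difference.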

Or we could find that a piece of JavaScript manipulates the DOM and makes changes to the page’s content. For example, I’ve seen comment plugins insert new <h1> heading tags onto a page, causing all kinds of on-page issues.
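A quick way to catch this sort of change is to count a handful of important tags in both versions. The sketch below is one possible approach, not an established tool; the list of watched tags is an arbitrary assumption and the sample markup is invented.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts how often each opening tag appears."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta /> also end up here by default.
        self.counts[tag] += 1


def count_tags(html):
    counter = TagCounter()
    counter.feed(html)
    counter.close()
    return counter.counts


def tag_count_changes(raw_html, rendered_html,
                      tags=("title", "h1", "h2", "link", "meta")):
    """Map each watched tag to (raw count, rendered count) where they differ."""
    raw_counts = count_tags(raw_html)
    rendered_counts = count_tags(rendered_html)
    return {
        tag: (raw_counts[tag], rendered_counts[tag])
        for tag in tags
        if raw_counts[tag] != rendered_counts[tag]
    }


# Hypothetical example: a comment widget adds a second <h1> after rendering.
raw_page = "<h1>Article</h1><p>Text</p>"
rendered_page = (
    "<h1>Article</h1><p>Text</p>"
    '<div id="comments"><h1>Comments</h1></div>'
)
print(tag_count_changes(raw_page, rendered_page))  # {'h1': (1, 2)}
```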

So let me show you how I quickly compare a webpage’s raw HTML with the fully rendered version.

HTML Source

Getting a webpage’s HTML source code is pretty easy: use the ‘view source’ feature in your browser (Ctrl+u in Chrome) to look at a page’s source code – or right-click and select ‘View Source’ – then copy & paste the entire code into a new text file.
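If you need the raw HTML for more than a couple of pages, you can also download it with a few lines of Python. This is a sketch under some assumptions: the user-agent string is a made-up placeholder, and a server may return different HTML depending on the user agent, so the result is not guaranteed to match what your browser's view source shows or what a search engine crawler receives.

```python
from urllib.request import Request, urlopen


def fetch_raw_html(url, user_agent="Mozilla/5.0 (compatible; raw-html-check)"):
    """Download a URL and return the response body as text.

    No scripts are executed, so this is the unrendered source.
    """
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=30) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def save_raw_html(url, path):
    """Fetch the raw HTML for a URL and write it to a text file."""
    html = fetch_raw_html(url)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(html)
    return html
```

Calling `save_raw_html("https://www.example.com/", "raw.html")` would then give you the same kind of text file as the copy & paste route.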

Rendered Code

Extracting the fully rendered version of a webpage’s code is a bit more work. In Chrome, you can open the browser’s DevTools with the Ctrl+Shift+i shortcut, or right-click and select ‘Inspect Element’.

In this view, make sure you’re on the Elements tab. There, right-click on the opening <html> tag of the code, and select Copy > Copy outerHTML.

DevTools - Copy OuterHTML

You can then paste this into a new text file as well.

With Chrome DevTools you get the computed DOM as your version of Chrome has rendered it. This may include code manipulations from your plugins, and your version of Chrome is likely to differ from the one Google uses to render the page. While Google now has their evergreen rendering engine that uses the latest version of Chrome, it’s unlikely Google will process all the client-side code the same way as your browser does. There are limits on both time and CPU cycles that Google’s rendering of a page can run into, so your own browser’s version of the rendered code is likely to be different from Google’s.

To analyse potential issues with rendering of the code in Google search, you will need the code from the computed DOM as Google’s indexer sees it.

For this, you can use Google’s Rich Results Test. This tool renders webpages the same way as Google’s indexer, and has a ‘View Source Code’ button that allows you to see – and copy – the fully rendered HTML:

Google Rich Results Test

Compare Raw HTML to Rendered HTML

To compare the two versions of a webpage’s code, I use Diff Checker. There are other tools available, so use whichever you prefer. I like Diff Checker because it’s free and it visually highlights the differences.

Just copy the two versions into the two Diff Checker fields and click the ‘Find Difference’ button. The output will look like this:

Diff Checker output
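If you prefer to stay in a script, Python's built-in difflib module can produce a similar comparison. This is a rough sketch rather than a replacement for a visual diff tool; the file labels and sample markup are placeholders.

```python
import difflib


def diff_html(raw_html, rendered_html):
    """Return a unified diff between the raw and rendered versions, line by line."""
    return list(
        difflib.unified_diff(
            raw_html.splitlines(),
            rendered_html.splitlines(),
            fromfile="raw.html",
            tofile="rendered.html",
            lineterm="",
        )
    )


# Hypothetical example: a script fills an empty container after load.
raw_source = '<h1>Title</h1>\n<div id="app"></div>'
rendered_source = '<h1>Title</h1>\n<div id="app"><p>Loaded by script</p></div>'

for line in diff_html(raw_source, rendered_source):
    print(line)
```

Lines starting with a minus sign exist only in the raw HTML, and lines starting with a plus sign exist only in the rendered version.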

In many cases, you’ll get loads of meaningless differences such as removed spaces and closing slashes. To clean things up, you can do a find & replace on the text file where you saved the raw HTML, for example to replace all instances of ‘/>’ with just ‘>’. Then, when you run the comparison again, you’ll get much cleaner output:

Diff Checker output - clean
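The same find & replace cleanup can be scripted so that it is applied identically to both versions. The sketch below is a blunt text substitution, just like the manual approach: it assumes that ‘/>’ only ever appears at the end of a tag, which will not hold for every page (inline scripts or unquoted attribute values could be affected).

```python
import re


def normalise_html(html):
    """Smooth over cosmetic differences so a diff only shows real changes."""
    # '<br />' and '<br/>' both become '<br>'.
    html = re.sub(r"\s*/>", ">", html)
    # Collapse runs of spaces and tabs, and trim each line.
    lines = (re.sub(r"[ \t]+", " ", line).strip() for line in html.splitlines())
    # Drop lines that are empty after trimming.
    return "\n".join(line for line in lines if line)


print(normalise_html('<br />\n\n  <img   src="a.png"/>'))
```

Run both the raw and the rendered code through the function before comparing them, so neither side keeps formatting the other has lost.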

Now you can easily spot any meaningful differences between the two versions, and evaluate if these differences could cause problems for crawling and indexing.

This will highlight where JavaScript or other client-side code has manipulated the page content, and allows you to judge whether those changes will meaningfully impact on the page’s SEO.

DirtyMarkup Formatter

When you do your first Diff Checker comparison, you’ll quickly find that it’s not always very useful. When a page is rendered by Google, a lot of unnecessary HTML is stripped (such as closing slashes in HTML tags) and a general cleanup of the code happens.

Sometimes a webpage’s source code will be minified, which removes all spaces and tabs to save bytes. This leads to big walls of text that can be very hard, if not impossible, to analyse:

Minified HTML code

For this reason, I always run both the raw HTML and the fully rendered code through the same code cleanup tool. I like to use DirtyMarkup Formatter for this.

By running both the HTML source and the rendered DOM through the same cleanup tool, you end up with code on both sides of the comparison that has identical formatting. This then helps with identifying problems when you use Diff Checker to compare the two versions.

Clean HTML code

Comparing two neatly formatted pieces of code is much easier and allows you to quickly focus on areas of the code that are genuinely different – which indicates that either the browser or a piece of client-side code has manipulated the page in some way.
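To illustrate the idea of putting both versions through the same formatter, here is a deliberately simple sketch that writes every tag and every piece of text on its own line. It is not how DirtyMarkup works internally, and all names in it are invented for this example.

```python
from html import escape
from html.parser import HTMLParser


class OneTokenPerLine(HTMLParser):
    """Re-emits markup with every tag and text node on its own line."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def _tag(self, tag, attrs):
        parts = [tag]
        for name, value in attrs:
            # Valueless attributes such as 'async' are kept as a bare name.
            parts.append(name if value is None else f'{name}="{escape(value)}"')
        return "<" + " ".join(parts) + ">"

    def handle_starttag(self, tag, attrs):
        self.lines.append(self._tag(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        # '<br/>' is written as '<br>' so closing slashes never cause a diff.
        self.lines.append(self._tag(tag, attrs))

    def handle_endtag(self, tag):
        self.lines.append(f"</{tag}>")

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse all whitespace
        if text:
            self.lines.append(text)

    def handle_comment(self, data):
        self.lines.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.lines.append(f"<!{decl}>")


def format_html(html):
    """Return the markup with one tag or text node per line."""
    formatter = OneTokenPerLine()
    formatter.feed(html)
    formatter.close()
    return "\n".join(formatter.lines)


print(format_html("<div><p>Hello   world</p><br/></div>"))
```

Because minified and indented markup come out looking the same, any lines that still differ after this step point to a change in the actual tags or content.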

Built-In Comparison

If all of the above sounds like a lot of manual effort, you’re right. That’s why SEO tool vendors like DeepCrawl, Screaming Frog, and Sitebulb now have built-in comparison features for HTML and rendered versions of each crawled page on your site.

I still prefer to look at manual comparisons of key pages for every site that I audit. It’s not that I don’t trust the tools, but there’s a risk in only looking at websites through the lens of SEO tools.

Nothing beats proper manual analysis of a webpage when it comes to finding SEO issues and making informed, actionable recommendations.



  1. Thanks Barry – that does help! In an ideal world there wouldn’t be an infinite scroll (at least not in its rawest form), but your point about the links being initially discovered could be really useful for this purpose. Thanks again.

  2. This has really helped – thanks Barry. I was searching around for a very specific JS question and this has been super useful.

    I’ve a quick question on the back of it (I think it’s almost reversing one of your examples)

    Let’s say you had infinite scroll (unfortunately!); this would mean that Google would not be able to see all of the ‘rendered HTML’ content as it cannot scroll.

    Could a workaround for this be outputting all content to your ‘raw HTML’ (so if you viewed the source, you would see all content)? When a user accessed the page, standard infinite scroll JS functionality would kick in, but for crawlers all of the code would be there.

    Or, would this raw HTML be ‘overwritten’ by the rendered HTML and Googlebot would no longer be able to see all of this content?

    1. Hi Mike – great question. I think if the rendered DOM of the page removes the additional content from the HTML, then it would be a problem. Google would see two sets of content for that page – the raw HTML which contains everything, and the rendered version which only contains a fraction of that content. I don’t think that’s a healthy scenario, and Google is likely to index the rendered version in the end. So that means the infinite scroll content won’t make it into Google’s index.

      If however the infinite scroll content is more there for discovery – i.e. just as links to articles or product pages for Google to crawl – then it might be sufficient to have them in the raw HTML and not in the rendered DOM. It would mean there’s likely to be little or no internal link value flowing to those pages, but at least Googlebot could crawl them.

      Hope that helps!

  3. This is fantastic! I was searching for something that I could share with a developer without having to provide any additional explanation. Thank you, Barry.

  4. Hi, this is one of the best articles I’ve recently read, so thanks for sharing this info.

    I was surprised to find out that the rendered HTML is completely different from the raw HTML on the website I’m managing. Basically Google is rendering the old version of the website for some reason, and 90% of the website pages are not indexed because they are considered a duplicate of the Homepage. Technically they are all different and I even added the rel canonical tag on each page. However, I discovered that the HTML that Google rendered is completely different to the raw code. Is there a way to update the rendered HTML? (I tried to re-index some pages, but nothing happens)

    I would appreciate your opinion on this. Thanks a mil!

    1. Hi Andreea, that’s an interesting problem – do you know why the rendered HTML is so different from the raw HTML? What script could be causing that? If you want I can take a closer look, just send me an email and I’ll have a look.

  5. It is a mistake to say “And, guess what, the crawler doesn’t render pages.” Many can, and do.

  6. Hi Barry,

    Thank you for sharing such an interesting topic. I usually just ignore view source. I didn’t know that it has some impact in the SEO field.

  7. This article is great! After much research I finally found some useful info. I have one question: in the source code I can see img alt text and headlines, but I can’t see them in the DOM, i.e. in the rendered version (inspect). Does this mean that the images may face indexing problems? They work with lazy loading.

    1. Yep if the rendered DOM doesn’t have alt attributes then Google could hiccup over that. Lazy loading isn’t great for SEO, I’m afraid…

  8. Wow, I love what you have discussed here. I am fan of checking view source and thanks to this article, you proved that it still matters. Thanks for the detailed tips and for suggesting Sitebulb SEO crawler.

  9. Thank you for this great article. Something didn’t sit right with me about relying on the search engines to see “everything” and I figured the view source is a simpler version of what a search engine sees. This article backs up my hypothesis.

  10. Hi Barry,

    This is a great article. Thanks!
    If indexing of JS-enabled content is often delayed, do you think fetch and render would speed it up for that url or just a temporary render?

    1. Hi Bill, yes that would certainly speed up indexing of that particular page’s content. Due to the limits on the amount of Fetch & Renders you can do in GSC, it’s not recommended to rely on this feature for an entire website. :)

  11. A really useful read and something that I have had to deal with last year with the Dev team. Always handy to have a resource like this to back up comments and people’s thoughts etc.

  12. Hey Barry,

    Great Post

    With SEORadar, we store and archive both the HTML and the DOM. We also just created a fetch vs. rendered diff utility for URLs we are monitoring (but it’s not free). However, after reading your post, I think it would be a great idea to break it out into a separate free tool and we are looking into breaking it out of our app.

  13. I’m using view source and some ChromeDevTools features daily, but it’s the first time I read about “Copy outerHTML”.

    This is gold, thanks so much!

  14. YES – I freaking love everything about this. The number of SEOs I hear going on about how “tech stack doesn’t matter cos Google reads everything anyway” is so frustrating these days. Proper analysis is still essential to make sure everything can be parsed and read before you even talk about indexing! Sharing so hard my mouse button nearly broke. :)
