Professional internet addict • Game enthusiast • Tech creator

How to make URL hash routing SEO friendly in an SPA

How to make a single page application Node.js website more search engine friendly using URL hash routing!

This blog post was published in April 2013, so some of its content may be out of date by the time you read it. Unfortunately, it is hard to keep every post up to date to guarantee the accuracy of the information.

    I love Single Page Applications. Even though I know they have some flaws (such as the performance problems that recently made Twitter roll back their solution), I really like that they enable the developer to create really fluid, user-friendly websites.
    One of the more obvious challenges with "SPAs" is that they are not really search engine optimized. Since your website's content is most likely generated or added to the page on the fly with JavaScript, search engines have problems crawling and extracting information from it (search engine crawlers don't usually execute JavaScript when fetching a site's contents).
    However, Google themselves have published some advice concerning this problem. One of their suggestions is a snapshot technique, which I am going to briefly demonstrate in this guide.

    But let's start from the beginning

    My Single Page Application website generates its content with JavaScript, and is hence not currently very search engine friendly.
    The Node.js webserver:
    var express = require( "express" );
    var app = express();

    app.use( express.static( __dirname + '/public' ) );
    app.listen( 8080 );

    console.log( "Webserver started." );
    and my single index.html file:
    <!doctype html>
    <html>
    <head>
        <script src="http://code.jquery.com/jquery-1.9.1.min.js"></script>
        <script type="text/javascript">
            function checkURL() {
                var myRegexp = /\/user\/(\w+)/;
                var match = myRegexp.exec(document.URL);
                if( match !== null ) {
                    $("body").html( "<p>" + match[1] + " has two cute cats!</p>" );
                }
            }
            $(document).ready(function () {
                checkURL();
            });
        </script>
    </head>
    <body onhashchange="checkURL();">
        <p><a href="/#/user/john/">You should really visit John's page.</a></p>
    </body>
    </html>
    So this is basically a Single Page Application. If you visit http://localhost:8080/ you will see a simple page with a link on it - but if you visit http://localhost:8080/#/user/john/ you will learn that John has two cute cats.
    The obvious problem here is that when Google crawls the URL http://localhost:8080/#/user/john/ it will not learn that John has two cute cats, since that content was generated by JavaScript. So now that the problem is identified, how do we solve it?

    Step 1 - Adding the ! exclamation mark character

    As suggested by Google, we should add an exclamation mark (!) right after our hash character, turning it into the escaped fragment sequence.
    So in our HTML page, we change the link "You should really visit John's page." so that it now points to:
    http://localhost:8080/#!/user/john/
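    In the index.html above, the only change is the href of the link (shown here for reference):

    <p><a href="/#!/user/john/">You should really visit John's page.</a></p>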
    The reason why we add this is that when Google finds links with #! it will convert them into _escaped_fragment_ when crawling the website. This basically means that the Google bot will fetch the contents of this URL instead:
    http://localhost:8080/_escaped_fragment_/user/john/
    However, if we try this new URL we will get a 404 Not Found error since our Node webserver is only serving our index.html at the moment. We need to fix that.

    Step 2 - Capturing the Google bot requests

    Now we have to create special support for the requests performed by the Google bot. We do this by setting up a new route and adding a special handler for these requests:
    app.get( "/_escaped_fragment_/*", function( request, response ) { response.writeHead( 200, { "Content-Type": "text/html; charset=UTF-8" } ); response.end( "Hello Google bot!" ); } );
    This would give the Google bot a "Hello Google bot!" greeting when it visits this URL:
    http://localhost:8080/#!/user/john/
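    If you want to sanity check the new handler without waiting for a crawler, you could fetch the already-converted URL directly. Here is a minimal sketch using Node's built-in http module (the test URL is just the local address from above):

    var http = require( "http" );

    // Request the URL the Google bot would fetch after converting #! to _escaped_fragment_
    http.get( "http://localhost:8080/_escaped_fragment_/user/john/", function( response ) {
        var body = "";
        response.on( "data", function( chunk ) { body += chunk; } );
        response.on( "end", function() {
            console.log( body ); // should print "Hello Google bot!"
        } );
    } );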

    Step 3 - Creating the snapshots

    In order to tell Google the actual contents of the URL (that is, after JavaScript has generated the content) we need to take a snapshot of the site and provide that instead, through our newly implemented request handler.
    To achieve this we will use a headless browser such as PhantomJS, via the "phantomjs" Node module.
    This is my PhantomJS script (the script provided to PhantomJS with instructions on what to do):
    var system = require( "system" );
    var page = require( "webpage" ).create();
    var url = system.args[1];

    page.open( url, function( status ) {
        var pageContent = page.evaluate( function() {
            return document.getElementsByTagName( "html" )[0].innerHTML;
        } );
        console.log( pageContent );
        phantom.exit();
    } );
    And here is my new Node webserver:
    var express = require( "express" );
    var path = require( "path" );
    var childProcess = require( "child_process" );
    var phantomjs = require( "phantomjs" );
    var binPath = phantomjs.path;

    var app = express();
    app.use( express.static( __dirname + "/public" ) );
    app.listen( 8080 );

    app.get( "/_escaped_fragment_/*", function( request, response ) {
        var script = path.join( __dirname, "get_html.js" );
        var url = "http://localhost:8080" + request.url.replace( "_escaped_fragment_", "#!" );
        var childArgs = [ script, url ];

        childProcess.execFile( binPath, childArgs, function( err, stdout, stderr ) {
            response.writeHead( 200, { "Content-Type": "text/html; charset=UTF-8" } );
            response.end( "<!doctype html><html>" + stdout + "</html>" );
        } );
    } );

    console.log( "Webserver started." );
    Putting everything together: when the Google bot now makes a request to http://localhost:8080/#!/user/john/, PhantomJS will create a snapshot of the real URL and deliver that to the search engine.
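    To make the rewrite inside the handler concrete, this is roughly what happens to an incoming crawler request (illustrative values only):

    // The crawler requests the escaped-fragment path...
    var incoming = "/_escaped_fragment_/user/john/";

    // ...and the handler turns it back into the hashbang URL that PhantomJS should open.
    var rewritten = "http://localhost:8080" + incoming.replace( "_escaped_fragment_", "#!" );
    console.log( rewritten ); // http://localhost:8080/#!/user/john/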

    Future performance improvements

    Please note that the example above is not very performance friendly, as it performs a request of its own for every search engine request that comes in. There is plenty of room to improve performance, for example by caching the snapshots on disk or even in memory.
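    As one illustration of the in-memory idea (a minimal sketch, not a production implementation; the cache object name and the ten-minute lifetime are my own assumptions), the snapshot handler from above could be wrapped like this:

    var snapshotCache = {}; // maps request.url -> { html: "...", created: timestamp }
    var CACHE_TTL = 10 * 60 * 1000; // keep snapshots for ten minutes (arbitrary choice)

    app.get( "/_escaped_fragment_/*", function( request, response ) {
        var cached = snapshotCache[ request.url ];

        // Serve a fresh-enough snapshot straight from memory.
        if( cached && ( Date.now() - cached.created ) < CACHE_TTL ) {
            response.writeHead( 200, { "Content-Type": "text/html; charset=UTF-8" } );
            response.end( cached.html );
            return;
        }

        // Otherwise render a new snapshot with PhantomJS, as before, and remember it.
        var script = path.join( __dirname, "get_html.js" );
        var url = "http://localhost:8080" + request.url.replace( "_escaped_fragment_", "#!" );

        childProcess.execFile( binPath, [ script, url ], function( err, stdout, stderr ) {
            var html = "<!doctype html><html>" + stdout + "</html>";
            snapshotCache[ request.url ] = { html: html, created: Date.now() };

            response.writeHead( 200, { "Content-Type": "text/html; charset=UTF-8" } );
            response.end( html );
        } );
    } );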

    Old comments from Disqus

    Mark Everitt, Tuesday, November 19, 2013 11:13 PM
    Thanks for this article. It gives me a fantastic starting point to unify the web clients and REST API consumers of my webservice without killing SEO.
    Chase Adams, Tuesday, November 12, 2013 1:55 AM
    I've been talking about doing this with my team with our mobile website, I'm interested to see how scalable it is in enterprise level applications. Great article and very easy to read. Thanks!
    Createmyownwebsite.co, Sunday, September 22, 2013 6:40 AM
    This is good tip! I am honestly not aware of search bots etc Let me try this now

    Written by Special Agent Squeaky. Originally published April 20, 2013. Last updated April 20, 2013.
