Removing analytics clutter from campaign URLs

30 November 2012

Long, complex URLs stuffed with query string parameters. We web developers are responsible for a fair few of those, although with the increasing adoption of URL Rewriting they're less visible than once they were.

Against this trend are the URLs which Google Analytics (GA) encourages its users to deploy to track specific campaigns, such as how many visitors arrived via a particular marketing link, or from an RSS reader. I'm sure you've seen them: http://www.example.com/story.html?utm_source=blah&utm_medium=blah&utm_term=blah&utm_content=blah&utm_campaign=blah

Nothing wrong with this, you might say. It may look like junk to ordinary folk, but it's doing necessary work.

Except that, unlike the query string identifiers an app needs to retrieve the correct product page, etc., these campaign parameters aren't an intrinsic part of the URL - they're just there for the benefit of the site owner wanting to measure activity.

Still, no harm done to the visitor... Unless they want to do something with the URL, such as bookmark it or copy and paste it into another application. If they don't want all that "utm" gunk sticking to it, they'll have to clean it off manually.

Wouldn't it be nice if we could save them that bother by cleaning up our own URLs, whilst still allowing GA to do its work?

Using JavaScript to manipulate the URL

It's certainly possible to make changes to the current URL once a page has loaded, and Paul Irish has published a nifty script which appears to do the job. Unfortunately it relies on the HTML5 History API, which you can't use in IE9 and below (which in practice means all IE versions at the time of writing).
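To make the History API approach concrete, here is a minimal sketch of the general idea. It is not Paul Irish's script: the helper name stripUtmFromSearch is hypothetical, and the sketch assumes it runs only after GA has recorded the page view.

```javascript
// Hypothetical helper: returns the query string with every utm_* parameter removed.
function stripUtmFromSearch(search) {
  var pairs = search.replace(/^\?/, '').split('&');
  var kept = [];
  for (var i = 0; i < pairs.length; i++) {
    // Keep non-empty pairs whose name does not start with "utm_"
    if (pairs[i] && pairs[i].indexOf('utm_') !== 0) {
      kept.push(pairs[i]);
    }
  }
  return kept.length ? '?' + kept.join('&') : '';
}

// Browser-only part: rewrite the address bar without a reload, where supported.
if (typeof window !== 'undefined' && window.history && window.history.replaceState) {
  var loc = window.location;
  var cleaned = stripUtmFromSearch(loc.search);
  if (cleaned !== loc.search) {
    window.history.replaceState({}, '', loc.pathname + cleaned + loc.hash);
  }
}
```

Parameters the app itself needs, such as a product id, are left in place; only the names beginning "utm_" are dropped.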

For less capable browsers we can use the window.location object. The following will remove the entire query string from the current window's URL:


window.location.search = "";

Trouble is, if you add that line of code to your page's DOM-ready handler (such as jQuery's $(document).ready()), you'll find yourself in an infinite loop, because changing the query string using this method causes a full page refresh. Even if we add some conditions so that it fires just once to get rid of the "utm" params, you'll still have an additional page load every time. Not good for the user or the analytics.

However, there is a property of window.location that can be manipulated without a page refresh: the hash, i.e. the portion after the # sign.


window.location.hash = "";

This will leave the trailing # sign, which is not ideal, but anything after it will be stripped in any browser without triggering a page reload.

Configuring GA to use hashes instead of query strings

But what use is this? The GA parameters are in the search property/query string, not the hash. True, but remember GA is also JavaScript and it's quite easy to tell it to grab the parameters from the hash using the method _setAllowAnchor in your configuration "snippet".


var _gaq = _gaq || [];
_gaq.push(
	['_setAccount','YOUR-GA-KEY']
	,['_setAllowAnchor',true]
	,['_trackPageview']
);

This allows URLs with a hash instead of a question mark separating the campaign parameters to be tracked:

http://www.example.com/story.html#utm_source=blah&utm_medium=blah&utm_term=blah&utm_content=blah&utm_campaign=blah
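If you already have campaign links in the question mark form, converting them is mechanical. The following is only a sketch; the function name toHashCampaignUrl is hypothetical and it assumes campaign parameter names always begin with "utm_".

```javascript
// Hypothetical helper: moves utm_* parameters from the query string to the hash.
function toHashCampaignUrl(url) {
  var hashIndex = url.indexOf('#');
  var fragment = hashIndex === -1 ? '' : url.slice(hashIndex + 1);
  var beforeHash = hashIndex === -1 ? url : url.slice(0, hashIndex);
  var queryIndex = beforeHash.indexOf('?');
  var base = queryIndex === -1 ? beforeHash : beforeHash.slice(0, queryIndex);
  var pairs = queryIndex === -1 ? [] : beforeHash.slice(queryIndex + 1).split('&');
  var kept = [];
  var utm = [];
  for (var i = 0; i < pairs.length; i++) {
    if (!pairs[i]) { continue; }
    // Campaign parameters go to the hash, everything else stays in the query string
    (pairs[i].indexOf('utm_') === 0 ? utm : kept).push(pairs[i]);
  }
  // Anything already in the hash stays first, followed by the utm parameters
  var hashParts = (fragment ? [fragment] : []).concat(utm);
  return base +
    (kept.length ? '?' + kept.join('&') : '') +
    (hashParts.length ? '#' + hashParts.join('&') : '');
}
```

For example, passing in http://www.example.com/story.html?id=5&utm_source=blah should give back http://www.example.com/story.html?id=5#utm_source=blah, leaving the id parameter where the app expects it.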

A cross-browser solution

With this in place and our inbound campaign links using hashes instead of question marks, we just need to implement a function to detect and strip out any "utm" parameters using the best method the browser supports: the HTML5 window.history object or plain old hash manipulation:


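var removeUtms = function(){
	var l = window.location;
	// Only act when campaign parameters are present in the hash
	if( l.hash.indexOf( "utm" ) != -1 ){
		if( window.history.replaceState ){
			// HTML5 History API: drop the hash entirely, no trailing "#" and no reload
			history.replaceState({},'', l.pathname + l.search);
		} else {
			// Older browsers: empty the hash (a bare "#" remains)
			l.hash = "";
		}
	}
};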
var _gaq = _gaq || [];
_gaq.push(
	['_setAccount','YOUR-GA-KEY']
	,['_setAllowAnchor',true]
	,['_trackPageview']
);
_gaq.push( removeUtms );
(function() {
	var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
	ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
	var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();

To be absolutely sure that you don't zap the parameters before GA has had a chance to register them, it's important to call the function using the _gaq.push() method, which will ensure it fires after the page view has been tracked.

Note also that any query string parameters—which might be present for other purposes—won't be touched. Only the hash values are stripped.

Campaign for cleaner URLs

To see it in action, click the following link to another page on this site and (assuming JS is enabled in your browser) notice how the "utm" parameters disappear once the page has loaded, leaving a nice clean URL for copying and pasting/bookmarking.

http://cfsimplicity.com/tags/60/analytics#utm_source=post61&utm_medium=link&utm_term=testing&utm_content=testing&utm_campaign=cleanerurls

Anchors

Update October 2013: Ilhan asks a good question in the comments: what if your campaign URL includes a named anchor? In other words, you want to link to a specific part of a page by appending the name of an anchor or id, e.g. campaign.html#bottomofpage. The script above removes everything after the #, including any named anchors.

An anchor must come immediately after the # symbol, before the utm parameters, so we can adapt our function so that it will preserve any non-utm string in that initial position.


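var removeUtms = function(){
  var l = window.location;
  if( l.hash.indexOf( "utm" ) != -1 ){
    // Capture a leading anchor such as "#bottom"; hashes starting "#utm" don't match
    var anchor = l.hash.match(/#(?!utm)[^&]+/);
    anchor = anchor ? anchor[0] : '';
    if( !anchor && window.history.replaceState ){
      // No anchor to keep: remove the hash completely
      history.replaceState({},'', l.pathname + l.search);
    } else {
      // Keep the anchor (or, in older browsers, leave a bare "#")
      l.hash = anchor;
    }
  }
};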
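To see what the regular expression does in isolation, here is the same pattern wrapped in a small standalone function. The name extractAnchor is hypothetical and exists only for this illustration.

```javascript
// Same pattern as in removeUtms: a "#" followed by everything up to the first "&",
// provided the text after the "#" does not begin with "utm".
function extractAnchor(hash) {
  var match = hash.match(/#(?!utm)[^&]+/);
  return match ? match[0] : '';
}

// An anchor in first position is kept
console.log(extractAnchor('#bottom&utm_source=post61&utm_medium=link')); // "#bottom"
// A hash made up only of campaign parameters yields an empty string
console.log(extractAnchor('#utm_source=post61&utm_medium=link')); // ""
```

One limitation worth knowing: an anchor whose own name begins with "utm" would be mistaken for a campaign parameter and removed.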

Here's a simple demonstration. The link includes an anchor named "bottom" which I've styled to appear 2000 pixels from the top, so well below the initial viewport on most screens. You should see that the utm parameters have been removed from the URL but the anchor remains and the window is focused at the bottom (the word "bottom" appears in the bottom left corner).

http://cfsimplicity.com/ga-anchortest.html#bottom&utm_source=post61&utm_medium=link&utm_term=testing&utm_content=testing&utm_campaign=cleanerurls
