<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Silver Spike</title>
	<atom:link href="http://www.silverspike.co.uk/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.silverspike.co.uk</link>
	<description>The Official SilverDisc Blog</description>
	<lastBuildDate>Thu, 11 Mar 2010 13:53:20 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=abc</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Not Impressed By Microsoft&#8217;s New Bing Ad</title>
		<link>http://www.silverspike.co.uk/2010/03/11/not-impressed-by-microsofts-new-bing-ad/</link>
		<comments>http://www.silverspike.co.uk/2010/03/11/not-impressed-by-microsofts-new-bing-ad/#comments</comments>
		<pubDate>Thu, 11 Mar 2010 13:53:20 +0000</pubDate>
		<dc:creator>Alan Perkins</dc:creator>
				<category><![CDATA[Google]]></category>
		<category><![CDATA[Microsoft Bing]]></category>

		<guid isPermaLink="false">http://www.silverspike.co.uk/?p=115</guid>
		<description><![CDATA[Microsoft launched their new Bing ad on television last night.
My first impressions were that the ad is too negative.  It doesn&#8217;t show what Bing can do for you.  It&#8217;s at risk of associating Bing with information overload and distressed searchers.  I&#8217;m also not convinced the phrase &#8220;decision engine&#8221; is a good one [...]]]></description>
			<content:encoded><![CDATA[<p>Microsoft launched <a href="http://www.campaignlive.co.uk/thework/news/989165/Microsoft-Bing-JWT-London">their new Bing ad</a> on television last night.</p>
<p>My first impressions were that the ad is too negative.  It doesn&#8217;t show what Bing can do for you.  It&#8217;s at risk of associating Bing with information overload and distressed searchers.  I&#8217;m also not convinced the phrase &#8220;decision engine&#8221; is a good one &#8211; too techie, too nebulous.  Who&#8217;s making the decisions &#8211; me, or Bing?  </p>
<p>Compare it with Google&#8217;s Superbowl ad:</p>
<p><object width="560" height="340"><param name="movie" value="http://www.youtube.com/v/nnsSUqgkDwU&#038;hl=en_GB&#038;fs=1&#038;"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/nnsSUqgkDwU&#038;hl=en_GB&#038;fs=1&#038;" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="560" height="340"></embed></object></p>
<p>This has its own potential problems &#8211; I&#8217;m not sure I would have been brave enough to use no voiceover whatsoever on a TV ad running in a £60,000 per second timeslot &#8211; but in general it&#8217;s a much more upbeat ad showing someone achieving something &#8211; lots of things &#8211; using Google Search.</p>
<p>In Microsoft&#8217;s position, I think I&#8217;d accept the fact that lots of people use Google and get good results lots of the time, and show that Bing is an alternative that often succeeds when Google fails.  I&#8217;d challenge the notion that Google always delivers the right result, every time, and that if Google doesn&#8217;t deliver it, it can&#8217;t be on the Web.  I&#8217;d get people to try Bing &#8211; that&#8217;s all you can ask of the ad.  One idea would be to use something based on the famous <a href="http://www.youtube.com/watch?v=ZWEhc1Lb30s">&#8220;Pepsi Challenge&#8221;</a>, but bring it right up to date.</p>
<p>Having seen the interview with <a href="http://community.microsoftadvertising.com/blogs/advertiser/archive/2010/03/11/bing-uk-tv-ads-video-interview-with-ashley-highfield.aspx">Ashley Highfield</a>, I&#8217;m looking forward to more ads in the series.  It would be great to see Bing achieve the double digit market share that he desires, but I think this was a bad start to the campaign.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.silverspike.co.uk/2010/03/11/not-impressed-by-microsofts-new-bing-ad/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Testing PPC Conversion Tracking</title>
		<link>http://www.silverspike.co.uk/2010/03/10/testing-ppc-conversion-trackin/</link>
		<comments>http://www.silverspike.co.uk/2010/03/10/testing-ppc-conversion-trackin/#comments</comments>
		<pubDate>Wed, 10 Mar 2010 14:41:19 +0000</pubDate>
		<dc:creator>Alan Perkins</dc:creator>
				<category><![CDATA[Analytics & Log Files]]></category>
		<category><![CDATA[PPC]]></category>

		<guid isPermaLink="false">http://www.silverspike.co.uk/?p=112</guid>
		<description><![CDATA[It&#8217;s SilverDisc&#8217;s 17th Birthday today, so here&#8217;s a free gift of an idea for Google, Yahoo and Microsoft to consider.
Here at SilverDisc we&#8217;re often having to install and test new conversion tracking code for our PPC clients.  Usually this involves searching for one of our client&#8217;s keywords on each search engine, clicking on it [...]]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s SilverDisc&#8217;s 17th Birthday today, so here&#8217;s a free gift of an idea for Google, Yahoo and Microsoft to consider.</p>
<p>Here at SilverDisc we&#8217;re often having to install and test new conversion tracking code for our PPC clients.  Usually this involves searching for one of our client&#8217;s keywords on each search engine, clicking on it (thus incurring cost for the client) then going through the client&#8217;s site, making a test purchase and, later, checking that all the analytics has worked.</p>
<p>A cool feature that the search engines could add to improve efficiency would be a dummy campaign/ad-group/keyword that was automatically created by the engine itself within the PPC account specifically to test conversion tracking.</p>
<p>The keyword could be assigned by the engine itself, and could be very long, cryptic and unique to each client account, e.g. g54fr89fdcdjasdoe84.</p>
<ul>
<li>Searching for this keyword would always trigger the client&#8217;s ad</li>
<li>Clicking this ad would not incur any real charges (although it may simulate a charge).  Alternatively, a very low charge could be applied, e.g. £0.01.</li>
<li>Conversion tracking could work much faster for this one keyword, e.g. near-real-time, to allow better, faster testing</li>
</ul>
<p>This would save loads of time within agencies and mean that client accounts were up and running sooner, making more money for both clients and search engines.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.silverspike.co.uk/2010/03/10/testing-ppc-conversion-trackin/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Calling for link spam reports</title>
		<link>http://www.silverspike.co.uk/2010/03/09/calling-for-link-spam-reports/</link>
		<comments>http://www.silverspike.co.uk/2010/03/09/calling-for-link-spam-reports/#comments</comments>
		<pubDate>Tue, 09 Mar 2010 15:31:01 +0000</pubDate>
		<dc:creator>Alan Perkins</dc:creator>
				<category><![CDATA[Links]]></category>
		<category><![CDATA[SEO]]></category>
		<category><![CDATA[Spam]]></category>

		<guid isPermaLink="false">http://www.silverspike.co.uk/?p=107</guid>
		<description><![CDATA[I see that Matt Cutts of Google is calling for link spam reports.
I&#8217;m still very troubled by this paid links issue after all these years!
I agree it&#8217;s Google&#8217;s right to penalise or promote any page/site in its natural listings, which represent Google&#8217;s subjective opinion of relevancy.
However, the idea that all paid links are bad/&#8221;evil&#8221; is [...]]]></description>
			<content:encoded><![CDATA[<p>I see that <a href="http://www.mattcutts.com/blog/calling-for-link-spam-reports/">Matt Cutts of Google is calling for link spam reports</a>.</p>
<p>I&#8217;m still very troubled by this paid links issue after all these years!</p>
<p>I agree it&#8217;s Google&#8217;s right to penalise or promote any page/site in its natural listings, which represent Google&#8217;s subjective opinion of relevancy.</p>
<p>However, the idea that all paid links are bad/&#8221;evil&#8221; is wrong in so many ways:</p>
<ul>
<li>Paid links pre-date Google.</li>
<li>There is no machine-readable standard for labelling a paid link.  I&#8217;ll repeat that &#8211; there is no machine-readable standard for labelling a paid link.</li>
<li>Labelling paid links fails the &#8220;Does this make sense in the absence of search engines?&#8221; ethical test.  The answer may well be &#8220;Yes&#8221;.  (Where the answer is &#8220;No&#8221;, I agree paid links are spam).</li>
<li>Labelling paid links fails the &#8220;Would I do this if search engines did not exist?&#8221; test.  In fact, you have to know that Google exists, and that they mind about paid links, in order to label those paid links in the non-standard way that Google asks you to label them.  This is perhaps my biggest beef with Google&#8217;s approach to paid links &#8211; they actually violate one of Google&#8217;s published Webmaster principles.</li>
<li>What does &#8220;paid&#8221; mean anyway?  An actual exchange of cash?  If you look at the top results for any hugely commercial field, say &#8220;car insurance&#8221;, it&#8217;s hard to believe that there is no commercial influence in the results!  When all that a company does is commercial, then every link (positive or negative) to that company&#8217;s site is commercial in nature.</li>
</ul>
<p>I understand that a market in paid links arose because of Google&#8217;s algorithm.</p>
<p>However, the irony is that in responding to that market by asking all publishers to label paid links in a non-standard way, Google violated <a href="http://www.google.com/support/webmasters/bin/answer.py?hl=en&#038;answer=35769#3">its own principles</a>.  It started to ask publishers to adapt what they published to suit Google (because Google existed), and called them spammers if they didn&#8217;t.  That&#8217;s the wrong way around.  It&#8217;s the spammers that do stuff purely because Google exists!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.silverspike.co.uk/2010/03/09/calling-for-link-spam-reports/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Google Results Prefetching in Firefox/Mozilla</title>
		<link>http://www.silverspike.co.uk/2009/08/31/google-results-prefetching-in-firefox/</link>
		<comments>http://www.silverspike.co.uk/2009/08/31/google-results-prefetching-in-firefox/#comments</comments>
		<pubDate>Mon, 31 Aug 2009 09:39:33 +0000</pubDate>
		<dc:creator>Alan Perkins</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Google]]></category>

		<guid isPermaLink="false">http://www.silverspike.co.uk/?p=92</guid>
		<description><![CDATA[It appears that, some time ago, Google removed details of results prefetching from its Webmaster guidelines while continuing to implement results prefetching in its search results. Here's what they removed.]]></description>
			<content:encoded><![CDATA[<p>It appears that, some time ago, Google removed details of results prefetching from its Webmaster guidelines while continuing to implement results prefetching in its search results.</p>
<p>If you haven&#8217;t a clue what I&#8217;m talking about, the <a href="http://www.archive.org/">Wayback Machine</a> has the <a href="http://web.archive.org/web/20050601001018/http://www.google.com/webmasters/faq.html">original Google Webmaster help</a> on this topic, which I&#8217;ll paste here verbatim in order to make it searchable (Wayback Machine pages aren&#8217;t indexed by search engines):</p>
<blockquote>
<h2>Results Prefetching Questions</h2>
<p>1. What is &#8220;results prefetching,&#8221; and how does it impact my site?</p>
<p>On some searches, Google uses a special &lt;link&gt; tag supported by Firefox and Mozilla to instruct the browser to download the top search result before the user clicks on the result. When the user clicks on the top result, the destination page will load faster than before. This tag is only inserted when it is likely that the user will click on the first link.</p>
<p>For example, when a Firefox user searches for [<a href="http://www.google.com/search?q=stanford">stanford</a>], Google includes the following tag in the results HTML:</p>
<p><code>&lt;link rel="prefetch" href="http://www.stanford.edu/"&gt;</code></p>
<p>The <a href="http://www.mozilla.org/projects/netlib/Link_Prefetching_FAQ.html">official Mozilla Link Prefetching FAQ</a> describes the behavior of this tag in detail.</p>
<p>Prefetching may impact your site because the prefetch request will happen whether or not the user clicks on the result, so it may result in additional traffic to your web server. Google only inserts this tag when there is a high likelihood that the user will click on the top result, but clearly this heuristic is not right 100% of the time.</p>
<p>2. Can I distinguish prefetch requests from normal requests?</p>
<p>Yes, as described in the <a href="http://www.mozilla.org/projects/netlib/Link_Prefetching_FAQ.html">Mozilla Link Prefetching FAQ</a>, prefetch requests include the additional HTTP header</p>
<p><code>X-moz: prefetch</code></p>
<p>3. I want to block/ignore prefetch requests. What should I do?</p>
<p>To block or ignore prefetch requests (from Google and other web sites), you should configure your web server to return a 404 HTTP response code for requests that contain the &#8220;X-moz: prefetch&#8221; header.</p></blockquote>
<p>What else do you need to know about results prefetching?</p>
<p>If you run Google Analytics or another JavaScript-based analytics package, you won&#8217;t see these prefetched pages in your analytics.  That&#8217;s because only the HTML is prefetched, not the images, JavaScript, etc. referenced by that HTML, which means that the Analytics JavaScript is never even fetched, let alone executed.  You need to look at raw log files to see prefetched pages.</p>
<p>Google only issues the prefetch code when they are very confident that searchers will click on the #1 result (as in their example, a search for stanford).  Most times, particularly for more &#8220;normal&#8221; sites (i.e. not Stanford), Google won&#8217;t issue the code.  So you may never see this on your own site.</p>
<p>However, it&#8217;s worth being aware of this issue because if you do see a prefetch in your raw logs you&#8217;ll want to know why; and because, depending on how you calculate conversions, the fact that a page is prefetched but never viewed by a searcher may significantly affect your conversion tracking and monetisation on that page.  I&#8217;m surprised that Google removed this info from their Webmaster help.</p>
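<p>The blocking advice in the quoted help text could be sketched as follows.  This is only a minimal illustration, assuming a Python site served through WSGI (my assumption, not something the quoted help specifies); the names block_prefetch and hello_app are hypothetical.</p>

```python
def block_prefetch(app):
    """Wrap a WSGI app so that prefetch requests receive a 404 response."""
    def wrapper(environ, start_response):
        # WSGI exposes the "X-moz" request header as HTTP_X_MOZ.
        header = environ.get("HTTP_X_MOZ", "").strip().lower()
        if header == "prefetch":
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"Prefetch requests are not served.\n"]
        # Anything else is passed through to the real application untouched.
        return app(environ, start_response)
    return wrapper


def hello_app(environ, start_response):
    # Hypothetical stand-in for the real site.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


application = block_prefetch(hello_app)
```

<p>Because the check happens before the wrapped application runs, a blocked prefetch never reaches any server-side conversion counting.  The same rule could equally be expressed in web server configuration.</p>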
]]></content:encoded>
			<wfw:commentRss>http://www.silverspike.co.uk/2009/08/31/google-results-prefetching-in-firefox/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Nicola has baby</title>
		<link>http://www.silverspike.co.uk/2009/06/29/nicola-has-baby/</link>
		<comments>http://www.silverspike.co.uk/2009/06/29/nicola-has-baby/#comments</comments>
		<pubDate>Mon, 29 Jun 2009 15:25:28 +0000</pubDate>
		<dc:creator>Alan Perkins</dc:creator>
				<category><![CDATA[SilverDisc News]]></category>

		<guid isPermaLink="false">http://www.silverspike.co.uk/?p=89</guid>
		<description><![CDATA[It seems like only last year &#8211; and it was &#8211; that we posted the happy news of Nicola&#8217;s marriage to Sam Richards here on The Silver Spike.
Ever the fast worker, Nicola gave birth yesterday to Caitlin Rose Richards at 10.29am, weighing 7lb 1oz.
Congratulations to Nicola and Sam from all at SilverDisc!
]]></description>
			<content:encoded><![CDATA[<p>It seems like only last year &#8211; and it was &#8211; that we posted the happy <a href="http://www.silverspike.co.uk/2008/08/02/nicola-gets-married/">news of Nicola&#8217;s marriage to Sam Richards here on The Silver Spike</a>.</p>
<p>Ever the fast worker, Nicola gave birth yesterday to Caitlin Rose Richards at 10.29am, weighing 7lb 1oz.</p>
<p>Congratulations to Nicola and Sam from all at SilverDisc!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.silverspike.co.uk/2009/06/29/nicola-has-baby/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Early Easter Egg &#8211; The rel=canonical Calculator</title>
		<link>http://www.silverspike.co.uk/2009/04/09/rel-canonical-calculator/</link>
		<comments>http://www.silverspike.co.uk/2009/04/09/rel-canonical-calculator/#comments</comments>
		<pubDate>Thu, 09 Apr 2009 11:19:04 +0000</pubDate>
		<dc:creator>Alan Perkins</dc:creator>
				<category><![CDATA[Crawling and Indexing]]></category>

		<guid isPermaLink="false">http://www.silverspike.co.uk/?p=85</guid>
		<description><![CDATA[SilverDisc offers an early Easter Egg to Silver Spike readers &#8211; a rel=canonical calculator to help you help search engines to deliver more high quality, high converting visitors to your site.
This builds on the recent series of posts on this topic:

URL Canonicalisation and Normalisation
rel=canonical tag
a robots.txt equivalent to rel=canonical

The rel=canonical calculator will go on general [...]]]></description>
			<content:encoded><![CDATA[<p>SilverDisc offers an early <a href="http://en.wikipedia.org/wiki/Easter_egg_(media)">Easter Egg</a> to Silver Spike readers &#8211; a <a href="http://www.silverdisc.co.uk/tools/canonical/">rel=canonical calculator</a> to help you help search engines to deliver more high quality, high converting visitors to your site.</p>
<p>This builds on the recent series of posts on this topic:</p>
<ul>
<li><a href="http://www.silverspike.co.uk/2009/02/28/url-canonicalisation-and-normalisation/">URL Canonicalisation and Normalisation</a></li>
<li><a href="http://www.silverspike.co.uk/2009/03/03/rel-canonical-tag/">rel=canonical tag</a></li>
<li><a href="http://www.silverspike.co.uk/2009/03/08/a-robotstxt-equivalent-to-relcanonical/">a robots.txt equivalent to rel=canonical</a></li>
</ul>
<p>The <a href="http://www.silverdisc.co.uk/tools/canonical/">rel=canonical calculator</a> will go on general release in the next couple of weeks, and we will be making some PHP code available to insert the rel=canonical tag on your own pages.  That&#8217;s right &#8211; FREE CODE.  Register using the instructions provided on the <a href="http://www.silverdisc.co.uk/tools/canonical/">rel=canonical calculator</a> page.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.silverspike.co.uk/2009/04/09/rel-canonical-calculator/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>a robots.txt equivalent to rel=canonical</title>
		<link>http://www.silverspike.co.uk/2009/03/08/a-robotstxt-equivalent-to-relcanonical/</link>
		<comments>http://www.silverspike.co.uk/2009/03/08/a-robotstxt-equivalent-to-relcanonical/#comments</comments>
		<pubDate>Sun, 08 Mar 2009 18:00:56 +0000</pubDate>
		<dc:creator>Alan Perkins</dc:creator>
				<category><![CDATA[Crawling and Indexing]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[Search Engines]]></category>
		<category><![CDATA[robots.txt]]></category>

		<guid isPermaLink="false">http://www.silverspike.co.uk/?p=71</guid>
		<description><![CDATA[In my last post I looked at the rel=canonical tag and finished by promising to look at some of the limitations of rel=canonical and consider some alternatives.
Many of the alternatives have existed for some time &#8211; the use of redirects and cookies, for example.  However, the introduction of a rel=canonical tag was an opportunity [...]]]></description>
			<content:encoded><![CDATA[<p>In my last post I looked at the <a href="http://www.silverspike.co.uk/2009/03/03/rel-canonical-tag/">rel=canonical tag</a> and finished by promising to look at some of the limitations of rel=canonical and consider some alternatives.</p>
<p>Many of the alternatives have existed for some time &#8211; the use of redirects and cookies, for example.  However, the introduction of a rel=canonical tag was an opportunity for search engines to also introduce other, more efficient, standards.  These are the alternatives I would like to consider &#8211; alternatives that don&#8217;t exist yet, which the search engines could have introduced this time around and may introduce in future.</p>
<p>I see the rel=canonical tag as analogous to the meta robots tag, and  therefore suffering from many of the same limitations:</p>
<ul>
<li>The rel=canonical tag is located in a HTML file, and that HTML therefore needs to be fetched and parsed in order for the tag to be seen and acted upon.  Therefore, the tag does not save bandwidth or CPU for the Web site or search engine.</li>
<li>The rel=canonical tag is located in a HTML file and gives instructions about that file.  Therefore, it cannot be used to solve canonical issues for non-HTML files such as images, PDF files or Flash movies.</li>
<li>The rel=canonical tag acts at a micro-level rather than a macro-level.  Therefore it is difficult to verify that a site-wide policy has been correctly implemented using rel=canonical: every possible file has to be inspected.  Also, code changes have to be made in order to write the rel=canonical tag.  This may slow its implementation.</li>
</ul>
<p>Where the above issues apply to rel=canonical, and similar issues apply to the meta robots tag, it struck me that an opportunity has been missed to also solve canonical issues through the robots.txt file.  Any fix applied through robots.txt would not suffer from the above problems.</p>
<p>Extensions to robots.txt could be made in a number of ways.  For example, a <a href="http://httpd.apache.org/docs/2.0/mod/mod_rewrite.html">mod_rewrite</a>-type syntax could be introduced.  However, I&#8217;m not sure anything so advanced is needed.  Most canonical issues arise from three things:</p>
<ol>
<li>the use of query parameters in dynamic URLs.</li>
<li>www versus non-www versions of a site (and other subdomains).</li>
<li>inconsistent use of default index page URLs.</li>
</ol>
<p>Some simple robots.txt fields to control these issues would fix most problems without the pain and errors that a mod_rewrite implementation would create.</p>
<h2>Query Parameters</h2>
<p>Google Analytics and Yahoo Site Explorer are two examples of tools that allow simple manipulation of URL query parameters.  <a href="http://help.yahoo.com/l/us/yahoo/search/siteexplorer/dynamic/dynamic-03.html">Yahoo&#8217;s Dynamic URL Help</a> lists some of the crawling, indexing and ranking benefits of this approach.</p>
<p>Yahoo Site Explorer allows you to remove a query parameter or set a query parameter to a default value within a URL.  Using this, a URL such as</p>
<ul>
<li>http://www.example.com/page.php?refby=affiliate&#038;sid=abc123</li>
</ul>
<p>could be crawled and indexed as</p>
<ul>
<li>http://www.example.com/page.php?refby=yhoo_srch</li>
</ul>
<p>The session id has been dropped and the referrer has been overwritten as yhoo_srch, meaning all traffic sent by Yahoo Search could be attributed to Yahoo Search rather than the affiliate.  This functionality could be implemented in robots.txt using a new syntax something like the following:</p>
<p><code>User-Agent: Slurp<br />
Disallow:<br />
QueryParam: -sid<br />
QueryParam: refby=yhoo_srch<br />
</code></p>
<p>meaning that the sid query parameter is to be dropped (as it is preceded by &#8216;-&#8217;) and the refby query parameter is to be overwritten with a default value (as a default value is provided).  The same effect could be achieved with a single line:</p>
<p><code>User-Agent: Slurp<br />
Disallow:<br />
QueryParam: -sid, refby=yhoo_srch<br />
</code></p>
<p>One problem with both Google Analytics and Yahoo Site Explorer is that you must list the query parameters you wish to <i>drop</i> from URLs &#8211; not the ones you wish to <i>keep</i>.  Because third parties can link to your site, you&#8217;re not in control of the links they create and the query parameters they use.  Therefore, canonical issues can only truly be solved by specifying the query parameters you wish to keep, rather than those you wish to drop.  To solve this, wildcards could specify the default action to be applied to all non-listed query parameters.  Therefore I propose the following syntax:</p>
<p><code><br />
QueryParam: <i>retainParam</i>[=defaultValue]<br />
QueryParam: -<i>dropParam</i><br />
QueryParam: [-]*<br />
</code></p>
<p>where&#8230;</p>
<ul>
<li>retainParam[=defaultValue]: specifies a query parameter you definitely want to keep, and an optional default value you want it set to</li>
<li>-dropParam: specifies a query parameter you definitely want to drop</li>
<li>*: means keep all query parameters not specified (default)</li>
<li>-*: means drop all query parameters not specified</li>
</ul>
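<p>To make the proposed semantics concrete, here is a rough sketch of how a crawler might apply QueryParam rules to a URL.  It is an illustration only: no search engine implements this syntax, the function names parse_rules and canonicalise are hypothetical, and I have assumed that a default value only overwrites a parameter that is already present in the URL.</p>

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def parse_rules(lines):
    """Parse QueryParam values such as "-sid, refby=yhoo_srch"."""
    keep, drop, keep_unlisted = {}, set(), True
    for line in lines:
        for token in line.split(","):
            token = token.strip()
            if token == "*":
                keep_unlisted = True       # keep all unlisted parameters (default)
            elif token == "-*":
                keep_unlisted = False      # drop all unlisted parameters
            elif token.startswith("-"):
                drop.add(token[1:])        # -dropParam
            elif token:
                name, _, default = token.partition("=")
                keep[name] = default or None   # retainParam[=defaultValue]
    return keep, drop, keep_unlisted


def canonicalise(url, rules):
    """Return the URL with its query string rewritten according to the rules."""
    keep, drop, keep_unlisted = rules
    parts = urlsplit(url)
    result = []
    for name, value in parse_qsl(parts.query, keep_blank_values=True):
        if name in drop:
            continue
        if name in keep:
            default = keep[name]
            result.append((name, value if default is None else default))
        elif keep_unlisted:
            result.append((name, value))
    return urlunsplit(parts._replace(query=urlencode(result)))
```

<p>With the rules from the Slurp example (-sid and refby=yhoo_srch), the affiliate URL shown earlier comes out as http://www.example.com/page.php?refby=yhoo_srch.  Adding -* to the rules would also strip any query parameter that a third party had invented.</p>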
<h2>Default domain and Index Pages</h2>
<p>Two further, much simpler additions to robots.txt could clear up the majority of other canonical problems.  These are Domain and IndexPage:</p>
<p><code><br />
Domain: <i>defaultDomain</i><br />
IndexPage: <i>defaultIndexPage</i><br />
</code></p>
<p>defaultDomain specifies the default domain for this robots.txt file.  For example, if the search engine retrieves http://www.example.com/robots.txt and finds &#8230;</p>
<p><code><br />
Domain: http://example.com/<br />
</code></p>
<p>&#8230;it would know to index all URLs under the non-www domain.  This would allow multiple parked domains to share the same content and robots.txt file without needing redirects or causing canonical issues, which is currently a common problem.</p>
<p>The IndexPage field specifies a default index page for the domain, i.e.  a page for which the following two URLs are considered equivalent:</p>
<p>http://www.example.com/path/</p>
<p>http://www.example.com/path/<i>defaultIndexPage</i></p>
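<p>A crawler honouring these two fields might normalise URLs along the following lines.  Again this is a sketch under stated assumptions: the fields are only a proposal, the function name apply_domain_and_index is hypothetical, and I have assumed the directory form (without the index page name) is the one to index.</p>

```python
from urllib.parse import urlsplit, urlunsplit


def apply_domain_and_index(url, default_domain, index_page):
    """Rewrite url onto default_domain and strip a trailing default index page."""
    target = urlsplit(default_domain)
    parts = urlsplit(url)
    path = parts.path
    # Treat /path/ and /path/index_page as the same resource.
    if index_page and path.endswith("/" + index_page):
        path = path[: -len(index_page)]
    # Scheme and host come from the Domain field; the rest from the URL itself.
    return urlunsplit((target.scheme, target.netloc, path, parts.query, parts.fragment))
```

<p>Given Domain: http://example.com/ and IndexPage: index.html, the URL http://www.example.com/path/index.html would be indexed as http://example.com/path/.</p>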
<h2>Conclusion</h2>
<p>In this post I&#8217;ve proposed three new fields to add to robots.txt to provide an alternative to the rel=canonical tag, just as the current robots.txt fields are themselves alternatives to the meta robots tag, with their own advantages and disadvantages.  The chief advantages I see of canonicalising through robots.txt are:</p>
<ul>
<li>Acting through robots.txt means that a resource does not have to be fetched and parsed in order for the canonicalisation instructions to be followed.  Therefore, bandwidth and CPU are saved for both the Web site and search engine.</li>
<li>Acting through robots.txt means that canonical issues can be solved for non-HTML files such as images, PDF files or Flash movies.</li>
<li>Acting through robots.txt means large scale changes can be made very quickly and easily without the need for any code changes.  It&#8217;s also much easier to review the changes that have been made.</li>
</ul>
<p>The Domain, IndexPage and QueryParam fields would all be optional and independent of each other.  It would be great if the search engines could introduce some or all of these ideas into robots.txt.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.silverspike.co.uk/2009/03/08/a-robotstxt-equivalent-to-relcanonical/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>rel=canonical tag</title>
		<link>http://www.silverspike.co.uk/2009/03/03/rel-canonical-tag/</link>
		<comments>http://www.silverspike.co.uk/2009/03/03/rel-canonical-tag/#comments</comments>
		<pubDate>Tue, 03 Mar 2009 01:32:33 +0000</pubDate>
		<dc:creator>Alan Perkins</dc:creator>
				<category><![CDATA[Crawling and Indexing]]></category>
		<category><![CDATA[SEO]]></category>
		<category><![CDATA[Technical Architecture]]></category>

		<guid isPermaLink="false">http://www.silverspike.co.uk/?p=40</guid>
		<description><![CDATA[So, Google, Yahoo, Microsoft and, more recently, Ask have announced the new &#8220;canonical&#8221; link type or, more colloquially, the rel=canonical tag.
Much has already been written about this tag and its purpose: to help prevent duplicate content issues.  Probably the best summary is this Matt Cutts video:

This tag is a welcome addition to the armoury [...]]]></description>
			<content:encoded><![CDATA[<p>So, <a href="http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html">Google</a>, <a href="http://ysearchblog.com/2009/02/12/fighting-duplication-adding-more-arrows-to-your-quiver/">Yahoo</a>, <a href="http://blogs.msdn.com/webmaster/archive/2009/02/12/partnering-to-help-solve-duplicate-content-issues.aspx">Microsoft</a> and, more recently, <a href="http://blog.ask.com/2009/02/ask-is-going-canonical.html">Ask</a> have announced the new &#8220;canonical&#8221; <a href="http://www.w3.org/TR/REC-html40/struct/links.html#adef-rel">link type</a> or, more colloquially, the rel=canonical tag.</p>
<p>Much has already been written about this tag and its purpose: to help prevent duplicate content issues.  Probably the best summary is this <a href="http://www.mattcutts.com/blog/canonical-link-tag-video/">Matt Cutts video</a>:</p>
<p><center><object width="480" height="289"><param name="movie" value="http://www.youtube.com/v/Cm9onOGTgeM&#038;hl=en&#038;fs=1"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/Cm9onOGTgeM&#038;hl=en&#038;fs=1" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="480" height="289"></embed></object></center></p>
<p>This tag is a welcome addition to the armoury in the fight against duplicate content issues.  In addition to Matt&#8217;s comments, I would make the following points:</p>
<h2>Copyright Protection</h2>
<p>Scrapers are forever copying content and publishing it on their own sites/splogs.  Sometimes they are exceptionally lazy or stupid, even to the extent that they <a href="http://www.webmasterworld.com/forum89/9965.htm">copy Adsense code</a> onto their own sites.  If they copy your rel=canonical tag onto their site, that would give a strong &#8220;hint&#8221; to the search engine that you were the original owner of the content:</p>
<p align="left"><code>&lt;link rel="canonical" href="http://www.mysite.com/my/content/" /&gt;</code></p>
<h2>Microsoft Platforms</h2>
<p>Matt made reference to the Microsoft platform in his video, but I would emphasise the point.  Microsoft&#8217;s implementation of <a href="http://www.ietf.org/rfc/rfc2396.txt">RFC 2396</a> is flawed.  The path component of a URL is supposed to be case sensitive, but Microsoft makes it case insensitive.  If there are n alphabetic characters in the path, then a Microsoft implementation gives 2<sup>n</sup> possible variations of that path, where there should be only one.  For example, if n=1 and the path is &#8220;/a/&#8221;, Microsoft would allow &#8220;/a/&#8221; and &#8220;/A/&#8221;; if n=2 and the path is &#8220;/ab/&#8221;, Microsoft would allow &#8220;/ab/&#8221;, &#8220;/aB/&#8221;, &#8220;/Ab/&#8221; and &#8220;/AB/&#8221;; and so on. 2<sup>n</sup> variations give vast potential for duplicate content, and it is a big issue with sites built on the Microsoft platform.  The rel=canonical tag makes it very easy to specify the correct, case-sensitive path on a Microsoft platform:</p>
<p align="left"><code>&lt;link rel="canonical" href="http://www.mysite.com/my/case/sensitive/path/" /&gt;</code></p>
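<p>The 2<sup>n</sup> figure is easy to check by enumerating the variants.  The snippet below is purely an illustration of the arithmetic; the function name case_variants is hypothetical and it assumes plain ASCII paths.</p>

```python
from itertools import product


def case_variants(path):
    """List every upper/lower-case spelling of an ASCII path."""
    # Each letter contributes two choices; every other character just one.
    choices = [(c.lower(), c.upper()) if c.isalpha() else (c,) for c in path]
    return ["".join(combo) for combo in product(*choices)]
```

<p>Calling case_variants("/ab/") returns the four paths /ab/, /aB/, /Ab/ and /AB/, and a path with ten letters already has 1,024 spellings that a case-insensitive server will answer.</p>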
<h2>Static Web Content</h2>
<p>Static web content is content that is stored in the format in which it is delivered. Typically, static content is served under a static URL (a URL that does not contain a question mark).  However, it is possible to link to static content and append query parameters, even though these query parameters will have no impact on the content that is served.  One example of when this might happen is when a referrer parameter is passed to a JavaScript function within the static content:</p>
<p align="left"><code>&lt;a href="http://www.mysite.com/?referrer=myAffiliate0001"&gt;Affiliate Link&lt;/a&gt;</code></p>
<p>Thousands of links can be created to a single, static URL, each with a different referrer query parameter attached.  For sites built on static content, trying to manage such links has been difficult in the past.  Now, it&#8217;s relatively easy.  Each page of static content simply needs to contain a rel=canonical tag:</p>
<p align="left"><code>&lt;link rel="canonical" href="http://www.mysite.com/my/static/url.html" /&gt;</code></p>
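<p>As a rough illustration of the many-to-one mapping that the tag declares, the Python sketch below drops the query string from a set of affiliate links and shows that they all collapse to one static URL.  The helper name <code>without_query</code> and the second and third affiliate IDs are made up for the example, and this is not a description of how any search engine processes the tag:</p>

```python
from urllib.parse import urlsplit, urlunsplit


def without_query(url):
    """Return url with its query string and fragment removed."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


# Hypothetical affiliate links, all pointing at the same static page.
links = [
    "http://www.mysite.com/?referrer=myAffiliate0001",
    "http://www.mysite.com/?referrer=myAffiliate0002",
    "http://www.mysite.com/?referrer=myAffiliate0003",
]

print({without_query(link) for link in links})
```

<p>However many referrer values are in circulation, the set contains a single URL, which is the one the static page should name in its rel=canonical tag.</p>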
<h2>Conclusions: rel=canonical</h2>
<p>For the reasons stated above, I would recommend the use of a rel=canonical tag in all static content.  In fact, I would recommend its use in all content, static or dynamic &#8211; with appropriate care of course.  It&#8217;s a powerful tag and using it wrongly could have dire consequences.</p>
<p>In the next post I&#8217;ll look at some of the limitations of the rel=canonical tag and consider some alternatives.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.silverspike.co.uk/2009/03/03/rel-canonical-tag/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>URL Canonicalisation and Normalisation</title>
		<link>http://www.silverspike.co.uk/2009/02/28/url-canonicalisation-and-normalisation/</link>
		<comments>http://www.silverspike.co.uk/2009/02/28/url-canonicalisation-and-normalisation/#comments</comments>
		<pubDate>Sat, 28 Feb 2009 17:36:41 +0000</pubDate>
		<dc:creator>Alan Perkins</dc:creator>
				<category><![CDATA[Crawling and Indexing]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[Search Engines]]></category>
		<category><![CDATA[Technical Architecture]]></category>

		<guid isPermaLink="false">http://www.silverspike.co.uk/?p=31</guid>
		<description><![CDATA[I’ve been meaning to write about the new rel=canonical tag, which was proposed by Google, Yahoo and Microsoft on February 12.  I managed to squeeze some thoughts on it into my presentation and workshop at SES London, and I’ll be speaking more about it at SES New York next month, but before I blogged [...]]]></description>
			<content:encoded><![CDATA[<p>I’ve been meaning to write about the new rel=canonical tag, which was proposed by <a href="http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html">Google</a>, <a href="http://ysearchblog.com/2009/02/12/fighting-duplication-adding-more-arrows-to-your-quiver/">Yahoo</a> and <a href="http://blogs.msdn.com/webmaster/archive/2009/02/12/partnering-to-help-solve-duplicate-content-issues.aspx">Microsoft</a> on February 12.  I managed to squeeze some thoughts on it into my presentation and workshop at <a href="http://www.searchenginestrategies.com/london/">SES London</a>, and I’ll be speaking more about it at <a href="http://www.searchenginestrategies.com/newyork/">SES New York</a> next month, but before I blogged about it I really wanted to write more about URL Canonicalisation and Normalisation in general.</p>
<h2>Canonicalisation or Canonicalization?<br />
Normalisation or Normalization?</h2>
<p>I’m British, so I say Canonicalisation and Normalisation.  Your mileage may vary.</p>
<h2>What is URL Canonicalisation?</h2>
<p>We’re talking about search engines here, so let’s try a definition that applies generally, but leans towards search:</p>
<dl>
<dt>URL Canonicalisation</dt>
<dd>involves taking a <a href="#set-of-urls">set of different URLs</a> that <a href="#similar-content">all serve or lead to the same or similar content</a>, and applying <a href="#cononicalisation-rules">rules</a> to select one URL from that set under which that content should be <a href="#indexed">indexed or presented</a>.</dd>
</dl>
<p>I’ve hyperlinked the terms I think are important to more detail below, but before we go into them let’s try defining URL Normalisation.</p>
<dl>
<dt>URL Normalisation</dt>
<dd>involves taking a <a href="#single-url">single URL</a> and applying a <a href="#normalisation-algorithm">normalisation algorithm to produce a standard form</a> for that URL.</dd>
</dl>
<p>Others define normalisation and canonicalisation as all part of the same thing, but I like to think of them as separate processes.  To my way of thinking:</p>
<ul>
<li>you can normalise a single URL but you can only canonicalise a set of URLs</li>
<li>an un-normalised URL will serve the same content as a normalised URL, because it’s the same URL</li>
<li>all indexed URLs are normalised; not all are canonicalised</li>
<li>normalisation occurs before canonicalisation</li>
</ul>
<p>Now let’s go back and look at those hyperlinked terms in more detail.  </p>
<h2><a name="set-of-urls">Set of different URLs</a></h2>
<p>This is the key to canonicalisation and why it’s needed: the same content is being presented at a number of different URLs.  By different URLs, I mean those URLs are really different to each other – they could potentially show different content but (in this case) they don’t.</p>
<p>Here is an example set of URLs:</p>
<ul>
<li>http://www.example.com/</li>
<li>http://example.com/</li>
<li>http://www.example.com/index.html</li>
<li>http://example.com/default.asp</li>
<li>http://www.example.com/?referrer=affiliateName</li>
<li>http://www.example.com/?sessionid=123456</li>
</ul>
<h2><a name="similar-content">All serve or lead to the same or similar content</a></h2>
<p>If each of the above URLs served the same, or essentially the same, content, it’s likely that they would be canonicalised to fewer URLs – possibly only one.  If they each served completely different content, then it’s much less likely that this canonicalisation would take place.  By “or lead to”, I mean that the URL may redirect (e.g. with an HTTP 301 or HTTP 302 redirect) to another URL.</p>
<h2><a name="cononicalisation-rules">Canonicalisation Rules</a></h2>
<p>The rules for canonicalisation vary from engine to engine and time to time.  Here are a few examples of when canonicalisation will take place …</p>
<ul>
<li>If www and non-www versions of the URL exist, then canonicalise</li>
<li>If the same base URL is seen with different numbers of query parameters, then canonicalise</li>
<li>If the filename component of the URL matches a known set of index pages (e.g. index.*, default.*, etc.) then canonicalise</li>
<li>If the home page (“/”) redirects to another page, then canonicalise</li>
</ul>
<p>… and here are some examples of how canonicalisation will take place:</p>
<ul>
<li>Choose the URL with the highest PageRank (or similar link-based or other off-page criteria)</li>
<li>Obey rel=canonical webmaster hint</li>
<li>Choose the simplest URL (e.g. the shortest URL, or the one with fewest query parameters)</li>
</ul>
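<p>The last of these rules is simple enough to sketch in Python.  The function name <code>simplest_url</code> and the tie-breaking order (fewest query parameters first, then shortest URL) are my own assumptions for illustration; real engines combine many signals and do not publish their rules:</p>

```python
from urllib.parse import parse_qsl, urlsplit


def simplest_url(urls):
    """Pick the URL with the fewest query parameters, then the shortest."""
    def simplicity(url):
        query = urlsplit(url).query
        param_count = len(parse_qsl(query, keep_blank_values=True))
        # The URL itself is the final tie-breaker, so the result is stable.
        return (param_count, len(url), url)

    return min(urls, key=simplicity)


candidates = [
    "http://www.example.com/",
    "http://example.com/",
    "http://www.example.com/index.html",
    "http://example.com/default.asp",
    "http://www.example.com/?referrer=affiliateName",
    "http://www.example.com/?sessionid=123456",
]

print(simplest_url(candidates))
```

<p>For the example set used earlier, this picks http://example.com/, which may well not be the URL the site owner would have chosen.  That is exactly why an explicit webmaster hint is useful.</p>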
<h2><a name="indexed">Indexed or presented</a></h2>
<p>Sometimes only one URL from a set will be indexed, which means that it will always be the candidate URL to be presented in a set of search results.</p>
<p>At other times multiple URLs may be indexed, even though they are known to be part of the same canonical set. One of these URLs will be selected to appear in a given set of search results.  The URL that is selected may vary (for example, by query or by searcher location) – but only one will ever appear on a given search results page.</p>
<h2><a name="single-url">Single URL</a></h2>
<p>Normalisation operates on a single URL rather than on a set of URLs.  That single URL may need to be supplemented with other data in order for normalisation to take place.  For example, un-normalised URLs may be relative or absolute.  A normalised URL will always be a <a href="http://www.ietf.org/rfc/rfc2396.txt">fully-qualified absolute URL</a> so, along with a relative URL, the containing URL or &lt;BASE HREF…&gt; tag will need to be known in order for normalisation to take place.</p>
<h2><a name="normalisation-algorithm">Normalisation algorithm to produce a standard form</a></h2>
<p>Like canonicalisation rules, the normalisation algorithm may vary from engine to engine and time to time.  However, it’s much less likely to vary.  Here are examples of the kinds of things that are done during normalisation:</p>
<ol>
<li>convert a relative URL to an absolute URL</li>
<li>convert the scheme and the host name components of the URL to lower case</li>
<li>remove the port component if it matches the default port</li>
<li>escape characters that should be represented as octets (or a +)</li>
<li>unescape octets that are better represented as plain characters</li>
<li>convert all escape sequences to upper case</li>
</ol>
<p>Here are some examples of each operation:</p>
<ol>
<li>In http://www.silverdisc.co.uk/ , a link to “/contact.html” would be normalised to http://www.silverdisc.co.uk/contact.html</li>
<li>HTTP://WWW.SILVERDISC.CO.UK/contact.html would be normalised to http://www.silverdisc.co.uk/contact.html</li>
<li>http://www.silverdisc.co.uk:80/contact.html would be normalised to http://www.silverdisc.co.uk/contact.html, because 80 is the default port for HTTP connections.</li>
<li>http://www.silverdisc.co.uk/contact.html?name=Alan Perkins would be normalised to http://www.silverdisc.co.uk/contact.html?name=Alan+Perkins or http://www.silverdisc.co.uk/contact.html?name=Alan%20Perkins, because a space is not a valid character in a URL.</li>
<li>http://www.silverdisc.co.uk/cont%61ct.html would be normalised to http://www.silverdisc.co.uk/contact.html, because %61 is better represented as the character “a” in a URL.</li>
<li>A %2a in a URL would be converted to %2A for consistency</li>
</ol>
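<p>The six operations above can be sketched in a few lines of Python using the standard <code>urllib.parse</code> module.  This is a simplified illustration rather than any engine’s actual algorithm: the names <code>normalise</code> and <code>fix_escapes</code> are mine, the sketch always encodes a space as %20 rather than +, and it ignores user names, passwords and IPv6 hosts:</p>

```python
import re
from urllib.parse import urljoin, urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}
# Characters that never need escaping in a URL (letters, digits and - . _ ~).
UNRESERVED = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)


def fix_escapes(text):
    """Apply steps 4 to 6 to one URL component."""
    def replace(match):
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char                       # step 5: unescape, e.g. %61 to a
        return "%" + match.group(1).upper()   # step 6: e.g. %2a to %2A

    text = re.sub(r"%([0-9A-Fa-f]{2})", replace, text)
    return text.replace(" ", "%20")           # step 4: escape spaces only


def normalise(url, base=None):
    """Return a normalised absolute form of url (simplified sketch)."""
    if base is not None:
        url = urljoin(base, url)              # step 1: relative to absolute
    parts = urlsplit(url)
    scheme = parts.scheme.lower()             # step 2: lower-case the scheme
    host = (parts.hostname or "").lower()     # step 2: lower-case the host
    port = parts.port
    if port is not None and port != DEFAULT_PORTS.get(scheme):
        host = "{}:{}".format(host, port)     # step 3: keep non-default ports
    return urlunsplit((
        scheme,
        host,
        fix_escapes(parts.path),
        fix_escapes(parts.query),
        parts.fragment,
    ))


print(normalise("/contact.html", base="http://www.silverdisc.co.uk/"))
print(normalise("HTTP://WWW.SILVERDISC.CO.UK:80/cont%61ct.html?name=Alan Perkins"))
```

<p>Note that the path is never lower-cased, because only the scheme and host name components are case-insensitive.</p>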
<h2>Summary</h2>
<p>That completes this introduction to URL canonicalisation and normalisation.  In the next post, I’ll look at rel=canonical.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.silverspike.co.uk/2009/02/28/url-canonicalisation-and-normalisation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Nicola Gets Married!</title>
		<link>http://www.silverspike.co.uk/2008/08/02/nicola-gets-married/</link>
		<comments>http://www.silverspike.co.uk/2008/08/02/nicola-gets-married/#comments</comments>
		<pubDate>Sat, 02 Aug 2008 09:30:08 +0000</pubDate>
		<dc:creator>Alan Perkins</dc:creator>
				<category><![CDATA[SilverDisc News]]></category>

		<guid isPermaLink="false">http://www.silverspike.co.uk/?p=29</guid>
		<description><![CDATA[The preparations seem to have been going on since she joined us three years ago, but today is a big day on the SilverDisc social scene as Nicola, our Head of Paid Search Marketing, gets married!  Miss Brack becomes Mrs. Richards.
Loads of congratulations to Nicola and Sam, her lucky new husband.  Here&#8217;s to [...]]]></description>
			<content:encoded><![CDATA[<p>The preparations seem to have been going on since she joined us three years ago, but today is a big day on the SilverDisc social scene as Nicola, our Head of Paid Search Marketing, gets married!  Miss Brack becomes Mrs. Richards.</p>
<p>Loads of congratulations to Nicola and Sam, her lucky new husband.  Here&#8217;s to the bride and groom!</p>
<p><img src="http://www.silverspike.co.uk/wp-content/uploads/2008/08/nicolaandsam.jpg" alt="Nicola and Sam Richards" title="nicolaandsam" width="359" height="607" class="aligncenter size-full wp-image-30" /></p>
]]></content:encoded>
			<wfw:commentRss>http://www.silverspike.co.uk/2008/08/02/nicola-gets-married/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
