Articles About | Search Engines
13
Working With Multi-Regional Web Sites
0 Comments | Posted by Alan Perkins in Search Engines, Technical Architecture
Google’s John Mueller has published a good article on working with multi-regional web sites. He confirms that country-code Top Level Domains (ccTLDs) are the best way to host multi-regional content. He also clears up some of the myths surrounding duplicate content on multi-regional domains, which is most welcome.
John doesn’t mention that the same thinking applies even if you are targeting a single country. A ccTLD is the best way to indicate the location of your target market to search engines – and to that market itself, of course.
A URL gives you at least five places to target a country: domain (ccTLD), subdomain (de.domain.com), directory (www.domain.com/de/), path parameters (www.domain.com/;domain=de) and query parameters (www.domain.com/?domain=de). However, there are lots more axes for the content to be split along:
- Category – Web, Enterprise, Social, Real Time
- Context – Intranet, Library, Personal
- Topic – Health, Travel, Jobs, etc.
- Vertical – Finance, Education, Government, etc.
- Platform – Desktop, Mobile, Television, Kiosk
- Format – Text, Image, Audio, Video, Map
(Note: the above is slightly modified from a table provided by Search Patterns, an excellent read)
Given this number of ways of organising content, and the fact that the location and language of your target audience are major considerations (worthy of a major axis), in all but the most trivial cases a ccTLD is the obvious choice for geo-targeting. It’s good to see official written confirmation of this from Google.
11
Not Impressed By Microsoft’s New Bing Ad
0 Comments | Posted by Alan Perkins in Google, Microsoft Bing
Microsoft launched their new Bing ad on television last night.
My first impressions were that the ad is too negative. It doesn’t show what Bing can do for you. It’s at risk of associating Bing with information overload and distressed searchers. I’m also not convinced the phrase “decision engine” is a good one – too techie, too nebulous. Who’s making the decisions – me, or Bing?
Compare it with Google’s Superbowl ad:
This has its own potential problems – I’m not sure I would have been brave enough to use no voiceover whatsoever on a TV ad running in a £60,000 per second timeslot – but in general it’s a much more upbeat ad showing someone achieving something – lots of things – using Google Search.
In Microsoft’s position, I think I’d accept the fact that lots of people use Google and get good results lots of the time, and show that Bing is an alternative that often succeeds when Google fails. I’d challenge the notion that Google always delivers the right result, every time, and that if Google doesn’t deliver it, it can’t be on the Web. I’d get people to try Bing – that’s all you can ask of the ad. An idea would be to use something based on the famous “Pepsi Challenge”, but bring it right up to date.
Having seen the interview with Ashley Highfield, I’m looking forward to more ads in the series. It would be great to see Bing achieve the double digit market share that he desires, but I think this was a bad start to the campaign.
31
Google Results Prefetching in Firefox/Mozilla
0 Comments | Posted by Alan Perkins in General, Google
It appears that, some time ago, Google removed details of results prefetching from its Webmaster guidelines while continuing to implement results prefetching in its search results.
If you haven’t a clue what I’m talking about, the Wayback Machine has the original Google Webmaster help on this topic, which I’ll paste here verbatim in order to make it searchable (Wayback Machine pages aren’t indexed by search engines):
Results Prefetching Questions
1. What is “results prefetching,” and how does it impact my site?
On some searches, Google uses a special <link> tag supported by Firefox and Mozilla to instruct the browser to download the top search result before the user clicks on the result. When the user clicks on the top result, the destination page will load faster than before. This tag is only inserted when it is likely that the user will click on the first link.
For example, when a Firefox user searches for [stanford], Google includes the following tag in the results HTML:
<link rel="prefetch" href="http://www.stanford.edu/">
The official Mozilla Link Prefetching FAQ describes the behavior of this tag in detail.
Prefetching may impact your site because the prefetch request will happen whether or not the user clicks on the result, so it may result in additional traffic to your web server. Google only inserts this tag when there is a high likelihood that the user will click on the top result, but clearly this heuristic is not right 100% of the time.
2. Can I distinguish prefetch requests from normal requests?
Yes, as described in the Mozilla Link Prefetching FAQ, prefetch requests include the additional HTTP header
X-moz: prefetch
3. I want to block/ignore prefetch requests. What should I do?
To block or ignore prefetch requests (from Google and other web sites), you should configure your web server to return a 404 HTTP response code for requests that contain the “X-moz: prefetch” header.
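To illustrate the advice in answer 3, here is a minimal sketch of a WSGI wrapper in Python that returns a 404 for any request carrying the “X-moz: prefetch” header. The names block_prefetch and hello_app are my own, purely for illustration. In practice you would more likely do this in your web server’s configuration, so treat this as one possible approach rather than the recommended one.

```python
def block_prefetch(app):
    """Wrap a WSGI app so that requests sent with 'X-moz: prefetch' get a 404."""
    def middleware(environ, start_response):
        # WSGI exposes the X-moz request header under the key HTTP_X_MOZ.
        if environ.get("HTTP_X_MOZ", "").strip().lower() == "prefetch":
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"Prefetch requests are not served.\n"]
        # Any other request is passed through to the wrapped application.
        return app(environ, start_response)
    return middleware


def hello_app(environ, start_response):
    """Hypothetical stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


application = block_prefetch(hello_app)
```

The prefetching browser simply gets nothing useful back, and the page loads normally if the searcher really does click the result.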
What else do you need to know about results prefetching?
If you run Google Analytics or another JavaScript-based analytics package, you won’t see these prefetched pages in your analytics. That’s because only the HTML is prefetched, not the images, JavaScript, etc. referenced by that HTML, which means that the Analytics JavaScript is never even fetched, let alone executed. You need to look at raw log files to see prefetched pages.
Google only issues the prefetch code when they are very confident that searchers will click on the #1 result (as in their example, a search for stanford). Most times, particularly for more “normal” sites (i.e. not Stanford), Google won’t issue the code. So you may never see this on your own site.
However, it’s worth being aware of this issue because if you do see a prefetch in your raw logs you’ll want to know why; and because, depending on how you calculate conversions, the fact that a page is prefetched but never viewed by a searcher may significantly affect your conversion tracking and monetisation on that page. I’m surprised that Google removed this info from their Webmaster help.
8
a robots.txt equivalent to rel=canonical
0 Comments | Posted by Alan Perkins in Crawling and Indexing, General, Search Engines, robots.txt
In my last post I looked at the rel=canonical tag and finished by promising to look at some of the limitations of rel=canonical and consider some alternatives.
Many of the alternatives have existed for some time – the use of redirects and cookies, for example. However, the introduction of a rel=canonical tag was an opportunity for search engines to also introduce other, more efficient, standards. These are the alternatives I would like to consider – alternatives that don’t exist yet, which the search engines could have introduced this time around and may introduce in future.
I see the rel=canonical tag as analogous to the meta robots tag, and therefore suffering from many of the same limitations:
- The rel=canonical tag is located in a HTML file, and that HTML therefore needs to be fetched and parsed in order for the tag to be seen and acted upon. Therefore, the tag does not save bandwidth or CPU for the Web site or search engine.
- The rel=canonical tag is located in a HTML file and gives instructions about that file. Therefore, it cannot be used to solve canonical issues for non-HTML files such as images, PDF files or Flash movies.
- The rel=canonical tag acts at a micro-level rather than a macro-level. Therefore it is difficult to verify that a site-wide policy has been correctly implemented using rel=canonical: every possible file has to be inspected. Also, code changes have to be made in order to write the rel=canonical tag. This may slow its implementation.
Where the above issues apply to rel=canonical, and similar issues apply to the meta robots tag, it struck me that an opportunity has been missed to also solve canonical issues through the robots.txt file. Any fix applied through robots.txt would not suffer from the above problems.
Extensions to robots.txt could be made in a number of ways. For example, a mod_rewrite-type syntax could be introduced. However, I’m not sure anything so advanced is needed. Most canonical issues arise from three things:
- the use of query parameters in dynamic URLs.
- www versus non-www versions of a site (and other subdomains).
- inconsistent use of default index page URLs.
Some simple robots.txt fields to control these issues would fix most problems without the pain and errors that a mod_rewrite implementation would create.
Query Parameters
Google Analytics and Yahoo Site Explorer are two examples of tools that allow simple manipulation of URL query parameters. Yahoo’s Dynamic URL Help lists some of the crawling, indexing and ranking benefits of this approach.
Yahoo Site Explorer allows you to remove a query parameter or set a query parameter to a default value within a URL. Using this, a URL such as
- http://www.example.com/page.php?refby=affiliate&sid=abc123
could be crawled and indexed as
- http://www.example.com/page.php?refby=yhoo_srch
The session id has been dropped and the referrer has been overwritten as yhoo_srch, meaning all traffic sent by Yahoo Search could be attributed to Yahoo Search rather than the affiliate. This functionality could be implemented in robots.txt using a new syntax something like the following:
User-Agent: Slurp
Disallow:
QueryParam: -sid
QueryParam: refby=yhoo_srch
meaning that the sid query parameter is to be dropped (as it is preceded by ‘-’) and the refby query parameter is to be overwritten with a default value (as a default value is provided). The same effect could be achieved with a single line:
User-Agent: Slurp
Disallow:
QueryParam: -sid, refby=yhoo_srch
One problem with both Google Analytics and Yahoo Site Explorer is that you must list the query parameters you wish to drop from URLs – not the ones you wish to keep. Because third parties can link to your site, you’re not in control of the links they create and the query parameters they use. Therefore, canonical issues can only truly be solved by specifying the query parameters you wish to keep, rather than those you wish to drop. To solve this, wildcards could specify the default action to be applied to all non-listed query parameters. Therefore I propose the following syntax:
QueryParam: retainParam[=defaultValue]
QueryParam: -dropParam
QueryParam: [-]*
where…
- retainParam[=defaultValue]: specifies a query parameter you definitely want to keep, and an optional default value you want it set to
- -dropParam: specifies a query parameter you definitely want to drop
- *: means keep all query parameters not specified (default)
- -*: means drop all query parameters not specified
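To make the proposal concrete, here is a rough Python sketch of how a crawler might apply these rules to a URL. Remember that QueryParam is only my suggestion and no search engine supports it. The function names are invented for this example, and I have assumed that a default value overwrites whatever value the URL carries, as in the Yahoo example above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def parse_query_param_rules(values):
    """Parse the values of the proposed (hypothetical) QueryParam fields."""
    retain = {}          # name -> default value, or None to keep the URL's own value
    drop = set()         # names that must always be removed
    drop_others = False  # '*' (keep unlisted parameters) is the proposed default
    for value in values:
        for token in value.split(","):
            token = token.strip()
            if not token:
                continue
            if token == "*":
                drop_others = False
            elif token == "-*":
                drop_others = True
            elif token.startswith("-"):
                drop.add(token[1:])
            else:
                name, sep, default = token.partition("=")
                retain[name] = default if sep else None
    return retain, drop, drop_others


def apply_query_param_rules(url, values):
    """Return the URL as it would be crawled and indexed under the given rules."""
    retain, drop, drop_others = parse_query_param_rules(values)
    parts = urlsplit(url)
    kept = []
    for name, value in parse_qsl(parts.query, keep_blank_values=True):
        if name in drop:
            continue
        if name in retain:
            default = retain[name]
            kept.append((name, value if default is None else default))
        elif not drop_others:
            kept.append((name, value))
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(apply_query_param_rules(
    "http://www.example.com/page.php?refby=affiliate&sid=abc123",
    ["-sid, refby=yhoo_srch"],
))
```

This sketch does not add a retained parameter that is missing from the URL. Whether a default value should be injected in that case is a detail the proposal would still need to settle.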
Default domain and Index Pages
Two further, much simpler additions to robots.txt could clear up the majority of other canonical problems. These are Domain and IndexPage:
Domain: defaultDomain
IndexPage: defaultIndexPage
defaultDomain specifies the default domain for this robots.txt file. For example, if the search engine retrieves http://www.example.com/robots.txt and finds …
Domain: http://example.com/
…it would know to index all URLs under the non-www domain. This would allow multiple parked domains to share the same content and robots.txt file without needing redirects or causing canonical issues, which is currently a common problem.
The IndexPage field specifies a default index page for the domain, i.e. a page for which the following two URLs are considered equivalent:
http://www.example.com/path/
http://www.example.com/path/defaultIndexPage
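As a rough sketch of how a crawler might use these two fields, the Python below rewrites a URL onto the default domain and collapses the default index page. Both fields are only proposals, and the function name is invented for this example. I have also assumed that the directory form (ending in “/”) is the one that gets indexed, although the proposal only says that the two forms are equivalent.

```python
from urllib.parse import urlsplit, urlunsplit


def apply_domain_and_index(url, default_domain=None, index_page=None):
    """Rewrite a URL using the proposed (hypothetical) Domain and IndexPage fields."""
    parts = urlsplit(url)
    scheme, netloc, path = parts.scheme, parts.netloc, parts.path
    if default_domain:
        # Domain: take the scheme and host from the declared default domain.
        default = urlsplit(default_domain)
        scheme, netloc = default.scheme, default.netloc
    if index_page and path.endswith("/" + index_page):
        # IndexPage: treat /path/defaultIndexPage as equivalent to /path/.
        path = path[: -len(index_page)]
    return urlunsplit((scheme, netloc, path, parts.query, parts.fragment))


print(apply_domain_and_index(
    "http://www.example.com/path/index.html",
    default_domain="http://example.com/",
    index_page="index.html",
))
```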
Conclusion
In this post I’ve proposed three new fields to add to robots.txt to provide an alternative to the rel=canonical tag, just as the current robots.txt fields are themselves alternatives to the meta robots tag, with their own advantages and disadvantages. The chief advantages I see of canonicalising through robots.txt are:
- Acting through robots.txt means that a resource does not have to be fetched and parsed in order for the canonicalisation instructions to be followed. Therefore, bandwidth and CPU are saved for both the Web site and search engine.
- Acting through robots.txt means that canonical issues can be solved for non-HTML files such as images, PDF files or Flash movies.
- Acting through robots.txt means large scale changes can be made very quickly and easily without the need for any code changes. It’s also much easier to review the changes that have been made.
The Domain, IndexPage and QueryParam fields would all be optional and independent of each other. It would be great if the search engines could introduce some or all of these ideas into robots.txt.
28
URL Canonicalisation and Normalisation
0 Comments | Posted by Alan Perkins in Crawling and Indexing, General, Search Engines, Technical Architecture
I’ve been meaning to write about the new rel=canonical tag, which was proposed by Google, Yahoo and Microsoft on February 12. I managed to squeeze some thoughts on it into my presentation and workshop at SES London, and I’ll be speaking more about it at SES New York next month, but before I blogged about it I really wanted to write more about URL Canonicalisation and Normalisation in general.
Canonicalisation or Canonicalization?
Normalisation or Normalization?
I’m British, so I say Canonicalisation and Normalisation. Your mileage may vary.
What is URL Canonicalisation?
We’re talking about search engines here, so let’s try a definition that applies generally, but leans towards search:
- URL Canonicalisation
- involves taking a set of different URLs that all serve or lead to the same or similar content, and applying rules to select one URL from that set under which that content should be indexed or presented.
I’ve hyperlinked the terms I think are important to more detail below, but before we go into them let’s try defining URL Normalisation.
- URL Normalisation
- involves taking a single URL and applying a normalisation algorithm to produce a standard form for that URL.
Others define normalisation and canonicalisation as all part of the same thing, but I like to think of them as separate processes. To my way of thinking:
- you can normalise a single URL but you can only canonicalise a set of URLs
- an un-normalised URL will serve the same content as a normalised URL, because it’s the same URL
- all indexed URLs are normalised; not all are canonicalised
- normalisation occurs before canonicalisation
Now let’s go back and look at those hyperlinked terms in more detail.
Set of different URLs
This is the key to canonicalisation and why it’s needed: the same content is being presented at a number of different URLs. By different URLs, I mean those URLs are really different to each other – they could potentially show different content but (in this case) they don’t.
Here is an example set of URLs:
- http://www.example.com/
- http://example.com/
- http://www.example.com/index.html
- http://example.com/default.asp
- http://www.example.com/?referrer=affiliateName
- http://www.example.com/?sessionid=123456
All serve or lead to the same or similar content
If each of the above URLs served the same, or essentially the same, content, it’s likely that they would be canonicalised to fewer URLs – possibly only one. If they each served completely different content, then it’s much less likely that this canonicalisation would take place. By “or lead to”, I mean that the URL may redirect (e.g. with a HTTP 301 or HTTP 302 redirect) to another URL.
Canonicalisation Rules
The rules for canonicalisation vary from engine to engine and time to time. Here are a few examples of when canonicalisation will take place …
- If www and non-www versions of the URL exist, then canonicalise
- If the same base URL is seen with different numbers of query parameters, then canonicalise
- If the filename component of the URL matches a known set of index pages (e.g. index.*, default.*, etc.) then canonicalise
- If the home page (“/”) redirects to another page, then canonicalise
… and here are some examples of how canonicalisation will take place:
- Choose the URL with the highest PageRank (or similar link-based or other off-page criteria)
- Obey rel=canonical webmaster hint
- Choose the simplest URL (e.g. the shortest URL, or the one with fewest query parameters)
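As an illustration of the “simplest URL” rule only, here is a small Python sketch that picks one URL from a set already known to serve the same content. The ordering (fewest query parameters first, then shortest URL) is my own assumption. Real engines combine many signals and do not publish their rules.

```python
from urllib.parse import parse_qsl, urlsplit


def simplest_url(urls):
    """Pick the URL with the fewest query parameters, then the shortest one."""
    def simplicity(url):
        query = urlsplit(url).query
        param_count = len(parse_qsl(query, keep_blank_values=True))
        # The URL itself is the final tie-breaker, so the result is deterministic.
        return (param_count, len(url), url)
    return min(urls, key=simplicity)


canonical_set = [
    "http://www.example.com/",
    "http://example.com/",
    "http://www.example.com/index.html",
    "http://example.com/default.asp",
    "http://www.example.com/?referrer=affiliateName",
    "http://www.example.com/?sessionid=123456",
]
print(simplest_url(canonical_set))  # http://example.com/
```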
Indexed or presented
Sometimes only one URL from a set will be indexed, which means that it will always be the candidate URL to be presented in a set of search results.
At other times multiple URLs may be indexed, even though they are known to be part of the same canonical set. One of these URLs will be selected to appear in a given set of search results. The URL that is selected may vary (for example, by query or by searcher location) – but only one will ever appear on a given search results page.
Single URL
Normalisation operates on a single URL rather than on a set of URLs. That single URL may need to be supplemented with other data in order for normalisation to take place. For example, un-normalised URLs may be relative or absolute. A normalised URL will always be a fully-qualified absolute URL so, along with a relative URL, the containing URL or
Normalisation algorithm to produce a standard form
Like canonicalisation rules, the normalisation algorithm may vary from engine to engine and time to time. However, it’s much less likely to vary. Here is an example of the kind of things that are done during normalisation:
- convert a relative URL to an absolute URL
- convert the scheme and the host name components of the URL to lower case
- remove the port component if it matches the default port
- escape characters that should be represented as octets (or a +)
- unescape octets that are better represented as plain characters
- convert all escape sequences to upper case
Here are some examples of each operation:
- In http://www.silverdisc.co.uk/ , a link to “/contact.html” would be normalised to http://www.silverdisc.co.uk/contact.html
- HTTP://WWW.SILVERDISC.CO.UK/contact.html would be normalised to http://www.silverdisc.co.uk/contact.html
- http://www.silverdisc.co.uk:80/contact.html would be normalised to http://www.silverdisc.co.uk/contact.html, because 80 is the default port for HTTP connections.
- http://www.silverdisc.co.uk/contact.html?name=Alan Perkins would be normalised to http://www.silverdisc.co.uk/contact.html?name=Alan+Perkins or http://www.silverdisc.co.uk/contact.html?name=Alan%20Perkins, because a space is not a valid character in a URL.
- http://www.silverdisc.co.uk/cont%61ct.html would be normalised to http://www.silverdisc.co.uk/contact.html, because %61 is better represented as the character “a” in a URL.
- A %2a in a URL would be converted to %2A for consistency
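The operations above can be sketched in a few lines of Python using the standard urllib.parse module. This is only an illustration of the six steps listed, not any search engine’s actual algorithm. The function names and the choice of %20 rather than + for spaces are my own, and the sketch ignores user names, passwords and IPv6 addresses in the host.

```python
import re
from urllib.parse import urljoin, urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}
# Characters that never need escaping in a URL (the RFC 3986 "unreserved" set).
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)


def _fix_escapes(text):
    """Unescape octets for unreserved characters, upper-case the rest, escape spaces."""
    def replace(match):
        char = chr(int(match.group(1), 16))
        return char if char in UNRESERVED else "%" + match.group(1).upper()
    text = re.sub(r"%([0-9A-Fa-f]{2})", replace, text)
    return text.replace(" ", "%20")


def normalise(url, base=None):
    """Return a standard form of url, resolving it against base if one is given."""
    if base is not None:
        url = urljoin(base, url)          # relative URL -> absolute URL
    parts = urlsplit(url)
    scheme = parts.scheme.lower()         # lower-case the scheme
    netloc = (parts.hostname or "").lower()  # lower-case the host name
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        netloc = f"{netloc}:{parts.port}"  # keep only non-default ports
    return urlunsplit(
        (scheme, netloc, _fix_escapes(parts.path),
         _fix_escapes(parts.query), parts.fragment)
    )


print(normalise("/contact.html", base="http://www.silverdisc.co.uk/"))
print(normalise("HTTP://WWW.SILVERDISC.CO.UK:80/cont%61ct.html"))
```

Both calls print http://www.silverdisc.co.uk/contact.html.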
Summary
That completes this introduction to URL canonicalisation and normalisation. In the next post, I’ll look at rel=canonical.
14
Google Adwords trademark policies: what’s the billion dollar question?
0 Comments | Posted by Alan Perkins in Google, PPC
The Google Adwords trademark policy is aimed at balancing the interests of trademark holders, advertisers and internet users. Does it always do this, or can it sometimes provide a method for trademark holders to restrict competition and potentially cause harm to advertisers, internet users and Google itself? That’s the billion dollar question.
Google Adwords is a paid search marketing program offered by Google that allows millions of organisations around the world to advertise their products and services in Google’s search results. That makes Adwords a big deal. Adwords accounts for the lion’s share of Google’s revenues, which totalled $16.5bn in 2007 alone.
Yahoo! and Microsoft offer similar programs to Adwords. However, Google is the market leader, with estimates of its paid search market share ranging from 58% upwards. Google clearly holds a dominant position within the paid search marketplace, so its policy decisions matter.
Google’s dominance has created a significant demand within Adwords from third party advertisers who would like to market products and services against the results of popular trademarks which they do not own. As a result, there have been several instances where Google has faced legal action by trademark holders trying to restrict third parties bidding on those search terms relating to their trademarks. Trademark holders in the US, such as Geico and American Airlines, have previously filed suit. In Europe, Google has been sued by the likes of Louis Vuitton in France.
These legal actions led to the introduction of the Google Adwords Trademark Policy. There are in fact two policies, one or other of which is in force in any location around the world. These policies allow the trademark holder to exert significant influence over the use of their marks within the Adwords program.
Whilst it may seem a reasonable response on the part of Google to seek to recognise and protect the rights of trademark owners, especially in response to suggestions Google may be facilitating passing off and/or infringement of registered trademarks, the problem is that the Google Adwords Trademark Policy may in fact give far more power to trademark holders than they need to protect their goodwill and prevent passing off. Google’s trademark policies may fail to recognise the legitimate right of third parties to use registered trademarks which they do not own to legally sell products and services which they have a right to sell, and may thereby help trademark holders to restrict free trade in goods and services.
For example, in the motor market, many private individuals, non-franchise and franchise dealers have a legitimate right to use manufacturer and model trademarks in order to describe a car or range of cars they wish to advertise.
An example would be if you wished to sell your Peugeot 308. Do you really want to have to call it a mid size French 1.9 litre diesel hatchback? Somehow the sale is much more likely to happen if you just call it by its make and model rather than a bland description.
Clearly in this example there is no passing off and no loss of goodwill. It is completely understood by all parties that the advertiser of the car is not necessarily the trademark holder. Yet Google’s trademark policies mean that advertisers can be prevented from using trademarked terms even so. Has this policy really balanced the interests of trademark holders, advertisers and internet users, as Google purports to do? Commenting, Kevin McGuinness of London-based commercial law specialists Sabretooth Law stated:
In restricting the use of trademarks Google may have diminished the ability of non-owners of trademarks to legitimately use such trademarks in the course of carrying on their trade. Given the size of the market in which Google operates and the importance of the advertising market to automobile resale sector this is likely to be an area where both English and European competition authorities may take an interest in arrangements which potentially restrict competition to the detriment of the general public.
Antitrust or anti-competition issues have been one area where both the UK and European competition authorities have consistently demonstrated a keen interest in protecting the European consumer and Google’s dominant position in the paid search marketing sector would suggest it needs to ensure its policies are legal, not only in the US but also in Europe.
In the UK, an organisation can be fined 10% of its worldwide annual revenues for engaging in anti-competitive behaviour. As noted earlier, these amounted to $16.5bn for Google in 2007 alone, so 10% would be $1.65bn. That is a large number!
Is it Google’s responsibility, though, or is it the responsibility of the respective trademark holders? Or is it both?
It seems harsh to hold Google solely responsible, when Google has been simply trying to respect trademark holders’ legitimate rights; especially in light of the fact that Google has been sued by several trademark holders and to some extent its trademark policy is a result of that. In addition, by restricting competition on some trademarked terms, Google may have impacted its own revenues. Kevin McGuinness again:
As Google is the participant in the on-line market place, which is itself restricting the availability for use of other persons’ trademarks, it could be that Google, not the trademark holders, may be found to be at fault. This hardly seems fair given Google’s long standing commitment to ethical good business practice.
Clearly Google does not exercise its trademark policy in isolation. Only when a trademark holder files a trademark complaint in the appropriate jurisdiction does Google exercise its policy. This is why you can see Google Adwords for lots of trademarked terms, but not all.
Evidence of how trademark holders are working with search engines came in a recent interview with New Media Age magazine (subscription required) when Steve Bowler, Marketing Manager of Land Rover, stated:
One of the areas that wasn’t looked at properly before was search. Previously it was recognised as being somewhat important yet ancillary to TV, press and outdoor. Now, though, we take search very seriously, working with the search engines on how to deal with issues like trademarking.
As a result, Kevin McGuinness states:
Competition authorities could conclude that Google and trademark holders are each using Google Adwords to prevent competition.
Not only Google but each individual trademark holder could be investigated and potentially fined up to 10% of global revenues. Trademark holders who have restricted their trademarks include Alfa Romeo, Peugeot and Land Rover.
Do the same issues also affect Yahoo! and Microsoft? No. Both of these search companies have much more targeted trademark policies. For example, Yahoo!’s policy is:
As applied to nominative uses of another’s trademark, Yahoo! Search Marketing requires advertisers to meet one of the following two conditions: … Reseller [... or ...] Information Site, Not Competitive
And Microsoft’s policy, though targeted, is elegantly simple:
Affiliates and resellers may bid on trademarked terms relevant to the goods, services, or sites that they promote.
Why does Google not have such a simple policy? Perhaps because, though simply stated, the Yahoo! and Microsoft policies require more editorial intervention than the Google policy, or perhaps because Google’s current policy arises from being sued by trademark holders, rather than being pursued by competition authorities. Google’s official response is posted on their Inside Adwords Blog:
We will not allow the use of a trademark term according to the parameters of the trademark complaint filed by the trademark owner. Therefore, unless the trademark owner specifically grants you permission to use their trademarked term by contacting our Trademark team, we are not able to approve the use of the trademark in your AdWords ads.
There is no explanation there, nor has one ever been offered on the many occasions Google has been given the chance to comment on this issue, but one can only assume that Google believes it is on solid legal ground in operating this policy. The question is: are they correct?
Though a vast improvement on Google’s trademark policy, Yahoo!’s and Microsoft’s policies both restrict comparative advertising (advertising which “explicitly or by implication, identifies a competitor or goods or services offered by a competitor”). A recent European court case showed that such restrictions may be unlawful. However that is a different, and far less contentious, issue than the anti-competition issues raised by the Google Adwords Trademark Policy alone.
So, the question remains. Have Google and/or its advertisers been in contravention of UK or EU competition laws in exercising its trademark policy to date? Microsoft’s European court experience should provide ample evidence that American software giants need to be very careful within the European Union. Once the EU competition authorities decide to bite, they rarely let go of their prey quickly. Given the enmity between the two, will Microsoft be at the head of the line to point out the ongoing competition issues in Google’s trademark policies?
Google has, since its inception, been a beacon of best business practice, but it may be on the wrong side of this legal issue by trying to do the right thing by trademark holders who continue to abuse its policies in order to restrict fair competition. With fines of up to 10% of global turnover possible, it’s a high stakes issue.
OK, sorry for the slightly misleading headline (although if you read on you’ll find it’s not that misleading). No apologies, though, for giving my opinion on what is now old news, which is that Google has dropped Best Practice Funding for agencies from 2009 onwards. Don’t ever expect this blog to be first with the news … there are others in the industry who are devoted to that. What you can expect here is considered, truthful opinion and, hopefully, an insight that you won’t find anywhere else.
There’s plenty of comment around about the fact that BPF was not a subsidy, was not a commission and was not, in fact, related to any individual advertiser but rather to the net billings of the whole agency. Personally, I think it’s great the playing field is levelled, but I’m still not looking forward to having to renegotiate rates with clients. Any agency that doesn’t have to renegotiate was either not receiving BPF or was charging too much in the first place, and SilverDisc does not fit either of those two categories, I’m happy to say.
What’s missing is comment on what BPF actually is, and what its withdrawal therefore signifies.
Probably the best document that describes what Best Practice Funding is, if you’re prepared to read between the lines, is the 2007 – Best Practice Funding Terms And Conditions. This lists several conditions that an agency must meet in order to fully qualify for BPF. Those conditions include:
- the fact that the agency, rather than the agency’s customer, must communicate with Google
- the fact that the agency is responsible for Google being paid its invoices on time
I can’t help feeling that Google is massively undervaluing the role of agencies in providing these services. Their support role, in both account management and invoicing, will grow enormously in 2009. I hope that Google uses the time between now and then to grow its infrastructure accordingly.
Another requirement on agencies to qualify for BPF is that they employ at least two GAP-qualified staff. This is where my slightly misleading headline actually has a ring of truth. The GAP exam has been the best tool for building and maintaining an understanding of Adwords. I’ve passed it myself and, before Christmas, I’m due to renew my qualification. All my PPC management staff and PPC programming staff (we write PPC API apps to manage our clients’ spends) have passed the GAP exam too and, again, are due to renew before Christmas.
I always thought that Google’s encouragement of agency staff being GAP-qualified was of great benefit to Google, the agencies, and the industry as a whole. In dropping BPF, I think Google are sending a poor message – in, literally, stopping funding best practices, they are stopping supporting best practices.
5
Can’t Google Write Any Decent Analytics Documentation Themselves?
0 Comments | Posted by Alan Perkins in Analytics & Log Files, Google
Rarely does Google give such a public ringing endorsement for a third party as this, on the official Google Analytics blog:
Can I look forward to a link drop to SilverDisc here or here?
14
How much PageRank does a page that is not in the index have?
0 Comments | Posted by Alan Perkins in Crawling and Indexing, Google, Links, Search Engines
If a page has a NOINDEX tag on it, how much PageRank does it have? The intuitive answer would be “None”. How can a page that is not indexed have PageRank? Wouldn’t it be treated like a dangling link and disregarded during PageRank calculations?
Apparently not. Matt Cutts states on seomoz:
Does a link from a page with meta robots=”noindex, follow” carry less weight? no weight?
For Google, I believe such links would carry the same weight as normal links on regular pages.
Hmmm. Does he mean that the unindexed page actually has a PageRank? Or does he mean that the zero PageRank that the unindexed page has would be divided out among the links on the page, giving nothing to each? I wonder …
One thing’s for sure … if “NOINDEX, FOLLOW” works as implied, it’s a great way to inject spammy content and links.
Thanks to Dan Thies for drawing my attention to the latest “mayhem” surrounding Google, rel=nofollow and the FTC. This is an area close to my heart, as my article from 2005, Search Marketing & The Law, made clear:
It would be foolish to expect to be operating in a multi-billion dollar global marketing industry and not expect to comply with marketing laws and regulations in the countries in which you are marketing.
The current confusion stems from Matt Cutts’ blog post on paid links back in April, which called for both human readable and machine readable disclosure of paid links – machine readable first:
If you want to sell a link, you should at least provide machine-readable disclosure for paid links by making your link in a way that doesn’t affect search engines. There’s a ton of ways to do that. For example, you could make a paid link go through a redirect where the redirect url is robot’ed out using robots.txt. You could also use the rel=nofollow attribute.
The problem here is that there is no machine-readable disclosure for paid links. Matt suggests that there are a “ton” of ways, but none of these ways mean “this link is paid”, let alone the means, method and motive for payment. This is where the confusion starts.
Matt then goes on to discuss human-readable disclosure:
The other best practice I’d advise is to provide human readable disclosure that a link/review/article is paid.
Here I fully agree with Matt – it’s important not to mislead your visitors. No confusion here.
The real confusion seems to come from the next thing Matt says:
Google’s quality guidelines are more concerned with the machine-readable aspect of disclosing paid links/posts, but the Federal Trade Commission has said that human-readable disclosure is important too:
“The petition to us did raise a question about compliance with the FTC act,” said Mary K. Engle, FTC associate director for advertising practices. “We wanted to make clear . . . if you’re being paid, you should disclose that.”
To make sure that you’re in good shape, go with both human-readable disclosure and machine-readable disclosure, using any of the methods I mentioned above.
Some people have inferred that Matt is saying that paid links that aren’t labelled in a machine-readable way are contravening the FTC guidelines. He isn’t saying this at all. Read carefully. The FTC is concerned with human-readable disclosure, not machine-readable disclosure. There is no machine-readable disclosure for paid links.
It is possible to place deceptive advertising in search results using various means. But failing to label a link as paid in a machine-readable way is not one of them. There is no machine-readable disclosure for paid links.
