The Silver Spike | The Official SilverDisc Blog

Archive for June 2007

15 June 2007

QDF – WTF…?

Search engine freshness has always interested me … certainly since the very early days of my search marketing adventures, in the mid-1990s. It interests me enough that it lies at the core of one of my patents, “Process for maintaining ongoing registration for pages on a given search engine”.

I would define search engine freshness as follows:

freshness
An indication of how recently a search engine has crawled and indexed a page, a part of its index or its index as a whole

I think most search engine engineers would define it in a similar way. For example, Matt Cutts, in his blog post “Measuring Freshness” from September 2005, stated:

The authors tracked Google, Yahoo, and MSN over 42 days using 38 German webpages that were updated daily and that included a datestamp somewhere on the page. They measured freshness by looking at each search engine’s cached page to see how up-to-date the page was. If you measure success by having a version of a page within 0 or 1 days, Google succeeded a little under 83% of the time, MSN succeeded 48% of the time, and Yahoo succeeded about 42% of the time.
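To make that measurement concrete, here is a minimal Python sketch of the "within 0 or 1 days" success rate described in the quote. The function names and the sample observations are my own illustrative assumptions, not data or code from the study.

```python
from datetime import date

def cache_lag_days(checked_on: date, cached_datestamp: date) -> int:
    """Days between the day we looked and the datestamp shown in the cached copy."""
    return (checked_on - cached_datestamp).days

def success_rate(observations, max_lag_days=1):
    """Fraction of observations whose cached copy is at most max_lag_days old."""
    if not observations:
        return 0.0
    hits = sum(
        1
        for checked_on, cached_datestamp in observations
        if 0 <= cache_lag_days(checked_on, cached_datestamp) <= max_lag_days
    )
    return hits / len(observations)

# Made-up observations: (day the cache was checked, datestamp found in the cache).
observations = [
    (date(2005, 9, 1), date(2005, 9, 1)),   # lag 0 days: success
    (date(2005, 9, 2), date(2005, 9, 1)),   # lag 1 day: success
    (date(2005, 9, 3), date(2005, 8, 30)),  # lag 4 days: miss
    (date(2005, 9, 4), date(2005, 9, 2)),   # lag 2 days: miss
]
rate = success_rate(observations)
```

With these invented observations, two of the four checks fall within the one-day window. The point is only that this measure looks at how old the engine's copy is, not at how new the page's content is.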

So I’m surprised to find the following quote in the recent, much-publicised New York Times article “Google Keeps Tweaking Its Search Engine”:

So [Mr. Singhal] monitors complaints on his white board, prioritizing them if they keep coming back. For much of the second half of last year, one of the recurring items was “freshness.”

Freshness, which describes how many recently created or changed pages are included in a search result, is at the center of a constant debate in search: Is it better to provide new information or to display pages that have stood the test of time and are more likely to be of higher quality? Until now, Google has preferred pages old enough to attract others to link to them.

But last year, Mr. Singhal started to worry that Google’s balance was off. When the company introduced its new stock quotation service, a search for “Google Finance” couldn’t find it. After monitoring similar problems, he assembled a team of three engineers to figure out what to do about them.

Earlier this spring, he brought his squad’s findings to Mr. Manber’s weekly gathering of top search-quality engineers who review major projects. At the meeting, a dozen people sat around a large table, another dozen sprawled on red couches, and two more beamed in from New York via video conference, their images projected on a large screen. Most were men, and many were tapping away on laptops. One of the New Yorkers munched on cake.

Mr. Singhal introduced the freshness problem, explaining that simply changing formulas to display more new pages results in lower-quality searches much of the time. He then unveiled his team’s solution: a mathematical model that tries to determine when users want new information and when they don’t. (And yes, like all Google initiatives, it had a name: QDF, for “query deserves freshness.”)

“Query deserves freshness”? I thought that all queries deserved freshness! A stale index produces poor-quality SERPs that can list lots of pages that no longer exist or that have changed since they were indexed. A fresh index largely solves this problem.

This appears to be a new, different use of the word freshness in relation to search engines, referring not to the last indexed date of a page, but to the last modified date of a page – something very different!
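The difference between the two meanings is easy to show with a small Python sketch. The `PageRecord` type, the helper names, and the example URLs and dates are all hypothetical, chosen only to illustrate the distinction; they do not describe how any search engine stores its data.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class PageRecord:
    url: str
    last_modified: datetime  # when the author last changed the page
    last_indexed: datetime   # when the engine last crawled and indexed it

def index_age_days(page: PageRecord, now: datetime) -> int:
    """Freshness in my sense: how long ago the engine indexed the page."""
    return (now - page.last_indexed).days

def content_age_days(page: PageRecord, now: datetime) -> int:
    """Freshness in the NYT sense ("recency"): how long ago the page changed."""
    return (now - page.last_modified).days

def is_stale(page: PageRecord) -> bool:
    """The indexed copy is out of date if the page changed after it was indexed."""
    return page.last_modified > page.last_indexed

now = datetime(2007, 6, 15)

# An old page that was re-crawled yesterday: fresh index, but not recent content.
classic = PageRecord("http://example.com/classic",
                     last_modified=datetime(2001, 1, 1),
                     last_indexed=datetime(2007, 6, 14))

# A page changed yesterday but last crawled two weeks ago: recent content, stale index.
changed = PageRecord("http://example.com/changed",
                     last_modified=datetime(2007, 6, 14),
                     last_indexed=datetime(2007, 6, 1))
```

Here `classic` has an index age of one day even though its content is years old, while `changed` has recent content but an index copy that is two weeks out of date. The two numbers move independently, which is why using one word for both is confusing.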

Matt Cutts referred to this NYT article in “Five things you didn’t know about Google’s search”, without mentioning this shift in the meaning of the word “freshness”, which I find odd, especially as the article says elsewhere that Matt and Mr. Singhal share an office (with two other people). Hmmm … do they talk, do you think? :D

I think I’d have called it “Query Deserves Recency”.


8 June 2007

Google Defines Cloaking – Again!

I see that Google have added to their Quality Guidelines, including a new, helpful(?) definition of cloaking:

Cloaking refers to the practice of presenting different content or URLs to users and search engines. Serving up different results based on user agent may cause your site to be perceived as deceptive and removed from the Google index.

Some examples of cloaking include:

  • Serving a page of HTML text to search engines, while showing a page of images or Flash to users.
  • Serving different content to search engines than to users.

That’s fairly clear then. :)

My own definition of cloaking is:

Cloaking
The identification of a search engine spider by some feature of its IP address or HTTP request, and the resultant delivery of a response to that spider designed to game the search engine’s ranking algorithm.

My rule of thumb is that you should not need to know that a search engine is making the request in order to deliver a response to that request. The obvious exception to this rule of thumb is Paid Inclusion. Paid Inclusion isn’t cloaking. ;)
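That rule of thumb can be turned into a rough self-check. The Python sketch below asks a request handler for the same path twice, once with a browser-style user agent and once with a spider-style one, and reports whether the responses differ. The handlers, the `varies_by_user_agent` helper and the user agent strings are illustrative assumptions. The check covers only the User-Agent header, not IP-based detection, and a difference does not by itself prove an attempt to game rankings.

```python
import hashlib

# Illustrative user agent strings; real ones vary by browser and spider version.
BROWSER_UA = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB) Firefox/2.0"
SPIDER_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

def fingerprint(body: str) -> str:
    """Hash a response body so two responses can be compared cheaply."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def varies_by_user_agent(handler, path: str) -> bool:
    """True if the handler returns different content to a browser and a spider."""
    browser_body = handler(path, BROWSER_UA)
    spider_body = handler(path, SPIDER_UA)
    return fingerprint(browser_body) != fingerprint(spider_body)

def honest_handler(path: str, user_agent: str) -> str:
    """Builds the response without looking at who is asking."""
    return f"<h1>Content for {path}</h1>"

def cloaking_handler(path: str, user_agent: str) -> str:
    """Sends keyword-stuffed text to the spider and a Flash page to everyone else."""
    if "Googlebot" in user_agent:
        return f"<p>cheap widgets best widgets buy widgets {path}</p>"
    return "<object data='intro.swf'></object>"

honest_varies = varies_by_user_agent(honest_handler, "/widgets")
cloaked_varies = varies_by_user_agent(cloaking_handler, "/widgets")
```

The honest handler never reads the user agent, so it cannot fail the check; the cloaking handler fails it because it needs to know that a search engine is making the request before it can decide what to send.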

