SEO Blog - Internet marketing news and views  

Google Talks Tracking

Written by David Harry   
Saturday, 22 December 2007 19:05

Conversions, Cookies and Bad Actors too!

A while ago I came across some pointed questions that were posed to Google by US Congressman Joe Barton, mostly in regards to privacy concerns. What was of particular interest to this wandering web Gypsy was that much of it had to do with user data collection, which is a favourite of mine in the form of User Performance Metrics. It seems we have some answers...yipeeeee….

Unfortunately, the detailed response offers about as little as one might expect in the way of real actionable goodies that we can all get excited about. What I do take away from this is the reality that such metrics are indeed being collected and used to enhance both personalized search and regular index search services. The responses are also interesting for what they don’t say.

Without further ado… here’s what I took away from it.

 

Data Retention Policies

When discussing the data retention policies for the Search aspects they stated;

“ Any user can visit the Google web site from any computer and use our search engine without providing Personally Identifying Information (PII). For these services, Google retains very little data, typically just standard server log information which includes: the uniform resource locator (URL); the Internet Protocol (IP) address associated with the computer or proxy server from which the request originated; the time and date of the request; the operating system that runs on the computer; and the type of browser that runs on the computer. We also may collect a unique cookie ID generated for the computer from which the request originated.”

And

“We recently announced our plans to further anonymize this “unauthenticated” data after 18 months. Specifically, we will obfuscate both the IP address and the cookie, which, in some cases, can be used in association with other information to identify an individual.”

 

Then they move onto services outside of the main search engine;

“There are other services, such as Gmail, Google Web History and Google Calendar, which are private and/or customized for the individual user. These customized services require registration (typically just a user name, alternate email address and country) in order to secure the account for that user. As a general matter, we try to retain this data for as long as the user wants it retained. Indeed, we build in features that allow users to control both the collection and deletion of their personal information.”

And who gets to have access to that data?

“We restrict access to personal information to Google employees, contractors and agents who need to know that information in order to operate, develop or improve our services.”

Ok, so what we have at this point is pretty straightforward, and we know they keep anonymous data for 18 months and data from service accounts for as long as the user wants it retained. We also know that Google employees may not be the only ones looking at said data, since contractors and agents can get access too.

 

Data Collection and Usage

 

Cookies;

They go through the basics of what cookies are and then describe the Google ‘Pref’ cookie, which stores information such as “the fact that a user wants search results in English, no more than 10 results on a given page, or a SafeSearch setting to filter out explicit sexual content”.

They discuss the cookie’s lifetime, which is currently set to expire in 2038 (the point at which 32-bit Unix time runs out on the vast majority of computers), and how ‘in the coming months’ they plan to change that to auto-expire after 2 years to appease privacy concerns. In addition, they intend to include these cookies in the 18 Month Anonymizer plan by deleting Pref cookie unique ID numbers.
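For the curious, that 2038 date is simply where a signed 32-bit count of seconds since 1970 tops out. The little Python sketch below is my own illustration (nothing like it appears in Google's response, and the function name is made up) and works out the exact moment:

```python
from datetime import datetime, timedelta, timezone

# Largest value a signed 32-bit Unix timestamp can hold (seconds since 1970-01-01 UTC).
MAX_INT32 = 2**31 - 1

def last_32bit_moment() -> datetime:
    """Return the last instant representable in signed 32-bit Unix time."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(seconds=MAX_INT32)

print(last_32bit_moment().isoformat())  # 2038-01-19T03:14:07+00:00
```

In other words, a cookie set to expire in 2038 is effectively a cookie set to "as far out as the clock can count".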

They also discuss general ‘authentication cookies’ that are related to the various Google services. These are straightforward stuff and are deleted when the user logs out of said service.

 

AdWords and AdSense; there is also a ‘conversion measurement cookie’;

“In our business of contextual advertising, which we describe below in detail, we use cookies for very limited purposes for AdWords and AdSense. For example, we serve a conversion measurement cookie on a per advertiser basis for purposes of measuring conversion. That is, when a user clicks on an ad provided by Google and the user is taken to the advertiser’s website, we may serve a cookie to measure whether the user completes a purchase.” and “The conversion measurement cookie expires 30 days from when it is set.”

 

Data Usage

When asked, “Please explain how Google uses the information or data described in Question 1(a) – (l), including, but not limited to, the following uses: perfecting Google’s search algorithm; operating Google’s advertising programs such as AdWords and AdSense; and research or analysis of user activity on www.google.com”, they answered;

“First, we use this data to improve our search algorithms for the benefit of our users. For example, we receive real-time feedback on the quality of our search results when users click on results. If they click on the first result then we know that we have likely provided them with what they are looking for. If, however, they click far down in the results then our search rankings could be improved.”
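To make the click feedback idea a bit more concrete, here is a minimal Python sketch. It is purely my own guess at the simplest possible version of such a signal: the log format, the sample entries and the function name are all hypothetical, and Google says nothing about how they actually compute this.

```python
from collections import defaultdict
from statistics import mean

def mean_click_rank(click_log):
    """Average clicked position per query; 1 means users click the top result."""
    ranks = defaultdict(list)
    for query, rank in click_log:
        ranks[query].append(rank)
    return {query: mean(positions) for query, positions in ranks.items()}

# Hypothetical log entries: (query, position of the result the user clicked).
sample_log = [("car", 1), ("car", 1), ("car", 2), ("seo blog", 7), ("seo blog", 9)]
print(mean_click_rank(sample_log))
```

A query whose average clicked position sits near 1 looks well served; one where users routinely dig down to position 7 or 9 is a candidate for ranking changes.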

Well there we have it…. The data is used to ‘improve our search algorithms’. While it is certainly stating the obvious as far as I am concerned, it is nice to see it stated as such. Once again we have no way of telling if this is purely limited to logged-in users (i.e. personalized search), but it doesn’t take an SEO Rocket Scientist to start connecting some dots here.

AND;

“…. we retain this data to defend our systems from malicious access and exploitation attempts, as well as click fraud and web spam. For example, with historical analysis we can identify and protect against patterns of malicious behaviour that are hard to identify in real-time. Our retention policies also protect our users from threats like spam and phishing.”

They mention a few times that part of the reasoning for collecting and retaining data (18 month plan) is related to combating, among other things, Web Spam. So it is Matt’s fault that we’re being watched. Damn you Cutts! (lol)… They also note their interest in cyclical patterns;

“Some patterns operate on hourly cycles. Some are daily. Others are annual. In order to detect a pattern, we need more data than the length of the pattern. In addition, it is difficult to detect illicit behavior because bad actors go to great lengths to avoid detection. One method of detecting new illicit behaviors is to compare old data with new data. It is generally the case that the older the data the better it is to contrast old patterns with new patterns that may include new and sophisticated illicit behavior.”
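The point about needing more data than the length of the pattern is easy to show. The Python sketch below is my own illustration, not Google's method: it checks for a repeating cycle by correlating a series with a shifted copy of itself, and the hourly traffic numbers are made up.

```python
def lag_correlation(series, lag):
    """Pearson correlation between a series and itself shifted by `lag` steps."""
    if lag <= 0:
        raise ValueError("lag must be positive")
    if len(series) <= lag + 1:
        # With no more data than one cycle there is nothing to compare against.
        raise ValueError("need more data than the length of the pattern")
    a, b = series[:-lag], series[lag:]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / (var_a * var_b) ** 0.5

# Hypothetical hourly request counts: busy from 9:00 to 17:00, quiet otherwise.
one_day = [5 if 9 <= hour < 17 else 1 for hour in range(24)]
three_days = one_day * 3

print(lag_correlation(three_days, 24))  # close to 1.0: a daily cycle shows up
```

Feed it a single day of data with a 24 hour lag and it refuses to answer, which is the whole argument for keeping logs longer than the cycle you want to spot. Stretch that to annual patterns and you can see where an 18 month retention window comes from.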

 

Ad Targeting;

The question at hand is how the user data is used to drive targeted advertisements to individual users. For those that didn’t catch it, be sure to read the patent analysis relating to this topic; here and here.

They discuss aspects relating to AdServing based upon;

  1. Current and previous search queries
  2. Language (based on which Google domain is searched)
  3. IP address for potential ‘localized’ implications

 

As well as AdSense for Content which looks at;

  1. the content of the page
  2. language of content
  3. IP address for potential ‘localized’ implications

 

 The Google ToolBar;

While there isn’t much mention of the ol’ GTB, there is one that is interesting;

“the user has the option to opt into receiving PageRank information in the user’s toolbar. PageRank stores the sites that have been visited by the browser that the Toolbar is enhancing, much in the same way a browser stores a user’s web surfing history.”

Once more, nothing we haven’t already looked at as far as data collection is concerned, but at least we can now say with some certainty that the Google Toolbar does, in fact, gather user data.

 

 

Google Services

 

Google Search;

“When an unauthenticated user searches for the term “car,” we collect typical log data, such as the URL, including the search query; the IP address associated with the computer or proxy server from which the query originated; the time and date of the search; the operating system that runs on the computer; the type of browser that runs on the computer. We also may collect a unique cookie ID generated for the computer from which the query originated. The cookie is used to recognize the user preferences (e.g., selected language interface, search results formats, etc.), but users may choose to delete the cookie from their browser and still use the service.”

Nice huh? How many people actually think (or know how) to remove a cookie from their system? Not too many, I would have to think.

“As announced last March, Google will anonymize the cookie ID and the last octet (typically one to three digits) of the IP address associated with search queries after 18 months. Even though neither an IP address nor a unique cookie ID is PII (personal identification information), we believe that our users would prefer that we further anonymize this data after a reasonable period of time. We plan to begin anonymizing our search logs in the manner described above beginning in January 2008.

“Users also have the option of accessing our search services (including Google Maps, News and Images) when they are logged in as a registered user. If a user is logged into his or her Google Account, then the searches also may be recorded in association with that account. As described in more detail below, this record is fully transparent to the user, and the user has the ability to pause this function or delete any record.”

This is pretty much as we know it in the world of Personalized Search, nothing earth shattering there.
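As for the IP anonymizing quoted above, dropping the last octet is a very small change. Here is a minimal Python sketch of the idea. Note the assumptions: Google does not say what the octet is replaced with, so zeroing it is my own choice, the function name is made up, and the address is a reserved documentation example.

```python
import ipaddress

def anonymize_last_octet(ip: str) -> str:
    """Zero the final octet of an IPv4 address (zeroing is an assumption)."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    masked = int(addr) & 0xFFFFFF00   # keep the first three octets only
    return str(ipaddress.IPv4Address(masked))

print(anonymize_last_octet("203.0.113.77"))  # 203.0.113.0
```

That leaves each logged address hidden among just 256 possibilities, which is worth keeping in mind when deciding how anonymous ‘anonymized’ really is.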

 

Google Web History Program; they discuss how during registration for a Google account the process includes, “a notice that Web History will be enabled with the account” and that it tracks “the user’s history of Google searches, web pages, images, videos and news stories”. What I did find interesting was the part where they stated;

“When a user disables or signs out of his or her account, searches conducted by that user will not be associated with the user’s account”

This is particularly interesting because the wording does not state that this data is no longer collected, but simply that it is not ‘associated with the user’s account’. By inference, this means that this data is still collected in an aggregate format that most certainly could be used as performance metrics for other areas (see my recent post on UPMs). They merely state that it reverts back to the anonymizer policies (as stated earlier).

Google Maps; searches here record the same types of data as described above for the main search area.

 

Google News; searching G News also follows the same approach and also subscribes to the 18 month policy.

 

Google Images; also treated the same, including the 18 month log anonymization.

 

Google Analytics; they really don’t say much here other than that GA is for website owners “to improve their marketing campaigns and web pages”, and they go on to say that they don’t track the people using/logging into GA. Of course there is NO mention of what they are doing with the actual traffic data, including tracking site users and referrer data. Is that by design? Who knows, I tossed out my Tin Foil hat, remember?

Google Desktop; they start off by talking about their great phishing, malware and spyware defences to show how much they care about the user’s privacy (WTF that has to do with anything, I dunno).

“The Google Desktop application only searches on the local computer where the user has stored data; no data is stored on Google’s servers unless the user explicitly uploads data files to Google (as described in the Search Across Computers feature below). Users may, however, use Google Desktop to search the web, in which case the searches are stored with a user’s Web History (also described below) but can be deleted by the user.”

 

Google Maps for Mobile: also the same standard policies with the exception that actual ‘location information’ is stored. They say they store a user ID…not PII… like there is much of a difference.

“The current version of GMM does not require user authentication, and we do not collect name, email address, phone number, Google Account information, or other PII when users use GMM.”

 

 

Lip Service

The rest is basically some lip service that repeats, ad nauseam, many of the same answers already covered, plus some specifics relating to the inevitable DoubleClick purchase. While that may be of interest to some, there isn’t anything of actionable value or interest to glean.

 

And that’s it. I just figured I would jump on this before the Xmas break… one less thing to do next week.

Happy Holidays….  Cya soon!

 

 

Comments  

 
0 # Gab Goldenberg 2007-12-25 20:17
Hey Dave,

Another great post here - shame it didn't frontpage Sphinn. Sorry if I didn't notice it in time; I've still Sphunn it, though it's a bit too late now.

Anyways, it may just be my browser, but I don't see the links you discussed : "For those that didn’t catch it, be sure to read the patent analysis relating to this topic; here and here."

On a related note, it's funny to see I'm not the only one with privacy concerns about all this data google is collecting on folks. I wonder if fair use might prevent them using it? Like, my email is copyright, and I didn't ask them to serve me any ads with it... Also, it's funny how they would kill your rankings and QS for Adwords if you used opt-out techniques, yet that is precisely what they're doing with telling people to delete cookies if they don't like them or sign out of their gooogle accounts before searching. F*in hypocrites.
 
 
0 # theGypsy 2007-12-26 23:48
Hey G - I had seen the questions when they came out and was more than a little interested, thus compelled to write about it. In the end it seems worded well and gives no real insight beyond much that is already available.

It was fun though.... a sad life mine is friend... he he

Thanks for the catch, I was bagged and missed those links, I have put them in now.

Thanks for stopping in over the Holidays

Dave
 
