Conversions, Cookies and Bad Actors too!
A while ago I came across some pointed questions that were posed to Google by US Representative Joe Barton, mostly regarding privacy concerns. What was of particular interest to this wandering web Gypsy was that much of it had to do with user data collection, which is a favourite of mine in the form of User Performance Metrics. It seems we have some answers...yipeeeee
Unfortunately, much of the detailed response is what one might expect: short on any real actionable goodies that we can all get excited about. What I do take away from this is the reality that such metrics are indeed being collected and used to enhance both personalized search and regular index search services. The responses are also interesting for what they don't say.
Without further ado, here's what I took away from it.
Data Retention Policies
When discussing the data retention policies for the Search aspects they stated;
Any user can visit the Google web site from any computer and use our search engine without providing Personally Identifying Information (PII). For these services, Google retains very little data, typically just standard server log information which includes: the uniform resource locator (URL); the Internet Protocol (IP) address associated with the computer or proxy server from which the request originated; the time and date of the request; the operating system that runs on the computer; and the type of browser that runs on the computer. We also may collect a unique cookie ID generated for the computer from which the request originated.
We recently announced our plans to further anonymize this unauthenticated data after 18 months. Specifically, we will obfuscate both the IP address and the cookie, which, in some cases, can be used in association with other information to identify an individual.
Then they move on to services outside of the main search engine;
There are other services, such as Gmail, Google Web History and Google Calendar, which are private and/or customized for the individual user. These customized services require registration (typically just a user name, alternate email address and country) in order to secure the account for that user. As a general matter, we try to retain this data for as long as the user wants it retained. Indeed, we build in features that allow users to control both the collection and deletion of their personal information.
And who gets to have access to that data?
We restrict access to personal information to Google employees, contractors and agents who need to know that information in order to operate, develop or improve our services.
OK, so what we have at this point is pretty straightforward, and we know they keep unauthenticated data in identifiable form for 18 months before anonymizing it, and data from service accounts for as long as the user wants it kept. We also know that Google employees may not be the only ones looking at said data, since contractors and agents are on the access list too.
Data Collection and Usage
They go through the basics of what cookies are and then describe the Google Pref cookie, which stores information such as the fact that a user wants search results in English, no more than 10 results on a given page, or a SafeSearch setting to filter out explicit sexual content.
They discuss the cookie's lifetime, which is currently set to expire in 2038 (the recognized end date for the vast majority of computers operating in Unix time), and how in the coming months they plan to change that to auto-expire after 2 years to appease privacy concerns. In addition, they intend to include these cookies in the 18-month anonymization plan by deleting the Pref cookie's unique ID numbers.
They also discuss general authentication cookies that are related to the various Google services. These are straightforward and are deleted when the user logs out of said service.
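For anyone wondering where that 2038 date comes from, here is a quick sketch in Python. It assumes the usual explanation, that time is counted as seconds since 1 January 1970 in a signed 32-bit integer; the function names are my own, and the 730-day figure is just my stand-in for "2 years", not anything from Google's response.

```python
from datetime import datetime, timedelta, timezone

# Unix time counts seconds from this moment (UTC).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Largest value a signed 32-bit integer can hold.
MAX_INT32 = 2**31 - 1


def last_32bit_moment() -> datetime:
    """Last moment representable as signed 32-bit Unix time."""
    return EPOCH + timedelta(seconds=MAX_INT32)


def two_year_expiry(set_at: datetime) -> datetime:
    """Hypothetical helper: expiry for a cookie that lives 2 years (730 days)."""
    return set_at + timedelta(days=730)


print(last_32bit_moment().isoformat())  # 2038-01-19T03:14:07+00:00
print(two_year_expiry(datetime(2008, 1, 1, tzinfo=timezone.utc)).date())
```

In other words, a cookie set to expire in 2038 is about as close to "never" as a 32-bit clock allows, which is why a 2-year expiry is a real change.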
AdWords and AdSense; there is also a conversion measurement cookie.
They were asked; Please explain how Google uses the information or data described in Question 1(a) (l), including, but not limited to, the following uses: perfecting Google's search algorithm; operating Google's advertising programs such as AdWords and AdSense; and research or analysis of user activity on www.google.com. Their answer;
First, we use this data to improve our search algorithms for the benefit of our users. For example, we receive real-time feedback on the quality of our search results when users click on results. If they click on the first result then we know that we have likely provided them with what they are looking for. If, however, they click far down in the results then our search rankings could be improved.
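To make the click feedback idea concrete, here is a minimal sketch in Python. To be clear, this is my own illustration and not Google's method: I am assuming a simple "mean reciprocal rank of the clicked result" score, and the function name and sample data are made up.

```python
from statistics import mean


def mean_reciprocal_click_rank(click_positions):
    """Score a set of searches by where users clicked.

    click_positions holds the 1-based rank of the clicked result for each
    search. A score near 1.0 means users mostly clicked the top result; a
    low score means they had to dig far down the page.
    """
    if not click_positions:
        raise ValueError("need at least one click to score")
    return mean(1.0 / position for position in click_positions)


# Made-up sample data: clicks near the top vs. clicks far down the results.
happy_users = [1, 1, 2, 1]
digging_users = [7, 9, 4, 10]

print(mean_reciprocal_click_rank(happy_users))    # 0.875
print(mean_reciprocal_click_rank(digging_users))  # much lower
```

A query whose score drifts downward over time would be a hint that the rankings for it "could be improved", to borrow their wording.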
Well, there we have it. The data is used to improve our search algorithms. While it is certainly stating the obvious as far as I am concerned, it is nice to see it stated as such. Once again we have no way of telling if this is purely limited to logged-in users (i.e., personalized search), but it doesn't take an SEO Rocket Scientist to start connecting some dots here.
...we retain this data to defend our systems from malicious access and exploitation attempts, as well as click fraud and web spam. For example, with historical analysis we can identify and protect against patterns of malicious behaviour that are hard to identify in real-time. Our retention policies also protect our users from threats like spam and phishing.
They mention a few times that part of the reasoning for collecting and retaining data (18 month plan) is related to combating, among other things, Web Spam. So it is Matt's fault that we're being watched. Damn you Cutts! (lol)
They also note their interest in cyclical patterns;
Some patterns operate on hourly cycles. Some are daily. Others are annual. In order to detect a pattern, we need more data than the length of the pattern. In addition, it is difficult to detect illicit behavior because bad actors go to great lengths to avoid detection. One method of detecting new illicit behaviors is to compare old data with new data. It is generally the case that the older the data the better it is to contrast old patterns with new patterns that may include new and sophisticated illicit behavior.
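The "we need more data than the length of the pattern" point is easy to show with a toy example. The sketch below is my own and assumes plain autocorrelation as the detection technique (Google does not say what they use); the hourly counts are invented, with one burst of activity at the same hour each day.

```python
def autocorrelation(series, lag):
    """Correlation of a series with itself shifted by `lag` steps."""
    n = len(series)
    if lag >= n:
        # With no more data than the cycle length, there is nothing to compare.
        raise ValueError("need more data than the lag being tested")
    avg = sum(series) / n
    variance = sum((x - avg) ** 2 for x in series)
    if variance == 0:
        return 0.0
    shifted = sum((series[i] - avg) * (series[i + lag] - avg) for i in range(n - lag))
    return shifted / variance


def strongest_cycle(series, max_lag):
    """Return the lag (1..max_lag) at which the series best repeats itself."""
    lags = range(1, min(max_lag, len(series) - 1) + 1)
    return max(lags, key=lambda lag: autocorrelation(series, lag))


# Invented hourly counts: quiet traffic with a burst at 03:00 every day.
one_day = [100 if hour == 3 else 10 for hour in range(24)]
one_week = one_day * 7

print(strongest_cycle(one_week, max_lag=72))  # 24, i.e. a daily cycle
```

With a week of data the daily cycle pops right out. Hand the same function a single day and ask about a 24-hour lag, and it can only raise an error, which is the retention argument in miniature.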
The next question is how the user data is used to drive targeted advertisements to individual users. For those that didn't catch it, be sure to read the patent analysis relating to this topic; here and here.
They discuss aspects relating to ad serving based upon;
- Current and previous search queries
- Language (based on which Google domain is searched)
- IP address for potential localized implications
As well as AdSense for Content which looks at;
- the content of the page
- language of content
- IP address for potential localized implications
The Google Toolbar;
While there isn't much mention of the ol' GTB, there is one that is interesting;
the user has the option to opt into receiving PageRank information in the user's toolbar. PageRank stores the sites that have been visited by the browser that the Toolbar is enhancing, much in the same way a browser stores a user's web surfing history.
Once more, nothing we haven't already looked at as far as data collection is concerned, but at least we can now say with a reasonable degree of certainty that the Google Toolbar does, in fact, gather user data.
When an unauthenticated user searches for the term car, we collect typical log data, such as the URL, including the search query; the IP address associated with the computer or proxy server from which the query originated; the time and date of the search; the operating system that runs on the computer; the type of browser that runs on the computer. We also may collect a unique cookie ID generated for the computer from which the query originated. The cookie is used to recognize the user preferences (e.g., selected language interface, search results formats, etc.), but users may choose to delete the cookie from their browser and still use the service.
Nice, huh? How many people actually think (or know how) to remove a cookie from their system? Not too many, I would have to think.
As announced last March, Google will anonymize the cookie ID and the last octet (typically one to three digits) of the IP address associated with search queries after 18 months. Even though neither an IP address nor a unique cookie ID is PII (personal identification information), we believe that our users would prefer that we further anonymize this data after a reasonable period of time. We plan to begin anonymizing our search logs in the manner described above beginning in January 2008.
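Here is roughly what that anonymization step could look like, as a hedged Python sketch. Everything in it is an assumption on my part: the `LogEntry` fields just mirror the log data listed in the response, 548 days is my approximation of 18 months, zeroing the last octet and blanking the cookie ID is only one way to "anonymize", and it handles IPv4 addresses only.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: roughly 18 months expressed in days.
RETENTION = timedelta(days=548)


@dataclass(frozen=True)
class LogEntry:
    """Hypothetical record mirroring the log fields named in the response."""
    url: str
    ip: str
    timestamp: datetime
    browser: str
    operating_system: str
    cookie_id: Optional[str]


def anonymize(entry: LogEntry, now: datetime) -> LogEntry:
    """Zero the last IP octet and drop the cookie ID once the entry is old enough."""
    if now - entry.timestamp < RETENTION:
        return entry  # still inside the retention window, leave untouched
    octets = entry.ip.split(".")
    if len(octets) == 4:
        octets[-1] = "0"
    return replace(entry, ip=".".join(octets), cookie_id=None)


now = datetime(2008, 1, 15, tzinfo=timezone.utc)
old = LogEntry("/search?q=car", "203.0.113.87",
               datetime(2006, 1, 1, tzinfo=timezone.utc),
               "Firefox", "Windows XP", "abc123")
print(anonymize(old, now).ip)  # 203.0.113.0
```

Note how little actually changes: the query, the timestamp and three quarters of the IP address all survive, which is worth keeping in mind when reading the word "anonymize".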
Users also have the option of accessing our search services (including Google Maps, News and Images) when they are logged in as a registered user. If a user is logged into his or her Google Account, then the searches also may be recorded in association with that account. As described in more detail below, this record is fully transparent to the user, and the user has the ability to pause this function or delete any record.
This is pretty much as we know it in the world of Personalized Search; nothing earth-shattering there.
Google Web History Program; they discuss how, during registration for a Google account, the process includes a notice that Web History will be enabled with the account and that it tracks the user's history of Google searches, web pages, images, videos and news stories. What I did find interesting was the part where they stated;
When a user disables or signs out of his or her account, searches conducted by that user will not be associated with the user's account.
This is particularly interesting due to the wording, which does not state that this data is no longer collected, but simply that it is not associated with the user's account. By inference, this suggests that the data is still collected in an aggregate format that most certainly could be used as performance metrics for other areas (see my recent post on UPMs). They merely state that it reverts back to the anonymizer policies (as stated earlier).
Google Maps; searches here record the same types of data as described above in the main search area.
Google News; searching G News follows the same approach and also subscribes to the 18 month policy.
Google Images; also treated the same, including the 18 month log anonymization.
Google Analytics; they really don't say much here other than that GA is for website owners to improve their marketing campaigns and web pages, and they go on to discuss that they don't track the people using/logging into GA. Of course there is NO mention of what they are doing with the actual traffic data, including tracking site users and referrer data. Is that by design? Who knows, I tossed out my Tin Foil hat, remember?
Google Desktop; they start off by talking about their great phishing, malware and spyware defences to show how much they care about the user's privacy (WTF that has to do with anything, I dunno).
The Google Desktop application only searches on the local computer where the user has stored data; no data is stored on Google's servers unless the user explicitly uploads data files to Google (as described in the Search Across Computers feature below). Users may, however, use Google Desktop to search the web, in which case the searches are stored with a user's Web History (also described below) but can be deleted by the user.
Google Maps for Mobile; also the same standard policies, with the exception that actual location information is stored. They say they store a user ID... like there is much of a difference.
The current version of GMM does not require user authentication, and we do not collect name, email address, phone number, Google Account information, or other PII when users use GMM.
The rest is basically some lip service that discusses, ad nauseam, many of the same answers already covered, plus some specifics relating to the inevitable DoubleClick purchase. While that may be of interest to some, there isn't anything to glean of actionable value or interest.
And that's it. I just figured I would jump on this before the Xmas break... one less thing to do next week. Cya soon!