August 26, 2009

Google’s User Data Empire


I’ve been holding off on this entry for a bit, but with the introduction of SearchWiki their aims are so clear to me that I just can’t hold off anymore. Google’s problems over the past 2 years have been the result of an algorithm overly reliant on links. They’ve finally hit a wall. With the latest batch of link buying platforms, their options for reliably detecting paid links are running out. One can call Google many things, but ignorant of the marketplace and of SEOs isn’t one of them. So they needed a response. Their response? User data. Lots of fucking user data.

I know I’ve covered a similar topic before (how Google is essentially creating its own internet), but I wanted to do one specifically on user data.



The Basic Layout of the Google User Data Empire



Google AdSense - Google AdSense can track without fear of repercussion. Why? Because any data it sends back can be used and archived in Google’s eternal battle against click fraud. This means it transmits everything from screen resolution to whether you have Flash and which version (things that arguably have nothing to do with click fraud). Either way, it’s a window into millions and millions of hits on the internet daily. It’s aimed at informational sites, though, not commercial sites (Google’s true interest).

Google Analytics - This is Google’s window into non-informational sites. It tracks an absolutely obscene amount of user data (actually, more than you can see or use in the Analytics panel). Without it, they’d have no window into sales-based sites, which won’t run AdSense because it would send traffic to their competition. Webmasters flock to this tool, not realizing the danger of feeding Google all that information. Here’s a hint: it tracks conversion rates. Google currently keeps anywhere from 2-5x as much AdSense revenue as it pays the website owner, which means that if you do PPC you’re more or less at their mercy for how much you pay per click. Knowing how much you make per click through their conversion tracking could (in theory) let them push your PPC costs up while you still remain profitable. But once again, the real gold here is the ability to track users.
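To put rough numbers on the conversion-tracking point: on average a click is worth your conversion rate times your profit per sale, and that figure is the most you can pay per click before you start losing money. Here’s a quick Python sketch with made-up numbers (the 2% conversion rate, $50 profit per sale and $0.80 bid are hypothetical, not real AdWords data, and the function names are just for illustration):

```python
def break_even_cpc(conversion_rate: float, profit_per_sale: float) -> float:
    """Highest cost per click an advertiser can pay without losing money.

    On average each click earns conversion_rate * profit_per_sale,
    so any CPC below that value is still profitable.
    """
    return conversion_rate * profit_per_sale


def headroom_per_click(conversion_rate: float, profit_per_sale: float, cpc: float) -> float:
    """Profit left on each click at the current CPC (negative means a loss)."""
    return break_even_cpc(conversion_rate, profit_per_sale) - cpc


if __name__ == "__main__":
    # Hypothetical campaign numbers, chosen only for illustration.
    rate, profit, bid = 0.02, 50.0, 0.80
    # Prints "Break-even CPC: $1.00"
    print(f"Break-even CPC: ${break_even_cpc(rate, profit):.2f}")
    # Prints "Headroom per click at $0.80: $0.20"
    print(f"Headroom per click at ${bid:.2f}: ${headroom_per_click(rate, profit, bid):.2f}")
```

Anyone who sees both your conversion rate and your bid knows exactly how much room sits between what you pay and your break-even point. This only illustrates the arithmetic; it says nothing about how Google actually prices clicks.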

Google Chrome - Google Chrome is an interesting creation. Google is a public company, which means they can’t build something like Chrome without a significant financial reason. The catch is that they’re already propping up Firefox with $59.5-70 million a year in payments (85% of Firefox’s revenue) to stay its default search engine. $70 million is jack shit to Google, so they definitely wouldn’t create Chrome just to save on that, and they already get the ad revenue from Firefox searches, so that doesn’t explain it either. So why would they create Chrome?

Unique Identifier - Chrome generates a unique ID whether or not you agree to send your data to Google. If you agree, this ID gets transmitted. So what does that do? It lets them identify you regardless of where your computer is and regardless of cookies. It’s truly the perfect information gatherer.

[Partially] Closed Source - I’m no open source junkie, but let’s not kid ourselves. The main difference between Firefox and Chrome is that Chrome is partially closed source. It’s based on Chromium, a BSD-licensed piece of software. The BSD license doesn’t require you to open source your modifications to the code (unlike the GPL). This means you have to run a sniffer to see the data Chrome is sending out; you can’t simply read the source code. While the initial versions don’t send out an excessive amount of data, I’m willing to bet that will change as adoption grows.

Typing Tracking - I just sniffed Chrome’s traffic (opted in to transmitting data). The page I was going to was completely blank except for a fake 404 error. Magically, it generated 2 requests to Google. One was a “Google Suggest” style query (which means yes, Google Suggest is used for tracking). The other was more curious: it transmitted events (with generic names, so I don’t know what each stood for), a unique ID, and, interestingly enough, a variable called “rep”, presumably a user reputation level. Typing in a single domain created 3 of these “events”. I wonder what they are.

Google Checkout - One of several ways Google is moving toward identifying real people. That is to say, it’s a way to tie an IP and a cookie/username to a real, 100% legit name. This is worth more than most could ever imagine. Not only is that person identified as someone with a credit card, but the billing address gives you the region the person is from and a probable demographic. The much-debated Google Health, which can store an individual’s medical information, can also be tied back to a real identity.

Google Toolbar - Fantastic for identifying webmasters, the Google Toolbar is among the most powerful methods of getting user data. How long do you think it will be before they turn users into unknowing cloaking checkers (click a search result, and omgz, this PageRank request isn’t for the right domain)? Every single webpage you access, private or not, gets sent to Google for its PageRank check.

Google Android - The one set of data they couldn’t properly access before: phone habits. Note how aggressively they’ve pursued the cell phone market (iPhone, anyone?).

SearchWiki - Google’s latest addition, which lets you reorganize the search results. They say the data is used only for the user who makes the changes. Fun fact? That makes no sense. Google already has bookmarks, and if you’re logged in (and opted in) and click “Web History”, it shows you the searches you’ve made and the results you’ve clicked. So there is absolutely no reason to create this other than to alter search results and, more importantly, to gauge users’ reactions to commercial vs. informational sites.

Other Obvious Sources - Gmail (your contacts, your interests), the search results themselves, and many more.

Google justifies all of this by pointing out that plenty of other companies have been gathering this data for some time. But there’s a difference: those companies only had data from one source at a time. Google’s specialty is organizing information. They have access to more avenues for user data than any other company in the history of the world, and the ability to connect every aspect of every person’s life. Log into Gmail on Android? Congrats, your phone number can now be tied to your home IP. Don’t search using Google? Between AdSense and Analytics, you’ve probably got a 35-50% chance of sending data to Google anyway with every page load. Did you buy something through an ad served by Google? With conversion tracking, they know you bought it, and can tie that back to everything else.



Why I’m Scared as a User

I’m really beginning to get scared here. Even ignoring Google’s less-than-benevolent intentions, can anyone imagine a data breach? No company is truly secure. 4 years ago, the entire member database of the largest porn network on the planet (over 500,000 records, including passwords) was available for 1 grand. There have been data breaches at pharmaceutical companies leaking millions of customer records, down to the pill each customer took and when the prescription ran out. Government servers get compromised; credit bureaus get compromised. So why would Google be any different?



Why I’m Scared as a Webmaster

Google has an interesting issue. They have more user data than they can let AdWords advertisers target. This is an absolutely insane amount of information. So they’re left with 3 options.



Enter the CPA Market - With their Google Affiliate Network, this seems like a likely path. Imagine a massive in-house program that can get clicks dirt cheap (remember, Google takes a HUGE cut of AdSense revenue; by giving up that cut, they can afford conversion rates that would make normal PPCers cringe).

Not Use the Data - Google is a publicly traded company. Their responsibility is to their stockholders. So regardless of how warm and fuzzy they act toward the internet community at large, this option is not viable. Their privacy policies contradict the filth they spew at consumers about how the data will and won’t be used. And guess which one is legally binding? The privacy policy. They’re using the data, folks.

Take Control from Advertisers - They can’t let me target based on all the data they have, so the alternative is to make the decisions for me based on what they think is best. Well, sort of. Remember that Google’s automatic optimization targets not conversions but click-through and profit on their end.

I don’t understand how prominent geeks, normally so paranoid about spyware and whatnot, can ignore Google. They operate on a higher level than any spyware company in history, and they do it all while winking at the webmaster community and acting like they’ll look out for us. “Don’t Be Evil” is the motto of a private company, not a public one. It’s the antithesis of the free market economy. What is good for the consumer is not good for the company, and that is especially true of an advertising company with access to so much data.



Until next time,

XMCP



PS: Edited the entry to indicate that Chrome is partially closed source, though the open source parts are, for the most part, Chromium. To clarify, here’s a line from Chrome’s TOS: 10.2 You may not (and you may not permit anyone else to) copy, modify, create a derivative work of, reverse engineer, decompile or otherwise attempt to extract the source code of the Software or any part thereof, unless this is expressly permitted or required by law, or unless you have been specifically told that you may do so by Google, in writing.

Slightly Shady SEO. Copyright 2008 All Rights Reserved