September 5, 2009

Is MSN Lying About Their Referrer Spam?


Introduction


Several months ago, MSN started crawling the web with a new crawler, one designed (in theory) to detect sites that were cloaking. The idea behind it: a cloaking site serves up a different page to bots than to visitors, and a lot of cloaking sites skip their own IP list checks to save on CPU when a referrer is detected. So if the crawler sends a fake “LIVSOP” search engine referrer, a cloaking site should hand it the visitor page instead of the bot page and give itself away.
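To make that logic concrete, here is a minimal sketch of the shortcut and of the probe that exploits it. Everything in it is a made-up illustration: the IP addresses come from documentation ranges, the referrer URL format is an assumption, and `serve` and `looks_cloaked` are hypothetical names, not MSN's code or any real cloaking script.

```python
# Hypothetical model of the referrer shortcut described above.
# Addresses are from reserved documentation ranges, not real crawler IPs.
KNOWN_BOT_IPS = {"192.0.2.10"}

# Assumed shape of a fake Live.com search referrer (illustrative only).
FAKE_REFERRER = "http://search.live.com/results.aspx?q=test"


def serve(ip, referrer):
    """Return which page a simplified cloaking site would serve."""
    if referrer and "live.com" in referrer:
        # Shortcut: a search referrer is taken to mean a human visitor,
        # so the more expensive IP list lookup is skipped entirely.
        return "visitor page"
    if ip in KNOWN_BOT_IPS:
        return "bot page"
    return "visitor page"


def looks_cloaked(ip, fake_referrer):
    """Compare a plain crawl with a crawl that carries a fake referrer."""
    plain = serve(ip, None)
    probed = serve(ip, fake_referrer)
    # A site that answers the two requests differently is cloaking.
    return plain != probed


print(looks_cloaked("192.0.2.10", FAKE_REFERRER))
```

A site that only checks IPs, or that does not cloak at all, would answer both requests the same way, which is why the comparison is the interesting part and not the fake referrer itself.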



Sounds like a good move for them, right? No. The way they did it was by sending out a lot of referrers that pretended to be from Live.com search results. An excessive amount, in fact: I still get 30+ per day on this blog.

This was supposed to bypass a lot of cloaking filters, but it really only succeeded in dirtying up everyone’s logs and causing a handful of people to ban the abusive IP range. If you want to take a look, it is well documented. Eventually they admitted they were doing it, and the dust settled.



The Reality

So, I got asked by Gab of SEO ROI to find a domain of mine that was completely banned, so that he could settle a discussion he was having about Google Bowling (yes, he was testing against one of his old sites). I trust Gab, so I said sure.



Now, at this point I’m running my blackhat sites off of 2 shared hosting accounts: my ancient one, and a super-shared host running my current software, which uses almost no CPU, so I can host a lot of domains on it. I reasonably figured that the domains on my latest host were probably still mostly alive in MSN/Yahoo, even if they were banned from Google, so I decided to root around my old account for a domain.



Now, on the ancient hosting account, I NEVER updated any databases or software to combat MSN’s referrer spam cloaking detection. All of my domains (as has been verified) behaved exactly how MSN would expect them to (yes, every single domain on this account cloaks). And then I realized something: every single domain I had was still indexed on either Yahoo or MSN. That is not a good sign for them. Some of these have lasted way longer than a cloaked domain should. I eventually had to give Gab a less-than-successful domain that was still indexed!



When I woke up the next morning, I ran some figures and realized that 85% of the domains I had on that account are still indexed by Live/MSN. It’s been MONTHS since they started screwing with all my logs. They have crawled these domains well over 50,000 times (my database on that account freaks out if I query for more than 50,000 log rows, so that’s all I can confirm) since they began spewing their crap all over my log files.
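For anyone who wants to run the same kind of figures on their own logs, here is a rough sketch. It assumes the log rows have already been reduced to (domain, referrer) pairs and that you have a per-domain yes/no for "still indexed"; the sample rows, domain names, and function names are all invented for illustration and are not my actual data or schema.

```python
from collections import Counter
from urllib.parse import urlparse


def live_referrer_hits(entries):
    """Count hits per domain whose referrer host is live.com or a subdomain."""
    counts = Counter()
    for domain, referrer in entries:
        # hostname is None for empty or "-" referrers, so fall back to "".
        host = urlparse(referrer).hostname or ""
        if host == "live.com" or host.endswith(".live.com"):
            counts[domain] += 1
    return counts


def percent_indexed(indexed):
    """indexed maps domain -> True/False; returns the share that is True."""
    if not indexed:
        return 0.0
    return 100.0 * sum(indexed.values()) / len(indexed)


# Invented sample rows; the referrer URL format is an assumption.
sample_log = [
    ("example-a.test", "http://search.live.com/results.aspx?q=widgets"),
    ("example-a.test", "-"),
    ("example-b.test", "http://www.google.com/search?q=widgets"),
    ("example-b.test", "http://search.live.com/results.aspx?q=gadgets"),
]

print(live_referrer_hits(sample_log))
```

Matching on the parsed hostname instead of a plain substring keeps a referrer like `http://example.test/?q=live.com` from being counted as a Live.com hit.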

In fact, my very first blackhat domain (a piece of crap, to be honest) is still somehow indexed by Live.com. As of this morning, its indexed page count appears to have grown since last night and now easily exceeds 15,000 pages.



So Why The Hell are They Crawling Like This?

To be honest, I have no idea. One thought from Gab (somewhat jokingly) was that maybe they’re just trying to shuffle some traffic into their search engine. A “Hey, I’m still here!” kind of thing. That seems like a bit of a stretch, but who knows? Either there’s some other reason for this referrer spam, or they’re just really, really terrible at detecting cloaking. These sites were exactly what their referrer spam should’ve detected.



Any ideas?



-XMCP


Slightly Shady SEO. Copyright 2008 All Rights Reserved