Site Scrapers Find Free Money on the Web

[Updated below at the bottom]

You may have noticed that I have temporarily switched to short feeds because scraper sites like Zmarter.com are ripping off content we publish here. Yesterday, Yves Smith wrote a post about DMCA violations by Zmarter and other site scrapers.

This is how it works: site scrapers’ entire business model involves copying every single post from leading news sources and re-publishing them in order to earn advertising dollars from Google and other sites. They accomplish this by using WordPress plugins that automatically post the content of the RSS feeds that other sites publish.

Blogs are especially vulnerable to this kind of copyright infringement because blogs generally publish full feeds in order to facilitate ease of use for readers. Most traditional publishers use short feeds, perhaps in order to increase site traffic or to prevent content scraping. The Guardian is one admirable exception.
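
To make the full-feed versus short-feed distinction concrete, here is a minimal sketch of how a post body might be cut down to an excerpt before it goes into a feed. It is only an illustration: the function name `make_excerpt`, the 55-word default and the "Read the full post at" wording are my assumptions, not how WordPress or any particular feed plugin actually does it.

```python
import re
from html import unescape


def make_excerpt(html_body: str, max_words: int = 55, link: str = "") -> str:
    """Return a plain-text excerpt of a post, suitable for a short feed.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    # Crudely strip tags; a real feed generator would use an HTML parser.
    text = unescape(re.sub(r"<[^>]+>", " ", html_body))
    words = text.split()

    # Short posts go out whole; there is nothing to truncate.
    if len(words) <= max_words:
        return " ".join(words)

    excerpt = " ".join(words[:max_words]) + " [...]"
    if link:
        # Point readers (and anyone re-publishing the feed) back to the source.
        excerpt += f" Read the full post at {link}"
    return excerpt


if __name__ == "__main__":
    body = "<p>Site scrapers copy every single post from leading news sources.</p>"
    print(make_excerpt(body, max_words=5, link="https://example.com/post"))
```

A scraper that re-publishes a feed built this way gets only the excerpt and a link back, which is roughly what legitimate aggregation looks like anyway. The cost is that readers who rely on feed readers lose the full text too.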

Ironically, it is because of high rankings in Google’s search algorithm that these sites are even able to garner traffic and earn dollars from Internet advertising. Some "aggregators" like The Market Financial and Before It’s News are even listed as trusted news sources on Google News – where they must pass a human eye to be accepted as a reputable news source. Some scrapers also have Facebook and Twitter accounts linking back to their site or specific scraped articles. In two cases, I saw links on Google News to articles I wrote, but the links pointed to aggregator sites. I clicked through and found many posts from Credit Writedowns on each site, published in full, even though we had never been contacted regarding re-publishing rights. Links back and short excerpts are one thing; that’s legitimate aggregation. Full content scraping is quite another story. Apparently, all you have to do to make money on the Internet is set up a website, install some plugins to scrape good content elsewhere, make sure you optimize your site for search rankings, and contact Google to include you in their list of trusted sources. It’s as good as free money – and often Google is collecting the advertising money along with you.

In my view, it is just this kind of situation which will ultimately win Google more regulatory scrutiny because Google dominates both search and advertising online. I am surprised that they have not taken steps to eliminate this kind of situation.

Recently, the New York Times wrote a widely-followed story about allegedly unethical business practices at a site called DecorMyEyes. Apparently, the individual behind the site was using a loophole in Google’s ranking algorithm and his bad reputation amongst customers to rank higher in search results at Google.

I am an avid user of Google’s products. Moreover, I understand that Google took steps to keep fraudsters from gaming the search results after the NYTimes revealed problems at the site in question. Nevertheless, cutting and pasting content on blogs, newsletters, internet pages or any other media and trying to earn money off of it is a bad business model. It is unethical. And right now, it seems as if Google not only condones it but fosters it through inclusion of scraper sites in Google News.

The mainstream media has been complaining about this for years. Bloggers are actually seen as a major source of copyright infringement. But bloggers like Matthew Yglesias have been complaining about this as well. The point Yglesias makes is that the Internet is about the dissemination of information. People like him want more information shared more freely. Like Felix Salmon, he doesn’t like the FT’s draconian solution to concerns over copyright infringement. That’s just going to turn readers off. But I reckon they don’t want people copying their content without attribution either.

I will certainly be much more circumspect about how much I quote from particular sources in the future. But I don’t want to go the route of short RSS feeds and DMCA violation reports. Unless Google gets better control of its search functionality, for many publishers, that looks to be the only remedy.

————

Update 1600ET: Search Engine Land is now reporting that Google is not just overhauling its search algorithms but is also dropping some sites from inclusion in Google News and reviewing standards. Before posting this article, I had linked to another Search Engine Land article from yesterday which noted that Google was revamping its copyright protection rules. Clearly, the kinks are still being worked out in these new rules.

UPDATE 5 Dec 2010 930ET: Having contacted InfoLinks, an in-text advertiser for Zmarter, I received the following reply by e-mail:

Hi Edward,

Thank you for contacting Infolinks.

I understand your frustration regarding this situation. However, please understand that the issue you are currently experiencing is out of our jurisdiction. Although we are providing the In-text ads, we have no control over the ownership of the actual content on a website.

I understand that this goes against your terms of service, which is why I would strongly recommend that you contact the owner of the website.

Please let me know if there is anything else I can assist you with and I’ll do my best to help.

Sincerely,

Dana Z

Account Executive

E [email protected]

W www.infolinks.com

Happy New Website – For a Happy New Year!

I responded as follows:

Dana,

I gather from your response then that you have no terms of service to prevent your advertisers from advertising on sites that are in violation of copyrights. In that sense, you are encouraging copyright violations. I can’t imagine reputable advertisers would be comfortable with that. I know that other in-text advertisers are not.

https://www.vibrantmedia.co.uk/publishers/vm_editorial_policy.asp

Are you saying that you are unwilling to protect your customers from advertising on sites violating copyrights?

Regards,

Edward

As I said at the outset, companies that merely re-publish content created by others can generate a lot of revenue if the ecosystem in which they operate permits this. It requires search engines and advertising companies to look the other way, condoning and even encouraging their activities. As long as the advertisers behind InfoLinks whose ads are served on the pages in violation of copyright are narrowly focused only on performance, this kind of activity is encouraged. But that means they are also condoning and encouraging copyright violations. Apparently, that is the world we live in on the web right now.

P.S. – I should add that Zmarter is hotlinking graphics from Credit Writedowns’ content delivery network’s servers. Credit Writedowns pays money to its CDN for each graphic served on Zmarter. So, Credit Writedowns is literally paying in order to serve content on Zmarter and similar sites. Last month this bill skyrocketed due to site scrapers. This is why we have been forced to switch to short RSS feeds.
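
For readers wondering how a site might spot hotlinking at all, one common approach is to look at the HTTP Referer header on each image request and treat requests coming from pages on other sites as hotlinks. The sketch below shows only that idea. The host names, the placeholder path and the function names are all hypothetical, and this is not a description of how Credit Writedowns or its CDN is actually configured.

```python
from urllib.parse import urlparse

# Hypothetical list of hosts allowed to embed our images.
ALLOWED_HOSTS = {"example.com", "www.example.com"}


def is_hotlink(referer, allowed_hosts=ALLOWED_HOSTS):
    """Return True if the request appears to come from a page on another site."""
    if not referer:
        # No Referer at all: direct visits, many feed readers and privacy
        # tools look like this, so this sketch lets them through.
        return False
    host = (urlparse(referer).hostname or "").lower()
    return host not in allowed_hosts


def choose_image(requested_path, referer, placeholder="/img/hotlink-notice.png"):
    """Serve the requested image normally, or a small notice image to hotlinkers."""
    return placeholder if is_hotlink(referer) else requested_path


if __name__ == "__main__":
    print(choose_image("/img/chart.png", "https://www.example.com/some-post"))
    print(choose_image("/img/chart.png", "https://scraper.example.net/copied-post"))
```

The Referer header is easy to strip or fake, so a check like this reduces casual hotlinking rather than preventing it. It also has to allow legitimate referrers such as feed readers and search engines, or it will break images for real readers.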

Update 5 Dec 2010 1000ET: The Market Financial has now eliminated its scraped content and is focusing on original content instead. Good show. Zmarter continues to scrape – including this post!

UPDATE 6 DEC 2010 1700ET: Search Engine Land is now reporting that the merchant in question from the NYTimes exposé has been arrested.

13 Comments
  1. jimh009 says

    Edward,

    As a Webmaster myself, and having a popular and very large site that I started in 2002, site scraping is far from new. Even though my site isn’t a blog, site scrapers have been harvesting bits and pieces of it ever since 2003. Google has been battling scrapers in their organic search results for seemingly forever. The emergence of blogs with RSS feeds has made their jobs simpler, as everything is there “for the taking” in one nice, easy to read file.

    Google has been fairly good about getting rid of the scrapers out of both their organic SERPS as well as dumping these publishers from their Adsense program. However, there are many more contextual advertisement programs than just Adsense today – thus, they have new revenue streams.

    Google is in a never-ending “arms race” with the scrapers and spammers of the world. They’ve done an ok job at it, but it really is a never-ending battle. They win one battle, but then some new technology or method comes along and it’s back to “square one” for Google and the other search engines. All the new Google products (such as Google News), only provide more avenues for the spammers and scrapers of the world to make money online.

    Of course, it’s not just the search engines that are having problems. Twitter is full of spam these days, and Facebook has its own problems too.

    1. Edward Harrison says

      You’re right that this cat and mouse game will never go away. It’s like the cat and mouse in viruses.

      It was good to see that Google is taking this seriously though. Twitter too.

  2. Steve Hamlin says

    “I should add that Zmarter is hotlinking graphics from Credit Writedowns’ content delivery network’s servers.”

    1. That gives you great ability to mess with them. Replace oft-hotlinked graphics with an image (and alt-text) that reads “Zn&$%ter is stealing my content from CreditWritedowns.com – do not trust this web site”. Or go straight for the juvenile, and denigrate Zn&$%ter’s readers or executive, or poison their hotlinked images with porn. That’ll get them to change pretty quick.

    2. Looks like they are based in Toronto. Depending on how pissed off you are, get a legal friend to file a small claims court action in Canada. Save the forms, repeat occasionally until they cease. Execute judgments against them in as public a way as you can – garnishment orders to their advertisers’ A/P departments, or the like. Involve their business relationships if permissible.

    Good luck.

