I'm somewhat disturbed but still ambivalent about the
large number of scraper sites: sites with little or
no original content that simply reprint articles taken from other sites.
There can be value to such sites in the sense that they consolidate
information in specific subject areas, but what disturbs me is
that the original source often isn't immediately apparent.
For example, take a look at
"The Linux World Learns How Larry Ellison Does Business" (link dead, sorry).
Until you click on the "Continue" link, it
isn't at all obvious that this content is actually taken from
another site. There's no indication that this is a quote; in
fact, it isn't presented as one at all. I'm not convinced this
extract would qualify as "fair use" either. The copyright rules for fair use seem to imply that
there has to be more than just the other person's content, since one of the factors weighed is
"the amount and substantiality of the portion used in relation to the copyrighted work as a whole."
Of course, this depends upon the license offered by the owner
of the content. People often take things from here, and in most
cases I'm fine with that as long as full and proper attribution
back here is given with the post. Incidentally, if one of my
articles were presented the way that Larry Ellison example is, I would not consider that proper attribution.
More recently I changed my Copyright Policy. Please read that before
taking any content.
But never mind that. Let's say a site does crib my content
along with other people's and presents it fairly, with no attempt at
obscuring the original source. Does that have value? Maybe,
but this is where I get really wishy-washy. In theory, such
an accumulation of knowledge in a particular subject area could
be of value to someone interested in that area. That's particularly
true when Google fails to deliver good results, either because
the search terms have too many alternate meanings or because
there is too much "garbage" to sift through: a human editor
can do far better than Google in those circumstances.
But that sort of thing could be done just as well
with links, so why is it necessary to duplicate the actual content?
The answer is plain: that's the only
way search engines will see the site as authoritative. So, as
much as we (the original authors) may dislike it, we probably aren't
going to stop it unless we prohibit reuse of our material outright.
And that is something I do not want to do, on both moral and
practical grounds. Morally, I prefer to share; practically, I
can't prevent it anyway, and I actually benefit from it (for example, I
get a lot of traffic from WebProNews, a regular regurgitator of my content).
I also recognize that even I use consolidation sites more than I
use original source sites: it's just easier to find the things I
am looking for (however, I do cite the original source if
I quote anything at all).
It still annoys me greatly when something I wrote and first
published here turns up in search engine results at some other
site. Damn it, *I* wrote it, they didn't. The search engines should
be sending the traffic to me, not to them.
I'd like to propose a solution: a simple tag that search engines
could recognize and that would identify the
original source. We content authors could make inclusion
of that tag a condition for republication on the web, and search engines
could cooperate by recognizing that tag and properly attributing the
source. For example, it might be as simple as an href pointing to
//aplawrence.com/Web/web_scrapers.html along with the notice "This hyperlink must
be included to republish this article on the Web."
If search engines understood the meaning of that, and would
redirect subsequent traffic to the real source, I'd be a lot happier.
Understand that I'm not talking about using this for "fair use"
quoting, and also that I'd expect search engines to show both
sources. For example, if I wrote an article about widgets that
was picked up by the Widgets Today site, a search engine that
had indexed that for certain terms could display both the
Widgets Today link and an "Original Source" link back here. That
would give full and proper credit and also give searchers a choice
as to what they wished to read.
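To make the proposal a little more concrete, here is a minimal Python sketch of how a crawler might act on such a tag and produce both a republished link and an "Original Source" link. It is only an illustration under stated assumptions: a plain href can't tell a crawler which of a page's many links is the original, so the sketch assumes a hypothetical marker, rel="original-source", on the required link. That attribute value, the names AttributionFinder and search_result_links, and the widgetstoday.example domain are all made up for this example and are not part of any search engine's actual behavior.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical marker: the proposal only requires the link itself, so this
# rel value is an illustrative assumption, not an existing standard.
ORIGINAL_SOURCE_REL = "original-source"


class AttributionFinder(HTMLParser):
    """Collects the href of every <a> tag marked as the original source."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated tokens, or be valueless (None).
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        href = attr_map.get("href")
        if ORIGINAL_SOURCE_REL in rel_tokens and href:
            # Resolve relative and protocol-relative hrefs such as
            # "//aplawrence.com/..." against the republishing page's URL.
            source = urljoin(self.page_url, href)
            if source not in self.sources:
                self.sources.append(source)


def search_result_links(page_url, html):
    """Return the (label, url) pairs a search engine could show for a page:
    the republished copy plus an "Original Source" link for each marked href."""
    finder = AttributionFinder(page_url)
    finder.feed(html)
    finder.close()
    links = [("Republished copy", page_url)]
    links.extend(("Original Source", src) for src in finder.sources)
    return links


if __name__ == "__main__":
    page = (
        "<p>An article about widgets...</p>"
        '<a rel="original-source" href="//aplawrence.com/Web/web_scrapers.html">'
        "Original article</a>"
    )
    for label, url in search_result_links("https://widgetstoday.example/widgets.html", page):
        print(f"{label}: {url}")
```

Ordinary links without the marker are ignored, which is the point of having some explicit tag: the search engine needs an unambiguous signal of which link is the original before it can show searchers both choices.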
See also Does RSS Imply Permission To Reuse Content?
If you know of any efforts in this regard or have other ideas, I'd love to
hear about them in the comments below. Thanks!
© 2012-07-19 Anthony Lawrence