March 4, 2016 2:51 AM / by Joshua Ballard
Estimated Read Time: 6 minutes + 17 minute recommended video
Not everyone pays attention to Google’s reCAPTCHA panels, but you have all experienced them.
They don’t always look exactly the same, but they are consistent nonetheless.
In fact, 7 days after I post this blog all commenters will need to complete one.
Why Recaptchas Are Important
The primary goal of a reCAPTCHA is to prevent automated bots from completing processes that should only be completed by a human being.
These tasks include:
- Posting Comments on Websites
- Creating Social Media Profiles
- Competing in Online Games, such as Poker
- Filling Out Forms on Websites
The internet would be a far less hospitable place if bots were allowed to run rampant and take control of huge portions of the web.
If you run a website, you know how annoying it is to see a comment on your new blog post, only to find it is actually a garbled blob of nonsense text linking to something ridiculous (such as get-rich-quick schemes, Viagra e-commerce stores and so on).
If you have ever used Facebook and kept being added by random strangers, you know how frustrating that gets.
Worse still, if you use dating apps such as Tinder, you will regularly come into contact with spam bots trying to send you to another website. Many users have complained about this persistent problem.
And if you enjoy a few bouts of online poker, you may be playing against a sophisticated AI that is much better at reading and calculating odds than you are.
reCAPTCHAs play a vital role in reducing spam bots’ ability to carry out these tasks.
Note, this just means better programmers are making better spam bots that can bypass more recaptchas… it can be a bit of a hamster wheel situation.
My Biggest Problem With Spam Bots
Working as an Inbound Marketer involves many tasks and processes.
Many of these tasks revolve around SEO.
Every SEO practitioner must decide for themselves if they are going to operate in a white hat manner, or a black hat manner.
Essentially, whether they are going to play by the rules or try to game the system.
Playing by the rules produces slower, more consistent and reliable results.
Gaming the system can produce fast, inconsistent and often short-lived results.
You will often hear hordes of business owners testifying that SEO just can’t be trusted as a long-term marketing strategy, and that the results could swing in the opposite direction at any moment.
More often than not, the results that suddenly turned on them were a by-product of black hat SEO.
Spam bots play a huge role in the black hat SEO world.
How Spam Bots Help Black Hat SEO
I’m going to break this down into two categories: the Unsophisticated Method and the Refined Method.
If you would like a more in-depth look at the black hat methodology, Tim Soulo from Ahrefs recently put up a great post that explains how this shadow economy operates.
The unsophisticated method consists of using bots to:
- Create Links from Blog Comments
- Create Links from Spam Directories
- Maliciously Hack Sites to Place Links
Google’s algorithms have grown more and more capable of identifying these practices, so these tactics are quickly caught out and rectified.
The more refined approach uses bots to create and manage Private Blog Networks, often referred to as PBNs.
A private blog network is a series of websites that exist solely to pass on authority to other websites.
Often they are built on expired domains that previously held authority.
When a domain expires, it is placed in a public auction. Specialised search engines then list the available domains along with metrics such as:
- how many links it has
- how it is already ranking
- trust metrics for the links it has
From there, the PBN owner can purchase an aged domain (age itself has a benefit) that already has links coming in from other trustworthy domains.
These networks are extremely hard for Google to detect, as they are designed to fly under the radar.
This doesn’t mean Google is not trying 😉
How bots help a Black Hat PBN:
- Updating and managing the websites
- “spinning content” so that it evades detection
How reCAPTCHA Could Be Modified to Help Detect PBNs
Spun content has no value whatsoever to a human reader.
When a person encounters spun content, they may think it is just extremely poor English, but more often than not it is actually the result of an unsophisticated bot reproducing content whilst attempting to avoid detection by text matching software.
One of the largest reasons spun content does not work properly is that it produces unidiomatic phrases that a human would never use.
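To see why swapping words is enough to slip past simple text matching, here is a minimal Python sketch. It assumes the matcher compares overlapping three-word sequences ("shingles") using Jaccard similarity, which is only one common approach and not necessarily what any particular tool uses. The function names and the sample sentences are made up for illustration.

```python
def shingles(text, n=3):
    """Return the set of n-word sequences (shingles) found in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two texts' shingle sets (0.0 = nothing shared)."""
    sa, sb = shingles(a), shingles(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


# Hypothetical sample sentences, chosen only to illustrate the point.
original = "cats are associated with agility and grace"
copied = "cats are associated with agility and grace"
spun = "felines are linked toward nimbleness and elegance"

print(jaccard(original, copied))  # a straight copy shares every shingle
print(jaccard(original, spun))    # the spun version shares none
```

The spun sentence reads badly to a person, yet a matcher like this one sees no overlap at all, which is exactly the trade the spinner is making.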
Languages employ idiom (the syntactical, grammatical, or structural form peculiar to a language) in a consistent way.
Essentially, the way we structure our sentences is not the only way we could logically construct them. The structure comes about naturally as part of a collective culture of acquired practice.
The Wikipedia example sums this up perfectly well:
“For example, although in English it is idiomatic (accepted as structurally correct) to say ‘cats are associated with agility’, other forms could have developed, such as ‘cats associate toward agility’ or ‘cats are associated of agility’.” (Wikipedia)
Since bots are machines, they do not pick up on the nuances of languages (though they are getting better every day).
Therefore, if we used a reCAPTCHA form which required the user to select the correct idiomatic sentence structure of a language, it would be able to detect bots.
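As a rough sketch of what such a challenge could look like, the Python below shuffles one idiomatic sentence in with unidiomatic variants and checks which one the user picked. Everything here (`IdiomChallenge`, `make_challenge`, `passes`) is a hypothetical name invented for this example. It is not part of Google's reCAPTCHA, and the sentences are the ones from the Wikipedia quote.

```python
import random
from dataclasses import dataclass


@dataclass
class IdiomChallenge:
    options: list  # sentences in the order they are shown to the user
    answer: int    # index of the idiomatic sentence within options


def make_challenge(idiomatic, variants, rng=None):
    """Shuffle the idiomatic sentence in with its unidiomatic variants."""
    rng = rng or random.Random()
    options = [idiomatic] + list(variants)
    rng.shuffle(options)
    return IdiomChallenge(options=options, answer=options.index(idiomatic))


def passes(challenge, selected_index):
    """True if the user picked the idiomatic sentence."""
    return selected_index == challenge.answer


challenge = make_challenge(
    "cats are associated with agility",
    ["cats associate toward agility", "cats are associated of agility"],
)
for number, sentence in enumerate(challenge.options):
    print(number, sentence)
```

With three options a random guess still passes one time in three, so a real version would presumably need several rounds or more options per round.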
A while ago I watched a very inspirational TED Talk.
You can watch his TED Talk below, running time just over 16 minutes:
The Origin of My Idea
When I discovered yet another competitor’s PBN this week, I was feeling down in the dumps.
From the research I completed, I could see that my options weren’t that great.
I could take one of the following options:
- Outwork them with white hat methodology, and trust the long game
- Tattle on them, registering a webspam complaint
- Accept that this is how it is done, and replicate it
I’ve always been a firm believer in white hat methodology, but it has always been frustrating to see black hat SEO reaping benefits and results.
I had never been more drawn to the dark side than this week.
I lay awake, contemplating:
- Will I simply join them and create a PBN?
- How would I do it?
- Would it work?
- What would be needed?
- Should I take that step?
I tried to think of ways that Google could identify a PBN, and wondered what they are already doing at the moment.
I ended up recalling this particular TED talk, and it occurred to me that a large-scale PBN relies heavily on spun content in order to maintain profitability (because creating quality content is just not cost-effective when you are running thousands of sites).
I started thinking about the idea of identifying spun content via a reCAPTCHA, essentially bringing millions of humans on board, one sentence at a time.
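To make the "one sentence at a time" idea a little more concrete, here is a minimal Python sketch of how individual human judgements might be pooled. It assumes each answer is a simple yes or no on whether a sentence looks unnatural. The class name, the minimum vote count and the threshold are all invented for illustration and are not taken from any real system.

```python
from collections import defaultdict


class SentenceVotes:
    """Pools human judgements on whether a sentence reads as unnatural."""

    def __init__(self, min_votes=5, threshold=0.7):
        self.min_votes = min_votes  # assumed: votes needed before deciding
        self.threshold = threshold  # assumed: share of "unnatural" votes to flag
        self._counts = defaultdict(lambda: [0, 0])  # sentence -> [unnatural, total]

    def record(self, sentence, looks_unnatural):
        """Store one person's answer for one sentence."""
        counts = self._counts[sentence]
        counts[0] += int(looks_unnatural)
        counts[1] += 1

    def is_flagged(self, sentence):
        """Flag a sentence once enough people agree it reads as unnatural."""
        unnatural, total = self._counts.get(sentence, (0, 0))
        if total < self.min_votes:
            return False
        return unnatural / total >= self.threshold


votes = SentenceVotes(min_votes=3, threshold=0.6)
for answer in (True, True, True):
    votes.record("cats are associated of agility", answer)
print(votes.is_flagged("cats are associated of agility"))
```

Requiring a minimum number of votes means one mistaken (or malicious) answer cannot flag a sentence on its own.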
I would love to generate a conversation on this, so feel free to contribute to the comments section below, or share your thoughts with these hashtags: #pbn #recaptcha