An Interesting Social Media Question
The Market Ticker ® - Commentary on The Capital Markets
Posted 2012-04-29 13:35
by Karl Denninger
in Technology

This weekend I was going through the logs for The Market Ticker and Tickerforum, and noted that a few rogue "robots" had managed to come at the site with some very high-rate download requests (basically "spidering" the site) while evading the general rules and policies of good conduct on the Internet.

That is, they were essentially grabbing pages as fast as the system could return them (which is very fast), a rather anti-social practice.

Forum spam and data-mining have become a rather serious problem over the last couple of years and the volume of this sort of crap has increased substantially, with the worst offenses coming from (no surprise) places like China, India and (surprise!) Europe.

Tickerforum has a number of "defensive" measures it employs to stop abusive practices; it is able, for example, to detect "robotic" attempts to register accounts (a common spammer tactic; register an account and then spam away, all automatically without a human involved), and will "ip ban" the source addresses where such attempts come from.  There are also a number of well-placed "Honey Pot" traps that will never be seen by a human, but a robot will find them, and if they're followed that draws an instant ip ban as well.
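
As a rough illustration of the honey pot idea (not the actual Tickerforum code, which is not shown here), a trap can be sketched as a URL that no human would ever reach, where any request for it bans the source address. The path `/trap/do-not-follow`, the class name `HoneypotGuard` and the status codes are assumptions made up for this sketch.

```python
# Hypothetical trap path: linked from pages in a way a human never sees
# (for example a hidden link), so only a robot following every link finds it.
TRAP_PATHS = {"/trap/do-not-follow"}


class HoneypotGuard:
    """Bans any address that requests a trap path (illustrative only)."""

    def __init__(self):
        self.banned = set()

    def handle(self, ip, path):
        """Return an HTTP-style status code for one request."""
        if ip in self.banned:
            return 403  # already banned: refuse everything
        if path in TRAP_PATHS:
            self.banned.add(ip)  # followed the trap: instant ban
            return 403
        return 200  # ordinary request, serve as usual


guard = HoneypotGuard()
print(guard.handle("203.0.113.5", "/forum/index"))         # normal page
print(guard.handle("203.0.113.5", "/trap/do-not-follow"))  # trips the trap
print(guard.handle("203.0.113.5", "/forum/index"))         # now refused
```

A real deployment would also want bans to persist across restarts and to expire eventually, since addresses get reassigned.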

Nonetheless this is basically a cyber arms race, where the malefactors develop their weapons, you improve your shields, they rotate their phaser frequencies a bit, you rotate your shields, and so on.  I've done pretty well keeping most of these guys out -- many other forums and discussion areas on the web, not so much.

So this weekend I added a "rate monitor" tracking system that is entirely content-insensitive and adaptive, with the intent that it should automatically detect attempts to play the game I had caught and squash them in real time, barring subsequent attempts by blocking the originating IP address.  But after coding up a first pass and turning it on, then watching the results for a while, I found something else interesting in the logs it generated.
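
A minimal sketch of what a content-insensitive rate monitor could look like, assuming a simple sliding window per address. It looks only at who is asking and how often, never at what is requested. The class name `RateMonitor`, the threshold and the window length are assumptions for illustration, and the fixed threshold here does not attempt to model the adaptive part.

```python
from collections import defaultdict, deque


class RateMonitor:
    """Sliding-window request counter per IP (illustrative sketch)."""

    def __init__(self, max_hits=20, window=10.0):
        self.max_hits = max_hits          # assumed limit per window
        self.window = window              # window length in seconds
        self.hits = defaultdict(deque)    # ip -> recent request times
        self.blocked = set()

    def allow(self, ip, now):
        """Record a request at time `now`; return False if it is blocked."""
        if ip in self.blocked:
            return False
        recent = self.hits[ip]
        recent.append(now)
        # Forget requests that have aged out of the window.
        while recent and now - recent[0] > self.window:
            recent.popleft()
        if len(recent) > self.max_hits:
            self.blocked.add(ip)          # too fast: block from here on
            del self.hits[ip]
            return False
        return True


monitor = RateMonitor(max_hits=3, window=1.0)
for t in (0.0, 0.1, 0.2, 0.3):
    print(t, monitor.allow("203.0.113.5", t))
```

Passing the clock in as `now` keeps the sketch easy to test; a live server would pass something like `time.monotonic()`.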

There's a new "kid in town" in the spidering game -- and it's not the ordinary sort of "search engine."  These are data miners who are not looking to provide a public service such as Google or Yahoo does, but rather are spidering sites for the purpose of selling "brand management."

This is something I believe that online forum operators need to think about from a policy perspective.  After all, we have a database of messages that has value (otherwise nobody would come visit us, right?)  These folks are mining our data and then selling this data to third parties -- not making it available for free, but rather marketing the intelligence they gather -- without compensating us!

I have a problem with that and if you run one of these sites you should too.

If you do a Google search for "Web Reputation Management" you will find a huge number of firms that are plying this trade. I wondered where they were getting the data they were using -- whether they were, when contracted, going out and performing searches at that time, then doing their thing to "counter" whatever the client might not like.

That's not their model, folks.

They are data mining your site on a continual basis and then selling that data to other people.  They are adding to your system and network load on a continual basis and yet you receive zero value from these "hits" as they are not people, they're robots and simply grabbing your data store and analyzing it for their paying customers.

You ought to be getting paid for this; you're certainly not getting value in-kind as you do with real humans -- whether by whatever subscription or donation model you may have, from viewed advertising or from that person's contribution to the discussions underway. 

Nope -- these folks are doing nothing other than taking and reselling your data, without your consent and in many cases in direct violation of your terms of service and compilation copyrights. And they're not doing it to provide a search function to the public at no charge -- they're doing it to compile a proprietary database of information they intend to keep and sell.

I'm going to implement a means to block this sort of thing here, since I've now found that I can identify the firms doing it with a high degree of success.   And I believe that others who run sites similar to this one should do so as well.

Let's put a stop to this sort of business model.

Comments.......
User Info An Interesting Social Media Question in forum [Market-Ticker]
Christiangustafson
Posts: 4140
Incept: 2007-06-27
Green
Helping Hand Acceptance Corporation
Banned
Could we please bring back the 1995 Internet?

I could do without everyone who got online since then.

----------
It is therefore, on opinion only that government is founded... -- Hume
Emdeplam
Posts: 2046
Incept: 2008-01-10
Silver
When I post to sites I lose the ability to monetize that IP. When those sites choose open, public access they often lose it as well.
Genesis
Posts: 130663
Incept: 2007-06-26
Admin A True American Patriot!
You give in exchange for value Emd -- you are a participant in the community.

Open access is intended for people. Data-mining is not, and yet the convention on the 'Net is that we all permit it to happen for search purposes, so people can find what we say.

I have no quarrel with someone coming to me and asking for such access, intending to spider the site in exchange for something. If the "something" is free, public access to the search results, I've no problem with it -- that's why Bing, Google, Baidu and similar are not blocked around here. Indeed the robots.txt file is there for this purpose and is defined for the express purpose of allowing webmasters to control what is and is not spidered and by whom, should they so choose.

These folks are not only ignoring the robots file and good practice, they're not spidering for the purpose of making their results available to the public.
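
For readers who have not looked at a robots file, here is a sketch of the kind of policy described: public search spiders allowed, everyone else asked to stay out. The file contents are an illustration, not this site's actual robots.txt, and `SomeMinerBot` is a made-up agent name. It is checked with Python's standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Illustrative policy only: an empty Disallow means "everything allowed".
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: bingbot
Disallow:

User-agent: Baiduspider
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(agent, url):
    """True if the sample policy permits `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)


url = "http://example.com/forum/some-thread"  # placeholder URL
for agent in ("Googlebot", "bingbot", "SomeMinerBot"):
    print(agent, may_fetch(agent, url))
```

Note that the file is purely advisory. A well-behaved spider performs this check itself; one that ignores it has to be caught some other way.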

----------
I don't care if it makes sense -- only if it makes money. -- Me
Bank (n): See scam, fraud and theft. Eat a bankster -- they're low-carb.
What part of "shall not be infringed" was unclear?

Bertdilbert
Posts: 2651
Incept: 2008-12-22
Gold
CA
Is there a way to catch the robot and divert it to a useless database?

----------
Dear Euroland: Relax, Germany has a plan for your money!

Political Capital Defined: We are out of money but will tax our citizens for whatever it takes to "SAVE" the Euro.
Genesis
Posts: 130663
Incept: 2007-06-26
Admin A True American Patriot!
Yes, I could catch the request and divert it to something other than the requested content if I wanted to.

One of the pleasures of never storing anything in a flat file (that is, everything goes through the AKCS software) is that it can do whatever it wants with the request.

The only reason I caught this was that they were less-than-judicious in their exploitation of the resource. That is, they presented an unreasonable load and it stood out like a big middle finger on my traffic graph, prompting me to go look for the reason. I expected a DDOS attack -- what I found instead was a high-rate spider that was attempting to grab the entire publicly-accessible (no-sign-in required) area of the forum, and it got stopped by one of the honey pots. The problem for them is that it left traces all over the place as to who they were and what they were doing, and I subsequently found a bunch of others doing the same thing.

Sadly for them I now know how to detect and trap them..... and have.

----------
I don't care if it makes sense -- only if it makes money. -- Me
Bank (n): See scam, fraud and theft. Eat a bankster -- they're low-carb.
What part of "shall not be infringed" was unclear?

Bertdilbert
Posts: 2651
Incept: 2008-12-22
Gold
CA
Divert it then make it go REAL SLOW...

----------
Dear Euroland: Relax, Germany has a plan for your money!

Political Capital Defined: We are out of money but will tax our citizens for whatever it takes to "SAVE" the Euro.
Analog
Posts: 542
Incept: 2010-12-29
Gold
arkansas ozarks
Can you send them to a place that'll give them blacole virus?



Genesis
Posts: 130663
Incept: 2007-06-26
Admin A True American Patriot!
smiley

"If request from jackass, go ahead and return it -- after delaying 10 seconds."

Analog: Yes, I can redirect, but this is a ROBOT, not a person with a browser, so they won't see anything.
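
The "return it after delaying 10 seconds" idea can be sketched in a few lines. This is an assumption-laden illustration, not the AKCS implementation: `respond`, `render` and the `flagged` set are hypothetical names, and the sleep function is passed in so the behaviour can be checked without actually waiting.

```python
import time


def respond(ip, render, flagged, delay=10.0, sleep=time.sleep):
    """Serve the page, but make flagged addresses wait first.

    `render` is any zero-argument callable producing the response body;
    `flagged` is a set of addresses already identified as abusive robots.
    """
    if ip in flagged:
        sleep(delay)  # the robot still gets its page, just very slowly
    return render()


flagged = {"203.0.113.9"}
waits = []  # record requested delays instead of really sleeping
print(respond("203.0.113.9", lambda: "page", flagged, sleep=waits.append), waits)
print(respond("198.51.100.1", lambda: "page", flagged, sleep=waits.append), waits)
```

One design caveat: a blocking sleep ties up a server worker for the whole delay, so on a busy site the slow-down costs the operator something too.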

----------
I don't care if it makes sense -- only if it makes money. -- Me
Bank (n): See scam, fraud and theft. Eat a bankster -- they're low-carb.
What part of "shall not be infringed" was unclear?

Stinkydrunk
Posts: 757
Incept: 2008-04-12
Green
SE MI
So if a topic gets discussed on Tickerforum, say about something obscure like "Phase 41 space modulator shotgun", and I search Google for that phrase, will that TF discussion show up in the results?

----------
If the generally accepted meaning of the word marriage can be redefined, so can "keep and bear" or "freedom of speech" or anything else in the Constitution.

Ignoring: mpilar, landshark, agau, dbcooper
Genesis
Posts: 130663
Incept: 2007-06-26
Admin A True American Patriot!
If it's in a place where you don't need to log in, yes. Those where you need to sign in to see them, no.

That's part of the general web infrastructure. What I object to is companies that spider the site and then sell that data -- they have no right to do that and in addition the reason they're doing it is for things like "reputation management" and "what has this job applicant posted."

We (webmasters) all allow this sort of spidering and such to go on for search purposes, but when it's for private exploitation for the latter sort of purpose I have a problem with it.

----------
I don't care if it makes sense -- only if it makes money. -- Me
Bank (n): See scam, fraud and theft. Eat a bankster -- they're low-carb.
What part of "shall not be infringed" was unclear?

Genesis
Posts: 130663
Incept: 2007-06-26
Admin A True American Patriot!
BTW if you want to know how FAST bots like Google catch this stuff....
Google Search, just now wrote..
An Interesting Social Media Question in [Market-Ticker]
market-ticker.org/akcs-www?post=205336
9 posts - 5 authors - 34 minutes ago
One of the pleasures of never storing anything in a flat file (that is, everything goes through the AKCS software) is that it can do whatever it ...

Note when I posted that comment on this thread, and how quickly Google grabbed it. I searched for "One of the pleasures of never storing anything in a flat file" -- a rather ordinary statement -- and this was the top return.

Google is very good at this, they're always in the "active user" list, and they are never obtrusive about it either. I have no objection to Google doing what they do, as their primary use is for search and index, which is fine.

In the retail world register scan data is routinely sold, sliced and diced. But the retailers get paid for that data, as well they should. It's theirs.

That which is used and put back where the public can get at it has a fair value exchange. What I'm arguing here is that what these folks are doing has no value exchange -- it's a literal "grab" without compensation and could in fact be used to harm the people who use the site(s) in question, and you have no way to know how it's being used, who has access to it or on what terms.

----------
I don't care if it makes sense -- only if it makes money. -- Me
Bank (n): See scam, fraud and theft. Eat a bankster -- they're low-carb.
What part of "shall not be infringed" was unclear?

Supertruckertom
Posts: 215
Incept: 2010-11-07
Green
USA
less than 30 minutes........WOW.
So are you going to sell your algorithm or keep it to yourself?
This stuff is fascinating to me.
I've been in some huge data centers and telco switches here in ATL and I know the activity is all around me, just unseen.
One place has a 120 megawatt substation to feed it.
Multiple terabit fibers coming in from 56 Marietta St. to 1033 Jefferson St.
Skynet just isn't self aware yet.


----------
What I do is fairly simple.
People need their stuff.
It is my job to get it to them.

Reason: spelling, add a coma
Genesis
Posts: 130663
Incept: 2007-06-26
Admin A True American Patriot!
Yep.

"Anything you say can and will be used against you." Indeed.

But as long as it's for public search, who cares? You said it, right?

The problem comes when you have dozens, hundreds or thousands of firms out there doing this sort of thing for private, pecuniary interest, they're all hitting your systems and adding to your bill, then selling their "results" to people and you get nothing out of it -- not visibility, not more hits, not advertising exposure -- nothing.

If I can detect and block that (and I can), I will.

----------
I don't care if it makes sense -- only if it makes money. -- Me
Bank (n): See scam, fraud and theft. Eat a bankster -- they're low-carb.
What part of "shall not be infringed" was unclear?
Mayorquimby
Posts: 13907
Incept: 2008-09-18
Green
The Archaic Past
Who has the time to program all these attack programs, implement them and WHY?

----------
They who wish to hurt you, work within the law.
- Morrissey

Gold is theft.
Flappingeagle
Posts: 1224
Incept: 2011-04-14

It is too bad you don't have some program that can subtly change the content before it is distributed to those who are taking your data for their own private gain. All it would take would be inserting a "not" or a "never" in most sentences to reverse their meaning.

Perhaps even better would be a subtle redirect to a competitor's site so that they end up mining each other. It probably is a good thing that I don't have a site like yours and the computer savvy to go along with it; I would lie awake at night thinking of ways to screw those guys.

Flap

----------
Here are my predictions for everyone to see:
S&P 500 at 320, DOW at 2200, Gold $300/oz, and Corn $2/bu.
"You can't build a house of cards on a shaking table." - Tony Johns
The January 2015 AMZN put at $130 (cost $4.25) will be a winner.
Jotapay
Posts: 16721
Incept: 2008-08-26
Silver
Austin, Tx
Knowing that many types of things like this have been going on for a while now, I reduced my exposure to the internet a couple of years ago. I simply don't post much content at all there any more. TickerForum is the only place I post anything of any value any more and even that is heavily filtered. Anything else is randomized and very generic.

I saw the effect of all this when I donated to Debra Medina. Instantly I started getting showered with mail from every ******n republican and charity in the world. It still hasn't stopped. I know I'm still going to be in many databases, but I try and minimize that as much as possible. I just find it distasteful and creepy that they're constantly spying on me, never mind that I don't see a penny for the thousands of dollars in revenue that I generate for these companies every year.
Jotapay
Posts: 16721
Incept: 2008-08-26
Silver
Austin, Tx
Quote:
Who has the time to program all these attack programs, implement them and WHY?


Once you code up a basic framework with all the methods that you will use, it's pretty simple from that point to tweak it and add a module here and there.
Jotapay
Posts: 16721
Incept: 2008-08-26
Silver
Austin, Tx
Quote:
It is too bad you don't have some program that can subtly change the content before it is distributed to those who are taking your data for their own private gain.


There's more to it than that and you have to question whether it's worth the time. If you wanted to **** with them, you would have to accurately guess what they're analyzing (their algorithm). So you would need to change the content in such a way that it ****s up their algorithm in the way you intended. Your selected inputs (the changes you make to the content) would need to be correct. Then you would need to decide that the time you spend doing this is more interesting than going to the beach or something else.

Genesis
Posts: 130663
Incept: 2007-06-26
Admin A True American Patriot!
Blocking them is easier...

----------
I don't care if it makes sense -- only if it makes money. -- Me
Bank (n): See scam, fraud and theft. Eat a bankster -- they're low-carb.
What part of "shall not be infringed" was unclear?
Cobra2411
Posts: 10335
Incept: 2007-06-26
Gold A True American Patriot!
Philly P.a.
Return the text of War & Peace one sentence at a time...

----------
To err is human. To really **** things up takes government.
Dtlgc
Posts: 935
Incept: 2007-11-26
Green
Texas
Gen, are they using robot names, or just IP addresses?

My web server just shuts down the return of data to "bad" robots after 100 successive requests in x-number of seconds...
Imustbenutz
Posts: 283
Incept: 2010-11-04
Green
Absurdistan, USSA
Are these robots capable of getting into the database?

I, too, am interested in a server plug-in or site plug-in to thwart those bastards. How much will it cost and when will the beta version be ready?
Jstanley01
Posts: 8171
Incept: 2008-07-30
Silver A True American Patriot!
San Antonio, Texas
The problem is, the good guys don't share information. You block an IP on behalf of TF, but maybe no one else does. So the bad guys stay in business with it. A subscription service for website owners that attacks the spammers and robber bots at the root is what's needed. I smell a startup.

----------
You can't cheat an honest man. ~P.T. Barnum
Tienkou
Posts: 4225
Incept: 2007-09-09
Green
Connecticut
Anyone that is curious should look up the Metasploit Project.

----------
Barack Hussein Obama - The last President of the First American Revolution.
The US Congress has abdicated its role as a governing body.

The most dangerous man is the one with nothing left to lose. Our government is making more of them everyday.