
Appropedia:Anti-spam and anti-vandalism/patterns


This page is for making notes about spam patterns, so we can figure out how to block them.

August 2011

Some spam I deleted today:

<span class="plainlinks">[http://www.cleanfresnocarpets.com<span style="color:black;font-weight:normal; text-decoration:none!important;background:none!important; text-decoration:none;">fresno carpet cleaning</span>]
<div style="font-size:100%; text-align: center;">
[http://cataclysmforum.ru/red/?k=prevacid&said=18_08_2011<u style="color:blue; font-size:32px;">CLICK HERE TO ORDER PREVACID ONLINE!</u>]
<ul style="color:green; font-size:24px;">
<li>Safe And Secure</li>
<li>Worldwide Shipping</li>
</ul>
[http://cataclysmforum.ru/red/?k=prevacid&said=18_08_2011<u style="color:blue; font-size:32px;">CLICK HERE TO ORDER PREVACID ONLINE!</u>]
<ul style="color:green; font-size:24px;">
<li>We Beat Any Price!</li>
<li>No Prescription Needed!</li>
</ul>
[http://cataclysmforum.ru/red/?k=prevacid&said=18_08_2011<u style="color:blue; font-size:32px;">CLICK HERE TO ORDER PREVACID ONLINE!</u>]
</div>
<br>
<br>

and

<div style="font-size:100%; text-align: center;">
[http://cataclysmforum.ru/red/?k=zinc&said=18_08_2011<u style="color:blue; font-size:32px;">CLICK HERE TO ORDER ZINC ONLINE!</u>]
<ul style="color:green; font-size:24px;">
<li>Safe And Secure</li>
<li>Worldwide Shipping</li>
</ul>
[http://cataclysmforum.ru/red/?k=zinc&said=18_08_2011<u style="color:blue; font-size:32px;">CLICK HERE TO ORDER ZINC ONLINE!</u>]
<ul style="color:green; font-size:24px;">
<li>We Beat Any Price!</li>
<li>No Prescription Needed!</li>
</ul>
[http://cataclysmforum.ru/red/?k=zinc&said=18_08_2011<u style="color:blue; font-size:32px;">CLICK HERE TO ORDER ZINC ONLINE!</u>]
</div>
<br>
<br>


For those last two, there was also a lot of text with an enormous number of <br> tags. That could be a spam signal, but on its own it would give false positives, e.g. when a genuine contributor pastes in appropriate content with poorly converted HTML markup. I think AbuseFilter lets us assign points to a suspicious pattern, so an edit that fits this pattern and another one can be blocked. (At the very least, AbuseFilter can flag it.) --Chriswaterguy 02:06, 18 August 2011 (PDT)
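To make the "points" idea concrete, here is a minimal sketch in plain Python (not AbuseFilter syntax; whether AbuseFilter supports scoring like this is not confirmed here). The threshold of 20 <br> tags, the second pattern and the score needed to block are all assumptions chosen for illustration.

```python
import re

# Assumed signals and thresholds, for illustration only.
BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)
ORDER_LINK = re.compile(r"CLICK HERE TO ORDER", re.IGNORECASE)

BR_THRESHOLD = 20   # hypothetical: this many <br> tags counts as "enormous"
BLOCK_SCORE = 2     # hypothetical: two independent signals are needed to block


def spam_score(text: str) -> int:
    """Return one point per suspicious pattern found in the text."""
    score = 0
    if len(BR_TAG.findall(text)) >= BR_THRESHOLD:
        score += 1
    if ORDER_LINK.search(text):
        score += 1
    return score


def decide(text: str) -> str:
    """Block on two or more signals, flag on exactly one, otherwise allow."""
    score = spam_score(text)
    if score >= BLOCK_SCORE:
        return "block"
    if score > 0:
        return "flag"
    return "allow"
```

With this shape, a genuine editor who pastes badly converted HTML full of <br> tags is only flagged, while the same markup combined with a "CLICK HERE TO ORDER" link is blocked.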

We're getting lots of Polish spam, often on user pages, e.g. this one. I see this one repeats itself a lot:

[http://www.przeprowadzki-podkarpackie.pl przeprowadzki Jaslo] Rzeszow to miasto w poludniowo-wschodniej Polsce, stolica wojewodztwa podkarpackiego  a takze diecezji rzeszowskiej. Rzeszow stanowi miasto na prawach powiatu, a takze jest siedzibe wladz powiatu rzeszowskiego. W miescie  sa sie  masowe  narodowe uczelnie wyzsze. Rzeszow  kladziony jest na pograniczu Pogorza Karpackiego i Kotliny Sandomierskiej nad rzeka Wislok. Studenci, ktorzy  rozmieszczaja nauke w Rzeszowie czesto  wymuszeni sa do korzys

--Chriswaterguy 22:25, 25 August 2011 (PDT)

I think we should block links with class="plainlinks" and other efforts to make links look like normal text as it's not a useful feature unless you are trying to spam. Also we can flag articles that use .pl links for a little while. Easier to spot spam that way? --08:48, 26 August 2011 (PDT)
That's an excellent start. Do you understand the AbuseFilter syntax enough to make a start with that? I'm happy to work with you on this - ping me via chat or whatever.
For any of these patterns, it would be nice to tone down the response if it's an autoconfirmed user. If Lonny (or any user that's lasted 10 edits and 10 days without being blocked) adds a link with .pl, no need to flag it.
Wikipedia's filters may offer some insights, too.
By the way, feel free to experiment wildly and recklessly, as long as the experimental filters don't block anyone :-). --Chriswaterguy 10:50, 26 August 2011 (PDT)
Roger roger. For now I'll settle for warn / flag as the action for any filter. I have to do some more looking into changing filters based on the age of a user's account. Is 10 edits / 10 days a good starting point? --Tahnok 11:05, 26 August 2011 (PDT)
AbuseFilter rules can be found at https://secure.wikimedia.org/wikipedia/mediawiki/wiki/Extension:AbuseFilter/RulesFormat --Tahnok 13:09, 26 August 2011 (PDT)
Thanks - rules url noted.
10 edits / 10 days should be fine - I'm not aware of any spammer that made it more than a day or two and 3 or 4 edits, so that leaves us some margin. When the spambots get smarter, we can tweak the rules. --Chriswaterguy 08:59, 28 August 2011 (PDT)
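As a rough sketch of the logic discussed in this thread (flag plainlinks and .pl links, but leave established users alone), the following plain Python may help when translating it into an actual filter. It is not AbuseFilter syntax. The `Editor` class and the function names are hypothetical; only the 10 edits / 10 days threshold and the two patterns come from the discussion above.

```python
import re
from dataclasses import dataclass

MIN_EDITS = 10      # threshold proposed in the discussion
MIN_AGE_DAYS = 10   # threshold proposed in the discussion

# class="plainlinks" makes an external link look like ordinary text.
PLAINLINKS = re.compile(r'class\s*=\s*"plainlinks"', re.IGNORECASE)
# An http(s) link whose host name ends in .pl (assumed pattern, host only).
PL_LINK = re.compile(r"https?://[^\s/\]<]+\.pl(?=[/\s\]<]|$)", re.IGNORECASE)


@dataclass
class Editor:
    """Hypothetical stand-in for whatever the wiki knows about the user."""
    edit_count: int
    account_age_days: int


def is_trusted(editor: Editor) -> bool:
    """True for users who have lasted 10 edits and 10 days."""
    return (editor.edit_count >= MIN_EDITS
            and editor.account_age_days >= MIN_AGE_DAYS)


def should_flag(editor: Editor, added_text: str) -> bool:
    """Flag plainlinks or .pl links, but only for new or anonymous editors."""
    if is_trusted(editor):
        return False
    return bool(PLAINLINKS.search(added_text) or PL_LINK.search(added_text))
```

The action here is only a flag, in line with the agreement that experimental filters should warn or flag rather than block.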

personal injury solicitor

Pretty sure that I saw personal injury solicitor link spam before. Edits to be flagged? --comment by Tahnok, {{{2}}}.

Agreed. See suggestions below. --Chriswaterguy 10:22, 7 November 2011 (PST)


Suggestion

How about an AbuseFilter filter that blocks any edit with the following criteria:

  1. New or anon editor
  2. Includes <a href=" or the BBCode equivalent, roughly the regex \[url.*\[/url\] (noting that the opening tag may be either [url] or [url=)
  3. Includes one of a number of key phrases. "personal injury," casino, nonchalantly, "This page is a really good article"... but there are lists for this: this one and I'm sure there are more up-to-date ones.

Just the first two might be enough, but it's possible that a new genuine editor could paste in raw HTML, so I suggest only blocking if it also has a suspicious key phrase.

It could be flagged if it includes either the HTML/BBCode link code or a suspicious word, but not both. Does that need a separate filter, or can we flag partial filter hits? --Chriswaterguy 10:22, 7 November 2011 (PST)
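The suggested criteria can be sketched as follows, in plain Python rather than AbuseFilter syntax, so it says nothing about whether a single filter can both block and flag. The function name and the `is_new_or_anon` parameter are hypothetical, and the phrase list is just the handful of examples mentioned above, not a real blacklist.

```python
import re

# Raw HTML link, or BBCode [url]...[/url] / [url=...]...[/url].
LINK_MARKUP = re.compile(
    r"<a\s+href\s*=|\[url[=\]].*?\[/url\]",
    re.IGNORECASE | re.DOTALL,
)

# Example phrases from the suggestion; a real list would be much longer.
KEY_PHRASES = (
    "personal injury",
    "casino",
    "nonchalantly",
    "this page is a really good article",
)


def classify(is_new_or_anon: bool, text: str) -> str:
    """Return "block", "flag" or "allow" for an edit's added text."""
    if not is_new_or_anon:
        return "allow"  # criterion 1: only new or anonymous editors are checked
    has_link = bool(LINK_MARKUP.search(text))          # criterion 2
    lowered = text.lower()
    has_phrase = any(p in lowered for p in KEY_PHRASES)  # criterion 3
    if has_link and has_phrase:
        return "block"
    if has_link or has_phrase:
        return "flag"   # partial hit: one signal, but not both
    return "allow"
```

A new genuine editor who pastes raw HTML without any of the key phrases is flagged rather than blocked, which matches the concern raised in the suggestion.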