Spam attacks! When? How? What? … in Ruby

May 13, 2007

Today I was wondering about the spam I receive daily. GMail does a great job of detecting it, which is why I decided to forward several of my polluted personal mailboxes to my GMail address. I wanted to learn more about this spam, and to do so quickly and with some fun. So I fired up my favorite Ruby IDE and installed a few Ruby gems: gmailer, activerecord and gruff (together with Mouraf’s patch to extend the legend, which was cut off when too long on Gruff::Pie).
Note that for the GMailUtils gem to run, you also need the ‘net/https’ library installed on your PC; otherwise you’ll end up with a mysterious:
irb(main):001:0> require 'gmailer'
LoadError: no such file to load -- gmailer
from (irb):1:in `require'
from (irb):1

Solution:
sudo apt-get install libopenssl-ruby
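Since the error message names gmailer rather than the library that is really missing, a quick way to see which require actually fails is to try each one separately. This is only a minimal sketch: the helper name loadable? is my own invention and not part of any gem.

```ruby
# Hypothetical helper: returns true if the library can be required,
# false if Ruby raises LoadError while loading it.
def loadable?(lib)
  require lib
  true
rescue LoadError
  false
end

# Check the dependency first, then the gem itself.
%w[net/https gmailer].each do |lib|
  puts "#{lib}: #{loadable?(lib) ? 'ok' : 'missing'}"
end
```

If ‘net/https’ shows up as missing, the package above is the likely fix.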
Why did I decide to use an API to get my mail when GMail allows POP3? Firstly, I wanted to play with this gem. Secondly, you can’t get your spam directly through GMail’s POP3 (you would need some label tricks to move it into your ‘Inbox’, something you could do in a first pass with GMailUtils). Finally, the API goes through HTTPS, thus bypassing the usual firewalls that block the POP3 port.
I wanted to know which language those f****** spams were in… So I decided to code (translate into Ruby, even!) a language detector (I am blogging about it here).
A few lines of code later, I had what I wanted to know (at least for the last month, as GMail only keeps a limited Spam buffer):
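To give an idea of what such a detector can look like, here is a toy sketch that simply counts common stop words per language. It is not the detector I translated, just one simple technique; the word lists and the name guess_language are assumptions made up for the illustration.

```ruby
# Hypothetical, deliberately tiny stop-word lists; a usable detector
# would need far larger ones (or a different technique altogether).
STOPWORDS = {
  english: %w[the and of to in is you that for with your],
  french:  %w[le la les et de des un une vous pour est dans]
}.freeze

# Returns the language whose stop words occur most often in the text,
# or :unknown when no stop word matches at all.
def guess_language(text)
  words = text.downcase.scan(/[[:alpha:]]+/)
  scores = STOPWORDS.transform_values do |list|
    words.count { |w| list.include?(w) }
  end
  best, hits = scores.max_by { |_, n| n }
  hits.zero? ? :unknown : best
end

puts guess_language("You can buy the best watches for your wife")
```

Short subjects with no stop words come back as :unknown, which is probably the honest answer for a lot of spam.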

[Charts: Pie Language · Chart Hour · Pie Semantic · Monthly Spam · Bars WeekDay Spam]

What to conclude from this? Well, actually nothing much… Except that 91% of the spam is in English (followed by 7% in French, which is normal as I am French and have a French email address forwarded to my GMail). 34% of the pollution concerns viagra, sex… and 16% is about watches. As for the distribution of spam over time, I thought I would find a clearer pattern; it only seems that I receive more spam on Thursdays.
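For the curious, the weekday figures boil down to a simple tally. The sketch below assumes the message dates are already available as plain date strings; the sample dates and the name weekday_share are invented for the example and are not my real data.

```ruby
require 'date'

# Hypothetical helper: maps each weekday name to its share of the
# messages, as a rounded percentage.
def weekday_share(dates)
  counts = Hash.new(0)
  dates.each { |d| counts[Date.parse(d).strftime('%A')] += 1 }
  total = dates.size.to_f
  counts.transform_values { |n| (100 * n / total).round }
end

# Made-up sample dates, not taken from my mailbox.
sample = %w[2007-05-10 2007-05-10 2007-05-11 2007-05-07]
weekday_share(sample).each { |day, pct| puts "#{day}: #{pct}%" }
```

Each share can then be handed to a charting library such as gruff to draw the bars.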

One comment

  1. To tell you the truth, I’ve been trying to figure out some of this info here. While I have no idea how to use this code I can tell from your graphs that I’m getting the same type of incoming garbage as you. And I’ve noticed just from observation that there are certain days where I get more of this garbage. I’ve noticed that when I visit certain blogs with ads (even hosted on WordPress) that I get more spam to my Gmail account. I’ve been watching because my Akismet thing is up to like 40 some thousand now. Insane.
