Spam attacks! When? How? What? … in Ruby

May 13, 2007

Today I was wondering about the spam I receive daily. GMail does a great job of detecting it, which is why I decided to forward several of my polluted personal mailboxes to my GMail address. I wanted to know more about this spam, and I wanted to do it quickly and have fun along the way. So I opened my favorite Ruby IDE and installed some Ruby gems: gmailer, activerecord, and gruff (together with Mouraf’s patch to extend the legend, which was cut off when too long on Gruff::Pie).
Note that for the GMailUtils gem to run, you also need the ‘net/https’ library installed on your PC; otherwise you’ll end up with a mysterious:
irb(main):001:0> require 'gmailer'
LoadError: no such file to load -- gmailer
from (irb):1:in `require'
from (irb):1

Solution:
sudo apt-get install libopenssl-ruby
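A quick way to see which piece is actually missing is to require each library separately and catch the LoadError. This is only a minimal sketch: the helper name `library_available?` is made up for illustration and is not part of gmailer.

```ruby
# Hypothetical helper (not part of any gem): true when `name` can be required.
def library_available?(name)
  require name
  true
rescue LoadError
  false
end

# Check the dependency first, then the gem itself.
%w[net/https gmailer].each do |lib|
  status = library_available?(lib) ? 'ok' : 'missing'
  puts "#{lib}: #{status}"
end
```

If ‘net/https’ shows up as missing, the package above is the likely fix on a Debian-style system.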
Why did I decide to use an API to get my mail when GMail allows POP3? First, I wanted to play with this gem. Second, you can’t fetch your spam directly through GMail’s POP3 (you’d need some label tricks to put it in your ‘Inbox’ first, something you could do in a first pass with GMailUtils). Finally, the API goes through HTTPS, thus bypassing the usual firewalls that block the POP3 port.
I wanted to know which language those f****** spams were in… So I decided to code (translate into Ruby, even!) a language detector (I am blogging about it here).
So, some lines of code later, I had what I wanted to know (at least for the last month, as GMail only keeps a limited buffer of spam):
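To give an idea of what a language detector can look like, here is a deliberately tiny sketch based on counting common stopwords. It is an assumption made for illustration, not the detector described in the other post: the word lists, the `STOPWORDS` constant and the `guess_language` method are all invented here.

```ruby
# Tiny stopword lists; illustrative only, a real detector needs far more data.
STOPWORDS = {
  'english' => %w[the and of to in is you for your with],
  'french'  => %w[le la les et de des est vous pour votre]
}.freeze

# Returns the language whose stopwords appear most often, or nil if none match.
def guess_language(text)
  words = text.downcase.scan(/[[:alpha:]]+/)
  scores = STOPWORDS.map do |lang, stops|
    [lang, words.count { |w| stops.include?(w) }]
  end
  best_lang, best_score = scores.max_by { |_, score| score }
  best_score.zero? ? nil : best_lang
end

puts guess_language('Buy the best watches for you and your friends')
puts guess_language('Achetez les meilleures montres pour vous et votre famille')
```

Short subjects with no stopwords at all come back as nil, which is probably the honest answer for a lot of spam.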

[Charts: Pie Language, Chart Hour, Pie Semantic, Monthly Spam, Bars WeekDay Spam]

What to conclude from this? Well, actually nothing much… except that 91% of the spam is in English (followed by 7% in French, which is normal as I am French and have a French email address forwarded to my GMail). 34% of the pollution concerns Viagra, sex…, and 16% is about watches. As for the distribution of spam over time, I expected to find a more noticeable pattern; it only seems that I receive more spam on Thursdays.
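For reference, percentages like these boil down to counting and dividing. The sketch below shows one way to do it under stated assumptions: the `SAMPLE` data is made up (it is not the real spam), and the `percentages` method is a hypothetical helper, not the code behind the charts.

```ruby
require 'time'

# Hypothetical sample: [received_at, topic] pairs standing in for parsed spam.
SAMPLE = [
  ['2007-05-10 09:15', 'pharma'],
  ['2007-05-10 22:40', 'watches'],
  ['2007-05-11 03:05', 'pharma'],
  ['2007-05-13 14:30', 'other']
].map { |stamp, topic| [Time.parse(stamp), topic] }

# Share of each distinct item as a rounded percentage of the total.
def percentages(items)
  counts = Hash.new(0)
  items.each { |item| counts[item] += 1 }
  total = items.size.to_f
  counts.transform_values { |n| (100 * n / total).round }
end

by_topic   = percentages(SAMPLE.map { |_, topic| topic })
by_weekday = percentages(SAMPLE.map { |time, _| time.strftime('%A') })

puts by_topic.inspect
puts by_weekday.inspect
```

The resulting hashes have the label-to-value shape that a pie or bar chart needs.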