Total Google searches: 5346

January 1, 2008

That’s the number of searches I did using Google in the last 6 months (since they set up their Web History around June). Can you imagine that? More than 10,000 searches a year, 27 a day, about 1 an hour…

Now you have an idea of Google’s success. Google’s search engine is the thing I use the most, together with… Gmail, Google Reader…

Of course I am not a typical web user, since 1) I work in IT and 2) I am damn curious and need to find what I am looking for.

Now this raises several questions:

  • Is this number (5346) accurate? Didn’t I search more… or less?

The counter can’t have missed many queries (i.e. when I am not logged in), since I nearly always have a live Google session open with Gmail in some other tab while I am on the net. On the other hand, this number could probably be cut in half, because many of those searches needed several refinements, each one incrementing the counter (e.g. “m17n windows”, “notepad m17n compliant”, “m17n editor”, …).

  • Is this web history useful? For me? For Google?

For me, not really… All searches lead to Rome anyway, so why would I need a very specific search I did 5 months ago? The only time it was useful was now, to realize how much I was using Google.

For Google… it’s a damn mine of information (one more).

Oh, by the way, I only clicked on 4 sponsored links in 6 months, which gives a click ratio of about 0.07%. But maybe that is because I block ads. Fuck ads! Too bad such useful services as Google live on ads only…


I will not violate Demeter’s Law!

October 28, 2007

I find “law” the wrong term when applied to software design. I prefer the term “best practice”.

I will not violate Demeter’s Law!
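For those unfamiliar with it, the Law of Demeter says an object should only talk to its immediate collaborators, not reach through them. A minimal Ruby sketch (the class names are my own invented example, not from any real codebase):

```ruby
# Violates the Law of Demeter: the invoice reaches through the
# customer into its wallet-like internals.
class Invoice
  def initialize(customer)
    @customer = customer
  end

  def charge(amount)
    @customer.account.withdraw(amount)   # chained reach-through
  end
end

# Complies: the customer exposes the operation itself.
class Customer
  attr_reader :balance

  def initialize(balance)
    @balance = balance
  end

  def pay(amount)
    @balance -= amount
  end
end

class BetterInvoice
  def initialize(customer)
    @customer = customer
  end

  def charge(amount)
    @customer.pay(amount)   # one dot: talk only to your neighbour
  end
end
```

Which, “law” or not, is simply the better practice: BetterInvoice keeps working no matter how Customer stores its money internally.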


Rails Wanted Dead or Alive: $2,500,000!

July 10, 2007

Wanted Rails

Rails’ success is not debatable. But it was not an easy battle (and still isn’t), and some projects are trying to keep the flock inside the JEE fold.

Rails SLOC 2004-2007

This represents the SLOC (Source Lines Of Code) of Rails since it was born 3 years ago.
What can we say about this? Basically, not much from SLOC alone (see the SLOC part of a previous post)… except that the growth is regular (which can be explained by the fact that it is backed by a company, in contrast to open source projects whose community of committers may suddenly grow).
Now sloccount gives us a summary:

Total Physical Source Lines of Code (SLOC) = 74,385 (100% Ruby)
Development Effort Estimate, Person-Years (Person-Months) = 18.45 (221.45)
(Basic COCOMO model, Person-Months = 2.4 * (KSLOC**1.05))
Schedule Estimate, Years (Months) = 1.62 (19.46)
(Basic COCOMO model, Months = 2.5 * (person-months**0.38))
Estimated Average Number of Developers (Effort/Schedule) = 11.38
Total Estimated Cost to Develop = $ 2,492,879
(average salary = $56,286/year, overhead = 2.40)
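Out of curiosity, the Basic COCOMO arithmetic in that summary can be reproduced in a few lines of Ruby, using the constants sloccount reports above:

```ruby
# Basic COCOMO model, with the constants reported by sloccount above
ksloc           = 74.385                           # thousands of SLOC
person_months   = 2.4 * ksloc ** 1.05              # effort, ~221.45
schedule_months = 2.5 * person_months ** 0.38      # schedule, ~19.46
developers      = person_months / schedule_months  # ~11.38
cost            = person_months / 12 * 56_286 * 2.40  # salary * overhead

puts "Effort:     #{person_months.round(2)} person-months"
puts "Schedule:   #{schedule_months.round(2)} months"
puts "Developers: #{developers.round(2)}"
puts "Cost:       $#{cost.round}"
```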

So the reward for Rails is $2,492,879!
And that is based on a 2004 salary…

Now if you want to have some fun and get some stats from your favourite svn repository: go get Ruby, go get Gruff, go get sloccount, go get a beer, and relax!
Sorry for the formatting, but WordPress really doesn’t help me…
# railssloccount.rb
# July 10, 2007

require 'date'
require 'rubygems'
require 'gruff'

DIR_DOWNLOAD = '/home/username/railssloccount/'
REPOSITORY = 'http://dev.rubyonrails.org/svn/rails/trunk'
GRAPH_SIZE = 800
STEP = 15
FIRST_DATE = '2004-11-29'

axis = {}
datas = []
index = 0
last_date = nil

# Initial checkout at the first revision
puts "svn checkout --revision {#{FIRST_DATE}} \"#{REPOSITORY}\" #{DIR_DOWNLOAD}trunk"
`svn checkout --revision {#{FIRST_DATE}} "#{REPOSITORY}" #{DIR_DOWNLOAD}trunk`

(Date.parse(FIRST_DATE)..Date.today).step(STEP) do |date|
  axis[index] = date.to_s
  index += 1
  last_date = date

  # Bring the working copy to the state it had at that date
  puts "svn update --revision {#{date}} #{DIR_DOWNLOAD}trunk"
  `svn update --revision {#{date}} #{DIR_DOWNLOAD}trunk`

  # Run sloccount and keep the raw report for later
  puts "sloccount #{DIR_DOWNLOAD}trunk"
  result_cmd = `sloccount #{DIR_DOWNLOAD}trunk`
  File.open("#{DIR_DOWNLOAD}sloccount_#{date}.txt", "w") { |f| f << result_cmd }

  # Extract the Ruby SLOC figure from the report
  result = result_cmd[/ruby:\s*(\d+)/, 1].to_i
  puts "Date: #{date} - SLOC: #{result}"
  datas << result
end

axis[index - 1] = last_date.to_s

g = Gruff::Line.new(GRAPH_SIZE)
g.data('Rails SLOC', datas)
g.labels = axis
g.minimum_value = 0
g.title = 'Rails SLOC'
g.write("#{DIR_DOWNLOAD}rails_sloc.png")


A Controller DSL to Complement MVC

July 3, 2007

Bruce Williams has been working on the view pattern over the past years, notably the problems it raises: heaviness, conciseness, complexity, depending on the solution you choose. The presentation he gave at RailsConf, When “V” is for “Vexing”, introduced a controller DSL: folder_for. Bruce gave an exhaustive use case of it on his blog.
Starting from a simple example that needs tabbed folder interfaces, he naturally chose to use a DSL: folder_for.

 class CarsController < ApplicationController

    # ...
    folder_for :show do
      tab "General Information" do
        @score = current_user.score_for_car(@car)
      end
      tab "History"
      tab "Photos"
    end
    # ...
 end


Bruce uses convention over configuration: a single view template (cars/show.html.erb) renders whichever tab partial is selected. His views will then be structured this way:
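For illustration, assuming one partial per tab named after its title (the exact file names are my guess, not Bruce’s), the views directory would look something like:

```
app/views/cars/
    show.html.erb
    _general_information.html.erb
    _history.html.erb
    _photos.html.erb
```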


Bruce doesn’t just bring a new plugin (it is more like sample code); he showed that a hundred lines of code were enough to accomplish his task, demonstrating that a DSL can be deployed effectively with Ruby. The code is cleaner, even if it breaks some rules:

There are likely several of you reading this that are disturbed to see a folder reference within a controller, as this amounts to some level of MVC “separation of concerns” blasphemy in your very strict, very well-worn code rulebook.

I’m here to tell you it’s okay, and you’ll recover in time.

Let’s keep in mind here that the MVC separation of concerns, while a great rule of thumb, is just that … it is not an ivory tower to be left unassailed in times of dire need. At times, it makes sense to allow abstractions to cross these boundaries for the sake of reducing our own overhead, and in the cause of developing your own app-wide domain specific language—something, that in my book, is the principal sign of a good Rails developer (for what it’s worth).


When visionaries are too far ahead.

May 15, 2007

There’s a company out there that has always surprised me by being ahead of its competitors. This company is JetBrains (and no, I do not work for JetBrains, nor do I have stock options in it). But sometimes seeing into the future doesn’t pay (in terms of $!). Indeed, maybe you remember that back in 2004 Sergey Dmitriev, the cofounder and CEO of JetBrains Inc., published a paper about Language Oriented Programming: The Next Programming Paradigm.

Rather than solving problems in general-purpose programming languages, the programmer creates one or more domain-specific programming languages for the problem first, and solves the problem in those languages.

I remember reading that paper when it came out and finding it really promising and avant-garde; I thought it would be the future. Soon after, I gave the MPS EAP version a try. Martin Fowler found it promising as well in 2005:

Although I’m not enough of a prognosticator to say whether they will succeed in their ambition, I do think that these tools are some of the most interesting things on the horizon of software development.

The old debates were already starting… Unfortunately, years passed, MPS became a commercial failure, and it has been all but discontinued since then. Nowadays everybody and his dog talks about DSLs, which is maybe why it is appearing again on the JetBrains website. Sergey was only 2 years ahead of his time with his development software solution. People always complain about late delivery of their software; JetBrains is the only company delivering your software and features before you even need them!


How to detect which language a text is written in? Or when science meets human!

May 13, 2007

As I mentioned earlier in my spam attack analysis, I wanted to know which language the spams I receive are written in. My first bruteforce-like idea was to take each word, one by one, and search English/French/German/… dictionaries to see whether the word was there. But with this approach I would miss all the conjugated verbs (unless I had a really nice dictionary, like the one I now have as a Firefox plugin). Then I remembered that languages differ in the distribution of their alphabetical letters, but I had no statistics about that…
That was it for my own brainstorming; I decided to have a look at what Google thinks about this problem. I first landed on some online language detector… The easy solution would have been to abuse this service, which must use some cool algorithms, but I needed to know what kind of algorithms they could be, and I didn’t want to rely on any third-party web service. Finally I read Evaluation of Language Identification Methods, whose abstract seemed perfect:

Language identification plays a major role in several Natural Language Processing applications. It is mostly used as an important preprocessing step. Various approaches have been made to master the task. Recognition rates have tremendously increased. Today identification rates of up to 99 % can be reached even for small input. The following paper will give an overview about the approaches, explain how they work, and comment on their accuracy. In the remainder of the paper, three freely available language identification programs are tested and evaluated.

I found the N-gram approach on page 8 (chapter 4) rather interesting. The principle is to cut m long texts, each written in its respective language (English, French…), into defined pieces and count how many times each piece appears; we will call these the training texts. Then do the same for the text you want to identify, and find the training text that matches it best; that training text is most likely written in the same language as your text.
The pieces are the N-grams; e.g. for the word GARDEN the bi-grams (N=2) are: _G, GA, AR, RD, DE, EN, N_ (with an underscore marking the word boundary).
There are then various ways of finding the best matching text by playing with the N-grams: distances, scores…
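To make the idea concrete, here is a minimal Ruby sketch in the spirit of the classic Cavnar &amp; Trenkle method: build a ranked profile of the most frequent N-grams per training text, then score a new text by the “out of place” distance between profiles. (This is my own illustration of the method from the paper, not the implementation I mention below.)

```ruby
# Count the n-grams of a text, padding word boundaries with '_'
def ngrams(text, n = 2)
  counts = Hash.new(0)
  text.downcase.scan(/[a-z]+/).each do |word|
    padded = "_#{word}_"
    (0..padded.length - n).each { |i| counts[padded[i, n]] += 1 }
  end
  counts
end

# A profile is the list of the `top` most frequent n-grams, ranked
def profile(text, n = 2, top = 300)
  ngrams(text, n).sort_by { |_, count| -count }.map(&:first).first(top)
end

# "Out of place" distance: sum of rank differences between profiles;
# an n-gram absent from the other profile costs the maximum penalty
def distance(profile_a, profile_b)
  max = profile_b.length
  profile_a.each_with_index.sum do |gram, rank|
    other = profile_b.index(gram)
    other ? (rank - other).abs : max
  end
end

# The best-matching training profile names the language
def identify(text, training)
  text_profile = profile(text)
  training.min_by { |_lang, prof| distance(text_profile, prof) }.first
end
```

With real training texts (a few pages of Shakespeare versus Baudelaire, say, loaded via `File.read`) this tiny version already separates English from French on short inputs.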

N-gram comparison
I found an implementation from 1996 in C, with sources. So I followed the same algorithm and implemented it in Ruby. Those C sources reminded me of my C days, when you had to implement your own lists and hashes; they are optimized for memory usage (10 years ago…). In the end the Ruby code is about a hundred lines while the C was four times that, and the Ruby code is easier to read. Don’t take that as a demonstration, it is not!! I admit the C binary is maybe a bit faster (but not that much ;)). I’ll try to commit it to RubyForge when I have some time.

The results are excellent as shown in the paper:

N-gram results

Anyway, the most interesting part of this story was not the implementation but the method: it is funny that you can identify languages (and so human populations as well) without requiring any linguistic knowledge, ignoring grammar and the meaning of words (no dictionary), just by analyzing letters and blocks of letters from Shakespeare or Baudelaire. N-grams can also be used in other areas, for example in music to predict which note is likely to follow.


Spam attacks! When? How? What? … in Ruby

May 13, 2007

Today I was wondering about those spams I receive daily. GMail does a great job of detecting them, which is why I decided to forward several of my polluted personal mailboxes to my GMail addie. I wanted to know more about those spams, and I wanted to do it quickly and with fun. So I took my favorite Ruby IDE and installed some Ruby gems: gmailer, activerecord, gruff (together with Mouraf’s patch to extend the legend, as it was cut off when too long on Gruff::Pie).
You should note that to make the GMailUtils gem run, you also need the ‘net/https’ library installed on your PC, or else you’ll end up with a mysterious:
irb(main):001:0> require 'gmailer'
LoadError: no such file to load -- gmailer
from (irb):1:in `require'
from (irb):1

Solution:
sudo apt-get install libopenssl-ruby
Why did I decide to use an API to fetch my mail when GMail allows POP3? Firstly, I wanted to play with this gem. Then, you can’t directly get your spam through GMail POP3 (you need some label tricks to first move it into your ‘Inbox’, something you could do in a first pass with GMailUtils). Also, the API goes through HTTPS, thus bypassing the usual firewalls that block the POP3 port.
I wanted to know which language those f****** spams were in… so I decided to code (even translate into Ruby!) a language detector (I am blogging about it here).
So, some lines of code later, I had what I wanted to know (at least for the last month, as GMail keeps spam in a rolling buffer):

Pie chart: spam languages

Chart: spam per hour

Pie chart: spam topics

Chart: monthly spam

Bar chart: spam per weekday

What to conclude from this? Actually nothing much… except that 91% of the spam is in English (followed by 7% in French, but that is normal since I am French and have French email addies forwarded to my GMail). 34% of the pollution concerns viagra, sex… 16% watches. Concerning the distribution of spam over time, I thought I would find more observable periods; it only seems that I receive more spam on Thursdays.