Archive for the ‘Ruby’ Category


JRuby still kicking some ass

April 28, 2008

Ruby’s future is looking bright, in no small part due to the many implementations. How’s your favorite language looking?

It’s always such a pleasure to read Charles’ views and stories.


What’s wrong with *J*Groovy?

February 23, 2008


Every new year brings its JRuby vs (J)Groovy battle. Obviously the (J)Groovy community is good at promotion: you’ll hardly ever see a post about JRuby without (J)Groovy marketing propaganda in the comments.

I am definitely in favor of JRuby, but I can understand some of the (J)Groovy arguments. I have nothing against JGroovy: it surely integrates pretty well with the Java world, and I think the mix of Java with Groovy will be easily adopted by fearful managers (those fearing Ruby).

So what’s wrong with *J*Groovy ?…


JRuby and Jython were developed on top of two popular languages (Ruby and Python) that have existed for more than 15 years, each with its own community. Today Ruby and Python are heavily used, running on their own VMs on nearly every OS. Ruby has no fewer than 4 VM implementations. JGroovy is based on the Groovy language, which is 5 years old and only cloned what it found cool from its neighbors to patch the Java dinosaur. Would you invest in such a language? At least Sun, Microsoft, IBM, Thoughtworks and Oracle made their choice…

The fact is that the Groovy language has no means of existence on its own without Java.


Rails Wanted Dead or Alive $2.500.000!

July 10, 2007

Wanted Rails

Rails’ success is not debatable. But it was not an easy battle (and still isn’t), and some projects are trying to keep the ewes inside the JEE garden.

Rails SLOC 2004-2007

This represents the SLOC (Source Lines Of Code) of Rails since it was born 3 years ago.
What to say about this? Basically you can’t say much about SLOC (see the SLOC part of a previous post)… Except that the growth is steady (which can be explained by the fact that it is maintained by a company, in contrast to open source projects, which may have a large and growing community of committers).
Now sloccount gives us a summary:

Total Physical Source Lines of Code (SLOC) = 74,385 (100% Ruby)
Development Effort Estimate, Person-Years (Person-Months) = 18.45 (221.45)
(Basic COCOMO model, Person-Months = 2.4 * (KSLOC**1.05))
Schedule Estimate, Years (Months) = 1.62 (19.46)
(Basic COCOMO model, Months = 2.5 * (person-months**0.38))
Estimated Average Number of Developers (Effort/Schedule) = 11.38
Total Estimated Cost to Develop = $ 2,492,879
(average salary = $56,286/year, overhead = 2.40)
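The figures above can be re-derived by hand. Here is a minimal sketch of the Basic COCOMO formulas exactly as sloccount prints them; the method name cocomo_estimate and its keyword arguments are my own, not part of sloccount.

```ruby
# Basic COCOMO, using the constants shown in the sloccount summary.
# cocomo_estimate is a hypothetical helper name, not a sloccount API.
def cocomo_estimate(sloc, salary: 56_286, overhead: 2.4)
  ksloc = sloc / 1000.0
  person_months   = 2.4 * ksloc**1.05            # development effort
  schedule_months = 2.5 * person_months**0.38    # calendar time
  {
    person_months:   person_months,
    schedule_months: schedule_months,
    developers:      person_months / schedule_months,
    cost:            person_months / 12 * salary * overhead
  }
end

ESTIMATE = cocomo_estimate(74_385)
puts format('Effort:     %.2f person-months', ESTIMATE[:person_months])
puts format('Schedule:   %.2f months', ESTIMATE[:schedule_months])
puts format('Developers: %.2f', ESTIMATE[:developers])
puts format('Cost:       $%d', ESTIMATE[:cost].round)
```

Changing the salary or overhead arguments is an easy way to see how sensitive the final "reward" is to those two assumptions.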

So the reward for Rails is $2,492,879!
This is based on 2004 salaries…

Now if you want to have some fun and get some stats from your favourite svn repository, go get Ruby, go get Gruff, go get sloccount, go get a Beer, and relax!
Sorry for the format but WordPress really doesn’t help me…
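# railssloccount.rb
# July 10, 2007

require 'date'
require 'rubygems'
require 'gruff'

DIR_DOWNLOAD = '/home/username/railssloccount/'
STEP = 15                    # days between two measured revisions
FIRST_DATE = '2004-11-29'
REPO_URL = 'hxxp://'         # repository URL, truncated in this post
axis = {}
datas = []
index = 0
last_date = nil

# Initial checkout of the trunk as it was on the first date
cmd = "svn checkout --revision {#{FIRST_DATE}} \"#{REPO_URL}\" #{DIR_DOWNLOAD}trunk"
puts cmd
`#{cmd}`

Date.parse(FIRST_DATE).step(Date.today, STEP) do |date|
  axis[index] = date.to_s
  index += 1
  last_date = date

  # Move the working copy to its state at this date
  cmd = "svn update --revision {#{date}} #{DIR_DOWNLOAD}trunk"
  puts cmd
  `#{cmd}`

  # Count the lines and keep the raw sloccount report next to the checkout
  puts "sloccount #{DIR_DOWNLOAD}trunk"
  result_cmd = `sloccount #{DIR_DOWNLOAD}trunk`
  File.open("#{DIR_DOWNLOAD}sloccount_#{date}.txt", "w") { |f| f << result_cmd }

  # Extract the number following "ruby:" in the report
  result = result_cmd[/ruby:\s*(\d+)/, 1].to_i
  puts "Date: #{date} - SLOC: #{result}"
  datas << result
end

axis[index - 1] = last_date.to_s

# Chart type and output file name are assumptions, they were lost in the formatting
g = Gruff::Line.new
g.data('Rails SLOC', datas)
g.labels = axis
g.minimum_value = 0
g.title = 'Rails SLOC'
g.write("#{DIR_DOWNLOAD}rails_sloc.png")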


How to detect which language a text is written in? Or when science meets human!

May 13, 2007

As I mentioned earlier in my spam attack analysis, I wanted to know which language the spams I receive are written in. My first brute-force idea was to take each word one by one and look it up in English/French/German/… dictionaries. But with this approach I would miss all the conjugated verbs (unless I had a really nice dictionary like the one I now have in a Firefox plugin). Then I remembered that languages differ in the distribution of their letters, but I had no statistics about that…
That was it for my own brainstorming, so I decided to have a look at what Google thinks about this problem. I first landed on some online language detector… The easy solution would have been to abuse this service, which must have some cool algorithms, but I wanted to know what kind of algorithms it could be, and I didn’t want to rely on any third-party web service. Finally I read Evaluation of Language Identification Methods, whose abstract seemed perfect:

Language identification plays a major role in several Natural Language Processing applications. It is mostly used as an important preprocessing step. Various approaches have been made to master the task. Recognition rates have tremendously increased. Today identification rates of up to 99 % can be reached even for small input. The following paper will give an overview about the approaches, explain how they work, and comment on their accuracy. In the remainder of the paper, three freely available language identification programs are tested and evaluated.

I found the N-gram approach on page 8 (chapter 4) rather interesting. The principle is to take m long texts, each written in a known language (English, French…), cut them into pieces of a defined size and count how many times each piece appears; we will call these the training texts. Do the same on the text you want to identify, and find the training text that matches it best: that training text is most likely written in the same language as your text.
The pieces are the N-grams, i.e. for the word GARDEN the bi-grams (N=2) are: _G, GA, AR, RD, DE, EN, N_ (the underscore marks a word boundary).
Now there are various ways of finding the best matching text, playing with the N-grams, distances, scores…
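To make the principle concrete, here is a minimal sketch, not the code I ported from C. It assumes one particular way of matching: rank the most frequent N-grams of each text, then sum how far each N-gram of the unknown text is "out of place" in each training ranking. The method names, the tiny training texts and the limits (N up to 3, 300 N-grams per profile) are all illustrative choices.

```ruby
# Character n-grams of one word; "_" marks the word boundaries.
def ngrams(word, n)
  padded = "_#{word}#{'_' * (n - 1)}"
  (0..padded.length - n).map { |i| padded[i, n] }
end

# Ranked profile of a text: its n-grams (n = 1..max_n), most frequent first.
def profile(text, max_n: 3, size: 300)
  counts = Hash.new(0)
  text.upcase.scan(/[[:alpha:]]+/).each do |word|
    (1..max_n).each { |n| ngrams(word, n).each { |gram| counts[gram] += 1 } }
  end
  counts.sort_by { |gram, count| [-count, gram] }.first(size).map(&:first)
end

# "Out of place" distance: sum of rank differences between two profiles.
# An n-gram missing from the language profile costs the maximum penalty.
def out_of_place(doc_profile, lang_profile)
  rank = {}
  lang_profile.each_with_index { |gram, i| rank[gram] = i }
  doc_profile.each_with_index.sum do |gram, i|
    rank.key?(gram) ? (rank[gram] - i).abs : lang_profile.size
  end
end

# The language whose training text has the closest profile wins.
def detect(text, training)
  doc = profile(text)
  training.min_by { |_lang, sample| out_of_place(doc, profile(sample)) }.first
end

# Toy training texts; real ones should be much longer.
TRAINING = {
  english: 'the quick brown fox jumps over the lazy dog while the cat sleeps on the mat',
  french:  'le renard brun saute par dessus le chien paresseux pendant que le chat dort sur le tapis'
}

puts ngrams('GARDEN', 2).join(', ')
puts detect('the dog and the cat', TRAINING)
```

With training texts this short the guess is fragile; the rates quoted in the paper rely on much longer samples.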

N-gram comparison
I found an implementation from 1996 in C, here with sources. So I followed the same algorithm and implemented it in Ruby. Those C sources reminded me of my C days, when you had to implement your own lists and hashes. They are optimized for memory usage (10 years ago…)… In the end the Ruby code is about a hundred lines while the C is four times longer, and the Ruby code is easier to read. Don’t take that as a demonstration, it is not!! I admit the C binary is maybe a bit faster (but not that much ;)). I’ll try to commit it on rubyforge when I have some time.

The results are excellent as shown in the paper:

N-gram results

Anyway, in this story the most interesting part was not the implementation but the method: it is funny that you can identify languages (and so human populations as well) without any linguistic knowledge, ignoring grammar and the meaning of words (dictionary)… only by analyzing letters and blocks of letters from Shakespeare or Baudelaire. N-grams can also be used in other areas, for example in music to predict which note is likely to follow.


Spam attacks! When? How? What? … in Ruby

May 13, 2007

Today I was wondering about those spams I receive daily. GMail does a great job at detecting them, which is why I decided to forward several of my polluted personal mailboxes to my GMail addie. I wanted to know more about those spams, and wanted to do that quickly and with fun. So I took my favorite Ruby IDE and installed some ruby gems: gmailer, activerecord, gruff (together with Mouraf’s patch to extend the legend, as it was cut when too long on Gruff::Pie).
Note that to make the GMailUtils gem run, you also need the ‘net/https’ library installed on your PC, else you’ll end up with a mysterious:
irb(main):001:0> require 'gmailer'
LoadError: no such file to load -- gmailer
from (irb):1:in `require'
from (irb):1

Solution :
sudo apt-get install libopenssl-ruby
Why did I decide to use an API to get my mails while GMail allows POP3? Firstly I wanted to play with this gem; then you can’t get your spam directly through GMail POP3 (you’d need some label tricks to finally put it in your ‘Inbox’, something that you could do in a first pass with GMailUtils); also the API goes through https, thus bypassing the usual firewalls that block the POP3 port.
I wanted to know which language those f****** spams were in… So I decided to code (translate into Ruby even!) a language detector (I am blogging about it here).
So some lines of code later I had what I wanted to know (at least for the last month, as GMail has a Spam buffer):

Pie LanguageChart Hour

Pie SemanticMonthly Spam

Bars WeekDay Spam

What to conclude from this? Well, actually nothing much… Except that 91% of the spam is in English (followed by 7% in French, but that is normal as I am French and I have a French email addie forwarded to my GMail). 34% of the pollution concerns viagra, sex… 16% is about watches. Concerning the distribution of spam over time, I thought I would find a more observable pattern; it only seems that I receive more spam on Thursdays.


Why Ruby Matters ?

March 22, 2007

Last week Alexis was wondering about Haskell becoming the future of Rubyists. Reginald Braithwaite, one of the former JProbe suite leaders, re-read Why Functional Programming Matters by John Hughes. Although this paper is 23 years old, it is still up to date, and the functional paradigms it describes are still applicable. Reginald found that it contains insights that apply to programming languages in general:

In a very real sense, the design of a programming language is a strong expression of the opinions of the designer about good programs. When I first read WhyFP, I thought the author was expressing an opinion about the design of good programming languages. Whereas on the second reading, I realized he was expressing an opinion about the design of good programs.

Then Reginald defines what makes a language better or more powerful.

Any feature (or removal of an [harmful] feature) which makes the programs written in the language better makes the language better.

Making an analogy with mathematics, Reginald compares factoring with the act of dividing a program into smaller parts. The process of breaking a program into distinct features that overlap as little as possible in functionality is called Separation of Concerns (SoC). Programs that separate their concerns are well factored. From this, Reginald defines the power of a programming language:

One thing that makes a programming language “more powerful” in my opinion is the provision of more ways to factor programs. Or if you prefer, more axes of composition. The more different ways you can compose programs out of subprograms, the more powerful a language is.

Structured programming is a way to promote this.

Reginald illustrates his talk with Ruby examples where you can clearly distinguish the separation of concerns between the how and the what.
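As a small illustration of my own (this is not one of Reginald’s examples, and the order data is made up), here is the same filter written twice: once with the how and the what tangled together, once factored so that the rule is a named value and the iteration is left to Enumerable.

```ruby
# Hypothetical sample data, for illustration only.
orders = [
  { id: 1, total: 40 },
  { id: 2, total: 250 },
  { id: 3, total: 120 }
]

# Tangled: looping, accumulating and the business rule live in one place.
big_ids = []
orders.each do |order|
  big_ids << order[:id] if order[:total] > 100
end

# Factored: the "what" is a named rule, the "how" belongs to Enumerable.
big_order = ->(order) { order[:total] > 100 }
factored_ids = orders.select(&big_order).map { |order| order[:id] }

puts factored_ids.inspect
```

The rule big_order can now be reused with reject, partition or count without rewriting any loop, which is the extra axis of composition the quote is about.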

In the end, even if Ruby cannot be called a pure functional language, Reginald notably showed us Why… Ruby Matters.


Ruby / Rails IDE Comparison : Idea, Netbeans, RadRails

February 28, 2007

Starting Blocks

Welcome early-early adopters!
Ruby and Rails are getting more and more popular in the community, and well-known editors are starting to get into the business, for our pleasure!
While my editor of choice for Java has always been Idea (since v2.6, about 6 years ago), as I always found their product avant-garde and really user-friendly and coding-friendly, I wanted to see what was going on in the Rails / Ruby world, where I was historically using RadRails and SciTE because of the lack of serious competitors. Simple editors like vim (for the nostalgics) or SciTE are likely to fit your needs for short and simple scripts, but a full IDE is always better to have when you are working on a bigger project. The Ruby language itself eliminates the need for a lot of features you would want from an IDE in other languages like Java (for example, generating getters / setters from fields, which you get directly with the attr accessors, or live templates for some long public static final String …). The absence of static typing and the dynamism of the language also make it impossible for IDEs to do some operations you would do on a statically typed language (like Java).
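To show what I mean about getters and setters, here is a minimal sketch; the Person class and its fields are made up for the example.

```ruby
# Hypothetical class, only here to show the attr helpers.
class Person
  attr_accessor :name   # defines both Person#name and Person#name=
  attr_reader :email    # defines Person#email only (read-only field)

  # Rough counterpart of a Java "public static final String" constant.
  GREETING = 'Hello'.freeze

  def initialize(name, email)
    @name = name
    @email = email
  end
end

person = Person.new('Alice', 'alice@example.org')
person.name = 'Bob'
puts "#{Person::GREETING} #{person.name} <#{person.email}>"
```

Two lines replace what a Java IDE would generate as three accessor methods, so there is simply nothing left for the IDE to generate.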
First, note that apart from RadRails, which has been into Rails / Ruby editing for some time, Idea and NetBeans support for Ruby is really fresh (officially), so you should be lenient. You’ll also observe that the IDEs tested here are all written in Java (as plugins). Now you may wonder: why not in Ruby? There are several reasons I guess, notably the lack of a serious, good-looking cross-platform GUI framework in Ruby (Tk is far from Swing and SWT quality, and it is not Ruby anyway, even if that’s the easiest interface to plug into Ruby or Python); also, building an IDE on a well-proven platform guarantees that you’ll benefit from the history and quality of existing software features.
You should also try those IDEs yourself, as an IDE is a day-to-day tool that you learn to use and adopt with time, not with some simple test. That’s why I’ll mainly compare features here.
