Dynamic JVM language Complexity and Implementation Line Count

January 14, 2012

Yesterday, I was hanging out in the #clojure IRC chat (pretty good bunch of guys).  A fella mentioned that he was impressed with how easy it is to read and understand clojure’s guts, and I totally identified with his sentiment.  I’ve always found reading the source informative and enjoyable.

His words:

“im pretty sure i know more about the internals of clojures functions in a few years of dabbling than i know about python after many years of serious use”

Now, I don’t mean to python-bash.  I was once interested in python in much the same way as I am in clojure now; in fact, I volunteered on the AV crew at PyCon 2009 and learned a ton of good stuff.  I recommend an experience like this to anyone who can do it.  Of course, I’ve come to love clojure for the same reasons I loved python, plus the extra concurrency features, metaprogramming and speed.

In python, I always had the feeling that there was some magical __hack__() function you could mess with to do weird stuff.  In general, there seemed to be a whole layer of that underpinning everything.  You get the sense that there’s some untouchable magic.  I pride myself on doing deep-dives into whatever I’m working on and understanding it thoroughly at every level of abstraction.  I like the security and agility that come with knowing what my code actually does, down to the machine level.  It means nothing’s sacred.  I am comfortable changing my code around quickly and I can generally grok others’ code pretty well.  In the couple of years that I really cared about python, I never reached this level, even with the great talks I saw at PyCon that explained a lot of internals.  Maybe if I spent some more years on it, the situation would improve, but lately I’m just more interested in learning CS fundamentals and actually getting stuff done.  Still, I could work in python without knowing every detail, and it felt better than java.

Clojure doesn’t feel like that at all.  In fact, to me it’s like a thin veneer over java.  I think I understand the gist of the whole implementation (minus macros, crazy data structure stuff and STM implementation details).  All the features are pretty orthogonal.  Yet, when people talk about clojure, they say it is powerful, terse and clear (I think).

I decided to do a little line count of the language implementations for fun.  I chose jython and jruby instead of the C implementations of python and ruby, as that seems more apples to apples.  I included jruby as a contrast to jython, since I was initially under the impression that they’re pretty similar.  All three share a garbage collector and the underlying java infrastructure to build on, so ideally the code should represent only each language’s syntax and semantics, along with its standard library.  I downloaded the latest version of each implementation from its GitHub page and took care to count only code in the src/ subdirectories, not tests.  Hopefully this gives some indication of relative complexity.
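
For anyone who wants to try a similar count, here is a rough Python sketch of the procedure described above: walk a src/ directory, skip test directories, and tally lines per file extension.  The post does not name the tool that produced the tables (the layout resembles cloc output), so this is only a stand-in.  The function name `count_lines` and the `skip_dirs` default are made up for illustration, and unlike the tables it does not separate comments from code.

```python
import os
from collections import defaultdict


def count_lines(root, skip_dirs=("test", "tests")):
    """Tally files, blank lines and non-blank lines per file extension under root.

    Simplification: comment lines are counted as non-blank, so the numbers
    will not match a comment-aware counter.
    """
    totals = defaultdict(lambda: {"files": 0, "blank": 0, "nonblank": 0})
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune test directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            ext = os.path.splitext(name)[1] or "(none)"
            row = totals[ext]
            row["files"] += 1
            with open(os.path.join(dirpath, name), errors="replace") as fh:
                for line in fh:
                    if line.strip():
                        row["nonblank"] += 1
                    else:
                        row["blank"] += 1
    return dict(totals)
```

Calling `count_lines("src")` in a checkout returns a dictionary keyed by extension, such as `.java` or `.clj`, which is enough for a rough size comparison.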

Too much talking.  Here are the results:

JRuby:

-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Java                          1230          35707          50104         191259
yacc                             3            394            121           3986
Ruby                            36            239            196           1452
HTML                             1              0              0             65
-------------------------------------------------------------------------------
SUM:                          1270          36340          50421         196762
-------------------------------------------------------------------------------

Jython:

--------------------------------------------------------------------------------
Language                      files          blank        comment           code
--------------------------------------------------------------------------------
Java                            436          15226          15396          94203
Python                           10            916           1032           3192
Bourne Again Shell                1             23             33            152
DOS Batch                         1             29             28            101
--------------------------------------------------------------------------------
SUM:                            448          16194          16489          97648
--------------------------------------------------------------------------------

Clojure:

-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Java                           146           5485          11086          35386
Lisp                            38           2286            828          14557
XML                              2              0              0             81
HTML                             2              6             58             71
-------------------------------------------------------------------------------
SUM:                           188           7777          11972          50095
-------------------------------------------------------------------------------

Now, what can we observe here?

Firstly, both jython and jruby are mostly implemented in java.  Clojure, on the other hand, has a disproportionate amount of high-level code: about 29% of its code lines are lisp, versus roughly 3% python in jython and under 1% ruby in jruby.  I think this might be due to performance philosophy more than anything.  I bet if the implementors could get the same speed implementing ruby-in-ruby or python-in-python, they would (see PyPy and Rubinius).  Clojure has a strong performance focus, and clojure code is compiled as statically as possible, which means you can write more performant stuff in high-level code (I know JRuby has gotten much, much faster lately with the invokedynamic work).  Also, since it’s a lisp, it bootstraps itself incrementally from meager beginnings.  I encourage everyone to study core.clj to see how awesome this is.

Secondly, wtf, clojure is 1/2 the size of jython and 1/4 the size of jruby.  Why?  Well, it’s a lisp.  This means that the parser is pretty trivial, with LispReader.java clocking in at 1k lines.  By comparison, the jython antlr package is 6k, and the jruby parser is 19k of java (4k yacc).  Once the code is parsed, it’s all native data structures that can be worked on with the normal sequence functions.  The others have crazy AST-visitor things.
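
To give a feel for why a lisp reader can be so small, here is a toy s-expression reader in Python.  It is only an illustration of the idea, not how LispReader.java works: the function names are invented, and it handles just parentheses, integers and bare symbols, while the real reader also deals with strings, vectors, maps, reader macros and more.

```python
def tokenize(text):
    """Split source text into parentheses and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse_atom(token):
    """Turn a token into an int where possible, otherwise keep it as a symbol name."""
    try:
        return int(token)
    except ValueError:
        return token


def read_form(tokens, pos=0):
    """Read one form starting at tokens[pos]; return (form, next_pos)."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    token = tokens[pos]
    if token == ")":
        raise SyntaxError("unexpected )")
    if token != "(":
        return parse_atom(token), pos + 1
    items, pos = [], pos + 1
    # Collect nested forms until the matching close paren.
    while pos < len(tokens) and tokens[pos] != ")":
        form, pos = read_form(tokens, pos)
        items.append(form)
    if pos >= len(tokens):
        raise SyntaxError("missing )")
    return items, pos + 1


def read(text):
    """Read the first form in text as nested Python lists."""
    form, _ = read_form(tokenize(text))
    return form
```

With this, `read("(+ 1 (* 2 3))")` returns `['+', 1, ['*', 2, 3]]`, a plain nested list that ordinary list operations can walk.  No separate AST classes or visitors are involved.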

How do we explain the rest of the size difference?  I’m not really sure.  I think a lot of it is actual language semantics.  Perhaps a substantial amount of code could be in an expanded standard library, but I highly doubt that jruby’s library is THAT much more useful than jython’s.  Of course, both those languages have the burden of implementing a standard library that acts the same on native code and the jvm, so I imagine there’s a lot of wheel re-invention.  Also, clojure’s relatively new.  I have no doubt that it will expand, but I trust it will stay comparatively slim.  Fun fact: if you pretend clojure code is 10x as dense as java (some say that), you still get about 181k lines (35k of java plus 10 times 14.6k of lisp), less than jruby :-).
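
The size ratios and the "fun fact" can be checked with a few lines of Python.  The numbers are copied from the tables above; the helper names `ratio` and `density_adjusted` are just for this sketch, and the 10x density factor is the pretend value from the text, not a measured one.

```python
# Code-line totals copied from the SUM rows of the tables above.
code_lines = {"JRuby": 196762, "Jython": 97648, "Clojure": 50095}

# Clojure's Java and Lisp rows from its table.
clojure_java, clojure_lisp = 35386, 14557


def ratio(a, b):
    """Size of implementation a relative to implementation b."""
    return code_lines[a] / code_lines[b]


def density_adjusted(java_lines, lisp_lines, factor=10):
    """Pretend every lisp line is worth `factor` java lines."""
    return java_lines + factor * lisp_lines


print(round(ratio("Clojure", "Jython"), 2))          # 0.51
print(round(ratio("Clojure", "JRuby"), 2))           # 0.25
print(density_adjusted(clojure_java, clojure_lisp))  # 180956
```

So the "1/2" and "1/4" figures hold to within rounding, and the inflated clojure total of 180,956 still comes in under jruby’s 196,762.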

So, I know relatively little about ruby, and a bit more about python.  The conclusion I want to put forth isn’t that they’re doing it wrong or we’re better.  I just want us to ask: is all that extra stuff really necessary for the power you get from it?  And wouldn’t you want to know how your tools work, too, or is that just a weird fixation of mine?  Do people like to program by faith, or is it really not so hard to hold those languages fully in your head?

EDIT:
Groovy:

-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Java                           828          20549          51344         103197
Groovy                         147           2899           5407          12944
HTML                            62            153            165           1829
XML                              6             23             42             79
CSS                              1              7              7             14
-------------------------------------------------------------------------------
SUM:                          1044          23631          56965         118063
-------------------------------------------------------------------------------

Groovy is representative of a dynamic language that is loyal to only the JVM.  It should be more comparable to clojure since it does not have the burden of multi-platform semantics.  The antlr package alone (parser) is 7k lines of java.

Scala (a little complicated):

main src directory:

--------------------------------------------------------------------------------
Language                      files          blank        comment           code
--------------------------------------------------------------------------------
Scala                          1507          31557          64947         136732
Java                            202           5734          14834          26034
Javascript                        7             97            114           1217
XML                              18             74             70            973
CSS                               4            195             98            866
HTML                              1             19             13            168
Bourne Again Shell                1             24             35            143
Bourne Shell                      1              1              4              3
--------------------------------------------------------------------------------
SUM:                           1741          37701          80115         166136
--------------------------------------------------------------------------------

just the compiler directory:

--------------------------------------------------------------------------------
Language                      files          blank        comment           code
--------------------------------------------------------------------------------
Scala                           456          14648          22478          77911
Javascript                        7             97            114           1217
CSS                               3            179             97            804
Bourne Again Shell                1             24             35            143
XML                               2              0              4             30
--------------------------------------------------------------------------------
SUM:                            469          14948          22728          80105
--------------------------------------------------------------------------------

Scala’s here just for fun.  It’s a static language, so it doesn’t quite fit in with the rest.