Dynamic JVM language Complexity and Implementation Line Count

Yesterday, I was hanging out in the #clojure IRC chat (pretty good bunch of guys).  A fella mentioned that he was impressed with how easy it is to read and understand clojure’s guts, and I totally identified with his sentiment.  I’ve always found it informative and enjoyable.

His words:

“im pretty sure i know more about the internals of clojures functions in a few years of dabbling than i know about python after many years of serious use”

Now, I don’t mean to python-bash.  I was once very interested in python in much the same way as I am in clojure now; in fact, I volunteered on the AV crew at PyCon 2009 and learned a ton of good stuff.  I recommend an experience like this to anyone who can do it.  Of course, I’ve come to love clojure for the same reasons I loved python, plus the extra concurrency features, metaprogramming and speed.

In python, I always had the feeling that there was some magical __hack__() function you could mess with to do weird stuff.  In general, there seemed to be a whole layer of that underpinning everything.  You get the sense that there’s some untouchable magic.  I pride myself on doing deep dives into whatever I’m working on and understanding it thoroughly at every level of abstraction.  I like the security and agility that come with knowing what my code actually does down to the machine level.  It means nothing’s sacred.  I am comfortable changing my code around quickly and I can generally grok others’ code pretty well.  In the couple of years that I really cared about python, I never reached this level, even with the great talks I saw at PyCon that explained a lot of internals.  Maybe if I spent some more years on it, the situation would improve, but lately I’m just more interested in learning CS fundamentals and actually getting stuff done.  Still, I could work in python without knowing every detail, and it felt better than java.

Clojure doesn’t feel like that at all.  In fact, to me it’s like a thin veneer over java.  I think I understand the gist of the whole implementation (minus macros, crazy data structure stuff and STM implementation details).  All the features are pretty orthogonal.  Yet, when people talk about clojure, they say it is powerful, terse and clear (I think).

I decided to do a little line count of the language implementations for fun.  I chose jython and jruby instead of their C implementations as it seems more apples to apples.  I chose jruby for the contrast to jython, since I was initially under the impression that they’re pretty similar.  They all share a garbage collector and underlying java infrastructure to build from, so ideally the code should be representative of only the language’s syntax and semantics, along with the standard library.  I downloaded the latest version of each implementation from their github pages, and took care to only count code in the src/ subdirectories, not tests.  Hopefully this is some indication of relative complexity.

Too much talking.  Here are the results:

JRuby:

-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Java                          1230          35707          50104         191259
yacc                             3            394            121           3986
Ruby                            36            239            196           1452
HTML                             1              0              0             65
-------------------------------------------------------------------------------
SUM:                          1270          36340          50421         196762
-------------------------------------------------------------------------------

Jython:

--------------------------------------------------------------------------------
Language                      files          blank        comment           code
--------------------------------------------------------------------------------
Java                            436          15226          15396          94203
Python                           10            916           1032           3192
Bourne Again Shell                1             23             33            152
DOS Batch                         1             29             28            101
--------------------------------------------------------------------------------
SUM:                            448          16194          16489          97648
--------------------------------------------------------------------------------

Clojure:

-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Java                           146           5485          11086          35386
Lisp                            38           2286            828          14557
XML                              2              0              0             81
HTML                             2              6             58             71
-------------------------------------------------------------------------------
SUM:                           188           7777          11972          50095
-------------------------------------------------------------------------------

Now, what can we observe here?

Firstly, both jython and jruby are mostly implemented in java.  Clojure, on the other hand, has a disproportionate amount of high-level code.  I think this might be due to performance philosophy more than anything.  I bet if the implementors could get the same speed implementing ruby-in-ruby or python-in-python, they would (see PyPy and Rubinius).  Clojure has a strong performance focus, and clojure code is compiled as statically as possible, which means you can write more performant stuff in high-level code (I know JRuby has gotten much, much faster lately with the invokedynamic work).  Also, since it’s a lisp, it bootstraps itself incrementally from meager beginnings.  I encourage everyone to study core.clj to see how awesome this is.
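To put a rough number on "disproportionate", the sketch below computes what fraction of each implementation's code lines is written in the language being implemented. The counts are copied from the tables above; the dictionary layout and the `self_hosted_share` helper are illustrative names, not part of any tool used for the original count.

```python
# Code-line counts copied from the tables in this post.
# "self_hosted" is the Ruby / Python / Lisp row for each implementation.
counts = {
    "JRuby":   {"self_hosted": 1452,  "total": 196762},
    "Jython":  {"self_hosted": 3192,  "total": 97648},
    "Clojure": {"self_hosted": 14557, "total": 50095},
}

def self_hosted_share(name):
    """Fraction of code lines written in the language being implemented."""
    row = counts[name]
    return row["self_hosted"] / row["total"]

for name in counts:
    print(f"{name}: {self_hosted_share(name):.1%}")
# JRuby: 0.7%
# Jython: 3.3%
# Clojure: 29.1%
```

By this measure, a bit under a third of clojure's code is clojure, against a few percent or less for the other two.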

Secondly, wtf, clojure is 1/2 the size of jython and 1/4 the size of jruby.  Why?  Well, it’s a lisp.  This means the parser is pretty trivial, with LispReader.java clocking in at about 1k lines.  By comparison, the jython antlr package is 6k, and the jruby parser is 19k of java (4k yacc).  Once the code is parsed, it’s all native data structures that can be worked on with the normal sequence functions.  The others have crazy AST tree-visitor things.
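As an illustration of why a lisp reader can stay small, here is a toy s-expression reader. It is a minimal sketch, not a port of LispReader.java: it only handles parentheses, integers and symbols, does no error handling for unbalanced input, and the `tokenize` and `read` names are made up for the example.

```python
def tokenize(text):
    """Split source text into parens and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()

def read(tokens):
    """Consume tokens from the front and return one form as nested lists."""
    token = tokens.pop(0)
    if token == "(":
        form = []
        while tokens[0] != ")":
            form.append(read(tokens))
        tokens.pop(0)  # drop the closing paren
        return form
    try:
        return int(token)
    except ValueError:
        return token  # anything else is treated as a symbol (a plain string)

form = read(tokenize("(+ 1 (* 2 3))"))
print(form)  # ['+', 1, ['*', 2, 3]]

# The result is an ordinary list, so ordinary list operations apply to it.
nested = [x for x in form if isinstance(x, list)]
print(nested)  # [['*', 2, 3]]
```

The parsed program is just nested lists, which is the point being made about working on code with the normal sequence functions instead of a dedicated AST visitor.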

How do we explain the rest of the size difference?  I’m not really sure.  I think a lot of it is actual language semantics.  Perhaps a substantial amount of code could be in an expanded standard library, but I highly doubt that jruby’s library is THAT much more useful than jython’s.  Of course, both those languages have the burden of implementing a standard library that acts the same on native code and the jvm, so I imagine there’s a lot of wheel re-invention.  Also, clojure’s relatively new.  I have no doubt that it will expand, but I trust it will stay comparatively slim.  Fun fact: if you pretend clojure code is 10x as dense as java (some say that), you still get about 181k lines (35k of java plus 10 × 14.5k of lisp), less than jruby’s 197k :-).
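The fun fact can be checked directly from the tables. The 10x density factor is the post's own "pretend" assumption, and the variable names below are just for illustration.

```python
# Code-line counts copied from the Clojure and JRuby tables above.
clojure_java = 35386
clojure_lisp = 14557
jruby_total = 196762

density = 10  # assumed: one line of clojure counts as ten lines of java

java_equivalent = clojure_java + density * clojure_lisp
print(java_equivalent)                # 180956
print(java_equivalent < jruby_total)  # True
```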

So, I know relatively little about ruby, and a bit more about python.  The conclusion I want to put forth isn’t that they’re doing it wrong or we’re better.  I just want us to ask the question, is all that extra stuff really necessary for the power you get from it?  And, wouldn’t you want to know how your tools work, too, or is it just a weird fixation?  Do people like to program by faith, or is it really not so hard to hold those languages fully in your head?

EDIT:
Groovy:

-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Java                           828          20549          51344         103197
Groovy                         147           2899           5407          12944
HTML                            62            153            165           1829
XML                              6             23             42             79
CSS                              1              7              7             14
-------------------------------------------------------------------------------
SUM:                          1044          23631          56965         118063
-------------------------------------------------------------------------------

Groovy is representative of a dynamic language that is loyal to only the JVM.  It should be more comparable to clojure since it does not have the burden of multi-platform semantics.  The antlr package alone (parser) is 7k lines of java.

Scala (a little complicated):

main src directory:

--------------------------------------------------------------------------------
Language                      files          blank        comment           code
--------------------------------------------------------------------------------
Scala                          1507          31557          64947         136732
Java                            202           5734          14834          26034
Javascript                        7             97            114           1217
XML                              18             74             70            973
CSS                               4            195             98            866
HTML                              1             19             13            168
Bourne Again Shell                1             24             35            143
Bourne Shell                      1              1              4              3
--------------------------------------------------------------------------------
SUM:                           1741          37701          80115         166136
--------------------------------------------------------------------------------

just the compiler directory:

--------------------------------------------------------------------------------
Language                      files          blank        comment           code
--------------------------------------------------------------------------------
Scala                           456          14648          22478          77911
Javascript                        7             97            114           1217
CSS                               3            179             97            804
Bourne Again Shell                1             24             35            143
XML                               2              0              4             30
--------------------------------------------------------------------------------
SUM:                            469          14948          22728          80105
--------------------------------------------------------------------------------

Scala’s here just for fun.  It’s a static language, so it doesn’t quite fit in with the rest.

Categories: Observations
  1. January 15, 2012 at 5:09 am | #1

    Think about it like this. A string in clojure is a java string. JRuby has to implement its own String class. Now repeat this for a lot of things and that explains a fair amount of the difference. There also isn’t a standard library in clojure, but there is in JRuby. This explains even more of the difference.

    This has drawbacks. When I was first learning clojure I didn’t know java very well. It was very frustrating when some clojure function expected a Reader and this other function expected a java File, etc, etc. I had to learn a fair amount of java to use clojure.

    • gtrak
      January 15, 2012 at 5:23 pm | #2

      Yes, I can’t relate to someone learning clojure who isn’t familiar with Java, but I’d argue that Java + clojure might actually be less complex than ruby all the way down. Not counting java’s vast libraries of course, since you only pay mentally for what you use.

  2. Orion Edwards
    January 15, 2012 at 5:13 am | #3

    The Ruby standard library is quite substantial (as is Python’s), and so JRuby requires a *ton* of code for compatibility with other ruby code. I’m suspicious that quite a large part of the perceived complexity is nothing to do with the language itself, merely standard libraries.

    JRuby also seems to get a lot more work put into it than Jython, so I’d guess that’s why it’s larger

  3. January 15, 2012 at 12:23 pm | #4

    Interesting, I’m really curious why jruby is so much larger… but not curious enough to start digging around :(

  4. Nicolas
    January 15, 2012 at 8:24 pm | #5

    I would not judge JRuby or Jython on code size. What I see is that clojure.core is a pretty small code base.

    I guess there are several reasons:
    It is a lisp: simpler to parse and compile.
    There aren’t a lot of libraries included in clojure.core…
    Clojure is newer and thus doesn’t have the same amount of bug fixes and features.
    Clojure is made with the JVM in mind. It doesn’t fight it, it leverages it.

    Anyway, all of this software appears to be quite small. 50 to 200K LOC is not a huge amount of code, and that’s pretty impressive considering what they achieve.

    I would be curious how it compares with scala…

  5. January 15, 2012 at 8:57 pm | #6

    Great stuff! What about comparing to something small like a Lua version for the JVM?

  6. January 17, 2012 at 9:31 pm | #7

    Yeah, I have to agree it’s not fair to compare code sizes of Clojure and JRuby. JRuby is a port from another platform, Clojure isn’t. In order to keep compatibility, JRuby needs a lot of code that Clojure doesn’t.

    • gtrak
      January 17, 2012 at 9:39 pm | #8

      Yes, I think I address these points. Python and Ruby are a closer comparison to each other because of this issue, in fact, this is why I included ruby in the argument in the first place. Maybe I should throw Groovy in the mix since it’s only loyal to the jvm?

  7. January 18, 2012 at 10:55 pm | #9

    I think “native” JVM langs like Groovy and Scala would be better for comparison with Clojure, I’d be curious to see what those numbers look like.

    • gtrak
      January 19, 2012 at 3:05 am | #10

      I added groovy and scala
