
Archive for the ‘Observations’ Category

Clojure Tradeoffs (design implications and why you should care)

June 26, 2013

EDIT: HN thread: https://news.ycombinator.com/item?id=5943982

Clojure as a language and community is very sensitive to the definition and design of tradeoffs. This post is an attempt to elucidate the tradeoffs chosen by the language, what they mean to interested parties, and to predict the future based on these choices.

Motivation

Rich Hickey has said a few things about design and the role of tradeoffs; in a recent talk he described design as consciously making choices about tradeoffs.  He has another important design tenet: design by decoupling concepts from each other.  So, Clojure is in the interesting position of being an extra layer of abstractions that claims to actually simplify the task of programming in the long run.  It does this by pulling apart concepts programmers take for granted in order to assemble them more effectively.  Below are some tradeoffs I noticed through working with Clojure for the past year and a half.  Some of them I had never thought about in my previous languages, but I can see that by accepting a language, I also accepted a set of tradeoffs that guided how I work.  Because design tradeoffs (manifested as abstractions) determine what is easy and what is difficult, I think it's valuable to see which tradeoffs are made.  It's valuable to know what they encourage, what they discourage, and how they interact, so that we can have more control over our tools and environments.

LISPiness

Clojure is a lisp.  That fact alone means it builds on 50 years of infrastructure and thought, some of which has been absorbed into mainstream languages, but it also presents a foreign and scary interface to users who are used to other syntaxes.  You can perform 'syntactic abstraction' at the expense of visual clarity, and it becomes easier to express ideas and give them names.  The programmer who works on a team is forced to become more judicious about these design choices, but the greater ease of expression means you're never bound by what your language provides.  It is the ultimate tool for identifying and removing repetition.
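To make 'syntactic abstraction' concrete, here is a small sketch of my own (not from the original post): a macro that gives a name to a repeated timing-and-printing pattern, so call sites state intent instead of copying boilerplate.

;; A small macro capturing a repeated "time this body and print the result" pattern.
(defmacro with-timing [label & body]
  `(let [start#  (System/nanoTime)
         result# (do ~@body)]
     (println ~label "took" (/ (- (System/nanoTime) start#) 1e6) "ms")
     result#))

;; Usage: the repetition lives in one place; callers just name the intent.
(with-timing "expensive sum"
  (reduce + (range 1000000)))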

Concurrency

Clojure has a concurrency focus.  This tradeoff interacts with the decoupling desire in order to diminish the number and impact of problematic sites in programs.  With regard to state change over time, this is similar in spirit to Python's mantra, 'explicit is better than implicit.'  Even though Clojure has all the nifty concurrency features you could ever want (thread pools, async methods, etc.), the most important simplifying feature to consider is in fact immutability by default, which is a decoupling of state-change and value that most languages freely intermix.  This, as a concept, is difficult to make compelling without an already captive audience, but there are great treatments of it from multiple sources.  The pitch usually starts off by emphasizing the pain of building shared-state concurrent systems, but it is applicable and useful in other contexts.

The Value of Values: http://www.infoq.com/presentations/Value-Values

In practice, it means you don't lose any capabilities of Java, but since you accept that what's easy, idiomatic, and low-friction in Clojure is the right thing to do, you will by default write safe and moderately performant code.  Reliable concurrency semantics fall out from the intentional convenience of a suite of core functions built around this 'feature'.  It's a similar conversation to telling a C programmer that you're taking away their malloc, except that the data structures needed to express it are written in normal Java.  The brilliance of Clojure's core library is that it makes using these safe, fast immutable data structures (relatively) more convenient and effective than any other option, without artificially making Java-like things less (absolutely) convenient or effective.
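Here is a minimal sketch (mine, not from the post) of what immutability by default looks like in day-to-day code, with shared state made explicit only where you ask for it:

;; "Modifying" a map returns a new value; the original is untouched,
;; and structural sharing keeps the copy cheap.
(def order {:id 1 :items ["apple"] :total 1.50})

(def updated (-> order
                 (update-in [:items] conj "banana")
                 (assoc :total 2.75)))

order    ;=> {:id 1, :items ["apple"], :total 1.5}
updated  ;=> {:id 1, :items ["apple" "banana"], :total 2.75}

;; When shared, mutable state is genuinely needed, it is explicit and coordinated:
(def orders (atom {}))                      ; an identity whose value changes over time
(swap! orders assoc (:id order) updated)    ; atomic update of that identity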

Shared-memory over other computing paradigms (e.g. message-passing)

Like C, C++, Java, Ruby, Python, etc., Clojure maps closely to the actual semantics of the machines that run those languages, which in our case is the von Neumann shared-state bit-banging model.  We generally don't even think about this tradeoff, but there are systems that hold some other construct as fundamental, such as Erlang's actor model.  In practice, this means there's no wall of abstraction preventing you from using lower-level constructs that map efficiently to the hardware.  You can write low-level code just as well as or better than in Java, while writing high-level code very easily.  It's simple to switch modes of thought and mix and match levels of abstraction due to the 'Composition Tradeoff'.  The resulting abstraction soup is a little unnerving at first, but you come to appreciate it after a few months of use.  In my opinion, dealing with it is a worthwhile meta-skill :-).
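As a rough illustration (my sketch, not the author's), the same file can hold idiomatic high-level code next to lower-level code that maps directly onto the hardware via primitive arrays and type hints:

;; High-level: ordinary sequence code.
(defn mean [xs]
  (/ (reduce + xs) (count xs)))

;; Lower-level, when it matters: primitive arrays, type hints, and loop/recur
;; compile down to a tight, unboxed loop on the JVM.
(defn sum-doubles ^double [^doubles arr]
  (let [n (alength arr)]
    (loop [i 0, acc 0.0]
      (if (< i n)
        (recur (unchecked-inc i) (+ acc (aget arr i)))
        acc))))

(sum-doubles (double-array [1.0 2.0 3.0]))   ;=> 6.0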

Dynamic over Static

It's the same as in every dynamic vs. static debate. Static languages have the advantage of stronger compiler support for domain-level assertions encoded directly into a type system.  Dynamic languages trade that for increased flexibility, which is helpful when existing code is repurposed or used in unpredictable ways.  It becomes harder to reason about the contracts of programmatic interfaces when the compiler and IDE aren't helping you, so more documentation becomes necessary.  Automated tests can fill the role of compile-time assertions at run time.  As an added benefit, 90% of my time is spent in interactive development, building things in a live, running environment.  Usually, static languages have a speed advantage, but in Clojure this is not the case…
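One built-in option for pushing contracts to run time, adjacent to the tests the post mentions (the example is mine, not the author's): Clojure's :pre/:post conditions, which are checked on every call when assertions are enabled.

;; Run-time contracts where a static language would lean on the type system.
(defn transfer [account amount]
  {:pre  [(map? account) (pos? amount) (>= (:balance account) amount)]
   :post [(>= (:balance %) 0)]}
  (update-in account [:balance] - amount))

(transfer {:balance 100} 30)    ;=> {:balance 70}
;; (transfer {:balance 100} 500) throws an AssertionError at run time.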

Speed over Convenience

Clojurists love their neat dynamic tools; however, they are also speed junkies.  There is no pervasive run-time system in Clojure to slow everything down, but idiomatic use promotes heavy use of the immutable data structures, which are optimized to perform competitively with the alternatives.  More relevantly, Clojure lets you apply the 80/20 rule.  Given the ease of interop with Java, it's possible to pick and choose your own performance tradeoffs when implementing components, without losing the benefits of dynamic languages.  For instance, Clojure records provide fast Java field access for fields known ahead of time, but they are also backed by a standard immutable hash-map for additional properties.  They can be treated with all the conveniences of standard hash-maps, yet remain efficient.  Clojure's deftype is equivalent to raw Java if you need to go a step further.  At the core, care is taken to provide fast implementations of common operations.  Nothing prevents a user from using their own abstractions and data structures, and extending Clojure's abstractions over them.
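A brief sketch of the record/deftype spectrum described above (example mine):

;; defrecord compiles to a Java class with real fields (fast access),
;; yet still behaves like an immutable map.
(defrecord Point [x y])

(def p (->Point 1 2))

(:x p)               ;=> 1   (keyword lookup hits a plain Java field)
(assoc p :z 3)       ; extra keys spill into the record's attached persistent map
(instance? Point p)  ;=> true

;; deftype drops the map behavior for bare, Java-like types.
(deftype Pair [a b])
(.a (Pair. 1 2))     ;=> 1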

Composition over Inversion-of-control

The emphasis on concurrency via shared, immutable data promotes standard, dynamic ways for libraries and functions to interact.  Since a user can trust that data is immutable and reliable, there is no need for things like defensive copying, which is (or should be) standard practice in multi-threaded object systems like Java's.  It's simply not easy or expedient to go out of your way to destroy someone else's data.  This trust in data integrity, coupled with the syntactic abstraction afforded by macros and higher-order functions, means you can write concise code that composes in intuitive ways, without the need to hook yourself into someone else's sandbox (e.g. Spring or Rails).
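A small sketch of what "composition instead of a container" tends to look like (mine, not from the post): libraries hand each other plain immutable data, and the wiring is just ordinary function composition.

(def users
  [{:name "Ada"  :active true  :logins 42}
   {:name "Bob"  :active false :logins 7}
   {:name "Cleo" :active true  :logins 19}])

;; No framework hooks: filter, sort-by, and map compose directly over the data.
(->> users
     (filter :active)
     (sort-by :logins >)
     (map :name))
;=> ("Ada" "Cleo")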

The most troublesome thing about such frameworks is the use of polymorphism and inversion of control as a sledgehammer to get around the inherent problems of OO.   Namely, OO couples state-change to objects (binding the effects of time to a specific bucket of memory), and functions to classes.  In Clojure, on the other hand, it feels like nothing you write is actually 'doing' anything at all.  Functions generally just transform data, and occasionally you might fire off a side effect or perform some coordinated state change.  You can trust that there is usually a simple relationship between inputs and outputs.  When you want polymorphism, you can get it in spades, but you'll end up sprinkling it in occasionally instead of being bound to a particular style throughout the construction of your application.
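For instance (my example, not the author's), a multimethod adds polymorphism exactly where it pays off, dispatching on plain data rather than on a class hierarchy chosen up front:

;; Open, data-driven polymorphism: dispatch on the :shape key.
(defmulti area :shape)

(defmethod area :circle [{:keys [r]}]
  (* Math/PI r r))

(defmethod area :rect [{:keys [w h]}]
  (* w h))

(area {:shape :circle :r 1})     ;=> 3.141592653589793
(area {:shape :rect :w 2 :h 3})  ;=> 6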

When was the last time you tried to switch some Spring beans or Rails controllers over to another framework, or to use two such frameworks in one application?  This is problematic primarily because inversion of control binds all your code to the framework's assumptions.  In Clojure, you compose functions yourself, making more choices along the way, but the benefits of doing so, coupled with the ease of dealing in data, overshadow the need to trust someone else's choices.  Code becomes actually reusable, and it usually even reads more like a tree expansion than a graph traversal.  The language features themselves are mostly orthogonal, and are similarly composable.

Community over Individualism

Lisp has a history of promoting an individualist spirit.  There's even such a thing as the 'Lisp Curse' (http://www.winestockwebdesign.com/Essays/Lisp_Curse.html).  I personally believe Clojure is positioned to beat the curse, due to this generation's emphasis on open source, social media, friendly and productive chat rooms and newsgroups, and Clojure's intentional design decisions to promote interop between libraries.  One example is Clojure's standardized Lisp reader, which is more restricted than Common Lisp's but enables source code to be shared more easily.  There are excellent conferences with highly interesting talks, and the bootstrapping by Java's pre-existing momentum meant Clojure was uniquely positioned to be useful at an early stage.  At this point, I feel there is enough momentum to keep Clojure moving forward for the foreseeable future.

Long-term benefits over short-term approachability

Clojure optimizes for long-term use and long-term simplicity over familiarity and initial ease.  However, at each decision point there is compelling rationale driving the design.  Things are made very easy whenever that does not come at the expense of the primary design concerns.

Tradeoffs for individuals

Certainly, by pursuing Clojure you are not pursuing other things, but the design elements are excellent to study, and the language itself is small.  Additionally, you are not leaving the JVM, which will be a relevant platform for many years to come.  ClojureScript has also recently come into existence, offering similar design tradeoffs targeting JavaScript engines, which will likewise stay relevant for the foreseeable future.  An investment in Clojure positions the user to take advantage of many platforms in perhaps more convenient ways.  It's not a wall of abstraction, and it doesn't shield you from learning the host environment.  For these reasons, it's less of an isolating 'language' commitment than other languages might be, while conferring substantial benefits.  Sure, languages come and go, but working in a lisp makes it easy not to get distracted by syntax and to focus instead on semantics.  Time spent dealing with abstractions this way makes it easy to disregard superficial differences in other languages, and the experience makes it easy to learn them quickly should the need arise.

Tradeoffs for companies

Companies have to worry about a number of things with regard to technology choices; chief among them is the ability to hire good developers to work in a language.  Clojure is not yet mainstream, and the developers are few and far between.  However, if you manage to find one, it's very likely they will be someone who cares about optimizing their workflow, productivity, and relevance; interest in Clojure is a good indicator of respect for the above tradeoffs and of good design sensibility.  The language and toolchain are increasing in popularity, and the community is very much engaged and invested in its success and continues to grow.  General hardware trends and competitive trends are going to push more interest in Clojure's direction, and the language itself will keep pace with innovation.  The bottom line is that it takes a bit of effort to learn, but it stays out of the way and presents safety and simplicity as the convenient things to do.  It integrates well with any JVM solution, and many companies such as Twitter run a JVM polyglot infrastructure that includes Clojure.  I think the most relevant analysis was the recent ThoughtWorks Radar (http://www.thoughtworks.com/radar), which both placed Clojure in the 'adopt' category and promoted small composable libraries, a hallmark of Clojure's approach.

Tradeoffs for me

Personally, through my experiences at work, multiple conferences, IRC and newsgroups, I’m convinced that the Clojure community is a melting pot of innovation from many walks of developers.  I have confidence that there won’t be a more relevant language for me for at least five years.  Given trends in hardware, concurrency will become more of a driving force in language decisions, and I want to be on the cusp.  For larger numbers of cores and distributed systems in the future, clojure is making its way into message-passing, and will certainly have a solid offering.  I can stop worrying about languages for a while, and I can instead focus on the JVM platform itself and general systems problems until we hit a point where the tradeoffs are no longer a match for the systems that need to be built.

What they say about lisp is true: I really feel like I'm learning the truths of computing without getting bogged down by the act of expression.  Now that I'm over the initial hump, I spend 99% of my time thinking about the problems I'm trying to solve instead of fussing with the tools.  When I have to learn something new about the language by doing a deep dive, it always feels like a worthwhile exercise, thanks to the readability of concise, idiomatic, well-composed code.

The right tool for the job

The 'right tool for the job' might be a more relevant thing to say in the material world, where you have to go somewhere and pay money for tools, and repurposing them would be too costly.  In open-source tech, we are able to trade our time and speculation to improve our own tools.  Taking advantage of and contributing back to someone else's tools is freely encouraged.  A 50-year-old hammer has an opportunity to influence the design of a future toolmaker's Swiss Army knife, in combination with other tools built by experts from other disciplines.  I hope I've been persuasive that it's more interesting to talk about actual design tradeoffs and their implications.

In conclusion, Clojure’s a more right tool for more jobs than one might think.  By making opinionated yet cautious choices, Clojure allows the user enough breathing room to compose and extend constructs in whichever way is appropriate, while maintaining a set of standards that promote safe, performant, and beautiful code.

Categories: Clojure, Observations

Dynamic JVM language Complexity and Implementation Line Count

January 14, 2012

Yesterday, I was hanging out in the #clojure IRC chat (pretty good bunch of guys).  A fella mentioned that he was impressed with how easy it is to read and understand clojure’s guts, and I totally identified with his sentiment.  I’ve always found it informative and enjoyable.

His words:

“im pretty sure i know more about the internals of clojures functions in a few years of dabbling than i know about python after many years of serious use”

Now, I don't mean to Python-bash; I myself was very interested in Python in much the same way as I am in Clojure now.  In fact, I volunteered on the AV crew at PyCon 2009 and learned a ton of good stuff; I recommend an experience like that to anyone who can do it.  Of course, I've come to love Clojure for the same reasons I loved Python, plus the extra concurrency features, metaprogramming, and speed.

In Python, I always had the sentiment that there was some magical __hack__() function you could mess with to do weird stuff.  In general, it felt like there was a whole layer of that underpinning everything; you get the sense that there's some untouchable magic.  I pride myself on doing deep dives into whatever I'm working on and understanding it thoroughly at every level of abstraction.  I like the security and agility that comes with knowing what my code actually does down to the machine level.  It means nothing's sacred; I am comfortable changing my code around quickly, and I can generally grok others' code pretty well.  In the couple of years that I really cared about Python, I never reached this level, even with the great talks I saw at PyCon that explained a lot of internals.  Maybe if I spent some more years on it the situation would improve, but lately I'm just more interested in learning CS fundamentals and actually getting stuff done.  Still, I could work in Python without knowing every detail, and it felt better than Java.

Clojure doesn’t feel like that at all.  In fact, to me it’s like a thin veneer over java.  I think I understand the gist of the whole implementation (minus macros, crazy data structure stuff and STM implementation details).  All the features are pretty orthogonal.  Yet, when people talk about clojure, they say it is powerful, terse and clear (I think).

I decided to do a little line count of the language implementations for fun.  I chose Jython and JRuby instead of the C implementations, as that seems more apples-to-apples.  I chose JRuby for the contrast with Jython, since I was initially under the impression that the two are pretty similar.  They all share a garbage collector and the underlying Java infrastructure to build from, so ideally the code should be representative of only the language's syntax and semantics, along with the standard library.  I downloaded the latest version of each implementation from its GitHub page and took care to count only code in the src/ subdirectories, not tests.  Hopefully this is some indication of relative complexity.

Too much talking.  Here are the results:

JRuby:

-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Java                          1230          35707          50104         191259
yacc                             3            394            121           3986
Ruby                            36            239            196           1452
HTML                             1              0              0             65
-------------------------------------------------------------------------------
SUM:                          1270          36340          50421         196762
-------------------------------------------------------------------------------

Jython:

--------------------------------------------------------------------------------
Language                      files          blank        comment           code
--------------------------------------------------------------------------------
Java                            436          15226          15396          94203
Python                           10            916           1032           3192
Bourne Again Shell                1             23             33            152
DOS Batch                         1             29             28            101
--------------------------------------------------------------------------------
SUM:                            448          16194          16489          97648
--------------------------------------------------------------------------------

Clojure:

-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Java                           146           5485          11086          35386
Lisp                            38           2286            828          14557
XML                              2              0              0             81
HTML                             2              6             58             71
-------------------------------------------------------------------------------
SUM:                           188           7777          11972          50095
-------------------------------------------------------------------------------

Now, what can we observe here?

Firstly, both Jython and JRuby are mostly implemented in Java.  Clojure, on the other hand, has a disproportionate amount of high-level code.  I think this is due to performance philosophy more than anything; I bet if the implementors could get the same speed implementing Ruby-in-Ruby or Python-in-Python, they would (see PyPy and Rubinius).  Clojure has a strong performance focus, and Clojure code is compiled as statically as possible, which means more of the performance-sensitive parts can be written in high-level code (I know JRuby has gotten much, much faster lately with the invokedynamic work).  Also, since it's a lisp, it bootstraps itself incrementally from meager beginnings. I encourage everyone to study core.clj to see how awesome this is.

Secondly, wtf: Clojure is half the size of Jython and a quarter the size of JRuby.  Why?  Well, it's a lisp.  This means the parser is pretty trivial, with LispReader.java clocking in at about 1k lines.  Comparatively, the Jython ANTLR package is 6k lines, and the JRuby parser is 19k lines of Java (plus 4k of yacc).  Once source is read, it's all native data structures that can be worked on with the normal sequence functions.  The others have crazy AST tree-visitor things.
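To make that concrete (a quick sketch of mine, not from the original post): the reader turns source text directly into ordinary Clojure data, so the "parser output" can be inspected with the same functions you use on any other collection.

(def form (read-string "(defn add [a b] (+ a b))"))

(type form)      ;=> clojure.lang.PersistentList
(first form)     ;=> defn
(nth form 2)     ;=> [a b]
(map type form)  ;=> symbols, a vector, and a nested list -- no special AST classes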

How do we explain the rest of the size difference?  I'm not really sure.  I think a lot of it is actual language semantics; perhaps a substantial amount of code sits in an expanded standard library, but I highly doubt that JRuby's library is THAT much more useful than Jython's.  Of course, both of those languages have the burden of implementing a standard library that acts the same on native code and on the JVM, so I imagine there's a lot of wheel reinvention.  Also, Clojure is relatively new.  I have no doubt that it will expand, but I trust it will stay comparatively slim.  Fun fact: if you pretend Clojure code is 10x as dense as Java (some say that), you still get 181k lines, less than JRuby :-).

So, I know relatively little about ruby, and a bit more about python.  The conclusion I want to put forth isn’t that they’re doing it wrong or we’re better.  I just want us to ask the question, is all that extra stuff really necessary for the power you get from it?  And, wouldn’t you want to know how your tools work, too, or is it just a weird fixation?  Do people like to program by faith, or is it really not so hard to hold those languages fully in your head?

EDIT:
Groovy:

-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Java                           828          20549          51344         103197
Groovy                         147           2899           5407          12944
HTML                            62            153            165           1829
XML                              6             23             42             79
CSS                              1              7              7             14
-------------------------------------------------------------------------------
SUM:                          1044          23631          56965         118063
-------------------------------------------------------------------------------

Groovy is representative of a dynamic language that is loyal only to the JVM.  It should be more comparable to Clojure, since it does not carry the burden of multi-platform semantics.  Its ANTLR (parser) package alone is 7k lines of Java.
Scala (a little complicated):
main src directory:

--------------------------------------------------------------------------------
Language                      files          blank        comment           code
--------------------------------------------------------------------------------
Scala                          1507          31557          64947         136732
Java                            202           5734          14834          26034
Javascript                        7             97            114           1217
XML                              18             74             70            973
CSS                               4            195             98            866
HTML                              1             19             13            168
Bourne Again Shell                1             24             35            143
Bourne Shell                      1              1              4              3
--------------------------------------------------------------------------------
SUM:                           1741          37701          80115         166136
--------------------------------------------------------------------------------

just the compiler directory:

--------------------------------------------------------------------------------
Language                      files          blank        comment           code
--------------------------------------------------------------------------------
Scala                           456          14648          22478          77911
Javascript                        7             97            114           1217
CSS                               3            179             97            804
Bourne Again Shell                1             24             35            143
XML                               2              0              4             30
--------------------------------------------------------------------------------
SUM:                            469          14948          22728          80105
--------------------------------------------------------------------------------

Scala's here just for fun. It's a static language, so it doesn't quite fit in with the rest.

Categories: Observations

Update

April 3, 2011

I've started a new, challenging job. One way in which it's unexpectedly hard is learning an existing code-base. Previously I was writing mostly new code, but I think this experience is highlighting a fundamental issue in the way we've designed software for the last 20 years or so.

The promise of OOP is modularity, reuse, and a general lowering of complexity. But I think there is a fundamental mismatch between the ideals of objects and types, their interactions, and how we think. My approach so far has always been an imperative one: mainly, I think about what the code is going to do as I build my mental model of a code-base.

So, the first point I want to make is that as developers we all aim for simplicity and conciseness. We have certain requirements that our code should meet. Static type systems like those of C++ and Java promise that we can encode those requirements as types and function signatures. So we do that, and generally take it to an extreme for maximum effectiveness. Encapsulation promises that we can encode our contracts in types. Polymorphism gives us some flexibility later, but the main issue is that we have to create huge infrastructures of code to encode our requirements, and polymorphism is necessary to make it not suck as much every time we change something.

As I’m reading over a code-base, I have to re-discover the original intent that was lost in the ceremony. There’s cruft all around, and I’m following trails and back-tracking with some sense that what I’m looking for might be just around the corner, and some kind of hunch for predicting whether I’m getting closer or not, all the while learning what the code is going to do and getting bogged down in implementation details.

I propose a few things that would help.

Firstly, if we keep going the way we are, only able to encode contracts through types, I want a tool that can index call stacks and then find a path from one type to another. I want to know what the main entry points are. For instance, do I have to create a factory that takes an inner-class enum for its argument to get access to a transformation that yields the type I want? I should easily be able to generate a chain of function calls that gets me to that type. This would also make it simple to find the main entry points and highlight the designer's intent.

Secondly, I want some way to formally specify the inputs and outputs of everything that is succinct, easy and helpful to use, lives outside of implementation details, and would be tool-friendly.

Thirdly, I want the capability of performing a semantic code search using these formal constructs. We really should be past the point of searching by text.

Categories: Observations

Game engine, OOP thoughts

January 1, 2010

I think I have learned something valuable recently while thinking about how to construct my game objects, and I should share.  I was considering the implementation details of embedding a physics engine into my objects, and I felt considerable apprehension about using multiple inheritance, though it was the most obvious thing to do.  I also considered a templated mixin approach, since I had seen one when using the OpenSteer pathfinding library for a class project.  Eventually, this led me to what I now think is the right answer.  The downside of the obvious thing, a crazy inheritance hierarchy, is that I'd be making it very hard to change my mind if something were to come up later that I hadn't planned on, such as a new piece of functionality or ability.  I'd have to blow up my interface classes or make some awful hacks to get it to work right.  The correct thing to do is to use components and has-a relationships as much as possible within the definitions of classes.  Multiple sources reference Design Patterns:

(1) program to an interface and not to an implementation

(2) favor object composition over class inheritance

There must be a good reason these folks who are much smarter than me say this, and I think that up-front commitment to complexity is what made me feel so nervous about adding functionality through multiple inheritance.  Simplicity makes life easier.  My scene graph structure should not have any idea of the physics details (or sound, or whatever); that's something for the classes to figure out.  If I am stingy with my inheritance, I'm less likely to get stuck somewhere along the way or need to hack something.  My game objects (scene graph nodes) can have-a physics object just as easily as they can be-a physics object, but the implementation ends up less tied down if I use components where it makes sense.  I think it will also cause fewer troubles when I start thinking about multi-threading.  For instance, I could have separate data structures for each type of component (physics objects, sound sources, etc.), and I wouldn't need to walk the scene graph all the time.  Each subsystem would only be exposed to what it needs in order to do its job.  This feels more right.

Some articles I found:

A talking door broke everything:

http://forums.tigsource.com/index.php?topic=10112.0

Interesting read:

http://gamearchitect.net/Articles/GameObjects1.html

Categories: Observations

Welcome

January 19, 2009

Hi, welcome to my site!  It’s still a work in progress.

Categories: Observations