Separation of concerns, or let-map, birth of a clojure macro

Motivation

I’ve been doing substantial work in clojure recently, and it’s created opportunities for me to think about how I program more deeply. Of course, in itself that would be a useless diversion, but in fact the nature of lisp’s abstraction tools affects my workflow significantly. There is truth to the notion of growing your own language to do the job. I really get the sense that I am learning deeper secrets of expressivity and programming by having so much power over my code, and I spend very little time actually repeating myself. There are of course adjustments to be made coming from an imperative, statically typed background, and even when I thought I understood the language from about a year of study, books and toy programs, I’m finding there’s a lot more to think about when you use it full-time. Here, I provide one such real-world example and a walkthrough of one of my first proper user-defined syntactic abstractions (macro).

Consider the case of a ‘main’ method, the initial entry-point to a command line program. Here’s the simplest example:

(defn -main
  [& args]
  (do-things args))

Clojure’s runtime will pack the java String array that normally takes the role of a variadic arglist into a clojure Seq named args. This simple case of a main method is simply passing the args to a function that can handle it. However, this is suboptimal. It’s taking the burden of argument-parsing and placing it deeper in your application code. Really, if I saw this in the wild, my first impulse would be to separate the concerns as much as possible. The business code should be concerned with data on its own terms, and any adapters and bridges from other domains (shell-land, the internet, etc.) should live outside as meager servants. Core application/library code should be general and pure. Consider the case that do-things is of this form:

(defn do-things
  "Does the things"
  [{:keys [things-to-do when-to-do-them how-to-do-them]}]
  (actually-do-things things-to-do when-to-do-them how-to-do-them))

If you’re not familiar with the {:keys [..]} syntax, it’s a shortcut for pulling out values from associative maps, called destructuring. It’s a pervasive pattern in clojure binding forms, originating from common lisp. It hides the map lookup for each key behind syntax sugar, using a macro. We will take advantage of this ability later, but for now, think of ‘do-things’ as a function that takes a map of this form:

{:things-to-do ...
 :when-to-do-them ...
 :how-to-do-them ...}
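For readers new to destructuring, here is a minimal sketch of what the sugar stands for. The two function names and the sample input are made up for illustration; both functions take the same map and return the same pair.

```clojure
;; Hypothetical helpers, for illustration only.

;; With destructuring: the lookups are hidden in the parameter vector.
(defn with-sugar
  [{:keys [things-to-do when-to-do-them]}]
  [things-to-do when-to-do-them])

;; Without it: the same lookups written out by hand.
(defn by-hand
  [m]
  (let [things-to-do    (get m :things-to-do)
        when-to-do-them (get m :when-to-do-them)]
    [things-to-do when-to-do-them]))

(def example-input {:things-to-do [:dishes] :when-to-do-them :now})

(with-sugar example-input)  ; [[:dishes] :now]
(by-hand example-input)     ; [[:dishes] :now]
```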

The keys are clojure keywords, and the values are whatever they need to be. In my feeble mind, I think of this structure as a sort of anonymous type, a specification of a contract to the input of the function. Since clojure is so incredibly powerful at dealing with data, we can avoid writing code that would normally show up as api by taking advantage of rich declarative data. And code is data. Back to our -main function:

(defn -main
  [& args]
  (do-things (parse args)))

We’ve added an extra step: there is now a ‘parse’ function. By separating out the parsing of the args, we can test parsing on its own; however, the output of parse is now coupled to the contract of the do-things function. In most cases, this is not a substantial improvement, but it’s nice to have the option and freedom. The -main function is getting closer to simple glue that binds data transformations together, where each piece can exist, vary, be reasoned about and tested independently.

To fully decouple the parsing, -main has to have a transformation step. Here’s a simple example of what parse could look like in this scenario. It is now concerned only with the structure of the arguments and defines its own input/output contract. ‘some-magic-parsing-fn’ stands for some library function that provides a data-centric interface to the arguments; for instance, getopt-style command-line input can be parsed into a clojure map, where options and specs specify how the data is transformed. We take that raw transformation and apply some defaults in our custom parse function.

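(defn parse
  "Turns the raw command-line args into a map of :in, :out and :munge"
  [args]
  (let [parsed (some-magic-parsing-fn args)
        {:keys [in out munge]} parsed
        in (or in "default-in")             ; fall back to a default input
        out (or out (get-default-out in))]  ; default output depends on the input
    {:in in
     :out out
     :munge munge}))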

Here we see a function that applies some defaults to the parsed arguments, even making the default of ‘out’ rely on a function of the ‘in’. So, we have our separately-concerned argument transformation, it’s independent from both our glue logic and our library code. All it’s really doing is munging, ‘destructuring’ and reassembling things into a new map form. But, something feels odd and a little repetitive, no? What the heck is this about?

{:in in       
 :out out
 :munge munge}

It’s just ugly! Why did I do that? It turns out that the ability to destructure in binding forms is hella convenient, and you can’t do anything similar inside a map literal. Nor can a map literal refer back to key/value pairs that were already defined earlier in the same literal. Since I had to do this same sort of thing a few times, I thought it would be worth my while to make it pretty. Wouldn’t it be great to glue together a destructuring binding form that outputs a map? That seemed like a totally useful thing, so I made it. It’s kind of a ‘restructuring’ operation.

Implementation

I asked around on IRC for some advice on how to do this and got some help. I didn’t come up with it, but I can explain it :-).

(defmacro let-map
  [pairs]
  (let [names (map first (partition 2 pairs))]
    `(let [~@pairs]
       (zipmap
        [~@(map (comp keyword name) names)]
        [~@names]))))

Defmacro is itself a macro in clojure. The clojure compiler provides a hook to run code as a macro, that is, at compile-time. The common use of a macro is to manipulate forms and spit out other code, that is, syntactic abstraction, though it can be used for lots of things. The structure of ‘pairs’ is expected to be a name (symbol) followed by a value, just like regular let. The first let extracts all the names from the bindings vector for later use. The backtick is the syntax-quote operator: it tells the reader to treat the following s-exp as data, so calls inside it are not evaluated. The major difference from the normal ' quote operator is that syntax-quote namespace-qualifies any symbols, for instance:

user> '(let [a b c])
(let [a b c])
user> `(let [a b c])
(clojure.core/let [user/a user/b user/c])

Since a, b, and c do not resolve to anything, they are qualified with the current namespace (user, here).
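A symbol that does resolve behaves a little differently: syntax-quote qualifies it with the namespace it actually lives in. A small sketch of that, where the var names plain and qualified are mine:

```clojure
;; Plain quote leaves every symbol exactly as written.
(def plain '(map inc xs))

;; Syntax-quote resolves map and inc to clojure.core, while the
;; unresolvable xs is qualified with whatever namespace is current.
(def qualified `(map inc xs))

(first plain)      ; map
(first qualified)  ; clojure.core/map
```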

So, now we spit out a proper let form. Within the bindings of the let we see this nonsense: ~@pairs. What is that?! The ~ (unquote) switches the next form back to value-mode, where symbols actually get evaluated into values. This is the mode we’re normally in when we write code. Since pairs is itself a vector of forms, we have a problem: a plain ~pairs would give us [[a b c d]], which is not what we’re looking for. How do we un-nest the values from the inner vector? The @ is the key: ~@ (unquote-splicing) evaluates the form and splices its elements into the surrounding collection.

So, we have our let-bindings; how do we output a map? We have to build up the keys and values. Zipmap is suitable for assembling a map from seqs of keys and values. We create a seq of keys by applying ‘keyword’ to the name of each symbol, splicing them into a vector. We create a seq of values by simply splicing the names in as symbols. When the generated code runs, the compiler resolves those symbols to the values bound by the let, as in regular code.

So, that totally works, but we don’t yet have destructuring. Wouldn’t it be nice if we could make that happen? It’s not so difficult actually, with some caveats.
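Going back to the splicing and zipmap steps for a moment, here is a small sketch that shows each one in isolation. The var names are only for illustration.

```clojure
(def pairs '[a 1 b 2])

;; Plain unquote drops the whole vector in as a single element.
(def nested `(let [~pairs]))    ; (clojure.core/let [[a 1 b 2]])

;; Unquote-splicing unpacks the elements into the surrounding vector.
(def spliced `(let [~@pairs]))  ; (clojure.core/let [a 1 b 2])

;; zipmap pairs up a seq of keys with a seq of values.
(def zipped (zipmap [:a :b] [1 2]))  ; {:a 1, :b 2}
```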

Let’s see what ‘destructure’ gives us:

user> (destructure '[[a b] [1 2]])
[vec__2314 [1 2] a (clojure.core/nth vec__2314 0 nil) b (clojure.core/nth vec__2314 1 nil)]

Hmm…. interesante. It’s generating symbols and function calls to pull out the values that we need.
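To see which names that expansion ends up binding, we can pick out every other element of the result. A minimal sketch, with var names of my own choosing; the generated symbol will have a different number on every run.

```clojure
;; destructure is a public function in clojure.core; let itself uses it.
(def expanded (destructure '[[a b] [1 2]]))

;; Every even-indexed element of the result is a name being bound.
(def bound-names (take-nth 2 expanded))

;; The first name is generated (something like vec__2314); the rest are ours.
(rest bound-names)  ; (a b)
```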

What if we just run our bindings through destructure? Here’s the new let-map:

(defmacro let-map
  "Avoids having to build maps from let bindings manually"
  [pairs]
  (let [pairs (destructure pairs)
        names (map first (partition 2 pairs))]
    `(let [~@pairs]
       (zipmap [~@(map (comp keyword name) names)]
               [~@names]))))

user> (let-map [[a b] [1 2]
                {c :c d :d} {:c 3 :d 4}])
{:d 4, :c 3, :map__2495 {:c 3, :d 4}, :b 2, :a 1, :vec__2494 [1 2]}

There you have it, if you’re willing to tolerate some extra gensyms for the destructured maps, this will suffice, but I think we can do better. How about if we walk through our input and pick out the original symbols? Then we can assemble only the parts we want.

My first thought was to flatten everything, though ‘flatten’ only works on seqs and vectors. If we look at the code to flatten, it gives us a clue what to do.

(defn flatten
  "Takes any nested combination of sequential things (lists, vectors,
  etc.) and returns their contents as a single, flat sequence.
  (flatten nil) returns nil."
  {:added "1.2"
   :static true}
  [x]
  (filter (complement sequential?)
          (rest (tree-seq sequential? seq x))))

So, we want to walk the forms and identify any symbols. Destructuring works on vectors or maps, so it would make sense to expand only those. We look at the definition of tree-seq, which is a clever builder higher-order-function that creates a sequence based on a tree walk, branching when a predicate is true, and pulling out a seq of children by calling a function on the node. It’s the right tool for the job!

user> (tree-seq #(or (vector? %) (map? %)) identity [[5 6] [1 2] {7 :c 9 :d}])
([[5 6] [1 2] {7 :c, 9 :d}] [5 6] 5 6 [1 2] 1 2 {7 :c, 9 :d} [7 :c] 7 :c [9 :d] 9 :d)
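The same walk is perhaps easier to follow on a smaller tree. This is just a sketch with made-up data: the first argument decides whether a node is a branch, the second returns a branch's children.

```clojure
;; Branch into vectors only; the children of a vector are its elements.
(def walked (tree-seq vector? seq [1 [2 [3]] 4]))
;; walked is ([1 [2 [3]] 4] 1 [2 [3]] 2 [3] 3 4):
;; each branch shows up first, followed by its children.

;; Keeping only the leaves we care about is an ordinary filter.
(def leaves (filter number? walked))  ; (1 2 3 4)
```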

So, we create a seq, then we simply filter it for symbols. Since I always wish smart guys would show their thought process so I can learn how to be like them, I will show you all my mistakes along the way :-).

(defmacro get-symbols
  [form]
  (->> (tree-seq #(or (vector? %) (map? %)) identity form)
       (filter symbol?)))
user> (get-symbols '[[a b] [1 2] {c :c d :d}])
()

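Oops… The macro received the unevaluated form (quote [[a b] ...]), which is a list rather than a vector, so tree-seq never descended into it.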

(defmacro get-symbols
  [form]
  `(->> (tree-seq #(or (vector? %) (map? %)) identity form)  ;hmmm, I might be forgetting something here
        (filter symbol?)))

user> (get-symbols '[[a b] [1 2] {c :c d :d}])

No such var: user/form
  [Thrown class java.lang.RuntimeException]

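Oops. Inside the syntax-quote, form was namespace-qualified into user/form instead of being replaced by the macro’s argument; it needs an unquote.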

(defmacro get-symbols
  [form]
  `(->> (tree-seq #(or (vector? %) (map? %)) identity ~form)
        (filter symbol?)))

user> (get-symbols '[[a b] [1 2] {c :c d :d}])
(a b c d)
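Since get-symbols only shuffles data around, nothing forces it to be a macro. Here is a hedged sketch of the same walk as a plain function (the name binding-symbols is mine, not part of the original code). As a function it sees the already-evaluated vector, so the quoting questions do not come up.

```clojure
(defn binding-symbols
  "Returns every symbol nested inside the vectors and maps of form."
  [form]
  (->> (tree-seq #(or (vector? %) (map? %)) seq form)
       (filter symbol?)))

(binding-symbols '[[a b] [1 2] {c :c d :d}])  ; (a b c d)
```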

Success! Now, let’s hook it up to let-map.

(defmacro let-map
  "Avoids having to build maps from let bindings manually"
  [bindings]
  (let [bindings (destructure bindings)
        names (get-symbols bindings)]  ; as long as it's in there, right?
    `(let [~@bindings]
       (zipmap [~@(map (comp keyword name) names)]
               [~@names]))))

user> (let-map [[a b] [1 2]
                {c :c d :d} {:c 3 :d 4}])
{:d 4, :c 3, :map__3140 {:c 3, :d 4}, :b 2, :a 1, :vec__3139 [1 2]}

…try again…

(defmacro let-map
  "Avoids having to build maps from let bindings manually"
  [bindings]
  (let [names (get-symbols bindings)   ; immutability decomplects time and value
        bindings (destructure bindings)]
    `(let [~@bindings]
       (zipmap [~@(map (comp keyword name) names)]
               [~@names]))))

user> (let-map [[a b] [1 2]
                {c :c d :d} {:c 3 :d 4}])
{:d 4, :c 3, :b 2, :a 1}

Shazam! And if you want to inline get-symbols, notice we don’t have to quote anything.

EDIT: we also want to zip each name only once, and calling ‘name’ may be unnecessary
EDIT: only traverse the odd-numbered elements of bindings (the binding forms) for symbols, in case the values are quoted symbols

Final Code: let-map

(defmacro let-map
  "Avoids having to build maps from let bindings manually"
  [bindings]
  (let [names (->> (take-nth 2 bindings)
                   (tree-seq #(or (sequential? %) (map? %)) identity)
                   (filter symbol?)
                   (into #{}))  ; dumps it all into a set
        bindings (destructure bindings)]
    `(let [~@bindings]
       (zipmap [~@(map keyword names)]
               [~@names]))))
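A few calls that exercise the two EDITs and the destructuring, with the macro repeated so the snippet stands on its own. The inputs are made up; the maps may print with their keys in a different order.

```clojure
(defmacro let-map
  "Avoids having to build maps from let bindings manually"
  [bindings]
  (let [names (->> (take-nth 2 bindings)
                   (tree-seq #(or (sequential? %) (map? %)) identity)
                   (filter symbol?)
                   (into #{}))
        bindings (destructure bindings)]
    `(let [~@bindings]
       (zipmap [~@(map keyword names)]
               [~@names]))))

;; Rebinding a name keeps a single key, holding the last value.
(let-map [a 1 b 2 a b])         ; {:a 2, :b 2}

;; A quoted symbol on the value side is not mistaken for a binding.
(let-map [x 'y])                ; {:x y}

;; Destructured names come through without the generated ones.
(let-map [[a b] [1 2]
          {:keys [c]} {:c 3}])  ; {:a 1, :b 2, :c 3}
```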

Here we can see the code the macro is producing:

user> (pprint (macroexpand-1 '(let-map [[a b] [1 2]
                                        {c :c d :d} {:c 3 :d 4}])))
(clojure.core/let
 [vec__3309
  [1 2]
  a
  (clojure.core/nth vec__3309 0 nil)
  b
  (clojure.core/nth vec__3309 1 nil)
  map__3310
  {:c 3, :d 4}
  map__3310
  (if
   (clojure.core/seq? map__3310)
   (clojure.core/apply clojure.core/hash-map map__3310)
   map__3310)
  c
  (clojure.core/get map__3310 :c)
  d
  (clojure.core/get map__3310 :d)]
 (clojure.core/zipmap [:a :b :c :d] [a b c d]))

TL;DR

What did we just do? We’ve put somewhat of a ‘design pattern’ into a single word. We can trade a little bit of effort for arbitrary expressivity. But we lose a little on having to grok the ever-more nested conceptual tree of abstractions. Personally, I think the tradeoff is worth it. Behold the power of lisp!

The final result follows; it would be much nastier if we had to explode out all the maps:

(defn parse
  [args]
  (let-map [parsed (some-magic-parsing-fn args)
            {:keys [in out munge]} parsed
            in (or in "default-in")
            out (or out (get-default-out in))]))

(defn do-things
  "Does the things"
  [{:keys [things-to-do when-to-do-them how-to-do-them]}]
  (actually-do-things things-to-do when-to-do-them how-to-do-them))

(defn -main
  [& args]
  (let [{:keys [in out munge]} (parse args)]
    (do-things
      (let-map [things-to-do (make-things in)
                when-to-do-them :now
                how-to-do-them (munging munge out)]))))
Categories: Clojure
  1. Miloslav Rauš
    June 11, 2012 at 11:33 am

    Nice example/ useful construct (+1 for inclusion into core / some

    The last example (let-map/line 17 in -main in TL;DR section) has the closing square-bracket too soon.

    I would also choose other name (something not containing let-; it somehow implies that it should create bindings usable in the body of the construct; maybe to-map, or create-map, named-map or what-ever ;-)).

    Also, the use of (comp keyword name) might be unnecessary, as keyword will convert symbol names into keywords. The only difference is that name strips namespace part of the name, might this be important ?

    • Miloslav Rauš
      June 11, 2012 at 11:35 am

      sorry for unfinished sentence; +1 for inclusion into core / some relevant “contrib” part.

      PS:

    • gtrak
      June 11, 2012 at 12:12 pm

      Ah, Fixed the error. As for calling ‘name’ on it, is there any harm in stripping the namespace? I couldn’t come up with a counter-example, or even an example where it would apply.

  2. Octopusgrabbus
    June 11, 2012 at 1:24 pm

    I’m going to take a closer look at this. I still have no reason to use macros, at least yet.

    • gtrak
      June 11, 2012 at 1:56 pm

      Just to nitpick, you ‘use’ macros all over the place, I think you mean ‘create’ them. Really I think of them as existing for scripting the compiler, anything you want to do at compile-time is a macro. For instance, another macro I created recently would parse a data directory and generate ‘deftest’ forms for each data file. That’s not something I could have done with just functions. So, part of the answer is to use a macro when you want something to happen at compile-time. In this way they’re analogous to C++ templates, but much more powerful as you can see, since you can manipulate your code using all the tools you already have at your disposal.

  3. gtrak
    June 11, 2012 at 4:42 pm

    ah, I found another issue:

    user> (macroexpand-1 '(let-map [a 1 b 2 a b]))
    (clojure.core/let [a 1 b 2 a b] (clojure.core/zipmap [:a :b :a :b] [a b a b]))
    Let’s clean that up.

  4. gtrak
    June 11, 2012 at 5:00 pm

    Also we probably only want to consider symbols that are children of the odd-numbered values of ‘bindings’, in the case that a bound value might actually be or contain a quoted symbol. I will provide a fix for that a bit later.

  5. gtrak
    June 11, 2012 at 6:33 pm

    helpful IRC guys showed me a nice alternative, check out https://github.com/flatland/useful/blob/develop/src/useful/map.clj#L5

    usage: (keyed [a b c]) => {:a a, :b b, :c c}

    It has some nice advantages, namely it doesn’t require a zipmap op at runtime, as it just builds the proper form to define a map at compile-time, and it’s much simpler to avoid parsing the let syntax for bindings.
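To illustrate the compile-time idea only, here is a rough sketch of such a macro. It is not the flatland/useful source, and the name keyed-sketch is made up; the point is that the macro returns a map literal, so no zipmap call is left for runtime.

```clojure
;; Assumption: a simplified stand-in, not the library's implementation.
;; The expansion of (keyed-sketch [a b c]) is the literal {:a a, :b b, :c c}.
(defmacro keyed-sketch
  [syms]
  (into {} (map (fn [s] [(keyword (name s)) s]) syms)))

(let [a 1 b 2 c 3]
  (keyed-sketch [a b c]))  ; {:a 1, :b 2, :c 3}
```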
