Written on: 2019-08-07

Optimization of higher-order functions in Clojure

Start with a very simple Clojure program:

$ mkdir -p src/blog
$ cat >src/blog/core.clj <<EOF
(ns blog.core
  (:gen-class))

(defn -main
  [& args]
  (println "foo"))
EOF

Make a directory for compiled classes and compile it:

$ mkdir out
$ java -cp clojure.jar:src -Dclojure.compile.path=out clojure.lang.Compile blog.core
Compiling blog.core to out

And check that it worked:

$ java -cp clojure.jar:out blog.core
foo

If you take a look in the out directory, you will see that simple program has turned into five separate class files:

$ du -a out
4       out/blog/core__init.class
4       out/blog/core$loading__6434__auto____279.class
4       out/blog/core$_main.class
4       out/blog/core$fn__281.class
4       out/blog/core.class
24      out/blog
28      out

Let's add a function to our namespace:

$ cat >src/blog/core.clj <<EOF
(ns blog.core
  (:gen-class))

(defn foo [args] args)

(defn -main
  [& args]
  (println (foo args)))
EOF
$ java -cp clojure.jar:src -Dclojure.compile.path=out clojure.lang.Compile blog.core
Compiling blog.core to out
$ java -cp clojure.jar:out blog.core bar
(bar)

You should now see that another class file has been created in out called core$foo.class. Each new Clojure function we create will result in a new Java class file being created.

Let's take a look at it:

$ javap -c out/blog/core\$foo.class
Compiled from "core.clj"
public final class blog.core$foo extends clojure.lang.AFunction {
  public blog.core$foo();
    Code:
       0: aload_0
       1: invokespecial #9                  // Method clojure/lang/AFunction."<init>":()V
       4: return

  public static java.lang.Object invokeStatic(java.lang.Object);
    Code:
       0: aload_0
       1: aconst_null
       2: astore_0
       3: areturn

  public java.lang.Object invoke(java.lang.Object);
    Code:
       0: aload_1
       1: aconst_null
       2: astore_1
       3: invokestatic  #16                 // Method invokeStatic:(Ljava/lang/Object;)Ljava/lang/Object;
       6: areturn

  public static {};
    Code:
       0: return
}

Our Clojure function blog.core/foo has turned into a Java class blog.core$foo. It has a constructor that just calls the superclass constructor and two methods: invokeStatic and invoke.

Let's make foo do something more than just return its argument:

$ cat >src/blog/core.clj <<EOF
(ns blog.core
  (:gen-class))

(defn foo [args] (rest args))

(defn -main
  [& args]
  (println (foo args)))
EOF
$ java -cp clojure.jar:src -Dclojure.compile.path=out clojure.lang.Compile blog.core
Compiling blog.core to out
$ java -cp clojure.jar:out blog.core bar baz
(baz)

And disassemble it again:

$ javap -c out/blog/core\$foo.class
Compiled from "core.clj"
public final class blog.core$foo extends clojure.lang.AFunction {
  public static final clojure.lang.Var const__0;

  public blog.core$foo();
    Code:
       0: aload_0
       1: invokespecial #9                  // Method clojure/lang/AFunction."<init>":()V
       4: return

  public static java.lang.Object invokeStatic(java.lang.Object);
    Code:
       0: getstatic     #15                 // Field const__0:Lclojure/lang/Var;
       3: invokevirtual #21                 // Method clojure/lang/Var.getRawRoot:()Ljava/lang/Object;
       6: checkcast     #23                 // class clojure/lang/IFn
       9: aload_0
      10: aconst_null
      11: astore_0
      12: invokeinterface #26,  2           // InterfaceMethod clojure/lang/IFn.invoke:(Ljava/lang/Object;)Ljava/lang/Object;
      17: areturn

  public java.lang.Object invoke(java.lang.Object);
    Code:
       0: aload_1
       1: aconst_null
       2: astore_1
       3: invokestatic  #30                 // Method invokeStatic:(Ljava/lang/Object;)Ljava/lang/Object;
       6: areturn

  public static {};
    Code:
       0: ldc           #33                 // String clojure.core
       2: ldc           #35                 // String rest
       4: invokestatic  #41                 // Method clojure/lang/RT.var:(Ljava/lang/String;Ljava/lang/String;)Lclojure/lang/Var;
       7: checkcast     #17                 // class clojure/lang/Var
      10: putstatic     #15                 // Field const__0:Lclojure/lang/Var;
      13: return
}

The constructor hasn't changed and invoke still just calls invokeStatic, but invokeStatic now does something more. We have a Clojure Var that is initialized to clojure.core/rest in the class' static initialization block and whose invoke method is called (via the clojure.lang.IFn interface) in our invokeStatic method. It is a bit convoluted but it's pretty clear how that comes from our use of rest.

Now let's add another function of our own and map it over the args:

$ cat >src/blog/core.clj <<EOF
(ns blog.core
  (:gen-class))

(defn addy [x] (str x "y"))

(defn foo [args] (map addy args))

(defn -main
  [& args]
  (println (foo args)))
$ java -cp clojure.jar:src -Dclojure.compile.path=out clojure.lang.Compile blog.core
Compiling blog.core to out
$ java -cp clojure.jar:out blog.core bar baz
(bary bazy)

We now have a new file in out: core$addy.class. This contains the new class for our addy function, which has its own constructor, invoke and invokeStatic methods and its own Var object for clojure.core/str.

If we disassemble core$foo again, we can see how it is used:

public final class blog.core$foo extends clojure.lang.AFunction {
  public static final clojure.lang.Var const__0;

  public static final clojure.lang.Var const__1;

  public blog.core$foo();
    Code:
       0: aload_0
       1: invokespecial #9                  // Method clojure/lang/AFunction."<init>":()V
       4: return

  public static java.lang.Object invokeStatic(java.lang.Object);
    Code:
       0: getstatic     #15                 // Field const__0:Lclojure/lang/Var;
       3: invokevirtual #21                 // Method clojure/lang/Var.getRawRoot:()Ljava/lang/Object;
       6: checkcast     #23                 // class clojure/lang/IFn
       9: getstatic     #26                 // Field const__1:Lclojure/lang/Var;
      12: invokevirtual #21                 // Method clojure/lang/Var.getRawRoot:()Ljava/lang/Object;
      15: aload_0
      16: aconst_null
      17: astore_0
      18: invokeinterface #30,  3           // InterfaceMethod clojure/lang/IFn.invoke:(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
      23: areturn

  public java.lang.Object invoke(java.lang.Object);
    Code:
       0: aload_1
       1: aconst_null
       2: astore_1
       3: invokestatic  #34                 // Method invokeStatic:(Ljava/lang/Object;)Ljava/lang/Object;
       6: areturn

  public static {};
    Code:
       0: ldc           #37                 // String clojure.core
       2: ldc           #39                 // String map
       4: invokestatic  #45                 // Method clojure/lang/RT.var:(Ljava/lang/String;Ljava/lang/String;)Lclojure/lang/Var;
       7: checkcast     #17                 // class clojure/lang/Var
      10: putstatic     #15                 // Field const__0:Lclojure/lang/Var;
      13: ldc           #47                 // String blog.core
      15: ldc           #49                 // String addy
      17: invokestatic  #45                 // Method clojure/lang/RT.var:(Ljava/lang/String;Ljava/lang/String;)Lclojure/lang/Var;
      20: checkcast     #17                 // class clojure/lang/Var
      23: putstatic     #26                 // Field const__1:Lclojure/lang/Var;
      26: return
}

Unsurprisingly, we have two Var objects: one for clojure.core/map and one for blog.core/addy. We call the invoke method on map via the IFn interface, just as we did for rest last time, and we pass the addy Var as an argument to that method.

This is quite a lot of indirection. By using Var objects like this, Clojure allows the user to redefine any Var at runtime. You could connect to a running REPL and redefine map to do something else, then foo would call your new function instead. This is very flexible but not always necessary, especially in a production environment.

Since indirection is slow, Clojure offers the clojure.compiler.direct-linking option when compiling. This will replace the invocations of map via a Var with direct method invocations:

$ java -cp clojure.jar:src -Dclojure.compile.path=out -Dclojure.compiler.direct-linking=true clojure.lang.Compile blog.core
Compiling blog.core to out
$ java -cp clojure.jar:out blog.core bar baz
(bary bazy)
$ javap -c out/blog/core\$foo.class
Compiled from "core.clj"
public final class blog.core$foo extends clojure.lang.AFunction {
  public static final clojure.lang.Var const__1;

  public blog.core$foo();
    Code:
       0: aload_0
       1: invokespecial #9                  // Method clojure/lang/AFunction."<init>":()V
       4: return

  public static java.lang.Object invokeStatic(java.lang.Object);
    Code:
       0: getstatic     #15                 // Field const__1:Lclojure/lang/Var;
       3: invokevirtual #21                 // Method clojure/lang/Var.getRawRoot:()Ljava/lang/Object;
       6: aload_0
       7: aconst_null
       8: astore_0
       9: invokestatic  #26                 // Method clojure/core$map.invokeStatic:(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
      12: areturn

  public java.lang.Object invoke(java.lang.Object);
    Code:
       0: aload_1
       1: aconst_null
       2: astore_1
       3: invokestatic  #31                 // Method invokeStatic:(Ljava/lang/Object;)Ljava/lang/Object;
       6: areturn

  public static {};
    Code:
       0: ldc           #34                 // String blog.core
       2: ldc           #36                 // String addy
       4: invokestatic  #42                 // Method clojure/lang/RT.var:(Ljava/lang/String;Ljava/lang/String;)Lclojure/lang/Var;
       7: checkcast     #17                 // class clojure/lang/Var
      10: putstatic     #15                 // Field const__1:Lclojure/lang/Var;
      13: return
}

The map invocation is no longer performed via the invoke method on a Var: instead we call the core$map class' invokeStatic method directly. But we do still have a Var for the function that we pass to map. This hasn't been removed because we are not directly invoking it, we are passing it as an argument to map.

This remaining level of indirection, the function call, adds a considerable overhead to the use of map. To eliminate this overhead, we would like to inline the body of the addy function.

If we were writing in a procedural language with a for loop, we probably wouldn't have introduced a function call here anyway: the functionality provided by addy is so trivial that we would have just written it in the body of the for loop. But if we wish to use a functional programming style with map then we must provide a function. We are now at the mercy of the compiler: is it clever enough to automatically inline addy and remove the function call overhead?

Clearly the first-stage Java compiler did not inline it. However, Java bytecode is executed on the JVM, which has a JIT that optimizes hot code. We can ask the JVM to log whenever the JIT inlines a function:

$ java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining -cp clojure.jar:out blog.core bar baz | grep addy

And we see nothing. (Without the | grep addy we see lots of things, but we want to see addy being inlined and we don't see that.) This isn't very surprising since we only call addy once: the JIT won't consider that code a candidate for optimization unless it is called many times.

So let's call it lots of times. Also, let's print to stderr so that we can easily distinguish between our program's output and the JVM's PrintInlining output:

$ cat >src/blog/core.clj <<EOF
(ns blog.core
  (:gen-class))

(defn addy [x] (str x "y"))

(defn foo [args] (map addy args))

(defn -main
  [& args]
  (binding [*out* *err*]
    (dotimes [_ 100000]
      (println (foo args)))))
$ java -cp clojure.jar:src -Dclojure.compile.path=out -Dclojure.compiler.direct-linking=true clojure.lang.Compile blog.core
Compiling blog.core to out
$ java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining -cp clojure.jar:out blog.core bar baz 2>/dev/null | grep addy
                              @ 3   blog.core$addy::invokeStatic (19 bytes)
                              @ 3   blog.core$addy::invokeStatic (19 bytes)   inline (hot)

Hooray! inline (hot) is what we wanted to see. That means the JIT has determined that addy is a hot function and has successfully inlined it. This removes the function call overhead and makes our use of map as fast as a for loop. Or does it?

We can see that core$addy::invokeStatic has been inlined into its caller. That might be enough to eliminate the function call overhead in some languages, but we have already seen that a function call in Clojure consists of more than a single function call in the resulting Java. Let's see the complete stack trace:

$ cat >src/blog/core.clj <<EOF
(ns blog.core
  (:gen-class))

(defn addy [x]
  (.printStackTrace (new Exception))
  (str x "y"))

(defn foo [args] (doall (map addy args)))

(defn -main
  [& args]
  (println (foo args)))
$ java -cp clojure.jar:src -Dclojure.compile.path=out -Dclojure.compiler.direct-linking=true clojure.lang.Compile blog.core
Compiling blog.core to out
$ java -cp clojure.jar:out blog.core bar
java.lang.Exception
        at blog.core$addy.invokeStatic(core.clj:4)
        at blog.core$addy.invoke(core.clj:4)
        at clojure.core$map$fn__5587.invoke(core.clj:2747)
        at clojure.lang.LazySeq.sval(LazySeq.java:40)
        at clojure.lang.LazySeq.seq(LazySeq.java:49)
        at clojure.lang.RT.seq(RT.java:528)
        at clojure.core$seq__5124.invokeStatic(core.clj:137)
        at clojure.core$dorun.invokeStatic(core.clj:3125)
        at clojure.core$doall.invokeStatic(core.clj:3140)
        at blog.core$foo.invokeStatic(core.clj:8)
        at blog.core$_main.invokeStatic(core.clj:10)
        at blog.core$_main.doInvoke(core.clj:10)
        at clojure.lang.RestFn.applyTo(RestFn.java:137)
        at blog.core.main(Unknown Source)
(bary)

Note that I added a doall to foo just to make the stack trace clearer: otherwise addy is called when the sequence is being realized so foo does not appear in the stack, which may be surprising. This does not change the important point, which is that map calls core$addy.invoke that in turn calls core$addy.invokeStatic.

invokeStatic is a static class method. We have already seen that the function is passed to map as a Var object, which implements the IFn interface that provides an invoke method. So it is not enough for core$addy.invokeStatic to be inlined: we also need core$addy.invoke to be inlined too. Unfortunately, as we've seen from the -XX:+PrintInlining output, core$addy.invoke was not inlined.

Do transducers help with this?

$ cat >src/blog/core.clj <<EOF
(ns blog.core
  (:gen-class))

(defn addy [x] (str x "y"))

(defn foo [args] (into '() (map addy) args))

(defn -main
  [& args]
  (binding [*out* *err*]
    (dotimes [_ 1000000]
      (println (foo args)))))
$ java -cp clojure.jar:src -Dclojure.compile.path=out -Dclojure.compiler.direct-linking=true clojure.lang.Compile blog.core
Compiling blog.core to out
[charles@sunshine clj-blog]$ java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining -cp clojure.jar:out blog.core bar baz 2>/dev/null | grep addy
                              @ 3   blog.core$addy::invokeStatic (19 bytes)
                              @ 20   blog.core$addy::invoke (7 bytes)   inline (hot)
                                @ 3   blog.core$addy::invokeStatic (19 bytes)   inline (hot)
                                @ 20   blog.core$addy::invoke (7 bytes)   inline (hot)
                                  @ 3   blog.core$addy::invokeStatic (19 bytes)   inline (hot)
                                @ 20   blog.core$addy::invoke (7 bytes)   inline (hot)
                                  @ 3   blog.core$addy::invokeStatic (19 bytes)   inline (hot)
                              @ 3   blog.core$addy::invokeStatic (19 bytes)   inline (hot)

Yes! Now we have both core$addy::invokeStatic and core$addy::invoke inlined. Let's check that is enough:

$ cat >src/blog/core.clj <<EOF
(ns blog.core
  (:gen-class))

(defn addy [x]
  (.printStackTrace (new Exception))
  (str x "y"))

(defn foo [args] (into '() (map addy) args))

(defn -main
  [& args]
  (println (foo args)))
$ java -cp clojure.jar:src -Dclojure.compile.path=out -Dclojure.compiler.direct-linking=true clojure.lang.Compile blog.core
Compiling blog.core to out
$ java -cp clojure.jar:out blog.core bar
java.lang.Exception
        at blog.core$addy.invokeStatic(core.clj:4)
        at blog.core$addy.invoke(core.clj:4)
        at clojure.core$map$fn__5583$fn__5584.invoke(core.clj:2734)
        at clojure.lang.ArraySeq.reduce(ArraySeq.java:109)
        at clojure.core$transduce.invokeStatic(core.clj:6803)
        at clojure.core$into.invokeStatic(core.clj:6819)
        at blog.core$foo.invokeStatic(core.clj:8)
        at blog.core$_main.invokeStatic(core.clj:10)
        at blog.core$_main.doInvoke(core.clj:10)
        at clojure.lang.RestFn.applyTo(RestFn.java:137)
        at blog.core.main(Unknown Source)
(bary)

addy.invokeStatic and addy.invoke are the only functions beneath map in the call stack so if they are both inlined then we have eliminated all function calls.

Why does using the transducer version of map improve inlining? I have no idea. I tried it because, 27 minutes and 20 seconds into his talk "Transducers" at StrangeLoop 2014, Rich Hickey said of transducers, "They're just a stack of function calls. They're short. Potentially they could be inlined." But it is not obvious to me why the resulting Java bytecode is any more amenable to inlining than it was in the non-transducer case. Both times we call a static class method via a dynamic interface method.

Obviously, in this particular case, something about the code (and especially the hot paths through the running code) allowed the JIT to inline the call in the transducer case and not in the other case. I am not certain that this would generalize: I think all we can reasonably infer from this is that inlining is far from certain.

So sometimes, if we are lucky, we can use higher-order functions with standard functional-programming patterns such as map, filter, reduce, etc. and the JVM JIT will optimize hot code to achieve parity with the old procedural-style for loop. But sometimes it won't.

We must therefore accept that the use of map, filter, reduce et al. in Clojure will often result in suboptimal code. Is there a benefit to the programmer that outweighs the cost to the computer? I suspect this is a matter of taste.

Personally, I find the myriad different ways of performing the same task quite confusing. Should I be using map directly? Or as a transducer? Or with transients? Or a loop? These are not questions that I want to think about when I have a problem to solve: I want to think about the problem itself, not the many different ways in which I might express the same algorithm. Having a single loop construct would be simpler both for me and for the computer.