Optimization of higher-order functions in Clojure
Start with a very simple Clojure program:
$ mkdir -p src/blog
$ cat >src/blog/core.clj <<EOF
(ns blog.core
(:gen-class))
(defn -main
[& args]
(println "foo"))
EOF
Make a directory for compiled classes and compile it:
$ mkdir out
$ java -cp clojure.jar:src -Dclojure.compile.path=out clojure.lang.Compile blog.core
Compiling blog.core to out
And check that it worked:
$ java -cp clojure.jar:out blog.core
foo
If you take a look in the out
directory, you will see that simple program has
turned into five separate class files:
$ du -a out
4 out/blog/core__init.class
4 out/blog/core$loading__6434__auto____279.class
4 out/blog/core$_main.class
4 out/blog/core$fn__281.class
4 out/blog/core.class
24 out/blog
28 out
Let's add a function to our namespace:
$ cat >src/blog/core.clj <<EOF
(ns blog.core
(:gen-class))
(defn foo [args] args)
(defn -main
[& args]
(println (foo args)))
EOF
$ java -cp clojure.jar:src -Dclojure.compile.path=out clojure.lang.Compile blog.core
Compiling blog.core to out
$ java -cp clojure.jar:out blog.core bar
(bar)
You should now see that another class file has been created in out
called
core$foo.class
. Each new Clojure function we create will result in a new Java
class file being created.
Let's take a look at it:
$ javap -c out/blog/core\$foo.class
Compiled from "core.clj"
public final class blog.core$foo extends clojure.lang.AFunction {
public blog.core$foo();
Code:
0: aload_0
1: invokespecial #9 // Method clojure/lang/AFunction."<init>":()V
4: return
public static java.lang.Object invokeStatic(java.lang.Object);
Code:
0: aload_0
1: aconst_null
2: astore_0
3: areturn
public java.lang.Object invoke(java.lang.Object);
Code:
0: aload_1
1: aconst_null
2: astore_1
3: invokestatic #16 // Method invokeStatic:(Ljava/lang/Object;)Ljava/lang/Object;
6: areturn
public static {};
Code:
0: return
}
Our Clojure function blog.core/foo
has turned into a Java class
blog.core$foo
. It has a constructor that just calls the superclass constructor
and two methods: invokeStatic
and invoke
.
Let's make foo
do something more than just return its argument:
$ cat >src/blog/core.clj <<EOF
(ns blog.core
(:gen-class))
(defn foo [args] (rest args))
(defn -main
[& args]
(println (foo args)))
EOF
$ java -cp clojure.jar:src -Dclojure.compile.path=out clojure.lang.Compile blog.core
Compiling blog.core to out
$ java -cp clojure.jar:out blog.core bar baz
(baz)
And disassemble it again:
$ javap -c out/blog/core\$foo.class
Compiled from "core.clj"
public final class blog.core$foo extends clojure.lang.AFunction {
public static final clojure.lang.Var const__0;
public blog.core$foo();
Code:
0: aload_0
1: invokespecial #9 // Method clojure/lang/AFunction."<init>":()V
4: return
public static java.lang.Object invokeStatic(java.lang.Object);
Code:
0: getstatic #15 // Field const__0:Lclojure/lang/Var;
3: invokevirtual #21 // Method clojure/lang/Var.getRawRoot:()Ljava/lang/Object;
6: checkcast #23 // class clojure/lang/IFn
9: aload_0
10: aconst_null
11: astore_0
12: invokeinterface #26, 2 // InterfaceMethod clojure/lang/IFn.invoke:(Ljava/lang/Object;)Ljava/lang/Object;
17: areturn
public java.lang.Object invoke(java.lang.Object);
Code:
0: aload_1
1: aconst_null
2: astore_1
3: invokestatic #30 // Method invokeStatic:(Ljava/lang/Object;)Ljava/lang/Object;
6: areturn
public static {};
Code:
0: ldc #33 // String clojure.core
2: ldc #35 // String rest
4: invokestatic #41 // Method clojure/lang/RT.var:(Ljava/lang/String;Ljava/lang/String;)Lclojure/lang/Var;
7: checkcast #17 // class clojure/lang/Var
10: putstatic #15 // Field const__0:Lclojure/lang/Var;
13: return
}
The constructor hasn't changed and invoke
still just calls invokeStatic
, but
invokeStatic
now does something more. We have a Clojure Var
that is
initialized to clojure.core/rest
in the class' static
initialization block
and whose invoke
method is called (via the clojure.lang.IFn
interface) in
our invokeStatic
method. It is a bit convoluted but it's pretty clear how that
comes from our use of rest
.
Now let's add another function of our own and map
it over the args
:
$ cat >src/blog/core.clj <<EOF
(ns blog.core
(:gen-class))
(defn addy [x] (str x "y"))
(defn foo [args] (map addy args))
(defn -main
[& args]
(println (foo args)))
$ java -cp clojure.jar:src -Dclojure.compile.path=out clojure.lang.Compile blog.core
Compiling blog.core to out
$ java -cp clojure.jar:out blog.core bar baz
(bary bazy)
We now have a new file in out
: core$addy.class
. This contains the new class
for our addy
function, which has its own constructor, invoke
and
invokeStatic
methods and its own Var
object for clojure.core/str
.
If we disassemble core$foo
again, we can see how it is used:
public final class blog.core$foo extends clojure.lang.AFunction {
public static final clojure.lang.Var const__0;
public static final clojure.lang.Var const__1;
public blog.core$foo();
Code:
0: aload_0
1: invokespecial #9 // Method clojure/lang/AFunction."<init>":()V
4: return
public static java.lang.Object invokeStatic(java.lang.Object);
Code:
0: getstatic #15 // Field const__0:Lclojure/lang/Var;
3: invokevirtual #21 // Method clojure/lang/Var.getRawRoot:()Ljava/lang/Object;
6: checkcast #23 // class clojure/lang/IFn
9: getstatic #26 // Field const__1:Lclojure/lang/Var;
12: invokevirtual #21 // Method clojure/lang/Var.getRawRoot:()Ljava/lang/Object;
15: aload_0
16: aconst_null
17: astore_0
18: invokeinterface #30, 3 // InterfaceMethod clojure/lang/IFn.invoke:(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
23: areturn
public java.lang.Object invoke(java.lang.Object);
Code:
0: aload_1
1: aconst_null
2: astore_1
3: invokestatic #34 // Method invokeStatic:(Ljava/lang/Object;)Ljava/lang/Object;
6: areturn
public static {};
Code:
0: ldc #37 // String clojure.core
2: ldc #39 // String map
4: invokestatic #45 // Method clojure/lang/RT.var:(Ljava/lang/String;Ljava/lang/String;)Lclojure/lang/Var;
7: checkcast #17 // class clojure/lang/Var
10: putstatic #15 // Field const__0:Lclojure/lang/Var;
13: ldc #47 // String blog.core
15: ldc #49 // String addy
17: invokestatic #45 // Method clojure/lang/RT.var:(Ljava/lang/String;Ljava/lang/String;)Lclojure/lang/Var;
20: checkcast #17 // class clojure/lang/Var
23: putstatic #26 // Field const__1:Lclojure/lang/Var;
26: return
}
Unsurprisingly, we have two Var
objects: one for clojure.core/map
and one
for blog.core/addy
. We call the invoke
method on map
via the IFn
interface, just as we did for rest
last time, and we pass the addy
Var
as
an argument to that method.
This is quite a lot of indirection. By using Var
objects like this, Clojure
allows the user to redefine any Var
at runtime. You could connect to a
running REPL and redefine map
to do something else, then foo
would call
your new function instead. This is very flexible but not always necessary,
especially in a production environment.
Since indirection is slow, Clojure offers the clojure.compiler.direct-linking
option when compiling. This will replace the invocations of map
via a Var
with direct method invocations:
$ java -cp clojure.jar:src -Dclojure.compile.path=out -Dclojure.compiler.direct-linking=true clojure.lang.Compile blog.core
Compiling blog.core to out
$ java -cp clojure.jar:out blog.core bar baz
(bary bazy)
$ javap -c out/blog/core\$foo.class
Compiled from "core.clj"
public final class blog.core$foo extends clojure.lang.AFunction {
public static final clojure.lang.Var const__1;
public blog.core$foo();
Code:
0: aload_0
1: invokespecial #9 // Method clojure/lang/AFunction."<init>":()V
4: return
public static java.lang.Object invokeStatic(java.lang.Object);
Code:
0: getstatic #15 // Field const__1:Lclojure/lang/Var;
3: invokevirtual #21 // Method clojure/lang/Var.getRawRoot:()Ljava/lang/Object;
6: aload_0
7: aconst_null
8: astore_0
9: invokestatic #26 // Method clojure/core$map.invokeStatic:(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
12: areturn
public java.lang.Object invoke(java.lang.Object);
Code:
0: aload_1
1: aconst_null
2: astore_1
3: invokestatic #31 // Method invokeStatic:(Ljava/lang/Object;)Ljava/lang/Object;
6: areturn
public static {};
Code:
0: ldc #34 // String blog.core
2: ldc #36 // String addy
4: invokestatic #42 // Method clojure/lang/RT.var:(Ljava/lang/String;Ljava/lang/String;)Lclojure/lang/Var;
7: checkcast #17 // class clojure/lang/Var
10: putstatic #15 // Field const__1:Lclojure/lang/Var;
13: return
}
The map
invocation is no longer performed via the invoke
method on a Var
:
instead we call the core$map
class' invokeStatic
method directly. But we
do still have a Var
for the function that we pass to map
. This hasn't been
removed because we are not directly invoking it, we are passing it as an
argument to map
.
This remaining level of indirection, the function call, adds a considerable
overhead to the use of map
. To eliminate this overhead, we would like to
inline the body of the addy
function.
If we were writing in a procedural language with a for
loop, we probably
wouldn't have introduced a function call here anyway: the functionality provided
by addy
is so trivial that we would have just written it in the body of the
for
loop. But if we wish to use a functional programming style with map
then
we must provide a function. We are now at the mercy of the compiler: is it
clever enough to automatically inline addy
and remove the function call
overhead?
Clearly the first-stage Java compiler did not inline it. However, Java bytecode is executed on the JVM, which has a JIT that optimizes hot code. We can ask the JVM to log whenever the JIT inlines a function:
$ java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining -cp clojure.jar:out blog.core bar baz | grep addy
And we see nothing. (Without the | grep addy
we see lots of things, but we
want to see addy
being inlined and we don't see that.) This isn't very
surprising since we only call addy
once: the JIT won't consider that code a
candidate for optimization unless it is called many times.
So let's call it lots of times. Also, let's print to stderr
so that we can
easily distinguish between our program's output and the JVM's PrintInlining
output:
$ cat >src/blog/core.clj <<EOF
(ns blog.core
(:gen-class))
(defn addy [x] (str x "y"))
(defn foo [args] (map addy args))
(defn -main
[& args]
(binding [*out* *err*]
(dotimes [_ 100000]
(println (foo args)))))
$ java -cp clojure.jar:src -Dclojure.compile.path=out -Dclojure.compiler.direct-linking=true clojure.lang.Compile blog.core
Compiling blog.core to out
$ java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining -cp clojure.jar:out blog.core bar baz 2>/dev/null | grep addy
@ 3 blog.core$addy::invokeStatic (19 bytes)
@ 3 blog.core$addy::invokeStatic (19 bytes) inline (hot)
Hooray! inline (hot)
is what we wanted to see. That means the JIT has
determined that addy
is a hot function and has successfully inlined it. This
removes the function call overhead and makes our use of map
as fast as a for
loop. Or does it?
We can see that core$addy::invokeStatic
has been inlined into its caller. That
might be enough to eliminate the function call overhead in some languages, but
we have already seen that a function call in Clojure consists of more than a
single function call in the resulting Java. Let's see the complete stack trace:
$ cat >src/blog/core.clj <<EOF
(ns blog.core
(:gen-class))
(defn addy [x]
(.printStackTrace (new Exception))
(str x "y"))
(defn foo [args] (doall (map addy args)))
(defn -main
[& args]
(println (foo args)))
$ java -cp clojure.jar:src -Dclojure.compile.path=out -Dclojure.compiler.direct-linking=true clojure.lang.Compile blog.core
Compiling blog.core to out
$ java -cp clojure.jar:out blog.core bar
java.lang.Exception
at blog.core$addy.invokeStatic(core.clj:4)
at blog.core$addy.invoke(core.clj:4)
at clojure.core$map$fn__5587.invoke(core.clj:2747)
at clojure.lang.LazySeq.sval(LazySeq.java:40)
at clojure.lang.LazySeq.seq(LazySeq.java:49)
at clojure.lang.RT.seq(RT.java:528)
at clojure.core$seq__5124.invokeStatic(core.clj:137)
at clojure.core$dorun.invokeStatic(core.clj:3125)
at clojure.core$doall.invokeStatic(core.clj:3140)
at blog.core$foo.invokeStatic(core.clj:8)
at blog.core$_main.invokeStatic(core.clj:10)
at blog.core$_main.doInvoke(core.clj:10)
at clojure.lang.RestFn.applyTo(RestFn.java:137)
at blog.core.main(Unknown Source)
(bary)
Note that I added a doall
to foo
just to make the stack trace clearer:
otherwise addy
is called when the sequence is being realized so foo
does
not appear in the stack, which may be surprising. This does not change the
important point, which is that map
calls core$addy.invoke
that in turn calls
core$addy.invokeStatic
.
invokeStatic
is a static class method. We have already seen that the function
is passed to map
as a Var
object, which implements the IFn
interface that
provides an invoke
method. So it is not enough for core$addy.invokeStatic
to
be inlined: we also need core$addy.invoke
to be inlined too. Unfortunately, as
we've seen from the -XX:+PrintInlining
output, core$addy.invoke
was not
inlined.
Do transducers help with this?
$ cat >src/blog/core.clj <<EOF
(ns blog.core
(:gen-class))
(defn addy [x] (str x "y"))
(defn foo [args] (into '() (map addy) args))
(defn -main
[& args]
(binding [*out* *err*]
(dotimes [_ 1000000]
(println (foo args)))))
$ java -cp clojure.jar:src -Dclojure.compile.path=out -Dclojure.compiler.direct-linking=true clojure.lang.Compile blog.core
Compiling blog.core to out
[charles@sunshine clj-blog]$ java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining -cp clojure.jar:out blog.core bar baz 2>/dev/null | grep addy
@ 3 blog.core$addy::invokeStatic (19 bytes)
@ 20 blog.core$addy::invoke (7 bytes) inline (hot)
@ 3 blog.core$addy::invokeStatic (19 bytes) inline (hot)
@ 20 blog.core$addy::invoke (7 bytes) inline (hot)
@ 3 blog.core$addy::invokeStatic (19 bytes) inline (hot)
@ 20 blog.core$addy::invoke (7 bytes) inline (hot)
@ 3 blog.core$addy::invokeStatic (19 bytes) inline (hot)
@ 3 blog.core$addy::invokeStatic (19 bytes) inline (hot)
Yes! Now we have both core$addy::invokeStatic
and core$addy::invoke
inlined.
Let's check that is enough:
$ cat >src/blog/core.clj <<EOF
(ns blog.core
(:gen-class))
(defn addy [x]
(.printStackTrace (new Exception))
(str x "y"))
(defn foo [args] (into '() (map addy) args))
(defn -main
[& args]
(println (foo args)))
$ java -cp clojure.jar:src -Dclojure.compile.path=out -Dclojure.compiler.direct-linking=true clojure.lang.Compile blog.core
Compiling blog.core to out
$ java -cp clojure.jar:out blog.core bar
java.lang.Exception
at blog.core$addy.invokeStatic(core.clj:4)
at blog.core$addy.invoke(core.clj:4)
at clojure.core$map$fn__5583$fn__5584.invoke(core.clj:2734)
at clojure.lang.ArraySeq.reduce(ArraySeq.java:109)
at clojure.core$transduce.invokeStatic(core.clj:6803)
at clojure.core$into.invokeStatic(core.clj:6819)
at blog.core$foo.invokeStatic(core.clj:8)
at blog.core$_main.invokeStatic(core.clj:10)
at blog.core$_main.doInvoke(core.clj:10)
at clojure.lang.RestFn.applyTo(RestFn.java:137)
at blog.core.main(Unknown Source)
(bary)
addy.invokeStatic
and addy.invoke
are the only functions beneath map
in
the call stack so if they are both inlined then we have eliminated all function
calls.
Why does using the transducer version of map
improve inlining? I have no idea.
I tried it because, 27 minutes and 20 seconds into his talk "Transducers" at
StrangeLoop 2014, Rich Hickey said of transducers, "They're just a stack of
function calls. They're short. Potentially they could be inlined." But it is not
obvious to me why the resulting Java bytecode is any more amenable to inlining
than it was in the non-transducer case. Both times we call a static class method
via a dynamic interface method.
Obviously, in this particular case, something about the code (and especially the hot paths through the running code) allowed the JIT to inline the call in the transducer case and not in the other case. I am not certain that this would generalize: I think all we can reasonably infer from this is that inlining is far from certain.
So sometimes, if we are lucky, we can use higher-order functions with standard
functional-programming patterns such as map
, filter
, reduce
, etc. and the
JVM JIT will optimize hot code to achieve parity with the old procedural-style
for
loop. But sometimes it won't.
We must therefore accept that the use of map
, filter
, reduce
et al. in
Clojure will often result in suboptimal code. Is there a benefit to the
programmer that outweighs the cost to the computer? I suspect this is a matter
of taste.
Personally, I find the myriad different ways of performing the same task quite
confusing. Should I be using map
directly? Or as a transducer? Or with
transients? Or a loop? These are not questions that I want to think about when I
have a problem to solve: I want to think about the problem itself, not the many
different ways in which I might express the same algorithm. Having a single loop
construct would be simpler both for me and for the computer.