Friday, August 27, 2010

Why Scala's "Option" and Haskell's "Maybe" types will save you from null

Cedric Beust makes a bunch of claims in his post on Why Scala's "Option" and Haskell's "Maybe" types won't save you from null, commenters say he doesn't get it and he says nobody has argued with his main points. So I thought I'd do a careful investigation to see what, if anything, is being missed.

First, right off the top here: Scala has true blue Java-like null; any reference may be null. Its presence muddies the water quite a bit. But since Beust's article explicitly talks about Haskell he's clearly not talking about that aspect of Scala because Haskell doesn't have null. I'll get back to Scala's null at the end but for now pretend it doesn't exist.

Second, just to set the record straight: "Option" has roots in programming languages as far back as ML. Both Haskell's "Maybe" and Scala's "Option" (and F#'s "Option" and others) trace their ancestry to it.

Now, to the post.

First Objection

in practice, I find that hash tables allowing null values are rare to the point where this limitation has never bothered me in fifteen years of Java.

Really? Let me give one concrete example. You're writing a cache system (in the form of an associative map) in front of a persistent storage with nullable values. Question: how do you differentiate between "I haven't cached this value" from "I've cached the value and it was null in the database"? The answer is that you can't use "null" to mean both things, it's ambiguous. That may not "bother" Beust, but I find it irritating that you end up having to use two different mechanisms to say this in Java.

With Maybe/Option you just say that the map holds optional values. A fetch result of None/Nothing means you haven't put that key in your map. A result of Some(None)/Just Nothing means you've got the key but it maps to nothing, and a result of Some(value)/Just value means that the key is in the map and that there's a "real" value there in storage.

Save Me!

Back to Beust

The claim that Option eliminates NullPointerException is more outrageous and completely bogus. Here is how you avoid a null pointer exception with the Option class (from the blog):

val result = map.get( "Hello" )
 
result match {
  case None => print "No key with that name!"
  case Some(x) => print "Found value" + x
}

See what's going on here? You avoid a NullPointerException by... testing against null, except that it's called None. What have we gained, exactly?

Here, is, IMHO the heart of what people are arguing with Beust about. He thinks this code is no different from the kind of "if (result == null)" code you write in Java. Sure, identical. Except for one, huge, glaring, major, stab you in the face difference: in Java the type system says Map<T>#get will return a T. It doesn't say it MIGHT return a T. That's left to the documentation where it's out of reach for the compiler. Here in Java.

Map<String, Integer> myStuff = new HashMap<String, Integer()>;
myStuff.put("hello", 42);
int result = myStuff.get("goodbye") * 2; // runtime NPE

Java NPEs when you try to use the result of the fetch. On the other hand, when you use a Scala map you get an Option[T] while Haskell's says it returns a Maybe t. The type system says you can't use the underlying value directly. Here's me trying to pull something from a map and manipulate it directly in both Haskell

mystuff = fromList[("hello", 42)]
result = (lookup "goodbye" myStuff) * 2 -- compile time error

And Scala

val myStuff = Map("hello" -> 42)
val result = (myStuff get "goodbye") * 2 // compile time error

In both cases I get a static error.

Now...there are a lot of (IMHO crappy) debates on static vs dynamic typing. But one thing must be very clear: the two situations are at least different. This isn't the same as a Java NPE (or the equivalents in Ruby/Python/whatever). Option/Maybe tell the static type system that you may or may not have what you're looking for.

Blowups Can Happen

Onward.

The worst part about this example is that it forces me to deal with the null case right here. Sometimes, that's what I want to do but what if such an error is a programming error (e.g. an assertion error) that should simply never happen? In this case, I just want to assume that I will never receive null and I don't want to waste time testing for this case: my application should simply blow up if I get a null at this point.

There is a cavernous hole in Haskell Maybe and Scala Option. Beust's post does not talk about it at all, though. The hole is that Haskell and Scala are Turing complete, and that means that the type system cannot prevent you from throwing optionality out if that's what you really want to do. In Haskell

loopForever :: a
loopForever = loopForever -- really, forever

stripOption :: Maybe t -> t
stripOption Just x = x
stripOption _ = loopForever

-- or better yet
stripOption = fromMaybe loopForever

And Scala

final def loopForever[A] : A = loopForever // and ever, amen

def stripOption[T](o : Option[T]) : T = o match {
  case Some(x) => x
  case None => loopForever
}

// or, better still

def stripOption[T](o : Option[T]) : T = o getOrElse loopForever

In their respective languages language the following compile just fine but will cause non-termination at runtime.

result = (stripOption (lookup "goodbye" myStuff)) * 2
val result = stripOption(myStuff get "goodbye") * 2

Of course, if you really want to write that in real code you would replace "loopForever" with throwing an exception. That's exactly what Scala's Option#get and Haskell's Data.Option.fromJust do.

result = (fromJust (lookup "goodbye" myStuff)) * 2
val result = (myStuff get "goodbye").get * 2

Why did I write a silly loop? To make a point: Turing completeness punches a hole in the type system. "Throw" style operators exploit essentially the same hole. Every programmer should know this in the bottom of their heart.

Whether using the loopForever version or the exception version, stripOption/get/fromJust is different from null dereferencing. It requires an explicit step on the part of the programmer. The programmer says "I know what I'm doing here, let me continue, blow up if I'm wrong." But, IMHO, it's easy enough to say that it kills his objection of "forcing" him to deal with null instead of just blowing up. Yes, there's a hoop to jump through, but it's tiny. KABOOM!

Nullable Types

Next point:

When it comes to alleviating the problems caused by null pointer exceptions, the only approach I've seen recently that demonstrates a technical improvement is Fantom (Groovy also supports a similar approach), which attacks the problem from two different angles, static and runtime:

Static: In Fantom, the fact that a variable can be null or not is captured by a question mark appended to its type:

Str   // never stores null
Str?  // might store null

This allows the compiler to reason about the nullability of your code.

One technical quibble. Since Groovy doesn't have static types this static bit isn't relevant to Groovy.

Anyway, the equivalents in Scala and Haskell are String vs Option[String]/Maybe String. More characters, but the same idea. Except that Fantom and Nice and other languages with nullable types don't tend to support saying ??String whereas Option[Option[String]] and Maybe (Maybe String) are totally reasonable. See my caching example above.

Safe Invoke

Runtime: The second aspect is solved by Fantom's "safe invoke" operator, ?.. This operator allows you to dereference a null pointer without receiving a null pointer exception:

// hard way
Str? email := null
if (userList != null)
{
  user := userList.findUser("bob")
  if (user != null) email = user.email
}
 
// easy way
email := userList?.findUser("bob")?.email

Note how this second example is semantically equivalent to the first one but with a lot of needless boiler plate removed.

And in Scala that might look like

userList flatMap (_ findUser "bob") flatMap (_.email)

while haskell would use

userList >>= findUser "bob" >>= email

There are some more keystrokes indeed, but here's the thing...

Better

So, can someone explain to me how Option addresses the null pointer problem better than Fantom's approach

Option/Maybe/flatMap/>>= are all ordinary library code. No magic. The types are algebraic data types, an idea that is massively useful. The methods belong to a class of types called Monad. If Option/Maybe don't work for you as-is you can build your own versions. With nullable types and ?. operator you're stuck with what the language provides. If they don't work for your particular case then you can't create your own variants. Quick, in Fantom extend the ?. operator so that works for exception-like semantics such that failure carries a reason instead of just computing null. In Haskell and Scala that already exists and it's ordinary library code in a type called Either.

Conclusion

In practice, outside of learning, I've never ever gotten the equivelent of an NPE by abusing Haskell's fromJust or Scala's Option#get, not even in testing. On the other hand NPE's are a fact of my life when dealing with nulls in Java et al. And while it's true that testing has revealed the vast majority of potential NPEs (or equivalent) in my life, it certainly hasn't dug up all of them. I've seen plenty of production systems brought to their knees by NPEs.

Exceptions plus a deliberate weakening of the type system allow developers to create "oh, just blow up" code when that's the right answer. But the developer has to explicitly say so.

Fantom/Nice style nullable types plus a "safe invoke" operator are a nice improvement over plain old null. But they are not composable (no ??T) and they are special purpose magic aimed at a very narrow domain instead of general purpose magic such as algebraic data types and higher order functions that are broadly applicable to huge problem domains.

So there you have it, Cedric, what are we missing in your post?

PostScript on Scala and Null

Which brings me back to Scala's null. For interoperability reasons Scala has a full blown Java-like null. But the experience that all the Scala developers I know have is that NPE is mostly only a problem when dealing with existing Java libraries because Scala programmers and libraries avoid it. Plus, Scala's "Option(foo)" promotes foo to either Some(foo) or None depending on whether it's null or not that at least enables the map/flatMap style of "safe invoke." It's far less than perfect, but about the best that can be done when dealing directly with Java.

For Haskell, null is a non-issue.

25 comments:

MLeoDaalder said...

In Haskell null is called undefined. But other than that, I agree with you completely!

rantomaniac said...

Sigh, it all looks very elegant and attractive in theory, but to be better, it'd need to be optimized, as opposed to using another full-blown heap-allocated object for your Option[T]. Scala has awesome abstractions, but penalizes you for using them with speed and memory penalties.

Ricardo Silva Lima said...

Well, About the "Blowups can happen" argument, I believe the solution would be simply to not use the Option, then the compiler will ensure that the blowup won't happen. Also, I'd like to point out that IMHO, the for notation is easier in the eyes of those who are not used to monads although it forces the programmer to declare a variable for each step.

Just my 2 cents.

James Iry said...

Undefined is bottom, null is not. In particular you can check if a value is equal to null but not to bottom. In that sense null is much more like Nothing/None than it is like bottom. However, from a typing standpoint, null can take on most any type so its type is more bottom-like (usual Java caveats about primitive vs reference blah blah blah).

Daniel Sobral said...

I think Lift's Box is an even better example than Scala's Either.

Dave said...

The day I ran into the Beust article was the same day that I saw a forum post on the scala mailing list questioning whether Option's 'get' routine should exist.

If we could lock those two people in a room, I think the Option class would be what came out.

Olivier said...

In your experience, how much was the speed/memory penalty when using Option over plain objects/null? What methodology did you use: benchmarking, profiling a real-world scenario?

How would you optimize Option?

James Iry said...

> How would you optimize Option?

If Scala didn't already have null then a reference to a T would never be null. A one level Option[T] would optimize down to holding a reference to a T or a null. Primitives would still have to be boxed to be in an Option. A two level, Option[Option[T]] would need one more level of boxing in order to distinguish between Some(None) and None (Some(None) = Boxed null and None = null). Etc.

Because Scala has null, and in particular because you want Some(null) to have the obvious meaning, that stuff doesn't work.

Outside the JVM you can do pointer bit flipping to avoid some of this boxing.

Toby said...

Correct me if I'm wrong but I think there is a tiny difference between Fantom's null-safe invoke (a feature in Groovy too, probably many other langs as well) and the scala example you provided. Here's a slightly more concrete example:

In groovy:
def userList = null
userList?.find{ n -> n == "bob" }?.length()
// yields null
userList?.find{ n -> n == "joe" }?.length()
// yields null
def userList = ["joe", "jack"]
userList?.find{ n -> n == "bob" }?.length()
// yields null
userList?.find{ n -> n == "joe" }?.length()
// yields 3

The same in scala:

var userList:Option[List[String]] = None
userList flatMap (_ find (_ equals "bob")) flatMap (u => Some(u length))
// yields None:Option[Int]
userList flatMap (_ find (_ equals "joe")) flatMap (u => Some(u length))
// yields None:Option[Int]

var userList = Some(List("joe", "jack"))
userList flatMap (_ find (_ equals "bob")) flatMap (u => Some(u length))
// yields None:Option[Int]
userList flatMap (_ find (_ equals "joe")) flatMap (u => Some(u length))
// yields Some(3):Option[Int]

In this case flatMap isn't just a drop-in replacement for the null-safe operator, since every parameter being passed around needs to be wrapped with Some().

As you've mentioned in your post, this isn't really an issue unless dealing with legacy APIs, but it's still worth noting. Also, adding all those wrappers around your code does increase the verbosity somewhat - I suppose scala could be tricked into automatically wrapping where appropriate given a well-placed implicit.

Of course this doesn't do any damage to your main argument, which I definitely agree with!

Tony Morris said...

Thanks James for saying what I tried to. I hope it works in that the point is finally communicated, but I have a strong suspicion that it won't :( Surprise me!

In your introduction, you refer to "missing the point" -- it's not you or commentators (besides one or two) missing the point. Pointing this out without tact is likely to explode. Thanks again for trying. It's a tough game.

MartinMauch said...

Hi James,

I think you can solve the cache problem using Map.containsKey in Java. But the point is still valid.

James said...

There's also another benefit. Say you've used the flatMap method or it's equivalent for comprehension:
for (ul <- userList; u <- ul findUser "bob") yield u.email

What if your company has two people called 'bob' or people have more than one email address? Just make findUser and email return them all, and it will automatically do the right thing - give you a list of all the email addresses for everyone called 'bob'.

Do that with safe-invoke and you have to alter your code to start looping and it gets a lot less elegant than it was before.

Jed Wesley-Smith said...

Map.containsKey isn't threadsafe. Having a map/flatmap projection means that you will only run your code on a valid value even if it is running an a shared mutable map.

James Iry said...

Right, as I said, I need to use a different mechanism to determine containment.

An extension to the problem: imagine my key is an integer within a narrow range (1 to 100, say). It might make more sense to use an Array (or ArrayList) than a Map. But ArrayList doesn't have a "containsValueAtIndex" method.

The point is that Maybe/Option allow me to distinguish different forms of unknown in those cases where I need to. join/flatten (and their cousins >>=/flatMap) allow me to ignore the different levels if I don't care.

Vlad said...

You talking non determinism but here is the case of Option monad.

This -
userList flatMap (_ findUser "bob") flatMap (_.email)
should be written in idiomatic Scala as:
for (ul <- userList;
u <- ul findUser "bob"
e <- u.email ) yield e

Toby said...

(Late reply... sorry)

You're definitely right that it's a different model, so it can't be a drop-in replacement. And I'd definitely write the code the way you suggested, I just added the Some boxing code to use flatMap throughout.

My only point was that Some/None/Option can have an advantage over null only when you can ensure that "nullable" values are properly wrapped. If a library returns null, it still falls to you to check against null and deal / wrap accordingly. In such cases I'd probably still be glad to have the null-safe operator.

Grundlefleck said...

Hi, I've arrived at this blog a bit late, hopefully you'll still keep an eye on the comments.

In your example of the cache, in Java, why couldn't you use an enum to signify the different values of "not retrieved" and "nothing found"?

Hope this makes sense.

James Iry said...

What? A separate array with values of "not retrieved" and "nothing found" that parallels an array with values and nulls? What does that buy me?

Mike said...

Slight syntax typo:
"stripOption Just x = x"
should be
"stripOption (Just x) = x"

truth_machine said...

How does someone with Beust's mental capacity get the stature to be worth responding to?

Wei Hu said...

Very nice post, just one nit though:

> With Maybe/Option you just say that the map holds optional values. A fetch result of None/Nothing means you haven't put that key in your map. A result of Some(None)/Just Nothing means you've got the key but it maps to nothing, and a result of Some(value)/Just value means that the key is in the map and that there's a "real" value there in storage.

To encode a real value, you have to write Just (Just value), which is a little inconvenient.

JimDesu said...

What he doesn't get with Maybe is that the "dealing with Nothing" is already done by >>= (or fmap ( or <*>)). You can weld together quite complex processings without ever having to handle those null-cases except where functionally necessary.

Jonathan C Dickinson said...

I thought about a language where everything is a list (like JavaScript+jQuery - which I hate). An operation on a 'null' (empty list) just becomes a nop (which is mathematically correct, no?). A VM/Compiler designed for this from the ground-up could probably do a good job at it.

John Lonergan said...

re "In practice, outside of learning, I've never ever gotten the equivelent
of an NPE by abusing Haskell's fromJust or Scala's Option#get, not even
in testing."

Just seen the equiv of an NPE in prod due to an unchecked Option.get

Path to this event was a refactoring that made a field optional and the programmer just added a get in a load of places without thinking - not clever - but for me I'd rather have a non-null ref type in the language. (What's the plans for the NotNull trait in scala?)

Anyway, back in my C++ days I never saw an NPE in prod because we used reference types (typically refs to const) everywhere - use of pointers was restricted to a few specific interface modules that maintained by more skilled staff. Whilst it is possible to create a null ref in C++ one has to do a simple but non-obvious contortion to achieve it.

Paulcc Two said...

Excellent response to the original article! 

One detail that needs more prominence, imho, is what the Maybe/Option type is representing inside the code. It is explicitly indicating that a value may or may not be there, and that it's up to the caller to act accordingly. 

With null etc, this detail is implicit and it's up to the programmer to decide and/or remember to deal with it. 

On a less important point: Haskell compilers can eliminate some of the overheads through inlining, deforestation, and other techniques, so performance isn't as bad as some people assume.  I'd willingly trade a bit of overhead for a lot more confidence in the code not throwing a NPE!