Monad Transformers 101

June 10, 2023

Summary

In this post, I go over what monad transformers are and how to use them. I go into the internals of some common transformers, and we see how monad transformers are essentially functions that take in a monad and return an "augmented" monad with extra capabilities. We finish with a discussion of two different ways of using monad transformers, mtl-style and transformers-style.

This post is adapted from the chapter on monad transformers from my book, Abstractions in Context.

In this two-part series, we're going to talk about monad transformers, what they are, and why they matter.

You may have heard of them before, and you might be wondering what they are. What is being "transformed" through monads?

A Transformer trying to understand transformers.

From a practical standpoint, what monad transformers give you is the ability to pick and choose what "capabilities" your code has, and combine those capabilities à la carte. A few examples: you might want to add in the capability for your code to access random values, or handle exceptions. In another language, you might think of these as things that have to be baked into the language at, say, the compiler level. They're just always available, and there's nothing you can do about it. But Haskell takes the opposite design philosophy, by making it so your functions can do almost nothing side effect-y by default. So it's very important that you can put those functionalities back into your code, and do so conveniently.

Capability is a very broad term, though, and many of the things that monad transformers allow you to "mix in" to your code are much more featureful than you might expect. It's possible to add in the ability to stream data from a file or socket directly on top of existing code, or use monad transformers to do incremental processing of some parse input.

Don't worry if none of that preamble makes any sense right now; we'll see in more detail how this pans out once we start looking at specific examples. It's time to talk about how monad transformers work.

What is a monad transformer, anyways?

Imagine, if you will, working with a bunch of functions that return various combinations of IO and Maybe, like IO a, Maybe a, IO (Maybe a). Somehow we need to reconcile these types so that we can use them together. This isn't too uncommon; validating data (say, against a database, or against a 3rd party API) might be one instance where this happens a lot. IO to access some not-in-memory information, Maybe to signal when the data doesn't pass validation.

But this is quickly going to get very tedious, because you're going to end up having to write a lot of code like this:

validateData1 :: Int -> IO (Maybe Int)
validateData2 :: String -> IO (Maybe String)

validateForm :: Int -> String -> IO (Maybe (Int, String))
validateForm rawData1 rawData2 = do
  data1m <- validateData1 rawData1
  case data1m of
    Nothing -> pure Nothing
    Just data1 -> do
      data2m <- validateData2 rawData2
      case data2m of
        Nothing -> pure Nothing
        Just data2 -> pure (Just (data1, data2))

We're only two parameters deep and this is already getting unreadable.

What we want is some way where each monadic binding will both deal with the IO portion, and handle pattern-matching/short-circuiting on the inner Maybe. Which suggests creating a new monad, with a different bind definition:

newtype MaybeIO a = MaybeIO { runMaybeIO :: IO (Maybe a) }

Try implementing the functor/applicative/monad instances for this type; it shouldn't be too difficult. Once we have those instances, we can write code like this:

validateData1 :: Int -> MaybeIO Int
validateData2 :: String -> MaybeIO String

validateForm :: Int -> String -> MaybeIO (Int, String)
validateForm rawData1 rawData2 = do
  data1 <- validateData1 rawData1
  data2 <- validateData2 rawData2
  pure (data1, data2)

Much less noisy, no?
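If you'd like to check your answer to the exercise above, here is one possible set of instances, written against base only. Treat it as a sketch rather than the canonical solution. The bodies of validateData1 and validateData2 are invented purely so the example runs; the post never says what they actually check.

```haskell
newtype MaybeIO a = MaybeIO { runMaybeIO :: IO (Maybe a) }

instance Functor MaybeIO where
  -- map under both the IO layer and the Maybe layer
  fmap f (MaybeIO io) = MaybeIO (fmap (fmap f) io)

instance Applicative MaybeIO where
  pure = MaybeIO . pure . Just
  MaybeIO iof <*> MaybeIO iox = MaybeIO $ do
    mf <- iof
    case mf of
      Nothing -> pure Nothing        -- short-circuit: iox is never run
      Just f  -> fmap (fmap f) iox

instance Monad MaybeIO where
  MaybeIO io >>= f = MaybeIO $ do
    mx <- io
    case mx of
      Nothing -> pure Nothing        -- short-circuit: f is never called
      Just x  -> runMaybeIO (f x)

-- Hypothetical validation rules, made up for illustration.
validateData1 :: Int -> MaybeIO Int
validateData1 n = MaybeIO (pure (if n > 0 then Just n else Nothing))

validateData2 :: String -> MaybeIO String
validateData2 s = MaybeIO (pure (if null s then Nothing else Just s))

validateForm :: Int -> String -> MaybeIO (Int, String)
validateForm rawData1 rawData2 = do
  data1 <- validateData1 rawData1
  data2 <- validateData2 rawData2
  pure (data1, data2)
```

Running runMaybeIO (validateForm 3 "abc") unwraps the result back into a plain IO (Maybe (Int, String)).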

We've run into a different problem, though. What if we want to add another monad into this stack? Say we want to also have a Reader to hold some configuration, or some State to hold a cache. Are we going to create new wrapping types for every single possible combination of monads? Are we going to do the tedious work of writing monad instances and helper functions for each of those possible combinations?

It seems like what we want is a way to "isolate" the functionality of each individual monad we want to use, while then having the ability to cobble them back together into a working whole. That is, we'd have a "Maybe" component that gives you the ability to short-circuit, we'd have a "Reader" component that lets you store some read-only data, and some way to plug those together.


What about a type like this?

newtype MaybeT m a = MaybeT { runMaybeT :: m (Maybe a) }

Notice how if you substitute in IO for the m parameter, you get exactly the same structure as we had with our original MaybeIO type:

MaybeT IO a  ≅  IO (Maybe a)  ≅  MaybeIO a

We took the normal monad (in this case, Maybe) and punched a parameter into the type to put some other monad into. If we could write a working monad instance for our MaybeT type, we could then drop this type "on top" of any existing monad and keep the underlying functionality while adding Maybe-ness!

instance Functor m => Functor (MaybeT m) where
  fmap f (MaybeT mx) = MaybeT $ (fmap . fmap) f mx

instance Applicative m => Applicative (MaybeT m) where
  pure = MaybeT . pure . pure
  (<*>) (MaybeT mf) (MaybeT mx) = MaybeT $ liftA2 (<*>) mf mx

instance Monad m => Monad (MaybeT m) where
  return = pure
  (>>=) (MaybeT rawX) f = MaybeT $ do
    mx <- rawX
    case mx of
      Nothing -> pure Nothing
      Just x -> runMaybeT (f x)

Figuring out the implementation of these instances can be a little tricky, but once we have them we're able to use any inner monad we want. We could write similar instances for a hypothetical StateT, for a ReaderT, and so on, and then we'd have the ability to mix and match them in our code, while still retaining the syntactical convenience of our initial MaybeIO example.


The key thing to realize is that monad transformers can take in any other monad as their inner monad. And since monad transformers are themselves monads, you can stack these up indefinitely!

-- a type the compiler will happily accept
-- notice how each "layer" takes in exactly one other monad
type AMonadStack a =
  StateT Int (ReaderT String (MaybeT IO)) a

A word about the terminology here. A bunch of transformers chained together like this, where each transformer is the inner monad of another transformer, is referred to as a monad "stack." The innermost monad in the stack, like IO above, is usually referred to as the "base" or "bottom" of the monad stack. The base monad is also what we eventually run our code in; typical choices for the base are IO (for side effects) or Identity (for code that's pure). We write the bulk of our code wrapped inside a monad stack to get a convenient monad instance, then at the toplevel of our program we use functions like runMaybeT, runReaderT, runStateT etc. to unwrap all those transformer types and get a value like IO (Maybe a) or state -> cfg -> IO (a, state) that we can actually run.
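To make the "unwrap at the top level" step concrete, here is a small sketch that uses the transformer types from the transformers library rather than hand-written ones. The names tick and runStack are hypothetical, chosen for this example. Note that because this particular stack has a MaybeT layer, the unwrapped result carries a Maybe around the (value, state) pair.

```haskell
import Control.Monad.Trans.Maybe (MaybeT (..))
import Control.Monad.Trans.Reader (ReaderT (..))
import Control.Monad.Trans.State (StateT (..), get, put)

type AMonadStack a =
  StateT Int (ReaderT String (MaybeT IO)) a

-- Hypothetical action: return the current counter, then increment it.
tick :: AMonadStack Int
tick = do
  n <- get
  put (n + 1)
  pure n

-- Peel the layers off from the outside in: StateT first, then ReaderT,
-- then MaybeT, leaving a plain IO action we can actually run.
runStack :: AMonadStack a -> Int -> String -> IO (Maybe (a, Int))
runStack action initialState cfg =
  runMaybeT (runReaderT (runStateT action initialState) cfg)
```

For example, runStack tick 41 "some config" is an ordinary IO action; the initial state and the configuration are supplied only at this outermost point.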

A monad stack.

There's one small hiccup we still need to handle, though. Look at MaybeT. How do we access the inner Maybe value? When we go to signal that the current function should short-circuit, how do we do it? Returning a Nothing won't work, since Maybe and MaybeT are distinct types. So we'll need a helper function that specifically returns a MaybeT.

nothing :: Applicative m => MaybeT m a
nothing = MaybeT (pure Nothing)
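As a usage sketch, here is how nothing might be called from validation code. This version borrows MaybeT from the transformers library, whose Applicative instance asks for Monad m, hence the Monad constraints. The validateAge and validatePair functions and the 0 to 150 range are invented for illustration.

```haskell
import Control.Monad.Trans.Maybe (MaybeT (..))

nothing :: Applicative m => MaybeT m a
nothing = MaybeT (pure Nothing)

-- Hypothetical rule: accept ages from 0 to 150, short-circuit otherwise.
validateAge :: Monad m => Int -> MaybeT m Int
validateAge n
  | n >= 0 && n <= 150 = pure n
  | otherwise          = nothing

-- If either call short-circuits, the whole computation yields Nothing.
validatePair :: Monad m => Int -> Int -> MaybeT m (Int, Int)
validatePair a b = do
  a' <- validateAge a
  b' <- validateAge b
  pure (a', b')
```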

For any other monad transformers we create, we'd need to do the same thing and write helper functions to enable monad-specific functionality. But compared to the amount of boilerplate that we'd need to implement the "every possible combination of monads" choice from before, this level of repetition is something we'd take any day.

-- for StateT
get :: Applicative m => StateT s m s
put :: Applicative m => s -> StateT s m ()
-- for ReaderT
ask :: Applicative m => ReaderT cfg m cfg
local :: (cfg -> cfg') -> ReaderT cfg' m a -> ReaderT cfg m a
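The signatures above could be implemented along these lines. This is a sketch under one assumption: that StateT and ReaderT are laid out the way the transformers library defines them, with StateT wrapping s -> m (a, s). The post itself never shows these definitions. As an aside, the type-changing signature given for local here corresponds to what the transformers library calls withReaderT; the library's own local keeps the configuration type fixed.

```haskell
-- Assumed layouts, mirroring the transformers library.
newtype StateT s m a = StateT { runStateT :: s -> m (a, s) }
newtype ReaderT cfg m a = ReaderT { runReaderT :: cfg -> m a }

-- Return the current state as the result, leaving the state unchanged.
get :: Applicative m => StateT s m s
get = StateT $ \s -> pure (s, s)

-- Replace the state, ignoring the old one.
put :: Applicative m => s -> StateT s m ()
put s = StateT $ \_ -> pure ((), s)

-- Return the configuration itself.
ask :: Applicative m => ReaderT cfg m cfg
ask = ReaderT pure

-- Run an action that expects cfg' inside a context that provides cfg.
local :: (cfg -> cfg') -> ReaderT cfg' m a -> ReaderT cfg m a
local f (ReaderT run) = ReaderT (run . f)
```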

You can think of monad transformers as functions, functions that take in some other monadic type and "transform" it by adding in extra capabilities. Hence the name. And it really can be any monad; the definitions that we've written above are completely agnostic of what the inner monad is, only that it implements certain typeclasses.1

mtl-style and transformers-style

Everything we've covered so far forms the foundations of monad transformers; if you've made it to this point, you've understood monad transformers. However, there are a few extra conveniences that are possible to add. Specifically, there are two common ways to use typeclasses to make working with transformers more convenient, referred to as the transformers style and the mtl style.

Consider our MaybeT example. Our solution worked well when every function we were working with returned MaybeT IO a. But what if we want to just call IO a actions directly? It seems like we should be able to. Unfortunately, trying to call them directly inside a MaybeT function wouldn't work; it's a different type, after all.

validateInput :: MaybeT IO String
validateInput = do
  line <- getLine  -- doesn't compile; IO =/= MaybeT IO
  ...

Instead, we have to write something like this:

validateInput :: MaybeT IO String
validateInput = do
  line <- MaybeT $ fmap Just getLine
  ...

And we'd have to do something similar for each IO action we ran inside our MaybeT IO. Smells a bit boilerplate-y, doesn't it? We could cut down on some of the repetition by writing a function IO a -> MaybeT IO a, but what do we then do if we're using a different inner monad? What do we do if our monad stack is multiple layers deep; do we have to manually do the wrapping for every layer above the one we're trying to use? Plus, forget MaybeT; wouldn't every monad transformer type need us to write a similar function?

transformers-style and mtl-style, named after the respective libraries which implement them in Haskell, are two different ways to tackle this problem, cutting down on the amount of repetition needed to use monad transformers in your own code. We'll start with transformers.

Fundamentally, the problem is that we need some easy way to take the inner, "wrapped" monad, and convert it to the "wrapping" transformer. Well, the simplest, most direct way to solve that seems like an actual function from one to the other. We've already talked about how we could do this and write a function IO a -> MaybeT IO a, but as we said, we'd need a function like this for every variation of both the inner and the outer type. Sounds like polymorphism; sounds like we need a typeclass specifically for monad transformers.

class MonadTrans trans where
  lift :: Monad m => m a -> trans m a

Take a second to understand this typeclass; try substituting in some specific monads for m and trans. For instance, if we substitute trans = MaybeT, m = IO, lift will have exactly the type IO a -> MaybeT IO a. But since lift is polymorphic, we can now wrap any inner monad we want with just one function.

This also gives us a concise definition of a monad transformer: it's any type where an inner monad can be converted to the type itself. The operation is called "lift" because we're moving an inner monad "upwards" through the stack, towards the topmost transformer.

instance MonadTrans MaybeT where
  lift = MaybeT . fmap Just
instance MonadTrans (StateT s) where
  -- assuming StateT wraps s -> m (a, s), as in the transformers library
  lift m = StateT (\s -> fmap (\a -> (a, s)) m)

validateInput :: MaybeT IO String
validateInput = do
  line <- lift getLine  -- much shorter now!
  ...

This solution is what's known as transformers style.

Success! And with this, we're done, right? Boilerplate problem solved? Unfortunately, not quite yet. Though this is a major improvement, there are still degenerate cases when the monad stack starts getting tall.

foo :: StateT Int (ReaderT String (MaybeT IO)) ()
foo = do
  -- need to do some IO in this stack
  input <- lift $ lift $ lift getLine  -- lots of lifts needed
  ...

We've at least gotten rid of the problem of requiring the end developer to implement and/or remember lots of different lifting functions for each monad transformer. But we can still only lift one layer at a time. What we really want is to be able to use the functions from every component in our monad stack, in any stack containing that component, with no wrapping needed.

For this to work, suddenly the functions that each layer provides have to be polymorphic. For StateT, right now we have functions like this:

get :: Applicative m => StateT s m s
put :: Applicative m => s -> StateT s m ()

But for what we want, signatures like this can't possibly work; the return type is too concrete.

Instead, we need these functions to have a signature more like so:

get :: MonadState trans s => trans s
put :: MonadState trans s => s -> trans ()

Rather than tying ourselves to a concrete state type, we use a typeclass that represents "statefulness." The idea is that as long as the type of our monad stack implements this typeclass, we can call get and put without lifting, no matter how deep the StateT is in the stack, regardless of what order the transformers have been stacked in.

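-- needs MultiParamTypeClasses and FunctionalDependencies; the `| m -> s`
-- dependency tells GHC that the monad determines its state type, so that
-- uses of get and put aren't ambiguous (the delegating instances further
-- down also need FlexibleInstances and UndecidableInstances)
class Monad m => MonadState m s | m -> s where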
  get :: m s
  put :: s -> m ()

-- StateT can implement MonadState directly...
instance Monad m => MonadState (StateT s m) s where
  get = StateT $ \s -> pure (s, s)
  -- assuming StateT wraps s -> m (a, s), as in the transformers library
  put s = StateT $ const (pure ((), s))

-- ...and if the inner monad supports statefulness, other monad
-- transformers can simply delegate the state operations downward
instance MonadState m s => MonadState (MaybeT m) s where
  get = lift get
  put s = lift (put s)
  -- instances look very similar for ReaderT, WriterT, etc.

Another way to look at it is that for any transformer type that's not StateT itself, the MonadState instance handles calling the appropriate amount of lifts for us.

With all that, you can see that we now have the capability to use get and put in whatever monad stack we want, as long as we specify that said monad stack has something to handle that statefulness somewhere in the hierarchy:

-- our functions now work in this stack...
foo :: StateT Int IO ()
foo = do
  state <- get
  put (state + 1)

-- ...as well as this one
bar :: MaybeT (StateT Int IO) ()
bar = do
  state <- get
  put (state + 1)

-- ...or even fully polymorphic
baz :: MonadState m Int => m ()
baz = do
  state <- get
  put (state + 1)

This solution is what's known as mtl style.2
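For comparison, here is a sketch of the same idea using the actual mtl library instead of the hand-rolled class above. One difference to watch for: mtl's class takes its parameters in the opposite order, MonadState s m, so the constraint on baz reads MonadState Int m. The runPlain and runUnderMaybeT names are hypothetical, and State is used in place of StateT over IO only to keep the example pure.

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Control.Monad.State (MonadState, execState, get, put, runState)
import Control.Monad.Trans.Maybe (runMaybeT)

-- Fully polymorphic: works in any stack with an Int state somewhere.
baz :: MonadState Int m => m ()
baz = do
  state <- get
  put (state + 1)

-- baz run directly in a State Int monad.
runPlain :: Int -> Int
runPlain = execState baz

-- The very same baz with a MaybeT layer stacked on top; mtl's instance
-- for MaybeT does the lifting for us.
runUnderMaybeT :: Int -> (Maybe (), Int)
runUnderMaybeT = runState (runMaybeT baz)
```

Nothing about baz changes between the two runs; only the concrete stack chosen at the call site differs.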

To fully make use of this style, we'd have to write similar typeclasses for all our other transformers as well, which can be a lot of work. But once we do, the syntactic noise of calling lift disappears; we can directly use any function from any transformer in our stack wherever we want. Whether doing all this implementation work purely to remove calls to lift is worth it is another question entirely, but there's no denying that this works.

Beyond just the syntactic convenience of not having to call lift anymore, there are a number of other benefits to using monad transformers like this compared to transformers style.

Say we had two functions, one that returned a MaybeT IO a and one that returned a MaybeT (StateT Int IO) b. Intuitively, it seems like we should be able to use these together, since the latter monad stack has strictly more functionality than the former. But if you actually try to do this, you'll need to do some painful finagling to convert from one type to the other; lifts won't cut it here, since the offending StateT is in the middle of the stack rather than the top.

This problem disappears in mtl style, since we never actually specify concrete types for our monad stack anywhere, just typeclasses describing what functionality we need. So if we have one function that uses just reader functions, and another that uses both state and reader functions, there's nothing stopping us from combining them.

justReader :: MonadReader m String => m ()
justReader = ...

readerAndState :: (MonadReader m String, MonadState m Int) => m ()
readerAndState = ...

-- we can use both in the same function!
combined :: (MonadReader m String, MonadState m Int) => m ()
combined = do
  justReader
  readerAndState

In fact, this elegantly deals with a number of other potential problems with using transformers style, like how ReaderT cfg (StateT s IO) a and StateT s (ReaderT cfg IO) a aren't the same type and can't be used together, despite being equivalent by inspection; in mtl style, the order that transformers are specified in doesn't matter when writing your logic, only when you eventually go to run it.
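The order does change behavior once you run the code, though. The sketch below uses the mtl library (whose class is spelled MonadState s m) and borrows Alternative's empty as the "fail" operation, which is an assumption of this example and not something the post introduces. The same polymorphic action either loses or keeps its state update depending on which layer sits outermost. All three names are hypothetical.

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Control.Applicative (Alternative, empty)
import Control.Monad.State (MonadState, put, runState, runStateT)
import Control.Monad.Trans.Maybe (runMaybeT)

-- Written once, with no concrete stack in sight: set the state, then fail.
writeThenFail :: (MonadState Int m, Alternative m) => m ()
writeThenFail = do
  put 1
  empty

-- Run as StateT Int Maybe: the failure discards everything, state included.
failureLosesState :: Maybe ((), Int)
failureLosesState = runStateT writeThenFail 0

-- Run as MaybeT (State Int): the failure is recorded in the result,
-- but the state written before it survives.
failureKeepsState :: (Maybe (), Int)
failureKeepsState = runState (runMaybeT writeThenFail) 0
```

Here failureLosesState evaluates to Nothing, while failureKeepsState evaluates to (Nothing, 1).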


Between mtl and transformers, which one is better? While we can see that mtl is definitely more syntactically convenient, it's actually a rather subtle question. Both libraries can be better in different situations. mtl allows you to avoid having to specify an order when writing code, and only choose it once you go to run it, which can matter when working with monads that allow for early exits. It's also just nicer to use. On the other hand, transformers can allow you to have more than one of the same transformer in the same stack; with mtl, if you want two different StateT's, both with an Int state parameter, you can't; there's no way to differentiate the constraints needed. This ambiguity doesn't exist in transformers style. Implementing mtl style (say, if you create your own monad transformers) is significantly more work as well, since every new transformer requires a typeclass, plus instances for every existing transformer, something known as the "n^2 instances" problem.
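To illustrate the "two StateT's with the same state type" case, here is a sketch in transformers style, using the transformers library. The bumpBoth and runBoth names and the starting values are made up for the example. Each operation picks its layer by how many times it is lifted, so there is no ambiguity even though both states are Ints.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.State (State, StateT, get, modify, runState, runStateT)

-- Two Int states in one stack: an outer counter and an inner counter.
bumpBoth :: StateT Int (State Int) (Int, Int)
bumpBoth = do
  modify (+ 1)           -- no lift: targets the outer StateT
  lift (modify (+ 10))   -- one lift: targets the inner State
  outer <- get
  inner <- lift get
  pure (outer, inner)

-- Outer state starts at 0, inner state starts at 100.
runBoth :: (((Int, Int), Int), Int)
runBoth = runState (runStateT bumpBoth 0) 100
```

With these starting values, runBoth evaluates to (((1, 110), 1), 110): the result pair, then the final outer state, then the final inner state.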

One important difference is that mtl is often slower than transformers, despite providing the exact same functionality and using the exact same underlying transformer types. The reasons why are outside the scope of this post, but it's something to keep in mind if you're using monad transformers for performance-critical code.3

In the next post, we'll go deeper into the why of monad transformers. We'll look at practical examples of combining transformers to solve real problems. We'll see how the resulting code is more than the sum of its parts, while still retaining the modularity typical of Haskell abstractions. Stay tuned!

Found this useful, or otherwise have comments or questions? Talk to me!


Footnotes

↥1 I wish this footnote didn’t have to exist, but it does. Unfortunately, there are exceptions; there are some transformers that don’t form law-abiding monads, or have to be used in very specific ways. The most notorious of these is ListT. My impression is that people try to stay away from these kinds of transformers that don’t compose well.

↥2 More generally, this approach of using typeclass constraints/instances and making the type of a value completely polymorphic is known as “tagless final.” Tagless final can be seen as an inversion of control, where the behavior of the code is determined by usage sites, rather than by the code definition itself. It’s a more general technique than we’ve looked at here; it just happens to be useful for working with monad transformers as well.

↥3 The short explanation for why mtl is slower is that generally GHC doesn’t monomorphize code that calls typeclass functions; usually a record containing the type-specific typeclass functions is passed to the code at runtime instead. If you’re interested in the gory details, check out this great talk about the performance of various effect systems by Alexis King!