String interpolation and overlapping instances 101

Posted on May 27, 2019

Are you frustrated trying to do any kind of string manipulation in Haskell? The same kind of interpolation or string building that would require zero thought in other languages seems to always turn into a mess of (<>)s, mconcats, and shows in vanilla Haskell. Throw in Text and ByteString and now we have to juggle collisions between different definitions of pack, module imports, lazy versus strict versions of the same functions… not to mention having to manually handle Unicode conversions between Text and ByteString, even when 99% of the time you know it’ll be UTF-8. In short, it just becomes a huge mess.

I got fed up with this situation and wrote string-interpolate.

Here’s how you use it:

Note how we can directly interpolate Strings into a Text without needing to do any explicit conversion; interpolation does what you expect, where textual types and Chars will be interpolated without the surrounding quotes.

You don’t have to worry about lazy versus strict:

Anything which implements Show gets interpolated as you’d expect:

Conversions between Text/String and ByteString automatically convert to/from UTF-8:

In short, it makes building up strings much, much easier. It’s also fast!

The rest of this post is about the internal architecture of string-interpolate; while the tricks used involving -XOverlappingInstances and related type hackery should be old hat to advanced Haskellers, I haven’t seen a case study of using them in practice.


While using string-interpolate is extremely simple, internally we have to somehow transform the quasiquotes that users give us into code that:

  1. handle taking in values of the various textual types to interpolate, and
  2. outputting every textual type

Since quasiquote handlers don’t allow you to look at the type of identifiers, we can’t actually emit some code conditionally; we have to spit out the same code no matter what types our users give us. Sounds like a job for typeclasses.

The simplest way to implement string-interpolate would just be to use Show and IsString together, since every textual type already implements both of these. So a quasiquote like:

would emit code like:1

While this immediately gets us to a working, convenient solution, it’s painfully slow; using show and toString means we have to construct a String for every interpolated value and then convert it into our target type. Just as importantly, it doesn’t handle Unicode strings correctly! If we took our emoji-containing Text above and tried to interpolate it into a ByteString, we’d end up just getting a string containing the Unicode code point as a sequence of digit characters. Ditto if we tried to interpolate the UTF-8 bytes into a Text, just with the individual bytes. Weirdly, interpolating a ByteString into another ByteString won’t even give you the same string if that ByteString isn’t ASCII, which certainly seems counterintuitive for any users.

Another idea would be to create our own Interpolatable typeclass. Since we want the behavior to depend both on what we’re interpolating in and what we want out, we’d have to take in two parameters:

So that our quasiquote from earlier now emits something like:

Now we’re fast and handle Unicode properly, but it means that we’re kind of limited in what we can interpolate. Do we really want to provide instances for, say, Int, Integer, Float, Double, Bool, and so on? Not to mention that if someone wants to use their own type with our interpolation, they’ll also have to write their own instances for Interpolatable. Wasn’t our library supposed to be convenient?

It would be nice if we could simply use Show as a default, and use the specific instances if we’ve defined one. What happens if we just try to add an instance for anything that’s Show?

This seems to compile fine, but when we try it out, GHC starts complaining at us:

Hmm. Well, that’s not great.

What GHC is telling us is that it doesn’t know which typeclass instance to use for our call to interpolate; it could use the Interpolatable Text ByteString instance, which is probably what we want. But the Interpolatable src ByteString instance is also possible, since src could unify with Text.

We’re able to describe in plain English what we want the compiler to do: only fall back to the Show-based instance if we don’t have something more specific. Is there a way to explain that to the compiler? Thankfully, yes!

If we mark our instances as overlapping, GHC will attempt to pick the most specific instance when there are multiple that might work. The exact instance matching rules are explained in detail here.

Success!

In practice, I’ve found overlapping instances to be useful for exactly this sort of behavior: providing a “default” typeclass behavior that can be given specific overrides.


At this point we’ve sketched out the implementation of a fast, Unicode-aware interpolation backend; you could totally take this and spin off a new library with it. However, there’s one last tweak we want to make to how this works for string-interpolate. Relying on the Monoid-based implementation to handle assembling the final output is okay in most cases, but would blow up if the interpolation gets really large. Both Text and ByteString have their own Builder types that are designed for efficiently constructing instances of their associated textual types. Could we turn our interpolation fragments into that intermediate type first, and then use that to output the result, instead of having to directly convert our fragments into the target type?

As it turns out, yes, but here’s where the type hacking starts to get really hairy.

We need to associate with each output type what its builder type is, which sounds like a job for type families. If you’re not familiar with type families, all they are are functions that operate on type terms instead of data. So we can have a function that takes in the output type and returns the builder type. One nice thing is that there’s a convenient syntax for doing this right in the typeclass instance itself:

Once again, though, we run into an issue when we try to define our “default” implementation:

Unfortunately, type families don’t play nice with overlaps. The problem is exactly as it was with our basic overlapping instances: there are two possibilities for what the type SinkBuilder ByteString might be, and the compiler doesn’t currently have the ability to decide between them. Here there doesn’t seem to be any way to get around this restriction on overlapping instances. So instead, we have to figure out a way to make our instances not overlapping.

The approach I chose was to add another parameter to all the involved typeclasses purely for the purpose of making the instance heads not collide. So the typeclasses look like this:

And once that’s set up, we can write our instances using the flag to make them distinct:

Since the flag parameter in the instances differ, a type can’t match both of these at once, solving our overlap problem. The rest is then implementing all the instances for InterpSink and Interpolatable that we want to offer.

So… success! We have our working interpolation backend, using the most efficient builder types that we can for each textual type. Unfortunately, we gave up the ability for library users to override the Show/IsString default behavior for their own types, since there’s no way to define a new instance of HasCustomSink. If that was a priority, we could stick with our two-parameter version using overlapping instances and take the performance hit on more complicated interpolations.

Unfortunately, I haven’t been able to figure out how to get string-interpolate to have all three of 1) usage of efficient intermediate types, 2) defaulting to Show/IsString, and 3) allowing library users to customise the behavior for their own types. Any two together seem doable, but getting all three seems not to be.

Have you run into obvious use cases that Haskell makes way harder than other languages? Do you like string-interpolate and find it useful? Talk to me!


Footnotes

↥1 We’d still have to do some finagling to get rid of the surrounding quotes for textual types, but this approach is directionally correct.


Want more Haskell tips and tricks? Subscribe for useful stuff, no spam, straight to your inbox.