How to Find the Right Cats Import (And Why They're Like That)
Introduction
cats is, if you haven't heard of it[1], a popular library for providing various basic abstractions for typed functional programming in Scala. Many of these abstractions provide powerful primitives for expressing yourself in a more declarative and abstract way.
Unfortunately, new users often find it difficult to know how to gain access to the appropriate primitives for their situation, even after they begin to understand which typeclass provides those primitives.
I have a friend, an experienced Scala software engineer, who still to this day asks me how I know an incantation[2] when I need something from cats. "How did you memorize all these?", he will say.
The answer, like most such tricks, is that I didn't. I am constitutionally incapable of internalizing raw, heterogeneous information like that.
What I did do is internalize a rule, a rule that lets me reliably find the import I need, and a rule this blog post will try to explain to you[3].
If you see the rule and don't understand it, then in the following section I'll provide a discussion of typeclasses and their components in the `cats` encoding in Scala. After that we'll revisit the rule with our new knowledge, hopefully making it even easier to internalize and apply freely.
The Rule
Caveat
Before I go into this, know that in principle you don't need this knowledge. You can just import everything if you want:
import cats._
import cats.data._
import cats.syntax.all._
This will work! Might bloat the namespace, make your editor tooling run worse, and ruin your autocomplete, but the code will compile[4]. It also makes it a bit harder to teach - new users tend to have trouble knowing what is available and what's providing it when using these "catch all" imports.
For what it's worth, I tend to think the lack of specificity is aesthetically and intellectually unsatisfying, but that's not a terribly pragmatic motivation.
In any case, if you want to be more specific and keep things lean and hygienic, keep reading.
No, Really, The Rule
The rule[5] is straightforward[6].
Typeclass Reference
If you need to reference a typeclass by name, you import it from root, unless it is related to `Arrow` or it is lawless.
These imports are direct: you just need the name of the thing.
So, this is the schema:
// Lawful and not in the `Arrow` hierarchy (this is the vast majority of cases).
import cats.$TYPECLASS_NAME
// Lawless
import alleycats.$TYPECLASS_NAME
// Related to `Arrow`, this is rare and if you are here you probably already know these rules!
import cats.arrow.$TYPECLASS_NAME
So, if you need `Foldable`, then you want this import: `import cats.Foldable`.
Usually, if you need to reference a typeclass, you probably know its name, so this should be easy enough to discover.
Implementations of Typeclasses
If you need the concrete implementation of a typeclass for a given datatype, then there is only one schema.
The implementation is what provides the connection between a given datatype (like `Option`) and a given typeclass (like `Functor`). You will essentially always want this in scope.
You can find a more involved discussion in the section on what an implementation is, below.
These imports are data relative, that is, you need to know the type of data that you're working with to get any of the behaviors you want.
This is the schema[7]:
// lawful
import cats.instances.$NAME_OF_DATA_TYPE._
// lawless (note std rather than instances)
import alleycats.std.$NAME_OF_DATA_TYPE._
So, if you want the implementation of `Applicative` for `Either`, you would want `import cats.instances.either._`. Note that `Applicative` doesn't enter into it: you just need to know you are working with `Either` to get what you need.
Syntax for Typeclasses
If you need syntax for a particular method so that you can call it in method style, then there is only one schema.
The syntax is what provides the ability to call typeclass methods in an idiomatic way using method invocation dot syntax, for example. You will only need this in scope if you want that idiomatic syntax yourself, directly.
You can find a more involved discussion in the section on what typeclass syntax is, below.
These imports are typeclass relative, that is, you need to know the typeclass that provides the method you want in order to get it, regardless of the data with which you are using it.
This is the schema:
// lawful
import cats.syntax.$NAME_OF_TYPECLASS._
// lawless
import alleycats.syntax.$NAME_OF_TYPECLASS._
So, if you need `.separate` on a `List[Either[A, B]]`, which is provided by `Alternative`, then you want:
import cats.syntax.alternative._
Or, if you want `.reduce` on a `Tuple2`, which is provided by `Reducible`, then you want:
import cats.syntax.reducible._
Caveat (Import Dependencies)
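Importantly, these syntax imports don't do anything unless the implementation from the prior section is also in implicit scope. Since Cats 2.2.0 the instances for standard library types are found without an explicit import, but on older versions, or if you prefer to be explicit, what you really want for full functionality is both of these: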
import cats.instances.$NAME_OF_DATA_TYPE._
import cats.syntax.$NAME_OF_TYPECLASS._
So, in the above concrete examples, you would need:
import cats.instances.either._ // bifoldable for either
import cats.instances.list._ // alternative for list
import cats.syntax.alternative._ // the syntax for alternative
And,
import cats.instances.tuple._
import cats.syntax.reducible._
To know why, read the section on understanding the components below!
Datatypes
`cats` also comes packaged with a variety of datatypes, like `Chain` or `Ior` or `EitherT`. These all have direct imports as well.
The schema:
import cats.data.$NAME_OF_DATATYPE
So, if you need `Chain`, then you want `import cats.data.Chain`.
Understanding the Components
There are four possible components you might import from `cats`. These reflect the `cats` encoding of a typeclass[8] for the first three, and the datatypes `cats` provides for the last.
If you aren't familiar with typeclasses, they are a mechanism for achieving ad hoc polymorphism in a principled manner in a strongly typed setting. They originated in Haskell, but we can encode them easily enough in Scala by relying on implicit resolution.
Ad hoc polymorphism is something you are familiar with, even if you have not heard the term. It means that a given function or operator can behave differently based on the types provided, with potentially different implementations for each type. If you are from a less functional background, then overloading would be a good way to start thinking about the general concept.
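As a minimal sketch of that starting point, here is plain overloading in Scala. The object and method names (`OverloadingSketch`, `describe`) are made up for illustration and are not part of cats.

```scala
object OverloadingSketch {
  // Two unrelated implementations share one name; the compiler
  // chooses between them using the static type of the argument.
  def describe(x: Int): String = s"the integer $x"
  def describe(x: String): String = s"the text '$x'"
}
```

Calling `OverloadingSketch.describe(1)` and `OverloadingSketch.describe("hi")` runs different code under the same name. The limitation is that every overload has to be written next to the others, which is the part typeclasses relax.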
There are essentially three portions of a typeclass in this encoding:
- The definition of the typeclass, its interface.
- The implementation of the typeclass for concrete datatypes.
- The syntax of the typeclass allowing for its functionality to be invoked idiomatically in Scala.
Let's walk through each of these by showing an example.
Optional: Some Talking Around Typeclasses
A typeclass is, to a certain degree, a kind of implicitly passed compile time checked vtable.
If you have the right background, this already clarified some things, maybe. If you, like me, don't have the background for that term to be obvious, then I'll share my understanding: a vtable is essentially a lookup table used to find the implementations for function names at runtime.
At a certain point in programming history, you would have these in the background, allowing you to delegate the implementation of a symbol to the right specific executor.
In other words, it is a kind of dynamic dispatch (a way of figuring out what to do when a function is called dynamically, depending on the context).
In our specific situation, with typeclasses in a statically typed functional programming language, the context will be the type of the dictionary, and the type against which that dictionary is brought to bear. That shift in context, from the symbol or term of invocation to the term and its type, means that we move from dynamic dispatch to a static dispatch: the implementation is still chosen per type, but the choice is resolved at compile time against the types the compiler has calculated.
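A rough sketch of the two kinds of dispatch side by side may help. Everything here (`DispatchSketch`, `Shape`, `Named`, and so on) is a hypothetical name chosen for illustration, not something cats provides.

```scala
object DispatchSketch {
  // Dynamic dispatch: the implementation is looked up on the
  // runtime object itself (this is where a vtable comes in).
  trait Shape { def name: String }
  final class Circle extends Shape { def name: String = "circle" }
  final class Square extends Shape { def name: String = "square" }

  def dynamicName(shape: Shape): String = shape.name

  // Static dispatch: the "dictionary" is a separate value, selected
  // by the compiler from the static type A and passed implicitly.
  trait Named[A] { def name(a: A): String }

  object Named {
    implicit val intNamed: Named[Int] =
      new Named[Int] { def name(a: Int): String = "int" }
    implicit val stringNamed: Named[String] =
      new Named[String] { def name(a: String): String = "string" }
  }

  def staticName[A](a: A)(implicit dict: Named[A]): String = dict.name(a)
}
```

In `staticName(42)` the compiler fills in `Named.intNamed` for `dict` before the program ever runs, whereas `dynamicName` only learns which `name` to call from the object it is handed.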
Let's look a bit more concretely in Scala.
Definition of Typeclass
So, we need to define our typeclass. Since Scala is a typed language and does not natively provide a typeclass mechanism[9], that means defining a type.
In the current and relatively standard encoding this means defining a trait with abstract methods for each symbol to be concretely implemented and then statically, but polymorphically, dispatched.
So, let's say we want to encode the typeclass `Monoid`, which captures the idea that we can combine two values of a type, and many such types have a value that "makes no difference" when so combined with other values[10].
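trait Monoid[A] {
  def combine(l: A, r: A): A
  def empty: A
}
// Summoner, so later code can write Monoid[A] to fetch the instance in implicit scope
object Monoid {
  def apply[A](implicit instance: Monoid[A]): Monoid[A] = instance
}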
This describes the signature of the interface we will be asking various concrete types to implement concretely.
In other words, any datatype that wants to leverage any benefits afforded to users of the `Monoid` typeclass needs to provide an implementation of both `combine` and `empty` with the appropriate type signature.
But should they do so, they could plausibly interact with any function of the following signature (suitably made concrete) without any further ceremony:
def areInverse[A: Monoid](a: A, b: A): Boolean = ???
Like, if someone had an implementation of `Monoid` for their type `MyNewAwfulThing` (and we'll provide one below), this would just compile, and work appropriately[11]:
case class MyNewAwfulThing(value: Int)
// This would compile and work
// (if we did what's in the next section)
areInverse(MyNewAwfulThing(-1), MyNewAwfulThing(1))
That would be the case even if `MyNewAwfulThing` and `areInverse` were defined in totally separate codebases and by different people.
You can learn more about that in any good discussion of typeclasses in Scala (for example this book I have read or this other book I haven't fully read have good material).
Implementation of Typeclass for Datatype
So now that `Monoid` is defined, we need to explain how a given concrete datatype would participate[12] with that typeclass.
Concretely, this looks like providing an `implicit val` that has the right type, that is also in implicit scope.
// Just reminding ourselves
case class MyNewAwfulThing(value: Int)
// Not getting into it, but companion objects are
// always in implicit scope
object MyNewAwfulThing {
  // Implicit declaration so compiler can find it by type
  implicit val monoid: Monoid[MyNewAwfulThing] =
    new Monoid[MyNewAwfulThing] {
      def combine(
        l: MyNewAwfulThing,
        r: MyNewAwfulThing
      ): MyNewAwfulThing =
        MyNewAwfulThing(l.value + r.value)
      def empty: MyNewAwfulThing = MyNewAwfulThing(0)
    }
}
This `implicit val` provides evidence to the compiler that the type `MyNewAwfulThing` participates in the `Monoid` typeclass: it has a valid implementation thereof.
Note that it's not that we extended the `trait` we used to define the typeclass earlier. That would be classic subtype polymorphism rather than the ad hoc polymorphism we're looking at here.
So long as this implementation is in implicit scope, any user of both `Monoid` based functions and `MyNewAwfulThing` can use them together!
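To make "use them together" concrete, here is a self-contained sketch that repeats the toy `Monoid` and `MyNewAwfulThing` from above and adds two things of my own for illustration: a `String` instance and a generic `combineAll` function. Neither of those additions comes from the text above, and none of this is the real cats code.

```scala
object MonoidSketch {
  trait Monoid[A] {
    def combine(l: A, r: A): A
    def empty: A
  }

  object Monoid {
    // Hypothetical second instance: String under concatenation.
    implicit val stringConcat: Monoid[String] = new Monoid[String] {
      def combine(l: String, r: String): String = l + r
      def empty: String = ""
    }
  }

  case class MyNewAwfulThing(value: Int)

  object MyNewAwfulThing {
    implicit val monoid: Monoid[MyNewAwfulThing] =
      new Monoid[MyNewAwfulThing] {
        def combine(l: MyNewAwfulThing, r: MyNewAwfulThing): MyNewAwfulThing =
          MyNewAwfulThing(l.value + r.value)
        def empty: MyNewAwfulThing = MyNewAwfulThing(0)
      }
  }

  // Written once against the typeclass; it knows nothing about
  // String or MyNewAwfulThing, only that an instance exists.
  def combineAll[A](values: List[A])(implicit m: Monoid[A]): A =
    values.foldLeft(m.empty)(m.combine)
}
```

`combineAll` works for both types without either one extending `Monoid`, and adding a third type later would not require touching `combineAll` at all.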
Syntax for Typeclass
But there are still ergonomics issues as far as Scala is concerned. So far, this is the best a person could do for implementing a function `areInverse`:
// Ignore that that's a sort of silly implementation versus == empty
// for now, focus on the syntax
def areInverse[A: Monoid](a: A, b: A): Boolean =
  Monoid[A].combine(a, b) == Monoid[A].empty
That's fine, but like... wouldn't it be nicer to write `a.combine(b)`? Or something even easier?
This is where the last component comes in. In a language which has typeclass support intrinsically, this sort of plumbing might happen as part of method resolution directly, but Scala doesn't have that, so we have to provide it.
This can be done by providing an `implicit class`:
implicit class monoidSyntax[A: Monoid](self: A) {
  def combine(other: A): A =
    Monoid[A].combine(self, other)
  def |+|(other: A): A =
    self.combine(other)
}
The implicit class for the syntax takes a call to `combine` made in method position on a value of type `A` (one that has a `Monoid` instance in implicit scope) and delegates it to the function on that `Monoid` instance.
Essentially, all it does is forward the call to the implementation we defined in the last section. The `implicit` nature of these declarations allows the compiler to resolve them by type at compile time.
That means we can now write something like this for our silly `areInverse` function:
def areInverse[A: Monoid](a: A, b: A): Boolean =
  (a |+| b) == Monoid[A].empty
// or, equivalently, a.combine(b) == Monoid[A].empty
// (the parentheses matter: == binds tighter than |+| in Scala)
And I think that looks a lot nicer!
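For anyone who wants to paste the whole thing into one file, here is a sketch that assembles the three components. The names `SyntaxSketch`, `MonoidOps`, and the `Int` instance are my own choices for illustration, and the summoner `Monoid.apply` is an assumption about what makes `Monoid[A]` resolve; this is a toy, not the cats source.

```scala
object SyntaxSketch {
  // 1. Definition
  trait Monoid[A] {
    def combine(l: A, r: A): A
    def empty: A
  }

  object Monoid {
    // Summoner: Monoid[A] returns whichever instance is in implicit scope.
    def apply[A](implicit instance: Monoid[A]): Monoid[A] = instance

    // 2. Implementation (hypothetical example: Int under addition)
    implicit val intAddition: Monoid[Int] = new Monoid[Int] {
      def combine(l: Int, r: Int): Int = l + r
      def empty: Int = 0
    }
  }

  // 3. Syntax: mentions only the typeclass, never a concrete datatype.
  object syntax {
    implicit class MonoidOps[A: Monoid](self: A) {
      def combine(other: A): A = Monoid[A].combine(self, other)
      def |+|(other: A): A = Monoid[A].combine(self, other)
    }
  }

  import syntax._

  // Parenthesised because == has higher precedence than |+|.
  def areInverse[A: Monoid](a: A, b: A): Boolean =
    (a |+| b) == Monoid[A].empty
}
```

With that in place, `SyntaxSketch.areInverse(1, -1)` compiles and checks that the two values combine to `empty`.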
The Rule, Again
Remember what we said about the import conventions above?
Typeclasses have direct imports:
import cats.$NAME_OF_TYPECLASS
Implementations have data type relative imports:
import cats.instances.$NAME_OF_DATATYPE._
And, finally, syntax has typeclass relative imports:
import cats.syntax.$NAME_OF_TYPECLASS._
Having discussed all the components and made our own toy implementations of them, we can see why this is the case!
Syntax is generic over all implementations, so it doesn't mention any specific concrete datatype in its definition. It only mentions the typeclass for which it provides syntax. You can use the syntax to implement new generic typeclass functions without having to think about specific types.
As such, the most natural place to look for the syntax is relative to the typeclass.
Meanwhile, implementations are always specific to the datatype. You can't get around talking about the datatype you're working with when writing the implementation, and when you want to use it you are already writing code that uses that datatype!
So, similarly, the most natural place to look for the implementation is relative to the datatype.
This structure to the relevance of the components is why the imports are the way they are, and knowing that motivation can help you remember where the relevant import lives.
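One way to feel out that structure is to lay out a miniature library the same way. The sketch below is entirely hypothetical (`minicats`, `ImportLayoutUsage`, and every name inside them are invented here); it only imitates the shape of the import paths and makes no claim about how cats organises its own source.

```scala
object minicats {
  // Typeclass definition: imported directly by name.
  trait Monoid[A] {
    def combine(l: A, r: A): A
    def empty: A
  }

  // Implementations: grouped by the datatype they are written for.
  object instances {
    object int {
      implicit val intMonoid: Monoid[Int] = new Monoid[Int] {
        def combine(l: Int, r: Int): Int = l + r
        def empty: Int = 0
      }
    }
    object string {
      implicit val stringMonoid: Monoid[String] = new Monoid[String] {
        def combine(l: String, r: String): String = l + r
        def empty: String = ""
      }
    }
  }

  // Syntax: grouped by the typeclass it forwards to.
  object syntax {
    object monoid {
      implicit class MonoidOps[A](self: A)(implicit M: Monoid[A]) {
        def |+|(other: A): A = M.combine(self, other)
      }
    }
  }
}

object ImportLayoutUsage {
  import minicats.instances.string._ // data relative: we are using String
  import minicats.syntax.monoid._    // typeclass relative: we want Monoid syntax

  val greeting: String = "cats " |+| "imports"
}
```

The two imports in `ImportLayoutUsage` line up with the two questions the rule asks: which data am I holding, and which typeclass provides the method I want?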
Conclusion
Hopefully, by this point, you have an understanding of the rule, and why it is the way it is. If not, sorry, maybe you at least learned a little more about typeclass encoding in Scala along the way.
Essentially, you have to hold in mind there are three components:
- The typeclass definition
- The implementation of the typeclass for a data type
- The syntax for the typeclass
In order to use the typeclass idiomatically, you need both the implementation and the syntax.
The implementation is imported relative to the data you are working with, while the syntax is imported relative to the typeclass you are using.
This is because the syntax merely plumbs invocations to the implementation, so it is generic with regard to the implementation, and the implementation is specific to the datatype.
Thanks for reading.
Credits
Thanks to Jesse Atkinson and Jenifer Carter for comments on an earlier draft. All errors remain my own.
[1] And if you haven't, I'm not sure why you're here, but if you are a Scala engineer interested in functional programming you should check it out, I guess?
[2] Like, for example, `import cats.instances.option.catsStdInstancesForOption` to get the `Monad` implementation for `Option`. If that baffles you, don't worry, that's the point of this post.
[3] This is written as of 9/10/2024, so the rule applies as of then. However, the project's compatibility policy and guarantees (in the repository and at the official website) promise that this rule will continue to be valid up until a major version, and as such you should be able to wield it with confidence, as I do.
[4] Might also slow your compile times, is my understanding, but I've never bothered to do any benchmarking to be sure.
[5] Or, perhaps, more honestly the rule set.
[6] You can also read the actual documentation where the import convention is laid out.
[7] Optionally, you can be more specific by importing specific names, and those are largely partitioned by the origin in the hierarchy of the typeclass.
[8] A walkthrough for this encoding and for programming with `cats` in general can be found in this book by underscore. I've had colleagues use it and they seem to find it helpful so maybe you will also.
[9] This would be in distinction to both Rust and Haskell, which offer native intrinsics for describing and interacting with typeclasses that are to some degree orthogonal to the type system. The implementations are discovered by type, but the typeclasses themselves don't appear to have types (I welcome correction on this front!) outside of specific odd circumstances like trait objects in Rust (and even then really that's an object being typed by its typeclass capacities). For what it's worth, I have not looked into the new features in Scala 3 as deeply as I ought to have (been busy elsewhere), but my understanding is that to some degree those bridge this gap and further elide this distinction.
[10] If this concept is new to you, I would toss a search on `Monoid`, but for what it's worth a natural example is that of the natural numbers. Two numbers can be added, and any number plus zero is the same number. Two natural numbers can also be multiplied, and any number times 1 is itself. If you want to see a good forest of examples, this talk might be of interest!
[11] Assuming the `Monoid` instance was lawful. Sadly, I won't be getting into lawfulness here, but there are plenty of great discussions elsewhere. In short, there are equational properties expected for most of the big typeclasses that people provide, and these properties are expected for anything to work correctly in a consistent fashion. For the example of `Monoid`, those are associativity and identity. You can look that stuff up on the monoid Wikipedia page if you want. You can also grab a book in abstract algebra and have a really good time if you have the time and energy.
[12] Half seriously, in an almost Platonic way!