I know I’m writing about Elixir, but I want to start by talking a little bit about Go. Not the incredible standard library, simple syntax, etc. Just the concurrency. Channels and goroutines are really what make Go special, and they allow you to do Erlang/Elixir-style message passing. I’ve written a lot of Go, and one thing I constantly see is that these primitives are usually hidden behind interfaces. Your HTTP server handlers are already in goroutines, so they feel like threads. A lot of the synchronization primitives rarely expose a channel. WaitGroups and mutexes don’t expose one at all; they synchronize through the runtime rather than through channels. Who cares, it doesn’t really matter: the interfaces are small and we can trust that they work in most cases. Part of the magic is that it feels like you are just writing sequential or old-school threaded code.
Elixir has that “shared-nothing” model as its only operating mode. It can really get funky, because if you want shared state, you have to send a message. Everything is immutable. Go ahead, pass that variable or struct or map. It doesn’t matter, you can’t change what the sender, caller, etc. see. So instead of shared access with “write” and “read”, you just “send a message” and wait for the response. Now instead of “I need to lock a piece of memory, possibly across machines”, you just need to send a message. The magic of this is that you can now wrap up these processes in a way that doesn’t require any sharing of memory. “Distributing” is no longer a matter of syncing with Redis or a third party; it’s just making sure that your code handles messages correctly. I won’t get into Erlang’s built-in distributed computing capabilities until a little later.
But almost all of the Go code I’ve had to work on doesn’t exploit the fact that we can pass messages. I was constantly writing simple “done” channels and waitgroups, and adding something like Redis to do things that you can do directly in Go if you stop thinking about sharing. You don’t need to distribute memory if all you have to do is pass messages! Sharing memory across machines is really hard, but allowing machines to send messages is really easy. Sure, it has its own set of problems, but now instead of having everyone agree that a certain resource is locked, or handling concurrent writes/reads, we all just agree that the resource is both state and process. Discord has a lot of great writing on this. At a high level, one process receives a notification that something has happened, and then individual users are partitioned across sub-processes. Again, wait until the end, where there is some further reading on Elixir built-ins that make this much simpler.
Okay, so… we should probably put this together, right? Well, let’s look at two ways of distributing work for our own product. Say it’s a kanban board. We need to process users creating issues, moving issues, possibly doing illegal things to issues. The way we should do it is just by using SQL transactions in whatever our database is. We aren’t Atlassian and we don’t have Atlassian problems. But now we may have to handle a lot of nuance between different tables, we have to deal with SQL, all kinds of things. What if we didn’t even need SQL?
What if, for each issue being worked on, we had exactly one goroutine to handle its state? Two HTTP requests modifying the same ticket don’t simultaneously open transactions; they both just send messages to the goroutine, which applies the updates one at a time and then responds to both.
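Here is a minimal sketch of that idea, not a definitive implementation. The names `Issue`, `moveRequest`, `runIssue`, and `move` are made up for illustration, and "moving a column" stands in for whatever an update really is.

```go
package main

import "fmt"

// Issue is the state owned by exactly one goroutine (hypothetical shape).
type Issue struct {
	ID     string
	Column string
}

// moveRequest asks the owning goroutine to move the issue. It carries a
// reply channel so the sender can wait for the response.
type moveRequest struct {
	to    string
	reply chan Issue
}

// runIssue owns the issue and applies requests one at a time, so no lock or
// transaction is needed to serialize concurrent updates.
func runIssue(issue Issue, inbox <-chan moveRequest) {
	for req := range inbox {
		issue.Column = req.to
		req.reply <- issue // reply with a copy of the new state
	}
}

// move sends a request to the issue's goroutine and waits for the response.
func move(inbox chan<- moveRequest, to string) Issue {
	reply := make(chan Issue, 1)
	inbox <- moveRequest{to: to, reply: reply}
	return <-reply
}

func main() {
	inbox := make(chan moveRequest)
	go runIssue(Issue{ID: "ISSUE-1", Column: "todo"}, inbox)

	// Two "handlers" modify the same issue concurrently.
	results := make(chan Issue, 2)
	go func() { results <- move(inbox, "doing") }()
	go func() { results <- move(inbox, "done") }()

	for i := 0; i < 2; i++ {
		fmt.Println((<-results).Column)
	}
	close(inbox)
}
```

The order of the two results depends on scheduling, but each caller always gets the state as it was right after its own update was applied.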
Wait wait wait, that’s a LOT of goroutines. Of course it is! They’re not infinitely cheap, but they can be spun up and down pretty easily. Perhaps we select on our work queue and a time.After, so the goroutine closes itself down once it has been idle for a while. Now we can also do something like… just have a folder on disk where each goroutine owns one file and can write JSON directly to it. Maybe it’s still backed by a database, and now the goroutine can optimistically return updates and sync the end state to the backend every once in a while.
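The idle shutdown could look roughly like this. It is a sketch under assumptions: `runWorker` and its string updates are hypothetical, and the comment marks where a flush to a file or database would go.

```go
package main

import (
	"fmt"
	"time"
)

// runWorker drains updates until none arrive for the idle duration, then
// shuts itself down and reports its final state on the returned channel.
func runWorker(inbox <-chan string, idle time.Duration) <-chan []string {
	done := make(chan []string, 1)
	go func() {
		var applied []string
		for {
			select {
			case update := <-inbox:
				applied = append(applied, update)
			case <-time.After(idle):
				// Idle for too long. This is where the end state could be
				// written to a file or synced to a database.
				done <- applied
				return
			}
		}
	}()
	return done
}

func main() {
	inbox := make(chan string)
	done := runWorker(inbox, 50*time.Millisecond)

	inbox <- "create"
	inbox <- "move to doing"

	final := <-done // blocks until the worker has been idle for 50ms
	fmt.Println(len(final), "updates applied before idle shutdown")
}
```

Each pass through the loop creates a fresh timer, so every message resets the idle clock. One thing this sketch ignores: once the worker has exited, a send on its unbuffered inbox blocks forever, so whatever hands out inboxes has to notice the shutdown.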
Elixir has Registry, which lets you register your process locally with a name. Maybe it’s the ID of a ticket. You could do the same (a subset of Registry’s functionality) in Go with a map of strings to channels. When a message comes in for an ID, we look up the channel; if it doesn’t exist, because no goroutine for that ID is alive, we call our goroutine factory and the new goroutine starts draining our messages. Now we have in-process actors for free.
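A rough sketch of that map-of-channels registry follows. Everything here (`registry`, `lookup`, `send`, `ticketActor`) is a hypothetical name, the actor just counts messages so there is something to reply with, and retiring idle goroutines is left out.

```go
package main

import (
	"fmt"
	"sync"
)

// message is a hypothetical update addressed to one ticket.
type message struct {
	body  string
	reply chan int
}

// registry maps ticket IDs to the inbox of the goroutine that owns them.
type registry struct {
	mu      sync.Mutex
	inboxes map[string]chan message
}

func newRegistry() *registry {
	return &registry{inboxes: make(map[string]chan message)}
}

// lookup returns the inbox for id, calling the "goroutine factory" when no
// goroutine is registered for that id yet.
func (r *registry) lookup(id string) chan message {
	r.mu.Lock()
	defer r.mu.Unlock()
	inbox, ok := r.inboxes[id]
	if !ok {
		inbox = make(chan message)
		r.inboxes[id] = inbox
		go ticketActor(inbox)
	}
	return inbox
}

// ticketActor owns the state for one ticket. Here the state is only a count
// of handled messages, which it sends back as the reply.
func ticketActor(inbox <-chan message) {
	handled := 0
	for msg := range inbox {
		handled++
		msg.reply <- handled
	}
}

// send delivers body to the actor for id and waits for its reply.
func (r *registry) send(id, body string) int {
	reply := make(chan int, 1)
	r.lookup(id) <- message{body: body, reply: reply}
	return <-reply
}

func main() {
	r := newRegistry()
	fmt.Println(r.send("TICKET-1", "create")) // 1
	fmt.Println(r.send("TICKET-1", "move"))   // 2
	fmt.Println(r.send("TICKET-2", "create")) // 1
}
```

The map itself is shared state, so this sketch guards it with a mutex. It could just as well be owned by its own goroutine and queried with messages, which is closer in spirit to what Registry does for you.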
But there are also pg and mnesia, which are, respectively, a distributed registry (process groups) and a distributed database built into Erlang’s standard library. It’s insane how well they work up to a point. I use mnesia in one project to store cached HTTP requests to services running on machines, and pg to create a simple registry of interested parties for some system notifications. These don’t have any kind of parallel in vanilla Go, and they cover what you’d typically pull in something like etcd for.
If you really want to read some fascinating documentation, I would highly recommend just browsing the opening descriptions of pg, mnesia, and erpc.