Intro
I’m deploying an Elixir cluster of 5 to 7 nodes on Kubernetes (K8s). Our system uses :erpc for calls between nodes inside the cluster, which saves development time. That works well for a static, fixed cluster, but it gets hard when the Elixir cluster runs on K8s.
Why is it so hard?
K8s is designed for dynamic workloads such as web services, which are usually stateless and don’t join a cluster the way Elixir nodes do. A pod’s IP and hostname are dynamic: they change every time the pod restarts, and the Elixir node name changes with them. That doesn’t match our system design. Of course, we can work around it by using a headless service, or by using the Gossip strategy of :libcluster with an app-name prefix and checking that prefix in Elixir code, but that complicates scaling and doesn’t feel right for something as well suited to distributed systems as Elixir.
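For reference, the headless-service workaround mentioned above could look roughly like the config fragment below. It is only a sketch: the topology name `k8s_example`, the service name "myapp-headless" and the application name "myapp" are placeholders I made up, not values from our system.

```elixir
# config/runtime.exs (fragment, requires the :libcluster dependency)
import Config

config :libcluster,
  topologies: [
    # Hypothetical topology name
    k8s_example: [
      # Resolves the headless service and connects to nodes named <application_name>@<pod_ip>
      strategy: Cluster.Strategy.Kubernetes.DNS,
      config: [
        service: "myapp-headless",  # placeholder headless service name
        application_name: "myapp",  # placeholder node name prefix
        polling_interval: 10_000    # milliseconds between DNS lookups
      ]
    ]
  ]
```

This gets the nodes connected, but the node names still depend on pod IPs, so the rest of the code still cannot rely on a fixed node name.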
We could also use something else, such as a message broker or message bus, or wrap the calls in gRPC or a REST API, but that makes development more complex. We want to use Elixir’s RPC to develop quickly (a benefit of a dynamically typed language).
Our way
We made a library named ClusterHelper that maps a role (or an ID, if needed) to Elixir node names at runtime. When a node joins the cluster, its roles are automatically propagated to the other nodes in the cluster. The library runs as an application on every Elixir node in our cluster, so RPC or any other code can look up Elixir node names when needed.
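To make the idea of a role-to-node map concrete, here is a minimal sketch of the data structure only. It is not the ClusterHelper API: the module name `RoleMapSketch` and its functions are hypothetical, and the real library also has to sync this map between nodes, which the sketch leaves out.

```elixir
defmodule RoleMapSketch do
  @moduledoc "Hypothetical illustration of a role => nodes map, not ClusterHelper itself."

  # Record that `node` holds `role`. The map is role => MapSet of node names.
  def add_role(map, node, role) do
    Map.update(map, role, MapSet.new([node]), &MapSet.put(&1, node))
  end

  # Drop a node from every role, e.g. after its pod restarted under a new name.
  def remove_node(map, node) do
    map
    |> Enum.map(fn {role, nodes} -> {role, MapSet.delete(nodes, node)} end)
    |> Enum.reject(fn {_role, nodes} -> MapSet.size(nodes) == 0 end)
    |> Map.new()
  end

  # Nodes currently holding `role`, sorted so the result is stable.
  def nodes_for(map, role) do
    map |> Map.get(role, MapSet.new()) |> Enum.sort()
  end
end
```

Callers only ever ask for a role such as `:api`, so it doesn’t matter that a restarted pod comes back under a different node name.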
Working with roles brings us some benefits: we can easily scale the number of Elixir nodes (pods on K8s) without caring about Elixir node names or how to map them on K8s. Scaling a service is just a matter of adding more nodes for that service.
We have another library, EasyRpc, that helps us work smoothly with multiple nodes that share the same role.
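As a rough illustration of calling one of several nodes that share a role, the sketch below picks a target from a list of nodes and calls it with :erpc. It is not the EasyRpc API: the module `RoleRpcSketch`, the `:random` and `{:hash, key}` strategies and the return shapes are my assumptions for the example. The node list is assumed to come from a role lookup.

```elixir
defmodule RoleRpcSketch do
  @moduledoc "Hypothetical helper: pick one node for a role, then call it via :erpc."

  # Pick one node from a non-empty list.
  # {:hash, key} keeps the same key on the same node while the list is unchanged.
  def pick([_ | _] = nodes, :random), do: Enum.random(nodes)

  def pick([_ | _] = nodes, {:hash, key}) do
    Enum.at(nodes, :erlang.phash2(key, length(nodes)))
  end

  # Call mod.fun(args) on one of `nodes`; returns {:ok, result} or {:error, reason}.
  def call(nodes, strategy, {mod, fun, args}, timeout \\ 5_000) do
    case nodes do
      [] ->
        {:error, :no_node_for_role}

      _ ->
        target = pick(nodes, strategy)

        try do
          {:ok, :erpc.call(target, mod, fun, args, timeout)}
        catch
          # :erpc reports its own failures (timeout, no connection) as {:erpc, reason}
          :error, {:erpc, reason} -> {:error, {:erpc, reason}}
        end
    end
  end
end
```

For example, `RoleRpcSketch.call(nodes, {:hash, user_id}, {MyApp.Users, :get, [user_id]})` would route the same user to the same node as long as the node list doesn’t change (`MyApp.Users` is a made-up module).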