Why might we want to use a message bus? Not something like ZeroMQ, but a dedicated broker on separate hardware, with all the bells and whistles?
Here are my five reasons.
Future-proof data architecture
The first postal service was invented about 4,500 years ago. The beauty of a postal service is that it has unlimited use-cases. Initially it was used to send business orders, banknotes, and bills. Later, postage became cheap enough and people literate enough to write personal and romantic letters, or even to play chess by post. Nowadays, post is primarily used as an additional authentication factor for MFA, and to distribute plastic cards and voting ballots.
Message buses offer the same beauty with regard to how the data is processed. The publisher sends the data without knowing whether it will be processed at all (or dropped immediately after arrival), and if yes, then how.
This is a huge enabler for decoupled, independent, and therefore rapid software development. For this to work, the only requirement is that the producer must not be tailored to some specific use-case. For example, if the producer has 10 data fields, and developers know about one consumer that needs only 3 of them, a common anti-pattern would be to send only these 3 fields to the message bus.
It is an anti-pattern, because the loss of possible future data usages costs more than the gain of processing less data (and therefore using cheaper hardware).
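To make that concrete, here is a minimal sketch (all field names are invented for illustration):

    import json

    # The producer's full record: all 10 fields.
    order = {"id": 42, "customer": "acme", "amount": 99.5, "currency": "EUR",
             "created_at": "2024-05-01T12:00:00Z", "channel": "web",
             "country": "DE", "items": 3, "discount": 0.1, "vat": 0.19}

    # Anti-pattern: tailor the message to the one consumer we know about today.
    trimmed = {k: order[k] for k in ("id", "amount", "currency")}

    # Better: publish the whole record and let each consumer pick what it needs.
    payload = json.dumps(order).encode()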
Fan-out
To be able to add new consumers for the same data, we need to organize a fan-out: the possibility for different consumers to consume the same data. This can be implemented either by copying the data (e.g. RabbitMQ exchanges and queues) or by creating a persistent message log (Kafka, RabbitMQ streams).
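A minimal sketch of the copy-based variant, using a RabbitMQ fanout exchange via the pika Python client (exchange name and payload are invented):

    import pika

    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    ch = conn.channel()

    # Producer side: publish once to a fanout exchange.
    ch.exchange_declare(exchange="orders", exchange_type="fanout")
    ch.basic_publish(exchange="orders", routing_key="", body=b'{"id": 42}')

    # Consumer side: each consumer binds its own queue and gets its own copy.
    # Adding a new consumer requires no change to the producer.
    result = ch.queue_declare(queue="", exclusive=True)
    ch.queue_bind(exchange="orders", queue=result.method.queue)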
Without fan-out, the producer would need to send the data to each consumer itself, so it would need to know about the existence of all consumers, which contradicts the “future-proof” principle above. Even though the producer could then send different fields to different consumers (thereby reducing hardware requirements), any saving in hardware cost would be more than offset by the cost of waiting for the producer team to implement, test, and roll out support for every new consumer.
Real-time “push” data processing
The two previous sections have implicitly assumed that the alternative to a message bus is a direct web service call (producer -> consumer). We now consider another alternative: storing the data in a database.
The advantage of this approach is the ability to process the data in richer ways (e.g. query across different messages, aggregate, join messages with some dictionary data, or correlate messages).
The disadvantage is that many databases don’t support pushing events to clients, so messages cannot be processed in real time. Some databases have triggers, so data changes can at least be processed with SQL or whatever functions are available inside the trigger. Message buses don’t impose any limitations on the consumers, so the data can be processed in any programming language, using any framework and any hardware (we could even use LLMs to process the data, on a huge, expensive GPU cluster).
Some databases do not support in-memory-only tables, so the insert rate might be limited by the database insisting on writing all data to a persistent medium. Kafka has the same problem, but RabbitMQ and other message buses let admins choose whether messages are persisted. A much higher byte throughput can therefore be achieved if needed.
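In RabbitMQ, for example, this choice is made per queue and per message. A sketch with pika (the queue name is invented):

    import pika

    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    ch = conn.channel()

    # durable=False: the queue definition is not written to disk.
    ch.queue_declare(queue="metrics", durable=False)

    # delivery_mode=1 marks the message transient (2 would persist it).
    ch.basic_publish(exchange="", routing_key="metrics", body=b"42",
                     properties=pika.BasicProperties(delivery_mode=1))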
Fault Tolerance
Decoupling faults and crashes of consumers from the producer is the primary use-case for message buses. Also, rollouts of new versions of consumers do not influence the stability of the producer.
For this to work properly, we need to bust some myths and understand how to choose the hardware for the message bus.
Myth: performance decoupling
Sometimes a message bus is seen as a way to decouple the performance of the producer from the performance of the consumer: “We don’t want the user to wait until this long-running process P is finished, so instead of calling the web service implementing P directly, we will just publish a message, return to the user immediately, and let P be processed in the background.”
This works only if the duration of P is less than or equal to the average time span between two consecutive messages published on the bus.
For example, if we publish a message every second (1000 ms), then the consumer for P has more than enough time to process each message (200 ms). If we publish every 200 ms, the queue will still be empty most of the time, because the consumer needs exactly these 200 ms per message. Remember the school problem with a bathtub and two pipes? This is exactly it. Finally, after all these years, you can apply that school math at work.
In practice we can allow spikes, for example 100 messages published within the same millisecond. The consumer would need 20 seconds to process all of them, and if no new messages arrive in the next 20 seconds, the queue becomes empty again. This spike handling is also one of the reasons why people deploy dedicated message buses.
But in the general case, we cannot allow a situation where the consumer is consistently slower than the incoming message rate. The queue will just grow until hardware resources are depleted, and then the message bus will either crash or throttle incoming messages.
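The bathtub math is quick to check; a sketch with illustrative numbers:

    # Steady state: a message every 50 ms in, 200 ms of processing per message out.
    arrival_rate = 1 / 0.050   # 20 messages/s flowing into the tub
    service_rate = 1 / 0.200   #  5 messages/s draining out

    backlog_growth = arrival_rate - service_rate
    print(backlog_growth)      # 15 messages/s of permanent backlog: the queue explodes

    # Spike: 100 messages at once, then silence, one consumer at 200 ms each.
    drain_time = 100 * 0.200
    print(drain_time)          # 20.0 seconds until the queue is empty again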
Myth: just add consumers
So one consumer takes 200 ms for the process P, and new messages arrive every 50 ms. Surely we can just start 4 consumers on the same queue to work in parallel and solve the problem?
Yes and no.
Consumers that have only one input (the message bus) and no outputs are pointless. As a rule, a consumer has more inputs (e.g. reading some other data from a database) and at least one output (sending an e-mail, calling some web service, storing data in a database, or publishing it back to the message bus). These inputs and outputs don’t scale indefinitely, so increasing the number of consumers from 1 to 4 might saturate one of the other inputs or outputs and therefore fail to yield the desired scaling effect.
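A back-of-the-envelope sketch of this saturation effect (all numbers invented):

    # Each consumer needs 200 ms per message, i.e. 5 messages/s per consumer.
    consumers = 4
    per_consumer_rate = 1 / 0.200

    queue_side = consumers * per_consumer_rate  # 20 messages/s in theory

    # But every message ends with one write into a shared database
    # that tops out at 10 writes/s.
    db_capacity = 10.0

    effective = min(queue_side, db_capacity)
    print(effective)  # 10 messages/s: half the theoretical gain is lost downstream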
Choosing hardware
In normal operation, a dedicated message bus is a pure waste of hardware: the queues are almost always empty and the CPU is barely loaded.
We have to choose hardware for the worst-case scenario. This can be a complete outage of the consumers for at least three days (a weekend until the problem is detected, and a day for the devops to actually fix it and restart the consumers). So estimate the average amount of published data in bytes over three days, and provision that much RAM (plus overhead for the server and OS), or that much disk space if you’re using persistent logs.
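A sizing sketch under these assumptions (the publish rate and message size are placeholders; plug in your own measurements):

    # Worst case: consumers are down for three full days.
    outage_seconds = 3 * 24 * 3600

    msgs_per_second = 200     # placeholder: your measured average publish rate
    avg_msg_bytes = 2_000     # placeholder: your measured average message size

    backlog_bytes = outage_seconds * msgs_per_second * avg_msg_bytes
    print(backlog_bytes / 1e9)  # ~103.7 GB of RAM or disk, plus server/OS overhead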
Again, paying for hardware here is more than compensated by the advantages discussed above (short time-to-market, less project management and team coordination, fault tolerance, etc.).
Monitoring and UI
In real life, consumers will not take exactly the same amount of time per message (say, 200 ms). Some messages will take longer than others. Consumers might also be limited by some other resource, like a database or a GPU server running ML models: when those resources are under load, the consumers become slower too.
Therefore, one of the core features a dedicated message bus must provide for successful operations is monitoring and UI.
We want to be notified if a queue has grown beyond some threshold and is still growing. We also want to be notified if no incoming messages are being published. And we want to be able to look quickly in the UI to see what is happening (does the queue have consumers? How many? How big are the messages? Is the message bus already throttling the producers?).
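RabbitMQ, for example, exposes these numbers over its management HTTP API, so a simple periodic check is easy to script (host, credentials, and threshold below are assumptions; change them for production):

    import requests

    # Default local management endpoint and credentials.
    resp = requests.get("http://localhost:15672/api/queues",
                        auth=("guest", "guest"), timeout=10)

    for q in resp.json():
        # 'messages' is the current queue depth, 'consumers' the attached consumers.
        if q["messages"] > 10_000 or q["consumers"] == 0:
            print(f"ALERT: {q['name']}: {q['messages']} messages, "
                  f"{q['consumers']} consumers")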
Trivial Scaling
If we choose the hardware properly, operate the message bus professionally, and our consumers are quicker than the incoming message rate most of the time, scaling the message bus itself is trivial: whenever the performance of one message bus server is not enough, we just provision a second identical server, install a second message bus there, and put a TCP load balancer (nginx, Traefik, or their cloud alternatives) in front of them.
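With nginx, for instance, the stream module is all we need; a minimal sketch (the hostnames are placeholders, 5672 is RabbitMQ’s AMQP port):

    stream {
        upstream message_bus {
            server mq1.internal:5672;   # first node
            server mq2.internal:5672;   # second, identical node
        }
        server {
            listen 5672;                # clients connect here
            proxy_pass message_bus;
        }
    }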
These two nodes don’t even need to know about each other’s existence, and they don’t need to communicate with each other. Here, message buses have an advantage over databases. If we shard the same table over different ClickHouse nodes and then run a simple query like “SELECT count() FROM Table”, each node executes the query on its own, and then they need to talk to each other to combine (sum) the partial results into the final answer. This is not the case for message buses, because each consumer concentrates on one queue and one message at a time, so there are no queries spanning across messages or across queues.
Summary
Message buses should be used when rapid software development and short time-to-market are essential, for the systems that require high scalability, fault tolerance, and decoupled, independent data processing. They are particularly beneficial in environments where data needs to be consumed by multiple services or in real-time scenarios where immediate processing is critical. Despite the upfront costs of dedicated hardware and implementation, the long-term benefits of rapid software development, reduced system dependencies, and robust fault management significantly outweigh these expenses. Message buses streamline complex data flows, enhance reliability, and simplify scaling, making them a valuable investment for modern, dynamic architectures.