Peter Marton, Co-Founder and CTO of RisingStack, discusses how his team transformed a monolithic application into a microservices architecture.
Most software projects start with solving one problem, then comes another one, and the project continues growing. This is how monoliths are built. Every new feature gets added to the existing application, making it more and more complex. Beyond a certain level of complexity, the engineering team can no longer cope with it, and it can’t deliver new features with the usual quality and speed anymore.
This is a common experience within the software industry. I know this because we dealt with the same issues while building Trace, our Node.js debugging and performance monitoring tool, over the previous 10 months.
In this article, I’ll briefly summarise the story of transforming our monolithic application and share our best practices and recommendations for doing so.
Why did we start to move towards microservices?
We encountered two major technical and productivity issues with our monolithic application:
- Our engineering team was growing, and we found that larger groups became less focused. We needed smaller teams, each owning separate features.
- We needed high fault-tolerance because our users expected a monitoring solution that is functioning all the time.
How did we solve these problems?
Creating focused and independent teams
We decided to split our teams based on the features of our application. Each team consists of 2 or 3 developers with different areas of expertise, which lets them work together seamlessly and independently of other teams. One team is responsible for only a few services, usually 3-4. This means that they can now try out new things more quickly, and they can design, deploy, and monitor their services separately.
To address the second problem, fault tolerance, we had to improve our code so that the system has really good uptime: each of our services must be able to fail separately, without affecting the others.
For this, we started caching the critical resources because our services always depend on each other. But what can you do about temporary failures when caching cannot help?
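As an illustration of the caching idea, here is a minimal sketch of a cache that keeps serving the last known value when the service behind it is temporarily unavailable. It is not our production code: the class name `StaleOnErrorCache`, the `fetch` callback, and the 30-second TTL are all assumptions made for the example.

```python
import time
from typing import Any, Callable, Dict, Tuple


class StaleOnErrorCache:
    """Hypothetical cache that falls back to the last good value on failure."""

    def __init__(
        self,
        fetch: Callable[[str], Any],
        ttl_seconds: float = 30.0,  # assumed TTL, tune per resource
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        # key -> (time the value was stored, value)
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        entry = self._entries.get(key)
        now = self._clock()
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh, no call to the other service
        try:
            value = self._fetch(key)
        except Exception:
            if entry is not None:
                return entry[1]  # dependency is down: serve the stale value
            raise  # nothing cached yet, so the failure has to surface
        self._entries[key] = (now, value)
        return value
```

The last branch is exactly the case where caching cannot help: if the dependency fails before anything has been cached, the caller still sees the error.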
To solve this, we are using a technique called CQRS (Command Query Responsibility Segregation). What does it mean for our system? For example, when we collect data from our customer’s application, we do it with a lightweight collector process which doesn’t have any external dependencies: it doesn’t reach any databases, and it doesn’t call any other services.
In this service, we only validate the incoming data and put it into RabbitMQ. On the other side of RabbitMQ, a worker consumes the messages, calls all the dependencies and other services, and saves the data into a database. When any of these dependencies break, we simply put the messages back into the queue for later processing. The queue will start to grow, but once the consumers come back, the messages can be processed, and you won’t lose any data.
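The collector and worker split can be sketched as follows. To keep the example self-contained, an in-memory `deque` stands in for RabbitMQ, and the function names `collect` and `drain`, the message fields, and the `save` callback are hypothetical. With a real broker, putting a message back would typically be done with a negative acknowledgement that requeues it.

```python
from collections import deque
from typing import Callable, Deque, Dict


def collect(queue: Deque[Dict], payload: Dict) -> bool:
    """Collector: validate and enqueue only. No database or service calls."""
    if not isinstance(payload.get("service"), str) or "value" not in payload:
        return False  # reject invalid data at the edge
    queue.append(payload)
    return True


def drain(queue: Deque[Dict], save: Callable[[Dict], None]) -> int:
    """Worker: process queued messages, requeueing when a dependency fails."""
    processed = 0
    while queue:
        message = queue.popleft()
        try:
            save(message)  # calls the database or other services
        except ConnectionError:
            # Dependency is down: put the message back and try again later.
            queue.appendleft(message)
            break
        processed += 1
    return processed
```

While the dependency is down, `collect` keeps accepting data and the queue simply grows; a later call to `drain` processes the backlog in order.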
What is the unwanted side-effect of the microservices transition?
Increased architectural complexity comes by definition with microservices. You end up with many more services that communicate with each other over a network, and this makes debugging harder.
In a monolithic application, pieces of code communicate within the application’s memory. In a microservices application, the services will usually communicate over a network, using HTTP or a message broker such as RabbitMQ or Kafka. The network adds delay to the internal communication of your application. If you put services into a call chain, the response times will be higher. It’s also harder to test, because you will need integration tests when your services depend on each other.
Network delay is the main evil of microservices, and the easiest way to keep it minimal is to put your services close to each other. Try to keep them in the same data centre as your databases.
Avoid using PaaS providers who use external routing. At the beginning of our transition, we had to deal with high response times until we figured out that every request between our services went out to the internet and back, passing through more than 30 network hops.
When a team builds a feature on top of another team’s service, and that service has a bad response time, they have to deal with it too. To help teams work together, we introduced a set of service principles.
For example, we allow ourselves call chains that are at most 3 services deep, which is good not just for the complexity but for the response times in the system as well.
We also require ourselves to create backward compatible endpoints. We do this because we don’t want to ask each other all the time about things like: What’s changed in your endpoints? Which version is deployed? We highly recommend keeping every existing endpoint backward compatible.
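One possible way to enforce such a limit, shown here only as a sketch, is to pass a depth counter along with every internal request and refuse to call further once the limit is reached. The header name `x-call-depth` and the helper names are hypothetical and not part of any standard or of our actual tooling.

```python
from typing import Dict

MAX_CALL_DEPTH = 3  # the limit described in the text
DEPTH_HEADER = "x-call-depth"  # hypothetical header name


class CallChainTooDeep(Exception):
    """Raised when a downstream call would exceed the allowed depth."""


def incoming_depth(headers: Dict[str, str]) -> int:
    """Depth of the current service; a request from outside counts as 1."""
    return int(headers.get(DEPTH_HEADER, "1"))


def outgoing_headers(headers: Dict[str, str]) -> Dict[str, str]:
    """Headers for a downstream call, or an error if the chain is too deep."""
    next_depth = incoming_depth(headers) + 1
    if next_depth > MAX_CALL_DEPTH:
        raise CallChainTooDeep(
            f"call chain would reach depth {next_depth}, limit is {MAX_CALL_DEPTH}"
        )
    return {**headers, DEPTH_HEADER: str(next_depth)}
```

A service would build the headers for each downstream request with `outgoing_headers`, so the third service in a chain fails fast instead of adding a fourth hop.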
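A simple way to think about backward compatibility is that a new version of an endpoint may add fields, but must not remove or retype the ones clients already rely on. The check below is a sketch of that rule under the assumption that a response can be described as a flat mapping of field names to types; the function name and the example fields are made up for illustration.

```python
from typing import Dict, List

Schema = Dict[str, type]  # field name -> expected Python type (assumed shape)


def breaking_changes(old: Schema, new: Schema) -> List[str]:
    """List the ways `new` would break clients written against `old`."""
    problems = []
    for field, field_type in old.items():
        if field not in new:
            problems.append(f"removed field: {field}")
        elif new[field] is not field_type:
            problems.append(f"changed type: {field}")
    # Fields that exist only in `new` are additive and therefore allowed.
    return problems
```

For example, comparing `{"id": str, "duration_ms": int}` with a version that also contains `"region": str` reports no problems, while renaming `duration_ms` is reported as a removed field.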
Take note that the transition to microservices is not recommended in the MVP phase, but it’s highly beneficial after you have a product that needs scaling and a big engineering team.
Solving the problems caused by the complexity of a microservices architecture is not easy, but it is very rewarding. I hope the advice I gave proves valuable to you.
Edited for web by Cecilia Rehn.