Using Microservices to design for scale

Posted by David Estes on Jun 09, 2015

Woah! Totally used the latest trendy buzzword in the title. Microservices are all the talk in the developer community these days. Some even say that if you aren't designing with microservices, you're doing it wrong. They have also spawned a trend of teams of developers who have never built microservices going all in on them, or using them where it makes no sense to use them. So let's talk about some practical use cases for microservices and how they can even be used alongside Integrated Systems (monoliths).

Don't Kill off your Monoliths because it's trendy

At the recent RailsConf, DHH gave a keynote about the trendiness of microservice architecture and what it means for your development stack in the context of a monolith. I happen to agree with almost every point he made, including that "monolith" is a highly negative term that doesn't correctly describe the concept. In the talk he proposed renaming them "Integrated Systems," which is truly what they are. So now that we have that out of the way, let's talk about microservices.

Only use microservices when it makes sense. Some people think they should redesign their entire architecture around microservices, and perhaps that's a valid scenario, but I'm willing to bet you're not at that level of scale and all you're doing is making your life, and the lives of your DevOps / ITOps teams, a hell of a lot worse. Let's say, for example, you have a massive web application that you want to break out into 10 or 20 microservices. Bear in mind that every microservice you add to the system has to be maintained and accounted for by your DevOps and IT teams. Not only that, if you are not very careful and account for inter-service communication redundancy and service redundancy, you are drastically reducing your mean time to failure: when every service must be up for the system to work, overall availability is the product of each service's availability, and it drops with every service you add. For every microservice you add, you add a failure point. There is no scenario in which this is not the case. You can, of course, offset the impact of these failure points by adding layers of redundancy, which makes the impact fractional, but again, that requires a lot of work.
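To put rough numbers on that, here is a back-of-the-envelope sketch in Groovy. The 99.9% per-service availability figure is a made-up assumption, not a measurement; the point is that availability compounds downward as you chain services in series, and recovers as you add redundancy.

    // Back-of-the-envelope availability math for services that all must be up at once.
    // The 99.9% per-service figure is a hypothetical assumption for illustration.
    double perService = 0.999

    [1, 5, 10, 20].each { n ->
        double series    = Math.pow(perService, n)                      // n services in series
        double redundant = Math.pow(1 - Math.pow(1 - perService, 2), n) // each service run as a redundant pair
        printf('%2d services: %.3f%% available alone, %.5f%% with redundant pairs%n',
               n, series * 100, redundant * 100)
    }

Ten services at 99.9% each already drops you to roughly 99.0% overall; the redundant pairs claw most of that back, but every pair is more hardware and more monitoring.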

One approach is to mix your "Integrated Systems" based application with microservices as needed, or to only split things into microservices when you discover that you need to scale a feature independently of the rest of the system. One example of this was a scenario in which we needed to archive all of the incoming data payloads to S3 for long-term storage. We needed to be able to scale our write throughput to Amazon's S3 service without scaling up our giant Grails application. Archiving at that rate also meant a lot of open sockets and a lot of bandwidth, and we did not want those sockets and that bandwidth to adversely affect the consumer-facing side of the application. To solve this, we developed a very small Groovy application that was simply a RabbitMQ consumer. It did nothing with any other service but consume messages and offload the message payloads to S3. By using Karman with its AWS adapter, we were able to achieve a throughput of 3900 S3 writes per second on 2 Amazon micro instances. The other great part about this architecture was that any hiccup in the Amazon S3 service did not impact the rest of the system, since the archiver wasn't critical to the rest of the services offered. Our RabbitMQ message bus acted as a buffer that could also queue up items to be written in the event our microservice was being restarted or had an outage. This, in effect, helped reduce the failure impact of the service, but it did add 2 new servers and 2 new applications that had to be monitored for failure.
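The whole archiver fits in a single small script. Here is a minimal sketch of the idea: consume payloads off a durable queue and write each one to S3, acking only after the write succeeds so the queue keeps buffering during restarts. Queue names, bucket names, and the JSON key scheme are hypothetical, and while the real service used Karman's AWS adapter, this sketch calls the AWS SDK directly to keep the example dependency-light.

    // A sketch of a RabbitMQ-to-S3 archiver (hypothetical queue and bucket names).
    @Grab('com.rabbitmq:amqp-client:5.16.0')
    @Grab('com.amazonaws:aws-java-sdk-s3:1.12.500')
    import com.amazonaws.services.s3.AmazonS3ClientBuilder
    import com.amazonaws.services.s3.model.ObjectMetadata
    import com.rabbitmq.client.*

    def s3     = AmazonS3ClientBuilder.defaultClient()   // credentials picked up from the environment
    def bucket = 'payload-archive'                        // hypothetical bucket name

    def factory = new ConnectionFactory(host: System.getenv('RABBIT_HOST') ?: 'localhost')
    def channel = factory.newConnection().createChannel()
    channel.queueDeclare('payload.archive', true, false, false, null)   // durable queue acts as the buffer

    channel.basicConsume('payload.archive', false, new DefaultConsumer(channel) {
        @Override
        void handleDelivery(String tag, Envelope env, AMQP.BasicProperties props, byte[] body) {
            def meta = new ObjectMetadata(contentLength: body.length)
            def key  = "payloads/${System.currentTimeMillis()}-${env.deliveryTag}.json"
            s3.putObject(bucket, key, new ByteArrayInputStream(body), meta)
            channel.basicAck(env.deliveryTag, false)      // ack only after the S3 write succeeds
        }
    })

Because the consumer only acks after the upload, an S3 hiccup or a restart of the archiver simply leaves messages sitting in the queue until it catches back up.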

When building microservices it is imperative that you keep their datasets as isolated as possible. Don't try to connect your microservice to a shared database unless it's absolutely necessary. If you do, you have reduced your ability to scale the microservice independently. How so? By sharing a database you have introduced a scaling dependency: the database itself. Scaling up a shared database is much more involved, even when your microservice may only need a small subset of the data within it. You are constrained not only by query throughput but also by the number of open connections your database allows.
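In Grails terms, isolation can be as simple as pointing the service at its own small datasource instead of the main application's schema. The snippet below is a hypothetical DataSource.groovy for a small Grails-based microservice; the host, schema, and pool sizes are illustrative only.

    // grails-app/conf/DataSource.groovy for a hypothetical microservice.
    // The service owns its own tiny database rather than sharing the main app's,
    // so it can be scaled or have its data store swapped without touching the integrated system.
    dataSource {
        pooled = true
        driverClassName = 'org.postgresql.Driver'
        url      = 'jdbc:postgresql://checks-db.internal:5432/checks'   // dedicated instance
        username = 'checks'
        password = System.getenv('CHECKS_DB_PASSWORD')
        properties {
            maxActive = 20   // small pool; the shared app database never sees these connections
        }
    }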

Happy Apps is a perfect example of a hybrid system. In all, there are 4 separate repositories serving 4 different types of users. The first is a CMS engine for running the marketing site. It is highly optimized for caching marketing pages and can be scaled independently because, as most of you know, your marketing site is the start of the funnel: it gets the most traffic and encounters surges in traffic. You really don't want to have to scale your web application along with the marketing site.

The second system is the public API docs. This is also a static set of pages that can be scaled independently; it is mainly kept separate so that it can be updated by the public and independently versioned without giving away proprietary code. In many cases, you could also keep this tied in with your marketing site.

The third system is the main web application itself. This is a Grails-based integrated system that is highly optimized to maintain check state and app state while providing a great user interface for users to interact with their monitoring information. This application handles calculating the availability of your checks, sending out alerts for outages, providing API endpoints for developers, and interpreting results from the fourth service.

In the case of Happy Apps, we did not think it made any sense to couple the running of all the various checks with the main application. That would have cost us severely, forcing us to build big, high-memory Amazon servers with high network capacity, because we were always going to have more checks in the system that had to run at a periodic interval than we were going to have active users on the site. So what did we do? We made a microservice, because in this context it made complete sense. We designed and built a tiny Groovy-based application that acts as a RabbitMQ consumer (sound familiar?). Everything needed to execute the check is contained in the message payload. Once a check has completed, or has failed more than once, the results are placed back onto the RabbitMQ message bus in a "Results" queue. These results are then consumed by the "Integrated System", the Happy Apps web server. Each result is persisted to the database, written off to Elasticsearch for historical archival, and, in the event of a failure, run through a slew of algorithms to determine how the incident should be grouped together to reduce noise and who should be alerted of the issue.
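A stripped-down sketch of that check runner looks something like the following. The queue names and the payload shape (id, url, timeoutMs) are hypothetical, not Happy Apps' actual schema, and the retry-before-reporting logic is omitted; the point is that the consumer needs nothing but the message to do its job, and hands its result straight back onto the bus.

    // A sketch of a check-runner consumer (hypothetical queue names and payload fields).
    @Grab('com.rabbitmq:amqp-client:5.16.0')
    import com.rabbitmq.client.*
    import groovy.json.*

    def factory = new ConnectionFactory(host: System.getenv('RABBIT_HOST') ?: 'localhost')
    def channel = factory.newConnection().createChannel()
    channel.queueDeclare('checks', true, false, false, null)
    channel.queueDeclare('results', true, false, false, null)

    channel.basicConsume('checks', false, new DefaultConsumer(channel) {
        @Override
        void handleDelivery(String tag, Envelope env, AMQP.BasicProperties props, byte[] body) {
            def check   = new JsonSlurper().parseText(new String(body, 'UTF-8')) // everything needed is in the payload
            def started = System.currentTimeMillis()
            def status
            try {
                def conn = new URL(check.url).openConnection()
                conn.connectTimeout = check.timeoutMs ?: 5000
                conn.readTimeout    = check.timeoutMs ?: 5000
                conn.inputStream.withStream { it.bytes }   // read the body; throws on connection or HTTP errors
                status = 'up'
            } catch (Exception ignored) {
                status = 'down'
            }
            def result = [checkId: check.id, status: status, responseTimeMs: System.currentTimeMillis() - started]
            channel.basicPublish('', 'results', null, JsonOutput.toJson(result).bytes) // back to the integrated system
            channel.basicAck(env.deliveryTag, false)
        }
    })

Because the runner never touches the main application's database, you can stamp out as many copies of it as the check volume demands.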

Now we can scale our check servers independently and reduce the effect of open connections on our main web servers. Even better, these services are small enough to run on micro instances. We run many check servers on small boxes rather than a few check servers on big instances, to ensure that a parallel check for one account will not adversely impact the response time statistics of another person's check. And if we run into a scenario where we find we need to scale something else in our application stack independently, it's really not hard to split it off into another microservice to get the job done. That is part of the beauty and elegance of decoupled code design.