Introduction
As with all trending architectures, companies look to leverage specific benefits before fully understanding the advantages and disadvantages of the solution. Microservices Architectures are no exception.
According to the InfoQ Trends Report, microservices – as a topic – have moved steadily across the chasm and are currently in the late majority phase. However, many articles describe unsuccessful moves leading to push-back and eventually rollbacks. Moving to a Microservices Architecture is not a silver bullet.
Microservices Architectures promise:
- Faster time to market;
- Increased flexibility;
- Improved scalability and resilience.
However, these widely advertised promises focus solely on the microservice itself. A microservice becomes simpler, developers get more productive and systems can be scaled quickly and precisely, rather than as large monoliths. But the promises skip the implications and caveats of being part of a larger (composite) system and landscape. To move to a Microservices Architecture, not only the characteristics of the microservice itself, but also the system-wide characteristics and surrounding interactions need to be addressed.
Archers Microservices Reference Framework
The diagram above captures the difference between the Inner Architecture, i.e. the application architecture of one microservice, and the Outer Architecture, i.e. all capabilities needed to manage your Microservices Architecture:
- Inner Architecture (IA): all architectural decisions concerning the service itself (technology, implementation, business logic) belong to the Inner Architecture. The service is exposed only as an endpoint and abstracts all details of its implementation. This enables organizations to build and run services in a constantly changing environment. The main focus of these principles is to enable speed and scalability, and to reduce costs.
- Outer Architecture (OA): characteristics that apply to the solution within the ecosystem of your enterprise landscape are described in the Outer Architecture. Microservices introduce new challenges and amplify old ones. Removing some of the complexity from your Inner Architecture does not make these challenges disappear; it makes them latent, and they become visible when going to production (interactions between services, fault handling, …). The supporting platform capabilities become much more important: everything needs to work together to make good on the promises of ‘flexible’ and ‘scalable’ development and deployment. To address these latent complexities, they also need to be considered during the design and implementation of the microservices. The Outer Architecture can provide useful insights into how (micro)services work together.
- External Architecture (EA): where the Inner and Outer Architecture focus on the setup of, and interactions between, dependent microservices in your own landscape, external systems also need to be addressed, and they require other capabilities. These systems include both other enterprise applications within your own landscape and partner applications that typically do not fall within your own responsibility.
Working with Microservices Architectures has taught us that some (hidden) challenges and obstacles need to be addressed in order to set up the correct baseline for your Microservices Architecture. At Archers, our Microservices Architects researched these complexities and defined the capabilities to reduce or mitigate these challenges. The result of this research is the Archers Microservices Reference Framework.
This blog focuses on some of the biggest challenges when moving to production, most of them originating from the distributed nature of the landscape. This should give you a head start and some insights towards a successful move to a Microservices Architecture.
Troubleshooting/Root Cause Analysis
Systems should be designed to handle failures, preferably in an automated way, but this does not mean all failures can or will be recovered from correctly: services may still misbehave or, worst case, be unavailable for an undefined period. When working within a distributed landscape, additional points of failure are introduced. It is good practice to design your Inner Architecture for failure to increase the resilience of each service. But what about the Outer and/or External Architecture? How can you detect problems when collaborating with other services? What if services start failing? Does one problem trigger a cascading effect, making your entire landscape unavailable?
Detecting failures is critical and can only be achieved by setting up monitoring and logging. Our Archers Microservices Reference Framework includes the ‘Observability and Analysis’ capability. This capability is positioned as the most fundamental part of working with microservices, but also the most challenging one.
The following topics can be addressed with this capability:
- Logging: keep track of all program- and error-related data in a centralized way;
- Monitoring: collect, aggregate and analyze metrics to see how a system behaves;
- Tracing: follow the flow and data progression throughout your distributed landscape.
These topics apply at several levels (business, application, service, infrastructure, …) and should be taken into account when designing solutions. They all serve different needs, but are complementary features to have.
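As an illustration of how logging and tracing come together in practice, here is a minimal sketch of structured JSON logging with a propagated correlation ID in Python. The header name, service name and field names are assumptions for illustration, not part of the framework.

```python
import json
import logging
import time
import uuid

# Hypothetical header used to propagate a correlation ID between services.
TRACE_HEADER = "X-Correlation-Id"

class JsonFormatter(logging.Formatter):
    """Emit log records as JSON lines so a central platform (e.g. Elastic) can index them."""
    def format(self, record):
        return json.dumps({
            "timestamp": time.time(),
            "level": record.levelname,
            "service": "order-service",                    # assumed service name
            "trace_id": getattr(record, "trace_id", None),
            "message": record.getMessage(),
        })

logger = logging.getLogger("order-service")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

def handle_request(headers: dict) -> str:
    # Reuse the caller's correlation ID, or start a new trace at the edge of the landscape.
    trace_id = headers.get(TRACE_HEADER, str(uuid.uuid4()))
    logger.info("order received", extra={"trace_id": trace_id})
    # ... call downstream services, forwarding the same correlation header ...
    return trace_id

handle_request({TRACE_HEADER: "demo-trace-id"})  # logs one JSON line carrying the trace id
```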
However, there are other benefits to this capability: once you have observability set up correctly, it can be integrated into an enterprise Incident Management tool. Based on these metrics, two other equally important practices arise:
- Alerting: if metrics indicate failures or thresholds are being exceeded, operations need to be informed of these events. This can be an automated process (e.g. scaling due to a lack of resources), but it can also be a service engineer being alerted to check the system status.
- Root cause analysis: when a non-recoverable failure has occurred, the most important step is to recover as soon as possible. But it does not end once you have recovered: a root cause analysis should be performed. All logs and metrics leading up to the time of failure can be consolidated. This can lead to a fix and/or extra metrics being set up to mitigate this kind of failure in the future, or to recover from it automatically.
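To make the alerting idea concrete, below is a minimal, hypothetical sketch of a threshold check on an error-rate metric; the threshold value and the notify callback are placeholders.

```python
from typing import Callable

ERROR_RATE_THRESHOLD = 0.05  # assumed threshold: alert above 5% failed calls

def check_error_rate(errors: int, total: int, notify: Callable[[str], None]) -> None:
    """Compare the measured error rate against the threshold and alert operations."""
    if total == 0:
        return
    rate = errors / total
    if rate > ERROR_RATE_THRESHOLD:
        notify(f"error rate {rate:.1%} exceeds threshold of {ERROR_RATE_THRESHOLD:.1%}")

# In practice `notify` would page an engineer or trigger automated scaling.
check_error_rate(errors=12, total=150, notify=print)
```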
A technical deep dive into this capability using Elastic was described in our blog about Observability in our microservices architecture with Elastic Stack.
Data consistency
One of the fundamental principles of microservice design is autonomy. If you have read Pat Helland’s 2005 paper about data on the inside vs. data on the outside, this concept should sound familiar. It implies that each microservice needs to have its own data store and full control over its own data. This data is called Inside data. The counterpart of Inside data is Outside data, which represents data flowing between independent services.
When operating within a monolith using a single data store, you can benefit from ACID (atomic, consistent, isolated and durable) transactions. These transactions allow data to be read and modified in a consistent manner by locking and unlocking the data at the beginning and end of a transaction. However, this can only be applied to Inside data. Once the data flows across the service boundary, it leaves the transactional context. This means the data you receive from another service is a snapshot from the past: it may have been consistent at that moment, but it may have changed in the meantime.
So how do we handle data consistently? How do we keep Outside data consistent? Do we just query other services (over HTTP), and thereby create a distributed monolith? Or do we duplicate relevant data into our own database, and if so, how do we keep that copy consistent?
To address the first problem of Outside data, you must be able to determine which ‘version’ of the data is being passed around. This can be done with a ‘historical’ identifier, implemented either as a date (e.g. a modification date) or as a unique identifier (hash, version number, …).
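A minimal sketch of this idea, assuming a simple incrementing version number on the payload (all names are illustrative): the consumer only applies an incoming snapshot if it is newer than the copy it already holds.

```python
from dataclasses import dataclass

@dataclass
class CustomerSnapshot:
    """Outside data received from another service, tagged with a version."""
    customer_id: str
    name: str
    version: int  # could equally be a modification timestamp or a hash

local_store: dict[str, CustomerSnapshot] = {}

def apply_update(incoming: CustomerSnapshot) -> bool:
    """Accept the snapshot only if it is newer than the copy we already hold."""
    current = local_store.get(incoming.customer_id)
    if current is not None and incoming.version <= current.version:
        return False  # stale snapshot from the past: ignore it
    local_store[incoming.customer_id] = incoming
    return True

apply_update(CustomerSnapshot("c-1", "Alice", version=2))
apply_update(CustomerSnapshot("c-1", "Alice (outdated)", version=1))  # rejected as stale
```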
The answer to the second question, how to keep Outside data consistent, is to make sure the data is updated atomically. This can be done with distributed transactions via a two-phase commit, but that is not scalable, as it is a synchronous, blocking operation. In a distributed landscape, where we were promised scalable solutions, this requires a different approach:
- One way to address this problem is to implement the SAGA pattern. You can find a deep dive into this topic in one of our previous blogs, Choreography vs. Orchestration, which explains two ways to implement the SAGA pattern. That blog briefly explains compensating transactions and the cost of change, and can act as a decision matrix to guide you towards the right implementation. Which implementation you choose will depend on your requirements, i.e. observability versus independence (a minimal sketch of the orchestrated variant follows below);
- Another way to address this problem was discussed in our previous blog on Event Driven Architectures (EDA), which briefly explains Event Sourcing and CQRS, and also gives recommendations on when this way of working should be used. Advantages of communicating via events are reduced coupling and resilience by design. That blog also provides insights into the drawbacks of an Event Driven Architecture.
The answers to these questions are supported by several capabilities, i.e. ‘Coordination and Service Discovery’ and ‘Mediation’, but depend on how you implement the solution to the second question. Keep in mind that this is not a ‘one size fits all’ solution.
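To make the compensating transactions behind the orchestrated SAGA variant more concrete, here is a minimal, hypothetical sketch; the order steps and their compensations are illustrative and not taken from the blogs referenced above.

```python
from typing import Callable, List, Tuple

# Each saga step pairs a local transaction with its compensating action.
Step = Tuple[Callable[[], None], Callable[[], None]]

def run_saga(steps: List[Step]) -> bool:
    """Execute steps in order; on failure, undo the completed steps in reverse order."""
    completed: List[Callable[[], None]] = []
    for action, compensate in steps:
        try:
            action()
            completed.append(compensate)
        except Exception:
            for undo in reversed(completed):
                undo()  # best effort: compensations should be idempotent
            return False
    return True

# Illustrative steps for an order saga.
def reserve_stock():  print("stock reserved")
def release_stock():  print("stock released")
def charge_payment(): raise RuntimeError("payment service unavailable")
def refund_payment(): print("payment refunded")

ok = run_saga([(reserve_stock, release_stock), (charge_payment, refund_payment)])
print("saga committed" if ok else "saga rolled back")
```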
Communication
Microservices communicate with each other – synchronously or asynchronously – preferably via lightweight, language-agnostic protocols such as HTTP REST (Fielding), WebSockets or AMQP. As stated in the previous section on troubleshooting, microservices must be designed to deal with (communication) failures. It is not a matter of whether failures occur, but when they will occur. But how do we cope with these failures? Do we just retry?
To get a glimpse of what can go wrong with communication in distributed environments, we refer to The Fallacies of Distributed Computing (L. Peter Deutsch), which lists eight false assumptions made when developing distributed applications.
The fallacies are:
- The network is reliable;
- Latency is zero;
- Bandwidth is infinite;
- The network is secure;
- Topology doesn’t change;
- There is one administrator;
- Transport cost is zero;
- The network is homogeneous.
In a microservices environment, the distributed nature increases the dependency on network and infrastructure services, and thus the probability of failure. A poor configuration may also lead to increased latency and slower calls between services, which results in slow response times.
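As a hedged answer to ‘do we just retry?’: blind retries amplify latency and load problems, so a bounded retry with exponential backoff, jitter and a timeout is the more common approach. A minimal sketch, with a made-up endpoint:

```python
import random
import time
import urllib.error
import urllib.request

def call_with_retry(url: str, attempts: int = 3, base_delay: float = 0.2) -> bytes:
    """Call a downstream service with bounded retries, exponential backoff and jitter."""
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=2) as response:  # never wait forever
                return response.read()
        except (urllib.error.URLError, TimeoutError):
            if attempt == attempts - 1:
                raise  # give up and let the caller (or a circuit breaker) decide
            # Exponential backoff with jitter to avoid synchronized retry storms.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))

# call_with_retry("http://inventory-service/api/stock")  # hypothetical endpoint
```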
Several supporting capabilities in our Archers Microservices Reference Framework, such as Mediation and Connectivity, try to mitigate these risks.
These capabilities work together to allow you to call a service dynamically. The whereabouts of the called service should not be your main concern; they should be handled by supporting components, e.g. a service mesh, which provide this service discovery and coordination. Which specific components you should use will depend on the problems you are trying to solve.
On the other hand, several instances of the same service can exist at any time. Which service instance will be called is handled by your supporting platform components, which also provide load balancing and dynamic routing to a running service instance.
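Purely to illustrate what these platform components do for you, here is a tiny client-side round-robin sketch over discovered instances; in practice a service mesh or platform handles this transparently, and the instance addresses below are made up.

```python
import itertools

# Instance addresses a service registry might return for "inventory-service" (made up).
discovered_instances = [
    "http://10.0.0.11:8080",
    "http://10.0.0.12:8080",
    "http://10.0.0.13:8080",
]

# Simple round-robin load balancing over the known instances.
_rotation = itertools.cycle(discovered_instances)

def next_instance() -> str:
    """Pick the next instance; health checking is omitted for brevity."""
    return next(_rotation)

for _ in range(4):
    print("routing call to", next_instance())
```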
Complexity
When working with microservices, the overall complexity increases. This can relate to the infrastructure, but also to integration challenges, dependency management, automation processes, and more. The Process & Governance capabilities provide you with guidance and tools to reduce complex problems (i.e. unpredictable ones) and to solve complicated problems (i.e. manageable and predictable ones) (cf. It’s Not Complicated: The Art and Science of Complexity in Business).
One of the topics within Complexity addresses infrastructure problems. When building applications, the Inner Architecture gives you guidelines on how the application should be built, not where the service should be deployed. But how do you build and run a service without knowing where it will run? What about the portability of your application? Can it run in different environments? How do you keep track of changes to the infrastructure?
The key when building applications is to abstract the infrastructure and implement best practices. The most common practice is containerization, which allows you to bundle all dependencies and tools into one independently deployable unit, called a container. The container abstracts the runtime, but also enables portability to other infrastructure, whether on-premise or in the cloud.
When deploying to a container runtime, you are certain all specifications are aligned with your requirements. But what if these requirements change? How do you keep track of these changes?
Not only should your code flow from development to production, your other environments should also resemble production as much as possible. An answer to this is Infrastructure-as-Code, where you use a declarative language to describe what you need and put it in a configuration file (a small sketch follows the list below). This has three major advantages:
- Version control can be applied to infrastructure code files;
- The declarative language enables other providers to implement the resources;
- The configuration can be used to promote from development to production, but also to build an environment from scratch (disaster recovery, …).
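As one hedged example, assuming Pulumi with its Python SDK and the AWS provider (any declarative IaC tool, such as Terraform, works equally well), infrastructure becomes code you can version and promote:

```python
import pulumi
import pulumi_aws as aws

# Declarative description of a storage bucket; the name and tags are illustrative.
artifact_bucket = aws.s3.Bucket(
    "build-artifacts",
    tags={"environment": "dev", "managed-by": "pulumi"},
)

# Exported outputs can be consumed by other stacks or used when promoting environments.
pulumi.export("artifact_bucket_name", artifact_bucket.id)
```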
The same way of working also applies to other types of configuration (not an exhaustive list; an application-configuration sketch follows below):
- security configuration;
- application configuration;
- network configuration.
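For application configuration specifically, a common pattern is to keep the values environment-specific while the code and artifact stay identical across environments; a minimal sketch, with illustrative variable names:

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class AppConfig:
    """Application configuration resolved from the environment, not hard-coded."""
    database_url: str
    http_timeout_seconds: float

def load_config() -> AppConfig:
    # The same artifact runs in every environment; only these values differ per environment.
    return AppConfig(
        database_url=os.environ.get("DATABASE_URL", "postgresql://localhost/dev"),
        http_timeout_seconds=float(os.environ.get("HTTP_TIMEOUT_SECONDS", "2.0")),
    )

config = load_config()
print(config)
```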
Conclusion
At first glance, the promises of a Microservices Architecture are a no-brainer when it comes to solving the issues of working with a monolith, or with distributed monoliths. But pay attention to the caveats: the complexity is not removed from your enterprise landscape, it is merely moved to the outer architecture of your microservices.