Those of you who’ve read the previous blog post about our microservices ambitions know that we already decided on Cassandra as the primary persistence layer of our microservices architecture. “I do not regret that decision, but I have to admit that I made it without fully understanding the consequences”, Quinten explains.
“Since then I have dived head-first into the rabbit hole that is NoSQL databases. I guess I was just trying to retroactively make sure that the decision was in fact the right one. There is a lot of information out there, but most of it focuses on just one type of NoSQL database. In addition, most of these products are open source, yet a lot of the information originates from companies that in one way or another make a profit from them. It is possible, even likely, that such information omits many of the pitfalls of the database technology it describes.”
In this, Quinten saw an opportunity to bundle and share what he had learnt.
Choosing the right database
Why should you, a microservices developer or architect, care about databases? That is the first question Quinten would really like to answer.
As you know, the microservices way of working is all about the independence of services. It seems quite logical that each one of them should be responsible for its own persistence layer. This gives you the absolute freedom to choose something that fits the needs of your little service perfectly. That’s awesome! Right? Well yes, however… Choosing a database is not easy. There are a lot of possibilities out there and making a choice has some very real consequences. Luckily, a microservices approach is much more forgiving of mistakes, but the wrong choice can still come back to haunt you in production.
“As a small disclaimer, I should mention that I am by no means a database expert in even one technology, let alone in all of them. However, by having a good high-level understanding of what’s happening under the hood, I could at least eliminate some of the risks that come with choosing a database. I hope to do the same for you.”
Before you start, there are some questions that need to be answered:
- Do you need your database to be distributed?
- Is your data relational?
- Should you consider NoSQL?
- What are ACID and the CAP theorem, and where do the requirements of your data fit in all that?
- What skills do your team have and what kind of investments are you prepared to make?
- Should you host it yourself or should you consider a database-as-a-service completely managed by a cloud provider?
The impact of the nature of your data
The most important question should always be whether your data is relational or not. Before you even start to look at the plethora of NoSQL database offerings, which seems to grow by the day, take a minute to think about the nature of your data. In my experience, most of the data we come across is relational and structured. If yours is, there are not many reasons to consider NoSQL.
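To make “relational and structured” a bit more concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The customers and orders tables, their columns and the sample rows are hypothetical and invented purely for illustration. The point is only that when records reference each other and you want to query across those references, a relational database handles it with a plain join.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two related (hypothetical) tables."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE customers (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE orders (
            id          INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(id),
            total_cents INTEGER NOT NULL
        );
        """
    )
    conn.executemany(
        "INSERT INTO customers (id, name) VALUES (?, ?)",
        [(1, "Alice"), (2, "Bob")],
    )
    conn.executemany(
        "INSERT INTO orders (customer_id, total_cents) VALUES (?, ?)",
        [(1, 1250), (1, 800), (2, 4200)],
    )
    conn.commit()
    return conn


def total_per_customer(conn):
    """Follow the order -> customer relation with a join and sum per customer."""
    return conn.execute(
        """
        SELECT c.name, SUM(o.total_cents)
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.id, c.name
        ORDER BY c.name
        """
    ).fetchall()


if __name__ == "__main__":
    for name, total in total_per_customer(build_demo_db()):
        print(name, total)
```

If your data naturally looks like this, with entities that point at each other and queries that cross those links, that is a strong hint that it is relational. In many NoSQL stores you would typically have to do such a join in application code or avoid it by denormalizing the data up front.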
Other arguments often brought to the table are scaling and performance. Admittedly, handling these yourself is probably easier with most NoSQL databases, but remember that you can always offload that problem to your cloud provider. Solutions like Amazon RDS and Azure SQL might seem expensive, but the cost of setting up and managing your own on-premises solution is often highly underestimated.
If you’re sure your data is not relational, or you’re interested anyhow, the next part of this blog can guide you through the large number of NoSQL stores out there.