Evaluating Distributed Databases – 4 Major Questions to Ask
Distributed databases bring many challenges for teams adopting them for the first time. Building an enterprise distributed database from scratch takes sustained effort and some wise decisions. You should not hamper development speed, yet you also have to make long-term considerations for your team and your customers.
Three years into the market and two major releases later, TiDB 2.0 is now a widely accepted database, deployed in more than 200 top-line companies. The TiDB team has answered many user questions about TiDB and about distributed databases in general. Here, we have gathered some of the most frequently raised questions, which every engineer should ask when planning to architect a distributed database.
1. Why can distributed databases not be counted as a silver bullet?
Given today's technological diversity, you may not be able to find a single solution that satisfies all your needs. The same is true of database management. If your data fits comfortably into a single MySQL instance without much pressure on the server, or if the performance requirements for your complex queries are not that high, a distributed database is not the ideal choice.
Choosing a distributed database typically incurs additional cost, which may not be justified for limited workloads. If you want High Availability (HA) for smaller workloads, MySQL's master-slave replication combined with GTID may be good enough. Moreover, thanks to MySQL's active community, you can readily find solutions for most issues. So, if a single-instance database is enough, you may stick with conventional MySQL.
So, when is the ideal time to deploy a distributed database system like TiDB? It is a tough question to answer because it depends on your situation. Look for the signals below to understand whether a distributed database can serve the majority of your purposes. Some key questions to ask yourself are:
Are you thinking of options to migrate, replicate, or scale the database?
Are you trying to work around limited storage capacity?
Are you worried about slow query performance?
Are you researching any middleware scaling solutions?
Are you trying to implement manual sharding?
TiDB and MySQL need not be considered mutually exclusive choices. You may opt to try out MySQL and TiDB side by side, even for single-instance workloads.
2. Why should we separate SQL from the storage?
There are many reasons why you may separate SQL from storage. Here are a few, as explained by RemoteDBA.com.
Easy to maintain – Separating the SQL layer of TiDB from the underlying storage layer makes deployment, management, and maintenance simpler. DevOps is not merely about implementation; it also covers quickly isolating issues, debugging the system, and handling overall maintenance. For example, fixing a bug in the SQL layer that needs an urgent update can be a time-consuming and risky affair if your entire system is fused into one. However, if you have a separate SQL layer, the update can be rolled out without disrupting any other part of the system.
Better usage of resources – A modular design is much friendlier to resource usage and allocation. SQL processing and storage depend on different computing resources. While storage is mainly dependent on I/O and affected by the hardware used, SQL processing relies primarily on CPU capacity and the amount of RAM. Putting storage and SQL in different layers makes it easier to give each layer the right kind of resources and so make the system more efficient.
Bring more efficiency and flexibility in development – Building the storage layer as a separate key-value store increases the flexibility of the entire system. It can also offer more horizontal scalability. A separate storage layer also opens up the possibility of benefiting from different computing modes in a distributed system.
3. Why is latency not considered the primary measure?
A question DBAs hear frequently is, "Can TiDB replace Redis?" The answer is no, because TiDB is not a caching service. It is a distributed database solution that supports transactions with consistency, horizontal scalability, and high availability. In TiDB, your data is replicated among multiple machines, and the database is designed to serve as the "source of truth" for that data.
Making this possible in a distributed database involves some tradeoffs, and latency is one of them. If your production system requires very low latency, for example under 1 ms, you will have to use a caching solution like Redis. Many customers use Redis together with TiDB. With this setup, when the caching layer is compromised, there is still an always-on database behind it, with the data still available and consistent and with capacity that can scale out.
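The fallback behaviour described above can be sketched as a cache-aside read. This is a minimal illustration under stated assumptions, not how any particular client library works: `InMemoryCache`, `CacheDown`, and `read` are hypothetical names, the cache is a plain in-process stand-in for a service such as Redis, and a dictionary stands in for the database.

```python
class CacheDown(Exception):
    """Raised by the stand-in cache when it is unavailable (hypothetical)."""


class InMemoryCache:
    """Hypothetical stand-in for a caching service such as Redis."""

    def __init__(self):
        self._data = {}
        self.available = True

    def get(self, key):
        if not self.available:
            raise CacheDown()
        return self._data.get(key)

    def set(self, key, value):
        if not self.available:
            raise CacheDown()
        self._data[key] = value


def read(key, cache, database):
    """Cache-aside read: try the cache first, fall back to the database."""
    try:
        cached = cache.get(key)
        if cached is not None:
            return cached  # fast path: served from the cache
    except CacheDown:
        # Cache is compromised: the database remains the source of truth.
        return database[key]

    value = database[key]  # cache miss: read from the database
    try:
        cache.set(key, value)  # populate the cache for later reads
    except CacheDown:
        pass  # failing to cache must not fail the read
    return value


# A dictionary stands in for the database in this sketch.
database = {"user:1": "alice"}
cache = InMemoryCache()

first = read("user:1", cache, database)   # miss, then cached
second = read("user:1", cache, database)  # served from the cache
cache.available = False                   # simulate a cache outage
third = read("user:1", cache, database)   # still answered by the database
```

The point of the design is that the cache only ever speeds reads up; every value can be recovered from the database when the cache is down.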
Throughput is a better measuring stick than latency in a distributed database environment. If the throughput of the system increases roughly linearly with the number of machines added while latency holds steady, that is an indication of a solid distributed database.
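One way to make the "linear throughput, steady latency" check concrete is to compare a scaled-out run against a baseline. The sketch below is only an illustration: the `Run` type, the function names, the 20% latency tolerance, and all the numbers are assumptions made up for the example, not measurements of TiDB or any other system.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One benchmark run (hypothetical structure for this sketch)."""
    nodes: int
    throughput_qps: float
    p99_latency_ms: float


def scaling_efficiency(baseline: Run, run: Run) -> float:
    """Fraction of ideal linear speedup achieved (1.0 means perfectly linear)."""
    ideal_qps = baseline.throughput_qps * run.nodes / baseline.nodes
    return run.throughput_qps / ideal_qps


def latency_is_steady(baseline: Run, run: Run, tolerance: float = 0.2) -> bool:
    """True if p99 latency grew by no more than `tolerance` (assumed 20%)."""
    return run.p99_latency_ms <= baseline.p99_latency_ms * (1 + tolerance)


# Made-up numbers purely for illustration; these are not benchmark results.
baseline = Run(nodes=3, throughput_qps=30_000, p99_latency_ms=10.0)
scaled = Run(nodes=6, throughput_qps=57_000, p99_latency_ms=11.0)

efficiency = scaling_efficiency(baseline, scaled)  # 57,000 / 60,000 = 0.95
steady = latency_is_steady(baseline, scaled)       # 11.0 ms <= 12.0 ms
```

An efficiency close to 1.0 with steady latency matches the signal described above; an efficiency that drops sharply as nodes are added suggests the system is not scaling linearly.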
4. Which is better: ‘Range-Based’ or ‘Hash-Based’ Sharding?
In the case of TiKV, range-based sharding is the choice because the goal is to support a full-featured relational database. Such a database must support various scan operations, such as Index Scan and Table Scan. Even though range-based sharding is harder to implement, hash-based sharding does not keep table data in key order. So, even a small scan in a hash-based setup may jump around many shards on different nodes. This is not an issue in range-based sharding, where adjacent keys stay together.
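The difference shows up when you count how many shards a small scan has to visit. The following is a toy sketch, not TiKV's actual implementation: the shard count, the split points, and the helper names are all hypothetical, and integer keys stand in for row keys.

```python
import bisect
import hashlib

NUM_SHARDS = 4
# Hypothetical split points: shard 0 holds keys below 250, shard 1 holds
# keys in [250, 500), shard 2 holds [500, 750), shard 3 holds 750 and above.
SPLITS = [250, 500, 750]


def range_shard(key: int) -> int:
    """Range-based placement: adjacent keys land in the same shard."""
    return bisect.bisect_right(SPLITS, key)


def hash_shard(key: int) -> int:
    """Hash-based placement: adjacent keys are scattered across shards."""
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def shards_touched(shard_fn, start: int, end: int) -> set:
    """Set of shards a scan over keys in [start, end) has to visit."""
    return {shard_fn(key) for key in range(start, end)}


# A small scan over 20 consecutive keys.
range_hits = shards_touched(range_shard, 100, 120)  # all keys fall in shard 0
hash_hits = shards_touched(hash_shard, 100, 120)    # typically several shards

print("range-based scan visits shards:", sorted(range_hits))
print("hash-based scan visits shards:", sorted(hash_hits))
```

With range-based placement the scan stays inside one shard as long as it does not cross a split point, while hash-based placement spreads the same 20 keys over whichever shards their hashes select.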
When you start small, any kind of database may work, but as the company grows and the data grows exponentially, the challenges in your infrastructure technology start to matter. Making the right, scalable choice of database technology at the very start will make your journey smoother.