Upgrade to MongoDB 6.0 for improved index management, error handling, powerful aggregation, replicaset enhancements, time series collection improvements, and embedded field exclusion. Experience efficient database management.
Discover how MySQL 8 enhances security through features like password expiry, password validation, account lock/unlock, privilege management, and dual passwords. Safeguard your valuable data and streamline access control in your MySQL environment.
Real time data streaming from MariaDB to Columnstore
By Sukan | May 28, 2019 | Posted in : MySQL
In the recent past there has been a massive business shift towards analytics database. Unlike traditional databases, analytics database always stands out in running big data queries. No matter how much we optimize, transaction databases are not suitable for these kind of workload.
Business these days are gripped in running statistics on top of everything they have, for them these reports are critical as they can analyse, optimize and apply them to boost their productivity. So it’s no brainer to look for an analytic storage database.
Choosing the right analytic storage database is not an easy task, as you have to consider many factors such as type of workload, queries, storage, compression.. Etc. If you are looking for one, please refer to the previous blog in this series. Next thing would be streaming the data to the analytic database, ETL jobs are the foremost thing that comes in everyone’s mind. Although it is a proven one, it takes a lot of time to write queries and unfortunately if you are on a tight pipeline you don’t have much time to write these jobs.
Fortunately we have few options with maxscale, proxysql , tungsten which eliminates the need for ETL.
In this blog, we can see how we have achieved real time data streaming from transactional to analytics database.
Background of the setup
MariaDB as Transactional DB
MariaDB columnstore as Analytics DB
MariaDB Maxscale as the binlog reader
Requirements on MariaDB
Need to enable binary logging
Need to update binary log format as ROW
Need to create a user to stream the data from MariaDB
Requirements on Maxscale
Configure replication_router and replication_listener in the maxscale config
Replication user needs to be used in the maxscale config to communicate with the MariaDB.
Architecture
Steps
Maxscale will read the events from the binary log and it uses avro / json format to load the data into the columnstore using the CDC adaptor.
CDS adaptor is a utility which will act as an intermediate between Maxscale service and analytics DB.
Once it is ready, we can call the mxs_adapter utility, it will start the data streaming to columnstore thus eliminating the need for an ETL job.
Conclusion
ETL is time consuming which is tough to maintain, needs to be updated regularly, whereas this service is easy to deploy and maintain. With this recent trend on analytics, data streaming from transactional database becomes much simpler and hassle free.
1 Comments
Praveen AdepuAugust 02, 2020 03:18 am
Nice article. We are trying to implement the real time data streaming from MariaDB 10.5 to ColumnStore 1.5 using MaxScale 2.5. Unfortunately, the CDC adaptor is deprecated in MaxScale 2.5. We tried HTAP, but it is very slow. Can I please know what is the alternative for the CDC adapter in MaxScale 2.5?