Spider storage engine was introduced in Mariadb 10.0.4 as an alpha version. After a long wait, spider engine become stable from Mariadb 10.3.7.
Spider storage engine comes with inbuilt sharding features which supports partitioning and xa transactions. When a table is created with Spider, the table creates links to the remote server (MySQL / MariaDB).
For example,
If the table is created on the Spider Node(10.100.10.10) with Spider Engine, then the table creates a link to connect the Backend Server (10.100.10.15).
It supports both MySQL and MariaDB servers as backend servers.
Currently Spider Storage Engine has two types:
Federation
Sharding
1) Federation:
Entire data will be kept on remote data nodes.
To use spider, you need a minimum of two or more instances of MariaDB or MySQL, typically running on separate hosts.
In federation, application sends traffic to Spider node and spider decides on which data node it has to get the data. The data nodes are the MariaDB servers that actually store the table data and in our case, 10.100.10.15 and 10.100.10.18.
In order for this to work, you need to configure the data nodes to accept queries from the Spider node.
Using Federation,
Read queries, Spider perform load balancing to the data nodes.
Writes queries are mirrored to the data nodes.
Full copy of the table on each node.
2) Sharding:
This feature will keep the table data partitioned across different remote backend data nodes (Shards).
Sharding in spider is splitting up (or horizontally partition) big tables among several isolated DB instances, called a database shard. We can divide huge tables and load share the traffic to multiple shard servers by using spider.How to shard the data is really interesting. You just mention the native partition in spider node. Lets say, we created Table1 with 2 hash partition in spider node. Part 1 of data will store in 10.100.10.15 and Part 2 of data will get stored in 10.100.10.18 server. You can use all partitioning rules(key, range, hash, and so on).
Using Sharding,
Ability to store large amounts of data.
Quick result during the retrieval process as spider uses certain methods of parallelization.
If your data set is huge and you want to scale horizontally.
If you want to split the write traffic across nodes.
Different set of storage engines across data nodes.
When you need parallel execution and data consistency.-
In the next blog, we would explain a use case where we have faced issue with huge data set and how we have achieved horizontal sharding using Spider storage engine.