Sunday, September 22, 2013

MySQL Fabric - Sharding - Migrating From an Unsharded to a Sharded Setup


Introduction


Having written an application to work with an unsharded MySQL setup, how do we migrate the application to a sharded setup ?


The above problem can be divided into two parts


  • Migrating the MySQL Store (with minimum disruption to ongoing transactions)
  • Migrating the Application.


This blog deals with the first problem of migrating the MySQL Store. The second problem of migrating the application will be handled in a separate blog post, and we shall assume for now that we just replace the old unsharded non-Fabric aware application with the new Fabric aware application.



 
Most enterprises start with a single server setup to develop a prototype and later realize that they need to scale. This blog presents a way for such a setup to seamlessly scale to a sharded setup with minimal disruption of ongoing transactions.


Unsharded Setup


The following is the schema in the MySQL Server running on localhost:13013


 
The employee database has the following tables

  • employee
  • salaries
  • dept_emp
  • fuel_reimb

 

Creating the Sharded Configuration


Let us assume that our final sharded configuration will have three shards. Each of these shards will be present in one group and there will be one global group.


 
The tables employee, salaries and dept_emp need to be sharded based on emp_no while fuel_reimb is a global table that needs to be present in all the shards.


Bring the Target server into the Fabric ecosystem


Create a Fabric group and add the server to it. Promote the server to master.


Commands:

mysqlfabric group create GROUPID1

mysqlfabric group add GROUPID1 localhost:13013 root ""

mysqlfabric group promote GROUPID1


 
The server is now part of the Fabric setup.


Creating the Global Group


The global group in a sharding setup is used to propagate schema updates to all tables in the sharding setup and updates to global tables throughout the sharding ecosystem. The global group contains

  • The schema of the sharded tables
  • The global tables


NOTE: Once we have restored the schema on the global group, until the global group is connected to the server group of the shard (GROUPID1) i.e. Until we create the sharding setup, we need to ensure that schema changing transactions do not occur on the server group of the shard. If schema changes occur in the server group of the shard, these will need to be propogated to the global group also, otherwise it will result in mismatching schemas in the global group and the server group of the shard. This can cause problems for global operations that expect the new schema. Once the sharding setup is created, schema changes can be sent to the global group.


NOTE: Similarly once the global tables have been restored on the global group, we need to prevent changes to the global groups until the sharding setup is created, after which the changes can be propagated to the global group.


Commands:

mysqlfabric group create GLOBAL_GROUP

mysqlfabric group add GLOBAL_GROUP localhost:13011 root ""

mysqlfabric group promote GLOBAL_GROUP

 

Creating the data in the Global Group


Backup the schema of all the tables and restore them on the on the global group. Also backup the data in the global tables and restore them on the global group.


Commands:

Use mysqldump to backup the schemas

mysqldump -d -u root --single-transaction --all-databases --socket <Server socket file for localhost:13013> > schema.sql

NOTE: --no-data can be used instead of -d in the above command.


Use mysql client to restore the schemas

mysql -u root --socket <Server socket file for localhost:13011> < schema.sql


Use mysqldump to backup the global tables

mysqldump -u root --no-create-info --single-transaction --socket <server socket file for localhost:13013> employee fuel_reimb > global.sql


The GTID_EXECUTED variable would have been set on the destination server. This causes applying the dump using the mysql client to fail. Reset this to allow restoring the global table.

reset master


Restore the data on the global tables

mysql --database=employee -u root --socket <server socket file for localhost:13011> < global.sql

 

Creating the Sharding Setup


Create a sharding definition


A sharding definition introduces a particular sharding scheme to which multiple tables related by a common sharding key can be mapped. A Sharding definition is composed of

  • Sharding scheme – RANGE, HASH
  • Global Group – Gets the global updates for all the shard tables mapped to this sharding definition. It also stores the global tables in this sharding definition.

Command

mysqlfabric sharding define HASH GLOBAL_GROUP

 

Map the tables to the sharding definition


Each sharding definition is associated with a unique ID. We map this unique ID to the table being sharded. Each mapping maps a table and the column containing the shard key to the sharding definition.

Commands:

mysqlfabric sharding add_mapping 1 employee.employee emp_no

mysqlfabric sharding add_mapping 1 employee.dept_emp emp_no

mysqlfabric sharding add_mapping 1 employee.salaries emp_no

 

Define the Shards


The shard definitions define the way the data is sharded based on the shard key. For a hash based sharding scheme, since the shards are automatically defined on the shard key, we do not need to manually define the lower_bounds for each shard.


Each shard is associated with a sharding definition and a group on which the data will be present. When we define a shard we also need to specify if it will be ENABLED for operations on shards.

Commands:

mysqlfabric sharding add_shard 1 GROUPID1 ENABLED


The sharding setup now contains a global group and one shard containing all the data.




Once the sharding setup has been created we can start using the Fabric enabled connector to start executing transactions. This would require shutting down the non-Fabric-aware application and replacing it by a Fabric-aware application.

 

Split the shards


Split the shard containing the data. Splitting helps to distribute the load into another group. This is a way to scale our setup for both read and write loads. The splitting can happen while transactions are ongoing.

Commands:

Add another group to which we want to split the data

mysqlfabric group create GROUPID2


Add the servers that are part of the split group


mysqlfabric group add GROUPID2 localhost:13009 root ""


Promote a server as master in the split group


mysqlfabric group promote GROUPID2


Split the shard


If you do not know the shard ID, dump the sharding information to find the shard ID.

mysqlfabric store dump_sharding_information


split the shard

mysqlfabric sharding split 1 GROUPID2


Now the data is distributed between the GROUPID1 and the GROUPID2.


 

Further Splits


The shards can be split further depending on need. For example to split shard with shard ID = 3 into a group with group ID = GroupID3 we will run the following command,

Commands:

mysqlfabric sharding split 3 GROUPID3


After the above command is executed the topology would look like the following.
 

Please find a few thoughts on maintaining a sharded system here http://vnwrites.blogspot.com/2013/09/mysqlfabric-sharding-maintenance.html

4 comments:

  1. Replies
    1. Thank you :),

      Please do download MySQL Fabric(from http://labs.mysql.com/) and try it out, and please do give feedback :).

      Delete
  2. Great post! The fact that you means someone is reading and liking it! Congrats!That’s great advice.

    Server installtion and migration

    ReplyDelete
  3. Great article. Is there any plan to make mysql client aware of sharding? there would be times when customers wish to load data files using mysql client tools such as mysqlimport

    ReplyDelete