RDS MySQL to Cloud SQL MySQL Migration with Rollback Strategy

Trushar Borse
Google Cloud - Community
Aug 3, 2023 · 10 min read


Background & Purpose:

Migrating managed databases from AWS to GCP is a very common requirement we come across as part of customers' GCP adoption journeys.

e.g. AWS RDS Aurora MySQL to GCP Cloud SQL MySQL

OR

AWS RDS MySQL Community Edition to Cloud SQL MySQL

Customers (especially larger enterprises) usually ask for a rollback plan as part of their migration strategy, so they have a mitigation path in case things don't go as planned for any reason.

The purpose of this document is to cover the most common scenarios we come across when migrating from AWS MySQL DBaaS offerings to Cloud SQL MySQL.

The following scenarios are covered in the rest of this document.

Scenario 1

Source -> Aurora version 2 (compatible with MySQL 5.7)

Destination -> GCP Cloud SQL for MySQL 5.7

Rollback -> Aurora version 2 (the same source instance)

Note -> Avoid creating new databases/users during the migration/rollback

  • We need an intermediate EC2 machine in AWS so that the seeded mysql system database can be filtered out of the GCP CSQL to AWS replication used for rollback. Otherwise the mysql.heartbeat table will also start replicating and break the replication. Replication filters cannot be added on Aurora version 2; they are only available from version 3, which is MySQL 8.0 compatible.
  • We use the same source Aurora instance in the rollback strategy.

Steps

  1. If the GTID is off then we need to enable it at the source Aurora instance:

Steps to enable GTID in Aurora

a. Create new parameter group (if running on default)

b. Change parameters:

gtid_mode = ON

enforce_gtid_consistency = ON

binlog_format=ROW

c. Apply and restart the Aurora database.
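For reference, a rough AWS CLI sketch of these steps (the parameter group name, cluster identifier, and parameter group family below are placeholders/assumptions; adjust them to your environment):

aws rds create-db-cluster-parameter-group \
  --db-cluster-parameter-group-name aurora-gtid-pg \
  --db-parameter-group-family aurora-mysql5.7 \
  --description "Aurora parameter group with GTID enabled"

aws rds modify-db-cluster-parameter-group \
  --db-cluster-parameter-group-name aurora-gtid-pg \
  --parameters "ParameterName=gtid_mode,ParameterValue=ON,ApplyMethod=pending-reboot" \
               "ParameterName=enforce_gtid_consistency,ParameterValue=ON,ApplyMethod=pending-reboot" \
               "ParameterName=binlog_format,ParameterValue=ROW,ApplyMethod=pending-reboot"

aws rds modify-db-cluster \
  --db-cluster-identifier <aurora-cluster-id> \
  --db-cluster-parameter-group-name aurora-gtid-pg \
  --apply-immediately

# Reboot the writer instance so the pending-reboot parameters take effect
aws rds reboot-db-instance --db-instance-identifier <aurora-writer-instance>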

2. Using the GCP DMS migration job, start continuous replication from Aurora to GCP.

Google Cloud DMS for MySQL Database

Wait until the migration job reaches the CDC state. We can check the replication status by logging into the GCP CSQL MySQL instance and issuing the command SHOW SLAVE STATUS \G. Monitor the value of Seconds_Behind_Master; it should be 0.

Side note: Consider the server-level character set and collation (character_set_server flag), and whether the CSQL MySQL instance should be switched to latin1 before it gets hydrated as part of the DMS job. This is more crucial if there are routines (functions, procedures).
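A quick sanity check, run on both the source Aurora instance and the Cloud SQL instance, to compare the server-level character set and collation (the values to expect depend on your application):

SHOW VARIABLES LIKE 'character_set_server';

SHOW VARIABLES LIKE 'collation_server';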

3. To support the rollback, binary logging must be enabled on the Cloud SQL instance, which at this point is a read replica of the Aurora source instance created by the DMS job. Enable binary logging with gcloud on the GCP read replica instance:

gcloud sql instances patch instance_name --enable-bin-log

This will enable binary logging in row format with GTID mode ON.

Set the flag expire_logs_days to 7 in CSQL MySQL 5.7.

The equivalent flag in CSQL MySQL 8.0 is binlog_expire_logs_seconds.
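As a sketch, the flag can be set with gcloud; note that --database-flags replaces the full flag list on the instance, so include any flags that are already set:

gcloud sql instances patch instance_name --database-flags=expire_logs_days=7

On CSQL MySQL 8.0 the equivalent would be --database-flags=binlog_expire_logs_seconds=604800 (7 days expressed in seconds).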

4. Prepare the EC2 machine for the rollback plan:

Create an EC2 machine for the replication from GCP to EC2.

Aurora => GCP => EC2

a. Install MySQL Community 5.7 on the Ubuntu EC2 machine.

Add an EBS volume to the EC2 instance to make the disk available, and resize it if required:

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-using-volumes.html

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/recognize-expanded-volume-linux.html

NOTE: For smaller databases, it’s fine to accommodate this within the root/boot EBS volume itself.
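A minimal shell sketch for making a newly attached EBS volume usable as /data (the device name /dev/xvdf is an assumption; confirm it with lsblk on your instance):

lsblk                                   # identify the newly attached volume
sudo mkfs -t ext4 /dev/xvdf             # format only if the volume is new and empty
sudo mkdir -p /data
sudo mount /dev/xvdf /data
echo '/dev/xvdf /data ext4 defaults,nofail 0 2' | sudo tee -a /etc/fstab   # persist across reboots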

b. Run mysqldump on the AWS EC2 machine against the GCP instance (capturing the GTID/binary log position) and restore it locally.

You might need to run the commands below as the root Unix user.

mysqldump \
-h x.x.x.x -u root -p<password> \
--databases database_name \
--hex-blob \
--set-gtid-purged=on \
--single-transaction \
--routines \
| gzip > outputfile.sql.gz

Restore command:

gunzip < outputfile.sql.gz | mysql

(no -u root -p needed if no username/password was set when installing and setting up MySQL on the EC2 VM)

ERROR 1840 (HY000) at line 13828: @@GLOBAL.GTID_PURGED can only be set when @@GLOBAL.GTID_EXECUTED is empty.

If this error appears at the end of the database restoration, it should not be a problem.

TIP: Use a screen or nohup based approach to run the dump and restore so that an SSH session drop does not interrupt them, especially for larger databases.
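For example, a nohup-based way to keep the dump running if the SSH session drops (the command simply wraps the mysqldump shown above):

nohup bash -c 'mysqldump -h x.x.x.x -u root -p<password> --databases database_name --hex-blob --set-gtid-purged=on --single-transaction --routines | gzip > outputfile.sql.gz' > dump.out 2>&1 &

tail -f dump.out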

c. Edit the /etc/mysql/my.cnf file.

Enable log_bin with row format and enable log_slave_updates. Also add the replication filter replicate_wild_ignore_table=mysql.%

[mysqld]

server-id=100

datadir=/data/   # add only if planning to keep the mysql data directory outside the default location

log-bin=binlog.log

binlog_format=row

gtid_mode=ON

enforce-gtid-consistency=ON

log-slave-updates=ON

bind_address=0.0.0.0

expire_logs_days=10

replicate_wild_ignore_table=mysql.%

Restart mysql service on EC2 VM after setting up my.cnf

sudo service mysql restart

The replicate_wild_ignore_table flag ignores all changes to the mysql database to avoid replication issues.

d. Create replication from GCP Cloud SQL MySQL to the EC2 machine using the CHANGE MASTER TO command.

On GCP CSQL MySQL:

CREATE USER 'replica_user1'@'%' IDENTIFIED BY '<password>';

The statement above can be executed by the customer/implementer.

GRANT REPLICATION SLAVE ON *.* TO 'replica_user1'@'%';

Note: The end user can't run this GRANT, because the GCP instance is a read replica. Raise a GCP support ticket with justification to have the GRANT executed on the concerned Cloud SQL instance. Also note that this is a temporary arrangement, only for the purpose and duration of enabling the rollback mechanism.

On EC2:

change master to master_host='x.x.x.x', master_user='replica_user1', master_password='<password>', master_auto_position=1;

start slave;

x.x.x.x -> Outgoing IP of CSQL instance
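After starting the slave, it is worth confirming on the EC2 instance that both replication threads are running and the filter is in place:

SHOW SLAVE STATUS \G

Expect Slave_IO_Running: Yes, Slave_SQL_Running: Yes, Replicate_Wild_Ignore_Table: mysql.%, and Seconds_Behind_Master trending towards 0.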

Rollback Steps:

At this stage:

  • We have already cut over the traffic from the source Aurora to GCP CSQL MySQL.
  • We have already promoted the GCP instance from the DMS console to serve the live application traffic.

1) Now we need to set up replication from the EC2 machine to the source (the Aurora instance).

Aurora => GCP CSQL => EC2 => Aurora

Create a replication user on the EC2 machine.

CREATE USER 'replica_user2'@'%' IDENTIFIED BY '<password>';

GRANT REPLICATION SLAVE ON *.* TO 'replica_user2'@'%';

2) To start replication from the EC2 machine (master) to the Aurora instance, run this procedure on the Aurora instance:

CALL mysql.rds_set_external_master_with_auto_position (host_name, host_port, replication_user_name, replication_user_password, 0);
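For illustration, a filled-in call could look like the following (the EC2 host and credentials are placeholders); replication is then started with the standard RDS procedure:

CALL mysql.rds_set_external_master_with_auto_position ('<ec2_private_ip>', 3306, 'replica_user2', '<password>', 0);

CALL mysql.rds_start_replication;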

3) Check the replication status on Aurora. Once all data has been replicated back, the rollback is considered complete and Aurora is ready to use again.

Scenario 2

Source -> Aurora version 3 (compatible with MySQL 8.0)

Destination -> GCP Cloud SQL for MySQL 8.0

Rollback -> Aurora version 3 (the same source instance)

Note -> Avoid creating new databases/users during the migration/rollback

  1. If the GTID is off then we need to enable it at the source Aurora instance:

Steps to enable GTID in Aurora

a. Create new parameter group (if running on default)

b. Change parameters:

gtid_mode = ON

enforce_gtid_consistency = ON

binlog_format=ROW

c. Apply and restart the Aurora database.

2. Using the GCP DMS migration job, start continuous replication from Aurora to GCP.

Aurora => GCP

Google Cloud DMS for MySQL Database

Wait until the migration job reaches the CDC state. We can check the replication status by logging into the GCP CSQL MySQL instance and issuing the command SHOW SLAVE STATUS \G. Monitor the value of Seconds_Behind_Master; it should be 0.

3) To support the rollback, binary logging must be enabled on the Cloud SQL instance, which at this point is a read replica of the Aurora source instance created by the DMS job. Enable binary logging with gcloud on the GCP read replica instance:

gcloud sql instances patch instance_name --enable-bin-log

This will enable binary logging in row format with GTID mode ON.

Rollback Steps:

At this stage:

  • We have already cut over the traffic from the source Aurora to GCP CSQL MySQL for migration.
  • We have already promoted the GCP instance from the DMS console to receive the traffic from the application.

1) Now we need to set up replication from GCP CSQL MySQL to the source (the Aurora instance), to apply the changes happening on GCP back to the source Aurora.

Aurora => GCP => Aurora

Create a replication user on the GCP Cloud SQL instance.

CREATE USER 'replica_user'@'%' IDENTIFIED BY '<password>';

GRANT REPLICATION SLAVE ON *.* TO 'replica_user'@'%';

2) Edit the parameter group of the source Aurora instance to set the replicate-do-db parameter, listing all the user databases as a comma-separated list.

3) To start replication from the Cloud SQL instance to the Aurora instance, run this procedure on the Aurora instance:

CALL mysql.rds_set_external_master_with_auto_position (host_name, host_port, replication_user_name, replication_user_password, 0);
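As in Scenario 1, after pointing Aurora at the Cloud SQL instance, start and monitor the replication on Aurora:

CALL mysql.rds_start_replication;

SHOW SLAVE STATUS \G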

Scenario 3

Database size : Large (e.g. > 5TB)

MySQL version -> 5.7

This scenario covers migration of larger databases (usually greater than 2 TB) where DMS is slow. Perform the MySQL backup and restore manually using mydumper and myloader.

  1. If the GTID is off then we need to enable it at the source Aurora instance:

Steps to enable GTID in Aurora

a. Create new parameter group (if running on default)

b. Change parameters:

gtid_mode = ON

enforce_gtid_consistency = ON

binlog_format=ROW

c. Apply and restart the Aurora database.

2. Create replication user in the source Aurora database:

create user 'repl_user'@'%' identified by 'Repl@1234';

GRANT REPLICATION SLAVE, EXECUTE ON *.* TO 'repl_user'@'%';

FLUSH PRIVILEGES;
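Optionally, verify the grants before wiring up the external replication:

SHOW GRANTS FOR 'repl_user'@'%';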

3. Set up the source representation instance on the Cloud SQL side using the doc: https://cloud.google.com/sql/docs/mysql/replication/configure-replication-from-external#custom-import-example

Example:

Create json:

source.json:

{
  "name": "cloudsql-source-instance",
  "region": "us-central1",
  "databaseVersion": "MYSQL_5_7",
  "onPremisesConfiguration": {
    "hostPort": "aurora-migration-test.cluster-xxxxxxx.us-east-1.rds.amazonaws.com:3306",
    "username": "repl_user",
    "password": "<password>"
  }
}

Create source instance:

gcloud auth login

ACCESS_TOKEN="$(gcloud auth print-access-token)"

curl --header "Authorization: Bearer ${ACCESS_TOKEN}" \
  --header 'Content-Type: application/json' \
  --data @./source.json \
  -X POST \
  https://sqladmin.googleapis.com/sql/v1beta4/projects/<gcp_project>/instances
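Once the API call returns, the new source representation instance can be checked with gcloud, for example:

gcloud sql instances describe cloudsql-source-instance --project=<gcp_project>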

4. Set up a Cloud SQL replica of the source representation instance. Use values as appropriate for your setup:

Create json (sample):

replica.json

{
  "settings": {
    "tier": "db-custom-4-15360",
    "dataDiskSizeGb": "300"
  },
  "masterInstanceName": "cloudsql-source-instance",
  "region": "us-central1",
  "databaseVersion": "MYSQL_5_7",
  "name": "replica-of-source-representation"
}

Create replica instance:

gcloud auth login

ACCESS_TOKEN="$(gcloud auth print-access-token)"

curl --header "Authorization: Bearer ${ACCESS_TOKEN}" \
  --header 'Content-Type: application/json' \
  --data @./replica.json \
  -X POST \
  https://sqladmin.googleapis.com/sql/v1beta4/projects/<gcp_project>/instances

5. Promote the replica instance:

gcloud sql instances promote-replica replica-of-source-representation --project=<gcp_project>

6. Take a backup of the source database using mydumper. Make sure to use a recent version of the tool.

Mydumper sample command:

mydumper -h <Sourcehost> -u <username> -p <password> -B <dbname> --rows=5000000 --threads=32 --trx-consistency-only --long-query-retries 1 --long-query-retry-interval 600 --compress --regex '^(?!(mysql\.|sys\.))' --outputdir /backup/ --logfile /backup_log.out

Example:

mydumper -h aurora-migration-test.cluster-xxxxxxx.us-east-1.rds.amazonaws.com -u admin -p "Root1234" --rows=5000000 --threads=32 --trx-consistency-only --long-query-retries 1 --long-query-retry-interval 600 --compress --regex '^(?!(mysql\.|sys\.))' --outputdir backupdir/ --logfile backup_log.out

7. Restore the backup on the CloudSQL replica instance:

Myloader sample command:

myloader -h <Targethost> -u <username> -p <password> --threads=32 --max-threads-per-table=8 -d /backup/ --innodb-optimize-keys -s <Source_DB> -B <target_DB> --queries-per-transaction=500000 -o --verbose=3 --logfile /backup.log

Example:

myloader -h 10.xx.xx.x --threads=32 --max-threads-per-table=8 -d backupdir/ --innodb-optimize-keys --queries-per-transaction=500000 -o --verbose=3 --logfile restore.log

Parameters to improve the performance of the backup/restore (the values below are just examples; adjust them for your setup/environment):

innodb_io_capacity = 15000

innodb_io_capacity_max = 30000

innodb_read_io_threads = 64

innodb_write_io_threads = 64

innodb_adaptive_hash_index = 0

innodb_purge_threads = 2

innodb_change_buffer_max_size = 30

query_cache_size = 0

innodb_log_file_size = 10G

innodb_log_buffer_size = 2G

innodb_flush_log_at_trx_commit = 2

wait_timeout = 86400

max_allowed_packet=1G

Also, choose a zonal instance for restoring large databases to get better speed. Once the database is loaded completely, you can switch to a regional instance.

***

After restoration, we need to set up replication from Aurora/RDS to Cloud SQL.

8. Demote the replica instance:

Example json:

demote.json

{
  "demoteMasterContext": {
    "masterInstanceName": "cloudsql-source-instance",
    "skipReplicationSetup": true
  }
}

Demote instance:

gcloud auth login

ACCESS_TOKEN="$(gcloud auth print-access-token)"

curl --header "Authorization: Bearer ${ACCESS_TOKEN}" \
  --header 'Content-Type: application/json' \
  --data @./demote.json \
  -X POST \
  https://sqladmin.googleapis.com/sql/v1beta4/projects/<gcp_project>/instances/replica-of-source-representation/demoteMaster

9. Get the GTID info from the backup:

sudo cat backupdir/metadata

Started dump at: 2023-06-26 12:08:15
SHOW MASTER STATUS:
Log: mysql-bin-changelog.000078
Pos: 9426
GTID: e9ef4768-1194-3b73-b585-dea9f52b5a67:1-20981
Finished dump at: 2023-06-26 12:09:56

10. Log in to the replica instance and set up replication:

mysql> call mysql.resetMaster();

mysql> call mysql.skipTransactionWithGtid('e9ef4768-1194-3b73-b585-dea9f52b5a67:1-20981');

mysql> call mysql.setupExternalSourceAutoPosition('aurora-migration-test.cluster-xxxxxxx.us-east-1.rds.amazonaws.com', 3306, 'repl_user', '<password>', 1, false, false);

mysql> call mysql.startReplication();

mysql> show slave status \G

11. Once the replication is in sync, we can stop traffic on the source Aurora and promote the replica to start using it as the new primary.

gcloud sql instances promote-replica replica-of-source-representation --project=<gcp_project>

The rollback strategy for the large database is the same as in Scenario 1.

Scenario 4

How to perform Database Upgrade during Migration using GCP DMS

Current Limitation:

DMS, as of today, does not support selecting the latest or your desired minor version of Cloud SQL for MySQL and PostgreSQL.

For example, if your source MySQL Version is 8.0.32 and you want to migrate from another cloud provider to GCP CloudSQL using DMS, DMS will create a MySQL database with 8.0.30 version. Customers will need another downtime to upgrade MySQL 8.0.30 to 8.0.32.

We cannot do major or minor version upgrades of the external read replica which DMS uses.

Solution:

We can create a cascading replica from the external read replica and upgrade its major or minor version.

Use Case

Suppose you are using GCP DMS to migrate from AWS RDS MySQL 8.0.32 to Cloud SQL for MySQL.

DMS currently creates the target instance as 8.0.30. This instance is known as an external primary read replica, or simply an external read replica.

Many customers need the source and target database versions, including minor versions, to be the same.

The solution is to create a cascading read replica from the external read replica.

Then upgrade the cascading read replica from 8.0.30 to 8.0.32 while the source database and the DMS target (the external read replica) stay in sync via CDC.

Creation of Cascading Read Replica

gcloud sql instances create mysql-prod-target --master-instance-name=mysql-temp-dest

Setup (before cutover):

mysql-prod (primary) -> mysql-temp-dest (external primary read replica created by DMS) -> mysql-prod-target (cascading replica)

mysql-temp-dest is the DMS target instance and is in CDC mode.

During cutover, the DMS job will promote it to primary.

Setup (after cutover):

mysql-temp-dest (primary) -> mysql-prod-target (read replica)

mysql-temp-dest is on version 8.0.30.

mysql-prod-target will be upgraded to version 8.0.32.

Once mysql-prod-target is in sync, it can be promoted and the application can point to it.
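Promoting the cascading replica is a regular replica promotion, for example:

gcloud sql instances promote-replica mysql-prod-target --project=<gcp_project>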

When using the GCP Console to upgrade the cascading replica, it will error out with the following message:

"Cascading replica cannot be upgraded using console"

Gcloud would probably fail with the below error:

gcloud sql instances patch mysql-prod-target --database-version=MYSQL_8_0_32

ERROR: gcloud crashed (KeyError): 'MYSQL_8_0_32'

The API approach should work, as described below.

Create the request JSON with the desired/target database version.

vi request.json

{
  "databaseVersion": "MYSQL_8_0_32"
}

curl -X PATCH -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json; charset=utf-8" -d @request.json "https://sqladmin.googleapis.com/v1/projects/<gcp_project>/instances/mysql-prod-target"

{
  "kind": "sql#operation",
  "targetLink": "https://sqladmin.googleapis.com/v1/projects/<gcp_project>/instances/mysql-prod-target",
  "status": "PENDING",
  "user": "<gcp_user>",
  "insertTime": "2023-06-23T05:50:19.756Z",
  "operationType": "UPDATE",
  "name": "ea37c299-aaeb-48c9-a6c0-47cb0000002f",
  "targetId": "mysql-prod-target",
  "selfLink": "https://sqladmin.googleapis.com/v1/projects/<gcp_project>/operations/ea37c299-aaeb-48c9-a6c0-47cb0000002f",
  "targetProject": "<gcp_project>"
}
