Background

In one of the projects which I was working on, we had to execute a DDL (alter table) on a table which doesn’t have huge data (around 20,000 records) but has a high traffic of reads and relatively lower traffic of updates \ DML.

Ours was a zero downtime deployment on production. (Refer to this blog for more information https://blog.imaginea.com/zero-downtime-deployment/).

When we tried to run the alter scripts, we observed that the script hanged for a long time and was never executing. We were surprised to notice this as the number of records in the table is not huge.

MySQL’s Online DDL

MySQL supports two kinds of algorithms to carry out alter table operation viz., (i) INPLACE (ii) COPY.

By default, MySQL used INPLACE algorithm, until and unless you specify algorithm as COPY in the alter statement which you have issued.

The sequence of operations when an alter table command is issued is clearly documented in MySQL’s documentation. Refer https://dev.mysql.com/doc/refman/5.6/en/innodb-create-index-concurrency.html. It clearly states that queries and DML operations on a table can still proceed while the DDL operation is in progress for algorithm INPLACE.

Let’s reproduce the issue faced

Step 1: Start two sessions. Start a transaction on Session-1 and issue a select query on a table.

Step 2: In Session-2, issue an alter command on a different table other than the table used in Session-1.

Step 3: Here, we can see that the above command executed successfully. Now, let’s try to issue an alter command on table1.

Here, we can observe that the above statement hangs and keeps waiting.

Step 4: Now, in Session-3, try to issue a query on table1

we can see that this statement also hangs. Now check the InnoDB status. We can see that the above query status will be “waiting for the metadata lock”.

Step 5: Now, try to end the transaction in Session-1 by issuing a rollback or commit;

We can see that, the alter command in Session-2 and select query in Session-3 now executed Successfully.

Now, let’s extend the scenario to multiple transactions.

Step 1: Start a transaction in Session 1 and issue a select query on table1.

Step 2: In Session 2, issue a alter table command on table1.

Step 3: In Session 3, start a new transaction and issue a query on table1

We can see that select query on Session 3 waits to acquire the lock.

Step 4: Lets try to end the transaction on Session 1 by issuing a commit or rollback;

Here, we can see that the alter command on table1 in Session-2 still waits for the lock, but the select query in Session-3 returns the results, immediately after ending the transaction on Session-1.

Step 5: Now, end the transaction on Session 3.

Now, we can see that the alter statement now goes through in Session 2 and a new column gets added.

Let’s explain

Carefully observing the Online DDL operation which MySQL follows is having 3 steps.

(i) Initialization (ii) Execution (ii) Commit Table definition.

During the first two phases, MySQL allows queries and DML’s to execute.

In the third phase, the metadata lock is upgraded to exclusive to evict the old table definition and commit the new one. This lock is for a very brief period.

During, this process of upgrading metadata lock to an exclusive lock, alter operation waits for all pending concurrent transactions to complete. This is because the transaction holds the metadata lock. Hence, DDL statement execution waits until pending transactions are complete and the subsequent further Select Query or DML statements on the table will also wait until the lock has been released. This scenario exists on all MySQL versions.

Both INPLACE and COPY algorithm follow acquiring of exclusive metadata lock during the commit table definition stage but differs with the amount of time lock is acquired. Lock is acquired for a breif period with INPLACE algorithm than COPY.

Conclusion

It’s no good when a table gets locked and the queries getting piled up to acquire the lock on a table. Hence, we have to be careful especially when an alter statement is executed on a Zero-Downtime-Deployment model on a high transactional table (Reads, DML’s) irrespective of the table size. Better to have a planned downtime or execute during traffic is relatively lower.