Category: technologies

Effectively manage row deletion using MySQL partitioning

Article available in English

When you are about to delete some rows in MySQL, your first choice is usually using DELETE FROM. This is called hard deletion (or physical deletion). In fact, real-life business rarely uses this command when an immediate delete request is issued. More often than not, they use a technique called “soft deletion” (or logical deletion), which is implemented by adding a column named “deleted” or whatever with the same meaning to mark the rows as “deleted”. The purpose of this is to keep track of which rows were deleted, so as to inspect and easily revert changes if something goes wrong. However, with time the soft-deleted rows will start to clog up the database, which results in a performance decrease. After a certain period, soft-deleted rows have to be hard-deleted. However, querying soft-deleted rows followed by deletion with DELETE command will take quite much time and is prone to mistake. In this article, we will take a look at the method using partition to manage row deletion, which can be applied to a variety of use cases.

1. Review: What is partitioning

Partitioning is a technique used to divide a database and its content into multiple parts. The primary use of partitioning is for load balancing. However, you can use partitioning to make large data more manageable, which is the gist of this article.

There are 2 types of partitioning: Vertical and horizontal partitioning. While vertical partitioning involves table structure adjustment, horizontal partitioning splits a table’s content into multiple parts. The technique used to manage deleted rows in this article will utilize horizontal partitioning.

According to Wikipedia, more than one table has to be created to implement horizontal partitioning. Fortunately, MySQL 5.6 and above comes with horizontal partitioning support out of the box. Defining a partitioning scheme and MySQL will take care of the rest.

2. Implement new row deletion mechanism

Assume you have an already prepared table named receipt (How the table is structured is at your own idea).

  • The first thing is to add 2 more columns, “disabled” and “month” to the receipt table. A simple ALTER TABLE should suffice.

The “disabled” column will be responsible for keeping track of the soft-deleted rows. The “month” column will record when a row is updated (soft deletion is likely to be the last update query ever executed on a row, therefore “month” of a soft-deleted column will mark its deletion time in month).

  • Execute a horizontal partitioning to this table by the MySQL code below
    ALTER TABLE receipt PARTITION BY LIST COLUMNS(disabled, month)
    (0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6),
    (0, 7), (0, 8), (0, 9), (0, 10), (0, 11), (0, 12)),
    PARTITION pd1m1 VALUES IN ((1, 1)),
    PARTITION pd1m2 VALUES IN ((1, 2)),
    PARTITION pd1m3 VALUES IN ((1, 3)),
    PARTITION pd1m4 VALUES IN ((1, 4)),
    PARTITION pd1m5 VALUES IN ((1, 5)),
    PARTITION pd1m6 VALUES IN ((1, 6)),
    PARTITION pd1m7 VALUES IN ((1, 7)),
    PARTITION pd1m8 VALUES IN ((1, 8)),
    PARTITION pd1m9 VALUES IN ((1, 9)),
    PARTITION pd1m10 VALUES IN ((1, 10)),
    PARTITION pd1m11 VALUES IN ((1, 11)),
    PARTITION pd1m12 VALUES IN ((1, 12)))

Let me explain what is happening here.
The SQL code allows us to divide the receipt into 13 partitions, based on the value of the (“disabled”, “month”) tuple. The first partition, named “pd0”, stores all records that still haven’t been marked as “soft-deleted”, thus “disabled” value is 0. The following 12 partitions store all soft-deleted rows, which will be divided and arranged by deleted month.

  • Create a MySQL trigger that activated when a row is soft-deleted. The trigger will modify the “month” column value to the month when that row is soft-deleted:
    CREATE TRIGGER change_month_column_on_soft_deletion
    IF old.disabled = 0 AND new.disabled = 1 THEN
    SET new.month := month(CURRENT_TIMESTAMP);
    END IF;

The code above is self-explanatory: Change the “month” column value to the month of the timestamp recorded when a row’s “disabled” column value is changed.

  • Now our database will automatically organize soft-deleted rows based on the deleted timestamp’s month. The soft-deleted data must be emptied after a certain duration, therefore a crontab that automatically runs a bash file which will execute hard deletion on rows that marked soft-deleted x months ago (x based on the requirement of the project) needs to be created too. The SQL command to execute is straightforward: We only have to truncate the appropriate partition. The bash check today’s month and perform this SQL command
    /* x is from 1 to 12, which partition will be
    truncated depends on a switch case that takes value from today's month */

The bash file’s executed routine should be once per month.
TRUNCATE has a much faster execution time than DELETE FROM WHERE clause, the difference is profound when dealing with a large set of data. That’s how we take advantage of partition: Load balancing, ease of management and execution.

3. Conclusion

Instead of “month” column, you can use the timestamp column of the table, in this case, the partition will be based on RANGE method. I simplified the example by only using the timestamp’s month so that you can quickly have a glance at how this mechanism works. The “disabled” column can receive other numbers than a binary flag, to indicate different layers of deletion.
Soft deletion has some disadvantages too: The application code must filter soft-deleted rows when querying (this can be alleviated by using MySQL VIEW to do the filtering work), UNIQUE key and CONSTRAINT may not play gracefully when soft deletion is implemented, and storage management problems. However, in real life business, when data matters much more than storage cost, having soft deletion implemented can be a lifesaver. I would be happy if this article can aid you in choosing the proper way to manage deleted data.