Natural Ordering in MySQL

Published in Databases on Nov 5, 2008

I ran into an instance recently where I wanted to implement natural sorting of a result set in MySQL.

When you're dealing with numerical strings or strings with a common non-numeric prefix, the common solution of casting the order column to an integer by adding zero to it works fine. However, if neither of the aforementioned conditions is the case, it takes a little more work.

What actually happens when you add zero to a non-numeric column depends on the characters at the beginning of the column value. If the column does not begin with a sequence of one or more numeric characters, then adding zero to that column produces zero (e.g. "dog" + 0 = 0).

If the column does begin with numeric characters, then adding zero to it produces the sequence of numeric characters up to the first non-numeric character in the original value or the end of the value, whichever comes first (e.g. "12 dogs" + 0 = 12).

An example might be the easiest way to illustrate this.

mysql> SELECT name+0<>0, name+0, name 
    -> FROM `recommendation` 
    -> ORDER BY name+0<>0 DESC, name+0, name;
+-----------+--------+------------------------+
| name+00 | name+0 | name                   |
+-----------+--------+------------------------+
|         1 |      3 | 3 month follow-up      | 
|         1 |      6 | 6 month follow-up      | 
|         1 |     12 | 12 month follow-up     | 
|         0 |      0 | Intervention           | 
|         0 |      0 | Observation            | 
|         0 |      0 | Specialty Consultation | 
+-----------+--------+------------------------+
6 rows in set (0.00 sec)

The first ORDER BY clause checks the string to see if it begins with numeric characters, then places results for those that do first. If you prefer that numeric results appear after non-numeric results, then you can exclude this clause.

The second ORDER BY clause orders the numeric results by casting them to integers and ordering by those integers.

The third ORDER BY clause orders the non-numeric results by the original column value.

And that's all there is to it. Hope this proves helpful to someone.