Optimizing MySQL: ORDER BY RAND()

by Matt Mecham on December 14, 2010

in Articles

As I wrap up development of a project, the last thing I generally do is go through and optimize the code and MySQL work. Generally when I race through development I do enough on-the-spot optimization to ensure good practises but I don’t obsess over the minute details.

One of the functions of IP.Gallery 4 is to show a feature image. I did this quickly by using ORDER BY RAND() even though I was aware that it’s horribly inefficient and doesn’t scale well. Indeed, I’ve read reports where a million rows can take more than 60 seconds to return a result.

This is one of those times where it seems to you do to more work to save a little time. Lets examine the original query:

SELECT i.*,a.*,m.members_display_name, m.members_seo_name, m.member_id
	FROM gallery_images i
		LEFT JOIN gallery_albums_main a ON ( a.album_id=i.img_album_id )
		LEFT JOIN members m ON ( m.member_id=i.member_id )
	WHERE ( ( i.image_privacy IN (2,1,0,3) AND i.member_id=1 )
		OR ( i.image_privacy=1 AND i.member_id !=1 AND ( ( ( FIND_IN_SET(4,i.image_parent_permission) )
		OR ( i.image_parent_permission='*' ) )) )
		OR ( i.image_privacy IN (1,2) AND i.member_id IN(12,14,15,66,100) ) ) AND i.approved IN(0,1) AND i.idate > 1260786609 AND i.image_feature_flag > 0
ORDER BY RAND()
LIMIT 0,1;

This is fairly straightforward. It selects a single row based on selective WHERE and returns a RAND() result. However, MySQL doesn’t optimize this well and the process is:


+----+-------------+-------+--------+-------------------------------------------------------------+--------------------+---------+---------------------+------+----------------------------------------------+
| id | select_type | table | type   | possible_keys                                               | key                | key_len | ref                 | rows | Extra                                        |
+----+-------------+-------+--------+-------------------------------------------------------------+--------------------+---------+---------------------+------+----------------------------------------------+
|  1 | SIMPLE      | i     | range  | member_id,im_select,gb_select,image_feature_flag,rnd_lookup | image_feature_flag | 4       | NULL                |  160 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | a     | eq_ref | PRIMARY                                                     | PRIMARY            | 8       | ipb3.i.img_album_id |    1 |                                              |
|  1 | SIMPLE      | m     | eq_ref | PRIMARY                                                     | PRIMARY            | 3       | ipb3.i.member_id    |    1 |                                              |
+----+-------------+-------+--------+-------------------------------------------------------------+--------------------+---------+---------------------+------+----------------------------------------------+

To fetch the result, MySQL has to dump the results to a temporary table then filesort them (and probably not a qsort either).

The solution I used is documented here and works very well. In a nutshell, you select a RANDOM number between 0 and the MAX id in the table and then you select the nearest neighbour in case the ID selected doesn’t exist as it has been deleted (what is referred to as a ‘hole’ in the table). This will never be as truly random because if your table had numbers 100 to 300 missing and the ID returned was between 101 and 299, it would always select 300 but for our application this is more than fine.

The new query looks horrifically complex but because it is all indexed it is incredibly quick:

SELECT i.*, a.*, m.members_display_name, m.members_seo_name, m.member_id
	FROM gallery_images AS i
		JOIN (SELECT (RAND() *
			(SELECT MAX(id)
				FROM gallery_images FORCE INDEX (rnd_lookup)
					WHERE ( ( image_privacy IN (2,1,0,3) AND member_id=1 )
						OR ( image_privacy=1 AND member_id !=1
							AND ( ( ( FIND_IN_SET(4,image_parent_permission) )
						OR ( image_parent_permission='*' ) )) )
						OR ( image_privacy IN (1,2)
							AND member_id IN(12,14,15,66,100) ) )
							AND approved IN(0,1)
							AND idate > 1260786406 AND image_feature_flag > 0 ) ) AS id ) AS rand2
		LEFT JOIN gallery_albums_main a ON (i.img_album_id=a.album_id)
		LEFT JOIN members m ON (i.member_id=m.member_id)
	WHERE ( ( i.image_privacy IN (2,1,0,3) AND i.member_id=1 )
	  OR ( i.image_privacy=1 AND i.member_id !=1
	  	AND ( ( ( FIND_IN_SET(4,i.image_parent_permission) )
	  OR ( i.image_parent_permission='*' ) )) )
	  OR ( i.image_privacy IN (1,2)
	  	AND i.member_id IN(12,14,15,66,100) ) )
	  	AND i.approved IN(0,1)
	  	AND i.idate > 1260786406
	  	AND i.image_feature_flag > 0
	  	AND i.id >= rand2.id
ORDER BY i.id ASC
LIMIT 1;

And this is shown in the EXPLAIN:

+----+-------------+------------+--------+---------------------------------------------------------------------+------------+---------+---------------------+------+----------------+
| id | select_type | table      | type   | possible_keys                                                       | key        | key_len | ref                 | rows | Extra          |
+----+-------------+------------+--------+---------------------------------------------------------------------+------------+---------+---------------------+------+----------------+
|  1 | PRIMARY     |  | system | NULL                                                                | NULL       | NULL    | NULL                |    1 |                |
|  1 | PRIMARY     | i          | range  | PRIMARY,member_id,im_select,gb_select,image_feature_flag,rnd_lookup | PRIMARY    | 8       | NULL                |  133 | Using where    |
|  1 | PRIMARY     | a          | eq_ref | PRIMARY                                                             | PRIMARY    | 8       | ipb3.i.img_album_id |    1 |                |
|  1 | PRIMARY     | m          | eq_ref | PRIMARY                                                             | PRIMARY    | 3       | ipb3.i.member_id    |    1 |                |
|  2 | DERIVED     | NULL       | NULL   | NULL                                                                | NULL       | NULL    | NULL                | NULL | No tables used |
|  3 | SUBQUERY    | x          | range  | rnd_lookup                                                          | rnd_lookup | 7       | NULL                |  158 | Using where    |
+----+-------------+------------+--------+---------------------------------------------------------------------+------------+---------+---------------------+------+----------------+
6 rows in set (0.02 sec)

Even on my local test board which as around 200 images, this takes the query from 0.18 down to 0.004 which is impressive.

This proves that you shouldn’t be afraid to use sub queries or joins because when used correctly they are often more efficient than a straight query.

{ 1 comment }

1 u─čur mirza zeyrek January 11, 2012 at 2:43 am

i think you should provide a little bit more simple examples.

Comments on this entry are closed.

Previous post:

Next post: