Let's say I have a table with an integer column called ID being a primary key. It is unique and not-null, but we cannot guarantee it being sequential without gaps. I.e. we may have rows with ID = 1,2,6,7,8 and we do not have rows with ID 3,4,5 etc etc. This is why we cannot just generate a random number and get a row with that corresponding ID.
Also we want all rows to have equal chance of being selected, so b/c of the gaps in ID values we cannot use simple approach with random number in 0 to max(id) range
The number of rows in the table is not known either.
Here is the table schema:
CREATE TABLE IF NOT EXISTS `test_data` (
`id` int(10) unsigned NOT NULL,
`create_date` datetime DEFAULT NULL,
`text_1` varchar(255) NOT NULL DEFAULT 'BMqFXUslYnGsYsPxHTtZVbcwnEWFmSXxTAUV9YxXXDH5ClUEUO8kFz0cW1xC3o9aMSwabnEr43W23KZnKvrk8PHEJv18SU5JHTH72sLTtleitBJBIWmIpul7LtuYOpc4iRDqEAT80UeG7L2l4r1pr2jEMW7222reAOuIcBIUcsH9LYlojeQjVkc9ZhYXgnN3xRGHLJ3L0MGoXO4GHttEv053DqkkKYEye34bpGI2tJ0IE9M8BIFf2u08jB50nhD',
`text_2` varchar(255) NOT NULL DEFAULT 'hoA6tWi8AEcikkJM50Mz800PGTUKNnyj3OCKhyJ4ExaJf6bYbqXlNWo4y0XXXo7HuvsNgYWnn16211RbKDesQ852QA33s1eni4pBoraEs3YiV0W69yMY7Nf0pvQI198HUVKYPWk9zpK38PDphtPJXO2z5Wb8mbBN0gN8iK5xzUQQDwoAJlO3Z8xXn2OWyVjKswRbZNKW6l0tvn0zN4S4BoR9gkN7s4Ov9tTGeF4uwWYhPEs0WsDqatMjmbnMQmC'
) ENGINE=InnoDB;
ALTER TABLE `test_data`
ADD PRIMARY KEY (`id`), ADD UNIQUE KEY `id` (`id`);
How can I select a random row from that MySQL table using just SQL syntax? I.e. without using PHP code? Thanks!
I've populated your test table with 20,000 rows. Then I have tested (and benchmarked for performance) 3 different ways to get your random row. The approach #2 was the fastest one:
SELECT * FROM `test_data`
ORDER BY RAND()
LIMIT 1
Execution plan:
As you may see, it requires full table scan, i.e. reading all 20,000 rows, calculating 20,000 random numbers, then SORTING 20,000 rows (by RAND), then fetching the first row after sorting. Also, the sorting happens not in memory but using I/), i.e. "using filesort".
So I decided to try making it a bit more efficient - see below ...
Instead of the full table scan we just go through full index scan to get the random id, then fetch the complete record once. The execution time reduced from 0.09 to 0.07 sec.
SELECT t2.* FROM
(SELECT id FROM `test_data`
ORDER BY RAND()
LIMIT 1) AS t1
INNER JOIN test_data as t2 ON t1.id = t2.id
The idea was to get random row by performing random number of FETCHES from the unsorted cursor in the loop. The last fetched ID would be the ID of resulting random record. In theory, benefits of this approach would include:
here is the code
CREATE PROCEDURE test()
BEGIN
DECLARE done INT DEFAULT FALSE;
DECLARE current_id INT;
DECLARE row_counter INT;
DECLARE random_row_number INT;
DECLARE cur1 CURSOR FOR SELECT id FROM test_data;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET @done = TRUE;
SELECT @row_counter := COUNT(*) FROM test_data;
SET @random_row_number = floor(1 + (rand() * @row_counter));
SET @row_counter = 1;
OPEN cur1;
read_loop: LOOP
FETCH cur1 INTO current_id;
IF @done THEN
LEAVE read_loop;
END IF;
IF @row_counter >= @random_row_number THEN
LEAVE read_loop;
END IF;
SET @row_counter = @row_counter + 1;
END LOOP;
CLOSE cur1;
SELECT * FROM test_data WHERE id = current_id;
END
unfortunately, this procedural approach was good in theory but turned to be the slowest in practice. Most likely b/c MySQL cursor operations slowed things down. If anyone has any other idea why this approach with random number of fetches was so slow, please comment!
Thanks!
FavScripts.com is a free tool to save your favorite scripts and commands, then quickly find and copy-paste your commands with just few clicks.
Boost your productivity with FavScripts.com!