[Dbix-class] MySQL "OR" Performance for Large Query Sets

Jon Schutz jon+dbix at youramigo.com
Sun Nov 16 12:41:02 GMT 2008


I've just done some benchmarking on MySQL 5 for queries where you need
to select back many individual records, e.g. you could use

SELECT ... WHERE id=? (and iterate for all possible id values)

or

SELECT ... WHERE id=1 OR id=2 OR id=3 OR ...

or

SELECT ... WHERE id IN (1, 2, 3, ...)

and there are several other ways to do it.

What I found was that using OR, at a certain point (when selecting >
about 1000 keys) the performance went exponentially sour - like, to
quote one benchmark, the OR query took 6400 secs vs 30s typical for
other methods.  I've written the results up in more detail at
http://notes.jschutz.net/19/perl/mysql-many-row-select-performance -
have a look at the graph if nothing else.  Using 'IN' is the best
solution for any sort of large key set.

I figure this is relevant to DBIx::Class since it uses SQL::Abstract
which will translate a search spec such as A => [ 1 .. 1000 ] into (A=?
OR A=? ... ) 1000 times, which could be a real gotcha (indeed, some of
my own code, which I shall now be reviewing, can expect arrays of tens
of thousands of keys...).

I was wondering - has anyone else seen similar behaviour?


-- 

Jon Schutz                        My tech notes http://notes.jschutz.net
Chief Technology Officer                        http://www.youramigo.com
YourAmigo         




More information about the DBIx-Class mailing list