
Optimizing MySQL Queries

June 18, 2009

Hey buddies,
I searched for different MySQL optimization techniques and consolidated all the results here for everybody's reference. :)

After you delete a lot of rows from a particular table, you should optimize it:

OPTIMIZE TABLE tableName;

If you have inserted a great number of rows, analyze the table with SQL similar to the following:

ANALYZE TABLE tableName;

Try to avoid NULLs
Nulls are a special case in most databases, and MySQL is no exception to this rule. They require more coding to handle internally, more checking, special index logic, and so on. Some developers simply aren't aware of this, and since NULL is the default, that is how their tables end up being created. However, it's usually better to declare columns NOT NULL and use a special value, such as 0 or -1, as the default.
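For example, a minimal sketch of a column declared NOT NULL with a sentinel default (the table and column names here are purely illustrative):

CREATE TABLE orders (
    id INT NOT NULL AUTO_INCREMENT,
    discount DECIMAL(5,2) NOT NULL DEFAULT 0.00, -- 0 stands in for "no discount" instead of NULL
    PRIMARY KEY (id)
);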

Optimizing COUNT(my_col) and COUNT(*)

If you're using MyISAM tables, COUNT(*) with no WHERE clause is very fast because the row count stored in the table statistics is exact, so MySQL doesn't have to look at the table at all to get the count. The same can be said for COUNT(my_col) if that column is NOT NULL.

If you are doing COUNT() with a WHERE clause, there's not much you can do to optimize it further, beyond the obvious step of indexing the column in the WHERE clause.

MyISAM keeps an internal cache of table meta-data such as the number of rows. This means that, generally, COUNT(*) incurs no additional cost for a well-structured query. InnoDB, however, has no such cache. For a concrete example, let's say we're trying to paginate a query. If you have a query like SELECT * FROM users LIMIT 5,10, running SELECT COUNT(*) FROM users is essentially free with MyISAM but takes roughly as long as the first query with InnoDB. MySQL has a SQL_CALC_FOUND_ROWS modifier which tells it to calculate the total number of matching rows as it runs the query; the count can then be retrieved by executing SELECT FOUND_ROWS().
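A minimal sketch of that pagination pattern (the users table is just an example):

SELECT SQL_CALC_FOUND_ROWS * FROM users LIMIT 5, 10;
SELECT FOUND_ROWS(); -- total rows the first query would match without the LIMIT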

Optimizing Subqueries

MySQL's query optimizer isn't always the most efficient when it comes to subqueries. That's why it is often a good idea to convert a subquery to a join, which the optimizer already handles well. Of course, be sure the column you're joining on in the second table is indexed. MySQL often uses a nested-loops algorithm to perform joins: for each row of the first table it looks up the matching rows in the second, and without an index on the join column each lookup degenerates into a full table scan.

SELECT a.id,
       (SELECT MAX(created)
        FROM posts
        WHERE author_id = a.id) AS latest_post
FROM authors a;

Since this subquery is correlated, i.e., it references a table in the outer query, one should convert it to a join:
SELECT a.id, MAX(p.created) AS latest_post
FROM authors a
INNER JOIN posts p
ON (a.id = p.author_id)
GROUP BY a.id;
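To check that the join column is indexed, one approach (the index name here is just illustrative) is to add the index and inspect the plan with EXPLAIN:

ALTER TABLE posts ADD INDEX idx_author_id (author_id);

EXPLAIN SELECT a.id, MAX(p.created) AS latest_post
FROM authors a
INNER JOIN posts p ON (a.id = p.author_id)
GROUP BY a.id;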

Optimizing UNION

UNION has an interesting optimization that exists across a few different databases, and it's obvious when you think about how UNION works: it combines the rows returned by two queries and removes the duplicates. To do this the MySQL database must return distinct rows, and thus must sort the data. Sorting, as we know, is expensive, especially for large tables.

If your result sets can't contain duplicate rows, or you don't mind a few, UNION ALL can very well be a big speedup: it provides the combined results without the heavy overhead of sorting the data.
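A quick sketch of the difference, using hypothetical tables:

-- UNION removes duplicates, which may force a sort
SELECT email FROM customers
UNION
SELECT email FROM subscribers;

-- UNION ALL keeps duplicates and skips the de-duplication work
SELECT email FROM customers
UNION ALL
SELECT email FROM subscribers;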

Learn Your Indices

Often your choice of indices will make or break your database. For those who haven't progressed this far in their database studies, an index is an auxiliary lookup structure kept alongside the table. If we issue the query SELECT * FROM users WHERE last_name = 'Goldstein' and last_name has no index, your DBMS must scan every row of the table and compare it to the string 'Goldstein'. An index is usually a B-tree (though there are other options), which speeds up this comparison considerably.
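For instance, assuming the users table above, an index on last_name could be added like this (the index name is just an example):

CREATE INDEX idx_last_name ON users (last_name);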

You should probably create indices for any field on which you are selecting, grouping, ordering, or joining. Each index requires space roughly proportional to the number of rows in your table, so too many indices wind up taking significant storage and memory. You also incur a performance hit on write operations, since every write now requires the corresponding indices to be updated.

Read Peter Zaitsev’s MySQL Performance Blog if you’re into the nitty-gritty of MySQL performance. He covers many of the finer aspects of database administration and performance.