Posts Tagged ‘database’

Optimizing MySQL Queries

June 18, 2009

Hey buddies,
I searched for different MySQL optimization techniques and consolidated the results here for everybody's reference. :)

After you delete a lot of rows from a table, you should optimize it:

OPTIMIZE TABLE tableName;

If you have inserted a great number of rows, analyze the table with SQL similar to the following:

ANALYZE TABLE tableName;

Try to avoid NULLs
Nulls are a special case in most databases, and MySQL is no exception to this rule. They require more coding to handle internally, more checking, special index logic, and so on. Some developers simply aren't aware of this, and since NULL is the default, that is how their tables end up being created. However, it's better to use NOT NULL in most cases, and even to use a special value, such as 0 or -1, as your default.
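As a sketch of this advice (the table and column names here are just illustrative):

```sql
-- NOT NULL everywhere, with sentinel defaults instead of NULL
CREATE TABLE accounts (
    id          INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    referrer_id INT NOT NULL DEFAULT 0,             -- 0 means "no referrer"
    balance     DECIMAL(10,2) NOT NULL DEFAULT 0.00 -- 0.00 instead of NULL
);
```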

Optimizing COUNT(my_col) and COUNT(*)

If you're using MyISAM tables, COUNT(*) with no WHERE clause is very fast because the statistics on row counts are exact, so MySQL doesn't have to look at the table at all to get the count. The same can be said for COUNT(my_col) if that column is NOT NULL.

If you are doing count() with a where clause, there’s not much you can do to optimize it further, beyond the obvious of indexing the column in the where clause.
MyISAM keeps an internal cache of table meta-data such as the number of rows. This means that, generally, COUNT(*) incurs no additional cost for a well-structured query. InnoDB, however, has no such cache. For a concrete example, let's say we're trying to paginate a query. If you have a query such as SELECT * FROM users LIMIT 5,10, then running SELECT COUNT(*) FROM users is essentially free with MyISAM but takes about the same amount of time as the first query with InnoDB. MySQL has a SQL_CALC_FOUND_ROWS option which tells InnoDB to calculate the total number of rows as it runs the query; the count can then be retrieved by executing SELECT FOUND_ROWS().
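The pagination pattern looks something like this (a sketch assuming a `users` table):

```sql
-- Fetch one page, and ask MySQL to also count the rows this query
-- would have returned without the LIMIT clause
SELECT SQL_CALC_FOUND_ROWS * FROM users LIMIT 5, 10;

-- Retrieve that total (e.g., for building pager links)
SELECT FOUND_ROWS();
```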

Optimizing Subqueries

MySQL's query optimization engine isn't always the most efficient when it comes to subqueries. That's why it is often a good idea to convert a subquery to a join; joins are already handled properly by the optimizer. Of course, be sure the column you're joining on in the second table is indexed, because on the first table MySQL usually does a full table scan against the subset of rows from the second table. This is part of the nested-loops algorithm, which MySQL often uses to perform join operations.

SELECT a.id,
    (SELECT MAX(created)
     FROM posts
     WHERE author_id = a.id) AS latest_post
FROM authors a

Since this subquery is correlated, i.e., it references a table in the outer query, one should convert the subquery to a join:

SELECT a.id, MAX(p.created) AS latest_post
FROM authors a
INNER JOIN posts p
    ON (a.id = p.author_id)
GROUP BY a.id

Optimizing UNION
UNION has an interesting optimization that exists across a few different databases, and it's obvious when you think about how it works. UNION returns the combined rows of two queries with the duplicates removed. To do this, the MySQL database must return distinct rows, and thus must sort the data. Sorting, as we know, is expensive, especially for large tables.

If you don't need duplicate elimination, UNION ALL can very well be a big speedup: it provides the results you want without the heavy overhead of sorting the data.
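A quick sketch of the difference (the `customers` and `suppliers` tables are hypothetical):

```sql
-- UNION must deduplicate the combined rows, which forces a sort
SELECT city FROM customers
UNION
SELECT city FROM suppliers;

-- UNION ALL keeps duplicates, so no sort is needed
SELECT city FROM customers
UNION ALL
SELECT city FROM suppliers;
```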

Learn Your Indices

Often your choice of indices will make or break your database. For those who haven't progressed this far in their database studies, an index is an auxiliary lookup structure. If we issue the query SELECT * FROM users WHERE last_name = 'Goldstein' and last_name has no index, then your DBMS must scan every row of the table and compare it to the string 'Goldstein'. An index is usually a B-tree (though there are other options), which speeds up this comparison considerably.

You should probably create indices for any field on which you are selecting, grouping, ordering, or joining. Obviously each index requires space proportional to the number of rows in your table, so too many indices winds up taking more memory. You also incur a performance hit on write operations, since every write now requires that the corresponding index be updated.
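A sketch of the idea, continuing the hypothetical `users` example from above:

```sql
-- Index the column used in WHERE clauses
CREATE INDEX idx_users_last_name ON users (last_name);

-- This query can now use the index instead of scanning every row
SELECT * FROM users WHERE last_name = 'Goldstein';
```

The trade-off is the extra space and the index maintenance on every INSERT, UPDATE, and DELETE, as noted above.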

Read Peter Zaitsev’s MySQL Performance Blog if you’re into the nitty-gritty of MySQL performance. He covers many of the finer aspects of database administration and performance.


Exception Handling in SQL Server 2005

June 12, 2009

The release of SQL Server 2005 provided some new and interesting features for developers and administrators. Admittedly, more attention is given to the tasks performed by the administrator, but various functionality enhancements were also added for developers to make their SQL code more powerful and error resistant.

Being a developer, the most useful feature I found was the EXCEPTION handling technique. When programming in .NET or any other language, not wrapping your code in a TRY-CATCH block is asking for trouble.

Similarly, when an error is thrown while executing a stored procedure through your application, you only get the information that some xyz error has occurred. That may be acceptable when your application is in production, but while developing it is not enough, as you have to find out what the error is, which line or procedure generated it, and so on.
Basic syntax

BEGIN TRY
    { sql_statement | statement_block }
END TRY
BEGIN CATCH
    [ { sql_statement | statement_block } ]
END CATCH
[ ; ]

Important Points

Here are some important things to know about TRY-CATCH:

1. TRY-CATCH catches all errors with a severity greater than 10 that do not close the database connection.
2. Each TRY block is associated with only one CATCH block and must be followed immediately by an associated CATCH block.
3. The CATCH block is executed only when any error occurs. If no errors, then the next statement after the END CATCH is executed.
4. The TRY-CATCH block must be in the same batch; i.e., a single TRY-CATCH block cannot span multiple batches. The same applies to stored procedures and triggers.
5. The TRY-CATCH construct cannot be used within a user-defined function.
6. TRY-CATCH constructs can be nested.
7. On ERROR, the control passes to the first statement in the associated CATCH block.
8. Errors trapped by a CATCH block are not returned to the calling application. If any of the error information must be returned to the application, the code in the CATCH block must do so using mechanisms, such as SELECT result sets or the RAISERROR and PRINT statements.
9. If an ERROR occurs in the CATCH block and if the CATCH block contains a nested TRY-CATCH, it will be passed to the nested CATCH block. If there is no nested TRY-CATCH construct, the error is passed back to the caller.
10. The CATCH block does not handle compile errors, such as syntax errors, that prevent a batch from executing. Also, object name resolution errors are not handled by the CATCH block.
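Point 8 can be sketched like this (the table and WHERE clause are purely illustrative):

```sql
BEGIN TRY
    DELETE FROM Orders WHERE OrderID = 999;
END TRY
BEGIN CATCH
    -- Re-raise the trapped error so the calling application sees it
    DECLARE @msg NVARCHAR(2048), @sev INT;
    SELECT @msg = ERROR_MESSAGE(), @sev = ERROR_SEVERITY();
    RAISERROR(@msg, @sev, 1);
END CATCH
```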

The following functions are used to get the ERROR details:

ERROR_NUMBER(), ERROR_SEVERITY(), ERROR_STATE(), ERROR_PROCEDURE(), ERROR_LINE(), ERROR_MESSAGE()

All the above functions can be used in the CATCH block, and they return specific values related to the ERROR generated. If they are used outside a CATCH block, all of them return NULL.
How to USE it?
STEP I: You can create a common procedure that gets you the details of the generated ERROR.

CREATE PROCEDURE ErrorDetails
AS
SELECT
    ERROR_NUMBER() AS ErrorNumber,
    ERROR_SEVERITY() AS ErrorSeverity,
    ERROR_STATE() AS ErrorState,
    ERROR_PROCEDURE() AS ErrorProcedure,
    ERROR_LINE() AS ErrorLine,
    ERROR_MESSAGE() AS ErrorMessage;

This procedure can be executed in your CATCH block.

STEP II: Your actual code where all the processing is done.

BEGIN TRY
    -- your actual processing goes here; this statement simply forces an error
    SELECT 1 / 0;
END TRY
BEGIN CATCH
    EXEC ErrorDetails
END CATCH

On execution of the code in STEP II, the result is a row containing the number, severity, state, procedure, line, and message of the generated error.

In SQL Server 2000, the technique previously used to perform error checking was to inspect @@ERROR and perform a GOTO or RETURN. Using TRY-CATCH in your SQL code will help you monitor your code and handle errors easily.


Categories: database, SQL

GROUP BY function in SQL

June 10, 2009

Hey Guys,
I learned a couple of important things about the GROUP BY clause:


Include only those columns that define the group in both the GROUP BY and SELECT column lists. In other words, the SELECT list must match the GROUP BY list, with one exception: The SELECT list can include aggregate functions. (GROUP BY doesn’t allow aggregate functions.)
In addition, you can't refer to an aliased field in the GROUP BY clause.
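For instance, against a hypothetical Customers table:

```sql
-- Valid: State appears in both lists; COUNT(*) is the allowed exception
SELECT State, COUNT(*) AS Customers
FROM Customers
GROUP BY State;

-- Invalid: the SELECT alias can't be referenced in GROUP BY
-- SELECT State AS Region, COUNT(*) FROM Customers GROUP BY Region;
```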



GROUP BY ALL returns every group, even those eliminated by the WHERE clause. For example, a query similar to the following counts only Kentucky customers but still returns every ZIP group:
SELECT ZIP, Count(ZIP) AS KYCustomersByZIP FROM Customers WHERE State = 'KY' GROUP BY ALL ZIP

The resulting groups comprise all ZIP values in the underlying data. However, the aggregate column (KYCustomersByZIP) displays 0 for any group other than a Kentucky ZIP.

Remote queries don’t support GROUP BY ALL.

Usually, you’ll use HAVING to evaluate a group using an aggregate. For instance, the following statement returns a unique list of ZIP codes, but the list might not include every ZIP code in the underlying data source:
SELECT ZIP, Count(ZIP) AS CustomersByZIP FROM Customers GROUP BY ZIP HAVING Count(ZIP) = 1

Only those groups with just one customer make it to the results.

Get a closer look at WHERE and HAVING

If you’re still confused about when to use WHERE and when to use HAVING, apply the following guidelines:

* WHERE comes before GROUP BY; SQL evaluates the WHERE clause before it groups records.
* HAVING comes after GROUP BY; SQL evaluates HAVING after it groups records.
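Putting the two together against the Customers table used above (the State column is an assumption for illustration):

```sql
SELECT ZIP, Count(ZIP) AS CustomersByZIP
FROM Customers
WHERE State = 'KY'        -- filters rows BEFORE grouping
GROUP BY ZIP
HAVING Count(ZIP) > 5;    -- filters groups AFTER grouping
```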

Summarise the aggregate

You can further summarise data by displaying a subtotal for each group. SQL’s ROLLUP operator displays an extra record, a subtotal, for each group. That record is the result of evaluating all the records within each group using an aggregate function. The following statement totals the OrderTotal column for each group:
SELECT Customer, OrderNumber, Sum(Cost * Quantity) AS OrderTotal FROM Orders GROUP BY Customer, OrderNumber WITH ROLLUP

The ROLLUP row for a group with two OrderTotal values of 20 and 25 would display an OrderTotal of 45. The first record in a ROLLUP result is unique because it evaluates all of the group records. That value is a grand total for the entire recordset.

ROLLUP doesn’t support DISTINCT in aggregate functions or the GROUP BY ALL clause.

The CUBE operator goes a step further than ROLLUP by returning totals for each value in each group. The results are similar to ROLLUP, but CUBE includes an additional record for each column in the group. The following statement displays a subtotal for each group and an additional total for each customer:
SELECT Customer, OrderNumber, Sum(Cost * Quantity) AS OrderTotal FROM Orders GROUP BY Customer, OrderNumber WITH CUBE

CUBE doesn’t support GROUP BY ALL.

Bring order to summaries

When the results of a CUBE are confusing (and they usually are), add the GROUPING function as follows:
SELECT GROUPING(Customer), OrderNumber, Sum(Cost * Quantity) AS OrderTotal FROM Orders GROUP BY Customer, OrderNumber WITH CUBE

The GROUPING function adds a column that contains one of two values for each row:

* The value 1 indicates that the value to the left is a summary value–the result of the ROLLUP or CUBE operator.
* The value 0 indicates that the value to the left is a detail record produced by the original GROUP BY clause.


Categories: database, SQL

Understanding “derived” tables in SQL

June 3, 2009

In concept, a derived table is similar to creating a temporary table and then using the temporary table in your query, but the approach is much simpler, because it can all be done in one step. Derived tables can be tremendously useful in certain situations.

Boost Performance
The biggest benefit of using derived tables over temporary tables is that they require fewer steps, and everything happens in memory instead of a combination of memory and disk. The fewer the steps involved, along with less I/O, the faster the performance. Here are the steps when you use a temporary table:

* Lock tempdb database
* CREATE the temporary table (write activity)
* SELECT data & INSERT data (read & write activity)
* SELECT data from temporary table and permanent table(s) (read activity)
* DROP TABLE (write activity)
* Release the locks

Compare the above to the number of steps it takes for a derived table:

* CREATE locks, unless isolation level of “read uncommitted” is used
* SELECT data (read activity)
* Release the locks

As is rather obvious from this example, using derived tables instead of temporary tables reduces disk I/O and can boost performance.
For example:
Below, I'm going to use the Northwind database that is included with MS-SQL 2000. Let's say you've been asked to generate a report that shows the total number of orders each customer placed in 1996. "Not a problem," you think to yourself. "This is just an easy aggregated join query." So you sit down and come up with the following query:

SELECT C.CustomerID, C.CompanyName, COUNT(O.OrderID) AS TotalOrders
FROM Customers C INNER JOIN Orders O ON
C.CustomerID = O.CustomerID
WHERE YEAR(O.OrderDate) = 1996
GROUP BY C.CustomerID, C.CompanyName

Looks good, but there's something missing: customers that didn't place an order in 1996 aren't showing up. You could leave off the customers that didn't place an order, but in many cases the customers that didn't place orders are the ones that are of the most interest.
If you think you can include those customers with an "is null" check, you'd be wrong. Run the following code and see if you notice what is wrong:

SELECT C.CustomerID, C.CompanyName,
COUNT(O.OrderID) AS TotalOrders
FROM Customers C LEFT JOIN Orders O ON
C.CustomerID = O.CustomerID
WHERE (YEAR(O.OrderDate) = 1996 OR O.OrderDate IS NULL)
GROUP BY C.CustomerID, C.CompanyName

At first glance this may look right, but if you examine the results carefully you’ll see that the customers with the zero count have never placed an order. If a customer has placed an order, but just not in the year 1996 they won’t show up. This is because the “is null” check finds customers that have never placed an order—it still doesn’t do anything to add customers who’ve placed an order, but just not in 1996.

This is where a derived table can come in handy. Instead of using the “Orders” table, we’ll reduce the Orders table to a snapshot of itself—including only data from the year 1996.

SELECT C.CustomerID, C.CompanyName,
COUNT(dOrders.OrderID) AS TotalOrders
FROM Customers C
LEFT JOIN
/* start our derived table */
(SELECT * FROM Orders WHERE YEAR(Orders.OrderDate) = 1996) AS dOrders
/* end our derived table */
ON C.CustomerID = dOrders.CustomerID
GROUP BY C.CustomerID, C.CompanyName

Run the above query and examine the results. You should now see a row returned for each customer and the total number of orders placed in the year 1996, including the customers that didn't place an order.

The reason this works is that the LEFT JOIN will include all matches or null values. If the LEFT JOIN has matches (like in our first query) but none that match the criteria, those rows are excluded. In our derived table, since we're only joining against the orders from the year 1996, we're ensuring that we return a result for each customer.

Updating a derived table
Now the question of updating a derived table. Hmmm, I don't think it is a big issue to mention here, but it is still a question often asked about derived tables. The answer is as follows:

* MS SQL 2000 – Not possible
* MS SQL 2005 – Possible
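A sketch of what SQL Server 2005 allows, assuming the Northwind Orders table with its Freight column (the 10% bump is purely illustrative):

```sql
-- Updating through a derived table works when it maps to a single
-- base table (here, Orders)
UPDATE d
SET d.Freight = d.Freight * 1.10
FROM (SELECT * FROM Orders WHERE YEAR(OrderDate) = 1996) AS d;
```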


Categories: database, SQL