2008-04-10

Hibernate without a DBA: a sure path to failure!

Recently, my team had to deploy a Java web application that we bought from a small company. They used Hibernate as their Object Relational Mapping and were using it to generate their database schema.

In order to make their installation process easier, the application was creating or updating the database schema at startup time. This may sound like a nice idea but in most companies, you don't want to grant your application DROP, CREATE or ALTER privileges as it could be a security vulnerability if your web application gets hacked. Fortunately, Hibernate provides you with ant tasks to generate your schema creation script and if your customer runs on a different database server, you will generate a script for it.

That's great but as soon as we started to run our load tests, we got tons of deadlock errors on SQL Server 2005:

Transaction (Process ID 54) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
A typical developer reaction is to accuse Hibernate, the driver or the database engine. After a quick search on Google I was under the impression that it should be a bug in the application.

Application bugs


The show_sql flag in Hibernate configuration is a useful tool, it helped the developers to find that under some circumstances the application would save the same data twice. They fixed the code,we ran the load test again and observed a great performance improvement but still deadlock errors.

Another search on Google leaved me with more questions than answers with few exceptions like this excellent article from Bart Duncan. After reading it, you should be convinced that you need some help from a good DBA. We were fortunate to have a good DBA team even though I found it difficult to explain them what statements or queries we were using because they were generated by Hibernate and not written by ourselves.
The most difficult part was to obtain a trace on the database server when reproducing a deadlock. Many trace flags can be too invasive and change your timing preventing you to reproduce the deadlock. We found that the trace flag 1222 was the most helpful to get data without modifying the execution timing.

Unicode encoding

Our DBA analyzed the data and found a deadlock cause: inserting unicode strings into a varchar column that was indexed, this caused a conversion and index scan resulting in a concurrency between 2 threads executing same statement updating different rows but using same index. Solution: convert column types to nvarchar or change your JDBC driver settings to avoid encoding string parameters in Unicode, both will also bring you an additional gain in performance.

Missing index on a foreign key

We changed the JDBC driver setting and ran again the load test: deadlocks again but new ones.
Our DBA did analyze the new trace and quickly found a foreign key in a one-to-many relationship which was not indexed, this resulted in an index scan and deadlock on concurrent updates. This is something that you can and should specify in your Hibernate mappings.

Other issues

You can get deadlocks also when using clustered indexes. It turns out that by default primary keys in SQL Server are clustered indexes. So by using Hibernate schema, you will get clustered indexes for all your primary keys, it's usually not a problem especially for naturally growing keys like identity columns but if you are using randomly generated strings for your ids it can be a problem.

Conclusion

Hibernate like other ORMs is a useful tool but you should use its schema generation feature for what it is: a help to speed up your initial development but you will not avoid fine tuning of your table definitions, indexes and only an experimented DBA can help you there.

Do not believe that by using Hibernate you will be able to migrate easily from one database engine to another one, you will almost always end up tuning your schema to solve deadlocks or performance issues in a specific way.

Aucun commentaire: