2008-08-10

FindBugs on a Wicket + Spring application

FindBugs is a great static analysis tool and it has helped me find several bugs. Recently we switched to the Wicket 1.3 framework for web development, and we got a few false positives related to using Spring with Wicket.

Serialization

We had Wicket page classes with fields being injected by Spring using the @SpringBean annotation.
FindBugs complained that these fields should be serializable or transient (SE_BAD_FIELD error), which looked like a real issue since Wicket relies heavily on serialization to store pages in its session.
In fact, it turned out to be a false positive: Wicket does not inject the Spring beans themselves but serializable dynamic proxies that look up the beans when needed.
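
For reference, here is a minimal sketch of the pattern that triggers the warning; the page and service names are invented for illustration.

import org.apache.wicket.markup.html.WebPage;
import org.apache.wicket.spring.injection.annot.SpringBean;

// Hypothetical service interface, deliberately not Serializable.
interface AuditService {
    void pageViewed(String pageName);
}

public class AuditedPage extends WebPage {

    // FindBugs reports SE_BAD_FIELD here because the page is serializable and
    // AuditService is not; at runtime Wicket injects a serializable lazy-lookup
    // proxy instead of the bean itself, so the warning is a false positive.
    @SpringBean
    private AuditService auditService;

    @Override
    protected void onBeforeRender() {
        super.onBeforeRender();
        auditService.pageViewed(getClass().getSimpleName());
    }
}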


Uninitialized fields in constructor

Another FindBugs complaint in Wicket page classes was about dereferencing fields in the constructor before they had been initialized (UR_UNINIT_READ error).
Again, this turned out to be a false positive: our pages extend a WebPage base class that takes care of injecting all @SpringBean-annotated fields using a PropertyResolver, so the fields are already set when our constructor code runs.
Something to remember: the default PropertyResolver can set private fields directly and ignores setters.
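
Here is a sketch of the two pieces involved, with invented names; the exact injection call in the base class depends on how wicket-spring is wired in the application, so it is only hinted at in a comment.

import org.apache.wicket.markup.html.WebPage;
import org.apache.wicket.markup.html.basic.Label;
import org.apache.wicket.spring.injection.annot.SpringBean;

// Hypothetical catalog service.
interface CatalogService {
    String featuredProductName();
}

// Base class whose constructor injects @SpringBean fields of the concrete page,
// e.g. with InjectorHolder.getInjector().inject(this) in Wicket 1.3.
abstract class BasePage extends WebPage {
    protected BasePage() {
        // injection of @SpringBean fields happens here, before any subclass constructor body
    }
}

public class CatalogPage extends BasePage {

    @SpringBean
    private CatalogService catalogService;

    public CatalogPage() {
        // FindBugs reports UR_UNINIT_READ for this read, but the BasePage
        // constructor has already injected catalogService at this point.
        add(new Label("featured", catalogService.featuredProductName()));
    }
}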

Conclusion

Using Spring beans in Wicket pages introduces a lot of dynamic behavior that defeats static analysis of object initialization and serialization.
FindBugs is valuable enough that it is worth excluding these rules on pages, and a naming convention for these classes makes it easier to do: our page classes are now named with a "Page" suffix.

2008-04-19

My first (useful) script in Groovy

We had a large number of XML files to modify and found that Groovy with its GPath syntax was the right tool. Here is an example that takes a (simplified) XML file and changes the value of one attribute for a subset of nodes.

<?xml version="1.0"?>
<design>
  <process>
    <variable id="V1" visible="true" />
    <variable id="V2" visible="false" />
    <variable id="V3" visible="false" />
  </process>
</design>


And here is the script that patches all XML files in the current directory.
I found it easier to write and debug than a mix of Java and XSLT.


def basedir = new File(".")

// Create a directory for patched files
new File("patched").mkdir()

// Get files with ".xml" extension
files = basedir.listFiles().grep(~/.*\.xml$/)

// Iterate on the files
files.each {
    patchXML(it)
}

def patchXML(file) {
    println "-------------------"
    println "Patching $file.name"

    def design = new XmlParser().parse(file)
    def modified = false

    for (variable in design.process.variable) {
        switch (variable.@id) {
            case "V1":
            case "V2":
                if (variable.@visible == "false") {
                    variable.@visible = "true"
                    modified = true
                }
        }
    }

    if (modified) {
        new File("patched/$file.name").withPrintWriter {
            new XmlNodePrinter(it).print(design)
        }
        println "Patched $file.name is in \"patched\" directory"
    } else {
        println "$file.name was not modified"
    }
    println "-------------------"
}

2008-04-10

Hibernate without a DBA: a sure path to failure!

Recently, my team had to deploy a Java web application that we bought from a small company. They used Hibernate as their object-relational mapping tool and were using it to generate their database schema.

In order to make their installation process easier, the application created or updated the database schema at startup time. This may sound like a nice idea, but in most companies you don't want to grant your application DROP, CREATE or ALTER privileges, as it could become a security vulnerability if your web application gets hacked. Fortunately, Hibernate provides Ant tasks to generate your schema creation script, and if a customer runs a different database server you can generate a script for it.
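
The same tool that backs the Ant task can also be run from a small Java class to produce a script your DBA can review before it is executed; this is a minimal sketch against the Hibernate 3.x API, and the output file name is just an example.

import org.hibernate.cfg.Configuration;
import org.hibernate.tool.hbm2ddl.SchemaExport;

// Generates the schema creation script offline instead of letting the
// application create or alter tables when it starts up.
public class GenerateSchemaScript {

    public static void main(String[] args) {
        Configuration cfg = new Configuration().configure(); // reads hibernate.cfg.xml
        SchemaExport export = new SchemaExport(cfg);
        export.setOutputFile("create-schema.sql");
        export.setDelimiter(";");
        export.create(true, false); // write the script, do not execute it against the database
    }
}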

That's great but as soon as we started to run our load tests, we got tons of deadlock errors on SQL Server 2005:

Transaction (Process ID 54) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

A typical developer reaction is to accuse Hibernate, the driver or the database engine, but after a quick search on Google I was under the impression that it was more likely a bug in the application.

Application bugs


The show_sql flag in the Hibernate configuration is a useful tool: it helped the developers find that under some circumstances the application would save the same data twice. They fixed the code, we ran the load test again and observed a great performance improvement, but the deadlock errors were still there.

Another search on Google left me with more questions than answers, with a few exceptions like this excellent article from Bart Duncan. After reading it, you should be convinced that you need help from a good DBA. We were fortunate to have a good DBA team, even though I found it difficult to explain to them what statements or queries we were using, because they were generated by Hibernate and not written by ourselves.
The most difficult part was obtaining a trace on the database server while reproducing a deadlock. Many trace flags are too invasive and change your timing, preventing you from reproducing the deadlock. We found that trace flag 1222 was the most helpful for getting data without altering the execution timing.

Unicode encoding

Our DBA analyzed the data and found a deadlock cause: inserting Unicode strings into an indexed varchar column. This forced an implicit conversion and an index scan, creating contention between two threads executing the same statement on different rows but through the same index. The solution is either to convert the column types to nvarchar or to change your JDBC driver settings so that string parameters are not sent as Unicode; both options also bring an additional performance gain.
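
With the Microsoft SQL Server JDBC driver, for instance, this is a connection string property; the host, database and credentials below are placeholders.

import java.sql.Connection;
import java.sql.DriverManager;

public class SqlServerConnectionExample {

    public static Connection open() throws Exception {
        Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver");
        // sendStringParametersAsUnicode=false makes the driver send string
        // parameters as varchar, avoiding the implicit conversion on indexed
        // varchar columns described above.
        String url = "jdbc:sqlserver://dbhost:1433"
                + ";databaseName=myapp"
                + ";sendStringParametersAsUnicode=false";
        return DriverManager.getConnection(url, "appuser", "secret");
    }
}

In a Hibernate application the same URL simply goes into hibernate.connection.url or the data source definition.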

Missing index on a foreign key

We changed the JDBC driver setting and ran the load test again: deadlocks again, but new ones.
Our DBA analyzed the new trace and quickly found a foreign key in a one-to-many relationship that was not indexed; this resulted in an index scan and deadlocks on concurrent updates. An index on such a column is something you can and should declare in your Hibernate mappings.
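
With Hibernate Annotations this can be declared directly on the many-to-one side so that schema generation creates the index; the entity and column names below are invented.

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.JoinColumn;
import javax.persistence.ManyToOne;
import org.hibernate.annotations.Index;

@Entity
class PurchaseOrder {
    @Id @GeneratedValue
    private Long id;
}

// The "many" side of the relationship: declaring an index on the foreign key
// column lets hbm2ddl generate it, avoiding the index scan on concurrent updates.
@Entity
public class OrderLine {

    @Id @GeneratedValue
    private Long id;

    @ManyToOne
    @JoinColumn(name = "ORDER_ID")
    @Index(name = "IX_ORDERLINE_ORDER_ID")
    private PurchaseOrder order;
}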

Other issues

You can also get deadlocks when using clustered indexes. It turns out that primary keys in SQL Server are clustered indexes by default, so with a Hibernate-generated schema you will get clustered indexes on all your primary keys. This is usually not a problem, especially for naturally growing keys like identity columns, but it can become one if you use randomly generated strings as your ids.

Conclusion

Hibernate, like other ORMs, is a useful tool, but you should use its schema generation feature for what it is: a way to speed up your initial development. You will not avoid fine-tuning your table definitions and indexes, and only an experienced DBA can help you there.

Do not believe that by using Hibernate you will be able to migrate easily from one database engine to another: you will almost always end up tuning your schema to solve deadlocks or performance issues in an engine-specific way.

2008-02-27

Public Maven repositories and wrong POMs

I experienced an issue with log4j 1.2.15: it comes with a POM that wrongly forces you to include extra dependencies that should be optional (e.g. JavaMail if you don't plan to use the SMTPAppender). This issue has been reported to the log4j team as bug #43304.

My first reaction was to exclude these dependencies in my project's pom.xml, but this did not work (I did not take the time to investigate why). In the end, it was faster to patch the log4j POM in our intranet repository.

Lesson learned: managing dependencies can be hard, and you should not rely blindly on public repositories, as they are sometimes wrong. Make sure your team has an intranet repository, and make it simple to update with a good repository manager like Artifactory.