JiBX: Performance with JiBX

Performance

JiBX is designed for runtime performance. There are three main aspects of this: efficient unmarshalling, direct access to data, and a lean runtime. The first aspect is probably the most difficult to explain. To understand how it applies in JiBX, you have to know the difference between the common event-driven approach to parsing XML (as implemented by SAX/SAX2 parsers) and the newer pull parser approach (as with XMLPull).

Event-driven parsers deliver document components (elements and character data content) to your code one at a time, using callbacks. You call the parser from your code, passing it a "handler" that processes these callbacks. The parser runs through the entire document, calling your handler for each component, and only then returns. It's up to your handler code to organize the information contained in the document components.
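To make the callback flow concrete, here is a minimal sketch using the SAX API bundled with the JDK (`javax.xml.parsers`, `org.xml.sax`). The class name `SaxEventLog` and the sample document are illustrative only. The handler does nothing but record each callback, so the order in which the parser pushes components at your code is visible. A parser is free to split character data across several `characters()` calls.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEventLog {

    /** Handler that just records each callback, in the order the parser makes them. */
    static class LoggingHandler extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            events.add("start:" + qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Text may arrive in several chunks; each chunk is a separate callback.
            events.add("text:" + new String(ch, start, length));
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            events.add("end:" + qName);
        }
    }

    public static List<String> parse(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        LoggingHandler handler = new LoggingHandler();
        // Control passes to the parser here; it returns only after the whole
        // document has been pushed through the handler.
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        parse("<item><name>Widget</name></item>").forEach(System.out::println);
    }
}
```

Note that the calling code never sees the document directly; everything it learns arrives through the handler.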

The problem with this approach is that your handler code has to track where the parser is in the document and interpret each component accordingly. At the most basic level, for an element containing a text value, the parser first reports the start tag, then one or more chunks of text, and finally the end tag. Your handler generally needs to accumulate the text until the end tag is reported, then do something with it. What it does may depend on other items, such as the attributes of the start tag, so that's more information that needs to be held. The net result is that handler code for event-driven parsers tends to involve a lot of String matching on element names in chained "if" statements (or the equivalent using a Hashtable lookup). This code is messy to write and maintain, and it is also inefficient.
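The bookkeeping described above can be sketched with a SAX handler for a hypothetical `<item>` document with `<name>` and `<price currency="...">` children (the document shape, class name, and `Item` fields are assumptions for illustration, not part of JiBX). Notice the state the handler has to carry between callbacks: a text buffer, an attribute saved from a start tag, and a chain of name comparisons in `endElement`.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxItemHandler extends DefaultHandler {

    /** Hypothetical target object. */
    public static class Item {
        public String name;
        public double price;
        public String currency;
    }

    private final Item item = new Item();
    private final StringBuilder text = new StringBuilder(); // text accumulated so far
    private String currency;                                 // attribute held from start tag to end tag

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // begin collecting content for this element
        if ("price".equals(qName)) {
            currency = atts.getValue("currency");
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Only now is the full text known, so the work happens here,
        // dispatched by string matching on the element name.
        String value = text.toString().trim();
        if ("name".equals(qName)) {
            item.name = value;
        } else if ("price".equals(qName)) {
            item.price = Double.parseDouble(value);
            item.currency = currency;
        }
    }

    public static Item parse(String xml) throws Exception {
        SaxItemHandler handler = new SaxItemHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.item;
    }

    public static void main(String[] args) throws Exception {
        Item item = parse("<item><name>Widget</name><price currency=\"USD\">9.95</price></item>");
        System.out.println(item.name + " " + item.price + " " + item.currency);
    }
}
```

Every additional element type adds another branch to the `if` chain, and any context beyond a single attribute needs more fields on the handler.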

Pull parsing turns the parse event reporting around. Instead of the parser calling methods in your handler to report document components, you call the parser to get each component in turn: the parser becomes essentially an iterator for moving through the components of a document. When you write code using this approach, the state information is implicit in the structure of the code itself. To take the case of an element containing text, you can write your code knowing that the element you're processing has text content, and just process it directly: effectively one call to get the start tag, one call to get the content, and a third to get the end tag. Even better, you can write a method that handles any element containing a text value. Just call the method with the expected element name, and it returns the text content or reports an error. Since most XML documents use a fixed ordering of child elements, processing the children of a particular element becomes as simple as making one call after another to this method.
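The "one method per text element" idea can be sketched as follows. JiBX itself is built on the XMLPull interface; to keep the example self-contained, this sketch swaps in StAX (`javax.xml.stream.XMLStreamReader`), the pull API bundled with the JDK since Java 6. The `<item>` document, `Item` class, and helper name `readText` are hypothetical. Compare the fixed sequence of calls in `readItem` with the handler state in the SAX version.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class PullItemReader {

    /** Hypothetical target object. */
    public static class Item {
        public String name;
        public double price;
    }

    /**
     * Moves to the next start tag, checks that it is the expected element,
     * and returns its text content. On return the reader sits on that element's end tag.
     */
    static String readText(XMLStreamReader reader, String name) throws XMLStreamException {
        reader.nextTag();                                              // skip whitespace up to the next tag
        reader.require(XMLStreamConstants.START_ELEMENT, null, name);  // throws if it is some other element
        return reader.getElementText();                                // start tag, text, end tag in one call
    }

    public static Item readItem(String xml) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            reader.nextTag();
            reader.require(XMLStreamConstants.START_ELEMENT, null, "item");
            Item item = new Item();
            // Children appear in a fixed order, so parsing is just one call after another.
            item.name = readText(reader, "name");
            item.price = Double.parseDouble(readText(reader, "price"));
            reader.nextTag();
            reader.require(XMLStreamConstants.END_ELEMENT, null, "item");
            return item;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        Item item = readItem("<item>\n  <name>Widget</name>\n  <price>9.95</price>\n</item>");
        System.out.println(item.name + " " + item.price);
    }
}
```

Because the expected order is written into the code, a document with the children swapped fails at the first `require` call instead of silently producing a half-filled object.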

JiBX uses a pull parser for unmarshalling so that it can take advantage of this simpler code structure. Because code for pull parsing can be much simpler than code for event-driven parsing, JiBX is able to use byte code enhancement to add both marshalling and unmarshalling code directly to your class definitions. Using byte code enhancement, in turn, lets JiBX take advantage of the other two performance aspects.
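As a rough source-level analogy only: the sketch below hand-writes the kind of marshal and unmarshal methods that byte code enhancement places inside a class. JiBX actually generates byte code with its own method names and runtime interfaces, none of which appear here; `Customer`, its element names, and the `marshal`/`unmarshal`/`toXml`/`fromXml` methods are hypothetical, and StAX again stands in for XMLPull. The point is that pull-style conversion code is short and linear enough to live inside the class it converts.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

public class Customer {
    // Fields stay private: the conversion code below lives inside the class.
    private String firstName;
    private String lastName;

    public Customer() {}

    public Customer(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    /** Writes <customer><first-name>..</first-name><last-name>..</last-name></customer>. */
    public void marshal(XMLStreamWriter out) throws XMLStreamException {
        out.writeStartElement("customer");
        writeText(out, "first-name", firstName);
        writeText(out, "last-name", lastName);
        out.writeEndElement();
    }

    /** Reads the same structure in fixed order, failing on anything unexpected. */
    public static Customer unmarshal(XMLStreamReader in) throws XMLStreamException {
        in.nextTag();
        in.require(XMLStreamConstants.START_ELEMENT, null, "customer");
        Customer customer = new Customer();
        customer.firstName = readText(in, "first-name");
        customer.lastName = readText(in, "last-name");
        in.nextTag();
        in.require(XMLStreamConstants.END_ELEMENT, null, "customer");
        return customer;
    }

    private static void writeText(XMLStreamWriter out, String name, String value)
            throws XMLStreamException {
        out.writeStartElement(name);
        out.writeCharacters(value);
        out.writeEndElement();
    }

    private static String readText(XMLStreamReader in, String name) throws XMLStreamException {
        in.nextTag();
        in.require(XMLStreamConstants.START_ELEMENT, null, name);
        return in.getElementText();
    }

    public static String toXml(Customer customer) throws XMLStreamException {
        StringWriter buffer = new StringWriter();
        XMLStreamWriter out = XMLOutputFactory.newInstance().createXMLStreamWriter(buffer);
        customer.marshal(out);
        out.flush();
        out.close();
        return buffer.toString();
    }

    public static Customer fromXml(String xml) throws XMLStreamException {
        XMLStreamReader in = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            return unmarshal(in);
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = toXml(new Customer("Ada", "Lovelace"));
        System.out.println(xml);
        System.out.println(toXml(fromXml(xml)).equals(xml)); // round trip check
    }
}
```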

"Direct access to data" just means that JiBX accesses data in your objects in whatever way is most natural. Normally this means loading and storing field values directly, in the marshalling and unmarshalling code that byte code enhancement adds to your classes. It can also mean using JavaBean-style get/set methods, if that's a better fit for your code. Because JiBX uses byte code enhancement, there's no need to make the fields or get/set methods public, unlike most other data binding frameworks. There's also no extra runtime cost for accessing the data, unlike frameworks that use reflection to move data in and out of objects.
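The contrast between direct and reflective access can be sketched in plain Java. Both the `Order` class and the two `unmarshal...` methods are hypothetical stand-ins: one represents code living inside the class (as enhancement-added code does), the other a reflection-based framework living outside it. The reflective path has to look the field up by name and switch off access checks before it can write a private field; the direct path is an ordinary assignment.

```java
import java.lang.reflect.Field;

/** Hypothetical data class: the field is private and has no public accessors. */
class Order {
    private int quantity;

    // Stand-in for code added to the class itself: it assigns the private field
    // directly, with no lookup or access check at runtime.
    void unmarshalQuantity(String text) {
        quantity = Integer.parseInt(text);
    }

    // Package-private getter, present only so the demo can print the result.
    int getQuantity() {
        return quantity;
    }
}

public class AccessStyles {

    // Stand-in for a reflection-based framework, which lives outside the class:
    // it must find the field by name and bypass access checks before writing it.
    static void unmarshalReflectively(Order order, String fieldName, String text)
            throws ReflectiveOperationException {
        Field field = Order.class.getDeclaredField(fieldName);
        field.setAccessible(true);
        field.setInt(order, Integer.parseInt(text));
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Order direct = new Order();
        direct.unmarshalQuantity("3");

        Order reflective = new Order();
        unmarshalReflectively(reflective, "quantity", "5");

        System.out.println(direct.getQuantity() + " " + reflective.getQuantity());
    }
}
```

A real reflection-based framework would cache the `Field` object rather than look it up on every call, but each write still goes through the reflection API instead of a plain field store.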

The final performance aspect of JiBX is that the runtime is lean and mean: configuration information is normally processed at application assembly time (after compiling classes) and embedded directly in the code added by byte code enhancement. Other frameworks that read configuration files at runtime (most do not) suffer a performance disadvantage, both from the actual processing time and from the extra code needed to support it. This can make for a very slow start on execution, as that extra code is loaded and compiled to native code by the JVM. JiBX's approach of handling all the configuration before the application runs minimizes startup overhead, both directly and by avoiding unnecessary runtime code. JiBX's use of a pull parser also helps keep the runtime small and efficient, as does its limited validation support: JiBX automatically checks many aspects of a document against your mapping when unmarshalling, but does not support in-memory validation.