Oscar Westra van Holthe - Kind

Persistence APIs - JDO vs. JPA

JDO (Java Data Objects) and JPA (Java Persistence API) are two APIs for persisting objects in a database. Both work with Java 1.5+, and both support Java EE and Java SE.

They differ only in the way they store data and in the options they give an application. The most important differences are easiest to explain as features JDO has (and JPA does not), and vice versa.



In short one can conclude that with a few exceptions (most notably the Criteria API from JPA), JDO offers the most flexibility. Yet I will still argue that JPA is the better choice.


Keep It Simple & Stupid!

When you have the opportunity to design the architecture of an application, you should take into account that it will be used for years. All that time the application will be maintained, extended and changed by different people. So it is very important that all these people fully understand how the architecture works, and especially how they should structure their changes. The how is important, as otherwise the application will degrade into a legacy application.

In other words: it must be simple.

This simplicity can be achieved in several ways:

Choose a ubiquitous paradigm

Relational databases are used so much that they implicitly influence how programmers understand a data model. Most programmers make assumptions that hold for a relational database, whether they are using an OODBMS, Google's Bigtable or any other database. This means those assumptions are usually wrong for non-relational databases!

For example, many programmers assume the ACID properties, which do not hold for Google Bigtable and other 'NoSQL' databases. Also, few programmers realize that it is not always easy to filter a production-size database to make a smaller test set. This is especially true if system administrators do not have access to the data (in a way they understand). The tools that exist for this usually only support relational databases. Other databases usually do not score well on this point.

Another assumption programmers make is that the number of queries the ORM layer makes will be reasonable. But I have seen firsthand that when using JDO fetch plans (which are global to all queries in a transaction), this is not the case. For one screen JDO generated so many queries that you could have lunch and come back just as the transaction timed out. Naturally this is an extreme example. The insidiousness of the assumption is that it holds often enough to create problems that are difficult to solve.
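To make the query count concrete, here is a minimal sketch in plain Java that only simulates a datastore and counts the queries it receives. It is not JDO or JPA code: CountingStore, QueryCountDemo and the order/line data are hypothetical names invented for illustration. It contrasts loading a relation once per row with loading everything in a single joined query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy stand-in for a database that counts the queries it receives (hypothetical). */
class CountingStore {
    // Hypothetical data: order id -> the lines of that order.
    private final Map<Integer, List<String>> linesByOrder = new TreeMap<>();
    int queryCount = 0;

    CountingStore(int orders) {
        for (int id = 1; id <= orders; id++) {
            linesByOrder.put(id, Arrays.asList("line A of " + id, "line B of " + id));
        }
    }

    /** One query returning only the order ids; the relation stays unloaded. */
    List<Integer> findOrderIds() {
        queryCount++;
        return new ArrayList<>(linesByOrder.keySet());
    }

    /** One query per order, as happens when a lazy relation is touched per row. */
    List<String> findLines(int orderId) {
        queryCount++;
        return linesByOrder.get(orderId);
    }

    /** One query returning the orders together with their lines (a join). */
    Map<Integer, List<String>> findOrdersWithLines() {
        queryCount++;
        return new TreeMap<>(linesByOrder);
    }
}

class QueryCountDemo {
    /** Loads every order, then each order's lines separately: 1 + N queries. */
    static int lazyQueryCount(int orders) {
        CountingStore store = new CountingStore(orders);
        for (int id : store.findOrderIds()) {
            store.findLines(id);
        }
        return store.queryCount;
    }

    /** Loads orders and lines together: one query, whatever the number of orders. */
    static int joinedQueryCount(int orders) {
        CountingStore store = new CountingStore(orders);
        store.findOrdersWithLines();
        return store.queryCount;
    }

    public static void main(String[] args) {
        System.out.println("lazy:   " + lazyQueryCount(100) + " queries");
        System.out.println("joined: " + joinedQueryCount(100) + " queries");
    }
}
```

For 100 orders the per-row variant issues 101 queries against 1 for the joined variant. Nest a second relation under the first and the per-row count multiplies again, which is how a single screen can end up outlasting a transaction timeout.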

Use a simple structure

A layered system with a model, data access objects and services is easy to understand, even if it usually yields a structured, non-OO system as most people implement it by separating the data (into the model) and the code that operates on it (into the services). However, such a simple structure is easy enough to follow that even the 'mere mortals' among programmers can understand it.

And because most common interpretations of the Model-DAO-Service architecture are pretty consistent, your architecture will remain intact.
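As an illustration of the Model-DAO-Service split described above, here is a minimal sketch in plain Java. All names (Customer, CustomerDao, InMemoryCustomerDao, CustomerService) are hypothetical, and the DAO is backed by a map rather than a database so that the example stays self-contained. In a JPA application the DAO implementation would presumably wrap an EntityManager instead.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Model: plain data, no logic of its own (hypothetical entity). */
class Customer {
    final long id;
    final String name;
    boolean active = true;

    Customer(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

/** DAO: the only layer that knows how customers are stored and retrieved. */
interface CustomerDao {
    void save(Customer customer);
    Customer findById(long id);
    List<Customer> findActive();
}

/** In-memory DAO, an assumption made purely to keep this sketch runnable. */
class InMemoryCustomerDao implements CustomerDao {
    private final Map<Long, Customer> rows = new LinkedHashMap<>();

    @Override
    public void save(Customer customer) {
        rows.put(customer.id, customer);
    }

    @Override
    public Customer findById(long id) {
        return rows.get(id);
    }

    @Override
    public List<Customer> findActive() {
        List<Customer> result = new ArrayList<>();
        for (Customer customer : rows.values()) {
            if (customer.active) {
                result.add(customer);
            }
        }
        return result;
    }
}

/** Service: the code that operates on the model, through the DAO interface only. */
class CustomerService {
    private final CustomerDao dao;

    CustomerService(CustomerDao dao) {
        this.dao = dao;
    }

    Customer register(long id, String name) {
        Customer customer = new Customer(id, name);
        dao.save(customer);
        return customer;
    }

    void deactivate(long id) {
        Customer customer = dao.findById(id);
        if (customer != null) {
            customer.active = false;
        }
    }

    int activeCount() {
        return dao.findActive().size();
    }
}
```

The data sits in the model and the logic in the service, which is exactly the structured, non-OO shape mentioned above. The point of the sketch is that each class has one obvious place in the layering, so a new team member can tell where a change belongs.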

The best architecture is not only sufficiently efficient; above all, it is easy to understand. This keeps it intact. It also keeps maintenance costs down, as there is less likely to be an irregular mix of designs that is difficult to understand, fix and change.

For this reason it is best to choose a simple, flexible architecture, and then stick to it. Contrary to popular belief, this last part is the most important part of being a software architect.

Reduce the number of options

The more options there are, the more difficult it is to understand the choices and to choose. This starts as soon as there are two options. As soon as you deviate from "1 concept, 1 option", it should stand out. This is one of the reasons annotations are more popular than XML for ORM tools: 1 concept (the entity) has one option (defined in the class).

Why JPA is better, and when you might use JDO anyway

JDO has some features for which there is no workaround in JPA. In decreasing order of usefulness, these are:

  1. Non-relational databases
  2. Persisted interfaces
  3. Defining metadata through an API
  4. Transactional fields
  5. Fetch Plans (some would put this higher, but below I'll explain why they are a bad idea)

If you need one of these features, you might use JDO. Other JDO-specific features, such as the extra relationship types it supports, are actually a good reason to refactor your design.

I claim that the features listed above are not needed. Here is why, per feature:

Non-relational databases

If you're storing objects in a database, programmers expect the behavior of a relational database. Deviating from this is only confusing. If, for whatever reason, you do need another datastore, be it Excel, a JCR repository, or anything else, it had better be different. If not, you'll open a whole slew of gotchas (i.e. things that work as documented, but not as expected). Using the same persistence API is therefore a bad idea.

Persisted interfaces

Interfaces define the behaviour of an object. They contain no data, and have no identity. If you persist an interface, it actually means that the implementing classes have something in common. That can only be data, which implies that a persisted class is a better choice.

Defining metadata through an API

This is a short one: your data model does not change while the application is running (it might via an update, but that's something else). Therefore, it is never necessary to change the metadata. The only real use of this feature is to dynamically define fetch groups, as maintaining them by hand is very error-prone. But as argued below, fetch plans solve a problem you should not have in the first place.

Transactional fields

Although it is useful if non-persisted fields roll back when the transaction does, it is tricky to follow. A field is, after all, no longer either transient or persisted. And even then, there are hardly any situations where the objects from a failed transaction are reused. The only situation where transactional fields might be useful is a system with long-running, and thus complex, transactions. However, the problem you have then (complexity) cannot be solved with transactional fields.

Fetch Plans

This is the only feature specific to JDO that actually seems useful. A fetch plan is much more flexible than the usual, static strategy to eagerly fetch a field.

But when do you really need it?

Usually, adjustments to the static strategy are only needed for queries, as a good UI only ever shows the same fields of an entity (with only a few exceptions). For all lists (including Collection relations) you'll be using a query, and then it is clearer what you're doing if you define your strategy in the query itself. This way, only the query defines what is retrieved (1 concept, 1 option). This makes your application maintainable.
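A minimal sketch of what "defining your strategy in the query itself" can look like with JPQL, which lets a single query override the lazy default through JOIN FETCH. The Order entity, its lines relation, the customer and placedOn fields, and the OrderQueries holder class are all hypothetical. The sketch only holds the query strings, so it runs without a JPA provider.

```java
/**
 * Hypothetical JPQL queries for an Order entity with a lazy "lines" relation.
 * Each use case has its own query, and the query alone decides what is loaded.
 */
final class OrderQueries {
    private OrderQueries() {
    }

    /** List screen: only the orders themselves; the lines relation stays lazy. */
    static final String ORDER_LIST =
            "SELECT o FROM Order o WHERE o.customer = :customer ORDER BY o.placedOn";

    /** Detail screen: JOIN FETCH loads the lines eagerly, for this query only. */
    static final String ORDER_WITH_LINES =
            "SELECT o FROM Order o JOIN FETCH o.lines WHERE o.id = :id";

    /** True if the query text itself asks for a relation to be fetched. */
    static boolean fetchesRelations(String jpql) {
        return jpql.toUpperCase().contains("JOIN FETCH");
    }
}
```

In a JPA application such a string would be passed to EntityManager.createQuery, or declared as a named query on the entity. Either way, a reader of the DAO method sees in one place both what is selected and which relations come along with it.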

And if you want to use fetch plans anyway, they will usually be based on the data you need on screen. So either your DAO layer will have a method per use case (and the method defines the fetch join or fetch plan), or you have generic queries and define the fetch plan elsewhere. But then you have a circular dependency: your View and Service layers depend on the data (as usual), but also the other way around: your Model depends on your View/Service layer having the internal knowledge of the DAO on how the data is retrieved (specifically that only part is retrieved). This breaks your architectural layers!

But if even that does not scare you (it should!), your problems are not over yet. To keep fetch plans maintainable, sooner rather than later you'll literally translate the displayed relationships to fetch plans. An elegant implementation might even translate the relationships (for example as OGNL expressions) to the individual relationships on entities (these will be the fetch groups). The result is that there must be a fetch group for every relationship (as a String!), which makes your model much less maintainable, with all the future costs and problems that go with it.

JPA also has a few major advantages over JDO, all related to the Criteria API:

Especially when querying data, JPA offers safer choices than JDO. Safe in the sense that the compiler can help to recognize errors. JPA also has a clearer structure to point out deviations from the default fetch behavior. Remember that displaying (subsets of) data is so much more common than altering it. An API that is optimized for sets has a clear advantage. Given the commonalities between JPQL and SQL, JPA is the clear winner here.

As a final remark I would say that relational databases are so common because they are usually a sufficient choice. Where they are not (think Google, Bol.com, Amazon, Facebook, etc.), the demands are so extreme that you need a specialized approach, including the better, more expensive programmers to handle it.

In all usual cases JPA is a sufficiently efficient choice, and also the simplest to boot. This makes it the most maintainable and least expensive choice.