Discussion: container-managed persistence
If you followed the tutorial on container-managed
persistence with jBoss, you will have seen that creating persistent,
distributed objects is not really any more difficult than creating transient
ones. The EJB container does most of the hard work; all the programmer needs to
do is to tell it which fields are persistent. However, it isn't quite as simple
as that, and naive use of CMP can lead to very inefficient programs. To see
why, it's necessary to understand at least in outline how the EJB server deals
with container-managed persistence.
Technical overview
In the EJB field there is a very strong correspondence between `rows of a database table' and
`instances of an object'. It is clear that the EJB developers had this notion in
mind from the very beginning. While the specification doesn't stipulate that persistence is
provided by database tables, in practice it always is. Moreover, it is tacitly
assumed that the communication between the Beans and the database will be by
means of SQL statements. What does this imply for
container-managed persistence?
When a persistent object is instantiated, the
EJB container must generate SQL code that will write a row in the table. When
the object is deleted, it must generate SQL to remove it. This isn't much of a
problem. When one object asks for a reference to another, the container must
find (or create) that object's row in the table, read the columns,
instantiate the object in the JVM, and
write the data from the table into its instance variables. Because this process
can be quite slow, the EJB server may choose to defer it. That is, when one
object gets a reference to an object that is container-managed, the latter
object may be uninitialized. Initialization from the database table takes place
later, perhaps when one of the methods is called. This late initialization
reduces inefficiencies arising from initializing objects that are never read,
but has its own problems, as we shall see.
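The deferral described above can be sketched in plain Java. This is only an illustration of the idea, not the code an EJB server actually generates; the names `LazyLoadSketch`, `LazyCd`, `TABLE` and `rowReads` are invented for the example, and a map stands in for the database table.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of deferred (late) initialization; all names are hypothetical. */
final class LazyLoadSketch {

    /** Stand-in for the CD database table: primary key -> title. */
    static final Map<String, String> TABLE = new HashMap<>();

    /** Counts simulated row reads so the deferral can be observed. */
    static int rowReads = 0;

    /** A bean whose state is read from the "table" only on first use. */
    static final class LazyCd {
        private final String id;
        private String title;
        private boolean loaded = false;

        /** Creating the reference does not touch the table. */
        LazyCd(String id) {
            this.id = id;
        }

        String getTitle() {
            ensureLoaded();
            return title;
        }

        /** Reads the row the first time any attribute is needed. */
        private void ensureLoaded() {
            if (loaded) {
                return;
            }
            rowReads++;
            if (!TABLE.containsKey(id)) {
                // The missing row is only discovered here, not at lookup time.
                throw new IllegalStateException("no row with id " + id);
            }
            title = TABLE.get(id);
            loaded = true;
        }
    }

    public static void main(String[] args) {
        TABLE.put("1", "Nocturnes");
        LazyCd cd = new LazyCd("1");
        System.out.println("reads after obtaining the reference: " + rowReads);
        System.out.println("title: " + cd.getTitle());
        System.out.println("reads after the first method call: " + rowReads);
    }
}
```

Note that a reference to a non-existent row can be created without complaint; the failure only surfaces when an attribute is first read.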
Limitations of CMP
Efficiency limitations
The main limitation is that the EJB container will probably not
be able to generate database access statements with the efficiency of a human
programmer. Consider this example: suppose I have a database table containing
details of my music CD collection. I want to search the collection
for any CD that has the text `Chopin' (case insensitive) in
either the `title' or `notes' column. In SQL I could write a statement like this:
SELECT * FROM CD WHERE title LIKE '%chopin%' OR notes LIKE '%chopin%';
The % character is an SQL wild-card and takes care of finding the required
string somewhere inside the field; in many databases (MySQL with its default
collation, for example) the `LIKE' operator is case-insensitive by
default. How could we achieve this with a container-managed EJB? If `CD' is an
EJB, the container-supplied method `findAll()' in its home interface
will get all the current instances of
`CD'. In practice it will do this by executing a statement like
SELECT * FROM CD;
and then instantiating CD for each row found. At some point it will probably
store the primary key from each row of the database into the
appropriate attribute of each CD instance. Then the program must examine the objects one
at a time, checking whether they meet the required criteria (i.e., the word
`Chopin' in the appropriate attributes). As the program
iterates through the objects, the server must cause their attributes to be
read from the table; it won't have done this until now because it would try to
conserve memory. So for each object examined the server will generate SQL code
like this:
SELECT * FROM CD WHERE ID=xxxx;
Suppose there are 200 CDs known to the system. Rather than executing one SQL
statement to get a list of all matching CDs, the CMP scheme has executed
over 200 SQL statements to achieve the same effect. We can't improve the
situation by using a call to findByTitle() then findByNotes()
because these methods only provide exact string matches.
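To make the arithmetic concrete, here is a small simulation that counts the statements each approach would issue. It is a sketch under stated assumptions: `QueryCountSketch`, `FakeDb` and `Row` are invented names, an in-memory list stands in for the CD table, and the "CMP-style" path simply models a findAll() followed by one deferred load per bean, as described above.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: counts the statements each search strategy would issue. */
final class QueryCountSketch {

    /** One row of a hypothetical CD table. */
    static final class Row {
        final int id;
        final String title;
        final String notes;

        Row(int id, String title, String notes) {
            this.id = id;
            this.title = title;
            this.notes = notes;
        }

        /** Case-insensitive substring test on title and notes. */
        boolean mentions(String text) {
            String needle = text.toLowerCase();
            return title.toLowerCase().contains(needle)
                || notes.toLowerCase().contains(needle);
        }
    }

    /** In-memory stand-in for the database; it counts statements executed. */
    static final class FakeDb {
        final List<Row> rows = new ArrayList<>();
        int statements = 0;

        /** Models: SELECT * FROM CD WHERE title LIKE ... OR notes LIKE ... */
        List<Row> selectMatching(String text) {
            statements++;
            List<Row> out = new ArrayList<>();
            for (Row r : rows) {
                if (r.mentions(text)) out.add(r);
            }
            return out;
        }

        /** Models the SELECT behind findAll(); only the keys are kept. */
        List<Integer> selectAllIds() {
            statements++;
            List<Integer> ids = new ArrayList<>();
            for (Row r : rows) ids.add(r.id);
            return ids;
        }

        /** Models: SELECT * FROM CD WHERE ID=xxxx */
        Row selectById(int id) {
            statements++;
            for (Row r : rows) {
                if (r.id == id) return r;
            }
            return null;
        }
    }

    /** Hand-written SQL: a single statement does the whole search. */
    static int handWritten(FakeDb db, String text) {
        return db.selectMatching(text).size();
    }

    /** CMP-style: find all beans, then load and test each one in turn. */
    static int cmpStyle(FakeDb db, String text) {
        int matches = 0;
        for (int id : db.selectAllIds()) {
            Row r = db.selectById(id); // deferred load of one bean
            if (r != null && r.mentions(text)) matches++;
        }
        return matches;
    }

    /** Builds a table of the given size; every tenth title mentions Chopin. */
    static FakeDb sampleDb(int count) {
        FakeDb db = new FakeDb();
        for (int i = 1; i <= count; i++) {
            String title = (i % 10 == 0) ? "Chopin: Nocturnes " + i : "Album " + i;
            db.rows.add(new Row(i, title, "no notes"));
        }
        return db;
    }

    public static void main(String[] args) {
        FakeDb a = sampleDb(200);
        int found1 = handWritten(a, "chopin");
        FakeDb b = sampleDb(200);
        int found2 = cmpStyle(b, "chopin");
        System.out.println("hand-written: " + found1 + " matches, " + a.statements + " statement(s)");
        System.out.println("CMP-style:    " + found2 + " matches, " + b.statements + " statement(s)");
    }
}
```

With 200 rows the hand-written search issues one statement, while the CMP-style search issues 201: one for the initial find and one for each bean examined.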
Another efficiency limitation comes from the way the database table is updated
when attributes change. There are two main ways to achieve this. The server
could execute an instruction like this:
UPDATE CD SET artist='Bloggs' WHERE ID='200';
for example. This is efficient, but requires that the `Artist' field really be
called `artist'. This makes it difficult to change the names of columns in the
table. Alternatively the server could do a SELECT to get the current column
values, delete the whole row, then insert a row with modified values. This
allows a number of values to change at once and, because all values are
written, it doesn't matter what the columns are called. This is the approach
that jBoss uses. The problem is that if a class has ten persistent attributes,
and they are altered one after the other, in the worst case this results in ten
row deletions and ten row insertions.
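The worst case can be illustrated with a small counting sketch. Everything here is hypothetical (`RowRewriteSketch`, `FakeTable` and the attribute names are invented), and the second method is included only as a contrast to show what a single store after all the changes would cost; it is not a claim about how any particular server schedules its writes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch only: counts row deletions and insertions for whole-row rewrites. */
final class RowRewriteSketch {

    /** Stand-in for a table holding a single row of column values. */
    static final class FakeTable {
        Map<String, String> row = new LinkedHashMap<>();
        int deletes = 0;
        int inserts = 0;

        /** Replaces the whole row: one deletion followed by one insertion. */
        void rewriteRow(Map<String, String> values) {
            deletes++;
            inserts++;
            row = new LinkedHashMap<>(values);
        }
    }

    /** Worst case: the row is rewritten after every attribute change. */
    static void storeAfterEachChange(FakeTable table, Map<String, String> changes) {
        Map<String, String> current = new LinkedHashMap<>(table.row);
        for (Map.Entry<String, String> change : changes.entrySet()) {
            current.put(change.getKey(), change.getValue());
            table.rewriteRow(current);
        }
    }

    /** Contrast: apply every change in memory, then rewrite the row once. */
    static void storeOnce(FakeTable table, Map<String, String> changes) {
        Map<String, String> current = new LinkedHashMap<>(table.row);
        current.putAll(changes);
        table.rewriteRow(current);
    }

    /** Ten attribute changes, matching the example in the text. */
    static Map<String, String> tenChanges() {
        Map<String, String> changes = new LinkedHashMap<>();
        for (int i = 1; i <= 10; i++) {
            changes.put("attr" + i, "value" + i);
        }
        return changes;
    }

    public static void main(String[] args) {
        FakeTable eager = new FakeTable();
        storeAfterEachChange(eager, tenChanges());
        FakeTable batched = new FakeTable();
        storeOnce(batched, tenChanges());
        System.out.println("after each change: " + eager.deletes + " deletions, "
            + eager.inserts + " insertions");
        System.out.println("stored once:       " + batched.deletes + " deletion(s), "
            + batched.inserts + " insertion(s)");
    }
}
```

Both paths leave the same values in the row; only the number of deletions and insertions differs (ten of each against one of each).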
Limitations of late initialization
Suppose we want to find whether a CD with a specific ID exists on the system.
With CMP this corresponds to finding whether there is a row in the database
table with the corresponding value of the `id' column. The code in Java might
look like this:
// Get a reference to a CD Bean
Object ref = jndiContext.lookup("cd/CD");
// Get a reference from this to the Bean's Home interface
CDHome home = (CDHome)
    PortableRemoteObject.narrow(ref, CDHome.class);
// Find the matching CD
CD cd = home.findByPrimaryKey("xxx");
What will happen if `xxx' is not the ID of a CD that exists? There would seem to be two
sensible approaches. Either `findByPrimaryKey' could throw an exception, or
perhaps it could return a null reference. In either case the client could
easily tell whether the object exists. In practice, the EJB server may do
neither of these things. It may well return a reference to a CD bean instance,
which appears to be a perfectly valid object. However, none of the object's
attributes will be initialized; initialization won't happen until the object is
really required. This is done to improve efficiency; there is, after all, no
need to initialize the object unless it will be needed. However, if the program
continues to execute on the basis that `cd' refers to a valid object, an
exception will be thrown later when the program tries to interact with it.
This may not be a problem; if the ID had been generated from some earlier
database access then we may be sure it really exists, and any failure to find
it in the database represents a serious failure. However, if the data has come
from the user, it is reasonable to expect some errors of typing or memory.
Things can be made more predictable by always reading one of the attributes of
an object after getting a reference to it, like this:
CD cd = home.findByPrimaryKey("xxx");
String dummy = cd.getId();
If there is no CD whose ID field is `xxx' then this will throw a
java.rmi.NoSuchObjectException. This gets around the problem of late
initialization, but at the cost of an additional SQL access.
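The same trick can be wrapped in a helper that answers the question `does this CD exist?' directly. The sketch below runs without an EJB server, so it uses invented stand-ins: `CdRef` and `CdFinder` are hypothetical substitutes for the tutorial's CD and CDHome interfaces (whose real methods declare further exceptions that are omitted here), and `lazyFinder` merely simulates a server that hands out uninitialized references.

```java
import java.rmi.NoSuchObjectException;
import java.rmi.RemoteException;
import java.util.Set;

/** Sketch only: turns the "read one attribute" trick into an existence test. */
final class ExistenceCheckSketch {

    /** Hypothetical stand-in for the tutorial's CD remote interface. */
    interface CdRef {
        String getId() throws RemoteException;
    }

    /** Hypothetical stand-in for the tutorial's CDHome interface. */
    interface CdFinder {
        CdRef findByPrimaryKey(String id) throws RemoteException;
    }

    /**
     * Simulates a server using late initialization: the finder always
     * returns a reference, and a missing row is only noticed on first use.
     */
    static CdFinder lazyFinder(Set<String> existingIds) {
        return id -> () -> {
            if (!existingIds.contains(id)) {
                throw new NoSuchObjectException("no CD with id " + id);
            }
            return id;
        };
    }

    /** Returns true only if reading an attribute of the bean succeeds. */
    static boolean exists(CdFinder home, String id) {
        try {
            CdRef cd = home.findByPrimaryKey(id);
            cd.getId(); // forces initialization, at the cost of a table read
            return true;
        } catch (NoSuchObjectException e) {
            return false;
        } catch (RemoteException e) {
            // Any other remote failure is not evidence that the CD is absent.
            throw new IllegalStateException("lookup failed", e);
        }
    }

    public static void main(String[] args) {
        CdFinder home = lazyFinder(Set.of("42"));
        System.out.println("42 exists:  " + exists(home, "42"));
        System.out.println("xxx exists: " + exists(home, "xxx"));
    }
}
```

Only NoSuchObjectException is treated as `not found'; other remote failures are rethrown, so that a broken connection is not mistaken for a missing CD.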
Suitability of container-managed persistence
In many applications of object-oriented programming we have had to accept that
some things that are philosophically objects are in reality implemented
as something
else. The `something else' may be a row of a database table, or a line of a
text file, or whatever; at some point we had to code the interface between the
object-oriented system and the `something elses'. Entity EJBs go some
way towards eliminating this problem; things that are philosophically objects
can be modelled as objects, with methods and persistence. But this comes at a
cost. It's worth asking whether the `CD' EJB in the tutorial example really is
an object in a meaningful sense. It has attributes, but it doesn't do
very much. We don't really gain all that much by making it an object; it could
have remained a database row, and been manipulated through the `CDCollection'
class. Of course this isn't as elegant, but elegance can come at a high price.
In summary then, container-managed persistence is straightforward to implement
using jBoss (or any other EJB server, for that matter) but needs to be used
quite carefully if serious inefficiencies are to be avoided.