Tuesday, April 26, 2011

Hibernate, enumerations and smiley faces

I have a database table with a column of pre-defined single-character values. In the application tier, they are mapped as enumerations. So a row's value can be 'A', 'B', or 'C', let's say.

Hibernate works well with that when I have to do something like
SELECT * FROM table WHERE columnX = 'A'; 

I use the CriteriaQuery to generate an instance of the object, add the column name, eq, value and voilà, I am good to go. I would get an instance of whatever object the table maps to. Great. Keep in mind that my query is a bit more complex (column1 = blah and column2 = blah1 and columnX = 'A').
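For reference, the equality case above can be sketched with the JPA 2.0 Criteria API roughly like this. MyEntity, MyType and the attribute name "columnX" are placeholders standing in for my actual classes, not the real code:

```java
import java.util.List;
import javax.persistence.EntityManager;
import javax.persistence.criteria.CriteriaBuilder;
import javax.persistence.criteria.CriteriaQuery;
import javax.persistence.criteria.Root;

public class EqualityQuerySketch {

    // Placeholder enum; each constant maps to a single character in the table.
    public enum MyType { A, B, C }

    // MyEntity is a placeholder for whatever entity the table maps to.
    public static List<MyEntity> findByType(EntityManager em, MyType type) {
        CriteriaBuilder cb = em.getCriteriaBuilder();
        CriteriaQuery<MyEntity> query = cb.createQuery(MyEntity.class);
        Root<MyEntity> root = query.from(MyEntity.class);
        // Hibernate translates the enum restriction into columnX = 'A' etc.
        query.where(cb.equal(root.get("columnX"), type));
        return em.createQuery(query).getResultList();
    }
}
```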

Now let's say I want to make this somewhat more flexible. In some cases, I need to do this
SELECT * FROM table WHERE columnX LIKE '_'; 

How do I do that with Hibernate (given that I am trying to use the enum for the value portion of the Criteria)?

The best I was able to come up with was to pass in null and check, in my query creation method, whether the passed-in value was null; if so, don't add it to the query at all (matching any single character in a single-character column should be the same as not adding that where clause). Is there a better way of doing this?
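In code, the workaround looks roughly like this (again with placeholder names; pre-generics-diamond Java 6 style, and the only point is that a null value drops the restriction entirely):

```java
// Builds the where-clause predicates, skipping the enum restriction when the
// caller passes null -- matching any value is the same as not filtering at all.
List<Predicate> predicates = new ArrayList<Predicate>();
predicates.add(cb.equal(root.get("column1"), value1));
predicates.add(cb.equal(root.get("column2"), value2));
if (type != null) {
    // Only restrict on columnX when a specific enum value was requested.
    predicates.add(cb.equal(root.get("columnX"), type));
}
query.where(predicates.toArray(new Predicate[predicates.size()]));
```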

Wednesday, April 13, 2011

Which is faster, contains() or indexOf() in a List?

One of the small things I was trying to solve today was figuring out whether something is not a member of a list. The list can be arbitrarily long and I need an efficient way of checking.

My first attempt to write that was something like:

public static boolean isNotMemberOf(Object type, List<?> invalidTypes) {
    return (invalidTypes.indexOf(type) == -1);
}

This makes for ugly reading though. My next attempt was to use the contains method:

public static boolean isNotMemberOf(Object type, List<?> invalidTypes) {
    return (!invalidTypes.contains(type));
} 

Internally, contains apparently just calls indexOf. So the question becomes: which one is faster, and what should I do? Should I just use contains for easier readability (and fewer WTFs a month later, when someone else sees it)?
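For what it's worth, in OpenJDK's ArrayList the contains method is implemented as indexOf(o) >= 0, so both forms do the same linear scan and readability is the only real difference. A quick self-contained check:

```java
import java.util.Arrays;
import java.util.List;

public class MembershipCheck {

    // The readable version: says what it means.
    public static boolean isNotMemberOf(Object type, List<?> invalidTypes) {
        return !invalidTypes.contains(type);
    }

    // The indexOf version: same result, same cost, uglier.
    public static boolean isNotMemberOfViaIndexOf(Object type, List<?> invalidTypes) {
        return invalidTypes.indexOf(type) == -1;
    }

    public static void main(String[] args) {
        List<String> invalid = Arrays.asList("A", "B", "C");
        System.out.println(isNotMemberOf("D", invalid));           // true
        System.out.println(isNotMemberOfViaIndexOf("D", invalid)); // true
        System.out.println(isNotMemberOf("B", invalid));           // false
    }
}
```

If the list is genuinely long and checked often, switching to a HashSet would make the membership test O(1) instead of O(n), but that is a change of data structure rather than of which List method to call.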

Tuesday, April 5, 2011

Getting better data for testing

I learned something neat the other day that I thought was worth writing about. I write a lot of unit test code to test the code I am writing. Since I use an ORM, I am able to create the persisted 'objects' in memory using mock factories. These are great because you can do something like
MyObject object = MyObjectMockFactory.create(entityManager);
object.setProperty(blah);

and then use it in my testing and do something like
assertEquals(expectedValue, object.getProperty());
This is great. I especially enjoy using all that MockFactory-based magic to create proper data structures in memory. All the relationships between tables in the database are created properly.

There is a slight drawback to this approach though. Say you have an object that you are interested in testing, and it is related to 20 other objects that need to exist for your test. Now you have to create all those objects and associate them properly before you can run your test, even though you don't care about those objects. You just want them to exist. This starts to get annoying after a while. Imagine the object you are interested in is also relevant to many other classes and tests. All of a sudden, everyone has to create that object and its associated objects over and over for their unit tests. This gets painful (and inconsistent) quickly.

So how can we make this better? One solution I came across was DbUnit. To make it really effective, though, I ended up using Jailer to extract the data and then feeding it to DbUnit for the tests. Jailer is really neat. It allows you to model the data structures you care about, filter them and export them in a DbUnit-compatible flat file. Once that is done, you use DbUnit to load up the XML files, create the structures in HSQLDB and you're off to the races.

The benefit of using Jailer is that you get consistent data sets that multiple people can now use across multiple tests, without having to build up all those 20 objects one at a time and associate them. All that is now taken care of by a couple of calls to the DbUnit API. 
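The loading step amounts to something like the sketch below. The file name and JDBC URL are made up, and it assumes DbUnit 2.4+ (which provides FlatXmlDataSetBuilder), an in-memory HSQLDB, and that the schema already exists (Hibernate's hbm2ddl can create it):

```java
import java.io.File;
import java.sql.Connection;
import java.sql.DriverManager;

import org.dbunit.database.DatabaseConnection;
import org.dbunit.database.IDatabaseConnection;
import org.dbunit.dataset.IDataSet;
import org.dbunit.dataset.xml.FlatXmlDataSetBuilder;
import org.dbunit.operation.DatabaseOperation;

public class FixtureLoader {

    public static void loadFixtures() throws Exception {
        // In-memory HSQLDB instance for the test run.
        Connection jdbc =
                DriverManager.getConnection("jdbc:hsqldb:mem:testdb", "sa", "");
        IDatabaseConnection connection = new DatabaseConnection(jdbc);

        // The flat XML file Jailer exported (hypothetical name).
        IDataSet dataSet =
                new FlatXmlDataSetBuilder().build(new File("jailer-export.xml"));

        // Wipe the existing rows and insert the fixture data.
        DatabaseOperation.CLEAN_INSERT.execute(connection, dataSet);
    }
}
```

CLEAN_INSERT deletes whatever is in the affected tables first, which is exactly what you want for repeatable tests.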

There are downsides to it for sure. In my case, I have to test for some data that doesn't exist or doesn't match up; that is hard when all the data is exported from a consistent database, but you can doctor it after the model is loaded up. You also need representative datasets: if you expect 10 types of orders to be in the database for your tests, the exported data had better have them. People also need to know what data is stored in the XML files so that they can use it in the appropriate manner.

I really do like not having to generate tonnes of data artificially; it makes more sense to take the data that already exists, plunk it into the model and get on with testing. And this can be extended to large regression-testing models: generate the model, understand it and add more. Of course, you don't want to bloat your DbUnit tests with large data sets... but you can if you want to, I suppose.