Tuesday, April 5, 2011

Getting better data for testing

I learned something neat the other day that I thought was worth writing about. I write a lot of unit test code to exercise the code I am writing. Since I use an ORM, I can create the persisted objects in memory using mock factories. These are great because you can do something like

MyObject object = MyObjectMockFactory.create(entityManager);
object.setProperty(blah);

and then use it in my testing:

assertEquals(expectedValue, object.getProperty());
This is great. I especially enjoy using all that MockFactory-based magic to create proper data structures in memory: all the relationships between tables in the database are created properly. There is a slight drawback to this approach, though. Say you have an object that you are interested in testing, and it is related to 20 other objects that need to exist for your test. Now you have to create all of those objects and associate them properly before you can run your test, even though you don't actually care about them. You just want them to exist. This starts to get annoying after a while. Imagine the object you are interested in is also relevant to many other classes and tests. All of a sudden, everyone has to create that object and its associated objects over and over for their unit tests. This gets painful (and inconsistent) quickly.

So how can we make this better? One solution I came across was DbUnit. To make it really effective, though, I ended up using Jailer to extract the data and DbUnit to load it for the tests. Jailer is really neat. It allows you to model the data structures you care about, filter them, and export them in a DbUnit-compatible flat file. Once that is done, you use DbUnit to load the XML files, create the structures in HSQLDB, and you're off to the races.
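For context, a Jailer export in DbUnit's flat XML format is just one element per row, where the element name is the table and the attributes are the columns. The table and column names below are made up for illustration:

```xml
<!-- orders-dataset.xml: hypothetical export; element = table row, attributes = columns -->
<dataset>
  <CUSTOMER ID="1" NAME="Acme Corp"/>
  <ORDERS ID="100" CUSTOMER_ID="1" TYPE="STANDARD" TOTAL="49.95"/>
  <ORDERS ID="101" CUSTOMER_ID="1" TYPE="RUSH" TOTAL="120.00"/>
</dataset>
```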

The benefit to using Jailer is that you can have consistent data sets for multiple tests that multiple people can now use without having to build up all those 20 objects one at a time and associating them. All that stuff is now taken care of by a couple of calls to the DbUnit API. 
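As a sketch of what that couple of API calls looks like, here is a minimal setup helper, assuming DbUnit 2.x and HSQLDB are on the classpath and the schema (tables) has already been created in the in-memory database; the class name, file name, and JDBC URL are my own placeholders:

```java
import java.io.File;
import java.sql.Connection;
import java.sql.DriverManager;

import org.dbunit.database.DatabaseConnection;
import org.dbunit.database.IDatabaseConnection;
import org.dbunit.dataset.IDataSet;
import org.dbunit.dataset.xml.FlatXmlDataSetBuilder;
import org.dbunit.operation.DatabaseOperation;

public class OrderDataSetup {

    // Load a Jailer-exported flat XML file into an in-memory HSQLDB instance.
    // Note: DbUnit inserts data only; the table DDL must already exist.
    public static IDatabaseConnection loadDataSet(String path) throws Exception {
        IDataSet dataSet = new FlatXmlDataSetBuilder().build(new File(path));
        Connection jdbc = DriverManager.getConnection("jdbc:hsqldb:mem:testdb", "sa", "");
        IDatabaseConnection connection = new DatabaseConnection(jdbc);
        // CLEAN_INSERT deletes existing rows in the listed tables first,
        // so every test starts from the same consistent dataset.
        DatabaseOperation.CLEAN_INSERT.execute(connection, dataSet);
        return connection;
    }
}
```

A test class would typically call this once in a setup method, so the "build 20 objects by hand" step collapses into a single line.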

There are downsides to it, for sure. In my case, I have to test for some data that doesn't exist or doesn't match up. That is hard when all the data is exported from a consistent database, but you can doctor it after the model is loaded up. You also need representative datasets: if you expect 10 types of orders to be in the database for your tests, the exported data had better contain them. People also need to know what data is stored in the XML files so that they can use it in the appropriate manner.
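Doctoring the loaded data can be as simple as running plain JDBC statements after the DbUnit load. The table and column names here are hypothetical, continuing the earlier placeholder schema:

```java
import java.sql.Connection;
import java.sql.Statement;

public class DataDoctor {

    // Tweak the imported rows so the "missing or mismatched data" cases
    // the tests need actually exist in the otherwise-consistent dataset.
    public static void breakOneOrder(Connection jdbc) throws Exception {
        Statement stmt = jdbc.createStatement();
        try {
            // Orphan one order so the code under test sees a dangling reference.
            stmt.executeUpdate("UPDATE ORDERS SET CUSTOMER_ID = NULL WHERE ID = 101");
            // Remove a row entirely to simulate data that doesn't exist.
            stmt.executeUpdate("DELETE FROM ORDERS WHERE ID = 100");
        } finally {
            stmt.close();
        }
    }
}
```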

I really do like not having to generate tonnes of data artificially; it makes more sense to take the data that already exists, plunk it into the model, and get on with testing. And this can be extended to large regression testing models: generate the model, understand it, and add more. Of course, you don't want to bloat your DbUnit tests with large data sets... but you can if you want to, I suppose.
