Using tests to help refactor a legacy codebase

I have been tidying up a legacy code base recently. Its actually my own code, and there’s plenty to be done. Of course it was written under hard time pressure and I was working on my own so the quality has suffered. My discipline was not at the level at which I would apply to a commercial project. Which is an interesting learning in itself.

Here are some (quite lengthy) notes about the process I went through to clean up this code.

The first thing I noticed about the codebase was that although I had a build file which did lots of cool things, it didn’t run the tests automatically.

Step One – re-factor the build script

I wanted to tidy up the build script. It was quite long and hard to understand. Also I want to distribute this project on Google code to people and so I wanted to make it possible for people to simply checkout the code, and type “go” to get a distributable package built.

One of the issues with this is the dependant libraries. So I introduced Ivy to manage this for me (other posts to come on this topic).

Secondly I wanted to compile and run source code, unit tests functional tests (i.e. tests which may use external resources and involve a set of objects working together) and hopefully acceptance tests.

In ant, doing all this can require quite a bit of duplication. So I broke out some macrodefs to handle compilation and execution (again I’ll post this as a separate item hopefully).

Ok so the build looks sweet. When I finally run it and get it to open the junit reports in a browser for me, I see heaps of errors and failures! oh dear. I obviously hadn’t been running the tests every time I checked in. Bad developer!

So, step one with the legacy code base is to get all the tests runnable from the command line ON THE DEV BOX (I have seen a project where this was not simple to do and so the devs relied on the “CI” build to do this for them. not fast enough feedback).

I would like to get a CI build set up for it, but not sure where I’m going to host it. I think that the precedence should be FIRST dev build, THEN CI. If you cant get feedback on your dev box you are asking for integration problems.

Step Two – Make the tests green

Most tests were easy to fix. There were several types of failure / error:

1) Error because of missing resources. Simple to fix, just needed to copy them onto the classpath.

2) Changed functionality where test not updated. Often I had changed the underlying code (e.g. formatting a value) and not updated the tests. This is mostly caused by not running ALL the tests every time a change is made (see kent becks’ JUnit Max

3) Incomprehensible failure – Ah. One of the tests had such a complex set-up that I just couldn’t remember what the hell it was supposed to be testing. Well, I could see what it was trying to test, but the scenario was so complex that it was hard to work out why it was failing.

Solution ? Rip it out. This might seem a little drastic. Infact in one case I simplified it greatly but in a couple of other places I simply removed the test.

My rationale for this is that a failing test that I cannot fix is providing NO VALUE. Infact it is probably providing NEGATIVE value and actually creating NOISE.

Besides, its always in version control if I REALLY want to get it back later.

Step Three – Remove compilation warnings

My project was written in Java 1.4 I was using “enum” all over the shop as a variable name, which is obviously not cool in 1.5. So I removed all the warnings. Again, Warnings are NOISE, adding accidental complexity to my codebase. A health warning here. I once worked on a codebase where development was done on a branch. Going round removing all the warnings everywhere on a branch can seriously affect your merge because there are heaps of small changes which can lead to conflicts. Although relatively safe in terms of refactoring the cost in terms of merge hell is probably not worth it.

I did not remove deprecations yet. Thats another step.

Step Four – Refactor the tests

This is the interesting one and the one that prompted this post.

When running the unit tests, I saw that a few of them were taking quite a long time. These tended to be of the kind which were testing random number generators. To do this, they were generating say 100,000 random numbers in a Gaussian distribution and then analysing the standard deviation of the results.

Also I noticed that some of the tests had many test methods all testing different functionality (for example I had a “Math” class which does heaps of things).

Firstly I created a new source folder for “Functional Tests” to put these longer running, more high level tests in. As I was adding the new Test Cases, I realised that what I was doing was re-organising the functional areas of the tests. For example I extracted a class called “GaussianRandomGeneratorTest” which used to be a single method on my uber-test case.

I realised that what I was doing was Refactoring the Test Cases to represent a new, structure, without refactoring any of the source. Interesting. This means that what I am doing is 100% safe, there is NO WAY I can introduce a bug into the code because I am not mutating it. And yet I am building up a new understanding and description of the code which can point the way towards how the code might look one day.

This is potentially a powerful technique. I realise that I have used the same technique on a previous legacy source base, without really conceptualising it. We extracted all the tests that worked for the area we wanted to change, and then re-structured some of them to make more sense.

In summary, refactor the tests! Its 100% safe and can give you the freedom to experiment with levels of abstraction and cohesion without risking any changes to the code.

Step Five – Remove unused code

This one is quite hard. But luckily in my case I had used deprecation to mark classes I no longer needed. So this stage will involve removing all deprecated classes. Note that I have the luxury to do this as no-one depends on my project. At a minimum I should make sure that nothing inside the codebase uses these deprecated classes.


“Object Calisthenics” : Jeff Bay – II

Following my Previous post on this topic, I have been in contact with Jeff via email, and he mentioned that the version of the chapter which is on his website is a preview version and so the published one should be taken as the final version.

Something I mentioned there is a rule which wasn’t included in the final version (it is called “3. No static methods other than factory methods”). Jeff had some interesting comments and it sparked some good discussion between my pair (Steph) and I and so I think its worth adding the result here.

As I indicated, I had been thinking a bit about statics recently and been trying to pin down what I think about them. I was thinking along the lines of “Don’t use statics when you really want an instance” (see below). Jeff made the observation that actually they use statics quite a bit under certain circumstances.

With Jeffs kind permission, I will try to summarise our discussion.

Methods with no side effects

we use static methods extensively. whenever we can have a method that doesn’t have any side effects, we mark it as static so that the ide can alert us to the fact that it is so.

I realised that this is something I also like, if we’ve just written a method and it has no interaction with the state of the object, making it static is a clear indicator at the compiler level that the method has no side effects. I would tend to just do this without even thinking about it, so it was interesting to call it out.

Guard conditions / Encapsulation

bombUnless(isTrue, "the thing was supposed to be false!");

Here, there is a piece of logic which belongs to this class, but doesn’t involve the state of the instance. Steph and I discussed this also.

String params = url.params();
if (params == null || params.trim().equals("")) children.add(sentinal);

Becomes :

if (isEmpty(params)) children.add(sentinal);

I had hinted at this in the previous post, in terms of it being a common pattern. One place it is frequent is in the area of testing and mocking, for example Mockito or JUnit both make extensive use of statics which makes the code easier to read.

Jeff had this to say:

These kinds of minor improvements to code, getting rid of micro-duplication, have given me some of the best gains in my practice. Without static methods and static imports to help reduce the client’s overhead for the method call, it’s hard to make certain kind of improvements worthwhile.

Don’t use statics where you can have an instance

This was my original understanding of the rule “only use statics for factories”. I had been thinking that the intent of the rule was this, so for example:

public void talkToTheService() {
    Result result = SomeService.doSomethingForMe(myArgs);

instead of :

private SomeService someServiceInstance = new SomeService();

public void talkToTheService() {
    Result result = someServiceInstance.doSomethingForMe(myArgs);


I’m sure there is more to say on the subject, but I think some of these points are useful. As a general point, Jeff concluded our discussion with:

Static methods should be reserved for things that either touch multiple objects and have no identity beyond the functional cohesion (make it a static method to give the related steps a name) or things that represent generic operations on fundamental types (such as the isEmpty example).

Thanks Jeff!


“Object Calisthenics” : Jeff Bay

I happened to notice the book “The ThoughtWorks Anthology” lying around the office. It contains quite a few interesting articles. One in particular resonated with me.

In Chapter 6, Jeff Bay talks about a set of simple rules for writing OO code, which he suggests you try applying rigorously on a small project (say 1000 lines of code). The article can be downloaded here.

This is a hard exercise, especially because many of these rules are not universally applicable. The fact is, sometimes classes are a little more than 50 lines. But there’s great value in thinking about what would have to happen to move those responsibilities into real, first-class-objects of their own. It’s developing this type of thinking that’s the real value of the exercise. So stretch the limits of what you imagine is possible, and see whether you start thinking about your code in a new way.

What I liked about this article is that it cut to the core of something which I have been working at explaining over the last few months. I sometimes feel like a “purist” when discussing OO design. It is a common situation to find myself defending writing a small class, or encapsulating a String to make a small value object which only has a single attribute. Jeff explains in a great way a few principles and challenges the reader to try them out in a rigorous way, just to see how it works out. This is a great way to present it, its not saying “I know the right way and you must follow the rules”, its suggesting that you should give it a chance and you might begin to see some rewards, or “Try it, you might like it”.

I also liked the fact that eight out of the ten rules were all about encapsulation something I have also been thinking a good deal about recently, prompted by this experience, even to the extent of getting it printed on a t-shirt (but that’s another story).

Since writing this post, I have had a discussion with Jeff via email, the results of which can be found here, this paragraph is superseded.

Another one which isnt actually in the book, but that I have been getting into recently is “No static methods except as factories”, This one is particularly hard to follow when so many libraries use them and its just so convenient:)

He’s not suggesting that these rules are always applicable but I think its interesting to make the effort of taking it to an extreme and see where it leads you, it may open new possibilities when going back to everyday coding.


Distilling the principles of software design

How do I make sure I am reading enough, or have a good set of working principles to be able to apply to my software design / coding ?

Most people come to this by experience. Experience and interaction with other software developers. Ideas (or memes) spread by word of mouth, more specifically the internet.

Something the scientific community often publishes are review papers. These take a particular field and summarise the findings, providing a massive list of references at the end.

These are extremely useful if you want to then work in that area. Because it has been peer reviewed, there is a certain amount of confidence that crucial works (at least until the date of publication) have been included.

We almost have this in software, but people tend to write books, or blogs about things. These tend to be written by people who have gained a reputation in the field for collating ideas, for example Martin Fowler. Maybe its the same thing, and you get a word of mouth gain in respect for these people.

Perhaps a more formal peer-reviewed process could be realised for software design principles. As a local alternative, I have created a page called ‘Essential Reading’ where I am going to begin to collate such references. Works which I believe contain valuable principles, and which everyone should be aware. I’ll see if it is possible to get something like a review paper going.

I think the goal is that you should be able to communicate about any of these concepts amongst developers and they should be able to understand what your talking about. Its a bit like patterns, but with a wider context, they are principles.

I was inspired to write this after finding Neal Ford‘s book which is a book that I had thinking would be great to write. Neal has been working on it for a couple of years and I am looking forward to reading it.

Non-randomly enough, I just found this from Joel which is all about Unicode, but is entitled “The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)” maybe what I’m looking for is a similar article about design principles. Infact, Joel has a list of articles by job role which he has written which is close to what I’m looking for. Also, he co-founded Stack Overflow which is a way forward.

I would also like to start ‘’ which would be a similar idea but allows you to post snippets of code and people can vote on wether they are readable or not. I have the domain name already.