How structure helps me to maintain software

7 February 2017

Oscar Westra van Holthe - Kind

Software maintenance can be hell. I'm sure that many developers have opened up an existing codebase, only to be greeted with a plethora of packages. You can usually see in excruciating detail the kinds of classes the software has. Any features implemented by the software are obfuscated. And once you slowly, ever so slowly, get to know the software, it gets worse: all those packages have a more or less good reason to exist. This means refactoring bit by bit (the boy scout rule) only makes the situation worse, and so your velocity remains low.

The cause? The structural design of the software, including the names of packages, classes, etc.

This blog post describes my solution to this problem, starting with how I analyze software to judge its maintainability. I'll focus heavily on structure (and naming), because that is what you initially see when opening a codebase. Given the power of first impressions, this has a direct impact on maintainability.

Evaluating software maintainability

If you want to improve any aspect of anything, you must be able to measure improvements. One of the ways to evaluate the maintainability of software is to look at its Dependency Structure Matrix (DSM). I won't explain how to read a DSM, as others have already described DSMs very well. The reason I like to use a DSM is that it shows a lot: it contains the package/class names in your software, and how they relate to each other in terms of dependencies, i.e. coupling. This tells you a few interesting things about its maintainability:

If there are circular dependencies, these show up immediately. These deteriorate maintainability and should be removed as soon as possible. In my experience, having a clear structure and intention-revealing names helps prevent circular dependencies.
If you have numbers above the diagonal, you have an unclear design (and a source of circular dependencies). The fix is the same as for circular dependencies, but easier to do.
If you have many and/or high numbers in your DSM (zeroes don't show), you have high coupling. This reduces long-term maintainability. Reducing coupling is difficult, as it involves looking differently at your problem domain. In the end, though, it always involves some restructuring.
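To make the last two checks concrete, here is a minimal sketch of a DSM in Java. It is not how any particular tool works. The class `DependencyMatrix`, its methods and the package names are hypothetical, and it assumes one common convention: packages are listed with the most depended-upon ones last, and a cell holds how often the column's package uses the row's package.

```java
import java.util.ArrayList;
import java.util.List;

/** A tiny Dependency Structure Matrix over an ordered list of packages (hypothetical sketch). */
final class DependencyMatrix {
    private final List<String> packages;
    // counts[row][col]: how often the package in column 'col' uses the package in row 'row'
    private final int[][] counts;

    /** Assumption: the list is ordered with the most depended-upon packages last. */
    DependencyMatrix(List<String> orderedPackages) {
        this.packages = List.copyOf(orderedPackages);
        this.counts = new int[packages.size()][packages.size()];
    }

    /** Records that package 'from' uses package 'to' the given number of times. */
    void addDependency(String from, String to, int uses) {
        counts[indexOf(to)][indexOf(from)] += uses;
    }

    /** Lists the dependencies that show up above the diagonal. */
    List<String> entriesAboveDiagonal() {
        List<String> found = new ArrayList<>();
        for (int row = 0; row < counts.length; row++) {
            for (int col = row + 1; col < counts.length; col++) {
                if (counts[row][col] > 0) {
                    found.add(packages.get(col) + " -> " + packages.get(row));
                }
            }
        }
        return found;
    }

    /** Sums all cells except the diagonal: a crude indication of coupling. */
    int totalCoupling() {
        int total = 0;
        for (int row = 0; row < counts.length; row++) {
            for (int col = 0; col < counts.length; col++) {
                if (row != col) {
                    total += counts[row][col];
                }
            }
        }
        return total;
    }

    private int indexOf(String name) {
        int index = packages.indexOf(name);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown package: " + name);
        }
        return index;
    }
}

final class DependencyMatrixDemo {
    public static void main(String[] args) {
        DependencyMatrix dsm = new DependencyMatrix(List.of("app.http", "app.task", "app.user"));
        dsm.addDependency("app.http", "app.task", 4);
        dsm.addDependency("app.task", "app.user", 2);
        dsm.addDependency("app.user", "app.task", 1); // points "upwards" in the ordering
        System.out.println("Above the diagonal: " + dsm.entriesAboveDiagonal());
        System.out.println("Total coupling: " + dsm.totalCoupling());
    }
}
```

In this sketch, the single dependency from `app.user` back to `app.task` is the only entry above the diagonal, and it is also what closes a cycle between those two packages.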

It is no coincidence that improving your software structure helps with all these issues. Software maintainability is not determined by the 'right' (ahem) framework or library. There is no silver bullet. It is the software structure that chiefly determines maintainability of software.

Structuring your software

Developers increasingly use Test Driven Design (TDD). In practice, this means we design the API of our code first, by writing the tests that use it. This limits our API design to classes (data) and functions (code). What about packages? As it turns out, designing your API first at every level is a great help in reducing coupling. This starts at the application scope (especially in a micro service environment), and extends to the modules, packages and then the classes and methods. Going API first this way means that at every step of the way, you're thinking of what your users really need (results), while at the same time allowing you to encapsulate (hide) all logic and details needed to get these results.

So, how do you go about designing API first at every level? Some time ago, I realized that APIs are everywhere in code: these are the parts that are visible. A class API consists of its public/protected/package private methods, depending on where you look from. A package API consists of its public classes. And a module? The API of a module consists of all packages inside it. So to design API first at all levels of a codebase, you start at the modules.

On modules

When designing modules, I tend to stick with just one: the entire codebase. This has a downside: I must continuously ensure my package structure doesn't get out of hand. You can make this easier by separating the core of the codebase into a module 'core', with an additional module for each connection you need to make. So you'll get modules like 'http', 'jms', 'database', and so on. Each of these modules depends on 'core', but not on each other. This forces all functionality to be named (and exposed) in 'core', so you can test all features you support there. Splitting the core module is something you might do at a later stage as well.

Did you recognize the Hexagonal Architecture in the previous paragraph? It was described by Alistair Cockburn in 2005, and I felt something click when I read it. I think it is a convenient way to ensure that all features in the codebase can be tested without starting up a container / the service, etc. I realize we can do so easily with Docker these days. But it is even easier when you can do without. Next up: packages.
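Before that, here is a minimal sketch of the 'core' plus adapter split described above, assuming Java 11 or later. All names (`ProjectStore`, `ProjectService`, `InMemoryProjectStore`) are hypothetical, and everything sits in one file only to keep the sketch self-contained. In a real codebase the first two types would live in 'core' and the adapter in a module such as 'database' or in the tests.

```java
import java.util.ArrayList;
import java.util.List;

// --- module 'core': names the feature and the connection it needs ---

/** Port: what the core needs from storage, without knowing how it is done. */
interface ProjectStore {
    void save(String projectName);

    List<String> findAll();
}

/** Feature in 'core': depends only on the port, never on a concrete adapter. */
final class ProjectService {
    private final ProjectStore store;

    ProjectService(ProjectStore store) {
        this.store = store;
    }

    void createProject(String name) {
        if (name == null || name.isBlank()) {
            throw new IllegalArgumentException("A project needs a name");
        }
        store.save(name.trim());
    }

    List<String> listProjects() {
        return store.findAll();
    }
}

// --- an adapter: depends on 'core', not the other way around ---

/** Stand-in for a 'database' adapter; enough to test the feature without a container. */
final class InMemoryProjectStore implements ProjectStore {
    private final List<String> names = new ArrayList<>();

    @Override
    public void save(String projectName) {
        names.add(projectName);
    }

    @Override
    public List<String> findAll() {
        return List.copyOf(names);
    }
}

final class HexagonalDemo {
    public static void main(String[] args) {
        ProjectService service = new ProjectService(new InMemoryProjectStore());
        service.createProject("Website redesign");
        System.out.println(service.listProjects());
    }
}
```

The point of the design is the direction of the dependency: the adapter implements an interface owned by 'core', so the feature can be exercised with nothing more than a plain object.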

Packaging by layer or feature?

By far the easiest, and most common, choice is to package by layer. You can already sort classes into different packages before you've completed your first feature. And when packages become too big, it's fairly easy to come up with names to split them. This is because packaging by layer creates a taxonomy, a structural classification of classes. However, packaging by layer aggressively prevents the use of non-`public` class scopes: classes that work together end up in different packages, so they must be `public` to see each other. It encourages low modularity, low cohesion and high coupling; the opposite of what you want to achieve.

Remember what I wrote about designing API first? It allows you to encapsulate (hide) stuff. Packaging by layer would undo your hard work.

By contrast, packaging by feature allows you to hide feature internals using non-`public` scopes. This limits coupling. Additionally, using features as a unit also encourages high cohesion and makes higher modularity easier. This is useful if you want to be able to split your code into modules in the future.
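As a rough illustration, the sketch below shows what a single feature package could look like. The package and its classes (`Task`, `TaskService`, `TaskRepository`, `TaskMapper`) are hypothetical. In a real layout all of them would sit in one package, with only `Task` and `TaskService` declared `public`. Here the `public` modifier is left as a comment, an assumption made purely so the sketch compiles as a single file under any name.

```java
import java.util.ArrayList;
import java.util.List;

// Assumed layout: every type below lives in one feature package, e.g. 'task'.

/** Data class; part of the feature's API. */
/* public */ final class Task {
    private final String title;

    Task(String title) {
        this.title = title;
    }

    String title() {
        return title;
    }
}

/** The feature's entry point: the only behaviour other packages would see. */
/* public */ final class TaskService {
    private final TaskRepository repository = new TaskRepository();
    private final TaskMapper mapper = new TaskMapper();

    void add(String title) {
        repository.store(new Task(title));
    }

    List<String> summaries() {
        List<String> lines = new ArrayList<>();
        for (Task task : repository.all()) {
            lines.add(mapper.toSummary(task));
        }
        return lines;
    }
}

/** Feature internal: package private, so no other feature can couple to it. */
final class TaskRepository {
    private final List<Task> tasks = new ArrayList<>();

    void store(Task task) {
        tasks.add(task);
    }

    List<Task> all() {
        return List.copyOf(tasks);
    }
}

/** Feature internal: package private as well. */
final class TaskMapper {
    String toSummary(Task task) {
        return "[ ] " + task.title();
    }
}

final class FeaturePackageDemo {
    public static void main(String[] args) {
        TaskService tasks = new TaskService();
        tasks.add("Refactor packages");
        System.out.println(tasks.summaries());
    }
}
```

With packaging by layer, `TaskRepository` and `TaskMapper` would each sit in their own layer package and would have to be `public` for `TaskService` to reach them.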

At this point, anyone used to a taxonomy may feel lost: how do you easily recognize converters, services, etc. if they don't have their own package? Should we guess at the design patterns being used? The answer lies with the classes that implement the design patterns.

Class naming

Class design is not extremely important for structuring software. Classes contain code, and thus implement (business) functionality with it. This keeps our users happy. There is one aspect of classes that I do think is important for structuring a codebase: their names.

To determine where to make a change, you must be able to read what a class does and how it relates to the other classes that implement a feature. It helps if the classes are together of course (another reason to use packaging by feature), but more important at this level are the design patterns you used.

Looking at the class list, I like to see the design patterns in the class names. So this is where I would use my taxonomy: in the class naming scheme. In my opinion, class names should describe what they represent (for data classes and entities), or the feature/function they provide plus their place in the taxonomy. This means you get names like User, Project, Task, FileManager, TaskService, ProjectController, ProjectRepository, TaskMapper, etc.

Bringing it all together

When you're building or maintaining a codebase, it is a good idea to periodically look at its Dependency Structure Matrix. This helps you check adherence to the package principles.

Things I tend to check:

Absence of circular dependencies (the Acyclic Dependencies Principle; ADP).
Whether the DSM is in *lower triangular form* (i.e. there are no numbers above the diagonal). If so, both the Stable Dependencies Principle (SDP) and the Stable Abstractions Principle (SAP) are sufficiently adhered to. Using this shortcut means I don't have to worry about finding a metric about how stable or abstract a package is.
Whether there are many or high numbers below the diagonal (i.e. high coupling). Having fewer and lower numbers denotes adherence to the Common Reuse Principle (CRP) and, to a lesser extent, the Common Closure Principle (CCP). And even if not, having low coupling is a good idea in any case.
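The first check can also be automated. Below is a minimal sketch, not taken from any specific tool, that looks for a circular dependency with a depth-first search. The class `PackageCycles` and the package names are hypothetical, and the dependency map is assumed to have been extracted from the codebase by other means.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Checks the Acyclic Dependencies Principle on a package dependency graph (hypothetical sketch). */
final class PackageCycles {
    private PackageCycles() {
    }

    /** Returns one cycle as a list of package names (first equals last), or an empty list. */
    static List<String> findCycle(Map<String, Set<String>> dependsOn) {
        Set<String> finished = new HashSet<>();
        for (String start : dependsOn.keySet()) {
            List<String> cycle = visit(start, dependsOn, finished, new ArrayList<>());
            if (!cycle.isEmpty()) {
                return cycle;
            }
        }
        return List.of();
    }

    private static List<String> visit(String pkg, Map<String, Set<String>> dependsOn,
                                      Set<String> finished, List<String> path) {
        if (finished.contains(pkg)) {
            return List.of(); // fully explored earlier, no cycle through here
        }
        int seenAt = path.indexOf(pkg);
        if (seenAt >= 0) {
            // We came back to a package on the current path: that closes a cycle.
            List<String> cycle = new ArrayList<>(path.subList(seenAt, path.size()));
            cycle.add(pkg);
            return cycle;
        }
        path.add(pkg);
        for (String dependency : dependsOn.getOrDefault(pkg, Set.of())) {
            List<String> cycle = visit(dependency, dependsOn, finished, path);
            if (!cycle.isEmpty()) {
                return cycle;
            }
        }
        path.remove(path.size() - 1);
        finished.add(pkg);
        return List.of();
    }
}

final class PackageCyclesDemo {
    public static void main(String[] args) {
        Map<String, Set<String>> dependsOn = Map.of(
                "http", Set.of("task"),
                "task", Set.of("user"),
                "user", Set.of("task")); // closes the cycle task -> user -> task
        System.out.println("Cycle found: " + PackageCycles.findCycle(dependsOn));
    }
}
```

A check like this could run in a build or test, so a new cycle is noticed as soon as it appears rather than at the next look at the DSM.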

Final thoughts

The structure of a codebase is extremely important, as it is the first thing a developer sees when opening it. If done well, it makes a codebase self-documenting, both by reinforcing its place in the IT landscape and by detailing the features it implements.

Packaging by feature is a necessity for this. It enables you to hide details by using non-`public` class scopes (there's a reason the Java designers made *package private* the default class scope). This in turn helps you to reduce coupling and enhance the cohesion of packages and modularity of the codebase.

The class taxonomy can be used in names, with suffixes like __Service, __Controller, __Mapper, etc. This documents what design patterns are used to implement a feature, so you can more easily make changes. To check if I did a good job, I use a Dependency Structure Matrix. This displays the various parts of the codebase, the dependencies and the amount of coupling between them, and allows me to judge how well I've adhered to the package design principles.
