Skip to main content

My favourite engineering mess-up

Written on under Dev

I've had an itch to write something about my favourite mess-up. I've found some time so sit down by the fire with a blanket, make yourself some hot chocolate (preferably before you sit down) and read my tale.

The Scene

This one happened about two years ago whilst working at Condé Nast Britain. I was one of a few that helped to maintain and release our UK brand sites. For those of you who are unaware of what they are, at the time they were GQ, Vogue, Wired, Glamour, Tatler, Traveller, House and Garden and Brides. We had eight brand websites, running on one platform that we called Merlin (the engine, not the wizard). Each of these brand websites was built on top of an application core, passed in a few configs and then ran. Instead of one repo that housed the core and eight brand configs, we had the core repo plus eight brand repos. They were all pretty much identical with the exception of marketing requirements and bloody redirects.

Our release process was interesting. As you can gather, we had eight repos that would need to be individually deployed. How does an engineer solve that task? By building a script. I had created a release script that I could specify the version bump, update a changelog from reading commit messages, tag and then release.

A lot of people can tell you that I'm a very big fan of having informative commit messages. I like to use the Angular method of writing messages as it provides a structure that can be used to identify the version bump and create a changelog. I recently got to watch Alice Bartlett's FFConf talk on git messages that I would strongly recommend watching. Very enjoyable but also has some good recommendations for a message.

Back to the story, one day we had made a change and it needed to be deployed to all brands. Easy. Done it a several million times. I opened up my eight terminal tabs, made the core version bump and started doing releases. GQ. Vogue. Wired. Oh GQ is taking a while. Glamour. Tatler. Traveller. GQ isn't doing much. House and Garden. Brides.

GQ had halted on building the changelog.

A command line terminal with a cancelled job during a changelog task

I must have messed up. I ran the release command again and waited. Nought.

The Bug

A bunch of us huddled around my screen and started spitballing ideas. All the dependencies were the same for the repos, we could reproduce on other computers, we had broken javascript.

After routing around for a couple of minutes trying to work out where exactly the issue was, we found it. A regular expression in a dependency of a dependency of a dependency of a dependency of a dependency (ah the good old NPM ecosystem) that reads the commit messages had changed. But why is it breaking in this repo and not Vogue or Glamour?

So as I previously mentioned, I like informative commit messages. I like very informative commit messages. The regular expression was getting stuck on a commit message I had written. But why? Well friends, that was because my commit message contained 17489 characters. I told you I like to be informative. But Dan, that's a very long message. What was actually in this message? A lot of three's.

A github commit message showing a lot of three's

I had made a change two years prior on gallery image thumbnails to ensure they stay at 33.33% and being the "funny man" I am, wrote it as if it were recurring. I also had the audacity to make another joke in the commit saying I missed a three.

A github comment reading "i think i missed a 3"

The Fix

You're probably expecting the fix to be we pinned the dependency to an older version or we corrected the commit through a rebase so that it might not impact anything else or even you were a good developer and submitted a PR to fix the regular expression whilst explaining the insane situation you're in. Well, you would be wrong. I wrote a monkey patch.

You can view the code here: https://github.com/cnduk/merlin-www-components/blob/develop/packages/merlin-www-build-tools/js/tasks/release.js#L114-L216. The idea was that we would overwrite a dependency of the offending dependency, intercept the reading of raw commit messages, truncate any that are too long (we deemed 1000 to be long enough) and then pass it back to the original dependency to do its things.

A command line terminal with a working changelog task

If you've looked at the code, you'll see I left a note saying I would submit a PR with the fix. I never did. And I probably won't. Sorry.

The Moral

So what is the moral of this story I'm trying to share. Don't write too informative commit message? Don't have fun with your commit messages? Don't work with recurring numbers, lets only use whole numbers for things? I don't really have one. Have fun with things, life is short. If you are the cause of an issue that would impact deployment or other engineers, maybe give a hand to fix it. I certainly didn't think this would bite me in the ass.

Perhaps next time I'll write a story about the time I made an alert box appear in a loop on the home page of The Daily Telegraph during my first week of working there. Or maybe getting in trouble for a dinosaur at Condé.