In college I worked for a company whose goal was to prove that their management techniques could get a bunch of freshmen to write quality code.
They couldn't. I would go find the code that caused a bug, fix it, and discover that the bug was still there, because previous students, rather than add a parameter to a function, would make a copy and slightly modify it.
I deleted about 3/4 of their code base (thousands of lines of Turbo Pascal) that fall.
Bonus: the customer was the Department of Energy, and the program managed nuclear material inventory. Sleep tight.
In addition to not breaking existing code, it also has the added benefit of boosting your personal contribution metrics in the eyes of management. Oh, and it's really easy to revert things - all I have to do is find the latest copy and delete it. It'll work great, promise.
Add tests to the function as it exists today. Submit. Add new functionality, make sure tests still pass. Done. Updating a function here and there shouldn't require more staff.
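Something like this, sketched in Python with pytest-style tests (format_price and its behaviour are invented, just to illustrate the workflow):

    # Step 1: pin down today's behaviour before touching anything.
    def format_price(amount, currency="$"):  # started life as format_price(amount)
        return f"{currency}{amount:.2f}"

    def test_existing_behaviour():
        # written against the original one-argument version; still passes
        assert format_price(3) == "$3.00"
        assert format_price(3.456) == "$3.46"

    # Step 2: the new optional parameter gets its own test; the old ones stay green.
    def test_new_currency_parameter():
        assert format_price(3, currency="€") == "€3.00"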
This implies adding tests that accurately capture all the nuances of the function, not just the simplest logic needed to hit code coverage. When we're talking about someone new to the function, that's about the same as asking them to learn the function well enough to be sure they didn't make an error when they changed it. The benefit of tests is that they're written by the person who created the function originally, who is most aware of its hidden dangers.
I'm distrustful of unit testing, as I've seen too many tests written to hit code coverage numbers that don't actually test the functions they're aimed at. A non-trivial number run the function asynchronously and report a successful run before the function even finishes executing, meaning that even thrown errors don't fail the tests (granted, part of that is on the testing framework for letting an unexpected error ever result in a pass).
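For the curious, the anti-pattern looks roughly like this in Python (flaky_import_job is a made-up stand-in for the function under test):

    import threading
    import unittest

    def flaky_import_job(records):
        # stand-in for the real function; blows up on bad input
        if not records:
            raise ValueError("no records to import")

    class TestImportJob(unittest.TestCase):
        def test_import_job(self):
            # Anti-pattern: fire the function off on a background thread and
            # never join it. The test returns (and passes) before the function
            # finishes, and the ValueError dies inside the thread, so the
            # runner never sees a failure -- yet coverage counts it as tested.
            threading.Thread(target=flaky_import_job, args=([],)).start()

    if __name__ == "__main__":
        unittest.main()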
We have a saying at my work: "If you like it, then you should have put a test on it." If the original author didn't add adequate coverage and you end up breaking their code, it's on them.
Of course, this is the way you need to write tests -- to test the actual logical pathways and requirements of the code, and not just finagle them together to overfit some code coverage metric.
Yes, but it just creates a new immutable branch in the commit graph. All the old commits are still there, but if they're not reachable from the root refs, they'll get GC'd eventually. The only mutable parts are HEAD, branch/tag names etc that can be changed to point to whatever. Anything that has a hash is necessarily immutable, because changing it in any way (including changing its parent pointer(s)) changes the hash.
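A toy sketch of why anything with a hash is immutable (this is not git's real object format, just the principle that the parent pointer is part of the hashed content):

    import hashlib

    def commit_id(tree, parent, message):
        payload = f"tree {tree}\nparent {parent}\n\n{message}".encode()
        return hashlib.sha1(payload).hexdigest()

    a = commit_id("tree1", None, "initial commit")
    b = commit_id("tree2", a, "add feature")

    # "Rewriting" commit a can only produce a new object with a new id;
    # b still points at the old a, which hangs around until it's GC'd.
    a2 = commit_id("tree1", None, "initial commit, reworded")
    assert a2 != a
    assert commit_id("tree2", a2, "add feature") != b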
I work with someone who has a habit of code duplication like this. Typically it's an effort to turn around something quickly for someone who is demanding and loud. Refactoring the shared function to support the new edge case would take more time and testing, so he doesn't do it. This is a symptom of the core problem.
> The duplicated code that needs updating in 50 places every time a bug or new feature comes in? Yes, I'm sure.
If you're talking about duplicate code showing up in 50 places then your problem is not code duplication but incompetent developers not being able to maintain a project.
If instead you're talking about code with a passing resemblance showing up in 2 or 3 places, then odds are you're looking more maintainable code straight in the eye and just aren't able to see how it makes the project more maintainable.
I have a habit of doing this for data processing code (Python, Polars).
For other code it's an absolute stink, and I agree. But for data transforms... I've seen the alternative, a neat in-house library of abstracted combinations of dataframe operations with different parameters, and... it's the most aesthetically pleasing unfathomable hell I've ever experienced.
So now, when munging dataframes, I'm much quicker to reach for "copy that function and modify it slightly" - a maintenance headache, but at least the result is readable.
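To give a flavour of what I mean (column names invented), two copied-and-tweaked Polars transforms, each a self-contained readable pipeline, at the cost of fixing any shared bug twice:

    import polars as pl

    def monthly_revenue(df: pl.DataFrame) -> pl.DataFrame:
        return (
            df.filter(pl.col("status") == "paid")
              .group_by(pl.col("date").dt.truncate("1mo").alias("month"))
              .agg(pl.col("amount").sum().alias("revenue"))
        )

    # Near-duplicate for a slightly different report, copied rather than
    # parameterized -- a maintenance cost, but easy to read in isolation.
    def monthly_refunds(df: pl.DataFrame) -> pl.DataFrame:
        return (
            df.filter(pl.col("status") == "refunded")
              .group_by(pl.col("date").dt.truncate("1mo").alias("month"))
              .agg(pl.col("amount").sum().alias("refunds"))
        )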
But it's a false premise; the claim is that just copy/pasting something is faster, but is it really?
The demanding/loud person can and should be ignored; as a developer, you are responsible for code quality and maintainability, not your (or their) manager.
> I work with someone who has a habit of code duplication like this.
Are you sure it's code duplication?
I mean, read your own description: the new function does not need to support edge cases. Having to handle edge cases is a huge code smell, and a clear sign of premature generalization.
And you even admit the guy was more productive and added fewer bugs?
There is a reason why the mistakes caused by naive approaches to Don't Repeat Yourself (DRY) are corrected with Write Everything Twice (WET).
I didn't say fewer bugs. There are a lot of bugs; they're just localized to each call and then copy/pasted all over the place. So when one is found, it needs to be fixed in a bunch of places. It makes for quite the mess.
They just aren't making changes to the shared function, so they don't need to test that existing functionality still works, just their single use case.
This reminds me of my own experience. I worked for a company based in SEA that had almost identical portals in several countries in the region. The portals were developed by an Australian company, and I was hired to maintain the existing ones and develop new ones.
The source code for each portal was stored in a separate Git repository. I asked the original authors how I was supposed to fix bugs that affected all the portals, or develop new functionality for all of them. The answer was to backport every fix manually to each copy of the source code.
Then I asked: isn't it possible to use a single source repository and feature flags to customize the appearance and features of each portal? The original authors said it was impossible.
In 2-3 months I merged the code of 4-5 portals into one repository, added feature flags, and upgraded the framework version. The release went flawlessly, and it became possible to fix a bug in all the portals at once, or roll out new functionality across all the countries where the company operated. It was a huge relief for me, as copying bugfixes around by hand had been a tedious and error-prone process.
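A feature-flag setup along these lines is all it takes (country codes and flag names are invented for illustration; the real setup may have looked different):

    # One codebase, per-country configuration instead of per-country repos.
    PORTAL_FLAGS = {
        "sg": {"show_promotions": True,  "enable_ewallet": True},
        "th": {"show_promotions": True,  "enable_ewallet": False},
        "vn": {"show_promotions": False, "enable_ewallet": True},
    }

    def is_enabled(country: str, feature: str) -> bool:
        return PORTAL_FLAGS.get(country, {}).get(feature, False)

    # A bugfix now lands once and ships to every portal; only the flags differ.
    if is_enabled("th", "show_promotions"):
        print("render promotions banner for TH portal")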
I once had to deal with some contractors who habitually did this. When confronted about how this could lead to confusion, they said, "that's what Ctrl+F is for."
Oh boy! This reminded me of one of my worst tech leads. He pushed secret tokens to GitHub. When I asked in the team meeting why we would do this instead of using a secrets manager, the response was: "These are private repos. Also, we signed an NDA before joining the company."
> Bonus: the customer was the Department of Energy, and the program managed nuclear material inventory. Sleep tight.
These are my favorite (in a sense) programmer stories -- that there are these incomprehensible piles of rubbish that somehow, like, run The World and things, and yet things manage to work (in an outwardly observable sense).
Although, I recall two somewhat recent stories where this wasn't the case: the unemployment benefits fiascos during the early Covid era, and some more recent air traffic control-related incidents (one of which affected me personally).