Intro
Most test suites enforce a binary pass or fail. Your Lighthouse score is above the threshold, or it isn’t. Your lint rule count is under the limit, or it isn’t. This works fine when you are starting a project from scratch and can set the bar where you want it. But what about the project you inherited with 200 lint warnings, or the site where half the pages score below 80 on performance? You can’t always fix everything at once, and setting the threshold low enough for the current state to pass defeats the purpose of having a threshold at all.
Temporal ratcheting gives you a third option. Set the bar where things are today, and define a schedule by which that bar must rise. The tests still pass right now, but they won’t pass forever unless someone does the work to improve things. The expectation of progress is baked directly into the test suite.
What a Ratchet Does
A ratchet, in the mechanical sense, is a device that allows movement in only one direction. Turn a ratchet wrench clockwise and it grips. Try to turn it back and it holds. In software quality terms, a ratchet is a mechanism that prevents backsliding: quality can move forward, but it can’t slip back.
Notion’s engineering team published a detailed write-up on their ratcheting system using custom ESLint rules. They maintain a database of allowed error counts per file, and CI blocks any merge that increases the count. If you fix a warning in a file, the allowed count for that file drops, and nobody can reintroduce it. That is a static ratchet: the ceiling only moves when someone improves things.
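Notion’s write-up covers the full system, but the core idea fits in a few lines. As a rough sketch (an illustration of the concept, not their actual implementation), a per-file static ratchet might look like this:

// Minimal sketch of a static ratchet: a stored ceiling per file that
// CI checks against and that only ever moves down.
// Illustrative only, not Notion's actual implementation.
type FileBaseline = Record<string, number>; // file path -> allowed warning count

function checkStaticRatchet(
  baseline: FileBaseline,
  current: Record<string, number>
): { ok: boolean; updated: FileBaseline } {
  const updated: FileBaseline = { ...baseline };
  let ok = true;
  for (const [file, count] of Object.entries(current)) {
    const allowed = baseline[file] ?? 0; // files not in the baseline get no allowance
    if (count > allowed) {
      ok = false; // regression: someone added warnings to this file
    } else if (count < allowed) {
      updated[file] = count; // improvement: lower the ceiling so it can't creep back up
    }
  }
  return { ok, updated };
}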
Temporal ratcheting takes this a step further. Instead of a fixed baseline that only moves on improvement, the threshold has a starting point and a trajectory. You record the current state of quality at a point in time, set that as the minimum acceptable level, and then define a schedule by which that minimum must improve.
The idea echoes a well-known quote:
“The arc of the moral universe is long, but it bends toward justice.”
— Theodore Parker, later popularized by Martin Luther King Jr.
Temporal ratcheting works on the same principle. The arc of your codebase quality is long, but you can bend it toward improvement by building the expectation of progress directly into your tests.
How It Works
The core mechanism has three parts:
- Record a starting point. Run your test suite against the full codebase and capture the current state. Maybe the codebase has 200 lint warnings, or 4 out of 15 pages fail an accessibility check, or the average Lighthouse performance score is 72.
- Set that as the baseline. The tests pass if the count is at or below the baseline. They fail if the count increases. At this point you have a static ratchet.
- Define a schedule for tightening. The test factors in the current date to compute the allowed threshold. Maybe the allowed warning count decreases by 5 per month, or the minimum Lighthouse score increases by 2 points per quarter. The threshold automatically tightens over time, and the tests will eventually fail if nobody addresses the gap.
Here’s a TypeScript sketch that interpolates between the baseline and the target based on how much time has elapsed:
type RatchetConfig = {
metric: string;
baselineValue: number;
baselineDate: Date;
targetValue: number;
targetDate: Date;
}
function getCurrentThreshold(config: RatchetConfig): number {
const start = config.baselineDate.getTime();
const end = config.targetDate.getTime();
const now = Date.now();
// How far along the timeline are we? (clamped between 0 and 1)
const elapsed = Math.max(0, Math.min(1, (now - start) / (end - start)));
// Linear interpolation from baseline to target
return config.baselineValue + (config.targetValue - config.baselineValue) * elapsed;
}
With this function, a configuration like the following would produce a threshold that starts at 200 and decreases by roughly 17 per month over the course of a year:
const lintRatchet: RatchetConfig = {
metric: "lint-warnings",
baselineValue: 200,
baselineDate: new Date("2026-01-01"),
targetValue: 0,
targetDate: new Date("2027-01-01"),
};
On March 1st, the allowed count would be around 167. By July 1st, it would be around 100. By November, it would be around 33. If the team hasn’t been fixing warnings and the count is still at 180 when the threshold hits 167, the tests start failing. That is the point. The system creates steady pressure to do the work incrementally, well before the deadline arrives.
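Because getCurrentThreshold reads the clock via Date.now(), the easiest way to sanity-check those numbers is a small variant that takes the evaluation date explicitly. This is a hypothetical helper, shown here only to verify the math:

// Hypothetical helper: the same interpolation as getCurrentThreshold,
// but with the evaluation date passed in so specific dates can be checked.
function thresholdOn(config: RatchetConfig, date: Date): number {
  const start = config.baselineDate.getTime();
  const end = config.targetDate.getTime();
  const elapsed = Math.max(0, Math.min(1, (date.getTime() - start) / (end - start)));
  return config.baselineValue + (config.targetValue - config.baselineValue) * elapsed;
}

thresholdOn(lintRatchet, new Date("2026-03-01")); // ≈ 167.7
thresholdOn(lintRatchet, new Date("2026-07-01")); // ≈ 100.8
thresholdOn(lintRatchet, new Date("2026-11-01")); // ≈ 33.4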
Where Temporal Ratcheting Fits
I reach for temporal ratcheting when there is a big gap between where the codebase is and where I want it to be, and closing that gap all at once is not practical:
- Legacy codebases with accumulated debt. You inherit a project with hundreds of lint warnings, low test coverage, or accessibility violations. You can’t fix them all at once, and you also can’t ignore them. Temporal ratcheting lets you commit to a pace of improvement without requiring a massive cleanup.
- Lighthouse and performance scores. A site that scores 72 on performance today might not be able to hit 95 overnight. But you can set a trajectory from 72 to 90 over the next two quarters, and the test suite will enforce that trajectory (a config for this case is sketched just after this list).
- Accessibility compliance. Similar to performance, accessibility improvements can take time when the codebase has a lot of violations. A ratchet lets you commit to steady progress while still shipping features.
- Test coverage. If your project has 40% test coverage and you want to get to 80%, the ratchet can enforce that coverage must increase by a fixed amount each month.
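For score-style metrics like Lighthouse, where higher is better, the same RatchetConfig and interpolation work unchanged; only the comparison direction flips. Here is a sketch using the 72-to-90 trajectory from the performance bullet above (the dates are assumptions for illustration):

// Sketch: a ratchet for a metric where higher is better.
// The threshold rises over time, and the actual score must stay at or above it.
const performanceRatchet: RatchetConfig = {
  metric: "lighthouse-performance",
  baselineValue: 72,
  baselineDate: new Date("2026-01-01"),
  targetValue: 90,
  targetDate: new Date("2026-07-01"), // two quarters out
};

function meetsScoreRatchet(actualScore: number, config: RatchetConfig): boolean {
  // For count-style metrics (lower is better), the check is actual <= threshold instead.
  return actualScore >= getCurrentThreshold(config);
}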
It does not replace fixing things. The ratchet is the forcing function, not the fix. You still need people doing the actual work of improving the code. What the ratchet does is make the expected pace of improvement visible and enforceable.
Static vs. Temporal Ratchets
Static ratchets (like Notion’s approach) and temporal ratchets serve different purposes.
A static ratchet says: “don’t make things worse.” If you fix a lint warning, the allowed count drops by one and nobody can reintroduce it. This is great for preventing new debt from accumulating, but it doesn’t create any pressure to fix existing debt. A codebase can sit at 200 warnings indefinitely as long as nobody adds warning 201.
A temporal ratchet says: “things must improve on a schedule.” The threshold tightens whether or not anyone is actively working on it. If the team goes three months without fixing warnings, the tests start failing. This creates the pressure to prioritize the cleanup work that a static ratchet lacks.
Choosing a Schedule
The schedule is the most important decision. Too aggressive and the team burns out fighting ratchet failures instead of shipping features. Too relaxed and the ratchet has no practical effect. Here are some guidelines that help:
- Match the schedule to the team’s capacity. If a small team is also shipping features, a target of “fix 5 warnings per month” is more realistic than “fix 50.” The ratchet should feel like steady progress, not an emergency.
- Revisit the schedule periodically. A ratchet is not a blood oath. If circumstances change (team shrinks, priorities shift), adjust the schedule rather than letting it break the build for weeks.
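In practice, adjusting usually means re-baselining: record today’s actual value as the new starting point and pick a new target date. A minimal sketch of what that might look like, as a hypothetical helper alongside the earlier code:

// Sketch: re-baseline a ratchet when the original schedule proves unrealistic.
// The current actual value becomes the new baseline, measured from today.
function rebaseline(
  config: RatchetConfig,
  actualToday: number,
  newTargetDate: Date
): RatchetConfig {
  return {
    ...config,
    baselineValue: actualToday,
    baselineDate: new Date(),
    targetDate: newTargetDate,
  };
}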
Making Failures Actionable
The worst thing a ratchet failure can do is leave the developer staring at a red build with no idea what to fix. “Lint check failed” tells you nothing. Compare that to something like this:
Lint ratchet failure:
Current warnings: 183
Allowed threshold: 167 (as of 2026-03-01)
Baseline: 200 (set 2026-01-01)
Target: 0 (by 2027-01-01)
You need to fix at least 16 warnings to pass.
This tells the developer the scope of the problem, how far behind the schedule the codebase is, and what needs to happen. It turns a red build into an actionable work item rather than a mystery to debug.
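Producing a message like that takes only a few lines. Here is a minimal sketch for count-style metrics, reusing getCurrentThreshold and RatchetConfig from earlier (the exact wording and date formatting are assumptions; adapt to taste):

// Sketch: turn a ratchet failure into an actionable message.
// Assumes a count-style metric where lower is better.
function formatRatchetFailure(config: RatchetConfig, actual: number): string {
  const isoDate = (d: Date) => d.toISOString().slice(0, 10);
  const allowed = Math.floor(getCurrentThreshold(config)); // round down to the stricter bound
  return [
    `${config.metric} ratchet failure:`,
    `  Current: ${actual}`,
    `  Allowed threshold: ${allowed} (as of ${isoDate(new Date())})`,
    `  Baseline: ${config.baselineValue} (set ${isoDate(config.baselineDate)})`,
    `  Target: ${config.targetValue} (by ${isoDate(config.targetDate)})`,
    `You need to fix at least ${actual - allowed} to pass.`,
  ].join("\n");
}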
The Forcing Function You Set and Forget
The beauty of temporal ratcheting is that it makes improvement the default outcome of normal development. Nobody has to carve out time for a dedicated “fix all the accessibility issues” sprint. Instead, the system applies steady pressure. Over weeks and months, the codebase converges on the desired state.
You set it up once: record the baseline, define the schedule, and write the test. Then it runs quietly in the background. Each CI run checks the current date, computes the current threshold, and compares it to the actual count. The team fixes issues at their own pace, so long as they stay ahead of the schedule. By the time the target date arrives, the work is done, and it happened incrementally as part of regular development rather than as a disruptive cleanup project.
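Concretely, the whole check can live in an ordinary test file. Here is a minimal sketch using Vitest (any runner works; countLintWarnings is a placeholder for however you collect the metric, and lintRatchet, getCurrentThreshold, and formatRatchetFailure are the pieces sketched earlier):

import { test } from "vitest";

// Placeholder metric collector: replace with a real implementation,
// e.g. running ESLint programmatically and summing warning counts.
async function countLintWarnings(): Promise<number> {
  return 183; // hard-coded purely for illustration
}

test("lint warnings stay ahead of the ratchet schedule", async () => {
  const actual = await countLintWarnings();
  const allowed = Math.floor(getCurrentThreshold(lintRatchet));
  if (actual > allowed) {
    throw new Error(formatRatchetFailure(lintRatchet, actual));
  }
});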
