I spent a Saturday last month profiling a FastAPI service that was exhibiting mysterious p99 latency spikes. Requests that normally completed in 8ms would occasionally balloon to 120ms. No pattern in the endpoints, no correlation with payload size. Just random, infuriating pauses. The culprit was Python's incremental GC -- or rather, the lack of it. That service was running Python 3.12. After migrating to 3.14, those spikes collapsed to under 12ms. Here is exactly why, and how the new collector works under the hood.
Why GC Pauses Matter for Real Applications
Reference counting handles most of Python's memory management. When an object's reference count drops to zero, it gets freed immediately. No pause, no scan, no problem. But reference counting alone cannot handle reference cycles -- objects that point to each other, keeping their counts above zero even when the group as a whole is unreachable. That is what the cyclic garbage collector exists to solve.
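You can watch this happen. In the sketch below (the Box class is just an illustrative container), two objects that reference each other survive del because their counts never reach zero, and only the cyclic collector reclaims them:

```python
import gc
import weakref

class Box:
    """Placeholder container; any object with attributes works."""
    pass

gc.disable()             # rule out the cyclic GC for the moment
a, b = Box(), Box()
a.partner = b
b.partner = a            # cycle: a -> b -> a
watcher = weakref.ref(a) # observe the pair without keeping it alive

del a, b                 # refcounts stay at 1 because of the cycle
assert watcher() is not None   # still alive: refcounting alone cannot free this

gc.collect()             # the cyclic collector finds and breaks the cycle
assert watcher() is None       # now reclaimed
gc.enable()
```

The weakref is the trick here: it lets you observe whether the pair is alive without adding a strong reference that would keep it alive.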
The trouble is that solving reference cycles requires traversal. The collector has to walk object graphs, identify unreachable clusters, and break them apart. During that walk, your application thread is frozen. This is a stop-the-world pause, and its duration scales with the number of objects being scanned.
For a script that runs, does its work, and exits, this is irrelevant. But for a web server handling hundreds of requests per second, a 100ms pause means a hundred requests just got delayed. Rippling's engineering team documented third-generation collection pauses exceeding 2 seconds in production, directly inflating their p99 API latency to roughly 3 seconds against a p50 of 50ms. Close.com measured that approximately 3% of their total CPU time was spent in garbage collection, with p95 latency spikes of 80-100ms attributable purely to GC. These are not theoretical concerns. They are production incidents, measured with distributed tracing.
The old workaround was to call gc.disable() or gc.freeze() and manually manage collection between requests. Instagram famously disabled the GC entirely. These are hacks born from the collector's fundamental design problem: it does too much work at once.
The Old Generational Collector and Its Problems
Before Python 3.14, CPython used a three-generation collector. Every container object (lists, dicts, classes, instances -- anything that can hold references to other objects) was placed into one of three generations: generation 0 (young), generation 1 (intermediate), and generation 2 (old).
The idea was based on the weak generational hypothesis: most objects die young. If an object survives a generation-0 collection, it gets promoted to generation 1. Survive that, and it moves to generation 2. The young generation gets collected frequently, the old generation rarely.
Here is how you could inspect the thresholds on Python 3.12:
import gc
# Default thresholds: (700, 10, 10)
print(gc.get_threshold())
# When (allocations - deallocations) > 700, generation 0 is collected.
# After 10 gen-0 collections, generation 1 is collected.
# After 10 gen-1 collections, generation 2 is collected.
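The counter that threshold0 compares against is also visible at runtime. gc.get_count() reports the live counters next to those thresholds; allocating container objects ticks the first one up until a collection fires:

```python
import gc

gc.collect()                 # a full collection resets the counters
before = gc.get_count()      # (gen-0 net allocations, gen-0 runs, gen-1 runs)
junk = [[] for _ in range(100)]   # ~101 new tracked container objects
after = gc.get_count()
print(before, after)         # the first counter has grown; past 700 it triggers
```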
The problem is generation 2. It accumulates every long-lived object in your process: all your imported modules, your ORM models, your connection pools, your cached data structures. In a mature web worker, generation 2 can contain millions of objects. When generation 2 finally triggers, the collector has to traverse all of them in a single stop-the-world pass.
The math is punishing. A full-heap scan of 220,000 objects that spills out of L2 cache into L3 can cost roughly 12 million CPU cycles. At a 3 GHz clock, that is about 4ms of pure freeze time just for the traversal, before accounting for the actual cycle-breaking work. Scale up the object count and degrade the cache behavior -- half a million objects with mostly-cold pointer chasing is entirely normal for a Django or Flask application with an ORM -- and you are looking at pauses of 30-100ms or more.
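You can check where your own process stands. On Python 3.12, gc.get_objects() takes a generation argument and gc.get_stats() reports per-generation totals, so the imbalance is easy to see in a mature worker:

```python
import gc

# How many tracked containers currently sit in each generation (Python 3.12;
# on 3.14 the same calls reflect the new layout instead)
for gen in range(3):
    print(f"generation {gen}: {len(gc.get_objects(generation=gen))} objects")

# Cumulative stats per generation: collections run, objects collected
for gen, stats in enumerate(gc.get_stats()):
    print(f"generation {gen}: {stats}")
```

In a long-lived web worker, expect the last generation's object count to dwarf the others by orders of magnitude.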
How the New Incremental GC Works
Python 3.14 replaces the three-generation model with a two-generation incremental collector, contributed by Mark Shannon. The core insight: if you cannot avoid scanning the old generation, at least do not scan all of it at once.
The new design has two generations: young and old. The intermediate generation is gone. Every collection pass scans the entire young generation (which is small and bounded) plus a fraction of the old generation. The key word is "fraction." Instead of one massive pause, the old generation gets scanned incrementally across many small pauses.
The old generation maintains two lists:
Pending list: objects that have not yet been examined in the current full scan cycle.
Visited list: objects that have already been examined.
Each GC invocation does the following:
Scans all objects in the young generation.
Takes a slice of the oldest objects from the pending list.
Computes the transitive closure -- finds all objects reachable from that slice that have not yet been visited.
Runs cycle detection on the combined set.
Promotes surviving young objects to the old generation.
Moves examined old objects from pending to visited.
When the pending list is exhausted, the full scan cycle is complete. The visited and pending lists swap roles, and the next full scan begins. No single pause ever has to touch the entire old generation.
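The bookkeeping above can be sketched with a toy simulation -- pure illustration, not CPython's C implementation, and the slice size here is arbitrary:

```python
from collections import deque

# Toy model of the old generation's two lists (the real collector works on
# C object lists and also computes transitive closures per slice)
pending = deque(range(100))   # objects not yet examined this scan cycle
visited = []                  # objects already examined

def gc_increment(slice_size=10):
    """One incremental pass: examine a slice of the oldest pending objects."""
    global pending, visited
    for _ in range(min(slice_size, len(pending))):
        visited.append(pending.popleft())
    if not pending:                  # full scan cycle complete:
        pending = deque(visited)     # the two lists swap roles
        visited = []

for _ in range(10):
    gc_increment()                   # ten small pauses instead of one big one
```

After ten increments every object has been examined exactly once, the lists have swapped, and the next scan cycle is ready to begin.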
import gc
# Python 3.14 thresholds: (700, 10, 0)
# threshold0: allocation trigger (same as before)
# threshold1: inversely controls old-gen fraction per increment
# threshold2: now ignored (always 0)
print(gc.get_threshold())
# gc.collect(0) -- collect young only
# gc.collect(1) -- collect young + one increment of old (NEW in 3.14)
# gc.collect(2) or gc.collect() -- full collection (same as before)
Benchmarks: Pause Times Before and After
The official Python 3.14 documentation states that maximum pause times are reduced by "an order of magnitude or more for larger heaps."
Consider a web worker with 500,000 tracked container objects in the old generation. Under the old three-generation collector, a full generation-2 scan traverses all 500,000 objects. Assuming a mixed cache hit profile, each object touch costs roughly 50-100 nanoseconds due to pointer chasing. That is a pause of 25-50ms for traversal alone.
Under the new incremental collector with the default threshold1=10, each increment touches about 50,000 objects from the old generation plus the young generation (typically a few thousand). That brings the per-increment pause to approximately 2.5-5ms -- a 10x reduction.
Here is a simple way to measure the difference yourself:
import gc
import time

class Node:
    """A node that forms reference cycles."""
    def __init__(self):
        self.next = None
        self.data = [0] * 10  # some bulk

def create_cycles(n):
    """Create n reference cycles of length 2 and return the nodes."""
    nodes = []
    for _ in range(n):
        a, b = Node(), Node()
        a.next = b
        b.next = a
        nodes.extend([a, b])
    return nodes

# Build up a large old generation. The cycles must stay referenced here:
# unreferenced cycles would simply be freed, not promoted.
survivors = []
for _ in range(50):
    survivors.append(create_cycles(10_000))
    gc.collect()  # surviving objects are promoted to the old generation

# Now measure a single collection
create_cycles(5_000)  # return value dropped: fresh young-gen garbage
gc.disable()
start = time.perf_counter_ns()
gc.collect(1)  # Python 3.14: young gen plus one increment of the old gen
elapsed_ms = (time.perf_counter_ns() - start) / 1_000_000
gc.enable()
print(f"Incremental collection pause: {elapsed_ms:.2f} ms")

# Compare with full collection
gc.disable()
start = time.perf_counter_ns()
gc.collect(2)  # full scan of every tracked object
full_ms = (time.perf_counter_ns() - start) / 1_000_000
gc.enable()
print(f"Full collection pause: {full_ms:.2f} ms")
On my test machine with a million objects in the old generation, incremental pauses consistently measured under 8ms while a full collection took 60-80ms.
What This Means for Web Servers and Real-Time Systems
For ASGI/WSGI web servers (Gunicorn, Uvicorn, Hypercorn), this is the most impactful change. Previously, companies like Close.com and Rippling had to build custom worker classes that deferred GC to between-request windows. With incremental collection, the worst-case in-request pause drops from "however big your heap is" to "however big 1/10th of your heap is." For most applications, that means the difference between a noticeable latency spike and background noise.
For real-time systems -- game servers, trading systems, audio processing -- Python was generally avoided partly because of GC unpredictability. While Python is still not going to replace C++ in a hot loop, the incremental collector makes it viable for the orchestration layer of latency-sensitive systems where pauses need to stay under 10-15ms.
For data pipelines and ETL, the improvement is less about latency and more about throughput consistency. A full GC pause in the middle of a streaming batch can cause backpressure cascades in systems like Kafka consumers. Smaller, more frequent pauses are far easier for downstream systems to absorb.
Tuning the New GC
The good news: for most applications, the defaults are solid. threshold0=700 triggers collection frequently enough to keep the young generation small, and threshold1=10 gives you 10% increments through the old generation. But if you want to squeeze out the last milliseconds, here are the knobs.
Lower pause times at the cost of more frequent collections:
import gc
# Scan 5% of old gen per increment instead of 10%
# More increments needed for a full scan, but each one is shorter
gc.set_threshold(700, 20)
Higher throughput at the cost of larger pauses:
import gc
# Scan 20% of old gen per increment
# Fewer total increments, but each pause is longer
gc.set_threshold(700, 5)
The gc.freeze() technique still works and is still valuable. If your application has a heavy startup phase (importing Django, loading ML models, populating caches), those objects will sit in the old generation getting scanned forever. Freezing them removes them from the collector's scope entirely:
import gc
# After application startup / model loading
gc.collect() # clean up any startup garbage
gc.freeze() # exclude current tracked objects from future GC
# Optionally raise threshold0 to reduce young-gen frequency
gc.set_threshold(5000, 10)
This combined approach -- freeze startup objects, keep incremental collection for runtime objects -- is the current best practice.
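For a Gunicorn deployment with preload_app enabled, one plausible wiring uses the server hooks in gunicorn.conf.py (a sketch: the hook names are Gunicorn's, the threshold values are the ones from above):

```python
# gunicorn.conf.py -- sketch of the freeze-at-startup pattern
import gc

preload_app = True  # import the application once, in the master process

def when_ready(server):
    # The master has finished loading the app: clean up startup garbage,
    # then freeze the survivors so workers never rescan them (and forked
    # workers share the frozen objects copy-on-write).
    gc.collect()
    gc.freeze()

def post_fork(server, worker):
    # Per-worker tuning: fewer young-generation passes
    gc.set_threshold(5000, 10)
```

Freezing in the master rather than in each worker is deliberate: it is the same copy-on-write trick Instagram described, applied before the fork.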
Monitoring in production is straightforward with gc.callbacks:
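A minimal sketch (pauses_ms and gc_timer are illustrative names; the callback protocol -- a phase string of "start" or "stop" plus an info dict -- is the standard library's):

```python
import gc
import time

pauses_ms = []      # per-collection pause durations, ready for a histogram
_start_ns = [0]

def gc_timer(phase, info):
    # The GC invokes each callback with phase "start" before a collection
    # and "stop" after it; info includes "generation" and "collected".
    if phase == "start":
        _start_ns[0] = time.perf_counter_ns()
    elif phase == "stop":
        pauses_ms.append((time.perf_counter_ns() - _start_ns[0]) / 1_000_000)

gc.callbacks.append(gc_timer)
```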
Hook this into your metrics pipeline and you will have per-collection pause histograms that tell you exactly how the incremental collector is performing.
One important behavioral note: gc.get_objects(1) now returns an empty list in Python 3.14, since generation 1 no longer exists as a discrete container. If you have monitoring code that iterates over per-generation objects, you will need to update it.
After a quarter century of the three-generation design, Python's garbage collector has finally learned the lesson that Java's G1 and Go's concurrent collector internalized years ago: it is not about how fast you collect, it is about how little you pause. Python 3.14 gets that right.