DirectBuffer creation / disposal has hidden contention on sun.misc.Cleaner
I recently came across a (dying) JVM with about 10 threads BLOCKED on the monitor of the sun.misc.Cleaner Class object. This was not the root cause of the failure, but I learned something by looking into those blocked threads.
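A common and simple (though not ideal) way to trigger the release of the native memory backing a DirectByteBuffer is:
import sun.nio.ch.DirectBuffer;
sun.misc.Cleaner cleaner = ((DirectBuffer) buffer).cleaner();
if (cleaner != null) cleaner.clean(); // slices and duplicates have no cleaner of their own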
Every DirectByteBuffer that owns its native memory holds a Cleaner, which is a type of PhantomReference. PhantomReference differs from its relatives SoftReference and WeakReference in that it allows actions to be executed after the JVM has performed all collection tasks related to the referent, including running its finalizer. As “Effective Java” advises in Item 6, finalizers should be avoided as “unpredictable, often dangerous, and generally unnecessary”: they are sensitive to the GC and JVM implementation, and they risk exposing references to about-to-be-collected objects, which can resurrect those objects and prevent their collection. PhantomReference is a suitable alternative.
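To make the PhantomReference mechanics concrete, here is a minimal sketch using only the public java.lang.ref API. It shows the two properties that make PhantomReference safer than a finalizer: get() always returns null, so cleanup code can never reach (and resurrect) the referent, and the reference object itself is placed on a ReferenceQueue once the GC has found the referent phantom-reachable. The class name PhantomDemo is hypothetical, and whether the enqueue is observed within the timeout depends on System.gc() actually triggering a collection, which the JVM is free to ignore.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

public class PhantomDemo {
    public static void main(String[] args) throws InterruptedException {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object referent = new byte[1024];
        PhantomReference<Object> ref = new PhantomReference<>(referent, queue);

        // get() on a PhantomReference always returns null, even while the
        // referent is still strongly reachable, so it cannot be resurrected.
        System.out.println("get() = " + ref.get());

        referent = null;   // drop the only strong reference
        System.gc();       // a hint only; the JVM may ignore it

        // Wait up to one second for the GC to enqueue the reference.
        Reference<?> enqueued = queue.remove(1000);
        System.out.println(enqueued == ref
                ? "referent is phantom-reachable: run cleanup actions here"
                : "not enqueued yet (no GC ran, or it has not finished)");
    }
}
```

The cleanup action is driven by the queue rather than by a method on the dying object, which is what lets the JDK avoid finalizers for DirectByteBuffer.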
The Cleaner works as described in its Javadoc:
The cleaner tracks a referent object and encapsulates a thunk of
arbitrary cleanup code. Some time after the GC detects that a
cleaner's referent has become phantom-reachable, the
reference-handler thread will run the cleaner.
The ReferenceHandler is one of the JVM threads related to garbage collection tasks; it appears in every thread dump as “Reference Handler”. The cleaner is executed precisely by Reference::tryHandlePending:
static boolean tryHandlePending(boolean waitForNotify) {
    Reference<Object> r;
    Cleaner c;
    try {
        synchronized (lock) {
            if (pending != null) {
                r = pending;
                // 'instanceof' might throw OutOfMemoryError sometimes
                // so do this before un-linking 'r' from the 'pending' chain...
                c = r instanceof Cleaner ? (Cleaner) r : null;
                [...]
    // Fast path for cleaners
    if (c != null) {
        c.clean();
        return true;
    }
That is, once the GC has found a referent to be phantom-reachable and made its Reference pending, the ReferenceHandler takes that Reference and, if it is a Cleaner (as in our case), invokes its clean method.
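The flow above can be imitated in user code. This is a minimal sketch, not the JDK implementation: ThunkCleaner is a hypothetical stand-in for sun.misc.Cleaner (a PhantomReference carrying a Runnable thunk), handle mirrors the instanceof fast path quoted above, and a daemon thread draining a ReferenceQueue plays the role of the Reference Handler. One real detail worth noting: sun.misc.Cleaner keeps every live cleaner strongly reachable through a static linked list, because an unreachable Reference object may be collected without ever being enqueued; this sketch uses Reference.reachabilityFence (JDK 9+) for the same purpose.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class MiniReferenceHandler {

    // Hypothetical stand-in for sun.misc.Cleaner: a PhantomReference that
    // carries a thunk of cleanup code. The thunk must not capture the referent.
    static final class ThunkCleaner extends PhantomReference<Object> {
        private final Runnable thunk;

        ThunkCleaner(Object referent, ReferenceQueue<Object> queue, Runnable thunk) {
            super(referent, queue);
            this.thunk = thunk;
        }

        void clean() {
            thunk.run();
        }
    }

    static final ReferenceQueue<Object> QUEUE = new ReferenceQueue<>();

    // Mirrors the fast path in Reference.tryHandlePending: if the reference
    // is a cleaner, run it right away and report that work was done.
    static boolean handle(Reference<?> r) {
        ThunkCleaner c = r instanceof ThunkCleaner ? (ThunkCleaner) r : null;
        if (c != null) {
            c.clean();
            return true;
        }
        return false; // not a cleaner: nothing to do in this sketch
    }

    // A daemon thread playing the role of the Reference Handler.
    static void startHandler() {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    handle(QUEUE.remove()); // blocks until the GC enqueues something
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "mini-reference-handler");
        t.setDaemon(true);
        t.start();
    }

    public static void main(String[] args) throws InterruptedException {
        startHandler();
        CountDownLatch cleaned = new CountDownLatch(1);
        Object resource = new Object();
        ThunkCleaner cleaner = new ThunkCleaner(resource, QUEUE, cleaned::countDown);
        resource = null;  // the referent is now only phantom-reachable
        System.gc();      // a hint: collection is not guaranteed
        boolean ran = cleaned.await(2, TimeUnit.SECONDS);
        System.out.println("thunk ran after GC: " + ran);
        // Keep the cleaner itself reachable until this point.
        Reference.reachabilityFence(cleaner);
    }
}
```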
The Javadoc paragraph quoted earlier continues:
Cleaners may also be invoked directly; they are thread safe and ensure
that they run their thunks at most once.
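The “at most once” guarantee matters because a cleaner can be invoked both by application code and by the Reference Handler. sun.misc.Cleaner gets it from its synchronized remove method: clean returns early if the cleaner has already been removed from its list. The sketch below illustrates the same contract with a swapped-in, lock-free technique (an AtomicBoolean compare-and-set); AtMostOnceCleaner is a hypothetical name, and this is not how the JDK class is written.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class AtMostOnceCleaner {
    private final AtomicBoolean cleaned = new AtomicBoolean(false);
    private final Runnable thunk;

    public AtMostOnceCleaner(Runnable thunk) {
        this.thunk = thunk;
    }

    // Only the first caller wins the compare-and-set and runs the thunk;
    // every later or concurrent caller returns false without running it.
    public boolean clean() {
        if (!cleaned.compareAndSet(false, true)) {
            return false;
        }
        thunk.run();
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger runs = new AtomicInteger();
        AtMostOnceCleaner cleaner = new AtMostOnceCleaner(runs::incrementAndGet);
        CountDownLatch start = new CountDownLatch(1);
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                try {
                    start.await(); // release all threads at once
                } catch (InterruptedException e) {
                    return;
                }
                cleaner.clean();
            });
            threads[i].start();
        }
        start.countDown();
        for (Thread t : threads) {
            t.join();
        }
        System.out.println("thunk runs: " + runs.get()); // prints "thunk runs: 1"
    }
}
```

Unlike the JDK's approach, the compare-and-set takes no shared lock, so concurrent callers never block each other.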
Why would we want to invoke cleaners directly?
Because it’s hard to predict when a DirectByteBuffer will be collected and its underlying native memory returned to the OS. For example, imagine an application that generates mostly short-lived objects and promotes very few of them to the Old Generation. It may take a long time (hours? days?) before a Major GC (which cleans the Old Generation) is triggered. If one of those tenured objects is a DirectByteBuffer, it will hold on to its native memory long after it becomes unreachable. To make things worse, cleaning PhantomReferences (as well as Soft or Weak ones) requires two GC cycles.
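The gap between “unreachable” and “released” can be observed through the standard BufferPoolMXBean, whose pool named “direct” reports the native memory reserved by direct buffers. This sketch (the class name DirectMemoryWatch and the 64 MiB size are arbitrary choices) allocates a buffer, drops the only reference to it, and prints the pool usage before and after a System.gc() hint. Dropping the reference alone should not change the number; whether the GC hint lowers it depends on the collector and on timing.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

public class DirectMemoryWatch {

    // Bytes the JVM currently reports for its "direct" buffer pool.
    static long directMemoryUsed() {
        for (BufferPoolMXBean pool :
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1; // pool not reported by this JVM
    }

    public static void main(String[] args) throws InterruptedException {
        long before = directMemoryUsed();
        ByteBuffer buffer = ByteBuffer.allocateDirect(64 * 1024 * 1024); // 64 MiB
        System.out.println("after allocate:  +" + (directMemoryUsed() - before) + " bytes");

        buffer = null; // unreachable now, but the native memory is still reserved
        System.out.println("after dropping:  +" + (directMemoryUsed() - before) + " bytes");

        System.gc();       // only a hint
        Thread.sleep(500); // give the Reference Handler time to run the cleaner
        System.out.println("after System.gc: +" + (directMemoryUsed() - before) + " bytes");
    }
}
```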
So it makes sense to call the Cleaner explicitly and thus expedite the release of that memory. That is typically done as shown at the beginning or, if we want to avoid the sun.misc.Cleaner import, via reflection:
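import java.lang.reflect.Method;
Method cleanerMethod = buffer.getClass().getMethod("cleaner");
cleanerMethod.setAccessible(true); // java.nio.DirectByteBuffer is not a public class
Object cleaner = cleanerMethod.invoke(buffer);
if (cleaner != null) { // slices and duplicates have no cleaner of their own
    Method cleanMethod = cleaner.getClass().getMethod("clean");
    cleanMethod.setAccessible(true);
    cleanMethod.invoke(cleaner);
}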
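This should work on Oracle and OpenJDK JVMs up to Java 8, although it is not fully platform independent: from Java 9 on, the Cleaner class lives in jdk.internal.ref and the module system restricts this kind of reflective access.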
Looking at the Cleaner source, it turns out that both the add and remove methods, which are called when a new DBB is instantiated and when its Cleaner is invoked, are static and synchronized, so they both coordinate on the monitor of the Cleaner Class object itself:
private static synchronized Cleaner add(Cleaner c1)
private static synchronized boolean remove(Cleaner c1)
This means we have contention among all application threads trying to create or dispose DirectByteBuffer instances, plus the ReferenceHandler thread.
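A static synchronized method locks the java.lang.Class object of its class, exactly as if its body were wrapped in synchronized (Cleaner.class). The hypothetical ClassMonitorDemo below copies the pattern, a static doubly linked list guarded by static synchronized add and remove methods modeled loosely on sun.misc.Cleaner, and then reproduces the BLOCKED state from the thread dump: main holds the class monitor (standing in for some other thread inside add or remove) while a second thread tries to add.

```java
public class ClassMonitorDemo {
    // Hypothetical miniature of the Cleaner bookkeeping: a static
    // doubly linked list guarded by static synchronized methods.
    private static ClassMonitorDemo first;
    private ClassMonitorDemo next, prev;

    // static synchronized == synchronized (ClassMonitorDemo.class) { ... }
    static synchronized ClassMonitorDemo add(ClassMonitorDemo c) {
        if (first != null) {
            c.next = first;
            first.prev = c;
        }
        first = c;
        return c;
    }

    static synchronized boolean remove(ClassMonitorDemo c) {
        if (c.next == c) {
            return false; // already removed
        }
        if (first == c) {
            first = (c.next != null) ? c.next : c.prev;
        }
        if (c.next != null) c.next.prev = c.prev;
        if (c.prev != null) c.prev.next = c.next;
        c.next = c; // mark as removed by pointing to itself
        c.prev = c;
        return true;
    }

    // Shows which lock a static synchronized method actually holds.
    static synchronized boolean holdsClassMonitor() {
        return Thread.holdsLock(ClassMonitorDemo.class);
    }

    public static void main(String[] args) throws InterruptedException {
        Thread creator = new Thread(() -> add(new ClassMonitorDemo()), "creator");
        // Hold the class monitor, as another thread inside add/remove would.
        synchronized (ClassMonitorDemo.class) {
            creator.start();
            while (creator.getState() != Thread.State.BLOCKED) {
                Thread.sleep(10); // wait until it is parked on the monitor
            }
            System.out.println(creator.getName() + " is " + creator.getState());
        }
        creator.join();
        System.out.println("after release: " + creator.getState());
    }
}
```

Running it should print “creator is BLOCKED” followed by “after release: TERMINATED”. With many threads allocating and freeing buffers, every one of them serializes on that single monitor.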
I already knew that DBBs are better reused than created frequently, because native memory allocations are much more expensive than heap allocations. So here is another reason: a high rate of creation and cleanup also adds contention on the monitor of the sun.misc.Cleaner class.
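Reuse can be as simple as a fixed-size pool. The following is a minimal sketch with hypothetical names (DirectBufferPool, acquire, release): buffers are allocated once up front and recycled afterwards, so steady-state traffic never calls allocateDirect() and never touches the Cleaner's add/remove bookkeeping.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical fixed-size pool of direct buffers, allocated once and recycled.
public class DirectBufferPool {
    private final BlockingQueue<ByteBuffer> free;
    private final int bufferSize;

    public DirectBufferPool(int buffers, int bufferSize) {
        this.free = new ArrayBlockingQueue<>(buffers);
        this.bufferSize = bufferSize;
        for (int i = 0; i < buffers; i++) {
            free.add(ByteBuffer.allocateDirect(bufferSize));
        }
    }

    // Blocks until a buffer is available instead of allocating a new one.
    public ByteBuffer acquire() throws InterruptedException {
        return free.take();
    }

    // Basic sanity check (direct, right size), then reset and recycle.
    public void release(ByteBuffer buffer) {
        if (!buffer.isDirect() || buffer.capacity() != bufferSize) {
            throw new IllegalArgumentException("not a direct buffer of this pool's size");
        }
        buffer.clear(); // reset position and limit for the next user
        if (!free.offer(buffer)) {
            throw new IllegalStateException("pool is already full");
        }
    }

    public int available() {
        return free.size();
    }

    public static void main(String[] args) throws InterruptedException {
        DirectBufferPool pool = new DirectBufferPool(4, 8192);
        ByteBuffer buffer = pool.acquire(); // no allocateDirect() here
        buffer.putInt(42);
        System.out.println("in use, available: " + pool.available());   // prints 3
        pool.release(buffer);
        System.out.println("released, available: " + pool.available()); // prints 4
    }
}
```

This pool simply blocks when exhausted; a real one would need a policy for that case (block with a timeout, fail fast, or fall back to a heap buffer).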
One would assume that the rate of DBB allocations should never be high enough to turn this into a problem. Except when it is. The JVM in this case ran a legacy service where a misbehaving connection pool caused lots of connections to be destroyed and re-created, along with their internal DirectByteBuffers. The situation degraded enough for DBB allocations and releases to impact the ReferenceHandler (and thus GC) and to create manifest contention on Cleaner, which contributed to a snowball effect that ended in a complete loss of service.
Any thoughts? Send me an email!