i have a large app which is currently running with a 600GB max heap (Xmx). the app now processes at a controlled rate (once every half an hour) to avoid an OOM.
each run consumes and processes >3.2 million kafka messages in less than 5 minutes (normally 1 or 2 minutes).
even after a lot of tuning, when i looked at the heap size, i saw the memory footprint kept climbing. even though the eden space gets very frequent minor GCs during the <5 minute window, the old gen keeps growing gradually.
this really seems like a memory leak.
however, after another thorough check of the code, it looks like the collections of objects that are no longer used did get dereferenced.
so if the code is right, then it looks like the gc might not be doing its job on the old gen.
so i then triggered a manual GC, which resulted
after spending some time looking into this in detail, it turns out the default ratio to trigger the gc on the old gen is >40%. and this is definitely not the ideal setup for this application: it sits idle for 25 minutes out of every 30-minute interval, yet because the gc waits for a fixed occupancy ratio before it triggers, that memory is mostly wasted in the meantime.
around 120GB was wasted in this case before the manual gc.
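
to put a number on how much a manual gc gives back, something like the sketch below can be run inside the app (or adapted into a JMX check). it is only a rough illustration, not what i actually ran: the class name `OldGenProbe` is made up, and the pool-name matching assumes a collector that exposes a pool called something like "G1 Old Gen", "PS Old Gen" or "Tenured Gen".

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper: reports old gen usage before and after an explicit GC request.
public class OldGenProbe {

    /** Returns the used bytes of the old generation pool, or -1 if no matching pool is found. */
    static long oldGenUsedBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            // Assumption: pool names are collector-specific ("G1 Old Gen", "PS Old Gen", "Tenured Gen").
            boolean looksLikeOldGen = name.contains("Old Gen") || name.contains("Tenured");
            if (pool.getType() == MemoryType.HEAP && looksLikeOldGen) {
                MemoryUsage usage = pool.getUsage();
                return usage == null ? -1L : usage.getUsed();
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        long before = oldGenUsedBytes();
        // System.gc() is only a request; it is ignored when -XX:+DisableExplicitGC is set.
        System.gc();
        long after = oldGenUsedBytes();

        long mb = 1024L * 1024L;
        System.out.printf("old gen used before: %d MB%n", before / mb);
        System.out.printf("old gen used after:  %d MB%n", after / mb);
        System.out.printf("reclaimed:           %d MB%n", (before - after) / mb);
    }
}
```

if the "reclaimed" figure is large right after a run, the old gen was mostly holding garbage that nothing had collected yet, which is a different situation from a leak.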
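so it turns out JEP 346, proposed in 2018, addresses exactly this: it adds a periodic gc to G1 so that an idle application gets its garbage collected without waiting for the occupancy threshold.
and obviously, on a JDK that includes that JEP (it was delivered in JDK 12), leveraging the periodic gc is a much needed practice instead of leaving it to the gc algorithm alone.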
-XX:G1PeriodicGCInterval=600000 -XX:G1PeriodicGCSystemLoadThreshold=LOAD -XX:-G1PeriodicGCInvokesConcurrent
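
to double check that the running JVM actually knows these flags and picked up the values, a small sketch like the following can help. it assumes a HotSpot-based JVM (it uses `com.sun.management.HotSpotDiagnosticMXBean`), and the class name `PeriodicGcFlags` is just a placeholder of mine.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: prints the G1 periodic GC flags as seen by the running JVM.
public class PeriodicGcFlags {

    /** Returns the flag's current value, or "unsupported" if this JVM does not have the flag. */
    static String flagValue(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        if (bean == null) {
            return "n/a (no HotSpot diagnostic bean)";
        }
        try {
            return bean.getVMOption(name).getValue();
        } catch (IllegalArgumentException e) {
            // getVMOption throws this when the option does not exist in this JVM.
            return "unsupported";
        }
    }

    public static void main(String[] args) {
        String[] flags = {
            "G1PeriodicGCInterval",
            "G1PeriodicGCSystemLoadThreshold",
            "G1PeriodicGCInvokesConcurrent"
        };
        for (String flag : flags) {
            System.out.println(flag + " = " + flagValue(flag));
        }
    }
}
```

if a flag shows up as "unsupported", the JVM predates the periodic gc and the options above will not take effect on it.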
full gc with default setting where it’s triggered at >40% threshold: