Tuning the JVM for Performance Optimization
TORO Integrate is a web application configured to run on Apache Tomcat, a Java Servlet container. Tomcat, in turn, runs on top of a Java Virtual Machine (JVM); specifically, the Java HotSpot Virtual Machine if Oracle JDK is the JDK of choice[1]. In this guide, we discuss the JVM configuration options that can help, directly or indirectly, improve TORO Integrate's performance.
Generic JVM configuration?
While a one-size-fits-all JVM configuration would be convenient, there is no such thing; you should set up your JVM depending on how your organization uses TORO Integrate. The following tips are intended to help you decide which parameters to use and which values are appropriate for them.
Tuning requires more than just a basic knowledge of Java. The person responsible should have a thorough understanding of the production environment, JVM internal memory management, and testing methodology. Expertise in these topics is important, as attempting to tune an application without the technical know-how might end up doing more harm than good.
JVM configuration is a broad topic; for now, this article focuses on topics and configuration options related to the JVM Garbage Collector (GC), a component whose pauses can contribute greatly to performance degradation.
Crash Course on the JVM GC and Heap
Among the benefits of running an application on a JVM is that, under the covers, it uses a memory management tool called a garbage collector. This tool frees the memory held by unused Java objects[2] in the heap. The heap is where all live and 'dead' objects reside; it is a "runtime data area from which memory for all class instances and arrays is allocated"[3].
However, while the idea of a garbage collector sounds appealing, it comes at a cost; cleaning unused Java objects in the heap space causes all application threads to stop until the cleaning operation completes. This is called a stop-the-world (STW) event. The duration of the pause depends on the machine, JVM configuration, and the number of objects stored in memory. Simply put, the STW events of the GC can greatly degrade the overall performance of an application by stalling the application and consuming what otherwise could have been throughput.
The heap is divided into two important parts: (1) young generation and (2) old generation.
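The division of the heap can be observed on a running JVM through the standard `java.lang.management` API. The following is a minimal sketch, not part of TORO Integrate; the class name `HeapPools` is hypothetical, and the pool names it prints depend on the JVM vendor and on the garbage collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
public class HeapPools {

    /** Returns the names of the memory pools that belong to the heap of this JVM. */
    public static List<String> heapPoolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as the code cache
            }
            MemoryUsage usage = pool.getUsage(); // may be null if the pool is no longer valid
            if (usage != null) {
                System.out.println(pool.getName() + ": " + (usage.getUsed() / 1024) + " KiB used");
            }
        }
    }
}
```

On a generational collector, the young generation typically appears as separate eden and survivor pools, and the old generation as a single pool.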
According to Oracle's documentation on Java Garbage Collection Basics:
"The young generation is where all new objects are allocated and aged. When the young generation fills up, this causes a minor garbage collection. Minor collections can be optimized assuming a high object mortality rate. A young generation full of dead objects is collected very quickly. Some surviving objects are aged and eventually move to the old generation."
"The old generation is used to store long surviving objects. Typically, a threshold is set for young generation object and when that age is met, the object gets moved to the old generation. Eventually the old generation needs to be collected. This event is called a major garbage collection."
"Often a major collection is much slower because it involves all live objects. So for responsive applications, major garbage collections should be minimized. Also note, that the length of the stop-the-world event for a major garbage collection is affected by the kind of garbage collector that is used for the old generation space."
The two most influential factors that affect garbage collection performance are:
Total available memory
A common way to improve the performance of any application is to give it more memory to work with. With more memory, the heap takes longer to fill up, so collections occur less often. But a bigger heap can also mean the presence of more objects[4], which makes each garbage collection slower. From this, we can say that "throughput is inversely proportional to the amount of memory available."[5]
Proportion of the heap dedicated to the young generation
"The bigger the young generation, the less often minor collections occur. However, for a bounded heap size, a larger young generation implies a smaller old generation, which will increase the frequency of major collections. The optimal choice depends on the lifetime distribution of the objects allocated by the application."[5]
Tuning the Java Heap
One of the easiest ways to boost TORO Integrate's performance is to give it more memory; however, the more memory you allocate, the more garbage the collector may have to clean in each collection. The JVM heap size determines how often garbage collection runs and how long each collection takes. The goal in choosing a heap size is to minimize the time spent on garbage collection relative to the time spent running the application.
The JVM arguments for setting the heap size are:
-Xms<size>, which sets the initial and minimum heap size
-Xmx<size>, which sets the maximum heap size
java -Xmx2g -Xms2g TOROIntegrate
Equal values for -Xms and -Xmx
Setting -Xms and -Xmx to equal values generally yields higher throughput than setting them to different values, as this prevents the JVM from spending time growing the allocated heap.
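To check that the heap arguments were actually picked up, the standard `Runtime` API can be queried from inside the JVM. This is a minimal sketch; the class name `HeapSettings` is hypothetical. The reported maximum only approximates the -Xmx value, since some collectors exclude part of the heap (for example, one survivor space) from the figure.

```java
// Hypothetical helper class, for illustration only.
public class HeapSettings {

    /** Converts a byte count to whole mebibytes (rounding down). */
    public static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Upper bound the heap may grow to; roughly corresponds to -Xmx.
        System.out.println("max heap (MiB):       " + toMiB(rt.maxMemory()));
        // Heap currently reserved by the JVM; starts near -Xms.
        System.out.println("committed heap (MiB): " + toMiB(rt.totalMemory()));
        // Unused portion of the currently committed heap.
        System.out.println("free heap (MiB):      " + toMiB(rt.freeMemory()));
    }
}
```

Running it as `java -Xms2g -Xmx2g HeapSettings` should report maximum and committed sizes that are close to each other, which is the situation described above.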
Tuning the JVM Garbage Collector
When increasing the allocated JVM memory is no longer a valid resolution for improving the performance of your application, you can start tuning your garbage collector to improve the pause time or frequency of stop-the-world events. In general, to do this you must:
- Choose the type of garbage collector that suits your production environment.
- Profile the results and see whether your changes were satisfactory.
Rinse and repeat until you get the JVM Garbage Collector configuration that is right for you.
Choosing the Right JVM Garbage Collector
"When does the choice of a garbage collector matter? For some applications, the answer is never. That is, the application can perform well in the presence of garbage collection with pauses of modest frequency and duration. However, this is not the case for a large class of applications, particularly those with large amounts of data (multiple gigabytes), many threads, and high transaction rates."
There are two types of collectors: (1) throughput-oriented collectors and (2) low-pause collectors. Throughput is the percentage of the application's total running time that is not spent in garbage collection. Pause time is how long a stop-the-world event lasts during garbage collection.
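The throughput definition can be written as a small calculation. The sketch below is illustrative only; the class and method names are hypothetical, and the input figures are made up for the example.

```java
// Hypothetical helper class, for illustration only.
public class GcThroughput {

    /**
     * Returns the percentage of total run time that was NOT spent in garbage collection.
     *
     * @param gcMillis    time spent in garbage collection, in milliseconds
     * @param totalMillis total run time of the application, in milliseconds
     */
    public static double throughputPercent(long gcMillis, long totalMillis) {
        if (totalMillis <= 0 || gcMillis < 0 || gcMillis > totalMillis) {
            throw new IllegalArgumentException("expected 0 <= gcMillis <= totalMillis and totalMillis > 0");
        }
        return 100.0 * (totalMillis - gcMillis) / totalMillis;
    }

    public static void main(String[] args) {
        // Made-up figures: 3 seconds of GC during a 60 second run.
        System.out.println(throughputPercent(3_000, 60_000) + " %"); // prints "95.0 %"
    }
}
```

Two configurations with the same throughput can still differ in pause time: one long pause and many short pauses may add up to the same total GC time.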
A simple way to choose between the types of garbage collector is to consider the requirements of your production environment. If you want TORO Integrate to perform as much work as possible and can tolerate recurring longer pauses, a throughput-oriented collector is suitable. If, however, your production environment prioritizes low latency, a low-pause collector may be more suitable.
Throughput Oriented Collectors
Serial Collector
This performs all garbage collection work using a single thread. It works best on a single-processor machine, as it cannot take advantage of multiprocessor hardware, although it can be useful on multiprocessors if the data set is smaller than about 100 MB. This is enabled by the JVM argument -XX:+UseSerialGC.
Parallel Collector (or Throughput Collector)
This performs minor collections in parallel, which significantly reduces garbage collection overhead. This garbage collector is best suited to production environments that run on multiprocessor hardware with medium to large data sets. This is enabled by the JVM argument -XX:+UseParallelGC. This argument, however, only uses parallel garbage collection on minor collections; for major collections you will need to explicitly enable it with -XX:+UseParallelOldGC.
Low Pause Collectors (Response Time Oriented Collector)
Concurrent Mark-and-Sweep Collector
This collector is designed to perform most of the cleaning work concurrently with the application. It still requires the application to pause for certain cleaning phases. This is enabled by the -XX:+UseConcMarkSweepGC JVM argument.
Garbage First Collector
An alternative to the Concurrent Mark-and-Sweep Collector is the newer Garbage First (G1) collector. It is the default garbage collector as of Java 9 and is intended to replace the CMS GC. This is enabled by the -XX:+UseG1GC JVM argument.
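After changing the collector setting, it can be worth confirming which collector the JVM actually selected. The following minimal sketch uses the standard `GarbageCollectorMXBean` API; the class name `ActiveCollectors` is hypothetical, and the collector names it prints are implementation-specific strings that differ between collectors and JVM versions.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
public class ActiveCollectors {

    /** Returns the names of the garbage collectors registered in this JVM. */
    public static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Typically one entry for young-generation and one for old-generation collections.
        for (String name : collectorNames()) {
            System.out.println(name);
        }
    }
}
```

Running the class once per candidate argument (for example, with and without a collector flag) shows whether the flag took effect on your JVM version.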
After choosing the garbage collector that suits your production environment, the most reliable way to know whether you made the right choice or not is by profiling TORO Integrate. Don't get fixated on your JVM configuration just yet. In the world of performance tuning, there is one iron-clad rule: "Never guess. Profile the application."
When testing your configuration, it is important to follow these points to ensure the accuracy of results:
- Perform tests in a controlled environment. This means that you have to test TORO Integrate in a dedicated environment where nothing but TORO Integrate affects the overall performance.
- Perform tests that mimic real world scenarios.
- Learn how to monitor the garbage collector's logs.
The procedure for profiling the configured application is generally a repetitive process and roughly entails the following steps:
1. Determine the production environment's requirements.
2. Determine the suitable garbage collector for the production environment.
3. Configure the garbage collector suitable for the production environment.
4. Enable garbage collector logging on the JVM. This can be as simple as using a single JVM option.
5. Conduct tests using a test case that mimics a real-world scenario. A good example is executing REST calls.
6. Analyze the garbage collection logs. You can make use of a user-friendly automated GC log analyzer to do the job for you.
7. Decide whether the current configuration is satisfactory for your production environment. If (1) the application pause duration is too long or (2) garbage collection occurs too frequently, then you might have to tune garbage collection further. In this case, repeat all steps starting from step two.
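As a complement to reading GC logs, the number of collections and the time spent in them can be sampled from inside the JVM before and after a test run. This is a minimal sketch under stated assumptions: the class `GcDelta` and the allocation loop standing in for a real test case are hypothetical. For the logs themselves, HotSpot options such as -verbose:gc (or -Xlog:gc* on Java 9 and later) are commonly used; treat the exact option as an assumption about your JVM version.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper class, for illustration only.
public class GcDelta {

    /** Sums the collection counts of all collectors, ignoring undefined (-1) values. */
    public static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionCount());
        }
        return total;
    }

    /** Sums the approximate accumulated collection time in milliseconds. */
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionTime());
        }
        return total;
    }

    /** Runs the workload and returns {collections, gcMillis} observed while it ran. */
    public static long[] measure(Runnable workload) {
        long countBefore = totalGcCount();
        long millisBefore = totalGcMillis();
        workload.run();
        return new long[] { totalGcCount() - countBefore, totalGcMillis() - millisBefore };
    }

    public static void main(String[] args) {
        // Stand-in workload: allocates many short-lived arrays, which may trigger minor collections.
        long[] delta = measure(() -> {
            long checksum = 0;
            for (int i = 0; i < 200_000; i++) {
                byte[] garbage = new byte[1024];
                garbage[0] = (byte) i;
                checksum += garbage[0];
            }
            System.out.println("checksum: " + checksum);
        });
        System.out.println("collections: " + delta[0] + ", GC time (ms): " + delta[1]);
    }
}
```

These counters only give totals; the pause-by-pause detail needed for step 6 still comes from the GC logs.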
To save us some time, we'll use one of our existing tuning examples: Tuning web service performance. We will be using the configuration and testing guidelines described in the linked document.
In this test, we'll:
- Turn off both Tracker and Monitor;
- Modify the test script to return a 29.346 KB JSON response; and
- Measure performance using throughput.
The test case will show the difference between two JVM arguments:
-Xmx1g -Xms1g and
We have already run the test, and the results are described in the chart and table below:
The difference between these two configurations is minimal. This is mainly because the only significant process that consumes memory during the test is receiving REST requests and serving REST responses. In a real-world scenario, TORO Integrate will have more work to do than just receiving and responding to REST requests.
Not enough performance boost?
Sometimes, the default configuration of the garbage collector is not enough, or adding more memory is not an option. In this case, you can tune the size of the heap's young generation; its proportion of the heap is the second-most influential factor in the performance of the garbage collector.
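As a rough illustration of young generation sizing: on HotSpot, the `-XX:NewRatio=N` option sets the ratio between the old and young generations to N:1, so the young generation receives about 1/(N+1) of the heap (the young generation can also be fixed directly with `-Xmn<size>`). The sketch below only does that arithmetic; the class and method names are hypothetical, and the real JVM rounds sizes to its own alignment. Some collectors, G1 in particular, size the young generation adaptively.

```java
// Hypothetical helper class, for illustration only.
public class YoungGenSizing {

    /**
     * Approximates the young generation size for a given heap size and NewRatio.
     * With NewRatio = N, old:young is N:1, so young is heap / (N + 1).
     */
    public static long youngGenBytes(long heapBytes, int newRatio) {
        if (heapBytes <= 0 || newRatio < 1) {
            throw new IllegalArgumentException("expected heapBytes > 0 and newRatio >= 1");
        }
        return heapBytes / (newRatio + 1);
    }

    public static void main(String[] args) {
        long heap = 2L * 1024 * 1024 * 1024; // assumes a 2 GiB heap, as with -Xms2g -Xmx2g
        for (int ratio = 1; ratio <= 3; ratio++) {
            long youngMiB = youngGenBytes(heap, ratio) / (1024 * 1024);
            System.out.println("NewRatio=" + ratio + " -> young generation of about " + youngMiB + " MiB");
        }
    }
}
```

Whether a larger or smaller young generation helps depends on the object lifetime distribution of your workload, so any change should be verified by profiling.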
To guide you in your journey to JVM optimization, you may check out:
- Tuning Java Garbage Collection for Web Services
- Garbage Collection Optimization for High-Throughput and Low-Latency Java Applications
- Java Platform, Standard Edition HotSpot Virtual Machine Garbage Collection Tuning Guide
[1] At the moment, TORO can only recommend running TORO Integrate with the Oracle JDK, which uses the HotSpot JVM. Switching to a different JDK may result in unexpected behavior.
[2] Java objects that are no longer referenced at any succeeding point of the program.
[3] Oracle Corporation. (n.d.). Chapter 2. The Structure of the Java Virtual Machine. Retrieved January 7, 2019, from https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-2.html
[4] Though this is not always the case.
[5] Oracle Corporation. (n.d.). Sizing the Generations. Retrieved March 13, 2019, from https://docs.oracle.com/javase/8/docs/technotes/guides/vm/gctuning/sizing.html#sthref22