Sunday, February 12, 2017

JVM Performance Tuning Part V - JVM Diagnostics

In this last section, I attempt to pull together the individual topics covered in the previous blogs and offer some general advice for approaching tuning and how to verify your tweaks are having the desired effects.

Just to remind ourselves, I cover Basic Principles in Part I, the Java Memory Model in Part II, Garbage Collectors in Part III, and Heap Configuration and the JIT in Part IV.

Here we’ll learn that, in practice, these discrete topics impact each other and that tuning is really about getting the balance right. That means you’ll need an understanding of the whole in order to be confident when addressing the specifics.

Memory Size & GC Performance

In their guide to Memory Management in the Java Hotspot™ Virtual Machine (from 2006) Sun Microsystems remind us that:
“Sufficient available memory is the most important factor affecting GC performance.”
And:
“The second most influential factor is the proportion of the heap dedicated to the Young [New] generation.”
These observations still hold true. Ensuring sufficient memory is available – both for holding older objects and to ensure there is room for new objects – is the main factor affecting how your GC behaves and, in turn, how your application performs.  

Achieving the Balance

Like any tuning effort, JVM tuning is all about achieving a balance. To help with understanding what might be perceived as conflicting goals, the following observations are worth considering:
  • Minor-Collections are typically much faster (by an order of magnitude) than Major-Collections.
  • Major-Collections are more likely to adversely affect observable performance, affecting start-up times and/or causing the application to pause or even appear unresponsive.
  • The JVM would seem to resort to a Major GC-cycle only when it can’t clear enough memory in a Minor GC-cycle.
  • A larger memory heap, with sufficient space allocated to the NewGen, can help avoid Major-Collections, though at the cost of prioritizing performance over memory usage.
  • Larger generations can result in longer GC cycles.
So, what can we conclude from this? Well, it seems reasonable that you’ll need a New-Generation large enough that the GC doesn’t have to resort to clearing out the Tenured-Generation, thereby incurring the cost of a Major-Collection. New objects are allocated in the New-Generation, so keeping the memory levels optimal here is key to ensuring smooth performance overall.

Of course, long-lived objects will eventually be promoted from the New-Generation to the Tenured-Generation, so it also follows that this generation needs to be large enough to hold these mature objects and still leave some room for expansion. Should the JVM have to trigger a Major-Collection, it’s likely it really does need the extra space and it’s critical that it can recover some memory. Often, depending on the nature and lifecycle of your mature objects, this situation may well be a short precursor to an OutOfMemoryError and the termination of your process.
Ultimately, the ideal scenario we want to achieve is to have the lower-cost Minor GC-cycle run as needed and, if at all possible, avoid the more expensive Major GC-cycles entirely.

GC Behaviour in Practice

As a practical observation, you’ll find that Minor GC-cycles kick in very soon after the process starts, triggering quite frequently thereafter. This typically doesn’t affect throughput at all, whereas you may well see your application pause when a Major GC-cycle is triggered. In fact, unless you explicitly log diagnostic info, you may never really notice a Minor GC-cycle in effect. You may recall from Part I that I made an analogy with bailing out a sinking boat. While GC strategies vary, the main job is to ensure there is enough memory free to allocate new objects and keep the ship afloat. That’s really what the Minor GC-cycle is doing: clearing out stale memory (i.e. garbage objects) fast enough to make room for fresh allocations. It merely needs to make room for new objects before limits are reached and, in this regard, GC performance has a direct bearing on memory size.
However, all this depends on the characteristics of your application and you may well need to do some leg-work to gain that level of understanding. This is where a profiling tool can help you, not so much with the performance profiling per se, but by giving you a better overview of what your application is actually doing, or at least indicating which parts of the code base are actively exercised. In turn, this can help you learn what sort of objects are being allocated and from where. I like to use JProfiler, a tool I selected after a short evaluation several years ago. Like all sophisticated tools, it takes a little time and effort to become familiar with and to get full and proper use out of it. However, one factor that fed into my decision to opt for JProfiler was that I felt it was the most intuitive profiler to use and I could quickly become effective. As an aside, I’d advise that spending the time and effort needed to understand the tool, and the advanced features in particular, may well pay dividends when you find yourself with the need to profile.

New/Tenured Memory Ratio

In cases where you don’t have a handle on the characteristics or behaviour of the application, you may want to consider using a trial-and-error approach to determine the optimal ratio between the new and tenured generations. Remember that, rather than configuring the size of the New Generation explicitly, you can configure it as a ratio of the Tenured Generation.
For example, use -XX:NewRatio=3 to do this.
This translates as a New:Tenured ratio of 1:3, which means the size of the New Generation will be a quarter (25%) of the total heap.
Note however that the New Gen grows and shrinks in correlation with the Tenured Generation, which may or may not be a consideration for you. Personally, I prefer to have more control and explicitly define the size of the New Gen, even when I’ve worked it out to be a neat fraction of the Tenured Generation.
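Either approach maps onto the command line. Here’s a sketch (the sizes and the MyApp class name are illustrative, not from the source):

```
# Ratio-based: New Gen = heap / (NewRatio + 1), i.e. 25% of a 1GB heap here
java -Xmx1g -XX:NewRatio=3 MyApp

# Explicit: pin the New Gen at 256MB, independent of how the rest of the heap resizes
java -Xmx1g -Xmn256m MyApp
```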

Garbage Collection Diagnostics

You can get the JVM to print statistics from each GC-cycle, which reveals interesting details on what the GC is doing and how it’s performing. The following JVM options will enable this:
  • -verbose:gc – enables basic GC logging.
  • -XX:+PrintGC – an equivalent basic per-cycle summary.
  • -XX:+PrintGCDetails – adds per-generation before/after figures.
  • -XX:+PrintGCTimeStamps – prefixes each entry with a timestamp.


The actual logged output will vary according to what GC policy is in use but here’s a sample of what the output looks like:
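For illustration, here’s roughly what a Minor and a Full collection look like under the throughput (Parallel) collector with -XX:+PrintGCDetails enabled on a Java 7 era JVM; the figures are made up for illustration:

```
[GC [PSYoungGen: 65536K->10748K(76288K)] 65536K->10756K(251392K), 0.0141540 secs]
[Full GC [PSYoungGen: 10748K->0K(76288K)] [ParOldGen: 65536K->30618K(175104K)] 76284K->30618K(251392K) [PSPermGen: 20971K->20959K(41984K)], 0.2183099 secs]
```

In each case the bracketed figures show a region before->after(capacity), while the unbracketed pair covers the whole heap.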
The first thing to note is that the log will tell you whether the GC cycle is a Minor or a Major collection. Next, it provides before-and-after details on memory sizes. The exact format can vary, but you should be able to work out how much memory was reclaimed by the GC. Finally, it’ll report the time spent performing the GC, and it’s worth comparing the scale of the Minor collections versus the Major collections.
The log output for a full or major GC cycle reports the before-and-after memory levels of each of the generations. In particular, note that the major GC cycle also examines PermGen.

Next, let’s examine the log output from the minor collection of the New Generation. The log entries for a Minor GC cycle show the before-and-after memory levels of the New Gen and the before-and-after levels of the full heap, which allows us to calculate the amount of garbage collected. The entry also reports the time taken for the GC cycle.
The following article from Sun Microsystems (now on the Oracle website) does an excellent job of explaining what the diagnostic entries mean, and it’s worth studying to decipher what you can glean: http://www.oracle.com/technetwork/java/example-141412.html

Diagnostics Analysis

By gathering these figures over time and then ‘massaging’ them into data, we can devise some key performance indicators and get an indication of how effective our GC policy and Heap-configuration actually is. (In this particular case, I manually massaged the raw data into a format that could be imported into an Excel spreadsheet but, with a little bit of effort, I’ve no doubt that the process could be automated through scripting.)
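As a sketch of that automation, the snippet below pulls the figures out of a minor-collection entry and emits a CSV row. The regex assumes the Parallel collector’s -XX:+PrintGCDetails format, and the class name is my own invention:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Turn a -XX:+PrintGCDetails minor-GC entry into a CSV row of figures.
public class GcLogParser {
    // e.g. "[GC [PSYoungGen: 65536K->10748K(76288K)] 65536K->10756K(251392K), 0.0141540 secs]"
    private static final Pattern MINOR = Pattern.compile(
        "\\[GC \\[PSYoungGen: (\\d+)K->(\\d+)K\\((\\d+)K\\)\\] "
        + "(\\d+)K->(\\d+)K\\((\\d+)K\\), ([\\d.]+) secs\\]");

    // Returns "youngBefore,youngAfter,youngSize,heapBefore,heapAfter,heapSize,pauseSecs"
    // (all sizes in KB), or null if the line isn't a minor-collection entry.
    public static String toCsv(String logLine) {
        Matcher m = MINOR.matcher(logLine);
        if (!m.find()) {
            return null;
        }
        return String.join(",", m.group(1), m.group(2), m.group(3),
                m.group(4), m.group(5), m.group(6), m.group(7));
    }

    public static void main(String[] args) {
        String line = "[GC [PSYoungGen: 65536K->10748K(76288K)] "
                + "65536K->10756K(251392K), 0.0141540 secs]";
        System.out.println(toCsv(line));
    }
}
```

From there, the CSV rows can be imported straight into a spreadsheet, replacing the manual massaging step.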
The raw data from the logs can be used to give percentage indicators, as follows:

Heap Occupancy lets us know how much of the heap we’re actually using, as a percentage. It can be used to help determine whether the heap is too big or too small. When tracked over a number of intervals, it can give an indication as to your application’s memory profile. (This would be merely a rough indicator and a profiler would give a more complete picture over time.)

GC Efficiency provides an indication as to how effective the GC invocation was, for each cycle. It indicates how much of the New Generation was cleared out by the GC cycle, which is a general ‘health’ indicator as it implies that the GC cycle was effective, cleared out memory, and that there is no need to perform a full cycle.


The following two graphs illustrate the same information, as bar and line graphs respectively.
Promotion Ratio gives an indication as to how much memory, or more precisely what percentage of the heap, is being promoted to the Tenured Generation. This is calculated by subtracting the size of the ‘heap decrease’ from the change in size of the New Generation.

Young/Tenured Ratio represents the size of the New Generation as a percentage of the entire heap and gives an indication as to the relationship of new objects versus the lifecycle of ‘tenured’ objects. It can be used to help size the New/Tenured generations, relative to each other.
Collectively, these figures provide indicators as to how effective your tuning actually is. They let you know whether your heap configuration settings are roughly correct and how effective your choice of GC is. Additionally, they provide a little insight into the lifecycle behaviour of the objects in your application over time. This can indicate whether you have an over-reliance on short-lived or temporary objects, or whether your cache (of objects) is too large for the amount of heap you’ve allocated. These indicators aren’t conclusive but can provide some clues as to where you might need to start when it comes to heap-analysis.
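As a sketch, here’s how the four indicators above might be computed from the figures in a single minor-collection entry. The formulas are my reading of the definitions, and the method names and sample values are illustrative:

```java
// Compute the four GC key performance indicators from one log entry.
// All sizes are in KB, as reported by -XX:+PrintGCDetails.
public class GcIndicators {
    static double heapOccupancy(long heapUsed, long heapCommitted) {
        // How much of the committed heap is actually in use, as a percentage.
        return 100.0 * heapUsed / heapCommitted;
    }
    static double gcEfficiency(long youngBefore, long youngAfter) {
        // Percentage of the New Generation cleared by the cycle.
        return 100.0 * (youngBefore - youngAfter) / youngBefore;
    }
    static double promotionRatio(long youngBefore, long youngAfter,
                                 long heapBefore, long heapAfter, long heapSize) {
        // Promoted = New Gen decrease minus total heap decrease,
        // expressed as a percentage of the heap.
        long promoted = (youngBefore - youngAfter) - (heapBefore - heapAfter);
        return 100.0 * promoted / heapSize;
    }
    static double youngTenuredRatio(long youngSize, long heapSize) {
        // New Generation as a percentage of the entire heap.
        return 100.0 * youngSize / heapSize;
    }

    public static void main(String[] args) {
        // Illustrative figures: young 65536K->10748K (76288K),
        // heap 65536K->10756K (251392K).
        System.out.printf("occupancy=%.1f%%%n", heapOccupancy(10756, 251392));
        System.out.printf("efficiency=%.1f%%%n", gcEfficiency(65536, 10748));
        System.out.printf("promotion=%.4f%%%n",
                promotionRatio(65536, 10748, 65536, 10756, 251392));
        System.out.printf("young/tenured=%.1f%%%n", youngTenuredRatio(76288, 251392));
    }
}
```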


Comparative Analysis

Of course, it’s possible to contrast these data points. The following chart shows how the Promotion Ratio and the Young/Tenured Ratio look over the same time period.


And the following shows how Heap Occupancy and GC Efficiency look over time.

Ultimately, these graphs merely allow us to see the diagnostics data more clearly though, in so doing, they may help us spot patterns and identify where we might have to start profiling and investigating deeper.


Summary

The JVM diagnostics should really be used to confirm whether your hypothesis – i.e. the particular combination of GC policy and Heap Configuration settings – is actually having the effect you’re hoping for. It should be clearly understood at this point that these tweaks can impact each other and that what you’re looking to achieve is a working balance – i.e. a compromise that’s ‘good enough’. Use the diagnostics to check that, by achieving the results in one area, you haven’t created a problem in another.
I feel it only fair to point out that there are limits to what you can achieve here. While my investigations have left me convinced that you can nearly always improve on the default JVM settings, the scale of the improvements is likely to be relatively small. If your application ‘runs like a dog’ (i.e. performs poorly) to begin with, then no amount of tweaking is going to turn it into a racehorse. Ultimately, you may have no choice but to look at the fundamental design of the application and set about re-architecting the solution. In this case, all you can hope for from the diagnostics is perhaps an indication as to where the issue starts to show. Then again, that will likely only be a symptom of an underlying flawed design. In my case, it was just enough to get us over a hurdle in the short-term.
Java is a great programming environment but, in insulating programmers from the mechanics, it leaves many Java programmers with little-to-no understanding of the effects of their code at the machine (JVM) level. Performance is often an afterthought, and it can be incredibly difficult to retrofit performance into an application.
Equally, as is often quoted, “premature optimization is the root of all evil”, and we do our coding a disservice when our brains are preoccupied with the minutiae of the performance of each line of code. Instead, a balance is required, and I’d advise Java programmers to regularly run their code under a profiler, just to keep on top of things, to flag issues early and to help them understand the wider context.




Monday, March 21, 2016

JVM Performance Tuning Part IV – Heap Configuration & JIT Compiler

In this section, I discuss Heap Configuration and the Java Just-in-Time compiler (JIT), building upon previous blogs covering principal concepts in Part I, the Java Memory Model in Part II and Garbage Collectors in Part III.

Heap Layout & Sizing

In Part II I describe how Java organizes memory so we understand that the HotSpot™ VM adopts a generational approach to organizing memory. From Part III, we understand that the GC tackles garbage collection differently in each of the memory generations, even to the point where we can configure different garbage collectors for the New and Tenured generations respectively.
An interesting point to note is that most monitoring tools display memory levels in a single graph – which is a good thing – but you should understand that memory levels are going to alter independently in each of the generations. (The Java Console tool does actually show a graph illustrating the memory levels of each memory section.) The following diagram is somewhat confusing but it attempts to illustrate this point, and show where some of the terms referencing limits and ratios apply.
To help with this, I’ve effectively turned the diagrams in Part II on-end, which showed memory laid out from left-to-right, to show the layout from bottom-to-top.

At the bottom, the diagram shows the amount of memory used plus the memory allocated in the New Generation. Above this, the diagram shows the amount of memory used plus the memory allocated in the Tenured Generation.
You may well see references to the amount of memory committed versus the amount of virtual memory. Memory allocated by the VM but not yet used by the application is referred to as virtual; it essentially represents the amount of memory in reserve for the application, without the JVM having to extend the heap further.

The total amount of memory taken by the VM (from the OS) is referred to as committed and represents the sum of memory actually used by the application plus the memory allocated (or reserved) for the application.  The term committed can be a little bit confusing so I find it helps to consider that the JVM is just another application running on the OS that assigns blocks of real memory. In turn, the JVM acts like, well, a VM or OS and assigns memory as requested by the application, ensuring there is a reserve that’s allocated though not yet used. From the perspective of the OS, that memory is committed to the JVM.
The Max Heap is a limit value representing the absolute maximum amount of heap memory that the JVM can consume and assign. Typically, the size of the Committed Heap will start off as less than the Max Heap though it’s quite permissible for the JVM to have the Committed Heap equal to the Max Heap. In fact, it’s recommended in some cases where applications are sensitive to start-up time and have an aggressive memory allocation profile at this point.
The Min Heap is a limit value representing the minimum amount of Heap Memory the JVM can allocate. Remember that allocating memory comes at a cost, so it’s important to understand that the min heap should be large enough to run your application. In other words, it’s a helpful instruction to the JVM on where it needs to start.
At a finer level of granularity, you may see mention of Max New Heap and Min New Heap and similarly, Max Tenured Heap and Min Tenured Heap. This is merely a way to refer to Max and Min heap sizes within the context of a specific generation.
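These limits map onto familiar command-line options. A sketch, with the sizes and the MyApp class name purely illustrative, fixing the committed heap at the maximum for a start-up-sensitive service:

```
# Commit the full heap immediately (-Xms == -Xmx) and fix the New Generation limits too
java -Xms2g -Xmx2g -XX:NewSize=512m -XX:MaxNewSize=512m MyApp
```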


The following diagram illustrates yet another way to consider heap sizes.

The Max Heap Free Ratio is merely a way to express the maximum amount of memory available (but not allocated) as a percentage of the total committed. Similarly, the Min Heap Free Ratio is merely a way to express the minimum amount of memory that, as a percentage of the total committed, should be available. Both of these ratios are used to instruct the JVM as to how it should behave when allocating and de-allocating memory ahead of, or after, demand. Clearly, some common sense needs to prevail when specifying these, as they essentially refer to the size of virtual memory and can have a direct impact on when the heap is extended or contracted.
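Both ratios have corresponding -XX options; the values shown below are, as I understand it, the HotSpot defaults:

```
# Grow the heap when free space falls below 40%, shrink when it exceeds 70%
java -XX:MinHeapFreeRatio=40 -XX:MaxHeapFreeRatio=70 MyApp
```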

Stack Size

While it may not seem obvious at first, Stack Size can affect the overall memory footprint of Java applications, in particular where there are a large number of threads operating, as is often the case with ‘server’ style Java applications, especially larger ‘monolithic’ servers.
The significant point to note is that stack-size is universal – it’s applied to all threads in the application, regardless of whether they’re the particular thread (or group of threads) that require a larger stack.
To illustrate the point, say your large-scale, enterprise server application utilizes 100 threads. Clearly then, every 256K increase in stack size results in an overall increase of 25MB in total memory consumption. Now, admittedly this was more of an issue when deploying on 32-bit platforms than 64-bit but it’s worth bearing it in mind when you’re strapped for memory space.
Note that the default stack size varies according to platform, architecture and release. Regardless, an insufficient stack size will simply result in a StackOverflowError. Often, this can be a result of recursion or the need to traverse large trees/graphs (using recursion), so it’s often just the one place, and hence one thread, that encounters this sort of issue. The quick fix is to increase the stack size but, where things are tight, you may have to reconsider how the code operates to see if a less stack-intensive solution can be deployed.
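A quick illustration of the limit in action: the class below (my own sketch, not from the source) recurses until the thread’s stack is exhausted and reports roughly how deep it got. The exact depth will vary with platform, JVM release and the -Xss setting.

```java
// Recurse until the stack is exhausted, counting frames as we go.
public class StackDepthProbe {
    static int depth = 0;

    static void recurse() {
        depth++;
        recurse(); // no tail-call optimization in Java, so this must eventually overflow
    }

    public static void main(String[] args) {
        try {
            recurse();
        } catch (StackOverflowError e) {
            System.out.println("Stack exhausted at depth ~" + depth);
        }
    }
}
```

For example, running with java -Xss256k StackDepthProbe should report a noticeably smaller depth than java -Xss2m StackDepthProbe.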

JIT Compiler

Most Java programmers understand the role and purpose of the JRE (Java Runtime Environment) and that their Java code gets compiled into ‘byte-code’ that’s interpreted at run-time. This aspect of Java attracted a lot of criticism in the early days when compared to performance of native binary compiled from, say, C++ code. Enter the JIT – the Java ‘just-in-time’ compiler – an aspect of the JVM that converts byte-code into native code at run-time to achieve performance levels equivalent to natively compiled code.
The key to understanding this aspect of the JVM is:
  • All code starts out as byte-code but can be compiled into native code when certain thresholds are achieved.
  • There is a cost to performing the JIT compilation.
Basically, the JIT kicks in when a code section’s execution count exceeds a certain threshold, and this can impact start-up time. For applications with a sensitive start-up period, it may make sense to increase this threshold to put off the JIT compilation costs.
Alternatively, you could opt to pay that cost up front and gain the benefit over the lifetime of the program. The following JVM options allow you to customize JIT behavior.
  • -XX:CompileThreshold=300000 – Delay the JIT being triggered.
  • -Xbatch – Perform JIT compilation in the foreground.
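Putting the two together, an invocation that pays the compilation cost up front might look like this (the threshold value and the MyApp class name are illustrative):

```
# Compile in the foreground and lower the threshold so hot methods compile sooner
java -Xbatch -XX:CompileThreshold=1500 MyApp
```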





In my next and last blog in this series, I’ll discuss JVM Diagnostics and will illustrate how to work with the JVM to test and verify whether your tuning efforts are having the desired effect. I’ll also show you how, with a little bit of grunt effort, you can visualize and interpret diagnostic information to a greater depth than many profiling or memory monitoring tools, and identify key performance indicators.