A couple of weeks ago, New Relic launched a new hosted, on-demand Java profiling tool. We’re big fans of New Relic (great for Ruby or Java), dynaTrace (great for .NET or Java), and other application performance monitoring tools because they complement our website load testing and website monitoring services so well.

At BrowserMob, we’re a Java shop, so it didn’t take long for us to have an opportunity to try out New Relic’s Java profiler. We were getting reports from some customers that viewing the response time charts of their monitoring data was occasionally slow (don’t worry, we didn’t miss the irony).
To investigate the problem, we turned on New Relic’s profiler and selected 5 minutes of profiler time. You can choose between 1 and 10 minutes as the length of time you want to gather data. During that time, there is a small but potentially noticeable performance hit on your site, but the upside is that the profiler is able to inspect deep into your codebase and expose bottlenecks.
Five minutes later, we instantly saw the problem:

As you can see, during that 5-minute period, 55% of the time was spent in java.net.SocketInputStream.socketRead. Tracing back a bit, we were able to see that it came from a call in MonitoringController.selectRangeForObject. It turned out we were calling a cloud-based data server that was having performance issues due to the way we constructed our API call. It didn’t take more than a few minutes to optimize the call and roll out the fix.
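For readers curious how a sampling profiler can arrive at a figure like that 55%, here is a minimal sketch in Java. It is not New Relic’s implementation, which we have not seen. It only assumes the standard Thread.getAllStackTraces() call, and the class and method names (StackSampler, tally, sample, percent) are made up for illustration. It snapshots every thread at a fixed interval and counts which method is at the top of each stack.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of thread call stack sampling; not any vendor's code.
public class StackSampler {

    // Adds the top frame of every non-empty stack in one snapshot to the running tally.
    static void tally(Map<Thread, StackTraceElement[]> snapshot, Map<String, Integer> counts) {
        for (StackTraceElement[] stack : snapshot.values()) {
            if (stack.length == 0) {
                continue; // no Java frames were available for this thread
            }
            String top = stack[0].getClassName() + "." + stack[0].getMethodName();
            counts.merge(top, 1, Integer::sum);
        }
    }

    // Takes `samples` snapshots of all live threads, sleeping `intervalMillis` after each.
    static Map<String, Integer> sample(int samples, long intervalMillis) throws InterruptedException {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < samples; i++) {
            tally(Thread.getAllStackTraces(), counts);
            Thread.sleep(intervalMillis);
        }
        return counts;
    }

    // Share of all recorded samples attributed to one frame, as a percentage.
    static double percent(Map<String, Integer> counts, String frame) {
        int total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        return total == 0 ? 0.0 : 100.0 * counts.getOrDefault(frame, 0) / total;
    }

    public static void main(String[] args) throws InterruptedException {
        // 20 samples, 50 ms apart: arbitrary values chosen for the demo.
        Map<String, Integer> counts = sample(20, 50);
        counts.entrySet().stream()
              .sorted((a, b) -> b.getValue() - a.getValue())
              .limit(5)
              .forEach(e -> System.out.printf("%5.1f%%  %s%n",
                      percent(counts, e.getKey()), e.getKey()));
    }
}
```

This sketch counts every thread, including idle ones and the sampling thread itself, so a real profiler would presumably filter by thread state and keep whole stacks rather than only the top frame. Each snapshot also has to walk the stack of every live thread, so the cost of sampling grows with thread count, stack depth, and sampling frequency.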
Such is the power of application performance monitoring tools. If you’re doing any sort of load testing, website monitoring, or page optimization, you’re flying blind unless you have access to information like this. That’s why we jumped at the opportunity to partner with New Relic and dynaTrace, and why we encourage all our customers to look at their products.




But thread call stack sampling does not scale (certainly not in production) unless the concurrent workload is relatively small and you have low-hanging fruit that could in all likelihood be found with a very simplistic mechanism, such as a thread stack dump like the one above.
I wrote a number of articles on this. Sampling is cheap for the vendor in development terms, but its cost-benefit ratio at runtime is very poor once you go beyond the obvious.
http://williamlouth.wordpress.com/2009/10/21/java-call-sampling-overhead-in-the-wild/
http://williamlouth.wordpress.com/2009/01/16/profiling-sampling-versus-execution-part-2/
http://williamlouth.wordpress.com/2009/01/07/profiling-sampling-versus-execution-part-1/
William
Hi William,
New Relic’s sampling approach must be different from the one you benchmark in http://williamlouth.wordpress.com/2009/10/21/java-call-sampling-overhead-in-the-wild/. For our solution, overhead to the application is zero when you’re not running a “collection session”. And when you are running a collection session, it’s only on a single JVM in your cluster and the impact to that single JVM is around 10% cpu burn. For a tool that gives complete thread state visibility that amount of overhead seems very reasonable.
Jim
Jim, I have not figured out whether you are inexperienced enough to believe what you and your company claim, or experienced enough to withhold enough information to prevent anyone from showing how clearly wrong your claims are.
Our testing of the impact of call stack sampling is independent of any profiling processing. You cannot do any better than the numbers we have shown, as this is the baseline on top of which you add your own overhead.
So how can we both be right? Well, there are a number of factors, some of which reflect the type of web applications that your company generally monitors.
1. The web application is severely IO bound. Any overhead, even if excessive compared to other solutions, will pale in comparison to the amount of time spent executing database/file/messaging operations.
2. The degree of concurrency is relatively low compared to enterprise applications which can easily have more than 200 active threads at a time.
3. The maximum call stack depth is relatively low compared to enterprise applications (especially Spring-based ones), which generally exceed 300 by the time they hit their exit point (messaging, database, rmi,…).
4. You have offloaded a large amount of (cpu) work onto other non-application-specific threads, assuming there is a lot of spare compute capacity.
5. The sampling time interval is set extremely high compared to other typical profilers.
Now, after reading my links, I am surprised that you could not manage to provide any further information other than 10%, which by the way I find pretty excessive and certainly not production ready. But then again, we operate at different ends of the spectrum in terms of performance: whereas JXInsight Probes technology has a cost in the low nanoseconds, yours is in the double-digit microseconds for your trace technology, 1000x slower.
If you dispute any of the above, then we again extend our offer (which was previously refused) to have an official performance shootout: our probes technology against your legacy trace technology. Then we can finally bring truth to the claims of zero overhead, on and off, both in terms of runtime and analysis.
http://williamlouth.wordpress.com/2010/03/02/does-transaction-tracing-scale-analysis/
By the way, Patrick, we have also requested a similar shootout with dynaTrace (aka dynaCopy) on many occasions; each time they also refused.
Our technology is hundreds (if not thousands) of times more efficient than both solutions and yet we do not go around making the unqualified “low overhead” claims both these cowboys make.
William
Hi William,
I claimed that our overhead was less than 10% cpu burn when sampling threads, and the data for Patrick’s app confirms this. Here’s a graph of CPU burn on the java process:
http://skitch.com/jim-gochee/db36j/centcom-application-cpu-usage-broken-out-by-host-new-relic-rpm
I’ve put arrows at the start/end of the sampling session.
I’m not sure what ax you’ve got to grind, but our product is free to download and run, so knock yourself dead comparing us to your solution.
Jim
What is it with you guys? I ask for real data and you provide another meaningless chart. Did you actually read my comment above? It stated the cost factors (drivers) for call stack sampling overhead. You failed to list even one. You could not even manage to specify the cpu count on the machine.
As I stated before, you cannot beat the figures I listed. These are the baseline figures for the Java runtime. Your collection and processing costs will go on top of these. The only way you can reduce the impact is to pick an IO/DB-bound app (which you have), one with very short call stacks (which you have), one with very little concurrent thread processing (which you have), and to increase the sampling interval to close to a second or more (which I suspect you have).
Again, if you dispute my findings (and I am THE expert in this field), you are more than welcome to use my test classes or benchmark against our technology. I know your unit costs are horrendous, so it’s no problem for me, but for you, well……
I’m just wondering if your name is actually William Loudmouth– I’m planning on launching a campaign to smear your product on the simple grounds that you’re a simple @#$hole with a constant axe to grind. You’re such a schmuck that it doesn’t even matter whether you’re right or wrong, but the tone of your messages has convinced me to ignore your software and anything to do with you at all costs. Do all of us a favor and please have your PR division prevent you from any more postings.
Moving on from the “Who is the Biggest Schmuck Here” competition, I have recently posted some further information on the “best practices” used by vendor marketing departments in claiming “low overhead”.
http://williamlouth.wordpress.com/2010/05/25/the-java-application-performance-management-vendor-showdown/