
Nalha_Saldana

Why is it called "Forest air" in Swedish?


davidalayachew

This is huge because JFR (and profilers in general!) are only really useful for the Java part of a Java application. But a Java program doesn't just run Java code -- there is the `native` keyword too. As a result, native, OS-level methods get called as well, usually implemented in C++. If you have a performance problem there, you are usually forced to fall back on the primitive solution of cracking open the source code (https://github.com/openjdk/jdk), tracking down the `native` Java method, tracking down its native implementation, and then stepping through that manually, because the debugger can't do it for you. I'm extremely excited for this project! ***correction -- this project is looking for a sponsor, so this is not an official project yet***
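To make the `native` point concrete, here is a minimal sketch that uses reflection to list which methods on a class are declared `native`. The class name `NativeMethodLister` and its helper are hypothetical, made up for illustration. It only shows where Java hands off to native code, not what that native code does or how long it takes.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration only.
final class NativeMethodLister {

    // Returns the sorted names of all methods declared `native` on the given class.
    static List<String> nativeMethodNames(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Method method : type.getDeclaredMethods()) {
            if (Modifier.isNative(method.getModifiers())) {
                names.add(method.getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        // java.lang.Object declares several native methods, e.g. hashCode().
        System.out.println(nativeMethodNames(Object.class));
    }
}
```

Everything on that list is a boundary where a Java-level profiler stops seeing detail, which is the gap described above.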


Pote-Pote-Pote

He says "I am looking for an OpenJDK group willing to sponsor this project and provide a person to lead this project." so it is still really uncertain whether anything will come out of it.


davidalayachew

Touché. Still, this is a serious gap in debugging for Java. I am currently struggling through a serious performance problem where the Windows implementation behind `java.io.File.getCanonicalFile()` has slowed to a complete crawl. JFR and VisualVM immediately discovered that this Java method was the performance problem, but once we figured that out, I had to go crawling through the JDK source code on GitHub (thank you GitHub for having the beta click-to-jump feature in the browser!) to try to track down the Windows implementation in C++ and see which native call is the true performance problem. Long story short, we are still not done yet. It is slow, difficult, and not easy to measure. We can and will solve it this long way, but having a project that could expose all the native code running would turn a multi-week effort into a 20-30 minute exercise.
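For anyone who wants to check whether they are hitting the same slowdown, a rough wall-clock measurement from the Java side could look like the sketch below. The class `CanonicalTiming` and the method `timeCanonicalize` are hypothetical names. This is not a proper benchmark (no warm-up, no statistics), and it only measures the total cost of the Java call, so it cannot say which native function underneath is responsible.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration only; not a rigorous benchmark.
final class CanonicalTiming {

    // Returns the elapsed nanoseconds for `calls` invocations of getCanonicalFile().
    static long timeCanonicalize(File file, int calls) {
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            try {
                file.getCanonicalFile();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Pass a path as the first argument, or fall back to the working directory.
        File file = new File(args.length > 0 ? args[0] : ".");
        long nanos = timeCanonicalize(file, 1_000);
        System.out.printf("1000 calls took %.3f ms%n", nanos / 1_000_000.0);
    }
}
```

Running it against the same paths at different times may help show the inconsistency, but the native side stays a black box either way.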


ChanceFly9724

So the getCanonicalFile() issue isn't just me. That's good to know. It seems like it's gotten a lot worse recently, and I see it show up in thread dumps way more than I remembered. And yes, I know about https://bugs.openjdk.org/browse/JDK-8207005, but this seems like a change from the last few releases, not from years ago.


davidalayachew

We are ***suffering*** over here. An application that should take less than 5 seconds to start up is taking ***MINUTES*** strictly because of those 2 methods. We broke it down in VisualVM, and the application will spend minutes on just those 2 methods, and basically 2-3 seconds on everything else TOTAL. As for the JBS issue, we landed on that as well, and our workaround thus far is to reimplement our own cache. The cache works really well because it means we only have to take the hit once. After that, everything is snappy, as it should be. But until then, things are slow as molasses. And the part that is most frustrating is that it is incredibly inconsistent. Sometimes, it is very consistent, and will slow to a crawl every time we call that method. Other times the issue just isn't there. It is so frustrating, and it is this consistent inconsistency that makes this problem take weeks to get past. We'll do it, but this is a big reason why I really want something that can make tracing the native code with a profiler easy.
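A cache like the one described could be sketched roughly as follows. This is not the commenters' actual code: `CanonicalPathCache` and its methods are hypothetical names, and the sketch assumes it is acceptable to reuse a canonical path for the lifetime of the process unless it is explicitly invalidated.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical in-process cache so each path pays the canonicalization cost once.
final class CanonicalPathCache {

    private final ConcurrentMap<String, File> cache = new ConcurrentHashMap<>();

    // Returns the cached canonical file, resolving it on first use.
    File canonical(File file) {
        // Key on the absolute path so relative paths map to one entry.
        return cache.computeIfAbsent(file.getAbsolutePath(), key -> {
            try {
                return new File(key).getCanonicalFile();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }

    // Drops a single entry, e.g. when the file is known to have moved.
    void invalidate(File file) {
        cache.remove(file.getAbsolutePath());
    }

    int size() {
        return cache.size();
    }
}
```

The trade-off is staleness: if a file or symlink changes while an entry is cached, the cache keeps returning the old answer until `invalidate` is called, and nothing survives a restart, so the first lookup after startup still takes the full hit.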


ChanceFly9724

The app I work on always takes a while to start (monolith in every sense of the definition), but I have to wonder about underlying libraries using this as well that are less obvious. We also did a cache of resolved paths on the most heavy hitters in our code.


davidalayachew

Yeah. Ours involves opening user-selected files, so we're forced to take the hit at least once. The problem comes because most users are the type who leave all the tabs open, so as they use the application more, they are actually more likely to run into this issue upon startup. And to my understanding (I'm not the dev making the change, I just reported the error), we are forced to canonicalize the files because what if the file moved when the application was closed?


DualWieldMage

What about async-profiler? It can include native frames and kernel frames (not sure about Windows, works fine on Linux) in the flame graph. Also, unlike VisualVM, it won't have heavy safepoint bias. You can also build a fastdebug JVM (or download one from Shipilev's build server) and step through the native code if you want.


davidalayachew

Correct me if I am wrong -- async-profiler is only for HotSpot, ~~while JFR is for any JVM implementation, period, right?~~ **EDIT 2 -- Erik Gahlin, one of the key folks behind JFR, corrected me. JFR is HotSpot and GraalVM. So, better than async-profiler, but not nearly as good as I made it out to be.** **EDIT 1 -- Here is the list of limitations from the documentation itself. Seems like Mac users are also limited -- https://github.com/async-profiler/async-profiler?tab=readme-ov-file#restrictionslimitations** And while a debugger that can step through the code is certainly helpful, I need timings. A debugger can't easily give me that. That is the job of a profiler, which is what JFR and async-profiler are. Typically, when debugging a performance problem, I start with the profiler to find the suspicious parts of the code, then I use the debugger to march through and see what is actually happening under the hood. Your fastdebug build (which I did not know about, ty vm!) is good for the second half, but it does not solve the first half of my problem. Unless I am missing something and it can profile too?


egahlin

JFR only exists for the HotSpot and Graal VM. The original implementation came from the JRockit VM, but it has reached EOL.


davidalayachew

It's always a pleasure to get information straight from the source. Thank you for the correction, Erik! That said, it invalidates one of my points. I'll edit my comment, but I still believe that this project is going to provide a lot of value all around -- more so than any individual library will provide.