I'm genuinely curious about how well this is working: is there an independent Java test suite that covers major Java 5/6 features and can verify that the JOPA compiler works per the spec? I.e., I see that Claude has written a few tests in its commits, but it would be wonderful if there were a non-Clauded independent test suite (probably from other Java implementations?) that tracks progress.
I do feel that's pretty much needed to claim that Claude is adding features that match the Java spec.
Well, it's complicated. The original JDK compliance tests are notoriously hard to deal with. Currently I parse nearly 100% of the positive test cases from the JDK 7 test suite (in one of the Java 7 branches), but I only have several dozen true end-to-end tests (build the .java with jopa, validate the classfile with javap, run the classfile with java).
So I can't tell how good it actually is, but it definitely handles reasonably complex source files with generics (something the original compiler was unable to do).
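To make that concrete, here's a hypothetical end-to-end test input in that spirit (the file name and the jopa invocation in the comments are illustrative, not from the actual suite):

    // Box.java -- generics-heavy input of the kind the original compiler couldn't handle.
    // Pipeline: jopa Box.java && javap -v Box.class && java Box
    import java.util.ArrayList;
    import java.util.List;

    public class Box<T extends Comparable<T>> {
        private final List<T> items = new ArrayList<T>();

        public void add(T item) { items.add(item); }

        // Bounded type parameters and a generic static method both
        // exercise erasure and type inference in the compiler.
        public T max() {
            T best = items.get(0);
            for (T item : items)
                if (item.compareTo(best) > 0) best = item;
            return best;
        }

        public static <U extends Comparable<U>> U maxOf(Box<U> box) {
            return box.max();
        }

        public static void main(String[] args) {
            Box<String> box = new Box<String>();
            box.add("a"); box.add("c"); box.add("b");
            System.out.println(maxOf(box)); // expected output: c
        }
    }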
The actual goal of the project is to be able to build at least Ant, to simplify a clean bootstrap of OpenJDK.
Essentially, I tried to throw at it a task which, I thought, Claude wouldn't handle. It did, with minimal supervision. Some things had to be done in "adversarial" mode, where Claude coded and Codex criticized/reviewed, but it is what it is. An LLM was able to implement generics and many other language features with very little supervision in less than a day o_O.
I've been thrilled to see it using GDB with inhuman speed and efficiency.
I am very impressed with the kind of things people pull out of Claude's жопа but can't see such opportunities in my own work. Is success mostly the result of it being able to test its output reliably, and of how easy it is to set up the environment for this testing?
> Is success mostly the result of it being able to test its output reliably, and of how easy it is to set up the environment for this testing?
I wouldn't say so. In my experience, the key to success is the ability to split big tasks into smaller ones and to help the model with solutions when it's stuck.
Reproducible environments (Nix) help a lot, yes, and the same goes for sound testing strategies. But the ability to plan is the key.
One other thing I've observed is that Claude fares much better in a well-engineered pre-existing codebase. It adapts to most of the style and has plenty of "positive" examples to follow. It also benefits from the existing test infrastructure. It will still tend to go into infinite loops, or to introduce bugs and then oscillate between them, but I've found it to be scarily efficient at implementing medium-sized features in complicated codebases.
Yes, that too, but this particular project was an ancient C++ codebase with extremely tight coupling, manual memory management and very little abstraction.
How did you get GDB working with Claude? There are a few MCP servers that look fine; curious what you used.
Well, I just told it to use gdb when necessary; MCP wasn't required at all! It also helps to tell it to integrate cpptrace and to always look at the stack traces.
MCP is more or less obsolete for code generation since agents can just run CLI tools directly.
"Jopa" means "ass" in Russian; this reminded me of Pidora.
I came here for this comment! TIL about Pidora :D
Don't forget the NPM packages Mocha and Chai, which sound like the Russian words for pee and tea.
There's JEPA too
Ah, by the way: I've tried to do the same with Codex (gpt-5.1-codex-max) and Gemini (2.5 Pro), and both failed spectacularly. This job was done mostly by Sonnet 4.5. Java 6 did not require intensive supervision; the Java 7 parts are being done with Opus 4.5, and it constantly hits its limits, so I have to intervene regularly.
Btw, working on Java 7 support. At this moment I sorta have a working Java 7 compiler targeting Java 6 bytecode (Java 7 classfiles require the StackMapTable attribute, which is sort of annoying, so emitting version-50 / Java 6 classfiles sidesteps it).
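For reference, this is roughly the Java 7 surface syntax such a front end has to accept, and everything here lowers to plain Java 6 bytecode, which is why version-50 classfiles are a convenient target (the file name and contents are my illustration, not from the project):

    // Java7Features.java -- illustrative input covering the main Java 7 additions.
    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    public class Java7Features {
        public static void main(String[] args) {
            // Diamond operator
            List<String> names = new ArrayList<>();
            names.add("jopa");

            // Strings in switch
            switch (names.get(0)) {
                case "jopa": System.out.println("hello"); break;
                default:     System.out.println("other"); break;
            }

            // Underscores in numeric literals, and binary literals
            int big = 1_000_000;
            int bits = 0b1010;
            System.out.println(big + bits);

            // try-with-resources and multi-catch: both desugar to plain
            // try/finally and duplicated handlers, i.e. Java 6 bytecode.
            try (BufferedReader r = new BufferedReader(new FileReader(args[0]))) {
                System.out.println(r.readLine());
            } catch (IOException | ArrayIndexOutOfBoundsException e) {
                System.out.println("no input: " + e);
            }
        }
    }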
Also, I've tried to replace the parser with a modern one. Claude succeeds at generating Java 8 parsers with various parser generators/parser combinators, but fails to resolve the extremely tight coupling.
What is the feasibility/craziness level of "LLM-porting" the javac source code to C++?
Setting copyright issues aside, javac is a pretty clean textual-input/output program, and it can probably be reduced to a single-threaded variant.
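To illustrate the "clean textual interface" point: since Java 6, javac has been callable in-process through the standard javax.tools API, so the whole compiler boils down to file names in, classfiles and diagnostics out. A minimal sketch (the class name is mine):

    // CompileOne.java -- javac invoked as a library rather than a process.
    import javax.tools.JavaCompiler;
    import javax.tools.ToolProvider;

    public class CompileOne {
        public static void main(String[] args) {
            // Returns null on a JRE that ships without the compiler.
            JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
            // run(in, out, err, args) mirrors the CLI and returns 0 on success.
            int rc = javac.run(null, System.out, System.err, args);
            System.exit(rc);
        }
    }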
Claude won't handle a project of that scale. Even with the Java 7 modernization project, which is much simpler than a full javac translation, I constantly hit context limits, and Claude throws things like "API Error: 400 {"type":"error","error":{"type":"invalid_request_error","message":"messages.3.content.76: `thinking` or `redacted_thinking` blocks in the latest assistant message cannot be modified. These blocks must remain as they were in the original response."},"request_id":"req_011CVWwBJpf3ZrmYGkYZLQVf"}" at me.
Tangential: isn't there, from the same time period, a Java compiler written in Java?
It's much older, but even now this is THE ONLY viable pathway to bootstrap a modern JDK from scratch. I'm trying to modernize it so the bootstrap path might be shortened.
See https://bootstrappable.org/projects/java.html
The Java-to-bytecode compiler (javac) has always been written in Java. There was a JVM written in Java, though: Jikes RVM.