TLDR:
Partial evaluation starts at RootNode.execute() and follows normal Java calls - no reflection on node classes.
Node instance constancy and AST shape are the foundation of performance.
Granularity matters; boundaries matter even more.
DSLs and directives aren’t mandatory, but they encode the performance idioms you’d otherwise have to rediscover.
Inspection with IGV is normal — nearly everyone does it when tuning a language.
Full Answers:
How does Truffle identify the code to optimize?
Truffle starts partial evaluation at RootNode.execute(VirtualFrame). During partial evaluation, the RootNode instance itself is treated as a constant, while the VirtualFrame argument represents the dynamic input to the program.
Beyond that, Truffle does not use reflection or heuristics to discover execute() methods. It simply follows the normal Java call graph starting from the RootNode. Any code reachable from that entry point is a candidate for partial evaluation.
This means you can structure Node.execute(..) calls however you like, but for the compiler to inline and optimize them, the node instances must be constant from the RootNode’s point of view. To achieve that you should:
Make fields final where possible.
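Annotate fields that cannot be final with @CompilationFinal if their value rarely changes after construction, and call CompilerDirectives.transferToInterpreterAndInvalidate() before writing to them so stale compiled code is discarded.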
Use @Child / @Children to declare child nodes (this tells Truffle the AST shape and lets it treat those nodes as constants).
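As a rough illustration of the three points above, here is a plain-Java sketch of a tiny AST. It deliberately has no Truffle dependency so it runs on a bare JDK. All class names (SketchFrame, SketchNode, ConstNode, ArgNode, AddNode, SketchRoot) are hypothetical stand-ins, and the comments only mark where final and @Child would matter in a real Truffle node.

```java
// Plain-Java stand-in for a Truffle AST. No Truffle classes are used;
// every name here is hypothetical and exists only for illustration.

/** Stand-in for VirtualFrame: the only dynamic input during partial evaluation. */
final class SketchFrame {
    final long[] args;

    SketchFrame(long... args) {
        this.args = args;
    }
}

/** Stand-in for com.oracle.truffle.api.nodes.Node. */
abstract class SketchNode {
    abstract long execute(SketchFrame frame);
}

final class ConstNode extends SketchNode {
    private final long value; // final: can be folded into the compiled code

    ConstNode(long value) {
        this.value = value;
    }

    @Override
    long execute(SketchFrame frame) {
        return value;
    }
}

final class ArgNode extends SketchNode {
    private final int index; // final: the index is constant, the value read is dynamic

    ArgNode(int index) {
        this.index = index;
    }

    @Override
    long execute(SketchFrame frame) {
        return frame.args[index];
    }
}

final class AddNode extends SketchNode {
    // In a real Truffle node these two fields would be annotated with @Child,
    // which is what lets the compiler treat the child instances as constants.
    private SketchNode left;
    private SketchNode right;

    AddNode(SketchNode left, SketchNode right) {
        this.left = left;
        this.right = right;
    }

    @Override
    long execute(SketchFrame frame) {
        // Ordinary Java calls: this is the call graph that gets followed.
        return left.execute(frame) + right.execute(frame);
    }
}

/** Stand-in for RootNode: the entry point where partial evaluation would start. */
final class SketchRoot {
    private SketchNode body; // Truffle: @Child

    SketchRoot(SketchNode body) {
        this.body = body;
    }

    long execute(SketchFrame frame) {
        return body.execute(frame);
    }
}
```

With this shape, new SketchRoot(new AddNode(new ArgNode(0), new ConstNode(2))).execute(new SketchFrame(40)) evaluates to 42. The tree is fixed and only the frame contents vary, which is the split between constant and dynamic inputs described above.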
Granularity and @TruffleBoundary
Granularity matters a lot. Many small, type-specialized Node subclasses typically optimize better than one monolithic execute() method. @TruffleBoundary explicitly stops partial evaluation/inlining across a method boundary (useful for I/O or debugging), so placing it incorrectly can destroy performance. The usual pattern is to keep “hot” interpreter code boundary-free and push any side effects or slow paths behind boundaries.
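A minimal sketch of that hot-path/slow-path split, again in plain Java so it runs without Truffle on the classpath. AddSketch and its method names are hypothetical. In a real interpreter, addSlow is the kind of method that would typically carry @TruffleBoundary, while add stays boundary-free.

```java
import java.math.BigInteger;

// Hypothetical example: long addition with a BigInteger fallback on overflow.
final class AddSketch {

    /** Hot path: small and free of opaque library calls, so it should stay boundary-free. */
    static Object add(long a, long b) {
        try {
            return Math.addExact(a, b); // throws ArithmeticException on overflow
        } catch (ArithmeticException overflow) {
            return addSlow(a, b);
        }
    }

    /**
     * Slow path: BigInteger arithmetic pulls in a lot of JDK code.
     * In a Truffle interpreter this method would usually be annotated with
     * @TruffleBoundary so that partial evaluation does not try to inline it.
     */
    static BigInteger addSlow(long a, long b) {
        return BigInteger.valueOf(a).add(BigInteger.valueOf(b));
    }
}
```

The design choice is that the common case never reaches the boundary. If the boundary were placed on add itself, every addition would become an opaque call and the surrounding code could no longer be folded or inlined through it.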
Truffle DSLs and compiler directives
The DSLs (Specialization, Library, Bytecode DSL) are not strictly required for peak performance. Anything the DSL generates you could hand-write yourself. However, they dramatically reduce boilerplate and encode best practices: specialization guards, cached values, automatic rewriting of nodes, etc. This both improves maintainability and makes performance tuning much easier.
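To give a feel for the boilerplate the DSL takes over, here is a hand-written, plain-Java sketch of a self-specializing add node. It is not the code the DSL actually generates. SpecializingAdd and its fields are hypothetical, and the comments note where Truffle-specific calls would appear in a real node.

```java
// Hypothetical hand-written specialization: long + long, or string concatenation otherwise.
final class SpecializingAdd {
    // In Truffle these state bits would be @CompilationFinal, so compiled code
    // only contains the specializations that have actually been seen.
    private boolean longActive;
    private boolean concatActive;

    Object execute(Object left, Object right) {
        boolean bothLong = left instanceof Long && right instanceof Long;
        if (longActive && bothLong) {
            long a = (Long) left;
            long b = (Long) right;
            return a + b;
        }
        if (concatActive && !bothLong) {
            return String.valueOf(left) + right;
        }
        // No active specialization matched (a guard failure).
        return specialize(left, right);
    }

    private Object specialize(Object left, Object right) {
        // In Truffle, CompilerDirectives.transferToInterpreterAndInvalidate()
        // would be called here before the state bits are changed.
        if (left instanceof Long && right instanceof Long) {
            longActive = true;
        } else {
            concatActive = true;
        }
        return execute(left, right);
    }

    /** Number of specializations activated so far (for inspection only). */
    int activeSpecializations() {
        return (longActive ? 1 : 0) + (concatActive ? 1 : 0);
    }
}
```

A node that has only ever seen longs keeps a single active specialization. The string case is added lazily the first time its guard is needed, which is roughly the bookkeeping that @Specialization methods would otherwise express declaratively.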
Similarly, compiler directives (@ExplodeLoop, @CompilationFinal(dimensions = ...), etc.) give the optimizer hints. They are incremental: you can start with a naïve interpreter, but expect to add annotations to reach competitive performance. Without them, partial evaluation may not unroll loops or constant-fold as expected.
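The loop-unrolling point can be sketched in plain Java. PipelineSketch is a hypothetical class that uses LongUnaryOperator as a stand-in for child nodes. In a Truffle node the array would be declared with @Children and execute would carry @ExplodeLoop. The second method shows, by hand, roughly what an exploded loop amounts to for one particular pipeline.

```java
import java.util.function.LongUnaryOperator;

// Hypothetical example: a fixed sequence of steps applied to a value.
final class PipelineSketch {
    // Truffle equivalent: "@Children private final SomeNode[] steps;"
    private final LongUnaryOperator[] steps;

    PipelineSketch(LongUnaryOperator... steps) {
        this.steps = steps.clone();
    }

    // Truffle equivalent: annotate with @ExplodeLoop. The trip count depends
    // only on steps.length, which is constant for a given AST.
    long execute(long input) {
        long value = input;
        for (int i = 0; i < steps.length; i++) {
            value = steps[i].applyAsLong(value);
        }
        return value;
    }

    /** Hand-unrolled form of the loop for the pipeline {x + 1, x * 2, x - 3}. */
    static long unrolledByHand(long input) {
        long value = input;
        value = value + 1;
        value = value * 2;
        value = value - 3;
        return value;
    }
}
```

For example, new PipelineSketch(x -> x + 1, x -> x * 2, x -> x - 3).execute(5) and PipelineSketch.unrolledByHand(5) both yield 9. Without the unrolling hint, the compiler may keep the generic loop and the per-step calls stay indirect.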
Performance expectations and inspection
Truffle interpreters are not automatically fast. A naïve tree-walk interpreter can easily be slower under partial evaluation than as plain Java. Understanding how PE works (constants vs. dynamic values, call graph shape, guard failures, loop explosion, etc.) is essential.
In practice, most language implementers end up inspecting the optimized code. Graal provides two main tools:
Ideal Graph Visualizer (IGV) for looking at the compiler graphs and ASTs.
Compilation logs / Truffle’s performance counters to see node rewriting, inlining, and assumptions.
The Truffle docs have a dedicated “Optimizing Your Interpreter” guide that demonstrates these patterns. I would also recommend checking out the other language implementations for best practices.