tfpt review “/shelveset:Loops20;REDMOND\tomat”
Implements adaptive loop compilation. This feature needed major changes
to local variable handling and to the control flow implementation in
the interpreter.
Local variables
Replaces the list of local variables with a LocalsVariable structure
that encapsulates a dictionary. It doesn’t support variable shadowing
yet, but it at least detects it and throws NotSupportedException.
Previously we silently used wrong indices into the variable array.
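The shadowing check can be illustrated with a minimal Python sketch. It
assumes the dictionary maps a variable (a plain string here) to its
slot index; the class and method names are hypothetical, and
NotImplementedError stands in for NotSupportedException.

```python
class LocalVariables:
    """Hypothetical stand-in for the dictionary-backed locals structure."""

    def __init__(self):
        self._index_by_name = {}

    def define(self, name):
        # Shadowing is detected and rejected instead of silently
        # resolving to a wrong slot in the variable array.
        if name in self._index_by_name:
            raise NotImplementedError(f"variable shadowing: {name}")
        index = len(self._index_by_name)
        self._index_by_name[name] = index
        return index

    def index_of(self, name):
        return self._index_by_name[name]
```

Defining the same variable twice raises instead of handing out a second
index for it.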
Control flow
Reimplements interpreter goto instructions and exception handling. Goto
instructions used to encode all information describing the jump (a list
of finally blocks to be executed and the target stack depth). The loop
compiler needs to find all GotoExpressions within the loop that jump out
of the loop and associate them with the corresponding Goto instructions.
This cannot be done in the presence of reducible nodes, as they don’t
preserve node identity. Therefore we need to move the jump information
from the goto instruction to the target label and track the current try
and finally blocks.
GotoInstruction, EnterTryFinallyInstruction and
LeaveExceptionHandlerInstruction now derive from
IndexedBranchInstruction. Whereas OffsetInstruction holds a relative
offset, these instructions hold the index of the target label in the
table of RuntimeLabels. The RuntimeLabel struct comprises the target
instruction index, the target stack depth and the target continuation
stack depth. That’s all that is needed for a jump to be executed. Jumps
via label index are a little bit slower than jumps to a relative offset
since they need to look up the target index in the label table. The
label table has only as many entries as there are gotos and
try-catch/try-finally blocks in the lambda.
We can easily convert other branch instructions into
IndexedBranchInstructions if we find it better.
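The difference between the two kinds of branch can be sketched in Python
as follows. The field names mirror the three values listed above; the
function names and the sample table are illustrative assumptions, not
the interpreter's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuntimeLabel:
    index: int                     # target instruction index
    stack_depth: int               # data stack depth at the target
    continuation_stack_depth: int  # continuation stack depth at the target


def offset_branch_target(current_index, offset):
    # Relative jump: the target follows directly from the instruction.
    return current_index + offset


def indexed_branch_target(labels, label_index):
    # Indexed jump: one extra lookup in the label table.
    return labels[label_index].index


# Hypothetical table with a single label pointing at instruction 7.
labels = [RuntimeLabel(index=7, stack_depth=0, continuation_stack_depth=0)]
```

Both helpers can resolve to the same instruction; the indexed form pays
for the table lookup but keeps the jump description on the label.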
Using indexed branch instructions moves the target stack depth to the
label. We also need to move the finally list out of the goto
instruction. Since a single label might be the target of multiple goto
instructions/expressions, and these could be nested in different
try-finally blocks, we need to track the stack of finally blocks that we
enter and leave as we execute instructions.
EnterTryFinallyInstruction is added at the beginning of every
try-finally block. This instruction pushes a local continuation onto the
stack of continuations stored on InterpretedFrame. The top item of this
stack is the current continuation. A continuation is implemented as an
integer index into the label table. The continuation pushed by
EnterTryFinally points to the finally clause.
GotoInstruction sets the current pending continuation and pending value
(if it transfers a value) and jumps to the current continuation if there
is one.
A GotoInstruction is emitted at the end of the try-finally body. This
goto’s target is the end of the entire try expression.
EnterFinallyInstruction is emitted at the beginning of the finally
clause. It removes the current continuation from the continuation stack,
pushes the pending continuation and value onto the data stack and
invalidates them. If an exception is thrown but not caught during
execution of the finally clause, the current pending continuation is
canceled (and forgotten) and a new one is set.
LeaveFinallyInstruction is emitted at the end of the finally clause. It
pops the pending continuation (and pending value) from the data stack
and yields to it. The YieldToPendingContinuation operation compares the
continuation stack depth of the current continuation with the
continuation stack depth of the pending one. It jumps to the pending one
only if its depth is less, i.e. when no continuation (finally clause)
has to be executed before we can jump to the target block. Otherwise it
jumps to the current continuation.
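The interplay of these instructions can be traced with a small Python
model. This is a sketch under assumptions, not the interpreter's code:
the Frame class and its method names are hypothetical, jumps return
absolute instruction indices, and the depth test (run the current
finally whenever the continuation stack is deeper than the depth
recorded on the target label) is one possible reading of the comparison
described above. The goto is routed through the same test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuntimeLabel:
    index: int       # target instruction index
    cont_depth: int  # continuation stack depth expected at the target


class Frame:
    def __init__(self, labels):
        self.labels = labels
        self.continuations = []  # label indices of enclosing finally clauses
        self.pending = None      # pending continuation (a label index)
        self.pending_value = None
        self.data = []           # data stack

    def enter_try_finally(self, finally_label):
        self.continuations.append(finally_label)

    def goto(self, label, value=None):
        self.pending, self.pending_value = label, value
        return self.yield_to_pending()

    def enter_finally(self):
        # Drop the continuation that led here and park the pending jump
        # on the data stack while the finally clause runs.
        self.continuations.pop()
        self.data.append((self.pending, self.pending_value))
        self.pending = self.pending_value = None

    def leave_finally(self):
        self.pending, self.pending_value = self.data.pop()
        return self.yield_to_pending()

    def yield_to_pending(self):
        target = self.labels[self.pending]
        if target.cont_depth < len(self.continuations):
            # A finally clause still stands between us and the target.
            return self.labels[self.continuations[-1]].index
        return target.index


def trace_nested_goto():
    # Labels: 0 = outer finally, 1 = inner finally, 2 = target outside both.
    frame = Frame([RuntimeLabel(100, 0), RuntimeLabel(50, 1),
                   RuntimeLabel(200, 0)])
    frame.enter_try_finally(0)
    frame.enter_try_finally(1)
    visited = [frame.goto(2, "v")]
    frame.enter_finally()
    visited.append(frame.leave_finally())
    frame.enter_finally()
    visited.append(frame.leave_finally())
    return visited
```

In trace_nested_goto a jump out of two nested try-finally blocks visits
the inner finally, then the outer finally, and only then the target.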
Whenever an exception occurs we catch it in the Interpreter.Run method.
We look for the exception handler that should be executed.
If we find one we perform the same steps as if we had just executed a
GotoInstruction targeting the exception handler: we set the current
pending continuation to the label that points to the handler and set the
pending value to the exception object. Finally, we jump to the current
continuation.
If there is no catch or fault handler we do the same as if there were
one with instruction index Int32.MaxValue. That emulates a jump to the
end of the instruction sequence. If this jump is not interrupted by
another exception raised from some finally/fault block, or by a goto
jumping from a finally block, we finish instruction execution and return
from the Run method with the current InstructionIndex set to the special
value Int32.MaxValue. That indicates that we should rethrow the
exception, and so we do.
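The sentinel index can be illustrated with a simplified Python run loop.
This is a sketch only: it leaves out finally blocks and the continuation
stack, and the run signature, the find_handler callback and the
convention that each instruction is a callable returning the next
instruction index are all assumptions made for the example.

```python
INT32_MAX = 2**31 - 1  # plays the role of Int32.MaxValue


def run(instructions, find_handler):
    index = 0
    pending = None  # pending value; holds the exception object if any
    while index < len(instructions):
        try:
            index = instructions[index](pending)
        except Exception as exc:
            pending = exc
            handler = find_handler(index)
            # No handler: behave as if one existed at INT32_MAX, which
            # is a jump past the end of the instruction sequence.
            index = INT32_MAX if handler is None else handler
    if index == INT32_MAX:
        raise pending  # nothing handled the exception, so rethrow it
    return index
```

When find_handler returns an index, execution continues at the handler
with the exception as the pending value; when it returns None, the loop
ends and the exception is rethrown.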
Moves InterpretedFrame chaining from IronRuby to the interpreter. The
frames are linked into a stack by the Interpreter.Run method so that
each CLR frame of this method corresponds to an interpreted stack frame
in the interpreted stack. The two traces can be combined into one. A
static ThreadLocal variable is updated upon entry to and exit from the
Run method.
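The chaining can be sketched in Python with threading.local playing the
role of the static thread-local variable. The parent field, the run
signature and interpreted_stack are hypothetical names used only for
this illustration.

```python
import threading

_current = threading.local()  # per-thread pointer to the innermost frame


class InterpretedFrame:
    def __init__(self, name):
        self.name = name
        self.parent = None


def run(frame, body):
    # Link the frame into the chain on entry and unlink it on exit, so
    # that every nested call to run adds exactly one interpreted frame.
    frame.parent = getattr(_current, "frame", None)
    _current.frame = frame
    try:
        return body()
    finally:
        _current.frame = frame.parent


def interpreted_stack():
    names, frame = [], getattr(_current, "frame", None)
    while frame is not None:
        names.append(frame.name)
        frame = frame.parent
    return names
```

Walking the chain from the innermost frame yields the interpreted stack,
innermost frame first.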
Loop compiler
Adds a new EnterLoopInstruction that is injected at the beginning of a
loop generated from LoopExpression. This instruction has a counter that
is incremented each time it is executed. When the counter reaches
CompilationThreshold, a compilation is started on a background thread.
The instruction holds on to the LoopExpression to compile. The loop
needs to be rewritten before we can compile it to a lambda. The lambda
we produce looks like:
int lambda(InterpretedFrame frame) {
    T$1 loc$1 = (T$1)frame.Data[$index1];
    …
    T$n loc$n = (T$n)frame.Data[$indexN];
    StrongBox closure_loc$1 = frame.Closure[$index1];
    …
    StrongBox closure_loc$M = frame.Closure[$indexM];
    try {
        … loc$1 = value …
        … closure_loc$1.Value = (object)value;
        // for each goto label(value), where label is outside the loop:
        … return frame.Goto(labelIndex, value);
    } finally {
        // write back
        frame.Data[$index1] = (object)loc$1;
    }
    return $breakOffset;
}
When the lambda is ready, the EnterLoopInstruction is replaced by a
CompiledLoopInstruction that holds a delegate to the compiled lambda
and calls it upon execution.
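The counter-and-replace scheme can be sketched in Python as follows. The
constructor parameters, the compile_loop callback, the worker field and
the return value 1 (meaning "continue with the next instruction") are
assumptions made for the example, not the interpreter's actual
signatures.

```python
import threading


class CompiledLoopInstruction:
    def __init__(self, compiled):
        self.compiled = compiled  # delegate-like callable taking the frame

    def run(self, frame):
        return self.compiled(frame)


class EnterLoopInstruction:
    def __init__(self, instructions, index, compile_loop, threshold):
        self.instructions = instructions  # instruction array to patch
        self.index = index                # this instruction's own slot
        self.compile_loop = compile_loop  # hypothetical: returns a callable
        self.threshold = threshold
        self.count = 0
        self.worker = None

    def run(self, frame):
        self.count += 1
        if self.count == self.threshold and self.worker is None:
            # Compile in the background; the loop keeps being
            # interpreted until the compiled version is swapped in.
            self.worker = threading.Thread(target=self._compile)
            self.worker.start()
        return 1

    def _compile(self):
        compiled = self.compile_loop()
        self.instructions[self.index] = CompiledLoopInstruction(compiled)
```

Once the worker finishes, the slot that held the EnterLoopInstruction
holds a CompiledLoopInstruction, so later iterations call the compiled
code directly.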
Perf impact
The interpreter throughput with compilation disabled is about 5% worse
on Pystone with this change. About 1% is attributable to tracking the
interpreted stack chain; the rest is probably due to the more expensive
try-finally blocks (the continuation stack is allocated, continuations
are pushed/popped on entry/exit to try and finally blocks, etc.).
-X:NoAdaptiveCompilation is now better than adaptive compilation by only
4-7% (for compilation thresholds 2 and 32, respectively); it used to be
about 4 times better.
Misc
Special cases adaptive compilation for CompilationThreshold 0 and 1. In
both cases the compilation is synchronous. This allows us to easily test
and debug loop compiler and lambda compiler.
Implements instruction provider for FinallyFlowControlExpression - the
interpreter handles jumps from finally directly, so we don’t need to
rewrite the tree.
FlowControlRewriter should reduce all extensible nodes within the tree.
Otherwise it might miss some goto expressions or finally clauses (e.g. {
label: try { REDUCIBLE } finally { REDUCIBLE; } }, where any of the
REDUCIBLEs reduces to “goto label”).
Ruby, Python:
CatchBlock defines a scope for its exception variable, which wasn’t
taken into account in the Python and Ruby AST generators and rewriters.
They declared the variable in the containing block, duplicating the
variable definition and depending on variable shadowing. Removes the
duplicate declarations.
Removes “compileLoops” argument passed to LightCompile. All loops are
adaptively compiled now.
Python
Adds missing debug info around for-loop initialization (see
test_traceback.py run:test_throw_while_yield)
Increases test_memory limit to 18k since the loop is adaptively compiled
now. We might want to disable adaptive compilation during this test.
Disables test_dict.py run:test_container_iterator. Filed bug:
http://ironpython.codeplex.com/WorkItem/View.aspx?WorkItemId=25419
Disables test_traceback.py run:test_throw_while_yield. Filed bug:
http://ironpython.codeplex.com/WorkItem/View.aspx?WorkItemId=25428
Ruby:
Fixes mangling of “me” name.
Disables one test case in core/kernel/caller_spec.rb. The behavior that
made this test accidentally pass was incorrect.
Tomas