|
Make the worklist in the uninitialized values checker actually a queue.
Previously, despite the names 'enqueue' and 'dequeue', it behaved as
a stack and visited blocks in a LIFO fashion. This interacts badly with
extremely broad CFGs *inside* of a loop (such as a large switch inside
a state machine) where every block updates a different variable.
When encountering such a CFG, the checker visited blocks in essentially
a "depth first" order due to the stack-like behavior of the work list.
Combined with each block updating a different variable, the saturation
logic of the checker caused it to re-traverse blocks [1,N-1] of the
broad CFG inside the loop after traversing block N. These re-traversals
were to propagate the variable values derived from block N. Assuming
approximately the same number of variables as inner blocks exist, the
end result is O(N^2) updates. By making this a queue, we also make the
traversal essentially "breadth-first" across each of the N inner blocks
of the loop. Then all of this state is propagated around to all N inner
blocks of the loop. The result is O(N) updates.
The truth is in the numbers:
Before, PR10183: 2540494 block vists (max: 2536495, avg: 37360)
After, PR10183: 137803 block visits (max: 134406, avg: 2026)
The nearly 20x reduction in work for PR10183 corresponds to a roughly
100x speedup in compile time.
|