Created: 12 years, 3 months ago by tejohnson
Modified: 10 years, 4 months ago
Reviewers: christophe.lyon, matthew.gretton-dann, stevenb.gcc, law, howarth, hubicka
CC: davidxl, gcc-patches_gcc.gnu.org, stevenb.gcc, matthew.gretton-dann_linaro.org, christophe.lyon_linaro.org
Base URL: svn+ssh://gcc.gnu.org/svn/gcc/trunk/gcc/
Visibility: Public.
Patch Set 1
Patch Set 2: Fix PR 53743 and other -freorder-blocks-and-partition failures
Patch Set 3: Fix PR 53743 and other -freorder-blocks-and-partition failures

Messages (total: 35)
This patch fixes three different failures I encountered while trying to use
-freorder-blocks-and-partition, including the failure reported in PR 53743.

Bootstrapped and tested on x86_64-unknown-linux-gnu. Ok for trunk?

Thanks,
Teresa

2012-10-29  Teresa Johnson  <tejohnson@google.com>

    PR optimization/53743
    * function.c (thread_prologue_and_epilogue_insns): Don't store exit
    predecessor BB until after it is potentially split.
    * bb-reorder.c (insert_section_boundary_note): Ensure that a barrier
    exists before a switch section note, as this is expected by later
    passes (e.g. dwarf CFI code).
    * cfgrtl.c (rtl_can_merge_blocks): Use the same condition looking for
    region-crossing jumps as in try_redirect_by_replacing_jump, which may
    be called while merging blocks.
    (cfg_layout_can_merge_blocks_p): Ditto.

Index: function.c
===================================================================
--- function.c (revision 192692)
+++ function.c (working copy)
@@ -6517,7 +6517,7 @@ epilogue_done:
       basic_block simple_return_block_cold = NULL;
       edge pending_edge_hot = NULL;
       edge pending_edge_cold = NULL;
-      basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb;
+      basic_block exit_pred;
       int i;

       gcc_assert (entry_edge != orig_entry_edge);
@@ -6545,6 +6545,12 @@ epilogue_done:
             else
               pending_edge_cold = e;
           }
+
+      /* Save a pointer to the exit's predecessor BB for use in
+         inserting new BBs at the end of the function. Do this
+         after the call to split_block above which may split
+         the original exit pred. */
+      exit_pred = EXIT_BLOCK_PTR->prev_bb;

       FOR_EACH_VEC_ELT (edge, unconverted_simple_returns, i, e)
         {
Index: bb-reorder.c
===================================================================
--- bb-reorder.c (revision 192692)
+++ bb-reorder.c (working copy)
@@ -2188,6 +2188,8 @@ insert_section_boundary_note (void)
         first_partition = BB_PARTITION (bb);
       if (BB_PARTITION (bb) != first_partition)
         {
+          /* There should be a barrier between text sections. */
+          emit_barrier_after (BB_END (bb->prev_bb));
           new_note = emit_note_before (NOTE_INSN_SWITCH_TEXT_SECTIONS,
                                        BB_HEAD (bb));
           /* ??? This kind of note always lives between basic blocks,
Index: cfgrtl.c
===================================================================
--- cfgrtl.c (revision 192692)
+++ cfgrtl.c (working copy)
@@ -912,7 +912,8 @@ rtl_can_merge_blocks (basic_block a, basic_block b
      partition boundaries). See the comments at the top of
      bb-reorder.c:partition_hot_cold_basic_blocks for complete details. */

-  if (BB_PARTITION (a) != BB_PARTITION (b))
+  if (find_reg_note (BB_END (a), REG_CROSSING_JUMP, NULL_RTX)
+      || BB_PARTITION (a) != BB_PARTITION (b))
     return false;

   /* Protect the loop latches. */
@@ -3978,7 +3979,8 @@ cfg_layout_can_merge_blocks_p (basic_block a, basi
      partition boundaries). See the comments at the top of
      bb-reorder.c:partition_hot_cold_basic_blocks for complete details. */

-  if (BB_PARTITION (a) != BB_PARTITION (b))
+  if (find_reg_note (BB_END (a), REG_CROSSING_JUMP, NULL_RTX)
+      || BB_PARTITION (a) != BB_PARTITION (b))
     return false;

   /* Protect the loop latches. */

--
This patch is available for review at http://codereview.appspot.com/6823047
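The function.c change is an ordering fix: a pointer to the exit block's predecessor goes stale if that predecessor is split afterwards. A minimal sketch of the effect, using a toy doubly linked block chain rather than GCC's real basic_block type (every toy_* name here is hypothetical and exists only for illustration):

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a block in the layout chain; not GCC's basic_block. */
struct toy_bb {
  int index;
  struct toy_bb *prev_bb;
  struct toy_bb *next_bb;
};

/* Model a block split by linking NEW_BB directly after BB. */
static void toy_split_block(struct toy_bb *bb, struct toy_bb *new_bb)
{
  assert(bb != NULL && new_bb != NULL);
  new_bb->prev_bb = bb;
  new_bb->next_bb = bb->next_bb;
  if (bb->next_bb != NULL)
    bb->next_bb->prev_bb = new_bb;
  bb->next_bb = new_bb;
}

/* Return 1 if a predecessor pointer saved before the split no longer
   names the block that directly precedes the exit block afterwards. */
static int toy_saved_pred_is_stale(void)
{
  struct toy_bb last = {2, NULL, NULL};
  struct toy_bb exit_bb = {1, NULL, NULL};
  struct toy_bb tail = {3, NULL, NULL};

  last.next_bb = &exit_bb;
  exit_bb.prev_bb = &last;

  struct toy_bb *saved_early = exit_bb.prev_bb; /* old placement */
  toy_split_block(&last, &tail);                /* splits the exit pred */
  struct toy_bb *saved_late = exit_bb.prev_bb;  /* patched placement */

  return saved_early != saved_late && saved_late == &tail;
}
```

In this model the early pointer still names the first half of the split block, so anything inserted after it would land before the new tail instead of at the end of the chain.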
On 30 October 2012 05:20, Teresa Johnson <tejohnson@google.com> wrote:
> Index: cfgrtl.c
> ===================================================================
> --- cfgrtl.c (revision 192692)
> +++ cfgrtl.c (working copy)
> @@ -912,7 +912,8 @@ rtl_can_merge_blocks (basic_block a, basic_block b
>       partition boundaries). See the comments at the top of
>       bb-reorder.c:partition_hot_cold_basic_blocks for complete details. */
>
> -  if (BB_PARTITION (a) != BB_PARTITION (b))
> +  if (find_reg_note (BB_END (a), REG_CROSSING_JUMP, NULL_RTX)
> +      || BB_PARTITION (a) != BB_PARTITION (b))
>      return false;
>
>    /* Protect the loop latches. */
> @@ -3978,7 +3979,8 @@ cfg_layout_can_merge_blocks_p (basic_block a, basi
>       partition boundaries). See the comments at the top of
>       bb-reorder.c:partition_hot_cold_basic_blocks for complete details. */
>
> -  if (BB_PARTITION (a) != BB_PARTITION (b))
> +  if (find_reg_note (BB_END (a), REG_CROSSING_JUMP, NULL_RTX)
> +      || BB_PARTITION (a) != BB_PARTITION (b))
>      return false;
>
>    /* Protect the loop latches. */

As this if() condition seems to be the canonical way to detect being
in a different partition should it be moved out into a query function,
and all of cfgrtl.c updated to use it?

[Note I am not a maintainer and so can't approve/reject your patch].

Thanks,

Matt

--
Matthew Gretton-Dann
Linaro Toolchain Working Group
matthew.gretton-dann@linaro.org
On Tue, Oct 30, 2012 at 8:49 AM, Matthew Gretton-Dann wrote:
> On 30 October 2012 05:20, Teresa Johnson wrote:
>> Index: cfgrtl.c
>> ===================================================================
>> --- cfgrtl.c (revision 192692)
>> +++ cfgrtl.c (working copy)
>> @@ -912,7 +912,8 @@ rtl_can_merge_blocks (basic_block a, basic_block b
>>       partition boundaries). See the comments at the top of
>>       bb-reorder.c:partition_hot_cold_basic_blocks for complete details. */
>>
>> -  if (BB_PARTITION (a) != BB_PARTITION (b))
>> +  if (find_reg_note (BB_END (a), REG_CROSSING_JUMP, NULL_RTX)
>> +      || BB_PARTITION (a) != BB_PARTITION (b))
>>      return false;
>>
>>    /* Protect the loop latches. */
>> @@ -3978,7 +3979,8 @@ cfg_layout_can_merge_blocks_p (basic_block a, basi
>>       partition boundaries). See the comments at the top of
>>       bb-reorder.c:partition_hot_cold_basic_blocks for complete details. */
>>
>> -  if (BB_PARTITION (a) != BB_PARTITION (b))
>> +  if (find_reg_note (BB_END (a), REG_CROSSING_JUMP, NULL_RTX)
>> +      || BB_PARTITION (a) != BB_PARTITION (b))
>>      return false;
>>
>>    /* Protect the loop latches. */
>
> As this if() condition seems to be the canonical way to detect being
> in a different partition should it be moved out into a query function,
> and all of cfgrtl.c updated to use it?

Not just in cfgrtl.c but for example also in ifcvt.c (which currently
only tests for notes, that's broken).

Ciao!
Steven
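The query function suggested above could look roughly like the following. This is only a sketch over a toy block type, not GCC code: toy_block and toy_blocks_cross_partition_p are made-up names, and the two fields stand in for BB_PARTITION (bb) and for a REG_CROSSING_JUMP note on BB_END (bb).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_partition { TOY_UNPARTITIONED, TOY_HOT, TOY_COLD };

/* Toy stand-in for the two facts the patched condition consults. */
struct toy_block {
  enum toy_partition partition; /* models BB_PARTITION (bb) */
  bool ends_in_crossing_jump;   /* models a REG_CROSSING_JUMP note */
};

/* Hypothetical shared predicate: true if A must not be merged with B,
   either because A ends in a jump marked as crossing or because the
   two blocks were assigned to different partitions. */
static bool toy_blocks_cross_partition_p(const struct toy_block *a,
                                         const struct toy_block *b)
{
  assert(a != NULL && b != NULL);
  return a->ends_in_crossing_jump || a->partition != b->partition;
}
```

Having one predicate would let every caller test the same pair of conditions instead of repeating (or half-repeating) them by hand.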
On Tue, Oct 30, 2012 at 6:20 AM, Teresa Johnson wrote:
> Index: bb-reorder.c
> ===================================================================
> --- bb-reorder.c (revision 192692)
> +++ bb-reorder.c (working copy)
> @@ -2188,6 +2188,8 @@ insert_section_boundary_note (void)
>          first_partition = BB_PARTITION (bb);
>        if (BB_PARTITION (bb) != first_partition)
>          {
> +          /* There should be a barrier between text sections. */
> +          emit_barrier_after (BB_END (bb->prev_bb));

So why isn't there one? There can't be a fall-through edge from one
section to the other, so cfgrtl.c:fixup_reorder_chain should have
added a barrier here already in the code under the comment:

  /* Now add jumps and labels as needed to match the blocks new
     outgoing edges. */

Why isn't it doing that for you?

BTW, something else I noted in cfgrtl.c:
NOTE_INSN_SWITCH_TEXT_SECTIONS shouldn't be copied in
duplicate_insn_chain, so the following is necessary for robustness:

Index: cfgrtl.c
===================================================================
--- cfgrtl.c (revision 191819)
+++ cfgrtl.c (working copy)
@@ -3615,7 +3615,6 @@
                   break;

                 case NOTE_INSN_EPILOGUE_BEG:
-                case NOTE_INSN_SWITCH_TEXT_SECTIONS:
                   emit_note_copy (insn);
                   break;

There can be only one! One note to rule them all! etc.

> Index: cfgrtl.c
> ===================================================================
> --- cfgrtl.c (revision 192692)
> +++ cfgrtl.c (working copy)
> @@ -912,7 +912,8 @@ rtl_can_merge_blocks (basic_block a, basic_block b
>       partition boundaries). See the comments at the top of
>       bb-reorder.c:partition_hot_cold_basic_blocks for complete details. */
>
> -  if (BB_PARTITION (a) != BB_PARTITION (b))
> +  if (find_reg_note (BB_END (a), REG_CROSSING_JUMP, NULL_RTX)
> +      || BB_PARTITION (a) != BB_PARTITION (b))
>      return false;

My dislike for this whole scheme just continues to grow... How can
there be a REG_CROSSING_JUMP note if BB_PARTITION(a)==BB_PARTITION(b)?
That is a bug. We should not need the notes here.

As long as we have the CFG, BB_PARTITION(a)==BB_PARTITION(b) should be
the canonical way to check whether two blocks are in the same
partition, and the EDGE_CROSSING flag should be set iff an edge
crosses from one section to another. The REG_CROSSING_JUMP note should
only be used to see if a JUMP_INSN may jump to another section,
without having to check all successor edges.

Any place where we have to check the BB_PARTITION or
edge->flags&EDGE_CROSSING *and* REG_CROSSING_JUMP indicates a bug in
the partitioning updating.

Another BTW: sched-vis.c doesn't handle REG_CROSSING_JUMP notes so
that slim RTL dumping breaks. I need this patchlet to make things
work:

Index: sched-vis.c
===================================================================
--- sched-vis.c (revision 191819)
+++ sched-vis.c (working copy)
@@ -553,6 +553,11 @@
 {
   char t1[BUF_LEN], t2[BUF_LEN], t3[BUF_LEN];

+  if (! x)
+    {
+      sprintf (buf, "(nil)");
+      return;
+    }
   switch (GET_CODE (x))
     {
     case SET:

Ciao!
Steven
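The invariant described above (crossing flag set iff the edge's endpoints are in different partitions, with the jump note only summarising the outgoing edges) can be written down as a small consistency check. The sketch below runs over a toy edge array, not GCC's CFG; toy_edge and both functions are hypothetical names used for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_partition { TOY_HOT, TOY_COLD };

/* Toy edge: endpoint partitions plus a flag modelling EDGE_CROSSING. */
struct toy_edge {
  enum toy_partition src;
  enum toy_partition dest;
  bool crossing_flag;
};

/* Count edges whose flag disagrees with "src and dest partitions differ". */
static size_t toy_count_inconsistent_edges(const struct toy_edge *edges,
                                           size_t n)
{
  size_t bad = 0;
  assert(edges != NULL || n == 0);
  for (size_t i = 0; i < n; i++) {
    bool crosses = edges[i].src != edges[i].dest;
    if (edges[i].crossing_flag != crosses)
      bad++;
  }
  return bad;
}

/* A jump would carry a crossing note only if some successor edge crosses;
   the note is a summary derived from the edges, not an extra fact. */
static bool toy_jump_needs_crossing_note(const struct toy_edge *succs,
                                         size_t n)
{
  assert(succs != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (succs[i].src != succs[i].dest)
      return true;
  return false;
}
```

Under this model a block pair in the same partition can never need the note, which is why a note in that situation points at stale bookkeeping rather than at a condition the merge check should accept.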
On Tue, Oct 30, 2012 at 6:48 PM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
> On Tue, Oct 30, 2012 at 6:20 AM, Teresa Johnson wrote:
>> Index: bb-reorder.c
>> ===================================================================
>> --- bb-reorder.c (revision 192692)
>> +++ bb-reorder.c (working copy)
>> @@ -2188,6 +2188,8 @@ insert_section_boundary_note (void)
>>          first_partition = BB_PARTITION (bb);
>>        if (BB_PARTITION (bb) != first_partition)
>>          {
>> +          /* There should be a barrier between text sections. */
>> +          emit_barrier_after (BB_END (bb->prev_bb));
>
> So why isn't there one? There can't be a fall-through edge from one
> section to the other, so cfgrtl.c:fixup_reorder_chain should have
> added a barrier here already in the code under the comment:
>
>   /* Now add jumps and labels as needed to match the blocks new
>      outgoing edges. */
>
> Why isn't it doing that for you?

Maybe it's because fix_up_fall_thru_edges calls force_nonfallthru,
which is incorrectly inserting JUMP_INSNs and BARRIERs in cfglayout
mode. I'm going to test this patch:

Index: cfgrtl.c
===================================================================
--- cfgrtl.c (revision 192889)
+++ cfgrtl.c (working copy)
@@ -1511,16 +1511,17 @@ force_nonfallthru_and_redirect (edge e,
 #endif
         }
       set_return_jump_label (BB_END (jump_block));
+      emit_barrier_after (BB_END (jump_block));
     }
-  else
+  else if (current_ir_type () == IR_RTL_CFGRTL)
     {
       rtx label = block_label (target);
       emit_jump_insn_after_setloc (gen_jump (label), BB_END (jump_block), loc);
       JUMP_LABEL (BB_END (jump_block)) = label;
       LABEL_NUSES (label)++;
+      emit_barrier_after (BB_END (jump_block));
     }

-  emit_barrier_after (BB_END (jump_block));
   redirect_edge_succ_nodup (e, target);

   if (abnormal_edge_flags)
Hello Teresa, Could you try this patch for me also? It moves bbpart outside the part of the passes pipeline that works in cfglayout mode.
On Tue, Oct 30, 2012 at 10:28 PM, Steven Bosscher wrote:
> Hello Teresa,
>
> Could you try this patch for me also? It moves bbpart outside the part
> of the passes pipeline that works in cfglayout mode.

where's the "unsend" button if you need it...

So, to complete the mail...

Could you try this patch for me also? It moves bbpart outside the part
of the passes pipeline that works in cfglayout mode. It looks like
when someone (/me looks the other way) changed the compiler to work
like that, he/she forgot about updating this pass...

Instead, the pass should run just before register allocation, as late
as possible so that any funny CFG modifications have taken place. A
possible down-side is that profile info may have degenerated a bit
further, but I don't think that's a serious concern because the
partitioning is actually quite stupid: Just stuff all blocks with
count==0 into the cold section. Updating 0-counts is easy enough that
I think GCC should get that right everywhere.

It'd be nice to figure out if there are less stupid^Wsimplistic
heuristics to decide what should go into the cold partition, for GCC
4.9...

Ciao!
Steven


        * passes.c (init_optimization_passes): Move
        pass_partition_blocks just before RA.

Index: passes.c
===================================================================
--- passes.c (revision 192995)
+++ passes.c (working copy)
@@ -1595,7 +1595,6 @@ init_optimization_passes (void)
       NEXT_PASS (pass_ud_rtl_dce);
       NEXT_PASS (pass_combine);
       NEXT_PASS (pass_if_after_combine);
-      NEXT_PASS (pass_partition_blocks);
       NEXT_PASS (pass_regmove);
       NEXT_PASS (pass_outof_cfg_layout_mode);
       NEXT_PASS (pass_split_all_insns);
@@ -1606,6 +1605,7 @@ init_optimization_passes (void)
       NEXT_PASS (pass_match_asm_constraints);
       NEXT_PASS (pass_sms);
       NEXT_PASS (pass_sched);
+      NEXT_PASS (pass_partition_blocks);
       NEXT_PASS (pass_ira);
       NEXT_PASS (pass_reload);
       NEXT_PASS (pass_postreload);
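The heuristic described above ("stuff all blocks with count==0 into the cold section") amounts to a single pass over the profile counts. A minimal sketch, assuming a plain array of per-block execution counts; the function name and types are hypothetical, and this is not the bb-reorder.c implementation:

```c
#include <assert.h>
#include <stddef.h>

enum toy_partition { TOY_HOT, TOY_COLD };

/* Assign each block to the cold partition iff its profile count is zero,
   and to the hot partition otherwise. Returns the number of cold blocks. */
static size_t toy_partition_by_count(const long long *counts,
                                     enum toy_partition *out, size_t n)
{
  size_t n_cold = 0;
  assert((counts != NULL && out != NULL) || n == 0);
  for (size_t i = 0; i < n; i++) {
    if (counts[i] == 0) {
      out[i] = TOY_COLD;
      n_cold++;
    } else {
      out[i] = TOY_HOT;
    }
  }
  return n_cold;
}
```

A threshold derived from profile summary data could replace the comparison with zero; that is the kind of tuning the message leaves open for later work.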
On Tue, Oct 30, 2012 at 2:33 PM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
> On Tue, Oct 30, 2012 at 10:28 PM, Steven Bosscher wrote:
>> Hello Teresa,
>>
>> Could you try this patch for me also? It moves bbpart outside the part
>> of the passes pipeline that works in cfglayout mode.
>
> where's the "unsend" button if you need it...
>
> So, to complete the mail...
>
> Could you try this patch for me also? It moves bbpart outside the part
> of the passes pipeline that works in cfglayout mode. It looks like
> when someone (/me looks the other way) changed the compiler to work
> like that, he/she forgot about updating this pass...

Sure, I will give this a try after your verification patch tests
complete. Does this mean that the patch you posted above to
force_nonfallthru_and_redirect is no longer needed either? I'll see if
I can avoid the need for some of my fixes, although I believe at least
the function.c one will still be needed. I'll check.

Regarding your earlier question about why we needed to add the
barrier, I need to dig up the details again but essentially I found
that the barriers were being added by bbpart, but bbro was reordering
things and the block that ended up at the border between the hot and
cold section didn't necessarily have a barrier on it because it was
not previously at the region boundary.

>
> Instead, the pass should run just before register allocation, as late
> as possible so that any funny CFG modifications have taken place. A
> possible down-side is that profile info may have degenerated a bit
> further, but I don't think that's a serious concern because the
> partitioning is actually quite stupid: Just stuff all blocks with
> count==0 into the cold section. Updating 0-counts is easy enough that
> I think GCC should get that right everywhere.
>
> It'd be nice to figure out if there are less stupid^Wsimplistic
> heuristics to decide what should go into the cold partition, for GCC
> 4.9...

Yep, that is one of my goals in looking at this - I am hoping to use
the new fdo summary info I added to tune this into a more robust
decision based on the counter values in the working set summary.

Thanks for the help!
Teresa

>
> Ciao!
> Steven
>
>
>         * passes.c (init_optimization_passes): Move
>         pass_partition_blocks just before RA.
>
> Index: passes.c
> ===================================================================
> --- passes.c (revision 192995)
> +++ passes.c (working copy)
> @@ -1595,7 +1595,6 @@ init_optimization_passes (void)
>        NEXT_PASS (pass_ud_rtl_dce);
>        NEXT_PASS (pass_combine);
>        NEXT_PASS (pass_if_after_combine);
> -      NEXT_PASS (pass_partition_blocks);
>        NEXT_PASS (pass_regmove);
>        NEXT_PASS (pass_outof_cfg_layout_mode);
>        NEXT_PASS (pass_split_all_insns);
> @@ -1606,6 +1605,7 @@ init_optimization_passes (void)
>        NEXT_PASS (pass_match_asm_constraints);
>        NEXT_PASS (pass_sms);
>        NEXT_PASS (pass_sched);
> +      NEXT_PASS (pass_partition_blocks);
>        NEXT_PASS (pass_ira);
>        NEXT_PASS (pass_reload);
>        NEXT_PASS (pass_postreload);

--
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
On Tue, Oct 30, 2012 at 2:33 PM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
> On Tue, Oct 30, 2012 at 10:28 PM, Steven Bosscher wrote:
>> Hello Teresa,
>>
>> Could you try this patch for me also? It moves bbpart outside the part
>> of the passes pipeline that works in cfglayout mode.
>
> where's the "unsend" button if you need it...
>
> So, to complete the mail...
>
> Could you try this patch for me also? It moves bbpart outside the part
> of the passes pipeline that works in cfglayout mode. It looks like
> when someone (/me looks the other way) changed the compiler to work
> like that, he/she forgot about updating this pass...
>
> Instead, the pass should run just before register allocation, as late
> as possible so that any funny CFG modifications have taken place. A
> possible down-side is that profile info may have degenerated a bit
> further, but I don't think that's a serious concern because the
> partitioning is actually quite stupid: Just stuff all blocks with
> count==0 into the cold section. Updating 0-counts is easy enough that
> I think GCC should get that right everywhere.
>
> It'd be nice to figure out if there are less stupid^Wsimplistic
> heuristics to decide what should go into the cold partition, for GCC
> 4.9...
>
> Ciao!
> Steven
>
>
>         * passes.c (init_optimization_passes): Move
>         pass_partition_blocks just before RA.
>
> Index: passes.c
> ===================================================================
> --- passes.c (revision 192995)
> +++ passes.c (working copy)
> @@ -1595,7 +1595,6 @@ init_optimization_passes (void)
>        NEXT_PASS (pass_ud_rtl_dce);
>        NEXT_PASS (pass_combine);
>        NEXT_PASS (pass_if_after_combine);
> -      NEXT_PASS (pass_partition_blocks);
>        NEXT_PASS (pass_regmove);
>        NEXT_PASS (pass_outof_cfg_layout_mode);
>        NEXT_PASS (pass_split_all_insns);
> @@ -1606,6 +1605,7 @@ init_optimization_passes (void)
>        NEXT_PASS (pass_match_asm_constraints);
>        NEXT_PASS (pass_sms);
>        NEXT_PASS (pass_sched);
> +      NEXT_PASS (pass_partition_blocks);
>        NEXT_PASS (pass_ira);
>        NEXT_PASS (pass_reload);
>        NEXT_PASS (pass_postreload);

This doesn't quite work for me. I got an error because bbpart has
PROP_cfglayout listed in its required properties. I tried removing
that, but then verify_flow_info gives an error about a missing barrier
after a bb with no fall through edges.

Teresa

--
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
On Tue, Oct 30, 2012 at 10:43 PM, Teresa Johnson wrote:
> Sure, I will give this a try after your verification patch tests
> complete. Does this mean that the patch you posted above to
> force_nonfallthru_and_redirect is no longer needed either? I'll see if
> I can avoid the need for some of my fixes, although I believe at least
> the function.c one will still be needed. I'll check.

The force_nonfallthru change is still necessary, because
force_nonfallthru should be almost a no-op in cfglayout mode. The
whole concept of a fallthru edge doesn't exist in cfglayout mode: any
single_succ edge is a fallthru edge until the order of the basic
blocks has been determined and the insn chain is re-linked (cfglayout
mode originally was developed for bb-reorder, to move blocks around
more easily). So the correct patch would actually be:

Index: cfgrtl.c
===================================================================
--- cfgrtl.c (revision 193046)
+++ cfgrtl.c (working copy)
@@ -4547,7 +4547,7 @@ struct cfg_hooks cfg_layout_rtl_cfg_hooks = {
   cfg_layout_split_edge,
   rtl_make_forwarder_block,
   NULL, /* tidy_fallthru_edge */
-  rtl_force_nonfallthru,
+  NULL, /* force_nonfallthru */
   rtl_block_ends_with_call_p,
   rtl_block_ends_with_condjump_p,
   rtl_flow_call_edges_add,

(Or better yet: Remove the force_nonfallthru and tidy_fallthru_edge
hooks, they are cfgrtl-only.)

But obviously that won't work because
bb-reorder.c:fix_up_fall_thru_edges calls this hook while we're in
cfglayout mode. That is a bug. The call to force_nonfallthru results
in a "dangling" barrier:

cfgrtl.c:1523   emit_barrier_after (BB_END (jump_block));

In cfglayout mode, barriers don't exist in the insns chain, and they
don't have to because every edge is a fallthru edge. If there are
barriers before cfglayout mode, they are either removed or linked in
the basic block footer, and fixup_reorder_chain restores or inserts
barriers where necessary to drop out of cfglayout mode. This
emit_barrier_after call hangs a barrier after BB_END but not in the
footer, and I'm pretty sure the result will be that the barrier is
lost in fixup_reorder_chain. See also emit_barrier_after_bb for how
inserting a barrier should be done in cfglayout mode.

So in short, bbpart doesn't know what it wants to be: a cfgrtl or a
cfglayout pass. It doesn't work without cfglayout but it's doing
things that are only correct in the cfgrtl world and Very Wrong Indeed
in cfglayout-land.


> Regarding your earlier question about why we needed to add the
> barrier, I need to dig up the details again but essentially I found
> that the barriers were being added by bbpart, but bbro was reordering
> things and the block that ended up at the border between the hot and
> cold section didn't necessarily have a barrier on it because it was
> not previously at the region boundary.

That doesn't sound right. bbpart doesn't actually re-order the basic
blocks, it only marks the blocks with the partition they will be
assigned to. Whatever ends up at the border between the two partitions
is not relevant: the hot section cannot end in a fall-through edge to
the cold section (verify_flow_info even checks for that, see "fallthru
edge crosses section boundary (bb %i)") so it must end in some
explicit jump. Such jumps are always followed by a barrier. The only
reason I can think of why there might be a missing barrier, is because
fixup_reorder_chain has a bug and forgets to insert the barrier in
some cases (and I suspect this may be the case for return patterns, or
the a.m. issue of a dropped barrier).

I would like to work on debugging this, but it's hard without test cases...

Ciao!
Steven
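The rule that the hot section cannot end in a fall-through edge to the cold section can be pictured as a scan over the final block order. The sketch below checks a toy layout array; it is not verify_flow_info, and toy_layout_block and toy_find_crossing_fallthru are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_partition { TOY_HOT, TOY_COLD };

/* Toy block in final layout order. */
struct toy_layout_block {
  enum toy_partition partition;
  bool falls_through; /* true if control can fall into the next block */
};

/* Return the index of the first block that falls through into a block
   of a different partition, or -1 if no block does. */
static long toy_find_crossing_fallthru(const struct toy_layout_block *blocks,
                                       size_t n)
{
  assert(blocks != NULL || n == 0);
  for (size_t i = 0; i + 1 < n; i++)
    if (blocks[i].falls_through
        && blocks[i].partition != blocks[i + 1].partition)
      return (long) i;
  return -1;
}
```

In this model a block that does not fall through ends in an explicit jump or return, which is exactly the position where a barrier is expected once the blocks are laid out.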
> I would like to work on debugging this, but it's hard without test cases...

Maybe the files I attached to my PR55121 could help you in this respect?
Your "sanity checking" patch does complain with these input files.

Christophe.
On Wed, Oct 31, 2012 at 4:02 PM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
> On Tue, Oct 30, 2012 at 10:43 PM, Teresa Johnson wrote:
>> Sure, I will give this a try after your verification patch tests
>> complete. Does this mean that the patch you posted above to
>> force_nonfallthru_and_redirect is no longer needed either? I'll see if
>> I can avoid the need for some of my fixes, although I believe at least
>> the function.c one will still be needed. I'll check.
>
> The force_nonfallthru change is still necessary, because
> force_nonfallthru should be almost a no-op in cfglayout mode. The
> whole concept of a fallthru edge doesn't exist in cfglayout mode: any
> single_succ edge is a fallthru edge until the order of the basic
> blocks has been determined and the insn chain is re-linked (cfglayout
> mode originally was developed for bb-reorder, to move blocks around
> more easily). So the correct patch would actually be:
>
> Index: cfgrtl.c
> ===================================================================
> --- cfgrtl.c (revision 193046)
> +++ cfgrtl.c (working copy)
> @@ -4547,7 +4547,7 @@ struct cfg_hooks cfg_layout_rtl_cfg_hooks = {
>    cfg_layout_split_edge,
>    rtl_make_forwarder_block,
>    NULL, /* tidy_fallthru_edge */
> -  rtl_force_nonfallthru,
> +  NULL, /* force_nonfallthru */
>    rtl_block_ends_with_call_p,
>    rtl_block_ends_with_condjump_p,
>    rtl_flow_call_edges_add,
>
> (Or better yet: Remove the force_nonfallthru and tidy_fallthru_edge
> hooks, they are cfgrtl-only.)
>
> But obviously that won't work because
> bb-reorder.c:fix_up_fall_thru_edges calls this hook while we're in
> cfglayout mode. That is a bug. The call to force_nonfallthru results
> in a "dangling" barrier:
>
> cfgrtl.c:1523   emit_barrier_after (BB_END (jump_block));
>
> In cfglayout mode, barriers don't exist in the insns chain, and they
> don't have to because every edge is a fallthru edge. If there are
> barriers before cfglayout mode, they are either removed or linked in
> the basic block footer, and fixup_reorder_chain restores or inserts
> barriers where necessary to drop out of cfglayout mode. This
> emit_barrier_after call hangs a barrier after BB_END but not in the
> footer, and I'm pretty sure the result will be that the barrier is
> lost in fixup_reorder_chain. See also emit_barrier_after_bb for how
> inserting a barrier should be done in cfglayout mode.
>
> So in short, bbpart doesn't know what it wants to be: a cfgrtl or a
> cfglayout pass. It doesn't work without cfglayout but it's doing
> things that are only correct in the cfgrtl world and Very Wrong Indeed
> in cfglayout-land.
>
>
>> Regarding your earlier question about why we needed to add the
>> barrier, I need to dig up the details again but essentially I found
>> that the barriers were being added by bbpart, but bbro was reordering
>> things and the block that ended up at the border between the hot and
>> cold section didn't necessarily have a barrier on it because it was
>> not previously at the region boundary.
>
> That doesn't sound right. bbpart doesn't actually re-order the basic
> blocks, it only marks the blocks with the partition they will be
> assigned to. Whatever ends up at the border between the two partitions
> is not relevant: the hot section cannot end in a fall-through edge to
> the cold section (verify_flow_info even checks for that, see "fallthru
> edge crosses section boundary (bb %i)") so it must end in some
> explicit jump. Such jumps are always followed by a barrier. The only
> reason I can think of why there might be a missing barrier, is because
> fixup_reorder_chain has a bug and forgets to insert the barrier in
> some cases (and I suspect this may be the case for return patterns, or
> the a.m. issue of a dropped barrier).
>
> I would like to work on debugging this, but it's hard without test cases...

I'm working on trying to reproduce some of these failures in a test
case I can share. So far, I have only been able to reproduce the
failure reported in PR 53743 in spec2006 (456.hmmer/sre_math.c). Still
working on getting a smaller/shareable test case for the other 2
issues.

The failure in PR 53743 (assert in cfg_layout_merge_blocks) is what I
had fixed with my original changes to cfgrtl.c. Need to understand why
there is a reg crossing note between 2 bbs in the same partition.

In the hmmer test case I also hit failures in rtl_verify_flow_info
and rtl_verify_flow_info_1:

gcc -c -o sre_math.o -DSPEC_CPU -D NDEBUG -fprofile-use -freorder-blocks-and-partition -O2 sre_math.c
sre_math.c: In function ‘Gammln’:
sre_math.c:161:1: error: EDGE_CROSSING incorrectly set across same section
 }
 ^
sre_math.c:161:1: error: missing barrier after block 6
sre_math.c:161:1: internal compiler error: verify_flow_info failed


This was due to some code in thread_prologue_and_epilogue_insns that
duplicated tail blocks:

            if (e)
              {
                copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)),
                                              NULL_RTX, e->src);
                BB_COPY_PARTITION (copy_bb, e->src);
              }

In this case e->src (bb 6) was in the cold section and e->dest was in
the hot section, and e->src ended with a REG_CROSSING_JUMP followed by
a barrier. The new copy_bb got put into the cold section by the copy
partition above, leading to the first error. And because the
create_basic_block call inserted the new copy_bb before NEXT_INSN
(BB_END (e->src)), which in this case was the barrier, we ended up
without the barrier after the crossing edge.

I fixed this by making the following change:

--- function.c (revision 192692)
+++ function.c (working copy)
@@ -6249,9 +6249,18 @@ thread_prologue_and_epilogue_insns (void)
                 break;
             if (e)
               {
+                rtx note;
                 copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)),
                                               NULL_RTX, e->src);
                 BB_COPY_PARTITION (copy_bb, e->src);
+                /* Remove the region crossing note from jump at end of
+                   e->src if it exists. */
+                note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX);
+                if (note)
+                  /* There would also have been a barrier after e->src, that
+                     is now after copy_bb, but that shouldn't be a
+                     problem?. */
+                  remove_note (BB_END (e->src), note);
               }
             else
               {


But I am not sure this is really correct in all cases - for example,
what if another hot bb that also didn't require a prologue branched
into the new cloned tail sequence, which is now cold? E.g.
dup_block_and_redirect will redirect all predecessors that don't need
a prologue to the new copy.

I'm going to see if I can get the other 2 failures I had found to
trigger on spec or a smaller test case.

Teresa

>
> Ciao!
> Steven

--
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
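The situation described above (a copied tail block inherits e->src's partition while e->src still carries its crossing note) can be reproduced in a toy model. The sketch below is only an illustration under that reading of the message; toy_block and the three functions are hypothetical and do not correspond to GCC functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_partition { TOY_HOT, TOY_COLD };

/* Toy block: assigned partition plus a flag modelling REG_CROSSING_JUMP
   on the jump that ends the block. */
struct toy_block {
  enum toy_partition partition;
  bool crossing_note;
};

/* Model the tail duplication step: the copy is placed right after SRC
   and inherits SRC's partition (the BB_COPY_PARTITION step). */
static void toy_make_copy_bb(const struct toy_block *src,
                             struct toy_block *copy)
{
  assert(src != NULL && copy != NULL);
  copy->partition = src->partition;
  copy->crossing_note = false;
}

/* The note is stale when SRC's successor is in SRC's own partition. */
static bool toy_note_is_stale(const struct toy_block *src,
                              const struct toy_block *next)
{
  assert(src != NULL && next != NULL);
  return src->crossing_note && src->partition == next->partition;
}

/* Model the proposed fix: drop the note once it no longer describes
   a jump that leaves the partition. */
static void toy_drop_stale_note(struct toy_block *src,
                                const struct toy_block *next)
{
  if (toy_note_is_stale(src, next))
    src->crossing_note = false;
}
```

The sketch does not address the open question raised in the message, namely what should happen when a hot predecessor is later redirected to the now-cold copy.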
On Thu, Nov 1, 2012 at 10:19 AM, Teresa Johnson <tejohnson@google.com> wrote: > On Wed, Oct 31, 2012 at 4:02 PM, Steven Bosscher <stevenb.gcc@gmail.com> wrote: >> On Tue, Oct 30, 2012 at 10:43 PM, Teresa Johnson wrote: >>> Sure, I will give this a try after your verification patch tests >>> complete. Does this mean that the patch you posted above to >>> force_nonfallthru_and_redirect is no longer needed either? I'll see if >>> I can avoid the need for some of my fixes, although I believe at least >>> the function.c one will still be needed. I'll check. >> >> The force_nonfallthru change is still necessary, because >> force_nonfallthru should be almost a no-op in cfglayout mode. The >> whole concept of a fallthru edge doesn't exist in cfglayout mode: any >> single_succ edge is a fallthru edge until the order of the basic >> blocks has been determined and the insn chain is re-linked (cfglayout >> mode originally was developed for bb-reorder, to move blocks around >> more easily). So the correct patch would actually be: >> >> Index: cfgrtl.c >> =================================================================== >> --- cfgrtl.c (revision 193046) >> +++ cfgrtl.c (working copy) >> @@ -4547,7 +4547,7 @@ struct cfg_hooks cfg_layout_rtl_cfg_hooks = { >> cfg_layout_split_edge, >> rtl_make_forwarder_block, >> NULL, /* tidy_fallthru_edge */ >> - rtl_force_nonfallthru, >> + NULL, /* force_nonfallthru */ >> rtl_block_ends_with_call_p, >> rtl_block_ends_with_condjump_p, >> rtl_flow_call_edges_add, >> >> (Or better yet: Remove the force_nonfallthru and tidy_fallthru_edge >> hooks, they are cfgrtl-only.) >> >> But obviously that won't work because >> bb-reorder.c:fix_up_fall_thru_edges calls this hook while we're in >> cfglayout mode. That is a bug. 
The call to force_nonfallthru results >> in a "dangling" barrier: >> >> cfgrtl.c:1523 emit_barrier_after (BB_END (jump_block)); >> >> In cfglayout mode, barriers don't exist in the insns chain, and they >> don't have to because every edge is a fallthru edge. If there are >> barriers before cfglayout mode, they are either removed or linked in >> the basic block footer, and fixup_reorder_chain restores or inserts >> barriers where necessary to drop out of cfglayout mode. This >> emit_barrier_after call hangs a barrier after BB_END but not in the >> footer, and I'm pretty sure the result will be that the barrier is >> lost in fixup_reorder_chain. See also emit_barrier_after_bb for how >> inserting a barrier should be done in cfglayout mode. >> >> So in short, bbpart doesn't know what it wants to be: a cfgrtl or a >> cfglayout pass. It doesn't work without cfglayout but it's doing >> things that are only correct in the cfgrtl world and Very Wrong Indeed >> in cfglayout-land. >> >> >>> Regarding your earlier question about why we needed to add the >>> barrier, I need to dig up the details again but essentially I found >>> that the barriers were being added by bbpart, but bbro was reordering >>> things and the block that ended up at the border between the hot and >>> cold section didn't necessarily have a barrier on it because it was >>> not previously at the region boundary. >> >> That doesn't sound right. bbpart doesn't actually re-order the basic >> blocks, it only marks the blocks with the partition they will be >> assigned to. Whatever ends up at the border between the two partitions >> is not relevant: the hot section cannot end in a fall-through edge to >> the cold section (verify_flow_info even checks for that, see "fallthru >> edge crosses section boundary (bb %i)") so it must end in some >> explicit jump. Such jumps are always followed by a barrier. 
The only >> reason I can think of why there might be a missing barrier, is because >> fixup_reorder_chain has a bug and forgets to insert the barrier in >> some cases (and I suspect this may be the case for return patterns, or >> the a.m. issue of a dropper barrier). >> >> I would like to work on debugging this, but it's hard without test cases... > > I'm working on trying to reproduce some of these failures in a test > case I can share. So far, I have only been able to reproduce the > failure reported in PR 53743 in spec2006 (456.hmmer/sre_math.c). Still > working on getting a smaller/shareable test case for the other 2 > issues. > > The failure in PR 53743 (assert in cfg_layout_merge_blocks) is what I > had fixed with my original changes to cfgrtl.c. Need to understand why > there is a reg crossing note between 2 bbs in the same partition. Interestingly, this turned out to be the same root cause as the verify_flow_info failures below. It is fixed by the same fix to thread_prologue_and_epilogue_insns. When the code below created the copy_bb and put it in e->src's partition, it made it insufficient for the merge blocks routine to check if the two bbs were in the same partition, because they were in the same partition but separated by the region crossing jump. I'll do some testing of the fix below, but do you have any comments on the correctness or the potential issue I raised (see my note just below the patch)? Do you recommend pursuing the move of the bb partition phase until later, after we leave cfglayout mode? I need to revisit to see if my prologue/epilogue fix below also addresses the issue I saw when I tried moving it. 
Thanks,
Teresa

>
> In the hmmer test case I also hit failures in rtl_verify_flow_info
> and rtl_verify_flow_info_1:
>
> gcc -c -o sre_math.o -DSPEC_CPU -DNDEBUG -fprofile-use -freorder-blocks-and-partition -O2 sre_math.c
> sre_math.c: In function ‘Gammln’:
> sre_math.c:161:1: error: EDGE_CROSSING incorrectly set across same section
> }
> ^
> sre_math.c:161:1: error: missing barrier after block 6
> sre_math.c:161:1: internal compiler error: verify_flow_info failed
>
>
> This was due to some code in thread_prologue_and_epilogue_insns that
> duplicated tail blocks:
>
> if (e)
> {
> copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)),
> NULL_RTX, e->src);
> BB_COPY_PARTITION (copy_bb, e->src);
> }
>
> In this case e->src (bb 6) was in the cold section and e->dest was in
> the hot section, and e->src ended with a REG_CROSSING_JUMP followed by
> a barrier. The new copy_bb got put into the cold section by the copy
> partition above, leading to the first error. And because the
> create_basic_block call inserted the new copy_bb before NEXT_INSN
> (BB_END (e->src)), which in this case was the barrier, we ended up
> without the barrier after the crossing edge.
>
> I fixed this by making the following change:
>
> --- function.c (revision 192692)
> +++ function.c (working copy)
> @@ -6249,9 +6249,18 @@ thread_prologue_and_epilogue_insns (void)
> break;
> if (e)
> {
> + rtx note;
> copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)),
> NULL_RTX, e->src);
> BB_COPY_PARTITION (copy_bb, e->src);
> + /* Remove the region crossing note from jump at end of
> + e->src if it exists. */
> + note = find_reg_note (BB_END (e->src),
> REG_CROSSING_JUMP, NULL_RTX);
> + if (note)
> + /* There would also have been a barrier after
> e->src, that
> + is now after copy_bb, but that shouldn't be a
> + problem?.
*/ > + remove_note (BB_END (e->src), note); > } > else > { > > But I am not sure this is really correct in all cases - for example, > what if another hot bb that also didn't require a prologue branched > into the new cloned tail sequence, which is now cold? E.g. > dup_block_and_redirect will redirect all predecessors that don't need > a prologue to the new copy. > > I'm going to see if I can get the other 2 failures I had found to > trigger on spec or a smaller test case. > > Teresa > >> >> Ciao! >> Steven > > > > -- > Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413 -- Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
On Tue, Oct 30, 2012 at 10:48 AM, Steven Bosscher <stevenb.gcc@gmail.com> wrote: > On Tue, Oct 30, 2012 at 6:20 AM, Teresa Johnson wrote: >> Index: bb-reorder.c >> =================================================================== >> --- bb-reorder.c (revision 192692) >> +++ bb-reorder.c (working copy) >> @@ -2188,6 +2188,8 @@ insert_section_boundary_note (void) >> first_partition = BB_PARTITION (bb); >> if (BB_PARTITION (bb) != first_partition) >> { >> + /* There should be a barrier between text sections. */ >> + emit_barrier_after (BB_END (bb->prev_bb)); > > So why isn't there one? There can't be a fall-through edge from one > section to the other, so cfgrtl.c:fixup_reorder_chain should have > added a barrier here already in the code under the comment: > > /* Now add jumps and labels as needed to match the blocks new > outgoing edges. */ > > Why isn't it doing that for you? > > BTW, something else I noted in cfgrtl.c: > NOTE_INSN_SWITCH_TEXT_SECTIONS shouldn't be copied in > duplicate_insn_chain, so the following is necessary for robustness: > > Index: cfgrtl.c > =================================================================== > --- cfgrtl.c (revision 191819) > +++ cfgrtl.c (working copy) > @@ -3615,7 +3615,6 @@ > break; > > case NOTE_INSN_EPILOGUE_BEG: > - case NOTE_INSN_SWITCH_TEXT_SECTIONS: > emit_note_copy (insn); > break; Shouldn't the patch be: @@ -3630,10 +3631,11 @@ duplicate_insn_chain (rtx from, rtx to) case NOTE_INSN_FUNCTION_BEG: /* There is always just single entry to function. */ case NOTE_INSN_BASIC_BLOCK: + /* We should only switch text sections once. */ + case NOTE_INSN_SWITCH_TEXT_SECTIONS: break; case NOTE_INSN_EPILOGUE_BEG: - case NOTE_INSN_SWITCH_TEXT_SECTIONS: emit_note_copy (insn); break; i.e. move the NOTE above to where we will ignore it. Otherwise, we would fall into the default case which is listed as unreachable. Teresa > > > There can be only one! One note to rule them all! etc. 
> > >> Index: cfgrtl.c >> =================================================================== >> --- cfgrtl.c (revision 192692) >> +++ cfgrtl.c (working copy) >> @@ -912,7 +912,8 @@ rtl_can_merge_blocks (basic_block a, basic_block b >> partition boundaries). See the comments at the top of >> bb-reorder.c:partition_hot_cold_basic_blocks for complete details. */ >> >> - if (BB_PARTITION (a) != BB_PARTITION (b)) >> + if (find_reg_note (BB_END (a), REG_CROSSING_JUMP, NULL_RTX) >> + || BB_PARTITION (a) != BB_PARTITION (b)) >> return false; > > My dislike for this whole scheme just continues to grow... How can > there be a REG_CROSSING_JUMP note if BB_PARTITION(a)==BB_PARTITION(b)? > That is a bug. We should not need the notes here. > > As long as we have the CFG, BB_PARTITION(a)==BB_PARTITION(b) should be > the canonical way to check whether two blocks are in the same > partition, and the EDGE_CROSSING flag should be set iff an edge > crosses from one section to another. The REG_CROSSING_JUMP note should > only be used to see if a JUMP_INSN may jump to another section, > without having to check all successor edges. > > Any place where we have to check the BB_PARTITION or > edge->flags&EDGE_CROSSING *and* REG_CROSSING_JUMP indicates a bug in > the partitioning updating. > > Another BTW: sched-vis.c doesn't handle REG_CROSSING_JUMP notes so > that slim RTL dumping breaks. I need this patchlet to make things > work: > Index: sched-vis.c > =================================================================== > --- sched-vis.c (revision 191819) > +++ sched-vis.c (working copy) > @@ -553,6 +553,11 @@ > { > char t1[BUF_LEN], t2[BUF_LEN], t3[BUF_LEN]; > > + if (! x) > + { > + sprintf (buf, "(nil)"); > + return; > + } > switch (GET_CODE (x)) > { > case SET: > > Ciao! > Steven -- Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
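The invariant Steven states in the quoted text (the crossing flag on an edge is set iff its two blocks are in different partitions, and the jump note only summarizes the successor edges) can be illustrated with a small standalone model. This is only a sketch: the `toy_*` types and functions are hypothetical stand-ins invented for illustration, not GCC's `basic_block`, `edge`, or note machinery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names exist in GCC.  */
enum toy_partition { TOY_HOT, TOY_COLD };

struct toy_bb
{
  enum toy_partition partition;  /* plays the role of BB_PARTITION */
  bool jump_has_crossing_note;   /* plays the role of REG_CROSSING_JUMP */
};

struct toy_edge
{
  struct toy_bb *src;
  struct toy_bb *dest;
  bool crossing;                 /* plays the role of EDGE_CROSSING */
};

/* An edge is consistent when its crossing flag is set if and only if
   its endpoints are in different partitions.  */
static bool
toy_edge_consistent (const struct toy_edge *e)
{
  return e->crossing == (e->src->partition != e->dest->partition);
}

/* Count the edges whose crossing flag disagrees with the partitions.  */
static size_t
toy_count_inconsistent (const struct toy_edge *edges, size_t n)
{
  size_t bad = 0;
  for (size_t i = 0; i < n; i++)
    if (!toy_edge_consistent (&edges[i]))
      bad++;
  return bad;
}

/* Re-derive the crossing flags of BB's successor edges from the
   partitions, then set the jump note iff some successor edge crosses.  */
static void
toy_fixup_block (struct toy_bb *bb, struct toy_edge *succs, size_t n)
{
  bool any_crossing = false;
  for (size_t i = 0; i < n; i++)
    {
      assert (succs[i].src == bb);
      succs[i].crossing = bb->partition != succs[i].dest->partition;
      any_crossing = any_crossing || succs[i].crossing;
    }
  bb->jump_has_crossing_note = any_crossing;
}
```

In this model, redirecting a cold block's crossing edge to a new cold block without calling `toy_fixup_block` leaves a stale flag and a stale note, which is the shape of the inconsistency discussed in this thread.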
On Thu, Nov 1, 2012 at 10:26 PM, Teresa Johnson wrote:
> I'll do some testing of the fix below, but do you have any comments on
> the correctness or the potential issue I raised (see my note just
> below the patch)?

Sorry, I don't know the pro- and epilogue threading code well enough to be of any help to you...

> Do you recommend pursuing the move of the bb partition phase until
> later, after we leave cfglayout mode? I need to revisit to see if my
> prologue/epilogue fix below also addresses the issue I saw when I
> tried moving it.

I think it should not be moved; cfglayout mode is the natural mode for this kind of CFG transformation. But someone will have to tackle the force_nonfallthru issue.

Ciao!
Steven
On Thu, Nov 1, 2012 at 2:26 PM, Teresa Johnson <tejohnson@google.com> wrote: > On Thu, Nov 1, 2012 at 10:19 AM, Teresa Johnson <tejohnson@google.com> wrote: >> On Wed, Oct 31, 2012 at 4:02 PM, Steven Bosscher <stevenb.gcc@gmail.com> wrote: >>> On Tue, Oct 30, 2012 at 10:43 PM, Teresa Johnson wrote: >>>> Sure, I will give this a try after your verification patch tests >>>> complete. Does this mean that the patch you posted above to >>>> force_nonfallthru_and_redirect is no longer needed either? I'll see if >>>> I can avoid the need for some of my fixes, although I believe at least >>>> the function.c one will still be needed. I'll check. >>> >>> The force_nonfallthru change is still necessary, because >>> force_nonfallthru should be almost a no-op in cfglayout mode. The >>> whole concept of a fallthru edge doesn't exist in cfglayout mode: any >>> single_succ edge is a fallthru edge until the order of the basic >>> blocks has been determined and the insn chain is re-linked (cfglayout >>> mode originally was developed for bb-reorder, to move blocks around >>> more easily). So the correct patch would actually be: >>> >>> Index: cfgrtl.c >>> =================================================================== >>> --- cfgrtl.c (revision 193046) >>> +++ cfgrtl.c (working copy) >>> @@ -4547,7 +4547,7 @@ struct cfg_hooks cfg_layout_rtl_cfg_hooks = { >>> cfg_layout_split_edge, >>> rtl_make_forwarder_block, >>> NULL, /* tidy_fallthru_edge */ >>> - rtl_force_nonfallthru, >>> + NULL, /* force_nonfallthru */ >>> rtl_block_ends_with_call_p, >>> rtl_block_ends_with_condjump_p, >>> rtl_flow_call_edges_add, >>> >>> (Or better yet: Remove the force_nonfallthru and tidy_fallthru_edge >>> hooks, they are cfgrtl-only.) >>> >>> But obviously that won't work because >>> bb-reorder.c:fix_up_fall_thru_edges calls this hook while we're in >>> cfglayout mode. That is a bug. 
The call to force_nonfallthru results >>> in a "dangling" barrier: >>> >>> cfgrtl.c:1523 emit_barrier_after (BB_END (jump_block)); >>> >>> In cfglayout mode, barriers don't exist in the insns chain, and they >>> don't have to because every edge is a fallthru edge. If there are >>> barriers before cfglayout mode, they are either removed or linked in >>> the basic block footer, and fixup_reorder_chain restores or inserts >>> barriers where necessary to drop out of cfglayout mode. This >>> emit_barrier_after call hangs a barrier after BB_END but not in the >>> footer, and I'm pretty sure the result will be that the barrier is >>> lost in fixup_reorder_chain. See also emit_barrier_after_bb for how >>> inserting a barrier should be done in cfglayout mode. >>> >>> So in short, bbpart doesn't know what it wants to be: a cfgrtl or a >>> cfglayout pass. It doesn't work without cfglayout but it's doing >>> things that are only correct in the cfgrtl world and Very Wrong Indeed >>> in cfglayout-land. >>> >>> >>>> Regarding your earlier question about why we needed to add the >>>> barrier, I need to dig up the details again but essentially I found >>>> that the barriers were being added by bbpart, but bbro was reordering >>>> things and the block that ended up at the border between the hot and >>>> cold section didn't necessarily have a barrier on it because it was >>>> not previously at the region boundary. >>> >>> That doesn't sound right. bbpart doesn't actually re-order the basic >>> blocks, it only marks the blocks with the partition they will be >>> assigned to. Whatever ends up at the border between the two partitions >>> is not relevant: the hot section cannot end in a fall-through edge to >>> the cold section (verify_flow_info even checks for that, see "fallthru >>> edge crosses section boundary (bb %i)") so it must end in some >>> explicit jump. Such jumps are always followed by a barrier. 
The only >>> reason I can think of why there might be a missing barrier, is because >>> fixup_reorder_chain has a bug and forgets to insert the barrier in >>> some cases (and I suspect this may be the case for return patterns, or >>> the a.m. issue of a dropper barrier). >>> >>> I would like to work on debugging this, but it's hard without test cases... >> >> I'm working on trying to reproduce some of these failures in a test >> case I can share. So far, I have only been able to reproduce the >> failure reported in PR 53743 in spec2006 (456.hmmer/sre_math.c). Still >> working on getting a smaller/shareable test case for the other 2 >> issues. >> >> The failure in PR 53743 (assert in cfg_layout_merge_blocks) is what I >> had fixed with my original changes to cfgrtl.c. Need to understand why >> there is a reg crossing note between 2 bbs in the same partition. > > Interestingly, this turned out to be the same root cause as the > verify_flow_info failures below. It is fixed by the same fix to > thread_prologue_and_epilogue_insns. When the code below created the > copy_bb and put it in e->src's partition, it made it insufficient for > the merge blocks routine to check if the two bbs were in the same > partition, because they were in the same partition but separated by > the region crossing jump. > > I'll do some testing of the fix below, but do you have any comments on > the correctness or the potential issue I raised (see my note just > below the patch)? > > Do you recommend pursuing the move of the bb partition phase until > later, after we leave cfglayout mode? I need to revisit to see if my > prologue/epilogue fix below also addresses the issue I saw when I > tried moving it. I found that indeed it wasn't a complete fix as there can be cases where we needed to redirect another pred branch in the hot section to the copy_bb that was now marked cold. 
I also found another case further down in the same thread_prologue_and_epilogue_insns routine that required the exact same kind of fixup (where we create a new block to hold a simple return and redirect edges to that block). Here too a new block was created and put into the same partition as the pred that we redirect to it, so that pred no longer needed to be marked with a region crossing jump. In all of these cases the edge redirection is handled by rtl_redirect_edge_and_branch_force, so I simply moved the fixup there and had it fix up jumps/edges from all preds as necessary based on the new target's partition.

Finally, I also fixed the below call to create_basic_block to correctly insert the new basic block after any barriers following the BB_END (e->src).

Another email is coming shortly on the root cause I found for one of the other issues I was attempting to fix in my original patch set, which you also had questions about.

Thanks,
Teresa

>
> Thanks,
> Teresa
>
>>
>> In the hmmer test case I also hit failures in rtl_verify_flow_info
>> and rtl_verify_flow_info_1:
>>
>> gcc -c -o sre_math.o -DSPEC_CPU -DNDEBUG -fprofile-use -freorder-blocks-and-partition -O2 sre_math.c
>> sre_math.c: In function ‘Gammln’:
>> sre_math.c:161:1: error: EDGE_CROSSING incorrectly set across same section
>> }
>> ^
>> sre_math.c:161:1: error: missing barrier after block 6
>> sre_math.c:161:1: internal compiler error: verify_flow_info failed
>>
>>
>> This was due to some code in thread_prologue_and_epilogue_insns that
>> duplicated tail blocks:
>>
>> if (e)
>> {
>> copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)),
>> NULL_RTX, e->src);
>> BB_COPY_PARTITION (copy_bb, e->src);
>> }
>>
>> In this case e->src (bb 6) was in the cold section and e->dest was in
>> the hot section, and e->src ended with a REG_CROSSING_JUMP followed by
>> a barrier. The new copy_bb got put into the cold section by the copy
>> partition above, leading to the first error.
And because the >> create_basic_block call inserted the new copy_bb before NEXT_INSN >> (BB_END (e->src)), which in this case was the barrier, we ended up >> without the barrier after the crossing edge. >> >> I fixed this by making the following change: >> >> --- function.c (revision 192692) >> +++ function.c (working copy) >> @@ -6249,9 +6249,18 @@ thread_prologue_and_epilogue_insns (void) >> break; >> if (e) >> { >> + rtx note; >> copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)), >> NULL_RTX, e->src); >> BB_COPY_PARTITION (copy_bb, e->src); >> + /* Remove the region crossing note from jump at end of >> + e->src if it exists. */ >> + note = find_reg_note (BB_END (e->src), >> REG_CROSSING_JUMP, NULL_RTX); >> + if (note) >> + /* There would also have been a barrier after >> e->src, that >> + is now after copy_bb, but that shouldn't be a >> + problem?. */ >> + remove_note (BB_END (e->src), note); >> } >> else >> { >> >> But I am not sure this is really correct in all cases - for example, >> what if another hot bb that also didn't require a prologue branched >> into the new cloned tail sequence, which is now cold? E.g. >> dup_block_and_redirect will redirect all predecessors that don't need >> a prologue to the new copy. >> >> I'm going to see if I can get the other 2 failures I had found to >> trigger on spec or a smaller test case. >> >> Teresa >> >>> >>> Ciao! >>> Steven >> >> >> >> -- >> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413 > > > > -- > Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413 -- Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
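The "insert the new block after any barriers" point made in this message can be pictured with a toy insn chain. The sketch below is purely illustrative: the `toy_*` names are hypothetical, the chain is a plain singly linked list rather than GCC's RTL, and the helper only mimics the part of the behavior the thread relies on (skipping barriers that trail the end of a block).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy insn chain; not GCC's rtx representation.  */
enum toy_kind { TOY_INSN, TOY_JUMP, TOY_BARRIER, TOY_NOTE };

struct toy_insn
{
  enum toy_kind kind;
  struct toy_insn *next;
};

/* Starting from the last real insn of a block, skip over any barriers
   that directly follow it and return the last of them (or END itself
   when no barrier follows).  */
static struct toy_insn *
toy_last_insn_with_barriers (struct toy_insn *end)
{
  while (end->next != NULL && end->next->kind == TOY_BARRIER)
    end = end->next;
  return end;
}

/* Link NEW_INSN into the chain directly after POS.  */
static void
toy_insert_after (struct toy_insn *pos, struct toy_insn *new_insn)
{
  assert (pos != NULL && new_insn != NULL);
  new_insn->next = pos->next;
  pos->next = new_insn;
}

/* True if a barrier immediately follows JUMP.  */
static bool
toy_jump_followed_by_barrier (const struct toy_insn *jump)
{
  return jump->next != NULL && jump->next->kind == TOY_BARRIER;
}
```

Inserting the first insn of a new block directly after the jump separates the jump from its barrier, whereas inserting after `toy_last_insn_with_barriers (jump)` keeps the pair adjacent.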
On Tue, Oct 30, 2012 at 10:48 AM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
> On Tue, Oct 30, 2012 at 6:20 AM, Teresa Johnson wrote:
>> Index: bb-reorder.c
>> ===================================================================
>> --- bb-reorder.c (revision 192692)
>> +++ bb-reorder.c (working copy)
>> @@ -2188,6 +2188,8 @@ insert_section_boundary_note (void)
>> first_partition = BB_PARTITION (bb);
>> if (BB_PARTITION (bb) != first_partition)
>> {
>> + /* There should be a barrier between text sections. */
>> + emit_barrier_after (BB_END (bb->prev_bb));
>
> So why isn't there one? There can't be a fall-through edge from one
> section to the other, so cfgrtl.c:fixup_reorder_chain should have
> added a barrier here already in the code under the comment:
>
> /* Now add jumps and labels as needed to match the blocks new
> outgoing edges. */
>
> Why isn't it doing that for you?

I triggered the same error in 445.gobmk once I applied the thread_prologue_and_epilogue_insns fixes. This is an assert in the dwarf CFI code that complains about a NOTE_INSN_SWITCH_TEXT_SECTIONS note not being preceded by a barrier:

gcc -c -o engine/utils.o -DSPEC_CPU -DNDEBUG -DHAVE_CONFIG_H -I. -I.. -I../include -I./include -fprofile-use -freorder-blocks-and-partition -freorder-blocks -ffunction-sections -O2 engine/utils.c
engine/utils.c: In function ‘visible_along_edge’:
engine/utils.c:274:1: internal compiler error: in create_pseudo_cfg, at dwarf2cfi.c:2742
}
^

In this case the switch section note was inside a BB. What I found was that this was due to several phases going into and back out of cfglayout mode again. In this case it was the compgotos phase. There aren't any computed gotos, but this change occurs during cfg_layout_initialize (in try_optimize_cfg called via cleanup_cfg). Here it merged two (non-contiguous) blocks that had a single-successor/single-predecessor relationship. However, the source block was previously on the section boundary and had a SWITCH note before it.
This note is put into the header of the bb when we go into cfglayout mode, and it ended up inside the new merged block, which was in any case not on the new border between the hot and cold sections.

The correct solution in my opinion is to strip out the SWITCH note every time we enter cfglayout mode after bbro, and then invoke insert_section_boundary_note when leaving cfglayout mode (if a note was found on entry) to reapply it. This fixed the 445.gobmk failure, and I also found that I no longer need the above change to insert a barrier when inserting the SWITCH note (although now that I think about it more, it must have been the prologue/epilogue code fix that addressed that).

In any case, the 445.gobmk code also showed another bug that would have been caught by your new patch to verify that cold sections never dominate hot sections. In this case, the function entry block was marked cold and we switch to hot code partway through the function. The reason is that the entry bb count was 2, which is less than the number of training runs (8). But later on in the code there is a loop that brings those bb counts above 8, and so they are marked hot. Obviously this doesn't make sense. The fix I plan to implement in the bbpart code to ensure no cold blocks dominate hot bbs should address this, but a more sophisticated algorithm for marking blocks hot vs cold would be better (either via the structural method you were working on, or by doing this on traces as part of bbro as David suggested).

My plan is to add in your domination check patch, implement the domination fixes in the bbpart algorithm, do a bunch more testing and send the whole shebang out for review.
Thanks, Teresa > > BTW, something else I noted in cfgrtl.c: > NOTE_INSN_SWITCH_TEXT_SECTIONS shouldn't be copied in > duplicate_insn_chain, so the following is necessary for robustness: > > Index: cfgrtl.c > =================================================================== > --- cfgrtl.c (revision 191819) > +++ cfgrtl.c (working copy) > @@ -3615,7 +3615,6 @@ > break; > > case NOTE_INSN_EPILOGUE_BEG: > - case NOTE_INSN_SWITCH_TEXT_SECTIONS: > emit_note_copy (insn); > break; > > > There can be only one! One note to rule them all! etc. > > >> Index: cfgrtl.c >> =================================================================== >> --- cfgrtl.c (revision 192692) >> +++ cfgrtl.c (working copy) >> @@ -912,7 +912,8 @@ rtl_can_merge_blocks (basic_block a, basic_block b >> partition boundaries). See the comments at the top of >> bb-reorder.c:partition_hot_cold_basic_blocks for complete details. */ >> >> - if (BB_PARTITION (a) != BB_PARTITION (b)) >> + if (find_reg_note (BB_END (a), REG_CROSSING_JUMP, NULL_RTX) >> + || BB_PARTITION (a) != BB_PARTITION (b)) >> return false; > > My dislike for this whole scheme just continues to grow... How can > there be a REG_CROSSING_JUMP note if BB_PARTITION(a)==BB_PARTITION(b)? > That is a bug. We should not need the notes here. > > As long as we have the CFG, BB_PARTITION(a)==BB_PARTITION(b) should be > the canonical way to check whether two blocks are in the same > partition, and the EDGE_CROSSING flag should be set iff an edge > crosses from one section to another. The REG_CROSSING_JUMP note should > only be used to see if a JUMP_INSN may jump to another section, > without having to check all successor edges. > > Any place where we have to check the BB_PARTITION or > edge->flags&EDGE_CROSSING *and* REG_CROSSING_JUMP indicates a bug in > the partitioning updating. > > Another BTW: sched-vis.c doesn't handle REG_CROSSING_JUMP notes so > that slim RTL dumping breaks. 
I need this patchlet to make things > work: > Index: sched-vis.c > =================================================================== > --- sched-vis.c (revision 191819) > +++ sched-vis.c (working copy) > @@ -553,6 +553,11 @@ > { > char t1[BUF_LEN], t2[BUF_LEN], t3[BUF_LEN]; > > + if (! x) > + { > + sprintf (buf, "(nil)"); > + return; > + } > switch (GET_CODE (x)) > { > case SET: > > Ciao! > Steven -- Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
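The cold-dominates-hot situation described in this message (an entry block with count 2 classified cold against a threshold of 8, followed by a loop whose blocks are hot) can be reproduced in a small standalone model. This is a sketch under stated assumptions, not the planned bbpart change: the `toy_*` names are hypothetical, dominators are given as a precomputed immediate-dominator array, and promoting every dominator of a hot block to hot is just one simple way to restore the property.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NO_IDOM (-1)

/* Hypothetical toy block; not GCC's basic_block.  */
struct toy_block
{
  long count;  /* profile execution count */
  int idom;    /* index of the immediate dominator, or TOY_NO_IDOM */
  bool cold;   /* true if assigned to the cold partition */
};

/* Naive classification: a block is cold when its count is below
   THRESHOLD (for example, the number of training runs).  */
static void
toy_classify (struct toy_block *b, size_t n, long threshold)
{
  for (size_t i = 0; i < n; i++)
    b[i].cold = b[i].count < threshold;
}

/* Return true if some cold block dominates some hot block.  */
static bool
toy_cold_dominates_hot (const struct toy_block *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      if (b[i].cold)
        continue;
      for (int d = b[i].idom; d != TOY_NO_IDOM; d = b[d].idom)
        {
          assert (d >= 0 && (size_t) d < n);
          if (b[d].cold)
            return true;
        }
    }
  return false;
}

/* Mark every dominator of a hot block as hot.  Returns the number of
   blocks that were moved from cold to hot.  */
static size_t
toy_promote_dominators (struct toy_block *b, size_t n)
{
  size_t promoted = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (b[i].cold)
        continue;
      for (int d = b[i].idom; d != TOY_NO_IDOM; d = b[d].idom)
        {
          assert (d >= 0 && (size_t) d < n);
          if (b[d].cold)
            {
              b[d].cold = false;
              promoted++;
            }
        }
    }
  return promoted;
}
```

With an entry block of count 2, two loop blocks of count 20 dominated by the entry, and a threshold of 8, `toy_classify` marks the entry cold, `toy_cold_dominates_hot` reports the violation, and `toy_promote_dominators` moves only the entry block back to hot.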
Ping.

Teresa

On Thu, Nov 15, 2012 at 12:10 PM, Teresa Johnson <tejohnson@google.com> wrote:
> Revised patch that fixes failures encountered when enabling
> -freorder-blocks-and-partition, including the failure reported in PR 53743.
>
> This includes new verification code, contributed by Steven Bosscher, to
> ensure no cold blocks dominate hot blocks.
>
> I attempted to make the handling of partition updates through the optimization
> passes much more consistent, removing a number of partial fixes in the code
> stream in the process. The code to fix up partitions (including the BB_PARTITION
> assignment, region crossing jump notes, and switch text section notes) is
> now handled in a few centralized locations, for example inside
> rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that callers
> don't need to attempt the fixup themselves.
>
> For optimization passes that make adjustments to the cfg while in cfg layout
> mode that are not easy to fix up incrementally, the new routine
> fixup_partitions handles the cleanup globally. This does require calculation
> of the dominance relation. However, as far as I can tell, the routines which
> now invoke this global fixup (try_optimize_cfg and commit_edge_insertions)
> are typically invoked once (or a small number of times in the case of
> try_optimize_cfg) per optimization pass. Additionally, I compared the
> -ftime-report output for some large fdo compilations and saw only minimal
> increases in the dominance computation times, which were only a tiny percent
> of the overall compile time.
>
> Additionally, I added a flag to the rtl_data structure to indicate whether
> any partitioning was actually performed, so that optimizations which were
> conservatively disabled whenever the flag_reorder_blocks_and_partition
> is enabled (e.g. try_crossjump_to_edge, part of connect_traces) can be less
> conservative for functions where no partitions were formed (e.g. they are
> completely hot).
> > Bootstrapped and tested on x86_64-unknown-linux-gnu. Also tested with SPEC2006 int > benchmarks and internal google benchmarks using profile feedback and > -freorder-blocks-and-partition to get more coverage. Ok for trunk? > > Thanks, > Teresa > > 2012-11-14 Teresa Johnson <tejohnson@google.com> > Steven Bosscher <steven@gcc.gnu.org> > > * cfghooks.h (cfg_layout_finalize): New parameter. > * modulo-sched.c (rest_of_handle_sms): New cfg_layout_finalize > parameter. > * ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert > as this is now done by redirect_edge_and_branch_force. > * function.c (thread_prologue_and_epilogue_insns): Insert new bb after > barriers, new cfg_layout_finalize parameter, and don't store exit > predecessor BB until after it is potentially split. > * function.h (struct rtl_data): New flag has_bb_partition. > * hw-doloop.c (reorder_loops): New cfg_layout_finalize parameter. > * cfgcleanup.c (try_crossjump_to_edge): Only skip optimization if > any blocks in function actually partitioned. > (try_optimize_cfg): If cfg changed, invoke fixup_partitions to clean > up partitioning. > * bb-reorder.c (connect_traces): Only look for partitions and skip > block copying if any blocks in function actually partitioned. > (emit_barrier_after_bb): Handle insertion in non-cfglayout mode. > (find_rarely_executed_basic_blocks_and_crossing_edges): Ensure > that no cold blocks dominate a hot block. > (fix_up_fall_thru_edges): Replace BB_COPY_PARTITION with assert > as this is now done by force_nonfallthru_and_redirect. > (add_reg_crossing_jump_notes): Handle the fact that some jumps may > already be marked with region crossing note. > (reorder_basic_blocks): Only need to verify partitions if any > blocks in function actually partitioned. > (insert_section_boundary_note): Only need to insert note if any > blocks in function actually partitioned. 
> (rest_of_handle_reorder_blocks): New cfg_layout_finalize > parameter, and remove call to insert_section_boundary_note as this > is now called via cfg_layout_finalize/fixup_reorder_chain. > (duplicate_computed_gotos): New cfg_layout_finalize > parameter. > (partition_hot_cold_basic_blocks): Set flag indicating function > has bb partitions. > * bb-reorder.h: Declare insert_section_boundary_note and > emit_barrier_after_bb, which are no longer static. > * basic-block.h: Declare new function fixup_partitions. > * cfgrtl.c (try_redirect_by_replacing_jump): Remove unnecessary > check for region crossing note. > (fixup_partition_crossing): New function. > (fixup_bb_partition): Ditto. > (rtl_redirect_edge_and_branch): Fixup partition boundaries. > (force_nonfallthru_and_redirect): Fixup partition boundaries, > remove old code that tried to do this. Emit barrier correctly > when we are in cfglayout mode. > (rtl_split_edge): Correctly fixup partition boundaries. > (commit_one_edge_insertion): Remove old code that tried to > fixup region crossing edge since this is now handled in > split_block, and set up insertion point correctly since > block may now end in a jump. > (commit_edge_insertions): Invoke fixup_partitions to sanitize partition > boundaries after optimizations that modify cfg and before trying to > verify the flow info. > (fixup_partitions): New function. > (rtl_verify_flow_info_1): Add verification that no cold bbs dominate > hot bbs. > (record_effective_endpoints): Remove region-crossing notes and set flag > indicating that they need to be reinserted on exit from cfglayout mode. > (outof_cfg_layout_mode): New cfg_layout_finalize parameter. > (fixup_reorder_chain): Call insert_section_boundary_note if necessary. > Remove old code that attempted to fixup region crossing note as > this is now handled in force_nonfallthru_and_redirect. > (duplicate_insn_chain): Don't duplicate switch section notes. 
> (cfg_layout_finalize): Pass new parameter to fixup_reorder_chain. > (rtl_can_remove_branch_p): Remove unnecessary check for region crossing > note. > > Index: cfghooks.h > =================================================================== > --- cfghooks.h (revision 193376) > +++ cfghooks.h (working copy) > @@ -204,7 +204,7 @@ extern void copy_bbs (basic_block *, unsigned, bas > void account_profile_record (struct profile_record *, int); > > extern void cfg_layout_initialize (unsigned int); > -extern void cfg_layout_finalize (void); > +extern void cfg_layout_finalize (bool); > > /* Hooks containers. */ > extern struct cfg_hooks gimple_cfg_hooks; > @@ -218,4 +218,3 @@ extern void cfg_layout_rtl_register_cfg_hooks (voi > extern void gimple_register_cfg_hooks (void); > extern struct cfg_hooks get_cfg_hooks (void); > extern void set_cfg_hooks (struct cfg_hooks); > - > Index: modulo-sched.c > =================================================================== > --- modulo-sched.c (revision 193376) > +++ modulo-sched.c (working copy) > @@ -3354,7 +3354,7 @@ rest_of_handle_sms (void) > if (bb->next_bb != EXIT_BLOCK_PTR) > bb->aux = bb->next_bb; > free_dominance_info (CDI_DOMINATORS); > - cfg_layout_finalize (); > + cfg_layout_finalize (false); > #endif /* INSN_SCHEDULING */ > return 0; > } > Index: ifcvt.c > =================================================================== > --- ifcvt.c (revision 193376) > +++ ifcvt.c (working copy) > @@ -3900,10 +3900,9 @@ find_if_case_1 (basic_block test_bb, edge then_edg > if (new_bb) > { > df_bb_replace (then_bb_index, new_bb); > - /* Since the fallthru edge was redirected from test_bb to new_bb, > - we need to ensure that new_bb is in the same partition as > - test bb (you can not fall through across section boundaries). */ > - BB_COPY_PARTITION (new_bb, test_bb); > + /* This should have been done above via force_nonfallthru_and_redirect > + (possibly called from redirect_edge_and_branch_force). 
*/ > + gcc_assert (BB_PARTITION (new_bb) == BB_PARTITION (test_bb)); > } > > num_true_changes++; > Index: function.c > =================================================================== > --- function.c (revision 193376) > +++ function.c (working copy) > @@ -6249,8 +6249,10 @@ thread_prologue_and_epilogue_insns (void) > break; > if (e) > { > - copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)), > - NULL_RTX, e->src); > + /* Make sure we insert after any barriers. */ > + rtx end = get_last_bb_insn (e->src); > + copy_bb = create_basic_block (NEXT_INSN (end), > + NULL_RTX, e->src); > BB_COPY_PARTITION (copy_bb, e->src); > } > else > @@ -6475,7 +6477,7 @@ thread_prologue_and_epilogue_insns (void) > if (cur_bb->index >= NUM_FIXED_BLOCKS > && cur_bb->next_bb->index >= NUM_FIXED_BLOCKS) > cur_bb->aux = cur_bb->next_bb; > - cfg_layout_finalize (); > + cfg_layout_finalize (false); > } > > epilogue_done: > @@ -6517,7 +6519,7 @@ epilogue_done: > basic_block simple_return_block_cold = NULL; > edge pending_edge_hot = NULL; > edge pending_edge_cold = NULL; > - basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb; > + basic_block exit_pred; > int i; > > gcc_assert (entry_edge != orig_entry_edge); > @@ -6545,6 +6547,12 @@ epilogue_done: > else > pending_edge_cold = e; > } > + > + /* Save a pointer to the exit's predecessor BB for use in > + inserting new BBs at the end of the function. Do this > + after the call to split_block above which may split > + the original exit pred. */ > + exit_pred = EXIT_BLOCK_PTR->prev_bb; > > FOR_EACH_VEC_ELT (edge, unconverted_simple_returns, i, e) > { > Index: function.h > =================================================================== > --- function.h (revision 193376) > +++ function.h (working copy) > @@ -459,6 +459,11 @@ struct GTY(()) rtl_data { > sched2) and is useful only if the port defines LEAF_REGISTERS. 
*/ > bool uses_only_leaf_regs; > > + /* Nonzero if the function being compiled has undergone hot/cold partitioning > + (under flag_reorder_blocks_and_partition) and has at least one cold > + block. */ > + bool has_bb_partition; > + > /* Like regs_ever_live, but 1 if a reg is set or clobbered from an > asm. Unlike regs_ever_live, elements of this array corresponding > to eliminable regs (like the frame pointer) are set if an asm > Index: hw-doloop.c > =================================================================== > --- hw-doloop.c (revision 193376) > +++ hw-doloop.c (working copy) > @@ -547,7 +547,7 @@ reorder_loops (hwloop_info loops) > else > bb->aux = NULL; > } > - cfg_layout_finalize (); > + cfg_layout_finalize (false); > clear_aux_for_blocks (); > df_analyze (); > } > Index: cfgcleanup.c > =================================================================== > --- cfgcleanup.c (revision 193376) > +++ cfgcleanup.c (working copy) > @@ -1824,7 +1824,7 @@ try_crossjump_to_edge (int mode, edge e1, edge e2, > partition boundaries). See the comments at the top of > bb-reorder.c:partition_hot_cold_basic_blocks for complete details. */ > > - if (flag_reorder_blocks_and_partition && reload_completed) > + if (crtl->has_bb_partition && reload_completed) > return false; > > /* Search backward through forwarder blocks. We don't need to worry > @@ -2767,10 +2767,21 @@ try_optimize_cfg (int mode) > df_analyze (); > } > > + if (changed) > + { > + /* Edge forwarding in particular can cause hot blocks previously > + reached by both hot and cold blocks to become dominated only > + by cold blocks. This will cause the verification below to fail, > + and lead to now cold code in the hot section. This is not easy > + to detect and fix during edge forwarding, and in some cases > + is only visible after newly unreachable blocks are deleted, > + which will be done in fixup_partitions. 
*/ > + fixup_partitions (); > + > #ifdef ENABLE_CHECKING > - if (changed) > - verify_flow_info (); > + verify_flow_info (); > #endif > + } > > changed_overall |= changed; > first_pass = false; > Index: bb-reorder.c > =================================================================== > --- bb-reorder.c (revision 193376) > +++ bb-reorder.c (working copy) > @@ -1054,7 +1054,7 @@ connect_traces (int n_traces, struct trace *traces > current_partition = BB_PARTITION (traces[0].first); > two_passes = false; > > - if (flag_reorder_blocks_and_partition) > + if (crtl->has_bb_partition) > for (i = 0; i < n_traces && !two_passes; i++) > if (BB_PARTITION (traces[0].first) > != BB_PARTITION (traces[i].first)) > @@ -1263,7 +1263,7 @@ connect_traces (int n_traces, struct trace *traces > } > } > > - if (flag_reorder_blocks_and_partition) > + if (crtl->has_bb_partition) > try_copy = false; > > /* Copy tiny blocks always; copy larger blocks only when the > @@ -1381,13 +1381,14 @@ get_uncond_jump_length (void) > return length; > } > > -/* Emit a barrier into the footer of BB. */ > +/* Emit a barrier after BB, into the footer if we are in CFGLAYOUT mode. */ > > -static void > +void > emit_barrier_after_bb (basic_block bb) > { > rtx barrier = emit_barrier_after (BB_END (bb)); > - BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier); > + if (current_ir_type () == IR_RTL_CFGLAYOUT) > + BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier); > } > > /* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions. > @@ -1463,18 +1464,109 @@ find_rarely_executed_basic_blocks_and_crossing_edg > { > VEC(edge, heap) *crossing_edges = NULL; > basic_block bb; > - edge e; > - edge_iterator ei; > + edge e, e2; > + edge_iterator ei, ei2; > + unsigned int cold_bb_count = 0; > + VEC (basic_block, heap) *bbs_in_hot_partition = NULL; > + VEC (basic_block, heap) *bbs_newly_hot = NULL; > > /* Mark which partition (hot/cold) each basic block belongs in. 
*/ > FOR_EACH_BB (bb) > { > if (probably_never_executed_bb_p (cfun, bb)) > - BB_SET_PARTITION (bb, BB_COLD_PARTITION); > + { > + BB_SET_PARTITION (bb, BB_COLD_PARTITION); > + cold_bb_count++; > + } > else > - BB_SET_PARTITION (bb, BB_HOT_PARTITION); > + { > + BB_SET_PARTITION (bb, BB_HOT_PARTITION); > + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, bb); > + } > } > > + /* Ensure that no cold bbs dominate hot bbs. This could happen as a result of > + several different possibilities. One is that there are edge weight insanities > + due to optimization phases that do not properly update basic block profile > + counts. The second is that the entry of the function may not be hot, because > + it is entered fewer times than the number of profile training runs, but there > + is a loop inside the function that causes blocks within the function to be > + above the threshold for hotness. */ > + if (cold_bb_count) > + { > + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); > + > + if (dom_calculated_here) > + calculate_dominance_info (CDI_DOMINATORS); > + > + /* Keep examining hot bbs until we have either checked them all, or > + re-marked all cold bbs hot. */ > + while (! VEC_empty (basic_block, bbs_in_hot_partition) > + && cold_bb_count) > + { > + basic_block dom_bb; > + > + bb = VEC_pop (basic_block, bbs_in_hot_partition); > + dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb); > + > + /* If bb's immediate dominator is also hot then it is ok. */ > + if (BB_PARTITION (dom_bb) != BB_COLD_PARTITION) > + continue; > + > + /* We have a hot bb with an immediate dominator that is cold. > + The dominator needs to be re-marked to hot. */ > + BB_SET_PARTITION (dom_bb, BB_HOT_PARTITION); > + cold_bb_count--; > + > + /* Now we need to examine newly-hot dom_bb to see if it is also > + dominated by a cold bb. 
*/ > + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, dom_bb); > + > + /* We should also adjust any cold blocks that the newly-hot bb > + feeds and see if it makes sense to re-mark those as hot as > + well. */ > + VEC_safe_push (basic_block, heap, bbs_newly_hot, dom_bb); > + while (! VEC_empty (basic_block, bbs_newly_hot)) > + { > + basic_block new_hot_bb = VEC_pop (basic_block, bbs_newly_hot); > + /* Examine all successors of this newly-hot bb to see if they > + are cold and should be re-marked as hot. */ > + FOR_EACH_EDGE (e, ei, new_hot_bb->succs) > + { > + bool any_cold_preds = false; > + basic_block succ = e->dest; > + if (BB_PARTITION (succ) != BB_COLD_PARTITION) > + continue; > + /* Does this block have any cold predecessors now? */ > + FOR_EACH_EDGE (e2, ei2, succ->preds) > + { > + if (BB_PARTITION (e2->src) == BB_COLD_PARTITION) > + { > + any_cold_preds = true; > + break; > + } > + } > + if (any_cold_preds) > + continue; > + > + /* Here we have a successor of newly-hot bb that is cold > + but no longer has any cold predecessors. Since the original > + assignment of our newly-hot bb was incorrect, this successor's > + assignment as cold is also suspect. Go ahead and re-mark it > + as hot now too. Better heuristics may be in order here. */ > + BB_SET_PARTITION (succ, BB_HOT_PARTITION); > + cold_bb_count--; > + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, succ); > + /* Examine this successor as a newly-hot bb. */ > + VEC_safe_push (basic_block, heap, bbs_newly_hot, succ); > + } > + } > + } > + > + if (dom_calculated_here) > + free_dominance_info (CDI_DOMINATORS); > + } > + > /* The format of .gcc_except_table does not allow landing pads to > be in a different partition as the throw. Fix this by either > moving or duplicating the landing pads.
*/ > @@ -1766,10 +1858,10 @@ fix_up_fall_thru_edges (void) > new_bb->aux = cur_bb->aux; > cur_bb->aux = new_bb; > > - /* Make sure new fall-through bb is in same > - partition as bb it's falling through from. */ > + /* This is done by force_nonfallthru_and_redirect. */ > + gcc_assert (BB_PARTITION (new_bb) > + == BB_PARTITION (cur_bb)); > > - BB_COPY_PARTITION (new_bb, cur_bb); > single_succ_edge (new_bb)->flags |= EDGE_CROSSING; > } > else > @@ -2067,7 +2159,10 @@ add_reg_crossing_jump_notes (void) > FOR_EACH_BB (bb) > FOR_EACH_EDGE (e, ei, bb->succs) > if ((e->flags & EDGE_CROSSING) > - && JUMP_P (BB_END (e->src))) > + && JUMP_P (BB_END (e->src)) > + /* Some notes were added during fix_up_fall_thru_edges, via > + force_nonfallthru_and_redirect. */ > + && !find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX)) > add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); > } > > @@ -2160,7 +2255,7 @@ reorder_basic_blocks (void) > dump_flow_info (dump_file, dump_flags); > } > > - if (flag_reorder_blocks_and_partition) > + if (crtl->has_bb_partition) > verify_hot_cold_block_grouping (); > } > > @@ -2172,14 +2267,14 @@ reorder_basic_blocks (void) > encountering this note will make the compiler switch between the > hot and cold text sections. */ > > -static void > +void > insert_section_boundary_note (void) > { > basic_block bb; > rtx new_note; > int first_partition = 0; > > - if (!flag_reorder_blocks_and_partition) > + if (!crtl->has_bb_partition) > return; > > FOR_EACH_BB (bb) > @@ -2222,10 +2317,8 @@ rest_of_handle_reorder_blocks (void) > FOR_EACH_BB (bb) > if (bb->next_bb != EXIT_BLOCK_PTR) > bb->aux = bb->next_bb; > - cfg_layout_finalize (); > + cfg_layout_finalize (true); > > - /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes. 
*/ > - insert_section_boundary_note (); > return 0; > } > > @@ -2366,7 +2459,7 @@ duplicate_computed_gotos (void) > } > > done: > - cfg_layout_finalize (); > + cfg_layout_finalize (false); > > BITMAP_FREE (candidates); > return 0; > @@ -2511,6 +2604,8 @@ partition_hot_cold_basic_blocks (void) > if (crossing_edges == NULL) > return 0; > > + crtl->has_bb_partition = true; > + > /* Make sure the source of any crossing edge ends in a jump and the > destination of any crossing edge has a label. */ > add_labels_and_missing_jumps (crossing_edges); > Index: bb-reorder.h > =================================================================== > --- bb-reorder.h (revision 193376) > +++ bb-reorder.h (working copy) > @@ -36,4 +36,8 @@ extern struct target_bb_reorder *this_target_bb_re > > extern int get_uncond_jump_length (void); > > +extern void insert_section_boundary_note (void); > + > +extern void emit_barrier_after_bb (basic_block bb); > + > #endif > Index: basic-block.h > =================================================================== > --- basic-block.h (revision 193376) > +++ basic-block.h (working copy) > @@ -806,6 +806,7 @@ extern basic_block force_nonfallthru_and_redirect > extern bool contains_no_active_insn_p (const_basic_block); > extern bool forwarder_block_p (const_basic_block); > extern bool can_fallthru (basic_block, basic_block); > +extern void fixup_partitions (void); > > /* In cfgbuild.c. */ > extern void find_many_sub_basic_blocks (sbitmap); > Index: cfgrtl.c > =================================================================== > --- cfgrtl.c (revision 193376) > +++ cfgrtl.c (working copy) > @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3. If not see > #include "tree.h" > #include "hard-reg-set.h" > #include "basic-block.h" > +#include "bb-reorder.h" > #include "regs.h" > #include "flags.h" > #include "function.h" > @@ -67,11 +68,12 @@ along with GCC; see the file COPYING3. If not see > Only applicable if the CFG is in cfglayout mode. 
*/ > static GTY(()) rtx cfg_layout_function_footer; > static GTY(()) rtx cfg_layout_function_header; > +static bool had_sec_boundary_notes; > > static rtx skip_insns_after_block (basic_block); > static void record_effective_endpoints (void); > static rtx label_for_bb (basic_block); > -static void fixup_reorder_chain (void); > +static void fixup_reorder_chain (bool finalize_reorder_blocks); > > void verify_insn_chain (void); > static void fixup_fallthru_exit_predecessor (void); > @@ -976,8 +978,7 @@ try_redirect_by_replacing_jump (edge e, basic_bloc > partition boundaries). See the comments at the top of > bb-reorder.c:partition_hot_cold_basic_blocks for complete details. */ > > - if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX) > - || BB_PARTITION (src) != BB_PARTITION (target)) > + if (BB_PARTITION (src) != BB_PARTITION (target)) > return NULL; > > /* We can replace or remove a complex jump only when we have exactly > @@ -1286,6 +1287,71 @@ redirect_branch_edge (edge e, basic_block target) > return e; > } > > +/* Called when edge E has been redirected to a new destination, > + in order to update the region crossing flag on the edge and > + jump. */ > + > +static void > +fixup_partition_crossing (edge e, basic_block target) > +{ > + rtx note; > + > + gcc_assert (e->dest == target); > + > + if (e->src == ENTRY_BLOCK_PTR || target == EXIT_BLOCK_PTR) > + return; > + /* If we redirected an existing edge, it may already be marked > + crossing, even though the new src is missing a reg crossing note. > + But make sure reg crossing note doesn't already exist before > + inserting. 
*/ > + if (BB_PARTITION (e->src) != BB_PARTITION (target)) > + { > + e->flags |= EDGE_CROSSING; > + note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); > + if (JUMP_P (BB_END (e->src)) > + && !note) > + add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); > + } > + else if (BB_PARTITION (e->src) == BB_PARTITION (target)) > + { > + e->flags &= ~EDGE_CROSSING; > + /* Remove the region crossing note from jump at end of > + e->src if it exists. */ > + note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); > + if (note) > + remove_note (BB_END (e->src), note); > + } > +} > + > +/* Called when block BB has been reassigned to a different partition, > + to ensure that the region crossing attributes are updated. */ > + > +static void > +fixup_bb_partition (basic_block bb) > +{ > + edge e; > + edge_iterator ei; > + > + /* Now need to make bb's pred edges non-region crossing. */ > + FOR_EACH_EDGE (e, ei, bb->preds) > + { > + fixup_partition_crossing (e, e->dest); > + } > + > + /* Possibly need to make bb's successor edges region crossing, > + or remove stale region crossing. */ > + FOR_EACH_EDGE (e, ei, bb->succs) > + { > + if ((e->flags & EDGE_FALLTHRU) > + && BB_PARTITION (bb) != BB_PARTITION (e->dest) > + && e->dest != EXIT_BLOCK_PTR) > + /* force_nonfallthru_and_redirect calls fixup_partition_crossing. */ > + force_nonfallthru (e); > + else > + fixup_partition_crossing (e, e->dest); > + } > +} > + > /* Attempt to change code to redirect edge E to TARGET. Don't do that on > expense of adding new instructions or reordering basic blocks. 
> > @@ -1302,16 +1368,18 @@ rtl_redirect_edge_and_branch (edge e, basic_block > { > edge ret; > basic_block src = e->src; > + basic_block dest = e->dest; > > if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH)) > return NULL; > > - if (e->dest == target) > + if (dest == target) > return e; > > if ((ret = try_redirect_by_replacing_jump (e, target, false)) != NULL) > { > df_set_bb_dirty (src); > + fixup_partition_crossing (ret, target); > return ret; > } > > @@ -1320,6 +1388,7 @@ rtl_redirect_edge_and_branch (edge e, basic_block > return NULL; > > df_set_bb_dirty (src); > + fixup_partition_crossing (ret, target); > return ret; > } > > @@ -1454,18 +1523,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc > /* Make sure new block ends up in correct hot/cold section. */ > > BB_COPY_PARTITION (jump_block, e->src); > - if (flag_reorder_blocks_and_partition > - && targetm_common.have_named_sections > - && JUMP_P (BB_END (jump_block)) > - && !any_condjump_p (BB_END (jump_block)) > - && (EDGE_SUCC (jump_block, 0)->flags & EDGE_CROSSING)) > - add_reg_note (BB_END (jump_block), REG_CROSSING_JUMP, NULL_RTX); > > /* Wire edge in. */ > new_edge = make_edge (e->src, jump_block, EDGE_FALLTHRU); > new_edge->probability = probability; > new_edge->count = count; > > + /* If e->src was previously region crossing, it no longer is > + and the reg crossing note should be removed. */ > + fixup_partition_crossing (new_edge, jump_block); > + > /* Redirect old edge. */ > redirect_edge_pred (e, jump_block); > e->probability = REG_BR_PROB_BASE; > @@ -1521,13 +1588,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc > LABEL_NUSES (label)++; > } > > - emit_barrier_after (BB_END (jump_block)); > + /* We might be in cfg layout mode, and if so, the following routine will > + insert the barrier correctly. 
*/ > + emit_barrier_after_bb (jump_block); > redirect_edge_succ_nodup (e, target); > > if (abnormal_edge_flags) > make_edge (src, target, abnormal_edge_flags); > > df_mark_solutions_dirty (); > + fixup_partition_crossing (e, target); > return new_bb; > } > > @@ -1626,7 +1696,7 @@ rtl_move_block_after (basic_block bb ATTRIBUTE_UNU > static basic_block > rtl_split_edge (edge edge_in) > { > - basic_block bb; > + basic_block bb, new_bb; > rtx before; > > /* Abnormal edges cannot be split. */ > @@ -1659,12 +1729,26 @@ rtl_split_edge (edge edge_in) > else > { > bb = create_basic_block (before, NULL, edge_in->dest->prev_bb); > - /* ??? Why not edge_in->dest->prev_bb here? */ > - BB_COPY_PARTITION (bb, edge_in->dest); > + if (edge_in->src == ENTRY_BLOCK_PTR) > + BB_COPY_PARTITION (bb, edge_in->dest); > + else > + /* Put the split bb into the src partition, to avoid creating > + a situation where a cold bb dominates a hot bb, in the case > + where src is cold and dest is hot. The src will dominate > + the new bb (whereas it might not have dominated dest). */ > + BB_COPY_PARTITION (bb, edge_in->src); > } > > make_single_succ_edge (bb, edge_in->dest, EDGE_FALLTHRU); > > + /* Can't allow a region crossing edge to be fallthrough. */ > + if (BB_PARTITION (bb) != BB_PARTITION (edge_in->dest) > + && edge_in->dest != EXIT_BLOCK_PTR) > + { > + new_bb = force_nonfallthru (single_succ_edge (bb)); > + gcc_assert (!new_bb); > + } > + > /* For non-fallthru edges, we must adjust the predecessor's > jump instruction to target our new block. 
*/ > if ((edge_in->flags & EDGE_FALLTHRU) == 0) > @@ -1777,17 +1861,13 @@ commit_one_edge_insertion (edge e) > else > { > bb = split_edge (e); > - after = BB_END (bb); > > - if (flag_reorder_blocks_and_partition > - && targetm_common.have_named_sections > - && e->src != ENTRY_BLOCK_PTR > - && BB_PARTITION (e->src) == BB_COLD_PARTITION > - && !(e->flags & EDGE_CROSSING) > - && JUMP_P (after) > - && !any_condjump_p (after) > - && (single_succ_edge (bb)->flags & EDGE_CROSSING)) > - add_reg_note (after, REG_CROSSING_JUMP, NULL_RTX); > + /* If e crossed a partition boundary, we needed to make bb end in > + a region-crossing jump, even though it was originally fallthru. */ > + if (JUMP_P (BB_END (bb))) > + before = BB_END (bb); > + else > + after = BB_END (bb); > } > > /* Now that we've found the spot, do the insertion. */ > @@ -1827,6 +1907,14 @@ commit_edge_insertions (void) > { > basic_block bb; > > + /* Optimization passes that invoke this routine can cause hot blocks > + previously reached by both hot and cold blocks to become dominated only > + by cold blocks. This will cause the verification below to fail, > + and lead to now cold code in the hot section. In some cases this > + may only be visible after newly unreachable blocks are deleted, > + which will be done by fixup_partitions. */ > + fixup_partitions (); > + > #ifdef ENABLE_CHECKING > verify_flow_info (); > #endif > @@ -2028,7 +2116,75 @@ get_last_bb_insn (basic_block bb) > > return end; > } > - > + > +/* Perform cleanup on the hot/cold bb partitioning after optimization > + passes that modify the cfg. */ > + > +void > +fixup_partitions (void) > +{ > + basic_block bb; > + > + if (!crtl->has_bb_partition) > + return; > + > + /* Delete any blocks that became unreachable and weren't > + already cleaned up, for example during edge forwarding > + and convert_jumps_to_returns. This will expose more > + opportunities for fixing the partition boundaries here. 
> + Also, the calculation of the dominance graph during verification > + will assert if there are unreachable nodes. */ > + delete_unreachable_blocks (); > + > + /* If there are partitions, do a sanity check on them: A basic block in > + a cold partition cannot dominate a basic block in a hot partition. > + Fixup any that now violate this requirement, as a result of edge > + forwarding and unreachable block deletion. */ > + VEC (basic_block, heap) *bbs_in_cold_partition = NULL; > + VEC (basic_block, heap) *bbs_to_fix = NULL; > + FOR_EACH_BB (bb) > + if ((BB_PARTITION (bb) == BB_COLD_PARTITION)) > + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb); > + if (! VEC_empty (basic_block, bbs_in_cold_partition)) > + { > + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); > + basic_block son; > + > + if (dom_calculated_here) > + calculate_dominance_info (CDI_DOMINATORS); > + > + while (! VEC_empty (basic_block, bbs_in_cold_partition)) > + { > + bb = VEC_pop (basic_block, bbs_in_cold_partition); > + /* If bb is not yet cold (because it was added below as > + a block dominated by a cold bb) then mark it cold here. */ > + if ((BB_PARTITION (bb) != BB_COLD_PARTITION)) > + { > + BB_SET_PARTITION (bb, BB_COLD_PARTITION); > + VEC_safe_push (basic_block, heap, bbs_to_fix, bb); > + } > + /* Any blocks dominated by a block in the cold section > + must also be cold. */ > + for (son = first_dom_son (CDI_DOMINATORS, bb); > + son; > + son = next_dom_son (CDI_DOMINATORS, son)) > + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son); > + } > + > + if (dom_calculated_here) > + free_dominance_info (CDI_DOMINATORS); > + } > + > + /* Do the partition fixup after all necessary blocks have been converted to > + cold, so that we only update the region crossings the minimum number of > + places, which can require forcing edges to be non fallthru. */ > + while (! 
VEC_empty (basic_block, bbs_to_fix)) > + { > + bb = VEC_pop (basic_block, bbs_to_fix); > + fixup_bb_partition (bb); > + } > +} > + > /* Verify the CFG and RTL consistency common for both underlying RTL and > cfglayout RTL. > > @@ -2052,6 +2208,7 @@ rtl_verify_flow_info_1 (void) > rtx x; > int err = 0; > basic_block bb; > + bool have_partitions = false; > > /* Check the general integrity of the basic blocks. */ > FOR_EACH_BB_REVERSE (bb) > @@ -2169,6 +2326,8 @@ rtl_verify_flow_info_1 (void) > > if (e->flags & EDGE_ABNORMAL) > n_abnormal++; > + > + have_partitions |= is_crossing; > } > > if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, NULL_RTX)) > @@ -2293,6 +2452,40 @@ rtl_verify_flow_info_1 (void) > } > } > > + /* If there are partitions, do a sanity check on them: A basic block in > + a cold partition cannot dominate a basic block in a hot partition. */ > + VEC (basic_block, heap) *bbs_in_cold_partition = NULL; > + if (have_partitions && !err) > + FOR_EACH_BB (bb) > + if ((BB_PARTITION (bb) == BB_COLD_PARTITION)) > + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb); > + if (! VEC_empty (basic_block, bbs_in_cold_partition)) > + { > + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); > + basic_block son; > + > + if (dom_calculated_here) > + calculate_dominance_info (CDI_DOMINATORS); > + > + while (! VEC_empty (basic_block, bbs_in_cold_partition)) > + { > + bb = VEC_pop (basic_block, bbs_in_cold_partition); > + if ((BB_PARTITION (bb) != BB_COLD_PARTITION)) > + { > + error ("non-cold basic block %d dominated " > + "by a block in the cold partition", bb->index); > + err = 1; > + } > + for (son = first_dom_son (CDI_DOMINATORS, bb); > + son; > + son = next_dom_son (CDI_DOMINATORS, son)) > + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son); > + } > + > + if (dom_calculated_here) > + free_dominance_info (CDI_DOMINATORS); > + } > + > /* Clean up. 
*/ > return err; > } > @@ -2965,14 +3158,41 @@ record_effective_endpoints (void) > else > cfg_layout_function_header = NULL_RTX; > > + had_sec_boundary_notes = false; > + > next_insn = get_insns (); > FOR_EACH_BB (bb) > { > rtx end; > > if (PREV_INSN (BB_HEAD (bb)) && next_insn != BB_HEAD (bb)) > - BB_HEADER (bb) = unlink_insn_chain (next_insn, > - PREV_INSN (BB_HEAD (bb))); > + { > + /* Rather than try to keep section boundary notes incrementally > + up-to-date through cfg layout optimizations, simply remove them > + and flag that they should be re-inserted when exiting > + cfg layout mode. */ > + rtx check_insn = next_insn; > + while (check_insn) > + { > + if (NOTE_P (check_insn) > + && NOTE_KIND (check_insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS) > + { > + had_sec_boundary_notes |= true; > + /* Remove note from chain. Grab new next_insn first. */ > + if (next_insn == check_insn) > + next_insn = NEXT_INSN (check_insn); > + /* Delete note. */ > + delete_insn (check_insn); > + /* There will only be one. */ > + break; > + } > + check_insn = NEXT_INSN (check_insn); > + } > + /* If we still have header instructions left after above loop. */ > + if (next_insn != BB_HEAD (bb)) > + BB_HEADER (bb) = unlink_insn_chain (next_insn, > + PREV_INSN (BB_HEAD (bb))); > + } > end = skip_insns_after_block (bb); > if (NEXT_INSN (BB_END (bb)) && BB_END (bb) != end) > BB_FOOTER (bb) = unlink_insn_chain (NEXT_INSN (BB_END (bb)), end); > @@ -3000,7 +3220,7 @@ outof_cfg_layout_mode (void) > if (bb->next_bb != EXIT_BLOCK_PTR) > bb->aux = bb->next_bb; > > - cfg_layout_finalize (); > + cfg_layout_finalize (false); > > return 0; > } > @@ -3120,10 +3340,13 @@ relink_block_chain (bool stay_in_cfglayout_mode) > } > > > -/* Given a reorder chain, rearrange the code to match. */ > +/* Given a reorder chain, rearrange the code to match. 
If > + this is called when we will FINALIZE_REORDER_BLOCKS, or when > + section boundary notes were removed on entry to cfg layout > + mode, insert section boundary notes here. */ > > static void > -fixup_reorder_chain (void) > +fixup_reorder_chain (bool finalize_reorder_blocks) > { > basic_block bb; > rtx insn = NULL; > @@ -3150,7 +3373,7 @@ static void > PREV_INSN (BB_HEADER (bb)) = insn; > insn = BB_HEADER (bb); > while (NEXT_INSN (insn)) > - insn = NEXT_INSN (insn); > + insn = NEXT_INSN (insn); > } > if (insn) > NEXT_INSN (insn) = BB_HEAD (bb); > @@ -3175,6 +3398,11 @@ static void > insn = NEXT_INSN (insn); > > set_last_insn (insn); > + > + /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes. */ > + if (had_sec_boundary_notes || finalize_reorder_blocks) > + insert_section_boundary_note (); > + > #ifdef ENABLE_CHECKING > verify_insn_chain (); > #endif > @@ -3187,7 +3415,7 @@ static void > edge e_fall, e_taken, e; > rtx bb_end_insn; > rtx ret_label = NULL_RTX; > - basic_block nb, src_bb; > + basic_block nb; > edge_iterator ei; > > if (EDGE_COUNT (bb->succs) == 0) > @@ -3322,7 +3550,6 @@ static void > /* We got here if we need to add a new jump insn. > Note force_nonfallthru can delete E_FALL and thus we have to > save E_FALL->src prior to the call to force_nonfallthru. */ > - src_bb = e_fall->src; > nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, ret_label); > if (nb) > { > @@ -3330,17 +3557,6 @@ static void > bb->aux = nb; > /* Don't process this new block. */ > bb = nb; > - > - /* Make sure new bb is tagged for correct section (same as > - fall-thru source, since you cannot fall-thru across > - section boundaries). 
*/ > - BB_COPY_PARTITION (src_bb, single_pred (bb)); > - if (flag_reorder_blocks_and_partition > - && targetm_common.have_named_sections > - && JUMP_P (BB_END (bb)) > - && !any_condjump_p (BB_END (bb)) > - && (EDGE_SUCC (bb, 0)->flags & EDGE_CROSSING)) > - add_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX); > } > } > > @@ -3644,10 +3860,11 @@ duplicate_insn_chain (rtx from, rtx to) > case NOTE_INSN_FUNCTION_BEG: > /* There is always just single entry to function. */ > case NOTE_INSN_BASIC_BLOCK: > + /* We should only switch text sections once. */ > + case NOTE_INSN_SWITCH_TEXT_SECTIONS: > break; > > case NOTE_INSN_EPILOGUE_BEG: > - case NOTE_INSN_SWITCH_TEXT_SECTIONS: > emit_note_copy (insn); > break; > > @@ -3759,10 +3976,13 @@ break_superblocks (void) > } > > /* Finalize the changes: reorder insn list according to the sequence specified > - by aux pointers, enter compensation code, rebuild scope forest. */ > + by aux pointers, enter compensation code, rebuild scope forest. If > + this is called when we will FINALIZE_REORDER_BLOCKS, indicate that > + to fixup_reorder_chain so that it can insert the proper switch text > + section notes. 
*/ > > void > -cfg_layout_finalize (void) > +cfg_layout_finalize (bool finalize_reorder_blocks) > { > #ifdef ENABLE_CHECKING > verify_flow_info (); > @@ -3775,7 +3995,7 @@ void > #endif > ) > fixup_fallthru_exit_predecessor (); > - fixup_reorder_chain (); > + fixup_reorder_chain (finalize_reorder_blocks); > > rebuild_jump_labels (get_insns ()); > delete_dead_jumptables (); > @@ -4454,8 +4674,7 @@ rtl_can_remove_branch_p (const_edge e) > if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH)) > return false; > > - if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX) > - || BB_PARTITION (src) != BB_PARTITION (target)) > + if (BB_PARTITION (src) != BB_PARTITION (target)) > return false; > > if (!onlyjump_p (insn) > > -- > This patch is available for review at http://codereview.appspot.com/6823047 -- Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
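For readers following the fixup_partitions hunk in the patch, the invariant it restores (no block in the cold partition may dominate a block in the hot partition) can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the code from the patch: the names toy_fixup_partitions and dominated_by_cold, the HOT/COLD enum, and the idom array (idom[b] is the immediate dominator of block b, -1 for the entry block) are all hypothetical, and the patch itself walks dominator-tree children with first_dom_son/next_dom_son and a worklist rather than walking up a parent array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model of hot/cold partitions; not GCC's BB_PARTITION.  */
enum partition { HOT = 0, COLD = 1 };

/* Return 1 if some strict dominator of block B is cold.  IDOM[b] is the
   immediate dominator of b, and -1 marks the entry block (assumption of
   this sketch, not a GCC data structure).  */
static int
dominated_by_cold (const int *idom, const enum partition *part, int b)
{
  for (int d = idom[b]; d != -1; d = idom[d])
    if (part[d] == COLD)
      return 1;
  return 0;
}

/* Re-mark as cold every hot block that has a cold dominator, so that no
   cold block dominates a hot block afterwards.  Returns the number of
   blocks that changed partition.  */
static int
toy_fixup_partitions (const int *idom, enum partition *part, int n)
{
  int changed = 0;

  assert (idom != NULL && part != NULL && n >= 0);
  for (int b = 0; b < n; b++)
    if (part[b] == HOT && dominated_by_cold (idom, part, b))
      {
        part[b] = COLD;
        changed++;
      }
  return changed;
}
```

In the real patch, each block that changes partition is additionally passed to fixup_bb_partition so that EDGE_CROSSING flags and REG_CROSSING_JUMP notes on its incoming and outgoing edges are brought up to date; the toy model above leaves that step out.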
Hi, I have tested your patch on SPEC2000 on ARM, and I can still see several failures caused by: "error: fallthru edge crosses section boundary", including the case described in PR55121. Christophe. On 26 November 2012 16:55, Teresa Johnson <tejohnson@google.com> wrote: > Ping. > Teresa > > On Thu, Nov 15, 2012 at 12:10 PM, Teresa Johnson <tejohnson@google.com> > wrote: > > Revised patch that fixes failures encountered when enabling > > -freorder-blocks-and-partition, including the failure reported in PR 53743. > > > > This includes new verification code to ensure no cold blocks dominate hot > > blocks contributed by Steven Bosscher. > > > > I attempted to make the handling of partition updates through the optimization > > passes much more consistent, removing a number of partial fixes in the code > > stream in the process. The code to fixup partitions (including the BB_PARTITION > > assignment, region crossing jump notes, and switch text section notes) is > > now handled in a few centralized locations. For example, inside > > rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that callers > > don't need to attempt the fixup themselves. > > > > For optimization passes that make adjustments to the cfg while in cfg layout > > mode that are not easy to fix up incrementally, the new routine > > fixup_partitions handles the cleanup globally. This does require calculation > > of the dominance relation, however, as far as I can tell the routines which > > now invoke this global fixup (try_optimize_cfg and commit_edge_insertions) > > are invoked typically once (or a small number of times in the case of > > try_optimize_cfg) per optimization pass. Additionally, I compared the > > -ftime-report output for some large fdo compilations and saw only minimal > > increases in the dominance computation times, which were only a tiny percent
> >
> > Additionally, I added a flag to the rtl_data structure to indicate whether
> > any partitioning was actually performed, so that optimizations which were
> > conservatively disabled whenever the flag_reorder_blocks_and_partition
> > is enabled (e.g. try_crossjump_to_edge, part of connect_traces) can be less
> > conservative for functions where no partitions were formed (e.g. they are
> > completely hot).
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu. Also tested with SPEC2006 int
> > benchmarks and internal google benchmarks using profile feedback and
> > -freorder-blocks-and-partition to get more coverage. Ok for trunk?
> >
> > Thanks,
> > Teresa
> >
> > 2012-11-14  Teresa Johnson  <tejohnson@google.com>
> >             Steven Bosscher  <steven@gcc.gnu.org>
> >
> >         * cfghooks.h (cfg_layout_finalize): New parameter.
> >         * modulo-sched.c (rest_of_handle_sms): New cfg_layout_finalize
> >         parameter.
> >         * ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert
> >         as this is now done by redirect_edge_and_branch_force.
> >         * function.c (thread_prologue_and_epilogue_insns): Insert new bb after
> >         barriers, new cfg_layout_finalize parameter, and don't store exit
> >         predecessor BB until after it is potentially split.
> >         * function.h (struct rtl_data): New flag has_bb_partition.
> >         * hw-doloop.c (reorder_loops): New cfg_layout_finalize parameter.
> >         * cfgcleanup.c (try_crossjump_to_edge): Only skip optimization if
> >         any blocks in function actually partitioned.
> >         (try_optimize_cfg): If cfg changed, invoke fixup_partitions to clean
> >         up partitioning.
> >         * bb-reorder.c (connect_traces): Only look for partitions and skip
> >         block copying if any blocks in function actually partitioned.
> >         (emit_barrier_after_bb): Handle insertion in non-cfglayout mode.
> >         (find_rarely_executed_basic_blocks_and_crossing_edges): Ensure
> >         that no cold blocks dominate a hot block.
> >         (fix_up_fall_thru_edges): Replace BB_COPY_PARTITION with assert
> >         as this is now done by force_nonfallthru_and_redirect.
> >         (add_reg_crossing_jump_notes): Handle the fact that some jumps may
> >         already be marked with region crossing note.
> >         (reorder_basic_blocks): Only need to verify partitions if any
> >         blocks in function actually partitioned.
> >         (insert_section_boundary_note): Only need to insert note if any
> >         blocks in function actually partitioned.
> >         (rest_of_handle_reorder_blocks): New cfg_layout_finalize
> >         parameter, and remove call to insert_section_boundary_note as this
> >         is now called via cfg_layout_finalize/fixup_reorder_chain.
> >         (duplicate_computed_gotos): New cfg_layout_finalize
> >         parameter.
> >         (partition_hot_cold_basic_blocks): Set flag indicating function
> >         has bb partitions.
> >         * bb-reorder.h: Declare insert_section_boundary_note and
> >         emit_barrier_after_bb, which are no longer static.
> >         * basic-block.h: Declare new function fixup_partitions.
> >         * cfgrtl.c (try_redirect_by_replacing_jump): Remove unnecessary
> >         check for region crossing note.
> >         (fixup_partition_crossing): New function.
> >         (fixup_bb_partition): Ditto.
> >         (rtl_redirect_edge_and_branch): Fixup partition boundaries.
> >         (force_nonfallthru_and_redirect): Fixup partition boundaries,
> >         remove old code that tried to do this. Emit barrier correctly
> >         when we are in cfglayout mode.
> >         (rtl_split_edge): Correctly fixup partition boundaries.
> >         (commit_one_edge_insertion): Remove old code that tried to
> >         fixup region crossing edge since this is now handled in
> >         split_block, and set up insertion point correctly since
> >         block may now end in a jump.
> >         (commit_edge_insertions): Invoke fixup_partitions to sanitize partition
> >         boundaries after optimizations that modify cfg and before trying to
> >         verify the flow info.
> >         (fixup_partitions): New function.
> >         (rtl_verify_flow_info_1): Add verification that no cold bbs dominate
> >         hot bbs.
> >         (record_effective_endpoints): Remove region-crossing notes and set flag
> >         indicating that they need to be reinserted on exit from cfglayout mode.
> >         (outof_cfg_layout_mode): New cfg_layout_finalize parameter.
> >         (fixup_reorder_chain): Call insert_section_boundary_note if necessary.
> >         Remove old code that attempted to fixup region crossing note as
> >         this is now handled in force_nonfallthru_and_redirect.
> >         (duplicate_insn_chain): Don't duplicate switch section notes.
> >         (cfg_layout_finalize): Pass new parameter to fixup_reorder_chain.
> >         (rtl_can_remove_branch_p): Remove unnecessary check for region crossing
> >         note.
> >
> > Index: cfghooks.h
> > ===================================================================
> > --- cfghooks.h (revision 193376)
> > +++ cfghooks.h (working copy)
> > @@ -204,7 +204,7 @@ extern void copy_bbs (basic_block *, unsigned, bas
> >  void account_profile_record (struct profile_record *, int);
> >
> >  extern void cfg_layout_initialize (unsigned int);
> > -extern void cfg_layout_finalize (void);
> > +extern void cfg_layout_finalize (bool);
> >
> >  /* Hooks containers.
*/ > > extern struct cfg_hooks gimple_cfg_hooks; > > @@ -218,4 +218,3 @@ extern void cfg_layout_rtl_register_cfg_hooks (voi > > extern void gimple_register_cfg_hooks (void); > > extern struct cfg_hooks get_cfg_hooks (void); > > extern void set_cfg_hooks (struct cfg_hooks); > > - > > Index: modulo-sched.c > > =================================================================== > > --- modulo-sched.c (revision 193376) > > +++ modulo-sched.c (working copy) > > @@ -3354,7 +3354,7 @@ rest_of_handle_sms (void) > > if (bb->next_bb != EXIT_BLOCK_PTR) > > bb->aux = bb->next_bb; > > free_dominance_info (CDI_DOMINATORS); > > - cfg_layout_finalize (); > > + cfg_layout_finalize (false); > > #endif /* INSN_SCHEDULING */ > > return 0; > > } > > Index: ifcvt.c > > =================================================================== > > --- ifcvt.c (revision 193376) > > +++ ifcvt.c (working copy) > > @@ -3900,10 +3900,9 @@ find_if_case_1 (basic_block test_bb, edge then_edg > > if (new_bb) > > { > > df_bb_replace (then_bb_index, new_bb); > > - /* Since the fallthru edge was redirected from test_bb to new_bb, > > - we need to ensure that new_bb is in the same partition as > > - test bb (you can not fall through across section boundaries). > */ > > - BB_COPY_PARTITION (new_bb, test_bb); > > + /* This should have been done above via > force_nonfallthru_and_redirect > > + (possibly called from redirect_edge_and_branch_force). */ > > + gcc_assert (BB_PARTITION (new_bb) == BB_PARTITION (test_bb)); > > } > > > > num_true_changes++; > > Index: function.c > > =================================================================== > > --- function.c (revision 193376) > > +++ function.c (working copy) > > @@ -6249,8 +6249,10 @@ thread_prologue_and_epilogue_insns (void) > > break; > > if (e) > > { > > - copy_bb = create_basic_block (NEXT_INSN (BB_END > (e->src)), > > - NULL_RTX, e->src); > > + /* Make sure we insert after any barriers. 
*/ > > + rtx end = get_last_bb_insn (e->src); > > + copy_bb = create_basic_block (NEXT_INSN (end), > > + NULL_RTX, e->src); > > BB_COPY_PARTITION (copy_bb, e->src); > > } > > else > > @@ -6475,7 +6477,7 @@ thread_prologue_and_epilogue_insns (void) > > if (cur_bb->index >= NUM_FIXED_BLOCKS > > && cur_bb->next_bb->index >= NUM_FIXED_BLOCKS) > > cur_bb->aux = cur_bb->next_bb; > > - cfg_layout_finalize (); > > + cfg_layout_finalize (false); > > } > > > > epilogue_done: > > @@ -6517,7 +6519,7 @@ epilogue_done: > > basic_block simple_return_block_cold = NULL; > > edge pending_edge_hot = NULL; > > edge pending_edge_cold = NULL; > > - basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb; > > + basic_block exit_pred; > > int i; > > > > gcc_assert (entry_edge != orig_entry_edge); > > @@ -6545,6 +6547,12 @@ epilogue_done: > > else > > pending_edge_cold = e; > > } > > + > > + /* Save a pointer to the exit's predecessor BB for use in > > + inserting new BBs at the end of the function. Do this > > + after the call to split_block above which may split > > + the original exit pred. */ > > + exit_pred = EXIT_BLOCK_PTR->prev_bb; > > > > FOR_EACH_VEC_ELT (edge, unconverted_simple_returns, i, e) > > { > > Index: function.h > > =================================================================== > > --- function.h (revision 193376) > > +++ function.h (working copy) > > @@ -459,6 +459,11 @@ struct GTY(()) rtl_data { > > sched2) and is useful only if the port defines LEAF_REGISTERS. */ > > bool uses_only_leaf_regs; > > > > + /* Nonzero if the function being compiled has undergone hot/cold > partitioning > > + (under flag_reorder_blocks_and_partition) and has at least one cold > > + block. */ > > + bool has_bb_partition; > > + > > /* Like regs_ever_live, but 1 if a reg is set or clobbered from an > > asm. 
Unlike regs_ever_live, elements of this array corresponding > > to eliminable regs (like the frame pointer) are set if an asm > > Index: hw-doloop.c > > =================================================================== > > --- hw-doloop.c (revision 193376) > > +++ hw-doloop.c (working copy) > > @@ -547,7 +547,7 @@ reorder_loops (hwloop_info loops) > > else > > bb->aux = NULL; > > } > > - cfg_layout_finalize (); > > + cfg_layout_finalize (false); > > clear_aux_for_blocks (); > > df_analyze (); > > } > > Index: cfgcleanup.c > > =================================================================== > > --- cfgcleanup.c (revision 193376) > > +++ cfgcleanup.c (working copy) > > @@ -1824,7 +1824,7 @@ try_crossjump_to_edge (int mode, edge e1, edge e2, > > partition boundaries). See the comments at the top of > > bb-reorder.c:partition_hot_cold_basic_blocks for complete details. > */ > > > > - if (flag_reorder_blocks_and_partition && reload_completed) > > + if (crtl->has_bb_partition && reload_completed) > > return false; > > > > /* Search backward through forwarder blocks. We don't need to worry > > @@ -2767,10 +2767,21 @@ try_optimize_cfg (int mode) > > df_analyze (); > > } > > > > + if (changed) > > + { > > + /* Edge forwarding in particular can cause hot blocks > previously > > + reached by both hot and cold blocks to become > dominated only > > + by cold blocks. This will cause the verification below > to fail, > > + and lead to now cold code in the hot section. This is > not easy > > + to detect and fix during edge forwarding, and in some > cases > > + is only visible after newly unreachable blocks are > deleted, > > + which will be done in fixup_partitions. 
*/ > > + fixup_partitions (); > > + > > #ifdef ENABLE_CHECKING > > - if (changed) > > - verify_flow_info (); > > + verify_flow_info (); > > #endif > > + } > > > > changed_overall |= changed; > > first_pass = false; > > Index: bb-reorder.c > > =================================================================== > > --- bb-reorder.c (revision 193376) > > +++ bb-reorder.c (working copy) > > @@ -1054,7 +1054,7 @@ connect_traces (int n_traces, struct trace *traces > > current_partition = BB_PARTITION (traces[0].first); > > two_passes = false; > > > > - if (flag_reorder_blocks_and_partition) > > + if (crtl->has_bb_partition) > > for (i = 0; i < n_traces && !two_passes; i++) > > if (BB_PARTITION (traces[0].first) > > != BB_PARTITION (traces[i].first)) > > @@ -1263,7 +1263,7 @@ connect_traces (int n_traces, struct trace *traces > > } > > } > > > > - if (flag_reorder_blocks_and_partition) > > + if (crtl->has_bb_partition) > > try_copy = false; > > > > /* Copy tiny blocks always; copy larger blocks only when > the > > @@ -1381,13 +1381,14 @@ get_uncond_jump_length (void) > > return length; > > } > > > > -/* Emit a barrier into the footer of BB. */ > > +/* Emit a barrier after BB, into the footer if we are in CFGLAYOUT > mode. */ > > > > -static void > > +void > > emit_barrier_after_bb (basic_block bb) > > { > > rtx barrier = emit_barrier_after (BB_END (bb)); > > - BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier); > > + if (current_ir_type () == IR_RTL_CFGLAYOUT) > > + BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier); > > } > > > > /* The landing pad OLD_LP, in block OLD_BB, has edges from both > partitions. 
> > @@ -1463,18 +1464,109 @@ > find_rarely_executed_basic_blocks_and_crossing_edg > > { > > VEC(edge, heap) *crossing_edges = NULL; > > basic_block bb; > > - edge e; > > - edge_iterator ei; > > + edge e, e2; > > + edge_iterator ei, ei2; > > + unsigned int cold_bb_count = 0; > > + VEC (basic_block, heap) *bbs_in_hot_partition = NULL; > > + VEC (basic_block, heap) *bbs_newly_hot = NULL; > > > > /* Mark which partition (hot/cold) each basic block belongs in. */ > > FOR_EACH_BB (bb) > > { > > if (probably_never_executed_bb_p (cfun, bb)) > > - BB_SET_PARTITION (bb, BB_COLD_PARTITION); > > + { > > + BB_SET_PARTITION (bb, BB_COLD_PARTITION); > > + cold_bb_count++; > > + } > > else > > - BB_SET_PARTITION (bb, BB_HOT_PARTITION); > > + { > > + BB_SET_PARTITION (bb, BB_HOT_PARTITION); > > + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, bb); > > + } > > } > > > > + /* Ensure that no cold bbs dominate hot bbs. This could happen as a > result of > > + several different possibilities. One is that there are edge weight > insanities > > + due to optimization phases that do not properly update basic block > profile > > + counts. The second is that the entry of the function may not be > hot, because > > + it is entered fewer times than the number of profile training > runs, but there > > + is a loop inside the function that causes blocks within the > function to be > > + above the threshold for hotness. */ > > + if (cold_bb_count) > > + { > > + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); > > + > > + if (dom_calculated_here) > > + calculate_dominance_info (CDI_DOMINATORS); > > + > > + /* Keep examining hot bbs until we have either checked them all, > or > > + re-marked all cold bbs hot. */ > > + while (! 
VEC_empty (basic_block, bbs_in_hot_partition) > > + && cold_bb_count) > > + { > > + basic_block dom_bb; > > + > > + bb = VEC_pop (basic_block, bbs_in_hot_partition); > > + dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb); > > + > > + /* If bb's immediate dominator is also hot then it is ok. */ > > + if (BB_PARTITION (dom_bb) != BB_COLD_PARTITION) > > + continue; > > + > > + /* We have a hot bb with an immediate dominator that is cold. > > + The dominator needs to be re-marked to hot. */ > > + BB_SET_PARTITION (dom_bb, BB_HOT_PARTITION); > > + cold_bb_count--; > > + > > + /* Now we need to examine newly-hot dom_bb to see if it is > also > > + dominated by a cold bb. */ > > + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, > dom_bb); > > + > > + /* We should also adjust any cold blocks that the newly-hot bb > > + feeds and see if it makes sense to re-mark those as hot as > > + well. */ > > + VEC_safe_push (basic_block, heap, bbs_newly_hot, dom_bb); > > + while (! VEC_empty (basic_block, bbs_newly_hot)) > > + { > > + basic_block new_hot_bb = VEC_pop (basic_block, > bbs_newly_hot); > > + /* Examine all successors of this newly-hot bb to see if > they > > + are cold and should be re-marked as hot. */ > > + FOR_EACH_EDGE (e, ei, new_hot_bb->succs) > > + { > > + bool any_cold_preds = false; > > + basic_block succ = e->dest; > > + if (BB_PARTITION (succ) != BB_COLD_PARTITION) > > + continue; > > + /* Does this block have any cold predecessors now? */ > > + FOR_EACH_EDGE (e2, ei2, succ->preds) > > + { > > + if (BB_PARTITION (e2->src) == BB_COLD_PARTITION) > > + { > > + any_cold_preds = true; > > + break; > > + } > > + } > > + if (any_cold_preds) > > + continue; > > + > > + /* Here we have a successor of newly-hot bb that is > cold > > + but no longer has any cold precessessors. Since > the original > > + assignment of our newly-hot bb was incorrect, this > successor's > > + assignment as cold is also suspect. Go ahead and > re-mark it > > + as hot now too. 
Better heuristics may be in order > here. */ > > + BB_SET_PARTITION (succ, BB_HOT_PARTITION); > > + cold_bb_count--; > > + VEC_safe_push (basic_block, heap, > bbs_in_hot_partition, succ); > > + /* Examine this successor as a newly-hot bb. */ > > + VEC_safe_push (basic_block, heap, bbs_newly_hot, > succ); > > + } > > + } > > + } > > + > > + if (dom_calculated_here) > > + free_dominance_info (CDI_DOMINATORS); > > + } > > + > > /* The format of .gcc_except_table does not allow landing pads to > > be in a different partition as the throw. Fix this by either > > moving or duplicating the landing pads. */ > > @@ -1766,10 +1858,10 @@ fix_up_fall_thru_edges (void) > > new_bb->aux = cur_bb->aux; > > cur_bb->aux = new_bb; > > > > - /* Make sure new fall-through bb is in same > > - partition as bb it's falling through from. */ > > + /* This is done by > force_nonfallthru_and_redirect. */ > > + gcc_assert (BB_PARTITION (new_bb) > > + == BB_PARTITION (cur_bb)); > > > > - BB_COPY_PARTITION (new_bb, cur_bb); > > single_succ_edge (new_bb)->flags |= EDGE_CROSSING; > > } > > else > > @@ -2067,7 +2159,10 @@ add_reg_crossing_jump_notes (void) > > FOR_EACH_BB (bb) > > FOR_EACH_EDGE (e, ei, bb->succs) > > if ((e->flags & EDGE_CROSSING) > > - && JUMP_P (BB_END (e->src))) > > + && JUMP_P (BB_END (e->src)) > > + /* Some notes were added during fix_up_fall_thru_edges, via > > + force_nonfallthru_and_redirect. */ > > + && !find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, > NULL_RTX)) > > add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); > > } > > > > @@ -2160,7 +2255,7 @@ reorder_basic_blocks (void) > > dump_flow_info (dump_file, dump_flags); > > } > > > > - if (flag_reorder_blocks_and_partition) > > + if (crtl->has_bb_partition) > > verify_hot_cold_block_grouping (); > > } > > > > @@ -2172,14 +2267,14 @@ reorder_basic_blocks (void) > > encountering this note will make the compiler switch between the > > hot and cold text sections. 
*/ > > > > -static void > > +void > > insert_section_boundary_note (void) > > { > > basic_block bb; > > rtx new_note; > > int first_partition = 0; > > > > - if (!flag_reorder_blocks_and_partition) > > + if (!crtl->has_bb_partition) > > return; > > > > FOR_EACH_BB (bb) > > @@ -2222,10 +2317,8 @@ rest_of_handle_reorder_blocks (void) > > FOR_EACH_BB (bb) > > if (bb->next_bb != EXIT_BLOCK_PTR) > > bb->aux = bb->next_bb; > > - cfg_layout_finalize (); > > + cfg_layout_finalize (true); > > > > - /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes. */ > > - insert_section_boundary_note (); > > return 0; > > } > > > > @@ -2366,7 +2459,7 @@ duplicate_computed_gotos (void) > > } > > > > done: > > - cfg_layout_finalize (); > > + cfg_layout_finalize (false); > > > > BITMAP_FREE (candidates); > > return 0; > > @@ -2511,6 +2604,8 @@ partition_hot_cold_basic_blocks (void) > > if (crossing_edges == NULL) > > return 0; > > > > + crtl->has_bb_partition = true; > > + > > /* Make sure the source of any crossing edge ends in a jump and the > > destination of any crossing edge has a label. 
*/ > > add_labels_and_missing_jumps (crossing_edges); > > Index: bb-reorder.h > > =================================================================== > > --- bb-reorder.h (revision 193376) > > +++ bb-reorder.h (working copy) > > @@ -36,4 +36,8 @@ extern struct target_bb_reorder *this_target_bb_re > > > > extern int get_uncond_jump_length (void); > > > > +extern void insert_section_boundary_note (void); > > + > > +extern void emit_barrier_after_bb (basic_block bb); > > + > > #endif > > Index: basic-block.h > > =================================================================== > > --- basic-block.h (revision 193376) > > +++ basic-block.h (working copy) > > @@ -806,6 +806,7 @@ extern basic_block force_nonfallthru_and_redirect > > extern bool contains_no_active_insn_p (const_basic_block); > > extern bool forwarder_block_p (const_basic_block); > > extern bool can_fallthru (basic_block, basic_block); > > +extern void fixup_partitions (void); > > > > /* In cfgbuild.c. */ > > extern void find_many_sub_basic_blocks (sbitmap); > > Index: cfgrtl.c > > =================================================================== > > --- cfgrtl.c (revision 193376) > > +++ cfgrtl.c (working copy) > > @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3. If not see > > #include "tree.h" > > #include "hard-reg-set.h" > > #include "basic-block.h" > > +#include "bb-reorder.h" > > #include "regs.h" > > #include "flags.h" > > #include "function.h" > > @@ -67,11 +68,12 @@ along with GCC; see the file COPYING3. If not see > > Only applicable if the CFG is in cfglayout mode. 
*/ > > static GTY(()) rtx cfg_layout_function_footer; > > static GTY(()) rtx cfg_layout_function_header; > > +static bool had_sec_boundary_notes; > > > > static rtx skip_insns_after_block (basic_block); > > static void record_effective_endpoints (void); > > static rtx label_for_bb (basic_block); > > -static void fixup_reorder_chain (void); > > +static void fixup_reorder_chain (bool finalize_reorder_blocks); > > > > void verify_insn_chain (void); > > static void fixup_fallthru_exit_predecessor (void); > > @@ -976,8 +978,7 @@ try_redirect_by_replacing_jump (edge e, basic_bloc > > partition boundaries). See the comments at the top of > > bb-reorder.c:partition_hot_cold_basic_blocks for complete details. > */ > > > > - if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX) > > - || BB_PARTITION (src) != BB_PARTITION (target)) > > + if (BB_PARTITION (src) != BB_PARTITION (target)) > > return NULL; > > > > /* We can replace or remove a complex jump only when we have exactly > > @@ -1286,6 +1287,71 @@ redirect_branch_edge (edge e, basic_block target) > > return e; > > } > > > > +/* Called when edge E has been redirected to a new destination, > > + in order to update the region crossing flag on the edge and > > + jump. */ > > + > > +static void > > +fixup_partition_crossing (edge e, basic_block target) > > +{ > > + rtx note; > > + > > + gcc_assert (e->dest == target); > > + > > + if (e->src == ENTRY_BLOCK_PTR || target == EXIT_BLOCK_PTR) > > + return; > > + /* If we redirected an existing edge, it may already be marked > > + crossing, even though the new src is missing a reg crossing note. > > + But make sure reg crossing note doesn't already exist before > > + inserting. 
*/ > > + if (BB_PARTITION (e->src) != BB_PARTITION (target)) > > + { > > + e->flags |= EDGE_CROSSING; > > + note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, > NULL_RTX); > > + if (JUMP_P (BB_END (e->src)) > > + && !note) > > + add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); > > + } > > + else if (BB_PARTITION (e->src) == BB_PARTITION (target)) > > + { > > + e->flags &= ~EDGE_CROSSING; > > + /* Remove the region crossing note from jump at end of > > + e->src if it exists. */ > > + note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, > NULL_RTX); > > + if (note) > > + remove_note (BB_END (e->src), note); > > + } > > +} > > + > > +/* Called when block BB has been reassigned to a different partition, > > + to ensure that the region crossing attributes are updated. */ > > + > > +static void > > +fixup_bb_partition (basic_block bb) > > +{ > > + edge e; > > + edge_iterator ei; > > + > > + /* Now need to make bb's pred edges non-region crossing. */ > > + FOR_EACH_EDGE (e, ei, bb->preds) > > + { > > + fixup_partition_crossing (e, e->dest); > > + } > > + > > + /* Possibly need to make bb's successor edges region crossing, > > + or remove stale region crossing. */ > > + FOR_EACH_EDGE (e, ei, bb->succs) > > + { > > + if ((e->flags & EDGE_FALLTHRU) > > + && BB_PARTITION (bb) != BB_PARTITION (e->dest) > > + && e->dest != EXIT_BLOCK_PTR) > > + /* force_nonfallthru_and_redirect calls > fixup_partition_crossing. */ > > + force_nonfallthru (e); > > + else > > + fixup_partition_crossing (e, e->dest); > > + } > > +} > > + > > /* Attempt to change code to redirect edge E to TARGET. Don't do that > on > > expense of adding new instructions or reordering basic blocks. 
> > > > @@ -1302,16 +1368,18 @@ rtl_redirect_edge_and_branch (edge e, basic_block > > { > > edge ret; > > basic_block src = e->src; > > + basic_block dest = e->dest; > > > > if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH)) > > return NULL; > > > > - if (e->dest == target) > > + if (dest == target) > > return e; > > > > if ((ret = try_redirect_by_replacing_jump (e, target, false)) != NULL) > > { > > df_set_bb_dirty (src); > > + fixup_partition_crossing (ret, target); > > return ret; > > } > > > > @@ -1320,6 +1388,7 @@ rtl_redirect_edge_and_branch (edge e, basic_block > > return NULL; > > > > df_set_bb_dirty (src); > > + fixup_partition_crossing (ret, target); > > return ret; > > } > > > > @@ -1454,18 +1523,16 @@ force_nonfallthru_and_redirect (edge e, > basic_bloc > > /* Make sure new block ends up in correct hot/cold section. */ > > > > BB_COPY_PARTITION (jump_block, e->src); > > - if (flag_reorder_blocks_and_partition > > - && targetm_common.have_named_sections > > - && JUMP_P (BB_END (jump_block)) > > - && !any_condjump_p (BB_END (jump_block)) > > - && (EDGE_SUCC (jump_block, 0)->flags & EDGE_CROSSING)) > > - add_reg_note (BB_END (jump_block), REG_CROSSING_JUMP, NULL_RTX); > > > > /* Wire edge in. */ > > new_edge = make_edge (e->src, jump_block, EDGE_FALLTHRU); > > new_edge->probability = probability; > > new_edge->count = count; > > > > + /* If e->src was previously region crossing, it no longer is > > + and the reg crossing note should be removed. */ > > + fixup_partition_crossing (new_edge, jump_block); > > + > > /* Redirect old edge. */ > > redirect_edge_pred (e, jump_block); > > e->probability = REG_BR_PROB_BASE; > > @@ -1521,13 +1588,16 @@ force_nonfallthru_and_redirect (edge e, > basic_bloc > > LABEL_NUSES (label)++; > > } > > > > - emit_barrier_after (BB_END (jump_block)); > > + /* We might be in cfg layout mode, and if so, the following routine > will > > + insert the barrier correctly. 
*/ > > + emit_barrier_after_bb (jump_block); > > redirect_edge_succ_nodup (e, target); > > > > if (abnormal_edge_flags) > > make_edge (src, target, abnormal_edge_flags); > > > > df_mark_solutions_dirty (); > > + fixup_partition_crossing (e, target); > > return new_bb; > > } > > > > @@ -1626,7 +1696,7 @@ rtl_move_block_after (basic_block bb ATTRIBUTE_UNU > > static basic_block > > rtl_split_edge (edge edge_in) > > { > > - basic_block bb; > > + basic_block bb, new_bb; > > rtx before; > > > > /* Abnormal edges cannot be split. */ > > @@ -1659,12 +1729,26 @@ rtl_split_edge (edge edge_in) > > else > > { > > bb = create_basic_block (before, NULL, edge_in->dest->prev_bb); > > - /* ??? Why not edge_in->dest->prev_bb here? */ > > - BB_COPY_PARTITION (bb, edge_in->dest); > > + if (edge_in->src == ENTRY_BLOCK_PTR) > > + BB_COPY_PARTITION (bb, edge_in->dest); > > + else > > + /* Put the split bb into the src partition, to avoid creating > > + a situation where a cold bb dominates a hot bb, in the case > > + where src is cold and dest is hot. The src will dominate > > + the new bb (whereas it might not have dominated dest). */ > > + BB_COPY_PARTITION (bb, edge_in->src); > > } > > > > make_single_succ_edge (bb, edge_in->dest, EDGE_FALLTHRU); > > > > + /* Can't allow a region crossing edge to be fallthrough. */ > > + if (BB_PARTITION (bb) != BB_PARTITION (edge_in->dest) > > + && edge_in->dest != EXIT_BLOCK_PTR) > > + { > > + new_bb = force_nonfallthru (single_succ_edge (bb)); > > + gcc_assert (!new_bb); > > + } > > + > > /* For non-fallthru edges, we must adjust the predecessor's > > jump instruction to target our new block. 
*/ > > if ((edge_in->flags & EDGE_FALLTHRU) == 0) > > @@ -1777,17 +1861,13 @@ commit_one_edge_insertion (edge e) > > else > > { > > bb = split_edge (e); > > - after = BB_END (bb); > > > > - if (flag_reorder_blocks_and_partition > > - && targetm_common.have_named_sections > > - && e->src != ENTRY_BLOCK_PTR > > - && BB_PARTITION (e->src) == BB_COLD_PARTITION > > - && !(e->flags & EDGE_CROSSING) > > - && JUMP_P (after) > > - && !any_condjump_p (after) > > - && (single_succ_edge (bb)->flags & EDGE_CROSSING)) > > - add_reg_note (after, REG_CROSSING_JUMP, NULL_RTX); > > + /* If e crossed a partition boundary, we needed to make bb end in > > + a region-crossing jump, even though it was originally > fallthru. */ > > + if (JUMP_P (BB_END (bb))) > > + before = BB_END (bb); > > + else > > + after = BB_END (bb); > > } > > > > /* Now that we've found the spot, do the insertion. */ > > @@ -1827,6 +1907,14 @@ commit_edge_insertions (void) > > { > > basic_block bb; > > > > + /* Optimization passes that invoke this routine can cause hot blocks > > + previously reached by both hot and cold blocks to become dominated > only > > + by cold blocks. This will cause the verification below to fail, > > + and lead to now cold code in the hot section. In some cases this > > + may only be visible after newly unreachable blocks are deleted, > > + which will be done by fixup_partitions. */ > > + fixup_partitions (); > > + > > #ifdef ENABLE_CHECKING > > verify_flow_info (); > > #endif > > @@ -2028,7 +2116,75 @@ get_last_bb_insn (basic_block bb) > > > > return end; > > } > > - > > + > > +/* Perform cleanup on the hot/cold bb partitioning after optimization > > + passes that modify the cfg. */ > > + > > +void > > +fixup_partitions (void) > > +{ > > + basic_block bb; > > + > > + if (!crtl->has_bb_partition) > > + return; > > + > > + /* Delete any blocks that became unreachable and weren't > > + already cleaned up, for example during edge forwarding > > + and convert_jumps_to_returns. 
This will expose more > > + opportunities for fixing the partition boundaries here. > > + Also, the calculation of the dominance graph during verification > > + will assert if there are unreachable nodes. */ > > + delete_unreachable_blocks (); > > + > > + /* If there are partitions, do a sanity check on them: A basic block > in > > + a cold partition cannot dominate a basic block in a hot partition. > > + Fixup any that now violate this requirement, as a result of edge > > + forwarding and unreachable block deletion. */ > > + VEC (basic_block, heap) *bbs_in_cold_partition = NULL; > > + VEC (basic_block, heap) *bbs_to_fix = NULL; > > + FOR_EACH_BB (bb) > > + if ((BB_PARTITION (bb) == BB_COLD_PARTITION)) > > + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb); > > + if (! VEC_empty (basic_block, bbs_in_cold_partition)) > > + { > > + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); > > + basic_block son; > > + > > + if (dom_calculated_here) > > + calculate_dominance_info (CDI_DOMINATORS); > > + > > + while (! VEC_empty (basic_block, bbs_in_cold_partition)) > > + { > > + bb = VEC_pop (basic_block, bbs_in_cold_partition); > > + /* If bb is not yet cold (because it was added below as > > + a block dominated by a cold bb) then mark it cold here. */ > > + if ((BB_PARTITION (bb) != BB_COLD_PARTITION)) > > + { > > + BB_SET_PARTITION (bb, BB_COLD_PARTITION); > > + VEC_safe_push (basic_block, heap, bbs_to_fix, bb); > > + } > > + /* Any blocks dominated by a block in the cold section > > + must also be cold. 
*/ > > + for (son = first_dom_son (CDI_DOMINATORS, bb); > > + son; > > + son = next_dom_son (CDI_DOMINATORS, son)) > > + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, > son); > > + } > > + > > + if (dom_calculated_here) > > + free_dominance_info (CDI_DOMINATORS); > > + } > > + > > + /* Do the partition fixup after all necessary blocks have been > converted to > > + cold, so that we only update the region crossings the minimum > number of > > + places, which can require forcing edges to be non fallthru. */ > > + while (! VEC_empty (basic_block, bbs_to_fix)) > > + { > > + bb = VEC_pop (basic_block, bbs_to_fix); > > + fixup_bb_partition (bb); > > + } > > +} > > + > > /* Verify the CFG and RTL consistency common for both underlying RTL and > > cfglayout RTL. > > > > @@ -2052,6 +2208,7 @@ rtl_verify_flow_info_1 (void) > > rtx x; > > int err = 0; > > basic_block bb; > > + bool have_partitions = false; > > > > /* Check the general integrity of the basic blocks. */ > > FOR_EACH_BB_REVERSE (bb) > > @@ -2169,6 +2326,8 @@ rtl_verify_flow_info_1 (void) > > > > if (e->flags & EDGE_ABNORMAL) > > n_abnormal++; > > + > > + have_partitions |= is_crossing; > > } > > > > if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, NULL_RTX)) > > @@ -2293,6 +2452,40 @@ rtl_verify_flow_info_1 (void) > > } > > } > > > > + /* If there are partitions, do a sanity check on them: A basic block > in > > + a cold partition cannot dominate a basic block in a hot partition. > */ > > + VEC (basic_block, heap) *bbs_in_cold_partition = NULL; > > + if (have_partitions && !err) > > + FOR_EACH_BB (bb) > > + if ((BB_PARTITION (bb) == BB_COLD_PARTITION)) > > + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb); > > + if (! VEC_empty (basic_block, bbs_in_cold_partition)) > > + { > > + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); > > + basic_block son; > > + > > + if (dom_calculated_here) > > + calculate_dominance_info (CDI_DOMINATORS); > > + > > + while (! 
VEC_empty (basic_block, bbs_in_cold_partition)) > > + { > > + bb = VEC_pop (basic_block, bbs_in_cold_partition); > > + if ((BB_PARTITION (bb) != BB_COLD_PARTITION)) > > + { > > + error ("non-cold basic block %d dominated " > > + "by a block in the cold partition", bb->index); > > + err = 1; > > + } > > + for (son = first_dom_son (CDI_DOMINATORS, bb); > > + son; > > + son = next_dom_son (CDI_DOMINATORS, son)) > > + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, > son); > > + } > > + > > + if (dom_calculated_here) > > + free_dominance_info (CDI_DOMINATORS); > > + } > > + > > /* Clean up. */ > > return err; > > } > > @@ -2965,14 +3158,41 @@ record_effective_endpoints (void) > > else > > cfg_layout_function_header = NULL_RTX; > > > > + had_sec_boundary_notes = false; > > + > > next_insn = get_insns (); > > FOR_EACH_BB (bb) > > { > > rtx end; > > > > if (PREV_INSN (BB_HEAD (bb)) && next_insn != BB_HEAD (bb)) > > - BB_HEADER (bb) = unlink_insn_chain (next_insn, > > - PREV_INSN (BB_HEAD (bb))); > > + { > > + /* Rather than try to keep section boundary notes > incrementally > > + up-to-date through cfg layout optimizations, simply remove > them > > + and flag that they should be re-inserted when exiting > > + cfg layout mode. */ > > + rtx check_insn = next_insn; > > + while (check_insn) > > + { > > + if (NOTE_P (check_insn) > > + && NOTE_KIND (check_insn) == > NOTE_INSN_SWITCH_TEXT_SECTIONS) > > + { > > + had_sec_boundary_notes |= true; > > + /* Remove note from chain. Grab new next_insn first. */ > > + if (next_insn == check_insn) > > + next_insn = NEXT_INSN (check_insn); > > + /* Delete note. */ > > + delete_insn (check_insn); > > + /* There will only be one. */ > > + break; > > + } > > + check_insn = NEXT_INSN (check_insn); > > + } > > + /* If we still have header instructions left after above > loop. 
*/ > > + if (next_insn != BB_HEAD (bb)) > > + BB_HEADER (bb) = unlink_insn_chain (next_insn, > > + PREV_INSN (BB_HEAD > (bb))); > > + } > > end = skip_insns_after_block (bb); > > if (NEXT_INSN (BB_END (bb)) && BB_END (bb) != end) > > BB_FOOTER (bb) = unlink_insn_chain (NEXT_INSN (BB_END (bb)), > end); > > @@ -3000,7 +3220,7 @@ outof_cfg_layout_mode (void) > > if (bb->next_bb != EXIT_BLOCK_PTR) > > bb->aux = bb->next_bb; > > > > - cfg_layout_finalize (); > > + cfg_layout_finalize (false); > > > > return 0; > > } > > @@ -3120,10 +3340,13 @@ relink_block_chain (bool stay_in_cfglayout_mode) > > } > > > > > > -/* Given a reorder chain, rearrange the code to match. */ > > +/* Given a reorder chain, rearrange the code to match. If > > + this is called when we will FINALIZE_REORDER_BLOCKS, or when > > + section boundary notes were removed on entry to cfg layout > > + mode, insert section boundary notes here. */ > > > > static void > > -fixup_reorder_chain (void) > > +fixup_reorder_chain (bool finalize_reorder_blocks) > > { > > basic_block bb; > > rtx insn = NULL; > > @@ -3150,7 +3373,7 @@ static void > > PREV_INSN (BB_HEADER (bb)) = insn; > > insn = BB_HEADER (bb); > > while (NEXT_INSN (insn)) > > - insn = NEXT_INSN (insn); > > + insn = NEXT_INSN (insn); > > } > > if (insn) > > NEXT_INSN (insn) = BB_HEAD (bb); > > @@ -3175,6 +3398,11 @@ static void > > insn = NEXT_INSN (insn); > > > > set_last_insn (insn); > > + > > + /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes. */ > > + if (had_sec_boundary_notes || finalize_reorder_blocks) > > + insert_section_boundary_note (); > > + > > #ifdef ENABLE_CHECKING > > verify_insn_chain (); > > #endif > > @@ -3187,7 +3415,7 @@ static void > > edge e_fall, e_taken, e; > > rtx bb_end_insn; > > rtx ret_label = NULL_RTX; > > - basic_block nb, src_bb; > > + basic_block nb; > > edge_iterator ei; > > > > if (EDGE_COUNT (bb->succs) == 0) > > @@ -3322,7 +3550,6 @@ static void > > /* We got here if we need to add a new jump insn. 
> > Note force_nonfallthru can delete E_FALL and thus we have to > > save E_FALL->src prior to the call to force_nonfallthru. */ > > - src_bb = e_fall->src; > > nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, > ret_label); > > if (nb) > > { > > @@ -3330,17 +3557,6 @@ static void > > bb->aux = nb; > > /* Don't process this new block. */ > > bb = nb; > > - > > - /* Make sure new bb is tagged for correct section (same as > > - fall-thru source, since you cannot fall-thru across > > - section boundaries). */ > > - BB_COPY_PARTITION (src_bb, single_pred (bb)); > > - if (flag_reorder_blocks_and_partition > > - && targetm_common.have_named_sections > > - && JUMP_P (BB_END (bb)) > > - && !any_condjump_p (BB_END (bb)) > > - && (EDGE_SUCC (bb, 0)->flags & EDGE_CROSSING)) > > - add_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX); > > } > > } > > > > @@ -3644,10 +3860,11 @@ duplicate_insn_chain (rtx from, rtx to) > > case NOTE_INSN_FUNCTION_BEG: > > /* There is always just single entry to function. */ > > case NOTE_INSN_BASIC_BLOCK: > > + /* We should only switch text sections once. */ > > + case NOTE_INSN_SWITCH_TEXT_SECTIONS: > > break; > > > > case NOTE_INSN_EPILOGUE_BEG: > > - case NOTE_INSN_SWITCH_TEXT_SECTIONS: > > emit_note_copy (insn); > > break; > > > > @@ -3759,10 +3976,13 @@ break_superblocks (void) > > } > > > > /* Finalize the changes: reorder insn list according to the sequence > specified > > - by aux pointers, enter compensation code, rebuild scope forest. */ > > + by aux pointers, enter compensation code, rebuild scope forest. If > > + this is called when we will FINALIZE_REORDER_BLOCKS, indicate that > > + to fixup_reorder_chain so that it can insert the proper switch text > > + section notes. 
*/ > > > > void > > -cfg_layout_finalize (void) > > +cfg_layout_finalize (bool finalize_reorder_blocks) > > { > > #ifdef ENABLE_CHECKING > > verify_flow_info (); > > @@ -3775,7 +3995,7 @@ void > > #endif > > ) > > fixup_fallthru_exit_predecessor (); > > - fixup_reorder_chain (); > > + fixup_reorder_chain (finalize_reorder_blocks); > > > > rebuild_jump_labels (get_insns ()); > > delete_dead_jumptables (); > > @@ -4454,8 +4674,7 @@ rtl_can_remove_branch_p (const_edge e) > > if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH)) > > return false; > > > > - if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX) > > - || BB_PARTITION (src) != BB_PARTITION (target)) > > + if (BB_PARTITION (src) != BB_PARTITION (target)) > > return false; > > > > if (!onlyjump_p (insn) > > > > -- > > This patch is available for review at > http://codereview.appspot.com/6823047 > > > > -- > Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413 >
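For readers following the fixup_partitions hunk quoted above: the core of the cleanup is a worklist walk over the dominator tree that forces every block dominated by a cold block to be cold as well. Below is a minimal standalone sketch of that idea in C. It does not use GCC's internals: struct block, enum partition, MAX_BLOCKS and propagate_cold are hypothetical stand-ins. Unlike the patch, it marks a block cold when it is queued rather than when it is popped, so each block enters the worklist at most once.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for GCC's basic_block and BB_PARTITION.  */
enum partition { HOT, COLD };

#define MAX_BLOCKS 16

struct block {
  enum partition part;
  int first_son;     /* First child in the dominator tree, or -1.  */
  int next_sibling;  /* Next child of the same dominator, or -1.  */
};

/* Force every block dominated by a cold block to be cold.
   Returns the number of blocks that changed partition.  */
static int
propagate_cold (struct block *blocks, int n)
{
  int worklist[MAX_BLOCKS];
  int top = 0;
  int changed = 0;

  assert (n <= MAX_BLOCKS);

  /* Seed the worklist with the blocks that are already cold.  */
  for (int i = 0; i < n; i++)
    if (blocks[i].part == COLD)
      worklist[top++] = i;

  while (top > 0)
    {
      int b = worklist[--top];

      /* Walk the immediate dominator-tree children of B.  A child that
         is already cold was seeded above, so only hot children are
         re-marked and queued.  */
      for (int s = blocks[b].first_son; s != -1; s = blocks[s].next_sibling)
        if (blocks[s].part != COLD)
          {
            blocks[s].part = COLD;
            changed++;
            worklist[top++] = s;
          }
    }

  return changed;
}
```

Take a five-block tree where hot block 0 dominates cold block 1 and hot block 2, block 1 dominates hot block 3, and block 3 dominates hot block 4. Here propagate_cold re-marks blocks 3 and 4 and leaves blocks 0 and 2 hot. In the patch itself, each re-marked block is then handed to fixup_bb_partition so that the region-crossing edges are repaired only after all partition changes are known.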
Hi, I have tested your patch on Spec2000 on ARM, and I can still see several failures caused by: "error: fallthru edge crosses section boundary", including the case described in PR55121. On 26 November 2012 16:55, Teresa Johnson <tejohnson@google.com> wrote: > Ping. > Teresa > > On Thu, Nov 15, 2012 at 12:10 PM, Teresa Johnson <tejohnson@google.com> wrote: >> Revised patch that fixes failures encountered when enabling >> -freorder-blocks-and-partition, including the failure reported in PR 53743. >> >> This includes new verification code to ensure no cold blocks dominate hot >> blocks contributed by Steven Bosscher. >> >> I attempted to make the handling of partition updates through the optimization >> passes much more consistent, removing a number of partial fixes in the code >> stream in the process. The code to fixup partitions (including the BB_PARTITION >> assignment, region crossing jump notes, and switch text section notes) is >> now handled in a few centralized locations. For example, inside >> rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that callers >> don't need to attempt the fixup themselves. >> >> For optimization passes that make adjustments to the cfg while in cfg layout >> mode that are not easy to fix up incrementally, the new routine >> fixup_partitions handles the cleanup globally. This does require calculation >> of the dominance relation; however, as far as I can tell the routines which >> now invoke this global fixup (try_optimize_cfg and commit_edge_insertions) >> are invoked typically once (or a small number of times in the case of >> try_optimize_cfg) per optimization pass. Additionally, I compared the >> -ftime-report output for some large fdo compilations and saw only minimal >> increases in the dominance computation times, which were only a tiny percentage >> of the overall compile time. 
>> >> Additionally, I added a flag to the rtl_data structure to indicate whether >> any partitioning was actually performed, so that optimizations which were >> conservatively disabled whenever the flag_reorder_blocks_and_partition >> is enabled (e.g. try_crossjump_to_edge, part of connect_traces) can be less >> conservative for functions where no partitions were formed (e.g. they are >> completely hot). >> >> Bootstrapped and tested on x86_64-unknown-linux-gnu. Also tested with SPEC2006 int >> benchmarks and internal google benchmarks using profile feedback and >> -freorder-blocks-and-partition to get more coverage. Ok for trunk? >> >> Thanks, >> Teresa >> >> 2012-11-14 Teresa Johnson <tejohnson@google.com> >> Steven Bosscher <steven@gcc.gnu.org> >> >> * cfghooks.h (cfg_layout_finalize): New parameter. >> * modulo-sched.c (rest_of_handle_sms): New cfg_layout_finalize >> parameter. >> * ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert >> as this is now done by redirect_edge_and_branch_force. >> * function.c (thread_prologue_and_epilogue_insns): Insert new bb after >> barriers, new cfg_layout_finalize parameter, and don't store exit >> predecessor BB until after it is potentially split. >> * function.h (struct rtl_data): New flag has_bb_partition. >> * hw-doloop.c (reorder_loops): New cfg_layout_finalize parameter. >> * cfgcleanup.c (try_crossjump_to_edge): Only skip optimization if >> any blocks in function actually partitioned. >> (try_optimize_cfg): If cfg changed, invoke fixup_partitions to clean >> up partitioning. >> * bb-reorder.c (connect_traces): Only look for partitions and skip >> block copying if any blocks in function actually partitioned. >> (emit_barrier_after_bb): Handle insertion in non-cfglayout mode. >> (find_rarely_executed_basic_blocks_and_crossing_edges): Ensure >> that no cold blocks dominate a hot block. >> (fix_up_fall_thru_edges): Replace BB_COPY_PARTITION with assert >> as this is now done by force_nonfallthru_and_redirect. 
>> (add_reg_crossing_jump_notes): Handle the fact that some jumps may >> already be marked with region crossing note. >> (reorder_basic_blocks): Only need to verify partitions if any >> blocks in function actually partitioned. >> (insert_section_boundary_note): Only need to insert note if any >> blocks in function actually partitioned. >> (rest_of_handle_reorder_blocks): New cfg_layout_finalize >> parameter, and remove call to insert_section_boundary_note as this >> is now called via cfg_layout_finalize/fixup_reorder_chain. >> (duplicate_computed_gotos): New cfg_layout_finalize >> parameter. >> (partition_hot_cold_basic_blocks): Set flag indicating function >> has bb partitions. >> * bb-reorder.h: Declare insert_section_boundary_note and >> emit_barrier_after_bb, which are no longer static. >> * basic-block.h: Declare new function fixup_partitions. >> * cfgrtl.c (try_redirect_by_replacing_jump): Remove unnecessary >> check for region crossing note. >> (fixup_partition_crossing): New function. >> (fixup_bb_partition): Ditto. >> (rtl_redirect_edge_and_branch): Fixup partition boundaries. >> (force_nonfallthru_and_redirect): Fixup partition boundaries, >> remove old code that tried to do this. Emit barrier correctly >> when we are in cfglayout mode. >> (rtl_split_edge): Correctly fixup partition boundaries. >> (commit_one_edge_insertion): Remove old code that tried to >> fixup region crossing edge since this is now handled in >> split_block, and set up insertion point correctly since >> block may now end in a jump. >> (commit_edge_insertions): Invoke fixup_partitions to sanitize partition >> boundaries after optimizations that modify cfg and before trying to >> verify the flow info. >> (fixup_partitions): New function. >> (rtl_verify_flow_info_1): Add verification that no cold bbs dominate >> hot bbs. >> (record_effective_endpoints): Remove region-crossing notes and set flag >> indicating that they need to be reinserted on exit from cfglayout mode. 
>> (outof_cfg_layout_mode): New cfg_layout_finalize parameter. >> (fixup_reorder_chain): Call insert_section_boundary_note if necessary. >> Remove old code that attempted to fixup region crossing note as >> this is now handled in force_nonfallthru_and_redirect. >> (duplicate_insn_chain): Don't duplicate switch section notes. >> (cfg_layout_finalize): Pass new parameter to fixup_reorder_chain. >> (rtl_can_remove_branch_p): Remove unnecessary check for region crossing >> note. >> >> Index: cfghooks.h >> =================================================================== >> --- cfghooks.h (revision 193376) >> +++ cfghooks.h (working copy) >> @@ -204,7 +204,7 @@ extern void copy_bbs (basic_block *, unsigned, bas >> void account_profile_record (struct profile_record *, int); >> >> extern void cfg_layout_initialize (unsigned int); >> -extern void cfg_layout_finalize (void); >> +extern void cfg_layout_finalize (bool); >> >> /* Hooks containers. */ >> extern struct cfg_hooks gimple_cfg_hooks; >> @@ -218,4 +218,3 @@ extern void cfg_layout_rtl_register_cfg_hooks (voi >> extern void gimple_register_cfg_hooks (void); >> extern struct cfg_hooks get_cfg_hooks (void); >> extern void set_cfg_hooks (struct cfg_hooks); >> - >> Index: modulo-sched.c >> =================================================================== >> --- modulo-sched.c (revision 193376) >> +++ modulo-sched.c (working copy) >> @@ -3354,7 +3354,7 @@ rest_of_handle_sms (void) >> if (bb->next_bb != EXIT_BLOCK_PTR) >> bb->aux = bb->next_bb; >> free_dominance_info (CDI_DOMINATORS); >> - cfg_layout_finalize (); >> + cfg_layout_finalize (false); >> #endif /* INSN_SCHEDULING */ >> return 0; >> } >> Index: ifcvt.c >> =================================================================== >> --- ifcvt.c (revision 193376) >> +++ ifcvt.c (working copy) >> @@ -3900,10 +3900,9 @@ find_if_case_1 (basic_block test_bb, edge then_edg >> if (new_bb) >> { >> df_bb_replace (then_bb_index, new_bb); >> - /* Since the fallthru edge was 
redirected from test_bb to new_bb, >> - we need to ensure that new_bb is in the same partition as >> - test bb (you can not fall through across section boundaries). */ >> - BB_COPY_PARTITION (new_bb, test_bb); >> + /* This should have been done above via force_nonfallthru_and_redirect >> + (possibly called from redirect_edge_and_branch_force). */ >> + gcc_assert (BB_PARTITION (new_bb) == BB_PARTITION (test_bb)); >> } >> >> num_true_changes++; >> Index: function.c >> =================================================================== >> --- function.c (revision 193376) >> +++ function.c (working copy) >> @@ -6249,8 +6249,10 @@ thread_prologue_and_epilogue_insns (void) >> break; >> if (e) >> { >> - copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)), >> - NULL_RTX, e->src); >> + /* Make sure we insert after any barriers. */ >> + rtx end = get_last_bb_insn (e->src); >> + copy_bb = create_basic_block (NEXT_INSN (end), >> + NULL_RTX, e->src); >> BB_COPY_PARTITION (copy_bb, e->src); >> } >> else >> @@ -6475,7 +6477,7 @@ thread_prologue_and_epilogue_insns (void) >> if (cur_bb->index >= NUM_FIXED_BLOCKS >> && cur_bb->next_bb->index >= NUM_FIXED_BLOCKS) >> cur_bb->aux = cur_bb->next_bb; >> - cfg_layout_finalize (); >> + cfg_layout_finalize (false); >> } >> >> epilogue_done: >> @@ -6517,7 +6519,7 @@ epilogue_done: >> basic_block simple_return_block_cold = NULL; >> edge pending_edge_hot = NULL; >> edge pending_edge_cold = NULL; >> - basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb; >> + basic_block exit_pred; >> int i; >> >> gcc_assert (entry_edge != orig_entry_edge); >> @@ -6545,6 +6547,12 @@ epilogue_done: >> else >> pending_edge_cold = e; >> } >> + >> + /* Save a pointer to the exit's predecessor BB for use in >> + inserting new BBs at the end of the function. Do this >> + after the call to split_block above which may split >> + the original exit pred. 
*/ >> + exit_pred = EXIT_BLOCK_PTR->prev_bb; >> >> FOR_EACH_VEC_ELT (edge, unconverted_simple_returns, i, e) >> { >> Index: function.h >> =================================================================== >> --- function.h (revision 193376) >> +++ function.h (working copy) >> @@ -459,6 +459,11 @@ struct GTY(()) rtl_data { >> sched2) and is useful only if the port defines LEAF_REGISTERS. */ >> bool uses_only_leaf_regs; >> >> + /* Nonzero if the function being compiled has undergone hot/cold partitioning >> + (under flag_reorder_blocks_and_partition) and has at least one cold >> + block. */ >> + bool has_bb_partition; >> + >> /* Like regs_ever_live, but 1 if a reg is set or clobbered from an >> asm. Unlike regs_ever_live, elements of this array corresponding >> to eliminable regs (like the frame pointer) are set if an asm >> Index: hw-doloop.c >> =================================================================== >> --- hw-doloop.c (revision 193376) >> +++ hw-doloop.c (working copy) >> @@ -547,7 +547,7 @@ reorder_loops (hwloop_info loops) >> else >> bb->aux = NULL; >> } >> - cfg_layout_finalize (); >> + cfg_layout_finalize (false); >> clear_aux_for_blocks (); >> df_analyze (); >> } >> Index: cfgcleanup.c >> =================================================================== >> --- cfgcleanup.c (revision 193376) >> +++ cfgcleanup.c (working copy) >> @@ -1824,7 +1824,7 @@ try_crossjump_to_edge (int mode, edge e1, edge e2, >> partition boundaries). See the comments at the top of >> bb-reorder.c:partition_hot_cold_basic_blocks for complete details. */ >> >> - if (flag_reorder_blocks_and_partition && reload_completed) >> + if (crtl->has_bb_partition && reload_completed) >> return false; >> >> /* Search backward through forwarder blocks. 
We don't need to worry >> @@ -2767,10 +2767,21 @@ try_optimize_cfg (int mode) >> df_analyze (); >> } >> >> + if (changed) >> + { >> + /* Edge forwarding in particular can cause hot blocks previously >> + reached by both hot and cold blocks to become dominated only >> + by cold blocks. This will cause the verification below to fail, >> + and lead to now cold code in the hot section. This is not easy >> + to detect and fix during edge forwarding, and in some cases >> + is only visible after newly unreachable blocks are deleted, >> + which will be done in fixup_partitions. */ >> + fixup_partitions (); >> + >> #ifdef ENABLE_CHECKING >> - if (changed) >> - verify_flow_info (); >> + verify_flow_info (); >> #endif >> + } >> >> changed_overall |= changed; >> first_pass = false; >> Index: bb-reorder.c >> =================================================================== >> --- bb-reorder.c (revision 193376) >> +++ bb-reorder.c (working copy) >> @@ -1054,7 +1054,7 @@ connect_traces (int n_traces, struct trace *traces >> current_partition = BB_PARTITION (traces[0].first); >> two_passes = false; >> >> - if (flag_reorder_blocks_and_partition) >> + if (crtl->has_bb_partition) >> for (i = 0; i < n_traces && !two_passes; i++) >> if (BB_PARTITION (traces[0].first) >> != BB_PARTITION (traces[i].first)) >> @@ -1263,7 +1263,7 @@ connect_traces (int n_traces, struct trace *traces >> } >> } >> >> - if (flag_reorder_blocks_and_partition) >> + if (crtl->has_bb_partition) >> try_copy = false; >> >> /* Copy tiny blocks always; copy larger blocks only when the >> @@ -1381,13 +1381,14 @@ get_uncond_jump_length (void) >> return length; >> } >> >> -/* Emit a barrier into the footer of BB. */ >> +/* Emit a barrier after BB, into the footer if we are in CFGLAYOUT mode. 
*/ >> >> -static void >> +void >> emit_barrier_after_bb (basic_block bb) >> { >> rtx barrier = emit_barrier_after (BB_END (bb)); >> - BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier); >> + if (current_ir_type () == IR_RTL_CFGLAYOUT) >> + BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier); >> } >> >> /* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions. >> @@ -1463,18 +1464,109 @@ find_rarely_executed_basic_blocks_and_crossing_edg >> { >> VEC(edge, heap) *crossing_edges = NULL; >> basic_block bb; >> - edge e; >> - edge_iterator ei; >> + edge e, e2; >> + edge_iterator ei, ei2; >> + unsigned int cold_bb_count = 0; >> + VEC (basic_block, heap) *bbs_in_hot_partition = NULL; >> + VEC (basic_block, heap) *bbs_newly_hot = NULL; >> >> /* Mark which partition (hot/cold) each basic block belongs in. */ >> FOR_EACH_BB (bb) >> { >> if (probably_never_executed_bb_p (cfun, bb)) >> - BB_SET_PARTITION (bb, BB_COLD_PARTITION); >> + { >> + BB_SET_PARTITION (bb, BB_COLD_PARTITION); >> + cold_bb_count++; >> + } >> else >> - BB_SET_PARTITION (bb, BB_HOT_PARTITION); >> + { >> + BB_SET_PARTITION (bb, BB_HOT_PARTITION); >> + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, bb); >> + } >> } >> >> + /* Ensure that no cold bbs dominate hot bbs. This could happen as a result of >> + several different possibilities. One is that there are edge weight insanities >> + due to optimization phases that do not properly update basic block profile >> + counts. The second is that the entry of the function may not be hot, because >> + it is entered fewer times than the number of profile training runs, but there >> + is a loop inside the function that causes blocks within the function to be >> + above the threshold for hotness. 
*/ >> + if (cold_bb_count) >> + { >> + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); >> + >> + if (dom_calculated_here) >> + calculate_dominance_info (CDI_DOMINATORS); >> + >> + /* Keep examining hot bbs until we have either checked them all, or >> + re-marked all cold bbs hot. */ >> + while (! VEC_empty (basic_block, bbs_in_hot_partition) >> + && cold_bb_count) >> + { >> + basic_block dom_bb; >> + >> + bb = VEC_pop (basic_block, bbs_in_hot_partition); >> + dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb); >> + >> + /* If bb's immediate dominator is also hot then it is ok. */ >> + if (BB_PARTITION (dom_bb) != BB_COLD_PARTITION) >> + continue; >> + >> + /* We have a hot bb with an immediate dominator that is cold. >> + The dominator needs to be re-marked to hot. */ >> + BB_SET_PARTITION (dom_bb, BB_HOT_PARTITION); >> + cold_bb_count--; >> + >> + /* Now we need to examine newly-hot dom_bb to see if it is also >> + dominated by a cold bb. */ >> + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, dom_bb); >> + >> + /* We should also adjust any cold blocks that the newly-hot bb >> + feeds and see if it makes sense to re-mark those as hot as >> + well. */ >> + VEC_safe_push (basic_block, heap, bbs_newly_hot, dom_bb); >> + while (! VEC_empty (basic_block, bbs_newly_hot)) >> + { >> + basic_block new_hot_bb = VEC_pop (basic_block, bbs_newly_hot); >> + /* Examine all successors of this newly-hot bb to see if they >> + are cold and should be re-marked as hot. */ >> + FOR_EACH_EDGE (e, ei, new_hot_bb->succs) >> + { >> + bool any_cold_preds = false; >> + basic_block succ = e->dest; >> + if (BB_PARTITION (succ) != BB_COLD_PARTITION) >> + continue; >> + /* Does this block have any cold predecessors now? 
*/ >> + FOR_EACH_EDGE (e2, ei2, succ->preds) >> + { >> + if (BB_PARTITION (e2->src) == BB_COLD_PARTITION) >> + { >> + any_cold_preds = true; >> + break; >> + } >> + } >> + if (any_cold_preds) >> + continue; >> + >> + /* Here we have a successor of newly-hot bb that is cold >> + but no longer has any cold predecessors. Since the original >> + assignment of our newly-hot bb was incorrect, this successor's >> + assignment as cold is also suspect. Go ahead and re-mark it >> + as hot now too. Better heuristics may be in order here. */ >> + BB_SET_PARTITION (succ, BB_HOT_PARTITION); >> + cold_bb_count--; >> + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, succ); >> + /* Examine this successor as a newly-hot bb. */ >> + VEC_safe_push (basic_block, heap, bbs_newly_hot, succ); >> + } >> + } >> + } >> + >> + if (dom_calculated_here) >> + free_dominance_info (CDI_DOMINATORS); >> + } >> + >> /* The format of .gcc_except_table does not allow landing pads to >> be in a different partition as the throw. Fix this by either >> moving or duplicating the landing pads. */ >> @@ -1766,10 +1858,10 @@ fix_up_fall_thru_edges (void) >> new_bb->aux = cur_bb->aux; >> cur_bb->aux = new_bb; >> >> - /* Make sure new fall-through bb is in same >> - partition as bb it's falling through from. */ >> + /* This is done by force_nonfallthru_and_redirect. */ >> + gcc_assert (BB_PARTITION (new_bb) >> + == BB_PARTITION (cur_bb)); >> >> - BB_COPY_PARTITION (new_bb, cur_bb); >> single_succ_edge (new_bb)->flags |= EDGE_CROSSING; >> } >> else >> @@ -2067,7 +2159,10 @@ add_reg_crossing_jump_notes (void) >> FOR_EACH_BB (bb) >> FOR_EACH_EDGE (e, ei, bb->succs) >> if ((e->flags & EDGE_CROSSING) >> - && JUMP_P (BB_END (e->src))) >> + && JUMP_P (BB_END (e->src)) >> + /* Some notes were added during fix_up_fall_thru_edges, via >> + force_nonfallthru_and_redirect. 
*/ >> + && !find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX)) >> add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); >> } >> >> @@ -2160,7 +2255,7 @@ reorder_basic_blocks (void) >> dump_flow_info (dump_file, dump_flags); >> } >> >> - if (flag_reorder_blocks_and_partition) >> + if (crtl->has_bb_partition) >> verify_hot_cold_block_grouping (); >> } >> >> @@ -2172,14 +2267,14 @@ reorder_basic_blocks (void) >> encountering this note will make the compiler switch between the >> hot and cold text sections. */ >> >> -static void >> +void >> insert_section_boundary_note (void) >> { >> basic_block bb; >> rtx new_note; >> int first_partition = 0; >> >> - if (!flag_reorder_blocks_and_partition) >> + if (!crtl->has_bb_partition) >> return; >> >> FOR_EACH_BB (bb) >> @@ -2222,10 +2317,8 @@ rest_of_handle_reorder_blocks (void) >> FOR_EACH_BB (bb) >> if (bb->next_bb != EXIT_BLOCK_PTR) >> bb->aux = bb->next_bb; >> - cfg_layout_finalize (); >> + cfg_layout_finalize (true); >> >> - /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes. */ >> - insert_section_boundary_note (); >> return 0; >> } >> >> @@ -2366,7 +2459,7 @@ duplicate_computed_gotos (void) >> } >> >> done: >> - cfg_layout_finalize (); >> + cfg_layout_finalize (false); >> >> BITMAP_FREE (candidates); >> return 0; >> @@ -2511,6 +2604,8 @@ partition_hot_cold_basic_blocks (void) >> if (crossing_edges == NULL) >> return 0; >> >> + crtl->has_bb_partition = true; >> + >> /* Make sure the source of any crossing edge ends in a jump and the >> destination of any crossing edge has a label. 
*/ >> add_labels_and_missing_jumps (crossing_edges); >> Index: bb-reorder.h >> =================================================================== >> --- bb-reorder.h (revision 193376) >> +++ bb-reorder.h (working copy) >> @@ -36,4 +36,8 @@ extern struct target_bb_reorder *this_target_bb_re >> >> extern int get_uncond_jump_length (void); >> >> +extern void insert_section_boundary_note (void); >> + >> +extern void emit_barrier_after_bb (basic_block bb); >> + >> #endif >> Index: basic-block.h >> =================================================================== >> --- basic-block.h (revision 193376) >> +++ basic-block.h (working copy) >> @@ -806,6 +806,7 @@ extern basic_block force_nonfallthru_and_redirect >> extern bool contains_no_active_insn_p (const_basic_block); >> extern bool forwarder_block_p (const_basic_block); >> extern bool can_fallthru (basic_block, basic_block); >> +extern void fixup_partitions (void); >> >> /* In cfgbuild.c. */ >> extern void find_many_sub_basic_blocks (sbitmap); >> Index: cfgrtl.c >> =================================================================== >> --- cfgrtl.c (revision 193376) >> +++ cfgrtl.c (working copy) >> @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3. If not see >> #include "tree.h" >> #include "hard-reg-set.h" >> #include "basic-block.h" >> +#include "bb-reorder.h" >> #include "regs.h" >> #include "flags.h" >> #include "function.h" >> @@ -67,11 +68,12 @@ along with GCC; see the file COPYING3. If not see >> Only applicable if the CFG is in cfglayout mode. 
*/ >> static GTY(()) rtx cfg_layout_function_footer; >> static GTY(()) rtx cfg_layout_function_header; >> +static bool had_sec_boundary_notes; >> >> static rtx skip_insns_after_block (basic_block); >> static void record_effective_endpoints (void); >> static rtx label_for_bb (basic_block); >> -static void fixup_reorder_chain (void); >> +static void fixup_reorder_chain (bool finalize_reorder_blocks); >> >> void verify_insn_chain (void); >> static void fixup_fallthru_exit_predecessor (void); >> @@ -976,8 +978,7 @@ try_redirect_by_replacing_jump (edge e, basic_bloc >> partition boundaries). See the comments at the top of >> bb-reorder.c:partition_hot_cold_basic_blocks for complete details. */ >> >> - if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX) >> - || BB_PARTITION (src) != BB_PARTITION (target)) >> + if (BB_PARTITION (src) != BB_PARTITION (target)) >> return NULL; >> >> /* We can replace or remove a complex jump only when we have exactly >> @@ -1286,6 +1287,71 @@ redirect_branch_edge (edge e, basic_block target) >> return e; >> } >> >> +/* Called when edge E has been redirected to a new destination, >> + in order to update the region crossing flag on the edge and >> + jump. */ >> + >> +static void >> +fixup_partition_crossing (edge e, basic_block target) >> +{ >> + rtx note; >> + >> + gcc_assert (e->dest == target); >> + >> + if (e->src == ENTRY_BLOCK_PTR || target == EXIT_BLOCK_PTR) >> + return; >> + /* If we redirected an existing edge, it may already be marked >> + crossing, even though the new src is missing a reg crossing note. >> + But make sure reg crossing note doesn't already exist before >> + inserting. 
*/ >> + if (BB_PARTITION (e->src) != BB_PARTITION (target)) >> + { >> + e->flags |= EDGE_CROSSING; >> + note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); >> + if (JUMP_P (BB_END (e->src)) >> + && !note) >> + add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); >> + } >> + else if (BB_PARTITION (e->src) == BB_PARTITION (target)) >> + { >> + e->flags &= ~EDGE_CROSSING; >> + /* Remove the region crossing note from jump at end of >> + e->src if it exists. */ >> + note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); >> + if (note) >> + remove_note (BB_END (e->src), note); >> + } >> +} >> + >> +/* Called when block BB has been reassigned to a different partition, >> + to ensure that the region crossing attributes are updated. */ >> + >> +static void >> +fixup_bb_partition (basic_block bb) >> +{ >> + edge e; >> + edge_iterator ei; >> + >> + /* Now need to make bb's pred edges non-region crossing. */ >> + FOR_EACH_EDGE (e, ei, bb->preds) >> + { >> + fixup_partition_crossing (e, e->dest); >> + } >> + >> + /* Possibly need to make bb's successor edges region crossing, >> + or remove stale region crossing. */ >> + FOR_EACH_EDGE (e, ei, bb->succs) >> + { >> + if ((e->flags & EDGE_FALLTHRU) >> + && BB_PARTITION (bb) != BB_PARTITION (e->dest) >> + && e->dest != EXIT_BLOCK_PTR) >> + /* force_nonfallthru_and_redirect calls fixup_partition_crossing. */ >> + force_nonfallthru (e); >> + else >> + fixup_partition_crossing (e, e->dest); >> + } >> +} >> + >> /* Attempt to change code to redirect edge E to TARGET. Don't do that on >> expense of adding new instructions or reordering basic blocks. 
>> >> @@ -1302,16 +1368,18 @@ rtl_redirect_edge_and_branch (edge e, basic_block >> { >> edge ret; >> basic_block src = e->src; >> + basic_block dest = e->dest; >> >> if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH)) >> return NULL; >> >> - if (e->dest == target) >> + if (dest == target) >> return e; >> >> if ((ret = try_redirect_by_replacing_jump (e, target, false)) != NULL) >> { >> df_set_bb_dirty (src); >> + fixup_partition_crossing (ret, target); >> return ret; >> } >> >> @@ -1320,6 +1388,7 @@ rtl_redirect_edge_and_branch (edge e, basic_block >> return NULL; >> >> df_set_bb_dirty (src); >> + fixup_partition_crossing (ret, target); >> return ret; >> } >> >> @@ -1454,18 +1523,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc >> /* Make sure new block ends up in correct hot/cold section. */ >> >> BB_COPY_PARTITION (jump_block, e->src); >> - if (flag_reorder_blocks_and_partition >> - && targetm_common.have_named_sections >> - && JUMP_P (BB_END (jump_block)) >> - && !any_condjump_p (BB_END (jump_block)) >> - && (EDGE_SUCC (jump_block, 0)->flags & EDGE_CROSSING)) >> - add_reg_note (BB_END (jump_block), REG_CROSSING_JUMP, NULL_RTX); >> >> /* Wire edge in. */ >> new_edge = make_edge (e->src, jump_block, EDGE_FALLTHRU); >> new_edge->probability = probability; >> new_edge->count = count; >> >> + /* If e->src was previously region crossing, it no longer is >> + and the reg crossing note should be removed. */ >> + fixup_partition_crossing (new_edge, jump_block); >> + >> /* Redirect old edge. */ >> redirect_edge_pred (e, jump_block); >> e->probability = REG_BR_PROB_BASE; >> @@ -1521,13 +1588,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc >> LABEL_NUSES (label)++; >> } >> >> - emit_barrier_after (BB_END (jump_block)); >> + /* We might be in cfg layout mode, and if so, the following routine will >> + insert the barrier correctly. 
*/ >> + emit_barrier_after_bb (jump_block); >> redirect_edge_succ_nodup (e, target); >> >> if (abnormal_edge_flags) >> make_edge (src, target, abnormal_edge_flags); >> >> df_mark_solutions_dirty (); >> + fixup_partition_crossing (e, target); >> return new_bb; >> } >> >> @@ -1626,7 +1696,7 @@ rtl_move_block_after (basic_block bb ATTRIBUTE_UNU >> static basic_block >> rtl_split_edge (edge edge_in) >> { >> - basic_block bb; >> + basic_block bb, new_bb; >> rtx before; >> >> /* Abnormal edges cannot be split. */ >> @@ -1659,12 +1729,26 @@ rtl_split_edge (edge edge_in) >> else >> { >> bb = create_basic_block (before, NULL, edge_in->dest->prev_bb); >> - /* ??? Why not edge_in->dest->prev_bb here? */ >> - BB_COPY_PARTITION (bb, edge_in->dest); >> + if (edge_in->src == ENTRY_BLOCK_PTR) >> + BB_COPY_PARTITION (bb, edge_in->dest); >> + else >> + /* Put the split bb into the src partition, to avoid creating >> + a situation where a cold bb dominates a hot bb, in the case >> + where src is cold and dest is hot. The src will dominate >> + the new bb (whereas it might not have dominated dest). */ >> + BB_COPY_PARTITION (bb, edge_in->src); >> } >> >> make_single_succ_edge (bb, edge_in->dest, EDGE_FALLTHRU); >> >> + /* Can't allow a region crossing edge to be fallthrough. */ >> + if (BB_PARTITION (bb) != BB_PARTITION (edge_in->dest) >> + && edge_in->dest != EXIT_BLOCK_PTR) >> + { >> + new_bb = force_nonfallthru (single_succ_edge (bb)); >> + gcc_assert (!new_bb); >> + } >> + >> /* For non-fallthru edges, we must adjust the predecessor's >> jump instruction to target our new block. 
*/ >> if ((edge_in->flags & EDGE_FALLTHRU) == 0) >> @@ -1777,17 +1861,13 @@ commit_one_edge_insertion (edge e) >> else >> { >> bb = split_edge (e); >> - after = BB_END (bb); >> >> - if (flag_reorder_blocks_and_partition >> - && targetm_common.have_named_sections >> - && e->src != ENTRY_BLOCK_PTR >> - && BB_PARTITION (e->src) == BB_COLD_PARTITION >> - && !(e->flags & EDGE_CROSSING) >> - && JUMP_P (after) >> - && !any_condjump_p (after) >> - && (single_succ_edge (bb)->flags & EDGE_CROSSING)) >> - add_reg_note (after, REG_CROSSING_JUMP, NULL_RTX); >> + /* If e crossed a partition boundary, we needed to make bb end in >> + a region-crossing jump, even though it was originally fallthru. */ >> + if (JUMP_P (BB_END (bb))) >> + before = BB_END (bb); >> + else >> + after = BB_END (bb); >> } >> >> /* Now that we've found the spot, do the insertion. */ >> @@ -1827,6 +1907,14 @@ commit_edge_insertions (void) >> { >> basic_block bb; >> >> + /* Optimization passes that invoke this routine can cause hot blocks >> + previously reached by both hot and cold blocks to become dominated only >> + by cold blocks. This will cause the verification below to fail, >> + and lead to now cold code in the hot section. In some cases this >> + may only be visible after newly unreachable blocks are deleted, >> + which will be done by fixup_partitions. */ >> + fixup_partitions (); >> + >> #ifdef ENABLE_CHECKING >> verify_flow_info (); >> #endif >> @@ -2028,7 +2116,75 @@ get_last_bb_insn (basic_block bb) >> >> return end; >> } >> - >> + >> +/* Perform cleanup on the hot/cold bb partitioning after optimization >> + passes that modify the cfg. */ >> + >> +void >> +fixup_partitions (void) >> +{ >> + basic_block bb; >> + >> + if (!crtl->has_bb_partition) >> + return; >> + >> + /* Delete any blocks that became unreachable and weren't >> + already cleaned up, for example during edge forwarding >> + and convert_jumps_to_returns. 
This will expose more >> + opportunities for fixing the partition boundaries here. >> + Also, the calculation of the dominance graph during verification >> + will assert if there are unreachable nodes. */ >> + delete_unreachable_blocks (); >> + >> + /* If there are partitions, do a sanity check on them: A basic block in >> + a cold partition cannot dominate a basic block in a hot partition. >> + Fixup any that now violate this requirement, as a result of edge >> + forwarding and unreachable block deletion. */ >> + VEC (basic_block, heap) *bbs_in_cold_partition = NULL; >> + VEC (basic_block, heap) *bbs_to_fix = NULL; >> + FOR_EACH_BB (bb) >> + if ((BB_PARTITION (bb) == BB_COLD_PARTITION)) >> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb); >> + if (! VEC_empty (basic_block, bbs_in_cold_partition)) >> + { >> + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); >> + basic_block son; >> + >> + if (dom_calculated_here) >> + calculate_dominance_info (CDI_DOMINATORS); >> + >> + while (! VEC_empty (basic_block, bbs_in_cold_partition)) >> + { >> + bb = VEC_pop (basic_block, bbs_in_cold_partition); >> + /* If bb is not yet cold (because it was added below as >> + a block dominated by a cold bb) then mark it cold here. */ >> + if ((BB_PARTITION (bb) != BB_COLD_PARTITION)) >> + { >> + BB_SET_PARTITION (bb, BB_COLD_PARTITION); >> + VEC_safe_push (basic_block, heap, bbs_to_fix, bb); >> + } >> + /* Any blocks dominated by a block in the cold section >> + must also be cold. 
*/ >> + for (son = first_dom_son (CDI_DOMINATORS, bb); >> + son; >> + son = next_dom_son (CDI_DOMINATORS, son)) >> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son); >> + } >> + >> + if (dom_calculated_here) >> + free_dominance_info (CDI_DOMINATORS); >> + } >> + >> + /* Do the partition fixup after all necessary blocks have been converted to >> + cold, so that we only update the region crossings the minimum number of >> + places, which can require forcing edges to be non fallthru. */ >> + while (! VEC_empty (basic_block, bbs_to_fix)) >> + { >> + bb = VEC_pop (basic_block, bbs_to_fix); >> + fixup_bb_partition (bb); >> + } >> +} >> + >> /* Verify the CFG and RTL consistency common for both underlying RTL and >> cfglayout RTL. >> >> @@ -2052,6 +2208,7 @@ rtl_verify_flow_info_1 (void) >> rtx x; >> int err = 0; >> basic_block bb; >> + bool have_partitions = false; >> >> /* Check the general integrity of the basic blocks. */ >> FOR_EACH_BB_REVERSE (bb) >> @@ -2169,6 +2326,8 @@ rtl_verify_flow_info_1 (void) >> >> if (e->flags & EDGE_ABNORMAL) >> n_abnormal++; >> + >> + have_partitions |= is_crossing; >> } >> >> if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, NULL_RTX)) >> @@ -2293,6 +2452,40 @@ rtl_verify_flow_info_1 (void) >> } >> } >> >> + /* If there are partitions, do a sanity check on them: A basic block in >> + a cold partition cannot dominate a basic block in a hot partition. */ >> + VEC (basic_block, heap) *bbs_in_cold_partition = NULL; >> + if (have_partitions && !err) >> + FOR_EACH_BB (bb) >> + if ((BB_PARTITION (bb) == BB_COLD_PARTITION)) >> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb); >> + if (! VEC_empty (basic_block, bbs_in_cold_partition)) >> + { >> + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); >> + basic_block son; >> + >> + if (dom_calculated_here) >> + calculate_dominance_info (CDI_DOMINATORS); >> + >> + while (! 
VEC_empty (basic_block, bbs_in_cold_partition)) >> + { >> + bb = VEC_pop (basic_block, bbs_in_cold_partition); >> + if ((BB_PARTITION (bb) != BB_COLD_PARTITION)) >> + { >> + error ("non-cold basic block %d dominated " >> + "by a block in the cold partition", bb->index); >> + err = 1; >> + } >> + for (son = first_dom_son (CDI_DOMINATORS, bb); >> + son; >> + son = next_dom_son (CDI_DOMINATORS, son)) >> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son); >> + } >> + >> + if (dom_calculated_here) >> + free_dominance_info (CDI_DOMINATORS); >> + } >> + >> /* Clean up. */ >> return err; >> } >> @@ -2965,14 +3158,41 @@ record_effective_endpoints (void) >> else >> cfg_layout_function_header = NULL_RTX; >> >> + had_sec_boundary_notes = false; >> + >> next_insn = get_insns (); >> FOR_EACH_BB (bb) >> { >> rtx end; >> >> if (PREV_INSN (BB_HEAD (bb)) && next_insn != BB_HEAD (bb)) >> - BB_HEADER (bb) = unlink_insn_chain (next_insn, >> - PREV_INSN (BB_HEAD (bb))); >> + { >> + /* Rather than try to keep section boundary notes incrementally >> + up-to-date through cfg layout optimizations, simply remove them >> + and flag that they should be re-inserted when exiting >> + cfg layout mode. */ >> + rtx check_insn = next_insn; >> + while (check_insn) >> + { >> + if (NOTE_P (check_insn) >> + && NOTE_KIND (check_insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS) >> + { >> + had_sec_boundary_notes |= true; >> + /* Remove note from chain. Grab new next_insn first. */ >> + if (next_insn == check_insn) >> + next_insn = NEXT_INSN (check_insn); >> + /* Delete note. */ >> + delete_insn (check_insn); >> + /* There will only be one. */ >> + break; >> + } >> + check_insn = NEXT_INSN (check_insn); >> + } >> + /* If we still have header instructions left after above loop. 
*/ >> + if (next_insn != BB_HEAD (bb)) >> + BB_HEADER (bb) = unlink_insn_chain (next_insn, >> + PREV_INSN (BB_HEAD (bb))); >> + } >> end = skip_insns_after_block (bb); >> if (NEXT_INSN (BB_END (bb)) && BB_END (bb) != end) >> BB_FOOTER (bb) = unlink_insn_chain (NEXT_INSN (BB_END (bb)), end); >> @@ -3000,7 +3220,7 @@ outof_cfg_layout_mode (void) >> if (bb->next_bb != EXIT_BLOCK_PTR) >> bb->aux = bb->next_bb; >> >> - cfg_layout_finalize (); >> + cfg_layout_finalize (false); >> >> return 0; >> } >> @@ -3120,10 +3340,13 @@ relink_block_chain (bool stay_in_cfglayout_mode) >> } >> >> >> -/* Given a reorder chain, rearrange the code to match. */ >> +/* Given a reorder chain, rearrange the code to match. If >> + this is called when we will FINALIZE_REORDER_BLOCKS, or when >> + section boundary notes were removed on entry to cfg layout >> + mode, insert section boundary notes here. */ >> >> static void >> -fixup_reorder_chain (void) >> +fixup_reorder_chain (bool finalize_reorder_blocks) >> { >> basic_block bb; >> rtx insn = NULL; >> @@ -3150,7 +3373,7 @@ static void >> PREV_INSN (BB_HEADER (bb)) = insn; >> insn = BB_HEADER (bb); >> while (NEXT_INSN (insn)) >> - insn = NEXT_INSN (insn); >> + insn = NEXT_INSN (insn); >> } >> if (insn) >> NEXT_INSN (insn) = BB_HEAD (bb); >> @@ -3175,6 +3398,11 @@ static void >> insn = NEXT_INSN (insn); >> >> set_last_insn (insn); >> + >> + /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes. */ >> + if (had_sec_boundary_notes || finalize_reorder_blocks) >> + insert_section_boundary_note (); >> + >> #ifdef ENABLE_CHECKING >> verify_insn_chain (); >> #endif >> @@ -3187,7 +3415,7 @@ static void >> edge e_fall, e_taken, e; >> rtx bb_end_insn; >> rtx ret_label = NULL_RTX; >> - basic_block nb, src_bb; >> + basic_block nb; >> edge_iterator ei; >> >> if (EDGE_COUNT (bb->succs) == 0) >> @@ -3322,7 +3550,6 @@ static void >> /* We got here if we need to add a new jump insn. 
>> Note force_nonfallthru can delete E_FALL and thus we have to >> save E_FALL->src prior to the call to force_nonfallthru. */ >> - src_bb = e_fall->src; >> nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, ret_label); >> if (nb) >> { >> @@ -3330,17 +3557,6 @@ static void >> bb->aux = nb; >> /* Don't process this new block. */ >> bb = nb; >> - >> - /* Make sure new bb is tagged for correct section (same as >> - fall-thru source, since you cannot fall-thru across >> - section boundaries). */ >> - BB_COPY_PARTITION (src_bb, single_pred (bb)); >> - if (flag_reorder_blocks_and_partition >> - && targetm_common.have_named_sections >> - && JUMP_P (BB_END (bb)) >> - && !any_condjump_p (BB_END (bb)) >> - && (EDGE_SUCC (bb, 0)->flags & EDGE_CROSSING)) >> - add_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX); >> } >> } >> >> @@ -3644,10 +3860,11 @@ duplicate_insn_chain (rtx from, rtx to) >> case NOTE_INSN_FUNCTION_BEG: >> /* There is always just single entry to function. */ >> case NOTE_INSN_BASIC_BLOCK: >> + /* We should only switch text sections once. */ >> + case NOTE_INSN_SWITCH_TEXT_SECTIONS: >> break; >> >> case NOTE_INSN_EPILOGUE_BEG: >> - case NOTE_INSN_SWITCH_TEXT_SECTIONS: >> emit_note_copy (insn); >> break; >> >> @@ -3759,10 +3976,13 @@ break_superblocks (void) >> } >> >> /* Finalize the changes: reorder insn list according to the sequence specified >> - by aux pointers, enter compensation code, rebuild scope forest. */ >> + by aux pointers, enter compensation code, rebuild scope forest. If >> + this is called when we will FINALIZE_REORDER_BLOCKS, indicate that >> + to fixup_reorder_chain so that it can insert the proper switch text >> + section notes. 
*/ >> >> void >> -cfg_layout_finalize (void) >> +cfg_layout_finalize (bool finalize_reorder_blocks) >> { >> #ifdef ENABLE_CHECKING >> verify_flow_info (); >> @@ -3775,7 +3995,7 @@ void >> #endif >> ) >> fixup_fallthru_exit_predecessor (); >> - fixup_reorder_chain (); >> + fixup_reorder_chain (finalize_reorder_blocks); >> >> rebuild_jump_labels (get_insns ()); >> delete_dead_jumptables (); >> @@ -4454,8 +4674,7 @@ rtl_can_remove_branch_p (const_edge e) >> if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH)) >> return false; >> >> - if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX) >> - || BB_PARTITION (src) != BB_PARTITION (target)) >> + if (BB_PARTITION (src) != BB_PARTITION (target)) >> return false; >> >> if (!onlyjump_p (insn) >> >> -- >> This patch is available for review at http://codereview.appspot.com/6823047 > > > > -- > Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
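The invariant that fixup_partitions and the new check in rtl_verify_flow_info_1 enforce in the patch above (no block in the cold partition may dominate a block in the hot partition) can be illustrated outside of GCC. What follows is only a minimal sketch under stated assumptions, not the patch's code: struct block, has_cold_dominator, sink_dominated_blocks_to_cold and find_partition_violation are hypothetical names, blocks are plain array indices, and the dominator tree is supplied as a precomputed idom array rather than GCC's CDI_DOMINATORS info. It also walks each block's immediate-dominator chain instead of using the patch's worklist over dominator-tree sons, which should give the same final partitioning for this simplified model.

```c
#include <assert.h>
#include <stdbool.h>

enum partition { HOT, COLD };

/* Hypothetical stand-in for GCC's basic_block: only what the sketch needs.  */
struct block {
  enum partition part;
  int idom;   /* index of the immediate dominator, -1 for the entry block */
};

/* Return true if some strict dominator of block B is in the cold partition.  */
static bool
has_cold_dominator (const struct block *bbs, int b)
{
  for (int d = bbs[b].idom; d >= 0; d = bbs[d].idom)
    if (bbs[d].part == COLD)
      return true;
  return false;
}

/* Move every hot block that is dominated by a cold block into the cold
   partition.  The indices of the moved blocks are stored in MOVED (which
   must have room for N entries); the number of moved blocks is returned.
   In the patch, the moved blocks are the ones whose edges then need their
   region-crossing state recomputed.  */
static int
sink_dominated_blocks_to_cold (struct block *bbs, int n, int *moved)
{
  int count = 0;
  for (int b = 0; b < n; b++)
    if (bbs[b].part == HOT && has_cold_dominator (bbs, b))
      {
        bbs[b].part = COLD;
        moved[count++] = b;
      }
  return count;
}

/* Verification counterpart: return the index of the first hot block that
   is dominated by a cold block, or -1 if the invariant holds.  */
static int
find_partition_violation (const struct block *bbs, int n)
{
  for (int b = 0; b < n; b++)
    if (bbs[b].part == HOT && has_cold_dominator (bbs, b))
      return b;
  return -1;
}
```

Calling find_partition_violation before and after sink_dominated_blocks_to_cold on a small hand-built block array is enough to see the fixup and the check agree with each other.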
Are you sure you have all my changes applied? I applied the 4 patches attached to PR55121 into my trunk checkout that has my fixes, and to a pristine trunk checkout. I configured and built both for --target=arm-none-linux-gnueabi, and built using your options, .i file and gcda file. I can reproduce the failure using the pristine trunk with your patches but not with my fixed trunk + your patches. (I just updated to head to pick up recent changes and get the same result. The vec changes required some manual changes to the patch, which I will resend shortly.)

Without my fixes:

$ ~/extra/gcc_trunk_3_arm-eabi/gcc/cc1 -fpreprocessed eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9 -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use -fno-common -o eval.s -freorder-blocks-and-partition
GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
        compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version 2.4.2-p1, MPC version 0.8.1
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
        compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version 2.4.2-p1, MPC version 0.8.1
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
Compiler executable checksum: d19cc60a2f07de08237a8488bb35cd1a
eval.c: In function ‘Ge’:
eval.c:792:1: internal compiler error: in df_compact_blocks, at df-core.c:1560
 }
 ^
0x622f71 df_compact_blocks()
        ../../gcc_trunk_3/gcc/df-core.c:1560
0x5cfcb5 compact_blocks()
        ../../gcc_trunk_3/gcc/cfg.c:162
0xc9dce0 reorder_basic_blocks
        ../../gcc_trunk_3/gcc/bb-reorder.c:2154
0xc9dce0 rest_of_handle_reorder_blocks
        ../../gcc_trunk_3/gcc/bb-reorder.c:2219
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.

With my fixes:

$ ~/extra/gcc_trunk_4_arm-eabi/gcc/cc1 -fpreprocessed eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9 -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use -fno-common -o eval.s -freorder-blocks-and-partition
GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
        compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version 2.4.2-p1, MPC version 0.8.1
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi)
        compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version 2.4.2-p1, MPC version 0.8.1
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
Compiler executable checksum: 45b468efa7c981f9afb44c4dac2424f3

Thanks,
Teresa

On Mon, Nov 26, 2012 at 8:25 AM, Christophe Lyon <christophe.lyon@linaro.org> wrote:
> Hi,
>
> I have tested your patch on Spec2000 on ARM, and I can still see
> several failures caused by:
> "error: fallthru edge crosses section boundary", including the case
> described in PR55121.
>
> On 26 November 2012 16:55, Teresa Johnson <tejohnson@google.com> wrote:
>> Ping.
>> Teresa
>>
>> On Thu, Nov 15, 2012 at 12:10 PM, Teresa Johnson <tejohnson@google.com> wrote:
>>> Revised patch that fixes failures encountered when enabling
>>> -freorder-blocks-and-partition, including the failure reported in PR 53743.
>>>
>>> This includes new verification code to ensure no cold blocks dominate hot
>>> blocks, contributed by Steven Bosscher.
>>>
>>> I attempted to make the handling of partition updates through the optimization
>>> passes much more consistent, removing a number of partial fixes in the code
>>> stream in the process. The code to fix up partitions (including the BB_PARTITION
>>> assignment, region crossing jump notes, and switch text section notes) is
>>> now handled in a few centralized locations, for example inside
>>> rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that callers
>>> don't need to attempt the fixup themselves.
>>>
>>> For optimization passes that make adjustments to the cfg while in cfg layout
>>> mode that are not easy to fix up incrementally, the new routine
>>> fixup_partitions handles the cleanup globally. This does require calculation
>>> of the dominance relation; however, as far as I can tell the routines which
>>> now invoke this global fixup (try_optimize_cfg and commit_edge_insertions)
>>> are invoked typically once (or a small number of times in the case of
>>> try_optimize_cfg) per optimization pass. Additionally, I compared the
>>> -ftime-report output for some large fdo compilations and saw only minimal
>>> increases in the dominance computation times, which were only a tiny percent
>>> of the overall compile time.
>>>
>>> Additionally, I added a flag to the rtl_data structure to indicate whether
>>> any partitioning was actually performed, so that optimizations which were
>>> conservatively disabled whenever the flag_reorder_blocks_and_partition
>>> is enabled (e.g. try_crossjump_to_edge, part of connect_traces) can be less
>>> conservative for functions where no partitions were formed (e.g. they are
>>> completely hot).
>>>
>>> Bootstrapped and tested on x86_64-unknown-linux-gnu. Also tested with SPEC2006 int
>>> benchmarks and internal google benchmarks using profile feedback and
>>> -freorder-blocks-and-partition to get more coverage. Ok for trunk?
>>>
>>> Thanks,
>>> Teresa
>>>
>>> 2012-11-14  Teresa Johnson  <tejohnson@google.com>
>>>             Steven Bosscher  <steven@gcc.gnu.org>
>>>
>>>         * cfghooks.h (cfg_layout_finalize): New parameter.
>>>         * modulo-sched.c (rest_of_handle_sms): New cfg_layout_finalize
>>>         parameter.
>>>         * ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert
>>>         as this is now done by redirect_edge_and_branch_force.
>>>         * function.c (thread_prologue_and_epilogue_insns): Insert new bb after
>>>         barriers, new cfg_layout_finalize parameter, and don't store exit
>>>         predecessor BB until after it is potentially split.
>>>         * function.h (struct rtl_data): New flag has_bb_partition.
>>>         * hw-doloop.c (reorder_loops): New cfg_layout_finalize parameter.
>>>         * cfgcleanup.c (try_crossjump_to_edge): Only skip optimization if
>>>         any blocks in function actually partitioned.
>>>         (try_optimize_cfg): If cfg changed, invoke fixup_partitions to clean
>>>         up partitioning.
>>>         * bb-reorder.c (connect_traces): Only look for partitions and skip
>>>         block copying if any blocks in function actually partitioned.
>>>         (emit_barrier_after_bb): Handle insertion in non-cfglayout mode.
>>>         (find_rarely_executed_basic_blocks_and_crossing_edges): Ensure
>>>         that no cold blocks dominate a hot block.
>>>         (fix_up_fall_thru_edges): Replace BB_COPY_PARTITION with assert
>>>         as this is now done by force_nonfallthru_and_redirect.
>>>         (add_reg_crossing_jump_notes): Handle the fact that some jumps may
>>>         already be marked with region crossing note.
>>>         (reorder_basic_blocks): Only need to verify partitions if any
>>>         blocks in function actually partitioned.
>>>         (insert_section_boundary_note): Only need to insert note if any
>>>         blocks in function actually partitioned.
>>>         (rest_of_handle_reorder_blocks): New cfg_layout_finalize
>>>         parameter, and remove call to insert_section_boundary_note as this
>>>         is now called via cfg_layout_finalize/fixup_reorder_chain.
>>>         (duplicate_computed_gotos): New cfg_layout_finalize
>>>         parameter.
>>>         (partition_hot_cold_basic_blocks): Set flag indicating function
>>>         has bb partitions.
>>>         * bb-reorder.h: Declare insert_section_boundary_note and
>>>         emit_barrier_after_bb, which are no longer static.
>>>         * basic-block.h: Declare new function fixup_partitions.
>>>         * cfgrtl.c (try_redirect_by_replacing_jump): Remove unnecessary
>>>         check for region crossing note.
>>>         (fixup_partition_crossing): New function.
>>>         (fixup_bb_partition): Ditto.
>>>         (rtl_redirect_edge_and_branch): Fixup partition boundaries.
>>>         (force_nonfallthru_and_redirect): Fixup partition boundaries,
>>>         remove old code that tried to do this. Emit barrier correctly
>>>         when we are in cfglayout mode.
>>>         (rtl_split_edge): Correctly fixup partition boundaries.
>>>         (commit_one_edge_insertion): Remove old code that tried to
>>>         fixup region crossing edge since this is now handled in
>>>         split_block, and set up insertion point correctly since
>>>         block may now end in a jump.
>>>         (commit_edge_insertions): Invoke fixup_partitions to sanitize partition
>>>         boundaries after optimizations that modify cfg and before trying to
>>>         verify the flow info.
>>>         (fixup_partitions): New function.
>>>         (rtl_verify_flow_info_1): Add verification that no cold bbs dominate
>>>         hot bbs.
>>>         (record_effective_endpoints): Remove region-crossing notes and set flag
>>>         indicating that they need to be reinserted on exit from cfglayout mode.
>>>         (outof_cfg_layout_mode): New cfg_layout_finalize parameter.
>>>         (fixup_reorder_chain): Call insert_section_boundary_note if necessary.
>>>         Remove old code that attempted to fixup region crossing note as
>>>         this is now handled in force_nonfallthru_and_redirect.
>>>         (duplicate_insn_chain): Don't duplicate switch section notes.
>>>         (cfg_layout_finalize): Pass new parameter to fixup_reorder_chain.
>>>         (rtl_can_remove_branch_p): Remove unnecessary check for region crossing
>>>         note.
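The ChangeLog entry for find_rarely_executed_basic_blocks_and_crossing_edges ("Ensure that no cold blocks dominate a hot block") works in the opposite direction from fixup_partitions: at partitioning time, cold dominators of hot blocks are re-marked hot rather than hot blocks being sunk to cold. A minimal sketch of just that step, assuming a precomputed immediate-dominator array; struct block and promote_cold_dominators are hypothetical names, not GCC's, and the patch's second step (re-marking cold successors that no longer have any cold predecessor) is deliberately left out:

```c
#include <assert.h>

enum partition { HOT, COLD };

/* Hypothetical stand-in for GCC's basic_block: only what the sketch needs.  */
struct block {
  enum partition part;
  int idom;   /* index of the immediate dominator, -1 for the entry block */
};

/* Re-mark as hot every cold block that dominates a hot block, by walking
   the immediate-dominator chain of each hot block up to the entry.
   Returns the number of blocks that were re-marked.  */
static int
promote_cold_dominators (struct block *bbs, int n)
{
  int promoted = 0;
  for (int b = 0; b < n; b++)
    {
      if (bbs[b].part != HOT)
        continue;
      for (int d = bbs[b].idom; d >= 0; d = bbs[d].idom)
        if (bbs[d].part == COLD)
          {
            bbs[d].part = HOT;
            promoted++;
          }
    }
  return promoted;
}
```

This models the case called out in the patch comment, where a function's entry is rarely executed but a loop inside it is hot: the entry and every block on the dominator path to the loop end up in the hot partition, while cold blocks that dominate nothing hot stay cold.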
>>>
>>> Index: cfghooks.h
>>> ===================================================================
>>> --- cfghooks.h (revision 193376)
>>> +++ cfghooks.h (working copy)
>>> @@ -204,7 +204,7 @@ extern void copy_bbs (basic_block *, unsigned, bas
>>>  void account_profile_record (struct profile_record *, int);
>>>
>>>  extern void cfg_layout_initialize (unsigned int);
>>> -extern void cfg_layout_finalize (void);
>>> +extern void cfg_layout_finalize (bool);
>>>
>>>  /* Hooks containers. */
>>>  extern struct cfg_hooks gimple_cfg_hooks;
>>> @@ -218,4 +218,3 @@ extern void cfg_layout_rtl_register_cfg_hooks (voi
>>>  extern void gimple_register_cfg_hooks (void);
>>>  extern struct cfg_hooks get_cfg_hooks (void);
>>>  extern void set_cfg_hooks (struct cfg_hooks);
>>> -
>>> Index: modulo-sched.c
>>> ===================================================================
>>> --- modulo-sched.c (revision 193376)
>>> +++ modulo-sched.c (working copy)
>>> @@ -3354,7 +3354,7 @@ rest_of_handle_sms (void)
>>>      if (bb->next_bb != EXIT_BLOCK_PTR)
>>>        bb->aux = bb->next_bb;
>>>    free_dominance_info (CDI_DOMINATORS);
>>> -  cfg_layout_finalize ();
>>> +  cfg_layout_finalize (false);
>>>  #endif /* INSN_SCHEDULING */
>>>    return 0;
>>>  }
>>> Index: ifcvt.c
>>> ===================================================================
>>> --- ifcvt.c (revision 193376)
>>> +++ ifcvt.c (working copy)
>>> @@ -3900,10 +3900,9 @@ find_if_case_1 (basic_block test_bb, edge then_edg
>>>    if (new_bb)
>>>      {
>>>        df_bb_replace (then_bb_index, new_bb);
>>> -      /* Since the fallthru edge was redirected from test_bb to new_bb,
>>> -         we need to ensure that new_bb is in the same partition as
>>> -         test bb (you can not fall through across section boundaries). */
>>> -      BB_COPY_PARTITION (new_bb, test_bb);
>>> +      /* This should have been done above via force_nonfallthru_and_redirect
>>> +         (possibly called from redirect_edge_and_branch_force). */
>>> +      gcc_assert (BB_PARTITION (new_bb) == BB_PARTITION (test_bb));
>>>      }
>>>
>>>    num_true_changes++;
>>> Index: function.c
>>> ===================================================================
>>> --- function.c (revision 193376)
>>> +++ function.c (working copy)
>>> @@ -6249,8 +6249,10 @@ thread_prologue_and_epilogue_insns (void)
>>>              break;
>>>          if (e)
>>>            {
>>> -            copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)),
>>> -                                          NULL_RTX, e->src);
>>> +            /* Make sure we insert after any barriers. */
>>> +            rtx end = get_last_bb_insn (e->src);
>>> +            copy_bb = create_basic_block (NEXT_INSN (end),
>>> +                                          NULL_RTX, e->src);
>>>              BB_COPY_PARTITION (copy_bb, e->src);
>>>            }
>>>          else
>>> @@ -6475,7 +6477,7 @@ thread_prologue_and_epilogue_insns (void)
>>>          if (cur_bb->index >= NUM_FIXED_BLOCKS
>>>              && cur_bb->next_bb->index >= NUM_FIXED_BLOCKS)
>>>            cur_bb->aux = cur_bb->next_bb;
>>> -      cfg_layout_finalize ();
>>> +      cfg_layout_finalize (false);
>>>      }
>>>
>>>  epilogue_done:
>>> @@ -6517,7 +6519,7 @@ epilogue_done:
>>>        basic_block simple_return_block_cold = NULL;
>>>        edge pending_edge_hot = NULL;
>>>        edge pending_edge_cold = NULL;
>>> -      basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb;
>>> +      basic_block exit_pred;
>>>        int i;
>>>
>>>        gcc_assert (entry_edge != orig_entry_edge);
>>> @@ -6545,6 +6547,12 @@ epilogue_done:
>>>            else
>>>              pending_edge_cold = e;
>>>          }
>>> +
>>> +      /* Save a pointer to the exit's predecessor BB for use in
>>> +         inserting new BBs at the end of the function. Do this
>>> +         after the call to split_block above which may split
>>> +         the original exit pred. */
>>> +      exit_pred = EXIT_BLOCK_PTR->prev_bb;
>>>
>>>        FOR_EACH_VEC_ELT (edge, unconverted_simple_returns, i, e)
>>>          {
>>> Index: function.h
>>> ===================================================================
>>> --- function.h (revision 193376)
>>> +++ function.h (working copy)
>>> @@ -459,6 +459,11 @@ struct GTY(()) rtl_data {
>>>       sched2) and is useful only if the port defines LEAF_REGISTERS. */
>>>    bool uses_only_leaf_regs;
>>>
>>> +  /* Nonzero if the function being compiled has undergone hot/cold partitioning
>>> +     (under flag_reorder_blocks_and_partition) and has at least one cold
>>> +     block. */
>>> +  bool has_bb_partition;
>>> +
>>>    /* Like regs_ever_live, but 1 if a reg is set or clobbered from an
>>>       asm. Unlike regs_ever_live, elements of this array corresponding
>>>       to eliminable regs (like the frame pointer) are set if an asm
>>> Index: hw-doloop.c
>>> ===================================================================
>>> --- hw-doloop.c (revision 193376)
>>> +++ hw-doloop.c (working copy)
>>> @@ -547,7 +547,7 @@ reorder_loops (hwloop_info loops)
>>>        else
>>>          bb->aux = NULL;
>>>      }
>>> -  cfg_layout_finalize ();
>>> +  cfg_layout_finalize (false);
>>>    clear_aux_for_blocks ();
>>>    df_analyze ();
>>>  }
>>> Index: cfgcleanup.c
>>> ===================================================================
>>> --- cfgcleanup.c (revision 193376)
>>> +++ cfgcleanup.c (working copy)
>>> @@ -1824,7 +1824,7 @@ try_crossjump_to_edge (int mode, edge e1, edge e2,
>>>       partition boundaries). See the comments at the top of
>>>       bb-reorder.c:partition_hot_cold_basic_blocks for complete details. */
>>>
>>> -  if (flag_reorder_blocks_and_partition && reload_completed)
>>> +  if (crtl->has_bb_partition && reload_completed)
>>>      return false;
>>>
>>>    /* Search backward through forwarder blocks. We don't need to worry
>>> @@ -2767,10 +2767,21 @@ try_optimize_cfg (int mode)
>>>              df_analyze ();
>>>          }
>>>
>>> +      if (changed)
>>> +        {
>>> +          /* Edge forwarding in particular can cause hot blocks previously
>>> +             reached by both hot and cold blocks to become dominated only
>>> +             by cold blocks. This will cause the verification below to fail,
>>> +             and lead to now cold code in the hot section. This is not easy
>>> +             to detect and fix during edge forwarding, and in some cases
>>> +             is only visible after newly unreachable blocks are deleted,
>>> +             which will be done in fixup_partitions. */
>>> +          fixup_partitions ();
>>> +
>>>  #ifdef ENABLE_CHECKING
>>> -      if (changed)
>>> -        verify_flow_info ();
>>> +          verify_flow_info ();
>>>  #endif
>>> +        }
>>>
>>>        changed_overall |= changed;
>>>        first_pass = false;
>>> Index: bb-reorder.c
>>> ===================================================================
>>> --- bb-reorder.c (revision 193376)
>>> +++ bb-reorder.c (working copy)
>>> @@ -1054,7 +1054,7 @@ connect_traces (int n_traces, struct trace *traces
>>>    current_partition = BB_PARTITION (traces[0].first);
>>>    two_passes = false;
>>>
>>> -  if (flag_reorder_blocks_and_partition)
>>> +  if (crtl->has_bb_partition)
>>>      for (i = 0; i < n_traces && !two_passes; i++)
>>>        if (BB_PARTITION (traces[0].first)
>>>            != BB_PARTITION (traces[i].first))
>>> @@ -1263,7 +1263,7 @@ connect_traces (int n_traces, struct trace *traces
>>>              }
>>>          }
>>>
>>> -      if (flag_reorder_blocks_and_partition)
>>> +      if (crtl->has_bb_partition)
>>>          try_copy = false;
>>>
>>>        /* Copy tiny blocks always; copy larger blocks only when the
>>> @@ -1381,13 +1381,14 @@ get_uncond_jump_length (void)
>>>    return length;
>>>  }
>>>
>>> -/* Emit a barrier into the footer of BB. */
>>> +/* Emit a barrier after BB, into the footer if we are in CFGLAYOUT mode. */
>>>
>>> -static void
>>> +void
>>>  emit_barrier_after_bb (basic_block bb)
>>>  {
>>>    rtx barrier = emit_barrier_after (BB_END (bb));
>>> -  BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
>>> +  if (current_ir_type () == IR_RTL_CFGLAYOUT)
>>> +    BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier);
>>>  }
>>>
>>>  /* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions.
>>> @@ -1463,18 +1464,109 @@ find_rarely_executed_basic_blocks_and_crossing_edg >>> { >>> VEC(edge, heap) *crossing_edges = NULL; >>> basic_block bb; >>> - edge e; >>> - edge_iterator ei; >>> + edge e, e2; >>> + edge_iterator ei, ei2; >>> + unsigned int cold_bb_count = 0; >>> + VEC (basic_block, heap) *bbs_in_hot_partition = NULL; >>> + VEC (basic_block, heap) *bbs_newly_hot = NULL; >>> >>> /* Mark which partition (hot/cold) each basic block belongs in. */ >>> FOR_EACH_BB (bb) >>> { >>> if (probably_never_executed_bb_p (cfun, bb)) >>> - BB_SET_PARTITION (bb, BB_COLD_PARTITION); >>> + { >>> + BB_SET_PARTITION (bb, BB_COLD_PARTITION); >>> + cold_bb_count++; >>> + } >>> else >>> - BB_SET_PARTITION (bb, BB_HOT_PARTITION); >>> + { >>> + BB_SET_PARTITION (bb, BB_HOT_PARTITION); >>> + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, bb); >>> + } >>> } >>> >>> + /* Ensure that no cold bbs dominate hot bbs. This could happen as a result of >>> + several different possibilities. One is that there are edge weight insanities >>> + due to optimization phases that do not properly update basic block profile >>> + counts. The second is that the entry of the function may not be hot, because >>> + it is entered fewer times than the number of profile training runs, but there >>> + is a loop inside the function that causes blocks within the function to be >>> + above the threshold for hotness. */ >>> + if (cold_bb_count) >>> + { >>> + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); >>> + >>> + if (dom_calculated_here) >>> + calculate_dominance_info (CDI_DOMINATORS); >>> + >>> + /* Keep examining hot bbs until we have either checked them all, or >>> + re-marked all cold bbs hot. */ >>> + while (! 
VEC_empty (basic_block, bbs_in_hot_partition) >>> + && cold_bb_count) >>> + { >>> + basic_block dom_bb; >>> + >>> + bb = VEC_pop (basic_block, bbs_in_hot_partition); >>> + dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb); >>> + >>> + /* If bb's immediate dominator is also hot then it is ok. */ >>> + if (BB_PARTITION (dom_bb) != BB_COLD_PARTITION) >>> + continue; >>> + >>> + /* We have a hot bb with an immediate dominator that is cold. >>> + The dominator needs to be re-marked to hot. */ >>> + BB_SET_PARTITION (dom_bb, BB_HOT_PARTITION); >>> + cold_bb_count--; >>> + >>> + /* Now we need to examine newly-hot dom_bb to see if it is also >>> + dominated by a cold bb. */ >>> + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, dom_bb); >>> + >>> + /* We should also adjust any cold blocks that the newly-hot bb >>> + feeds and see if it makes sense to re-mark those as hot as >>> + well. */ >>> + VEC_safe_push (basic_block, heap, bbs_newly_hot, dom_bb); >>> + while (! VEC_empty (basic_block, bbs_newly_hot)) >>> + { >>> + basic_block new_hot_bb = VEC_pop (basic_block, bbs_newly_hot); >>> + /* Examine all successors of this newly-hot bb to see if they >>> + are cold and should be re-marked as hot. */ >>> + FOR_EACH_EDGE (e, ei, new_hot_bb->succs) >>> + { >>> + bool any_cold_preds = false; >>> + basic_block succ = e->dest; >>> + if (BB_PARTITION (succ) != BB_COLD_PARTITION) >>> + continue; >>> + /* Does this block have any cold predecessors now? */ >>> + FOR_EACH_EDGE (e2, ei2, succ->preds) >>> + { >>> + if (BB_PARTITION (e2->src) == BB_COLD_PARTITION) >>> + { >>> + any_cold_preds = true; >>> + break; >>> + } >>> + } >>> + if (any_cold_preds) >>> + continue; >>> + >>> + /* Here we have a successor of newly-hot bb that is cold >>> + but no longer has any cold precessessors. Since the original >>> + assignment of our newly-hot bb was incorrect, this successor's >>> + assignment as cold is also suspect. Go ahead and re-mark it >>> + as hot now too. 
Better heuristics may be in order here. */ >>> + BB_SET_PARTITION (succ, BB_HOT_PARTITION); >>> + cold_bb_count--; >>> + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, succ); >>> + /* Examine this successor as a newly-hot bb. */ >>> + VEC_safe_push (basic_block, heap, bbs_newly_hot, succ); >>> + } >>> + } >>> + } >>> + >>> + if (dom_calculated_here) >>> + free_dominance_info (CDI_DOMINATORS); >>> + } >>> + >>> /* The format of .gcc_except_table does not allow landing pads to >>> be in a different partition as the throw. Fix this by either >>> moving or duplicating the landing pads. */ >>> @@ -1766,10 +1858,10 @@ fix_up_fall_thru_edges (void) >>> new_bb->aux = cur_bb->aux; >>> cur_bb->aux = new_bb; >>> >>> - /* Make sure new fall-through bb is in same >>> - partition as bb it's falling through from. */ >>> + /* This is done by force_nonfallthru_and_redirect. */ >>> + gcc_assert (BB_PARTITION (new_bb) >>> + == BB_PARTITION (cur_bb)); >>> >>> - BB_COPY_PARTITION (new_bb, cur_bb); >>> single_succ_edge (new_bb)->flags |= EDGE_CROSSING; >>> } >>> else >>> @@ -2067,7 +2159,10 @@ add_reg_crossing_jump_notes (void) >>> FOR_EACH_BB (bb) >>> FOR_EACH_EDGE (e, ei, bb->succs) >>> if ((e->flags & EDGE_CROSSING) >>> - && JUMP_P (BB_END (e->src))) >>> + && JUMP_P (BB_END (e->src)) >>> + /* Some notes were added during fix_up_fall_thru_edges, via >>> + force_nonfallthru_and_redirect. */ >>> + && !find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX)) >>> add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); >>> } >>> >>> @@ -2160,7 +2255,7 @@ reorder_basic_blocks (void) >>> dump_flow_info (dump_file, dump_flags); >>> } >>> >>> - if (flag_reorder_blocks_and_partition) >>> + if (crtl->has_bb_partition) >>> verify_hot_cold_block_grouping (); >>> } >>> >>> @@ -2172,14 +2267,14 @@ reorder_basic_blocks (void) >>> encountering this note will make the compiler switch between the >>> hot and cold text sections. 
*/ >>> >>> -static void >>> +void >>> insert_section_boundary_note (void) >>> { >>> basic_block bb; >>> rtx new_note; >>> int first_partition = 0; >>> >>> - if (!flag_reorder_blocks_and_partition) >>> + if (!crtl->has_bb_partition) >>> return; >>> >>> FOR_EACH_BB (bb) >>> @@ -2222,10 +2317,8 @@ rest_of_handle_reorder_blocks (void) >>> FOR_EACH_BB (bb) >>> if (bb->next_bb != EXIT_BLOCK_PTR) >>> bb->aux = bb->next_bb; >>> - cfg_layout_finalize (); >>> + cfg_layout_finalize (true); >>> >>> - /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes. */ >>> - insert_section_boundary_note (); >>> return 0; >>> } >>> >>> @@ -2366,7 +2459,7 @@ duplicate_computed_gotos (void) >>> } >>> >>> done: >>> - cfg_layout_finalize (); >>> + cfg_layout_finalize (false); >>> >>> BITMAP_FREE (candidates); >>> return 0; >>> @@ -2511,6 +2604,8 @@ partition_hot_cold_basic_blocks (void) >>> if (crossing_edges == NULL) >>> return 0; >>> >>> + crtl->has_bb_partition = true; >>> + >>> /* Make sure the source of any crossing edge ends in a jump and the >>> destination of any crossing edge has a label. 
*/ >>> add_labels_and_missing_jumps (crossing_edges); >>> Index: bb-reorder.h >>> =================================================================== >>> --- bb-reorder.h (revision 193376) >>> +++ bb-reorder.h (working copy) >>> @@ -36,4 +36,8 @@ extern struct target_bb_reorder *this_target_bb_re >>> >>> extern int get_uncond_jump_length (void); >>> >>> +extern void insert_section_boundary_note (void); >>> + >>> +extern void emit_barrier_after_bb (basic_block bb); >>> + >>> #endif >>> Index: basic-block.h >>> =================================================================== >>> --- basic-block.h (revision 193376) >>> +++ basic-block.h (working copy) >>> @@ -806,6 +806,7 @@ extern basic_block force_nonfallthru_and_redirect >>> extern bool contains_no_active_insn_p (const_basic_block); >>> extern bool forwarder_block_p (const_basic_block); >>> extern bool can_fallthru (basic_block, basic_block); >>> +extern void fixup_partitions (void); >>> >>> /* In cfgbuild.c. */ >>> extern void find_many_sub_basic_blocks (sbitmap); >>> Index: cfgrtl.c >>> =================================================================== >>> --- cfgrtl.c (revision 193376) >>> +++ cfgrtl.c (working copy) >>> @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3. If not see >>> #include "tree.h" >>> #include "hard-reg-set.h" >>> #include "basic-block.h" >>> +#include "bb-reorder.h" >>> #include "regs.h" >>> #include "flags.h" >>> #include "function.h" >>> @@ -67,11 +68,12 @@ along with GCC; see the file COPYING3. If not see >>> Only applicable if the CFG is in cfglayout mode. 
*/ >>> static GTY(()) rtx cfg_layout_function_footer; >>> static GTY(()) rtx cfg_layout_function_header; >>> +static bool had_sec_boundary_notes; >>> >>> static rtx skip_insns_after_block (basic_block); >>> static void record_effective_endpoints (void); >>> static rtx label_for_bb (basic_block); >>> -static void fixup_reorder_chain (void); >>> +static void fixup_reorder_chain (bool finalize_reorder_blocks); >>> >>> void verify_insn_chain (void); >>> static void fixup_fallthru_exit_predecessor (void); >>> @@ -976,8 +978,7 @@ try_redirect_by_replacing_jump (edge e, basic_bloc >>> partition boundaries). See the comments at the top of >>> bb-reorder.c:partition_hot_cold_basic_blocks for complete details. */ >>> >>> - if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX) >>> - || BB_PARTITION (src) != BB_PARTITION (target)) >>> + if (BB_PARTITION (src) != BB_PARTITION (target)) >>> return NULL; >>> >>> /* We can replace or remove a complex jump only when we have exactly >>> @@ -1286,6 +1287,71 @@ redirect_branch_edge (edge e, basic_block target) >>> return e; >>> } >>> >>> +/* Called when edge E has been redirected to a new destination, >>> + in order to update the region crossing flag on the edge and >>> + jump. */ >>> + >>> +static void >>> +fixup_partition_crossing (edge e, basic_block target) >>> +{ >>> + rtx note; >>> + >>> + gcc_assert (e->dest == target); >>> + >>> + if (e->src == ENTRY_BLOCK_PTR || target == EXIT_BLOCK_PTR) >>> + return; >>> + /* If we redirected an existing edge, it may already be marked >>> + crossing, even though the new src is missing a reg crossing note. >>> + But make sure reg crossing note doesn't already exist before >>> + inserting. 
*/ >>> + if (BB_PARTITION (e->src) != BB_PARTITION (target)) >>> + { >>> + e->flags |= EDGE_CROSSING; >>> + note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); >>> + if (JUMP_P (BB_END (e->src)) >>> + && !note) >>> + add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); >>> + } >>> + else if (BB_PARTITION (e->src) == BB_PARTITION (target)) >>> + { >>> + e->flags &= ~EDGE_CROSSING; >>> + /* Remove the region crossing note from jump at end of >>> + e->src if it exists. */ >>> + note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); >>> + if (note) >>> + remove_note (BB_END (e->src), note); >>> + } >>> +} >>> + >>> +/* Called when block BB has been reassigned to a different partition, >>> + to ensure that the region crossing attributes are updated. */ >>> + >>> +static void >>> +fixup_bb_partition (basic_block bb) >>> +{ >>> + edge e; >>> + edge_iterator ei; >>> + >>> + /* Now need to make bb's pred edges non-region crossing. */ >>> + FOR_EACH_EDGE (e, ei, bb->preds) >>> + { >>> + fixup_partition_crossing (e, e->dest); >>> + } >>> + >>> + /* Possibly need to make bb's successor edges region crossing, >>> + or remove stale region crossing. */ >>> + FOR_EACH_EDGE (e, ei, bb->succs) >>> + { >>> + if ((e->flags & EDGE_FALLTHRU) >>> + && BB_PARTITION (bb) != BB_PARTITION (e->dest) >>> + && e->dest != EXIT_BLOCK_PTR) >>> + /* force_nonfallthru_and_redirect calls fixup_partition_crossing. */ >>> + force_nonfallthru (e); >>> + else >>> + fixup_partition_crossing (e, e->dest); >>> + } >>> +} >>> + >>> /* Attempt to change code to redirect edge E to TARGET. Don't do that on >>> expense of adding new instructions or reordering basic blocks. 
>>> >>> @@ -1302,16 +1368,18 @@ rtl_redirect_edge_and_branch (edge e, basic_block >>> { >>> edge ret; >>> basic_block src = e->src; >>> + basic_block dest = e->dest; >>> >>> if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH)) >>> return NULL; >>> >>> - if (e->dest == target) >>> + if (dest == target) >>> return e; >>> >>> if ((ret = try_redirect_by_replacing_jump (e, target, false)) != NULL) >>> { >>> df_set_bb_dirty (src); >>> + fixup_partition_crossing (ret, target); >>> return ret; >>> } >>> >>> @@ -1320,6 +1388,7 @@ rtl_redirect_edge_and_branch (edge e, basic_block >>> return NULL; >>> >>> df_set_bb_dirty (src); >>> + fixup_partition_crossing (ret, target); >>> return ret; >>> } >>> >>> @@ -1454,18 +1523,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc >>> /* Make sure new block ends up in correct hot/cold section. */ >>> >>> BB_COPY_PARTITION (jump_block, e->src); >>> - if (flag_reorder_blocks_and_partition >>> - && targetm_common.have_named_sections >>> - && JUMP_P (BB_END (jump_block)) >>> - && !any_condjump_p (BB_END (jump_block)) >>> - && (EDGE_SUCC (jump_block, 0)->flags & EDGE_CROSSING)) >>> - add_reg_note (BB_END (jump_block), REG_CROSSING_JUMP, NULL_RTX); >>> >>> /* Wire edge in. */ >>> new_edge = make_edge (e->src, jump_block, EDGE_FALLTHRU); >>> new_edge->probability = probability; >>> new_edge->count = count; >>> >>> + /* If e->src was previously region crossing, it no longer is >>> + and the reg crossing note should be removed. */ >>> + fixup_partition_crossing (new_edge, jump_block); >>> + >>> /* Redirect old edge. */ >>> redirect_edge_pred (e, jump_block); >>> e->probability = REG_BR_PROB_BASE; >>> @@ -1521,13 +1588,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc >>> LABEL_NUSES (label)++; >>> } >>> >>> - emit_barrier_after (BB_END (jump_block)); >>> + /* We might be in cfg layout mode, and if so, the following routine will >>> + insert the barrier correctly. 
*/ >>> + emit_barrier_after_bb (jump_block); >>> redirect_edge_succ_nodup (e, target); >>> >>> if (abnormal_edge_flags) >>> make_edge (src, target, abnormal_edge_flags); >>> >>> df_mark_solutions_dirty (); >>> + fixup_partition_crossing (e, target); >>> return new_bb; >>> } >>> >>> @@ -1626,7 +1696,7 @@ rtl_move_block_after (basic_block bb ATTRIBUTE_UNU >>> static basic_block >>> rtl_split_edge (edge edge_in) >>> { >>> - basic_block bb; >>> + basic_block bb, new_bb; >>> rtx before; >>> >>> /* Abnormal edges cannot be split. */ >>> @@ -1659,12 +1729,26 @@ rtl_split_edge (edge edge_in) >>> else >>> { >>> bb = create_basic_block (before, NULL, edge_in->dest->prev_bb); >>> - /* ??? Why not edge_in->dest->prev_bb here? */ >>> - BB_COPY_PARTITION (bb, edge_in->dest); >>> + if (edge_in->src == ENTRY_BLOCK_PTR) >>> + BB_COPY_PARTITION (bb, edge_in->dest); >>> + else >>> + /* Put the split bb into the src partition, to avoid creating >>> + a situation where a cold bb dominates a hot bb, in the case >>> + where src is cold and dest is hot. The src will dominate >>> + the new bb (whereas it might not have dominated dest). */ >>> + BB_COPY_PARTITION (bb, edge_in->src); >>> } >>> >>> make_single_succ_edge (bb, edge_in->dest, EDGE_FALLTHRU); >>> >>> + /* Can't allow a region crossing edge to be fallthrough. */ >>> + if (BB_PARTITION (bb) != BB_PARTITION (edge_in->dest) >>> + && edge_in->dest != EXIT_BLOCK_PTR) >>> + { >>> + new_bb = force_nonfallthru (single_succ_edge (bb)); >>> + gcc_assert (!new_bb); >>> + } >>> + >>> /* For non-fallthru edges, we must adjust the predecessor's >>> jump instruction to target our new block. 
*/ >>> if ((edge_in->flags & EDGE_FALLTHRU) == 0) >>> @@ -1777,17 +1861,13 @@ commit_one_edge_insertion (edge e) >>> else >>> { >>> bb = split_edge (e); >>> - after = BB_END (bb); >>> >>> - if (flag_reorder_blocks_and_partition >>> - && targetm_common.have_named_sections >>> - && e->src != ENTRY_BLOCK_PTR >>> - && BB_PARTITION (e->src) == BB_COLD_PARTITION >>> - && !(e->flags & EDGE_CROSSING) >>> - && JUMP_P (after) >>> - && !any_condjump_p (after) >>> - && (single_succ_edge (bb)->flags & EDGE_CROSSING)) >>> - add_reg_note (after, REG_CROSSING_JUMP, NULL_RTX); >>> + /* If e crossed a partition boundary, we needed to make bb end in >>> + a region-crossing jump, even though it was originally fallthru. */ >>> + if (JUMP_P (BB_END (bb))) >>> + before = BB_END (bb); >>> + else >>> + after = BB_END (bb); >>> } >>> >>> /* Now that we've found the spot, do the insertion. */ >>> @@ -1827,6 +1907,14 @@ commit_edge_insertions (void) >>> { >>> basic_block bb; >>> >>> + /* Optimization passes that invoke this routine can cause hot blocks >>> + previously reached by both hot and cold blocks to become dominated only >>> + by cold blocks. This will cause the verification below to fail, >>> + and lead to now cold code in the hot section. In some cases this >>> + may only be visible after newly unreachable blocks are deleted, >>> + which will be done by fixup_partitions. */ >>> + fixup_partitions (); >>> + >>> #ifdef ENABLE_CHECKING >>> verify_flow_info (); >>> #endif >>> @@ -2028,7 +2116,75 @@ get_last_bb_insn (basic_block bb) >>> >>> return end; >>> } >>> - >>> + >>> +/* Perform cleanup on the hot/cold bb partitioning after optimization >>> + passes that modify the cfg. */ >>> + >>> +void >>> +fixup_partitions (void) >>> +{ >>> + basic_block bb; >>> + >>> + if (!crtl->has_bb_partition) >>> + return; >>> + >>> + /* Delete any blocks that became unreachable and weren't >>> + already cleaned up, for example during edge forwarding >>> + and convert_jumps_to_returns. 
This will expose more >>> + opportunities for fixing the partition boundaries here. >>> + Also, the calculation of the dominance graph during verification >>> + will assert if there are unreachable nodes. */ >>> + delete_unreachable_blocks (); >>> + >>> + /* If there are partitions, do a sanity check on them: A basic block in >>> + a cold partition cannot dominate a basic block in a hot partition. >>> + Fixup any that now violate this requirement, as a result of edge >>> + forwarding and unreachable block deletion. */ >>> + VEC (basic_block, heap) *bbs_in_cold_partition = NULL; >>> + VEC (basic_block, heap) *bbs_to_fix = NULL; >>> + FOR_EACH_BB (bb) >>> + if ((BB_PARTITION (bb) == BB_COLD_PARTITION)) >>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb); >>> + if (! VEC_empty (basic_block, bbs_in_cold_partition)) >>> + { >>> + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); >>> + basic_block son; >>> + >>> + if (dom_calculated_here) >>> + calculate_dominance_info (CDI_DOMINATORS); >>> + >>> + while (! VEC_empty (basic_block, bbs_in_cold_partition)) >>> + { >>> + bb = VEC_pop (basic_block, bbs_in_cold_partition); >>> + /* If bb is not yet cold (because it was added below as >>> + a block dominated by a cold bb) then mark it cold here. */ >>> + if ((BB_PARTITION (bb) != BB_COLD_PARTITION)) >>> + { >>> + BB_SET_PARTITION (bb, BB_COLD_PARTITION); >>> + VEC_safe_push (basic_block, heap, bbs_to_fix, bb); >>> + } >>> + /* Any blocks dominated by a block in the cold section >>> + must also be cold. 
*/ >>> + for (son = first_dom_son (CDI_DOMINATORS, bb); >>> + son; >>> + son = next_dom_son (CDI_DOMINATORS, son)) >>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son); >>> + } >>> + >>> + if (dom_calculated_here) >>> + free_dominance_info (CDI_DOMINATORS); >>> + } >>> + >>> + /* Do the partition fixup after all necessary blocks have been converted to >>> + cold, so that we only update the region crossings the minimum number of >>> + places, which can require forcing edges to be non fallthru. */ >>> + while (! VEC_empty (basic_block, bbs_to_fix)) >>> + { >>> + bb = VEC_pop (basic_block, bbs_to_fix); >>> + fixup_bb_partition (bb); >>> + } >>> +} >>> + >>> /* Verify the CFG and RTL consistency common for both underlying RTL and >>> cfglayout RTL. >>> >>> @@ -2052,6 +2208,7 @@ rtl_verify_flow_info_1 (void) >>> rtx x; >>> int err = 0; >>> basic_block bb; >>> + bool have_partitions = false; >>> >>> /* Check the general integrity of the basic blocks. */ >>> FOR_EACH_BB_REVERSE (bb) >>> @@ -2169,6 +2326,8 @@ rtl_verify_flow_info_1 (void) >>> >>> if (e->flags & EDGE_ABNORMAL) >>> n_abnormal++; >>> + >>> + have_partitions |= is_crossing; >>> } >>> >>> if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, NULL_RTX)) >>> @@ -2293,6 +2452,40 @@ rtl_verify_flow_info_1 (void) >>> } >>> } >>> >>> + /* If there are partitions, do a sanity check on them: A basic block in >>> + a cold partition cannot dominate a basic block in a hot partition. */ >>> + VEC (basic_block, heap) *bbs_in_cold_partition = NULL; >>> + if (have_partitions && !err) >>> + FOR_EACH_BB (bb) >>> + if ((BB_PARTITION (bb) == BB_COLD_PARTITION)) >>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb); >>> + if (! VEC_empty (basic_block, bbs_in_cold_partition)) >>> + { >>> + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); >>> + basic_block son; >>> + >>> + if (dom_calculated_here) >>> + calculate_dominance_info (CDI_DOMINATORS); >>> + >>> + while (! 
VEC_empty (basic_block, bbs_in_cold_partition)) >>> + { >>> + bb = VEC_pop (basic_block, bbs_in_cold_partition); >>> + if ((BB_PARTITION (bb) != BB_COLD_PARTITION)) >>> + { >>> + error ("non-cold basic block %d dominated " >>> + "by a block in the cold partition", bb->index); >>> + err = 1; >>> + } >>> + for (son = first_dom_son (CDI_DOMINATORS, bb); >>> + son; >>> + son = next_dom_son (CDI_DOMINATORS, son)) >>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son); >>> + } >>> + >>> + if (dom_calculated_here) >>> + free_dominance_info (CDI_DOMINATORS); >>> + } >>> + >>> /* Clean up. */ >>> return err; >>> } >>> @@ -2965,14 +3158,41 @@ record_effective_endpoints (void) >>> else >>> cfg_layout_function_header = NULL_RTX; >>> >>> + had_sec_boundary_notes = false; >>> + >>> next_insn = get_insns (); >>> FOR_EACH_BB (bb) >>> { >>> rtx end; >>> >>> if (PREV_INSN (BB_HEAD (bb)) && next_insn != BB_HEAD (bb)) >>> - BB_HEADER (bb) = unlink_insn_chain (next_insn, >>> - PREV_INSN (BB_HEAD (bb))); >>> + { >>> + /* Rather than try to keep section boundary notes incrementally >>> + up-to-date through cfg layout optimizations, simply remove them >>> + and flag that they should be re-inserted when exiting >>> + cfg layout mode. */ >>> + rtx check_insn = next_insn; >>> + while (check_insn) >>> + { >>> + if (NOTE_P (check_insn) >>> + && NOTE_KIND (check_insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS) >>> + { >>> + had_sec_boundary_notes |= true; >>> + /* Remove note from chain. Grab new next_insn first. */ >>> + if (next_insn == check_insn) >>> + next_insn = NEXT_INSN (check_insn); >>> + /* Delete note. */ >>> + delete_insn (check_insn); >>> + /* There will only be one. */ >>> + break; >>> + } >>> + check_insn = NEXT_INSN (check_insn); >>> + } >>> + /* If we still have header instructions left after above loop. 
*/ >>> + if (next_insn != BB_HEAD (bb)) >>> + BB_HEADER (bb) = unlink_insn_chain (next_insn, >>> + PREV_INSN (BB_HEAD (bb))); >>> + } >>> end = skip_insns_after_block (bb); >>> if (NEXT_INSN (BB_END (bb)) && BB_END (bb) != end) >>> BB_FOOTER (bb) = unlink_insn_chain (NEXT_INSN (BB_END (bb)), end); >>> @@ -3000,7 +3220,7 @@ outof_cfg_layout_mode (void) >>> if (bb->next_bb != EXIT_BLOCK_PTR) >>> bb->aux = bb->next_bb; >>> >>> - cfg_layout_finalize (); >>> + cfg_layout_finalize (false); >>> >>> return 0; >>> } >>> @@ -3120,10 +3340,13 @@ relink_block_chain (bool stay_in_cfglayout_mode) >>> } >>> >>> >>> -/* Given a reorder chain, rearrange the code to match. */ >>> +/* Given a reorder chain, rearrange the code to match. If >>> + this is called when we will FINALIZE_REORDER_BLOCKS, or when >>> + section boundary notes were removed on entry to cfg layout >>> + mode, insert section boundary notes here. */ >>> >>> static void >>> -fixup_reorder_chain (void) >>> +fixup_reorder_chain (bool finalize_reorder_blocks) >>> { >>> basic_block bb; >>> rtx insn = NULL; >>> @@ -3150,7 +3373,7 @@ static void >>> PREV_INSN (BB_HEADER (bb)) = insn; >>> insn = BB_HEADER (bb); >>> while (NEXT_INSN (insn)) >>> - insn = NEXT_INSN (insn); >>> + insn = NEXT_INSN (insn); >>> } >>> if (insn) >>> NEXT_INSN (insn) = BB_HEAD (bb); >>> @@ -3175,6 +3398,11 @@ static void >>> insn = NEXT_INSN (insn); >>> >>> set_last_insn (insn); >>> + >>> + /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes. */ >>> + if (had_sec_boundary_notes || finalize_reorder_blocks) >>> + insert_section_boundary_note (); >>> + >>> #ifdef ENABLE_CHECKING >>> verify_insn_chain (); >>> #endif >>> @@ -3187,7 +3415,7 @@ static void >>> edge e_fall, e_taken, e; >>> rtx bb_end_insn; >>> rtx ret_label = NULL_RTX; >>> - basic_block nb, src_bb; >>> + basic_block nb; >>> edge_iterator ei; >>> >>> if (EDGE_COUNT (bb->succs) == 0) >>> @@ -3322,7 +3550,6 @@ static void >>> /* We got here if we need to add a new jump insn. 
>>> Note force_nonfallthru can delete E_FALL and thus we have to >>> save E_FALL->src prior to the call to force_nonfallthru. */ >>> - src_bb = e_fall->src; >>> nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, ret_label); >>> if (nb) >>> { >>> @@ -3330,17 +3557,6 @@ static void >>> bb->aux = nb; >>> /* Don't process this new block. */ >>> bb = nb; >>> - >>> - /* Make sure new bb is tagged for correct section (same as >>> - fall-thru source, since you cannot fall-thru across >>> - section boundaries). */ >>> - BB_COPY_PARTITION (src_bb, single_pred (bb)); >>> - if (flag_reorder_blocks_and_partition >>> - && targetm_common.have_named_sections >>> - && JUMP_P (BB_END (bb)) >>> - && !any_condjump_p (BB_END (bb)) >>> - && (EDGE_SUCC (bb, 0)->flags & EDGE_CROSSING)) >>> - add_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX); >>> } >>> } >>> >>> @@ -3644,10 +3860,11 @@ duplicate_insn_chain (rtx from, rtx to) >>> case NOTE_INSN_FUNCTION_BEG: >>> /* There is always just single entry to function. */ >>> case NOTE_INSN_BASIC_BLOCK: >>> + /* We should only switch text sections once. */ >>> + case NOTE_INSN_SWITCH_TEXT_SECTIONS: >>> break; >>> >>> case NOTE_INSN_EPILOGUE_BEG: >>> - case NOTE_INSN_SWITCH_TEXT_SECTIONS: >>> emit_note_copy (insn); >>> break; >>> >>> @@ -3759,10 +3976,13 @@ break_superblocks (void) >>> } >>> >>> /* Finalize the changes: reorder insn list according to the sequence specified >>> - by aux pointers, enter compensation code, rebuild scope forest. */ >>> + by aux pointers, enter compensation code, rebuild scope forest. If >>> + this is called when we will FINALIZE_REORDER_BLOCKS, indicate that >>> + to fixup_reorder_chain so that it can insert the proper switch text >>> + section notes. 
*/ >>> >>> void >>> -cfg_layout_finalize (void) >>> +cfg_layout_finalize (bool finalize_reorder_blocks) >>> { >>> #ifdef ENABLE_CHECKING >>> verify_flow_info (); >>> @@ -3775,7 +3995,7 @@ void >>> #endif >>> ) >>> fixup_fallthru_exit_predecessor (); >>> - fixup_reorder_chain (); >>> + fixup_reorder_chain (finalize_reorder_blocks); >>> >>> rebuild_jump_labels (get_insns ()); >>> delete_dead_jumptables (); >>> @@ -4454,8 +4674,7 @@ rtl_can_remove_branch_p (const_edge e) >>> if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH)) >>> return false; >>> >>> - if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX) >>> - || BB_PARTITION (src) != BB_PARTITION (target)) >>> + if (BB_PARTITION (src) != BB_PARTITION (target)) >>> return false; >>> >>> if (!onlyjump_p (insn) >>> >>> -- >>> This patch is available for review at http://codereview.appspot.com/6823047 >> >> >> >> -- >> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413 -- Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
Here is the patch again, updated to use the new vec implementation.

Thanks,
Teresa

Revised patch that fixes failures encountered when enabling -freorder-blocks-and-partition, including the failure reported in PR 53743. This includes new verification code, contributed by Steven Bosscher, to ensure that no cold blocks dominate hot blocks.

I attempted to make the handling of partition updates through the optimization passes much more consistent, removing a number of partial fixes in the code stream in the process. The partition fixup (including the BB_PARTITION assignment, region crossing jump notes, and switch text section notes) is now handled in a few centralized locations, for example inside rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that callers don't need to attempt the fixup themselves.

For optimization passes that make adjustments to the cfg while in cfg layout mode that are not easy to fix up incrementally, the new routine fixup_partitions handles the cleanup globally. This does require calculating the dominance relation; however, as far as I can tell, the routines that now invoke this global fixup (try_optimize_cfg and commit_edge_insertions) are typically invoked once per optimization pass (or a small number of times in the case of try_optimize_cfg). Additionally, I compared the -ftime-report output for some large fdo compilations and saw only minimal increases in the dominance computation times, which were only a tiny percentage of the overall compile time.

I also added a flag to the rtl_data structure to indicate whether any partitioning was actually performed, so that optimizations which were conservatively disabled whenever flag_reorder_blocks_and_partition is enabled (e.g. try_crossjump_to_edge, part of connect_traces) can be less conservative for functions where no partitions were formed (e.g. they are completely hot).

Bootstrapped and tested on x86_64-unknown-linux-gnu.
Also tested with SPEC2006 int benchmarks and internal google benchmarks using profile feedback and -freorder-blocks-and-partition to get more coverage.

Ok for trunk?

Thanks,
Teresa

2012-11-26  Teresa Johnson  <tejohnson@google.com>
	    Steven Bosscher  <steven@gcc.gnu.org>

	* cfghooks.h (cfg_layout_finalize): New parameter.
	* modulo-sched.c (rest_of_handle_sms): New cfg_layout_finalize
	parameter.
	* ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert
	as this is now done by redirect_edge_and_branch_force.
	* function.c (thread_prologue_and_epilogue_insns): Insert new bb after
	barriers, new cfg_layout_finalize parameter, and don't store exit
	predecessor BB until after it is potentially split.
	* function.h (struct rtl_data): New flag has_bb_partition.
	* hw-doloop.c (reorder_loops): New cfg_layout_finalize parameter.
	* cfgcleanup.c (try_crossjump_to_edge): Only skip optimization if
	any blocks in function actually partitioned.
	(try_optimize_cfg): If cfg changed, invoke fixup_partitions to
	clean up partitioning.
	* bb-reorder.c (connect_traces): Only look for partitions and skip
	block copying if any blocks in function actually partitioned.
	(emit_barrier_after_bb): Handle insertion in non-cfglayout mode.
	(find_rarely_executed_basic_blocks_and_crossing_edges): Ensure that
	no cold blocks dominate a hot block.
	(fix_up_fall_thru_edges): Replace BB_COPY_PARTITION with assert
	as this is now done by force_nonfallthru_and_redirect.
	(add_reg_crossing_jump_notes): Handle the fact that some jumps may
	already be marked with region crossing note.
	(reorder_basic_blocks): Only need to verify partitions if any
	blocks in function actually partitioned.
	(insert_section_boundary_note): Only need to insert note if any
	blocks in function actually partitioned.
	(rest_of_handle_reorder_blocks): New cfg_layout_finalize parameter,
	and remove call to insert_section_boundary_note as this is now
	called via cfg_layout_finalize/fixup_reorder_chain.
	(duplicate_computed_gotos): New cfg_layout_finalize parameter.
	(partition_hot_cold_basic_blocks): Set flag indicating function
	has bb partitions.
	* bb-reorder.h: Declare insert_section_boundary_note and
	emit_barrier_after_bb, which are no longer static.
	* basic-block.h: Declare new function fixup_partitions.
	* cfgrtl.c (try_redirect_by_replacing_jump): Remove unnecessary
	check for region crossing note.
	(fixup_partition_crossing): New function.
	(fixup_bb_partition): Ditto.
	(rtl_redirect_edge_and_branch): Fixup partition boundaries.
	(force_nonfallthru_and_redirect): Fixup partition boundaries,
	remove old code that tried to do this. Emit barrier correctly
	when we are in cfglayout mode.
	(rtl_split_edge): Correctly fixup partition boundaries.
	(commit_one_edge_insertion): Remove old code that tried to
	fixup region crossing edge since this is now handled in
	split_block, and set up insertion point correctly since
	block may now end in a jump.
	(commit_edge_insertions): Invoke fixup_partitions to sanitize
	partition boundaries after optimizations that modify cfg and
	before trying to verify the flow info.
	(fixup_partitions): New function.
	(rtl_verify_flow_info_1): Add verification that no cold bbs
	dominate hot bbs.
	(record_effective_endpoints): Remove region-crossing notes and set
	flag indicating that they need to be reinserted on exit from
	cfglayout mode.
	(outof_cfg_layout_mode): New cfg_layout_finalize parameter.
	(fixup_reorder_chain): Call insert_section_boundary_note if
	necessary. Remove old code that attempted to fixup region
	crossing note as this is now handled in
	force_nonfallthru_and_redirect.
	(duplicate_insn_chain): Don't duplicate switch section notes.
	(cfg_layout_finalize): Pass new parameter to fixup_reorder_chain.
	(rtl_can_remove_branch_p): Remove unnecessary check for region
	crossing note.
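
For readers following the fixup_partitions and rtl_verify_flow_info_1 entries above, the invariant they deal with (a block in the cold partition must not dominate a block in the hot partition) can be illustrated outside GCC on a toy dominator tree. The following is only a minimal sketch under stated assumptions, not the patch's code: `struct toy_bb`, `propagate_cold`, `MAX_BBS`, and the fixed-size worklist are hypothetical stand-ins for GCC's basic_block, the first_dom_son/next_dom_son walk, and the vec worklist, and the region crossing updates done by fixup_bb_partition are omitted.

```c
#include <assert.h>

#define MAX_BBS 16   /* Hypothetical upper bound for this toy example.  */

enum partition { HOT, COLD };

/* Hypothetical toy block: IDOM is the index of the immediate dominator,
   or -1 for the entry block.  This is not GCC's basic_block.  */
struct toy_bb
{
  int idom;
  enum partition part;
};

/* Re-mark as cold every block that is dominated by a cold block, so that
   afterwards no cold block dominates a hot block.  Returns the number of
   blocks that changed partition.  */
int
propagate_cold (struct toy_bb *bbs, int n)
{
  int worklist[MAX_BBS];
  int top = 0;
  int changed = 0;

  assert (n <= MAX_BBS);

  /* Seed the worklist with the blocks that are already cold.  */
  for (int i = 0; i < n; i++)
    if (bbs[i].part == COLD)
      worklist[top++] = i;

  /* Each block is pushed at most once (either as a seed or when it is
     re-marked), so the worklist cannot overflow.  */
  while (top > 0)
    {
      int bb = worklist[--top];

      /* Visit the dominator-tree children of BB.  */
      for (int son = 0; son < n; son++)
        if (bbs[son].idom == bb && bbs[son].part != COLD)
          {
            bbs[son].part = COLD;
            changed++;
            worklist[top++] = son;
          }
    }

  return changed;
}
```

With five blocks where block 1 is cold, block 2 is immediately dominated by 1, block 4 by 2, and block 3 by the hot entry block 0, calling `propagate_cold` re-marks blocks 2 and 4 and leaves block 3 hot. The sketch marks a child when it is discovered rather than when it is popped; the resulting partition assignment should be the same, but that ordering is a simplification made here, not something taken from the patch.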
Index: cfghooks.h =================================================================== --- cfghooks.h (revision 193827) +++ cfghooks.h (working copy) @@ -204,7 +204,7 @@ extern void copy_bbs (basic_block *, unsigned, bas void account_profile_record (struct profile_record *, int); extern void cfg_layout_initialize (unsigned int); -extern void cfg_layout_finalize (void); +extern void cfg_layout_finalize (bool); /* Hooks containers. */ extern struct cfg_hooks gimple_cfg_hooks; @@ -218,4 +218,3 @@ extern void cfg_layout_rtl_register_cfg_hooks (voi extern void gimple_register_cfg_hooks (void); extern struct cfg_hooks get_cfg_hooks (void); extern void set_cfg_hooks (struct cfg_hooks); - Index: modulo-sched.c =================================================================== --- modulo-sched.c (revision 193827) +++ modulo-sched.c (working copy) @@ -3347,7 +3347,7 @@ rest_of_handle_sms (void) if (bb->next_bb != EXIT_BLOCK_PTR) bb->aux = bb->next_bb; free_dominance_info (CDI_DOMINATORS); - cfg_layout_finalize (); + cfg_layout_finalize (false); #endif /* INSN_SCHEDULING */ return 0; } Index: ifcvt.c =================================================================== --- ifcvt.c (revision 193827) +++ ifcvt.c (working copy) @@ -3899,10 +3899,9 @@ find_if_case_1 (basic_block test_bb, edge then_edg if (new_bb) { df_bb_replace (then_bb_index, new_bb); - /* Since the fallthru edge was redirected from test_bb to new_bb, - we need to ensure that new_bb is in the same partition as - test bb (you can not fall through across section boundaries). */ - BB_COPY_PARTITION (new_bb, test_bb); + /* This should have been done above via force_nonfallthru_and_redirect + (possibly called from redirect_edge_and_branch_force). 
*/ + gcc_assert (BB_PARTITION (new_bb) == BB_PARTITION (test_bb)); } num_true_changes++; Index: function.c =================================================================== --- function.c (revision 193827) +++ function.c (working copy) @@ -6246,8 +6246,10 @@ thread_prologue_and_epilogue_insns (void) break; if (e) { - copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)), - NULL_RTX, e->src); + /* Make sure we insert after any barriers. */ + rtx end = get_last_bb_insn (e->src); + copy_bb = create_basic_block (NEXT_INSN (end), + NULL_RTX, e->src); BB_COPY_PARTITION (copy_bb, e->src); } else @@ -6472,7 +6474,7 @@ thread_prologue_and_epilogue_insns (void) if (cur_bb->index >= NUM_FIXED_BLOCKS && cur_bb->next_bb->index >= NUM_FIXED_BLOCKS) cur_bb->aux = cur_bb->next_bb; - cfg_layout_finalize (); + cfg_layout_finalize (false); } epilogue_done: @@ -6514,7 +6516,7 @@ epilogue_done: basic_block simple_return_block_cold = NULL; edge pending_edge_hot = NULL; edge pending_edge_cold = NULL; - basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb; + basic_block exit_pred; int i; gcc_assert (entry_edge != orig_entry_edge); @@ -6542,6 +6544,12 @@ epilogue_done: else pending_edge_cold = e; } + + /* Save a pointer to the exit's predecessor BB for use in + inserting new BBs at the end of the function. Do this + after the call to split_block above which may split + the original exit pred. */ + exit_pred = EXIT_BLOCK_PTR->prev_bb; FOR_EACH_VEC_ELT (unconverted_simple_returns, i, e) { Index: function.h =================================================================== --- function.h (revision 193827) +++ function.h (working copy) @@ -451,6 +451,11 @@ struct GTY(()) rtl_data { sched2) and is useful only if the port defines LEAF_REGISTERS. */ bool uses_only_leaf_regs; + /* Nonzero if the function being compiled has undergone hot/cold partitioning + (under flag_reorder_blocks_and_partition) and has at least one cold + block. 
*/ + bool has_bb_partition; + /* Like regs_ever_live, but 1 if a reg is set or clobbered from an asm. Unlike regs_ever_live, elements of this array corresponding to eliminable regs (like the frame pointer) are set if an asm Index: hw-doloop.c =================================================================== --- hw-doloop.c (revision 193827) +++ hw-doloop.c (working copy) @@ -547,7 +547,7 @@ reorder_loops (hwloop_info loops) else bb->aux = NULL; } - cfg_layout_finalize (); + cfg_layout_finalize (false); clear_aux_for_blocks (); df_analyze (); } Index: cfgcleanup.c =================================================================== --- cfgcleanup.c (revision 193827) +++ cfgcleanup.c (working copy) @@ -1846,7 +1846,7 @@ try_crossjump_to_edge (int mode, edge e1, edge e2, partition boundaries). See the comments at the top of bb-reorder.c:partition_hot_cold_basic_blocks for complete details. */ - if (flag_reorder_blocks_and_partition && reload_completed) + if (crtl->has_bb_partition && reload_completed) return false; /* Search backward through forwarder blocks. We don't need to worry @@ -2789,10 +2789,21 @@ try_optimize_cfg (int mode) df_analyze (); } + if (changed) + { + /* Edge forwarding in particular can cause hot blocks previously + reached by both hot and cold blocks to become dominated only + by cold blocks. This will cause the verification below to fail, + and lead to now cold code in the hot section. This is not easy + to detect and fix during edge forwarding, and in some cases + is only visible after newly unreachable blocks are deleted, + which will be done in fixup_partitions. 
*/ + fixup_partitions (); + #ifdef ENABLE_CHECKING - if (changed) - verify_flow_info (); + verify_flow_info (); #endif + } changed_overall |= changed; first_pass = false; Index: bb-reorder.c =================================================================== --- bb-reorder.c (revision 193827) +++ bb-reorder.c (working copy) @@ -1054,7 +1054,7 @@ connect_traces (int n_traces, struct trace *traces current_partition = BB_PARTITION (traces[0].first); two_passes = false; - if (flag_reorder_blocks_and_partition) + if (crtl->has_bb_partition) for (i = 0; i < n_traces && !two_passes; i++) if (BB_PARTITION (traces[0].first) != BB_PARTITION (traces[i].first)) @@ -1263,7 +1263,7 @@ connect_traces (int n_traces, struct trace *traces } } - if (flag_reorder_blocks_and_partition) + if (crtl->has_bb_partition) try_copy = false; /* Copy tiny blocks always; copy larger blocks only when the @@ -1381,13 +1381,14 @@ get_uncond_jump_length (void) return length; } -/* Emit a barrier into the footer of BB. */ +/* Emit a barrier after BB, into the footer if we are in CFGLAYOUT mode. */ -static void +void emit_barrier_after_bb (basic_block bb) { rtx barrier = emit_barrier_after (BB_END (bb)); - BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier); + if (current_ir_type () == IR_RTL_CFGLAYOUT) + BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier); } /* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions. @@ -1463,18 +1464,109 @@ find_rarely_executed_basic_blocks_and_crossing_edg { vec<edge> crossing_edges = vNULL; basic_block bb; - edge e; - edge_iterator ei; + edge e, e2; + edge_iterator ei, ei2; + unsigned int cold_bb_count = 0; + vec<basic_block> bbs_in_hot_partition = vNULL; + vec<basic_block> bbs_newly_hot = vNULL; /* Mark which partition (hot/cold) each basic block belongs in. 
*/ FOR_EACH_BB (bb) { if (probably_never_executed_bb_p (cfun, bb)) - BB_SET_PARTITION (bb, BB_COLD_PARTITION); + { + BB_SET_PARTITION (bb, BB_COLD_PARTITION); + cold_bb_count++; + } else - BB_SET_PARTITION (bb, BB_HOT_PARTITION); + { + BB_SET_PARTITION (bb, BB_HOT_PARTITION); + bbs_in_hot_partition.safe_push (bb); + } } + /* Ensure that no cold bbs dominate hot bbs. This could happen as a result of + several different possibilities. One is that there are edge weight insanities + due to optimization phases that do not properly update basic block profile + counts. The second is that the entry of the function may not be hot, because + it is entered fewer times than the number of profile training runs, but there + is a loop inside the function that causes blocks within the function to be + above the threshold for hotness. */ + if (cold_bb_count) + { + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); + + if (dom_calculated_here) + calculate_dominance_info (CDI_DOMINATORS); + + /* Keep examining hot bbs until we have either checked them all, or + re-marked all cold bbs hot. */ + while (! bbs_in_hot_partition.is_empty () + && cold_bb_count) + { + basic_block dom_bb; + + bb = bbs_in_hot_partition.pop (); + dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb); + + /* If bb's immediate dominator is also hot then it is ok. */ + if (BB_PARTITION (dom_bb) != BB_COLD_PARTITION) + continue; + + /* We have a hot bb with an immediate dominator that is cold. + The dominator needs to be re-marked to hot. */ + BB_SET_PARTITION (dom_bb, BB_HOT_PARTITION); + cold_bb_count--; + + /* Now we need to examine newly-hot dom_bb to see if it is also + dominated by a cold bb. */ + bbs_in_hot_partition.safe_push (dom_bb); + + /* We should also adjust any cold blocks that the newly-hot bb + feeds and see if it makes sense to re-mark those as hot as + well. */ + bbs_newly_hot.safe_push (dom_bb); + while (! 
bbs_newly_hot.is_empty ()) + { + basic_block new_hot_bb = bbs_newly_hot.pop (); + /* Examine all successors of this newly-hot bb to see if they + are cold and should be re-marked as hot. */ + FOR_EACH_EDGE (e, ei, new_hot_bb->succs) + { + bool any_cold_preds = false; + basic_block succ = e->dest; + if (BB_PARTITION (succ) != BB_COLD_PARTITION) + continue; + /* Does this block have any cold predecessors now? */ + FOR_EACH_EDGE (e2, ei2, succ->preds) + { + if (BB_PARTITION (e2->src) == BB_COLD_PARTITION) + { + any_cold_preds = true; + break; + } + } + if (any_cold_preds) + continue; + + /* Here we have a successor of newly-hot bb that is cold + but no longer has any cold predecessors. Since the original + assignment of our newly-hot bb was incorrect, this successor's + assignment as cold is also suspect. Go ahead and re-mark it + as hot now too. Better heuristics may be in order here. */ + BB_SET_PARTITION (succ, BB_HOT_PARTITION); + cold_bb_count--; + bbs_in_hot_partition.safe_push (succ); + /* Examine this successor as a newly-hot bb. */ + bbs_newly_hot.safe_push (succ); + } + } + } + + if (dom_calculated_here) + free_dominance_info (CDI_DOMINATORS); + } + /* The format of .gcc_except_table does not allow landing pads to be in a different partition as the throw. Fix this by either moving or duplicating the landing pads. */ @@ -1766,10 +1858,10 @@ fix_up_fall_thru_edges (void) new_bb->aux = cur_bb->aux; cur_bb->aux = new_bb; - /* Make sure new fall-through bb is in same - partition as bb it's falling through from. */ + /* This is done by force_nonfallthru_and_redirect. 
*/ + gcc_assert (BB_PARTITION (new_bb) + == BB_PARTITION (cur_bb)); - BB_COPY_PARTITION (new_bb, cur_bb); single_succ_edge (new_bb)->flags |= EDGE_CROSSING; } else @@ -2067,7 +2159,10 @@ add_reg_crossing_jump_notes (void) FOR_EACH_BB (bb) FOR_EACH_EDGE (e, ei, bb->succs) if ((e->flags & EDGE_CROSSING) - && JUMP_P (BB_END (e->src))) + && JUMP_P (BB_END (e->src)) + /* Some notes were added during fix_up_fall_thru_edges, via + force_nonfallthru_and_redirect. */ + && !find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX)) add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); } @@ -2160,7 +2255,7 @@ reorder_basic_blocks (void) dump_flow_info (dump_file, dump_flags); } - if (flag_reorder_blocks_and_partition) + if (crtl->has_bb_partition) verify_hot_cold_block_grouping (); } @@ -2172,14 +2267,14 @@ reorder_basic_blocks (void) encountering this note will make the compiler switch between the hot and cold text sections. */ -static void +void insert_section_boundary_note (void) { basic_block bb; rtx new_note; int first_partition = 0; - if (!flag_reorder_blocks_and_partition) + if (!crtl->has_bb_partition) return; FOR_EACH_BB (bb) @@ -2222,10 +2317,8 @@ rest_of_handle_reorder_blocks (void) FOR_EACH_BB (bb) if (bb->next_bb != EXIT_BLOCK_PTR) bb->aux = bb->next_bb; - cfg_layout_finalize (); + cfg_layout_finalize (true); - /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes. */ - insert_section_boundary_note (); return 0; } @@ -2366,7 +2459,7 @@ duplicate_computed_gotos (void) } done: - cfg_layout_finalize (); + cfg_layout_finalize (false); BITMAP_FREE (candidates); return 0; @@ -2511,6 +2604,8 @@ partition_hot_cold_basic_blocks (void) if (!crossing_edges.exists ()) return 0; + crtl->has_bb_partition = true; + /* Make sure the source of any crossing edge ends in a jump and the destination of any crossing edge has a label. 
*/ add_labels_and_missing_jumps (crossing_edges); Index: bb-reorder.h =================================================================== --- bb-reorder.h (revision 193827) +++ bb-reorder.h (working copy) @@ -36,4 +36,8 @@ extern struct target_bb_reorder *this_target_bb_re extern int get_uncond_jump_length (void); +extern void insert_section_boundary_note (void); + +extern void emit_barrier_after_bb (basic_block bb); + #endif Index: basic-block.h =================================================================== --- basic-block.h (revision 193827) +++ basic-block.h (working copy) @@ -800,6 +800,7 @@ extern basic_block force_nonfallthru_and_redirect extern bool contains_no_active_insn_p (const_basic_block); extern bool forwarder_block_p (const_basic_block); extern bool can_fallthru (basic_block, basic_block); +extern void fixup_partitions (void); /* In cfgbuild.c. */ extern void find_many_sub_basic_blocks (sbitmap); Index: cfgrtl.c =================================================================== --- cfgrtl.c (revision 193827) +++ cfgrtl.c (working copy) @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3. If not see #include "tree.h" #include "hard-reg-set.h" #include "basic-block.h" +#include "bb-reorder.h" #include "regs.h" #include "flags.h" #include "function.h" @@ -67,11 +68,12 @@ along with GCC; see the file COPYING3. If not see Only applicable if the CFG is in cfglayout mode. */ static GTY(()) rtx cfg_layout_function_footer; static GTY(()) rtx cfg_layout_function_header; +static bool had_sec_boundary_notes; static rtx skip_insns_after_block (basic_block); static void record_effective_endpoints (void); static rtx label_for_bb (basic_block); -static void fixup_reorder_chain (void); +static void fixup_reorder_chain (bool finalize_reorder_blocks); void verify_insn_chain (void); static void fixup_fallthru_exit_predecessor (void); @@ -976,8 +978,7 @@ try_redirect_by_replacing_jump (edge e, basic_bloc partition boundaries). 
See the comments at the top of bb-reorder.c:partition_hot_cold_basic_blocks for complete details. */ - if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX) - || BB_PARTITION (src) != BB_PARTITION (target)) + if (BB_PARTITION (src) != BB_PARTITION (target)) return NULL; /* We can replace or remove a complex jump only when we have exactly @@ -1286,6 +1287,71 @@ redirect_branch_edge (edge e, basic_block target) return e; } +/* Called when edge E has been redirected to a new destination, + in order to update the region crossing flag on the edge and + jump. */ + +static void +fixup_partition_crossing (edge e, basic_block target) +{ + rtx note; + + gcc_assert (e->dest == target); + + if (e->src == ENTRY_BLOCK_PTR || target == EXIT_BLOCK_PTR) + return; + /* If we redirected an existing edge, it may already be marked + crossing, even though the new src is missing a reg crossing note. + But make sure reg crossing note doesn't already exist before + inserting. */ + if (BB_PARTITION (e->src) != BB_PARTITION (target)) + { + e->flags |= EDGE_CROSSING; + note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); + if (JUMP_P (BB_END (e->src)) + && !note) + add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); + } + else if (BB_PARTITION (e->src) == BB_PARTITION (target)) + { + e->flags &= ~EDGE_CROSSING; + /* Remove the region crossing note from jump at end of + e->src if it exists. */ + note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); + if (note) + remove_note (BB_END (e->src), note); + } +} + +/* Called when block BB has been reassigned to a different partition, + to ensure that the region crossing attributes are updated. */ + +static void +fixup_bb_partition (basic_block bb) +{ + edge e; + edge_iterator ei; + + /* Now need to make bb's pred edges non-region crossing. 
*/ + FOR_EACH_EDGE (e, ei, bb->preds) + { + fixup_partition_crossing (e, e->dest); + } + + /* Possibly need to make bb's successor edges region crossing, + or remove stale region crossing. */ + FOR_EACH_EDGE (e, ei, bb->succs) + { + if ((e->flags & EDGE_FALLTHRU) + && BB_PARTITION (bb) != BB_PARTITION (e->dest) + && e->dest != EXIT_BLOCK_PTR) + /* force_nonfallthru_and_redirect calls fixup_partition_crossing. */ + force_nonfallthru (e); + else + fixup_partition_crossing (e, e->dest); + } +} + /* Attempt to change code to redirect edge E to TARGET. Don't do that on expense of adding new instructions or reordering basic blocks. @@ -1302,16 +1368,18 @@ rtl_redirect_edge_and_branch (edge e, basic_block { edge ret; basic_block src = e->src; + basic_block dest = e->dest; if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH)) return NULL; - if (e->dest == target) + if (dest == target) return e; if ((ret = try_redirect_by_replacing_jump (e, target, false)) != NULL) { df_set_bb_dirty (src); + fixup_partition_crossing (ret, target); return ret; } @@ -1320,6 +1388,7 @@ rtl_redirect_edge_and_branch (edge e, basic_block return NULL; df_set_bb_dirty (src); + fixup_partition_crossing (ret, target); return ret; } @@ -1486,18 +1555,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc /* Make sure new block ends up in correct hot/cold section. */ BB_COPY_PARTITION (jump_block, e->src); - if (flag_reorder_blocks_and_partition - && targetm_common.have_named_sections - && JUMP_P (BB_END (jump_block)) - && !any_condjump_p (BB_END (jump_block)) - && (EDGE_SUCC (jump_block, 0)->flags & EDGE_CROSSING)) - add_reg_note (BB_END (jump_block), REG_CROSSING_JUMP, NULL_RTX); /* Wire edge in. */ new_edge = make_edge (e->src, jump_block, EDGE_FALLTHRU); new_edge->probability = probability; new_edge->count = count; + /* If e->src was previously region crossing, it no longer is + and the reg crossing note should be removed. */ + fixup_partition_crossing (new_edge, jump_block); + /* Redirect old edge. 
*/ redirect_edge_pred (e, jump_block); e->probability = REG_BR_PROB_BASE; @@ -1553,13 +1620,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc LABEL_NUSES (label)++; } - emit_barrier_after (BB_END (jump_block)); + /* We might be in cfg layout mode, and if so, the following routine will + insert the barrier correctly. */ + emit_barrier_after_bb (jump_block); redirect_edge_succ_nodup (e, target); if (abnormal_edge_flags) make_edge (src, target, abnormal_edge_flags); df_mark_solutions_dirty (); + fixup_partition_crossing (e, target); return new_bb; } @@ -1658,7 +1734,7 @@ rtl_move_block_after (basic_block bb ATTRIBUTE_UNU static basic_block rtl_split_edge (edge edge_in) { - basic_block bb; + basic_block bb, new_bb; rtx before; /* Abnormal edges cannot be split. */ @@ -1691,12 +1767,26 @@ rtl_split_edge (edge edge_in) else { bb = create_basic_block (before, NULL, edge_in->dest->prev_bb); - /* ??? Why not edge_in->dest->prev_bb here? */ - BB_COPY_PARTITION (bb, edge_in->dest); + if (edge_in->src == ENTRY_BLOCK_PTR) + BB_COPY_PARTITION (bb, edge_in->dest); + else + /* Put the split bb into the src partition, to avoid creating + a situation where a cold bb dominates a hot bb, in the case + where src is cold and dest is hot. The src will dominate + the new bb (whereas it might not have dominated dest). */ + BB_COPY_PARTITION (bb, edge_in->src); } make_single_succ_edge (bb, edge_in->dest, EDGE_FALLTHRU); + /* Can't allow a region crossing edge to be fallthrough. */ + if (BB_PARTITION (bb) != BB_PARTITION (edge_in->dest) + && edge_in->dest != EXIT_BLOCK_PTR) + { + new_bb = force_nonfallthru (single_succ_edge (bb)); + gcc_assert (!new_bb); + } + /* For non-fallthru edges, we must adjust the predecessor's jump instruction to target our new block. 
*/ if ((edge_in->flags & EDGE_FALLTHRU) == 0) @@ -1809,17 +1899,13 @@ commit_one_edge_insertion (edge e) else { bb = split_edge (e); - after = BB_END (bb); - if (flag_reorder_blocks_and_partition - && targetm_common.have_named_sections - && e->src != ENTRY_BLOCK_PTR - && BB_PARTITION (e->src) == BB_COLD_PARTITION - && !(e->flags & EDGE_CROSSING) - && JUMP_P (after) - && !any_condjump_p (after) - && (single_succ_edge (bb)->flags & EDGE_CROSSING)) - add_reg_note (after, REG_CROSSING_JUMP, NULL_RTX); + /* If e crossed a partition boundary, we needed to make bb end in + a region-crossing jump, even though it was originally fallthru. */ + if (JUMP_P (BB_END (bb))) + before = BB_END (bb); + else + after = BB_END (bb); } /* Now that we've found the spot, do the insertion. */ @@ -1859,6 +1945,14 @@ commit_edge_insertions (void) { basic_block bb; + /* Optimization passes that invoke this routine can cause hot blocks + previously reached by both hot and cold blocks to become dominated only + by cold blocks. This will cause the verification below to fail, + and lead to now cold code in the hot section. In some cases this + may only be visible after newly unreachable blocks are deleted, + which will be done by fixup_partitions. */ + fixup_partitions (); + #ifdef ENABLE_CHECKING verify_flow_info (); #endif @@ -2060,7 +2154,75 @@ get_last_bb_insn (basic_block bb) return end; } - + +/* Perform cleanup on the hot/cold bb partitioning after optimization + passes that modify the cfg. */ + +void +fixup_partitions (void) +{ + basic_block bb; + + if (!crtl->has_bb_partition) + return; + + /* Delete any blocks that became unreachable and weren't + already cleaned up, for example during edge forwarding + and convert_jumps_to_returns. This will expose more + opportunities for fixing the partition boundaries here. + Also, the calculation of the dominance graph during verification + will assert if there are unreachable nodes. 
*/ + delete_unreachable_blocks (); + + /* If there are partitions, do a sanity check on them: A basic block in + a cold partition cannot dominate a basic block in a hot partition. + Fixup any that now violate this requirement, as a result of edge + forwarding and unreachable block deletion. */ + vec<basic_block> bbs_in_cold_partition = vNULL; + vec<basic_block> bbs_to_fix = vNULL; + FOR_EACH_BB (bb) + if ((BB_PARTITION (bb) == BB_COLD_PARTITION)) + bbs_in_cold_partition.safe_push (bb); + if (! bbs_in_cold_partition.is_empty ()) + { + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); + basic_block son; + + if (dom_calculated_here) + calculate_dominance_info (CDI_DOMINATORS); + + while (! bbs_in_cold_partition.is_empty ()) + { + bb = bbs_in_cold_partition.pop (); + /* If bb is not yet cold (because it was added below as + a block dominated by a cold bb) then mark it cold here. */ + if ((BB_PARTITION (bb) != BB_COLD_PARTITION)) + { + BB_SET_PARTITION (bb, BB_COLD_PARTITION); + bbs_to_fix.safe_push (bb); + } + /* Any blocks dominated by a block in the cold section + must also be cold. */ + for (son = first_dom_son (CDI_DOMINATORS, bb); + son; + son = next_dom_son (CDI_DOMINATORS, son)) + bbs_in_cold_partition.safe_push (son); + } + + if (dom_calculated_here) + free_dominance_info (CDI_DOMINATORS); + } + + /* Do the partition fixup after all necessary blocks have been converted to + cold, so that we only update the region crossings the minimum number of + places, which can require forcing edges to be non fallthru. */ + while (! bbs_to_fix.is_empty ()) + { + bb = bbs_to_fix.pop (); + fixup_bb_partition (bb); + } +} + /* Verify the CFG and RTL consistency common for both underlying RTL and cfglayout RTL. @@ -2084,6 +2246,7 @@ rtl_verify_flow_info_1 (void) rtx x; int err = 0; basic_block bb; + bool have_partitions = false; /* Check the general integrity of the basic blocks. 
*/ FOR_EACH_BB_REVERSE (bb) @@ -2201,6 +2364,8 @@ rtl_verify_flow_info_1 (void) if (e->flags & EDGE_ABNORMAL) n_abnormal++; + + have_partitions |= is_crossing; } if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, NULL_RTX)) @@ -2325,6 +2490,40 @@ rtl_verify_flow_info_1 (void) } } + /* If there are partitions, do a sanity check on them: A basic block in + a cold partition cannot dominate a basic block in a hot partition. */ + vec<basic_block> bbs_in_cold_partition = vNULL; + if (have_partitions && !err) + FOR_EACH_BB (bb) + if ((BB_PARTITION (bb) == BB_COLD_PARTITION)) + bbs_in_cold_partition.safe_push (bb); + if (! bbs_in_cold_partition.is_empty ()) + { + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); + basic_block son; + + if (dom_calculated_here) + calculate_dominance_info (CDI_DOMINATORS); + + while (! bbs_in_cold_partition.is_empty ()) + { + bb = bbs_in_cold_partition.pop (); + if ((BB_PARTITION (bb) != BB_COLD_PARTITION)) + { + error ("non-cold basic block %d dominated " + "by a block in the cold partition", bb->index); + err = 1; + } + for (son = first_dom_son (CDI_DOMINATORS, bb); + son; + son = next_dom_son (CDI_DOMINATORS, son)) + bbs_in_cold_partition.safe_push (son); + } + + if (dom_calculated_here) + free_dominance_info (CDI_DOMINATORS); + } + /* Clean up. */ return err; } @@ -2997,14 +3196,41 @@ record_effective_endpoints (void) else cfg_layout_function_header = NULL_RTX; + had_sec_boundary_notes = false; + next_insn = get_insns (); FOR_EACH_BB (bb) { rtx end; if (PREV_INSN (BB_HEAD (bb)) && next_insn != BB_HEAD (bb)) - BB_HEADER (bb) = unlink_insn_chain (next_insn, - PREV_INSN (BB_HEAD (bb))); + { + /* Rather than try to keep section boundary notes incrementally + up-to-date through cfg layout optimizations, simply remove them + and flag that they should be re-inserted when exiting + cfg layout mode. 
*/ + rtx check_insn = next_insn; + while (check_insn) + { + if (NOTE_P (check_insn) + && NOTE_KIND (check_insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS) + { + had_sec_boundary_notes |= true; + /* Remove note from chain. Grab new next_insn first. */ + if (next_insn == check_insn) + next_insn = NEXT_INSN (check_insn); + /* Delete note. */ + delete_insn (check_insn); + /* There will only be one. */ + break; + } + check_insn = NEXT_INSN (check_insn); + } + /* If we still have header instructions left after above loop. */ + if (next_insn != BB_HEAD (bb)) + BB_HEADER (bb) = unlink_insn_chain (next_insn, + PREV_INSN (BB_HEAD (bb))); + } end = skip_insns_after_block (bb); if (NEXT_INSN (BB_END (bb)) && BB_END (bb) != end) BB_FOOTER (bb) = unlink_insn_chain (NEXT_INSN (BB_END (bb)), end); @@ -3032,7 +3258,7 @@ outof_cfg_layout_mode (void) if (bb->next_bb != EXIT_BLOCK_PTR) bb->aux = bb->next_bb; - cfg_layout_finalize (); + cfg_layout_finalize (false); return 0; } @@ -3152,10 +3378,13 @@ relink_block_chain (bool stay_in_cfglayout_mode) } -/* Given a reorder chain, rearrange the code to match. */ +/* Given a reorder chain, rearrange the code to match. If + this is called when we will FINALIZE_REORDER_BLOCKS, or when + section boundary notes were removed on entry to cfg layout + mode, insert section boundary notes here. */ static void -fixup_reorder_chain (void) +fixup_reorder_chain (bool finalize_reorder_blocks) { basic_block bb; rtx insn = NULL; @@ -3182,7 +3411,7 @@ static void PREV_INSN (BB_HEADER (bb)) = insn; insn = BB_HEADER (bb); while (NEXT_INSN (insn)) - insn = NEXT_INSN (insn); + insn = NEXT_INSN (insn); } if (insn) NEXT_INSN (insn) = BB_HEAD (bb); @@ -3207,6 +3436,11 @@ static void insn = NEXT_INSN (insn); set_last_insn (insn); + + /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes. 
*/ + if (had_sec_boundary_notes || finalize_reorder_blocks) + insert_section_boundary_note (); + #ifdef ENABLE_CHECKING verify_insn_chain (); #endif @@ -3219,7 +3453,7 @@ static void edge e_fall, e_taken, e; rtx bb_end_insn; rtx ret_label = NULL_RTX; - basic_block nb, src_bb; + basic_block nb; edge_iterator ei; if (EDGE_COUNT (bb->succs) == 0) @@ -3354,7 +3588,6 @@ static void /* We got here if we need to add a new jump insn. Note force_nonfallthru can delete E_FALL and thus we have to save E_FALL->src prior to the call to force_nonfallthru. */ - src_bb = e_fall->src; nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, ret_label); if (nb) { @@ -3362,17 +3595,6 @@ static void bb->aux = nb; /* Don't process this new block. */ bb = nb; - - /* Make sure new bb is tagged for correct section (same as - fall-thru source, since you cannot fall-thru across - section boundaries). */ - BB_COPY_PARTITION (src_bb, single_pred (bb)); - if (flag_reorder_blocks_and_partition - && targetm_common.have_named_sections - && JUMP_P (BB_END (bb)) - && !any_condjump_p (BB_END (bb)) - && (EDGE_SUCC (bb, 0)->flags & EDGE_CROSSING)) - add_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX); } } @@ -3676,10 +3898,11 @@ duplicate_insn_chain (rtx from, rtx to) case NOTE_INSN_FUNCTION_BEG: /* There is always just single entry to function. */ case NOTE_INSN_BASIC_BLOCK: + /* We should only switch text sections once. */ + case NOTE_INSN_SWITCH_TEXT_SECTIONS: break; case NOTE_INSN_EPILOGUE_BEG: - case NOTE_INSN_SWITCH_TEXT_SECTIONS: emit_note_copy (insn); break; @@ -3791,10 +4014,13 @@ break_superblocks (void) } /* Finalize the changes: reorder insn list according to the sequence specified - by aux pointers, enter compensation code, rebuild scope forest. */ + by aux pointers, enter compensation code, rebuild scope forest. If + this is called when we will FINALIZE_REORDER_BLOCKS, indicate that + to fixup_reorder_chain so that it can insert the proper switch text + section notes. 
*/ void -cfg_layout_finalize (void) +cfg_layout_finalize (bool finalize_reorder_blocks) { #ifdef ENABLE_CHECKING verify_flow_info (); @@ -3807,7 +4033,7 @@ void #endif ) fixup_fallthru_exit_predecessor (); - fixup_reorder_chain (); + fixup_reorder_chain (finalize_reorder_blocks); rebuild_jump_labels (get_insns ()); delete_dead_jumptables (); @@ -4486,8 +4712,7 @@ rtl_can_remove_branch_p (const_edge e) if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH)) return false; - if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX) - || BB_PARTITION (src) != BB_PARTITION (target)) + if (BB_PARTITION (src) != BB_PARTITION (target)) return false; if (!onlyjump_p (insn) On Mon, Nov 26, 2012 at 12:19 PM, Teresa Johnson <tejohnson@google.com> wrote: > Are you sure you have all my changes applied? I applied the 4 patches > attached to PR55121 into my trunk checkout that has my fixes, and to a > pristine trunk checkout. I configured and built both for > --target=arm-none-linux-gnueabi, and built using your options, .i file > and gcda file. I can reproduce the failure using the pristine trunk > with your patches but not with my fixed trunk + your patches. (I just > updated to head to pickup recent changes and get the same result. The > vec changes required some manual changes to the patch, which I will > resend shortly.) 
> > Without my fixes: > > $ ~/extra/gcc_trunk_3_arm-eabi/gcc/cc1 -fpreprocessed eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9 > -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp > -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use > -fno-common -o eval.s -freorder-blocks-and-partition > GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi) > compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version > 2.4.2-p1, MPC version 0.8.1 > GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 > GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi) > compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version > 2.4.2-p1, MPC version 0.8.1 > GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 > Compiler executable checksum: d19cc60a2f07de08237a8488bb35cd1a > eval.c: In function ‘Ge’: > eval.c:792:1: internal compiler error: in df_compact_blocks, at df-core.c:1560 > } > ^ > 0x622f71 df_compact_blocks() > ../../gcc_trunk_3/gcc/df-core.c:1560 > 0x5cfcb5 compact_blocks() > ../../gcc_trunk_3/gcc/cfg.c:162 > 0xc9dce0 reorder_basic_blocks > ../../gcc_trunk_3/gcc/bb-reorder.c:2154 > 0xc9dce0 rest_of_handle_reorder_blocks > ../../gcc_trunk_3/gcc/bb-reorder.c:2219 > Please submit a full bug report, > with preprocessed source if appropriate. > Please include the complete backtrace with any bug report. > See <http://gcc.gnu.org/bugs.html> for instructions. 
> > > With my fixes: > > $ ~/extra/gcc_trunk_4_arm-eabi/gcc/cc1 -fpreprocessed eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9 > -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp > -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use > -fno-common -o eval.s -freorder-blocks-and-partition > GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi) > compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version > 2.4.2-p1, MPC version 0.8.1 > GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 > GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi) > compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version > 2.4.2-p1, MPC version 0.8.1 > GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 > Compiler executable checksum: 45b468efa7c981f9afb44c4dac2424f3 > > > Thanks, > Teresa > > On Mon, Nov 26, 2012 at 8:25 AM, Christophe Lyon > <christophe.lyon@linaro.org> wrote: >> Hi, >> >> I have tested your patch on Spec2000 on ARM, and I can still see >> several failures caused by: >> "error: fallthru edge crosses section boundary", including the case >> described in PR55121. >> >> On 26 November 2012 16:55, Teresa Johnson <tejohnson@google.com> wrote: >>> Ping. >>> Teresa >>> >>> On Thu, Nov 15, 2012 at 12:10 PM, Teresa Johnson <tejohnson@google.com> wrote: >>>> Revised patch that fixes failures encountered when enabling >>>> -freorder-blocks-and-partition, including the failure reported in PR 53743. >>>> >>>> This includes new verification code, contributed by Steven Bosscher, to >>>> ensure that no cold blocks dominate hot blocks. >>>> >>>> I attempted to make the handling of partition updates through the optimization >>>> passes much more consistent, removing a number of partial fixes in the code >>>> stream in the process. 
The code to fix up partitions (including the BB_PARTITION >>>> assignment, region crossing jump notes, and switch text section notes) is >>>> now handled in a few centralized locations, for example inside >>>> rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that callers >>>> don't need to attempt the fixup themselves. >>>> >>>> For optimization passes that make adjustments to the cfg while in cfg layout >>>> mode that are not easy to fix up incrementally, the new routine >>>> fixup_partitions handles the cleanup globally. This does require calculation >>>> of the dominance relation; however, as far as I can tell, the routines which >>>> now invoke this global fixup (try_optimize_cfg and commit_edge_insertions) >>>> are typically invoked once (or a small number of times in the case of >>>> try_optimize_cfg) per optimization pass. Additionally, I compared the >>>> -ftime-report output for some large fdo compilations and saw only minimal >>>> increases in the dominance computation times, which were only a tiny percentage >>>> of the overall compile time. >>>> >>>> Additionally, I added a flag to the rtl_data structure to indicate whether >>>> any partitioning was actually performed, so that optimizations which were >>>> conservatively disabled whenever flag_reorder_blocks_and_partition >>>> is enabled (e.g. try_crossjump_to_edge, part of connect_traces) can be less >>>> conservative for functions where no partitions were formed (e.g. they are >>>> completely hot). >>>> >>>> Bootstrapped and tested on x86_64-unknown-linux-gnu. Also tested with SPEC2006 int >>>> benchmarks and internal Google benchmarks using profile feedback and >>>> -freorder-blocks-and-partition to get more coverage. Ok for trunk? >>>> >>>> Thanks, >>>> Teresa >>>> >>>> 2012-11-14 Teresa Johnson <tejohnson@google.com> >>>> Steven Bosscher <steven@gcc.gnu.org> >>>> >>>> * cfghooks.h (cfg_layout_finalize): New parameter. 
>>>> * modulo-sched.c (rest_of_handle_sms): New cfg_layout_finalize
>>>> parameter.
>>>> * ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert
>>>> as this is now done by redirect_edge_and_branch_force.
>>>> * function.c (thread_prologue_and_epilogue_insns): Insert new bb after
>>>> barriers, new cfg_layout_finalize parameter, and don't store exit
>>>> predecessor BB until after it is potentially split.
>>>> * function.h (struct rtl_data): New flag has_bb_partition.
>>>> * hw-doloop.c (reorder_loops): New cfg_layout_finalize parameter.
>>>> * cfgcleanup.c (try_crossjump_to_edge): Only skip optimization if
>>>> any blocks in function actually partitioned.
>>>> (try_optimize_cfg): If cfg changed, invoke fixup_partitions to clean
>>>> up partitioning.
>>>> * bb-reorder.c (connect_traces): Only look for partitions and skip
>>>> block copying if any blocks in function actually partitioned.
>>>> (emit_barrier_after_bb): Handle insertion in non-cfglayout mode.
>>>> (find_rarely_executed_basic_blocks_and_crossing_edges): Ensure
>>>> that no cold blocks dominate a hot block.
>>>> (fix_up_fall_thru_edges): Replace BB_COPY_PARTITION with assert
>>>> as this is now done by force_nonfallthru_and_redirect.
>>>> (add_reg_crossing_jump_notes): Handle the fact that some jumps may
>>>> already be marked with region crossing note.
>>>> (reorder_basic_blocks): Only need to verify partitions if any
>>>> blocks in function actually partitioned.
>>>> (insert_section_boundary_note): Only need to insert note if any
>>>> blocks in function actually partitioned.
>>>> (rest_of_handle_reorder_blocks): New cfg_layout_finalize
>>>> parameter, and remove call to insert_section_boundary_note as this
>>>> is now called via cfg_layout_finalize/fixup_reorder_chain.
>>>> (duplicate_computed_gotos): New cfg_layout_finalize
>>>> parameter.
>>>> (partition_hot_cold_basic_blocks): Set flag indicating function
>>>> has bb partitions.
>>>> * bb-reorder.h: Declare insert_section_boundary_note and
>>>> emit_barrier_after_bb, which are no longer static.
>>>> * basic-block.h: Declare new function fixup_partitions.
>>>> * cfgrtl.c (try_redirect_by_replacing_jump): Remove unnecessary
>>>> check for region crossing note.
>>>> (fixup_partition_crossing): New function.
>>>> (fixup_bb_partition): Ditto.
>>>> (rtl_redirect_edge_and_branch): Fixup partition boundaries.
>>>> (force_nonfallthru_and_redirect): Fixup partition boundaries,
>>>> remove old code that tried to do this. Emit barrier correctly
>>>> when we are in cfglayout mode.
>>>> (rtl_split_edge): Correctly fixup partition boundaries.
>>>> (commit_one_edge_insertion): Remove old code that tried to
>>>> fixup region crossing edge since this is now handled in
>>>> split_block, and set up insertion point correctly since
>>>> block may now end in a jump.
>>>> (commit_edge_insertions): Invoke fixup_partitions to sanitize partition
>>>> boundaries after optimizations that modify cfg and before trying to
>>>> verify the flow info.
>>>> (fixup_partitions): New function.
>>>> (rtl_verify_flow_info_1): Add verification that no cold bbs dominate
>>>> hot bbs.
>>>> (record_effective_endpoints): Remove region-crossing notes and set flag
>>>> indicating that they need to be reinserted on exit from cfglayout mode.
>>>> (outof_cfg_layout_mode): New cfg_layout_finalize parameter.
>>>> (fixup_reorder_chain): Call insert_section_boundary_note if necessary.
>>>> Remove old code that attempted to fixup region crossing note as
>>>> this is now handled in force_nonfallthru_and_redirect.
>>>> (duplicate_insn_chain): Don't duplicate switch section notes.
>>>> (cfg_layout_finalize): Pass new parameter to fixup_reorder_chain.
>>>> (rtl_can_remove_branch_p): Remove unnecessary check for region crossing
>>>> note.
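The invariant that the message above asks fixup_partitions to restore (no cold block may dominate a hot block) can be illustrated with a small standalone C sketch. This is a toy model under stated assumptions, not the code from the patch: the names toy_partition and toy_fixup_partitions are hypothetical, and the dominator tree is a plain array of immediate-dominator indices rather than GCC's dominance info. The real routine also updates region-crossing edges and notes afterwards, which the sketch leaves out.

```c
/* Toy model of the hot/cold dominance invariant.  All names here are
   hypothetical and are not part of GCC.  */

enum toy_partition { TOY_HOT, TOY_COLD };

/* IDOM[i] is the index of the immediate dominator of block i, or -1
   for the entry block.  Re-mark as cold every hot block that has a
   cold block somewhere on its dominator chain, and return the number
   of blocks that were re-marked.  */
static int
toy_fixup_partitions (const int idom[], enum toy_partition part[], int n)
{
  int fixed = 0;

  for (int bb = 0; bb < n; bb++)
    {
      if (part[bb] == TOY_COLD)
        continue;

      /* Walk up the dominator chain looking for a cold dominator.  */
      for (int d = idom[bb]; d != -1; d = idom[d])
        if (part[d] == TOY_COLD)
          {
            part[bb] = TOY_COLD;
            fixed++;
            break;
          }
    }

  return fixed;
}
```

For example, with immediate dominators { -1, 0, 1, 1, 0, 4 } and only block 1 cold, blocks 2 and 3 are re-marked cold because block 1 dominates them, while blocks 0, 4 and 5 stay hot. The walk up the chain is quadratic in the worst case, which is acceptable for a sketch; the patch itself walks the dominator tree downwards from the cold blocks with a worklist.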
>>>> >>>> Index: cfghooks.h >>>> =================================================================== >>>> --- cfghooks.h (revision 193376) >>>> +++ cfghooks.h (working copy) >>>> @@ -204,7 +204,7 @@ extern void copy_bbs (basic_block *, unsigned, bas >>>> void account_profile_record (struct profile_record *, int); >>>> >>>> extern void cfg_layout_initialize (unsigned int); >>>> -extern void cfg_layout_finalize (void); >>>> +extern void cfg_layout_finalize (bool); >>>> >>>> /* Hooks containers. */ >>>> extern struct cfg_hooks gimple_cfg_hooks; >>>> @@ -218,4 +218,3 @@ extern void cfg_layout_rtl_register_cfg_hooks (voi >>>> extern void gimple_register_cfg_hooks (void); >>>> extern struct cfg_hooks get_cfg_hooks (void); >>>> extern void set_cfg_hooks (struct cfg_hooks); >>>> - >>>> Index: modulo-sched.c >>>> =================================================================== >>>> --- modulo-sched.c (revision 193376) >>>> +++ modulo-sched.c (working copy) >>>> @@ -3354,7 +3354,7 @@ rest_of_handle_sms (void) >>>> if (bb->next_bb != EXIT_BLOCK_PTR) >>>> bb->aux = bb->next_bb; >>>> free_dominance_info (CDI_DOMINATORS); >>>> - cfg_layout_finalize (); >>>> + cfg_layout_finalize (false); >>>> #endif /* INSN_SCHEDULING */ >>>> return 0; >>>> } >>>> Index: ifcvt.c >>>> =================================================================== >>>> --- ifcvt.c (revision 193376) >>>> +++ ifcvt.c (working copy) >>>> @@ -3900,10 +3900,9 @@ find_if_case_1 (basic_block test_bb, edge then_edg >>>> if (new_bb) >>>> { >>>> df_bb_replace (then_bb_index, new_bb); >>>> - /* Since the fallthru edge was redirected from test_bb to new_bb, >>>> - we need to ensure that new_bb is in the same partition as >>>> - test bb (you can not fall through across section boundaries). */ >>>> - BB_COPY_PARTITION (new_bb, test_bb); >>>> + /* This should have been done above via force_nonfallthru_and_redirect >>>> + (possibly called from redirect_edge_and_branch_force). 
*/ >>>> + gcc_assert (BB_PARTITION (new_bb) == BB_PARTITION (test_bb)); >>>> } >>>> >>>> num_true_changes++; >>>> Index: function.c >>>> =================================================================== >>>> --- function.c (revision 193376) >>>> +++ function.c (working copy) >>>> @@ -6249,8 +6249,10 @@ thread_prologue_and_epilogue_insns (void) >>>> break; >>>> if (e) >>>> { >>>> - copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)), >>>> - NULL_RTX, e->src); >>>> + /* Make sure we insert after any barriers. */ >>>> + rtx end = get_last_bb_insn (e->src); >>>> + copy_bb = create_basic_block (NEXT_INSN (end), >>>> + NULL_RTX, e->src); >>>> BB_COPY_PARTITION (copy_bb, e->src); >>>> } >>>> else >>>> @@ -6475,7 +6477,7 @@ thread_prologue_and_epilogue_insns (void) >>>> if (cur_bb->index >= NUM_FIXED_BLOCKS >>>> && cur_bb->next_bb->index >= NUM_FIXED_BLOCKS) >>>> cur_bb->aux = cur_bb->next_bb; >>>> - cfg_layout_finalize (); >>>> + cfg_layout_finalize (false); >>>> } >>>> >>>> epilogue_done: >>>> @@ -6517,7 +6519,7 @@ epilogue_done: >>>> basic_block simple_return_block_cold = NULL; >>>> edge pending_edge_hot = NULL; >>>> edge pending_edge_cold = NULL; >>>> - basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb; >>>> + basic_block exit_pred; >>>> int i; >>>> >>>> gcc_assert (entry_edge != orig_entry_edge); >>>> @@ -6545,6 +6547,12 @@ epilogue_done: >>>> else >>>> pending_edge_cold = e; >>>> } >>>> + >>>> + /* Save a pointer to the exit's predecessor BB for use in >>>> + inserting new BBs at the end of the function. Do this >>>> + after the call to split_block above which may split >>>> + the original exit pred. 
*/ >>>> + exit_pred = EXIT_BLOCK_PTR->prev_bb; >>>> >>>> FOR_EACH_VEC_ELT (edge, unconverted_simple_returns, i, e) >>>> { >>>> Index: function.h >>>> =================================================================== >>>> --- function.h (revision 193376) >>>> +++ function.h (working copy) >>>> @@ -459,6 +459,11 @@ struct GTY(()) rtl_data { >>>> sched2) and is useful only if the port defines LEAF_REGISTERS. */ >>>> bool uses_only_leaf_regs; >>>> >>>> + /* Nonzero if the function being compiled has undergone hot/cold partitioning >>>> + (under flag_reorder_blocks_and_partition) and has at least one cold >>>> + block. */ >>>> + bool has_bb_partition; >>>> + >>>> /* Like regs_ever_live, but 1 if a reg is set or clobbered from an >>>> asm. Unlike regs_ever_live, elements of this array corresponding >>>> to eliminable regs (like the frame pointer) are set if an asm >>>> Index: hw-doloop.c >>>> =================================================================== >>>> --- hw-doloop.c (revision 193376) >>>> +++ hw-doloop.c (working copy) >>>> @@ -547,7 +547,7 @@ reorder_loops (hwloop_info loops) >>>> else >>>> bb->aux = NULL; >>>> } >>>> - cfg_layout_finalize (); >>>> + cfg_layout_finalize (false); >>>> clear_aux_for_blocks (); >>>> df_analyze (); >>>> } >>>> Index: cfgcleanup.c >>>> =================================================================== >>>> --- cfgcleanup.c (revision 193376) >>>> +++ cfgcleanup.c (working copy) >>>> @@ -1824,7 +1824,7 @@ try_crossjump_to_edge (int mode, edge e1, edge e2, >>>> partition boundaries). See the comments at the top of >>>> bb-reorder.c:partition_hot_cold_basic_blocks for complete details. */ >>>> >>>> - if (flag_reorder_blocks_and_partition && reload_completed) >>>> + if (crtl->has_bb_partition && reload_completed) >>>> return false; >>>> >>>> /* Search backward through forwarder blocks. 
We don't need to worry >>>> @@ -2767,10 +2767,21 @@ try_optimize_cfg (int mode) >>>> df_analyze (); >>>> } >>>> >>>> + if (changed) >>>> + { >>>> + /* Edge forwarding in particular can cause hot blocks previously >>>> + reached by both hot and cold blocks to become dominated only >>>> + by cold blocks. This will cause the verification below to fail, >>>> + and lead to now cold code in the hot section. This is not easy >>>> + to detect and fix during edge forwarding, and in some cases >>>> + is only visible after newly unreachable blocks are deleted, >>>> + which will be done in fixup_partitions. */ >>>> + fixup_partitions (); >>>> + >>>> #ifdef ENABLE_CHECKING >>>> - if (changed) >>>> - verify_flow_info (); >>>> + verify_flow_info (); >>>> #endif >>>> + } >>>> >>>> changed_overall |= changed; >>>> first_pass = false; >>>> Index: bb-reorder.c >>>> =================================================================== >>>> --- bb-reorder.c (revision 193376) >>>> +++ bb-reorder.c (working copy) >>>> @@ -1054,7 +1054,7 @@ connect_traces (int n_traces, struct trace *traces >>>> current_partition = BB_PARTITION (traces[0].first); >>>> two_passes = false; >>>> >>>> - if (flag_reorder_blocks_and_partition) >>>> + if (crtl->has_bb_partition) >>>> for (i = 0; i < n_traces && !two_passes; i++) >>>> if (BB_PARTITION (traces[0].first) >>>> != BB_PARTITION (traces[i].first)) >>>> @@ -1263,7 +1263,7 @@ connect_traces (int n_traces, struct trace *traces >>>> } >>>> } >>>> >>>> - if (flag_reorder_blocks_and_partition) >>>> + if (crtl->has_bb_partition) >>>> try_copy = false; >>>> >>>> /* Copy tiny blocks always; copy larger blocks only when the >>>> @@ -1381,13 +1381,14 @@ get_uncond_jump_length (void) >>>> return length; >>>> } >>>> >>>> -/* Emit a barrier into the footer of BB. */ >>>> +/* Emit a barrier after BB, into the footer if we are in CFGLAYOUT mode. 
*/ >>>> >>>> -static void >>>> +void >>>> emit_barrier_after_bb (basic_block bb) >>>> { >>>> rtx barrier = emit_barrier_after (BB_END (bb)); >>>> - BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier); >>>> + if (current_ir_type () == IR_RTL_CFGLAYOUT) >>>> + BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier); >>>> } >>>> >>>> /* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions. >>>> @@ -1463,18 +1464,109 @@ find_rarely_executed_basic_blocks_and_crossing_edg >>>> { >>>> VEC(edge, heap) *crossing_edges = NULL; >>>> basic_block bb; >>>> - edge e; >>>> - edge_iterator ei; >>>> + edge e, e2; >>>> + edge_iterator ei, ei2; >>>> + unsigned int cold_bb_count = 0; >>>> + VEC (basic_block, heap) *bbs_in_hot_partition = NULL; >>>> + VEC (basic_block, heap) *bbs_newly_hot = NULL; >>>> >>>> /* Mark which partition (hot/cold) each basic block belongs in. */ >>>> FOR_EACH_BB (bb) >>>> { >>>> if (probably_never_executed_bb_p (cfun, bb)) >>>> - BB_SET_PARTITION (bb, BB_COLD_PARTITION); >>>> + { >>>> + BB_SET_PARTITION (bb, BB_COLD_PARTITION); >>>> + cold_bb_count++; >>>> + } >>>> else >>>> - BB_SET_PARTITION (bb, BB_HOT_PARTITION); >>>> + { >>>> + BB_SET_PARTITION (bb, BB_HOT_PARTITION); >>>> + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, bb); >>>> + } >>>> } >>>> >>>> + /* Ensure that no cold bbs dominate hot bbs. This could happen as a result of >>>> + several different possibilities. One is that there are edge weight insanities >>>> + due to optimization phases that do not properly update basic block profile >>>> + counts. The second is that the entry of the function may not be hot, because >>>> + it is entered fewer times than the number of profile training runs, but there >>>> + is a loop inside the function that causes blocks within the function to be >>>> + above the threshold for hotness. 
*/ >>>> + if (cold_bb_count) >>>> + { >>>> + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); >>>> + >>>> + if (dom_calculated_here) >>>> + calculate_dominance_info (CDI_DOMINATORS); >>>> + >>>> + /* Keep examining hot bbs until we have either checked them all, or >>>> + re-marked all cold bbs hot. */ >>>> + while (! VEC_empty (basic_block, bbs_in_hot_partition) >>>> + && cold_bb_count) >>>> + { >>>> + basic_block dom_bb; >>>> + >>>> + bb = VEC_pop (basic_block, bbs_in_hot_partition); >>>> + dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb); >>>> + >>>> + /* If bb's immediate dominator is also hot then it is ok. */ >>>> + if (BB_PARTITION (dom_bb) != BB_COLD_PARTITION) >>>> + continue; >>>> + >>>> + /* We have a hot bb with an immediate dominator that is cold. >>>> + The dominator needs to be re-marked to hot. */ >>>> + BB_SET_PARTITION (dom_bb, BB_HOT_PARTITION); >>>> + cold_bb_count--; >>>> + >>>> + /* Now we need to examine newly-hot dom_bb to see if it is also >>>> + dominated by a cold bb. */ >>>> + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, dom_bb); >>>> + >>>> + /* We should also adjust any cold blocks that the newly-hot bb >>>> + feeds and see if it makes sense to re-mark those as hot as >>>> + well. */ >>>> + VEC_safe_push (basic_block, heap, bbs_newly_hot, dom_bb); >>>> + while (! VEC_empty (basic_block, bbs_newly_hot)) >>>> + { >>>> + basic_block new_hot_bb = VEC_pop (basic_block, bbs_newly_hot); >>>> + /* Examine all successors of this newly-hot bb to see if they >>>> + are cold and should be re-marked as hot. */ >>>> + FOR_EACH_EDGE (e, ei, new_hot_bb->succs) >>>> + { >>>> + bool any_cold_preds = false; >>>> + basic_block succ = e->dest; >>>> + if (BB_PARTITION (succ) != BB_COLD_PARTITION) >>>> + continue; >>>> + /* Does this block have any cold predecessors now? 
*/
>>>> + FOR_EACH_EDGE (e2, ei2, succ->preds)
>>>> + {
>>>> + if (BB_PARTITION (e2->src) == BB_COLD_PARTITION)
>>>> + {
>>>> + any_cold_preds = true;
>>>> + break;
>>>> + }
>>>> + }
>>>> + if (any_cold_preds)
>>>> + continue;
>>>> +
>>>> + /* Here we have a successor of newly-hot bb that is cold
>>>> + but no longer has any cold predecessors. Since the original
>>>> + assignment of our newly-hot bb was incorrect, this successor's
>>>> + assignment as cold is also suspect. Go ahead and re-mark it
>>>> + as hot now too. Better heuristics may be in order here. */
>>>> + BB_SET_PARTITION (succ, BB_HOT_PARTITION);
>>>> + cold_bb_count--;
>>>> + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, succ);
>>>> + /* Examine this successor as a newly-hot bb. */
>>>> + VEC_safe_push (basic_block, heap, bbs_newly_hot, succ);
>>>> + }
>>>> + }
>>>> + }
>>>> +
>>>> + if (dom_calculated_here)
>>>> + free_dominance_info (CDI_DOMINATORS);
>>>> + }
>>>> +
>>>> /* The format of .gcc_except_table does not allow landing pads to
>>>> be in a different partition as the throw. Fix this by either
>>>> moving or duplicating the landing pads. */
>>>> @@ -1766,10 +1858,10 @@ fix_up_fall_thru_edges (void)
>>>> new_bb->aux = cur_bb->aux;
>>>> cur_bb->aux = new_bb;
>>>>
>>>> - /* Make sure new fall-through bb is in same
>>>> - partition as bb it's falling through from. */
>>>> + /* This is done by force_nonfallthru_and_redirect. */
>>>> + gcc_assert (BB_PARTITION (new_bb)
>>>> + == BB_PARTITION (cur_bb));
>>>>
>>>> - BB_COPY_PARTITION (new_bb, cur_bb);
>>>> single_succ_edge (new_bb)->flags |= EDGE_CROSSING;
>>>> }
>>>> else
>>>> @@ -2067,7 +2159,10 @@ add_reg_crossing_jump_notes (void)
>>>> FOR_EACH_BB (bb)
>>>> FOR_EACH_EDGE (e, ei, bb->succs)
>>>> if ((e->flags & EDGE_CROSSING)
>>>> - && JUMP_P (BB_END (e->src)))
>>>> + && JUMP_P (BB_END (e->src))
>>>> + /* Some notes were added during fix_up_fall_thru_edges, via
>>>> + force_nonfallthru_and_redirect.
*/ >>>> + && !find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX)) >>>> add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); >>>> } >>>> >>>> @@ -2160,7 +2255,7 @@ reorder_basic_blocks (void) >>>> dump_flow_info (dump_file, dump_flags); >>>> } >>>> >>>> - if (flag_reorder_blocks_and_partition) >>>> + if (crtl->has_bb_partition) >>>> verify_hot_cold_block_grouping (); >>>> } >>>> >>>> @@ -2172,14 +2267,14 @@ reorder_basic_blocks (void) >>>> encountering this note will make the compiler switch between the >>>> hot and cold text sections. */ >>>> >>>> -static void >>>> +void >>>> insert_section_boundary_note (void) >>>> { >>>> basic_block bb; >>>> rtx new_note; >>>> int first_partition = 0; >>>> >>>> - if (!flag_reorder_blocks_and_partition) >>>> + if (!crtl->has_bb_partition) >>>> return; >>>> >>>> FOR_EACH_BB (bb) >>>> @@ -2222,10 +2317,8 @@ rest_of_handle_reorder_blocks (void) >>>> FOR_EACH_BB (bb) >>>> if (bb->next_bb != EXIT_BLOCK_PTR) >>>> bb->aux = bb->next_bb; >>>> - cfg_layout_finalize (); >>>> + cfg_layout_finalize (true); >>>> >>>> - /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes. */ >>>> - insert_section_boundary_note (); >>>> return 0; >>>> } >>>> >>>> @@ -2366,7 +2459,7 @@ duplicate_computed_gotos (void) >>>> } >>>> >>>> done: >>>> - cfg_layout_finalize (); >>>> + cfg_layout_finalize (false); >>>> >>>> BITMAP_FREE (candidates); >>>> return 0; >>>> @@ -2511,6 +2604,8 @@ partition_hot_cold_basic_blocks (void) >>>> if (crossing_edges == NULL) >>>> return 0; >>>> >>>> + crtl->has_bb_partition = true; >>>> + >>>> /* Make sure the source of any crossing edge ends in a jump and the >>>> destination of any crossing edge has a label. 
*/ >>>> add_labels_and_missing_jumps (crossing_edges); >>>> Index: bb-reorder.h >>>> =================================================================== >>>> --- bb-reorder.h (revision 193376) >>>> +++ bb-reorder.h (working copy) >>>> @@ -36,4 +36,8 @@ extern struct target_bb_reorder *this_target_bb_re >>>> >>>> extern int get_uncond_jump_length (void); >>>> >>>> +extern void insert_section_boundary_note (void); >>>> + >>>> +extern void emit_barrier_after_bb (basic_block bb); >>>> + >>>> #endif >>>> Index: basic-block.h >>>> =================================================================== >>>> --- basic-block.h (revision 193376) >>>> +++ basic-block.h (working copy) >>>> @@ -806,6 +806,7 @@ extern basic_block force_nonfallthru_and_redirect >>>> extern bool contains_no_active_insn_p (const_basic_block); >>>> extern bool forwarder_block_p (const_basic_block); >>>> extern bool can_fallthru (basic_block, basic_block); >>>> +extern void fixup_partitions (void); >>>> >>>> /* In cfgbuild.c. */ >>>> extern void find_many_sub_basic_blocks (sbitmap); >>>> Index: cfgrtl.c >>>> =================================================================== >>>> --- cfgrtl.c (revision 193376) >>>> +++ cfgrtl.c (working copy) >>>> @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3. If not see >>>> #include "tree.h" >>>> #include "hard-reg-set.h" >>>> #include "basic-block.h" >>>> +#include "bb-reorder.h" >>>> #include "regs.h" >>>> #include "flags.h" >>>> #include "function.h" >>>> @@ -67,11 +68,12 @@ along with GCC; see the file COPYING3. If not see >>>> Only applicable if the CFG is in cfglayout mode. 
*/ >>>> static GTY(()) rtx cfg_layout_function_footer; >>>> static GTY(()) rtx cfg_layout_function_header; >>>> +static bool had_sec_boundary_notes; >>>> >>>> static rtx skip_insns_after_block (basic_block); >>>> static void record_effective_endpoints (void); >>>> static rtx label_for_bb (basic_block); >>>> -static void fixup_reorder_chain (void); >>>> +static void fixup_reorder_chain (bool finalize_reorder_blocks); >>>> >>>> void verify_insn_chain (void); >>>> static void fixup_fallthru_exit_predecessor (void); >>>> @@ -976,8 +978,7 @@ try_redirect_by_replacing_jump (edge e, basic_bloc >>>> partition boundaries). See the comments at the top of >>>> bb-reorder.c:partition_hot_cold_basic_blocks for complete details. */ >>>> >>>> - if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX) >>>> - || BB_PARTITION (src) != BB_PARTITION (target)) >>>> + if (BB_PARTITION (src) != BB_PARTITION (target)) >>>> return NULL; >>>> >>>> /* We can replace or remove a complex jump only when we have exactly >>>> @@ -1286,6 +1287,71 @@ redirect_branch_edge (edge e, basic_block target) >>>> return e; >>>> } >>>> >>>> +/* Called when edge E has been redirected to a new destination, >>>> + in order to update the region crossing flag on the edge and >>>> + jump. */ >>>> + >>>> +static void >>>> +fixup_partition_crossing (edge e, basic_block target) >>>> +{ >>>> + rtx note; >>>> + >>>> + gcc_assert (e->dest == target); >>>> + >>>> + if (e->src == ENTRY_BLOCK_PTR || target == EXIT_BLOCK_PTR) >>>> + return; >>>> + /* If we redirected an existing edge, it may already be marked >>>> + crossing, even though the new src is missing a reg crossing note. >>>> + But make sure reg crossing note doesn't already exist before >>>> + inserting. 
*/ >>>> + if (BB_PARTITION (e->src) != BB_PARTITION (target)) >>>> + { >>>> + e->flags |= EDGE_CROSSING; >>>> + note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); >>>> + if (JUMP_P (BB_END (e->src)) >>>> + && !note) >>>> + add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); >>>> + } >>>> + else if (BB_PARTITION (e->src) == BB_PARTITION (target)) >>>> + { >>>> + e->flags &= ~EDGE_CROSSING; >>>> + /* Remove the region crossing note from jump at end of >>>> + e->src if it exists. */ >>>> + note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); >>>> + if (note) >>>> + remove_note (BB_END (e->src), note); >>>> + } >>>> +} >>>> + >>>> +/* Called when block BB has been reassigned to a different partition, >>>> + to ensure that the region crossing attributes are updated. */ >>>> + >>>> +static void >>>> +fixup_bb_partition (basic_block bb) >>>> +{ >>>> + edge e; >>>> + edge_iterator ei; >>>> + >>>> + /* Now need to make bb's pred edges non-region crossing. */ >>>> + FOR_EACH_EDGE (e, ei, bb->preds) >>>> + { >>>> + fixup_partition_crossing (e, e->dest); >>>> + } >>>> + >>>> + /* Possibly need to make bb's successor edges region crossing, >>>> + or remove stale region crossing. */ >>>> + FOR_EACH_EDGE (e, ei, bb->succs) >>>> + { >>>> + if ((e->flags & EDGE_FALLTHRU) >>>> + && BB_PARTITION (bb) != BB_PARTITION (e->dest) >>>> + && e->dest != EXIT_BLOCK_PTR) >>>> + /* force_nonfallthru_and_redirect calls fixup_partition_crossing. */ >>>> + force_nonfallthru (e); >>>> + else >>>> + fixup_partition_crossing (e, e->dest); >>>> + } >>>> +} >>>> + >>>> /* Attempt to change code to redirect edge E to TARGET. Don't do that on >>>> expense of adding new instructions or reordering basic blocks. 
>>>> >>>> @@ -1302,16 +1368,18 @@ rtl_redirect_edge_and_branch (edge e, basic_block >>>> { >>>> edge ret; >>>> basic_block src = e->src; >>>> + basic_block dest = e->dest; >>>> >>>> if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH)) >>>> return NULL; >>>> >>>> - if (e->dest == target) >>>> + if (dest == target) >>>> return e; >>>> >>>> if ((ret = try_redirect_by_replacing_jump (e, target, false)) != NULL) >>>> { >>>> df_set_bb_dirty (src); >>>> + fixup_partition_crossing (ret, target); >>>> return ret; >>>> } >>>> >>>> @@ -1320,6 +1388,7 @@ rtl_redirect_edge_and_branch (edge e, basic_block >>>> return NULL; >>>> >>>> df_set_bb_dirty (src); >>>> + fixup_partition_crossing (ret, target); >>>> return ret; >>>> } >>>> >>>> @@ -1454,18 +1523,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc >>>> /* Make sure new block ends up in correct hot/cold section. */ >>>> >>>> BB_COPY_PARTITION (jump_block, e->src); >>>> - if (flag_reorder_blocks_and_partition >>>> - && targetm_common.have_named_sections >>>> - && JUMP_P (BB_END (jump_block)) >>>> - && !any_condjump_p (BB_END (jump_block)) >>>> - && (EDGE_SUCC (jump_block, 0)->flags & EDGE_CROSSING)) >>>> - add_reg_note (BB_END (jump_block), REG_CROSSING_JUMP, NULL_RTX); >>>> >>>> /* Wire edge in. */ >>>> new_edge = make_edge (e->src, jump_block, EDGE_FALLTHRU); >>>> new_edge->probability = probability; >>>> new_edge->count = count; >>>> >>>> + /* If e->src was previously region crossing, it no longer is >>>> + and the reg crossing note should be removed. */ >>>> + fixup_partition_crossing (new_edge, jump_block); >>>> + >>>> /* Redirect old edge. */ >>>> redirect_edge_pred (e, jump_block); >>>> e->probability = REG_BR_PROB_BASE; >>>> @@ -1521,13 +1588,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc >>>> LABEL_NUSES (label)++; >>>> } >>>> >>>> - emit_barrier_after (BB_END (jump_block)); >>>> + /* We might be in cfg layout mode, and if so, the following routine will >>>> + insert the barrier correctly. 
*/ >>>> + emit_barrier_after_bb (jump_block); >>>> redirect_edge_succ_nodup (e, target); >>>> >>>> if (abnormal_edge_flags) >>>> make_edge (src, target, abnormal_edge_flags); >>>> >>>> df_mark_solutions_dirty (); >>>> + fixup_partition_crossing (e, target); >>>> return new_bb; >>>> } >>>> >>>> @@ -1626,7 +1696,7 @@ rtl_move_block_after (basic_block bb ATTRIBUTE_UNU >>>> static basic_block >>>> rtl_split_edge (edge edge_in) >>>> { >>>> - basic_block bb; >>>> + basic_block bb, new_bb; >>>> rtx before; >>>> >>>> /* Abnormal edges cannot be split. */ >>>> @@ -1659,12 +1729,26 @@ rtl_split_edge (edge edge_in) >>>> else >>>> { >>>> bb = create_basic_block (before, NULL, edge_in->dest->prev_bb); >>>> - /* ??? Why not edge_in->dest->prev_bb here? */ >>>> - BB_COPY_PARTITION (bb, edge_in->dest); >>>> + if (edge_in->src == ENTRY_BLOCK_PTR) >>>> + BB_COPY_PARTITION (bb, edge_in->dest); >>>> + else >>>> + /* Put the split bb into the src partition, to avoid creating >>>> + a situation where a cold bb dominates a hot bb, in the case >>>> + where src is cold and dest is hot. The src will dominate >>>> + the new bb (whereas it might not have dominated dest). */ >>>> + BB_COPY_PARTITION (bb, edge_in->src); >>>> } >>>> >>>> make_single_succ_edge (bb, edge_in->dest, EDGE_FALLTHRU); >>>> >>>> + /* Can't allow a region crossing edge to be fallthrough. */ >>>> + if (BB_PARTITION (bb) != BB_PARTITION (edge_in->dest) >>>> + && edge_in->dest != EXIT_BLOCK_PTR) >>>> + { >>>> + new_bb = force_nonfallthru (single_succ_edge (bb)); >>>> + gcc_assert (!new_bb); >>>> + } >>>> + >>>> /* For non-fallthru edges, we must adjust the predecessor's >>>> jump instruction to target our new block. 
*/ >>>> if ((edge_in->flags & EDGE_FALLTHRU) == 0) >>>> @@ -1777,17 +1861,13 @@ commit_one_edge_insertion (edge e) >>>> else >>>> { >>>> bb = split_edge (e); >>>> - after = BB_END (bb); >>>> >>>> - if (flag_reorder_blocks_and_partition >>>> - && targetm_common.have_named_sections >>>> - && e->src != ENTRY_BLOCK_PTR >>>> - && BB_PARTITION (e->src) == BB_COLD_PARTITION >>>> - && !(e->flags & EDGE_CROSSING) >>>> - && JUMP_P (after) >>>> - && !any_condjump_p (after) >>>> - && (single_succ_edge (bb)->flags & EDGE_CROSSING)) >>>> - add_reg_note (after, REG_CROSSING_JUMP, NULL_RTX); >>>> + /* If e crossed a partition boundary, we needed to make bb end in >>>> + a region-crossing jump, even though it was originally fallthru. */ >>>> + if (JUMP_P (BB_END (bb))) >>>> + before = BB_END (bb); >>>> + else >>>> + after = BB_END (bb); >>>> } >>>> >>>> /* Now that we've found the spot, do the insertion. */ >>>> @@ -1827,6 +1907,14 @@ commit_edge_insertions (void) >>>> { >>>> basic_block bb; >>>> >>>> + /* Optimization passes that invoke this routine can cause hot blocks >>>> + previously reached by both hot and cold blocks to become dominated only >>>> + by cold blocks. This will cause the verification below to fail, >>>> + and lead to now cold code in the hot section. In some cases this >>>> + may only be visible after newly unreachable blocks are deleted, >>>> + which will be done by fixup_partitions. */ >>>> + fixup_partitions (); >>>> + >>>> #ifdef ENABLE_CHECKING >>>> verify_flow_info (); >>>> #endif >>>> @@ -2028,7 +2116,75 @@ get_last_bb_insn (basic_block bb) >>>> >>>> return end; >>>> } >>>> - >>>> + >>>> +/* Perform cleanup on the hot/cold bb partitioning after optimization >>>> + passes that modify the cfg. 
*/ >>>> + >>>> +void >>>> +fixup_partitions (void) >>>> +{ >>>> + basic_block bb; >>>> + >>>> + if (!crtl->has_bb_partition) >>>> + return; >>>> + >>>> + /* Delete any blocks that became unreachable and weren't >>>> + already cleaned up, for example during edge forwarding >>>> + and convert_jumps_to_returns. This will expose more >>>> + opportunities for fixing the partition boundaries here. >>>> + Also, the calculation of the dominance graph during verification >>>> + will assert if there are unreachable nodes. */ >>>> + delete_unreachable_blocks (); >>>> + >>>> + /* If there are partitions, do a sanity check on them: A basic block in >>>> + a cold partition cannot dominate a basic block in a hot partition. >>>> + Fixup any that now violate this requirement, as a result of edge >>>> + forwarding and unreachable block deletion. */ >>>> + VEC (basic_block, heap) *bbs_in_cold_partition = NULL; >>>> + VEC (basic_block, heap) *bbs_to_fix = NULL; >>>> + FOR_EACH_BB (bb) >>>> + if ((BB_PARTITION (bb) == BB_COLD_PARTITION)) >>>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb); >>>> + if (! VEC_empty (basic_block, bbs_in_cold_partition)) >>>> + { >>>> + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); >>>> + basic_block son; >>>> + >>>> + if (dom_calculated_here) >>>> + calculate_dominance_info (CDI_DOMINATORS); >>>> + >>>> + while (! VEC_empty (basic_block, bbs_in_cold_partition)) >>>> + { >>>> + bb = VEC_pop (basic_block, bbs_in_cold_partition); >>>> + /* If bb is not yet cold (because it was added below as >>>> + a block dominated by a cold bb) then mark it cold here. */ >>>> + if ((BB_PARTITION (bb) != BB_COLD_PARTITION)) >>>> + { >>>> + BB_SET_PARTITION (bb, BB_COLD_PARTITION); >>>> + VEC_safe_push (basic_block, heap, bbs_to_fix, bb); >>>> + } >>>> + /* Any blocks dominated by a block in the cold section >>>> + must also be cold. 
*/ >>>> + for (son = first_dom_son (CDI_DOMINATORS, bb); >>>> + son; >>>> + son = next_dom_son (CDI_DOMINATORS, son)) >>>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son); >>>> + } >>>> + >>>> + if (dom_calculated_here) >>>> + free_dominance_info (CDI_DOMINATORS); >>>> + } >>>> + >>>> + /* Do the partition fixup after all necessary blocks have been converted to >>>> + cold, so that we only update the region crossings the minimum number of >>>> + places, which can require forcing edges to be non fallthru. */ >>>> + while (! VEC_empty (basic_block, bbs_to_fix)) >>>> + { >>>> + bb = VEC_pop (basic_block, bbs_to_fix); >>>> + fixup_bb_partition (bb); >>>> + } >>>> +} >>>> + >>>> /* Verify the CFG and RTL consistency common for both underlying RTL and >>>> cfglayout RTL. >>>> >>>> @@ -2052,6 +2208,7 @@ rtl_verify_flow_info_1 (void) >>>> rtx x; >>>> int err = 0; >>>> basic_block bb; >>>> + bool have_partitions = false; >>>> >>>> /* Check the general integrity of the basic blocks. */ >>>> FOR_EACH_BB_REVERSE (bb) >>>> @@ -2169,6 +2326,8 @@ rtl_verify_flow_info_1 (void) >>>> >>>> if (e->flags & EDGE_ABNORMAL) >>>> n_abnormal++; >>>> + >>>> + have_partitions |= is_crossing; >>>> } >>>> >>>> if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, NULL_RTX)) >>>> @@ -2293,6 +2452,40 @@ rtl_verify_flow_info_1 (void) >>>> } >>>> } >>>> >>>> + /* If there are partitions, do a sanity check on them: A basic block in >>>> + a cold partition cannot dominate a basic block in a hot partition. */ >>>> + VEC (basic_block, heap) *bbs_in_cold_partition = NULL; >>>> + if (have_partitions && !err) >>>> + FOR_EACH_BB (bb) >>>> + if ((BB_PARTITION (bb) == BB_COLD_PARTITION)) >>>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb); >>>> + if (! 
VEC_empty (basic_block, bbs_in_cold_partition)) >>>> + { >>>> + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); >>>> + basic_block son; >>>> + >>>> + if (dom_calculated_here) >>>> + calculate_dominance_info (CDI_DOMINATORS); >>>> + >>>> + while (! VEC_empty (basic_block, bbs_in_cold_partition)) >>>> + { >>>> + bb = VEC_pop (basic_block, bbs_in_cold_partition); >>>> + if ((BB_PARTITION (bb) != BB_COLD_PARTITION)) >>>> + { >>>> + error ("non-cold basic block %d dominated " >>>> + "by a block in the cold partition", bb->index); >>>> + err = 1; >>>> + } >>>> + for (son = first_dom_son (CDI_DOMINATORS, bb); >>>> + son; >>>> + son = next_dom_son (CDI_DOMINATORS, son)) >>>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son); >>>> + } >>>> + >>>> + if (dom_calculated_here) >>>> + free_dominance_info (CDI_DOMINATORS); >>>> + } >>>> + >>>> /* Clean up. */ >>>> return err; >>>> } >>>> @@ -2965,14 +3158,41 @@ record_effective_endpoints (void) >>>> else >>>> cfg_layout_function_header = NULL_RTX; >>>> >>>> + had_sec_boundary_notes = false; >>>> + >>>> next_insn = get_insns (); >>>> FOR_EACH_BB (bb) >>>> { >>>> rtx end; >>>> >>>> if (PREV_INSN (BB_HEAD (bb)) && next_insn != BB_HEAD (bb)) >>>> - BB_HEADER (bb) = unlink_insn_chain (next_insn, >>>> - PREV_INSN (BB_HEAD (bb))); >>>> + { >>>> + /* Rather than try to keep section boundary notes incrementally >>>> + up-to-date through cfg layout optimizations, simply remove them >>>> + and flag that they should be re-inserted when exiting >>>> + cfg layout mode. */ >>>> + rtx check_insn = next_insn; >>>> + while (check_insn) >>>> + { >>>> + if (NOTE_P (check_insn) >>>> + && NOTE_KIND (check_insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS) >>>> + { >>>> + had_sec_boundary_notes |= true; >>>> + /* Remove note from chain. Grab new next_insn first. */ >>>> + if (next_insn == check_insn) >>>> + next_insn = NEXT_INSN (check_insn); >>>> + /* Delete note. 
*/ >>>> + delete_insn (check_insn); >>>> + /* There will only be one. */ >>>> + break; >>>> + } >>>> + check_insn = NEXT_INSN (check_insn); >>>> + } >>>> + /* If we still have header instructions left after above loop. */ >>>> + if (next_insn != BB_HEAD (bb)) >>>> + BB_HEADER (bb) = unlink_insn_chain (next_insn, >>>> + PREV_INSN (BB_HEAD (bb))); >>>> + } >>>> end = skip_insns_after_block (bb); >>>> if (NEXT_INSN (BB_END (bb)) && BB_END (bb) != end) >>>> BB_FOOTER (bb) = unlink_insn_chain (NEXT_INSN (BB_END (bb)), end); >>>> @@ -3000,7 +3220,7 @@ outof_cfg_layout_mode (void) >>>> if (bb->next_bb != EXIT_BLOCK_PTR) >>>> bb->aux = bb->next_bb; >>>> >>>> - cfg_layout_finalize (); >>>> + cfg_layout_finalize (false); >>>> >>>> return 0; >>>> } >>>> @@ -3120,10 +3340,13 @@ relink_block_chain (bool stay_in_cfglayout_mode) >>>> } >>>> >>>> >>>> -/* Given a reorder chain, rearrange the code to match. */ >>>> +/* Given a reorder chain, rearrange the code to match. If >>>> + this is called when we will FINALIZE_REORDER_BLOCKS, or when >>>> + section boundary notes were removed on entry to cfg layout >>>> + mode, insert section boundary notes here. */ >>>> >>>> static void >>>> -fixup_reorder_chain (void) >>>> +fixup_reorder_chain (bool finalize_reorder_blocks) >>>> { >>>> basic_block bb; >>>> rtx insn = NULL; >>>> @@ -3150,7 +3373,7 @@ static void >>>> PREV_INSN (BB_HEADER (bb)) = insn; >>>> insn = BB_HEADER (bb); >>>> while (NEXT_INSN (insn)) >>>> - insn = NEXT_INSN (insn); >>>> + insn = NEXT_INSN (insn); >>>> } >>>> if (insn) >>>> NEXT_INSN (insn) = BB_HEAD (bb); >>>> @@ -3175,6 +3398,11 @@ static void >>>> insn = NEXT_INSN (insn); >>>> >>>> set_last_insn (insn); >>>> + >>>> + /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes. 
*/ >>>> + if (had_sec_boundary_notes || finalize_reorder_blocks) >>>> + insert_section_boundary_note (); >>>> + >>>> #ifdef ENABLE_CHECKING >>>> verify_insn_chain (); >>>> #endif >>>> @@ -3187,7 +3415,7 @@ static void >>>> edge e_fall, e_taken, e; >>>> rtx bb_end_insn; >>>> rtx ret_label = NULL_RTX; >>>> - basic_block nb, src_bb; >>>> + basic_block nb; >>>> edge_iterator ei; >>>> >>>> if (EDGE_COUNT (bb->succs) == 0) >>>> @@ -3322,7 +3550,6 @@ static void >>>> /* We got here if we need to add a new jump insn. >>>> Note force_nonfallthru can delete E_FALL and thus we have to >>>> save E_FALL->src prior to the call to force_nonfallthru. */ >>>> - src_bb = e_fall->src; >>>> nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, ret_label); >>>> if (nb) >>>> { >>>> @@ -3330,17 +3557,6 @@ static void >>>> bb->aux = nb; >>>> /* Don't process this new block. */ >>>> bb = nb; >>>> - >>>> - /* Make sure new bb is tagged for correct section (same as >>>> - fall-thru source, since you cannot fall-thru across >>>> - section boundaries). */ >>>> - BB_COPY_PARTITION (src_bb, single_pred (bb)); >>>> - if (flag_reorder_blocks_and_partition >>>> - && targetm_common.have_named_sections >>>> - && JUMP_P (BB_END (bb)) >>>> - && !any_condjump_p (BB_END (bb)) >>>> - && (EDGE_SUCC (bb, 0)->flags & EDGE_CROSSING)) >>>> - add_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX); >>>> } >>>> } >>>> >>>> @@ -3644,10 +3860,11 @@ duplicate_insn_chain (rtx from, rtx to) >>>> case NOTE_INSN_FUNCTION_BEG: >>>> /* There is always just single entry to function. */ >>>> case NOTE_INSN_BASIC_BLOCK: >>>> + /* We should only switch text sections once. 
*/ >>>> + case NOTE_INSN_SWITCH_TEXT_SECTIONS: >>>> break; >>>> >>>> case NOTE_INSN_EPILOGUE_BEG: >>>> - case NOTE_INSN_SWITCH_TEXT_SECTIONS: >>>> emit_note_copy (insn); >>>> break; >>>> >>>> @@ -3759,10 +3976,13 @@ break_superblocks (void) >>>> } >>>> >>>> /* Finalize the changes: reorder insn list according to the sequence specified >>>> - by aux pointers, enter compensation code, rebuild scope forest. */ >>>> + by aux pointers, enter compensation code, rebuild scope forest. If >>>> + this is called when we will FINALIZE_REORDER_BLOCKS, indicate that >>>> + to fixup_reorder_chain so that it can insert the proper switch text >>>> + section notes. */ >>>> >>>> void >>>> -cfg_layout_finalize (void) >>>> +cfg_layout_finalize (bool finalize_reorder_blocks) >>>> { >>>> #ifdef ENABLE_CHECKING >>>> verify_flow_info (); >>>> @@ -3775,7 +3995,7 @@ void >>>> #endif >>>> ) >>>> fixup_fallthru_exit_predecessor (); >>>> - fixup_reorder_chain (); >>>> + fixup_reorder_chain (finalize_reorder_blocks); >>>> >>>> rebuild_jump_labels (get_insns ()); >>>> delete_dead_jumptables (); >>>> @@ -4454,8 +4674,7 @@ rtl_can_remove_branch_p (const_edge e) >>>> if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH)) >>>> return false; >>>> >>>> - if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX) >>>> - || BB_PARTITION (src) != BB_PARTITION (target)) >>>> + if (BB_PARTITION (src) != BB_PARTITION (target)) >>>> return false; >>>> >>>> if (!onlyjump_p (insn) >>>> >>>> -- >>>> This patch is available for review at http://codereview.appspot.com/6823047 >>> >>> >>> >>> -- >>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413 > > > > -- > Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413 -- Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
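Editorial sketch (not part of the patch): the fixup_partitions routine quoted above enforces the rule that a block in the cold partition cannot dominate a block in the hot partition, by walking the dominator tree from each cold block and re-marking its sons as cold. The standalone C fragment below illustrates only that propagation idea on a toy data structure. Everything in it (struct toy_cfg, propagate_cold, MAX_BLOCKS, the idom array) is a hypothetical name invented for illustration and is not a GCC API; it also marks sons cold when they are pushed rather than when they are popped, which is a simplification of the quoted loop, and it omits the region-crossing fixup that fixup_bb_partition performs afterwards.

```c
#include <assert.h>

#define MAX_BLOCKS 64

enum partition { HOT, COLD };

/* Hypothetical toy CFG, not a GCC structure: idom[i] is the index of the
   immediate dominator of block i, or -1 for the entry block.  */
struct toy_cfg
{
  int n_blocks;
  int idom[MAX_BLOCKS];
  enum partition part[MAX_BLOCKS];
};

/* Re-mark as cold every block that is dominated by a cold block.
   Returns the number of blocks whose partition was changed.  */
int
propagate_cold (struct toy_cfg *cfg)
{
  int worklist[MAX_BLOCKS];
  int top = 0;
  int fixed = 0;

  assert (cfg->n_blocks >= 0 && cfg->n_blocks <= MAX_BLOCKS);

  /* Seed the worklist with the blocks that are already cold.  */
  for (int b = 0; b < cfg->n_blocks; b++)
    if (cfg->part[b] == COLD)
      worklist[top++] = b;

  while (top > 0)
    {
      int bb = worklist[--top];

      /* Visit the dominator-tree sons of bb.  A son that is still hot is
         re-marked cold and queued so that its own sons are visited too.
         Each block is queued at most once, so the worklist cannot
         overflow.  */
      for (int son = 0; son < cfg->n_blocks; son++)
        if (cfg->idom[son] == bb && cfg->part[son] != COLD)
          {
            cfg->part[son] = COLD;
            fixed++;
            worklist[top++] = son;
          }
    }

  return fixed;
}
```

For instance, with five blocks where block 0 is the entry, blocks 1 and 4 are immediately dominated by block 0, blocks 2 and 3 are immediately dominated by block 1, and only block 1 starts out cold, the call re-marks blocks 2 and 3 and leaves blocks 0 and 4 hot.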
On Mon, Nov 26, 2012 at 12:19:55PM -0800, Teresa Johnson wrote: > Are you sure you have all my changes applied? I applied the 4 patches > attached to PR55121 into my trunk checkout that has my fixes, and to a > pristine trunk checkout. I configured and built both for > --target=arm-none-linux-gnueabi, and built using your options, .i file > and gcda file. I can reproduce the failure using the pristine trunk > with your patches but not with my fixed trunk + your patches. (I just > updated to head to pickup recent changes and get the same result. The > vec changes required some manual changes to the patch, which I will > resend shortly.) Teresa, Your mailer seems to have corrupted the posted patch with stray =3D characters and line breaks. Can you repost a copy as an attachment to the list? Jack > > Without my fixes: > > $ ~/extra/gcc_trunk_3_arm-eabi/gcc/cc1 -fpreproce > ssed eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9 > -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp > -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use > -fno-common -o eval.s -freorder-blocks-and-partition > GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi) > compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version > 2.4.2-p1, MPC version 0.8.1 > GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 > GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi) > compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version > 2.4.2-p1, MPC version 0.8.1 > GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 > Compiler executable checksum: d19cc60a2f07de08237a8488bb35cd1a > eval.c: In function ‘Ge’: > eval.c:792:1: internal compiler error: in df_compact_blocks, at df-core.c:1560 > } > ^ > 0x622f71 df_compact_blocks() > ../../gcc_trunk_3/gcc/df-core.c:1560 > 0x5cfcb5 compact_blocks() > ../../gcc_trunk_3/gcc/cfg.c:162 > 0xc9dce0 reorder_basic_blocks > 
../../gcc_trunk_3/gcc/bb-reorder.c:2154 > 0xc9dce0 rest_of_handle_reorder_blocks > ../../gcc_trunk_3/gcc/bb-reorder.c:2219 > Please submit a full bug report, > with preprocessed source if appropriate. > Please include the complete backtrace with any bug report. > See <http://gcc.gnu.org/bugs.html> for instructions. > > > With my fixes: > > $ ~/extra/gcc_trunk_4_arm-eabi/gcc/cc1 -fpreproce > ssed eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9 > -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp > -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use > -fno-common -o eval.s -freorder-blocks-and-partition > GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi) > compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version > 2.4.2-p1, MPC version 0.8.1 > GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 > GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi) > compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version > 2.4.2-p1, MPC version 0.8.1 > GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 > Compiler executable checksum: 45b468efa7c981f9afb44c4dac2424f3 > > > Thanks, > Teresa > > On Mon, Nov 26, 2012 at 8:25 AM, Christophe Lyon > <christophe.lyon@linaro.org> wrote: > > Hi, > > > > I have tested your patch on Spec2000 on ARM, and I can still see > > several failures caused by: > > "error: fallthru edge crosses section boundary", including the case > > described in PR55121. > > > > On 26 November 2012 16:55, Teresa Johnson <tejohnson@google.com> wrote: > >> Ping. > >> Teresa > >> > >> On Thu, Nov 15, 2012 at 12:10 PM, Teresa Johnson <tejohnson@google.com> wrote: > >>> Revised patch that fixes failures encountered when enabling > >>> -freorder-blocks-and-partition, including the failure reported in PR 53743. 
> >>> > >>> This includes new verification code to ensure no cold blocks dominate hot > >>> blocks contributed by Steven Bosscher. > >>> > >>> I attempted to make the handling of partition updates through the optimization > >>> passes much more consistent, removing a number of partial fixes in the code > >>> stream in the process. The code to fixup partitions (including the BB_PARTITION > >>> assignment, region crossing jump notes, and switch text section notes) is > >>> now handled in a few centralized locations. For example, inside > >>> rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that callers > >>> don't need to attempt the fixup themselves. > >>> > >>> For optimization passes that make adjustments to the cfg while in cfg layout > >>> mode that are not easy to fix up incrementally, the new routine > >>> fixup_partitions handles the cleanup globally. This does require calculation > >>> of the dominance relation, however, as far as I can tell the routines which > >>> now invoke this global fixup (try_optimize_cfg and commit_edge_insertions) > >>> are invoked typically once (or a small number of times in the case of > >>> try_optimize_cfg) per optimization pass. Additionally, I compared the > >>> -ftime-report output for some large fdo compilations and saw only minimal > >>> increases in the dominance computation times, which were only a tiny percent > >>> of the overall compile time. > >>> > >>> Additionally, I added a flag to the rtl_data structure to indicate whether > >>> any partitioning was actually performed, so that optimizations which were > >>> conservatively disabled whenever the flag_reorder_blocks_and_partition > >>> is enabled (e.g. try_crossjump_to_edge, part of connect_traces) can be less > >>> conservative for functions where no partitions were formed (e.g. they are > >>> completely hot). > >>> > >>> Bootstrapped and tested on x86_64-unknown-linux-gnu.
Also tested with SPEC2006 int > >>> benchmarks and internal google benchmarks using profile feedback and > >>> -freorder-blocks-and-partition to get more coverage. Ok for trunk? > >>> > >>> Thanks, > >>> Teresa > >>> > >>> 2012-11-14 Teresa Johnson <tejohnson@google.com> > >>> Steven Bosscher <steven@gcc.gnu.org> > >>> > >>> * cfghooks.h (cfg_layout_finalize): New parameter. > >>> * modulo-sched.c (rest_of_handle_sms): New cfg_layout_finalize > >>> parameter. > >>> * ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert > >>> as this is now done by redirect_edge_and_branch_force. > >>> * function.c (thread_prologue_and_epilogue_insns): Insert new bb after > >>> barriers, new cfg_layout_finalize parameter, and don't store exit > >>> predecessor BB until after it is potentially split. > >>> * function.h (struct rtl_data): New flag has_bb_partition. > >>> * hw-doloop.c (reorder_loops): New cfg_layout_finalize parameter. > >>> * cfgcleanup.c (try_crossjump_to_edge): Only skip optimization if > >>> any blocks in function actually partitioned. > >>> (try_optimize_cfg): If cfg changed, invoke fixup_partitions to clean > >>> up partitioning. > >>> * bb-reorder.c (connect_traces): Only look for partitions and skip > >>> block copying if any blocks in function actually partitioned. > >>> (emit_barrier_after_bb): Handle insertion in non-cfglayout mode. > >>> (find_rarely_executed_basic_blocks_and_crossing_edges): Ensure > >>> that no cold blocks dominate a hot block. > >>> (fix_up_fall_thru_edges): Replace BB_COPY_PARTITION with assert > >>> as this is now done by force_nonfallthru_and_redirect. > >>> (add_reg_crossing_jump_notes): Handle the fact that some jumps may > >>> already be marked with region crossing note. > >>> (reorder_basic_blocks): Only need to verify partitions if any > >>> blocks in function actually partitioned. > >>> (insert_section_boundary_note): Only need to insert note if any > >>> blocks in function actually partitioned. 
> >>> (rest_of_handle_reorder_blocks): New cfg_layout_finalize > >>> parameter, and remove call to insert_section_boundary_note as this > >>> is now called via cfg_layout_finalize/fixup_reorder_chain. > >>> (duplicate_computed_gotos): New cfg_layout_finalize > >>> parameter. > >>> (partition_hot_cold_basic_blocks): Set flag indicating function > >>> has bb partitions. > >>> * bb-reorder.h: Declare insert_section_boundary_note and > >>> emit_barrier_after_bb, which are no longer static. > >>> * basic-block.h: Declare new function fixup_partitions. > >>> * cfgrtl.c (try_redirect_by_replacing_jump): Remove unnecessary > >>> check for region crossing note. > >>> (fixup_partition_crossing): New function. > >>> (fixup_bb_partition): Ditto. > >>> (rtl_redirect_edge_and_branch): Fixup partition boundaries. > >>> (force_nonfallthru_and_redirect): Fixup partition boundaries, > >>> remove old code that tried to do this. Emit barrier correctly > >>> when we are in cfglayout mode. > >>> (rtl_split_edge): Correctly fixup partition boundaries. > >>> (commit_one_edge_insertion): Remove old code that tried to > >>> fixup region crossing edge since this is now handled in > >>> split_block, and set up insertion point correctly since > >>> block may now end in a jump. > >>> (commit_edge_insertions): Invoke fixup_partitions to sanitize partition > >>> boundaries after optimizations that modify cfg and before trying to > >>> verify the flow info. > >>> (fixup_partitions): New function. > >>> (rtl_verify_flow_info_1): Add verification that no cold bbs dominate > >>> hot bbs. > >>> (record_effective_endpoints): Remove region-crossing notes and set flag > >>> indicating that they need to be reinserted on exit from cfglayout mode. > >>> (outof_cfg_layout_mode): New cfg_layout_finalize parameter. > >>> (fixup_reorder_chain): Call insert_section_boundary_note if necessary. 
> >>> Remove old code that attempted to fixup region crossing note as > >>> this is now handled in force_nonfallthru_and_redirect. > >>> (duplicate_insn_chain): Don't duplicate switch section notes. > >>> (cfg_layout_finalize): Pass new parameter to fixup_reorder_chain. > >>> (rtl_can_remove_branch_p): Remove unnecessary check for region crossing > >>> note. > >>> > >>> Index: cfghooks.h > >>> =================================================================== > >>> --- cfghooks.h (revision 193376) > >>> +++ cfghooks.h (working copy) > >>> @@ -204,7 +204,7 @@ extern void copy_bbs (basic_block *, unsigned, bas > >>> void account_profile_record (struct profile_record *, int); > >>> > >>> extern void cfg_layout_initialize (unsigned int); > >>> -extern void cfg_layout_finalize (void); > >>> +extern void cfg_layout_finalize (bool); > >>> > >>> /* Hooks containers. */ > >>> extern struct cfg_hooks gimple_cfg_hooks; > >>> @@ -218,4 +218,3 @@ extern void cfg_layout_rtl_register_cfg_hooks (voi > >>> extern void gimple_register_cfg_hooks (void); > >>> extern struct cfg_hooks get_cfg_hooks (void); > >>> extern void set_cfg_hooks (struct cfg_hooks); > >>> - > >>> Index: modulo-sched.c > >>> =================================================================== > >>> --- modulo-sched.c (revision 193376) > >>> +++ modulo-sched.c (working copy) > >>> @@ -3354,7 +3354,7 @@ rest_of_handle_sms (void) > >>> if (bb->next_bb != EXIT_BLOCK_PTR) > >>> bb->aux = bb->next_bb; > >>> free_dominance_info (CDI_DOMINATORS); > >>> - cfg_layout_finalize (); > >>> + cfg_layout_finalize (false); > >>> #endif /* INSN_SCHEDULING */ > >>> return 0; > >>> } > >>> Index: ifcvt.c > >>> =================================================================== > >>> --- ifcvt.c (revision 193376) > >>> +++ ifcvt.c (working copy) > >>> @@ -3900,10 +3900,9 @@ find_if_case_1 (basic_block test_bb, edge then_edg > >>> if (new_bb) > >>> { > >>> df_bb_replace (then_bb_index, new_bb); > >>> - /* Since the fallthru edge was 
redirected from test_bb to new_bb, > >>> - we need to ensure that new_bb is in the same partition as > >>> - test bb (you can not fall through across section boundaries). */ > >>> - BB_COPY_PARTITION (new_bb, test_bb); > >>> + /* This should have been done above via force_nonfallthru_and_redirect > >>> + (possibly called from redirect_edge_and_branch_force). */ > >>> + gcc_assert (BB_PARTITION (new_bb) == BB_PARTITION (test_bb)); > >>> } > >>> > >>> num_true_changes++; > >>> Index: function.c > >>> =================================================================== > >>> --- function.c (revision 193376) > >>> +++ function.c (working copy) > >>> @@ -6249,8 +6249,10 @@ thread_prologue_and_epilogue_insns (void) > >>> break; > >>> if (e) > >>> { > >>> - copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)), > >>> - NULL_RTX, e->src); > >>> + /* Make sure we insert after any barriers. */ > >>> + rtx end = get_last_bb_insn (e->src); > >>> + copy_bb = create_basic_block (NEXT_INSN (end), > >>> + NULL_RTX, e->src); > >>> BB_COPY_PARTITION (copy_bb, e->src); > >>> } > >>> else > >>> @@ -6475,7 +6477,7 @@ thread_prologue_and_epilogue_insns (void) > >>> if (cur_bb->index >= NUM_FIXED_BLOCKS > >>> && cur_bb->next_bb->index >= NUM_FIXED_BLOCKS) > >>> cur_bb->aux = cur_bb->next_bb; > >>> - cfg_layout_finalize (); > >>> + cfg_layout_finalize (false); > >>> } > >>> > >>> epilogue_done: > >>> @@ -6517,7 +6519,7 @@ epilogue_done: > >>> basic_block simple_return_block_cold = NULL; > >>> edge pending_edge_hot = NULL; > >>> edge pending_edge_cold = NULL; > >>> - basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb; > >>> + basic_block exit_pred; > >>> int i; > >>> > >>> gcc_assert (entry_edge != orig_entry_edge); > >>> @@ -6545,6 +6547,12 @@ epilogue_done: > >>> else > >>> pending_edge_cold = e; > >>> } > >>> + > >>> + /* Save a pointer to the exit's predecessor BB for use in > >>> + inserting new BBs at the end of the function. 
Do this > >>> + after the call to split_block above which may split > >>> + the original exit pred. */ > >>> + exit_pred = EXIT_BLOCK_PTR->prev_bb; > >>> > >>> FOR_EACH_VEC_ELT (edge, unconverted_simple_returns, i, e) > >>> { > >>> Index: function.h > >>> =================================================================== > >>> --- function.h (revision 193376) > >>> +++ function.h (working copy) > >>> @@ -459,6 +459,11 @@ struct GTY(()) rtl_data { > >>> sched2) and is useful only if the port defines LEAF_REGISTERS. */ > >>> bool uses_only_leaf_regs; > >>> > >>> + /* Nonzero if the function being compiled has undergone hot/cold partitioning > >>> + (under flag_reorder_blocks_and_partition) and has at least one cold > >>> + block. */ > >>> + bool has_bb_partition; > >>> + > >>> /* Like regs_ever_live, but 1 if a reg is set or clobbered from an > >>> asm. Unlike regs_ever_live, elements of this array corresponding > >>> to eliminable regs (like the frame pointer) are set if an asm > >>> Index: hw-doloop.c > >>> =================================================================== > >>> --- hw-doloop.c (revision 193376) > >>> +++ hw-doloop.c (working copy) > >>> @@ -547,7 +547,7 @@ reorder_loops (hwloop_info loops) > >>> else > >>> bb->aux = NULL; > >>> } > >>> - cfg_layout_finalize (); > >>> + cfg_layout_finalize (false); > >>> clear_aux_for_blocks (); > >>> df_analyze (); > >>> } > >>> Index: cfgcleanup.c > >>> =================================================================== > >>> --- cfgcleanup.c (revision 193376) > >>> +++ cfgcleanup.c (working copy) > >>> @@ -1824,7 +1824,7 @@ try_crossjump_to_edge (int mode, edge e1, edge e2, > >>> partition boundaries). See the comments at the top of > >>> bb-reorder.c:partition_hot_cold_basic_blocks for complete details. */ > >>> > >>> - if (flag_reorder_blocks_and_partition && reload_completed) > >>> + if (crtl->has_bb_partition && reload_completed) > >>> return false; > >>> > >>> /* Search backward through forwarder blocks. 
We don't need to worry > >>> @@ -2767,10 +2767,21 @@ try_optimize_cfg (int mode) > >>> df_analyze (); > >>> } > >>> > >>> + if (changed) > >>> + { > >>> + /* Edge forwarding in particular can cause hot blocks previously > >>> + reached by both hot and cold blocks to become dominated only > >>> + by cold blocks. This will cause the verification below to fail, > >>> + and lead to now cold code in the hot section. This is not easy > >>> + to detect and fix during edge forwarding, and in some cases > >>> + is only visible after newly unreachable blocks are deleted, > >>> + which will be done in fixup_partitions. */ > >>> + fixup_partitions (); > >>> + > >>> #ifdef ENABLE_CHECKING > >>> - if (changed) > >>> - verify_flow_info (); > >>> + verify_flow_info (); > >>> #endif > >>> + } > >>> > >>> changed_overall |= changed; > >>> first_pass = false; > >>> Index: bb-reorder.c > >>> =================================================================== > >>> --- bb-reorder.c (revision 193376) > >>> +++ bb-reorder.c (working copy) > >>> @@ -1054,7 +1054,7 @@ connect_traces (int n_traces, struct trace *traces > >>> current_partition = BB_PARTITION (traces[0].first); > >>> two_passes = false; > >>> > >>> - if (flag_reorder_blocks_and_partition) > >>> + if (crtl->has_bb_partition) > >>> for (i = 0; i < n_traces && !two_passes; i++) > >>> if (BB_PARTITION (traces[0].first) > >>> != BB_PARTITION (traces[i].first)) > >>> @@ -1263,7 +1263,7 @@ connect_traces (int n_traces, struct trace *traces > >>> } > >>> } > >>> > >>> - if (flag_reorder_blocks_and_partition) > >>> + if (crtl->has_bb_partition) > >>> try_copy = false; > >>> > >>> /* Copy tiny blocks always; copy larger blocks only when the > >>> @@ -1381,13 +1381,14 @@ get_uncond_jump_length (void) > >>> return length; > >>> } > >>> > >>> -/* Emit a barrier into the footer of BB. */ > >>> +/* Emit a barrier after BB, into the footer if we are in CFGLAYOUT mode. 
*/ > >>> > >>> -static void > >>> +void > >>> emit_barrier_after_bb (basic_block bb) > >>> { > >>> rtx barrier = emit_barrier_after (BB_END (bb)); > >>> - BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier); > >>> + if (current_ir_type () == IR_RTL_CFGLAYOUT) > >>> + BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier); > >>> } > >>> > >>> /* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions. > >>> @@ -1463,18 +1464,109 @@ find_rarely_executed_basic_blocks_and_crossing_edg > >>> { > >>> VEC(edge, heap) *crossing_edges = NULL; > >>> basic_block bb; > >>> - edge e; > >>> - edge_iterator ei; > >>> + edge e, e2; > >>> + edge_iterator ei, ei2; > >>> + unsigned int cold_bb_count = 0; > >>> + VEC (basic_block, heap) *bbs_in_hot_partition = NULL; > >>> + VEC (basic_block, heap) *bbs_newly_hot = NULL; > >>> > >>> /* Mark which partition (hot/cold) each basic block belongs in. */ > >>> FOR_EACH_BB (bb) > >>> { > >>> if (probably_never_executed_bb_p (cfun, bb)) > >>> - BB_SET_PARTITION (bb, BB_COLD_PARTITION); > >>> + { > >>> + BB_SET_PARTITION (bb, BB_COLD_PARTITION); > >>> + cold_bb_count++; > >>> + } > >>> else > >>> - BB_SET_PARTITION (bb, BB_HOT_PARTITION); > >>> + { > >>> + BB_SET_PARTITION (bb, BB_HOT_PARTITION); > >>> + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, bb); > >>> + } > >>> } > >>> > >>> + /* Ensure that no cold bbs dominate hot bbs. This could happen as a result of > >>> + several different possibilities. One is that there are edge weight insanities > >>> + due to optimization phases that do not properly update basic block profile > >>> + counts. The second is that the entry of the function may not be hot, because > >>> + it is entered fewer times than the number of profile training runs, but there > >>> + is a loop inside the function that causes blocks within the function to be > >>> + above the threshold for hotness. 
*/ > >>> + if (cold_bb_count) > >>> + { > >>> + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); > >>> + > >>> + if (dom_calculated_here) > >>> + calculate_dominance_info (CDI_DOMINATORS); > >>> + > >>> + /* Keep examining hot bbs until we have either checked them all, or > >>> + re-marked all cold bbs hot. */ > >>> + while (! VEC_empty (basic_block, bbs_in_hot_partition) > >>> + && cold_bb_count) > >>> + { > >>> + basic_block dom_bb; > >>> + > >>> + bb = VEC_pop (basic_block, bbs_in_hot_partition); > >>> + dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb); > >>> + > >>> + /* If bb's immediate dominator is also hot then it is ok. */ > >>> + if (BB_PARTITION (dom_bb) != BB_COLD_PARTITION) > >>> + continue; > >>> + > >>> + /* We have a hot bb with an immediate dominator that is cold. > >>> + The dominator needs to be re-marked to hot. */ > >>> + BB_SET_PARTITION (dom_bb, BB_HOT_PARTITION); > >>> + cold_bb_count--; > >>> + > >>> + /* Now we need to examine newly-hot dom_bb to see if it is also > >>> + dominated by a cold bb. */ > >>> + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, dom_bb); > >>> + > >>> + /* We should also adjust any cold blocks that the newly-hot bb > >>> + feeds and see if it makes sense to re-mark those as hot as > >>> + well. */ > >>> + VEC_safe_push (basic_block, heap, bbs_newly_hot, dom_bb); > >>> + while (! VEC_empty (basic_block, bbs_newly_hot)) > >>> + { > >>> + basic_block new_hot_bb = VEC_pop (basic_block, bbs_newly_hot); > >>> + /* Examine all successors of this newly-hot bb to see if they > >>> + are cold and should be re-marked as hot. */ > >>> + FOR_EACH_EDGE (e, ei, new_hot_bb->succs) > >>> + { > >>> + bool any_cold_preds = false; > >>> + basic_block succ = e->dest; > >>> + if (BB_PARTITION (succ) != BB_COLD_PARTITION) > >>> + continue; > >>> + /* Does this block have any cold predecessors now? 
*/ > >>> + FOR_EACH_EDGE (e2, ei2, succ->preds) > >>> + { > >>> + if (BB_PARTITION (e2->src) == BB_COLD_PARTITION) > >>> + { > >>> + any_cold_preds = true; > >>> + break; > >>> + } > >>> + } > >>> + if (any_cold_preds) > >>> + continue; > >>> + > >>> + /* Here we have a successor of newly-hot bb that is cold > >>> + but no longer has any cold precessessors. Since the original > >>> + assignment of our newly-hot bb was incorrect, this successor's > >>> + assignment as cold is also suspect. Go ahead and re-mark it > >>> + as hot now too. Better heuristics may be in order here. */ > >>> + BB_SET_PARTITION (succ, BB_HOT_PARTITION); > >>> + cold_bb_count--; > >>> + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, succ); > >>> + /* Examine this successor as a newly-hot bb. */ > >>> + VEC_safe_push (basic_block, heap, bbs_newly_hot, succ); > >>> + } > >>> + } > >>> + } > >>> + > >>> + if (dom_calculated_here) > >>> + free_dominance_info (CDI_DOMINATORS); > >>> + } > >>> + > >>> /* The format of .gcc_except_table does not allow landing pads to > >>> be in a different partition as the throw. Fix this by either > >>> moving or duplicating the landing pads. */ > >>> @@ -1766,10 +1858,10 @@ fix_up_fall_thru_edges (void) > >>> new_bb->aux = cur_bb->aux; > >>> cur_bb->aux = new_bb; > >>> > >>> - /* Make sure new fall-through bb is in same > >>> - partition as bb it's falling through from. */ > >>> + /* This is done by force_nonfallthru_and_redirect. 
*/ > >>> + gcc_assert (BB_PARTITION (new_bb) > >>> + == BB_PARTITION (cur_bb)); > >>> > >>> - BB_COPY_PARTITION (new_bb, cur_bb); > >>> single_succ_edge (new_bb)->flags |= EDGE_CROSSING; > >>> } > >>> else > >>> @@ -2067,7 +2159,10 @@ add_reg_crossing_jump_notes (void) > >>> FOR_EACH_BB (bb) > >>> FOR_EACH_EDGE (e, ei, bb->succs) > >>> if ((e->flags & EDGE_CROSSING) > >>> - && JUMP_P (BB_END (e->src))) > >>> + && JUMP_P (BB_END (e->src)) > >>> + /* Some notes were added during fix_up_fall_thru_edges, via > >>> + force_nonfallthru_and_redirect. */ > >>> + && !find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX)) > >>> add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); > >>> } > >>> > >>> @@ -2160,7 +2255,7 @@ reorder_basic_blocks (void) > >>> dump_flow_info (dump_file, dump_flags); > >>> } > >>> > >>> - if (flag_reorder_blocks_and_partition) > >>> + if (crtl->has_bb_partition) > >>> verify_hot_cold_block_grouping (); > >>> } > >>> > >>> @@ -2172,14 +2267,14 @@ reorder_basic_blocks (void) > >>> encountering this note will make the compiler switch between the > >>> hot and cold text sections. */ > >>> > >>> -static void > >>> +void > >>> insert_section_boundary_note (void) > >>> { > >>> basic_block bb; > >>> rtx new_note; > >>> int first_partition = 0; > >>> > >>> - if (!flag_reorder_blocks_and_partition) > >>> + if (!crtl->has_bb_partition) > >>> return; > >>> > >>> FOR_EACH_BB (bb) > >>> @@ -2222,10 +2317,8 @@ rest_of_handle_reorder_blocks (void) > >>> FOR_EACH_BB (bb) > >>> if (bb->next_bb != EXIT_BLOCK_PTR) > >>> bb->aux = bb->next_bb; > >>> - cfg_layout_finalize (); > >>> + cfg_layout_finalize (true); > >>> > >>> - /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes. 
*/ > >>> - insert_section_boundary_note (); > >>> return 0; > >>> } > >>> > >>> @@ -2366,7 +2459,7 @@ duplicate_computed_gotos (void) > >>> } > >>> > >>> done: > >>> - cfg_layout_finalize (); > >>> + cfg_layout_finalize (false); > >>> > >>> BITMAP_FREE (candidates); > >>> return 0; > >>> @@ -2511,6 +2604,8 @@ partition_hot_cold_basic_blocks (void) > >>> if (crossing_edges == NULL) > >>> return 0; > >>> > >>> + crtl->has_bb_partition = true; > >>> + > >>> /* Make sure the source of any crossing edge ends in a jump and the > >>> destination of any crossing edge has a label. */ > >>> add_labels_and_missing_jumps (crossing_edges); > >>> Index: bb-reorder.h > >>> =================================================================== > >>> --- bb-reorder.h (revision 193376) > >>> +++ bb-reorder.h (working copy) > >>> @@ -36,4 +36,8 @@ extern struct target_bb_reorder *this_target_bb_re > >>> > >>> extern int get_uncond_jump_length (void); > >>> > >>> +extern void insert_section_boundary_note (void); > >>> + > >>> +extern void emit_barrier_after_bb (basic_block bb); > >>> + > >>> #endif > >>> Index: basic-block.h > >>> =================================================================== > >>> --- basic-block.h (revision 193376) > >>> +++ basic-block.h (working copy) > >>> @@ -806,6 +806,7 @@ extern basic_block force_nonfallthru_and_redirect > >>> extern bool contains_no_active_insn_p (const_basic_block); > >>> extern bool forwarder_block_p (const_basic_block); > >>> extern bool can_fallthru (basic_block, basic_block); > >>> +extern void fixup_partitions (void); > >>> > >>> /* In cfgbuild.c. */ > >>> extern void find_many_sub_basic_blocks (sbitmap); > >>> Index: cfgrtl.c > >>> =================================================================== > >>> --- cfgrtl.c (revision 193376) > >>> +++ cfgrtl.c (working copy) > >>> @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3. 
If not see > >>> #include "tree.h" > >>> #include "hard-reg-set.h" > >>> #include "basic-block.h" > >>> +#include "bb-reorder.h" > >>> #include "regs.h" > >>> #include "flags.h" > >>> #include "function.h" > >>> @@ -67,11 +68,12 @@ along with GCC; see the file COPYING3. If not see > >>> Only applicable if the CFG is in cfglayout mode. */ > >>> static GTY(()) rtx cfg_layout_function_footer; > >>> static GTY(()) rtx cfg_layout_function_header; > >>> +static bool had_sec_boundary_notes; > >>> > >>> static rtx skip_insns_after_block (basic_block); > >>> static void record_effective_endpoints (void); > >>> static rtx label_for_bb (basic_block); > >>> -static void fixup_reorder_chain (void); > >>> +static void fixup_reorder_chain (bool finalize_reorder_blocks); > >>> > >>> void verify_insn_chain (void); > >>> static void fixup_fallthru_exit_predecessor (void); > >>> @@ -976,8 +978,7 @@ try_redirect_by_replacing_jump (edge e, basic_bloc > >>> partition boundaries). See the comments at the top of > >>> bb-reorder.c:partition_hot_cold_basic_blocks for complete details. */ > >>> > >>> - if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX) > >>> - || BB_PARTITION (src) != BB_PARTITION (target)) > >>> + if (BB_PARTITION (src) != BB_PARTITION (target)) > >>> return NULL; > >>> > >>> /* We can replace or remove a complex jump only when we have exactly > >>> @@ -1286,6 +1287,71 @@ redirect_branch_edge (edge e, basic_block target) > >>> return e; > >>> } > >>> > >>> +/* Called when edge E has been redirected to a new destination, > >>> + in order to update the region crossing flag on the edge and > >>> + jump. 
*/ > >>> + > >>> +static void > >>> +fixup_partition_crossing (edge e, basic_block target) > >>> +{ > >>> + rtx note; > >>> + > >>> + gcc_assert (e->dest == target); > >>> + > >>> + if (e->src == ENTRY_BLOCK_PTR || target == EXIT_BLOCK_PTR) > >>> + return; > >>> + /* If we redirected an existing edge, it may already be marked > >>> + crossing, even though the new src is missing a reg crossing note. > >>> + But make sure reg crossing note doesn't already exist before > >>> + inserting. */ > >>> + if (BB_PARTITION (e->src) != BB_PARTITION (target)) > >>> + { > >>> + e->flags |= EDGE_CROSSING; > >>> + note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); > >>> + if (JUMP_P (BB_END (e->src)) > >>> + && !note) > >>> + add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); > >>> + } > >>> + else if (BB_PARTITION (e->src) == BB_PARTITION (target)) > >>> + { > >>> + e->flags &= ~EDGE_CROSSING; > >>> + /* Remove the region crossing note from jump at end of > >>> + e->src if it exists. */ > >>> + note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); > >>> + if (note) > >>> + remove_note (BB_END (e->src), note); > >>> + } > >>> +} > >>> + > >>> +/* Called when block BB has been reassigned to a different partition, > >>> + to ensure that the region crossing attributes are updated. */ > >>> + > >>> +static void > >>> +fixup_bb_partition (basic_block bb) > >>> +{ > >>> + edge e; > >>> + edge_iterator ei; > >>> + > >>> + /* Now need to make bb's pred edges non-region crossing. */ > >>> + FOR_EACH_EDGE (e, ei, bb->preds) > >>> + { > >>> + fixup_partition_crossing (e, e->dest); > >>> + } > >>> + > >>> + /* Possibly need to make bb's successor edges region crossing, > >>> + or remove stale region crossing. 
*/ > >>> + FOR_EACH_EDGE (e, ei, bb->succs) > >>> + { > >>> + if ((e->flags & EDGE_FALLTHRU) > >>> + && BB_PARTITION (bb) != BB_PARTITION (e->dest) > >>> + && e->dest != EXIT_BLOCK_PTR) > >>> + /* force_nonfallthru_and_redirect calls fixup_partition_crossing. */ > >>> + force_nonfallthru (e); > >>> + else > >>> + fixup_partition_crossing (e, e->dest); > >>> + } > >>> +} > >>> + > >>> /* Attempt to change code to redirect edge E to TARGET. Don't do that on > >>> expense of adding new instructions or reordering basic blocks. > >>> > >>> @@ -1302,16 +1368,18 @@ rtl_redirect_edge_and_branch (edge e, basic_block > >>> { > >>> edge ret; > >>> basic_block src = e->src; > >>> + basic_block dest = e->dest; > >>> > >>> if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH)) > >>> return NULL; > >>> > >>> - if (e->dest == target) > >>> + if (dest == target) > >>> return e; > >>> > >>> if ((ret = try_redirect_by_replacing_jump (e, target, false)) != NULL) > >>> { > >>> df_set_bb_dirty (src); > >>> + fixup_partition_crossing (ret, target); > >>> return ret; > >>> } > >>> > >>> @@ -1320,6 +1388,7 @@ rtl_redirect_edge_and_branch (edge e, basic_block > >>> return NULL; > >>> > >>> df_set_bb_dirty (src); > >>> + fixup_partition_crossing (ret, target); > >>> return ret; > >>> } > >>> > >>> @@ -1454,18 +1523,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc > >>> /* Make sure new block ends up in correct hot/cold section. */ > >>> > >>> BB_COPY_PARTITION (jump_block, e->src); > >>> - if (flag_reorder_blocks_and_partition > >>> - && targetm_common.have_named_sections > >>> - && JUMP_P (BB_END (jump_block)) > >>> - && !any_condjump_p (BB_END (jump_block)) > >>> - && (EDGE_SUCC (jump_block, 0)->flags & EDGE_CROSSING)) > >>> - add_reg_note (BB_END (jump_block), REG_CROSSING_JUMP, NULL_RTX); > >>> > >>> /* Wire edge in. 
*/ > >>> new_edge = make_edge (e->src, jump_block, EDGE_FALLTHRU); > >>> new_edge->probability = probability; > >>> new_edge->count = count; > >>> > >>> + /* If e->src was previously region crossing, it no longer is > >>> + and the reg crossing note should be removed. */ > >>> + fixup_partition_crossing (new_edge, jump_block); > >>> + > >>> /* Redirect old edge. */ > >>> redirect_edge_pred (e, jump_block); > >>> e->probability = REG_BR_PROB_BASE; > >>> @@ -1521,13 +1588,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc > >>> LABEL_NUSES (label)++; > >>> } > >>> > >>> - emit_barrier_after (BB_END (jump_block)); > >>> + /* We might be in cfg layout mode, and if so, the following routine will > >>> + insert the barrier correctly. */ > >>> + emit_barrier_after_bb (jump_block); > >>> redirect_edge_succ_nodup (e, target); > >>> > >>> if (abnormal_edge_flags) > >>> make_edge (src, target, abnormal_edge_flags); > >>> > >>> df_mark_solutions_dirty (); > >>> + fixup_partition_crossing (e, target); > >>> return new_bb; > >>> } > >>> > >>> @@ -1626,7 +1696,7 @@ rtl_move_block_after (basic_block bb ATTRIBUTE_UNU > >>> static basic_block > >>> rtl_split_edge (edge edge_in) > >>> { > >>> - basic_block bb; > >>> + basic_block bb, new_bb; > >>> rtx before; > >>> > >>> /* Abnormal edges cannot be split. */ > >>> @@ -1659,12 +1729,26 @@ rtl_split_edge (edge edge_in) > >>> else > >>> { > >>> bb = create_basic_block (before, NULL, edge_in->dest->prev_bb); > >>> - /* ??? Why not edge_in->dest->prev_bb here? */ > >>> - BB_COPY_PARTITION (bb, edge_in->dest); > >>> + if (edge_in->src == ENTRY_BLOCK_PTR) > >>> + BB_COPY_PARTITION (bb, edge_in->dest); > >>> + else > >>> + /* Put the split bb into the src partition, to avoid creating > >>> + a situation where a cold bb dominates a hot bb, in the case > >>> + where src is cold and dest is hot. The src will dominate > >>> + the new bb (whereas it might not have dominated dest). 
*/ > >>> + BB_COPY_PARTITION (bb, edge_in->src); > >>> } > >>> > >>> make_single_succ_edge (bb, edge_in->dest, EDGE_FALLTHRU); > >>> > >>> + /* Can't allow a region crossing edge to be fallthrough. */ > >>> + if (BB_PARTITION (bb) != BB_PARTITION (edge_in->dest) > >>> + && edge_in->dest != EXIT_BLOCK_PTR) > >>> + { > >>> + new_bb = force_nonfallthru (single_succ_edge (bb)); > >>> + gcc_assert (!new_bb); > >>> + } > >>> + > >>> /* For non-fallthru edges, we must adjust the predecessor's > >>> jump instruction to target our new block. */ > >>> if ((edge_in->flags & EDGE_FALLTHRU) == 0) > >>> @@ -1777,17 +1861,13 @@ commit_one_edge_insertion (edge e) > >>> else > >>> { > >>> bb = split_edge (e); > >>> - after = BB_END (bb); > >>> > >>> - if (flag_reorder_blocks_and_partition > >>> - && targetm_common.have_named_sections > >>> - && e->src != ENTRY_BLOCK_PTR > >>> - && BB_PARTITION (e->src) == BB_COLD_PARTITION > >>> - && !(e->flags & EDGE_CROSSING) > >>> - && JUMP_P (after) > >>> - && !any_condjump_p (after) > >>> - && (single_succ_edge (bb)->flags & EDGE_CROSSING)) > >>> - add_reg_note (after, REG_CROSSING_JUMP, NULL_RTX); > >>> + /* If e crossed a partition boundary, we needed to make bb end in > >>> + a region-crossing jump, even though it was originally fallthru. */ > >>> + if (JUMP_P (BB_END (bb))) > >>> + before = BB_END (bb); > >>> + else > >>> + after = BB_END (bb); > >>> } > >>> > >>> /* Now that we've found the spot, do the insertion. */ > >>> @@ -1827,6 +1907,14 @@ commit_edge_insertions (void) > >>> { > >>> basic_block bb; > >>> > >>> + /* Optimization passes that invoke this routine can cause hot blocks > >>> + previously reached by both hot and cold blocks to become dominated only > >>> + by cold blocks. This will cause the verification below to fail, > >>> + and lead to now cold code in the hot section. In some cases this > >>> + may only be visible after newly unreachable blocks are deleted, > >>> + which will be done by fixup_partitions. 
*/ > >>> + fixup_partitions (); > >>> + > >>> #ifdef ENABLE_CHECKING > >>> verify_flow_info (); > >>> #endif > >>> @@ -2028,7 +2116,75 @@ get_last_bb_insn (basic_block bb) > >>> > >>> return end; > >>> } > >>> - > >>> + > >>> +/* Perform cleanup on the hot/cold bb partitioning after optimization > >>> + passes that modify the cfg. */ > >>> + > >>> +void > >>> +fixup_partitions (void) > >>> +{ > >>> + basic_block bb; > >>> + > >>> + if (!crtl->has_bb_partition) > >>> + return; > >>> + > >>> + /* Delete any blocks that became unreachable and weren't > >>> + already cleaned up, for example during edge forwarding > >>> + and convert_jumps_to_returns. This will expose more > >>> + opportunities for fixing the partition boundaries here. > >>> + Also, the calculation of the dominance graph during verification > >>> + will assert if there are unreachable nodes. */ > >>> + delete_unreachable_blocks (); > >>> + > >>> + /* If there are partitions, do a sanity check on them: A basic block in > >>> + a cold partition cannot dominate a basic block in a hot partition. > >>> + Fixup any that now violate this requirement, as a result of edge > >>> + forwarding and unreachable block deletion. */ > >>> + VEC (basic_block, heap) *bbs_in_cold_partition = NULL; > >>> + VEC (basic_block, heap) *bbs_to_fix = NULL; > >>> + FOR_EACH_BB (bb) > >>> + if ((BB_PARTITION (bb) == BB_COLD_PARTITION)) > >>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb); > >>> + if (! VEC_empty (basic_block, bbs_in_cold_partition)) > >>> + { > >>> + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); > >>> + basic_block son; > >>> + > >>> + if (dom_calculated_here) > >>> + calculate_dominance_info (CDI_DOMINATORS); > >>> + > >>> + while (! VEC_empty (basic_block, bbs_in_cold_partition)) > >>> + { > >>> + bb = VEC_pop (basic_block, bbs_in_cold_partition); > >>> + /* If bb is not yet cold (because it was added below as > >>> + a block dominated by a cold bb) then mark it cold here. 
*/ > >>> + if ((BB_PARTITION (bb) != BB_COLD_PARTITION)) > >>> + { > >>> + BB_SET_PARTITION (bb, BB_COLD_PARTITION); > >>> + VEC_safe_push (basic_block, heap, bbs_to_fix, bb); > >>> + } > >>> + /* Any blocks dominated by a block in the cold section > >>> + must also be cold. */ > >>> + for (son = first_dom_son (CDI_DOMINATORS, bb); > >>> + son; > >>> + son = next_dom_son (CDI_DOMINATORS, son)) > >>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son); > >>> + } > >>> + > >>> + if (dom_calculated_here) > >>> + free_dominance_info (CDI_DOMINATORS); > >>> + } > >>> + > >>> + /* Do the partition fixup after all necessary blocks have been converted to > >>> + cold, so that we only update the region crossings the minimum number of > >>> + places, which can require forcing edges to be non fallthru. */ > >>> + while (! VEC_empty (basic_block, bbs_to_fix)) > >>> + { > >>> + bb = VEC_pop (basic_block, bbs_to_fix); > >>> + fixup_bb_partition (bb); > >>> + } > >>> +} > >>> + > >>> /* Verify the CFG and RTL consistency common for both underlying RTL and > >>> cfglayout RTL. > >>> > >>> @@ -2052,6 +2208,7 @@ rtl_verify_flow_info_1 (void) > >>> rtx x; > >>> int err = 0; > >>> basic_block bb; > >>> + bool have_partitions = false; > >>> > >>> /* Check the general integrity of the basic blocks. */ > >>> FOR_EACH_BB_REVERSE (bb) > >>> @@ -2169,6 +2326,8 @@ rtl_verify_flow_info_1 (void) > >>> > >>> if (e->flags & EDGE_ABNORMAL) > >>> n_abnormal++; > >>> + > >>> + have_partitions |= is_crossing; > >>> } > >>> > >>> if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, NULL_RTX)) > >>> @@ -2293,6 +2452,40 @@ rtl_verify_flow_info_1 (void) > >>> } > >>> } > >>> > >>> + /* If there are partitions, do a sanity check on them: A basic block in > >>> + a cold partition cannot dominate a basic block in a hot partition. 
*/ > >>> + VEC (basic_block, heap) *bbs_in_cold_partition = NULL; > >>> + if (have_partitions && !err) > >>> + FOR_EACH_BB (bb) > >>> + if ((BB_PARTITION (bb) == BB_COLD_PARTITION)) > >>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb); > >>> + if (! VEC_empty (basic_block, bbs_in_cold_partition)) > >>> + { > >>> + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); > >>> + basic_block son; > >>> + > >>> + if (dom_calculated_here) > >>> + calculate_dominance_info (CDI_DOMINATORS); > >>> + > >>> + while (! VEC_empty (basic_block, bbs_in_cold_partition)) > >>> + { > >>> + bb = VEC_pop (basic_block, bbs_in_cold_partition); > >>> + if ((BB_PARTITION (bb) != BB_COLD_PARTITION)) > >>> + { > >>> + error ("non-cold basic block %d dominated " > >>> + "by a block in the cold partition", bb->index); > >>> + err = 1; > >>> + } > >>> + for (son = first_dom_son (CDI_DOMINATORS, bb); > >>> + son; > >>> + son = next_dom_son (CDI_DOMINATORS, son)) > >>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son); > >>> + } > >>> + > >>> + if (dom_calculated_here) > >>> + free_dominance_info (CDI_DOMINATORS); > >>> + } > >>> + > >>> /* Clean up. */ > >>> return err; > >>> } > >>> @@ -2965,14 +3158,41 @@ record_effective_endpoints (void) > >>> else > >>> cfg_layout_function_header = NULL_RTX; > >>> > >>> + had_sec_boundary_notes = false; > >>> + > >>> next_insn = get_insns (); > >>> FOR_EACH_BB (bb) > >>> { > >>> rtx end; > >>> > >>> if (PREV_INSN (BB_HEAD (bb)) && next_insn != BB_HEAD (bb)) > >>> - BB_HEADER (bb) = unlink_insn_chain (next_insn, > >>> - PREV_INSN (BB_HEAD (bb))); > >>> + { > >>> + /* Rather than try to keep section boundary notes incrementally > >>> + up-to-date through cfg layout optimizations, simply remove them > >>> + and flag that they should be re-inserted when exiting > >>> + cfg layout mode. 
*/ > >>> + rtx check_insn = next_insn; > >>> + while (check_insn) > >>> + { > >>> + if (NOTE_P (check_insn) > >>> + && NOTE_KIND (check_insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS) > >>> + { > >>> + had_sec_boundary_notes |= true; > >>> + /* Remove note from chain. Grab new next_insn first. */ > >>> + if (next_insn == check_insn) > >>> + next_insn = NEXT_INSN (check_insn); > >>> + /* Delete note. */ > >>> + delete_insn (check_insn); > >>> + /* There will only be one. */ > >>> + break; > >>> + } > >>> + check_insn = NEXT_INSN (check_insn); > >>> + } > >>> + /* If we still have header instructions left after above loop. */ > >>> + if (next_insn != BB_HEAD (bb)) > >>> + BB_HEADER (bb) = unlink_insn_chain (next_insn, > >>> + PREV_INSN (BB_HEAD (bb))); > >>> + } > >>> end = skip_insns_after_block (bb); > >>> if (NEXT_INSN (BB_END (bb)) && BB_END (bb) != end) > >>> BB_FOOTER (bb) = unlink_insn_chain (NEXT_INSN (BB_END (bb)), end); > >>> @@ -3000,7 +3220,7 @@ outof_cfg_layout_mode (void) > >>> if (bb->next_bb != EXIT_BLOCK_PTR) > >>> bb->aux = bb->next_bb; > >>> > >>> - cfg_layout_finalize (); > >>> + cfg_layout_finalize (false); > >>> > >>> return 0; > >>> } > >>> @@ -3120,10 +3340,13 @@ relink_block_chain (bool stay_in_cfglayout_mode) > >>> } > >>> > >>> > >>> -/* Given a reorder chain, rearrange the code to match. */ > >>> +/* Given a reorder chain, rearrange the code to match. If > >>> + this is called when we will FINALIZE_REORDER_BLOCKS, or when > >>> + section boundary notes were removed on entry to cfg layout > >>> + mode, insert section boundary notes here. 
*/ > >>> > >>> static void > >>> -fixup_reorder_chain (void) > >>> +fixup_reorder_chain (bool finalize_reorder_blocks) > >>> { > >>> basic_block bb; > >>> rtx insn = NULL; > >>> @@ -3150,7 +3373,7 @@ static void > >>> PREV_INSN (BB_HEADER (bb)) = insn; > >>> insn = BB_HEADER (bb); > >>> while (NEXT_INSN (insn)) > >>> - insn = NEXT_INSN (insn); > >>> + insn = NEXT_INSN (insn); > >>> } > >>> if (insn) > >>> NEXT_INSN (insn) = BB_HEAD (bb); > >>> @@ -3175,6 +3398,11 @@ static void > >>> insn = NEXT_INSN (insn); > >>> > >>> set_last_insn (insn); > >>> + > >>> + /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes. */ > >>> + if (had_sec_boundary_notes || finalize_reorder_blocks) > >>> + insert_section_boundary_note (); > >>> + > >>> #ifdef ENABLE_CHECKING > >>> verify_insn_chain (); > >>> #endif > >>> @@ -3187,7 +3415,7 @@ static void > >>> edge e_fall, e_taken, e; > >>> rtx bb_end_insn; > >>> rtx ret_label = NULL_RTX; > >>> - basic_block nb, src_bb; > >>> + basic_block nb; > >>> edge_iterator ei; > >>> > >>> if (EDGE_COUNT (bb->succs) == 0) > >>> @@ -3322,7 +3550,6 @@ static void > >>> /* We got here if we need to add a new jump insn. > >>> Note force_nonfallthru can delete E_FALL and thus we have to > >>> save E_FALL->src prior to the call to force_nonfallthru. */ > >>> - src_bb = e_fall->src; > >>> nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, ret_label); > >>> if (nb) > >>> { > >>> @@ -3330,17 +3557,6 @@ static void > >>> bb->aux = nb; > >>> /* Don't process this new block. */ > >>> bb = nb; > >>> - > >>> - /* Make sure new bb is tagged for correct section (same as > >>> - fall-thru source, since you cannot fall-thru across > >>> - section boundaries). 
*/ > >>> - BB_COPY_PARTITION (src_bb, single_pred (bb)); > >>> - if (flag_reorder_blocks_and_partition > >>> - && targetm_common.have_named_sections > >>> - && JUMP_P (BB_END (bb)) > >>> - && !any_condjump_p (BB_END (bb)) > >>> - && (EDGE_SUCC (bb, 0)->flags & EDGE_CROSSING)) > >>> - add_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX); > >>> } > >>> } > >>> > >>> @@ -3644,10 +3860,11 @@ duplicate_insn_chain (rtx from, rtx to) > >>> case NOTE_INSN_FUNCTION_BEG: > >>> /* There is always just single entry to function. */ > >>> case NOTE_INSN_BASIC_BLOCK: > >>> + /* We should only switch text sections once. */ > >>> + case NOTE_INSN_SWITCH_TEXT_SECTIONS: > >>> break; > >>> > >>> case NOTE_INSN_EPILOGUE_BEG: > >>> - case NOTE_INSN_SWITCH_TEXT_SECTIONS: > >>> emit_note_copy (insn); > >>> break; > >>> > >>> @@ -3759,10 +3976,13 @@ break_superblocks (void) > >>> } > >>> > >>> /* Finalize the changes: reorder insn list according to the sequence specified > >>> - by aux pointers, enter compensation code, rebuild scope forest. */ > >>> + by aux pointers, enter compensation code, rebuild scope forest. If > >>> + this is called when we will FINALIZE_REORDER_BLOCKS, indicate that > >>> + to fixup_reorder_chain so that it can insert the proper switch text > >>> + section notes. 
*/ > >>> > >>> void > >>> -cfg_layout_finalize (void) > >>> +cfg_layout_finalize (bool finalize_reorder_blocks) > >>> { > >>> #ifdef ENABLE_CHECKING > >>> verify_flow_info (); > >>> @@ -3775,7 +3995,7 @@ void > >>> #endif > >>> ) > >>> fixup_fallthru_exit_predecessor (); > >>> - fixup_reorder_chain (); > >>> + fixup_reorder_chain (finalize_reorder_blocks); > >>> > >>> rebuild_jump_labels (get_insns ()); > >>> delete_dead_jumptables (); > >>> @@ -4454,8 +4674,7 @@ rtl_can_remove_branch_p (const_edge e) > >>> if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH)) > >>> return false; > >>> > >>> - if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX) > >>> - || BB_PARTITION (src) != BB_PARTITION (target)) > >>> + if (BB_PARTITION (src) != BB_PARTITION (target)) > >>> return false; > >>> > >>> if (!onlyjump_p (insn) > >>> > >>> -- > >>> This patch is available for review at http://codereview.appspot.com/6823047 > >> > >> > >> > >> -- > >> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413 > > > > -- > Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
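Editorial sketch, not part of the patch: the worklist propagation in fixup_partitions quoted above (every block dominated by a cold block must itself become cold) can be illustrated outside GCC with a toy model. Everything here is an assumption made for illustration: the names toy_cfg, toy_fixup_partitions, PART_HOT and PART_COLD are hypothetical, and the dominator tree is supplied as a precomputed idom array instead of being built by calculate_dominance_info.

```c
#include <assert.h>

enum { MAX_BLOCKS = 16 };

enum partition { PART_HOT, PART_COLD };

/* Toy stand-in for a CFG whose dominator tree is already known.
   idom[b] is the immediate dominator of block b, or -1 for the entry.  */
struct toy_cfg
{
  int n_blocks;
  int idom[MAX_BLOCKS];
  enum partition part[MAX_BLOCKS];
};

/* Mark every block dominated by a cold block as cold.  Blocks whose
   partition changed are recorded in FIXED, mirroring bbs_to_fix in the
   patch; the return value is the number of such blocks.  */
int
toy_fixup_partitions (struct toy_cfg *cfg, int fixed[MAX_BLOCKS])
{
  /* Each block is pushed at most twice: once if it starts out cold and
     once as the dominator-tree son of a popped block.  */
  int worklist[2 * MAX_BLOCKS];
  int n_work = 0;
  int n_fixed = 0;

  assert (cfg->n_blocks >= 0 && cfg->n_blocks <= MAX_BLOCKS);

  /* Seed the worklist with the blocks that are already cold.  */
  for (int b = 0; b < cfg->n_blocks; b++)
    if (cfg->part[b] == PART_COLD)
      worklist[n_work++] = b;

  while (n_work > 0)
    {
      int bb = worklist[--n_work];

      /* A hot block reached here is dominated by a cold one.  */
      if (cfg->part[bb] != PART_COLD)
        {
          cfg->part[bb] = PART_COLD;
          fixed[n_fixed++] = bb;
        }

      /* Push the dominator-tree sons of bb.  */
      for (int son = 0; son < cfg->n_blocks; son++)
        if (cfg->idom[son] == bb)
          {
            assert (n_work < 2 * MAX_BLOCKS);
            worklist[n_work++] = son;
          }
    }

  return n_fixed;
}
```

In the patch itself the blocks collected this way are then handed to fixup_bb_partition, which updates EDGE_CROSSING flags and REG_CROSSING_JUMP notes; the toy model stops at the partition reassignment.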
Sorry, I don't know what happened there. Patch is attached. Thanks, Teresa On Mon, Nov 26, 2012 at 12:42 PM, Jack Howarth <howarth@bromo.med.uc.edu> wrote: > On Mon, Nov 26, 2012 at 12:19:55PM -0800, Teresa Johnson wrote: >> Are you sure you have all my changes applied? I applied the 4 patches >> attached to PR55121 into my trunk checkout that has my fixes, and to a >> pristine trunk checkout. I configured and built both for >> --target=arm-none-linux-gnueabi, and built using your options, .i file >> and gcda file. I can reproduce the failure using the pristine trunk >> with your patches but not with my fixed trunk + your patches. (I just >> updated to head to pickup recent changes and get the same result. The >> vec changes required some manual changes to the patch, which I will >> resend shortly.) > > Teresa, > Your mailer seems to have corrupted the posted patch with stray > =3D characters and line breaks. Can you repost a copy as an attachment > to the list? > Jack > >> >> Without my fixes: >> >> $ ~/extra/gcc_trunk_3_arm-eabi/gcc/cc1 -fpreproce >> ssed eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9 >> -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp >> -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use >> -fno-common -o eval.s -freorder-blocks-and-partition >> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi) >> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version >> 2.4.2-p1, MPC version 0.8.1 >> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 >> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi) >> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version >> 2.4.2-p1, MPC version 0.8.1 >> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 >> Compiler executable checksum: d19cc60a2f07de08237a8488bb35cd1a >> eval.c: In function ‘Ge’: >> eval.c:792:1: internal compiler error: in df_compact_blocks, at 
df-core.c:1560 >> } >> ^ >> 0x622f71 df_compact_blocks() >> ../../gcc_trunk_3/gcc/df-core.c:1560 >> 0x5cfcb5 compact_blocks() >> ../../gcc_trunk_3/gcc/cfg.c:162 >> 0xc9dce0 reorder_basic_blocks >> ../../gcc_trunk_3/gcc/bb-reorder.c:2154 >> 0xc9dce0 rest_of_handle_reorder_blocks >> ../../gcc_trunk_3/gcc/bb-reorder.c:2219 >> Please submit a full bug report, >> with preprocessed source if appropriate. >> Please include the complete backtrace with any bug report. >> See <http://gcc.gnu.org/bugs.html> for instructions. >> >> >> With my fixes: >> >> $ ~/extra/gcc_trunk_4_arm-eabi/gcc/cc1 -fpreproce >> ssed eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9 >> -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp >> -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use >> -fno-common -o eval.s -freorder-blocks-and-partition >> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi) >> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version >> 2.4.2-p1, MPC version 0.8.1 >> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 >> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi) >> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version >> 2.4.2-p1, MPC version 0.8.1 >> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 >> Compiler executable checksum: 45b468efa7c981f9afb44c4dac2424f3 >> >> >> Thanks, >> Teresa >> >> On Mon, Nov 26, 2012 at 8:25 AM, Christophe Lyon >> <christophe.lyon@linaro.org> wrote: >> > Hi, >> > >> > I have tested your patch on Spec2000 on ARM, and I can still see >> > several failures caused by: >> > "error: fallthru edge crosses section boundary", including the case >> > described in PR55121. >> > >> > On 26 November 2012 16:55, Teresa Johnson <tejohnson@google.com> wrote: >> >> Ping. 
>> >> Teresa >> >> >> >> On Thu, Nov 15, 2012 at 12:10 PM, Teresa Johnson <tejohnson@google.com> wrote: >> >>> Revised patch that fixes failures encountered when enabling >> >>> -freorder-blocks-and-partition, including the failure reported in PR 53743. >> >>> >> >>> This includes new verification code to ensure no cold blocks dominate hot >> >>> blocks contributed by Steven Bosscher. >> >>> >> >>> I attempted to make the handling of partition updates through the optimization >> >>> passes much more consistent, removing a number of partial fixes in the code >> >>> stream in the process. The code to fixup partitions (including the BB_PARTITION >> >>> assignment, region crossing jump notes, and switch text section notes) is >> >>> now handled in a few centralized locations. For example, inside >> >>> rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that callers >> >>> don't need to attempt the fixup themselves. >> >>> >> >>> For optimization passes that make adjustments to the cfg while in cfg layout >> >>> mode that are not easy to fix up incrementally, the new routine >> >>> fixup_partitions handles the cleanup globally. This does require calculation >> >>> of the dominance relation, however, as far as I can tell the routines which >> >>> now invoke this global fixup (try_optimize_cfg and commit_edge_insertions) >> >>> are invoked typically once (or a small number of times in the case of >> >>> try_optimize_cfg) per optimization pass. Additionally, I compared the >> >>> -ftime-report output for some large fdo compilations and saw only minimal >> >>> increases in the dominance computation times, which were only a tiny percent >> >>> of the overall compile time. >> >>> >> >>> Additionally, I added a flag to the rtl_data structure to indicate whether >> >>> any partitioning was actually performed, so that optimizations which were >> >>> conservatively disabled whenever the flag_reorder_blocks_and_partition >> >>> is enabled (e.g. 
try_crossjump_to_edge, part of connect_traces) can be less >> >>> conservative for functions where no partitions were formed (e.g. they are >> >>> completely hot). >> >>> >> >>> Bootstrapped and tested on x86_64-unknown-linux-gnu. Also tested with SPEC2006 int >> >>> benchmarks and internal google benchmarks using profile feedback and >> >>> -freorder-blocks-and-partition to get more coverage. Ok for trunk? >> >>> >> >>> Thanks, >> >>> Teresa >> >>> >> >>> 2012-11-14 Teresa Johnson <tejohnson@google.com> >> >>> Steven Bosscher <steven@gcc.gnu.org> >> >>> >> >>> * cfghooks.h (cfg_layout_finalize): New parameter. >> >>> * modulo-sched.c (rest_of_handle_sms): New cfg_layout_finalize >> >>> parameter. >> >>> * ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert >> >>> as this is now done by redirect_edge_and_branch_force. >> >>> * function.c (thread_prologue_and_epilogue_insns): Insert new bb after >> >>> barriers, new cfg_layout_finalize parameter, and don't store exit >> >>> predecessor BB until after it is potentially split. >> >>> * function.h (struct rtl_data): New flag has_bb_partition. >> >>> * hw-doloop.c (reorder_loops): New cfg_layout_finalize parameter. >> >>> * cfgcleanup.c (try_crossjump_to_edge): Only skip optimization if >> >>> any blocks in function actually partitioned. >> >>> (try_optimize_cfg): If cfg changed, invoke fixup_partitions to clean >> >>> up partitioning. >> >>> * bb-reorder.c (connect_traces): Only look for partitions and skip >> >>> block copying if any blocks in function actually partitioned. >> >>> (emit_barrier_after_bb): Handle insertion in non-cfglayout mode. >> >>> (find_rarely_executed_basic_blocks_and_crossing_edges): Ensure >> >>> that no cold blocks dominate a hot block. >> >>> (fix_up_fall_thru_edges): Replace BB_COPY_PARTITION with assert >> >>> as this is now done by force_nonfallthru_and_redirect. 
>> >>> (add_reg_crossing_jump_notes): Handle the fact that some jumps may >> >>> already be marked with region crossing note. >> >>> (reorder_basic_blocks): Only need to verify partitions if any >> >>> blocks in function actually partitioned. >> >>> (insert_section_boundary_note): Only need to insert note if any >> >>> blocks in function actually partitioned. >> >>> (rest_of_handle_reorder_blocks): New cfg_layout_finalize >> >>> parameter, and remove call to insert_section_boundary_note as this >> >>> is now called via cfg_layout_finalize/fixup_reorder_chain. >> >>> (duplicate_computed_gotos): New cfg_layout_finalize >> >>> parameter. >> >>> (partition_hot_cold_basic_blocks): Set flag indicating function >> >>> has bb partitions. >> >>> * bb-reorder.h: Declare insert_section_boundary_note and >> >>> emit_barrier_after_bb, which are no longer static. >> >>> * basic-block.h: Declare new function fixup_partitions. >> >>> * cfgrtl.c (try_redirect_by_replacing_jump): Remove unnecessary >> >>> check for region crossing note. >> >>> (fixup_partition_crossing): New function. >> >>> (fixup_bb_partition): Ditto. >> >>> (rtl_redirect_edge_and_branch): Fixup partition boundaries. >> >>> (force_nonfallthru_and_redirect): Fixup partition boundaries, >> >>> remove old code that tried to do this. Emit barrier correctly >> >>> when we are in cfglayout mode. >> >>> (rtl_split_edge): Correctly fixup partition boundaries. >> >>> (commit_one_edge_insertion): Remove old code that tried to >> >>> fixup region crossing edge since this is now handled in >> >>> split_block, and set up insertion point correctly since >> >>> block may now end in a jump. >> >>> (commit_edge_insertions): Invoke fixup_partitions to sanitize partition >> >>> boundaries after optimizations that modify cfg and before trying to >> >>> verify the flow info. >> >>> (fixup_partitions): New function. >> >>> (rtl_verify_flow_info_1): Add verification that no cold bbs dominate >> >>> hot bbs. 
>> >>> (record_effective_endpoints): Remove region-crossing notes and set flag >> >>> indicating that they need to be reinserted on exit from cfglayout mode. >> >>> (outof_cfg_layout_mode): New cfg_layout_finalize parameter. >> >>> (fixup_reorder_chain): Call insert_section_boundary_note if necessary. >> >>> Remove old code that attempted to fixup region crossing note as >> >>> this is now handled in force_nonfallthru_and_redirect. >> >>> (duplicate_insn_chain): Don't duplicate switch section notes. >> >>> (cfg_layout_finalize): Pass new parameter to fixup_reorder_chain. >> >>> (rtl_can_remove_branch_p): Remove unnecessary check for region crossing >> >>> note. >> >>> >> >>> Index: cfghooks.h >> >>> =================================================================== >> >>> --- cfghooks.h (revision 193376) >> >>> +++ cfghooks.h (working copy) >> >>> @@ -204,7 +204,7 @@ extern void copy_bbs (basic_block *, unsigned, bas >> >>> void account_profile_record (struct profile_record *, int); >> >>> >> >>> extern void cfg_layout_initialize (unsigned int); >> >>> -extern void cfg_layout_finalize (void); >> >>> +extern void cfg_layout_finalize (bool); >> >>> >> >>> /* Hooks containers. 
*/ >> >>> extern struct cfg_hooks gimple_cfg_hooks; >> >>> @@ -218,4 +218,3 @@ extern void cfg_layout_rtl_register_cfg_hooks (voi >> >>> extern void gimple_register_cfg_hooks (void); >> >>> extern struct cfg_hooks get_cfg_hooks (void); >> >>> extern void set_cfg_hooks (struct cfg_hooks); >> >>> - >> >>> Index: modulo-sched.c >> >>> =================================================================== >> >>> --- modulo-sched.c (revision 193376) >> >>> +++ modulo-sched.c (working copy) >> >>> @@ -3354,7 +3354,7 @@ rest_of_handle_sms (void) >> >>> if (bb->next_bb != EXIT_BLOCK_PTR) >> >>> bb->aux = bb->next_bb; >> >>> free_dominance_info (CDI_DOMINATORS); >> >>> - cfg_layout_finalize (); >> >>> + cfg_layout_finalize (false); >> >>> #endif /* INSN_SCHEDULING */ >> >>> return 0; >> >>> } >> >>> Index: ifcvt.c >> >>> =================================================================== >> >>> --- ifcvt.c (revision 193376) >> >>> +++ ifcvt.c (working copy) >> >>> @@ -3900,10 +3900,9 @@ find_if_case_1 (basic_block test_bb, edge then_edg >> >>> if (new_bb) >> >>> { >> >>> df_bb_replace (then_bb_index, new_bb); >> >>> - /* Since the fallthru edge was redirected from test_bb to new_bb, >> >>> - we need to ensure that new_bb is in the same partition as >> >>> - test bb (you can not fall through across section boundaries). */ >> >>> - BB_COPY_PARTITION (new_bb, test_bb); >> >>> + /* This should have been done above via force_nonfallthru_and_redirect >> >>> + (possibly called from redirect_edge_and_branch_force). 
*/ >> >>> + gcc_assert (BB_PARTITION (new_bb) == BB_PARTITION (test_bb)); >> >>> } >> >>> >> >>> num_true_changes++; >> >>> Index: function.c >> >>> =================================================================== >> >>> --- function.c (revision 193376) >> >>> +++ function.c (working copy) >> >>> @@ -6249,8 +6249,10 @@ thread_prologue_and_epilogue_insns (void) >> >>> break; >> >>> if (e) >> >>> { >> >>> - copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)), >> >>> - NULL_RTX, e->src); >> >>> + /* Make sure we insert after any barriers. */ >> >>> + rtx end = get_last_bb_insn (e->src); >> >>> + copy_bb = create_basic_block (NEXT_INSN (end), >> >>> + NULL_RTX, e->src); >> >>> BB_COPY_PARTITION (copy_bb, e->src); >> >>> } >> >>> else >> >>> @@ -6475,7 +6477,7 @@ thread_prologue_and_epilogue_insns (void) >> >>> if (cur_bb->index >= NUM_FIXED_BLOCKS >> >>> && cur_bb->next_bb->index >= NUM_FIXED_BLOCKS) >> >>> cur_bb->aux = cur_bb->next_bb; >> >>> - cfg_layout_finalize (); >> >>> + cfg_layout_finalize (false); >> >>> } >> >>> >> >>> epilogue_done: >> >>> @@ -6517,7 +6519,7 @@ epilogue_done: >> >>> basic_block simple_return_block_cold = NULL; >> >>> edge pending_edge_hot = NULL; >> >>> edge pending_edge_cold = NULL; >> >>> - basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb; >> >>> + basic_block exit_pred; >> >>> int i; >> >>> >> >>> gcc_assert (entry_edge != orig_entry_edge); >> >>> @@ -6545,6 +6547,12 @@ epilogue_done: >> >>> else >> >>> pending_edge_cold = e; >> >>> } >> >>> + >> >>> + /* Save a pointer to the exit's predecessor BB for use in >> >>> + inserting new BBs at the end of the function. Do this >> >>> + after the call to split_block above which may split >> >>> + the original exit pred. 
*/ >> >>> + exit_pred = EXIT_BLOCK_PTR->prev_bb; >> >>> >> >>> FOR_EACH_VEC_ELT (edge, unconverted_simple_returns, i, e) >> >>> { >> >>> Index: function.h >> >>> =================================================================== >> >>> --- function.h (revision 193376) >> >>> +++ function.h (working copy) >> >>> @@ -459,6 +459,11 @@ struct GTY(()) rtl_data { >> >>> sched2) and is useful only if the port defines LEAF_REGISTERS. */ >> >>> bool uses_only_leaf_regs; >> >>> >> >>> + /* Nonzero if the function being compiled has undergone hot/cold partitioning >> >>> + (under flag_reorder_blocks_and_partition) and has at least one cold >> >>> + block. */ >> >>> + bool has_bb_partition; >> >>> + >> >>> /* Like regs_ever_live, but 1 if a reg is set or clobbered from an >> >>> asm. Unlike regs_ever_live, elements of this array corresponding >> >>> to eliminable regs (like the frame pointer) are set if an asm >> >>> Index: hw-doloop.c >> >>> =================================================================== >> >>> --- hw-doloop.c (revision 193376) >> >>> +++ hw-doloop.c (working copy) >> >>> @@ -547,7 +547,7 @@ reorder_loops (hwloop_info loops) >> >>> else >> >>> bb->aux = NULL; >> >>> } >> >>> - cfg_layout_finalize (); >> >>> + cfg_layout_finalize (false); >> >>> clear_aux_for_blocks (); >> >>> df_analyze (); >> >>> } >> >>> Index: cfgcleanup.c >> >>> =================================================================== >> >>> --- cfgcleanup.c (revision 193376) >> >>> +++ cfgcleanup.c (working copy) >> >>> @@ -1824,7 +1824,7 @@ try_crossjump_to_edge (int mode, edge e1, edge e2, >> >>> partition boundaries). See the comments at the top of >> >>> bb-reorder.c:partition_hot_cold_basic_blocks for complete details. */ >> >>> >> >>> - if (flag_reorder_blocks_and_partition && reload_completed) >> >>> + if (crtl->has_bb_partition && reload_completed) >> >>> return false; >> >>> >> >>> /* Search backward through forwarder blocks. 
We don't need to worry >> >>> @@ -2767,10 +2767,21 @@ try_optimize_cfg (int mode) >> >>> df_analyze (); >> >>> } >> >>> >> >>> + if (changed) >> >>> + { >> >>> + /* Edge forwarding in particular can cause hot blocks previously >> >>> + reached by both hot and cold blocks to become dominated only >> >>> + by cold blocks. This will cause the verification below to fail, >> >>> + and lead to now cold code in the hot section. This is not easy >> >>> + to detect and fix during edge forwarding, and in some cases >> >>> + is only visible after newly unreachable blocks are deleted, >> >>> + which will be done in fixup_partitions. */ >> >>> + fixup_partitions (); >> >>> + >> >>> #ifdef ENABLE_CHECKING >> >>> - if (changed) >> >>> - verify_flow_info (); >> >>> + verify_flow_info (); >> >>> #endif >> >>> + } >> >>> >> >>> changed_overall |= changed; >> >>> first_pass = false; >> >>> Index: bb-reorder.c >> >>> =================================================================== >> >>> --- bb-reorder.c (revision 193376) >> >>> +++ bb-reorder.c (working copy) >> >>> @@ -1054,7 +1054,7 @@ connect_traces (int n_traces, struct trace *traces >> >>> current_partition = BB_PARTITION (traces[0].first); >> >>> two_passes = false; >> >>> >> >>> - if (flag_reorder_blocks_and_partition) >> >>> + if (crtl->has_bb_partition) >> >>> for (i = 0; i < n_traces && !two_passes; i++) >> >>> if (BB_PARTITION (traces[0].first) >> >>> != BB_PARTITION (traces[i].first)) >> >>> @@ -1263,7 +1263,7 @@ connect_traces (int n_traces, struct trace *traces >> >>> } >> >>> } >> >>> >> >>> - if (flag_reorder_blocks_and_partition) >> >>> + if (crtl->has_bb_partition) >> >>> try_copy = false; >> >>> >> >>> /* Copy tiny blocks always; copy larger blocks only when the >> >>> @@ -1381,13 +1381,14 @@ get_uncond_jump_length (void) >> >>> return length; >> >>> } >> >>> >> >>> -/* Emit a barrier into the footer of BB. */ >> >>> +/* Emit a barrier after BB, into the footer if we are in CFGLAYOUT mode. 
*/ >> >>> >> >>> -static void >> >>> +void >> >>> emit_barrier_after_bb (basic_block bb) >> >>> { >> >>> rtx barrier = emit_barrier_after (BB_END (bb)); >> >>> - BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier); >> >>> + if (current_ir_type () == IR_RTL_CFGLAYOUT) >> >>> + BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier); >> >>> } >> >>> >> >>> /* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions. >> >>> @@ -1463,18 +1464,109 @@ find_rarely_executed_basic_blocks_and_crossing_edg >> >>> { >> >>> VEC(edge, heap) *crossing_edges = NULL; >> >>> basic_block bb; >> >>> - edge e; >> >>> - edge_iterator ei; >> >>> + edge e, e2; >> >>> + edge_iterator ei, ei2; >> >>> + unsigned int cold_bb_count = 0; >> >>> + VEC (basic_block, heap) *bbs_in_hot_partition = NULL; >> >>> + VEC (basic_block, heap) *bbs_newly_hot = NULL; >> >>> >> >>> /* Mark which partition (hot/cold) each basic block belongs in. */ >> >>> FOR_EACH_BB (bb) >> >>> { >> >>> if (probably_never_executed_bb_p (cfun, bb)) >> >>> - BB_SET_PARTITION (bb, BB_COLD_PARTITION); >> >>> + { >> >>> + BB_SET_PARTITION (bb, BB_COLD_PARTITION); >> >>> + cold_bb_count++; >> >>> + } >> >>> else >> >>> - BB_SET_PARTITION (bb, BB_HOT_PARTITION); >> >>> + { >> >>> + BB_SET_PARTITION (bb, BB_HOT_PARTITION); >> >>> + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, bb); >> >>> + } >> >>> } >> >>> >> >>> + /* Ensure that no cold bbs dominate hot bbs. This could happen as a result of >> >>> + several different possibilities. One is that there are edge weight insanities >> >>> + due to optimization phases that do not properly update basic block profile >> >>> + counts. The second is that the entry of the function may not be hot, because >> >>> + it is entered fewer times than the number of profile training runs, but there >> >>> + is a loop inside the function that causes blocks within the function to be >> >>> + above the threshold for hotness. 
*/ >> >>> + if (cold_bb_count) >> >>> + { >> >>> + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); >> >>> + >> >>> + if (dom_calculated_here) >> >>> + calculate_dominance_info (CDI_DOMINATORS); >> >>> + >> >>> + /* Keep examining hot bbs until we have either checked them all, or >> >>> + re-marked all cold bbs hot. */ >> >>> + while (! VEC_empty (basic_block, bbs_in_hot_partition) >> >>> + && cold_bb_count) >> >>> + { >> >>> + basic_block dom_bb; >> >>> + >> >>> + bb = VEC_pop (basic_block, bbs_in_hot_partition); >> >>> + dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb); >> >>> + >> >>> + /* If bb's immediate dominator is also hot then it is ok. */ >> >>> + if (BB_PARTITION (dom_bb) != BB_COLD_PARTITION) >> >>> + continue; >> >>> + >> >>> + /* We have a hot bb with an immediate dominator that is cold. >> >>> + The dominator needs to be re-marked to hot. */ >> >>> + BB_SET_PARTITION (dom_bb, BB_HOT_PARTITION); >> >>> + cold_bb_count--; >> >>> + >> >>> + /* Now we need to examine newly-hot dom_bb to see if it is also >> >>> + dominated by a cold bb. */ >> >>> + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, dom_bb); >> >>> + >> >>> + /* We should also adjust any cold blocks that the newly-hot bb >> >>> + feeds and see if it makes sense to re-mark those as hot as >> >>> + well. */ >> >>> + VEC_safe_push (basic_block, heap, bbs_newly_hot, dom_bb); >> >>> + while (! VEC_empty (basic_block, bbs_newly_hot)) >> >>> + { >> >>> + basic_block new_hot_bb = VEC_pop (basic_block, bbs_newly_hot); >> >>> + /* Examine all successors of this newly-hot bb to see if they >> >>> + are cold and should be re-marked as hot. */ >> >>> + FOR_EACH_EDGE (e, ei, new_hot_bb->succs) >> >>> + { >> >>> + bool any_cold_preds = false; >> >>> + basic_block succ = e->dest; >> >>> + if (BB_PARTITION (succ) != BB_COLD_PARTITION) >> >>> + continue; >> >>> + /* Does this block have any cold predecessors now? 
*/ >> >>> + FOR_EACH_EDGE (e2, ei2, succ->preds) >> >>> + { >> >>> + if (BB_PARTITION (e2->src) == BB_COLD_PARTITION) >> >>> + { >> >>> + any_cold_preds = true; >> >>> + break; >> >>> + } >> >>> + } >> >>> + if (any_cold_preds) >> >>> + continue; >> >>> + >> >>> + /* Here we have a successor of newly-hot bb that is cold >> >>> + but no longer has any cold predecessors. Since the original >> >>> + assignment of our newly-hot bb was incorrect, this successor's >> >>> + assignment as cold is also suspect. Go ahead and re-mark it >> >>> + as hot now too. Better heuristics may be in order here. */ >> >>> + BB_SET_PARTITION (succ, BB_HOT_PARTITION); >> >>> + cold_bb_count--; >> >>> + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, succ); >> >>> + /* Examine this successor as a newly-hot bb. */ >> >>> + VEC_safe_push (basic_block, heap, bbs_newly_hot, succ); >> >>> + } >> >>> + } >> >>> + } >> >>> + >> >>> + if (dom_calculated_here) >> >>> + free_dominance_info (CDI_DOMINATORS); >> >>> + } >> >>> + >> >>> /* The format of .gcc_except_table does not allow landing pads to >> >>> be in a different partition as the throw. Fix this by either >> >>> moving or duplicating the landing pads. */ >> >>> @@ -1766,10 +1858,10 @@ fix_up_fall_thru_edges (void) >> >>> new_bb->aux = cur_bb->aux; >> >>> cur_bb->aux = new_bb; >> >>> >> >>> - /* Make sure new fall-through bb is in same >> >>> - partition as bb it's falling through from. */ >> >>> + /* This is done by force_nonfallthru_and_redirect. 
*/ >> >>> + gcc_assert (BB_PARTITION (new_bb) >> >>> + == BB_PARTITION (cur_bb)); >> >>> >> >>> - BB_COPY_PARTITION (new_bb, cur_bb); >> >>> single_succ_edge (new_bb)->flags |= EDGE_CROSSING; >> >>> } >> >>> else >> >>> @@ -2067,7 +2159,10 @@ add_reg_crossing_jump_notes (void) >> >>> FOR_EACH_BB (bb) >> >>> FOR_EACH_EDGE (e, ei, bb->succs) >> >>> if ((e->flags & EDGE_CROSSING) >> >>> - && JUMP_P (BB_END (e->src))) >> >>> + && JUMP_P (BB_END (e->src)) >> >>> + /* Some notes were added during fix_up_fall_thru_edges, via >> >>> + force_nonfallthru_and_redirect. */ >> >>> + && !find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX)) >> >>> add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); >> >>> } >> >>> >> >>> @@ -2160,7 +2255,7 @@ reorder_basic_blocks (void) >> >>> dump_flow_info (dump_file, dump_flags); >> >>> } >> >>> >> >>> - if (flag_reorder_blocks_and_partition) >> >>> + if (crtl->has_bb_partition) >> >>> verify_hot_cold_block_grouping (); >> >>> } >> >>> >> >>> @@ -2172,14 +2267,14 @@ reorder_basic_blocks (void) >> >>> encountering this note will make the compiler switch between the >> >>> hot and cold text sections. */ >> >>> >> >>> -static void >> >>> +void >> >>> insert_section_boundary_note (void) >> >>> { >> >>> basic_block bb; >> >>> rtx new_note; >> >>> int first_partition = 0; >> >>> >> >>> - if (!flag_reorder_blocks_and_partition) >> >>> + if (!crtl->has_bb_partition) >> >>> return; >> >>> >> >>> FOR_EACH_BB (bb) >> >>> @@ -2222,10 +2317,8 @@ rest_of_handle_reorder_blocks (void) >> >>> FOR_EACH_BB (bb) >> >>> if (bb->next_bb != EXIT_BLOCK_PTR) >> >>> bb->aux = bb->next_bb; >> >>> - cfg_layout_finalize (); >> >>> + cfg_layout_finalize (true); >> >>> >> >>> - /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes. 
*/ >> >>> - insert_section_boundary_note (); >> >>> return 0; >> >>> } >> >>> >> >>> @@ -2366,7 +2459,7 @@ duplicate_computed_gotos (void) >> >>> } >> >>> >> >>> done: >> >>> - cfg_layout_finalize (); >> >>> + cfg_layout_finalize (false); >> >>> >> >>> BITMAP_FREE (candidates); >> >>> return 0; >> >>> @@ -2511,6 +2604,8 @@ partition_hot_cold_basic_blocks (void) >> >>> if (crossing_edges == NULL) >> >>> return 0; >> >>> >> >>> + crtl->has_bb_partition = true; >> >>> + >> >>> /* Make sure the source of any crossing edge ends in a jump and the >> >>> destination of any crossing edge has a label. */ >> >>> add_labels_and_missing_jumps (crossing_edges); >> >>> Index: bb-reorder.h >> >>> =================================================================== >> >>> --- bb-reorder.h (revision 193376) >> >>> +++ bb-reorder.h (working copy) >> >>> @@ -36,4 +36,8 @@ extern struct target_bb_reorder *this_target_bb_re >> >>> >> >>> extern int get_uncond_jump_length (void); >> >>> >> >>> +extern void insert_section_boundary_note (void); >> >>> + >> >>> +extern void emit_barrier_after_bb (basic_block bb); >> >>> + >> >>> #endif >> >>> Index: basic-block.h >> >>> =================================================================== >> >>> --- basic-block.h (revision 193376) >> >>> +++ basic-block.h (working copy) >> >>> @@ -806,6 +806,7 @@ extern basic_block force_nonfallthru_and_redirect >> >>> extern bool contains_no_active_insn_p (const_basic_block); >> >>> extern bool forwarder_block_p (const_basic_block); >> >>> extern bool can_fallthru (basic_block, basic_block); >> >>> +extern void fixup_partitions (void); >> >>> >> >>> /* In cfgbuild.c. */ >> >>> extern void find_many_sub_basic_blocks (sbitmap); >> >>> Index: cfgrtl.c >> >>> =================================================================== >> >>> --- cfgrtl.c (revision 193376) >> >>> +++ cfgrtl.c (working copy) >> >>> @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3. 
If not see >> >>> #include "tree.h" >> >>> #include "hard-reg-set.h" >> >>> #include "basic-block.h" >> >>> +#include "bb-reorder.h" >> >>> #include "regs.h" >> >>> #include "flags.h" >> >>> #include "function.h" >> >>> @@ -67,11 +68,12 @@ along with GCC; see the file COPYING3. If not see >> >>> Only applicable if the CFG is in cfglayout mode. */ >> >>> static GTY(()) rtx cfg_layout_function_footer; >> >>> static GTY(()) rtx cfg_layout_function_header; >> >>> +static bool had_sec_boundary_notes; >> >>> >> >>> static rtx skip_insns_after_block (basic_block); >> >>> static void record_effective_endpoints (void); >> >>> static rtx label_for_bb (basic_block); >> >>> -static void fixup_reorder_chain (void); >> >>> +static void fixup_reorder_chain (bool finalize_reorder_blocks); >> >>> >> >>> void verify_insn_chain (void); >> >>> static void fixup_fallthru_exit_predecessor (void); >> >>> @@ -976,8 +978,7 @@ try_redirect_by_replacing_jump (edge e, basic_bloc >> >>> partition boundaries). See the comments at the top of >> >>> bb-reorder.c:partition_hot_cold_basic_blocks for complete details. */ >> >>> >> >>> - if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX) >> >>> - || BB_PARTITION (src) != BB_PARTITION (target)) >> >>> + if (BB_PARTITION (src) != BB_PARTITION (target)) >> >>> return NULL; >> >>> >> >>> /* We can replace or remove a complex jump only when we have exactly >> >>> @@ -1286,6 +1287,71 @@ redirect_branch_edge (edge e, basic_block target) >> >>> return e; >> >>> } >> >>> >> >>> +/* Called when edge E has been redirected to a new destination, >> >>> + in order to update the region crossing flag on the edge and >> >>> + jump. 
*/ >> >>> + >> >>> +static void >> >>> +fixup_partition_crossing (edge e, basic_block target) >> >>> +{ >> >>> + rtx note; >> >>> + >> >>> + gcc_assert (e->dest == target); >> >>> + >> >>> + if (e->src == ENTRY_BLOCK_PTR || target == EXIT_BLOCK_PTR) >> >>> + return; >> >>> + /* If we redirected an existing edge, it may already be marked >> >>> + crossing, even though the new src is missing a reg crossing note. >> >>> + But make sure reg crossing note doesn't already exist before >> >>> + inserting. */ >> >>> + if (BB_PARTITION (e->src) != BB_PARTITION (target)) >> >>> + { >> >>> + e->flags |= EDGE_CROSSING; >> >>> + note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); >> >>> + if (JUMP_P (BB_END (e->src)) >> >>> + && !note) >> >>> + add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); >> >>> + } >> >>> + else if (BB_PARTITION (e->src) == BB_PARTITION (target)) >> >>> + { >> >>> + e->flags &= ~EDGE_CROSSING; >> >>> + /* Remove the region crossing note from jump at end of >> >>> + e->src if it exists. */ >> >>> + note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); >> >>> + if (note) >> >>> + remove_note (BB_END (e->src), note); >> >>> + } >> >>> +} >> >>> + >> >>> +/* Called when block BB has been reassigned to a different partition, >> >>> + to ensure that the region crossing attributes are updated. */ >> >>> + >> >>> +static void >> >>> +fixup_bb_partition (basic_block bb) >> >>> +{ >> >>> + edge e; >> >>> + edge_iterator ei; >> >>> + >> >>> + /* Now need to make bb's pred edges non-region crossing. */ >> >>> + FOR_EACH_EDGE (e, ei, bb->preds) >> >>> + { >> >>> + fixup_partition_crossing (e, e->dest); >> >>> + } >> >>> + >> >>> + /* Possibly need to make bb's successor edges region crossing, >> >>> + or remove stale region crossing. 
*/ >> >>> + FOR_EACH_EDGE (e, ei, bb->succs) >> >>> + { >> >>> + if ((e->flags & EDGE_FALLTHRU) >> >>> + && BB_PARTITION (bb) != BB_PARTITION (e->dest) >> >>> + && e->dest != EXIT_BLOCK_PTR) >> >>> + /* force_nonfallthru_and_redirect calls fixup_partition_crossing. */ >> >>> + force_nonfallthru (e); >> >>> + else >> >>> + fixup_partition_crossing (e, e->dest); >> >>> + } >> >>> +} >> >>> + >> >>> /* Attempt to change code to redirect edge E to TARGET. Don't do that on >> >>> expense of adding new instructions or reordering basic blocks. >> >>> >> >>> @@ -1302,16 +1368,18 @@ rtl_redirect_edge_and_branch (edge e, basic_block >> >>> { >> >>> edge ret; >> >>> basic_block src = e->src; >> >>> + basic_block dest = e->dest; >> >>> >> >>> if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH)) >> >>> return NULL; >> >>> >> >>> - if (e->dest == target) >> >>> + if (dest == target) >> >>> return e; >> >>> >> >>> if ((ret = try_redirect_by_replacing_jump (e, target, false)) != NULL) >> >>> { >> >>> df_set_bb_dirty (src); >> >>> + fixup_partition_crossing (ret, target); >> >>> return ret; >> >>> } >> >>> >> >>> @@ -1320,6 +1388,7 @@ rtl_redirect_edge_and_branch (edge e, basic_block >> >>> return NULL; >> >>> >> >>> df_set_bb_dirty (src); >> >>> + fixup_partition_crossing (ret, target); >> >>> return ret; >> >>> } >> >>> >> >>> @@ -1454,18 +1523,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc >> >>> /* Make sure new block ends up in correct hot/cold section. */ >> >>> >> >>> BB_COPY_PARTITION (jump_block, e->src); >> >>> - if (flag_reorder_blocks_and_partition >> >>> - && targetm_common.have_named_sections >> >>> - && JUMP_P (BB_END (jump_block)) >> >>> - && !any_condjump_p (BB_END (jump_block)) >> >>> - && (EDGE_SUCC (jump_block, 0)->flags & EDGE_CROSSING)) >> >>> - add_reg_note (BB_END (jump_block), REG_CROSSING_JUMP, NULL_RTX); >> >>> >> >>> /* Wire edge in. 
*/ >> >>> new_edge = make_edge (e->src, jump_block, EDGE_FALLTHRU); >> >>> new_edge->probability = probability; >> >>> new_edge->count = count; >> >>> >> >>> + /* If e->src was previously region crossing, it no longer is >> >>> + and the reg crossing note should be removed. */ >> >>> + fixup_partition_crossing (new_edge, jump_block); >> >>> + >> >>> /* Redirect old edge. */ >> >>> redirect_edge_pred (e, jump_block); >> >>> e->probability = REG_BR_PROB_BASE; >> >>> @@ -1521,13 +1588,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc >> >>> LABEL_NUSES (label)++; >> >>> } >> >>> >> >>> - emit_barrier_after (BB_END (jump_block)); >> >>> + /* We might be in cfg layout mode, and if so, the following routine will >> >>> + insert the barrier correctly. */ >> >>> + emit_barrier_after_bb (jump_block); >> >>> redirect_edge_succ_nodup (e, target); >> >>> >> >>> if (abnormal_edge_flags) >> >>> make_edge (src, target, abnormal_edge_flags); >> >>> >> >>> df_mark_solutions_dirty (); >> >>> + fixup_partition_crossing (e, target); >> >>> return new_bb; >> >>> } >> >>> >> >>> @@ -1626,7 +1696,7 @@ rtl_move_block_after (basic_block bb ATTRIBUTE_UNU >> >>> static basic_block >> >>> rtl_split_edge (edge edge_in) >> >>> { >> >>> - basic_block bb; >> >>> + basic_block bb, new_bb; >> >>> rtx before; >> >>> >> >>> /* Abnormal edges cannot be split. */ >> >>> @@ -1659,12 +1729,26 @@ rtl_split_edge (edge edge_in) >> >>> else >> >>> { >> >>> bb = create_basic_block (before, NULL, edge_in->dest->prev_bb); >> >>> - /* ??? Why not edge_in->dest->prev_bb here? */ >> >>> - BB_COPY_PARTITION (bb, edge_in->dest); >> >>> + if (edge_in->src == ENTRY_BLOCK_PTR) >> >>> + BB_COPY_PARTITION (bb, edge_in->dest); >> >>> + else >> >>> + /* Put the split bb into the src partition, to avoid creating >> >>> + a situation where a cold bb dominates a hot bb, in the case >> >>> + where src is cold and dest is hot. The src will dominate >> >>> + the new bb (whereas it might not have dominated dest). 
*/ >> >>> + BB_COPY_PARTITION (bb, edge_in->src); >> >>> } >> >>> >> >>> make_single_succ_edge (bb, edge_in->dest, EDGE_FALLTHRU); >> >>> >> >>> + /* Can't allow a region crossing edge to be fallthrough. */ >> >>> + if (BB_PARTITION (bb) != BB_PARTITION (edge_in->dest) >> >>> + && edge_in->dest != EXIT_BLOCK_PTR) >> >>> + { >> >>> + new_bb = force_nonfallthru (single_succ_edge (bb)); >> >>> + gcc_assert (!new_bb); >> >>> + } >> >>> + >> >>> /* For non-fallthru edges, we must adjust the predecessor's >> >>> jump instruction to target our new block. */ >> >>> if ((edge_in->flags & EDGE_FALLTHRU) == 0) >> >>> @@ -1777,17 +1861,13 @@ commit_one_edge_insertion (edge e) >> >>> else >> >>> { >> >>> bb = split_edge (e); >> >>> - after = BB_END (bb); >> >>> >> >>> - if (flag_reorder_blocks_and_partition >> >>> - && targetm_common.have_named_sections >> >>> - && e->src != ENTRY_BLOCK_PTR >> >>> - && BB_PARTITION (e->src) == BB_COLD_PARTITION >> >>> - && !(e->flags & EDGE_CROSSING) >> >>> - && JUMP_P (after) >> >>> - && !any_condjump_p (after) >> >>> - && (single_succ_edge (bb)->flags & EDGE_CROSSING)) >> >>> - add_reg_note (after, REG_CROSSING_JUMP, NULL_RTX); >> >>> + /* If e crossed a partition boundary, we needed to make bb end in >> >>> + a region-crossing jump, even though it was originally fallthru. */ >> >>> + if (JUMP_P (BB_END (bb))) >> >>> + before = BB_END (bb); >> >>> + else >> >>> + after = BB_END (bb); >> >>> } >> >>> >> >>> /* Now that we've found the spot, do the insertion. */ >> >>> @@ -1827,6 +1907,14 @@ commit_edge_insertions (void) >> >>> { >> >>> basic_block bb; >> >>> >> >>> + /* Optimization passes that invoke this routine can cause hot blocks >> >>> + previously reached by both hot and cold blocks to become dominated only >> >>> + by cold blocks. This will cause the verification below to fail, >> >>> + and lead to now cold code in the hot section. 
In some cases this >> >>> + may only be visible after newly unreachable blocks are deleted, >> >>> + which will be done by fixup_partitions. */ >> >>> + fixup_partitions (); >> >>> + >> >>> #ifdef ENABLE_CHECKING >> >>> verify_flow_info (); >> >>> #endif >> >>> @@ -2028,7 +2116,75 @@ get_last_bb_insn (basic_block bb) >> >>> >> >>> return end; >> >>> } >> >>> - >> >>> + >> >>> +/* Perform cleanup on the hot/cold bb partitioning after optimization >> >>> + passes that modify the cfg. */ >> >>> + >> >>> +void >> >>> +fixup_partitions (void) >> >>> +{ >> >>> + basic_block bb; >> >>> + >> >>> + if (!crtl->has_bb_partition) >> >>> + return; >> >>> + >> >>> + /* Delete any blocks that became unreachable and weren't >> >>> + already cleaned up, for example during edge forwarding >> >>> + and convert_jumps_to_returns. This will expose more >> >>> + opportunities for fixing the partition boundaries here. >> >>> + Also, the calculation of the dominance graph during verification >> >>> + will assert if there are unreachable nodes. */ >> >>> + delete_unreachable_blocks (); >> >>> + >> >>> + /* If there are partitions, do a sanity check on them: A basic block in >> >>> + a cold partition cannot dominate a basic block in a hot partition. >> >>> + Fixup any that now violate this requirement, as a result of edge >> >>> + forwarding and unreachable block deletion. */ >> >>> + VEC (basic_block, heap) *bbs_in_cold_partition = NULL; >> >>> + VEC (basic_block, heap) *bbs_to_fix = NULL; >> >>> + FOR_EACH_BB (bb) >> >>> + if ((BB_PARTITION (bb) == BB_COLD_PARTITION)) >> >>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb); >> >>> + if (! VEC_empty (basic_block, bbs_in_cold_partition)) >> >>> + { >> >>> + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); >> >>> + basic_block son; >> >>> + >> >>> + if (dom_calculated_here) >> >>> + calculate_dominance_info (CDI_DOMINATORS); >> >>> + >> >>> + while (! 
VEC_empty (basic_block, bbs_in_cold_partition)) >> >>> + { >> >>> + bb = VEC_pop (basic_block, bbs_in_cold_partition); >> >>> + /* If bb is not yet cold (because it was added below as >> >>> + a block dominated by a cold bb) then mark it cold here. */ >> >>> + if ((BB_PARTITION (bb) != BB_COLD_PARTITION)) >> >>> + { >> >>> + BB_SET_PARTITION (bb, BB_COLD_PARTITION); >> >>> + VEC_safe_push (basic_block, heap, bbs_to_fix, bb); >> >>> + } >> >>> + /* Any blocks dominated by a block in the cold section >> >>> + must also be cold. */ >> >>> + for (son = first_dom_son (CDI_DOMINATORS, bb); >> >>> + son; >> >>> + son = next_dom_son (CDI_DOMINATORS, son)) >> >>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son); >> >>> + } >> >>> + >> >>> + if (dom_calculated_here) >> >>> + free_dominance_info (CDI_DOMINATORS); >> >>> + } >> >>> + >> >>> + /* Do the partition fixup after all necessary blocks have been converted to >> >>> + cold, so that we only update the region crossings the minimum number of >> >>> + places, which can require forcing edges to be non fallthru. */ >> >>> + while (! VEC_empty (basic_block, bbs_to_fix)) >> >>> + { >> >>> + bb = VEC_pop (basic_block, bbs_to_fix); >> >>> + fixup_bb_partition (bb); >> >>> + } >> >>> +} >> >>> + >> >>> /* Verify the CFG and RTL consistency common for both underlying RTL and >> >>> cfglayout RTL. >> >>> >> >>> @@ -2052,6 +2208,7 @@ rtl_verify_flow_info_1 (void) >> >>> rtx x; >> >>> int err = 0; >> >>> basic_block bb; >> >>> + bool have_partitions = false; >> >>> >> >>> /* Check the general integrity of the basic blocks. 
*/ >> >>> FOR_EACH_BB_REVERSE (bb) >> >>> @@ -2169,6 +2326,8 @@ rtl_verify_flow_info_1 (void) >> >>> >> >>> if (e->flags & EDGE_ABNORMAL) >> >>> n_abnormal++; >> >>> + >> >>> + have_partitions |= is_crossing; >> >>> } >> >>> >> >>> if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, NULL_RTX)) >> >>> @@ -2293,6 +2452,40 @@ rtl_verify_flow_info_1 (void) >> >>> } >> >>> } >> >>> >> >>> + /* If there are partitions, do a sanity check on them: A basic block in >> >>> + a cold partition cannot dominate a basic block in a hot partition. */ >> >>> + VEC (basic_block, heap) *bbs_in_cold_partition = NULL; >> >>> + if (have_partitions && !err) >> >>> + FOR_EACH_BB (bb) >> >>> + if ((BB_PARTITION (bb) == BB_COLD_PARTITION)) >> >>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb); >> >>> + if (! VEC_empty (basic_block, bbs_in_cold_partition)) >> >>> + { >> >>> + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); >> >>> + basic_block son; >> >>> + >> >>> + if (dom_calculated_here) >> >>> + calculate_dominance_info (CDI_DOMINATORS); >> >>> + >> >>> + while (! VEC_empty (basic_block, bbs_in_cold_partition)) >> >>> + { >> >>> + bb = VEC_pop (basic_block, bbs_in_cold_partition); >> >>> + if ((BB_PARTITION (bb) != BB_COLD_PARTITION)) >> >>> + { >> >>> + error ("non-cold basic block %d dominated " >> >>> + "by a block in the cold partition", bb->index); >> >>> + err = 1; >> >>> + } >> >>> + for (son = first_dom_son (CDI_DOMINATORS, bb); >> >>> + son; >> >>> + son = next_dom_son (CDI_DOMINATORS, son)) >> >>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son); >> >>> + } >> >>> + >> >>> + if (dom_calculated_here) >> >>> + free_dominance_info (CDI_DOMINATORS); >> >>> + } >> >>> + >> >>> /* Clean up. 
*/ >> >>> return err; >> >>> } >> >>> @@ -2965,14 +3158,41 @@ record_effective_endpoints (void) >> >>> else >> >>> cfg_layout_function_header = NULL_RTX; >> >>> >> >>> + had_sec_boundary_notes = false; >> >>> + >> >>> next_insn = get_insns (); >> >>> FOR_EACH_BB (bb) >> >>> { >> >>> rtx end; >> >>> >> >>> if (PREV_INSN (BB_HEAD (bb)) && next_insn != BB_HEAD (bb)) >> >>> - BB_HEADER (bb) = unlink_insn_chain (next_insn, >> >>> - PREV_INSN (BB_HEAD (bb))); >> >>> + { >> >>> + /* Rather than try to keep section boundary notes incrementally >> >>> + up-to-date through cfg layout optimizations, simply remove them >> >>> + and flag that they should be re-inserted when exiting >> >>> + cfg layout mode. */ >> >>> + rtx check_insn = next_insn; >> >>> + while (check_insn) >> >>> + { >> >>> + if (NOTE_P (check_insn) >> >>> + && NOTE_KIND (check_insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS) >> >>> + { >> >>> + had_sec_boundary_notes |= true; >> >>> + /* Remove note from chain. Grab new next_insn first. */ >> >>> + if (next_insn == check_insn) >> >>> + next_insn = NEXT_INSN (check_insn); >> >>> + /* Delete note. */ >> >>> + delete_insn (check_insn); >> >>> + /* There will only be one. */ >> >>> + break; >> >>> + } >> >>> + check_insn = NEXT_INSN (check_insn); >> >>> + } >> >>> + /* If we still have header instructions left after above loop. 
*/ >> >>> + if (next_insn != BB_HEAD (bb)) >> >>> + BB_HEADER (bb) = unlink_insn_chain (next_insn, >> >>> + PREV_INSN (BB_HEAD (bb))); >> >>> + } >> >>> end = skip_insns_after_block (bb); >> >>> if (NEXT_INSN (BB_END (bb)) && BB_END (bb) != end) >> >>> BB_FOOTER (bb) = unlink_insn_chain (NEXT_INSN (BB_END (bb)), end); >> >>> @@ -3000,7 +3220,7 @@ outof_cfg_layout_mode (void) >> >>> if (bb->next_bb != EXIT_BLOCK_PTR) >> >>> bb->aux = bb->next_bb; >> >>> >> >>> - cfg_layout_finalize (); >> >>> + cfg_layout_finalize (false); >> >>> >> >>> return 0; >> >>> } >> >>> @@ -3120,10 +3340,13 @@ relink_block_chain (bool stay_in_cfglayout_mode) >> >>> } >> >>> >> >>> >> >>> -/* Given a reorder chain, rearrange the code to match. */ >> >>> +/* Given a reorder chain, rearrange the code to match. If >> >>> + this is called when we will FINALIZE_REORDER_BLOCKS, or when >> >>> + section boundary notes were removed on entry to cfg layout >> >>> + mode, insert section boundary notes here. */ >> >>> >> >>> static void >> >>> -fixup_reorder_chain (void) >> >>> +fixup_reorder_chain (bool finalize_reorder_blocks) >> >>> { >> >>> basic_block bb; >> >>> rtx insn = NULL; >> >>> @@ -3150,7 +3373,7 @@ static void >> >>> PREV_INSN (BB_HEADER (bb)) = insn; >> >>> insn = BB_HEADER (bb); >> >>> while (NEXT_INSN (insn)) >> >>> - insn = NEXT_INSN (insn); >> >>> + insn = NEXT_INSN (insn); >> >>> } >> >>> if (insn) >> >>> NEXT_INSN (insn) = BB_HEAD (bb); >> >>> @@ -3175,6 +3398,11 @@ static void >> >>> insn = NEXT_INSN (insn); >> >>> >> >>> set_last_insn (insn); >> >>> + >> >>> + /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes. 
*/ >> >>> + if (had_sec_boundary_notes || finalize_reorder_blocks) >> >>> + insert_section_boundary_note (); >> >>> + >> >>> #ifdef ENABLE_CHECKING >> >>> verify_insn_chain (); >> >>> #endif >> >>> @@ -3187,7 +3415,7 @@ static void >> >>> edge e_fall, e_taken, e; >> >>> rtx bb_end_insn; >> >>> rtx ret_label = NULL_RTX; >> >>> - basic_block nb, src_bb; >> >>> + basic_block nb; >> >>> edge_iterator ei; >> >>> >> >>> if (EDGE_COUNT (bb->succs) == 0) >> >>> @@ -3322,7 +3550,6 @@ static void >> >>> /* We got here if we need to add a new jump insn. >> >>> Note force_nonfallthru can delete E_FALL and thus we have to >> >>> save E_FALL->src prior to the call to force_nonfallthru. */ >> >>> - src_bb = e_fall->src; >> >>> nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, ret_label); >> >>> if (nb) >> >>> { >> >>> @@ -3330,17 +3557,6 @@ static void >> >>> bb->aux = nb; >> >>> /* Don't process this new block. */ >> >>> bb = nb; >> >>> - >> >>> - /* Make sure new bb is tagged for correct section (same as >> >>> - fall-thru source, since you cannot fall-thru across >> >>> - section boundaries). */ >> >>> - BB_COPY_PARTITION (src_bb, single_pred (bb)); >> >>> - if (flag_reorder_blocks_and_partition >> >>> - && targetm_common.have_named_sections >> >>> - && JUMP_P (BB_END (bb)) >> >>> - && !any_condjump_p (BB_END (bb)) >> >>> - && (EDGE_SUCC (bb, 0)->flags & EDGE_CROSSING)) >> >>> - add_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX); >> >>> } >> >>> } >> >>> >> >>> @@ -3644,10 +3860,11 @@ duplicate_insn_chain (rtx from, rtx to) >> >>> case NOTE_INSN_FUNCTION_BEG: >> >>> /* There is always just single entry to function. */ >> >>> case NOTE_INSN_BASIC_BLOCK: >> >>> + /* We should only switch text sections once. 
*/ >> >>> + case NOTE_INSN_SWITCH_TEXT_SECTIONS: >> >>> break; >> >>> >> >>> case NOTE_INSN_EPILOGUE_BEG: >> >>> - case NOTE_INSN_SWITCH_TEXT_SECTIONS: >> >>> emit_note_copy (insn); >> >>> break; >> >>> >> >>> @@ -3759,10 +3976,13 @@ break_superblocks (void) >> >>> } >> >>> >> >>> /* Finalize the changes: reorder insn list according to the sequence specified >> >>> - by aux pointers, enter compensation code, rebuild scope forest. */ >> >>> + by aux pointers, enter compensation code, rebuild scope forest. If >> >>> + this is called when we will FINALIZE_REORDER_BLOCKS, indicate that >> >>> + to fixup_reorder_chain so that it can insert the proper switch text >> >>> + section notes. */ >> >>> >> >>> void >> >>> -cfg_layout_finalize (void) >> >>> +cfg_layout_finalize (bool finalize_reorder_blocks) >> >>> { >> >>> #ifdef ENABLE_CHECKING >> >>> verify_flow_info (); >> >>> @@ -3775,7 +3995,7 @@ void >> >>> #endif >> >>> ) >> >>> fixup_fallthru_exit_predecessor (); >> >>> - fixup_reorder_chain (); >> >>> + fixup_reorder_chain (finalize_reorder_blocks); >> >>> >> >>> rebuild_jump_labels (get_insns ()); >> >>> delete_dead_jumptables (); >> >>> @@ -4454,8 +4674,7 @@ rtl_can_remove_branch_p (const_edge e) >> >>> if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH)) >> >>> return false; >> >>> >> >>> - if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX) >> >>> - || BB_PARTITION (src) != BB_PARTITION (target)) >> >>> + if (BB_PARTITION (src) != BB_PARTITION (target)) >> >>> return false; >> >>> >> >>> if (!onlyjump_p (insn) >> >>> >> >>> -- >> >>> This patch is available for review at http://codereview.appspot.com/6823047 >> >> >> >> >> >> >> >> -- >> >> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413 >> >> >> >> -- >> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413 -- Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
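The section-note handling in the patch quoted above (drop the NOTE_INSN_SWITCH_TEXT_SECTIONS note on entry to cfglayout mode, remember that it existed, and re-insert it on exit when either that flag or the caller's finalize flag is set) can be sketched on a toy array-based chain. This is only an illustration under simplifying assumptions, not GCC code: every `toy_*` / `TOY_*` name is hypothetical, and the real implementation works on the RTL insn chain and `BB_PARTITION`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for an insn chain.  None of these names exist in GCC.  */
enum toy_kind { TOY_HOT_BLOCK, TOY_COLD_BLOCK, TOY_SECTION_NOTE };

#define TOY_MAX 16

struct toy_chain
{
  enum toy_kind items[TOY_MAX];
  size_t len;
};

/* Entering "layout mode": delete the boundary note, if present, and
   report whether one was found.  At most one note is expected.  */
static bool
toy_strip_boundary_note (struct toy_chain *c)
{
  for (size_t i = 0; i < c->len; i++)
    if (c->items[i] == TOY_SECTION_NOTE)
      {
        for (size_t j = i + 1; j < c->len; j++)
          c->items[j - 1] = c->items[j];
        c->len--;
        return true;
      }
  return false;
}

/* Leaving "layout mode": insert a note in front of the first block whose
   partition differs from that of the first block.  Assumes the chain
   holds only blocks (no note) at this point.  */
static void
toy_insert_boundary_note (struct toy_chain *c)
{
  for (size_t i = 1; i < c->len; i++)
    if (c->items[i] != c->items[0])
      {
        assert (c->len < TOY_MAX);
        for (size_t j = c->len; j > i; j--)
          c->items[j] = c->items[j - 1];
        c->items[i] = TOY_SECTION_NOTE;
        c->len++;
        return;
      }
}

/* Mirrors the condition added to fixup_reorder_chain in the patch:
   re-insert when a note was stripped on entry or when the caller
   requests it.  */
static void
toy_finalize (struct toy_chain *c, bool had_note, bool finalize_reorder)
{
  if (had_note || finalize_reorder)
    toy_insert_boundary_note (c);
}
```

The point of the design is that passes running in layout mode never see the note, so they cannot leave it stale; the boundary is recomputed once from the final block order.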
I have updated my trunk checkout, and I can confirm that eval.c now compiles with your patch (and the other 4 patches I added to PR55121).

Now, when looking at the whole Spec2k results:
- vpr passes now (used to fail)
- gcc, parser, perlbmk, bzip2 and twolf no longer build: they all fail with the same error from gas:
  can't resolve `.text.unlikely' {.text.unlikely section} - `.LBB171' {.text section}
- gap still does not build (same error as above)

I haven't looked in detail, so I may be missing an obvious patch here.

And I still observe runtime mis-behaviour on crafty, galgel, facerec and fma3d.

Thanks

Christophe.

On 26 November 2012 21:52, Teresa Johnson <tejohnson@google.com> wrote:
> Sorry, I don't know what happened there. Patch is attached.
> Thanks,
> Teresa
>
> On Mon, Nov 26, 2012 at 12:42 PM, Jack Howarth <howarth@bromo.med.uc.edu> wrote:
>> On Mon, Nov 26, 2012 at 12:19:55PM -0800, Teresa Johnson wrote:
>>> Are you sure you have all my changes applied? I applied the 4 patches
>>> attached to PR55121 into my trunk checkout that has my fixes, and to a
>>> pristine trunk checkout. I configured and built both for
>>> --target=arm-none-linux-gnueabi, and built using your options, .i file
>>> and gcda file. I can reproduce the failure using the pristine trunk
>>> with your patches but not with my fixed trunk + your patches. (I just
>>> updated to head to pickup recent changes and get the same result. The
>>> vec changes required some manual changes to the patch, which I will
>>> resend shortly.)
>>
>> Teresa,
>> Your mailer seems to have corrupted the posted patch with stray
>> =3D characters and line breaks. Can you repost a copy as an attachment
>> to the list?
>> Jack >> >>> >>> Without my fixes: >>> >>> $ ~/extra/gcc_trunk_3_arm-eabi/gcc/cc1 -fpreproce >>> ssed eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9 >>> -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp >>> -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use >>> -fno-common -o eval.s -freorder-blocks-and-partition >>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi) >>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version >>> 2.4.2-p1, MPC version 0.8.1 >>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 >>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi) >>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version >>> 2.4.2-p1, MPC version 0.8.1 >>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 >>> Compiler executable checksum: d19cc60a2f07de08237a8488bb35cd1a >>> eval.c: In function ‘Ge’: >>> eval.c:792:1: internal compiler error: in df_compact_blocks, at df-core.c:1560 >>> } >>> ^ >>> 0x622f71 df_compact_blocks() >>> ../../gcc_trunk_3/gcc/df-core.c:1560 >>> 0x5cfcb5 compact_blocks() >>> ../../gcc_trunk_3/gcc/cfg.c:162 >>> 0xc9dce0 reorder_basic_blocks >>> ../../gcc_trunk_3/gcc/bb-reorder.c:2154 >>> 0xc9dce0 rest_of_handle_reorder_blocks >>> ../../gcc_trunk_3/gcc/bb-reorder.c:2219 >>> Please submit a full bug report, >>> with preprocessed source if appropriate. >>> Please include the complete backtrace with any bug report. >>> See <http://gcc.gnu.org/bugs.html> for instructions. 
>>> >>> >>> With my fixes: >>> >>> $ ~/extra/gcc_trunk_4_arm-eabi/gcc/cc1 -fpreproce >>> ssed eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9 >>> -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp >>> -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use >>> -fno-common -o eval.s -freorder-blocks-and-partition >>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi) >>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version >>> 2.4.2-p1, MPC version 0.8.1 >>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 >>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi) >>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version >>> 2.4.2-p1, MPC version 0.8.1 >>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 >>> Compiler executable checksum: 45b468efa7c981f9afb44c4dac2424f3 >>> >>> >>> Thanks, >>> Teresa >>> >>> On Mon, Nov 26, 2012 at 8:25 AM, Christophe Lyon >>> <christophe.lyon@linaro.org> wrote: >>> > Hi, >>> > >>> > I have tested your patch on Spec2000 on ARM, and I can still see >>> > several failures caused by: >>> > "error: fallthru edge crosses section boundary", including the case >>> > described in PR55121. >>> > >>> > On 26 November 2012 16:55, Teresa Johnson <tejohnson@google.com> wrote: >>> >> Ping. >>> >> Teresa >>> >> >>> >> On Thu, Nov 15, 2012 at 12:10 PM, Teresa Johnson <tejohnson@google.com> wrote: >>> >>> Revised patch that fixes failures encountered when enabling >>> >>> -freorder-blocks-and-partition, including the failure reported in PR 53743. >>> >>> >>> >>> This includes new verification code to ensure no cold blocks dominate hot >>> >>> blocks contributed by Steven Bosscher. >>> >>> >>> >>> I attempted to make the handling of partition updates through the optimization >>> >>> passes much more consistent, removing a number of partial fixes in the code >>> >>> stream in the process. 
The code to fix up partitions (including the BB_PARTITION
>>> >>> assignment, region crossing jump notes, and switch text section notes) is
>>> >>> now handled in a few centralized locations. For example, inside
>>> >>> rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that callers
>>> >>> don't need to attempt the fixup themselves.
>>> >>>
>>> >>> For optimization passes that make adjustments to the cfg while in cfg layout
>>> >>> mode that are not easy to fix up incrementally, the new routine
>>> >>> fixup_partitions handles the cleanup globally. This does require calculation
>>> >>> of the dominance relation; however, as far as I can tell the routines which
>>> >>> now invoke this global fixup (try_optimize_cfg and commit_edge_insertions)
>>> >>> are invoked typically once (or a small number of times in the case of
>>> >>> try_optimize_cfg) per optimization pass. Additionally, I compared the
>>> >>> -ftime-report output for some large fdo compilations and saw only minimal
>>> >>> increases in the dominance computation times, which were only a tiny percent
>>> >>> of the overall compile time.
>>> >>>
>>> >>> Additionally, I added a flag to the rtl_data structure to indicate whether
>>> >>> any partitioning was actually performed, so that optimizations which were
>>> >>> conservatively disabled whenever the flag_reorder_blocks_and_partition
>>> >>> is enabled (e.g. try_crossjump_to_edge, part of connect_traces) can be less
>>> >>> conservative for functions where no partitions were formed (e.g. they are
>>> >>> completely hot).
>>> >>>
>>> >>> Bootstrapped and tested on x86_64-unknown-linux-gnu. Also tested with SPEC2006 int
>>> >>> benchmarks and internal Google benchmarks using profile feedback and
>>> >>> -freorder-blocks-and-partition to get more coverage. Ok for trunk?
>>> >>> >>> >>> Thanks, >>> >>> Teresa >>> >>> >>> >>> 2012-11-14 Teresa Johnson <tejohnson@google.com> >>> >>> Steven Bosscher <steven@gcc.gnu.org> >>> >>> >>> >>> * cfghooks.h (cfg_layout_finalize): New parameter. >>> >>> * modulo-sched.c (rest_of_handle_sms): New cfg_layout_finalize >>> >>> parameter. >>> >>> * ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert >>> >>> as this is now done by redirect_edge_and_branch_force. >>> >>> * function.c (thread_prologue_and_epilogue_insns): Insert new bb after >>> >>> barriers, new cfg_layout_finalize parameter, and don't store exit >>> >>> predecessor BB until after it is potentially split. >>> >>> * function.h (struct rtl_data): New flag has_bb_partition. >>> >>> * hw-doloop.c (reorder_loops): New cfg_layout_finalize parameter. >>> >>> * cfgcleanup.c (try_crossjump_to_edge): Only skip optimization if >>> >>> any blocks in function actually partitioned. >>> >>> (try_optimize_cfg): If cfg changed, invoke fixup_partitions to clean >>> >>> up partitioning. >>> >>> * bb-reorder.c (connect_traces): Only look for partitions and skip >>> >>> block copying if any blocks in function actually partitioned. >>> >>> (emit_barrier_after_bb): Handle insertion in non-cfglayout mode. >>> >>> (find_rarely_executed_basic_blocks_and_crossing_edges): Ensure >>> >>> that no cold blocks dominate a hot block. >>> >>> (fix_up_fall_thru_edges): Replace BB_COPY_PARTITION with assert >>> >>> as this is now done by force_nonfallthru_and_redirect. >>> >>> (add_reg_crossing_jump_notes): Handle the fact that some jumps may >>> >>> already be marked with region crossing note. >>> >>> (reorder_basic_blocks): Only need to verify partitions if any >>> >>> blocks in function actually partitioned. >>> >>> (insert_section_boundary_note): Only need to insert note if any >>> >>> blocks in function actually partitioned. 
>>> >>> (rest_of_handle_reorder_blocks): New cfg_layout_finalize >>> >>> parameter, and remove call to insert_section_boundary_note as this >>> >>> is now called via cfg_layout_finalize/fixup_reorder_chain. >>> >>> (duplicate_computed_gotos): New cfg_layout_finalize >>> >>> parameter. >>> >>> (partition_hot_cold_basic_blocks): Set flag indicating function >>> >>> has bb partitions. >>> >>> * bb-reorder.h: Declare insert_section_boundary_note and >>> >>> emit_barrier_after_bb, which are no longer static. >>> >>> * basic-block.h: Declare new function fixup_partitions. >>> >>> * cfgrtl.c (try_redirect_by_replacing_jump): Remove unnecessary >>> >>> check for region crossing note. >>> >>> (fixup_partition_crossing): New function. >>> >>> (fixup_bb_partition): Ditto. >>> >>> (rtl_redirect_edge_and_branch): Fixup partition boundaries. >>> >>> (force_nonfallthru_and_redirect): Fixup partition boundaries, >>> >>> remove old code that tried to do this. Emit barrier correctly >>> >>> when we are in cfglayout mode. >>> >>> (rtl_split_edge): Correctly fixup partition boundaries. >>> >>> (commit_one_edge_insertion): Remove old code that tried to >>> >>> fixup region crossing edge since this is now handled in >>> >>> split_block, and set up insertion point correctly since >>> >>> block may now end in a jump. >>> >>> (commit_edge_insertions): Invoke fixup_partitions to sanitize partition >>> >>> boundaries after optimizations that modify cfg and before trying to >>> >>> verify the flow info. >>> >>> (fixup_partitions): New function. >>> >>> (rtl_verify_flow_info_1): Add verification that no cold bbs dominate >>> >>> hot bbs. >>> >>> (record_effective_endpoints): Remove region-crossing notes and set flag >>> >>> indicating that they need to be reinserted on exit from cfglayout mode. >>> >>> (outof_cfg_layout_mode): New cfg_layout_finalize parameter. >>> >>> (fixup_reorder_chain): Call insert_section_boundary_note if necessary. 
>>> >>> Remove old code that attempted to fixup region crossing note as >>> >>> this is now handled in force_nonfallthru_and_redirect. >>> >>> (duplicate_insn_chain): Don't duplicate switch section notes. >>> >>> (cfg_layout_finalize): Pass new parameter to fixup_reorder_chain. >>> >>> (rtl_can_remove_branch_p): Remove unnecessary check for region crossing >>> >>> note. >>> >>> >>> >>> Index: cfghooks.h >>> >>> =================================================================== >>> >>> --- cfghooks.h (revision 193376) >>> >>> +++ cfghooks.h (working copy) >>> >>> @@ -204,7 +204,7 @@ extern void copy_bbs (basic_block *, unsigned, bas >>> >>> void account_profile_record (struct profile_record *, int); >>> >>> >>> >>> extern void cfg_layout_initialize (unsigned int); >>> >>> -extern void cfg_layout_finalize (void); >>> >>> +extern void cfg_layout_finalize (bool); >>> >>> >>> >>> /* Hooks containers. */ >>> >>> extern struct cfg_hooks gimple_cfg_hooks; >>> >>> @@ -218,4 +218,3 @@ extern void cfg_layout_rtl_register_cfg_hooks (voi >>> >>> extern void gimple_register_cfg_hooks (void); >>> >>> extern struct cfg_hooks get_cfg_hooks (void); >>> >>> extern void set_cfg_hooks (struct cfg_hooks); >>> >>> - >>> >>> Index: modulo-sched.c >>> >>> =================================================================== >>> >>> --- modulo-sched.c (revision 193376) >>> >>> +++ modulo-sched.c (working copy) >>> >>> @@ -3354,7 +3354,7 @@ rest_of_handle_sms (void) >>> >>> if (bb->next_bb != EXIT_BLOCK_PTR) >>> >>> bb->aux = bb->next_bb; >>> >>> free_dominance_info (CDI_DOMINATORS); >>> >>> - cfg_layout_finalize (); >>> >>> + cfg_layout_finalize (false); >>> >>> #endif /* INSN_SCHEDULING */ >>> >>> return 0; >>> >>> } >>> >>> Index: ifcvt.c >>> >>> =================================================================== >>> >>> --- ifcvt.c (revision 193376) >>> >>> +++ ifcvt.c (working copy) >>> >>> @@ -3900,10 +3900,9 @@ find_if_case_1 (basic_block test_bb, edge then_edg >>> >>> if (new_bb) >>> 
>>> { >>> >>> df_bb_replace (then_bb_index, new_bb); >>> >>> - /* Since the fallthru edge was redirected from test_bb to new_bb, >>> >>> - we need to ensure that new_bb is in the same partition as >>> >>> - test bb (you can not fall through across section boundaries). */ >>> >>> - BB_COPY_PARTITION (new_bb, test_bb); >>> >>> + /* This should have been done above via force_nonfallthru_and_redirect >>> >>> + (possibly called from redirect_edge_and_branch_force). */ >>> >>> + gcc_assert (BB_PARTITION (new_bb) == BB_PARTITION (test_bb)); >>> >>> } >>> >>> >>> >>> num_true_changes++; >>> >>> Index: function.c >>> >>> =================================================================== >>> >>> --- function.c (revision 193376) >>> >>> +++ function.c (working copy) >>> >>> @@ -6249,8 +6249,10 @@ thread_prologue_and_epilogue_insns (void) >>> >>> break; >>> >>> if (e) >>> >>> { >>> >>> - copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)), >>> >>> - NULL_RTX, e->src); >>> >>> + /* Make sure we insert after any barriers. 
*/ >>> >>> + rtx end = get_last_bb_insn (e->src); >>> >>> + copy_bb = create_basic_block (NEXT_INSN (end), >>> >>> + NULL_RTX, e->src); >>> >>> BB_COPY_PARTITION (copy_bb, e->src); >>> >>> } >>> >>> else >>> >>> @@ -6475,7 +6477,7 @@ thread_prologue_and_epilogue_insns (void) >>> >>> if (cur_bb->index >= NUM_FIXED_BLOCKS >>> >>> && cur_bb->next_bb->index >= NUM_FIXED_BLOCKS) >>> >>> cur_bb->aux = cur_bb->next_bb; >>> >>> - cfg_layout_finalize (); >>> >>> + cfg_layout_finalize (false); >>> >>> } >>> >>> >>> >>> epilogue_done: >>> >>> @@ -6517,7 +6519,7 @@ epilogue_done: >>> >>> basic_block simple_return_block_cold = NULL; >>> >>> edge pending_edge_hot = NULL; >>> >>> edge pending_edge_cold = NULL; >>> >>> - basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb; >>> >>> + basic_block exit_pred; >>> >>> int i; >>> >>> >>> >>> gcc_assert (entry_edge != orig_entry_edge); >>> >>> @@ -6545,6 +6547,12 @@ epilogue_done: >>> >>> else >>> >>> pending_edge_cold = e; >>> >>> } >>> >>> + >>> >>> + /* Save a pointer to the exit's predecessor BB for use in >>> >>> + inserting new BBs at the end of the function. Do this >>> >>> + after the call to split_block above which may split >>> >>> + the original exit pred. */ >>> >>> + exit_pred = EXIT_BLOCK_PTR->prev_bb; >>> >>> >>> >>> FOR_EACH_VEC_ELT (edge, unconverted_simple_returns, i, e) >>> >>> { >>> >>> Index: function.h >>> >>> =================================================================== >>> >>> --- function.h (revision 193376) >>> >>> +++ function.h (working copy) >>> >>> @@ -459,6 +459,11 @@ struct GTY(()) rtl_data { >>> >>> sched2) and is useful only if the port defines LEAF_REGISTERS. */ >>> >>> bool uses_only_leaf_regs; >>> >>> >>> >>> + /* Nonzero if the function being compiled has undergone hot/cold partitioning >>> >>> + (under flag_reorder_blocks_and_partition) and has at least one cold >>> >>> + block. 
*/ >>> >>> + bool has_bb_partition; >>> >>> + >>> >>> /* Like regs_ever_live, but 1 if a reg is set or clobbered from an >>> >>> asm. Unlike regs_ever_live, elements of this array corresponding >>> >>> to eliminable regs (like the frame pointer) are set if an asm >>> >>> Index: hw-doloop.c >>> >>> =================================================================== >>> >>> --- hw-doloop.c (revision 193376) >>> >>> +++ hw-doloop.c (working copy) >>> >>> @@ -547,7 +547,7 @@ reorder_loops (hwloop_info loops) >>> >>> else >>> >>> bb->aux = NULL; >>> >>> } >>> >>> - cfg_layout_finalize (); >>> >>> + cfg_layout_finalize (false); >>> >>> clear_aux_for_blocks (); >>> >>> df_analyze (); >>> >>> } >>> >>> Index: cfgcleanup.c >>> >>> =================================================================== >>> >>> --- cfgcleanup.c (revision 193376) >>> >>> +++ cfgcleanup.c (working copy) >>> >>> @@ -1824,7 +1824,7 @@ try_crossjump_to_edge (int mode, edge e1, edge e2, >>> >>> partition boundaries). See the comments at the top of >>> >>> bb-reorder.c:partition_hot_cold_basic_blocks for complete details. */ >>> >>> >>> >>> - if (flag_reorder_blocks_and_partition && reload_completed) >>> >>> + if (crtl->has_bb_partition && reload_completed) >>> >>> return false; >>> >>> >>> >>> /* Search backward through forwarder blocks. We don't need to worry >>> >>> @@ -2767,10 +2767,21 @@ try_optimize_cfg (int mode) >>> >>> df_analyze (); >>> >>> } >>> >>> >>> >>> + if (changed) >>> >>> + { >>> >>> + /* Edge forwarding in particular can cause hot blocks previously >>> >>> + reached by both hot and cold blocks to become dominated only >>> >>> + by cold blocks. This will cause the verification below to fail, >>> >>> + and lead to now cold code in the hot section. This is not easy >>> >>> + to detect and fix during edge forwarding, and in some cases >>> >>> + is only visible after newly unreachable blocks are deleted, >>> >>> + which will be done in fixup_partitions. 
*/ >>> >>> + fixup_partitions (); >>> >>> + >>> >>> #ifdef ENABLE_CHECKING >>> >>> - if (changed) >>> >>> - verify_flow_info (); >>> >>> + verify_flow_info (); >>> >>> #endif >>> >>> + } >>> >>> >>> >>> changed_overall |= changed; >>> >>> first_pass = false; >>> >>> Index: bb-reorder.c >>> >>> =================================================================== >>> >>> --- bb-reorder.c (revision 193376) >>> >>> +++ bb-reorder.c (working copy) >>> >>> @@ -1054,7 +1054,7 @@ connect_traces (int n_traces, struct trace *traces >>> >>> current_partition = BB_PARTITION (traces[0].first); >>> >>> two_passes = false; >>> >>> >>> >>> - if (flag_reorder_blocks_and_partition) >>> >>> + if (crtl->has_bb_partition) >>> >>> for (i = 0; i < n_traces && !two_passes; i++) >>> >>> if (BB_PARTITION (traces[0].first) >>> >>> != BB_PARTITION (traces[i].first)) >>> >>> @@ -1263,7 +1263,7 @@ connect_traces (int n_traces, struct trace *traces >>> >>> } >>> >>> } >>> >>> >>> >>> - if (flag_reorder_blocks_and_partition) >>> >>> + if (crtl->has_bb_partition) >>> >>> try_copy = false; >>> >>> >>> >>> /* Copy tiny blocks always; copy larger blocks only when the >>> >>> @@ -1381,13 +1381,14 @@ get_uncond_jump_length (void) >>> >>> return length; >>> >>> } >>> >>> >>> >>> -/* Emit a barrier into the footer of BB. */ >>> >>> +/* Emit a barrier after BB, into the footer if we are in CFGLAYOUT mode. */ >>> >>> >>> >>> -static void >>> >>> +void >>> >>> emit_barrier_after_bb (basic_block bb) >>> >>> { >>> >>> rtx barrier = emit_barrier_after (BB_END (bb)); >>> >>> - BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier); >>> >>> + if (current_ir_type () == IR_RTL_CFGLAYOUT) >>> >>> + BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier); >>> >>> } >>> >>> >>> >>> /* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions. 
>>> >>> @@ -1463,18 +1464,109 @@ find_rarely_executed_basic_blocks_and_crossing_edg >>> >>> { >>> >>> VEC(edge, heap) *crossing_edges = NULL; >>> >>> basic_block bb; >>> >>> - edge e; >>> >>> - edge_iterator ei; >>> >>> + edge e, e2; >>> >>> + edge_iterator ei, ei2; >>> >>> + unsigned int cold_bb_count = 0; >>> >>> + VEC (basic_block, heap) *bbs_in_hot_partition = NULL; >>> >>> + VEC (basic_block, heap) *bbs_newly_hot = NULL; >>> >>> >>> >>> /* Mark which partition (hot/cold) each basic block belongs in. */ >>> >>> FOR_EACH_BB (bb) >>> >>> { >>> >>> if (probably_never_executed_bb_p (cfun, bb)) >>> >>> - BB_SET_PARTITION (bb, BB_COLD_PARTITION); >>> >>> + { >>> >>> + BB_SET_PARTITION (bb, BB_COLD_PARTITION); >>> >>> + cold_bb_count++; >>> >>> + } >>> >>> else >>> >>> - BB_SET_PARTITION (bb, BB_HOT_PARTITION); >>> >>> + { >>> >>> + BB_SET_PARTITION (bb, BB_HOT_PARTITION); >>> >>> + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, bb); >>> >>> + } >>> >>> } >>> >>> >>> >>> + /* Ensure that no cold bbs dominate hot bbs. This could happen as a result of >>> >>> + several different possibilities. One is that there are edge weight insanities >>> >>> + due to optimization phases that do not properly update basic block profile >>> >>> + counts. The second is that the entry of the function may not be hot, because >>> >>> + it is entered fewer times than the number of profile training runs, but there >>> >>> + is a loop inside the function that causes blocks within the function to be >>> >>> + above the threshold for hotness. */ >>> >>> + if (cold_bb_count) >>> >>> + { >>> >>> + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); >>> >>> + >>> >>> + if (dom_calculated_here) >>> >>> + calculate_dominance_info (CDI_DOMINATORS); >>> >>> + >>> >>> + /* Keep examining hot bbs until we have either checked them all, or >>> >>> + re-marked all cold bbs hot. */ >>> >>> + while (! 
VEC_empty (basic_block, bbs_in_hot_partition) >>> >>> + && cold_bb_count) >>> >>> + { >>> >>> + basic_block dom_bb; >>> >>> + >>> >>> + bb = VEC_pop (basic_block, bbs_in_hot_partition); >>> >>> + dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb); >>> >>> + >>> >>> + /* If bb's immediate dominator is also hot then it is ok. */ >>> >>> + if (BB_PARTITION (dom_bb) != BB_COLD_PARTITION) >>> >>> + continue; >>> >>> + >>> >>> + /* We have a hot bb with an immediate dominator that is cold. >>> >>> + The dominator needs to be re-marked to hot. */ >>> >>> + BB_SET_PARTITION (dom_bb, BB_HOT_PARTITION); >>> >>> + cold_bb_count--; >>> >>> + >>> >>> + /* Now we need to examine newly-hot dom_bb to see if it is also >>> >>> + dominated by a cold bb. */ >>> >>> + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, dom_bb); >>> >>> + >>> >>> + /* We should also adjust any cold blocks that the newly-hot bb >>> >>> + feeds and see if it makes sense to re-mark those as hot as >>> >>> + well. */ >>> >>> + VEC_safe_push (basic_block, heap, bbs_newly_hot, dom_bb); >>> >>> + while (! VEC_empty (basic_block, bbs_newly_hot)) >>> >>> + { >>> >>> + basic_block new_hot_bb = VEC_pop (basic_block, bbs_newly_hot); >>> >>> + /* Examine all successors of this newly-hot bb to see if they >>> >>> + are cold and should be re-marked as hot. */ >>> >>> + FOR_EACH_EDGE (e, ei, new_hot_bb->succs) >>> >>> + { >>> >>> + bool any_cold_preds = false; >>> >>> + basic_block succ = e->dest; >>> >>> + if (BB_PARTITION (succ) != BB_COLD_PARTITION) >>> >>> + continue; >>> >>> + /* Does this block have any cold predecessors now? */ >>> >>> + FOR_EACH_EDGE (e2, ei2, succ->preds) >>> >>> + { >>> >>> + if (BB_PARTITION (e2->src) == BB_COLD_PARTITION) >>> >>> + { >>> >>> + any_cold_preds = true; >>> >>> + break; >>> >>> + } >>> >>> + } >>> >>> + if (any_cold_preds) >>> >>> + continue; >>> >>> + >>> >>> + /* Here we have a successor of newly-hot bb that is cold >>> >>> + but no longer has any cold precessessors. 
Since the original >>> >>> + assignment of our newly-hot bb was incorrect, this successor's >>> >>> + assignment as cold is also suspect. Go ahead and re-mark it >>> >>> + as hot now too. Better heuristics may be in order here. */ >>> >>> + BB_SET_PARTITION (succ, BB_HOT_PARTITION); >>> >>> + cold_bb_count--; >>> >>> + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, succ); >>> >>> + /* Examine this successor as a newly-hot bb. */ >>> >>> + VEC_safe_push (basic_block, heap, bbs_newly_hot, succ); >>> >>> + } >>> >>> + } >>> >>> + } >>> >>> + >>> >>> + if (dom_calculated_here) >>> >>> + free_dominance_info (CDI_DOMINATORS); >>> >>> + } >>> >>> + >>> >>> /* The format of .gcc_except_table does not allow landing pads to >>> >>> be in a different partition as the throw. Fix this by either >>> >>> moving or duplicating the landing pads. */ >>> >>> @@ -1766,10 +1858,10 @@ fix_up_fall_thru_edges (void) >>> >>> new_bb->aux = cur_bb->aux; >>> >>> cur_bb->aux = new_bb; >>> >>> >>> >>> - /* Make sure new fall-through bb is in same >>> >>> - partition as bb it's falling through from. */ >>> >>> + /* This is done by force_nonfallthru_and_redirect. */ >>> >>> + gcc_assert (BB_PARTITION (new_bb) >>> >>> + == BB_PARTITION (cur_bb)); >>> >>> >>> >>> - BB_COPY_PARTITION (new_bb, cur_bb); >>> >>> single_succ_edge (new_bb)->flags |= EDGE_CROSSING; >>> >>> } >>> >>> else >>> >>> @@ -2067,7 +2159,10 @@ add_reg_crossing_jump_notes (void) >>> >>> FOR_EACH_BB (bb) >>> >>> FOR_EACH_EDGE (e, ei, bb->succs) >>> >>> if ((e->flags & EDGE_CROSSING) >>> >>> - && JUMP_P (BB_END (e->src))) >>> >>> + && JUMP_P (BB_END (e->src)) >>> >>> + /* Some notes were added during fix_up_fall_thru_edges, via >>> >>> + force_nonfallthru_and_redirect. 
*/ >>> >>> + && !find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX)) >>> >>> add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); >>> >>> } >>> >>> >>> >>> @@ -2160,7 +2255,7 @@ reorder_basic_blocks (void) >>> >>> dump_flow_info (dump_file, dump_flags); >>> >>> } >>> >>> >>> >>> - if (flag_reorder_blocks_and_partition) >>> >>> + if (crtl->has_bb_partition) >>> >>> verify_hot_cold_block_grouping (); >>> >>> } >>> >>> >>> >>> @@ -2172,14 +2267,14 @@ reorder_basic_blocks (void) >>> >>> encountering this note will make the compiler switch between the >>> >>> hot and cold text sections. */ >>> >>> >>> >>> -static void >>> >>> +void >>> >>> insert_section_boundary_note (void) >>> >>> { >>> >>> basic_block bb; >>> >>> rtx new_note; >>> >>> int first_partition = 0; >>> >>> >>> >>> - if (!flag_reorder_blocks_and_partition) >>> >>> + if (!crtl->has_bb_partition) >>> >>> return; >>> >>> >>> >>> FOR_EACH_BB (bb) >>> >>> @@ -2222,10 +2317,8 @@ rest_of_handle_reorder_blocks (void) >>> >>> FOR_EACH_BB (bb) >>> >>> if (bb->next_bb != EXIT_BLOCK_PTR) >>> >>> bb->aux = bb->next_bb; >>> >>> - cfg_layout_finalize (); >>> >>> + cfg_layout_finalize (true); >>> >>> >>> >>> - /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes. */ >>> >>> - insert_section_boundary_note (); >>> >>> return 0; >>> >>> } >>> >>> >>> >>> @@ -2366,7 +2459,7 @@ duplicate_computed_gotos (void) >>> >>> } >>> >>> >>> >>> done: >>> >>> - cfg_layout_finalize (); >>> >>> + cfg_layout_finalize (false); >>> >>> >>> >>> BITMAP_FREE (candidates); >>> >>> return 0; >>> >>> @@ -2511,6 +2604,8 @@ partition_hot_cold_basic_blocks (void) >>> >>> if (crossing_edges == NULL) >>> >>> return 0; >>> >>> >>> >>> + crtl->has_bb_partition = true; >>> >>> + >>> >>> /* Make sure the source of any crossing edge ends in a jump and the >>> >>> destination of any crossing edge has a label. 
*/ >>> >>> add_labels_and_missing_jumps (crossing_edges); >>> >>> Index: bb-reorder.h >>> >>> =================================================================== >>> >>> --- bb-reorder.h (revision 193376) >>> >>> +++ bb-reorder.h (working copy) >>> >>> @@ -36,4 +36,8 @@ extern struct target_bb_reorder *this_target_bb_re >>> >>> >>> >>> extern int get_uncond_jump_length (void); >>> >>> >>> >>> +extern void insert_section_boundary_note (void); >>> >>> + >>> >>> +extern void emit_barrier_after_bb (basic_block bb); >>> >>> + >>> >>> #endif >>> >>> Index: basic-block.h >>> >>> =================================================================== >>> >>> --- basic-block.h (revision 193376) >>> >>> +++ basic-block.h (working copy) >>> >>> @@ -806,6 +806,7 @@ extern basic_block force_nonfallthru_and_redirect >>> >>> extern bool contains_no_active_insn_p (const_basic_block); >>> >>> extern bool forwarder_block_p (const_basic_block); >>> >>> extern bool can_fallthru (basic_block, basic_block); >>> >>> +extern void fixup_partitions (void); >>> >>> >>> >>> /* In cfgbuild.c. */ >>> >>> extern void find_many_sub_basic_blocks (sbitmap); >>> >>> Index: cfgrtl.c >>> >>> =================================================================== >>> >>> --- cfgrtl.c (revision 193376) >>> >>> +++ cfgrtl.c (working copy) >>> >>> @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3. If not see >>> >>> #include "tree.h" >>> >>> #include "hard-reg-set.h" >>> >>> #include "basic-block.h" >>> >>> +#include "bb-reorder.h" >>> >>> #include "regs.h" >>> >>> #include "flags.h" >>> >>> #include "function.h" >>> >>> @@ -67,11 +68,12 @@ along with GCC; see the file COPYING3. If not see >>> >>> Only applicable if the CFG is in cfglayout mode. 
*/ >>> >>> static GTY(()) rtx cfg_layout_function_footer; >>> >>> static GTY(()) rtx cfg_layout_function_header; >>> >>> +static bool had_sec_boundary_notes; >>> >>> >>> >>> static rtx skip_insns_after_block (basic_block); >>> >>> static void record_effective_endpoints (void); >>> >>> static rtx label_for_bb (basic_block); >>> >>> -static void fixup_reorder_chain (void); >>> >>> +static void fixup_reorder_chain (bool finalize_reorder_blocks); >>> >>> >>> >>> void verify_insn_chain (void); >>> >>> static void fixup_fallthru_exit_predecessor (void); >>> >>> @@ -976,8 +978,7 @@ try_redirect_by_replacing_jump (edge e, basic_bloc >>> >>> partition boundaries). See the comments at the top of >>> >>> bb-reorder.c:partition_hot_cold_basic_blocks for complete details. */ >>> >>> >>> >>> - if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX) >>> >>> - || BB_PARTITION (src) != BB_PARTITION (target)) >>> >>> + if (BB_PARTITION (src) != BB_PARTITION (target)) >>> >>> return NULL; >>> >>> >>> >>> /* We can replace or remove a complex jump only when we have exactly >>> >>> @@ -1286,6 +1287,71 @@ redirect_branch_edge (edge e, basic_block target) >>> >>> return e; >>> >>> } >>> >>> >>> >>> +/* Called when edge E has been redirected to a new destination, >>> >>> + in order to update the region crossing flag on the edge and >>> >>> + jump. */ >>> >>> + >>> >>> +static void >>> >>> +fixup_partition_crossing (edge e, basic_block target) >>> >>> +{ >>> >>> + rtx note; >>> >>> + >>> >>> + gcc_assert (e->dest == target); >>> >>> + >>> >>> + if (e->src == ENTRY_BLOCK_PTR || target == EXIT_BLOCK_PTR) >>> >>> + return; >>> >>> + /* If we redirected an existing edge, it may already be marked >>> >>> + crossing, even though the new src is missing a reg crossing note. >>> >>> + But make sure reg crossing note doesn't already exist before >>> >>> + inserting. 
*/ >>> >>> + if (BB_PARTITION (e->src) != BB_PARTITION (target)) >>> >>> + { >>> >>> + e->flags |= EDGE_CROSSING; >>> >>> + note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); >>> >>> + if (JUMP_P (BB_END (e->src)) >>> >>> + && !note) >>> >>> + add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); >>> >>> + } >>> >>> + else if (BB_PARTITION (e->src) == BB_PARTITION (target)) >>> >>> + { >>> >>> + e->flags &= ~EDGE_CROSSING; >>> >>> + /* Remove the region crossing note from jump at end of >>> >>> + e->src if it exists. */ >>> >>> + note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); >>> >>> + if (note) >>> >>> + remove_note (BB_END (e->src), note); >>> >>> + } >>> >>> +} >>> >>> + >>> >>> +/* Called when block BB has been reassigned to a different partition, >>> >>> + to ensure that the region crossing attributes are updated. */ >>> >>> + >>> >>> +static void >>> >>> +fixup_bb_partition (basic_block bb) >>> >>> +{ >>> >>> + edge e; >>> >>> + edge_iterator ei; >>> >>> + >>> >>> + /* Now need to make bb's pred edges non-region crossing. */ >>> >>> + FOR_EACH_EDGE (e, ei, bb->preds) >>> >>> + { >>> >>> + fixup_partition_crossing (e, e->dest); >>> >>> + } >>> >>> + >>> >>> + /* Possibly need to make bb's successor edges region crossing, >>> >>> + or remove stale region crossing. */ >>> >>> + FOR_EACH_EDGE (e, ei, bb->succs) >>> >>> + { >>> >>> + if ((e->flags & EDGE_FALLTHRU) >>> >>> + && BB_PARTITION (bb) != BB_PARTITION (e->dest) >>> >>> + && e->dest != EXIT_BLOCK_PTR) >>> >>> + /* force_nonfallthru_and_redirect calls fixup_partition_crossing. */ >>> >>> + force_nonfallthru (e); >>> >>> + else >>> >>> + fixup_partition_crossing (e, e->dest); >>> >>> + } >>> >>> +} >>> >>> + >>> >>> /* Attempt to change code to redirect edge E to TARGET. Don't do that on >>> >>> expense of adding new instructions or reordering basic blocks. 
>>> >>> >>> >>> @@ -1302,16 +1368,18 @@ rtl_redirect_edge_and_branch (edge e, basic_block >>> >>> { >>> >>> edge ret; >>> >>> basic_block src = e->src; >>> >>> + basic_block dest = e->dest; >>> >>> >>> >>> if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH)) >>> >>> return NULL; >>> >>> >>> >>> - if (e->dest == target) >>> >>> + if (dest == target) >>> >>> return e; >>> >>> >>> >>> if ((ret = try_redirect_by_replacing_jump (e, target, false)) != NULL) >>> >>> { >>> >>> df_set_bb_dirty (src); >>> >>> + fixup_partition_crossing (ret, target); >>> >>> return ret; >>> >>> } >>> >>> >>> >>> @@ -1320,6 +1388,7 @@ rtl_redirect_edge_and_branch (edge e, basic_block >>> >>> return NULL; >>> >>> >>> >>> df_set_bb_dirty (src); >>> >>> + fixup_partition_crossing (ret, target); >>> >>> return ret; >>> >>> } >>> >>> >>> >>> @@ -1454,18 +1523,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc >>> >>> /* Make sure new block ends up in correct hot/cold section. */ >>> >>> >>> >>> BB_COPY_PARTITION (jump_block, e->src); >>> >>> - if (flag_reorder_blocks_and_partition >>> >>> - && targetm_common.have_named_sections >>> >>> - && JUMP_P (BB_END (jump_block)) >>> >>> - && !any_condjump_p (BB_END (jump_block)) >>> >>> - && (EDGE_SUCC (jump_block, 0)->flags & EDGE_CROSSING)) >>> >>> - add_reg_note (BB_END (jump_block), REG_CROSSING_JUMP, NULL_RTX); >>> >>> >>> >>> /* Wire edge in. */ >>> >>> new_edge = make_edge (e->src, jump_block, EDGE_FALLTHRU); >>> >>> new_edge->probability = probability; >>> >>> new_edge->count = count; >>> >>> >>> >>> + /* If e->src was previously region crossing, it no longer is >>> >>> + and the reg crossing note should be removed. */ >>> >>> + fixup_partition_crossing (new_edge, jump_block); >>> >>> + >>> >>> /* Redirect old edge. 
*/ >>> >>> redirect_edge_pred (e, jump_block); >>> >>> e->probability = REG_BR_PROB_BASE; >>> >>> @@ -1521,13 +1588,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc >>> >>> LABEL_NUSES (label)++; >>> >>> } >>> >>> >>> >>> - emit_barrier_after (BB_END (jump_block)); >>> >>> + /* We might be in cfg layout mode, and if so, the following routine will >>> >>> + insert the barrier correctly. */ >>> >>> + emit_barrier_after_bb (jump_block); >>> >>> redirect_edge_succ_nodup (e, target); >>> >>> >>> >>> if (abnormal_edge_flags) >>> >>> make_edge (src, target, abnormal_edge_flags); >>> >>> >>> >>> df_mark_solutions_dirty (); >>> >>> + fixup_partition_crossing (e, target); >>> >>> return new_bb; >>> >>> } >>> >>> >>> >>> @@ -1626,7 +1696,7 @@ rtl_move_block_after (basic_block bb ATTRIBUTE_UNU >>> >>> static basic_block >>> >>> rtl_split_edge (edge edge_in) >>> >>> { >>> >>> - basic_block bb; >>> >>> + basic_block bb, new_bb; >>> >>> rtx before; >>> >>> >>> >>> /* Abnormal edges cannot be split. */ >>> >>> @@ -1659,12 +1729,26 @@ rtl_split_edge (edge edge_in) >>> >>> else >>> >>> { >>> >>> bb = create_basic_block (before, NULL, edge_in->dest->prev_bb); >>> >>> - /* ??? Why not edge_in->dest->prev_bb here? */ >>> >>> - BB_COPY_PARTITION (bb, edge_in->dest); >>> >>> + if (edge_in->src == ENTRY_BLOCK_PTR) >>> >>> + BB_COPY_PARTITION (bb, edge_in->dest); >>> >>> + else >>> >>> + /* Put the split bb into the src partition, to avoid creating >>> >>> + a situation where a cold bb dominates a hot bb, in the case >>> >>> + where src is cold and dest is hot. The src will dominate >>> >>> + the new bb (whereas it might not have dominated dest). */ >>> >>> + BB_COPY_PARTITION (bb, edge_in->src); >>> >>> } >>> >>> >>> >>> make_single_succ_edge (bb, edge_in->dest, EDGE_FALLTHRU); >>> >>> >>> >>> + /* Can't allow a region crossing edge to be fallthrough. 
*/ >>> >>> + if (BB_PARTITION (bb) != BB_PARTITION (edge_in->dest) >>> >>> + && edge_in->dest != EXIT_BLOCK_PTR) >>> >>> + { >>> >>> + new_bb = force_nonfallthru (single_succ_edge (bb)); >>> >>> + gcc_assert (!new_bb); >>> >>> + } >>> >>> + >>> >>> /* For non-fallthru edges, we must adjust the predecessor's >>> >>> jump instruction to target our new block. */ >>> >>> if ((edge_in->flags & EDGE_FALLTHRU) == 0) >>> >>> @@ -1777,17 +1861,13 @@ commit_one_edge_insertion (edge e) >>> >>> else >>> >>> { >>> >>> bb = split_edge (e); >>> >>> - after = BB_END (bb); >>> >>> >>> >>> - if (flag_reorder_blocks_and_partition >>> >>> - && targetm_common.have_named_sections >>> >>> - && e->src != ENTRY_BLOCK_PTR >>> >>> - && BB_PARTITION (e->src) == BB_COLD_PARTITION >>> >>> - && !(e->flags & EDGE_CROSSING) >>> >>> - && JUMP_P (after) >>> >>> - && !any_condjump_p (after) >>> >>> - && (single_succ_edge (bb)->flags & EDGE_CROSSING)) >>> >>> - add_reg_note (after, REG_CROSSING_JUMP, NULL_RTX); >>> >>> + /* If e crossed a partition boundary, we needed to make bb end in >>> >>> + a region-crossing jump, even though it was originally fallthru. */ >>> >>> + if (JUMP_P (BB_END (bb))) >>> >>> + before = BB_END (bb); >>> >>> + else >>> >>> + after = BB_END (bb); >>> >>> } >>> >>> >>> >>> /* Now that we've found the spot, do the insertion. */ >>> >>> @@ -1827,6 +1907,14 @@ commit_edge_insertions (void) >>> >>> { >>> >>> basic_block bb; >>> >>> >>> >>> + /* Optimization passes that invoke this routine can cause hot blocks >>> >>> + previously reached by both hot and cold blocks to become dominated only >>> >>> + by cold blocks. This will cause the verification below to fail, >>> >>> + and lead to now cold code in the hot section. In some cases this >>> >>> + may only be visible after newly unreachable blocks are deleted, >>> >>> + which will be done by fixup_partitions. 
*/ >>> >>> + fixup_partitions (); >>> >>> + >>> >>> #ifdef ENABLE_CHECKING >>> >>> verify_flow_info (); >>> >>> #endif >>> >>> @@ -2028,7 +2116,75 @@ get_last_bb_insn (basic_block bb) >>> >>> >>> >>> return end; >>> >>> } >>> >>> - >>> >>> + >>> >>> +/* Perform cleanup on the hot/cold bb partitioning after optimization >>> >>> + passes that modify the cfg. */ >>> >>> + >>> >>> +void >>> >>> +fixup_partitions (void) >>> >>> +{ >>> >>> + basic_block bb; >>> >>> + >>> >>> + if (!crtl->has_bb_partition) >>> >>> + return; >>> >>> + >>> >>> + /* Delete any blocks that became unreachable and weren't >>> >>> + already cleaned up, for example during edge forwarding >>> >>> + and convert_jumps_to_returns. This will expose more >>> >>> + opportunities for fixing the partition boundaries here. >>> >>> + Also, the calculation of the dominance graph during verification >>> >>> + will assert if there are unreachable nodes. */ >>> >>> + delete_unreachable_blocks (); >>> >>> + >>> >>> + /* If there are partitions, do a sanity check on them: A basic block in >>> >>> + a cold partition cannot dominate a basic block in a hot partition. >>> >>> + Fixup any that now violate this requirement, as a result of edge >>> >>> + forwarding and unreachable block deletion. */ >>> >>> + VEC (basic_block, heap) *bbs_in_cold_partition = NULL; >>> >>> + VEC (basic_block, heap) *bbs_to_fix = NULL; >>> >>> + FOR_EACH_BB (bb) >>> >>> + if ((BB_PARTITION (bb) == BB_COLD_PARTITION)) >>> >>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb); >>> >>> + if (! VEC_empty (basic_block, bbs_in_cold_partition)) >>> >>> + { >>> >>> + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); >>> >>> + basic_block son; >>> >>> + >>> >>> + if (dom_calculated_here) >>> >>> + calculate_dominance_info (CDI_DOMINATORS); >>> >>> + >>> >>> + while (! 
VEC_empty (basic_block, bbs_in_cold_partition)) >>> >>> + { >>> >>> + bb = VEC_pop (basic_block, bbs_in_cold_partition); >>> >>> + /* If bb is not yet cold (because it was added below as >>> >>> + a block dominated by a cold bb) then mark it cold here. */ >>> >>> + if ((BB_PARTITION (bb) != BB_COLD_PARTITION)) >>> >>> + { >>> >>> + BB_SET_PARTITION (bb, BB_COLD_PARTITION); >>> >>> + VEC_safe_push (basic_block, heap, bbs_to_fix, bb); >>> >>> + } >>> >>> + /* Any blocks dominated by a block in the cold section >>> >>> + must also be cold. */ >>> >>> + for (son = first_dom_son (CDI_DOMINATORS, bb); >>> >>> + son; >>> >>> + son = next_dom_son (CDI_DOMINATORS, son)) >>> >>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son); >>> >>> + } >>> >>> + >>> >>> + if (dom_calculated_here) >>> >>> + free_dominance_info (CDI_DOMINATORS); >>> >>> + } >>> >>> + >>> >>> + /* Do the partition fixup after all necessary blocks have been converted to >>> >>> + cold, so that we only update the region crossings the minimum number of >>> >>> + places, which can require forcing edges to be non fallthru. */ >>> >>> + while (! VEC_empty (basic_block, bbs_to_fix)) >>> >>> + { >>> >>> + bb = VEC_pop (basic_block, bbs_to_fix); >>> >>> + fixup_bb_partition (bb); >>> >>> + } >>> >>> +} >>> >>> + >>> >>> /* Verify the CFG and RTL consistency common for both underlying RTL and >>> >>> cfglayout RTL. >>> >>> >>> >>> @@ -2052,6 +2208,7 @@ rtl_verify_flow_info_1 (void) >>> >>> rtx x; >>> >>> int err = 0; >>> >>> basic_block bb; >>> >>> + bool have_partitions = false; >>> >>> >>> >>> /* Check the general integrity of the basic blocks. 
*/ >>> >>> FOR_EACH_BB_REVERSE (bb) >>> >>> @@ -2169,6 +2326,8 @@ rtl_verify_flow_info_1 (void) >>> >>> >>> >>> if (e->flags & EDGE_ABNORMAL) >>> >>> n_abnormal++; >>> >>> + >>> >>> + have_partitions |= is_crossing; >>> >>> } >>> >>> >>> >>> if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, NULL_RTX)) >>> >>> @@ -2293,6 +2452,40 @@ rtl_verify_flow_info_1 (void) >>> >>> } >>> >>> } >>> >>> >>> >>> + /* If there are partitions, do a sanity check on them: A basic block in >>> >>> + a cold partition cannot dominate a basic block in a hot partition. */ >>> >>> + VEC (basic_block, heap) *bbs_in_cold_partition = NULL; >>> >>> + if (have_partitions && !err) >>> >>> + FOR_EACH_BB (bb) >>> >>> + if ((BB_PARTITION (bb) == BB_COLD_PARTITION)) >>> >>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb); >>> >>> + if (! VEC_empty (basic_block, bbs_in_cold_partition)) >>> >>> + { >>> >>> + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); >>> >>> + basic_block son; >>> >>> + >>> >>> + if (dom_calculated_here) >>> >>> + calculate_dominance_info (CDI_DOMINATORS); >>> >>> + >>> >>> + while (! VEC_empty (basic_block, bbs_in_cold_partition)) >>> >>> + { >>> >>> + bb = VEC_pop (basic_block, bbs_in_cold_partition); >>> >>> + if ((BB_PARTITION (bb) != BB_COLD_PARTITION)) >>> >>> + { >>> >>> + error ("non-cold basic block %d dominated " >>> >>> + "by a block in the cold partition", bb->index); >>> >>> + err = 1; >>> >>> + } >>> >>> + for (son = first_dom_son (CDI_DOMINATORS, bb); >>> >>> + son; >>> >>> + son = next_dom_son (CDI_DOMINATORS, son)) >>> >>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son); >>> >>> + } >>> >>> + >>> >>> + if (dom_calculated_here) >>> >>> + free_dominance_info (CDI_DOMINATORS); >>> >>> + } >>> >>> + >>> >>> /* Clean up. 
*/ >>> >>> return err; >>> >>> } >>> >>> @@ -2965,14 +3158,41 @@ record_effective_endpoints (void) >>> >>> else >>> >>> cfg_layout_function_header = NULL_RTX; >>> >>> >>> >>> + had_sec_boundary_notes = false; >>> >>> + >>> >>> next_insn = get_insns (); >>> >>> FOR_EACH_BB (bb) >>> >>> { >>> >>> rtx end; >>> >>> >>> >>> if (PREV_INSN (BB_HEAD (bb)) && next_insn != BB_HEAD (bb)) >>> >>> - BB_HEADER (bb) = unlink_insn_chain (next_insn, >>> >>> - PREV_INSN (BB_HEAD (bb))); >>> >>> + { >>> >>> + /* Rather than try to keep section boundary notes incrementally >>> >>> + up-to-date through cfg layout optimizations, simply remove them >>> >>> + and flag that they should be re-inserted when exiting >>> >>> + cfg layout mode. */ >>> >>> + rtx check_insn = next_insn; >>> >>> + while (check_insn) >>> >>> + { >>> >>> + if (NOTE_P (check_insn) >>> >>> + && NOTE_KIND (check_insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS) >>> >>> + { >>> >>> + had_sec_boundary_notes |= true; >>> >>> + /* Remove note from chain. Grab new next_insn first. */ >>> >>> + if (next_insn == check_insn) >>> >>> + next_insn = NEXT_INSN (check_insn); >>> >>> + /* Delete note. */ >>> >>> + delete_insn (check_insn); >>> >>> + /* There will only be one. */ >>> >>> + break; >>> >>> + } >>> >>> + check_insn = NEXT_INSN (check_insn); >>> >>> + } >>> >>> + /* If we still have header instructions left after above loop. 
*/ >>> >>> + if (next_insn != BB_HEAD (bb)) >>> >>> + BB_HEADER (bb) = unlink_insn_chain (next_insn, >>> >>> + PREV_INSN (BB_HEAD (bb))); >>> >>> + } >>> >>> end = skip_insns_after_block (bb); >>> >>> if (NEXT_INSN (BB_END (bb)) && BB_END (bb) != end) >>> >>> BB_FOOTER (bb) = unlink_insn_chain (NEXT_INSN (BB_END (bb)), end); >>> >>> @@ -3000,7 +3220,7 @@ outof_cfg_layout_mode (void) >>> >>> if (bb->next_bb != EXIT_BLOCK_PTR) >>> >>> bb->aux = bb->next_bb; >>> >>> >>> >>> - cfg_layout_finalize (); >>> >>> + cfg_layout_finalize (false); >>> >>> >>> >>> return 0; >>> >>> } >>> >>> @@ -3120,10 +3340,13 @@ relink_block_chain (bool stay_in_cfglayout_mode) >>> >>> } >>> >>> >>> >>> >>> >>> -/* Given a reorder chain, rearrange the code to match. */ >>> >>> +/* Given a reorder chain, rearrange the code to match. If >>> >>> + this is called when we will FINALIZE_REORDER_BLOCKS, or when >>> >>> + section boundary notes were removed on entry to cfg layout >>> >>> + mode, insert section boundary notes here. */ >>> >>> >>> >>> static void >>> >>> -fixup_reorder_chain (void) >>> >>> +fixup_reorder_chain (bool finalize_reorder_blocks) >>> >>> { >>> >>> basic_block bb; >>> >>> rtx insn = NULL; >>> >>> @@ -3150,7 +3373,7 @@ static void >>> >>> PREV_INSN (BB_HEADER (bb)) = insn; >>> >>> insn = BB_HEADER (bb); >>> >>> while (NEXT_INSN (insn)) >>> >>> - insn = NEXT_INSN (insn); >>> >>> + insn = NEXT_INSN (insn); >>> >>> } >>> >>> if (insn) >>> >>> NEXT_INSN (insn) = BB_HEAD (bb); >>> >>> @@ -3175,6 +3398,11 @@ static void >>> >>> insn = NEXT_INSN (insn); >>> >>> >>> >>> set_last_insn (insn); >>> >>> + >>> >>> + /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes. 
*/ >>> >>> + if (had_sec_boundary_notes || finalize_reorder_blocks) >>> >>> + insert_section_boundary_note (); >>> >>> + >>> >>> #ifdef ENABLE_CHECKING >>> >>> verify_insn_chain (); >>> >>> #endif >>> >>> @@ -3187,7 +3415,7 @@ static void >>> >>> edge e_fall, e_taken, e; >>> >>> rtx bb_end_insn; >>> >>> rtx ret_label = NULL_RTX; >>> >>> - basic_block nb, src_bb; >>> >>> + basic_block nb; >>> >>> edge_iterator ei; >>> >>> >>> >>> if (EDGE_COUNT (bb->succs) == 0) >>> >>> @@ -3322,7 +3550,6 @@ static void >>> >>> /* We got here if we need to add a new jump insn. >>> >>> Note force_nonfallthru can delete E_FALL and thus we have to >>> >>> save E_FALL->src prior to the call to force_nonfallthru. */ >>> >>> - src_bb = e_fall->src; >>> >>> nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, ret_label); >>> >>> if (nb) >>> >>> { >>> >>> @@ -3330,17 +3557,6 @@ static void >>> >>> bb->aux = nb; >>> >>> /* Don't process this new block. */ >>> >>> bb = nb; >>> >>> - >>> >>> - /* Make sure new bb is tagged for correct section (same as >>> >>> - fall-thru source, since you cannot fall-thru across >>> >>> - section boundaries). */ >>> >>> - BB_COPY_PARTITION (src_bb, single_pred (bb)); >>> >>> - if (flag_reorder_blocks_and_partition >>> >>> - && targetm_common.have_named_sections >>> >>> - && JUMP_P (BB_END (bb)) >>> >>> - && !any_condjump_p (BB_END (bb)) >>> >>> - && (EDGE_SUCC (bb, 0)->flags & EDGE_CROSSING)) >>> >>> - add_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX); >>> >>> } >>> >>> } >>> >>> >>> >>> @@ -3644,10 +3860,11 @@ duplicate_insn_chain (rtx from, rtx to) >>> >>> case NOTE_INSN_FUNCTION_BEG: >>> >>> /* There is always just single entry to function. */ >>> >>> case NOTE_INSN_BASIC_BLOCK: >>> >>> + /* We should only switch text sections once. 
*/ >>> >>> + case NOTE_INSN_SWITCH_TEXT_SECTIONS: >>> >>> break; >>> >>> >>> >>> case NOTE_INSN_EPILOGUE_BEG: >>> >>> - case NOTE_INSN_SWITCH_TEXT_SECTIONS: >>> >>> emit_note_copy (insn); >>> >>> break; >>> >>> >>> >>> @@ -3759,10 +3976,13 @@ break_superblocks (void) >>> >>> } >>> >>> >>> >>> /* Finalize the changes: reorder insn list according to the sequence specified >>> >>> - by aux pointers, enter compensation code, rebuild scope forest. */ >>> >>> + by aux pointers, enter compensation code, rebuild scope forest. If >>> >>> + this is called when we will FINALIZE_REORDER_BLOCKS, indicate that >>> >>> + to fixup_reorder_chain so that it can insert the proper switch text >>> >>> + section notes. */ >>> >>> >>> >>> void >>> >>> -cfg_layout_finalize (void) >>> >>> +cfg_layout_finalize (bool finalize_reorder_blocks) >>> >>> { >>> >>> #ifdef ENABLE_CHECKING >>> >>> verify_flow_info (); >>> >>> @@ -3775,7 +3995,7 @@ void >>> >>> #endif >>> >>> ) >>> >>> fixup_fallthru_exit_predecessor (); >>> >>> - fixup_reorder_chain (); >>> >>> + fixup_reorder_chain (finalize_reorder_blocks); >>> >>> >>> >>> rebuild_jump_labels (get_insns ()); >>> >>> delete_dead_jumptables (); >>> >>> @@ -4454,8 +4674,7 @@ rtl_can_remove_branch_p (const_edge e) >>> >>> if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH)) >>> >>> return false; >>> >>> >>> >>> - if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX) >>> >>> - || BB_PARTITION (src) != BB_PARTITION (target)) >>> >>> + if (BB_PARTITION (src) != BB_PARTITION (target)) >>> >>> return false; >>> >>> >>> >>> if (!onlyjump_p (insn) >>> >>> >>> >>> -- >>> >>> This patch is available for review at http://codereview.appspot.com/6823047 >>> >> >>> >> >>> >> >>> >> -- >>> >> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413 >>> >>> >>> >>> -- >>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413 > > > > -- > Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
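For readers following the fixup_partitions hunk quoted above: its core is a worklist walk over the dominator tree that makes every block dominated by a cold block cold as well. Below is a minimal, self-contained sketch of that idea in plain C. It is a toy model and not the GCC code: blocks are array indices, the dominator tree is assumed to be given as an immediate-dominator array, and propagate_cold, MAX_BLOCKS, NO_IDOM and enum partition are hypothetical names made up for illustration. Unlike the patch, the sketch marks a block cold at the moment it is pushed, so each block enters the worklist at most once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only (assumption): blocks are indices 0..n-1 and the dominator
   tree is described by idom[], where idom[bb] is the immediate dominator
   of bb, or NO_IDOM for the entry block.  None of this is GCC's API.  */
#define MAX_BLOCKS 64
#define NO_IDOM (-1)

enum partition { PART_HOT, PART_COLD };

/* Make every block that is dominated by a cold block cold as well.
   The indices of the blocks whose partition changed are stored in
   CHANGED (capacity MAX_BLOCKS); the return value is their count.
   These are the blocks whose edges would need a crossing fixup.  */
int
propagate_cold (int n, const int idom[], enum partition part[], int changed[])
{
  int worklist[MAX_BLOCKS];
  int top = 0;
  int n_changed = 0;

  assert (n >= 0 && n <= MAX_BLOCKS);

  /* Seed the worklist with the blocks that are already cold.  */
  for (int bb = 0; bb < n; bb++)
    if (part[bb] == PART_COLD)
      worklist[top++] = bb;

  while (top > 0)
    {
      int bb = worklist[--top];

      /* Visit the dominator-tree children of bb.  A child that is already
         cold was seeded above (or pushed earlier), so only hot children
         are converted and pushed.  */
      for (int son = 0; son < n; son++)
        if (idom[son] == bb && part[son] != PART_COLD)
          {
            part[son] = PART_COLD;
            changed[n_changed++] = son;
            worklist[top++] = son;
          }
    }

  return n_changed;
}
```

In the patch, the blocks collected this way (bbs_to_fix) are only handed to fixup_bb_partition after the whole walk has finished, so that crossing edges are updated once per block; the changed[] array plays that role in the sketch.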
Is this with the same target compiler and options used in PR55121? I will try to reproduce the compile-time failures with arm and those options if so. I haven't seen those with spec2006 linux x86_64. I'm not sure how to test the runtime behavior though. Thanks, Teresa On Wed, Nov 28, 2012 at 7:48 AM, Christophe Lyon <christophe.lyon@linaro.org> wrote: > I have updated my trunk checkout, and I can confirm that eval.c now > compiles with your patch (and the other 4 patches I added to PR55121). > > Now, when looking at the whole Spec2k results: > - vpr passes now (used to fail) > - gcc, parser, perlbmk bzip2 and twolf no longer build: they all fail > with the same error from gas: > can't resolve `.text.unlikely' {.text.unlikely section} - `.LBB171' > {.text section} > - gap still does not build (same error as above) > > I haven't looked in detail, so I may be missing an obvious patch here. > > And I still observe runtime mis-behaviour on crafty, galgel, facerec and fma3d. > > Thanks > Christophe. > > > On 26 November 2012 21:52, Teresa Johnson <tejohnson@google.com> wrote: >> Sorry, I don't know what happened there. Patch is attached. >> Thanks, >> Teresa >> >> On Mon, Nov 26, 2012 at 12:42 PM, Jack Howarth <howarth@bromo.med.uc.edu> wrote: >>> On Mon, Nov 26, 2012 at 12:19:55PM -0800, Teresa Johnson wrote: >>>> Are you sure you have all my changes applied? I applied the 4 patches >>>> attached to PR55121 into my trunk checkout that has my fixes, and to a >>>> pristine trunk checkout. I configured and built both for >>>> --target=arm-none-linux-gnueabi, and built using your options, .i file >>>> and gcda file. I can reproduce the failure using the pristine trunk >>>> with your patches but not with my fixed trunk + your patches. (I just >>>> updated to head to pickup recent changes and get the same result. The >>>> vec changes required some manual changes to the patch, which I will >>>> resend shortly.) 
>>> >>> Teresa, >>> Your mailer seems to have corrupted the posted patch with stray >>> =3D characters and line breaks. Can you repost a copy as an attachment >>> to the list? >>> Jack >>> >>>> >>>> Without my fixes: >>>> >>>> $ ~/extra/gcc_trunk_3_arm-eabi/gcc/cc1 -fpreprocessed >>>> eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9 >>>> -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp >>>> -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use >>>> -fno-common -o eval.s -freorder-blocks-and-partition >>>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi) >>>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version >>>> 2.4.2-p1, MPC version 0.8.1 >>>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 >>>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi) >>>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version >>>> 2.4.2-p1, MPC version 0.8.1 >>>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 >>>> Compiler executable checksum: d19cc60a2f07de08237a8488bb35cd1a >>>> eval.c: In function ‘Ge’: >>>> eval.c:792:1: internal compiler error: in df_compact_blocks, at df-core.c:1560 >>>> } >>>> ^ >>>> 0x622f71 df_compact_blocks() >>>> ../../gcc_trunk_3/gcc/df-core.c:1560 >>>> 0x5cfcb5 compact_blocks() >>>> ../../gcc_trunk_3/gcc/cfg.c:162 >>>> 0xc9dce0 reorder_basic_blocks >>>> ../../gcc_trunk_3/gcc/bb-reorder.c:2154 >>>> 0xc9dce0 rest_of_handle_reorder_blocks >>>> ../../gcc_trunk_3/gcc/bb-reorder.c:2219 >>>> Please submit a full bug report, >>>> with preprocessed source if appropriate. >>>> Please include the complete backtrace with any bug report. >>>> See <http://gcc.gnu.org/bugs.html> for instructions.
>>>> >>>> >>>> With my fixes: >>>> >>>> $ ~/extra/gcc_trunk_4_arm-eabi/gcc/cc1 -fpreprocessed >>>> eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9 >>>> -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp >>>> -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use >>>> -fno-common -o eval.s -freorder-blocks-and-partition >>>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi) >>>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version >>>> 2.4.2-p1, MPC version 0.8.1 >>>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 >>>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi) >>>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version >>>> 2.4.2-p1, MPC version 0.8.1 >>>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 >>>> Compiler executable checksum: 45b468efa7c981f9afb44c4dac2424f3 >>>> >>>> >>>> Thanks, >>>> Teresa >>>> >>>> On Mon, Nov 26, 2012 at 8:25 AM, Christophe Lyon >>>> <christophe.lyon@linaro.org> wrote: >>>> > Hi, >>>> > >>>> > I have tested your patch on Spec2000 on ARM, and I can still see >>>> > several failures caused by: >>>> > "error: fallthru edge crosses section boundary", including the case >>>> > described in PR55121. >>>> > >>>> > On 26 November 2012 16:55, Teresa Johnson <tejohnson@google.com> wrote: >>>> >> Ping. >>>> >> Teresa >>>> >> >>>> >> On Thu, Nov 15, 2012 at 12:10 PM, Teresa Johnson <tejohnson@google.com> wrote: >>>> >>> Revised patch that fixes failures encountered when enabling >>>> >>> -freorder-blocks-and-partition, including the failure reported in PR 53743. >>>> >>> >>>> >>> This includes new verification code to ensure no cold blocks dominate hot >>>> >>> blocks contributed by Steven Bosscher.
>>>> >>> >>>> >>> I attempted to make the handling of partition updates through the optimization >>>> >>> passes much more consistent, removing a number of partial fixes in the code >>>> >>> stream in the process. The code to fixup partitions (including the BB_PARTITION >>>> >>> assignment, region crossing jump notes, and switch text section notes) is >>>> >>> now handled in a few centralized locations. For example, inside >>>> >>> rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that callers >>>> >>> don't need to attempt the fixup themselves. >>>> >>> >>>> >>> For optimization passes that make adjustments to the cfg while in cfg layout >>>> >>> mode that are not easy to fix up incrementally, the new routine >>>> >>> fixup_partitions handles the cleanup globally. This does require calculation >>>> >>> of the dominance relation, however, as far as I can tell the routines which >>>> >>> now invoke this global fixup (try_optimize_cfg and commit_edge_insertions) >>>> >>> are invoked typically once (or a small number of times in the case of >>>> >>> try_optimize_cfg) per optimization pass. Additionally, I compared the >>>> >>> -ftime-report output for some large fdo compilations and saw only minimal >>>> >>> increases in the dominance computation times, which were only a tiny percent >>>> >>> of the overall compile time. >>>> >>> >>>> >>> Additionally, I added a flag to the rtl_data structure to indicate whether >>>> >>> any partitioning was actually performed, so that optimizations which were >>>> >>> conservatively disabled whenever the flag_reorder_blocks_and_partition >>>> >>> is enabled (e.g. try_crossjump_to_edge, part of connect_traces) can be less >>>> >>> conservative for functions where no partitions were formed (e.g. they are >>>> >>> completely hot). >>>> >>> >>>> >>> Bootstrapped and tested on x86_64-unknown-linux-gnu.
Also tested with SPEC2006 int >>>> >>> benchmarks and internal google benchmarks using profile feedback and >>>> >>> -freorder-blocks-and-partition to get more coverage. Ok for trunk? >>>> >>> >>>> >>> Thanks, >>>> >>> Teresa >>>> >>> >>>> >>> 2012-11-14 Teresa Johnson <tejohnson@google.com> >>>> >>> Steven Bosscher <steven@gcc.gnu.org> >>>> >>> >>>> >>> * cfghooks.h (cfg_layout_finalize): New parameter. >>>> >>> * modulo-sched.c (rest_of_handle_sms): New cfg_layout_finalize >>>> >>> parameter. >>>> >>> * ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert >>>> >>> as this is now done by redirect_edge_and_branch_force. >>>> >>> * function.c (thread_prologue_and_epilogue_insns): Insert new bb after >>>> >>> barriers, new cfg_layout_finalize parameter, and don't store exit >>>> >>> predecessor BB until after it is potentially split. >>>> >>> * function.h (struct rtl_data): New flag has_bb_partition. >>>> >>> * hw-doloop.c (reorder_loops): New cfg_layout_finalize parameter. >>>> >>> * cfgcleanup.c (try_crossjump_to_edge): Only skip optimization if >>>> >>> any blocks in function actually partitioned. >>>> >>> (try_optimize_cfg): If cfg changed, invoke fixup_partitions to clean >>>> >>> up partitioning. >>>> >>> * bb-reorder.c (connect_traces): Only look for partitions and skip >>>> >>> block copying if any blocks in function actually partitioned. >>>> >>> (emit_barrier_after_bb): Handle insertion in non-cfglayout mode. >>>> >>> (find_rarely_executed_basic_blocks_and_crossing_edges): Ensure >>>> >>> that no cold blocks dominate a hot block. >>>> >>> (fix_up_fall_thru_edges): Replace BB_COPY_PARTITION with assert >>>> >>> as this is now done by force_nonfallthru_and_redirect. >>>> >>> (add_reg_crossing_jump_notes): Handle the fact that some jumps may >>>> >>> already be marked with region crossing note. >>>> >>> (reorder_basic_blocks): Only need to verify partitions if any >>>> >>> blocks in function actually partitioned. 
>>>> >>> (insert_section_boundary_note): Only need to insert note if any >>>> >>> blocks in function actually partitioned. >>>> >>> (rest_of_handle_reorder_blocks): New cfg_layout_finalize >>>> >>> parameter, and remove call to insert_section_boundary_note as this >>>> >>> is now called via cfg_layout_finalize/fixup_reorder_chain. >>>> >>> (duplicate_computed_gotos): New cfg_layout_finalize >>>> >>> parameter. >>>> >>> (partition_hot_cold_basic_blocks): Set flag indicating function >>>> >>> has bb partitions. >>>> >>> * bb-reorder.h: Declare insert_section_boundary_note and >>>> >>> emit_barrier_after_bb, which are no longer static. >>>> >>> * basic-block.h: Declare new function fixup_partitions. >>>> >>> * cfgrtl.c (try_redirect_by_replacing_jump): Remove unnecessary >>>> >>> check for region crossing note. >>>> >>> (fixup_partition_crossing): New function. >>>> >>> (fixup_bb_partition): Ditto. >>>> >>> (rtl_redirect_edge_and_branch): Fixup partition boundaries. >>>> >>> (force_nonfallthru_and_redirect): Fixup partition boundaries, >>>> >>> remove old code that tried to do this. Emit barrier correctly >>>> >>> when we are in cfglayout mode. >>>> >>> (rtl_split_edge): Correctly fixup partition boundaries. >>>> >>> (commit_one_edge_insertion): Remove old code that tried to >>>> >>> fixup region crossing edge since this is now handled in >>>> >>> split_block, and set up insertion point correctly since >>>> >>> block may now end in a jump. >>>> >>> (commit_edge_insertions): Invoke fixup_partitions to sanitize partition >>>> >>> boundaries after optimizations that modify cfg and before trying to >>>> >>> verify the flow info. >>>> >>> (fixup_partitions): New function. >>>> >>> (rtl_verify_flow_info_1): Add verification that no cold bbs dominate >>>> >>> hot bbs. >>>> >>> (record_effective_endpoints): Remove region-crossing notes and set flag >>>> >>> indicating that they need to be reinserted on exit from cfglayout mode. 
>>>> >>> (outof_cfg_layout_mode): New cfg_layout_finalize parameter. >>>> >>> (fixup_reorder_chain): Call insert_section_boundary_note if necessary. >>>> >>> Remove old code that attempted to fixup region crossing note as >>>> >>> this is now handled in force_nonfallthru_and_redirect. >>>> >>> (duplicate_insn_chain): Don't duplicate switch section notes. >>>> >>> (cfg_layout_finalize): Pass new parameter to fixup_reorder_chain. >>>> >>> (rtl_can_remove_branch_p): Remove unnecessary check for region crossing >>>> >>> note. >>>> >>> >>>> >>> Index: cfghooks.h >>>> >>> =================================================================== >>>> >>> --- cfghooks.h (revision 193376) >>>> >>> +++ cfghooks.h (working copy) >>>> >>> @@ -204,7 +204,7 @@ extern void copy_bbs (basic_block *, unsigned, bas >>>> >>> void account_profile_record (struct profile_record *, int); >>>> >>> >>>> >>> extern void cfg_layout_initialize (unsigned int); >>>> >>> -extern void cfg_layout_finalize (void); >>>> >>> +extern void cfg_layout_finalize (bool); >>>> >>> >>>> >>> /* Hooks containers. 
*/ >>>> >>> extern struct cfg_hooks gimple_cfg_hooks; >>>> >>> @@ -218,4 +218,3 @@ extern void cfg_layout_rtl_register_cfg_hooks (voi >>>> >>> extern void gimple_register_cfg_hooks (void); >>>> >>> extern struct cfg_hooks get_cfg_hooks (void); >>>> >>> extern void set_cfg_hooks (struct cfg_hooks); >>>> >>> - >>>> >>> Index: modulo-sched.c >>>> >>> =================================================================== >>>> >>> --- modulo-sched.c (revision 193376) >>>> >>> +++ modulo-sched.c (working copy) >>>> >>> @@ -3354,7 +3354,7 @@ rest_of_handle_sms (void) >>>> >>> if (bb->next_bb != EXIT_BLOCK_PTR) >>>> >>> bb->aux = bb->next_bb; >>>> >>> free_dominance_info (CDI_DOMINATORS); >>>> >>> - cfg_layout_finalize (); >>>> >>> + cfg_layout_finalize (false); >>>> >>> #endif /* INSN_SCHEDULING */ >>>> >>> return 0; >>>> >>> } >>>> >>> Index: ifcvt.c >>>> >>> =================================================================== >>>> >>> --- ifcvt.c (revision 193376) >>>> >>> +++ ifcvt.c (working copy) >>>> >>> @@ -3900,10 +3900,9 @@ find_if_case_1 (basic_block test_bb, edge then_edg >>>> >>> if (new_bb) >>>> >>> { >>>> >>> df_bb_replace (then_bb_index, new_bb); >>>> >>> - /* Since the fallthru edge was redirected from test_bb to new_bb, >>>> >>> - we need to ensure that new_bb is in the same partition as >>>> >>> - test bb (you can not fall through across section boundaries). */ >>>> >>> - BB_COPY_PARTITION (new_bb, test_bb); >>>> >>> + /* This should have been done above via force_nonfallthru_and_redirect >>>> >>> + (possibly called from redirect_edge_and_branch_force). 
*/ >>>> >>> + gcc_assert (BB_PARTITION (new_bb) == BB_PARTITION (test_bb)); >>>> >>> } >>>> >>> >>>> >>> num_true_changes++; >>>> >>> Index: function.c >>>> >>> =================================================================== >>>> >>> --- function.c (revision 193376) >>>> >>> +++ function.c (working copy) >>>> >>> @@ -6249,8 +6249,10 @@ thread_prologue_and_epilogue_insns (void) >>>> >>> break; >>>> >>> if (e) >>>> >>> { >>>> >>> - copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)), >>>> >>> - NULL_RTX, e->src); >>>> >>> + /* Make sure we insert after any barriers. */ >>>> >>> + rtx end = get_last_bb_insn (e->src); >>>> >>> + copy_bb = create_basic_block (NEXT_INSN (end), >>>> >>> + NULL_RTX, e->src); >>>> >>> BB_COPY_PARTITION (copy_bb, e->src); >>>> >>> } >>>> >>> else >>>> >>> @@ -6475,7 +6477,7 @@ thread_prologue_and_epilogue_insns (void) >>>> >>> if (cur_bb->index >= NUM_FIXED_BLOCKS >>>> >>> && cur_bb->next_bb->index >= NUM_FIXED_BLOCKS) >>>> >>> cur_bb->aux = cur_bb->next_bb; >>>> >>> - cfg_layout_finalize (); >>>> >>> + cfg_layout_finalize (false); >>>> >>> } >>>> >>> >>>> >>> epilogue_done: >>>> >>> @@ -6517,7 +6519,7 @@ epilogue_done: >>>> >>> basic_block simple_return_block_cold = NULL; >>>> >>> edge pending_edge_hot = NULL; >>>> >>> edge pending_edge_cold = NULL; >>>> >>> - basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb; >>>> >>> + basic_block exit_pred; >>>> >>> int i; >>>> >>> >>>> >>> gcc_assert (entry_edge != orig_entry_edge); >>>> >>> @@ -6545,6 +6547,12 @@ epilogue_done: >>>> >>> else >>>> >>> pending_edge_cold = e; >>>> >>> } >>>> >>> + >>>> >>> + /* Save a pointer to the exit's predecessor BB for use in >>>> >>> + inserting new BBs at the end of the function. Do this >>>> >>> + after the call to split_block above which may split >>>> >>> + the original exit pred. 
*/ >>>> >>> + exit_pred = EXIT_BLOCK_PTR->prev_bb; >>>> >>> >>>> >>> FOR_EACH_VEC_ELT (edge, unconverted_simple_returns, i, e) >>>> >>> { >>>> >>> Index: function.h >>>> >>> =================================================================== >>>> >>> --- function.h (revision 193376) >>>> >>> +++ function.h (working copy) >>>> >>> @@ -459,6 +459,11 @@ struct GTY(()) rtl_data { >>>> >>> sched2) and is useful only if the port defines LEAF_REGISTERS. */ >>>> >>> bool uses_only_leaf_regs; >>>> >>> >>>> >>> + /* Nonzero if the function being compiled has undergone hot/cold partitioning >>>> >>> + (under flag_reorder_blocks_and_partition) and has at least one cold >>>> >>> + block. */ >>>> >>> + bool has_bb_partition; >>>> >>> + >>>> >>> /* Like regs_ever_live, but 1 if a reg is set or clobbered from an >>>> >>> asm. Unlike regs_ever_live, elements of this array corresponding >>>> >>> to eliminable regs (like the frame pointer) are set if an asm >>>> >>> Index: hw-doloop.c >>>> >>> =================================================================== >>>> >>> --- hw-doloop.c (revision 193376) >>>> >>> +++ hw-doloop.c (working copy) >>>> >>> @@ -547,7 +547,7 @@ reorder_loops (hwloop_info loops) >>>> >>> else >>>> >>> bb->aux = NULL; >>>> >>> } >>>> >>> - cfg_layout_finalize (); >>>> >>> + cfg_layout_finalize (false); >>>> >>> clear_aux_for_blocks (); >>>> >>> df_analyze (); >>>> >>> } >>>> >>> Index: cfgcleanup.c >>>> >>> =================================================================== >>>> >>> --- cfgcleanup.c (revision 193376) >>>> >>> +++ cfgcleanup.c (working copy) >>>> >>> @@ -1824,7 +1824,7 @@ try_crossjump_to_edge (int mode, edge e1, edge e2, >>>> >>> partition boundaries). See the comments at the top of >>>> >>> bb-reorder.c:partition_hot_cold_basic_blocks for complete details. 
*/ >>>> >>> >>>> >>> - if (flag_reorder_blocks_and_partition && reload_completed) >>>> >>> + if (crtl->has_bb_partition && reload_completed) >>>> >>> return false; >>>> >>> >>>> >>> /* Search backward through forwarder blocks. We don't need to worry >>>> >>> @@ -2767,10 +2767,21 @@ try_optimize_cfg (int mode) >>>> >>> df_analyze (); >>>> >>> } >>>> >>> >>>> >>> + if (changed) >>>> >>> + { >>>> >>> + /* Edge forwarding in particular can cause hot blocks previously >>>> >>> + reached by both hot and cold blocks to become dominated only >>>> >>> + by cold blocks. This will cause the verification below to fail, >>>> >>> + and lead to now cold code in the hot section. This is not easy >>>> >>> + to detect and fix during edge forwarding, and in some cases >>>> >>> + is only visible after newly unreachable blocks are deleted, >>>> >>> + which will be done in fixup_partitions. */ >>>> >>> + fixup_partitions (); >>>> >>> + >>>> >>> #ifdef ENABLE_CHECKING >>>> >>> - if (changed) >>>> >>> - verify_flow_info (); >>>> >>> + verify_flow_info (); >>>> >>> #endif >>>> >>> + } >>>> >>> >>>> >>> changed_overall |= changed; >>>> >>> first_pass = false; >>>> >>> Index: bb-reorder.c >>>> >>> =================================================================== >>>> >>> --- bb-reorder.c (revision 193376) >>>> >>> +++ bb-reorder.c (working copy) >>>> >>> @@ -1054,7 +1054,7 @@ connect_traces (int n_traces, struct trace *traces >>>> >>> current_partition = BB_PARTITION (traces[0].first); >>>> >>> two_passes = false; >>>> >>> >>>> >>> - if (flag_reorder_blocks_and_partition) >>>> >>> + if (crtl->has_bb_partition) >>>> >>> for (i = 0; i < n_traces && !two_passes; i++) >>>> >>> if (BB_PARTITION (traces[0].first) >>>> >>> != BB_PARTITION (traces[i].first)) >>>> >>> @@ -1263,7 +1263,7 @@ connect_traces (int n_traces, struct trace *traces >>>> >>> } >>>> >>> } >>>> >>> >>>> >>> - if (flag_reorder_blocks_and_partition) >>>> >>> + if (crtl->has_bb_partition) >>>> >>> try_copy = false; >>>> >>> >>>> 
>>> /* Copy tiny blocks always; copy larger blocks only when the >>>> >>> @@ -1381,13 +1381,14 @@ get_uncond_jump_length (void) >>>> >>> return length; >>>> >>> } >>>> >>> >>>> >>> -/* Emit a barrier into the footer of BB. */ >>>> >>> +/* Emit a barrier after BB, into the footer if we are in CFGLAYOUT mode. */ >>>> >>> >>>> >>> -static void >>>> >>> +void >>>> >>> emit_barrier_after_bb (basic_block bb) >>>> >>> { >>>> >>> rtx barrier = emit_barrier_after (BB_END (bb)); >>>> >>> - BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier); >>>> >>> + if (current_ir_type () == IR_RTL_CFGLAYOUT) >>>> >>> + BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier); >>>> >>> } >>>> >>> >>>> >>> /* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions. >>>> >>> @@ -1463,18 +1464,109 @@ find_rarely_executed_basic_blocks_and_crossing_edg >>>> >>> { >>>> >>> VEC(edge, heap) *crossing_edges = NULL; >>>> >>> basic_block bb; >>>> >>> - edge e; >>>> >>> - edge_iterator ei; >>>> >>> + edge e, e2; >>>> >>> + edge_iterator ei, ei2; >>>> >>> + unsigned int cold_bb_count = 0; >>>> >>> + VEC (basic_block, heap) *bbs_in_hot_partition = NULL; >>>> >>> + VEC (basic_block, heap) *bbs_newly_hot = NULL; >>>> >>> >>>> >>> /* Mark which partition (hot/cold) each basic block belongs in. */ >>>> >>> FOR_EACH_BB (bb) >>>> >>> { >>>> >>> if (probably_never_executed_bb_p (cfun, bb)) >>>> >>> - BB_SET_PARTITION (bb, BB_COLD_PARTITION); >>>> >>> + { >>>> >>> + BB_SET_PARTITION (bb, BB_COLD_PARTITION); >>>> >>> + cold_bb_count++; >>>> >>> + } >>>> >>> else >>>> >>> - BB_SET_PARTITION (bb, BB_HOT_PARTITION); >>>> >>> + { >>>> >>> + BB_SET_PARTITION (bb, BB_HOT_PARTITION); >>>> >>> + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, bb); >>>> >>> + } >>>> >>> } >>>> >>> >>>> >>> + /* Ensure that no cold bbs dominate hot bbs. This could happen as a result of >>>> >>> + several different possibilities. 
One is that there are edge weight insanities >>>> >>> + due to optimization phases that do not properly update basic block profile >>>> >>> + counts. The second is that the entry of the function may not be hot, because >>>> >>> + it is entered fewer times than the number of profile training runs, but there >>>> >>> + is a loop inside the function that causes blocks within the function to be >>>> >>> + above the threshold for hotness. */ >>>> >>> + if (cold_bb_count) >>>> >>> + { >>>> >>> + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); >>>> >>> + >>>> >>> + if (dom_calculated_here) >>>> >>> + calculate_dominance_info (CDI_DOMINATORS); >>>> >>> + >>>> >>> + /* Keep examining hot bbs until we have either checked them all, or >>>> >>> + re-marked all cold bbs hot. */ >>>> >>> + while (! VEC_empty (basic_block, bbs_in_hot_partition) >>>> >>> + && cold_bb_count) >>>> >>> + { >>>> >>> + basic_block dom_bb; >>>> >>> + >>>> >>> + bb = VEC_pop (basic_block, bbs_in_hot_partition); >>>> >>> + dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb); >>>> >>> + >>>> >>> + /* If bb's immediate dominator is also hot then it is ok. */ >>>> >>> + if (BB_PARTITION (dom_bb) != BB_COLD_PARTITION) >>>> >>> + continue; >>>> >>> + >>>> >>> + /* We have a hot bb with an immediate dominator that is cold. >>>> >>> + The dominator needs to be re-marked to hot. */ >>>> >>> + BB_SET_PARTITION (dom_bb, BB_HOT_PARTITION); >>>> >>> + cold_bb_count--; >>>> >>> + >>>> >>> + /* Now we need to examine newly-hot dom_bb to see if it is also >>>> >>> + dominated by a cold bb. */ >>>> >>> + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, dom_bb); >>>> >>> + >>>> >>> + /* We should also adjust any cold blocks that the newly-hot bb >>>> >>> + feeds and see if it makes sense to re-mark those as hot as >>>> >>> + well. */ >>>> >>> + VEC_safe_push (basic_block, heap, bbs_newly_hot, dom_bb); >>>> >>> + while (! 
VEC_empty (basic_block, bbs_newly_hot)) >>>> >>> + { >>>> >>> + basic_block new_hot_bb = VEC_pop (basic_block, bbs_newly_hot); >>>> >>> + /* Examine all successors of this newly-hot bb to see if they >>>> >>> + are cold and should be re-marked as hot. */ >>>> >>> + FOR_EACH_EDGE (e, ei, new_hot_bb->succs) >>>> >>> + { >>>> >>> + bool any_cold_preds = false; >>>> >>> + basic_block succ = e->dest; >>>> >>> + if (BB_PARTITION (succ) != BB_COLD_PARTITION) >>>> >>> + continue; >>>> >>> + /* Does this block have any cold predecessors now? */ >>>> >>> + FOR_EACH_EDGE (e2, ei2, succ->preds) >>>> >>> + { >>>> >>> + if (BB_PARTITION (e2->src) == BB_COLD_PARTITION) >>>> >>> + { >>>> >>> + any_cold_preds = true; >>>> >>> + break; >>>> >>> + } >>>> >>> + } >>>> >>> + if (any_cold_preds) >>>> >>> + continue; >>>> >>> + >>>> >>> + /* Here we have a successor of newly-hot bb that is cold >>>> >>> + but no longer has any cold predecessors. Since the original >>>> >>> + assignment of our newly-hot bb was incorrect, this successor's >>>> >>> + assignment as cold is also suspect. Go ahead and re-mark it >>>> >>> + as hot now too. Better heuristics may be in order here. */ >>>> >>> + BB_SET_PARTITION (succ, BB_HOT_PARTITION); >>>> >>> + cold_bb_count--; >>>> >>> + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, succ); >>>> >>> + /* Examine this successor as a newly-hot bb. */ >>>> >>> + VEC_safe_push (basic_block, heap, bbs_newly_hot, succ); >>>> >>> + } >>>> >>> + } >>>> >>> + } >>>> >>> + >>>> >>> + if (dom_calculated_here) >>>> >>> + free_dominance_info (CDI_DOMINATORS); >>>> >>> + } >>>> >>> + >>>> >>> /* The format of .gcc_except_table does not allow landing pads to >>>> >>> be in a different partition as the throw. Fix this by either >>>> >>> moving or duplicating the landing pads. 
*/ >>>> >>> @@ -1766,10 +1858,10 @@ fix_up_fall_thru_edges (void) >>>> >>> new_bb->aux = cur_bb->aux; >>>> >>> cur_bb->aux = new_bb; >>>> >>> >>>> >>> - /* Make sure new fall-through bb is in same >>>> >>> - partition as bb it's falling through from. */ >>>> >>> + /* This is done by force_nonfallthru_and_redirect. */ >>>> >>> + gcc_assert (BB_PARTITION (new_bb) >>>> >>> + == BB_PARTITION (cur_bb)); >>>> >>> >>>> >>> - BB_COPY_PARTITION (new_bb, cur_bb); >>>> >>> single_succ_edge (new_bb)->flags |= EDGE_CROSSING; >>>> >>> } >>>> >>> else >>>> >>> @@ -2067,7 +2159,10 @@ add_reg_crossing_jump_notes (void) >>>> >>> FOR_EACH_BB (bb) >>>> >>> FOR_EACH_EDGE (e, ei, bb->succs) >>>> >>> if ((e->flags & EDGE_CROSSING) >>>> >>> - && JUMP_P (BB_END (e->src))) >>>> >>> + && JUMP_P (BB_END (e->src)) >>>> >>> + /* Some notes were added during fix_up_fall_thru_edges, via >>>> >>> + force_nonfallthru_and_redirect. */ >>>> >>> + && !find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX)) >>>> >>> add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); >>>> >>> } >>>> >>> >>>> >>> @@ -2160,7 +2255,7 @@ reorder_basic_blocks (void) >>>> >>> dump_flow_info (dump_file, dump_flags); >>>> >>> } >>>> >>> >>>> >>> - if (flag_reorder_blocks_and_partition) >>>> >>> + if (crtl->has_bb_partition) >>>> >>> verify_hot_cold_block_grouping (); >>>> >>> } >>>> >>> >>>> >>> @@ -2172,14 +2267,14 @@ reorder_basic_blocks (void) >>>> >>> encountering this note will make the compiler switch between the >>>> >>> hot and cold text sections. 
*/ >>>> >>> >>>> >>> -static void >>>> >>> +void >>>> >>> insert_section_boundary_note (void) >>>> >>> { >>>> >>> basic_block bb; >>>> >>> rtx new_note; >>>> >>> int first_partition = 0; >>>> >>> >>>> >>> - if (!flag_reorder_blocks_and_partition) >>>> >>> + if (!crtl->has_bb_partition) >>>> >>> return; >>>> >>> >>>> >>> FOR_EACH_BB (bb) >>>> >>> @@ -2222,10 +2317,8 @@ rest_of_handle_reorder_blocks (void) >>>> >>> FOR_EACH_BB (bb) >>>> >>> if (bb->next_bb != EXIT_BLOCK_PTR) >>>> >>> bb->aux = bb->next_bb; >>>> >>> - cfg_layout_finalize (); >>>> >>> + cfg_layout_finalize (true); >>>> >>> >>>> >>> - /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes. */ >>>> >>> - insert_section_boundary_note (); >>>> >>> return 0; >>>> >>> } >>>> >>> >>>> >>> @@ -2366,7 +2459,7 @@ duplicate_computed_gotos (void) >>>> >>> } >>>> >>> >>>> >>> done: >>>> >>> - cfg_layout_finalize (); >>>> >>> + cfg_layout_finalize (false); >>>> >>> >>>> >>> BITMAP_FREE (candidates); >>>> >>> return 0; >>>> >>> @@ -2511,6 +2604,8 @@ partition_hot_cold_basic_blocks (void) >>>> >>> if (crossing_edges == NULL) >>>> >>> return 0; >>>> >>> >>>> >>> + crtl->has_bb_partition = true; >>>> >>> + >>>> >>> /* Make sure the source of any crossing edge ends in a jump and the >>>> >>> destination of any crossing edge has a label. 
*/ >>>> >>> add_labels_and_missing_jumps (crossing_edges); >>>> >>> Index: bb-reorder.h >>>> >>> =================================================================== >>>> >>> --- bb-reorder.h (revision 193376) >>>> >>> +++ bb-reorder.h (working copy) >>>> >>> @@ -36,4 +36,8 @@ extern struct target_bb_reorder *this_target_bb_re >>>> >>> >>>> >>> extern int get_uncond_jump_length (void); >>>> >>> >>>> >>> +extern void insert_section_boundary_note (void); >>>> >>> + >>>> >>> +extern void emit_barrier_after_bb (basic_block bb); >>>> >>> + >>>> >>> #endif >>>> >>> Index: basic-block.h >>>> >>> =================================================================== >>>> >>> --- basic-block.h (revision 193376) >>>> >>> +++ basic-block.h (working copy) >>>> >>> @@ -806,6 +806,7 @@ extern basic_block force_nonfallthru_and_redirect >>>> >>> extern bool contains_no_active_insn_p (const_basic_block); >>>> >>> extern bool forwarder_block_p (const_basic_block); >>>> >>> extern bool can_fallthru (basic_block, basic_block); >>>> >>> +extern void fixup_partitions (void); >>>> >>> >>>> >>> /* In cfgbuild.c. */ >>>> >>> extern void find_many_sub_basic_blocks (sbitmap); >>>> >>> Index: cfgrtl.c >>>> >>> =================================================================== >>>> >>> --- cfgrtl.c (revision 193376) >>>> >>> +++ cfgrtl.c (working copy) >>>> >>> @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3. If not see >>>> >>> #include "tree.h" >>>> >>> #include "hard-reg-set.h" >>>> >>> #include "basic-block.h" >>>> >>> +#include "bb-reorder.h" >>>> >>> #include "regs.h" >>>> >>> #include "flags.h" >>>> >>> #include "function.h" >>>> >>> @@ -67,11 +68,12 @@ along with GCC; see the file COPYING3. If not see >>>> >>> Only applicable if the CFG is in cfglayout mode. 
*/ >>>> >>> static GTY(()) rtx cfg_layout_function_footer; >>>> >>> static GTY(()) rtx cfg_layout_function_header; >>>> >>> +static bool had_sec_boundary_notes; >>>> >>> >>>> >>> static rtx skip_insns_after_block (basic_block); >>>> >>> static void record_effective_endpoints (void); >>>> >>> static rtx label_for_bb (basic_block); >>>> >>> -static void fixup_reorder_chain (void); >>>> >>> +static void fixup_reorder_chain (bool finalize_reorder_blocks); >>>> >>> >>>> >>> void verify_insn_chain (void); >>>> >>> static void fixup_fallthru_exit_predecessor (void); >>>> >>> @@ -976,8 +978,7 @@ try_redirect_by_replacing_jump (edge e, basic_bloc >>>> >>> partition boundaries). See the comments at the top of >>>> >>> bb-reorder.c:partition_hot_cold_basic_blocks for complete details. */ >>>> >>> >>>> >>> - if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX) >>>> >>> - || BB_PARTITION (src) != BB_PARTITION (target)) >>>> >>> + if (BB_PARTITION (src) != BB_PARTITION (target)) >>>> >>> return NULL; >>>> >>> >>>> >>> /* We can replace or remove a complex jump only when we have exactly >>>> >>> @@ -1286,6 +1287,71 @@ redirect_branch_edge (edge e, basic_block target) >>>> >>> return e; >>>> >>> } >>>> >>> >>>> >>> +/* Called when edge E has been redirected to a new destination, >>>> >>> + in order to update the region crossing flag on the edge and >>>> >>> + jump. */ >>>> >>> + >>>> >>> +static void >>>> >>> +fixup_partition_crossing (edge e, basic_block target) >>>> >>> +{ >>>> >>> + rtx note; >>>> >>> + >>>> >>> + gcc_assert (e->dest == target); >>>> >>> + >>>> >>> + if (e->src == ENTRY_BLOCK_PTR || target == EXIT_BLOCK_PTR) >>>> >>> + return; >>>> >>> + /* If we redirected an existing edge, it may already be marked >>>> >>> + crossing, even though the new src is missing a reg crossing note. >>>> >>> + But make sure reg crossing note doesn't already exist before >>>> >>> + inserting. 
*/ >>>> >>> + if (BB_PARTITION (e->src) != BB_PARTITION (target)) >>>> >>> + { >>>> >>> + e->flags |= EDGE_CROSSING; >>>> >>> + note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); >>>> >>> + if (JUMP_P (BB_END (e->src)) >>>> >>> + && !note) >>>> >>> + add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); >>>> >>> + } >>>> >>> + else if (BB_PARTITION (e->src) == BB_PARTITION (target)) >>>> >>> + { >>>> >>> + e->flags &= ~EDGE_CROSSING; >>>> >>> + /* Remove the region crossing note from jump at end of >>>> >>> + e->src if it exists. */ >>>> >>> + note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); >>>> >>> + if (note) >>>> >>> + remove_note (BB_END (e->src), note); >>>> >>> + } >>>> >>> +} >>>> >>> + >>>> >>> +/* Called when block BB has been reassigned to a different partition, >>>> >>> + to ensure that the region crossing attributes are updated. */ >>>> >>> + >>>> >>> +static void >>>> >>> +fixup_bb_partition (basic_block bb) >>>> >>> +{ >>>> >>> + edge e; >>>> >>> + edge_iterator ei; >>>> >>> + >>>> >>> + /* Now need to make bb's pred edges non-region crossing. */ >>>> >>> + FOR_EACH_EDGE (e, ei, bb->preds) >>>> >>> + { >>>> >>> + fixup_partition_crossing (e, e->dest); >>>> >>> + } >>>> >>> + >>>> >>> + /* Possibly need to make bb's successor edges region crossing, >>>> >>> + or remove stale region crossing. */ >>>> >>> + FOR_EACH_EDGE (e, ei, bb->succs) >>>> >>> + { >>>> >>> + if ((e->flags & EDGE_FALLTHRU) >>>> >>> + && BB_PARTITION (bb) != BB_PARTITION (e->dest) >>>> >>> + && e->dest != EXIT_BLOCK_PTR) >>>> >>> + /* force_nonfallthru_and_redirect calls fixup_partition_crossing. */ >>>> >>> + force_nonfallthru (e); >>>> >>> + else >>>> >>> + fixup_partition_crossing (e, e->dest); >>>> >>> + } >>>> >>> +} >>>> >>> + >>>> >>> /* Attempt to change code to redirect edge E to TARGET. Don't do that on >>>> >>> expense of adding new instructions or reordering basic blocks. 
>>>> >>> >>>> >>> @@ -1302,16 +1368,18 @@ rtl_redirect_edge_and_branch (edge e, basic_block >>>> >>> { >>>> >>> edge ret; >>>> >>> basic_block src = e->src; >>>> >>> + basic_block dest = e->dest; >>>> >>> >>>> >>> if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH)) >>>> >>> return NULL; >>>> >>> >>>> >>> - if (e->dest == target) >>>> >>> + if (dest == target) >>>> >>> return e; >>>> >>> >>>> >>> if ((ret = try_redirect_by_replacing_jump (e, target, false)) != NULL) >>>> >>> { >>>> >>> df_set_bb_dirty (src); >>>> >>> + fixup_partition_crossing (ret, target); >>>> >>> return ret; >>>> >>> } >>>> >>> >>>> >>> @@ -1320,6 +1388,7 @@ rtl_redirect_edge_and_branch (edge e, basic_block >>>> >>> return NULL; >>>> >>> >>>> >>> df_set_bb_dirty (src); >>>> >>> + fixup_partition_crossing (ret, target); >>>> >>> return ret; >>>> >>> } >>>> >>> >>>> >>> @@ -1454,18 +1523,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc >>>> >>> /* Make sure new block ends up in correct hot/cold section. */ >>>> >>> >>>> >>> BB_COPY_PARTITION (jump_block, e->src); >>>> >>> - if (flag_reorder_blocks_and_partition >>>> >>> - && targetm_common.have_named_sections >>>> >>> - && JUMP_P (BB_END (jump_block)) >>>> >>> - && !any_condjump_p (BB_END (jump_block)) >>>> >>> - && (EDGE_SUCC (jump_block, 0)->flags & EDGE_CROSSING)) >>>> >>> - add_reg_note (BB_END (jump_block), REG_CROSSING_JUMP, NULL_RTX); >>>> >>> >>>> >>> /* Wire edge in. */ >>>> >>> new_edge = make_edge (e->src, jump_block, EDGE_FALLTHRU); >>>> >>> new_edge->probability = probability; >>>> >>> new_edge->count = count; >>>> >>> >>>> >>> + /* If e->src was previously region crossing, it no longer is >>>> >>> + and the reg crossing note should be removed. */ >>>> >>> + fixup_partition_crossing (new_edge, jump_block); >>>> >>> + >>>> >>> /* Redirect old edge. 
*/ >>>> >>> redirect_edge_pred (e, jump_block); >>>> >>> e->probability = REG_BR_PROB_BASE; >>>> >>> @@ -1521,13 +1588,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc >>>> >>> LABEL_NUSES (label)++; >>>> >>> } >>>> >>> >>>> >>> - emit_barrier_after (BB_END (jump_block)); >>>> >>> + /* We might be in cfg layout mode, and if so, the following routine will >>>> >>> + insert the barrier correctly. */ >>>> >>> + emit_barrier_after_bb (jump_block); >>>> >>> redirect_edge_succ_nodup (e, target); >>>> >>> >>>> >>> if (abnormal_edge_flags) >>>> >>> make_edge (src, target, abnormal_edge_flags); >>>> >>> >>>> >>> df_mark_solutions_dirty (); >>>> >>> + fixup_partition_crossing (e, target); >>>> >>> return new_bb; >>>> >>> } >>>> >>> >>>> >>> @@ -1626,7 +1696,7 @@ rtl_move_block_after (basic_block bb ATTRIBUTE_UNU >>>> >>> static basic_block >>>> >>> rtl_split_edge (edge edge_in) >>>> >>> { >>>> >>> - basic_block bb; >>>> >>> + basic_block bb, new_bb; >>>> >>> rtx before; >>>> >>> >>>> >>> /* Abnormal edges cannot be split. */ >>>> >>> @@ -1659,12 +1729,26 @@ rtl_split_edge (edge edge_in) >>>> >>> else >>>> >>> { >>>> >>> bb = create_basic_block (before, NULL, edge_in->dest->prev_bb); >>>> >>> - /* ??? Why not edge_in->dest->prev_bb here? */ >>>> >>> - BB_COPY_PARTITION (bb, edge_in->dest); >>>> >>> + if (edge_in->src == ENTRY_BLOCK_PTR) >>>> >>> + BB_COPY_PARTITION (bb, edge_in->dest); >>>> >>> + else >>>> >>> + /* Put the split bb into the src partition, to avoid creating >>>> >>> + a situation where a cold bb dominates a hot bb, in the case >>>> >>> + where src is cold and dest is hot. The src will dominate >>>> >>> + the new bb (whereas it might not have dominated dest). */ >>>> >>> + BB_COPY_PARTITION (bb, edge_in->src); >>>> >>> } >>>> >>> >>>> >>> make_single_succ_edge (bb, edge_in->dest, EDGE_FALLTHRU); >>>> >>> >>>> >>> + /* Can't allow a region crossing edge to be fallthrough. 
*/ >>>> >>> + if (BB_PARTITION (bb) != BB_PARTITION (edge_in->dest) >>>> >>> + && edge_in->dest != EXIT_BLOCK_PTR) >>>> >>> + { >>>> >>> + new_bb = force_nonfallthru (single_succ_edge (bb)); >>>> >>> + gcc_assert (!new_bb); >>>> >>> + } >>>> >>> + >>>> >>> /* For non-fallthru edges, we must adjust the predecessor's >>>> >>> jump instruction to target our new block. */ >>>> >>> if ((edge_in->flags & EDGE_FALLTHRU) == 0) >>>> >>> @@ -1777,17 +1861,13 @@ commit_one_edge_insertion (edge e) >>>> >>> else >>>> >>> { >>>> >>> bb = split_edge (e); >>>> >>> - after = BB_END (bb); >>>> >>> >>>> >>> - if (flag_reorder_blocks_and_partition >>>> >>> - && targetm_common.have_named_sections >>>> >>> - && e->src != ENTRY_BLOCK_PTR >>>> >>> - && BB_PARTITION (e->src) == BB_COLD_PARTITION >>>> >>> - && !(e->flags & EDGE_CROSSING) >>>> >>> - && JUMP_P (after) >>>> >>> - && !any_condjump_p (after) >>>> >>> - && (single_succ_edge (bb)->flags & EDGE_CROSSING)) >>>> >>> - add_reg_note (after, REG_CROSSING_JUMP, NULL_RTX); >>>> >>> + /* If e crossed a partition boundary, we needed to make bb end in >>>> >>> + a region-crossing jump, even though it was originally fallthru. */ >>>> >>> + if (JUMP_P (BB_END (bb))) >>>> >>> + before = BB_END (bb); >>>> >>> + else >>>> >>> + after = BB_END (bb); >>>> >>> } >>>> >>> >>>> >>> /* Now that we've found the spot, do the insertion. */ >>>> >>> @@ -1827,6 +1907,14 @@ commit_edge_insertions (void) >>>> >>> { >>>> >>> basic_block bb; >>>> >>> >>>> >>> + /* Optimization passes that invoke this routine can cause hot blocks >>>> >>> + previously reached by both hot and cold blocks to become dominated only >>>> >>> + by cold blocks. This will cause the verification below to fail, >>>> >>> + and lead to now cold code in the hot section. In some cases this >>>> >>> + may only be visible after newly unreachable blocks are deleted, >>>> >>> + which will be done by fixup_partitions. 
*/ >>>> >>> + fixup_partitions (); >>>> >>> + >>>> >>> #ifdef ENABLE_CHECKING >>>> >>> verify_flow_info (); >>>> >>> #endif >>>> >>> @@ -2028,7 +2116,75 @@ get_last_bb_insn (basic_block bb) >>>> >>> >>>> >>> return end; >>>> >>> } >>>> >>> - >>>> >>> + >>>> >>> +/* Perform cleanup on the hot/cold bb partitioning after optimization >>>> >>> + passes that modify the cfg. */ >>>> >>> + >>>> >>> +void >>>> >>> +fixup_partitions (void) >>>> >>> +{ >>>> >>> + basic_block bb; >>>> >>> + >>>> >>> + if (!crtl->has_bb_partition) >>>> >>> + return; >>>> >>> + >>>> >>> + /* Delete any blocks that became unreachable and weren't >>>> >>> + already cleaned up, for example during edge forwarding >>>> >>> + and convert_jumps_to_returns. This will expose more >>>> >>> + opportunities for fixing the partition boundaries here. >>>> >>> + Also, the calculation of the dominance graph during verification >>>> >>> + will assert if there are unreachable nodes. */ >>>> >>> + delete_unreachable_blocks (); >>>> >>> + >>>> >>> + /* If there are partitions, do a sanity check on them: A basic block in >>>> >>> + a cold partition cannot dominate a basic block in a hot partition. >>>> >>> + Fixup any that now violate this requirement, as a result of edge >>>> >>> + forwarding and unreachable block deletion. */ >>>> >>> + VEC (basic_block, heap) *bbs_in_cold_partition = NULL; >>>> >>> + VEC (basic_block, heap) *bbs_to_fix = NULL; >>>> >>> + FOR_EACH_BB (bb) >>>> >>> + if ((BB_PARTITION (bb) == BB_COLD_PARTITION)) >>>> >>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb); >>>> >>> + if (! VEC_empty (basic_block, bbs_in_cold_partition)) >>>> >>> + { >>>> >>> + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); >>>> >>> + basic_block son; >>>> >>> + >>>> >>> + if (dom_calculated_here) >>>> >>> + calculate_dominance_info (CDI_DOMINATORS); >>>> >>> + >>>> >>> + while (! 
VEC_empty (basic_block, bbs_in_cold_partition)) >>>> >>> + { >>>> >>> + bb = VEC_pop (basic_block, bbs_in_cold_partition); >>>> >>> + /* If bb is not yet cold (because it was added below as >>>> >>> + a block dominated by a cold bb) then mark it cold here. */ >>>> >>> + if ((BB_PARTITION (bb) != BB_COLD_PARTITION)) >>>> >>> + { >>>> >>> + BB_SET_PARTITION (bb, BB_COLD_PARTITION); >>>> >>> + VEC_safe_push (basic_block, heap, bbs_to_fix, bb); >>>> >>> + } >>>> >>> + /* Any blocks dominated by a block in the cold section >>>> >>> + must also be cold. */ >>>> >>> + for (son = first_dom_son (CDI_DOMINATORS, bb); >>>> >>> + son; >>>> >>> + son = next_dom_son (CDI_DOMINATORS, son)) >>>> >>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son); >>>> >>> + } >>>> >>> + >>>> >>> + if (dom_calculated_here) >>>> >>> + free_dominance_info (CDI_DOMINATORS); >>>> >>> + } >>>> >>> + >>>> >>> + /* Do the partition fixup after all necessary blocks have been converted to >>>> >>> + cold, so that we only update the region crossings the minimum number of >>>> >>> + places, which can require forcing edges to be non fallthru. */ >>>> >>> + while (! VEC_empty (basic_block, bbs_to_fix)) >>>> >>> + { >>>> >>> + bb = VEC_pop (basic_block, bbs_to_fix); >>>> >>> + fixup_bb_partition (bb); >>>> >>> + } >>>> >>> +} >>>> >>> + >>>> >>> /* Verify the CFG and RTL consistency common for both underlying RTL and >>>> >>> cfglayout RTL. >>>> >>> >>>> >>> @@ -2052,6 +2208,7 @@ rtl_verify_flow_info_1 (void) >>>> >>> rtx x; >>>> >>> int err = 0; >>>> >>> basic_block bb; >>>> >>> + bool have_partitions = false; >>>> >>> >>>> >>> /* Check the general integrity of the basic blocks. 
*/ >>>> >>> FOR_EACH_BB_REVERSE (bb) >>>> >>> @@ -2169,6 +2326,8 @@ rtl_verify_flow_info_1 (void) >>>> >>> >>>> >>> if (e->flags & EDGE_ABNORMAL) >>>> >>> n_abnormal++; >>>> >>> + >>>> >>> + have_partitions |= is_crossing; >>>> >>> } >>>> >>> >>>> >>> if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, NULL_RTX)) >>>> >>> @@ -2293,6 +2452,40 @@ rtl_verify_flow_info_1 (void) >>>> >>> } >>>> >>> } >>>> >>> >>>> >>> + /* If there are partitions, do a sanity check on them: A basic block in >>>> >>> + a cold partition cannot dominate a basic block in a hot partition. */ >>>> >>> + VEC (basic_block, heap) *bbs_in_cold_partition = NULL; >>>> >>> + if (have_partitions && !err) >>>> >>> + FOR_EACH_BB (bb) >>>> >>> + if ((BB_PARTITION (bb) == BB_COLD_PARTITION)) >>>> >>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb); >>>> >>> + if (! VEC_empty (basic_block, bbs_in_cold_partition)) >>>> >>> + { >>>> >>> + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); >>>> >>> + basic_block son; >>>> >>> + >>>> >>> + if (dom_calculated_here) >>>> >>> + calculate_dominance_info (CDI_DOMINATORS); >>>> >>> + >>>> >>> + while (! VEC_empty (basic_block, bbs_in_cold_partition)) >>>> >>> + { >>>> >>> + bb = VEC_pop (basic_block, bbs_in_cold_partition); >>>> >>> + if ((BB_PARTITION (bb) != BB_COLD_PARTITION)) >>>> >>> + { >>>> >>> + error ("non-cold basic block %d dominated " >>>> >>> + "by a block in the cold partition", bb->index); >>>> >>> + err = 1; >>>> >>> + } >>>> >>> + for (son = first_dom_son (CDI_DOMINATORS, bb); >>>> >>> + son; >>>> >>> + son = next_dom_son (CDI_DOMINATORS, son)) >>>> >>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son); >>>> >>> + } >>>> >>> + >>>> >>> + if (dom_calculated_here) >>>> >>> + free_dominance_info (CDI_DOMINATORS); >>>> >>> + } >>>> >>> + >>>> >>> /* Clean up. 
*/ >>>> >>> return err; >>>> >>> } >>>> >>> @@ -2965,14 +3158,41 @@ record_effective_endpoints (void) >>>> >>> else >>>> >>> cfg_layout_function_header = NULL_RTX; >>>> >>> >>>> >>> + had_sec_boundary_notes = false; >>>> >>> + >>>> >>> next_insn = get_insns (); >>>> >>> FOR_EACH_BB (bb) >>>> >>> { >>>> >>> rtx end; >>>> >>> >>>> >>> if (PREV_INSN (BB_HEAD (bb)) && next_insn != BB_HEAD (bb)) >>>> >>> - BB_HEADER (bb) = unlink_insn_chain (next_insn, >>>> >>> - PREV_INSN (BB_HEAD (bb))); >>>> >>> + { >>>> >>> + /* Rather than try to keep section boundary notes incrementally >>>> >>> + up-to-date through cfg layout optimizations, simply remove them >>>> >>> + and flag that they should be re-inserted when exiting >>>> >>> + cfg layout mode. */ >>>> >>> + rtx check_insn = next_insn; >>>> >>> + while (check_insn) >>>> >>> + { >>>> >>> + if (NOTE_P (check_insn) >>>> >>> + && NOTE_KIND (check_insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS) >>>> >>> + { >>>> >>> + had_sec_boundary_notes |= true; >>>> >>> + /* Remove note from chain. Grab new next_insn first. */ >>>> >>> + if (next_insn == check_insn) >>>> >>> + next_insn = NEXT_INSN (check_insn); >>>> >>> + /* Delete note. */ >>>> >>> + delete_insn (check_insn); >>>> >>> + /* There will only be one. */ >>>> >>> + break; >>>> >>> + } >>>> >>> + check_insn = NEXT_INSN (check_insn); >>>> >>> + } >>>> >>> + /* If we still have header instructions left after above loop. 
*/ >>>> >>> + if (next_insn != BB_HEAD (bb)) >>>> >>> + BB_HEADER (bb) = unlink_insn_chain (next_insn, >>>> >>> + PREV_INSN (BB_HEAD (bb))); >>>> >>> + } >>>> >>> end = skip_insns_after_block (bb); >>>> >>> if (NEXT_INSN (BB_END (bb)) && BB_END (bb) != end) >>>> >>> BB_FOOTER (bb) = unlink_insn_chain (NEXT_INSN (BB_END (bb)), end); >>>> >>> @@ -3000,7 +3220,7 @@ outof_cfg_layout_mode (void) >>>> >>> if (bb->next_bb != EXIT_BLOCK_PTR) >>>> >>> bb->aux = bb->next_bb; >>>> >>> >>>> >>> - cfg_layout_finalize (); >>>> >>> + cfg_layout_finalize (false); >>>> >>> >>>> >>> return 0; >>>> >>> } >>>> >>> @@ -3120,10 +3340,13 @@ relink_block_chain (bool stay_in_cfglayout_mode) >>>> >>> } >>>> >>> >>>> >>> >>>> >>> -/* Given a reorder chain, rearrange the code to match. */ >>>> >>> +/* Given a reorder chain, rearrange the code to match. If >>>> >>> + this is called when we will FINALIZE_REORDER_BLOCKS, or when >>>> >>> + section boundary notes were removed on entry to cfg layout >>>> >>> + mode, insert section boundary notes here. */ >>>> >>> >>>> >>> static void >>>> >>> -fixup_reorder_chain (void) >>>> >>> +fixup_reorder_chain (bool finalize_reorder_blocks) >>>> >>> { >>>> >>> basic_block bb; >>>> >>> rtx insn = NULL; >>>> >>> @@ -3150,7 +3373,7 @@ static void >>>> >>> PREV_INSN (BB_HEADER (bb)) = insn; >>>> >>> insn = BB_HEADER (bb); >>>> >>> while (NEXT_INSN (insn)) >>>> >>> - insn = NEXT_INSN (insn); >>>> >>> + insn = NEXT_INSN (insn); >>>> >>> } >>>> >>> if (insn) >>>> >>> NEXT_INSN (insn) = BB_HEAD (bb); >>>> >>> @@ -3175,6 +3398,11 @@ static void >>>> >>> insn = NEXT_INSN (insn); >>>> >>> >>>> >>> set_last_insn (insn); >>>> >>> + >>>> >>> + /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes. 
*/ >>>> >>> + if (had_sec_boundary_notes || finalize_reorder_blocks) >>>> >>> + insert_section_boundary_note (); >>>> >>> + >>>> >>> #ifdef ENABLE_CHECKING >>>> >>> verify_insn_chain (); >>>> >>> #endif >>>> >>> @@ -3187,7 +3415,7 @@ static void >>>> >>> edge e_fall, e_taken, e; >>>> >>> rtx bb_end_insn; >>>> >>> rtx ret_label = NULL_RTX; >>>> >>> - basic_block nb, src_bb; >>>> >>> + basic_block nb; >>>> >>> edge_iterator ei; >>>> >>> >>>> >>> if (EDGE_COUNT (bb->succs) == 0) >>>> >>> @@ -3322,7 +3550,6 @@ static void >>>> >>> /* We got here if we need to add a new jump insn. >>>> >>> Note force_nonfallthru can delete E_FALL and thus we have to >>>> >>> save E_FALL->src prior to the call to force_nonfallthru. */ >>>> >>> - src_bb = e_fall->src; >>>> >>> nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, ret_label); >>>> >>> if (nb) >>>> >>> { >>>> >>> @@ -3330,17 +3557,6 @@ static void >>>> >>> bb->aux = nb; >>>> >>> /* Don't process this new block. */ >>>> >>> bb = nb; >>>> >>> - >>>> >>> - /* Make sure new bb is tagged for correct section (same as >>>> >>> - fall-thru source, since you cannot fall-thru across >>>> >>> - section boundaries). */ >>>> >>> - BB_COPY_PARTITION (src_bb, single_pred (bb)); >>>> >>> - if (flag_reorder_blocks_and_partition >>>> >>> - && targetm_common.have_named_sections >>>> >>> - && JUMP_P (BB_END (bb)) >>>> >>> - && !any_condjump_p (BB_END (bb)) >>>> >>> - && (EDGE_SUCC (bb, 0)->flags & EDGE_CROSSING)) >>>> >>> - add_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX); >>>> >>> } >>>> >>> } >>>> >>> >>>> >>> @@ -3644,10 +3860,11 @@ duplicate_insn_chain (rtx from, rtx to) >>>> >>> case NOTE_INSN_FUNCTION_BEG: >>>> >>> /* There is always just single entry to function. */ >>>> >>> case NOTE_INSN_BASIC_BLOCK: >>>> >>> + /* We should only switch text sections once. 
*/ >>>> >>> + case NOTE_INSN_SWITCH_TEXT_SECTIONS: >>>> >>> break; >>>> >>> >>>> >>> case NOTE_INSN_EPILOGUE_BEG: >>>> >>> - case NOTE_INSN_SWITCH_TEXT_SECTIONS: >>>> >>> emit_note_copy (insn); >>>> >>> break; >>>> >>> >>>> >>> @@ -3759,10 +3976,13 @@ break_superblocks (void) >>>> >>> } >>>> >>> >>>> >>> /* Finalize the changes: reorder insn list according to the sequence specified >>>> >>> - by aux pointers, enter compensation code, rebuild scope forest. */ >>>> >>> + by aux pointers, enter compensation code, rebuild scope forest. If >>>> >>> + this is called when we will FINALIZE_REORDER_BLOCKS, indicate that >>>> >>> + to fixup_reorder_chain so that it can insert the proper switch text >>>> >>> + section notes. */ >>>> >>> >>>> >>> void >>>> >>> -cfg_layout_finalize (void) >>>> >>> +cfg_layout_finalize (bool finalize_reorder_blocks) >>>> >>> { >>>> >>> #ifdef ENABLE_CHECKING >>>> >>> verify_flow_info (); >>>> >>> @@ -3775,7 +3995,7 @@ void >>>> >>> #endif >>>> >>> ) >>>> >>> fixup_fallthru_exit_predecessor (); >>>> >>> - fixup_reorder_chain (); >>>> >>> + fixup_reorder_chain (finalize_reorder_blocks); >>>> >>> >>>> >>> rebuild_jump_labels (get_insns ()); >>>> >>> delete_dead_jumptables (); >>>> >>> @@ -4454,8 +4674,7 @@ rtl_can_remove_branch_p (const_edge e) >>>> >>> if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH)) >>>> >>> return false; >>>> >>> >>>> >>> - if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX) >>>> >>> - || BB_PARTITION (src) != BB_PARTITION (target)) >>>> >>> + if (BB_PARTITION (src) != BB_PARTITION (target)) >>>> >>> return false; >>>> >>> >>>> >>> if (!onlyjump_p (insn) >>>> >>> >>>> >>> -- >>>> >>> This patch is available for review at http://codereview.appspot.com/6823047 >>>> >> >>>> >> >>>> >> >>>> >> -- >>>> >> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413 >>>> >>>> >>>> >>>> -- >>>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413 >> >> >> >> -- >> Teresa Johnson | Software Engineer | 
tejohnson@google.com | 408-460-2413
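For readers following the fixup_partitions hunk quoted above, the worklist idea (every block dominated by a cold block must itself become cold, and only the blocks whose partition actually changed are queued for a later fixup) can be sketched in isolation. This is a minimal illustration, not GCC code: struct toy_block, toy_propagate_cold and TOY_MAX_BLOCKS are hypothetical names, and the dominator tree is assumed to be given as a plain array of immediate-dominator indices rather than GCC's CDI_DOMINATORS data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical upper bound for this toy example only.  */
#define TOY_MAX_BLOCKS 32

/* Hypothetical stand-in for a basic block.  */
struct toy_block
{
  int idom;   /* Index of the immediate dominator, -1 for the entry block.  */
  bool cold;  /* True if the block is currently in the cold partition.  */
};

/* Mark every block that is dominated by a cold block as cold.
   The indices of blocks whose partition changed are stored in FIXED
   (which must have room for N entries); their count is returned.
   This mirrors the "collect first, fix up afterwards" structure of
   the quoted patch, under the assumptions stated above.  */
static size_t
toy_propagate_cold (struct toy_block *blocks, size_t n, int *fixed)
{
  int worklist[TOY_MAX_BLOCKS];
  bool queued[TOY_MAX_BLOCKS] = { false };
  size_t top = 0;
  size_t n_fixed = 0;

  assert (n <= TOY_MAX_BLOCKS);

  /* Seed the worklist with the blocks that are already cold.  */
  for (size_t i = 0; i < n; i++)
    if (blocks[i].cold)
      {
        worklist[top++] = (int) i;
        queued[i] = true;
      }

  while (top > 0)
    {
      int bb = worklist[--top];

      /* A block reached through a cold dominator becomes cold.  */
      if (!blocks[bb].cold)
        {
          blocks[bb].cold = true;
          fixed[n_fixed++] = bb;
        }

      /* Queue the dominator-tree children of BB, each at most once.  */
      for (size_t son = 0; son < n; son++)
        if (blocks[son].idom == bb && !queued[son])
          {
            worklist[top++] = (int) son;
            queued[son] = true;
          }
    }

  return n_fixed;
}
```

In the patch itself the second phase calls fixup_bb_partition on each collected block; in this sketch the caller would simply iterate over the returned indices. Deferring that step until all blocks are marked is what keeps the number of region-crossing updates small.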
Yes, I have configured GCC with: --target=arm-none-linux-gnueabi --with-cpu=cortex-a9 --with-fpu=neon --with-float=softfp Thanks, Christophe. On 28 November 2012 16:56, Teresa Johnson <tejohnson@google.com> wrote: > Is this with the same target compiler and options used in PR55121? I > will try to reproduce the compile-time failures with arm and those > options if so. I haven't seen those with spec2006 linux x86_64. I'm > not sure how to test the runtime behavior though. > > Thanks, > Teresa > > On Wed, Nov 28, 2012 at 7:48 AM, Christophe Lyon > <christophe.lyon@linaro.org> wrote: >> I have updated my trunk checkout, and I can confirm that eval.c now >> compiles with your patch (and the other 4 patches I added to PR55121). >> >> Now, when looking at the whole Spec2k results: >> - vpr passes now (used to fail) >> - gcc, parser, perlbmk, bzip2 and twolf no longer build: they all fail >> with the same error from gas: >> can't resolve `.text.unlikely' {.text.unlikely section} - `.LBB171' >> {.text section} >> - gap still does not build (same error as above) >> >> I haven't looked in detail, so I may be missing an obvious patch here. >> >> And I still observe runtime mis-behaviour on crafty, galgel, facerec and fma3d. >> >> Thanks >> Christophe. >> >> >> On 26 November 2012 21:52, Teresa Johnson <tejohnson@google.com> wrote: >>> Sorry, I don't know what happened there. Patch is attached. >>> Thanks, >>> Teresa >>> >>> On Mon, Nov 26, 2012 at 12:42 PM, Jack Howarth <howarth@bromo.med.uc.edu> wrote: >>>> On Mon, Nov 26, 2012 at 12:19:55PM -0800, Teresa Johnson wrote: >>>>> Are you sure you have all my changes applied? I applied the 4 patches >>>>> attached to PR55121 into my trunk checkout that has my fixes, and to a >>>>> pristine trunk checkout. I configured and built both for >>>>> --target=arm-none-linux-gnueabi, and built using your options, .i file >>>>> and gcda file.
I can reproduce the failure using the pristine trunk >>>>> with your patches but not with my fixed trunk + your patches. (I just >>>>> updated to head to pick up recent changes and get the same result. The >>>>> vec changes required some manual changes to the patch, which I will >>>>> resend shortly.) >>>> >>>> Teresa, >>>> Your mailer seems to have corrupted the posted patch with stray >>>> =3D characters and line breaks. Can you repost a copy as an attachment >>>> to the list? >>>> Jack >>>> >>>>> >>>>> Without my fixes: >>>>> >>>>> $ ~/extra/gcc_trunk_3_arm-eabi/gcc/cc1 -fpreprocessed >>>>> eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9 >>>>> -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp >>>>> -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use >>>>> -fno-common -o eval.s -freorder-blocks-and-partition >>>>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi) >>>>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version >>>>> 2.4.2-p1, MPC version 0.8.1 >>>>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 >>>>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi) >>>>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version >>>>> 2.4.2-p1, MPC version 0.8.1 >>>>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 >>>>> Compiler executable checksum: d19cc60a2f07de08237a8488bb35cd1a >>>>> eval.c: In function ‘Ge’: >>>>> eval.c:792:1: internal compiler error: in df_compact_blocks, at df-core.c:1560 >>>>> } >>>>> ^ >>>>> 0x622f71 df_compact_blocks() >>>>> ../../gcc_trunk_3/gcc/df-core.c:1560 >>>>> 0x5cfcb5 compact_blocks() >>>>> ../../gcc_trunk_3/gcc/cfg.c:162 >>>>> 0xc9dce0 reorder_basic_blocks >>>>> ../../gcc_trunk_3/gcc/bb-reorder.c:2154 >>>>> 0xc9dce0 rest_of_handle_reorder_blocks >>>>> ../../gcc_trunk_3/gcc/bb-reorder.c:2219 >>>>> Please submit a full bug report, >>>>> with preprocessed source if 
appropriate. >>>>> Please include the complete backtrace with any bug report. >>>>> See <http://gcc.gnu.org/bugs.html> for instructions. >>>>> >>>>> >>>>> With my fixes: >>>>> >>>>> $ ~/extra/gcc_trunk_4_arm-eabi/gcc/cc1 -fpreprocessed >>>>> eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9 >>>>> -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp >>>>> -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use >>>>> -fno-common -o eval.s -freorder-blocks-and-partition >>>>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi) >>>>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version >>>>> 2.4.2-p1, MPC version 0.8.1 >>>>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 >>>>> GNU C (GCC) version 4.8.0 20121126 (experimental) (arm-none-linux-gnueabi) >>>>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version >>>>> 2.4.2-p1, MPC version 0.8.1 >>>>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 >>>>> Compiler executable checksum: 45b468efa7c981f9afb44c4dac2424f3 >>>>> >>>>> >>>>> Thanks, >>>>> Teresa >>>>> >>>>> On Mon, Nov 26, 2012 at 8:25 AM, Christophe Lyon >>>>> <christophe.lyon@linaro.org> wrote: >>>>> > Hi, >>>>> > >>>>> > I have tested your patch on Spec2000 on ARM, and I can still see >>>>> > several failures caused by: >>>>> > "error: fallthru edge crosses section boundary", including the case >>>>> > described in PR55121. >>>>> > >>>>> > On 26 November 2012 16:55, Teresa Johnson <tejohnson@google.com> wrote: >>>>> >> Ping. >>>>> >> Teresa >>>>> >> >>>>> >> On Thu, Nov 15, 2012 at 12:10 PM, Teresa Johnson <tejohnson@google.com> wrote: >>>>> >>> Revised patch that fixes failures encountered when enabling >>>>> >>> -freorder-blocks-and-partition, including the failure reported in PR 53743. 
>>>>> >>> >>>>> >>> This includes new verification code to ensure no cold blocks dominate hot >>>>> >>> blocks contributed by Steven Bosscher. >>>>> >>> >>>>> >>> I attempted to make the handling of partition updates through the optimization >>>>> >>> passes much more consistent, removing a number of partial fixes in the code >>>>> >>> stream in the process. The code to fixup partitions (including the BB_PARTITION >>>>> >>> assignment, region crossing jump notes, and switch text section notes) is >>>>> >>> now handled in a few centralized locations. For example, inside >>>>> >>> rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that callers >>>>> >>> don't need to attempt the fixup themselves. >>>>> >>> >>>>> >>> For optimization passes that make adjustments to the cfg while in cfg layout >>>>> >>> mode that are not easy to fix up incrementally, the new routine >>>>> >>> fixup_partitions handles the cleanup globally. This does require calculation >>>>> >>> of the dominance relation, however, as far as I can tell the routines which >>>>> >>> now invoke this global fixup (try_optimize_cfg and commit_edge_insertions) >>>>> >>> are invoked typically once (or a small number of times in the case of >>>>> >>> try_optimize_cfg) per optimization pass. Additionally, I compared the >>>>> >>> -ftime-report output for some large fdo compilations and saw only minimal >>>>> >>> increases in the dominance computation times, which were only a tiny percent >>>>> >>> of the overall compile time. >>>>> >>> >>>>> >>> Additionally, I added a flag to the rtl_data structure to indicate whether >>>>> >>> any partitioning was actually performed, so that optimizations which were >>>>> >>> conservatively disabled whenever the flag_reorder_blocks_and_partition >>>>> >>> is enabled (e.g. try_crossjump_to_edge, part of connect_traces) can be less >>>>> >>> conservative for functions where no partitions were formed (e.g. they are >>>>> >>> completely hot). 
>>>>> >>> >>>>> >>> Bootstrapped and tested on x86_64-unknown-linux-gnu. Also tested with SPEC2006 int >>>>> >>> benchmarks and internal google benchmarks using profile feedback and >>>>> >>> -freorder-blocks-and-partition to get more coverage. Ok for trunk? >>>>> >>> >>>>> >>> Thanks, >>>>> >>> Teresa >>>>> >>> >>>>> >>> 2012-11-14 Teresa Johnson <tejohnson@google.com> >>>>> >>> Steven Bosscher <steven@gcc.gnu.org> >>>>> >>> >>>>> >>> * cfghooks.h (cfg_layout_finalize): New parameter. >>>>> >>> * modulo-sched.c (rest_of_handle_sms): New cfg_layout_finalize >>>>> >>> parameter. >>>>> >>> * ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert >>>>> >>> as this is now done by redirect_edge_and_branch_force. >>>>> >>> * function.c (thread_prologue_and_epilogue_insns): Insert new bb after >>>>> >>> barriers, new cfg_layout_finalize parameter, and don't store exit >>>>> >>> predecessor BB until after it is potentially split. >>>>> >>> * function.h (struct rtl_data): New flag has_bb_partition. >>>>> >>> * hw-doloop.c (reorder_loops): New cfg_layout_finalize parameter. >>>>> >>> * cfgcleanup.c (try_crossjump_to_edge): Only skip optimization if >>>>> >>> any blocks in function actually partitioned. >>>>> >>> (try_optimize_cfg): If cfg changed, invoke fixup_partitions to clean >>>>> >>> up partitioning. >>>>> >>> * bb-reorder.c (connect_traces): Only look for partitions and skip >>>>> >>> block copying if any blocks in function actually partitioned. >>>>> >>> (emit_barrier_after_bb): Handle insertion in non-cfglayout mode. >>>>> >>> (find_rarely_executed_basic_blocks_and_crossing_edges): Ensure >>>>> >>> that no cold blocks dominate a hot block. >>>>> >>> (fix_up_fall_thru_edges): Replace BB_COPY_PARTITION with assert >>>>> >>> as this is now done by force_nonfallthru_and_redirect. >>>>> >>> (add_reg_crossing_jump_notes): Handle the fact that some jumps may >>>>> >>> already be marked with region crossing note. 
>>>>> >>> (reorder_basic_blocks): Only need to verify partitions if any >>>>> >>> blocks in function actually partitioned. >>>>> >>> (insert_section_boundary_note): Only need to insert note if any >>>>> >>> blocks in function actually partitioned. >>>>> >>> (rest_of_handle_reorder_blocks): New cfg_layout_finalize >>>>> >>> parameter, and remove call to insert_section_boundary_note as this >>>>> >>> is now called via cfg_layout_finalize/fixup_reorder_chain. >>>>> >>> (duplicate_computed_gotos): New cfg_layout_finalize >>>>> >>> parameter. >>>>> >>> (partition_hot_cold_basic_blocks): Set flag indicating function >>>>> >>> has bb partitions. >>>>> >>> * bb-reorder.h: Declare insert_section_boundary_note and >>>>> >>> emit_barrier_after_bb, which are no longer static. >>>>> >>> * basic-block.h: Declare new function fixup_partitions. >>>>> >>> * cfgrtl.c (try_redirect_by_replacing_jump): Remove unnecessary >>>>> >>> check for region crossing note. >>>>> >>> (fixup_partition_crossing): New function. >>>>> >>> (fixup_bb_partition): Ditto. >>>>> >>> (rtl_redirect_edge_and_branch): Fixup partition boundaries. >>>>> >>> (force_nonfallthru_and_redirect): Fixup partition boundaries, >>>>> >>> remove old code that tried to do this. Emit barrier correctly >>>>> >>> when we are in cfglayout mode. >>>>> >>> (rtl_split_edge): Correctly fixup partition boundaries. >>>>> >>> (commit_one_edge_insertion): Remove old code that tried to >>>>> >>> fixup region crossing edge since this is now handled in >>>>> >>> split_block, and set up insertion point correctly since >>>>> >>> block may now end in a jump. >>>>> >>> (commit_edge_insertions): Invoke fixup_partitions to sanitize partition >>>>> >>> boundaries after optimizations that modify cfg and before trying to >>>>> >>> verify the flow info. >>>>> >>> (fixup_partitions): New function. >>>>> >>> (rtl_verify_flow_info_1): Add verification that no cold bbs dominate >>>>> >>> hot bbs. 
>>>>> >>> (record_effective_endpoints): Remove region-crossing notes and set flag >>>>> >>> indicating that they need to be reinserted on exit from cfglayout mode. >>>>> >>> (outof_cfg_layout_mode): New cfg_layout_finalize parameter. >>>>> >>> (fixup_reorder_chain): Call insert_section_boundary_note if necessary. >>>>> >>> Remove old code that attempted to fixup region crossing note as >>>>> >>> this is now handled in force_nonfallthru_and_redirect. >>>>> >>> (duplicate_insn_chain): Don't duplicate switch section notes. >>>>> >>> (cfg_layout_finalize): Pass new parameter to fixup_reorder_chain. >>>>> >>> (rtl_can_remove_branch_p): Remove unnecessary check for region crossing >>>>> >>> note. >>>>> >>> >>>>> >>> Index: cfghooks.h >>>>> >>> =================================================================== >>>>> >>> --- cfghooks.h (revision 193376) >>>>> >>> +++ cfghooks.h (working copy) >>>>> >>> @@ -204,7 +204,7 @@ extern void copy_bbs (basic_block *, unsigned, bas >>>>> >>> void account_profile_record (struct profile_record *, int); >>>>> >>> >>>>> >>> extern void cfg_layout_initialize (unsigned int); >>>>> >>> -extern void cfg_layout_finalize (void); >>>>> >>> +extern void cfg_layout_finalize (bool); >>>>> >>> >>>>> >>> /* Hooks containers. 
*/ >>>>> >>> extern struct cfg_hooks gimple_cfg_hooks; >>>>> >>> @@ -218,4 +218,3 @@ extern void cfg_layout_rtl_register_cfg_hooks (voi >>>>> >>> extern void gimple_register_cfg_hooks (void); >>>>> >>> extern struct cfg_hooks get_cfg_hooks (void); >>>>> >>> extern void set_cfg_hooks (struct cfg_hooks); >>>>> >>> - >>>>> >>> Index: modulo-sched.c >>>>> >>> =================================================================== >>>>> >>> --- modulo-sched.c (revision 193376) >>>>> >>> +++ modulo-sched.c (working copy) >>>>> >>> @@ -3354,7 +3354,7 @@ rest_of_handle_sms (void) >>>>> >>> if (bb->next_bb != EXIT_BLOCK_PTR) >>>>> >>> bb->aux = bb->next_bb; >>>>> >>> free_dominance_info (CDI_DOMINATORS); >>>>> >>> - cfg_layout_finalize (); >>>>> >>> + cfg_layout_finalize (false); >>>>> >>> #endif /* INSN_SCHEDULING */ >>>>> >>> return 0; >>>>> >>> } >>>>> >>> Index: ifcvt.c >>>>> >>> =================================================================== >>>>> >>> --- ifcvt.c (revision 193376) >>>>> >>> +++ ifcvt.c (working copy) >>>>> >>> @@ -3900,10 +3900,9 @@ find_if_case_1 (basic_block test_bb, edge then_edg >>>>> >>> if (new_bb) >>>>> >>> { >>>>> >>> df_bb_replace (then_bb_index, new_bb); >>>>> >>> - /* Since the fallthru edge was redirected from test_bb to new_bb, >>>>> >>> - we need to ensure that new_bb is in the same partition as >>>>> >>> - test bb (you can not fall through across section boundaries). */ >>>>> >>> - BB_COPY_PARTITION (new_bb, test_bb); >>>>> >>> + /* This should have been done above via force_nonfallthru_and_redirect >>>>> >>> + (possibly called from redirect_edge_and_branch_force). 
*/ >>>>> >>> + gcc_assert (BB_PARTITION (new_bb) == BB_PARTITION (test_bb)); >>>>> >>> } >>>>> >>> >>>>> >>> num_true_changes++; >>>>> >>> Index: function.c >>>>> >>> =================================================================== >>>>> >>> --- function.c (revision 193376) >>>>> >>> +++ function.c (working copy) >>>>> >>> @@ -6249,8 +6249,10 @@ thread_prologue_and_epilogue_insns (void) >>>>> >>> break; >>>>> >>> if (e) >>>>> >>> { >>>>> >>> - copy_bb = create_basic_block (NEXT_INSN (BB_END (e->src)), >>>>> >>> - NULL_RTX, e->src); >>>>> >>> + /* Make sure we insert after any barriers. */ >>>>> >>> + rtx end = get_last_bb_insn (e->src); >>>>> >>> + copy_bb = create_basic_block (NEXT_INSN (end), >>>>> >>> + NULL_RTX, e->src); >>>>> >>> BB_COPY_PARTITION (copy_bb, e->src); >>>>> >>> } >>>>> >>> else >>>>> >>> @@ -6475,7 +6477,7 @@ thread_prologue_and_epilogue_insns (void) >>>>> >>> if (cur_bb->index >= NUM_FIXED_BLOCKS >>>>> >>> && cur_bb->next_bb->index >= NUM_FIXED_BLOCKS) >>>>> >>> cur_bb->aux = cur_bb->next_bb; >>>>> >>> - cfg_layout_finalize (); >>>>> >>> + cfg_layout_finalize (false); >>>>> >>> } >>>>> >>> >>>>> >>> epilogue_done: >>>>> >>> @@ -6517,7 +6519,7 @@ epilogue_done: >>>>> >>> basic_block simple_return_block_cold = NULL; >>>>> >>> edge pending_edge_hot = NULL; >>>>> >>> edge pending_edge_cold = NULL; >>>>> >>> - basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb; >>>>> >>> + basic_block exit_pred; >>>>> >>> int i; >>>>> >>> >>>>> >>> gcc_assert (entry_edge != orig_entry_edge); >>>>> >>> @@ -6545,6 +6547,12 @@ epilogue_done: >>>>> >>> else >>>>> >>> pending_edge_cold = e; >>>>> >>> } >>>>> >>> + >>>>> >>> + /* Save a pointer to the exit's predecessor BB for use in >>>>> >>> + inserting new BBs at the end of the function. Do this >>>>> >>> + after the call to split_block above which may split >>>>> >>> + the original exit pred. 
*/ >>>>> >>> + exit_pred = EXIT_BLOCK_PTR->prev_bb; >>>>> >>> >>>>> >>> FOR_EACH_VEC_ELT (edge, unconverted_simple_returns, i, e) >>>>> >>> { >>>>> >>> Index: function.h >>>>> >>> =================================================================== >>>>> >>> --- function.h (revision 193376) >>>>> >>> +++ function.h (working copy) >>>>> >>> @@ -459,6 +459,11 @@ struct GTY(()) rtl_data { >>>>> >>> sched2) and is useful only if the port defines LEAF_REGISTERS. */ >>>>> >>> bool uses_only_leaf_regs; >>>>> >>> >>>>> >>> + /* Nonzero if the function being compiled has undergone hot/cold partitioning >>>>> >>> + (under flag_reorder_blocks_and_partition) and has at least one cold >>>>> >>> + block. */ >>>>> >>> + bool has_bb_partition; >>>>> >>> + >>>>> >>> /* Like regs_ever_live, but 1 if a reg is set or clobbered from an >>>>> >>> asm. Unlike regs_ever_live, elements of this array corresponding >>>>> >>> to eliminable regs (like the frame pointer) are set if an asm >>>>> >>> Index: hw-doloop.c >>>>> >>> =================================================================== >>>>> >>> --- hw-doloop.c (revision 193376) >>>>> >>> +++ hw-doloop.c (working copy) >>>>> >>> @@ -547,7 +547,7 @@ reorder_loops (hwloop_info loops) >>>>> >>> else >>>>> >>> bb->aux = NULL; >>>>> >>> } >>>>> >>> - cfg_layout_finalize (); >>>>> >>> + cfg_layout_finalize (false); >>>>> >>> clear_aux_for_blocks (); >>>>> >>> df_analyze (); >>>>> >>> } >>>>> >>> Index: cfgcleanup.c >>>>> >>> =================================================================== >>>>> >>> --- cfgcleanup.c (revision 193376) >>>>> >>> +++ cfgcleanup.c (working copy) >>>>> >>> @@ -1824,7 +1824,7 @@ try_crossjump_to_edge (int mode, edge e1, edge e2, >>>>> >>> partition boundaries). See the comments at the top of >>>>> >>> bb-reorder.c:partition_hot_cold_basic_blocks for complete details. 
*/ >>>>> >>> >>>>> >>> - if (flag_reorder_blocks_and_partition && reload_completed) >>>>> >>> + if (crtl->has_bb_partition && reload_completed) >>>>> >>> return false; >>>>> >>> >>>>> >>> /* Search backward through forwarder blocks. We don't need to worry >>>>> >>> @@ -2767,10 +2767,21 @@ try_optimize_cfg (int mode) >>>>> >>> df_analyze (); >>>>> >>> } >>>>> >>> >>>>> >>> + if (changed) >>>>> >>> + { >>>>> >>> + /* Edge forwarding in particular can cause hot blocks previously >>>>> >>> + reached by both hot and cold blocks to become dominated only >>>>> >>> + by cold blocks. This will cause the verification below to fail, >>>>> >>> + and lead to now cold code in the hot section. This is not easy >>>>> >>> + to detect and fix during edge forwarding, and in some cases >>>>> >>> + is only visible after newly unreachable blocks are deleted, >>>>> >>> + which will be done in fixup_partitions. */ >>>>> >>> + fixup_partitions (); >>>>> >>> + >>>>> >>> #ifdef ENABLE_CHECKING >>>>> >>> - if (changed) >>>>> >>> - verify_flow_info (); >>>>> >>> + verify_flow_info (); >>>>> >>> #endif >>>>> >>> + } >>>>> >>> >>>>> >>> changed_overall |= changed; >>>>> >>> first_pass = false; >>>>> >>> Index: bb-reorder.c >>>>> >>> =================================================================== >>>>> >>> --- bb-reorder.c (revision 193376) >>>>> >>> +++ bb-reorder.c (working copy) >>>>> >>> @@ -1054,7 +1054,7 @@ connect_traces (int n_traces, struct trace *traces >>>>> >>> current_partition = BB_PARTITION (traces[0].first); >>>>> >>> two_passes = false; >>>>> >>> >>>>> >>> - if (flag_reorder_blocks_and_partition) >>>>> >>> + if (crtl->has_bb_partition) >>>>> >>> for (i = 0; i < n_traces && !two_passes; i++) >>>>> >>> if (BB_PARTITION (traces[0].first) >>>>> >>> != BB_PARTITION (traces[i].first)) >>>>> >>> @@ -1263,7 +1263,7 @@ connect_traces (int n_traces, struct trace *traces >>>>> >>> } >>>>> >>> } >>>>> >>> >>>>> >>> - if (flag_reorder_blocks_and_partition) >>>>> >>> + if 
(crtl->has_bb_partition) >>>>> >>> try_copy = false; >>>>> >>> >>>>> >>> /* Copy tiny blocks always; copy larger blocks only when the >>>>> >>> @@ -1381,13 +1381,14 @@ get_uncond_jump_length (void) >>>>> >>> return length; >>>>> >>> } >>>>> >>> >>>>> >>> -/* Emit a barrier into the footer of BB. */ >>>>> >>> +/* Emit a barrier after BB, into the footer if we are in CFGLAYOUT mode. */ >>>>> >>> >>>>> >>> -static void >>>>> >>> +void >>>>> >>> emit_barrier_after_bb (basic_block bb) >>>>> >>> { >>>>> >>> rtx barrier = emit_barrier_after (BB_END (bb)); >>>>> >>> - BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier); >>>>> >>> + if (current_ir_type () == IR_RTL_CFGLAYOUT) >>>>> >>> + BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier); >>>>> >>> } >>>>> >>> >>>>> >>> /* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions. >>>>> >>> @@ -1463,18 +1464,109 @@ find_rarely_executed_basic_blocks_and_crossing_edg >>>>> >>> { >>>>> >>> VEC(edge, heap) *crossing_edges = NULL; >>>>> >>> basic_block bb; >>>>> >>> - edge e; >>>>> >>> - edge_iterator ei; >>>>> >>> + edge e, e2; >>>>> >>> + edge_iterator ei, ei2; >>>>> >>> + unsigned int cold_bb_count = 0; >>>>> >>> + VEC (basic_block, heap) *bbs_in_hot_partition = NULL; >>>>> >>> + VEC (basic_block, heap) *bbs_newly_hot = NULL; >>>>> >>> >>>>> >>> /* Mark which partition (hot/cold) each basic block belongs in. */ >>>>> >>> FOR_EACH_BB (bb) >>>>> >>> { >>>>> >>> if (probably_never_executed_bb_p (cfun, bb)) >>>>> >>> - BB_SET_PARTITION (bb, BB_COLD_PARTITION); >>>>> >>> + { >>>>> >>> + BB_SET_PARTITION (bb, BB_COLD_PARTITION); >>>>> >>> + cold_bb_count++; >>>>> >>> + } >>>>> >>> else >>>>> >>> - BB_SET_PARTITION (bb, BB_HOT_PARTITION); >>>>> >>> + { >>>>> >>> + BB_SET_PARTITION (bb, BB_HOT_PARTITION); >>>>> >>> + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, bb); >>>>> >>> + } >>>>> >>> } >>>>> >>> >>>>> >>> + /* Ensure that no cold bbs dominate hot bbs. 
This could happen as a result of >>>>> >>> + several different possibilities. One is that there are edge weight insanities >>>>> >>> + due to optimization phases that do not properly update basic block profile >>>>> >>> + counts. The second is that the entry of the function may not be hot, because >>>>> >>> + it is entered fewer times than the number of profile training runs, but there >>>>> >>> + is a loop inside the function that causes blocks within the function to be >>>>> >>> + above the threshold for hotness. */ >>>>> >>> + if (cold_bb_count) >>>>> >>> + { >>>>> >>> + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); >>>>> >>> + >>>>> >>> + if (dom_calculated_here) >>>>> >>> + calculate_dominance_info (CDI_DOMINATORS); >>>>> >>> + >>>>> >>> + /* Keep examining hot bbs until we have either checked them all, or >>>>> >>> + re-marked all cold bbs hot. */ >>>>> >>> + while (! VEC_empty (basic_block, bbs_in_hot_partition) >>>>> >>> + && cold_bb_count) >>>>> >>> + { >>>>> >>> + basic_block dom_bb; >>>>> >>> + >>>>> >>> + bb = VEC_pop (basic_block, bbs_in_hot_partition); >>>>> >>> + dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb); >>>>> >>> + >>>>> >>> + /* If bb's immediate dominator is also hot then it is ok. */ >>>>> >>> + if (BB_PARTITION (dom_bb) != BB_COLD_PARTITION) >>>>> >>> + continue; >>>>> >>> + >>>>> >>> + /* We have a hot bb with an immediate dominator that is cold. >>>>> >>> + The dominator needs to be re-marked to hot. */ >>>>> >>> + BB_SET_PARTITION (dom_bb, BB_HOT_PARTITION); >>>>> >>> + cold_bb_count--; >>>>> >>> + >>>>> >>> + /* Now we need to examine newly-hot dom_bb to see if it is also >>>>> >>> + dominated by a cold bb. */ >>>>> >>> + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, dom_bb); >>>>> >>> + >>>>> >>> + /* We should also adjust any cold blocks that the newly-hot bb >>>>> >>> + feeds and see if it makes sense to re-mark those as hot as >>>>> >>> + well. 
*/ >>>>> >>> + VEC_safe_push (basic_block, heap, bbs_newly_hot, dom_bb); >>>>> >>> + while (! VEC_empty (basic_block, bbs_newly_hot)) >>>>> >>> + { >>>>> >>> + basic_block new_hot_bb = VEC_pop (basic_block, bbs_newly_hot); >>>>> >>> + /* Examine all successors of this newly-hot bb to see if they >>>>> >>> + are cold and should be re-marked as hot. */ >>>>> >>> + FOR_EACH_EDGE (e, ei, new_hot_bb->succs) >>>>> >>> + { >>>>> >>> + bool any_cold_preds = false; >>>>> >>> + basic_block succ = e->dest; >>>>> >>> + if (BB_PARTITION (succ) != BB_COLD_PARTITION) >>>>> >>> + continue; >>>>> >>> + /* Does this block have any cold predecessors now? */ >>>>> >>> + FOR_EACH_EDGE (e2, ei2, succ->preds) >>>>> >>> + { >>>>> >>> + if (BB_PARTITION (e2->src) == BB_COLD_PARTITION) >>>>> >>> + { >>>>> >>> + any_cold_preds = true; >>>>> >>> + break; >>>>> >>> + } >>>>> >>> + } >>>>> >>> + if (any_cold_preds) >>>>> >>> + continue; >>>>> >>> + >>>>> >>> + /* Here we have a successor of newly-hot bb that is cold >>>>> >>> + but no longer has any cold precessessors. Since the original >>>>> >>> + assignment of our newly-hot bb was incorrect, this successor's >>>>> >>> + assignment as cold is also suspect. Go ahead and re-mark it >>>>> >>> + as hot now too. Better heuristics may be in order here. */ >>>>> >>> + BB_SET_PARTITION (succ, BB_HOT_PARTITION); >>>>> >>> + cold_bb_count--; >>>>> >>> + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, succ); >>>>> >>> + /* Examine this successor as a newly-hot bb. */ >>>>> >>> + VEC_safe_push (basic_block, heap, bbs_newly_hot, succ); >>>>> >>> + } >>>>> >>> + } >>>>> >>> + } >>>>> >>> + >>>>> >>> + if (dom_calculated_here) >>>>> >>> + free_dominance_info (CDI_DOMINATORS); >>>>> >>> + } >>>>> >>> + >>>>> >>> /* The format of .gcc_except_table does not allow landing pads to >>>>> >>> be in a different partition as the throw. Fix this by either >>>>> >>> moving or duplicating the landing pads. 
*/ >>>>> >>> @@ -1766,10 +1858,10 @@ fix_up_fall_thru_edges (void) >>>>> >>> new_bb->aux = cur_bb->aux; >>>>> >>> cur_bb->aux = new_bb; >>>>> >>> >>>>> >>> - /* Make sure new fall-through bb is in same >>>>> >>> - partition as bb it's falling through from. */ >>>>> >>> + /* This is done by force_nonfallthru_and_redirect. */ >>>>> >>> + gcc_assert (BB_PARTITION (new_bb) >>>>> >>> + == BB_PARTITION (cur_bb)); >>>>> >>> >>>>> >>> - BB_COPY_PARTITION (new_bb, cur_bb); >>>>> >>> single_succ_edge (new_bb)->flags |= EDGE_CROSSING; >>>>> >>> } >>>>> >>> else >>>>> >>> @@ -2067,7 +2159,10 @@ add_reg_crossing_jump_notes (void) >>>>> >>> FOR_EACH_BB (bb) >>>>> >>> FOR_EACH_EDGE (e, ei, bb->succs) >>>>> >>> if ((e->flags & EDGE_CROSSING) >>>>> >>> - && JUMP_P (BB_END (e->src))) >>>>> >>> + && JUMP_P (BB_END (e->src)) >>>>> >>> + /* Some notes were added during fix_up_fall_thru_edges, via >>>>> >>> + force_nonfallthru_and_redirect. */ >>>>> >>> + && !find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX)) >>>>> >>> add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); >>>>> >>> } >>>>> >>> >>>>> >>> @@ -2160,7 +2255,7 @@ reorder_basic_blocks (void) >>>>> >>> dump_flow_info (dump_file, dump_flags); >>>>> >>> } >>>>> >>> >>>>> >>> - if (flag_reorder_blocks_and_partition) >>>>> >>> + if (crtl->has_bb_partition) >>>>> >>> verify_hot_cold_block_grouping (); >>>>> >>> } >>>>> >>> >>>>> >>> @@ -2172,14 +2267,14 @@ reorder_basic_blocks (void) >>>>> >>> encountering this note will make the compiler switch between the >>>>> >>> hot and cold text sections. 
*/ >>>>> >>> >>>>> >>> -static void >>>>> >>> +void >>>>> >>> insert_section_boundary_note (void) >>>>> >>> { >>>>> >>> basic_block bb; >>>>> >>> rtx new_note; >>>>> >>> int first_partition = 0; >>>>> >>> >>>>> >>> - if (!flag_reorder_blocks_and_partition) >>>>> >>> + if (!crtl->has_bb_partition) >>>>> >>> return; >>>>> >>> >>>>> >>> FOR_EACH_BB (bb) >>>>> >>> @@ -2222,10 +2317,8 @@ rest_of_handle_reorder_blocks (void) >>>>> >>> FOR_EACH_BB (bb) >>>>> >>> if (bb->next_bb != EXIT_BLOCK_PTR) >>>>> >>> bb->aux = bb->next_bb; >>>>> >>> - cfg_layout_finalize (); >>>>> >>> + cfg_layout_finalize (true); >>>>> >>> >>>>> >>> - /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes. */ >>>>> >>> - insert_section_boundary_note (); >>>>> >>> return 0; >>>>> >>> } >>>>> >>> >>>>> >>> @@ -2366,7 +2459,7 @@ duplicate_computed_gotos (void) >>>>> >>> } >>>>> >>> >>>>> >>> done: >>>>> >>> - cfg_layout_finalize (); >>>>> >>> + cfg_layout_finalize (false); >>>>> >>> >>>>> >>> BITMAP_FREE (candidates); >>>>> >>> return 0; >>>>> >>> @@ -2511,6 +2604,8 @@ partition_hot_cold_basic_blocks (void) >>>>> >>> if (crossing_edges == NULL) >>>>> >>> return 0; >>>>> >>> >>>>> >>> + crtl->has_bb_partition = true; >>>>> >>> + >>>>> >>> /* Make sure the source of any crossing edge ends in a jump and the >>>>> >>> destination of any crossing edge has a label. 
*/ >>>>> >>> add_labels_and_missing_jumps (crossing_edges); >>>>> >>> Index: bb-reorder.h >>>>> >>> =================================================================== >>>>> >>> --- bb-reorder.h (revision 193376) >>>>> >>> +++ bb-reorder.h (working copy) >>>>> >>> @@ -36,4 +36,8 @@ extern struct target_bb_reorder *this_target_bb_re >>>>> >>> >>>>> >>> extern int get_uncond_jump_length (void); >>>>> >>> >>>>> >>> +extern void insert_section_boundary_note (void); >>>>> >>> + >>>>> >>> +extern void emit_barrier_after_bb (basic_block bb); >>>>> >>> + >>>>> >>> #endif >>>>> >>> Index: basic-block.h >>>>> >>> =================================================================== >>>>> >>> --- basic-block.h (revision 193376) >>>>> >>> +++ basic-block.h (working copy) >>>>> >>> @@ -806,6 +806,7 @@ extern basic_block force_nonfallthru_and_redirect >>>>> >>> extern bool contains_no_active_insn_p (const_basic_block); >>>>> >>> extern bool forwarder_block_p (const_basic_block); >>>>> >>> extern bool can_fallthru (basic_block, basic_block); >>>>> >>> +extern void fixup_partitions (void); >>>>> >>> >>>>> >>> /* In cfgbuild.c. */ >>>>> >>> extern void find_many_sub_basic_blocks (sbitmap); >>>>> >>> Index: cfgrtl.c >>>>> >>> =================================================================== >>>>> >>> --- cfgrtl.c (revision 193376) >>>>> >>> +++ cfgrtl.c (working copy) >>>>> >>> @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3. If not see >>>>> >>> #include "tree.h" >>>>> >>> #include "hard-reg-set.h" >>>>> >>> #include "basic-block.h" >>>>> >>> +#include "bb-reorder.h" >>>>> >>> #include "regs.h" >>>>> >>> #include "flags.h" >>>>> >>> #include "function.h" >>>>> >>> @@ -67,11 +68,12 @@ along with GCC; see the file COPYING3. If not see >>>>> >>> Only applicable if the CFG is in cfglayout mode. 
*/ >>>>> >>> static GTY(()) rtx cfg_layout_function_footer; >>>>> >>> static GTY(()) rtx cfg_layout_function_header; >>>>> >>> +static bool had_sec_boundary_notes; >>>>> >>> >>>>> >>> static rtx skip_insns_after_block (basic_block); >>>>> >>> static void record_effective_endpoints (void); >>>>> >>> static rtx label_for_bb (basic_block); >>>>> >>> -static void fixup_reorder_chain (void); >>>>> >>> +static void fixup_reorder_chain (bool finalize_reorder_blocks); >>>>> >>> >>>>> >>> void verify_insn_chain (void); >>>>> >>> static void fixup_fallthru_exit_predecessor (void); >>>>> >>> @@ -976,8 +978,7 @@ try_redirect_by_replacing_jump (edge e, basic_bloc >>>>> >>> partition boundaries). See the comments at the top of >>>>> >>> bb-reorder.c:partition_hot_cold_basic_blocks for complete details. */ >>>>> >>> >>>>> >>> - if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX) >>>>> >>> - || BB_PARTITION (src) != BB_PARTITION (target)) >>>>> >>> + if (BB_PARTITION (src) != BB_PARTITION (target)) >>>>> >>> return NULL; >>>>> >>> >>>>> >>> /* We can replace or remove a complex jump only when we have exactly >>>>> >>> @@ -1286,6 +1287,71 @@ redirect_branch_edge (edge e, basic_block target) >>>>> >>> return e; >>>>> >>> } >>>>> >>> >>>>> >>> +/* Called when edge E has been redirected to a new destination, >>>>> >>> + in order to update the region crossing flag on the edge and >>>>> >>> + jump. */ >>>>> >>> + >>>>> >>> +static void >>>>> >>> +fixup_partition_crossing (edge e, basic_block target) >>>>> >>> +{ >>>>> >>> + rtx note; >>>>> >>> + >>>>> >>> + gcc_assert (e->dest == target); >>>>> >>> + >>>>> >>> + if (e->src == ENTRY_BLOCK_PTR || target == EXIT_BLOCK_PTR) >>>>> >>> + return; >>>>> >>> + /* If we redirected an existing edge, it may already be marked >>>>> >>> + crossing, even though the new src is missing a reg crossing note. >>>>> >>> + But make sure reg crossing note doesn't already exist before >>>>> >>> + inserting. 
*/ >>>>> >>> + if (BB_PARTITION (e->src) != BB_PARTITION (target)) >>>>> >>> + { >>>>> >>> + e->flags |= EDGE_CROSSING; >>>>> >>> + note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); >>>>> >>> + if (JUMP_P (BB_END (e->src)) >>>>> >>> + && !note) >>>>> >>> + add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); >>>>> >>> + } >>>>> >>> + else if (BB_PARTITION (e->src) == BB_PARTITION (target)) >>>>> >>> + { >>>>> >>> + e->flags &= ~EDGE_CROSSING; >>>>> >>> + /* Remove the region crossing note from jump at end of >>>>> >>> + e->src if it exists. */ >>>>> >>> + note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, NULL_RTX); >>>>> >>> + if (note) >>>>> >>> + remove_note (BB_END (e->src), note); >>>>> >>> + } >>>>> >>> +} >>>>> >>> + >>>>> >>> +/* Called when block BB has been reassigned to a different partition, >>>>> >>> + to ensure that the region crossing attributes are updated. */ >>>>> >>> + >>>>> >>> +static void >>>>> >>> +fixup_bb_partition (basic_block bb) >>>>> >>> +{ >>>>> >>> + edge e; >>>>> >>> + edge_iterator ei; >>>>> >>> + >>>>> >>> + /* Now need to make bb's pred edges non-region crossing. */ >>>>> >>> + FOR_EACH_EDGE (e, ei, bb->preds) >>>>> >>> + { >>>>> >>> + fixup_partition_crossing (e, e->dest); >>>>> >>> + } >>>>> >>> + >>>>> >>> + /* Possibly need to make bb's successor edges region crossing, >>>>> >>> + or remove stale region crossing. */ >>>>> >>> + FOR_EACH_EDGE (e, ei, bb->succs) >>>>> >>> + { >>>>> >>> + if ((e->flags & EDGE_FALLTHRU) >>>>> >>> + && BB_PARTITION (bb) != BB_PARTITION (e->dest) >>>>> >>> + && e->dest != EXIT_BLOCK_PTR) >>>>> >>> + /* force_nonfallthru_and_redirect calls fixup_partition_crossing. */ >>>>> >>> + force_nonfallthru (e); >>>>> >>> + else >>>>> >>> + fixup_partition_crossing (e, e->dest); >>>>> >>> + } >>>>> >>> +} >>>>> >>> + >>>>> >>> /* Attempt to change code to redirect edge E to TARGET. Don't do that on >>>>> >>> expense of adding new instructions or reordering basic blocks. 
>>>>> >>> >>>>> >>> @@ -1302,16 +1368,18 @@ rtl_redirect_edge_and_branch (edge e, basic_block >>>>> >>> { >>>>> >>> edge ret; >>>>> >>> basic_block src = e->src; >>>>> >>> + basic_block dest = e->dest; >>>>> >>> >>>>> >>> if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH)) >>>>> >>> return NULL; >>>>> >>> >>>>> >>> - if (e->dest == target) >>>>> >>> + if (dest == target) >>>>> >>> return e; >>>>> >>> >>>>> >>> if ((ret = try_redirect_by_replacing_jump (e, target, false)) != NULL) >>>>> >>> { >>>>> >>> df_set_bb_dirty (src); >>>>> >>> + fixup_partition_crossing (ret, target); >>>>> >>> return ret; >>>>> >>> } >>>>> >>> >>>>> >>> @@ -1320,6 +1388,7 @@ rtl_redirect_edge_and_branch (edge e, basic_block >>>>> >>> return NULL; >>>>> >>> >>>>> >>> df_set_bb_dirty (src); >>>>> >>> + fixup_partition_crossing (ret, target); >>>>> >>> return ret; >>>>> >>> } >>>>> >>> >>>>> >>> @@ -1454,18 +1523,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc >>>>> >>> /* Make sure new block ends up in correct hot/cold section. */ >>>>> >>> >>>>> >>> BB_COPY_PARTITION (jump_block, e->src); >>>>> >>> - if (flag_reorder_blocks_and_partition >>>>> >>> - && targetm_common.have_named_sections >>>>> >>> - && JUMP_P (BB_END (jump_block)) >>>>> >>> - && !any_condjump_p (BB_END (jump_block)) >>>>> >>> - && (EDGE_SUCC (jump_block, 0)->flags & EDGE_CROSSING)) >>>>> >>> - add_reg_note (BB_END (jump_block), REG_CROSSING_JUMP, NULL_RTX); >>>>> >>> >>>>> >>> /* Wire edge in. */ >>>>> >>> new_edge = make_edge (e->src, jump_block, EDGE_FALLTHRU); >>>>> >>> new_edge->probability = probability; >>>>> >>> new_edge->count = count; >>>>> >>> >>>>> >>> + /* If e->src was previously region crossing, it no longer is >>>>> >>> + and the reg crossing note should be removed. */ >>>>> >>> + fixup_partition_crossing (new_edge, jump_block); >>>>> >>> + >>>>> >>> /* Redirect old edge. 
*/ >>>>> >>> redirect_edge_pred (e, jump_block); >>>>> >>> e->probability = REG_BR_PROB_BASE; >>>>> >>> @@ -1521,13 +1588,16 @@ force_nonfallthru_and_redirect (edge e, basic_bloc >>>>> >>> LABEL_NUSES (label)++; >>>>> >>> } >>>>> >>> >>>>> >>> - emit_barrier_after (BB_END (jump_block)); >>>>> >>> + /* We might be in cfg layout mode, and if so, the following routine will >>>>> >>> + insert the barrier correctly. */ >>>>> >>> + emit_barrier_after_bb (jump_block); >>>>> >>> redirect_edge_succ_nodup (e, target); >>>>> >>> >>>>> >>> if (abnormal_edge_flags) >>>>> >>> make_edge (src, target, abnormal_edge_flags); >>>>> >>> >>>>> >>> df_mark_solutions_dirty (); >>>>> >>> + fixup_partition_crossing (e, target); >>>>> >>> return new_bb; >>>>> >>> } >>>>> >>> >>>>> >>> @@ -1626,7 +1696,7 @@ rtl_move_block_after (basic_block bb ATTRIBUTE_UNU >>>>> >>> static basic_block >>>>> >>> rtl_split_edge (edge edge_in) >>>>> >>> { >>>>> >>> - basic_block bb; >>>>> >>> + basic_block bb, new_bb; >>>>> >>> rtx before; >>>>> >>> >>>>> >>> /* Abnormal edges cannot be split. */ >>>>> >>> @@ -1659,12 +1729,26 @@ rtl_split_edge (edge edge_in) >>>>> >>> else >>>>> >>> { >>>>> >>> bb = create_basic_block (before, NULL, edge_in->dest->prev_bb); >>>>> >>> - /* ??? Why not edge_in->dest->prev_bb here? */ >>>>> >>> - BB_COPY_PARTITION (bb, edge_in->dest); >>>>> >>> + if (edge_in->src == ENTRY_BLOCK_PTR) >>>>> >>> + BB_COPY_PARTITION (bb, edge_in->dest); >>>>> >>> + else >>>>> >>> + /* Put the split bb into the src partition, to avoid creating >>>>> >>> + a situation where a cold bb dominates a hot bb, in the case >>>>> >>> + where src is cold and dest is hot. The src will dominate >>>>> >>> + the new bb (whereas it might not have dominated dest). */ >>>>> >>> + BB_COPY_PARTITION (bb, edge_in->src); >>>>> >>> } >>>>> >>> >>>>> >>> make_single_succ_edge (bb, edge_in->dest, EDGE_FALLTHRU); >>>>> >>> >>>>> >>> + /* Can't allow a region crossing edge to be fallthrough. 
*/ >>>>> >>> + if (BB_PARTITION (bb) != BB_PARTITION (edge_in->dest) >>>>> >>> + && edge_in->dest != EXIT_BLOCK_PTR) >>>>> >>> + { >>>>> >>> + new_bb = force_nonfallthru (single_succ_edge (bb)); >>>>> >>> + gcc_assert (!new_bb); >>>>> >>> + } >>>>> >>> + >>>>> >>> /* For non-fallthru edges, we must adjust the predecessor's >>>>> >>> jump instruction to target our new block. */ >>>>> >>> if ((edge_in->flags & EDGE_FALLTHRU) == 0) >>>>> >>> @@ -1777,17 +1861,13 @@ commit_one_edge_insertion (edge e) >>>>> >>> else >>>>> >>> { >>>>> >>> bb = split_edge (e); >>>>> >>> - after = BB_END (bb); >>>>> >>> >>>>> >>> - if (flag_reorder_blocks_and_partition >>>>> >>> - && targetm_common.have_named_sections >>>>> >>> - && e->src != ENTRY_BLOCK_PTR >>>>> >>> - && BB_PARTITION (e->src) == BB_COLD_PARTITION >>>>> >>> - && !(e->flags & EDGE_CROSSING) >>>>> >>> - && JUMP_P (after) >>>>> >>> - && !any_condjump_p (after) >>>>> >>> - && (single_succ_edge (bb)->flags & EDGE_CROSSING)) >>>>> >>> - add_reg_note (after, REG_CROSSING_JUMP, NULL_RTX); >>>>> >>> + /* If e crossed a partition boundary, we needed to make bb end in >>>>> >>> + a region-crossing jump, even though it was originally fallthru. */ >>>>> >>> + if (JUMP_P (BB_END (bb))) >>>>> >>> + before = BB_END (bb); >>>>> >>> + else >>>>> >>> + after = BB_END (bb); >>>>> >>> } >>>>> >>> >>>>> >>> /* Now that we've found the spot, do the insertion. */ >>>>> >>> @@ -1827,6 +1907,14 @@ commit_edge_insertions (void) >>>>> >>> { >>>>> >>> basic_block bb; >>>>> >>> >>>>> >>> + /* Optimization passes that invoke this routine can cause hot blocks >>>>> >>> + previously reached by both hot and cold blocks to become dominated only >>>>> >>> + by cold blocks. This will cause the verification below to fail, >>>>> >>> + and lead to now cold code in the hot section. In some cases this >>>>> >>> + may only be visible after newly unreachable blocks are deleted, >>>>> >>> + which will be done by fixup_partitions. 
*/ >>>>> >>> + fixup_partitions (); >>>>> >>> + >>>>> >>> #ifdef ENABLE_CHECKING >>>>> >>> verify_flow_info (); >>>>> >>> #endif >>>>> >>> @@ -2028,7 +2116,75 @@ get_last_bb_insn (basic_block bb) >>>>> >>> >>>>> >>> return end; >>>>> >>> } >>>>> >>> - >>>>> >>> + >>>>> >>> +/* Perform cleanup on the hot/cold bb partitioning after optimization >>>>> >>> + passes that modify the cfg. */ >>>>> >>> + >>>>> >>> +void >>>>> >>> +fixup_partitions (void) >>>>> >>> +{ >>>>> >>> + basic_block bb; >>>>> >>> + >>>>> >>> + if (!crtl->has_bb_partition) >>>>> >>> + return; >>>>> >>> + >>>>> >>> + /* Delete any blocks that became unreachable and weren't >>>>> >>> + already cleaned up, for example during edge forwarding >>>>> >>> + and convert_jumps_to_returns. This will expose more >>>>> >>> + opportunities for fixing the partition boundaries here. >>>>> >>> + Also, the calculation of the dominance graph during verification >>>>> >>> + will assert if there are unreachable nodes. */ >>>>> >>> + delete_unreachable_blocks (); >>>>> >>> + >>>>> >>> + /* If there are partitions, do a sanity check on them: A basic block in >>>>> >>> + a cold partition cannot dominate a basic block in a hot partition. >>>>> >>> + Fixup any that now violate this requirement, as a result of edge >>>>> >>> + forwarding and unreachable block deletion. */ >>>>> >>> + VEC (basic_block, heap) *bbs_in_cold_partition = NULL; >>>>> >>> + VEC (basic_block, heap) *bbs_to_fix = NULL; >>>>> >>> + FOR_EACH_BB (bb) >>>>> >>> + if ((BB_PARTITION (bb) == BB_COLD_PARTITION)) >>>>> >>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb); >>>>> >>> + if (! VEC_empty (basic_block, bbs_in_cold_partition)) >>>>> >>> + { >>>>> >>> + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); >>>>> >>> + basic_block son; >>>>> >>> + >>>>> >>> + if (dom_calculated_here) >>>>> >>> + calculate_dominance_info (CDI_DOMINATORS); >>>>> >>> + >>>>> >>> + while (! 
VEC_empty (basic_block, bbs_in_cold_partition)) >>>>> >>> + { >>>>> >>> + bb = VEC_pop (basic_block, bbs_in_cold_partition); >>>>> >>> + /* If bb is not yet cold (because it was added below as >>>>> >>> + a block dominated by a cold bb) then mark it cold here. */ >>>>> >>> + if ((BB_PARTITION (bb) != BB_COLD_PARTITION)) >>>>> >>> + { >>>>> >>> + BB_SET_PARTITION (bb, BB_COLD_PARTITION); >>>>> >>> + VEC_safe_push (basic_block, heap, bbs_to_fix, bb); >>>>> >>> + } >>>>> >>> + /* Any blocks dominated by a block in the cold section >>>>> >>> + must also be cold. */ >>>>> >>> + for (son = first_dom_son (CDI_DOMINATORS, bb); >>>>> >>> + son; >>>>> >>> + son = next_dom_son (CDI_DOMINATORS, son)) >>>>> >>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son); >>>>> >>> + } >>>>> >>> + >>>>> >>> + if (dom_calculated_here) >>>>> >>> + free_dominance_info (CDI_DOMINATORS); >>>>> >>> + } >>>>> >>> + >>>>> >>> + /* Do the partition fixup after all necessary blocks have been converted to >>>>> >>> + cold, so that we only update the region crossings the minimum number of >>>>> >>> + places, which can require forcing edges to be non fallthru. */ >>>>> >>> + while (! VEC_empty (basic_block, bbs_to_fix)) >>>>> >>> + { >>>>> >>> + bb = VEC_pop (basic_block, bbs_to_fix); >>>>> >>> + fixup_bb_partition (bb); >>>>> >>> + } >>>>> >>> +} >>>>> >>> + >>>>> >>> /* Verify the CFG and RTL consistency common for both underlying RTL and >>>>> >>> cfglayout RTL. >>>>> >>> >>>>> >>> @@ -2052,6 +2208,7 @@ rtl_verify_flow_info_1 (void) >>>>> >>> rtx x; >>>>> >>> int err = 0; >>>>> >>> basic_block bb; >>>>> >>> + bool have_partitions = false; >>>>> >>> >>>>> >>> /* Check the general integrity of the basic blocks. 
*/ >>>>> >>> FOR_EACH_BB_REVERSE (bb) >>>>> >>> @@ -2169,6 +2326,8 @@ rtl_verify_flow_info_1 (void) >>>>> >>> >>>>> >>> if (e->flags & EDGE_ABNORMAL) >>>>> >>> n_abnormal++; >>>>> >>> + >>>>> >>> + have_partitions |= is_crossing; >>>>> >>> } >>>>> >>> >>>>> >>> if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, NULL_RTX)) >>>>> >>> @@ -2293,6 +2452,40 @@ rtl_verify_flow_info_1 (void) >>>>> >>> } >>>>> >>> } >>>>> >>> >>>>> >>> + /* If there are partitions, do a sanity check on them: A basic block in >>>>> >>> + a cold partition cannot dominate a basic block in a hot partition. */ >>>>> >>> + VEC (basic_block, heap) *bbs_in_cold_partition = NULL; >>>>> >>> + if (have_partitions && !err) >>>>> >>> + FOR_EACH_BB (bb) >>>>> >>> + if ((BB_PARTITION (bb) == BB_COLD_PARTITION)) >>>>> >>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, bb); >>>>> >>> + if (! VEC_empty (basic_block, bbs_in_cold_partition)) >>>>> >>> + { >>>>> >>> + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); >>>>> >>> + basic_block son; >>>>> >>> + >>>>> >>> + if (dom_calculated_here) >>>>> >>> + calculate_dominance_info (CDI_DOMINATORS); >>>>> >>> + >>>>> >>> + while (! VEC_empty (basic_block, bbs_in_cold_partition)) >>>>> >>> + { >>>>> >>> + bb = VEC_pop (basic_block, bbs_in_cold_partition); >>>>> >>> + if ((BB_PARTITION (bb) != BB_COLD_PARTITION)) >>>>> >>> + { >>>>> >>> + error ("non-cold basic block %d dominated " >>>>> >>> + "by a block in the cold partition", bb->index); >>>>> >>> + err = 1; >>>>> >>> + } >>>>> >>> + for (son = first_dom_son (CDI_DOMINATORS, bb); >>>>> >>> + son; >>>>> >>> + son = next_dom_son (CDI_DOMINATORS, son)) >>>>> >>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, son); >>>>> >>> + } >>>>> >>> + >>>>> >>> + if (dom_calculated_here) >>>>> >>> + free_dominance_info (CDI_DOMINATORS); >>>>> >>> + } >>>>> >>> + >>>>> >>> /* Clean up. 
*/ >>>>> >>> return err; >>>>> >>> } >>>>> >>> @@ -2965,14 +3158,41 @@ record_effective_endpoints (void) >>>>> >>> else >>>>> >>> cfg_layout_function_header = NULL_RTX; >>>>> >>> >>>>> >>> + had_sec_boundary_notes = false; >>>>> >>> + >>>>> >>> next_insn = get_insns (); >>>>> >>> FOR_EACH_BB (bb) >>>>> >>> { >>>>> >>> rtx end; >>>>> >>> >>>>> >>> if (PREV_INSN (BB_HEAD (bb)) && next_insn != BB_HEAD (bb)) >>>>> >>> - BB_HEADER (bb) = unlink_insn_chain (next_insn, >>>>> >>> - PREV_INSN (BB_HEAD (bb))); >>>>> >>> + { >>>>> >>> + /* Rather than try to keep section boundary notes incrementally >>>>> >>> + up-to-date through cfg layout optimizations, simply remove them >>>>> >>> + and flag that they should be re-inserted when exiting >>>>> >>> + cfg layout mode. */ >>>>> >>> + rtx check_insn = next_insn; >>>>> >>> + while (check_insn) >>>>> >>> + { >>>>> >>> + if (NOTE_P (check_insn) >>>>> >>> + && NOTE_KIND (check_insn) == NOTE_INSN_SWITCH_TEXT_SECTIONS) >>>>> >>> + { >>>>> >>> + had_sec_boundary_notes |= true; >>>>> >>> + /* Remove note from chain. Grab new next_insn first. */ >>>>> >>> + if (next_insn == check_insn) >>>>> >>> + next_insn = NEXT_INSN (check_insn); >>>>> >>> + /* Delete note. */ >>>>> >>> + delete_insn (check_insn); >>>>> >>> + /* There will only be one. */ >>>>> >>> + break; >>>>> >>> + } >>>>> >>> + check_insn = NEXT_INSN (check_insn); >>>>> >>> + } >>>>> >>> + /* If we still have header instructions left after above loop. 
*/ >>>>> >>> + if (next_insn != BB_HEAD (bb)) >>>>> >>> + BB_HEADER (bb) = unlink_insn_chain (next_insn, >>>>> >>> + PREV_INSN (BB_HEAD (bb))); >>>>> >>> + } >>>>> >>> end = skip_insns_after_block (bb); >>>>> >>> if (NEXT_INSN (BB_END (bb)) && BB_END (bb) != end) >>>>> >>> BB_FOOTER (bb) = unlink_insn_chain (NEXT_INSN (BB_END (bb)), end); >>>>> >>> @@ -3000,7 +3220,7 @@ outof_cfg_layout_mode (void) >>>>> >>> if (bb->next_bb != EXIT_BLOCK_PTR) >>>>> >>> bb->aux = bb->next_bb; >>>>> >>> >>>>> >>> - cfg_layout_finalize (); >>>>> >>> + cfg_layout_finalize (false); >>>>> >>> >>>>> >>> return 0; >>>>> >>> } >>>>> >>> @@ -3120,10 +3340,13 @@ relink_block_chain (bool stay_in_cfglayout_mode) >>>>> >>> } >>>>> >>> >>>>> >>> >>>>> >>> -/* Given a reorder chain, rearrange the code to match. */ >>>>> >>> +/* Given a reorder chain, rearrange the code to match. If >>>>> >>> + this is called when we will FINALIZE_REORDER_BLOCKS, or when >>>>> >>> + section boundary notes were removed on entry to cfg layout >>>>> >>> + mode, insert section boundary notes here. */ >>>>> >>> >>>>> >>> static void >>>>> >>> -fixup_reorder_chain (void) >>>>> >>> +fixup_reorder_chain (bool finalize_reorder_blocks) >>>>> >>> { >>>>> >>> basic_block bb; >>>>> >>> rtx insn = NULL; >>>>> >>> @@ -3150,7 +3373,7 @@ static void >>>>> >>> PREV_INSN (BB_HEADER (bb)) = insn; >>>>> >>> insn = BB_HEADER (bb); >>>>> >>> while (NEXT_INSN (insn)) >>>>> >>> - insn = NEXT_INSN (insn); >>>>> >>> + insn = NEXT_INSN (insn); >>>>> >>> } >>>>> >>> if (insn) >>>>> >>> NEXT_INSN (insn) = BB_HEAD (bb); >>>>> >>> @@ -3175,6 +3398,11 @@ static void >>>>> >>> insn = NEXT_INSN (insn); >>>>> >>> >>>>> >>> set_last_insn (insn); >>>>> >>> + >>>>> >>> + /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes. 
*/ >>>>> >>> + if (had_sec_boundary_notes || finalize_reorder_blocks) >>>>> >>> + insert_section_boundary_note (); >>>>> >>> + >>>>> >>> #ifdef ENABLE_CHECKING >>>>> >>> verify_insn_chain (); >>>>> >>> #endif >>>>> >>> @@ -3187,7 +3415,7 @@ static void >>>>> >>> edge e_fall, e_taken, e; >>>>> >>> rtx bb_end_insn; >>>>> >>> rtx ret_label = NULL_RTX; >>>>> >>> - basic_block nb, src_bb; >>>>> >>> + basic_block nb; >>>>> >>> edge_iterator ei; >>>>> >>> >>>>> >>> if (EDGE_COUNT (bb->succs) == 0) >>>>> >>> @@ -3322,7 +3550,6 @@ static void >>>>> >>> /* We got here if we need to add a new jump insn. >>>>> >>> Note force_nonfallthru can delete E_FALL and thus we have to >>>>> >>> save E_FALL->src prior to the call to force_nonfallthru. */ >>>>> >>> - src_bb = e_fall->src; >>>>> >>> nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, ret_label); >>>>> >>> if (nb) >>>>> >>> { >>>>> >>> @@ -3330,17 +3557,6 @@ static void >>>>> >>> bb->aux = nb; >>>>> >>> /* Don't process this new block. */ >>>>> >>> bb = nb; >>>>> >>> - >>>>> >>> - /* Make sure new bb is tagged for correct section (same as >>>>> >>> - fall-thru source, since you cannot fall-thru across >>>>> >>> - section boundaries). */ >>>>> >>> - BB_COPY_PARTITION (src_bb, single_pred (bb)); >>>>> >>> - if (flag_reorder_blocks_and_partition >>>>> >>> - && targetm_common.have_named_sections >>>>> >>> - && JUMP_P (BB_END (bb)) >>>>> >>> - && !any_condjump_p (BB_END (bb)) >>>>> >>> - && (EDGE_SUCC (bb, 0)->flags & EDGE_CROSSING)) >>>>> >>> - add_reg_note (BB_END (bb), REG_CROSSING_JUMP, NULL_RTX); >>>>> >>> } >>>>> >>> } >>>>> >>> >>>>> >>> @@ -3644,10 +3860,11 @@ duplicate_insn_chain (rtx from, rtx to) >>>>> >>> case NOTE_INSN_FUNCTION_BEG: >>>>> >>> /* There is always just single entry to function. */ >>>>> >>> case NOTE_INSN_BASIC_BLOCK: >>>>> >>> + /* We should only switch text sections once. 
*/ >>>>> >>> + case NOTE_INSN_SWITCH_TEXT_SECTIONS: >>>>> >>> break; >>>>> >>> >>>>> >>> case NOTE_INSN_EPILOGUE_BEG: >>>>> >>> - case NOTE_INSN_SWITCH_TEXT_SECTIONS: >>>>> >>> emit_note_copy (insn); >>>>> >>> break; >>>>> >>> >>>>> >>> @@ -3759,10 +3976,13 @@ break_superblocks (void) >>>>> >>> } >>>>> >>> >>>>> >>> /* Finalize the changes: reorder insn list according to the sequence specified >>>>> >>> - by aux pointers, enter compensation code, rebuild scope forest. */ >>>>> >>> + by aux pointers, enter compensation code, rebuild scope forest. If >>>>> >>> + this is called when we will FINALIZE_REORDER_BLOCKS, indicate that >>>>> >>> + to fixup_reorder_chain so that it can insert the proper switch text >>>>> >>> + section notes. */ >>>>> >>> >>>>> >>> void >>>>> >>> -cfg_layout_finalize (void) >>>>> >>> +cfg_layout_finalize (bool finalize_reorder_blocks) >>>>> >>> { >>>>> >>> #ifdef ENABLE_CHECKING >>>>> >>> verify_flow_info (); >>>>> >>> @@ -3775,7 +3995,7 @@ void >>>>> >>> #endif >>>>> >>> ) >>>>> >>> fixup_fallthru_exit_predecessor (); >>>>> >>> - fixup_reorder_chain (); >>>>> >>> + fixup_reorder_chain (finalize_reorder_blocks); >>>>> >>> >>>>> >>> rebuild_jump_labels (get_insns ()); >>>>> >>> delete_dead_jumptables (); >>>>> >>> @@ -4454,8 +4674,7 @@ rtl_can_remove_branch_p (const_edge e) >>>>> >>> if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH)) >>>>> >>> return false; >>>>> >>> >>>>> >>> - if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX) >>>>> >>> - || BB_PARTITION (src) != BB_PARTITION (target)) >>>>> >>> + if (BB_PARTITION (src) != BB_PARTITION (target)) >>>>> >>> return false; >>>>> >>> >>>>> >>> if (!onlyjump_p (insn) >>>>> >>> >>>>> >>> -- >>>>> >>> This patch is available for review at http://codereview.appspot.com/6823047 >>>>> >> >>>>> >> >>>>> >> >>>>> >> -- >>>>> >> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413 >>>>> >>>>> >>>>> >>>>> -- >>>>> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413 
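Editorial sketch for readers of the thread: the quoted patch's fixup_partitions and the new check in rtl_verify_flow_info_1 both rely on one rule, namely that any block dominated by a cold block must itself be cold. The toy C code below illustrates that rule on a hypothetical six-block dominator tree. The `idom` array, `N_BLOCKS`, and `propagate_cold` are assumptions made for illustration and are not GCC APIs. Unlike the patch, which pushes every dominator-tree son unconditionally, this sketch pushes a son only when it is newly marked cold, so the fixed-size worklist cannot overflow.

```c
#include <assert.h>

enum partition { HOT, COLD };

#define N_BLOCKS 6

/* Hypothetical dominator tree: idom[i] is the immediate dominator of
   block i, and -1 marks the entry block.  Not taken from GCC.  */
static const int idom[N_BLOCKS] = { -1, 0, 1, 1, 0, 4 };

/* Re-mark as cold every block dominated by a cold block.  Returns the
   number of blocks whose partition changed.  */
static int
propagate_cold (enum partition part[N_BLOCKS])
{
  int worklist[N_BLOCKS];
  int n = 0, changed = 0;

  /* Seed the worklist with the blocks that are already cold.  */
  for (int i = 0; i < N_BLOCKS; i++)
    if (part[i] == COLD)
      worklist[n++] = i;

  while (n > 0)
    {
      int bb = worklist[--n];

      /* Visit the dominator-tree sons of bb.  A son that is already
         cold was seeded above, so only newly cold sons are pushed;
         each block is therefore pushed at most once.  */
      for (int son = 0; son < N_BLOCKS; son++)
        if (idom[son] == bb && part[son] != COLD)
          {
            part[son] = COLD;
            changed++;
            assert (n < N_BLOCKS);
            worklist[n++] = son;
          }
    }
  return changed;
}
```

With only block 1 initially cold, blocks 2 and 3 (its dominator-tree sons) are re-marked cold, while blocks 0, 4 and 5 stay hot. In the patch itself, the re-marked blocks are then handed to fixup_bb_partition so that the region-crossing edge flags and notes are updated.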
On Wed, Nov 28, 2012 at 7:48 AM, Christophe Lyon <christophe.lyon@linaro.org> wrote:
> I have updated my trunk checkout, and I can confirm that eval.c now
> compiles with your patch (and the other 4 patches I added to PR55121).

good

> Now, when looking at the whole Spec2k results:
> - vpr passes now (used to fail)

good

> - gcc, parser, perlbmk bzip2 and twolf no longer build: they all fail
> with the same error from gas:
> can't resolve `.text.unlikely' {.text.unlikely section} - `.LBB171'
> {.text section}
> - gap still does not build (same error as above)
>
> I haven't looked in detail, so I may be missing an obvious patch here.

Finally had a chance to get back to this. I was able to reproduce the failure using x86_64 linux with "-freorder-blocks-and-partition -g". However, I am also getting the same failure with a pristine copy of trunk. Can you confirm whether you were seeing any of these failures without my patches, because I believe they are probably a limitation with function splitting and debug info that is orthogonal to my patch.

> And I still observe runtime mis-behaviour on crafty, galgel, facerec and
> fma3d.

I'm not seeing this on x86_64, unfortunately, so it might require some follow-on work to triage and fix. I'll look into the gas failure, but if someone could review this patch in the meantime given that it does improve things considerably (at least without -g), that would be great.

Thanks,
Teresa

> Thanks
> Christophe.
>
>
> On 26 November 2012 21:52, Teresa Johnson <tejohnson@google.com> wrote:
> > Sorry, I don't know what happened there. Patch is attached.
> > Thanks,
> > Teresa
> >
> > On Mon, Nov 26, 2012 at 12:42 PM, Jack Howarth <howarth@bromo.med.uc.edu> wrote:
> >> On Mon, Nov 26, 2012 at 12:19:55PM -0800, Teresa Johnson wrote:
> >>> Are you sure you have all my changes applied? I applied the 4 patches
> >>> attached to PR55121 into my trunk checkout that has my fixes, and to a
> >>> pristine trunk checkout.
I configured and built both for > >>> --target=arm-none-linux-gnueabi, and built using your options, .i file > >>> and gcda file. I can reproduce the failure using the pristine trunk > >>> with your patches but not with my fixed trunk + your patches. (I just > >>> updated to head to pickup recent changes and get the same result. The > >>> vec changes required some manual changes to the patch, which I will > >>> resend shortly.) > >> > >> Teresa, > >> Your mailer seems to have corrupted the posted patch with stray > >> =3D characters and line breaks. Can you repost a copy as an attachment > >> to the list? > >> Jack > >> > >>> > >>> Without my fixes: > >>> > >>> $ ~/extra/gcc_trunk_3_arm-eabi/gcc/cc1 -fpreproce > >>> ssed eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9 > >>> -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp > >>> -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use > >>> -fno-common -o eval.s -freorder-blocks-and-partition > >>> GNU C (GCC) version 4.8.0 20121126 (experimental) > (arm-none-linux-gnueabi) > >>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version > >>> 2.4.2-p1, MPC version 0.8.1 > >>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 > >>> GNU C (GCC) version 4.8.0 20121126 (experimental) > (arm-none-linux-gnueabi) > >>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version > >>> 2.4.2-p1, MPC version 0.8.1 > >>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 > >>> Compiler executable checksum: d19cc60a2f07de08237a8488bb35cd1a > >>> eval.c: In function ‘Ge’: > >>> eval.c:792:1: internal compiler error: in df_compact_blocks, at > df-core.c:1560 > >>> } > >>> ^ > >>> 0x622f71 df_compact_blocks() > >>> ../../gcc_trunk_3/gcc/df-core.c:1560 > >>> 0x5cfcb5 compact_blocks() > >>> ../../gcc_trunk_3/gcc/cfg.c:162 > >>> 0xc9dce0 reorder_basic_blocks > >>> ../../gcc_trunk_3/gcc/bb-reorder.c:2154 > >>> 0xc9dce0 
rest_of_handle_reorder_blocks > >>> ../../gcc_trunk_3/gcc/bb-reorder.c:2219 > >>> Please submit a full bug report, > >>> with preprocessed source if appropriate. > >>> Please include the complete backtrace with any bug report. > >>> See <http://gcc.gnu.org/bugs.html> for instructions. > >>> > >>> > >>> With my fixes: > >>> > >>> $ ~/extra/gcc_trunk_4_arm-eabi/gcc/cc1 -fpreproce > >>> ssed eval.i -quiet -dumpbase eval.c -march=armv7-a -mtune=cortex-a9 > >>> -mthumb -mfpu=neon -mvectorize-with-neon-quad -mfloat-abi=softfp > >>> -mtls-dialect=gnu -auxbase-strip eval.o -g -O3 -version -fprofile-use > >>> -fno-common -o eval.s -freorder-blocks-and-partition > >>> GNU C (GCC) version 4.8.0 20121126 (experimental) > (arm-none-linux-gnueabi) > >>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version > >>> 2.4.2-p1, MPC version 0.8.1 > >>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 > >>> GNU C (GCC) version 4.8.0 20121126 (experimental) > (arm-none-linux-gnueabi) > >>> compiled by GNU C version 4.4.3, GMP version 4.3.2, MPFR version > >>> 2.4.2-p1, MPC version 0.8.1 > >>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096 > >>> Compiler executable checksum: 45b468efa7c981f9afb44c4dac2424f3 > >>> > >>> > >>> Thanks, > >>> Teresa > >>> > >>> On Mon, Nov 26, 2012 at 8:25 AM, Christophe Lyon > >>> <christophe.lyon@linaro.org> wrote: > >>> > Hi, > >>> > > >>> > I have tested your patch on Spec2000 on ARM, and I can still see > >>> > several failures caused by: > >>> > "error: fallthru edge crosses section boundary", including the case > >>> > described in PR55121. > >>> > > >>> > On 26 November 2012 16:55, Teresa Johnson <tejohnson@google.com> > wrote: > >>> >> Ping. 
> >>> >> Teresa
> >>> >>
> >>> >> On Thu, Nov 15, 2012 at 12:10 PM, Teresa Johnson <tejohnson@google.com> wrote:
> >>> >>> Revised patch that fixes failures encountered when enabling
> >>> >>> -freorder-blocks-and-partition, including the failure reported in PR 53743.
> >>> >>>
> >>> >>> This includes new verification code, contributed by Steven Bosscher,
> >>> >>> to ensure that no cold blocks dominate hot blocks.
> >>> >>>
> >>> >>> I attempted to make the handling of partition updates through the optimization
> >>> >>> passes much more consistent, removing a number of partial fixes in the code
> >>> >>> stream in the process. The code to fix up partitions (including the BB_PARTITION
> >>> >>> assignment, region crossing jump notes, and switch text section notes) now
> >>> >>> lives in a few centralized locations, for example inside
> >>> >>> rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that callers
> >>> >>> don't need to attempt the fixup themselves.
> >>> >>>
> >>> >>> For optimization passes that make adjustments to the cfg while in cfg layout
> >>> >>> mode that are not easy to fix up incrementally, the new routine
> >>> >>> fixup_partitions handles the cleanup globally. This does require calculation
> >>> >>> of the dominance relation. However, as far as I can tell, the routines which
> >>> >>> now invoke this global fixup (try_optimize_cfg and commit_edge_insertions)
> >>> >>> are typically invoked once (or a small number of times in the case of
> >>> >>> try_optimize_cfg) per optimization pass. Additionally, I compared the
> >>> >>> -ftime-report output for some large FDO compilations and saw only minimal
> >>> >>> increases in the dominance computation times, which were only a tiny percentage
> >>> >>> of the overall compile time.
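The invariant behind this cleanup (no cold block may dominate a hot block) can be illustrated with a small, self-contained C sketch. This is not GCC code: the `idom` and `part` arrays, the `enum partition` type, and both function names are hypothetical stand-ins for GCC's `basic_block`, `BB_PARTITION` and dominance info. It shows only the dominator-promotion step that the patch adds to find_rarely_executed_basic_blocks_and_crossing_edges, where a cold immediate dominator of a hot block is re-marked hot. The real code additionally reconsiders the successors of newly hot blocks.

```c
#include <stdbool.h>

/* Toy partition tag, standing in for BB_HOT_PARTITION / BB_COLD_PARTITION.  */
enum partition { HOT, COLD };

/* Return true if some cold block dominates a hot block.
   idom[i] is the index of block i's immediate dominator, or -1 for
   the entry block (a hypothetical encoding of the dominator tree).  */
bool
cold_dominates_hot (const int *idom, const enum partition *part, int n)
{
  for (int i = 0; i < n; i++)
    if (part[i] == HOT)
      for (int d = idom[i]; d >= 0; d = idom[d])
        if (part[d] == COLD)
          return true;
  return false;
}

/* Re-mark as hot every cold block that dominates a hot block.
   Returns the number of blocks that were promoted.  */
int
promote_cold_dominators (const int *idom, enum partition *part, int n)
{
  int promoted = 0;

  for (int i = 0; i < n; i++)
    {
      if (part[i] != HOT)
        continue;

      /* Walk up the dominator tree from this hot block.  Stopping at
         the first hot ancestor is enough: that ancestor is either
         handled by its own iteration or was promoted by a walk that
         continued past it.  */
      for (int d = idom[i]; d >= 0 && part[d] == COLD; d = idom[d])
        {
          part[d] = HOT;
          promoted++;
        }
    }
  return promoted;
}
```

Promoting dominators upward only ever grows the hot partition, which matches the conservative direction taken at partitioning time in this patch; a block that is cold and dominates nothing hot is left alone.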
> >>> >>>
> >>> >>> Additionally, I added a flag to the rtl_data structure to indicate whether
> >>> >>> any partitioning was actually performed, so that optimizations which were
> >>> >>> conservatively disabled whenever flag_reorder_blocks_and_partition
> >>> >>> is enabled (e.g. try_crossjump_to_edge, part of connect_traces) can be less
> >>> >>> conservative for functions where no partitions were formed (e.g. they are
> >>> >>> completely hot).
> >>> >>>
> >>> >>> Bootstrapped and tested on x86_64-unknown-linux-gnu. Also tested with SPEC2006 int
> >>> >>> benchmarks and internal Google benchmarks using profile feedback and
> >>> >>> -freorder-blocks-and-partition to get more coverage. Ok for trunk?
> >>> >>>
> >>> >>> Thanks,
> >>> >>> Teresa
> >>> >>>
> >>> >>> 2012-11-14  Teresa Johnson  <tejohnson@google.com>
> >>> >>>             Steven Bosscher  <steven@gcc.gnu.org>
> >>> >>>
> >>> >>>         * cfghooks.h (cfg_layout_finalize): New parameter.
> >>> >>>         * modulo-sched.c (rest_of_handle_sms): New cfg_layout_finalize
> >>> >>>         parameter.
> >>> >>>         * ifcvt.c (find_if_case_1): Replace BB_COPY_PARTITION with assert
> >>> >>>         as this is now done by redirect_edge_and_branch_force.
> >>> >>>         * function.c (thread_prologue_and_epilogue_insns): Insert new bb after
> >>> >>>         barriers, new cfg_layout_finalize parameter, and don't store exit
> >>> >>>         predecessor BB until after it is potentially split.
> >>> >>>         * function.h (struct rtl_data): New flag has_bb_partition.
> >>> >>>         * hw-doloop.c (reorder_loops): New cfg_layout_finalize parameter.
> >>> >>>         * cfgcleanup.c (try_crossjump_to_edge): Only skip optimization if
> >>> >>>         any blocks in function actually partitioned.
> >>> >>>         (try_optimize_cfg): If cfg changed, invoke fixup_partitions to clean
> >>> >>>         up partitioning.
> >>> >>>         * bb-reorder.c (connect_traces): Only look for partitions and skip
> >>> >>>         block copying if any blocks in function actually partitioned.
> >>> >>>         (emit_barrier_after_bb): Handle insertion in non-cfglayout mode.
> >>> >>>         (find_rarely_executed_basic_blocks_and_crossing_edges): Ensure
> >>> >>>         that no cold blocks dominate a hot block.
> >>> >>>         (fix_up_fall_thru_edges): Replace BB_COPY_PARTITION with assert
> >>> >>>         as this is now done by force_nonfallthru_and_redirect.
> >>> >>>         (add_reg_crossing_jump_notes): Handle the fact that some jumps may
> >>> >>>         already be marked with region crossing note.
> >>> >>>         (reorder_basic_blocks): Only need to verify partitions if any
> >>> >>>         blocks in function actually partitioned.
> >>> >>>         (insert_section_boundary_note): Only need to insert note if any
> >>> >>>         blocks in function actually partitioned.
> >>> >>>         (rest_of_handle_reorder_blocks): New cfg_layout_finalize
> >>> >>>         parameter, and remove call to insert_section_boundary_note as this
> >>> >>>         is now called via cfg_layout_finalize/fixup_reorder_chain.
> >>> >>>         (duplicate_computed_gotos): New cfg_layout_finalize
> >>> >>>         parameter.
> >>> >>>         (partition_hot_cold_basic_blocks): Set flag indicating function
> >>> >>>         has bb partitions.
> >>> >>>         * bb-reorder.h: Declare insert_section_boundary_note and
> >>> >>>         emit_barrier_after_bb, which are no longer static.
> >>> >>>         * basic-block.h: Declare new function fixup_partitions.
> >>> >>>         * cfgrtl.c (try_redirect_by_replacing_jump): Remove unnecessary
> >>> >>>         check for region crossing note.
> >>> >>>         (fixup_partition_crossing): New function.
> >>> >>>         (fixup_bb_partition): Ditto.
> >>> >>>         (rtl_redirect_edge_and_branch): Fixup partition boundaries.
> >>> >>>         (force_nonfallthru_and_redirect): Fixup partition boundaries,
> >>> >>>         remove old code that tried to do this. Emit barrier correctly
> >>> >>>         when we are in cfglayout mode.
> >>> >>>         (rtl_split_edge): Correctly fixup partition boundaries.
> >>> >>>         (commit_one_edge_insertion): Remove old code that tried to
> >>> >>>         fixup region crossing edge since this is now handled in
> >>> >>>         split_block, and set up insertion point correctly since
> >>> >>>         block may now end in a jump.
> >>> >>>         (commit_edge_insertions): Invoke fixup_partitions to sanitize partition
> >>> >>>         boundaries after optimizations that modify cfg and before trying to
> >>> >>>         verify the flow info.
> >>> >>>         (fixup_partitions): New function.
> >>> >>>         (rtl_verify_flow_info_1): Add verification that no cold bbs dominate
> >>> >>>         hot bbs.
> >>> >>>         (record_effective_endpoints): Remove region-crossing notes and set flag
> >>> >>>         indicating that they need to be reinserted on exit from cfglayout mode.
> >>> >>>         (outof_cfg_layout_mode): New cfg_layout_finalize parameter.
> >>> >>>         (fixup_reorder_chain): Call insert_section_boundary_note if necessary.
> >>> >>>         Remove old code that attempted to fixup region crossing note as
> >>> >>>         this is now handled in force_nonfallthru_and_redirect.
> >>> >>>         (duplicate_insn_chain): Don't duplicate switch section notes.
> >>> >>>         (cfg_layout_finalize): Pass new parameter to fixup_reorder_chain.
> >>> >>>         (rtl_can_remove_branch_p): Remove unnecessary check for region crossing
> >>> >>>         note.
> >>> >>>
> >>> >>> Index: cfghooks.h
> >>> >>> ===================================================================
> >>> >>> --- cfghooks.h (revision 193376)
> >>> >>> +++ cfghooks.h (working copy)
> >>> >>> @@ -204,7 +204,7 @@ extern void copy_bbs (basic_block *, unsigned, bas
> >>> >>>  void account_profile_record (struct profile_record *, int);
> >>> >>>
> >>> >>>  extern void cfg_layout_initialize (unsigned int);
> >>> >>> -extern void cfg_layout_finalize (void);
> >>> >>> +extern void cfg_layout_finalize (bool);
> >>> >>>
> >>> >>>  /* Hooks containers.
*/ > >>> >>> extern struct cfg_hooks gimple_cfg_hooks; > >>> >>> @@ -218,4 +218,3 @@ extern void cfg_layout_rtl_register_cfg_hooks > (voi > >>> >>> extern void gimple_register_cfg_hooks (void); > >>> >>> extern struct cfg_hooks get_cfg_hooks (void); > >>> >>> extern void set_cfg_hooks (struct cfg_hooks); > >>> >>> - > >>> >>> Index: modulo-sched.c > >>> >>> =================================================================== > >>> >>> --- modulo-sched.c (revision 193376) > >>> >>> +++ modulo-sched.c (working copy) > >>> >>> @@ -3354,7 +3354,7 @@ rest_of_handle_sms (void) > >>> >>> if (bb->next_bb != EXIT_BLOCK_PTR) > >>> >>> bb->aux = bb->next_bb; > >>> >>> free_dominance_info (CDI_DOMINATORS); > >>> >>> - cfg_layout_finalize (); > >>> >>> + cfg_layout_finalize (false); > >>> >>> #endif /* INSN_SCHEDULING */ > >>> >>> return 0; > >>> >>> } > >>> >>> Index: ifcvt.c > >>> >>> =================================================================== > >>> >>> --- ifcvt.c (revision 193376) > >>> >>> +++ ifcvt.c (working copy) > >>> >>> @@ -3900,10 +3900,9 @@ find_if_case_1 (basic_block test_bb, edge > then_edg > >>> >>> if (new_bb) > >>> >>> { > >>> >>> df_bb_replace (then_bb_index, new_bb); > >>> >>> - /* Since the fallthru edge was redirected from test_bb to > new_bb, > >>> >>> - we need to ensure that new_bb is in the same partition as > >>> >>> - test bb (you can not fall through across section > boundaries). */ > >>> >>> - BB_COPY_PARTITION (new_bb, test_bb); > >>> >>> + /* This should have been done above via > force_nonfallthru_and_redirect > >>> >>> + (possibly called from redirect_edge_and_branch_force). 
> */ > >>> >>> + gcc_assert (BB_PARTITION (new_bb) == BB_PARTITION > (test_bb)); > >>> >>> } > >>> >>> > >>> >>> num_true_changes++; > >>> >>> Index: function.c > >>> >>> =================================================================== > >>> >>> --- function.c (revision 193376) > >>> >>> +++ function.c (working copy) > >>> >>> @@ -6249,8 +6249,10 @@ thread_prologue_and_epilogue_insns (void) > >>> >>> break; > >>> >>> if (e) > >>> >>> { > >>> >>> - copy_bb = create_basic_block (NEXT_INSN > (BB_END (e->src)), > >>> >>> - NULL_RTX, > e->src); > >>> >>> + /* Make sure we insert after any barriers. */ > >>> >>> + rtx end = get_last_bb_insn (e->src); > >>> >>> + copy_bb = create_basic_block (NEXT_INSN (end), > >>> >>> + NULL_RTX, > e->src); > >>> >>> BB_COPY_PARTITION (copy_bb, e->src); > >>> >>> } > >>> >>> else > >>> >>> @@ -6475,7 +6477,7 @@ thread_prologue_and_epilogue_insns (void) > >>> >>> if (cur_bb->index >= NUM_FIXED_BLOCKS > >>> >>> && cur_bb->next_bb->index >= NUM_FIXED_BLOCKS) > >>> >>> cur_bb->aux = cur_bb->next_bb; > >>> >>> - cfg_layout_finalize (); > >>> >>> + cfg_layout_finalize (false); > >>> >>> } > >>> >>> > >>> >>> epilogue_done: > >>> >>> @@ -6517,7 +6519,7 @@ epilogue_done: > >>> >>> basic_block simple_return_block_cold = NULL; > >>> >>> edge pending_edge_hot = NULL; > >>> >>> edge pending_edge_cold = NULL; > >>> >>> - basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb; > >>> >>> + basic_block exit_pred; > >>> >>> int i; > >>> >>> > >>> >>> gcc_assert (entry_edge != orig_entry_edge); > >>> >>> @@ -6545,6 +6547,12 @@ epilogue_done: > >>> >>> else > >>> >>> pending_edge_cold = e; > >>> >>> } > >>> >>> + > >>> >>> + /* Save a pointer to the exit's predecessor BB for use in > >>> >>> + inserting new BBs at the end of the function. Do this > >>> >>> + after the call to split_block above which may split > >>> >>> + the original exit pred. 
*/ > >>> >>> + exit_pred = EXIT_BLOCK_PTR->prev_bb; > >>> >>> > >>> >>> FOR_EACH_VEC_ELT (edge, unconverted_simple_returns, i, e) > >>> >>> { > >>> >>> Index: function.h > >>> >>> =================================================================== > >>> >>> --- function.h (revision 193376) > >>> >>> +++ function.h (working copy) > >>> >>> @@ -459,6 +459,11 @@ struct GTY(()) rtl_data { > >>> >>> sched2) and is useful only if the port defines > LEAF_REGISTERS. */ > >>> >>> bool uses_only_leaf_regs; > >>> >>> > >>> >>> + /* Nonzero if the function being compiled has undergone > hot/cold partitioning > >>> >>> + (under flag_reorder_blocks_and_partition) and has at least > one cold > >>> >>> + block. */ > >>> >>> + bool has_bb_partition; > >>> >>> + > >>> >>> /* Like regs_ever_live, but 1 if a reg is set or clobbered from > an > >>> >>> asm. Unlike regs_ever_live, elements of this array > corresponding > >>> >>> to eliminable regs (like the frame pointer) are set if an asm > >>> >>> Index: hw-doloop.c > >>> >>> =================================================================== > >>> >>> --- hw-doloop.c (revision 193376) > >>> >>> +++ hw-doloop.c (working copy) > >>> >>> @@ -547,7 +547,7 @@ reorder_loops (hwloop_info loops) > >>> >>> else > >>> >>> bb->aux = NULL; > >>> >>> } > >>> >>> - cfg_layout_finalize (); > >>> >>> + cfg_layout_finalize (false); > >>> >>> clear_aux_for_blocks (); > >>> >>> df_analyze (); > >>> >>> } > >>> >>> Index: cfgcleanup.c > >>> >>> =================================================================== > >>> >>> --- cfgcleanup.c (revision 193376) > >>> >>> +++ cfgcleanup.c (working copy) > >>> >>> @@ -1824,7 +1824,7 @@ try_crossjump_to_edge (int mode, edge e1, > edge e2, > >>> >>> partition boundaries). See the comments at the top of > >>> >>> bb-reorder.c:partition_hot_cold_basic_blocks for complete > details. 
*/ > >>> >>> > >>> >>> - if (flag_reorder_blocks_and_partition && reload_completed) > >>> >>> + if (crtl->has_bb_partition && reload_completed) > >>> >>> return false; > >>> >>> > >>> >>> /* Search backward through forwarder blocks. We don't need to > worry > >>> >>> @@ -2767,10 +2767,21 @@ try_optimize_cfg (int mode) > >>> >>> df_analyze (); > >>> >>> } > >>> >>> > >>> >>> + if (changed) > >>> >>> + { > >>> >>> + /* Edge forwarding in particular can cause hot > blocks previously > >>> >>> + reached by both hot and cold blocks to become > dominated only > >>> >>> + by cold blocks. This will cause the verification > below to fail, > >>> >>> + and lead to now cold code in the hot section. > This is not easy > >>> >>> + to detect and fix during edge forwarding, and in > some cases > >>> >>> + is only visible after newly unreachable blocks > are deleted, > >>> >>> + which will be done in fixup_partitions. */ > >>> >>> + fixup_partitions (); > >>> >>> + > >>> >>> #ifdef ENABLE_CHECKING > >>> >>> - if (changed) > >>> >>> - verify_flow_info (); > >>> >>> + verify_flow_info (); > >>> >>> #endif > >>> >>> + } > >>> >>> > >>> >>> changed_overall |= changed; > >>> >>> first_pass = false; > >>> >>> Index: bb-reorder.c > >>> >>> =================================================================== > >>> >>> --- bb-reorder.c (revision 193376) > >>> >>> +++ bb-reorder.c (working copy) > >>> >>> @@ -1054,7 +1054,7 @@ connect_traces (int n_traces, struct trace > *traces > >>> >>> current_partition = BB_PARTITION (traces[0].first); > >>> >>> two_passes = false; > >>> >>> > >>> >>> - if (flag_reorder_blocks_and_partition) > >>> >>> + if (crtl->has_bb_partition) > >>> >>> for (i = 0; i < n_traces && !two_passes; i++) > >>> >>> if (BB_PARTITION (traces[0].first) > >>> >>> != BB_PARTITION (traces[i].first)) > >>> >>> @@ -1263,7 +1263,7 @@ connect_traces (int n_traces, struct trace > *traces > >>> >>> } > >>> >>> } > >>> >>> > >>> >>> - if (flag_reorder_blocks_and_partition) > >>> >>> + if 
(crtl->has_bb_partition) > >>> >>> try_copy = false; > >>> >>> > >>> >>> /* Copy tiny blocks always; copy larger blocks only > when the > >>> >>> @@ -1381,13 +1381,14 @@ get_uncond_jump_length (void) > >>> >>> return length; > >>> >>> } > >>> >>> > >>> >>> -/* Emit a barrier into the footer of BB. */ > >>> >>> +/* Emit a barrier after BB, into the footer if we are in > CFGLAYOUT mode. */ > >>> >>> > >>> >>> -static void > >>> >>> +void > >>> >>> emit_barrier_after_bb (basic_block bb) > >>> >>> { > >>> >>> rtx barrier = emit_barrier_after (BB_END (bb)); > >>> >>> - BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier); > >>> >>> + if (current_ir_type () == IR_RTL_CFGLAYOUT) > >>> >>> + BB_FOOTER (bb) = unlink_insn_chain (barrier, barrier); > >>> >>> } > >>> >>> > >>> >>> /* The landing pad OLD_LP, in block OLD_BB, has edges from both > partitions. > >>> >>> @@ -1463,18 +1464,109 @@ > find_rarely_executed_basic_blocks_and_crossing_edg > >>> >>> { > >>> >>> VEC(edge, heap) *crossing_edges = NULL; > >>> >>> basic_block bb; > >>> >>> - edge e; > >>> >>> - edge_iterator ei; > >>> >>> + edge e, e2; > >>> >>> + edge_iterator ei, ei2; > >>> >>> + unsigned int cold_bb_count = 0; > >>> >>> + VEC (basic_block, heap) *bbs_in_hot_partition = NULL; > >>> >>> + VEC (basic_block, heap) *bbs_newly_hot = NULL; > >>> >>> > >>> >>> /* Mark which partition (hot/cold) each basic block belongs in. > */ > >>> >>> FOR_EACH_BB (bb) > >>> >>> { > >>> >>> if (probably_never_executed_bb_p (cfun, bb)) > >>> >>> - BB_SET_PARTITION (bb, BB_COLD_PARTITION); > >>> >>> + { > >>> >>> + BB_SET_PARTITION (bb, BB_COLD_PARTITION); > >>> >>> + cold_bb_count++; > >>> >>> + } > >>> >>> else > >>> >>> - BB_SET_PARTITION (bb, BB_HOT_PARTITION); > >>> >>> + { > >>> >>> + BB_SET_PARTITION (bb, BB_HOT_PARTITION); > >>> >>> + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, > bb); > >>> >>> + } > >>> >>> } > >>> >>> > >>> >>> + /* Ensure that no cold bbs dominate hot bbs. 
This could happen > as a result of > >>> >>> + several different possibilities. One is that there are edge > weight insanities > >>> >>> + due to optimization phases that do not properly update basic > block profile > >>> >>> + counts. The second is that the entry of the function may not > be hot, because > >>> >>> + it is entered fewer times than the number of profile > training runs, but there > >>> >>> + is a loop inside the function that causes blocks within the > function to be > >>> >>> + above the threshold for hotness. */ > >>> >>> + if (cold_bb_count) > >>> >>> + { > >>> >>> + bool dom_calculated_here = !dom_info_available_p > (CDI_DOMINATORS); > >>> >>> + > >>> >>> + if (dom_calculated_here) > >>> >>> + calculate_dominance_info (CDI_DOMINATORS); > >>> >>> + > >>> >>> + /* Keep examining hot bbs until we have either checked them > all, or > >>> >>> + re-marked all cold bbs hot. */ > >>> >>> + while (! VEC_empty (basic_block, bbs_in_hot_partition) > >>> >>> + && cold_bb_count) > >>> >>> + { > >>> >>> + basic_block dom_bb; > >>> >>> + > >>> >>> + bb = VEC_pop (basic_block, bbs_in_hot_partition); > >>> >>> + dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb); > >>> >>> + > >>> >>> + /* If bb's immediate dominator is also hot then it is > ok. */ > >>> >>> + if (BB_PARTITION (dom_bb) != BB_COLD_PARTITION) > >>> >>> + continue; > >>> >>> + > >>> >>> + /* We have a hot bb with an immediate dominator that is > cold. > >>> >>> + The dominator needs to be re-marked to hot. */ > >>> >>> + BB_SET_PARTITION (dom_bb, BB_HOT_PARTITION); > >>> >>> + cold_bb_count--; > >>> >>> + > >>> >>> + /* Now we need to examine newly-hot dom_bb to see if it > is also > >>> >>> + dominated by a cold bb. */ > >>> >>> + VEC_safe_push (basic_block, heap, bbs_in_hot_partition, > dom_bb); > >>> >>> + > >>> >>> + /* We should also adjust any cold blocks that the > newly-hot bb > >>> >>> + feeds and see if it makes sense to re-mark those as > hot as > >>> >>> + well. 
*/ > >>> >>> + VEC_safe_push (basic_block, heap, bbs_newly_hot, > dom_bb); > >>> >>> + while (! VEC_empty (basic_block, bbs_newly_hot)) > >>> >>> + { > >>> >>> + basic_block new_hot_bb = VEC_pop (basic_block, > bbs_newly_hot); > >>> >>> + /* Examine all successors of this newly-hot bb to > see if they > >>> >>> + are cold and should be re-marked as hot. */ > >>> >>> + FOR_EACH_EDGE (e, ei, new_hot_bb->succs) > >>> >>> + { > >>> >>> + bool any_cold_preds = false; > >>> >>> + basic_block succ = e->dest; > >>> >>> + if (BB_PARTITION (succ) != BB_COLD_PARTITION) > >>> >>> + continue; > >>> >>> + /* Does this block have any cold predecessors > now? */ > >>> >>> + FOR_EACH_EDGE (e2, ei2, succ->preds) > >>> >>> + { > >>> >>> + if (BB_PARTITION (e2->src) == > BB_COLD_PARTITION) > >>> >>> + { > >>> >>> + any_cold_preds = true; > >>> >>> + break; > >>> >>> + } > >>> >>> + } > >>> >>> + if (any_cold_preds) > >>> >>> + continue; > >>> >>> + > >>> >>> + /* Here we have a successor of newly-hot bb > that is cold > >>> >>> + but no longer has any cold precessessors. > Since the original > >>> >>> + assignment of our newly-hot bb was > incorrect, this successor's > >>> >>> + assignment as cold is also suspect. Go ahead > and re-mark it > >>> >>> + as hot now too. Better heuristics may be in > order here. */ > >>> >>> + BB_SET_PARTITION (succ, BB_HOT_PARTITION); > >>> >>> + cold_bb_count--; > >>> >>> + VEC_safe_push (basic_block, heap, > bbs_in_hot_partition, succ); > >>> >>> + /* Examine this successor as a newly-hot bb. */ > >>> >>> + VEC_safe_push (basic_block, heap, > bbs_newly_hot, succ); > >>> >>> + } > >>> >>> + } > >>> >>> + } > >>> >>> + > >>> >>> + if (dom_calculated_here) > >>> >>> + free_dominance_info (CDI_DOMINATORS); > >>> >>> + } > >>> >>> + > >>> >>> /* The format of .gcc_except_table does not allow landing pads > to > >>> >>> be in a different partition as the throw. Fix this by either > >>> >>> moving or duplicating the landing pads. 
*/ > >>> >>> @@ -1766,10 +1858,10 @@ fix_up_fall_thru_edges (void) > >>> >>> new_bb->aux = cur_bb->aux; > >>> >>> cur_bb->aux = new_bb; > >>> >>> > >>> >>> - /* Make sure new fall-through bb is in same > >>> >>> - partition as bb it's falling through > from. */ > >>> >>> + /* This is done by > force_nonfallthru_and_redirect. */ > >>> >>> + gcc_assert (BB_PARTITION (new_bb) > >>> >>> + == BB_PARTITION (cur_bb)); > >>> >>> > >>> >>> - BB_COPY_PARTITION (new_bb, cur_bb); > >>> >>> single_succ_edge (new_bb)->flags |= > EDGE_CROSSING; > >>> >>> } > >>> >>> else > >>> >>> @@ -2067,7 +2159,10 @@ add_reg_crossing_jump_notes (void) > >>> >>> FOR_EACH_BB (bb) > >>> >>> FOR_EACH_EDGE (e, ei, bb->succs) > >>> >>> if ((e->flags & EDGE_CROSSING) > >>> >>> - && JUMP_P (BB_END (e->src))) > >>> >>> + && JUMP_P (BB_END (e->src)) > >>> >>> + /* Some notes were added during fix_up_fall_thru_edges, > via > >>> >>> + force_nonfallthru_and_redirect. */ > >>> >>> + && !find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, > NULL_RTX)) > >>> >>> add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, > NULL_RTX); > >>> >>> } > >>> >>> > >>> >>> @@ -2160,7 +2255,7 @@ reorder_basic_blocks (void) > >>> >>> dump_flow_info (dump_file, dump_flags); > >>> >>> } > >>> >>> > >>> >>> - if (flag_reorder_blocks_and_partition) > >>> >>> + if (crtl->has_bb_partition) > >>> >>> verify_hot_cold_block_grouping (); > >>> >>> } > >>> >>> > >>> >>> @@ -2172,14 +2267,14 @@ reorder_basic_blocks (void) > >>> >>> encountering this note will make the compiler switch between > the > >>> >>> hot and cold text sections. 
*/ > >>> >>> > >>> >>> -static void > >>> >>> +void > >>> >>> insert_section_boundary_note (void) > >>> >>> { > >>> >>> basic_block bb; > >>> >>> rtx new_note; > >>> >>> int first_partition = 0; > >>> >>> > >>> >>> - if (!flag_reorder_blocks_and_partition) > >>> >>> + if (!crtl->has_bb_partition) > >>> >>> return; > >>> >>> > >>> >>> FOR_EACH_BB (bb) > >>> >>> @@ -2222,10 +2317,8 @@ rest_of_handle_reorder_blocks (void) > >>> >>> FOR_EACH_BB (bb) > >>> >>> if (bb->next_bb != EXIT_BLOCK_PTR) > >>> >>> bb->aux = bb->next_bb; > >>> >>> - cfg_layout_finalize (); > >>> >>> + cfg_layout_finalize (true); > >>> >>> > >>> >>> - /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes. */ > >>> >>> - insert_section_boundary_note (); > >>> >>> return 0; > >>> >>> } > >>> >>> > >>> >>> @@ -2366,7 +2459,7 @@ duplicate_computed_gotos (void) > >>> >>> } > >>> >>> > >>> >>> done: > >>> >>> - cfg_layout_finalize (); > >>> >>> + cfg_layout_finalize (false); > >>> >>> > >>> >>> BITMAP_FREE (candidates); > >>> >>> return 0; > >>> >>> @@ -2511,6 +2604,8 @@ partition_hot_cold_basic_blocks (void) > >>> >>> if (crossing_edges == NULL) > >>> >>> return 0; > >>> >>> > >>> >>> + crtl->has_bb_partition = true; > >>> >>> + > >>> >>> /* Make sure the source of any crossing edge ends in a jump and > the > >>> >>> destination of any crossing edge has a label. 
*/ > >>> >>> add_labels_and_missing_jumps (crossing_edges); > >>> >>> Index: bb-reorder.h > >>> >>> =================================================================== > >>> >>> --- bb-reorder.h (revision 193376) > >>> >>> +++ bb-reorder.h (working copy) > >>> >>> @@ -36,4 +36,8 @@ extern struct target_bb_reorder > *this_target_bb_re > >>> >>> > >>> >>> extern int get_uncond_jump_length (void); > >>> >>> > >>> >>> +extern void insert_section_boundary_note (void); > >>> >>> + > >>> >>> +extern void emit_barrier_after_bb (basic_block bb); > >>> >>> + > >>> >>> #endif > >>> >>> Index: basic-block.h > >>> >>> =================================================================== > >>> >>> --- basic-block.h (revision 193376) > >>> >>> +++ basic-block.h (working copy) > >>> >>> @@ -806,6 +806,7 @@ extern basic_block > force_nonfallthru_and_redirect > >>> >>> extern bool contains_no_active_insn_p (const_basic_block); > >>> >>> extern bool forwarder_block_p (const_basic_block); > >>> >>> extern bool can_fallthru (basic_block, basic_block); > >>> >>> +extern void fixup_partitions (void); > >>> >>> > >>> >>> /* In cfgbuild.c. */ > >>> >>> extern void find_many_sub_basic_blocks (sbitmap); > >>> >>> Index: cfgrtl.c > >>> >>> =================================================================== > >>> >>> --- cfgrtl.c (revision 193376) > >>> >>> +++ cfgrtl.c (working copy) > >>> >>> @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3. If not > see > >>> >>> #include "tree.h" > >>> >>> #include "hard-reg-set.h" > >>> >>> #include "basic-block.h" > >>> >>> +#include "bb-reorder.h" > >>> >>> #include "regs.h" > >>> >>> #include "flags.h" > >>> >>> #include "function.h" > >>> >>> @@ -67,11 +68,12 @@ along with GCC; see the file COPYING3. If not > see > >>> >>> Only applicable if the CFG is in cfglayout mode. 
*/ > >>> >>> static GTY(()) rtx cfg_layout_function_footer; > >>> >>> static GTY(()) rtx cfg_layout_function_header; > >>> >>> +static bool had_sec_boundary_notes; > >>> >>> > >>> >>> static rtx skip_insns_after_block (basic_block); > >>> >>> static void record_effective_endpoints (void); > >>> >>> static rtx label_for_bb (basic_block); > >>> >>> -static void fixup_reorder_chain (void); > >>> >>> +static void fixup_reorder_chain (bool finalize_reorder_blocks); > >>> >>> > >>> >>> void verify_insn_chain (void); > >>> >>> static void fixup_fallthru_exit_predecessor (void); > >>> >>> @@ -976,8 +978,7 @@ try_redirect_by_replacing_jump (edge e, > basic_bloc > >>> >>> partition boundaries). See the comments at the top of > >>> >>> bb-reorder.c:partition_hot_cold_basic_blocks for complete > details. */ > >>> >>> > >>> >>> - if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX) > >>> >>> - || BB_PARTITION (src) != BB_PARTITION (target)) > >>> >>> + if (BB_PARTITION (src) != BB_PARTITION (target)) > >>> >>> return NULL; > >>> >>> > >>> >>> /* We can replace or remove a complex jump only when we have > exactly > >>> >>> @@ -1286,6 +1287,71 @@ redirect_branch_edge (edge e, basic_block > target) > >>> >>> return e; > >>> >>> } > >>> >>> > >>> >>> +/* Called when edge E has been redirected to a new destination, > >>> >>> + in order to update the region crossing flag on the edge and > >>> >>> + jump. */ > >>> >>> + > >>> >>> +static void > >>> >>> +fixup_partition_crossing (edge e, basic_block target) > >>> >>> +{ > >>> >>> + rtx note; > >>> >>> + > >>> >>> + gcc_assert (e->dest == target); > >>> >>> + > >>> >>> + if (e->src == ENTRY_BLOCK_PTR || target == EXIT_BLOCK_PTR) > >>> >>> + return; > >>> >>> + /* If we redirected an existing edge, it may already be marked > >>> >>> + crossing, even though the new src is missing a reg crossing > note. > >>> >>> + But make sure reg crossing note doesn't already exist before > >>> >>> + inserting. 
*/ > >>> >>> + if (BB_PARTITION (e->src) != BB_PARTITION (target)) > >>> >>> + { > >>> >>> + e->flags |= EDGE_CROSSING; > >>> >>> + note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, > NULL_RTX); > >>> >>> + if (JUMP_P (BB_END (e->src)) > >>> >>> + && !note) > >>> >>> + add_reg_note (BB_END (e->src), REG_CROSSING_JUMP, > NULL_RTX); > >>> >>> + } > >>> >>> + else if (BB_PARTITION (e->src) == BB_PARTITION (target)) > >>> >>> + { > >>> >>> + e->flags &= ~EDGE_CROSSING; > >>> >>> + /* Remove the region crossing note from jump at end of > >>> >>> + e->src if it exists. */ > >>> >>> + note = find_reg_note (BB_END (e->src), REG_CROSSING_JUMP, > NULL_RTX); > >>> >>> + if (note) > >>> >>> + remove_note (BB_END (e->src), note); > >>> >>> + } > >>> >>> +} > >>> >>> + > >>> >>> +/* Called when block BB has been reassigned to a different > partition, > >>> >>> + to ensure that the region crossing attributes are updated. */ > >>> >>> + > >>> >>> +static void > >>> >>> +fixup_bb_partition (basic_block bb) > >>> >>> +{ > >>> >>> + edge e; > >>> >>> + edge_iterator ei; > >>> >>> + > >>> >>> + /* Now need to make bb's pred edges non-region crossing. */ > >>> >>> + FOR_EACH_EDGE (e, ei, bb->preds) > >>> >>> + { > >>> >>> + fixup_partition_crossing (e, e->dest); > >>> >>> + } > >>> >>> + > >>> >>> + /* Possibly need to make bb's successor edges region crossing, > >>> >>> + or remove stale region crossing. */ > >>> >>> + FOR_EACH_EDGE (e, ei, bb->succs) > >>> >>> + { > >>> >>> + if ((e->flags & EDGE_FALLTHRU) > >>> >>> + && BB_PARTITION (bb) != BB_PARTITION (e->dest) > >>> >>> + && e->dest != EXIT_BLOCK_PTR) > >>> >>> + /* force_nonfallthru_and_redirect calls > fixup_partition_crossing. */ > >>> >>> + force_nonfallthru (e); > >>> >>> + else > >>> >>> + fixup_partition_crossing (e, e->dest); > >>> >>> + } > >>> >>> +} > >>> >>> + > >>> >>> /* Attempt to change code to redirect edge E to TARGET. 
Don't do > that on > >>> >>> expense of adding new instructions or reordering basic blocks. > >>> >>> > >>> >>> @@ -1302,16 +1368,18 @@ rtl_redirect_edge_and_branch (edge e, > basic_block > >>> >>> { > >>> >>> edge ret; > >>> >>> basic_block src = e->src; > >>> >>> + basic_block dest = e->dest; > >>> >>> > >>> >>> if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH)) > >>> >>> return NULL; > >>> >>> > >>> >>> - if (e->dest == target) > >>> >>> + if (dest == target) > >>> >>> return e; > >>> >>> > >>> >>> if ((ret = try_redirect_by_replacing_jump (e, target, false)) > != NULL) > >>> >>> { > >>> >>> df_set_bb_dirty (src); > >>> >>> + fixup_partition_crossing (ret, target); > >>> >>> return ret; > >>> >>> } > >>> >>> > >>> >>> @@ -1320,6 +1388,7 @@ rtl_redirect_edge_and_branch (edge e, > basic_block > >>> >>> return NULL; > >>> >>> > >>> >>> df_set_bb_dirty (src); > >>> >>> + fixup_partition_crossing (ret, target); > >>> >>> return ret; > >>> >>> } > >>> >>> > >>> >>> @@ -1454,18 +1523,16 @@ force_nonfallthru_and_redirect (edge e, > basic_bloc > >>> >>> /* Make sure new block ends up in correct hot/cold section. > */ > >>> >>> > >>> >>> BB_COPY_PARTITION (jump_block, e->src); > >>> >>> - if (flag_reorder_blocks_and_partition > >>> >>> - && targetm_common.have_named_sections > >>> >>> - && JUMP_P (BB_END (jump_block)) > >>> >>> - && !any_condjump_p (BB_END (jump_block)) > >>> >>> - && (EDGE_SUCC (jump_block, 0)->flags & EDGE_CROSSING)) > >>> >>> - add_reg_note (BB_END (jump_block), REG_CROSSING_JUMP, > NULL_RTX); > >>> >>> > >>> >>> /* Wire edge in. */ > >>> >>> new_edge = make_edge (e->src, jump_block, EDGE_FALLTHRU); > >>> >>> new_edge->probability = probability; > >>> >>> new_edge->count = count; > >>> >>> > >>> >>> + /* If e->src was previously region crossing, it no longer is > >>> >>> + and the reg crossing note should be removed. */ > >>> >>> + fixup_partition_crossing (new_edge, jump_block); > >>> >>> + > >>> >>> /* Redirect old edge. 
*/ > >>> >>> redirect_edge_pred (e, jump_block); > >>> >>> e->probability = REG_BR_PROB_BASE; > >>> >>> @@ -1521,13 +1588,16 @@ force_nonfallthru_and_redirect (edge e, > basic_bloc > >>> >>> LABEL_NUSES (label)++; > >>> >>> } > >>> >>> > >>> >>> - emit_barrier_after (BB_END (jump_block)); > >>> >>> + /* We might be in cfg layout mode, and if so, the following > routine will > >>> >>> + insert the barrier correctly. */ > >>> >>> + emit_barrier_after_bb (jump_block); > >>> >>> redirect_edge_succ_nodup (e, target); > >>> >>> > >>> >>> if (abnormal_edge_flags) > >>> >>> make_edge (src, target, abnormal_edge_flags); > >>> >>> > >>> >>> df_mark_solutions_dirty (); > >>> >>> + fixup_partition_crossing (e, target); > >>> >>> return new_bb; > >>> >>> } > >>> >>> > >>> >>> @@ -1626,7 +1696,7 @@ rtl_move_block_after (basic_block bb > ATTRIBUTE_UNU > >>> >>> static basic_block > >>> >>> rtl_split_edge (edge edge_in) > >>> >>> { > >>> >>> - basic_block bb; > >>> >>> + basic_block bb, new_bb; > >>> >>> rtx before; > >>> >>> > >>> >>> /* Abnormal edges cannot be split. */ > >>> >>> @@ -1659,12 +1729,26 @@ rtl_split_edge (edge edge_in) > >>> >>> else > >>> >>> { > >>> >>> bb = create_basic_block (before, NULL, > edge_in->dest->prev_bb); > >>> >>> - /* ??? Why not edge_in->dest->prev_bb here? */ > >>> >>> - BB_COPY_PARTITION (bb, edge_in->dest); > >>> >>> + if (edge_in->src == ENTRY_BLOCK_PTR) > >>> >>> + BB_COPY_PARTITION (bb, edge_in->dest); > >>> >>> + else > >>> >>> + /* Put the split bb into the src partition, to avoid > creating > >>> >>> + a situation where a cold bb dominates a hot bb, in the > case > >>> >>> + where src is cold and dest is hot. The src will > dominate > >>> >>> + the new bb (whereas it might not have dominated dest). > */ > >>> >>> + BB_COPY_PARTITION (bb, edge_in->src); > >>> >>> } > >>> >>> > >>> >>> make_single_succ_edge (bb, edge_in->dest, EDGE_FALLTHRU); > >>> >>> > >>> >>> + /* Can't allow a region crossing edge to be fallthrough. 
*/ > >>> >>> + if (BB_PARTITION (bb) != BB_PARTITION (edge_in->dest) > >>> >>> + && edge_in->dest != EXIT_BLOCK_PTR) > >>> >>> + { > >>> >>> + new_bb = force_nonfallthru (single_succ_edge (bb)); > >>> >>> + gcc_assert (!new_bb); > >>> >>> + } > >>> >>> + > >>> >>> /* For non-fallthru edges, we must adjust the predecessor's > >>> >>> jump instruction to target our new block. */ > >>> >>> if ((edge_in->flags & EDGE_FALLTHRU) == 0) > >>> >>> @@ -1777,17 +1861,13 @@ commit_one_edge_insertion (edge e) > >>> >>> else > >>> >>> { > >>> >>> bb = split_edge (e); > >>> >>> - after = BB_END (bb); > >>> >>> > >>> >>> - if (flag_reorder_blocks_and_partition > >>> >>> - && targetm_common.have_named_sections > >>> >>> - && e->src != ENTRY_BLOCK_PTR > >>> >>> - && BB_PARTITION (e->src) == BB_COLD_PARTITION > >>> >>> - && !(e->flags & EDGE_CROSSING) > >>> >>> - && JUMP_P (after) > >>> >>> - && !any_condjump_p (after) > >>> >>> - && (single_succ_edge (bb)->flags & EDGE_CROSSING)) > >>> >>> - add_reg_note (after, REG_CROSSING_JUMP, NULL_RTX); > >>> >>> + /* If e crossed a partition boundary, we needed to make bb > end in > >>> >>> + a region-crossing jump, even though it was originally > fallthru. */ > >>> >>> + if (JUMP_P (BB_END (bb))) > >>> >>> + before = BB_END (bb); > >>> >>> + else > >>> >>> + after = BB_END (bb); > >>> >>> } > >>> >>> > >>> >>> /* Now that we've found the spot, do the insertion. */ > >>> >>> @@ -1827,6 +1907,14 @@ commit_edge_insertions (void) > >>> >>> { > >>> >>> basic_block bb; > >>> >>> > >>> >>> + /* Optimization passes that invoke this routine can cause hot > blocks > >>> >>> + previously reached by both hot and cold blocks to become > dominated only > >>> >>> + by cold blocks. This will cause the verification below to > fail, > >>> >>> + and lead to now cold code in the hot section. In some cases > this > >>> >>> + may only be visible after newly unreachable blocks are > deleted, > >>> >>> + which will be done by fixup_partitions. 
*/ > >>> >>> + fixup_partitions (); > >>> >>> + > >>> >>> #ifdef ENABLE_CHECKING > >>> >>> verify_flow_info (); > >>> >>> #endif > >>> >>> @@ -2028,7 +2116,75 @@ get_last_bb_insn (basic_block bb) > >>> >>> > >>> >>> return end; > >>> >>> } > >>> >>> - > >>> >>> + > >>> >>> +/* Perform cleanup on the hot/cold bb partitioning after > optimization > >>> >>> + passes that modify the cfg. */ > >>> >>> + > >>> >>> +void > >>> >>> +fixup_partitions (void) > >>> >>> +{ > >>> >>> + basic_block bb; > >>> >>> + > >>> >>> + if (!crtl->has_bb_partition) > >>> >>> + return; > >>> >>> + > >>> >>> + /* Delete any blocks that became unreachable and weren't > >>> >>> + already cleaned up, for example during edge forwarding > >>> >>> + and convert_jumps_to_returns. This will expose more > >>> >>> + opportunities for fixing the partition boundaries here. > >>> >>> + Also, the calculation of the dominance graph during > verification > >>> >>> + will assert if there are unreachable nodes. */ > >>> >>> + delete_unreachable_blocks (); > >>> >>> + > >>> >>> + /* If there are partitions, do a sanity check on them: A basic > block in > >>> >>> + a cold partition cannot dominate a basic block in a hot > partition. > >>> >>> + Fixup any that now violate this requirement, as a result of > edge > >>> >>> + forwarding and unreachable block deletion. */ > >>> >>> + VEC (basic_block, heap) *bbs_in_cold_partition = NULL; > >>> >>> + VEC (basic_block, heap) *bbs_to_fix = NULL; > >>> >>> + FOR_EACH_BB (bb) > >>> >>> + if ((BB_PARTITION (bb) == BB_COLD_PARTITION)) > >>> >>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, > bb); > >>> >>> + if (! VEC_empty (basic_block, bbs_in_cold_partition)) > >>> >>> + { > >>> >>> + bool dom_calculated_here = !dom_info_available_p > (CDI_DOMINATORS); > >>> >>> + basic_block son; > >>> >>> + > >>> >>> + if (dom_calculated_here) > >>> >>> + calculate_dominance_info (CDI_DOMINATORS); > >>> >>> + > >>> >>> + while (! 
VEC_empty (basic_block, bbs_in_cold_partition)) > >>> >>> + { > >>> >>> + bb = VEC_pop (basic_block, bbs_in_cold_partition); > >>> >>> + /* If bb is not yet cold (because it was added below as > >>> >>> + a block dominated by a cold bb) then mark it cold > here. */ > >>> >>> + if ((BB_PARTITION (bb) != BB_COLD_PARTITION)) > >>> >>> + { > >>> >>> + BB_SET_PARTITION (bb, BB_COLD_PARTITION); > >>> >>> + VEC_safe_push (basic_block, heap, bbs_to_fix, bb); > >>> >>> + } > >>> >>> + /* Any blocks dominated by a block in the cold section > >>> >>> + must also be cold. */ > >>> >>> + for (son = first_dom_son (CDI_DOMINATORS, bb); > >>> >>> + son; > >>> >>> + son = next_dom_son (CDI_DOMINATORS, son)) > >>> >>> + VEC_safe_push (basic_block, heap, > bbs_in_cold_partition, son); > >>> >>> + } > >>> >>> + > >>> >>> + if (dom_calculated_here) > >>> >>> + free_dominance_info (CDI_DOMINATORS); > >>> >>> + } > >>> >>> + > >>> >>> + /* Do the partition fixup after all necessary blocks have been > converted to > >>> >>> + cold, so that we only update the region crossings the > minimum number of > >>> >>> + places, which can require forcing edges to be non fallthru. > */ > >>> >>> + while (! VEC_empty (basic_block, bbs_to_fix)) > >>> >>> + { > >>> >>> + bb = VEC_pop (basic_block, bbs_to_fix); > >>> >>> + fixup_bb_partition (bb); > >>> >>> + } > >>> >>> +} > >>> >>> + > >>> >>> /* Verify the CFG and RTL consistency common for both underlying > RTL and > >>> >>> cfglayout RTL. > >>> >>> > >>> >>> @@ -2052,6 +2208,7 @@ rtl_verify_flow_info_1 (void) > >>> >>> rtx x; > >>> >>> int err = 0; > >>> >>> basic_block bb; > >>> >>> + bool have_partitions = false; > >>> >>> > >>> >>> /* Check the general integrity of the basic blocks. 
*/ > >>> >>> FOR_EACH_BB_REVERSE (bb) > >>> >>> @@ -2169,6 +2326,8 @@ rtl_verify_flow_info_1 (void) > >>> >>> > >>> >>> if (e->flags & EDGE_ABNORMAL) > >>> >>> n_abnormal++; > >>> >>> + > >>> >>> + have_partitions |= is_crossing; > >>> >>> } > >>> >>> > >>> >>> if (n_eh && !find_reg_note (BB_END (bb), REG_EH_REGION, > NULL_RTX)) > >>> >>> @@ -2293,6 +2452,40 @@ rtl_verify_flow_info_1 (void) > >>> >>> } > >>> >>> } > >>> >>> > >>> >>> + /* If there are partitions, do a sanity check on them: A basic > block in > >>> >>> + a cold partition cannot dominate a basic block in a hot > partition. */ > >>> >>> + VEC (basic_block, heap) *bbs_in_cold_partition = NULL; > >>> >>> + if (have_partitions && !err) > >>> >>> + FOR_EACH_BB (bb) > >>> >>> + if ((BB_PARTITION (bb) == BB_COLD_PARTITION)) > >>> >>> + VEC_safe_push (basic_block, heap, bbs_in_cold_partition, > bb); > >>> >>> + if (! VEC_empty (basic_block, bbs_in_cold_partition)) > >>> >>> + { > >>> >>> + bool dom_calculated_here = !dom_info_available_p > (CDI_DOMINATORS); > >>> >>> + basic_block son; > >>> >>> + > >>> >>> + if (dom_calculated_here) > >>> >>> + calculate_dominance_info (CDI_DOMINATORS); > >>> >>> + > >>> >>> + while (! VEC_empty (basic_block, bbs_in_cold_partition)) > >>> >>> + { > >>> >>> + bb = VEC_pop (basic_block, bbs_in_cold_partition); > >>> >>> + if ((BB_PARTITION (bb) != BB_COLD_PARTITION)) > >>> >>> + { > >>> >>> + error ("non-cold basic block %d dominated " > >>> >>> + "by a block in the cold partition", > bb->index); > >>> >>> + err = 1; > >>> >>> + } > >>> >>> + for (son = first_dom_son (CDI_DOMINATORS, bb); > >>> >>> + son; > >>> >>> + son = next_dom_son (CDI_DOMINATORS, son)) > >>> >>> + VEC_safe_push (basic_block, heap, > bbs_in_cold_partition, son); > >>> >>> + } > >>> >>> + > >>> >>> + if (dom_calculated_here) > >>> >>> + free_dominance_info (CDI_DOMINATORS); > >>> >>> + } > >>> >>> + > >>> >>> /* Clean up. 
*/ > >>> >>> return err; > >>> >>> } > >>> >>> @@ -2965,14 +3158,41 @@ record_effective_endpoints (void) > >>> >>> else > >>> >>> cfg_layout_function_header = NULL_RTX; > >>> >>> > >>> >>> + had_sec_boundary_notes = false; > >>> >>> + > >>> >>> next_insn = get_insns (); > >>> >>> FOR_EACH_BB (bb) > >>> >>> { > >>> >>> rtx end; > >>> >>> > >>> >>> if (PREV_INSN (BB_HEAD (bb)) && next_insn != BB_HEAD (bb)) > >>> >>> - BB_HEADER (bb) = unlink_insn_chain (next_insn, > >>> >>> - PREV_INSN (BB_HEAD > (bb))); > >>> >>> + { > >>> >>> + /* Rather than try to keep section boundary notes > incrementally > >>> >>> + up-to-date through cfg layout optimizations, simply > remove them > >>> >>> + and flag that they should be re-inserted when exiting > >>> >>> + cfg layout mode. */ > >>> >>> + rtx check_insn = next_insn; > >>> >>> + while (check_insn) > >>> >>> + { > >>> >>> + if (NOTE_P (check_insn) > >>> >>> + && NOTE_KIND (check_insn) == > NOTE_INSN_SWITCH_TEXT_SECTIONS) > >>> >>> + { > >>> >>> + had_sec_boundary_notes |= true; > >>> >>> + /* Remove note from chain. Grab new next_insn > first. */ > >>> >>> + if (next_insn == check_insn) > >>> >>> + next_insn = NEXT_INSN (check_insn); > >>> >>> + /* Delete note. */ > >>> >>> + delete_insn (check_insn); > >>> >>> + /* There will only be one. */ > >>> >>> + break; > >>> >>> + } > >>> >>> + check_insn = NEXT_INSN (check_insn); > >>> >>> + } > >>> >>> + /* If we still have header instructions left after > above loop. 
*/ > >>> >>> + if (next_insn != BB_HEAD (bb)) > >>> >>> + BB_HEADER (bb) = unlink_insn_chain (next_insn, > >>> >>> + PREV_INSN > (BB_HEAD (bb))); > >>> >>> + } > >>> >>> end = skip_insns_after_block (bb); > >>> >>> if (NEXT_INSN (BB_END (bb)) && BB_END (bb) != end) > >>> >>> BB_FOOTER (bb) = unlink_insn_chain (NEXT_INSN (BB_END > (bb)), end); > >>> >>> @@ -3000,7 +3220,7 @@ outof_cfg_layout_mode (void) > >>> >>> if (bb->next_bb != EXIT_BLOCK_PTR) > >>> >>> bb->aux = bb->next_bb; > >>> >>> > >>> >>> - cfg_layout_finalize (); > >>> >>> + cfg_layout_finalize (false); > >>> >>> > >>> >>> return 0; > >>> >>> } > >>> >>> @@ -3120,10 +3340,13 @@ relink_block_chain (bool > stay_in_cfglayout_mode) > >>> >>> } > >>> >>> > >>> >>> > >>> >>> -/* Given a reorder chain, rearrange the code to match. */ > >>> >>> +/* Given a reorder chain, rearrange the code to match. If > >>> >>> + this is called when we will FINALIZE_REORDER_BLOCKS, or when > >>> >>> + section boundary notes were removed on entry to cfg layout > >>> >>> + mode, insert section boundary notes here. */ > >>> >>> > >>> >>> static void > >>> >>> -fixup_reorder_chain (void) > >>> >>> +fixup_reorder_chain (bool finalize_reorder_blocks) > >>> >>> { > >>> >>> basic_block bb; > >>> >>> rtx insn = NULL; > >>> >>> @@ -3150,7 +3373,7 @@ static void > >>> >>> PREV_INSN (BB_HEADER (bb)) = insn; > >>> >>> insn = BB_HEADER (bb); > >>> >>> while (NEXT_INSN (insn)) > >>> >>> - insn = NEXT_INSN (insn); > >>> >>> + insn = NEXT_INSN (insn); > >>> >>> } > >>> >>> if (insn) > >>> >>> NEXT_INSN (insn) = BB_HEAD (bb); > >>> >>> @@ -3175,6 +3398,11 @@ static void > >>> >>> insn = NEXT_INSN (insn); > >>> >>> > >>> >>> set_last_insn (insn); > >>> >>> + > >>> >>> + /* Add NOTE_INSN_SWITCH_TEXT_SECTIONS notes. 
*/ > >>> >>> + if (had_sec_boundary_notes || finalize_reorder_blocks) > >>> >>> + insert_section_boundary_note (); > >>> >>> + > >>> >>> #ifdef ENABLE_CHECKING > >>> >>> verify_insn_chain (); > >>> >>> #endif > >>> >>> @@ -3187,7 +3415,7 @@ static void > >>> >>> edge e_fall, e_taken, e; > >>> >>> rtx bb_end_insn; > >>> >>> rtx ret_label = NULL_RTX; > >>> >>> - basic_block nb, src_bb; > >>> >>> + basic_block nb; > >>> >>> edge_iterator ei; > >>> >>> > >>> >>> if (EDGE_COUNT (bb->succs) == 0) > >>> >>> @@ -3322,7 +3550,6 @@ static void > >>> >>> /* We got here if we need to add a new jump insn. > >>> >>> Note force_nonfallthru can delete E_FALL and thus we have > to > >>> >>> save E_FALL->src prior to the call to force_nonfallthru. > */ > >>> >>> - src_bb = e_fall->src; > >>> >>> nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, > ret_label); > >>> >>> if (nb) > >>> >>> { > >>> >>> @@ -3330,17 +3557,6 @@ static void > >>> >>> bb->aux = nb; > >>> >>> /* Don't process this new block. */ > >>> >>> bb = nb; > >>> >>> - > >>> >>> - /* Make sure new bb is tagged for correct section (same > as > >>> >>> - fall-thru source, since you cannot fall-thru across > >>> >>> - section boundaries). */ > >>> >>> - BB_COPY_PARTITION (src_bb, single_pred (bb)); > >>> >>> - if (flag_reorder_blocks_and_partition > >>> >>> - && targetm_common.have_named_sections > >>> >>> - && JUMP_P (BB_END (bb)) > >>> >>> - && !any_condjump_p (BB_END (bb)) > >>> >>> - && (EDGE_SUCC (bb, 0)->flags & EDGE_CROSSING)) > >>> >>> - add_reg_note (BB_END (bb), REG_CROSSING_JUMP, > NULL_RTX); > >>> >>> } > >>> >>> } > >>> >>> > >>> >>> @@ -3644,10 +3860,11 @@ duplicate_insn_chain (rtx from, rtx to) > >>> >>> case NOTE_INSN_FUNCTION_BEG: > >>> >>> /* There is always just single entry to function. */ > >>> >>> case NOTE_INSN_BASIC_BLOCK: > >>> >>> + /* We should only switch text sections once. 
*/ > >>> >>> + case NOTE_INSN_SWITCH_TEXT_SECTIONS: > >>> >>> break; > >>> >>> > >>> >>> case NOTE_INSN_EPILOGUE_BEG: > >>> >>> - case NOTE_INSN_SWITCH_TEXT_SECTIONS: > >>> >>> emit_note_copy (insn); > >>> >>> break; > >>> >>> > >>> >>> @@ -3759,10 +3976,13 @@ break_superblocks (void) > >>> >>> } > >>> >>> > >>> >>> /* Finalize the changes: reorder insn list according to the > sequence specified > >>> >>> - by aux pointers, enter compensation code, rebuild scope > forest. */ > >>> >>> + by aux pointers, enter compensation code, rebuild scope > forest. If > >>> >>> + this is called when we will FINALIZE_REORDER_BLOCKS, indicate > that > >>> >>> + to fixup_reorder_chain so that it can insert the proper switch > text > >>> >>> + section notes. */ > >>> >>> > >>> >>> void > >>> >>> -cfg_layout_finalize (void) > >>> >>> +cfg_layout_finalize (bool finalize_reorder_blocks) > >>> >>> { > >>> >>> #ifdef ENABLE_CHECKING > >>> >>> verify_flow_info (); > >>> >>> @@ -3775,7 +3995,7 @@ void > >>> >>> #endif > >>> >>> ) > >>> >>> fixup_fallthru_exit_predecessor (); > >>> >>> - fixup_reorder_chain (); > >>> >>> + fixup_reorder_chain (finalize_reorder_blocks); > >>> >>> > >>> >>> rebuild_jump_labels (get_insns ()); > >>> >>> delete_dead_jumptables (); > >>> >>> @@ -4454,8 +4674,7 @@ rtl_can_remove_branch_p (const_edge e) > >>> >>> if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH)) > >>> >>> return false; > >>> >>> > >>> >>> - if (find_reg_note (insn, REG_CROSSING_JUMP, NULL_RTX) > >>> >>> - || BB_PARTITION (src) != BB_PARTITION (target)) > >>> >>> + if (BB_PARTITION (src) != BB_PARTITION (target)) > >>> >>> return false; > >>> >>> > >>> >>> if (!onlyjump_p (insn) > >>> >>> > >>> >>> -- > >>> >>> This patch is available for review at > http://codereview.appspot.com/6823047 > >>> >> > >>> >> > >>> >> > >>> >> -- > >>> >> Teresa Johnson | Software Engineer | tejohnson@google.com | > 408-460-2413 > >>> > >>> > >>> > >>> -- > >>> Teresa Johnson | Software Engineer | tejohnson@google.com | > 
408-460-2413 > > > > > > > > -- > > Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413 > -- Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
Hello,

Sorry for the long delay (ref http://patchwork.ozlabs.org/patch/199397/)

On 6 December 2012 20:26, Teresa Johnson <tejohnson@google.com> wrote:
> On Wed, Nov 28, 2012 at 7:48 AM, Christophe Lyon
> <christophe.lyon@linaro.org> wrote:
>>
>> I have updated my trunk checkout, and I can confirm that eval.c now
>> compiles with your patch (and the other 4 patches I added to PR55121).
>
> good
>
>> Now, when looking at the whole Spec2k results:
>> - vpr passes now (used to fail)
>
> good
>
>> - gcc, parser, perlbmk bzip2 and twolf no longer build: they all fail
>> with the same error from gas:
>> can't resolve `.text.unlikely' {.text.unlikely section} - `.LBB171'
>> {.text section}
>> - gap still does not build (same error as above)
>>
>> I haven't looked in detail, so I may be missing an obvious patch here.
>
> Finally had a chance to get back to this. I was able to reproduce the
> failure using x86_64 linux with "-freorder-blocks-and-partition -g".
> However, I am also getting the same failure with a pristine copy of trunk.
> Can you confirm whether you were seeing any of these failures without my
> patches, because I believe they are probably a limitation with function
> splitting and debug info that is orthogonal to my patch.

Yes I confirm that I see these failures without your patch too; and
both -freorder-blocks-and-partition and -g are present in my
command-line.
And now gap's integer.c fails to compile with a similar error message too.

>> And I still observe runtime mis-behaviour on crafty, galgel, facerec and
>> fma3d.
>
> I'm not seeing this on x86_64, unfortunately, so it might require some
> follow-on work to triage and fix.
>
> I'll look into the gas failure, but if someone could review this patch in
> the meantime given that it does improve things considerably (at least
> without -g), that would be great.

Indeed.

> Thanks,
> Teresa

Thanks
Christophe
Thanks for the confirmation that the -g issue is orthogonal. I did start to try to address it but got pulled away by some other things for a while. I'll see if I can take another stab at it.

In the meantime, could one of the global maintainers take a look at the patch? I don't want it to get too stale, and without these fixes I am unable to get -freorder-blocks-and-partition to work at all.

Thanks!
Teresa

On Thu, Jan 31, 2013 at 6:18 AM, Christophe Lyon <christophe.lyon@linaro.org> wrote:
> Hello,
>
> Sorry for the long delay (ref http://patchwork.ozlabs.org/patch/199397/)
>
> On 6 December 2012 20:26, Teresa Johnson <tejohnson@google.com> wrote:
>> On Wed, Nov 28, 2012 at 7:48 AM, Christophe Lyon
>> <christophe.lyon@linaro.org> wrote:
>>>
>>> I have updated my trunk checkout, and I can confirm that eval.c now
>>> compiles with your patch (and the other 4 patches I added to PR55121).
>>
>> good
>>
>>> Now, when looking at the whole Spec2k results:
>>> - vpr passes now (used to fail)
>>
>> good
>>
>>> - gcc, parser, perlbmk bzip2 and twolf no longer build: they all fail
>>> with the same error from gas:
>>> can't resolve `.text.unlikely' {.text.unlikely section} - `.LBB171'
>>> {.text section}
>>> - gap still does not build (same error as above)
>>>
>>> I haven't looked in detail, so I may be missing an obvious patch here.
>>
>> Finally had a chance to get back to this. I was able to reproduce the
>> failure using x86_64 linux with "-freorder-blocks-and-partition -g".
>> However, I am also getting the same failure with a pristine copy of trunk.
>> Can you confirm whether you were seeing any of these failures without my
>> patches, because I believe they are probably a limitation with function
>> splitting and debug info that is orthogonal to my patch.
>>
> Yes I confirm that I see these failures without your patch too; and
> both -freorder-blocks-and-partition and -g are present in my
> command-line.
> And now gap's integer.c fails to compile with a similar error message too.
>
>>> And I still observe runtime mis-behaviour on crafty, galgel, facerec and
>>> fma3d.
>>
>> I'm not seeing this on x86_64, unfortunately, so it might require some
>> follow-on work to triage and fix.
>>
>> I'll look into the gas failure, but if someone could review this patch in
>> the meantime given that it does improve things considerably (at least
>> without -g), that would be great.
>>
> Indeed.
>
>> Thanks,
>> Teresa
>>
>
> Thanks
> Christophe

--
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
Somehow Rietveld didn't upload the patch properly. I've attached the patch to this email instead. Here is the description:

I had sent this patch a while back to address failures when using -freorder-blocks-and-partition. Could one of the global maintainers review it? Without these fixes this option is broken for many codes.

The patch is largely identical to the version sent out before, but I just updated my client and re-did the bootstrap and regression testing on x86_64-unknown-linux-gnu. I also just rebuilt and tested cpu2006 (both int and fp) with profile feedback and -freorder-blocks-and-partition.

Here is the description from the earlier mail:

----------------------
Revised patch that fixes failures encountered when enabling -freorder-blocks-and-partition, including the failure reported in PR 53743.

This includes new verification code, contributed by Steven Bosscher, to ensure no cold blocks dominate hot blocks.

I attempted to make the handling of partition updates through the optimization passes much more consistent, removing a number of partial fixes in the code stream in the process. The code to fix up partitions (including the BB_PARTITION assignment, region crossing jump notes, and switch text section notes) now lives in a few centralized locations, for example inside rtl_redirect_edge_and_branch and force_nonfallthru_and_redirect, so that callers don't need to attempt the fixup themselves.

For optimization passes that make adjustments to the cfg while in cfg layout mode that are not easy to fix up incrementally, the new routine fixup_partitions handles the cleanup globally. This does require calculation of the dominance relation; however, as far as I can tell the routines which now invoke this global fixup (try_optimize_cfg and commit_edge_insertions) are typically invoked once (or a small number of times in the case of try_optimize_cfg) per optimization pass. Additionally, I compared the -ftime-report output for some large FDO compilations and saw only minimal increases in the dominance computation times, which were only a tiny percentage of the overall compile time.

Additionally, I added a flag to the rtl_data structure to indicate whether any partitioning was actually performed, so that optimizations which were conservatively disabled whenever flag_reorder_blocks_and_partition is enabled (e.g. try_crossjump_to_edge, part of connect_traces) can be less conservative for functions where no partitions were formed (e.g. they are completely hot).
----------------------

Ok for trunk?

Thanks,
Teresa

(patch attached)

2013/5/7 Teresa Johnson <tejohnson@google.com>:
>

--
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
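
The rule that the description attributes to fixup_partitions (any block dominated by a cold block must itself be cold) can be illustrated outside of GCC. The following is a minimal sketch on a toy dominator tree, not the code from the patch: struct dom_tree, dom_tree_init and propagate_cold are hypothetical names invented for this illustration, and the tree is assumed to be given as an array of immediate-dominator indices. It walks the tree once from the root, so each block is visited a single time.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BLOCKS 64

enum partition { HOT, COLD };

/* Toy dominator tree (hypothetical, not a GCC data structure).
   first_son[b] and next_sibling[b] hold block indices, or -1 for none. */
struct dom_tree {
  int n_blocks;
  int root;
  int first_son[MAX_BLOCKS];
  int next_sibling[MAX_BLOCKS];
  enum partition part[MAX_BLOCKS];
};

/* Build the child lists from an immediate-dominator array.
   idom[b] is the immediate dominator of block b, or -1 for the entry. */
void dom_tree_init(struct dom_tree *t, int n, const int *idom,
                   const enum partition *part)
{
  assert(n >= 0 && n <= MAX_BLOCKS);
  t->n_blocks = n;
  t->root = -1;
  for (int b = 0; b < n; b++) {
    t->first_son[b] = -1;
    t->next_sibling[b] = -1;
    t->part[b] = part[b];
  }
  for (int b = 0; b < n; b++) {
    if (idom[b] < 0) {
      t->root = b;
    } else {
      /* Prepend b to its dominator's list of sons. */
      t->next_sibling[b] = t->first_son[idom[b]];
      t->first_son[idom[b]] = b;
    }
  }
}

/* Mark every block dominated by a cold block as cold.
   Returns the number of blocks moved from HOT to COLD. */
int propagate_cold(struct dom_tree *t)
{
  int stack[MAX_BLOCKS];
  int top = 0;
  int changed = 0;

  if (t->n_blocks == 0 || t->root < 0)
    return 0;

  /* Each block is pushed at most once, so the stack cannot overflow. */
  stack[top++] = t->root;
  while (top > 0) {
    int bb = stack[--top];
    for (int son = t->first_son[bb]; son != -1; son = t->next_sibling[son]) {
      /* A parent's partition is final before its sons are examined. */
      if (t->part[bb] == COLD && t->part[son] != COLD) {
        t->part[son] = COLD;
        changed++;
      }
      stack[top++] = son;
    }
  }
  return changed;
}
```

In this toy model the blocks that propagate_cold changes correspond to the ones the real routine would additionally have to repair (region-crossing notes, fallthru edges); that repair step is outside the scope of the sketch.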
On 05/07/13 23:13, Teresa Johnson wrote:
> ----------------------
> Revised patch that fixes failures encountered when enabling
> -freorder-blocks-and-partition, including the failure reported in PR 53743.
>
> This includes new verification code to ensure no cold blocks dominate hot
> blocks contributed by Steven Bosscher.

Seems like a reasonable verification; presumably if we have a cold block dominating a hot block, then the block/edge frequencies are badly broken.

Ah, just saw the comments for the other case where this happens: cold entry, but hot loop inside pushing over the barrier. Arguably given a cold block in the dominator graph, all its children should have their frequencies scaled down to avoid that situation.

> Additionally, I added a flag to the rtl_data structure to indicate whether
> any partitioning was actually performed, so that optimizations which were
> conservatively disabled whenever the flag_reorder_blocks_and_partition
> is enabled (e.g. try_crossjump_to_edge, part of connect_traces) can be less
> conservative for functions where no partitions were formed (e.g. they are
> completely hot).
> ----------------------
>
> Ok for trunk?

I can't really comment on the cfglayout and related stuff -- it was added at a time when I wasn't doing much with GCC and thus I don't know much about it.

However, I like the changes to record if we've done partitioning and checking those instead of flag_reorder_blocks_and_partition. That's simple enough that I'd support pulling it out as a separate patch and installing immediately if that can be done without major headaches.

I think we could do something similar with the code to verify the idom of a hot block is also hot. Though looking at the implementation I wonder if it could be simplified by walking the dominator tree? I can't look at it real closely tonight though.

Could you pull those two logical hunks of work out into individual patches?

jeff
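
The difference between gating on the command-line flag and gating on a per-function record can be shown with a small sketch. Everything here is hypothetical: struct function_state stands in for the per-function state the thread calls rtl_data, and the two predicates only model the decision, not any real GCC pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-function state that records
   whether hot/cold partitioning actually produced two partitions. */
struct function_state {
  bool has_bb_partition;
};

/* Conservative gate: the optimization is refused whenever the option
   is on, even for functions that ended up entirely hot. */
bool crossjump_ok_global(bool flag_reorder_blocks_and_partition)
{
  return !flag_reorder_blocks_and_partition;
}

/* Per-function gate: the optimization is refused only when this
   particular function really was split into partitions. */
bool crossjump_ok_per_function(const struct function_state *fn)
{
  assert(fn != NULL);
  return !fn->has_bb_partition;
}
```

With the option enabled, a function whose blocks are all hot is rejected by the first predicate but accepted by the second, which is the extra freedom the flag is meant to provide.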
> On 05/07/13 23:13, Teresa Johnson wrote:
> >----------------------
> >Revised patch that fixes failures encountered when enabling
> >-freorder-blocks-and-partition, including the failure reported in PR 53743.
> >
> >This includes new verification code to ensure no cold blocks dominate hot
> >blocks contributed by Steven Bosscher.
> Seems like a reasonable verification; presumably if we have a cold
> block dominating a hot block, then the block/edge frequencies are
> badly broken. Ah, just saw the comments for the other case where
> this happens. cold entry, but hot loop inside pushing over the
> barrier. Arguably given a cold block in the dominator graph, all its

Yep, also note that sanity checking anything about frequencies is really hard.
There are very many places in the compiler that necessarily need to invalidate
frequencies in weird ways (at least short of rebuilding the whole profile
from probabilities again).

> I can't really comment on the cfglayout and related stuff -- it was
> added at a time when I wasn't doing much with GCC and thus I don't
> know much about it.

I think I can take a look at the cfglayout stuff. Splitting the patch would be great.

Honza

>
> However, I like the changes to record if we've done partitioning and
> checking those instead of flag_reorder_blocks_and_partition. That's
> simple enough that I'd support pulling it out as a separate patch
> and installing immediately if that can be done so without major
> headaches.
>
> I think we could do something similar with the code to verify the
> idom of a hot block is also hot. Though looking at the
> implementation I wonder if it could be simplified by walking the
> dominator tree? I can't look at it real closely tonight though.
>
> Could you pull those two logical hunks of work out into individual
> patches.
>
> jeff
On Fri, May 10, 2013 at 4:52 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> On 05/07/13 23:13, Teresa Johnson wrote:
>> >----------------------
>> >Revised patch that fixes failures encountered when enabling
>> >-freorder-blocks-and-partition, including the failure reported in PR 53743.
>> >
>> >This includes new verification code to ensure no cold blocks dominate hot
>> >blocks contributed by Steven Bosscher.
>> Seems like a reasonable verification; presumably if we have a cold
>> block dominating a hot block, then the block/edge frequencies are
>> badly broken. Ah, just saw the comments for the other case where
>> this happens. cold entry, but hot loop inside pushing over the
>> barrier. Arguably given a cold block in the dominator graph, all its
>> children should have their frequencies scaled down to avoid that situation.
>
> Yep, also note that sanity checking anything about frequencies is really hard.
> There are very many places in compiler that necessarily need to invalidate
> frequencies in weird ways (at least short of rebuilding the whole profile
> from probabilities again).

Yes, as noted in the comments this was in part due to several places where counts/frequencies were not kept in sync. Rather than try to fix all of these, or do any scaling of frequencies, the partitioning code now just enforces that the partitioning is sane w.r.t. the given counts. This is done during bb partitioning. The sanity checking routine was also useful for finding places where optimization passes were splitting edges and causing hot blocks previously reached by both hot and cold blocks to become dominated by cold blocks (see comments in commit_edge_insertions in my patch), and making sure they got fixed up.

But there is the issue of what we should do in the case of an infrequent but non-zero entry (marked cold by maybe_hot_count_p because its count is less than the number of training runs) that leads to a hot loop. The code I added to the partitioning routine (find_rarely_executed_basic_blocks_and_crossing_edges) will cause the entry to also be placed in the hot partition. I would argue this is the desired behavior: if the routine contains code that is very hot for, say, half of its training runs, the entry and hot loop (and everything on the path in between) should be in the hot partition.

>> I can't really comment on the cfglayout and related stuff -- it was
>> added at a time when I wasn't doing much with GCC and thus I don't
>> know much about it.
>
> I think I can take a look at the cfglayout stuff. Splitting the patch would be great.

Thanks, that would be great. I can split the patch first.

>
> Honza
>>
>> However, I like the changes to record if we've done partitioning and
>> checking those instead of flag_reorder_blocks_and_partition. That's
>> simple enough that I'd support pulling it out as a separate patch
>> and installing immediately if that can be done so without major
>> headaches.

Ok, thanks, will do.

>>
>> I think we could do something similar with the code to verify the
>> idom of a hot block is also hot. Though looking at the
>> implementation I wonder if it could be simplified by walking the
>> dominator tree? I can't look at it real closely tonight though.

Looking at this code again, I agree with you. It looks like it is going to walk cold bb's more than once and be O(n^2) in the worst case. I will fix this (there are a couple of places in the patch that do a walk to ensure that this is not violated).

>>
>> Could you pull those two logical hunks of work out into individual
>> patches.

Will do. The only complication with splitting out the dominance checking stuff is that there are a number of changes in the patch to ensure that we don't violate this (hot block can't be dominated by cold block). I am not sure it makes sense or will be easy to split all of these out. I think what I will do is try to pull the big related chunks of them out to a separate patch (the new verification code, the code to prevent this in the partitioning routine, and fixup_partitions), but there are going to be a few places in the other patch that do some fixups related to this (e.g. in rtl_split_edge) that I would like to leave in the larger correctness patch.

Thanks,
Teresa

>>
>> jeff

--
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
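
The simplification discussed above (checking that the immediate dominator of every hot block is also hot) can be sketched as a single linear pass. This is an illustration only, not the follow-up patch: find_partition_violation and its array-based inputs are assumptions made for the example. Checking only immediate dominators is sufficient because, if a cold block dominates a hot block, the chain of immediate dominators from the hot block up to that cold block must contain some adjacent pair with a cold parent and a hot child.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical linear-time check.  idom[bb] is the index of the
   immediate dominator of block bb, or -1 for the entry block.
   Returns the index of the first hot block whose immediate dominator
   is cold, or -1 if the partitioning is consistent.  */
int find_partition_violation(int n_blocks, const int *idom,
                             const bool *is_cold)
{
  for (int bb = 0; bb < n_blocks; bb++) {
    int parent = idom[bb];
    assert(parent < n_blocks);
    /* One comparison per block: no block is examined twice. */
    if (parent >= 0 && is_cold[parent] && !is_cold[bb])
      return bb;
  }
  return -1;
}
```

Unlike a worklist that re-pushes the dominator sons of every cold block, this visits each block exactly once, which avoids the repeated walks over cold blocks noted in the message above.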