Rietveld Code Review Tool

Issue 6351086: [PATCH] New fdo summary-based icache sensitive unrolling

Created:
11 years, 8 months ago by tejohnson
Modified:
9 years, 6 months ago
Reviewers:
davidxl, stevenb.gcc, markus
CC:
hubicka_ucw.cz, gcc-patches_gcc.gnu.org
Base URL:
svn+ssh://gcc.gnu.org/svn/gcc/trunk/
Visibility:
Public.

Patch Set 1 #

Total comments: 5

Patch Set 2 : [PATCH] New fdo summary-based icache sensitive unrolling #

Stats (+428 lines, -28 lines):
M gcc/cfgloop.h        4 chunks   +15 lines, -2 lines
M gcc/cfgloop.c        1 chunk    +84 lines, -10 lines
M gcc/common.opt       2 chunks   +8 lines, -0 lines
M gcc/coverage.c       1 chunk    +8 lines, -0 lines
M gcc/doc/invoke.texi  5 chunks   +27 lines, -2 lines
M gcc/gcov-dump.c      1 chunk    +6 lines, -2 lines
M gcc/gcov-io.h        2 chunks   +5 lines, -1 line
M gcc/gcov-io.c        2 chunks   +4 lines, -0 lines
M gcc/loop-iv.c        1 chunk    +6 lines, -2 lines
M gcc/loop-unroll.c    10 chunks  +127 lines, -8 lines
M gcc/params.def       1 chunk    +17 lines, -0 lines
M libgcc/libgcov.c     3 chunks   +121 lines, -1 line

Messages

Total messages: 17
tejohnson
Ports some patches related to improving FDO program summary information and using it to guide ...
11 years, 8 months ago (2012-07-11 17:43:01 UTC) #1
tejohnson
ping Teresa On Wed, Jul 11, 2012 at 10:42 AM, Teresa Johnson <tejohnson@google.com>wrote: > Ports ...
11 years, 8 months ago (2012-07-18 15:40:08 UTC) #2
tejohnson
Ping (retrying ping in plain text mode so that it goes through properly). Thanks, Teresa ...
11 years, 8 months ago (2012-07-18 15:48:12 UTC) #3
tejohnson
Ping. Teresa On Wed, Jul 18, 2012 at 8:48 AM, Teresa Johnson <tejohnson@google.com> wrote: > ...
11 years, 8 months ago (2012-07-26 15:57:26 UTC) #4
davidxl
http://codereview.appspot.com/6351086/diff/1/gcc/gcov-io.h File gcc/gcov-io.h (right): http://codereview.appspot.com/6351086/diff/1/gcc/gcov-io.h#newcode396 gcc/gcov-io.h:396: gcov_unsigned_t num_hot_counters;/* number of counters to reach a given ...
11 years, 8 months ago (2012-07-26 18:26:04 UTC) #5
tejohnson
On Thu, Jul 26, 2012 at 11:26 AM, <davidxl@google.com> wrote: > > http://codereview.appspot.com/**6351086/diff/1/gcc/gcov-io.h<http://codereview.appspot.com/6351086/diff/1/gcc/gcov-io.h> > File ...
11 years, 8 months ago (2012-07-26 20:32:03 UTC) #6
tejohnson
Resending in plain text mode...sigh. Teresa On Thu, Jul 26, 2012 at 1:32 PM, Teresa ...
11 years, 8 months ago (2012-07-26 20:38:57 UTC) #7
tejohnson
Updated patch based on review feedback. Bootstrapped and tested on x86_64-unknown-linux-gnu. Ok for trunk? Teresa ...
11 years, 8 months ago (2012-07-27 04:47:51 UTC) #8
stevenb.gcc
On Fri, Jul 27, 2012 at 6:47 AM, Teresa Johnson <tejohnson@google.com> wrote: > * gcc/gcov-io.h ...
11 years, 8 months ago (2012-07-27 07:31:59 UTC) #9
tejohnson
On Fri, Jul 27, 2012 at 12:31 AM, Steven Bosscher <stevenb.gcc@gmail.com> wrote: > On Fri, ...
11 years, 8 months ago (2012-07-27 14:33:17 UTC) #10
tejohnson
On Fri, Jul 27, 2012 at 3:14 AM, Jan Hubicka <hubicka@ucw.cz> wrote: > Hi, >> ...
11 years, 8 months ago (2012-07-27 14:57:11 UTC) #11
tejohnson
This patch was proposed awhile back, and the new working set program summary was split ...
11 years, 4 months ago (2012-11-20 00:58:18 UTC) #12
tejohnson
Ping. Teresa On Mon, Nov 19, 2012 at 4:58 PM, Teresa Johnson <tejohnson@google.com> wrote: > ...
11 years, 3 months ago (2012-12-11 17:13:35 UTC) #13
markus_trippelsdorf.de
On 2012.12.11 at 09:13 -0800, Teresa Johnson wrote: > Ping. > Teresa > > On ...
11 years, 3 months ago (2012-12-11 18:03:01 UTC) #14
hubicka_ucw.cz
> On 2012.12.11 at 09:13 -0800, Teresa Johnson wrote: > > Ping. > > Teresa ...
11 years, 3 months ago (2012-12-11 18:12:20 UTC) #15
tejohnson
On Tue, Dec 11, 2012 at 10:12 AM, Jan Hubicka <hubicka@ucw.cz> wrote: >> On 2012.12.11 ...
11 years, 3 months ago (2012-12-11 19:28:36 UTC) #16
tejohnson
11 years, 2 months ago (2013-01-02 15:55:41 UTC) #17
Ping.
Teresa

On Tue, Dec 11, 2012 at 9:13 AM, Teresa Johnson <tejohnson@google.com> wrote:
> Ping.
> Teresa
>
> On Mon, Nov 19, 2012 at 4:58 PM, Teresa Johnson <tejohnson@google.com> wrote:
>> This patch was proposed awhile back, and the new working set program
>> summary was split out and committed separately. I would like to
>> see if the loop unroller part of the patch can now go in. For the
>> previous versions of the patch and related discussion, see:
>>
>> http://gcc.gnu.org/ml/gcc-patches/2012-06/msg00437.html
>> and
>> http://gcc.gnu.org/ml/gcc-patches/2012-07/msg01373.html
>>
>> Use the new working set information in the FDO program summary for
>> codesize based unroll and peel decisions, to prevent those
>> optimizations from increasing code size too much when the
>> program may be sensitive to icache effects.
>>
>> As part of this, cache additional loop analysis results in the niter_desc
>> auxiliary information hanging off the loop structure to reduce
>> redundant analyses during unrolling.
>>
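For illustration, here is a minimal standalone sketch of the heuristic the unroller changes implement. This is not the patch code itself: the helper name estimate_code_size_divisor and the plain long parameter types are stand-ins, and the profile-availability and FP-loop checks of the real code_size_limit_factor (in the diff below) are omitted.

#include <limits.h>

/* Sketch: return the divisor applied to the unroll/peel insn budgets.
   1 means "do not constrain", INT_MAX effectively disables unrolling.  */
static int
estimate_code_size_divisor (long num_hot_counters, long size_threshold,
                            long sum_all, long header_count,
                            long hotness_threshold)
{
  long ratio;

  /* Small estimated footprint: do not constrain unrolling/peeling.  */
  if (num_hot_counters <= size_threshold)
    return 1;

  /* Very large footprint (or no profile count on the loop header):
     disable unrolling/peeling outright.  */
  if (num_hot_counters >= 2 * size_threshold || header_count <= 0)
    return INT_MAX;

  /* In between: the colder the loop entry relative to the whole program,
     the larger the divisor, so only very hot loops keep their budget.  */
  ratio = sum_all / header_count;
  return ratio > hotness_threshold ? (int) (ratio / hotness_threshold) : 1;
}

The callers then divide their insn budgets by the result, as the patch does with nunroll = PARAM_VALUE (PARAM_MAX_UNROLLED_INSNS) / limit_factor / loop->ninsns.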
>> Bootstrapped and tested on x86_64-unknown-linux-gnu. Ok for trunk?
>>
>> Thanks,
>> Teresa
>>
>> 2012-11-19  Teresa Johnson  <tejohnson@google.com>
>>
>>         * gcc/doc/invoke.texi: Document new options
>>         -fpeel-codesize-limit and -funroll-codesize-limit, and new params
>>         codesize-hotness-threshold and unrollpeel-hotness-threshold.
>>         * gcc/loop-unroll.c (code_size_limit_factor): New function.
>>         (decide_unroll_runtime_iterations): Call code_size_limit_factor
>>         to control the unroll factor, and retrieve number of branches from
>>         niter_desc instead of via function that walks loop.
>>         (decide_peel_simple, decide_unroll_stupid): Ditto.
>>         * gcc/loop-iv.c (get_simple_loop_desc): Invoke new analyze_loop_insns
>>         function, and add guards to enable this function to work for the
>>         outermost loop.
>>         * gcc/common.opt: Add -fpeel-codesize-limit and
>>         -funroll-codesize-limit.
>>         * gcc/cfgloop.c (insn_has_fp_set, analyze_loop_insns): New functions.
>>         (num_loop_branches): Remove.
>>         * gcc/cfgloop.h (struct niter_desc): Added new fields to cache
>>         additional loop analysis information.
>>         (num_loop_branches): Remove.
>>         (analyze_loop_insns): Declare.
>>         * gcc/params.def (PARAM_UNROLLPEEL_CODESIZE_THRESHOLD): Add.
>>         (PARAM_UNROLLPEEL_HOTNESS_THRESHOLD): Ditto.
>>
>> Index: doc/invoke.texi
>> ===================================================================
>> --- doc/invoke.texi     (revision 193626)
>> +++ doc/invoke.texi     (working copy)
>> @@ -389,7 +389,7 @@ Objective-C and Objective-C++ Dialects}.
>>  -fno-sched-interblock -fno-sched-spec -fno-signed-zeros @gol
>>  -fno-toplevel-reorder -fno-trapping-math -fno-zero-initialized-in-bss @gol
>>  -fomit-frame-pointer -foptimize-register-move -foptimize-sibling-calls @gol
>> --fpartial-inlining -fpeel-loops -fpredictive-commoning @gol
>> +-fpartial-inlining -fpeel-codesize-limit -fpeel-loops -fpredictive-commoning @gol
>>  -fprefetch-loop-arrays -fprofile-report @gol
>>  -fprofile-correction -fprofile-dir=@var{path} -fprofile-generate @gol
>>  -fprofile-generate=@var{path} @gol
>> @@ -421,7 +421,7 @@ Objective-C and Objective-C++ Dialects}.
>>  -ftree-reassoc -ftree-sink -ftree-slsr -ftree-sra @gol
>>  -ftree-switch-conversion -ftree-tail-merge @gol
>>  -ftree-ter -ftree-vect-loop-version -ftree-vectorize -ftree-vrp @gol
>> --funit-at-a-time -funroll-all-loops -funroll-loops @gol
>> +-funit-at-a-time -funroll-all-loops -funroll-loops -funroll-codesize-limit @gol
>>  -funsafe-loop-optimizations -funsafe-math-optimizations -funswitch-loops @gol
>>  -fvariable-expansion-in-unroller -fvect-cost-model -fvpt -fweb @gol
>>  -fwhole-program -fwpa -fuse-linker-plugin @gol
>> @@ -8771,6 +8771,14 @@ the loop is entered.  This usually makes programs
>>  @option{-funroll-all-loops} implies the same options as
>>  @option{-funroll-loops}.
>>
>> +@item -funroll-codesize-limit
>> +@opindex funroll-codesize-limit
>> +Limit loop unrolling of non-const non-FP loops in a profile feedback compilation
>> +under estimates of a large code footprint. Enabled by default with
>> +@option{-fprofile-use}. Code size and execution weight thresholds are controlled
>> +by the @option{unrollpeel-codesize-threshold} and
>> +@option{unrollpeel-hotness-threshold} parameters.
>> +
>>  @item -fpeel-loops
>>  @opindex fpeel-loops
>>  Peels loops for which there is enough information that they do not
>> @@ -8779,6 +8787,14 @@ roll much (from profile feedback).  It also turns
>>
>>  Enabled with @option{-fprofile-use}.
>>
>> +@item -fpeel-codesize-limit
>> +@opindex fpeel-codesize-limit
>> +Limit loop peeling of non-const non-FP loops in a profile feedback compilation
>> +under estimates of a large code footprint. Enabled by default with
>> +@option{-fprofile-use}. Code size and execution weight thresholds are controlled
>> +by the @option{unrollpeel-codesize-threshold} and
>> +@option{unrollpeel-hotness-threshold} parameters.
>> +
>>  @item -fmove-loop-invariants
>>  @opindex fmove-loop-invariants
>>  Enables the loop invariant motion pass in the RTL loop optimizer.  Enabled
>> @@ -9142,6 +9158,15 @@ The maximum number of iterations of a loop to be s
>>  @item max-completely-peel-loop-nest-depth
>>  The maximum depth of a loop nest suitable for complete peeling.
>>
>> +@item unrollpeel-codesize-threshold
>> +Maximum profile-based code size footprint estimate for loop unrolling and
>> +peeling.
>> +
>> +@item unrollpeel-hotness-threshold
>> +Maximum ratio of total execution count to loop entry block count under which
>> +most profile-based code size estimates will be ignored for loop unrolling and
>> +peeling.
>> +
>>  @item max-unswitch-insns
>>  The maximum number of insns of an unswitched loop.
>>
>> Index: loop-unroll.c
>> ===================================================================
>> --- loop-unroll.c       (revision 193626)
>> +++ loop-unroll.c       (working copy)
>> @@ -32,6 +32,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "hashtab.h"
>>  #include "recog.h"
>>  #include "target.h"
>> +#include "gcov-io.h"
>>  #include "dumpfile.h"
>>
>>  /* This pass performs loop unrolling and peeling.  We only perform these
>> @@ -147,6 +148,82 @@ static void combine_var_copies_in_loop_exit (struc
>>                                              basic_block);
>>  static rtx get_expansion (struct var_to_expand *);
>>
>> +/* Determine whether and how much LOOP unrolling/peeling should be constrained
>> +   based on code footprint estimates. Returns the codesize-based factor to be
>> +   divided into the max instructions in an unrolled or peeled loop:
>> +   1) For size <= threshold, do not limit (by returning 1).
>> +   2) For threshold < size < 2*threshold, reduce maximum allowed peeled or
>> +      unrolled instructions according to loop hotness.
>> +   3) For size >= 2*threshold, disable unrolling/peeling (by returning
>> +      INT_MAX).  */
>> +
>> +static int
>> +code_size_limit_factor(struct loop *loop)
>> +{
>> +  unsigned size_threshold, num_hot_counters;
>> +  struct niter_desc *desc = get_simple_loop_desc (loop);
>> +  gcov_type sum_to_header_ratio;
>> +  int hotness_ratio_threshold;
>> +  int limit_factor;
>> +  gcov_working_set_t *ws;
>> +
>> +  ws = find_working_set(999);
>> +  if (! ws)
>> +    return 1;
>> +  num_hot_counters = ws->num_counters;
>> +
>> +  /* First check if the application has a large codesize footprint.
>> +     This is estimated from FDO profile summary information for the
>> +     program, where the num_hot_counters indicates the number of hottest
>> +     counters (blocks) that compose most of the execution time of
>> +     the program. A large value would indicate a large flat execution
>> +     profile where icache misses may be a concern.  */
>> +  size_threshold = PARAM_VALUE (PARAM_UNROLLPEEL_CODESIZE_THRESHOLD);
>> +  if (!profile_info
>> +      || num_hot_counters <= size_threshold
>> +      || !profile_info->sum_all)
>> +    return 1;
>> +
>> +  /* Next, exclude some loops where unrolling/peeling may be more
>> +     important to overall performance.  */
>> +
>> +  /* Ignore FP loops, which are more likely to benefit heavily from
>> +     unrolling. */
>> +  if (desc->has_fp)
>> +    return 1;
>> +
>> +  /* Next, set the value of the codesize-based unroll factor divisor which in
>> +     most loops will need to be set to a value that will reduce or eliminate
>> +     unrolling/peeling.  */
>> +  if (num_hot_counters < size_threshold * 2
>> +      && loop->header->count > 0)
>> +    {
>> +      /* For applications that are less than twice the codesize limit, allow
>> +         limited unrolling for very hot loops.  */
>> +      sum_to_header_ratio = profile_info->sum_all / loop->header->count;
>> +      hotness_ratio_threshold = PARAM_VALUE (PARAM_UNROLLPEEL_HOTNESS_THRESHOLD);
>> +      /* When the profile count sum to loop entry header ratio is smaller than
>> +         the threshold (i.e. the loop entry is hot enough), the divisor is set
>> +         to 1 so the unroll/peel factor is not reduced. When it is bigger
>> +         than the ratio, increase the divisor by the amount this ratio
>> +         is over the threshold, which will quickly reduce the unroll/peel
>> +         factor to zero as the loop's hotness reduces.  */
>> +      if (sum_to_header_ratio > hotness_ratio_threshold)
>> +        {
>> +          limit_factor = sum_to_header_ratio / hotness_ratio_threshold;
>> +          gcc_assert (limit_factor >= 1);
>> +        }
>> +      else
>> +        limit_factor = 1;
>> +    }
>> +  else
>> +    /* For applications that are at least twice the codesize limit, set
>> +       the divisor to a large value that will force the unroll factor to 0.  */
>> +    limit_factor = INT_MAX;
>> +
>> +  return limit_factor;
>> +}
>> +
>>  /* Unroll and/or peel (depending on FLAGS) LOOPS.  */
>>  void
>>  unroll_and_peel_loops (int flags)
>> @@ -822,6 +899,7 @@ decide_unroll_runtime_iterations (struct loop *loo
>>  {
>>    unsigned nunroll, nunroll_by_av, i;
>>    struct niter_desc *desc;
>> +  int limit_factor = 1;
>>    double_int iterations;
>>
>>    if (!(flags & UAP_UNROLL))
>> @@ -835,10 +913,26 @@ decide_unroll_runtime_iterations (struct loop *loo
>>              "\n;; Considering unrolling loop with runtime "
>>              "computable number of iterations\n");
>>
>> +  if (flag_unroll_codesize_limit)
>> +    {
>> +      /* Determine whether to limit code size growth from unrolling,
>> +         using FDO profile summary information that gives an
>> +         estimated number of executed blocks.  */
>> +      limit_factor = code_size_limit_factor (loop);
>> +      if (dump_file && limit_factor > 1)
>> +        {
>> +          fprintf (dump_file,
>> +                   ";; Due to large code size footprint estimate, limit "
>> +                   "max unrolled insns by divisor %d\n", limit_factor);
>> +        }
>> +    }
>> +
>>    /* nunroll = total number of copies of the original loop body in
>>       unrolled loop (i.e. if it is 2, we have to duplicate loop body once.  */
>> -  nunroll = PARAM_VALUE (PARAM_MAX_UNROLLED_INSNS) / loop->ninsns;
>> -  nunroll_by_av = PARAM_VALUE (PARAM_MAX_AVERAGE_UNROLLED_INSNS) / loop->av_ninsns;
>> +  nunroll = PARAM_VALUE (PARAM_MAX_UNROLLED_INSNS) / limit_factor
>> +            / loop->ninsns;
>> +  nunroll_by_av = PARAM_VALUE (PARAM_MAX_AVERAGE_UNROLLED_INSNS)
>> +                  / limit_factor / loop->av_ninsns;
>>    if (nunroll > nunroll_by_av)
>>      nunroll = nunroll_by_av;
>>    if (nunroll > (unsigned) PARAM_VALUE (PARAM_MAX_UNROLL_TIMES))
>> @@ -1229,6 +1323,8 @@ decide_peel_simple (struct loop *loop, int flags)
>>  {
>>    unsigned npeel;
>>    double_int iterations;
>> +  struct niter_desc *desc;
>> +  int limit_factor = 1;
>>
>>    if (!(flags & UAP_PEEL))
>>      {
>> @@ -1239,8 +1335,23 @@ decide_peel_simple (struct loop *loop, int flags)
>>    if (dump_file)
>>      fprintf (dump_file, "\n;; Considering simply peeling loop\n");
>>
>> +  if (flag_peel_codesize_limit)
>> +    {
>> +      /* Determine whether to limit code size growth from peeling,
>> +         using FDO profile summary information that gives an
>> +         estimated number of executed blocks.  */
>> +      limit_factor = code_size_limit_factor (loop);
>> +      if (dump_file && limit_factor > 1)
>> +       {
>> +          fprintf (dump_file,
>> +                   ";; Due to large code size footprint estimate, limit "
>> +                   "max peeled insns by divisor %d\n", limit_factor);
>> +       }
>> +    }
>> +
>>    /* npeel = number of iterations to peel.  */
>> -  npeel = PARAM_VALUE (PARAM_MAX_PEELED_INSNS) / loop->ninsns;
>> +  npeel = PARAM_VALUE (PARAM_MAX_PEELED_INSNS) / limit_factor
>> +          / loop->ninsns;
>>    if (npeel > (unsigned) PARAM_VALUE (PARAM_MAX_PEEL_TIMES))
>>      npeel = PARAM_VALUE (PARAM_MAX_PEEL_TIMES);
>>
>> @@ -1253,7 +1364,7 @@ decide_peel_simple (struct loop *loop, int flags)
>>      }
>>
>>    /* Do not simply peel loops with branches inside -- it increases number
>> -     of mispredicts.
>> +     of mispredicts.
>>       Exception is when we do have profile and we however have good chance
>>       to peel proper number of iterations loop will iterate in practice.
>>       TODO: this heuristic needs tunning; while for complette unrolling
>> @@ -1261,7 +1372,8 @@ decide_peel_simple (struct loop *loop, int flags)
>>       peeling it is not the case.  Also a function call inside loop is
>>       also branch from branch prediction POV (and probably better reason
>>       to not unroll/peel).  */
>> -  if (num_loop_branches (loop) > 1
>> +  desc = get_simple_loop_desc (loop);
>> +  if (desc->num_branches > 1
>>        && profile_status != PROFILE_READ)
>>      {
>>        if (dump_file)
>> @@ -1387,6 +1499,7 @@ decide_unroll_stupid (struct loop *loop, int flags
>>  {
>>    unsigned nunroll, nunroll_by_av, i;
>>    struct niter_desc *desc;
>> +  int limit_factor = 1;
>>    double_int iterations;
>>
>>    if (!(flags & UAP_UNROLL_ALL))
>> @@ -1398,11 +1511,26 @@ decide_unroll_stupid (struct loop *loop, int flags
>>    if (dump_file)
>>      fprintf (dump_file, "\n;; Considering unrolling loop stupidly\n");
>>
>> +  if (flag_unroll_codesize_limit)
>> +    {
>> +      /* Determine whether to limit code size growth from unrolling,
>> +         using FDO profile summary information that gives an
>> +         estimated number of executed blocks.  */
>> +      limit_factor = code_size_limit_factor (loop);
>> +      if (dump_file && limit_factor > 1)
>> +       {
>> +          fprintf (dump_file,
>> +                   ";; Due to large code size footprint estimate, limit "
>> +                   "max unrolled insns by divisor %d\n", limit_factor);
>> +       }
>> +    }
>> +
>>    /* nunroll = total number of copies of the original loop body in
>>       unrolled loop (i.e. if it is 2, we have to duplicate loop body once.  */
>> -  nunroll = PARAM_VALUE (PARAM_MAX_UNROLLED_INSNS) / loop->ninsns;
>> -  nunroll_by_av
>> -    = PARAM_VALUE (PARAM_MAX_AVERAGE_UNROLLED_INSNS) / loop->av_ninsns;
>> +  nunroll = PARAM_VALUE (PARAM_MAX_UNROLLED_INSNS) / limit_factor
>> +            / loop->ninsns;
>> +  nunroll_by_av = PARAM_VALUE (PARAM_MAX_AVERAGE_UNROLLED_INSNS)
>> +                  / limit_factor / loop->av_ninsns;
>>    if (nunroll > nunroll_by_av)
>>      nunroll = nunroll_by_av;
>>    if (nunroll > (unsigned) PARAM_VALUE (PARAM_MAX_UNROLL_TIMES))
>> @@ -1434,7 +1562,7 @@ decide_unroll_stupid (struct loop *loop, int flags
>>       of mispredicts.
>>       TODO: this heuristic needs tunning; call inside the loop body
>>       is also relatively good reason to not unroll.  */
>> -  if (num_loop_branches (loop) > 1)
>> +  if (desc->num_branches > 1)
>>      {
>>        if (dump_file)
>>         fprintf (dump_file, ";; Not unrolling, contains branches\n");
>> Index: loop-iv.c
>> ===================================================================
>> --- loop-iv.c   (revision 193626)
>> +++ loop-iv.c   (working copy)
>> @@ -3006,8 +3006,12 @@ get_simple_loop_desc (struct loop *loop)
>>    /* At least desc->infinite is not always initialized by
>>       find_simple_loop_exit.  */
>>    desc = XCNEW (struct niter_desc);
>> -  iv_analysis_loop_init (loop);
>> -  find_simple_exit (loop, desc);
>> +  if (loop->latch != EXIT_BLOCK_PTR)
>> +    {
>> +      iv_analysis_loop_init (loop);
>> +      find_simple_exit (loop, desc);
>> +    }
>> +  analyze_loop_insns (loop, desc);
>>    loop->aux = desc;
>>
>>    if (desc->simple_p && (desc->assumptions || desc->infinite))
>> Index: common.opt
>> ===================================================================
>> --- common.opt  (revision 193626)
>> +++ common.opt  (working copy)
>> @@ -1573,6 +1573,10 @@ fpcc-struct-return
>>  Common Report Var(flag_pcc_struct_return,1) Init(DEFAULT_PCC_STRUCT_RETURN)
>>  Return small aggregates in memory, not registers
>>
>> +fpeel-codesize-limit
>> +Common Report Var(flag_peel_codesize_limit) Init(1) Optimization
>> +Limit non-const non-FP loop peeling under profile estimates of large code footprint
>> +
>>  fpeel-loops
>>  Common Report Var(flag_peel_loops) Optimization
>>  Perform loop peeling
>> @@ -2138,6 +2142,10 @@ funroll-all-loops
>>  Common Report Var(flag_unroll_all_loops) Optimization
>>  Perform loop unrolling for all loops
>>
>> +funroll-codesize-limit
>> +Common Report Var(flag_unroll_codesize_limit) Init(1) Optimization
>> +Limit non-const non-FP loop unrolling under profile estimates of large code footprint
>> +
>>  ; Nonzero means that loop optimizer may assume that the induction variables
>>  ; that control loops do not overflow and that the loops with nontrivial
>>  ; exit condition are not infinite
>> Index: cfgloop.c
>> ===================================================================
>> --- cfgloop.c   (revision 193626)
>> +++ cfgloop.c   (working copy)
>> @@ -1148,24 +1148,98 @@ get_loop_exit_edges (const struct loop *loop)
>>    return edges;
>>  }
>>
>> -/* Counts the number of conditional branches inside LOOP.  */
>> +/* Determine if INSN is a floating point set.  */
>>
>> -unsigned
>> -num_loop_branches (const struct loop *loop)
>> +static bool
>> +insn_has_fp_set(rtx insn)
>>  {
>> -  unsigned i, n;
>> -  basic_block * body;
>> +  int i;
>> +  rtx pat = PATTERN(insn);
>> +  if (GET_CODE (pat) == SET)
>> +    return (FLOAT_MODE_P (GET_MODE (SET_DEST (pat))));
>> +  else if (GET_CODE (pat) == PARALLEL)
>> +    {
>> +      for (i = 0; i < XVECLEN (pat, 0); i++)
>> +        {
>> +          rtx sub = XVECEXP (pat, 0, i);
>> +          if (GET_CODE (sub) == SET)
>> +            return (FLOAT_MODE_P (GET_MODE (SET_DEST (sub))));
>> +        }
>> +    }
>> +  return false;
>> +}
>>
>> -  gcc_assert (loop->latch != EXIT_BLOCK_PTR);
>> +/* Analyzes the instructions inside LOOP, updating the DESC. Currently counts
>> +   the number of conditional branch instructions, calls and fp instructions,
>> +   as well as the average number of branches executed per iteration.  */
>>
>> +void
>> +analyze_loop_insns (const struct loop *loop, struct niter_desc *desc)
>> +{
>> +  unsigned i, nbranch;
>> +  gcov_type weighted_nbranch;
>> +  bool has_call, has_fp;
>> +  basic_block * body, bb;
>> +  rtx insn;
>> +  gcov_type header_count = loop->header->count;
>> +
>> +  nbranch = weighted_nbranch = 0;
>> +  has_call = has_fp = false;
>> +
>>    body = get_loop_body (loop);
>> -  n = 0;
>>    for (i = 0; i < loop->num_nodes; i++)
>> -    if (EDGE_COUNT (body[i]->succs) >= 2)
>> -      n++;
>> +    {
>> +      bb = body[i];
>> +
>> +      if (EDGE_COUNT (bb->succs) >= 2)
>> +        {
>> +          nbranch++;
>> +
>> +          /* If this block is executed less frequently than the header (loop
>> +             entry), then it is weighted based on its execution count, which
>> +             will be turned into a ratio compared to the loop header below. */
>> +          if (bb->count < header_count)
>> +            weighted_nbranch += bb->count;
>> +
>> +          /* When it is executed more frequently than the header (i.e. it is
>> +             in a nested inner loop), simply weight the branch the same as the
>> +             header execution count, so that it will contribute 1 branch to
>> +             the ratio computed below. */
>> +          else
>> +            weighted_nbranch += header_count;
>> +        }
>> +
>> +      /* No need to iterate through the instructions below if
>> +         both flags have already been set.  */
>> +      if (has_call && has_fp)
>> +        continue;
>> +
>> +      FOR_BB_INSNS (bb, insn)
>> +        {
>> +          if (!INSN_P (insn))
>> +            continue;
>> +
>> +          if (!has_call)
>> +            has_call = CALL_P (insn);
>> +
>> +          if (!has_fp)
>> +            has_fp = insn_has_fp_set (insn);
>> +        }
>> +    }
>>    free (body);
>>
>> -  return n;
>> +  desc->num_branches = nbranch;
>> +  /* Now divide the weights computed above by the loop header execution count,
>> +     to compute the average number of branches through the loop. By adding
>> +     header_count/2 to the numerator we round to nearest with integer
>> +     division.  */
>> +  if (header_count  != 0)
>> +    desc->av_num_branches
>> +        = (weighted_nbranch + header_count/2) / header_count;
>> +  else
>> +    desc->av_num_branches = 0;
>> +  desc->has_call = has_call;
>> +  desc->has_fp = has_fp;
>>  }
>>
>>  /* Adds basic block BB to LOOP.  */
>> Index: cfgloop.h
>> ===================================================================
>> --- cfgloop.h   (revision 193626)
>> +++ cfgloop.h   (working copy)
>> @@ -252,7 +252,6 @@ extern basic_block *get_loop_body_in_custom_order
>>  extern vec<edge> get_loop_exit_edges (const struct loop *);
>>  extern edge single_exit (const struct loop *);
>>  extern edge single_likely_exit (struct loop *loop);
>> -extern unsigned num_loop_branches (const struct loop *);
>>
>>  extern edge loop_preheader_edge (const struct loop *);
>>  extern edge loop_latch_edge (const struct loop *);
>> @@ -365,7 +364,8 @@ struct rtx_iv
>>  };
>>
>>  /* The description of an exit from the loop and of the number of iterations
>> -   till we take the exit.  */
>> +   till we take the exit. Also includes other information used primarily
>> +   by the loop unroller.  */
>>
>>  struct niter_desc
>>  {
>> @@ -403,6 +403,18 @@ struct niter_desc
>>
>>    /* The number of iterations of the loop.  */
>>    rtx niter_expr;
>> +
>> +  /* The number of branches in the loop.  */
>> +  unsigned num_branches;
>> +
>> +  /* The number of executed branches per iteration.  */
>> +  unsigned av_num_branches;
>> +
>> +  /* Whether the loop contains a call instruction.  */
>> +  bool has_call;
>> +
>> +  /* Whether the loop contains fp instructions.  */
>> +  bool has_fp;
>>  };
>>
>>  extern void iv_analysis_loop_init (struct loop *);
>> @@ -416,6 +428,7 @@ extern void iv_analysis_done (void);
>>
>>  extern struct niter_desc *get_simple_loop_desc (struct loop *loop);
>>  extern void free_simple_loop_desc (struct loop *loop);
>> +void analyze_loop_insns (const struct loop *, struct niter_desc *desc);
>>
>>  static inline struct niter_desc *
>>  simple_loop_desc (struct loop *loop)
>> Index: params.def
>> ===================================================================
>> --- params.def  (revision 193626)
>> +++ params.def  (working copy)
>> @@ -321,6 +321,23 @@ DEFPARAM(PARAM_MAX_UNROLL_ITERATIONS,
>>          "max-completely-peel-loop-nest-depth",
>>          "The maximum depth of a loop nest we completely peel",
>>          8, 0, 0)
>> +/* The maximum code size estimate under which loop unrolling and peeling
>> + * is allowed in a profile feedback compile. This currently applies to loops
>> + * with non-constant iteration counts and no floating point computations.  */
>> +DEFPARAM(PARAM_UNROLLPEEL_CODESIZE_THRESHOLD,
>> +        "unrollpeel-codesize-threshold",
>> +        "Maximum profile-based code size footprint estimate for loop
>> unrolling "
>> +        "and peeling",
>> +        15000, 0, 0)
>> +/* The maximum ratio of total profiled execution counts to loop entry block
>> +   count that must be exceeded to ignore most code size limits when unrolling
>> +   and peeling.  */
>> +DEFPARAM(PARAM_UNROLLPEEL_HOTNESS_THRESHOLD,
>> +        "unrollpeel-hotness-threshold",
>> +        "Maximum ratio of total profiled execution count to loop entry "
>> +        "block count under which most codesize limits for unrolling and "
>> +        "peeling will be ignored",
>> +        100, 1, 0)
>>
>>  /* The maximum number of insns of an unswitched loop.  */
>>  DEFPARAM(PARAM_MAX_UNSWITCH_INSNS,
>>
>> On Thu, Jul 26, 2012 at 9:47 PM, Teresa Johnson <tejohnson@google.com> wrote:
>>> Updated patch based on review feedback.
>>>
>>> Bootstrapped and tested on x86_64-unknown-linux-gnu. Ok for trunk?
>>>
>>> Teresa
>>>
>>>
>>> Original patch description:
>>>
>>> Ports some patches related to improving FDO program summary information
>>> and using it to guide loop unrolling from google branches to mainline.
>>> The patch is enhanced to add additional summary information to aid
>>> in determining hot/cold decisions.
>>>
>>> The original patch description is at:
>>>   http://gcc.gnu.org/ml/gcc-patches/2012-06/msg00437.html
>>> and further discussion about incorporating onto mainline is at:
>>>   http://gcc.gnu.org/ml/gcc-patches/2012-06/threads.html#00414
>>>
>>> Honza, can you take a look to see if this patch meets your needs?
>>>
>>> Full description:
>>>
>>> This patch adds new program summary information to the gcov
>>> profile files that indicate how many profiled counts compose
>>> the majority of the program's execution time. This is used to
>>> provide an indication of the overall code size of the program.
>>>
>>> The new profile summary information is then used to guide
>>> codesize based unroll and peel decisions, to prevent those
>>> optimizations from increasing code size too much when the
>>> program may be sensitive to icache effects.
>>>
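Concretely, the new summary answers the question "how many of the hottest arc counters are needed to cover roughly 99.9% of all executed counts?". A simplified sketch of what gcov_compute_cutoff_values (in the libgcov.c hunk below) computes at profile-dump time might look like the following; the per-object counter walking, the GCOV_HOTCODE_CUTOFF_TIMES_10 override, and the overflow handling are omitted, and plain long long stands in for gcov_type:

#include <stdlib.h>

/* qsort helper: sort counter values in descending order.  */
static int
cmp_desc (const void *pa, const void *pb)
{
  const long long a = *(const long long *) pa;
  const long long b = *(const long long *) pb;
  return (b > a) - (b < a);
}

/* Return how many of the hottest counters are needed to cover
   cutoff_per_mille (e.g. 999 = 99.9%) of the total count SUM_ALL.  */
static unsigned
count_hot_counters (long long *counters, unsigned n, long long sum_all,
                    unsigned cutoff_per_mille)
{
  long long cutoff = sum_all / 1000 * cutoff_per_mille;
  long long cum = 0;
  unsigned i;

  qsort (counters, n, sizeof *counters, cmp_desc);
  for (i = 0; i < n && cum < cutoff; i++)
    cum += counters[i];
  return i;
}

A large result indicates a flat profile spread over a lot of code, which the unroller changes below treat as a sign that icache pressure matters more than additional unrolling or peeling.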
>>> This patch also pulls in dependent portions of google/main r187660 that cache
>>> additional loop analysis results in the niter_desc auxiliary information
>>> hanging off the loop structure (the optimization portions of that
>>> change are not included here, and have an outstanding review request
>>> for mainline).
>>>
>>> Bootstrapped and tested on x86_64-unknown-linux-gnu. Ok for trunk?
>>>
>>> Thanks,
>>> Teresa
>>>
>>> 2012-07-26  Teresa Johnson  <tejohnson@google.com>
>>>
>>>         * libgcc/libgcov.c (sort_by_reverse_gcov_value): New function.
>>>         (gcov_compute_cutoff_values): Ditto.
>>>         (gcov_exit): Call gcov_compute_cutoff_values and merge new summary
>>>         information.
>>>         * gcc/doc/invoke.texi (roll much): Document new options
>>>         -fpeel-codesize-limit and -funroll-codesize-limit, and new params
>>>         codesize-hotness-threshold and unrollpeel-hotness-threshold.
>>>         * gcc/gcov-io.c (gcov_write_summary): Write new summary info.
>>>         (gcov_read_summary): Read new summary info.
>>>         * gcc/gcov-io.h (GCOV_TAG_SUMMARY_LENGTH): Update for new summary info.
>>>         (struct gcov_ctr_summary): Add new summary info: num_hot_counters and
>>>         hot_cutoff_value.
>>>         * gcc/loop-unroll.c (code_size_limit_factor): New function.
>>>         (decide_unroll_runtime_iterations): Call code_size_limit_factor
>>>         to control the unroll factor, and retrieve number of branches from
>>>         niter_desc instead of via function that walks loop.
>>>         (decide_peel_simple, decide_unroll_stupid): Ditto.
>>>         * gcc/coverage.c (read_counts_file): Propagate new summary info.
>>>         * gcc/loop-iv.c (get_simple_loop_desc): Invoke new analyze_loop_insns
>>>         function, and add guards to enable this function to work for the
>>>         outermost loop.
>>>         * gcc/common.opt: Add -fpeel-codesize-limit and
>>>         -funroll-codesize-limit.
>>>         * gcc/cfgloop.c (insn_has_fp_set, analyze_loop_insns): New functions.
>>>         (num_loop_branches): Remove.
>>>         * gcc/cfgloop.h (struct niter_desc): Added new fields to cache
>>>         additional loop analysis information.
>>>         (num_loop_branches): Remove.
>>>         (analyze_loop_insns): Declare.
>>>         * gcc/params.def (PARAM_UNROLLPEEL_CODESIZE_THRESHOLD): Add.
>>>         (PARAM_UNROLLPEEL_HOTNESS_THRESHOLD): Ditto.
>>>         * gcc/gcov-dump.c (tag_summary): Dump new summary info.
>>>
>>> Index: libgcc/libgcov.c
>>> ===================================================================
>>> --- libgcc/libgcov.c    (revision 189893)
>>> +++ libgcc/libgcov.c    (working copy)
>>> @@ -276,6 +276,120 @@ gcov_version (struct gcov_info *ptr, gcov_unsigned
>>>    return 1;
>>>  }
>>>
>>> +/* Used by qsort to sort gcov values in descending order.  */
>>> +
>>> +static int
>>> +sort_by_reverse_gcov_value (const void *pa, const void *pb)
>>> +{
>>> +  const gcov_type a = *(gcov_type const *)pa;
>>> +  const gcov_type b = *(gcov_type const *)pb;
>>> +
>>> +  if (b > a)
>>> +    return 1;
>>> +  else if (b == a)
>>> +    return 0;
>>> +  else
>>> +    return -1;
>>> +}
>>> +
>>> +/* Determines the number of counters required to cover a given percentage
>>> +   of the total sum of execution counts in the summary, which is then also
>>> +   recorded in SUM.  */
>>> +
>>> +static void
>>> +gcov_compute_cutoff_values (struct gcov_summary *sum)
>>> +{
>>> +  struct gcov_info *gi_ptr;
>>> +  const struct gcov_fn_info *gfi_ptr;
>>> +  const struct gcov_ctr_info *ci_ptr;
>>> +  struct gcov_ctr_summary *cs_ptr;
>>> +  unsigned t_ix, f_ix, i, ctr_info_ix, index;
>>> +  gcov_unsigned_t c_num;
>>> +  gcov_type *value_array;
>>> +  gcov_type cum, cum_cutoff;
>>> +  char *cutoff_str;
>>> +  unsigned cutoff_perc;
>>> +
>>> +#define CUM_CUTOFF_PERCENT_TIMES_10 999
>>> +  cutoff_str = getenv ("GCOV_HOTCODE_CUTOFF_TIMES_10");
>>> +  if (cutoff_str && strlen (cutoff_str))
>>> +    cutoff_perc = atoi (cutoff_str);
>>> +  else
>>> +    cutoff_perc = CUM_CUTOFF_PERCENT_TIMES_10;
>>> +
>>> +  /* This currently only applies to arc counters.  */
>>> +  t_ix = GCOV_COUNTER_ARCS;
>>> +
>>> +  /* First check if there are any counts recorded for this counter.  */
>>> +  cs_ptr = &(sum->ctrs[t_ix]);
>>> +  if (!cs_ptr->num)
>>> +    return;
>>> +
>>> +  /* Determine the cumulative counter value at the specified cutoff
>>> +     percentage and record the percentage for use by gcov consumers.
>>> +     Check for overflow when sum_all is multiplied by the cutoff_perc,
>>> +     and if so, do the divide first.  */
>>> +  if ((cs_ptr->sum_all * cutoff_perc) / cutoff_perc != cs_ptr->sum_all)
>>> +    /* Overflow, do the divide first.  */
>>> +    cum_cutoff = cs_ptr->sum_all / 1000 * cutoff_perc;
>>> +  else
>>> +    /* Otherwise multiply first to get the correct value for small
>>> +       values of sum_all.  */
>>> +    cum_cutoff = (cs_ptr->sum_all * cutoff_perc) / 1000;
>>> +
>>> +  /* Next, walk through all the per-object structures and save each of
>>> +     the count values in value_array.  */
>>> +  index = 0;
>>> +  value_array = (gcov_type *) malloc (sizeof (gcov_type) * cs_ptr->num);
>>> +  for (gi_ptr = gcov_list; gi_ptr; gi_ptr = gi_ptr->next)
>>> +    {
>>> +      if (!gi_ptr->merge[t_ix])
>>> +        continue;
>>> +
>>> +      /* Find the appropriate index into the gcov_ctr_info array
>>> +         for the counter we are currently working on based on the
>>> +         existence of the merge function pointer for this object.  */
>>> +      for (i = 0, ctr_info_ix = 0; i < t_ix; i++)
>>> +        {
>>> +          if (gi_ptr->merge[i])
>>> +            ctr_info_ix++;
>>> +        }
>>> +      for (f_ix = 0; f_ix != gi_ptr->n_functions; f_ix++)
>>> +        {
>>> +          gfi_ptr = gi_ptr->functions[f_ix];
>>> +
>>> +          if (!gfi_ptr || gfi_ptr->key != gi_ptr)
>>> +            continue;
>>> +
>>> +          ci_ptr = &gfi_ptr->ctrs[ctr_info_ix];
>>> +          /* Sanity check that there are enough entries in value_array
>>> +            for this function's counters. Gracefully handle the case when
>>> +            there are not, in case something in the profile info is
>>> +            corrupted.  */
>>> +          c_num = ci_ptr->num;
>>> +          if (index + c_num > cs_ptr->num)
>>> +            c_num = cs_ptr->num - index;
>>> +          /* Copy over this function's counter values.  */
>>> +          memcpy (&value_array[index], ci_ptr->values,
>>> +                  sizeof (gcov_type) * c_num);
>>> +          index += c_num;
>>> +        }
>>> +    }
>>> +
>>> +  /* Sort all the counter values by descending value and finally
>>> +     accumulate the values from hottest on down until reaching
>>> +     the cutoff value computed earlier.  */
>>> +  qsort (value_array, cs_ptr->num, sizeof (gcov_type),
>>> +         sort_by_reverse_gcov_value);
>>> +  for (cum = 0, c_num = 0; c_num < cs_ptr->num && cum < cum_cutoff; c_num++)
>>> +    cum += value_array[c_num];
>>> +  /* Record the number of counters required to reach the cutoff value,
>>> +     as well as the counter value that caused cum to reach the cutoff.  */
>>> +  cs_ptr->num_hot_counters = c_num;
>>> +  cs_ptr->hot_cutoff_value = c_num > 0 ? value_array[c_num-1] : 0;
>>> +  free (value_array);
>>> +}
>>> +
>>>  /* Dump the coverage counts. We merge with existing counts when
>>>     possible, to avoid growing the .da files ad infinitum. We use this
>>>     program's checksum to make sure we only accumulate whole program
>>> @@ -347,6 +461,7 @@ gcov_exit (void)
>>>             }
>>>         }
>>>      }
>>> +  gcov_compute_cutoff_values (&this_prg);
>>>
>>>    {
>>>      /* Check if the level of dirs to strip off specified. */
>>> @@ -598,7 +713,12 @@ gcov_exit (void)
>>>           if (gi_ptr->merge[t_ix])
>>>             {
>>>               if (!cs_prg->runs++)
>>> -               cs_prg->num = cs_tprg->num;
>>> +                cs_prg->num = cs_tprg->num;
>>> +              if (cs_tprg->num_hot_counters > cs_prg->num_hot_counters)
>>> +              {
>>> +                cs_prg->num_hot_counters = cs_tprg->num_hot_counters;
>>> +                cs_prg->hot_cutoff_value = cs_tprg->hot_cutoff_value;
>>> +              }
>>>               cs_prg->sum_all += cs_tprg->sum_all;
>>>               if (cs_prg->run_max < cs_tprg->run_max)
>>>                 cs_prg->run_max = cs_tprg->run_max;
>>> Index: gcc/doc/invoke.texi
>>> ===================================================================
>>> --- gcc/doc/invoke.texi (revision 189893)
>>> +++ gcc/doc/invoke.texi (working copy)
>>> @@ -385,7 +385,7 @@ Objective-C and Objective-C++ Dialects}.
>>>  -fno-sched-interblock -fno-sched-spec -fno-signed-zeros @gol
>>>  -fno-toplevel-reorder -fno-trapping-math -fno-zero-initialized-in-bss @gol
>>>  -fomit-frame-pointer -foptimize-register-move -foptimize-sibling-calls @gol
>>> --fpartial-inlining -fpeel-loops -fpredictive-commoning @gol
>>> +-fpartial-inlining -fpeel-codesize-limit -fpeel-loops -fpredictive-commoning @gol
>>>  -fprefetch-loop-arrays @gol
>>>  -fprofile-correction -fprofile-dir=@var{path} -fprofile-generate @gol
>>>  -fprofile-generate=@var{path} @gol
>>> @@ -417,7 +417,7 @@ Objective-C and Objective-C++ Dialects}.
>>>  -ftree-reassoc -ftree-sink -ftree-slsr -ftree-sra @gol
>>>  -ftree-switch-conversion -ftree-tail-merge @gol
>>>  -ftree-ter -ftree-vect-loop-version -ftree-vectorize -ftree-vrp @gol
>>> --funit-at-a-time -funroll-all-loops -funroll-loops @gol
>>> +-funit-at-a-time -funroll-all-loops -funroll-loops -funroll-codesize-limit @gol
>>>  -funsafe-loop-optimizations -funsafe-math-optimizations -funswitch-loops @gol
>>>  -fvariable-expansion-in-unroller -fvect-cost-model -fvpt -fweb @gol
>>>  -fwhole-program -fwpa -fuse-linker-plugin @gol
>>> @@ -8527,6 +8527,14 @@ the loop is entered.  This usually makes programs
>>>  @option{-funroll-all-loops} implies the same options as
>>>  @option{-funroll-loops}.
>>>
>>> +@item -funroll-codesize-limit
>>> +@opindex funroll-codesize-limit
>>> +Limit loop unrolling of non-const non-FP loops in a profile feedback compilation
>>> +under estimates of a large code footprint. Enabled by default with
>>> +@option{-fprofile-use}. Code size and execution weight thresholds are controlled
>>> +by the @option{unrollpeel-codesize-threshold} and
>>> +@option{unrollpeel-hotness-threshold} parameters.
>>> +
>>>  @item -fpeel-loops
>>>  @opindex fpeel-loops
>>>  Peels loops for which there is enough information that they do not
>>> @@ -8535,6 +8543,14 @@ roll much (from profile feedback).  It also turns
>>>
>>>  Enabled with @option{-fprofile-use}.
>>>
>>> +@item -fpeel-codesize-limit
>>> +@opindex fpeel-codesize-limit
>>> +Limit loop peeling of non-const non-FP loops in a profile feedback compilation
>>> +under estimates of a large code footprint. Enabled by default with
>>> +@option{-fprofile-use}. Code size and execution weight thresholds are controlled
>>> +by the @option{unrollpeel-codesize-threshold} and
>>> +@option{unrollpeel-hotness-threshold} parameters.
>>> +
>>>  @item -fmove-loop-invariants
>>>  @opindex fmove-loop-invariants
>>>  Enables the loop invariant motion pass in the RTL loop optimizer.  Enabled
>>> @@ -8886,6 +8902,15 @@ The maximum number of iterations of a loop to be s
>>>  @item max-completely-peel-loop-nest-depth
>>>  The maximum depth of a loop nest suitable for complete peeling.
>>>
>>> +@item unrollpeel-codesize-threshold
>>> +Maximum profile-based code size footprint estimate for loop unrolling and
>>> +peeling.
>>> +
>>> +@item unrollpeel-hotness-threshold
>>> +Maximum ratio of total execution count to loop entry block count under which
>>> +most profile-based code size estimates will be ignored for loop unrolling and
>>> +peeling.
>>> +
>>>  @item max-unswitch-insns
>>>  The maximum number of insns of an unswitched loop.
>>>
>>> Index: gcc/gcov-io.c
>>> ===================================================================
>>> --- gcc/gcov-io.c       (revision 189893)
>>> +++ gcc/gcov-io.c       (working copy)
>>> @@ -376,6 +376,8 @@ gcov_write_summary (gcov_unsigned_t tag, const str
>>>    for (csum = summary->ctrs, ix = GCOV_COUNTERS_SUMMABLE; ix--; csum++)
>>>      {
>>>        gcov_write_unsigned (csum->num);
>>> +      gcov_write_unsigned (csum->num_hot_counters);
>>> +      gcov_write_counter (csum->hot_cutoff_value);
>>>        gcov_write_unsigned (csum->runs);
>>>        gcov_write_counter (csum->sum_all);
>>>        gcov_write_counter (csum->run_max);
>>> @@ -495,6 +497,8 @@ gcov_read_summary (struct gcov_summary *summary)
>>>    for (csum = summary->ctrs, ix = GCOV_COUNTERS_SUMMABLE; ix--; csum++)
>>>      {
>>>        csum->num = gcov_read_unsigned ();
>>> +      csum->num_hot_counters = gcov_read_unsigned ();
>>> +      csum->hot_cutoff_value = gcov_read_counter ();
>>>        csum->runs = gcov_read_unsigned ();
>>>        csum->sum_all = gcov_read_counter ();
>>>        csum->run_max = gcov_read_counter ();
>>> Index: gcc/gcov-io.h
>>> ===================================================================
>>> --- gcc/gcov-io.h       (revision 189893)
>>> +++ gcc/gcov-io.h       (working copy)
>>> @@ -310,7 +310,7 @@ typedef HOST_WIDEST_INT gcov_type;
>>>  #define GCOV_TAG_OBJECT_SUMMARY  ((gcov_unsigned_t)0xa1000000) /* Obsolete */
>>>  #define GCOV_TAG_PROGRAM_SUMMARY ((gcov_unsigned_t)0xa3000000)
>>>  #define GCOV_TAG_SUMMARY_LENGTH  \
>>> -       (1 + GCOV_COUNTERS_SUMMABLE * (2 + 3 * 2))
>>> +       (1 + GCOV_COUNTERS_SUMMABLE * (3 + 4 * 2))
>>>
>>>  /* Counters that are collected.  */
>>>  #define GCOV_COUNTER_ARCS      0  /* Arc transitions.  */
>>> @@ -393,6 +393,10 @@ typedef HOST_WIDEST_INT gcov_type;
>>>  struct gcov_ctr_summary
>>>  {
>>>    gcov_unsigned_t num;         /* number of counters.  */
>>> +  gcov_unsigned_t num_hot_counters;/* number of counters to reach a given
>>> +                                      percent of sum_all.  */
>>> +  gcov_type hot_cutoff_value;   /* value of smallest counter included in
>>> +                                   num_hot_counters.  */
>>>    gcov_unsigned_t runs;                /* number of program runs */
>>>    gcov_type sum_all;           /* sum of all counters accumulated.  */
>>>    gcov_type run_max;           /* maximum value on a single run.  */
>>> Index: gcc/loop-unroll.c
>>> ===================================================================
>>> --- gcc/loop-unroll.c   (revision 189893)
>>> +++ gcc/loop-unroll.c   (working copy)
>>> @@ -32,6 +32,7 @@ along with GCC; see the file COPYING3.  If not see
>>>  #include "hashtab.h"
>>>  #include "recog.h"
>>>  #include "target.h"
>>> +#include "gcov-io.h"
>>>  #include "dumpfile.h"
>>>
>>>  /* This pass performs loop unrolling and peeling.  We only perform these
>>> @@ -151,6 +152,75 @@ static void combine_var_copies_in_loop_exit (struc
>>>                                              basic_block);
>>>  static rtx get_expansion (struct var_to_expand *);
>>>
>>> +/* Determine whether and how much LOOP unrolling/peeling should be constrained
>>> +   based on code footprint estimates. Returns the codesize-based factor to be
>>> +   divided into the max instructions in an unrolled or peeled loop:
>>> +   1) For size <= threshold, do not limit (by returning 1).
>>> +   2) For threshold < size < 2*threshold, reduce maximum allowed peeled or
>>> +      unrolled instructions according to loop hotness.
>>> +   3) For size >= 2*threshold, disable unrolling/peeling (by returning
>>> +      INT_MAX).  */
>>> +
>>> +static int
>>> +code_size_limit_factor(struct loop *loop)
>>> +{
>>> +  unsigned size_threshold;
>>> +  struct niter_desc *desc = get_simple_loop_desc (loop);
>>> +  gcov_type sum_to_header_ratio;
>>> +  int hotness_ratio_threshold;
>>> +  int limit_factor;
>>> +
>>> +  /* First check if the application has a large codesize footprint.
>>> +     This is estimated from FDO profile summary information for the
>>> +     program, where the num_hot_counters indicates the number of hottest
>>> +     counters (blocks) that compose most of the execution time of
>>> +     the program. A large value would indicate a large flat execution
>>> +     profile where icache misses may be a concern.  */
>>> +  size_threshold = PARAM_VALUE (PARAM_UNROLLPEEL_CODESIZE_THRESHOLD);
>>> +  if (!profile_info
>>> +      || profile_info->num_hot_counters <= size_threshold
>>> +      || !profile_info->sum_all)
>>> +    return 1;
>>> +
>>> +  /* Next, exclude some loops where unrolling/peeling may be more
>>> +     important to overall performance.  */
>>> +
>>> +  /* Ignore FP loops, which are more likely to benefit heavily from
>>> +     unrolling. */
>>> +  if (desc->has_fp)
>>> +    return 1;
>>> +
>>> +  /* Next, set the value of the codesize-based unroll factor divisor which in
>>> +     most loops will need to be set to a value that will reduce or eliminate
>>> +     unrolling/peeling.  */
>>> +  if (profile_info->num_hot_counters < size_threshold * 2)
>>> +    {
>>> +      /* For applications that are less than twice the codesize limit, allow
>>> +         limited unrolling for very hot loops.  */
>>> +      sum_to_header_ratio = profile_info->sum_all / loop->header->count;
>>> +      hotness_ratio_threshold = PARAM_VALUE (PARAM_UNROLLPEEL_HOTNESS_THRESHOLD);
>>> +      /* When the profile count sum to loop entry header ratio is smaller than
>>> +         the threshold (i.e. the loop entry is hot enough), the divisor is set
>>> +         to 1 so the unroll/peel factor is not reduced. When it is bigger
>>> +         than the ratio, increase the divisor by the amount this ratio
>>> +         is over the threshold, which will quickly reduce the unroll/peel
>>> +         factor to zero as the loop's hotness reduces.  */
>>> +      if (sum_to_header_ratio > hotness_ratio_threshold)
>>> +        {
>>> +          limit_factor = sum_to_header_ratio / hotness_ratio_threshold;
>>> +          gcc_assert (limit_factor >= 1);
>>> +        }
>>> +      else
>>> +        limit_factor = 1;
>>> +    }
>>> +  else
>>> +    /* For applications that are at least twice the codesize limit, set
>>> +       the divisor to a large value that will force the unroll factor to 0.  */
>>> +    limit_factor = INT_MAX;
>>> +
>>> +  return limit_factor;
>>> +}
>>> +
>>>  /* Unroll and/or peel (depending on FLAGS) LOOPS.  */
>>>  void
>>>  unroll_and_peel_loops (int flags)
>>> @@ -803,6 +873,7 @@ decide_unroll_runtime_iterations (struct loop *loo
>>>  {
>>>    unsigned nunroll, nunroll_by_av, i;
>>>    struct niter_desc *desc;
>>> +  int limit_factor = 1;
>>>
>>>    if (!(flags & UAP_UNROLL))
>>>      {
>>> @@ -815,10 +886,26 @@ decide_unroll_runtime_iterations (struct loop *loo
>>>              "\n;; Considering unrolling loop with runtime "
>>>              "computable number of iterations\n");
>>>
>>> +  if (flag_unroll_codesize_limit)
>>> +    {
>>> +      /* Determine whether to limit code size growth from unrolling,
>>> +         using FDO profile summary information that gives an
>>> +         estimated number of executed blocks.  */
>>> +      limit_factor = code_size_limit_factor (loop);
>>> +      if (dump_file && limit_factor > 1)
>>> +        {
>>> +          fprintf (dump_file,
>>> +                   ";; Due to large code size footprint estimate, limit "
>>> +                   "max unrolled insns by divisor %d\n", limit_factor);
>>> +        }
>>> +    }
>>> +
>>>    /* nunroll = total number of copies of the original loop body in
>>>       unrolled loop (i.e. if it is 2, we have to duplicate loop body once.  */
>>> -  nunroll = PARAM_VALUE (PARAM_MAX_UNROLLED_INSNS) / loop->ninsns;
>>> -  nunroll_by_av = PARAM_VALUE (PARAM_MAX_AVERAGE_UNROLLED_INSNS) / loop->av_ninsns;
>>> +  nunroll = PARAM_VALUE (PARAM_MAX_UNROLLED_INSNS) / limit_factor
>>> +            / loop->ninsns;
>>> +  nunroll_by_av = PARAM_VALUE (PARAM_MAX_AVERAGE_UNROLLED_INSNS)
>>> +                  / limit_factor / loop->av_ninsns;
>>>    if (nunroll > nunroll_by_av)
>>>      nunroll = nunroll_by_av;
>>>    if (nunroll > (unsigned) PARAM_VALUE (PARAM_MAX_UNROLL_TIMES))
>>> @@ -1196,6 +1283,7 @@ decide_peel_simple (struct loop *loop, int flags)
>>>  {
>>>    unsigned npeel;
>>>    struct niter_desc *desc;
>>> +  int limit_factor = 1;
>>>
>>>    if (!(flags & UAP_PEEL))
>>>      {
>>> @@ -1206,8 +1294,23 @@ decide_peel_simple (struct loop *loop, int flags)
>>>    if (dump_file)
>>>      fprintf (dump_file, "\n;; Considering simply peeling loop\n");
>>>
>>> +  if (flag_peel_codesize_limit)
>>> +    {
>>> +      /* Determine whether to limit code size growth from peeling,
>>> +         using FDO profile summary information that gives an
>>> +         estimated number of executed blocks.  */
>>> +      limit_factor = code_size_limit_factor (loop);
>>> +      if (dump_file && limit_factor > 1)
>>> +       {
>>> +          fprintf (dump_file,
>>> +                   ";; Due to large code size footprint estimate, limit "
>>> +                   "max peeled insns by divisor %d\n", limit_factor);
>>> +       }
>>> +    }
>>> +
>>>    /* npeel = number of iterations to peel.  */
>>> -  npeel = PARAM_VALUE (PARAM_MAX_PEELED_INSNS) / loop->ninsns;
>>> +  npeel = PARAM_VALUE (PARAM_MAX_PEELED_INSNS) / limit_factor
>>> +          / loop->ninsns;
>>>    if (npeel > (unsigned) PARAM_VALUE (PARAM_MAX_PEEL_TIMES))
>>>      npeel = PARAM_VALUE (PARAM_MAX_PEEL_TIMES);
>>>
>>> @@ -1232,7 +1335,7 @@ decide_peel_simple (struct loop *loop, int flags)
>>>
>>>    /* Do not simply peel loops with branches inside -- it increases number
>>>       of mispredicts.  */
>>> -  if (num_loop_branches (loop) > 1)
>>> +  if (desc->num_branches > 1)
>>>      {
>>>        if (dump_file)
>>>         fprintf (dump_file, ";; Not peeling, contains branches\n");
>>> @@ -1349,6 +1452,7 @@ decide_unroll_stupid (struct loop *loop, int flags
>>>  {
>>>    unsigned nunroll, nunroll_by_av, i;
>>>    struct niter_desc *desc;
>>> +  int limit_factor = 1;
>>>
>>>    if (!(flags & UAP_UNROLL_ALL))
>>>      {
>>> @@ -1359,11 +1463,26 @@ decide_unroll_stupid (struct loop *loop, int flags
>>>    if (dump_file)
>>>      fprintf (dump_file, "\n;; Considering unrolling loop stupidly\n");
>>>
>>> +  if (flag_unroll_codesize_limit)
>>> +    {
>>> +      /* Determine whether to limit code size growth from unrolling,
>>> +         using FDO profile summary information that gives an
>>> +         estimated number of executed blocks.  */
>>> +      limit_factor = code_size_limit_factor (loop);
>>> +      if (dump_file && limit_factor > 1)
>>> +       {
>>> +          fprintf (dump_file,
>>> +                   ";; Due to large code size footprint estimate, limit "
>>> +                   "max unrolled insns by divisor %d\n", limit_factor);
>>> +       }
>>> +    }
>>> +
>>>    /* nunroll = total number of copies of the original loop body in
>>>       unrolled loop (i.e. if it is 2, we have to duplicate loop body once.  */
>>> -  nunroll = PARAM_VALUE (PARAM_MAX_UNROLLED_INSNS) / loop->ninsns;
>>> -  nunroll_by_av
>>> -    = PARAM_VALUE (PARAM_MAX_AVERAGE_UNROLLED_INSNS) / loop->av_ninsns;
>>> +  nunroll = PARAM_VALUE (PARAM_MAX_UNROLLED_INSNS) / limit_factor
>>> +            / loop->ninsns;
>>> +  nunroll_by_av = PARAM_VALUE (PARAM_MAX_AVERAGE_UNROLLED_INSNS)
>>> +                  / limit_factor / loop->av_ninsns;
>>>    if (nunroll > nunroll_by_av)
>>>      nunroll = nunroll_by_av;
>>>    if (nunroll > (unsigned) PARAM_VALUE (PARAM_MAX_UNROLL_TIMES))
>>> @@ -1393,7 +1512,7 @@ decide_unroll_stupid (struct loop *loop, int flags
>>>
>>>    /* Do not unroll loops with branches inside -- it increases number
>>>       of mispredicts.  */
>>> -  if (num_loop_branches (loop) > 1)
>>> +  if (desc->num_branches > 1)
>>>      {
>>>        if (dump_file)
>>>         fprintf (dump_file, ";; Not unrolling, contains branches\n");
>>> Index: gcc/coverage.c
>>> ===================================================================
>>> --- gcc/coverage.c      (revision 189893)
>>> +++ gcc/coverage.c      (working copy)
>>> @@ -247,6 +247,14 @@ read_counts_file (void)
>>>           gcov_read_summary (&sum);
>>>           for (ix = 0; ix != GCOV_COUNTERS_SUMMABLE; ix++)
>>>             {
>>> +              if (summary.ctrs[ix].num_hot_counters
>>> +                  < sum.ctrs[ix].num_hot_counters)
>>> +                {
>>> +                 summary.ctrs[ix].num_hot_counters
>>> +                      = sum.ctrs[ix].num_hot_counters;
>>> +                 summary.ctrs[ix].hot_cutoff_value
>>> +                      = sum.ctrs[ix].hot_cutoff_value;
>>> +                }
>>>               summary.ctrs[ix].runs += sum.ctrs[ix].runs;
>>>               summary.ctrs[ix].sum_all += sum.ctrs[ix].sum_all;
>>>               if (summary.ctrs[ix].run_max < sum.ctrs[ix].run_max)
>>> Index: gcc/loop-iv.c
>>> ===================================================================
>>> --- gcc/loop-iv.c       (revision 189893)
>>> +++ gcc/loop-iv.c       (working copy)
>>> @@ -2970,8 +2970,12 @@ get_simple_loop_desc (struct loop *loop)
>>>    /* At least desc->infinite is not always initialized by
>>>       find_simple_loop_exit.  */
>>>    desc = XCNEW (struct niter_desc);
>>> -  iv_analysis_loop_init (loop);
>>> -  find_simple_exit (loop, desc);
>>> +  if (loop->latch != EXIT_BLOCK_PTR)
>>> +    {
>>> +      iv_analysis_loop_init (loop);
>>> +      find_simple_exit (loop, desc);
>>> +    }
>>> +  analyze_loop_insns (loop, desc);
>>>    loop->aux = desc;
>>>
>>>    if (desc->simple_p && (desc->assumptions || desc->infinite))
>>> Index: gcc/common.opt
>>> ===================================================================
>>> --- gcc/common.opt      (revision 189893)
>>> +++ gcc/common.opt      (working copy)
>>> @@ -1551,6 +1551,10 @@ fpcc-struct-return
>>>  Common Report Var(flag_pcc_struct_return,1) Init(DEFAULT_PCC_STRUCT_RETURN)
>>>  Return small aggregates in memory, not registers
>>>
>>> +fpeel-codesize-limit
>>> +Common Report Var(flag_peel_codesize_limit) Init(1) Optimization
>>> +Limit non-const non-FP loop peeling under profile estimates of large code footprint
>>> +
>>>  fpeel-loops
>>>  Common Report Var(flag_peel_loops) Optimization
>>>  Perform loop peeling
>>> @@ -2112,6 +2116,10 @@ funroll-all-loops
>>>  Common Report Var(flag_unroll_all_loops) Optimization
>>>  Perform loop unrolling for all loops
>>>
>>> +funroll-codesize-limit
>>> +Common Report Var(flag_unroll_codesize_limit) Init(1) Optimization
>>> +Limit non-const non-FP loop unrolling under profile estimates of large code footprint
>>> +
>>>  ; Nonzero means that loop optimizer may assume that the induction variables
>>>  ; that control loops do not overflow and that the loops with nontrivial
>>>  ; exit condition are not infinite
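(Both new flags are on by default (Init(1)); like other Common boolean options
they get the usual negative forms, so -fno-unroll-codesize-limit and
-fno-peel-codesize-limit should be enough to compare an FDO build against the
old behavior.)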
>>> Index: gcc/cfgloop.c
>>> ===================================================================
>>> --- gcc/cfgloop.c       (revision 189893)
>>> +++ gcc/cfgloop.c       (working copy)
>>> @@ -1154,24 +1154,98 @@ get_loop_exit_edges (const struct loop *loop)
>>>    return edges;
>>>  }
>>>
>>> -/* Counts the number of conditional branches inside LOOP.  */
>>> +/* Determine if INSN is a floating point set.  */
>>>
>>> -unsigned
>>> -num_loop_branches (const struct loop *loop)
>>> +static bool
>>> +insn_has_fp_set(rtx insn)
>>>  {
>>> -  unsigned i, n;
>>> -  basic_block * body;
>>> +  int i;
>>> +  rtx pat = PATTERN(insn);
>>> +  if (GET_CODE (pat) == SET)
>>> +    return (FLOAT_MODE_P (GET_MODE (SET_DEST (pat))));
>>> +  else if (GET_CODE (pat) == PARALLEL)
>>> +    {
>>> +      for (i = 0; i < XVECLEN (pat, 0); i++)
>>> +        {
>>> +          rtx sub = XVECEXP (pat, 0, i);
>>> +          if (GET_CODE (sub) == SET)
>>> +            return (FLOAT_MODE_P (GET_MODE (SET_DEST (sub))));
>>> +        }
>>> +    }
>>> +  return false;
>>> +}
>>>
>>> -  gcc_assert (loop->latch != EXIT_BLOCK_PTR);
>>> +/* Analyzes the instructions inside LOOP, updating the DESC. Currently counts
>>> +   the number of conditional branch instructions, calls and fp instructions,
>>> +   as well as the average number of branches executed per iteration.  */
>>>
>>> +void
>>> +analyze_loop_insns (const struct loop *loop, struct niter_desc *desc)
>>> +{
>>> +  unsigned i, nbranch;
>>> +  gcov_type weighted_nbranch;
>>> +  bool has_call, has_fp;
>>> +  basic_block * body, bb;
>>> +  rtx insn;
>>> +  gcov_type header_count = loop->header->count;
>>> +
>>> +  nbranch = weighted_nbranch = 0;
>>> +  has_call = has_fp = false;
>>> +
>>>    body = get_loop_body (loop);
>>> -  n = 0;
>>>    for (i = 0; i < loop->num_nodes; i++)
>>> -    if (EDGE_COUNT (body[i]->succs) >= 2)
>>> -      n++;
>>> +    {
>>> +      bb = body[i];
>>> +
>>> +      if (EDGE_COUNT (bb->succs) >= 2)
>>> +        {
>>> +          nbranch++;
>>> +
>>> +          /* If this block is executed less frequently than the header (loop
>>> +             entry), then it is weighted based on its execution count, which
>>> +             will be turned into a ratio compared to the loop header below.  */
>>> +          if (bb->count < header_count)
>>> +            weighted_nbranch += bb->count;
>>> +
>>> +          /* When it is executed more frequently than the header (i.e. it is
>>> +             in a nested inner loop), simply weight the branch the same as the
>>> +             header execution count, so that it will contribute 1 branch to
>>> +             the ratio computed below. */
>>> +          else
>>> +            weighted_nbranch += header_count;
>>> +        }
>>> +
>>> +      /* No need to iterate through the instructions below if
>>> +         both flags have already been set.  */
>>> +      if (has_call && has_fp)
>>> +        continue;
>>> +
>>> +      FOR_BB_INSNS (bb, insn)
>>> +        {
>>> +          if (!INSN_P (insn))
>>> +            continue;
>>> +
>>> +          if (!has_call)
>>> +            has_call = CALL_P (insn);
>>> +
>>> +          if (!has_fp)
>>> +            has_fp = insn_has_fp_set (insn);
>>> +        }
>>> +    }
>>>    free (body);
>>>
>>> -  return n;
>>> +  desc->num_branches = nbranch;
>>> +  /* Now divide the weights computed above by the loop header execution count,
>>> +     to compute the average number of branches through the loop. By adding
>>> +     header_count/2 to the numerator we round to nearest with integer
>>> +     division.  */
>>> +  if (header_count  != 0)
>>> +    desc->av_num_branches
>>> +        = (weighted_nbranch + header_count/2) / header_count;
>>> +  else
>>> +    desc->av_num_branches = 0;
>>> +  desc->has_call = has_call;
>>> +  desc->has_fp = has_fp;
>>>  }
>>>
>>>  /* Adds basic block BB to LOOP.  */
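(To make the weighting above concrete, a worked example with hypothetical
profile counts; this is a sketch, not part of the patch:

    static unsigned
    av_num_branches_example (void)
    {
      /* Loop header executed 1000 times; a conditional block executed 400
         times (cold path) contributes its own count, while one executed
         3000 times (nested inner loop) is clamped to the header count.  */
      long long header_count = 1000;
      long long weighted_nbranch = 400 + header_count;  /* 1400 */
      /* Round-to-nearest integer division as in analyze_loop_insns:
         (1400 + 500) / 1000 == 1 executed branch per iteration.  */
      return (weighted_nbranch + header_count / 2) / header_count;
    }
)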
>>> Index: gcc/cfgloop.h
>>> ===================================================================
>>> --- gcc/cfgloop.h       (revision 189893)
>>> +++ gcc/cfgloop.h       (working copy)
>>> @@ -255,7 +255,6 @@ extern basic_block *get_loop_body_in_custom_order
>>>
>>>  extern VEC (edge, heap) *get_loop_exit_edges (const struct loop *);
>>>  edge single_exit (const struct loop *);
>>> -extern unsigned num_loop_branches (const struct loop *);
>>>
>>>  extern edge loop_preheader_edge (const struct loop *);
>>>  extern edge loop_latch_edge (const struct loop *);
>>> @@ -366,7 +365,8 @@ struct rtx_iv
>>>  };
>>>
>>>  /* The description of an exit from the loop and of the number of iterations
>>> -   till we take the exit.  */
>>> +   till we take the exit. Also includes other information used primarily
>>> +   by the loop unroller.  */
>>>
>>>  struct niter_desc
>>>  {
>>> @@ -407,6 +407,18 @@ struct niter_desc
>>>
>>>    /* The number of iterations of the loop.  */
>>>    rtx niter_expr;
>>> +
>>> +  /* The number of branches in the loop.  */
>>> +  unsigned num_branches;
>>> +
>>> +  /* The number of executed branches per iteration.  */
>>> +  unsigned av_num_branches;
>>> +
>>> +  /* Whether the loop contains a call instruction.  */
>>> +  bool has_call;
>>> +
>>> +  /* Whether the loop contains fp instructions.  */
>>> +  bool has_fp;
>>>  };
>>>
>>>  extern void iv_analysis_loop_init (struct loop *);
>>> @@ -420,6 +432,7 @@ extern void iv_analysis_done (void);
>>>
>>>  extern struct niter_desc *get_simple_loop_desc (struct loop *loop);
>>>  extern void free_simple_loop_desc (struct loop *loop);
>>> +void analyze_loop_insns (const struct loop *, struct niter_desc *desc);
>>>
>>>  static inline struct niter_desc *
>>>  simple_loop_desc (struct loop *loop)
>>> Index: gcc/params.def
>>> ===================================================================
>>> --- gcc/params.def      (revision 189893)
>>> +++ gcc/params.def      (working copy)
>>> @@ -311,6 +311,23 @@ DEFPARAM(PARAM_MAX_UNROLL_ITERATIONS,
>>>          "max-completely-peel-loop-nest-depth",
>>>          "The maximum depth of a loop nest we completely peel",
>>>          8, 0, 0)
>>> +/* The maximum code size estimate under which loop unrolling and peeling
>>> + * is allowed in a profile feedback compile. This currently applies to loops
>>> + * with non-constant iteration counts and no floating point computations.  */
>>> +DEFPARAM(PARAM_UNROLLPEEL_CODESIZE_THRESHOLD,
>>> +        "unrollpeel-codesize-threshold",
>>> +        "Maximum profile-based code size footprint estimate for loop
unrolling "
>>> +        "and peeling",
>>> +        15000, 0, 0)
>>> +/* The maximum ratio of total profiled execution counts to loop entry block
>>> +   count that must be exceeded to ignore most code size limits when unrolling
>>> +   and peeling.  */
>>> +DEFPARAM(PARAM_UNROLLPEEL_HOTNESS_THRESHOLD,
>>> +        "unrollpeel-hotness-threshold",
>>> +        "Maximum ratio of total profiled execution count to loop entry "
>>> +        "block count under which most codesize limits for unrolling and "
>>> +        "peeling will be ignored",
>>> +        100, 1, 0)
>>>
>>>  /* The maximum number of insns of an unswitched loop.  */
>>>  DEFPARAM(PARAM_MAX_UNSWITCH_INSNS,
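(One plausible reading of how the two parameters interact, hedged since the
actual check lives in the loop-unroll.c changes and may differ in detail: the
code size estimate is compared against unrollpeel-codesize-threshold, and the
ratio of total profiled execution count to loop entry count against
unrollpeel-hotness-threshold, so with the default of 100 a loop whose entry
block accounts for at least 1% of total profiled execution is treated as hot
enough to escape the code size limiting.  Note the block comment above says
"must be exceeded to ignore" while the help string says "under which ... will
be ignored"; the sketch below follows the help string.

    /* Sketch only; name and shape are illustrative, not the patch's code.  */
    static int
    loop_escapes_codesize_limit (long long sum_all, long long loop_entry_count,
                                 long long hotness_threshold /* e.g. 100 */)
    {
      if (loop_entry_count <= 0)
        return 0;
      return sum_all / loop_entry_count < hotness_threshold;
    }
)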
>>> Index: gcc/gcov-dump.c
>>> ===================================================================
>>> --- gcc/gcov-dump.c     (revision 189893)
>>> +++ gcc/gcov-dump.c     (working copy)
>>> @@ -456,8 +456,12 @@ tag_summary (const char *filename ATTRIBUTE_UNUSED
>>>      {
>>>        printf ("\n");
>>>        print_prefix (filename, 0, 0);
>>> -      printf ("\t\tcounts=%u, runs=%u",
>>> -             summary.ctrs[ix].num, summary.ctrs[ix].runs);
>>> +      printf ("\t\tcounts=%u (num hot counts=%u, hot cutoff count="
>>> +              HOST_WIDEST_INT_PRINT_DEC "), runs=%u",
>>> +             summary.ctrs[ix].num,
>>> +             summary.ctrs[ix].num_hot_counters,
>>> +             (HOST_WIDEST_INT)summary.ctrs[ix].hot_cutoff_value,
>>> +             summary.ctrs[ix].runs);
>>>
>>>        printf (", sum_all=" HOST_WIDEST_INT_PRINT_DEC,
>>>               (HOST_WIDEST_INT)summary.ctrs[ix].sum_all);
>>>
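(With the gcov-dump change above, a per-counter summary line would come out
roughly like this; the values are entirely made up:

    counts=354 (num hot counts=12, hot cutoff count=80000), runs=3, sum_all=1234567
)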
>>> --
>>> This patch is available for review at http://codereview.appspot.com/6351086
-- 
Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413