(2012-06-07 06:42:22 UTC)
This is the google/4_6 version of the patch to add new program summary
information to the gcov profile files to use as an estimate of code size
for guiding unrolling.
Bootstrapped and tested on x86_64-unknown-linux-gnu. Ok for google/4_6?
Thanks,
Teresa
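
The new summary metric at the heart of this patch is num_hot_counters: the
number of hottest arc counters needed to cover a cutoff fraction (by default
99.9%) of the program's total execution count, so a large value indicates a
flat, icache-unfriendly profile. A standalone sketch of that computation
(hypothetical helper name, working on a private copy instead of walking the
gcov object list as the real gcov_compute_cutoff_values does):

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

typedef long long gcov_type;

/* qsort comparator: sort counter values in descending order.  */
static int
cmp_desc (const void *pa, const void *pb)
{
  gcov_type a = *(const gcov_type *) pa;
  gcov_type b = *(const gcov_type *) pb;
  return (b > a) - (b < a);
}

/* Return how many of the hottest counters are needed to cover
   CUTOFF_TIMES_10 tenths of a percent (e.g. 999 = 99.9%) of the total
   execution count.  Sketch of the patch's cutoff logic.  */
static unsigned
num_hot_counters (const gcov_type *values, unsigned n,
                  unsigned cutoff_times_10)
{
  gcov_type sum_all = 0, cum = 0, cum_cutoff;
  unsigned i;
  gcov_type *copy = malloc (n * sizeof (gcov_type));
  memcpy (copy, values, n * sizeof (gcov_type));

  for (i = 0; i < n; i++)
    sum_all += copy[i];
  /* Divide first when multiplying would overflow, as the patch does.  */
  if (sum_all > LLONG_MAX / cutoff_times_10)
    cum_cutoff = sum_all / 1000 * cutoff_times_10;
  else
    cum_cutoff = sum_all * cutoff_times_10 / 1000;

  /* Accumulate from hottest on down until reaching the cutoff.  */
  qsort (copy, n, sizeof (gcov_type), cmp_desc);
  for (i = 0; i < n && cum < cum_cutoff; i++)
    cum += copy[i];
  free (copy);
  return i;
}
```

A peaked profile (one dominant counter) yields a small value, while a flat
profile needs nearly all of its counters to reach the cutoff.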
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi (revision 188240)
+++ doc/invoke.texi (working copy)
@@ -384,7 +384,7 @@ Objective-C and Objective-C++ Dialects}.
-fno-sched-interblock -fno-sched-spec -fno-signed-zeros @gol
-fno-toplevel-reorder -fno-trapping-math -fno-zero-initialized-in-bss @gol
-fomit-frame-pointer -foptimize-register-move -foptimize-sibling-calls @gol
--fpartial-inlining -fpeel-loops -fpredictive-commoning @gol
+-fpartial-inlining -fpeel-codesize-limit -fpeel-loops -fpredictive-commoning @gol
-fprefetch-loop-arrays @gol
-fprofile-correction -fprofile-dir=@var{path} -fprofile-generate @gol
-fprofile-generate=@var{path} -fprofile-generate-sampling @gol
@@ -396,8 +396,8 @@ Objective-C and Objective-C++ Dialects}.
-freorder-blocks-and-partition -freorder-functions @gol
-frerun-cse-after-loop -freschedule-modulo-scheduled-loops @gol
-fripa -fripa-disallow-asm-modules -fripa-disallow-opt-mismatch @gol
--fripa-inc-path-sub=@var{path_mapping} -fripa-no-promote-always-inline-func -fripa-verbose @gol
--fripa-peel-size-limit -fripa-unroll-size-limit -frounding-math @gol
+-fripa-inc-path-sub=@var{path_mapping} -fripa-no-promote-always-inline-func @gol
+-fripa-verbose -frounding-math @gol
-fsched2-use-superblocks -fsched-pressure @gol
-fsched-spec-load -fsched-spec-load-dangerous @gol
-fsched-stalled-insns-dep[=@var{n}] -fsched-stalled-insns[=@var{n}] @gol
@@ -420,7 +420,7 @@ Objective-C and Objective-C++ Dialects}.
-ftree-parallelize-loops=@var{n} -ftree-pre -ftree-pta -ftree-reassoc @gol
-ftree-sink -ftree-sra -ftree-switch-conversion @gol
-ftree-ter -ftree-vect-loop-version -ftree-vectorize -ftree-vrp @gol
--funit-at-a-time -funroll-all-loops -funroll-loops @gol
+-funit-at-a-time -funroll-all-loops -funroll-loops -funroll-codesize-limit @gol
-funsafe-loop-optimizations -funsafe-math-optimizations -funswitch-loops @gol
-fvariable-expansion-in-unroller -fvect-cost-model -fvpt -fweb @gol
-fwhole-program -fwpa -fuse-ld -fuse-linker-plugin @gol
@@ -8187,20 +8187,6 @@ Do not promote static functions with always inline
Enable printing of verbose information about dynamic inter-procedural
optimizations.
This is used in conjunction with the @option{-fripa}.
-@item -fripa-peel-size-limit
-@opindex fripa-peel-size-limit
-Limit loop peeling of non-const non-FP loops in a LIPO compilation under estimates
-of a large code footprint. Enabled by default under @option{-fripa}. Code size
-estimation and thresholds are controlled by the @option{codesize-hotness-threshold}
-and @option{unrollpeel-codesize-threshold} parameters.
-
-@item -fripa-unroll-size-limit
-@opindex fripa-unroll-size-limit
-Limit loop unrolling of non-const non-FP loops in a LIPO compilation under estimates
-of a large code footprint. Enabled by default under @option{-fripa}. Code size
-estimation and thresholds are controlled by the @option{codesize-hotness-threshold}
-and @option{unrollpeel-codesize-threshold} parameters.
-
@item fripa-auxiliary_module_id
@opindex fripa-auxiliary_module_id
Specify the global module id for an auxiliary module. This is only effective in
@@ -8545,6 +8531,14 @@ the loop is entered. This usually makes programs
@option{-funroll-all-loops} implies the same options as
@option{-funroll-loops}.
+@item -funroll-codesize-limit
+@opindex funroll-codesize-limit
+Limit loop unrolling of non-const non-FP loops in a profile feedback compilation
+under estimates of a large code footprint. Enabled by default with
+@option{-fprofile-use}. Code size and execution weight thresholds are controlled
+by the @option{unrollpeel-codesize-threshold} and
+@option{unrollpeel-hotness-threshold} parameters.
+
@item -fpeel-loops
@opindex fpeel-loops
Peels the loops for that there is enough information that they do not
@@ -8553,6 +8547,14 @@ roll much (from profile feedback). It also turns
Enabled with @option{-fprofile-use}.
+@item -fpeel-codesize-limit
+@opindex fpeel-codesize-limit
+Limit loop peeling of non-const non-FP loops in a profile feedback compilation
+under estimates of a large code footprint. Enabled by default with
+@option{-fprofile-use}. Code size and execution weight thresholds are controlled
+by the @option{unrollpeel-codesize-threshold} and
+@option{unrollpeel-hotness-threshold} parameters.
+
@item -fmove-loop-invariants
@opindex fmove-loop-invariants
Enables the loop invariant motion pass in the RTL loop optimizer. Enabled
@@ -8935,13 +8937,15 @@ hot loops. Its default value is 16.
@item max-completely-peel-loop-nest-depth
The maximum depth of a loop nest suitable for complete peeling.
-@item codesize-hotness-threshold
-The minimum profile count of basic blocks to look at when estimating
-the code size footprint of the call graph in a LIPO compile.
-
@item unrollpeel-codesize-threshold
-Maximum LIPO code size footprint estimate for loop unrolling and peeling.
+Maximum profile-based code size footprint estimate for loop unrolling and
+peeling.
+@item unrollpeel-hotness-threshold
+Maximum ratio of total execution count to loop entry block count under which
+most profile-based code size estimates will be ignored for loop unrolling and
+peeling.
+
@item max-unswitch-insns
The maximum number of insns of an unswitched loop.
Index: gcov-io.c
===================================================================
--- gcov-io.c (revision 188240)
+++ gcov-io.c (working copy)
@@ -520,6 +520,7 @@ gcov_write_summary (gcov_unsigned_t tag, const str
for (csum = summary->ctrs, ix = GCOV_COUNTERS_SUMMABLE; ix--; csum++)
{
gcov_write_unsigned (csum->num);
+ gcov_write_unsigned (csum->num_hot_counters);
gcov_write_unsigned (csum->runs);
gcov_write_counter (csum->sum_all);
gcov_write_counter (csum->run_max);
@@ -637,6 +638,7 @@ gcov_read_summary (struct gcov_summary *summary)
for (csum = summary->ctrs, ix = GCOV_COUNTERS_SUMMABLE; ix--; csum++)
{
csum->num = gcov_read_unsigned ();
+ csum->num_hot_counters = gcov_read_unsigned ();
csum->runs = gcov_read_unsigned ();
csum->sum_all = gcov_read_counter ();
csum->run_max = gcov_read_counter ();
Index: gcov-io.h
===================================================================
--- gcov-io.h (revision 188240)
+++ gcov-io.h (working copy)
@@ -440,7 +440,7 @@ typedef HOST_WIDEST_INT gcov_type;
#define GCOV_TAG_OBJECT_SUMMARY ((gcov_unsigned_t)0xa1000000)
#define GCOV_TAG_PROGRAM_SUMMARY ((gcov_unsigned_t)0xa3000000)
#define GCOV_TAG_SUMMARY_LENGTH \
- (1 + GCOV_COUNTERS_SUMMABLE * (2 + 3 * 2))
+ (1 + GCOV_COUNTERS_SUMMABLE * (3 + 3 * 2))
#define GCOV_TAG_MODULE_INFO ((gcov_unsigned_t)0xa4000000)
#define GCOV_TAG_PMU_LOAD_LATENCY_INFO ((gcov_unsigned_t)0xa5000000)
#define GCOV_TAG_PMU_LOAD_LATENCY_LENGTH(filename) \
@@ -541,6 +541,8 @@ typedef HOST_WIDEST_INT gcov_type;
struct gcov_ctr_summary
{
gcov_unsigned_t num; /* number of counters. */
+ gcov_unsigned_t num_hot_counters;/* number of counters to reach a given
+ percent of sum_all. */
gcov_unsigned_t runs; /* number of program runs */
gcov_type sum_all; /* sum of all counters accumulated. */
gcov_type run_max; /* maximum value on a single run. */
Index: loop-unroll.c
===================================================================
--- loop-unroll.c (revision 188240)
+++ loop-unroll.c (working copy)
@@ -35,6 +35,7 @@ along with GCC; see the file COPYING3. If not see
#include "recog.h"
#include "target.h"
#include "diagnostic.h"
+#include "gcov-io.h"
/* This pass performs loop unrolling and peeling. We only perform these
optimizations on innermost loops (with single exception) because
@@ -181,48 +182,73 @@ report_unroll_peel(struct loop *loop, location_t l
"" : iter_str);
}
-/* This returns a bit vector */
-typedef enum {
- NO_LIMIT = 0,
- LIMIT_UNROLL = 0x1,
- LIMIT_PEEL = 0x2,
- LIMIT_BOTH = 0x3
-} limit_type;
+/* Determine whether and how much LOOP unrolling/peeling should be constrained
+ based on code footprint estimates. Returns the codesize-based factor to be
+ divided into the max instructions in an unrolled or peeled loop:
+ 1) For size <= threshold, do not limit (by returning 1).
+ 2) For threshold < size < 2*threshold, reduce maximum allowed peeled or
+ unrolled instructions according to loop hotness.
+ 3) For size >= 2*threshold, disable unrolling/peeling (by returning
+ INT_MAX). */
-extern int cgraph_codesize_estimate;
-
-/* Determine whether LOOP unrolling/peeling should be constrained based
- on code footprint estimates. */
-static limit_type
-limit_code_size(struct loop *loop)
+static int
+code_size_limit_factor(struct loop *loop)
{
unsigned size_threshold;
- limit_type result = NO_LIMIT;
- int result_int = 0;
struct niter_desc *desc = get_simple_loop_desc (loop);
+ gcov_type sum_to_header_ratio;
+ int hotness_ratio_threshold;
+ int limit_factor;
- if (!flag_dyn_ipa)
- return NO_LIMIT;
+ /* First check if the application has a large codesize footprint.
+ This is estimated from FDO profile summary information for the
+ program, where the num_hot_counters indicates the number of hottest
+ counters (blocks) that compose most of the execution time of
+ the program. A large value would indicate a large flat execution
+ profile where icache misses may be a concern. */
+ size_threshold = PARAM_VALUE (PARAM_UNROLLPEEL_CODESIZE_THRESHOLD);
+ if (!profile_info
+ || profile_info->num_hot_counters <= size_threshold
+ || !profile_info->sum_all)
+ return 1;
- gcc_assert (cgraph_codesize_estimate >= 0);
+ /* Next, exclude some loops where unrolling/peeling may be more
+ important to overall performance. */
/* Ignore FP loops, which are more likely to benefit heavily from
unrolling. */
if (desc->has_fp)
- return NO_LIMIT;
+ return 1;
- size_threshold = PARAM_VALUE (PARAM_UNROLLPEEL_CODESIZE_THRESHOLD);
- if (cgraph_codesize_estimate <= (int)size_threshold)
- return NO_LIMIT;
+ /* Next, set the value of the codesize-based unroll factor divisor which in
+ most loops will need to be set to a value that will reduce or eliminate
+ unrolling/peeling. */
+ if (profile_info->num_hot_counters < size_threshold * 2)
+ {
+ /* For applications that are less than twice the codesize limit, allow
+ limited unrolling for very hot loops. */
+ sum_to_header_ratio = profile_info->sum_all / loop->header->count;
+ hotness_ratio_threshold = PARAM_VALUE (PARAM_UNROLLPEEL_HOTNESS_THRESHOLD);
+ /* When the profile count sum to loop entry header ratio is smaller than
+ the threshold (i.e. the loop entry is hot enough), the divisor is set
+ to 1 so the unroll/peel factor is not reduced. When it is bigger
+ than the threshold, increase the divisor by the amount this ratio
+ is over the threshold, which will quickly reduce the unroll/peel
+ factor to zero as the loop's hotness reduces. */
+ if (sum_to_header_ratio > hotness_ratio_threshold)
+ {
+ limit_factor = sum_to_header_ratio / hotness_ratio_threshold;
+ gcc_assert (limit_factor >= 1);
+ }
+ else
+ limit_factor = 1;
+ }
+ else
+ /* For applications that are at least twice the codesize limit, set
+ the divisor to a large value that will force the unroll factor to 0. */
+ limit_factor = INT_MAX;
- if (flag_ripa_peel_size_limit)
- result_int |= LIMIT_PEEL;
-
- if (flag_ripa_unroll_size_limit)
- result_int |= LIMIT_UNROLL;
-
- result = (limit_type)result_int;
- return result;
+ return limit_factor;
}
/* Compute the maximum number of times LOOP can be unrolled without exceeding
@@ -444,7 +470,6 @@ decide_unrolling_and_peeling (int flags)
struct loop *loop;
loop_iterator li;
location_t locus;
- limit_type limit;
/* Scan the loops, inner ones first. */
FOR_EACH_LOOP (li, loop, LI_FROM_INNERMOST)
@@ -485,38 +510,15 @@ decide_unrolling_and_peeling (int flags)
loop->ninsns = num_loop_insns (loop);
loop->av_ninsns = average_num_loop_insns (loop);
- /* Determine whether to limit code size growth from unrolling and
- peeling. This is currently enabled only under LIPO (dynamic IPA)
- where we have a partial call graph. It is not applied to loops
- with constant trip counts, as it is easier to determine the
- profitability of unrolling and peeling such loops. */
- limit = limit_code_size(loop);
- if (limit != NO_LIMIT)
- {
- if (dump_file)
- {
- fprintf (dump_file, ";; Due to large code size footprint estimate, limit
");
- if (limit == (LIMIT_UNROLL|LIMIT_PEEL))
- fprintf (dump_file, "unrolling and peeling\n");
- else if (limit == LIMIT_UNROLL)
- fprintf (dump_file, "unrolling\n");
- else
- fprintf (dump_file, "peeling\n");
- }
- }
-
/* Try transformations one by one in decreasing order of
priority. */
decide_unroll_constant_iterations (loop, flags);
- if (loop->lpt_decision.decision == LPT_NONE
- && !(limit & LIMIT_UNROLL))
+ if (loop->lpt_decision.decision == LPT_NONE)
decide_unroll_runtime_iterations (loop, flags);
- if (loop->lpt_decision.decision == LPT_NONE
- && !(limit & LIMIT_UNROLL))
+ if (loop->lpt_decision.decision == LPT_NONE)
decide_unroll_stupid (loop, flags);
- if (loop->lpt_decision.decision == LPT_NONE
- && !(limit & LIMIT_PEEL))
+ if (loop->lpt_decision.decision == LPT_NONE)
decide_peel_simple (loop, flags);
if (flag_opt_info >= OPT_INFO_MIN
@@ -1055,6 +1057,7 @@ decide_unroll_runtime_iterations (struct loop *loo
{
unsigned nunroll, nunroll_by_av, nunroll_branches, i;
struct niter_desc *desc;
+ int limit_factor = 1;
if (!(flags & UAP_UNROLL))
{
@@ -1067,10 +1070,26 @@ decide_unroll_runtime_iterations (struct loop *loo
"\n;; Considering unrolling loop with runtime "
"computable number of iterations\n");
+ if (flag_unroll_codesize_limit)
+ {
+ /* Determine whether to limit code size growth from unrolling,
+ using FDO profile summary information that gives an
+ estimated number of executed blocks. */
+ limit_factor = code_size_limit_factor (loop);
+ if (dump_file && limit_factor > 1)
+ {
+ fprintf (dump_file,
+ ";; Due to large code size footprint estimate, limit "
+ "max unrolled insns by divisor %d\n", limit_factor);
+ }
+ }
+
/* nunroll = total number of copies of the original loop body in
unrolled loop (i.e. if it is 2, we have to duplicate loop body once. */
- nunroll = PARAM_VALUE (PARAM_MAX_UNROLLED_INSNS) / loop->ninsns;
- nunroll_by_av = PARAM_VALUE (PARAM_MAX_AVERAGE_UNROLLED_INSNS) / loop->av_ninsns;
+ nunroll = PARAM_VALUE (PARAM_MAX_UNROLLED_INSNS) / limit_factor
+ / loop->ninsns;
+ nunroll_by_av = PARAM_VALUE (PARAM_MAX_AVERAGE_UNROLLED_INSNS)
+ / limit_factor / loop->av_ninsns;
if (nunroll > nunroll_by_av)
nunroll = nunroll_by_av;
if (nunroll > (unsigned) PARAM_VALUE (PARAM_MAX_UNROLL_TIMES))
@@ -1461,6 +1480,7 @@ decide_peel_simple (struct loop *loop, int flags)
{
unsigned npeel;
struct niter_desc *desc;
+ int limit_factor = 1;
if (!(flags & UAP_PEEL))
{
@@ -1471,8 +1491,23 @@ decide_peel_simple (struct loop *loop, int flags)
if (dump_file)
fprintf (dump_file, "\n;; Considering simply peeling loop\n");
+ if (flag_peel_codesize_limit)
+ {
+ /* Determine whether to limit code size growth from peeling,
+ using FDO profile summary information that gives an
+ estimated number of executed blocks. */
+ limit_factor = code_size_limit_factor (loop);
+ if (dump_file && limit_factor > 1)
+ {
+ fprintf (dump_file,
+ ";; Due to large code size footprint estimate, limit "
+ "max peeled insns by divisor %d\n", limit_factor);
+ }
+ }
+
/* npeel = number of iterations to peel. */
- npeel = PARAM_VALUE (PARAM_MAX_PEELED_INSNS) / loop->ninsns;
+ npeel = PARAM_VALUE (PARAM_MAX_PEELED_INSNS) / limit_factor
+ / loop->ninsns;
if (npeel > (unsigned) PARAM_VALUE (PARAM_MAX_PEEL_TIMES))
npeel = PARAM_VALUE (PARAM_MAX_PEEL_TIMES);
@@ -1614,6 +1649,7 @@ decide_unroll_stupid (struct loop *loop, int flags
{
unsigned nunroll, nunroll_by_av, i;
struct niter_desc *desc;
+ int limit_factor = 1;
if (!(flags & UAP_UNROLL_ALL))
{
@@ -1624,11 +1660,26 @@ decide_unroll_stupid (struct loop *loop, int flags
if (dump_file)
fprintf (dump_file, "\n;; Considering unrolling loop stupidly\n");
+ if (flag_unroll_codesize_limit)
+ {
+ /* Determine whether to limit code size growth from unrolling,
+ using FDO profile summary information that gives an
+ estimated number of executed blocks. */
+ limit_factor = code_size_limit_factor (loop);
+ if (dump_file && limit_factor > 1)
+ {
+ fprintf (dump_file,
+ ";; Due to large code size footprint estimate, limit "
+ "max unrolled insns by divisor %d\n", limit_factor);
+ }
+ }
+
/* nunroll = total number of copies of the original loop body in
unrolled loop (i.e. if it is 2, we have to duplicate loop body once. */
- nunroll = PARAM_VALUE (PARAM_MAX_UNROLLED_INSNS) / loop->ninsns;
- nunroll_by_av
- = PARAM_VALUE (PARAM_MAX_AVERAGE_UNROLLED_INSNS) / loop->av_ninsns;
+ nunroll = PARAM_VALUE (PARAM_MAX_UNROLLED_INSNS) / limit_factor
+ / loop->ninsns;
+ nunroll_by_av = PARAM_VALUE (PARAM_MAX_AVERAGE_UNROLLED_INSNS)
+ / limit_factor / loop->av_ninsns;
if (nunroll > nunroll_by_av)
nunroll = nunroll_by_av;
if (nunroll > (unsigned) PARAM_VALUE (PARAM_MAX_UNROLL_TIMES))
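
The three regimes that code_size_limit_factor implements can be summarized in
a small standalone function (hypothetical name and parameters; the real code
additionally exempts FP loops and reads the thresholds via PARAM_VALUE):

```c
#include <limits.h>

/* Divisor applied to the max-unrolled/peeled-insns parameters, given the
   profile summary's hot-counter count, the codesize threshold, and the
   loop's sum_all / header-count hotness ratio.  Sketch of the patch's
   code_size_limit_factor.  */
static int
limit_factor (unsigned num_hot_counters, unsigned size_threshold,
              long long sum_to_header_ratio, long long hotness_threshold)
{
  /* 1) Small code footprint: do not limit unrolling or peeling.  */
  if (num_hot_counters <= size_threshold)
    return 1;
  /* 3) At least twice the limit: effectively disable both.  */
  if (num_hot_counters >= size_threshold * 2)
    return INT_MAX;
  /* 2) In between: allow unrolling only for sufficiently hot loops,
     shrinking the budget as the loop entry gets colder.  */
  if (sum_to_header_ratio > hotness_threshold)
    return (int) (sum_to_header_ratio / hotness_threshold);
  return 1;
}
```

With the patch's defaults (unrollpeel-codesize-threshold 15000,
unrollpeel-hotness-threshold 100), a program with 20000 hot counters and a
loop whose entry accounts for 1/500th of all execution gets its unroll budget
divided by 5.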
Index: coverage.c
===================================================================
--- coverage.c (revision 188240)
+++ coverage.c (working copy)
@@ -558,6 +558,7 @@ read_counts_file (const char *da_file_name, unsign
{
struct gcov_ctr_summary *csum = &summary.ctrs[entry->ctr];
+ entry->summary.num_hot_counters += csum->num_hot_counters;
entry->summary.runs += csum->runs;
entry->summary.sum_all += csum->sum_all;
if (entry->summary.run_max < csum->run_max)
Index: common.opt
===================================================================
--- common.opt (revision 188240)
+++ common.opt (working copy)
@@ -1130,14 +1130,6 @@ Common Report Var(flag_ripa_no_promote_always_inli
Don't promote always inline static functions assuming they
will be inlined and no copy is needed.
-fripa-peel-size-limit
-Common Report Var(flag_ripa_peel_size_limit) Init(1) Optimization
-Limit non-const non-FP loop peeling under dynamic IPA estimates of large code footprint
-
-fripa-unroll-size-limit
-Common Report Var(flag_ripa_unroll_size_limit) Init(1) Optimization
-Limit non-const non-FP loop unrolling under dynamic IPA estimates of large code footprint
-
fripa-inc-path-sub=
Common Joined RejectNegative Var(lipo_inc_path_pattern)
Substitute substring in include paths with a new string to allow reuse profile data
@@ -1646,6 +1638,10 @@ fpcc-struct-return
Common Report Var(flag_pcc_struct_return,1) Init(DEFAULT_PCC_STRUCT_RETURN)
Return small aggregates in memory, not registers
+fpeel-codesize-limit
+Common Report Var(flag_peel_codesize_limit) Init(1) Optimization
+Limit non-const non-FP loop peeling under profile estimates of large code footprint
+
fpeel-loops
Common Report Var(flag_peel_loops) Optimization
Perform loop peeling
@@ -2216,6 +2212,10 @@ funroll-all-loops
Common Report Var(flag_unroll_all_loops) Optimization
Perform loop unrolling for all loops
+funroll-codesize-limit
+Common Report Var(flag_unroll_codesize_limit) Init(1) Optimization
+Limit non-const non-FP loop unrolling under profile estimates of large code footprint
+
; Nonzero means that loop optimizer may assume that the induction variables
; that control loops do not overflow and that the loops with nontrivial
; exit condition are not infinite
Index: tree-optimize.c
===================================================================
--- tree-optimize.c (revision 188240)
+++ tree-optimize.c (working copy)
@@ -47,7 +47,6 @@ along with GCC; see the file COPYING3. If not see
#include "except.h"
#include "plugin.h"
#include "regset.h" /* FIXME: For reg_obstack. */
-#include "params.h"
/* Decides if the cgraph callee edges are being cleaned up for the
last time. */
@@ -156,49 +155,6 @@ struct gimple_opt_pass pass_all_early_optimization
}
};
-int cgraph_codesize_estimate = -1;
-
-/* Estimate the code size from the dynamic IPA call graph. */
-static void
-compute_codesize_estimate(void)
-{
- struct cgraph_node *node;
- basic_block bb;
- gimple_stmt_iterator bsi;
- struct function *my_function;
-
- if (!flag_dyn_ipa)
- return;
-
- if (cgraph_codesize_estimate >= 0)
- return;
- cgraph_codesize_estimate = 0;
-
- for (node = cgraph_nodes; node; node = node->next)
- {
- if (node->count == 0)
- continue;
-
- if (!gimple_has_body_p(node->decl))
- continue;
-
- my_function = DECL_STRUCT_FUNCTION (node->decl);
-
- FOR_EACH_BB_FN (bb, my_function)
- {
- if (bb->count < PARAM_VALUE (PARAM_CODESIZE_HOTNESS_THRESHOLD))
- continue;
- for (bsi = gsi_start_bb (bb); !gsi_end_p (bsi); gsi_next (&bsi))
- {
- gimple stmt = gsi_stmt (bsi);
- cgraph_codesize_estimate += estimate_num_insns (stmt,
&eni_size_weights);
- }
- }
- }
- if (dump_file)
- fprintf (dump_file, "Code size estimate from cgraph: %d\n",
cgraph_codesize_estimate);
-}
-
/* Pass: cleanup the CFG just before expanding trees to RTL.
This is just a round of label cleanups and case node grouping
because after the tree optimizers have run such cleanups may
@@ -207,8 +163,6 @@ struct gimple_opt_pass pass_all_early_optimization
static unsigned int
execute_cleanup_cfg_post_optimizing (void)
{
- /* Estimate the code footprint for hot BBs before we enter RTL */
- compute_codesize_estimate();
cleanup_tree_cfg ();
cleanup_dead_labels ();
group_case_labels ();
Index: libgcov.c
===================================================================
--- libgcov.c (revision 188240)
+++ libgcov.c (working copy)
@@ -835,6 +835,101 @@ gcov_counter_array (const struct gcov_info *info,
return result;
}
+/* Used by qsort to sort gcov values in descending order. */
+
+static int
+sort_by_reverse_gcov_value (const void *pa, const void *pb)
+{
+ const gcov_type a = *(gcov_type const *)pa;
+ const gcov_type b = *(gcov_type const *)pb;
+
+ if (b > a)
+ return 1;
+ else if (b == a)
+ return 0;
+ else
+ return -1;
+}
+
+/* Determines the number of counters required to cover a given percentage
+ of the total sum of execution counts in the summary, which is then also
+ recorded in SUM. */
+
+static void
+gcov_compute_cutoff_values (struct gcov_summary *sum)
+{
+ struct gcov_info *gi_ptr;
+ const struct gcov_ctr_info *ci_ptr;
+ struct gcov_ctr_summary *cs_ptr;
+ unsigned t_ix, index;
+ gcov_unsigned_t c_num;
+ gcov_type *value_array;
+ gcov_type cum, cum_cutoff;
+ char *cutoff_str;
+ unsigned cutoff_perc;
+
+#define CUM_CUTOFF_PERCENT_TIMES_10 999
+ cutoff_str = getenv ("GCOV_HOTCODE_CUTOFF_TIMES_10");
+ if (cutoff_str && strlen (cutoff_str))
+ cutoff_perc = atoi (cutoff_str);
+ else
+ cutoff_perc = CUM_CUTOFF_PERCENT_TIMES_10;
+
+ /* This currently only applies to arc counters. */
+ t_ix = GCOV_COUNTER_ARCS;
+
+ /* First check if there are any counts recorded for this counter. */
+ cs_ptr = &(sum->ctrs[t_ix]);
+ if (!cs_ptr->num)
+ return;
+
+ /* Determine the cumulative counter value at the specified cutoff
+ percentage and record the percentage for use by gcov consumers.
+ Check for overflow when sum_all is multiplied by the cutoff_perc,
+ and if so, do the divide first. */
+ if (cs_ptr->sum_all*cutoff_perc < cs_ptr->sum_all)
+ /* Overflow, do the divide first. */
+ cum_cutoff = cs_ptr->sum_all / 1000 * cutoff_perc;
+ else
+ /* Otherwise multiply first to get the correct value for small
+ values of sum_all. */
+ cum_cutoff = (cs_ptr->sum_all * cutoff_perc) / 1000;
+
+ /* Next, walk through all the per-object structures and save each of
+ the count values in value_array. */
+ index = 0;
+ value_array = (gcov_type *) malloc (sizeof (gcov_type) * cs_ptr->num);
+ for (gi_ptr = __gcov_list; gi_ptr; gi_ptr = gi_ptr->next)
+ {
+ if (!((1 << t_ix) & gi_ptr->ctr_mask))
+ continue;
+
+ ci_ptr = gi_ptr->counts;
+ /* Sanity check that there are enough entries in value_array
+ for this object's counters. Gracefully handle the case when
+ there are not, in case something in the profile info is
+ corrupted. */
+ c_num = ci_ptr->num;
+ if (index + c_num > cs_ptr->num)
+ c_num = cs_ptr->num - index;
+ /* Copy over this object's counter values. */
+ memcpy (&value_array[index], ci_ptr->values,
+ sizeof (gcov_type) * c_num);
+ index += c_num;
+ }
+
+ /* Sort all the counter values by descending value and finally
+ accumulate the values from hottest on down until reaching
+ the cutoff value computed earlier. */
+ qsort (value_array, cs_ptr->num, sizeof (gcov_type),
+ sort_by_reverse_gcov_value);
+ for (cum = 0, c_num = 0; c_num < cs_ptr->num && cum < cum_cutoff; c_num++)
+ cum += value_array[c_num];
+ /* Record the number of counters required to reach the cutoff value. */
+ cs_ptr->num_hot_counters = c_num;
+ free (value_array);
+}
+
/* Compute object summary recored in gcov_info INFO. The result is
stored in OBJ_SUM. Note that the caller is responsible for
zeroing out OBJ_SUM, otherwise the summary is accumulated. */
@@ -1010,7 +1105,11 @@ rewrite:;
if ((1 << t_ix) & info->ctr_mask)
{
if (!cs_obj->runs++)
- cs_obj->num = cs_tobj->num;
+ {
+ cs_obj->num = cs_tobj->num;
+ if (cs_tobj->num_hot_counters > cs_obj->num_hot_counters)
+ cs_obj->num_hot_counters = cs_tobj->num_hot_counters;
+ }
else if (cs_obj->num != cs_tobj->num)
goto read_mismatch;
cs_obj->sum_all += cs_tobj->sum_all;
@@ -1019,7 +1118,11 @@ rewrite:;
cs_obj->sum_max += cs_tobj->run_max;
if (!cs_prg->runs++)
- cs_prg->num = cs_tprg->num;
+ {
+ cs_prg->num = cs_tprg->num;
+ if (cs_tprg->num_hot_counters > cs_prg->num_hot_counters)
+ cs_prg->num_hot_counters = cs_tprg->num_hot_counters;
+ }
else if (cs_prg->num != cs_tprg->num)
goto read_mismatch;
cs_prg->sum_all += cs_tprg->sum_all;
@@ -1225,6 +1328,7 @@ gcov_exit_init (void)
is FDO/LIPO. */
dump_module_info |= gi_ptr->mod_info->is_primary;
}
+ gcov_compute_cutoff_values (&this_program);
gcov_alloc_filename ();
Index: params.def
===================================================================
--- params.def (revision 188240)
+++ params.def (working copy)
@@ -340,21 +340,23 @@ DEFPARAM(PARAM_MAX_UNROLL_ITERATIONS,
"max-completely-peel-loop-nest-depth",
"The maximum depth of a loop nest we completely peel",
8, 0, 0)
-/* The minimum profile count of basic blocks to look at when estimating
- * the code size footprint of the call graph in a dynamic IPA compile. */
-DEFPARAM(PARAM_CODESIZE_HOTNESS_THRESHOLD,
- "codesize-hotness-threshold",
- "Minimum profile count of basic blocks counted towards dynamic IPA "
- "code size footprint estimate",
- 10000, 0, 0)
/* The maximum code size estimate under which loop unrolling and peeling
- * is allowed in a dynamic IPA compile. This currently applies to loops with
- * non-constant iteration counts and no floating point computations. */
+ * is allowed in a profile feedback compile. This currently applies to loops
+ * with non-constant iteration counts and no floating point computations. */
DEFPARAM(PARAM_UNROLLPEEL_CODESIZE_THRESHOLD,
"unrollpeel-codesize-threshold",
- "Maximum dynamic IPA code size footprint estimate for loop unrolling "
+ "Maximum profile-based code size footprint estimate for loop unrolling "
"and peeling",
- 10000, 0, 0)
+ 15000, 0, 0)
+/* The maximum ratio of total profiled execution counts to loop entry block
+ count that must be exceeded to ignore most code size limits when unrolling
+ and peeling. */
+DEFPARAM(PARAM_UNROLLPEEL_HOTNESS_THRESHOLD,
+ "unrollpeel-hotness-threshold",
+ "Maximum ratio of total profiled execution count to loop entry "
+ "block count under which most codesize limits for unrolling and "
+ "peeling will be ignored",
+ 100, 1, 0)
DEFPARAM(PARAM_MIN_ITER_UNROLL_WITH_BRANCHES,
"min-iter-unroll-with-branches",
Index: gcov-dump.c
===================================================================
--- gcov-dump.c (revision 188240)
+++ gcov-dump.c (working copy)
@@ -494,8 +494,10 @@ tag_summary (const char *filename ATTRIBUTE_UNUSED
{
printf ("\n");
print_prefix (filename, 0, 0);
- printf ("\t\tcounts=%u, runs=%u",
- summary.ctrs[ix].num, summary.ctrs[ix].runs);
+ printf ("\t\tcounts=%u (num hot counts=%u), runs=%u",
+ summary.ctrs[ix].num,
+ summary.ctrs[ix].num_hot_counters,
+ summary.ctrs[ix].runs);
printf (", sum_all=" HOST_WIDEST_INT_PRINT_DEC,
(HOST_WIDEST_INT)summary.ctrs[ix].sum_all);
--
This patch is available for review at http://codereview.appspot.com/6298056
Issue 6298056: [google/4_6] New fdo summary-based icache sensitive unrolling
Created 12 years, 10 months ago by tejohnson
Modified 10 years, 7 months ago
Reviewers:
Base URL: svn+ssh://gcc.gnu.org/svn/gcc/branches/google/gcc-4_6/gcc/
Comments: 0