Created: 12 years, 5 months ago by tejohnson
Modified: 10 years, 7 months ago
Reviewers: hubicka
CC: gcc-patches_gcc.gnu.org
Base URL: svn+ssh://gcc.gnu.org/svn/gcc/trunk/gcc/
Visibility: Public.
Patch Set 1

Messages
Total messages: 3

This patch uses the new working set information from the profile to select
the hot count threshold for an application instead of using a hard cutoff.
Currently the threshold is set by default to the minimum counter value
needed to reach 99.9% of the profiled execution time, but I have added
a parameter to control this.

I saw a couple improvements in SPEC2006 on a Westmere, such as xalancbmk by a few
percent.

Bootstrapped and tested on x86_64-unknown-linux-gnu. Ok for trunk?

Thanks,
Teresa


2012-11-19  Teresa Johnson  <tejohnson@google.com>

        * predict.c (maybe_hot_count_p): Use threshold from profiled working
        set instead of hard limit.
        (cgraph_maybe_hot_edge_p): Invoke maybe_hot_count_p() instead of
        directly checking limit.
        * params.def (HOT_BB_COUNT_FRACTION): Remove.
        (HOT_BB_COUNT_WS_PERCENT): New parameter.

Index: predict.c
===================================================================
--- predict.c   (revision 193614)
+++ predict.c   (working copy)
@@ -134,13 +134,15 @@ maybe_hot_frequency_p (struct function *fun, int f
 static inline bool
 maybe_hot_count_p (struct function *fun, gcov_type count)
 {
+  gcov_working_set_t *ws;
   if (profile_status_for_function (fun) != PROFILE_READ)
     return true;
   /* Code executed at most once is not hot.  */
   if (profile_info->runs >= count)
     return false;
-  return (count
-          > profile_info->sum_max / PARAM_VALUE (HOT_BB_COUNT_FRACTION));
+  ws = find_working_set (PARAM_VALUE (HOT_BB_COUNT_WS_PERCENT));
+  gcc_assert (ws);
+  return (count >= ws->min_counter);
 }
 
 /* Return true in case BB can be CPU intensive and should be optimized
@@ -161,8 +163,8 @@ bool
 cgraph_maybe_hot_edge_p (struct cgraph_edge *edge)
 {
   if (profile_info && flag_branch_probabilities
-      && (edge->count
-          <= profile_info->sum_max / PARAM_VALUE (HOT_BB_COUNT_FRACTION)))
+      && !maybe_hot_count_p (DECL_STRUCT_FUNCTION (edge->caller->symbol.decl),
+                             edge->count))
     return false;
   if (edge->caller->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED
       || (edge->callee
Index: params.def
===================================================================
--- params.def  (revision 193614)
+++ params.def  (working copy)
@@ -365,10 +365,11 @@ DEFPARAM(PARAM_SMS_LOOP_AVERAGE_COUNT_THRESHOLD,
          "A threshold on the average loop count considered by the swing modulo scheduler",
          0, 0, 0)
 
-DEFPARAM(HOT_BB_COUNT_FRACTION,
-         "hot-bb-count-fraction",
-         "Select fraction of the maximal count of repetitions of basic block in program given basic block needs to have to be considered hot",
-         10000, 0, 0)
+DEFPARAM(HOT_BB_COUNT_WS_PERCENT,
+         "hot-bb-count-ws-percent",
+         "A basic block profile count is considered hot if it contributes to "
+         "the given percentage (times ten) of the entire profiled execution",
+         999, 0, 1000)
 DEFPARAM(HOT_BB_FREQUENCY_FRACTION,
          "hot-bb-frequency-fraction",
          "Select fraction of the maximal frequency of executions of basic block in function given basic block needs to have to be considered hot",
@@ -392,7 +393,7 @@ DEFPARAM (PARAM_ALIGN_LOOP_ITERATIONS,
    flatten the profile.
 
    We need to cut the maximal predicted iterations to large enough iterations
-   so the loop appears important, but safely within HOT_BB_COUNT_FRACTION
+   so the loop appears important, but safely within maximum hotness
    range.  */
 
 DEFPARAM(PARAM_MAX_PREDICTED_ITERATIONS,

--
This patch is available for review at http://codereview.appspot.com/6852069
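
To make the threshold rule in the patch description concrete, here is a minimal sketch of how a "minimum counter value needed to reach 99.9% of the profiled execution" could be derived from a flat array of counters. It is a simplified stand-in, not GCC's find_working_set: the names hot_count_threshold and count_is_hot are hypothetical, and the percentage is passed in tenths of a percent to match the patch's parameter range of 0 to 1000.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Sort helper: order counters from largest to smallest.  */
static int
cmp_desc (const void *a, const void *b)
{
  int64_t x = *(const int64_t *) a;
  int64_t y = *(const int64_t *) b;
  return (x < y) - (x > y);
}

/* Hypothetical helper (not GCC's find_working_set): return the smallest
   counter among the hottest counters that together account for at least
   PER_THOUSAND/1000 of the sum of all counters.  COUNTERS is sorted in
   place.  Assumes the sums are small enough that multiplying by 1000
   does not overflow int64_t.  */
int64_t
hot_count_threshold (int64_t *counters, size_t n, unsigned per_thousand)
{
  int64_t total = 0, running = 0;
  size_t i;

  assert (n > 0 && per_thousand <= 1000);
  for (i = 0; i < n; i++)
    total += counters[i];

  qsort (counters, n, sizeof counters[0], cmp_desc);
  for (i = 0; i < n; i++)
    {
      running += counters[i];
      /* running / total >= per_thousand / 1000, without division.  */
      if (running * 1000 >= total * (int64_t) per_thousand)
        return counters[i];
    }
  return counters[n - 1];
}

/* Mirrors the patch's final comparison: COUNT is maybe hot if it
   reaches the working-set threshold.  */
int
count_is_hot (int64_t count, int64_t threshold)
{
  return count >= threshold;
}
```

With counters 900, 90, 9 and 1 (sum 1000), the three largest are needed to cover 99.9% of the total, so the threshold is 9; a block with count 90 is then hot and a block with count 1 is not.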

> This patch uses the new working set information from the profile to select
> the hot count threshold for an application instead of using a hard cutoff.
> Currently the threshold is set by default to the minimum counter value
> needed to reach 99.9% of the profiled execution time, but I have added
> a parameter to control this.
>
> I saw a couple improvements in SPEC2006 on a Westmere, such as xalancbmk by a few
> percent.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu. Ok for trunk?
>
> Thanks,
> Teresa
>
>
> 2012-11-19  Teresa Johnson  <tejohnson@google.com>
>
>         * predict.c (maybe_hot_count_p): Use threshold from profiled working
>         set instead of hard limit.
>         (cgraph_maybe_hot_edge_p): Invoke maybe_hot_count_p() instead of
>         directly checking limit.
>         * params.def (HOT_BB_COUNT_FRACTION): Remove.
>         (HOT_BB_COUNT_WS_PERCENT): New parameter.
>
> Index: predict.c
> ===================================================================
> --- predict.c   (revision 193614)
> +++ predict.c   (working copy)
> @@ -134,13 +134,15 @@ maybe_hot_frequency_p (struct function *fun, int f
>  static inline bool
>  maybe_hot_count_p (struct function *fun, gcov_type count)
>  {
> +  gcov_working_set_t *ws;
>    if (profile_status_for_function (fun) != PROFILE_READ)
>      return true;
>    /* Code executed at most once is not hot.  */
>    if (profile_info->runs >= count)
>      return false;
> -  return (count
> -          > profile_info->sum_max / PARAM_VALUE (HOT_BB_COUNT_FRACTION));
> +  ws = find_working_set (PARAM_VALUE (HOT_BB_COUNT_WS_PERCENT));
> +  gcc_assert (ws);
> +  return (count >= ws->min_counter);

I think you want to store the minimal count into a global variable to avoid
the repeated working set lookup.

> Index: params.def
> ===================================================================
> --- params.def  (revision 193614)
> +++ params.def  (working copy)
> @@ -365,10 +365,11 @@ DEFPARAM(PARAM_SMS_LOOP_AVERAGE_COUNT_THRESHOLD,
>           "A threshold on the average loop count considered by the swing modulo scheduler",
>           0, 0, 0)
>
> -DEFPARAM(HOT_BB_COUNT_FRACTION,
> -         "hot-bb-count-fraction",
> -         "Select fraction of the maximal count of repetitions of basic block in program given basic block needs to have to be considered hot",
> -         10000, 0, 0)
> +DEFPARAM(HOT_BB_COUNT_WS_PERCENT,
> +         "hot-bb-count-ws-percent",
> +         "A basic block profile count is considered hot if it contributes to "
> +         "the given percentage (times ten) of the entire profiled execution",
> +         999, 0, 1000)

And document the parameter.

Honza

>  DEFPARAM(HOT_BB_FREQUENCY_FRACTION,
>           "hot-bb-frequency-fraction",
>           "Select fraction of the maximal frequency of executions of basic block in function given basic block needs to have to be considered hot",
> @@ -392,7 +393,7 @@ DEFPARAM (PARAM_ALIGN_LOOP_ITERATIONS,
>     flatten the profile.
>
>     We need to cut the maximal predicted iterations to large enough iterations
> -   so the loop appears important, but safely within HOT_BB_COUNT_FRACTION
> +   so the loop appears important, but safely within maximum hotness
>     range.  */
>
>  DEFPARAM(PARAM_MAX_PREDICTED_ITERATIONS,
>
> --
> This patch is available for review at http://codereview.appspot.com/6852069

Hi,
I went ahead, updated the patch, tested with profiledbootstrap on x86_64-linux
and committed. I really need to progress with heuristic re-tuning before we
get too far in stage3.

In addition to caching the result of find_working_set, I had to avoid an ICE
when we try to determine DECL_STRUCT_FUNCTION of callee if it is an indirect
call.

Thanks,
Honza

2012-11-19  Teresa Johnson  <tejohnson@google.com>
            Jan Hubicka  <jh@suse.cz>

        * predict.c (maybe_hot_count_p): Use threshold from profiled working
        set instead of hard limit.
        (cgraph_maybe_hot_edge_p): Invoke maybe_hot_count_p() instead of
        directly checking limit.
        * params.def (HOT_BB_COUNT_FRACTION): Remove.
        (HOT_BB_COUNT_WS_PERMILLE): New parameter.
        * invoke.texi (hot-bb-count-fraction): Remove.
        (hot-bb-count-ws-permille): Document.

Index: predict.c
===================================================================
--- predict.c   (revision 193696)
+++ predict.c   (working copy)
@@ -134,13 +134,20 @@ maybe_hot_frequency_p (struct function *
 static inline bool
 maybe_hot_count_p (struct function *fun, gcov_type count)
 {
-  if (profile_status_for_function (fun) != PROFILE_READ)
+  gcov_working_set_t *ws;
+  static gcov_type min_count = -1;
+  if (fun && profile_status_for_function (fun) != PROFILE_READ)
     return true;
   /* Code executed at most once is not hot.  */
   if (profile_info->runs >= count)
     return false;
-  return (count
-          > profile_info->sum_max / PARAM_VALUE (HOT_BB_COUNT_FRACTION));
+  if (min_count == -1)
+    {
+      ws = find_working_set (PARAM_VALUE (HOT_BB_COUNT_WS_PERMILLE));
+      gcc_assert (ws);
+      min_count = ws->min_counter;
+    }
+  return (count >= min_count);
 }
 
 /* Return true in case BB can be CPU intensive and should be optimized
@@ -161,8 +168,8 @@ bool
 cgraph_maybe_hot_edge_p (struct cgraph_edge *edge)
 {
   if (profile_info && flag_branch_probabilities
-      && (edge->count
-          <= profile_info->sum_max / PARAM_VALUE (HOT_BB_COUNT_FRACTION)))
+      && !maybe_hot_count_p (NULL,
+                             edge->count))
     return false;
   if (edge->caller->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED
       || (edge->callee
Index: params.def
===================================================================
--- params.def  (revision 193696)
+++ params.def  (working copy)
@@ -365,10 +365,11 @@ DEFPARAM(PARAM_SMS_LOOP_AVERAGE_COUNT_TH
          "A threshold on the average loop count considered by the swing modulo scheduler",
          0, 0, 0)
 
-DEFPARAM(HOT_BB_COUNT_FRACTION,
-         "hot-bb-count-fraction",
-         "Select fraction of the maximal count of repetitions of basic block in program given basic block needs to have to be considered hot",
-         10000, 0, 0)
+DEFPARAM(HOT_BB_COUNT_WS_PERMILLE,
+         "hot-bb-count-ws-permille",
+         "A basic block profile count is considered hot if it contributes to "
+         "the given permillage of the entire profiled execution",
+         999, 0, 1000)
 DEFPARAM(HOT_BB_FREQUENCY_FRACTION,
          "hot-bb-frequency-fraction",
          "Select fraction of the maximal frequency of executions of basic block in function given basic block needs to have to be considered hot",
@@ -392,7 +393,7 @@ DEFPARAM (PARAM_ALIGN_LOOP_ITERATIONS,
    flatten the profile.
 
    We need to cut the maximal predicted iterations to large enough iterations
-   so the loop appears important, but safely within HOT_BB_COUNT_FRACTION
+   so the loop appears important, but safely within maximum hotness
    range.  */
 
 DEFPARAM(PARAM_MAX_PREDICTED_ITERATIONS,
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi     (revision 193696)
+++ doc/invoke.texi     (working copy)
@@ -9216,9 +9216,9 @@ doing loop versioning for alias in the v
 The maximum number of iterations of a loop the brute-force algorithm
 for analysis of the number of iterations of the loop tries to evaluate.
 
-@item hot-bb-count-fraction
-Select fraction of the maximal count of repetitions of basic block in program
-given basic block needs to have to be considered hot.
+@item hot-bb-count-ws-permille
+A basic block profile count is considered hot if it contributes to
+the given permillage (i.e. 0...1000) of the entire profiled execution.
 
 @item hot-bb-frequency-fraction
 Select fraction of the entry block frequency of executions of basic block in
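
The caching in the committed version can be illustrated in isolation. The sketch below is not GCC code: lookup_min_counter is a hypothetical stand-in for the find_working_set lookup, and the call counter exists only to show that the lookup runs once. It follows the same pattern as the patch, a function-local static initialized to -1 as a "not yet computed" marker.

```c
#include <assert.h>
#include <stdint.h>

/* Counts how often the expensive lookup runs; for illustration only.  */
static int lookup_calls;

/* Hypothetical stand-in for find_working_set (...)->min_counter.
   The value 9 is an arbitrary example threshold.  */
static int64_t
lookup_min_counter (void)
{
  lookup_calls++;
  return 9;
}

/* Return nonzero if COUNT reaches the cached hot threshold.  The lookup
   is performed only on the first call; -1 marks "not computed yet".  */
int
maybe_hot_count (int64_t count)
{
  static int64_t min_count = -1;

  if (min_count == -1)
    {
      min_count = lookup_min_counter ();
      assert (min_count >= 0);
    }
  return count >= min_count;
}

/* Expose the lookup counter so callers can observe the caching.  */
int
lookup_call_count (void)
{
  return lookup_calls;
}
```

One consequence of this pattern is that the cached value is never refreshed, so the sketch assumes the threshold cannot change during the lifetime of the process.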