Created: 12 years, 9 months ago by Sriraman
Modified: 12 years, 1 month ago
CC: gcc-patches_gcc.gnu.org
Base URL: svn+ssh://gcc.gnu.org/svn/gcc/trunk/gcc/
Visibility: Public.
Patch Set 1
Patch Set 2 : User directed Function Multiversioning via Function Overloading
Patch Set 3 : User directed Function Multiversioning via Function Overloading
Patch Set 4 : User directed Function Multiversioning via Function Overloading
Patch Set 5 : User directed Function Multiversioning via Function Overloading
Patch Set 6 : User directed Function Multiversioning via Function Overloading
Patch Set 7 : User directed Function Multiversioning via Function Overloading
Patch Set 8 : User directed Function Multiversioning via Function Overloading
Patch Set 9 : User directed Function Multiversioning via Function Overloading
Total comments: 5
Patch Set 10 : User directed Function Multiversioning via Function Overloading

Messages
Total messages: 86
User directed Function Multiversioning (MV) via Function Overloading
====================================================================

This patch adds support for user directed function MV via function
overloading. For a more detailed description, see:
http://gcc.gnu.org/ml/gcc/2012-03/msg00074.html

Here is an example program with function versions:

int foo ();  /* Default version */
int foo () __attribute__ ((targetv("arch=corei7"))); /* Specialized for corei7 */
int foo () __attribute__ ((targetv("arch=core2"))); /* Specialized for core2 */

int main ()
{
  int (*p)() = &foo;
  return foo () + (*p)();
}

int foo ()
{
  return 0;
}

int __attribute__ ((targetv("arch=corei7")))
foo ()
{
  return 0;
}

int __attribute__ ((targetv("arch=core2")))
foo ()
{
  return 0;
}

The above example has foo defined 3 times, but all 3 definitions of foo are
different versions of the same function. The calls to foo in main, directly
and via a pointer, are calls to the multi-versioned function foo, which is
dispatched to the right foo at run-time.

Function versions must have the same signature but must differ in the
specifier string provided to a new attribute called "targetv", which is the
target attribute with an extra specification to indicate a version. Any
number of versions can be created using the targetv attribute, but it is
mandatory to have one function without the attribute, which is treated as
the default version.

The dispatching is done using the IFUNC mechanism to keep the dispatch
overhead low. The compiler creates a dispatcher function which checks the
CPU type and calls the right version of foo. The dispatching code checks for
the platform type and calls the first version that matches. The default
function is called if no specialized version is appropriate for execution.

The pointer to foo is made to be the address of the dispatcher function, so
that it is unique and calls made via the pointer also work correctly.
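
The dispatch rule described above (check the CPU type, call the first version that matches, otherwise fall back to the default) can be approximated by hand in portable C. This is only a sketch under stated assumptions: `cpu_kind`, `version_entry`, `foo_versions` and `resolve_foo` are hypothetical names that do not appear in the patch, the CPU kind is passed in rather than probed, and the versions return distinct values purely so they can be told apart. The patch itself emits the equivalent logic into an IFUNC resolver instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU kinds; the patch derives these from "arch=" strings.  */
enum cpu_kind { CPU_GENERIC, CPU_CORE2, CPU_COREI7 };

typedef int (*foo_fn) (void);

/* Stand-ins for the three versions of foo.  They return different values
   only so that the selected version is observable.  */
static int foo_default (void) { return 0; }
static int foo_corei7 (void) { return 7; }
static int foo_core2 (void) { return 2; }

/* One dispatch candidate: the CPU it is specialized for and its body.  */
struct version_entry
{
  enum cpu_kind cpu;
  foo_fn fn;
};

/* Candidates in priority order; the default is deliberately not listed.  */
static const struct version_entry foo_versions[] = {
  { CPU_COREI7, foo_corei7 },
  { CPU_CORE2, foo_core2 },
};

/* Return the first version that matches CPU, or the default version
   when no specialized version is appropriate.  */
static foo_fn
resolve_foo (enum cpu_kind cpu)
{
  size_t i;

  for (i = 0; i < sizeof foo_versions / sizeof foo_versions[0]; i++)
    {
      assert (foo_versions[i].fn != NULL);
      if (foo_versions[i].cpu == cpu)
        return foo_versions[i].fn;
    }

  return foo_default;
}
```

Calling `resolve_foo (CPU_COREI7) ()` runs the corei7 body, while a kind with no table entry, such as `CPU_GENERIC`, falls through to `foo_default`. With IFUNC the lookup is performed by the dynamic loader when the symbol is resolved, so ordinary calls and calls through a pointer pay no per-call dispatch cost beyond an indirect call.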

The assembler names of the various versions of foo are made different, by
tagging them with the specifier strings, to keep them unique. A specific
version can be called directly by creating an alias to its assembler name.
For instance, to call the corei7 version directly, make an alias:

int foo_corei7 () __attribute__((alias ("_Z3foov.arch_corei7")));

and then call foo_corei7.

Note that using IFUNC blocks inlining of versioned functions. I had
implemented an optimization earlier to do hot path cloning to allow
versioned functions to be inlined. Please see:
http://gcc.gnu.org/ml/gcc-patches/2011-04/msg02285.html
In the next iteration, I plan to merge these two. With that, hot code paths
with versioned functions will be cloned so that versioned functions can be
inlined.

	* doc/tm.texi.in: Add description for TARGET_DISPATCH_VERSION.
	* doc/tm.texi: Regenerate.
	* c-family/c-common.c (handle_targetv_attribute): New function.
	* target.def (dispatch_version): New target hook.
	* tree.h (DECL_FUNCTION_VERSIONED): New macro.
	(tree_function_decl): New bit-field versioned_function.
	* tree-pass.h (pass_dispatch_versions): New pass.
	* multiversion.c: New file.
	* multiversion.h: New file.
	* cgraphunit.c: Include multiversion.h
	(cgraph_finalize_function): Change assembler names of versioned functions.
	* cp/class.c: Include multiversion.h
	(add_method): Aggregate function versions. Change assembler names of versioned functions.
	(resolve_address_of_overloaded_function): Match address of function version with default function. Return address of ifunc dispatcher for address of versioned functions.
	* cp/decl.c (decls_match): Make decls unmatched for versioned functions.
	(duplicate_decls): Remove ambiguity for versioned functions. Notify of deleted function version decls.
	(start_decl): Change assembler name of versioned functions.
	(start_function): Change assembler name of versioned functions.
	(cxx_comdat_group): Make comdat group of versioned functions be the same.
	* cp/semantics.c (expand_or_defer_fn_1): Mark as needed versioned functions that are also marked inline.
	* cp/decl2.c: Include multiversion.h
	(check_classfn): Check attributes of versioned functions for match.
	* cp/call.c: Include multiversion.h
	(build_over_call): Make calls to multiversioned functions to call the dispatcher.
	(joust): For calls to multi-versioned functions, make the default function win.
	* timevar.def (TV_MULTIVERSION_DISPATCH): New time var.
	* varasm.c (finish_aliases_1): Check if the alias points to a function with a body before giving an error.
	* Makefile.in: Add multiversion.o
	* passes.c: Add pass_dispatch_versions to the pass list.
	* config/i386/i386.c (add_condition_to_bb): New function.
	(get_builtin_code_for_version): New function.
	(ix86_dispatch_version): New function.
	(TARGET_DISPATCH_VERSION): New macro.
	* testsuite/g++.dg/mv1.C: New test.

Index: doc/tm.texi
===================================================================
--- doc/tm.texi (revision 184971)
+++ doc/tm.texi (working copy)
@@ -10995,6 +10995,14 @@ The result is another tree containing a simplified
 call's result. If @var{ignore} is true the value will be ignored.
 @end deftypefn
 
+@deftypefn {Target Hook} int TARGET_DISPATCH_VERSION (tree @var{dispatch_decl}, void *@var{fndecls}, basic_block *@var{empty_bb})
+For multi-versioned function, this hook sets up the dispatcher.
+@var{dispatch_decl} is the function that will be used to dispatch the
+version. @var{fndecls} are the function choices for dispatch.
+@var{empty_bb} is an basic block in @var{dispatch_decl} where the
+code to do the dispatch will be added.
+@end deftypefn + @deftypefn {Target Hook} {const char *} TARGET_INVALID_WITHIN_DOLOOP (const_rtx @var{insn}) Take an instruction in @var{insn} and return NULL if it is valid within a Index: doc/tm.texi.in =================================================================== --- doc/tm.texi.in (revision 184971) +++ doc/tm.texi.in (working copy) @@ -10873,6 +10873,14 @@ The result is another tree containing a simplified call's result. If @var{ignore} is true the value will be ignored. @end deftypefn +@hook TARGET_DISPATCH_VERSION +For multi-versioned function, this hook sets up the dispatcher. +@var{dispatch_decl} is the function that will be used to dispatch the +version. @var{fndecls} are the function choices for dispatch. +@var{empty_bb} is an basic block in @var{dispatch_decl} where the +code to do the dispatch will be added. +@end deftypefn + @hook TARGET_INVALID_WITHIN_DOLOOP Take an instruction in @var{insn} and return NULL if it is valid within a Index: c-family/c-common.c =================================================================== --- c-family/c-common.c (revision 184971) +++ c-family/c-common.c (working copy) @@ -315,6 +315,7 @@ static tree check_case_value (tree); static bool check_case_bounds (tree, tree, tree *, tree *); static tree handle_packed_attribute (tree *, tree, tree, int, bool *); +static tree handle_targetv_attribute (tree *, tree, tree, int, bool *); static tree handle_nocommon_attribute (tree *, tree, tree, int, bool *); static tree handle_common_attribute (tree *, tree, tree, int, bool *); static tree handle_noreturn_attribute (tree *, tree, tree, int, bool *); @@ -604,6 +605,8 @@ const struct attribute_spec c_common_attribute_tab { /* { name, min_len, max_len, decl_req, type_req, fn_type_req, handler, affects_type_identity } */ + { "targetv", 1, -1, true, false, false, + handle_targetv_attribute, false }, { "packed", 0, 0, false, false, false, handle_packed_attribute , false}, { "nocommon", 0, 0, true, false, false, @@ -5869,6 
+5872,54 @@ handle_packed_attribute (tree *node, tree name, tr return NULL_TREE; } +/* The targetv attribue is used to specify a function version + targeted to specific platform types. The "targetv" attributes + have to be valid "target" attributes. NODE should always point + to a FUNCTION_DECL. ARGS contain the arguments to "targetv" + which should be valid arguments to attribute "target" too. + Check handle_target_attribute for FLAGS and NO_ADD_ATTRS. */ + +static tree +handle_targetv_attribute (tree *node, tree name, + tree args, + int flags, + bool *no_add_attrs) +{ + const char *attr_str = NULL; + gcc_assert (TREE_CODE (*node) == FUNCTION_DECL); + gcc_assert (args != NULL); + + /* This is a function version. */ + DECL_FUNCTION_VERSIONED (*node) = 1; + + attr_str = TREE_STRING_POINTER (TREE_VALUE (args)); + + /* Check if multiple sets of target attributes are there. This + is not supported now. In future, this will be supported by + cloning this function for each set. */ + if (TREE_CHAIN (args) != NULL) + warning (OPT_Wattributes, "%qE attribute has multiple sets which " + "is not supported", name); + + if (attr_str == NULL + || strstr (attr_str, "arch=") == NULL) + error_at (DECL_SOURCE_LOCATION (*node), + "Versioning supported only on \"arch=\" for now"); + + /* targetv attributes must translate into target attributes. */ + handle_target_attribute (node, get_identifier ("target"), args, flags, + no_add_attrs); + + if (*no_add_attrs) + warning (OPT_Wattributes, "%qE attribute has no effect", name); + + /* This is necessary to keep the attribute tagged to the decl + all the time. */ + *no_add_attrs = false; + + return NULL_TREE; +} + /* Handle a "nocommon" attribute; arguments as in struct attribute_spec.handler. 
*/ Index: target.def =================================================================== --- target.def (revision 184971) +++ target.def (working copy) @@ -1249,6 +1249,15 @@ DEFHOOK tree, (tree fndecl, int n_args, tree *argp, bool ignore), hook_tree_tree_int_treep_bool_null) +/* Target hook to generate the dispatching code for calls to multi-versioned + functions. DISPATCH_DECL is the function that will have the dispatching + logic. FNDECLS are the list of choices for dispatch and EMPTY_BB is the + basic bloc in DISPATCH_DECL which will contain the code. */ +DEFHOOK +(dispatch_version, + "", + int, (tree dispatch_decl, void *fndecls, basic_block *empty_bb), NULL) + /* Returns a code for a target-specific builtin that implements reciprocal of the function, or NULL_TREE if not available. */ DEFHOOK Index: tree.h =================================================================== --- tree.h (revision 184971) +++ tree.h (working copy) @@ -3532,6 +3532,12 @@ extern VEC(tree, gc) **decl_debug_args_insert (tre #define DECL_FUNCTION_SPECIFIC_OPTIMIZATION(NODE) \ (FUNCTION_DECL_CHECK (NODE)->function_decl.function_specific_optimization) +/* In FUNCTION_DECL, this is set if this function has other versions generated + using "targetv" attributes. The default version is the one which does not + have any "targetv" attribute set. */ +#define DECL_FUNCTION_VERSIONED(NODE)\ + (FUNCTION_DECL_CHECK (NODE)->function_decl.versioned_function) + /* FUNCTION_DECL inherits from DECL_NON_COMMON because of the use of the arguments/result/saved_tree fields by front ends. It was either inherit FUNCTION_DECL from non_common, or inherit non_common from FUNCTION_DECL, @@ -3576,8 +3582,8 @@ struct GTY(()) tree_function_decl { unsigned looping_const_or_pure_flag : 1; unsigned has_debug_args_flag : 1; unsigned tm_clone_flag : 1; - - /* 1 bit left */ + unsigned versioned_function : 1; + /* No bits left. */ }; /* The source language of the translation-unit. 
*/ Index: tree-pass.h =================================================================== --- tree-pass.h (revision 184971) +++ tree-pass.h (working copy) @@ -455,6 +455,7 @@ extern struct gimple_opt_pass pass_tm_memopt; extern struct gimple_opt_pass pass_tm_edges; extern struct gimple_opt_pass pass_split_functions; extern struct gimple_opt_pass pass_feedback_split_functions; +extern struct gimple_opt_pass pass_dispatch_versions; /* IPA Passes */ extern struct simple_ipa_opt_pass pass_ipa_lower_emutls; Index: multiversion.c =================================================================== --- multiversion.c (revision 0) +++ multiversion.c (revision 0) @@ -0,0 +1,798 @@ +/* Function Multiversioning. + Copyright (C) 2012 Free Software Foundation, Inc. + Contributed by Sriraman Tallam (tmsriram@google.com) + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +<http://www.gnu.org/licenses/>. */ + +/* Holds the state for multi-versioned functions here. The front-end + updates the state as and when function versions are encountered. + This is then used to generate the dispatch code. Also, the + optimization passes to clone hot paths involving versioned functions + will be done here. + + Function versions are created by using the same function signature but + also tagging attribute "targetv" to specify the platform type for which + the version must be executed. 
Here is an example: + + int foo () + { + printf ("Execute as default"); + return 0; + } + + int __attribute__ ((targetv ("arch=corei7"))) + foo () + { + printf ("Execute for corei7"); + return 0; + } + + int main () + { + return foo (); + } + + The call to foo in main is replaced with a call to an IFUNC function that + contains the dispatch code to call the correct function version at + run-time. */ + + +#include "config.h" +#include "system.h" +#include "coretypes.h" +#include "tm.h" +#include "tree.h" +#include "tree-inline.h" +#include "langhooks.h" +#include "flags.h" +#include "cgraph.h" +#include "diagnostic.h" +#include "toplev.h" +#include "timevar.h" +#include "params.h" +#include "fibheap.h" +#include "intl.h" +#include "tree-pass.h" +#include "hashtab.h" +#include "coverage.h" +#include "ggc.h" +#include "tree-flow.h" +#include "rtl.h" +#include "ipa-prop.h" +#include "basic-block.h" +#include "toplev.h" +#include "dbgcnt.h" +#include "tree-dump.h" +#include "output.h" +#include "vecprim.h" +#include "gimple-pretty-print.h" +#include "ipa-inline.h" +#include "target.h" +#include "multiversion.h" + +typedef void * void_p; + +DEF_VEC_P (void_p); +DEF_VEC_ALLOC_P (void_p, heap); + +/* Each function decl that is a function version gets an instance of this + structure. Since this is called by the front-end, decl merging can + happen, where a decl created for a new declaration is merged with + the old. In this case, the new decl is deleted and the IS_DELETED + field is set for the struct instance corresponding to the new decl. + IFUNC_DECL is the decl of the ifunc function for default decls. + IFUNC_RESOLVER_DECL is the decl of the dispatch function. VERSIONS + is a vector containing the list of function versions that are + the candidates for dispatch. 
*/ + +typedef struct version_function_d { + tree decl; + tree ifunc_decl; + tree ifunc_resolver_decl; + VEC (void_p, heap) *versions; + bool is_deleted; +} version_function; + +/* Hashmap has an entry for every function decl that has other function + versions. For function decls that are the default, it also stores the + list of all the other function versions. Each entry is a structure + of type version_function_d. */ +static htab_t decl_version_htab = NULL; + +/* Hashtable helpers for decl_version_htab. */ + +static hashval_t +decl_version_htab_hash_descriptor (const void *p) +{ + const version_function *t = (const version_function *) p; + return htab_hash_pointer (t->decl); +} + +/* Hashtable helper for decl_version_htab. */ + +static int +decl_version_htab_eq_descriptor (const void *p1, const void *p2) +{ + const version_function *t1 = (const version_function *) p1; + return htab_eq_pointer ((const void_p) t1->decl, p2); +} + +/* Create the decl_version_htab. */ +static void +create_decl_version_htab (void) +{ + if (decl_version_htab == NULL) + decl_version_htab = htab_create (10, decl_version_htab_hash_descriptor, + decl_version_htab_eq_descriptor, NULL); +} + +/* Creates an instance of version_function for decl DECL. */ + +static version_function* +new_version_function (const tree decl) +{ + version_function *v; + v = (version_function *)xmalloc(sizeof (version_function)); + v->decl = decl; + v->ifunc_decl = NULL; + v->ifunc_resolver_decl = NULL; + v->versions = NULL; + v->is_deleted = false; + return v; +} + +/* Comparator function to be used in qsort routine to sort attribute + specification strings to "targetv". */ + +static int +attr_strcmp (const void *v1, const void *v2) +{ + const char *c1 = *(char *const*)v1; + const char *c2 = *(char *const*)v2; + return strcmp (c1, c2); +} + +/* STR is the argument to targetv attribute. 
This function tokenizes + the comma separated arguments, sorts them and returns a string which + is a unique identifier for the comma separated arguments. */ + +static char * +sorted_attr_string (const char *str) +{ + char **args = NULL; + char *attr_str, *ret_str; + char *attr = NULL; + unsigned int argnum = 1; + unsigned int i; + + for (i = 0; i < strlen (str); i++) + if (str[i] == ',') + argnum++; + + attr_str = (char *)xmalloc (strlen (str) + 1); + strcpy (attr_str, str); + + for (i = 0; i < strlen (attr_str); i++) + if (attr_str[i] == '=') + attr_str[i] = '_'; + + if (argnum == 1) + return attr_str; + + args = (char **)xmalloc (argnum * sizeof (char *)); + + i = 0; + attr = strtok (attr_str, ","); + while (attr != NULL) + { + args[i] = attr; + i++; + attr = strtok (NULL, ","); + } + + qsort (args, argnum, sizeof (char*), attr_strcmp); + + ret_str = (char *)xmalloc (strlen (str) + 1); + strcpy (ret_str, args[0]); + for (i = 1; i < argnum; i++) + { + strcat (ret_str, "_"); + strcat (ret_str, args[i]); + } + + free (args); + free (attr_str); + return ret_str; +} + +/* Returns true when only one of DECL1 and DECL2 is marked with "targetv" + or if the "targetv" attribute strings of DECL1 and DECL2 dont match. 
*/ + +bool +has_different_version_attributes (const tree decl1, const tree decl2) +{ + tree attr1, attr2; + char *c1, *c2; + bool ret = false; + + if (TREE_CODE (decl1) != FUNCTION_DECL + || TREE_CODE (decl2) != FUNCTION_DECL) + return false; + + attr1 = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl1)); + attr2 = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl2)); + + if (attr1 == NULL_TREE && attr2 == NULL_TREE) + return false; + + if ((attr1 == NULL_TREE && attr2 != NULL_TREE) + || (attr1 != NULL_TREE && attr2 == NULL_TREE)) + return true; + + c1 = sorted_attr_string ( + TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr1)))); + c2 = sorted_attr_string ( + TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr2)))); + + if (strcmp (c1, c2) != 0) + ret = true; + + free (c1); + free (c2); + + return ret; +} + +/* If this decl corresponds to a function and has "targetv" attribute, + append the attribute string to its assembler name. */ + +void +version_assembler_name (const tree decl) +{ + tree version_attr; + const char *orig_name, *version_string, *attr_str; + char *assembler_name; + tree assembler_name_tree; + + if (TREE_CODE (decl) != FUNCTION_DECL + || DECL_ASSEMBLER_NAME_SET_P (decl) + || !DECL_FUNCTION_VERSIONED (decl)) + return; + + if (DECL_DECLARED_INLINE_P (decl) + &&lookup_attribute ("gnu_inline", + DECL_ATTRIBUTES (decl))) + error_at (DECL_SOURCE_LOCATION (decl), + "Function versions cannot be marked as gnu_inline," + " bodies have to be generated\n"); + + if (DECL_VIRTUAL_P (decl) + || DECL_VINDEX (decl)) + error_at (DECL_SOURCE_LOCATION (decl), + "Virtual function versioning not supported\n"); + + version_attr = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl)); + /* targetv attribute string is NULL for default functions. 
*/ + if (version_attr == NULL_TREE) + return; + + orig_name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)); + version_string + = TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (version_attr))); + + attr_str = sorted_attr_string (version_string); + assembler_name = (char *) xmalloc (strlen (orig_name) + + strlen (attr_str) + 2); + + sprintf (assembler_name, "%s.%s", orig_name, attr_str); + if (dump_file) + fprintf (dump_file, "Assembler name set to %s for function version %s\n", + assembler_name, IDENTIFIER_POINTER (DECL_NAME (decl))); + assembler_name_tree = get_identifier (assembler_name); + SET_DECL_ASSEMBLER_NAME (decl, assembler_name_tree); +} + +/* Returns true if decl is multi-versioned and DECL is the default function, + that is it is not tagged with "targetv" attribute. */ + +bool +is_default_function (const tree decl) +{ + return (TREE_CODE (decl) == FUNCTION_DECL + && DECL_FUNCTION_VERSIONED (decl) + && (lookup_attribute ("targetv", DECL_ATTRIBUTES (decl)) + == NULL_TREE)); +} + +/* For function decl DECL, find the version_function struct in the + decl_version_htab. */ + +static version_function * +find_function_version (const tree decl) +{ + void *slot; + + if (!DECL_FUNCTION_VERSIONED (decl)) + return NULL; + + if (!decl_version_htab) + return NULL; + + slot = htab_find_with_hash (decl_version_htab, decl, + htab_hash_pointer (decl)); + + if (slot != NULL) + return (version_function *)slot; + + return NULL; +} + +/* Record DECL as a function version by creating a version_function struct + for it and storing it in the hashtable. 
*/ + +static version_function * +add_function_version (const tree decl) +{ + void **slot; + version_function *v; + + if (!DECL_FUNCTION_VERSIONED (decl)) + return NULL; + + create_decl_version_htab (); + + slot = htab_find_slot_with_hash (decl_version_htab, (const void_p)decl, + htab_hash_pointer ((const void_p)decl), + INSERT); + + if (*slot != NULL) + return (version_function *)*slot; + + v = new_version_function (decl); + *slot = v; + + return v; +} + +/* Push V into VEC only if it is not already present. */ + +static void +push_function_version (version_function *v, VEC (void_p, heap) *vec) +{ + int ix; + void_p ele; + for (ix = 0; VEC_iterate (void_p, vec, ix, ele); ++ix) + { + if (ele == (void_p)v) + return; + } + + VEC_safe_push (void_p, heap, vec, (void*)v); +} + +/* Mark DECL as deleted. This is called by the front-end when a duplicate + decl is merged with the original decl and the duplicate decl is deleted. + This function marks the duplicate_decl as invalid. Called by + duplicate_decls in cp/decl.c. */ + +void +mark_delete_decl_version (const tree decl) +{ + version_function *decl_v; + + decl_v = find_function_version (decl); + + if (decl_v == NULL) + return; + + decl_v->is_deleted = true; + + if (is_default_function (decl) + && decl_v->versions != NULL) + { + VEC_truncate (void_p, decl_v->versions, 0); + VEC_free (void_p, heap, decl_v->versions); + } +} + +/* Mark DECL1 and DECL2 to be function versions in the same group. One + of DECL1 and DECL2 must be the default, otherwise this function does + nothing. This function aggregates the versions. */ + +int +group_function_versions (const tree decl1, const tree decl2) +{ + tree default_decl, version_decl; + version_function *default_v, *version_v; + + gcc_assert (DECL_FUNCTION_VERSIONED (decl1) + && DECL_FUNCTION_VERSIONED (decl2)); + + /* The version decls are added only to the default decl. 
*/ + if (!is_default_function (decl1) + && !is_default_function (decl2)) + return 0; + + /* This can happen with duplicate declarations. Just ignore. */ + if (is_default_function (decl1) + && is_default_function (decl2)) + return 0; + + default_decl = (is_default_function (decl1)) ? decl1 : decl2; + version_decl = (default_decl == decl1) ? decl2 : decl1; + + gcc_assert (default_decl != version_decl); + create_decl_version_htab (); + + /* If the version function is found, it has been added. */ + if (find_function_version (version_decl)) + return 0; + + default_v = add_function_version (default_decl); + version_v = add_function_version (version_decl); + + if (default_v->versions == NULL) + default_v->versions = VEC_alloc (void_p, heap, 1); + + push_function_version (version_v, default_v->versions); + return 0; +} + +/* Makes a function attribute of the form NAME(ARG_NAME) and chains + it to CHAIN. */ + +static tree +make_attribute (const char *name, const char *arg_name, tree chain) +{ + tree attr_name; + tree attr_arg_name; + tree attr_args; + tree attr; + + attr_name = get_identifier (name); + attr_arg_name = build_string (strlen (arg_name), arg_name); + attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE); + attr = tree_cons (attr_name, attr_args, chain); + return attr; +} + +/* Return a new name by appending SUFFIX to the DECL name. If + make_unique is true, append the full path name. */ + +static char * +make_name (tree decl, const char *suffix, bool make_unique) +{ + char *global_var_name; + int name_len; + const char *name; + const char *unique_name = NULL; + + name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)); + + /* Get a unique name that can be used globally without any chances + of collision at link time. 
*/ + if (make_unique) + unique_name = IDENTIFIER_POINTER (get_file_function_name ("\0")); + + name_len = strlen (name) + strlen (suffix) + 2; + + if (make_unique) + name_len += strlen (unique_name) + 1; + global_var_name = (char *) xmalloc (name_len); + + /* Use '.' to concatenate names as it is demangler friendly. */ + if (make_unique) + snprintf (global_var_name, name_len, "%s.%s.%s", name, + unique_name, suffix); + else + snprintf (global_var_name, name_len, "%s.%s", name, suffix); + + return global_var_name; +} + +/* Make the resolver function decl for ifunc (IFUNC_DECL) to dispatch + the versions of multi-versioned function DEFAULT_DECL. Create and + empty basic block in the resolver and store the pointer in + EMPTY_BB. Return the decl of the resolver function. */ + +static tree +make_ifunc_resolver_func (const tree default_decl, + const tree ifunc_decl, + basic_block *empty_bb) +{ + char *resolver_name; + tree decl, type, decl_name, t; + basic_block new_bb; + tree old_current_function_decl; + bool make_unique = false; + + /* IFUNC's have to be globally visible. So, if the default_decl is + not, then the name of the IFUNC should be made unique. */ + if (TREE_PUBLIC (default_decl) == 0) + make_unique = true; + + /* Append the filename to the resolver function if the versions are + not externally visible. This is because the resolver function has + to be externally visible for the loader to find it. So, appending + the filename will prevent conflicts with a resolver function from + another module which is based on the same version name. */ + resolver_name = make_name (default_decl, "resolver", make_unique); + + /* The resolver function should return a (void *). 
*/ + type = build_function_type_list (ptr_type_node, NULL_TREE); + + decl = build_fn_decl (resolver_name, type); + decl_name = get_identifier (resolver_name); + SET_DECL_ASSEMBLER_NAME (decl, decl_name); + + DECL_NAME (decl) = decl_name; + TREE_USED (decl) = TREE_USED (default_decl); + DECL_ARTIFICIAL (decl) = 1; + DECL_IGNORED_P (decl) = 0; + /* IFUNC resolvers have to be externally visible. */ + TREE_PUBLIC (decl) = 1; + DECL_UNINLINABLE (decl) = 1; + + DECL_EXTERNAL (decl) = DECL_EXTERNAL (default_decl); + DECL_EXTERNAL (ifunc_decl) = 0; + + DECL_CONTEXT (decl) = NULL_TREE; + DECL_INITIAL (decl) = make_node (BLOCK); + DECL_STATIC_CONSTRUCTOR (decl) = 0; + TREE_READONLY (decl) = 0; + DECL_PURE_P (decl) = 0; + DECL_COMDAT (decl) = DECL_COMDAT (default_decl); + if (DECL_COMDAT_GROUP (default_decl)) + { + make_decl_one_only (decl, DECL_COMDAT_GROUP (default_decl)); + } + /* Build result decl and add to function_decl. */ + t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, ptr_type_node); + DECL_ARTIFICIAL (t) = 1; + DECL_IGNORED_P (t) = 1; + DECL_RESULT (decl) = t; + + gimplify_function_tree (decl); + old_current_function_decl = current_function_decl; + push_cfun (DECL_STRUCT_FUNCTION (decl)); + current_function_decl = decl; + init_empty_tree_cfg_for_function (DECL_STRUCT_FUNCTION (decl)); + cfun->curr_properties |= + (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_referenced_vars | + PROP_ssa); + new_bb = create_empty_bb (ENTRY_BLOCK_PTR); + make_edge (ENTRY_BLOCK_PTR, new_bb, EDGE_FALLTHRU); + make_edge (new_bb, EXIT_BLOCK_PTR, 0); + *empty_bb = new_bb; + + cgraph_add_new_function (decl, true); + cgraph_call_function_insertion_hooks (cgraph_get_create_node (decl)); + cgraph_analyze_function (cgraph_get_create_node (decl)); + cgraph_mark_needed_node (cgraph_get_create_node (decl)); + + if (DECL_COMDAT_GROUP (default_decl)) + { + gcc_assert (cgraph_get_node (default_decl)); + cgraph_add_to_same_comdat_group (cgraph_get_node (decl), + cgraph_get_node 
(default_decl)); + } + + pop_cfun (); + current_function_decl = old_current_function_decl; + + gcc_assert (ifunc_decl != NULL); + DECL_ATTRIBUTES (ifunc_decl) + = make_attribute ("ifunc", resolver_name, DECL_ATTRIBUTES (ifunc_decl)); + assemble_alias (ifunc_decl, get_identifier (resolver_name)); + return decl; +} + +/* Make and ifunc declaration for the multi-versioned function DECL. Calls to + DECL function will be replaced with calls to the ifunc. Return the decl + of the ifunc created. */ + +static tree +make_ifunc_func (const tree decl) +{ + tree ifunc_decl; + char *ifunc_name, *resolver_name; + tree fn_type, ifunc_type; + bool make_unique = false; + + if (TREE_PUBLIC (decl) == 0) + make_unique = true; + + ifunc_name = make_name (decl, "ifunc", make_unique); + resolver_name = make_name (decl, "resolver", make_unique); + gcc_assert (resolver_name); + + fn_type = TREE_TYPE (decl); + ifunc_type = build_function_type (TREE_TYPE (fn_type), + TYPE_ARG_TYPES (fn_type)); + + ifunc_decl = build_fn_decl (ifunc_name, ifunc_type); + TREE_USED (ifunc_decl) = 1; + DECL_CONTEXT (ifunc_decl) = NULL_TREE; + DECL_INITIAL (ifunc_decl) = error_mark_node; + DECL_ARTIFICIAL (ifunc_decl) = 1; + /* Mark this ifunc as external, the resolver will flip it again if + it gets generated. */ + DECL_EXTERNAL (ifunc_decl) = 1; + /* IFUNCs have to be externally visible. */ + TREE_PUBLIC (ifunc_decl) = 1; + + return ifunc_decl; +} + +/* For multi-versioned function decl, which should also be the default, + return the decl of the ifunc resolver, create it if it does not + exist. */ + +tree +get_ifunc_for_version (const tree decl) +{ + version_function *decl_v; + int ix; + void_p ele; + + /* DECL has to be the default version, otherwise it is missing and + that is not allowed. 
*/ + if (!is_default_function (decl)) + { + error_at (DECL_SOURCE_LOCATION (decl), "Default version not found"); + return decl; + } + + decl_v = find_function_version (decl); + gcc_assert (decl_v != NULL); + if (decl_v->ifunc_decl == NULL) + { + tree ifunc_decl; + ifunc_decl = make_ifunc_func (decl); + decl_v->ifunc_decl = ifunc_decl; + } + + if (cgraph_get_node (decl)) + cgraph_mark_needed_node (cgraph_get_node (decl)); + + for (ix = 0; VEC_iterate (void_p, decl_v->versions, ix, ele); ++ix) + { + version_function *v = (version_function *) ele; + gcc_assert (v->decl != NULL); + if (cgraph_get_node (v->decl)) + cgraph_mark_needed_node (cgraph_get_node (v->decl)); + } + + return decl_v->ifunc_decl; +} + +/* Generate the dispatching code to dispatch multi-versioned function + DECL. Make a new function decl for dispatching and call the target + hook to process the "targetv" attributes and provide the code to + dispatch the right function at run-time. */ + +static tree +make_ifunc_resolver_for_version (const tree decl) +{ + version_function *decl_v; + tree ifunc_resolver_decl, ifunc_decl; + basic_block empty_bb; + int ix; + void_p ele; + VEC (tree, heap) *fn_ver_vec = NULL; + + gcc_assert (is_default_function (decl)); + + decl_v = find_function_version (decl); + gcc_assert (decl_v != NULL); + + if (decl_v->ifunc_resolver_decl != NULL) + return decl_v->ifunc_resolver_decl; + + ifunc_decl = decl_v->ifunc_decl; + + if (ifunc_decl == NULL) + ifunc_decl = decl_v->ifunc_decl = make_ifunc_func (decl); + + ifunc_resolver_decl = make_ifunc_resolver_func (decl, ifunc_decl, + &empty_bb); + + fn_ver_vec = VEC_alloc (tree, heap, 2); + VEC_safe_push (tree, heap, fn_ver_vec, decl); + + for (ix = 0; VEC_iterate (void_p, decl_v->versions, ix, ele); ++ix) + { + version_function *v = (version_function *) ele; + gcc_assert (v->decl != NULL); + /* Check for virtual functions here again, as by this time it should + have been determined if this function needs a vtable index or + not. 
This happens for methods in derived classes that override + virtual methods in base classes but are not explicitly marked as + virtual. */ + if (DECL_VINDEX (v->decl)) + error_at (DECL_SOURCE_LOCATION (v->decl), + "Virtual function versioning not supported\n"); + if (!v->is_deleted) + VEC_safe_push (tree, heap, fn_ver_vec, v->decl); + } + + gcc_assert (targetm.dispatch_version); + targetm.dispatch_version (ifunc_resolver_decl, fn_ver_vec, &empty_bb); + decl_v->ifunc_resolver_decl = ifunc_resolver_decl; + + return ifunc_resolver_decl; +} + +/* Main entry point to pass_dispatch_versions. For multi-versioned functions, + generate the dispatching code. */ + +static unsigned int +do_dispatch_versions (void) +{ + /* A new pass for generating dispatch code for multi-versioned functions. + Other forms of dispatch can be added when ifunc support is not available + like just calling the function directly after checking for target type. + Currently, dispatching is done through IFUNC. This pass will become + more meaningful when other dispatch mechanisms are added. */ + + /* Cloning a function to produce more versions will happen here when the + user requests that via the targetv attribute. For example, + int foo () __attribute__ ((targetv(("arch=core2"), ("arch=corei7")))); + means that the user wants the same body of foo to be versioned for core2 + and corei7. In that case, this function will be cloned during this + pass. */ + + if (DECL_FUNCTION_VERSIONED (current_function_decl) + && is_default_function (current_function_decl)) + { + tree decl = make_ifunc_resolver_for_version (current_function_decl); + if (dump_file && decl) + dump_function_to_file (decl, dump_file, TDF_BLOCKS); + } + return 0; +} + +static bool +gate_dispatch_versions (void) +{ + return true; +} + +/* A pass to generate the dispatch code to execute the appropriate version + of a multi-versioned function at run-time. 
*/ + +struct gimple_opt_pass pass_dispatch_versions = +{ + { + GIMPLE_PASS, + "dispatch_multiversion_functions", /* name */ + gate_dispatch_versions, /* gate */ + do_dispatch_versions, /* execute */ + NULL, /* sub */ + NULL, /* next */ + 0, /* static_pass_number */ + TV_MULTIVERSION_DISPATCH, /* tv_id */ + PROP_cfg, /* properties_required */ + PROP_cfg, /* properties_provided */ + 0, /* properties_destroyed */ + 0, /* todo_flags_start */ + TODO_dump_func | /* todo_flags_finish */ + TODO_cleanup_cfg | TODO_dump_cgraph + } +}; Index: cgraphunit.c =================================================================== --- cgraphunit.c (revision 184971) +++ cgraphunit.c (working copy) @@ -141,6 +141,7 @@ along with GCC; see the file COPYING3. If not see #include "ipa-inline.h" #include "ipa-utils.h" #include "lto-streamer.h" +#include "multiversion.h" static void cgraph_expand_all_functions (void); static void cgraph_mark_functions_to_output (void); @@ -343,6 +344,13 @@ cgraph_finalize_function (tree decl, bool nested) node->local.redefined_extern_inline = true; } + /* If this is a function version and not the default, change the + assembler name of this function. The DECL names of function + versions are the same, only the assembler names are made unique. + The assembler name is changed by appending the string from + the "targetv" attribute. */ + version_assembler_name (decl); + notice_global_symbol (decl); node->local.finalized = true; node->lowered = DECL_STRUCT_FUNCTION (decl)->cfg != NULL; Index: multiversion.h =================================================================== --- multiversion.h (revision 0) +++ multiversion.h (revision 0) @@ -0,0 +1,52 @@ +/* Function Multiversioning. + Copyright (C) 2012 Free Software Foundation, Inc. + Contributed by Sriraman Tallam (tmsriram@google.com) + +This file is part of GCC. 
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* This is the header file which provides the functions to keep track
+   of functions that are multi-versioned and to generate the dispatch
+   code to call the right version at run-time.  */
+
+#ifndef GCC_MULTIVERSION_H
+#define GCC_MULTIVERSION_H
+
+#include "tree.h"
+
+/* Mark DECL1 and DECL2 as function versions.  */
+int group_function_versions (const tree decl1, const tree decl2);
+
+/* Mark DECL as deleted and no longer a version.  */
+void mark_delete_decl_version (const tree decl);
+
+/* Returns true if DECL is the default version to be executed if all
+   other versions are inappropriate at run-time.  */
+bool is_default_function (const tree decl);
+
+/* Gets the IFUNC dispatcher for this multi-versioned function DECL.  DECL
+   must be the default function in the multi-versioned group.  */
+tree get_ifunc_for_version (const tree decl);
+
+/* Returns true when only one of DECL1 and DECL2 is marked with "targetv"
+   or if the "targetv" attribute strings of DECL1 and DECL2 don't match.  */
+bool has_different_version_attributes (const tree decl1, const tree decl2);
+
+/* If DECL is a function version and not the default version, the assembler
+   name of DECL is changed to include the attribute string to keep the
+   name unambiguous.
*/ +void version_assembler_name (const tree decl); +#endif Index: cp/class.c =================================================================== --- cp/class.c (revision 184971) +++ cp/class.c (working copy) @@ -38,6 +38,7 @@ along with GCC; see the file COPYING3. If not see #include "tree-dump.h" #include "splay-tree.h" #include "pointer-set.h" +#include "multiversion.h" /* The number of nested classes being processed. If we are not in the scope of any class, this is zero. */ @@ -1092,7 +1093,20 @@ add_method (tree type, tree method, tree using_dec || same_type_p (TREE_TYPE (fn_type), TREE_TYPE (method_type)))) { - if (using_decl) + /* For function versions, their parms and types match + but they are not duplicates. Record function versions + as and when they are found. */ + if (TREE_CODE (fn) == FUNCTION_DECL + && TREE_CODE (method) == FUNCTION_DECL + && (DECL_FUNCTION_VERSIONED (fn) + || DECL_FUNCTION_VERSIONED (method))) + { + DECL_FUNCTION_VERSIONED (fn) = 1; + DECL_FUNCTION_VERSIONED (method) = 1; + group_function_versions (fn, method); + continue; + } + else if (using_decl) { if (DECL_CONTEXT (fn) == type) /* Defer to the local function. */ @@ -1150,6 +1164,13 @@ add_method (tree type, tree method, tree using_dec else /* Replace the current slot. */ VEC_replace (tree, method_vec, slot, overload); + + /* Change the assembler name of method here if it has "targetv" + attributes. Since all versions have the same mangled name, + their assembler name is changed by appending the string from + the "targetv" attribute. */ + version_assembler_name (method); + return true; } @@ -6890,8 +6911,11 @@ resolve_address_of_overloaded_function (tree targe if (DECL_ANTICIPATED (fn)) continue; - /* See if there's a match. */ - if (same_type_p (target_fn_type, static_fn_type (fn))) + /* See if there's a match. For functions that are multi-versioned + match it to the default function. 
*/ + if (same_type_p (target_fn_type, static_fn_type (fn)) + && (!DECL_FUNCTION_VERSIONED (fn) + || is_default_function (fn))) matches = tree_cons (fn, NULL_TREE, matches); } } @@ -7053,6 +7077,21 @@ resolve_address_of_overloaded_function (tree targe perform_or_defer_access_check (access_path, fn, fn); } + /* If a pointer to a function that is multi-versioned is requested, the + pointer to the dispatcher function is returned instead. This works + well because indirectly calling the function will dispatch the right + function version at run-time. Also, the function address is kept + unique. */ + if (DECL_FUNCTION_VERSIONED (fn) + && is_default_function (fn)) + { + tree ifunc_decl; + ifunc_decl = get_ifunc_for_version (fn); + gcc_assert (ifunc_decl != NULL); + mark_used (fn); + return build_fold_addr_expr (ifunc_decl); + } + if (TYPE_PTRFN_P (target_type) || TYPE_PTRMEMFUNC_P (target_type)) return cp_build_addr_expr (fn, flags); else Index: cp/decl.c =================================================================== --- cp/decl.c (revision 184971) +++ cp/decl.c (working copy) @@ -54,6 +54,7 @@ along with GCC; see the file COPYING3. If not see #include "pointer-set.h" #include "splay-tree.h" #include "plugin.h" +#include "multiversion.h" /* Possible cases of bad specifiers type used by bad_specifiers. */ enum bad_spec_place { @@ -972,6 +973,23 @@ decls_match (tree newdecl, tree olddecl) if (t1 != t2) return 0; + /* The decls dont match if they correspond to two different versions + of the same function. */ + if (compparms (p1, p2) + && same_type_p (TREE_TYPE (f1), TREE_TYPE (f2)) + && (DECL_FUNCTION_VERSIONED (newdecl) + || DECL_FUNCTION_VERSIONED (olddecl)) + && has_different_version_attributes (newdecl, olddecl)) + { + /* One of the decls could be the default without the "targetv" + attribute. Set it to be a versioned function here. */ + DECL_FUNCTION_VERSIONED (newdecl) = 1; + DECL_FUNCTION_VERSIONED (olddecl) = 1; + /* Accumulate all the versions of a function. 
*/ + group_function_versions (olddecl, newdecl); + return 0; + } + if (CP_DECL_CONTEXT (newdecl) != CP_DECL_CONTEXT (olddecl) && ! (DECL_EXTERN_C_P (newdecl) && DECL_EXTERN_C_P (olddecl))) @@ -1482,7 +1500,11 @@ duplicate_decls (tree newdecl, tree olddecl, bool error ("previous declaration %q+#D here", olddecl); return NULL_TREE; } - else if (compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)), + /* For function versions, params and types match, but they + are not ambiguous. */ + else if ((!DECL_FUNCTION_VERSIONED (newdecl) + && !DECL_FUNCTION_VERSIONED (olddecl)) + && compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)), TYPE_ARG_TYPES (TREE_TYPE (olddecl)))) { error ("new declaration %q#D", newdecl); @@ -2250,6 +2272,16 @@ duplicate_decls (tree newdecl, tree olddecl, bool else if (DECL_PRESERVE_P (newdecl)) DECL_PRESERVE_P (olddecl) = 1; + /* If the olddecl is a version, so is the newdecl. */ + if (TREE_CODE (newdecl) == FUNCTION_DECL + && DECL_FUNCTION_VERSIONED (olddecl)) + { + DECL_FUNCTION_VERSIONED (newdecl) = 1; + /* Record that newdecl is not a valid version and has + been deleted. */ + mark_delete_decl_version (newdecl); + } + if (TREE_CODE (newdecl) == FUNCTION_DECL) { int function_size; @@ -4512,6 +4544,10 @@ start_decl (const cp_declarator *declarator, /* Enter this declaration into the symbol table. */ decl = maybe_push_decl (decl); + /* If this decl is a function version and not the default, its assembler + name has to be changed. */ + version_assembler_name (decl); + if (processing_template_decl) decl = push_template_decl (decl); if (decl == error_mark_node) @@ -13019,6 +13055,10 @@ start_function (cp_decl_specifier_seq *declspecs, gcc_assert (same_type_p (TREE_TYPE (TREE_TYPE (decl1)), integer_type_node)); + /* If this decl is a function version and not the default, its assembler + name has to be changed. 
*/ + version_assembler_name (decl1); + start_preparsed_function (decl1, attrs, /*flags=*/SF_DEFAULT); return 1; @@ -13960,6 +14000,11 @@ cxx_comdat_group (tree decl) break; } name = DECL_ASSEMBLER_NAME (decl); + if (TREE_CODE (decl) == FUNCTION_DECL + && DECL_FUNCTION_VERSIONED (decl)) + name = DECL_NAME (decl); + else + name = DECL_ASSEMBLER_NAME (decl); } return name; Index: cp/semantics.c =================================================================== --- cp/semantics.c (revision 184971) +++ cp/semantics.c (working copy) @@ -3783,8 +3783,11 @@ expand_or_defer_fn_1 (tree fn) /* If the user wants us to keep all inline functions, then mark this function as needed so that finish_file will make sure to output it later. Similarly, all dllexport'd functions must - be emitted; there may be callers in other DLLs. */ - if ((flag_keep_inline_functions + be emitted; there may be callers in other DLLs. + Also, mark this function as needed if it is marked inline but + is a multi-versioned function. */ + if (((flag_keep_inline_functions + || DECL_FUNCTION_VERSIONED (fn)) && DECL_DECLARED_INLINE_P (fn) && !DECL_REALLY_EXTERN (fn)) || (flag_keep_inline_dllexport Index: cp/decl2.c =================================================================== --- cp/decl2.c (revision 184971) +++ cp/decl2.c (working copy) @@ -53,6 +53,7 @@ along with GCC; see the file COPYING3. If not see #include "splay-tree.h" #include "langhooks.h" #include "c-family/c-ada-spec.h" +#include "multiversion.h" extern cpp_reader *parse_in; @@ -674,9 +675,13 @@ check_classfn (tree ctype, tree function, tree tem if (is_template != (TREE_CODE (fndecl) == TEMPLATE_DECL)) continue; + /* While finding a match, same types and params are not enough + if the function is versioned. Also check version ("targetv") + attributes. 
*/ if (same_type_p (TREE_TYPE (TREE_TYPE (function)), TREE_TYPE (TREE_TYPE (fndecl))) && compparms (p1, p2) + && !has_different_version_attributes (function, fndecl) && (!is_template || comp_template_parms (template_parms, DECL_TEMPLATE_PARMS (fndecl))) Index: cp/call.c =================================================================== --- cp/call.c (revision 184971) +++ cp/call.c (working copy) @@ -41,6 +41,7 @@ along with GCC; see the file COPYING3. If not see #include "langhooks.h" #include "c-family/c-objc.h" #include "timevar.h" +#include "multiversion.h" /* The various kinds of conversion. */ @@ -6730,6 +6731,17 @@ build_over_call (struct z_candidate *cand, int fla if (!already_used) mark_used (fn); + /* For a call to a multi-versioned function, the call should actually be to + the dispatcher. */ + if (DECL_FUNCTION_VERSIONED (fn)) + { + tree ifunc_decl; + ifunc_decl = get_ifunc_for_version (fn); + gcc_assert (ifunc_decl != NULL); + return build_call_expr_loc_array (UNKNOWN_LOCATION, ifunc_decl, + nargs, argarray); + } + if (DECL_VINDEX (fn) && (flags & LOOKUP_NONVIRTUAL) == 0) { tree t; @@ -7980,6 +7992,30 @@ joust (struct z_candidate *cand1, struct z_candida size_t i; size_t len; + /* For Candidates of a multi-versioned function, the one marked default + wins. This is because the default decl is used as key to aggregate + all the other versions provided for it in multiversion.c. When + generating the actual call, the appropriate dispatcher is created + to call the right function version at run-time. */ + + if ((TREE_CODE (cand1->fn) == FUNCTION_DECL + && DECL_FUNCTION_VERSIONED (cand1->fn)) + ||(TREE_CODE (cand2->fn) == FUNCTION_DECL + && DECL_FUNCTION_VERSIONED (cand2->fn))) + { + if (is_default_function (cand1->fn)) + { + mark_used (cand2->fn); + return 1; + } + if (is_default_function (cand2->fn)) + { + mark_used (cand1->fn); + return -1; + } + return 0; + } + /* Candidates that involve bad conversions are always worse than those that don't. 
*/
   if (cand1->viable > cand2->viable)
Index: timevar.def
===================================================================
--- timevar.def (revision 184971)
+++ timevar.def (working copy)
@@ -253,6 +253,7 @@ DEFTIMEVAR (TV_TREE_IFCOMBINE , "tree if-co
 DEFTIMEVAR (TV_TREE_UNINIT , "uninit var analysis")
 DEFTIMEVAR (TV_PLUGIN_INIT , "plugin initialization")
 DEFTIMEVAR (TV_PLUGIN_RUN , "plugin execution")
+DEFTIMEVAR (TV_MULTIVERSION_DISPATCH , "multiversion dispatch")
 
 /* Everything else in rest_of_compilation not included above.  */
 DEFTIMEVAR (TV_EARLY_LOCAL , "early local passes")
Index: varasm.c
===================================================================
--- varasm.c (revision 184971)
+++ varasm.c (working copy)
@@ -5755,6 +5755,8 @@ finish_aliases_1 (void)
         }
       else if (! (p->emitted_diags & ALIAS_DIAG_TO_EXTERN)
                && DECL_EXTERNAL (target_decl)
+               && (TREE_CODE (target_decl) != FUNCTION_DECL
+                   || !DECL_STRUCT_FUNCTION (target_decl))
                /* We use local aliases for C++ thunks to force the tailcall
                   to bind locally.  This is a hack - to keep it working do
                   the following (which is not strictly correct).
*/ Index: Makefile.in =================================================================== --- Makefile.in (revision 184971) +++ Makefile.in (working copy) @@ -1298,6 +1298,7 @@ OBJS = \ mcf.o \ mode-switching.o \ modulo-sched.o \ + multiversion.o \ omega.o \ omp-low.o \ optabs.o \ @@ -3030,6 +3031,11 @@ ree.o : ree.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(DF_H) $(TIMEVAR_H) tree-pass.h $(RECOG_H) $(EXPR_H) \ $(REGS_H) $(TREE_H) $(TM_P_H) insn-config.h $(INSN_ATTR_H) $(DIAGNOSTIC_CORE_H) \ $(TARGET_H) $(OPTABS_H) insn-codes.h rtlhooks-def.h $(PARAMS_H) $(CGRAPH_H) +multiversion.o : multiversion.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \ + $(TREE_H) langhooks.h $(TREE_INLINE_H) $(FLAGS_H) $(CGRAPH_H) intl.h \ + $(DIAGNOSTIC_H) $(FIBHEAP_H) $(PARAMS_H) $(TIMEVAR_H) tree-pass.h \ + $(HASHTAB_H) $(COVERAGE_H) $(GGC_H) $(TREE_FLOW_H) $(RTL_H) $(IPA_PROP_H) \ + $(BASIC_BLOCK_H) $(TOPLEV_H) $(TREE_DUMP_H) ipa-inline.h cprop.o : cprop.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \ $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h $(GGC_H) \ $(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) output.h toplev.h $(DIAGNOSTIC_CORE_H) \ Index: passes.c =================================================================== --- passes.c (revision 184971) +++ passes.c (working copy) @@ -1190,6 +1190,7 @@ init_optimization_passes (void) NEXT_PASS (pass_build_cfg); NEXT_PASS (pass_warn_function_return); NEXT_PASS (pass_build_cgraph_edges); + NEXT_PASS (pass_dispatch_versions); *p = NULL; /* Interprocedural optimization passes. */ Index: config/i386/i386.c =================================================================== --- config/i386/i386.c (revision 184971) +++ config/i386/i386.c (working copy) @@ -27446,6 +27473,593 @@ ix86_init_mmx_sse_builtins (void) } } +/* This adds a condition to the basic_block NEW_BB in function FUNCTION_DECL + to return a pointer to VERSION_DECL if the outcome of the function + PREDICATE_DECL is true. 
This function will be called during version + dispatch to decide which function version to execute. It returns the + basic block at the end to which more conditions can be added. */ + +static basic_block +add_condition_to_bb (tree function_decl, tree version_decl, + basic_block new_bb, tree predicate_decl) +{ + gimple return_stmt; + tree convert_expr, result_var; + gimple convert_stmt; + gimple call_cond_stmt; + gimple if_else_stmt; + + basic_block bb1, bb2, bb3; + edge e12, e23; + + tree cond_var; + gimple_seq gseq; + + tree old_current_function_decl; + + old_current_function_decl = current_function_decl; + push_cfun (DECL_STRUCT_FUNCTION (function_decl)); + current_function_decl = function_decl; + + gcc_assert (new_bb != NULL); + gseq = bb_seq (new_bb); + + + convert_expr = build1 (CONVERT_EXPR, ptr_type_node, + build_fold_addr_expr (version_decl)); + result_var = create_tmp_var (ptr_type_node, NULL); + convert_stmt = gimple_build_assign (result_var, convert_expr); + return_stmt = gimple_build_return (result_var); + + if (predicate_decl == NULL_TREE) + { + gimple_seq_add_stmt (&gseq, convert_stmt); + gimple_seq_add_stmt (&gseq, return_stmt); + set_bb_seq (new_bb, gseq); + gimple_set_bb (convert_stmt, new_bb); + gimple_set_bb (return_stmt, new_bb); + pop_cfun (); + current_function_decl = old_current_function_decl; + return new_bb; + } + + cond_var = create_tmp_var (integer_type_node, NULL); + call_cond_stmt = gimple_build_call (predicate_decl, 0); + gimple_call_set_lhs (call_cond_stmt, cond_var); + + gimple_set_block (call_cond_stmt, DECL_INITIAL (function_decl)); + gimple_set_bb (call_cond_stmt, new_bb); + gimple_seq_add_stmt (&gseq, call_cond_stmt); + + if_else_stmt = gimple_build_cond (GT_EXPR, cond_var, + integer_zero_node, + NULL_TREE, NULL_TREE); + gimple_set_block (if_else_stmt, DECL_INITIAL (function_decl)); + gimple_set_bb (if_else_stmt, new_bb); + gimple_seq_add_stmt (&gseq, if_else_stmt); + + gimple_seq_add_stmt (&gseq, convert_stmt); + 
gimple_seq_add_stmt (&gseq, return_stmt); + set_bb_seq (new_bb, gseq); + + bb1 = new_bb; + e12 = split_block (bb1, if_else_stmt); + bb2 = e12->dest; + e12->flags &= ~EDGE_FALLTHRU; + e12->flags |= EDGE_TRUE_VALUE; + + e23 = split_block (bb2, return_stmt); + + gimple_set_bb (convert_stmt, bb2); + gimple_set_bb (return_stmt, bb2); + + bb3 = e23->dest; + make_edge (bb1, bb3, EDGE_FALSE_VALUE); + + remove_edge (e23); + make_edge (bb2, EXIT_BLOCK_PTR, 0); + + rebuild_cgraph_edges (); + + pop_cfun (); + current_function_decl = old_current_function_decl; + + return bb3; +} + +/* This parses the attribute arguments to targetv in DECL and determines + the right builtin to use to match the platform specification. + For now, only one target argument ("arch=") is allowed. */ + +static enum ix86_builtins +get_builtin_code_for_version (tree decl) +{ + tree attrs; + struct cl_target_option cur_target; + tree target_node; + struct cl_target_option *new_target; + enum ix86_builtins builtin_code = IX86_BUILTIN_MAX; + + attrs = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl)); + gcc_assert (attrs != NULL); + + cl_target_option_save (&cur_target, &global_options); + + target_node = ix86_valid_target_attribute_tree + (TREE_VALUE (TREE_VALUE (attrs))); + + gcc_assert (target_node); + new_target = TREE_TARGET_OPTION (target_node); + gcc_assert (new_target); + + if (new_target->arch_specified && new_target->arch > 0) + { + switch (new_target->arch) + { + case 1: + case 2: + case 3: + case 4: + case 5: + case 6: + case 7: + case 8: + case 9: + case 10: + case 11: + builtin_code = IX86_BUILTIN_CPU_IS_INTEL; + break; + case 12: + builtin_code = IX86_BUILTIN_CPU_IS_INTEL_CORE2; + break; + case 13: + builtin_code = IX86_BUILTIN_CPU_IS_INTEL_COREI7; + break; + case 14: + builtin_code = IX86_BUILTIN_CPU_IS_INTEL_ATOM; + break; + case 15: + case 16: + case 17: + case 18: + case 19: + case 20: + case 21: + builtin_code = IX86_BUILTIN_CPU_IS_AMD; + break; + case 22: + builtin_code = 
IX86_BUILTIN_CPU_IS_AMDFAM10H; + break; + case 23: + builtin_code = IX86_BUILTIN_CPU_IS_AMDFAM15H_BDVER1; + break; + case 24: + builtin_code = IX86_BUILTIN_CPU_IS_AMDFAM15H_BDVER2; + break; + case 25: /* What is btver1 ? */ + builtin_code = IX86_BUILTIN_CPU_IS_AMD; + break; + } + } + + cl_target_option_restore (&global_options, &cur_target); + if (builtin_code == IX86_BUILTIN_MAX) + error_at (DECL_SOURCE_LOCATION (decl), + "No dispatcher found for the versioning attributes"); + + return builtin_code; +} + +/* This is the target hook to generate the dispatch function for + multi-versioned functions. DISPATCH_DECL is the function which will + contain the dispatch logic. FNDECLS are the function choices for + dispatch, and is a tree chain. EMPTY_BB is the basic block pointer + in DISPATCH_DECL in which the dispatch code is generated. */ + +static int +ix86_dispatch_version (tree dispatch_decl, + void *fndecls_p, + basic_block *empty_bb) +{ + tree default_decl; + gimple ifunc_cpu_init_stmt; + gimple_seq gseq; + tree old_current_function_decl; + int ix; + tree ele; + VEC (tree, heap) *fndecls; + + gcc_assert (dispatch_decl != NULL + && fndecls_p != NULL + && empty_bb != NULL); + + /*fndecls_p is actually a vector. */ + fndecls = (VEC (tree, heap) *)fndecls_p; + + /* Atleast one more version other than the default. */ + gcc_assert (VEC_length (tree, fndecls) >= 2); + + /* The first version in the vector is the default decl. 
*/ + default_decl = VEC_index (tree, fndecls, 0); + + old_current_function_decl = current_function_decl; + push_cfun (DECL_STRUCT_FUNCTION (dispatch_decl)); + current_function_decl = dispatch_decl; + + gseq = bb_seq (*empty_bb); + ifunc_cpu_init_stmt = gimple_build_call_vec ( + ix86_builtins [(int) IX86_BUILTIN_CPU_INIT], NULL); + gimple_seq_add_stmt (&gseq, ifunc_cpu_init_stmt); + gimple_set_bb (ifunc_cpu_init_stmt, *empty_bb); + set_bb_seq (*empty_bb, gseq); + + pop_cfun (); + current_function_decl = old_current_function_decl; + + + for (ix = 1; VEC_iterate (tree, fndecls, ix, ele); ++ix) + { + tree version_decl = ele; + /* Get attribute string, parse it and find the right predicate decl. + The predicate function could be a lengthy combination of many + features, like arch-type and various isa-variants. For now, only + check the arch-type. */ + tree predicate_decl = ix86_builtins [ + get_builtin_code_for_version (version_decl)]; + *empty_bb = add_condition_to_bb (dispatch_decl, version_decl, *empty_bb, + predicate_decl); + + } + /* dispatch default version at the end. */ + *empty_bb = add_condition_to_bb (dispatch_decl, default_decl, *empty_bb, + NULL); + return 0; +} @@ -38610,6 +39269,12 @@ ix86_autovectorize_vector_sizes (void) #undef TARGET_BUILD_BUILTIN_VA_LIST #define TARGET_BUILD_BUILTIN_VA_LIST ix86_build_builtin_va_list +#undef TARGET_DISPATCH_VERSION +#define TARGET_DISPATCH_VERSION ix86_dispatch_version + #undef TARGET_ENUM_VA_LIST_P #define TARGET_ENUM_VA_LIST_P ix86_enum_va_list Index: testsuite/g++.dg/mv1.C =================================================================== --- testsuite/g++.dg/mv1.C (revision 0) +++ testsuite/g++.dg/mv1.C (revision 0) @@ -0,0 +1,23 @@ +/* Simple test case to check if Multiversioning works. 
*/
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+int foo ();
+int foo () __attribute__ ((targetv("arch=corei7")));
+
+int main ()
+{
+  int (*p)() = &foo;
+  return foo () + (*p)();
+}
+
+int foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((targetv("arch=corei7")))
+foo ()
+{
+  return 0;
+}

--
This patch is available for review at http://codereview.appspot.com/5752064
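For readers who want to see the dispatch order described in the patch without building a patched compiler, here is a minimal sketch in plain C. It is not the code the patch generates: the names `foo_default`, `foo_corei7`, `foo_core2`, `cpu_is_corei7`, `cpu_is_core2`, `current_cpu` and `resolve_foo` are all hypothetical, and the CPU model is set by hand rather than detected. It only illustrates the rule stated above: the predicates are tried in order, the first version whose predicate returns a value greater than zero wins, and the default is returned when none match. A real IFUNC resolver is run by the dynamic loader; the cached pointer in `foo` merely imitates that one-time binding.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*foo_fn) (void);

/* Stand-ins for the three versions of foo.  Each returns a distinct
   value so that the selected version is observable.  */
static int foo_default (void) { return 0; }
static int foo_corei7 (void) { return 1; }
static int foo_core2 (void) { return 2; }

/* Hypothetical CPU model, set by hand; a real dispatcher would query
   the hardware instead.  */
enum cpu_model { CPU_GENERIC, CPU_CORE2, CPU_COREI7 };
static enum cpu_model current_cpu = CPU_GENERIC;

/* Hypothetical predicates playing the role of the per-arch checks.  */
static int cpu_is_corei7 (void) { return current_cpu == CPU_COREI7; }
static int cpu_is_core2 (void) { return current_cpu == CPU_CORE2; }

struct version
{
  int (*predicate) (void);
  foo_fn fn;
};

/* Specialized versions, in the order they are tested.  */
static const struct version foo_versions[] = {
  { cpu_is_corei7, foo_corei7 },
  { cpu_is_core2, foo_core2 },
};

/* Return the first version whose predicate is greater than zero,
   falling back to the default version.  */
static foo_fn
resolve_foo (void)
{
  size_t i;

  for (i = 0; i < sizeof foo_versions / sizeof foo_versions[0]; i++)
    if (foo_versions[i].predicate () > 0)
      return foo_versions[i].fn;
  return foo_default;
}

/* Resolve once and reuse the result, imitating IFUNC binding.  */
static foo_fn foo_resolved;

static int
foo (void)
{
  if (foo_resolved == NULL)
    foo_resolved = resolve_foo ();
  assert (foo_resolved != NULL);
  return foo_resolved ();
}
```

Calling `foo ()` with `current_cpu` left at `CPU_GENERIC` runs `foo_default`; setting `current_cpu` before the first call selects the matching specialized version instead.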
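The patch comments say that non-default versions keep the same DECL name and get a unique assembler name by appending the "targetv" string, and the alias example quoted in this thread uses `_Z3foov.arch_corei7`. The sketch below shows one way such a name could be formed. It is an assumption based only on that single example: `make_version_name` is a hypothetical helper, and the real logic lives in multiversion.c, which is not shown here and may differ.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write "<base>.<attr>" into OUT with every '=' in
   the appended part replaced by '_'.  Returns 0 on success and -1 if
   OUT is too small.  */
static int
make_version_name (const char *base, const char *attr,
                   char *out, size_t out_size)
{
  size_t i, len;
  int n;

  assert (base != NULL && attr != NULL && out != NULL);

  n = snprintf (out, out_size, "%s.%s", base, attr);
  if (n < 0 || (size_t) n >= out_size)
    return -1;

  /* Only rewrite the suffix; leave the mangled base name untouched.  */
  len = strlen (out);
  for (i = strlen (base) + 1; i < len; i++)
    if (out[i] == '=')
      out[i] = '_';
  return 0;
}
```

Under these assumptions, the base name `_Z3foov` and the attribute string `arch=corei7` give the name used in the alias example.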
On Wed, Mar 7, 2012 at 1:46 AM, Sriraman Tallam <tmsriram@google.com> wrote:
> User directed Function Multiversioning (MV) via Function Overloading
> ====================================================================
>
> This patch adds support for user directed function MV via function overloading.
> For more detailed description:
> http://gcc.gnu.org/ml/gcc/2012-03/msg00074.html
>
>
> Here is an example program with function versions:
>
> int foo ();  /* Default version */
> int foo () __attribute__ ((targetv("arch=corei7")));/*Specialized for corei7 */
> int foo () __attribute__ ((targetv("arch=core2")));/*Specialized for core2 */
>
> int main ()
> {
>   int (*p)() = &foo;
>   return foo () + (*p)();
> }
>
> int foo ()
> {
>   return 0;
> }
>
> int __attribute__ ((targetv("arch=corei7")))
> foo ()
> {
>   return 0;
> }
>
> int __attribute__ ((targetv("arch=core2")))
> foo ()
> {
>   return 0;
> }
>
> The above example has foo defined 3 times, but all 3 definitions of foo are
> different versions of the same function. The call to foo in main, directly and
> via a pointer, are calls to the multi-versioned function foo which is dispatched
> to the right foo at run-time.
>
> Function versions must have the same signature but must differ in the specifier
> string provided to a new attribute called "targetv", which is nothing but the
> target attribute with an extra specification to indicate a version. Any number
> of versions can be created using the targetv attribute but it is mandatory to
> have one function without the attribute, which is treated as the default
> version.
>
> The dispatching is done using the IFUNC mechanism to keep the dispatch overhead
> low. The compiler creates a dispatcher function which checks the CPU type and
> calls the right version of foo. The dispatching code checks for the platform
> type and calls the first version that matches. The default function is called if
> no specialized version is appropriate for execution.
>
> The pointer to foo is made to be the address of the dispatcher function, so that
> it is unique and calls made via the pointer also work correctly. The assembler
> names of the various versions of foo is made different, by tagging
> the specifier strings, to keep them unique.  A specific version can be called
> directly by creating an alias to its assembler name. For instance, to call the
> corei7 version directly, make an alias :
> int foo_corei7 () __attribute__((alias ("_Z3foov.arch_corei7")));
> and then call foo_corei7.
>
> Note that using IFUNC  blocks inlining of versioned functions. I had implemented
> an optimization earlier to do hot path cloning to allow versioned functions to
> be inlined. Please see : http://gcc.gnu.org/ml/gcc-patches/2011-04/msg02285.html
> In the next iteration, I plan to merge these two. With that, hot code paths with
> versioned functions will be cloned so that versioned functions can be inlined.

Note that inlining of functions with the target attribute is limited as well,
but your issue is that of the indirect dispatch as ...

You don't give an overview of the frontend implementation.  Thus I have
extracted the following

 - the FE does not really know about the "overloading", nor can it directly
   resolve calls from a "sse" function to another "sse" function without going
   through the 2nd IFUNC

 - cgraph also does not know about the "overloading", so it cannot do such
   "devirtualization" either

You seem to have implemented something in between a pure frontend solution and
a proper middle-end solution.  For optimization and eventually automatically
selecting functions for cloning (like, callees of a manual "sse" versioned
function should be cloned?) it would be nice if the cgraph would know about the
different versions and their relationships (and the dispatcher).  Especially the
cgraph code should know the functions are semantically equivalent (I suppose we
should require that).
The IFUNC should be generated by cgraph / target code, similar to how we
generate C++ thunks.

Honza, any suggestions on what the FE side of such cgraph infrastructure should
look like and how we should encode the target bits?

Thanks,
Richard.

>     * doc/tm.texi.in: Add description for TARGET_DISPATCH_VERSION.
>     * doc/tm.texi: Regenerate.
>     * c-family/c-common.c (handle_targetv_attribute): New function.
>     * target.def (dispatch_version): New target hook.
>     * tree.h (DECL_FUNCTION_VERSIONED): New macro.
>     (tree_function_decl): New bit-field versioned_function.
>     * tree-pass.h (pass_dispatch_versions): New pass.
>     * multiversion.c: New file.
>     * multiversion.h: New file.
>     * cgraphunit.c: Include multiversion.h
>     (cgraph_finalize_function): Change assembler names of versioned
>     functions.
>     * cp/class.c: Include multiversion.h
>     (add_method): aggregate function versions. Change assembler names of
>     versioned functions.
>     (resolve_address_of_overloaded_function): Match address of function
>     version with default function.  Return address of ifunc dispatcher
>     for address of versioned functions.
>     * cp/decl.c (decls_match): Make decls unmatched for versioned
>     functions.
>     (duplicate_decls): Remove ambiguity for versioned functions. Notify
>     of deleted function version decls.
>     (start_decl): Change assembler name of versioned functions.
>     (start_function): Change assembler name of versioned functions.
>     (cxx_comdat_group): Make comdat group of versioned functions be the
>     same.
>     * cp/semantics.c (expand_or_defer_fn_1): Mark as needed versioned
>     functions that are also marked inline.
>     * cp/decl2.c: Include multiversion.h
>     (check_classfn): Check attributes of versioned functions for match.
>     * cp/call.c: Include multiversion.h
>     (build_over_call): Make calls to multiversioned functions to call the
>     dispatcher.
>     (joust): For calls to multi-versioned functions, make the default >     function win. >     * timevar.def (TV_MULTIVERSION_DISPATCH): New time var. >     * varasm.c (finish_aliases_1): Check if the alias points to a function >     with a body before giving an error. >     * Makefile.in: Add multiversion.o >     * passes.c: Add pass_dispatch_versions to the pass list. >     * config/i386/i386.c (add_condition_to_bb): New function. >     (get_builtin_code_for_version): New function. >     (ix86_dispatch_version): New function. >     (TARGET_DISPATCH_VERSION): New macro. >     * testsuite/g++.dg/mv1.C: New test. > > Index: doc/tm.texi > =================================================================== > --- doc/tm.texi (revision 184971) > +++ doc/tm.texi (working copy) > @@ -10995,6 +10995,14 @@ The result is another tree containing a simplified >  call's result.  If @var{ignore} is true the value will be ignored. >  @end deftypefn > > +@deftypefn {Target Hook} int TARGET_DISPATCH_VERSION (tree @var{dispatch_decl}, void *@var{fndecls}, basic_block *@var{empty_bb}) > +For multi-versioned function, this hook sets up the dispatcher. > +@var{dispatch_decl} is the function that will be used to dispatch the > +version. @var{fndecls} are the function choices for dispatch. > +@var{empty_bb} is an basic block in @var{dispatch_decl} where the > +code to do the dispatch will be added. > +@end deftypefn > + >  @deftypefn {Target Hook} {const char *} TARGET_INVALID_WITHIN_DOLOOP (const_rtx @var{insn}) > >  Take an instruction in @var{insn} and return NULL if it is valid within a > Index: doc/tm.texi.in > =================================================================== > --- doc/tm.texi.in    (revision 184971) > +++ doc/tm.texi.in    (working copy) > @@ -10873,6 +10873,14 @@ The result is another tree containing a simplified >  call's result.  If @var{ignore} is true the value will be ignored. 
>  @end deftypefn > > +@hook TARGET_DISPATCH_VERSION > +For multi-versioned function, this hook sets up the dispatcher. > +@var{dispatch_decl} is the function that will be used to dispatch the > +version. @var{fndecls} are the function choices for dispatch. > +@var{empty_bb} is an basic block in @var{dispatch_decl} where the > +code to do the dispatch will be added. > +@end deftypefn > + >  @hook TARGET_INVALID_WITHIN_DOLOOP > >  Take an instruction in @var{insn} and return NULL if it is valid within a > Index: c-family/c-common.c > =================================================================== > --- c-family/c-common.c (revision 184971) > +++ c-family/c-common.c (working copy) > @@ -315,6 +315,7 @@ static tree check_case_value (tree); >  static bool check_case_bounds (tree, tree, tree *, tree *); > >  static tree handle_packed_attribute (tree *, tree, tree, int, bool *); > +static tree handle_targetv_attribute (tree *, tree, tree, int, bool *); >  static tree handle_nocommon_attribute (tree *, tree, tree, int, bool *); >  static tree handle_common_attribute (tree *, tree, tree, int, bool *); >  static tree handle_noreturn_attribute (tree *, tree, tree, int, bool *); > @@ -604,6 +605,8 @@ const struct attribute_spec c_common_attribute_tab >  { >  /* { name, min_len, max_len, decl_req, type_req, fn_type_req, handler, >     affects_type_identity } */ > +  { "targetv",        1, -1, true, false, false, > +               handle_targetv_attribute, false }, >  { "packed",         0, 0, false, false, false, >                handle_packed_attribute , false}, >  { "nocommon",        0, 0, true,  false, false, > @@ -5869,6 +5872,54 @@ handle_packed_attribute (tree *node, tree name, tr >  return NULL_TREE; >  } > > +/* The targetv attribue is used to specify a function version > +  targeted to specific platform types.  The "targetv" attributes > +  have to be valid "target" attributes.  NODE should always point > +  to a FUNCTION_DECL.  
ARGS contain the arguments to "targetv"
> +  which should be valid arguments to attribute "target" too.
> +  Check handle_target_attribute for FLAGS and NO_ADD_ATTRS.  */
> +
> +static tree
> +handle_targetv_attribute (tree *node, tree name,
> +             tree args,
> +             int flags,
> +             bool *no_add_attrs)
> +{
> +  const char *attr_str = NULL;
> +  gcc_assert (TREE_CODE (*node) == FUNCTION_DECL);
> +  gcc_assert (args != NULL);
> +
> +  /* This is a function version.  */
> +  DECL_FUNCTION_VERSIONED (*node) = 1;
> +
> +  attr_str = TREE_STRING_POINTER (TREE_VALUE (args));
> +
> +  /* Check if multiple sets of target attributes are there.  This
> +   is not supported now.  In future, this will be supported by
> +   cloning this function for each set.  */
> +  if (TREE_CHAIN (args) != NULL)
> +   warning (OPT_Wattributes, "%qE attribute has multiple sets which "
> +       "is not supported", name);
> +
> +  if (attr_str == NULL
> +    || strstr (attr_str, "arch=") == NULL)
> +   error_at (DECL_SOURCE_LOCATION (*node),
> +       "Versioning supported only on \"arch=\" for now");
> +
> +  /* targetv attributes must translate into target attributes.  */
> +  handle_target_attribute (node, get_identifier ("target"), args, flags,
> +              no_add_attrs);
> +
> +  if (*no_add_attrs)
> +   warning (OPT_Wattributes, "%qE attribute has no effect", name);
> +
> +  /* This is necessary to keep the attribute tagged to the decl
> +   all the time.  */
> +  *no_add_attrs = false;
> +
> +  return NULL_TREE;
> +}
> +
>  /* Handle a "nocommon" attribute; arguments as in
>   struct attribute_spec.handler.  
*/
>
> Index: target.def
> ===================================================================
> --- target.def  (revision 184971)
> +++ target.def  (working copy)
> @@ -1249,6 +1249,15 @@ DEFHOOK
>  tree, (tree fndecl, int n_args, tree *argp, bool ignore),
>  hook_tree_tree_int_treep_bool_null)
>
> +/* Target hook to generate the dispatching code for calls to multi-versioned
> +  functions.  DISPATCH_DECL is the function that will have the dispatching
> +  logic.  FNDECLS is the list of choices for dispatch and EMPTY_BB is the
> +  basic block in DISPATCH_DECL which will contain the code.  */
> +DEFHOOK
> +(dispatch_version,
> + "",
> + int, (tree dispatch_decl, void *fndecls, basic_block *empty_bb), NULL)
> +
>  /* Returns a code for a target-specific builtin that implements
>   reciprocal of the function, or NULL_TREE if not available.  */
>  DEFHOOK
> Index: tree.h
> ===================================================================
> --- tree.h    (revision 184971)
> +++ tree.h    (working copy)
> @@ -3532,6 +3532,12 @@ extern VEC(tree, gc) **decl_debug_args_insert (tre
>  #define DECL_FUNCTION_SPECIFIC_OPTIMIZATION(NODE) \
>   (FUNCTION_DECL_CHECK (NODE)->function_decl.function_specific_optimization)
>
> +/* In FUNCTION_DECL, this is set if this function has other versions generated
> +  using "targetv" attributes.  The default version is the one which does not
> +  have any "targetv" attribute set. */
> +#define DECL_FUNCTION_VERSIONED(NODE)\
> +  (FUNCTION_DECL_CHECK (NODE)->function_decl.versioned_function)
> +
>  /* FUNCTION_DECL inherits from DECL_NON_COMMON because of the use of the
>   arguments/result/saved_tree fields by front ends.  
It was either inherit
>   FUNCTION_DECL from non_common, or inherit non_common from FUNCTION_DECL,
> @@ -3576,8 +3582,8 @@ struct GTY(()) tree_function_decl {
>  unsigned looping_const_or_pure_flag : 1;
>  unsigned has_debug_args_flag : 1;
>  unsigned tm_clone_flag : 1;
> -
> -  /* 1 bit left */
> +  unsigned versioned_function : 1;
> +  /* No bits left.  */
>  };
>
>  /* The source language of the translation-unit.  */
> Index: tree-pass.h
> ===================================================================
> --- tree-pass.h (revision 184971)
> +++ tree-pass.h (working copy)
> @@ -455,6 +455,7 @@ extern struct gimple_opt_pass pass_tm_memopt;
>  extern struct gimple_opt_pass pass_tm_edges;
>  extern struct gimple_opt_pass pass_split_functions;
>  extern struct gimple_opt_pass pass_feedback_split_functions;
> +extern struct gimple_opt_pass pass_dispatch_versions;
>
>  /* IPA Passes */
>  extern struct simple_ipa_opt_pass pass_ipa_lower_emutls;
> Index: multiversion.c
> ===================================================================
> --- multiversion.c    (revision 0)
> +++ multiversion.c    (revision 0)
> @@ -0,0 +1,798 @@
> +/* Function Multiversioning.
> +  Copyright (C) 2012 Free Software Foundation, Inc.
> +  Contributed by Sriraman Tallam (tmsriram@google.com)
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>. 
*/
> +
> +/* Holds the state for multi-versioned functions here. The front-end
> +  updates the state as and when function versions are encountered.
> +  This is then used to generate the dispatch code.  Also, the
> +  optimization passes to clone hot paths involving versioned functions
> +  will be done here.
> +
> +  Function versions are created by using the same function signature but
> +  also tagging attribute "targetv" to specify the platform type for which
> +  the version must be executed.  Here is an example:
> +
> +  int foo ()
> +  {
> +   printf ("Execute as default");
> +   return 0;
> +  }
> +
> +  int  __attribute__ ((targetv ("arch=corei7")))
> +  foo ()
> +  {
> +   printf ("Execute for corei7");
> +   return 0;
> +  }
> +
> +  int main ()
> +  {
> +   return foo ();
> +  }
> +
> +  The call to foo in main is replaced with a call to an IFUNC function that
> +  contains the dispatch code to call the correct function version at
> +  run-time.  */
> +
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "tm.h"
> +#include "tree.h"
> +#include "tree-inline.h"
> +#include "langhooks.h"
> +#include "flags.h"
> +#include "cgraph.h"
> +#include "diagnostic.h"
> +#include "toplev.h"
> +#include "timevar.h"
> +#include "params.h"
> +#include "fibheap.h"
> +#include "intl.h"
> +#include "tree-pass.h"
> +#include "hashtab.h"
> +#include "coverage.h"
> +#include "ggc.h"
> +#include "tree-flow.h"
> +#include "rtl.h"
> +#include "ipa-prop.h"
> +#include "basic-block.h"
> +#include "toplev.h"
> +#include "dbgcnt.h"
> +#include "tree-dump.h"
> +#include "output.h"
> +#include "vecprim.h"
> +#include "gimple-pretty-print.h"
> +#include "ipa-inline.h"
> +#include "target.h"
> +#include "multiversion.h"
> +
> +typedef void * void_p;
> +
> +DEF_VEC_P (void_p);
> +DEF_VEC_ALLOC_P (void_p, heap);
> +
> +/* Each function decl that is a function version gets an instance of this
> +  structure.  
Since this is called by the front-end, decl merging can
> +  happen, where a decl created for a new declaration is merged with
> +  the old. In this case, the new decl is deleted and the IS_DELETED
> +  field is set for the struct instance corresponding to the new decl.
> +  IFUNC_DECL is the decl of the ifunc function for default decls.
> +  IFUNC_RESOLVER_DECL is the decl of the dispatch function.  VERSIONS
> +  is a vector containing the list of function versions  that are
> +  the candidates for dispatch.  */
> +
> +typedef struct version_function_d {
> +  tree decl;
> +  tree ifunc_decl;
> +  tree ifunc_resolver_decl;
> +  VEC (void_p, heap) *versions;
> +  bool is_deleted;
> +} version_function;
> +
> +/* Hashmap has an entry for every function decl that has other function
> +  versions.  For function decls that are the default, it also stores the
> +  list of all the other function versions.  Each entry is a structure
> +  of type version_function_d.  */
> +static htab_t decl_version_htab = NULL;
> +
> +/* Hashtable helpers for decl_version_htab. */
> +
> +static hashval_t
> +decl_version_htab_hash_descriptor (const void *p)
> +{
> +  const version_function *t = (const version_function *) p;
> +  return htab_hash_pointer (t->decl);
> +}
> +
> +/* Hashtable helper for decl_version_htab. */
> +
> +static int
> +decl_version_htab_eq_descriptor (const void *p1, const void *p2)
> +{
> +  const version_function *t1 = (const version_function *) p1;
> +  return htab_eq_pointer ((const void_p) t1->decl, p2);
> +}
> +
> +/* Create the decl_version_htab.  */
> +static void
> +create_decl_version_htab (void)
> +{
> +  if (decl_version_htab == NULL)
> +   decl_version_htab = htab_create (10, decl_version_htab_hash_descriptor,
> +                   decl_version_htab_eq_descriptor, NULL);
> +}
> +
> +/* Creates an instance of version_function for decl DECL.  
*/
> +
> +static version_function*
> +new_version_function (const tree decl)
> +{
> +  version_function *v;
> +  v = (version_function *)xmalloc(sizeof (version_function));
> +  v->decl = decl;
> +  v->ifunc_decl = NULL;
> +  v->ifunc_resolver_decl = NULL;
> +  v->versions = NULL;
> +  v->is_deleted = false;
> +  return v;
> +}
> +
> +/* Comparator function to be used in qsort routine to sort attribute
> +  specification strings to "targetv".  */
> +
> +static int
> +attr_strcmp (const void *v1, const void *v2)
> +{
> +  const char *c1 = *(char *const*)v1;
> +  const char *c2 = *(char *const*)v2;
> +  return strcmp (c1, c2);
> +}
> +
> +/* STR is the argument to targetv attribute.  This function tokenizes
> +  the comma separated arguments, sorts them and returns a string which
> +  is a unique identifier for the comma separated arguments.  */
> +
> +static char *
> +sorted_attr_string (const char *str)
> +{
> +  char **args = NULL;
> +  char *attr_str, *ret_str;
> +  char *attr = NULL;
> +  unsigned int argnum = 1;
> +  unsigned int i;
> +
> +  for (i = 0; i < strlen (str); i++)
> +   if (str[i] == ',')
> +    argnum++;
> +
> +  attr_str = (char *)xmalloc (strlen (str) + 1);
> +  strcpy (attr_str, str);
> +
> +  for (i = 0; i < strlen (attr_str); i++)
> +   if (attr_str[i] == '=')
> +    attr_str[i] = '_';
> +
> +  if (argnum == 1)
> +   return attr_str;
> +
> +  args = (char **)xmalloc (argnum * sizeof (char *));
> +
> +  i = 0;
> +  attr = strtok (attr_str, ",");
> +  while (attr != NULL)
> +   {
> +    args[i] = attr;
> +    i++;
> +    attr = strtok (NULL, ",");
> +   }
> +
> +  qsort (args, argnum, sizeof (char*), attr_strcmp);
> +
> +  ret_str = (char *)xmalloc (strlen (str) + 1);
> +  strcpy (ret_str, args[0]);
> +  for (i = 1; i < argnum; i++)
> +   {
> +    strcat (ret_str, "_");
> +    strcat (ret_str, args[i]);
> +   }
> +
> +  free (args);
> +  free (attr_str);
> +  return ret_str;
> +}
> +
> +/* Returns true when only one of DECL1 and DECL2 is marked 
with "targetv"
> +  or if the "targetv" attribute strings of DECL1 and DECL2 don't match.  */
> +
> +bool
> +has_different_version_attributes (const tree decl1, const tree decl2)
> +{
> +  tree attr1, attr2;
> +  char *c1, *c2;
> +  bool ret = false;
> +
> +  if (TREE_CODE (decl1) != FUNCTION_DECL
> +    || TREE_CODE (decl2) != FUNCTION_DECL)
> +   return false;
> +
> +  attr1 = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl1));
> +  attr2 = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl2));
> +
> +  if (attr1 == NULL_TREE && attr2 == NULL_TREE)
> +   return false;
> +
> +  if ((attr1 == NULL_TREE && attr2 != NULL_TREE)
> +    || (attr1 != NULL_TREE && attr2 == NULL_TREE))
> +   return true;
> +
> +  c1 = sorted_attr_string (
> +    TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr1))));
> +  c2 = sorted_attr_string (
> +    TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr2))));
> +
> +  if (strcmp (c1, c2) != 0)
> +   ret = true;
> +
> +  free (c1);
> +  free (c2);
> +
> +  return ret;
> +}
> +
> +/* If this decl corresponds to a function and has "targetv" attribute,
> +  append the attribute string to its assembler name.  
*/
> +
> +void
> +version_assembler_name (const tree decl)
> +{
> +  tree version_attr;
> +  const char *orig_name, *version_string, *attr_str;
> +  char *assembler_name;
> +  tree assembler_name_tree;
> +
> +  if (TREE_CODE (decl) != FUNCTION_DECL
> +    || DECL_ASSEMBLER_NAME_SET_P (decl)
> +    || !DECL_FUNCTION_VERSIONED (decl))
> +   return;
> +
> +  if (DECL_DECLARED_INLINE_P (decl)
> +    &&lookup_attribute ("gnu_inline",
> +             DECL_ATTRIBUTES (decl)))
> +   error_at (DECL_SOURCE_LOCATION (decl),
> +       "Function versions cannot be marked as gnu_inline,"
> +       " bodies have to be generated\n");
> +
> +  if (DECL_VIRTUAL_P (decl)
> +    || DECL_VINDEX (decl))
> +   error_at (DECL_SOURCE_LOCATION (decl),
> +       "Virtual function versioning not supported\n");
> +
> +  version_attr = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl));
> +  /* targetv attribute string is NULL for default functions.  */
> +  if (version_attr == NULL_TREE)
> +   return;
> +
> +  orig_name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
> +  version_string
> +   = TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (version_attr)));
> +
> +  attr_str = sorted_attr_string (version_string);
> +  assembler_name = (char *) xmalloc (strlen (orig_name)
> +                   + strlen (attr_str) + 2);
> +
> +  sprintf (assembler_name, "%s.%s", orig_name, attr_str);
> +  if (dump_file)
> +   fprintf (dump_file, "Assembler name set to %s for function version %s\n",
> +       assembler_name, IDENTIFIER_POINTER (DECL_NAME (decl)));
> +  assembler_name_tree = get_identifier (assembler_name);
> +  SET_DECL_ASSEMBLER_NAME (decl, assembler_name_tree);
> +}
> +
> +/* Returns true if decl is multi-versioned and DECL is the default function,
> +  that is it is not tagged with "targetv" attribute.  
*/
> +
> +bool
> +is_default_function (const tree decl)
> +{
> +  return (TREE_CODE (decl) == FUNCTION_DECL
> +     && DECL_FUNCTION_VERSIONED (decl)
> +     && (lookup_attribute ("targetv", DECL_ATTRIBUTES (decl))
> +       == NULL_TREE));
> +}
> +
> +/* For function decl DECL, find the version_function struct in the
> +  decl_version_htab.  */
> +
> +static version_function *
> +find_function_version (const tree decl)
> +{
> +  void *slot;
> +
> +  if (!DECL_FUNCTION_VERSIONED (decl))
> +   return NULL;
> +
> +  if (!decl_version_htab)
> +   return NULL;
> +
> +  slot = htab_find_with_hash (decl_version_htab, decl,
> +                htab_hash_pointer (decl));
> +
> +  if (slot != NULL)
> +   return (version_function *)slot;
> +
> +  return NULL;
> +}
> +
> +/* Record DECL as a function version by creating a version_function struct
> +  for it and storing it in the hashtable.  */
> +
> +static version_function *
> +add_function_version (const tree decl)
> +{
> +  void **slot;
> +  version_function *v;
> +
> +  if (!DECL_FUNCTION_VERSIONED (decl))
> +   return NULL;
> +
> +  create_decl_version_htab ();
> +
> +  slot = htab_find_slot_with_hash (decl_version_htab, (const void_p)decl,
> +                  htab_hash_pointer ((const void_p)decl),
> +                  INSERT);
> +
> +  if (*slot != NULL)
> +   return (version_function *)*slot;
> +
> +  v = new_version_function (decl);
> +  *slot = v;
> +
> +  return v;
> +}
> +
> +/* Push V into VEC only if it is not already present.  */
> +
> +static void
> +push_function_version (version_function *v, VEC (void_p, heap) *vec)
> +{
> +  int ix;
> +  void_p ele;
> +  for (ix = 0; VEC_iterate (void_p, vec, ix, ele); ++ix)
> +   {
> +    if (ele == (void_p)v)
> +     return;
> +   }
> +
> +  VEC_safe_push (void_p, heap, vec, (void*)v);
> +}
> +
> +/* Mark DECL as deleted.  This is called by the front-end when a duplicate
> +  decl is merged with the original decl and the duplicate decl is deleted.
> +  This function marks the duplicate_decl as invalid.  Called by
> +  duplicate_decls in cp/decl.c.  */
> +
> +void
> +mark_delete_decl_version (const tree decl)
> +{
> +  version_function *decl_v;
> +
> +  decl_v = find_function_version (decl);
> +
> +  if (decl_v == NULL)
> +   return;
> +
> +  decl_v->is_deleted = true;
> +
> +  if (is_default_function (decl)
> +    && decl_v->versions != NULL)
> +   {
> +    VEC_truncate (void_p, decl_v->versions, 0);
> +    VEC_free (void_p, heap, decl_v->versions);
> +   }
> +}
> +
> +/* Mark DECL1 and DECL2 to be function versions in the same group.  One
> +  of DECL1 and DECL2 must be the default, otherwise this function does
> +  nothing.  This function aggregates the versions.  */
> +
> +int
> +group_function_versions (const tree decl1, const tree decl2)
> +{
> +  tree default_decl, version_decl;
> +  version_function *default_v, *version_v;
> +
> +  gcc_assert (DECL_FUNCTION_VERSIONED (decl1)
> +       && DECL_FUNCTION_VERSIONED (decl2));
> +
> +  /* The version decls are added only to the default decl.  */
> +  if (!is_default_function (decl1)
> +    && !is_default_function (decl2))
> +   return 0;
> +
> +  /* This can happen with duplicate declarations.  Just ignore.  */
> +  if (is_default_function (decl1)
> +    && is_default_function (decl2))
> +   return 0;
> +
> +  default_decl = (is_default_function (decl1)) ? decl1 : decl2;
> +  version_decl = (default_decl == decl1) ? decl2 : decl1;
> +
> +  gcc_assert (default_decl != version_decl);
> +  create_decl_version_htab ();
> +
> +  /* If the version function is found, it has been added.  
*/
> +  if (find_function_version (version_decl))
> +   return 0;
> +
> +  default_v = add_function_version (default_decl);
> +  version_v = add_function_version (version_decl);
> +
> +  if (default_v->versions == NULL)
> +   default_v->versions = VEC_alloc (void_p, heap, 1);
> +
> +  push_function_version (version_v, default_v->versions);
> +  return 0;
> +}
> +
> +/* Makes a function attribute of the form NAME(ARG_NAME) and chains
> +  it to CHAIN.  */
> +
> +static tree
> +make_attribute (const char *name, const char *arg_name, tree chain)
> +{
> +  tree attr_name;
> +  tree attr_arg_name;
> +  tree attr_args;
> +  tree attr;
> +
> +  attr_name = get_identifier (name);
> +  attr_arg_name = build_string (strlen (arg_name), arg_name);
> +  attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE);
> +  attr = tree_cons (attr_name, attr_args, chain);
> +  return attr;
> +}
> +
> +/* Return a new name by appending SUFFIX to the DECL name.  If
> +  make_unique is true, append the full path name.  */
> +
> +static char *
> +make_name (tree decl, const char *suffix, bool make_unique)
> +{
> +  char *global_var_name;
> +  int name_len;
> +  const char *name;
> +  const char *unique_name = NULL;
> +
> +  name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
> +
> +  /* Get a unique name that can be used globally without any chances
> +   of collision at link time.  */
> +  if (make_unique)
> +   unique_name = IDENTIFIER_POINTER (get_file_function_name ("\0"));
> +
> +  name_len = strlen (name) + strlen (suffix) + 2;
> +
> +  if (make_unique)
> +   name_len += strlen (unique_name) + 1;
> +  global_var_name = (char *) xmalloc (name_len);
> +
> +  /* Use '.' to concatenate names as it is demangler friendly.  
*/
> +  if (make_unique)
> +    snprintf (global_var_name, name_len, "%s.%s.%s", name,
> +        unique_name, suffix);
> +  else
> +    snprintf (global_var_name, name_len, "%s.%s", name, suffix);
> +
> +  return global_var_name;
> +}
> +
> +/* Make the resolver function decl for ifunc (IFUNC_DECL) to dispatch
> +  the versions of multi-versioned function DEFAULT_DECL.  Create an
> +  empty basic block in the resolver and store the pointer in
> +  EMPTY_BB.  Return the decl of the resolver function.  */
> +
> +static tree
> +make_ifunc_resolver_func (const tree default_decl,
> +             const tree ifunc_decl,
> +             basic_block *empty_bb)
> +{
> +  char *resolver_name;
> +  tree decl, type, decl_name, t;
> +  basic_block new_bb;
> +  tree old_current_function_decl;
> +  bool make_unique = false;
> +
> +  /* IFUNC's have to be globally visible.  So, if the default_decl is
> +   not, then the name of the IFUNC should be made unique.  */
> +  if (TREE_PUBLIC (default_decl) == 0)
> +   make_unique = true;
> +
> +  /* Append the filename to the resolver function if the versions are
> +   not externally visible.  This is because the resolver function has
> +   to be externally visible for the loader to find it.  So, appending
> +   the filename will prevent conflicts with a resolver function from
> +   another module which is based on the same version name.  */
> +  resolver_name = make_name (default_decl, "resolver", make_unique);
> +
> +  /* The resolver function should return a (void *). */
> +  type = build_function_type_list (ptr_type_node, NULL_TREE);
> +
> +  decl = build_fn_decl (resolver_name, type);
> +  decl_name = get_identifier (resolver_name);
> +  SET_DECL_ASSEMBLER_NAME (decl, decl_name);
> +
> +  DECL_NAME (decl) = decl_name;
> +  TREE_USED (decl) = TREE_USED (default_decl);
> +  DECL_ARTIFICIAL (decl) = 1;
> +  DECL_IGNORED_P (decl) = 0;
> +  /* IFUNC resolvers have to be externally visible.  
*/
> +  TREE_PUBLIC (decl) = 1;
> +  DECL_UNINLINABLE (decl) = 1;
> +
> +  DECL_EXTERNAL (decl) = DECL_EXTERNAL (default_decl);
> +  DECL_EXTERNAL (ifunc_decl) = 0;
> +
> +  DECL_CONTEXT (decl) = NULL_TREE;
> +  DECL_INITIAL (decl) = make_node (BLOCK);
> +  DECL_STATIC_CONSTRUCTOR (decl) = 0;
> +  TREE_READONLY (decl) = 0;
> +  DECL_PURE_P (decl) = 0;
> +  DECL_COMDAT (decl) = DECL_COMDAT (default_decl);
> +  if (DECL_COMDAT_GROUP (default_decl))
> +   {
> +    make_decl_one_only (decl, DECL_COMDAT_GROUP (default_decl));
> +   }
> +  /* Build result decl and add to function_decl. */
> +  t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, ptr_type_node);
> +  DECL_ARTIFICIAL (t) = 1;
> +  DECL_IGNORED_P (t) = 1;
> +  DECL_RESULT (decl) = t;
> +
> +  gimplify_function_tree (decl);
> +  old_current_function_decl = current_function_decl;
> +  push_cfun (DECL_STRUCT_FUNCTION (decl));
> +  current_function_decl = decl;
> +  init_empty_tree_cfg_for_function (DECL_STRUCT_FUNCTION (decl));
> +  cfun->curr_properties |=
> +   (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_referenced_vars |
> +   PROP_ssa);
> +  new_bb = create_empty_bb (ENTRY_BLOCK_PTR);
> +  make_edge (ENTRY_BLOCK_PTR, new_bb, EDGE_FALLTHRU);
> +  make_edge (new_bb, EXIT_BLOCK_PTR, 0);
> +  *empty_bb = new_bb;
> +
> +  cgraph_add_new_function (decl, true);
> +  cgraph_call_function_insertion_hooks (cgraph_get_create_node (decl));
> +  cgraph_analyze_function (cgraph_get_create_node (decl));
> +  cgraph_mark_needed_node (cgraph_get_create_node (decl));
> +
> +  if (DECL_COMDAT_GROUP (default_decl))
> +   {
> +    gcc_assert (cgraph_get_node (default_decl));
> +    cgraph_add_to_same_comdat_group (cgraph_get_node (decl),
> +                    cgraph_get_node (default_decl));
> +   }
> +
> +  pop_cfun ();
> +  current_function_decl = old_current_function_decl;
> +
> +  gcc_assert (ifunc_decl != NULL);
> +  DECL_ATTRIBUTES (ifunc_decl)
> +   = make_attribute ("ifunc", resolver_name, 
DECL_ATTRIBUTES (ifunc_decl));
> +  assemble_alias (ifunc_decl, get_identifier (resolver_name));
> +  return decl;
> +}
> +
> +/* Make an ifunc declaration for the multi-versioned function DECL.  Calls to
> +  DECL function will be replaced with calls to the ifunc.  Return the decl
> +  of the ifunc created.  */
> +
> +static tree
> +make_ifunc_func (const tree decl)
> +{
> +  tree ifunc_decl;
> +  char *ifunc_name, *resolver_name;
> +  tree fn_type, ifunc_type;
> +  bool make_unique = false;
> +
> +  if (TREE_PUBLIC (decl) == 0)
> +   make_unique = true;
> +
> +  ifunc_name = make_name (decl, "ifunc", make_unique);
> +  resolver_name = make_name (decl, "resolver", make_unique);
> +  gcc_assert (resolver_name);
> +
> +  fn_type = TREE_TYPE (decl);
> +  ifunc_type = build_function_type (TREE_TYPE (fn_type),
> +                  TYPE_ARG_TYPES (fn_type));
> +
> +  ifunc_decl = build_fn_decl (ifunc_name, ifunc_type);
> +  TREE_USED (ifunc_decl) = 1;
> +  DECL_CONTEXT (ifunc_decl) = NULL_TREE;
> +  DECL_INITIAL (ifunc_decl) = error_mark_node;
> +  DECL_ARTIFICIAL (ifunc_decl) = 1;
> +  /* Mark this ifunc as external, the resolver will flip it again if
> +   it gets generated.  */
> +  DECL_EXTERNAL (ifunc_decl) = 1;
> +  /* IFUNCs have to be externally visible.  */
> +  TREE_PUBLIC (ifunc_decl) = 1;
> +
> +  return ifunc_decl;
> +}
> +
> +/* For multi-versioned function decl, which should also be the default,
> +  return the decl of the ifunc resolver, create it if it does not
> +  exist.  */
> +
> +tree
> +get_ifunc_for_version (const tree decl)
> +{
> +  version_function *decl_v;
> +  int ix;
> +  void_p ele;
> +
> +  /* DECL has to be the default version, otherwise it is missing and
> +   that is not allowed.  
*/
> +  if (!is_default_function (decl))
> +   {
> +    error_at (DECL_SOURCE_LOCATION (decl), "Default version not found");
> +    return decl;
> +   }
> +
> +  decl_v = find_function_version (decl);
> +  gcc_assert (decl_v != NULL);
> +  if (decl_v->ifunc_decl == NULL)
> +   {
> +    tree ifunc_decl;
> +    ifunc_decl = make_ifunc_func (decl);
> +    decl_v->ifunc_decl = ifunc_decl;
> +   }
> +
> +  if (cgraph_get_node (decl))
> +   cgraph_mark_needed_node (cgraph_get_node (decl));
> +
> +  for (ix = 0; VEC_iterate (void_p, decl_v->versions, ix, ele); ++ix)
> +   {
> +    version_function *v = (version_function *) ele;
> +    gcc_assert (v->decl != NULL);
> +    if (cgraph_get_node (v->decl))
> +    cgraph_mark_needed_node (cgraph_get_node (v->decl));
> +   }
> +
> +  return decl_v->ifunc_decl;
> +}
> +
> +/* Generate the dispatching code to dispatch multi-versioned function
> +  DECL.  Make a new function decl for dispatching and call the target
> +  hook to process the "targetv" attributes and provide the code to
> +  dispatch the right function at run-time.  
*/
> +
> +static tree
> +make_ifunc_resolver_for_version (const tree decl)
> +{
> +  version_function *decl_v;
> +  tree ifunc_resolver_decl, ifunc_decl;
> +  basic_block empty_bb;
> +  int ix;
> +  void_p ele;
> +  VEC (tree, heap) *fn_ver_vec = NULL;
> +
> +  gcc_assert (is_default_function (decl));
> +
> +  decl_v = find_function_version (decl);
> +  gcc_assert (decl_v != NULL);
> +
> +  if (decl_v->ifunc_resolver_decl != NULL)
> +   return decl_v->ifunc_resolver_decl;
> +
> +  ifunc_decl = decl_v->ifunc_decl;
> +
> +  if (ifunc_decl == NULL)
> +   ifunc_decl = decl_v->ifunc_decl = make_ifunc_func (decl);
> +
> +  ifunc_resolver_decl = make_ifunc_resolver_func (decl, ifunc_decl,
> +                         &empty_bb);
> +
> +  fn_ver_vec = VEC_alloc (tree, heap, 2);
> +  VEC_safe_push (tree, heap, fn_ver_vec, decl);
> +
> +  for (ix = 0; VEC_iterate (void_p, decl_v->versions, ix, ele); ++ix)
> +   {
> +    version_function *v = (version_function *) ele;
> +    gcc_assert (v->decl != NULL);
> +    /* Check for virtual functions here again, as by this time it should
> +     have been determined if this function needs a vtable index or
> +     not.  This happens for methods in derived classes that override
> +     virtual methods in base classes but are not explicitly marked as
> +     virtual.  */
> +    if (DECL_VINDEX (v->decl))
> +     error_at (DECL_SOURCE_LOCATION (v->decl),
> +         "Virtual function versioning not supported\n");
> +    if (!v->is_deleted)
> +    VEC_safe_push (tree, heap, fn_ver_vec, v->decl);
> +   }
> +
> +  gcc_assert (targetm.dispatch_version);
> +  targetm.dispatch_version (ifunc_resolver_decl, fn_ver_vec, &empty_bb);
> +  decl_v->ifunc_resolver_decl = ifunc_resolver_decl;
> +
> +  return ifunc_resolver_decl;
> +}
> +
> +/* Main entry point to pass_dispatch_versions. For multi-versioned functions,
> +  generate the dispatching code.  
*/
> +
> +static unsigned int
> +do_dispatch_versions (void)
> +{
> +  /* A new pass for generating dispatch code for multi-versioned functions.
> +   Other forms of dispatch can be added when ifunc support is not available
> +   like just calling the function directly after checking for target type.
> +   Currently, dispatching is done through IFUNC.  This pass will become
> +   more meaningful when other dispatch mechanisms are added.  */
> +
> +  /* Cloning a function to produce more versions will happen here when the
> +   user requests that via the targetv attribute. For example,
> +   int foo () __attribute__ ((targetv(("arch=core2"), ("arch=corei7"))));
> +   means that the user wants the same body of foo to be versioned for core2
> +   and corei7.  In that case, this function will be cloned during this
> +   pass.  */
> +
> +  if (DECL_FUNCTION_VERSIONED (current_function_decl)
> +    && is_default_function (current_function_decl))
> +   {
> +    tree decl = make_ifunc_resolver_for_version (current_function_decl);
> +    if (dump_file && decl)
> +    dump_function_to_file (decl, dump_file, TDF_BLOCKS);
> +   }
> +  return 0;
> +}
> +
> +static  bool
> +gate_dispatch_versions (void)
> +{
> +  return true;
> +}
> +
> +/* A pass to generate the dispatch code to execute the appropriate version
> +  of a multi-versioned function at run-time.  
*/
> +
> +struct gimple_opt_pass pass_dispatch_versions =
> +{
> + {
> +  GIMPLE_PASS,
> +  "dispatch_multiversion_functions",   /* name */
> +  gate_dispatch_versions,        /* gate */
> +  do_dispatch_versions,             /* execute */
> +  NULL,                     /* sub */
> +  NULL,                     /* next */
> +  0,                  /* static_pass_number */
> +  TV_MULTIVERSION_DISPATCH,       /* tv_id */
> +  PROP_cfg,               /* properties_required */
> +  PROP_cfg,               /* properties_provided */
> +  0,                  /* properties_destroyed */
> +  0,                  /* todo_flags_start */
> +  TODO_dump_func |           /* todo_flags_finish */
> +  TODO_cleanup_cfg | TODO_dump_cgraph
> + }
> +};
> Index: cgraphunit.c
> ===================================================================
> --- cgraphunit.c     (revision 184971)
> +++ cgraphunit.c     (working copy)
> @@ -141,6 +141,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "ipa-inline.h"
>  #include "ipa-utils.h"
>  #include "lto-streamer.h"
> +#include "multiversion.h"
>
>  static void cgraph_expand_all_functions (void);
>  static void cgraph_mark_functions_to_output (void);
> @@ -343,6 +344,13 @@ cgraph_finalize_function (tree decl, bool nested)
>    node->local.redefined_extern_inline = true;
>   }
>
> +  /* If this is a function version and not the default, change the
> +   assembler name of this function.  The DECL names of function
> +   versions are the same, only the assembler names are made unique.
> +   The assembler name is changed by appending the string from
> +   the "targetv" attribute.  
*/ > +  version_assembler_name (decl); > + >  notice_global_symbol (decl); >  node->local.finalized = true; >  node->lowered = DECL_STRUCT_FUNCTION (decl)->cfg != NULL; > Index: multiversion.h > =================================================================== > --- multiversion.h    (revision 0) > +++ multiversion.h    (revision 0) > @@ -0,0 +1,52 @@ > +/* Function Multiversioning. > +  Copyright (C) 2012 Free Software Foundation, Inc. > +  Contributed by Sriraman Tallam (tmsriram@google.com) > + > +This file is part of GCC. > + > +GCC is free software; you can redistribute it and/or modify it under > +the terms of the GNU General Public License as published by the Free > +Software Foundation; either version 3, or (at your option) any later > +version. > + > +GCC is distributed in the hope that it will be useful, but WITHOUT ANY > +WARRANTY; without even the implied warranty of MERCHANTABILITY or > +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License > +for more details. > + > +You should have received a copy of the GNU General Public License > +along with GCC; see the file COPYING3.  If not see > +<http://www.gnu.org/licenses/>. */ > + > +/* This is the header file which provides the functions to keep track > +  of functions that are multi-versioned and to generate the dispatch > +  code to call the right version at run-time.  */ > + > +#ifndef GCC_MULTIVERSION_H > +#define GCC_MULTIVERSION_H > + > +#include "tree.h" > + > +/* Mark DECL1 and DECL2 as function versions.  */ > +int group_function_versions (const tree decl1, const tree decl2); > + > +/* Mark DECL as deleted and no longer a version.  */ > +void mark_delete_decl_version (const tree decl); > + > +/* Returns true if DECL is the default version to be executed if all > +  other versions are inappropriate at run-time.  */ > +bool is_default_function (const tree decl); > + > +/* Gets the IFUNC dispatcher for this multi-versioned function DECL. 
DECL > +  must be the default function in the multi-versioned group.  */ > +tree get_ifunc_for_version (const tree decl); > + > +/* Returns true when only one of DECL1 and DECL2 is marked with "targetv" > +  or if the "targetv" attribute strings of DECL1 and DECL2 don't match.  */ > +bool has_different_version_attributes (const tree decl1, const tree decl2); > + > +/* If DECL is a function version and not the default version, the assembler > +  name of DECL is changed to include the attribute string to keep the > +  name unambiguous.  */ > +void version_assembler_name (const tree decl); > +#endif > Index: cp/class.c > =================================================================== > --- cp/class.c  (revision 184971) > +++ cp/class.c  (working copy) > @@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.  If not see >  #include "tree-dump.h" >  #include "splay-tree.h" >  #include "pointer-set.h" > +#include "multiversion.h" > >  /* The number of nested classes being processed.  If we are not in the >   scope of any class, this is zero.  */ > @@ -1092,7 +1093,20 @@ add_method (tree type, tree method, tree using_dec >        || same_type_p (TREE_TYPE (fn_type), >                TREE_TYPE (method_type)))) >     { > -     if (using_decl) > +     /* For function versions, their parms and types match > +       but they are not duplicates.  Record function versions > +       as and when they are found.  */ > +     if (TREE_CODE (fn) == FUNCTION_DECL > +       && TREE_CODE (method) == FUNCTION_DECL > +       && (DECL_FUNCTION_VERSIONED (fn) > +         || DECL_FUNCTION_VERSIONED (method))) > +      { > +       DECL_FUNCTION_VERSIONED (fn) = 1; > +       DECL_FUNCTION_VERSIONED (method) = 1; > +       group_function_versions (fn, method); > +       continue; > +      } > +     else if (using_decl) >       { >        if (DECL_CONTEXT (fn) == type) >         /* Defer to the local function.  
*/ > @@ -1150,6 +1164,13 @@ add_method (tree type, tree method, tree using_dec >  else >   /* Replace the current slot.  */ >   VEC_replace (tree, method_vec, slot, overload); > + > +  /* Change the assembler name of method here if it has "targetv" > +   attributes.  Since all versions have the same mangled name, > +   their assembler name is changed by appending the string from > +   the "targetv" attribute. */ > +  version_assembler_name (method); > + >  return true; >  } > > @@ -6890,8 +6911,11 @@ resolve_address_of_overloaded_function (tree targe >      if (DECL_ANTICIPATED (fn)) >       continue; > > -     /* See if there's a match.  */ > -     if (same_type_p (target_fn_type, static_fn_type (fn))) > +     /* See if there's a match.  For functions that are multi-versioned > +       match it to the default function.  */ > +     if (same_type_p (target_fn_type, static_fn_type (fn)) > +       && (!DECL_FUNCTION_VERSIONED (fn) > +         || is_default_function (fn))) >       matches = tree_cons (fn, NULL_TREE, matches); >     } >   } > @@ -7053,6 +7077,21 @@ resolve_address_of_overloaded_function (tree targe >    perform_or_defer_access_check (access_path, fn, fn); >   } > > +  /* If a pointer to a function that is multi-versioned is requested, the > +   pointer to the dispatcher function is returned instead.  This works > +   well because indirectly calling the function will dispatch the right > +   function version at run-time. Also, the function address is kept > +   unique.  
*/ > +  if (DECL_FUNCTION_VERSIONED (fn) > +    && is_default_function (fn)) > +   { > +    tree ifunc_decl; > +    ifunc_decl = get_ifunc_for_version (fn); > +    gcc_assert (ifunc_decl != NULL); > +    mark_used (fn); > +    return build_fold_addr_expr (ifunc_decl); > +   } > + >  if (TYPE_PTRFN_P (target_type) || TYPE_PTRMEMFUNC_P (target_type)) >   return cp_build_addr_expr (fn, flags); >  else > Index: cp/decl.c > =================================================================== > --- cp/decl.c  (revision 184971) > +++ cp/decl.c  (working copy) > @@ -54,6 +54,7 @@ along with GCC; see the file COPYING3.  If not see >  #include "pointer-set.h" >  #include "splay-tree.h" >  #include "plugin.h" > +#include "multiversion.h" > >  /* Possible cases of bad specifiers type used by bad_specifiers. */ >  enum bad_spec_place { > @@ -972,6 +973,23 @@ decls_match (tree newdecl, tree olddecl) >    if (t1 != t2) >     return 0; > > +    /* The decls don't match if they correspond to two different versions > +     of the same function.  */ > +    if (compparms (p1, p2) > +     && same_type_p (TREE_TYPE (f1), TREE_TYPE (f2)) > +     && (DECL_FUNCTION_VERSIONED (newdecl) > +       || DECL_FUNCTION_VERSIONED (olddecl)) > +     && has_different_version_attributes (newdecl, olddecl)) > +    { > +     /* One of the decls could be the default without the "targetv" > +       attribute. Set it to be a versioned function here.  */ > +     DECL_FUNCTION_VERSIONED (newdecl) = 1; > +     DECL_FUNCTION_VERSIONED (olddecl) = 1; > +     /* Accumulate all the versions of a function.  */ > +     group_function_versions (olddecl, newdecl); > +     return 0; > +    } > + >    if (CP_DECL_CONTEXT (newdecl) != CP_DECL_CONTEXT (olddecl) >      && ! 
(DECL_EXTERN_C_P (newdecl) >         && DECL_EXTERN_C_P (olddecl))) > @@ -1482,7 +1500,11 @@ duplicate_decls (tree newdecl, tree olddecl, bool >        error ("previous declaration %q+#D here", olddecl); >        return NULL_TREE; >       } > -     else if (compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)), > +     /* For function versions, params and types match, but they > +       are not ambiguous.  */ > +     else if ((!DECL_FUNCTION_VERSIONED (newdecl) > +          && !DECL_FUNCTION_VERSIONED (olddecl)) > +          && compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)), >                TYPE_ARG_TYPES (TREE_TYPE (olddecl)))) >       { >        error ("new declaration %q#D", newdecl); > @@ -2250,6 +2272,16 @@ duplicate_decls (tree newdecl, tree olddecl, bool >  else if (DECL_PRESERVE_P (newdecl)) >   DECL_PRESERVE_P (olddecl) = 1; > > +  /* If the olddecl is a version, so is the newdecl.  */ > +  if (TREE_CODE (newdecl) == FUNCTION_DECL > +    && DECL_FUNCTION_VERSIONED (olddecl)) > +   { > +    DECL_FUNCTION_VERSIONED (newdecl) = 1; > +    /* Record that newdecl is not a valid version and has > +     been deleted.  */ > +    mark_delete_decl_version (newdecl); > +   } > + >  if (TREE_CODE (newdecl) == FUNCTION_DECL) >   { >    int function_size; > @@ -4512,6 +4544,10 @@ start_decl (const cp_declarator *declarator, >  /* Enter this declaration into the symbol table.  */ >  decl = maybe_push_decl (decl); > > +  /* If this decl is a function version and not the default, its assembler > +   name has to be changed.  */ > +  version_assembler_name (decl); > + >  if (processing_template_decl) >   decl = push_template_decl (decl); >  if (decl == error_mark_node) > @@ -13019,6 +13055,10 @@ start_function (cp_decl_specifier_seq *declspecs, >   gcc_assert (same_type_p (TREE_TYPE (TREE_TYPE (decl1)), >               integer_type_node)); > > +  /* If this decl is a function version and not the default, its assembler > +   name has to be changed.  
*/ > +  version_assembler_name (decl1); > + >  start_preparsed_function (decl1, attrs, /*flags=*/SF_DEFAULT); > >  return 1; > @@ -13960,6 +14000,11 @@ cxx_comdat_group (tree decl) >       break; >     } >    name = DECL_ASSEMBLER_NAME (decl); > +    if (TREE_CODE (decl) == FUNCTION_DECL > +     && DECL_FUNCTION_VERSIONED (decl)) > +    name = DECL_NAME (decl); > +    else > +     name = DECL_ASSEMBLER_NAME (decl); >   } > >  return name; > Index: cp/semantics.c > =================================================================== > --- cp/semantics.c    (revision 184971) > +++ cp/semantics.c    (working copy) > @@ -3783,8 +3783,11 @@ expand_or_defer_fn_1 (tree fn) >    /* If the user wants us to keep all inline functions, then mark >     this function as needed so that finish_file will make sure to >     output it later.  Similarly, all dllexport'd functions must > -     be emitted; there may be callers in other DLLs.  */ > -    if ((flag_keep_inline_functions > +     be emitted; there may be callers in other DLLs. > +     Also, mark this function as needed if it is marked inline but > +     is a multi-versioned function.  */ > +    if (((flag_keep_inline_functions > +      || DECL_FUNCTION_VERSIONED (fn)) >      && DECL_DECLARED_INLINE_P (fn) >      && !DECL_REALLY_EXTERN (fn)) >      || (flag_keep_inline_dllexport > Index: cp/decl2.c > =================================================================== > --- cp/decl2.c  (revision 184971) > +++ cp/decl2.c  (working copy) > @@ -53,6 +53,7 @@ along with GCC; see the file COPYING3.  If not see >  #include "splay-tree.h" >  #include "langhooks.h" >  #include "c-family/c-ada-spec.h" > +#include "multiversion.h" > >  extern cpp_reader *parse_in; > > @@ -674,9 +675,13 @@ check_classfn (tree ctype, tree function, tree tem >      if (is_template != (TREE_CODE (fndecl) == TEMPLATE_DECL)) >       continue; > > +     /* While finding a match, same types and params are not enough > +       if the function is versioned.  
Also check version ("targetv") > +       attributes.  */ >      if (same_type_p (TREE_TYPE (TREE_TYPE (function)), >              TREE_TYPE (TREE_TYPE (fndecl))) >        && compparms (p1, p2) > +       && !has_different_version_attributes (function, fndecl) >        && (!is_template >          || comp_template_parms (template_parms, >                      DECL_TEMPLATE_PARMS (fndecl))) > Index: cp/call.c > =================================================================== > --- cp/call.c  (revision 184971) > +++ cp/call.c  (working copy) > @@ -41,6 +41,7 @@ along with GCC; see the file COPYING3.  If not see >  #include "langhooks.h" >  #include "c-family/c-objc.h" >  #include "timevar.h" > +#include "multiversion.h" > >  /* The various kinds of conversion.  */ > > @@ -6730,6 +6731,17 @@ build_over_call (struct z_candidate *cand, int fla >  if (!already_used) >   mark_used (fn); > > +  /* For a call to a multi-versioned function, the call should actually be to > +   the dispatcher.  */ > +  if (DECL_FUNCTION_VERSIONED (fn)) > +   { > +    tree ifunc_decl; > +    ifunc_decl = get_ifunc_for_version (fn); > +    gcc_assert (ifunc_decl != NULL); > +    return build_call_expr_loc_array (UNKNOWN_LOCATION, ifunc_decl, > +                    nargs, argarray); > +   } > + >  if (DECL_VINDEX (fn) && (flags & LOOKUP_NONVIRTUAL) == 0) >   { >    tree t; > @@ -7980,6 +7992,30 @@ joust (struct z_candidate *cand1, struct z_candida >  size_t i; >  size_t len; > > +  /* For Candidates of a multi-versioned function, the one marked default > +   wins.  This is because the default decl is used as key to aggregate > +   all the other versions provided for it in multiversion.c.  When > +   generating the actual call, the appropriate dispatcher is created > +   to call the right function version at run-time.  
*/ > + > +  if ((TREE_CODE (cand1->fn) == FUNCTION_DECL > +    && DECL_FUNCTION_VERSIONED (cand1->fn)) > +    || (TREE_CODE (cand2->fn) == FUNCTION_DECL > +     && DECL_FUNCTION_VERSIONED (cand2->fn))) > +   { > +    if (is_default_function (cand1->fn)) > +    { > +      mark_used (cand2->fn); > +     return 1; > +    } > +    if (is_default_function (cand2->fn)) > +    { > +      mark_used (cand1->fn); > +     return -1; > +    } > +    return 0; > +   } > + >  /* Candidates that involve bad conversions are always worse than those >    that don't.  */ >  if (cand1->viable > cand2->viable) > Index: timevar.def > =================================================================== > --- timevar.def (revision 184971) > +++ timevar.def (working copy) > @@ -253,6 +253,7 @@ DEFTIMEVAR (TV_TREE_IFCOMBINE     , "tree if-co >  DEFTIMEVAR (TV_TREE_UNINIT      , "uninit var analysis") >  DEFTIMEVAR (TV_PLUGIN_INIT      , "plugin initialization") >  DEFTIMEVAR (TV_PLUGIN_RUN       , "plugin execution") > +DEFTIMEVAR (TV_MULTIVERSION_DISPATCH , "multiversion dispatch") > >  /* Everything else in rest_of_compilation not included above.  */ >  DEFTIMEVAR (TV_EARLY_LOCAL      , "early local passes") > Index: varasm.c > =================================================================== > --- varasm.c   (revision 184971) > +++ varasm.c   (working copy) > @@ -5755,6 +5755,8 @@ finish_aliases_1 (void) >     } >    else if (! (p->emitted_diags & ALIAS_DIAG_TO_EXTERN) >        && DECL_EXTERNAL (target_decl) > +        && (TREE_CODE (target_decl) != FUNCTION_DECL > +          || !DECL_STRUCT_FUNCTION (target_decl)) >        /* We use local aliases for C++ thunks to force the tailcall >          to bind locally.  This is a hack - to keep it working do >          the following (which is not strictly correct).  
*/ > Index: Makefile.in > =================================================================== > --- Makefile.in (revision 184971) > +++ Makefile.in (working copy) > @@ -1298,6 +1298,7 @@ OBJS = \ >     mcf.o \ >     mode-switching.o \ >     modulo-sched.o \ > +    multiversion.o \ >     omega.o \ >     omp-low.o \ >     optabs.o \ > @@ -3030,6 +3031,11 @@ ree.o : ree.c $(CONFIG_H) $(SYSTEM_H) coretypes.h >   $(DF_H) $(TIMEVAR_H) tree-pass.h $(RECOG_H) $(EXPR_H) \ >   $(REGS_H) $(TREE_H) $(TM_P_H) insn-config.h $(INSN_ATTR_H) $(DIAGNOSTIC_CORE_H) \ >   $(TARGET_H) $(OPTABS_H) insn-codes.h rtlhooks-def.h $(PARAMS_H) $(CGRAPH_H) > +multiversion.o : multiversion.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \ > +  $(TREE_H) langhooks.h $(TREE_INLINE_H) $(FLAGS_H) $(CGRAPH_H) intl.h \ > +  $(DIAGNOSTIC_H) $(FIBHEAP_H) $(PARAMS_H) $(TIMEVAR_H) tree-pass.h \ > +  $(HASHTAB_H) $(COVERAGE_H) $(GGC_H) $(TREE_FLOW_H) $(RTL_H) $(IPA_PROP_H) \ > +  $(BASIC_BLOCK_H) $(TOPLEV_H) $(TREE_DUMP_H) ipa-inline.h >  cprop.o : cprop.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \ >   $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h $(GGC_H) \ >   $(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) output.h toplev.h $(DIAGNOSTIC_CORE_H) \ > Index: passes.c > =================================================================== > --- passes.c   (revision 184971) > +++ passes.c   (working copy) > @@ -1190,6 +1190,7 @@ init_optimization_passes (void) >  NEXT_PASS (pass_build_cfg); >  NEXT_PASS (pass_warn_function_return); >  NEXT_PASS (pass_build_cgraph_edges); > +  NEXT_PASS (pass_dispatch_versions); >  *p = NULL; > >  /* Interprocedural optimization passes.  
*/ > Index: config/i386/i386.c > =================================================================== > --- config/i386/i386.c  (revision 184971) > +++ config/i386/i386.c  (working copy) > @@ -27446,6 +27473,593 @@ ix86_init_mmx_sse_builtins (void) >   } >  } > > +/* This adds a condition to the basic_block NEW_BB in function FUNCTION_DECL > +  to return a pointer to VERSION_DECL if the outcome of the function > +  PREDICATE_DECL is true.  This function will be called during version > +  dispatch to decide which function version to execute.  It returns the > +  basic block at the end to which more conditions can be added.  */ > + > +static basic_block > +add_condition_to_bb (tree function_decl, tree version_decl, > +           basic_block new_bb, tree predicate_decl) > +{ > +  gimple return_stmt; > +  tree convert_expr, result_var; > +  gimple convert_stmt; > +  gimple call_cond_stmt; > +  gimple if_else_stmt; > + > +  basic_block bb1, bb2, bb3; > +  edge e12, e23; > + > +  tree cond_var; > +  gimple_seq gseq; > + > +  tree old_current_function_decl; > + > +  old_current_function_decl = current_function_decl; > +  push_cfun (DECL_STRUCT_FUNCTION (function_decl)); > +  current_function_decl = function_decl; > + > +  gcc_assert (new_bb != NULL); > +  gseq = bb_seq (new_bb); > + > + > +  convert_expr = build1 (CONVERT_EXPR, ptr_type_node, > +             build_fold_addr_expr (version_decl)); > +  result_var = create_tmp_var (ptr_type_node, NULL); > +  convert_stmt = gimple_build_assign (result_var, convert_expr); > +  return_stmt = gimple_build_return (result_var); > + > +  if (predicate_decl == NULL_TREE) > +   { > +    gimple_seq_add_stmt (&gseq, convert_stmt); > +    gimple_seq_add_stmt (&gseq, return_stmt); > +    set_bb_seq (new_bb, gseq); > +    gimple_set_bb (convert_stmt, new_bb); > +    gimple_set_bb (return_stmt, new_bb); > +    pop_cfun (); > +    current_function_decl = old_current_function_decl; > +    return new_bb; > +   } > + > +  cond_var = 
create_tmp_var (integer_type_node, NULL); > +  call_cond_stmt = gimple_build_call (predicate_decl, 0); > +  gimple_call_set_lhs (call_cond_stmt, cond_var); > + > +  gimple_set_block (call_cond_stmt, DECL_INITIAL (function_decl)); > +  gimple_set_bb (call_cond_stmt, new_bb); > +  gimple_seq_add_stmt (&gseq, call_cond_stmt); > + > +  if_else_stmt = gimple_build_cond (GT_EXPR, cond_var, > +                  integer_zero_node, > +                  NULL_TREE, NULL_TREE); > +  gimple_set_block (if_else_stmt, DECL_INITIAL (function_decl)); > +  gimple_set_bb (if_else_stmt, new_bb); > +  gimple_seq_add_stmt (&gseq, if_else_stmt); > + > +  gimple_seq_add_stmt (&gseq, convert_stmt); > +  gimple_seq_add_stmt (&gseq, return_stmt); > +  set_bb_seq (new_bb, gseq); > + > +  bb1 = new_bb; > +  e12 = split_block (bb1, if_else_stmt); > +  bb2 = e12->dest; > +  e12->flags &= ~EDGE_FALLTHRU; > +  e12->flags |= EDGE_TRUE_VALUE; > + > +  e23 = split_block (bb2, return_stmt); > + > +  gimple_set_bb (convert_stmt, bb2); > +  gimple_set_bb (return_stmt, bb2); > + > +  bb3 = e23->dest; > +  make_edge (bb1, bb3, EDGE_FALSE_VALUE); > + > +  remove_edge (e23); > +  make_edge (bb2, EXIT_BLOCK_PTR, 0); > + > +  rebuild_cgraph_edges (); > + > +  pop_cfun (); > +  current_function_decl = old_current_function_decl; > + > +  return bb3; > +} > + > +/* This parses the attribute arguments to targetv in DECL and determines > +  the right builtin to use to match the platform specification. > +  For now, only one target argument ("arch=") is allowed.  
*/ > + > +static enum ix86_builtins > +get_builtin_code_for_version (tree decl) > +{ > +  tree attrs; > +  struct cl_target_option cur_target; > +  tree target_node; > +  struct cl_target_option *new_target; > +  enum ix86_builtins builtin_code = IX86_BUILTIN_MAX; > + > +  attrs = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl)); > +  gcc_assert (attrs != NULL); > + > +  cl_target_option_save (&cur_target, &global_options); > + > +  target_node = ix86_valid_target_attribute_tree > +         (TREE_VALUE (TREE_VALUE (attrs))); > + > +  gcc_assert (target_node); > +  new_target = TREE_TARGET_OPTION (target_node); > +  gcc_assert (new_target); > + > +  if (new_target->arch_specified && new_target->arch > 0) > +   { > +    switch (new_target->arch) > +     { > +    case 1: > +    case 2: > +    case 3: > +    case 4: > +    case 5: > +    case 6: > +    case 7: > +    case 8: > +    case 9: > +    case 10: > +    case 11: > +     builtin_code = IX86_BUILTIN_CPU_IS_INTEL; > +     break; > +    case 12: > +     builtin_code = IX86_BUILTIN_CPU_IS_INTEL_CORE2; > +     break; > +    case 13: > +     builtin_code = IX86_BUILTIN_CPU_IS_INTEL_COREI7; > +     break; > +    case 14: > +     builtin_code = IX86_BUILTIN_CPU_IS_INTEL_ATOM; > +     break; > +    case 15: > +    case 16: > +    case 17: > +    case 18: > +    case 19: > +    case 20: > +    case 21: > +     builtin_code = IX86_BUILTIN_CPU_IS_AMD; > +     break; > +    case 22: > +     builtin_code = IX86_BUILTIN_CPU_IS_AMDFAM10H; > +     break; > +    case 23: > +     builtin_code = IX86_BUILTIN_CPU_IS_AMDFAM15H_BDVER1; > +     break; > +    case 24: > +     builtin_code = IX86_BUILTIN_CPU_IS_AMDFAM15H_BDVER2; > +     break; > +    case 25: /* What is btver1 ? 
*/ > +     builtin_code = IX86_BUILTIN_CPU_IS_AMD; > +     break; > +    } > +   } > + > +  cl_target_option_restore (&global_options, &cur_target); > +  if (builtin_code == IX86_BUILTIN_MAX) > +    error_at (DECL_SOURCE_LOCATION (decl), > +        "No dispatcher found for the versioning attributes"); > + > +  return builtin_code; > +} > + > +/* This is the target hook to generate the dispatch function for > +  multi-versioned functions.  DISPATCH_DECL is the function which will > +  contain the dispatch logic.  FNDECLS are the function choices for > +  dispatch, and is a tree chain.  EMPTY_BB is the basic block pointer > +  in DISPATCH_DECL in which the dispatch code is generated.  */ > + > +static int > +ix86_dispatch_version (tree dispatch_decl, > +            void *fndecls_p, > +            basic_block *empty_bb) > +{ > +  tree default_decl; > +  gimple ifunc_cpu_init_stmt; > +  gimple_seq gseq; > +  tree old_current_function_decl; > +  int ix; > +  tree ele; > +  VEC (tree, heap) *fndecls; > + > +  gcc_assert (dispatch_decl != NULL > +       && fndecls_p != NULL > +       && empty_bb != NULL); > + > +  /*fndecls_p is actually a vector.  */ > +  fndecls = (VEC (tree, heap) *)fndecls_p; > + > +  /* Atleast one more version other than the default.  */ > +  gcc_assert (VEC_length (tree, fndecls) >= 2); > + > +  /* The first version in the vector is the default decl.  
*/ > +  default_decl = VEC_index (tree, fndecls, 0); > + > +  old_current_function_decl = current_function_decl; > +  push_cfun (DECL_STRUCT_FUNCTION (dispatch_decl)); > +  current_function_decl = dispatch_decl; > + > +  gseq = bb_seq (*empty_bb); > +  ifunc_cpu_init_stmt = gimple_build_call_vec ( > +           ix86_builtins [(int) IX86_BUILTIN_CPU_INIT], NULL); > +  gimple_seq_add_stmt (&gseq, ifunc_cpu_init_stmt); > +  gimple_set_bb (ifunc_cpu_init_stmt, *empty_bb); > +  set_bb_seq (*empty_bb, gseq); > + > +  pop_cfun (); > +  current_function_decl = old_current_function_decl; > + > + > +  for (ix = 1; VEC_iterate (tree, fndecls, ix, ele); ++ix) > +   { > +    tree version_decl = ele; > +    /* Get attribute string, parse it and find the right predicate decl. > +     The predicate function could be a lengthy combination of many > +     features, like arch-type and various isa-variants.  For now, only > +     check the arch-type.  */ > +    tree predicate_decl = ix86_builtins [ > +            get_builtin_code_for_version (version_decl)]; > +    *empty_bb = add_condition_to_bb (dispatch_decl, version_decl, *empty_bb, > +                    predicate_decl); > + > +   } > +  /* Dispatch the default version at the end.  */ > +  *empty_bb = add_condition_to_bb (dispatch_decl, default_decl, *empty_bb, > +                  NULL); > +  return 0; > +} > > @@ -38610,6 +39269,12 @@ ix86_autovectorize_vector_sizes (void) >  #undef TARGET_BUILD_BUILTIN_VA_LIST >  #define TARGET_BUILD_BUILTIN_VA_LIST ix86_build_builtin_va_list > > +#undef TARGET_DISPATCH_VERSION > +#define TARGET_DISPATCH_VERSION ix86_dispatch_version > + >  #undef TARGET_ENUM_VA_LIST_P >  #define TARGET_ENUM_VA_LIST_P ix86_enum_va_list > > Index: testsuite/g++.dg/mv1.C > =================================================================== > --- testsuite/g++.dg/mv1.C    (revision 0) > +++ testsuite/g++.dg/mv1.C    (revision 0) > @@ -0,0 +1,23 @@ > +/* Simple test case to check if Multiversioning works.  
*/ > +/* { dg-do run } */ > +/* { dg-options "-O2" } */ > + > +int foo (); > +int foo () __attribute__ ((targetv("arch=corei7"))); > + > +int main () > +{ > +  int (*p)() = &foo; > +  return foo () + (*p)(); > +} > + > +int foo () > +{ > +  return 0; > +} > + > +int __attribute__ ((targetv("arch=corei7"))) > +foo () > +{ > +  return 0; > +} > > > -- > This patch is available for review at http://codereview.appspot.com/5752064
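For readers following the thread, here is a minimal, portable sketch of the selection order that the comments in ix86_dispatch_version describe: the specialized versions are tested in order, the first one whose predicate returns a value greater than zero is chosen, and the default version is returned when nothing matches. This is not the patch's code. The patch emits GIMPLE into an IFUNC resolver and uses the i386 CPU-check builtins as predicates; the names version_entry, resolve_version, never, always and the foo_* functions below are hypothetical stand-ins chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a CPU-check builtin: returns a value
   greater than zero when the running CPU matches.  */
typedef int (*cpu_predicate) (void);

/* Type shared by every version of the function being dispatched.  */
typedef int (*foo_fn) (void);

/* One specialized version together with its run-time check.  */
struct version_entry
{
  cpu_predicate matches;  /* Check guarding this version.  */
  foo_fn version;         /* Function to return when the check passes.  */
};

/* Return the first version whose predicate is greater than zero,
   or DEFAULT_VERSION if no specialized version matches.  This models
   only the order of the checks, not the IFUNC machinery.  */
static foo_fn
resolve_version (const struct version_entry *versions, size_t n,
                 foo_fn default_version)
{
  size_t i;

  assert (default_version != NULL);
  for (i = 0; i < n; ++i)
    if (versions[i].matches () > 0)
      return versions[i].version;
  return default_version;
}

/* Illustrative versions; the return values only tell them apart.  */
static int foo_default (void) { return 0; }
static int foo_corei7 (void) { return 1; }
static int foo_core2 (void) { return 2; }

/* Illustrative predicates standing in for real CPU detection.  */
static int never (void) { return 0; }
static int always (void) { return 1; }
```

With a table such as { { never, foo_corei7 }, { always, foo_core2 } }, resolve_version returns foo_core2; with no matching entry it falls back to foo_default, which mirrors the requirement that a default version always exists.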
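The patch comments say the assembler name of a non-default version is formed by appending the "targetv" string, and the cover letter gives "_Z3foov.arch_corei7" as the name for the corei7 version of foo. The sketch below reproduces that one example under an inferred rule: join the mangled name and the attribute string with '.', and replace '=' with '_'. The rule is an assumption drawn from that single name, and make_version_name is a hypothetical helper, not the version_assembler_name function from multiversion.c (which is not shown in this message).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Write BASE, a '.', and ATTR (with every '=' turned into '_') to OUT.
   Returns 0 on success, or -1 if OUT_SIZE is too small for the result
   including its terminating NUL.  The naming rule is an assumption
   inferred from the example "_Z3foov.arch_corei7".  */
static int
make_version_name (const char *base, const char *attr,
                   char *out, size_t out_size)
{
  size_t base_len;
  size_t attr_len;
  size_t i;

  assert (base != NULL && attr != NULL && out != NULL);
  base_len = strlen (base);
  attr_len = strlen (attr);

  /* Room for BASE, the separator, ATTR and the terminator.  */
  if (base_len + 1 + attr_len + 1 > out_size)
    return -1;

  memcpy (out, base, base_len);
  out[base_len] = '.';
  for (i = 0; i < attr_len; ++i)
    out[base_len + 1 + i] = (attr[i] == '=') ? '_' : attr[i];
  out[base_len + 1 + attr_len] = '\0';
  return 0;
}
```

Calling make_version_name ("_Z3foov", "arch=corei7", buf, sizeof buf) with a large enough buffer yields the name used in the alias example from the cover letter.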
On Wed, Mar 7, 2012 at 6:05 AM, Richard Guenther <richard.guenther@gmail.com> wrote: > On Wed, Mar 7, 2012 at 1:46 AM, Sriraman Tallam <tmsriram@google.com> wrote: >> User directed Function Multiversioning (MV) via Function Overloading >> ==================================================================== >> >> This patch adds support for user directed function MV via function overloading. >> For more detailed description: >> http://gcc.gnu.org/ml/gcc/2012-03/msg00074.html >> >> >> Here is an example program with function versions: >> >> int foo ();  /* Default version */ >> int foo () __attribute__ ((targetv("arch=corei7")));/*Specialized for corei7 */ >> int foo () __attribute__ ((targetv("arch=core2")));/*Specialized for core2 */ >> >> int main () >> { >>  int (*p)() = &foo; >>  return foo () + (*p)(); >> } >> >> int foo () >> { >>  return 0; >> } >> >> int __attribute__ ((targetv("arch=corei7"))) >> foo () >> { >>  return 0; >> } >> >> int __attribute__ ((targetv("arch=core2"))) >> foo () >> { >>  return 0; >> } >> >> The above example has foo defined 3 times, but all 3 definitions of foo are >> different versions of the same function. The call to foo in main, directly and >> via a pointer, are calls to the multi-versioned function foo which is dispatched >> to the right foo at run-time. >> >> Function versions must have the same signature but must differ in the specifier >> string provided to a new attribute called "targetv", which is nothing but the >> target attribute with an extra specification to indicate a version. Any number >> of versions can be created using the targetv attribute but it is mandatory to >> have one function without the attribute, which is treated as the default >> version. >> >> The dispatching is done using the IFUNC mechanism to keep the dispatch overhead >> low. The compiler creates a dispatcher function which checks the CPU type and >> calls the right version of foo. 
The dispatching code checks for the platform >> type and calls the first version that matches. The default function is called if >> no specialized version is appropriate for execution. >> >> The pointer to foo is made to be the address of the dispatcher function, so that >> it is unique and calls made via the pointer also work correctly. The assembler >> names of the various versions of foo is made different, by tagging >> the specifier strings, to keep them unique.  A specific version can be called >> directly by creating an alias to its assembler name. For instance, to call the >> corei7 version directly, make an alias : >> int foo_corei7 () __attribute__((alias ("_Z3foov.arch_corei7"))); >> and then call foo_corei7. >> >> Note that using IFUNC  blocks inlining of versioned functions. I had implemented >> an optimization earlier to do hot path cloning to allow versioned functions to >> be inlined. Please see : http://gcc.gnu.org/ml/gcc-patches/2011-04/msg02285.html >> In the next iteration, I plan to merge these two. With that, hot code paths with >> versioned functions will be cloned so that versioned functions can be inlined. > > Note that inlining of functions with the target attribute is limited as well, > but your issue is that of the indirect dispatch as ... > > You don't give an overview of the frontend implementation.  Thus I have > extracted the following > >  - the FE does not really know about the "overloading", nor can it directly >  resolve calls from a "sse" function to another "sse" function without going >  through the 2nd IFUNC This is a good point but I can change function joust, where the overload candidate is selected, to return the decl of the versioned function with matching target attributes as that of the callee. That will solve this problem. I have to treat the target attributes as an additional criterion for a match in overload resolution. 
The front end *does know* about the overloading; it is a question of doing the overload resolution correctly, right? This is easy when there is no cloning involved. When cloning of a version is required, it gets complicated since the FE must clone and produce the bodies. Once all the bodies are available, the overload resolution can do the right thing. > >  - cgraph also does not know about the "overloading", so it cannot do such >  "devirtualization" either > > you seem to have implemented something in between a pure frontend > solution and a proper middle-end solution. The only thing I delayed is the code generation of the dispatcher. I thought it better to have this come later, after the cfg and cgraph are generated, so that multiple dispatching mechanisms could be implemented. > For optimization and eventually > automatically selecting functions for cloning (like, callees of a manual "sse" > versioned function should be cloned?) it would be nice if the cgraph would > know about the different versions and their relationships (and the dispatcher). > Especially the cgraph code should know the functions are semantically > equivalent (I suppose we should require that).  The IFUNC should be > generated by cgraph / target code, similar to how we generate C++ thunks. > > Honza, any suggestions on how the FE side of such cgraph infrastructure > should look like and how we should encode the target bits? > > Thanks, > Richard. > >>     * doc/tm.texi.in: Add description for TARGET_DISPATCH_VERSION. >>     * doc/tm.texi: Regenerate. >>     * c-family/c-common.c (handle_targetv_attribute): New function. >>     * target.def (dispatch_version): New target hook. >>     * tree.h (DECL_FUNCTION_VERSIONED): New macro. >>     (tree_function_decl): New bit-field versioned_function. >>     * tree-pass.h (pass_dispatch_versions): New pass. >>     * multiversion.c: New file. >>     * multiversion.h: New file. 
>>     * cgraphunit.c: Include multiversion.h >>     (cgraph_finalize_function): Change assembler names of versioned >>     functions. >>     * cp/class.c: Include multiversion.h >>     (add_method): aggregate function versions. Change assembler names of >>     versioned functions. >>     (resolve_address_of_overloaded_function): Match address of function >>     version with default function.  Return address of ifunc dispatcher >>     for address of versioned functions. >>     * cp/decl.c (decls_match): Make decls unmatched for versioned >>     functions. >>     (duplicate_decls): Remove ambiguity for versioned functions. Notify >>     of deleted function version decls. >>     (start_decl): Change assembler name of versioned functions. >>     (start_function): Change assembler name of versioned functions. >>     (cxx_comdat_group): Make comdat group of versioned functions be the >>     same. >>     * cp/semantics.c (expand_or_defer_fn_1): Mark as needed versioned >>     functions that are also marked inline. >>     * cp/decl2.c: Include multiversion.h >>     (check_classfn): Check attributes of versioned functions for match. >>     * cp/call.c: Include multiversion.h >>     (build_over_call): Make calls to multiversioned functions to call the >>     dispatcher. >>     (joust): For calls to multi-versioned functions, make the default >>     function win. >>     * timevar.def (TV_MULTIVERSION_DISPATCH): New time var. >>     * varasm.c (finish_aliases_1): Check if the alias points to a function >>     with a body before giving an error. >>     * Makefile.in: Add multiversion.o >>     * passes.c: Add pass_dispatch_versions to the pass list. >>     * config/i386/i386.c (add_condition_to_bb): New function. >>     (get_builtin_code_for_version): New function. >>     (ix86_dispatch_version): New function. >>     (TARGET_DISPATCH_VERSION): New macro. >>     * testsuite/g++.dg/mv1.C: New test. 
>> >> Index: doc/tm.texi >> =================================================================== >> --- doc/tm.texi (revision 184971) >> +++ doc/tm.texi (working copy) >> @@ -10995,6 +10995,14 @@ The result is another tree containing a simplified >>  call's result.  If @var{ignore} is true the value will be ignored. >>  @end deftypefn >> >> +@deftypefn {Target Hook} int TARGET_DISPATCH_VERSION (tree @var{dispatch_decl}, void *@var{fndecls}, basic_block *@var{empty_bb}) >> +For multi-versioned function, this hook sets up the dispatcher. >> +@var{dispatch_decl} is the function that will be used to dispatch the >> +version. @var{fndecls} are the function choices for dispatch. >> +@var{empty_bb} is an basic block in @var{dispatch_decl} where the >> +code to do the dispatch will be added. >> +@end deftypefn >> + >>  @deftypefn {Target Hook} {const char *} TARGET_INVALID_WITHIN_DOLOOP (const_rtx @var{insn}) >> >>  Take an instruction in @var{insn} and return NULL if it is valid within a >> Index: doc/tm.texi.in >> =================================================================== >> --- doc/tm.texi.in    (revision 184971) >> +++ doc/tm.texi.in    (working copy) >> @@ -10873,6 +10873,14 @@ The result is another tree containing a simplified >>  call's result.  If @var{ignore} is true the value will be ignored. >>  @end deftypefn >> >> +@hook TARGET_DISPATCH_VERSION >> +For multi-versioned function, this hook sets up the dispatcher. >> +@var{dispatch_decl} is the function that will be used to dispatch the >> +version. @var{fndecls} are the function choices for dispatch. >> +@var{empty_bb} is an basic block in @var{dispatch_decl} where the >> +code to do the dispatch will be added. 
>> +@end deftypefn >> + >>  @hook TARGET_INVALID_WITHIN_DOLOOP >> >>  Take an instruction in @var{insn} and return NULL if it is valid within a >> Index: c-family/c-common.c >> =================================================================== >> --- c-family/c-common.c (revision 184971) >> +++ c-family/c-common.c (working copy) >> @@ -315,6 +315,7 @@ static tree check_case_value (tree); >>  static bool check_case_bounds (tree, tree, tree *, tree *); >> >>  static tree handle_packed_attribute (tree *, tree, tree, int, bool *); >> +static tree handle_targetv_attribute (tree *, tree, tree, int, bool *); >>  static tree handle_nocommon_attribute (tree *, tree, tree, int, bool *); >>  static tree handle_common_attribute (tree *, tree, tree, int, bool *); >>  static tree handle_noreturn_attribute (tree *, tree, tree, int, bool *); >> @@ -604,6 +605,8 @@ const struct attribute_spec c_common_attribute_tab >>  { >>  /* { name, min_len, max_len, decl_req, type_req, fn_type_req, handler, >>     affects_type_identity } */ >> +  { "targetv",        1, -1, true, false, false, >> +               handle_targetv_attribute, false }, >>  { "packed",         0, 0, false, false, false, >>                handle_packed_attribute , false}, >>  { "nocommon",        0, 0, true,  false, false, >> @@ -5869,6 +5872,54 @@ handle_packed_attribute (tree *node, tree name, tr >>  return NULL_TREE; >>  } >> >> +/* The targetv attribue is used to specify a function version >> +  targeted to specific platform types.  The "targetv" attributes >> +  have to be valid "target" attributes.  NODE should always point >> +  to a FUNCTION_DECL.  ARGS contain the arguments to "targetv" >> +  which should be valid arguments to attribute "target" too. >> +  Check handle_target_attribute for FLAGS and NO_ADD_ATTRS.  
*/ >> + >> +static tree >> +handle_targetv_attribute (tree *node, tree name, >> +             tree args, >> +             int flags, >> +             bool *no_add_attrs) >> +{ >> +  const char *attr_str = NULL; >> +  gcc_assert (TREE_CODE (*node) == FUNCTION_DECL); >> +  gcc_assert (args != NULL); >> + >> +  /* This is a function version.  */ >> +  DECL_FUNCTION_VERSIONED (*node) = 1; >> + >> +  attr_str = TREE_STRING_POINTER (TREE_VALUE (args)); >> + >> +  /* Check if multiple sets of target attributes are there.  This >> +   is not supported now.  In future, this will be supported by >> +   cloning this function for each set.  */ >> +  if (TREE_CHAIN (args) != NULL) >> +   warning (OPT_Wattributes, "%qE attribute has multiple sets which " >> +       "is not supported", name); >> + >> +  if (attr_str == NULL >> +    || strstr (attr_str, "arch=") == NULL) >> +   error_at (DECL_SOURCE_LOCATION (*node), >> +       "Versioning supported only on \"arch=\" for now"); >> + >> +  /* targetv attributes must translate into target attributes.  */ >> +  handle_target_attribute (node, get_identifier ("target"), args, flags, >> +              no_add_attrs); >> + >> +  if (*no_add_attrs) >> +   warning (OPT_Wattributes, "%qE attribute has no effect", name); >> + >> +  /* This is necessary to keep the attribute tagged to the decl >> +   all the time.  */ >> +  *no_add_attrs = false; >> + >> +  return NULL_TREE; >> +} >> + >>  /* Handle a "nocommon" attribute; arguments as in >>   struct attribute_spec.handler.  */ >> >> Index: target.def >> =================================================================== >> --- target.def  (revision 184971) >> +++ target.def  (working copy) >> @@ -1249,6 +1249,15 @@ DEFHOOK >>  tree, (tree fndecl, int n_args, tree *argp, bool ignore), >>  hook_tree_tree_int_treep_bool_null) >> >> +/* Target hook to generate the dispatching code for calls to multi-versioned >> +  functions.  
DISPATCH_DECL is the function that will have the dispatching >> +  logic.  FNDECLS are the list of choices for dispatch and EMPTY_BB is the >> +  basic bloc in DISPATCH_DECL which will contain the code.  */ >> +DEFHOOK >> +(dispatch_version, >> + "", >> + int, (tree dispatch_decl, void *fndecls, basic_block *empty_bb), NULL) >> + >>  /* Returns a code for a target-specific builtin that implements >>   reciprocal of the function, or NULL_TREE if not available.  */ >>  DEFHOOK >> Index: tree.h >> =================================================================== >> --- tree.h    (revision 184971) >> +++ tree.h    (working copy) >> @@ -3532,6 +3532,12 @@ extern VEC(tree, gc) **decl_debug_args_insert (tre >>  #define DECL_FUNCTION_SPECIFIC_OPTIMIZATION(NODE) \ >>   (FUNCTION_DECL_CHECK (NODE)->function_decl.function_specific_optimization) >> >> +/* In FUNCTION_DECL, this is set if this function has other versions generated >> +  using "targetv" attributes.  The default version is the one which does not >> +  have any "targetv" attribute set. */ >> +#define DECL_FUNCTION_VERSIONED(NODE)\ >> +  (FUNCTION_DECL_CHECK (NODE)->function_decl.versioned_function) >> + >>  /* FUNCTION_DECL inherits from DECL_NON_COMMON because of the use of the >>   arguments/result/saved_tree fields by front ends.  It was either inherit >>   FUNCTION_DECL from non_common, or inherit non_common from FUNCTION_DECL, >> @@ -3576,8 +3582,8 @@ struct GTY(()) tree_function_decl { >>  unsigned looping_const_or_pure_flag : 1; >>  unsigned has_debug_args_flag : 1; >>  unsigned tm_clone_flag : 1; >> - >> -  /* 1 bit left */ >> +  unsigned versioned_function : 1; >> +  /* No bits left.  */ >>  }; >> >>  /* The source language of the translation-unit.  
*/ >> Index: tree-pass.h >> =================================================================== >> --- tree-pass.h (revision 184971) >> +++ tree-pass.h (working copy) >> @@ -455,6 +455,7 @@ extern struct gimple_opt_pass pass_tm_memopt; >>  extern struct gimple_opt_pass pass_tm_edges; >>  extern struct gimple_opt_pass pass_split_functions; >>  extern struct gimple_opt_pass pass_feedback_split_functions; >> +extern struct gimple_opt_pass pass_dispatch_versions; >> >>  /* IPA Passes */ >>  extern struct simple_ipa_opt_pass pass_ipa_lower_emutls; >> Index: multiversion.c >> =================================================================== >> --- multiversion.c    (revision 0) >> +++ multiversion.c    (revision 0) >> @@ -0,0 +1,798 @@ >> +/* Function Multiversioning. >> +  Copyright (C) 2012 Free Software Foundation, Inc. >> +  Contributed by Sriraman Tallam (tmsriram@google.com) >> + >> +This file is part of GCC. >> + >> +GCC is free software; you can redistribute it and/or modify it under >> +the terms of the GNU General Public License as published by the Free >> +Software Foundation; either version 3, or (at your option) any later >> +version. >> + >> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY >> +WARRANTY; without even the implied warranty of MERCHANTABILITY or >> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License >> +for more details. >> + >> +You should have received a copy of the GNU General Public License >> +along with GCC; see the file COPYING3.  If not see >> +<http://www.gnu.org/licenses/>. */ >> + >> +/* Holds the state for multi-versioned functions here. The front-end >> +  updates the state as and when function versions are encountered. >> +  This is then used to generate the dispatch code.  Also, the >> +  optimization passes to clone hot paths involving versioned functions >> +  will be done here. 
>> + >> +  Function versions are created by using the same function signature but >> +  also tagging attribute "targetv" to specify the platform type for which >> +  the version must be executed.  Here is an example: >> + >> +  int foo () >> +  { >> +   printf ("Execute as default"); >> +   return 0; >> +  } >> + >> +  int  __attribute__ ((targetv ("arch=corei7"))) >> +  foo () >> +  { >> +   printf ("Execute for corei7"); >> +   return 0; >> +  } >> + >> +  int main () >> +  { >> +   return foo (); >> +  } >> + >> +  The call to foo in main is replaced with a call to an IFUNC function that >> +  contains the dispatch code to call the correct function version at >> +  run-time.  */ >> + >> + >> +#include "config.h" >> +#include "system.h" >> +#include "coretypes.h" >> +#include "tm.h" >> +#include "tree.h" >> +#include "tree-inline.h" >> +#include "langhooks.h" >> +#include "flags.h" >> +#include "cgraph.h" >> +#include "diagnostic.h" >> +#include "toplev.h" >> +#include "timevar.h" >> +#include "params.h" >> +#include "fibheap.h" >> +#include "intl.h" >> +#include "tree-pass.h" >> +#include "hashtab.h" >> +#include "coverage.h" >> +#include "ggc.h" >> +#include "tree-flow.h" >> +#include "rtl.h" >> +#include "ipa-prop.h" >> +#include "basic-block.h" >> +#include "toplev.h" >> +#include "dbgcnt.h" >> +#include "tree-dump.h" >> +#include "output.h" >> +#include "vecprim.h" >> +#include "gimple-pretty-print.h" >> +#include "ipa-inline.h" >> +#include "target.h" >> +#include "multiversion.h" >> + >> +typedef void * void_p; >> + >> +DEF_VEC_P (void_p); >> +DEF_VEC_ALLOC_P (void_p, heap); >> + >> +/* Each function decl that is a function version gets an instance of this >> +  structure.  Since this is called by the front-end, decl merging can >> +  happen, where a decl created for a new declaration is merged with >> +  the old. In this case, the new decl is deleted and the IS_DELETED >> +  field is set for the struct instance corresponding to the new decl. 
>> +  IFUNC_DECL is the decl of the ifunc function for default decls. >> +  IFUNC_RESOLVER_DECL is the decl of the dispatch function.  VERSIONS >> +  is a vector containing the list of function versions  that are >> +  the candidates for dispatch.  */ >> + >> +typedef struct version_function_d { >> +  tree decl; >> +  tree ifunc_decl; >> +  tree ifunc_resolver_decl; >> +  VEC (void_p, heap) *versions; >> +  bool is_deleted; >> +} version_function; >> + >> +/* Hashmap has an entry for every function decl that has other function >> +  versions.  For function decls that are the default, it also stores the >> +  list of all the other function versions.  Each entry is a structure >> +  of type version_function_d.  */ >> +static htab_t decl_version_htab = NULL; >> + >> +/* Hashtable helpers for decl_version_htab. */ >> + >> +static hashval_t >> +decl_version_htab_hash_descriptor (const void *p) >> +{ >> +  const version_function *t = (const version_function *) p; >> +  return htab_hash_pointer (t->decl); >> +} >> + >> +/* Hashtable helper for decl_version_htab. */ >> + >> +static int >> +decl_version_htab_eq_descriptor (const void *p1, const void *p2) >> +{ >> +  const version_function *t1 = (const version_function *) p1; >> +  return htab_eq_pointer ((const void_p) t1->decl, p2); >> +} >> + >> +/* Create the decl_version_htab.  */ >> +static void >> +create_decl_version_htab (void) >> +{ >> +  if (decl_version_htab == NULL) >> +   decl_version_htab = htab_create (10, decl_version_htab_hash_descriptor, >> +                   decl_version_htab_eq_descriptor, NULL); >> +} >> + >> +/* Creates an instance of version_function for decl DECL.  
*/ >> + >> +static version_function* >> +new_version_function (const tree decl) >> +{ >> +  version_function *v; >> +  v = (version_function *)xmalloc(sizeof (version_function)); >> +  v->decl = decl; >> +  v->ifunc_decl = NULL; >> +  v->ifunc_resolver_decl = NULL; >> +  v->versions = NULL; >> +  v->is_deleted = false; >> +  return v; >> +} >> + >> +/* Comparator function to be used in qsort routine to sort attribute >> +  specification strings to "targetv".  */ >> + >> +static int >> +attr_strcmp (const void *v1, const void *v2) >> +{ >> +  const char *c1 = *(char *const*)v1; >> +  const char *c2 = *(char *const*)v2; >> +  return strcmp (c1, c2); >> +} >> + >> +/* STR is the argument to targetv attribute.  This function tokenizes >> +  the comma separated arguments, sorts them and returns a string which >> +  is a unique identifier for the comma separated arguments.  */ >> + >> +static char * >> +sorted_attr_string (const char *str) >> +{ >> +  char **args = NULL; >> +  char *attr_str, *ret_str; >> +  char *attr = NULL; >> +  unsigned int argnum = 1; >> +  unsigned int i; >> + >> +  for (i = 0; i < strlen (str); i++) >> +   if (str[i] == ',') >> +    argnum++; >> + >> +  attr_str = (char *)xmalloc (strlen (str) + 1); >> +  strcpy (attr_str, str); >> + >> +  for (i = 0; i < strlen (attr_str); i++) >> +   if (attr_str[i] == '=') >> +    attr_str[i] = '_'; >> + >> +  if (argnum == 1) >> +   return attr_str; >> + >> +  args = (char **)xmalloc (argnum * sizeof (char *)); >> + >> +  i = 0; >> +  attr = strtok (attr_str, ","); >> +  while (attr != NULL) >> +   { >> +    args[i] = attr; >> +    i++; >> +    attr = strtok (NULL, ","); >> +   } >> + >> +  qsort (args, argnum, sizeof (char*), attr_strcmp); >> + >> +  ret_str = (char *)xmalloc (strlen (str) + 1); >> +  strcpy (ret_str, args[0]); >> +  for (i = 1; i < argnum; i++) >> +   { >> +    strcat (ret_str, "_"); >> +    strcat (ret_str, args[i]); >> +   } >> + >> +  free (args); >> +  free (attr_str); >> +  return 
ret_str; >> +} >> + >> +/* Returns true when only one of DECL1 and DECL2 is marked with "targetv" >> +  or if the "targetv" attribute strings of DECL1 and DECL2 dont match.  */ >> + >> +bool >> +has_different_version_attributes (const tree decl1, const tree decl2) >> +{ >> +  tree attr1, attr2; >> +  char *c1, *c2; >> +  bool ret = false; >> + >> +  if (TREE_CODE (decl1) != FUNCTION_DECL >> +    || TREE_CODE (decl2) != FUNCTION_DECL) >> +   return false; >> + >> +  attr1 = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl1)); >> +  attr2 = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl2)); >> + >> +  if (attr1 == NULL_TREE && attr2 == NULL_TREE) >> +   return false; >> + >> +  if ((attr1 == NULL_TREE && attr2 != NULL_TREE) >> +    || (attr1 != NULL_TREE && attr2 == NULL_TREE)) >> +   return true; >> + >> +  c1 = sorted_attr_string ( >> +    TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr1)))); >> +  c2 = sorted_attr_string ( >> +    TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr2)))); >> + >> +  if (strcmp (c1, c2) != 0) >> +   ret = true; >> + >> +  free (c1); >> +  free (c2); >> + >> +  return ret; >> +} >> + >> +/* If this decl corresponds to a function and has "targetv" attribute, >> +  append the attribute string to its assembler name.  
*/ >> + >> +void >> +version_assembler_name (const tree decl) >> +{ >> +  tree version_attr; >> +  const char *orig_name, *version_string, *attr_str; >> +  char *assembler_name; >> +  tree assembler_name_tree; >> + >> +  if (TREE_CODE (decl) != FUNCTION_DECL >> +    || DECL_ASSEMBLER_NAME_SET_P (decl) >> +    || !DECL_FUNCTION_VERSIONED (decl)) >> +   return; >> + >> +  if (DECL_DECLARED_INLINE_P (decl) >> +    &&lookup_attribute ("gnu_inline", >> +             DECL_ATTRIBUTES (decl))) >> +   error_at (DECL_SOURCE_LOCATION (decl), >> +       "Function versions cannot be marked as gnu_inline," >> +       " bodies have to be generated\n"); >> + >> +  if (DECL_VIRTUAL_P (decl) >> +    || DECL_VINDEX (decl)) >> +   error_at (DECL_SOURCE_LOCATION (decl), >> +       "Virtual function versioning not supported\n"); >> + >> +  version_attr = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl)); >> +  /* targetv attribute string is NULL for default functions.  */ >> +  if (version_attr == NULL_TREE) >> +   return; >> + >> +  orig_name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)); >> +  version_string >> +   = TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (version_attr))); >> + >> +  attr_str = sorted_attr_string (version_string); >> +  assembler_name = (char *) xmalloc (strlen (orig_name) >> +                   + strlen (attr_str) + 2); >> + >> +  sprintf (assembler_name, "%s.%s", orig_name, attr_str); >> +  if (dump_file) >> +   fprintf (dump_file, "Assembler name set to %s for function version %s\n", >> +       assembler_name, IDENTIFIER_POINTER (DECL_NAME (decl))); >> +  assembler_name_tree = get_identifier (assembler_name); >> +  SET_DECL_ASSEMBLER_NAME (decl, assembler_name_tree); >> +} >> + >> +/* Returns true if decl is multi-versioned and DECL is the default function, >> +  that is it is not tagged with "targetv" attribute.  
*/ >> + >> +bool >> +is_default_function (const tree decl) >> +{ >> +  return (TREE_CODE (decl) == FUNCTION_DECL >> +     && DECL_FUNCTION_VERSIONED (decl) >> +     && (lookup_attribute ("targetv", DECL_ATTRIBUTES (decl)) >> +       == NULL_TREE)); >> +} >> + >> +/* For function decl DECL, find the version_function struct in the >> +  decl_version_htab.  */ >> + >> +static version_function * >> +find_function_version (const tree decl) >> +{ >> +  void *slot; >> + >> +  if (!DECL_FUNCTION_VERSIONED (decl)) >> +   return NULL; >> + >> +  if (!decl_version_htab) >> +   return NULL; >> + >> +  slot = htab_find_with_hash (decl_version_htab, decl, >> +                htab_hash_pointer (decl)); >> + >> +  if (slot != NULL) >> +   return (version_function *)slot; >> + >> +  return NULL; >> +} >> + >> +/* Record DECL as a function version by creating a version_function struct >> +  for it and storing it in the hashtable.  */ >> + >> +static version_function * >> +add_function_version (const tree decl) >> +{ >> +  void **slot; >> +  version_function *v; >> + >> +  if (!DECL_FUNCTION_VERSIONED (decl)) >> +   return NULL; >> + >> +  create_decl_version_htab (); >> + >> +  slot = htab_find_slot_with_hash (decl_version_htab, (const void_p)decl, >> +                  htab_hash_pointer ((const void_p)decl), >> +                  INSERT); >> + >> +  if (*slot != NULL) >> +   return (version_function *)*slot; >> + >> +  v = new_version_function (decl); >> +  *slot = v; >> + >> +  return v; >> +} >> + >> +/* Push V into VEC only if it is not already present.  */ >> + >> +static void >> +push_function_version (version_function *v, VEC (void_p, heap) *vec) >> +{ >> +  int ix; >> +  void_p ele; >> +  for (ix = 0; VEC_iterate (void_p, vec, ix, ele); ++ix) >> +   { >> +    if (ele == (void_p)v) >> +     return; >> +   } >> + >> +  VEC_safe_push (void_p, heap, vec, (void*)v); >> +} >> + >> +/* Mark DECL as deleted.  
This is called by the front-end when a duplicate >> +  decl is merged with the original decl and the duplicate decl is deleted. >> +  This function marks the duplicate_decl as invalid.  Called by >> +  duplicate_decls in cp/decl.c.  */ >> + >> +void >> +mark_delete_decl_version (const tree decl) >> +{ >> +  version_function *decl_v; >> + >> +  decl_v = find_function_version (decl); >> + >> +  if (decl_v == NULL) >> +   return; >> + >> +  decl_v->is_deleted = true; >> + >> +  if (is_default_function (decl) >> +    && decl_v->versions != NULL) >> +   { >> +    VEC_truncate (void_p, decl_v->versions, 0); >> +    VEC_free (void_p, heap, decl_v->versions); >> +   } >> +} >> + >> +/* Mark DECL1 and DECL2 to be function versions in the same group.  One >> +  of DECL1 and DECL2 must be the default, otherwise this function does >> +  nothing.  This function aggregates the versions.  */ >> + >> +int >> +group_function_versions (const tree decl1, const tree decl2) >> +{ >> +  tree default_decl, version_decl; >> +  version_function *default_v, *version_v; >> + >> +  gcc_assert (DECL_FUNCTION_VERSIONED (decl1) >> +       && DECL_FUNCTION_VERSIONED (decl2)); >> + >> +  /* The version decls are added only to the default decl.  */ >> +  if (!is_default_function (decl1) >> +    && !is_default_function (decl2)) >> +   return 0; >> + >> +  /* This can happen with duplicate declarations.  Just ignore.  */ >> +  if (is_default_function (decl1) >> +    && is_default_function (decl2)) >> +   return 0; >> + >> +  default_decl = (is_default_function (decl1)) ? decl1 : decl2; >> +  version_decl = (default_decl == decl1) ? decl2 : decl1; >> + >> +  gcc_assert (default_decl != version_decl); >> +  create_decl_version_htab (); >> + >> +  /* If the version function is found, it has been added.  
*/ >> +  if (find_function_version (version_decl)) >> +   return 0; >> + >> +  default_v = add_function_version (default_decl); >> +  version_v = add_function_version (version_decl); >> + >> +  if (default_v->versions == NULL) >> +   default_v->versions = VEC_alloc (void_p, heap, 1); >> + >> +  push_function_version (version_v, default_v->versions); >> +  return 0; >> +} >> + >> +/* Makes a function attribute of the form NAME(ARG_NAME) and chains >> +  it to CHAIN.  */ >> + >> +static tree >> +make_attribute (const char *name, const char *arg_name, tree chain) >> +{ >> +  tree attr_name; >> +  tree attr_arg_name; >> +  tree attr_args; >> +  tree attr; >> + >> +  attr_name = get_identifier (name); >> +  attr_arg_name = build_string (strlen (arg_name), arg_name); >> +  attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE); >> +  attr = tree_cons (attr_name, attr_args, chain); >> +  return attr; >> +} >> + >> +/* Return a new name by appending SUFFIX to the DECL name.  If >> +  make_unique is true, append the full path name.  */ >> + >> +static char * >> +make_name (tree decl, const char *suffix, bool make_unique) >> +{ >> +  char *global_var_name; >> +  int name_len; >> +  const char *name; >> +  const char *unique_name = NULL; >> + >> +  name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)); >> + >> +  /* Get a unique name that can be used globally without any chances >> +   of collision at link time.  */ >> +  if (make_unique) >> +   unique_name = IDENTIFIER_POINTER (get_file_function_name ("\0")); >> + >> +  name_len = strlen (name) + strlen (suffix) + 2; >> + >> +  if (make_unique) >> +   name_len += strlen (unique_name) + 1; >> +  global_var_name = (char *) xmalloc (name_len); >> + >> +  /* Use '.' to concatenate names as it is demangler friendly.  
*/ >> +  if (make_unique) >> +    snprintf (global_var_name, name_len, "%s.%s.%s", name, >> +        unique_name, suffix); >> +  else >> +    snprintf (global_var_name, name_len, "%s.%s", name, suffix); >> + >> +  return global_var_name; >> +} >> + >> +/* Make the resolver function decl for ifunc (IFUNC_DECL) to dispatch >> +  the versions of multi-versioned function DEFAULT_DECL.  Create and >> +  empty basic block in the resolver and store the pointer in >> +  EMPTY_BB.  Return the decl of the resolver function.  */ >> + >> +static tree >> +make_ifunc_resolver_func (const tree default_decl, >> +             const tree ifunc_decl, >> +             basic_block *empty_bb) >> +{ >> +  char *resolver_name; >> +  tree decl, type, decl_name, t; >> +  basic_block new_bb; >> +  tree old_current_function_decl; >> +  bool make_unique = false; >> + >> +  /* IFUNC's have to be globally visible.  So, if the default_decl is >> +   not, then the name of the IFUNC should be made unique.  */ >> +  if (TREE_PUBLIC (default_decl) == 0) >> +   make_unique = true; >> + >> +  /* Append the filename to the resolver function if the versions are >> +   not externally visible.  This is because the resolver function has >> +   to be externally visible for the loader to find it.  So, appending >> +   the filename will prevent conflicts with a resolver function from >> +   another module which is based on the same version name.  */ >> +  resolver_name = make_name (default_decl, "resolver", make_unique); >> + >> +  /* The resolver function should return a (void *). 
*/ >> +  type = build_function_type_list (ptr_type_node, NULL_TREE); >> + >> +  decl = build_fn_decl (resolver_name, type); >> +  decl_name = get_identifier (resolver_name); >> +  SET_DECL_ASSEMBLER_NAME (decl, decl_name); >> + >> +  DECL_NAME (decl) = decl_name; >> +  TREE_USED (decl) = TREE_USED (default_decl); >> +  DECL_ARTIFICIAL (decl) = 1; >> +  DECL_IGNORED_P (decl) = 0; >> +  /* IFUNC resolvers have to be externally visible.  */ >> +  TREE_PUBLIC (decl) = 1; >> +  DECL_UNINLINABLE (decl) = 1; >> + >> +  DECL_EXTERNAL (decl) = DECL_EXTERNAL (default_decl); >> +  DECL_EXTERNAL (ifunc_decl) = 0; >> + >> +  DECL_CONTEXT (decl) = NULL_TREE; >> +  DECL_INITIAL (decl) = make_node (BLOCK); >> +  DECL_STATIC_CONSTRUCTOR (decl) = 0; >> +  TREE_READONLY (decl) = 0; >> +  DECL_PURE_P (decl) = 0; >> +  DECL_COMDAT (decl) = DECL_COMDAT (default_decl); >> +  if (DECL_COMDAT_GROUP (default_decl)) >> +   { >> +    make_decl_one_only (decl, DECL_COMDAT_GROUP (default_decl)); >> +   } >> +  /* Build result decl and add to function_decl. 
*/ >> +  t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, ptr_type_node); >> +  DECL_ARTIFICIAL (t) = 1; >> +  DECL_IGNORED_P (t) = 1; >> +  DECL_RESULT (decl) = t; >> + >> +  gimplify_function_tree (decl); >> +  old_current_function_decl = current_function_decl; >> +  push_cfun (DECL_STRUCT_FUNCTION (decl)); >> +  current_function_decl = decl; >> +  init_empty_tree_cfg_for_function (DECL_STRUCT_FUNCTION (decl)); >> +  cfun->curr_properties |= >> +   (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_referenced_vars | >> +   PROP_ssa); >> +  new_bb = create_empty_bb (ENTRY_BLOCK_PTR); >> +  make_edge (ENTRY_BLOCK_PTR, new_bb, EDGE_FALLTHRU); >> +  make_edge (new_bb, EXIT_BLOCK_PTR, 0); >> +  *empty_bb = new_bb; >> + >> +  cgraph_add_new_function (decl, true); >> +  cgraph_call_function_insertion_hooks (cgraph_get_create_node (decl)); >> +  cgraph_analyze_function (cgraph_get_create_node (decl)); >> +  cgraph_mark_needed_node (cgraph_get_create_node (decl)); >> + >> +  if (DECL_COMDAT_GROUP (default_decl)) >> +   { >> +    gcc_assert (cgraph_get_node (default_decl)); >> +    cgraph_add_to_same_comdat_group (cgraph_get_node (decl), >> +                    cgraph_get_node (default_decl)); >> +   } >> + >> +  pop_cfun (); >> +  current_function_decl = old_current_function_decl; >> + >> +  gcc_assert (ifunc_decl != NULL); >> +  DECL_ATTRIBUTES (ifunc_decl) >> +   = make_attribute ("ifunc", resolver_name, DECL_ATTRIBUTES (ifunc_decl)); >> +  assemble_alias (ifunc_decl, get_identifier (resolver_name)); >> +  return decl; >> +} >> + >> +/* Make and ifunc declaration for the multi-versioned function DECL.  Calls to >> +  DECL function will be replaced with calls to the ifunc.  Return the decl >> +  of the ifunc created.  
*/ >> + >> +static tree >> +make_ifunc_func (const tree decl) >> +{ >> +  tree ifunc_decl; >> +  char *ifunc_name, *resolver_name; >> +  tree fn_type, ifunc_type; >> +  bool make_unique = false; >> + >> +  if (TREE_PUBLIC (decl) == 0) >> +   make_unique = true; >> + >> +  ifunc_name = make_name (decl, "ifunc", make_unique); >> +  resolver_name = make_name (decl, "resolver", make_unique); >> +  gcc_assert (resolver_name); >> + >> +  fn_type = TREE_TYPE (decl); >> +  ifunc_type = build_function_type (TREE_TYPE (fn_type), >> +                  TYPE_ARG_TYPES (fn_type)); >> + >> +  ifunc_decl = build_fn_decl (ifunc_name, ifunc_type); >> +  TREE_USED (ifunc_decl) = 1; >> +  DECL_CONTEXT (ifunc_decl) = NULL_TREE; >> +  DECL_INITIAL (ifunc_decl) = error_mark_node; >> +  DECL_ARTIFICIAL (ifunc_decl) = 1; >> +  /* Mark this ifunc as external, the resolver will flip it again if >> +   it gets generated.  */ >> +  DECL_EXTERNAL (ifunc_decl) = 1; >> +  /* IFUNCs have to be externally visible.  */ >> +  TREE_PUBLIC (ifunc_decl) = 1; >> + >> +  return ifunc_decl; >> +} >> + >> +/* For multi-versioned function decl, which should also be the default, >> +  return the decl of the ifunc resolver, create it if it does not >> +  exist.  */ >> + >> +tree >> +get_ifunc_for_version (const tree decl) >> +{ >> +  version_function *decl_v; >> +  int ix; >> +  void_p ele; >> + >> +  /* DECL has to be the default version, otherwise it is missing and >> +   that is not allowed.  
*/ >> +  if (!is_default_function (decl)) >> +   { >> +    error_at (DECL_SOURCE_LOCATION (decl), "Default version not found"); >> +    return decl; >> +   } >> + >> +  decl_v = find_function_version (decl); >> +  gcc_assert (decl_v != NULL); >> +  if (decl_v->ifunc_decl == NULL) >> +   { >> +    tree ifunc_decl; >> +    ifunc_decl = make_ifunc_func (decl); >> +    decl_v->ifunc_decl = ifunc_decl; >> +   } >> + >> +  if (cgraph_get_node (decl)) >> +   cgraph_mark_needed_node (cgraph_get_node (decl)); >> + >> +  for (ix = 0; VEC_iterate (void_p, decl_v->versions, ix, ele); ++ix) >> +   { >> +    version_function *v = (version_function *) ele; >> +    gcc_assert (v->decl != NULL); >> +    if (cgraph_get_node (v->decl)) >> +    cgraph_mark_needed_node (cgraph_get_node (v->decl)); >> +   } >> + >> +  return decl_v->ifunc_decl; >> +} >> + >> +/* Generate the dispatching code to dispatch multi-versioned function >> +  DECL.  Make a new function decl for dispatching and call the target >> +  hook to process the "targetv" attributes and provide the code to >> +  dispatch the right function at run-time.  
*/ >> + >> +static tree >> +make_ifunc_resolver_for_version (const tree decl) >> +{ >> +  version_function *decl_v; >> +  tree ifunc_resolver_decl, ifunc_decl; >> +  basic_block empty_bb; >> +  int ix; >> +  void_p ele; >> +  VEC (tree, heap) *fn_ver_vec = NULL; >> + >> +  gcc_assert (is_default_function (decl)); >> + >> +  decl_v = find_function_version (decl); >> +  gcc_assert (decl_v != NULL); >> + >> +  if (decl_v->ifunc_resolver_decl != NULL) >> +   return decl_v->ifunc_resolver_decl; >> + >> +  ifunc_decl = decl_v->ifunc_decl; >> + >> +  if (ifunc_decl == NULL) >> +   ifunc_decl = decl_v->ifunc_decl = make_ifunc_func (decl); >> + >> +  ifunc_resolver_decl = make_ifunc_resolver_func (decl, ifunc_decl, >> +                         &empty_bb); >> + >> +  fn_ver_vec = VEC_alloc (tree, heap, 2); >> +  VEC_safe_push (tree, heap, fn_ver_vec, decl); >> + >> +  for (ix = 0; VEC_iterate (void_p, decl_v->versions, ix, ele); ++ix) >> +   { >> +    version_function *v = (version_function *) ele; >> +    gcc_assert (v->decl != NULL); >> +    /* Check for virtual functions here again, as by this time it should >> +     have been determined if this function needs a vtable index or >> +     not.  This happens for methods in derived classes that override >> +     virtual methods in base classes but are not explicitly marked as >> +     virtual.  */ >> +    if (DECL_VINDEX (v->decl)) >> +     error_at (DECL_SOURCE_LOCATION (v->decl), >> +         "Virtual function versioning not supported\n"); >> +    if (!v->is_deleted) >> +    VEC_safe_push (tree, heap, fn_ver_vec, v->decl); >> +   } >> + >> +  gcc_assert (targetm.dispatch_version); >> +  targetm.dispatch_version (ifunc_resolver_decl, fn_ver_vec, &empty_bb); >> +  decl_v->ifunc_resolver_decl = ifunc_resolver_decl; >> + >> +  return ifunc_resolver_decl; >> +} >> + >> +/* Main entry point to pass_dispatch_versions. For multi-versioned functions, >> +  generate the dispatching code.  
*/ >> + >> +static unsigned int >> +do_dispatch_versions (void) >> +{ >> +  /* A new pass for generating dispatch code for multi-versioned functions. >> +   Other forms of dispatch can be added when ifunc support is not available >> +   like just calling the function directly after checking for target type. >> +   Currently, dispatching is done through IFUNC.  This pass will become >> +   more meaningful when other dispatch mechanisms are added.  */ >> + >> +  /* Cloning a function to produce more versions will happen here when the >> +   user requests that via the targetv attribute. For example, >> +   int foo () __attribute__ ((targetv(("arch=core2"), ("arch=corei7")))); >> +   means that the user wants the same body of foo to be versioned for core2 >> +   and corei7.  In that case, this function will be cloned during this >> +   pass.  */ >> + >> +  if (DECL_FUNCTION_VERSIONED (current_function_decl) >> +    && is_default_function (current_function_decl)) >> +   { >> +    tree decl = make_ifunc_resolver_for_version (current_function_decl); >> +    if (dump_file && decl) >> +    dump_function_to_file (decl, dump_file, TDF_BLOCKS); >> +   } >> +  return 0; >> +} >> + >> +static  bool >> +gate_dispatch_versions (void) >> +{ >> +  return true; >> +} >> + >> +/* A pass to generate the dispatch code to execute the appropriate version >> +  of a multi-versioned function at run-time.  
*/ >> + >> +struct gimple_opt_pass pass_dispatch_versions = >> +{ >> + { >> +  GIMPLE_PASS, >> +  "dispatch_multiversion_functions",   /* name */ >> +  gate_dispatch_versions,        /* gate */ >> +  do_dispatch_versions,             /* execute */ >> +  NULL,                     /* sub */ >> +  NULL,                     /* next */ >> +  0,                  /* static_pass_number */ >> +  TV_MULTIVERSION_DISPATCH,       /* tv_id */ >> +  PROP_cfg,               /* properties_required */ >> +  PROP_cfg,               /* properties_provided */ >> +  0,                  /* properties_destroyed */ >> +  0,                  /* todo_flags_start */ >> +  TODO_dump_func |           /* todo_flags_finish */ >> +  TODO_cleanup_cfg | TODO_dump_cgraph >> + } >> +}; >> Index: cgraphunit.c >> =================================================================== >> --- cgraphunit.c     (revision 184971) >> +++ cgraphunit.c     (working copy) >> @@ -141,6 +141,7 @@ along with GCC; see the file COPYING3.  If not see >>  #include "ipa-inline.h" >>  #include "ipa-utils.h" >>  #include "lto-streamer.h" >> +#include "multiversion.h" >> >>  static void cgraph_expand_all_functions (void); >>  static void cgraph_mark_functions_to_output (void); >> @@ -343,6 +344,13 @@ cgraph_finalize_function (tree decl, bool nested) >>    node->local.redefined_extern_inline = true; >>   } >> >> +  /* If this is a function version and not the default, change the >> +   assembler name of this function.  The DECL names of function >> +   versions are the same, only the assembler names are made unique. >> +   The assembler name is changed by appending the string from >> +   the "targetv" attribute.  
*/ >> +  version_assembler_name (decl); >> + >>  notice_global_symbol (decl); >>  node->local.finalized = true; >>  node->lowered = DECL_STRUCT_FUNCTION (decl)->cfg != NULL; >> Index: multiversion.h >> =================================================================== >> --- multiversion.h    (revision 0) >> +++ multiversion.h    (revision 0) >> @@ -0,0 +1,52 @@ >> +/* Function Multiversioning. >> +  Copyright (C) 2012 Free Software Foundation, Inc. >> +  Contributed by Sriraman Tallam (tmsriram@google.com) >> + >> +This file is part of GCC. >> + >> +GCC is free software; you can redistribute it and/or modify it under >> +the terms of the GNU General Public License as published by the Free >> +Software Foundation; either version 3, or (at your option) any later >> +version. >> + >> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY >> +WARRANTY; without even the implied warranty of MERCHANTABILITY or >> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License >> +for more details. >> + >> +You should have received a copy of the GNU General Public License >> +along with GCC; see the file COPYING3.  If not see >> +<http://www.gnu.org/licenses/>. */ >> + >> +/* This is the header file which provides the functions to keep track >> +  of functions that are multi-versioned and to generate the dispatch >> +  code to call the right version at run-time.  */ >> + >> +#ifndef GCC_MULTIVERSION_H >> +#define GCC_MULTIVERSION_H >> + >> +#include "tree.h" >> + >> +/* Mark DECL1 and DECL2 as function versions.  */ >> +int group_function_versions (const tree decl1, const tree decl2); >> + >> +/* Mark DECL as deleted and no longer a version.  */ >> +void mark_delete_decl_version (const tree decl); >> + >> +/* Returns true if DECL is the default version to be executed if all >> +  other versions are inappropriate at run-time.  
*/ >> +bool is_default_function (const tree decl); >> + >> +/* Gets the IFUNC dispatcher for this multi-versioned function DECL. DECL >> +  must be the default function in the multi-versioned group.  */ >> +tree get_ifunc_for_version (const tree decl); >> + >> +/* Returns true when only one of DECL1 and DECL2 is marked with "targetv" >> +  or if the "targetv" attribute strings of DECL1 and DECL2 don't match.  */ >> +bool has_different_version_attributes (const tree decl1, const tree decl2); >> + >> +/* If DECL is a function version and not the default version, the assembler >> +  name of DECL is changed to include the attribute string to keep the >> +  name unambiguous.  */ >> +void version_assembler_name (const tree decl); >> +#endif >> Index: cp/class.c >> =================================================================== >> --- cp/class.c  (revision 184971) >> +++ cp/class.c  (working copy) >> @@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.  If not see >>  #include "tree-dump.h" >>  #include "splay-tree.h" >>  #include "pointer-set.h" >> +#include "multiversion.h" >> >>  /* The number of nested classes being processed.  If we are not in the >>   scope of any class, this is zero.  */ >> @@ -1092,7 +1093,20 @@ add_method (tree type, tree method, tree using_dec >>        || same_type_p (TREE_TYPE (fn_type), >>                TREE_TYPE (method_type)))) >>     { >> -     if (using_decl) >> +     /* For function versions, their parms and types match >> +       but they are not duplicates.  Record function versions >> +       as and when they are found.  
*/ >> +     if (TREE_CODE (fn) == FUNCTION_DECL >> +       && TREE_CODE (method) == FUNCTION_DECL >> +       && (DECL_FUNCTION_VERSIONED (fn) >> +         || DECL_FUNCTION_VERSIONED (method))) >> +      { >> +       DECL_FUNCTION_VERSIONED (fn) = 1; >> +       DECL_FUNCTION_VERSIONED (method) = 1; >> +       group_function_versions (fn, method); >> +       continue; >> +      } >> +     else if (using_decl) >>       { >>        if (DECL_CONTEXT (fn) == type) >>         /* Defer to the local function.  */ >> @@ -1150,6 +1164,13 @@ add_method (tree type, tree method, tree using_dec >>  else >>   /* Replace the current slot.  */ >>   VEC_replace (tree, method_vec, slot, overload); >> + >> +  /* Change the assembler name of method here if it has "targetv" >> +   attributes.  Since all versions have the same mangled name, >> +   their assembler name is changed by appending the string from >> +   the "targetv" attribute. */ >> +  version_assembler_name (method); >> + >>  return true; >>  } >> >> @@ -6890,8 +6911,11 @@ resolve_address_of_overloaded_function (tree targe >>      if (DECL_ANTICIPATED (fn)) >>       continue; >> >> -     /* See if there's a match.  */ >> -     if (same_type_p (target_fn_type, static_fn_type (fn))) >> +     /* See if there's a match.  For functions that are multi-versioned >> +       match it to the default function.  */ >> +     if (same_type_p (target_fn_type, static_fn_type (fn)) >> +       && (!DECL_FUNCTION_VERSIONED (fn) >> +         || is_default_function (fn))) >>       matches = tree_cons (fn, NULL_TREE, matches); >>     } >>   } >> @@ -7053,6 +7077,21 @@ resolve_address_of_overloaded_function (tree targe >>    perform_or_defer_access_check (access_path, fn, fn); >>   } >> >> +  /* If a pointer to a function that is multi-versioned is requested, the >> +   pointer to the dispatcher function is returned instead.  This works >> +   well because indirectly calling the function will dispatch the right >> +   function version at run-time. 
Also, the function address is kept >> +   unique.  */ >> +  if (DECL_FUNCTION_VERSIONED (fn) >> +    && is_default_function (fn)) >> +   { >> +    tree ifunc_decl; >> +    ifunc_decl = get_ifunc_for_version (fn); >> +    gcc_assert (ifunc_decl != NULL); >> +    mark_used (fn); >> +    return build_fold_addr_expr (ifunc_decl); >> +   } >> + >>  if (TYPE_PTRFN_P (target_type) || TYPE_PTRMEMFUNC_P (target_type)) >>   return cp_build_addr_expr (fn, flags); >>  else >> Index: cp/decl.c >> =================================================================== >> --- cp/decl.c  (revision 184971) >> +++ cp/decl.c  (working copy) >> @@ -54,6 +54,7 @@ along with GCC; see the file COPYING3.  If not see >>  #include "pointer-set.h" >>  #include "splay-tree.h" >>  #include "plugin.h" >> +#include "multiversion.h" >> >>  /* Possible cases of bad specifiers type used by bad_specifiers. */ >>  enum bad_spec_place { >> @@ -972,6 +973,23 @@ decls_match (tree newdecl, tree olddecl) >>    if (t1 != t2) >>     return 0; >> >> +    /* The decls don't match if they correspond to two different versions >> +     of the same function.  */ >> +    if (compparms (p1, p2) >> +     && same_type_p (TREE_TYPE (f1), TREE_TYPE (f2)) >> +     && (DECL_FUNCTION_VERSIONED (newdecl) >> +       || DECL_FUNCTION_VERSIONED (olddecl)) >> +     && has_different_version_attributes (newdecl, olddecl)) >> +    { >> +     /* One of the decls could be the default without the "targetv" >> +       attribute. Set it to be a versioned function here.  */ >> +     DECL_FUNCTION_VERSIONED (newdecl) = 1; >> +     DECL_FUNCTION_VERSIONED (olddecl) = 1; >> +     /* Accumulate all the versions of a function.  */ >> +     group_function_versions (olddecl, newdecl); >> +     return 0; >> +    } >> + >>    if (CP_DECL_CONTEXT (newdecl) != CP_DECL_CONTEXT (olddecl) >>      && ! 
(DECL_EXTERN_C_P (newdecl) >>         && DECL_EXTERN_C_P (olddecl))) >> @@ -1482,7 +1500,11 @@ duplicate_decls (tree newdecl, tree olddecl, bool >>        error ("previous declaration %q+#D here", olddecl); >>        return NULL_TREE; >>       } >> -     else if (compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)), >> +     /* For function versions, params and types match, but they >> +       are not ambiguous.  */ >> +     else if ((!DECL_FUNCTION_VERSIONED (newdecl) >> +          && !DECL_FUNCTION_VERSIONED (olddecl)) >> +          && compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)), >>                TYPE_ARG_TYPES (TREE_TYPE (olddecl)))) >>       { >>        error ("new declaration %q#D", newdecl); >> @@ -2250,6 +2272,16 @@ duplicate_decls (tree newdecl, tree olddecl, bool >>  else if (DECL_PRESERVE_P (newdecl)) >>   DECL_PRESERVE_P (olddecl) = 1; >> >> +  /* If the olddecl is a version, so is the newdecl.  */ >> +  if (TREE_CODE (newdecl) == FUNCTION_DECL >> +    && DECL_FUNCTION_VERSIONED (olddecl)) >> +   { >> +    DECL_FUNCTION_VERSIONED (newdecl) = 1; >> +    /* Record that newdecl is not a valid version and has >> +     been deleted.  */ >> +    mark_delete_decl_version (newdecl); >> +   } >> + >>  if (TREE_CODE (newdecl) == FUNCTION_DECL) >>   { >>    int function_size; >> @@ -4512,6 +4544,10 @@ start_decl (const cp_declarator *declarator, >>  /* Enter this declaration into the symbol table.  */ >>  decl = maybe_push_decl (decl); >> >> +  /* If this decl is a function version and not the default, its assembler >> +   name has to be changed.  
*/ >> +  version_assembler_name (decl); >> + >>  if (processing_template_decl) >>   decl = push_template_decl (decl); >>  if (decl == error_mark_node) >> @@ -13019,6 +13055,10 @@ start_function (cp_decl_specifier_seq *declspecs, >>   gcc_assert (same_type_p (TREE_TYPE (TREE_TYPE (decl1)), >>               integer_type_node)); >> >> +  /* If this decl is a function version and not the default, its assembler >> +   name has to be changed.  */ >> +  version_assembler_name (decl1); >> + >>  start_preparsed_function (decl1, attrs, /*flags=*/SF_DEFAULT); >> >>  return 1; >> @@ -13960,6 +14000,11 @@ cxx_comdat_group (tree decl) >>       break; >>     } >>    name = DECL_ASSEMBLER_NAME (decl); >> +    if (TREE_CODE (decl) == FUNCTION_DECL >> +     && DECL_FUNCTION_VERSIONED (decl)) >> +    name = DECL_NAME (decl); >> +    else >> +     name = DECL_ASSEMBLER_NAME (decl); >>   } >> >>  return name; >> Index: cp/semantics.c >> =================================================================== >> --- cp/semantics.c    (revision 184971) >> +++ cp/semantics.c    (working copy) >> @@ -3783,8 +3783,11 @@ expand_or_defer_fn_1 (tree fn) >>    /* If the user wants us to keep all inline functions, then mark >>     this function as needed so that finish_file will make sure to >>     output it later.  Similarly, all dllexport'd functions must >> -     be emitted; there may be callers in other DLLs.  */ >> -    if ((flag_keep_inline_functions >> +     be emitted; there may be callers in other DLLs. >> +     Also, mark this function as needed if it is marked inline but >> +     is a multi-versioned function.  
*/ >> +    if (((flag_keep_inline_functions >> +      || DECL_FUNCTION_VERSIONED (fn)) >>      && DECL_DECLARED_INLINE_P (fn) >>      && !DECL_REALLY_EXTERN (fn)) >>      || (flag_keep_inline_dllexport >> Index: cp/decl2.c >> =================================================================== >> --- cp/decl2.c  (revision 184971) >> +++ cp/decl2.c  (working copy) >> @@ -53,6 +53,7 @@ along with GCC; see the file COPYING3.  If not see >>  #include "splay-tree.h" >>  #include "langhooks.h" >>  #include "c-family/c-ada-spec.h" >> +#include "multiversion.h" >> >>  extern cpp_reader *parse_in; >> >> @@ -674,9 +675,13 @@ check_classfn (tree ctype, tree function, tree tem >>      if (is_template != (TREE_CODE (fndecl) == TEMPLATE_DECL)) >>       continue; >> >> +     /* While finding a match, same types and params are not enough >> +       if the function is versioned.  Also check version ("targetv") >> +       attributes.  */ >>      if (same_type_p (TREE_TYPE (TREE_TYPE (function)), >>              TREE_TYPE (TREE_TYPE (fndecl))) >>        && compparms (p1, p2) >> +       && !has_different_version_attributes (function, fndecl) >>        && (!is_template >>          || comp_template_parms (template_parms, >>                      DECL_TEMPLATE_PARMS (fndecl))) >> Index: cp/call.c >> =================================================================== >> --- cp/call.c  (revision 184971) >> +++ cp/call.c  (working copy) >> @@ -41,6 +41,7 @@ along with GCC; see the file COPYING3.  If not see >>  #include "langhooks.h" >>  #include "c-family/c-objc.h" >>  #include "timevar.h" >> +#include "multiversion.h" >> >>  /* The various kinds of conversion.  */ >> >> @@ -6730,6 +6731,17 @@ build_over_call (struct z_candidate *cand, int fla >>  if (!already_used) >>   mark_used (fn); >> >> +  /* For a call to a multi-versioned function, the call should actually be to >> +   the dispatcher.  
*/ >> +  if (DECL_FUNCTION_VERSIONED (fn)) >> +   { >> +    tree ifunc_decl; >> +    ifunc_decl = get_ifunc_for_version (fn); >> +    gcc_assert (ifunc_decl != NULL); >> +    return build_call_expr_loc_array (UNKNOWN_LOCATION, ifunc_decl, >> +                    nargs, argarray); >> +   } >> + >>  if (DECL_VINDEX (fn) && (flags & LOOKUP_NONVIRTUAL) == 0) >>   { >>    tree t; >> @@ -7980,6 +7992,30 @@ joust (struct z_candidate *cand1, struct z_candida >>  size_t i; >>  size_t len; >> >> +  /* For Candidates of a multi-versioned function, the one marked default >> +   wins.  This is because the default decl is used as key to aggregate >> +   all the other versions provided for it in multiversion.c.  When >> +   generating the actual call, the appropriate dispatcher is created >> +   to call the right function version at run-time.  */ >> + >> +  if ((TREE_CODE (cand1->fn) == FUNCTION_DECL >> +    && DECL_FUNCTION_VERSIONED (cand1->fn)) >> +    ||(TREE_CODE (cand2->fn) == FUNCTION_DECL >> +     && DECL_FUNCTION_VERSIONED (cand2->fn))) >> +   { >> +    if (is_default_function (cand1->fn)) >> +    { >> +      mark_used (cand2->fn); >> +     return 1; >> +    } >> +    if (is_default_function (cand2->fn)) >> +    { >> +      mark_used (cand1->fn); >> +     return -1; >> +    } >> +    return 0; >> +   } >> + >>  /* Candidates that involve bad conversions are always worse than those >>    that don't.  
*/ >>  if (cand1->viable > cand2->viable) >> Index: timevar.def >> =================================================================== >> --- timevar.def (revision 184971) >> +++ timevar.def (working copy) >> @@ -253,6 +253,7 @@ DEFTIMEVAR (TV_TREE_IFCOMBINE     , "tree if-co >>  DEFTIMEVAR (TV_TREE_UNINIT      , "uninit var analysis") >>  DEFTIMEVAR (TV_PLUGIN_INIT      , "plugin initialization") >>  DEFTIMEVAR (TV_PLUGIN_RUN       , "plugin execution") >> +DEFTIMEVAR (TV_MULTIVERSION_DISPATCH , "multiversion dispatch") >> >>  /* Everything else in rest_of_compilation not included above.  */ >>  DEFTIMEVAR (TV_EARLY_LOCAL      , "early local passes") >> Index: varasm.c >> =================================================================== >> --- varasm.c   (revision 184971) >> +++ varasm.c   (working copy) >> @@ -5755,6 +5755,8 @@ finish_aliases_1 (void) >>     } >>    else if (! (p->emitted_diags & ALIAS_DIAG_TO_EXTERN) >>        && DECL_EXTERNAL (target_decl) >> +        && (TREE_CODE (target_decl) != FUNCTION_DECL >> +          || !DECL_STRUCT_FUNCTION (target_decl)) >>        /* We use local aliases for C++ thunks to force the tailcall >>          to bind locally.  This is a hack - to keep it working do >>          the following (which is not strictly correct).  
*/ >> Index: Makefile.in >> =================================================================== >> --- Makefile.in (revision 184971) >> +++ Makefile.in (working copy) >> @@ -1298,6 +1298,7 @@ OBJS = \ >>     mcf.o \ >>     mode-switching.o \ >>     modulo-sched.o \ >> +    multiversion.o \ >>     omega.o \ >>     omp-low.o \ >>     optabs.o \ >> @@ -3030,6 +3031,11 @@ ree.o : ree.c $(CONFIG_H) $(SYSTEM_H) coretypes.h >>   $(DF_H) $(TIMEVAR_H) tree-pass.h $(RECOG_H) $(EXPR_H) \ >>   $(REGS_H) $(TREE_H) $(TM_P_H) insn-config.h $(INSN_ATTR_H) $(DIAGNOSTIC_CORE_H) \ >>   $(TARGET_H) $(OPTABS_H) insn-codes.h rtlhooks-def.h $(PARAMS_H) $(CGRAPH_H) >> +multiversion.o : multiversion.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \ >> +  $(TREE_H) langhooks.h $(TREE_INLINE_H) $(FLAGS_H) $(CGRAPH_H) intl.h \ >> +  $(DIAGNOSTIC_H) $(FIBHEAP_H) $(PARAMS_H) $(TIMEVAR_H) tree-pass.h \ >> +  $(HASHTAB_H) $(COVERAGE_H) $(GGC_H) $(TREE_FLOW_H) $(RTL_H) $(IPA_PROP_H) \ >> +  $(BASIC_BLOCK_H) $(TOPLEV_H) $(TREE_DUMP_H) ipa-inline.h >>  cprop.o : cprop.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \ >>   $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h $(GGC_H) \ >>   $(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) output.h toplev.h $(DIAGNOSTIC_CORE_H) \ >> Index: passes.c >> =================================================================== >> --- passes.c   (revision 184971) >> +++ passes.c   (working copy) >> @@ -1190,6 +1190,7 @@ init_optimization_passes (void) >>  NEXT_PASS (pass_build_cfg); >>  NEXT_PASS (pass_warn_function_return); >>  NEXT_PASS (pass_build_cgraph_edges); >> +  NEXT_PASS (pass_dispatch_versions); >>  *p = NULL; >> >>  /* Interprocedural optimization passes.  
*/ >> Index: config/i386/i386.c >> =================================================================== >> --- config/i386/i386.c  (revision 184971) >> +++ config/i386/i386.c  (working copy) >> @@ -27446,6 +27473,593 @@ ix86_init_mmx_sse_builtins (void) >>   } >>  } >> >> +/* This adds a condition to the basic_block NEW_BB in function FUNCTION_DECL >> +  to return a pointer to VERSION_DECL if the outcome of the function >> +  PREDICATE_DECL is true.  This function will be called during version >> +  dispatch to decide which function version to execute.  It returns the >> +  basic block at the end to which more conditions can be added.  */ >> + >> +static basic_block >> +add_condition_to_bb (tree function_decl, tree version_decl, >> +           basic_block new_bb, tree predicate_decl) >> +{ >> +  gimple return_stmt; >> +  tree convert_expr, result_var; >> +  gimple convert_stmt; >> +  gimple call_cond_stmt; >> +  gimple if_else_stmt; >> + >> +  basic_block bb1, bb2, bb3; >> +  edge e12, e23; >> + >> +  tree cond_var; >> +  gimple_seq gseq; >> + >> +  tree old_current_function_decl; >> + >> +  old_current_function_decl = current_function_decl; >> +  push_cfun (DECL_STRUCT_FUNCTION (function_decl)); >> +  current_function_decl = function_decl; >> + >> +  gcc_assert (new_bb != NULL); >> +  gseq = bb_seq (new_bb); >> + >> + >> +  convert_expr = build1 (CONVERT_EXPR, ptr_type_node, >> +             build_fold_addr_expr (version_decl)); >> +  result_var = create_tmp_var (ptr_type_node, NULL); >> +  convert_stmt = gimple_build_assign (result_var, convert_expr); >> +  return_stmt = gimple_build_return (result_var); >> + >> +  if (predicate_decl == NULL_TREE) >> +   { >> +    gimple_seq_add_stmt (&gseq, convert_stmt); >> +    gimple_seq_add_stmt (&gseq, return_stmt); >> +    set_bb_seq (new_bb, gseq); >> +    gimple_set_bb (convert_stmt, new_bb); >> +    gimple_set_bb (return_stmt, new_bb); >> +    pop_cfun (); >> +    current_function_decl = old_current_function_decl; >> +   
 return new_bb; >> +   } >> + >> +  cond_var = create_tmp_var (integer_type_node, NULL); >> +  call_cond_stmt = gimple_build_call (predicate_decl, 0); >> +  gimple_call_set_lhs (call_cond_stmt, cond_var); >> + >> +  gimple_set_block (call_cond_stmt, DECL_INITIAL (function_decl)); >> +  gimple_set_bb (call_cond_stmt, new_bb); >> +  gimple_seq_add_stmt (&gseq, call_cond_stmt); >> + >> +  if_else_stmt = gimple_build_cond (GT_EXPR, cond_var, >> +                  integer_zero_node, >> +                  NULL_TREE, NULL_TREE); >> +  gimple_set_block (if_else_stmt, DECL_INITIAL (function_decl)); >> +  gimple_set_bb (if_else_stmt, new_bb); >> +  gimple_seq_add_stmt (&gseq, if_else_stmt); >> + >> +  gimple_seq_add_stmt (&gseq, convert_stmt); >> +  gimple_seq_add_stmt (&gseq, return_stmt); >> +  set_bb_seq (new_bb, gseq); >> + >> +  bb1 = new_bb; >> +  e12 = split_block (bb1, if_else_stmt); >> +  bb2 = e12->dest; >> +  e12->flags &= ~EDGE_FALLTHRU; >> +  e12->flags |= EDGE_TRUE_VALUE; >> + >> +  e23 = split_block (bb2, return_stmt); >> + >> +  gimple_set_bb (convert_stmt, bb2); >> +  gimple_set_bb (return_stmt, bb2); >> + >> +  bb3 = e23->dest; >> +  make_edge (bb1, bb3, EDGE_FALSE_VALUE); >> + >> +  remove_edge (e23); >> +  make_edge (bb2, EXIT_BLOCK_PTR, 0); >> + >> +  rebuild_cgraph_edges (); >> + >> +  pop_cfun (); >> +  current_function_decl = old_current_function_decl; >> + >> +  return bb3; >> +} >> + >> +/* This parses the attribute arguments to targetv in DECL and determines >> +  the right builtin to use to match the platform specification. >> +  For now, only one target argument ("arch=") is allowed.  
*/ >> + >> +static enum ix86_builtins >> +get_builtin_code_for_version (tree decl) >> +{ >> +  tree attrs; >> +  struct cl_target_option cur_target; >> +  tree target_node; >> +  struct cl_target_option *new_target; >> +  enum ix86_builtins builtin_code = IX86_BUILTIN_MAX; >> + >> +  attrs = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl)); >> +  gcc_assert (attrs != NULL); >> + >> +  cl_target_option_save (&cur_target, &global_options); >> + >> +  target_node = ix86_valid_target_attribute_tree >> +         (TREE_VALUE (TREE_VALUE (attrs))); >> + >> +  gcc_assert (target_node); >> +  new_target = TREE_TARGET_OPTION (target_node); >> +  gcc_assert (new_target); >> + >> +  if (new_target->arch_specified && new_target->arch > 0) >> +   { >> +    switch (new_target->arch) >> +     { >> +    case 1: >> +    case 2: >> +    case 3: >> +    case 4: >> +    case 5: >> +    case 6: >> +    case 7: >> +    case 8: >> +    case 9: >> +    case 10: >> +    case 11: >> +     builtin_code = IX86_BUILTIN_CPU_IS_INTEL; >> +     break; >> +    case 12: >> +     builtin_code = IX86_BUILTIN_CPU_IS_INTEL_CORE2; >> +     break; >> +    case 13: >> +     builtin_code = IX86_BUILTIN_CPU_IS_INTEL_COREI7; >> +     break; >> +    case 14: >> +     builtin_code = IX86_BUILTIN_CPU_IS_INTEL_ATOM; >> +     break; >> +    case 15: >> +    case 16: >> +    case 17: >> +    case 18: >> +    case 19: >> +    case 20: >> +    case 21: >> +     builtin_code = IX86_BUILTIN_CPU_IS_AMD; >> +     break; >> +    case 22: >> +     builtin_code = IX86_BUILTIN_CPU_IS_AMDFAM10H; >> +     break; >> +    case 23: >> +     builtin_code = IX86_BUILTIN_CPU_IS_AMDFAM15H_BDVER1; >> +     break; >> +    case 24: >> +     builtin_code = IX86_BUILTIN_CPU_IS_AMDFAM15H_BDVER2; >> +     break; >> +    case 25: /* What is btver1 ? 
*/ >> +     builtin_code = IX86_BUILTIN_CPU_IS_AMD; >> +     break; >> +    } >> +   } >> + >> +  cl_target_option_restore (&global_options, &cur_target); >> +  if (builtin_code == IX86_BUILTIN_MAX) >> +    error_at (DECL_SOURCE_LOCATION (decl), >> +        "No dispatcher found for the versioning attributes"); >> + >> +  return builtin_code; >> +} >> + >> +/* This is the target hook to generate the dispatch function for >> +  multi-versioned functions.  DISPATCH_DECL is the function which will >> +  contain the dispatch logic.  FNDECLS_P points to the function choices >> +  for dispatch, passed as a VEC of trees.  EMPTY_BB is the basic block pointer >> +  in DISPATCH_DECL in which the dispatch code is generated.  */ >> + >> +static int >> +ix86_dispatch_version (tree dispatch_decl, >> +            void *fndecls_p, >> +            basic_block *empty_bb) >> +{ >> +  tree default_decl; >> +  gimple ifunc_cpu_init_stmt; >> +  gimple_seq gseq; >> +  tree old_current_function_decl; >> +  int ix; >> +  tree ele; >> +  VEC (tree, heap) *fndecls; >> + >> +  gcc_assert (dispatch_decl != NULL >> +       && fndecls_p != NULL >> +       && empty_bb != NULL); >> + >> +  /* FNDECLS_P is actually a vector.  */ >> +  fndecls = (VEC (tree, heap) *)fndecls_p; >> + >> +  /* At least one more version other than the default.  */ >> +  gcc_assert (VEC_length (tree, fndecls) >= 2); >> + >> +  /* The first version in the vector is the default decl.  
*/ >> +  default_decl = VEC_index (tree, fndecls, 0); >> + >> +  old_current_function_decl = current_function_decl; >> +  push_cfun (DECL_STRUCT_FUNCTION (dispatch_decl)); >> +  current_function_decl = dispatch_decl; >> + >> +  gseq = bb_seq (*empty_bb); >> +  ifunc_cpu_init_stmt = gimple_build_call_vec ( >> +           ix86_builtins [(int) IX86_BUILTIN_CPU_INIT], NULL); >> +  gimple_seq_add_stmt (&gseq, ifunc_cpu_init_stmt); >> +  gimple_set_bb (ifunc_cpu_init_stmt, *empty_bb); >> +  set_bb_seq (*empty_bb, gseq); >> + >> +  pop_cfun (); >> +  current_function_decl = old_current_function_decl; >> + >> + >> +  for (ix = 1; VEC_iterate (tree, fndecls, ix, ele); ++ix) >> +   { >> +    tree version_decl = ele; >> +    /* Get attribute string, parse it and find the right predicate decl. >> +     The predicate function could be a lengthy combination of many >> +     features, like arch-type and various isa-variants.  For now, only >> +     check the arch-type.  */ >> +    tree predicate_decl = ix86_builtins [ >> +            get_builtin_code_for_version (version_decl)]; >> +    *empty_bb = add_condition_to_bb (dispatch_decl, version_decl, *empty_bb, >> +                    predicate_decl); >> + >> +   } >> +  /* dispatch default version at the end.  
*/ >> +  *empty_bb = add_condition_to_bb (dispatch_decl, default_decl, *empty_bb, >> +                  NULL); >> +  return 0; >> +} >> >> @@ -38610,6 +39269,12 @@ ix86_autovectorize_vector_sizes (void) >>  #undef TARGET_BUILD_BUILTIN_VA_LIST >>  #define TARGET_BUILD_BUILTIN_VA_LIST ix86_build_builtin_va_list >> >> +#undef TARGET_DISPATCH_VERSION >> +#define TARGET_DISPATCH_VERSION ix86_dispatch_version >> + >>  #undef TARGET_ENUM_VA_LIST_P >>  #define TARGET_ENUM_VA_LIST_P ix86_enum_va_list >> >> Index: testsuite/g++.dg/mv1.C >> =================================================================== >> --- testsuite/g++.dg/mv1.C    (revision 0) >> +++ testsuite/g++.dg/mv1.C    (revision 0) >> @@ -0,0 +1,23 @@ >> +/* Simple test case to check if Multiversioning works.  */ >> +/* { dg-do run } */ >> +/* { dg-options "-O2" } */ >> + >> +int foo (); >> +int foo () __attribute__ ((targetv("arch=corei7"))); >> + >> +int main () >> +{ >> +  int (*p)() = &foo; >> +  return foo () + (*p)(); >> +} >> + >> +int foo () >> +{ >> +  return 0; >> +} >> + >> +int __attribute__ ((targetv("arch=corei7"))) >> +foo () >> +{ >> +  return 0; >> +} >> >> >> -- >> This patch is available for review at http://codereview.appspot.com/5752064
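For readers following the i386 part of the patch: the chain that ix86_dispatch_version and add_condition_to_bb build amounts to "call each version's predicate in order, return the address of the first version whose predicate returns a value greater than zero, otherwise return the default". Below is a minimal plain-C sketch of that selection logic only. It is not the GIMPLE the patch emits and it does not use IFUNC. All names in it (version_entry, resolve_version, the is_* and foo_* stand-ins) are hypothetical, the predicates stand in for the CPU-detection builtins, and the initial CPU-init call is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; none of these come from the patch.  */
typedef int (*version_fn) (void);
typedef int (*predicate_fn) (void);

struct version_entry
{
  predicate_fn matches;  /* Stand-in for a CPU-detection builtin.  */
  version_fn fn;         /* Specialized version returned on a match.  */
};

/* Return the first version whose predicate reports a match (> 0),
   else the default.  This mirrors the "if (pred () > 0) return
   &version;" chain, with the default returned unconditionally at
   the end.  */
static version_fn
resolve_version (const struct version_entry *versions, size_t n,
                 version_fn default_fn)
{
  size_t i;

  assert (default_fn != NULL);
  for (i = 0; i < n; ++i)
    if (versions[i].matches () > 0)
      return versions[i].fn;
  return default_fn;
}

/* Stand-in predicates and versions, for illustration only.  */
static int is_corei7 (void) { return 0; }
static int is_core2 (void) { return 1; }
static int foo_default (void) { return 0; }
static int foo_corei7 (void) { return 7; }
static int foo_core2 (void) { return 2; }

static const struct version_entry foo_versions[] = {
  { is_corei7, foo_corei7 },
  { is_core2, foo_core2 },
};
```

With these stand-ins, resolve_version (foo_versions, 2, foo_default) selects foo_core2, and an empty version list falls through to foo_default. Order matters in this sketch: the first matching entry wins, which is how the patch description says the dispatcher behaves.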
> You don't give an overview of the frontend implementation.  Thus I have
> extracted the following
>
>  - the FE does not really know about the "overloading", nor can it directly
>  resolve calls from a "sse" function to another "sse" function without going
>  through the 2nd IFUNC
>
>  - cgraph also does not know about the "overloading", so it cannot do such
>  "devirtualization" either
>
> you seem to have implemented something inbetween a pure frontend
> solution and a proper middle-end solution.  For optimization and eventually
> automatically selecting functions for cloning (like, callees of a manual "sse"
> versioned function should be cloned?) it would be nice if the cgraph would
> know about the different versions and their relationships (and the dispatcher).
> Especially the cgraph code should know the functions are semantically
> equivalent (I suppose we should require that).  The IFUNC should be
> generated by cgraph / target code, similar to how we generate C++ thunks.

The implementation is very similar to the case where the user writes their own ifunc and resolver. The difference here is that the resolver/dispatcher is synthesized by the compiler. A thunk is different, as it is completely invisible to the user. Promoting ifunc to the cgraph level has its advantages, but it can also burden the IPA passes, since they would have to understand it.

Thanks,
David

>
> Honza, any suggestions on how the FE side of such cgraph infrastructure
> should look like and how we should encode the target bits?
>
> Thanks,
> Richard.
>
>>     * doc/tm.texi.in: Add description for TARGET_DISPATCH_VERSION.
>>     * doc/tm.texi: Regenerate.
>>     * c-family/c-common.c (handle_targetv_attribute): New function.
>>     * target.def (dispatch_version): New target hook.
>>     * tree.h (DECL_FUNCTION_VERSIONED): New macro.
>>     (tree_function_decl): New bit-field versioned_function.
>>     * tree-pass.h (pass_dispatch_versions): New pass.
>>     * multiversion.c: New file.
>>     * multiversion.h: New file. >>     * cgraphunit.c: Include multiversion.h >>     (cgraph_finalize_function): Change assembler names of versioned >>     functions. >>     * cp/class.c: Include multiversion.h >>     (add_method): aggregate function versions. Change assembler names of >>     versioned functions. >>     (resolve_address_of_overloaded_function): Match address of function >>     version with default function.  Return address of ifunc dispatcher >>     for address of versioned functions. >>     * cp/decl.c (decls_match): Make decls unmatched for versioned >>     functions. >>     (duplicate_decls): Remove ambiguity for versioned functions. Notify >>     of deleted function version decls. >>     (start_decl): Change assembler name of versioned functions. >>     (start_function): Change assembler name of versioned functions. >>     (cxx_comdat_group): Make comdat group of versioned functions be the >>     same. >>     * cp/semantics.c (expand_or_defer_fn_1): Mark as needed versioned >>     functions that are also marked inline. >>     * cp/decl2.c: Include multiversion.h >>     (check_classfn): Check attributes of versioned functions for match. >>     * cp/call.c: Include multiversion.h >>     (build_over_call): Make calls to multiversioned functions to call the >>     dispatcher. >>     (joust): For calls to multi-versioned functions, make the default >>     function win. >>     * timevar.def (TV_MULTIVERSION_DISPATCH): New time var. >>     * varasm.c (finish_aliases_1): Check if the alias points to a function >>     with a body before giving an error. >>     * Makefile.in: Add multiversion.o >>     * passes.c: Add pass_dispatch_versions to the pass list. >>     * config/i386/i386.c (add_condition_to_bb): New function. >>     (get_builtin_code_for_version): New function. >>     (ix86_dispatch_version): New function. >>     (TARGET_DISPATCH_VERSION): New macro. >>     * testsuite/g++.dg/mv1.C: New test. 
>> >> Index: doc/tm.texi >> =================================================================== >> --- doc/tm.texi (revision 184971) >> +++ doc/tm.texi (working copy) >> @@ -10995,6 +10995,14 @@ The result is another tree containing a simplified >>  call's result.  If @var{ignore} is true the value will be ignored. >>  @end deftypefn >> >> +@deftypefn {Target Hook} int TARGET_DISPATCH_VERSION (tree @var{dispatch_decl}, void *@var{fndecls}, basic_block *@var{empty_bb}) >> +For multi-versioned function, this hook sets up the dispatcher. >> +@var{dispatch_decl} is the function that will be used to dispatch the >> +version. @var{fndecls} are the function choices for dispatch. >> +@var{empty_bb} is an basic block in @var{dispatch_decl} where the >> +code to do the dispatch will be added. >> +@end deftypefn >> + >>  @deftypefn {Target Hook} {const char *} TARGET_INVALID_WITHIN_DOLOOP (const_rtx @var{insn}) >> >>  Take an instruction in @var{insn} and return NULL if it is valid within a >> Index: doc/tm.texi.in >> =================================================================== >> --- doc/tm.texi.in    (revision 184971) >> +++ doc/tm.texi.in    (working copy) >> @@ -10873,6 +10873,14 @@ The result is another tree containing a simplified >>  call's result.  If @var{ignore} is true the value will be ignored. >>  @end deftypefn >> >> +@hook TARGET_DISPATCH_VERSION >> +For multi-versioned function, this hook sets up the dispatcher. >> +@var{dispatch_decl} is the function that will be used to dispatch the >> +version. @var{fndecls} are the function choices for dispatch. >> +@var{empty_bb} is an basic block in @var{dispatch_decl} where the >> +code to do the dispatch will be added. 
>> +@end deftypefn >> + >>  @hook TARGET_INVALID_WITHIN_DOLOOP >> >>  Take an instruction in @var{insn} and return NULL if it is valid within a >> Index: c-family/c-common.c >> =================================================================== >> --- c-family/c-common.c (revision 184971) >> +++ c-family/c-common.c (working copy) >> @@ -315,6 +315,7 @@ static tree check_case_value (tree); >>  static bool check_case_bounds (tree, tree, tree *, tree *); >> >>  static tree handle_packed_attribute (tree *, tree, tree, int, bool *); >> +static tree handle_targetv_attribute (tree *, tree, tree, int, bool *); >>  static tree handle_nocommon_attribute (tree *, tree, tree, int, bool *); >>  static tree handle_common_attribute (tree *, tree, tree, int, bool *); >>  static tree handle_noreturn_attribute (tree *, tree, tree, int, bool *); >> @@ -604,6 +605,8 @@ const struct attribute_spec c_common_attribute_tab >>  { >>  /* { name, min_len, max_len, decl_req, type_req, fn_type_req, handler, >>     affects_type_identity } */ >> +  { "targetv",        1, -1, true, false, false, >> +               handle_targetv_attribute, false }, >>  { "packed",         0, 0, false, false, false, >>                handle_packed_attribute , false}, >>  { "nocommon",        0, 0, true,  false, false, >> @@ -5869,6 +5872,54 @@ handle_packed_attribute (tree *node, tree name, tr >>  return NULL_TREE; >>  } >> >> +/* The targetv attribue is used to specify a function version >> +  targeted to specific platform types.  The "targetv" attributes >> +  have to be valid "target" attributes.  NODE should always point >> +  to a FUNCTION_DECL.  ARGS contain the arguments to "targetv" >> +  which should be valid arguments to attribute "target" too. >> +  Check handle_target_attribute for FLAGS and NO_ADD_ATTRS.  
*/ >> + >> +static tree >> +handle_targetv_attribute (tree *node, tree name, >> +             tree args, >> +             int flags, >> +             bool *no_add_attrs) >> +{ >> +  const char *attr_str = NULL; >> +  gcc_assert (TREE_CODE (*node) == FUNCTION_DECL); >> +  gcc_assert (args != NULL); >> + >> +  /* This is a function version.  */ >> +  DECL_FUNCTION_VERSIONED (*node) = 1; >> + >> +  attr_str = TREE_STRING_POINTER (TREE_VALUE (args)); >> + >> +  /* Check if multiple sets of target attributes are there.  This >> +   is not supported now.  In future, this will be supported by >> +   cloning this function for each set.  */ >> +  if (TREE_CHAIN (args) != NULL) >> +   warning (OPT_Wattributes, "%qE attribute has multiple sets which " >> +       "is not supported", name); >> + >> +  if (attr_str == NULL >> +    || strstr (attr_str, "arch=") == NULL) >> +   error_at (DECL_SOURCE_LOCATION (*node), >> +       "Versioning supported only on \"arch=\" for now"); >> + >> +  /* targetv attributes must translate into target attributes.  */ >> +  handle_target_attribute (node, get_identifier ("target"), args, flags, >> +              no_add_attrs); >> + >> +  if (*no_add_attrs) >> +   warning (OPT_Wattributes, "%qE attribute has no effect", name); >> + >> +  /* This is necessary to keep the attribute tagged to the decl >> +   all the time.  */ >> +  *no_add_attrs = false; >> + >> +  return NULL_TREE; >> +} >> + >>  /* Handle a "nocommon" attribute; arguments as in >>   struct attribute_spec.handler.  */ >> >> Index: target.def >> =================================================================== >> --- target.def  (revision 184971) >> +++ target.def  (working copy) >> @@ -1249,6 +1249,15 @@ DEFHOOK >>  tree, (tree fndecl, int n_args, tree *argp, bool ignore), >>  hook_tree_tree_int_treep_bool_null) >> >> +/* Target hook to generate the dispatching code for calls to multi-versioned >> +  functions.  
DISPATCH_DECL is the function that will have the dispatching >> +  logic.  FNDECLS are the list of choices for dispatch and EMPTY_BB is the >> +  basic bloc in DISPATCH_DECL which will contain the code.  */ >> +DEFHOOK >> +(dispatch_version, >> + "", >> + int, (tree dispatch_decl, void *fndecls, basic_block *empty_bb), NULL) >> + >>  /* Returns a code for a target-specific builtin that implements >>   reciprocal of the function, or NULL_TREE if not available.  */ >>  DEFHOOK >> Index: tree.h >> =================================================================== >> --- tree.h    (revision 184971) >> +++ tree.h    (working copy) >> @@ -3532,6 +3532,12 @@ extern VEC(tree, gc) **decl_debug_args_insert (tre >>  #define DECL_FUNCTION_SPECIFIC_OPTIMIZATION(NODE) \ >>   (FUNCTION_DECL_CHECK (NODE)->function_decl.function_specific_optimization) >> >> +/* In FUNCTION_DECL, this is set if this function has other versions generated >> +  using "targetv" attributes.  The default version is the one which does not >> +  have any "targetv" attribute set. */ >> +#define DECL_FUNCTION_VERSIONED(NODE)\ >> +  (FUNCTION_DECL_CHECK (NODE)->function_decl.versioned_function) >> + >>  /* FUNCTION_DECL inherits from DECL_NON_COMMON because of the use of the >>   arguments/result/saved_tree fields by front ends.  It was either inherit >>   FUNCTION_DECL from non_common, or inherit non_common from FUNCTION_DECL, >> @@ -3576,8 +3582,8 @@ struct GTY(()) tree_function_decl { >>  unsigned looping_const_or_pure_flag : 1; >>  unsigned has_debug_args_flag : 1; >>  unsigned tm_clone_flag : 1; >> - >> -  /* 1 bit left */ >> +  unsigned versioned_function : 1; >> +  /* No bits left.  */ >>  }; >> >>  /* The source language of the translation-unit.  
*/ >> Index: tree-pass.h >> =================================================================== >> --- tree-pass.h (revision 184971) >> +++ tree-pass.h (working copy) >> @@ -455,6 +455,7 @@ extern struct gimple_opt_pass pass_tm_memopt; >>  extern struct gimple_opt_pass pass_tm_edges; >>  extern struct gimple_opt_pass pass_split_functions; >>  extern struct gimple_opt_pass pass_feedback_split_functions; >> +extern struct gimple_opt_pass pass_dispatch_versions; >> >>  /* IPA Passes */ >>  extern struct simple_ipa_opt_pass pass_ipa_lower_emutls; >> Index: multiversion.c >> =================================================================== >> --- multiversion.c    (revision 0) >> +++ multiversion.c    (revision 0) >> @@ -0,0 +1,798 @@ >> +/* Function Multiversioning. >> +  Copyright (C) 2012 Free Software Foundation, Inc. >> +  Contributed by Sriraman Tallam (tmsriram@google.com) >> + >> +This file is part of GCC. >> + >> +GCC is free software; you can redistribute it and/or modify it under >> +the terms of the GNU General Public License as published by the Free >> +Software Foundation; either version 3, or (at your option) any later >> +version. >> + >> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY >> +WARRANTY; without even the implied warranty of MERCHANTABILITY or >> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License >> +for more details. >> + >> +You should have received a copy of the GNU General Public License >> +along with GCC; see the file COPYING3.  If not see >> +<http://www.gnu.org/licenses/>. */ >> + >> +/* Holds the state for multi-versioned functions here. The front-end >> +  updates the state as and when function versions are encountered. >> +  This is then used to generate the dispatch code.  Also, the >> +  optimization passes to clone hot paths involving versioned functions >> +  will be done here. 
>> + >> +  Function versions are created by using the same function signature but >> +  also tagging attribute "targetv" to specify the platform type for which >> +  the version must be executed.  Here is an example: >> + >> +  int foo () >> +  { >> +   printf ("Execute as default"); >> +   return 0; >> +  } >> + >> +  int  __attribute__ ((targetv ("arch=corei7"))) >> +  foo () >> +  { >> +   printf ("Execute for corei7"); >> +   return 0; >> +  } >> + >> +  int main () >> +  { >> +   return foo (); >> +  } >> + >> +  The call to foo in main is replaced with a call to an IFUNC function that >> +  contains the dispatch code to call the correct function version at >> +  run-time.  */ >> + >> + >> +#include "config.h" >> +#include "system.h" >> +#include "coretypes.h" >> +#include "tm.h" >> +#include "tree.h" >> +#include "tree-inline.h" >> +#include "langhooks.h" >> +#include "flags.h" >> +#include "cgraph.h" >> +#include "diagnostic.h" >> +#include "toplev.h" >> +#include "timevar.h" >> +#include "params.h" >> +#include "fibheap.h" >> +#include "intl.h" >> +#include "tree-pass.h" >> +#include "hashtab.h" >> +#include "coverage.h" >> +#include "ggc.h" >> +#include "tree-flow.h" >> +#include "rtl.h" >> +#include "ipa-prop.h" >> +#include "basic-block.h" >> +#include "toplev.h" >> +#include "dbgcnt.h" >> +#include "tree-dump.h" >> +#include "output.h" >> +#include "vecprim.h" >> +#include "gimple-pretty-print.h" >> +#include "ipa-inline.h" >> +#include "target.h" >> +#include "multiversion.h" >> + >> +typedef void * void_p; >> + >> +DEF_VEC_P (void_p); >> +DEF_VEC_ALLOC_P (void_p, heap); >> + >> +/* Each function decl that is a function version gets an instance of this >> +  structure.  Since this is called by the front-end, decl merging can >> +  happen, where a decl created for a new declaration is merged with >> +  the old. In this case, the new decl is deleted and the IS_DELETED >> +  field is set for the struct instance corresponding to the new decl. 
>> +  IFUNC_DECL is the decl of the ifunc function for default decls. >> +  IFUNC_RESOLVER_DECL is the decl of the dispatch function.  VERSIONS >> +  is a vector containing the list of function versions  that are >> +  the candidates for dispatch.  */ >> + >> +typedef struct version_function_d { >> +  tree decl; >> +  tree ifunc_decl; >> +  tree ifunc_resolver_decl; >> +  VEC (void_p, heap) *versions; >> +  bool is_deleted; >> +} version_function; >> + >> +/* Hashmap has an entry for every function decl that has other function >> +  versions.  For function decls that are the default, it also stores the >> +  list of all the other function versions.  Each entry is a structure >> +  of type version_function_d.  */ >> +static htab_t decl_version_htab = NULL; >> + >> +/* Hashtable helpers for decl_version_htab. */ >> + >> +static hashval_t >> +decl_version_htab_hash_descriptor (const void *p) >> +{ >> +  const version_function *t = (const version_function *) p; >> +  return htab_hash_pointer (t->decl); >> +} >> + >> +/* Hashtable helper for decl_version_htab. */ >> + >> +static int >> +decl_version_htab_eq_descriptor (const void *p1, const void *p2) >> +{ >> +  const version_function *t1 = (const version_function *) p1; >> +  return htab_eq_pointer ((const void_p) t1->decl, p2); >> +} >> + >> +/* Create the decl_version_htab.  */ >> +static void >> +create_decl_version_htab (void) >> +{ >> +  if (decl_version_htab == NULL) >> +   decl_version_htab = htab_create (10, decl_version_htab_hash_descriptor, >> +                   decl_version_htab_eq_descriptor, NULL); >> +} >> + >> +/* Creates an instance of version_function for decl DECL.  
*/ >> + >> +static version_function* >> +new_version_function (const tree decl) >> +{ >> +  version_function *v; >> +  v = (version_function *)xmalloc(sizeof (version_function)); >> +  v->decl = decl; >> +  v->ifunc_decl = NULL; >> +  v->ifunc_resolver_decl = NULL; >> +  v->versions = NULL; >> +  v->is_deleted = false; >> +  return v; >> +} >> + >> +/* Comparator function to be used in qsort routine to sort attribute >> +  specification strings to "targetv".  */ >> + >> +static int >> +attr_strcmp (const void *v1, const void *v2) >> +{ >> +  const char *c1 = *(char *const*)v1; >> +  const char *c2 = *(char *const*)v2; >> +  return strcmp (c1, c2); >> +} >> + >> +/* STR is the argument to targetv attribute.  This function tokenizes >> +  the comma separated arguments, sorts them and returns a string which >> +  is a unique identifier for the comma separated arguments.  */ >> + >> +static char * >> +sorted_attr_string (const char *str) >> +{ >> +  char **args = NULL; >> +  char *attr_str, *ret_str; >> +  char *attr = NULL; >> +  unsigned int argnum = 1; >> +  unsigned int i; >> + >> +  for (i = 0; i < strlen (str); i++) >> +   if (str[i] == ',') >> +    argnum++; >> + >> +  attr_str = (char *)xmalloc (strlen (str) + 1); >> +  strcpy (attr_str, str); >> + >> +  for (i = 0; i < strlen (attr_str); i++) >> +   if (attr_str[i] == '=') >> +    attr_str[i] = '_'; >> + >> +  if (argnum == 1) >> +   return attr_str; >> + >> +  args = (char **)xmalloc (argnum * sizeof (char *)); >> + >> +  i = 0; >> +  attr = strtok (attr_str, ","); >> +  while (attr != NULL) >> +   { >> +    args[i] = attr; >> +    i++; >> +    attr = strtok (NULL, ","); >> +   } >> + >> +  qsort (args, argnum, sizeof (char*), attr_strcmp); >> + >> +  ret_str = (char *)xmalloc (strlen (str) + 1); >> +  strcpy (ret_str, args[0]); >> +  for (i = 1; i < argnum; i++) >> +   { >> +    strcat (ret_str, "_"); >> +    strcat (ret_str, args[i]); >> +   } >> + >> +  free (args); >> +  free (attr_str); >> +  return 
ret_str; >> +} >> + >> +/* Returns true when only one of DECL1 and DECL2 is marked with "targetv" >> +  or if the "targetv" attribute strings of DECL1 and DECL2 dont match.  */ >> + >> +bool >> +has_different_version_attributes (const tree decl1, const tree decl2) >> +{ >> +  tree attr1, attr2; >> +  char *c1, *c2; >> +  bool ret = false; >> + >> +  if (TREE_CODE (decl1) != FUNCTION_DECL >> +    || TREE_CODE (decl2) != FUNCTION_DECL) >> +   return false; >> + >> +  attr1 = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl1)); >> +  attr2 = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl2)); >> + >> +  if (attr1 == NULL_TREE && attr2 == NULL_TREE) >> +   return false; >> + >> +  if ((attr1 == NULL_TREE && attr2 != NULL_TREE) >> +    || (attr1 != NULL_TREE && attr2 == NULL_TREE)) >> +   return true; >> + >> +  c1 = sorted_attr_string ( >> +    TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr1)))); >> +  c2 = sorted_attr_string ( >> +    TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr2)))); >> + >> +  if (strcmp (c1, c2) != 0) >> +   ret = true; >> + >> +  free (c1); >> +  free (c2); >> + >> +  return ret; >> +} >> + >> +/* If this decl corresponds to a function and has "targetv" attribute, >> +  append the attribute string to its assembler name.  
*/ >> + >> +void >> +version_assembler_name (const tree decl) >> +{ >> +  tree version_attr; >> +  const char *orig_name, *version_string, *attr_str; >> +  char *assembler_name; >> +  tree assembler_name_tree; >> + >> +  if (TREE_CODE (decl) != FUNCTION_DECL >> +    || DECL_ASSEMBLER_NAME_SET_P (decl) >> +    || !DECL_FUNCTION_VERSIONED (decl)) >> +   return; >> + >> +  if (DECL_DECLARED_INLINE_P (decl) >> +    &&lookup_attribute ("gnu_inline", >> +             DECL_ATTRIBUTES (decl))) >> +   error_at (DECL_SOURCE_LOCATION (decl), >> +       "Function versions cannot be marked as gnu_inline," >> +       " bodies have to be generated\n"); >> + >> +  if (DECL_VIRTUAL_P (decl) >> +    || DECL_VINDEX (decl)) >> +   error_at (DECL_SOURCE_LOCATION (decl), >> +       "Virtual function versioning not supported\n"); >> + >> +  version_attr = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl)); >> +  /* targetv attribute string is NULL for default functions.  */ >> +  if (version_attr == NULL_TREE) >> +   return; >> + >> +  orig_name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)); >> +  version_string >> +   = TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (version_attr))); >> + >> +  attr_str = sorted_attr_string (version_string); >> +  assembler_name = (char *) xmalloc (strlen (orig_name) >> +                   + strlen (attr_str) + 2); >> + >> +  sprintf (assembler_name, "%s.%s", orig_name, attr_str); >> +  if (dump_file) >> +   fprintf (dump_file, "Assembler name set to %s for function version %s\n", >> +       assembler_name, IDENTIFIER_POINTER (DECL_NAME (decl))); >> +  assembler_name_tree = get_identifier (assembler_name); >> +  SET_DECL_ASSEMBLER_NAME (decl, assembler_name_tree); >> +} >> + >> +/* Returns true if decl is multi-versioned and DECL is the default function, >> +  that is it is not tagged with "targetv" attribute.  
*/ >> + >> +bool >> +is_default_function (const tree decl) >> +{ >> +  return (TREE_CODE (decl) == FUNCTION_DECL >> +     && DECL_FUNCTION_VERSIONED (decl) >> +     && (lookup_attribute ("targetv", DECL_ATTRIBUTES (decl)) >> +       == NULL_TREE)); >> +} >> + >> +/* For function decl DECL, find the version_function struct in the >> +  decl_version_htab.  */ >> + >> +static version_function * >> +find_function_version (const tree decl) >> +{ >> +  void *slot; >> + >> +  if (!DECL_FUNCTION_VERSIONED (decl)) >> +   return NULL; >> + >> +  if (!decl_version_htab) >> +   return NULL; >> + >> +  slot = htab_find_with_hash (decl_version_htab, decl, >> +                htab_hash_pointer (decl)); >> + >> +  if (slot != NULL) >> +   return (version_function *)slot; >> + >> +  return NULL; >> +} >> + >> +/* Record DECL as a function version by creating a version_function struct >> +  for it and storing it in the hashtable.  */ >> + >> +static version_function * >> +add_function_version (const tree decl) >> +{ >> +  void **slot; >> +  version_function *v; >> + >> +  if (!DECL_FUNCTION_VERSIONED (decl)) >> +   return NULL; >> + >> +  create_decl_version_htab (); >> + >> +  slot = htab_find_slot_with_hash (decl_version_htab, (const void_p)decl, >> +                  htab_hash_pointer ((const void_p)decl), >> +                  INSERT); >> + >> +  if (*slot != NULL) >> +   return (version_function *)*slot; >> + >> +  v = new_version_function (decl); >> +  *slot = v; >> + >> +  return v; >> +} >> + >> +/* Push V into VEC only if it is not already present.  */ >> + >> +static void >> +push_function_version (version_function *v, VEC (void_p, heap) *vec) >> +{ >> +  int ix; >> +  void_p ele; >> +  for (ix = 0; VEC_iterate (void_p, vec, ix, ele); ++ix) >> +   { >> +    if (ele == (void_p)v) >> +     return; >> +   } >> + >> +  VEC_safe_push (void_p, heap, vec, (void*)v); >> +} >> + >> +/* Mark DECL as deleted.  
This is called by the front-end when a duplicate >> +  decl is merged with the original decl and the duplicate decl is deleted. >> +  This function marks the duplicate_decl as invalid.  Called by >> +  duplicate_decls in cp/decl.c.  */ >> + >> +void >> +mark_delete_decl_version (const tree decl) >> +{ >> +  version_function *decl_v; >> + >> +  decl_v = find_function_version (decl); >> + >> +  if (decl_v == NULL) >> +   return; >> + >> +  decl_v->is_deleted = true; >> + >> +  if (is_default_function (decl) >> +    && decl_v->versions != NULL) >> +   { >> +    VEC_truncate (void_p, decl_v->versions, 0); >> +    VEC_free (void_p, heap, decl_v->versions); >> +   } >> +} >> + >> +/* Mark DECL1 and DECL2 to be function versions in the same group.  One >> +  of DECL1 and DECL2 must be the default, otherwise this function does >> +  nothing.  This function aggregates the versions.  */ >> + >> +int >> +group_function_versions (const tree decl1, const tree decl2) >> +{ >> +  tree default_decl, version_decl; >> +  version_function *default_v, *version_v; >> + >> +  gcc_assert (DECL_FUNCTION_VERSIONED (decl1) >> +       && DECL_FUNCTION_VERSIONED (decl2)); >> + >> +  /* The version decls are added only to the default decl.  */ >> +  if (!is_default_function (decl1) >> +    && !is_default_function (decl2)) >> +   return 0; >> + >> +  /* This can happen with duplicate declarations.  Just ignore.  */ >> +  if (is_default_function (decl1) >> +    && is_default_function (decl2)) >> +   return 0; >> + >> +  default_decl = (is_default_function (decl1)) ? decl1 : decl2; >> +  version_decl = (default_decl == decl1) ? decl2 : decl1; >> + >> +  gcc_assert (default_decl != version_decl); >> +  create_decl_version_htab (); >> + >> +  /* If the version function is found, it has been added.  
*/ >> +  if (find_function_version (version_decl)) >> +   return 0; >> + >> +  default_v = add_function_version (default_decl); >> +  version_v = add_function_version (version_decl); >> + >> +  if (default_v->versions == NULL) >> +   default_v->versions = VEC_alloc (void_p, heap, 1); >> + >> +  push_function_version (version_v, default_v->versions); >> +  return 0; >> +} >> + >> +/* Makes a function attribute of the form NAME(ARG_NAME) and chains >> +  it to CHAIN.  */ >> + >> +static tree >> +make_attribute (const char *name, const char *arg_name, tree chain) >> +{ >> +  tree attr_name; >> +  tree attr_arg_name; >> +  tree attr_args; >> +  tree attr; >> + >> +  attr_name = get_identifier (name); >> +  attr_arg_name = build_string (strlen (arg_name), arg_name); >> +  attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE); >> +  attr = tree_cons (attr_name, attr_args, chain); >> +  return attr; >> +} >> + >> +/* Return a new name by appending SUFFIX to the DECL name.  If >> +  make_unique is true, append the full path name.  */ >> + >> +static char * >> +make_name (tree decl, const char *suffix, bool make_unique) >> +{ >> +  char *global_var_name; >> +  int name_len; >> +  const char *name; >> +  const char *unique_name = NULL; >> + >> +  name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)); >> + >> +  /* Get a unique name that can be used globally without any chances >> +   of collision at link time.  */ >> +  if (make_unique) >> +   unique_name = IDENTIFIER_POINTER (get_file_function_name ("\0")); >> + >> +  name_len = strlen (name) + strlen (suffix) + 2; >> + >> +  if (make_unique) >> +   name_len += strlen (unique_name) + 1; >> +  global_var_name = (char *) xmalloc (name_len); >> + >> +  /* Use '.' to concatenate names as it is demangler friendly.  
*/ >> +  if (make_unique) >> +    snprintf (global_var_name, name_len, "%s.%s.%s", name, >> +        unique_name, suffix); >> +  else >> +    snprintf (global_var_name, name_len, "%s.%s", name, suffix); >> + >> +  return global_var_name; >> +} >> + >> +/* Make the resolver function decl for ifunc (IFUNC_DECL) to dispatch >> +  the versions of multi-versioned function DEFAULT_DECL.  Create and >> +  empty basic block in the resolver and store the pointer in >> +  EMPTY_BB.  Return the decl of the resolver function.  */ >> + >> +static tree >> +make_ifunc_resolver_func (const tree default_decl, >> +             const tree ifunc_decl, >> +             basic_block *empty_bb) >> +{ >> +  char *resolver_name; >> +  tree decl, type, decl_name, t; >> +  basic_block new_bb; >> +  tree old_current_function_decl; >> +  bool make_unique = false; >> + >> +  /* IFUNC's have to be globally visible.  So, if the default_decl is >> +   not, then the name of the IFUNC should be made unique.  */ >> +  if (TREE_PUBLIC (default_decl) == 0) >> +   make_unique = true; >> + >> +  /* Append the filename to the resolver function if the versions are >> +   not externally visible.  This is because the resolver function has >> +   to be externally visible for the loader to find it.  So, appending >> +   the filename will prevent conflicts with a resolver function from >> +   another module which is based on the same version name.  */ >> +  resolver_name = make_name (default_decl, "resolver", make_unique); >> + >> +  /* The resolver function should return a (void *). 
*/ >> +  type = build_function_type_list (ptr_type_node, NULL_TREE); >> + >> +  decl = build_fn_decl (resolver_name, type); >> +  decl_name = get_identifier (resolver_name); >> +  SET_DECL_ASSEMBLER_NAME (decl, decl_name); >> + >> +  DECL_NAME (decl) = decl_name; >> +  TREE_USED (decl) = TREE_USED (default_decl); >> +  DECL_ARTIFICIAL (decl) = 1; >> +  DECL_IGNORED_P (decl) = 0; >> +  /* IFUNC resolvers have to be externally visible.  */ >> +  TREE_PUBLIC (decl) = 1; >> +  DECL_UNINLINABLE (decl) = 1; >> + >> +  DECL_EXTERNAL (decl) = DECL_EXTERNAL (default_decl); >> +  DECL_EXTERNAL (ifunc_decl) = 0; >> + >> +  DECL_CONTEXT (decl) = NULL_TREE; >> +  DECL_INITIAL (decl) = make_node (BLOCK); >> +  DECL_STATIC_CONSTRUCTOR (decl) = 0; >> +  TREE_READONLY (decl) = 0; >> +  DECL_PURE_P (decl) = 0; >> +  DECL_COMDAT (decl) = DECL_COMDAT (default_decl); >> +  if (DECL_COMDAT_GROUP (default_decl)) >> +   { >> +    make_decl_one_only (decl, DECL_COMDAT_GROUP (default_decl)); >> +   } >> +  /* Build result decl and add to function_decl. 
*/ >> +  t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, ptr_type_node); >> +  DECL_ARTIFICIAL (t) = 1; >> +  DECL_IGNORED_P (t) = 1; >> +  DECL_RESULT (decl) = t; >> + >> +  gimplify_function_tree (decl); >> +  old_current_function_decl = current_function_decl; >> +  push_cfun (DECL_STRUCT_FUNCTION (decl)); >> +  current_function_decl = decl; >> +  init_empty_tree_cfg_for_function (DECL_STRUCT_FUNCTION (decl)); >> +  cfun->curr_properties |= >> +   (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_referenced_vars | >> +   PROP_ssa); >> +  new_bb = create_empty_bb (ENTRY_BLOCK_PTR); >> +  make_edge (ENTRY_BLOCK_PTR, new_bb, EDGE_FALLTHRU); >> +  make_edge (new_bb, EXIT_BLOCK_PTR, 0); >> +  *empty_bb = new_bb; >> + >> +  cgraph_add_new_function (decl, true); >> +  cgraph_call_function_insertion_hooks (cgraph_get_create_node (decl)); >> +  cgraph_analyze_function (cgraph_get_create_node (decl)); >> +  cgraph_mark_needed_node (cgraph_get_create_node (decl)); >> + >> +  if (DECL_COMDAT_GROUP (default_decl)) >> +   { >> +    gcc_assert (cgraph_get_node (default_decl)); >> +    cgraph_add_to_same_comdat_group (cgraph_get_node (decl), >> +                    cgraph_get_node (default_decl)); >> +   } >> + >> +  pop_cfun (); >> +  current_function_decl = old_current_function_decl; >> + >> +  gcc_assert (ifunc_decl != NULL); >> +  DECL_ATTRIBUTES (ifunc_decl) >> +   = make_attribute ("ifunc", resolver_name, DECL_ATTRIBUTES (ifunc_decl)); >> +  assemble_alias (ifunc_decl, get_identifier (resolver_name)); >> +  return decl; >> +} >> + >> +/* Make and ifunc declaration for the multi-versioned function DECL.  Calls to >> +  DECL function will be replaced with calls to the ifunc.  Return the decl >> +  of the ifunc created.  
*/ >> + >> +static tree >> +make_ifunc_func (const tree decl) >> +{ >> +  tree ifunc_decl; >> +  char *ifunc_name, *resolver_name; >> +  tree fn_type, ifunc_type; >> +  bool make_unique = false; >> + >> +  if (TREE_PUBLIC (decl) == 0) >> +   make_unique = true; >> + >> +  ifunc_name = make_name (decl, "ifunc", make_unique); >> +  resolver_name = make_name (decl, "resolver", make_unique); >> +  gcc_assert (resolver_name); >> + >> +  fn_type = TREE_TYPE (decl); >> +  ifunc_type = build_function_type (TREE_TYPE (fn_type), >> +                  TYPE_ARG_TYPES (fn_type)); >> + >> +  ifunc_decl = build_fn_decl (ifunc_name, ifunc_type); >> +  TREE_USED (ifunc_decl) = 1; >> +  DECL_CONTEXT (ifunc_decl) = NULL_TREE; >> +  DECL_INITIAL (ifunc_decl) = error_mark_node; >> +  DECL_ARTIFICIAL (ifunc_decl) = 1; >> +  /* Mark this ifunc as external, the resolver will flip it again if >> +   it gets generated.  */ >> +  DECL_EXTERNAL (ifunc_decl) = 1; >> +  /* IFUNCs have to be externally visible.  */ >> +  TREE_PUBLIC (ifunc_decl) = 1; >> + >> +  return ifunc_decl; >> +} >> + >> +/* For multi-versioned function decl, which should also be the default, >> +  return the decl of the ifunc resolver, create it if it does not >> +  exist.  */ >> + >> +tree >> +get_ifunc_for_version (const tree decl) >> +{ >> +  version_function *decl_v; >> +  int ix; >> +  void_p ele; >> + >> +  /* DECL has to be the default version, otherwise it is missing and >> +   that is not allowed.  
*/ >> +  if (!is_default_function (decl)) >> +   { >> +    error_at (DECL_SOURCE_LOCATION (decl), "Default version not found"); >> +    return decl; >> +   } >> + >> +  decl_v = find_function_version (decl); >> +  gcc_assert (decl_v != NULL); >> +  if (decl_v->ifunc_decl == NULL) >> +   { >> +    tree ifunc_decl; >> +    ifunc_decl = make_ifunc_func (decl); >> +    decl_v->ifunc_decl = ifunc_decl; >> +   } >> + >> +  if (cgraph_get_node (decl)) >> +   cgraph_mark_needed_node (cgraph_get_node (decl)); >> + >> +  for (ix = 0; VEC_iterate (void_p, decl_v->versions, ix, ele); ++ix) >> +   { >> +    version_function *v = (version_function *) ele; >> +    gcc_assert (v->decl != NULL); >> +    if (cgraph_get_node (v->decl)) >> +    cgraph_mark_needed_node (cgraph_get_node (v->decl)); >> +   } >> + >> +  return decl_v->ifunc_decl; >> +} >> + >> +/* Generate the dispatching code to dispatch multi-versioned function >> +  DECL.  Make a new function decl for dispatching and call the target >> +  hook to process the "targetv" attributes and provide the code to >> +  dispatch the right function at run-time.  
*/ >> + >> +static tree >> +make_ifunc_resolver_for_version (const tree decl) >> +{ >> +  version_function *decl_v; >> +  tree ifunc_resolver_decl, ifunc_decl; >> +  basic_block empty_bb; >> +  int ix; >> +  void_p ele; >> +  VEC (tree, heap) *fn_ver_vec = NULL; >> + >> +  gcc_assert (is_default_function (decl)); >> + >> +  decl_v = find_function_version (decl); >> +  gcc_assert (decl_v != NULL); >> + >> +  if (decl_v->ifunc_resolver_decl != NULL) >> +   return decl_v->ifunc_resolver_decl; >> + >> +  ifunc_decl = decl_v->ifunc_decl; >> + >> +  if (ifunc_decl == NULL) >> +   ifunc_decl = decl_v->ifunc_decl = make_ifunc_func (decl); >> + >> +  ifunc_resolver_decl = make_ifunc_resolver_func (decl, ifunc_decl, >> +                         &empty_bb); >> + >> +  fn_ver_vec = VEC_alloc (tree, heap, 2); >> +  VEC_safe_push (tree, heap, fn_ver_vec, decl); >> + >> +  for (ix = 0; VEC_iterate (void_p, decl_v->versions, ix, ele); ++ix) >> +   { >> +    version_function *v = (version_function *) ele; >> +    gcc_assert (v->decl != NULL); >> +    /* Check for virtual functions here again, as by this time it should >> +     have been determined if this function needs a vtable index or >> +     not.  This happens for methods in derived classes that override >> +     virtual methods in base classes but are not explicitly marked as >> +     virtual.  */ >> +    if (DECL_VINDEX (v->decl)) >> +     error_at (DECL_SOURCE_LOCATION (v->decl), >> +         "Virtual function versioning not supported\n"); >> +    if (!v->is_deleted) >> +    VEC_safe_push (tree, heap, fn_ver_vec, v->decl); >> +   } >> + >> +  gcc_assert (targetm.dispatch_version); >> +  targetm.dispatch_version (ifunc_resolver_decl, fn_ver_vec, &empty_bb); >> +  decl_v->ifunc_resolver_decl = ifunc_resolver_decl; >> + >> +  return ifunc_resolver_decl; >> +} >> + >> +/* Main entry point to pass_dispatch_versions. For multi-versioned functions, >> +  generate the dispatching code.  
*/ >> + >> +static unsigned int >> +do_dispatch_versions (void) >> +{ >> +  /* A new pass for generating dispatch code for multi-versioned functions. >> +   Other forms of dispatch can be added when ifunc support is not available >> +   like just calling the function directly after checking for target type. >> +   Currently, dispatching is done through IFUNC.  This pass will become >> +   more meaningful when other dispatch mechanisms are added.  */ >> + >> +  /* Cloning a function to produce more versions will happen here when the >> +   user requests that via the targetv attribute. For example, >> +   int foo () __attribute__ ((targetv(("arch=core2"), ("arch=corei7")))); >> +   means that the user wants the same body of foo to be versioned for core2 >> +   and corei7.  In that case, this function will be cloned during this >> +   pass.  */ >> + >> +  if (DECL_FUNCTION_VERSIONED (current_function_decl) >> +    && is_default_function (current_function_decl)) >> +   { >> +    tree decl = make_ifunc_resolver_for_version (current_function_decl); >> +    if (dump_file && decl) >> +    dump_function_to_file (decl, dump_file, TDF_BLOCKS); >> +   } >> +  return 0; >> +} >> + >> +static  bool >> +gate_dispatch_versions (void) >> +{ >> +  return true; >> +} >> + >> +/* A pass to generate the dispatch code to execute the appropriate version >> +  of a multi-versioned function at run-time.  
*/ >> + >> +struct gimple_opt_pass pass_dispatch_versions = >> +{ >> + { >> +  GIMPLE_PASS, >> +  "dispatch_multiversion_functions",   /* name */ >> +  gate_dispatch_versions,        /* gate */ >> +  do_dispatch_versions,             /* execute */ >> +  NULL,                     /* sub */ >> +  NULL,                     /* next */ >> +  0,                  /* static_pass_number */ >> +  TV_MULTIVERSION_DISPATCH,       /* tv_id */ >> +  PROP_cfg,               /* properties_required */ >> +  PROP_cfg,               /* properties_provided */ >> +  0,                  /* properties_destroyed */ >> +  0,                  /* todo_flags_start */ >> +  TODO_dump_func |           /* todo_flags_finish */ >> +  TODO_cleanup_cfg | TODO_dump_cgraph >> + } >> +}; >> Index: cgraphunit.c >> =================================================================== >> --- cgraphunit.c     (revision 184971) >> +++ cgraphunit.c     (working copy) >> @@ -141,6 +141,7 @@ along with GCC; see the file COPYING3.  If not see >>  #include "ipa-inline.h" >>  #include "ipa-utils.h" >>  #include "lto-streamer.h" >> +#include "multiversion.h" >> >>  static void cgraph_expand_all_functions (void); >>  static void cgraph_mark_functions_to_output (void); >> @@ -343,6 +344,13 @@ cgraph_finalize_function (tree decl, bool nested) >>    node->local.redefined_extern_inline = true; >>   } >> >> +  /* If this is a function version and not the default, change the >> +   assembler name of this function.  The DECL names of function >> +   versions are the same, only the assembler names are made unique. >> +   The assembler name is changed by appending the string from >> +   the "targetv" attribute.  
*/ >> +  version_assembler_name (decl); >> + >>  notice_global_symbol (decl); >>  node->local.finalized = true; >>  node->lowered = DECL_STRUCT_FUNCTION (decl)->cfg != NULL; >> Index: multiversion.h >> =================================================================== >> --- multiversion.h    (revision 0) >> +++ multiversion.h    (revision 0) >> @@ -0,0 +1,52 @@ >> +/* Function Multiversioning. >> +  Copyright (C) 2012 Free Software Foundation, Inc. >> +  Contributed by Sriraman Tallam (tmsriram@google.com) >> + >> +This file is part of GCC. >> + >> +GCC is free software; you can redistribute it and/or modify it under >> +the terms of the GNU General Public License as published by the Free >> +Software Foundation; either version 3, or (at your option) any later >> +version. >> + >> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY >> +WARRANTY; without even the implied warranty of MERCHANTABILITY or >> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License >> +for more details. >> + >> +You should have received a copy of the GNU General Public License >> +along with GCC; see the file COPYING3.  If not see >> +<http://www.gnu.org/licenses/>. */ >> + >> +/* This is the header file which provides the functions to keep track >> +  of functions that are multi-versioned and to generate the dispatch >> +  code to call the right version at run-time.  */ >> + >> +#ifndef GCC_MULTIVERSION_H >> +#define GCC_MULTIVERION_H >> + >> +#include "tree.h" >> + >> +/* Mark DECL1 and DECL2 as function versions.  */ >> +int group_function_versions (const tree decl1, const tree decl2); >> + >> +/* Mark DECL as deleted and no longer a version.  */ >> +void mark_delete_decl_version (const tree decl); >> + >> +/* Returns true if DECL is the default version to be executed if all >> +  other versions are inappropriate at run-time.  
*/ >> +bool is_default_function (const tree decl); >> + >> +/* Gets the IFUNC dispatcher for this multi-versioned function DECL. DECL >> +  must be the default function in the multi-versioned group.  */ >> +tree get_ifunc_for_version (const tree decl); >> + >> +/* Returns true when only one of DECL1 and DECL2 is marked with "targetv" >> +  or if the "targetv" attribute strings of  DECL1 and DECL2 dont match.  */ >> +bool has_different_version_attributes (const tree decl1, const tree decl2); >> + >> +/* If DECL is a function version and not the default version, the assembler >> +  name of DECL is changed to include the attribute string to keep the >> +  name unambiguous.  */ >> +void version_assembler_name (const tree decl); >> +#endif >> Index: cp/class.c >> =================================================================== >> --- cp/class.c  (revision 184971) >> +++ cp/class.c  (working copy) >> @@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.  If not see >>  #include "tree-dump.h" >>  #include "splay-tree.h" >>  #include "pointer-set.h" >> +#include "multiversion.h" >> >>  /* The number of nested classes being processed.  If we are not in the >>   scope of any class, this is zero.  */ >> @@ -1092,7 +1093,20 @@ add_method (tree type, tree method, tree using_dec >>        || same_type_p (TREE_TYPE (fn_type), >>                TREE_TYPE (method_type)))) >>     { >> -     if (using_decl) >> +     /* For function versions, their parms and types match >> +       but they are not duplicates.  Record function versions >> +       as and when they are found.  
*/ >> +     if (TREE_CODE (fn) == FUNCTION_DECL >> +       && TREE_CODE (method) == FUNCTION_DECL >> +       && (DECL_FUNCTION_VERSIONED (fn) >> +         || DECL_FUNCTION_VERSIONED (method))) >> +      { >> +       DECL_FUNCTION_VERSIONED (fn) = 1; >> +       DECL_FUNCTION_VERSIONED (method) = 1; >> +       group_function_versions (fn, method); >> +       continue; >> +      } >> +     else if (using_decl) >>       { >>        if (DECL_CONTEXT (fn) == type) >>         /* Defer to the local function.  */ >> @@ -1150,6 +1164,13 @@ add_method (tree type, tree method, tree using_dec >>  else >>   /* Replace the current slot.  */ >>   VEC_replace (tree, method_vec, slot, overload); >> + >> +  /* Change the assembler name of method here if it has "targetv" >> +   attributes.  Since all versions have the same mangled name, >> +   their assembler name is changed by appending the string from >> +   the "targetv" attribute. */ >> +  version_assembler_name (method); >> + >>  return true; >>  } >> >> @@ -6890,8 +6911,11 @@ resolve_address_of_overloaded_function (tree targe >>      if (DECL_ANTICIPATED (fn)) >>       continue; >> >> -     /* See if there's a match.  */ >> -     if (same_type_p (target_fn_type, static_fn_type (fn))) >> +     /* See if there's a match.  For functions that are multi-versioned >> +       match it to the default function.  */ >> +     if (same_type_p (target_fn_type, static_fn_type (fn)) >> +       && (!DECL_FUNCTION_VERSIONED (fn) >> +         || is_default_function (fn))) >>       matches = tree_cons (fn, NULL_TREE, matches); >>     } >>   } >> @@ -7053,6 +7077,21 @@ resolve_address_of_overloaded_function (tree targe >>    perform_or_defer_access_check (access_path, fn, fn); >>   } >> >> +  /* If a pointer to a function that is multi-versioned is requested, the >> +   pointer to the dispatcher function is returned instead.  This works >> +   well because indirectly calling the function will dispatch the right >> +   function version at run-time. 
Also, the function address is kept >> +   unique.  */ >> +  if (DECL_FUNCTION_VERSIONED (fn) >> +    && is_default_function (fn)) >> +   { >> +    tree ifunc_decl; >> +    ifunc_decl = get_ifunc_for_version (fn); >> +    gcc_assert (ifunc_decl != NULL); >> +    mark_used (fn); >> +    return build_fold_addr_expr (ifunc_decl); >> +   } >> + >>  if (TYPE_PTRFN_P (target_type) || TYPE_PTRMEMFUNC_P (target_type)) >>   return cp_build_addr_expr (fn, flags); >>  else >> Index: cp/decl.c >> =================================================================== >> --- cp/decl.c  (revision 184971) >> +++ cp/decl.c  (working copy) >> @@ -54,6 +54,7 @@ along with GCC; see the file COPYING3.  If not see >>  #include "pointer-set.h" >>  #include "splay-tree.h" >>  #include "plugin.h" >> +#include "multiversion.h" >> >>  /* Possible cases of bad specifiers type used by bad_specifiers. */ >>  enum bad_spec_place { >> @@ -972,6 +973,23 @@ decls_match (tree newdecl, tree olddecl) >>    if (t1 != t2) >>     return 0; >> >> +    /* The decls dont match if they correspond to two different versions >> +     of the same function.  */ >> +    if (compparms (p1, p2) >> +     && same_type_p (TREE_TYPE (f1), TREE_TYPE (f2)) >> +     && (DECL_FUNCTION_VERSIONED (newdecl) >> +       || DECL_FUNCTION_VERSIONED (olddecl)) >> +     && has_different_version_attributes (newdecl, olddecl)) >> +    { >> +     /* One of the decls could be the default without the "targetv" >> +       attribute. Set it to be a versioned function here.  */ >> +     DECL_FUNCTION_VERSIONED (newdecl) = 1; >> +     DECL_FUNCTION_VERSIONED (olddecl) = 1; >> +     /* Accumulate all the versions of a function.  */ >> +     group_function_versions (olddecl, newdecl); >> +     return 0; >> +    } >> + >>    if (CP_DECL_CONTEXT (newdecl) != CP_DECL_CONTEXT (olddecl) >>      && ! 
(DECL_EXTERN_C_P (newdecl) >>         && DECL_EXTERN_C_P (olddecl))) >> @@ -1482,7 +1500,11 @@ duplicate_decls (tree newdecl, tree olddecl, bool >>        error ("previous declaration %q+#D here", olddecl); >>        return NULL_TREE; >>       } >> -     else if (compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)), >> +     /* For function versions, params and types match, but they >> +       are not ambiguous.  */ >> +     else if ((!DECL_FUNCTION_VERSIONED (newdecl) >> +          && !DECL_FUNCTION_VERSIONED (olddecl)) >> +          && compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)), >>                TYPE_ARG_TYPES (TREE_TYPE (olddecl)))) >>       { >>        error ("new declaration %q#D", newdecl); >> @@ -2250,6 +2272,16 @@ duplicate_decls (tree newdecl, tree olddecl, bool >>  else if (DECL_PRESERVE_P (newdecl)) >>   DECL_PRESERVE_P (olddecl) = 1; >> >> +  /* If the olddecl is a version, so is the newdecl.  */ >> +  if (TREE_CODE (newdecl) == FUNCTION_DECL >> +    && DECL_FUNCTION_VERSIONED (olddecl)) >> +   { >> +    DECL_FUNCTION_VERSIONED (newdecl) = 1; >> +    /* Record that newdecl is not a valid version and has >> +     been deleted.  */ >> +    mark_delete_decl_version (newdecl); >> +   } >> + >>  if (TREE_CODE (newdecl) == FUNCTION_DECL) >>   { >>    int function_size; >> @@ -4512,6 +4544,10 @@ start_decl (const cp_declarator *declarator, >>  /* Enter this declaration into the symbol table.  */ >>  decl = maybe_push_decl (decl); >> >> +  /* If this decl is a function version and not the default, its assembler >> +   name has to be changed.  
*/ >> +  version_assembler_name (decl); >> + >>  if (processing_template_decl) >>   decl = push_template_decl (decl); >>  if (decl == error_mark_node) >> @@ -13019,6 +13055,10 @@ start_function (cp_decl_specifier_seq *declspecs, >>   gcc_assert (same_type_p (TREE_TYPE (TREE_TYPE (decl1)), >>               integer_type_node)); >> >> +  /* If this decl is a function version and not the default, its assembler >> +   name has to be changed.  */ >> +  version_assembler_name (decl1); >> + >>  start_preparsed_function (decl1, attrs, /*flags=*/SF_DEFAULT); >> >>  return 1; >> @@ -13960,6 +14000,11 @@ cxx_comdat_group (tree decl) >>       break; >>     } >>    name = DECL_ASSEMBLER_NAME (decl); >> +    if (TREE_CODE (decl) == FUNCTION_DECL >> +     && DECL_FUNCTION_VERSIONED (decl)) >> +    name = DECL_NAME (decl); >> +    else >> +     name = DECL_ASSEMBLER_NAME (decl); >>   } >> >>  return name; >> Index: cp/semantics.c >> =================================================================== >> --- cp/semantics.c    (revision 184971) >> +++ cp/semantics.c    (working copy) >> @@ -3783,8 +3783,11 @@ expand_or_defer_fn_1 (tree fn) >>    /* If the user wants us to keep all inline functions, then mark >>     this function as needed so that finish_file will make sure to >>     output it later.  Similarly, all dllexport'd functions must >> -     be emitted; there may be callers in other DLLs.  */ >> -    if ((flag_keep_inline_functions >> +     be emitted; there may be callers in other DLLs. >> +     Also, mark this function as needed if it is marked inline but >> +     is a multi-versioned function.  
*/ >> +    if (((flag_keep_inline_functions >> +      || DECL_FUNCTION_VERSIONED (fn)) >>      && DECL_DECLARED_INLINE_P (fn) >>      && !DECL_REALLY_EXTERN (fn)) >>      || (flag_keep_inline_dllexport >> Index: cp/decl2.c >> =================================================================== >> --- cp/decl2.c  (revision 184971) >> +++ cp/decl2.c  (working copy) >> @@ -53,6 +53,7 @@ along with GCC; see the file COPYING3.  If not see >>  #include "splay-tree.h" >>  #include "langhooks.h" >>  #include "c-family/c-ada-spec.h" >> +#include "multiversion.h" >> >>  extern cpp_reader *parse_in; >> >> @@ -674,9 +675,13 @@ check_classfn (tree ctype, tree function, tree tem >>      if (is_template != (TREE_CODE (fndecl) == TEMPLATE_DECL)) >>       continue; >> >> +     /* While finding a match, same types and params are not enough >> +       if the function is versioned.  Also check version ("targetv") >> +       attributes.  */ >>      if (same_type_p (TREE_TYPE (TREE_TYPE (function)), >>              TREE_TYPE (TREE_TYPE (fndecl))) >>        && compparms (p1, p2) >> +       && !has_different_version_attributes (function, fndecl) >>        && (!is_template >>          || comp_template_parms (template_parms, >>                      DECL_TEMPLATE_PARMS (fndecl))) >> Index: cp/call.c >> =================================================================== >> --- cp/call.c  (revision 184971) >> +++ cp/call.c  (working copy) >> @@ -41,6 +41,7 @@ along with GCC; see the file COPYING3.  If not see >>  #include "langhooks.h" >>  #include "c-family/c-objc.h" >>  #include "timevar.h" >> +#include "multiversion.h" >> >>  /* The various kinds of conversion.  */ >> >> @@ -6730,6 +6731,17 @@ build_over_call (struct z_candidate *cand, int fla >>  if (!already_used) >>   mark_used (fn); >> >> +  /* For a call to a multi-versioned function, the call should actually be to >> +   the dispatcher.  
*/ >> +  if (DECL_FUNCTION_VERSIONED (fn)) >> +   { >> +    tree ifunc_decl; >> +    ifunc_decl = get_ifunc_for_version (fn); >> +    gcc_assert (ifunc_decl != NULL); >> +    return build_call_expr_loc_array (UNKNOWN_LOCATION, ifunc_decl, >> +                    nargs, argarray); >> +   } >> + >>  if (DECL_VINDEX (fn) && (flags & LOOKUP_NONVIRTUAL) == 0) >>   { >>    tree t; >> @@ -7980,6 +7992,30 @@ joust (struct z_candidate *cand1, struct z_candida >>  size_t i; >>  size_t len; >> >> +  /* For Candidates of a multi-versioned function, the one marked default >> +   wins.  This is because the default decl is used as key to aggregate >> +   all the other versions provided for it in multiversion.c.  When >> +   generating the actual call, the appropriate dispatcher is created >> +   to call the right function version at run-time.  */ >> + >> +  if ((TREE_CODE (cand1->fn) == FUNCTION_DECL >> +    && DECL_FUNCTION_VERSIONED (cand1->fn)) >> +    ||(TREE_CODE (cand2->fn) == FUNCTION_DECL >> +     && DECL_FUNCTION_VERSIONED (cand2->fn))) >> +   { >> +    if (is_default_function (cand1->fn)) >> +    { >> +      mark_used (cand2->fn); >> +     return 1; >> +    } >> +    if (is_default_function (cand2->fn)) >> +    { >> +      mark_used (cand1->fn); >> +     return -1; >> +    } >> +    return 0; >> +   } >> + >>  /* Candidates that involve bad conversions are always worse than those >>    that don't.  
*/ >>  if (cand1->viable > cand2->viable) >> Index: timevar.def >> =================================================================== >> --- timevar.def (revision 184971) >> +++ timevar.def (working copy) >> @@ -253,6 +253,7 @@ DEFTIMEVAR (TV_TREE_IFCOMBINE     , "tree if-co >>  DEFTIMEVAR (TV_TREE_UNINIT      , "uninit var analysis") >>  DEFTIMEVAR (TV_PLUGIN_INIT      , "plugin initialization") >>  DEFTIMEVAR (TV_PLUGIN_RUN       , "plugin execution") >> +DEFTIMEVAR (TV_MULTIVERSION_DISPATCH , "multiversion dispatch") >> >>  /* Everything else in rest_of_compilation not included above.  */ >>  DEFTIMEVAR (TV_EARLY_LOCAL      , "early local passes") >> Index: varasm.c >> =================================================================== >> --- varasm.c   (revision 184971) >> +++ varasm.c   (working copy) >> @@ -5755,6 +5755,8 @@ finish_aliases_1 (void) >>     } >>    else if (! (p->emitted_diags & ALIAS_DIAG_TO_EXTERN) >>        && DECL_EXTERNAL (target_decl) >> +        && (!TREE_CODE (target_decl) == FUNCTION_DECL >> +          || !DECL_STRUCT_FUNCTION (target_decl)) >>        /* We use local aliases for C++ thunks to force the tailcall >>          to bind locally.  This is a hack - to keep it working do >>          the following (which is not strictly correct).  
*/ >> Index: Makefile.in >> =================================================================== >> --- Makefile.in (revision 184971) >> +++ Makefile.in (working copy) >> @@ -1298,6 +1298,7 @@ OBJS = \ >>     mcf.o \ >>     mode-switching.o \ >>     modulo-sched.o \ >> +    multiversion.o \ >>     omega.o \ >>     omp-low.o \ >>     optabs.o \ >> @@ -3030,6 +3031,11 @@ ree.o : ree.c $(CONFIG_H) $(SYSTEM_H) coretypes.h >>   $(DF_H) $(TIMEVAR_H) tree-pass.h $(RECOG_H) $(EXPR_H) \ >>   $(REGS_H) $(TREE_H) $(TM_P_H) insn-config.h $(INSN_ATTR_H) $(DIAGNOSTIC_CORE_H) \ >>   $(TARGET_H) $(OPTABS_H) insn-codes.h rtlhooks-def.h $(PARAMS_H) $(CGRAPH_H) >> +multiversion.o : multiversion.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \ >> +  $(TREE_H) langhooks.h $(TREE_INLINE_H) $(FLAGS_H) $(CGRAPH_H) intl.h \ >> +  $(DIAGNOSTIC_H) $(FIBHEAP_H) $(PARAMS_H) $(TIMEVAR_H) tree-pass.h \ >> +  $(HASHTAB_H) $(COVERAGE_H) $(GGC_H) $(TREE_FLOW_H) $(RTL_H) $(IPA_PROP_H) \ >> +  $(BASIC_BLOCK_H) $(TOPLEV_H) $(TREE_DUMP_H) ipa-inline.h >>  cprop.o : cprop.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \ >>   $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h $(GGC_H) \ >>   $(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) output.h toplev.h $(DIAGNOSTIC_CORE_H) \ >> Index: passes.c >> =================================================================== >> --- passes.c   (revision 184971) >> +++ passes.c   (working copy) >> @@ -1190,6 +1190,7 @@ init_optimization_passes (void) >>  NEXT_PASS (pass_build_cfg); >>  NEXT_PASS (pass_warn_function_return); >>  NEXT_PASS (pass_build_cgraph_edges); >> +  NEXT_PASS (pass_dispatch_versions); >>  *p = NULL; >> >>  /* Interprocedural optimization passes.  
*/ >> Index: config/i386/i386.c >> =================================================================== >> --- config/i386/i386.c  (revision 184971) >> +++ config/i386/i386.c  (working copy) >> @@ -27446,6 +27473,593 @@ ix86_init_mmx_sse_builtins (void) >>   } >>  } >> >> +/* This adds a condition to the basic_block NEW_BB in function FUNCTION_DECL >> +  to return a pointer to VERSION_DECL if the outcome of the function >> +  PREDICATE_DECL is true.  This function will be called during version >> +  dispatch to decide which function version to execute.  It returns the >> +  basic block at the end to which more conditions can be added.  */ >> + >> +static basic_block >> +add_condition_to_bb (tree function_decl, tree version_decl, >> +           basic_block new_bb, tree predicate_decl) >> +{ >> +  gimple return_stmt; >> +  tree convert_expr, result_var; >> +  gimple convert_stmt; >> +  gimple call_cond_stmt; >> +  gimple if_else_stmt; >> + >> +  basic_block bb1, bb2, bb3; >> +  edge e12, e23; >> + >> +  tree cond_var; >> +  gimple_seq gseq; >> + >> +  tree old_current_function_decl; >> + >> +  old_current_function_decl = current_function_decl; >> +  push_cfun (DECL_STRUCT_FUNCTION (function_decl)); >> +  current_function_decl = function_decl; >> + >> +  gcc_assert (new_bb != NULL); >> +  gseq = bb_seq (new_bb); >> + >> + >> +  convert_expr = build1 (CONVERT_EXPR, ptr_type_node, >> +             build_fold_addr_expr (version_decl)); >> +  result_var = create_tmp_var (ptr_type_node, NULL); >> +  convert_stmt = gimple_build_assign (result_var, convert_expr); >> +  return_stmt = gimple_build_return (result_var); >> + >> +  if (predicate_decl == NULL_TREE) >> +   { >> +    gimple_seq_add_stmt (&gseq, convert_stmt); >> +    gimple_seq_add_stmt (&gseq, return_stmt); >> +    set_bb_seq (new_bb, gseq); >> +    gimple_set_bb (convert_stmt, new_bb); >> +    gimple_set_bb (return_stmt, new_bb); >> +    pop_cfun (); >> +    current_function_decl = old_current_function_decl; >> +   
 return new_bb; >> +   } >> + >> +  cond_var = create_tmp_var (integer_type_node, NULL); >> +  call_cond_stmt = gimple_build_call (predicate_decl, 0); >> +  gimple_call_set_lhs (call_cond_stmt, cond_var); >> + >> +  gimple_set_block (call_cond_stmt, DECL_INITIAL (function_decl)); >> +  gimple_set_bb (call_cond_stmt, new_bb); >> +  gimple_seq_add_stmt (&gseq, call_cond_stmt); >> + >> +  if_else_stmt = gimple_build_cond (GT_EXPR, cond_var, >> +                  integer_zero_node, >> +                  NULL_TREE, NULL_TREE); >> +  gimple_set_block (if_else_stmt, DECL_INITIAL (function_decl)); >> +  gimple_set_bb (if_else_stmt, new_bb); >> +  gimple_seq_add_stmt (&gseq, if_else_stmt); >> + >> +  gimple_seq_add_stmt (&gseq, convert_stmt); >> +  gimple_seq_add_stmt (&gseq, return_stmt); >> +  set_bb_seq (new_bb, gseq); >> + >> +  bb1 = new_bb; >> +  e12 = split_block (bb1, if_else_stmt); >> +  bb2 = e12->dest; >> +  e12->flags &= ~EDGE_FALLTHRU; >> +  e12->flags |= EDGE_TRUE_VALUE; >> + >> +  e23 = split_block (bb2, return_stmt); >> + >> +  gimple_set_bb (convert_stmt, bb2); >> +  gimple_set_bb (return_stmt, bb2); >> + >> +  bb3 = e23->dest; >> +  make_edge (bb1, bb3, EDGE_FALSE_VALUE); >> + >> +  remove_edge (e23); >> +  make_edge (bb2, EXIT_BLOCK_PTR, 0); >> + >> +  rebuild_cgraph_edges (); >> + >> +  pop_cfun (); >> +  current_function_decl = old_current_function_decl; >> + >> +  return bb3; >> +} >> + >> +/* This parses the attribute arguments to targetv in DECL and determines >> +  the right builtin to use to match the platform specification. >> +  For now, only one target argument ("arch=") is allowed.  
*/ >> + >> +static enum ix86_builtins >> +get_builtin_code_for_version (tree decl) >> +{ >> +  tree attrs; >> +  struct cl_target_option cur_target; >> +  tree target_node; >> +  struct cl_target_option *new_target; >> +  enum ix86_builtins builtin_code = IX86_BUILTIN_MAX; >> + >> +  attrs = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl)); >> +  gcc_assert (attrs != NULL); >> + >> +  cl_target_option_save (&cur_target, &global_options); >> + >> +  target_node = ix86_valid_target_attribute_tree >> +         (TREE_VALUE (TREE_VALUE (attrs))); >> + >> +  gcc_assert (target_node); >> +  new_target = TREE_TARGET_OPTION (target_node); >> +  gcc_assert (new_target); >> + >> +  if (new_target->arch_specified && new_target->arch > 0) >> +   { >> +    switch (new_target->arch) >> +     { >> +    case 1: >> +    case 2: >> +    case 3: >> +    case 4: >> +    case 5: >> +    case 6: >> +    case 7: >> +    case 8: >> +    case 9: >> +    case 10: >> +    case 11: >> +     builtin_code = IX86_BUILTIN_CPU_IS_INTEL; >> +     break; >> +    case 12: >> +     builtin_code = IX86_BUILTIN_CPU_IS_INTEL_CORE2; >> +     break; >> +    case 13: >> +     builtin_code = IX86_BUILTIN_CPU_IS_INTEL_COREI7; >> +     break; >> +    case 14: >> +     builtin_code = IX86_BUILTIN_CPU_IS_INTEL_ATOM; >> +     break; >> +    case 15: >> +    case 16: >> +    case 17: >> +    case 18: >> +    case 19: >> +    case 20: >> +    case 21: >> +     builtin_code = IX86_BUILTIN_CPU_IS_AMD; >> +     break; >> +    case 22: >> +     builtin_code = IX86_BUILTIN_CPU_IS_AMDFAM10H; >> +     break; >> +    case 23: >> +     builtin_code = IX86_BUILTIN_CPU_IS_AMDFAM15H_BDVER1; >> +     break; >> +    case 24: >> +     builtin_code = IX86_BUILTIN_CPU_IS_AMDFAM15H_BDVER2; >> +     break; >> +    case 25: /* What is btver1 ? 
*/ >> +     builtin_code = IX86_BUILTIN_CPU_IS_AMD; >> +     break; >> +    } >> +   } >> + >> +  cl_target_option_restore (&global_options, &cur_target); >> +  if (builtin_code == IX86_BUILTIN_MAX) >> +    error_at (DECL_SOURCE_LOCATION (decl), >> +        "No dispatcher found for the versioning attributes"); >> + >> +  return builtin_code; >> +} >> + >> +/* This is the target hook to generate the dispatch function for >> +  multi-versioned functions.  DISPATCH_DECL is the function which will >> +  contain the dispatch logic.  FNDECLS are the function choices for >> +  dispatch, and is a tree chain.  EMPTY_BB is the basic block pointer >> +  in DISPATCH_DECL in which the dispatch code is generated.  */ >> + >> +static int >> +ix86_dispatch_version (tree dispatch_decl, >> +            void *fndecls_p, >> +            basic_block *empty_bb) >> +{ >> +  tree default_decl; >> +  gimple ifunc_cpu_init_stmt; >> +  gimple_seq gseq; >> +  tree old_current_function_decl; >> +  int ix; >> +  tree ele; >> +  VEC (tree, heap) *fndecls; >> + >> +  gcc_assert (dispatch_decl != NULL >> +       && fndecls_p != NULL >> +       && empty_bb != NULL); >> + >> +  /*fndecls_p is actually a vector.  */ >> +  fndecls = (VEC (tree, heap) *)fndecls_p; >> + >> +  /* Atleast one more version other than the default.  */ >> +  gcc_assert (VEC_length (tree, fndecls) >= 2); >> + >> +  /* The first version in the vector is the default decl.  
*/ >> +  default_decl = VEC_index (tree, fndecls, 0); >> + >> +  old_current_function_decl = current_function_decl; >> +  push_cfun (DECL_STRUCT_FUNCTION (dispatch_decl)); >> +  current_function_decl = dispatch_decl; >> + >> +  gseq = bb_seq (*empty_bb); >> +  ifunc_cpu_init_stmt = gimple_build_call_vec ( >> +           ix86_builtins [(int) IX86_BUILTIN_CPU_INIT], NULL); >> +  gimple_seq_add_stmt (&gseq, ifunc_cpu_init_stmt); >> +  gimple_set_bb (ifunc_cpu_init_stmt, *empty_bb); >> +  set_bb_seq (*empty_bb, gseq); >> + >> +  pop_cfun (); >> +  current_function_decl = old_current_function_decl; >> + >> + >> +  for (ix = 1; VEC_iterate (tree, fndecls, ix, ele); ++ix) >> +   { >> +    tree version_decl = ele; >> +    /* Get attribute string, parse it and find the right predicate decl. >> +     The predicate function could be a lengthy combination of many >> +     features, like arch-type and various isa-variants.  For now, only >> +     check the arch-type.  */ >> +    tree predicate_decl = ix86_builtins [ >> +            get_builtin_code_for_version (version_decl)]; >> +    *empty_bb = add_condition_to_bb (dispatch_decl, version_decl, *empty_bb, >> +                    predicate_decl); >> + >> +   } >> +  /* dispatch default version at the end.  
*/ >> +  *empty_bb = add_condition_to_bb (dispatch_decl, default_decl, *empty_bb, >> +                  NULL); >> +  return 0; >> +} >> >> @@ -38610,6 +39269,12 @@ ix86_autovectorize_vector_sizes (void) >>  #undef TARGET_BUILD_BUILTIN_VA_LIST >>  #define TARGET_BUILD_BUILTIN_VA_LIST ix86_build_builtin_va_list >> >> +#undef TARGET_DISPATCH_VERSION >> +#define TARGET_DISPATCH_VERSION ix86_dispatch_version >> + >>  #undef TARGET_ENUM_VA_LIST_P >>  #define TARGET_ENUM_VA_LIST_P ix86_enum_va_list >> >> Index: testsuite/g++.dg/mv1.C >> =================================================================== >> --- testsuite/g++.dg/mv1.C    (revision 0) >> +++ testsuite/g++.dg/mv1.C    (revision 0) >> @@ -0,0 +1,23 @@ >> +/* Simple test case to check if Multiversioning works.  */ >> +/* { dg-do run } */ >> +/* { dg-options "-O2" } */ >> + >> +int foo (); >> +int foo () __attribute__ ((targetv("arch=corei7"))); >> + >> +int main () >> +{ >> +  int (*p)() = &foo; >> +  return foo () + (*p)(); >> +} >> + >> +int foo () >> +{ >> +  return 0; >> +} >> + >> +int __attribute__ ((targetv("arch=corei7"))) >> +foo () >> +{ >> +  return 0; >> +} >> >> >> -- >> This patch is available for review at http://codereview.appspot.com/5752064
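The dispatch order that ix86_dispatch_version builds above (test each specialized version's predicate in turn, return the first one whose result is greater than zero, otherwise fall back to the default) can be sketched in plain C. This is only an illustration under stated assumptions, not the GIMPLE the patch generates: `version_entry`, `resolve_version`, the `foo_*` bodies and the `cpu_is_*` predicates are hypothetical names, and the predicates return fixed values instead of querying the CPU.

```c
#include <stddef.h>

typedef int (*foo_fn) (void);

/* Hypothetical stand-ins for the three versions of foo.  */
static int foo_default (void) { return 0; }
static int foo_corei7 (void) { return 7; }
static int foo_core2 (void) { return 2; }

/* Hypothetical predicates.  A real dispatcher would query the CPU;
   these return fixed values so the sketch runs anywhere.  */
static int cpu_is_corei7 (void) { return 0; }
static int cpu_is_core2 (void) { return 1; }

/* One specialized version together with the predicate guarding it.  */
struct version_entry
{
  int (*predicate) (void);
  foo_fn version;
};

/* Return the first version whose predicate result is greater than zero,
   mirroring the GT_EXPR test in add_condition_to_bb.  If no predicate
   matches, return the default version.  */
static foo_fn
resolve_version (const struct version_entry *versions, size_t n,
                 foo_fn default_version)
{
  size_t i;

  for (i = 0; i < n; ++i)
    if (versions[i].predicate () > 0)
      return versions[i].version;

  return default_version;
}
```

With a table holding the corei7 entry followed by the core2 entry, `resolve_version` returns `foo_core2` here because only the second predicate is positive. With an empty table it returns `foo_default`, which corresponds to the final unconditional block in the patch.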
On Wed, Mar 7, 2012 at 11:08 AM, Sriraman Tallam <tmsriram@google.com> wrote: > On Wed, Mar 7, 2012 at 6:05 AM, Richard Guenther > <richard.guenther@gmail.com> wrote: >> On Wed, Mar 7, 2012 at 1:46 AM, Sriraman Tallam <tmsriram@google.com> wrote: >>> User directed Function Multiversioning (MV) via Function Overloading >>> ==================================================================== >>> >>> This patch adds support for user directed function MV via function overloading. >>> For more detailed description: >>> http://gcc.gnu.org/ml/gcc/2012-03/msg00074.html >>> >>> >>> Here is an example program with function versions: >>> >>> int foo ();  /* Default version */ >>> int foo () __attribute__ ((targetv("arch=corei7")));/*Specialized for corei7 */ >>> int foo () __attribute__ ((targetv("arch=core2")));/*Specialized for core2 */ >>> >>> int main () >>> { >>>  int (*p)() = &foo; >>>  return foo () + (*p)(); >>> } >>> >>> int foo () >>> { >>>  return 0; >>> } >>> >>> int __attribute__ ((targetv("arch=corei7"))) >>> foo () >>> { >>>  return 0; >>> } >>> >>> int __attribute__ ((targetv("arch=core2"))) >>> foo () >>> { >>>  return 0; >>> } >>> >>> The above example has foo defined 3 times, but all 3 definitions of foo are >>> different versions of the same function. The call to foo in main, directly and >>> via a pointer, are calls to the multi-versioned function foo which is dispatched >>> to the right foo at run-time. >>> >>> Function versions must have the same signature but must differ in the specifier >>> string provided to a new attribute called "targetv", which is nothing but the >>> target attribute with an extra specification to indicate a version. Any number >>> of versions can be created using the targetv attribute but it is mandatory to >>> have one function without the attribute, which is treated as the default >>> version. >>> >>> The dispatching is done using the IFUNC mechanism to keep the dispatch overhead >>> low. 
The compiler creates a dispatcher function which checks the CPU type and >>> calls the right version of foo. The dispatching code checks for the platform >>> type and calls the first version that matches. The default function is called if >>> no specialized version is appropriate for execution. >>> >>> The pointer to foo is made to be the address of the dispatcher function, so that >>> it is unique and calls made via the pointer also work correctly. The assembler >>> names of the various versions of foo are made different, by tagging >>> the specifier strings, to keep them unique.  A specific version can be called >>> directly by creating an alias to its assembler name. For instance, to call the >>> corei7 version directly, make an alias: >>> int foo_corei7 () __attribute__((alias ("_Z3foov.arch_corei7"))); >>> and then call foo_corei7. >>> >>> Note that using IFUNC blocks inlining of versioned functions. I had implemented >>> an optimization earlier to do hot path cloning to allow versioned functions to >>> be inlined. Please see: http://gcc.gnu.org/ml/gcc-patches/2011-04/msg02285.html >>> In the next iteration, I plan to merge these two. With that, hot code paths with >>> versioned functions will be cloned so that versioned functions can be inlined. >> >> Note that inlining of functions with the target attribute is limited as well, >> but your issue is that of the indirect dispatch as ... >> >> You don't give an overview of the frontend implementation.  Thus I have >> extracted the following >> >>  - the FE does not really know about the "overloading", nor can it directly >>  resolve calls from a "sse" function to another "sse" function without going >>  through the 2nd IFUNC > > This is a good point, but I can change function joust, where the > overload candidate is selected, to return the decl of the versioned > function with matching target attributes as that of the callee. That > will solve this problem.
> I have to treat the target attributes as an
> additional criterion for a match in overload resolution. The front end
> *does know* about the overloading, it is a question of doing the
> overload resolution correctly right?  This is easy when there is no
> cloning involved.

Should this be covered by a new IFUNC folding rule? FE just needs to generate dummy code.

>
> When cloning of a version is required, it gets complicated since the
> FE must clone and produce the bodies. Once, all the bodies are
> available the overload resolution can do the right thing.
>

How can you safely clone a function without knowing if the versioned body is available in another module?

David

>>
>>  - cgraph also does not know about the "overloading", so it cannot do such
>>  "devirtualization" either
>>
>> you seem to have implemented something inbetween a pure frontend
>> solution and a proper middle-end solution.
>
> The only thing I delayed is the code generation of the dispatcher. I
> thought it is better to have this come later, after cfg and cgraph is
> generated, so that multiple dispatching mechanisms could be
> implemented.
>
> For optimization and eventually
>> automatically selecting functions for cloning (like, callees of a manual "sse"
>> versioned function should be cloned?) it would be nice if the cgraph would
>> know about the different versions and their relationships (and the dispatcher).
>> Especially the cgraph code should know the functions are semantically
>> equivalent (I suppose we should require that).  The IFUNC should be
>> generated by cgraph / target code, similar to how we generate C++ thunks.
>>
>> Honza, any suggestions on how the FE side of such cgraph infrastructure
>> should look like and how we should encode the target bits?
>>
>> Thanks,
>> Richard.
>>
>>>     * doc/tm.texi.in: Add description for TARGET_DISPATCH_VERSION.
>>>     * doc/tm.texi: Regenerate.
>>>     * c-family/c-common.c (handle_targetv_attribute): New function.
>>>     * target.def (dispatch_version): New target hook.
>>>     * tree.h (DECL_FUNCTION_VERSIONED): New macro.
>>>     (tree_function_decl): New bit-field versioned_function.
>>>     * tree-pass.h (pass_dispatch_versions): New pass.
>>>     * multiversion.c: New file.
>>>     * multiversion.h: New file.
>>>     * cgraphunit.c: Include multiversion.h
>>>     (cgraph_finalize_function): Change assembler names of versioned
>>>     functions.
>>>     * cp/class.c: Include multiversion.h
>>>     (add_method): aggregate function versions. Change assembler names of
>>>     versioned functions.
>>>     (resolve_address_of_overloaded_function): Match address of function
>>>     version with default function.  Return address of ifunc dispatcher
>>>     for address of versioned functions.
>>>     * cp/decl.c (decls_match): Make decls unmatched for versioned
>>>     functions.
>>>     (duplicate_decls): Remove ambiguity for versioned functions. Notify
>>>     of deleted function version decls.
>>>     (start_decl): Change assembler name of versioned functions.
>>>     (start_function): Change assembler name of versioned functions.
>>>     (cxx_comdat_group): Make comdat group of versioned functions be the
>>>     same.
>>>     * cp/semantics.c (expand_or_defer_fn_1): Mark as needed versioned
>>>     functions that are also marked inline.
>>>     * cp/decl2.c: Include multiversion.h
>>>     (check_classfn): Check attributes of versioned functions for match.
>>>     * cp/call.c: Include multiversion.h
>>>     (build_over_call): Make calls to multiversioned functions to call the
>>>     dispatcher.
>>>     (joust): For calls to multi-versioned functions, make the default
>>>     function win.
>>>     * timevar.def (TV_MULTIVERSION_DISPATCH): New time var.
>>>     * varasm.c (finish_aliases_1): Check if the alias points to a function
>>>     with a body before giving an error.
>>>     * Makefile.in: Add multiversion.o
>>>     * passes.c: Add pass_dispatch_versions to the pass list.
>>>     * config/i386/i386.c (add_condition_to_bb): New function.
>>>     (get_builtin_code_for_version): New function.
>>>     (ix86_dispatch_version): New function.
>>>     (TARGET_DISPATCH_VERSION): New macro.
>>>     * testsuite/g++.dg/mv1.C: New test.
>>>
>>> Index: doc/tm.texi
>>> ===================================================================
>>> --- doc/tm.texi (revision 184971)
>>> +++ doc/tm.texi (working copy)
>>> @@ -10995,6 +10995,14 @@ The result is another tree containing a simplified
>>>  call's result.  If @var{ignore} is true the value will be ignored.
>>>  @end deftypefn
>>>
>>> +@deftypefn {Target Hook} int TARGET_DISPATCH_VERSION (tree @var{dispatch_decl}, void *@var{fndecls}, basic_block *@var{empty_bb})
>>> +For multi-versioned function, this hook sets up the dispatcher.
>>> +@var{dispatch_decl} is the function that will be used to dispatch the
>>> +version. @var{fndecls} are the function choices for dispatch.
>>> +@var{empty_bb} is an basic block in @var{dispatch_decl} where the
>>> +code to do the dispatch will be added.
>>> +@end deftypefn
>>> +
>>>  @deftypefn {Target Hook} {const char *} TARGET_INVALID_WITHIN_DOLOOP (const_rtx @var{insn})
>>>
>>>  Take an instruction in @var{insn} and return NULL if it is valid within a
>>> Index: doc/tm.texi.in
>>> ===================================================================
>>> --- doc/tm.texi.in    (revision 184971)
>>> +++ doc/tm.texi.in    (working copy)
>>> @@ -10873,6 +10873,14 @@ The result is another tree containing a simplified
>>>  call's result.  If @var{ignore} is true the value will be ignored.
>>>  @end deftypefn
>>>
>>> +@hook TARGET_DISPATCH_VERSION
>>> +For multi-versioned function, this hook sets up the dispatcher.
>>> +@var{dispatch_decl} is the function that will be used to dispatch the
>>> +version. @var{fndecls} are the function choices for dispatch.
>>> +@var{empty_bb} is an basic block in @var{dispatch_decl} where the
>>> +code to do the dispatch will be added.
>>> +@end deftypefn
>>> +
>>>  @hook TARGET_INVALID_WITHIN_DOLOOP
>>>
>>>  Take an instruction in @var{insn} and return NULL if it is valid within a
>>> Index: c-family/c-common.c
>>> ===================================================================
>>> --- c-family/c-common.c (revision 184971)
>>> +++ c-family/c-common.c (working copy)
>>> @@ -315,6 +315,7 @@ static tree check_case_value (tree);
>>>  static bool check_case_bounds (tree, tree, tree *, tree *);
>>>
>>>  static tree handle_packed_attribute (tree *, tree, tree, int, bool *);
>>> +static tree handle_targetv_attribute (tree *, tree, tree, int, bool *);
>>>  static tree handle_nocommon_attribute (tree *, tree, tree, int, bool *);
>>>  static tree handle_common_attribute (tree *, tree, tree, int, bool *);
>>>  static tree handle_noreturn_attribute (tree *, tree, tree, int, bool *);
>>> @@ -604,6 +605,8 @@ const struct attribute_spec c_common_attribute_tab
>>>  {
>>>  /* { name, min_len, max_len, decl_req, type_req, fn_type_req, handler,
>>>     affects_type_identity } */
>>> +  { "targetv",        1, -1, true, false, false,
>>> +               handle_targetv_attribute, false },
>>>  { "packed",         0, 0, false, false, false,
>>>                handle_packed_attribute , false},
>>>  { "nocommon",        0, 0, true,  false, false,
>>> @@ -5869,6 +5872,54 @@ handle_packed_attribute (tree *node, tree name, tr
>>>  return NULL_TREE;
>>>  }
>>>
>>> +/* The targetv attribue is used to specify a function version
>>> +  targeted to specific platform types.  The "targetv" attributes
>>> +  have to be valid "target" attributes.  NODE should always point
>>> +  to a FUNCTION_DECL.  ARGS contain the arguments to "targetv"
>>> +  which should be valid arguments to attribute "target" too.
>>> +  Check handle_target_attribute for FLAGS and NO_ADD_ATTRS.
>>> +  */
>>> +
>>> +static tree
>>> +handle_targetv_attribute (tree *node, tree name,
>>> +                          tree args,
>>> +                          int flags,
>>> +                          bool *no_add_attrs)
>>> +{
>>> +  const char *attr_str = NULL;
>>> +  gcc_assert (TREE_CODE (*node) == FUNCTION_DECL);
>>> +  gcc_assert (args != NULL);
>>> +
>>> +  /* This is a function version.  */
>>> +  DECL_FUNCTION_VERSIONED (*node) = 1;
>>> +
>>> +  attr_str = TREE_STRING_POINTER (TREE_VALUE (args));
>>> +
>>> +  /* Check if multiple sets of target attributes are there.  This
>>> +     is not supported now.  In future, this will be supported by
>>> +     cloning this function for each set.  */
>>> +  if (TREE_CHAIN (args) != NULL)
>>> +    warning (OPT_Wattributes, "%qE attribute has multiple sets which "
>>> +             "is not supported", name);
>>> +
>>> +  if (attr_str == NULL
>>> +      || strstr (attr_str, "arch=") == NULL)
>>> +    error_at (DECL_SOURCE_LOCATION (*node),
>>> +              "Versioning supported only on \"arch=\" for now");
>>> +
>>> +  /* targetv attributes must translate into target attributes.  */
>>> +  handle_target_attribute (node, get_identifier ("target"), args, flags,
>>> +                           no_add_attrs);
>>> +
>>> +  if (*no_add_attrs)
>>> +    warning (OPT_Wattributes, "%qE attribute has no effect", name);
>>> +
>>> +  /* This is necessary to keep the attribute tagged to the decl
>>> +     all the time.  */
>>> +  *no_add_attrs = false;
>>> +
>>> +  return NULL_TREE;
>>> +}
>>> +
>>>  /* Handle a "nocommon" attribute; arguments as in
>>>   struct attribute_spec.handler.  */
>>>
>>> Index: target.def
>>> ===================================================================
>>> --- target.def  (revision 184971)
>>> +++ target.def  (working copy)
>>> @@ -1249,6 +1249,15 @@ DEFHOOK
>>>  tree, (tree fndecl, int n_args, tree *argp, bool ignore),
>>>  hook_tree_tree_int_treep_bool_null)
>>>
>>> +/* Target hook to generate the dispatching code for calls to multi-versioned
>>> +  functions.
>>> +  DISPATCH_DECL is the function that will have the dispatching
>>> +  logic.  FNDECLS are the list of choices for dispatch and EMPTY_BB is the
>>> +  basic bloc in DISPATCH_DECL which will contain the code.  */
>>> +DEFHOOK
>>> +(dispatch_version,
>>> + "",
>>> + int, (tree dispatch_decl, void *fndecls, basic_block *empty_bb), NULL)
>>> +
>>>  /* Returns a code for a target-specific builtin that implements
>>>   reciprocal of the function, or NULL_TREE if not available.  */
>>>  DEFHOOK
>>> Index: tree.h
>>> ===================================================================
>>> --- tree.h    (revision 184971)
>>> +++ tree.h    (working copy)
>>> @@ -3532,6 +3532,12 @@ extern VEC(tree, gc) **decl_debug_args_insert (tre
>>>  #define DECL_FUNCTION_SPECIFIC_OPTIMIZATION(NODE) \
>>>   (FUNCTION_DECL_CHECK (NODE)->function_decl.function_specific_optimization)
>>>
>>> +/* In FUNCTION_DECL, this is set if this function has other versions generated
>>> +  using "targetv" attributes.  The default version is the one which does not
>>> +  have any "targetv" attribute set. */
>>> +#define DECL_FUNCTION_VERSIONED(NODE)\
>>> +  (FUNCTION_DECL_CHECK (NODE)->function_decl.versioned_function)
>>> +
>>>  /* FUNCTION_DECL inherits from DECL_NON_COMMON because of the use of the
>>>   arguments/result/saved_tree fields by front ends.  It was either inherit
>>>   FUNCTION_DECL from non_common, or inherit non_common from FUNCTION_DECL,
>>> @@ -3576,8 +3582,8 @@ struct GTY(()) tree_function_decl {
>>>  unsigned looping_const_or_pure_flag : 1;
>>>  unsigned has_debug_args_flag : 1;
>>>  unsigned tm_clone_flag : 1;
>>> -
>>> -  /* 1 bit left */
>>> +  unsigned versioned_function : 1;
>>> +  /* No bits left.  */
>>>  };
>>>
>>>  /* The source language of the translation-unit.
>>>  */
>>> Index: tree-pass.h
>>> ===================================================================
>>> --- tree-pass.h (revision 184971)
>>> +++ tree-pass.h (working copy)
>>> @@ -455,6 +455,7 @@ extern struct gimple_opt_pass pass_tm_memopt;
>>>  extern struct gimple_opt_pass pass_tm_edges;
>>>  extern struct gimple_opt_pass pass_split_functions;
>>>  extern struct gimple_opt_pass pass_feedback_split_functions;
>>> +extern struct gimple_opt_pass pass_dispatch_versions;
>>>
>>>  /* IPA Passes */
>>>  extern struct simple_ipa_opt_pass pass_ipa_lower_emutls;
>>> Index: multiversion.c
>>> ===================================================================
>>> --- multiversion.c    (revision 0)
>>> +++ multiversion.c    (revision 0)
>>> @@ -0,0 +1,798 @@
>>> +/* Function Multiversioning.
>>> +  Copyright (C) 2012 Free Software Foundation, Inc.
>>> +  Contributed by Sriraman Tallam (tmsriram@google.com)
>>> +
>>> +This file is part of GCC.
>>> +
>>> +GCC is free software; you can redistribute it and/or modify it under
>>> +the terms of the GNU General Public License as published by the Free
>>> +Software Foundation; either version 3, or (at your option) any later
>>> +version.
>>> +
>>> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
>>> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>>> +for more details.
>>> +
>>> +You should have received a copy of the GNU General Public License
>>> +along with GCC; see the file COPYING3.  If not see
>>> +<http://www.gnu.org/licenses/>. */
>>> +
>>> +/* Holds the state for multi-versioned functions here. The front-end
>>> +  updates the state as and when function versions are encountered.
>>> +  This is then used to generate the dispatch code.  Also, the
>>> +  optimization passes to clone hot paths involving versioned functions
>>> +  will be done here.
>>> +
>>> +  Function versions are created by using the same function signature but
>>> +  also tagging attribute "targetv" to specify the platform type for which
>>> +  the version must be executed.  Here is an example:
>>> +
>>> +  int foo ()
>>> +  {
>>> +   printf ("Execute as default");
>>> +   return 0;
>>> +  }
>>> +
>>> +  int  __attribute__ ((targetv ("arch=corei7")))
>>> +  foo ()
>>> +  {
>>> +   printf ("Execute for corei7");
>>> +   return 0;
>>> +  }
>>> +
>>> +  int main ()
>>> +  {
>>> +   return foo ();
>>> +  }
>>> +
>>> +  The call to foo in main is replaced with a call to an IFUNC function that
>>> +  contains the dispatch code to call the correct function version at
>>> +  run-time.  */
>>> +
>>> +
>>> +#include "config.h"
>>> +#include "system.h"
>>> +#include "coretypes.h"
>>> +#include "tm.h"
>>> +#include "tree.h"
>>> +#include "tree-inline.h"
>>> +#include "langhooks.h"
>>> +#include "flags.h"
>>> +#include "cgraph.h"
>>> +#include "diagnostic.h"
>>> +#include "toplev.h"
>>> +#include "timevar.h"
>>> +#include "params.h"
>>> +#include "fibheap.h"
>>> +#include "intl.h"
>>> +#include "tree-pass.h"
>>> +#include "hashtab.h"
>>> +#include "coverage.h"
>>> +#include "ggc.h"
>>> +#include "tree-flow.h"
>>> +#include "rtl.h"
>>> +#include "ipa-prop.h"
>>> +#include "basic-block.h"
>>> +#include "toplev.h"
>>> +#include "dbgcnt.h"
>>> +#include "tree-dump.h"
>>> +#include "output.h"
>>> +#include "vecprim.h"
>>> +#include "gimple-pretty-print.h"
>>> +#include "ipa-inline.h"
>>> +#include "target.h"
>>> +#include "multiversion.h"
>>> +
>>> +typedef void * void_p;
>>> +
>>> +DEF_VEC_P (void_p);
>>> +DEF_VEC_ALLOC_P (void_p, heap);
>>> +
>>> +/* Each function decl that is a function version gets an instance of this
>>> +  structure.  Since this is called by the front-end, decl merging can
>>> +  happen, where a decl created for a new declaration is merged with
>>> +  the old.
>>> +  In this case, the new decl is deleted and the IS_DELETED
>>> +  field is set for the struct instance corresponding to the new decl.
>>> +  IFUNC_DECL is the decl of the ifunc function for default decls.
>>> +  IFUNC_RESOLVER_DECL is the decl of the dispatch function.  VERSIONS
>>> +  is a vector containing the list of function versions  that are
>>> +  the candidates for dispatch.  */
>>> +
>>> +typedef struct version_function_d {
>>> +  tree decl;
>>> +  tree ifunc_decl;
>>> +  tree ifunc_resolver_decl;
>>> +  VEC (void_p, heap) *versions;
>>> +  bool is_deleted;
>>> +} version_function;
>>> +
>>> +/* Hashmap has an entry for every function decl that has other function
>>> +  versions.  For function decls that are the default, it also stores the
>>> +  list of all the other function versions.  Each entry is a structure
>>> +  of type version_function_d.  */
>>> +static htab_t decl_version_htab = NULL;
>>> +
>>> +/* Hashtable helpers for decl_version_htab. */
>>> +
>>> +static hashval_t
>>> +decl_version_htab_hash_descriptor (const void *p)
>>> +{
>>> +  const version_function *t = (const version_function *) p;
>>> +  return htab_hash_pointer (t->decl);
>>> +}
>>> +
>>> +/* Hashtable helper for decl_version_htab. */
>>> +
>>> +static int
>>> +decl_version_htab_eq_descriptor (const void *p1, const void *p2)
>>> +{
>>> +  const version_function *t1 = (const version_function *) p1;
>>> +  return htab_eq_pointer ((const void_p) t1->decl, p2);
>>> +}
>>> +
>>> +/* Create the decl_version_htab.  */
>>> +static void
>>> +create_decl_version_htab (void)
>>> +{
>>> +  if (decl_version_htab == NULL)
>>> +    decl_version_htab = htab_create (10, decl_version_htab_hash_descriptor,
>>> +                                     decl_version_htab_eq_descriptor, NULL);
>>> +}
>>> +
>>> +/* Creates an instance of version_function for decl DECL.
>>> +  */
>>> +
>>> +static version_function*
>>> +new_version_function (const tree decl)
>>> +{
>>> +  version_function *v;
>>> +  v = (version_function *)xmalloc(sizeof (version_function));
>>> +  v->decl = decl;
>>> +  v->ifunc_decl = NULL;
>>> +  v->ifunc_resolver_decl = NULL;
>>> +  v->versions = NULL;
>>> +  v->is_deleted = false;
>>> +  return v;
>>> +}
>>> +
>>> +/* Comparator function to be used in qsort routine to sort attribute
>>> +  specification strings to "targetv".  */
>>> +
>>> +static int
>>> +attr_strcmp (const void *v1, const void *v2)
>>> +{
>>> +  const char *c1 = *(char *const*)v1;
>>> +  const char *c2 = *(char *const*)v2;
>>> +  return strcmp (c1, c2);
>>> +}
>>> +
>>> +/* STR is the argument to targetv attribute.  This function tokenizes
>>> +  the comma separated arguments, sorts them and returns a string which
>>> +  is a unique identifier for the comma separated arguments.  */
>>> +
>>> +static char *
>>> +sorted_attr_string (const char *str)
>>> +{
>>> +  char **args = NULL;
>>> +  char *attr_str, *ret_str;
>>> +  char *attr = NULL;
>>> +  unsigned int argnum = 1;
>>> +  unsigned int i;
>>> +
>>> +  for (i = 0; i < strlen (str); i++)
>>> +    if (str[i] == ',')
>>> +      argnum++;
>>> +
>>> +  attr_str = (char *)xmalloc (strlen (str) + 1);
>>> +  strcpy (attr_str, str);
>>> +
>>> +  for (i = 0; i < strlen (attr_str); i++)
>>> +    if (attr_str[i] == '=')
>>> +      attr_str[i] = '_';
>>> +
>>> +  if (argnum == 1)
>>> +    return attr_str;
>>> +
>>> +  args = (char **)xmalloc (argnum * sizeof (char *));
>>> +
>>> +  i = 0;
>>> +  attr = strtok (attr_str, ",");
>>> +  while (attr != NULL)
>>> +    {
>>> +      args[i] = attr;
>>> +      i++;
>>> +      attr = strtok (NULL, ",");
>>> +    }
>>> +
>>> +  qsort (args, argnum, sizeof (char*), attr_strcmp);
>>> +
>>> +  ret_str = (char *)xmalloc (strlen (str) + 1);
>>> +  strcpy (ret_str, args[0]);
>>> +  for (i = 1; i < argnum; i++)
>>> +    {
>>> +      strcat (ret_str, "_");
>>> +      strcat (ret_str, args[i]);
>>> +    }
>>> +
>>> +  free (args);
>>> +  free (attr_str);
>>> +  return ret_str;
>>> +}
>>> +
>>> +/* Returns true when only one of DECL1 and DECL2 is marked with "targetv"
>>> +  or if the "targetv" attribute strings of DECL1 and DECL2 dont match.  */
>>> +
>>> +bool
>>> +has_different_version_attributes (const tree decl1, const tree decl2)
>>> +{
>>> +  tree attr1, attr2;
>>> +  char *c1, *c2;
>>> +  bool ret = false;
>>> +
>>> +  if (TREE_CODE (decl1) != FUNCTION_DECL
>>> +      || TREE_CODE (decl2) != FUNCTION_DECL)
>>> +    return false;
>>> +
>>> +  attr1 = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl1));
>>> +  attr2 = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl2));
>>> +
>>> +  if (attr1 == NULL_TREE && attr2 == NULL_TREE)
>>> +    return false;
>>> +
>>> +  if ((attr1 == NULL_TREE && attr2 != NULL_TREE)
>>> +      || (attr1 != NULL_TREE && attr2 == NULL_TREE))
>>> +    return true;
>>> +
>>> +  c1 = sorted_attr_string (
>>> +    TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr1))));
>>> +  c2 = sorted_attr_string (
>>> +    TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr2))));
>>> +
>>> +  if (strcmp (c1, c2) != 0)
>>> +    ret = true;
>>> +
>>> +  free (c1);
>>> +  free (c2);
>>> +
>>> +  return ret;
>>> +}
>>> +
>>> +/* If this decl corresponds to a function and has "targetv" attribute,
>>> +  append the attribute string to its assembler name.
>>> +  */
>>> +
>>> +void
>>> +version_assembler_name (const tree decl)
>>> +{
>>> +  tree version_attr;
>>> +  const char *orig_name, *version_string, *attr_str;
>>> +  char *assembler_name;
>>> +  tree assembler_name_tree;
>>> +
>>> +  if (TREE_CODE (decl) != FUNCTION_DECL
>>> +      || DECL_ASSEMBLER_NAME_SET_P (decl)
>>> +      || !DECL_FUNCTION_VERSIONED (decl))
>>> +    return;
>>> +
>>> +  if (DECL_DECLARED_INLINE_P (decl)
>>> +      &&lookup_attribute ("gnu_inline",
>>> +                          DECL_ATTRIBUTES (decl)))
>>> +    error_at (DECL_SOURCE_LOCATION (decl),
>>> +              "Function versions cannot be marked as gnu_inline,"
>>> +              " bodies have to be generated\n");
>>> +
>>> +  if (DECL_VIRTUAL_P (decl)
>>> +      || DECL_VINDEX (decl))
>>> +    error_at (DECL_SOURCE_LOCATION (decl),
>>> +              "Virtual function versioning not supported\n");
>>> +
>>> +  version_attr = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl));
>>> +  /* targetv attribute string is NULL for default functions.  */
>>> +  if (version_attr == NULL_TREE)
>>> +    return;
>>> +
>>> +  orig_name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
>>> +  version_string
>>> +    = TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (version_attr)));
>>> +
>>> +  attr_str = sorted_attr_string (version_string);
>>> +  assembler_name = (char *) xmalloc (strlen (orig_name)
>>> +                                     + strlen (attr_str) + 2);
>>> +
>>> +  sprintf (assembler_name, "%s.%s", orig_name, attr_str);
>>> +  if (dump_file)
>>> +    fprintf (dump_file, "Assembler name set to %s for function version %s\n",
>>> +             assembler_name, IDENTIFIER_POINTER (DECL_NAME (decl)));
>>> +  assembler_name_tree = get_identifier (assembler_name);
>>> +  SET_DECL_ASSEMBLER_NAME (decl, assembler_name_tree);
>>> +}
>>> +
>>> +/* Returns true if decl is multi-versioned and DECL is the default function,
>>> +  that is it is not tagged with "targetv" attribute.
>>> +  */
>>> +
>>> +bool
>>> +is_default_function (const tree decl)
>>> +{
>>> +  return (TREE_CODE (decl) == FUNCTION_DECL
>>> +          && DECL_FUNCTION_VERSIONED (decl)
>>> +          && (lookup_attribute ("targetv", DECL_ATTRIBUTES (decl))
>>> +              == NULL_TREE));
>>> +}
>>> +
>>> +/* For function decl DECL, find the version_function struct in the
>>> +  decl_version_htab.  */
>>> +
>>> +static version_function *
>>> +find_function_version (const tree decl)
>>> +{
>>> +  void *slot;
>>> +
>>> +  if (!DECL_FUNCTION_VERSIONED (decl))
>>> +    return NULL;
>>> +
>>> +  if (!decl_version_htab)
>>> +    return NULL;
>>> +
>>> +  slot = htab_find_with_hash (decl_version_htab, decl,
>>> +                              htab_hash_pointer (decl));
>>> +
>>> +  if (slot != NULL)
>>> +    return (version_function *)slot;
>>> +
>>> +  return NULL;
>>> +}
>>> +
>>> +/* Record DECL as a function version by creating a version_function struct
>>> +  for it and storing it in the hashtable.  */
>>> +
>>> +static version_function *
>>> +add_function_version (const tree decl)
>>> +{
>>> +  void **slot;
>>> +  version_function *v;
>>> +
>>> +  if (!DECL_FUNCTION_VERSIONED (decl))
>>> +    return NULL;
>>> +
>>> +  create_decl_version_htab ();
>>> +
>>> +  slot = htab_find_slot_with_hash (decl_version_htab, (const void_p)decl,
>>> +                                   htab_hash_pointer ((const void_p)decl),
>>> +                                   INSERT);
>>> +
>>> +  if (*slot != NULL)
>>> +    return (version_function *)*slot;
>>> +
>>> +  v = new_version_function (decl);
>>> +  *slot = v;
>>> +
>>> +  return v;
>>> +}
>>> +
>>> +/* Push V into VEC only if it is not already present.
>>> +  */
>>> +
>>> +static void
>>> +push_function_version (version_function *v, VEC (void_p, heap) *vec)
>>> +{
>>> +  int ix;
>>> +  void_p ele;
>>> +  for (ix = 0; VEC_iterate (void_p, vec, ix, ele); ++ix)
>>> +    {
>>> +      if (ele == (void_p)v)
>>> +        return;
>>> +    }
>>> +
>>> +  VEC_safe_push (void_p, heap, vec, (void*)v);
>>> +}
>>> +
>>> +/* Mark DECL as deleted.  This is called by the front-end when a duplicate
>>> +  decl is merged with the original decl and the duplicate decl is deleted.
>>> +  This function marks the duplicate_decl as invalid.  Called by
>>> +  duplicate_decls in cp/decl.c.  */
>>> +
>>> +void
>>> +mark_delete_decl_version (const tree decl)
>>> +{
>>> +  version_function *decl_v;
>>> +
>>> +  decl_v = find_function_version (decl);
>>> +
>>> +  if (decl_v == NULL)
>>> +    return;
>>> +
>>> +  decl_v->is_deleted = true;
>>> +
>>> +  if (is_default_function (decl)
>>> +      && decl_v->versions != NULL)
>>> +    {
>>> +      VEC_truncate (void_p, decl_v->versions, 0);
>>> +      VEC_free (void_p, heap, decl_v->versions);
>>> +    }
>>> +}
>>> +
>>> +/* Mark DECL1 and DECL2 to be function versions in the same group.  One
>>> +  of DECL1 and DECL2 must be the default, otherwise this function does
>>> +  nothing.  This function aggregates the versions.  */
>>> +
>>> +int
>>> +group_function_versions (const tree decl1, const tree decl2)
>>> +{
>>> +  tree default_decl, version_decl;
>>> +  version_function *default_v, *version_v;
>>> +
>>> +  gcc_assert (DECL_FUNCTION_VERSIONED (decl1)
>>> +              && DECL_FUNCTION_VERSIONED (decl2));
>>> +
>>> +  /* The version decls are added only to the default decl.  */
>>> +  if (!is_default_function (decl1)
>>> +      && !is_default_function (decl2))
>>> +    return 0;
>>> +
>>> +  /* This can happen with duplicate declarations.  Just ignore.  */
>>> +  if (is_default_function (decl1)
>>> +      && is_default_function (decl2))
>>> +    return 0;
>>> +
>>> +  default_decl = (is_default_function (decl1)) ?
>>> +    decl1 : decl2;
>>> +  version_decl = (default_decl == decl1) ? decl2 : decl1;
>>> +
>>> +  gcc_assert (default_decl != version_decl);
>>> +  create_decl_version_htab ();
>>> +
>>> +  /* If the version function is found, it has been added.  */
>>> +  if (find_function_version (version_decl))
>>> +    return 0;
>>> +
>>> +  default_v = add_function_version (default_decl);
>>> +  version_v = add_function_version (version_decl);
>>> +
>>> +  if (default_v->versions == NULL)
>>> +    default_v->versions = VEC_alloc (void_p, heap, 1);
>>> +
>>> +  push_function_version (version_v, default_v->versions);
>>> +  return 0;
>>> +}
>>> +
>>> +/* Makes a function attribute of the form NAME(ARG_NAME) and chains
>>> +  it to CHAIN.  */
>>> +
>>> +static tree
>>> +make_attribute (const char *name, const char *arg_name, tree chain)
>>> +{
>>> +  tree attr_name;
>>> +  tree attr_arg_name;
>>> +  tree attr_args;
>>> +  tree attr;
>>> +
>>> +  attr_name = get_identifier (name);
>>> +  attr_arg_name = build_string (strlen (arg_name), arg_name);
>>> +  attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE);
>>> +  attr = tree_cons (attr_name, attr_args, chain);
>>> +  return attr;
>>> +}
>>> +
>>> +/* Return a new name by appending SUFFIX to the DECL name.  If
>>> +  make_unique is true, append the full path name.  */
>>> +
>>> +static char *
>>> +make_name (tree decl, const char *suffix, bool make_unique)
>>> +{
>>> +  char *global_var_name;
>>> +  int name_len;
>>> +  const char *name;
>>> +  const char *unique_name = NULL;
>>> +
>>> +  name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
>>> +
>>> +  /* Get a unique name that can be used globally without any chances
>>> +     of collision at link time.
>>> +     */
>>> +  if (make_unique)
>>> +    unique_name = IDENTIFIER_POINTER (get_file_function_name ("\0"));
>>> +
>>> +  name_len = strlen (name) + strlen (suffix) + 2;
>>> +
>>> +  if (make_unique)
>>> +    name_len += strlen (unique_name) + 1;
>>> +  global_var_name = (char *) xmalloc (name_len);
>>> +
>>> +  /* Use '.' to concatenate names as it is demangler friendly.  */
>>> +  if (make_unique)
>>> +    snprintf (global_var_name, name_len, "%s.%s.%s", name,
>>> +              unique_name, suffix);
>>> +  else
>>> +    snprintf (global_var_name, name_len, "%s.%s", name, suffix);
>>> +
>>> +  return global_var_name;
>>> +}
>>> +
>>> +/* Make the resolver function decl for ifunc (IFUNC_DECL) to dispatch
>>> +  the versions of multi-versioned function DEFAULT_DECL.  Create and
>>> +  empty basic block in the resolver and store the pointer in
>>> +  EMPTY_BB.  Return the decl of the resolver function.  */
>>> +
>>> +static tree
>>> +make_ifunc_resolver_func (const tree default_decl,
>>> +                          const tree ifunc_decl,
>>> +                          basic_block *empty_bb)
>>> +{
>>> +  char *resolver_name;
>>> +  tree decl, type, decl_name, t;
>>> +  basic_block new_bb;
>>> +  tree old_current_function_decl;
>>> +  bool make_unique = false;
>>> +
>>> +  /* IFUNC's have to be globally visible.  So, if the default_decl is
>>> +     not, then the name of the IFUNC should be made unique.  */
>>> +  if (TREE_PUBLIC (default_decl) == 0)
>>> +    make_unique = true;
>>> +
>>> +  /* Append the filename to the resolver function if the versions are
>>> +     not externally visible.  This is because the resolver function has
>>> +     to be externally visible for the loader to find it.  So, appending
>>> +     the filename will prevent conflicts with a resolver function from
>>> +     another module which is based on the same version name.  */
>>> +  resolver_name = make_name (default_decl, "resolver", make_unique);
>>> +
>>> +  /* The resolver function should return a (void *).
>>> +     */
>>> +  type = build_function_type_list (ptr_type_node, NULL_TREE);
>>> +
>>> +  decl = build_fn_decl (resolver_name, type);
>>> +  decl_name = get_identifier (resolver_name);
>>> +  SET_DECL_ASSEMBLER_NAME (decl, decl_name);
>>> +
>>> +  DECL_NAME (decl) = decl_name;
>>> +  TREE_USED (decl) = TREE_USED (default_decl);
>>> +  DECL_ARTIFICIAL (decl) = 1;
>>> +  DECL_IGNORED_P (decl) = 0;
>>> +  /* IFUNC resolvers have to be externally visible.  */
>>> +  TREE_PUBLIC (decl) = 1;
>>> +  DECL_UNINLINABLE (decl) = 1;
>>> +
>>> +  DECL_EXTERNAL (decl) = DECL_EXTERNAL (default_decl);
>>> +  DECL_EXTERNAL (ifunc_decl) = 0;
>>> +
>>> +  DECL_CONTEXT (decl) = NULL_TREE;
>>> +  DECL_INITIAL (decl) = make_node (BLOCK);
>>> +  DECL_STATIC_CONSTRUCTOR (decl) = 0;
>>> +  TREE_READONLY (decl) = 0;
>>> +  DECL_PURE_P (decl) = 0;
>>> +  DECL_COMDAT (decl) = DECL_COMDAT (default_decl);
>>> +  if (DECL_COMDAT_GROUP (default_decl))
>>> +    {
>>> +      make_decl_one_only (decl, DECL_COMDAT_GROUP (default_decl));
>>> +    }
>>> +  /* Build result decl and add to function_decl.
>>> +     */
>>> +  t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, ptr_type_node);
>>> +  DECL_ARTIFICIAL (t) = 1;
>>> +  DECL_IGNORED_P (t) = 1;
>>> +  DECL_RESULT (decl) = t;
>>> +
>>> +  gimplify_function_tree (decl);
>>> +  old_current_function_decl = current_function_decl;
>>> +  push_cfun (DECL_STRUCT_FUNCTION (decl));
>>> +  current_function_decl = decl;
>>> +  init_empty_tree_cfg_for_function (DECL_STRUCT_FUNCTION (decl));
>>> +  cfun->curr_properties |=
>>> +    (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_referenced_vars |
>>> +     PROP_ssa);
>>> +  new_bb = create_empty_bb (ENTRY_BLOCK_PTR);
>>> +  make_edge (ENTRY_BLOCK_PTR, new_bb, EDGE_FALLTHRU);
>>> +  make_edge (new_bb, EXIT_BLOCK_PTR, 0);
>>> +  *empty_bb = new_bb;
>>> +
>>> +  cgraph_add_new_function (decl, true);
>>> +  cgraph_call_function_insertion_hooks (cgraph_get_create_node (decl));
>>> +  cgraph_analyze_function (cgraph_get_create_node (decl));
>>> +  cgraph_mark_needed_node (cgraph_get_create_node (decl));
>>> +
>>> +  if (DECL_COMDAT_GROUP (default_decl))
>>> +    {
>>> +      gcc_assert (cgraph_get_node (default_decl));
>>> +      cgraph_add_to_same_comdat_group (cgraph_get_node (decl),
>>> +                                       cgraph_get_node (default_decl));
>>> +    }
>>> +
>>> +  pop_cfun ();
>>> +  current_function_decl = old_current_function_decl;
>>> +
>>> +  gcc_assert (ifunc_decl != NULL);
>>> +  DECL_ATTRIBUTES (ifunc_decl)
>>> +    = make_attribute ("ifunc", resolver_name, DECL_ATTRIBUTES (ifunc_decl));
>>> +  assemble_alias (ifunc_decl, get_identifier (resolver_name));
>>> +  return decl;
>>> +}
>>> +
>>> +/* Make and ifunc declaration for the multi-versioned function DECL.  Calls to
>>> +  DECL function will be replaced with calls to the ifunc.  Return the decl
>>> +  of the ifunc created.
*/ >>> + >>> +static tree >>> +make_ifunc_func (const tree decl) >>> +{ >>> +  tree ifunc_decl; >>> +  char *ifunc_name, *resolver_name; >>> +  tree fn_type, ifunc_type; >>> +  bool make_unique = false; >>> + >>> +  if (TREE_PUBLIC (decl) == 0) >>> +   make_unique = true; >>> + >>> +  ifunc_name = make_name (decl, "ifunc", make_unique); >>> +  resolver_name = make_name (decl, "resolver", make_unique); >>> +  gcc_assert (resolver_name); >>> + >>> +  fn_type = TREE_TYPE (decl); >>> +  ifunc_type = build_function_type (TREE_TYPE (fn_type), >>> +                  TYPE_ARG_TYPES (fn_type)); >>> + >>> +  ifunc_decl = build_fn_decl (ifunc_name, ifunc_type); >>> +  TREE_USED (ifunc_decl) = 1; >>> +  DECL_CONTEXT (ifunc_decl) = NULL_TREE; >>> +  DECL_INITIAL (ifunc_decl) = error_mark_node; >>> +  DECL_ARTIFICIAL (ifunc_decl) = 1; >>> +  /* Mark this ifunc as external, the resolver will flip it again if >>> +   it gets generated.  */ >>> +  DECL_EXTERNAL (ifunc_decl) = 1; >>> +  /* IFUNCs have to be externally visible.  */ >>> +  TREE_PUBLIC (ifunc_decl) = 1; >>> + >>> +  return ifunc_decl; >>> +} >>> + >>> +/* For multi-versioned function decl, which should also be the default, >>> +  return the decl of the ifunc resolver, create it if it does not >>> +  exist.  */ >>> + >>> +tree >>> +get_ifunc_for_version (const tree decl) >>> +{ >>> +  version_function *decl_v; >>> +  int ix; >>> +  void_p ele; >>> + >>> +  /* DECL has to be the default version, otherwise it is missing and >>> +   that is not allowed.  
*/ >>> +  if (!is_default_function (decl)) >>> +   { >>> +    error_at (DECL_SOURCE_LOCATION (decl), "Default version not found"); >>> +    return decl; >>> +   } >>> + >>> +  decl_v = find_function_version (decl); >>> +  gcc_assert (decl_v != NULL); >>> +  if (decl_v->ifunc_decl == NULL) >>> +   { >>> +    tree ifunc_decl; >>> +    ifunc_decl = make_ifunc_func (decl); >>> +    decl_v->ifunc_decl = ifunc_decl; >>> +   } >>> + >>> +  if (cgraph_get_node (decl)) >>> +   cgraph_mark_needed_node (cgraph_get_node (decl)); >>> + >>> +  for (ix = 0; VEC_iterate (void_p, decl_v->versions, ix, ele); ++ix) >>> +   { >>> +    version_function *v = (version_function *) ele; >>> +    gcc_assert (v->decl != NULL); >>> +    if (cgraph_get_node (v->decl)) >>> +    cgraph_mark_needed_node (cgraph_get_node (v->decl)); >>> +   } >>> + >>> +  return decl_v->ifunc_decl; >>> +} >>> + >>> +/* Generate the dispatching code to dispatch multi-versioned function >>> +  DECL.  Make a new function decl for dispatching and call the target >>> +  hook to process the "targetv" attributes and provide the code to >>> +  dispatch the right function at run-time.  
*/ >>> + >>> +static tree >>> +make_ifunc_resolver_for_version (const tree decl) >>> +{ >>> +  version_function *decl_v; >>> +  tree ifunc_resolver_decl, ifunc_decl; >>> +  basic_block empty_bb; >>> +  int ix; >>> +  void_p ele; >>> +  VEC (tree, heap) *fn_ver_vec = NULL; >>> + >>> +  gcc_assert (is_default_function (decl)); >>> + >>> +  decl_v = find_function_version (decl); >>> +  gcc_assert (decl_v != NULL); >>> + >>> +  if (decl_v->ifunc_resolver_decl != NULL) >>> +   return decl_v->ifunc_resolver_decl; >>> + >>> +  ifunc_decl = decl_v->ifunc_decl; >>> + >>> +  if (ifunc_decl == NULL) >>> +   ifunc_decl = decl_v->ifunc_decl = make_ifunc_func (decl); >>> + >>> +  ifunc_resolver_decl = make_ifunc_resolver_func (decl, ifunc_decl, >>> +                         &empty_bb); >>> + >>> +  fn_ver_vec = VEC_alloc (tree, heap, 2); >>> +  VEC_safe_push (tree, heap, fn_ver_vec, decl); >>> + >>> +  for (ix = 0; VEC_iterate (void_p, decl_v->versions, ix, ele); ++ix) >>> +   { >>> +    version_function *v = (version_function *) ele; >>> +    gcc_assert (v->decl != NULL); >>> +    /* Check for virtual functions here again, as by this time it should >>> +     have been determined if this function needs a vtable index or >>> +     not.  This happens for methods in derived classes that override >>> +     virtual methods in base classes but are not explicitly marked as >>> +     virtual.  */ >>> +    if (DECL_VINDEX (v->decl)) >>> +     error_at (DECL_SOURCE_LOCATION (v->decl), >>> +         "Virtual function versioning not supported\n"); >>> +    if (!v->is_deleted) >>> +    VEC_safe_push (tree, heap, fn_ver_vec, v->decl); >>> +   } >>> + >>> +  gcc_assert (targetm.dispatch_version); >>> +  targetm.dispatch_version (ifunc_resolver_decl, fn_ver_vec, &empty_bb); >>> +  decl_v->ifunc_resolver_decl = ifunc_resolver_decl; >>> + >>> +  return ifunc_resolver_decl; >>> +} >>> + >>> +/* Main entry point to pass_dispatch_versions. 
For multi-versioned functions, >>> +  generate the dispatching code.  */ >>> + >>> +static unsigned int >>> +do_dispatch_versions (void) >>> +{ >>> +  /* A new pass for generating dispatch code for multi-versioned functions. >>> +   Other forms of dispatch can be added when ifunc support is not available >>> +   like just calling the function directly after checking for target type. >>> +   Currently, dispatching is done through IFUNC.  This pass will become >>> +   more meaningful when other dispatch mechanisms are added.  */ >>> + >>> +  /* Cloning a function to produce more versions will happen here when the >>> +   user requests that via the targetv attribute. For example, >>> +   int foo () __attribute__ ((targetv(("arch=core2"), ("arch=corei7")))); >>> +   means that the user wants the same body of foo to be versioned for core2 >>> +   and corei7.  In that case, this function will be cloned during this >>> +   pass.  */ >>> + >>> +  if (DECL_FUNCTION_VERSIONED (current_function_decl) >>> +    && is_default_function (current_function_decl)) >>> +   { >>> +    tree decl = make_ifunc_resolver_for_version (current_function_decl); >>> +    if (dump_file && decl) >>> +    dump_function_to_file (decl, dump_file, TDF_BLOCKS); >>> +   } >>> +  return 0; >>> +} >>> + >>> +static  bool >>> +gate_dispatch_versions (void) >>> +{ >>> +  return true; >>> +} >>> + >>> +/* A pass to generate the dispatch code to execute the appropriate version >>> +  of a multi-versioned function at run-time.  
*/ >>> + >>> +struct gimple_opt_pass pass_dispatch_versions = >>> +{ >>> + { >>> +  GIMPLE_PASS, >>> +  "dispatch_multiversion_functions",   /* name */ >>> +  gate_dispatch_versions,        /* gate */ >>> +  do_dispatch_versions,             /* execute */ >>> +  NULL,                     /* sub */ >>> +  NULL,                     /* next */ >>> +  0,                  /* static_pass_number */ >>> +  TV_MULTIVERSION_DISPATCH,       /* tv_id */ >>> +  PROP_cfg,               /* properties_required */ >>> +  PROP_cfg,               /* properties_provided */ >>> +  0,                  /* properties_destroyed */ >>> +  0,                  /* todo_flags_start */ >>> +  TODO_dump_func |           /* todo_flags_finish */ >>> +  TODO_cleanup_cfg | TODO_dump_cgraph >>> + } >>> +}; >>> Index: cgraphunit.c >>> =================================================================== >>> --- cgraphunit.c     (revision 184971) >>> +++ cgraphunit.c     (working copy) >>> @@ -141,6 +141,7 @@ along with GCC; see the file COPYING3.  If not see >>>  #include "ipa-inline.h" >>>  #include "ipa-utils.h" >>>  #include "lto-streamer.h" >>> +#include "multiversion.h" >>> >>>  static void cgraph_expand_all_functions (void); >>>  static void cgraph_mark_functions_to_output (void); >>> @@ -343,6 +344,13 @@ cgraph_finalize_function (tree decl, bool nested) >>>    node->local.redefined_extern_inline = true; >>>   } >>> >>> +  /* If this is a function version and not the default, change the >>> +   assembler name of this function.  The DECL names of function >>> +   versions are the same, only the assembler names are made unique. >>> +   The assembler name is changed by appending the string from >>> +   the "targetv" attribute.  
*/ >>> +  version_assembler_name (decl); >>> + >>>  notice_global_symbol (decl); >>>  node->local.finalized = true; >>>  node->lowered = DECL_STRUCT_FUNCTION (decl)->cfg != NULL; >>> Index: multiversion.h >>> =================================================================== >>> --- multiversion.h    (revision 0) >>> +++ multiversion.h    (revision 0) >>> @@ -0,0 +1,52 @@ >>> +/* Function Multiversioning. >>> +  Copyright (C) 2012 Free Software Foundation, Inc. >>> +  Contributed by Sriraman Tallam (tmsriram@google.com) >>> + >>> +This file is part of GCC. >>> + >>> +GCC is free software; you can redistribute it and/or modify it under >>> +the terms of the GNU General Public License as published by the Free >>> +Software Foundation; either version 3, or (at your option) any later >>> +version. >>> + >>> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY >>> +WARRANTY; without even the implied warranty of MERCHANTABILITY or >>> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License >>> +for more details. >>> + >>> +You should have received a copy of the GNU General Public License >>> +along with GCC; see the file COPYING3.  If not see >>> +<http://www.gnu.org/licenses/>. */ >>> + >>> +/* This is the header file which provides the functions to keep track >>> +  of functions that are multi-versioned and to generate the dispatch >>> +  code to call the right version at run-time.  */ >>> + >>> +#ifndef GCC_MULTIVERSION_H >>> +#define GCC_MULTIVERSION_H >>> + >>> +#include "tree.h" >>> + >>> +/* Mark DECL1 and DECL2 as function versions.  */ >>> +int group_function_versions (const tree decl1, const tree decl2); >>> + >>> +/* Mark DECL as deleted and no longer a version.  */ >>> +void mark_delete_decl_version (const tree decl); >>> + >>> +/* Returns true if DECL is the default version to be executed if all >>> +  other versions are inappropriate at run-time.  
*/ >>> +bool is_default_function (const tree decl); >>> + >>> +/* Gets the IFUNC dispatcher for this multi-versioned function DECL. DECL >>> +  must be the default function in the multi-versioned group.  */ >>> +tree get_ifunc_for_version (const tree decl); >>> + >>> +/* Returns true when only one of DECL1 and DECL2 is marked with "targetv" >>> +  or if the "targetv" attribute strings of  DECL1 and DECL2 dont match.  */ >>> +bool has_different_version_attributes (const tree decl1, const tree decl2); >>> + >>> +/* If DECL is a function version and not the default version, the assembler >>> +  name of DECL is changed to include the attribute string to keep the >>> +  name unambiguous.  */ >>> +void version_assembler_name (const tree decl); >>> +#endif >>> Index: cp/class.c >>> =================================================================== >>> --- cp/class.c  (revision 184971) >>> +++ cp/class.c  (working copy) >>> @@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.  If not see >>>  #include "tree-dump.h" >>>  #include "splay-tree.h" >>>  #include "pointer-set.h" >>> +#include "multiversion.h" >>> >>>  /* The number of nested classes being processed.  If we are not in the >>>   scope of any class, this is zero.  */ >>> @@ -1092,7 +1093,20 @@ add_method (tree type, tree method, tree using_dec >>>        || same_type_p (TREE_TYPE (fn_type), >>>                TREE_TYPE (method_type)))) >>>     { >>> -     if (using_decl) >>> +     /* For function versions, their parms and types match >>> +       but they are not duplicates.  Record function versions >>> +       as and when they are found.  
*/ >>> +     if (TREE_CODE (fn) == FUNCTION_DECL >>> +       && TREE_CODE (method) == FUNCTION_DECL >>> +       && (DECL_FUNCTION_VERSIONED (fn) >>> +         || DECL_FUNCTION_VERSIONED (method))) >>> +      { >>> +       DECL_FUNCTION_VERSIONED (fn) = 1; >>> +       DECL_FUNCTION_VERSIONED (method) = 1; >>> +       group_function_versions (fn, method); >>> +       continue; >>> +      } >>> +     else if (using_decl) >>>       { >>>        if (DECL_CONTEXT (fn) == type) >>>         /* Defer to the local function.  */ >>> @@ -1150,6 +1164,13 @@ add_method (tree type, tree method, tree using_dec >>>  else >>>   /* Replace the current slot.  */ >>>   VEC_replace (tree, method_vec, slot, overload); >>> + >>> +  /* Change the assembler name of method here if it has "targetv" >>> +   attributes.  Since all versions have the same mangled name, >>> +   their assembler name is changed by appending the string from >>> +   the "targetv" attribute. */ >>> +  version_assembler_name (method); >>> + >>>  return true; >>>  } >>> >>> @@ -6890,8 +6911,11 @@ resolve_address_of_overloaded_function (tree targe >>>      if (DECL_ANTICIPATED (fn)) >>>       continue; >>> >>> -     /* See if there's a match.  */ >>> -     if (same_type_p (target_fn_type, static_fn_type (fn))) >>> +     /* See if there's a match.  For functions that are multi-versioned >>> +       match it to the default function.  */ >>> +     if (same_type_p (target_fn_type, static_fn_type (fn)) >>> +       && (!DECL_FUNCTION_VERSIONED (fn) >>> +         || is_default_function (fn))) >>>       matches = tree_cons (fn, NULL_TREE, matches); >>>     } >>>   } >>> @@ -7053,6 +7077,21 @@ resolve_address_of_overloaded_function (tree targe >>>    perform_or_defer_access_check (access_path, fn, fn); >>>   } >>> >>> +  /* If a pointer to a function that is multi-versioned is requested, the >>> +   pointer to the dispatcher function is returned instead.  
This works >>> +   well because indirectly calling the function will dispatch the right >>> +   function version at run-time. Also, the function address is kept >>> +   unique.  */ >>> +  if (DECL_FUNCTION_VERSIONED (fn) >>> +    && is_default_function (fn)) >>> +   { >>> +    tree ifunc_decl; >>> +    ifunc_decl = get_ifunc_for_version (fn); >>> +    gcc_assert (ifunc_decl != NULL); >>> +    mark_used (fn); >>> +    return build_fold_addr_expr (ifunc_decl); >>> +   } >>> + >>>  if (TYPE_PTRFN_P (target_type) || TYPE_PTRMEMFUNC_P (target_type)) >>>   return cp_build_addr_expr (fn, flags); >>>  else >>> Index: cp/decl.c >>> =================================================================== >>> --- cp/decl.c  (revision 184971) >>> +++ cp/decl.c  (working copy) >>> @@ -54,6 +54,7 @@ along with GCC; see the file COPYING3.  If not see >>>  #include "pointer-set.h" >>>  #include "splay-tree.h" >>>  #include "plugin.h" >>> +#include "multiversion.h" >>> >>>  /* Possible cases of bad specifiers type used by bad_specifiers. */ >>>  enum bad_spec_place { >>> @@ -972,6 +973,23 @@ decls_match (tree newdecl, tree olddecl) >>>    if (t1 != t2) >>>     return 0; >>> >>> +    /* The decls dont match if they correspond to two different versions >>> +     of the same function.  */ >>> +    if (compparms (p1, p2) >>> +     && same_type_p (TREE_TYPE (f1), TREE_TYPE (f2)) >>> +     && (DECL_FUNCTION_VERSIONED (newdecl) >>> +       || DECL_FUNCTION_VERSIONED (olddecl)) >>> +     && has_different_version_attributes (newdecl, olddecl)) >>> +    { >>> +     /* One of the decls could be the default without the "targetv" >>> +       attribute. Set it to be a versioned function here.  */ >>> +     DECL_FUNCTION_VERSIONED (newdecl) = 1; >>> +     DECL_FUNCTION_VERSIONED (olddecl) = 1; >>> +     /* Accumulate all the versions of a function.  
*/ >>> +     group_function_versions (olddecl, newdecl); >>> +     return 0; >>> +    } >>> + >>>    if (CP_DECL_CONTEXT (newdecl) != CP_DECL_CONTEXT (olddecl) >>>      && ! (DECL_EXTERN_C_P (newdecl) >>>         && DECL_EXTERN_C_P (olddecl))) >>> @@ -1482,7 +1500,11 @@ duplicate_decls (tree newdecl, tree olddecl, bool >>>        error ("previous declaration %q+#D here", olddecl); >>>        return NULL_TREE; >>>       } >>> -     else if (compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)), >>> +     /* For function versions, params and types match, but they >>> +       are not ambiguous.  */ >>> +     else if ((!DECL_FUNCTION_VERSIONED (newdecl) >>> +          && !DECL_FUNCTION_VERSIONED (olddecl)) >>> +          && compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)), >>>                TYPE_ARG_TYPES (TREE_TYPE (olddecl)))) >>>       { >>>        error ("new declaration %q#D", newdecl); >>> @@ -2250,6 +2272,16 @@ duplicate_decls (tree newdecl, tree olddecl, bool >>>  else if (DECL_PRESERVE_P (newdecl)) >>>   DECL_PRESERVE_P (olddecl) = 1; >>> >>> +  /* If the olddecl is a version, so is the newdecl.  */ >>> +  if (TREE_CODE (newdecl) == FUNCTION_DECL >>> +    && DECL_FUNCTION_VERSIONED (olddecl)) >>> +   { >>> +    DECL_FUNCTION_VERSIONED (newdecl) = 1; >>> +    /* Record that newdecl is not a valid version and has >>> +     been deleted.  */ >>> +    mark_delete_decl_version (newdecl); >>> +   } >>> + >>>  if (TREE_CODE (newdecl) == FUNCTION_DECL) >>>   { >>>    int function_size; >>> @@ -4512,6 +4544,10 @@ start_decl (const cp_declarator *declarator, >>>  /* Enter this declaration into the symbol table.  */ >>>  decl = maybe_push_decl (decl); >>> >>> +  /* If this decl is a function version and not the default, its assembler >>> +   name has to be changed.  
*/ >>> +  version_assembler_name (decl); >>> + >>>  if (processing_template_decl) >>>   decl = push_template_decl (decl); >>>  if (decl == error_mark_node) >>> @@ -13019,6 +13055,10 @@ start_function (cp_decl_specifier_seq *declspecs, >>>   gcc_assert (same_type_p (TREE_TYPE (TREE_TYPE (decl1)), >>>               integer_type_node)); >>> >>> +  /* If this decl is a function version and not the default, its assembler >>> +   name has to be changed.  */ >>> +  version_assembler_name (decl1); >>> + >>>  start_preparsed_function (decl1, attrs, /*flags=*/SF_DEFAULT); >>> >>>  return 1; >>> @@ -13960,6 +14000,11 @@ cxx_comdat_group (tree decl) >>>       break; >>>     } >>>    name = DECL_ASSEMBLER_NAME (decl); >>> +    if (TREE_CODE (decl) == FUNCTION_DECL >>> +     && DECL_FUNCTION_VERSIONED (decl)) >>> +    name = DECL_NAME (decl); >>> +    else >>> +     name = DECL_ASSEMBLER_NAME (decl); >>>   } >>> >>>  return name; >>> Index: cp/semantics.c >>> =================================================================== >>> --- cp/semantics.c    (revision 184971) >>> +++ cp/semantics.c    (working copy) >>> @@ -3783,8 +3783,11 @@ expand_or_defer_fn_1 (tree fn) >>>    /* If the user wants us to keep all inline functions, then mark >>>     this function as needed so that finish_file will make sure to >>>     output it later.  Similarly, all dllexport'd functions must >>> -     be emitted; there may be callers in other DLLs.  */ >>> -    if ((flag_keep_inline_functions >>> +     be emitted; there may be callers in other DLLs. >>> +     Also, mark this function as needed if it is marked inline but >>> +     is a multi-versioned function.  
*/ >>> +    if (((flag_keep_inline_functions >>> +      || DECL_FUNCTION_VERSIONED (fn)) >>>      && DECL_DECLARED_INLINE_P (fn) >>>      && !DECL_REALLY_EXTERN (fn)) >>>      || (flag_keep_inline_dllexport >>> Index: cp/decl2.c >>> =================================================================== >>> --- cp/decl2.c  (revision 184971) >>> +++ cp/decl2.c  (working copy) >>> @@ -53,6 +53,7 @@ along with GCC; see the file COPYING3.  If not see >>>  #include "splay-tree.h" >>>  #include "langhooks.h" >>>  #include "c-family/c-ada-spec.h" >>> +#include "multiversion.h" >>> >>>  extern cpp_reader *parse_in; >>> >>> @@ -674,9 +675,13 @@ check_classfn (tree ctype, tree function, tree tem >>>      if (is_template != (TREE_CODE (fndecl) == TEMPLATE_DECL)) >>>       continue; >>> >>> +     /* While finding a match, same types and params are not enough >>> +       if the function is versioned.  Also check version ("targetv") >>> +       attributes.  */ >>>      if (same_type_p (TREE_TYPE (TREE_TYPE (function)), >>>              TREE_TYPE (TREE_TYPE (fndecl))) >>>        && compparms (p1, p2) >>> +       && !has_different_version_attributes (function, fndecl) >>>        && (!is_template >>>          || comp_template_parms (template_parms, >>>                      DECL_TEMPLATE_PARMS (fndecl))) >>> Index: cp/call.c >>> =================================================================== >>> --- cp/call.c  (revision 184971) >>> +++ cp/call.c  (working copy) >>> @@ -41,6 +41,7 @@ along with GCC; see the file COPYING3.  If not see >>>  #include "langhooks.h" >>>  #include "c-family/c-objc.h" >>>  #include "timevar.h" >>> +#include "multiversion.h" >>> >>>  /* The various kinds of conversion.  */ >>> >>> @@ -6730,6 +6731,17 @@ build_over_call (struct z_candidate *cand, int fla >>>  if (!already_used) >>>   mark_used (fn); >>> >>> +  /* For a call to a multi-versioned function, the call should actually be to >>> +   the dispatcher.  
*/ >>> +  if (DECL_FUNCTION_VERSIONED (fn)) >>> +   { >>> +    tree ifunc_decl; >>> +    ifunc_decl = get_ifunc_for_version (fn); >>> +    gcc_assert (ifunc_decl != NULL); >>> +    return build_call_expr_loc_array (UNKNOWN_LOCATION, ifunc_decl, >>> +                    nargs, argarray); >>> +   } >>> + >>>  if (DECL_VINDEX (fn) && (flags & LOOKUP_NONVIRTUAL) == 0) >>>   { >>>    tree t; >>> @@ -7980,6 +7992,30 @@ joust (struct z_candidate *cand1, struct z_candida >>>  size_t i; >>>  size_t len; >>> >>> +  /* For Candidates of a multi-versioned function, the one marked default >>> +   wins.  This is because the default decl is used as key to aggregate >>> +   all the other versions provided for it in multiversion.c.  When >>> +   generating the actual call, the appropriate dispatcher is created >>> +   to call the right function version at run-time.  */ >>> + >>> +  if ((TREE_CODE (cand1->fn) == FUNCTION_DECL >>> +    && DECL_FUNCTION_VERSIONED (cand1->fn)) >>> +    ||(TREE_CODE (cand2->fn) == FUNCTION_DECL >>> +     && DECL_FUNCTION_VERSIONED (cand2->fn))) >>> +   { >>> +    if (is_default_function (cand1->fn)) >>> +    { >>> +      mark_used (cand2->fn); >>> +     return 1; >>> +    } >>> +    if (is_default_function (cand2->fn)) >>> +    { >>> +      mark_used (cand1->fn); >>> +     return -1; >>> +    } >>> +    return 0; >>> +   } >>> + >>>  /* Candidates that involve bad conversions are always worse than those >>>    that don't.  
*/ >>>  if (cand1->viable > cand2->viable) >>> Index: timevar.def >>> =================================================================== >>> --- timevar.def (revision 184971) >>> +++ timevar.def (working copy) >>> @@ -253,6 +253,7 @@ DEFTIMEVAR (TV_TREE_IFCOMBINE     , "tree if-co >>>  DEFTIMEVAR (TV_TREE_UNINIT      , "uninit var analysis") >>>  DEFTIMEVAR (TV_PLUGIN_INIT      , "plugin initialization") >>>  DEFTIMEVAR (TV_PLUGIN_RUN       , "plugin execution") >>> +DEFTIMEVAR (TV_MULTIVERSION_DISPATCH , "multiversion dispatch") >>> >>>  /* Everything else in rest_of_compilation not included above.  */ >>>  DEFTIMEVAR (TV_EARLY_LOCAL      , "early local passes") >>> Index: varasm.c >>> =================================================================== >>> --- varasm.c   (revision 184971) >>> +++ varasm.c   (working copy) >>> @@ -5755,6 +5755,8 @@ finish_aliases_1 (void) >>>     } >>>    else if (! (p->emitted_diags & ALIAS_DIAG_TO_EXTERN) >>>        && DECL_EXTERNAL (target_decl) >>> +        && (TREE_CODE (target_decl) != FUNCTION_DECL >>> +          || !DECL_STRUCT_FUNCTION (target_decl)) >>>        /* We use local aliases for C++ thunks to force the tailcall >>>          to bind locally.  This is a hack - to keep it working do >>>          the following (which is not strictly correct).  
*/ >>> Index: Makefile.in >>> =================================================================== >>> --- Makefile.in (revision 184971) >>> +++ Makefile.in (working copy) >>> @@ -1298,6 +1298,7 @@ OBJS = \ >>>     mcf.o \ >>>     mode-switching.o \ >>>     modulo-sched.o \ >>> +    multiversion.o \ >>>     omega.o \ >>>     omp-low.o \ >>>     optabs.o \ >>> @@ -3030,6 +3031,11 @@ ree.o : ree.c $(CONFIG_H) $(SYSTEM_H) coretypes.h >>>   $(DF_H) $(TIMEVAR_H) tree-pass.h $(RECOG_H) $(EXPR_H) \ >>>   $(REGS_H) $(TREE_H) $(TM_P_H) insn-config.h $(INSN_ATTR_H) $(DIAGNOSTIC_CORE_H) \ >>>   $(TARGET_H) $(OPTABS_H) insn-codes.h rtlhooks-def.h $(PARAMS_H) $(CGRAPH_H) >>> +multiversion.o : multiversion.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \ >>> +  $(TREE_H) langhooks.h $(TREE_INLINE_H) $(FLAGS_H) $(CGRAPH_H) intl.h \ >>> +  $(DIAGNOSTIC_H) $(FIBHEAP_H) $(PARAMS_H) $(TIMEVAR_H) tree-pass.h \ >>> +  $(HASHTAB_H) $(COVERAGE_H) $(GGC_H) $(TREE_FLOW_H) $(RTL_H) $(IPA_PROP_H) \ >>> +  $(BASIC_BLOCK_H) $(TOPLEV_H) $(TREE_DUMP_H) ipa-inline.h >>>  cprop.o : cprop.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \ >>>   $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h $(GGC_H) \ >>>   $(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) output.h toplev.h $(DIAGNOSTIC_CORE_H) \ >>> Index: passes.c >>> =================================================================== >>> --- passes.c   (revision 184971) >>> +++ passes.c   (working copy) >>> @@ -1190,6 +1190,7 @@ init_optimization_passes (void) >>>  NEXT_PASS (pass_build_cfg); >>>  NEXT_PASS (pass_warn_function_return); >>>  NEXT_PASS (pass_build_cgraph_edges); >>> +  NEXT_PASS (pass_dispatch_versions); >>>  *p = NULL; >>> >>>  /* Interprocedural optimization passes.  
*/ >>> Index: config/i386/i386.c >>> =================================================================== >>> --- config/i386/i386.c  (revision 184971) >>> +++ config/i386/i386.c  (working copy) >>> @@ -27446,6 +27473,593 @@ ix86_init_mmx_sse_builtins (void) >>>   } >>>  } >>> >>> +/* This adds a condition to the basic_block NEW_BB in function FUNCTION_DECL >>> +  to return a pointer to VERSION_DECL if the outcome of the function >>> +  PREDICATE_DECL is true.  This function will be called during version >>> +  dispatch to decide which function version to execute.  It returns the >>> +  basic block at the end to which more conditions can be added.  */ >>> + >>> +static basic_block >>> +add_condition_to_bb (tree function_decl, tree version_decl, >>> +           basic_block new_bb, tree predicate_decl) >>> +{ >>> +  gimple return_stmt; >>> +  tree convert_expr, result_var; >>> +  gimple convert_stmt; >>> +  gimple call_cond_stmt; >>> +  gimple if_else_stmt; >>> + >>> +  basic_block bb1, bb2, bb3; >>> +  edge e12, e23; >>> + >>> +  tree cond_var; >>> +  gimple_seq gseq; >>> + >>> +  tree old_current_function_decl; >>> + >>> +  old_current_function_decl = current_function_decl; >>> +  push_cfun (DECL_STRUCT_FUNCTION (function_decl)); >>> +  current_function_decl = function_decl; >>> + >>> +  gcc_assert (new_bb != NULL); >>> +  gseq = bb_seq (new_bb); >>> + >>> + >>> +  convert_expr = build1 (CONVERT_EXPR, ptr_type_node, >>> +             build_fold_addr_expr (version_decl)); >>> +  result_var = create_tmp_var (ptr_type_node, NULL); >>> +  convert_stmt = gimple_build_assign (result_var, convert_expr); >>> +  return_stmt = gimple_build_return (result_var); >>> + >>> +  if (predicate_decl == NULL_TREE) >>> +   { >>> +    gimple_seq_add_stmt (&gseq, convert_stmt); >>> +    gimple_seq_add_stmt (&gseq, return_stmt); >>> +    set_bb_seq (new_bb, gseq); >>> +    gimple_set_bb (convert_stmt, new_bb); >>> +    gimple_set_bb (return_stmt, new_bb); >>> +    pop_cfun (); >>> +    
current_function_decl = old_current_function_decl; >>> +    return new_bb; >>> +   } >>> + >>> +  cond_var = create_tmp_var (integer_type_node, NULL); >>> +  call_cond_stmt = gimple_build_call (predicate_decl, 0); >>> +  gimple_call_set_lhs (call_cond_stmt, cond_var); >>> + >>> +  gimple_set_block (call_cond_stmt, DECL_INITIAL (function_decl)); >>> +  gimple_set_bb (call_cond_stmt, new_bb); >>> +  gimple_seq_add_stmt (&gseq, call_cond_stmt); >>> + >>> +  if_else_stmt = gimple_build_cond (GT_EXPR, cond_var, >>> +                  integer_zero_node, >>> +                  NULL_TREE, NULL_TREE); >>> +  gimple_set_block (if_else_stmt, DECL_INITIAL (function_decl)); >>> +  gimple_set_bb (if_else_stmt, new_bb); >>> +  gimple_seq_add_stmt (&gseq, if_else_stmt); >>> + >>> +  gimple_seq_add_stmt (&gseq, convert_stmt); >>> +  gimple_seq_add_stmt (&gseq, return_stmt); >>> +  set_bb_seq (new_bb, gseq); >>> + >>> +  bb1 = new_bb; >>> +  e12 = split_block (bb1, if_else_stmt); >>> +  bb2 = e12->dest; >>> +  e12->flags &= ~EDGE_FALLTHRU; >>> +  e12->flags |= EDGE_TRUE_VALUE; >>> + >>> +  e23 = split_block (bb2, return_stmt); >>> + >>> +  gimple_set_bb (convert_stmt, bb2); >>> +  gimple_set_bb (return_stmt, bb2); >>> + >>> +  bb3 = e23->dest; >>> +  make_edge (bb1, bb3, EDGE_FALSE_VALUE); >>> + >>> +  remove_edge (e23); >>> +  make_edge (bb2, EXIT_BLOCK_PTR, 0); >>> + >>> +  rebuild_cgraph_edges (); >>> + >>> +  pop_cfun (); >>> +  current_function_decl = old_current_function_decl; >>> + >>> +  return bb3; >>> +} >>> + >>> +/* This parses the attribute arguments to targetv in DECL and determines >>> +  the right builtin to use to match the platform specification. >>> +  For now, only one target argument ("arch=") is allowed.  
*/ >>> + >>> +static enum ix86_builtins >>> +get_builtin_code_for_version (tree decl) >>> +{ >>> +  tree attrs; >>> +  struct cl_target_option cur_target; >>> +  tree target_node; >>> +  struct cl_target_option *new_target; >>> +  enum ix86_builtins builtin_code = IX86_BUILTIN_MAX; >>> + >>> +  attrs = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl)); >>> +  gcc_assert (attrs != NULL); >>> + >>> +  cl_target_option_save (&cur_target, &global_options); >>> + >>> +  target_node = ix86_valid_target_attribute_tree >>> +         (TREE_VALUE (TREE_VALUE (attrs))); >>> + >>> +  gcc_assert (target_node); >>> +  new_target = TREE_TARGET_OPTION (target_node); >>> +  gcc_assert (new_target); >>> + >>> +  if (new_target->arch_specified && new_target->arch > 0) >>> +   { >>> +    switch (new_target->arch) >>> +     { >>> +    case 1: >>> +    case 2: >>> +    case 3: >>> +    case 4: >>> +    case 5: >>> +    case 6: >>> +    case 7: >>> +    case 8: >>> +    case 9: >>> +    case 10: >>> +    case 11: >>> +     builtin_code = IX86_BUILTIN_CPU_IS_INTEL; >>> +     break; >>> +    case 12: >>> +     builtin_code = IX86_BUILTIN_CPU_IS_INTEL_CORE2; >>> +     break; >>> +    case 13: >>> +     builtin_code = IX86_BUILTIN_CPU_IS_INTEL_COREI7; >>> +     break; >>> +    case 14: >>> +     builtin_code = IX86_BUILTIN_CPU_IS_INTEL_ATOM; >>> +     break; >>> +    case 15: >>> +    case 16: >>> +    case 17: >>> +    case 18: >>> +    case 19: >>> +    case 20: >>> +    case 21: >>> +     builtin_code = IX86_BUILTIN_CPU_IS_AMD; >>> +     break; >>> +    case 22: >>> +     builtin_code = IX86_BUILTIN_CPU_IS_AMDFAM10H; >>> +     break; >>> +    case 23: >>> +     builtin_code = IX86_BUILTIN_CPU_IS_AMDFAM15H_BDVER1; >>> +     break; >>> +    case 24: >>> +     builtin_code = IX86_BUILTIN_CPU_IS_AMDFAM15H_BDVER2; >>> +     break; >>> +    case 25: /* What is btver1 ? 
*/ >>> +     builtin_code = IX86_BUILTIN_CPU_IS_AMD; >>> +     break; >>> +    } >>> +   } >>> + >>> +  cl_target_option_restore (&global_options, &cur_target); >>> +  if (builtin_code == IX86_BUILTIN_MAX) >>> +    error_at (DECL_SOURCE_LOCATION (decl), >>> +        "No dispatcher found for the versioning attributes"); >>> + >>> +  return builtin_code; >>> +} >>> + >>> +/* This is the target hook to generate the dispatch function for >>> +  multi-versioned functions.  DISPATCH_DECL is the function which will >>> +  contain the dispatch logic.  FNDECLS are the function choices for >>> +  dispatch, and is a tree chain.  EMPTY_BB is the basic block pointer >>> +  in DISPATCH_DECL in which the dispatch code is generated.  */ >>> + >>> +static int >>> +ix86_dispatch_version (tree dispatch_decl, >>> +            void *fndecls_p, >>> +            basic_block *empty_bb) >>> +{ >>> +  tree default_decl; >>> +  gimple ifunc_cpu_init_stmt; >>> +  gimple_seq gseq; >>> +  tree old_current_function_decl; >>> +  int ix; >>> +  tree ele; >>> +  VEC (tree, heap) *fndecls; >>> + >>> +  gcc_assert (dispatch_decl != NULL >>> +       && fndecls_p != NULL >>> +       && empty_bb != NULL); >>> + >>> +  /*fndecls_p is actually a vector.  */ >>> +  fndecls = (VEC (tree, heap) *)fndecls_p; >>> + >>> +  /* Atleast one more version other than the default.  */ >>> +  gcc_assert (VEC_length (tree, fndecls) >= 2); >>> + >>> +  /* The first version in the vector is the default decl.  
*/ >>> +  default_decl = VEC_index (tree, fndecls, 0); >>> + >>> +  old_current_function_decl = current_function_decl; >>> +  push_cfun (DECL_STRUCT_FUNCTION (dispatch_decl)); >>> +  current_function_decl = dispatch_decl; >>> + >>> +  gseq = bb_seq (*empty_bb); >>> +  ifunc_cpu_init_stmt = gimple_build_call_vec ( >>> +           ix86_builtins [(int) IX86_BUILTIN_CPU_INIT], NULL); >>> +  gimple_seq_add_stmt (&gseq, ifunc_cpu_init_stmt); >>> +  gimple_set_bb (ifunc_cpu_init_stmt, *empty_bb); >>> +  set_bb_seq (*empty_bb, gseq); >>> + >>> +  pop_cfun (); >>> +  current_function_decl = old_current_function_decl; >>> + >>> + >>> +  for (ix = 1; VEC_iterate (tree, fndecls, ix, ele); ++ix) >>> +   { >>> +    tree version_decl = ele; >>> +    /* Get attribute string, parse it and find the right predicate decl. >>> +     The predicate function could be a lengthy combination of many >>> +     features, like arch-type and various isa-variants.  For now, only >>> +     check the arch-type.  */ >>> +    tree predicate_decl = ix86_builtins [ >>> +            get_builtin_code_for_version (version_decl)]; >>> +    *empty_bb = add_condition_to_bb (dispatch_decl, version_decl, *empty_bb, >>> +                    predicate_decl); >>> + >>> +   } >>> +  /* dispatch default version at the end.  
*/ >>> +  *empty_bb = add_condition_to_bb (dispatch_decl, default_decl, *empty_bb, >>> +                  NULL); >>> +  return 0; >>> +} >>> >>> @@ -38610,6 +39269,12 @@ ix86_autovectorize_vector_sizes (void) >>>  #undef TARGET_BUILD_BUILTIN_VA_LIST >>>  #define TARGET_BUILD_BUILTIN_VA_LIST ix86_build_builtin_va_list >>> >>> +#undef TARGET_DISPATCH_VERSION >>> +#define TARGET_DISPATCH_VERSION ix86_dispatch_version >>> + >>>  #undef TARGET_ENUM_VA_LIST_P >>>  #define TARGET_ENUM_VA_LIST_P ix86_enum_va_list >>> >>> Index: testsuite/g++.dg/mv1.C >>> =================================================================== >>> --- testsuite/g++.dg/mv1.C    (revision 0) >>> +++ testsuite/g++.dg/mv1.C    (revision 0) >>> @@ -0,0 +1,23 @@ >>> +/* Simple test case to check if Multiversioning works.  */ >>> +/* { dg-do run } */ >>> +/* { dg-options "-O2" } */ >>> + >>> +int foo (); >>> +int foo () __attribute__ ((targetv("arch=corei7"))); >>> + >>> +int main () >>> +{ >>> +  int (*p)() = &foo; >>> +  return foo () + (*p)(); >>> +} >>> + >>> +int foo () >>> +{ >>> +  return 0; >>> +} >>> + >>> +int __attribute__ ((targetv("arch=corei7"))) >>> +foo () >>> +{ >>> +  return 0; >>> +} >>> >>> >>> -- >>> This patch is available for review at http://codereview.appspot.com/5752064
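For readers following the thread, the dispatch order that ix86_dispatch_version sets up in the patch quoted above (check each specialized version in turn, call the first one whose CPU predicate matches, fall back to the default at the end) can be sketched in plain C. This is only an illustration under stated assumptions, not the code the patch generates: detected_cpu, cpu_is_corei7, cpu_is_core2, select_version and the foo_* names are hypothetical stand-ins, and the versions return distinct values only so the selection is observable.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*version_fn) (void);
typedef int (*cpu_predicate) (void);

/* One specialized version plus the predicate that guards it.  */
struct version_entry
{
  cpu_predicate matches;
  version_fn fn;
};

/* Hypothetical stand-in for CPU detection:
   0 = unknown, 1 = corei7, 2 = core2.  */
static int detected_cpu = 0;

static int cpu_is_corei7 (void) { return detected_cpu == 1; }
static int cpu_is_core2 (void) { return detected_cpu == 2; }

/* Hypothetical versions of "foo".  */
static int foo_default (void) { return 0; }
static int foo_corei7 (void) { return 1; }
static int foo_core2 (void) { return 2; }

/* Return the first version whose predicate holds, else DEFAULT_FN.  */
static version_fn
select_version (const struct version_entry *versions, size_t count,
                version_fn default_fn)
{
  size_t i;

  assert (default_fn != NULL);
  for (i = 0; i < count; i++)
    if (versions[i].matches ())
      return versions[i].fn;
  return default_fn;
}

/* Specialized versions in the order they are checked.  */
static const struct version_entry foo_versions[] = {
  { cpu_is_corei7, foo_corei7 },
  { cpu_is_core2, foo_core2 },
};

/* Stand-in for the dispatcher: select a version, then call it.  */
static int
foo (void)
{
  size_t count = sizeof foo_versions / sizeof foo_versions[0];
  return select_version (foo_versions, count, foo_default) ();
}
```

Unlike this sketch, which re-selects on every call, the IFUNC mechanism described in the patch is meant to keep dispatch overhead low, so the per-call loop here should be read as a simplification.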
Hi Richard,

Here is a more detailed overview of the front-end description:

* Tracking decls that correspond to function versions of function name, say "foo": When the front-end sees a decl for "foo" with "targetv" attributes, it tags it as a function version. To prevent duplicate definition errors with other versions of "foo", I change the "decls_match" function in cp/decl.c to return false when 2 decls have the same signature but different targetv attributes. This causes all function versions of "foo" to be added to the overload list of "foo". To expand further, different targetv attributes are detected by sorting the arguments to targetv and comparing the results.

* Change the assembler names of the function versions: The front-end changes the assembler names of the function versions by appending the sorted list of args to "targetv" to the function name of "foo". For example, the assembler name of "void foo () __attribute__ ((targetv ("sse4")))" will become _Z3foov.sse4.

* Separately group all function versions of "foo" together, in multiversion.c: File multiversion.c maintains a hashtab, decl_version_htab, that maps the default function decl of "foo" to the list of all other versions of this function "foo". This is meant to be used when creating the dispatcher for this function.

* Overload resolution: Function "build_over_call" in cp/call.c sees a call to function "foo", which is multi-versioned. The overload resolution happens in function "joust" in "cp/call.c". Here, the call to "foo" has all possible versions of "foo" as candidates. Currently, "joust" returns the default version of "foo" as the winning candidate. But "build_over_call" realizes that this is a versioned function and replaces the call-site of foo with an "ifunc" call for foo, by querying a function in "multiversion.c" which builds the ifunc decl. After this, all call-sites of "foo" contain the call to the ifunc.
Notice that, for calls from an sse function to a versioned function with an sse variant, I can modify "joust" to return the "sse" function version rather than the default and not replace this call with an ifunc. To do this, I must pass the target attributes of the callee to "joust" and check if the target attributes also match any version.

* Creating the dispatcher: The dispatcher is independently created in a new pass, called "pass_dispatch_versions", that runs immediately after cfg and cgraph are created. The dispatcher looks at all possible versions and queries the target to give it the CPU detection predicates it must use to dispatch each version. Then, the dispatcher body is created and the ifunc is mapped to use this dispatcher.

Notice that only the dispatcher creation is done after the front-end. Everything else occurs in the front-end itself. I could have created the dispatcher also in the front-end. I did not do so because I thought keeping it as a separate pass would make it easy to add more dispatch mechanisms; for example, when IFUNC is not available, it could be replaced with control flow that makes direct calls to the function versions. Also, making the dispatcher after "cfg" is created was easy.

Thanks,
-Sri.

On Wed, Mar 7, 2012 at 6:05 AM, Richard Guenther <richard.guenther@gmail.com> wrote:
> On Wed, Mar 7, 2012 at 1:46 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>> User directed Function Multiversioning (MV) via Function Overloading
>> ====================================================================
>>
>> This patch adds support for user directed function MV via function overloading.
>> For more detailed description:
>> http://gcc.gnu.org/ml/gcc/2012-03/msg00074.html
>>
>>
>> Here is an example program with function versions:
>>
>> int foo ();  /* Default version */
>> int foo () __attribute__ ((targetv("arch=corei7")));/*Specialized for corei7 */
>> int foo () __attribute__ ((targetv("arch=core2")));/*Specialized for core2 */
>>
>> int main ()
>> {
>>  int (*p)() = &foo;
>>  return foo () + (*p)();
>> }
>>
>> int foo ()
>> {
>>  return 0;
>> }
>>
>> int __attribute__ ((targetv("arch=corei7")))
>> foo ()
>> {
>>  return 0;
>> }
>>
>> int __attribute__ ((targetv("arch=core2")))
>> foo ()
>> {
>>  return 0;
>> }
>>
>> The above example has foo defined 3 times, but all 3 definitions of foo are
>> different versions of the same function. The call to foo in main, directly and
>> via a pointer, are calls to the multi-versioned function foo which is dispatched
>> to the right foo at run-time.
>>
>> Function versions must have the same signature but must differ in the specifier
>> string provided to a new attribute called "targetv", which is nothing but the
>> target attribute with an extra specification to indicate a version. Any number
>> of versions can be created using the targetv attribute but it is mandatory to
>> have one function without the attribute, which is treated as the default
>> version.
>>
>> The dispatching is done using the IFUNC mechanism to keep the dispatch overhead
>> low. The compiler creates a dispatcher function which checks the CPU type and
>> calls the right version of foo. The dispatching code checks for the platform
>> type and calls the first version that matches. The default function is called if
>> no specialized version is appropriate for execution.
>>
>> The pointer to foo is made to be the address of the dispatcher function, so that
>> it is unique and calls made via the pointer also work correctly. The assembler
>> names of the various versions of foo is made different, by tagging
>> the specifier strings, to keep them unique.  A specific version can be called
>> directly by creating an alias to its assembler name. For instance, to call the
>> corei7 version directly, make an alias :
>> int foo_corei7 () __attribute__((alias ("_Z3foov.arch_corei7")));
>> and then call foo_corei7.
>>
>> Note that using IFUNC  blocks inlining of versioned functions. I had implemented
>> an optimization earlier to do hot path cloning to allow versioned functions to
>> be inlined. Please see : http://gcc.gnu.org/ml/gcc-patches/2011-04/msg02285.html
>> In the next iteration, I plan to merge these two. With that, hot code paths with
>> versioned functions will be cloned so that versioned functions can be inlined.
>
> Note that inlining of functions with the target attribute is limited as well,
> but your issue is that of the indirect dispatch as ...
>
> You don't give an overview of the frontend implementation.  Thus I have
> extracted the following
>
>  - the FE does not really know about the "overloading", nor can it directly
>  resolve calls from a "sse" function to another "sse" function without going
>  through the 2nd IFUNC
>
>  - cgraph also does not know about the "overloading", so it cannot do such
>  "devirtualization" either
>
> you seem to have implemented something inbetween a pure frontend
> solution and a proper middle-end solution.  For optimization and eventually
> automatically selecting functions for cloning (like, callees of a manual "sse"
> versioned function should be cloned?) it would be nice if the cgraph would
> know about the different versions and their relationships (and the dispatcher).
> Especially the cgraph code should know the functions are semantically
> equivalent (I suppose we should require that).  The IFUNC should be
> generated by cgraph / target code, similar to how we generate C++ thunks.
> > Honza, any suggestions on how the FE side of such cgraph infrastructure > should look like and how we should encode the target bits? > > Thanks, > Richard. > >>     * doc/tm.texi.in: Add description for TARGET_DISPATCH_VERSION. >>     * doc/tm.texi: Regenerate. >>     * c-family/c-common.c (handle_targetv_attribute): New function. >>     * target.def (dispatch_version): New target hook. >>     * tree.h (DECL_FUNCTION_VERSIONED): New macro. >>     (tree_function_decl): New bit-field versioned_function. >>     * tree-pass.h (pass_dispatch_versions): New pass. >>     * multiversion.c: New file. >>     * multiversion.h: New file. >>     * cgraphunit.c: Include multiversion.h >>     (cgraph_finalize_function): Change assembler names of versioned >>     functions. >>     * cp/class.c: Include multiversion.h >>     (add_method): aggregate function versions. Change assembler names of >>     versioned functions. >>     (resolve_address_of_overloaded_function): Match address of function >>     version with default function.  Return address of ifunc dispatcher >>     for address of versioned functions. >>     * cp/decl.c (decls_match): Make decls unmatched for versioned >>     functions. >>     (duplicate_decls): Remove ambiguity for versioned functions. Notify >>     of deleted function version decls. >>     (start_decl): Change assembler name of versioned functions. >>     (start_function): Change assembler name of versioned functions. >>     (cxx_comdat_group): Make comdat group of versioned functions be the >>     same. >>     * cp/semantics.c (expand_or_defer_fn_1): Mark as needed versioned >>     functions that are also marked inline. >>     * cp/decl2.c: Include multiversion.h >>     (check_classfn): Check attributes of versioned functions for match. >>     * cp/call.c: Include multiversion.h >>     (build_over_call): Make calls to multiversioned functions to call the >>     dispatcher. 
>>     (joust): For calls to multi-versioned functions, make the default >>     function win. >>     * timevar.def (TV_MULTIVERSION_DISPATCH): New time var. >>     * varasm.c (finish_aliases_1): Check if the alias points to a function >>     with a body before giving an error. >>     * Makefile.in: Add multiversion.o >>     * passes.c: Add pass_dispatch_versions to the pass list. >>     * config/i386/i386.c (add_condition_to_bb): New function. >>     (get_builtin_code_for_version): New function. >>     (ix86_dispatch_version): New function. >>     (TARGET_DISPATCH_VERSION): New macro. >>     * testsuite/g++.dg/mv1.C: New test. >> >> Index: doc/tm.texi >> =================================================================== >> --- doc/tm.texi (revision 184971) >> +++ doc/tm.texi (working copy) >> @@ -10995,6 +10995,14 @@ The result is another tree containing a simplified >>  call's result.  If @var{ignore} is true the value will be ignored. >>  @end deftypefn >> >> +@deftypefn {Target Hook} int TARGET_DISPATCH_VERSION (tree @var{dispatch_decl}, void *@var{fndecls}, basic_block *@var{empty_bb}) >> +For multi-versioned function, this hook sets up the dispatcher. >> +@var{dispatch_decl} is the function that will be used to dispatch the >> +version. @var{fndecls} are the function choices for dispatch. >> +@var{empty_bb} is an basic block in @var{dispatch_decl} where the >> +code to do the dispatch will be added. >> +@end deftypefn >> + >>  @deftypefn {Target Hook} {const char *} TARGET_INVALID_WITHIN_DOLOOP (const_rtx @var{insn}) >> >>  Take an instruction in @var{insn} and return NULL if it is valid within a >> Index: doc/tm.texi.in >> =================================================================== >> --- doc/tm.texi.in    (revision 184971) >> +++ doc/tm.texi.in    (working copy) >> @@ -10873,6 +10873,14 @@ The result is another tree containing a simplified >>  call's result.  If @var{ignore} is true the value will be ignored. 
>>  @end deftypefn >> >> +@hook TARGET_DISPATCH_VERSION >> +For multi-versioned function, this hook sets up the dispatcher. >> +@var{dispatch_decl} is the function that will be used to dispatch the >> +version. @var{fndecls} are the function choices for dispatch. >> +@var{empty_bb} is an basic block in @var{dispatch_decl} where the >> +code to do the dispatch will be added. >> +@end deftypefn >> + >>  @hook TARGET_INVALID_WITHIN_DOLOOP >> >>  Take an instruction in @var{insn} and return NULL if it is valid within a >> Index: c-family/c-common.c >> =================================================================== >> --- c-family/c-common.c (revision 184971) >> +++ c-family/c-common.c (working copy) >> @@ -315,6 +315,7 @@ static tree check_case_value (tree); >>  static bool check_case_bounds (tree, tree, tree *, tree *); >> >>  static tree handle_packed_attribute (tree *, tree, tree, int, bool *); >> +static tree handle_targetv_attribute (tree *, tree, tree, int, bool *); >>  static tree handle_nocommon_attribute (tree *, tree, tree, int, bool *); >>  static tree handle_common_attribute (tree *, tree, tree, int, bool *); >>  static tree handle_noreturn_attribute (tree *, tree, tree, int, bool *); >> @@ -604,6 +605,8 @@ const struct attribute_spec c_common_attribute_tab >>  { >>  /* { name, min_len, max_len, decl_req, type_req, fn_type_req, handler, >>     affects_type_identity } */ >> +  { "targetv",        1, -1, true, false, false, >> +               handle_targetv_attribute, false }, >>  { "packed",         0, 0, false, false, false, >>                handle_packed_attribute , false}, >>  { "nocommon",        0, 0, true,  false, false, >> @@ -5869,6 +5872,54 @@ handle_packed_attribute (tree *node, tree name, tr >>  return NULL_TREE; >>  } >> >> +/* The targetv attribue is used to specify a function version >> +  targeted to specific platform types.  The "targetv" attributes >> +  have to be valid "target" attributes.  
NODE should always point >> +  to a FUNCTION_DECL.  ARGS contain the arguments to "targetv" >> +  which should be valid arguments to attribute "target" too. >> +  Check handle_target_attribute for FLAGS and NO_ADD_ATTRS.  */ >> + >> +static tree >> +handle_targetv_attribute (tree *node, tree name, >> +             tree args, >> +             int flags, >> +             bool *no_add_attrs) >> +{ >> +  const char *attr_str = NULL; >> +  gcc_assert (TREE_CODE (*node) == FUNCTION_DECL); >> +  gcc_assert (args != NULL); >> + >> +  /* This is a function version.  */ >> +  DECL_FUNCTION_VERSIONED (*node) = 1; >> + >> +  attr_str = TREE_STRING_POINTER (TREE_VALUE (args)); >> + >> +  /* Check if multiple sets of target attributes are there.  This >> +   is not supported now.  In future, this will be supported by >> +   cloning this function for each set.  */ >> +  if (TREE_CHAIN (args) != NULL) >> +   warning (OPT_Wattributes, "%qE attribute has multiple sets which " >> +       "is not supported", name); >> + >> +  if (attr_str == NULL >> +    || strstr (attr_str, "arch=") == NULL) >> +   error_at (DECL_SOURCE_LOCATION (*node), >> +       "Versioning supported only on \"arch=\" for now"); >> + >> +  /* targetv attributes must translate into target attributes.  */ >> +  handle_target_attribute (node, get_identifier ("target"), args, flags, >> +              no_add_attrs); >> + >> +  if (*no_add_attrs) >> +   warning (OPT_Wattributes, "%qE attribute has no effect", name); >> + >> +  /* This is necessary to keep the attribute tagged to the decl >> +   all the time.  */ >> +  *no_add_attrs = false; >> + >> +  return NULL_TREE; >> +} >> + >>  /* Handle a "nocommon" attribute; arguments as in >>   struct attribute_spec.handler.  
*/ >> >> Index: target.def >> =================================================================== >> --- target.def  (revision 184971) >> +++ target.def  (working copy) >> @@ -1249,6 +1249,15 @@ DEFHOOK >>  tree, (tree fndecl, int n_args, tree *argp, bool ignore), >>  hook_tree_tree_int_treep_bool_null) >> >> +/* Target hook to generate the dispatching code for calls to multi-versioned >> +  functions.  DISPATCH_DECL is the function that will have the dispatching >> +  logic.  FNDECLS are the list of choices for dispatch and EMPTY_BB is the >> +  basic bloc in DISPATCH_DECL which will contain the code.  */ >> +DEFHOOK >> +(dispatch_version, >> + "", >> + int, (tree dispatch_decl, void *fndecls, basic_block *empty_bb), NULL) >> + >>  /* Returns a code for a target-specific builtin that implements >>   reciprocal of the function, or NULL_TREE if not available.  */ >>  DEFHOOK >> Index: tree.h >> =================================================================== >> --- tree.h    (revision 184971) >> +++ tree.h    (working copy) >> @@ -3532,6 +3532,12 @@ extern VEC(tree, gc) **decl_debug_args_insert (tre >>  #define DECL_FUNCTION_SPECIFIC_OPTIMIZATION(NODE) \ >>   (FUNCTION_DECL_CHECK (NODE)->function_decl.function_specific_optimization) >> >> +/* In FUNCTION_DECL, this is set if this function has other versions generated >> +  using "targetv" attributes.  The default version is the one which does not >> +  have any "targetv" attribute set. */ >> +#define DECL_FUNCTION_VERSIONED(NODE)\ >> +  (FUNCTION_DECL_CHECK (NODE)->function_decl.versioned_function) >> + >>  /* FUNCTION_DECL inherits from DECL_NON_COMMON because of the use of the >>   arguments/result/saved_tree fields by front ends.  
It was either inherit >>   FUNCTION_DECL from non_common, or inherit non_common from FUNCTION_DECL, >> @@ -3576,8 +3582,8 @@ struct GTY(()) tree_function_decl { >>  unsigned looping_const_or_pure_flag : 1; >>  unsigned has_debug_args_flag : 1; >>  unsigned tm_clone_flag : 1; >> - >> -  /* 1 bit left */ >> +  unsigned versioned_function : 1; >> +  /* No bits left.  */ >>  }; >> >>  /* The source language of the translation-unit.  */ >> Index: tree-pass.h >> =================================================================== >> --- tree-pass.h (revision 184971) >> +++ tree-pass.h (working copy) >> @@ -455,6 +455,7 @@ extern struct gimple_opt_pass pass_tm_memopt; >>  extern struct gimple_opt_pass pass_tm_edges; >>  extern struct gimple_opt_pass pass_split_functions; >>  extern struct gimple_opt_pass pass_feedback_split_functions; >> +extern struct gimple_opt_pass pass_dispatch_versions; >> >>  /* IPA Passes */ >>  extern struct simple_ipa_opt_pass pass_ipa_lower_emutls; >> Index: multiversion.c >> =================================================================== >> --- multiversion.c    (revision 0) >> +++ multiversion.c    (revision 0) >> @@ -0,0 +1,798 @@ >> +/* Function Multiversioning. >> +  Copyright (C) 2012 Free Software Foundation, Inc. >> +  Contributed by Sriraman Tallam (tmsriram@google.com) >> + >> +This file is part of GCC. >> + >> +GCC is free software; you can redistribute it and/or modify it under >> +the terms of the GNU General Public License as published by the Free >> +Software Foundation; either version 3, or (at your option) any later >> +version. >> + >> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY >> +WARRANTY; without even the implied warranty of MERCHANTABILITY or >> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License >> +for more details. >> + >> +You should have received a copy of the GNU General Public License >> +along with GCC; see the file COPYING3.  
If not see >> +<http://www.gnu.org/licenses/>. */ >> + >> +/* Holds the state for multi-versioned functions here. The front-end >> +  updates the state as and when function versions are encountered. >> +  This is then used to generate the dispatch code.  Also, the >> +  optimization passes to clone hot paths involving versioned functions >> +  will be done here. >> + >> +  Function versions are created by using the same function signature but >> +  also tagging attribute "targetv" to specify the platform type for which >> +  the version must be executed.  Here is an example: >> + >> +  int foo () >> +  { >> +   printf ("Execute as default"); >> +   return 0; >> +  } >> + >> +  int  __attribute__ ((targetv ("arch=corei7"))) >> +  foo () >> +  { >> +   printf ("Execute for corei7"); >> +   return 0; >> +  } >> + >> +  int main () >> +  { >> +   return foo (); >> +  } >> + >> +  The call to foo in main is replaced with a call to an IFUNC function that >> +  contains the dispatch code to call the correct function version at >> +  run-time.  
*/ >> + >> + >> +#include "config.h" >> +#include "system.h" >> +#include "coretypes.h" >> +#include "tm.h" >> +#include "tree.h" >> +#include "tree-inline.h" >> +#include "langhooks.h" >> +#include "flags.h" >> +#include "cgraph.h" >> +#include "diagnostic.h" >> +#include "toplev.h" >> +#include "timevar.h" >> +#include "params.h" >> +#include "fibheap.h" >> +#include "intl.h" >> +#include "tree-pass.h" >> +#include "hashtab.h" >> +#include "coverage.h" >> +#include "ggc.h" >> +#include "tree-flow.h" >> +#include "rtl.h" >> +#include "ipa-prop.h" >> +#include "basic-block.h" >> +#include "toplev.h" >> +#include "dbgcnt.h" >> +#include "tree-dump.h" >> +#include "output.h" >> +#include "vecprim.h" >> +#include "gimple-pretty-print.h" >> +#include "ipa-inline.h" >> +#include "target.h" >> +#include "multiversion.h" >> + >> +typedef void * void_p; >> + >> +DEF_VEC_P (void_p); >> +DEF_VEC_ALLOC_P (void_p, heap); >> + >> +/* Each function decl that is a function version gets an instance of this >> +  structure.  Since this is called by the front-end, decl merging can >> +  happen, where a decl created for a new declaration is merged with >> +  the old. In this case, the new decl is deleted and the IS_DELETED >> +  field is set for the struct instance corresponding to the new decl. >> +  IFUNC_DECL is the decl of the ifunc function for default decls. >> +  IFUNC_RESOLVER_DECL is the decl of the dispatch function.  VERSIONS >> +  is a vector containing the list of function versions  that are >> +  the candidates for dispatch.  */ >> + >> +typedef struct version_function_d { >> +  tree decl; >> +  tree ifunc_decl; >> +  tree ifunc_resolver_decl; >> +  VEC (void_p, heap) *versions; >> +  bool is_deleted; >> +} version_function; >> + >> +/* Hashmap has an entry for every function decl that has other function >> +  versions.  For function decls that are the default, it also stores the >> +  list of all the other function versions.  
Each entry is a structure >> +  of type version_function_d.  */ >> +static htab_t decl_version_htab = NULL; >> + >> +/* Hashtable helpers for decl_version_htab. */ >> + >> +static hashval_t >> +decl_version_htab_hash_descriptor (const void *p) >> +{ >> +  const version_function *t = (const version_function *) p; >> +  return htab_hash_pointer (t->decl); >> +} >> + >> +/* Hashtable helper for decl_version_htab. */ >> + >> +static int >> +decl_version_htab_eq_descriptor (const void *p1, const void *p2) >> +{ >> +  const version_function *t1 = (const version_function *) p1; >> +  return htab_eq_pointer ((const void_p) t1->decl, p2); >> +} >> + >> +/* Create the decl_version_htab.  */ >> +static void >> +create_decl_version_htab (void) >> +{ >> +  if (decl_version_htab == NULL) >> +   decl_version_htab = htab_create (10, decl_version_htab_hash_descriptor, >> +                   decl_version_htab_eq_descriptor, NULL); >> +} >> + >> +/* Creates an instance of version_function for decl DECL.  */ >> + >> +static version_function* >> +new_version_function (const tree decl) >> +{ >> +  version_function *v; >> +  v = (version_function *)xmalloc(sizeof (version_function)); >> +  v->decl = decl; >> +  v->ifunc_decl = NULL; >> +  v->ifunc_resolver_decl = NULL; >> +  v->versions = NULL; >> +  v->is_deleted = false; >> +  return v; >> +} >> + >> +/* Comparator function to be used in qsort routine to sort attribute >> +  specification strings to "targetv".  */ >> + >> +static int >> +attr_strcmp (const void *v1, const void *v2) >> +{ >> +  const char *c1 = *(char *const*)v1; >> +  const char *c2 = *(char *const*)v2; >> +  return strcmp (c1, c2); >> +} >> + >> +/* STR is the argument to targetv attribute.  This function tokenizes >> +  the comma separated arguments, sorts them and returns a string which >> +  is a unique identifier for the comma separated arguments.  
*/ >> + >> +static char * >> +sorted_attr_string (const char *str) >> +{ >> +  char **args = NULL; >> +  char *attr_str, *ret_str; >> +  char *attr = NULL; >> +  unsigned int argnum = 1; >> +  unsigned int i; >> + >> +  for (i = 0; i < strlen (str); i++) >> +   if (str[i] == ',') >> +    argnum++; >> + >> +  attr_str = (char *)xmalloc (strlen (str) + 1); >> +  strcpy (attr_str, str); >> + >> +  for (i = 0; i < strlen (attr_str); i++) >> +   if (attr_str[i] == '=') >> +    attr_str[i] = '_'; >> + >> +  if (argnum == 1) >> +   return attr_str; >> + >> +  args = (char **)xmalloc (argnum * sizeof (char *)); >> + >> +  i = 0; >> +  attr = strtok (attr_str, ","); >> +  while (attr != NULL) >> +   { >> +    args[i] = attr; >> +    i++; >> +    attr = strtok (NULL, ","); >> +   } >> + >> +  qsort (args, argnum, sizeof (char*), attr_strcmp); >> + >> +  ret_str = (char *)xmalloc (strlen (str) + 1); >> +  strcpy (ret_str, args[0]); >> +  for (i = 1; i < argnum; i++) >> +   { >> +    strcat (ret_str, "_"); >> +    strcat (ret_str, args[i]); >> +   } >> + >> +  free (args); >> +  free (attr_str); >> +  return ret_str; >> +} >> + >> +/* Returns true when only one of DECL1 and DECL2 is marked with "targetv" >> +  or if the "targetv" attribute strings of DECL1 and DECL2 dont match.  
*/ >> + >> +bool >> +has_different_version_attributes (const tree decl1, const tree decl2) >> +{ >> +  tree attr1, attr2; >> +  char *c1, *c2; >> +  bool ret = false; >> + >> +  if (TREE_CODE (decl1) != FUNCTION_DECL >> +    || TREE_CODE (decl2) != FUNCTION_DECL) >> +   return false; >> + >> +  attr1 = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl1)); >> +  attr2 = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl2)); >> + >> +  if (attr1 == NULL_TREE && attr2 == NULL_TREE) >> +   return false; >> + >> +  if ((attr1 == NULL_TREE && attr2 != NULL_TREE) >> +    || (attr1 != NULL_TREE && attr2 == NULL_TREE)) >> +   return true; >> + >> +  c1 = sorted_attr_string ( >> +    TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr1)))); >> +  c2 = sorted_attr_string ( >> +    TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr2)))); >> + >> +  if (strcmp (c1, c2) != 0) >> +   ret = true; >> + >> +  free (c1); >> +  free (c2); >> + >> +  return ret; >> +} >> + >> +/* If this decl corresponds to a function and has "targetv" attribute, >> +  append the attribute string to its assembler name.  
*/ >> + >> +void >> +version_assembler_name (const tree decl) >> +{ >> +  tree version_attr; >> +  const char *orig_name, *version_string, *attr_str; >> +  char *assembler_name; >> +  tree assembler_name_tree; >> + >> +  if (TREE_CODE (decl) != FUNCTION_DECL >> +    || DECL_ASSEMBLER_NAME_SET_P (decl) >> +    || !DECL_FUNCTION_VERSIONED (decl)) >> +   return; >> + >> +  if (DECL_DECLARED_INLINE_P (decl) >> +    &&lookup_attribute ("gnu_inline", >> +             DECL_ATTRIBUTES (decl))) >> +   error_at (DECL_SOURCE_LOCATION (decl), >> +       "Function versions cannot be marked as gnu_inline," >> +       " bodies have to be generated\n"); >> + >> +  if (DECL_VIRTUAL_P (decl) >> +    || DECL_VINDEX (decl)) >> +   error_at (DECL_SOURCE_LOCATION (decl), >> +       "Virtual function versioning not supported\n"); >> + >> +  version_attr = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl)); >> +  /* targetv attribute string is NULL for default functions.  */ >> +  if (version_attr == NULL_TREE) >> +   return; >> + >> +  orig_name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)); >> +  version_string >> +   = TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (version_attr))); >> + >> +  attr_str = sorted_attr_string (version_string); >> +  assembler_name = (char *) xmalloc (strlen (orig_name) >> +                   + strlen (attr_str) + 2); >> + >> +  sprintf (assembler_name, "%s.%s", orig_name, attr_str); >> +  if (dump_file) >> +   fprintf (dump_file, "Assembler name set to %s for function version %s\n", >> +       assembler_name, IDENTIFIER_POINTER (DECL_NAME (decl))); >> +  assembler_name_tree = get_identifier (assembler_name); >> +  SET_DECL_ASSEMBLER_NAME (decl, assembler_name_tree); >> +} >> + >> +/* Returns true if decl is multi-versioned and DECL is the default function, >> +  that is it is not tagged with "targetv" attribute.  
*/ >> + >> +bool >> +is_default_function (const tree decl) >> +{ >> +  return (TREE_CODE (decl) == FUNCTION_DECL >> +     && DECL_FUNCTION_VERSIONED (decl) >> +     && (lookup_attribute ("targetv", DECL_ATTRIBUTES (decl)) >> +       == NULL_TREE)); >> +} >> + >> +/* For function decl DECL, find the version_function struct in the >> +  decl_version_htab.  */ >> + >> +static version_function * >> +find_function_version (const tree decl) >> +{ >> +  void *slot; >> + >> +  if (!DECL_FUNCTION_VERSIONED (decl)) >> +   return NULL; >> + >> +  if (!decl_version_htab) >> +   return NULL; >> + >> +  slot = htab_find_with_hash (decl_version_htab, decl, >> +                htab_hash_pointer (decl)); >> + >> +  if (slot != NULL) >> +   return (version_function *)slot; >> + >> +  return NULL; >> +} >> + >> +/* Record DECL as a function version by creating a version_function struct >> +  for it and storing it in the hashtable.  */ >> + >> +static version_function * >> +add_function_version (const tree decl) >> +{ >> +  void **slot; >> +  version_function *v; >> + >> +  if (!DECL_FUNCTION_VERSIONED (decl)) >> +   return NULL; >> + >> +  create_decl_version_htab (); >> + >> +  slot = htab_find_slot_with_hash (decl_version_htab, (const void_p)decl, >> +                  htab_hash_pointer ((const void_p)decl), >> +                  INSERT); >> + >> +  if (*slot != NULL) >> +   return (version_function *)*slot; >> + >> +  v = new_version_function (decl); >> +  *slot = v; >> + >> +  return v; >> +} >> + >> +/* Push V into VEC only if it is not already present.  */ >> + >> +static void >> +push_function_version (version_function *v, VEC (void_p, heap) *vec) >> +{ >> +  int ix; >> +  void_p ele; >> +  for (ix = 0; VEC_iterate (void_p, vec, ix, ele); ++ix) >> +   { >> +    if (ele == (void_p)v) >> +     return; >> +   } >> + >> +  VEC_safe_push (void_p, heap, vec, (void*)v); >> +} >> + >> +/* Mark DECL as deleted.  
This is called by the front-end when a duplicate >> +  decl is merged with the original decl and the duplicate decl is deleted. >> +  This function marks the duplicate_decl as invalid.  Called by >> +  duplicate_decls in cp/decl.c.  */ >> + >> +void >> +mark_delete_decl_version (const tree decl) >> +{ >> +  version_function *decl_v; >> + >> +  decl_v = find_function_version (decl); >> + >> +  if (decl_v == NULL) >> +   return; >> + >> +  decl_v->is_deleted = true; >> + >> +  if (is_default_function (decl) >> +    && decl_v->versions != NULL) >> +   { >> +    VEC_truncate (void_p, decl_v->versions, 0); >> +    VEC_free (void_p, heap, decl_v->versions); >> +   } >> +} >> + >> +/* Mark DECL1 and DECL2 to be function versions in the same group.  One >> +  of DECL1 and DECL2 must be the default, otherwise this function does >> +  nothing.  This function aggregates the versions.  */ >> + >> +int >> +group_function_versions (const tree decl1, const tree decl2) >> +{ >> +  tree default_decl, version_decl; >> +  version_function *default_v, *version_v; >> + >> +  gcc_assert (DECL_FUNCTION_VERSIONED (decl1) >> +       && DECL_FUNCTION_VERSIONED (decl2)); >> + >> +  /* The version decls are added only to the default decl.  */ >> +  if (!is_default_function (decl1) >> +    && !is_default_function (decl2)) >> +   return 0; >> + >> +  /* This can happen with duplicate declarations.  Just ignore.  */ >> +  if (is_default_function (decl1) >> +    && is_default_function (decl2)) >> +   return 0; >> + >> +  default_decl = (is_default_function (decl1)) ? decl1 : decl2; >> +  version_decl = (default_decl == decl1) ? decl2 : decl1; >> + >> +  gcc_assert (default_decl != version_decl); >> +  create_decl_version_htab (); >> + >> +  /* If the version function is found, it has been added.  
*/ >> +  if (find_function_version (version_decl)) >> +   return 0; >> + >> +  default_v = add_function_version (default_decl); >> +  version_v = add_function_version (version_decl); >> + >> +  if (default_v->versions == NULL) >> +   default_v->versions = VEC_alloc (void_p, heap, 1); >> + >> +  push_function_version (version_v, default_v->versions); >> +  return 0; >> +} >> + >> +/* Makes a function attribute of the form NAME(ARG_NAME) and chains >> +  it to CHAIN.  */ >> + >> +static tree >> +make_attribute (const char *name, const char *arg_name, tree chain) >> +{ >> +  tree attr_name; >> +  tree attr_arg_name; >> +  tree attr_args; >> +  tree attr; >> + >> +  attr_name = get_identifier (name); >> +  attr_arg_name = build_string (strlen (arg_name), arg_name); >> +  attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE); >> +  attr = tree_cons (attr_name, attr_args, chain); >> +  return attr; >> +} >> + >> +/* Return a new name by appending SUFFIX to the DECL name.  If >> +  make_unique is true, append the full path name.  */ >> + >> +static char * >> +make_name (tree decl, const char *suffix, bool make_unique) >> +{ >> +  char *global_var_name; >> +  int name_len; >> +  const char *name; >> +  const char *unique_name = NULL; >> + >> +  name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)); >> + >> +  /* Get a unique name that can be used globally without any chances >> +   of collision at link time.  */ >> +  if (make_unique) >> +   unique_name = IDENTIFIER_POINTER (get_file_function_name ("\0")); >> + >> +  name_len = strlen (name) + strlen (suffix) + 2; >> + >> +  if (make_unique) >> +   name_len += strlen (unique_name) + 1; >> +  global_var_name = (char *) xmalloc (name_len); >> + >> +  /* Use '.' to concatenate names as it is demangler friendly.  
*/
>> +  if (make_unique)
>> +    snprintf (global_var_name, name_len, "%s.%s.%s", name,
>> +        unique_name, suffix);
>> +  else
>> +    snprintf (global_var_name, name_len, "%s.%s", name, suffix);
>> +
>> +  return global_var_name;
>> +}
>> +
>> +/* Make the resolver function decl for ifunc (IFUNC_DECL) to dispatch
>> +  the versions of multi-versioned function DEFAULT_DECL.  Create an
>> +  empty basic block in the resolver and store the pointer in
>> +  EMPTY_BB.  Return the decl of the resolver function.  */
>> +
>> +static tree
>> +make_ifunc_resolver_func (const tree default_decl,
>> +             const tree ifunc_decl,
>> +             basic_block *empty_bb)
>> +{
>> +  char *resolver_name;
>> +  tree decl, type, decl_name, t;
>> +  basic_block new_bb;
>> +  tree old_current_function_decl;
>> +  bool make_unique = false;
>> +
>> +  /* IFUNC's have to be globally visible.  So, if the default_decl is
>> +   not, then the name of the IFUNC should be made unique.  */
>> +  if (TREE_PUBLIC (default_decl) == 0)
>> +   make_unique = true;
>> +
>> +  /* Append the filename to the resolver function if the versions are
>> +   not externally visible.  This is because the resolver function has
>> +   to be externally visible for the loader to find it.  So, appending
>> +   the filename will prevent conflicts with a resolver function from
>> +   another module which is based on the same version name.  */
>> +  resolver_name = make_name (default_decl, "resolver", make_unique);
>> +
>> +  /* The resolver function should return a (void *).
*/ >> +  type = build_function_type_list (ptr_type_node, NULL_TREE); >> + >> +  decl = build_fn_decl (resolver_name, type); >> +  decl_name = get_identifier (resolver_name); >> +  SET_DECL_ASSEMBLER_NAME (decl, decl_name); >> + >> +  DECL_NAME (decl) = decl_name; >> +  TREE_USED (decl) = TREE_USED (default_decl); >> +  DECL_ARTIFICIAL (decl) = 1; >> +  DECL_IGNORED_P (decl) = 0; >> +  /* IFUNC resolvers have to be externally visible.  */ >> +  TREE_PUBLIC (decl) = 1; >> +  DECL_UNINLINABLE (decl) = 1; >> + >> +  DECL_EXTERNAL (decl) = DECL_EXTERNAL (default_decl); >> +  DECL_EXTERNAL (ifunc_decl) = 0; >> + >> +  DECL_CONTEXT (decl) = NULL_TREE; >> +  DECL_INITIAL (decl) = make_node (BLOCK); >> +  DECL_STATIC_CONSTRUCTOR (decl) = 0; >> +  TREE_READONLY (decl) = 0; >> +  DECL_PURE_P (decl) = 0; >> +  DECL_COMDAT (decl) = DECL_COMDAT (default_decl); >> +  if (DECL_COMDAT_GROUP (default_decl)) >> +   { >> +    make_decl_one_only (decl, DECL_COMDAT_GROUP (default_decl)); >> +   } >> +  /* Build result decl and add to function_decl. 
*/
>> +  t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, ptr_type_node);
>> +  DECL_ARTIFICIAL (t) = 1;
>> +  DECL_IGNORED_P (t) = 1;
>> +  DECL_RESULT (decl) = t;
>> +
>> +  gimplify_function_tree (decl);
>> +  old_current_function_decl = current_function_decl;
>> +  push_cfun (DECL_STRUCT_FUNCTION (decl));
>> +  current_function_decl = decl;
>> +  init_empty_tree_cfg_for_function (DECL_STRUCT_FUNCTION (decl));
>> +  cfun->curr_properties |=
>> +   (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_referenced_vars |
>> +   PROP_ssa);
>> +  new_bb = create_empty_bb (ENTRY_BLOCK_PTR);
>> +  make_edge (ENTRY_BLOCK_PTR, new_bb, EDGE_FALLTHRU);
>> +  make_edge (new_bb, EXIT_BLOCK_PTR, 0);
>> +  *empty_bb = new_bb;
>> +
>> +  cgraph_add_new_function (decl, true);
>> +  cgraph_call_function_insertion_hooks (cgraph_get_create_node (decl));
>> +  cgraph_analyze_function (cgraph_get_create_node (decl));
>> +  cgraph_mark_needed_node (cgraph_get_create_node (decl));
>> +
>> +  if (DECL_COMDAT_GROUP (default_decl))
>> +   {
>> +    gcc_assert (cgraph_get_node (default_decl));
>> +    cgraph_add_to_same_comdat_group (cgraph_get_node (decl),
>> +                    cgraph_get_node (default_decl));
>> +   }
>> +
>> +  pop_cfun ();
>> +  current_function_decl = old_current_function_decl;
>> +
>> +  gcc_assert (ifunc_decl != NULL);
>> +  DECL_ATTRIBUTES (ifunc_decl)
>> +   = make_attribute ("ifunc", resolver_name, DECL_ATTRIBUTES (ifunc_decl));
>> +  assemble_alias (ifunc_decl, get_identifier (resolver_name));
>> +  return decl;
>> +}
>> +
>> +/* Make an ifunc declaration for the multi-versioned function DECL.  Calls to
>> +  DECL function will be replaced with calls to the ifunc.  Return the decl
>> +  of the ifunc created.
*/ >> + >> +static tree >> +make_ifunc_func (const tree decl) >> +{ >> +  tree ifunc_decl; >> +  char *ifunc_name, *resolver_name; >> +  tree fn_type, ifunc_type; >> +  bool make_unique = false; >> + >> +  if (TREE_PUBLIC (decl) == 0) >> +   make_unique = true; >> + >> +  ifunc_name = make_name (decl, "ifunc", make_unique); >> +  resolver_name = make_name (decl, "resolver", make_unique); >> +  gcc_assert (resolver_name); >> + >> +  fn_type = TREE_TYPE (decl); >> +  ifunc_type = build_function_type (TREE_TYPE (fn_type), >> +                  TYPE_ARG_TYPES (fn_type)); >> + >> +  ifunc_decl = build_fn_decl (ifunc_name, ifunc_type); >> +  TREE_USED (ifunc_decl) = 1; >> +  DECL_CONTEXT (ifunc_decl) = NULL_TREE; >> +  DECL_INITIAL (ifunc_decl) = error_mark_node; >> +  DECL_ARTIFICIAL (ifunc_decl) = 1; >> +  /* Mark this ifunc as external, the resolver will flip it again if >> +   it gets generated.  */ >> +  DECL_EXTERNAL (ifunc_decl) = 1; >> +  /* IFUNCs have to be externally visible.  */ >> +  TREE_PUBLIC (ifunc_decl) = 1; >> + >> +  return ifunc_decl; >> +} >> + >> +/* For multi-versioned function decl, which should also be the default, >> +  return the decl of the ifunc resolver, create it if it does not >> +  exist.  */ >> + >> +tree >> +get_ifunc_for_version (const tree decl) >> +{ >> +  version_function *decl_v; >> +  int ix; >> +  void_p ele; >> + >> +  /* DECL has to be the default version, otherwise it is missing and >> +   that is not allowed.  
*/ >> +  if (!is_default_function (decl)) >> +   { >> +    error_at (DECL_SOURCE_LOCATION (decl), "Default version not found"); >> +    return decl; >> +   } >> + >> +  decl_v = find_function_version (decl); >> +  gcc_assert (decl_v != NULL); >> +  if (decl_v->ifunc_decl == NULL) >> +   { >> +    tree ifunc_decl; >> +    ifunc_decl = make_ifunc_func (decl); >> +    decl_v->ifunc_decl = ifunc_decl; >> +   } >> + >> +  if (cgraph_get_node (decl)) >> +   cgraph_mark_needed_node (cgraph_get_node (decl)); >> + >> +  for (ix = 0; VEC_iterate (void_p, decl_v->versions, ix, ele); ++ix) >> +   { >> +    version_function *v = (version_function *) ele; >> +    gcc_assert (v->decl != NULL); >> +    if (cgraph_get_node (v->decl)) >> +    cgraph_mark_needed_node (cgraph_get_node (v->decl)); >> +   } >> + >> +  return decl_v->ifunc_decl; >> +} >> + >> +/* Generate the dispatching code to dispatch multi-versioned function >> +  DECL.  Make a new function decl for dispatching and call the target >> +  hook to process the "targetv" attributes and provide the code to >> +  dispatch the right function at run-time.  
*/ >> + >> +static tree >> +make_ifunc_resolver_for_version (const tree decl) >> +{ >> +  version_function *decl_v; >> +  tree ifunc_resolver_decl, ifunc_decl; >> +  basic_block empty_bb; >> +  int ix; >> +  void_p ele; >> +  VEC (tree, heap) *fn_ver_vec = NULL; >> + >> +  gcc_assert (is_default_function (decl)); >> + >> +  decl_v = find_function_version (decl); >> +  gcc_assert (decl_v != NULL); >> + >> +  if (decl_v->ifunc_resolver_decl != NULL) >> +   return decl_v->ifunc_resolver_decl; >> + >> +  ifunc_decl = decl_v->ifunc_decl; >> + >> +  if (ifunc_decl == NULL) >> +   ifunc_decl = decl_v->ifunc_decl = make_ifunc_func (decl); >> + >> +  ifunc_resolver_decl = make_ifunc_resolver_func (decl, ifunc_decl, >> +                         &empty_bb); >> + >> +  fn_ver_vec = VEC_alloc (tree, heap, 2); >> +  VEC_safe_push (tree, heap, fn_ver_vec, decl); >> + >> +  for (ix = 0; VEC_iterate (void_p, decl_v->versions, ix, ele); ++ix) >> +   { >> +    version_function *v = (version_function *) ele; >> +    gcc_assert (v->decl != NULL); >> +    /* Check for virtual functions here again, as by this time it should >> +     have been determined if this function needs a vtable index or >> +     not.  This happens for methods in derived classes that override >> +     virtual methods in base classes but are not explicitly marked as >> +     virtual.  */ >> +    if (DECL_VINDEX (v->decl)) >> +     error_at (DECL_SOURCE_LOCATION (v->decl), >> +         "Virtual function versioning not supported\n"); >> +    if (!v->is_deleted) >> +    VEC_safe_push (tree, heap, fn_ver_vec, v->decl); >> +   } >> + >> +  gcc_assert (targetm.dispatch_version); >> +  targetm.dispatch_version (ifunc_resolver_decl, fn_ver_vec, &empty_bb); >> +  decl_v->ifunc_resolver_decl = ifunc_resolver_decl; >> + >> +  return ifunc_resolver_decl; >> +} >> + >> +/* Main entry point to pass_dispatch_versions. For multi-versioned functions, >> +  generate the dispatching code.  
*/ >> + >> +static unsigned int >> +do_dispatch_versions (void) >> +{ >> +  /* A new pass for generating dispatch code for multi-versioned functions. >> +   Other forms of dispatch can be added when ifunc support is not available >> +   like just calling the function directly after checking for target type. >> +   Currently, dispatching is done through IFUNC.  This pass will become >> +   more meaningful when other dispatch mechanisms are added.  */ >> + >> +  /* Cloning a function to produce more versions will happen here when the >> +   user requests that via the targetv attribute. For example, >> +   int foo () __attribute__ ((targetv(("arch=core2"), ("arch=corei7")))); >> +   means that the user wants the same body of foo to be versioned for core2 >> +   and corei7.  In that case, this function will be cloned during this >> +   pass.  */ >> + >> +  if (DECL_FUNCTION_VERSIONED (current_function_decl) >> +    && is_default_function (current_function_decl)) >> +   { >> +    tree decl = make_ifunc_resolver_for_version (current_function_decl); >> +    if (dump_file && decl) >> +    dump_function_to_file (decl, dump_file, TDF_BLOCKS); >> +   } >> +  return 0; >> +} >> + >> +static  bool >> +gate_dispatch_versions (void) >> +{ >> +  return true; >> +} >> + >> +/* A pass to generate the dispatch code to execute the appropriate version >> +  of a multi-versioned function at run-time.  
*/ >> + >> +struct gimple_opt_pass pass_dispatch_versions = >> +{ >> + { >> +  GIMPLE_PASS, >> +  "dispatch_multiversion_functions",   /* name */ >> +  gate_dispatch_versions,        /* gate */ >> +  do_dispatch_versions,             /* execute */ >> +  NULL,                     /* sub */ >> +  NULL,                     /* next */ >> +  0,                  /* static_pass_number */ >> +  TV_MULTIVERSION_DISPATCH,       /* tv_id */ >> +  PROP_cfg,               /* properties_required */ >> +  PROP_cfg,               /* properties_provided */ >> +  0,                  /* properties_destroyed */ >> +  0,                  /* todo_flags_start */ >> +  TODO_dump_func |           /* todo_flags_finish */ >> +  TODO_cleanup_cfg | TODO_dump_cgraph >> + } >> +}; >> Index: cgraphunit.c >> =================================================================== >> --- cgraphunit.c     (revision 184971) >> +++ cgraphunit.c     (working copy) >> @@ -141,6 +141,7 @@ along with GCC; see the file COPYING3.  If not see >>  #include "ipa-inline.h" >>  #include "ipa-utils.h" >>  #include "lto-streamer.h" >> +#include "multiversion.h" >> >>  static void cgraph_expand_all_functions (void); >>  static void cgraph_mark_functions_to_output (void); >> @@ -343,6 +344,13 @@ cgraph_finalize_function (tree decl, bool nested) >>    node->local.redefined_extern_inline = true; >>   } >> >> +  /* If this is a function version and not the default, change the >> +   assembler name of this function.  The DECL names of function >> +   versions are the same, only the assembler names are made unique. >> +   The assembler name is changed by appending the string from >> +   the "targetv" attribute.  
*/
>> +  version_assembler_name (decl);
>> +
>>  notice_global_symbol (decl);
>>  node->local.finalized = true;
>>  node->lowered = DECL_STRUCT_FUNCTION (decl)->cfg != NULL;
>> Index: multiversion.h
>> ===================================================================
>> --- multiversion.h    (revision 0)
>> +++ multiversion.h    (revision 0)
>> @@ -0,0 +1,52 @@
>> +/* Function Multiversioning.
>> +  Copyright (C) 2012 Free Software Foundation, Inc.
>> +  Contributed by Sriraman Tallam (tmsriram@google.com)
>> +
>> +This file is part of GCC.
>> +
>> +GCC is free software; you can redistribute it and/or modify it under
>> +the terms of the GNU General Public License as published by the Free
>> +Software Foundation; either version 3, or (at your option) any later
>> +version.
>> +
>> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
>> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>> +for more details.
>> +
>> +You should have received a copy of the GNU General Public License
>> +along with GCC; see the file COPYING3.  If not see
>> +<http://www.gnu.org/licenses/>. */
>> +
>> +/* This is the header file which provides the functions to keep track
>> +  of functions that are multi-versioned and to generate the dispatch
>> +  code to call the right version at run-time.  */
>> +
>> +#ifndef GCC_MULTIVERSION_H
>> +#define GCC_MULTIVERSION_H
>> +
>> +#include "tree.h"
>> +
>> +/* Mark DECL1 and DECL2 as function versions.  */
>> +int group_function_versions (const tree decl1, const tree decl2);
>> +
>> +/* Mark DECL as deleted and no longer a version.  */
>> +void mark_delete_decl_version (const tree decl);
>> +
>> +/* Returns true if DECL is the default version to be executed if all
>> +  other versions are inappropriate at run-time.
*/
>> +bool is_default_function (const tree decl);
>> +
>> +/* Gets the IFUNC dispatcher for this multi-versioned function DECL. DECL
>> +  must be the default function in the multi-versioned group.  */
>> +tree get_ifunc_for_version (const tree decl);
>> +
>> +/* Returns true when only one of DECL1 and DECL2 is marked with "targetv"
>> +  or if the "targetv" attribute strings of DECL1 and DECL2 do not match.  */
>> +bool has_different_version_attributes (const tree decl1, const tree decl2);
>> +
>> +/* If DECL is a function version and not the default version, the assembler
>> +  name of DECL is changed to include the attribute string to keep the
>> +  name unambiguous.  */
>> +void version_assembler_name (const tree decl);
>> +#endif
>> Index: cp/class.c
>> ===================================================================
>> --- cp/class.c  (revision 184971)
>> +++ cp/class.c  (working copy)
>> @@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "tree-dump.h"
>>  #include "splay-tree.h"
>>  #include "pointer-set.h"
>> +#include "multiversion.h"
>>
>>  /* The number of nested classes being processed.  If we are not in the
>>   scope of any class, this is zero.  */
>> @@ -1092,7 +1093,20 @@ add_method (tree type, tree method, tree using_dec
>>        || same_type_p (TREE_TYPE (fn_type),
>>                TREE_TYPE (method_type))))
>>     {
>> -     if (using_decl)
>> +     /* For function versions, their parms and types match
>> +       but they are not duplicates.  Record function versions
>> +       as and when they are found.
*/ >> +     if (TREE_CODE (fn) == FUNCTION_DECL >> +       && TREE_CODE (method) == FUNCTION_DECL >> +       && (DECL_FUNCTION_VERSIONED (fn) >> +         || DECL_FUNCTION_VERSIONED (method))) >> +      { >> +       DECL_FUNCTION_VERSIONED (fn) = 1; >> +       DECL_FUNCTION_VERSIONED (method) = 1; >> +       group_function_versions (fn, method); >> +       continue; >> +      } >> +     else if (using_decl) >>       { >>        if (DECL_CONTEXT (fn) == type) >>         /* Defer to the local function.  */ >> @@ -1150,6 +1164,13 @@ add_method (tree type, tree method, tree using_dec >>  else >>   /* Replace the current slot.  */ >>   VEC_replace (tree, method_vec, slot, overload); >> + >> +  /* Change the assembler name of method here if it has "targetv" >> +   attributes.  Since all versions have the same mangled name, >> +   their assembler name is changed by appending the string from >> +   the "targetv" attribute. */ >> +  version_assembler_name (method); >> + >>  return true; >>  } >> >> @@ -6890,8 +6911,11 @@ resolve_address_of_overloaded_function (tree targe >>      if (DECL_ANTICIPATED (fn)) >>       continue; >> >> -     /* See if there's a match.  */ >> -     if (same_type_p (target_fn_type, static_fn_type (fn))) >> +     /* See if there's a match.  For functions that are multi-versioned >> +       match it to the default function.  */ >> +     if (same_type_p (target_fn_type, static_fn_type (fn)) >> +       && (!DECL_FUNCTION_VERSIONED (fn) >> +         || is_default_function (fn))) >>       matches = tree_cons (fn, NULL_TREE, matches); >>     } >>   } >> @@ -7053,6 +7077,21 @@ resolve_address_of_overloaded_function (tree targe >>    perform_or_defer_access_check (access_path, fn, fn); >>   } >> >> +  /* If a pointer to a function that is multi-versioned is requested, the >> +   pointer to the dispatcher function is returned instead.  This works >> +   well because indirectly calling the function will dispatch the right >> +   function version at run-time. 
Also, the function address is kept
>> +   unique.  */
>> +  if (DECL_FUNCTION_VERSIONED (fn)
>> +    && is_default_function (fn))
>> +   {
>> +    tree ifunc_decl;
>> +    ifunc_decl = get_ifunc_for_version (fn);
>> +    gcc_assert (ifunc_decl != NULL);
>> +    mark_used (fn);
>> +    return build_fold_addr_expr (ifunc_decl);
>> +   }
>> +
>>  if (TYPE_PTRFN_P (target_type) || TYPE_PTRMEMFUNC_P (target_type))
>>   return cp_build_addr_expr (fn, flags);
>>  else
>> Index: cp/decl.c
>> ===================================================================
>> --- cp/decl.c  (revision 184971)
>> +++ cp/decl.c  (working copy)
>> @@ -54,6 +54,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "pointer-set.h"
>>  #include "splay-tree.h"
>>  #include "plugin.h"
>> +#include "multiversion.h"
>>
>>  /* Possible cases of bad specifiers type used by bad_specifiers. */
>>  enum bad_spec_place {
>> @@ -972,6 +973,23 @@ decls_match (tree newdecl, tree olddecl)
>>    if (t1 != t2)
>>     return 0;
>>
>> +    /* The decls do not match if they correspond to two different versions
>> +     of the same function.  */
>> +    if (compparms (p1, p2)
>> +     && same_type_p (TREE_TYPE (f1), TREE_TYPE (f2))
>> +     && (DECL_FUNCTION_VERSIONED (newdecl)
>> +       || DECL_FUNCTION_VERSIONED (olddecl))
>> +     && has_different_version_attributes (newdecl, olddecl))
>> +    {
>> +     /* One of the decls could be the default without the "targetv"
>> +       attribute. Set it to be a versioned function here.  */
>> +     DECL_FUNCTION_VERSIONED (newdecl) = 1;
>> +     DECL_FUNCTION_VERSIONED (olddecl) = 1;
>> +     /* Accumulate all the versions of a function.  */
>> +     group_function_versions (olddecl, newdecl);
>> +     return 0;
>> +    }
>> +
>>    if (CP_DECL_CONTEXT (newdecl) != CP_DECL_CONTEXT (olddecl)
>>      && !
(DECL_EXTERN_C_P (newdecl) >>         && DECL_EXTERN_C_P (olddecl))) >> @@ -1482,7 +1500,11 @@ duplicate_decls (tree newdecl, tree olddecl, bool >>        error ("previous declaration %q+#D here", olddecl); >>        return NULL_TREE; >>       } >> -     else if (compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)), >> +     /* For function versions, params and types match, but they >> +       are not ambiguous.  */ >> +     else if ((!DECL_FUNCTION_VERSIONED (newdecl) >> +          && !DECL_FUNCTION_VERSIONED (olddecl)) >> +          && compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)), >>                TYPE_ARG_TYPES (TREE_TYPE (olddecl)))) >>       { >>        error ("new declaration %q#D", newdecl); >> @@ -2250,6 +2272,16 @@ duplicate_decls (tree newdecl, tree olddecl, bool >>  else if (DECL_PRESERVE_P (newdecl)) >>   DECL_PRESERVE_P (olddecl) = 1; >> >> +  /* If the olddecl is a version, so is the newdecl.  */ >> +  if (TREE_CODE (newdecl) == FUNCTION_DECL >> +    && DECL_FUNCTION_VERSIONED (olddecl)) >> +   { >> +    DECL_FUNCTION_VERSIONED (newdecl) = 1; >> +    /* Record that newdecl is not a valid version and has >> +     been deleted.  */ >> +    mark_delete_decl_version (newdecl); >> +   } >> + >>  if (TREE_CODE (newdecl) == FUNCTION_DECL) >>   { >>    int function_size; >> @@ -4512,6 +4544,10 @@ start_decl (const cp_declarator *declarator, >>  /* Enter this declaration into the symbol table.  */ >>  decl = maybe_push_decl (decl); >> >> +  /* If this decl is a function version and not the default, its assembler >> +   name has to be changed.  
*/ >> +  version_assembler_name (decl); >> + >>  if (processing_template_decl) >>   decl = push_template_decl (decl); >>  if (decl == error_mark_node) >> @@ -13019,6 +13055,10 @@ start_function (cp_decl_specifier_seq *declspecs, >>   gcc_assert (same_type_p (TREE_TYPE (TREE_TYPE (decl1)), >>               integer_type_node)); >> >> +  /* If this decl is a function version and not the default, its assembler >> +   name has to be changed.  */ >> +  version_assembler_name (decl1); >> + >>  start_preparsed_function (decl1, attrs, /*flags=*/SF_DEFAULT); >> >>  return 1; >> @@ -13960,6 +14000,11 @@ cxx_comdat_group (tree decl) >>       break; >>     } >>    name = DECL_ASSEMBLER_NAME (decl); >> +    if (TREE_CODE (decl) == FUNCTION_DECL >> +     && DECL_FUNCTION_VERSIONED (decl)) >> +    name = DECL_NAME (decl); >> +    else >> +     name = DECL_ASSEMBLER_NAME (decl); >>   } >> >>  return name; >> Index: cp/semantics.c >> =================================================================== >> --- cp/semantics.c    (revision 184971) >> +++ cp/semantics.c    (working copy) >> @@ -3783,8 +3783,11 @@ expand_or_defer_fn_1 (tree fn) >>    /* If the user wants us to keep all inline functions, then mark >>     this function as needed so that finish_file will make sure to >>     output it later.  Similarly, all dllexport'd functions must >> -     be emitted; there may be callers in other DLLs.  */ >> -    if ((flag_keep_inline_functions >> +     be emitted; there may be callers in other DLLs. >> +     Also, mark this function as needed if it is marked inline but >> +     is a multi-versioned function.  
*/ >> +    if (((flag_keep_inline_functions >> +      || DECL_FUNCTION_VERSIONED (fn)) >>      && DECL_DECLARED_INLINE_P (fn) >>      && !DECL_REALLY_EXTERN (fn)) >>      || (flag_keep_inline_dllexport >> Index: cp/decl2.c >> =================================================================== >> --- cp/decl2.c  (revision 184971) >> +++ cp/decl2.c  (working copy) >> @@ -53,6 +53,7 @@ along with GCC; see the file COPYING3.  If not see >>  #include "splay-tree.h" >>  #include "langhooks.h" >>  #include "c-family/c-ada-spec.h" >> +#include "multiversion.h" >> >>  extern cpp_reader *parse_in; >> >> @@ -674,9 +675,13 @@ check_classfn (tree ctype, tree function, tree tem >>      if (is_template != (TREE_CODE (fndecl) == TEMPLATE_DECL)) >>       continue; >> >> +     /* While finding a match, same types and params are not enough >> +       if the function is versioned.  Also check version ("targetv") >> +       attributes.  */ >>      if (same_type_p (TREE_TYPE (TREE_TYPE (function)), >>              TREE_TYPE (TREE_TYPE (fndecl))) >>        && compparms (p1, p2) >> +       && !has_different_version_attributes (function, fndecl) >>        && (!is_template >>          || comp_template_parms (template_parms, >>                      DECL_TEMPLATE_PARMS (fndecl))) >> Index: cp/call.c >> =================================================================== >> --- cp/call.c  (revision 184971) >> +++ cp/call.c  (working copy) >> @@ -41,6 +41,7 @@ along with GCC; see the file COPYING3.  If not see >>  #include "langhooks.h" >>  #include "c-family/c-objc.h" >>  #include "timevar.h" >> +#include "multiversion.h" >> >>  /* The various kinds of conversion.  */ >> >> @@ -6730,6 +6731,17 @@ build_over_call (struct z_candidate *cand, int fla >>  if (!already_used) >>   mark_used (fn); >> >> +  /* For a call to a multi-versioned function, the call should actually be to >> +   the dispatcher.  
*/ >> +  if (DECL_FUNCTION_VERSIONED (fn)) >> +   { >> +    tree ifunc_decl; >> +    ifunc_decl = get_ifunc_for_version (fn); >> +    gcc_assert (ifunc_decl != NULL); >> +    return build_call_expr_loc_array (UNKNOWN_LOCATION, ifunc_decl, >> +                    nargs, argarray); >> +   } >> + >>  if (DECL_VINDEX (fn) && (flags & LOOKUP_NONVIRTUAL) == 0) >>   { >>    tree t; >> @@ -7980,6 +7992,30 @@ joust (struct z_candidate *cand1, struct z_candida >>  size_t i; >>  size_t len; >> >> +  /* For Candidates of a multi-versioned function, the one marked default >> +   wins.  This is because the default decl is used as key to aggregate >> +   all the other versions provided for it in multiversion.c.  When >> +   generating the actual call, the appropriate dispatcher is created >> +   to call the right function version at run-time.  */ >> + >> +  if ((TREE_CODE (cand1->fn) == FUNCTION_DECL >> +    && DECL_FUNCTION_VERSIONED (cand1->fn)) >> +    ||(TREE_CODE (cand2->fn) == FUNCTION_DECL >> +     && DECL_FUNCTION_VERSIONED (cand2->fn))) >> +   { >> +    if (is_default_function (cand1->fn)) >> +    { >> +      mark_used (cand2->fn); >> +     return 1; >> +    } >> +    if (is_default_function (cand2->fn)) >> +    { >> +      mark_used (cand1->fn); >> +     return -1; >> +    } >> +    return 0; >> +   } >> + >>  /* Candidates that involve bad conversions are always worse than those >>    that don't.  
*/
>>  if (cand1->viable > cand2->viable)
>> Index: timevar.def
>> ===================================================================
>> --- timevar.def (revision 184971)
>> +++ timevar.def (working copy)
>> @@ -253,6 +253,7 @@ DEFTIMEVAR (TV_TREE_IFCOMBINE     , "tree if-co
>>  DEFTIMEVAR (TV_TREE_UNINIT      , "uninit var analysis")
>>  DEFTIMEVAR (TV_PLUGIN_INIT      , "plugin initialization")
>>  DEFTIMEVAR (TV_PLUGIN_RUN       , "plugin execution")
>> +DEFTIMEVAR (TV_MULTIVERSION_DISPATCH , "multiversion dispatch")
>>
>>  /* Everything else in rest_of_compilation not included above.  */
>>  DEFTIMEVAR (TV_EARLY_LOCAL      , "early local passes")
>> Index: varasm.c
>> ===================================================================
>> --- varasm.c   (revision 184971)
>> +++ varasm.c   (working copy)
>> @@ -5755,6 +5755,8 @@ finish_aliases_1 (void)
>>     }
>>    else if (! (p->emitted_diags & ALIAS_DIAG_TO_EXTERN)
>>        && DECL_EXTERNAL (target_decl)
>> +        && (TREE_CODE (target_decl) != FUNCTION_DECL
>> +          || !DECL_STRUCT_FUNCTION (target_decl))
>>        /* We use local aliases for C++ thunks to force the tailcall
>>          to bind locally.  This is a hack - to keep it working do
>>          the following (which is not strictly correct).
*/ >> Index: Makefile.in >> =================================================================== >> --- Makefile.in (revision 184971) >> +++ Makefile.in (working copy) >> @@ -1298,6 +1298,7 @@ OBJS = \ >>     mcf.o \ >>     mode-switching.o \ >>     modulo-sched.o \ >> +    multiversion.o \ >>     omega.o \ >>     omp-low.o \ >>     optabs.o \ >> @@ -3030,6 +3031,11 @@ ree.o : ree.c $(CONFIG_H) $(SYSTEM_H) coretypes.h >>   $(DF_H) $(TIMEVAR_H) tree-pass.h $(RECOG_H) $(EXPR_H) \ >>   $(REGS_H) $(TREE_H) $(TM_P_H) insn-config.h $(INSN_ATTR_H) $(DIAGNOSTIC_CORE_H) \ >>   $(TARGET_H) $(OPTABS_H) insn-codes.h rtlhooks-def.h $(PARAMS_H) $(CGRAPH_H) >> +multiversion.o : multiversion.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \ >> +  $(TREE_H) langhooks.h $(TREE_INLINE_H) $(FLAGS_H) $(CGRAPH_H) intl.h \ >> +  $(DIAGNOSTIC_H) $(FIBHEAP_H) $(PARAMS_H) $(TIMEVAR_H) tree-pass.h \ >> +  $(HASHTAB_H) $(COVERAGE_H) $(GGC_H) $(TREE_FLOW_H) $(RTL_H) $(IPA_PROP_H) \ >> +  $(BASIC_BLOCK_H) $(TOPLEV_H) $(TREE_DUMP_H) ipa-inline.h >>  cprop.o : cprop.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \ >>   $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h $(GGC_H) \ >>   $(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) output.h toplev.h $(DIAGNOSTIC_CORE_H) \ >> Index: passes.c >> =================================================================== >> --- passes.c   (revision 184971) >> +++ passes.c   (working copy) >> @@ -1190,6 +1190,7 @@ init_optimization_passes (void) >>  NEXT_PASS (pass_build_cfg); >>  NEXT_PASS (pass_warn_function_return); >>  NEXT_PASS (pass_build_cgraph_edges); >> +  NEXT_PASS (pass_dispatch_versions); >>  *p = NULL; >> >>  /* Interprocedural optimization passes.  
*/ >> Index: config/i386/i386.c >> =================================================================== >> --- config/i386/i386.c  (revision 184971) >> +++ config/i386/i386.c  (working copy) >> @@ -27446,6 +27473,593 @@ ix86_init_mmx_sse_builtins (void) >>   } >>  } >> >> +/* This adds a condition to the basic_block NEW_BB in function FUNCTION_DECL >> +  to return a pointer to VERSION_DECL if the outcome of the function >> +  PREDICATE_DECL is true.  This function will be called during version >> +  dispatch to decide which function version to execute.  It returns the >> +  basic block at the end to which more conditions can be added.  */ >> + >> +static basic_block >> +add_condition_to_bb (tree function_decl, tree version_decl, >> +           basic_block new_bb, tree predicate_decl) >> +{ >> +  gimple return_stmt; >> +  tree convert_expr, result_var; >> +  gimple convert_stmt; >> +  gimple call_cond_stmt; >> +  gimple if_else_stmt; >> + >> +  basic_block bb1, bb2, bb3; >> +  edge e12, e23; >> + >> +  tree cond_var; >> +  gimple_seq gseq; >> + >> +  tree old_current_function_decl; >> + >> +  old_current_function_decl = current_function_decl; >> +  push_cfun (DECL_STRUCT_FUNCTION (function_decl)); >> +  current_function_decl = function_decl; >> + >> +  gcc_assert (new_bb != NULL); >> +  gseq = bb_seq (new_bb); >> + >> + >> +  convert_expr = build1 (CONVERT_EXPR, ptr_type_node, >> +             build_fold_addr_expr (version_decl)); >> +  result_var = create_tmp_var (ptr_type_node, NULL); >> +  convert_stmt = gimple_build_assign (result_var, convert_expr); >> +  return_stmt = gimple_build_return (result_var); >> + >> +  if (predicate_decl == NULL_TREE) >> +   { >> +    gimple_seq_add_stmt (&gseq, convert_stmt); >> +    gimple_seq_add_stmt (&gseq, return_stmt); >> +    set_bb_seq (new_bb, gseq); >> +    gimple_set_bb (convert_stmt, new_bb); >> +    gimple_set_bb (return_stmt, new_bb); >> +    pop_cfun (); >> +    current_function_decl = old_current_function_decl; >> +   
 return new_bb; >> +   } >> + >> +  cond_var = create_tmp_var (integer_type_node, NULL); >> +  call_cond_stmt = gimple_build_call (predicate_decl, 0); >> +  gimple_call_set_lhs (call_cond_stmt, cond_var); >> + >> +  gimple_set_block (call_cond_stmt, DECL_INITIAL (function_decl)); >> +  gimple_set_bb (call_cond_stmt, new_bb); >> +  gimple_seq_add_stmt (&gseq, call_cond_stmt); >> + >> +  if_else_stmt = gimple_build_cond (GT_EXPR, cond_var, >> +                  integer_zero_node, >> +                  NULL_TREE, NULL_TREE); >> +  gimple_set_block (if_else_stmt, DECL_INITIAL (function_decl)); >> +  gimple_set_bb (if_else_stmt, new_bb); >> +  gimple_seq_add_stmt (&gseq, if_else_stmt); >> + >> +  gimple_seq_add_stmt (&gseq, convert_stmt); >> +  gimple_seq_add_stmt (&gseq, return_stmt); >> +  set_bb_seq (new_bb, gseq); >> + >> +  bb1 = new_bb; >> +  e12 = split_block (bb1, if_else_stmt); >> +  bb2 = e12->dest; >> +  e12->flags &= ~EDGE_FALLTHRU; >> +  e12->flags |= EDGE_TRUE_VALUE; >> + >> +  e23 = split_block (bb2, return_stmt); >> + >> +  gimple_set_bb (convert_stmt, bb2); >> +  gimple_set_bb (return_stmt, bb2); >> + >> +  bb3 = e23->dest; >> +  make_edge (bb1, bb3, EDGE_FALSE_VALUE); >> + >> +  remove_edge (e23); >> +  make_edge (bb2, EXIT_BLOCK_PTR, 0); >> + >> +  rebuild_cgraph_edges (); >> + >> +  pop_cfun (); >> +  current_function_decl = old_current_function_decl; >> + >> +  return bb3; >> +} >> + >> +/* This parses the attribute arguments to targetv in DECL and determines >> +  the right builtin to use to match the platform specification. >> +  For now, only one target argument ("arch=") is allowed.  
*/ >> + >> +static enum ix86_builtins >> +get_builtin_code_for_version (tree decl) >> +{ >> +  tree attrs; >> +  struct cl_target_option cur_target; >> +  tree target_node; >> +  struct cl_target_option *new_target; >> +  enum ix86_builtins builtin_code = IX86_BUILTIN_MAX; >> + >> +  attrs = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl)); >> +  gcc_assert (attrs != NULL); >> + >> +  cl_target_option_save (&cur_target, &global_options); >> + >> +  target_node = ix86_valid_target_attribute_tree >> +         (TREE_VALUE (TREE_VALUE (attrs))); >> + >> +  gcc_assert (target_node); >> +  new_target = TREE_TARGET_OPTION (target_node); >> +  gcc_assert (new_target); >> + >> +  if (new_target->arch_specified && new_target->arch > 0) >> +   { >> +    switch (new_target->arch) >> +     { >> +    case 1: >> +    case 2: >> +    case 3: >> +    case 4: >> +    case 5: >> +    case 6: >> +    case 7: >> +    case 8: >> +    case 9: >> +    case 10: >> +    case 11: >> +     builtin_code = IX86_BUILTIN_CPU_IS_INTEL; >> +     break; >> +    case 12: >> +     builtin_code = IX86_BUILTIN_CPU_IS_INTEL_CORE2; >> +     break; >> +    case 13: >> +     builtin_code = IX86_BUILTIN_CPU_IS_INTEL_COREI7; >> +     break; >> +    case 14: >> +     builtin_code = IX86_BUILTIN_CPU_IS_INTEL_ATOM; >> +     break; >> +    case 15: >> +    case 16: >> +    case 17: >> +    case 18: >> +    case 19: >> +    case 20: >> +    case 21: >> +     builtin_code = IX86_BUILTIN_CPU_IS_AMD; >> +     break; >> +    case 22: >> +     builtin_code = IX86_BUILTIN_CPU_IS_AMDFAM10H; >> +     break; >> +    case 23: >> +     builtin_code = IX86_BUILTIN_CPU_IS_AMDFAM15H_BDVER1; >> +     break; >> +    case 24: >> +     builtin_code = IX86_BUILTIN_CPU_IS_AMDFAM15H_BDVER2; >> +     break; >> +    case 25: /* What is btver1 ? 
*/ >> +     builtin_code = IX86_BUILTIN_CPU_IS_AMD; >> +     break; >> +    } >> +   } >> + >> +  cl_target_option_restore (&global_options, &cur_target); >> +  if (builtin_code == IX86_BUILTIN_MAX) >> +    error_at (DECL_SOURCE_LOCATION (decl), >> +        "No dispatcher found for the versioning attributes"); >> + >> +  return builtin_code; >> +} >> + >> +/* This is the target hook to generate the dispatch function for >> +  multi-versioned functions.  DISPATCH_DECL is the function which will >> +  contain the dispatch logic.  FNDECLS are the function choices for >> +  dispatch, and is a tree chain.  EMPTY_BB is the basic block pointer >> +  in DISPATCH_DECL in which the dispatch code is generated.  */ >> + >> +static int >> +ix86_dispatch_version (tree dispatch_decl, >> +            void *fndecls_p, >> +            basic_block *empty_bb) >> +{ >> +  tree default_decl; >> +  gimple ifunc_cpu_init_stmt; >> +  gimple_seq gseq; >> +  tree old_current_function_decl; >> +  int ix; >> +  tree ele; >> +  VEC (tree, heap) *fndecls; >> + >> +  gcc_assert (dispatch_decl != NULL >> +       && fndecls_p != NULL >> +       && empty_bb != NULL); >> + >> +  /*fndecls_p is actually a vector.  */ >> +  fndecls = (VEC (tree, heap) *)fndecls_p; >> + >> +  /* Atleast one more version other than the default.  */ >> +  gcc_assert (VEC_length (tree, fndecls) >= 2); >> + >> +  /* The first version in the vector is the default decl.  
*/ >> +  default_decl = VEC_index (tree, fndecls, 0); >> + >> +  old_current_function_decl = current_function_decl; >> +  push_cfun (DECL_STRUCT_FUNCTION (dispatch_decl)); >> +  current_function_decl = dispatch_decl; >> + >> +  gseq = bb_seq (*empty_bb); >> +  ifunc_cpu_init_stmt = gimple_build_call_vec ( >> +           ix86_builtins [(int) IX86_BUILTIN_CPU_INIT], NULL); >> +  gimple_seq_add_stmt (&gseq, ifunc_cpu_init_stmt); >> +  gimple_set_bb (ifunc_cpu_init_stmt, *empty_bb); >> +  set_bb_seq (*empty_bb, gseq); >> + >> +  pop_cfun (); >> +  current_function_decl = old_current_function_decl; >> + >> + >> +  for (ix = 1; VEC_iterate (tree, fndecls, ix, ele); ++ix) >> +   { >> +    tree version_decl = ele; >> +    /* Get attribute string, parse it and find the right predicate decl. >> +     The predicate function could be a lengthy combination of many >> +     features, like arch-type and various isa-variants.  For now, only >> +     check the arch-type.  */ >> +    tree predicate_decl = ix86_builtins [ >> +            get_builtin_code_for_version (version_decl)]; >> +    *empty_bb = add_condition_to_bb (dispatch_decl, version_decl, *empty_bb, >> +                    predicate_decl); >> + >> +   } >> +  /* dispatch default version at the end.  
*/ >> +  *empty_bb = add_condition_to_bb (dispatch_decl, default_decl, *empty_bb, >> +                  NULL); >> +  return 0; >> +} >> >> @@ -38610,6 +39269,12 @@ ix86_autovectorize_vector_sizes (void) >>  #undef TARGET_BUILD_BUILTIN_VA_LIST >>  #define TARGET_BUILD_BUILTIN_VA_LIST ix86_build_builtin_va_list >> >> +#undef TARGET_DISPATCH_VERSION >> +#define TARGET_DISPATCH_VERSION ix86_dispatch_version >> + >>  #undef TARGET_ENUM_VA_LIST_P >>  #define TARGET_ENUM_VA_LIST_P ix86_enum_va_list >> >> Index: testsuite/g++.dg/mv1.C >> =================================================================== >> --- testsuite/g++.dg/mv1.C    (revision 0) >> +++ testsuite/g++.dg/mv1.C    (revision 0) >> @@ -0,0 +1,23 @@ >> +/* Simple test case to check if Multiversioning works.  */ >> +/* { dg-do run } */ >> +/* { dg-options "-O2" } */ >> + >> +int foo (); >> +int foo () __attribute__ ((targetv("arch=corei7"))); >> + >> +int main () >> +{ >> +  int (*p)() = &foo; >> +  return foo () + (*p)(); >> +} >> + >> +int foo () >> +{ >> +  return 0; >> +} >> + >> +int __attribute__ ((targetv("arch=corei7"))) >> +foo () >> +{ >> +  return 0; >> +} >> >> >> -- >> This patch is available for review at http://codereview.appspot.com/5752064
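For readers following the i386 part of the patch above: the dispatcher body built by ix86_dispatch_version and add_condition_to_bb amounts to a chain of "if the predicate returns a value greater than zero, return the address of this version", with the default version returned at the end. Below is a minimal, portable sketch of that control flow in plain C. It is not the code GCC generates. The predicates cpu_is_corei7 and cpu_is_core2 are hypothetical stand-ins for the CPU-detection builtins, the names select_version and resolve_foo are invented for illustration, and the three versions return distinct values (unlike the test case, where they all return 0) only so that the selected one is observable.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*version_fn) (void);    /* signature shared by every version of foo */
typedef int (*predicate_fn) (void);  /* returns > 0 when the running CPU matches */

/* One dispatch candidate: a predicate paired with the version it guards.  */
struct version_entry
{
  predicate_fn predicate;
  version_fn version;
};

/* Hypothetical stand-ins for the CPU-detection builtins; hard-coded here
   so the sketch behaves the same on any host.  */
static int cpu_is_corei7 (void) { return 0; }
static int cpu_is_core2 (void) { return 1; }

/* Three versions of foo, given distinct (illustrative) return values.  */
static int foo_default (void) { return 0; }
static int foo_corei7 (void) { return 1; }
static int foo_core2 (void) { return 2; }

/* Return the first version whose predicate is > 0, else the default.
   This mirrors the order of the loop in ix86_dispatch_version: the
   specialized versions are tested first, the default is emitted last.  */
version_fn
select_version (const struct version_entry *entries, size_t count,
                version_fn default_version)
{
  size_t i;

  assert (default_version != NULL);
  for (i = 0; i < count; i++)
    if (entries[i].predicate () > 0)
      return entries[i].version;
  return default_version;
}

/* Plays the role of the IFUNC resolver for foo in this sketch.  */
version_fn
resolve_foo (void)
{
  static const struct version_entry entries[] = {
    { cpu_is_corei7, foo_corei7 },
    { cpu_is_core2, foo_core2 },
  };

  return select_version (entries, sizeof entries / sizeof entries[0],
                         foo_default);
}
```

With the stand-in predicates above, resolve_foo () returns foo_core2, and an empty candidate list falls back to foo_default. In the patch the same decision is made once by the IFUNC resolver, so later calls do not repeat the checks.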
Hi,

I have made the following changes in this new patch, which is attached:

* Use the target attribute itself to create function versions.
* Handle any number of ISA names and arch= args to the target attribute, generating the right dispatchers.
* Integrate with the CPU runtime detection checked in this week.
* Overload resolution: if the caller's target matches the target of any function version, a direct call to that version is generated, with no need to go through the dispatcher.

Patch also available for review here: http://codereview.appspot.com/5752064

Thanks,
-Sri.

On Fri, Mar 9, 2012 at 12:04 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> Hi Richard,
>
> Here is a more detailed overview of the front-end description:
>
> * Tracking decls that correspond to function versions of function
> name, say "foo":
>
> When the front-end sees a decl for "foo" with "targetv" attributes, it
> tags it as a function version. To prevent duplicate definition errors
> with other versions of "foo", I change the "decls_match" function in
> cp/decl.c to return false when 2 decls have the same signature but
> different targetv attributes. This makes all function versions of
> "foo" get added to the overload list of "foo".
>
> To expand further, different targetv attributes are checked for by
> sorting the arguments to targetv.
>
> * Change the assembler names of the function versions.
>
> The front-end changes the assembler names of the function versions by
> tagging the sorted list of args to "targetv" to the function name of
> "foo". For example, the assembler name of "void foo () __attribute__
> ((targetv ("sse4")))" will become _Z3foov.sse4.
>
> * Separately group all function versions of "foo" together, in multiversion.c:
>
> File multiversion.c maintains a hashtab, decl_version_htab, that maps
> the default function decl of "foo" to the list of all other versions
> of this function "foo". This is meant to be used when creating the
> dispatcher for this function.
>
> * Overload resolution:
>
> Function "build_over_call" in cp/call.c sees a call to function
> "foo", which is multi-versioned. The overload resolution happens in
> function "joust" in "cp/call.c". Here, the call to "foo" has all
> possible versions of "foo" as candidates. Currently, "joust" returns
> the default version of "foo" as the winning candidate. But
> "build_over_call" realizes that this is a versioned function and
> replaces the call-site of foo with an "ifunc" call for foo, by querying
> a function in "multiversion.c" which builds the ifunc decl. After
> this, all call-sites of "foo" contain the call to the ifunc.
>
> Notice that, for calls from an sse function to a versioned function
> with an sse variant, I can modify "joust" to return the "sse" function
> version rather than the default and not replace this call with an
> ifunc. To do this, I must pass the target attributes of the callee to
> "joust" and check if the target attributes also match any version.
>
> * Creating the dispatcher:
>
> The dispatcher is independently created in a new pass, called
> "pass_dispatch_versions", that runs immediately after the cfg and cgraph
> are created. The dispatcher looks at all possible versions and queries the
> target to give it the CPU detection predicates it must use to dispatch
> each version. Then, the dispatcher body is created and the ifunc is
> mapped to use this dispatcher.
>
> Notice that only the dispatcher creation is done after the front-end.
> Everything else occurs in the front-end itself. I could have created
> the dispatcher also in the front-end. I did not do so because I
> thought keeping it as a separate pass made sense to easily add more
> dispatch mechanisms, for example replacing IFUNC, when it is not
> available, with control-flow that makes direct calls to the function
> versions. Also, making the dispatcher after "cfg" is created was easy.
>
> Thanks,
> -Sri.
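As an illustration of the assembler-name step in the overview above, here is a minimal sketch, assuming the scheme works as described there and as in sorted_attr_string in the quoted multiversion.c: replace '=' with '_', split the attribute string on commas, sort the pieces, join them with '_', and append the result to the base name after a '.'. The helper name versioned_asm_name is hypothetical and does not appear in the patch. The multi-argument case only shows what this sort-and-join logic produces, not a name confirmed by the thread.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings.  */
static int
cmp_str (const void *a, const void *b)
{
  return strcmp (*(char *const *) a, *(char *const *) b);
}

/* Hypothetical helper: build "<base>.<sorted args joined by '_'>".
   ATTRS is a comma-separated attribute string such as "arch=corei7".
   Returns a malloc'd string the caller must free, or NULL if an
   allocation fails.  */
char *
versioned_asm_name (const char *base, const char *attrs)
{
  size_t max_args = 1, count = 0, i;
  char *copy, *tok, *result;
  char **args;

  assert (base != NULL && attrs != NULL);

  /* Upper bound on the number of arguments: commas plus one.  */
  for (i = 0; attrs[i] != '\0'; i++)
    if (attrs[i] == ',')
      max_args++;

  copy = malloc (strlen (attrs) + 1);
  args = malloc (max_args * sizeof *args);
  /* Room for base, '.', the arguments with separators, and the NUL.  */
  result = malloc (strlen (base) + 1 + strlen (attrs) + 1);
  if (copy == NULL || args == NULL || result == NULL)
    {
      free (copy);
      free (args);
      free (result);
      return NULL;
    }

  /* '=' becomes '_' so the suffix stays a plain symbol fragment.  */
  strcpy (copy, attrs);
  for (i = 0; copy[i] != '\0'; i++)
    if (copy[i] == '=')
      copy[i] = '_';

  /* Split on commas; the tokens point into COPY.  */
  for (tok = strtok (copy, ","); tok != NULL; tok = strtok (NULL, ","))
    args[count++] = tok;

  /* Sorting makes the name independent of the order the user wrote.  */
  qsort (args, count, sizeof *args, cmp_str);

  strcpy (result, base);
  for (i = 0; i < count; i++)
    {
      strcat (result, i == 0 ? "." : "_");
      strcat (result, args[i]);
    }

  free (args);
  free (copy);
  return result;
}
```

For the examples named in the thread, versioned_asm_name ("_Z3foov", "sse4") yields "_Z3foov.sse4" and versioned_asm_name ("_Z3foov", "arch=corei7") yields "_Z3foov.arch_corei7". Because the arguments are sorted first, "sse4.2,arch=core2" and "arch=core2,sse4.2" map to the same name in this sketch.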
> > > On Wed, Mar 7, 2012 at 6:05 AM, Richard Guenther > <richard.guenther@gmail.com> wrote: >> On Wed, Mar 7, 2012 at 1:46 AM, Sriraman Tallam <tmsriram@google.com> wrote: >>> User directed Function Multiversioning (MV) via Function Overloading >>> ==================================================================== >>> >>> This patch adds support for user directed function MV via function overloading. >>> For more detailed description: >>> http://gcc.gnu.org/ml/gcc/2012-03/msg00074.html >>> >>> >>> Here is an example program with function versions: >>> >>> int foo ();  /* Default version */ >>> int foo () __attribute__ ((targetv("arch=corei7")));/*Specialized for corei7 */ >>> int foo () __attribute__ ((targetv("arch=core2")));/*Specialized for core2 */ >>> >>> int main () >>> { >>>  int (*p)() = &foo; >>>  return foo () + (*p)(); >>> } >>> >>> int foo () >>> { >>>  return 0; >>> } >>> >>> int __attribute__ ((targetv("arch=corei7"))) >>> foo () >>> { >>>  return 0; >>> } >>> >>> int __attribute__ ((targetv("arch=core2"))) >>> foo () >>> { >>>  return 0; >>> } >>> >>> The above example has foo defined 3 times, but all 3 definitions of foo are >>> different versions of the same function. The call to foo in main, directly and >>> via a pointer, are calls to the multi-versioned function foo which is dispatched >>> to the right foo at run-time. >>> >>> Function versions must have the same signature but must differ in the specifier >>> string provided to a new attribute called "targetv", which is nothing but the >>> target attribute with an extra specification to indicate a version. Any number >>> of versions can be created using the targetv attribute but it is mandatory to >>> have one function without the attribute, which is treated as the default >>> version. >>> >>> The dispatching is done using the IFUNC mechanism to keep the dispatch overhead >>> low. The compiler creates a dispatcher function which checks the CPU type and >>> calls the right version of foo. 
The dispatching code checks for the platform >>> type and calls the first version that matches. The default function is called if >>> no specialized version is appropriate for execution. >>> >>> The pointer to foo is made to be the address of the dispatcher function, so that >>> it is unique and calls made via the pointer also work correctly. The assembler >>> names of the various versions of foo is made different, by tagging >>> the specifier strings, to keep them unique.  A specific version can be called >>> directly by creating an alias to its assembler name. For instance, to call the >>> corei7 version directly, make an alias : >>> int foo_corei7 () __attribute__((alias ("_Z3foov.arch_corei7"))); >>> and then call foo_corei7. >>> >>> Note that using IFUNC  blocks inlining of versioned functions. I had implemented >>> an optimization earlier to do hot path cloning to allow versioned functions to >>> be inlined. Please see : http://gcc.gnu.org/ml/gcc-patches/2011-04/msg02285.html >>> In the next iteration, I plan to merge these two. With that, hot code paths with >>> versioned functions will be cloned so that versioned functions can be inlined. >> >> Note that inlining of functions with the target attribute is limited as well, >> but your issue is that of the indirect dispatch as ... >> >> You don't give an overview of the frontend implementation.  Thus I have >> extracted the following >> >>  - the FE does not really know about the "overloading", nor can it directly >>  resolve calls from a "sse" function to another "sse" function without going >>  through the 2nd IFUNC >> >>  - cgraph also does not know about the "overloading", so it cannot do such >>  "devirtualization" either >> >> you seem to have implemented something inbetween a pure frontend >> solution and a proper middle-end solution.  For optimization and eventually >> automatically selecting functions for cloning (like, callees of a manual "sse" >> versioned function should be cloned?) 
it would be nice if the cgraph would >> know about the different versions and their relationships (and the dispatcher). >> Especially the cgraph code should know the functions are semantically >> equivalent (I suppose we should require that).  The IFUNC should be >> generated by cgraph / target code, similar to how we generate C++ thunks. >> >> Honza, any suggestions on how the FE side of such cgraph infrastructure >> should look like and how we should encode the target bits? >> >> Thanks, >> Richard. >> >>>     * doc/tm.texi.in: Add description for TARGET_DISPATCH_VERSION. >>>     * doc/tm.texi: Regenerate. >>>     * c-family/c-common.c (handle_targetv_attribute): New function. >>>     * target.def (dispatch_version): New target hook. >>>     * tree.h (DECL_FUNCTION_VERSIONED): New macro. >>>     (tree_function_decl): New bit-field versioned_function. >>>     * tree-pass.h (pass_dispatch_versions): New pass. >>>     * multiversion.c: New file. >>>     * multiversion.h: New file. >>>     * cgraphunit.c: Include multiversion.h >>>     (cgraph_finalize_function): Change assembler names of versioned >>>     functions. >>>     * cp/class.c: Include multiversion.h >>>     (add_method): aggregate function versions. Change assembler names of >>>     versioned functions. >>>     (resolve_address_of_overloaded_function): Match address of function >>>     version with default function.  Return address of ifunc dispatcher >>>     for address of versioned functions. >>>     * cp/decl.c (decls_match): Make decls unmatched for versioned >>>     functions. >>>     (duplicate_decls): Remove ambiguity for versioned functions. Notify >>>     of deleted function version decls. >>>     (start_decl): Change assembler name of versioned functions. >>>     (start_function): Change assembler name of versioned functions. >>>     (cxx_comdat_group): Make comdat group of versioned functions be the >>>     same. 
>>>     * cp/semantics.c (expand_or_defer_fn_1): Mark as needed versioned >>>     functions that are also marked inline. >>>     * cp/decl2.c: Include multiversion.h >>>     (check_classfn): Check attributes of versioned functions for match. >>>     * cp/call.c: Include multiversion.h >>>     (build_over_call): Make calls to multiversioned functions to call the >>>     dispatcher. >>>     (joust): For calls to multi-versioned functions, make the default >>>     function win. >>>     * timevar.def (TV_MULTIVERSION_DISPATCH): New time var. >>>     * varasm.c (finish_aliases_1): Check if the alias points to a function >>>     with a body before giving an error. >>>     * Makefile.in: Add multiversion.o >>>     * passes.c: Add pass_dispatch_versions to the pass list. >>>     * config/i386/i386.c (add_condition_to_bb): New function. >>>     (get_builtin_code_for_version): New function. >>>     (ix86_dispatch_version): New function. >>>     (TARGET_DISPATCH_VERSION): New macro. >>>     * testsuite/g++.dg/mv1.C: New test. >>> >>> Index: doc/tm.texi >>> =================================================================== >>> --- doc/tm.texi (revision 184971) >>> +++ doc/tm.texi (working copy) >>> @@ -10995,6 +10995,14 @@ The result is another tree containing a simplified >>>  call's result.  If @var{ignore} is true the value will be ignored. >>>  @end deftypefn >>> >>> +@deftypefn {Target Hook} int TARGET_DISPATCH_VERSION (tree @var{dispatch_decl}, void *@var{fndecls}, basic_block *@var{empty_bb}) >>> +For multi-versioned function, this hook sets up the dispatcher. >>> +@var{dispatch_decl} is the function that will be used to dispatch the >>> +version. @var{fndecls} are the function choices for dispatch. >>> +@var{empty_bb} is an basic block in @var{dispatch_decl} where the >>> +code to do the dispatch will be added. 
>>> +@end deftypefn >>> + >>>  @deftypefn {Target Hook} {const char *} TARGET_INVALID_WITHIN_DOLOOP (const_rtx @var{insn}) >>> >>>  Take an instruction in @var{insn} and return NULL if it is valid within a >>> Index: doc/tm.texi.in >>> =================================================================== >>> --- doc/tm.texi.in    (revision 184971) >>> +++ doc/tm.texi.in    (working copy) >>> @@ -10873,6 +10873,14 @@ The result is another tree containing a simplified >>>  call's result.  If @var{ignore} is true the value will be ignored. >>>  @end deftypefn >>> >>> +@hook TARGET_DISPATCH_VERSION >>> +For multi-versioned function, this hook sets up the dispatcher. >>> +@var{dispatch_decl} is the function that will be used to dispatch the >>> +version. @var{fndecls} are the function choices for dispatch. >>> +@var{empty_bb} is an basic block in @var{dispatch_decl} where the >>> +code to do the dispatch will be added. >>> +@end deftypefn >>> + >>>  @hook TARGET_INVALID_WITHIN_DOLOOP >>> >>>  Take an instruction in @var{insn} and return NULL if it is valid within a >>> Index: c-family/c-common.c >>> =================================================================== >>> --- c-family/c-common.c (revision 184971) >>> +++ c-family/c-common.c (working copy) >>> @@ -315,6 +315,7 @@ static tree check_case_value (tree); >>>  static bool check_case_bounds (tree, tree, tree *, tree *); >>> >>>  static tree handle_packed_attribute (tree *, tree, tree, int, bool *); >>> +static tree handle_targetv_attribute (tree *, tree, tree, int, bool *); >>>  static tree handle_nocommon_attribute (tree *, tree, tree, int, bool *); >>>  static tree handle_common_attribute (tree *, tree, tree, int, bool *); >>>  static tree handle_noreturn_attribute (tree *, tree, tree, int, bool *); >>> @@ -604,6 +605,8 @@ const struct attribute_spec c_common_attribute_tab >>>  { >>>  /* { name, min_len, max_len, decl_req, type_req, fn_type_req, handler, >>>     affects_type_identity } */ >>> +  { "targetv",      
  1, -1, true, false, false, >>> +               handle_targetv_attribute, false }, >>>  { "packed",         0, 0, false, false, false, >>>                handle_packed_attribute , false}, >>>  { "nocommon",        0, 0, true,  false, false, >>> @@ -5869,6 +5872,54 @@ handle_packed_attribute (tree *node, tree name, tr >>>  return NULL_TREE; >>>  } >>> >>> +/* The targetv attribue is used to specify a function version >>> +  targeted to specific platform types.  The "targetv" attributes >>> +  have to be valid "target" attributes.  NODE should always point >>> +  to a FUNCTION_DECL.  ARGS contain the arguments to "targetv" >>> +  which should be valid arguments to attribute "target" too. >>> +  Check handle_target_attribute for FLAGS and NO_ADD_ATTRS.  */ >>> + >>> +static tree >>> +handle_targetv_attribute (tree *node, tree name, >>> +             tree args, >>> +             int flags, >>> +             bool *no_add_attrs) >>> +{ >>> +  const char *attr_str = NULL; >>> +  gcc_assert (TREE_CODE (*node) == FUNCTION_DECL); >>> +  gcc_assert (args != NULL); >>> + >>> +  /* This is a function version.  */ >>> +  DECL_FUNCTION_VERSIONED (*node) = 1; >>> + >>> +  attr_str = TREE_STRING_POINTER (TREE_VALUE (args)); >>> + >>> +  /* Check if multiple sets of target attributes are there.  This >>> +   is not supported now.  In future, this will be supported by >>> +   cloning this function for each set.  */ >>> +  if (TREE_CHAIN (args) != NULL) >>> +   warning (OPT_Wattributes, "%qE attribute has multiple sets which " >>> +       "is not supported", name); >>> + >>> +  if (attr_str == NULL >>> +    || strstr (attr_str, "arch=") == NULL) >>> +   error_at (DECL_SOURCE_LOCATION (*node), >>> +       "Versioning supported only on \"arch=\" for now"); >>> + >>> +  /* targetv attributes must translate into target attributes.  
*/ >>> +  handle_target_attribute (node, get_identifier ("target"), args, flags, >>> +              no_add_attrs); >>> + >>> +  if (*no_add_attrs) >>> +   warning (OPT_Wattributes, "%qE attribute has no effect", name); >>> + >>> +  /* This is necessary to keep the attribute tagged to the decl >>> +   all the time.  */ >>> +  *no_add_attrs = false; >>> + >>> +  return NULL_TREE; >>> +} >>> + >>>  /* Handle a "nocommon" attribute; arguments as in >>>   struct attribute_spec.handler.  */ >>> >>> Index: target.def >>> =================================================================== >>> --- target.def  (revision 184971) >>> +++ target.def  (working copy) >>> @@ -1249,6 +1249,15 @@ DEFHOOK >>>  tree, (tree fndecl, int n_args, tree *argp, bool ignore), >>>  hook_tree_tree_int_treep_bool_null) >>> >>> +/* Target hook to generate the dispatching code for calls to multi-versioned >>> +  functions.  DISPATCH_DECL is the function that will have the dispatching >>> +  logic.  FNDECLS are the list of choices for dispatch and EMPTY_BB is the >>> +  basic bloc in DISPATCH_DECL which will contain the code.  */ >>> +DEFHOOK >>> +(dispatch_version, >>> + "", >>> + int, (tree dispatch_decl, void *fndecls, basic_block *empty_bb), NULL) >>> + >>>  /* Returns a code for a target-specific builtin that implements >>>   reciprocal of the function, or NULL_TREE if not available.  */ >>>  DEFHOOK >>> Index: tree.h >>> =================================================================== >>> --- tree.h    (revision 184971) >>> +++ tree.h    (working copy) >>> @@ -3532,6 +3532,12 @@ extern VEC(tree, gc) **decl_debug_args_insert (tre >>>  #define DECL_FUNCTION_SPECIFIC_OPTIMIZATION(NODE) \ >>>   (FUNCTION_DECL_CHECK (NODE)->function_decl.function_specific_optimization) >>> >>> +/* In FUNCTION_DECL, this is set if this function has other versions generated >>> +  using "targetv" attributes.  The default version is the one which does not >>> +  have any "targetv" attribute set. 
*/ >>> +#define DECL_FUNCTION_VERSIONED(NODE)\ >>> +  (FUNCTION_DECL_CHECK (NODE)->function_decl.versioned_function) >>> + >>>  /* FUNCTION_DECL inherits from DECL_NON_COMMON because of the use of the >>>   arguments/result/saved_tree fields by front ends.  It was either inherit >>>   FUNCTION_DECL from non_common, or inherit non_common from FUNCTION_DECL, >>> @@ -3576,8 +3582,8 @@ struct GTY(()) tree_function_decl { >>>  unsigned looping_const_or_pure_flag : 1; >>>  unsigned has_debug_args_flag : 1; >>>  unsigned tm_clone_flag : 1; >>> - >>> -  /* 1 bit left */ >>> +  unsigned versioned_function : 1; >>> +  /* No bits left.  */ >>>  }; >>> >>>  /* The source language of the translation-unit.  */ >>> Index: tree-pass.h >>> =================================================================== >>> --- tree-pass.h (revision 184971) >>> +++ tree-pass.h (working copy) >>> @@ -455,6 +455,7 @@ extern struct gimple_opt_pass pass_tm_memopt; >>>  extern struct gimple_opt_pass pass_tm_edges; >>>  extern struct gimple_opt_pass pass_split_functions; >>>  extern struct gimple_opt_pass pass_feedback_split_functions; >>> +extern struct gimple_opt_pass pass_dispatch_versions; >>> >>>  /* IPA Passes */ >>>  extern struct simple_ipa_opt_pass pass_ipa_lower_emutls; >>> Index: multiversion.c >>> =================================================================== >>> --- multiversion.c    (revision 0) >>> +++ multiversion.c    (revision 0) >>> @@ -0,0 +1,798 @@ >>> +/* Function Multiversioning. >>> +  Copyright (C) 2012 Free Software Foundation, Inc. >>> +  Contributed by Sriraman Tallam (tmsriram@google.com) >>> + >>> +This file is part of GCC. >>> + >>> +GCC is free software; you can redistribute it and/or modify it under >>> +the terms of the GNU General Public License as published by the Free >>> +Software Foundation; either version 3, or (at your option) any later >>> +version. 
>>> + >>> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY >>> +WARRANTY; without even the implied warranty of MERCHANTABILITY or >>> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License >>> +for more details. >>> + >>> +You should have received a copy of the GNU General Public License >>> +along with GCC; see the file COPYING3.  If not see >>> +<http://www.gnu.org/licenses/>. */ >>> + >>> +/* Holds the state for multi-versioned functions here. The front-end >>> +  updates the state as and when function versions are encountered. >>> +  This is then used to generate the dispatch code.  Also, the >>> +  optimization passes to clone hot paths involving versioned functions >>> +  will be done here. >>> + >>> +  Function versions are created by using the same function signature but >>> +  also tagging attribute "targetv" to specify the platform type for which >>> +  the version must be executed.  Here is an example: >>> + >>> +  int foo () >>> +  { >>> +   printf ("Execute as default"); >>> +   return 0; >>> +  } >>> + >>> +  int  __attribute__ ((targetv ("arch=corei7"))) >>> +  foo () >>> +  { >>> +   printf ("Execute for corei7"); >>> +   return 0; >>> +  } >>> + >>> +  int main () >>> +  { >>> +   return foo (); >>> +  } >>> + >>> +  The call to foo in main is replaced with a call to an IFUNC function that >>> +  contains the dispatch code to call the correct function version at >>> +  run-time.  
*/ >>> + >>> + >>> +#include "config.h" >>> +#include "system.h" >>> +#include "coretypes.h" >>> +#include "tm.h" >>> +#include "tree.h" >>> +#include "tree-inline.h" >>> +#include "langhooks.h" >>> +#include "flags.h" >>> +#include "cgraph.h" >>> +#include "diagnostic.h" >>> +#include "toplev.h" >>> +#include "timevar.h" >>> +#include "params.h" >>> +#include "fibheap.h" >>> +#include "intl.h" >>> +#include "tree-pass.h" >>> +#include "hashtab.h" >>> +#include "coverage.h" >>> +#include "ggc.h" >>> +#include "tree-flow.h" >>> +#include "rtl.h" >>> +#include "ipa-prop.h" >>> +#include "basic-block.h" >>> +#include "toplev.h" >>> +#include "dbgcnt.h" >>> +#include "tree-dump.h" >>> +#include "output.h" >>> +#include "vecprim.h" >>> +#include "gimple-pretty-print.h" >>> +#include "ipa-inline.h" >>> +#include "target.h" >>> +#include "multiversion.h" >>> + >>> +typedef void * void_p; >>> + >>> +DEF_VEC_P (void_p); >>> +DEF_VEC_ALLOC_P (void_p, heap); >>> + >>> +/* Each function decl that is a function version gets an instance of this >>> +  structure.  Since this is called by the front-end, decl merging can >>> +  happen, where a decl created for a new declaration is merged with >>> +  the old. In this case, the new decl is deleted and the IS_DELETED >>> +  field is set for the struct instance corresponding to the new decl. >>> +  IFUNC_DECL is the decl of the ifunc function for default decls. >>> +  IFUNC_RESOLVER_DECL is the decl of the dispatch function.  VERSIONS >>> +  is a vector containing the list of function versions  that are >>> +  the candidates for dispatch.  */ >>> + >>> +typedef struct version_function_d { >>> +  tree decl; >>> +  tree ifunc_decl; >>> +  tree ifunc_resolver_decl; >>> +  VEC (void_p, heap) *versions; >>> +  bool is_deleted; >>> +} version_function; >>> + >>> +/* Hashmap has an entry for every function decl that has other function >>> +  versions.  
For function decls that are the default, it also stores the >>> +  list of all the other function versions.  Each entry is a structure >>> +  of type version_function_d.  */ >>> +static htab_t decl_version_htab = NULL; >>> + >>> +/* Hashtable helpers for decl_version_htab. */ >>> + >>> +static hashval_t >>> +decl_version_htab_hash_descriptor (const void *p) >>> +{ >>> +  const version_function *t = (const version_function *) p; >>> +  return htab_hash_pointer (t->decl); >>> +} >>> + >>> +/* Hashtable helper for decl_version_htab. */ >>> + >>> +static int >>> +decl_version_htab_eq_descriptor (const void *p1, const void *p2) >>> +{ >>> +  const version_function *t1 = (const version_function *) p1; >>> +  return htab_eq_pointer ((const void_p) t1->decl, p2); >>> +} >>> + >>> +/* Create the decl_version_htab.  */ >>> +static void >>> +create_decl_version_htab (void) >>> +{ >>> +  if (decl_version_htab == NULL) >>> +   decl_version_htab = htab_create (10, decl_version_htab_hash_descriptor, >>> +                   decl_version_htab_eq_descriptor, NULL); >>> +} >>> + >>> +/* Creates an instance of version_function for decl DECL.  */ >>> + >>> +static version_function* >>> +new_version_function (const tree decl) >>> +{ >>> +  version_function *v; >>> +  v = (version_function *)xmalloc(sizeof (version_function)); >>> +  v->decl = decl; >>> +  v->ifunc_decl = NULL; >>> +  v->ifunc_resolver_decl = NULL; >>> +  v->versions = NULL; >>> +  v->is_deleted = false; >>> +  return v; >>> +} >>> + >>> +/* Comparator function to be used in qsort routine to sort attribute >>> +  specification strings to "targetv".  */ >>> + >>> +static int >>> +attr_strcmp (const void *v1, const void *v2) >>> +{ >>> +  const char *c1 = *(char *const*)v1; >>> +  const char *c2 = *(char *const*)v2; >>> +  return strcmp (c1, c2); >>> +} >>> + >>> +/* STR is the argument to targetv attribute.  
This function tokenizes >>> +  the comma separated arguments, sorts them and returns a string which >>> +  is a unique identifier for the comma separated arguments.  */ >>> + >>> +static char * >>> +sorted_attr_string (const char *str) >>> +{ >>> +  char **args = NULL; >>> +  char *attr_str, *ret_str; >>> +  char *attr = NULL; >>> +  unsigned int argnum = 1; >>> +  unsigned int i; >>> + >>> +  for (i = 0; i < strlen (str); i++) >>> +   if (str[i] == ',') >>> +    argnum++; >>> + >>> +  attr_str = (char *)xmalloc (strlen (str) + 1); >>> +  strcpy (attr_str, str); >>> + >>> +  for (i = 0; i < strlen (attr_str); i++) >>> +   if (attr_str[i] == '=') >>> +    attr_str[i] = '_'; >>> + >>> +  if (argnum == 1) >>> +   return attr_str; >>> + >>> +  args = (char **)xmalloc (argnum * sizeof (char *)); >>> + >>> +  i = 0; >>> +  attr = strtok (attr_str, ","); >>> +  while (attr != NULL) >>> +   { >>> +    args[i] = attr; >>> +    i++; >>> +    attr = strtok (NULL, ","); >>> +   } >>> + >>> +  qsort (args, argnum, sizeof (char*), attr_strcmp); >>> + >>> +  ret_str = (char *)xmalloc (strlen (str) + 1); >>> +  strcpy (ret_str, args[0]); >>> +  for (i = 1; i < argnum; i++) >>> +   { >>> +    strcat (ret_str, "_"); >>> +    strcat (ret_str, args[i]); >>> +   } >>> + >>> +  free (args); >>> +  free (attr_str); >>> +  return ret_str; >>> +} >>> + >>> +/* Returns true when only one of DECL1 and DECL2 is marked with "targetv" >>> +  or if the "targetv" attribute strings of DECL1 and DECL2 don't match.  
*/ >>> + >>> +bool >>> +has_different_version_attributes (const tree decl1, const tree decl2) >>> +{ >>> +  tree attr1, attr2; >>> +  char *c1, *c2; >>> +  bool ret = false; >>> + >>> +  if (TREE_CODE (decl1) != FUNCTION_DECL >>> +    || TREE_CODE (decl2) != FUNCTION_DECL) >>> +   return false; >>> + >>> +  attr1 = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl1)); >>> +  attr2 = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl2)); >>> + >>> +  if (attr1 == NULL_TREE && attr2 == NULL_TREE) >>> +   return false; >>> + >>> +  if ((attr1 == NULL_TREE && attr2 != NULL_TREE) >>> +    || (attr1 != NULL_TREE && attr2 == NULL_TREE)) >>> +   return true; >>> + >>> +  c1 = sorted_attr_string ( >>> +    TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr1)))); >>> +  c2 = sorted_attr_string ( >>> +    TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr2)))); >>> + >>> +  if (strcmp (c1, c2) != 0) >>> +   ret = true; >>> + >>> +  free (c1); >>> +  free (c2); >>> + >>> +  return ret; >>> +} >>> + >>> +/* If this decl corresponds to a function and has "targetv" attribute, >>> +  append the attribute string to its assembler name.  
*/ >>> + >>> +void >>> +version_assembler_name (const tree decl) >>> +{ >>> +  tree version_attr; >>> +  const char *orig_name, *version_string, *attr_str; >>> +  char *assembler_name; >>> +  tree assembler_name_tree; >>> + >>> +  if (TREE_CODE (decl) != FUNCTION_DECL >>> +    || DECL_ASSEMBLER_NAME_SET_P (decl) >>> +    || !DECL_FUNCTION_VERSIONED (decl)) >>> +   return; >>> + >>> +  if (DECL_DECLARED_INLINE_P (decl) >>> +    &&lookup_attribute ("gnu_inline", >>> +             DECL_ATTRIBUTES (decl))) >>> +   error_at (DECL_SOURCE_LOCATION (decl), >>> +       "Function versions cannot be marked as gnu_inline," >>> +       " bodies have to be generated\n"); >>> + >>> +  if (DECL_VIRTUAL_P (decl) >>> +    || DECL_VINDEX (decl)) >>> +   error_at (DECL_SOURCE_LOCATION (decl), >>> +       "Virtual function versioning not supported\n"); >>> + >>> +  version_attr = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl)); >>> +  /* targetv attribute string is NULL for default functions.  */ >>> +  if (version_attr == NULL_TREE) >>> +   return; >>> + >>> +  orig_name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)); >>> +  version_string >>> +   = TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (version_attr))); >>> + >>> +  attr_str = sorted_attr_string (version_string); >>> +  assembler_name = (char *) xmalloc (strlen (orig_name) >>> +                   + strlen (attr_str) + 2); >>> + >>> +  sprintf (assembler_name, "%s.%s", orig_name, attr_str); >>> +  if (dump_file) >>> +   fprintf (dump_file, "Assembler name set to %s for function version %s\n", >>> +       assembler_name, IDENTIFIER_POINTER (DECL_NAME (decl))); >>> +  assembler_name_tree = get_identifier (assembler_name); >>> +  SET_DECL_ASSEMBLER_NAME (decl, assembler_name_tree); >>> +} >>> + >>> +/* Returns true if decl is multi-versioned and DECL is the default function, >>> +  that is it is not tagged with "targetv" attribute.  
*/ >>> + >>> +bool >>> +is_default_function (const tree decl) >>> +{ >>> +  return (TREE_CODE (decl) == FUNCTION_DECL >>> +     && DECL_FUNCTION_VERSIONED (decl) >>> +     && (lookup_attribute ("targetv", DECL_ATTRIBUTES (decl)) >>> +       == NULL_TREE)); >>> +} >>> + >>> +/* For function decl DECL, find the version_function struct in the >>> +  decl_version_htab.  */ >>> + >>> +static version_function * >>> +find_function_version (const tree decl) >>> +{ >>> +  void *slot; >>> + >>> +  if (!DECL_FUNCTION_VERSIONED (decl)) >>> +   return NULL; >>> + >>> +  if (!decl_version_htab) >>> +   return NULL; >>> + >>> +  slot = htab_find_with_hash (decl_version_htab, decl, >>> +                htab_hash_pointer (decl)); >>> + >>> +  if (slot != NULL) >>> +   return (version_function *)slot; >>> + >>> +  return NULL; >>> +} >>> + >>> +/* Record DECL as a function version by creating a version_function struct >>> +  for it and storing it in the hashtable.  */ >>> + >>> +static version_function * >>> +add_function_version (const tree decl) >>> +{ >>> +  void **slot; >>> +  version_function *v; >>> + >>> +  if (!DECL_FUNCTION_VERSIONED (decl)) >>> +   return NULL; >>> + >>> +  create_decl_version_htab (); >>> + >>> +  slot = htab_find_slot_with_hash (decl_version_htab, (const void_p)decl, >>> +                  htab_hash_pointer ((const void_p)decl), >>> +                  INSERT); >>> + >>> +  if (*slot != NULL) >>> +   return (version_function *)*slot; >>> + >>> +  v = new_version_function (decl); >>> +  *slot = v; >>> + >>> +  return v; >>> +} >>> + >>> +/* Push V into VEC only if it is not already present.  
*/ >>> + >>> +static void >>> +push_function_version (version_function *v, VEC (void_p, heap) *vec) >>> +{ >>> +  int ix; >>> +  void_p ele; >>> +  for (ix = 0; VEC_iterate (void_p, vec, ix, ele); ++ix) >>> +   { >>> +    if (ele == (void_p)v) >>> +     return; >>> +   } >>> + >>> +  VEC_safe_push (void_p, heap, vec, (void*)v); >>> +} >>> + >>> +/* Mark DECL as deleted.  This is called by the front-end when a duplicate >>> +  decl is merged with the original decl and the duplicate decl is deleted. >>> +  This function marks the duplicate_decl as invalid.  Called by >>> +  duplicate_decls in cp/decl.c.  */ >>> + >>> +void >>> +mark_delete_decl_version (const tree decl) >>> +{ >>> +  version_function *decl_v; >>> + >>> +  decl_v = find_function_version (decl); >>> + >>> +  if (decl_v == NULL) >>> +   return; >>> + >>> +  decl_v->is_deleted = true; >>> + >>> +  if (is_default_function (decl) >>> +    && decl_v->versions != NULL) >>> +   { >>> +    VEC_truncate (void_p, decl_v->versions, 0); >>> +    VEC_free (void_p, heap, decl_v->versions); >>> +   } >>> +} >>> + >>> +/* Mark DECL1 and DECL2 to be function versions in the same group.  One >>> +  of DECL1 and DECL2 must be the default, otherwise this function does >>> +  nothing.  This function aggregates the versions.  */ >>> + >>> +int >>> +group_function_versions (const tree decl1, const tree decl2) >>> +{ >>> +  tree default_decl, version_decl; >>> +  version_function *default_v, *version_v; >>> + >>> +  gcc_assert (DECL_FUNCTION_VERSIONED (decl1) >>> +       && DECL_FUNCTION_VERSIONED (decl2)); >>> + >>> +  /* The version decls are added only to the default decl.  */ >>> +  if (!is_default_function (decl1) >>> +    && !is_default_function (decl2)) >>> +   return 0; >>> + >>> +  /* This can happen with duplicate declarations.  Just ignore.  */ >>> +  if (is_default_function (decl1) >>> +    && is_default_function (decl2)) >>> +   return 0; >>> + >>> +  default_decl = (is_default_function (decl1)) ? 
decl1 : decl2; >>> +  version_decl = (default_decl == decl1) ? decl2 : decl1; >>> + >>> +  gcc_assert (default_decl != version_decl); >>> +  create_decl_version_htab (); >>> + >>> +  /* If the version function is found, it has been added.  */ >>> +  if (find_function_version (version_decl)) >>> +   return 0; >>> + >>> +  default_v = add_function_version (default_decl); >>> +  version_v = add_function_version (version_decl); >>> + >>> +  if (default_v->versions == NULL) >>> +   default_v->versions = VEC_alloc (void_p, heap, 1); >>> + >>> +  push_function_version (version_v, default_v->versions); >>> +  return 0; >>> +} >>> + >>> +/* Makes a function attribute of the form NAME(ARG_NAME) and chains >>> +  it to CHAIN.  */ >>> + >>> +static tree >>> +make_attribute (const char *name, const char *arg_name, tree chain) >>> +{ >>> +  tree attr_name; >>> +  tree attr_arg_name; >>> +  tree attr_args; >>> +  tree attr; >>> + >>> +  attr_name = get_identifier (name); >>> +  attr_arg_name = build_string (strlen (arg_name), arg_name); >>> +  attr_args = tree_cons (NULL_TREE, attr_arg_name, NULL_TREE); >>> +  attr = tree_cons (attr_name, attr_args, chain); >>> +  return attr; >>> +} >>> + >>> +/* Return a new name by appending SUFFIX to the DECL name.  If >>> +  make_unique is true, append the full path name.  */ >>> + >>> +static char * >>> +make_name (tree decl, const char *suffix, bool make_unique) >>> +{ >>> +  char *global_var_name; >>> +  int name_len; >>> +  const char *name; >>> +  const char *unique_name = NULL; >>> + >>> +  name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)); >>> + >>> +  /* Get a unique name that can be used globally without any chances >>> +   of collision at link time.  
*/ >>> +  if (make_unique) >>> +   unique_name = IDENTIFIER_POINTER (get_file_function_name ("\0")); >>> + >>> +  name_len = strlen (name) + strlen (suffix) + 2; >>> + >>> +  if (make_unique) >>> +   name_len += strlen (unique_name) + 1; >>> +  global_var_name = (char *) xmalloc (name_len); >>> + >>> +  /* Use '.' to concatenate names as it is demangler friendly.  */ >>> +  if (make_unique) >>> +    snprintf (global_var_name, name_len, "%s.%s.%s", name, >>> +        unique_name, suffix); >>> +  else >>> +    snprintf (global_var_name, name_len, "%s.%s", name, suffix); >>> + >>> +  return global_var_name; >>> +} >>> + >>> +/* Make the resolver function decl for ifunc (IFUNC_DECL) to dispatch >>> +  the versions of multi-versioned function DEFAULT_DECL.  Create an >>> +  empty basic block in the resolver and store the pointer in >>> +  EMPTY_BB.  Return the decl of the resolver function.  */ >>> + >>> +static tree >>> +make_ifunc_resolver_func (const tree default_decl, >>> +             const tree ifunc_decl, >>> +             basic_block *empty_bb) >>> +{ >>> +  char *resolver_name; >>> +  tree decl, type, decl_name, t; >>> +  basic_block new_bb; >>> +  tree old_current_function_decl; >>> +  bool make_unique = false; >>> + >>> +  /* IFUNCs have to be globally visible.  So, if the default_decl is >>> +   not, then the name of the IFUNC should be made unique.  */ >>> +  if (TREE_PUBLIC (default_decl) == 0) >>> +   make_unique = true; >>> + >>> +  /* Append the filename to the resolver function if the versions are >>> +   not externally visible.  This is because the resolver function has >>> +   to be externally visible for the loader to find it.  So, appending >>> +   the filename will prevent conflicts with a resolver function from >>> +   another module which is based on the same version name.  */ >>> +  resolver_name = make_name (default_decl, "resolver", make_unique); >>> + >>> +  /* The resolver function should return a (void *). 
*/ >>> +  type = build_function_type_list (ptr_type_node, NULL_TREE); >>> + >>> +  decl = build_fn_decl (resolver_name, type); >>> +  decl_name = get_identifier (resolver_name); >>> +  SET_DECL_ASSEMBLER_NAME (decl, decl_name); >>> + >>> +  DECL_NAME (decl) = decl_name; >>> +  TREE_USED (decl) = TREE_USED (default_decl); >>> +  DECL_ARTIFICIAL (decl) = 1; >>> +  DECL_IGNORED_P (decl) = 0; >>> +  /* IFUNC resolvers have to be externally visible.  */ >>> +  TREE_PUBLIC (decl) = 1; >>> +  DECL_UNINLINABLE (decl) = 1; >>> + >>> +  DECL_EXTERNAL (decl) = DECL_EXTERNAL (default_decl); >>> +  DECL_EXTERNAL (ifunc_decl) = 0; >>> + >>> +  DECL_CONTEXT (decl) = NULL_TREE; >>> +  DECL_INITIAL (decl) = make_node (BLOCK); >>> +  DECL_STATIC_CONSTRUCTOR (decl) = 0; >>> +  TREE_READONLY (decl) = 0; >>> +  DECL_PURE_P (decl) = 0; >>> +  DECL_COMDAT (decl) = DECL_COMDAT (default_decl); >>> +  if (DECL_COMDAT_GROUP (default_decl)) >>> +   { >>> +    make_decl_one_only (decl, DECL_COMDAT_GROUP (default_decl)); >>> +   } >>> +  /* Build result decl and add to function_decl. 
*/ >>> +  t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, ptr_type_node); >>> +  DECL_ARTIFICIAL (t) = 1; >>> +  DECL_IGNORED_P (t) = 1; >>> +  DECL_RESULT (decl) = t; >>> + >>> +  gimplify_function_tree (decl); >>> +  old_current_function_decl = current_function_decl; >>> +  push_cfun (DECL_STRUCT_FUNCTION (decl)); >>> +  current_function_decl = decl; >>> +  init_empty_tree_cfg_for_function (DECL_STRUCT_FUNCTION (decl)); >>> +  cfun->curr_properties |= >>> +   (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_referenced_vars | >>> +   PROP_ssa); >>> +  new_bb = create_empty_bb (ENTRY_BLOCK_PTR); >>> +  make_edge (ENTRY_BLOCK_PTR, new_bb, EDGE_FALLTHRU); >>> +  make_edge (new_bb, EXIT_BLOCK_PTR, 0); >>> +  *empty_bb = new_bb; >>> + >>> +  cgraph_add_new_function (decl, true); >>> +  cgraph_call_function_insertion_hooks (cgraph_get_create_node (decl)); >>> +  cgraph_analyze_function (cgraph_get_create_node (decl)); >>> +  cgraph_mark_needed_node (cgraph_get_create_node (decl)); >>> + >>> +  if (DECL_COMDAT_GROUP (default_decl)) >>> +   { >>> +    gcc_assert (cgraph_get_node (default_decl)); >>> +    cgraph_add_to_same_comdat_group (cgraph_get_node (decl), >>> +                    cgraph_get_node (default_decl)); >>> +   } >>> + >>> +  pop_cfun (); >>> +  current_function_decl = old_current_function_decl; >>> + >>> +  gcc_assert (ifunc_decl != NULL); >>> +  DECL_ATTRIBUTES (ifunc_decl) >>> +   = make_attribute ("ifunc", resolver_name, DECL_ATTRIBUTES (ifunc_decl)); >>> +  assemble_alias (ifunc_decl, get_identifier (resolver_name)); >>> +  return decl; >>> +} >>> + >>> +/* Make an ifunc declaration for the multi-versioned function DECL.  Calls to >>> +  DECL function will be replaced with calls to the ifunc.  Return the decl >>> +  of the ifunc created.  
*/ >>> + >>> +static tree >>> +make_ifunc_func (const tree decl) >>> +{ >>> +  tree ifunc_decl; >>> +  char *ifunc_name, *resolver_name; >>> +  tree fn_type, ifunc_type; >>> +  bool make_unique = false; >>> + >>> +  if (TREE_PUBLIC (decl) == 0) >>> +   make_unique = true; >>> + >>> +  ifunc_name = make_name (decl, "ifunc", make_unique); >>> +  resolver_name = make_name (decl, "resolver", make_unique); >>> +  gcc_assert (resolver_name); >>> + >>> +  fn_type = TREE_TYPE (decl); >>> +  ifunc_type = build_function_type (TREE_TYPE (fn_type), >>> +                  TYPE_ARG_TYPES (fn_type)); >>> + >>> +  ifunc_decl = build_fn_decl (ifunc_name, ifunc_type); >>> +  TREE_USED (ifunc_decl) = 1; >>> +  DECL_CONTEXT (ifunc_decl) = NULL_TREE; >>> +  DECL_INITIAL (ifunc_decl) = error_mark_node; >>> +  DECL_ARTIFICIAL (ifunc_decl) = 1; >>> +  /* Mark this ifunc as external, the resolver will flip it again if >>> +   it gets generated.  */ >>> +  DECL_EXTERNAL (ifunc_decl) = 1; >>> +  /* IFUNCs have to be externally visible.  */ >>> +  TREE_PUBLIC (ifunc_decl) = 1; >>> + >>> +  return ifunc_decl; >>> +} >>> + >>> +/* For multi-versioned function decl, which should also be the default, >>> +  return the decl of the ifunc resolver, create it if it does not >>> +  exist.  */ >>> + >>> +tree >>> +get_ifunc_for_version (const tree decl) >>> +{ >>> +  version_function *decl_v; >>> +  int ix; >>> +  void_p ele; >>> + >>> +  /* DECL has to be the default version, otherwise it is missing and >>> +   that is not allowed.  
*/ >>> +  if (!is_default_function (decl)) >>> +   { >>> +    error_at (DECL_SOURCE_LOCATION (decl), "Default version not found"); >>> +    return decl; >>> +   } >>> + >>> +  decl_v = find_function_version (decl); >>> +  gcc_assert (decl_v != NULL); >>> +  if (decl_v->ifunc_decl == NULL) >>> +   { >>> +    tree ifunc_decl; >>> +    ifunc_decl = make_ifunc_func (decl); >>> +    decl_v->ifunc_decl = ifunc_decl; >>> +   } >>> + >>> +  if (cgraph_get_node (decl)) >>> +   cgraph_mark_needed_node (cgraph_get_node (decl)); >>> + >>> +  for (ix = 0; VEC_iterate (void_p, decl_v->versions, ix, ele); ++ix) >>> +   { >>> +    version_function *v = (version_function *) ele; >>> +    gcc_assert (v->decl != NULL); >>> +    if (cgraph_get_node (v->decl)) >>> +    cgraph_mark_needed_node (cgraph_get_node (v->decl)); >>> +   } >>> + >>> +  return decl_v->ifunc_decl; >>> +} >>> + >>> +/* Generate the dispatching code to dispatch multi-versioned function >>> +  DECL.  Make a new function decl for dispatching and call the target >>> +  hook to process the "targetv" attributes and provide the code to >>> +  dispatch the right function at run-time.  
*/ >>> + >>> +static tree >>> +make_ifunc_resolver_for_version (const tree decl) >>> +{ >>> +  version_function *decl_v; >>> +  tree ifunc_resolver_decl, ifunc_decl; >>> +  basic_block empty_bb; >>> +  int ix; >>> +  void_p ele; >>> +  VEC (tree, heap) *fn_ver_vec = NULL; >>> + >>> +  gcc_assert (is_default_function (decl)); >>> + >>> +  decl_v = find_function_version (decl); >>> +  gcc_assert (decl_v != NULL); >>> + >>> +  if (decl_v->ifunc_resolver_decl != NULL) >>> +   return decl_v->ifunc_resolver_decl; >>> + >>> +  ifunc_decl = decl_v->ifunc_decl; >>> + >>> +  if (ifunc_decl == NULL) >>> +   ifunc_decl = decl_v->ifunc_decl = make_ifunc_func (decl); >>> + >>> +  ifunc_resolver_decl = make_ifunc_resolver_func (decl, ifunc_decl, >>> +                         &empty_bb); >>> + >>> +  fn_ver_vec = VEC_alloc (tree, heap, 2); >>> +  VEC_safe_push (tree, heap, fn_ver_vec, decl); >>> + >>> +  for (ix = 0; VEC_iterate (void_p, decl_v->versions, ix, ele); ++ix) >>> +   { >>> +    version_function *v = (version_function *) ele; >>> +    gcc_assert (v->decl != NULL); >>> +    /* Check for virtual functions here again, as by this time it should >>> +     have been determined if this function needs a vtable index or >>> +     not.  This happens for methods in derived classes that override >>> +     virtual methods in base classes but are not explicitly marked as >>> +     virtual.  */ >>> +    if (DECL_VINDEX (v->decl)) >>> +     error_at (DECL_SOURCE_LOCATION (v->decl), >>> +         "Virtual function versioning not supported\n"); >>> +    if (!v->is_deleted) >>> +    VEC_safe_push (tree, heap, fn_ver_vec, v->decl); >>> +   } >>> + >>> +  gcc_assert (targetm.dispatch_version); >>> +  targetm.dispatch_version (ifunc_resolver_decl, fn_ver_vec, &empty_bb); >>> +  decl_v->ifunc_resolver_decl = ifunc_resolver_decl; >>> + >>> +  return ifunc_resolver_decl; >>> +} >>> + >>> +/* Main entry point to pass_dispatch_versions. 
For multi-versioned functions, >>> +  generate the dispatching code.  */ >>> + >>> +static unsigned int >>> +do_dispatch_versions (void) >>> +{ >>> +  /* A new pass for generating dispatch code for multi-versioned functions. >>> +   Other forms of dispatch can be added when ifunc support is not available >>> +   like just calling the function directly after checking for target type. >>> +   Currently, dispatching is done through IFUNC.  This pass will become >>> +   more meaningful when other dispatch mechanisms are added.  */ >>> + >>> +  /* Cloning a function to produce more versions will happen here when the >>> +   user requests that via the targetv attribute. For example, >>> +   int foo () __attribute__ ((targetv(("arch=core2"), ("arch=corei7")))); >>> +   means that the user wants the same body of foo to be versioned for core2 >>> +   and corei7.  In that case, this function will be cloned during this >>> +   pass.  */ >>> + >>> +  if (DECL_FUNCTION_VERSIONED (current_function_decl) >>> +    && is_default_function (current_function_decl)) >>> +   { >>> +    tree decl = make_ifunc_resolver_for_version (current_function_decl); >>> +    if (dump_file && decl) >>> +    dump_function_to_file (decl, dump_file, TDF_BLOCKS); >>> +   } >>> +  return 0; >>> +} >>> + >>> +static  bool >>> +gate_dispatch_versions (void) >>> +{ >>> +  return true; >>> +} >>> + >>> +/* A pass to generate the dispatch code to execute the appropriate version >>> +  of a multi-versioned function at run-time.  
*/ >>> + >>> +struct gimple_opt_pass pass_dispatch_versions = >>> +{ >>> + { >>> +  GIMPLE_PASS, >>> +  "dispatch_multiversion_functions",   /* name */ >>> +  gate_dispatch_versions,        /* gate */ >>> +  do_dispatch_versions,             /* execute */ >>> +  NULL,                     /* sub */ >>> +  NULL,                     /* next */ >>> +  0,                  /* static_pass_number */ >>> +  TV_MULTIVERSION_DISPATCH,       /* tv_id */ >>> +  PROP_cfg,               /* properties_required */ >>> +  PROP_cfg,               /* properties_provided */ >>> +  0,                  /* properties_destroyed */ >>> +  0,                  /* todo_flags_start */ >>> +  TODO_dump_func |           /* todo_flags_finish */ >>> +  TODO_cleanup_cfg | TODO_dump_cgraph >>> + } >>> +}; >>> Index: cgraphunit.c >>> =================================================================== >>> --- cgraphunit.c     (revision 184971) >>> +++ cgraphunit.c     (working copy) >>> @@ -141,6 +141,7 @@ along with GCC; see the file COPYING3.  If not see >>>  #include "ipa-inline.h" >>>  #include "ipa-utils.h" >>>  #include "lto-streamer.h" >>> +#include "multiversion.h" >>> >>>  static void cgraph_expand_all_functions (void); >>>  static void cgraph_mark_functions_to_output (void); >>> @@ -343,6 +344,13 @@ cgraph_finalize_function (tree decl, bool nested) >>>    node->local.redefined_extern_inline = true; >>>   } >>> >>> +  /* If this is a function version and not the default, change the >>> +   assembler name of this function.  The DECL names of function >>> +   versions are the same, only the assembler names are made unique. >>> +   The assembler name is changed by appending the string from >>> +   the "targetv" attribute.  
*/ >>> +  version_assembler_name (decl); >>> + >>>  notice_global_symbol (decl); >>>  node->local.finalized = true; >>>  node->lowered = DECL_STRUCT_FUNCTION (decl)->cfg != NULL; >>> Index: multiversion.h >>> =================================================================== >>> --- multiversion.h    (revision 0) >>> +++ multiversion.h    (revision 0) >>> @@ -0,0 +1,52 @@ >>> +/* Function Multiversioning. >>> +  Copyright (C) 2012 Free Software Foundation, Inc. >>> +  Contributed by Sriraman Tallam (tmsriram@google.com) >>> + >>> +This file is part of GCC. >>> + >>> +GCC is free software; you can redistribute it and/or modify it under >>> +the terms of the GNU General Public License as published by the Free >>> +Software Foundation; either version 3, or (at your option) any later >>> +version. >>> + >>> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY >>> +WARRANTY; without even the implied warranty of MERCHANTABILITY or >>> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License >>> +for more details. >>> + >>> +You should have received a copy of the GNU General Public License >>> +along with GCC; see the file COPYING3.  If not see >>> +<http://www.gnu.org/licenses/>. */ >>> + >>> +/* This is the header file which provides the functions to keep track >>> +  of functions that are multi-versioned and to generate the dispatch >>> +  code to call the right version at run-time.  */ >>> + >>> +#ifndef GCC_MULTIVERSION_H >>> +#define GCC_MULTIVERSION_H >>> + >>> +#include "tree.h" >>> + >>> +/* Mark DECL1 and DECL2 as function versions.  */ >>> +int group_function_versions (const tree decl1, const tree decl2); >>> + >>> +/* Mark DECL as deleted and no longer a version.  */ >>> +void mark_delete_decl_version (const tree decl); >>> + >>> +/* Returns true if DECL is the default version to be executed if all >>> +  other versions are inappropriate at run-time.  
*/ >>> +bool is_default_function (const tree decl); >>> + >>> +/* Gets the IFUNC dispatcher for this multi-versioned function DECL. DECL >>> +  must be the default function in the multi-versioned group.  */ >>> +tree get_ifunc_for_version (const tree decl); >>> + >>> +/* Returns true when only one of DECL1 and DECL2 is marked with "targetv" >>> +  or if the "targetv" attribute strings of DECL1 and DECL2 don't match.  */ >>> +bool has_different_version_attributes (const tree decl1, const tree decl2); >>> + >>> +/* If DECL is a function version and not the default version, the assembler >>> +  name of DECL is changed to include the attribute string to keep the >>> +  name unambiguous.  */ >>> +void version_assembler_name (const tree decl); >>> +#endif >>> Index: cp/class.c >>> =================================================================== >>> --- cp/class.c  (revision 184971) >>> +++ cp/class.c  (working copy) >>> @@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.  If not see >>>  #include "tree-dump.h" >>>  #include "splay-tree.h" >>>  #include "pointer-set.h" >>> +#include "multiversion.h" >>> >>>  /* The number of nested classes being processed.  If we are not in the >>>   scope of any class, this is zero.  */ >>> @@ -1092,7 +1093,20 @@ add_method (tree type, tree method, tree using_dec >>>        || same_type_p (TREE_TYPE (fn_type), >>>                TREE_TYPE (method_type)))) >>>     { >>> -     if (using_decl) >>> +     /* For function versions, their parms and types match >>> +       but they are not duplicates.  Record function versions >>> +       as and when they are found.  
*/ >>> +     if (TREE_CODE (fn) == FUNCTION_DECL >>> +       && TREE_CODE (method) == FUNCTION_DECL >>> +       && (DECL_FUNCTION_VERSIONED (fn) >>> +         || DECL_FUNCTION_VERSIONED (method))) >>> +      { >>> +       DECL_FUNCTION_VERSIONED (fn) = 1; >>> +       DECL_FUNCTION_VERSIONED (method) = 1; >>> +       group_function_versions (fn, method); >>> +       continue; >>> +      } >>> +     else if (using_decl) >>>       { >>>        if (DECL_CONTEXT (fn) == type) >>>         /* Defer to the local function.  */ >>> @@ -1150,6 +1164,13 @@ add_method (tree type, tree method, tree using_dec >>>  else >>>   /* Replace the current slot.  */ >>>   VEC_replace (tree, method_vec, slot, overload); >>> + >>> +  /* Change the assembler name of method here if it has "targetv" >>> +   attributes.  Since all versions have the same mangled name, >>> +   their assembler name is changed by appending the string from >>> +   the "targetv" attribute. */ >>> +  version_assembler_name (method); >>> + >>>  return true; >>>  } >>> >>> @@ -6890,8 +6911,11 @@ resolve_address_of_overloaded_function (tree targe >>>      if (DECL_ANTICIPATED (fn)) >>>       continue; >>> >>> -     /* See if there's a match.  */ >>> -     if (same_type_p (target_fn_type, static_fn_type (fn))) >>> +     /* See if there's a match.  For functions that are multi-versioned >>> +       match it to the default function.  */ >>> +     if (same_type_p (target_fn_type, static_fn_type (fn)) >>> +       && (!DECL_FUNCTION_VERSIONED (fn) >>> +         || is_default_function (fn))) >>>       matches = tree_cons (fn, NULL_TREE, matches); >>>     } >>>   } >>> @@ -7053,6 +7077,21 @@ resolve_address_of_overloaded_function (tree targe >>>    perform_or_defer_access_check (access_path, fn, fn); >>>   } >>> >>> +  /* If a pointer to a function that is multi-versioned is requested, the >>> +   pointer to the dispatcher function is returned instead.  
This works >>> +   well because indirectly calling the function will dispatch the right >>> +   function version at run-time. Also, the function address is kept >>> +   unique.  */ >>> +  if (DECL_FUNCTION_VERSIONED (fn) >>> +    && is_default_function (fn)) >>> +   { >>> +    tree ifunc_decl; >>> +    ifunc_decl = get_ifunc_for_version (fn); >>> +    gcc_assert (ifunc_decl != NULL); >>> +    mark_used (fn); >>> +    return build_fold_addr_expr (ifunc_decl); >>> +   } >>> + >>>  if (TYPE_PTRFN_P (target_type) || TYPE_PTRMEMFUNC_P (target_type)) >>>   return cp_build_addr_expr (fn, flags); >>>  else >>> Index: cp/decl.c >>> =================================================================== >>> --- cp/decl.c  (revision 184971) >>> +++ cp/decl.c  (working copy) >>> @@ -54,6 +54,7 @@ along with GCC; see the file COPYING3.  If not see >>>  #include "pointer-set.h" >>>  #include "splay-tree.h" >>>  #include "plugin.h" >>> +#include "multiversion.h" >>> >>>  /* Possible cases of bad specifiers type used by bad_specifiers. */ >>>  enum bad_spec_place { >>> @@ -972,6 +973,23 @@ decls_match (tree newdecl, tree olddecl) >>>    if (t1 != t2) >>>     return 0; >>> >>> +    /* The decls don't match if they correspond to two different versions >>> +     of the same function.  */ >>> +    if (compparms (p1, p2) >>> +     && same_type_p (TREE_TYPE (f1), TREE_TYPE (f2)) >>> +     && (DECL_FUNCTION_VERSIONED (newdecl) >>> +       || DECL_FUNCTION_VERSIONED (olddecl)) >>> +     && has_different_version_attributes (newdecl, olddecl)) >>> +    { >>> +     /* One of the decls could be the default without the "targetv" >>> +       attribute. Set it to be a versioned function here.  */ >>> +     DECL_FUNCTION_VERSIONED (newdecl) = 1; >>> +     DECL_FUNCTION_VERSIONED (olddecl) = 1; >>> +     /* Accumulate all the versions of a function.  
*/ >>> +     group_function_versions (olddecl, newdecl); >>> +     return 0; >>> +    } >>> + >>>    if (CP_DECL_CONTEXT (newdecl) != CP_DECL_CONTEXT (olddecl) >>>      && ! (DECL_EXTERN_C_P (newdecl) >>>         && DECL_EXTERN_C_P (olddecl))) >>> @@ -1482,7 +1500,11 @@ duplicate_decls (tree newdecl, tree olddecl, bool >>>        error ("previous declaration %q+#D here", olddecl); >>>        return NULL_TREE; >>>       } >>> -     else if (compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)), >>> +     /* For function versions, params and types match, but they >>> +       are not ambiguous.  */ >>> +     else if ((!DECL_FUNCTION_VERSIONED (newdecl) >>> +          && !DECL_FUNCTION_VERSIONED (olddecl)) >>> +          && compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)), >>>                TYPE_ARG_TYPES (TREE_TYPE (olddecl)))) >>>       { >>>        error ("new declaration %q#D", newdecl); >>> @@ -2250,6 +2272,16 @@ duplicate_decls (tree newdecl, tree olddecl, bool >>>  else if (DECL_PRESERVE_P (newdecl)) >>>   DECL_PRESERVE_P (olddecl) = 1; >>> >>> +  /* If the olddecl is a version, so is the newdecl.  */ >>> +  if (TREE_CODE (newdecl) == FUNCTION_DECL >>> +    && DECL_FUNCTION_VERSIONED (olddecl)) >>> +   { >>> +    DECL_FUNCTION_VERSIONED (newdecl) = 1; >>> +    /* Record that newdecl is not a valid version and has >>> +     been deleted.  */ >>> +    mark_delete_decl_version (newdecl); >>> +   } >>> + >>>  if (TREE_CODE (newdecl) == FUNCTION_DECL) >>>   { >>>    int function_size; >>> @@ -4512,6 +4544,10 @@ start_decl (const cp_declarator *declarator, >>>  /* Enter this declaration into the symbol table.  */ >>>  decl = maybe_push_decl (decl); >>> >>> +  /* If this decl is a function version and not the default, its assembler >>> +   name has to be changed.  
*/ >>> +  version_assembler_name (decl); >>> + >>>  if (processing_template_decl) >>>   decl = push_template_decl (decl); >>>  if (decl == error_mark_node) >>> @@ -13019,6 +13055,10 @@ start_function (cp_decl_specifier_seq *declspecs, >>>   gcc_assert (same_type_p (TREE_TYPE (TREE_TYPE (decl1)), >>>               integer_type_node)); >>> >>> +  /* If this decl is a function version and not the default, its assembler >>> +   name has to be changed.  */ >>> +  version_assembler_name (decl1); >>> + >>>  start_preparsed_function (decl1, attrs, /*flags=*/SF_DEFAULT); >>> >>>  return 1; >>> @@ -13960,6 +14000,11 @@ cxx_comdat_group (tree decl) >>>       break; >>>     } >>>    name = DECL_ASSEMBLER_NAME (decl); >>> +    if (TREE_CODE (decl) == FUNCTION_DECL >>> +     && DECL_FUNCTION_VERSIONED (decl)) >>> +    name = DECL_NAME (decl); >>> +    else >>> +     name = DECL_ASSEMBLER_NAME (decl); >>>   } >>> >>>  return name; >>> Index: cp/semantics.c >>> =================================================================== >>> --- cp/semantics.c    (revision 184971) >>> +++ cp/semantics.c    (working copy) >>> @@ -3783,8 +3783,11 @@ expand_or_defer_fn_1 (tree fn) >>>    /* If the user wants us to keep all inline functions, then mark >>>     this function as needed so that finish_file will make sure to >>>     output it later.  Similarly, all dllexport'd functions must >>> -     be emitted; there may be callers in other DLLs.  */ >>> -    if ((flag_keep_inline_functions >>> +     be emitted; there may be callers in other DLLs. >>> +     Also, mark this function as needed if it is marked inline but >>> +     is a multi-versioned function.  
*/ >>> +    if (((flag_keep_inline_functions >>> +      || DECL_FUNCTION_VERSIONED (fn)) >>>      && DECL_DECLARED_INLINE_P (fn) >>>      && !DECL_REALLY_EXTERN (fn)) >>>      || (flag_keep_inline_dllexport >>> Index: cp/decl2.c >>> =================================================================== >>> --- cp/decl2.c  (revision 184971) >>> +++ cp/decl2.c  (working copy) >>> @@ -53,6 +53,7 @@ along with GCC; see the file COPYING3.  If not see >>>  #include "splay-tree.h" >>>  #include "langhooks.h" >>>  #include "c-family/c-ada-spec.h" >>> +#include "multiversion.h" >>> >>>  extern cpp_reader *parse_in; >>> >>> @@ -674,9 +675,13 @@ check_classfn (tree ctype, tree function, tree tem >>>      if (is_template != (TREE_CODE (fndecl) == TEMPLATE_DECL)) >>>       continue; >>> >>> +     /* While finding a match, same types and params are not enough >>> +       if the function is versioned.  Also check version ("targetv") >>> +       attributes.  */ >>>      if (same_type_p (TREE_TYPE (TREE_TYPE (function)), >>>              TREE_TYPE (TREE_TYPE (fndecl))) >>>        && compparms (p1, p2) >>> +       && !has_different_version_attributes (function, fndecl) >>>        && (!is_template >>>          || comp_template_parms (template_parms, >>>                      DECL_TEMPLATE_PARMS (fndecl))) >>> Index: cp/call.c >>> =================================================================== >>> --- cp/call.c  (revision 184971) >>> +++ cp/call.c  (working copy) >>> @@ -41,6 +41,7 @@ along with GCC; see the file COPYING3.  If not see >>>  #include "langhooks.h" >>>  #include "c-family/c-objc.h" >>>  #include "timevar.h" >>> +#include "multiversion.h" >>> >>>  /* The various kinds of conversion.  */ >>> >>> @@ -6730,6 +6731,17 @@ build_over_call (struct z_candidate *cand, int fla >>>  if (!already_used) >>>   mark_used (fn); >>> >>> +  /* For a call to a multi-versioned function, the call should actually be to >>> +   the dispatcher.  
*/ >>> +  if (DECL_FUNCTION_VERSIONED (fn)) >>> +   { >>> +    tree ifunc_decl; >>> +    ifunc_decl = get_ifunc_for_version (fn); >>> +    gcc_assert (ifunc_decl != NULL); >>> +    return build_call_expr_loc_array (UNKNOWN_LOCATION, ifunc_decl, >>> +                    nargs, argarray); >>> +   } >>> + >>>  if (DECL_VINDEX (fn) && (flags & LOOKUP_NONVIRTUAL) == 0) >>>   { >>>    tree t; >>> @@ -7980,6 +7992,30 @@ joust (struct z_candidate *cand1, struct z_candida >>>  size_t i; >>>  size_t len; >>> >>> +  /* For Candidates of a multi-versioned function, the one marked default >>> +   wins.  This is because the default decl is used as key to aggregate >>> +   all the other versions provided for it in multiversion.c.  When >>> +   generating the actual call, the appropriate dispatcher is created >>> +   to call the right function version at run-time.  */ >>> + >>> +  if ((TREE_CODE (cand1->fn) == FUNCTION_DECL >>> +    && DECL_FUNCTION_VERSIONED (cand1->fn)) >>> +    ||(TREE_CODE (cand2->fn) == FUNCTION_DECL >>> +     && DECL_FUNCTION_VERSIONED (cand2->fn))) >>> +   { >>> +    if (is_default_function (cand1->fn)) >>> +    { >>> +      mark_used (cand2->fn); >>> +     return 1; >>> +    } >>> +    if (is_default_function (cand2->fn)) >>> +    { >>> +      mark_used (cand1->fn); >>> +     return -1; >>> +    } >>> +    return 0; >>> +   } >>> + >>>  /* Candidates that involve bad conversions are always worse than those >>>    that don't.  
*/ >>>  if (cand1->viable > cand2->viable) >>> Index: timevar.def >>> =================================================================== >>> --- timevar.def (revision 184971) >>> +++ timevar.def (working copy) >>> @@ -253,6 +253,7 @@ DEFTIMEVAR (TV_TREE_IFCOMBINE     , "tree if-co >>>  DEFTIMEVAR (TV_TREE_UNINIT      , "uninit var analysis") >>>  DEFTIMEVAR (TV_PLUGIN_INIT      , "plugin initialization") >>>  DEFTIMEVAR (TV_PLUGIN_RUN       , "plugin execution") >>> +DEFTIMEVAR (TV_MULTIVERSION_DISPATCH , "multiversion dispatch") >>> >>>  /* Everything else in rest_of_compilation not included above.  */ >>>  DEFTIMEVAR (TV_EARLY_LOCAL      , "early local passes") >>> Index: varasm.c >>> =================================================================== >>> --- varasm.c   (revision 184971) >>> +++ varasm.c   (working copy) >>> @@ -5755,6 +5755,8 @@ finish_aliases_1 (void) >>>     } >>>    else if (! (p->emitted_diags & ALIAS_DIAG_TO_EXTERN) >>>        && DECL_EXTERNAL (target_decl) >>> +        && (TREE_CODE (target_decl) != FUNCTION_DECL >>> +          || !DECL_STRUCT_FUNCTION (target_decl)) >>>        /* We use local aliases for C++ thunks to force the tailcall >>>          to bind locally.  This is a hack - to keep it working do >>>          the following (which is not strictly correct).  
*/ >>> Index: Makefile.in >>> =================================================================== >>> --- Makefile.in (revision 184971) >>> +++ Makefile.in (working copy) >>> @@ -1298,6 +1298,7 @@ OBJS = \ >>>     mcf.o \ >>>     mode-switching.o \ >>>     modulo-sched.o \ >>> +    multiversion.o \ >>>     omega.o \ >>>     omp-low.o \ >>>     optabs.o \ >>> @@ -3030,6 +3031,11 @@ ree.o : ree.c $(CONFIG_H) $(SYSTEM_H) coretypes.h >>>   $(DF_H) $(TIMEVAR_H) tree-pass.h $(RECOG_H) $(EXPR_H) \ >>>   $(REGS_H) $(TREE_H) $(TM_P_H) insn-config.h $(INSN_ATTR_H) $(DIAGNOSTIC_CORE_H) \ >>>   $(TARGET_H) $(OPTABS_H) insn-codes.h rtlhooks-def.h $(PARAMS_H) $(CGRAPH_H) >>> +multiversion.o : multiversion.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \ >>> +  $(TREE_H) langhooks.h $(TREE_INLINE_H) $(FLAGS_H) $(CGRAPH_H) intl.h \ >>> +  $(DIAGNOSTIC_H) $(FIBHEAP_H) $(PARAMS_H) $(TIMEVAR_H) tree-pass.h \ >>> +  $(HASHTAB_H) $(COVERAGE_H) $(GGC_H) $(TREE_FLOW_H) $(RTL_H) $(IPA_PROP_H) \ >>> +  $(BASIC_BLOCK_H) $(TOPLEV_H) $(TREE_DUMP_H) ipa-inline.h >>>  cprop.o : cprop.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \ >>>   $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h $(GGC_H) \ >>>   $(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) output.h toplev.h $(DIAGNOSTIC_CORE_H) \ >>> Index: passes.c >>> =================================================================== >>> --- passes.c   (revision 184971) >>> +++ passes.c   (working copy) >>> @@ -1190,6 +1190,7 @@ init_optimization_passes (void) >>>  NEXT_PASS (pass_build_cfg); >>>  NEXT_PASS (pass_warn_function_return); >>>  NEXT_PASS (pass_build_cgraph_edges); >>> +  NEXT_PASS (pass_dispatch_versions); >>>  *p = NULL; >>> >>>  /* Interprocedural optimization passes.  
*/ >>> Index: config/i386/i386.c >>> =================================================================== >>> --- config/i386/i386.c  (revision 184971) >>> +++ config/i386/i386.c  (working copy) >>> @@ -27446,6 +27473,593 @@ ix86_init_mmx_sse_builtins (void) >>>   } >>>  } >>> >>> +/* This adds a condition to the basic_block NEW_BB in function FUNCTION_DECL >>> +  to return a pointer to VERSION_DECL if the outcome of the function >>> +  PREDICATE_DECL is true.  This function will be called during version >>> +  dispatch to decide which function version to execute.  It returns the >>> +  basic block at the end to which more conditions can be added.  */ >>> + >>> +static basic_block >>> +add_condition_to_bb (tree function_decl, tree version_decl, >>> +           basic_block new_bb, tree predicate_decl) >>> +{ >>> +  gimple return_stmt; >>> +  tree convert_expr, result_var; >>> +  gimple convert_stmt; >>> +  gimple call_cond_stmt; >>> +  gimple if_else_stmt; >>> + >>> +  basic_block bb1, bb2, bb3; >>> +  edge e12, e23; >>> + >>> +  tree cond_var; >>> +  gimple_seq gseq; >>> + >>> +  tree old_current_function_decl; >>> + >>> +  old_current_function_decl = current_function_decl; >>> +  push_cfun (DECL_STRUCT_FUNCTION (function_decl)); >>> +  current_function_decl = function_decl; >>> + >>> +  gcc_assert (new_bb != NULL); >>> +  gseq = bb_seq (new_bb); >>> + >>> + >>> +  convert_expr = build1 (CONVERT_EXPR, ptr_type_node, >>> +             build_fold_addr_expr (version_decl)); >>> +  result_var = create_tmp_var (ptr_type_node, NULL); >>> +  convert_stmt = gimple_build_assign (result_var, convert_expr); >>> +  return_stmt = gimple_build_return (result_var); >>> + >>> +  if (predicate_decl == NULL_TREE) >>> +   { >>> +    gimple_seq_add_stmt (&gseq, convert_stmt); >>> +    gimple_seq_add_stmt (&gseq, return_stmt); >>> +    set_bb_seq (new_bb, gseq); >>> +    gimple_set_bb (convert_stmt, new_bb); >>> +    gimple_set_bb (return_stmt, new_bb); >>> +    pop_cfun (); >>> +    
current_function_decl = old_current_function_decl; >>> +    return new_bb; >>> +   } >>> + >>> +  cond_var = create_tmp_var (integer_type_node, NULL); >>> +  call_cond_stmt = gimple_build_call (predicate_decl, 0); >>> +  gimple_call_set_lhs (call_cond_stmt, cond_var); >>> + >>> +  gimple_set_block (call_cond_stmt, DECL_INITIAL (function_decl)); >>> +  gimple_set_bb (call_cond_stmt, new_bb); >>> +  gimple_seq_add_stmt (&gseq, call_cond_stmt); >>> + >>> +  if_else_stmt = gimple_build_cond (GT_EXPR, cond_var, >>> +                  integer_zero_node, >>> +                  NULL_TREE, NULL_TREE); >>> +  gimple_set_block (if_else_stmt, DECL_INITIAL (function_decl)); >>> +  gimple_set_bb (if_else_stmt, new_bb); >>> +  gimple_seq_add_stmt (&gseq, if_else_stmt); >>> + >>> +  gimple_seq_add_stmt (&gseq, convert_stmt); >>> +  gimple_seq_add_stmt (&gseq, return_stmt); >>> +  set_bb_seq (new_bb, gseq); >>> + >>> +  bb1 = new_bb; >>> +  e12 = split_block (bb1, if_else_stmt); >>> +  bb2 = e12->dest; >>> +  e12->flags &= ~EDGE_FALLTHRU; >>> +  e12->flags |= EDGE_TRUE_VALUE; >>> + >>> +  e23 = split_block (bb2, return_stmt); >>> + >>> +  gimple_set_bb (convert_stmt, bb2); >>> +  gimple_set_bb (return_stmt, bb2); >>> + >>> +  bb3 = e23->dest; >>> +  make_edge (bb1, bb3, EDGE_FALSE_VALUE); >>> + >>> +  remove_edge (e23); >>> +  make_edge (bb2, EXIT_BLOCK_PTR, 0); >>> + >>> +  rebuild_cgraph_edges (); >>> + >>> +  pop_cfun (); >>> +  current_function_decl = old_current_function_decl; >>> + >>> +  return bb3; >>> +} >>> + >>> +/* This parses the attribute arguments to targetv in DECL and determines >>> +  the right builtin to use to match the platform specification. >>> +  For now, only one target argument ("arch=") is allowed.  
*/ >>> + >>> +static enum ix86_builtins >>> +get_builtin_code_for_version (tree decl) >>> +{ >>> +  tree attrs; >>> +  struct cl_target_option cur_target; >>> +  tree target_node; >>> +  struct cl_target_option *new_target; >>> +  enum ix86_builtins builtin_code = IX86_BUILTIN_MAX; >>> + >>> +  attrs = lookup_attribute ("targetv", DECL_ATTRIBUTES (decl)); >>> +  gcc_assert (attrs != NULL); >>> + >>> +  cl_target_option_save (&cur_target, &global_options); >>> + >>> +  target_node = ix86_valid_target_attribute_tree >>> +         (TREE_VALUE (TREE_VALUE (attrs))); >>> + >>> +  gcc_assert (target_node); >>> +  new_target = TREE_TARGET_OPTION (target_node); >>> +  gcc_assert (new_target); >>> + >>> +  if (new_target->arch_specified && new_target->arch > 0) >>> +   { >>> +    switch (new_target->arch) >>> +     { >>> +    case 1: >>> +    case 2: >>> +    case 3: >>> +    case 4: >>> +    case 5: >>> +    case 6: >>> +    case 7: >>> +    case 8: >>> +    case 9: >>> +    case 10: >>> +    case 11: >>> +     builtin_code = IX86_BUILTIN_CPU_IS_INTEL; >>> +     break; >>> +    case 12: >>> +     builtin_code = IX86_BUILTIN_CPU_IS_INTEL_CORE2; >>> +     break; >>> +    case 13: >>> +     builtin_code = IX86_BUILTIN_CPU_IS_INTEL_COREI7; >>> +     break; >>> +    case 14: >>> +     builtin_code = IX86_BUILTIN_CPU_IS_INTEL_ATOM; >>> +     break; >>> +    case 15: >>> +    case 16: >>> +    case 17: >>> +    case 18: >>> +    case 19: >>> +    case 20: >>> +    case 21: >>> +     builtin_code = IX86_BUILTIN_CPU_IS_AMD; >>> +     break; >>> +    case 22: >>> +     builtin_code = IX86_BUILTIN_CPU_IS_AMDFAM10H; >>> +     break; >>> +    case 23: >>> +     builtin_code = IX86_BUILTIN_CPU_IS_AMDFAM15H_BDVER1; >>> +     break; >>> +    case 24: >>> +     builtin_code = IX86_BUILTIN_CPU_IS_AMDFAM15H_BDVER2; >>> +     break; >>> +    case 25: /* What is btver1 ? 
*/ >>> +     builtin_code = IX86_BUILTIN_CPU_IS_AMD; >>> +     break; >>> +    } >>> +   } >>> + >>> +  cl_target_option_restore (&global_options, &cur_target); >>> +  if (builtin_code == IX86_BUILTIN_MAX) >>> +    error_at (DECL_SOURCE_LOCATION (decl), >>> +        "No dispatcher found for the versioning attributes"); >>> + >>> +  return builtin_code; >>> +} >>> + >>> +/* This is the target hook to generate the dispatch function for >>> +  multi-versioned functions.  DISPATCH_DECL is the function which will >>> +  contain the dispatch logic.  FNDECLS are the function choices for >>> +  dispatch, and is a tree chain.  EMPTY_BB is the basic block pointer >>> +  in DISPATCH_DECL in which the dispatch code is generated.  */ >>> + >>> +static int >>> +ix86_dispatch_version (tree dispatch_decl, >>> +            void *fndecls_p, >>> +            basic_block *empty_bb) >>> +{ >>> +  tree default_decl; >>> +  gimple ifunc_cpu_init_stmt; >>> +  gimple_seq gseq; >>> +  tree old_current_function_decl; >>> +  int ix; >>> +  tree ele; >>> +  VEC (tree, heap) *fndecls; >>> + >>> +  gcc_assert (dispatch_decl != NULL >>> +       && fndecls_p != NULL >>> +       && empty_bb != NULL); >>> + >>> +  /*fndecls_p is actually a vector.  */ >>> +  fndecls = (VEC (tree, heap) *)fndecls_p; >>> + >>> +  /* Atleast one more version other than the default.  */ >>> +  gcc_assert (VEC_length (tree, fndecls) >= 2); >>> + >>> +  /* The first version in the vector is the default decl.  
*/ >>> +  default_decl = VEC_index (tree, fndecls, 0); >>> + >>> +  old_current_function_decl = current_function_decl; >>> +  push_cfun (DECL_STRUCT_FUNCTION (dispatch_decl)); >>> +  current_function_decl = dispatch_decl; >>> + >>> +  gseq = bb_seq (*empty_bb); >>> +  ifunc_cpu_init_stmt = gimple_build_call_vec ( >>> +           ix86_builtins [(int) IX86_BUILTIN_CPU_INIT], NULL); >>> +  gimple_seq_add_stmt (&gseq, ifunc_cpu_init_stmt); >>> +  gimple_set_bb (ifunc_cpu_init_stmt, *empty_bb); >>> +  set_bb_seq (*empty_bb, gseq); >>> + >>> +  pop_cfun (); >>> +  current_function_decl = old_current_function_decl; >>> + >>> + >>> +  for (ix = 1; VEC_iterate (tree, fndecls, ix, ele); ++ix) >>> +   { >>> +    tree version_decl = ele; >>> +    /* Get attribute string, parse it and find the right predicate decl. >>> +     The predicate function could be a lengthy combination of many >>> +     features, like arch-type and various isa-variants.  For now, only >>> +     check the arch-type.  */ >>> +    tree predicate_decl = ix86_builtins [ >>> +            get_builtin_code_for_version (version_decl)]; >>> +    *empty_bb = add_condition_to_bb (dispatch_decl, version_decl, *empty_bb, >>> +                    predicate_decl); >>> + >>> +   } >>> +  /* dispatch default version at the end.  
*/ >>> +  *empty_bb = add_condition_to_bb (dispatch_decl, default_decl, *empty_bb, >>> +                  NULL); >>> +  return 0; >>> +} >>> >>> @@ -38610,6 +39269,12 @@ ix86_autovectorize_vector_sizes (void) >>>  #undef TARGET_BUILD_BUILTIN_VA_LIST >>>  #define TARGET_BUILD_BUILTIN_VA_LIST ix86_build_builtin_va_list >>> >>> +#undef TARGET_DISPATCH_VERSION >>> +#define TARGET_DISPATCH_VERSION ix86_dispatch_version >>> + >>>  #undef TARGET_ENUM_VA_LIST_P >>>  #define TARGET_ENUM_VA_LIST_P ix86_enum_va_list >>> >>> Index: testsuite/g++.dg/mv1.C >>> =================================================================== >>> --- testsuite/g++.dg/mv1.C    (revision 0) >>> +++ testsuite/g++.dg/mv1.C    (revision 0) >>> @@ -0,0 +1,23 @@ >>> +/* Simple test case to check if Multiversioning works.  */ >>> +/* { dg-do run } */ >>> +/* { dg-options "-O2" } */ >>> + >>> +int foo (); >>> +int foo () __attribute__ ((targetv("arch=corei7"))); >>> + >>> +int main () >>> +{ >>> +  int (*p)() = &foo; >>> +  return foo () + (*p)(); >>> +} >>> + >>> +int foo () >>> +{ >>> +  return 0; >>> +} >>> + >>> +int __attribute__ ((targetv("arch=corei7"))) >>> +foo () >>> +{ >>> +  return 0; >>> +} >>> >>> >>> -- >>> This patch is available for review at http://codereview.appspot.com/5752064
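The dispatch order built by ix86_dispatch_version in the patch above (test each specialized version's predicate in turn, return the first one that matches, and fall back to the default at the end) can be modelled in plain C++. This is only an illustrative sketch, not the GIMPLE the patch generates: Version, resolve, versions_for and the foo_* functions are hypothetical names, and the boolean flags stand in for the CPU-detection builtins.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using VersionFn = int (*)();

// One entry per function version.  Entry 0 is the default version,
// whose predicate is never consulted (hypothetical model type).
struct Version {
  std::function<bool()> matches;
  VersionFn fn;
};

// Model of the dispatcher body: walk the specialized versions in order,
// return the first whose predicate holds, otherwise the default.
VersionFn resolve(const std::vector<Version>& versions) {
  // Like the patch, require the default plus at least one other version.
  assert(versions.size() >= 2);
  for (std::size_t i = 1; i < versions.size(); ++i)
    if (versions[i].matches())
      return versions[i].fn;
  return versions[0].fn;
}

int foo_default() { return 0; }
int foo_corei7() { return 1; }
int foo_core2() { return 2; }

// Builds the version list for a pretend CPU; the two flags are an
// assumption replacing real run-time CPU detection.
std::vector<Version> versions_for(bool is_corei7, bool is_core2) {
  return {
      {nullptr, foo_default},
      {[=] { return is_corei7; }, foo_corei7},
      {[=] { return is_core2; }, foo_core2},
  };
}
```

Calling resolve (versions_for (false, true)) () runs foo_core2, and when no flag is set the default is returned. Because the walk stops at the first match, the order in which versions are registered decides which one wins when several predicates hold.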
On Thu, Apr 26, 2012 at 10:08 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> Hi,
>
> I have made the following changes in this new patch which is attached:
>
> * Use target attribute itself to create function versions.
> * Handle any number of ISA names and arch= args to target attribute,
> generating the right dispatchers.
> * Integrate with the CPU runtime detection checked in this week.
> * Overload resolution: If the caller's target matches any of the
> version function's target, then a direct call to the version is
> generated, no need to go through the dispatching.
>
> Patch also available for review here:
> http://codereview.appspot.com/5752064
>

Does it work with

int foo ();
int foo () __attribute__ ((targetv("arch=corei7")));

int (*foo_p) () = foo?

Does it support C++?

Thanks.

--
H.J.
On Fri, Apr 27, 2012 at 6:38 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Thu, Apr 26, 2012 at 10:08 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>> Hi,
>>
>> I have made the following changes in this new patch which is attached:
>>
>> * Use target attribute itself to create function versions.
>> * Handle any number of ISA names and arch= args to target attribute,
>> generating the right dispatchers.
>> * Integrate with the CPU runtime detection checked in this week.
>> * Overload resolution: If the caller's target matches any of the
>> version function's target, then a direct call to the version is
>> generated, no need to go through the dispatching.
>>
>> Patch also available for review here:
>> http://codereview.appspot.com/5752064
>>
>
> Does it work with
>
> int foo ();
> int foo () __attribute__ ((targetv("arch=corei7")));
>
> int (*foo_p) () = foo?

Yes, this will work. foo_p will be the address of the dispatcher
function and hence doing (*foo_p)() will call the right version.

>
> Does it support C++?

Partially, no support for virtual function versioning yet. I will add
it in the next iteration.

Thanks,
-Sri.

>
> Thanks.
>
> --
> H.J.
On Fri, Apr 27, 2012 at 7:35 AM, Sriraman Tallam <tmsriram@google.com> wrote:
> On Fri, Apr 27, 2012 at 6:38 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Thu, Apr 26, 2012 at 10:08 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>> Hi,
>>>
>>> I have made the following changes in this new patch which is attached:
>>>
>>> * Use target attribute itself to create function versions.
>>> * Handle any number of ISA names and arch= args to target attribute,
>>> generating the right dispatchers.
>>> * Integrate with the CPU runtime detection checked in this week.
>>> * Overload resolution: If the caller's target matches any of the
>>> version function's target, then a direct call to the version is
>>> generated, no need to go through the dispatching.
>>>
>>> Patch also available for review here:
>>> http://codereview.appspot.com/5752064
>>>
>>
>> Does it work with
>>
>> int foo ();
>> int foo () __attribute__ ((targetv("arch=corei7")));
>>
>> int (*foo_p) () = foo?
>
> Yes, this will work. foo_p will be the address of the dispatcher
> function and hence doing (*foo_p)() will call the right version.

Even when foo_p is a global variable and compiled with -fPIC?

Thanks.

--
H.J.
On Fri, Apr 27, 2012 at 7:38 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, Apr 27, 2012 at 7:35 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>> On Fri, Apr 27, 2012 at 6:38 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Thu, Apr 26, 2012 at 10:08 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>> Hi,
>>>>
>>>> I have made the following changes in this new patch which is attached:
>>>>
>>>> * Use target attribute itself to create function versions.
>>>> * Handle any number of ISA names and arch= args to target attribute,
>>>> generating the right dispatchers.
>>>> * Integrate with the CPU runtime detection checked in this week.
>>>> * Overload resolution: If the caller's target matches any of the
>>>> version function's target, then a direct call to the version is
>>>> generated, no need to go through the dispatching.
>>>>
>>>> Patch also available for review here:
>>>> http://codereview.appspot.com/5752064
>>>>
>>>
>>> Does it work with
>>>
>>> int foo ();
>>> int foo () __attribute__ ((targetv("arch=corei7")));
>>>
>>> int (*foo_p) () = foo?
>>
>> Yes, this will work. foo_p will be the address of the dispatcher
>> function and hence doing (*foo_p)() will call the right version.
>
> Even when foo_p is a global variable and compiled with -fPIC?

I am not sure I understand what the complication is here, but FWIW, I
tried this example and it works

int foo ()
{
  return 0;
}

int __attribute__ ((target ("arch=corei7")))
foo ()
{
  return 1;
}

int (*foo_p)() = foo;
int main ()
{
  return (*foo_p)();
}

g++ -fPIC -O2 example.cc

Did you have something else in mind? Could you please elaborate if you
have a particular case in mind.

The way I handle function pointers is straightforward. When the
front-end sees a pointer to a function that is versioned, it returns
the pointer to the dispatcher instead.

Thanks,
-Sri.

>
> Thanks.
>
> --
> H.J.
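The pointer handling described in this reply (taking the address of a versioned function yields the dispatcher, so the address is unique and indirect calls still reach the right version) can be pictured with a hand-written stand-in. This is a sketch under stated assumptions, not what the compiler emits: the real patch relies on IFUNC, whereas here foo is an ordinary function that resolves its target once, and cpu_is_corei7, resolve_foo and the foo_* names are hypothetical.

```cpp
#include <cassert>

using FooFn = int (*)();

int foo_default() { return 0; }
int foo_corei7() { return 1; }

// Hypothetical stand-in for run-time CPU detection; hard-wired to
// "not corei7" so the sketch behaves the same on every machine.
bool cpu_is_corei7() { return false; }

// Picks the version to run, as the generated dispatcher would.
FooFn resolve_foo() { return cpu_is_corei7() ? foo_corei7 : foo_default; }

// Plays the role of the dispatcher symbol: every call and every
// address-of goes through this one function.
int foo() {
  static const FooFn target = resolve_foo();  // resolved once, then reused
  assert(target != nullptr);
  return target();
}

// A global function pointer, as in the -fPIC question above.  It holds
// the dispatcher's address, never the address of a single version.
int (*foo_p)() = foo;
```

With this model, foo_p compares equal to &foo no matter which version ends up running, and (*foo_p)() returns the same value as a direct call to foo ().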
On Fri, Apr 27, 2012 at 7:53 AM, Sriraman Tallam <tmsriram@google.com> wrote:
> On Fri, Apr 27, 2012 at 7:38 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Fri, Apr 27, 2012 at 7:35 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>>> On Fri, Apr 27, 2012 at 6:38 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>> On Thu, Apr 26, 2012 at 10:08 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>>> Hi,
>>>>>
>>>>> I have made the following changes in this new patch which is attached:
>>>>>
>>>>> * Use target attribute itself to create function versions.
>>>>> * Handle any number of ISA names and arch= args to target attribute,
>>>>> generating the right dispatchers.
>>>>> * Integrate with the CPU runtime detection checked in this week.
>>>>> * Overload resolution: If the caller's target matches any of the
>>>>> version function's target, then a direct call to the version is
>>>>> generated, no need to go through the dispatching.
>>>>>
>>>>> Patch also available for review here:
>>>>> http://codereview.appspot.com/5752064
>>>>>
>>>>
>>>> Does it work with
>>>>
>>>> int foo ();
>>>> int foo () __attribute__ ((targetv("arch=corei7")));
>>>>
>>>> int (*foo_p) () = foo?
>>>
>>> Yes, this will work. foo_p will be the address of the dispatcher
>>> function and hence doing (*foo_p)() will call the right version.
>>
>> Even when foo_p is a global variable and compiled with -fPIC?
>
> I am not sure I understand what the complication is here, but FWIW, I
> tried this example and it works
>
> int foo ()
> {
>   return 0;
> }
>
> int __attribute__ ((target ("arch=corei7")))
> foo ()
> {
>   return 1;
> }
>
> int (*foo_p)() = foo;
> int main ()
> {
>   return (*foo_p)();
> }
>
> g++ -fPIC -O2 example.cc
>
>
> Did you have something else in mind? Could you please elaborate if you
> have a particular case in mind.
>

That is what I meant. But I didn't see it in your testcase.
Can you add it to your testcase?

Also you should verify the correct function is called in
your testcase at run-time.

Thanks.

--
H.J.
On Fri, Apr 27, 2012 at 8:36 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, Apr 27, 2012 at 7:53 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>> On Fri, Apr 27, 2012 at 7:38 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Fri, Apr 27, 2012 at 7:35 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>> On Fri, Apr 27, 2012 at 6:38 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>> On Thu, Apr 26, 2012 at 10:08 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I have made the following changes in this new patch which is attached:
>>>>>>
>>>>>> * Use target attribute itself to create function versions.
>>>>>> * Handle any number of ISA names and arch= args to target attribute,
>>>>>> generating the right dispatchers.
>>>>>> * Integrate with the CPU runtime detection checked in this week.
>>>>>> * Overload resolution: If the caller's target matches any of the
>>>>>> version function's target, then a direct call to the version is
>>>>>> generated, no need to go through the dispatching.
>>>>>>
>>>>>> Patch also available for review here:
>>>>>> http://codereview.appspot.com/5752064
>>>>>>
>>>>>
>>>>> Does it work with
>>>>>
>>>>> int foo ();
>>>>> int foo () __attribute__ ((targetv("arch=corei7")));
>>>>>
>>>>> int (*foo_p) () = foo?
>>>>
>>>> Yes, this will work. foo_p will be the address of the dispatcher
>>>> function and hence doing (*foo_p)() will call the right version.
>>>
>>> Even when foo_p is a global variable and compiled with -fPIC?
>>
>> I am not sure I understand what the complication is here, but FWIW, I
>> tried this example and it works
>>
>> int foo ()
>> {
>>   return 0;
>> }
>>
>> int __attribute__ ((target ("arch=corei7")))
>> foo ()
>> {
>>   return 1;
>> }
>>
>> int (*foo_p)() = foo;
>> int main ()
>> {
>>   return (*foo_p)();
>> }
>>
>> g++ -fPIC -O2 example.cc
>>
>>
>> Did you have something else in mind? Could you please elaborate if you
>> have a particular case in mind.
>>
>
> That is what I meant. But I didn't see it in your testcase.
> Can you add it to your testcase?
>
> Also you should verify the correct function is called in
> your testcase at run-time.

OK, I will update the patch.

Thanks,
-Sri.

>
>
> Thanks.
>
>
> --
> H.J.
Hi,

New patch attached, updated test case and fixed bugs related to
__PRETTY_FUNCTION__.

Patch also available for review here: http://codereview.appspot.com/5752064

Thanks,
-Sri.

On Fri, Apr 27, 2012 at 8:36 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, Apr 27, 2012 at 7:53 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>> On Fri, Apr 27, 2012 at 7:38 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Fri, Apr 27, 2012 at 7:35 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>> On Fri, Apr 27, 2012 at 6:38 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>> On Thu, Apr 26, 2012 at 10:08 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I have made the following changes in this new patch which is attached:
>>>>>>
>>>>>> * Use target attribute itself to create function versions.
>>>>>> * Handle any number of ISA names and arch= args to target attribute,
>>>>>> generating the right dispatchers.
>>>>>> * Integrate with the CPU runtime detection checked in this week.
>>>>>> * Overload resolution: If the caller's target matches any of the
>>>>>> version function's target, then a direct call to the version is
>>>>>> generated, no need to go through the dispatching.
>>>>>>
>>>>>> Patch also available for review here:
>>>>>> http://codereview.appspot.com/5752064
>>>>>>
>>>>>
>>>>> Does it work with
>>>>>
>>>>> int foo ();
>>>>> int foo () __attribute__ ((targetv("arch=corei7")));
>>>>>
>>>>> int (*foo_p) () = foo?
>>>>
>>>> Yes, this will work. foo_p will be the address of the dispatcher
>>>> function and hence doing (*foo_p)() will call the right version.
>>>
>>> Even when foo_p is a global variable and compiled with -fPIC?
>>
>> I am not sure I understand what the complication is here, but FWIW, I
>> tried this example and it works
>>
>> int foo ()
>> {
>>   return 0;
>> }
>>
>> int __attribute__ ((target ("arch=corei7")))
>> foo ()
>> {
>>   return 1;
>> }
>>
>> int (*foo_p)() = foo;
>> int main ()
>> {
>>   return (*foo_p)();
>> }
>>
>> g++ -fPIC -O2 example.cc
>>
>>
>> Did you have something else in mind? Could you please elaborate if you
>> have a particular case in mind.
>>
>
> That is what I meant. But I didn't see it in your testcase.
> Can you add it to your testcase?
>
> Also you should verify the correct function is called in
> your testcase at run-time.
>
>
> Thanks.
>
>
> --
> H.J.
On Tue, May 1, 2012 at 4:51 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> Hi,
>
> New patch attached, updated test case and fixed bugs related to
> __PRETTY_FUNCTION__.
>
> Patch also available for review here: http://codereview.appspot.com/5752064

@@ -0,0 +1,39 @@
+/* Simple test case to check if Multiversioning works.  */
+/* { dg-do run } */
+/* { dg-options "-O2 -fPIC" } */
+
+#include <assert.h>
+
+int foo ();
+int foo () __attribute__ ((target("arch=corei7,sse4.2,popcnt")));
+/* The target operands in this declaration and the definition are re-ordered.
+   This should still work.  */
+int foo () __attribute__ ((target("ssse3,avx2")));
+
+int (*p)() = &foo;
+int main ()
+{
+  return foo () + (*p)();
+}
+
+int foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((target("arch=corei7,sse4.2,popcnt")))
+foo ()
+{
+  assert (__builtin_cpu_is ("corei7")
+          && __builtin_cpu_supports ("sse4.2")
+          && __builtin_cpu_supports ("popcnt"));
+  return 0;
+}
+
+int __attribute__ ((target("avx2,ssse3")))
+foo ()
+{
+  assert (__builtin_cpu_supports ("avx2")
+          && __builtin_cpu_supports ("ssse3"));
+  return 0;
+}

This test will pass if

int foo ()
{
  return 0;
}

is selected on processors with AVX.  The run-time test should
check that the right function is selected on the target processor,
not the selected function matches the target attribute.  You can
do it by returning different values for each foo and call cpuid
to check if the right foo is selected.

You should add a testcase for __builtin_cpu_supports to check
all valid arguments.

--
H.J.
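The review point above (give every version a distinct return value and compare it with what CPU detection says should have run) can be sketched as follows. This is a hedged illustration rather than the test that was eventually committed: CpuFeatures, FooVersion, expected_version and selection_is_correct are hypothetical names, the feature flags would come from cpuid or the __builtin_cpu_* builtins in a real test, and the priority of the corei7 version over the avx2 version is an assumption.

```cpp
#include <cassert>

// Hypothetical snapshot of the CPU properties the three versions depend on.
struct CpuFeatures {
  bool is_corei7;
  bool sse4_2;
  bool popcnt;
  bool avx2;
  bool ssse3;
};

// Distinct tags that each version of foo would return.
enum FooVersion { FOO_DEFAULT = 0, FOO_COREI7 = 1, FOO_AVX2 = 2 };

// Independent oracle: which version should the dispatcher have chosen?
// The corei7-before-avx2 ordering is an assumption of this sketch.
FooVersion expected_version(const CpuFeatures& cpu) {
  if (cpu.is_corei7 && cpu.sse4_2 && cpu.popcnt)
    return FOO_COREI7;
  if (cpu.avx2 && cpu.ssse3)
    return FOO_AVX2;
  return FOO_DEFAULT;
}

// True only if the value returned by foo () identifies the expected version.
bool selection_is_correct(int returned, const CpuFeatures& cpu) {
  return returned == expected_version(cpu);
}
```

With all versions returning 0, as in the quoted test, a dispatcher that always picks the default cannot be told apart from a correct one. With distinct tags, selection_is_correct (FOO_DEFAULT, cpu) is false on a machine that has avx2 and ssse3, so the mistake is caught.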
Hi H.J,

Done now. Patch attached.

Thanks,
-Sri.

On Tue, May 1, 2012 at 5:08 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Tue, May 1, 2012 at 4:51 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>> Hi,
>>
>> New patch attached, updated test case and fixed bugs related to
>> __PRETTY_FUNCTION__.
>>
>> Patch also available for review here: http://codereview.appspot.com/5752064
>
> @@ -0,0 +1,39 @@
> +/* Simple test case to check if Multiversioning works.  */
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fPIC" } */
> +
> +#include <assert.h>
> +
> +int foo ();
> +int foo () __attribute__ ((target("arch=corei7,sse4.2,popcnt")));
> +/* The target operands in this declaration and the definition are re-ordered.
> +   This should still work.  */
> +int foo () __attribute__ ((target("ssse3,avx2")));
> +
> +int (*p)() = &foo;
> +int main ()
> +{
> +  return foo () + (*p)();
> +}
> +
> +int foo ()
> +{
> +  return 0;
> +}
> +
> +int __attribute__ ((target("arch=corei7,sse4.2,popcnt")))
> +foo ()
> +{
> +  assert (__builtin_cpu_is ("corei7")
> +          && __builtin_cpu_supports ("sse4.2")
> +          && __builtin_cpu_supports ("popcnt"));
> +  return 0;
> +}
> +
> +int __attribute__ ((target("avx2,ssse3")))
> +foo ()
> +{
> +  assert (__builtin_cpu_supports ("avx2")
> +          && __builtin_cpu_supports ("ssse3"));
> +  return 0;
> +}
>
> This test will pass if
>
> int foo ()
> {
>   return 0;
> }
>
> is selected on processors with AVX.  The run-time test should
> check that the right function is selected on the target processor,
> not the selected function matches the target attribute. You can
> do it by returning different values for each foo and call cpuid
> to check if the right foo is selected.
>
> You should add a testcase for __builtin_cpu_supports to check
> all valid arguments.
>
> --
> H.J.
On Tue, May 1, 2012 at 7:45 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> Hi H.J,
>
> Done now. Patch attached.
>
> Thanks,
> -Sri.

2 questions:

1. Since AVX > SSE4 > SSSE3 > SSE3 > SSE2 > SSE, with
   foo for AVX and SSE3, on AVX processors, which foo will be
   selected?
2. I don't see any tests for __builtin_cpu_supports ("XXX")
   nor __builtin_cpu_is ("XXX"). I think you need tests for
   them.

--
H.J.
On Wed, May 2, 2012 at 6:42 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> 2 questions:
>
> 1. Since AVX > SSE4 > SSSE3 > SSE3 > SSE2 > SSE, with
> foo for AVX and SSE3, on AVX processors, which foo will be
> selected?

foo for AVX will get called since it appears first.

The dispatching is done in the same order in which the functions are
specified. If two foo versions could both be dispatched for an
architecture, the first one will get called. There is no way right
now to specify the order in which the dispatching should be done.

> 2. I don't see any tests for __builtin_cpu_supports ("XXX")
> nor __builtin_cpu_is ("XXX"). I think you need tests for
> them.

This is already there as part of the previous CPU detection patch that
was submitted. Please see gcc.target/i386/builtin_target.c. Did you
want something else?

>
> --
> H.J.
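The first-match rule described above can be modelled in a few lines. The following is only a sketch of that rule, not the dispatcher the patch generates: `Version`, `dispatch_in_source_order` and the two demo functions are hypothetical names, and the CPU is represented as a plain set of feature strings.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// One function version: the ISA named in its target attribute and the
// value that version of foo returns.  (Hypothetical model type.)
struct Version
{
  std::string feature;
  int result;
};

// Walk the versions in the order they were declared and return the
// first one whose feature the CPU supports; otherwise use the default.
int dispatch_in_source_order (const std::vector<Version> &versions,
                              const std::set<std::string> &cpu,
                              int default_result)
{
  for (const Version &v : versions)
    if (cpu.count (v.feature) != 0)
      return v.result;
  return default_result;
}

// An AVX-capable CPU also supports SSE3, so both versions qualify.
int result_with_avx_declared_first ()
{
  const std::set<std::string> avx_cpu = {"sse", "sse2", "sse3", "avx"};
  return dispatch_in_source_order ({{"avx", 1}, {"sse3", 2}}, avx_cpu, 0);
}

int result_with_sse3_declared_first ()
{
  const std::set<std::string> avx_cpu = {"sse", "sse2", "sse3", "avx"};
  return dispatch_in_source_order ({{"sse3", 2}, {"avx", 1}}, avx_cpu, 0);
}

// The same CPU gets a different version depending only on source order.
void check_source_order_sketch ()
{
  assert (result_with_avx_declared_first () == 1);
  assert (result_with_sse3_declared_first () == 2);
}
```

On the same simulated AVX machine, swapping the two declarations changes the selected version from the AVX one to the SSE3 one, which is the order dependence under discussion.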
On Wed, May 2, 2012 at 8:08 AM, Sriraman Tallam <tmsriram@google.com> wrote:
> On Wed, May 2, 2012 at 6:42 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> 1. Since AVX > SSE4 > SSSE3 > SSE3 > SSE2 > SSE, with
>> foo for AVX and SSE3, on AVX processors, which foo will be
>> selected?
>
> foo for AVX will get called since that appears ahead.
>
> The dispatching is done in the same order in which the functions are
> specified. If, potentially, two foo versions can be dispatched for an
> architecture, the first foo will get called. There is no way right
> now to specify the order in which the dispatching should be done.

This is very fragile. We know ISAs and processors. The source
order should be irrelevant.

>> 2. I don't see any tests for __builtin_cpu_supports ("XXX")
>> nor __builtin_cpu_is ("XXX"). I think you need tests for
>> them.
>
> This is already there as part of the previous CPU detection patch that
> was submitted. Please see gcc.target/i386/builtin_target.c. Did you
> want something else?

gcc.target/i386/builtin_target.c doesn't test if __builtin_cpu_supports ("XXX")
and __builtin_cpu_is ("XXX") are implemented correctly.

--
H.J.
On Wed, May 2, 2012 at 9:05 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> The dispatching is done in the same order in which the functions are
>> specified. If, potentially, two foo versions can be dispatched for an
>> architecture, the first foo will get called. There is no way right
>> now to specify the order in which the dispatching should be done.
>
> This is very fragile. We know ISAs and processors. The source
> order should be irrelevant.

I am not sure it is always possible to keep this dispatching unambiguous
to the user. It might be better to let the user specify a priority for
each version to control the order of dispatching.

Still, one way to implement what you said is to assign a significance
number to each ISA, where the number of sse4 > sse, for instance.
Then the dispatching can be done in descending order of
significance. What do you think?

I thought about this earlier and I was thinking along the lines of
letting the user specify a priority for each version, when there is
ambiguity.

> gcc.target/i386/builtin_target.c doesn't test if __builtin_cpu_supports ("XXX")
> and __builtin_cpu_is ("XXX") are implemented correctly.

Oh, you mean like doing a CPUID again in the test case itself and checking, ok.

Thanks,
-Sri.
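The significance-number idea can be sketched as follows. This is an illustration under assumptions, not the implementation that later went into i386.c: the numbers in `significance` are made up for the example, and `Version`, `dispatch_by_significance` and the demo functions are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// One function version: the ISA in its target attribute and the value
// that version returns.  (Hypothetical model type.)
struct Version
{
  std::string feature;
  int result;
};

// Assumed significance numbers; a larger number is preferred.  The
// exact values are invented for this sketch.
static int significance (const std::string &feature)
{
  static const std::map<std::string, int> table = {
    {"mmx", 1}, {"sse", 2}, {"sse2", 3}, {"sse3", 4}, {"ssse3", 5},
    {"sse4.1", 6}, {"sse4.2", 7}, {"popcnt", 8}, {"avx", 9}, {"avx2", 10}};
  const auto it = table.find (feature);
  return it == table.end () ? 0 : it->second;
}

// Sort a copy of the versions by descending significance, then take
// the first one the CPU supports.  Source order no longer matters.
int dispatch_by_significance (std::vector<Version> versions,
                              const std::set<std::string> &cpu,
                              int default_result)
{
  std::stable_sort (versions.begin (), versions.end (),
                    [] (const Version &a, const Version &b)
                    { return significance (a.feature) > significance (b.feature); });
  for (const Version &v : versions)
    if (cpu.count (v.feature) != 0)
      return v.result;
  return default_result;
}

int priority_result_avx_declared_first ()
{
  const std::set<std::string> avx_cpu = {"sse", "sse2", "sse3", "avx"};
  return dispatch_by_significance ({{"avx", 1}, {"sse3", 2}}, avx_cpu, 0);
}

int priority_result_sse3_declared_first ()
{
  const std::set<std::string> avx_cpu = {"sse", "sse2", "sse3", "avx"};
  return dispatch_by_significance ({{"sse3", 2}, {"avx", 1}}, avx_cpu, 0);
}

// Both declaration orders now select the AVX version.
void check_priority_sketch ()
{
  assert (priority_result_avx_declared_first () == 1);
  assert (priority_result_sse3_declared_first () == 1);
}
```

With the sort in place, both declaration orders pick the AVX version on an AVX machine. A user-specified priority, the alternative Sri mentions, would amount to replacing the fixed table with a per-version number.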
On Wed, May 2, 2012 at 10:44 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>> This is very fragile. We know ISAs and processors. The source
>> order should be irrelevant.
>
> I am not sure it is always possible keep this dispatching unambiguous
> to the user. It might be better to let the user specify a priority for
> each version to control the order of dispatching.
>
> Still, one way to implement what you said is to assign a significance
> number to each ISA, where the number of sse4 > sse, for instance.
> Then, the dispatching can be done in the descending order of
> significance. What do you think?

This sounds reasonable. You should also take processor into
account when doing this.

> I thought about this earlier and I was thinking along the lines of
> letting the user specify a priority for each version, when there is
> ambiguity.
>
>> gcc.target/i386/builtin_target.c doesn't test if __builtin_cpu_supports ("XXX")
>> and __builtin_cpu_is ("XXX") are implemented correctly.
>
> Oh, you mean like doing a CPUID again in the test case itself and checking, ok.
>

Yes. BTW, I think you should also add FMA support to
config/i386/i386-cpuinfo.c.

Thanks.

--
H.J.
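A test that "does a CPUID again" could be shaped like the sketch below. It is an assumption about what such a test might look like, not the test that was eventually committed: `count_cpu_feature_mismatches` and `cpuid_reports_fma` are hypothetical names, and only features that map to a single bit of CPUID leaf 1 are compared. FMA (leaf 1, ECX bit 12) is read but deliberately not compared, since a complete check would also have to confirm that the OS enables the AVX register state.

```cpp
#include <cassert>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define CPU_SKETCH_X86 1

// 1 if the builtin's answer disagrees with the given CPUID bit, else 0.
static int mismatch (int builtin_answer, unsigned reg, unsigned bit)
{
  const bool from_builtin = builtin_answer != 0;
  const bool from_cpuid = ((reg >> bit) & 1u) != 0;
  return from_builtin != from_cpuid ? 1 : 0;
}
#else
#define CPU_SKETCH_X86 0
#endif

// Number of features on which __builtin_cpu_supports and CPUID leaf 1
// disagree.  Returns 0 on non-x86 targets, where nothing is compared.
int count_cpu_feature_mismatches ()
{
  int bad = 0;
#if CPU_SKETCH_X86
  unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
    return 0;  // CPUID leaf 1 unavailable; nothing to compare.
  bad += mismatch (__builtin_cpu_supports ("sse"),    edx, 25);
  bad += mismatch (__builtin_cpu_supports ("sse2"),   edx, 26);
  bad += mismatch (__builtin_cpu_supports ("sse3"),   ecx, 0);
  bad += mismatch (__builtin_cpu_supports ("ssse3"),  ecx, 9);
  bad += mismatch (__builtin_cpu_supports ("sse4.1"), ecx, 19);
  bad += mismatch (__builtin_cpu_supports ("sse4.2"), ecx, 20);
  bad += mismatch (__builtin_cpu_supports ("popcnt"), ecx, 23);
#endif
  return bad;
}

// Raw FMA bit from CPUID leaf 1 (ECX bit 12); false on non-x86 targets.
bool cpuid_reports_fma ()
{
#if CPU_SKETCH_X86
  unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
  if (__get_cpuid (1, &eax, &ebx, &ecx, &edx))
    return ((ecx >> 12) & 1u) != 0;
#endif
  return false;
}

// A test along these lines would fail if any builtin answer were wrong.
void check_builtins_against_cpuid ()
{
  assert (count_cpu_feature_mismatches () == 0);
}
```

`__builtin_cpu_supports` takes a string literal, so the features are listed one call at a time rather than looped over.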
On Wed, May 2, 2012 at 11:04 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> Oh, you mean like doing a CPUID again in the test case itself and checking, ok.
>>
>
> Yes. BTW, I think you should also add FMA support to
> config/i386/i386-cpuinfo.c.

I am preparing a patch for this. I will send it your way soon enough.

Thanks,
-Sri.

>
> Thanks.
>
> --
> H.J.
Hi,

Attached new patch with more bug fixes. I will fix the dispatching
method to use priority of attributes in the next iteration.

Patch also available for review here: http://codereview.appspot.com/5752064

Thanks,
-Sri.

On Mon, May 7, 2012 at 9:58 AM, Sriraman Tallam <tmsriram@google.com> wrote:
> On Wed, May 2, 2012 at 11:04 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> Yes. BTW, I think you should also add FMA support to
>> config/i386/i386-cpuinfo.c.
>
> I am preparing a patch for this. I will send it your way soon enough.
>
> Thanks,
> -Sri.
On Wed, May 9, 2012 at 12:01 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> Hi,
>
> Attached new patch with more bug fixes. I will fix the dispatching
> method to use priority of attributes in the next iteration.
>
> Patch also available for review here: http://codereview.appspot.com/5752064
>

The patch looks OK to me. Since the testcase depends on the dispatching
method, I'd like to see the whole patch with the updated dispatching
method.

Thanks.

--
H.J.
Hi H.J.,

I have updated the patch to improve the dispatching method like we
discussed. Each feature gets a priority now, and the dispatching is
done in priority order. Please see i386.c for the changes.

Patch also available for review here: http://codereview.appspot.com/5752064

Thanks,
-Sri.

On Thu, May 10, 2012 at 10:55 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> The patch looks OK to me. Since testcase depends on the dispatching
> method, I'd like to see the whole patch with the updated dispatching
> method.
>
> Thanks.
>
> --
> H.J.
On Fri, May 11, 2012 at 7:04 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> Hi H.J.,
>
> I have updated the patch to improve the dispatching method like we
> discussed. Each feature gets a priority now, and the dispatching is
> done in priority order. Please see i386.c for the changes.
>
> Patch also available for review here: http://codereview.appspot.com/5752064
>

I think you need 3 tests:

1. Only with ISA.
2. Only with arch.
3. Mixed with ISA and arch.

A test that mixes ISA and arch may hide issues with ISA only or arch only.

--
H.J.
Hi H.J,

Attaching a new patch with 2 test cases: mv2.C checks ISAs only and
mv1.C checks ISAs and arches mixed. Right now, checking only arches is
not needed: they are mutually exclusive, so any order should be fine.

Patch also available for review here: http://codereview.appspot.com/5752064

Thanks,
-Sri.

On Sat, May 12, 2012 at 6:37 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> I think you need 3 tests:
>
> 1. Only with ISA.
> 2. Only with arch
> 3. Mixed with ISA and arch
>
> since test mixed ISA and arch may hide issues with ISA only or arch only.
>
> --
> H.J.
On Mon, May 14, 2012 at 11:28 AM, Sriraman Tallam <tmsriram@google.com> wrote:
> Hi H.J,
>
> Attaching new patch with 2 test cases, mv2.C checks ISAs only and
> mv1.C checks ISAs and arches mixed. Right now, checking only arches is
> not needed as they are mutually exclusive, any order should be fine.
>
> Patch also available for review here: http://codereview.appspot.com/5752064

Sorry for the delay. It looks OK except for the function order in the
testcases. I think you should rearrange them so that they are not in
the same order as the priority.

Thanks.

H.J.
Hi H.J.,

On Fri, May 25, 2012 at 5:07 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> Sorry for the delay. It looks OK except for the function order in tescases.
> I think you should rearrange them so that they are not in the same order
> as the priority.

I am not sure I understand. The function order is mixed up in the
declarations, I have explicitly commented about this. I only do the
checking in order, which I must, right?

Thanks,
-Sri.
On Fri, May 25, 2012 at 5:16 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> I am not sure I understand. The function order is mixed up in the
> declarations, I have explicitly commented about this. I only do the
> checking in order which I must, right?
>

gcc/testsuite/g++.dg/mv2.C has

int __attribute__ ((target("avx2")))
foo ()
{
  return 1;
}

int __attribute__ ((target("avx")))
foo ()
{
  return 2;
}

int __attribute__ ((target("popcnt")))
foo ()
{
  return 3;
}

int __attribute__ ((target("sse4.2")))
foo ()
{
  return 4;
}

int __attribute__ ((target("sse4.1")))
foo ()
{
  return 5;
}

int __attribute__ ((target("ssse3")))
foo ()
{
  return 6;
}

int __attribute__ ((target("sse3")))
foo ()
{
  return 7;
}

int __attribute__ ((target("sse2")))
foo ()
{
  return 8;
}

int __attribute__ ((target("sse")))
foo ()
{
  return 9;
}

int __attribute__ ((target("mmx")))
foo ()
{
  return 10;
}

It is mostly in priority order.

BTW, I noticed:

[hjl@gnu-6 pr14170]$ readelf -sW libgcc.a | grep __cpu_model
    20: 0000000000000010    16 OBJECT  GLOBAL HIDDEN  COM __cpu_model
[hjl@gnu-6 pr14170]$ readelf -sW libgcc_s.so | grep __cpu_model
    82: 0000000000214ff0    16 OBJECT  GLOBAL DEFAULT   24 __cpu_model@@GCC_4.8.0
   310: 0000000000214ff0    16 OBJECT  GLOBAL DEFAULT   24 __cpu_model
[hjl@gnu-6 pr14170]$

Why is __cpu_model in both libgcc.a and libgcc_s.so?

H.J.
On Fri, May 25, 2012 at 5:27 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> gcc/testsuite/g++.dg/mv2.C has
> [...]
> It is most in the priority order.

Ah! ok, got it. I kept it that way because it is really the order of
the declarations before the call that matters, but I will rearrange the
definitions too to be clear.

> BTW, I noticed:
>
> [hjl@gnu-6 pr14170]$ readelf -sW libgcc.a | grep __cpu_model
>     20: 0000000000000010    16 OBJECT  GLOBAL HIDDEN  COM __cpu_model
> [hjl@gnu-6 pr14170]$ readelf -sW libgcc_s.so | grep __cpu_model
>     82: 0000000000214ff0    16 OBJECT  GLOBAL DEFAULT   24 __cpu_model@@GCC_4.8.0
>    310: 0000000000214ff0    16 OBJECT  GLOBAL DEFAULT   24 __cpu_model
> [hjl@gnu-6 pr14170]$
>
> Why is __cpu_model in both libgcc.a and libgcc_s.o?

How do I disallow this in libgcc_s.so? Looks like the t-cpuinfo file is
wrong but I cannot figure out the fix.

Thanks,
-Sri.

>
> H.J.
On May 25, 2012 6:54 PM, "Sriraman Tallam" <tmsriram@google.com> wrote:
> > BTW, I noticed:
> >
> > [hjl@gnu-6 pr14170]$ readelf -sW libgcc.a | grep __cpu_model
> >     20: 0000000000000010    16 OBJECT  GLOBAL HIDDEN  COM __cpu_model
> > [hjl@gnu-6 pr14170]$ readelf -sW libgcc_s.so | grep __cpu_model
> >     82: 0000000000214ff0    16 OBJECT  GLOBAL DEFAULT   24 __cpu_model@@GCC_4.8.0
> >    310: 0000000000214ff0    16 OBJECT  GLOBAL DEFAULT   24 __cpu_model
> > [hjl@gnu-6 pr14170]$
> >
> > Why is __cpu_model in both libgcc.a and libgcc_s.o?
>
> How do I disallow this in libgcc_s.so? Looks like t-cpuinfo file is
> wrong but I cannot figure out the fix.
>

Why don't you want it in libgcc_s.so?

H.J
On May 25, 2012 7:15 PM, "H.J. Lu" <hjl.tools@gmail.com> wrote:
> On May 25, 2012 6:54 PM, "Sriraman Tallam" <tmsriram@google.com> wrote:
> > > Why is __cpu_model in both libgcc.a and libgcc_s.o?
> >
> > How do I disallow this in libgcc_s.so? Looks like t-cpuinfo file is
> > wrong but I cannot figure out the fix.
> >
> Why don't you want it in libgcc_s.so?

I thought libgcc.a is always linked in for static and dynamic builds. So
having it in libgcc_s.so is redundant.

Thanks
-Sri.

>
> H.J
On Fri, May 25, 2012 at 8:38 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> On May 25, 2012 7:15 PM, "H.J. Lu" <hjl.tools@gmail.com> wrote:
>> Why don't you want it in libgcc_s.so?
>
> I thought libgcc.a is always linked in for static and dynamic builds. So
> having it in libgcc_s.so is redundant.

[hjl@gnu-6 pr14170]$ readelf -sW libgcc.a | grep _cpu_
    20: 0000000000000010    16 OBJECT  GLOBAL HIDDEN  COM __cpu_model
    21: 0000000000000110   612 FUNC    GLOBAL HIDDEN    4 __cpu_indicator_init
[hjl@gnu-6 pr14170]$ readelf -sW libgcc_s.so.1 | grep _cpu_
    82: 0000000000214ff0    16 OBJECT  GLOBAL DEFAULT   24 __cpu_model@@GCC_4.8.0
   223: 0000000000002b60   560 FUNC    LOCAL  DEFAULT   11 __cpu_indicator_init
   310: 0000000000214ff0    16 OBJECT  GLOBAL DEFAULT   24 __cpu_model
[hjl@gnu-6 pr14170]$

I think there should be only one copy of __cpu_model in the process. It should be in libgcc_s.so. Why isn't __cpu_indicator_init exported from libgcc_s.so?

--
H.J.
On Fri, May 25, 2012 at 10:05 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> [hjl@gnu-6 pr14170]$ readelf -sW libgcc.a | grep _cpu_
>     20: 0000000000000010    16 OBJECT  GLOBAL HIDDEN  COM __cpu_model
>     21: 0000000000000110   612 FUNC    GLOBAL HIDDEN    4 __cpu_indicator_init
> [hjl@gnu-6 pr14170]$ readelf -sW libgcc_s.so.1 | grep _cpu_
>     82: 0000000000214ff0    16 OBJECT  GLOBAL DEFAULT   24 __cpu_model@@GCC_4.8.0
>    223: 0000000000002b60   560 FUNC    LOCAL  DEFAULT   11 __cpu_indicator_init
>    310: 0000000000214ff0    16 OBJECT  GLOBAL DEFAULT   24 __cpu_model
> [hjl@gnu-6 pr14170]$
>
> I think there should be only one copy of __cpu_model in the process.
> It should be in libgcc_s.so. Why isn't __cpu_indicator_init exported
> from libgcc_s.so?

Ok, I am elaborating so that I understand the issue clearly.

The dynamic symbol table of libgcc_s.so:

$ objdump -T libgcc_s.so | grep __cpu
0000000000015fd0 g    DO .bss   0000000000000010  GCC_4.8.0   __cpu_model

It only has __cpu_model, not __cpu_indicator_init, just like you pointed out. I will fix this by adding a versioned symbol of __cpu_indicator_init to the *.ver files.

Do you see any other issues here? I don't get the duplicate entries part you are referring to. The static symbol table also contains references to __cpu_model and __cpu_indicator_init, but that is expected, right?

In libgcc.a:

readelf -sWt /g/tmsriram/GCC_trunk_svn_mv_fe_at_nfs/native_builds/bld1/install/lib/gcc/x86_64-unknown-linux-gnu/libgcc.a | grep __cpu
    20: 0000000000000010    16 OBJECT  GLOBAL HIDDEN  COM __cpu_model
    21: 0000000000000110   612 FUNC    GLOBAL HIDDEN    4 __cpu_indicator_init

libgcc.a has __cpu_model and __cpu_indicator_init as GLOBAL syms with HIDDEN visibility. Is this an issue? Is this not needed for static linking?

Further thoughts:

* It looks like libgcc.a is always linked for both static and dynamic links. It occurred to me when you brought this up. So, I thought why not exclude the symbols from libgcc_s.so! Is there any problem here?

Example:

file: test.c

int main ()
{
  return (int) __builtin_cpu_is ("corei7");
}

Case I: Use gcc to build dynamic
$ gcc test.c -Wl,-y,__cpu_model
libgcc.a(cpuinfo.o): reference to __cpu_model
libgcc_s.so: definition of __cpu_model

Case II: Use g++ to build dynamic
$ g++ test.c -Wl,-y,__cpu_model
fe1.o: reference to __cpu_model
libgcc_s.so: definition of __cpu_model

Case III: Use gcc to link static
$ gcc test.c -Wl,-y,__cpu_model -static
fe1.o: reference to __cpu_model
libgcc.a(cpuinfo.o): reference to __cpu_model

Please note that in all 3 cases, libgcc.a was linked in. Hence, removing these symbols from the dynamic symbol table of libgcc_s.so should have no issues.

Thanks,
-Sri.

>
> --
> H.J.
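For readers following the symbol discussion, here is a minimal sketch of the kind of user code that ends up referencing these libgcc symbols. It assumes an x86 target and a GCC-compatible compiler that provides `__builtin_cpu_init`, `__builtin_cpu_is`, and `__builtin_cpu_supports`. The function names `running_on_corei7` and `has_sse42` are hypothetical, and the non-x86 fallback is an assumption added only so the file compiles elsewhere.

```c
#include <assert.h>

/* Returns 1 if the running CPU is identified as "corei7", else 0.
   Hypothetical helper; on non-x86 targets it always returns 0.  */
int running_on_corei7(void)
{
    int result = 0;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* Populates the CPU model data that the checks below read.
       GCC documents this call as strictly required only in code that
       runs before constructors; calling it here is harmless.  */
    __builtin_cpu_init();
    result = __builtin_cpu_is("corei7") != 0;
#endif
    assert(result == 0 || result == 1);
    return result;
}

/* Returns 1 if the running CPU reports SSE4.2 support, else 0.  */
int has_sse42(void)
{
    int result = 0;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    result = __builtin_cpu_supports("sse4.2") != 0;
#endif
    assert(result == 0 || result == 1);
    return result;
}
```

Where the object file's reference to the CPU model data gets resolved (libgcc.a or libgcc_s.so) depends on the link line, which is the question under discussion in this part of the thread.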
On Sat, May 26, 2012 at 3:34 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> The dynamic symbol table of libgcc_s.so:
>
> $ objdump -T libgcc_s.so | grep __cpu
>
> 0000000000015fd0 g    DO .bss   0000000000000010  GCC_4.8.0   __cpu_model
>
> It only has __cpu_model, not __cpu_indicator_init just like you
> pointed out. I will fix this by adding a versioned symbol of
> __cpu_indicator_init to the *.ver files.

That will be great.

> Do you see any other issues here? I don't get the duplicate entries
> part you are referring to. The static symbol table also contains
> references to __cpu_model and __cpu_indicator_init, but that is
> expected right?

Duplication comes from static and dynamic symbol tables.

> In libgcc.a:
>
> readelf -sWt /g/tmsriram/GCC_trunk_svn_mv_fe_at_nfs/native_builds/bld1/install/lib/gcc/x86_64-unknown-linux-gnu/libgcc.a
> | grep __cpu
>
>     20: 0000000000000010    16 OBJECT  GLOBAL HIDDEN  COM __cpu_model
>     21: 0000000000000110   612 FUNC    GLOBAL HIDDEN    4 __cpu_indicator_init
>
> libgcc.a has __cpu_model and __cpu_indicator_init as GLOBAL syms with
> HIDDEN visibility. Is this an issue? Is this not needed for static
> linking?
>
> Further thoughts:
>
> * It looks like libgcc.a is always linked for both static and dynamic
> links. It occurred to me when you brought this up. So, I thought why
> not exclude the symbols from libgcc_s.so! Is there any problem here?

You don't want one copy of those 2 symbols in each DSO where they are used.

--
H.J.
On Sat, May 26, 2012 at 4:56 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Sat, May 26, 2012 at 3:34 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>> * It looks like libgcc.a is always linked for both static and dynamic
>> links. It occurred to me when you brought this up. So, I thought why
>> not exclude the symbols from libgcc_s.so! Is there any problem here?
>
> You don't want one copy of those 2 symbols in each DSO where
> they are used.

Right, I agree. But this problem exists right now even if libgcc_s.so is provided with these symbols. Please see example below:

Example:

dso.c
-------

int some_func ()
{
  return (int) __builtin_cpu_is ("corei7");
}

Build with gcc driver:
$ gcc dso.c -fPIC -shared -o dso.so
$ nm dso.so | grep __cpu
0000000000000780 t __cpu_indicator_init
0000000000001e00 b __cpu_model

This DSO is getting its own local copy of __cpu_model. This is fine functionally but this is not the behaviour you have in mind.

Whereas, if I build with g++ driver:

$ g++ dso.c -fPIC -shared -o dso.so
$ nm dso.so | grep __cpu
                 U __cpu_model

This is as we would like, __cpu_model is undefined.

The difference is that with the gcc driver, the link line is -lgcc -lgcc_s, whereas with the g++ driver -lgcc is not even present!

Should I fix the gcc driver instead? This double-standard is not clear to me.

Thanks,
-Sri.

>
> --
> H.J.
On Sat, May 26, 2012 at 5:23 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> The difference is that with the gcc driver, the link line is -lgcc
> -lgcc_s, whereas with the g++ driver -lgcc is not even present!
>
> Should I fix the gcc driver instead? This double-standard is not clear to me.

That is because libgcc_s.so is preferred by g++. We can do one of 3 things:

1. Abuse libgcc_eh.a by moving __cpu_model and __cpu_indicator_init from libgcc.a to libgcc_eh.a.
2. Rename libgcc_eh.a to libgcc_static.a and move __cpu_model and __cpu_indicator_init from libgcc.a to libgcc_static.a.
3. Add libgcc_static.a and move __cpu_model and __cpu_indicator_init from libgcc.a to libgcc_static.a. We treat libgcc_static.a similar to libgcc_eh.a.

--
H.J.
On Sat, May 26, 2012 at 7:06 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> That is because libgcc_s.so is preferred by g++. We can do one
> of 3 things:
>
> 1. Abuse libgcc_eh.a by moving __cpu_model and __cpu_indicator_init
> from libgcc.a to libgcc_eh.a.
> 2. Rename libgcc_eh.a to libgcc_static.a and move __cpu_model and
> __cpu_indicator_init from libgcc.a to libgcc_static.a.
> 3. Add libgcc_static.a and move __cpu_model and __cpu_indicator_init
> from libgcc.a to libgcc_static.a. We treat libgcc_static.a similar to
> libgcc_eh.a.

Any reason why gcc should not be made to prefer libgcc_s.so too like g++?

Thanks for clearing this up. I will take a stab at it.

-Sri.

>
> --
> H.J.
On Sat, May 26, 2012 at 7:23 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>> That is because libgcc_s.so is preferred by g++. We can do one
>> of 3 things:
>>
>> 1. Abuse libgcc_eh.a by moving __cpu_model and __cpu_indicator_init
>> from libgcc.a to libgcc_eh.a.
>> 2. Rename libgcc_eh.a to libgcc_static.a and move __cpu_model and
>> __cpu_indicator_init from libgcc.a to libgcc_static.a.
>> 3. Add libgcc_static.a and move __cpu_model and __cpu_indicator_init
>> from libgcc.a to libgcc_static.a. We treat libgcc_static.a similar to
>> libgcc_eh.a.
>
> Any reason why gcc should not be made to prefer libgcc_s.so too like g++?
>
> Thanks for clearing this up. I will take a stab at it.

This is a long story. The short answer is people didn't want to add libgcc_s.so to DT_NEEDED for C programs. But it is no longer an issue since we now pass

-lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed

to linker.

--
H.J.
Sriraman Tallam <tmsriram@google.com> writes:

> Any reason why gcc should not be made to prefer libgcc_s.so too like g++?

It's controlled by the -shared-libgcc and -static-libgcc options. The -shared-libgcc option is the default for g++ because several years ago a shared libgcc was required to make exception handling work correctly when exceptions were thrown across shared library boundaries. That is no longer true when using GNU ld or gold on a GNU/Linux system, but it is still true on some systems.

The -static-libgcc option is the default for gcc because the assumption is that most C programs do not throw exceptions. The -shared-libgcc option is available for those that do.

Ian
Hi,

Attaching updated patch for function multiversioning which brings in plenty of changes.

* As suggested by Richard earlier, I have made cgraph aware of function versions. All nodes of function versions are chained and the dispatcher bodies are created on demand while building cgraph edges. The dispatcher body will be created if and only if there is a call or reference to a versioned function. Previously, I was maintaining the list of versions separately in a hash map, all that is gone now.
* Now, the file multiversion.c has some helper routines that are used in the context of function versioning. There are no new passes and no new globals.
* More tests, updated existing tests.
* Fixed lots of bugs.
* Updated patch description.

Patch attached. Patch also available for review at http://codereview.appspot.com/5752064

Please let me know what you think,

Thanks,
-Sri.

On Mon, May 14, 2012 at 11:28 AM, Sriraman Tallam <tmsriram@google.com> wrote:
> Hi H.J,
>
> Attaching new patch with 2 test cases, mv2.C checks ISAs only and
> mv1.C checks ISAs and arches mixed. Right now, checking only arches is
> not needed as they are mutually exclusive, any order should be fine.
>
> Patch also available for review here: http://codereview.appspot.com/5752064
>
> Thanks,
> -Sri.
>
> On Sat, May 12, 2012 at 6:37 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Fri, May 11, 2012 at 7:04 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>> Hi H.J.,
>>>
>>> I have updated the patch to improve the dispatching method like we
>>> discussed. Each feature gets a priority now, and the dispatching is
>>> done in priority order. Please see i386.c for the changes.
>>>
>>> Patch also available for review here: http://codereview.appspot.com/5752064
>>>
>>
>> I think you need 3 tests:
>>
>> 1. Only with ISA.
>> 2. Only with arch
>> 3. Mixed with ISA and arch
>>
>> since test mixed ISA and arch may hide issues with ISA only or arch only.
>>
>> --
>> H.J.
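The two ideas in the message above, versions kept on a chain and dispatch done in priority order, can be illustrated with a small standalone sketch. This is not the patch's cgraph or i386.c code. Every name here (`struct version`, `chain_insert`, `dispatch`, `select_foo`, the `FEAT_*` bits) is hypothetical, and the CPU feature mask is passed in as a plain integer instead of being read from the hardware.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*impl_fn)(void);

/* Hypothetical feature bits standing in for real CPU detection results. */
enum { FEAT_SSE4_2 = 1 << 0, FEAT_AVX = 1 << 1 };

/* One node per function version; nodes are chained together. */
struct version {
    unsigned required;     /* feature bits this version needs */
    int priority;          /* higher priority is tried first */
    impl_fn impl;          /* the specialized body */
    struct version *next;  /* next version in the chain */
};

static int foo_default(void) { return 0; }
static int foo_sse42(void)   { return 1; }
static int foo_avx(void)     { return 2; }

/* Insert v so the chain stays sorted by descending priority. */
static struct version *chain_insert(struct version *head, struct version *v)
{
    struct version **link = &head;
    assert(v != NULL);
    while (*link != NULL && (*link)->priority >= v->priority)
        link = &(*link)->next;
    v->next = *link;
    *link = v;
    return head;
}

/* Walk the chain and return the first version the CPU can run;
   fall back to the default version if none matches. */
static impl_fn dispatch(const struct version *head, unsigned cpu_features,
                        impl_fn fallback)
{
    const struct version *v;
    for (v = head; v != NULL; v = v->next)
        if ((cpu_features & v->required) == v->required)
            return v->impl;
    return fallback;
}

/* Build the chain for foo and pick an implementation. */
impl_fn select_foo(unsigned cpu_features)
{
    struct version sse42 = { FEAT_SSE4_2, 1, foo_sse42, NULL };
    struct version avx   = { FEAT_AVX,    2, foo_avx,   NULL };
    struct version *head = NULL;

    head = chain_insert(head, &sse42);
    head = chain_insert(head, &avx);
    return dispatch(head, cpu_features, foo_default);
}
```

Sorting at insertion time keeps the dispatcher itself a plain first-match walk, so a CPU with both features gets the higher-priority version and a CPU with neither falls through to the default.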
On Mon, Jun 4, 2012 at 11:59 AM, Sriraman Tallam <tmsriram@google.com> wrote:
> Patch attached. Patch also available for review at
> http://codereview.appspot.com/5752064
>
> Please let me know what you think,

Build failed in libstdc++-v3:

/export/build/gnu/gcc/build-x86_64-linux/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/locale_classes.h:546:59: internal compiler error: tree check: expected function_decl, have identifier_node in tourney, at cp/call.c:8498
  for (size_t __i = 0; __ret && __i < _S_categories_size - 1; ++__i)
                                                          ^
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.
make[5]: *** [x86_64-unknown-linux-gnu/bits/stdc++.h.gch/O2g.gch] Erro

on Linux/x86-64.

--
H.J.
Bug fixed and new patch attached.

Patch also available for review at http://codereview.appspot.com/5752064

Thanks,
-Sri.

On Mon, Jun 4, 2012 at 2:36 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> Build failed in libstdc++-v3:
>
> /export/build/gnu/gcc/build-x86_64-linux/x86_64-unknown-linux-gnu/libstdc++-v3/include/bits/locale_classes.h:546:59:
> internal compiler error: tree check: expected function_decl, have
> identifier_node in tourney, at cp/call.c:8498
>   for (size_t __i = 0; __ret && __i < _S_categories_size - 1; ++__i)
>                                                           ^
> Please submit a full bug report,
> with preprocessed source if appropriate.
> See <http://gcc.gnu.org/bugs.html> for instructions.
> make[5]: *** [x86_64-unknown-linux-gnu/bits/stdc++.h.gch/O2g.gch] Erro
>
> on Linux/x86-64.
>
> --
> H.J.
On Mon, Jun 4, 2012 at 3:29 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> Bug fixed and new patch attached.
>
> Patch also available for review at http://codereview.appspot.com/5752064

I think you should also export __cpu_indicator_init in libgcc_s.so. Also, is this feature C++ only? Can you make it work for C?

--
H.J.
On Jun 5, 2012 6:56 AM, "H.J. Lu" <hjl.tools@gmail.com> wrote:
>
> On Mon, Jun 4, 2012 at 3:29 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> > Bug fixed and new patch attached.
> >
> > Patch also available for review at http://codereview.appspot.com/5752064
> >
>
> I think you should also export __cpu_indicator_init in libgcc_s.so.
> Also, is this feature C++ only? Can you make it to work for C?

Yes, I should have that patch shortly. I just wanted to keep front end
and run time patches separate.

Yes, I plan to support C. I may have to make a new attribute for it.

Thanks,
-Sri.

>
>
> --
> H.J.
+cc c++ front-end maintainers

Hi,

  C++ Frontend maintainers, could you please take a look at the
front-end part when you find the time?

  Honza, your thoughts on the callgraph part?

  Richard, any further comments/feedback?

  Additionally, I am working on generating better mangled names for
function versions, along the lines of C++ thunks.

Thanks,
-Sri.

On Mon, Jun 4, 2012 at 11:59 AM, Sriraman Tallam <tmsriram@google.com> wrote:
> Hi,
>
>   Attaching updated patch for function multiversioning which brings
> in plenty of changes.
>
> * As suggested by Richard earlier, I have made cgraph aware of
> function versions. All nodes of function versions are chained and the
> dispatcher bodies are created on demand while building cgraph edges.
> The dispatcher body will be created if and only if there is a call or
> reference to a versioned function. Previously, I was maintaining the
> list of versions separately in a hash map, all that is gone now.
> * Now, the file multiversion.c has some helper routines that are used
> in the context of function versioning. There are no new passes and no
> new globals.
> * More tests, updated existing tests.
> * Fixed lots of bugs.
> * Updated patch description.
>
> Patch attached. Patch also available for review at
> http://codereview.appspot.com/5752064
>
> Please let me know what you think,
>
> Thanks,
> -Sri.
>
>
> On Mon, May 14, 2012 at 11:28 AM, Sriraman Tallam <tmsriram@google.com> wrote:
>> Hi H.J,
>>
>>   Attaching new patch with 2 test cases, mv2.C checks ISAs only and
>> mv1.C checks ISAs and arches mixed. Right now, checking only arches is
>> not needed as they are mutually exclusive, any order should be fine.
>>
>> Patch also available for review here: http://codereview.appspot.com/5752064
>>
>> Thanks,
>> -Sri.
>>
>> On Sat, May 12, 2012 at 6:37 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Fri, May 11, 2012 at 7:04 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>>> Hi H.J.,
>>>>
>>>>   I have updated the patch to improve the dispatching method like we
>>>> discussed. Each feature gets a priority now, and the dispatching is
>>>> done in priority order. Please see i386.c for the changes.
>>>>
>>>> Patch also available for review here: http://codereview.appspot.com/5752064
>>>>
>>>
>>> I think you need 3 tests:
>>>
>>> 1. Only with ISA.
>>> 2. Only with arch
>>> 3. Mixed with ISA and arch
>>>
>>> since test mixed ISA and arch may hide issues with ISA only or arch only.
>>>
>>> --
>>> H.J.
Ping.

On Thu, Jun 14, 2012 at 1:13 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> +cc c++ front-end maintainers
>
> Hi,
>
>   C++ Frontend maintainers, Could you please take a look at the
> front-end part when you find the time?
>
>   Honza, your thoughts on the callgraph part?
>
>   Richard, any further comments/feedback?
>
>   Additionally, I am working on generating better mangled names for
> function versions, along the lines of C++ thunks.
>
> Thanks,
> -Sri.
Ping.

On Tue, Jun 19, 2012 at 6:03 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> Ping.
On Thu, Jun 14, 2012 at 10:13 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> +cc c++ front-end maintainers
>
> Hi,
>
> C++ Frontend maintainers, Could you please take a look at the
> front-end part when you find the time?

So you have (for now?) omitted the C frontend change(s)?

> Honza, your thoughts on the callgraph part?
>
> Richard, any further comments/feedback?

Overall I like it - the cgraph portions need comments from Honza and the
C++ portions from a C++ maintainer though.

I would appreciate a C version, too.

As you are tackling the C++ frontend first you should add some C++
specific testcases - if only to verify you properly reject cases you do not
or can not implement. Like eventually

class Foo {
  virtual void bar() __attribute__((target("sse")));
  virtual void bar() __attribute__((target("sse2")));
};

or

template <class T>
void bar (T t) __attribute__((target("sse")));
template <class T>
void bar (T t) __attribute__((target("sse2")));
template <>
void bar (int t);

(how does regular C++ overload resolution / template specialization
interfere with the target overloads?)

Thanks,
Richard.

> Additionally, I am working on generating better mangled names for
> function versions, along the lines of C++ thunks.
>
> Thanks,
> -Sri.
On Fri, Jul 6, 2012 at 2:14 AM, Richard Guenther <richard.guenther@gmail.com> wrote:
>
> On Thu, Jun 14, 2012 at 10:13 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> > +cc c++ front-end maintainers
> >
> > Hi,
> >
> > C++ Frontend maintainers, Could you please take a look at the
> > front-end part when you find the time?
>
> So you have (for now?) omitted the C frontend change(s)?

Yes, for now. I thought I would get the C++ changes and the associated
middle-end changes checked in first. The C changes should be easy to
add; I have to introduce a new attribute for this. So, the C front-end
should look like this:

int foo (); // default version.
int foo_sse4() __attribute__ ((version("foo"), target("sse4.2"))); // A version of foo.

and the call will be to foo. The version attribute will be the new
one, though there may be an existing attribute that I could use for
this purpose. I considered whether the "alias" attribute along with
the "target" attribute could be used here, but it makes things
unnecessarily complicated. What do you think?

> > Honza, your thoughts on the callgraph part?
> >
> > Richard, any further comments/feedback?
>
> Overall I like it - the cgraph portions need comments from Honza and the
> C++ portions from a C++ maintainer though.
>
> I would appreciate a C version, too.

Sure, I will get to it immediately after the current patch reaches a
stable point.

> As you are tackling the C++ frontend first you should add some C++
> specific testcases - if only to verify you properly reject cases you do not
> or can not implement. Like eventually

Sure, I will add these test cases.

Thanks for reviewing,
-Sri.

> class Foo {
>   virtual void bar() __attribute__((target("sse")));
>   virtual void bar() __attribute__((target("sse2")));
> };
>
> or
>
> template <class T>
> void bar (T t) __attribute__((target("sse")));
> template <class T>
> void bar (T t) __attribute__((target("sse2")));
> template <>
> void bar (int t);
>
> (how does regular C++ overload resolution / template specialization
> interfere with the target overloads?)
>
> Thanks,
> Richard.
On 06/14/2012 04:13 PM, Sriraman Tallam wrote:
> C++ Frontend maintainers, Could you please take a look at the
> front-end part when you find the time?

It seems to me that what you have here are target-specific attributes
that affect the signature of a function such that they make two
declarations different that would otherwise declare the same function.
Stepping away from the specific notion of versioning, it seems that
these are the questions that you want the front end to be able to ask
about these attributes:

* Does this attribute affect a function signature?
* Do the attributes on these two declarations make them different?
* Do the attributes on these two declarations make one a better match?
* Given a call to function X, should I call another function instead?
* Return a string representation of the attributes on this function that
  affect its signature.

Does this seem like a worthwhile direction to other people, or do you
like better the approach the patch takes, handling versioning directly?

Jason
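The five questions above can be read as a small interface that a back end would implement for the front end. The following is only a sketch of that reading, not code from the patch: `Decl`, `AttrHooks`, `ToyHooks`, the `.dispatcher` suffix, and the "more attributes means more specialized" ranking are all hypothetical stand-ins chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a function declaration: a name plus the
// target attribute strings attached to it (empty means "default").
struct Decl {
    std::string name;
    std::vector<std::string> attrs;
};

// One virtual member per question in the list above (names are assumptions).
struct AttrHooks {
    virtual ~AttrHooks() = default;
    virtual bool affects_signature(const std::string& attr) const = 0;
    virtual bool differ(const Decl& a, const Decl& b) const = 0;
    virtual bool better_match(const Decl& a, const Decl& b) const = 0;
    virtual std::string call_target(const Decl& callee, bool has_versions) const = 0;
    virtual std::string signature_string(const Decl& d) const = 0;
};

// Toy back end: only "arch=..." and "sse..." attributes affect the signature.
struct ToyHooks : AttrHooks {
    bool affects_signature(const std::string& attr) const override {
        return attr.rfind("arch=", 0) == 0 || attr.rfind("sse", 0) == 0;
    }
    std::string signature_string(const Decl& d) const override {
        std::vector<std::string> kept;
        for (const std::string& a : d.attrs)
            if (affects_signature(a)) kept.push_back(a);
        std::sort(kept.begin(), kept.end());  // order of attributes must not matter
        std::string out;
        for (const std::string& a : kept) {
            if (!out.empty()) out += ',';
            out += a;
        }
        return out;
    }
    bool differ(const Decl& a, const Decl& b) const override {
        return signature_string(a) != signature_string(b);
    }
    bool better_match(const Decl& a, const Decl& b) const override {
        // Toy ranking: more signature-affecting attributes = more specialized.
        return count(a) > count(b);
    }
    std::string call_target(const Decl& callee, bool has_versions) const override {
        // A versioned function is reached through a dispatcher (suffix is made up).
        return has_versions ? callee.name + ".dispatcher" : callee.name;
    }
private:
    int count(const Decl& d) const {
        int n = 0;
        for (const std::string& a : d.attrs)
            if (affects_signature(a)) ++n;
        return n;
    }
};
```

With such an interface the front end never mentions versioning itself; it only asks the hooks whether two declarations differ, which one matches better, and what to call.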
On Fri, Jul 6, 2012 at 11:05 PM, Jason Merrill <jason@redhat.com> wrote:
> On 06/14/2012 04:13 PM, Sriraman Tallam wrote:
>>
>> C++ Frontend maintainers, Could you please take a look at the
>> front-end part when you find the time?
>
>
> It seems to me that what you have here are target-specific attributes that
> affect the signature of a function such that they make two declarations
> different that would otherwise declare the same function. Stepping away from
> the specific notion of versioning, it seems that these are the questions
> that you want the front end to be able to ask about these attributes:
>
> * Does this attribute affect a function signature?

The question becomes if a caller 'bar' with target attribute 'x' can
make a call to a function 'foo' with an incompatible target attribute
'y'. If the answer is no, then the target attribute is part of 'foo's
signature. I think the answer is yes -- the attribute affects a
function signature.

> * Do the attributes on these two declarations make them different?

yes.

> * Do the attributes on these two declarations make one a better match?

yes -- and there are rules defined for that.

> * Given a call to function X, should I call another function instead?

The binding can happen at compile time (given caller/callee attribute)
or at the runtime.

> * Return a string representation of the attributes on this function that
> affect its signature.

yes.

>
> Does this seem like a worthwhile direction to other people, or do you like
> better the approach the patch takes, handling versioning directly?

There are prior discussions about this. The direct way of handling it
is to use __builtin_dispatch, but we concluded that using function
overloading is much more user friendly. Note that Intel's icc has a
similar feature to the overloading approach implemented by Sri here.

thanks,

David

>
> Jason
On 07/07/2012 08:38 PM, Xinliang David Li wrote:
>> It seems to me that what you have here are target-specific attributes that
>> affect the signature of a function such that they make two declarations
>> different that would otherwise declare the same function. Stepping away from
>> the specific notion of versioning, it seems that these are the questions
>> that you want the front end to be able to ask about these attributes:
>>
>> * Does this attribute affect a function signature?
>
> The question becomes if a caller 'bar' with target attribute 'x' can
> make a call to a function 'foo' with an incompatible target attribute
> 'y'. If the answer is no, then the target attribute is part of 'foo's
> signature. I think the answer is yes -- the attribute affects a
> function signature.

Yes, clearly the answer is yes for the target attribute. But I wasn't
asking someone to answer those questions; I was saying that those are
the questions the front end needs to be able to ask of the back end in
order to implement this functionality in a more generic way.

Jason
http://codereview.appspot.com/5752064/diff/51001/gcc/cgraph.c
File gcc/cgraph.c (right):

http://codereview.appspot.com/5752064/diff/51001/gcc/cgraph.c#newcode1282
gcc/cgraph.c:1282: is needed as the address can be used to do an indirect call. */
Extend the comment here.

http://codereview.appspot.com/5752064/diff/51001/gcc/cgraph.h
File gcc/cgraph.h (right):

http://codereview.appspot.com/5752064/diff/51001/gcc/cgraph.h#newcode230
gcc/cgraph.h:230: /* Chains all the semantically identical function versions. The
It is better to extend the comments on these two members because dispatcher and resolver seem to mean the same thing.

http://codereview.appspot.com/5752064/diff/51001/gcc/cgraph.h#newcode236
gcc/cgraph.h:236: /* For functions with many calls sites it holds map from call expression
It might be better to put the four fields into a separate data structure with only one pointer from cgraph_node to it. When the lowering completes, they can be destroyed to reduce memory consumption.

http://codereview.appspot.com/5752064/diff/51001/gcc/cgraphbuild.c
File gcc/cgraphbuild.c (right):

http://codereview.appspot.com/5752064/diff/51001/gcc/cgraphbuild.c#newcode321
gcc/cgraphbuild.c:321: if (decl && cgraph_get_node (decl)
There does not seem to be a need to add this condition here -- a newly created node would have the dispatch_function bit set anyway. Besides, there are 3 calls to cgraph_get_node here. It is cleaner to sink this code into the following if (decl) ..

if (decl)
  {
    struct cgraph_node *callee = cgraph_get_create_node (decl);
    if (callee->dispatch_function)
      {
        build_resolver_for_function_versions (node);
        gcc_assert (get_mv_resolver (node));
      }
    cgraph_create_edge (node, callee, stmt, bb->count, freq);
  }
else
  ...

http://codereview.appspot.com/5752064/diff/51001/gcc/cgraphunit.c
File gcc/cgraphunit.c (right):

http://codereview.appspot.com/5752064/diff/51001/gcc/cgraphunit.c#newcode942
gcc/cgraphunit.c:942: enqueue_node ((symtab_node)edge->callee);
This change is irrelevant.
Ok. Do you have specific comments on the patch?

thanks,

David

On Sun, Jul 8, 2012 at 4:20 AM, Jason Merrill <jason@redhat.com> wrote:
> On 07/07/2012 08:38 PM, Xinliang David Li wrote:
>>>
>>> It seems to me that what you have here are target-specific attributes
>>> that affect the signature of a function such that they make two
>>> declarations different that would otherwise declare the same function.
>>> Stepping away from the specific notion of versioning, it seems that
>>> these are the questions that you want the front end to be able to ask
>>> about these attributes:
>>>
>>> * Does this attribute affect a function signature?
>>
>>
>> The question becomes if a caller 'bar' with target attribute 'x' can
>> make a call to a function 'foo' with an incompatible target attribute
>> 'y'. If the answer is no, then the target attribute is part of 'foo's
>> signature. I think the answer is yes -- the attribute affects a
>> function signature.
>
>
> Yes, clearly the answer is yes for the target attribute. But I wasn't
> asking someone to answer those questions; I was saying that those are the
> questions the front end needs to be able to ask of the back end in order to
> implement this functionality in a more generic way.
>
> Jason
On 07/09/2012 11:27 PM, Xinliang David Li wrote:
> Ok. Do you have specific comments on the patch?

My comment is "Perhaps we want to implement this using a more generic
mechanism." I was thinking to defer a detailed code review until that
question is settled.

Jason
On Tue, Jul 10, 2012 at 2:46 AM, Jason Merrill <jason@redhat.com> wrote:
> On 07/09/2012 11:27 PM, Xinliang David Li wrote:
>>
>> Ok. Do you have specific comments on the patch?
>
>
> My comment is "Perhaps we want to implement this using a more generic
> mechanism." I was thinking to defer a detailed code review until that
> question is settled.

We all like more generic solutions :)

Sri, can you provide more description of the FE changes -- this will
help reviewers get started.

By the way, there are a couple of files with bad contents that need to
be re-uploaded -- e.g., cp/decl.c.

thanks,

David

>
> Jason
Hi Jason/David,

   Thanks for the comments.

On Tue, Jul 10, 2012 at 9:08 AM, Xinliang David Li <davidxl@google.com> wrote:
> On Tue, Jul 10, 2012 at 2:46 AM, Jason Merrill <jason@redhat.com> wrote:
> > On 07/09/2012 11:27 PM, Xinliang David Li wrote:
> >>
> >> Ok. Do you have specific comments on the patch?
> >
> >
> > My comment is "Perhaps we want to implement this using a more generic
> > mechanism." I was thinking to defer a detailed code review until that
> > question is settled.

I am not sure what you mean by more generic, so I am giving an overview
of how the front-end is implemented so that you can say whether this
should be done in a different way. I am using the questions you asked
previously to explain how I solved each of them. When working on this
patch, these are the exact questions I had and tried to address.

* Does this attribute affect a function signature?

The function signature should be changed when there is more than one
definition/declaration of foo distinguished by unique target
attributes. To make use of overloading, the DECL_NAME is not changed;
only the DECL_ASSEMBLER_NAME is modified, by appending the
canonicalized target attribute string. The functionality for this is
in multiversion.c. The DECL_ASSEMBLER_NAME for a function is changed
as soon as it is known that it is a version. The default function,
which is defined as the one with no target attributes, remains
unchanged.

* Do the attributes on these two declarations make them different?

Yes, declarations with different target attributes correspond to
different function versions and their assembler names are changed as
explained above.

* Do the attributes on these two declarations make one a better match?

Which declaration matches depends on the run-time platform, hence the
dispatching code that determines this at run-time. The code generating
the body of the dispatcher is present in multiversion.c.

* Given a call to function X, should I call another function instead?

Yes, this is what the dispatching takes care of.

* Return a string representation of the attributes on this function that
affect its signature.

Yes, as described above the code for this is in multiversion.c.

Importantly,

* The FE can detect that a declaration/definition is a version when it
checks for duplicate decls in cp/decl.c. At this point, the FE checks
and marks function versions.
* When a call to a function that is a version is detected, in
build_over_call in cp/call.c, the function joust determines if it is
possible to make a direct call to a version or it needs to go through
the dispatcher. If a direct call can be made, in some cases, it is
done. Otherwise, the call is to a dispatcher function generated by the
FE.
* A similar handling is done when the pointer to a function is taken,
in resolve_address_of_overloaded_function in cp/class.c. The dispatcher
is always generated in this case since an indirect call can be made
using the pointer.
* Handle comdat function versions in cxx_comdat_group by using the
DECL_NAME rather than the DECL_ASSEMBLER_NAME.
* The cgraph saves the decls of all function versions so that future
optimizations can know of semantically identical versions.

I don't think the front-end changes are complicated. The new
functionality consists of calls to functions defined in
multiversion.c/cgraph, at what I could determine to be appropriate
points, to change the assembler names of versions and generate the
dispatcher body if necessary.

What do you think? How would I make this more generic?

I will re-upload the patch addressing David's comments.

Thanks,
-Sri.

>
> We all like more generic solutions :)
>
>
> Sri, can you provide more descriptions on FE changes -- this will help
> reviewers get started.
>
> By the way, there are a couple of files with bad contents and needs
> re-upload -- e.g, cp/decl.c.
>
> thanks,
>
> David
>
> >
> > Jason
>
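The renaming step described above (keep DECL_NAME, append a canonicalized target string to DECL_ASSEMBLER_NAME, leave the default untouched) can be illustrated with a small stand-alone sketch. This is not the code in multiversion.c; the helper names, the `.` and `_` separators, and the choice of "sort the comma-separated pieces" as the canonical form are assumptions made only for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split a target string such as "sse4.2,arch=corei7" on commas,
// dropping empty pieces.
std::vector<std::string> split_target_string(const std::string& target) {
    std::vector<std::string> parts;
    std::stringstream in(target);
    std::string piece;
    while (std::getline(in, piece, ','))
        if (!piece.empty()) parts.push_back(piece);
    return parts;
}

// Hypothetical naming scheme: base name, then '.', then the sorted
// pieces joined by '_'. Sorting makes "a,b" and "b,a" name the same version.
std::string versioned_assembler_name(const std::string& base,
                                     const std::string& target) {
    assert(!base.empty());
    std::vector<std::string> parts = split_target_string(target);
    if (parts.empty()) return base;  // the default version keeps its name
    std::sort(parts.begin(), parts.end());
    std::string name = base + ".";
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i != 0) name += '_';
        name += parts[i];
    }
    return name;
}
```

Because only the assembler-level name changes, overload resolution still sees every version under the single source-level name foo.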
On Tue, Jul 10, 2012 at 9:08 AM, Xinliang David Li <davidxl@google.com>wrote: > On Tue, Jul 10, 2012 at 2:46 AM, Jason Merrill <jason@redhat.com> wrote: > > On 07/09/2012 11:27 PM, Xinliang David Li wrote: > >> > >> Ok. Do you have specific comments on the patch? > > > > > > My comment is "Perhaps we want to implement this using a more generic > > mechanism." I was thinking to defer a detailed code review until that > > question is settled. > I am not sure what you mean by more generic so I am giving a overview of how the front-end is implemented so that you could say if this should be done in a different way. I am using the questions you asked previously to explain how I solved each of them. When working on this patch, these are the exact questions I had and tried to address it. * Does this attribute affect a function signature? The function signature should be changed when there is more than one definition/declaration of foo distinguished by unique target attributes. To make use of overloading, the DECL_NAME is not changed but only the DECL_ASSEMBLER_NAME is modified by appending the target attribute string which is canonicalized. The functionality for this is in multiversion.c. The DECL_ASSEMBLER_NAME for a function is changed as soon as it is known that it is a version. The default function, which is defined as that with no target attributes remains unchanged. * Do the attributes on these two declarations make them different? Yes, declarations with different target attributes correspond to different function versions and their assembler names are changed as explained above. * Do the attributes on these two declarations make one a better match? Which declaration matches depends on the run-time platform and hence the dispatching code to determine this at run-time. The code generating the body of the dispatcher is present in multiversion.c * Given a call to function X, should I call another function instead? Yes, this is what the dispatching takes care of. 
* Return a string representation of the attributes on this function that affect its signature. Yes, as described above the code for this is in multiversion.c Importantly, * The FE can detect that a declaration/definition is a version when it checks for duplicate decls in cp/decl.c. At this point, the FE checks and mark function versions. * When a call to a function that is a version is detected, in build_over_call in cp/call.c, the function joust determines if it is possible to make a direct call to a version or it needs to go through the dispatcher. If a direct call can be made, in some cases, it is done. Otherwise, the call is to a dispatcher function generated by the FE. * A similar handling is done when the pointer to a function is taken, resolve_address_of_overloaded_function in cp/class.c. The dispatcher is always generated in this case since an indirect call can be made using the pointer. * Handle comdat function versions in cxx_comdat_group by using the DECL_NAME rather than the DECL_ASSEMBLER_NAME * The cgraph saves the decls of all function versions so that future optimizations can know of semantically identical versions. I dont think the front-end changes are complicated. The new functionality is calls to functions defined in multiversion.c/cgraph, at what I could determine as appropriate points, to change assembler names of versions and generate the dispatcher body if necessary. What do you think? How would I make this more generic? I will re-upload the patch addressing David's comments. Thanks, -Sri. > > We all like more generic solutions :) > > > Sri, can you provide more descriptions on FE changes -- this will help > reviewers get started. > > By the way, there are a couple of files with bad contents and needs > re-upload -- e.g, cp/decl.c. > > thanks, > > David > > > > > Jason >
On 07/10/2012 03:14 PM, Sriraman Tallam wrote:
> I am using the questions you asked previously
> to explain how I solved each of them. When working on this patch, these
> are the exact questions I had and tried to address it.
>
> * Does this attribute affect a function signature?
>
> The function signature should be changed when there is more than one
> definition/declaration of foo distinguished by unique target attributes.
>[...]

I agree. I was trying to suggest that these questions are what the front end needs to care about, not about versioning specifically. If these questions are turned into target hooks, all of the logic specific to versioning can be contained in the target.

My only question intended to be answered by humans is, do people think moving the versioning logic behind more generic target hooks is worthwhile?

Jason
On Thu, Jul 19, 2012 at 1:39 PM, Jason Merrill <jason@redhat.com> wrote:
> [...]
>
> I agree. I was trying to suggest that these questions are what the front end needs to care about, not about versioning specifically. If these questions are turned into target hooks, all of the logic specific to versioning can be contained in the target.
>
> My only question intended to be answered by humans is, do people think moving the versioning logic behind more generic target hooks is worthwhile?

I have a related comment. For the example below,

// Default version.
int foo ()
{
  .....
}

// Version for feature XXX, supported by target ABC.
int __attribute__ ((target ("XXX"))) foo ()
{
  ....
}

How should the second version of foo be treated for targets where feature XXX is not supported? Right now, I am working on having my patch completely ignore such function versions when compiled for targets that do not understand the attribute. I could move this check into a generic target hook so that a function definition that does not make sense for the current target is ignored.

Also, the patch currently uses target hooks to do the following:

- Find out whether a particular version can be called directly, rather than going through the dispatcher.
- Determine what the dispatcher body should be.
- Determine the order in which function versions must be dispatched.

I do not have a strong opinion on whether the entire logic should be based on target hooks.

Thanks,
-Sri.

> Jason
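The "ignore versions the target does not understand" behavior proposed above could look roughly like the following standalone C++ sketch. It only models the idea: `Decl`, `keep_understood`, and the set of known attribute strings are hypothetical names for this example and do not correspond to GCC data structures or hooks.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a function declaration: its name plus the target
// attribute string ("" marks the default version).
struct Decl {
  std::string name;
  std::string target;
};

// Keep the default version and every version whose attribute string the
// current target understands; silently drop the rest. The 'known' set is an
// assumption standing in for whatever check a target hook would perform.
std::vector<Decl> keep_understood(const std::vector<Decl> &decls,
                                  const std::set<std::string> &known) {
  std::vector<Decl> kept;
  for (const Decl &d : decls)
    if (d.target.empty() || known.count(d.target) != 0)
      kept.push_back(d);
  return kept;
}
```

With this shape, a translation unit containing the two definitions of foo above still has exactly one definition on a target that does not know "XXX", so no duplicate-definition error arises.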
Hi Jason,

I have created a new patch that uses target hooks for all the functionality and makes the front end just call the target hooks at the appropriate places. This is more like what you suggested in a previous mail. In particular, the target hooks address the following questions:

* Determine if two function decls with the same signature are versions.
* Determine the new assembler name of a function version.
* Generate the dispatcher function for a set of function versions.
* Compare versions to see if one has a higher priority than the other.

Patch attached and also available for review at:

http://codereview.appspot.com/5752064/

I hope this is more along the lines of what you had in mind. Please let me know what you think.

Thanks,
-Sri.

On Mon, Jul 30, 2012 at 12:01 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> [...]
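To make the last two hooks in the list above concrete, here is a small standalone C++ model of priority-ordered dispatch. It is a sketch under stated assumptions, not the generated dispatcher: `Version`, its integer `priority`, and the `cpu_features` set are hypothetical, and real dispatch would query the hardware rather than a set of strings.

```cpp
#include <algorithm>
#include <functional>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical model of one function version.
struct Version {
  std::string feature;        // required CPU feature; "" marks the default
  int priority;               // assumed ranking: higher is tried first
  std::function<int()> body;  // the version's code
};

// Try versions from highest to lowest priority and run the first one whose
// feature is available. The default version has no requirement, so it acts
// as the fallback when it carries the lowest priority.
int dispatch(std::vector<Version> versions,
             const std::set<std::string> &cpu_features) {
  std::stable_sort(versions.begin(), versions.end(),
                   [](const Version &a, const Version &b) {
                     return a.priority > b.priority;
                   });
  for (const Version &v : versions)
    if (v.feature.empty() || cpu_features.count(v.feature) != 0)
      return v.body();
  throw std::logic_error("no default version to fall back on");
}
```

The comparison hook matters because more than one version can match the running CPU; the ordering decides which of the matching versions wins.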
Ping.

On Aug 25, 2012 6:04 AM, "Sriraman Tallam" <tmsriram@google.com> wrote:
> Hi Jason,
>
> I have created a new patch to use target hooks for all the
> functionality and make the front-end just call the target hooks at the
> appropriate places.
> [...]
Ping.

On Fri, Aug 24, 2012 at 5:34 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> Hi Jason,
>
> I have created a new patch to use target hooks for all the
> functionality and make the front-end just call the target hooks at the
> appropriate places.
> [...]
Hi Jason,

Sri has addressed the comments you had on the FE part. Can you take a look and see if it is OK? Stage 1 is going to close soon, and we hope to get this major feature into 4.8.

thanks,

David

On Tue, Sep 18, 2012 at 9:29 AM, Sriraman Tallam <tmsriram@google.com> wrote:
> Ping.
>
> On Fri, Aug 24, 2012 at 5:34 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>> [...]
On 08/24/2012 08:34 PM, Sriraman Tallam wrote:
> +  /* If the address of a multiversioned function dispatcher is taken,
> +     generate the body to dispatch the right function at run-time.  This
> +     is needed as the address can be used to do an indirect call.  */

It seems to me that you don't need a dispatcher for doing indirect calls; you could just take the address of the version you would choose if you were doing a direct call.

The only reason for a dispatcher I can think of is if you want the address of a function to compare equal across translation units compiled with different target flags. I'm not sure that's necessary; am I missing something?

Continuing to look at the patch.

Jason
On 10/05/2012 01:43 PM, Jason Merrill wrote:
> On 08/24/2012 08:34 PM, Sriraman Tallam wrote:
>> +  /* If the address of a multiversioned function dispatcher is taken,
>> +     generate the body to dispatch the right function at run-time.  This
>> +     is needed as the address can be used to do an indirect call.  */
>
> It seems to me that you don't need a dispatcher for doing indirect
> calls; you could just take the address of the version you would choose
> if you were doing a direct call.

Oh, I see you use the dispatcher for direct calls as well. Why is that? Why do you do direct calls when the function is inlineable, but not otherwise?

Jason
On 08/24/2012 08:34 PM, Sriraman Tallam wrote:
> +      /* For function versions, their parms and types match
> +         but they are not duplicates.  Record function versions
> +         as and when they are found.  */
> +      if (TREE_CODE (fn) == FUNCTION_DECL
> +          && TREE_CODE (method) == FUNCTION_DECL
> +          && (DECL_FUNCTION_SPECIFIC_TARGET (fn)
> +              || DECL_FUNCTION_SPECIFIC_TARGET (method))
> +          && targetm.target_option.function_versions (fn, method))
> +        {
> +          targetm.set_version_assembler_name (fn);
> +          targetm.set_version_assembler_name (method);
> +          continue;
> +        }

This seems like an odd place to be setting assembler names; better to just have the existing mangle_decl_assembler_name hook add the appropriate suffix when it's called normally.

> +   Also, mark this function as needed if it is marked inline but
> +   is a multi-versioned function.  */

Why? If it's used, it should be marked needed through the normal process.

> +      error_at (location_of (DECL_NAME (OVL_CURRENT (fn))),
> +                "Call to multiversioned function %<%D(%A)%> with"
> +                " no default version", DECL_NAME (OVL_CURRENT (fn)),
> +                build_tree_list_vec (*args));

location_of just returns input_location if you ask it for the location of an identifier, so you might as well use error with no explicit location. And why not print candidates->fn instead of pasting the name/args? Also, lowercase "call".

> +    {
> +      tree dispatcher_decl = NULL;
> +      struct cgraph_node *node = cgraph_get_node (fn);
> +      if (node != NULL)
> +        dispatcher_decl = cgraph_get_node (fn)->version_dispatcher_decl;
> +      if (dispatcher_decl == NULL)
> +        {
> +          error_at (input_location, "Call to multiversioned function"
> +                    " without a default is not allowed");
> +          return NULL;
> +        }
> +      retrofit_lang_decl (dispatcher_decl);
> +      gcc_assert (dispatcher_decl != NULL);
> +      fn = dispatcher_decl;
> +    }

Let's move this logic into a separate function that returns the dispatcher function.

> +      /* Both functions must be marked versioned.  */
> +      gcc_assert (DECL_FUNCTION_VERSIONED (cand1->fn)
> +                  && DECL_FUNCTION_VERSIONED (cand2->fn));

Why can't you compare a versioned function and a non-versioned one?

The code in joust should go further down in the function, before the handling of two declarations of the same function.

> +  /* For multiversioned functions, aggregate all the versions here for
> +     generating the dispatcher body later if necessary.  */
> +
> +  if (TREE_CODE (candidates->fn) == FUNCTION_DECL
> +      && DECL_FUNCTION_VERSIONED (candidates->fn))
> +    {
> +      VEC (tree, heap) *fn_ver_vec = NULL;
> +      struct z_candidate *ver = candidates;
> +      fn_ver_vec = VEC_alloc (tree, heap, 2);
> +      for (;ver; ver = ver->next)
> +        VEC_safe_push (tree, heap, fn_ver_vec, ver->fn);
> +      gcc_assert (targetm.get_function_versions_dispatcher);
> +      targetm.get_function_versions_dispatcher (fn_ver_vec);
> +      VEC_free (tree, heap, fn_ver_vec);
> +    }

This seems to assume that all the functions in the list of candidates are versioned, but there might be unrelated functions from different namespaces. Also, doing this every time someone calls a versioned function seems like the wrong place; I would think it would be better to build up a list of versions as you see declarations, and then use that list to define the dispatcher at EOF if it's needed.

> +      if (TREE_CODE (decl) == FUNCTION_DECL
> +          && DECL_FUNCTION_VERSIONED (decl)
> +          && DECL_ASSEMBLER_NAME_SET_P (decl))
> +        write_source_name (DECL_ASSEMBLER_NAME (decl));
> +      else
> +        write_source_name (DECL_NAME (decl));

Again, I think it's better to handle the suffix via mangle_decl_assembler_name.

Jason
On Fri, Oct 5, 2012 at 10:43 AM, Jason Merrill <jason@redhat.com> wrote:
> On 08/24/2012 08:34 PM, Sriraman Tallam wrote:
>>
>> +  /* If the address of a multiversioned function dispatcher is taken,
>> +     generate the body to dispatch the right function at run-time.  This
>> +     is needed as the address can be used to do an indirect call.  */
>
> It seems to me that you don't need a dispatcher for doing indirect calls;
> you could just take the address of the version you would choose if you were
> doing a direct call.
>
> The only reason for a dispatcher I can think of is if you want the address
> of a function to compare equal across translation units compiled with
> different target flags. I'm not sure that's necessary; am I missing
> something?

In general, the dispatcher is always necessary, since the function version that will be called is not known at compile time. This is true whether it is a direct or an indirect call.

Example:

int __attribute__ ((target ("sse3"))) foo ()
{
}

int __attribute__ ((target ("sse4"))) foo ()
{
}

int main ()
{
  foo ();  // The version of foo to be called is not known at compile time. Needs dispatcher.
  int (*p)() = &foo;  // What should be the value of p?
  (*p)();  // This needs a dispatcher too.
}

Now, since a dispatcher is necessary when the address of the function is taken, I thought I could as well make it the address of the function.

Thanks,
-Sri.

>
> Continuing to look at the patch.
>
> Jason
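The argument above can be modeled without any compiler support by writing the dispatcher by hand. The sketch below is only an illustration in standalone C++: `foo_sse3`, `foo_sse4`, and the `g_cpu_has_sse4` flag are hypothetical stand-ins, and unlike an IFUNC-based dispatcher, which resolves the target once, this version re-checks the flag on every call.

```cpp
// Hypothetical stand-ins for the two versions of foo in the example above.
static int foo_sse3() { return 3; }
static int foo_sse4() { return 4; }

// Hypothetical CPU probe result; a real dispatcher would query the hardware.
bool g_cpu_has_sse4 = false;

// Hand-written dispatcher. Every call to foo, direct or through a pointer,
// lands here, so the version is chosen at run time rather than compile time.
int foo() {
  return g_cpu_has_sse4 ? foo_sse4() : foo_sse3();
}

int call_directly() { return foo(); }

int call_via_pointer() {
  int (*p)() = &foo;  // p holds the dispatcher's address, not a version's
  return (*p)();
}
```

Because `&foo` always names the dispatcher, the pointer value is the same wherever it is taken, and an indirect call still reaches whichever version suits the running CPU.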
On 10/05/2012 05:57 PM, Sriraman Tallam wrote:
> In general, the dispatcher is always necessary since it is not known
> what function version will be called at compile time. This is true
> whether it is a direct or an indirect call.

So you want to compile with lowest common denominator flags and then choose a faster version at runtime based on the running configuration? I see.

Jason
On Fri, Oct 5, 2012 at 3:50 PM, Jason Merrill <jason@redhat.com> wrote:
> On 10/05/2012 05:57 PM, Sriraman Tallam wrote:
>>
>> In general, the dispatcher is always necessary since it is not known
>> what function version will be called at compile time. This is true
>> whether it is a direct or an indirect call.
>
> So you want to compile with lowest common denominator flags and then choose
> a faster version at runtime based on the running configuration? I see.

Yes.

Thanks,
-Sri.

> Jason
Hi Jason,

I have addressed all your comments and attached the new patch.

On Fri, Oct 5, 2012 at 11:32 AM, Jason Merrill <jason@redhat.com> wrote:
> On 08/24/2012 08:34 PM, Sriraman Tallam wrote:
>> +      /* For function versions, their parms and types match
>> +         but they are not duplicates.  Record function versions
>> +         as and when they are found.  */
>> [...]
>
> This seems like an odd place to be setting assembler names; better to just
> have the existing mangle_decl_assembler_name hook add the appropriate suffix
> when it's called normally.

I moved this to mangle_decl_assembler_name. Still, functions may go from not being a version to becoming versions after a new definition is detected. In such cases, I explicitly call mangle_decl to modify the assembler name.

>> +   Also, mark this function as needed if it is marked inline but
>> +   is a multi-versioned function.  */
>
> Why? If it's used, it should be marked needed through the normal process.

How do I do this? If a versioned function is marked inline, I need to keep it, but it has no explicit callers. How do I mark that it is needed?

>> +      error_at (location_of (DECL_NAME (OVL_CURRENT (fn))),
>> +                "Call to multiversioned function %<%D(%A)%> with"
>> [...]
>
> location_of just returns input_location if you ask it for the location of an
> identifier, so you might as well use error with no explicit location. And
> why not print candidates->fn instead of pasting the name/args? Also,
> lowercase "call".

I removed this since the check already happens elsewhere.

>> +      tree dispatcher_decl = NULL;
>> +      struct cgraph_node *node = cgraph_get_node (fn);
>> [...]
>
> Let's move this logic into a separate function that returns the dispatcher
> function.

Done.

>> +      /* Both functions must be marked versioned.  */
>> +      gcc_assert (DECL_FUNCTION_VERSIONED (cand1->fn)
>> +                  && DECL_FUNCTION_VERSIONED (cand2->fn));
>
> Why can't you compare a versioned function and a non-versioned one?

Right, there was a big bug in my code. I have changed this now. This should address your question.

> The code in joust should go further down in the function, before the
> handling of two declarations of the same function.

Done.

>> +  /* For multiversioned functions, aggregate all the versions here for
>> +     generating the dispatcher body later if necessary.  */
>> [...]
>
> This seems to assume that all the functions in the list of candidates are
> versioned, but there might be unrelated functions from different namespaces.
> Also, doing this every time someone calls a versioned function seems like
> the wrong place; I would think it would be better to build up a list of
> versions as you see declarations, and then use that list to define the
> dispatcher at EOF if it's needed.

This was the bug I was referring to earlier. I have moved this to a separate function. I thought it would be better to do this on demand. I have changed the code so that the aggregation and dispatcher generation happen exactly once.

>> +      if (TREE_CODE (decl) == FUNCTION_DECL
>> +          && DECL_FUNCTION_VERSIONED (decl)
>> +          && DECL_ASSEMBLER_NAME_SET_P (decl))
>> [...]
>
> Again, I think it's better to handle the suffix via
> mangle_decl_assembler_name.

Removed.

Thanks for the comments. Please let me know what you think about the new patch.

-Sri.

> Jason
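The "on demand, but exactly once" behavior described in this reply can be sketched as a small cache. This is a standalone C++ illustration rather than the patch's code: `DispatcherCache`, `DispatcherInfo`, and the string keys are hypothetical names, standing in for the cgraph bookkeeping that the thread mentions but does not show.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical record of a generated dispatcher: the versions it covers.
struct DispatcherInfo {
  std::vector<std::string> versions;  // assembler names gathered for dispatch
};

class DispatcherCache {
 public:
  // Record a version when its declaration is seen.
  void add_version(const std::string &fn, const std::string &asm_name) {
    versions_[fn].push_back(asm_name);
  }

  // Build the dispatcher on the first request for 'fn'; later requests
  // reuse the cached result instead of aggregating the versions again.
  const DispatcherInfo &get(const std::string &fn) {
    auto it = built_.find(fn);
    if (it == built_.end()) {
      ++builds_;
      it = built_.emplace(fn, DispatcherInfo{versions_[fn]}).first;
    }
    return it->second;
  }

  // Number of times a dispatcher was actually built (for illustration).
  int builds() const { return builds_; }

 private:
  std::map<std::string, std::vector<std::string>> versions_;
  std::map<std::string, DispatcherInfo> built_;
  int builds_ = 0;
};
```

One consequence of this design, visible in the sketch, is that versions recorded after the first `get` are not part of the cached dispatcher, which is part of what the review suggestion to define the dispatcher at EOF would avoid.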
Hi Jason,

I have attached the latest patch with more cleanups. Please let me know what you think.

Honza, can you please review the cgraph part?

Thanks,
-Sri.

On Wed, Oct 10, 2012 at 4:45 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> Hi Jason,
>
> I have addressed all your comments and attached the new patch.
> [...]
On 2012-10-12 18:19 , Sriraman Tallam wrote:

> When the front-end sees more than one decl for "foo", it calls a target hook to
> determine if they are versions. To prevent duplicate definition errors with other
> versions of "foo", "decls_match" function in cp/decl.c is made to return false
> when 2 decls have are deemed versions by the target. This will make all function
> versions of "foo" to be added to the overload list of "foo".

So, this means that this can only work for C++, right?  Or could the
same trickery be done some other way in other FEs?

I see no handling of different FEs.  If the user tries to use these
attributes from languages other than C++, we should emit a diagnostic.

> +@deftypefn {Target Hook} tree TARGET_GET_FUNCTION_VERSIONS_DISPATCHER (void *@var{arglist})
> +This hook is used to get the dispatcher function for a set of function
> +versions.  The dispatcher function is called to invoke the rignt function

s/rignt/right/

> +version at run-time. @var{arglist} is the vector of function versions
> +that should be considered for dispatch.
> +@end deftypefn
> +
> +@deftypefn {Target Hook} tree TARGET_GENERATE_VERSION_DISPATCHER_BODY (void *@var{arg})
> +This hook is used to generate the dispatcher logic to invoke the right
> +function version at runtime for a given set of function versions.

s/runtime/run-time/

> +@hook TARGET_GET_FUNCTION_VERSIONS_DISPATCHER
> +This hook is used to get the dispatcher function for a set of function
> +versions.  The dispatcher function is called to invoke the rignt function

s/rignt/right/

> +version at run-time. @var{arglist} is the vector of function versions
> +that should be considered for dispatch.
> +@end deftypefn
> +
> +@hook TARGET_GENERATE_VERSION_DISPATCHER_BODY
> +This hook is used to generate the dispatcher logic to invoke the right
> +function version at runtime for a given set of function versions.

s/runtime/run-time/

> @@ -288,7 +289,6 @@ mark_store (gimple stmt, tree t, void *data)
>       }
>     return false;
>   }
> -
>  /* Create cgraph edges for function calls.
>     Also look for functions and variables having addresses taken.  */

Don't remove vertical white space, please.

> +            {
> +              struct cgraph_node *callee = cgraph_get_create_node (decl);
> +              /* If a call to a multiversioned function dispatcher is
> +                 found, generate the body to dispatch the right function
> +                 at run-time.  */
> +              if (callee->dispatcher_function)
> +                {
> +                  tree resolver_decl;
> +                  gcc_assert (callee->function_version.next);

What if callee is the last version in the list?  Not sure what you are
trying to check here.

> @@ -8601,9 +8601,22 @@ handle_target_attribute (tree *node, tree name, tr
>        warning (OPT_Wattributes, "%qE attribute ignored", name);
>        *no_add_attrs = true;
>      }
> -  else if (! targetm.target_option.valid_attribute_p (*node, name, args,
> -                                                      flags))
> -    *no_add_attrs = true;
> +  else
> +    {
> +      /* When a target attribute is invalid, it may also be because the
> +         target for the compilation unit and the attribute match.  For
> +         instance, target attribute "xxx" is invalid when -mxxx is used.
> +         When used with multiversioning, removing the attribute will lead
> +         to duplicate definitions if a default version is provided.
> +         So, generate a warning here and remove the attribute.  */
> +      if (!targetm.target_option.valid_attribute_p (*node, name, args, flags))
> +        {
> +          warning (OPT_Wattributes,
> +                   "Invalid target attribute in function %qE, ignored.",
> +                   *node);
> +          *no_add_attrs = true;

If you do this, isn't the compiler going to generate two warning
messages?  One for the invalid target attribute, the second for the
duplicate definition.

> @@ -228,6 +228,26 @@ struct GTY(()) cgraph_node {
>    struct cgraph_node *prev_sibling_clone;
>    struct cgraph_node *clones;
>    struct cgraph_node *clone_of;
> +
> +  /* Function Multiversioning info.  */
> +  struct {
> +    /* Chains all the semantically identical function versions.  The
> +       first function in this chain is the default function.  */
> +    struct cgraph_node *prev;
> +    /* If this node is a dispatcher for function versions, this points
> +       to the default function version, the first function in the chain.  */
> +    struct cgraph_node *next;

Why not a VEC of function decls?  Seems easier to manage and less size
overhead.

> @@ -3516,8 +3522,8 @@ struct GTY(()) tree_function_decl {
>    unsigned looping_const_or_pure_flag : 1;
>    unsigned has_debug_args_flag : 1;
>    unsigned tm_clone_flag : 1;
> -
> -  /* 1 bit left */
> +  unsigned versioned_function : 1;
> +  /* No bits left.  */

You ate the last bit!  How rude ;)

> @@ -8132,6 +8176,38 @@ joust (struct z_candidate *cand1, struct z_candida
>        && (IS_TYPE_OR_DECL_P (cand1->fn)))
>      return 1;
>
> +  /* For Candidates of a multi-versioned function, make the version with

s/Candidates/candidates/

> +  old_current_function_decl = current_function_decl;
> +  push_cfun (DECL_STRUCT_FUNCTION (function_decl));
> +  current_function_decl = function_decl;

push_cfun will set current_function_decl for you.  No need to keep track
of old_current_function_decl.

> +  enum feature_priority
> +  {
> +    P_ZERO = 0,
> +    P_MMX,
> +    P_SSE,
> +    P_SSE2,
> +    P_SSE3,
> +    P_SSSE3,
> +    P_PROC_SSSE3,
> +    P_SSE4_a,
> +    P_PROC_SSE4_a,
> +    P_SSE4_1,
> +    P_SSE4_2,
> +    P_PROC_SSE4_2,
> +    P_POPCNT,
> +    P_AVX,
> +    P_AVX2,
> +    P_FMA,
> +    P_PROC_FMA
> +  };

There's no need to have this list dynamically defined, right?

> +        }
> +    }
> +
> +  /* Process feature name.  */
> +  tok_str =  (char *) xmalloc (strlen (attrs_str) + 1);

XNEWVEC(char, strlen (attrs_str) + 1);

> +  /* Atleast one more version other than the default.  */

s/Atleast/At least/

> +  num_versions = VEC_length (tree, fndecls);
> +  gcc_assert (num_versions >= 2);
> +
> +  function_version_info = (struct _function_version_info *)
> +    xmalloc ((num_versions - 1) * sizeof (struct _function_version_info));

Better use VEC() here.

> +
> +  /* The first version in the vector is the default decl.  */
> +  default_decl = VEC_index (tree, fndecls, 0);
> +
> +  old_current_function_decl = current_function_decl;
> +  push_cfun (DECL_STRUCT_FUNCTION (dispatch_decl));
> +  current_function_decl = dispatch_decl;

No need to set current_function_decl.

> +
> +  gseq = bb_seq (*empty_bb);
> +  /* Function version dispatch is via IFUNC.  IFUNC resolvers fire before
> +     constructors, so explicity call __builtin_cpu_init here.  */
> +  ifunc_cpu_init_stmt = gimple_build_call_vec (
> +                     ix86_builtins [(int) IX86_BUILTIN_CPU_INIT], NULL);
> +  gimple_seq_add_stmt (&gseq, ifunc_cpu_init_stmt);
> +  gimple_set_bb (ifunc_cpu_init_stmt, *empty_bb);
> +  set_bb_seq (*empty_bb, gseq);
> +
> +  pop_cfun ();
> +  current_function_decl = old_current_function_decl;

Likewise here.

> +/* This function returns true if fn1 and fn2 are versions of the same function.
> +   Returns false if only one of the function decls has the target attribute
> +   set or if the targets of the function decls are different.  This assumes
> +   the fn1 and fn2 have the same signature.  */

Mention the arguments in capitals.

> +  for (i = 0; i < strlen (str); i++)
> +    if (str[i] == ',')
> +      argnum++;
> +
> +  attr_str = (char *)xmalloc (strlen (str) + 1);

XNEWVEC()

> +  strcpy (attr_str, str);
> +
> +  /* Replace "=,-" with "_".  */
> +  for (i = 0; i < strlen (attr_str); i++)
> +    if (attr_str[i] == '=' || attr_str[i]== '-')
> +      attr_str[i] = '_';
> +
> +  if (argnum == 1)
> +    return attr_str;
> +
> +  args = (char **)xmalloc (argnum * sizeof (char *));

VEC()?

> +      if (DECL_DECLARED_INLINE_P (decl)
> +          && lookup_attribute ("gnu_inline",
> +                               DECL_ATTRIBUTES (decl)))
> +        error_at (DECL_SOURCE_LOCATION (decl),
> +                  "Function versions cannot be marked as gnu_inline,"
> +                  " bodies have to be generated\n");

No newline at the end of the error message.

> +  sprintf (assembler_name, "%s.%s", orig_name, attr_str);
> +  if (dump_file)
> +    fprintf (stderr, "Assembler name set to %s for function version %s\n",
> +             assembler_name, IDENTIFIER_POINTER (id));

This dumps to stderr instead of dump_file.  Also, use the new dumping
facility?

> +/* Return a new name by appending SUFFIX to the DECL name.  If
> +   make_unique is true, append the full path name.  */

Full path name of what?

> +
> +static char *
> +make_name (tree decl, const char *suffix, bool make_unique)
> +{
> +  char *global_var_name;
> +  int name_len;
> +  const char *name;
> +  const char *unique_name = NULL;
> +
> +  name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
> +
> +  /* Get a unique name that can be used globally without any chances
> +     of collision at link time.  */
> +  if (make_unique)
> +    unique_name = IDENTIFIER_POINTER (get_file_function_name ("\0"));
> +
> +  name_len = strlen (name) + strlen (suffix) + 2;
> +
> +  if (make_unique)
> +    name_len += strlen (unique_name) + 1;
> +  global_var_name = (char *) xmalloc (name_len);

XNEWVEC.


Diego.
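
The name construction quoted in this review (copy the target string, turn `=` and `-` into `_`, then join it to the original name with a dot) can be modelled outside the compiler. This is a minimal sketch in standalone C++ rather than GCC's C; `sanitize_target_string` and `version_assembler_name` are hypothetical helper names. It covers only the single-argument case, because the handling of comma-separated arguments is not shown in the quoted hunk, so commas are left untouched here.

```cpp
#include <string>

// Mirrors the quoted loop: '=' and '-' in the attribute string become '_'.
// Commas are left as they are; multi-argument handling is not modelled.
std::string sanitize_target_string(std::string attr) {
  for (char& c : attr)
    if (c == '=' || c == '-')
      c = '_';
  return attr;
}

// Mirrors the quoted sprintf (assembler_name, "%s.%s", orig_name, attr_str).
std::string version_assembler_name(const std::string& orig_name,
                                   const std::string& attr) {
  return orig_name + "." + sanitize_target_string(attr);
}
```

Under these assumptions, `version_assembler_name("foo", "arch=core2")` yields `foo.arch_core2`. Using `std::string` sidesteps the manual length arithmetic that the `xmalloc` and `XNEWVEC` comments are about, which is the only reason the sketch is not written in C.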
Hi Diego,

   Thanks for the review. I have addressed all your comments. New
patch attached.

Thanks,
-Sri.

On Fri, Oct 19, 2012 at 8:10 AM, Diego Novillo <dnovillo@google.com> wrote:
> On 2012-10-12 18:19 , Sriraman Tallam wrote:
>
>> When the front-end sees more than one decl for "foo", it calls a target
>> hook to determine if they are versions. To prevent duplicate definition
>> errors with other versions of "foo", "decls_match" function in cp/decl.c
>> is made to return false when 2 decls have are deemed versions by the
>> target. This will make all function versions of "foo" to be added to the
>> overload list of "foo".
>
> So, this means that this can only work for C++, right?  Or could the same
> trickery be done some other way in other FEs?
>
> I see no handling of different FEs.  If the user tries to use these
> attributes from languages other than C++, we should emit a diagnostic.

Yes, the support is only for C++ for now. "target" attribute is not
new and if the user tries to use this with 'C' then a duplicate
definition error would occur just like now.
I have plans to implement this for C too.

>> +            {
>> +              struct cgraph_node *callee = cgraph_get_create_node (decl);
>> +              /* If a call to a multiversioned function dispatcher is
>> +                 found, generate the body to dispatch the right function
>> +                 at run-time.  */
>> +              if (callee->dispatcher_function)
>> +                {
>> +                  tree resolver_decl;
>> +                  gcc_assert (callee->function_version.next);
>
> What if callee is the last version in the list?  Not sure what you are
> trying to check here.

So, callee here is the dispatcher function and it points to the set of
semantically identical function versions. At this point, the
dispatcher (callee) should have all the function versions chained in
function_version, which is what the assert is checking.

>> @@ -8601,9 +8601,22 @@ handle_target_attribute (tree *node, tree name, tr
>>        warning (OPT_Wattributes, "%qE attribute ignored", name);
>>        *no_add_attrs = true;
>>      }
>> -  else if (! targetm.target_option.valid_attribute_p (*node, name, args,
>> -                                                      flags))
>> -    *no_add_attrs = true;
>> +  else
>> +    {
>> +      /* When a target attribute is invalid, it may also be because the
>> +         target for the compilation unit and the attribute match.  For
>> +         instance, target attribute "xxx" is invalid when -mxxx is used.
>> +         When used with multiversioning, removing the attribute will lead
>> +         to duplicate definitions if a default version is provided.
>> +         So, generate a warning here and remove the attribute.  */
>> +      if (!targetm.target_option.valid_attribute_p (*node, name, args, flags))
>> +        {
>> +          warning (OPT_Wattributes,
>> +                   "Invalid target attribute in function %qE, ignored.",
>> +                   *node);
>> +          *no_add_attrs = true;
>
> If you do this, isn't the compiler going to generate two warning messages?
> One for the invalid target attribute, the second for the duplicate
> definition.

This will be a warning and the duplicate definition would be an error.
The warning would help the user understand why this error occurred.
Example:

ver.cc
int __attribute__((target("popcnt")))
bar (bool a)
{
  return 0;
}

int
bar (bool a)
{
  return 1;
}

$ g++ -mpopcnt ver.cc

ver.cc:2:12: warning: Invalid target attribute in function ‘bar’, ignored. [-Wattributes]
 bar (bool a)
            ^
ver.cc: In function ‘int bar(bool)’:
ver.cc:7:1: error: redefinition of ‘int bar(bool)’
 bar (bool a)
 ^
ver.cc:2:1: error: ‘int bar(bool)’ previously defined here
 bar (bool a)

When compiled with -mpopcnt, the new version does not differ from the default.
Now, the warning makes it clear why the redefinition error occurred.

>> @@ -228,6 +228,26 @@ struct GTY(()) cgraph_node {
>>    struct cgraph_node *prev_sibling_clone;
>>    struct cgraph_node *clones;
>>    struct cgraph_node *clone_of;
>> +
>> +  /* Function Multiversioning info.  */
>> +  struct {
>> +    /* Chains all the semantically identical function versions.  The
>> +       first function in this chain is the default function.  */
>> +    struct cgraph_node *prev;
>> +    /* If this node is a dispatcher for function versions, this points
>> +       to the default function version, the first function in the chain.  */
>> +    struct cgraph_node *next;
>
> Why not a VEC of function decls?  Seems easier to manage and less size
> overhead.

I have solved the size overhead by moving function_version_info
outside cgraph. I think it is better to chain the decls as it is very
easy to traverse the list of semantically identical versions from any
given function version.

>> @@ -3516,8 +3522,8 @@ struct GTY(()) tree_function_decl {
>>    unsigned looping_const_or_pure_flag : 1;
>>    unsigned has_debug_args_flag : 1;
>>    unsigned tm_clone_flag : 1;
>> -
>> -  /* 1 bit left */
>> +  unsigned versioned_function : 1;
>> +  /* No bits left.  */
>
> You ate the last bit!  How rude ;)

I should get the patch in before somebody else really eats it ;-)

>> +  enum feature_priority
>> +  {
>> +    P_ZERO = 0,
>> +    P_MMX,
>> +    P_SSE,
>> +    P_SSE2,
>> +    P_SSE3,
>> +    P_SSSE3,
>> +    P_PROC_SSSE3,
>> +    P_SSE4_a,
>> +    P_PROC_SSE4_a,
>> +    P_SSE4_1,
>> +    P_SSE4_2,
>> +    P_PROC_SSE4_2,
>> +    P_POPCNT,
>> +    P_AVX,
>> +    P_AVX2,
>> +    P_FMA,
>> +    P_PROC_FMA
>> +  };
>
> There's no need to have this list dynamically defined, right?

I don't understand, why expose the enum outside the function?
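
The argument for chaining versions rather than storing a vector is that every version can reach the whole set. A minimal sketch of that traversal follows, in standalone C++ with a hypothetical `VersionNode` type that only models the `prev`/`next` fields from the quoted hunk; it is not the `cgraph_node` layout itself.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical node modelling only the prev/next chain of versions.
struct VersionNode {
  std::string target;           // empty for the default version
  VersionNode* prev = nullptr;  // towards the default version
  VersionNode* next = nullptr;  // towards later versions
};

// Appends LATER directly after EARLIER in the chain.
void chain_after(VersionNode& earlier, VersionNode& later) {
  earlier.next = &later;
  later.prev = &earlier;
}

// Starting from any version, rewind to the head of the chain (the default
// version, per the quoted comment) and list every version in order.
std::vector<std::string> all_versions(const VersionNode* node) {
  assert(node != nullptr);
  while (node->prev != nullptr)
    node = node->prev;
  std::vector<std::string> targets;
  for (; node != nullptr; node = node->next)
    targets.push_back(node->target.empty() ? "default" : node->target);
  return targets;
}
```

With a default version chained to `arch=corei7` and then `arch=core2`, calling `all_versions` on the last node or on the first produces the same three-entry list. A vector gives the same answer only if each version also stores a pointer back to the shared vector, which is the trade-off being discussed.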
Hi,

   I have attached the latest patch with bug fixes, comments. I have
also added a description of the function multiversioning syntax
supported by the Intel compiler.

Thanks,
-Sri.
On Fri, Oct 19, 2012 at 10:33 PM, Sriraman Tallam <tmsriram@google.com> wrote:

> Yes, the support is only for C++ for now. "target" attribute is not
> new and if the user tries to use this with 'C' then a duplicate
> definition error would occur just like now.
> I have plans to implement this for C too.

Would it be hard to emit a diagnostic that specifically states that
"target" is not a valid attribute in C?

> So, callee here is the dispatcher function and it points to the set of
> semantically identical function versions. At this point, the
> dispatcher (callee) should have all the function versions chained in
> function_version, which is what the assert is checking.

Great, could you add this explanation as a comment?  It wasn't at all
clear to me what was going on.

>>> +  enum feature_priority
>>> +  {
>>> +    P_ZERO = 0,
>>> +    P_MMX,
>>> +    P_SSE,
>>> +    P_SSE2,
>>> +    P_SSE3,
>>> +    P_SSSE3,
>>> +    P_PROC_SSSE3,
>>> +    P_SSE4_a,
>>> +    P_PROC_SSE4_a,
>>> +    P_SSE4_1,
>>> +    P_SSE4_2,
>>> +    P_PROC_SSE4_2,
>>> +    P_POPCNT,
>>> +    P_AVX,
>>> +    P_AVX2,
>>> +    P_FMA,
>>> +    P_PROC_FMA
>>> +  };
>>
>>
>> There's no need to have this list dynamically defined, right?
>
> I don't understand, why expose the enum outside the function?

To allow altering the list of priorities.  But if it doesn't make
sense, ignore me.

The patch is OK with the changes above addressed.  Thanks.  Please be
on the lookout for failures.


Diego.
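
To make the role of the priority list concrete, here is a minimal sketch of priority-ordered selection in standalone C++. The enumerators are copied from the quoted patch; everything else is an assumption for illustration: `Version`, `select_version`, the idea that a larger enumerator means "try first", and the plain set of feature names standing in for a real CPU check. The actual dispatcher is generated by the compiler and is not shown in this thread.

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <vector>

// Enumerators as quoted in the patch; here defined at namespace scope.
enum FeaturePriority {
  P_ZERO = 0, P_MMX, P_SSE, P_SSE2, P_SSE3, P_SSSE3, P_PROC_SSSE3,
  P_SSE4_a, P_PROC_SSE4_a, P_SSE4_1, P_SSE4_2, P_PROC_SSE4_2,
  P_POPCNT, P_AVX, P_AVX2, P_FMA, P_PROC_FMA
};

// Hypothetical description of one specialized version.
struct Version {
  std::string symbol;        // e.g. "foo.popcnt"
  std::string feature;       // feature the version requires
  FeaturePriority priority;  // assumed: larger value is tried first
};

// Returns the symbol of the highest-priority version whose feature is in
// SUPPORTED, or DEFAULT_SYMBOL if no specialized version qualifies.
std::string select_version(std::vector<Version> versions,
                           const std::set<std::string>& supported,
                           const std::string& default_symbol) {
  std::sort(versions.begin(), versions.end(),
            [](const Version& a, const Version& b) {
              return a.priority > b.priority;
            });
  for (const Version& v : versions)
    if (supported.count(v.feature) != 0)
      return v.symbol;  // first match in priority order wins
  return default_symbol;
}
```

If the supported set holds `sse4.2` and `popcnt`, a `popcnt` version is chosen over an `sse4.2` one because `P_POPCNT` comes later in the enum; with an empty set the default symbol is returned. Changing the order of the enumerators changes the outcome, which is what "altering the list of priorities" would amount to in this model.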
Hi,
sorry for jumping in late; for too long I did not have a chance to look at my TODO.
I have two comments...

> Index: gcc/cgraphbuild.c
> ===================================================================
> --- gcc/cgraphbuild.c (revision 192623)
> +++ gcc/cgraphbuild.c (working copy)
> @@ -34,6 +34,7 @@ along with GCC; see the file COPYING3. If not see
> #include "ipa-utils.h"
> #include "except.h"
> #include "ipa-inline.h"
> +#include "target.h"
>
> /* Context of record_reference. */
> struct record_reference_ctx
> @@ -317,8 +318,23 @@ build_cgraph_edges (void)
> bb);
> decl = gimple_call_fndecl (stmt);
> if (decl)
> - cgraph_create_edge (node, cgraph_get_create_node (decl),
> - stmt, bb->count, freq);
> + {
> + struct cgraph_node *callee = cgraph_get_create_node (decl);
> + /* If a call to a multiversioned function dispatcher is
> + found, generate the body to dispatch the right function
> + at run-time. */
> + if (callee->dispatcher_function)
> + {
> + tree resolver_decl;
> + gcc_assert (callee->function_version
> + && callee->function_version->next);
> + gcc_assert (targetm.generate_version_dispatcher_body);
> + resolver_decl
> + = targetm.generate_version_dispatcher_body (callee);
> + gcc_assert (resolver_decl != NULL_TREE);
> + }
> + cgraph_create_edge (node, callee, stmt, bb->count, freq);
> + }

I do not really think resolver generation belongs here, and I would prefer build_cgraph_edges to really just build the edges.

> Index: gcc/cgraph.c
> ===================================================================
> --- gcc/cgraph.c (revision 192623)
> +++ gcc/cgraph.c (working copy)
> @@ -1277,6 +1277,16 @@ cgraph_mark_address_taken_node (struct cgraph_node
> node->symbol.address_taken = 1;
> node = cgraph_function_or_thunk_node (node, NULL);
> node->symbol.address_taken = 1;
> + /* If the address of a multiversioned function dispatcher is taken,
> + generate the body to dispatch the right function at run-time. This
> + is needed as the address can be used to do an indirect call. */
> + if (node->dispatcher_function)
> + {
> + gcc_assert (node->function_version
> + && node->function_version->next);
> + gcc_assert (targetm.generate_version_dispatcher_body);
> + targetm.generate_version_dispatcher_body (node);
> + }

Similarly here. I also think that this way you will miss aliases of the multiversioned functions.

I am not sure why the multiversioning is tied to the cgraph build and why the data structure is put into cgraph_node itself. It seems to me that your dispatchers are in a way related to thunks, i.e. they are inserted into the callgraph and once they become reachable their body needs to be produced. I think generate_version_dispatcher_body should thus probably be done from cgraph_analyze_function. (To make the function visible to analyze_function you will need to finalize it at the time you set the dispatcher_function flag.)

I would also put the dispatcher data structure into an on-side hash keyed by node->uid (these are rare, so the data structure should be small). The symbol table is critical for WPA stage memory use and I plan to remove as much as possible from the nodes in the near future. For this reason I would prefer not to add too much stuff that is not going to be used by the majority of nodes.

Honza
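The idea of an on-side hash keyed by node->uid can be sketched in standalone C++ as follows. This is only a model of the suggestion under stated assumptions, not GCC code: node_sketch, version_info and version_table are hypothetical names, and std::unordered_map stands in for whatever hash table the patch would actually use. The point it illustrates is that versioning data lives outside the node, so nodes that are not versioned pay no memory cost.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Stand-in for a callgraph node; only the uid matters for this sketch.
struct node_sketch {
    int uid;
};

// Hypothetical per-function version data kept outside the node.
struct version_info {
    int dispatcher_uid = -1;        // uid of the dispatcher node, if any
    std::vector<int> version_uids;  // uids of the chained versions
};

// Side table keyed by uid: nodes that are not versioned cost nothing.
class version_table {
public:
    // Return the entry for N, creating an empty one if needed.
    version_info& get_or_create(const node_sketch& n) { return map_[n.uid]; }

    // Return the entry for N, or nullptr if N is not versioned.
    const version_info* find(const node_sketch& n) const {
        auto it = map_.find(n.uid);
        return it == map_.end() ? nullptr : &it->second;
    }

    // Drop the entry for N, e.g. when the node is removed.
    void remove(const node_sketch& n) { map_.erase(n.uid); }

    std::size_t size() const { return map_.size(); }

private:
    std::unordered_map<int, version_info> map_;
};
```

A lookup for an unversioned node simply returns nullptr, so callers can test for versioning without any extra field in the node itself.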
On Fri, Oct 26, 2012 at 8:54 AM, Jan Hubicka <hubicka@ucw.cz> wrote: > Hi, > sorry for jumping in late, for too long I did not had chnce to look at my TODO. > I have two comments... >> Index: gcc/cgraphbuild.c >> =================================================================== >> --- gcc/cgraphbuild.c (revision 192623) >> +++ gcc/cgraphbuild.c (working copy) >> @@ -34,6 +34,7 @@ along with GCC; see the file COPYING3. If not see >> #include "ipa-utils.h" >> #include "except.h" >> #include "ipa-inline.h" >> +#include "target.h" >> >> /* Context of record_reference. */ >> struct record_reference_ctx >> @@ -317,8 +318,23 @@ build_cgraph_edges (void) >> bb); >> decl = gimple_call_fndecl (stmt); >> if (decl) >> - cgraph_create_edge (node, cgraph_get_create_node (decl), >> - stmt, bb->count, freq); >> + { >> + struct cgraph_node *callee = cgraph_get_create_node (decl); >> + /* If a call to a multiversioned function dispatcher is >> + found, generate the body to dispatch the right function >> + at run-time. */ >> + if (callee->dispatcher_function) >> + { >> + tree resolver_decl; >> + gcc_assert (callee->function_version >> + && callee->function_version->next); >> + gcc_assert (targetm.generate_version_dispatcher_body); >> + resolver_decl >> + = targetm.generate_version_dispatcher_body (callee); >> + gcc_assert (resolver_decl != NULL_TREE); >> + } >> + cgraph_create_edge (node, callee, stmt, bb->count, freq); >> + } > I do not really think resolver generation belongs here + I would preffer > build_cgraph_edges to really just build the edges. 
>> Index: gcc/cgraph.c >> =================================================================== >> --- gcc/cgraph.c (revision 192623) >> +++ gcc/cgraph.c (working copy) >> @@ -1277,6 +1277,16 @@ cgraph_mark_address_taken_node (struct cgraph_node >> node->symbol.address_taken = 1; >> node = cgraph_function_or_thunk_node (node, NULL); >> node->symbol.address_taken = 1; >> + /* If the address of a multiversioned function dispatcher is taken, >> + generate the body to dispatch the right function at run-time. This >> + is needed as the address can be used to do an indirect call. */ >> + if (node->dispatcher_function) >> + { >> + gcc_assert (node->function_version >> + && node->function_version->next); >> + gcc_assert (targetm.generate_version_dispatcher_body); >> + targetm.generate_version_dispatcher_body (node); >> + } > > Similarly here. I also think this way you will miss aliases of the multiversioned > functions. > > I am not sure why the multiversioning is tied with the cgraph build and the > datastructure is put into cgraph_node itself. It seems to me that your > dispatchers are in a way related to thunks - i.e. they are inserted into > callgraph and once they become reachable their body needs to be produced. I > think generate_version_dispatcher_body should thus probably be done from > cgraph_analyze_function. (to make the function to be seen by analyze_function > you will need to make it to be finalized at the time you set > dispatcher_function flag. This seems reasonable -- Sri, do you see any problems with this suggestion? > > I would also put the dispatcher datastructure into on-side hash by node->uid. > (i.e. these are rare and thus the datastructure should be small) > symbol table is critical for WPA stage memory use and I plan to remove as much > as possible from the nodes in near future. For this reason I would preffer > to not add too much of stuff that is not going to be used by majority of nodes. 
>

I also had a concern about increasing the size of the core data structure.

Thanks,

David

> Honza
On Fri, Oct 26, 2012 at 9:07 AM, Xinliang David Li <davidxl@google.com> wrote: > On Fri, Oct 26, 2012 at 8:54 AM, Jan Hubicka <hubicka@ucw.cz> wrote: >> Hi, >> sorry for jumping in late, for too long I did not had chnce to look at my TODO. >> I have two comments... >>> Index: gcc/cgraphbuild.c >>> =================================================================== >>> --- gcc/cgraphbuild.c (revision 192623) >>> +++ gcc/cgraphbuild.c (working copy) >>> @@ -34,6 +34,7 @@ along with GCC; see the file COPYING3. If not see >>> #include "ipa-utils.h" >>> #include "except.h" >>> #include "ipa-inline.h" >>> +#include "target.h" >>> >>> /* Context of record_reference. */ >>> struct record_reference_ctx >>> @@ -317,8 +318,23 @@ build_cgraph_edges (void) >>> bb); >>> decl = gimple_call_fndecl (stmt); >>> if (decl) >>> - cgraph_create_edge (node, cgraph_get_create_node (decl), >>> - stmt, bb->count, freq); >>> + { >>> + struct cgraph_node *callee = cgraph_get_create_node (decl); >>> + /* If a call to a multiversioned function dispatcher is >>> + found, generate the body to dispatch the right function >>> + at run-time. */ >>> + if (callee->dispatcher_function) >>> + { >>> + tree resolver_decl; >>> + gcc_assert (callee->function_version >>> + && callee->function_version->next); >>> + gcc_assert (targetm.generate_version_dispatcher_body); >>> + resolver_decl >>> + = targetm.generate_version_dispatcher_body (callee); >>> + gcc_assert (resolver_decl != NULL_TREE); >>> + } >>> + cgraph_create_edge (node, callee, stmt, bb->count, freq); >>> + } >> I do not really think resolver generation belongs here + I would preffer >> build_cgraph_edges to really just build the edges. 
>>> Index: gcc/cgraph.c >>> =================================================================== >>> --- gcc/cgraph.c (revision 192623) >>> +++ gcc/cgraph.c (working copy) >>> @@ -1277,6 +1277,16 @@ cgraph_mark_address_taken_node (struct cgraph_node >>> node->symbol.address_taken = 1; >>> node = cgraph_function_or_thunk_node (node, NULL); >>> node->symbol.address_taken = 1; >>> + /* If the address of a multiversioned function dispatcher is taken, >>> + generate the body to dispatch the right function at run-time. This >>> + is needed as the address can be used to do an indirect call. */ >>> + if (node->dispatcher_function) >>> + { >>> + gcc_assert (node->function_version >>> + && node->function_version->next); >>> + gcc_assert (targetm.generate_version_dispatcher_body); >>> + targetm.generate_version_dispatcher_body (node); >>> + } >> >> Similarly here. I also think this way you will miss aliases of the multiversioned >> functions. > >> >> I am not sure why the multiversioning is tied with the cgraph build and the >> datastructure is put into cgraph_node itself. It seems to me that your >> dispatchers are in a way related to thunks - i.e. they are inserted into >> callgraph and once they become reachable their body needs to be produced. I >> think generate_version_dispatcher_body should thus probably be done from >> cgraph_analyze_function. (to make the function to be seen by analyze_function >> you will need to make it to be finalized at the time you set >> dispatcher_function flag. > > This seems reasonable -- Sri, do you see any problems with this suggestion? No, I will make this change asap. > >> >> I would also put the dispatcher datastructure into on-side hash by node->uid. >> (i.e. these are rare and thus the datastructure should be small) >> symbol table is critical for WPA stage memory use and I plan to remove as much >> as possible from the nodes in near future. 
For this reason I would prefer
>> not to add too much stuff that is not going to be used by the majority of nodes.
>>

OK, will change as suggested.

>
> I also had a concern about increasing the size of the core data structure.

Thanks,
-Sri.

>
> thanks,
>
> David
>
>> Honza
Hi Diego and Honza, I have made all the changes mentioned and attached the new patch. Thanks, -Sri.
> Index: gcc/cgraph.c
> ===================================================================
> --- gcc/cgraph.c (revision 192623)
> +++ gcc/cgraph.c (working copy)
> @@ -132,6 +132,74 @@ static GTY(()) struct cgraph_edge *free_edges;
> /* Did procss_same_body_aliases run? */
> bool same_body_aliases_done;
>
> +/* Map a cgraph_node to cgraph_function_version_info using this htab.
> + The cgraph_function_version_info has a THIS_NODE field that is the
> + corresponding cgraph_node.. */
> +htab_t GTY((param_is (struct cgraph_function_version_info *)))
> + cgraph_fnver_htab = NULL;

I think you want to declare the htab static and arrange for it to be freed after cgraph construction, so you don't need to take care of nodes being removed via the hooks.

OK with this change.

I have a few other comments:

> + /* IFUNC resolvers have to be externally visible. */
> + TREE_PUBLIC (decl) = 1;
> + DECL_UNINLINABLE (decl) = 1;

Why can the resolvers not be inlined?

> +
> + DECL_EXTERNAL (decl) = 0;
> + DECL_EXTERNAL (dispatch_decl) = 0;
> +
> + DECL_CONTEXT (decl) = NULL_TREE;
> + DECL_INITIAL (decl) = make_node (BLOCK);
> + DECL_STATIC_CONSTRUCTOR (decl) = 0;
> + TREE_READONLY (decl) = 0;
> + DECL_PURE_P (decl) = 0;

I think those can be copied from the functions you are resolving (as can many other attributes and properties).

> +
> + if (DECL_COMDAT_GROUP (default_decl))
> + {
> + DECL_COMDAT (decl) = DECL_COMDAT (default_decl);
> + make_decl_one_only (decl, DECL_COMDAT_GROUP (default_decl));
> + }
> + else if (TREE_PUBLIC (default_decl))
> + {
> + /* In this case, each translation unit with a call to this
> + versioned function will put out a resolver. Ensure it
> + is comdat to keep just one copy. */
> + DECL_COMDAT (decl) = 1;
> + make_decl_one_only (decl, DECL_ASSEMBLER_NAME (decl));
> + }
> + /* Build result decl and add to function_decl. */
> + t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, ptr_type_node);
> + DECL_ARTIFICIAL (t) = 1;
> + DECL_IGNORED_P (t) = 1;
> + DECL_RESULT (decl) = t;
> +
> + gimplify_function_tree (decl);
> + push_cfun (DECL_STRUCT_FUNCTION (decl));
> + gimple_register_cfg_hooks ();
> + init_empty_tree_cfg_for_function (DECL_STRUCT_FUNCTION (decl));
> + cfun->curr_properties |=
> + (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_ssa
> + | PROP_gimple_any);
> + cfun->curr_properties = 15;
> + new_bb = create_empty_bb (ENTRY_BLOCK_PTR);
> + make_edge (ENTRY_BLOCK_PTR, new_bb, EDGE_FALLTHRU);
> + make_edge (new_bb, EXIT_BLOCK_PTR, 0);
> + *empty_bb = new_bb;

You can simplify this by using init_lowered_empty_function.

Honza
On Mon, Oct 29, 2012 at 5:55 AM, Jan Hubicka <hubicka@ucw.cz> wrote: >> Index: gcc/cgraph.c >> =================================================================== >> --- gcc/cgraph.c (revision 192623) >> +++ gcc/cgraph.c (working copy) >> @@ -132,6 +132,74 @@ static GTY(()) struct cgraph_edge *free_edges; >> /* Did procss_same_body_aliases run? */ >> bool same_body_aliases_done; >> >> +/* Map a cgraph_node to cgraph_function_version_info using this htab. >> + The cgraph_function_version_info has a THIS_NODE field that is the >> + corresponding cgraph_node.. */ >> +htab_t GTY((param_is (struct cgraph_function_version_info *))) >> + cgraph_fnver_htab = NULL; > > I think you want declare the htab static and arrange it to be freed after > cgraph construction, so you don't need to take care of nodes being removed > via the hooks. I will declare the htab static but I want this htab for later optimizations, like dispatch hoisting. Please see: http://gcc.gnu.org/ml/gcc-patches/2011-04/msg02285.html for a description of the optimization. IFUNC based dispatch blocks inlining of multi-versioned functions and dispatch hoisting will help with this. I will make the other changes asap. Thanks, -Sri. > > OK with this change. > > I have few other comments: >> + /* IFUNC resolvers have to be externally visible. */ >> + TREE_PUBLIC (decl) = 1; >> + DECL_UNINLINABLE (decl) = 1; > > Why the resolvers can not be inlined? >> + >> + DECL_EXTERNAL (decl) = 0; >> + DECL_EXTERNAL (dispatch_decl) = 0; >> + >> + DECL_CONTEXT (decl) = NULL_TREE; >> + DECL_INITIAL (decl) = make_node (BLOCK); >> + DECL_STATIC_CONSTRUCTOR (decl) = 0; >> + TREE_READONLY (decl) = 0; >> + DECL_PURE_P (decl) = 0; > > I think those can be copied from the functions you are resolving. 
(well as well > as many attributes and properties) >> + >> + if (DECL_COMDAT_GROUP (default_decl)) >> + { >> + DECL_COMDAT (decl) = DECL_COMDAT (default_decl); >> + make_decl_one_only (decl, DECL_COMDAT_GROUP (default_decl)); >> + } >> + else if (TREE_PUBLIC (default_decl)) >> + { >> + /* In this case, each translation unit with a call to this >> + versioned function will put out a resolver. Ensure it >> + is comdat to keep just one copy. */ >> + DECL_COMDAT (decl) = 1; >> + make_decl_one_only (decl, DECL_ASSEMBLER_NAME (decl)); >> + } >> + /* Build result decl and add to function_decl. */ >> + t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, ptr_type_node); >> + DECL_ARTIFICIAL (t) = 1; >> + DECL_IGNORED_P (t) = 1; >> + DECL_RESULT (decl) = t; >> + >> + gimplify_function_tree (decl); >> + push_cfun (DECL_STRUCT_FUNCTION (decl)); >> + gimple_register_cfg_hooks (); >> + init_empty_tree_cfg_for_function (DECL_STRUCT_FUNCTION (decl)); >> + cfun->curr_properties |= >> + (PROP_gimple_lcf | PROP_gimple_leh | PROP_cfg | PROP_ssa >> + | PROP_gimple_any); >> + cfun->curr_properties = 15; >> + new_bb = create_empty_bb (ENTRY_BLOCK_PTR); >> + make_edge (ENTRY_BLOCK_PTR, new_bb, EDGE_FALLTHRU); >> + make_edge (new_bb, EXIT_BLOCK_PTR, 0); >> + *empty_bb = new_bb; > > You can simplify this by init_lowered_empty_function. > > Honza
On 10/27/2012 09:16 PM, Sriraman Tallam wrote: > + /* See if there's a match. For functions that are multi-versioned, > + all the versions match. */ > if (same_type_p (target_fn_type, static_fn_type (fn))) > - matches = tree_cons (fn, NULL_TREE, matches); > + { > + matches = tree_cons (fn, NULL_TREE, matches); > + /*If versioned, push all possible versions into a vector. */ > + if (DECL_FUNCTION_VERSIONED (fn)) > + { > + if (fn_ver_vec == NULL) > + fn_ver_vec = VEC_alloc (tree, heap, 2); > + VEC_safe_push (tree, heap, fn_ver_vec, fn); > + } > + } Why do we need to keep both a list and vector of the matches? > + Call decls_match to make sure they are different because they are > + versioned. */ > + if (DECL_FUNCTION_VERSIONED (fn)) > + { > + for (match = TREE_CHAIN (matches); match; match = TREE_CHAIN (match)) > + if (decls_match (fn, TREE_PURPOSE (match))) > + break; > + } What if you have multiple matches that aren't all versions of the same function? Why would it be a problem to have two separate declarations of the same function? > + dispatcher_decl = targetm.get_function_versions_dispatcher (fn_ver_vec); Is the idea here that if you have some versions declared, then a call, then more versions declared, then another call, you will call two different dispatchers, where the first one will only dispatch to the versions declared before the first call? If not, why do we care about the set of declarations at this point? > + /* Mark this functio to be output. */ > + node->local.finalized = 1; Missing 'n' in "function". > @@ -14227,7 +14260,11 @@ cxx_comdat_group (tree decl) > else > break; > } > - name = DECL_ASSEMBLER_NAME (decl); > + if (TREE_CODE (decl) == FUNCTION_DECL > + && DECL_FUNCTION_VERSIONED (decl)) > + name = DECL_NAME (decl); This would mean that f in the global namespace and f in namespace foo would end up in the same comdat group. Why do we need special handling here at all? 
> dump_function_name (tree t, int flags) > { > - tree name = DECL_NAME (t); > + tree name; > > + /* For function versions, use the assembler name as the decl name is > + the same for all versions. */ > + if (TREE_CODE (t) == FUNCTION_DECL > + && DECL_FUNCTION_VERSIONED (t)) > + name = DECL_ASSEMBLER_NAME (t); This shouldn't be necessary; we should print the target attribute when printing the function declaration. > + Also, mark this function as needed if it is marked inline but > + is a multi-versioned function. */ > + if (((flag_keep_inline_functions > + || DECL_FUNCTION_VERSIONED (fn)) This should be marked as needed by the code that builds the dispatcher. > + /* For calls to a multi-versioned function, overload resolution > + returns the function with the highest target priority, that is, > + the version that will checked for dispatching first. If this > + version is inlinable, a direct call to this version can be made > + otherwise the call should go through the dispatcher. */ I'm a bit confused why people would want both dispatched calls and non-dispatched inlining; I would expect that if a function can be compiled differently enough on newer hardware to make versioning worthwhile, that would be a larger difference than the call overhead. > + if (DECL_FUNCTION_VERSIONED (fn) > + && !targetm.target_option.can_inline_p (current_function_decl, fn)) > + { > + struct cgraph_node *dispatcher_node = NULL; > + fn = get_function_version_dispatcher (fn); > + if (fn == NULL) > + return NULL; > + dispatcher_node = cgraph_get_create_node (fn); > + gcc_assert (dispatcher_node != NULL); > + /* Mark this function to be output. */ > + dispatcher_node->local.finalized = 1; > + } Why do you need to mark this here? If you generate a call to the dispatcher, cgraph should mark it to be output automatically. > + /* For candidates of a multi-versioned function, make the version with > + the highest priority win. This version will be checked for dispatching > + first. 
If this version can be inlined into the caller, the front-end > + will simply make a direct call to this function. */ This is still too high in joust. I believe I said before that this code should come just above /* If the two function declarations represent the same function (this can happen with declarations in multiple scopes and arg-dependent lookup), arbitrarily choose one. But first make sure the default args we're using match. */ > + /* For multiversioned functions, aggregate all the versions here for > + generating the dispatcher body later if necessary. Check to see if > + the dispatcher is already generated to avoid doing this more than > + once. */ This caching seems to assume that you'll always be considering the same group of declarations, which goes back to my earlier question. Jason
On Tue, Oct 30, 2012 at 12:10 PM, Jason Merrill <jason@redhat.com> wrote: > On 10/27/2012 09:16 PM, Sriraman Tallam wrote: >> >> + /* See if there's a match. For functions that are >> multi-versioned, >> + all the versions match. */ >> if (same_type_p (target_fn_type, static_fn_type (fn))) >> - matches = tree_cons (fn, NULL_TREE, matches); >> + { >> + matches = tree_cons (fn, NULL_TREE, matches); >> + /*If versioned, push all possible versions into a vector. >> */ >> + if (DECL_FUNCTION_VERSIONED (fn)) >> + { >> + if (fn_ver_vec == NULL) >> + fn_ver_vec = VEC_alloc (tree, heap, 2); >> + VEC_safe_push (tree, heap, fn_ver_vec, fn); >> + } >> + } > > > Why do we need to keep both a list and vector of the matches? Right, but we later call the target hook get_function_versions_dispatcher which takes a vector. I could change that to accept a list instead if that is preferable? > >> + Call decls_match to make sure they are different because they are >> + versioned. */ >> + if (DECL_FUNCTION_VERSIONED (fn)) >> + { >> + for (match = TREE_CHAIN (matches); match; match = TREE_CHAIN >> (match)) >> + if (decls_match (fn, TREE_PURPOSE (match))) >> + break; >> + } > > > What if you have multiple matches that aren't all versions of the same > function? Right, I should really check if there are versions by comparing params too. I fixed this in joust but missed out here. I will make the change so that any matches with functions that do not belong to the semantically identical group of function versions will be caught and the ambiguity will be flagged. > > Why would it be a problem to have two separate declarations of the same > function? AFAIU, this should not be a problem. For duplicate declarations, duplicate_decls should merge them and they should never be seen here. Did I miss something? 
>
>> + dispatcher_decl = targetm.get_function_versions_dispatcher
>> (fn_ver_vec);
>
>
> Is the idea here that if you have some versions declared, then a call, then
> more versions declared, then another call, you will call two different
> dispatchers,

No, I thought about this but I did not want to handle this case in this iteration. The dispatcher is created only once and if more functions are declared later, they will not be dispatched, at least in this iteration.

> where the first one will only dispatch to the versions declared
> before the first call? If not, why do we care about the set of declarations
> at this point?

I am taking the address of a multi-versioned function here. The front-end is returning the address of the dispatcher decl instead. Since I am building the dispatcher here, why not construct the cgraph data structures for these versions too? That is why I aggregate all the declarations here.

>
>> + /* Mark this functio to be output. */
>> + node->local.finalized = 1;
>
>
> Missing 'n' in "function".
>
>> @@ -14227,7 +14260,11 @@ cxx_comdat_group (tree decl)
>> else
>> break;
>> }
>> - name = DECL_ASSEMBLER_NAME (decl);
>> + if (TREE_CODE (decl) == FUNCTION_DECL
>> + && DECL_FUNCTION_VERSIONED (decl))
>> + name = DECL_NAME (decl);
>
>
> This would mean that f in the global namespace and f in namespace foo would
> end up in the same comdat group. Why do we need special handling here at
> all?

Right, we do not need special handling. It is OK for each function version to be in its own comdat group; I will remove this.

>
>> dump_function_name (tree t, int flags)
>> {
>> - tree name = DECL_NAME (t);
>> + tree name;
>>
>> + /* For function versions, use the assembler name as the decl name is
>> + the same for all versions. */
>> + if (TREE_CODE (t) == FUNCTION_DECL
>> + && DECL_FUNCTION_VERSIONED (t))
>> + name = DECL_ASSEMBLER_NAME (t);
>
>
> This shouldn't be necessary; we should print the target attribute when
> printing the function declaration.

Ok.
> >> + Also, mark this function as needed if it is marked inline but >> + is a multi-versioned function. */ >> + if (((flag_keep_inline_functions >> + || DECL_FUNCTION_VERSIONED (fn)) > > > This should be marked as needed by the code that builds the dispatcher. I had some trouble previously figuring out where to mark this as needed. I will fix it. > >> + /* For calls to a multi-versioned function, overload resolution >> + returns the function with the highest target priority, that is, >> + the version that will checked for dispatching first. If this >> + version is inlinable, a direct call to this version can be made >> + otherwise the call should go through the dispatcher. */ > > > I'm a bit confused why people would want both dispatched calls and > non-dispatched inlining; I would expect that if a function can be compiled > differently enough on newer hardware to make versioning worthwhile, that > would be a larger difference than the call overhead. Simple example: int foo () { return 1; } int __attribute__ ((target ("popcnt"))) foo () { return 0; } int __attribute__ ((target ("popcnt"))) bar () { return foo (); } Here, the call to foo () from bar () will be turned into a direct call to the popcnt version. Here, if bar is executed, then popcnt is supported and the call to foo from bar will be dispatched to the popcnt version even if it goes through the dispatcher and this is known at compile time. So, why not make a direct call? I am only making direct calls to versions when I am sure the dispatcher would do the same. > >> + if (DECL_FUNCTION_VERSIONED (fn) >> + && !targetm.target_option.can_inline_p (current_function_decl, fn)) >> + { >> + struct cgraph_node *dispatcher_node = NULL; >> + fn = get_function_version_dispatcher (fn); >> + if (fn == NULL) >> + return NULL; >> + dispatcher_node = cgraph_get_create_node (fn); >> + gcc_assert (dispatcher_node != NULL); >> + /* Mark this function to be output. 
*/ >> + dispatcher_node->local.finalized = 1; >> + } > > > Why do you need to mark this here? If you generate a call to the > dispatcher, cgraph should mark it to be output automatically. dispatcher_node does not have a body until it is generated in cgraphunit.c, so cgraph does not mark this field before this is processed in cgraph_analyze_function. > >> + /* For candidates of a multi-versioned function, make the version with >> + the highest priority win. This version will be checked for >> dispatching >> + first. If this version can be inlined into the caller, the >> front-end >> + will simply make a direct call to this function. */ > > > This is still too high in joust. I believe I said before that this code > should come just above > > /* If the two function declarations represent the same function (this can > happen with declarations in multiple scopes and arg-dependent lookup), > arbitrarily choose one. But first make sure the default args we're > using match. */ Yes, I missed this the last time around. Will fix it this time. > >> + /* For multiversioned functions, aggregate all the versions here for >> + generating the dispatcher body later if necessary. Check to see if >> + the dispatcher is already generated to avoid doing this more than >> + once. */ > > > This caching seems to assume that you'll always be considering the same > group of declarations, which goes back to my earlier question. Yes, for now I want to be only considering the same group of declarations. I am assuming that all declarations/definitions of all versions of foo are seen before the first call to foo. I do not want multiple dispatcher support complexity in this iteration. Is it ok to delay this to the next patch iteration? Your earlier question on this was: "This seems to assume that all the functions in the list of candidates are versioned, but there might be unrelated functions from different namespaces. 
Also, doing this every time someone calls a versioned function seems like the wrong place; I would think it would be better to build up a list of versions as you seed declarations, and then use that list to define the dispatcher at EOF if it's needed." I have fixed the problem of unrelated functions by always checking the type (same_type_p) and params (comp_params) in get_function_version_dispatcher. You talked about doing the dispatcher building later, but I did it here since I am doing it only once. Thanks, -Sri. > > Jason >
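The rule described in this reply (call the highest-priority version directly only when the dispatcher would be certain to pick it, otherwise go through the dispatcher) can be modelled in a few lines of standalone C++. This is a sketch under simplifying assumptions, not the front-end code: feature sets are modelled as sets of strings, the subset test is a rough stand-in for targetm.target_option.can_inline_p, and feature_set, can_call_directly, call_kind and choose_call are hypothetical names.

```cpp
#include <algorithm>
#include <set>
#include <string>

// Hypothetical model: the target features a function is compiled with.
using feature_set = std::set<std::string>;

// Simplified stand-in for targetm.target_option.can_inline_p: the caller
// must enable every feature that the callee version requires.
bool can_call_directly(const feature_set& caller, const feature_set& callee) {
    // std::set iterates in sorted order, as std::includes requires.
    return std::includes(caller.begin(), caller.end(),
                         callee.begin(), callee.end());
}

enum class call_kind { direct, via_dispatcher };

// BEST_VERSION is the feature set of the highest-priority version chosen by
// overload resolution. If the caller already guarantees those features, the
// dispatcher would select that version anyway, so a direct call is safe.
call_kind choose_call(const feature_set& caller,
                      const feature_set& best_version) {
    return can_call_directly(caller, best_version) ? call_kind::direct
                                                   : call_kind::via_dispatcher;
}
```

In the foo/bar example from this message, bar is compiled for popcnt, so its call to foo resolves to call_kind::direct; a caller compiled without popcnt would get call_kind::via_dispatcher.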