https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88587
--- Comment #3 from Martin Liška <marxin at gcc dot gnu.org> ---
So the problem is that the new function body is copied in:

(gdb) bt
#0  __memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:342
#1  0x0000000000fd7378 in copy_node (node=node@entry=<var_decl 0x7ffff7feeab0 b>) at /home/marxin/Programming/gcc/gcc/tree.c:1183
#2  0x0000000000db07bf in copy_decl_no_change (decl=<var_decl 0x7ffff7feeab0 b>, id=0x7fffffffd5d0) at /home/marxin/Programming/gcc/gcc/tree-inline.c:5557
#3  0x0000000000db0b2d in remap_decl (decl=<optimized out>, id=0x7fffffffd5d0) at /home/marxin/Programming/gcc/gcc/tree-inline.c:366
#4  0x0000000000db0d9d in remap_decls (decls=<optimized out>, nonlocalized_list=0x7ffff6c60508<error reading variable: Cannot access memory at address 0x4>, id=0x7fffffffd5d0) at /home/marxin/Programming/gcc/gcc/tree-inline.c:656
#5  0x0000000000db29c2 in remap_block (block=0x7fffffffd498, id=0x7fffffffd5d0) at /home/marxin/Programming/gcc/gcc/tree.h:3177
#6  0x0000000000db2a86 in remap_blocks (block=<block 0x7ffff6c602a0>, id=0x7fffffffd5d0) at /home/marxin/Programming/gcc/gcc/tree-inline.c:736
#7  0x0000000000dbd5c8 in tree_function_versioning (old_decl=old_decl@entry=<function_decl 0x7ffff6c5c200 a>, new_decl=new_decl@entry=<function_decl 0x7ffff6c5c800 a.sse2.0>, tree_map=tree_map@entry=0x0, update_clones=update_clones@entry=false, args_to_skip=args_to_skip@entry=0x0, skip_return=skip_return@entry=false, blocks_to_copy=<optimized out>, new_entry=<optimized out>) at /home/marxin/Programming/gcc/gcc/tree.h:3291
#8  0x000000000093f8b1 in cgraph_node::create_version_clone_with_body (this=this@entry=<cgraph_node * const 0x7ffff6b552d0 "a"/0>, redirect_callers=..., redirect_callers@entry=..., tree_map=tree_map@entry=0x0, args_to_skip=args_to_skip@entry=0x0, skip_return=skip_return@entry=false, bbs_to_copy=bbs_to_copy@entry=0x0, new_entry_block=<optimized out>, suffix=<optimized out>) at /home/marxin/Programming/gcc/gcc/cgraphclones.c:1060
#9  0x00000000015d8c18 in create_target_clone (name=0x21fddc0 "sse2", definition=<optimized out>, node=<cgraph_node * 0x7ffff6b552d0 "a"/0>) at /home/marxin/Programming/gcc/gcc/multiple_target.c:303
#10 expand_target_clones (definition=<optimized out>, node=<optimized out>) at /home/marxin/Programming/gcc/gcc/multiple_target.c:403
#11 ipa_target_clone () at /home/marxin/Programming/gcc/gcc/multiple_target.c:509
#12 (anonymous namespace)::pass_target_clone::execute (this=<optimized out>) at /home/marxin/Programming/gcc/gcc/multiple_target.c:545
#13 0x0000000000c53605 in execute_one_pass (pass=<opt_pass* 0x220d2a0 "targetclone"(65)>) at /home/marxin/Programming/gcc/gcc/passes.c:2483
#14 0x0000000000c5c0d6 in execute_ipa_pass_list (pass=<opt_pass* 0x220d2a0 "targetclone"(65)>) at /home/marxin/Programming/gcc/gcc/passes.c:2923
#15 0x0000000000939853 in ipa_passes () at /home/marxin/Programming/gcc/gcc/cgraphunit.c:2482
#16 symbol_table::compile (this=0x7ffff6b56100) at /home/marxin/Programming/gcc/gcc/cgraphunit.c:2618
#17 0x000000000093bbed in symbol_table::compile (this=0x7ffff6b56100) at /home/marxin/Programming/gcc/gcc/cgraphunit.c:2863
#18 symbol_table::finalize_compilation_unit (this=0x7ffff6b56100) at /home/marxin/Programming/gcc/gcc/cgraphunit.c:2863
#19 0x0000000000d3a62b in compile_file () at /home/marxin/Programming/gcc/gcc/toplev.c:481
#20 0x00000000007a630a in do_compile () at /home/marxin/Programming/gcc/gcc/toplev.c:2176
#21 toplev::main (this=this@entry=0x7fffffffd9de, argc=<optimized out>, argc@entry=24, argv=<optimized out>, argv@entry=0x7fffffffdad8) at /home/marxin/Programming/gcc/gcc/toplev.c:2311
#22 0x00000000007a98eb in main (argc=24, argv=0x7fffffffdad8) at /home/marxin/Programming/gcc/gcc/main.c:39

So we copy 'b' with copy_decl_no_change, and thus the copy ends up with the same mode as the original.
On the other hand, when using target("sse2"), the V4SI mode is set here:

#0  layout_decl (decl=<result_decl 0x7ffff6b60d98>, known_align=0) at /home/marxin/Programming/gcc/gcc/stor-layout.c:605
#1  0x0000000000fc1be2 in build_decl (loc=loc@entry=255680, code=code@entry=RESULT_DECL, name=name@entry=<tree 0x0>, type=<void_type 0x7ffff6b740a8 void>) at /home/marxin/Programming/gcc/gcc/tree.c:5035
#2  0x00000000007cab6f in start_function (declspecs=declspecs@entry=0x225f0f0, declarator=declarator@entry=0x225f200, attributes=<optimized out>, attributes@entry=<tree_list 0x7ffff6c59758>) at /home/marxin/Programming/gcc/gcc/c/c-decl.c:8991
#3  0x000000000081d8d3 in c_parser_declaration_or_fndef (parser=0x7ffff6b60d20, fndef_ok=true, static_assert_ok=<optimized out>, empty_ok=<optimized out>, nested=<optimized out>, start_attr_ok=<optimized out>, objc_foreach_object_declaration=<optimized out>, omp_declare_simd_clauses=..., oacc_routine_data=<optimized out>, fallthru_attr_p=<optimized out>) at /home/marxin/Programming/gcc/gcc/c/c-parser.c:2256
#4  0x00000000008252e0 in c_parser_external_declaration (parser=0x7ffff6b60d20) at /home/marxin/Programming/gcc/gcc/c/c-parser.c:1653
#5  0x0000000000825b2a in c_parser_translation_unit (parser=<optimized out>) at /home/marxin/Programming/gcc/gcc/c/c-parser.c:1534
#6  c_parse_file () at /home/marxin/Programming/gcc/gcc/c/c-parser.c:19807
#7  0x0000000000878de1 in c_common_parse_file () at /home/marxin/Programming/gcc/gcc/c-family/c-opts.c:1151
#8  0x0000000000d3a3cf in compile_file () at /home/marxin/Programming/gcc/gcc/toplev.c:456
#9  0x00000000007a630a in do_compile () at /home/marxin/Programming/gcc/gcc/toplev.c:2176
#10 toplev::main (this=this@entry=0x7fffffffd9ce, argc=<optimized out>, argc@entry=24, argv=<optimized out>, argv@entry=0x7fffffffdac8) at /home/marxin/Programming/gcc/gcc/toplev.c:2311
#11 0x00000000007a98eb in main (argc=24, argv=0x7fffffffdac8) at /home/marxin/Programming/gcc/gcc/main.c:39

So a call to relayout_decl will be needed. I'm investigating...
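
To illustrate the idea, here is a toy model in plain C. It is only a sketch and not GCC code: every toy_* name is made up for illustration, and the assumption that the original (non-SSE2) version of 'a' gives 'b' a non-vector mode (called TOY_BLKmode here) is mine rather than something shown in the traces. It shows why a bitwise copy of a decl keeps a cached mode that no longer matches the clone's target, and why laying the decl out again would fix that.

```c
#include <assert.h>
#include <string.h>

/* Toy stand-ins for machine modes; hypothetical, not GCC's enum. */
enum toy_mode { TOY_BLKmode, TOY_V4SImode };

/* Hypothetical, heavily simplified declaration node. */
struct toy_decl {
    const char *name;
    unsigned size_bytes;   /* size of the declared object */
    int is_vector;         /* nonzero for a vector-typed variable */
    enum toy_mode mode;    /* cached when the decl is laid out */
};

/* Assumption: a 16-byte vector gets a vector mode only with SSE2 enabled. */
static enum toy_mode toy_mode_for(const struct toy_decl *d, int target_has_sse2)
{
    if (d->is_vector && d->size_bytes == 16 && target_has_sse2)
        return TOY_V4SImode;
    return TOY_BLKmode;
}

/* Rough analogue of laying out a decl: compute and cache the mode
   for the target that is active right now. */
static void toy_layout_decl(struct toy_decl *d, int target_has_sse2)
{
    d->mode = toy_mode_for(d, target_has_sse2);
}

/* Rough analogue of copy_decl_no_change: a bitwise copy, so the
   cached mode is carried over unchanged. */
static struct toy_decl toy_copy_decl_no_change(const struct toy_decl *d)
{
    struct toy_decl copy;
    memcpy(&copy, d, sizeof copy);
    return copy;
}

/* Returns 1 if the copy started with a stale mode and a second layout
   pass under the clone's target repaired it. */
static int toy_clone_demo(void)
{
    struct toy_decl b = { "b", 16, 1, TOY_BLKmode };
    struct toy_decl clone;
    int stale;

    toy_layout_decl(&b, 0);               /* original function: no SSE2 */
    assert(b.mode == TOY_BLKmode);

    clone = toy_copy_decl_no_change(&b);  /* body copied for the sse2 clone */
    stale = clone.mode != toy_mode_for(&clone, 1);

    toy_layout_decl(&clone, 1);           /* the "relayout" step */
    return stale && clone.mode == TOY_V4SImode;
}
```

In this model toy_clone_demo() returns 1: the copy inherits the original's mode, and only the extra layout step under the clone's target brings it in line with what a function declared with target("sse2") would get directly.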