[Bug c/40363] New: Nonoptimal save/restore registers

2009-06-06 Thread vvv at ru dot ru
Vladimir Volynsky -- Summary: Nonoptimal save/restore registers Product: gcc Version: 4.5.0 Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: c AssignedTo: unassigned at gcc dot gnu dot org Repor

[Bug target/40171] GCC does not pass -mtune and -march options to assembler!

2009-05-25 Thread vvv at ru dot ru
--- Comment #4 from vvv at ru dot ru 2009-05-25 19:54 --- (In reply to comment #2) > This is very odd? What is the assembler doing that the compiler isn't? There exist some optimizations that are impossible without exact knowledge of addresses and opcodes. One example: avoiding o

[Bug target/39942] Nonoptimal code - leaveq; xchg %ax,%ax; retq

2009-05-20 Thread vvv at ru dot ru
--- Comment #49 from vvv at ru dot ru 2009-05-20 21:38 --- (In reply to comment #48) How do these patches work? Do they require any special options? # /media/disk-1/B/bin/gcc --version gcc (GCC) 4.5.0 20090520 (experimental) # cat test.c void f(int i) { if (i == 1) F(1); if

[Bug c/40171] New: GCC does not pass -mtune and -march options to assembler!

2009-05-16 Thread vvv at ru dot ru
assembler! Product: gcc Version: 4.4.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: vvv at ru dot ru http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40171

[Bug target/39942] Nonoptimal code - leaveq; xchg %ax,%ax; retq

2009-05-14 Thread vvv at ru dot ru
--- Comment #34 from vvv at ru dot ru 2009-05-14 19:43 --- (In reply to comment #32) > Please make sure that you only test nop paddings for branch insns, > not nop paddings for branch targets, which prefer 16byte alignment. Additional test results (for Core2): 1. Execution time

[Bug target/39942] Nonoptimal code - leaveq; xchg %ax,%ax; retq

2009-05-14 Thread vvv at ru dot ru
--- Comment #30 from vvv at ru dot ru 2009-05-14 09:01 --- Created an attachment (id=17863) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17863&action=view) Testing tool. Here are the results of my testing. Code: align 128 test_cikl: rept 14 ; 14 if SH=0, 15

[Bug target/39942] Nonoptimal code - leaveq; xchg %ax,%ax; retq

2009-05-13 Thread vvv at ru dot ru
--- Comment #28 from vvv at ru dot ru 2009-05-13 19:18 --- (In reply to comment #24) > Using padding to avoid 4 branches in 16byte chunk may not be a good idea since > it will increase code size. A single one-byte NOP per 16-byte chunk is enough for padding. But, IMHO, four bra

[Bug target/39942] Nonoptimal code - leaveq; xchg %ax,%ax; retq

2009-05-13 Thread vvv at ru dot ru
--- Comment #26 from vvv at ru dot ru 2009-05-13 19:05 --- (In reply to comment #23) > Note that we need something that works for the generic model as well, which in > this case looks like it is the same as for AMD models. There is a processor property TARGET_FOUR_JUMP_LIMIT,

[Bug target/39942] Nonoptimal code - leaveq; xchg %ax,%ax; retq

2009-05-13 Thread vvv at ru dot ru
--- Comment #25 from vvv at ru dot ru 2009-05-13 18:56 --- (In reply to comment #22) > CCing H.J for Intel optimization issues. VVV> 1. AMD limitation for 16-byte page (memory range XXX0 - XXXF), but VVV> Intel limitation for 16-byte chunk (memory range -

[Bug target/39942] Nonoptimal code - leaveq; xchg %ax,%ax; retq

2009-05-13 Thread vvv at ru dot ru
--- Comment #21 from vvv at ru dot ru 2009-05-13 17:13 --- I guess! Your patch is absolutely correct for AMD Athlon™ 64 and AMD Opteron™ processors, but it is nonoptimal for Intel processors, because: 1. AMD limitation for 16-byte page (memory range XXX0 - XXXF), but Intel

[Bug target/39942] Nonoptimal code - leaveq; xchg %ax,%ax; retq

2009-05-13 Thread vvv at ru dot ru
--- Comment #19 from vvv at ru dot ru 2009-05-13 11:42 --- (In reply to comment #18) > No, .p2align is the right thing to do, given that GCC doesn't have 100% > accurate information about instruction sizes (for e.g. inline asms it can't > have, for > stuff where

[Bug target/39942] Nonoptimal code - leaveq; xchg %ax,%ax; retq

2009-05-12 Thread vvv at ru dot ru
--- Comment #17 from vvv at ru dot ru 2009-05-12 16:40 --- (In reply to comment #16) > Created an attachment (id=17783) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17783&action=view) > gcc45-pr39942.patch > Patch that attempts to take into account .p2align

[Bug middle-end/40093] Optimization by functios reordering.

2009-05-10 Thread vvv at ru dot ru
--- Comment #5 from vvv at ru dot ru 2009-05-10 18:20 --- (In reply to comment #4) > Well you need whole program to get the behavior which you want. Yes. Of course, it's no problem for a small single-programmer project, but it's a problem for big projects like the Linux Kernel.

[Bug middle-end/40093] Optimization by functios reordering.

2009-05-10 Thread vvv at ru dot ru
--- Comment #3 from vvv at ru dot ru 2009-05-10 18:08 --- (In reply to comment #2) > This should have been done already with cgraph order. Unfortunately, I can see the inverse order only in a separate source file. Inverse, but not optimized. Example: // file order1.c #include main(int a

[Bug c/40093] Optimization by functios reordering.

2009-05-10 Thread vvv at ru dot ru
--- Comment #1 from vvv at ru dot ru 2009-05-10 16:43 --- Created an attachment (id=17847) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17847&action=view) Example direct/inverse calls Simple example. RDTSC ticks for direct and inverse sequence of calls. --

[Bug c/40093] New: Optimization by functios reordering.

2009-05-10 Thread vvv at ru dot ru
ReportedBy: vvv at ru dot ru http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40093

[Bug target/40072] Nonoptimal code - CMOVxx %eax,%edi; mov %edi,%eax; retq

2009-05-10 Thread vvv at ru dot ru
--- Comment #3 from vvv at ru dot ru 2009-05-10 16:09 --- > Not really, the move insn is moved to the beginning of the function: > 0060 : > 60: 89 f8 mov %edi,%eax > 62: 83 ff 08 cmp $0x8,%edi > 65:

[Bug target/40072] Nonoptimal code - CMOVxx %eax,%edi; mov %edi,%eax; retq

2009-05-09 Thread vvv at ru dot ru
--- Comment #1 from vvv at ru dot ru 2009-05-09 12:02 --- There is no bug on current trunk, so the bug is fixed. -- vvv at ru dot ru changed: What|Removed |Added

[Bug c/40072] New: Nonoptimal code - CMOVxx %eax,%edi; mov %edi,%eax; retq

2009-05-08 Thread vvv at ru dot ru
Status: UNCONFIRMED Severity: minor Priority: P3 Component: c AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: vvv at ru dot ru http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40072

[Bug target/39942] Nonoptimal code - leaveq; xchg %ax,%ax; retq

2009-04-29 Thread vvv at ru dot ru
--- Comment #15 from vvv at ru dot ru 2009-04-29 19:16 --- One more example: a 5-byte nop between leaveq and retq. # cat test.c void wait_for_enter() { int u = getchar(); while (!u) u = getchar()-13; } main() { wait_for_enter(); } # gcc -o t.out test.c

[Bug target/39942] Nonoptimal code - leaveq; xchg %ax,%ax; retq

2009-04-29 Thread vvv at ru dot ru
--- Comment #12 from vvv at ru dot ru 2009-04-29 07:55 --- (In reply to comment #9) > So that explains it, Use -Os or attribute cold if you want NOPs to be gone. But my measurements on Core 2 Duo P8600 show that push %ebp mov %esp,%ebp leave ret is _faster_ than push %ebp mov %

[Bug target/39942] Nonoptimal code - leaveq; xchg %ax,%ax; retq

2009-04-29 Thread vvv at ru dot ru
--- Comment #11 from vvv at ru dot ru 2009-04-29 07:46 --- (In reply to comment #8) > From config/i386/i386.c: > /* AMD Athlon works faster >when RET is not destination of conditional jump or directly preceded >by other jump instruction. We avoid the penalty by i

[Bug target/39942] Nonoptimal code - leaveq; xchg %ax,%ax; retq

2009-04-28 Thread vvv at ru dot ru
--- Comment #6 from vvv at ru dot ru 2009-04-28 21:18 --- Let's compile file test.c //#file test.c extern int F(int m); void func(int x) { int u = F(x); while (u) u = F(u)*3+1; } # gcc -o t.out test.c -c -O2 # objdump -d t.out t.out: file f

[Bug target/39942] Nonoptimal code - leaveq; xchg %ax,%ax; retq

2009-04-28 Thread vvv at ru dot ru
--- Comment #4 from vvv at ru dot ru 2009-04-28 17:15 --- Created an attachment (id=1) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=1&action=view) Simple example from Linux See two functions: static void pre_schedule_rt and static void switched_from_rt --

[Bug target/39942] Nonoptimal code - leaveq; xchg %ax,%ax; retq

2009-04-28 Thread vvv at ru dot ru
--- Comment #3 from vvv at ru dot ru 2009-04-28 17:10 --- Additional examples from Linux Kernel 2.6.29.1: (Note: a conditional statement at the end of all functions!) = linux/drivers/video/console/bitblit.c void fbcon_set_bitops(struct fbcon_ops *ops) { ops->bm

[Bug target/39942] Nonoptimal code - leaveq; xchg %ax,%ax; retq

2009-04-28 Thread vvv at ru dot ru
--- Comment #2 from vvv at ru dot ru 2009-04-28 17:04 --- Created an attachment (id=17776) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17776&action=view) Source file from Linux Kernel 2.6.29.1 See static void set_blitting_type -- http://gcc.gnu.org/bugzilla/show_bug

[Bug c/39942] New: Nonoptimal code - leaveq; xchg %ax,%ax; retq

2009-04-28 Thread vvv at ru dot ru
Summary: Nonoptimal code - leaveq; xchg %ax,%ax; retq Product: gcc Version: 4.3.2 Status: UNCONFIRMED Severity: minor Priority: P3 Component: c AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: vvv at ru dot ru

[Bug c/39549] New: Nonoptimal byte load. mov (%rdi),%al better then movzbl (%rdi),%eax

2009-03-24 Thread vvv at ru dot ru
Priority: P3 Component: c AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: vvv at ru dot ru http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39549

[Bug c/39520] New: Empty function translated to repz retq.

2009-03-22 Thread vvv at ru dot ru
ReportedBy: vvv at ru dot ru http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39520