ladimir Volynsky
--
Summary: Nonoptimal save/restore registers
Product: gcc
Version: 4.5.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
Repor
--- Comment #4 from vvv at ru dot ru 2009-05-25 19:54 ---
(In reply to comment #2)
> This is very odd? What is the assembler doing that the compiler isn't?
There exist some optimizations that are impossible without exact knowledge of
addresses and opcodes.
One example avoiding o
--- Comment #49 from vvv at ru dot ru 2009-05-20 21:38 ---
(In reply to comment #48)
How do these patches work? Do they require any special options?
# /media/disk-1/B/bin/gcc --version
gcc (GCC) 4.5.0 20090520 (experimental)
# cat test.c
void f(int i)
{
if (i == 1) F(1);
if
assembler!
Product: gcc
Version: 4.4.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: vvv at ru dot ru
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40171
--- Comment #34 from vvv at ru dot ru 2009-05-14 19:43 ---
(In reply to comment #32)
> Please make sure that you only test nop paddings for branch insns,
> not nop paddings for branch targets, which prefer 16byte alignment.
Results of additional tests (on Core 2):
1. Execution time
--- Comment #30 from vvv at ru dot ru 2009-05-14 09:01 ---
Created an attachment (id=17863)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17863&action=view)
Testing tool.
Here are the results of my testing.
Code:
align 128
test_cikl:
rept 14 ; 14 if SH=0, 15
--- Comment #28 from vvv at ru dot ru 2009-05-13 19:18 ---
(In reply to comment #24)
> Using padding to avoid 4 branches in 16byte chunk may not be a good idea since
> it will increase code size.
A single one-byte NOP per 16-byte chunk is enough for padding. But, IMHO, four
bra
--- Comment #26 from vvv at ru dot ru 2009-05-13 19:05 ---
(In reply to comment #23)
> Note that we need something that works for the generic model as well, which in
> this case looks like it is the same as for AMD models.
There is a processor property, TARGET_FOUR_JUMP_LIMIT,
--- Comment #25 from vvv at ru dot ru 2009-05-13 18:56 ---
(In reply to comment #22)
> CCing H.J for Intel optimization issues.
VVV> 1. AMD limitation for 16-byte page (memory range XXX0 - XXXF), but
VVV> Intel limitation for 16-byte chunk (memory range -
--- Comment #21 from vvv at ru dot ru 2009-05-13 17:13 ---
I guess! Your patch is absolutely correct for AMD Athlon 64 and AMD Opteron
processors, but it is suboptimal for Intel processors, because:
1. The AMD limitation applies to a 16-byte page (memory range XXX0 - XXXF), but
Intel
--- Comment #19 from vvv at ru dot ru 2009-05-13 11:42 ---
(In reply to comment #18)
> No, .p2align is the right thing to do, given that GCC doesn't have 100%
> accurate information about instruction sizes (for e.g. inline asms it can't
> have, for
> stuff where
--- Comment #17 from vvv at ru dot ru 2009-05-12 16:40 ---
(In reply to comment #16)
> Created an attachment (id=17783)
> --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17783&action=view)
> gcc45-pr39942.patch
> Patch that attempts to take into account .p2align
--- Comment #5 from vvv at ru dot ru 2009-05-10 18:20 ---
(In reply to comment #4)
> Well you need whole program to get the behavior which you want.
Yes. Of course, that is no problem for a small single-programmer project, but it
is a problem for big projects like the Linux kernel.
--- Comment #3 from vvv at ru dot ru 2009-05-10 18:08 ---
(In reply to comment #2)
> This should have been done already with cgraph order.
Unfortunately, I see the inverse order only within a single source file, and
even there the order is inverse, not optimized.
Example:
// file order1.c
#include
main(int a
--- Comment #1 from vvv at ru dot ru 2009-05-10 16:43 ---
Created an attachment (id=17847)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17847&action=view)
Example direct/inverse calls
A simple example: RDTSC ticks for direct and inverse sequences of calls.
--
ReportedBy: vvv at ru dot ru
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40093
--- Comment #3 from vvv at ru dot ru 2009-05-10 16:09 ---
> Not really, the move insn is moved to the beginning of the function:
> 0060 :
> 60: 89 f8                mov    %edi,%eax
> 62: 83 ff 08             cmp    $0x8,%edi
> 65:
--- Comment #1 from vvv at ru dot ru 2009-05-09 12:02 ---
There is no bug in the current trunk, so the bug is fixed.
--
vvv at ru dot ru changed:
What|Removed |Added
Status: UNCONFIRMED
Severity: minor
Priority: P3
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: vvv at ru dot ru
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40072
--- Comment #15 from vvv at ru dot ru 2009-04-29 19:16 ---
One more example: a 5-byte NOP between leaveq and retq.
# cat test.c
#include <stdio.h>

void wait_for_enter(void)
{
    int u = getchar();
    while (!u)
        u = getchar() - 13;
}

int main(void)
{
    wait_for_enter();
    return 0;
}
# gcc -o t.out test.c
--- Comment #12 from vvv at ru dot ru 2009-04-29 07:55 ---
(In reply to comment #9)
> So that explains it, Use -Os or attribute cold if you want NOPs to be gone.
But my measurements on Core 2 Duo P8600 show that
push %ebp
mov %esp,%ebp
leave
ret
_faster_ than
push %ebp
mov %
--- Comment #11 from vvv at ru dot ru 2009-04-29 07:46 ---
(In reply to comment #8)
> From config/i386/i386.c:
> /* AMD Athlon works faster
>when RET is not destination of conditional jump or directly preceded
>by other jump instruction. We avoid the penalty by i
--- Comment #6 from vvv at ru dot ru 2009-04-28 21:18 ---
Let's compile file test.c
//#file test.c
extern int F(int m);
void func(int x)
{
int u = F(x);
while (u)
u = F(u)*3+1;
}
# gcc -o t.out test.c -c -O2
# objdump -d t.out
t.out: file f
--- Comment #4 from vvv at ru dot ru 2009-04-28 17:15 ---
Created an attachment (id=1)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=1&action=view)
Simple example from Linux
See two functions:
static void pre_schedule_rt
static void switched_from_rt
--
--- Comment #3 from vvv at ru dot ru 2009-04-28 17:10 ---
Additional examples from Linux Kernel 2.6.29.1:
(Note: there is a conditional statement at the end of all these functions!)
=
linux/drivers/video/console/bitblit.c
void fbcon_set_bitops(struct fbcon_ops *ops)
{
ops->bm
--- Comment #2 from vvv at ru dot ru 2009-04-28 17:04 ---
Created an attachment (id=17776)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17776&action=view)
Source file from Linux Kernel 2.6.29.1
See static void set_blitting_type
--
http://gcc.gnu.org/bugzilla/show_bug
Summary: Nonoptimal code - leaveq; xchg %ax,%ax; retq
Product: gcc
Version: 4.3.2
Status: UNCONFIRMED
Severity: minor
Priority: P3
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: vvv at ru dot ru
Priority: P3
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: vvv at ru dot ru
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39549
ReportedBy: vvv at ru dot ru
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39520