> Hi, Jan!
> I was just preparing my version of the patch, but it seems a bit late
> now. Please see my comments to this and your previous letter below.
>
> By the way, would it be possible to commit the other part of the patch
> (middle-end part) - probably also in small parts - and some other
> tun
>
> There is a rolled-loop algorithm that doesn't use SSE modes - such
> architectures could use it instead of unrolled_loop. I think the
> performance wouldn't suffer much from that.
> For most modern processors, SSE moves are faster than several
> word-sized moves, so this change in unroll
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 2c53423..6ce240a 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -561,10 +561,14 @@ struct processor_costs ix86_size_cost = {/* costs for tuning for size */
COSTS_N_BYTES (2), /* cost of FAB
> I am going to benchmark the following hunk separately tonight. It is
> independent change.
You would probably need some changes from sse.md (for gen_sse2_loadq).
Michael
Hi,
I am going to benchmark the following hunk separately tonight. It is
independent change.
Rth, Vladimir: there are obviously several options for making GCC use SSE for
64bit loads/stores in 32bit codegen (and 128bit loads/stores in 128bit
codegen). What do you think is the best variant here?
(an
On 10/28/2011 05:41 AM, Michael Zolotukhin wrote:
>> > +/* Target hook. Returns rtx of mode MODE with promoted value VAL, that is
>> > + supposed to represent one byte. MODE could be a vector mode.
>> > + Example:
>> > + 1) VAL = const_int (0xAB), mode = SImode,
>> > + the result is const
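For readers unfamiliar with byte promotion for memset: a minimal sketch, assuming "promoted value" means the byte replicated into every byte of the wider mode. The helper names `promote_byte32` and `promote_byte64` are hypothetical and are not the target hook from the patch, which operates on RTL rather than plain integers.

```c
#include <assert.h>
#include <stdint.h>

/* Replicate the low byte of VAL into every byte of a 32-bit word.
   Hypothetical helper for illustration only; the hook in the patch
   builds an rtx, not a C integer.  */
static uint32_t
promote_byte32 (unsigned int val)
{
  /* Multiplying by 0x01010101 copies the byte into each byte lane;
     no carries occur because the byte is below 256.  */
  return (uint32_t) (val & 0xffu) * UINT32_C (0x01010101);
}

/* Same idea for a 64-bit word.  */
static uint64_t
promote_byte64 (unsigned int val)
{
  return (uint64_t) (val & 0xffu) * UINT64_C (0x0101010101010101);
}
```

A vector mode would be handled the same way in spirit, by broadcasting the byte to every element.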
Hi Jan!
Thanks for the review, you can find my answers to some of your
remarks below. I'll send a corrected patch soon with answers to the
rest of your remarks.
> - {{rep_prefix_1_byte, {{-1, rep_prefix_1_byte}}},
> + {{{rep_prefix_1_byte, {{-1, rep_prefix_1_byte}}},
>{rep_prefix_1_byte, {
Hi,
sorry for the delay with the review. This is my first pass through the backend
part, hopefully someone else will do the middle end bits.
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 2c53423..d7c4330 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -561,10
Any questions on these patches? Are they ok for the trunk?
On 20 October 2011 12:37, Michael Zolotukhin
wrote:
> And, finally, part with the tests.
>
> On 20 October 2011 12:36, Michael Zolotukhin
> wrote:
>> Back-end part of the patch is attached here.
>>
>> On 20 October 2011 12:35, Michael Zo
Middle-end part of the patch is attached.
On 20 October 2011 12:34, Michael Zolotukhin
wrote:
> I fixed the tests as well as updated my branch and fixed the bugs
> introduced during this process.
> Here is the fixed complete patch (other parts will be sent in subsequent letters).
>
> The changes passed b
Hi!
On Thu, Sep 29, 2011 at 03:14:40PM +0400, Michael Zolotukhin wrote:
+/* { dg-options "-O2 -march=atom -mtune=atom -m64 -dp" } */
The testcases are wrong, -m64 or -m32 should never appear in dg-options,
ins
> Sorry what I meant is that it would be bad if -mtune=corei7(-avx)? was
> slower than generic.
For now, -mtune=corei7 triggers use of the generic cost-table (I'm
not sure about corei7-avx, but I assume the same) - so it won't be
slower.
> Adding new tables shouldn't be very difficult, even if they
> Michael,
> Did you bootstrap with --enable-checking=yes? I am seeing the bootstrap
> failure...
I checked bootstrap, specs and 'make check' with the complete patch.
Separate patches for ME and BE were only tested for build (no
bootstrap) and 'make check'. I think it's better to apply the compl
On Wed, Sep 28, 2011 at 05:33:23PM +0400, Michael Zolotukhin wrote:
> > It appears that part 1 of the patch wasn't really attached.
> Thanks, resending.
Michael,
Did you bootstrap with --enable-checking=yes? I am seeing the bootstrap
failure...
/sw/src/fink.build/gcc47-4.7.0-1/darwin_objdir
On Wed, Sep 28, 2011 at 06:27:11PM +0200, Andi Kleen wrote:
> > There is no separate cost-table for Nehalem or SandyBridge - however,
> > I tuned generic32 and generic64 tables, that should improve
> > performance on modern processors. In old version REP-MOV was used - it
>
> The recommended heuri
> There is no separate cost-table for Nehalem or SandyBridge - however,
> I tuned generic32 and generic64 tables, that should improve
> performance on modern processors. In old version REP-MOV was used - it
The recommended heuristics have changed in Nehalem and Sandy-Bridge
over earlier Intel CPUs
> You could add a check to configure and generate based on that?
Do you mean check if glibc is newer than 2.13?
I think that when new glibc version is released, the tables should be
re-examined anyway - we shouldn't just stop inlining, or stop
generating libcalls.
> BTW I know that the tables need
On Wed, Sep 28, 2011 at 02:54:34PM +0200, Jan Hubicka wrote:
> > > Do you know glibc version numbers when
> > > the optimized string functions were introduced?
> >
> > Afaik, it's 2.13.
> > I also compared my implementation to 2.13.
>
> I wonder if we can assume that most of GCC 4.7 based systems
> It appears that part 1 of the patch wasn't really attached.
Thanks, resending.
memfunc-mid.patch
Description: Binary data
On Wed, Sep 28, 2011 at 02:56:30PM +0400, Michael Zolotukhin wrote:
> Attached is part 1 of the patch, which enables the use of vector instructions
> in memset and memcpy (middle-end part).
> The main part of the changes is in the functions
> move_by_pieces/set_by_pieces. In the new version, the algorithm of move-mode
> (I worry about the tables in i386.c deciding what strategy to use for block of
> given size. This is more or less unrelated to the actual patch)
Yep, the threshold values I mentioned above are the values in these
tables. Even with fast glibc there are some cases when inlining is
profitable (e.g.
> > Do you know glibc version numbers when
> > the optimized string functions were introduced?
>
> Afaik, it's 2.13.
> I also compared my implementation to 2.13.
I wonder if we can assume that most of GCC 4.7 based systems will be glibc 2.13
based, too. I would tend to say that yes and thus would
> Do you know glibc version numbers when
> the optimized string functions were introduced?
Afaik, it's 2.13.
I also compared my implementation to 2.13.
This expansion only applies to relatively small sizes (up to 4k), where the
overhead of a library call can be quite significant. In some cases the new
implementation gives a 5x speedup (especially on small sizes - less
than ~256 bytes). For almost all sizes from 16 to 4096 bytes there is
some gain, in av
> On Wed, Sep 28, 2011 at 04:41:47AM -0700, Andi Kleen wrote:
> > Michael Zolotukhin writes:
> > >
> > > Build and 'make check' was tested.
> >
> > Could you expand a bit on the performance benefits? Where does it help?
>
> Especially when glibc these days has very well optimized implementation
On Wed, Sep 28, 2011 at 04:41:47AM -0700, Andi Kleen wrote:
> Michael Zolotukhin writes:
> >
> > Build and 'make check' was tested.
>
> Could you expand a bit on the performance benefits? Where does it help?
Especially when glibc these days has very well optimized implementations
tuned for vari
Michael Zolotukhin writes:
>
> Build and 'make check' was tested.
Could you expand a bit on the performance benefits? Where does it help?
-Andi
--
a...@linux.intel.com -- Speaking for myself only
Attached is part 1 of the patch, which enables the use of vector instructions
in memset and memcpy (middle-end part).
The main part of the changes is in the functions
move_by_pieces/set_by_pieces. In the new version, the algorithm of move-mode
selection was changed: it now checks whether alignment is known at compile
time
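To make the idea of alignment-aware piece selection concrete, here is a rough sketch. It is not the code from the patch: `pick_piece_size` and `count_pieces` are hypothetical names, and the rule used (the widest power-of-two piece that fits the remaining size, the widest supported move, and the alignment known at compile time) is an assumption made for illustration. Real targets may also allow unaligned wide moves, which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: choose the widest power-of-two piece that
   (a) fits in the REMAINING bytes, (b) does not exceed MAX_PIECE,
   the widest move the target supports, and (c) does not exceed
   KNOWN_ALIGN, the alignment known at compile time.  */
static size_t
pick_piece_size (size_t remaining, size_t known_align, size_t max_piece)
{
  size_t piece = 1;

  assert (max_piece >= 1);
  while (piece * 2 <= remaining
         && piece * 2 <= max_piece
         && piece * 2 <= known_align)
    piece *= 2;
  return piece;
}

/* Count how many moves a block of SIZE bytes needs under that rule.  */
static size_t
count_pieces (size_t size, size_t known_align, size_t max_piece)
{
  size_t moves = 0;

  while (size > 0)
    {
      size -= pick_piece_size (size, known_align, max_piece);
      moves++;
    }
  return moves;
}
```

Under these assumptions, a 35-byte block with 16-byte known alignment and 16-byte moves takes four moves (16 + 16 + 2 + 1), while the same block with only byte alignment known takes 35 single-byte moves.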
Ping.
On 18 July 2011 15:00, Michael Zolotukhin
wrote:
> Here is a summary - probably, it doesn't cover every single piece in
> the patch, but I tried to describe the major changes. I hope this will
> help you a bit - and of course I'll answer your further questions if
> they appear.
>
> The chan
Any updates/questions on this?
On 18 July 2011 15:00, Michael Zolotukhin
wrote:
> Here is a summary - probably, it doesn't cover every single piece in
> the patch, but I tried to describe the major changes. I hope this will
> help you a bit - and of course I'll answer your further questions if
>
Here is a summary - probably, it doesn't cover every single piece in
the patch, but I tried to describe the major changes. I hope this will
help you a bit - and of course I'll answer your further questions if
they appear.
The changes could be logically divided into two parts (though, these
parts h
> > New algorithm for move-mode selection is implemented for move_by_pieces,
> > store_by_pieces.
> > x86-specific ix86_expand_movmem and ix86_expand_setmem are also changed in
> > similar way, x86 cost-models parameters are slightly changed to support
> > this. This implementation checks if array'
Hello!
> Please don't use -m32/-m64 in testcases directly.
> You should use
>
> /* { dg-do compile { target { ia32 } } } */
>
> for 32bit insns and
>
> /* { dg-do compile { target { ! ia32 } } } */
>
> for 64bit insns.
Also, there is no need to add -mtune if -march is already specified.
-mtune wi
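Putting the review comments on the testcases together, a test header could look roughly like the sketch below. The `ia32` effective target matches 32-bit x86, so `! ia32` restricts the test to non-32-bit targets such as x86-64 without putting -m64 in dg-options. The function `copy_block` and the choice of a 64-byte copy are hypothetical, and no `dg-final` scan is shown because the expected instruction patterns are not given in the thread.

```c
/* { dg-do compile { target { ! ia32 } } } */
/* { dg-options "-O2 -march=atom" } */

#include <string.h>

/* Hypothetical test body: a fixed-size copy that the compiler may
   expand inline instead of calling memcpy.  */
void
copy_block (char *dst, const char *src)
{
  memcpy (dst, src, 64);
}
```

Only -march is given here, since -mtune is not needed when -march is already specified.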
On Mon, Jul 11, 2011 at 1:57 PM, Michael Zolotukhin
wrote:
> Sorry, for sending once again - forgot to attach the patch.
>
> On 11 July 2011 23:50, Michael Zolotukhin
> wrote:
>> The attached patch enables use of vector instructions in memmov/memset
>> expanding.
>>
>> New algorithm for move-mode
Resending in plain text:
On 11 July 2011 23:50, Michael Zolotukhin
wrote:
>
> The attached patch enables use of vector instructions in memmov/memset
> expanding.
>
> New algorithm for move-mode selection is implemented for move_by_pieces,
> store_by_pieces.
> x86-specific ix86_expand_movmem and