from:"Matt"

[PATCH] fix 50148

2012-12-17 Thread Matt

Andrew attached a patch for this bug, which I am now hitting in trunk. Can 
someone please review and apply to trunk and 4.7 branch?


Thanks!

http://gcc.gnu.org/bugzilla/attachment.cgi?id=25070&action=diff

--- gcc/c-parser.c
+++ gcc/c-parser.c
-  VEC(tree,gc) *origtypes;
+  VEC(tree,gc) *origtypes = NULL;



--
tangled strands of DNA explain the way that I behave.
http://www.clock.org/~matt

Re:

2013-02-14 Thread Matt


On Thu, 14 Feb 2013, Xinliang David Li wrote:


Ok for the google branch -- please provide the patch details in svn
commit message (note that ChangeLog is not needed any more for the
branch).


I don't have commit access (yet). Should I email overse...@gcc.gnu.org as 
mentioned at http://gcc.gnu.org/svnwrite.html to get the ball rolling?







On Thu, Feb 14, 2013 at 11:53 AM, Matt Hargett  wrote:

On Feb 14, 2013, at 10:40 AM, Xinliang David Li  wrote:


On Thu, Feb 14, 2013 at 10:18 AM, Matt  wrote:

The attached patches do two things:
1. Backports a fix from trunk that eliminates bogus warning traces. On my
current codebase which links ~40MB of C++ with LTO, the bogus warning traces
are literally hundreds of lines.


What is the trunk revision?


Richard's original patch was committed to trunk in r195884.



I verified the backport fixed our issue by doing a profiledbootstrap
using the bootstrap-lto.mk config with -O3 added. I used the resulting
compiler on the proprietary codebase, C++Benchmark, scummvm, and a few other
open source projects to validate.

2. Our primary development platform is RHEL6.1-based, and the recent
autoconf requirement bump locked us out. I lowered the version, and saw no
difference in ability to configure/bootstrap.

Thanks!


--
tangled strands of DNA explain the way that I behave.
http://www.clock.org/~matt

Re: [PATCH] Add capability to run several iterations of early optimizations

2011-10-27 Thread Matt

default of three iterations to make the 
typical use of Factory pattern devirtualize correctly still resulted in 
improved performance over a single pass -- just not necessarily a smaller 
binary.




--
tangled strands of DNA explain the way that I behave.
http://www.clock.org/~matt

Re: [PATCH] Add capability to run several iterations of early optimizations

2011-10-28 Thread Matt


On Fri, 28 Oct 2011, Richard Guenther wrote:


I discussed the idea of iterating early optimizations shortly with Honza.
I was trying to step back a bit and look at what we try to do right now,
which is, optimize functions in topological order (thus, try to make sure
all callees are already early optimized when optimizing callers).  That
of course is difficult when there are cycles in the cgraph (for which we
basically start at a random place), especially when there would be
may-edges (which we do not have) for indirect/virtual calls, as they
basically make the whole cgraph cyclic.  So my idea was to make
cycle processing more explicit in early optimizations, and, whenever
we discover a new direct cgraph edge make sure we optimize the
callee, and whenever we optimized a callee queue all callers for
re-optimization.  You of course have to limit the number of times you
want to process a function, otherwise for a cycle, you'd optimize
indefinitely.  We already do the inlining itself repeatedly (via
--param early-inliner-max-iterations), though that only iterates
the inlining itself, allowing for "deep" inlining and some cases
of inlining of indirect calls if the inliner substituted a function
pointer parameter into a call of the inlined function.  Moving that
iteration over to iterating over the optimizations could make sense.


Inverting the patch's current approach makes sense to me, and may be 
preferable from a usability perspective. That is, since each iteration 
increases compile time the user is indirectly specifying how long they're 
willing to give the compiler to get the best code. That being said, I'd be 
curious what the "maximum" number of iterations would be on something like 
Firefox/WebKit when compiled with LTO. On the proprietary codebase we 
tested on, there were still things being discovered at 120+ iterations 
(~100KLOC, compiled with mega-compilation aka the poor-man's LTO).


This could actually speed things up quite a bit if it means that we would 
only re-run the early passes on the functions whose call-chain had some 
optimization applied. That is, that the parts of the code being 
re-analyzed/optimized would narrow with each iteration, potentially 
reducing the overhead of multiple passes in the first place.




Thus, I'd really like to at least make iterating depend on some
profitability analysis, even if it is only based on cgraph analysis
such as 'we discovered a new direct edge'.


That makes sense to me, but then again I'm not implementing it ;) FWIW, I 
implemented a similar rule in a static analysis product I wrote a few 
years ago -- if no new information was discovered on a given bottom-up 
pass, move onto the next analysis.
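
To make the worklist idea above concrete, here is a minimal sketch in C. It is only an illustration under stated assumptions, not code from the patch or from GCC: `struct func`, `optimize_once`, `iterate_early_opts`, and `demo_cycle` are hypothetical names, and "optimization" is modeled as a counter of remaining improvements. A function's callers are requeued only when the function actually changed, and a per-function visit cap keeps cycles from iterating forever.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_FUNCS 64

/* Hypothetical stand-in for a call-graph node; this is not GCC's
   cgraph_node.  */
struct func
{
  int n_callers;
  int callers[MAX_FUNCS];    /* Indices of the functions calling this one.  */
  int times_processed;
  int pending_improvements;  /* Toy model: visits that still find something.  */
};

/* Toy "early optimization" step.  Returns true if the function changed.  */
static bool
optimize_once (struct func *f)
{
  f->times_processed++;
  if (f->pending_improvements > 0)
    {
      f->pending_improvements--;
      return true;
    }
  return false;
}

/* Visit every function once, then revisit only callers of functions that
   changed.  MAX_VISITS bounds the work per function so that cycles
   terminate.  Returns the total number of optimization runs.  */
static int
iterate_early_opts (struct func *funcs, int n, int max_visits)
{
  int queue[MAX_FUNCS];
  bool queued[MAX_FUNCS] = { false };
  int head = 0, count = 0, total = 0;

  assert (n >= 0 && n <= MAX_FUNCS);

  for (int i = 0; i < n; i++)
    {
      queue[count++] = i;
      queued[i] = true;
    }

  while (count > 0)
    {
      int cur = queue[head];
      head = (head + 1) % MAX_FUNCS;
      count--;
      queued[cur] = false;

      if (funcs[cur].times_processed >= max_visits)
        continue;

      total++;
      if (!optimize_once (&funcs[cur]))
        continue;  /* Nothing new was discovered: do not requeue callers.  */

      for (int k = 0; k < funcs[cur].n_callers; k++)
        {
          int caller = funcs[cur].callers[k];
          if (!queued[caller])
            {
              queue[(head + count) % MAX_FUNCS] = caller;
              count++;
              queued[caller] = true;
            }
        }
    }
  return total;
}

/* Usage example: two mutually recursive functions, each with one
   improvement left to find.  */
static int
demo_cycle (int max_visits)
{
  struct func f[2] = { { 0 } };
  f[0].n_callers = 1; f[0].callers[0] = 1; f[0].pending_improvements = 1;
  f[1].n_callers = 1; f[1].callers[0] = 0; f[1].pending_improvements = 1;
  return iterate_early_opts (f, 2, max_visits);
}
```

In this toy model the set of functions being re-processed shrinks as improvements run out, which is the narrowing effect described above. A real profitability test (a new direct edge, or a size/time change above some threshold) would replace the boolean returned by `optimize_once`.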


Thanks again for taking the time to work through this!


--
tangled strands of DNA explain the way that I behave.
http://www.clock.org/~matt

Re: [PATCH] Add capability to run several iterations of early optimizations

2011-10-28 Thread Matt


On Sat, 29 Oct 2011, Maxim Kuvyrkov wrote:


I like this variant a lot better than the last one - still it lacks any
analysis-based justification for iteration (see my reply to Matt on
what I discussed with Honza).


Yes, having a way to tell whether a function has significantly changed 
would be awesome.  My approach here would be to make inline_parameters 
output feedback of how much the size/time metrics have changed for a 
function since previous run.  If the change is above X%, then queue 
function's callers for more optimizations.  Similarly, Martin's 
rebuild_cgraph_edges_and_devirt (when that goes into trunk) could queue 
new direct callees and current function for another iteration if new 
direct edges were resolved.


Figuring out the heuristic will need decent testing on a few projects to 
figure out what the "sweet spot" is (smallest binary for time/passes 
spent) for that given codebase. With a few data points, a reasonable stab 
at the metrics you mention can be had that would not terminate the 
iterations before the known optimal number of passes. Without those data 
points, it seems like making sure the metrics allow those "sweet spots" to 
be attained will be difficult.



 Thus, I don't think we want to
merge this in its current form or in this stage1.


What is the benefit of pushing this to a later release?  If anything, 
merging the support for iterative optimizations now will allow us to 
consider adding the wonderful smartness to it later.  In the meantime, 
substituting that smartness with a knob is still a great alternative.


I agree (of course). Having the knob will be very useful for testing and 
determining the acceptance criteria for the later "smartness". While 
terminating early would be a nice optimization, the feature is still 
intrinsically useful and deployable without it. In addition, when using 
LTO on nearly all the projects/modules I tested on, 3+ passes were 
always productive. To be fair, when not using LTO, beyond 2-3 passes did 
not often produce improvements unless individual compilation units were 
enormous.


There was also the question of whether some of the improvements seen with 
multiple passes were indicative of deficiencies in early inlining, CFG, 
SRA, etc. If the knob is available, I'm happy to continue testing on the 
same projects I've filed recent LTO/graphite bugs against (glib, zlib, 
openssl, scummvm, binutils, etc) and write a report on what I observe as 
"suspicious" improvements that perhaps should be caught/made in a single 
pass.


It's worth noting again that while this is a useful feature in and of 
itself (especially when combined with LTO), it's *extremely* useful when 
coupled with the de-virtualization improvements submitted in other 
threads. The examples submitted for inclusion in the test suite aren't 
academic -- they are reductions of real-world performance issues from a 
mature (and shipping) C++-based networking product. Any C++ codebase that 
employs physical separation in their designs via Factory patterns, 
Interface Segregation, and/or Dependency Inversion will likely see 
improvements. To me, these enhancements combine to form one of the biggest 
leaps I've seen in C++ code optimization -- code that can be clean, OO, 
*and* fast.


Richard: If there's any additional testing or information I can reasonably 
provide to help get this in for this stage1, let me know.


Thanks!


--
tangled strands of DNA explain the way that I behave.
http://www.clock.org/~matt

Re: [RFC 0/3] Stuff related to pr53533

2012-06-19 Thread Matt

On 2012-06-15 13:57, Richard Henderson wrote:

> Bootstrapped and tested on x86_64, but I'll leave some time for
> comment before committing any of this.

Patches now committed.

Hey Richard,

Thanks for taking on some of these issues. I'm not seeing much of an 
improvement yet when manually applying the patches to 4.7, but it looks 
like steps in the right direction. Having to turn off vectorization to 
approximate previous compiler performance was disappointing given it's 
supposed to give us a boost on some of these architectures ;)

Would it be possible to commit these to 4_7-branch as well? (One of the 
patches looks relevant to 4.6 as well, and applied cleanly, but I haven't 
tested to see if it had a noticeable effect.)

Thanks again!

--
tangled strands of DNA explain the way that I behave.
http://www.clock.org/~matt

Re: [PR 47382] We cannot simply fold OBJ_TYPE_REF at all in 4.6

2011-11-29 Thread Matt


Hi,


> Martin, do you plan to have this pushed in for GCC 4.7?



well, there were two patches.  I have managed to update and push
through one of them in time
(http://gcc.gnu.org/ml/gcc-patches/2011-11/msg00086.html) but
unfortunately I have not managed to do the same with the second one.
Its recent incarnation is here:



BTW, it looked like Maxim's patch for devirtualization through pointers 
was left out of that first patch. Was that intentional? It seemed like it 
should have been, given the discussion in the thread.




http://gcc.gnu.org/ml/gcc-patches/2011-11/msg00095.html


I noticed Richard had a dangling question on that thread, in case you 
missed it.



However, since the stage 1 ended and I still wasn't able to
demonstrate any real impact anywhere (other than my semi-silly example
attached to that patch), I gave up.  It is unfortunate but I also had


In the local set of patches that made such an impact on the code quality 
tests for devirtualization (posted here 
http://gcc.gnu.org/ml/gcc-patches/2011-10/msg02589.html), this remaining 
patch does make a notable difference when combined with the others.


Even if the other patches that synergize with that one can't make it for 
4.7, I still think it would be valuable to get incremental feedback and 
testing on as many aspects as we can for 4.7 so that integration of 
further efforts in the 4.8 timeframe is better informed by bug reports, 
etc.



other pressing tasks and the patch does not do anything on simple
programs and I have not been able to compile Firefox even without LTO
with the current trunk to try it on something complex.


I know how that is ;) It is frustrating that Firefox, which is what we 
were asked to benchmark these devirt improvements on, has failed to 
compile at all since we were asked for said benchmarks. (I had a week set 
aside to do nothing but testing and benchmarking, but could only test 
multiple-iterations with C code due to the state of trunk at that time.)


The devirt code quality tests are derived from real-world code, and we 
verified that the real-world code was optimized successfully in-context 
once the test was passing (using a 4.6-based toolchain with Maxim's 
patches). As for open source projects to demonstrate on, scummvm is also a 
good candidate for these due to its use of pure virtual classes and 
factory patterns, but it hasn't compiled with LTO in a while, either 
(bugs posted a few weeks ago). I had previously benchmarked libv8, which 
also makes use of pure virtual base classes, but that didn't compile due 
to an ICE the last time I tested.


If you have suggestions for an open source C++ codebase that will compile 
with LTO and current trunk, let me know, and I'll test it to try and show 
differences without this patch.




PS: Thanks for all your work on devirtualization -- it made it much easier 
to get the additional development and testing funded, which has resulted 
in amazing improvements to code generation in the primary C++ codebase I 
work on.



--
http://themakingofthemakingof.com

Re: [Ping^2 PATCH] VAX: Fix ICE during operand output

2013-09-13 Thread Matt Thomas


On Sep 13, 2013, at 4:21 AM, Jan-Benedict Glaw  wrote:

> On Wed, 2013-07-31 18:34:26 +0200, Jan-Benedict Glaw  
> wrote:
>> We've seen ICEs while outputting an operand (not even the excessive
>> CISC of a VAX could do that), which should be fixed by this patch:
>> 
>> 2013-07-31  Jan-Benedict Glaw  
>> 
>>  * config/vax/constraints.md (T): Add missing CONSTANT_P check.
>> 
>> diff --git a/gcc/config/vax/constraints.md b/gcc/config/vax/constraints.md
>> index a4774d4..66d6bf0 100644
>> --- a/gcc/config/vax/constraints.md
>> +++ b/gcc/config/vax/constraints.md
>> @@ -114,5 +114,6 @@
>> 
>> (define_constraint "T"
>> "@internal satisfies CONSTANT_P and, if pic is enabled, is not a 
>> SYMBOL_REF, LABEL_REF, or CONST."
>> -   (ior (not (match_code "const,symbol_ref,label_ref"))
>> -(match_test "!flag_pic")))
>> +  (and (match_test ("CONSTANT_P (op)"))
>> +   (ior (not (match_code "symbol_ref,label_ref,const"))
>> +(match_test "!flag_pic"))))
>> 
>> Even the description got it right :)  Thanks to Will Deacon for
>> debugging this.
>> 
>> Ok?

Yes.

Re: [PATCH] target/56875: Work around buggy GAS

2013-09-20 Thread Matt Thomas


On Sep 20, 2013, at 9:58 AM, Jan-Benedict Glaw  wrote:

> Hi!
> 
> VAX GAS has a glitch when generating a 64bit value from a small
> negative integer, which isn't properly sign-extended. (I'll see if
> this can be fixed without breaking other cases.)
> 
> However, GCC should work around this by simply using the already
> prepared formatting operand code 'D'.  The patch also introduces a new
> vax.exp fragment (under gcc.target).
> 
> Ok?

Yes.

Re: [PATCH] Disable updating VRSAVE everywhere except Darwin

2012-09-29 Thread Matt Thomas


On Sep 29, 2012, at 8:08 AM, Segher Boessenkool wrote:

>> The following proposed patch disables setting, saving and restoring
>> the VRSAVE register on all targets except Darwin.
>> 
>> VRSAVE was removed from the AIX ABI and was supposed to have been
>> removed from the PPC SVR4 ABI.  All recent versions of the Linux
>> kernel set and maintain VRSAVE itself, as a process-level flag, not as
>> individual bits, so no need for the compiler to set the register or to
>> save and restore it across calls.  All uses of VRSAVE (e.g., GLibc)
>> will continue to work using the value set by the kernel.
> 
> I don't think you can assume all embedded users do not use VRSAVE (or
> even the majority).  And what about *BSD?

NetBSD does nothing with VRSAVE except saving and restoring it
on exceptions.

Re: wide-int, vax

2013-12-04 Thread Matt Thomas


On Nov 23, 2013, at 11:23 AM, Mike Stump  wrote:

> Richi has asked that we break the wide-int patch so that the individual port 
> and front end maintainers can review their parts without having to go through 
> the entire patch.  This patch covers the vax port.
> 
> Ok?

OK.

[PATCH] driver: Also prune joined switches with negation

2019-09-23 Thread Matt Turner

When -march=native is passed to host_detect_local_cpu to the backend,
it overrides all command lines after it.  That means

$ gcc -march=native -march=armv8-a

is treated as

$ gcc -march=armv8-a -march=native

Prune joined switches with Negative and RejectNegative to allow
-march=armv8-a to override previous -march=native on command-line.

This is the same fix as was applied for i386 in SVN revision 269164 but for
aarch64 and arm.

gcc/

PR driver/69471
* config/aarch64/aarch64.opt (march=): Add Negative(march=).
(mtune=): Add Negative(mtune=).
* config/arm/arm.opt: Likewise.
---
 gcc/config/aarch64/aarch64.opt | 5 +++--
 gcc/config/arm/arm.opt | 4 ++--
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 865b6a6d8ca..908dca23b3c 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -119,7 +119,8 @@ EnumValue
 Enum(aarch64_tls_size) String(48) Value(48)
 
 march=
-Target RejectNegative ToLower Joined Var(aarch64_arch_string)
+Target RejectNegative Negative(march=) ToLower Joined Var(aarch64_arch_string)
+
 Use features of architecture ARCH.
 
 mcpu=
@@ -127,7 +128,7 @@ Target RejectNegative ToLower Joined Var(aarch64_cpu_string)
 Use features of and optimize for CPU.
 
 mtune=
-Target RejectNegative ToLower Joined Var(aarch64_tune_string)
+Target RejectNegative Negative(mtune=) ToLower Joined Var(aarch64_tune_string)
 Optimize for CPU.
 
 mabi=
diff --git a/gcc/config/arm/arm.opt b/gcc/config/arm/arm.opt
index 452f0cf6d67..e3ead5c95d1 100644
--- a/gcc/config/arm/arm.opt
+++ b/gcc/config/arm/arm.opt
@@ -82,7 +82,7 @@ mapcs-stack-check
 Target Report Mask(APCS_STACK) Undocumented
 
 march=
-Target RejectNegative ToLower Joined Var(arm_arch_string)
+Target RejectNegative Negative(march=) ToLower Joined Var(arm_arch_string)
 Specify the name of the target architecture.
 
 ; Other arm_arch values are loaded from arm-tables.opt
@@ -232,7 +232,7 @@ Target Report Mask(TPCS_LEAF_FRAME)
 Thumb: Generate (leaf) stack frames even if not needed.
 
 mtune=
-Target RejectNegative ToLower Joined Var(arm_tune_string)
+Target RejectNegative Negative(mtune=) ToLower Joined Var(arm_tune_string)
 Tune code for the given processor.
 
 mprint-tune-info
-- 
2.21.0
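
The effect the commit message above asks for, where a later `-march=` on the command line replaces an earlier one, can be sketched in C as follows. This is only an illustration of the "last one wins" behavior under stated assumptions; it is not the driver's actual pruning code. The names `self_negating`, `self_negating_prefix`, and `prune_switches` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical list of joined switches that cancel earlier uses of
   themselves, mirroring entries such as Negative(march=) in an .opt file.  */
static const char *const self_negating[] = { "-march=", "-mcpu=", "-mtune=" };

/* Return the matching prefix if ARG is one of the switches above, else NULL.  */
static const char *
self_negating_prefix (const char *arg)
{
  for (size_t i = 0; i < sizeof self_negating / sizeof self_negating[0]; i++)
    if (strncmp (arg, self_negating[i], strlen (self_negating[i])) == 0)
      return self_negating[i];
  return NULL;
}

/* Compact ARGV in place so that only the last occurrence of each
   self-negating switch survives.  Returns the new argument count.  */
static int
prune_switches (int argc, const char **argv)
{
  int out = 0;

  assert (argc >= 0);

  for (int i = 0; i < argc; i++)
    {
      const char *prefix = self_negating_prefix (argv[i]);
      bool overridden = false;

      if (prefix != NULL)
        for (int j = i + 1; j < argc; j++)
          if (strncmp (argv[j], prefix, strlen (prefix)) == 0)
            {
              overridden = true;
              break;
            }

      if (!overridden)
        argv[out++] = argv[i];
    }
  return out;
}
```

With the arguments `gcc -march=native -O2 -march=armv8-a`, this sketch drops `-march=native` before anything is expanded, so the explicit `-march=armv8-a` is the only architecture switch left.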

[PATCH] driver: Also prune joined switches with negation

2019-09-24 Thread Matt Turner

When -march=native is passed to host_detect_local_cpu to the backend,
it overrides all command lines after it.  That means

$ gcc -march=native -march=armv8-a

is treated as

$ gcc -march=armv8-a -march=native

Prune joined switches with Negative and RejectNegative to allow
-march=armv8-a to override previous -march=native on command-line.

This is the same fix as was applied for i386 in SVN revision 269164 but for
aarch64 and arm.

gcc/

PR driver/69471
* config/aarch64/aarch64.opt (march=): Add Negative(march=).
(mtune=): Add Negative(mtune=). (mcpu=): Add Negative(mcpu=).
* config/arm/arm.opt: Likewise.
---
 gcc/config/aarch64/aarch64.opt | 6 +++---
 gcc/config/arm/arm.opt | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 865b6a6d8ca..fc43428b32a 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -119,15 +119,15 @@ EnumValue
 Enum(aarch64_tls_size) String(48) Value(48)
 
 march=
-Target RejectNegative ToLower Joined Var(aarch64_arch_string)
+Target RejectNegative Negative(march=) ToLower Joined Var(aarch64_arch_string)
 Use features of architecture ARCH.
 
 mcpu=
-Target RejectNegative ToLower Joined Var(aarch64_cpu_string)
+Target RejectNegative Negative(mcpu=) ToLower Joined Var(aarch64_cpu_string)
 Use features of and optimize for CPU.
 
 mtune=
-Target RejectNegative ToLower Joined Var(aarch64_tune_string)
+Target RejectNegative Negative(mtune=) ToLower Joined Var(aarch64_tune_string)
 Optimize for CPU.
 
 mabi=
diff --git a/gcc/config/arm/arm.opt b/gcc/config/arm/arm.opt
index 452f0cf6d67..76c10ab62a2 100644
--- a/gcc/config/arm/arm.opt
+++ b/gcc/config/arm/arm.opt
@@ -82,7 +82,7 @@ mapcs-stack-check
 Target Report Mask(APCS_STACK) Undocumented
 
 march=
-Target RejectNegative ToLower Joined Var(arm_arch_string)
+Target RejectNegative Negative(march=) ToLower Joined Var(arm_arch_string)
 Specify the name of the target architecture.
 
 ; Other arm_arch values are loaded from arm-tables.opt
@@ -107,7 +107,7 @@ Target Report Mask(CALLER_INTERWORKING)
 Thumb: Assume function pointers may go to non-Thumb aware code.
 
 mcpu=
-Target RejectNegative ToLower Joined Var(arm_cpu_string)
+Target RejectNegative Negative(mcpu=) ToLower Joined Var(arm_cpu_string)
 Specify the name of the target CPU.
 
 mfloat-abi=
@@ -232,7 +232,7 @@ Target Report Mask(TPCS_LEAF_FRAME)
 Thumb: Generate (leaf) stack frames even if not needed.
 
 mtune=
-Target RejectNegative ToLower Joined Var(arm_tune_string)
+Target RejectNegative Negative(mtune=) ToLower Joined Var(arm_tune_string)
 Tune code for the given processor.
 
 mprint-tune-info
-- 
2.21.0

Re: [PATCH] driver: Also prune joined switches with negation

2019-09-24 Thread Matt Turner

On Tue, Sep 24, 2019 at 1:24 AM Kyrill Tkachov
 wrote:
>
> Hi Matt,
>
> On 9/24/19 5:04 AM, Matt Turner wrote:
> > When -march=native is passed to host_detect_local_cpu to the backend,
> > it overrides all command lines after it.  That means
> >
> > $ gcc -march=native -march=armv8-a
> >
> > is treated as
> >
> > $ gcc -march=armv8-a -march=native
> >
> > Prune joined switches with Negative and RejectNegative to allow
> > -march=armv8-a to override previous -march=native on command-line.
> >
> > This is the same fix as was applied for i386 in SVN revision 269164
> > but for
> > aarch64 and arm.
> >
> The fix is ok for arm and LGTM for aarch64 FWIW.

Thanks!

> How has this been tested?

The problem was noticed in this bug report:

   https://bugs.gentoo.org/693522

I remembered seeing the i386 fix and I separately encountered the
problem on ARM when building the pixman library which has iwMMXt code
which requires march=iwmmxt (Could I bribe someone into fixing that by
giving gcc an -miwmmxt flag?)

I verified the fix works by patching gcc and seeing that nss (the
package from the Gentoo bug report) successfully builds with
CFLAGS="-march=native -O2 -pipe"

SVN revision 269164 also added some tests to the gcc test suite, but I
am not sufficiently familiar with building gcc and running the test
suite to verify that any test I speculatively add actually works.

> However...
>
>
> > gcc/
> >
> > PR driver/69471
> > * config/aarch64/aarch64.opt (march=): Add Negative(march=).
> > (mtune=): Add Negative(mtune=).
> > * config/arm/arm.opt: Likewise.
> > ---
> >  gcc/config/aarch64/aarch64.opt | 5 +++--
> >  gcc/config/arm/arm.opt | 4 ++--
> >  2 files changed, 5 insertions(+), 4 deletions(-)
> >
> > diff --git a/gcc/config/aarch64/aarch64.opt
> > b/gcc/config/aarch64/aarch64.opt
> > index 865b6a6d8ca..908dca23b3c 100644
> > --- a/gcc/config/aarch64/aarch64.opt
> > +++ b/gcc/config/aarch64/aarch64.opt
> > @@ -119,7 +119,8 @@ EnumValue
> >  Enum(aarch64_tls_size) String(48) Value(48)
> >
> >  march=
> > -Target RejectNegative ToLower Joined Var(aarch64_arch_string)
> > +Target RejectNegative Negative(march=) ToLower Joined
> > Var(aarch64_arch_string)
> > +
> >  Use features of architecture ARCH.
> >
> >  mcpu=
>
>
> ... Looks like we'll need something similar for -mcpu. On arm and
> aarch64 the -mcpu is the most commonly used option and that can also
> take a "native" value that would suffer from the same issue I presume.

Thank you. I've sent a second version with this addressed in reply to
my initial patch.

If the patch is okay, I think we'd appreciate it if it were backported
to the gcc-8 branch as well.

[PATCH 1/2] i386: Consider Kaby Lake to be equivalent to Skylake

2017-06-16 Thread Matt Turner

Currently -march=native selects -march=broadwell on Kaby Lake systems,
since its model numbers are missing from the switch statement. It falls
back to the default case and chooses -march=broadwell because of the
presence of the ADX instruction set.

gcc/
* config/i386/driver-i386.c (host_detect_local_cpu): Add Kaby
Lake models to skylake case.

gcc/testsuite/

* gcc.target/i386/builtin_target.c: Add Kaby Lake models to
skylake check.

libgcc/

* config/i386/cpuinfo.c (get_intel_cpu): Add Kaby Lake models to
skylake case.
---
 gcc/config/i386/driver-i386.c  | 3 +++
 gcc/testsuite/gcc.target/i386/builtin_target.c | 3 +++
 libgcc/config/i386/cpuinfo.c   | 3 +++
 3 files changed, 9 insertions(+)

diff --git a/gcc/config/i386/driver-i386.c b/gcc/config/i386/driver-i386.c
index 6c812514239..09faad0af0e 100644
--- a/gcc/config/i386/driver-i386.c
+++ b/gcc/config/i386/driver-i386.c
@@ -781,6 +781,9 @@ const char *host_detect_local_cpu (int argc, const char **argv)
case 0x4e:
case 0x5e:
  /* Skylake.  */
+   case 0x8e:
+   case 0x9e:
+ /* Kaby Lake.  */
  cpu = "skylake";
  break;
case 0x57:
diff --git a/gcc/testsuite/gcc.target/i386/builtin_target.c b/gcc/testsuite/gcc.target/i386/builtin_target.c
index 374f0292453..9c190eb7ebc 100644
--- a/gcc/testsuite/gcc.target/i386/builtin_target.c
+++ b/gcc/testsuite/gcc.target/i386/builtin_target.c
@@ -88,6 +88,9 @@ check_intel_cpu_model (unsigned int family, unsigned int model,
case 0x4e:
case 0x5e:
  /* Skylake.  */
+   case 0x8e:
+   case 0x9e:
+ /* Kaby Lake.  */
  assert (__builtin_cpu_is ("corei7"));
  assert (__builtin_cpu_is ("skylake"));
  break;
diff --git a/libgcc/config/i386/cpuinfo.c b/libgcc/config/i386/cpuinfo.c
index a1dc011525f..b008fb6e396 100644
--- a/libgcc/config/i386/cpuinfo.c
+++ b/libgcc/config/i386/cpuinfo.c
@@ -183,6 +183,9 @@ get_intel_cpu (unsigned int family, unsigned int model, unsigned int brand_id)
case 0x4e:
case 0x5e:
  /* Skylake.  */
+   case 0x8e:
+   case 0x9e:
+ /* Kaby Lake.  */
  __cpu_model.__cpu_type = INTEL_COREI7;
  __cpu_model.__cpu_subtype = INTEL_COREI7_SKYLAKE;
  break;
-- 
2.13.0

[PATCH 2/2] i386: Assume Skylake for unknown models with clflushopt

2017-06-16 Thread Matt Turner

gcc/
* config/i386/driver-i386.c (host_detect_local_cpu): Assume
skylake for unknown models with clflushopt.
---
 gcc/config/i386/driver-i386.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/config/i386/driver-i386.c b/gcc/config/i386/driver-i386.c
index 09faad0af0e..570c49031bd 100644
--- a/gcc/config/i386/driver-i386.c
+++ b/gcc/config/i386/driver-i386.c
@@ -797,6 +797,9 @@ const char *host_detect_local_cpu (int argc, const char **argv)
  /* Assume Knights Landing.  */
  if (has_avx512f)
cpu = "knl";
+ /* Assume Skylake.  */
+ else if (has_clflushopt)
+   cpu = "skylake";
  /* Assume Broadwell.  */
  else if (has_adx)
cpu = "broadwell";
-- 
2.13.0
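
Taken together, the two patches above change the decision roughly as in the following C sketch. It is a reduced illustration, not the real `host_detect_local_cpu`: the function name `guess_intel_cpu` and the `"unknown"` result are assumptions made for this example, and the remaining model numbers and fallbacks of the real function are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical reduction of the Intel model switch: known model numbers are
   matched first, and unknown models fall back to feature-based guesses.  */
static const char *
guess_intel_cpu (unsigned int model, bool has_avx512f, bool has_clflushopt,
                 bool has_adx)
{
  switch (model)
    {
    case 0x4e:
    case 0x5e:
      /* Skylake.  */
    case 0x8e:
    case 0x9e:
      /* Kaby Lake, treated as Skylake (patch 1/2).  */
      return "skylake";
    default:
      if (has_avx512f)
        return "knl";
      if (has_clflushopt)
        return "skylake";   /* New fallback (patch 2/2).  */
      if (has_adx)
        return "broadwell";
      return "unknown";     /* Placeholder for the fallbacks omitted here.  */
    }
}

/* Usage example covering the cases discussed in the two patches.  */
static void
check_kaby_lake_cases (void)
{
  /* A Kaby Lake model number now maps to skylake directly.  */
  assert (strcmp (guess_intel_cpu (0x9e, false, true, true), "skylake") == 0);
  /* An unknown model with clflushopt is assumed to be Skylake.  */
  assert (strcmp (guess_intel_cpu (0xff, false, true, true), "skylake") == 0);
  /* Without clflushopt, ADX alone still selects broadwell.  */
  assert (strcmp (guess_intel_cpu (0xff, false, false, true),
                  "broadwell") == 0);
}
```

The order of the feature checks matters: a CPU with clflushopt also has ADX, so the clflushopt test has to come before the ADX test for the new fallback to take effect.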

Re: [PATCH 1/2] i386: Consider Kaby Lake to be equivalent to Skylake

2017-06-22 Thread Matt Turner

On Sun, Jun 18, 2017 at 10:56 AM, Uros Bizjak  wrote:
> On Fri, Jun 16, 2017 at 11:42 PM, Matt Turner  wrote:
>> Currently -march=native selects -march=broadwell on Kaby Lake systems,
>> since its model numbers are missing from the switch statement. It falls
>> back to the default case and chooses -march=broadwell because of the
>> presence of the ADX instruction set.
>>
>> gcc/
>> * config/i386/driver-i386.c (host_detect_local_cpu): Add Kaby
>> Lake models to skylake case.
>>
>> gcc/testsuite/
>>
>> * gcc.target/i386/builtin_target.c: Add Kaby Lake models to
>> skylake check.
>>
>> libgcc/
>>
>> * config/i386/cpuinfo.c (get_intel_cpu): Add Kaby Lake models to
>> skylake case.
>
> OK.
>
> Thanks,
> Uros.

Thank you very much. I do not have write access, so please check the
patches in for me if you would not mind.

Re: [patch] Clean up detection of SJLJ exceptions in target libraries

2015-05-13 Thread Matt Breedlove

This patch fixes an issue preventing mingw-w64 i686 dwarf2-eh
bootstrapping described at:

http://sourceforge.net/p/mingw-w64/mailman/message/34101954/

I'm assuming this has more to do with switching away from the current
sjlj configuration method since configuring gcc with
"--disable-sjlj-exceptions --with-dwarf2" still suffers the same
issues.  Building with simply "--with-dwarf2" instead, however, now
works fine.  I'm not sure whether or not a bug has been created for it
and if one needs to be.

Much appreciated,
Matt

On Wed, May 13, 2015 at 6:36 AM, Jonathan Wakely  wrote:
> On 12/05/15 18:42 +0200, Eric Botcazou wrote:
>>
>> libstdc++-v3/
>> * acinclude.m4 (GLIBCXX_ENABLE_SJLJ_EXCEPTIONS): Delete.
>> * configure.ac: Remove GLIBCXX_ENABLE_SJLJ_EXCEPTIONS.
>> * config.h.in: Regenerate.
>> * configure: Likewise.
>> * libsupc++/eh_personality.cc: Replace _GLIBCXX_SJLJ_EXCEPTIONS by
>> __USING_SJLJ_EXCEPTIONS__.
>> * libsupc++/eh_throw.cc: Likewise.
>> * libsupc++/eh_ptr.cc: Likewise.
>> * doc/html/manual/appendix_porting.html: Remove
>> GLIBCXX_ENABLE_SJLJ_EXCEPTIONS
>> * doc/xml/manual/build_hacking.xml: Likewise.
>> * doc/html/manual/configure.html: Remove --enable-sjlj-exceptions.
>> * doc/xml/manual/configure.xml: Likewise.
>
>
> The libstdc++ parts are OK, thanks.
>

Re: [Patch Vax] zero/sign extend patterns need to be SUBREG aware

2015-06-19 Thread Matt Thomas


> On Jun 19, 2015, at 8:51 AM, Jan-Benedict Glaw  wrote:
> 
> Hi James,
> 
> On Tue, 2015-06-16 10:58:48 +0100, James Greenhalgh 
>  wrote:
>> The testcase in this patch, from libgcc, causes an ICE in the Vax
>> build.
> [...]
>> As far as I know, reload is going to get rid of these SUBREGs
>> for us, so we don't need to modify the output statement.
>> 
>> Tested that this restores the VAX build and that the code-gen is
>> sensible for the testcase.
>> 
>> OK?
> 
> Looks good to me, but Matt has to ACK this fix.

I so ACK.

Re: [Patch Vax] zero/sign extend patterns need to be SUBREG aware

2015-06-29 Thread Matt Thomas


> On Jun 29, 2015, at 8:19 AM, James Greenhalgh  
> wrote:
> 
> Now that this has had a few days sitting on trunk without seeing any
> complaints, would you mind if I backported it to the GCC 5 branch?

I don’t have a problem with that.

Patch tree-profile.c to support profile-func-internal-id parameter

2015-08-27 Thread Matt Deeds

This patch is for svn://gcc.gnu.org/svn/gcc/branches/google/gcc-4_9.  I add
support for the profile-func-internal-id parameter in the instrumentation
generated for __coverage_callback.

Add support for the profile-func-internal-id parameter to the coverage callback.
Without this change, the function identifier passed to __coverage_callback
(enabled with param=coverage-callback=1) does not match the values emitted in
the .gcno file.  Because the function profile_id is typically more unique
(typically 32 bits) than the function internal id (typically 16 bits), it can be
desirable to have the profile_id used to identify a function as opposed to the
function internal id.

I've instrumented a large binary creating over 500 .gcno files and confirmed
that function IDs in these .gcno files match the IDs in __coverage_callback.  In
my example, there were typically about one to four functions sharing the same
internal function ID.  There were no collisions using profile_id.


Index: gcc/tree-profile.c
===
--- gcc/tree-profile.c (revision 226647)
+++ gcc/tree-profile.c (working copy)
@@ -864,8 +864,20 @@ gimple_gen_edge_profiler (int edgeno, edge e)
 {
   gimple call;
   tree tree_edgeno = build_int_cst (gcov_type_node, edgeno);
-  tree tree_uid = build_int_cst (gcov_type_node,
+
+  tree tree_uid;
+  if (PARAM_VALUE (PARAM_PROFILE_FUNC_INTERNAL_ID))
+{
+  tree_uid  = build_int_cst (gcov_type_node,
  current_function_funcdef_no);
+}
+  else
+{
+  gcc_assert (coverage_node_map_initialized_p ());
+
+  tree_uid = build_int_cst
+  (gcov_type_node, cgraph_get_node (current_function_decl)->profile_id);
+}
   tree callback_fn_type
   = build_function_type_list (void_type_node,
   gcov_type_node,

Patch GCC for profile-func-internal-id=0 coverage-callback=1

2015-09-02 Thread Matt Deeds

Hello, Honza.  David Li said you might be able to help me get this
patch into GCC trunk.  I sent mail for this on August 27, but didn't
get a reply.  It's a small change to make these two options work
together:

profile-func-internal-id=0 coverage-callback=1

Let me know what I can do to get this submitted.

This patch is for svn://gcc.gnu.org/svn/gcc/branches/google/gcc-4_9.  It adds
support for the profile-func-internal-id parameter in the instrumentation
generated for __coverage_callback.

Add support for the profile-func-internal-id parameter to the coverage callback.
Without this change, the function identifier passed to __coverage_callback
(enabled with param=coverage-callback=1) does not match the values emitted in
the .gcno file.  Because the function profile_id is typically wider (about
32 bits) than the function internal id (about 16 bits), it collides less often,
so it can be desirable to identify a function by its profile_id rather than by
its internal id.

I've instrumented a large binary creating over 500 .gcno files and confirmed
that function IDs in these .gcno files match the IDs in __coverage_callback.  In
my example, there were typically about one to four functions sharing the same
internal function ID.  There were no collisions using profile_id.


Index: gcc/tree-profile.c
===
--- gcc/tree-profile.c (revision 226647)
+++ gcc/tree-profile.c (working copy)
@@ -864,8 +864,20 @@ gimple_gen_edge_profiler (int edgeno, edge e)
 {
   gimple call;
   tree tree_edgeno = build_int_cst (gcov_type_node, edgeno);
-  tree tree_uid = build_int_cst (gcov_type_node,
+
+  tree tree_uid;
+  if (PARAM_VALUE (PARAM_PROFILE_FUNC_INTERNAL_ID))
+{
+  tree_uid  = build_int_cst (gcov_type_node,
  current_function_funcdef_no);
+}
+  else
+{
+  gcc_assert (coverage_node_map_initialized_p ());
+
+  tree_uid = build_int_cst
+  (gcov_type_node, cgraph_get_node (current_function_decl)->profile_id);
+}
   tree callback_fn_type
   = build_function_type_list (void_type_node,
   gcov_type_node,

[Ping] Re: [Patch] Fix PR56780: --disable-install-libiberty still installs libiberty.a

2013-05-13 Thread Matt Burgess

Hi,

Is anyone able to review the below please (original patch attached to
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56780 and first posted at
http://gcc.gnu.org/ml/gcc-patches/2013-04/msg00167.html)?

Thanks,

Matt.

On Wed, 2013-04-03 at 15:03 +0100, Matt Burgess wrote:
> Hi,
> 
> Please find attached a patch that fixes PR56780.  Build tested on
> x86_64-linux.  I've also attached it to the bug.
> 
> Regards,
> 
> Matt Burgess
> 
> 2013-04-03 Matt Burgess 
> 
>   other/PR56780
>   * libiberty/configure.ac:
>   Move test for --enable-install-libiberty outside of the
> 'with_target_subdir' test so that it actually gets run.
>   Add output messages to show the test result.
> 
>   * libiberty/configure:
>   Regenerate.
> 
>   * libiberty/Makefile.in (install_to_libdir):
>   Place the installation of the libiberty library in the same guard as
> that used for the headers to prevent it being installed unless requested
> via --enable-install-libiberty.

Re: [Patch] Fix PR56780: --disable-install-libiberty still installs libiberty.a

2013-05-22 Thread Matt Burgess

Hi Ian,

Thanks for the review.  Here's v2, which I think addresses both of your
comments.

Kind Regards,

Matt.

2013-05-22 Matt Burgess 

other/PR56780
* libiberty/configure.ac: Move test for --enable-install-libiberty
outside of the 'with_target_subdir' test so that it actually gets run.
Add output messages to show the test result.
* libiberty/configure: Regenerate.
* libiberty/Makefile.in (install_to_libdir): Place the installation
of the libiberty library in the same guard as that used for the
headers to prevent it being installed unless requested via
--enable-install-libiberty.
Index: libiberty/Makefile.in
===
--- libiberty/Makefile.in	(revision 197373)
+++ libiberty/Makefile.in	(working copy)
@@ -355,19 +355,19 @@
 # since it will be passed the multilib flags.
 MULTIOSDIR = `$(CC) $(CFLAGS) -print-multi-os-directory`
 install_to_libdir: all
-	${mkinstalldirs} $(DESTDIR)$(libdir)/$(MULTIOSDIR)
-	$(INSTALL_DATA) $(TARGETLIB) $(DESTDIR)$(libdir)/$(MULTIOSDIR)/$(TARGETLIB)n
-	( cd $(DESTDIR)$(libdir)/$(MULTIOSDIR) ; chmod 644 $(TARGETLIB)n ;$(RANLIB) $(TARGETLIB)n )
-	mv -f $(DESTDIR)$(libdir)/$(MULTIOSDIR)/$(TARGETLIB)n $(DESTDIR)$(libdir)/$(MULTIOSDIR)/$(TARGETLIB)
 	if test -n "${target_header_dir}"; then \
-	  case "${target_header_dir}" in \
-	/*)thd=${target_header_dir};; \
-	*) thd=${includedir}/${target_header_dir};; \
-	  esac; \
-	  ${mkinstalldirs} $(DESTDIR)$${thd}; \
-	  for h in ${INSTALLED_HEADERS}; do \
-	${INSTALL_DATA} $$h $(DESTDIR)$${thd}; \
-	  done; \
+		${mkinstalldirs} $(DESTDIR)$(libdir)/$(MULTIOSDIR); \
+		$(INSTALL_DATA) $(TARGETLIB) $(DESTDIR)$(libdir)/$(MULTIOSDIR)/$(TARGETLIB)n; \
+		( cd $(DESTDIR)$(libdir)/$(MULTIOSDIR) ; chmod 644 $(TARGETLIB)n ;$(RANLIB) $(TARGETLIB)n ); \
+		mv -f $(DESTDIR)$(libdir)/$(MULTIOSDIR)/$(TARGETLIB)n $(DESTDIR)$(libdir)/$(MULTIOSDIR)/$(TARGETLIB); \
+		case "${target_header_dir}" in \
+		  /*)thd=${target_header_dir};; \
+		  *) thd=${includedir}/${target_header_dir};; \
+		esac; \
+		${mkinstalldirs} $(DESTDIR)$${thd}; \
+		for h in ${INSTALLED_HEADERS}; do \
+		  ${INSTALL_DATA} $$h $(DESTDIR)$${thd}; \
+		done; \
 	fi
 	@$(MULTIDO) $(FLAGS_TO_PASS) multi-do DO=install
 
Index: libiberty/configure.ac
===
--- libiberty/configure.ac	(revision 197373)
+++ libiberty/configure.ac	(working copy)
@@ -128,6 +128,31 @@
cross_compiling=maybe
 fi
 
+# We may wish to install the target headers somewhere.
+AC_MSG_CHECKING([whether to install libiberty headers and static library])
+dnl install-libiberty is disabled by default
+
+AC_ARG_ENABLE(install-libiberty,
+[  --enable-install-libiberty   Install headers and library for end users],
+enable_install_libiberty=$enableval,
+enable_install_libiberty=no)dnl
+
+# Option parsed, now set things appropriately.
+case x"$enable_install_libiberty" in
+  xyes|x)
+target_header_dir=libiberty
+;;
+  xno)   
+target_header_dir=
+;;
+  *) 
+# This could be sanity-checked in various ways...
+target_header_dir="${enable_install_libiberty}"
+;;
+esac
+AC_MSG_RESULT($enable_install_libiberty)
+AC_MSG_NOTICE([target_header_dir = $target_header_dir])
+
 GCC_NO_EXECUTABLES
 AC_PROG_CC
 AC_SYS_LARGEFILE
@@ -492,27 +517,6 @@
 
   esac
 
-  # We may wish to install the target headers somewhere.
-  AC_ARG_ENABLE(install-libiberty,
-  [  --enable-install-libiberty   Install headers for end users],
-  enable_install_libiberty=$enableval,
-  enable_install_libiberty=no)dnl
-  
-  # Option parsed, now set things appropriately.
-  case x"$enable_install_libiberty" in
-xyes|x)
-  target_header_dir=libiberty
-  ;;
-xno)   
-  target_header_dir=
-  ;;
-*) 
-  # This could be sanity-checked in various ways...
-  target_header_dir="${enable_install_libiberty}"
-  ;;
-  esac
-
-
 else
 
# Not a target library, so we set things up to run the test suite.

Re: [Patch] Fix PR56780: --disable-install-libiberty still installs libiberty.a

2013-05-30 Thread Matt Burgess

On Wed, 2013-05-22 at 15:13 -0700, Ian Lance Taylor wrote:
> On Wed, May 22, 2013 at 2:41 PM, Matt Burgess
>  wrote:
> >
> > 2013-05-22 Matt Burgess 
> >
> > other/PR56780
> > * libiberty/configure.ac: Move test for --enable-install-libiberty
> > outside of the 'with_target_subdir' test so that it actually gets 
> > run.
> > Add output messages to show the test result.
> > * libiberty/configure: Regenerate.
> > * libiberty/Makefile.in (install_to_libdir): Place the installation
> > of the libiberty library in the same guard as that used for the
> > headers to prevent it being installed unless requested via
> > --enable-install-libiberty.
> 
> This is OK.

Thanks, Ian.  Another step in the contributing guidelines I seem to have
missed is to inform you that I don't have write access to the GCC repo.
If you could commit this for me please, I'd appreciate it.

Thanks,

Matt.

[Patch] Fix PR56780: --disable-install-libiberty still installs libiberty.a

2013-04-03 Thread Matt Burgess

Hi,

Please find attached a patch that fixes PR56780.  Build tested on
x86_64-linux.  I've also attached it to the bug.

Regards,

Matt Burgess

2013-04-03 Matt Burgess 

other/PR56780
* libiberty/configure.ac:
Move test for --enable-install-libiberty outside of the
'with_target_subdir' test so that it actually gets run.
Add output messages to show the test result.

* libiberty/configure:
Regenerate.

* libiberty/Makefile.in (install_to_libdir):
Place the installation of the libiberty library in the same guard as
that used for the headers to prevent it being installed unless requested
via --enable-install-libiberty.
Index: libiberty/Makefile.in
===
--- libiberty/Makefile.in	(revision 197373)
+++ libiberty/Makefile.in	(working copy)
@@ -355,19 +355,19 @@
 # since it will be passed the multilib flags.
 MULTIOSDIR = `$(CC) $(CFLAGS) -print-multi-os-directory`
 install_to_libdir: all
-	${mkinstalldirs} $(DESTDIR)$(libdir)/$(MULTIOSDIR)
-	$(INSTALL_DATA) $(TARGETLIB) $(DESTDIR)$(libdir)/$(MULTIOSDIR)/$(TARGETLIB)n
-	( cd $(DESTDIR)$(libdir)/$(MULTIOSDIR) ; chmod 644 $(TARGETLIB)n ;$(RANLIB) $(TARGETLIB)n )
-	mv -f $(DESTDIR)$(libdir)/$(MULTIOSDIR)/$(TARGETLIB)n $(DESTDIR)$(libdir)/$(MULTIOSDIR)/$(TARGETLIB)
 	if test -n "${target_header_dir}"; then \
-	  case "${target_header_dir}" in \
-	/*)thd=${target_header_dir};; \
-	*) thd=${includedir}/${target_header_dir};; \
-	  esac; \
-	  ${mkinstalldirs} $(DESTDIR)$${thd}; \
-	  for h in ${INSTALLED_HEADERS}; do \
-	${INSTALL_DATA} $$h $(DESTDIR)$${thd}; \
-	  done; \
+		${mkinstalldirs} $(DESTDIR)$(libdir)/$(MULTIOSDIR); \
+		$(INSTALL_DATA) $(TARGETLIB) $(DESTDIR)$(libdir)/$(MULTIOSDIR)/$(TARGETLIB)n; \
+		( cd $(DESTDIR)$(libdir)/$(MULTIOSDIR) ; chmod 644 $(TARGETLIB)n ;$(RANLIB) $(TARGETLIB)n ); \
+		mv -f $(DESTDIR)$(libdir)/$(MULTIOSDIR)/$(TARGETLIB)n $(DESTDIR)$(libdir)/$(MULTIOSDIR)/$(TARGETLIB); \
+		case "${target_header_dir}" in \
+		  /*)thd=${target_header_dir};; \
+		  *) thd=${includedir}/${target_header_dir};; \
+		esac; \
+		${mkinstalldirs} $(DESTDIR)$${thd}; \
+		for h in ${INSTALLED_HEADERS}; do \
+		  ${INSTALL_DATA} $$h $(DESTDIR)$${thd}; \
+		done; \
 	fi
 	@$(MULTIDO) $(FLAGS_TO_PASS) multi-do DO=install
 
Index: libiberty/configure.ac
===
--- libiberty/configure.ac	(revision 197373)
+++ libiberty/configure.ac	(working copy)
@@ -128,6 +128,31 @@
cross_compiling=maybe
 fi
 
+# We may wish to install the target headers somewhere.
+AC_MSG_CHECKING([whether to install libiberty headers and static library])
+dnl install-libiberty is disabled by default
+
+AC_ARG_ENABLE(install-libiberty,
+[  --enable-install-libiberty   Install headers for end users],
+enable_install_libiberty=$enableval,
+enable_install_libiberty=no)dnl
+
+# Option parsed, now set things appropriately.
+case x"$enable_install_libiberty" in
+  xyes|x)
+target_header_dir=libiberty
+;;
+  xno)   
+target_header_dir=
+;;
+  *) 
+# This could be sanity-checked in various ways...
+target_header_dir="${enable_install_libiberty}"
+;;
+esac
+AC_MSG_RESULT($enable_install_libiberty)
+AC_MSG_NOTICE([target_header_dir = $target_header_dir])
+
 GCC_NO_EXECUTABLES
 AC_PROG_CC
 AC_SYS_LARGEFILE
@@ -492,27 +517,6 @@
 
   esac
 
-  # We may wish to install the target headers somewhere.
-  AC_ARG_ENABLE(install-libiberty,
-  [  --enable-install-libiberty   Install headers for end users],
-  enable_install_libiberty=$enableval,
-  enable_install_libiberty=no)dnl
-  
-  # Option parsed, now set things appropriately.
-  case x"$enable_install_libiberty" in
-xyes|x)
-  target_header_dir=libiberty
-  ;;
-xno)   
-  target_header_dir=
-  ;;
-*) 
-  # This could be sanity-checked in various ways...
-  target_header_dir="${enable_install_libiberty}"
-  ;;
-  esac
-
-
 else
 
# Not a target library, so we set things up to run the test suite.

Re: [PATCH ARM iWMMXt 0/5] Improve iWMMXt support

2013-01-27 Thread Matt Turner

On Tue, Jun 26, 2012 at 7:56 AM, nick clifton  wrote:
> Hi Matt,
>
>
>> There's also a trivial documentation fix:
>>
>> [PATCH 1/2] doc: Correct __builtin_arm_tinsr prototype documentation
>>
>> and a test to exercise the intrinsics:
>>
>> [PATCH 2/2] arm: add iwMMXt mmx-2.c test
>
>
> These have both been checked in.
>
> It turns out that both needed minor updates as some of the builtins have
> changed since these patches were written.  I have taken care of this
> however.
>
> Cheers
>   Nick

Hi Nick,

Could this patch, or perhaps the much smaller one I attached to bug
35294 be committed to the 4.7 branch?

Also, could you close its duplicates, bugs 36798 and 36966?

Thanks,
Matt

Re:

2013-02-14 Thread Matt Hargett

On Feb 14, 2013, at 10:40 AM, Xinliang David Li  wrote:

> On Thu, Feb 14, 2013 at 10:18 AM, Matt  wrote:
>> The attached patches do two things:
>> 1. Backports a fix from trunk that eliminates bogus warning traces. On my
>> current codebase which links ~40MB of C++ with LTO, the bogus warning traces
>> are literally hundreds of lines.
> 
> What is the trunk revision?

Richard's original patch was committed to trunk in r195884.


>> I verified the backport fixed our issue by doing a profiledbootstrap
>> using the bootstrap-lto.mk config with -O3 added. I used the resulting
>> compiler on the proprietary codebase, C++Benchmark, scummvm, and a few other
>> open source projects to validate.
>> 
>> 2. Our primary development platform is RHEL6.1-based, and the recent
>> autoconf requirement bump locked us out. I lowered the version, and saw no
>> difference in ability to configure/bootstrap.
>> 
>> Thanks!
>> 
>> 
>> --
>> tangled strands of DNA explain the way that I behave.
>> http://www.clock.org/~matt

[PATCH] mips: Document r4700

2013-02-22 Thread Matt Turner

2013-02-22  Matt Turner  

gcc/
* doc/invoke.texi: Document r4700.
---
 gcc/doc/invoke.texi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 7d96467..63eb6a6 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -15867,7 +15867,7 @@ The processor names are:
 @samp{octeon}, @samp{octeon+}, @samp{octeon2},
 @samp{orion},
 @samp{r2000}, @samp{r3000}, @samp{r3900}, @samp{r4000}, @samp{r4400},
-@samp{r4600}, @samp{r4650}, @samp{r6000}, @samp{r8000},
+@samp{r4600}, @samp{r4650}, @samp{r4700}, @samp{r6000}, @samp{r8000},
 @samp{rm7000}, @samp{rm9000},
 @samp{r1}, @samp{r12000}, @samp{r14000}, @samp{r16000},
 @samp{sb1},
-- 
1.7.12.4

Re: [PATCH ARM iWMMXt 0/5] Improve iWMMXt support

2012-06-13 Thread Matt Turner

On Wed, Jun 13, 2012 at 3:26 AM, nick clifton  wrote:
> Hi Matt, Hi Xinyu,
>
>
>> This series was written by Marvell and sent by Xinyu Qi
>> a number of times in the last year.
>
>
> Sorry for the long delay in reviewing these patches.  Overall they were
> fine, with only a few, very minor, formatting issues.  I have committed the
> entire series of patches to the mainline.

Great! Thank you so much! Thanks to Ramana for the reviews!

>> For 4.7 and 4.6 please consider committing my patch
>> "[PATCH] arm: Fix iwmmxt shift and logical intrinsics (PR 35294)."
>> which only fixes the logical and shift intrinsics.

Sounds good.

There's also a trivial documentation fix:

[PATCH 1/2] doc: Correct __builtin_arm_tinsr prototype documentation

and a test to exercise the intrinsics:

[PATCH 2/2] arm: add iwMMXt mmx-2.c test

Thanks a lot!

Matt

Re: [PATCH ARM iWMMXt 0/5] Improve iWMMXt support

2012-06-27 Thread Matt Turner

On Tue, Jun 26, 2012 at 10:56 AM, nick clifton  wrote:
> Hi Matt,
>
>
>> There's also a trivial documentation fix:
>>
>> [PATCH 1/2] doc: Correct __builtin_arm_tinsr prototype documentation
>>
>> and a test to exercise the intrinsics:
>>
>> [PATCH 2/2] arm: add iwMMXt mmx-2.c test
>
>
> These have both been checked in.
>
> It turns out that both needed minor updates as some of the builtins have
> changed since these patches were written.  I have taken care of this
> however.
>
> Cheers
>  Nick

Thanks a lot, Nick!

Re: [PING] iwMMXt patches

2012-05-02 Thread Matt Turner

On Tue, Apr 17, 2012 at 4:17 PM, Matt Turner  wrote:
> Are these patches ready to go in? It looks like they were ack'd.
>
> http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01815.html
> http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01817.html
> http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01816.html
> http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01818.html
> http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01819.html
>
> We (OLPC) will need these patches for reasonable iwMMXt performance
> and the ability to use VFP and iwMMXt together.
>
> Thanks,
> Matt

Xinyu,

With these patches I don't see a new -mcpu flag. Isn't a tune/cpu flag
the normal way to activate this code?

Other .md files have statements like (eq_attr "tune" "cortexa8"), but
I don't see how to turn on the marvell-f-iwmmxt attribute, ie (eq_attr
"marvell_f_iwmmxt" "yes").

Please let me know.

Thanks,
Matt

Re: [PING] iwMMXt patches

2012-05-03 Thread Matt Turner

On Thu, May 3, 2012 at 12:59 AM, Xinyu Qi  wrote:
>> From: Matt Turner [mailto:matts...@gmail.com]
>> To: Xinyu Qi
>> Cc: Ramana Radhakrishnan; GCC Patches
>> Subject: Re: [PING] iwMMXt patches
>>
>> On Tue, Apr 17, 2012 at 4:17 PM, Matt Turner  wrote:
>> > Are these patches ready to go in? It looks like they were ack'd.
>> >
>> > http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01815.html
>> > http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01817.html
>> > http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01816.html
>> > http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01818.html
>> > http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01819.html
>> >
>> > We (OLPC) will need these patches for reasonable iwMMXt performance
>> > and the ability to use VFP and iwMMXt together.
>> >
>> > Thanks,
>> > Matt
>>
>> Xinyu,
>>
>> With these patches I don't see a new -mcpu flag. Isn't a tune/cpu flag
>> the normal way to activate this code?
>>
>> Other .md files have statements like (eq_attr "tune" "cortexa8"), but
>> I don't see how to turn on the marvell-f-iwmmxt attribute, ie (eq_attr
>> "marvell_f_iwmmxt" "yes").
>>
>> Please let me know.
>>
>> Thanks,
>> Matt
>
> Hi Matt,
>
> I updated the patches several months ago by following the review comments
> from Richard Earnshaw [richard.earns...@arm.com]
> (unfortunately, no further feedback)
> The newest patches are
> http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01787.html
> http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01788.html
> http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01789.html
> http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01786.html
> http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01599.html
> The main discussion is in
> http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01786.html
>
> No new -mcpu flag is introduced in the patches. You can simply turn on 
> marvell-f-iwmmxt by -mcpu=iwmmxt2(or -march=iwmmxt2).
> (Of course it is odd to treat the "iwmmxt2" as a name of cpu)
>
>
> Thanks,
> Xinyu

Thanks for the email, Xinyu!

We (OLPC) will test the patches and then I'll resubmit them to
gcc-patches@ and try to get them included. They're definitely needed
for us, since they fix PR35294 (iwmmxt shift and logical intrinsics
are broken).

By the way, are there patches to add general instruction scheduling
support for Marvell CPUs like the Armada 610?

Thanks again,

Matt

Re: [PATCH 2/2] arm: add iwMMXt mmx-2.c test

2012-05-28 Thread Matt Turner

On Thu, Apr 5, 2012 at 4:53 AM, Ramana Radhakrishnan
 wrote:
> On 4 April 2012 19:35, Matt Turner  wrote:
>>  gcc/testsuite/gcc.target/arm/mmx-2.c |  158 ++
>>  1 files changed, 158 insertions(+), 0 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.target/arm/mmx-2.c
>>
>> diff --git a/gcc/testsuite/gcc.target/arm/mmx-2.c b/gcc/testsuite/gcc.target/arm/mmx-2.c
>> new file mode 100644
>> index 000..603a63b
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/arm/mmx-2.c
>> @@ -0,0 +1,158 @@
>> +/* { dg-do compile } */
>> +/* { dg-skip-if "Test is specific to the iWMMXt" { arm*-*-* } { "-mcpu=*" } { "-mcpu=iwmmxt" } } */
>> +/* { dg-skip-if "Test is specific to the iWMMXt" { arm*-*-* } { "-mabi=*" } { "-mabi=iwmmxt" } } */
>> +/* { dg-skip-if "Test is specific to the iWMMXt" { arm*-*-* } { "-march=*" } { "-march=iwmmxt" } } */
>> +/* { dg-skip-if "Test is specific to ARM mode" { arm*-*-* } { "-mthumb" } { "" } } */
>
> How about simplifying this with a dg-require-effective-target
> arm_arm_ok instead of doing
> dg-require-effective-target arm32 and then skipping it for Thumb2 ?

I might not understand properly, but couldn't I just do this?

/* { dg-require-effective-target arm_iwmmxt_ok } */

Thanks,
Matt

Re: [PATCH 1/2] mips: Add R4600 scheduling support for imul and idiv

2012-05-28 Thread Matt Turner

On Sat, Feb 25, 2012 at 3:11 AM, Richard Sandiford
 wrote:
> Matt Turner  writes:
>> The r4600_imul and r4600_idiv reservations were correct for si, but
>> there were no *_di reservations.
>>
>> See page 4 of
>> http://www.sgistuff.net/hardware/other/documents/R4600_Prod_OV.pdf
>>
>> 2012-02-24  Matt Turner  
>>
>>       * config/mips/4600.md (r4600_imul_si): Rename from r4600_imul.
>>       (r4600_imul_di): New.
>>       (r4600_idiv_si): Rename from r4600_idiv.
>>       (r4600_idiv_di): New.
>
> Both patches look good, thanks.  Will commit once 4.8 is open and the
> copyright assignment is sorted.
>
> Richard

Copyright assignment is sorted. Please commit. :)

Re: [PATCH] alpha: add bypasses for fmul/fadd/fcmov -> fst/ftoi

2012-05-28 Thread Matt Turner

On Fri, Feb 24, 2012 at 10:53 PM, Matt Turner  wrote:
> See section 2.5.3 (page 28) of
> http://download.majix.org/dec/comp_guide_v2.pdf
>
> 2012-02-24  Matt Turner  
>
>        * config/alpha/ev6.md: (define_bypass "ev6_fmul,ev6_fadd"): New.
>        (define_bypass "ev6_fcmov"): New.
> ---
>  gcc/config/alpha/ev6.md |    4 
>  1 files changed, 4 insertions(+), 0 deletions(-)
>
> diff --git a/gcc/config/alpha/ev6.md b/gcc/config/alpha/ev6.md
> index adfe504..a16535a 100644
> --- a/gcc/config/alpha/ev6.md
> +++ b/gcc/config/alpha/ev6.md
> @@ -147,11 +147,15 @@
>        (eq_attr "type" "fadd,fcpys,fbr"))
>   "ev6_fa")
>
> +(define_bypass 6 "ev6_fmul,ev6_fadd" "ev6_fst,ev6_ftoi")
> +
>  (define_insn_reservation "ev6_fcmov" 8
>   (and (eq_attr "tune" "ev6")
>        (eq_attr "type" "fcmov"))
>   "ev6_fa,nothing*3,ev6_fa")
>
> +(define_bypass 10 "ev6_fcmov" "ev6_fst,ev6_ftoi")
> +
>  (define_insn_reservation "ev6_fdivsf" 12
>   (and (eq_attr "tune" "ev6")
>        (and (eq_attr "type" "fdiv")
> --
> 1.7.3.4
>

Copyright assignment is sorted. Please commit. :)
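
For readers unfamiliar with define_bypass, the two bypasses in the quoted
patch can be modeled as a latency lookup: when the consumer is ev6_fst or
ev6_ftoi, the bypass value replaces the producer's default latency.  The
sketch below is only a toy model and not how GCC's scheduler is implemented;
the function names are hypothetical, and the default latency is passed in by
the caller rather than assumed.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if the consumer is one of the reservations named as the
   second operand of the bypasses in the patch.  */
static int is_store_or_ftoi(const char *consumer)
{
    return strcmp(consumer, "ev6_fst") == 0
           || strcmp(consumer, "ev6_ftoi") == 0;
}

/* Toy model: the bypass latency wins over the default latency for the
   producer/consumer pairs the patch lists; every other pair keeps the
   default supplied by the caller.  */
static int ev6_latency(const char *producer, const char *consumer,
                       int default_latency)
{
    assert(producer != 0 && consumer != 0);
    if (is_store_or_ftoi(consumer)) {
        if (strcmp(producer, "ev6_fmul") == 0
            || strcmp(producer, "ev6_fadd") == 0)
            return 6;   /* (define_bypass 6 "ev6_fmul,ev6_fadd" ...) */
        if (strcmp(producer, "ev6_fcmov") == 0)
            return 10;  /* (define_bypass 10 "ev6_fcmov" ...) */
    }
    return default_latency;
}
```

For instance, with the 8-cycle ev6_fcmov reservation shown in the hunk, an
fcmov feeding a store is modeled at 10 cycles while an fcmov feeding any
other consumer stays at 8.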

Re: [PATCH] arm: add _mm_empty to mmintrin.h for source compatibility

2012-05-28 Thread Matt Turner

On Tue, Feb 28, 2012 at 7:13 PM, Ramana Radhakrishnan
 wrote:
> On Fri, Feb 24, 2012 at 10:53:35PM -0500, Matt Turner wrote:
>> The x86/amd64 mmintrin.h provides the _mm_empty intrinsic for the 'emms'
>> MMX instruction. Although ARM does not need such an instruction, we
>> should provide an empty _mm_empty function nonetheless for source
>> compatibility.
>
> OK for 4.8 and after your copyright assignment has been
> sorted.
>
> Ramana
>
>>
>> 2012-02-24  Matt Turner  
>>
>>       * config/arm/mmintrin.h (_mm_empty): New.
>> ---
>>  gcc/config/arm/mmintrin.h |    7 +++
>>  1 files changed, 7 insertions(+), 0 deletions(-)
>>
>> diff --git a/gcc/config/arm/mmintrin.h b/gcc/config/arm/mmintrin.h
>> index 2cc500d..ea73bf1 100644
>> --- a/gcc/config/arm/mmintrin.h
>> +++ b/gcc/config/arm/mmintrin.h
>> @@ -32,6 +32,12 @@ typedef int __v2si __attribute__ ((vector_size (8)));
>>  typedef short __v4hi __attribute__ ((vector_size (8)));
>>  typedef char __v8qi __attribute__ ((vector_size (8)));
>>
>> +/* Provided for source compatibility with MMX.  */
>> +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
>> +_mm_empty (void)
>> +{
>> +}
>> +
>>  /* "Convert" __m64 and __int64 into each other.  */
>>  static __inline __m64
>>  _mm_cvtsi64_m64 (__int64 __i)
>> @@ -1248,6 +1254,7 @@ _m_from_int (int __a)
>>  #define _m_psadzbw _mm_sadz_pu8
>>  #define _m_psadzwd _mm_sadz_pu16
>>  #define _m_paligniq _mm_align_si64
>> +#define _m_empty _mm_empty
>>  #define _m_cvt_si2pi _mm_cvtsi64_m64
>>  #define _m_cvt_pi2si _mm_cvtm64_si64
>>
>> --
>> 1.7.3.4
>>

Copyright assignment is sorted. Please commit. :)
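
To illustrate the source-compatibility point, here is a minimal portable
sketch.  Both functions are hypothetical stand-ins: demo_mm_empty mirrors the
empty body the patch gives _mm_empty, and add_4x16 plays the role of code
written for x86 MMX that calls the hook once its packed arithmetic is done.
Neither is taken from mmintrin.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for _mm_empty on a target with no MMX state to
   clear; the body is empty, as in the patch.  */
static inline void demo_mm_empty(void)
{
}

/* Hypothetical caller written in the x86 MMX style: add four packed
   16-bit lanes with wraparound, then call the "empty" hook.  */
static uint64_t add_4x16(uint64_t a, uint64_t b)
{
    uint64_t out = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint16_t x = (uint16_t)(a >> (16 * lane));
        uint16_t y = (uint16_t)(b >> (16 * lane));
        out |= (uint64_t)(uint16_t)(x + y) << (16 * lane);
    }
    demo_mm_empty();  /* compiles to nothing here, keeps the source unchanged */
    return out;
}
```

The caller does not need an #ifdef around the call: where the hook has
nothing to do, the call simply disappears.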

Re: [PATCH] arm: Fix iwmmxt shift and logical intrinsics (PR 35294).

2012-05-28 Thread Matt Turner

On Fri, Feb 24, 2012 at 10:53 PM, Matt Turner  wrote:
> PR 36798 and 36966 are duplicates.
>
> 2012-02-24  Matt Turner  
>
>        PR target/35294
>        * config/arm/arm.c (arm_expand_builtin): Wire up missing
>        intrinsics.
> ---
>  gcc/config/arm/arm.c |   62 +-
>  1 files changed, 61 insertions(+), 1 deletions(-)

Drop this patch. Marvell has a five patch series that fixes this and
more. Maybe this patch would be suitable for the 4.6 and 4.7 branches,
since Marvell's adds some features?

Re: [PATCH 1/2] doc: Correct __builtin_arm_tinsr prototype documentation

2012-05-28 Thread Matt Turner

On Wed, Apr 4, 2012 at 2:34 PM, Matt Turner  wrote:
> 2012-04-04  Matt Turner  
>
>        gcc/
>        * doc/extend.texi (__builtin_arm_tinsrb): Add missing second
>        parameter.
>        (__builtin_arm_tinsrh): Likewise.
>        (__builtin_arm_tinsrw): Likewise.
> ---
> This patch and 2/2 are tie-ons to
> http://gcc.gnu.org/ml/gcc-patches/2012-02/msg01269.html
>
> Still waiting on copyright assignment, but I think this doc patch
> is trivial enough to be committed without it.
>
>  gcc/doc/extend.texi |    6 +++---
>  1 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index bb43825..966175d 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -8676,9 +8676,9 @@ int __builtin_arm_textrmsw (v2si, int)
>  int __builtin_arm_textrmub (v8qi, int)
>  int __builtin_arm_textrmuh (v4hi, int)
>  int __builtin_arm_textrmuw (v2si, int)
> -v8qi __builtin_arm_tinsrb (v8qi, int)
> -v4hi __builtin_arm_tinsrh (v4hi, int)
> -v2si __builtin_arm_tinsrw (v2si, int)
> +v8qi __builtin_arm_tinsrb (v8qi, int, int)
> +v4hi __builtin_arm_tinsrh (v4hi, int, int)
> +v2si __builtin_arm_tinsrw (v2si, int, int)
>  long long __builtin_arm_tmia (long long, int, int)
>  long long __builtin_arm_tmiabb (long long, int, int)
>  long long __builtin_arm_tmiabt (long long, int, int)
> --
> 1.7.3.4
>

Copyright assignment is sorted. Please commit. :)
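
A byte insert needs the vector, the value, and a lane to put it in, which is
why the two-parameter prototypes looked incomplete.  The sketch below is a
portable model and not the builtin itself; insert_byte is a hypothetical name,
and the parameter order (vector, value, lane) is an assumption made for
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Portable model of inserting one byte into an 8-lane, 64-bit vector.
   The parameter order (vector, value, lane) is assumed for illustration.  */
static uint64_t insert_byte(uint64_t vec, int value, int lane)
{
    assert(lane >= 0 && lane < 8);
    uint64_t mask = (uint64_t)0xFF << (8 * lane);
    /* Clear the target lane, then place the low 8 bits of value there.  */
    return (vec & ~mask) | ((uint64_t)(uint8_t)value << (8 * lane));
}
```

Without the lane argument there would be no way to say which of the eight
bytes is replaced.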

[PATCH ARM iWMMXt 0/5] Improve iWMMXt support

2012-05-28 Thread Matt Turner


This series was written by Marvell and sent by Xinyu Qi 
a number of times in the last year.

We (One Laptop per Child) need these patches for reasonable iWMMXt support
and performance. Without them, logical and shift intrinsics cause ICEs,
see PR 35294 and its duplicates 36798 and 36966.

The software compositing library pixman uses MMX intrinsics to optimize
various compositing routines. The following are the minimum execution times
of cairo-perf-trace graphics work loads without and with iWMMXt-optimized
pixman for the image and image16 backends (32-bpp and 16-bpp respectively).

                            image               image16
           evolution   33.492 ->  29.590    30.334 ->  24.751
firefox-planet-gnome  191.465 -> 173.835   211.297 -> 187.570
gnome-system-monitor   51.956 ->  44.549    52.272 ->  40.525
  gnome-terminal-vim   53.625 ->  54.554    47.593 ->  47.341
      grads-heat-map    4.439 ->   4.165     4.548 ->   4.624
       midori-zoomed   38.033 ->  28.500    38.576 ->  26.937
             poppler   41.096 ->  31.949    41.230 ->  31.749
  swfdec-giant-steps   20.062 ->  16.912    28.294 ->  17.286
      swfdec-youtube   42.281 ->  37.335    52.848 ->  47.053
   xfce4-terminal-a1   64.311 ->  51.011    62.592 ->  51.191

We have cleaned up some white-space issues with the patches and fixed a
small bug in patch 4/5 since the last time they were posted in December
(added tandc,textrc,torc,torvsc to the "wtype" attribute)

Please commit them for 4.8.

For 4.7 and 4.6 please consider committing my patch
"[PATCH] arm: Fix iwmmxt shift and logical intrinsics (PR 35294)."
which only fixes the logical and shift intrinsics.

Thanks,

Matt Turner

[PATCH ARM iWMMXt 5/5] pipeline description

2012-05-28 Thread Matt Turner

From: Xinyu Qi 

gcc/
* config/arm/t-arm (MD_INCLUDES): Add marvell-f-iwmmxt.md.
* config/arm/marvell-f-iwmmxt.md: New file.
* config/arm/arm.md (marvell-f-iwmmxt.md): Include.
---
 gcc/config/arm/arm.md  |1 +
 gcc/config/arm/marvell-f-iwmmxt.md |  179 
 gcc/config/arm/t-arm   |1 +
 3 files changed, 181 insertions(+), 0 deletions(-)
 create mode 100644 gcc/config/arm/marvell-f-iwmmxt.md

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index b0333c2..baa3b7c 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -546,6 +546,7 @@
  (const_string "yes")
  (const_string "no"
 
+(include "marvell-f-iwmmxt.md")
 (include "arm-generic.md")
 (include "arm926ejs.md")
 (include "arm1020e.md")
diff --git a/gcc/config/arm/marvell-f-iwmmxt.md b/gcc/config/arm/marvell-f-iwmmxt.md
new file mode 100644
index 000..fe8e455
--- /dev/null
+++ b/gcc/config/arm/marvell-f-iwmmxt.md
@@ -0,0 +1,179 @@
+;; Marvell WMMX2 pipeline description
+;; Copyright (C) 2011 Free Software Foundation, Inc.
+;; Written by Marvell, Inc.
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published
+;; by the Free Software Foundation; either version 3, or (at your
+;; option) any later version.
+
+;; GCC is distributed in the hope that it will be useful, but WITHOUT
+;; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+;; or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+;; License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+
+(define_automaton "marvell_f_iwmmxt")
+
+
+;; Pipelines
+
+
+;; This is a 7-stage pipeline:
+;;
+;;MD | MI | ME1 | ME2 | ME3 | ME4 | MW
+;;
+;; There are various bypasses modelled to a greater or lesser extent.
+;;
+;; Latencies in this file correspond to the number of cycles after
+;; the issue stage that it takes for the result of the instruction to
+;; be computed, or for its side-effects to occur.
+
+(define_cpu_unit "mf_iwmmxt_MD" "marvell_f_iwmmxt")
+(define_cpu_unit "mf_iwmmxt_MI" "marvell_f_iwmmxt")
+(define_cpu_unit "mf_iwmmxt_ME1" "marvell_f_iwmmxt")
+(define_cpu_unit "mf_iwmmxt_ME2" "marvell_f_iwmmxt")
+(define_cpu_unit "mf_iwmmxt_ME3" "marvell_f_iwmmxt")
+(define_cpu_unit "mf_iwmmxt_ME4" "marvell_f_iwmmxt")
+(define_cpu_unit "mf_iwmmxt_MW" "marvell_f_iwmmxt")
+
+(define_reservation "mf_iwmmxt_ME"
+  "mf_iwmmxt_ME1,mf_iwmmxt_ME2,mf_iwmmxt_ME3,mf_iwmmxt_ME4"
+)
+
+(define_reservation "mf_iwmmxt_pipeline"
+  "mf_iwmmxt_MD, mf_iwmmxt_MI, mf_iwmmxt_ME, mf_iwmmxt_MW"
+)
+
+;; An attribute to indicate whether our reservations are applicable.
+(define_attr "marvell_f_iwmmxt" "yes,no"
+  (const (if_then_else (symbol_ref "arm_arch_iwmmxt")
+   (const_string "yes") (const_string "no"
+
+
+;; instruction classes
+
+
+;; An attribute appended to instructions for classification
+
+(define_attr "wmmxt_shift" "yes,no"
+  (if_then_else (eq_attr "wtype" "wror, wsll, wsra, wsrl")
+   (const_string "yes") (const_string "no"))
+)
+
+(define_attr "wmmxt_pack" "yes,no"
+  (if_then_else (eq_attr "wtype" "waligni, walignr, wmerge, wpack, wshufh, wunpckeh, wunpckih, wunpckel, wunpckil")
+   (const_string "yes") (const_string "no"))
+)
+
+(define_attr "wmmxt_mult_c1" "yes,no"
+  (if_then_else (eq_attr "wtype" "wmac, wmadd, wmiaxy, wmiawxy, wmulw, wqmiaxy, wqmulwm")
+   (const_string "yes") (const_string "no"))
+)
+
+(define_attr "wmmxt_mult_c2" "yes,no"
+  (if_then_else (eq_attr "wtype" "wmul, wqmulm")
+   (const_string "yes") (const_string "no"))
+)
+
+(define_attr "wmmxt_alu_c1" "yes,no"
+  (if_then_else (eq_attr "wtype" "wabs, wabsdiff, wand, wandn, wmov, wor, wxor")
+   (const_string "yes") (const_string "no"))
+)
+
+(define_attr "wmmxt_alu_c2" "yes,no"
+  (if_then_else (eq_attr "wtype" "wacc, wadd, waddsubhx, wavg2, wavg4, wcmpeq, wcmpgt, wmax, wmin, wsub, waddbhus, wsubaddhx")
+   (const_string "yes") (const_string "no"))
+)
+
+(define_attr "wmmxt_alu_c3" "yes,no"
+  (if_then_else (eq_attr "wtype" "wsad")
+   (const_string "yes") (const_string "no"))
+)
+
+(define_attr "wmmxt_transfer_c1" "yes,no"
+  (if_then_else (eq_attr "wtype" "tbcst, tinsr, tmcr, tmcrr")
+(const_string "yes") (const_string "no"))
+)
+
+(define_attr "wmmxt_transfer_c2" "yes,no"
+  (if_then_else (eq_attr "wtype" "textrm, tmo

[PATCH ARM iWMMXt 1/5] ARM code generic change

2012-05-28 Thread Matt Turner

From: Xinyu Qi 

gcc/
* config/arm/arm.c (FL_IWMMXT2): New define.
(arm_arch_iwmmxt2): New variable.
(arm_option_override): Enable use of iWMMXt with VFP.
Disable use of iWMMXt with NEON. Disable use of iWMMXt under
Thumb mode. Set arm_arch_iwmmxt2.
(arm_expand_binop_builtin): Accept VOIDmode op.
* config/arm/arm.h (TARGET_CPU_CPP_BUILTINS): Define __IWMMXT2__.
(TARGET_IWMMXT2): New define.
(TARGET_REALLY_IWMMXT2): Likewise.
(arm_arch_iwmmxt2): Declare.
* config/arm/arm-cores.def (iwmmxt2): Add FL_IWMMXT2.
* config/arm/arm-arches.def (iwmmxt2): Likewise.
* config/arm/arm.md (arch): Add "iwmmxt2".
(arch_enabled): Handle "iwmmxt2".
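
The option checks that this ChangeLog entry describes for arm_option_override can be sketched in isolation as follows. This is only an illustration of the two conditions visible in the diff: the struct, the enum, and the function name are hypothetical, the fields merely stand in for the TARGET_* macros, and the Thumb-mode handling mentioned above is not modelled.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the TARGET_* macros tested in the patch.  */
struct target_opts
{
  bool iwmmxt;      /* TARGET_IWMMXT */
  bool hard_float;  /* TARGET_HARD_FLOAT */
  bool vfp;         /* TARGET_VFP */
  bool neon;        /* TARGET_NEON */
};

/* Hypothetical result codes; the patch itself reports errors instead.  */
enum iwmmxt_check
{
  IWMMXT_OK,
  IWMMXT_BAD_FPU,   /* hardware floating point that is not VFP */
  IWMMXT_BAD_NEON   /* iWMMXt combined with NEON */
};

/* Mirror the two conditions added by the patch, in the same order.  */
static enum iwmmxt_check
check_iwmmxt_options (const struct target_opts *o)
{
  if (o->iwmmxt && o->hard_float && !o->vfp)
    return IWMMXT_BAD_FPU;
  if (o->iwmmxt && o->neon)
    return IWMMXT_BAD_NEON;
  return IWMMXT_OK;
}
```

With this model, iWMMXt plus VFP is accepted, while iWMMXt plus any other hardware FPU, or plus NEON, is rejected.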
---
 gcc/config/arm/arm-arches.def |2 +-
 gcc/config/arm/arm-cores.def  |2 +-
 gcc/config/arm/arm.c  |   25 +
 gcc/config/arm/arm.h  |7 +++
 gcc/config/arm/arm.md |6 +-
 5 files changed, 31 insertions(+), 11 deletions(-)

diff --git a/gcc/config/arm/arm-arches.def b/gcc/config/arm/arm-arches.def
index 3123426..f4dd6cc 100644
--- a/gcc/config/arm/arm-arches.def
+++ b/gcc/config/arm/arm-arches.def
@@ -57,4 +57,4 @@ ARM_ARCH("armv7-m", cortexm3, 7M,  FL_CO_PROC | FL_FOR_ARCH7M)
 ARM_ARCH("armv7e-m", cortexm4,  7EM, FL_CO_PROC |FL_FOR_ARCH7EM)
 ARM_ARCH("ep9312",  ep9312, 4T,  FL_LDSCHED | FL_CIRRUS | FL_FOR_ARCH4)
 ARM_ARCH("iwmmxt",  iwmmxt, 5TE, FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT)
-ARM_ARCH("iwmmxt2", iwmmxt2,5TE, FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT)
+ARM_ARCH("iwmmxt2", iwmmxt2,5TE, FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT | FL_IWMMXT2)
diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
index d82b10b..c82eada 100644
--- a/gcc/config/arm/arm-cores.def
+++ b/gcc/config/arm/arm-cores.def
@@ -105,7 +105,7 @@ ARM_CORE("arm1020e",  arm1020e, 5TE, FL_LDSCHED, fastmul)
 ARM_CORE("arm1022e",  arm1022e,5TE, FL_LDSCHED, fastmul)
 ARM_CORE("xscale",xscale,  5TE, FL_LDSCHED | FL_STRONG | FL_XSCALE, xscale)
 ARM_CORE("iwmmxt",iwmmxt,  5TE, FL_LDSCHED | FL_STRONG | FL_XSCALE | FL_IWMMXT, xscale)
-ARM_CORE("iwmmxt2",   iwmmxt2, 5TE, FL_LDSCHED | FL_STRONG | FL_XSCALE | FL_IWMMXT, xscale)
+ARM_CORE("iwmmxt2",   iwmmxt2, 5TE, FL_LDSCHED | FL_STRONG | FL_XSCALE | FL_IWMMXT | FL_IWMMXT2, xscale)
 ARM_CORE("fa606te",   fa606te,  5TE, FL_LDSCHED, 9e)
 ARM_CORE("fa626te",   fa626te,  5TE, FL_LDSCHED, 9e)
 ARM_CORE("fmp626",fmp626,   5TE, FL_LDSCHED, 9e)
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 7a98197..b0680ab 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -685,6 +685,7 @@ static int thumb_call_reg_needed;
 #define FL_ARM_DIV    (1 << 23)  /* Hardware divide (ARM mode).  */
 
 #define FL_IWMMXT     (1 << 29)  /* XScale v2 or "Intel Wireless MMX technology".  */
+#define FL_IWMMXT2    (1 << 30)  /* "Intel Wireless MMX2 technology".  */
 
 /* Flags that only effect tuning, not available instructions.  */
 #define FL_TUNE       (FL_WBUF | FL_VFPV2 | FL_STRONG | FL_LDSCHED \
@@ -766,6 +767,9 @@ int arm_arch_cirrus = 0;
 /* Nonzero if this chip supports Intel Wireless MMX technology.  */
 int arm_arch_iwmmxt = 0;
 
+/* Nonzero if this chip supports Intel Wireless MMX2 technology.  */
+int arm_arch_iwmmxt2 = 0;
+
 /* Nonzero if this chip is an XScale.  */
 int arm_arch_xscale = 0;
 
@@ -1717,6 +1721,7 @@ arm_option_override (void)
   arm_tune_wbuf = (tune_flags & FL_WBUF) != 0;
   arm_tune_xscale = (tune_flags & FL_XSCALE) != 0;
   arm_arch_iwmmxt = (insn_flags & FL_IWMMXT) != 0;
+  arm_arch_iwmmxt2 = (insn_flags & FL_IWMMXT2) != 0;
   arm_arch_thumb_hwdiv = (insn_flags & FL_THUMB_DIV) != 0;
   arm_arch_arm_hwdiv = (insn_flags & FL_ARM_DIV) != 0;
   arm_tune_cortex_a9 = (arm_tune == cortexa9) != 0;
@@ -1817,14 +1822,17 @@ arm_option_override (void)
 }
 
   /* FPA and iWMMXt are incompatible because the insn encodings overlap.
- VFP and iWMMXt can theoretically coexist, but it's unlikely such silicon
- will ever exist.  GCC makes no attempt to support this combination.  */
-  if (TARGET_IWMMXT && !TARGET_SOFT_FLOAT)
-sorry ("iWMMXt and hardware floating point");
+ VFP and iWMMXt however can coexist.  */
+  if (TARGET_IWMMXT && TARGET_HARD_FLOAT && !TARGET_VFP)
+error ("iWMMXt and non-VFP floating point unit are incompatible");
+
+  /* iWMMXt and NEON are incompatible.  */
+  if (TARGET_IWMMXT && TARGET_NEON)
+error ("iWMMXt and NEON are inc

[PATCH ARM iWMMXt 2/5] intrinsic head file change

2012-05-28 Thread Matt Turner

From: Xinyu Qi 

gcc/
* config/arm/mmintrin.h: Use __IWMMXT__ to enable iWMMXt intrinsics.
Use __IWMMXT2__ to enable iWMMXt2 intrinsics.
Use C name-mangling for intrinsics.
(__v8qi): Redefine.
(_mm_cvtsi32_si64, _mm_andnot_si64, _mm_sad_pu8): Revise.
(_mm_sad_pu16, _mm_align_si64, _mm_setwcx, _mm_getwcx): Likewise.
(_m_from_int): Likewise.
(_mm_sada_pu8, _mm_sada_pu16): New intrinsic.
(_mm_alignr0_si64, _mm_alignr1_si64, _mm_alignr2_si64): Likewise.
(_mm_alignr3_si64, _mm_tandcb, _mm_tandch, _mm_tandcw): Likewise.
(_mm_textrcb, _mm_textrch, _mm_textrcw, _mm_torcb): Likewise.
(_mm_torch, _mm_torcw, _mm_tbcst_pi8, _mm_tbcst_pi16): Likewise.
(_mm_tbcst_pi32): Likewise.
(_mm_abs_pi8, _mm_abs_pi16, _mm_abs_pi32): New iWMMXt2 intrinsic.
(_mm_addsubhx_pi16, _mm_absdiff_pu8, _mm_absdiff_pu16): Likewise.
(_mm_absdiff_pu32, _mm_addc_pu16, _mm_addc_pu32): Likewise.
(_mm_avg4_pu8, _mm_avg4r_pu8, _mm_maddx_pi16, _mm_maddx_pu16): Likewise.
(_mm_msub_pi16, _mm_msub_pu16, _mm_mulhi_pi32): Likewise.
(_mm_mulhi_pu32, _mm_mulhir_pi16, _mm_mulhir_pi32): Likewise.
(_mm_mulhir_pu16, _mm_mulhir_pu32, _mm_mullo_pi32): Likewise.
(_mm_qmulm_pi16, _mm_qmulm_pi32, _mm_qmulmr_pi16): Likewise.
(_mm_qmulmr_pi32, _mm_subaddhx_pi16, _mm_addbhusl_pu8): Likewise.
(_mm_addbhusm_pu8, _mm_qmiabb_pi32, _mm_qmiabbn_pi32): Likewise.
(_mm_qmiabt_pi32, _mm_qmiabtn_pi32, _mm_qmiatb_pi32): Likewise.
(_mm_qmiatbn_pi32, _mm_qmiatt_pi32, _mm_qmiattn_pi32): Likewise.
(_mm_wmiabb_si64, _mm_wmiabbn_si64, _mm_wmiabt_si64): Likewise.
(_mm_wmiabtn_si64, _mm_wmiatb_si64, _mm_wmiatbn_si64): Likewise.
(_mm_wmiatt_si64, _mm_wmiattn_si64, _mm_wmiawbb_si64): Likewise.
(_mm_wmiawbbn_si64, _mm_wmiawbt_si64, _mm_wmiawbtn_si64): Likewise.
(_mm_wmiawtb_si64, _mm_wmiawtbn_si64, _mm_wmiawtt_si64): Likewise.
(_mm_wmiawttn_si64, _mm_merge_si64): Likewise.
(_mm_torvscb, _mm_torvsch, _mm_torvscw): Likewise.
(_m_to_int): New define.
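
The revision to _mm_andnot_si64 listed above swaps the operands passed to the builtin. A plain-C model of why that matters is sketched below. It assumes that WANDN computes its first operand AND NOT its second, which is what the swap in the patch implies; both function names are hypothetical and no iWMMXt hardware or builtin is involved.

```c
#include <stdint.h>

/* Assumed model of the WANDN instruction: first operand AND NOT second.  */
static uint64_t
model_wandn (uint64_t a, uint64_t b)
{
  return a & ~b;
}

/* Intended semantics of _mm_andnot_si64: complement M1, then AND the
   result with M2.  Under the assumption above this needs the operands
   in (M2, M1) order, as in the patched header.  */
static uint64_t
model_andnot_si64 (uint64_t m1, uint64_t m2)
{
  return model_wandn (m2, m1);
}
```

With the operands in the original (M1, M2) order, the same model would return M1 AND NOT M2, which is the complement applied to the wrong argument.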
---
 gcc/config/arm/mmintrin.h |  649 ++---
 1 files changed, 614 insertions(+), 35 deletions(-)

diff --git a/gcc/config/arm/mmintrin.h b/gcc/config/arm/mmintrin.h
index 2cc500d..0fe551d 100644
--- a/gcc/config/arm/mmintrin.h
+++ b/gcc/config/arm/mmintrin.h
@@ -24,16 +24,30 @@
 #ifndef _MMINTRIN_H_INCLUDED
 #define _MMINTRIN_H_INCLUDED
 
+#ifndef __IWMMXT__
+#error You must enable WMMX/WMMX2 instructions (e.g. -march=iwmmxt or -march=iwmmxt2) to use iWMMXt/iWMMXt2 intrinsics
+#else
+
+#ifndef __IWMMXT2__
+#warning You only enable iWMMXt intrinsics. Extended iWMMXt2 intrinsics available only if WMMX2 instructions enabled (e.g. -march=iwmmxt2)
+#endif
+
+
+#if defined __cplusplus
+extern "C" { /* Begin "C" */
+/* Intrinsics use C name-mangling.  */
+#endif /* __cplusplus */
+
 /* The data type intended for user use.  */
 typedef unsigned long long __m64, __int64;
 
 /* Internal data types for implementing the intrinsics.  */
 typedef int __v2si __attribute__ ((vector_size (8)));
 typedef short __v4hi __attribute__ ((vector_size (8)));
-typedef char __v8qi __attribute__ ((vector_size (8)));
+typedef signed char __v8qi __attribute__ ((vector_size (8)));
 
 /* "Convert" __m64 and __int64 into each other.  */
-static __inline __m64 
+static __inline __m64
 _mm_cvtsi64_m64 (__int64 __i)
 {
   return __i;
@@ -54,7 +68,7 @@ _mm_cvtsi64_si32 (__int64 __i)
 static __inline __int64
 _mm_cvtsi32_si64 (int __i)
 {
-  return __i;
+  return (__i & 0x);
 }
 
 /* Pack the four 16-bit values from M1 into the lower four 8-bit values of
@@ -603,7 +617,7 @@ _mm_and_si64 (__m64 __m1, __m64 __m2)
 static __inline __m64
 _mm_andnot_si64 (__m64 __m1, __m64 __m2)
 {
-  return __builtin_arm_wandn (__m1, __m2);
+  return __builtin_arm_wandn (__m2, __m1);
 }
 
 /* Bit-wise inclusive OR the 64-bit values in M1 and M2.  */
@@ -935,7 +949,13 @@ _mm_avg2_pu16 (__m64 __A, __m64 __B)
 static __inline __m64
 _mm_sad_pu8 (__m64 __A, __m64 __B)
 {
-  return (__m64) __builtin_arm_wsadb ((__v8qi)__A, (__v8qi)__B);
+  return (__m64) __builtin_arm_wsadbz ((__v8qi)__A, (__v8qi)__B);
+}
+
+static __inline __m64
+_mm_sada_pu8 (__m64 __A, __m64 __B, __m64 __C)
+{
+  return (__m64) __builtin_arm_wsadb ((__v2si)__A, (__v8qi)__B, (__v8qi)__C);
 }
 
 /* Compute the sum of the absolute differences of the unsigned 16-bit
@@ -944,9 +964,16 @@ _mm_sad_pu8 (__m64 __A, __m64 __B)
 static __inline __m64
 _mm_sad_pu16 (__m64 __A, __m64 __B)
 {
-  return (__m64) __builtin_arm_wsadh ((__v4hi)__A, (__v4hi)__B);
+  return (__m64) __builtin_arm_wsadhz ((__v4hi)__A, (__v4hi)__B);
 }
 
+static __inline __m64
+_mm_sada_pu16 (__m64 __A, __m64 __B, __m64 __C)
+{
+  return (__m64) __builtin_arm_wsadh ((__v2si)__A, (__v4hi)__B, (__v4hi)__C);
+}
+
+
 /* Compute the sum of the absolute differences of th

[PATCH ARM iWMMXt 3/5] built in define and expand

2012-05-28 Thread Matt Turner

From: Xinyu Qi 

gcc/
* config/arm/arm.c (enum arm_builtins): Revise built-in fcode.
(IWMMXT2_BUILTIN): New define.
(IWMMXT2_BUILTIN2): Likewise.
(iwmmx2_mbuiltin): Likewise.
(builtin_description bdesc_2arg): Revise built in declaration.
(builtin_description bdesc_1arg): Likewise.
(arm_init_iwmmxt_builtins): Revise built in initialization.
(arm_expand_builtin): Revise built in expansion.
---
 gcc/config/arm/arm.c |  620 +-
 1 files changed, 559 insertions(+), 61 deletions(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index b0680ab..51eed40 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -19637,8 +19637,15 @@ static neon_builtin_datum neon_builtin_data[] =
FIXME?  */
 enum arm_builtins
 {
-  ARM_BUILTIN_GETWCX,
-  ARM_BUILTIN_SETWCX,
+  ARM_BUILTIN_GETWCGR0,
+  ARM_BUILTIN_GETWCGR1,
+  ARM_BUILTIN_GETWCGR2,
+  ARM_BUILTIN_GETWCGR3,
+
+  ARM_BUILTIN_SETWCGR0,
+  ARM_BUILTIN_SETWCGR1,
+  ARM_BUILTIN_SETWCGR2,
+  ARM_BUILTIN_SETWCGR3,
 
   ARM_BUILTIN_WZERO,
 
@@ -19661,7 +19668,11 @@ enum arm_builtins
   ARM_BUILTIN_WSADH,
   ARM_BUILTIN_WSADHZ,
 
-  ARM_BUILTIN_WALIGN,
+  ARM_BUILTIN_WALIGNI,
+  ARM_BUILTIN_WALIGNR0,
+  ARM_BUILTIN_WALIGNR1,
+  ARM_BUILTIN_WALIGNR2,
+  ARM_BUILTIN_WALIGNR3,
 
   ARM_BUILTIN_TMIA,
   ARM_BUILTIN_TMIAPH,
@@ -19797,6 +19808,81 @@ enum arm_builtins
   ARM_BUILTIN_WUNPCKELUH,
   ARM_BUILTIN_WUNPCKELUW,
 
+  ARM_BUILTIN_WABSB,
+  ARM_BUILTIN_WABSH,
+  ARM_BUILTIN_WABSW,
+
+  ARM_BUILTIN_WADDSUBHX,
+  ARM_BUILTIN_WSUBADDHX,
+
+  ARM_BUILTIN_WABSDIFFB,
+  ARM_BUILTIN_WABSDIFFH,
+  ARM_BUILTIN_WABSDIFFW,
+
+  ARM_BUILTIN_WADDCH,
+  ARM_BUILTIN_WADDCW,
+
+  ARM_BUILTIN_WAVG4,
+  ARM_BUILTIN_WAVG4R,
+
+  ARM_BUILTIN_WMADDSX,
+  ARM_BUILTIN_WMADDUX,
+
+  ARM_BUILTIN_WMADDSN,
+  ARM_BUILTIN_WMADDUN,
+
+  ARM_BUILTIN_WMULWSM,
+  ARM_BUILTIN_WMULWUM,
+
+  ARM_BUILTIN_WMULWSMR,
+  ARM_BUILTIN_WMULWUMR,
+
+  ARM_BUILTIN_WMULWL,
+
+  ARM_BUILTIN_WMULSMR,
+  ARM_BUILTIN_WMULUMR,
+
+  ARM_BUILTIN_WQMULM,
+  ARM_BUILTIN_WQMULMR,
+
+  ARM_BUILTIN_WQMULWM,
+  ARM_BUILTIN_WQMULWMR,
+
+  ARM_BUILTIN_WADDBHUSM,
+  ARM_BUILTIN_WADDBHUSL,
+
+  ARM_BUILTIN_WQMIABB,
+  ARM_BUILTIN_WQMIABT,
+  ARM_BUILTIN_WQMIATB,
+  ARM_BUILTIN_WQMIATT,
+
+  ARM_BUILTIN_WQMIABBN,
+  ARM_BUILTIN_WQMIABTN,
+  ARM_BUILTIN_WQMIATBN,
+  ARM_BUILTIN_WQMIATTN,
+
+  ARM_BUILTIN_WMIABB,
+  ARM_BUILTIN_WMIABT,
+  ARM_BUILTIN_WMIATB,
+  ARM_BUILTIN_WMIATT,
+
+  ARM_BUILTIN_WMIABBN,
+  ARM_BUILTIN_WMIABTN,
+  ARM_BUILTIN_WMIATBN,
+  ARM_BUILTIN_WMIATTN,
+
+  ARM_BUILTIN_WMIAWBB,
+  ARM_BUILTIN_WMIAWBT,
+  ARM_BUILTIN_WMIAWTB,
+  ARM_BUILTIN_WMIAWTT,
+
+  ARM_BUILTIN_WMIAWBBN,
+  ARM_BUILTIN_WMIAWBTN,
+  ARM_BUILTIN_WMIAWTBN,
+  ARM_BUILTIN_WMIAWTTN,
+
+  ARM_BUILTIN_WMERGE,
+
   ARM_BUILTIN_THREAD_POINTER,
 
   ARM_BUILTIN_NEON_BASE,
@@ -20329,6 +20415,10 @@ static const struct builtin_description bdesc_2arg[] =
   { FL_IWMMXT, CODE_FOR_##code, "__builtin_arm_" string, \
 ARM_BUILTIN_##builtin, UNKNOWN, 0 },
 
+#define IWMMXT2_BUILTIN(code, string, builtin) \
+  { FL_IWMMXT2, CODE_FOR_##code, "__builtin_arm_" string, \
+ARM_BUILTIN_##builtin, UNKNOWN, 0 },
+
   IWMMXT_BUILTIN (addv8qi3, "waddb", WADDB)
   IWMMXT_BUILTIN (addv4hi3, "waddh", WADDH)
   IWMMXT_BUILTIN (addv2si3, "waddw", WADDW)
@@ -20385,44 +20475,45 @@ static const struct builtin_description bdesc_2arg[] =
   IWMMXT_BUILTIN (iwmmxt_wunpckihb, "wunpckihb", WUNPCKIHB)
   IWMMXT_BUILTIN (iwmmxt_wunpckihh, "wunpckihh", WUNPCKIHH)
   IWMMXT_BUILTIN (iwmmxt_wunpckihw, "wunpckihw", WUNPCKIHW)
-  IWMMXT_BUILTIN (iwmmxt_wmadds, "wmadds", WMADDS)
-  IWMMXT_BUILTIN (iwmmxt_wmaddu, "wmaddu", WMADDU)
+  IWMMXT2_BUILTIN (iwmmxt_waddsubhx, "waddsubhx", WADDSUBHX)
+  IWMMXT2_BUILTIN (iwmmxt_wsubaddhx, "wsubaddhx", WSUBADDHX)
+  IWMMXT2_BUILTIN (iwmmxt_wabsdiffb, "wabsdiffb", WABSDIFFB)
+  IWMMXT2_BUILTIN (iwmmxt_wabsdiffh, "wabsdiffh", WABSDIFFH)
+  IWMMXT2_BUILTIN (iwmmxt_wabsdiffw, "wabsdiffw", WABSDIFFW)
+  IWMMXT2_BUILTIN (iwmmxt_avg4, "wavg4", WAVG4)
+  IWMMXT2_BUILTIN (iwmmxt_avg4r, "wavg4r", WAVG4R)
+  IWMMXT2_BUILTIN (iwmmxt_wmulwsm, "wmulwsm", WMULWSM)
+  IWMMXT2_BUILTIN (iwmmxt_wmulwum, "wmulwum", WMULWUM)
+  IWMMXT2_BUILTIN (iwmmxt_wmulwsmr, "wmulwsmr", WMULWSMR)
+  IWMMXT2_BUILTIN (iwmmxt_wmulwumr, "wmulwumr", WMULWUMR)
+  IWMMXT2_BUILTIN (iwmmxt_wmulwl, "wmulwl", WMULWL)
+  IWMMXT2_BUILTIN (iwmmxt_wmulsmr, "wmulsmr", WMULSMR)
+  IWMMXT2_BUILTIN (iwmmxt_wmulumr, "wmulumr", WMULUMR)
+  IWMMXT2_BUILTIN (iwmmxt_wqmulm, "wqmulm", WQMULM)
+  IWMMXT2_BUILTIN (iwmmxt_wqmulmr, "wqmulmr", WQMULMR)
+  IWMMXT2_BUILTIN (iwmmxt_wqmulwm, "wqmulwm", WQMULWM)
+  IWMMXT2_BUILTIN (iwmmxt_wqmulwmr, "wqmulwmr", WQMULWMR)
+  IWMMXT_BUILTIN (iwmmxt_walignr0, "walignr0", WALIGNR0)
+  IWMMXT_BUILTIN (iwmmxt_walignr1, "walignr1", WALIGNR1)
+  IWMMXT_BUILTIN (iwmmxt_walignr2, "walignr2", WALIGNR2)
+  IWMMXT_BUILTIN (iwmmxt_walignr3, "wali

Re: [PATCH 1/2] mips: Add R4600 scheduling support for imul and idiv

2012-05-31 Thread Matt Turner

On Thu, May 31, 2012 at 5:35 PM, Richard Sandiford
 wrote:
> Matt Turner  writes:
>> On Sat, Feb 25, 2012 at 3:11 AM, Richard Sandiford
>>  wrote:
>>> Matt Turner  writes:
>>>> The r4600_imul and r4600_idiv reservations were correct for si, but
>>>> there were no *_di reservations.
>>>>
>>>> See page 4 of
>>>> http://www.sgistuff.net/hardware/other/documents/R4600_Prod_OV.pdf
>>>>
>>>> 2012-02-24  Matt Turner  
>>>>
>>>>       * config/mips/4600.md (r4600_imul_si): Rename from r4600_imul.
>>>>       (r4600_imul_di): New.
>>>>       (r4600_idiv_si): Rename from r4600_idiv.
>>>>       (r4600_idiv_di): New.
>>>
>>> Both patches look good, thanks.  Will commit once 4.8 is open and the
>>> copyright assignment is sorted.
>>>
>>> Richard
>>
>> Copyright assignment is sorted. Please commit. :)
>
> Applied this one.  Part 2 seems to be based on a different version
> of driver-native.c though.
>
> Thanks for persevering. :-)
>
> Richard

Thanks a lot!

Ah, right, 2/2 was written before IRIX support was removed, which changed
driver-native.c significantly.

Updated patch in your inbox shortly.

Thanks!
Matt

[PATCH 2/2] mips: Add R4700 scheduling support

2012-05-31 Thread Matt Turner

The R4700 is identical to the R4600 except for the integer and
floating-point multiplication costs.

See page 4 of http://datasheets.chipdb.org/IDT/MIPS/79RV4700.pdf
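
As a rough summary, the integer latencies that the reservations in this patch encode can be written as a small lookup. This is a sketch only: the enums and the function name are hypothetical, the numbers are copied from the define_insn_reservation entries in the diff, and the DI-mode R4600 multiply is deliberately left out because its reservation header is not visible in the excerpt.

```c
enum cpu  { CPU_R4600, CPU_R4700 };
enum insn { INSN_IMUL, INSN_IDIV };
enum mode { MODE_SI, MODE_DI };

/* Return the integer multiply/divide latency in cycles, or -1 if the
   combination is not covered by this sketch.  */
static int
int_latency (enum cpu cpu, enum insn insn, enum mode mode)
{
  if (insn == INSN_IDIV)
    /* r4600_idiv_si and r4600_idiv_di now match both CPUs.  */
    return mode == MODE_SI ? 42 : 74;

  if (cpu == CPU_R4700)
    /* r4700_imul_si and r4700_imul_di.  */
    return mode == MODE_SI ? 8 : 10;

  /* r4600_imul_si; the DI-mode value is not modelled here.  */
  return mode == MODE_SI ? 10 : -1;
}
```

The division rows are shared between the two CPUs, and only the multiply rows differ, which matches the description above.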

2012-03-24  Matt Turner  

gcc/
* config/mips/4600.md (r4700_imul_si): New.
(r4700_imul_di): New.
(r4700_fmul_single): New.
(r4700_fmul_double): New.
* config/mips/mips-cpus.def: Add r4700.
* config/mips/mips.c: Likewise.
* config/mips/mips.md: Likewise.
* config/mips/mips-tables.opt: Regenerate.
---
 gcc/config/mips/4600.md |   51 ++--
 gcc/config/mips/mips-cpus.def   |1 +
 gcc/config/mips/mips-tables.opt |  278 ---
 gcc/config/mips/mips.c  |3 +
 gcc/config/mips/mips.md |1 +
 5 files changed, 187 insertions(+), 147 deletions(-)

diff --git a/gcc/config/mips/4600.md b/gcc/config/mips/4600.md
index 53aa01b..36eab80 100644
--- a/gcc/config/mips/4600.md
+++ b/gcc/config/mips/4600.md
@@ -1,4 +1,4 @@
-;; R4600 and R4650 pipeline description.
+;; R4600, R4650, and R4700 pipeline description.
 ;;   Copyright (C) 2004, 2005, 2007, 2012 Free Software Foundation, Inc.
 ;;
 ;; This file is part of GCC.
@@ -21,8 +21,10 @@
 ;; This file overrides parts of generic.md.  It is derived from the
 ;; old define_function_unit description.
 ;;
-;; We handle the R4600 and R4650 in much the same way.  The only difference
-;; is in the integer multiplication and division costs.
+;; We handle the R4600, R4650, and R4700 in much the same way.  The only
+;; differences between R4600 and R4650 are the integer multiplication and
+;; division costs. The only differences between R4600 and R4700 are the
+;; integer and floating-point multiplication costs.
 
 (define_insn_reservation "r4600_imul_si" 10
   (and (eq_attr "cpu" "r4600")
@@ -37,13 +39,13 @@
   "imuldiv*12")
 
 (define_insn_reservation "r4600_idiv_si" 42
-  (and (eq_attr "cpu" "r4600")
+  (and (eq_attr "cpu" "r4600,r4700")
(eq_attr "type" "idiv")
(eq_attr "mode" "SI"))
   "imuldiv*42")
 
 (define_insn_reservation "r4600_idiv_di" 74
-  (and (eq_attr "cpu" "r4600")
+  (and (eq_attr "cpu" "r4600,r4700")
(eq_attr "type" "idiv")
(eq_attr "mode" "DI"))
   "imuldiv*74")
@@ -60,13 +62,26 @@
   "imuldiv*36")
 
 
+(define_insn_reservation "r4700_imul_si" 8
+  (and (eq_attr "cpu" "r4700")
+   (eq_attr "type" "imul,imul3,imadd")
+   (eq_attr "mode" "SI"))
+  "imuldiv*8")
+
+(define_insn_reservation "r4700_imul_di" 10
+  (and (eq_attr "cpu" "r4700")
+   (eq_attr "type" "imul,imul3,imadd")
+   (eq_attr "mode" "DI"))
+  "imuldiv*10")
+
+
 (define_insn_reservation "r4600_load" 2
-  (and (eq_attr "cpu" "r4600,r4650")
+  (and (eq_attr "cpu" "r4600,r4650,r4700")
(eq_attr "type" "load,fpload,fpidxload"))
   "alu")
 
 (define_insn_reservation "r4600_fmove" 1
-  (and (eq_attr "cpu" "r4600,r4650")
+  (and (eq_attr "cpu" "r4600,r4650,r4700")
(eq_attr "type" "fabs,fneg,fmove"))
   "alu")
 
@@ -82,26 +97,40 @@
(eq_attr "mode" "DF")))
   "alu")
 
+
+(define_insn_reservation "r4700_fmul_single" 4
+  (and (eq_attr "cpu" "r4700")
+   (and (eq_attr "type" "fmul,fmadd")
+   (eq_attr "mode" "SF")))
+  "alu")
+
+(define_insn_reservation "r4700_fmul_double" 5
+  (and (eq_attr "cpu" "r4700")
+   (and (eq_attr "type" "fmul,fmadd")
+   (eq_attr "mode" "DF")))
+  "alu")
+
+
 (define_insn_reservation "r4600_fdiv_single" 32
-  (and (eq_attr "cpu" "r4600,r4650")
+  (and (eq_attr "cpu" "r4600,r4650,r4700")
(and (eq_attr "type" "fdiv,frdiv")
(eq_attr "mode" "SF")))
   "alu")
 
 (define_insn_reservation "r4600_fdiv_double" 61
-  (and (eq_attr "cpu" "r4600,r4650")
+  (and (eq_attr "cpu" "r4600,r4650,r4700")
(and (eq_attr "type" "fdiv,frdiv")
(eq_attr "mode" "DF")))
   "alu")
 
 (define_insn_reservation "r4600_fsqrt_single" 31
-  (and (eq_attr "cpu" "r4600,r4650")
+  (and (eq_attr "cpu" &

Re: [PATCH] vrp: fold ffs to ctz

2012-06-05 Thread Matt Thomas


On Jun 5, 2012, at 6:46 AM, Paolo Bonzini wrote:

>> Do we always have CTZ if we have FFS?  Can't there be a target that
>> implements FFS as opcode but not CTZ, so you'd slow down things?
>> Thus, should the transform be conditional on target support for CTZ
>> or no target support for FFS?
> 
> Hmm, SH and (some semi-obscure variant of) SPARC.  But actually SPARC
> should define a clz pattern instead; SH should have a popcount pattern +
> a generic trick to expand ctz/ffs in terms of popcount.  I'll submit
> those before applying this patch.

VAX has both FFS and FFC instructions but only an ffs pattern.
It does not have CTZ or CTO patterns, but those could be added trivially.
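
For reference, the relationship between the two operations under discussion can be sketched in C. This is only an illustration of the identity, not code from the patch; ffs_via_ctz is a hypothetical name, and __builtin_ctz is the GCC builtin, whose result is undefined for a zero argument.

```c
/* ffs (x) is the 1-based index of the least significant set bit, or 0
   when x is 0.  __builtin_ctz is undefined for 0, so that case needs a
   guard unless the argument is known to be nonzero.  */
static int
ffs_via_ctz (unsigned int x)
{
  return x ? __builtin_ctz (x) + 1 : 0;
}
```

When the compiler can prove the argument is nonzero, the guard disappears and ffs reduces to ctz plus one, which is why target support for a ctz pattern matters here.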

[PATCH] Wire-up missing ARM iwmmxt intrinsics (bugs 35294, 36798, 36966)

2011-08-18 Thread Matt Turner

Hi,

Attached is a patch based on gcc-4.6.1 that wires-up missing ARM
iwmmxt intrinsics. Without it, gcc is completely useless when it comes
to using a large portion of the intrinsics documented on this page:
http://gcc.gnu.org/onlinedocs/gcc/ARM-iWMMXt-Built_002din-Functions.html

The patch is based on the work of  in bug 35294.

I do not know why the check_opsmode hack is necessary. Perhaps serowk
can help with that. I also do not know if this wires up all the
missing intrinsics, but it is sufficient to build a working
iwmmxt-optimized pixman:
http://cgit.freedesktop.org/~mattst88/pixman/log/?h=iwmmxt-optimizations

I have seen much more extensive patches from Xinyu Qi, but I do not
suppose that they will be available in gcc 4.6.

Thanks,
Matt Turner
--- arm.c.orig	2011-08-19 00:03:06.163195724 -0400
+++ arm.c	2011-08-19 00:03:10.872195933 -0400
@@ -157,7 +157,7 @@
 static void arm_init_builtins (void);
 static void arm_init_iwmmxt_builtins (void);
 static rtx safe_vector_operand (rtx, enum machine_mode);
-static rtx arm_expand_binop_builtin (enum insn_code, tree, rtx);
+static rtx arm_expand_binop_builtin (enum insn_code, tree, rtx, bool);
 static rtx arm_expand_unop_builtin (enum insn_code, tree, rtx, int);
 static rtx arm_expand_builtin (tree, rtx, rtx, enum machine_mode, int);
 static void emit_constant_insn (rtx cond, rtx pattern);
@@ -19197,7 +19197,7 @@
 
 static rtx
 arm_expand_binop_builtin (enum insn_code icode,
-			  tree exp, rtx target)
+			  tree exp, rtx target, bool check_opsmode)
 {
   rtx pat;
   tree arg0 = CALL_EXPR_ARG (exp, 0);
@@ -19218,7 +19218,8 @@
   || ! (*insn_data[icode].operand[0].predicate) (target, tmode))
 target = gen_reg_rtx (tmode);
 
-  gcc_assert (GET_MODE (op0) == mode0 && GET_MODE (op1) == mode1);
+  if (check_opsmode)
+gcc_assert (GET_MODE (op0) == mode0 && GET_MODE (op1) == mode1);
 
   if (! (*insn_data[icode].operand[1].predicate) (op0, mode0))
 op0 = copy_to_mode_reg (mode0, op0);
@@ -19760,13 +19761,13 @@
   return target;
 
 case ARM_BUILTIN_WSADB:
-  return arm_expand_binop_builtin (CODE_FOR_iwmmxt_wsadb, exp, target);
+  return arm_expand_binop_builtin (CODE_FOR_iwmmxt_wsadb, exp, target, true);
 case ARM_BUILTIN_WSADH:
-  return arm_expand_binop_builtin (CODE_FOR_iwmmxt_wsadh, exp, target);
+  return arm_expand_binop_builtin (CODE_FOR_iwmmxt_wsadh, exp, target, true);
 case ARM_BUILTIN_WSADBZ:
-  return arm_expand_binop_builtin (CODE_FOR_iwmmxt_wsadbz, exp, target);
+  return arm_expand_binop_builtin (CODE_FOR_iwmmxt_wsadbz, exp, target, true);
 case ARM_BUILTIN_WSADHZ:
-  return arm_expand_binop_builtin (CODE_FOR_iwmmxt_wsadhz, exp, target);
+  return arm_expand_binop_builtin (CODE_FOR_iwmmxt_wsadhz, exp, target, true);
 
   /* Several three-argument builtins.  */
 case ARM_BUILTIN_WMACS:
@@ -19814,6 +19815,65 @@
   emit_insn (pat);
   return target;
 
+case ARM_BUILTIN_WSLLH:
+case ARM_BUILTIN_WSLLHI:
+case ARM_BUILTIN_WSLLW:
+case ARM_BUILTIN_WSLLWI:
+case ARM_BUILTIN_WSLLD:
+case ARM_BUILTIN_WSLLDI:
+case ARM_BUILTIN_WSRAH:
+case ARM_BUILTIN_WSRAHI:
+case ARM_BUILTIN_WSRAW:
+case ARM_BUILTIN_WSRAWI:
+case ARM_BUILTIN_WSRAD:
+case ARM_BUILTIN_WSRADI:
+case ARM_BUILTIN_WSRLH:
+case ARM_BUILTIN_WSRLHI:
+case ARM_BUILTIN_WSRLW:
+case ARM_BUILTIN_WSRLWI:
+case ARM_BUILTIN_WSRLD:
+case ARM_BUILTIN_WSRLDI:
+case ARM_BUILTIN_WRORH:
+case ARM_BUILTIN_WRORHI:
+case ARM_BUILTIN_WRORW:
+case ARM_BUILTIN_WRORWI:
+case ARM_BUILTIN_WRORD:
+case ARM_BUILTIN_WRORDI:
+case ARM_BUILTIN_WAND:
+case ARM_BUILTIN_WANDN:
+case ARM_BUILTIN_WOR:
+case ARM_BUILTIN_WXOR:
+  icode = (fcode == ARM_BUILTIN_WSLLH ? CODE_FOR_ashlv4hi3_di
+	   : fcode == ARM_BUILTIN_WSLLHI ? CODE_FOR_ashlv4hi3_iwmmxt
+	   : fcode == ARM_BUILTIN_WSLLW  ? CODE_FOR_ashlv2si3_di
+	   : fcode == ARM_BUILTIN_WSLLWI ? CODE_FOR_ashlv2si3_iwmmxt
+	   : fcode == ARM_BUILTIN_WSLLD  ? CODE_FOR_ashldi3_di
+	   : fcode == ARM_BUILTIN_WSLLDI ? CODE_FOR_ashldi3_iwmmxt
+	   : fcode == ARM_BUILTIN_WSRAH  ? CODE_FOR_ashrv4hi3_di
+	   : fcode == ARM_BUILTIN_WSRAHI ? CODE_FOR_ashrv4hi3_iwmmxt
+	   : fcode == ARM_BUILTIN_WSRAW  ? CODE_FOR_ashrv2si3_di
+	   : fcode == ARM_BUILTIN_WSRAWI ? CODE_FOR_ashrv2si3_iwmmxt
+	   : fcode == ARM_BUILTIN_WSRAD  ? CODE_FOR_ashrdi3_di
+	   : fcode == ARM_BUILTIN_WSRADI ? CODE_FOR_ashrdi3_iwmmxt
+	   : fcode == ARM_BUILTIN_WSRLH  ? CODE_FOR_lshrv4hi3_di
+	   : fcode == ARM_BUILTIN_WSRLHI ? CODE_FOR_lshrv4hi3_iwmmxt
+	   : fcode == ARM_BUILTIN_WSRLW  ? CODE_FOR_lshrv2si3_di
+	   : fcode == ARM_BUILTIN_WSRLWI ? CODE_FOR_lshrv2si3_iwmmxt
+	   : fcode == ARM_BUILTIN_WSRLD  ? CODE_FOR_lshrdi3_di
+	   : fcode == ARM_BUILTIN_WSRLDI ? CODE_FOR_lshrdi3_iwmmxt
+	   : fcode == ARM_BUILTIN_WRORH  ? CODE

Re: [PATCH] Wire-up missing ARM iwmmxt intrinsics (bugs 35294, 36798, 36966)

2011-08-18 Thread Matt Turner

On Fri, Aug 19, 2011 at 12:13 AM, Matt Turner  wrote:
> Hi,
>
> Attached is a patch based on gcc-4.6.1 that wires-up missing ARM
> iwmmxt intrinsics. Without it, gcc is completely useless when it comes
> to using a large portion of the intrinsics documented on this page:
> http://gcc.gnu.org/onlinedocs/gcc/ARM-iWMMXt-Built_002din-Functions.html
>
> The patch is based on the work of  in bug 35294.
>
> I do not know why the check_opsmode hack is necessary. Perhaps serowk
> can help with that. I also do not know if this wires up all the
> missing intrinsics, but it is sufficient to build a working
> iwmmxt-optimized pixman:
> http://cgit.freedesktop.org/~mattst88/pixman/log/?h=iwmmxt-optimizations
>
> I have seen much more extensive patches from Xinyu Qi, but I do not
> suppose that they will be available in gcc 4.6.
>
> Thanks,
> Matt Turner

Re: [PATCH] Wire-up missing ARM iwmmxt intrinsics (bugs 35294, 36798, 36966)

2011-08-19 Thread Matt Turner

On Fri, Aug 19, 2011 at 2:09 AM, Xinyu Qi  wrote:
> At 2011-08-19 12:18:10, "Matt Turner" wrote:
>>
>> On Fri, Aug 19, 2011 at 12:13 AM, Matt Turner  wrote:
>> > Hi,
>> >
>> > Attached is a patch based on gcc-4.6.1 that wires-up missing ARM
>> > iwmmxt intrinsics. Without it, gcc is completely useless when it comes
>> > to using a large portion of the intrinsics documented on this page:
>> > http://gcc.gnu.org/onlinedocs/gcc/ARM-iWMMXt-Built_002din-Functions.html
>> >
>> > The patch is based on the work of  in bug 35294.
>> >
>> > I do not know why the check_opsmode hack is necessary.
>
> Hi,
>
> I think check_opsmode in this patch is used to solve something that could be solved by
> -  gcc_assert (GET_MODE (op0) == mode0 && GET_MODE (op1) == mode1);
> +  gcc_assert ((GET_MODE (op0) == mode0 || GET_MODE (op0) == VOIDmode)
> +             && (GET_MODE (op1) == mode1 || GET_MODE (op1) == VOIDmode));
> in my patch.
> For example, in the shift intrinsics, the shift count could be either a
> variable or a CONST_INT, which has VOIDmode.
>
>> >I also do not know if this wires up all the missing intrinsics.
>
> I'm afraid not. Trunk is missing all iWMMXt2 intrinsics, and bugs can be
> found everywhere since the code has gone unmaintained for a long time.
>
>> > I have seen much more extensive patches from Xinyu Qi, but I do not
>> > suppose that they will be available in gcc 4.6.
>
> The patches I submitted have some conflicts with the 4.6 code base.
>
> Thanks,
> Xinyu

Indeed, that seems like the way it should be done. Thanks very much.
See the attached patch.

Thanks,
Matt
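
The relaxed check can be modelled outside GCC as follows. This is a toy sketch: the enum and both function names are hypothetical stand-ins rather than GCC's machine-mode API, and it only illustrates why an operand with no mode, such as a CONST_INT shift count, has to pass the assertion.

```c
#include <stdbool.h>

/* Toy stand-ins for GCC's machine modes; not the real enum.  */
enum toy_mode { TOY_VOIDmode, TOY_SImode, TOY_DImode, TOY_V4HImode };

/* An operand is acceptable if its mode matches what the insn pattern
   expects, or if it carries no mode at all (as a CONST_INT does).  */
static bool
operand_mode_ok (enum toy_mode actual, enum toy_mode expected)
{
  return actual == expected || actual == TOY_VOIDmode;
}

/* The relaxed condition from the patch, applied to both operands.  */
static bool
binop_modes_ok (enum toy_mode op0, enum toy_mode mode0,
                enum toy_mode op1, enum toy_mode mode1)
{
  return operand_mode_ok (op0, mode0) && operand_mode_ok (op1, mode1);
}
```

A vector operand paired with a constant shift count passes this check, whereas the original strict comparison would have rejected the constant.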
--- arm.c.orig	2011-05-05 04:39:40.0 -0400
+++ arm.c	2011-08-19 13:48:21.548405102 -0400
@@ -19218,7 +19218,8 @@
   || ! (*insn_data[icode].operand[0].predicate) (target, tmode))
 target = gen_reg_rtx (tmode);
 
-  gcc_assert (GET_MODE (op0) == mode0 && GET_MODE (op1) == mode1);
+  gcc_assert ((GET_MODE (op0) == mode0 || GET_MODE (op0) == VOIDmode)
+ && (GET_MODE (op1) == mode1 || GET_MODE (op1) == VOIDmode));
 
   if (! (*insn_data[icode].operand[1].predicate) (op0, mode0))
 op0 = copy_to_mode_reg (mode0, op0);
@@ -19814,6 +19815,65 @@
   emit_insn (pat);
   return target;
 
+case ARM_BUILTIN_WSLLH:
+case ARM_BUILTIN_WSLLHI:
+case ARM_BUILTIN_WSLLW:
+case ARM_BUILTIN_WSLLWI:
+case ARM_BUILTIN_WSLLD:
+case ARM_BUILTIN_WSLLDI:
+case ARM_BUILTIN_WSRAH:
+case ARM_BUILTIN_WSRAHI:
+case ARM_BUILTIN_WSRAW:
+case ARM_BUILTIN_WSRAWI:
+case ARM_BUILTIN_WSRAD:
+case ARM_BUILTIN_WSRADI:
+case ARM_BUILTIN_WSRLH:
+case ARM_BUILTIN_WSRLHI:
+case ARM_BUILTIN_WSRLW:
+case ARM_BUILTIN_WSRLWI:
+case ARM_BUILTIN_WSRLD:
+case ARM_BUILTIN_WSRLDI:
+case ARM_BUILTIN_WRORH:
+case ARM_BUILTIN_WRORHI:
+case ARM_BUILTIN_WRORW:
+case ARM_BUILTIN_WRORWI:
+case ARM_BUILTIN_WRORD:
+case ARM_BUILTIN_WRORDI:
+case ARM_BUILTIN_WAND:
+case ARM_BUILTIN_WANDN:
+case ARM_BUILTIN_WOR:
+case ARM_BUILTIN_WXOR:
+  icode = (fcode == ARM_BUILTIN_WSLLH ? CODE_FOR_ashlv4hi3_di
+	   : fcode == ARM_BUILTIN_WSLLHI ? CODE_FOR_ashlv4hi3_iwmmxt
+	   : fcode == ARM_BUILTIN_WSLLW  ? CODE_FOR_ashlv2si3_di
+	   : fcode == ARM_BUILTIN_WSLLWI ? CODE_FOR_ashlv2si3_iwmmxt
+	   : fcode == ARM_BUILTIN_WSLLD  ? CODE_FOR_ashldi3_di
+	   : fcode == ARM_BUILTIN_WSLLDI ? CODE_FOR_ashldi3_iwmmxt
+	   : fcode == ARM_BUILTIN_WSRAH  ? CODE_FOR_ashrv4hi3_di
+	   : fcode == ARM_BUILTIN_WSRAHI ? CODE_FOR_ashrv4hi3_iwmmxt
+	   : fcode == ARM_BUILTIN_WSRAW  ? CODE_FOR_ashrv2si3_di
+	   : fcode == ARM_BUILTIN_WSRAWI ? CODE_FOR_ashrv2si3_iwmmxt
+	   : fcode == ARM_BUILTIN_WSRAD  ? CODE_FOR_ashrdi3_di
+	   : fcode == ARM_BUILTIN_WSRADI ? CODE_FOR_ashrdi3_iwmmxt
+	   : fcode == ARM_BUILTIN_WSRLH  ? CODE_FOR_lshrv4hi3_di
+	   : fcode == ARM_BUILTIN_WSRLHI ? CODE_FOR_lshrv4hi3_iwmmxt
+	   : fcode == ARM_BUILTIN_WSRLW  ? CODE_FOR_lshrv2si3_di
+	   : fcode == ARM_BUILTIN_WSRLWI ? CODE_FOR_lshrv2si3_iwmmxt
+	   : fcode == ARM_BUILTIN_WSRLD  ? CODE_FOR_lshrdi3_di
+	   : fcode == ARM_BUILTIN_WSRLDI ? CODE_FOR_lshrdi3_iwmmxt
+	   : fcode == ARM_BUILTIN_WRORH  ? CODE_FOR_rorv4hi3_di
+	   : fcode == ARM_BUILTIN_WRORHI ? CODE_FOR_rorv4hi3
+	   : fcode == ARM_BUILTIN_WRORW  ? CODE_FOR_rorv2si3_di
+	   : fcode == ARM_BUILTIN_WRORWI ? CODE_FOR_rorv2si3
+	   : fcode == ARM_BUILTIN_WRORD  ? CODE_FOR_rordi3_di
+	   : fcode == ARM_BUILTIN_WRORDI ? CODE_FOR_rordi3
+	   : fcode == ARM_BUILTIN_WAND   ? CODE_FOR_iwmmxt_anddi3
+	   : fcode == ARM_BUILTIN_WANDN  ? CODE_FOR_iwmmxt_nanddi3
+	   : fcode == ARM_BUILTIN_WOR

Re: [google] Add intermediate text format for gcov (issue4595053)

2011-10-01 Thread Matt Rice

2011/6/14 Sharad Singhai (शरद सिंघई) :
> Sorry, Rietveld didn't send out the updated patch along with my mail.
> Here it is.
>

Hi, I tried this patch out on trunk. It applies fine and appears to
work (I haven't run the testsuite, though). Are there any plans to
submit it for inclusion in mainline GCC?

[PATCH] Wire-up missing ARM iwmmxt intrinsics (bugs 35294, 36798, 36966)

2011-10-01 Thread Matt Turner


--- arm.c.orig  2011-05-05 04:39:40.0 -0400
+++ arm.c   2011-08-19 13:48:21.548405102 -0400
@@ -19218,7 +19218,8 @@
   || ! (*insn_data[icode].operand[0].predicate) (target, tmode))
 target = gen_reg_rtx (tmode);
 
-  gcc_assert (GET_MODE (op0) == mode0 && GET_MODE (op1) == mode1);
+  gcc_assert ((GET_MODE (op0) == mode0 || GET_MODE (op0) == VOIDmode)
+ && (GET_MODE (op1) == mode1 || GET_MODE (op1) == VOIDmode));
 
   if (! (*insn_data[icode].operand[1].predicate) (op0, mode0))
 op0 = copy_to_mode_reg (mode0, op0);
@@ -19814,6 +19815,65 @@
   emit_insn (pat);
   return target;
 
+case ARM_BUILTIN_WSLLH:
+case ARM_BUILTIN_WSLLHI:
+case ARM_BUILTIN_WSLLW:
+case ARM_BUILTIN_WSLLWI:
+case ARM_BUILTIN_WSLLD:
+case ARM_BUILTIN_WSLLDI:
+case ARM_BUILTIN_WSRAH:
+case ARM_BUILTIN_WSRAHI:
+case ARM_BUILTIN_WSRAW:
+case ARM_BUILTIN_WSRAWI:
+case ARM_BUILTIN_WSRAD:
+case ARM_BUILTIN_WSRADI:
+case ARM_BUILTIN_WSRLH:
+case ARM_BUILTIN_WSRLHI:
+case ARM_BUILTIN_WSRLW:
+case ARM_BUILTIN_WSRLWI:
+case ARM_BUILTIN_WSRLD:
+case ARM_BUILTIN_WSRLDI:
+case ARM_BUILTIN_WRORH:
+case ARM_BUILTIN_WRORHI:
+case ARM_BUILTIN_WRORW:
+case ARM_BUILTIN_WRORWI:
+case ARM_BUILTIN_WRORD:
+case ARM_BUILTIN_WRORDI:
+case ARM_BUILTIN_WAND:
+case ARM_BUILTIN_WANDN:
+case ARM_BUILTIN_WOR:
+case ARM_BUILTIN_WXOR:
+  icode = (fcode == ARM_BUILTIN_WSLLH ? CODE_FOR_ashlv4hi3_di
+  : fcode == ARM_BUILTIN_WSLLHI ? CODE_FOR_ashlv4hi3_iwmmxt
+  : fcode == ARM_BUILTIN_WSLLW  ? CODE_FOR_ashlv2si3_di
+  : fcode == ARM_BUILTIN_WSLLWI ? CODE_FOR_ashlv2si3_iwmmxt
+  : fcode == ARM_BUILTIN_WSLLD  ? CODE_FOR_ashldi3_di
+  : fcode == ARM_BUILTIN_WSLLDI ? CODE_FOR_ashldi3_iwmmxt
+  : fcode == ARM_BUILTIN_WSRAH  ? CODE_FOR_ashrv4hi3_di
+  : fcode == ARM_BUILTIN_WSRAHI ? CODE_FOR_ashrv4hi3_iwmmxt
+  : fcode == ARM_BUILTIN_WSRAW  ? CODE_FOR_ashrv2si3_di
+  : fcode == ARM_BUILTIN_WSRAWI ? CODE_FOR_ashrv2si3_iwmmxt
+  : fcode == ARM_BUILTIN_WSRAD  ? CODE_FOR_ashrdi3_di
+  : fcode == ARM_BUILTIN_WSRADI ? CODE_FOR_ashrdi3_iwmmxt
+  : fcode == ARM_BUILTIN_WSRLH  ? CODE_FOR_lshrv4hi3_di
+  : fcode == ARM_BUILTIN_WSRLHI ? CODE_FOR_lshrv4hi3_iwmmxt
+  : fcode == ARM_BUILTIN_WSRLW  ? CODE_FOR_lshrv2si3_di
+  : fcode == ARM_BUILTIN_WSRLWI ? CODE_FOR_lshrv2si3_iwmmxt
+  : fcode == ARM_BUILTIN_WSRLD  ? CODE_FOR_lshrdi3_di
+  : fcode == ARM_BUILTIN_WSRLDI ? CODE_FOR_lshrdi3_iwmmxt
+  : fcode == ARM_BUILTIN_WRORH  ? CODE_FOR_rorv4hi3_di
+  : fcode == ARM_BUILTIN_WRORHI ? CODE_FOR_rorv4hi3
+  : fcode == ARM_BUILTIN_WRORW  ? CODE_FOR_rorv2si3_di
+  : fcode == ARM_BUILTIN_WRORWI ? CODE_FOR_rorv2si3
+  : fcode == ARM_BUILTIN_WRORD  ? CODE_FOR_rordi3_di
+  : fcode == ARM_BUILTIN_WRORDI ? CODE_FOR_rordi3
+  : fcode == ARM_BUILTIN_WAND   ? CODE_FOR_iwmmxt_anddi3
+  : fcode == ARM_BUILTIN_WANDN  ? CODE_FOR_iwmmxt_nanddi3
+  : fcode == ARM_BUILTIN_WOR    ? CODE_FOR_iwmmxt_iordi3
+  : fcode == ARM_BUILTIN_WXOR   ? CODE_FOR_iwmmxt_xordi3
+  : CODE_FOR_rordi3);
+  return arm_expand_binop_builtin (icode, exp, target);
+
 case ARM_BUILTIN_WZERO:
   target = gen_reg_rtx (DImode);
   emit_insn (gen_iwmmxt_clrdi (target));
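
For readers skimming the hunk above: the long conditional chain is, in effect, a lookup from builtin code to instruction pattern with a fallback at the end. A minimal sketch of that idea in plain C follows. The `demo_*` enums and the table are hypothetical stand-ins covering only four of the entries; this is not GCC's code.

```c
#include <stddef.h>

/* Hypothetical stand-ins for a few ARM_BUILTIN_* and CODE_FOR_* values.  */
enum demo_builtin { DEMO_WSLLH, DEMO_WSLLHI, DEMO_WAND, DEMO_WXOR, DEMO_OTHER };
enum demo_icode { DEMO_ashlv4hi3_di, DEMO_ashlv4hi3_iwmmxt,
                  DEMO_iwmmxt_anddi3, DEMO_iwmmxt_xordi3, DEMO_rordi3 };

struct demo_map { enum demo_builtin fcode; enum demo_icode icode; };

/* One row per builtin, in the same order as the conditional chain.  */
static const struct demo_map demo_table[] = {
  { DEMO_WSLLH,  DEMO_ashlv4hi3_di },
  { DEMO_WSLLHI, DEMO_ashlv4hi3_iwmmxt },
  { DEMO_WAND,   DEMO_iwmmxt_anddi3 },
  { DEMO_WXOR,   DEMO_iwmmxt_xordi3 },
};

/* Return the pattern for FCODE.  Unknown codes fall back to DEMO_rordi3,
   mirroring the final arm of the conditional chain in the patch.  */
static enum demo_icode
demo_lookup (enum demo_builtin fcode)
{
  for (size_t i = 0; i < sizeof demo_table / sizeof demo_table[0]; i++)
    if (demo_table[i].fcode == fcode)
      return demo_table[i].icode;
  return DEMO_rordi3;
}
```

The patch itself keeps the conditional chain; the table form is shown only to make the mapping easier to read.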

[PATCH 1/2] doc: Correct __builtin_arm_tinsr prototype documentation

2012-04-04 Thread Matt Turner

2012-04-04  Matt Turner  

gcc/
* doc/extend.texi (__builtin_arm_tinsrb): Add missing second
parameter.
(__builtin_arm_tinsrh): Likewise.
(__builtin_arm_tinsrw): Likewise.
---
This patch and 2/2 are tie-ons to
http://gcc.gnu.org/ml/gcc-patches/2012-02/msg01269.html

Still waiting on copyright assignment, but I think this doc patch
is trivial enough to be committed without it.

 gcc/doc/extend.texi |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index bb43825..966175d 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -8676,9 +8676,9 @@ int __builtin_arm_textrmsw (v2si, int)
 int __builtin_arm_textrmub (v8qi, int)
 int __builtin_arm_textrmuh (v4hi, int)
 int __builtin_arm_textrmuw (v2si, int)
-v8qi __builtin_arm_tinsrb (v8qi, int)
-v4hi __builtin_arm_tinsrh (v4hi, int)
-v2si __builtin_arm_tinsrw (v2si, int)
+v8qi __builtin_arm_tinsrb (v8qi, int, int)
+v4hi __builtin_arm_tinsrh (v4hi, int, int)
+v2si __builtin_arm_tinsrw (v2si, int, int)
 long long __builtin_arm_tmia (long long, int, int)
 long long __builtin_arm_tmiabb (long long, int, int)
 long long __builtin_arm_tmiabt (long long, int, int)
-- 
1.7.3.4
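
To illustrate why the extra `int` matters, here is a rough portable model of a halfword lane insert. It assumes the three arguments are (vector, value, lane), which matches how the mmx-2.c test in patch 2/2 calls the builtin with a constant last argument. The name `model_tinsrh` and the lane masking are choices made for this sketch, not part of GCC.

```c
#include <stdint.h>

/* Sketch only: replace lane LANE (0..3) of V with the low 16 bits of VALUE
   and leave the other lanes untouched.  Masking LANE with 3 is this
   sketch's own way of staying in bounds.  */
static void
model_tinsrh (uint16_t v[4], int value, int lane)
{
  v[lane & 3] = (uint16_t) value;
}
```

With the old two-argument prototype in the documentation there would be no way to say both what to insert and where.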

[PATCH] doc: Fix typo: mno-lsc -> mno-llsc

2012-04-04 Thread Matt Turner

2012-04-04  Matt Turner  

gcc/
* doc/install.texi: Correct typo "-mno-lsc" -> "-mno-llsc".
---
Still waiting on copyright assignment, but I think this doc patch
is trivial enough to be committed without it.

 gcc/doc/install.texi |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 41dbf44..6da6c09 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -1238,7 +1238,7 @@ Division by zero checks use the break instruction.
 
 @item --with-llsc
 On MIPS targets, make @option{-mllsc} the default when no
-@option{-mno-lsc} option is passed.  This is the default for
+@option{-mno-llsc} option is passed.  This is the default for
 Linux-based targets, as the kernel will emulate them if the ISA does
 not provide them.
 
-- 
1.7.3.4

[PATCH 2/2] arm: add iwMMXt mmx-2.c test

2012-04-04 Thread Matt Turner

2012-04-04  Matt Turner  

PR target/35294
* gcc.target/arm/mmx-2.c: New.
---
This patch and 1/2 are tie-ons to
http://gcc.gnu.org/ml/gcc-patches/2012-02/msg01269.html

Still waiting on copyright assignment, but please review in the meantime.

Is there anything else I need to do to wire this into the test suite
other than putting it in the testsuite/gcc.target/arm/ folder?

 gcc/testsuite/gcc.target/arm/mmx-2.c |  158 ++
 1 files changed, 158 insertions(+), 0 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/mmx-2.c

diff --git a/gcc/testsuite/gcc.target/arm/mmx-2.c b/gcc/testsuite/gcc.target/arm/mmx-2.c
new file mode 100644
index 000..603a63b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mmx-2.c
@@ -0,0 +1,158 @@
+/* { dg-do compile } */
+/* { dg-skip-if "Test is specific to the iWMMXt" { arm*-*-* } { "-mcpu=*" } { "-mcpu=iwmmxt" } } */
+/* { dg-skip-if "Test is specific to the iWMMXt" { arm*-*-* } { "-mabi=*" } { "-mabi=iwmmxt" } } */
+/* { dg-skip-if "Test is specific to the iWMMXt" { arm*-*-* } { "-march=*" } { "-march=iwmmxt" } } */
+/* { dg-skip-if "Test is specific to ARM mode" { arm*-*-* } { "-mthumb" } { "" } } */
+/* { dg-require-effective-target arm32 } */
+/* { dg-require-effective-target arm_iwmmxt_ok } */
+
+/* Internal data types for implementing the intrinsics.  */
+typedef int __v2si __attribute__ ((vector_size (8)));
+typedef short __v4hi __attribute__ ((vector_size (8)));
+typedef signed char __v8qi __attribute__ ((vector_size (8)));
+
+void
+foo(void)
+{
+  volatile int isink;
+  volatile long long llsink;
+  volatile __v8qi v8sink;
+  volatile __v4hi v4sink;
+  volatile __v2si v2sink;
+
+  isink = __builtin_arm_getwcx (0);
+  __builtin_arm_setwcx (isink, 0);
+  isink = __builtin_arm_textrmsb (v8sink, 0);
+  isink = __builtin_arm_textrmsh (v4sink, 0);
+  isink = __builtin_arm_textrmsw (v2sink, 0);
+  isink = __builtin_arm_textrmub (v8sink, 0);
+  isink = __builtin_arm_textrmuh (v4sink, 0);
+  isink = __builtin_arm_textrmuw (v2sink, 0);
+  v8sink = __builtin_arm_tinsrb (v8sink, isink, 0);
+  v4sink = __builtin_arm_tinsrh (v4sink, isink, 0);
+  v2sink = __builtin_arm_tinsrw (v2sink, isink, 0);
+  llsink = __builtin_arm_tmia (llsink, isink, isink);
+  llsink = __builtin_arm_tmiabb (llsink, isink, isink);
+  llsink = __builtin_arm_tmiabt (llsink, isink, isink);
+  llsink = __builtin_arm_tmiaph (llsink, isink, isink);
+  llsink = __builtin_arm_tmiatb (llsink, isink, isink);
+  llsink = __builtin_arm_tmiatt (llsink, isink, isink);
+  isink = __builtin_arm_tmovmskb (v8sink);
+  isink = __builtin_arm_tmovmskh (v4sink);
+  isink = __builtin_arm_tmovmskw (v2sink);
+  llsink = __builtin_arm_waccb (v8sink);
+  llsink = __builtin_arm_wacch (v4sink);
+  llsink = __builtin_arm_waccw (v2sink);
+  v8sink = __builtin_arm_waddb (v8sink, v8sink);
+  v8sink = __builtin_arm_waddbss (v8sink, v8sink);
+  v8sink = __builtin_arm_waddbus (v8sink, v8sink);
+  v4sink = __builtin_arm_waddh (v4sink, v4sink);
+  v4sink = __builtin_arm_waddhss (v4sink, v4sink);
+  v4sink = __builtin_arm_waddhus (v4sink, v4sink);
+  v2sink = __builtin_arm_waddw (v2sink, v2sink);
+  v2sink = __builtin_arm_waddwss (v2sink, v2sink);
+  v2sink = __builtin_arm_waddwus (v2sink, v2sink);
+  v8sink = __builtin_arm_walign (v8sink, v8sink, 0);  /* waligni: 3-bit immediate.  */
+  v8sink = __builtin_arm_walign (v8sink, v8sink, isink); /* walignr: GP register.  */
+  llsink = __builtin_arm_wand(llsink, llsink);
+  llsink = __builtin_arm_wandn (llsink, llsink);
+  v8sink = __builtin_arm_wavg2b (v8sink, v8sink);
+  v8sink = __builtin_arm_wavg2br (v8sink, v8sink);
+  v4sink = __builtin_arm_wavg2h (v4sink, v4sink);
+  v4sink = __builtin_arm_wavg2hr (v4sink, v4sink);
+  v8sink = __builtin_arm_wcmpeqb (v8sink, v8sink);
+  v4sink = __builtin_arm_wcmpeqh (v4sink, v4sink);
+  v2sink = __builtin_arm_wcmpeqw (v2sink, v2sink);
+  v8sink = __builtin_arm_wcmpgtsb (v8sink, v8sink);
+  v4sink = __builtin_arm_wcmpgtsh (v4sink, v4sink);
+  v2sink = __builtin_arm_wcmpgtsw (v2sink, v2sink);
+  v8sink = __builtin_arm_wcmpgtub (v8sink, v8sink);
+  v4sink = __builtin_arm_wcmpgtuh (v4sink, v4sink);
+  v2sink = __builtin_arm_wcmpgtuw (v2sink, v2sink);
+  llsink = __builtin_arm_wmacs (llsink, v4sink, v4sink);
+  llsink = __builtin_arm_wmacsz (v4sink, v4sink);
+  llsink = __builtin_arm_wmacu (llsink, v4sink, v4sink);
+  llsink = __builtin_arm_wmacuz (v4sink, v4sink);
+  v4sink = __builtin_arm_wmadds (v4sink, v4sink);
+  v4sink = __builtin_arm_wmaddu (v4sink, v4sink);
+  v8sink = __builtin_arm_wmaxsb (v8sink, v8sink);
+  v4sink = __builtin_arm_wmaxsh (v4sink, v4sink);
+  v2sink = __builtin_arm_wmaxsw (v2sink, v2sink);
+  v8sink = __builtin_arm_wmaxub (v8sink, v8sink);
+  v4sink = __builtin_arm_wmaxuh (v4sink, v4sink);
+  v2sink = __builtin_arm_wmaxuw (v2sink, v2sink);
+  v8sink = __

[PING] iwMMXt patches

2012-04-17 Thread Matt Turner

Are these patches ready to go in? It looks like they were ack'd.

http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01815.html
http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01817.html
http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01816.html
http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01818.html
http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01819.html

We (OLPC) will need these patches for reasonable iwMMXt performance
and the ability to use VFP and iwMMXt together.

Thanks,
Matt

[PATCH 1/2] mips: Add R4600 scheduling support for imul and idiv

2012-02-24 Thread Matt Turner

The r4600_imul and r4600_idiv reservations were correct for si, but
there were no *_di reservations.

See page 4 of
http://www.sgistuff.net/hardware/other/documents/R4600_Prod_OV.pdf

2012-02-24  Matt Turner  

* config/mips/4600.md (r4600_imul_si): Rename from r4600_imul.
(r4600_imul_di): New.
(r4600_idiv_si): Rename from r4600_idiv.
(r4600_idiv_di): New.
---
 gcc/config/mips/4600.md |   24 +++-
 1 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/gcc/config/mips/4600.md b/gcc/config/mips/4600.md
index c645cbc..fcdbf00 100644
--- a/gcc/config/mips/4600.md
+++ b/gcc/config/mips/4600.md
@@ -1,5 +1,5 @@
 ;; R4600 and R4650 pipeline description.
-;;   Copyright (C) 2004, 2005, 2007 Free Software Foundation, Inc.
+;;   Copyright (C) 2004, 2005, 2007, 2012 Free Software Foundation, Inc.
 ;;
 ;; This file is part of GCC.
 
@@ -24,16 +24,30 @@
 ;; We handle the R4600 and R4650 in much the same way.  The only difference
 ;; is in the integer multiplication and division costs.
 
-(define_insn_reservation "r4600_imul" 10
+(define_insn_reservation "r4600_imul_si" 10
   (and (eq_attr "cpu" "r4600")
-   (eq_attr "type" "imul,imul3,imadd"))
+   (eq_attr "type" "imul,imul3,imadd")
+   (eq_attr "mode" "SI"))
   "imuldiv*10")
 
-(define_insn_reservation "r4600_idiv" 42
+(define_insn_reservation "r4600_imul_di" 12
   (and (eq_attr "cpu" "r4600")
-   (eq_attr "type" "idiv"))
+   (eq_attr "type" "imul,imul3,imadd")
+   (eq_attr "mode" "DI"))
+  "imuldiv*12")
+
+(define_insn_reservation "r4600_idiv_si" 42
+  (and (eq_attr "cpu" "r4600")
+   (eq_attr "type" "idiv")
+   (eq_attr "mode" "SI"))
   "imuldiv*42")
 
+(define_insn_reservation "r4600_idiv_di" 74
+  (and (eq_attr "cpu" "r4600")
+   (eq_attr "type" "idiv")
+   (eq_attr "mode" "DI"))
+  "imuldiv*74")
+
 
 (define_insn_reservation "r4650_imul" 4
   (and (eq_attr "cpu" "r4650")
-- 
1.7.3.4
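
As a quick cross-check of the numbers in the patch above, the four R4600 reservations can be modelled as a lookup on instruction type and mode. The function and enum names are made up for this sketch, and it covers only the reservations touched by the patch.

```c
#include <string.h>

enum r4600_mode { R4600_SI, R4600_DI };

/* Sketch only: return the latency the patched 4600.md assigns to an R4600
   integer multiply or divide, or -1 for types these reservations do not
   cover.  */
static int
r4600_latency (const char *type, enum r4600_mode mode)
{
  int is_mul = strcmp (type, "imul") == 0
               || strcmp (type, "imul3") == 0
               || strcmp (type, "imadd") == 0;
  int is_div = strcmp (type, "idiv") == 0;

  if (is_mul)
    return mode == R4600_SI ? 10 : 12;
  if (is_div)
    return mode == R4600_SI ? 42 : 74;
  return -1;
}
```

Before the patch, only the SI figures (10 and 42) existed, so 64-bit multiplies and divides had no R4600-specific reservation.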

Miscellaneous mips, arm, and alpha patches

2012-02-24 Thread Matt Turner

Hi,

Following this email are five rather trivial patches that I've had
sitting around while waiting for my grad school and the Free Software
Foundation to decide it's okay for me to contribute. I don't have
copyright assignment for gcc yet, but I thought I would pipeline this
process and try to get the patches at least reviewed before the
paperwork is completed. If they're trivial enough to be committed
without copyright assignment, I'd love for them to be committed for
gcc 4.8.

The patches are

[PATCH 1/2] mips: Add R4600 scheduling support for imul and idiv
[PATCH 2/2] mips: Add R4700 scheduling support
[PATCH] arm: Fix iwmmxt shift and logical intrinsics (PR 35294)
[PATCH] arm: add _mm_empty to mmintrin.h for source compatibility
[PATCH] alpha: add bypasses for fmul/fadd/fcmov -> fst/ftoi

I have not contributed to gcc before, so please tell me if I've missed
a step or didn't format the ChangeLog entries properly, and so forth.
Please CC me on replies.

Thanks,
Matt Turner

[PATCH] alpha: add bypasses for fmul/fadd/fcmov -> fst/ftoi

2012-02-24 Thread Matt Turner

See section 2.5.3 (page 28) of
http://download.majix.org/dec/comp_guide_v2.pdf

2012-02-24  Matt Turner  

* config/alpha/ev6.md: (define_bypass "ev6_fmul,ev6_fadd"): New.
(define_bypass "ev6_fcmov"): New.
---
 gcc/config/alpha/ev6.md |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/gcc/config/alpha/ev6.md b/gcc/config/alpha/ev6.md
index adfe504..a16535a 100644
--- a/gcc/config/alpha/ev6.md
+++ b/gcc/config/alpha/ev6.md
@@ -147,11 +147,15 @@
(eq_attr "type" "fadd,fcpys,fbr"))
   "ev6_fa")
 
+(define_bypass 6 "ev6_fmul,ev6_fadd" "ev6_fst,ev6_ftoi")
+
 (define_insn_reservation "ev6_fcmov" 8
   (and (eq_attr "tune" "ev6")
(eq_attr "type" "fcmov"))
   "ev6_fa,nothing*3,ev6_fa")
 
+(define_bypass 10 "ev6_fcmov" "ev6_fst,ev6_ftoi")
+
 (define_insn_reservation "ev6_fdivsf" 12
   (and (eq_attr "tune" "ev6")
(and (eq_attr "type" "fdiv")
-- 
1.7.3.4
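
A `define_bypass` overrides the producer's default latency when its result feeds one of the named consumers. The sketch below models just the two bypasses added here. The enum and function names are hypothetical, and the default latency is passed in by the caller rather than assumed.

```c
/* Hypothetical stand-ins for the ev6.md reservation names.  */
enum ev6_unit { EV6_FADD, EV6_FMUL, EV6_FCMOV, EV6_FST, EV6_FTOI, EV6_OTHER };

/* Sketch only: latency from PRODUCER to CONSUMER after the patch.
   DEFAULT_LATENCY is whatever the producer's define_insn_reservation says
   (8 for ev6_fcmov in the context shown above).  */
static int
ev6_latency (enum ev6_unit producer, enum ev6_unit consumer,
             int default_latency)
{
  int to_store_or_ftoi = consumer == EV6_FST || consumer == EV6_FTOI;

  if (to_store_or_ftoi && (producer == EV6_FMUL || producer == EV6_FADD))
    return 6;   /* (define_bypass 6 "ev6_fmul,ev6_fadd" ...)  */
  if (to_store_or_ftoi && producer == EV6_FCMOV)
    return 10;  /* (define_bypass 10 "ev6_fcmov" ...)  */
  return default_latency;
}
```

So an fcmov result costs 8 cycles for most consumers but 10 when it goes straight to a floating-point store or an ftoi.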

[PATCH 2/2] mips: Add R4700 scheduling support

2012-02-24 Thread Matt Turner

The R4700 is identical to the R4600 except for the integer and
floating-point multiplication costs.

See page 4 of http://datasheets.chipdb.org/IDT/MIPS/79RV4700.pdf

2012-02-24  Matt Turner  

* config/mips/4600.md (r4700_imul_si): New.
(r4700_imul_di): New.
(r4700_fmul_single): New.
(r4700_fmul_double): New.
* config/mips/driver-native.c (cpu_types): Add r4700.
* config/mips/mips-cpus.def: Likewise.
* config/mips/mips.c: Likewise.
* config/mips/mips.md: Likewise.
---
 gcc/config/mips/4600.md |   51 ++
 gcc/config/mips/driver-native.c |2 +-
 gcc/config/mips/mips-cpus.def   |1 +
 gcc/config/mips/mips.c  |3 ++
 gcc/config/mips/mips.md |1 +
 5 files changed, 46 insertions(+), 12 deletions(-)

diff --git a/gcc/config/mips/4600.md b/gcc/config/mips/4600.md
index fcdbf00..ef74fd3 100644
--- a/gcc/config/mips/4600.md
+++ b/gcc/config/mips/4600.md
@@ -1,4 +1,4 @@
-;; R4600 and R4650 pipeline description.
+;; R4600, R4650, and R4700 pipeline description.
 ;;   Copyright (C) 2004, 2005, 2007, 2012 Free Software Foundation, Inc.
 ;;
 ;; This file is part of GCC.
@@ -21,8 +21,10 @@
 ;; This file overrides parts of generic.md.  It is derived from the
 ;; old define_function_unit description.
 ;;
-;; We handle the R4600 and R4650 in much the same way.  The only difference
-;; is in the integer multiplication and division costs.
+;; We handle the R4600, R4650, and R4700 in much the same way.  The only
+;; differences between R4600 and R4650 are the integer multiplication and
+;; division costs. The only differences between R4600 and R4700 are the
+;; integer and floating-point multiplication costs.
 
 (define_insn_reservation "r4600_imul_si" 10
   (and (eq_attr "cpu" "r4600")
@@ -37,13 +39,13 @@
   "imuldiv*12")
 
 (define_insn_reservation "r4600_idiv_si" 42
-  (and (eq_attr "cpu" "r4600")
+  (and (eq_attr "cpu" "r4600,r4700")
(eq_attr "type" "idiv")
(eq_attr "mode" "SI"))
   "imuldiv*42")
 
 (define_insn_reservation "r4600_idiv_di" 74
-  (and (eq_attr "cpu" "r4600")
+  (and (eq_attr "cpu" "r4600,r4700")
(eq_attr "type" "idiv")
(eq_attr "mode" "DI"))
   "imuldiv*74")
@@ -60,13 +62,26 @@
   "imuldiv*36")
 
 
+(define_insn_reservation "r4700_imul_si" 8
+  (and (eq_attr "cpu" "r4700")
+   (eq_attr "type" "imul,imul3,imadd")
+   (eq_attr "mode" "SI"))
+  "imuldiv*8")
+
+(define_insn_reservation "r4700_imul_di" 10
+  (and (eq_attr "cpu" "r4700")
+   (eq_attr "type" "imul,imul3,imadd")
+   (eq_attr "mode" "DI"))
+  "imuldiv*10")
+
+
 (define_insn_reservation "r4600_load" 2
-  (and (eq_attr "cpu" "r4600,r4650")
+  (and (eq_attr "cpu" "r4600,r4650,r4700")
(eq_attr "type" "load,fpload,fpidxload"))
   "alu")
 
 (define_insn_reservation "r4600_fmove" 1
-  (and (eq_attr "cpu" "r4600,r4650")
+  (and (eq_attr "cpu" "r4600,r4650,r4700")
(eq_attr "type" "fabs,fneg,fmove"))
   "alu")
 
@@ -76,26 +91,40 @@
(eq_attr "mode" "SF")))
   "alu")
 
+
+(define_insn_reservation "r4700_fmul_single" 4
+  (and (eq_attr "cpu" "r4700")
+   (and (eq_attr "type" "fmul,fmadd")
+   (eq_attr "mode" "SF")))
+  "alu")
+
+(define_insn_reservation "r4700_fmul_double" 5
+  (and (eq_attr "cpu" "r4700")
+   (and (eq_attr "type" "fmul,fmadd")
+   (eq_attr "mode" "DF")))
+  "alu")
+
+
 (define_insn_reservation "r4600_fdiv_single" 32
-  (and (eq_attr "cpu" "r4600,r4650")
+  (and (eq_attr "cpu" "r4600,r4650,r4700")
(and (eq_attr "type" "fdiv,frdiv")
(eq_attr "mode" "SF")))
   "alu")
 
 (define_insn_reservation "r4600_fdiv_double" 61
-  (and (eq_attr "cpu" "r4600,r4650")
+  (and (eq_attr "cpu" "r4600,r4650,r4700")
(and (eq_attr "type" "fdiv,frdiv")
(eq_attr "mode" "DF")))
   "alu")
 
 (define_insn_reservation "r4600_fsqrt_single" 31
-  (and (eq_attr "cpu" "r4600,r4650")
+  (and (eq_attr "cpu" "r4600,r4

[PATCH] arm: Fix iwmmxt shift and logical intrinsics (PR 35294).

2012-02-24 Thread Matt Turner

PR 36798 and 36966 are duplicates.

2012-02-24  Matt Turner  

PR target/35294
* config/arm/arm.c (arm_expand_builtin): Wire up missing
intrinsics.
---
 gcc/config/arm/arm.c |   62 +-
 1 files changed, 61 insertions(+), 1 deletions(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 7f0dc6b..f5935d6 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -20502,7 +20502,8 @@ arm_expand_binop_builtin (enum insn_code icode,
   || ! (*insn_data[icode].operand[0].predicate) (target, tmode))
 target = gen_reg_rtx (tmode);
 
-  gcc_assert (GET_MODE (op0) == mode0 && GET_MODE (op1) == mode1);
+  gcc_assert ((GET_MODE (op0) == mode0 || GET_MODE (op0) == VOIDmode)
+ && (GET_MODE (op1) == mode1 || GET_MODE (op1) == VOIDmode));
 
   if (! (*insn_data[icode].operand[1].predicate) (op0, mode0))
 op0 = copy_to_mode_reg (mode0, op0);
@@ -21181,6 +21182,65 @@ arm_expand_builtin (tree exp,
   emit_insn (pat);
   return target;
 
+case ARM_BUILTIN_WSLLH:
+case ARM_BUILTIN_WSLLHI:
+case ARM_BUILTIN_WSLLW:
+case ARM_BUILTIN_WSLLWI:
+case ARM_BUILTIN_WSLLD:
+case ARM_BUILTIN_WSLLDI:
+case ARM_BUILTIN_WSRAH:
+case ARM_BUILTIN_WSRAHI:
+case ARM_BUILTIN_WSRAW:
+case ARM_BUILTIN_WSRAWI:
+case ARM_BUILTIN_WSRAD:
+case ARM_BUILTIN_WSRADI:
+case ARM_BUILTIN_WSRLH:
+case ARM_BUILTIN_WSRLHI:
+case ARM_BUILTIN_WSRLW:
+case ARM_BUILTIN_WSRLWI:
+case ARM_BUILTIN_WSRLD:
+case ARM_BUILTIN_WSRLDI:
+case ARM_BUILTIN_WRORH:
+case ARM_BUILTIN_WRORHI:
+case ARM_BUILTIN_WRORW:
+case ARM_BUILTIN_WRORWI:
+case ARM_BUILTIN_WRORD:
+case ARM_BUILTIN_WRORDI:
+case ARM_BUILTIN_WAND:
+case ARM_BUILTIN_WANDN:
+case ARM_BUILTIN_WOR:
+case ARM_BUILTIN_WXOR:
+  icode = (fcode == ARM_BUILTIN_WSLLH ? CODE_FOR_ashlv4hi3_di
+  : fcode == ARM_BUILTIN_WSLLHI ? CODE_FOR_ashlv4hi3_iwmmxt
+  : fcode == ARM_BUILTIN_WSLLW  ? CODE_FOR_ashlv2si3_di
+  : fcode == ARM_BUILTIN_WSLLWI ? CODE_FOR_ashlv2si3_iwmmxt
+  : fcode == ARM_BUILTIN_WSLLD  ? CODE_FOR_ashldi3_di
+  : fcode == ARM_BUILTIN_WSLLDI ? CODE_FOR_ashldi3_iwmmxt
+  : fcode == ARM_BUILTIN_WSRAH  ? CODE_FOR_ashrv4hi3_di
+  : fcode == ARM_BUILTIN_WSRAHI ? CODE_FOR_ashrv4hi3_iwmmxt
+  : fcode == ARM_BUILTIN_WSRAW  ? CODE_FOR_ashrv2si3_di
+  : fcode == ARM_BUILTIN_WSRAWI ? CODE_FOR_ashrv2si3_iwmmxt
+  : fcode == ARM_BUILTIN_WSRAD  ? CODE_FOR_ashrdi3_di
+  : fcode == ARM_BUILTIN_WSRADI ? CODE_FOR_ashrdi3_iwmmxt
+  : fcode == ARM_BUILTIN_WSRLH  ? CODE_FOR_lshrv4hi3_di
+  : fcode == ARM_BUILTIN_WSRLHI ? CODE_FOR_lshrv4hi3_iwmmxt
+  : fcode == ARM_BUILTIN_WSRLW  ? CODE_FOR_lshrv2si3_di
+  : fcode == ARM_BUILTIN_WSRLWI ? CODE_FOR_lshrv2si3_iwmmxt
+  : fcode == ARM_BUILTIN_WSRLD  ? CODE_FOR_lshrdi3_di
+  : fcode == ARM_BUILTIN_WSRLDI ? CODE_FOR_lshrdi3_iwmmxt
+  : fcode == ARM_BUILTIN_WRORH  ? CODE_FOR_rorv4hi3_di
+  : fcode == ARM_BUILTIN_WRORHI ? CODE_FOR_rorv4hi3
+  : fcode == ARM_BUILTIN_WRORW  ? CODE_FOR_rorv2si3_di
+  : fcode == ARM_BUILTIN_WRORWI ? CODE_FOR_rorv2si3
+  : fcode == ARM_BUILTIN_WRORD  ? CODE_FOR_rordi3_di
+  : fcode == ARM_BUILTIN_WRORDI ? CODE_FOR_rordi3
+  : fcode == ARM_BUILTIN_WAND   ? CODE_FOR_iwmmxt_anddi3
+  : fcode == ARM_BUILTIN_WANDN  ? CODE_FOR_iwmmxt_nanddi3
+  : fcode == ARM_BUILTIN_WOR    ? CODE_FOR_iwmmxt_iordi3
+  : fcode == ARM_BUILTIN_WXOR   ? CODE_FOR_iwmmxt_xordi3
+  : CODE_FOR_rordi3);
+  return arm_expand_binop_builtin (icode, exp, target);
+
 case ARM_BUILTIN_WZERO:
   target = gen_reg_rtx (DImode);
   emit_insn (gen_iwmmxt_clrdi (target));
-- 
1.7.3.4
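
On the first hunk: integer constants in GCC's RTL carry VOIDmode, which is presumably why the assertion had to be relaxed once builtins taking immediate operands (the `...I` variants) were routed through arm_expand_binop_builtin. A small model of the old and new checks, with hypothetical `op_mode` values standing in for machine modes:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for machine modes.  */
enum op_mode { OP_VOIDmode, OP_SImode, OP_DImode, OP_V4HImode };

/* Old check: the operand's mode must match exactly.  */
static bool
strict_mode_ok (enum op_mode op, enum op_mode expected)
{
  return op == expected;
}

/* New check: a mode-less operand (such as a constant) is accepted too.  */
static bool
relaxed_mode_ok (enum op_mode op, enum op_mode expected)
{
  return op == expected || op == OP_VOIDmode;
}
```

A genuine mode mismatch, for example DImode where SImode is expected, still fails under the relaxed check.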

[PATCH] arm: add _mm_empty to mmintrin.h for source compatibility

2012-02-24 Thread Matt Turner

The x86/amd64 mmintrin.h provides the _mm_empty intrinsic for the 'emms'
MMX instruction. Although ARM does not need such an instruction, we
should provide an empty _mm_empty function nonetheless for source
compatibility.

2012-02-24  Matt Turner  

* config/arm/mmintrin.h (_mm_empty): New.
---
 gcc/config/arm/mmintrin.h |7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/gcc/config/arm/mmintrin.h b/gcc/config/arm/mmintrin.h
index 2cc500d..ea73bf1 100644
--- a/gcc/config/arm/mmintrin.h
+++ b/gcc/config/arm/mmintrin.h
@@ -32,6 +32,12 @@ typedef int __v2si __attribute__ ((vector_size (8)));
 typedef short __v4hi __attribute__ ((vector_size (8)));
 typedef char __v8qi __attribute__ ((vector_size (8)));
 
+/* Provided for source compatibility with MMX.  */
+extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_empty (void)
+{
+}
+
 /* "Convert" __m64 and __int64 into each other.  */
 static __inline __m64 
 _mm_cvtsi64_m64 (__int64 __i)
@@ -1248,6 +1254,7 @@ _m_from_int (int __a)
 #define _m_psadzbw _mm_sadz_pu8
 #define _m_psadzwd _mm_sadz_pu16
 #define _m_paligniq _mm_align_si64
+#define _m_empty _mm_empty
 #define _m_cvt_si2pi _mm_cvtsi64_m64
 #define _m_cvt_pi2si _mm_cvtm64_si64
 
-- 
1.7.3.4
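
The source-compatibility argument can be sketched as follows. Code written for x86 MMX typically ends a SIMD section with `_mm_empty ()`, and the same source only builds for iwMMXt if that name exists. The example uses a hypothetical `demo_mm_empty` stand-in so it compiles anywhere; real code would include <mmintrin.h> and call `_mm_empty` itself.

```c
/* Stand-in for the empty _mm_empty the patch adds on ARM.  On x86 the real
   intrinsic emits emms; here it does nothing.  */
static inline void
demo_mm_empty (void)
{
}

/* Sketch of shared code: do some lane work, then make the call that x86
   MMX code conventionally ends with.  */
static int
sum_lanes (const short lanes[4])
{
  int total = 0;
  for (int i = 0; i < 4; i++)
    total += lanes[i];
  demo_mm_empty ();
  return total;
}
```

Without the definition, the call would have to be wrapped in an architecture `#ifdef` at every use.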

Patch for bugzilla ticket 117366

2025-06-11 Thread Matt Parks

See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117366 where it was suggested 
I send this bug fix as a patch to gcc-patches. I have never submitted to this 
alias so I apologize if I am not "doing it right", but here is the diff, and 
the Bugzilla ticket has an example written up with good/bad assembler code 
output. As the ticket notes, it has been a problem for years and should be 
backported to the latest bugfix releases of older supported versions as well. 
This patch reference is for master as of today, but the base code is the same 
since gcc 10.
 
index bde06f3fa86..d6d8d720b67 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -26747,7 +26747,7 @@ thumb1_extra_regs_pushed (arm_stack_offsets *offsets, bool for_prologue)
 }
   while (reg_base + n_free < 8 && !(live_regs_mask & 1)
-&& (for_prologue || call_used_or_fixed_reg_p (reg_base + n_free)))
+&& (for_prologue || (call_used_or_fixed_reg_p (reg_base + n_free) && !fixed_regs[reg_base + n_free])))
 {
   live_regs_mask >>= 1;
   n_free++;
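
A stand-alone model of the patched loop may make the effect easier to see. The function name, the arrays and the constant are stand-ins invented for this sketch, with `call_used_or_fixed` playing the role of call_used_or_fixed_reg_p. It is not the GCC function.

```c
#include <stdbool.h>

#define DEMO_NUM_LO_REGS 8  /* r0-r7, matching the literal 8 in the loop */

/* Sketch only: count how many consecutive low registers starting at
   REG_BASE can be pushed as padding.  A register stops the scan if it is
   live or, outside the prologue, if it is not call-clobbered or is fixed.
   REG_BASE must be in 0..7.  */
static int
count_free_lo_regs (unsigned live_regs_mask, int reg_base, bool for_prologue,
                    const bool call_used_or_fixed[DEMO_NUM_LO_REGS],
                    const bool fixed[DEMO_NUM_LO_REGS])
{
  int n_free = 0;

  /* Line the mask up so bit 0 describes REG_BASE.  */
  live_regs_mask >>= reg_base;

  while (reg_base + n_free < DEMO_NUM_LO_REGS
         && !(live_regs_mask & 1)
         && (for_prologue
             || (call_used_or_fixed[reg_base + n_free]
                 && !fixed[reg_base + n_free])))
    {
      live_regs_mask >>= 1;
      n_free++;
    }
  return n_free;
}
```

With r0-r3 call-clobbered and r2 marked fixed, the epilogue scan stops after r0 and r1 instead of running through r3.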

[PATCH] config/arm/arm.cc thumbv1 fixes - Patch for bugzilla tickets 117366 and 117468

2025-06-12 Thread Matt Parks

This can replace my earlier e-mail, which addressed only ticket 117366. Since 
both bugs are in one file and somewhat related, here's a larger patch that 
addresses both Bugzilla tickets 117366 and 117468 (see 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117468):
 
index bde06f3fa86..742b0904612 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -8274,7 +8274,7 @@ thumb1_prologue_unused_call_clobbered_lo_regs (void)
   bitmap prologue_live_out = df_get_live_out (ENTRY_BLOCK_PTR_FOR_FN (cfun));
   for (int reg = FIRST_LO_REGNUM; reg <= LAST_LO_REGNUM; reg++)
-if (!callee_saved_reg_p (reg) && !REGNO_REG_SET_P (prologue_live_out, reg))
+if (!callee_saved_reg_p (reg) && !REGNO_REG_SET_P (prologue_live_out, reg) && !fixed_regs[reg])
   mask |= 1 << (reg - FIRST_LO_REGNUM);
   return mask;
 }
@@ -8287,7 +8287,7 @@ thumb1_epilogue_unused_call_clobbered_lo_regs (void)
   bitmap epilogue_live_in = df_get_live_in (EXIT_BLOCK_PTR_FOR_FN (cfun));
   for (int reg = FIRST_LO_REGNUM; reg <= LAST_LO_REGNUM; reg++)
-if (!callee_saved_reg_p (reg) && !REGNO_REG_SET_P (epilogue_live_in, reg))
+if (!callee_saved_reg_p (reg) && !REGNO_REG_SET_P (epilogue_live_in, reg) && !fixed_regs[reg])
   mask |= 1 << (reg - FIRST_LO_REGNUM);
   return mask;
 }
@@ -26746,8 +26746,8 @@ thumb1_extra_regs_pushed (arm_stack_offsets *offsets, bool for_prologue)
   live_regs_mask >>= reg_base;
 }
-  while (reg_base + n_free < 8 && !(live_regs_mask & 1)
-&& (for_prologue || call_used_or_fixed_reg_p (reg_base + n_free)))
+  while (reg_base + n_free <= LAST_LO_REGNUM && !(live_regs_mask & 1)
+&& (for_prologue || (call_used_or_fixed_reg_p (reg_base + n_free) && !fixed_regs[reg_base + n_free])))
 {
   live_regs_mask >>= 1;
   n_free++;
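
The first two hunks apply the same idea to the scratch-register masks. A stand-alone sketch of the patched mask computation follows. The arrays stand in for callee_saved_reg_p, the dataflow live set and fixed_regs, and the `DEMO_*` bounds are illustrative values for r0-r7.

```c
#include <stdbool.h>

#define DEMO_FIRST_LO_REGNUM 0
#define DEMO_LAST_LO_REGNUM  7

/* Sketch only: build a bitmask of low registers that may be used as
   scratch: not callee-saved, not live, and (new with the patch) not
   fixed.  */
static unsigned
unused_call_clobbered_lo_regs (const bool callee_saved[8],
                               const bool live[8],
                               const bool fixed[8])
{
  unsigned mask = 0;

  for (int reg = DEMO_FIRST_LO_REGNUM; reg <= DEMO_LAST_LO_REGNUM; reg++)
    if (!callee_saved[reg] && !live[reg] && !fixed[reg])
      mask |= 1u << (reg - DEMO_FIRST_LO_REGNUM);
  return mask;
}
```

A register the user has reserved, for example with -ffixed-r3, then never appears in the mask, so the prologue and epilogue code cannot pick it as a temporary.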

> On 06/11/2025 7:13 PM EDT Matt Parks  wrote:
>  
>  
> See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117366 where it was 
> suggested I send this bug fix as a patch to gcc-patches. I have never 
> submitted to this alias so I apologize if I am not "doing it right", but here 
> is the diff, and the Bugzilla ticket has an example written up with good/bad 
> assembler code output. As the ticket notes, it has been a problem for years 
> and should be backported to the latest bugfix releases of older supported 
> versions as well. This patch reference is for master as of today, but the 
> base code is the same since gcc 10.
>

Re: [PATCH] Objective-C: don't require redundant -fno-objc-sjlj-exceptions for the NeXT v2 ABI

2021-08-02 Thread Matt Jacobson via Gcc-patches




> On Aug 2, 2021, at 5:09 PM, Eric Gallager  wrote:
> 
> On Wed, Jul 28, 2021 at 11:36 PM Matt Jacobson via Gcc-patches
>  wrote:
>> 
>> As is, an invocation of GCC with -fnext-runtime -fobjc-abi-version=2 crashes,
>> unless target-specific code adds an implicit -fno-objc-sjlj-exceptions (which
>> Darwin does).
>> 
>> This patch makes the general case not crash.
>> 
>> I don't have commit access, so if this patch is suitable, I'd need someone else
>> to commit it for me.  Thanks.
> 
> Is there a bug open for the issue that this fixes? Just wondering for
> cross-referencing purposes...

No, I didn’t file a bug for this one, just sent the patch directly.  Hope 
that’s OK.  If not, happy to file one.

Matt

Re: [PATCH] Objective-C: don't require redundant -fno-objc-sjlj-exceptions for the NeXT v2 ABI

2021-08-10 Thread Matt Jacobson via Gcc-patches




> On Aug 3, 2021, at 2:39 PM, Iain Sandoe  wrote:
> 
> 
> 
>> On 2 Aug 2021, at 22:37, Matt Jacobson via Gcc-patches 
>>  wrote:
>> 
>>> On Aug 2, 2021, at 5:09 PM, Eric Gallager  wrote:
>>> 
>>> On Wed, Jul 28, 2021 at 11:36 PM Matt Jacobson via Gcc-patches
>>>  wrote:
>>>> 
>>>> As is, an invocation of GCC with -fnext-runtime -fobjc-abi-version=2 crashes,
>>>> unless target-specific code adds an implicit -fno-objc-sjlj-exceptions (which
>>>> Darwin does).
>>>> 
>>>> This patch makes the general case not crash.
>>>> 
>>>> I don't have commit access, so if this patch is suitable, I'd need someone else
>>>> to commit it for me.  Thanks.
>>> 
>>> Is there a bug open for the issue that this fixes? Just wondering for
>>> cross-referencing purposes...
>> 
>> No, I didn’t file a bug for this one, just sent the patch directly.  Hope 
>> that’s OK.  If not, happy to file one.
> 
> I have this on my TODO (and in my “to apply” patch queue) - IMO it’s OK as an interim
> solution - but I think in the longer term it would be better to make fobjc-sjlj-exceptions
> into a NOP, since the exception models are fixed for NeXT runtime (unless you have
> some intent to update the 32bit one to use DWARF unwinding ;-) ).

Thanks.

It certainly isn’t crystal clear just from the diff in the mail, but with this 
patch, -fobjc-sjlj-exceptions *is* essentially a no-op (modulo a small warning) 
under NeXT v2.

Prior to this patch, it’s also a no-op, but (a) it’s initially on by default 
for NeXT v2, which (b) causes a crash unless `-fobjc-exceptions` is also 
specified.

Matt

[PATCH] Objective-C: fix crash with -fobjc-nilcheck

2021-08-14 Thread Matt Jacobson via Gcc-patches

When -fobjc-nilcheck is enabled, messages that result in a struct type should 
yield a zero-initialized struct when sent to nil.  Currently, the frontend 
crashes when it encounters this situation.  This patch fixes the crash by 
generating the tree for the `{}` initializer.

Tested by running the frontend against the example in PR101666 and inspecting 
the generated code.

I don't have commit access, so if this patch is suitable, I'd need someone else 
to commit it for me.  Thanks.


gcc/objc/ChangeLog:

2021-08-14  Matt Jacobson  

PR objc/101666
* objc-next-runtime-abi-02.c (build_v2_objc_method_fixup_call): Fix 
crash.
(build_v2_build_objc_method_call): Fix crash.


diff --git a/gcc/objc/objc-next-runtime-abi-02.c b/gcc/objc/objc-next-runtime-abi-02.c
index e391ee527ce..42645e22316 100644
--- a/gcc/objc/objc-next-runtime-abi-02.c
+++ b/gcc/objc/objc-next-runtime-abi-02.c
@@ -1676,11 +1676,7 @@ build_v2_objc_method_fixup_call (int super_flag, tree method_prototype,
   if (TREE_CODE (ret_type) == RECORD_TYPE
  || TREE_CODE (ret_type) == UNION_TYPE)
{
- vec<constructor_elt, va_gc> *rtt = NULL;
- /* ??? CHECKME. hmmm. think we need something more
-here.  */
- CONSTRUCTOR_APPEND_ELT (rtt, NULL_TREE, NULL_TREE);
- ftree = objc_build_constructor (ret_type, rtt);
+ ftree = objc_build_constructor (ret_type, NULL);
}
   else
ftree = fold_convert (ret_type, integer_zero_node);
@@ -1790,11 +1786,7 @@ build_v2_build_objc_method_call (int super, tree method_prototype,
   if (TREE_CODE (ret_type) == RECORD_TYPE
  || TREE_CODE (ret_type) == UNION_TYPE)
{
- vec<constructor_elt, va_gc> *rtt = NULL;
- /* ??? CHECKME. hmmm. think we need something more
-here.  */
- CONSTRUCTOR_APPEND_ELT (rtt, NULL_TREE, NULL_TREE);
- ftree = objc_build_constructor (ret_type, rtt);
+ ftree = objc_build_constructor (ret_type, NULL);
}
   else
ftree = fold_convert (ret_type, integer_zero_node);
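
The behaviour the patch is after can be pictured in plain C: when the receiver is nil, skip the call and hand back an all-zero struct. The sketch below is a hand-written model with made-up names (`demo_point`, `demo_send`, `demo_unit_point`). It does not show what the frontend emits, only the semantics the `{}` initializer is meant to provide.

```c
#include <stddef.h>

struct demo_point { int x; int y; };

/* A "method" that returns a struct by value.  */
typedef struct demo_point (*demo_method) (void *receiver);

/* Sketch only: a nil-checked send.  A NULL receiver yields a
   zero-initialized struct instead of calling METHOD.  */
static struct demo_point
demo_send (void *receiver, demo_method method)
{
  if (receiver == NULL)
    {
      struct demo_point zero = { 0 };
      return zero;
    }
  return method (receiver);
}

/* Example method used to exercise demo_send.  */
static struct demo_point
demo_unit_point (void *receiver)
{
  struct demo_point p = { 1, 1 };
  (void) receiver;
  return p;
}
```

For scalar return types the existing `fold_convert (ret_type, integer_zero_node)` branch in the context lines already plays the same role.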

Re: [PATCH] Objective-C: fix crash with -fobjc-nilcheck

2021-08-14 Thread Matt Jacobson via Gcc-patches




> On Aug 14, 2021, at 5:25 AM, Iain Sandoe  wrote:
> 
> 1/ please can you either post using a mailer that doesn’t mangle patches or 
> put the patch as a plain text attachment
>  (pushing to a git branch somewhere public also works for me, but maybe not 
> for all reviewers)
>   - for small patches like this I can obviously fix things up by hand, but 
> for anything larger not a good idea.
> 
> 2/ since this is fixing a crashing case, we should add a test to the test 
> suite for it (and also check the corresponding objective-c++).

Sorry for the broken patch.  I *think* this one should apply cleanly.  If not, 
I’ve also pushed the change to branch "objc-fix-struct-nil-check-10.3.0" of 
, viewable at:

<https://github.com/mhjacobson/gcc/commit/5f158dc5f15fcbeae6163cc46cc520df8369681e>

I’ve also added a test specifically for this bug and in the process added 
-fobjc-nilcheck to the compiler invocation in objc-torture.exp.  Let me know 
what you think.

I’m not sure what you mean w.r.t. Objective-C++ -- can you explain?


gcc/testsuite/ChangeLog:

2021-08-14  Matt Jacobson  

PR objc/101666
* lib/objc-torture.exp: Test -fobjc-nilcheck when supported by target.
* objc/compile/pr101666.m: New test.


gcc/objc/ChangeLog:

2021-08-14  Matt Jacobson  

PR objc/101666
* objc-next-runtime-abi-02.c (build_v2_objc_method_fixup_call): Fix 
crash.
(build_v2_build_objc_method_call): Fix crash.


diff --git a/gcc/objc/objc-next-runtime-abi-02.c b/gcc/objc/objc-next-runtime-abi-02.c
index 66c13ad0db2..192731ff954 100644
--- a/gcc/objc/objc-next-runtime-abi-02.c
+++ b/gcc/objc/objc-next-runtime-abi-02.c
@@ -1676,11 +1676,7 @@ build_v2_objc_method_fixup_call (int super_flag, tree method_prototype,
   if (TREE_CODE (ret_type) == RECORD_TYPE
  || TREE_CODE (ret_type) == UNION_TYPE)
{
- vec *rtt = NULL;
- /* ??? CHECKME. hmmm. think we need something more
-here.  */
- CONSTRUCTOR_APPEND_ELT (rtt, NULL_TREE, NULL_TREE);
- ftree = objc_build_constructor (ret_type, rtt);
+ ftree = objc_build_constructor (ret_type, NULL);
}
   else
ftree = fold_convert (ret_type, integer_zero_node);
@@ -1790,11 +1786,7 @@ build_v2_build_objc_method_call (int super, tree method_prototype,
   if (TREE_CODE (ret_type) == RECORD_TYPE
  || TREE_CODE (ret_type) == UNION_TYPE)
{
- vec *rtt = NULL;
- /* ??? CHECKME. hmmm. think we need something more
-here.  */
- CONSTRUCTOR_APPEND_ELT (rtt, NULL_TREE, NULL_TREE);
- ftree = objc_build_constructor (ret_type, rtt);
+ ftree = objc_build_constructor (ret_type, NULL);
}
   else
ftree = fold_convert (ret_type, integer_zero_node);
diff --git a/gcc/testsuite/lib/objc-torture.exp b/gcc/testsuite/lib/objc-torture.exp
index 9aa5792f656..58c4b86f840 100644
--- a/gcc/testsuite/lib/objc-torture.exp
+++ b/gcc/testsuite/lib/objc-torture.exp
@@ -30,7 +30,11 @@ proc objc-set-runtime-options { dowhat args } {
 # that Darwin uses.  If NeXT is ported to another target, then it should
 # be listed here.
 if [istarget *-*-darwin*] {
-  lappend rtlist "-fnext-runtime" 
+  if { [istarget *64-*-*] || [istarget arm-*-*] } {
+   lappend rtlist "-fnext-runtime -fobjc-abi-version=2 -fobjc-nilcheck"
+  } else {
+   lappend rtlist "-fnext-runtime -fobjc-abi-version=1"
+  }
 }
 if [info exists OBJC_RUNTIME_OPTIONS] {
   foreach other $OBJC_RUNTIME_OPTIONS {
diff --git a/gcc/testsuite/objc/compile/pr101666.m b/gcc/testsuite/objc/compile/pr101666.m
new file mode 100644
index 00000000000..bfde52d3b35
--- /dev/null
+++ b/gcc/testsuite/objc/compile/pr101666.m
@@ -0,0 +1,15 @@
+struct point { double x, y, z; };
+
+@interface Foo
+
+- (struct point)bar;
+
+@end
+
+Foo *f;
+
+int
+main(void)
+{
+  struct point p = [f bar];
+}
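
For illustration only, here is a rough C model of what the nil-check path is
expected to do for a struct-returning method like the one in the test above:
if the receiver is nil, the call is skipped and a zero-filled struct is
produced (the empty constructor in the patch).  The names
`call_bar_nilchecked`, `bar_impl_t`, and `sample_bar` are made up for this
sketch and are not part of GCC or the runtime.

```c
#include <assert.h>
#include <stddef.h>

struct point { double x, y, z; };

/* Hypothetical stand-in for a method implementation taking the receiver.  */
typedef struct point (*bar_impl_t) (void *self);

/* Sample implementation, used only to exercise the non-nil path.  */
struct point
sample_bar (void *self)
{
  (void) self;
  struct point p = { 1.0, 2.0, 3.0 };
  return p;
}

/* Model of the nil check: a nil receiver yields an all-zero struct and the
   implementation is never called.  */
struct point
call_bar_nilchecked (void *receiver, bar_impl_t impl)
{
  if (receiver == NULL)
    {
      struct point zero = { 0 };
      return zero;
    }
  assert (impl != NULL);
  return impl (receiver);
}
```

With a nil receiver, `call_bar_nilchecked (NULL, sample_bar)` returns a struct
whose members are all zero, which is the value `[f bar]` is assumed to produce
in the test when `f` is nil.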

[PATCH] Objective-C: fix class_ro layout for non-LP64

2021-09-21 Thread Matt Jacobson via Gcc-patches

Fix class_ro layout for non-LP64.  On LP64, the requisite padding is added at a
lower level.  For non-LP64, this fixes binary compatibility with clang-built
classes/runtimes.

Tested by examining the generated assembly for a class_ro in both cases (and in 
the case of clang), for both x86_64 (64-bit pointers) and AVR (16-bit pointers).
Tested by running a program on AVR with a GCC-built class using a clang-built 
Objective-C runtime.  Tested by running a program on x86_64/Darwin with a GCC-
built class and the clang-built system Objective-C runtime.
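
To make the layout difference concrete, here is a small host-side sketch (not
part of the patch) that compares the offset of `ivarLayout` with and without
an explicitly declared `reserved` word.  The struct names are invented for the
example, and only the leading fields of class_ro_t are modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Leading fields of class_ro_t with the reserved word spelled out.  */
struct class_ro_explicit
{
  uint32_t flags;
  uint32_t instanceStart;
  uint32_t instanceSize;
  uint32_t reserved;
  const uint8_t *ivarLayout;
};

/* Same fields, leaving any gap before the pointer to the compiler.  */
struct class_ro_implicit
{
  uint32_t flags;
  uint32_t instanceStart;
  uint32_t instanceSize;
  const uint8_t *ivarLayout;
};

/* The pointer can never start before the three 32-bit words end.  */
static_assert (offsetof (struct class_ro_implicit, ivarLayout)
	       >= 3 * sizeof (uint32_t),
	       "ivarLayout must follow the three 32-bit fields");

/* Nonzero when both spellings put ivarLayout at the same offset.  */
int
class_ro_layouts_agree (void)
{
  return offsetof (struct class_ro_explicit, ivarLayout)
	 == offsetof (struct class_ro_implicit, ivarLayout);
}
```

On an LP64 host both offsets come out as 16, so the two spellings agree.  With
4-byte or smaller pointer alignment the explicit version still places the
pointer at 16 while the implicit one places it at 12, which is the mismatch
described above.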

Patch also available at:
<https://github.com/mhjacobson/gcc/commit/917dc8bb2f3265c2ca899ad750c5833b0161a11e>

I don't have commit access, so if this patch is suitable, I'd need someone else 
to commit it for me.  Thanks.


gcc/objc/ChangeLog:

2021-09-21  Matt Jacobson  

* objc-next-runtime-abi-02.c (struct class_ro_t): Remove explicit
alignment padding.
(build_v2_class_templates): Remove explicit alignment padding.
(build_v2_class_ro_t_initializer): Adjust initializer.


diff --git a/gcc/objc/objc-next-runtime-abi-02.c b/gcc/objc/objc-next-runtime-abi-02.c
index 42645e22316..c3af369ff0d 100644
--- a/gcc/objc/objc-next-runtime-abi-02.c
+++ b/gcc/objc/objc-next-runtime-abi-02.c
@@ -632,9 +632,7 @@ struct class_ro_t
 uint32_t const flags;
 uint32_t const instanceStart;
 uint32_t const instanceSize;
-#ifdef __LP64__
-uint32_t const reserved;
-#endif
+// [32 bits of reserved space here on LP64 platforms]
 const uint8_t * const ivarLayout;
 const char *const name;
 const struct method_list_t * const baseMethods;
@@ -677,11 +675,6 @@ build_v2_class_templates (void)
   /* uint32_t const instanceSize; */
   add_field_decl (integer_type_node, "instanceSize", &chain);
 
-  /* This ABI is currently only used on m64 NeXT.  We always
- explicitly declare the alignment padding.  */
-  /* uint32_t const reserved; */
-  add_field_decl (integer_type_node, "reserved", &chain);
-
   /* const uint8_t * const ivarLayout; */
   cnst_strg_type = build_pointer_type (unsigned_char_type_node);
   add_field_decl (cnst_strg_type, "ivarLayout", &chain);
@@ -3225,12 +3218,6 @@ build_v2_class_ro_t_initializer (tree type, tree name,
   CONSTRUCTOR_APPEND_ELT (initlist, NULL_TREE,
  build_int_cst (integer_type_node, instanceSize));
 
-  /* This ABI is currently only used on m64 NeXT.  We always
- explicitly declare the alignment padding.  */
-  /* reserved, pads alignment.  */
-  CONSTRUCTOR_APPEND_ELT (initlist, NULL_TREE,
-   build_int_cst (integer_type_node, 0));
-
   /* ivarLayout */
   unsigned_char_star = build_pointer_type (unsigned_char_type_node);
   if (ivarLayout)

Re: [PATCH] Objective-C: fix class_ro layout for non-LP64

2021-09-26 Thread Matt Jacobson via Gcc-patches

Hi Iain,

Thanks for reviewing.  I’m happy to make the suggested changes.  One comment 
inline.

> On Sep 22, 2021, at 2:49 PM, Iain Sandoe  wrote:
> 
> However, the behaviour is changed - the existing implementation is explicit
> about the fields and clears the reserved ones (and, ISTR, that was based on
> what the gcc-4.2.1 compiler did).

My original change does in fact clear the reserved bytes on LP64 platforms.  
The padding space compiles down to a `.space` assembler directive, and GNU as 
is documented to fill that space with zeros.  So the reserved bits are indeed 
cleared.

However, I understand the argument that this is too implicit, in that the C 
standard makes no guarantee about the contents of padding bytes.  So future 
standard-conforming changes to GCC *could* cause that space to be filled with 
other values (however unlikely that may actually be).

(Of course, clang -- which also does not explicitly declare this field --
essentially faces this same theoretical peril...)

One problem with the proposed diff: `__LP64__` there refers to the host, not 
the target.  What's the right way to refer to the LP64-ness of the target?  I 
see `TARGET_LP64`, but it's only defined for Intel.  I'm using it below (and 
backstopping it to zero), but I'm not sure if that's correct.  Note that it's a 
run-time-of-compiler (not build-time-of-compiler) check.
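
As a sketch of the host-versus-target distinction (illustrative only; the
`target_model` struct, the two sample models, and the function names are
hypothetical and do not exist in GCC):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of the target's data model, in bits.  */
struct target_model
{
  unsigned long_bits;
  unsigned pointer_bits;
};

/* Sample models: x86_64 (LP64) and AVR (32-bit long, 16-bit pointers).  */
const struct target_model x86_64_model = { 64, 64 };
const struct target_model avr_model = { 32, 16 };

/* Build-time-of-compiler check: answers for the machine the compiler was
   built for, which is the wrong question when cross-compiling.  */
bool
host_is_lp64 (void)
{
#ifdef __LP64__
  return true;
#else
  return false;
#endif
}

/* Run-time-of-compiler check: answers for the target being compiled for.  */
bool
target_is_lp64 (const struct target_model *t)
{
  assert (t != NULL);
  return t->long_bits == 64 && t->pointer_bits == 64;
}

/* Number of 32-bit words emitted ahead of ivarLayout in class_ro_t:
   three always, plus the explicit reserved word on LP64 targets.  */
size_t
class_ro_leading_words (const struct target_model *t)
{
  return target_is_lp64 (t) ? 4 : 3;
}
```

An x86_64-hosted cross compiler for AVR would see `host_is_lp64 ()` return
true while `target_is_lp64 (&avr_model)` returns false, so only the second
check gives the right field count for the code being generated.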

===

Here's v2.

<https://github.com/mhjacobson/gcc/commit/8193903a1d5a1569a6799174e13cb22925f1f428>

gcc/objc/ChangeLog:

2021-09-26  Matt Jacobson  

* objc-next-runtime-abi-02.c (build_v2_class_templates): Remove explicit
padding on non-LP64.
(build_v2_class_ro_t_initializer): Remove initialization of explicit
padding on non-LP64.

diff --git a/gcc/objc/objc-next-runtime-abi-02.c b/gcc/objc/objc-next-runtime-abi-02.c
index 42645e22316..22d5232614d 100644
--- a/gcc/objc/objc-next-runtime-abi-02.c
+++ b/gcc/objc/objc-next-runtime-abi-02.c
@@ -85,6 +85,10 @@ along with GCC; see the file COPYING3.  If not see

 #define OBJC2_CLS_HAS_CXX_STRUCTORS 0x0004L

+#ifndef TARGET_LP64
+#define TARGET_LP64 0
+#endif
+
 enum objc_v2_tree_index
 {
   /* Templates.  */
@@ -677,10 +681,12 @@ build_v2_class_templates (void)
   /* uint32_t const instanceSize; */
   add_field_decl (integer_type_node, "instanceSize", &chain);

-  /* This ABI is currently only used on m64 NeXT.  We always
- explicitly declare the alignment padding.  */
+  /* For compatibility with existing implementations of the 64-bit NeXT v2
+ ABI, explicitly declare reserved fields that otherwise would be filled
+ with alignment padding. */
   /* uint32_t const reserved; */
-  add_field_decl (integer_type_node, "reserved", &chain);
+  if (TARGET_LP64)
+add_field_decl (integer_type_node, "reserved", &chain);

   /* const uint8_t * const ivarLayout; */
   cnst_strg_type = build_pointer_type (unsigned_char_type_node);
@@ -3225,10 +3231,12 @@ build_v2_class_ro_t_initializer (tree type, tree name,
   CONSTRUCTOR_APPEND_ELT (initlist, NULL_TREE,
  build_int_cst (integer_type_node, instanceSize));

-  /* This ABI is currently only used on m64 NeXT.  We always
- explicitly declare the alignment padding.  */
-  /* reserved, pads alignment.  */
-  CONSTRUCTOR_APPEND_ELT (initlist, NULL_TREE,
+  /* For compatibility with existing implementations of the 64-bit NeXT v2
+ ABI, explicitly zero-fill reserved fields that otherwise would be filled
+ with alignment padding. */
+  /* reserved */
+  if (TARGET_LP64)
+CONSTRUCTOR_APPEND_ELT (initlist, NULL_TREE,
build_int_cst (integer_type_node, 0));

   /* ivarLayout */

[PATCH] Objective-C: fix protocol list count type (pertinent to non-LP64)

2021-09-26 Thread Matt Jacobson via Gcc-patches

Fix protocol list layout for non-LP64.  clang and objc4 both give the `count` 
field as `long`, not `intptr_t`.  Those are the same on LP64, but not 
everywhere.  For non-LP64, this fixes binary compatibility with clang-built 
classes.

This was more complicated than I anticipated, because the relevant frontend 
code in fact had no AST type for `protocol_list_t`, instead emitting protocol 
lists as `protocol_t[]`, with the zeroth element actually being the integer 
count.  That made it nontrivial to change the count to `long`.  With this 
change, there is now a true `protocol_list_t` type in the AST.

Tested multiple ways.  On x86_64/Darwin, I confirmed with a test program that 
protocol conformances by classes, categories, and protocols work.  On AVR, I 
manually inspected the generated assembly to confirm that protocol lists gain 
an extra two bytes of `count`, matching clang.
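
A small sketch of the arithmetic behind the "extra two bytes" (illustrative
only; the struct and function names are made up, and the opaque-pointer
element type is a simplification of the real protocol entries):

```c
#include <assert.h>
#include <stddef.h>

/* Shape of the list: a `long` count followed by the entries.  */
struct protocol_list_sketch
{
  long count;
  const void *protocols[];
};

/* Bytes the header gains when `count` changes from a pointer-sized integer
   to `long`, given the target's type sizes in bytes.  */
int
count_growth (int long_bytes, int pointer_bytes)
{
  assert (long_bytes > 0 && pointer_bytes > 0);
  return long_bytes - pointer_bytes;
}
```

For AVR, where `long` is 4 bytes and pointers are 2 bytes,
`count_growth (4, 2)` is 2.  On LP64, `count_growth (8, 8)` is 0, which is why
the mismatch is invisible there.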

Thank you for your time.

<https://github.com/mhjacobson/gcc/commit/5ebc95dc726f0745ebdf003093f1b8d7720ce32f>


gcc/objc/ChangeLog:

2021-09-26  Matt Jacobson  

* objc-next-runtime-abi-02.c (enum objc_v2_tree_index): Add new global
tree.
(next_runtime_02_initialize): Initialize protocol list type tree.
(struct class_ro_t): Fix type misspelling.
(build_v2_class_templates): Correct type in field declaration.
(build_v2_protocol_templates): Create actual protocol list type tree.
(build_v2_category_template): Correct type in field declaration.
(generate_v2_protocol_list): Emit protocol list count as `long`.
(generate_v2_protocols): Use correct type.
(build_v2_category_initializer): Use correct type.
(build_v2_class_ro_t_initializer): Use correct type.


diff --git a/gcc/objc/objc-next-runtime-abi-02.c b/gcc/objc/objc-next-runtime-abi-02.c
index c3af369ff0d..aadf1741676 100644
--- a/gcc/objc/objc-next-runtime-abi-02.c
+++ b/gcc/objc/objc-next-runtime-abi-02.c
@@ -92,6 +92,7 @@ enum objc_v2_tree_index
   OCTI_V2_CAT_TEMPL,
   OCTI_V2_CLS_RO_TEMPL,
   OCTI_V2_PROTO_TEMPL,
+  OCTI_V2_PROTO_LIST_TEMPL,
   OCTI_V2_IVAR_TEMPL,
   OCTI_V2_IVAR_LIST_TEMPL,
   OCTI_V2_MESSAGE_REF_TEMPL,
@@ -130,6 +131,8 @@ enum objc_v2_tree_index
objc_v2_global_trees[OCTI_V2_CAT_TEMPL]
 #define objc_v2_protocol_template \
objc_v2_global_trees[OCTI_V2_PROTO_TEMPL]
+#define objc_v2_protocol_list_template \
+   objc_v2_global_trees[OCTI_V2_PROTO_LIST_TEMPL]
 
 /* struct message_ref_t */
 #define objc_v2_message_ref_template \
@@ -196,7 +199,7 @@ static void build_v2_message_ref_templates (void);
 static void build_v2_class_templates (void);
 static void build_v2_super_template (void);
 static void build_v2_category_template (void);
-static void build_v2_protocol_template (void);
+static void build_v2_protocol_templates (void);
 
 static tree next_runtime_abi_02_super_superclassfield_id (void);
 
@@ -394,9 +397,9 @@ static void next_runtime_02_initialize (void)
build_pointer_type (xref_tag (RECORD_TYPE,
get_identifier ("_prop_list_t")));
 
+  build_v2_protocol_templates ();
   build_v2_class_templates ();
   build_v2_super_template ();
-  build_v2_protocol_template ();
   build_v2_category_template ();
 
   bool fixup_p = flag_next_runtime < USE_FIXUP_BEFORE;
@@ -636,7 +639,7 @@ struct class_ro_t
 const uint8_t * const ivarLayout;
 const char *const name;
 const struct method_list_t * const baseMethods;
-const struct objc_protocol_list *const baseProtocols;
+const struct protocol_list_t *const baseProtocols;
 const struct ivar_list_t *const ivars;
 const uint8_t * const weakIvarLayout;
 const struct _prop_list_t * const properties;
@@ -685,11 +688,9 @@ build_v2_class_templates (void)
   /* const struct method_list_t * const baseMethods; */
   add_field_decl (objc_method_list_ptr, "baseMethods", &chain);
 
-  /* const struct objc_protocol_list *const baseProtocols; */
-  add_field_decl (build_pointer_type
-   (xref_tag (RECORD_TYPE,
-  get_identifier (UTAG_V2_PROTOCOL_LIST))),
- "baseProtocols", &chain);
+  /* const struct protocol_list_t *const baseProtocols; */
+  add_field_decl (build_pointer_type (objc_v2_protocol_list_template),
+ "baseProtocols", &chain);
 
   /* const struct ivar_list_t *const ivars; */
   add_field_decl (objc_v2_ivar_list_ptr, "ivars", &chain);
@@ -763,25 +764,33 @@ build_v2_super_template (void)
  const char ** extended_method_types;
  const char * demangled_name;
  const struct _prop_list_t * class_properties;
-   }
+  };
+
+  struct protocol_list_t
+  {
+long count;
+struct protocol_t protocols[];
+  };
 */
 static void
-build_v2_protocol_template (void)
+build_

Re: [PATCH] Objective-C: fix protocol list count type (pertinent to non-LP64)

2021-11-07 Thread Matt Jacobson via Gcc-patches

> On Oct 25, 2021, at 5:43 AM, Iain Sandoe  wrote:
> 
> Did you test objective-c++ on Darwin?
> 
> I see a lot of fails of the form:
> Excess errors:
> : error: initialization of a flexible array member [-Wpedantic]

Looked into this.  It’s happening because obj-c++.dg/dg.exp has:

set DEFAULT_OBJCXXFLAGS " -ansi -pedantic-errors -Wno-long-long"

Specifically, the `-pedantic-errors` argument prohibits initialization of a 
flexible array member.  Notably, this flag does *not* appear in objc/dg.exp.

Admittedly I didn’t know that initialization of a FAM was prohibited by the 
standard.  It’s allowed by GCC, though, as documented here:

<https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html>

Is it OK to use a GCC extension this way in the Objective-C frontend?
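
For reference, this is the kind of construct in question, written as plain C
rather than as front-end trees.  It is only a sketch: `int_list`,
`example_list`, and `sum_items` are invented names, and the element type is
simplified to `int`.

```c
#include <assert.h>
#include <stddef.h>

struct int_list
{
  long count;
  int items[];	/* flexible array member */
};

/* GNU extension: ISO C does not allow initializing the flexible array
   member, so -pedantic-errors rejects this definition.  */
struct int_list example_list = { 2, { 10, 20 } };

/* Walk the statically initialized entries.  */
long
sum_items (const struct int_list *l)
{
  assert (l != NULL);
  long sum = 0;
  for (long i = 0; i < l->count; i++)
    sum += l->items[i];
  return sum;
}
```

GCC accepts the static initializer by default; the diagnostic only becomes an
error once `-pedantic-errors` is on the command line, as it is in
obj-c++.dg/dg.exp.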

> For a patch that changes code-gen we should have a test that it produces
> what’s expected (in general, a ‘torture’ test would be preferable so that we
> can be sure the output is as expected for different optimisation levels).

The output is different only for targets where 
sizeof (long) != sizeof (void *).  Do we have the ability to run “cross” 
torture tests?  Could such a test verify the emitted assembly (like LLVM’s 
FileCheck tests do)?  Or would it need to execute something?

Thanks for your help!

Matt

[PATCH] build: Implement --with-multilib-list for avr target

2021-06-07 Thread Matt Jacobson via Gcc-patches

The AVR target builds a lot of multilib variants of target libraries by default,
and I found myself wanting to use the --with-multilib-list argument to limit
what I was building, to shorten build times.  This patch implements that option
for the AVR target.

Tested by configuring and building an AVR compiler and target libs on macOS.

I don't have commit access, so if this patch is suitable, I'd need someone else
to commit it for me.  Thanks.
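
The filtering rule in the awk change below can be modelled in C roughly as
follows.  This is a sketch for explanation, not code from the patch;
`keep_core` is a made-up name, and an empty list stands for "no
--with-multilib-list given".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if multilibs for CORE should be generated, given the
   comma-separated MULTILIB_LIST.  */
bool
keep_core (const char *multilib_list, const char *core)
{
  assert (multilib_list != NULL && core != NULL);

  /* avr1 is always skipped, as in the existing script.  */
  if (strcmp (core, "avr1") == 0)
    return false;

  /* No list given: keep every core.  */
  if (multilib_list[0] == '\0')
    return true;

  /* Compare CORE against each comma-separated entry, whole-token only.  */
  size_t core_len = strlen (core);
  const char *p = multilib_list;
  for (;;)
    {
      const char *comma = strchr (p, ',');
      size_t len = comma ? (size_t) (comma - p) : strlen (p);
      if (len == core_len && memcmp (p, core, len) == 0)
	return true;
      if (comma == NULL)
	return false;
      p = comma + 1;
    }
}
```

For example, `keep_core ("avr5,avr51", "avr51")` is true, while
`keep_core ("avr51", "avr5")` is false because entries are matched as whole
names rather than prefixes.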

gcc/ChangeLog:

2020-06-07  Matt Jacobson  

* config.gcc: For the AVR target, populate TM_MULTILIB_CONFIG.
* config/avr/genmultilib.awk: Add ability to filter generated multilib
list.
* config/avr/t-avr: Pass TM_MULTILIB_CONFIG to genmultilib.awk.
* configure.ac: Update help string for --with-multilib-list.
* configure: Regenerate.


diff --git a/gcc/config.gcc b/gcc/config.gcc
index 6a349965c..fd83996a4 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -4201,6 +4201,13 @@ case "${target}" in
fi
;;
 
+   avr-*-*)
+   # Handle --with-multilib-list.
+   if test "x${with_multilib_list}" != xdefault; then
+   TM_MULTILIB_CONFIG="${with_multilib_list}"
+   fi
+   ;;
+
 csky-*-*)
supported_defaults="cpu endian float"
;;
diff --git a/gcc/config/avr/genmultilib.awk b/gcc/config/avr/genmultilib.awk
index 2d07c0e53..ad8814602 100644
--- a/gcc/config/avr/genmultilib.awk
+++ b/gcc/config/avr/genmultilib.awk
@@ -67,6 +67,16 @@ BEGIN {
 
 dir_long_double = "long-double"   (96 - with_long_double)
 opt_long_double = "mlong-double=" (96 - with_long_double)
+
+if (with_multilib_list != "")
+{
+   split(with_multilib_list, multilib_list, ",")
+
+   for (i in multilib_list)
+   {
+   multilibs[multilib_list[i]] = 1
+   }
+}
 }
 
 ##
@@ -137,6 +147,9 @@ BEGIN {
if (core == "avr1")
next
 
+   if (with_multilib_list != "" && !(core in multilibs))
+   next
+
option[core] = "mmcu=" core
 
m_options  = m_options m_sep option[core]
@@ -150,6 +163,9 @@ BEGIN {
 if (core == "avr1")
next
 
+if (with_multilib_list != "" && !(core in multilibs))
+   next
+
 opts = option[core]
 
 # split device specific feature list
diff --git a/gcc/config/avr/t-avr b/gcc/config/avr/t-avr
index 3e1a1ba68..7d20c6107 100644
--- a/gcc/config/avr/t-avr
+++ b/gcc/config/avr/t-avr
@@ -127,6 +127,7 @@ t-multilib-avr: $(srcdir)/config/avr/genmultilib.awk \
-v HAVE_LONG_DOUBLE64=$(HAVE_LONG_DOUBLE64) \
-v with_double=$(WITH_DOUBLE)   \
-v with_long_double=$(WITH_LONG_DOUBLE) \
+   -v with_multilib_list=$(TM_MULTILIB_CONFIG) \
-f $< $< $(AVR_MCUS) > $@
 
 include t-multilib-avr
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 715fcba04..c3ed65df7 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -1106,7 +1106,7 @@ if test x"$enable_hsa" = x1 ; then
 fi
 
 AC_ARG_WITH(multilib-list,
-[AS_HELP_STRING([--with-multilib-list], [select multilibs (AArch64, SH and x86-64 only)])],
+[AS_HELP_STRING([--with-multilib-list], [select multilibs (AArch64, AVR, i386, or1k, RISC-V, SH, and x86-64 only)])],
 :,
 with_multilib_list=default)

Re: [PATCH] Objective-C: fix protocol list count type (pertinent to non-LP64)

2021-10-19 Thread Matt Jacobson via Gcc-patches



> On Sep 26, 2021, at 11:45 PM, Matt Jacobson  wrote:
> 
> Fix protocol list layout for non-LP64.  clang and objc4 both give the `count` 
> field as `long`, not `intptr_t`.  Those are the same on LP64, but not 
> everywhere.  For non-LP64, this fixes binary compatibility with clang-built 
> classes.
> 
> This was more complicated than I anticipated, because the relevant frontend 
> code in fact had no AST type for `protocol_list_t`, instead emitting protocol 
> lists as `protocol_t[]`, with the zeroth element actually being the integer 
> count.  That made it nontrivial to change the count to `long`.  With this 
> change, there is now a true `protocol_list_t` type in the AST.
> 
> Tested multiple ways.  On x86_64/Darwin, I confirmed with a test program that 
> protocol conformances by classes, categories, and protocols work.  On AVR, I 
> manually inspected the generated assembly to confirm that protocol lists gain 
> an extra two bytes of `count`, matching clang.
> 
> Thank you for your time.
> 
> <https://github.com/mhjacobson/gcc/commit/5ebc95dc726f0745ebdf003093f1b8d7720ce32f>

Friendly ping.  Please let me know if there’s anything I can clarify.

Original mail:
<https://gcc.gnu.org/pipermail/gcc-patches/2021-September/580280.html>

Thanks.

Re: [PATCH] build: Implement --with-multilib-list for avr target

2021-07-05 Thread Matt Jacobson via Gcc-patches




> On Jun 7, 2021, at 3:30 AM, Matt Jacobson  wrote:
> 
> The AVR target builds a lot of multilib variants of target libraries by
> default, and I found myself wanting to use the --with-multilib-list argument
> to limit what I was building, to shorten build times.  This patch implements
> that option for the AVR target.
> 
> Tested by configuring and building an AVR compiler and target libs on macOS.
> 
> I don't have commit access, so if this patch is suitable, I'd need someone
> else to commit it for me.  Thanks.

Ping.  (Please let me know if I’ve made some process error here; this is my 
first change to GCC.)

Original mail:
<https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572041.html>

Thanks.

Re: [PATCH] build: Implement --with-multilib-list for avr target

2021-07-19 Thread Matt Jacobson via Gcc-patches




> On Jul 5, 2021, at 7:09 PM, Matt Jacobson  wrote:
> 
>> On Jun 7, 2021, at 3:30 AM, Matt Jacobson  wrote:
>> 
>> The AVR target builds a lot of multilib variants of target libraries by
>> default, and I found myself wanting to use the --with-multilib-list argument
>> to limit what I was building, to shorten build times.  This patch implements
>> that option for the AVR target.
>> 
>> Tested by configuring and building an AVR compiler and target libs on macOS.
>> 
>> I don't have commit access, so if this patch is suitable, I'd need someone
>> else to commit it for me.  Thanks.
> 
> Ping.  (Please let me know if I’ve made some process error here; this is my 
> first change to GCC.)

Ping again.  Thanks!

Original mail:
<https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572041.html>

Matt

[PATCH] Objective-C: don't require redundant -fno-objc-sjlj-exceptions for the NeXT v2 ABI

2021-07-28 Thread Matt Jacobson via Gcc-patches

As is, an invocation of GCC with -fnext-runtime -fobjc-abi-version=2 crashes, 
unless target-specific code adds an implicit -fno-objc-sjlj-exceptions (which 
Darwin does).

This patch makes the general case not crash.

I don't have commit access, so if this patch is suitable, I'd need someone else
to commit it for me.  Thanks.
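
Going by the ChangeLog entries below, the intended defaulting can be modelled
roughly like this.  It is a sketch of the logic only: the function names are
invented, and the convention that -1 means "option not given" is an assumption
of the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Front-end defaulting: an explicit request (0 or 1) is kept; otherwise
   sjlj exceptions are the default only for the NeXT runtime with an ABI
   older than v2.  REQUESTED is -1 when the option was not given.  */
int
resolve_sjlj_default (bool next_runtime, int abi_version, int requested)
{
  assert (requested >= -1 && requested <= 1);
  if (requested >= 0)
    return requested;
  return next_runtime && abi_version < 2;
}

/* v2 ABI init: any remaining sjlj request is reported and then ignored,
   regardless of whether Objective-C exceptions are enabled.  Returns true
   when a note would be printed; the flag is treated as 0 afterwards.  */
bool
abi_v2_notes_sjlj (int resolved)
{
  return resolved > 0;
}
```

Under this model `-fnext-runtime -fobjc-abi-version=2` alone resolves to 0,
so the v2 ABI setup no longer sees sjlj exceptions enabled by default; only an
explicit `-fobjc-sjlj-exceptions` reaches the note.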

gcc/objc/ChangeLog:

2021-07-28  Matt Jacobson  

* objc-next-runtime-abi-02.c (objc_next_runtime_abi_02_init): Warn
about and reset flag_objc_sjlj_exceptions regardless of
flag_objc_exceptions.


gcc/c-family/ChangeLog:

2021-07-28  Matt Jacobson  

* c-opts.c (c_common_post_options): Default to
flag_objc_sjlj_exceptions = 1 only when flag_objc_abi < 2.

diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c
index c51d6d34726..2568df67972 100644
--- a/gcc/c-family/c-opts.c
+++ b/gcc/c-family/c-opts.c
@@ -840,9 +840,9 @@ c_common_post_options (const char **pfilename)
   else if (!flag_gnu89_inline && !flag_isoc99)
 error ("%<-fno-gnu89-inline%> is only supported in GNU99 or C99 mode");
 
-  /* Default to ObjC sjlj exception handling if NeXT runtime.  */
+  /* Default to ObjC sjlj exception handling if NeXT  (SIZEHASHTABLE);
 
-  if (flag_objc_exceptions && flag_objc_sjlj_exceptions)
+  if (flag_objc_sjlj_exceptions)
 {
   inform (UNKNOWN_LOCATION,
  "%<-fobjc-sjlj-exceptions%> is ignored for "
