Goodbye

2011-12-29 Thread Ira Rosen
Hi, Thank you all for an interesting and pleasant experience. I am very grateful to Linaro for the opportunity to meet and work with such an amazing group of people. I wish you all the best, and hope to meet you again (at least online). You can find me at i...@il.ibm.com or ira@gmail.com. Ir

Re: cdce3.C execution fault

2011-12-19 Thread Ira Rosen
On 20 December 2011 02:03, Michael Hope wrote: > Hi there.  I've looked further into the intermittent > gcc/testsuite/g++.dg/cdce3.C test failures.  Taking Ira's > vectoriser-only fix-pr51301-4.6 branch and comparing it with its > predecessor r106845: >  * cdce3.o itself is identical across compi

[ACTIVITY] December 11-15

2011-12-15 Thread Ira Rosen
Hi, After learning how to control MEM_ALIGN and, therefore, alignment hints from the vectorizer, I was able to generate 64-bit hints (with the help of Ramana's patches). I saw a 16% improvement on a benchmark with stack variables, for which we now force alignment to 64 bits and create alignment hi

[ACTIVITY] December 4-8

2011-12-08 Thread Ira Rosen
Hi, - fixed PR 51285 - continued looking at the alignment issue, ran Michael's script with different options, tested Ramana's preliminary patch for vld1/vst1, and my "don't peel for low loop bounds" patch Ira ___ linaro-toolchain mailing list linaro-t

Re: Release notes for GCC 4.6

2011-12-07 Thread Ira Rosen
On 7 December 2011 22:36, Andrew Stubbs wrote: > Hi all, Hi Andrew, > > I've copied all those who made commits to GCC 4.6 this month. > > Could you please give me a sentence or two for the release notes? I committed several patches to enable SLP of libav's weight-h264-pixels16x16-8: - support S

[ACTIVITY] Nov 27 - Dec 1

2011-12-01 Thread Ira Rosen
Hi, - Ran eon with gcc 4.7: there are many more loops similar to the one in lp#831094 that get vectorized (due to some data ref analysis improvement), so the impact of disabling peeling for such loops (i.e. loops with low loop bound) is even bigger than for 4.6, and vectorization improves the perf

Re: Effect of alignment and peeling on vectorised loops

2011-12-01 Thread Ira Rosen
On 30 November 2011 22:28, Michael Hope wrote: >>> This run also showed the effect of loop unrolling. The loop seems to >>> be unrolled for loops of <= 64 words and drops off in performance past >>> around 8 words. When the unrolling finally drops out, performance >>> increases by 101 %. >> >> I

Re: Effect of alignment and peeling on vectorised loops

2011-11-30 Thread Ira Rosen
On 30 November 2011 02:33, Michael Hope wrote: > I then converted the vld1 and vst1 to specify an alignment of 64 > bits. See: >  http://people.linaro.org/~michaelh/incoming/set-alignment.png > > This improved the throughput in all cases and in cases for more than 50 > words by 14 %.  This graph

Re: [ACTIVITY] November 20-24

2011-11-28 Thread Ira Rosen
On 24 November 2011 15:32, Ira Rosen wrote: > * Disabling peeling for low loop bounds also helps with one of EEMBC > benchmarks, for which vectorization with double-words is more > beneficial than with quad-words. It turns out that we are able to > force the alignment for doubl

[ACTIVITY] November 20-24

2011-11-24 Thread Ira Rosen
Hi, * Worked on peeling problem in eon (#831094). Wrote a patch that checks if the number of vector iterations is going to be more than 2, and disables peeling otherwise. With this patch I see about 1.5% regression with vectorization (and about 7% without it). * I am thinking of extending the patch
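
The check described in this report can be sketched roughly as follows. This is only an illustration, not the actual GCC patch: the function name, the macro, and the way the iteration count is computed are all assumptions made for the example. The only detail taken from the report is the rule itself, peel only when more than 2 vector iterations are expected.

```c
#include <assert.h>
#include <stdbool.h>

/* Threshold from the report: peeling is allowed only when the loop is
   expected to run more than this many vector iterations.  The macro name
   is hypothetical. */
#define MIN_VECTOR_ITERS_FOR_PEELING 2u

/* Hypothetical helper, not a GCC function.
   niters: known scalar iteration count of the loop.
   vf:     vectorization factor (elements handled per vector iteration). */
static bool peeling_is_worthwhile(unsigned niters, unsigned vf)
{
    assert(vf > 0);
    unsigned vector_iters = niters / vf;
    return vector_iters > MIN_VECTOR_ITERS_FOR_PEELING;
}
```

Under these assumptions, a loop of 8 iterations with a vectorization factor of 4 yields only 2 vector iterations, so peeling would be disabled, while 12 iterations would still allow it.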

[ACTIVITY] November 14-17

2011-11-17 Thread Ira Rosen
Hi, - spent most of the week trying to reproduce regressions with vectorization - started bringing the latest SLP feature, condition with different types, to gcc-linaro. There are 5 patches. Merged one, started to prepare another one. - fixed PR 51112 Ira

[ACTIVITY] November 6-10

2011-11-10 Thread Ira Rosen
Hi, - SLP improvements for weight-h264-pixels16x16-8 (libav): - conditions in SLP - committed upstream - support pattern detection in SLP - implemented - enhance mixed condition pattern to handle non-constant then/else clauses - implemented weight-h264-pixels16x16-8 now gets vectorize

[ACTIVITY] Oct 30 - Nov 3

2011-11-03 Thread Ira Rosen
Hi, - Finished rewriting SLP analysis to support not only unary and binary operations. Committed upstream. - Implemented cond_expr support in SLP (for libav weight_h264_pixels). Testing it now. - Vectorizer maintenance (test/bug fixes, patch reviews). Ira

[ACTIVITY] October 23-27

2011-10-27 Thread Ira Rosen
Hi, - Merged to gcc-linaro: - widening shifts - SLP features: support loads with different offsets and swap operands if necessary - Started rewriting SLP analysis to support operations with more than two operands (towards SLP of conditions) - Updated NEON presentation following Ramana's sugg

[ACTIVITY] October 16-19

2011-10-19 Thread Ira Rosen
Hi, * widening shifts - finally committed upstream * SLP loads with different offsets and operand swaps - committed upstream * SLP with multiple types - merged to gcc-linaro-4.6 * vectorizer stuff: patch review, test fixes, discussions, bug fix * Ramana and I discussed what can be done with VEC_PE

[ACTIVITY] October 9-12

2011-10-12 Thread Ira Rosen
Hi, * Finished a presentation for NEON forum. Revital and Richard kindly agreed to take a look and gave me some valuable comments. Thanks! * widen-shifts: - While preparing the presentation I found some room for improvement in the pattern detection, so I implemented it. It gave additional 13% t

[ACTIVITY] October 2-6

2011-10-06 Thread Ira Rosen
Hi, - worked on the RTL part of the widen-shift patch - backported to linaro 2/3 of the SLP patches, and proposed the third one - worked on additional SLP improvements: - swap operands to make statements isomorphic - support load with offset 1 (after load from 0) - started working on presentat

[ACTIVITY] September 25-28

2011-09-28 Thread Ira Rosen
Hi, * change default vector size patch - merged to linaro-gcc * SLP improvements committed upstream: - allow non-affine accesses - auto-detect vector size - support multiple types and promotion/demotion operations Ira

[ACTIVITY] September 18-22

2011-09-22 Thread Ira Rosen
Hi, * widening shifts patch - submitted upstream * change default vector size patch - submitted to linaro-gcc * automatic choice of vector size for basic block vectorization - testing * vectorizer bug fixes Next week we have the New Year holiday on Wednesday (half day) and Thursday. Ira

[ACTIVITY] September 11-15

2011-09-15 Thread Ira Rosen
Hi, * testing widen-shifts patch on ARM * SLP improvements: - submitted a patch to allow non-simple IVs in SLP - committed a patch to allow read-after-read dependencies in SLP Ira

[ACTIVITY] September 4-8

2011-09-08 Thread Ira Rosen
Hi, * merged vector over-promotion patch to linaro-gcc-4.6 * committed upstream the change of the default vector size for NEON * continued working on widening shifts Ira

[ACTIVITY] Aug 31 - Sep 1

2011-09-01 Thread Ira Rosen
Hi, - catching up with the mail - fixed GCC PR50178 - prepared a patch to fix vectorizer dumps Ira

[ACTIVITY] August 14-18

2011-08-18 Thread Ira Rosen
Hi, - change of default vector size for auto-vectorization on NEON - submitted and approved - continued working on vectorization of widening shifts - looked into SLP vectorization for libav - two vacation days I'll be on vacation on Aug 22-30. Ira

Re: Basic libav profiling

2011-08-18 Thread Ira Rosen
On 18 August 2011 12:45, Andrew Stubbs wrote: > On 18/08/11 06:56, Ira Rosen wrote: >>> >>> How can I tell the vectoriser that an input is a multiple of something? >> >> Unfortunately, I don't think you can. > > I think you can do something like this:

Re: Basic libav profiling

2011-08-17 Thread Ira Rosen
On 18 August 2011 02:43, Michael Hope wrote: > On Thu, Aug 18, 2011 at 11:11 AM, Michael Hope > wrote: >> On Tue, Aug 16, 2011 at 11:32 PM, Richard Sandiford >> wrote: >>> Michael Hope writes: I put a build harness around libav and gathered some profiling data.  See:  bzr branch lp:~

[ACTIVITY] August 7-11

2011-08-11 Thread Ira Rosen
Hi, * fixed PR 50014 and 50039 - to be backported to linaro-gcc * tested the patch to change the default vector size on NEON * found one test that fails with quad-words - gcc.c-torture/execute/mode-dependent-address.c. Debugging it with Ramana. * started looking into widening shifts Vacation plan

[ACTIVITY] Aug 1-5

2011-08-07 Thread Ira Rosen
Hi, * committed upstream a patch that reduces over-promotion of vector operations * started to work on a new version of the patch to change the default vector size for Neon * attended Linaro connect Ira

Re: libav wiki page updated for current FSF trunk

2011-08-05 Thread Ira Rosen
On 5 August 2011 08:53, Richard Sandiford wrote: > As well as using a more recent compiler, the new version also uses > -mvectorize-with-neon-quad.  Once again it shows a significant improvement > over the default. Richard E., Ramana and I finally came to an agreement about how it should work, s

[ACTIVITY] July 24-28

2011-07-28 Thread Ira Rosen
Hi, I am checking the coverage of the NEON instructions mostly by writing tests in C to check which instructions are generated (after auto-vectorization) and which are not. The list of what I have checked so far is at https://wiki.linaro.org/IraRosen/Sandbox/InstructionCoverage. Ira

[ACTIVITY] July 17-21

2011-07-21 Thread Ira Rosen
Hi, - I finally submitted the over-widening patch, but Richard Guenther thought that this optimization should be done for scalars as well, and he is now working on this himself. - Some auto-vectorizer fixes Ira

[ACTIVITY] July 11-14

2011-07-14 Thread Ira Rosen
Hi, - merged over-widened multiply patch to gcc-linaro-4.6 (now vectorized rgbyiqv should be about as good as its scalar version) - continued working on over-widened shifts and bit operations Ira

[ACTIVITY] July 3-6

2011-07-06 Thread Ira Rosen
Hi, - continued working on prevention of over-widening in vectorization - finalizing the patch - improvement of vectorizer peeling heuristic - merged to gcc-linaro-4.6 - vectorization of widen-mult with over-promoted operands - proposed for merge to gcc-linaro-4.6 - fixed PR 49610 - patch reviews

Re: [ACTIVITY] June 26-30

2011-06-30 Thread Ira Rosen
Sorry, I forgot to mention that I will be on vacation on July 7-10. Ira On 30 June 2011 15:19, Ira Rosen wrote: > Hi, > > - support of multiple uses of original pattern statements (needed for > over-promotion work) - committed upstream > - support of widen-mult of unsigned type

[ACTIVITY] June 26-30

2011-06-30 Thread Ira Rosen
Hi, - support of multiple uses of original pattern statements (needed for over-promotion work) - committed upstream - support of widen-mult of unsigned types and constants - merged to gcc-linaro-4.6 - vectorizer peeling heuristic improvement - proposed to merge to gcc-linaro-4.6 Ira

[ACTIVITY] June 19-23

2011-06-23 Thread Ira Rosen
Hi, - investigating detection of general over-widening cases in the vectorizer - improvements of widen-mult - proposed for merge to gcc-linaro-4.6 - fixed PRs 49443 and 49478 Ira

[ACTIVITY] June 12-16

2011-06-16 Thread Ira Rosen
Hi, - fix vectorizer testsuite failures on ARM - committed - committed a fix of a bug in the vectorizer revealed by the widen-mult patch - committed an improvement of peeling heuristic - reduce over-widening in case of multiplication by a constant (improves vectorized rgbyiq by almost 2x) - commit

[ACTIVITY] June 5-9

2011-06-09 Thread Ira Rosen
Hi, - vectorization of widening multiplication of unsigned types and constants - committed to mainline - fix vectorizer testsuite failures on ARM - submitted - testing a patch to fix a bug in the vectorizer revealed by the widen-mult patch - testing a patch to fix bad peeling heuristic that causes

[ACTIVITY] May 29 - June 2

2011-06-02 Thread Ira Rosen
Hi, - bug fixes: PRs 49222, 49199, 49239, 49093 - widening multiplication: submitted a patch to support widen-mul for unsigned types and constants in the vectorizer's pattern recognizer. Now considering moving the optimize_widening_mul pass before loop optimizations and improving it to support unsigned

[ACTIVITY] May 22-26

2011-05-26 Thread Ira Rosen
Hi, * PR 49087 - fixed * PR 49038 - opened by Richard - fixed on 4.7, to be backported to 4.5 and 4.6 * working on widening multiplication for unsigned types and constants (the signed case works fine) Ira

Re: Engineering blueprints for 11.11

2011-05-22 Thread Ira Rosen
> At the moment good ol' > CoreMark is worse with -O3 -mfpu=neon... It may be worth trying -fvect-cost-model. Ira > > -- Michael

[ACTIVITY] May 15-19

2011-05-19 Thread Ira Rosen
Hi, * committed a patch that supports reductions in SLP (upstream) * continued analyzing benchmarks: ffmpeg, EEMBC telecom, office, networking * started to look into implementation of reverse accesses for Neon * blueprints Ira

[ACTIVITY] May 8-12

2011-05-12 Thread Ira Rosen
Hi, * continued looking into ffmpeg/libavcodec: - dcadsp.c - the inner loop contains reverse accesses which are not supported on Neon. I think we can handle them using vrev and vswp. - a lot of loops have unknown memory stride. I am exploring a possibility of a combination of scalar loads and

[ACTIVITY] May 1-5

2011-05-05 Thread Ira Rosen
Hi, - backported vzip fix to GCC 4.5 and 4.6 (PR 48252) - merged auto-detection of vector size patch to gcc-linaro 4.6 - started looking into vectorization of ffmpeg Ira

[ACTIVITY] April 27-28

2011-04-30 Thread Ira Rosen
Hi, gcc-linaro/4.6 - if-conversion improvement (needed to vectorize Telecom/viterbi) - merged - auto-detection of vector size - proposed for merging GCC FSF - fixed PR 48765 on trunk - completed testing of vzip fix (PR 48252) on gcc 4.5 and gcc 4.6 Ira

Re: Some initial notes on the effects of vldN and vstN vectorisation

2011-04-17 Thread Ira Rosen
On 13 April 2011 18:48, Richard Sandiford wrote: > I've now submitted the initial vldN and vstN work, so I thought I'd see > how often it triggers for natty's libav package.  I've put some initial > results here: > >    https://wiki.linaro.org/RichardSandiford/Sandbox/NeonLibAv > > There are more

[ACTIVITY] April 3-7

2011-04-07 Thread Ira Rosen
Hi, * continued bringing patches upstream - changing default vector size to 128 - resubmitted with changes according to comments, awaiting review - if-conversion improvement - committed * PR 48252 - bug in vzip/vuzp/vtrn implementation - patch submitted * opened PR 48454 - a test failure wit

[ACTIVITY] March 27-31

2011-03-31 Thread Ira Rosen
Hi, * continued bringing patches upstream - auto-detection of vector size - committed - changing default vector size to 128 - submitted and testing the final version - if-conversion improvement - submitted and now testing the final version * gcc-linaro-4.6 - submitted a merge request for

[ACTIVITY] March 22-24

2011-03-24 Thread Ira Rosen
Hi, * resubmitted and committed store sink patch to trunk, I'll commit it to gcc-linaro-4.6 next week * submitted autodetection of vector size patch to gcc-patches, I'll commit it next week * started testing a patch that makes mvectorize-with-neon-quad the default * DenBench: found some more cases

[ACTIVITY] March 13-17

2011-03-17 Thread Ira Rosen
Hi, * submitted store sinking patch to mainline * started testing auto-detection of vector size patch * DENBench - some benchmarks are still unstable, I am looking into stable regressions, adjusting and fixing the cost model for them Next week: Sunday and Monday - holidays Ira

Re: Fwd: Representing interleaving and lane load/stores at the tree level

2011-03-15 Thread Ira Rosen
On 15 March 2011 13:30, Richard Sandiford wrote: > Ira Rosen writes: > >> > How do you distinguish between "multiple structures" and "single > > structure > >> > to all lanes"? > >> > >> Sorry, I'm not sure I underst

[ACTIVITY] March 6-10

2011-03-10 Thread Ira Rosen
Hi, * continued working on cost model tuning. I don't see much difference running EEMBC DenBench with and without vectorization enabled (and, therefore, also with and without cost model). Also, I have to say, that the results are not stable and I sometimes get 10% difference just running the same

Re: Fwd: Representing interleaving and lane load/stores at the tree level

2011-03-10 Thread Ira Rosen
> [Sorry, forgot to CC: the list] > > Hi Ira, > > Thanks for the feedback. > > On 6 March 2011 09:20, Ira Rosen wrote: > > > So how about the following functions?  (Forgive the pascally syntax.) > > > > > >     __builtin_load_lanes (REF : array N

Re: Representing interleaving and lane load/stores at the tree level

2011-03-06 Thread Ira Rosen
er to allow these functions to be created in the > original source code. This is throw-away code though; it would never > be submitted. > > I've also included a simple test case and the output I get from it. > The output looks pretty good; there's not even the stray VMOV tha

[ACTIVITY] February 20-24

2011-02-24 Thread Ira Rosen
Hi, * vectorizer cost model - implemented builtin_vectorization_cost for NEON - added register spilling considerations to the cost model - started testing/tuning on EEMBC Telecom and DenBench (for now I have only two examples for spilling: fdct_int32 mp4encode that shouldn't get vectorized a

[ACTIVITY] February 13-17

2011-02-17 Thread Ira Rosen
Hi, This week I looked into DENBench: * sad8_c (hot function from mp4encode) needs SLP reduction, but it also contains cond_expr which cannot be vectorized as reduction, so I don't think there is anything I can do here * fdct_int32 (another hot function from mp4encode) now gets vectorized with vzi

[ACTIVITY] February 6-10

2011-02-10 Thread Ira Rosen
Hi, * regtested vzip/vuzp patch * looked into big-endian build * applied all the required patches and checked that Viterbi gets vectorized giving ~2x performance improvement (compiled with cross-compiler) * looked into vld/vst implementation - mostly discussions with Richard * DenBench analysis:

Re: Question about big endian

2011-02-09 Thread Ira Rosen
On 8 February 2011 17:34, Julian Brown wrote: > On Tue, 8 Feb 2011 11:22:32 + > Julian Brown wrote: > >> IIRC I couldn't figure out the magic incantation needed to do it last >> time I tried. I don't think the "--with-endian=xxx" option is >> supported for ARM. Possibly the way to do it is to

Re: Question about big endian

2011-02-07 Thread Ira Rosen
On 7 February 2011 18:24, Julian Brown wrote: > On Mon, 7 Feb 2011 17:18:40 +0200 > Ira Rosen wrote: > >> Hi, >> >> I'd like to check vzip/vuzp patch in big endian mode. But when I try >> to compile with -mbig-endian flag, I get >> >> > ~/m

Question about big endian

2011-02-07 Thread Ira Rosen
Hi, I'd like to check vzip/vuzp patch in big endian mode. But when I try to compile with -mbig-endian flag, I get > ~/mainline/bin/bin/gcc -O3 -mfloat-abi=softfp -mfpu=neon neon-vtrnu8.c > -mbig-endian /home/irar/mainline/bin/lib/gcc/armv7l-unknown-linux-gnueabi/4.6.0/../../../libgcc_s.so.1: cou

[ACTIVITY] January 30 - February 3

2011-02-03 Thread Ira Rosen
Hi, I continued to work on vect_interleave and vect_extract implementation on NEON: * debugged the compiler to find out what's the problem with neon_vzip/vuzp_internal * fixed it following Uli's advice * checked how neon_vzip/vuzp_internal work for intrinsics by writing tests * fixed the patch

Re: Help with define_insn

2011-02-03 Thread Ira Rosen
On 1 February 2011 16:23, Ulrich Weigand wrote: >> >> Are they actually broken  ? I'd be worried if that were the case. My >> understanding is that the >> existing ones are being used for the Neon intrinsics / builtins. > > Yes, they're broken, for the reason Ira originally points out: Right. Th

Re: Help with define_insn

2011-02-01 Thread Ira Rosen
On 1 February 2011 11:47, Ira Rosen wrote: > Thanks a lot! It seems to work. It fixed the problem and I am now > testing the patch on the rest of the vectorizer testsuite. After testing only with the vectorizer testsuite (which contains at least 30 tests for strided accesses), I'd

Re: Help with define_insn

2011-02-01 Thread Ira Rosen
On 31 January 2011 16:53, Ulrich Weigand wrote: > Ira Rosen wrote: > >> (define_insn "neon_vzip_internal" >>   [(set (match_operand:VDQW 0 "s_register_operand" "=w") >>        (unspec:VDQW [(match_operand:VDQW 1 "s_register_oper

Help with define_insn

2011-01-31 Thread Ira Rosen
Hi, I am trying to implement interleave_high/low and extract_even/odd using vzip and vuzp instructions. I am attaching a patch that attempts to do that. It uses already existing neon_vzip_internal. The problem with it is that it doesn't express the fact that the two outputs of vzip depend on both
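
The dependence mentioned in this message can be illustrated with a scalar model of the two permutations. This is only a sketch for readers, not the define_insn under discussion: the function names are hypothetical, and the element order follows the usual description of NEON VZIP and VUZP on two equally sized registers. In both operations each output reads elements from both inputs.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar model of a zip (interleave) of two n-element vectors.
   out0 receives the interleaved first halves of a and b, out1 the
   interleaved second halves.  Hypothetical helper for illustration. */
static void zip_model(const int *a, const int *b, size_t n,
                      int *out0, int *out1)
{
    assert(n % 2 == 0);
    for (size_t i = 0; i < n / 2; i++) {
        out0[2 * i]     = a[i];
        out0[2 * i + 1] = b[i];
        out1[2 * i]     = a[n / 2 + i];
        out1[2 * i + 1] = b[n / 2 + i];
    }
}

/* Scalar model of an unzip (de-interleave): out0 receives the
   even-indexed elements of a followed by those of b, out1 the
   odd-indexed ones.  Hypothetical helper for illustration. */
static void unzip_model(const int *a, const int *b, size_t n,
                        int *out0, int *out1)
{
    assert(n % 2 == 0);
    for (size_t i = 0; i < n / 2; i++) {
        out0[i]         = a[2 * i];
        out0[n / 2 + i] = b[2 * i];
        out1[i]         = a[2 * i + 1];
        out1[n / 2 + i] = b[2 * i + 1];
    }
}
```

In this model, unzipping the two results of a zip gives back the original inputs, which is one simple way to check a candidate implementation.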

Re: [ACTIVITY] January 23-27

2011-01-28 Thread Ira Rosen
> I am planning to post the patch here anyway, but since there are some test failures I prefer to wait with this a bit (and your questions make me feel even more insecure with the patch ;)). Ira > Thanks, > Tejas. > > > > On Thu, 2011-01-27 at 15:44 +0200, Ira Rosen wrote: >

[ACTIVITY] January 23-27

2011-01-27 Thread Ira Rosen
Hi, I am working on implementation of interleave_high/low and extract_even/odd for NEON. The pairs of high/low (even/odd) are "magically" united into a single vzip (vuzp) instruction in the back end, so there is no need for special support from the tree level. There are still some test failures that

[ACTIVITY] January 16-20

2011-01-20 Thread Ira Rosen
Hi, * finished SLP for reduction patch. The loop in DenBench that needs this feature also requires support of load permutation. I am considering implementing that too. I looked for other occasions that need this feature, but only found loops that are not vectorizable. So, I am not sure I'll procee

[ACTIVITY] January 9-13

2011-01-13 Thread Ira Rosen
Hi, * Continued with testing and implementation of reduction support in SLP * Found a major problem in vectorization of if-converted data accesses. Looked into other ways to solve the problem. * Spent some time on non-Linaro vectorization plans * Unsuccessfully tried to make the board work Ira

[ACTIVITY] January 2-6

2011-01-06 Thread Ira Rosen
Hi, * implemented reduction support in SLP, I'll check if it helps DenBench next week * helping Sebastian Pop with if-conversion for vectorization improvements (BTW, Sebastian's goal is to vectorize kernels from ffmpeg) * fixed GCC PR47139 Ira

[ACTIVITY] December 26-30

2010-12-30 Thread Ira Rosen
Hi, * continued with my attempts to vectorize Viterbi: - finished implementation of conditional store sinking in cselim pass (I did only limited testing). - reconsidered the idea of safe load if-conversion if an adjacent field of the same structure is accessed unconditionally - this may be i

[ACTIVITY] December 20-23

2010-12-23 Thread Ira Rosen
Hi, I was on vacation on Sunday and starting from Tuesday stayed home with a sick child, so I only had a couple of days to work. * vectorization of Viterbi: - continued implementing conditional store sinking in cselim pass - made if-conversion work on loads of structure fields if other fie

Re: [ACTIVITY] December 12 - 16

2010-12-18 Thread Ira Rosen
Hi Ramana, On 16 December 2010 20:11, Ramana Radhakrishnan wrote: > Hi Ira, > > On Thu, 2010-12-16 at 15:29 +0200, Ira Rosen wrote: >> - telecom viterbi (vectorization potential gain is 4x) requires >> conditional store sinking and load hoisting to enable if-conve

[ACTIVITY] December 12 - 16

2010-12-16 Thread Ira Rosen
Hi, I continued looking into EEMBC benchmarks: - telecom fft is not vectorized because of an unknown number of iterations. It has a non-constant step, and its loop bound may overflow. I think the solution here could be loop versioning, but since versioning increases code size, this kind of optimizati

[ACTIVITY] Nov 29 - Dec 2

2010-12-02 Thread Ira Rosen
- Continued looking into NEON special loads and stores. - Benchmarks: concentrated on EEMBC Telecom: - autcor gets vectorized - viterbi, besides strided data accesses, needs to sink conditional stores to allow if-conversion and make the main loop vectorizable. Since the potential here is 4x,

Re: [gnu-arm-releases] Re: [PATCH, WIP] NEON quadword vectors in big-endian mode (#10061, #7306)

2010-12-02 Thread Ira Rosen
On 1 December 2010 17:57, Daniel Jacobowitz wrote: > On Wed, Dec 01, 2010 at 11:16:16AM +0200, Ira Rosen wrote: >> The meaning of the builtin (or maybe a new tree code would be better?) >> is that the elements of v0, v1 and v2 are deinterleaved. I wanted the >> MEM_REFs, s

Re: [PATCH, WIP] NEON quadword vectors in big-endian mode (#10061, #7306)

2010-12-01 Thread Ira Rosen
On 30 November 2010 14:51, Julian Brown wrote: >> >>> I think we need to somehow enhance MEM_REF, or maybe generate a >> >>> MEM_REF for the first vector and a builtin after it. >> >> >> >> Yeah, keeping these things looking like memory references to most >> >> of the compiler seems like a good p

Re: [ACTIVITY] November 21-25

2010-11-30 Thread Ira Rosen
On 25 November 2010 22:34, Michael Hope wrote: > On Fri, Nov 26, 2010 at 2:35 AM, Ira Rosen wrote: >>      FFMPEG http://www.ffmpeg.org/  (got this from Rony Nandy from >> User Platforms). It contains hand-vectorized code for NEON. >> Investigating. > > I'm builid

Re: [PATCH, WIP] NEON quadword vectors in big-endian mode (#10061, #7306)

2010-11-30 Thread Ira Rosen
On 22 November 2010 13:46, Ira Rosen wrote: > On 17 November 2010 13:21, Julian Brown wrote: >>> > We'd need to figure out what the RTL for such loads/stores should >>> > look like, and whether it can represent alignment constraints, or >>> > strides,

[ACTIVITY] November 21-25

2010-11-25 Thread Ira Rosen
Hi, - the struggle with the board took a lot of time - continued to investigate special loads/stores - looked for benchmarks: EEMBC Consumer filters rgbcmy and rgbyiq should be vectorizable once vld3, vst3/4 are supported EEMBC Telecom viterbi is supposed to give 4x on NEON once vector

Re: [PATCH, WIP] NEON quadword vectors in big-endian mode (#10061, #7306)

2010-11-22 Thread Ira Rosen
On 17 November 2010 13:21, Julian Brown wrote: >> > We'd need to figure out what the RTL for such loads/stores should >> > look like, and whether it can represent alignment constraints, or >> > strides, or loads/stores of multiple vector registers simultaneously. Alignment info is kept in struct p

[ACTIVITY] November 14-18

2010-11-18 Thread Ira Rosen
Hi, This week I continued looking into vld/vst support in GCC. I also fixed GCC PR 46312 - testsuite failures on ARM. Ira

Fwd: Fw: GCC SVN vs. BZR/LP

2010-11-17 Thread Ira Rosen
Hi, On 17 November 2010 05:35, Michael Hope wrote: > 1. How easy is it to frequently merge in SVN? It used to be terrible > as you had to manually track the merges. These days can you do a 'svn > merge trunk' and have it just work? I asked Mike Meissner to answer this question. Mike is very ex

Re: [PATCH, WIP] NEON quadword vectors in big-endian mode (#10061, #7306)

2010-11-17 Thread Ira Rosen
On 15 November 2010 17:33, Julian Brown wrote: > On Mon, 15 Nov 2010 10:12:26 +0200 > Ira Rosen wrote: > > > Hi Julian, > > > > On 12 November 2010 17:49, Julian Brown > > wrote: > > > ... > > > The important observation is that vectors from c

Re: [PATCH, WIP] NEON quadword vectors in big-endian mode (#10061, #7306)

2010-11-15 Thread Ira Rosen
Hi Julian, On 12 November 2010 17:49, Julian Brown wrote: > > For the first of these, I think we can get away with changing the > vectorizer to use explicit "array" loads and stores (i.e. vldN/vstN), so > that vector registers will hold elements in memory order -- so, all the > contortions in th

Mixed vector sizes

2010-11-09 Thread Ira Rosen
Hi, I started to look into mixed vector sizes (in the same loop). My main reason for this was to allow widening and narrowing instructions, that have different vector sizes for src and dest, to work properly. My example was widen_mult (int = short * short), I thought its implementation was not opt
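
For readers unfamiliar with the pattern, the widen_mult case mentioned here (int = short * short) corresponds to a loop of roughly the following shape. The function name and signature are made up for this example. With 128-bit vectors, one input vector holds 8 shorts while one result vector holds only 4 ints, which is why source and destination vector sizes or counts differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example loop: each 16-bit product is widened to 32 bits.
   The casts make the widening explicit; C's integer promotions would
   perform it anyway. */
static void widen_mult(const short *a, const short *b, int *out, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL && out != NULL));
    for (size_t i = 0; i < n; i++)
        out[i] = (int)a[i] * (int)b[i];
}
```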

Re: GCC SVN vs. BZR/LP

2010-11-09 Thread Ira Rosen
On 9 November 2010 15:36, Andrew Stubbs wrote: > On 09/11/10 12:55, Ira Rosen wrote: > >> * We can't really apply anything we want just for ourselves >> >> Why? It will be our "private" Linaro branch. We can apply whatever we >> want there (we ca

Re: GCC SVN vs. BZR/LP

2010-11-09 Thread Ira Rosen
On 9 November 2010 14:38, Andrew Stubbs wrote: > Re my recent email "Upstream GCC feature freeze", I think we're agreed that > we need to create a branch that tracks GCC 4.6 development, but has our own > performance improvements included. The question is where to host it? > > Option 1: Launchpad

Re: Auto-detection of vector size for NEON

2010-11-09 Thread Ira Rosen
Julian Brown wrote on 05/11/2010 12:58:14 PM: > I think it's probably fine to default to 128-bit vectors, and fall back > to 64-bits when necessary (where access patterns block usage of wider > vectors, or similar). AIUI, ARM were quite keen to get rid of > -mvectorize-with-neon-quad altogether

Re: Upstream GCC feature freeze

2010-11-08 Thread Ira Rosen
On 8 November 2010 20:30, Chung-Lin Tang wrote: > Still, I would like to see a 'linaro-trunk' branch under svn:// > gcc.gnu.org/svn/branches. It would actually serve a different purpose than > a LP branch; the LP GCC 4.6 would probably eventually turn into Linaro 4.6, > while a SVN branch would b

Re: Auto-detection of vector size for NEON

2010-11-04 Thread Ira Rosen
Julian Brown wrote on 03/11/2010 11:55:59 AM: > > On Mon, 1 Nov 2010 15:57:11 +0200 > Ira Rosen wrote: > > > It looks like it's enough to implement targetm.vectorize. > > autovectorize_vector_sizes for NEON in order to enable initial > > auto-detection of v

Auto-detection of vector size for NEON

2010-11-01 Thread Ira Rosen
Hi, It looks like it's enough to implement targetm.vectorize.autovectorize_vector_sizes for NEON in order to enable initial auto-detection of vector size. With the attached patch and -mvectorize-with-neon-quad flag, the vectorizer first tries to vectorize for 128 bit, and if this fails, it tries
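
The try-the-widest-size-first behaviour described here can be modelled in a few lines. This is a deliberately simplified sketch and not the attached patch or the GCC hook itself: the function name is hypothetical, the candidate list of 128 and 64 bits is an assumption based on the NEON quad-word and double-word sizes discussed in this thread, and the single max_usable_bits parameter stands in for the vectorizer's whole analysis of the loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: return the first candidate vector size (in bits)
   that the loop can use, trying wider sizes first, or 0 if none fits.
   max_usable_bits is a stand-in for the result of loop analysis. */
static unsigned pick_vector_bits(unsigned max_usable_bits)
{
    static const unsigned candidates[] = {128, 64};  /* assumed NEON sizes */
    const size_t n = sizeof candidates / sizeof candidates[0];

    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        if (candidates[i] <= max_usable_bits)
            return candidates[i];
    }
    return 0;  /* no candidate size works: leave the loop scalar */
}
```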

Re: NEON vectorization: use of specialized load/store instructions

2010-10-18 Thread Ira Rosen
Joseph Myers wrote on 14/10/2010 05:18:37 PM: > On Thu, 14 Oct 2010, Ira Rosen wrote: > > > Let me check that I understand the problem first: the problem is that VLD1 > > and VST1 instructions in big endian mode follow the array numbering of > > elements, while all o

Re: NEON vectorization: use of specialized load/store instructions

2010-10-14 Thread Ira Rosen
Julian Brown wrote on 11/10/2010 04:29:15 PM: > In further followups (at the risk of misrepresenting Joseph & Paul > Brook's opinions!), there seemed to be general agreement that a scheme > something like that outlined below, with "permuting" loads/stores and > some way of handling multiple in-re

Re: NEON vectorization improvements - preliminary notes

2010-09-22 Thread Ira Rosen
Hi Julian, Here are some thoughts about your report. > Automatic vector size selection/mixed-size vectors > == I think we (I) need to cooperate with Richard Guenther: ask him about committing his patch to 4.6 (they are probably planning to merge v

Re: NEON vectorization improvements - preliminary notes

2010-09-15 Thread Ira Rosen
nd ARM®v7-R edition". Ira > > Cheers, > > Julian[attachment "CS308-vectorization-improvements.txt" deleted by > Ira Rosen/Haifa/IBM]