Re: [PATCH RFC]Pair load store instructions using a generic scheduling fusion pass

Jeff Law Fri, 31 Oct 2014 12:21:00 -0700

On 10/30/14 23:36, Bin.Cheng wrote:

>> #2 would be the best solution for the case I was pondering, but I don't
>> think solving that case is terribly important given the processors for which
>> it was profitable haven't been made for a very long time.

> I am thinking if it's possible to introduce a pattern-directed fusion.
> Something like define_fusion, and adapting haifa-scheduler for it.  I
> agree there are two kinds (relevant and irrelevant) fusion types, and
> it's not trivial to support both in one scheme.  Do you have a
> specific example that I can have a try?

I kicked around using reorg to do stuff like this in the past (combination of unrelated insns). But ultimately I think the way to go is to have it happen when insns are on the ready list in the scheduler.

For fusion of related insns like the load/store pairing, I think your approach should work pretty well.

As to specific examples of independent insn fusion, the ones I'm most familiar with are from the older PA chips. I wouldn't recommend building something for those processors simply because they're so dated that I don't believe anyone uses them anymore.

However, if you have cases (arm shift insns?), building for those is fine. If you just want examples, the ones we tried to exploit on the PA were fmpyadd/fmpysub, movb,tr and addb,tr.

fmpyadd/fmpysub combined an independent floating point multiply with an FP add or sub insn. There are many conditions, but if you want a simple example to play with, the attached file with -O2 -mschedule=7100LC ought to generate one of these insns via pa_reorg.

addb,tr can combine an unconditional branch with a reg+reg or reg+imm5 addition operation. movb,tr combines an unconditional branch with a reg-reg copy or load of a 5 bit immediate value into a general register. I don't happen to have examples handy, but compiling integer code with -O2 -mschedule=7100LC ought to trigger some.

The code in pa_reorg is O(n^2) or worse. It predates the hooks that allow the target to reorder the ready queue. It would probably be relatively easy to have that code run via those hooks and just look at the ready queue. So it'd still be O(n^2), but the N would be *much* smaller. But again, I don't think anyone has used PA7xxxx processors for over a decade, so it hasn't seemed worth the effort to change.
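
To make the ready-queue idea concrete, here is a minimal C sketch. It is a toy model, not GCC code: `struct toy_insn`, `can_fuse` and `find_fusion_pair` are hypothetical names, and the fusibility rule (an FP multiply pairs with an FP add or sub when neither touches the other's destination register) is a deliberate simplification of the many real fmpyadd/fmpysub conditions. All it tries to show is that the pairwise search stays quadratic but only walks the ready list rather than the whole insn stream.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy instruction kinds; hypothetical, not GCC's RTL codes. */
enum toy_kind { TOY_FMPY, TOY_FADD, TOY_FSUB, TOY_OTHER };

/* Toy instruction with one destination and two source registers. */
struct toy_insn {
    enum toy_kind kind;
    int dest, src1, src2;
};

/* Simplified independence test (an assumption, not the real PA rules):
   the destinations differ and neither insn reads the other's destination. */
static bool independent(const struct toy_insn *a, const struct toy_insn *b)
{
    return a->dest != b->dest
        && a->dest != b->src1 && a->dest != b->src2
        && b->dest != a->src1 && b->dest != a->src2;
}

/* A multiply and an add/sub, in either order, may fuse if independent. */
static bool can_fuse(const struct toy_insn *a, const struct toy_insn *b)
{
    bool a_addsub = a->kind == TOY_FADD || a->kind == TOY_FSUB;
    bool b_addsub = b->kind == TOY_FADD || b->kind == TOY_FSUB;
    bool kinds_ok = (a->kind == TOY_FMPY && b_addsub)
                 || (b->kind == TOY_FMPY && a_addsub);
    return kinds_ok && independent(a, b);
}

/* Scan only the ready list for the first fusible pair.  This is still
   O(n^2), but n is the ready-list length, not the function length.
   Returns true and stores the two indices when a pair is found. */
static bool find_fusion_pair(const struct toy_insn *ready, size_t n_ready,
                             size_t *first, size_t *second)
{
    for (size_t i = 0; i < n_ready; i++)
        for (size_t j = i + 1; j < n_ready; j++)
            if (can_fuse(&ready[i], &ready[j])) {
                *first = i;
                *second = j;
                return true;
            }
    return false;
}
```

In a real port the candidates would come from whatever the scheduler hands to the target's ready-queue reordering hook, and the legality checks would be the existing ones in pa_reorg; none of that is modelled here.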


Cheers,
Jeff

*> \brief \b CLARSCL2 performs reciprocal diagonal scaling on a vector.
*
*  =========== DOCUMENTATION ===========
*
* Online html documentation available at 
*            http://www.netlib.org/lapack/explore-html/ 
*
*> \htmlonly
*> Download CLARSCL2 + dependencies 
*> <a href="http://www.netlib.org/cgi-bin/netlibfiles.tgz?format=tgz&filename=/lapack/lapack_routine/clarscl2.f"> 
*> [TGZ]</a> 
*> <a href="http://www.netlib.org/cgi-bin/netlibfiles.zip?format=zip&filename=/lapack/lapack_routine/clarscl2.f"> 
*> [ZIP]</a> 
*> <a href="http://www.netlib.org/cgi-bin/netlibfiles.txt?format=txt&filename=/lapack/lapack_routine/clarscl2.f"> 
*> [TXT]</a>
*> [TXT]</a>
*> \endhtmlonly 
*
*  Definition:
*  ===========
*
*       SUBROUTINE CLARSCL2 ( M, N, D, X, LDX )
* 
*       .. Scalar Arguments ..
*       INTEGER            M, N, LDX
*       ..
*       .. Array Arguments ..
*       COMPLEX            X( LDX, * )
*       REAL               D( * )
*       ..
*  
*
*> \par Purpose:
*  =============
*>
*> \verbatim
*>
*> CLARSCL2 performs a reciprocal diagonal scaling on a vector:
*>   x <-- inv(D) * x
*> where the REAL diagonal matrix D is stored as a vector.
*>
*> Eventually to be replaced by BLAS_cge_diag_scale in the new BLAS
*> standard.
*> \endverbatim
*
*  Arguments:
*  ==========
*
*> \param[in] M
*> \verbatim
*>          M is INTEGER
*>     The number of rows of D and X. M >= 0.
*> \endverbatim
*>
*> \param[in] N
*> \verbatim
*>          N is INTEGER
*>     The number of columns of D and X. N >= 0.
*> \endverbatim
*>
*> \param[in] D
*> \verbatim
*>          D is REAL array, length M
*>     Diagonal matrix D, stored as a vector of length M.
*> \endverbatim
*>
*> \param[in,out] X
*> \verbatim
*>          X is COMPLEX array, dimension (LDX,N)
*>     On entry, the vector X to be scaled by D.
*>     On exit, the scaled vector.
*> \endverbatim
*>
*> \param[in] LDX
*> \verbatim
*>          LDX is INTEGER
*>     The leading dimension of the vector X. LDX >= 0.
*> \endverbatim
*
*  Authors:
*  ========
*
*> \author Univ. of Tennessee 
*> \author Univ. of California Berkeley 
*> \author Univ. of Colorado Denver 
*> \author NAG Ltd. 
*
*> \date September 2012
*
*> \ingroup complexOTHERcomputational
*
*  =====================================================================
      SUBROUTINE CLARSCL2 ( M, N, D, X, LDX )
*
*  -- LAPACK computational routine (version 3.4.2) --
*  -- LAPACK is a software package provided by Univ. of Tennessee,    --
*  -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
*     September 2012
*
*     .. Scalar Arguments ..
      INTEGER            M, N, LDX
*     ..
*     .. Array Arguments ..
      COMPLEX            X( LDX, * )
      REAL               D( * )
*     ..
*
*  =====================================================================
*
*     .. Local Scalars ..
      INTEGER            I, J
*     ..
*     .. Executable Statements ..
*
      DO J = 1, N
         DO I = 1, M
            X( I, J ) = X( I, J ) / D( I )
         END DO
      END DO

      RETURN
      END
