On Sun, May 15, 2005 at 08:16:04PM -0700, Mike Stump wrote:

> On Sunday, May 15, 2005, at 04:11  PM, Luke Kenneth Casson Leighton 
> wrote:
> > *click* - so you .... you... ooooooo :)
> >
> > holy cow.
> >
> > you looked at valarray,
> 
> No, not really, I'm not a library guy.  I know almost nothing of the 
> space, the applications or the tricks people play, but...

 ack.

> >and went "how could this be automatically speeded up by gcc, if gcc 
> >had access to a hardware vector processing unit"?
> >
> > i'm... genuinely impressed.
> 
> I'm sorry, wasn't meant to be impressive.  

 :)

 how to put it best - i'm impressed by the audacity and ambitious
 goals, then :)


> What would have been 
> impressive, is if I read up on ASP and coded up some complex algorithm 
> using all the latest tips and tricks of templates, and had you try it 
> and you discovered that indeed it was trivial enough to write, 
> exactly matched what, as an author, you would have expected, best 
> case...  I think that is possible, but alas, I'd just leave it as an 
> exercise for the reader.

 well, given as i mentioned in my previous message that bit-level
 programming just _isn't_ something that sane people do, i would be
 _very_ impressed to find any support for all of the features
 of an ASP (in particular, that hardware "carry" instruction).


> > can you _imagine_ the number of different tags you'd need to say
> > "i want this register to be 1-bit wide, spread across 16 processors
> > each, i want _this_ register array to be 4-bits wide, spread across
> > 32 processors.."
> 
> bitregister<1,16> i;
> 
> bitregister<4, 32> j;
> 
> I can imagine...  Seems trivial to me...

 it's the data interleave and also the fact that, due to the size of the
 arrays, you would need to do _arrays_ of arrays or some other such
 trick...  and yet _still_ have it parallelised:

 bitregister<4, 32> j[16];

 or, in valarray-like terminology:

 valarray<int> j[10];

 for (i = 0; i < 10; i++)
        j[i].set_size(16 * i); /* emulate the ability to cut an ASP into
                                  arbitrary length strings at 16-APE
                                  boundaries */

 and then _still_ be able to have _this_ parallelised:

 for (i = 0; i < 10; i++)
        j[i]++;

 and there be only one instruction :)
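
 to make that concrete, here is a minimal sketch using plain
 std::valarray.  it is an emulation only, not ASP code: the machine is
 one flat valarray, the "strings" are just (start, length) views cut at
 16-element boundaries, and the increment is a single whole-array
 operation.  the names string_at, make_machine, increment_all and
 read_string are made up for illustration, and the string lengths of
 16 * (k + 1) are an assumption chosen so that no string is empty.

```cpp
#include <cstddef>
#include <valarray>

// Hypothetical emulation: every element stands in for one APE.
constexpr std::size_t kBoundary = 16;

struct StringView {
    std::size_t start;   // index of the first APE in the string
    std::size_t length;  // number of APEs, always a multiple of 16
};

// Assumed layout: string k is 16 * (k + 1) APEs long and starts where
// string k - 1 ends, so start = 16 * (1 + 2 + ... + k).
inline StringView string_at(std::size_t k) {
    return { kBoundary * k * (k + 1) / 2, kBoundary * (k + 1) };
}

// One flat, zero-initialised array big enough for `strings` strings.
inline std::valarray<int> make_machine(std::size_t strings) {
    return std::valarray<int>(kBoundary * strings * (strings + 1) / 2);
}

// The "one instruction": a single operation over every APE at once,
// regardless of how the array has been cut into strings.
inline void increment_all(std::valarray<int>& machine) {
    machine += 1;
}

// Copy out the contents of string k.
inline std::valarray<int> read_string(const std::valarray<int>& machine,
                                      std::size_t k) {
    const StringView v = string_at(k);
    return machine[std::slice(v.start, v.length, 1)];
}
```

 with ten strings the machine holds 880 elements, and increment_all()
 touches all of them without any per-string loop in user code.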


 ... btw, for something like a parallel processing unit containing 64
 cells, where you declare an array j[8][8], or if you have only 16
 cells and you have an array j[4][4] ... is _that_ taken care of in the
 design of OpenMP?


> > ... it just goes _nuts_.
> 
> I don't see the use of the above nuts.  Coding up the library to 
> support it, would be, well, fun....  but for you (someone that knows 
> ASP) and someone that knows how to make C++ do tricks (expression 
> templates and template metaprograms at least) for them, it should be 
> trivial enough.
 
 *cackle*

 i _did_ start to create a template-based library which emulated
 the behaviour of the ASP (working from the original code i'd
 written in python).

 python's "map" and "reduce" functions were extremely useful
 in this respect, and certain ASP operations can be done with
 "reduce" with an appropriate lambda function.

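 the same map / reduce shape can be sketched in c++ with
 std::transform and std::accumulate.  this is not the library i
 started, just an illustration: the "cells" vector stands in for the
 processing elements, and map_add / count_matches are hypothetical
 names (counting matching cells is an assumed example of a
 reduce-style operation, not a documented ASP instruction).

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// "map": apply the same operation to every emulated processing element.
inline std::vector<int> map_add(const std::vector<int>& cells, int k) {
    std::vector<int> out(cells.size());
    std::transform(cells.begin(), cells.end(), out.begin(),
                   [k](int v) { return v + k; });
    return out;
}

// "reduce": fold all elements into one value with a lambda.
// Here the fold counts how many cells hold the value `key`.
inline int count_matches(const std::vector<int>& cells, int key) {
    return std::accumulate(cells.begin(), cells.end(), 0,
                           [key](int acc, int v) {
                               return acc + (v == key ? 1 : 0);
                           });
}
```
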
 

> >well, the approach taken by aspex _makes_ it portable, already
> >[because it's a macro pre-processing step, turning inline-asp
> >instructions into c-code].
> 
> Vendor lock in by a vendor that can go out of business isn't what we 
> call portable.  Portable means that someone versed in it, can use it, 
> and that code can run on sse3, mmx, altivec, ASP, normal hardware or a 
> Cell processor, BlueGene, virtually unmodified.  

 *sigh*...

> For example, OpenMP 
> would seem to be portable (not being an expert in that field, I'd let 
> people correct me).  BLAS, boost and Blitz++  are yet other ways...  
> http://ggt.sourceforge.net/html/main.html is a new one I've not heard 
> of...  but google has.

 
> Do you know what Blitz++ is and does?  And how?
 
  looks like blitz++ and ggtl are both a bit high-level
  (and as such, end up being perfect targets for the OpenMP
   optimisation process).

> >[just not the ASP, because of their proprietary assembler-based 
> >toolchain]
> 
> No, even ASP, one just needs to understand the output of their 
> compiler, and then code it up, though, admittedly, one might not get 
> the speed, if the interface (valarray) is wrong.  

 it's the interleaving of data and coding that makes it so difficult
 to program.

 remember, you're talking _gigabytes_ per second, here.  IIRC
 it's a 64-bit bus, running at 250MHz on the VASP-F architecture,
 and tera bit-ops / sec (which obviously come rapidly down as you
 use that to do 8-bit adds, 8-bit MACs etc.)
 
 god only knows what they're doing with the VASP-G architecture which,
 last time i heard, was going to have SIXTEEN times the number of
 processing elements.


 _but_ if you _could_ do bitregister<1, 32> x and then change that to
 bitregister<2, 16>, bitregister<4, 8> and TEST CODE, then i don't
 believe that to be so much of a problem.

 you could even _automate_ the process of code-generation to test
 all combinations, using macros, and tell you which one was fastest!
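
 a rough sketch of what that could look like, using templates rather
 than macros.  everything here is hypothetical: this bitregister is a
 software stand-in with Count cells of Width bits each (only the name
 comes from this thread), and time_increments is a made-up helper that
 times a trivial operation with std::chrono.  on real hardware the
 timed body would be the generated code, not a c++ loop.

```cpp
#include <array>
#include <chrono>
#include <cstddef>
#include <cstdint>

// Hypothetical software model: Count cells, each Width bits wide.
template <std::size_t Width, std::size_t Count>
struct bitregister {
    static_assert(Width >= 1 && Width <= 32, "cell width must be 1..32 bits");

    std::array<std::uint32_t, Count> cells{};  // zero-initialised

    // Mask that keeps only the low Width bits of a cell.
    static constexpr std::uint32_t mask() {
        return Width == 32 ? 0xFFFFFFFFu : ((1u << Width) - 1u);
    }

    // Add one to every cell, wrapping at the cell width.
    void increment() {
        for (auto& c : cells) c = (c + 1u) & mask();
    }
};

// Time `iterations` increments of a given register layout, in seconds.
template <typename Reg>
double time_increments(std::size_t iterations) {
    Reg r;
    const auto t0 = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) r.increment();
    const auto t1 = std::chrono::steady_clock::now();
    volatile std::uint32_t sink = r.cells[0];  // keep the result observable
    (void)sink;
    return std::chrono::duration<double>(t1 - t0).count();
}
```

 comparing time_increments<bitregister<1, 32>>(n),
 time_increments<bitregister<2, 16>>(n) and
 time_increments<bitregister<4, 8>>(n) for the same n is then a
 one-line change per layout.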

> The deferred 
> evaluation math libraries would be closer to what might be required, 
> don't know if it is enough, but it might be; even if it weren't, a few 
> more concepts and certainly it would be.

 ah.  interesting.  perhaps...

> > instead of doing
> > for (i = 0; i < this->get_size(); i++)
> >     this->data[i] += op1->data[i]
> >
> > you'd do
> >
> > for (i = 0; i < this->get_size(); i+= vector_unit->get_size())
> > {
> >     asm { .... }
> > }
> 
> No, expression templates don't require rewriting of code like this.  
> That's the entire point.

 yes, i do appreciate that.
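
 for anyone following along, a minimal expression-template sketch of
 the idea.  it is not how Blitz++ or valarray are actually
 implemented; the namespace et and the types Vec and Sum are made up
 for illustration.  operator+ only builds a small node, and the single
 loop lives in the assignment operator, which is the one place a
 library author could swap in vector-unit code without users
 rewriting "a = b + c + d".

```cpp
#include <cstddef>
#include <vector>

namespace et {  // hypothetical namespace, keeps operator+ from leaking

// Node representing "lhs + rhs". It stores references and computes
// nothing until an element is requested.
template <typename L, typename R>
struct Sum {
    const L& lhs;
    const R& rhs;
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

struct Vec {
    std::vector<double> data;

    explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}

    double operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }

    // Assignment from any expression: the one loop where work happens.
    // The expression is assumed to be at least as long as this vector.
    template <typename E>
    Vec& operator=(const E& expr) {
        for (std::size_t i = 0; i < data.size(); ++i) data[i] = expr[i];
        return *this;
    }
};

// Found by argument-dependent lookup for et types only.
template <typename L, typename R>
Sum<L, R> operator+(const L& lhs, const R& rhs) {
    return Sum<L, R>{lhs, rhs};
}

}  // namespace et
```

 with this, "a = b + c + d" on et::Vec objects runs one pass over the
 data and creates no temporary vectors.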

 hmmm :)

> > i imagine this to be a _whole_ lot less grief than putting support
> > in gcc for vectors / autodetection / tagging.
> 
> I actually mean to include C++ library work, as a first solution, as 
> doing up a library is usually preferable to compiler work.
> 
> > ... don't get me wrong - i'd be _delighted_ to see vector
> > autodetection and tagging in gcc!
> 
> Presto, download a copy today.  :-)
 
  :)

-- 
--
http://lkcl.net
--
