Tamar Christina <tamar.christ...@arm.com> writes:
>> > so I'd need 5 parameters and then I'm guessing the other expressions
>> would be removed by DCE at some point?
>>
>> Are you planning to make the FCMLA behaviour directly available as an
>> internal function or provide a higher-level one that does a full complex
>> multiply, with the target lowering that into individual instructions where
>> necessary?
>
> I was planning on doing it as one internal function and leave it up to
> the target to expand it however it needs to.
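
To make the FCMLA question concrete: a single FCMLA only performs part of a complex multiply-accumulate, and a full `acc += a * b` on interleaved (re, im) data takes two of them, with rotations #0 and #90. The C sketch below models that lowering in plain scalar code. The names `fcmla_rot0`, `fcmla_rot90` and `complex_mla` are hypothetical illustrations, not GCC internal functions or ACLE intrinsics, and the per-rotation semantics are paraphrased from the Arm architecture description rather than taken from this thread.

```c
#include <stddef.h>

/* Hypothetical scalar model of FCMLA #0 on n interleaved (re, im) pairs:
   accumulates the products involving the real part of a.  */
static void fcmla_rot0(float *acc, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        acc[2 * i]     += a[2 * i] * b[2 * i];      /* re += a.re * b.re */
        acc[2 * i + 1] += a[2 * i] * b[2 * i + 1];  /* im += a.re * b.im */
    }
}

/* Hypothetical scalar model of FCMLA #90: accumulates the products
   involving the imaginary part of a.  */
static void fcmla_rot90(float *acc, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        acc[2 * i]     -= a[2 * i + 1] * b[2 * i + 1];  /* re -= a.im * b.im */
        acc[2 * i + 1] += a[2 * i + 1] * b[2 * i];      /* im += a.im * b.re */
    }
}

/* Hypothetical higher-level operation, acc += a * b (complex), which a
   target could lower into the two rotations above.  */
static void complex_mla(float *acc, const float *a, const float *b, size_t n)
{
    fcmla_rot0(acc, a, b, n);
    fcmla_rot90(acc, a, b, n);
}
```

In this picture, a single higher-level internal function corresponds to `complex_mla`, with the split into rotations left to the target, while exposing FCMLA directly would correspond to the two helpers.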

OK, sounds good.

>> What to do with the intermediate results you don't need is an interesting
>> question :-). Like you say, I was hoping DCE would get rid of them later.
>> Does that not work?
>
> I haven't tried it yet 😊 But I assume it'll work too. I have complex
> add almost working, it generates the right code for the vectorized
> loop. The loads are also corrected and the permute is gone and I
> update all the data references for the two statements I replaced.

Not sure what you mean by the last bit. Why do you need to replace
data references rather than just use the existing ones?

> However for the scalar tail loop I have a problem since I only have
> vector versions of the instructions, and the scalar loop is created
> from the same SLP tree. So I end up with the builtins in the tail
> loop with nothing to expand them to and with no way to differentiate
> between the two calls to the internal fn.
>
> I would need to somehow undo this for the scalar part..

The epilogue loop should just be a copy of the basic block before
vectorisation is applied. The new calls shouldn't be in that, just in
the SLP tree. (This is how pattern statements work too: they're never
added to the basic block, they're just temporary statements attached
to internal vectoriser structures.)

Thanks,
Richard
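
A rough sketch of the loop shape described in the reply above, assuming a complex add with a 90-degree rotation (`c = a + i*b`, modelled on FCADD #90) over interleaved float arrays. The vector body calls the replacement operation, while the scalar epilogue is just the original loop body, so no internal-function call ever reaches it. `VF`, `vec_cadd_rot90` and `cadd_rot90` are hypothetical names; in GCC the replacement would only exist as a statement attached to the SLP tree, and the epilogue comes from copying the original scalar loop rather than being written out by hand as it is here.

```c
#include <stddef.h>

#define VF ((size_t) 4)  /* hypothetical vectorization factor, in complex elements */

/* Hypothetical stand-in for a vector complex add with rotate #90 on VF
   interleaved elements.  In the vectoriser this would only exist in the
   SLP tree, never in the scalar basic block.  */
static void vec_cadd_rot90(float *c, const float *a, const float *b)
{
    for (size_t k = 0; k < VF; k++) {
        c[2 * k]     = a[2 * k]     - b[2 * k + 1];  /* re = a.re - b.im */
        c[2 * k + 1] = a[2 * k + 1] + b[2 * k];      /* im = a.im + b.re */
    }
}

/* Hypothetical loop shape after vectorisation: the main body uses the
   replacement operation, the epilogue is the original scalar code.  */
static void cadd_rot90(float *c, const float *a, const float *b, size_t n)
{
    size_t i = 0;

    /* Vector body: whole groups of VF complex elements.  */
    for (; i + VF <= n; i += VF)
        vec_cadd_rot90(c + 2 * i, a + 2 * i, b + 2 * i);

    /* Scalar epilogue: an unchanged copy of the original loop body.  */
    for (; i < n; i++) {
        c[2 * i]     = a[2 * i]     - b[2 * i + 1];
        c[2 * i + 1] = a[2 * i + 1] + b[2 * i];
    }
}
```

With n = 5 and VF = 4, four elements go through the vector body and the last one through the scalar epilogue, which never sees the replacement call.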