> -  riscv_vector::emit_vlmax_insn (code_for_pred_broadcast (<MODE>mode),
> -                                 riscv_vector::UNARY_OP, operands);
> +  /* We cannot do anything with a Float16 mode apart from converting.
> +     So convert to float, broadcast and truncate. */
> +  if (TARGET_ZVFHMIN && !TARGET_ZVFH && <VEL>mode == HFmode)
> +    {
> +      rtx tmpsf = gen_reg_rtx (SFmode);
> +      emit_insn (gen_extendhfsf2 (tmpsf, operands[1]));
> +      poly_uint64 nunits = GET_MODE_NUNITS (<MODE>mode);
> +      machine_mode vmodesf
> +        = riscv_vector::get_vector_mode (SFmode, nunits).require ();
> +      rtx tmp = gen_reg_rtx (vmodesf);
> +      rtx ops[] = {tmp, tmpsf};
> +      riscv_vector::emit_vlmax_insn (code_for_pred_broadcast (vmodesf),
> +                                     riscv_vector::UNARY_OP, ops);
> +      rtx ops2[] = {operands[0], tmp};
> +      riscv_vector::emit_vlmax_insn (code_for_pred_trunc (vmodesf),
> +                                     riscv_vector::UNARY_OP_FRM_DYN, ops2);

I disagree with this part, especially the comment. A vlse for an HF vector
is just a 16-bit load, and a load does not really care about the data
format, only the size. Also, in theory we can put HF in a GPR rather than
an FPR for those splat/broadcast patterns.

> +    }
> +  else
> +    riscv_vector::emit_vlmax_insn (code_for_pred_broadcast (<MODE>mode),
> +                                   riscv_vector::UNARY_OP, operands);

On Tue, Jun 24, 2025 at 8:47 AM Jeff Law <jeffreya...@gmail.com> wrote:
>
> This is primarily work from Robin and Shreya. My contribution is just
> mentoring for Shreya and writing the ChangeLog. Shreya is busy on a
> code generation issue and I expect both new entries in the tuning
> structure as well as new instances of the tuning structure in the works
> (spacemit x60) coming relatively soon.
>
> Late breaking news is that we are going to need to add some additional
> alignment checks to this code. That's a preexisting issue and after
> some discussions with Robin and a bit of pondering on my side I've
> decided to go forward with this change now. Robin is already looking
> at alignment issues WRT strided, indexed and presumably segmented memory
> references and will cover the issue as part of that work.
>
> --
>
> So the basic idea here is to give uarchs the ability to enable/disable
> using the zero strided load idiom to broadcast a single memory element
> across a vector. While long term I would expect most if not all designs
> to support this efficiently, I could easily see some vector designs not
> having implementations of this optimization in the short term.
>
> It didn't seem worth the effort to have a --param here. If folks think
> it's really needed, it certainly could be added. Though I suspect it's
> primarily a test, set and forget process for each design with vector
> support.
>
> This has been in my tester a few days. Waiting for pre-commit CI to do
> its thing.
>
> jeff
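
To illustrate the point about the broadcast being format-agnostic, here is
a minimal sketch in plain C. It is not GCC code and not a proposed patch:
the helper name broadcast_u16_bits is hypothetical, and the loop only
models what a zero-stride 16-bit load does, namely copying the same 16 raw
bits into every lane without interpreting them as a half-precision value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a zero-stride 16-bit load: read one 16-bit
   element from SRC and replicate its raw bits into N lanes of DST.
   No floating-point conversion is involved at any point.  */
static void
broadcast_u16_bits (uint16_t *dst, const void *src, size_t n)
{
  assert (dst != NULL && src != NULL);

  uint16_t bits;
  /* memcpy avoids aliasing problems when SRC really holds an _Float16.  */
  memcpy (&bits, src, sizeof bits);

  for (size_t i = 0; i < n; i++)
    dst[i] = bits;
}
```

Passing the address of a half-precision 1.0 (bit pattern 0x3c00) fills
every lane with 0x3c00, which is why the element only needs to be treated
as a 16-bit quantity here.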