Giuseppe Bilotta <[email protected]> writes:

>> On Thu, May 28, 2015 at 1:04 PM, Grigori Goronzy <[email protected]> wrote:
>>> @@ -286,6 +287,13 @@ ilo_get_compute_param(struct pipe_screen *screen,
>>>        ptr = &val.images_supported;
>>>        size = sizeof(val.images_supported);
>>>        break;
>>> +   case PIPE_COMPUTE_CAP_SUBGROUP_SIZE:
>>> +      /* best case is SIMD32 */
>>> +      val.subgroup_size = 32;
>>> +
>>> +      ptr = &val.subgroup_size;
>>> +      size = sizeof(val.subgroup_size);
>>> +      break;
>>>     default:
>>>        ptr = NULL;
>>>        size = 0;
>>
>> Everything else seems fine to me, but IIRC Intel's IGPs have a SIMD
>> width of 16, not 32. (Or if it depends on generation, we should
>> probably have a lookup function like for r600).
>
> Ok, scratch that. I was confused by the fact that Beignet reports a
> preferred work-group size multiple of 16. Intel IGPs support _logical_
> SIMD width of up to 32, but the _hardware_ SIMD width is just 4. So
> the question is if here we should report the _hardware_ width, or the
> maximum _logical_ width.
>

The physical SIMD width of every Intel GPU that ILO supports (as far as I'm aware) is 8. However, the hardware can execute 16-wide and, in some cases, 32-wide instructions by internally splitting them into instructions of the native SIMD width. There is an actual performance benefit to this, mainly because it saves some overhead and hides part of the execution latency when several interdependent instructions are encountered in sequence. For example, with SIMD16 you typically have the guarantee that there are no data dependencies between any pair of native-width instructions arriving at the pipeline one after the other, so you may avoid stalls.
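To make the splitting argument concrete, here is a toy C model of a logical SIMD16 add being issued as two native SIMD8 passes. It is only a sketch of the idea, not how the EU actually executes anything; NATIVE_SIMD_WIDTH and execute_logical_add() are hypothetical names, and the width of 8 is the native width assumed above.

```c
#include <assert.h>

/* Assumed native (physical) SIMD width of the Intel GPUs ILO targets. */
#define NATIVE_SIMD_WIDTH 8

/* Toy model: execute one logical SIMD instruction of width logical_width
 * (8, 16 or 32) as a sequence of native-width passes.  Returns the number
 * of native-width passes issued. */
static int execute_logical_add(float *dst, const float *a, const float *b,
                               int logical_width)
{
   assert(logical_width % NATIVE_SIMD_WIDTH == 0);
   int passes = logical_width / NATIVE_SIMD_WIDTH;

   for (int pass = 0; pass < passes; pass++) {
      /* Each pass touches a disjoint slice of lanes, so no pass depends on
       * the result of the previous one: consecutive passes can be issued
       * back-to-back without waiting on each other. */
      int base = pass * NATIVE_SIMD_WIDTH;
      for (int lane = 0; lane < NATIVE_SIMD_WIDTH; lane++)
         dst[base + lane] = a[base + lane] + b[base + lane];
   }
   return passes;
}
```

With a SIMD16 instruction the model issues two independent halves; a dependent instruction that follows only has to wait for the first half, by which time the second half is already in flight.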
As this cap is just a performance hint, I think it makes sense to assume the best-case scenario as Grigori has done. If the driver later decides that using the maximum SIMD width doesn't pay off, it can always use less, but using more may be difficult if the application didn't keep it in mind while choosing the work-group layout.

That said, it doesn't look like ILO supports SIMD32 at this point, and the first Intel GPU with any hardware support for it was IVB (Gen7). I suggest you just return 16 unconditionally for now, but keep the comment saying that the best case is SIMD32 (on Gen7+).

Thanks.

> For OpenCL, the _logical_ aspect is the only relevant one, but I think
> this should be handled on the OpenCL side of things (since it also
> depends on things such as the vectorization of each specific kernel
> and, for future OpenCL 2.0 support, even on the individual launch
> grid). Here, I think the _hardware_ property should be reported
> instead.
>
> --
> Giuseppe "Oblomov" Bilotta
> _______________________________________________
> mesa-dev mailing list
> [email protected]
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
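For reference, the suggestion to return 16 unconditionally while keeping the SIMD32 comment would look roughly like the standalone sketch below. It mirrors the structure of the patched switch case, but sketch_compute_cap and sketch_get_compute_param() are hypothetical stand-ins for the real Gallium enum and ilo_get_compute_param(), and the 32-bit type of subgroup_size is an assumption the patch doesn't show.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the Gallium compute caps; only the
 * SUBGROUP_SIZE case mirrors the patch under review. */
enum sketch_compute_cap {
   SKETCH_CAP_SUBGROUP_SIZE,
   SKETCH_CAP_UNSUPPORTED,
};

/* Returns the size of the value in bytes and copies it to ret when ret is
 * non-NULL, following the same ptr/size pattern as the patched function. */
static int sketch_get_compute_param(enum sketch_compute_cap param, void *ret)
{
   union {
      uint32_t subgroup_size; /* assumed 32-bit */
   } val;
   const void *ptr;
   int size;

   switch (param) {
   case SKETCH_CAP_SUBGROUP_SIZE:
      /* best case is SIMD32 (on Gen7+), but SIMD32 isn't supported yet,
       * so report SIMD16 unconditionally for now */
      val.subgroup_size = 16;

      ptr = &val.subgroup_size;
      size = sizeof(val.subgroup_size);
      break;
   default:
      ptr = NULL;
      size = 0;
      break;
   }

   assert(size <= (int)sizeof(val));
   if (ret && ptr)
      memcpy(ret, ptr, size);

   return size;
}
```

Once SIMD32 code generation exists, the constant could be replaced by a per-generation choice (32 on Gen7+, 16 before), which is essentially the lookup function suggested earlier in the thread.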
