Regarding "fixed_local_sz[0] = 16", the reason is that the basic unit of VME hardware is 16*16 pixels, and our design is to handle 1*16 pixels in a work item, and use 16*1 as local size, so, each group is a basic unit of VME.
For the extension concern "Is this a duplicate of code in check_op1_extension()?", yes, it is a duplicate code, will be removed in v2. For others, Chuanbo will refine and send out the v2 patch. -----Original Message----- From: Song, Ruiling Sent: Sunday, September 06, 2015 3:02 PM To: Weng, Chuanbo; [email protected] Cc: Guo, Yejun; Weng, Chuanbo Subject: RE: [Beignet] [PATCH 2/4] add extensions intel_accelerator and basic intel_motion_estimation > + if (kernel->vme) { > + fixed_local_sz[0] = 16; > + fixed_local_sz[1] = 1; Why it is 16? Does it work for all cases? > - if (global_work_size != NULL) > + if (kernel->vme) { > + fixed_global_sz[0] = (global_work_size[0]+15) / 16 * 16; > + fixed_global_sz[1] = (global_work_size[1]+15) / 16; } else { > for (i = 0; i < work_dim; ++i) > fixed_global_sz[i] = global_work_size[i]; > + } > if (global_work_offset != NULL) > for (i = 0; i < work_dim; ++i) > fixed_global_off[i] = global_work_offset[i]; > @@ -140,10 +140,16 @@ > cl_command_queue_bind_image(cl_command_queue queue, cl_kernel k) > > image = cl_mem_image(k->args[id].mem); > set_image_info(k->curbe, &k->images[i], image); > - cl_gpgpu_bind_image(gpgpu, k->images[i].idx, image->base.bo, image- > >offset, > - image->intel_fmt, image->image_type, image->bpp, > - image->w, image->h, image->depth, > - image->row_pitch, image->slice_pitch, > (cl_gpgpu_tiling)image- > >tiling); Please check for invalid image format. > + if(k->vme) > + cl_gpgpu_bind_image_for_vme(gpgpu, k->images[i].idx, > + image->base.bo, > image->offset, > + image->intel_fmt, image->image_type, image->bpp, > + image->w, image->h, image->depth, > + image->row_pitch, image->slice_pitch, > + (cl_gpgpu_tiling)image- > >tiling); > /* Bind all samplers */ > - cl_gpgpu_bind_sampler(gpgpu, ker->samplers, ker->sampler_sz); > + if (ker->vme) > + cl_gpgpu_bind_vme_state(gpgpu, ker->accel); else > + cl_gpgpu_bind_sampler(gpgpu, ker->samplers, ker->sampler_sz); Like sampler state, I hope we can gather the vme count used in the kernel, and don't need to set the state of all VMEs. > diff --git a/src/cl_context.h b/src/cl_context.h > --- a/src/cl_extensions.c > +++ b/src/cl_extensions.c > @@ -40,7 +40,7 @@ void check_opt1_extension(cl_extensions_t *extensions) > int id; > for(id = OPT1_EXT_START_ID; id <= OPT1_EXT_END_ID; id++) > { > - if (id == EXT_ID(khr_icd)) > + if (id == EXT_ID(khr_icd) || id == EXT_ID(intel_accelerator) || > + id == > EXT_ID(intel_motion_estimation)) > extensions->extensions[id].base.ext_enabled = 1; #if > LLVM_VERSION_MAJOR == 3 && LLVM_VERSION_MINOR >= 5 > if (id == EXT_ID(khr_spir)) > @@ -63,7 +63,9 @@ check_gl_extension(cl_extensions_t *extensions) { > void check_intel_extension(cl_extensions_t *extensions) { > - /* Should put those map/unmap extensions here. */ > + int id; > + for(id = INTEL_EXT_START_ID; id <= INTEL_EXT_END_ID; id++) > + extensions->extensions[id].base.ext_enabled = 1; Is this a duplicate of code in check_op1_extension()? > + > +static void > +intel_gpgpu_bind_vme_state_gen7(intel_gpgpu_t *gpgpu, > +cl_accelerator_intel > accel) > +{ > + int index; As I said, setting all the vme states is not a good idea, please refine this in next step. > + for(index = 0; index < GEN_MAX_VME_STATES; index++) > + intel_gpgpu_insert_vme_state_gen7(gpgpu, accel, index); } > + > static void I have discussed with Chuanbo, src_grf0, src_grf1,... src_grf4 don't need to be passed in to __gen_ocl_vme(), they are not real arguments. Thanks! Ruiling > + vme_result = __gen_ocl_vme(src_image, ref_image, > + src_grf0, src_grf1, src_grf2, src_grf3, src_grf4, > + src_grf0_dw7, src_grf0_dw6, src_grf0_dw5, src_grf0_dw4, > + src_grf0_dw3, src_grf0_dw2, src_grf0_dw1, src_grf0_dw0, > + src_grf1_dw7, src_grf1_dw6, src_grf1_dw5, src_grf1_dw4, > + src_grf1_dw3, src_grf1_dw2, src_grf1_dw1, src_grf1_dw0, > + src_grf2_dw7, src_grf2_dw6, src_grf2_dw5, src_grf2_dw4, > + src_grf2_dw3, src_grf2_dw2, src_grf2_dw1, src_grf2_dw0, > + src_grf3_dw7, src_grf3_dw6, src_grf3_dw5, src_grf3_dw4, > + src_grf3_dw3, src_grf3_dw2, src_grf3_dw1, src_grf3_dw0, > + src_grf4_dw7, src_grf4_dw6, src_grf4_dw5, src_grf4_dw4, > + src_grf4_dw3, src_grf4_dw2, src_grf4_dw1, src_grf4_dw0, > + //msg_type, vme_search_path_lut, lut_sub, > + 1, 0, 0); > + > + barrier(CLK_LOCAL_MEM_FENCE); > + > + int lid_x = get_local_id(0); > + uint write_back_grf1_dw0 = __gen_ocl_region(0, vme_result.s1); > + short2 val = as_short2( write_back_grf1_dw0 ); int index = gid_1 * > + get_num_groups(0) + gid_0; if( lid_x == 0 ){ > + motion_vector_buffer[index] = val; } > + > +} > -- > 1.9.1 > > _______________________________________________ > Beignet mailing list > [email protected] > http://lists.freedesktop.org/mailman/listinfo/beignet _______________________________________________ Beignet mailing list [email protected] http://lists.freedesktop.org/mailman/listinfo/beignet
