Beignet 1.3.0
========================
Beignet version 1.3 has been released. This is a major release that includes many improvements. The most important one is complete OpenCL 2.0 support. Starting from 6th generation Intel Core processors, including Skylake, Kabylake and Apollolake, OpenCL 2.0 support can be turned on or off at build time. When it is turned on, Beignet complies with the OpenCL 2.0 spec. For more OpenCL 2.0 information, please refer to the README.

Another improvement is the refinement of the runtime driver. The event module and the enqueue module have been re-implemented to make them more modular and structured. This release also supports more extensions, speeds up kernel compilation and improves performance.

The highlighted improvements are as below:
1. OpenCL 2.0 support.
2. OpenCL event and enqueue module re-implementation.
3. Other OpenCL runtime driver refinements.
4. LLVM 3.9 support.
5. Extension cl_khr_gl_sharing support.
6. Extension intel_subgroups_short support.
7. Large kernel compilation speed-up.
8. Register allocation improvement.
9. Bug fixes.

Git tag: Release_v1.3.0
Gitweb URL: http://cgit.freedesktop.org/beignet

https://01.org/sites/default/files/beignet-1.3.0-source.tar.gz

md5sum: ff4b5f66fc66649aef883e5602d0a3b1 beignet-1.3.0-source.tar.gz
sha1sum: e77f7bcca16e3f19066a7335876b7ba3ffc3ee39 beignet-1.3.0-source.tar.gz
sha256sum: 63d98b4fe8fba3dbc0299d29fef84560625e5ac51b16b8fed453021d4afb5cd5 beignet-1.3.0-source.tar.gz

-----------------------------------------------------------------
Changes since 1.2.0:

Armin K (1):
      buildsys: Use CMRT_LIBDIR instead of CMRT_LIBRARY_DIRS

Chuanbo Weng (3):
      Runtime: re-enable cl_khr_gl_sharing with existing egl extension.
      rumtime: check all the extension id, not only BASE and OPT1.
      runtime: set cl_intel_motion_estimation as IVB specifc device extension.

Giuseppe Bilotta (2):
      Fix shift-overflow warning
      toMB: use standard constant

Guo Yejun (12):
      fix the condition to check if there are built-in kernels
      use OCL_MAP_BUFFER_GTT to map climage
      avoid too many messages when the driver could not find good values for local_size
      fix w of image when simulate image1dbuffer with image2d
      add another broxton pciid 0x5A85
      enlarge stack size for chv since its EU might be masked
      enlarge scratch size for bxt 0x5a85
      add bxt with pciid 0x1A84
      correct the kernel name
      add bxt with pciid 0x1A85
      change PCI_CHIP_BROXTON_P to PCI_CHIP_BROXTON_0 to unify the naming
      fix UNTYPED_WRITE function parameters for Gen75Encoder::UNTYPED_WRITE

Guo, Yejun (21):
      fix build issue when HAS_BO_SET_SOFTPIN is false
      remove some redundant code for printf
      do not care dst for printf
      do not touch src1 when setting instruction header
      prepare gen9 sends binary format and enable the ASM dump for sends
      support sends (split send) for untyped write
      revert clCreateCommandQueue* from ocl2.0 back to 1.2 in utests
      move function setDPByteScatterGather into class GenEncoder
      add sends support for byte write
      disable CMRT as default, since no real case reported
      save host_ptr when create sub buffer from CL_MEM_ALLOC_HOST_PTR
      enable sends for skl
      refine code to change insn.extra.splitSend as encoder funtion parameter
      support sends for long write
      add sends for atomic operation, only for ocl 1.2
      refine code starting from header in typedwrite
      enable sends for typed write
      output more detail of GEN IR for workgroup op
      add sends support for oword/media block write
      enable sends to write SLM for workgroup op
      add sends support for printf

Igor Gnatenko (1):
      Fix build with latest libdrm

Jan Vesely (3):
      api: check kernel parameter before accessing it
      tests: Use python2 explicitly
      libocl: Provide specs required CL_VERSION macros

Junyan He (51):
      Runtime: Add CL base object for all cl objects.
      Runtime: Apply CL base object to program.
      Runtime: Apply base object to cl_platform_id
      Runtime: Apply base object to cl_device_id
      Runtime: Apply base object ot cl_sampler.
      Runtime: Apply base object to cl_mem.
      Runtime: Apply base object to cl_event
      Runtime: Apply base object to cl_context
      Runtime: Apply base object to cl_command_queue.
      Runtime: Apply base_object to cl_kernel
      Runtime: Apply base object to cl_accelerator_intel
      Add list operation to utils.
      Add WAIT_ON_COND and WAIT_ON_COND to base object.
      Delete all the verbose locks and use list to store CL objects.
      Add command queue's enqueue thread.
      Implement event related functions.
      Modify all event related functions using new event handle.
      Add ref check for CL object's validation.
      Fix bugs in utest for event.
      Add a multi-queue utest.
      Delete useless cl_thread files.
      Fix a bug for event error status.
      Fix a bug for double free of enqueueNativeKernel.
      Add error handle for command queue destroy.
      Delete useless event list in command queue struct.
      Add a helper function for all information get.
      Modify clGetEventInfo using cl_get_info_helper.
      Modify clGetPlatformInfo using cl_get_info_helper.
      Modify clGetKernelInfo using cl_get_info_helper.
      Modify clGetCommandQueueInfo using cl_get_info_helper.
      Modify clGetContextInfo using cl_get_info_helper.
      Modify clGetDeviceInfo using cl_get_info_helper.
      Modify clGetSamplerInfo using cl_get_info_helper.
      Modify program Info using cl_get_info_helper.
      Modify clGetMemObjectInfo using cl_get_info_helper.
      Modify clGetImageInfo using cl_get_info_helper.
      Add helper functions for device list check.
      Refine create context APIs.
      Add multi devices support in context.
      Refine clRetain/Release MemObject
      Refine clCreateSampler API.
      Refine retain/release sampler API
      refine clCreateCommandQueue and clRetainCommandQueue.
      Move Device related APIs to new file
      Move clCreateCommandQueueWithProperties API to command_queue file.
      Utest: Refine half and float convert functions.
      Refine list related functions.
      Add profiling feature based on new event implementation.
      Improve event execute function.
      Fix two bugs about event.
      Fix a event notify bug.

Luo Xionghu (12):
      add atomic operators output for GEN_IR and gen disa.
      gbe: add AtomicA64 instructions with stateless access.
      support generic atomic.
      utest: add generic atomic test.
      cl_mem_fence_flags definiton change from MACRO to enum
      gbe: atomic_long type support.
      address bits change to 64.
      Runtime: Add API clCreateCommandQueueWithProperties
      atomic_flag_test_and_set function fix.
      gbe: use kernel_arg_base_type to recognize image arguments.
      gbe: add vec_type_hint's type into functionAttributes.
      atomic bug fix.

Mark Thompson (1):
      Apply image offset to read/write/map operations

Meng Mengmeng (3):
      Runtime: return CL_INVALID_EVENT_WAIT_LIST if not event in the wait list.
      eliminate build warnings in i386 system.
      Runtime: Use cl_ulong as CL_DEVICE_MAX_MEM_ALLOC_SIZE's return type.

Pan Xiuli (70):
      Backend: Refine block_read buffer with unaligned OWord block read
      Utest: Add test for half type subgroup functions
      Backend: Fix printf bug for simd8
      Runtime: Fix null device for clGetKernelWorkGroupInfo
      Libocl: Add define for cl_intel_subgroups
      Backend: Resize the selection instruction max dst num
      Backend: Refine image block read with less vector and dst tmp
      Backend: Fix simd id will broke in simd8 mode
      Utest: Fix sub group broadcast for simd8
      Backend: Fix simd shuffle base address
      Utest: Fix sub group shuffle for simd8
      Backend: Fix bug for sub/work group functions
      Libocl: Fix get_sub_group_size bug
      Backend: Refine gen ir ALU1 inst getType
      Utest: Change the kernel index to fit case index
      Runtime: Fix accesss quilifer for internal kernels
      Libocl: Image should have access qualifier
      Utest: read/write_only qualifier should only used with image.
      Utest: Remove load spir test
      Backend: Add support for LLVM 3.9 release
      Backend: Refine GenRegiter::offset
      Backend: Refine register offset for simd shuffle
      Backend: Refine sub group broadcast code for spec
      Libocl: Add sub group broadcast short builtin function
      Utest: Add check subgroup short helper function
      Utest: Add test case for sub group broadcast short
      Backend: Change the sel ir optimization for unpack register
      Backend: Add short sub group builtin functions
      Utest: Add test case for sub group short builtin functions
      Backend: Add sub groups short shuffle builtin functions
      Utest: Add test case for short type sub group shuffle
      Backend: Add subgroup short block read/write
      Utest: Add subgroup block read/write ushort test case
      Backend: Add A64 subgroup block read/write support
      Libocl: Add intel_subgroups_short extension
      Backend: Add built-in ctz function
      Utest: add a test case for built-in ctz function
      Runtime: Add clCreateSamplerWithProperties
      Utest: Add sampler test
      Runtime: Add support of OCL2.0 device queries
      Runtime: Add extensions for OCL20
      Runtime: Add pipe related APIs
      Backend: Add Pipe Builtin support
      Backend: Add pipe packet size check
      Utest: Add pipe related test
      Runtime: Add support for sRGB
      Runtime: Refine clGetSupportedImageFormats to support CL_MEM_FLAGS
      Runtime: Add suport for sRGB to clEnqueueCopyImage
      Runtime: Add suport for sRGB to clEnqueueFillImage
      Runtime: Add support for clGetMemObjectInfo
      Backend: Refine get_enqueued_local_size and get_local_size
      Runtime: Add support for non uniform group size
      Backend: Clang now support static, fix now
      libocl: Refine return type of workitem built-in functions
      Backend: Chang scan limit for GVN pass
      Runtime: Add support for queue size and fix error handling
      Backend: Add RegisterFamily for ir
      Backend: Initialize the extra value for selection instruction
      Backend: Fix GenRegister::offset sub reg offset
      Backend: Refine flag usage in instrction selection
      Backend: Add kernel name for sel ir output
      Backend: Refine instruction ID for sel ir
      Backend: Refine selection IR output
      Backend: Refine block read/write instruction selection
      Backend: Fix some A64 block read/write bug
      CMake: Add OCL20 env for utest
      Backend: Fix sel ir subnr usage
      Backend: Fix header address of oword block read/write
      GBE: Fix memdep-block-scan-limit caused bug on LLVM3.8
      GBE: Fix getTypesize bug with LLVM3.9

Rebecca N. Palmer (10):
      Allow building tests with Python 3 (no string.atoi)
      Utest: test pow, not powr, on negative x
      Docs: Spelling and grammar fixes
      Utests: use clGetExtensionFunctionAddressForPlatform
      Utests: Don't end an all-tests run when one test fails
      Utests: respect existing C/CXXFLAGS
      Fix build failure with CMRT enabled
      Utests: Allow testing cl_intel_accelerator via ICD
      Add clGetKernelSubGroupInfoKHR to _cl_icd_dispatch table
      Fail, don't assert, if unable to create context

Ruiling Song (25):
      GBE: add untyped A64 stateless message
      GBE: add byte scatter a64 message
      GBE: Add 64bit data stateless messages
      GBE: new Load/Store Instruction Selection pattern
      OCL20/GBE: Fix 64bit pointer issue in Load store instruction selection.
      ocl20/runtime: take the first 64KB page table entries.
      ocl20/GBE: support generic load/store
      utest: add generic pointer test
      GBE: Implement new constant solution for ocl2
      GBE: Implement to_local/private/global() function
      libocl: add get_fence() builtin.
      GBE: Fix type mismatch bug.
      GBE: Fix SEL.bool issue.
      GBE: add ocl 2.0 work_group_barrier support.
      GBE: Fix bug when unspill a long type value from scratch.
      GBE: don't try to erase a llvm:Constant.
      GBE: the dst grf should use same width as source register
      GBE: retype double register to long type when do spilling.
      runtime: prog->global_data may get 64bit address
      GBE: imm64 should not be in src1 per hardware spec.
      GBE: handle ConstantExpr in program-scope variable handling.
      GBE: Refine program scope variable logic.
      GBE: Fix destination grf register type for cmp instruction.
      runtime: handle PROGRAM_BUILD_GLOBAL_VARIABLE_TOTAL_SIZE
      GBE: Fix another Sel.bool issue.

Yan Wang (4):
      Fix bug: Initialize bti of LoadInstuctionPattern::shootByteGatherMsg().
      Fix getting bitwidth of PointerType of LLVM.
      Restore jump threading pass for reducing compiling time when run the large and complex kernel like Luxmark.
      Avoid possible invalid pointer by vector interator.

Yang Rong (36):
      Docs: update readme.
      Bump version to 1.3.
      Docs: update a readme typo.
      GBE: fix uninitialized build warning.
      GBE: fix half immediate negate assert.
      GBE: Fix assert when get metadata llvm.loop.unroll.enable.
      GBE: Fix a logical insn with flag bug.
      NEWS: Update Release 1.2.1.
      OCL20/GBE: Change the pointer relative op's type.
      OCL20: Add svm support.
      OCL20: Add OpenCL2.0 apis to icd.
      OCL20: add svm enqueue apis and svm's sub buffer support.
      OCL20: add gbe_kernel_get_ocl_version for getting kernel's version in runtime.
      libocl: change prototype of vload/vstore to match ocl2.0 spec.
      add opencl builtin atomic functions implementation.
      utest: add atomic opencl-2.0 case to test api.
      OCL20: Fix svm bugs
      OCL20: Implement clSetKernelExecInfo api
      Libocl: change prototype of math built-in for OCL2.0 spec
      OCL20: fix a unpack long assert.
      Runtime: Fix vme fail.
      Refine clSetMemObjectDestructorCallback API.
      GBE: reorder the LLVM pass to reduce the compilation time.
      GEB/Runtime: eliminate release build warnings.
      utest: suspend deprecated-declarations warning.
      Add the NULL pointer check.
      GBE: correct the llvm.loop.unroll.enable meta.
      Runtime: add the head file to avoid implicit declaration of function 'cl_devices_list_include_check' warning.
      Runtime: fix a profiling fail.
      utest: fix i386 system long ctz fail.
      GBE: fix long work group fail.
      Runtime: Fix a event bug.
      GBE: if PointerFamily is FAMILY_QWORD, chv and bxt need special handle.
      GBE: fix legacy read64 mix pointer bug.
      GBE: fix a mix analyze bug.
      Add some pointer access check.

Yang, Rong R (23):
      KBL: fix some 1d array test fail.
      Runtime: avoid clang warning "warning: expression result unused".
      Add new BXT and KBL pciids to GetGenID.sh.
      GBE: fix ctz fail.
      Runtime: fix clEnqueueMigrateMemObjects fail.
      GBE: don't use call->getCalledFunction() to decide the materialize function.
      GBE: remove image type's access qual from image type name.
      Runtime: fix fill image event assert and some SVM rebase error.
      OCL20: Add read_write image type of image apis.
      OCL20: add beignet_20.pch and beignet_20.bc.
      OCL20: Add __OPENCL_VERSION__ and CL_VERSION_2_0 define.
      OCL20: enable -cl-std=CL2.0.
      OCL20: Add generic address space memcpy and memset.
      GBE: fix a src/dst register reuse bug.
      OCL20: add device enqueue helper functions in backend.
      OCL20: add device enqueue builtins.
      OCL20: add ir register enqueuebufptr for enqueue global buffer.
      OCL20: handle device enqueue helper functions in the backend.
      OCL20: Add runtime functions to get the device enqueue info.
      OCL20: add a cl_kernel pointer to gpgpu.
      OCL20: handle device enqueue in runtime.
      OCL20: add device enqueue test case.
      CMake: add an option to enable OpenCL 2.0.

Zhigang Gong (1):
      CL: update to 2.0 header files.
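
The sha256sum published for beignet-1.3.0-source.tar.gz can be checked before building. The following is a minimal Python sketch, assuming the tarball has already been downloaded to a local path; the function names sha256_of_file and matches_expected are illustrative only and are not part of Beignet.

```python
import hashlib

# SHA-256 of beignet-1.3.0-source.tar.gz as published in this announcement.
EXPECTED_SHA256 = "63d98b4fe8fba3dbc0299d29fef84560625e5ac51b16b8fed453021d4afb5cd5"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large tarballs are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex=EXPECTED_SHA256):
    """Return True if the file's digest equals `expected_hex` (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

Calling matches_expected("beignet-1.3.0-source.tar.gz") returns True only when the local file matches the published checksum. The same helper works for any other file if a different expected digest is passed as the second argument.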
