Hi,
On 05.04.2017 11:27, Chris Wilson wrote:
On Tue, Apr 04, 2017 at 07:21:38PM -0700, Jason Ekstrand wrote:
Before, we were just looking at whether or not the user wanted us to
wait and waiting on the BO. This instead makes us busy-loop on each
query until it's available. This reduces some of the pipeline bubbles
we were getting and improves performance of The Talos Principle on
medium settings (where the GPU isn't overloaded) by around 20% on my
SkyLake gt4.
I get similar results. Depending on which part of the benchmark one is
running, and whether one is using the (Ubuntu default) powersave [1] or
the performance governor with Mesa master, the improvement is 10-25%.
Even with everything set to Ultra, there's a small improvement.
Tested-by: Eero Tamminen <[email protected]>
- Eero
[1] With the powersave governor, without this patch the kernel doesn't
even request the GPU to run at full speed. This was with an Ubuntu 4.4
kernel + the 4.7 i915 backport, but using the latest drm-tip didn't seem
to change things significantly.
Hmm. The kernel also spins, but it limits itself to only spinning on the
active request and for a max of 2us within your process's timeslice.
The ioctl overhead is ~100ns in this case, cheaper than a call to
clock_gettime()! Looks like the advantage here is that you do not limit
yourself. A much simpler loop doing the same would be:
while (true) {
   if (query_is_available())
      return VK_SUCCESS;

   if (!gem_busy())
      return query_is_available() ? VK_SUCCESS : VK_NOT_READY;
}
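
For reference, gem_busy() here could be a thin wrapper over the i915
busy-ioctl. A sketch, untested; the fd and GEM handle are assumed to
come from the caller's anv_device and the query pool's anv_bo:

#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

static bool
gem_busy(int fd, uint32_t gem_handle)
{
   struct drm_i915_gem_busy busy;

   memset(&busy, 0, sizeof(busy));
   busy.handle = gem_handle;

   /* On ioctl failure, conservatively report "busy" so the caller
    * keeps polling instead of returning a premature VK_NOT_READY. */
   if (ioctl(fd, DRM_IOCTL_I915_GEM_BUSY, &busy) != 0)
      return true;

   return busy.busy != 0;
}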
---
src/intel/vulkan/genX_query.c | 72 +++++++++++++++++++++++++++++++++++++++----
1 file changed, 66 insertions(+), 6 deletions(-)
diff --git a/src/intel/vulkan/genX_query.c b/src/intel/vulkan/genX_query.c
index 7ea9404..ebf99d2 100644
--- a/src/intel/vulkan/genX_query.c
+++ b/src/intel/vulkan/genX_query.c
@@ -131,6 +131,64 @@ cpu_write_query_result(void *dst_slot, VkQueryResultFlags flags,
   }
}
+#define NSEC_PER_SEC 1000000000
+
+static bool
+query_is_available(struct anv_device *device, uint64_t *slot)
+{
+   if (!device->info.has_llc)
+      __builtin_ia32_clflush(slot);
Make the target cacheable? Your query write will then do the cacheline
invalidation, but there's obviously a tradeoff depending on the frequency
of snooping.
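
(For reference, a sketch of what "make the target cacheable" could look
like at the ioctl level, done once at pool creation; untested, and the
fd/handle plumbing is assumed:)

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

/* Ask the kernel to keep this BO's pages CPU-cacheable (snooped) on
 * non-LLC platforms, so the polling loop no longer needs clflush;
 * the GPU's availability write then invalidates the cacheline. */
static int
gem_set_caching_cached(int fd, uint32_t gem_handle)
{
   struct drm_i915_gem_caching arg;

   memset(&arg, 0, sizeof(arg));
   arg.handle = gem_handle;
   arg.caching = I915_CACHING_CACHED;

   return ioctl(fd, DRM_IOCTL_I915_GEM_SET_CACHING, &arg);
}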
+
+   return slot[0];
+}
+
+static VkResult
+wait_for_available(struct anv_device *device,
+                   struct anv_query_pool *pool, uint64_t *slot)
+{
+   while (true) {
+      struct timespec start;
+      clock_gettime(CLOCK_MONOTONIC, &start);
+
+      while (true) {
+         if (!device->info.has_llc)
+            __builtin_ia32_clflush(slot);
+
+         if (query_is_available(device, slot))
+            return VK_SUCCESS;
+
+         struct timespec current;
+         clock_gettime(CLOCK_MONOTONIC, &current);
+
+         if (current.tv_nsec < start.tv_nsec) {
+            current.tv_nsec += NSEC_PER_SEC;
+            current.tv_sec -= 1;
+         }
+
+         /* If we've been looping for more than 1 ms, break out of the busy
+          * loop and ask the kernel if the buffer is actually busy.
+          */
+         if (current.tv_sec > start.tv_sec ||
+             current.tv_nsec - start.tv_nsec > 1000000)
+            break;
+      }
+
+      VkResult result = anv_device_wait(device, &pool->bo, 0);
Using the busy-ioctl is even cheaper than wait(0).
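
(At the ioctl level the difference is roughly the following; a sketch
only, assuming anv_device_wait(..., 0) bottoms out in the gem
wait-ioctl with a zero timeout:)

#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

/* wait(0): arms a wait with a zero timeout; -ETIME means still busy. */
static bool
gem_wait0_is_busy(int fd, uint32_t gem_handle)
{
   struct drm_i915_gem_wait wait;

   memset(&wait, 0, sizeof(wait));
   wait.bo_handle = gem_handle;
   wait.timeout_ns = 0;

   return ioctl(fd, DRM_IOCTL_I915_GEM_WAIT, &wait) != 0 &&
          errno == ETIME;
}

The busy-ioctl (see the gem_busy() sketch above) answers the same
question from a simple flag check, without setting up a wait at all.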
+      switch (result) {
+      case VK_SUCCESS:
+         /* The BO is no longer busy.  If we haven't seen availability yet,
+          * then we never will.
+          */
+         return query_is_available(device, slot) ? VK_SUCCESS : VK_NOT_READY;
+      case VK_TIMEOUT:
+         /* The BO is still busy, keep waiting. */
+         continue;
+      default:
+         return result;
+      }
+   }
+}
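
For context, a sketch of the intended call site in
genX(GetQueryPoolResults); the slot computation is hypothetical, the
real patch iterates over the requested query range:

   uint64_t *slot = pool->bo.map + query_offset;   /* hypothetical offset */

   if (flags & VK_QUERY_RESULT_WAIT_BIT) {
      VkResult result = wait_for_available(device, pool, slot);
      if (result != VK_SUCCESS)
         return result;
   }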