On Sat, Mar 21, 2020 at 09:44:43PM -0700, Dixit, Ashutosh wrote:
On Sat, 21 Mar 2020 16:26:42 -0700, Dixit, Ashutosh wrote:

On Thu, 19 Mar 2020 15:52:01 -0700, Umesh Nerlige Ramappa wrote:
>
> From: Lionel Landwerlin <[email protected]>

> @@ -477,16 +468,6 @@ static bool oa_buffer_check_unlocked(struct 
i915_perf_stream *stream)
>     */
>    spin_lock_irqsave(&stream->oa_buffer.ptr_lock, flags);
>
>    hw_tail = stream->perf->ops.oa_hw_tail_read(stream);
>
>    hw_tail &= ~(report_size - 1);
>
> @@ -496,64 +477,64 @@ static bool oa_buffer_check_unlocked(struct 
i915_perf_stream *stream)
>
>    now = ktime_get_mono_fast_ns();
>
> +  if (hw_tail == stream->oa_buffer.aging_tail &&
> +     (now - stream->oa_buffer.aging_timestamp) > OA_TAIL_MARGIN_NSEC) {
> +          /* If the HW tail hasn't move since the last check and the HW
> +           * tail has been aging for long enough, declare it the new
> +           * tail.
> +           */
> +          stream->oa_buffer.tail = stream->oa_buffer.aging_tail;
> +  } else {
> +          u32 head, tail;
>
> +          /* NB: The head we observe here might effectively be a little
> +           * out of date. If a read() is in progress, the head could be
> +           * anywhere between this head and stream->oa_buffer.tail.
> +           */
> +          head = stream->oa_buffer.head - gtt_offset;
>
> +          hw_tail -= gtt_offset;
> +          tail = hw_tail;
>
> +          /* Walk the stream backward until we find a report with dword 0
> +           * & 1 not at 0. Since the circular buffer pointers progress by
> +           * increments of 64 bytes and that reports can be up to 256
> +           * bytes long, we can't tell whether a report has fully landed
> +           * in memory before the first 2 dwords of the following report
> +           * have effectively landed.
> +           *
> +           * This is assuming that the writes of the OA unit land in
> +           * memory in the order they were written to.
> +           * If not : (╯°□°)╯︵ ┻━┻
>             */
> +          while (OA_TAKEN(tail, head) >= report_size) {
> +                  u32 previous_tail = (tail - report_size) & (OA_BUFFER_SIZE 
- 1);
> +                  u32 *report32 = (void *)(stream->oa_buffer.vaddr + 
previous_tail);

Sorry, this is wrong. This should just be:

                        tail = (tail - report_size) & (OA_BUFFER_SIZE - 1);
                        report32 = (void *)(stream->oa_buffer.vaddr + tail);

Otherwise when we break out of the loop below tail is still set one
report_size ahead. previous_tail is not needed. (In the previous version of
the patch this used to work out correctly).

> +
> +                  /* Head of the report indicated by the HW tail register has
> +                   * indeed landed into memory.
> +                   */
> +                  if (report32[0] != 0 || report32[1] != 0)
> +                          break;
> +
> +                  tail = previous_tail;
>            }

Actually a couple of further improvements to the loop above are
possible. First there is no reason to start at previous_tail, we can just
start at the aligned hw_tail itself. Therefore the loop becomes:

                while (OA_TAKEN(tail, head) >= report_size) {
                        u32 *report32 = (void *)(stream->oa_buffer.vaddr + 
tail);

                        if (report32[0] != 0 || report32[1] != 0)
                                break;

                        tail = (tail - report_size) & (OA_BUFFER_SIZE - 1);
                }

Further, there is no reason to go back to the head but only to the old
tail. Therefore:

                head = stream->oa_buffer.head - gtt_offset;
                old_tail = stream->oa_buffer.tail - gtt_offset;

                hw_tail -= gtt_offset;
                tail = hw_tail;

                while (OA_TAKEN(tail, old_tail) >= report_size) {
                        u32 *report32 = (void *)(stream->oa_buffer.vaddr + 
tail);

                        if (report32[0] != 0 || report32[1] != 0)
                                break;

                        tail = (tail - report_size) & (OA_BUFFER_SIZE - 1);
                }

Please review and see if these two improvements are possible. Thanks!

I think that this is possible. We could check reports between old_tail and tail to optimize the number of reports we read. I will give this a try.

Thanks,
Umesh
_______________________________________________
Intel-gfx mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Reply via email to