> -----Original Message----- > From: Gwenole Beauchesne [mailto:[email protected]] > Sent: Thursday, June 05, 2014 9:29 PM > To: Xiang, Haihao > Cc: Zhao, Yakui; [email protected] > Subject: Re: [Libva] [PATCH v3 intel-driver 3/9] decoder: h264: simplify and > optimize reference frame store updates. > > Hi, > > 2014-06-05 14:44 GMT+02:00 Xiang, Haihao <[email protected]>: > > > > > >> -----Original Message----- > >> From: Libva [mailto:[email protected]] On Behalf Of > >> Gwenole Beauchesne > >> Sent: Thursday, June 05, 2014 5:20 PM > >> To: Zhao, Yakui > >> Cc: [email protected] > >> Subject: Re: [Libva] [PATCH v3 intel-driver 3/9] decoder: h264: > >> simplify and optimize reference frame store updates. > >> > >> 2014-06-04 7:30 GMT+02:00 Zhao, Yakui <[email protected]>: > >> > On Tue, 2014-06-03 at 22:59 -0600, Gwenole Beauchesne wrote: > >> >> 2014-06-04 4:11 GMT+02:00 Zhao, Yakui <[email protected]>: > >> >> > On Tue, 2014-06-03 at 10:43 -0600, Gwenole Beauchesne wrote: > >> >> >> Simplify and optimize the update process of the reference frame > store. > >> >> >> Use less iterations to look up existing objects. Use a cache to > >> >> >> store the free'd slots. > >> >> >> > >> >> >> Prerequisite: the reference_objects[] array was previously > >> >> >> arranged in a way that the element at index i is exactly the > >> >> >> object_surface that corresponds to the VA surface identified by > >> >> >> the VAPictureH264.picture_id located at index i in the > >> >> >> ReferenceFrames[] > >> array. > >> >> > > >> >> > As a whole, this patch looks good to me. Based on the hardware > >> >> > requirement, it will be better that one reference picture later > >> >> > used is not assigned for the other purpose even when it is not > >> >> > used in current reference frame list. In such case this patch > >> >> > does some optimization about DPB, which can defer the usage of > >> >> > one picture later used that is not in current reference > >> >> > frame.especially > when: > >> >> > one picture is not used during decoding frame X while it is used > >> >> > during > >> decoding frame X + 1. > >> >> > >> >> The case you mention was supposed to be fixed with the other patch > >> >> series where the re-factorisation is a pre-requisite. :) A very > >> >> strict GenFrameStoreContext needs to be maintained with the set of > >> >> used_frames[] as it is today, and an additional set of > >> >> free_frames[] *candidates*. Though, this case is not covered by > >> >> the conformance suite, and I haven't found any real world case > >> >> exhibiting that, neither did I try to generate one yet. So we can defer > >> >> this > one. > >> >> > >> >> > Although it still has no improvment for the following scenario, > >> >> > it still is an improvement. > >> >> > >One picture is not used during decoding frame X and X + 1 > >> >> > while it is used during decoding frame X + 2. > >> >> > > >> >> > Of course the perfect solution is that the upper layer can > >> >> > assure that one picture later used is still added into the DPB > >> >> > of current frame although it is not used in current reference list. > >> >> > >> >> No, this is not going to happen. This would be an implementation > >> >> error of the standard. > >> > > >> > In theory it is no problem for normal H264. But it is true for MVC > >> > case as some reference frames for view 1 is not used when decoding view > 0. > >> > >> When I said this was not going to happen, this concerned the "perfect > >> solution" you proposed, which is not one. That would be a workaround > >> to driver bugs, and an error of implementation of the standard. Of > >> course, the DPB holds all the necessary view components, but the > >> VAPictureH264.ReferenceFrames[] is not exactly the DPB, and it cannot be. > > > > The number of elements in VAPictureH264.ReferenceFrames[] is 16, so it > > is enough to hold All reference surfaces for H.264. and the patch works as > before, I am OK for your patch. > > > >> > >> The case you mention is what I said a paragraph above, and the only > "perfect" > >> way to solve it is to correctly track the references in the driver > >> side. This was what the GenFrameStoreContext was for. However, as I > >> mentioned, there is no supporting case in the conformance test suite, > >> and I haven't come by a real-world sample exhibiting this. > > > > The case Yakui mentioned is not for H.264 as a surface should be in > > DPB if it is referenced by later Surface. > > > > Xiaowei told us each view uses a separated DPB buffer for MVC > > There is no such thing in MVC. There is a unique DPB with a size scalable by a > log2 number of views. > > > A case might occurs as below: > > > > View0: f0, f1, f2, f3, ... > > View1: s0, s1, s2, s3, ... > > > > f0 is referenced by f1, but f0 isn't referenced by other surfaces in view0, > > f1 > is reference by f2. > > F0 is also referenced by s1 , both f0 and f1 are referenced by S2 in > > view1 > > > > When decoding f1, f0 is added into View0's DPB, so it is added into > > frame store too, the frame store index for f0 is *0*. > > > > When decoding s1, f0 is added into view1's DPB, it has been in the frame > store, the frame store index for f0 is still *0*. > > > > When decoding f2, f0 is removed from view0's DPB and f1 is added into > > view0's DPB, so f0 is removed from frame store and f1 is added into > > frame store too, now the frame store index For f1 is 0, > > > > When decoding s2, f0 is still in View1's DPB so it will be added into > > the frame store again, and now the frame store index for f0 is *1* > > > > The frame store index for f0 is changed from 0 to 1 and the decoding result > will be wrong. > > That's exactly the thing we are discussing since the beginning. :)
The above case doesn't exist if MVC uses a unique DBP buffer as f0 is still in the DBP after Decoding f2, hence the driver won't remove f0 from the frame store. > > And there is no way to handle that from the unique GenFrameStore array as it > is today, even if you delay the marking as "free", or whatever. > You need a more elaborated data struct. And this is what was to be provided > with the GenFrameStoreContext and additional re-factoring so that I don't > have to copy code around again. > > I am working on an alternative that wouldn't need too many changes at this > time. > > Regards, > Gwenole. > > > > >> > >> >> >> Theory of operations: > >> >> >> > >> >> >> 1. Obsolete entries are removed first, i.e. entries in the internal > >> >> >> DPB > >> >> >> that no longer have a match in the supplied ReferenceFrames[] > array. > >> >> >> That obsolete entry index is stored in a local cache: > >> >> >> free_slots[]. > >> >> >> > >> >> >> 2. This cache is completed with entries considered as "invalid" or > >> >> >> "not > >> >> >> present", sequentially while traversing the frame store for > obsolete > >> >> >> entries. At the end of this removal process, the free_slots[] > >> >> >> array > >> >> >> represents all possible indices in there that could be re-used for > >> >> >> new reference frames to track. > >> >> >> > >> >> >> 3. The list of ReferenceFrames[] objects is traversed for new entries > >> >> >> that are not already in the frame store. If an entry needs to be > >> >> >> added, it is placed at the index obtained from the next > >> >> >> free_slots[] > >> >> >> element. There is no need to traverse the frame store array again, > >> >> >> the next available slot can be known from that free_slots[] cache. > >> >> >> > >> >> >> Signed-off-by: Gwenole Beauchesne > >> >> >> <[email protected]> > >> >> >> --- > >> >> >> src/i965_decoder_utils.c | 107 > >> >> >> ++++++++++++++++++++-------------------------- > >> >> >> 1 file changed, 46 insertions(+), 61 deletions(-) > >> >> >> > >> >> >> diff --git a/src/i965_decoder_utils.c > >> >> >> b/src/i965_decoder_utils.c index 2570be9..c686235 100644 > >> >> >> --- a/src/i965_decoder_utils.c > >> >> >> +++ b/src/i965_decoder_utils.c > >> >> >> @@ -418,88 +418,73 @@ gen6_send_avc_ref_idx_state( } > >> >> >> > >> >> >> void > >> >> >> -intel_update_avc_frame_store_index(VADriverContextP ctx, > >> >> >> - struct decode_state > >> *decode_state, > >> >> >> - > >> VAPictureParameterBufferH264 *pic_param, > >> >> >> - GenFrameStore > >> frame_store[MAX_GEN_REFERENCE_FRAMES]) > >> >> >> +intel_update_avc_frame_store_index( > >> >> >> + VADriverContextP ctx, > >> >> >> + struct decode_state *decode_state, > >> >> >> + VAPictureParameterBufferH264 *pic_param, > >> >> >> + GenFrameStore > >> frame_store[MAX_GEN_REFERENCE_FRAMES] > >> >> >> +) > >> >> >> { > >> >> >> - int i, j; > >> >> >> - > >> >> >> - assert(MAX_GEN_REFERENCE_FRAMES == > >> ARRAY_ELEMS(pic_param->ReferenceFrames)); > >> >> >> - > >> >> >> - for (i = 0; i < MAX_GEN_REFERENCE_FRAMES; i++) { > >> >> >> - int found = 0; > >> >> >> - > >> >> >> - if (frame_store[i].surface_id == VA_INVALID_ID || > >> >> >> - frame_store[i].obj_surface == NULL) > >> >> >> + int free_slots[MAX_GEN_REFERENCE_FRAMES]; > >> >> >> + int i, j, n, num_free_slots; > >> >> >> + bool found; > >> >> >> + > >> >> > > >> >> > It will be better to fill the free_slots through the following > >> >> > two loops instead of one loop, which is to try to defer the > >> >> > usage of the reference picture later used that is not used in > >> >> > current reference frame > >> list. > >> >> > a. check the invalid surface_id or NULL surface and then add it > >> >> > into free_slots. > >> >> > b. check whether one picture is used by the current reference > >> >> > frame list. If not, add it into the free_slots. > >> >> > >> >> The previous algorithm was filling in the holes. This one does the > >> >> same. I don't see where your proposal could bring an improvement? > >> > > >> > In your patch you use the free_slots to find one free slot to hold > >> > a new reference picture in DPB. Right? It will firstly try to find > >> > one free_slot with invalid picture_id or surface, which is used to > >> > hold a new reference picture in DPB. > >> > >> No, it will take whatever comes first between: > >> - a slot with an invalid picture_id or surface ; > >> - or a slot with a retired reference picture candidate, i.e. > >> something that cannot be found in the ReferenceFrames[] array. > >> And this works exactly the same way as before. No functional change yet. > >> > >> > It can remove the possibility that one picture later used is > >> > immediately used to hold a new reference picture in DPB. So based > >> > on the above point IMO it is better to find the free_slots through > >> > two steps to hold the new reference picture in DPB. > >> > >> This would only hide further this situation, but OK. > >> > >> Regards, > >> Gwenole. > >> > >> >> >> + /* Remove obsolete entries from the internal DPB */ > >> >> >> + for (i = 0, n = 0; i < MAX_GEN_REFERENCE_FRAMES; i++) { > >> >> >> + GenFrameStore * const fs = &frame_store[i]; > >> >> >> + if (fs->surface_id == VA_INVALID_ID || !fs->obj_surface) { > >> >> >> + free_slots[n++] = i; > >> >> >> continue; > >> >> >> + } > >> >> >> > >> >> >> - assert(frame_store[i].frame_store_id != -1); > >> >> >> - > >> >> >> - for (j = 0; j < MAX_GEN_REFERENCE_FRAMES; j++) { > >> >> >> - VAPictureH264 *ref_pic = > >> &pic_param->ReferenceFrames[j]; > >> >> >> - if (ref_pic->flags & VA_PICTURE_H264_INVALID) > >> >> >> + // Find whether the current entry is still a valid > >> >> >> + reference > >> frame > >> >> >> + found = false; > >> >> >> + for (j = 0; j < > >> >> >> + ARRAY_ELEMS(decode_state->reference_objects); > >> j++) { > >> >> >> + struct object_surface * const obj_surface = > >> >> >> + decode_state->reference_objects[j]; > >> >> >> + if (!obj_surface) > >> >> >> continue; > >> >> >> - > >> >> >> - if (frame_store[i].surface_id == ref_pic->picture_id) { > >> >> >> - found = 1; > >> >> >> + if ((found = obj_surface == fs->obj_surface)) > >> >> >> break; > >> >> >> - } > >> >> >> } > >> >> >> > >> >> >> - /* remove it from the internal DPB */ > >> >> >> + // ... or remove it > >> >> >> if (!found) { > >> >> >> - frame_store[i].surface_id = VA_INVALID_ID; > >> >> >> - frame_store[i].frame_store_id = -1; > >> >> >> - frame_store[i].obj_surface = NULL; > >> >> >> + fs->surface_id = VA_INVALID_ID; > >> >> >> + fs->obj_surface = NULL; > >> >> >> + fs->frame_store_id = -1; > >> >> >> + free_slots[n++] = i; > >> >> >> } > >> >> >> } > >> >> >> + num_free_slots = n; > >> >> >> > >> >> >> - for (i = 0; i < MAX_GEN_REFERENCE_FRAMES; i++) { > >> >> >> - VAPictureH264 *ref_pic = &pic_param->ReferenceFrames[i]; > >> >> >> - int found = 0; > >> >> >> - > >> >> >> - if (ref_pic->flags & VA_PICTURE_H264_INVALID || > >> >> >> - ref_pic->picture_id == VA_INVALID_SURFACE || > >> >> >> - decode_state->reference_objects[i] == NULL) > >> >> >> + /* Append the new reference frames */ > >> >> >> + for (i = 0, n = 0; i < > >> >> >> + ARRAY_ELEMS(decode_state->reference_objects); > >> i++) { > >> >> >> + struct object_surface * const obj_surface = > >> >> >> + decode_state->reference_objects[i]; > >> >> >> + if (!obj_surface) > >> >> >> continue; > >> >> >> > >> >> >> + // Find whether the current frame is not already in > >> >> >> + our frame > >> store > >> >> >> + found = false; > >> >> >> for (j = 0; j < MAX_GEN_REFERENCE_FRAMES; j++) { > >> >> >> - if (frame_store[j].surface_id == ref_pic->picture_id) { > >> >> >> - found = 1; > >> >> >> + if ((found = frame_store[j].obj_surface == > >> >> >> + obj_surface)) > >> >> >> break; > >> >> >> - } > >> >> >> } > >> >> >> > >> >> >> - /* add the new reference frame into the internal DPB */ > >> >> >> + // ... or add it > >> >> >> if (!found) { > >> >> >> - int frame_idx; > >> >> >> - int slot_found; > >> >> >> - struct object_surface *obj_surface = > >> decode_state->reference_objects[i]; > >> >> >> - > >> >> >> - slot_found = 0; > >> >> >> - frame_idx = -1; > >> >> >> - /* Find a free frame store index */ > >> >> >> - for (j = 0; j < MAX_GEN_REFERENCE_FRAMES; j++) { > >> >> >> - if (frame_store[j].surface_id == VA_INVALID_ID || > >> >> >> - frame_store[j].obj_surface == NULL) { > >> >> >> - frame_idx = j; > >> >> >> - slot_found = 1; > >> >> >> - break; > >> >> >> - } > >> >> >> + if (n < num_free_slots) { > >> >> >> + GenFrameStore * const fs = > >> &frame_store[free_slots[n++]]; > >> >> >> + fs->surface_id = obj_surface->base.id; > >> >> >> + fs->obj_surface = obj_surface; > >> >> >> + fs->frame_store_id = fs - frame_store; > >> >> >> + } > >> >> >> + else { > >> >> >> + WARN_ONCE("No free slot found for DPB > >> >> >> + reference list!!!\n"); > >> >> >> } > >> >> >> - > >> >> >> - > >> >> >> - if (slot_found) { > >> >> >> - frame_store[j].surface_id = ref_pic->picture_id; > >> >> >> - frame_store[j].frame_store_id = frame_idx; > >> >> >> - frame_store[j].obj_surface = obj_surface; > >> >> >> - } else { > >> >> >> - WARN_ONCE("Not free slot for DPB reference > >> list!!!\n"); > >> >> >> - } > >> >> >> } > >> >> >> } > >> >> >> - > >> >> >> } > >> >> >> > >> >> >> void > >> >> > > >> >> > > >> > > >> > > >> _______________________________________________ > >> Libva mailing list > >> [email protected] > >> http://lists.freedesktop.org/mailman/listinfo/libva _______________________________________________ Libva mailing list [email protected] http://lists.freedesktop.org/mailman/listinfo/libva
