On 12.06.2017 at 10:38 PM, "Ganapathy Raman Kasi" <[email protected]> wrote:
> Hi,
>
> Currently, in the case of a 1 -> N transcode (1 SW decode -> N NVENC
> encodes) without the HW upload filter, we end up allocating multiple CUDA
> contexts for the N transcode sessions on the same underlying GPU device.
> This comes with the CUDA context initialization overhead (~100 ms per
> context creation on a 4th-gen i5 with a GTX 1080 on Ubuntu 16.04). Also,
> in the case of M * (1 -> N) fully HW-accelerated transcodes, we face the
> same issue: the CUDA context is not shared between the M transcode
> sessions. Sharing the context would greatly reduce the initialization
> time, which matters for short clip transcodes.
>
> I currently have a global array in avutil/hwcontext_cuda.c which keeps
> track of the CUDA contexts created and reuses an existing context when a
> request for hwdevice ctx create occurs. This is shared in the attached
> patch. Please check the approach and let me know if there is a
> better/cleaner way to do this.
>
> Thanks

Global state in the libraries is something we absolutely try to stay away from, so this approach is not quite appropriate. If you want to somehow share this, it should be in the ffmpeg command line tool somewhere; however, we also try to reduce hardware-specific magic in favor of abstractions.

- Hendrik
_______________________________________________
ffmpeg-devel mailing list
[email protected]
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
