Signed-off-by: MaximilianKaindl <[email protected]>
---
doc/filters.texi | 64 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 64 insertions(+)
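
Note for reviewers (sits between the diffstat and the diff, so it is not part
of the applied change): the per-label averaging and the CSV layout documented
in the new section can be modelled with the minimal Python sketch below. It is
not the filter's implementation. The function names, the sample observations,
and the four-decimal formatting are assumptions made for illustration; only the
column names are taken from the Output Format subsection. The sketch also
assumes that one observation is recorded each time a label appears, which is
how "Appeared N times" reads, but the patch text does not say whether the
count is per frame or per bounding box.

```python
import csv
import io
from collections import defaultdict


def average_classifications(observations):
    """Average (stream_id, label, probability) observations per stream and label.

    Returns (stream_id, label, avg_probability, count) tuples sorted by
    stream and then by label.
    """
    totals = defaultdict(lambda: [0.0, 0])  # (stream_id, label) -> [sum, count]
    for stream_id, label, probability in observations:
        entry = totals[(stream_id, label)]
        entry[0] += probability
        entry[1] += 1
    return [
        (stream_id, label, total / count, count)
        for (stream_id, label), (total, count) in sorted(totals.items())
    ]


def to_csv(rows):
    """Render rows with the column names shown in the documented CSV example."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["stream_id", "label", "avg_probability", "count"])
    for stream_id, label, avg, count in rows:
        # Four decimals mirror the documented example; the real precision is assumed.
        writer.writerow([stream_id, label, f"{avg:.4f}", count])
    return buf.getvalue()


if __name__ == "__main__":
    # Made-up observations, one per appearance of a label.
    sample = [(0, "cat", 0.9), (0, "cat", 0.8), (0, "dog", 0.3), (1, "music", 1.0)]
    print(to_csv(average_classifications(sample)), end="")
```

A label that never appears in a stream produces no row in this model, so a low
count next to a high average means "rarely seen, but confident when seen".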
diff --git a/doc/filters.texi b/doc/filters.texi
index a7046e0f4e..340ce39e2a 100644
--- a/doc/filters.texi
+++ b/doc/filters.texi
@@ -30776,6 +30776,70 @@ bench=start,selectivecolor=reds=-.2 .12 -.49,bench=stop
@end example
@end itemize
+@section avgclass
+
+Average classification probabilities across multiple frames for both audio and video streams.
+
+This filter reads classification results from frame side data (detection bounding boxes) and computes the average confidence score for each label over the entire stream. It accepts classification metadata from the @code{dnn_classify} filter or from any other source that generates @code{AVDetectionBBox} side data.
+
+At the end of the stream, or when triggered manually by a command, the filter writes the average probability of each detected class to the log and, optionally, to a CSV file.
+
+@table @option
+@item output_file
+Path to the CSV file to which the average classification results are written. If not specified, the results are only printed to the log.
+
+@item v
+Specify the number of video streams (default: 1).
+
+@item a
+Specify the number of audio streams (default: 0).
+@end table
+
+This filter supports the following commands:
+
+@table @option
+@item writeinfo
+Immediately write the current average classification results to the log and to the output file (if specified), without waiting for the stream to end.
+
+@item flush
+Force the filter to write results and flush all its internal state.
+@end table
+
+@subsection Examples
+
+Process a video with object detection and classification, then calculate average classification probabilities:
+@example
+ffmpeg -i input.mp4 -vf "dnn_detect=model=detection.xml:input=data:output=detection_out:confidence=0.5,dnn_classify=model=classification.pt:dnn_backend=torch:tokenizer=tokenizer.json:labels=labels.txt,avgclass=output_file=results.csv" -f null -
+@end example
+
+Process both audio and video classification:
+@example
+ffmpeg -i input.mkv -filter_complex "[0:v]dnn_classify[v0]; [0:a]aformat=sample_fmts=fltp,dnn_classify=dnn_backend=torch:model=clap_model.pt:is_audio=1:tokenizer=tokenizer.json:labels=audio_labels.txt[a0]; [v0][a0]avgclass=v=1:a=1:output_file=av_results.csv" -f null -
+@end example
+
+@subsection Output Format
+
+When the filter completes processing (or when the @code{writeinfo} command is sent), it outputs classification results in this format:
+
+@example
+Classification averages:
+Stream #0:
+ Label: cat: Average probability 0.8765, Appeared 120 times
+ Label: dog: Average probability 0.3421, Appeared 42 times
+Stream #1:
+ Label: music: Average probability 0.9823, Appeared 315 times
+ Label: speech: Average probability 0.1245, Appeared 15 times
+@end example
+
+If an output file is specified, the same data is written in CSV format:
+@example
+stream_id,label,avg_probability,count
+0,cat,0.8765,120
+0,dog,0.3421,42
+1,music,0.9823,315
+1,speech,0.1245,15
+@end example
+
@section concat
Concatenate audio and video streams, joining them together one after the
--
2.34.1