The GitHub Actions job "Required Checks" on texera.git/main has succeeded.
Run started by GitHub user github-merge-queue[bot] (triggered by 
github-merge-queue[bot]).

Head commit for run:
439ea72e46b78aec7f71e8889225f2c90942a2c2 / Anish Shivamurthy 
<[email protected]>
feat(huggingface): add audio and media generation tasks (#5570)

## What changes were proposed in this PR?

Adds the audio and media-generation task families — 5 HF pipeline tasks
— as new `TaskCodegen`s plugged into the dispatcher established by the
text-generation PR:

audio tasks: `automatic-speech-recognition`, `audio-classification`,
`text-to-speech`

media-generation tasks: `text-to-image`, `text-to-video`

`codegen/AudioTaskCodegen.scala` supplies the per-task payload + parse
Python branches for the 3 audio tasks.

`codegen/MediaGenCodegen.scala` supplies the per-task payload + parse
Python branches for the 2 media-generation tasks.

`CodegenContext` is extended with `audioInput` + `inputAudioColumn`
(`EncodableString`).

`HuggingFaceInferenceOpDesc.scala` gains 2 new `@JsonProperty` fields
and registers `AudioTaskCodegen` + `MediaGenCodegen` in the dispatcher.

`PythonCodegenBase.scala` grows to host the shared audio/media
infrastructure:

- Audio task-family tuple (`audio_only_tasks`) in `process_table`.
- Per-row audio-byte resolution from upload or column input.
- Raw binary request handling for `automatic-speech-recognition` and
`audio-classification`.
- JSON payload handling for `text-to-speech`.
- Provider-specific routing for media generation and audio generation
through `_call_provider`, including OpenAI-compatible image/audio
endpoints where supported.
- Response parsing for audio/media outputs, including data-URL
conversion for generated media URLs.
- Media helper support for converting remote URLs into `data:image/...`,
`data:audio/...`, or `data:video/...` URLs where needed.
- Hardened audio input loading to match the image-input path: uploaded
audio is accepted as a data URL, remote audio is fetched through the
existing HTTPS-only `_fetch_remote_url` helper, and arbitrary
worker-local file paths are no longer read.
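
The audio input rules and the raw-binary versus JSON split listed above can be illustrated with a minimal sketch. It is not the code emitted by `PythonCodegenBase.scala`: the names `load_audio_bytes`, `build_request`, and `fetch_remote` are hypothetical, the fetcher is injected so the example runs offline, and the `{"inputs": ...}` JSON shape is an assumption about the provider payload.

```python
import base64
from typing import Callable, Optional

# Mirrors the idea of the `audio_only_tasks` tuple; the exact contents are assumed.
AUDIO_ONLY_TASKS = ("automatic-speech-recognition", "audio-classification")


def load_audio_bytes(audio_input: str, fetch_remote: Callable[[str], bytes]) -> bytes:
    """Resolve audio bytes from a data URL or an HTTPS URL, and nothing else."""
    if audio_input.startswith("data:"):
        header, sep, payload = audio_input.partition(",")
        if not sep or not header.endswith(";base64"):
            raise ValueError("expected a base64 data URL")
        return base64.b64decode(payload)
    if audio_input.startswith("https://"):
        # Stand-in for the HTTPS-only `_fetch_remote_url` helper.
        return fetch_remote(audio_input)
    # Plain paths, file:// and http:// inputs are rejected instead of being read.
    raise ValueError("audio input must be a data URL or an HTTPS URL")


def build_request(task: str, audio_bytes: Optional[bytes] = None,
                  text: Optional[str] = None) -> dict:
    """Pick the request body style for a task (hypothetical return shape)."""
    if task in AUDIO_ONLY_TASKS:
        if audio_bytes is None:
            raise ValueError("audio bytes are required for " + task)
        return {"data": audio_bytes}  # raw binary body
    if task == "text-to-speech":
        if text is None:
            raise ValueError("text is required for text-to-speech")
        return {"json": {"inputs": text}}  # JSON body
    raise ValueError("unsupported task: " + task)
```

Rejecting anything that is neither a data URL nor an HTTPS URL is what keeps arbitrary worker-local paths out of reach in this sketch.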

User-input strings continue to flow through `pyb"..."` +
`EncodableString` so they reach Python as
`self.decode_python_template('<base64>')` rather than raw literals.
`PythonCodeRawInvalidTextSpec` still passes: all 117/117 descriptors
py_compile cleanly.
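
As a rough illustration of why user strings are passed as base64 rather than raw literals, the sketch below round-trips a hostile value through an encoded template call. It assumes plain UTF-8 base64; `encode_user_string` and `render_assignment` are hypothetical names, and the standalone `decode_python_template` here only stands in for the real method, whose implementation may differ.

```python
import base64


def encode_user_string(value: str) -> str:
    """Encode a user string so it contains only base64 alphabet characters."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def decode_python_template(encoded: str) -> str:
    """Stand-in for the decoder called by the generated Python code."""
    return base64.b64decode(encoded).decode("utf-8")


def render_assignment(name: str, value: str) -> str:
    """Emit one line of generated Python that never embeds the raw value."""
    return f"{name} = decode_python_template('{encode_user_string(value)}')"


# A value that would break out of a naive single-quoted literal.
hostile = "x'); import os; os.system('echo pwned') #\n"
line = render_assignment("audio_column", hostile)

# The generated line is syntactically valid and yields the original text.
compile(line, "<generated>", "exec")
namespace = {"decode_python_template": decode_python_template}
exec(line, namespace)
```

Because the base64 alphabet has no quotes, backslashes, or newlines, the encoded value cannot terminate the surrounding string literal.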

## Any related issues, documentation, or discussions?

Tracking issue: Add audio and media-generation task families to
HuggingFace operator apache#5288

Closes apache#5288

Stacked on: Add image task family (`ImageTaskCodegen`) to HuggingFace
operator / `hf/03-image-tasks`

Parent issue: Add Hugging Face inference operator apache#5041

Closed sibling issue: Add HuggingFaceModelResource REST endpoints for HF
operator UI apache#5134

## How was this PR tested?

`sbt "WorkflowOperator/compile; WorkflowOperator/Test/compile"` clean.

`sbt scalafmtCheck` clean.

`sbt "WorkflowOperator/testOnly
org.apache.texera.amber.operator.huggingFace.HuggingFaceInferenceOpDescSpec
org.apache.texera.amber.util.PythonCodeRawInvalidTextSpec"` — 26 focused
tests pass, including HuggingFace audio/media task coverage and the raw
Python descriptor scan.

`sbt "WorkflowOperator/testOnly
org.apache.texera.amber.util.PythonCodeRawInvalidTextSpec"` — 117/117
descriptors py_compile cleanly with the new operator code paths, no
marker leaks.

- Added regression coverage that audio remote input routes through
`_fetch_remote_url(audio_input)` and no longer uses raw
`requests.get(audio_input)` or local file reads.
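
A check of this kind could look roughly like the following. This is only a sketch in Python, not the Scala spec from the PR: `uses_hardened_audio_fetch` is a hypothetical helper, and treating `open(audio_input` as the marker of a local file read is an assumption.

```python
def uses_hardened_audio_fetch(generated_source: str) -> bool:
    """Return True when generated code fetches remote audio only via the helper."""
    routes_through_helper = "_fetch_remote_url(audio_input)" in generated_source
    raw_get = "requests.get(audio_input)" in generated_source
    local_read = "open(audio_input" in generated_source  # assumed marker
    return routes_through_helper and not raw_get and not local_read


# Hypothetical snippets standing in for generated operator code.
hardened = "audio_bytes = self._fetch_remote_url(audio_input)"
legacy = "audio_bytes = requests.get(audio_input).content"
```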

## Was this PR authored or co-authored using generative AI tooling?

Yes, co-authored with generative AI tooling (Codex).

Report URL: https://github.com/apache/texera/actions/runs/27988100482

With regards,
GitHub Actions via GitBox
