PG1204 opened a new issue, #5319:
URL: https://github.com/apache/texera/issues/5319
### Task Summary
The HuggingFace inference operator (#5041) covers ~20 HF pipeline tasks
across text, image, audio, and media-generation families. PR #5278 established
the dispatcher + per-task-codegen architecture and shipped the first task
family (text-generation). This issue covers wiring in the **image task family**
— 9 HF pipeline tasks — as the next codegen plugged into the dispatcher.
The image family splits into two sub-groups by request shape:
- **Image-only** (raw binary upload, no prompt column):
`image-classification`, `object-detection`, `image-segmentation`,
`image-to-text`.
- **Image + prompt** (base64 image bundled with a text prompt in a JSON
payload): `visual-question-answering`, `document-question-answering`,
`zero-shot-image-classification`, `image-text-to-text`, `image-to-image`.
Landing this would let users run any of these 9 HF tasks against models
served by HF Hub or by the third-party providers that the HF Inference Router
routes to (Replicate, Fal-ai, Wavespeed, zai-org, OpenAI-compatible chat
providers).
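
To make the two request shapes concrete, here is a minimal Python sketch of how a per-row request could be assembled for each sub-group. It is an illustration only, not the operator's generated code: `build_request`, the constant names, and the `inputs` / `image` / `prompt` JSON keys are assumptions, and per-task variations (such as the chat-completions payload for `image-text-to-text`) are left out.

```python
import base64
import json

# Task groupings exactly as listed in this issue.
IMAGE_ONLY_TASKS = (
    "image-classification",
    "object-detection",
    "image-segmentation",
    "image-to-text",
)
IMAGE_PROMPT_TASKS = (
    "visual-question-answering",
    "document-question-answering",
    "zero-shot-image-classification",
    "image-text-to-text",
    "image-to-image",
)


def build_request(task, image_bytes, prompt=None, content_type="image/png"):
    """Return (headers, body) for one row. Hypothetical helper for illustration."""
    if task in IMAGE_ONLY_TASKS:
        # Image-only: the raw bytes are the request body; no prompt is sent.
        return {"Content-Type": content_type}, image_bytes
    if task in IMAGE_PROMPT_TASKS:
        # Image + prompt: base64-encode the image and bundle it with the prompt.
        encoded = base64.b64encode(image_bytes).decode("ascii")
        # The key names below are assumed; the real payload keys may differ per task.
        payload = {"inputs": {"image": encoded, "prompt": prompt or ""}}
        return {"Content-Type": "application/json"}, json.dumps(payload).encode("utf-8")
    raise ValueError(f"not an image task: {task}")
```

Calling `build_request("image-classification", data)` returns the bytes unchanged with an image content type, while `build_request("visual-question-answering", data, prompt="...")` returns a JSON body carrying the base64 image and the prompt.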
### Proposed Solution or Design
1. New file
`common/workflow-operator/src/main/scala/org/apache/texera/amber/operator/huggingFace/codegen/ImageTaskCodegen.scala`:
- Implements `TaskCodegen` and overrides `tasks: Set[String]` to register
under all 9 image task strings with the dispatcher (the `tasks` trait method
is introduced in this PR; see point 3).
- `payloadPython(ctx)` emits the per-row payload branches (raw binary for
image-only, base64-encoded JSON for image+prompt, chat-completions with
embedded image for `image-text-to-text`).
- `parsePython(ctx)` emits the response-parse branches per task
(extracting `md_results`, `choices[0]…content`, `generated_text`,
`url`/`b64_json` envelopes from Replicate / Fal-ai / Wavespeed / OpenAI /
zai-org).
2. Operator field additions in `HuggingFaceInferenceOpDesc.scala`:
- `imageInput: EncodableString` — uploaded image data URL.
- `inputImageColumn: EncodableString` — column containing per-row image
data.
Both are typed `EncodableString` per the convention from PR #5278; the
`pyb"..."` macro emits them as `self.decode_python_template('<base64>')`
runtime expressions, so they never appear in the generated Python as raw
literals.
3. `TaskCodegen` trait gains a `tasks: Set[String]` method that defaults to
`Set(task)`, so a single codegen can register under multiple task strings.
`ImageTaskCodegen` is the first codegen to use this.
4. `PythonCodegenBase.scala` grows to host the shared image infrastructure:
- Task-family tuples (`image_only_tasks`, `image_prompt_tasks`,
`image_tasks`) + `image_headers` in `process_table`.
- Per-row image-bytes resolution from upload (`self._read_image_input()`)
or input column (`self._read_binary_value(...)` +
`self._compress_image_bytes(...)`).
- `_post_with_fallback` signature extended with `raw_binary_headers` +
`use_raw_binary_body`; new branches for `image-text-to-text` chat-completions
and the model-author vision route.
- `_call_provider` gains image branches for zai-org's custom API,
Replicate predictions + polling, Fal-ai, Wavespeed submit+poll, and image
embedding in the OpenAI-compatible / unknown-provider fallbacks.
- Image-content-type response handling (returns
`data:image/...;base64,...` URLs).
- Image helpers added: `_read_image_input`, `_compress_image_bytes`,
`_image_input_as_base64`, `_read_binary_value`, `_looks_like_html`,
`_html_to_image_bytes`, `_extract_json_arg`, `_url_to_data_url`.
5. Frontend integration (HuggingFace-related changes only; the unrelated
agent / dataset changes on the team's feature branch are left out):
- `HuggingFaceImageUploadComponent` (cherry-picked Angular component for
the property panel's image upload widget).
- One-line `huggingface-image-upload` formly-type registration in
`formly-config.ts`.
- One-line declaration in `app.module.ts`.
- `HuggingFace.png` + `sample-image.png` assets.
6. Spec coverage in `HuggingFaceInferenceOpDescSpec`:
- Image-only task routing (raw binary payload + image headers).
- VQA / document-QA payload shape.
- `image-text-to-text` chat-completions with embedded base64.
- `image-to-image` raw binary + `_url_to_data_url` parse.
- Dispatcher coverage for all 9 image task strings.
- The operator continues to participate in
`PythonCodeRawInvalidTextSpec`'s 117-descriptor regression scan (verified to
pass — no marker leaks, no `generatePythonCode` exceptions).
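
As a rough illustration of the response handling described in points 1 and 4 (image content types returned as `data:image/...;base64,...` URLs, and `generated_text` / `url` / `b64_json` envelopes), the following Python sketch shows one way such parsing could look. The function names and the exact envelope shapes are assumptions made for this example; the real helpers (`_url_to_data_url` and the per-provider branches) may behave differently.

```python
import base64
import json


def to_data_url(image_bytes, mime="image/png"):
    """Wrap raw image bytes as a data:<mime>;base64,<payload> URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def parse_image_response(content_type, body):
    """Return text, a URL, or a data URL from a raw response (hypothetical helper)."""
    if content_type.startswith("image/"):
        # Image content type: the body is the image itself.
        mime = content_type.split(";")[0].strip()
        return to_data_url(body, mime)
    payload = json.loads(body)
    # Some responses wrap the result in a one-element list (assumed shape).
    if isinstance(payload, list) and payload:
        payload = payload[0]
    if isinstance(payload, dict):
        if "generated_text" in payload:
            return payload["generated_text"]
        if "b64_json" in payload:
            # Assumes PNG; the provider's actual format is not stated in this issue.
            return "data:image/png;base64," + payload["b64_json"]
        if "url" in payload:
            return payload["url"]
    # Unknown envelope: let the caller decide how to report it.
    return None
```

Returning `None` for unrecognized envelopes keeps the sketch small; a real implementation would more likely surface the raw response or raise a descriptive error.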
References:
- Parent issue: #5041
- Sibling issues: #5134 (REST resource, closed via #5124), #5277 (operator +
text-generation, in flight via #5278)
- Stacked on #5278
### Impact / Priority
(P2) Medium — required for the HuggingFace inference operator (#5041) to
function for image tasks. Does not affect existing functionality.
### Affected Area
Workflow Engine (Amber) — operator codegen + Python provider routing; minor
frontend integration.
### Task Type
- [ ] Refactor / Cleanup
- [ ] DevOps / Deployment / CI
- [ ] Testing / QA
- [ ] Documentation
- [ ] Performance
- [x] Other