[I] Add image task family (ImageTaskCodegen) to HuggingFace operator [texera]

via GitHub Tue, 02 Jun 2026 18:07:12 -0700


PG1204 opened a new issue, #5319:
URL: https://github.com/apache/texera/issues/5319


   ### Task Summary
   
   ### Feature Summary
   
   The HuggingFace inference operator (#5041) covers ~20 HF pipeline tasks 
across text, image, audio, and media-generation families. PR #5278 established 
the dispatcher + per-task-codegen architecture and shipped the first task 
family (text-generation). This issue covers wiring in the **image task family** 
— 9 HF pipeline tasks — as the next codegen plugged into the dispatcher.
   
   The image family splits into two sub-groups by request shape:
   - **Image-only** (raw binary upload, no prompt column): 
`image-classification`, `object-detection`, `image-segmentation`, 
`image-to-text`.
   - **Image + prompt** (base64 image bundled with a text prompt in a JSON 
payload): `visual-question-answering`, `document-question-answering`, 
`zero-shot-image-classification`, `image-text-to-text`, `image-to-image`.
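
   The two request shapes can be sketched roughly as follows. This is a 
minimal illustration, not the operator's generated code: `build_request`, the 
header values, and the JSON keys `image` and `prompt` are hypothetical, and 
only the task names come from this issue.

```python
import base64
import json

# Task names are taken from the issue text; everything else is assumed.
IMAGE_ONLY_TASKS = (
    "image-classification",
    "object-detection",
    "image-segmentation",
    "image-to-text",
)
IMAGE_PROMPT_TASKS = (
    "visual-question-answering",
    "document-question-answering",
    "zero-shot-image-classification",
    "image-text-to-text",
    "image-to-image",
)


def build_request(task, image_bytes, prompt=None):
    """Return (headers, body_bytes) for one input row (hypothetical helper)."""
    if task in IMAGE_ONLY_TASKS:
        # Image-only: the raw image bytes are the whole request body.
        return {"Content-Type": "application/octet-stream"}, image_bytes
    if task in IMAGE_PROMPT_TASKS:
        if prompt is None:
            raise ValueError(f"task {task!r} needs a prompt")
        # Image + prompt: base64 image and prompt bundled into one JSON payload.
        encoded = base64.b64encode(image_bytes).decode("ascii")
        body = json.dumps({"image": encoded, "prompt": prompt})
        return {"Content-Type": "application/json"}, body.encode("utf-8")
    raise ValueError(f"not an image task: {task!r}")
```

   The real per-task payloads differ in detail (for example, 
`image-text-to-text` uses a chat-completions body); the sketch only shows the 
split by request shape.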
   
   Landing this would let users run any of these 9 HF tasks against models 
served by HF Hub or the third-party providers HF Inference Router routes to 
(Replicate, Fal-ai, Wavespeed, zai-org, OpenAI-compatible chat providers).
   
   ### Proposed Solution or Design
   
   1. New file 
`common/workflow-operator/src/main/scala/org/apache/texera/amber/operator/huggingFace/codegen/ImageTaskCodegen.scala`:
      - Implements `TaskCodegen` and overrides `tasks: Set[String]` (a new 
trait method, see point 3) so the dispatcher registers it under all 9 image 
task strings.
      - `payloadPython(ctx)` emits the per-row payload branches (raw binary for 
image-only, base64-encoded JSON for image+prompt, chat-completions with 
embedded image for `image-text-to-text`).
      - `parsePython(ctx)` emits the response-parse branches per task 
(extracting `md_results`, `choices[0]…content`, `generated_text`, 
`url`/`b64_json` envelopes from Replicate / Fal-ai / Wavespeed / OpenAI / 
zai-org).
   
   2. Operator field additions in `HuggingFaceInferenceOpDesc.scala`:
      - `imageInput: EncodableString` — uploaded image data URL.
      - `inputImageColumn: EncodableString` — column containing per-row image 
data.
      Both are typed `EncodableString` per the convention from PR #5278; the 
`pyb"..."` macro emits them as `self.decode_python_template('<base64>')` 
runtime expressions, so they never appear in the generated Python as raw 
literals.
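
   The idea behind this convention can be sketched in plain Python. This is 
an assumption-laden illustration, not the `pyb"..."` macro itself: 
`encode_for_template` is a hypothetical stand-in for the build-time step, and 
the `decode_python_template` function below is a stand-in for the runtime 
method of the same name.

```python
import base64


def encode_for_template(user_value: str) -> str:
    """Hypothetical build-time step: render a user value as a decode call."""
    b64 = base64.b64encode(user_value.encode("utf-8")).decode("ascii")
    # The base64 alphabet has no quotes or newlines, so the emitted
    # expression cannot be broken out of by the user's input.
    return f"self.decode_python_template('{b64}')"


def decode_python_template(b64: str) -> str:
    """Stand-in for the runtime decode step (assumed to be plain base64)."""
    return base64.b64decode(b64).decode("utf-8")


# A value that would be unsafe if pasted into generated source verbatim.
tricky = "img'); import os  # not a column name"
expr = encode_for_template(tricky)

# Pull the encoded argument back out and confirm the round trip.
encoded_arg = expr[len("self.decode_python_template('"):-len("')")]
recovered = decode_python_template(encoded_arg)
```

   Here `expr` contains only the encoded form, and `recovered` equals the 
original string once decoded at runtime.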
   
   3. `TaskCodegen` trait gains a `tasks: Set[String]` default method (defaults 
to `Set(task)`) so a single codegen can register under multiple task strings — 
`ImageTaskCodegen` is the first multi-task codegen to use this.
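
   The actual trait is Scala; the Python analogue below only illustrates the 
registration idea from point 3. The class bodies, `build_dispatch_table`, and 
the choice of primary `task` for the image codegen are assumptions made for 
this sketch.

```python
class TaskCodegen:
    """Analogue of the trait: one primary task, plus a `tasks` default."""

    task = ""

    @property
    def tasks(self):
        # Default mirrors `Set(task)`: register under the single task name.
        return {self.task}


class TextGenerationCodegen(TaskCodegen):
    task = "text-generation"


class ImageTaskCodegen(TaskCodegen):
    task = "image-classification"  # assumed primary task, for illustration

    @property
    def tasks(self):
        # Override: one codegen registers under all 9 image task strings.
        return {
            "image-classification", "object-detection",
            "image-segmentation", "image-to-text",
            "visual-question-answering", "document-question-answering",
            "zero-shot-image-classification", "image-text-to-text",
            "image-to-image",
        }


def build_dispatch_table(codegens):
    """Map every task string to its codegen, rejecting duplicates."""
    table = {}
    for codegen in codegens:
        for name in codegen.tasks:
            if name in table:
                raise ValueError(f"task {name!r} registered twice")
            table[name] = codegen
    return table


image_codegen = ImageTaskCodegen()
table = build_dispatch_table([TextGenerationCodegen(), image_codegen])
```

   With the default in place, existing single-task codegens need no change, 
while a multi-task codegen only overrides `tasks`.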
   
   4. `PythonCodegenBase.scala` grows to host the shared image infrastructure:
      - Task-family tuples (`image_only_tasks`, `image_prompt_tasks`, 
`image_tasks`) + `image_headers` in `process_table`.
      - Per-row image-bytes resolution from upload (`self._read_image_input()`) 
or input column (`self._read_binary_value(...)` + 
`self._compress_image_bytes(...)`).
      - `_post_with_fallback` signature extended with `raw_binary_headers` + 
`use_raw_binary_body`; new branches for `image-text-to-text` chat-completions 
and the model-author vision route.
      - `_call_provider` gains image branches for zai-org's custom API, 
Replicate predictions + polling, Fal-ai, Wavespeed submit+poll, and image 
embedding in the OpenAI-compatible / unknown-provider fallbacks.
      - Image-content-type response handling (returns 
`data:image/...;base64,...` URLs).
      - Image helpers added: `_read_image_input`, `_compress_image_bytes`, 
`_image_input_as_base64`, `_read_binary_value`, `_looks_like_html`, 
`_html_to_image_bytes`, `_extract_json_arg`, `_url_to_data_url`.
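
   As a rough illustration of the image-content-type handling and one of the 
helpers, the sketch below turns raw image bytes into a 
`data:image/...;base64,...` URL and guesses whether a response body is HTML. 
Both functions are hypothetical stand-ins written for this example; the real 
helpers listed above may behave differently.

```python
import base64


def bytes_to_data_url(image_bytes: bytes, content_type: str = "image/png") -> str:
    """Encode image bytes as a data URL (hypothetical helper)."""
    # Drop parameters such as "; charset=binary" from the header value.
    mime = content_type.split(";")[0].strip().lower()
    if not mime.startswith("image/"):
        raise ValueError(f"not an image content type: {content_type!r}")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def looks_like_html(data: bytes) -> bool:
    """Heuristic check on the first bytes of a payload (assumed logic)."""
    head = data[:256].lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")
```

   A caller could use `looks_like_html` to decide whether a payload needs 
conversion before it is treated as image bytes, and `bytes_to_data_url` to 
produce the value stored in the output column.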
   
   5. Frontend integration (HuggingFace-related changes only; unrelated agent 
and dataset changes from the team's feature branch are excluded):
      - `HuggingFaceImageUploadComponent` (cherry-picked Angular component for 
the property panel's image upload widget).
      - One-line `huggingface-image-upload` formly-type registration in 
`formly-config.ts`.
      - One-line declaration in `app.module.ts`.
      - `HuggingFace.png` + `sample-image.png` assets.
   
   6. Spec coverage in `HuggingFaceInferenceOpDescSpec`:
      - Image-only task routing (raw binary payload + image headers).
      - VQA / document-QA payload shape.
      - `image-text-to-text` chat-completions with embedded base64.
      - `image-to-image` raw binary + `_url_to_data_url` parse.
      - Dispatcher coverage for all 9 image task strings.
      - The operator continues to participate in 
`PythonCodeRawInvalidTextSpec`'s 117-descriptor regression scan (verified to 
pass — no marker leaks, no `generatePythonCode` exceptions).
   
   References:
   - Parent issue: #5041
   - Sibling issues: #5134 (REST resource, closed via #5124), #5277 (operator + 
text-generation, in flight via #5278)
   - Stacked on #5278
   
   ### Impact / Priority
   
   (P2) Medium — required for the HuggingFace inference operator (#5041) to 
function for image tasks. Does not affect existing functionality.
   
   ### Affected Area
   
   Workflow Engine (Amber) — operator codegen + Python provider routing; minor 
frontend integration.
   
   
   
   ### Task Type
   
   - [ ] Refactor / Cleanup
   - [ ] DevOps / Deployment / CI
   - [ ] Testing / QA
   - [ ] Documentation
   - [ ] Performance
   - [x] Other


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
