The GitHub Actions job "Required Checks" on texera.git/gh-readonly-queue/main/pr-5278-3dab771a2fe3ea5bf97c4c69cfbd761f9cd01e54 has succeeded. Run started by GitHub user xuang7 (triggered by xuang7).
Head commit for run: 2b9add956c9e63c3c4f6e717221a0c5e33e54875 / Prateek Ganigi <[email protected]> feat(huggingFace): refactor operator into per-task codegen + text-generation (#5278) > ⚠️ This PR is stacked on #5124. Until that lands, the diff below includes #5124's `HuggingFaceModelResource.scala` and the 1-line registration in `TexeraWebApplication.scala`. The new code in this PR is everything under `common/workflow-operator/src/main/scala/org/apache/texera/amber/operator/huggingFace/` and the new test under `common/workflow-operator/src/test/.../huggingFace/HuggingFaceInferenceOpDescSpec.scala`. Once #5124 merges, this diff will auto-clean to ~839 lines. ### What changes were proposed in this PR? Refactors the monolithic 1,278-line `HuggingFaceInferenceOpDesc` from the team's feature branch into a dispatcher + per-task codegen architecture and ships the first task family (text-generation): - `codegen/TaskCodegen.scala` introduces the trait + `CodegenContext` that model per-task variation. - `codegen/PythonCodegenBase.scala` emits the shared provider-fallback / `process_table` / `_parse_response` infrastructure with two holes for the per-task payload and parse snippets. - `codegen/TextGenCodegen.scala` supplies text-generation's chat-completions payload and the `body["choices"][0 ["message"]["content"]` parse branch. - `HuggingFaceInferenceOpDesc.scala` becomes a thin (~180-line) dispatcher holding the `@JsonProperty` fields and the `registeredCodegens` map. User-input string fields are typed `EncodableString` and emitted via the `pyb"..."` macro so values reach Python as `self.decode_python_template('<base64>')` rather than raw literals. Class constants are assigned in `open(self)` so `self` is in scope for the decode call. The generated `process_table` runs a defensive `_HF_MODEL_ID_PATTERN` check at runtime before any HF URL is composed. The `TaskCodegen` trait also exposes a `tasks: Set[String]` default so a single codegen can register under multiple task strings, this becomes relevant in PR 3 (image family). ### Any related issues, documentation, or discussions? Tracked in #5277 & #5041(umbrella issue for the HuggingFace operator end-to-end implementation). Closes #5277 Stacked on #5124 (PR 1 - REST resource). This is PR 2 of a multi-PR series landing the HuggingFace operator end-to-end. The full plan and umbrella issue live separately; this PR's scope is exactly the dispatcher pattern + text-generation codegen. ### How was this PR tested? - `sbt "WorkflowOperator/compile; WorkflowOperator/Test/compile"` clean. - `sbt scalafmtCheck` clean. - `sbt "WorkflowOperator/testOnly org.apache.texera.amber.operator.huggingFace.HuggingFaceInferenceOpDescSpec"` - 10/10 pass (operator info, validation, codegen wiring, MODEL_ID runtime check, leak-prevention, clamping, schema). - `sbt "WorkflowOperator/testOnly org.apache.texera.amber.util.PythonCodeRawInvalidTextSpec"` - 117/117 descriptors `py_compile` cleanly, no raw-text leaks. The new operator is included in this scan. - Generated Python verified via `python3 -m py_compile` on a sample output. ### Was this PR authored or co-authored using generative AI tooling? Co-authored with Claude Opus 4.7 --------- Co-authored-by: Elliot Lin <[email protected]> Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]> Co-authored-by: Xuan Gu <[email protected]> Report URL: https://github.com/apache/texera/actions/runs/27580569890 With regards, GitHub Actions via GitBox
