This is an automated email from the ASF dual-hosted git repository.

robertlazarski pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/axis-axis2-java-core.git

commit bf3a403c548b96890d377767005f62dade1921b7
Author: Robert Lazarski <[email protected]>
AuthorDate: Tue Apr 7 03:03:41 2026 -1000

    MCP catalog B1: mcpInputSchema parameter support + build-time code-gen script
    
    - OpenApiSpecGenerator: generateMcpCatalogJson() now reads mcpInputSchema
      parameter (operation-level overrides service-level via existing
      getMcpStringParam). Parses value with Jackson to validate JSON; falls
      back to empty schema with WARN log on parse failure. Backward
      compatible: no param = existing empty schema.
    
    - McpCatalogGeneratorTest: 6 new B1 tests covering parameter override,
      required array preservation, service-level fallback, precedence,
      invalid JSON fallback, and backward-compat empty schema baseline.
    
    - tools/gen_mcp_schema.py: Option 3 build-time code-gen. Parses typedef
      struct{} blocks from Axis2/C .h files, maps C types to JSON Schema
      (integer/number/string/boolean/array/object), and writes mcpInputSchema
      parameters into services.xml.
      Run: python3 tools/gen_mcp_schema.py --header service.h --services services.xml
    
    - AXIS2_MODERNIZATION_PLAN.md: new Immediate Track section covering
      B1/B2/B3/C3 (Java), D1/D2/D3 (Axis2/C), and E (Penguin deployment)
      with sprint sequence.
    
    Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
---
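Note: the lookup order described in the first bullet of the commit message above (operation-level parameter first, then service-level, then the empty schema, with a warning on malformed JSON) can be sketched outside the Java code base. This is a minimal Python sketch under stated assumptions, not the committed implementation: the names `resolve_input_schema` and `empty_schema`, the logger name, and the plain-dict parameter maps are hypothetical and exist only for illustration.

```python
import json
import logging

log = logging.getLogger("mcp.catalog")  # hypothetical logger name


def empty_schema() -> dict:
    """Baseline schema emitted when no usable mcpInputSchema is found."""
    return {"type": "object", "properties": {}, "required": []}


def resolve_input_schema(op_params: dict, svc_params: dict) -> dict:
    """Pick the schema string (operation beats service) and parse it.

    op_params / svc_params are assumed to be plain name -> string maps
    standing in for the Axis2 parameter lookups.
    """
    raw = op_params.get("mcpInputSchema")
    if raw is None:
        raw = svc_params.get("mcpInputSchema")
    if raw is None:
        return empty_schema()
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        # Malformed JSON must never break catalog generation.
        log.warning("Invalid mcpInputSchema JSON, using empty schema: %s", exc)
        return empty_schema()
```

With this shape, an operation-level value always wins over a service-level one, and a malformed value degrades to the same output as a missing one, which is the behaviour the six new tests check on the Java side.
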
 AXIS2_MODERNIZATION_PLAN.md                        | 183 +++++++++++++
 .../apache/axis2/openapi/OpenApiSpecGenerator.java |  51 +++-
 .../axis2/openapi/McpCatalogGeneratorTest.java     | 142 ++++++++++
 tools/gen_mcp_schema.py                            | 300 +++++++++++++++++++++
 4 files changed, 669 insertions(+), 7 deletions(-)

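Note on Step D2 in the plan below: the sanitized fault message it describes (a generic status text plus an `errorRef` correlation ID, with the full detail going only to the log) is planned as a C helper. The following Python sketch only illustrates the intended shape. The function name, the `detail` argument, and the logger name are assumptions, `os.urandom` stands in for reading `/dev/urandom` directly, and the identifier is 16 random bytes in the hyphenated 8-4-4-4-12 hex layout without RFC 4122 version bits.

```python
import logging
import os

log = logging.getLogger("axis2.json")  # hypothetical logger name


def make_secure_fault_message(is_parse_error: bool, detail: str) -> str:
    """Return a sanitized fault string; log the real cause with the same ID."""
    raw = os.urandom(16).hex()  # 16 random bytes -> 32 hex characters
    error_ref = "-".join((raw[:8], raw[8:12], raw[12:16], raw[16:20], raw[20:]))
    # The full context is logged before the sanitized message is returned.
    log.error("errorRef=%s detail=%s", error_ref, detail)
    prefix = "Bad Request" if is_parse_error else "Internal Server Error"
    return f"{prefix} [errorRef={error_ref}]"
```

A client sees only the status text and the reference; an operator can search the server log for the same `errorRef` to find the underlying cause.
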
diff --git a/AXIS2_MODERNIZATION_PLAN.md b/AXIS2_MODERNIZATION_PLAN.md
index 7e0242f4c0..6cdda944d6 100644
--- a/AXIS2_MODERNIZATION_PLAN.md
+++ b/AXIS2_MODERNIZATION_PLAN.md
@@ -23,6 +23,189 @@ entirely. No other Java framework can do all three from the same service deploym
 
 ---
 
+## Immediate Track — MCP inputSchema + Axis2/C + Penguin Demo
+
+**Goal**: Complete the MCP catalog to production quality, port the catalog handler to
+Axis2/C, and run a live demo on penguin via Apache httpd. This track runs ahead of
+Phases 1–6 because it validates the MCP story end-to-end on real hardware.
+
+### Step B1 — `mcpInputSchema` static parameter support (Java + C)
+
+**Problem**: Every tool in `/openapi-mcp.json` emits `"inputSchema": {}`. Claude has to
+guess parameters. This kills usability for financial benchmark tools with 6+ fields.
+
+**Approach (dual strategy)**:
+
+1. **Option 1 — Static declaration in services.xml** (ships first, zero risk):
+   Each `<operation>` carries a `mcpInputSchema` parameter whose value is a literal
+   JSON Schema string. `OpenApiSpecGenerator.generateMcpCatalogJson()` reads it with
+   `getMcpStringParam()` and embeds it verbatim, parsing with Jackson to validate.
+   Falls back to `{}` on parse failure with a WARN log.
+
+   ```xml
+   <operation name="portfolioVariance">
+     <parameter name="mcpInputSchema">{
+       "type": "object",
+       "required": ["n_assets", "weights", "covariance_matrix"],
+       "properties": {
+         "n_assets":          {"type": "integer", "minimum": 2, "maximum": 2000},
+         "weights":           {"type": "array",   "items": {"type": "number"}},
+         "covariance_matrix": {"type": "array",   "items": {"type": "number"}},
+         "request_id":        {"type": "string"}
+       }
+     }</parameter>
+   </operation>
+   ```
+
+2. **Option 3 — Build-time code generation from C headers** (ships second):
+   A Python script (`tools/gen_mcp_schema.py`) reads Axis2/C service header files,
+   maps C struct fields to JSON Schema types, and writes `mcpInputSchema` parameters
+   directly back into `services.xml`. The C type mapping table:
+
+   | C type | JSON Schema type |
+   |--------|-----------------|
+   | `int`, `long`, `axis2_int32_t` | `"integer"` |
+   | `double`, `float` | `"number"` |
+   | `axis2_char_t *`, `char *` | `"string"` |
+   | `axis2_bool_t` | `"boolean"` |
+   | pointer-to-struct | `"object"` |
+   | array pointer + count field | `"array"` |
+
+   The script detects `_request_t` structs, infers which fields are required vs
+   optional (required = no default value set in initialiser), and outputs a
+   standards-compliant JSON Schema. Services.xml is updated in-place.
+
+   Run: `python3 tools/gen_mcp_schema.py --header financial_benchmark_service.h \
+         --services services.xml`
+
+**Java implementation**: in `OpenApiSpecGenerator.generateMcpCatalogJson()`, check the
+`mcpInputSchema` param before falling back to empty schema. Single method change.
+
+**Tests**: in `McpCatalogGeneratorTest`, add tests for schema embedding, invalid JSON
+graceful fallback, and missing param fallback.
+
+### Step B2 — `mcpAuthScope` per-operation parameter
+
+Operation-level auth scope string embedded in catalog for MCP clients that support
+scope-based auth (e.g. `"mcpAuthScope": "read:portfolio"`). Reads via
+`getMcpStringParam()`. Omitted from tool node when absent.
+
+### Step B3 — `mcpStreaming` hint
+
+Boolean `mcpStreaming` parameter marks operations that can stream chunked responses
+(e.g. large Monte Carlo results). Adds `"x-streaming": true` to the tool node.
+Reads via `getMcpBoolParam()`.
+
+### Step C3 — MCP Resources endpoint
+
+New servlet path `GET /mcp-resources` returns a JSON object whose `resources` array lists `resource://` URIs:
+
+```json
+{
+  "resources": [
+    {"uri": "resource://axis2/openapi",      "name": "OpenAPI Spec",  "mimeType": "application/json"},
+    {"uri": "resource://axis2/field-catalog", "name": "Field Catalog", "mimeType": "application/json"}
+  ]
+}
+```
+
+Individual resource content served at `GET /mcp-resource?uri=resource://axis2/openapi`.
+Wired in `OpenApiServlet` as a new path case.
+
+---
+
+### Step D1 — Axis2/C MCP catalog handler
+
+New file: `modules/mcp/mcp_catalog_handler.c`
+
+Walks `axis2_conf_t` service map at request time — same traversal as Java's
+`axisConfig.getServices()`. Emits the identical JSON catalog format. Key functions:
+
+```c
+// Entry point registered on GET /_mcp/openapi-mcp.json
+axis2_status_t mcp_catalog_handler_invoke(
+    axis2_handler_t *handler,
+    const axutil_env_t *env,
+    struct axis2_msg_ctx *msg_ctx);
+
+// Reads axis2_op_t parameter, falls back to axis2_svc_t parameter
+static const axis2_char_t *get_mcp_param(
+    axis2_op_t *op, axis2_svc_t *svc,
+    const axutil_env_t *env,
+    const axis2_char_t *param_name,
+    const axis2_char_t *default_val);
+```
+
+Parameter reading uses `axis2_op_get_param()` / `axis2_svc_get_param()`, the same
+two-level lookup as Java. `mcpDescription`, `mcpReadOnly`, `mcpDestructive`,
+`mcpIdempotent`, `mcpInputSchema` all supported.
+
+JSON output built with `json_object_new_object()` (json-c), with no string concatenation.
+
+### Step D2 — Axis2/C correlation ID error hardening
+
+New helper: `axis2_json_secure_fault.c`
+
+```c
+axis2_char_t *axis2_json_make_secure_fault_message(
+    const axutil_env_t *env,
+    int is_parse_error);
+// Returns "Bad Request [errorRef=<uuid>]" or "Internal Server Error [errorRef=<uuid>]"
+// UUID generated from /dev/urandom (16 bytes → hex with hyphens)
+// Full context logged to axutil_log before sanitized message returned
+```
+
+Applied to `financial_benchmark_service_handler.c` JSON parse error paths and any
+`axis2_json_rpc_msg_recv` equivalent in Axis2/C.
+
+### Step D3 — Populate `mcpInputSchema` in all 5 financial benchmark operations
+
+Using Option 1 (hand-authored) immediately; Option 3 code-gen script validates against
+it. The 5 operations:
+
+| Operation | Required fields |
+|-----------|----------------|
+| `portfolioVariance` | `n_assets`, `weights`, `covariance_matrix` |
+| `monteCarlo` | `n_simulations`, `n_periods`, `initial_value`, `expected_return`, `volatility` |
+| `scenarioAnalysis` | `n_assets`, `assets` |
+| `generateTestData` | `n_assets` |
+| `metadata` | *(none — GET operation)* |
+
+### Step E — Penguin deployment
+
+1. Build `mod_axis2.so` from `axis-axis2-c-core` targeting penguin's Apache httpd
+2. `httpd.conf` fragment:
+   ```apache
+   LoadModule axis2_module modules/mod_axis2.so
+   Axis2RepoPath /opt/axis2c/repository
+   <Location /axis2>
+       SetHandler axis2_module
+   </Location>
+   ```
+3. Deploy `FinancialBenchmarkService` to repository
+4. Verify:
+   ```bash
+   curl https://penguin/axis2/_mcp/openapi-mcp.json
+   curl -X POST https://penguin/axis2/services/FinancialBenchmarkService/monteCarlo \
+        -H 'Content-Type: application/json' \
+        -d '{"monteCarlo":[{"arg0":{"n_simulations":10000,"n_periods":252,...}}]}'
+   ```
+5. Demo: MCP-aware client resolves tools from catalog, calls financial operations
+
+### Immediate Sprint Sequence
+
+```
+B1 (Java) → B1 tests → B2/B3 (Java, config-only) → C3 (Java, new servlet path)
+     ↓
+D1 (Axis2/C catalog handler) → D2 (error hardening) → D3 (services.xml schemas)
+     ↓
+Option 3 code-gen script (tools/gen_mcp_schema.py)
+     ↓
+E (Penguin deployment + demo)
+```
+
+---
+
 ## Phase 1 — Spring Boot Starter
 
 **Goal**: Reduce Axis2 + Spring Boot integration from a multi-day configuration project
diff --git a/modules/openapi/src/main/java/org/apache/axis2/openapi/OpenApiSpecGenerator.java b/modules/openapi/src/main/java/org/apache/axis2/openapi/OpenApiSpecGenerator.java
index 4f17b655dc..5824367b2c 100644
--- a/modules/openapi/src/main/java/org/apache/axis2/openapi/OpenApiSpecGenerator.java
+++ b/modules/openapi/src/main/java/org/apache/axis2/openapi/OpenApiSpecGenerator.java
@@ -754,13 +754,50 @@ public class OpenApiSpecGenerator {
                             service.getName() + ": " + opName);
                     toolNode.put("description", description);
 
-                    // inputSchema: minimal MCP-compliant structure. Richer schemas are
-                    // produced when services carry @McpTool annotations (future work).
-                    com.fasterxml.jackson.databind.node.ObjectNode schema =
-                            toolNode.putObject("inputSchema");
-                    schema.put("type", "object");
-                    schema.putObject("properties");
-                    schema.putArray("required");
+                    // inputSchema: prefer mcpInputSchema parameter (literal JSON Schema
+                    // string set in services.xml at operation or service level).
+                    // Falls back to an empty schema when absent or malformed.
+                    //
+                    // Option 1 usage (services.xml):
+                    //   <operation name="portfolioVariance">
+                    //     <parameter name="mcpInputSchema">{
+                    //       "type": "object",
+                    //       "required": ["n_assets", "weights"],
+                    //       "properties": {
+                    //         "n_assets": {"type": "integer"},
+                    //         "weights":  {"type": "array", "items": {"type": "number"}}
+                    //       }
+                    //     }</parameter>
+                    //   </operation>
+                    //
+                    // Option 3: schemas can also be written by the build-time code-gen
+                    // script (tools/gen_mcp_schema.py) which reads C header structs and
+                    // emits mcpInputSchema parameters into services.xml automatically.
+                    String mcpInputSchemaStr = getMcpStringParam(operation, service,
+                            "mcpInputSchema", null);
+                    if (mcpInputSchemaStr != null) {
+                        try {
+                            com.fasterxml.jackson.databind.JsonNode parsedSchema =
+                                    jackson.readTree(mcpInputSchemaStr);
+                            toolNode.set("inputSchema", parsedSchema);
+                        } catch (Exception parseEx) {
+                            log.warn("[MCP] Invalid mcpInputSchema JSON for operation '"
+                                    + opName + "' in service '" + service.getName()
+                                    + "' — falling back to empty schema: "
+                                    + parseEx.getMessage());
+                            com.fasterxml.jackson.databind.node.ObjectNode schema =
+                                    toolNode.putObject("inputSchema");
+                            schema.put("type", "object");
+                            schema.putObject("properties");
+                            schema.putArray("required");
+                        }
+                    } else {
+                        com.fasterxml.jackson.databind.node.ObjectNode schema =
+                                toolNode.putObject("inputSchema");
+                        schema.put("type", "object");
+                        schema.putObject("properties");
+                        schema.putArray("required");
+                    }
 
                     toolNode.put("endpoint", "POST " + path);
 
diff --git a/modules/openapi/src/test/java/org/apache/axis2/openapi/McpCatalogGeneratorTest.java b/modules/openapi/src/test/java/org/apache/axis2/openapi/McpCatalogGeneratorTest.java
index 33682c0651..a799ef7a84 100644
--- a/modules/openapi/src/test/java/org/apache/axis2/openapi/McpCatalogGeneratorTest.java
+++ b/modules/openapi/src/test/java/org/apache/axis2/openapi/McpCatalogGeneratorTest.java
@@ -715,6 +715,148 @@ public class McpCatalogGeneratorTest extends TestCase {
         assertFalse("openWorldHint default must be false",   annotations.path("openWorldHint").asBoolean());
     }
 
+    // ── B1: mcpInputSchema static parameter ──────────────────────────────────
+
+    /**
+     * When an operation has a {@code mcpInputSchema} parameter containing a valid
+     * JSON Schema string, that schema is embedded verbatim in the catalog tool entry.
+     * This is Option 1: explicit declaration in services.xml.
+     */
+    public void testMcpInputSchemaParamOverridesEmptySchema() throws Exception {
+        AxisService svc = new AxisService("FinancialBenchmarkService");
+        AxisOperation op = new InOutAxisOperation();
+        op.setName(QName.valueOf("portfolioVariance"));
+        op.addParameter(new org.apache.axis2.description.Parameter(
+                "mcpInputSchema",
+                "{\"type\":\"object\",\"required\":[\"n_assets\",\"weights\"]," +
+                "\"properties\":{\"n_assets\":{\"type\":\"integer\"}," +
+                "\"weights\":{\"type\":\"array\",\"items\":{\"type\":\"number\"}}}}"));
+        svc.addOperation(op);
+        axisConfig.addService(svc);
+
+        JsonNode schema = getCatalogTools().get(0).path("inputSchema");
+        assertEquals("type must be 'object'", "object", schema.path("type").asText());
+        assertFalse("properties must be present from mcpInputSchema",
+                schema.path("properties").isMissingNode());
+        assertFalse("n_assets property must be present",
+                schema.path("properties").path("n_assets").isMissingNode());
+        assertEquals("n_assets must be integer type",
+                "integer", schema.path("properties").path("n_assets").path("type").asText());
+    }
+
+    /**
+     * The required array from the mcpInputSchema parameter must be preserved
+     * exactly — not replaced with an empty array.
+     */
+    public void testMcpInputSchemaRequiredArrayPreserved() throws Exception {
+        AxisService svc = new AxisService("FinancialBenchmarkService");
+        AxisOperation op = new InOutAxisOperation();
+        op.setName(QName.valueOf("monteCarlo"));
+        op.addParameter(new org.apache.axis2.description.Parameter(
+                "mcpInputSchema",
+                "{\"type\":\"object\",\"required\":[\"n_simulations\",\"n_periods\"]," +
+                "\"properties\":{\"n_simulations\":{\"type\":\"integer\"}," +
+                "\"n_periods\":{\"type\":\"integer\"}}}"));
+        svc.addOperation(op);
+        axisConfig.addService(svc);
+
+        JsonNode required = getCatalogTools().get(0).path("inputSchema").path("required");
+        assertTrue("required must be an array", required.isArray());
+        assertEquals("required must have 2 entries", 2, required.size());
+        // Collect required field names
+        java.util.Set<String> reqFields = new java.util.HashSet<>();
+        for (JsonNode r : required) reqFields.add(r.asText());
+        assertTrue("n_simulations must be required", reqFields.contains("n_simulations"));
+        assertTrue("n_periods must be required",     reqFields.contains("n_periods"));
+    }
+
+    /**
+     * mcpInputSchema set at service level applies to all operations in the service
+     * that do not have their own operation-level override.
+     */
+    public void testServiceLevelMcpInputSchemaAppliesWhenNoOperationLevel() throws Exception {
+        AxisService svc = new AxisService("MetadataService");
+        svc.addParameter(new org.apache.axis2.description.Parameter(
+                "mcpInputSchema",
+                "{\"type\":\"object\",\"properties\":{\"request_id\":{\"type\":\"string\"}}}"));
+        AxisOperation op = new InOutAxisOperation();
+        op.setName(QName.valueOf("metadata"));
+        svc.addOperation(op);
+        axisConfig.addService(svc);
+
+        JsonNode schema = getCatalogTools().get(0).path("inputSchema");
+        assertFalse("request_id property must come from service-level mcpInputSchema",
+                schema.path("properties").path("request_id").isMissingNode());
+    }
+
+    /**
+     * Operation-level mcpInputSchema takes precedence over a service-level one.
+     */
+    public void testOperationLevelMcpInputSchemaTakesPrecedenceOverServiceLevel() throws Exception {
+        AxisService svc = new AxisService("SomeService");
+        svc.addParameter(new org.apache.axis2.description.Parameter(
+                "mcpInputSchema",
+                "{\"type\":\"object\",\"properties\":{\"service_field\":{\"type\":\"string\"}}}"));
+        AxisOperation op = new InOutAxisOperation();
+        op.setName(QName.valueOf("specificOp"));
+        op.addParameter(new org.apache.axis2.description.Parameter(
+                "mcpInputSchema",
+                "{\"type\":\"object\",\"properties\":{\"op_field\":{\"type\":\"integer\"}}}"));
+        svc.addOperation(op);
+        axisConfig.addService(svc);
+
+        JsonNode props = getCatalogTools().get(0).path("inputSchema").path("properties");
+        assertFalse("op_field from operation-level schema must be present",
+                props.path("op_field").isMissingNode());
+        assertTrue("service_field must not be present when operation-level overrides",
+                props.path("service_field").isMissingNode());
+    }
+
+    /**
+     * When mcpInputSchema contains invalid JSON, the generator must log a warning
+     * and fall back to the empty schema — never throw or produce invalid JSON.
+     */
+    public void testInvalidMcpInputSchemaFallsBackToEmptySchema() throws Exception {
+        AxisService svc = new AxisService("BrokenService");
+        AxisOperation op = new InOutAxisOperation();
+        op.setName(QName.valueOf("brokenOp"));
+        op.addParameter(new org.apache.axis2.description.Parameter(
+                "mcpInputSchema", "NOT_VALID_JSON{{"));
+        svc.addOperation(op);
+        axisConfig.addService(svc);
+
+        // Must not throw — output must still be valid JSON
+        String json = generator.generateMcpCatalogJson(mockRequest);
+        JsonNode root = MAPPER.readTree(json);
+        assertNotNull("Output must still be valid JSON after mcpInputSchema parse failure", root);
+
+        JsonNode schema = root.path("tools").get(0).path("inputSchema");
+        assertEquals("Fallback schema must have type=object", "object",
+                schema.path("type").asText());
+        assertFalse("Fallback schema must still have properties",
+                schema.path("properties").isMissingNode());
+    }
+
+    /**
+     * When no mcpInputSchema parameter is set, the catalog emits the baseline
+     * empty schema, preserving backward compatibility for all existing services.
+     */
+    public void testAbsentMcpInputSchemaProducesEmptyBaselineSchema() throws Exception {
+        addService("LegacyService", "legacyOp");
+
+        JsonNode schema = getCatalogTools().get(0).path("inputSchema");
+        assertEquals("Absent mcpInputSchema must produce type=object", "object",
+                schema.path("type").asText());
+        assertTrue("Baseline properties must be an empty object",
+                schema.path("properties").isObject());
+        assertEquals("Baseline properties must be empty", 0,
+                schema.path("properties").size());
+        assertTrue("Baseline required must be an empty array",
+                schema.path("required").isArray());
+        assertEquals("Baseline required must be empty", 0,
+                schema.path("required").size());
+    }
+
     // ── tool list mirrors existing OpenAPI paths ──────────────────────────────
 
     /**
diff --git a/tools/gen_mcp_schema.py b/tools/gen_mcp_schema.py
new file mode 100644
index 0000000000..1110925d2d
--- /dev/null
+++ b/tools/gen_mcp_schema.py
@@ -0,0 +1,300 @@
+#!/usr/bin/env python3
+"""
+gen_mcp_schema.py — Build-time MCP inputSchema generator (Option 3)
+
+Reads an Axis2/C service header file, finds *_request_t structs, maps C field
+types to JSON Schema types, and writes mcpInputSchema parameters into the
+corresponding services.xml.
+
+Usage
+-----
+    python3 tools/gen_mcp_schema.py \\
+        --header path/to/service.h \\
+        --services path/to/services.xml \\
+        [--dry-run]
+
+The script writes in-place unless --dry-run is given, in which case it prints
+the updated XML to stdout.
+
+C → JSON Schema type mapping
+-----------------------------
+int / long / int32_t / int64_t / axis2_int32_t   → "integer"
+double / float                                    → "number"
+char * / axis2_char_t *                           → "string"
+axis2_bool_t / bool / int (is_*/has_*/enable_*/use_*) → "boolean"
+pointer-to-struct (foo_t *)                       → "object"
+array + companion _count / n_ field               → "array"
+
+Required fields: any field without a "= 0" / "= NULL" / "= false" default in
+the struct definition is treated as required.  Fields named *_id or n_* are
+always required, even when a default is present.
+
+The script uses regex-only parsing (no libclang) so it works without a C
+toolchain installed.  It is conservative: when a type cannot be mapped
+unambiguously, it emits "type": "object" and logs a warning.
+"""
+
+import argparse
+import json
+import re
+import sys
+import textwrap
+from pathlib import Path
+
+# ---------------------------------------------------------------------------
+# C type → JSON Schema type table
+# ---------------------------------------------------------------------------
+_SCALAR_MAP = [
+    # (regex_pattern, json_schema_type)
+    (r'\bint\b|\blong\b|\bint32_t\b|\bint64_t\b|\buint32_t\b|\buint64_t\b'
+     r'|\baxis2_int32_t\b|\bsize_t\b',      "integer"),
+    (r'\bdouble\b|\bfloat\b',               "number"),
+    (r'\baxis2_char_t\s*\*|\bchar\s*\*',    "string"),
+    (r'\baxis2_bool_t\b|\bbool\b',          "boolean"),
+]
+
+_STRUCT_PTR_RE = re.compile(r'\b(\w+_t)\s*\*')
+
+
+def c_type_to_json_schema(c_type: str, field_name: str) -> dict:
+    """Map a C type string to a minimal JSON Schema dict."""
+    c_type = c_type.strip()
+
+    # Boolean heuristic: field named is_*/has_* with int type
+    if re.match(r'(is|has|enable|use)_', field_name) and re.search(r'\bint\b', c_type):
+        return {"type": "boolean"}
+
+    # Pointer to array (double * / float * used for matrix/weight arrays)
+    if re.search(r'\bdouble\s*\*|\bfloat\s*\*', c_type):
+        return {"type": "array", "items": {"type": "number"}}
+
+    for pattern, schema_type in _SCALAR_MAP:
+        if re.search(pattern, c_type):
+            return {"type": schema_type}
+
+    m = _STRUCT_PTR_RE.search(c_type)
+    if m:
+        return {"type": "object"}
+
+    # Fallback
+    print(f"  WARNING: unmapped C type '{c_type}' for field '{field_name}' → object",
+          file=sys.stderr)
+    return {"type": "object"}
+
+
+# ---------------------------------------------------------------------------
+# Struct parser
+# ---------------------------------------------------------------------------
+_STRUCT_RE = re.compile(
+    r'typedef\s+struct\s+\w*\s*\{([^}]+)\}\s*(\w+_t)\s*;',
+    re.DOTALL
+)
+_FIELD_RE = re.compile(
+    r'^\s*(?P<type>(?:const\s+)?[\w\s\*]+?)\s*(?P<name>\w+)\s*(?:=\s*(?P<default>[^;]+))?\s*;',
+    re.MULTILINE
+)
+
+
+def parse_structs(header_text: str) -> dict[str, dict]:
+    """
+    Return {struct_name: {field_name: {"c_type": ..., "has_default": bool}}}.
+    Only parses typedef struct { ... } name_t; blocks.
+    """
+    structs = {}
+    for m in _STRUCT_RE.finditer(header_text):
+        body = m.group(1)
+        name = m.group(2)
+        fields = {}
+        for fm in _FIELD_RE.finditer(body):
+            field_name = fm.group("name")
+            c_type     = fm.group("type")
+            default    = fm.group("default")
+            # Skip comment-only or empty lines picked up by the regex
+            if c_type.strip().startswith("//") or c_type.strip().startswith("*"):
+                continue
+            fields[field_name] = {
+                "c_type":      c_type.strip(),
+                "has_default": default is not None,
+            }
+        if fields:
+            structs[name] = fields
+    return structs
+
+
+def build_json_schema(struct_fields: dict) -> dict:
+    """Build a JSON Schema object from parsed struct fields."""
+    properties = {}
+    required = []
+
+    # Fields that are always array companions (paired with n_* / *_count): skip them
+    # as array size information; they are implicit.
+    companion_size_re = re.compile(r'^n_|_count$|_len$|_size$')
+
+    # First pass: collect array-indicator field names
+    array_fields = set()
+    for fname, info in struct_fields.items():
+        c_type = info["c_type"]
+        if re.search(r'\bdouble\s*\*|\bfloat\s*\*', c_type):
+            array_fields.add(fname)
+
+    for fname, info in struct_fields.items():
+        c_type      = info["c_type"]
+        has_default = info["has_default"]
+
+        # Skip size companion fields (n_assets accompanies weights[], etc.)
+        if companion_size_re.search(fname) and fname not in array_fields:
+            # Keep n_assets as it is the primary dimension parameter
+            if not fname.startswith("n_"):
+                continue
+
+        schema_prop = c_type_to_json_schema(c_type, fname)
+
+        # Annotate array items for common financial arrays
+        if schema_prop.get("type") == "array" and not schema_prop.get("items"):
+            schema_prop["items"] = {"type": "number"}
+
+        properties[fname] = schema_prop
+
+        # Required: the field has no default, or its name matches *_id / n_*
+        always_required = re.match(r'.+_id$|^n_', fname)
+        if always_required or not has_default:
+            required.append(fname)
+
+    schema = {
+        "type": "object",
+        "properties": properties,
+    }
+    if required:
+        schema["required"] = required
+    return schema
+
+
+# ---------------------------------------------------------------------------
+# services.xml patcher
+# ---------------------------------------------------------------------------
+def find_request_struct(structs: dict, op_name: str) -> str | None:
+    """
+    Heuristically find the request struct for an operation name.
+    Tries: finbench_{op_name}_request_t, {op_name}_request_t, {op_name}_req_t
+    """
+    service_prefix = "finbench_"
+    candidates = [
+        f"{service_prefix}{op_name}_request_t",
+        f"{op_name}_request_t",
+        f"{op_name}_req_t",
+    ]
+    for c in candidates:
+        if c in structs:
+            return c
+    # Case-insensitive fallback
+    op_lower = op_name.lower()
+    for sname in structs:
+        if op_lower in sname.lower() and "request" in sname.lower():
+            return sname
+    return None
+
+
+_OP_RE = re.compile(
+    r'(<operation\s+name="(?P<opname>[^"]+)"[^>]*>)',
+    re.DOTALL
+)
+_EXISTING_SCHEMA_RE = re.compile(
+    r'\s*<parameter\s+name="mcpInputSchema">.*?</parameter>',
+    re.DOTALL
+)
+
+
+def patch_services_xml(xml_text: str, structs: dict) -> tuple[str, list[str]]:
+    """
+    For each <operation name="..."> block, find the matching request struct
+    and inject (or replace) a mcpInputSchema parameter.
+
+    Returns (patched_xml, list_of_change_messages).
+    """
+    messages = []
+    result = xml_text
+
+    for m in reversed(list(_OP_RE.finditer(xml_text))):  # last-to-first keeps earlier offsets valid
+        op_name = m.group("opname")
+        struct_name = find_request_struct(structs, op_name)
+        if struct_name is None:
+            messages.append(f"  SKIP {op_name}: no matching *_request_t struct found")
+            continue
+
+        schema = build_json_schema(structs[struct_name])
+        schema_json = json.dumps(schema, indent=16)
+        param_block = f'<parameter name="mcpInputSchema">{schema_json}</parameter>'
+
+        # Check if an mcpInputSchema already exists after this <operation ...> tag
+        op_start = m.start()
+        # Find the closing </operation>
+        close_re = re.compile(r'</operation>', re.DOTALL)
+        close_m = close_re.search(result, op_start)
+        if close_m is None:
+            continue
+        op_block = result[op_start:close_m.end()]
+
+        if '<parameter name="mcpInputSchema">' in op_block:
+            # Replace existing
+            new_op_block = _EXISTING_SCHEMA_RE.sub(
+                "\n            " + param_block, op_block)
+            result = result[:op_start] + new_op_block + result[close_m.end():]
+            messages.append(f"  UPDATE {op_name}: replaced mcpInputSchema from {struct_name}")
+        else:
+            # Insert after the opening <operation ...> tag
+            tag_end = op_start + len(m.group(1))
+            indent = "\n            "
+            result = (result[:tag_end]
+                      + indent + param_block
+                      + result[tag_end:])
+            messages.append(f"  INSERT {op_name}: wrote mcpInputSchema from {struct_name}")
+
+    return result, messages
+
+
+# ---------------------------------------------------------------------------
+# CLI
+# ---------------------------------------------------------------------------
+def main():
+    p = argparse.ArgumentParser(description=__doc__,
+                                formatter_class=argparse.RawDescriptionHelpFormatter)
+    p.add_argument("--header",   required=True, help="Path to .h file")
+    p.add_argument("--services", required=True, help="Path to services.xml")
+    p.add_argument("--dry-run",  action="store_true",
+                   help="Print patched XML to stdout, do not write")
+    args = p.parse_args()
+
+    header_path   = Path(args.header)
+    services_path = Path(args.services)
+
+    if not header_path.exists():
+        sys.exit(f"ERROR: header not found: {header_path}")
+    if not services_path.exists():
+        sys.exit(f"ERROR: services.xml not found: {services_path}")
+
+    header_text   = header_path.read_text(encoding="utf-8")
+    services_text = services_path.read_text(encoding="utf-8")
+
+    structs = parse_structs(header_text)
+    if not structs:
+        sys.exit("ERROR: no typedef struct { } name_t; blocks found in header")
+
+    print(f"Parsed {len(structs)} structs from {header_path.name}:", file=sys.stderr)
+    for sname in structs:
+        print(f"  {sname} ({len(structs[sname])} fields)", file=sys.stderr)
+
+    patched, messages = patch_services_xml(services_text, structs)
+
+    print("Schema generation results:", file=sys.stderr)
+    for msg in messages:
+        print(msg, file=sys.stderr)
+
+    if args.dry_run:
+        print(patched)
+    else:
+        services_path.write_text(patched, encoding="utf-8")
+        print(f"Written: {services_path}", file=sys.stderr)
+
+
+if __name__ == "__main__":
+    main()

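Note on the Option 3 script above: its C-to-JSON-Schema mapping can be tried on a small header without touching a real `services.xml`. The sketch below is a simplified, standalone illustration and not the script itself. The sample struct, the `map_c_type` and `schema_from_header` names, and the `optional` argument are hypothetical; the real financial benchmark header is not part of this commit.

```python
import re

# Hypothetical header text used only for this illustration.
SAMPLE_HEADER = """
typedef struct finbench_portfolioVariance_request {
    int n_assets;
    double *weights;
    double *covariance_matrix;
    axis2_char_t *request_id;
} finbench_portfolioVariance_request_t;
"""

STRUCT_RE = re.compile(r"typedef\s+struct\s+\w*\s*\{([^}]+)\}\s*(\w+_t)\s*;", re.DOTALL)
# Whitespace before the field name is optional, so "double *weights;" is accepted.
FIELD_RE = re.compile(r"^\s*(?P<type>[\w\s\*]+?)\s*(?P<name>\w+)\s*;", re.MULTILINE)


def map_c_type(c_type: str) -> dict:
    """Map a C type string to a minimal JSON Schema fragment."""
    if re.search(r"\b(double|float)\s*\*", c_type):
        return {"type": "array", "items": {"type": "number"}}
    if re.search(r"\b(axis2_char_t|char)\s*\*", c_type):
        return {"type": "string"}
    if re.search(r"\b(axis2_bool_t|bool)\b", c_type):
        return {"type": "boolean"}
    if re.search(r"\b(double|float)\b", c_type):
        return {"type": "number"}
    if re.search(r"\b(int|long|axis2_int32_t)\b", c_type):
        return {"type": "integer"}
    return {"type": "object"}  # conservative fallback


def schema_from_header(header_text: str, struct_name: str, optional=frozenset()) -> dict:
    """Build an object schema for one struct; fields not in `optional` are required."""
    for match in STRUCT_RE.finditer(header_text):
        if match.group(2) != struct_name:
            continue
        properties, required = {}, []
        for field in FIELD_RE.finditer(match.group(1)):
            name = field.group("name")
            properties[name] = map_c_type(field.group("type"))
            if name not in optional:
                required.append(name)
        return {"type": "object", "properties": properties, "required": required}
    raise KeyError(struct_name)


schema = schema_from_header(
    SAMPLE_HEADER, "finbench_portfolioVariance_request_t", optional={"request_id"}
)
```

For the sample struct this yields the same property types and required list as the hand-authored `portfolioVariance` example in the plan, minus the `minimum`/`maximum` bounds, which cannot be inferred from C types alone.
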