Re: [PR] feat: add audit-comet-expression Claude Code skill [datafusion-comet]

via GitHub Wed, 01 Apr 2026 15:19:08 -0700


kazuyukitanimura commented on code in PR #3793:
URL: https://github.com/apache/datafusion-comet/pull/3793#discussion_r3024925793



##########
.claude/skills/audit-comet-expression/SKILL.md:
##########
@@ -0,0 +1,325 @@
+---
+name: audit-comet-expression
+description: Audit an existing Comet expression for correctness and test 
coverage. Studies the Spark implementation across versions 3.4.3, 3.5.8, and 
4.0.1, reviews the Comet and DataFusion implementations, identifies missing 
test coverage, and offers to implement additional tests.
+argument-hint: <expression-name>
+---
+
+Audit the Comet implementation of the `$ARGUMENTS` expression for correctness 
and test coverage.
+
+## Overview
+
+This audit covers:
+
+1. Spark implementation across versions 3.4.3, 3.5.8, and 4.0.1
+2. Comet Scala serde implementation
+3. Comet Rust / DataFusion implementation
+4. Existing test coverage (SQL file tests and Scala tests)
+5. Gap analysis and test recommendations
+
+---
+
+## Step 1: Locate the Spark Implementations
+
+Clone specific Spark version tags (use shallow clones to avoid polluting the 
workspace). Only clone a version if it is not already present.
+
+```bash
+for tag in v3.4.3 v3.5.8 v4.0.1; do
+  dir="/tmp/spark-${tag}"
+  if [ ! -d "$dir" ]; then
+    git clone --depth 1 --branch "$tag" https://github.com/apache/spark.git 
"$dir"
+  fi
+done
+```
+
+### Find the expression class in each Spark version
+
+Search the Catalyst SQL expressions source:
+
+```bash
+for tag in v3.4.3 v3.5.8 v4.0.1; do
+  dir="/tmp/spark-${tag}"
+  echo "=== $tag ==="
+  find "$dir/sql/catalyst/src/main/scala" -name "*.scala" | \
+    xargs grep -l "case class $ARGUMENTS\b\|object $ARGUMENTS\b" 2>/dev/null
+done
+```
+
+If the expression is not found in catalyst, also check core:
+
+```bash
+for tag in v3.4.3 v3.5.8 v4.0.1; do
+  dir="/tmp/spark-${tag}"
+  echo "=== $tag ==="
+  find "$dir/sql" -name "*.scala" | \
+    xargs grep -l "case class $ARGUMENTS\b\|object $ARGUMENTS\b" 2>/dev/null
+done
+```
+
+### Read the Spark source for each version
+
+For each Spark version, read the expression file and note:
+
+- The `eval`, `nullSafeEval`, and `doGenCode` / `doGenCodeSafe` methods
+- The `inputTypes` and `dataType` fields (accepted input types, return type)
+- Null handling strategy (`nullable`, `nullSafeEval`)
+- ANSI mode behavior (`ansiEnabled`, `failOnError`)
+- Special cases, guards, `require` assertions, and runtime exceptions
+- Any constants or configuration the expression reads
+
+### Compare across Spark versions
+
+Produce a concise diff summary of what changed between:
+
+- 3.4.3 → 3.5.8
+- 3.5.8 → 4.0.1
+
+Pay attention to:
+
+- New input types added or removed
+- Behavior changes for edge cases (null, overflow, empty, boundary)
+- New ANSI mode branches
+- New parameters or configuration
+- Breaking API changes that Comet must shim
+
+---
+
+## Step 2: Locate the Spark Tests
+
+```bash
+for tag in v3.4.3 v3.5.8 v4.0.1; do
+  dir="/tmp/spark-${tag}"
+  echo "=== $tag ==="
+  find "$dir/sql" -name "*.scala" -path "*/test/*" | \
+    xargs grep -l "$ARGUMENTS" 2>/dev/null
+done
+```
+
+Read the relevant Spark test files and produce a list of:
+
+- Input types covered
+- Edge cases exercised (null, empty, overflow, negative, boundary values, 
special characters, etc.)
+- ANSI mode tests
+- Error cases
+
+This list will be the reference for the coverage gap analysis in Step 5.
+
+---
+
+## Step 3: Locate the Comet Implementation
+
+### Scala serde
+
+```bash
+# Find the serde object
+grep -r "$ARGUMENTS" spark/src/main/scala/org/apache/comet/serde/ 
--include="*.scala" -l
+grep -r "$ARGUMENTS" spark/src/main/scala/org/apache/comet/ 
--include="*.scala" -l | grep -v test
+```
+
+Read the serde implementation and check:
+
+- Which Spark versions the serde handles
+- Whether `getSupportLevel` is implemented and accurate
+- Whether all input types are handled
+- Whether any types are explicitly marked `Unsupported`
+
+### Shims
+
+```bash
+find spark/src/main -name "CometExprShim.scala" | xargs grep -l "$ARGUMENTS" 
2>/dev/null
+```
+
+If shims exist, read them and note any version-specific handling.
+
+### Rust / DataFusion implementation
+
+```bash
+# Search for the function in native/spark-expr
+grep -r "$ARGUMENTS" native/spark-expr/src/ --include="*.rs" -l
+grep -r "$ARGUMENTS" native/core/src/ --include="*.rs" -l
+```
+
+If the expression delegates to DataFusion, find it there too:
+
+```bash
+# Check if there's a DataFusion built-in function with this name
+find native/ -name "Cargo.lock" -exec grep -A2 "datafusion" {} \; | grep 
"version" | head -5
+grep -r "$ARGUMENTS" ~/.cargo/registry/src/ --include="*.rs" -l 2>/dev/null | 
head -10
+```
+
+Read the Rust implementation and check:
+
+- Null handling (does it propagate nulls correctly?)
+- Overflow / error handling (returns `Err` vs panics)
+- Type dispatch (does it handle all types that Spark supports?)
+- ANSI / fail-on-error mode
+
+---
+
+## Step 4: Locate Existing Comet Tests
+
+### SQL file tests
+
+```bash
+# Find SQL test files for this expression
+find spark/src/test/resources/sql-tests/expressions/ -name "*.sql" | \
+  xargs grep -l "$ARGUMENTS" 2>/dev/null
+
+# Also check if there's a dedicated file
+find spark/src/test/resources/sql-tests/expressions/ -name "*$(echo $ARGUMENTS 
| tr '[:upper:]' '[:lower:]')*"
+```
+
+Read every SQL test file found and list:
+
+- Table schemas and data values used
+- Queries exercised
+- Query modes used (`query`, `spark_answer_only`, `tolerance`, `ignore`, 
`expect_error`)
+- Any ConfigMatrix directives
+
+### Scala tests
+
+```bash
+grep -r "$ARGUMENTS" spark/src/test/scala/ --include="*.scala" -l
+```
+
+Read the relevant Scala test files and list:
+
+- Input types covered
+- Edge cases exercised
+- Whether constant folding is disabled for literal tests
+
+---
+
+## Step 5: Gap Analysis
+
+Compare the Spark test coverage (Step 2) against the Comet test coverage (Step 
4). Produce a structured gap report:
+
+### Coverage matrix
+
+For each of the following dimensions, note whether it is covered in Comet 
tests or missing:
+
+| Dimension                                               | Spark tests it | 
Comet SQL test | Comet Scala test | Gap? |
+| ------------------------------------------------------- | -------------- | 
-------------- | ---------------- | ---- |
+| Column reference argument(s)                            |                |   
             |                  |      |
+| Literal argument(s)                                     |                |   
             |                  |      |
+| NULL input                                              |                |   
             |                  |      |
+| Empty string / empty array / empty map                  |                |   
             |                  |      |
+| Zero, negative values (numeric)                         |                |   
             |                  |      |
+| Boundary values (INT_MIN, INT_MAX, Long.MinValue, etc.) |                |   
             |                  |      |
+| NaN, Infinity, -Infinity (float/double)                 |                |   
             |                  |      |
+| Multibyte / special UTF-8 characters                    |                |   
             |                  |      |

Review Comment:
   I would like to make sure this edge cases for UTF-8
       val edgeCases = Seq(
         "é", // unicode 'e\\u{301}'
         "é", // unicode '\\u{e9}'
         "తెలుగు")



##########
.claude/skills/audit-comet-expression/SKILL.md:
##########
@@ -0,0 +1,325 @@
+---
+name: audit-comet-expression
+description: Audit an existing Comet expression for correctness and test 
coverage. Studies the Spark implementation across versions 3.4.3, 3.5.8, and 
4.0.1, reviews the Comet and DataFusion implementations, identifies missing 
test coverage, and offers to implement additional tests.
+argument-hint: <expression-name>
+---
+
+Audit the Comet implementation of the `$ARGUMENTS` expression for correctness 
and test coverage.
+
+## Overview
+
+This audit covers:
+
+1. Spark implementation across versions 3.4.3, 3.5.8, and 4.0.1
+2. Comet Scala serde implementation
+3. Comet Rust / DataFusion implementation
+4. Existing test coverage (SQL file tests and Scala tests)
+5. Gap analysis and test recommendations
+
+---
+
+## Step 1: Locate the Spark Implementations
+
+Clone specific Spark version tags (use shallow clones to avoid polluting the 
workspace). Only clone a version if it is not already present.
+
+```bash
+for tag in v3.4.3 v3.5.8 v4.0.1; do
+  dir="/tmp/spark-${tag}"
+  if [ ! -d "$dir" ]; then
+    git clone --depth 1 --branch "$tag" https://github.com/apache/spark.git 
"$dir"
+  fi
+done
+```
+
+### Find the expression class in each Spark version
+
+Search the Catalyst SQL expressions source:
+
+```bash
+for tag in v3.4.3 v3.5.8 v4.0.1; do
+  dir="/tmp/spark-${tag}"
+  echo "=== $tag ==="
+  find "$dir/sql/catalyst/src/main/scala" -name "*.scala" | \
+    xargs grep -l "case class $ARGUMENTS\b\|object $ARGUMENTS\b" 2>/dev/null
+done
+```
+
+If the expression is not found in catalyst, also check core:
+
+```bash
+for tag in v3.4.3 v3.5.8 v4.0.1; do
+  dir="/tmp/spark-${tag}"
+  echo "=== $tag ==="
+  find "$dir/sql" -name "*.scala" | \
+    xargs grep -l "case class $ARGUMENTS\b\|object $ARGUMENTS\b" 2>/dev/null
+done
+```
+
+### Read the Spark source for each version
+
+For each Spark version, read the expression file and note:
+
+- The `eval`, `nullSafeEval`, and `doGenCode` / `doGenCodeSafe` methods
+- The `inputTypes` and `dataType` fields (accepted input types, return type)
+- Null handling strategy (`nullable`, `nullSafeEval`)
+- ANSI mode behavior (`ansiEnabled`, `failOnError`)
+- Special cases, guards, `require` assertions, and runtime exceptions
+- Any constants or configuration the expression reads
+
+### Compare across Spark versions
+
+Produce a concise diff summary of what changed between:
+
+- 3.4.3 → 3.5.8
+- 3.5.8 → 4.0.1
+
+Pay attention to:
+
+- New input types added or removed
+- Behavior changes for edge cases (null, overflow, empty, boundary)
+- New ANSI mode branches
+- New parameters or configuration
+- Breaking API changes that Comet must shim
+
+---
+
+## Step 2: Locate the Spark Tests
+
+```bash
+for tag in v3.4.3 v3.5.8 v4.0.1; do
+  dir="/tmp/spark-${tag}"
+  echo "=== $tag ==="
+  find "$dir/sql" -name "*.scala" -path "*/test/*" | \
+    xargs grep -l "$ARGUMENTS" 2>/dev/null
+done
+```
+
+Read the relevant Spark test files and produce a list of:
+
+- Input types covered
+- Edge cases exercised (null, empty, overflow, negative, boundary values, 
special characters, etc.)
+- ANSI mode tests
+- Error cases
+
+This list will be the reference for the coverage gap analysis in Step 5.
+
+---
+
+## Step 3: Locate the Comet Implementation
+
+### Scala serde
+
+```bash
+# Find the serde object
+grep -r "$ARGUMENTS" spark/src/main/scala/org/apache/comet/serde/ 
--include="*.scala" -l
+grep -r "$ARGUMENTS" spark/src/main/scala/org/apache/comet/ 
--include="*.scala" -l | grep -v test
+```
+
+Read the serde implementation and check:
+
+- Which Spark versions the serde handles
+- Whether `getSupportLevel` is implemented and accurate
+- Whether all input types are handled
+- Whether any types are explicitly marked `Unsupported`
+
+### Shims
+
+```bash
+find spark/src/main -name "CometExprShim.scala" | xargs grep -l "$ARGUMENTS" 
2>/dev/null
+```
+
+If shims exist, read them and note any version-specific handling.
+
+### Rust / DataFusion implementation
+
+```bash
+# Search for the function in native/spark-expr
+grep -r "$ARGUMENTS" native/spark-expr/src/ --include="*.rs" -l
+grep -r "$ARGUMENTS" native/core/src/ --include="*.rs" -l
+```
+
+If the expression delegates to DataFusion, find it there too:
+
+```bash
+# Check if there's a DataFusion built-in function with this name
+find native/ -name "Cargo.lock" -exec grep -A2 "datafusion" {} \; | grep 
"version" | head -5
+grep -r "$ARGUMENTS" ~/.cargo/registry/src/ --include="*.rs" -l 2>/dev/null | 
head -10
+```
+
+Read the Rust implementation and check:
+
+- Null handling (does it propagate nulls correctly?)
+- Overflow / error handling (returns `Err` vs panics)
+- Type dispatch (does it handle all types that Spark supports?)
+- ANSI / fail-on-error mode
+
+---
+
+## Step 4: Locate Existing Comet Tests
+
+### SQL file tests
+
+```bash
+# Find SQL test files for this expression
+find spark/src/test/resources/sql-tests/expressions/ -name "*.sql" | \
+  xargs grep -l "$ARGUMENTS" 2>/dev/null
+
+# Also check if there's a dedicated file
+find spark/src/test/resources/sql-tests/expressions/ -name "*$(echo $ARGUMENTS 
| tr '[:upper:]' '[:lower:]')*"
+```
+
+Read every SQL test file found and list:
+
+- Table schemas and data values used
+- Queries exercised
+- Query modes used (`query`, `spark_answer_only`, `tolerance`, `ignore`, 
`expect_error`)
+- Any ConfigMatrix directives
+
+### Scala tests
+
+```bash
+grep -r "$ARGUMENTS" spark/src/test/scala/ --include="*.scala" -l
+```
+
+Read the relevant Scala test files and list:
+
+- Input types covered
+- Edge cases exercised
+- Whether constant folding is disabled for literal tests
+
+---
+
+## Step 5: Gap Analysis
+
+Compare the Spark test coverage (Step 2) against the Comet test coverage (Step 
4). Produce a structured gap report:
+
+### Coverage matrix
+
+For each of the following dimensions, note whether it is covered in Comet 
tests or missing:
+
+| Dimension                                               | Spark tests it | 
Comet SQL test | Comet Scala test | Gap? |
+| ------------------------------------------------------- | -------------- | 
-------------- | ---------------- | ---- |
+| Column reference argument(s)                            |                |   
             |                  |      |
+| Literal argument(s)                                     |                |   
             |                  |      |
+| NULL input                                              |                |   
             |                  |      |
+| Empty string / empty array / empty map                  |                |   
             |                  |      |

Review Comment:
   What about array with "null" elements?



##########
.claude/skills/audit-comet-expression/SKILL.md:
##########
@@ -0,0 +1,325 @@
+---
+name: audit-comet-expression
+description: Audit an existing Comet expression for correctness and test 
coverage. Studies the Spark implementation across versions 3.4.3, 3.5.8, and 
4.0.1, reviews the Comet and DataFusion implementations, identifies missing 
test coverage, and offers to implement additional tests.
+argument-hint: <expression-name>
+---
+
+Audit the Comet implementation of the `$ARGUMENTS` expression for correctness 
and test coverage.
+
+## Overview
+
+This audit covers:
+
+1. Spark implementation across versions 3.4.3, 3.5.8, and 4.0.1
+2. Comet Scala serde implementation
+3. Comet Rust / DataFusion implementation
+4. Existing test coverage (SQL file tests and Scala tests)
+5. Gap analysis and test recommendations
+
+---
+
+## Step 1: Locate the Spark Implementations
+
+Clone specific Spark version tags (use shallow clones to avoid polluting the 
workspace). Only clone a version if it is not already present.
+
+```bash
+for tag in v3.4.3 v3.5.8 v4.0.1; do
+  dir="/tmp/spark-${tag}"
+  if [ ! -d "$dir" ]; then
+    git clone --depth 1 --branch "$tag" https://github.com/apache/spark.git 
"$dir"
+  fi
+done
+```
+
+### Find the expression class in each Spark version
+
+Search the Catalyst SQL expressions source:
+
+```bash
+for tag in v3.4.3 v3.5.8 v4.0.1; do
+  dir="/tmp/spark-${tag}"
+  echo "=== $tag ==="
+  find "$dir/sql/catalyst/src/main/scala" -name "*.scala" | \
+    xargs grep -l "case class $ARGUMENTS\b\|object $ARGUMENTS\b" 2>/dev/null
+done
+```
+
+If the expression is not found in catalyst, also check core:
+
+```bash
+for tag in v3.4.3 v3.5.8 v4.0.1; do
+  dir="/tmp/spark-${tag}"
+  echo "=== $tag ==="
+  find "$dir/sql" -name "*.scala" | \
+    xargs grep -l "case class $ARGUMENTS\b\|object $ARGUMENTS\b" 2>/dev/null
+done
+```
+
+### Read the Spark source for each version
+
+For each Spark version, read the expression file and note:
+
+- The `eval`, `nullSafeEval`, and `doGenCode` / `doGenCodeSafe` methods
+- The `inputTypes` and `dataType` fields (accepted input types, return type)
+- Null handling strategy (`nullable`, `nullSafeEval`)
+- ANSI mode behavior (`ansiEnabled`, `failOnError`)
+- Special cases, guards, `require` assertions, and runtime exceptions
+- Any constants or configuration the expression reads
+
+### Compare across Spark versions
+
+Produce a concise diff summary of what changed between:
+
+- 3.4.3 → 3.5.8
+- 3.5.8 → 4.0.1
+
+Pay attention to:
+
+- New input types added or removed
+- Behavior changes for edge cases (null, overflow, empty, boundary)
+- New ANSI mode branches
+- New parameters or configuration
+- Breaking API changes that Comet must shim
+
+---
+
+## Step 2: Locate the Spark Tests
+
+```bash
+for tag in v3.4.3 v3.5.8 v4.0.1; do
+  dir="/tmp/spark-${tag}"
+  echo "=== $tag ==="
+  find "$dir/sql" -name "*.scala" -path "*/test/*" | \
+    xargs grep -l "$ARGUMENTS" 2>/dev/null
+done
+```
+
+Read the relevant Spark test files and produce a list of:
+
+- Input types covered
+- Edge cases exercised (null, empty, overflow, negative, boundary values, 
special characters, etc.)
+- ANSI mode tests
+- Error cases
+
+This list will be the reference for the coverage gap analysis in Step 5.
+
+---
+
+## Step 3: Locate the Comet Implementation
+
+### Scala serde
+
+```bash
+# Find the serde object
+grep -r "$ARGUMENTS" spark/src/main/scala/org/apache/comet/serde/ 
--include="*.scala" -l
+grep -r "$ARGUMENTS" spark/src/main/scala/org/apache/comet/ 
--include="*.scala" -l | grep -v test
+```
+
+Read the serde implementation and check:
+
+- Which Spark versions the serde handles
+- Whether `getSupportLevel` is implemented and accurate
+- Whether all input types are handled
+- Whether any types are explicitly marked `Unsupported`
+
+### Shims
+
+```bash
+find spark/src/main -name "CometExprShim.scala" | xargs grep -l "$ARGUMENTS" 
2>/dev/null
+```
+
+If shims exist, read them and note any version-specific handling.
+
+### Rust / DataFusion implementation
+
+```bash
+# Search for the function in native/spark-expr
+grep -r "$ARGUMENTS" native/spark-expr/src/ --include="*.rs" -l
+grep -r "$ARGUMENTS" native/core/src/ --include="*.rs" -l
+```
+
+If the expression delegates to DataFusion, find it there too:
+
+```bash
+# Check if there's a DataFusion built-in function with this name
+find native/ -name "Cargo.lock" -exec grep -A2 "datafusion" {} \; | grep 
"version" | head -5
+grep -r "$ARGUMENTS" ~/.cargo/registry/src/ --include="*.rs" -l 2>/dev/null | 
head -10
+```
+
+Read the Rust implementation and check:
+
+- Null handling (does it propagate nulls correctly?)
+- Overflow / error handling (returns `Err` vs panics)
+- Type dispatch (does it handle all types that Spark supports?)
+- ANSI / fail-on-error mode
+
+---
+
+## Step 4: Locate Existing Comet Tests
+
+### SQL file tests
+
+```bash
+# Find SQL test files for this expression
+find spark/src/test/resources/sql-tests/expressions/ -name "*.sql" | \
+  xargs grep -l "$ARGUMENTS" 2>/dev/null
+
+# Also check if there's a dedicated file
+find spark/src/test/resources/sql-tests/expressions/ -name "*$(echo $ARGUMENTS 
| tr '[:upper:]' '[:lower:]')*"
+```
+
+Read every SQL test file found and list:
+
+- Table schemas and data values used
+- Queries exercised
+- Query modes used (`query`, `spark_answer_only`, `tolerance`, `ignore`, 
`expect_error`)
+- Any ConfigMatrix directives
+
+### Scala tests
+
+```bash
+grep -r "$ARGUMENTS" spark/src/test/scala/ --include="*.scala" -l
+```
+
+Read the relevant Scala test files and list:
+
+- Input types covered
+- Edge cases exercised
+- Whether constant folding is disabled for literal tests
+
+---
+
+## Step 5: Gap Analysis
+
+Compare the Spark test coverage (Step 2) against the Comet test coverage (Step 
4). Produce a structured gap report:
+
+### Coverage matrix
+
+For each of the following dimensions, note whether it is covered in Comet 
tests or missing:
+
+| Dimension                                               | Spark tests it | 
Comet SQL test | Comet Scala test | Gap? |
+| ------------------------------------------------------- | -------------- | 
-------------- | ---------------- | ---- |
+| Column reference argument(s)                            |                |   
             |                  |      |
+| Literal argument(s)                                     |                |   
             |                  |      |
+| NULL input                                              |                |   
             |                  |      |
+| Empty string / empty array / empty map                  |                |   
             |                  |      |
+| Zero, negative values (numeric)                         |                |   
             |                  |      |
+| Boundary values (INT_MIN, INT_MAX, Long.MinValue, etc.) |                |   
             |                  |      |

Review Comment:
   Should we specifically say minimum positive number?



##########
.claude/skills/audit-comet-expression/SKILL.md:
##########
@@ -0,0 +1,325 @@
+---
+name: audit-comet-expression
+description: Audit an existing Comet expression for correctness and test 
coverage. Studies the Spark implementation across versions 3.4.3, 3.5.8, and 
4.0.1, reviews the Comet and DataFusion implementations, identifies missing 
test coverage, and offers to implement additional tests.
+argument-hint: <expression-name>
+---
+
+Audit the Comet implementation of the `$ARGUMENTS` expression for correctness 
and test coverage.
+
+## Overview
+
+This audit covers:
+
+1. Spark implementation across versions 3.4.3, 3.5.8, and 4.0.1
+2. Comet Scala serde implementation
+3. Comet Rust / DataFusion implementation
+4. Existing test coverage (SQL file tests and Scala tests)
+5. Gap analysis and test recommendations
+
+---
+
+## Step 1: Locate the Spark Implementations
+
+Clone specific Spark version tags (use shallow clones to avoid polluting the 
workspace). Only clone a version if it is not already present.
+
+```bash
+for tag in v3.4.3 v3.5.8 v4.0.1; do
+  dir="/tmp/spark-${tag}"
+  if [ ! -d "$dir" ]; then
+    git clone --depth 1 --branch "$tag" https://github.com/apache/spark.git 
"$dir"
+  fi
+done
+```
+
+### Find the expression class in each Spark version
+
+Search the Catalyst SQL expressions source:
+
+```bash
+for tag in v3.4.3 v3.5.8 v4.0.1; do
+  dir="/tmp/spark-${tag}"
+  echo "=== $tag ==="
+  find "$dir/sql/catalyst/src/main/scala" -name "*.scala" | \
+    xargs grep -l "case class $ARGUMENTS\b\|object $ARGUMENTS\b" 2>/dev/null
+done
+```
+
+If the expression is not found in catalyst, also check core:
+
+```bash
+for tag in v3.4.3 v3.5.8 v4.0.1; do
+  dir="/tmp/spark-${tag}"
+  echo "=== $tag ==="
+  find "$dir/sql" -name "*.scala" | \
+    xargs grep -l "case class $ARGUMENTS\b\|object $ARGUMENTS\b" 2>/dev/null
+done
+```
+
+### Read the Spark source for each version
+
+For each Spark version, read the expression file and note:
+
+- The `eval`, `nullSafeEval`, and `doGenCode` / `doGenCodeSafe` methods
+- The `inputTypes` and `dataType` fields (accepted input types, return type)
+- Null handling strategy (`nullable`, `nullSafeEval`)
+- ANSI mode behavior (`ansiEnabled`, `failOnError`)
+- Special cases, guards, `require` assertions, and runtime exceptions
+- Any constants or configuration the expression reads
+
+### Compare across Spark versions
+
+Produce a concise diff summary of what changed between:
+
+- 3.4.3 → 3.5.8
+- 3.5.8 → 4.0.1
+
+Pay attention to:
+
+- New input types added or removed
+- Behavior changes for edge cases (null, overflow, empty, boundary)
+- New ANSI mode branches
+- New parameters or configuration
+- Breaking API changes that Comet must shim
+
+---
+
+## Step 2: Locate the Spark Tests
+
+```bash
+for tag in v3.4.3 v3.5.8 v4.0.1; do
+  dir="/tmp/spark-${tag}"
+  echo "=== $tag ==="
+  find "$dir/sql" -name "*.scala" -path "*/test/*" | \
+    xargs grep -l "$ARGUMENTS" 2>/dev/null
+done
+```
+
+Read the relevant Spark test files and produce a list of:
+
+- Input types covered
+- Edge cases exercised (null, empty, overflow, negative, boundary values, 
special characters, etc.)
+- ANSI mode tests
+- Error cases
+
+This list will be the reference for the coverage gap analysis in Step 5.
+
+---
+
+## Step 3: Locate the Comet Implementation
+
+### Scala serde
+
+```bash
+# Find the serde object
+grep -r "$ARGUMENTS" spark/src/main/scala/org/apache/comet/serde/ 
--include="*.scala" -l
+grep -r "$ARGUMENTS" spark/src/main/scala/org/apache/comet/ 
--include="*.scala" -l | grep -v test
+```
+
+Read the serde implementation and check:
+
+- Which Spark versions the serde handles
+- Whether `getSupportLevel` is implemented and accurate
+- Whether all input types are handled
+- Whether any types are explicitly marked `Unsupported`
+
+### Shims
+
+```bash
+find spark/src/main -name "CometExprShim.scala" | xargs grep -l "$ARGUMENTS" 
2>/dev/null
+```
+
+If shims exist, read them and note any version-specific handling.
+
+### Rust / DataFusion implementation
+
+```bash
+# Search for the function in native/spark-expr
+grep -r "$ARGUMENTS" native/spark-expr/src/ --include="*.rs" -l
+grep -r "$ARGUMENTS" native/core/src/ --include="*.rs" -l
+```
+
+If the expression delegates to DataFusion, find it there too:
+
+```bash
+# Check if there's a DataFusion built-in function with this name
+find native/ -name "Cargo.lock" -exec grep -A2 "datafusion" {} \; | grep 
"version" | head -5
+grep -r "$ARGUMENTS" ~/.cargo/registry/src/ --include="*.rs" -l 2>/dev/null | 
head -10
+```
+
+Read the Rust implementation and check:
+
+- Null handling (does it propagate nulls correctly?)
+- Overflow / error handling (returns `Err` vs panics)
+- Type dispatch (does it handle all types that Spark supports?)
+- ANSI / fail-on-error mode
+
+---
+
+## Step 4: Locate Existing Comet Tests
+
+### SQL file tests
+
+```bash
+# Find SQL test files for this expression
+find spark/src/test/resources/sql-tests/expressions/ -name "*.sql" | \
+  xargs grep -l "$ARGUMENTS" 2>/dev/null
+
+# Also check if there's a dedicated file
+find spark/src/test/resources/sql-tests/expressions/ -name "*$(echo $ARGUMENTS 
| tr '[:upper:]' '[:lower:]')*"
+```
+
+Read every SQL test file found and list:
+
+- Table schemas and data values used
+- Queries exercised
+- Query modes used (`query`, `spark_answer_only`, `tolerance`, `ignore`, 
`expect_error`)
+- Any ConfigMatrix directives
+
+### Scala tests
+
+```bash
+grep -r "$ARGUMENTS" spark/src/test/scala/ --include="*.scala" -l
+```
+
+Read the relevant Scala test files and list:
+
+- Input types covered
+- Edge cases exercised
+- Whether constant folding is disabled for literal tests
+
+---
+
+## Step 5: Gap Analysis
+
+Compare the Spark test coverage (Step 2) against the Comet test coverage (Step 
4). Produce a structured gap report:
+
+### Coverage matrix
+
+For each of the following dimensions, note whether it is covered in Comet 
tests or missing:
+
+| Dimension                                               | Spark tests it | 
Comet SQL test | Comet Scala test | Gap? |
+| ------------------------------------------------------- | -------------- | 
-------------- | ---------------- | ---- |
+| Column reference argument(s)                            |                |   
             |                  |      |
+| Literal argument(s)                                     |                |   
             |                  |      |
+| NULL input                                              |                |   
             |                  |      |
+| Empty string / empty array / empty map                  |                |   
             |                  |      |
+| Zero, negative values (numeric)                         |                |   
             |                  |      |
+| Boundary values (INT_MIN, INT_MAX, Long.MinValue, etc.) |                |   
             |                  |      |
+| NaN, Infinity, -Infinity (float/double)                 |                |   
             |                  |      |

Review Comment:
   Would you add negative zero as well as subnormal float/double?
   



##########
.claude/skills/audit-comet-expression/SKILL.md:
##########
@@ -0,0 +1,325 @@
+---
+name: audit-comet-expression
+description: Audit an existing Comet expression for correctness and test 
coverage. Studies the Spark implementation across versions 3.4.3, 3.5.8, and 
4.0.1, reviews the Comet and DataFusion implementations, identifies missing 
test coverage, and offers to implement additional tests.
+argument-hint: <expression-name>
+---
+
+Audit the Comet implementation of the `$ARGUMENTS` expression for correctness 
and test coverage.
+
+## Overview
+
+This audit covers:
+
+1. Spark implementation across versions 3.4.3, 3.5.8, and 4.0.1
+2. Comet Scala serde implementation
+3. Comet Rust / DataFusion implementation
+4. Existing test coverage (SQL file tests and Scala tests)
+5. Gap analysis and test recommendations
+
+---
+
+## Step 1: Locate the Spark Implementations
+
+Clone specific Spark version tags (use shallow clones to avoid polluting the 
workspace). Only clone a version if it is not already present.
+
+```bash
+for tag in v3.4.3 v3.5.8 v4.0.1; do
+  dir="/tmp/spark-${tag}"
+  if [ ! -d "$dir" ]; then
+    git clone --depth 1 --branch "$tag" https://github.com/apache/spark.git 
"$dir"
+  fi
+done
+```
+
+### Find the expression class in each Spark version
+
+Search the Catalyst SQL expressions source:
+
+```bash
+for tag in v3.4.3 v3.5.8 v4.0.1; do
+  dir="/tmp/spark-${tag}"
+  echo "=== $tag ==="
+  find "$dir/sql/catalyst/src/main/scala" -name "*.scala" | \
+    xargs grep -l "case class $ARGUMENTS\b\|object $ARGUMENTS\b" 2>/dev/null
+done
+```
+
+If the expression is not found in catalyst, also check core:
+
+```bash
+for tag in v3.4.3 v3.5.8 v4.0.1; do
+  dir="/tmp/spark-${tag}"
+  echo "=== $tag ==="
+  find "$dir/sql" -name "*.scala" | \
+    xargs grep -l "case class $ARGUMENTS\b\|object $ARGUMENTS\b" 2>/dev/null
+done
+```
+
+### Read the Spark source for each version
+
+For each Spark version, read the expression file and note:
+
+- The `eval`, `nullSafeEval`, and `doGenCode` / `doGenCodeSafe` methods
+- The `inputTypes` and `dataType` fields (accepted input types, return type)
+- Null handling strategy (`nullable`, `nullSafeEval`)
+- ANSI mode behavior (`ansiEnabled`, `failOnError`)
+- Special cases, guards, `require` assertions, and runtime exceptions
+- Any constants or configuration the expression reads
+
+### Compare across Spark versions
+
+Produce a concise diff summary of what changed between:
+
+- 3.4.3 → 3.5.8
+- 3.5.8 → 4.0.1
+
+Pay attention to:
+
+- New input types added or removed
+- Behavior changes for edge cases (null, overflow, empty, boundary)
+- New ANSI mode branches
+- New parameters or configuration
+- Breaking API changes that Comet must shim
+
+---
+
+## Step 2: Locate the Spark Tests
+
+```bash
+for tag in v3.4.3 v3.5.8 v4.0.1; do
+  dir="/tmp/spark-${tag}"
+  echo "=== $tag ==="
+  find "$dir/sql" -name "*.scala" -path "*/test/*" | \
+    xargs grep -l "$ARGUMENTS" 2>/dev/null
+done
+```
+
+Read the relevant Spark test files and produce a list of:
+
+- Input types covered
+- Edge cases exercised (null, empty, overflow, negative, boundary values, 
special characters, etc.)
+- ANSI mode tests
+- Error cases
+
+This list will be the reference for the coverage gap analysis in Step 5.
+
+---
+
+## Step 3: Locate the Comet Implementation
+
+### Scala serde
+
+```bash
+# Find the serde object
+grep -r "$ARGUMENTS" spark/src/main/scala/org/apache/comet/serde/ 
--include="*.scala" -l
+grep -r "$ARGUMENTS" spark/src/main/scala/org/apache/comet/ 
--include="*.scala" -l | grep -v test
+```
+
+Read the serde implementation and check:
+
+- Which Spark versions the serde handles
+- Whether `getSupportLevel` is implemented and accurate
+- Whether all input types are handled
+- Whether any types are explicitly marked `Unsupported`
+
+### Shims
+
+```bash
+find spark/src/main -name "CometExprShim.scala" | xargs grep -l "$ARGUMENTS" 
2>/dev/null
+```
+
+If shims exist, read them and note any version-specific handling.
+
+### Rust / DataFusion implementation
+
+```bash
+# Search for the function in native/spark-expr
+grep -r "$ARGUMENTS" native/spark-expr/src/ --include="*.rs" -l
+grep -r "$ARGUMENTS" native/core/src/ --include="*.rs" -l
+```
+
+If the expression delegates to DataFusion, find it there too:
+
+```bash
+# Check if there's a DataFusion built-in function with this name
+find native/ -name "Cargo.lock" -exec grep -A2 "datafusion" {} \; | grep 
"version" | head -5
+grep -r "$ARGUMENTS" ~/.cargo/registry/src/ --include="*.rs" -l 2>/dev/null | 
head -10
+```
+
+Read the Rust implementation and check:
+
+- Null handling (does it propagate nulls correctly?)
+- Overflow / error handling (returns `Err` vs panics)

Review Comment:
   And underflow



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] feat: add audit-comet-expression Claude Code skill [datafusion-comet]

Reply via email to