Copilot commented on code in PR #16261:
URL: https://github.com/apache/lucene/pull/16261#discussion_r3430348822


##########
lucene/core/src/java/org/apache/lucene/analysis/CharArraySet.java:
##########
@@ -166,6 +167,31 @@ public Iterator<Object> iterator() {
     return map.originalKeySet().iterator();
   }
 
+  /** Returns {@code true} if this set matches entries case-insensitively. */
+  public boolean isIgnoreCase() {
+    return map.isIgnoreCase();
+  }
+
+  @Override
+  public boolean equals(Object o) {
+    if (o == this) return true;
+    if (!(o instanceof CharArraySet other)) return false;
+    if (isIgnoreCase() != other.isIgnoreCase()) return false;
+    if (size() != other.size()) return false;
+    return containsAll(other);
+  }
+
+  @Override
+  public int hashCode() {
+    int h = Boolean.hashCode(isIgnoreCase());
+    for (char[] key : map.keys) {
+      if (key != null) {
+        h += Arrays.hashCode(key);
+      }
+    }

Review Comment:
   `hashCode()` iterates over `map.keys` (the backing hash table array), which 
makes the runtime proportional to the table capacity rather than the number of 
elements. This can be significantly more expensive for sparse tables. Prefer 
iterating only over present keys (e.g., via `map.originalKeySet()` / the set 
iterator) to keep `hashCode()` closer to O(size).
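
A minimal sketch of the suggestion, not the PR's code: it contrasts scanning a backing array (null slots included) with summing over an iterable of present keys. The class and method names are hypothetical, and the `char[]` cast assumes the set iterator yields `char[]` instances. Whether this is cheaper in practice depends on how the real iterator walks the table, which the diff does not show.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in; it does not use Lucene's CharArraySet or CharArrayMap.
class PresentKeysHashSketch {

  /** Mirrors the loop in the diff: visits every slot of the table, empty or not. */
  public static int hashByScanningTable(char[][] table, boolean ignoreCase) {
    int h = Boolean.hashCode(ignoreCase);
    for (char[] key : table) {
      if (key != null) {
        h += Arrays.hashCode(key);
      }
    }
    return h;
  }

  /** Sums over whatever the iterable yields, so this loop never sees an empty slot. */
  public static int hashOfPresentKeys(Iterable<?> presentKeys, boolean ignoreCase) {
    int h = Boolean.hashCode(ignoreCase);
    for (Object key : presentKeys) {
      // Assumption: each element is a char[], as with the set's Iterator<Object>.
      h += Arrays.hashCode((char[]) key);
    }
    return h;
  }

  public static void main(String[] args) {
    // A sparse table: 16 slots, 2 occupied.
    char[][] table = new char[16][];
    table[2] = "test".toCharArray();
    table[11] = "test2".toCharArray();
    List<Object> present = List.of(table[2], table[11]);

    // Addition is commutative, so both traversals produce the same value.
    System.out.println(
        hashByScanningTable(table, false) == hashOfPresentKeys(present, false));
  }
}
```

Both methods return the same value for the same keys; only the number of slots visited differs.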



##########
lucene/core/src/test/org/apache/lucene/analysis/TestCharArraySet.java:
##########
@@ -373,8 +373,34 @@ public void testContainsWithNull() {
 
   public void testToString() {
     CharArraySet set = CharArraySet.copy(Collections.singleton("test"));
-    assertEquals("[test]", set.toString());
+    assertEquals("[test](ignoreCase=false)", set.toString());
     set.add("test2");
     assertTrue(set.toString().contains(", "));
+
+    CharArraySet ignoreCase = new CharArraySet(Collections.singleton("test"), true);
+    assertEquals("[test](ignoreCase=true)", ignoreCase.toString());
+  }
+
+  public void testEqualsAndHashCode_sameContentSameIgnoreCase() {
+    for (boolean ignoreCase : new boolean[] {false, true}) {
+      CharArraySet a = new CharArraySet(Arrays.asList(TEST_STOP_WORDS), ignoreCase);
+      CharArraySet b = new CharArraySet(Arrays.asList(TEST_STOP_WORDS), ignoreCase);
+      assertNotSame(a, b);
+      assertEquals(a, b);
+      assertEquals(a.hashCode(), b.hashCode());
+    }
+  }

Review Comment:
   The new coverage verifies same-content/same-mode equality, but it doesn’t 
cover the key contract scenario for `ignoreCase=true` where the two sets 
contain the same terms with different casing (e.g., `["Hund"]` vs 
`["hund"]`). Add a test asserting `equals()` and `hashCode()` are consistent 
in that case to prevent regressions in the ignore-case hashing behavior.



##########
lucene/core/src/java/org/apache/lucene/analysis/CharArraySet.java:
##########
@@ -166,6 +167,31 @@ public Iterator<Object> iterator() {
     return map.originalKeySet().iterator();
   }
 
+  /** Returns {@code true} if this set matches entries case-insensitively. */
+  public boolean isIgnoreCase() {
+    return map.isIgnoreCase();
+  }
+
+  @Override
+  public boolean equals(Object o) {
+    if (o == this) return true;
+    if (!(o instanceof CharArraySet other)) return false;
+    if (isIgnoreCase() != other.isIgnoreCase()) return false;
+    if (size() != other.size()) return false;
+    return containsAll(other);
+  }
+
+  @Override
+  public int hashCode() {
+    int h = Boolean.hashCode(isIgnoreCase());
+    for (char[] key : map.keys) {
+      if (key != null) {
+        h += Arrays.hashCode(key);
+      }
+    }
+    return h;
+  }

Review Comment:
   For `ignoreCase=true`, `equals()` is case-insensitive, but `hashCode()` 
currently uses `Arrays.hashCode(key)` which is case-sensitive for `char[]`. Two 
sets that are equal under case-insensitive matching (e.g., containing 
`"Test"` vs `"test"`) can end up with different hash codes, violating the 
`equals`/`hashCode` contract. Compute each element’s hash using the same 
case-folding logic used by `CharArrayMap` for ignore-case matching (i.e., a 
case-insensitive hash of the character content) so `hashCode()` aligns with 
`equals()`.
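
A minimal sketch of a case-folded element hash, assuming keys may be held with their original casing (the diff does not show whether stored keys are already normalized). The names `elementHash` and `setHash` are hypothetical. Lower-casing per code point is one plausible folding and is not confirmed to be what `CharArrayMap` does.

```java
import java.util.List;

// Hypothetical stand-in; it does not use Lucene's CharArraySet or CharArrayMap.
class IgnoreCaseHashSketch {

  /** Hashes one entry, lower-casing each code point first when ignoreCase is set. */
  public static int elementHash(char[] text, boolean ignoreCase) {
    int code = 0;
    for (int i = 0; i < text.length; ) {
      int cp = Character.codePointAt(text, i);
      i += Character.charCount(cp); // advance by 1 or 2 chars (surrogate pairs)
      code = 31 * code + (ignoreCase ? Character.toLowerCase(cp) : cp);
    }
    return code;
  }

  /** Order-independent set hash: the mode flag's hash plus the sum of element hashes. */
  public static int setHash(List<char[]> presentKeys, boolean ignoreCase) {
    int h = Boolean.hashCode(ignoreCase);
    for (char[] key : presentKeys) {
      h += elementHash(key, ignoreCase);
    }
    return h;
  }

  public static void main(String[] args) {
    List<char[]> a = List.of("Hund".toCharArray(), "Test".toCharArray());
    List<char[]> b = List.of("test".toCharArray(), "hund".toCharArray());

    // Same terms, different casing and order: equal hashes once case is folded.
    System.out.println(setHash(a, true) == setHash(b, true)); // prints "true"

    // Without folding, the casing difference shows up in the hash.
    System.out.println(setHash(a, false) == setHash(b, false)); // prints "false"
  }
}
```

Summing the element hashes keeps the result independent of iteration order, and folding case before hashing keeps it consistent with a case-insensitive `equals()`.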



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

