Re: [PR] Unicode Support for Case Insensitive Matching in RegExp [lucene]

2025-02-18 Thread via GitHub
rmuir commented on PR #14192: URL: https://github.com/apache/lucene/pull/14192#issuecomment-2667264982

```java
var re = new RegExp("παραστάσεις", RegExp.NONE, RegExp.CASE_INSENSITIVE);
System.out.println(re.toAutomaton().toDot());
```

![Screen_Shot_2025-02-18_at_20 03 30](https:/

Re: [PR] Unicode Support for Case Insensitive Matching in RegExp [lucene]

2025-02-18 Thread via GitHub
john-wagster commented on PR #14192: URL: https://github.com/apache/lucene/pull/14192#issuecomment-2667222184 awesome! thank you sir. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specif

Re: [PR] Unicode Support for Case Insensitive Matching in RegExp [lucene]

2025-02-18 Thread via GitHub
rmuir merged PR #14192: URL: https://github.com/apache/lucene/pull/14192

Re: [PR] Unicode Support for Case Insensitive Matching in RegExp [lucene]

2025-02-18 Thread via GitHub
rmuir commented on PR #14192: URL: https://github.com/apache/lucene/pull/14192#issuecomment-2667183580 @john-wagster this looks great! Thank you for simplifying this down as a first step. I will merge it in after CI checks.

Re: [PR] Unicode Support for Case Insensitive Matching in RegExp [lucene]

2025-02-18 Thread via GitHub
john-wagster commented on PR #14192: URL: https://github.com/apache/lucene/pull/14192#issuecomment-2665979362 Apologies @rmuir I forgot to request review after my last set of changes; just did so now.

Re: [PR] Unicode Support for Case Insensitive Matching in RegExp [lucene]

2025-02-11 Thread via GitHub
john-wagster commented on PR #14192: URL: https://github.com/apache/lucene/pull/14192#issuecomment-2651490994 @rmuir I made another pass based on your feedback, and I agree with keeping this simple for a first pass. To that end I've done the following: * CaseFolding is no

Re: [PR] Unicode Support for Case Insensitive Matching in RegExp [lucene]

2025-02-10 Thread via GitHub
rmuir commented on code in PR #14192: URL: https://github.com/apache/lucene/pull/14192#discussion_r1949856748 ## lucene/core/src/java/org/apache/lucene/util/automaton/CaseFoldingUtil.java: ## @@ -0,0 +1,338 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

Re: [PR] Unicode Support for Case Insensitive Matching in RegExp [lucene]

2025-02-10 Thread via GitHub
rmuir commented on code in PR #14192: URL: https://github.com/apache/lucene/pull/14192#discussion_r1949836434 ## lucene/core/src/java/org/apache/lucene/util/automaton/CaseFolding.java: ## @@ -0,0 +1,82 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

Re: [PR] Unicode Support for Case Insensitive Matching in RegExp [lucene]

2025-02-10 Thread via GitHub
john-wagster commented on code in PR #14192: URL: https://github.com/apache/lucene/pull/14192#discussion_r1949691308 ## lucene/core/src/java/org/apache/lucene/util/automaton/CaseFoldingUtil.java: ## @@ -0,0 +1,338 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] Unicode Support for Case Insensitive Matching in RegExp [lucene]

2025-02-07 Thread via GitHub
rmuir commented on code in PR #14192: URL: https://github.com/apache/lucene/pull/14192#discussion_r1946735208 ## lucene/core/src/java/org/apache/lucene/util/automaton/CaseFoldingUtil.java: ## @@ -0,0 +1,338 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

Re: [PR] Unicode Support for Case Insensitive Matching in RegExp [lucene]

2025-02-07 Thread via GitHub
john-wagster commented on code in PR #14192: URL: https://github.com/apache/lucene/pull/14192#discussion_r1946628495 ## lucene/core/src/java/org/apache/lucene/util/automaton/CaseFoldingUtil.java: ## @@ -0,0 +1,338 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] Unicode Support for Case Insensitive Matching in RegExp [lucene]

2025-02-06 Thread via GitHub
rmuir commented on code in PR #14192: URL: https://github.com/apache/lucene/pull/14192#discussion_r1945855034 ## lucene/core/src/java/org/apache/lucene/util/automaton/CaseFoldingUtil.java: ## @@ -0,0 +1,338 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

Re: [PR] Unicode Support for Case Insensitive Matching in RegExp [lucene]

2025-02-06 Thread via GitHub
john-wagster commented on PR #14192: URL: https://github.com/apache/lucene/pull/14192#issuecomment-2641159156 Iterated here a bit after the changes in https://github.com/apache/lucene/pull/14193 went in and also pivoted to using https://www.unicode.org/Public/16.0.0/ucd/CaseFolding.txt. I
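For readers unfamiliar with the data file mentioned above: each data line in CaseFolding.txt has the form `<code>; <status>; <mapping>; # <name>`, where the status is `C` (common), `F` (full, possibly multi-character), `S` (simple) or `T` (Turkic special case). The following is a minimal sketch of reading only the single-code-point `C` and `S` entries. The class and method names are hypothetical and this is not the parsing code from the PR, only an illustration of the file format.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not the PR's CaseFolding class.
public class CaseFoldingParseSketch {

  /**
   * Parses lines in CaseFolding.txt format and returns a map from a code point
   * to its simple case folding. Only statuses C and S are kept; F (multi-code-point)
   * and T (Turkic) entries are skipped in this sketch.
   */
  static Map<Integer, Integer> parseSimpleFoldings(List<String> lines) {
    Map<Integer, Integer> folds = new HashMap<>();
    for (String raw : lines) {
      // Drop the trailing "# NAME" comment, then surrounding whitespace.
      int hash = raw.indexOf('#');
      String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
      if (line.isEmpty()) {
        continue; // blank line or comment-only line
      }
      String[] fields = line.split(";");
      if (fields.length < 3) {
        continue; // not a data line
      }
      String status = fields[1].trim();
      if (!status.equals("C") && !status.equals("S")) {
        continue; // skip F and T entries
      }
      int source = Integer.parseInt(fields[0].trim(), 16);
      int target = Integer.parseInt(fields[2].trim(), 16);
      folds.put(source, target);
    }
    return folds;
  }

  public static void main(String[] args) {
    List<String> sample = List.of(
        "# CaseFolding-style sample",
        "0041; C; 0061; # LATIN CAPITAL LETTER A",
        "00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S",
        "03A3; C; 03C3; # GREEK CAPITAL LETTER SIGMA",
        "03C2; C; 03C3; # GREEK SMALL LETTER FINAL SIGMA");
    parseSimpleFoldings(sample).forEach(
        (from, to) -> System.out.printf("U+%04X -> U+%04X%n", from, to));
  }
}
```

Code points that fold to the same target (here capital sigma and final sigma both fold to small sigma) form one case-insensitive equivalence class, which is the property a case-insensitive matcher needs.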

Re: [PR] Unicode Support for Case Insensitive Matching in RegExp [lucene]

2025-02-04 Thread via GitHub
rmuir commented on code in PR #14192: URL: https://github.com/apache/lucene/pull/14192#discussion_r1942013537 ## lucene/core/src/java/org/apache/lucene/util/automaton/RegExp.java: ## @@ -696,17 +896,52 @@ private Automaton toAutomaton( return a; } - private Automaton

Re: [PR] Unicode Support for Case Insensitive Matching in RegExp [lucene]

2025-02-04 Thread via GitHub
rmuir commented on code in PR #14192: URL: https://github.com/apache/lucene/pull/14192#discussion_r1942009773 ## lucene/core/src/java/org/apache/lucene/util/automaton/RegExp.java: ## @@ -436,6 +478,160 @@ public enum Kind { */ @Deprecated public static final int DEPRECATE

Re: [PR] Unicode Support for Case Insensitive Matching in RegExp [lucene]

2025-02-04 Thread via GitHub
john-wagster commented on code in PR #14192: URL: https://github.com/apache/lucene/pull/14192#discussion_r1941963110 ## lucene/core/src/java/org/apache/lucene/util/automaton/RegExp.java: ## @@ -696,17 +896,52 @@ private Automaton toAutomaton( return a; } - private Aut

Re: [PR] Unicode Support for Case Insensitive Matching in RegExp [lucene]

2025-02-04 Thread via GitHub
john-wagster commented on code in PR #14192: URL: https://github.com/apache/lucene/pull/14192#discussion_r1941960613 ## lucene/core/src/java/org/apache/lucene/util/automaton/RegExp.java: ## @@ -436,6 +478,160 @@ public enum Kind { */ @Deprecated public static final int DE

Re: [PR] Unicode Support for Case Insensitive Matching in RegExp [lucene]

2025-02-03 Thread via GitHub
rmuir commented on code in PR #14192: URL: https://github.com/apache/lucene/pull/14192#discussion_r1940403543 ## lucene/core/src/java/org/apache/lucene/util/automaton/RegExp.java: ## @@ -436,6 +478,160 @@ public enum Kind { */ @Deprecated public static final int DEPRECATE

Re: [PR] Unicode Support for Case Insensitive Matching in RegExp [lucene]

2025-02-03 Thread via GitHub
rmuir commented on code in PR #14192: URL: https://github.com/apache/lucene/pull/14192#discussion_r1940371117 ## lucene/core/src/java/org/apache/lucene/util/automaton/RegExp.java: ## @@ -696,17 +896,52 @@ private Automaton toAutomaton( return a; } - private Automaton

Re: [PR] Unicode Support for Case Insensitive Matching in RegExp [lucene]

2025-02-03 Thread via GitHub
rmuir commented on code in PR #14192: URL: https://github.com/apache/lucene/pull/14192#discussion_r1940264096 ## lucene/core/src/java/org/apache/lucene/util/automaton/RegExp.java: ## @@ -436,6 +478,160 @@ public enum Kind { */ @Deprecated public static final int DEPRECATE

Re: [PR] Unicode Support for Case Insensitive Matching in RegExp [lucene]

2025-02-03 Thread via GitHub
john-wagster commented on code in PR #14192: URL: https://github.com/apache/lucene/pull/14192#discussion_r1940026321 ## lucene/core/src/java/org/apache/lucene/util/automaton/RegExp.java: ## @@ -436,6 +478,160 @@ public enum Kind { */ @Deprecated public static final int DE

Re: [PR] Unicode Support for Case Insensitive Matching in RegExp [lucene]

2025-02-03 Thread via GitHub
john-wagster commented on code in PR #14192: URL: https://github.com/apache/lucene/pull/14192#discussion_r1940023844 ## lucene/core/src/java/org/apache/lucene/util/automaton/RegExp.java: ## @@ -424,6 +426,46 @@ public enum Kind { /** Allows case insensitive matching of ASCII

Re: [PR] Unicode Support for Case Insensitive Matching in RegExp [lucene]

2025-02-03 Thread via GitHub
rmuir commented on code in PR #14192: URL: https://github.com/apache/lucene/pull/14192#discussion_r1939974637 ## lucene/core/src/java/org/apache/lucene/util/automaton/RegExp.java: ## @@ -436,6 +478,160 @@ public enum Kind { */ @Deprecated public static final int DEPRECATE

Re: [PR] Unicode Support for Case Insensitive Matching in RegExp [lucene]

2025-02-03 Thread via GitHub
rmuir commented on code in PR #14192: URL: https://github.com/apache/lucene/pull/14192#discussion_r1939944485 ## lucene/core/src/java/org/apache/lucene/util/automaton/RegExp.java: ## @@ -424,6 +426,46 @@ public enum Kind { /** Allows case insensitive matching of ASCII charact

Re: [PR] Unicode Support for Case Insensitive Matching in RegExp [lucene]

2025-02-03 Thread via GitHub
rmuir commented on code in PR #14192: URL: https://github.com/apache/lucene/pull/14192#discussion_r1939946041 ## lucene/core/src/java/org/apache/lucene/util/automaton/RegExp.java: ## @@ -436,6 +478,160 @@ public enum Kind { */ @Deprecated public static final int DEPRECATE

Re: [PR] Unicode Support for Case Insensitive Matching in RegExp [lucene]

2025-02-03 Thread via GitHub
john-wagster commented on code in PR #14192: URL: https://github.com/apache/lucene/pull/14192#discussion_r1939889352 ## lucene/core/src/test/org/apache/lucene/util/automaton/TestRegExp.java: ## @@ -35,6 +43,320 @@ public void testSmoke() { assertFalse(run.run("ad")); }

[PR] Unicode Support for Case Insensitive Matching in RegExp [lucene]

2025-02-03 Thread via GitHub
john-wagster opened a new pull request, #14192: URL: https://github.com/apache/lucene/pull/14192 About four years ago ASCII-only case insensitive matching (https://github.com/apache/lucene-solr/pull/1541) was added to Lucene. In the past couple of years a couple of requests have been mad
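To make the gap between ASCII-only and Unicode case insensitive matching concrete, here is a small sketch using the JDK's `java.util.regex` rather than Lucene's `RegExp`. It is an analogy only: the class and method names are hypothetical, and Lucene builds automata instead of using `Pattern`. The Greek word from the thread's example is written with `\u` escapes so the snippet compiles regardless of source encoding.

```java
import java.util.regex.Pattern;

// Illustration with java.util.regex, NOT Lucene's RegExp; names are hypothetical.
public class CaseInsensitiveDemo {

  /** Case-insensitive match where only US-ASCII letters are folded. */
  static boolean asciiOnlyMatch(String regex, String input) {
    return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input).matches();
  }

  /** Case-insensitive match that also folds non-ASCII letters. */
  static boolean unicodeMatch(String regex, String input) {
    return Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)
        .matcher(input)
        .matches();
  }

  public static void main(String[] args) {
    // Lowercase and uppercase forms of the Greek prefix "para" (pi, alpha, rho, alpha).
    String lower = "\u03c0\u03b1\u03c1\u03b1";
    String upper = "\u03a0\u0391\u03a1\u0391";

    System.out.println("ASCII-only, abc vs ABC: " + asciiOnlyMatch("abc", "ABC"));
    System.out.println("ASCII-only, Greek:      " + asciiOnlyMatch(lower, upper));
    System.out.println("Unicode,    Greek:      " + unicodeMatch(lower, upper));
  }
}
```

With ASCII-only folding the Greek input is treated as a different string from its uppercase form, which is the limitation this PR sets out to remove for Lucene's `RegExp`.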

Re: [PR] Unicode Support for Case Insensitive Matching in RegExp [lucene]

2025-02-03 Thread via GitHub
john-wagster commented on PR #14192: URL: https://github.com/apache/lucene/pull/14192#issuecomment-2631833927 @jpountz, @jimczi, @mayya-sharipova tagging y'all here in case you are interested in this PR.