msfroh commented on code in PR #14350:
URL: https://github.com/apache/lucene/pull/14350#discussion_r1992783083
##
lucene/core/src/java/org/apache/lucene/util/automaton/StringsToAutomaton.java:
##
@@ -209,7 +209,25 @@ private static int convert(
int i = 0;
int[] labels
rmuir commented on code in PR #14350:
URL: https://github.com/apache/lucene/pull/14350#discussion_r1992524713
##
lucene/core/src/java/org/apache/lucene/util/automaton/StringsToAutomaton.java:
##
@@ -209,7 +209,25 @@ private static int convert(
int i = 0;
int[] labels =
rmuir commented on PR #14350:
URL: https://github.com/apache/lucene/pull/14350#issuecomment-2719555494
OG paper: https://aclanthology.org/J00-1002.pdf
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go
rmuir commented on PR #14350:
URL: https://github.com/apache/lucene/pull/14350#issuecomment-2719541789
To me it seems a potentially safe and practical addition. The idea would be
that we can add transition "alternatives" (e.g. `A` vs `a`) and it doesn't
break the high-level algorithm, due to
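
A rough sketch of the "alternatives" idea in isolation, assuming a plain trie rather than the minimal automaton that `StringsToAutomaton` builds. The class `CaseInsensitiveTrie` and its methods are hypothetical names, not Lucene API, and the case mapping uses simple one-to-one `Character.toLowerCase`/`toUpperCase`, which ignores harder Unicode folding cases:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; this is not Lucene's StringsToAutomaton. */
final class CaseInsensitiveTrie {
  // Transitions keyed by code point; case alternatives share one target node.
  private final Map<Integer, CaseInsensitiveTrie> next = new HashMap<>();
  private boolean accept;

  static CaseInsensitiveTrie of(String... terms) {
    CaseInsensitiveTrie root = new CaseInsensitiveTrie();
    for (String term : terms) {
      root.add(term);
    }
    return root;
  }

  private void add(String term) {
    CaseInsensitiveTrie node = this;
    for (int i = 0; i < term.length(); ) {
      int cp = term.codePointAt(i);
      i += Character.charCount(cp);
      // The label plus its simple lower/upper alternatives.
      int[] labels = {cp, Character.toLowerCase(cp), Character.toUpperCase(cp)};
      // Reuse an existing target if any alternative already has a transition.
      CaseInsensitiveTrie target = null;
      for (int label : labels) {
        if (target == null) {
          target = node.next.get(label);
        }
      }
      if (target == null) {
        target = new CaseInsensitiveTrie();
      }
      // Every alternative leads to the same state, so the shape of the
      // structure is the same as in the case-sensitive build.
      for (int label : labels) {
        node.next.put(label, target);
      }
      node = target;
    }
    node.accept = true;
  }

  /** Returns true if the input equals one of the terms, ignoring simple case. */
  boolean matches(String input) {
    CaseInsensitiveTrie node = this;
    for (int i = 0; i < input.length(); ) {
      int cp = input.codePointAt(i);
      i += Character.charCount(cp);
      node = node.next.get(cp);
      if (node == null) {
        return false;
      }
    }
    return node.accept;
  }
}
```

For example, `CaseInsensitiveTrie.of("Foo", "bar").matches("fOO")` walks the same states as `matches("Foo")`, because each step only adds extra labels on an existing transition rather than extra states.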
rmuir commented on PR #14349:
URL: https://github.com/apache/lucene/pull/14349#issuecomment-2719519004
In my head, that's what we need. There is a crazy difference in construction
and execution time between a "native" union and using the efficient linear-time
algorithm, as opposed to going
rmuir commented on PR #14350:
URL: https://github.com/apache/lucene/pull/14350#issuecomment-2719524190
@dweiss understands this one the best; he implemented it.
msfroh commented on code in PR #14187:
URL: https://github.com/apache/lucene/pull/14187#discussion_r1992514802
##
lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java:
##
@@ -77,7 +77,8 @@
public class IndexSearcher {
static int maxClauseCount = 1024;
- privat
msfroh commented on PR #14349:
URL: https://github.com/apache/lucene/pull/14349#issuecomment-2719491949
@willdickerson -- I took a stab at modifying `StringsToAutomaton`, to
support case-insensitive matching: https://github.com/apache/lucene/pull/14350
msfroh opened a new pull request, #14350:
URL: https://github.com/apache/lucene/pull/14350
### Description
This is a rough attempt to make `StringsToAutomaton` support
case-insensitive strings.
github-actions[bot] commented on PR #14262:
URL: https://github.com/apache/lucene/pull/14262#issuecomment-2719429972
This PR has not had activity in the past 2 weeks, labeling it as stale. If
the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you
for your contributi
github-actions[bot] commented on PR #14187:
URL: https://github.com/apache/lucene/pull/14187#issuecomment-2719430102
This PR has not had activity in the past 2 weeks, labeling it as stale. If
the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you
for your contributi
msfroh commented on code in PR #14349:
URL: https://github.com/apache/lucene/pull/14349#discussion_r1992372705
##
lucene/core/src/test/org/apache/lucene/search/TestCaseInsensitiveTermInSetQuery.java:
##
@@ -0,0 +1,377 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) u
msfroh commented on code in PR #14349:
URL: https://github.com/apache/lucene/pull/14349#discussion_r1992424195
##
lucene/core/src/java/org/apache/lucene/search/CaseInsensitiveTermInSetQuery.java:
##
@@ -81,58 +89,95 @@ public void visit(QueryVisitor visitor) {
visitor.con
willdickerson commented on code in PR #14349:
URL: https://github.com/apache/lucene/pull/14349#discussion_r1992405437
##
lucene/core/src/test/org/apache/lucene/search/TestCaseInsensitiveTermInSetQuery.java:
##
@@ -0,0 +1,377 @@
+/*
+ * Licensed to the Apache Software Foundation
rmuir commented on code in PR #14349:
URL: https://github.com/apache/lucene/pull/14349#discussion_r1992333636
##
lucene/core/src/java/org/apache/lucene/search/CaseInsensitiveTermInSetQuery.java:
##
@@ -0,0 +1,185 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under
rmuir commented on code in PR #14349:
URL: https://github.com/apache/lucene/pull/14349#discussion_r1992332504
##
lucene/core/src/java/org/apache/lucene/search/CaseInsensitiveTermInSetQuery.java:
##
@@ -0,0 +1,185 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under
atris commented on issue #13675:
URL: https://github.com/apache/lucene/issues/13675#issuecomment-2719042595
Yes, been playing with some stuff. Should be able to get something related
up for review soon
willdickerson opened a new pull request, #14349:
URL: https://github.com/apache/lucene/pull/14349
## Overview
This PR introduces a proof of concept for a case-insensitive variant of
TermInSetQuery. The implementation provides an efficient way to search for
terms regardless of case with
viliam-durina commented on issue #14348:
URL: https://github.com/apache/lucene/issues/14348#issuecomment-2718850463
I have the necessary change ready in my fork of Lucene, and it works for us.
I wanted input from maintainers on whether they think this is a good idea in
general for Lucene.
rmuir commented on issue #14334:
URL: https://github.com/apache/lucene/issues/14334#issuecomment-2717869141
I also want to point out here, that current usage is not "incorrect". The
idea that there is a "correct" way that will always work is 100% broken.
look at what fsync() does on m
svilen-mihaylov-elastic commented on code in PR #14094:
URL: https://github.com/apache/lucene/pull/14094#discussion_r1991707917
##
lucene/core/src/test/org/apache/lucene/search/TestPatienceFloatVectorQuery.java:
##
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundat
svilen-mihaylov-elastic commented on code in PR #14094:
URL: https://github.com/apache/lucene/pull/14094#discussion_r1991703342
##
lucene/core/src/java/org/apache/lucene/search/HnswKnnCollector.java:
##
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) un
DivyanshIITB commented on issue #14348:
URL: https://github.com/apache/lucene/issues/14348#issuecomment-2717999626
Hi @viliam-durina,
I find this issue interesting and would like to work on it. I see that
Lucene currently uses ReadAdvice.RANDOM when opening vector files (.vec and
.ve
rmuir commented on issue #14334:
URL: https://github.com/apache/lucene/issues/14334#issuecomment-2717565686
> we still need the fsync on the parent directory to persist the file
metadata on Linux
Blows a giant hole in your argument, that it is ok to write to this file
and separately
rmuir commented on PR #14311:
URL: https://github.com/apache/lucene/pull/14311#issuecomment-2717885448
@renatoh we can clean up `main` at any time as it is marked deprecated for
10.2 now
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on
renatoh commented on PR #14311:
URL: https://github.com/apache/lucene/pull/14311#issuecomment-2717827401
> I'm just doing final tests. Thanks again @renatoh. I will backport it to
10.2. We can followup to remove the deprecated "sorta-kinda-longest-match" from
lucene's `main` branch, and see
dungba88 commented on PR #14226:
URL: https://github.com/apache/lucene/pull/14226#issuecomment-2717762620
I published the simplified version here for reference:
https://github.com/dungba88/lucene/commit/278d7c919bc6ca6e1618868a892bcf3d4970cea5
viliam-durina opened a new issue, #14348:
URL: https://github.com/apache/lucene/issues/14348
### Description
Vector similarity search using HNSW accesses the vectors (the `vec` or `veq`
files) very heavily during the search, even more than the HNSW graph itself
(the `vex` file). If t
viliam-durina commented on issue #14334:
URL: https://github.com/apache/lucene/issues/14334#issuecomment-2716978401
> personally I think we should just simply fsync the files before we close
them: nothing more fancy than that.
If we rely on the file being ever durably stored, then the
hanbj commented on PR #14267:
URL: https://github.com/apache/lucene/pull/14267#issuecomment-2716944118
The previously failing test case was
org.apache.lucene.index.TestKnnGraph.testMultiThreadedSearch.
I have confirmed the testMultiThreadedSearch method, which uses
KnnFloatVectorQuery for s