Re: [PR] Fix for the bug where JapaneseReadingFormFilter cannot convert some hiragana to romaji [lucene]
kuramitsu commented on PR #12885: URL: https://github.com/apache/lucene/pull/12885#issuecomment-1886834358

@zhaih
> could you please add a CHANGES.txt entry under Lucene 9.10?

Thank you. I added it.
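For readers unfamiliar with the file, a Lucene CHANGES.txt bug-fix entry generally looks roughly like the sketch below; the section header and wording here are illustrative, not the exact entry that was committed:

```
Bug Fixes
---------------------

* GITHUB#12885: JapaneseReadingFormFilter could not convert some hiragana to
  romaji. (kuramitsu)
```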
Re: [PR] Clean up unused code & variables [lucene]
dungba88 commented on PR #12994: URL: https://github.com/apache/lucene/pull/12994#issuecomment-1887052636 Thanks @dweiss for approving! Can you help me merge it if it looks okay? Thank you
Re: [PR] Lazily write the FST padding byte [lucene]
mikemccand merged PR #12981: URL: https://github.com/apache/lucene/pull/12981
Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]
jpountz commented on PR #12841: URL: https://github.com/apache/lucene/pull/12841#issuecomment-1887173459 FYI I pushed an annotation to the nightly benchmarks; it should show up tomorrow.
Re: [PR] Make FSTPostingFormat to build FST off-heap [lucene]
dungba88 commented on PR #12980: URL: https://github.com/apache/lucene/pull/12980#issuecomment-1887211104 I'm not sure why FSTPostingsFormat is different from the rest in that it writes both the metadata and the data to the same file. I think writing them to separate files would be cleaner and more consistent.
Re: [PR] Make Lucene90 postings format to write FST off heap [lucene]
dungba88 commented on code in PR #12985: URL: https://github.com/apache/lucene/pull/12985#discussion_r1448927527

lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/Lucene90BlockTreeTermsWriter.java:

```
@@ -795,7 +783,12 @@ void writeBlocks(int prefixLength, int count) throws IOException {
       assert firstBlock.isFloor || newBlocks.size() == 1;
-      firstBlock.compileIndex(newBlocks, scratchBytes, scratchIntsRef);
+      boolean isRootBlock = prefixLength == 0 && count == pending.size();
```

Review Comment: I can make this a parameter to be clearer instead of inferring from prefixLength and count
Re: [PR] Avoid reset BlockDocsEnum#freqBuffer when indexHasFreq is false [lucene]
jpountz merged PR #12997: URL: https://github.com/apache/lucene/pull/12997
Re: [PR] Output well-formed UTF-8 bytes in SimpleTextCodec's segmentinfos [lucene]
jpountz merged PR #12897: URL: https://github.com/apache/lucene/pull/12897
Re: [PR] Output binary doc values as hex array in SimpleTextCodec [lucene]
jpountz commented on code in PR #12987: URL: https://github.com/apache/lucene/pull/12987#discussion_r1448975940

lucene/codecs/src/java/org/apache/lucene/codecs/simpletext/SimpleTextDocValuesReader.java:

```
@@ -329,9 +330,15 @@ public BytesRef apply(int docID) {
         } catch (ParseException pe) {
           throw new CorruptIndexException("failed to parse int length", in, pe);
         }
-        term.grow(len);
-        term.setLength(len);
-        in.readBytes(term.bytes(), 0, len);
+        termByteArray.grow(len);
+        termByteArray.setLength(len);
+        in.readBytes(termByteArray.bytes(), 0, len);
+        if (len > 2) {
+          term.copyBytes(
+              SimpleTextUtil.fromBytesRefString(termByteArray.get().utf8ToString()));
+        } else {
+          term.setLength(0);
+        }
```

Review Comment: Why do we need this condition? Can't `fromBytesRefString` read empty arrays?
Re: [PR] Add support for index sorting with document blocks [lucene]
s1monw merged PR #12829: URL: https://github.com/apache/lucene/pull/12829
Re: [PR] Make sure `ConcurrentApproximatePriorityQueue#poll` never returns `null` on a non-empty queue. [lucene]
jpountz commented on PR #12959: URL: https://github.com/apache/lucene/pull/12959#issuecomment-1887441748 If there are no concerns, I plan on merging this PR soon.
Re: [PR] Make sure `ConcurrentApproximatePriorityQueue#poll` never returns `null` on a non-empty queue. [lucene]
uschindler commented on PR #12959: URL: https://github.com/apache/lucene/pull/12959#issuecomment-1887464862 Hi, sorry I missed testing this. I started my beasting and will try to reproduce it. Will report back.
Re: [PR] Clean up unused code & variables [lucene]
dweiss merged PR #12994: URL: https://github.com/apache/lucene/pull/12994
Re: [PR] Clean up unused code & variables [lucene]
dweiss commented on PR #12994: URL: https://github.com/apache/lucene/pull/12994#issuecomment-1887664512 Thank you and apologies for the delay!
Re: [PR] Make sure `ConcurrentApproximatePriorityQueue#poll` never returns `null` on a non-empty queue. [lucene]
uschindler commented on PR #12959: URL: https://github.com/apache/lucene/pull/12959#issuecomment-1887730699 FYI, I did not try to run the beasting for hours because previously it was failing very fast on a heavily loaded Ryzen CPU with both Hotspot and OpenJ9. Now it's stable, so I trust the results after 2 hours (1 hour with Hotspot, 1 hour with OpenJ9).
Re: [I] Concurrency bug `DocumentsWriterPerThreadPool.getAndLock()` uncovered by OpenJ9 test failures? [lucene]
uschindler commented on issue #12916: URL: https://github.com/apache/lucene/issues/12916#issuecomment-1887733274 The latest PR seems to fix the issue.
Re: [PR] Fix for the bug where JapaneseReadingFormFilter cannot convert some hiragana to romaji [lucene]
zhaih merged PR #12885: URL: https://github.com/apache/lucene/pull/12885
Re: [PR] Fix for the bug where JapaneseReadingFormFilter cannot convert some hiragana to romaji [lucene]
zhaih commented on PR #12885: URL: https://github.com/apache/lucene/pull/12885#issuecomment-1888066363 Merged and backported, thanks for the contribution @kuramitsu !
Re: [PR] Output binary doc values as hex array in SimpleTextCodec [lucene]
msfroh commented on code in PR #12987: URL: https://github.com/apache/lucene/pull/12987#discussion_r1449507398

lucene/codecs/src/java/org/apache/lucene/codecs/simpletext/SimpleTextDocValuesReader.java:

```
@@ -329,9 +330,15 @@ public BytesRef apply(int docID) {
         } catch (ParseException pe) {
           throw new CorruptIndexException("failed to parse int length", in, pe);
         }
-        term.grow(len);
-        term.setLength(len);
-        in.readBytes(term.bytes(), 0, len);
+        termByteArray.grow(len);
+        termByteArray.setLength(len);
+        in.readBytes(termByteArray.bytes(), 0, len);
+        if (len > 2) {
+          term.copyBytes(
+              SimpleTextUtil.fromBytesRefString(termByteArray.get().utf8ToString()));
+        } else {
+          term.setLength(0);
+        }
```

Review Comment:

~~The issue is that on the writing side, if `stringVal == null`, we write the length as `0` and don't output anything. So `termByteArray.get().utf8ToString()` is the empty string, and `fromBytesRefString` throws an exception on it. It was showing up as a unit test failure.~~

~~I added this condition here, but I suppose it would also be fine to output `[]` followed by `F` for missing binary values (though we'll waste a couple of bytes). Of course, now I'm wondering how we get here if the doc doesn't have the given field...~~

Nope... after debugging it, it wasn't the `null` case (which seems to work fine and skips any doc where the "has value" marker is `F`). It turns out that `fromBytesRefString` **can't** read empty arrays. The logic extracts the content from within the `[]`, then tries to split it by " ". The result is the empty string, so it thinks it has 1 "part". I'll follow up with a fix for `fromBytesRefString` instead.
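As an editorial aside, the `String.split` behavior described above is easy to miss. A tiny standalone illustration (not the Lucene code itself, just the JDK behavior the parser trips over):

```java
public class EmptySplitDemo {
  public static void main(String[] args) {
    // "[]" with the brackets stripped leaves an empty string.
    String inner = "[]".substring(1, "[]".length() - 1);

    // Splitting the empty string does NOT return a zero-length array:
    // it returns one "part" that is itself the empty string.
    String[] parts = inner.split(" ");
    System.out.println(parts.length);        // prints 1
    System.out.println(parts[0].isEmpty());  // prints true

    // A parser that allocates parts.length bytes and parses each part
    // would therefore try to parse "" as a byte value and fail.
  }
}
```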
Re: [I] Should we explore DiskANN for aKNN vector search? [lucene]
kevindrosendahl commented on issue #12615: URL: https://github.com/apache/lucene/issues/12615#issuecomment-1888208867

> How is segment merging implemented by Lucene Vamana?

I didn't do anything special for Vamana in these experiments; the index construction and merging are practically identical to HNSW. The implication, of course, is that the vectors need to be in memory when building the merged graph, or performance falls off a cliff. To make this useful operationally you'd want to ensure your max segment size does not exceed available memory, and ideally you would build the index on a dedicated node so as not to displace pages being used to serve queries.

The DiskANN paper suggests clustering the vectors and assigning each vector to the two closest clusters, building a graph for each cluster, then overlaying the graphs and doing another round of pruning. You can then limit the amount of memory required based on the number of vectors and clusters. I didn't explore this, but it'd be a worthwhile improvement in the long term.

> Correct me if I'm wrong, looks like the Java ANN implementations examine the node ids in a more or less random order, filtering requires a `Bits` object. I've been wondering if SPANN solves this by enabling ascending doc id intersection.

That's correct, the graph is explored (roughly) in proximity order rather than doc id order. It's an interesting question about SPANN; intuitively it seems like you may be able to do as you describe by keeping each vector postings list sorted by doc id, then essentially doing a streaming merge sort on the relevant centroids' postings as you consume the candidates and iterate through the passed-in sorted doc id iterator (a rough sketch of that merge follows below).

> I have general concerns about large copy overheads

I believe all of today's IO layer implementations in Lucene ([including mmap on java 21](https://github.com/apache/lucene/blob/8bee41880e41024109bf5729584ebc5dd1003717/lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInput.java#L214)) copy the bytes from the page cache onto the heap one way or another. This could potentially be improved in the future by passing a `MemorySegment` around, but it seems to have gotten things pretty far so far without problems.

> I know that is a newer Linux kernel module/interface, so that would lead me to believe actually taking this implementation forward would require a bit of comprehensive testing. ... The other concern I have is that I suspect the community would probably need to keep an eye on io_uring given its age.

Yeah, I wouldn't suggest that this be the default implementation, or perhaps even included in Lucene core itself at all given how platform specific it is (and this POC at least required a small amount of C glue code, ~50 LOC, linking in `liburing` to make things smooth). My main goal here was to highlight to the community a few things that may be relevant beyond vector indexes and DiskANN:

- modern parallel I/O can provide significant performance improvements if you're able to remove I/O dependencies in the algorithms
- the kernel is not doing a great job in this case of determining the best pages to keep in memory

There's clearly a ton of perf left on the table by using sequential I/O and only the page cache in this case, but it's of course also much simpler both in terms of implementation and operational complexity.
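As an editorial aside, here is a minimal, self-contained sketch of the doc-id-ordered streaming merge idea mentioned above for SPANN-style centroid postings. The class and method names are hypothetical and this is not Lucene's API; it only illustrates a k-way merge over doc-id-sorted lists:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Streams doc ids from several doc-id-sorted postings lists in ascending order. */
final class PostingsMergeDemo {

  /** K-way merge: repeatedly emit the smallest head across all lists, de-duplicated. */
  static List<Integer> mergeByDocId(List<int[]> sortedPostings) {
    // Each queue entry is {docId, listIndex, offsetWithinList}.
    PriorityQueue<int[]> heads = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
    for (int i = 0; i < sortedPostings.size(); i++) {
      if (sortedPostings.get(i).length > 0) {
        heads.add(new int[] {sortedPostings.get(i)[0], i, 0});
      }
    }
    List<Integer> merged = new ArrayList<>();
    int lastEmitted = -1;
    while (!heads.isEmpty()) {
      int[] head = heads.poll();
      int docId = head[0];
      if (docId != lastEmitted) { // a vector may appear in two clusters; emit it once
        merged.add(docId);
        lastEmitted = docId;
      }
      int list = head[1];
      int next = head[2] + 1;
      if (next < sortedPostings.get(list).length) {
        heads.add(new int[] {sortedPostings.get(list)[next], list, next});
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    // Two hypothetical centroid postings lists, each sorted by doc id.
    List<int[]> postings = List.of(new int[] {2, 5, 9}, new int[] {3, 5, 11});
    System.out.println(mergeByDocId(postings)); // [2, 3, 5, 9, 11]
  }
}
```

In a real filter intersection you would advance this merged stream and the query's sorted doc id iterator in lockstep, only scoring vectors whose doc ids appear in both.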
[I] org.apache.lucene.search.TestByteVectorSimilarityQuery.testApproximate failing intermittently [lucene]
msfroh opened a new issue, #13009: URL: https://github.com/apache/lucene/issues/13009

### Description

### Failure

```
org.apache.lucene.search.TestByteVectorSimilarityQuery > testApproximate FAILED
    java.lang.UnsupportedOperationException
        at __randomizedtesting.SeedInfo.seed([5D3CBCC19F394DBA:55EAB2739217D945]:0)
        at org.apache.lucene.search.TestByteVectorSimilarityQuery$1.createVectorScorer(TestByteVectorSimilarityQuery.java:81)
        at org.apache.lucene.search.AbstractVectorSimilarityQuery$1.scorer(AbstractVectorSimilarityQuery.java:148)
        at org.apache.lucene.search.Weight.scorerSupplier(Weight.java:135)
        at org.apache.lucene.search.Weight.bulkScorer(Weight.java:167)
        at org.apache.lucene.search.LRUQueryCache$CachingWrapperWeight.bulkScorer(LRUQueryCache.java:930)
        at org.apache.lucene.tests.search.AssertingWeight.bulkScorer(AssertingWeight.java:122)
        at org.apache.lucene.search.LRUQueryCache$CachingWrapperWeight.bulkScorer(LRUQueryCache.java:930)
        at org.apache.lucene.tests.search.AssertingWeight.bulkScorer(AssertingWeight.java:122)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:678)
        at org.apache.lucene.tests.search.AssertingIndexSearcher.search(AssertingIndexSearcher.java:79)
        at org.apache.lucene.search.IndexSearcher.lambda$search$2(IndexSearcher.java:636)
        at org.apache.lucene.search.TaskExecutor$TaskGroup.lambda$createTask$0(TaskExecutor.java:117)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:840)
```

### Reproduce with

```
gradlew :lucene:core:test --tests "org.apache.lucene.search.TestByteVectorSimilarityQuery.testApproximate" -Ptests.jvms=2 \
  -Ptests.jvmargs= -Ptests.seed=5D3CBCC19F394DBA -Ptests.gui=true -Ptests.file.encoding=UTF-8 -Ptests.vectorsize=512
```
Re: [PR] Output binary doc values as hex array in SimpleTextCodec [lucene]
msfroh commented on PR #12987: URL: https://github.com/apache/lucene/pull/12987#issuecomment-1888270133 Test failure is reproducible on main: https://github.com/apache/lucene/issues/13009
Re: [I] Should we explore DiskANN for aKNN vector search? [lucene]
rmuir commented on issue #12615: URL: https://github.com/apache/lucene/issues/12615#issuecomment-1888419082

i would be extremely careful around io_uring, it is disabled in many environments (e.g. by default in container environments) for security reasons:

* https://security.googleblog.com/2023/06/learnings-from-kctf-vrps-42-linux.html
* https://github.com/moby/moby/pull/46762
* https://github.com/containerd/containerd/pull/9320

To me, use of io_uring seems like a totally separate issue from KNN search. i don't think it is a good idea to entangle the two things.