benwtrent commented on PR #12529:
URL: https://github.com/apache/lucene/pull/12529#issuecomment-1711551199
@msokolov what say you? It seems like encapsulating random vector seeking &
scoring into one thing makes the code simpler.
--
This is an automated message from the Apache Git Service
jpountz commented on code in PR #12529:
URL: https://github.com/apache/lucene/pull/12529#discussion_r1319777844
##
lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene90/Lucene90HnswVectorsReader.java:
##
@@ -423,8 +422,12 @@ public RandomAccessVectorValues c
mikemccand commented on issue #12542:
URL: https://github.com/apache/lucene/issues/12542#issuecomment-1711608900
Digging into this a bit, I think I found some silly performance bugs in our
current FST impl:
* We seem to create a `PagedGrowableWriter` with [page size 128 MB
here](https:
javanna commented on PR #12544:
URL: https://github.com/apache/lucene/pull/12544#issuecomment-1711628290
thanks @jpountz !
javanna merged PR #12544:
URL: https://github.com/apache/lucene/pull/12544
dweiss commented on issue #12542:
URL: https://github.com/apache/lucene/issues/12542#issuecomment-1711706053
With regard to automata/ FSTs - they're nearly the same thing, conceptually.
Automata are logically transducers producing a constant epsilon value (no
value). This knowledge can be u
jimczi commented on code in PR #12529:
URL: https://github.com/apache/lucene/pull/12529#discussion_r1319990727
##
lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene90/Lucene90HnswVectorsReader.java:
##
@@ -423,8 +422,12 @@ public RandomAccessVectorValues co
jimczi commented on code in PR #12529:
URL: https://github.com/apache/lucene/pull/12529#discussion_r132915
##
lucene/core/src/java/org/apache/lucene/util/hnsw/RandomVectorScorerProvider.java:
##
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under
jimczi commented on code in PR #12529:
URL: https://github.com/apache/lucene/pull/12529#discussion_r1320004511
##
lucene/core/src/java/org/apache/lucene/codecs/lucene95/OffHeapByteVectorValues.java:
##
@@ -60,13 +61,17 @@ public int size() {
@Override
public byte[] vecto
jimczi commented on code in PR #12529:
URL: https://github.com/apache/lucene/pull/12529#discussion_r1320004724
##
lucene/core/src/java/org/apache/lucene/util/hnsw/RandomVectorScorerProvider.java:
##
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under
jimczi commented on code in PR #12529:
URL: https://github.com/apache/lucene/pull/12529#discussion_r1320004018
##
lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene91/Lucene91HnswVectorsReader.java:
##
@@ -42,9 +41,7 @@
import org.apache.lucene.util.Bits;
mikemccand commented on PR #12489:
URL: https://github.com/apache/lucene/pull/12489#issuecomment-1712097368
@jpountz did you measure any change to index size with the reordered docids?
jpountz commented on PR #12489:
URL: https://github.com/apache/lucene/pull/12489#issuecomment-1712166542
I did. My wikimedium file is sorted by title, which already gives some
compression compared to random ordering. Disappointingly, recursive graph
bisection only improved compression of pos
mikemccand commented on issue #12542:
URL: https://github.com/apache/lucene/issues/12542#issuecomment-1712472912
> We seem to create a PagedGrowableWriter with [page size 128 MB
here](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java#L34
mikemccand opened a new pull request, #12545:
URL: https://github.com/apache/lucene/pull/12545
The bitsRequired passed during NodeHash rehash (when building an FST) was
too small, causing excess/wasted reallocations. This is just a performance
bug, especially impacting larger FSTs, but lik
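To illustrate why an undersized `bitsRequired` leads to wasted reallocations: a packed structure sized for too few bits per value must be regrown as soon as a larger value arrives. Below is a minimal sketch in plain Java, assuming the usual "bits needed for the largest value" rule. It is not the `NodeHash` code, and the class and method names are hypothetical.

```java
// Hypothetical sketch, not Lucene's NodeHash: computes how many bits a
// packed array needs per value so that it can be sized once, up front.
final class BitsRequiredSketch {

  /** Bits needed to store any value in [0, maxValue]; never less than 1. */
  static int bitsRequired(long maxValue) {
    if (maxValue < 0) {
      throw new IllegalArgumentException("maxValue must be non-negative");
    }
    return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
  }

  public static void main(String[] args) {
    long largestValue = 1_000_000L; // assumed largest value stored after a rehash
    System.out.println(bitsRequired(largestValue));
  }
}
```

Sizing from the largest value that will be stored, rather than from a smaller proxy, is what avoids the repeated grow-and-copy steps.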
mikemccand commented on PR #12545:
URL: https://github.com/apache/lucene/pull/12545#issuecomment-1712474813
Tests and precommit passed locally (once) for me ... I'll make sure
`Test2BFST` passes once too.
mikemccand commented on PR #12545:
URL: https://github.com/apache/lucene/pull/12545#issuecomment-1712476087
For the record, this command seems to at least kick off `Test2BFST`:
`./gradlew test --max-workers=1 --tests org.apache.lucene.util.fst.Test2BFST
-Dtests.nightly=true -Dtests.mo
mikemccand commented on PR #12545:
URL: https://github.com/apache/lucene/pull/12545#issuecomment-1712497190
OK `Test2BFST` is happy:
```
BUILD SUCCESSFUL in 54m 15s
```
stefanvodita opened a new issue, #12546:
URL: https://github.com/apache/lucene/issues/12546
### Description
When a user knows that they want multiple different aggregations, they have
to iterate the match-set once for each aggregation, which [is
inefficient](https://lists.apache.org/
stefanvodita opened a new pull request, #12547:
URL: https://github.com/apache/lucene/pull/12547
Usually facets maintain a one-dimensional array indexed by ordinal which
keeps the values they're supposed to compute.
The change here is simple in principle - use a two-dimensional array,
in
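The one-dimensional versus two-dimensional layout can be sketched as follows. This is only an illustration in plain Java, not the PR's code: the description above is cut off, so the meaning of the extra dimension (here, one row per aggregation) is an assumption, and all names are hypothetical.

```java
// Hypothetical sketch: one pass over the matches fills several
// aggregations at once. values[a][ord] holds aggregation a for ordinal ord.
final class MultiAggregationSketch {

  static final int COUNT = 0; // row 0: number of matches per ordinal
  static final int SUM = 1;   // row 1: sum of weights per ordinal

  static long[][] aggregate(int[] ordinals, long[] weights, int numOrdinals) {
    if (ordinals.length != weights.length) {
      throw new IllegalArgumentException("ordinals and weights must align");
    }
    long[][] values = new long[2][numOrdinals];
    for (int i = 0; i < ordinals.length; i++) {
      int ord = ordinals[i];
      values[COUNT][ord] += 1;
      values[SUM][ord] += weights[i];
    }
    return values;
  }
}
```

With a single array per aggregation, the same match set would have to be iterated once per aggregation; the shared loop above visits each match once.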
mikemccand merged PR #12545:
URL: https://github.com/apache/lucene/pull/12545
mikemccand commented on PR #12545:
URL: https://github.com/apache/lucene/pull/12545#issuecomment-1712668955
I backported to 9.x as well:
https://github.com/apache/lucene/commit/d70c91134726ff5768c0bcdc7bce51f3fbfcac56
jpountz commented on PR #12489:
URL: https://github.com/apache/lucene/pull/12489#issuecomment-1712779097
Wikibigall. Less space spent on doc values this time since I did not enable
indexing of facets. There is a more significant size reduction of postings this
time (-10.5%). This is not mis
jpountz commented on PR #12489:
URL: https://github.com/apache/lucene/pull/12489#issuecomment-1712923358
> I wonder why stored fields index size wasn't really hurt nearly as much
for wikibigall but was for wikimediumall?
This is because wikimedium uses chunks of articles as documents,
jpountz commented on PR #12489:
URL: https://github.com/apache/lucene/pull/12489#issuecomment-1712928445
Regarding positions, the reproducibility paper noted that the algorithm
helped term frequencies a bit, though not as much as docs. It doesn't say
anything about positions, though I suspe
shubhamvishu opened a new pull request, #12548:
URL: https://github.com/apache/lucene/pull/12548
### Description
This PR addresses the issue #12394. It adds an API
**`similarityToQueryVector`** to `DoubleValuesSource` to compute vector
similarity scores between the query vector and t
jpountz opened a new pull request, #12549:
URL: https://github.com/apache/lucene/pull/12549
Currently, merge-on-full-flush only checks if merges need to run if changes
have been flushed to disk. This prevents having different merging logic
for refreshes and commits, since the merge pol
mikemccand commented on code in PR #12337:
URL: https://github.com/apache/lucene/pull/12337#discussion_r1321799297
##
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/DirectoryTaxonomyIndexReader.java:
##
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software F
mikemccand commented on code in PR #12337:
URL: https://github.com/apache/lucene/pull/12337#discussion_r1321802426
##
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/ReindexingEnrichedDirectoryTaxonomyWriter.java:
##
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apa
mikemccand commented on PR #12337:
URL: https://github.com/apache/lucene/pull/12337#issuecomment-1714232934
> But as I think about this feature and how do I see it mature over time, I
DO think the payload should be given when ingesting the documents
Hmm -- I don't think that's great b
mikemccand commented on issue #12190:
URL: https://github.com/apache/lucene/issues/12190#issuecomment-1714240627
I like this idea -- it's an "aggregation level expression", which computes
an expression in "aggregation space", instead of the existing (already
supported) document level expres
onyxmaster commented on issue #4549:
URL: https://github.com/apache/lucene/issues/4549#issuecomment-1714290760
Hi. Got bitten by this today after a lemmatizer filter produced two variants
of a base word at the same position and ShingleFilter produced a "shingle" from
these variants, failing
jpountz commented on PR #12490:
URL: https://github.com/apache/lucene/pull/12490#issuecomment-1714465729
I plan on merging soon if there are no objections.
jpountz commented on PR #12526:
URL: https://github.com/apache/lucene/pull/12526#issuecomment-1714471318
We could. These tasks are a bit malicious as the doc freq is slightly
greater than the value of `k=100` so it takes lots of collected matches to find
k documents that have this term. I s
gokaai commented on code in PR #12530:
URL: https://github.com/apache/lucene/pull/12530#discussion_r1322006478
##
lucene/core/src/java/org/apache/lucene/index/CheckIndex.java:
##
@@ -610,6 +610,39 @@ public Status checkIndex(List onlySegments,
ExecutorService executorServ
jainankitk commented on issue #12527:
URL: https://github.com/apache/lucene/issues/12527#issuecomment-1714517103
> Maybe next we should try 4 readLong() for readInts32? Though I wonder how
often in this benchy are we really needing 32 bits to encode the docid deltas
in a BKD leaf block?
Tony-X closed pull request #12541: Document why we need `lastPosBlockOffset`
URL: https://github.com/apache/lucene/pull/12541
zhaih commented on issue #11537:
URL: https://github.com/apache/lucene/issues/11537#issuecomment-1715016712
I checked the CHANGES list since the last release and it seems we have a good amount
of commits already, let me start a thread about releasing the next version.
On Wed, Sep 6, 2023 at
jpountz commented on PR #12460:
URL: https://github.com/apache/lucene/pull/12460#issuecomment-1715126194
The more I think of this change, the more I like it: most of the time, you
would need to read data out of binary doc values, e.g. (variable-length)
integers, strings, etc. and exposing b
jpountz commented on code in PR #12549:
URL: https://github.com/apache/lucene/pull/12549#discussion_r1322592471
##
lucene/core/src/test/org/apache/lucene/index/TestIndexWriter.java:
##
@@ -518,11 +518,10 @@ public void testFlushWithNoMerging() throws IOException {
doc.add(n
jpountz commented on code in PR #12549:
URL: https://github.com/apache/lucene/pull/12549#discussion_r1322599113
##
lucene/core/src/test/org/apache/lucene/index/TestIndexWriterDelete.java:
##
@@ -1315,7 +1315,8 @@ public void testTryDeleteDocument() throws Exception {
w.addD
iverase commented on PR #12460:
URL: https://github.com/apache/lucene/pull/12460#issuecomment-1715224914
> I'm contemplating not introducing a new DataInputDocValues class, and
instead have a dataInput() method on BinaryDocValues
I think this approach defeats one of the main purposes f
jpountz commented on PR #12460:
URL: https://github.com/apache/lucene/pull/12460#issuecomment-1715238722
> I think this approach defeats one of the main purposes for this change,
that is to avoid allocating a byte array when reading doc values. I don't think
we want BinaryDocValues to do tha
stefanvodita opened a new pull request, #12550:
URL: https://github.com/apache/lucene/pull/12550
### Description
A user could have data about facet labels. In the demo here, we record an
author's popularity score, with authors being facet labels in an index of books.
Today, use
stefanvodita commented on PR #12550:
URL: https://github.com/apache/lucene/pull/12550#issuecomment-1715245714
Cancelling right away, this is not meant to be merged.
stefanvodita closed pull request #12550: [Demo] Per label association facet
fields
URL: https://github.com/apache/lucene/pull/12550
jpountz commented on PR #12490:
URL: https://github.com/apache/lucene/pull/12490#issuecomment-1715453502
Another benchmark run on the last commit to make sure it still works as
expected, and wikibigall this time instead of wikimedium10m:
```
TaskQPS base
jimczi commented on PR #12529:
URL: https://github.com/apache/lucene/pull/12529#issuecomment-1715484871
Given that no further concerns have been raised, I am intending to merge
this change soon.
stefanvodita commented on code in PR #12337:
URL: https://github.com/apache/lucene/pull/12337#discussion_r1322872602
##
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/DirectoryTaxonomyIndexReader.java:
##
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software
stefanvodita commented on PR #12337:
URL: https://github.com/apache/lucene/pull/12337#issuecomment-1715512722
Thank you for the review @mikemccand! I’ve integrated your feedback.
Updatable doc values are definitely something to consider.
For comparison, I coded up an [association facet fi
uschindler commented on PR #12460:
URL: https://github.com/apache/lucene/pull/12460#issuecomment-1715514900
> This has been a challenge so many times in the past, maybe it's time to
add `seek()` support to `DataInput`?
We have full random access (positional reads), if you extend the i
jpountz commented on code in PR #12529:
URL: https://github.com/apache/lucene/pull/12529#discussion_r1322897603
##
lucene/core/src/java/org/apache/lucene/util/hnsw/RandomVectorScorerProvider.java:
##
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under
uschindler commented on PR #12460:
URL: https://github.com/apache/lucene/pull/12460#issuecomment-1715550666
To save more memory copies, the codec may use a slice from the underlying
IndexInput directly to support both access apis. All file pointer checks would
then be performed by the low l
mikemccand merged PR #12541:
URL: https://github.com/apache/lucene/pull/12541
mikemccand commented on PR #12541:
URL: https://github.com/apache/lucene/pull/12541#issuecomment-1715559983
I backported to 9.x as well ... annoying that GitHub doesn't state in
summary that the above push was to 9.x (it's only reflected here because it
referenced this PR). It does reflect
jimczi merged PR #12529:
URL: https://github.com/apache/lucene/pull/12529
jpountz merged PR #12490:
URL: https://github.com/apache/lucene/pull/12490
jimczi opened a new pull request, #12551:
URL: https://github.com/apache/lucene/pull/12551
This PR introduces a new parameter known as 'efSearch' to the knn vector
query. 'efSearch' governs the maximum size of the priority queue employed for
nearest neighbor searches. As each segment may co
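A bounded priority queue of this kind can be sketched in plain Java. This is a minimal illustration of capping the candidate queue at `efSearch` entries, assuming higher scores are better. It is not the PR's implementation, and the class and method names are hypothetical.

```java
import java.util.PriorityQueue;

// Hypothetical sketch: keeps at most efSearch candidate scores, evicting
// the worst kept score when a better one arrives.
final class BoundedCandidateQueueSketch {
  private final int efSearch;
  // Min-heap: the worst score currently kept sits at the top.
  private final PriorityQueue<Float> heap = new PriorityQueue<>();

  BoundedCandidateQueueSketch(int efSearch) {
    if (efSearch < 1) {
      throw new IllegalArgumentException("efSearch must be >= 1");
    }
    this.efSearch = efSearch;
  }

  /** Returns true if the score was kept, false if it was rejected. */
  boolean offer(float score) {
    if (heap.size() < efSearch) {
      heap.add(score);
      return true;
    }
    if (score > heap.peek()) {
      heap.poll(); // drop the current worst to make room
      heap.add(score);
      return true;
    }
    return false;
  }

  int size() {
    return heap.size();
  }

  /** Worst score still kept; only valid when the queue is non-empty. */
  float worstKept() {
    return heap.peek();
  }
}
```

A larger `efSearch` keeps more candidates alive during the search, which generally trades extra work for better recall.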
Tony-X opened a new pull request, #12552:
URL: https://github.com/apache/lucene/pull/12552
### Description
FSTs have supported off-heap loading for a while. As we were trying to use
`FSTPostingsFormat` for some fields we noticed that heap usage went up.
Upon further investigation we reali
msokolov commented on code in PR #12552:
URL: https://github.com/apache/lucene/pull/12552#discussion_r1323494538
##
lucene/codecs/src/java/org/apache/lucene/codecs/memory/FSTTermsReader.java:
##
@@ -191,7 +193,9 @@ final class TermsReader extends Terms {
this.sumTotalTerm
Tony-X commented on code in PR #12552:
URL: https://github.com/apache/lucene/pull/12552#discussion_r1323531587
##
lucene/codecs/src/java/org/apache/lucene/codecs/memory/FSTTermsReader.java:
##
@@ -191,7 +193,9 @@ final class TermsReader extends Terms {
this.sumTotalTermFr
Tony-X commented on issue #12536:
URL: https://github.com/apache/lucene/issues/12536#issuecomment-1716406470
https://github.com/apache/lucene/pull/12541 is merged and I'll close this
one
Tony-X closed issue #12536: Remove `lastPosBlockOffset` from term metadata for
Lucene90PostingsFormat
URL: https://github.com/apache/lucene/issues/12536
shubhamvishu commented on PR #12183:
URL: https://github.com/apache/lucene/pull/12183#issuecomment-1716957965
@jpountz I have made some changes to `TermStates#build` to unblock this
PR and avoid the deadlock caused by the executor forking into itself,
by checking if it's a `Thre
javanna commented on code in PR #12183:
URL: https://github.com/apache/lucene/pull/12183#discussion_r1324086202
##
lucene/core/src/java/org/apache/lucene/index/TermStates.java:
##
@@ -86,19 +93,58 @@ public TermStates(
* @param needsStats if {@code true} then all leaf contex
javanna commented on code in PR #12183:
URL: https://github.com/apache/lucene/pull/12183#discussion_r1324085210
##
lucene/core/src/java/org/apache/lucene/index/TermStates.java:
##
@@ -86,19 +93,58 @@ public TermStates(
* @param needsStats if {@code true} then all leaf contex
javanna commented on code in PR #12183:
URL: https://github.com/apache/lucene/pull/12183#discussion_r1324087271
##
lucene/CHANGES.txt:
##
@@ -232,11 +172,6 @@ Other
* GITHUB#12410: Refactor vectorization support (split provider from
implementation classes).
(Uwe Schindler,
javanna commented on code in PR #12183:
URL: https://github.com/apache/lucene/pull/12183#discussion_r1324093739
##
lucene/core/src/java/org/apache/lucene/index/TermStates.java:
##
@@ -86,19 +93,58 @@ public TermStates(
* @param needsStats if {@code true} then all leaf contex
shubhamvishu commented on code in PR #12183:
URL: https://github.com/apache/lucene/pull/12183#discussion_r1324225960
##
lucene/core/src/java/org/apache/lucene/index/TermStates.java:
##
@@ -86,19 +93,58 @@ public TermStates(
* @param needsStats if {@code true} then all leaf c
uschindler commented on code in PR #12183:
URL: https://github.com/apache/lucene/pull/12183#discussion_r1324259855
##
lucene/core/src/java/org/apache/lucene/index/TermStates.java:
##
@@ -86,19 +93,58 @@ public TermStates(
* @param needsStats if {@code true} then all leaf con
jpountz commented on PR #12489:
URL: https://github.com/apache/lucene/pull/12489#issuecomment-1717341776
I just found a bug that in practice made BP run only one iteration per
level. Fixing it improves performance (wikibigall):
```
TaskQPS baseline
javanna commented on code in PR #12183:
URL: https://github.com/apache/lucene/pull/12183#discussion_r1324466373
##
lucene/core/src/java/org/apache/lucene/index/TermStates.java:
##
@@ -86,19 +93,58 @@ public TermStates(
* @param needsStats if {@code true} then all leaf contex
Shradha26 opened a new issue, #12553:
URL: https://github.com/apache/lucene/issues/12553
I’d like to gather a list of areas where Lucene’s support for aggregations
can be improved and discuss if faceting can be augmented to offer that support
or if it would need to be separate functionality
jmazanec15 commented on issue #12533:
URL: https://github.com/apache/lucene/issues/12533#issuecomment-1717981259
Additionally, the [FreshDiskANN](https://arxiv.org/pdf/2105.09613.pdf) paper
did some work in this space. They ran a test for NSG where they iteratively
repeat the following proc
uschindler commented on code in PR #12183:
URL: https://github.com/apache/lucene/pull/12183#discussion_r1324955170
##
lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java:
##
@@ -64,4 +68,57 @@ final List invokeAll(Collection>
tasks) throws IOExcept
}
retu
shubhamvishu commented on code in PR #12183:
URL: https://github.com/apache/lucene/pull/12183#discussion_r1325027898
##
lucene/core/src/java/org/apache/lucene/index/TermStates.java:
##
@@ -86,19 +93,58 @@ public TermStates(
* @param needsStats if {@code true} then all leaf c
shubhamvishu commented on code in PR #12183:
URL: https://github.com/apache/lucene/pull/12183#discussion_r1325031509
##
lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java:
##
@@ -64,4 +68,57 @@ final List invokeAll(Collection>
tasks) throws IOExcept
}
re
uschindler commented on code in PR #12183:
URL: https://github.com/apache/lucene/pull/12183#discussion_r1325065806
##
lucene/core/src/java/org/apache/lucene/index/TermStates.java:
##
@@ -86,19 +93,58 @@ public TermStates(
* @param needsStats if {@code true} then all leaf con
uschindler commented on code in PR #12183:
URL: https://github.com/apache/lucene/pull/12183#discussion_r1325066260
##
lucene/core/src/java/org/apache/lucene/index/TermStates.java:
##
@@ -86,19 +93,58 @@ public TermStates(
* @param needsStats if {@code true} then all leaf con
shubhamvishu commented on code in PR #12183:
URL: https://github.com/apache/lucene/pull/12183#discussion_r1325085377
##
lucene/core/src/java/org/apache/lucene/index/TermStates.java:
##
@@ -86,19 +93,58 @@ public TermStates(
* @param needsStats if {@code true} then all leaf c
uschindler commented on code in PR #12183:
URL: https://github.com/apache/lucene/pull/12183#discussion_r1325092791
##
lucene/core/src/java/org/apache/lucene/index/TermStates.java:
##
@@ -86,19 +93,58 @@ public TermStates(
* @param needsStats if {@code true} then all leaf con
uschindler commented on code in PR #12183:
URL: https://github.com/apache/lucene/pull/12183#discussion_r1325097710
##
lucene/core/src/java/org/apache/lucene/index/TermStates.java:
##
@@ -86,19 +93,58 @@ public TermStates(
* @param needsStats if {@code true} then all leaf con
shubhamvishu commented on code in PR #12183:
URL: https://github.com/apache/lucene/pull/12183#discussion_r1325111790
##
lucene/core/src/java/org/apache/lucene/index/TermStates.java:
##
@@ -86,19 +93,58 @@ public TermStates(
* @param needsStats if {@code true} then all leaf c
jpountz commented on PR #12526:
URL: https://github.com/apache/lucene/pull/12526#issuecomment-1718893926
FYI there was an interesting observation on another benchmark that took
advantage of recursive graph bisection:
https://jpountz.github.io/lucene-9.7-vs-9.8/. One query (`the incredibles`
gokaai opened a new pull request, #12554:
URL: https://github.com/apache/lucene/pull/12554
### Description
Allows `org.apache.lucene.search.FilteredDocIdSetIterator#match(doc)` to
throw an IOException so that users don't have to explicitly catch it
Closes #12492
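The effect of declaring `throws IOException` on an abstract hook can be shown with a small stand-alone sketch. The classes below are hypothetical stand-ins written in plain Java, not Lucene's `FilteredDocIdSetIterator`; they only illustrate that an implementation doing I/O no longer needs its own try/catch.

```java
import java.io.IOException;

// Hypothetical stand-in for an iterator with an overridable match hook.
abstract class FilterSketch {

  /** Because the hook declares IOException, subclasses may simply let it propagate. */
  protected abstract boolean match(int doc) throws IOException;

  int countMatches(int maxDoc) throws IOException {
    int count = 0;
    for (int doc = 0; doc < maxDoc; doc++) {
      if (match(doc)) {
        count++;
      }
    }
    return count;
  }
}

// Hypothetical subclass whose lookup may fail with an IOException.
final class EvenDocsFilter extends FilterSketch {

  @Override
  protected boolean match(int doc) throws IOException {
    return lookup(doc) % 2 == 0; // no try/catch wrapper needed here
  }

  // Stands in for a read from disk.
  private int lookup(int doc) throws IOException {
    if (doc < 0) {
      throw new IOException("negative doc: " + doc);
    }
    return doc;
  }
}
```

Without the `throws` clause on the hook, `match` would have to catch the exception and rethrow it as an unchecked one.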
mikemccand commented on code in PR #12552:
URL: https://github.com/apache/lucene/pull/12552#discussion_r1325827523
##
lucene/codecs/src/java/org/apache/lucene/codecs/memory/FSTTermsReader.java:
##
@@ -191,7 +193,9 @@ final class TermsReader extends Terms {
this.sumTotalTe
mikemccand commented on PR #12552:
URL: https://github.com/apache/lucene/pull/12552#issuecomment-1719297920
@Tony-X have you tried passing all Lucene unit tests using this Codec? I
think you can add `-Dtests.codec=...` to force all tests to use it.
jpountz commented on PR #12554:
URL: https://github.com/apache/lucene/pull/12554#issuecomment-1719334101
Looks great, can you add a CHANGES entry under "Lucene 9.8.0" / "API
Changes"?
jimczi commented on PR #12551:
URL: https://github.com/apache/lucene/pull/12551#issuecomment-1719529457
I made some adjustments to the formula to consider the logarithmic
complexity of the greedy search. I conducted tests on two datasets:
1. The standard SIFT dataset, which has 128 d
epotyom opened a new pull request, #12555:
URL: https://github.com/apache/lucene/pull/12555
Fix: Lucene90DocValuesProducer.TermsDict.seekCeil doesn't always position
bytes correctly (#12167)
TermsDict `ord` and `bytes` can be out of sync after a call to seekCeil
which caused test fai
jpountz merged PR #12554:
URL: https://github.com/apache/lucene/pull/12554
jpountz closed issue #12492: Allow FilteredDocIdSetIterator.match(doc) to throw
IOException
URL: https://github.com/apache/lucene/issues/12492
jpountz merged PR #12489:
URL: https://github.com/apache/lucene/pull/12489
jpountz commented on PR #12489:
URL: https://github.com/apache/lucene/pull/12489#issuecomment-1719763923
Since it's fairly unintrusive to other functionality, I felt free to merge.
jpountz commented on PR #12489:
URL: https://github.com/apache/lucene/pull/12489#issuecomment-1719763914
Since it's fairly unintrusive to other functionality, I felt free to merge.
Tony-X commented on PR #12552:
URL: https://github.com/apache/lucene/pull/12552#issuecomment-1719878383
@mikemccand hey Mike, I did not make a new Codec for this. IIRC,
`FSTPostingsFormat` will be exercised by the RandomCodec. Also there is
`TestFSTPostingsFormat extends BasePostingsFormatT
epotyom commented on PR #12555:
URL: https://github.com/apache/lucene/pull/12555#issuecomment-1719935323
Extended existing nightly random tests to catch the issue most of the time.
Would that be enough or do we need a test that catches it every single time?
benwtrent commented on PR #12551:
URL: https://github.com/apache/lucene/pull/12551#issuecomment-1720048714
@jimczi I like this idea at first glance, but I have one major concern.
What about data that is indexed according to a specific order? Two tests to
verify how this behaves would
jimczi commented on PR #12551:
URL: https://github.com/apache/lucene/pull/12551#issuecomment-1720078533
Adding some charts together to compare how effective it is to use a dynamic
efSearch.
The first chart shows how well different efSearch values work on one
segment, on multiple segm
zhaih commented on code in PR #12555:
URL: https://github.com/apache/lucene/pull/12555#discussion_r1326538550
##
lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90DocValuesProducer.java:
##
@@ -1205,7 +1205,15 @@ public SeekStatus seekCeil(BytesRef text) throws
IOE