dungba88 commented on issue #12543:
URL: https://github.com/apache/lucene/issues/12543#issuecomment-1748001631
I put together a PR at https://github.com/apache/lucene/pull/12624.
I also verified with a custom dictionary (~1MB in size) that position does
not go backward to previously w
risdenk merged PR #2678:
URL: https://github.com/apache/lucene-solr/pull/2678
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@lucene.
risdenk opened a new pull request, #2678:
URL: https://github.com/apache/lucene-solr/pull/2678
Backport SOLR-17004
rmuir commented on issue #12621:
URL: https://github.com/apache/lucene/issues/12621#issuecomment-1747775200
> Actually it is worse: Java 20 introduced conversion between short/float,
but we got neither a native `float16` datatype nor vector support. In short:
completely unusable.
We
risdenk merged PR #2677:
URL: https://github.com/apache/lucene-solr/pull/2677
dungba88 opened a new pull request, #12624:
URL: https://github.com/apache/lucene/pull/12624
### Description
Refactor the method in `BytesStore` needed for FST construction to an
abstract class and allow it to be passed from `FSTCompiler.Builder`. The
Builder will still maintain `byt
gf2121 opened a new pull request, #12623:
URL: https://github.com/apache/lucene/pull/12623
### Description
Since `StableMSBRadixSorter` always requires `O(n)` extra memory, we can use
a `MergeSorter` that takes advantage of that extra memory instead of
`InPlaceMergeSorter`.
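A minimal sketch of the idea, assuming a plain `int[]` rather than Lucene's sorter classes: when an `O(n)` scratch buffer is available anyway, each merge step can copy the range into the buffer and merge back linearly, instead of merging in place. The class and method names below are hypothetical and are not the API proposed in the PR.

```java
final class BufferedMergeSortSketch {

  /** Sorts {@code a} with a stable merge sort that uses one O(n) scratch array. */
  static void sort(int[] a) {
    int[] scratch = new int[a.length]; // the O(n) extra memory
    sort(a, scratch, 0, a.length);
  }

  private static void sort(int[] a, int[] scratch, int from, int to) {
    if (to - from < 2) {
      return; // zero or one element is already sorted
    }
    int mid = (from + to) >>> 1;
    sort(a, scratch, from, mid);
    sort(a, scratch, mid, to);
    merge(a, scratch, from, mid, to);
  }

  /** Merges the sorted ranges [from, mid) and [mid, to) through the scratch array. */
  private static void merge(int[] a, int[] scratch, int from, int mid, int to) {
    System.arraycopy(a, from, scratch, from, to - from);
    int i = from;
    int j = mid;
    int k = from;
    while (i < mid && j < to) {
      // Take from the left run on ties so the sort stays stable.
      a[k++] = scratch[j] < scratch[i] ? scratch[j++] : scratch[i++];
    }
    while (i < mid) {
      a[k++] = scratch[i++];
    }
    while (j < to) {
      a[k++] = scratch[j++];
    }
  }
}
```

Each merge is a single linear pass over the two runs, which is the benefit of spending the extra memory.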
### Benc
benwtrent commented on issue #12615:
URL: https://github.com/apache/lucene/issues/12615#issuecomment-1747350135
@jmazanec15 I agree that SPANN seems more attractive. I would argue though
we don't need to do clustering (in the paper they do clustering, but with
minimal effectiveness), but co
jmazanec15 commented on issue #12615:
URL: https://github.com/apache/lucene/issues/12615#issuecomment-1747329967
A hybrid disk-memory algorithm would have very strong benefits. I did run a
few tests recently that confirmed HNSW does not function very well when memory
gets constrained (which
gf2121 commented on code in PR #12610:
URL: https://github.com/apache/lucene/pull/12610#discussion_r1346210779
##
lucene/core/src/java/org/apache/lucene/util/bkd/MutablePointTreeReaderUtils.java:
##
@@ -81,6 +86,40 @@ protected int byteAt(int i, int k) {
return (reade
gf2121 commented on code in PR #12610:
URL: https://github.com/apache/lucene/pull/12610#discussion_r1346202698
##
lucene/core/src/java/org/apache/lucene/util/bkd/MutablePointTreeReaderUtils.java:
##
@@ -81,6 +86,40 @@ protected int byteAt(int i, int k) {
return (reade
benwtrent commented on issue #12615:
URL: https://github.com/apache/lucene/issues/12615#issuecomment-1747298348
> DiskANN is known to be slower at indexing than HNSW and the blog post does
not compare single threaded index times with Lucene.
@robertvanwinkle1138 this is just one of my
robertvanwinkle1138 commented on issue #12615:
URL: https://github.com/apache/lucene/issues/12615#issuecomment-1747228177
@benwtrent
For merges there is "FreshDiskANN: A Fast and Accurate Graph-Based
ANN Index for Streaming Similarity Search"
https://arxiv.org/pdf/2105.09613.pdf
uschindler commented on issue #12621:
URL: https://github.com/apache/lucene/issues/12621#issuecomment-1747206287
See https://github.com/openjdk/jdk/pull/9422 (Java 20)
uschindler commented on issue #12621:
URL: https://github.com/apache/lucene/issues/12621#issuecomment-1747204954
Actually it is worse: Java 20 introduced conversion between short/float, but
we got neither a native `float16` datatype nor vector support. In short:
completely unusable. 🤮
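For illustration, a hedged sketch of what the Java 20 conversions allow: half-float values are stored as raw `short` bits and must be widened to `float` before any arithmetic. It uses `Float.floatToFloat16` and `Float.float16ToFloat` (available from Java 20); the class and helper names are hypothetical.

```java
final class Float16ConversionSketch {

  /** Encodes each float as IEEE 754 binary16 bits held in a short. */
  static short[] toHalf(float[] values) {
    short[] out = new short[values.length];
    for (int i = 0; i < values.length; i++) {
      out[i] = Float.floatToFloat16(values[i]);
    }
    return out;
  }

  /** Dot product over half-float bits: every element is converted back to float first. */
  static float dot(short[] a, short[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += Float.float16ToFloat(a[i]) * Float.float16ToFloat(b[i]);
    }
    return sum;
  }
}
```

There is no arithmetic on the `short` values themselves; every operation goes through a conversion, which is the limitation described above.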
iverase commented on PR #12600:
URL: https://github.com/apache/lucene/pull/12600#issuecomment-1747072284
>@iverase, I think you have to move the changes entry to Lucene 10.
I did it already in ba74da1
>I changed the Policeman Jenkins MMAP job back to Lucene Main branch. The
nex
rmuir commented on issue #12621:
URL: https://github.com/apache/lucene/issues/12621#issuecomment-1747066837
My recommendation: stop messing around with `byte` and start thinking about
the new 16-bit half-float support that is present in Java 21. Unfortunately the
half-float *vectorization*
uschindler commented on PR #12600:
URL: https://github.com/apache/lucene/pull/12600#issuecomment-1747053947
@iverase, I think you have to move the changes entry to Lucene 10.
rmuir commented on issue #12621:
URL: https://github.com/apache/lucene/issues/12621#issuecomment-1747044386
As far as the ARM goes, the fact it has only 128-bit SIMD is the limiting
factor.
E.g. for AVX-256, we use a 64-bit vector of 8 byte values -> a 128-bit vector of
8 short values ->
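As a scalar illustration only (not the vectorized code under discussion), the sketch below shows where the widening comes from: in Java, `byte` operands are promoted to `int` before they are multiplied, while the `float` loop needs no conversion at all. The class and method names are hypothetical.

```java
final class WideningDotProductSketch {

  /** byte dot product: each operand is promoted to int before the multiply. */
  static int dotProduct(byte[] a, byte[] b) {
    int sum = 0;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i]; // byte * byte is computed as int * int
    }
    return sum;
  }

  /** float dot product: operands and accumulator share one type, so nothing is converted. */
  static float dotProduct(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }
}
```

In a SIMD version, that promotion means each input vector fans out into wider vectors holding the same number of lanes, so a fixed-width register covers fewer input elements per instruction.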
uschindler commented on PR #12600:
URL: https://github.com/apache/lucene/pull/12600#issuecomment-1747042486
P.S.: See
[docs](https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/nio/ByteBuffer.html#get(int,byte%5B%5D,int,int))
here. The method came with Java 13.
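A hedged sketch of one way to get the same effect on older Java versions: read through a temporary duplicate of the buffer, so the caller's position is left untouched. The class and method names are hypothetical, and this is not the code from the PR.

```java
import java.nio.ByteBuffer;

final class AbsoluteBulkGetSketch {

  /**
   * Copies {@code len} bytes starting at absolute index {@code pos} of {@code buf}
   * into {@code dst} at {@code offset}, without moving the position of {@code buf}.
   */
  static void readBytes(ByteBuffer buf, int pos, byte[] dst, int offset, int len) {
    // duplicate() shares the content but has its own position and limit.
    ByteBuffer view = buf.duplicate();
    view.position(pos);
    view.get(dst, offset, len); // relative bulk get on the temporary view
  }
}
```

On Java 13 and later the body reduces to `buf.get(pos, dst, offset, len)`, which also avoids allocating the temporary view.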
uschindler commented on PR #12600:
URL: https://github.com/apache/lucene/pull/12600#issuecomment-1747037797
Hi @iverase,
oh yeah. The absolute ByteBuffer gets are not available in older Java
versions.
If you want to backport, you could create a temporary ByteBuffer slice, but
if y
iverase commented on PR #12600:
URL: https://github.com/apache/lucene/pull/12600#issuecomment-1747031597
@uschindler I merged the change.
I tried to backport it, but that is not possible: ByteBuffer#get(int, byte[],
int, int) is not available in the Java version used on the 9.x line. I think it is
jpountz commented on PR #12622:
URL: https://github.com/apache/lucene/pull/12622#issuecomment-1747029247
The diff is large because I had to introduce a new
`SlowCompositeCodecReaderWrapper`, which effectively does the merge (lazily)
and can be fed to the reordering logic prior to actually r
rmuir commented on issue #12621:
URL: https://github.com/apache/lucene/issues/12621#issuecomment-1747026111
Also their suggested replacement of 3 instructions for the `VPDPBUSD` is:
> Likewise, for 8-bit values, three instructions are needed - VPMADDUBSW
which is used to multiply two
rmuir commented on issue #12621:
URL: https://github.com/apache/lucene/issues/12621#issuecomment-1747002969
The type conversions are what make it slow. For the float case it is the
equivalent of:
```
float x = something;
float y = something;
float z = something;
// no conversions
f
iverase closed issue #12599: Add readBytes method to RandomAccessInput
URL: https://github.com/apache/lucene/issues/12599
iverase merged PR #12600:
URL: https://github.com/apache/lucene/pull/12600
risdenk commented on issue #12598:
URL: https://github.com/apache/lucene/issues/12598#issuecomment-1746863472
FWIW I was looking into this a bit when I saw this issue come in.
Specifically on Solr 8.11, but as far as I can tell the changes in #12604 apply
to 8.x as well.
In a 30s asy
risdenk opened a new pull request, #2677:
URL: https://github.com/apache/lucene-solr/pull/2677
Backport of https://github.com/apache/lucene/pull/12604
jpountz opened a new pull request, #12622:
URL: https://github.com/apache/lucene/pull/12622
This adds `BPReorderingMergePolicy`, a merge policy wrapper that reorders
doc IDs on merge using a `BPIndexReorderer`.
- Reordering always runs on forced merges.
- A `minNaturalMergeNumDocs` pa
mikemccand commented on issue #12620:
URL: https://github.com/apache/lucene/issues/12620#issuecomment-1746831073
This might be needle moving on the size of the FSTs created by block tree
for the terms index, since it encodes long as `vLong` in its output. We should
only try this "reverse v
benwtrent opened a new issue, #12621:
URL: https://github.com/apache/lucene/issues/12621
### Description
While testing and digging around, I noticed that our float comparisons are
way faster than byte on my Macbook (M1) and pretty much the same as our byte
comparisons on a GCP Intel
iverase commented on code in PR #12600:
URL: https://github.com/apache/lucene/pull/12600#discussion_r1345506266
##
lucene/core/src/java19/org/apache/lucene/store/MemorySegmentIndexInput.java:
##
@@ -168,6 +168,28 @@ private void readBytesBoundary(byte[] b, int offset, int
len)
uschindler commented on code in PR #12600:
URL: https://github.com/apache/lucene/pull/12600#discussion_r1345489213
##
lucene/core/src/java19/org/apache/lucene/store/MemorySegmentIndexInput.java:
##
@@ -168,6 +168,28 @@ private void readBytesBoundary(byte[] b, int offset, int
le
iverase commented on code in PR #12600:
URL: https://github.com/apache/lucene/pull/12600#discussion_r1345475483
##
lucene/core/src/java19/org/apache/lucene/store/MemorySegmentIndexInput.java:
##
@@ -168,6 +168,28 @@ private void readBytesBoundary(byte[] b, int offset, int
len)
dungba88 commented on issue #12543:
URL: https://github.com/apache/lucene/issues/12543#issuecomment-1746403380
Thanks @mikemccand! Let's continue the discussion in this issue instead.