jpountz merged PR #12407:
URL: https://github.com/apache/lucene/pull/12407
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@lucene.apa
jpountz opened a new pull request, #12415:
URL: https://github.com/apache/lucene/pull/12415
This introduces `LeafCollector#collect(DocIdStream)` to enable collectors to
collect batches of doc IDs at once. `BooleanScorer` takes advantage of this by
creating a `DocIdStream` whose `count()` me
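A rough sketch of the batching idea described in this pull request, not the PR's actual code. The interfaces below are hypothetical stand-ins defined only for this example (they are not Lucene's classes), and the bit-set-backed stream is an assumption made for illustration, since the description above is cut off. It shows why a batch-level `count()` can help a collector that only needs the number of hits: it can skip per-document iteration.

```java
import java.util.function.IntConsumer;

// Hypothetical stand-in for a stream of doc IDs (not Lucene's interface).
interface SketchDocIdStream {
  void forEach(IntConsumer consumer);

  // Default: count by visiting every doc ID.
  default int count() {
    int[] n = {0};
    forEach(doc -> n[0]++);
    return n[0];
  }
}

// Hypothetical stand-in for a per-segment collector.
interface SketchLeafCollector {
  void collect(int doc);

  // Default batch method: fall back to collecting one doc at a time.
  default void collect(SketchDocIdStream stream) {
    stream.forEach(doc -> collect(doc));
  }
}

// Assumed example stream backed by a window of bits starting at doc ID `base`.
final class BitSetDocIdStream implements SketchDocIdStream {
  private final long[] bits;
  private final int base;

  BitSetDocIdStream(long[] bits, int base) {
    this.bits = bits;
    this.base = base;
  }

  @Override
  public void forEach(IntConsumer consumer) {
    for (int i = 0; i < bits.length; i++) {
      long word = bits[i];
      while (word != 0) {
        int bit = Long.numberOfTrailingZeros(word);
        consumer.accept(base + (i << 6) + bit);
        word &= word - 1; // clear the lowest set bit
      }
    }
  }

  // Counting only needs a population count per word, no iteration over docs.
  @Override
  public int count() {
    int total = 0;
    for (long word : bits) {
      total += Long.bitCount(word);
    }
    return total;
  }
}

// A collector that only counts hits can use the batch-level count directly.
final class CountingCollector implements SketchLeafCollector {
  int count;

  @Override
  public void collect(int doc) {
    count++;
  }

  @Override
  public void collect(SketchDocIdStream stream) {
    count += stream.count();
  }
}
```

Collectors that need each doc ID would keep using the per-document path through `forEach`.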
jpountz commented on PR #12415:
URL: https://github.com/apache/lucene/pull/12415#issuecomment-1621604501
Note: this is just a proof of concept to discuss the idea of integrating at
the collector level. More work is needed to add more tests, integrate with the
test framework (`AssertingLeafC
jpountz commented on issue #12358:
URL: https://github.com/apache/lucene/issues/12358#issuecomment-1621611587
I opened a proof of concept for the idea that I suggested above at #12415.
bobmanc opened a new issue, #12416:
URL: https://github.com/apache/lucene/issues/12416
### Description
I am trying to use a larger vector dictionary with the demo code. I have
tried all the files here https://nlp.stanford.edu/projects/glove/ and every one
throws this...
tang-hi opened a new pull request, #12417:
URL: https://github.com/apache/lucene/pull/12417
### Description
ISSUE #12396
tang-hi commented on issue #12396:
URL: https://github.com/apache/lucene/issues/12396#issuecomment-1621668271
I have attempted to implement an Int version of the scalar and vector
forutil. I have submitted a draft PR as a simple starting point for those
interested in this issue. Even if it
tang-hi commented on issue #12396:
URL: https://github.com/apache/lucene/issues/12396#issuecomment-1621674331
When using the int type, there is a significant performance improvement
compared to the long type, approximately 2-3 times. You can refer to
[link](https://github.com/ChrisHegarty/b
benwtrent commented on PR #12413:
URL: https://github.com/apache/lucene/pull/12413#issuecomment-1621798989
OK, I reverted my minor optimizations and moved the method to be more in line
with what Lucene did before.
Now I am getting exactly the same recall and the weird bug is fixed wher
gsmiller commented on PR #12408:
URL: https://github.com/apache/lucene/pull/12408#issuecomment-1621991432
Thanks @mikemccand. Just removed the errant "nocommit" comment I left
hanging in the initial PR (doh!) and added a CHANGES entry, so this should be a
clean change now.
mkhludnev commented on issue #12393:
URL: https://github.com/apache/lucene/issues/12393#issuecomment-1622010377
Noob says: Tokenizers for word embeddings
https://github.com/huggingface/tokenizers are quite different to ours.
`thanks to the Rust implementation. Takes less than 20 seconds
jpountz opened a new issue, #12418:
URL: https://github.com/apache/lucene/issues/12418
### Description
The following Gradle command fails reproducibly on `branch_9x` with this
error:
```
> java.lang.AssertionError
> at
__randomizedtesting.SeedI
uschindler commented on code in PR #12417:
URL: https://github.com/apache/lucene/pull/12417#discussion_r1253370894
##
lucene/core/src/java/org/apache/lucene/internal/vectorization/DefaultForUtil90.java:
##
@@ -0,0 +1,135 @@
+// This file has been automatically generated, DO NOT
uschindler commented on issue #12396:
URL: https://github.com/apache/lucene/issues/12396#issuecomment-1622121123
> I agree. There are more complications: DataInput does not have a read
method for int[], only one for float[] and long[]. So changing this is a bigger
task.
I just notic
uschindler commented on issue #12396:
URL: https://github.com/apache/lucene/issues/12396#issuecomment-1622126199
> However, I am not sure why many of the tests are failing, even though the
tests for Pforutil and forutil are passing. I will take a closer look at the
specific reasons when I h
ChrisHegarty commented on PR #12417:
URL: https://github.com/apache/lucene/pull/12417#issuecomment-1622135274
This is starting to look much more like what I was expecting (but still a
long way to go). Nice!
It looks like you @tang-hi brought in some code from [bitpacking][1], which
i
uschindler commented on PR #12417:
URL: https://github.com/apache/lucene/pull/12417#issuecomment-1622155822
> This is starting to look much more like what I was expecting (but still a
long way to go). Nice!
>
> It looks like you @tang-hi brought in some code from
[bitpacking](https:/
tang-hi commented on PR #12417:
URL: https://github.com/apache/lucene/pull/12417#issuecomment-1622158573
> Do you @tang-hi want to open branch-push access to me @ChrisHegarty (and
whoever else desires to write code here)?
Of course. What should I do to open branch-push access?
uschindler commented on PR #12417:
URL: https://github.com/apache/lucene/pull/12417#issuecomment-1622160707
> > Do you @tang-hi want to open branch-push access to me @ChrisHegarty (and
whoever else desires to write code here)?
>
> Of course. what should I do to open branch-push access
tang-hi commented on PR #12417:
URL: https://github.com/apache/lucene/pull/12417#issuecomment-1622167259
Please feel free to submit your commits. I am a bit exhausted now and don't
have the energy to look deeper. As for the issue with the failed tests, I
believe the decode and encode functi
uschindler commented on PR #12417:
URL: https://github.com/apache/lucene/pull/12417#issuecomment-1622171870
It looks like the scalar version passes the tests (as GitHub uses Java 17).
tang-hi commented on PR #12417:
URL: https://github.com/apache/lucene/pull/12417#issuecomment-1622173370
Vectorized code is automatically generated, but I think we can manually
write code for special bitPerValue (1, 2, 4, 8, 16) in the future to reduce
code size. Of course, we can also hand
uschindler commented on PR #12417:
URL: https://github.com/apache/lucene/pull/12417#issuecomment-1622176898
> Vectorized code is automatically generated, but I think we can manually
write code for special bitPerValue (1, 2, 4, 8, 16) in the future to reduce
code size. Of course, we can also
uschindler commented on PR #12417:
URL: https://github.com/apache/lucene/pull/12417#issuecomment-1622190064
The reason why the backwards compatibility tests are failing is simple: we
modified the Lucene90 codec instead of creating a new one.
The new code fails to read an index created with L
msokolov commented on code in PR #12413:
URL: https://github.com/apache/lucene/pull/12413#discussion_r1253495097
##
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java:
##
@@ -256,6 +256,72 @@ public NeighborQueue searchLevel(
return results;
}
+ /
msokolov commented on issue #12416:
URL: https://github.com/apache/lucene/issues/12416#issuecomment-1622293880
Your dictionary must be sorted in UTF-8 order
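One way to meet the ordering requirement in the reply to #12416 is sketched below. It assumes that "UTF-8 order" means comparing the UTF-8 encoded bytes of each term as unsigned values. The class and method names are hypothetical and are not part of Lucene or its demo code. Java's default `String` ordering compares UTF-16 code units, which can differ from UTF-8 byte order when supplementary characters are involved, so the comparator encodes each term first.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper for this example only.
final class Utf8Order {
  // Compares two strings by their UTF-8 bytes, treating each byte as unsigned.
  static final Comparator<String> UTF8_BYTE_ORDER =
      (a, b) ->
          Arrays.compareUnsigned(
              a.getBytes(StandardCharsets.UTF_8), b.getBytes(StandardCharsets.UTF_8));

  // Returns a new list with the terms sorted in UTF-8 byte order.
  static List<String> sortUtf8(List<String> terms) {
    List<String> copy = new ArrayList<>(terms);
    copy.sort(UTF8_BYTE_ORDER);
    return copy;
  }
}
```

For plain ASCII terms the two orderings agree, so the difference only appears with characters outside the Basic Multilingual Plane.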
msokolov commented on issue #12394:
URL: https://github.com/apache/lucene/issues/12394#issuecomment-1622298052
The idea makes sense to me, but I don't like the word "distance" in this
context because not all of the similarities are distances in the sense of a
metric space. That's why I pref
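The point about similarities not being metric distances can be illustrated with a small worked example. This is only a sketch under an assumption: it converts cosine similarity to a "distance" as `1 - cosine`, a common convention that is not taken from this thread. With that conversion, three simple 2D vectors violate the triangle inequality, so the result is not a distance in the metric-space sense.

```java
// Hypothetical helper for this example only; not Lucene code.
final class SimilarityVsMetric {
  static double dot(double[] a, double[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  static double cosine(double[] a, double[] b) {
    return dot(a, b) / Math.sqrt(dot(a, a) * dot(b, b));
  }

  // Assumed conversion from similarity to "distance": 1 - cosine similarity.
  static double cosineDistance(double[] a, double[] b) {
    return 1.0 - cosine(a, b);
  }

  public static void main(String[] args) {
    double[] x = {1, 0};
    double[] y = {1, 1};
    double[] z = {0, 1};
    double direct = cosineDistance(x, z);
    double viaY = cosineDistance(x, y) + cosineDistance(y, z);
    // A metric requires direct <= viaY; here the direct value is larger.
    System.out.println("d(x,z) = " + direct + ", d(x,y) + d(y,z) = " + viaY);
  }
}
```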
benwtrent commented on code in PR #12413:
URL: https://github.com/apache/lucene/pull/12413#discussion_r1253619111
##
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java:
##
@@ -256,6 +256,72 @@ public NeighborQueue searchLevel(
return results;
}
+
benwtrent commented on code in PR #12413:
URL: https://github.com/apache/lucene/pull/12413#discussion_r1253620162
##
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java:
##
@@ -204,26 +204,26 @@ private static NeighborQueue search(
if (initialEp == -1)
xjtushilei opened a new issue, #12419:
URL: https://github.com/apache/lucene/issues/12419
### Description
I use Lucene 9.6 from multiple threads and found that if the three
classes `IndexWriter`, `SegmentReader`, and `ConcurrentMergeScheduler` are used
in a multi-threaded environm