romseygeek merged PR #12136:
URL: https://github.com/apache/lucene/pull/12136
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@lucene.
romseygeek closed issue #12115:
org.apache.lucene.search.uhighlight.TestUnifiedHighlighterTermVec.testFetchTermVecsOncePerDoc
fails reproducibly
URL: https://github.com/apache/lucene/issues/12115
rmuir commented on issue #12137:
URL: https://github.com/apache/lucene/issues/12137#issuecomment-1424104929
it doesn't make sense to compress integers with algorithms like these. We
can use a better integer compression algorithm (e.g. PFOR) instead.
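The intuition behind rmuir's point can be sketched in plain Java (an illustration, not Lucene's actual PFOR code): sorted integers turn into small deltas, which bit-packing codecs like PFOR can store in a handful of bits each, while byte-oriented general-purpose compressors see no such structure to exploit.

```java
import java.util.Arrays;

// Sketch only: why integer-specific codecs suit sorted postings.
// Sorted doc IDs become small gaps after delta encoding; a scheme
// like PFOR then bit-packs those gaps into just a few bits each.
public class DeltaSketch {
    static int[] deltas(int[] sortedDocIds) {
        int[] out = new int[sortedDocIds.length];
        int prev = 0;
        for (int i = 0; i < sortedDocIds.length; i++) {
            out[i] = sortedDocIds[i] - prev; // small, compressible gap
            prev = sortedDocIds[i];
        }
        return out;
    }

    public static void main(String[] args) {
        int[] docIds = {1000, 1003, 1007, 1012};
        System.out.println(Arrays.toString(deltas(docIds)));
        // prints [1000, 3, 4, 5] -- most gaps fit in ~3 bits
    }
}
```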
gsmiller commented on PR #12135:
URL: https://github.com/apache/lucene/pull/12135#issuecomment-1424361807
To avoid leaking internal implementation details of the two query
implementations, I've taken a different approach that doesn't expose the fact
that we're internally representing terms
gsmiller commented on code in PR #12135:
URL: https://github.com/apache/lucene/pull/12135#discussion_r1101623346
##
lucene/core/src/java/org/apache/lucene/document/KeywordField.java:
##
@@ -168,8 +171,10 @@ public static Query newExactQuery(String field, String
value) {
publ
gsmiller commented on code in PR #12135:
URL: https://github.com/apache/lucene/pull/12135#discussion_r1101624944
##
lucene/core/src/java/org/apache/lucene/index/PrefixCodedTerms.java:
##
@@ -39,6 +43,34 @@ public class PrefixCodedTerms implements Accountable {
private long de
rmuir commented on PR #12135:
URL: https://github.com/apache/lucene/pull/12135#issuecomment-1424397388
I still don't understand why we are adding all this special logic to
PrefixCodedTerms, to avoid a single Arrays.sort? The sorting only happens once
in the query; it's not like it takes mill
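For context, the one-time sort rmuir is referring to looks roughly like this (hypothetical class name, not the code under review): a single Arrays.sort at construction, never repeated per search.

```java
import java.util.Arrays;

// Hypothetical sketch: sorting terms once when the query object is
// built. The cost is one O(n log n) Arrays.sort at construction,
// not per-search work, which is the point being made on the PR.
public class SortedTermsSketch {
    private final byte[][] sortedTerms;

    public SortedTermsSketch(byte[][] terms) {
        this.sortedTerms = terms.clone();               // defensive copy
        Arrays.sort(this.sortedTerms, Arrays::compare); // one-time sort
    }

    public byte[][] terms() {
        return sortedTerms;
    }
}
```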
rmuir commented on PR #12135:
URL: https://github.com/apache/lucene/pull/12135#issuecomment-1424408782
btw, latest commit is slightly better as at least it does not expose
`PrefixCodedTerms` in a public API. But it brings java Collections into the
picture, which seems not great versus using
benwtrent opened a new pull request, #12138:
URL: https://github.com/apache/lucene/pull/12138
The test assumes a single segment is created (because only one scorer is
created from the leaf contexts).
But force merging wasn't done before getting the reader. Force-merging to a
single seg
benwtrent merged PR #12138:
URL: https://github.com/apache/lucene/pull/12138
gsmiller commented on PR #12135:
URL: https://github.com/apache/lucene/pull/12135#issuecomment-1424470476
@rmuir The use case where I've seen this in the wild has to do with
allow/deny lists. We have some use-cases where we only want to match documents
that exist in some allow-list. That al
gsmiller commented on PR #12135:
URL: https://github.com/apache/lucene/pull/12135#issuecomment-1424473544
> But it brings java Collections into the picture, which seems not great
versus using simple arrays.
I don't love that either. An alternative could be to just require the ctor
to
jpountz commented on PR #12139:
URL: https://github.com/apache/lucene/pull/12139#issuecomment-1424577832
I ran this made-up benchmark to try to assess the benefits of the change.
It's not representative of a real-world scenario since it disables merging (to
reduce noise), but it still index
uschindler commented on PR #12135:
URL: https://github.com/apache/lucene/pull/12135#issuecomment-1424631344
Hi,
I fully agree with Robert here. I see no reason why you want to avoid a
stupid no-op Arrays.sort(). Yes, if the array is large, the sort takes some
time, sorry. But what does th
gsmiller commented on PR #12135:
URL: https://github.com/apache/lucene/pull/12135#issuecomment-1424699768
Let me try to clarify a couple of things:
First, this is not an Elasticsearch use-case. Probably not all that
important, but this is for an Amazon Product Search case, where we use Lu
uschindler commented on PR #12135:
URL: https://github.com/apache/lucene/pull/12135#issuecomment-1424845306
Sorry, I thought this was Elasticsearch, because I get people asking daily
how to hide their documents from users. In the end they are always
complaining that their SQL IN li
uschindler commented on PR #12135:
URL: https://github.com/apache/lucene/pull/12135#issuecomment-1424863546
I disagree with Robert who says "only arrays". I agree with you we can also
allow to pass collections. But only when we do it as proposed before:
We can have 2 ctors, one with `
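The two-constructor idea uschindler is describing could look something like this (class and method names hypothetical): one constructor trusts an already-sorted collection, the other accepts a plain array and sorts a copy.

```java
import java.util.Arrays;
import java.util.SortedSet;

// Hypothetical sketch of the two-constructor proposal: a SortedSet
// overload trusts the caller's ordering and deduplication, while the
// array overload sorts a defensive copy itself.
public class TermsHolderSketch {
    private final String[] sorted;

    // Caller supplies a SortedSet: already ordered, no sort needed.
    public TermsHolderSketch(SortedSet<String> terms) {
        this.sorted = terms.toArray(new String[0]);
    }

    // Caller supplies an array: clone it, then sort the copy.
    public TermsHolderSketch(String[] terms) {
        this.sorted = terms.clone();
        Arrays.sort(this.sorted);
    }

    public String[] terms() {
        return sorted;
    }
}
```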
rmuir commented on PR #12139:
URL: https://github.com/apache/lucene/pull/12139#issuecomment-1424969915
> I hesitated doing the same with `StringField` but wondered if this could
be breaking to some users who might pull a `TokenStream` themselves.
Maybe we can somehow deprecate using a
rmuir commented on PR #12139:
URL: https://github.com/apache/lucene/pull/12139#issuecomment-1424973701
High level, I don't think it's a big problem, but we are adding some
type-guessing, with a lot of runtime checks, versus the user somehow having
some type safety via the .document package. S
dnhatn opened a new issue, #12140:
URL: https://github.com/apache/lucene/issues/12140
### Description
An Elasticsearch customer reported a search performance issue. We looked
into the segment stats and found that the index has 34 * 5GB segments, and
LRUQueryCache never caches these se
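As background (thresholds made up for illustration; this is not Lucene's actual caching policy), query caches typically gate caching on segment size relative to the whole index, and an index split into many similarly sized segments can leave every segment just under such a ratio threshold:

```java
// Hypothetical illustration of a size-based cache gate (made-up
// thresholds, not LRUQueryCache's real defaults): a segment's queries
// are cached only if the segment holds enough documents, both in
// absolute terms and as a share of the index.
public class CacheGateSketch {
    static boolean worthCaching(int segmentMaxDoc, int indexMaxDoc,
                                int minDocs, double minRatio) {
        return segmentMaxDoc >= minDocs
            && segmentMaxDoc >= minRatio * indexMaxDoc;
    }

    public static void main(String[] args) {
        // 34 similarly sized segments: each holds ~1/34 (about 2.9%)
        // of the docs, which can fall just under a 3% ratio gate.
        int indexMaxDoc = 34_000_000;
        int segMaxDoc = indexMaxDoc / 34;
        System.out.println(worthCaching(segMaxDoc, indexMaxDoc, 10_000, 0.03));
        // prints false
    }
}
```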