jpountz commented on issue #11773:
URL: https://github.com/apache/lucene/issues/11773#issuecomment-1254622883
Thanks, I had not fully understood that you were after the case where both the
filter and the sort are on the same field. You are right that the
collector could do better by bein
wjp719 commented on PR #687:
URL: https://github.com/apache/lucene/pull/687#issuecomment-1254624579
> I would rather not add this option and make the binary search logic a bit
> more complex/inefficient.
OK, thanks. When the index is sorted in descending order, I have tried BKD binary
search
dweiss closed pull request #11802: fix sentence iteration in opennlp package
URL: https://github.com/apache/lucene/pull/11802
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To
dweiss commented on PR #11802:
URL: https://github.com/apache/lucene/pull/11802#issuecomment-1254626299
Duplicated in #11734
dweiss commented on PR #11734:
URL: https://github.com/apache/lucene/pull/11734#issuecomment-1254627495
I don't know what happened there but I'm sure it's going to be fixable. Let
me take a look later today or tomorrow morning (I'm out of office today).
rmuir commented on issue #11788:
URL: https://github.com/apache/lucene/issues/11788#issuecomment-1254645510
looks like an antlr problem, if they broke backwards compat, they prolly
should have named it `5.x`?
let's be careful about upgrading to new versions. newer antlr versions have
uschindler commented on issue #11788:
URL: https://github.com/apache/lucene/issues/11788#issuecomment-1254658670
Thanks Robert. I would have said the same. In the worst case we should (like
most projects do for ASM, e.g. forbidden apis) shade the ANTLR runtime into
Lucene's package name and i
vigyasharma commented on code in PR #11796:
URL: https://github.com/apache/lucene/pull/11796#discussion_r977305054
##
lucene/core/src/java/org/apache/lucene/store/ByteTrackingIndexOutput.java:
##
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
jpountz commented on issue #11799:
URL: https://github.com/apache/lucene/issues/11799#issuecomment-1254691652
> we want a single Field containing a list of key-value pairs or a json
> formatted
Note that you can add one `FeatureField` field to your Lucene document for
every key/value p
jpountz commented on code in PR #11796:
URL: https://github.com/apache/lucene/pull/11796#discussion_r977363543
##
lucene/core/src/java/org/apache/lucene/store/ByteTrackingIndexOutput.java:
##
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or
jpountz commented on PR #11796:
URL: https://github.com/apache/lucene/pull/11796#issuecomment-1254705837
This class feels like it'd be a good fit for the `misc` module rather than
`core`?
jpountz commented on PR #687:
URL: https://github.com/apache/lucene/pull/687#issuecomment-1254731765
I'm (maybe naively) assuming that we could work around this case at the
inner node level by skipping inner nodes whose max value is equal to the min
value if we have already seen this value
jpountz commented on code in PR #11722:
URL: https://github.com/apache/lucene/pull/11722#discussion_r977400678
##
lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnumFrame.java:
##
@@ -646,6 +648,84 @@ public SeekStatus scanToTermLeaf(BytesRef target
wjp719 commented on PR #687:
URL: https://github.com/apache/lucene/pull/687#issuecomment-1254778654
> I'm (maybe naively) assuming that we could work around this case at the
> inner node level by skipping inner nodes whose max value is equal to the min
> value if we have already seen this value
thongnt99 commented on issue #11799:
URL: https://github.com/apache/lucene/issues/11799#issuecomment-1254781175
@jpountz Great, thank you very much. I will try it out and see if there is
any difference in the scores.
gcbaptista commented on issue #11800:
URL: https://github.com/apache/lucene/issues/11800#issuecomment-1254813633
Hey again,
So if I want my queries to support `@`, what should be my approach to keep
the parsing compatibility from this version on?
If there is no way to parse it right no
reta commented on issue #11788:
URL: https://github.com/apache/lucene/issues/11788#issuecomment-1254977405
@rmuir @uschindler thanks guys
> looks like an antlr problem, if they broke backwards compat, they prolly
> should have named it 5.x?
Sadly, I don't know the story; I believe
rmuir commented on issue #11788:
URL: https://github.com/apache/lucene/issues/11788#issuecomment-1255064053
i'd prefer not changing anything without addressing the testing. I need to
reiterate just how insanely trappy antlr v4 is. for painless to work with v4
and prevent insanely slow perf
[
https://issues.apache.org/jira/browse/LUCENE-9089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Drew Foulks updated LUCENE-9089:
Reporter: Bruno Roustant (was: Bruno Roustant)
> FST.Builder with fluent-style constructor
>
[
https://issues.apache.org/jira/browse/LUCENE-8983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Drew Foulks updated LUCENE-8983:
Reporter: Bruno Roustant (was: Bruno Roustant)
> PhraseWildcardQuery - new query to control and o
[
https://issues.apache.org/jira/browse/LUCENE-9049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Drew Foulks updated LUCENE-9049:
Reporter: Bruno Roustant (was: Bruno Roustant)
> Remove FST cachedRootArcs now redundant with dir
[
https://issues.apache.org/jira/browse/LUCENE-9045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Drew Foulks updated LUCENE-9045:
Reporter: Bruno Roustant (was: Bruno Roustant)
> Do not use TreeMap/TreeSet in BlockTree and PerF
[
https://issues.apache.org/jira/browse/LUCENE-9064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Drew Foulks updated LUCENE-9064:
Reporter: Bruno Roustant (was: Bruno Roustant)
> Can we remove the FST cache in Kuromoji and Nori
gsmiller commented on PR #11738:
URL: https://github.com/apache/lucene/pull/11738#issuecomment-1255173279
@rmuir did you have any other feedback or opposition to this change? Sorry,
it dropped off my plate for a bit but picking it up now and looking to get it
merged. Thanks again!
gsmiller commented on PR #11744:
URL: https://github.com/apache/lucene/pull/11744#issuecomment-1255176819
@mikemccand I tagged you as a potential reviewer on this if you have some
time. Thought you might have a good opinion as you authored it originally.
(Also tagged you in #11746, which is
gsmiller opened a new pull request, #11804:
URL: https://github.com/apache/lucene/pull/11804
### Description
I'd like to propose removing the `final` restriction on
`FacetsCollector#collect` to allow extension. I have a use-case where I'd like
to be able to throw a `CollectionTermina
mikemccand commented on PR #11796:
URL: https://github.com/apache/lucene/pull/11796#issuecomment-1255217103
I love this approach/idea!
It's simple so we should start with this ... but it will necessarily be a
lagging indicator since merging takes some time to kick off and run to
comp
mikemccand commented on PR #11796:
URL: https://github.com/apache/lucene/pull/11796#issuecomment-1255225299
An alternative implementation would be to add the bytes only in the
`IndexOutput.close` method instead of in each method that writes bytes? It
might be less error-prone, but also l
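To make the close-time alternative concrete, here is a minimal sketch using plain `java.io` rather than Lucene's `IndexOutput`; `CloseCountingOutputStream` and the shared `LongAdder` total are hypothetical names, not part of the PR. Each stream tallies its own bytes as they are written and adds them to the shared total only once, in `close()`.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.atomic.LongAdder;

public class CloseCountingDemo {

  /** Hypothetical wrapper: tallies bytes locally and publishes them once, in close(). */
  static final class CloseCountingOutputStream extends FilterOutputStream {
    private final LongAdder sharedTotal; // aggregate shared by many outputs (assumption)
    private long localBytes; // bytes written through this stream only
    private boolean published;

    CloseCountingOutputStream(OutputStream delegate, LongAdder sharedTotal) {
      super(delegate);
      this.sharedTotal = sharedTotal;
    }

    @Override
    public void write(int b) throws IOException {
      out.write(b);
      localBytes++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
      out.write(b, off, len); // bypass FilterOutputStream's byte-at-a-time default
      localBytes += len;
    }

    @Override
    public void close() throws IOException {
      if (published) {
        return; // closing twice must not double-count
      }
      published = true;
      try {
        super.close();
      } finally {
        sharedTotal.add(localBytes);
      }
    }
  }

  public static void main(String[] args) throws IOException {
    LongAdder total = new LongAdder();
    try (CloseCountingOutputStream out =
        new CloseCountingOutputStream(new ByteArrayOutputStream(), total)) {
      out.write(new byte[10]); // routed through write(byte[], int, int)
      out.write(42);
      System.out.println("before close: " + total.sum()); // before close: 0
    }
    System.out.println("after close: " + total.sum()); // after close: 11
  }
}
```

The trade-off is the one raised in the thread: the shared total only moves when an output is closed, but the individual write methods no longer each have to remember to update it.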
mikemccand commented on code in PR #11796:
URL: https://github.com/apache/lucene/pull/11796#discussion_r977831498
##
lucene/core/src/java/org/apache/lucene/store/ByteTrackingIndexOutput.java:
##
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
mdmarshmallow commented on code in PR #11796:
URL: https://github.com/apache/lucene/pull/11796#discussion_r977879810
##
lucene/core/src/java/org/apache/lucene/store/ByteTrackingIndexOutput.java:
##
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under o
mdmarshmallow commented on code in PR #11796:
URL: https://github.com/apache/lucene/pull/11796#discussion_r977881217
##
lucene/core/src/java/org/apache/lucene/store/ByteTrackingIndexOutput.java:
##
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under o
mdmarshmallow commented on code in PR #11796:
URL: https://github.com/apache/lucene/pull/11796#discussion_r977890148
##
lucene/core/src/java/org/apache/lucene/store/ByteTrackingIndexOutput.java:
##
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under o
mdmarshmallow commented on PR #11796:
URL: https://github.com/apache/lucene/pull/11796#issuecomment-1255305951
So by doing this in `IndexOutput.close()`, we would avoid including
half-done merges/flushes in the write amplification factor? As you said, this
tracks all-time WAF, so I guess
dan2097 commented on issue #11761:
URL: https://github.com/apache/lucene/issues/11761#issuecomment-1255309927
I have also run into this on our patent search system. In our index, the
problem is exaggerated because the larger documents tend to be reindexed
more frequently, so the 20% deleted doc
caohassl opened a new issue, #11805:
URL: https://github.com/apache/lucene/issues/11805
### Description
Hi,
I submit Lucene search tasks using multiple threads, and when I
cancel the search thread, the search task still completes normally. But some search
tasks are time-consu
vigyasharma commented on PR #11796:
URL: https://github.com/apache/lucene/pull/11796#issuecomment-1255316059
I see you'd already responded to a bunch of my comments. I should've
refreshed my PR page. Will resolve those.
caohassl opened a new pull request, #11806:
URL: https://github.com/apache/lucene/pull/11806
### Description
ISSUE:#11805
1. Add an `InterruptedCollector` class that delegates to the wrapped collector
2. By default, when each `LeafReaderContext` is traversed, determine whether there
is an interrupt reque
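The per-leaf interrupt check described in this PR can be illustrated without Lucene. In Lucene the natural hook would be `Collector#getLeafCollector(LeafReaderContext)`; this stdlib-only sketch stands in for it with a hypothetical `LeafCallback` interface and a hypothetical `visitLeaves` driver, and throwing `CancellationException` is an assumption of the sketch, not what the PR does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CancellationException;

public class InterruptCheckingDemo {

  /** Hypothetical stand-in for a per-segment collector hook. */
  interface LeafCallback {
    void onLeaf(int leafOrd);
  }

  /** Delegating wrapper: checks the thread's interrupt flag before each leaf. */
  static final class InterruptCheckingCallback implements LeafCallback {
    private final LeafCallback delegate;

    InterruptCheckingCallback(LeafCallback delegate) {
      this.delegate = delegate;
    }

    @Override
    public void onLeaf(int leafOrd) {
      // isInterrupted() reads the flag without clearing it.
      if (Thread.currentThread().isInterrupted()) {
        throw new CancellationException("interrupted before leaf " + leafOrd);
      }
      delegate.onLeaf(leafOrd);
    }
  }

  /** Hypothetical driver: visits leaves 0..numLeaves-1 in order. */
  static void visitLeaves(int numLeaves, LeafCallback callback) {
    for (int ord = 0; ord < numLeaves; ord++) {
      callback.onLeaf(ord);
    }
  }

  public static void main(String[] args) {
    List<Integer> visited = new ArrayList<>();
    LeafCallback inner = ord -> {
      visited.add(ord);
      if (ord == 1) {
        Thread.currentThread().interrupt(); // simulate a cancel arriving mid-search
      }
    };
    try {
      visitLeaves(5, new InterruptCheckingCallback(inner));
    } catch (CancellationException e) {
      System.out.println(e.getMessage()); // interrupted before leaf 2
    } finally {
      Thread.interrupted(); // clear the flag so the demo thread ends clean
    }
    System.out.println(visited); // [0, 1]
  }
}
```

Checking only at leaf boundaries keeps the per-document path untouched, at the cost of not stopping inside a single large segment.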
Yuti-G commented on PR #11768:
URL: https://github.com/apache/lucene/pull/11768#issuecomment-1255341964
Thanks @gsmiller for discovering this issue! The changes look good to me.
I am curious if the `index` in `LongIntCursor` works similarly to `ordinals`
in other faceting implementati
[
https://issues.apache.org/jira/browse/LUCENE-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Drew Foulks updated LUCENE-8292:
Reporter: Bruno Roustant (was: Bruno Roustant)
> Fix FilterLeafReader.FilterTermsEnum to delegate
[
https://issues.apache.org/jira/browse/LUCENE-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Drew Foulks updated LUCENE-8753:
Reporter: Bruno Roustant (was: Bruno Roustant)
> New PostingFormat - UniformSplit
> -
[
https://issues.apache.org/jira/browse/LUCENE-9078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Drew Foulks updated LUCENE-9078:
Reporter: Bruno Roustant (was: Bruno Roustant)
> Term vectors options should not be configurable
[
https://issues.apache.org/jira/browse/LUCENE-8906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Drew Foulks updated LUCENE-8906:
Reporter: Bruno Roustant (was: Bruno Roustant)
> Lucene50PostingsReader.postings() casts BlockTer
[
https://issues.apache.org/jira/browse/LUCENE-8836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Drew Foulks updated LUCENE-8836:
Reporter: Bruno Roustant (was: Bruno Roustant)
> Optimize DocValues TermsDict to continue scannin
[
https://issues.apache.org/jira/browse/LUCENE-8159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Drew Foulks updated LUCENE-8159:
Reporter: Bruno Roustant (was: Bruno Roustant)
> Add a copy constructor in AutomatonQuery to copy
[
https://issues.apache.org/jira/browse/LUCENE-8921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Drew Foulks updated LUCENE-8921:
Reporter: Bruno Roustant (was: Bruno Roustant)
> IndexSearcher.termStatistics should not require
gautamworah96 commented on PR #11796:
URL: https://github.com/apache/lucene/pull/11796#issuecomment-1255423884
For folks more familiar with WAF calculations for search applications, is
the formula `(flushedBytes + mergedBytes) / flushedBytes` always correct?
For example, does the
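As a worked instance of the formula quoted above: if 100 MB were flushed and merges later wrote 250 MB in total, the WAF is (100 + 250) / 100 = 3.5, meaning each flushed byte was written 3.5 times overall. The numbers are made up for illustration, and returning `NaN` when nothing has been flushed yet is an assumption of this sketch, not something the PR specifies.

```java
public class WriteAmplification {

  /**
   * Write amplification factor as quoted in the discussion:
   * (flushedBytes + mergedBytes) / flushedBytes.
   * Returns NaN when nothing has been flushed yet (hypothetical convention).
   */
  static double waf(long flushedBytes, long mergedBytes) {
    if (flushedBytes < 0 || mergedBytes < 0) {
      throw new IllegalArgumentException("byte counts must be non-negative");
    }
    if (flushedBytes == 0) {
      return Double.NaN;
    }
    return (double) (flushedBytes + mergedBytes) / flushedBytes;
  }

  public static void main(String[] args) {
    long mb = 1L << 20;
    // Hypothetical numbers: 100 MB flushed, merges rewrote 250 MB in total.
    System.out.println(waf(100 * mb, 250 * mb)); // 3.5
  }
}
```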
gsmiller commented on PR #11768:
URL: https://github.com/apache/lucene/pull/11768#issuecomment-1255483139
@Yuti-G could you help me understand what faceting implementation or part of
the code you're referring to? Thanks!
Yuti-G commented on PR #11768:
URL: https://github.com/apache/lucene/pull/11768#issuecomment-1255500611
Sure, I just updated the previous comment with links. Thanks!
dweiss commented on issue #11800:
URL: https://github.com/apache/lucene/issues/11800#issuecomment-1255521641
You can escape the at character:
```
am\@zing
```
or you can quote the term:
```
"am\@zing"
```
Or you can set up the flexible query parser with your own syntax par
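When user input is passed straight to the parser, the backslash escaping suggested above can be applied programmatically. A minimal sketch in plain Java; `escapeAt` is a hypothetical helper, not a Lucene API, and escaping backslashes as well is an assumption meant to keep existing backslashes from being read as escape sequences. It only handles `@`; any other characters your parser treats specially would need the same treatment.

```java
public class AtEscaper {

  /** Hypothetical helper: backslash-escapes '@' and '\' before parsing. */
  static String escapeAt(String term) {
    StringBuilder sb = new StringBuilder(term.length() + 4);
    for (int i = 0; i < term.length(); i++) {
      char c = term.charAt(i);
      if (c == '\\' || c == '@') {
        sb.append('\\'); // prefix the special character with a backslash
      }
      sb.append(c);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(escapeAt("am@zing")); // am\@zing
  }
}
```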
gsmiller commented on PR #11768:
URL: https://github.com/apache/lucene/pull/11768#issuecomment-1255562840
@Yuti-G thanks for the links. In this case, the contract is that we break
ties by the value (of the long) itself (low-to-high), which the PQ is already
doing. So this appears to be corr
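The tie-breaking contract described above (break ties by the long value itself, low to high) can be illustrated with `java.util.PriorityQueue`. This is a hedged stdlib sketch, not the code in the PR: it assumes the primary ranking key is a count, and `TopLongValues`, `Entry`, and `topN` are hypothetical names. The heap keeps the current worst entry at its head so it can be evicted cheaply once more than `n` entries are held.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class TopLongValues {

  record Entry(long value, int count) {}

  // "Better" = higher count; on equal count, the smaller long value wins.
  static final Comparator<Entry> BETTER_FIRST =
      Comparator.comparingInt(Entry::count).reversed().thenComparingLong(Entry::value);

  static List<Entry> topN(Map<Long, Integer> counts, int n) {
    // Reversed order puts the worst retained entry at the head of the heap.
    PriorityQueue<Entry> pq = new PriorityQueue<>(BETTER_FIRST.reversed());
    for (Map.Entry<Long, Integer> e : counts.entrySet()) {
      pq.offer(new Entry(e.getKey(), e.getValue()));
      if (pq.size() > n) {
        pq.poll(); // drop the current worst
      }
    }
    List<Entry> result = new ArrayList<>(pq);
    result.sort(BETTER_FIRST); // best first, ties by value low-to-high
    return result;
  }

  public static void main(String[] args) {
    // Values 2, 5 and 7 tie on count 3; value 9 has count 1.
    Map<Long, Integer> counts = Map.of(5L, 3, 2L, 3, 9L, 1, 7L, 3);
    System.out.println(topN(counts, 2));
    // [Entry[value=2, count=3], Entry[value=5, count=3]]
  }
}
```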
joshsouza opened a new pull request, #2671:
URL: https://github.com/apache/lucene-solr/pull/2671
As discovered in https://github.com/apache/solr-operator/issues/475
the `s3-repository` contrib module is missing a dependency on the
`software.amazon.awssdk:sts` module in order to enable aut
Yuti-G commented on PR #11768:
URL: https://github.com/apache/lucene/pull/11768#issuecomment-1255662264
I see. Thanks for the explanation of the indexes!
vigyasharma commented on code in PR #11796:
URL: https://github.com/apache/lucene/pull/11796#discussion_r978239743
##
lucene/core/src/java/org/apache/lucene/store/ByteTrackingIndexOutput.java:
##
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
vigyasharma commented on code in PR #11796:
URL: https://github.com/apache/lucene/pull/11796#discussion_r978242377
##
lucene/core/src/java/org/apache/lucene/store/ByteTrackingIndexOutput.java:
##
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
vigyasharma commented on code in PR #11796:
URL: https://github.com/apache/lucene/pull/11796#discussion_r978243220
##
lucene/core/src/java/org/apache/lucene/store/ByteTrackingIndexOutput.java:
##
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
vigyasharma commented on PR #11796:
URL: https://github.com/apache/lucene/pull/11796#issuecomment-1255778326
> An alternative implementation would be to add the bytes only in the
> `IndexOutput.close` method instead of in each method that writes bytes? It
> might be less error-prone, but also
vigyasharma commented on PR #11796:
URL: https://github.com/apache/lucene/pull/11796#issuecomment-1255779217
Thanks for persisting with this @mdmarshmallow. I think we're close now,
just a couple of discussion threads to resolve. This change will be super
useful :)
vsop-479 commented on PR #11722:
URL: https://github.com/apache/lucene/pull/11722#issuecomment-1255837607
@jpountz
Thanks for your review.
I ran a simple performance test that indexed `substring(2, 8)` of 1M random
UUIDs, producing 10 segments, and picked 1K terms to search. Average Result
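The test data described above can be approximated in a few lines. This is a hedged sketch: `UuidTerms` and `randomTerms` are hypothetical names, and taking the first 1K generated terms as queries is an assumption, since the comment does not say how the 1K terms were picked. Because `UUID.toString()` starts with 8 hex digits, `substring(2, 8)` always yields 6 lowercase hex characters, so there are only 16^6 (about 16.7M) possible terms and some duplicates are expected among 1M draws.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

public class UuidTerms {

  /** Hypothetical generator: substring(2, 8) of random UUIDs. */
  static List<String> randomTerms(int count) {
    List<String> terms = new ArrayList<>(count);
    for (int i = 0; i < count; i++) {
      // The first UUID group is 8 hex digits, so [2, 8) stays inside it.
      terms.add(UUID.randomUUID().toString().substring(2, 8));
    }
    return terms;
  }

  public static void main(String[] args) {
    List<String> indexed = randomTerms(1_000_000);
    // Terms are already random, so the first 1K serve as query terms here.
    List<String> queries = indexed.subList(0, 1_000);
    System.out.println(indexed.size() + " terms indexed, " + queries.size() + " queries");
  }
}
```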
LuXugang commented on code in PR #687:
URL: https://github.com/apache/lucene/pull/687#discussion_r978314526
##
lucene/sandbox/src/java/org/apache/lucene/sandbox/search/IndexSortSortedNumericDocValuesRangeQuery.java:
##
@@ -214,12 +221,172 @@ public int count(LeafReaderContext co