rendel commented on issue #11702:
URL: https://github.com/apache/lucene/issues/11702#issuecomment-1313258450
> Could you use [OpenSearch's
implementation](https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/index/mapper/BinaryFieldMapper.java#L241)?
jpountz commented on PR #11903:
URL: https://github.com/apache/lucene/pull/11903#issuecomment-1313781961
@mikemccand Merging this PR will require regolding nightly benchmarks. Does
it help if you can control when the PR gets merged?
--
This is an automated message from the Apache Git Serv
jpountz commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1313830132
@uschindler I think I understand Robert's concerns against mlock, but less
yours against loading. Do you think that it would be useless to preload if
these files can then get paged out?
jpountz commented on PR #912:
URL: https://github.com/apache/lucene/pull/912#issuecomment-1313870436
@uschindler FYI it looks like this change made indexing ~12% faster with
`BEST_SPEED` on the stored fields benchmark:
http://people.apache.org/~mikemccand/lucenebench/stored_fields_benchmark
uschindler commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1313875966
> @uschindler I think I understand Robert's concerns against mlock, but less
yours against loading. Do you think that it would be useless to preload if
these files can then get paged
jpountz commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1313881947
> If you agree that all files marked with IOContext.LOAD are also candidates
for mlock2, then I am fine.
I think it would apply to the same files indeed.
Let me update the P
uschindler commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1313886416
I think the predicate would also simplify the current Elasticserach code
which instantiates 2 MMapDirectory instances with the same underlying dir and
uses FileSwitchDir in front. A p
jpountz commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1313887463
Agreed, this would be a welcome simplification.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to
uschindler commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1313888642
we can keep and deprecate the old boolean setter that delegates to 2
predefined "alaways on", "always off" predicates.
--
This is an automated message from the Apache Git Service.
T
jpountz commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1313905183
@uschindler Something like that?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the speci
jpountz commented on issue #11914:
URL: https://github.com/apache/lucene/issues/11914#issuecomment-1313907213
@shubhamvishu Yes, my thinking was to push the responsibility to the caller
and only wrap with an `ExitableDirectoryReader` if there is actually a timeout.
--
This is an automated
uschindler commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1313914337
Looks fine. I would add 3 predefined constants:
```java
public static final BiPredicate ALL_FILES = (file, context)
-> true;
public static final BiPredicate NO_FILES = (fi
uschindler commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1313922578
Unrelated problem with Github CI:
> org.apache.lucene.index.TestIndexFileDeleter.testExcInDecRef
this test failed for me yesterday in the same way. Seems to be a bug.
--
jpountz commented on PR #11888:
URL: https://github.com/apache/lucene/pull/11888#issuecomment-1313929513
Thanks for looking into a test for this.
@mikemccand Would you have time to have a look at this? I knew it was
complex, but even more than I had anticipated, so I would appreciate
uschindler commented on PR #912:
URL: https://github.com/apache/lucene/pull/912#issuecomment-1313949994
> @uschindler FYI it looks like this change made indexing ~12% faster with
`BEST_SPEED` on the stored fields benchmark:
http://people.apache.org/~mikemccand/lucenebench/stored_fields_benc
jpountz commented on PR #912:
URL: https://github.com/apache/lucene/pull/912#issuecomment-1313963721
I haven't dug, but your reasoning could explain why we are seeing a similar
speedup across both `BEST_SPEED` and `BEST_COMPRESSION` since they benefit from
the bulk merging optimization in a
uschindler commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1313965347
looks great. I don't like the `PRELOAD_` prefix on the constants, because I
plan to use the same for mlock2. I was already thinking about this, but came to
no good names. So my propos
jpountz commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1313968557
OK. I wanted to use names that give a sense of when these general
`BiPredicate` objects are useful without looking up javadocs, but your point
about reusing for mlock makes sense.
--
uschindler commented on code in PR #11917:
URL: https://github.com/apache/lucene/pull/11917#discussion_r1021728743
##
lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java:
##
@@ -85,25 +85,23 @@ public class MMapDirectory extends FSDirectory {
* Argument for {@lin
uschindler commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1313978152
Looks good to me, i think maybe say for the constant that it uses LOAD
context.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on
uschindler commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1313978716
...namining is hard...
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific com
uschindler commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1313979769
I will open an issue in Solr to adapt their code.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL abov
rmuir commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1313991795
> @uschindler I think I understand Robert's concerns against mlock, but less
yours against loading. Do you think that it would be useless to preload if
these files can then get paged out?
uschindler commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1314009331
I don't care about the new behaviour, as I still ave the option to configure
it. I agree with Robert, if the files should stay on heap load them to heap.
What I like most on thi
uschindler commented on PR #912:
URL: https://github.com/apache/lucene/pull/912#issuecomment-1314033529
It does not fully explain it, because actually the copy of memory between
file and heap is both using `Unsafe#copyMemory`. The only difference is that
for ByteBuffer there is some hardcod
jpountz commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1314053441
I reverted the bits that would load some files in physical memory by
default. These files don't have to be in memory and I actually like that these
files are not in the heap because it m
rmuir commented on PR #912:
URL: https://github.com/apache/lucene/pull/912#issuecomment-1314053561
one issue with "fancy" copies (e.g. for bulk merging) is that we can't make
them super-fancy anyway (e.g. zero-copy from fd to fd) as we still have to
calculate CRC32 checksum.
--
T
rmuir commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1314072987
And 1.75GB is the same amount if you want to mlock the pages. Which no
operating system is gonna let you do by default, on linux it will require
modifying system soft/hard limits.
I
rmuir commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1314077245
also wanna note my concern around any use of mlock, that, users may not know
how to do the right thing and be lazy and just run as root. this is sadly the
stuff that users do, i'd hate to
gsmiller opened a new pull request, #11928:
URL: https://github.com/apache/lucene/pull/11928
### Description
This PR explores allowing `DisjunctionDISIApproximation` to short-circuit
after "finding" a doc, allowing sub-iterators to only be exhaustively advanced
when necessary. This c
gsmiller commented on code in PR #11928:
URL: https://github.com/apache/lucene/pull/11928#discussion_r1021831743
##
lucene/MIGRATE.md:
##
@@ -102,6 +102,12 @@ Lucene 9.2 or stay with 9.0.
See LUCENE-10558 for more details and workarounds.
+### DisjunctionDISIApproximation b
risdenk closed pull request #11921: errorprone WIP
URL: https://github.com/apache/lucene/pull/11921
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issu
gsmiller commented on issue #11922:
URL: https://github.com/apache/lucene/issues/11922#issuecomment-1314092165
I put up an initial PR of this idea here #11928. It can likely be further
optimized for the two-phase case, but I wanted to start simple and see if there
are any fundamental flaws
uschindler commented on code in PR #11917:
URL: https://github.com/apache/lucene/pull/11917#discussion_r1021839467
##
lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java:
##
@@ -93,15 +93,8 @@ public class MMapDirectory extends FSDirectory {
*/
public static f
uschindler commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1314111220
BTW, sorry that I mentioned mlock2... oh man oh man.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL a
uschindler commented on code in PR #11917:
URL: https://github.com/apache/lucene/pull/11917#discussion_r1021849653
##
lucene/core/src/java/org/apache/lucene/store/IOContext.java:
##
@@ -45,8 +45,8 @@ public enum Context {
public final boolean readOnce;
/**
- * This fla
uschindler commented on code in PR #11917:
URL: https://github.com/apache/lucene/pull/11917#discussion_r1021850600
##
lucene/core/src/java/org/apache/lucene/store/IOContext.java:
##
@@ -45,8 +45,8 @@ public enum Context {
public final boolean readOnce;
/**
- * This fla
uschindler commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1314119766
Maybe we should start again here:
- sorry for mentioning mlock2, I will think of a way in the future to hook
into the MemorySegment and configure it in native ways (e.g. set madvise
uschindler commented on PR #912:
URL: https://github.com/apache/lucene/pull/912#issuecomment-1314122540
Yes you're right!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To
rmuir commented on PR #11927:
URL: https://github.com/apache/lucene/pull/11927#issuecomment-1314122689
@dweiss I'm seeing the same behavior when i run: `/gradlew assemble
-Dvalidation.errorprone=true`. I fear error-prone is somehow "stomping" on the
jvm arguments. The patch does work in the
uschindler commented on PR #912:
URL: https://github.com/apache/lucene/pull/912#issuecomment-1314125432
Maybe at some point JDK allows to call `Checksum#update(MemorySegment,...)`.
Then we don't need to copy multiple times, but the checksum will be calculated
using the off-heap slice.
--
rmuir commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1314126519
Well it was good to mention mlock as that "makes more sense" than just
"preload" (which has a confusing name IMO). "preload" just faults it in but if
there is memory pressure it can then j
uschindler commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1314131627
I think to get a "real sticky preload to heap", we would need to add a
NRTCachingDirectory-like wrapper on top of any Directory that copies the file
into ByteBuffersIndexInput.
--
jpountz commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1314152039
Agreed with your suggestion to reset and split into two PRs @uschindler.
I'll close this one and open two new ones.
--
This is an automated message from the Apache Git Service.
To resp
jpountz closed pull request #11917: Automatically preload index files that are
both tiny and very hot.
URL: https://github.com/apache/lucene/pull/11917
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go t
dweiss commented on PR #11927:
URL: https://github.com/apache/lucene/pull/11927#issuecomment-1314153938
Damn. Let me look into it.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific com
jpountz opened a new pull request, #11929:
URL: https://github.com/apache/lucene/pull/11929
This enables configuring preloading on `MMapDirectory` based on the file
name as well as the `IOContext` that is used to open the file.
--
This is an automated message from the Apache Git Servi
rmuir commented on PR #11927:
URL: https://github.com/apache/lucene/pull/11927#issuecomment-1314176062
i tried looking at the kotlin code but didnt see any obvious way to sneak
args in there. Also not really fluent in kotlin lol
--
This is an automated message from the Apache Git Service.
jpountz opened a new pull request, #11930:
URL: https://github.com/apache/lucene/pull/11930
This PR introduces a new `LOAD` `IOContext` for files that are a small
fraction of the total index size and are expected to be accessed in a
random-access fashion. This may be leveraged by some `Dire
jpountz commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1314178816
This got split into #11929 and #11930.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the
dweiss commented on PR #11927:
URL: https://github.com/apache/lucene/pull/11927#issuecomment-1314200220
I didn't look at kotlin code, since I'm not fluent there at all... :) But I
had a hunch to just upgrade the plugin thinking it's probably a bug there...
and it seems it has been fixed. I
rmuir commented on PR #11927:
URL: https://github.com/apache/lucene/pull/11927#issuecomment-1314202127
definitely seems to be working as it is spamming my console :) Thank you!
I'll test do some simple benchmarks vs main with the `task.times`,
error-prone on and off, and see where we'
dweiss opened a new pull request, #11931:
URL: https://github.com/apache/lucene/pull/11931
Updates github actions to v3 to avoid warnings when running actions on PRs.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use
dweiss commented on PR #11927:
URL: https://github.com/apache/lucene/pull/11927#issuecomment-1314209793
Please feel free to hack anything on my branch or cherry pick what you need
- this was just a poc experiment.
--
This is an automated message from the Apache Git Service.
To respond to
rmuir commented on PR #11927:
URL: https://github.com/apache/lucene/pull/11927#issuecomment-1314220207
Thanks, this definitely got me past my stumbling block with error-prone. I
will do more tests and try to get it ready.
as a first benchmark i tried './gradlew clean && ./gradlew chec
dweiss merged PR #11931:
URL: https://github.com/apache/lucene/pull/11931
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@lucene.apac
uschindler commented on code in PR #11930:
URL: https://github.com/apache/lucene/pull/11930#discussion_r1021939686
##
lucene/core/src/java/org/apache/lucene/store/IOContext.java:
##
@@ -41,34 +41,47 @@ public enum Context {
public final FlushInfo flushInfo;
+ /** This fl
benwtrent commented on PR #11860:
URL: https://github.com/apache/lucene/pull/11860#issuecomment-1314379458
I am digging back into this.
- Vectors that are similar do not have similar doc_ids. To ensure this,
some clustering would have to occur. This implies a different graph structur
rmuir commented on PR #11927:
URL: https://github.com/apache/lucene/pull/11927#issuecomment-1314381799
OK i sped up the non-errorprone case too. It is best not to fork at all,
forking is not wanted or helpful (but when it happens, we want reasonable
options set). Instead, tuning options sho
mmatela commented on issue #11864:
URL: https://github.com/apache/lucene/issues/11864#issuecomment-1314382093
Really doesn't look like a problem in SOLR.
`QueryBuilder.analyzeGraphBoolean()` creates a `GraphTokenStreamFiniteStrings`
(also Lucene class), retrieves a terms list from it and do
rmuir commented on PR #11927:
URL: https://github.com/apache/lucene/pull/11927#issuecomment-1314387568
note, if you want to try this branch out yourself, you need to be sure to
`clear/regenerate` gradle.properties after switching branches to see the impact.
--
This is an automated message
rmuir commented on PR #11927:
URL: https://github.com/apache/lucene/pull/11927#issuecomment-1314397587
I added what was missing to fix #11924, too.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to
rmuir commented on PR #11927:
URL: https://github.com/apache/lucene/pull/11927#issuecomment-1314400528
I'd like to reduce the 3GB heap of gradle we set by default too. But let's
just make that another issue/discussion as maybe it is more controversial. I've
been overriding the 3GB gradle he
mmatela commented on issue #11864:
URL: https://github.com/apache/lucene/issues/11864#issuecomment-1314436677
In my case the exception occurred with filters `WordDelimiterGraphFilter >
FlattenGraphFilter > StopFilter`, but it started to work when `StopFilter` was
moved before `FlattenGraph`
rmuir commented on PR #11860:
URL: https://github.com/apache/lucene/pull/11860#issuecomment-1314464728
> The neighbors for a given vector have effectively random doc_ids from [0,
MAX_DOC)
> We read ALL neighbors at once when exploring
So let's sort and delta-encode this list before
rmuir commented on PR #11860:
URL: https://github.com/apache/lucene/pull/11860#issuecomment-1314470208
@benwtrent I see what you are saying, but if you add delta-encoding, I feel
like PFOR exception mechanism could work. Then you can patch a couple big gaps
for docids but keep overall bpv l
rmuir commented on PR #11923:
URL: https://github.com/apache/lucene/pull/11923#issuecomment-1314486480
> just some question about cases like `merge.estimatedMergeBytes / 1024 /
1024.` changed to `merge.estimatedMergeBytes / 1024. / 1024.`:
> Were they also found by this check or did you j
dweiss commented on PR #11927:
URL: https://github.com/apache/lucene/pull/11927#issuecomment-1314869284
> It is best not to fork at all, forking is not wanted or helpful
Forking javac is needed under a few circumstances - when you want to use a
different version javac (which we supp
68 matches
Mail list logo