Re: [PR] script to run microbenchmarks across different ec2 instance types [lucene]

2023-11-09 Thread via GitHub
rmuir commented on PR #12787: URL: https://github.com/apache/lucene/pull/12787#issuecomment-1803343093 When I run `make PATCH_BRANCH=rmuir:microbenchmark_ec2` we will just see no differences but it demonstrates it (sorry: no speedups in this branch!). It spins up/tears down `lucene-jm

Re: [PR] Specialize arc store for continuous label in FST [lucene]

2023-11-09 Thread via GitHub
gf2121 commented on code in PR #12748: URL: https://github.com/apache/lucene/pull/12748#discussion_r1387636706 ## lucene/CHANGES.txt: ## @@ -106,6 +106,8 @@ Optimizations * GITHUB#12552: Make FSTPostingsFormat load FSTs off-heap. (Tony X) +* GITHUB#12748: Specialize arc sto

Re: [PR] remove non-NRT replication support [lucene]

2023-11-09 Thread via GitHub
dweiss commented on PR #12038: URL: https://github.com/apache/lucene/pull/12038#issuecomment-1803391323 > If anyone is still using the legacy non-NRT mode, please let me know on this issue and give me your IP address, so I can try to pop a shell. Oh, I missed this bit somehow, @rmuir.

[I] Unrolle or vectorize Math.max in CompetitiveImpactAccumulator.addAll? [lucene]

2023-11-09 Thread via GitHub
vsop-479 opened a new issue, #12788: URL: https://github.com/apache/lucene/issues/12788 ### Description Is it worth unrolling or vectorizing Math.max in CompetitiveImpactAccumulator.addAll? Maybe the scalar loop can be auto-vectorized by the JIT, but there is some speed up with unrolle
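To make the question concrete, here is a minimal sketch of the two loop shapes being compared. It assumes the loop in question is an element-wise maximum over two equal-length `int[]` arrays; the class and method names (`MaxMergeSketch`, `mergeScalar`, `mergeUnrolled`) are hypothetical and this is not the Lucene source.

```java
public class MaxMergeSketch {
  // Scalar form: one Math.max per iteration. The JIT may or may not vectorize it.
  public static void mergeScalar(int[] dst, int[] src) {
    for (int i = 0; i < dst.length; i++) {
      dst[i] = Math.max(dst[i], src[i]);
    }
  }

  // Manually unrolled by 4, followed by a scalar loop for the leftover elements.
  public static void mergeUnrolled(int[] dst, int[] src) {
    int i = 0;
    int bound = dst.length & ~3; // largest multiple of 4 that fits
    for (; i < bound; i += 4) {
      dst[i] = Math.max(dst[i], src[i]);
      dst[i + 1] = Math.max(dst[i + 1], src[i + 1]);
      dst[i + 2] = Math.max(dst[i + 2], src[i + 2]);
      dst[i + 3] = Math.max(dst[i + 3], src[i + 3]);
    }
    for (; i < dst.length; i++) {
      dst[i] = Math.max(dst[i], src[i]);
    }
  }
}
```

Both methods produce the same result; whether the unrolled form is actually faster depends on the JIT and the hardware, so it would have to be measured.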

Re: [I] Unrolle or vectorize Math.max in CompetitiveImpactAccumulator.addAll? [lucene]

2023-11-09 Thread via GitHub
vsop-479 commented on issue #12788: URL: https://github.com/apache/lucene/issues/12788#issuecomment-1803464509 @jpountz Please take a look when you get a chance! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [I] Unrolle or vectorize Math.max in CompetitiveImpactAccumulator.addAll? [lucene]

2023-11-09 Thread via GitHub
uschindler commented on issue #12788: URL: https://github.com/apache/lucene/issues/12788#issuecomment-1803537830 Hi, for correct vectorization please make use of the official Lucene framework (add your implementation class' instance for the scalar and the vectorized variant as a sepa

Re: [PR] Clean up ordinal map in default SSDV reader state [lucene]

2023-11-09 Thread via GitHub
stefanvodita commented on PR #12454: URL: https://github.com/apache/lucene/pull/12454#issuecomment-1803605667 Thanks Greg! I think the delay is partially my fault, I had mentioned a different G. Miller in my message 😄

Re: [PR] Specialize arc store for continuous label in FST [lucene]

2023-11-09 Thread via GitHub
mikemccand commented on PR #12748: URL: https://github.com/apache/lucene/pull/12748#issuecomment-1803615768 > I can help merge this in and backport if there is no objection in 48h. Thanks @gf2121 -- we should backport all these recent exciting FST changes in the right order as a batch

Re: [PR] Specialize arc store for continuous label in FST [lucene]

2023-11-09 Thread via GitHub
mikemccand merged PR #12748: URL: https://github.com/apache/lucene/pull/12748

Re: [PR] Clean up ordinal map in default SSDV reader state [lucene]

2023-11-09 Thread via GitHub
mikemccand commented on PR #12454: URL: https://github.com/apache/lucene/pull/12454#issuecomment-1803628190 > Thanks Greg! I think the delay is partially my fault, I had mentioned a different G. Miller in my message 😄 Seems to be a common mistake recently! See this [recent hilarious e

Re: [PR] Specialize arc store for continuous label in FST [lucene]

2023-11-09 Thread via GitHub
easyice commented on PR #12748: URL: https://github.com/apache/lucene/pull/12748#issuecomment-1803649666 @mikemccand @gf2121 Thanks for reviewing and merging it ;-)

Re: [PR] Redo #12707: Do not rely on isAlive() status of MemorySegment#Scope [lucene]

2023-11-09 Thread via GitHub
uschindler commented on PR #12785: URL: https://github.com/apache/lucene/pull/12785#issuecomment-1803669858 After some discussion with @mcimadamore we figured out that there are more problems, so we need to rely on the exception message. The following problem can occur and possibly hap

Re: [PR] Redo #12707: Do not rely on isAlive() status of MemorySegment#Scope [lucene]

2023-11-09 Thread via GitHub
uschindler commented on PR #12785: URL: https://github.com/apache/lucene/pull/12785#issuecomment-1803692612 I committed another change to make the sequence of `IndexInput#close()` first try to close the segment and then set everything to null. In case of ISE, the IndexInput is not closed.

Re: [PR] Add TaxonomyReader#getBulkOrdinals method (#12180) [lucene]

2023-11-09 Thread via GitHub
epotyom commented on code in PR #12769: URL: https://github.com/apache/lucene/pull/12769#discussion_r1387902988 ## lucene/facet/src/test/org/apache/lucene/facet/taxonomy/directory/TestDirectoryTaxonomyReader.java: ## @@ -476,6 +479,86 @@ public void testOpenIfChangedReplaceTaxon

Re: [I] Take advantage of bloom filter when delete terms [lucene]

2023-11-09 Thread via GitHub
s1monw commented on issue #12725: URL: https://github.com/apache/lucene/issues/12725#issuecomment-1803718796 yeah I think we should check if it's memory and time efficient. I think in theory we could iterate the terms in the automaton against the bloom filter to take advantage of it inside

Re: [PR] Prevent users from using document block APIs when sort is configured [lucene]

2023-11-09 Thread via GitHub
mikemccand commented on PR #12711: URL: https://github.com/apache/lucene/pull/12711#issuecomment-1803774655 > Really, if we'd be implementing the feature today would we use a bitset or maybe a sparse DV field recording the number of children for each block in the index? In fact, in o

Re: [PR] Add TaxonomyReader#getBulkOrdinals method (#12180) [lucene]

2023-11-09 Thread via GitHub
mikemccand commented on PR #12769: URL: https://github.com/apache/lucene/pull/12769#issuecomment-1803988678 > I can make sure that we have a task that calls this method (indirectly) in the next step for this issue - adding bulk Facets#getSpecificValues, will that be ok? +1, thanks!

Re: [PR] Add TaxonomyReader#getBulkOrdinals method (#12180) [lucene]

2023-11-09 Thread via GitHub
mikemccand commented on code in PR #12769: URL: https://github.com/apache/lucene/pull/12769#discussion_r1388134130 ## lucene/facet/src/test/org/apache/lucene/facet/taxonomy/directory/TestDirectoryTaxonomyReader.java: ## @@ -570,16 +654,20 @@ public void testAccountable() throws

Re: [PR] Add TaxonomyReader#getBulkOrdinals method (#12180) [lucene]

2023-11-09 Thread via GitHub
mikemccand merged PR #12769: URL: https://github.com/apache/lucene/pull/12769

[PR] Improve vector search speed by using FixedBitSet [lucene]

2023-11-09 Thread via GitHub
benwtrent opened a new pull request, #12789: URL: https://github.com/apache/lucene/pull/12789 While doing some performance testing and digging into flamegraphs, I noticed for smaller vectors (96dim float32), we were losing a fair bit of time within the `SparseFixedBitSet#getAndSet` method.
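As a rough illustration of why a dense bitset can be cheap for this kind of visited-set bookkeeping, the sketch below implements a get-and-set over a plain `long[]`. It is a hypothetical stand-in (`DenseVisitedSetSketch` is not a Lucene class and is not the code in the PR): the point is only that a dense layout needs one word lookup and one mask per call, without the extra indirection a sparse layout requires.

```java
public class DenseVisitedSetSketch {
  private final long[] bits;

  public DenseVisitedSetSketch(int numBits) {
    // One long holds 64 bits; round up so every index has a word.
    bits = new long[(numBits + 63) >>> 6];
  }

  // Marks index as visited and reports whether it had been visited before.
  public boolean getAndSet(int index) {
    int word = index >> 6;
    long mask = 1L << index; // for long shifts Java uses only the low 6 bits of index
    boolean wasSet = (bits[word] & mask) != 0;
    bits[word] |= mask;
    return wasSet;
  }
}
```

The trade-off is memory: a dense set allocates one bit per possible index up front, which is the cost the discussion in this thread weighs against the speedup.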

Re: [PR] Add TaxonomyReader#getBulkOrdinals method (#12180) [lucene]

2023-11-09 Thread via GitHub
mikemccand commented on PR #12769: URL: https://github.com/apache/lucene/pull/12769#issuecomment-1804048913 I think this is safe to backport to 9.x? I'll do that, and move the `CHANGES.txt` entry down.

Re: [PR] Redo #12707: Do not rely on isAlive() status of MemorySegment#Scope [lucene]

2023-11-09 Thread via GitHub
uschindler commented on PR #12785: URL: https://github.com/apache/lucene/pull/12785#issuecomment-1804063179 I fixed the `close()` method to no longer throw `IllegalStateException` as this would violate the contract. When we close only `IOException` is allowed. As half-open index inputs are

Re: [PR] script to run microbenchmarks across different ec2 instance types [lucene]

2023-11-09 Thread via GitHub
rmuir commented on PR #12787: URL: https://github.com/apache/lucene/pull/12787#issuecomment-1804143223 I still struggle with the noise, it is even more than when you run the benchmarks manually. I inspected an instance under test and saw e.g. scheduled job burning up CPU rebuilding m

Re: [PR] Improve vector search speed by using FixedBitSet [lucene]

2023-11-09 Thread via GitHub
jpountz commented on PR #12789: URL: https://github.com/apache/lucene/pull/12789#issuecomment-1804146598 I can believe that FixedBitSet is faster in some cases, but it's surprising to me that the memory usage of SparseFixedBitSet can go up to 2x that of FixedBitSet, this makes me wonder if

Re: [PR] Fix CheckIndex to detect major corruption with old (not the latest) commit point [lucene]

2023-11-09 Thread via GitHub
gokaai commented on code in PR #12530: URL: https://github.com/apache/lucene/pull/12530#discussion_r1388271749 ## lucene/core/src/java/org/apache/lucene/index/CheckIndex.java: ## @@ -610,6 +610,31 @@ public Status checkIndex(List onlySegments, ExecutorService executorServ

Re: [PR] Use group-varint encoding for the tail of postings [lucene]

2023-11-09 Thread via GitHub
jpountz commented on code in PR #12782: URL: https://github.com/apache/lucene/pull/12782#discussion_r1388273135 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/GroupVintWriter.java: ## @@ -0,0 +1,97 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [I] Unrolle or vectorize Math.max in CompetitiveImpactAccumulator.addAll? [lucene]

2023-11-09 Thread via GitHub
jpountz commented on issue #12788: URL: https://github.com/apache/lucene/issues/12788#issuecomment-1804189940 Oh, it's sad that this loop doesn't get auto-vectorized automatically. Out of curiosity, are you seeing it show up in some benchmarks?

Re: [I] Reproducible failure in TestIndexWriter.testHasUncommittedChanges [lucene]

2023-11-09 Thread via GitHub
jpountz commented on issue #12763: URL: https://github.com/apache/lucene/issues/12763#issuecomment-1804193016 I'm away from my main working computer this week, I suspect it's a similar issue that I saw elsewhere where merges cascade. I'll look into it on Monday if nobody beats me to it.

Re: [PR] Improve vector search speed by using FixedBitSet [lucene]

2023-11-09 Thread via GitHub
benwtrent commented on PR #12789: URL: https://github.com/apache/lucene/pull/12789#issuecomment-1804203048 @jpountz I re-ran my tests and double checked my numbers, I have some corrections, I accidentally double-counted sparse sizes, so previous numbers are 2x too big. GLOVE-100-100_

Re: [PR] Redo #12707: Do not rely on isAlive() status of MemorySegment#Scope and make sure IndexInput#close() does not throw IllegalStateException and waits instead [lucene]

2023-11-09 Thread via GitHub
uschindler commented on PR #12785: URL: https://github.com/apache/lucene/pull/12785#issuecomment-1804242819 I let `TestMmapDirectory.testAceWithThreads` run with `gradlew :lucene:core:beast` with many iterations and high multiplier: JDK 19, 20, 21 showed no problems.

Re: [PR] Adding new flat vector format and refactoring HNSW [lucene]

2023-11-09 Thread via GitHub
jimczi commented on PR #12729: URL: https://github.com/apache/lucene/pull/12729#issuecomment-1804243772 Sorry for the late reply. > Since this is a larger API discussion, do we think we can move forward with the way it is now (quantization for HNSW and other vector indices) and itera

Re: [PR] Redo #12707: Do not rely on isAlive() status of MemorySegment#Scope and make sure IndexInput#close() does not throw IllegalStateException and waits instead [lucene]

2023-11-09 Thread via GitHub
uschindler merged PR #12785: URL: https://github.com/apache/lucene/pull/12785

Re: [PR] Fix CheckIndex to detect major corruption with old (not the latest) commit point [lucene]

2023-11-09 Thread via GitHub
mikemccand commented on code in PR #12530: URL: https://github.com/apache/lucene/pull/12530#discussion_r1388366133 ## lucene/core/src/java/org/apache/lucene/index/CheckIndex.java: ## @@ -610,6 +610,31 @@ public Status checkIndex(List onlySegments, ExecutorService executorServ

Re: [I] Add Facets#getSpecificValues (bulk) and bulk path -> ordinal lookup for taxonomy faceting [lucene]

2023-11-09 Thread via GitHub
uschindler commented on issue #12180: URL: https://github.com/apache/lucene/issues/12180#issuecomment-1804313295 Hi, the commit causes test failures like this from time to time: ``` org.apache.lucene.facet.taxonomy.directory.TestDirectoryTaxonomyReader > testGetPathAndOrdinalsRandomMul

Re: [PR] Add TaxonomyReader#getBulkOrdinals method (#12180) [lucene]

2023-11-09 Thread via GitHub
uschindler commented on PR #12769: URL: https://github.com/apache/lucene/pull/12769#issuecomment-1804314417 Hi, the commit causes test failures like this from time to time: ``` org.apache.lucene.facet.taxonomy.directory.TestDirectoryTaxonomyReader > testGetPathAndOrdinalsRandomMultithr

Re: [PR] Add TaxonomyReader#getBulkOrdinals method (#12180) [lucene]

2023-11-09 Thread via GitHub
uschindler commented on PR #12769: URL: https://github.com/apache/lucene/pull/12769#issuecomment-1804316973 Looks like the ordinals array sizes must be at least 1, so in general the initial setup of the ordinal size must use `numOrdinals = random(limit) + 1;`
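The `+ 1` suggestion can be sketched as follows, assuming a plain `java.util.Random` rather than whatever random source the actual test uses; `OrdinalCountSketch` and `randomOrdinalCount` are hypothetical names for illustration only.

```java
import java.util.Random;

public class OrdinalCountSketch {
  // Hypothetical helper: random.nextInt(limit) yields 0..limit-1,
  // so adding 1 shifts the range to 1..limit and rules out a zero-length array.
  public static int randomOrdinalCount(Random random, int limit) {
    return random.nextInt(limit) + 1;
  }
}
```

Without the `+ 1`, a draw of 0 would occasionally produce an empty array, which matches the "from time to time" nature of the reported failure.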

Re: [PR] Refactoring HNSW to use a new internal FlatVectorFormat [lucene]

2023-11-09 Thread via GitHub
benwtrent commented on PR #12729: URL: https://github.com/apache/lucene/pull/12729#issuecomment-1804353255 @jimczi updated the title.

Re: [PR] Add TaxonomyReader#getBulkOrdinals method (#12180) [lucene]

2023-11-09 Thread via GitHub
mikemccand commented on PR #12769: URL: https://github.com/apache/lucene/pull/12769#issuecomment-1804396963 Thanks Uwe and sorry! I think Egor is digging on this or I’ll revert soon. Mike On Thu, Nov 9, 2023 at 1:17 PM Uwe Schindler ***@***.***> wrote: > Assigned #127

[PR] Fix random test TestDirectoryTaxonomyReader#TestDirectoryTaxonomyReader [lucene]

2023-11-09 Thread via GitHub
epotyom opened a new pull request, #12790: URL: https://github.com/apache/lucene/pull/12790 Fix bug from https://github.com/apache/lucene/pull/12769

Re: [PR] Add TaxonomyReader#getBulkOrdinals method (#12180) [lucene]

2023-11-09 Thread via GitHub
epotyom commented on PR #12769: URL: https://github.com/apache/lucene/pull/12769#issuecomment-1804432895 Hi all, Sorry for the bug, this pull request should fix it: https://github.com/apache/lucene/pull/12790 Kind regards, Egor On Thu, 9 Nov 2023 at 18:52, Michael M

Re: [PR] Fix random test TestDirectoryTaxonomyReader#TestDirectoryTaxonomyReader [lucene]

2023-11-09 Thread via GitHub
mikemccand merged PR #12790: URL: https://github.com/apache/lucene/pull/12790

Re: [I] Add Facets#getSpecificValues (bulk) and bulk path -> ordinal lookup for taxonomy faceting [lucene]

2023-11-09 Thread via GitHub
mikemccand commented on issue #12180: URL: https://github.com/apache/lucene/issues/12180#issuecomment-1804446870 OK fixed @uschindler -- sorry!

Re: [I] Add Facets#getSpecificValues (bulk) and bulk path -> ordinal lookup for taxonomy faceting [lucene]

2023-11-09 Thread via GitHub
mikemccand commented on issue #12180: URL: https://github.com/apache/lucene/issues/12180#issuecomment-1804447251 And thanks @epotyom!

Re: [I] Add Facets#getSpecificValues (bulk) and bulk path -> ordinal lookup for taxonomy faceting [lucene]

2023-11-09 Thread via GitHub
gsmiller commented on issue #12180: URL: https://github.com/apache/lucene/issues/12180#issuecomment-1804469669 Thanks @epotyom! Should we consider a follow up PR that leverages this new bulk lookup by adding something like `Facets#getSpecificValues` that gets facet values for multiple paths

Re: [PR] Fix random test TestDirectoryTaxonomyReader#testGetPathAndOrdinalsRandomMultithreading [lucene]

2023-11-09 Thread via GitHub
epotyom commented on PR #12790: URL: https://github.com/apache/lucene/pull/12790#issuecomment-1804531938 I've re-run the tests multiple times just in case, there were no errors: ``` ./gradlew -p lucene/facet test --tests "*TestDirectoryTaxonomyReader*" -Ptests.iters=1000 ...

Re: [PR] Fix random test TestDirectoryTaxonomyReader#testGetPathAndOrdinalsRandomMultithreading [lucene]

2023-11-09 Thread via GitHub
uschindler commented on PR #12790: URL: https://github.com/apache/lucene/pull/12790#issuecomment-1804593054 I also need to merge this into the java 22 mmap branch that Jenkins runs on. #12706

Re: [I] Add Facets#getSpecificValues (bulk) and bulk path -> ordinal lookup for taxonomy faceting [lucene]

2023-11-09 Thread via GitHub
epotyom commented on issue #12180: URL: https://github.com/apache/lucene/issues/12180#issuecomment-1804594098 @gsmiller yes, I'll be working on that now as well as adding benchmark task for getSpecificValues, as was discussed with Mike in https://github.com/apache/lucene/pull/12769#pullrequ

Re: [I] Add Facets#getSpecificValues (bulk) and bulk path -> ordinal lookup for taxonomy faceting [lucene]

2023-11-09 Thread via GitHub
gsmiller commented on issue #12180: URL: https://github.com/apache/lucene/issues/12180#issuecomment-1804667028 @epotyom got it, thanks! Didn't see that earlier conversation.

Re: [PR] Use group-varint encoding for the tail of postings [lucene]

2023-11-09 Thread via GitHub
rmuir commented on code in PR #12782: URL: https://github.com/apache/lucene/pull/12782#discussion_r1388595300 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/GroupVintReader.java: ## @@ -0,0 +1,176 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one o

Re: [PR] Fix random test TestDirectoryTaxonomyReader#testGetPathAndOrdinalsRandomMultithreading [lucene]

2023-11-09 Thread via GitHub
uschindler commented on PR #12790: URL: https://github.com/apache/lucene/pull/12790#issuecomment-1804722810 OK merged to java 22 branch. Tests pass.

Re: [I] Unrolle or vectorize Math.max in CompetitiveImpactAccumulator.addAll? [lucene]

2023-11-09 Thread via GitHub
rmuir commented on issue #12788: URL: https://github.com/apache/lucene/issues/12788#issuecomment-1804722948 > Oh, it's sad that this loop doesn't get auto-vectorized automatically. Out of curiosity, are you seeing it show up in some benchmarks I don't believe that, there is code to do

Re: [PR] script to run microbenchmarks across different ec2 instance types [lucene]

2023-11-09 Thread via GitHub
uschindler commented on code in PR #12787: URL: https://github.com/apache/lucene/pull/12787#discussion_r1388633378 ## gradle/validation/rat-sources.gradle: ## @@ -53,6 +53,9 @@ allprojects { include "**/*.sh" include "**/*.bat" +// exclude

Re: [PR] script to run microbenchmarks across different ec2 instance types [lucene]

2023-11-09 Thread via GitHub
rmuir commented on code in PR #12787: URL: https://github.com/apache/lucene/pull/12787#discussion_r1388637999 ## gradle/validation/rat-sources.gradle: ## @@ -53,6 +53,9 @@ allprojects { include "**/*.sh" include "**/*.bat" +// exclude anyt

Re: [PR] script to run microbenchmarks across different ec2 instance types [lucene]

2023-11-09 Thread via GitHub
rmuir commented on code in PR #12787: URL: https://github.com/apache/lucene/pull/12787#discussion_r1388639032 ## gradle/validation/rat-sources.gradle: ## @@ -53,6 +53,9 @@ allprojects { include "**/*.sh" include "**/*.bat" +// exclude anyt

Re: [PR] script to run microbenchmarks across different ec2 instance types [lucene]

2023-11-09 Thread via GitHub
uschindler commented on code in PR #12787: URL: https://github.com/apache/lucene/pull/12787#discussion_r1388639477 ## gradle/validation/rat-sources.gradle: ## @@ -53,6 +53,9 @@ allprojects { include "**/*.sh" include "**/*.bat" +// exclude

Re: [PR] script to run microbenchmarks across different ec2 instance types [lucene]

2023-11-09 Thread via GitHub
rmuir commented on code in PR #12787: URL: https://github.com/apache/lucene/pull/12787#discussion_r1388642669 ## gradle/validation/rat-sources.gradle: ## @@ -53,6 +53,9 @@ allprojects { include "**/*.sh" include "**/*.bat" +// exclude anyt

Re: [PR] script to run microbenchmarks across different ec2 instance types [lucene]

2023-11-09 Thread via GitHub
uschindler commented on code in PR #12787: URL: https://github.com/apache/lucene/pull/12787#discussion_r1388643017 ## gradle/validation/rat-sources.gradle: ## @@ -53,6 +53,9 @@ allprojects { include "**/*.sh" include "**/*.bat" +// exclude

Re: [PR] script to run microbenchmarks across different ec2 instance types [lucene]

2023-11-09 Thread via GitHub
rmuir commented on code in PR #12787: URL: https://github.com/apache/lucene/pull/12787#discussion_r1388647325 ## gradle/validation/rat-sources.gradle: ## @@ -53,6 +53,9 @@ allprojects { include "**/*.sh" include "**/*.bat" +// exclude anyt

Re: [PR] script to run microbenchmarks across different ec2 instance types [lucene]

2023-11-09 Thread via GitHub
uschindler commented on code in PR #12787: URL: https://github.com/apache/lucene/pull/12787#discussion_r1388642702 ## gradle/validation/rat-sources.gradle: ## @@ -53,6 +53,9 @@ allprojects { include "**/*.sh" include "**/*.bat" +// exclude

Re: [PR] script to run microbenchmarks across different ec2 instance types [lucene]

2023-11-09 Thread via GitHub
uschindler commented on code in PR #12787: URL: https://github.com/apache/lucene/pull/12787#discussion_r1388648419 ## gradle/validation/rat-sources.gradle: ## @@ -53,6 +53,9 @@ allprojects { include "**/*.sh" include "**/*.bat" +// exclude

Re: [PR] script to run microbenchmarks across different ec2 instance types [lucene]

2023-11-09 Thread via GitHub
uschindler commented on code in PR #12787: URL: https://github.com/apache/lucene/pull/12787#discussion_r1388668515 ## gradle/validation/rat-sources.gradle: ## @@ -53,6 +53,9 @@ allprojects { include "**/*.sh" include "**/*.bat" +// exclude

Re: [I] Should we explore DiskANN for aKNN vector search? [lucene]

2023-11-09 Thread via GitHub
robertvanwinkle1138 commented on issue #12615: URL: https://github.com/apache/lucene/issues/12615#issuecomment-1804858072 Perhaps much of the jvector performance improvement is simply from on heap caching. https://github.com/jbellis/jvector/blob/main/jvector-base/src/main/java/io/git

Re: [I] Unrolle or vectorize Math.max in CompetitiveImpactAccumulator.addAll? [lucene]

2023-11-09 Thread via GitHub
vsop-479 commented on issue #12788: URL: https://github.com/apache/lucene/issues/12788#issuecomment-1804999164 > To benchmark then use the benchmark-jmh Gradle module. This will enable vectorization if all is sane. Thanks for your explanation. I will try it. > are you seeing it

Re: [PR] Use group-varint encoding for the tail of postings [lucene]

2023-11-09 Thread via GitHub
easyice commented on code in PR #12782: URL: https://github.com/apache/lucene/pull/12782#discussion_r1388843109 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/GroupVintWriter.java: ## @@ -0,0 +1,97 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] Use group-varint encoding for the tail of postings [lucene]

2023-11-09 Thread via GitHub
easyice commented on code in PR #12782: URL: https://github.com/apache/lucene/pull/12782#discussion_r1388845597 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/GroupVintReader.java: ## @@ -0,0 +1,176 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] Use group-varint encoding for the tail of postings [lucene]

2023-11-09 Thread via GitHub
easyice commented on PR #12782: URL: https://github.com/apache/lucene/pull/12782#issuecomment-1805059427 @jpountz @rmuir Thanks for your suggestions, they're very helpful for me! I will run the benchmark for recomputing length vs table lookup.

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-09 Thread via GitHub
dungba88 commented on PR #12624: URL: https://github.com/apache/lucene/pull/12624#issuecomment-1805190081 @mikemccand I put out another revision. Basically the idea is to write everything to a DataOutput (BytesStore is also a DataOutput). To support write-then-read-immediately use case that

Re: [PR] Enable executing using NFA in RegexpQuery [lucene]

2023-11-09 Thread via GitHub
zhaih merged PR #12767: URL: https://github.com/apache/lucene/pull/12767

Re: [PR] Remove patching for doc blocks. [lucene]

2023-11-09 Thread via GitHub
jpountz commented on PR #12741: URL: https://github.com/apache/lucene/pull/12741#issuecomment-1805220170 [Nightly benchmarks](https://home.apache.org/~mikemccand/lucenebench/) just caught up with this change; it's not obvious that there is a speedup.

Re: [PR] Remove patching for doc blocks. [lucene]

2023-11-09 Thread via GitHub
gf2121 commented on PR #12741: URL: https://github.com/apache/lucene/pull/12741#issuecomment-1805253760 FYI this great [view](https://home.apache.org/~mikemccand/lucenebench/2023.11.09.18.02.58.html) could make it easier to see the impact of a single day's changes for all tasks. It seems some co