[jira] [Commented] (LUCENE-10606) Optimize hit collection of prefilter in KnnVectorQuery for BitSet backed queries
[ https://issues.apache.org/jira/browse/LUCENE-10606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17559038#comment-17559038 ]

ASF subversion and git services commented on LUCENE-10606:
----------------------------------------------------------

Commit e055e95d3ef404368c1accea95d88f1cf5b48c80 in lucene's branch refs/heads/branch_9x from Kaival Parikh
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=e055e95d3ef ]

LUCENE-10606: For KnnVectorQuery, optimize case where filter is backed by BitSetIterator (#951)

Instead of collecting hit-by-hit using a `LeafCollector`, we break down the search by instantiating a weight, creating scorers, and checking the underlying iterator. If it is backed by a `BitSet`, we directly update the reference (as we won't be editing the `Bits`). Else we can create a new `BitSet` from the iterator using `BitSet.of`.

> Optimize hit collection of prefilter in KnnVectorQuery for BitSet backed queries
>
>                 Key: LUCENE-10606
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10606
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>            Reporter: Kaival Parikh
>            Priority: Minor
>              Labels: performance
>          Time Spent: 3h 50m
>   Remaining Estimate: 0h
>
> While working on this [PR|https://github.com/apache/lucene/pull/932] to add prefilter testing support, we saw that hit collection took a long time for BitSetIterator-backed scorers (due to iteration over the entire underlying BitSet, and copying it into an internal one).
> These BitSetIterators can be frequent (as they are used in LRUQueryCache), and bulk collection can be optimized with more knowledge of the underlying iterator.

--
This message was sent by Atlassian Jira (v8.20.7#820007)

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
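The dispatch the commit describes can be sketched with invented stand-in types (`MiniIterator` and `MiniBitSetIterator` are hypothetical; Lucene's real types are `DocIdSetIterator` and `BitSetIterator`): reuse the backing `BitSet` when one exists, otherwise materialize one by draining the iterator, which is what `BitSet.of` does.

```java
import java.util.BitSet;

// Hypothetical stand-in for Lucene's DocIdSetIterator.
interface MiniIterator {
    int nextDoc(); // returns -1 when exhausted
}

// Hypothetical stand-in for Lucene's BitSetIterator: exposes its backing bits.
class MiniBitSetIterator implements MiniIterator {
    private final BitSet bits;
    private int next = 0;
    MiniBitSetIterator(BitSet bits) { this.bits = bits; }
    BitSet getBitSet() { return bits; }
    public int nextDoc() {
        int doc = bits.nextSetBit(next);
        if (doc >= 0) next = doc + 1;
        return doc;
    }
}

public class AcceptDocsSketch {
    static BitSet acceptDocs(MiniIterator it, int maxDoc) {
        if (it instanceof MiniBitSetIterator) {
            // Fast path: share the existing bits; no copy, since we only read them.
            return ((MiniBitSetIterator) it).getBitSet();
        }
        // Slow path: collect matching docs into a fresh BitSet.
        BitSet copy = new BitSet(maxDoc);
        for (int doc = it.nextDoc(); doc != -1; doc = it.nextDoc()) {
            copy.set(doc);
        }
        return copy;
    }
}
```

The fast path is what avoids the per-hit collection cost the issue describes: for a cached filter the bits already exist, so hit collection becomes a reference assignment.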
[jira] [Commented] (LUCENE-10606) Optimize hit collection of prefilter in KnnVectorQuery for BitSet backed queries
[ https://issues.apache.org/jira/browse/LUCENE-10606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17559039#comment-17559039 ]

Julie Tibshirani commented on LUCENE-10606:
-------------------------------------------

I'm closing this out since we added a basic optimization for this case. We can expand on the optimization in future PRs/issues.
[jira] [Resolved] (LUCENE-10606) Optimize hit collection of prefilter in KnnVectorQuery for BitSet backed queries
[ https://issues.apache.org/jira/browse/LUCENE-10606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julie Tibshirani resolved LUCENE-10606.
---------------------------------------
    Resolution: Fixed
[jira] [Updated] (LUCENE-10606) Optimize hit collection of prefilter in KnnVectorQuery for BitSet backed queries
[ https://issues.apache.org/jira/browse/LUCENE-10606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julie Tibshirani updated LUCENE-10606:
--------------------------------------
    Fix Version/s: 9.3
[GitHub] [lucene-solr] iverase merged pull request #2664: [8.11] Backport - LUCENE-9580: Don't introduce collinear edges when splitting polygon
iverase merged PR #2664:
URL: https://github.com/apache/lucene-solr/pull/2664

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9580) Tessellator failure for a certain polygon
[ https://issues.apache.org/jira/browse/LUCENE-9580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17559043#comment-17559043 ]

ASF subversion and git services commented on LUCENE-9580:
---------------------------------------------------------

Commit 6a3f50539587cdabe5efe199bc06f6375f1d092a in lucene-solr's branch refs/heads/branch_8_11 from Hugo Mercier
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=6a3f5053958 ]

LUCENE-9580: Fix bug in the polygon tessellator when introducing collinear edges during polygon splitting (#2452) (#2664)

Co-authored-by: Ignacio Vera

> Tessellator failure for a certain polygon
> -----------------------------------------
>
>                 Key: LUCENE-9580
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9580
>             Project: Lucene - Core
>          Issue Type: Bug
>    Affects Versions: 8.5, 8.6
>            Reporter: Iurii Vyshnevskyi
>            Assignee: Ignacio Vera
>            Priority: Major
>             Fix For: 9.0
>          Time Spent: 50m
>   Remaining Estimate: 0h
>
> This bug was discovered while using ElasticSearch (checked with versions 7.6.2 and 7.9.2), but I've created an isolated test case just for Lucene: [https://github.com/apache/lucene-solr/pull/2006/files]
>
> The unit test fails with "java.lang.IllegalArgumentException: Unable to Tessellate shape".
>
> The polygon contains two holes that share the same vertex, plus one more standalone hole. Removing any of them makes the unit test pass.
>
> Changing the least significant digit in any coordinate of the "common vertex" in either of the first two holes, so that the vertices become different in each hole, also makes the unit test pass.
[GitHub] [lucene] jtibshirani opened a new pull request, #986: Fix FieldExistsQuery rewrite when all docs have vectors
jtibshirani opened a new pull request, #986:
URL: https://github.com/apache/lucene/pull/986

Before, we were checking the number of vectors in the segment against the total number of documents in the IndexReader. This meant FieldExistsQuery would not rewrite to MatchAllDocsQuery when there were multiple segments.
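The bug and its fix can be illustrated with plain counts (`SegmentStats` is an invented type; the real check lives inside the query's rewrite logic): a rewrite to match-all is only valid when every segment's vector count equals that segment's own doc count, not the reader-wide total.

```java
import java.util.List;

// Hypothetical sketch of the described bug: comparing per-segment vector
// counts against the reader-wide doc count instead of the segment's own.
public class FieldExistsRewriteSketch {
    record SegmentStats(int vectorCount, int maxDoc) {}

    // Buggy variant: with more than one segment, no segment's vector count
    // can ever equal the reader-wide doc count, so the rewrite never fires.
    static boolean rewritesToMatchAllBuggy(List<SegmentStats> segments) {
        int readerMaxDoc = segments.stream().mapToInt(SegmentStats::maxDoc).sum();
        return segments.stream().allMatch(s -> s.vectorCount() == readerMaxDoc);
    }

    // Fixed variant: each segment is checked against its own doc count.
    static boolean rewritesToMatchAllFixed(List<SegmentStats> segments) {
        return segments.stream().allMatch(s -> s.vectorCount() == s.maxDoc());
    }
}
```

With two fully-populated segments of 10 and 5 docs, the buggy check compares 10 and 5 against 15 and fails, while the fixed check succeeds.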
[GitHub] [lucene] jtibshirani merged pull request #986: Fix FieldExistsQuery rewrite when all docs have vectors
jtibshirani merged PR #986:
URL: https://github.com/apache/lucene/pull/986
[GitHub] [lucene] jpountz commented on a diff in pull request #951: LUCENE-10606: Optimize Prefilter Hit Collection
jpountz commented on code in PR #951:
URL: https://github.com/apache/lucene/pull/951#discussion_r907105526

## lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java:
@@ -92,20 +91,40 @@ public KnnVectorQuery(String field, float[] target, int k, Query filter) {
   public Query rewrite(IndexReader reader) throws IOException {
     TopDocs[] perLeafResults = new TopDocs[reader.leaves().size()];
-    BitSetCollector filterCollector = null;
+    Weight filterWeight = null;
     if (filter != null) {
-      filterCollector = new BitSetCollector(reader.leaves().size());
       IndexSearcher indexSearcher = new IndexSearcher(reader);
       BooleanQuery booleanQuery =
           new BooleanQuery.Builder()
               .add(filter, BooleanClause.Occur.FILTER)
               .add(new FieldExistsQuery(field), BooleanClause.Occur.FILTER)
               .build();
-      indexSearcher.search(booleanQuery, filterCollector);
+      Query rewritten = indexSearcher.rewrite(booleanQuery);
+      filterWeight = indexSearcher.createWeight(rewritten, ScoreMode.COMPLETE_NO_SCORES, 1f);
     }

     for (LeafReaderContext ctx : reader.leaves()) {
-      TopDocs results = searchLeaf(ctx, filterCollector);
+      Bits acceptDocs;
+      int cost;
+      if (filterWeight != null) {
+        Scorer scorer = filterWeight.scorer(ctx);
+        if (scorer != null) {
+          DocIdSetIterator iterator = scorer.iterator();
+          if (iterator instanceof BitSetIterator) {
+            acceptDocs = ((BitSetIterator) iterator).getBitSet();
+          } else {
+            acceptDocs = BitSet.of(iterator, ctx.reader().maxDoc());
+          }
+          cost = (int) iterator.cost();

Review Comment:
> I don't see a good way to do this, since liveDocs is not backed by a FixedBitSet

FWIW there is no guarantee that liveDocs are backed by a FixedBitSet, but the default codec always uses a FixedBitSet.
[GitHub] [lucene] jpountz commented on pull request #972: LUCENE-10480: Use BMM scorer for 2 clauses disjunction
jpountz commented on PR #972:
URL: https://github.com/apache/lucene/pull/972#issuecomment-1167025504

> I feel the effect would be similar?

Indeed, sorry, I had misread your code!

> In terms of next steps, I'm wondering if there's a preference between bulk scorer and scorer implementations when performance improvement is similar

No, it shouldn't matter. Bulk scorers sometimes help yield better performance because it's easier for them to amortize computation across docs, but if they don't yield better performance, there's no point in using a bulk scorer instead of a regular scorer.

I agree that it looks like a great speedup; we should get this in! The benchmark only tests the performance of top-level disjunctions of two term-query clauses. I'd be curious to get performance numbers for queries like the ones below, to see if we need to fine-tune a bit more when this new scorer gets used. Note that I don't think we need to get the performance better for all these queries to merge the change; we could start by only using this new scorer for the (common) case of a top-level disjunction of 2 term queries, and later see if this scorer can handle more disjunctions.

```
OrAndHigMedAndHighMed:  (+including +looking) (+date +finished)  # disjunction of conjunctions, which don't have as good score upper bounds as term queries
OrHighPhraseHighPhrase: "united states" "new york"               # disjunction of phrase queries, which don't have as good score upper bounds as term queries and are slow to advance
AndHighOrMedMed:        +be +(mostly interview)                  # disjunction within conjunction that leads iteration
AndMedOrHighHigh:       +interview +(at united)                  # disjunction within conjunction that doesn't lead iteration
```
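The core idea behind max-score style disjunction scoring that the PR builds on can be shown in a toy form (this is an invented simplification, not Lucene's BMM scorer): once the best score found so far reaches one clause's maximum possible score, documents matching only that clause can no longer improve the result, so that clause stops driving iteration and is only probed where the other clause matches.

```java
// Toy two-clause disjunction: each clause is a sorted doc-id list with
// per-doc scores and a known upper bound on those scores. We compute the
// best combined score over the union, skipping docs that provably cannot
// compete based on the per-clause score upper bounds.
public class TwoClauseMaxScoreSketch {
    static double top1(int[] docsA, double[] scoresA, double maxA,
                       int[] docsB, double[] scoresB, double maxB) {
        double best = 0;
        int ia = 0, ib = 0;
        while (ia < docsA.length || ib < docsB.length) {
            int da = ia < docsA.length ? docsA[ia] : Integer.MAX_VALUE;
            int db = ib < docsB.length ? docsB[ib] : Integer.MAX_VALUE;
            // A doc matching only clause A scores at most maxA; if that can't
            // beat the current best, skip it without scoring (and vice versa).
            if (maxA <= best && da < db) { ia++; continue; }
            if (maxB <= best && db < da) { ib++; continue; }
            if (da < db) {
                best = Math.max(best, scoresA[ia++]);
            } else if (db < da) {
                best = Math.max(best, scoresB[ib++]);
            } else {
                best = Math.max(best, scoresA[ia++] + scoresB[ib++]);
            }
        }
        return best;
    }
}
```

In a real scorer the "skip" is an `advance()` on the non-essential clause rather than a one-step increment, which is where the speedup comes from; this sketch only demonstrates why the skip is safe.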
[GitHub] [lucene] jpountz commented on a diff in pull request #967: LUCENE-10623: Error implementation of docValueCount for SortingSortedSetDocValues
jpountz commented on code in PR #967:
URL: https://github.com/apache/lucene/pull/967#discussion_r907109652

## lucene/core/src/java/org/apache/lucene/index/SortedSetDocValuesWriter.java:
@@ -382,23 +386,20 @@ public int advance(int target) {
     public boolean advanceExact(int target) throws IOException {
       // needed in IndexSorter#StringSorter
       docID = target;
+      initCount();
       ordUpto = ords.offsets[docID] - 1;
       return ords.offsets[docID] > 0;
     }

     @Override
     public long nextOrd() {
-      long ord = ords.ords.get(ordUpto++);
-      if (ord == 0) {
-        return NO_MORE_ORDS;
-      } else {
-        return ord - 1;
-      }
+      return ords.ords.get(ordUpto++);

Review Comment:
We should keep returning NO_MORE_ORDS when ords are exhausted.

## lucene/core/src/java/org/apache/lucene/index/SortedSetDocValuesWriter.java:
@@ -415,34 +416,43 @@ public BytesRef lookupOrd(long ord) throws IOException {
     public long getValueCount() {
       return in.getValueCount();
     }
+
+    private void initCount() {
+      assert docID >= 0;
+      count = (int) ords.growableWriter.get(docID);
+    }
   }

   static final class DocOrds {
     final long[] offsets;
     final PackedLongValues ords;
+    final GrowableWriter growableWriter;

Review Comment:
Let's call it `docValueCounts` or something like that, which better reflects what it stores?
[GitHub] [lucene] LuXugang commented on a diff in pull request #967: LUCENE-10623: Error implementation of docValueCount for SortingSortedSetDocValues
LuXugang commented on code in PR #967:
URL: https://github.com/apache/lucene/pull/967#discussion_r907126551

Review Comment:
@jpountz, I have already used the new ords iteration style for `SortingSortedSetDocValues` in https://github.com/apache/lucene/pull/967/commits/2de6d0c071bf3344f8f026f023df22953aab9ee3; maybe NO_MORE_ORDS is no longer needed?
[GitHub] [lucene] jpountz commented on a diff in pull request #967: LUCENE-10623: Error implementation of docValueCount for SortingSortedSetDocValues
jpountz commented on code in PR #967:
URL: https://github.com/apache/lucene/pull/967#discussion_r907164385

Review Comment:
The problem is that custom DocValuesFormats would no longer work if they iterate over values using NO_MORE_ORDS. Maybe it's ok because users who write custom codecs are expert users, but I'd rather discuss it in a separate issue than do it silently as part of this bug fix?
[GitHub] [lucene] LuXugang commented on a diff in pull request #967: LUCENE-10623: Error implementation of docValueCount for SortingSortedSetDocValues
LuXugang commented on code in PR #967:
URL: https://github.com/apache/lucene/pull/967#discussion_r907169606

Review Comment:
Thanks for the explanation!
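The two iteration contracts being discussed can be contrasted with a small mock (this is an invented illustration, not `SortedSetDocValues` itself): the legacy contract has consumers call `nextOrd()` until a `NO_MORE_ORDS` sentinel comes back, while the newer style has them read `docValueCount()` ords and never rely on a sentinel. Dropping the sentinel silently would break legacy-style consumers such as custom DocValuesFormats.

```java
// Mock ord producer exposing both iteration contracts side by side.
public class OrdIterationSketch {
    static final long NO_MORE_ORDS = -1;

    private final long[] ords;
    private int upto = 0;

    OrdIterationSketch(long[] ords) { this.ords = ords; }

    // Newer contract: caller reads exactly docValueCount() ords.
    int docValueCount() { return ords.length; }
    long nextOrd() { return ords[upto++]; }

    // Legacy contract: a sentinel terminates the loop, so consumers that
    // iterate "until NO_MORE_ORDS" keep working.
    long nextOrdWithSentinel() {
        return upto < ords.length ? ords[upto++] : NO_MORE_ORDS;
    }
}
```

A count-based consumer loops `for (int i = 0; i < docValueCount(); i++)`, while a sentinel-based one loops until `NO_MORE_ORDS`; the review asks to keep both working until the sentinel removal is discussed on its own issue.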
[jira] [Commented] (LUCENE-10622) Prepare complete migration script to GitHub issue from Jira (best effort)
[ https://issues.apache.org/jira/browse/LUCENE-10622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17559095#comment-17559095 ]

Tomoko Uchida commented on LUCENE-10622:
----------------------------------------

There are three issues that couldn't be imported.

[LUCENE-1498]
{code}
[2022-06-26 18:38:25,394] ERROR:import_github_issues: Import GitHub issue /mnt/hdd/repo/sandbox-lucene-10557/migration/github-import-data/GH-LUCENE-1498.json was failed. status=failed, errors=[{'location': '/issue', 'resource': 'Issue', 'field': None, 'value': None, 'code': 'error'}]
{code}
I have no idea about the cause; maybe the body contains character sequences that are not acceptable to GitHub. There are only two comments, so it'd be easy to port it manually.

[LUCENE-4344]
Looks like this is redirected to [SOLR-3769] - no action required.

[LUCENE-5612]
{code}
[2022-06-27 02:28:25,450] ERROR:github_issues_util: Failed to import issue LockStressTest fails always with NativeFSLockFactory [LUCENE-5612]; status_code=413, message={"message":"Payload too big: 1048576 bytes are allowed, 1468832 bytes were posted.","documentation_url":"https://docs.github.com/rest"}
{code}
The data size exceeds the API's limit (1 MB). I think the long stacktrace in a comment is the cause. Maybe we could trim such comments and manually port the trimmed parts afterward.

The other 10608 issues were successfully imported in 20 hours (the first pass).

> Prepare complete migration script to GitHub issue from Jira (best effort)
> -------------------------------------------------------------------------
>
>                 Key: LUCENE-10622
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10622
>             Project: Lucene - Core
>          Issue Type: Sub-task
>            Reporter: Tomoko Uchida
>            Assignee: Tomoko Uchida
>            Priority: Major
>
> If we intend to move the history to GitHub, it should be as complete as possible - significantly degraded copies of history are harmful, rather than helpful, for future contributors, I think.
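One way to implement the trimming suggested above can be sketched as follows (this helper is hypothetical, not part of the actual migration scripts): cut an oversized comment at a UTF-8 byte budget without splitting a multi-byte character, so the import payload stays under the API's 1,048,576-byte limit.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical trimming helper for oversized issue comments.
public class CommentTrimSketch {
    static String trimToUtf8Bytes(String text, int maxBytes) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        if (utf8.length <= maxBytes) {
            return text; // already fits, nothing to do
        }
        int end = maxBytes;
        // Back up past UTF-8 continuation bytes (10xxxxxx) so the cut
        // lands on a character boundary instead of mid-character.
        while (end > 0 && (utf8[end] & 0xC0) == 0x80) {
            end--;
        }
        return new String(utf8, 0, end, StandardCharsets.UTF_8) + " [...truncated]";
    }
}
```

The truncated tail (e.g. a long stacktrace) could then be ported manually afterward, as the comment proposes.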
[GitHub] [lucene] alessandrobenedetti commented on pull request #926: VectorSimilarityFunction reverse removal
alessandrobenedetti commented on PR #926:
URL: https://github.com/apache/lucene/pull/926#issuecomment-1167161892

My plan is to merge tomorrow morning UK time. If you have any additional concerns, let me know!
[GitHub] [lucene] LuXugang commented on a diff in pull request #967: LUCENE-10623: Error implementation of docValueCount for SortingSortedSetDocValues
LuXugang commented on code in PR #967:
URL: https://github.com/apache/lucene/pull/967#discussion_r907325756

Review Comment:
Addressed in https://github.com/apache/lucene/pull/967/commits/48a1ec2c9e263769c60e24582e69c9cd8d00e382
[GitHub] [lucene] LuXugang commented on a diff in pull request #967: LUCENE-10623: Error implementation of docValueCount for SortingSortedSetDocValues
LuXugang commented on code in PR #967:
URL: https://github.com/apache/lucene/pull/967#discussion_r907329493

Review Comment:
@jpountz I reverted part of the code to the old ord iteration style so that the changes in https://github.com/apache/lucene/pull/967/commits/8ccc59812e12386dd684e5a0b85a78a0495fcb11 could be verified; that way we can focus on fixing the bug first.
[GitHub] [lucene] jpountz commented on a diff in pull request #967: LUCENE-10623: Error implementation of docValueCount for SortingSortedSetDocValues
jpountz commented on code in PR #967:
URL: https://github.com/apache/lucene/pull/967#discussion_r907352353

## lucene/core/src/java/org/apache/lucene/index/SortedSetDocValuesWriter.java:
@@ -415,34 +419,45 @@ public BytesRef lookupOrd(long ord) throws IOException {
     public long getValueCount() {
       return in.getValueCount();
     }
+
+    private void initCount() {
+      assert docID >= 0;
+      ordUpto = ords.offsets[docID] - 1;
+      count = (int) ords.docValueCounts.get(docID);
+      limit = ordUpto + count;
+    }
   }

   static final class DocOrds {
     final long[] offsets;
     final PackedLongValues ords;
+    final GrowableWriter docValueCounts;
+
+    public static final int START_BITS_PER_VALUE = 2;

     DocOrds(
         int maxDoc,
         Sorter.DocMap sortMap,
         SortedSetDocValues oldValues,
-        float acceptableOverheadRatio)
+        float acceptableOverheadRatio,
+        int bitsPerValue)
         throws IOException {
       offsets = new long[maxDoc];
       PackedLongValues.Builder builder = PackedLongValues.packedBuilder(acceptableOverheadRatio);
-      long ordOffset = 1; // 0 marks docs with no values
+      docValueCounts = new GrowableWriter(bitsPerValue, maxDoc, acceptableOverheadRatio);
+      long ordOffset = 1;

Review Comment:
Let's start at 0 so we don't have to subtract 1 all the time in `initCount()`?
[jira] [Created] (LUCENE-10627) Using CompositeByteBuf to Reduce Memory Copy
LuYunCheng created LUCENE-10627:
-----------------------------------

             Summary: Using CompositeByteBuf to Reduce Memory Copy
                 Key: LUCENE-10627
                 URL: https://issues.apache.org/jira/browse/LUCENE-10627
             Project: Lucene - Core
          Issue Type: Improvement
          Components: core/codecs, core/store
            Reporter: LuYunCheng

I see that when Lucene flushes and merges stored fields, it needs many memory copies:
{code:java}
Lucene Merge Thread #25940]" #906546 daemon prio=5 os_prio=0 cpu=20503.95ms elapsed=68.76s tid=0x7ee990002c50 nid=0x3aac54 runnable [0x7f17718db000]
   java.lang.Thread.State: RUNNABLE
    at org.apache.lucene.store.ByteBuffersDataOutput.toArrayCopy(ByteBuffersDataOutput.java:271)
    at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.flush(CompressingStoredFieldsWriter.java:239)
    at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.finishDocument(CompressingStoredFieldsWriter.java:169)
    at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:654)
    at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:228)
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:105)
    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4760)
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4364)
    at org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5923)
    at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:624)
    at org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:100)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:682)
{code}
When Lucene's *CompressingStoredFieldsWriter* flushes documents, it needs many memory copies.

With Lucene90 using {*}LZ4WithPresetDictCompressionMode{*}:
# bufferedDocs.toArrayCopy copies blocks into one contiguous array for chunk compression
# the compressor copies dict and data into one block buffer
# do compress
# copy compressed data out

With Lucene90 using {*}DeflateWithPresetDictCompressionMode{*}:
# bufferedDocs.toArrayCopy copies blocks into one contiguous array for chunk compression
# do compress
# copy compressed data out

I think we can use CompositeByteBuf to reduce temporary memory copies:
# we do not have to *bufferedDocs.toArrayCopy* when we just need contiguous content for chunk compression

I wrote a simple mini benchmark in test code:
*LZ4WithPresetDict run* Capacity:41943040(bytes), iter 10 times: Origin elapse:5391ms, New elapse:5297ms
*DeflateWithPresetDict run* Capacity:41943040(bytes), iter 10 times: Origin elapse:115ms, New elapse:12ms

And I ran runStoredFieldsBenchmark with doc_limit=-1, which shows:
||Msec to index||BEST_SPEED||BEST_COMPRESSION||
|Baseline|318877.00|606288.00|
|Candidate|314442.00|604719.00|
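The idea of feeding the compressor multiple buffered blocks without first coalescing them can be sketched with `java.util.zip` (this is an illustrative analogue, not the actual Lucene or Netty CompositeByteBuf code): `Deflater` accepts input incrementally, so each block is consumed in place and the `toArrayCopy`-style contiguous copy never happens.

```java
import java.io.ByteArrayOutputStream;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Sketch: compress a list of byte[] blocks directly, one setInput per block,
// instead of copying them all into a single contiguous array first.
public class CompositeCompressSketch {
    static byte[] deflateBlocks(List<byte[]> blocks) {
        Deflater deflater = new Deflater();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        for (int i = 0; i < blocks.size(); i++) {
            deflater.setInput(blocks.get(i)); // feed each block in place; no coalescing copy
            if (i == blocks.size() - 1) {
                deflater.finish(); // signal end of stream on the last block
            }
            int n;
            while ((n = deflater.deflate(buf)) > 0) {
                out.write(buf, 0, n);
            }
        }
        deflater.end();
        return out.toByteArray();
    }

    // Round-trip helper used to check the output.
    static byte[] inflate(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!inflater.finished()) {
            int n = inflater.inflate(buf);
            if (n == 0) break;
            out.write(buf, 0, n);
        }
        inflater.end();
        return out.toByteArray();
    }
}
```

Decompressing the result yields the concatenation of the blocks, which is exactly what compressing a coalesced copy would produce, minus the extra copy.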
[GitHub] [lucene] luyuncheng opened a new pull request, #987: Using CompositeByteBuf to Reduce Memory Copy
luyuncheng opened a new pull request, #987: URL: https://github.com/apache/lucene/pull/987 JIRA: https://issues.apache.org/jira/browse/LUCENE-10627

I see that when Lucene flushes and merges stored fields, it needs many memory copies:

```
Lucene Merge Thread #25940]" #906546 daemon prio=5 os_prio=0 cpu=20503.95ms elapsed=68.76s tid=0x7ee990002c50 nid=0x3aac54 runnable [0x7f17718db000]
java.lang.Thread.State: RUNNABLE
at org.apache.lucene.store.ByteBuffersDataOutput.toArrayCopy(ByteBuffersDataOutput.java:271)
at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.flush(CompressingStoredFieldsWriter.java:239)
at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.finishDocument(CompressingStoredFieldsWriter.java:169)
at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:654)
at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:228)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:105)
at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4760)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4364)
at org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5923)
at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:624)
at org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:100)
at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:682)
```

When Lucene's `CompressingStoredFieldsWriter` flushes documents, it needs several memory copies:

- With Lucene90 using `LZ4WithPresetDictCompressionMode`:
  1. `bufferedDocs.toArrayCopy` copies the blocks into one contiguous buffer for chunk compression
  2. the compressor copies the dict and the data into one block buffer
  3. compress
  4. copy the compressed data out
- With Lucene90 using `DeflateWithPresetDictCompressionMode`:
  1. `bufferedDocs.toArrayCopy` copies the blocks into one contiguous buffer for chunk compression
  2. compress
  3. copy the compressed data out

I think we can use `CompositeByteBuf` to **reduce temporary memory copies**: we do not have to call `bufferedDocs.toArrayCopy` when we just need contiguous content for chunk compression.

I wrote a simple mini benchmark in test code:

- LZ4WithPresetDict run, Capacity: 41943040 (bytes), 10 iterations: `Origin elapse: 5391ms, New elapse: 5297ms`
- DeflateWithPresetDict run, Capacity: 41943040 (bytes), 10 iterations: `Origin elapse: 115ms, New elapse: 12ms`

And I ran runStoredFieldsBenchmark with doc_limit=-1, which shows:

| Msec to index | BEST_SPEED | BEST_COMPRESSION |
| -- | -- | -- |
| Baseline | 318877.00 | 606288.00 |
| Candidate | 314442.00 | 604719.00 |

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
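The copy-elimination idea above can be sketched outside Lucene. This is a minimal, hypothetical illustration (the class and method names are mine, not Lucene's or Netty's API): instead of first concatenating all buffered blocks into one array, a composite view lets a consumer stream over the blocks in place, so the only remaining copy is the one the compressor itself performs.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the "composite buffer" idea: avoid the up-front copy that
// ByteBuffersDataOutput.toArrayCopy-style concatenation performs.
// Names here are illustrative, not Lucene's actual API.
public class CompositeBlocksSketch {

  // Copying approach: one large allocation plus a full memcpy of every block.
  static byte[] toArrayCopy(List<byte[]> blocks) {
    int total = 0;
    for (byte[] b : blocks) total += b.length;
    byte[] out = new byte[total];
    int pos = 0;
    for (byte[] b : blocks) {
      System.arraycopy(b, 0, out, pos, b.length);
      pos += b.length;
    }
    return out;
  }

  // Composite approach: hand each block to the consumer directly, so a
  // streaming compressor can walk the blocks without an intermediate copy.
  interface ByteConsumer {
    void accept(byte[] block, int off, int len);
  }

  static void forEachBlock(List<byte[]> blocks, ByteConsumer consumer) {
    for (byte[] b : blocks) consumer.accept(b, 0, b.length);
  }

  public static void main(String[] args) {
    List<byte[]> blocks = new ArrayList<>();
    blocks.add(new byte[] {1, 2, 3});
    blocks.add(new byte[] {4, 5});

    // Checksum computed via the copying path.
    int copySum = 0;
    for (byte b : toArrayCopy(blocks)) copySum += b;

    // Same checksum via the zero-copy composite path.
    int[] streamSum = {0};
    forEachBlock(blocks, (buf, off, len) -> {
      for (int i = off; i < off + len; i++) streamSum[0] += buf[i];
    });

    System.out.println(copySum + " " + streamSum[0]); // prints: 15 15
  }
}
```

Both paths see identical bytes; the composite path simply trades one big contiguous allocation for per-block iteration, which is the saving the PR measures.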
[jira] [Updated] (LUCENE-10627) Using CompositeByteBuf to Reduce Memory Copy
[ https://issues.apache.org/jira/browse/LUCENE-10627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] LuYunCheng updated LUCENE-10627: Description: Code: https://github.com/apache/lucene/pull/987 I see When Lucene Do flush and merge store fields, need many memory copies: {code:java} Lucene Merge Thread #25940]" #906546 daemon prio=5 os_prio=0 cpu=20503.95ms elapsed=68.76s tid=0x7ee990002c50 nid=0x3aac54 runnable [0x7f17718db000] java.lang.Thread.State: RUNNABLE at org.apache.lucene.store.ByteBuffersDataOutput.toArrayCopy(ByteBuffersDataOutput.java:271) at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.flush(CompressingStoredFieldsWriter.java:239) at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.finishDocument(CompressingStoredFieldsWriter.java:169) at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:654) at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:228) at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:105) at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4760) at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4364) at org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5923) at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:624) at org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:100) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:682) {code} When Lucene *CompressingStoredFieldsWriter* do flush documents, it needs many memory copies: With Lucene90 using {*}LZ4WithPresetDictCompressionMode{*}: # bufferedDocs.toArrayCopy copy blocks into one continue content for chunk compress # compressor copy dict and data into one block buffer # do compress # copy compressed data out With Lucene90 
using {*}DeflateWithPresetDictCompressionMode{*}: # bufferedDocs.toArrayCopy copy blocks into one continue content for chunk compress # do compress # copy compressed data out I think we can use CompositeByteBuf to reduce temp memory copies: # we do not have to *bufferedDocs.toArrayCopy* when just need continues content for chunk compress I write a simple mini benchamrk in test code: *LZ4WithPresetDict run* Capacity:41943040(bytes) , iter 10times: Origin elapse:5391ms , New elapse:5297ms *DeflateWithPresetDict run* Capacity:41943040(bytes), iter 10times: Origin elapse:115ms, New elapse:12ms And I run runStoredFieldsBenchmark with doc_limit=-1: shows: ||Msec to index||BEST_SPEED ||BEST_COMPRESSION|| |Baseline|318877.00|606288.00| |Candidate|314442.00|604719.00| was: I see When Lucene Do flush and merge store fields, need many memory copies: {code:java} Lucene Merge Thread #25940]" #906546 daemon prio=5 os_prio=0 cpu=20503.95ms elapsed=68.76s tid=0x7ee990002c50 nid=0x3aac54 runnable [0x7f17718db000] java.lang.Thread.State: RUNNABLE at org.apache.lucene.store.ByteBuffersDataOutput.toArrayCopy(ByteBuffersDataOutput.java:271) at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.flush(CompressingStoredFieldsWriter.java:239) at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.finishDocument(CompressingStoredFieldsWriter.java:169) at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:654) at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:228) at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:105) at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4760) at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4364) at org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5923) at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:624) at 
org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:100) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:682) {code} When Lucene *CompressingStoredFieldsWriter* do flush documents, it needs many memory copies: With Lucene90 using {*}LZ4WithPresetDictCompressionMode{*}: # bufferedDocs.toArrayCopy copy blocks into one continue content for chunk compress # compressor copy dict and data into one block buffer # do compress # copy compressed data out With Lucene90 using {*}DeflateWithPresetDictCompressionMode{*}: # bufferedDocs.toArrayCopy copy blocks into one continue content for chunk compress # do compress # copy compressed data out I think we c
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira
[ https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17559161#comment-17559161 ] Tomoko Uchida commented on LUCENE-10557: I opened an INFRA issue https://issues.apache.org/jira/browse/INFRA-23421 > Migrate to GitHub issue from Jira > - > > Key: LUCENE-10557 > URL: https://issues.apache.org/jira/browse/LUCENE-10557 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Tomoko Uchida >Assignee: Tomoko Uchida >Priority: Major > > A few (not the majority) Apache projects already use the GitHub issue instead > of Jira. For example, > Airflow: [https://github.com/apache/airflow/issues] > BookKeeper: [https://github.com/apache/bookkeeper/issues] > So I think it'd be technically possible that we move to GitHub issue. I have > little knowledge of how to proceed with it, I'd like to discuss whether we > should migrate to it, and if so, how to smoothly handle the migration. > The major tasks would be: > * (/) Get a consensus about the migration among committers > * Choose issues that should be moved to GitHub > ** Discussion thread > [https://lists.apache.org/thread/1p3p90k5c0d4othd2ct7nj14bkrxkr12] > ** -Conclusion for now: We don't migrate any issues. Only new issues should > be opened on GitHub.- > ** Write a prototype migration script - the decision could be made on that. > Things to consider: > *** version numbers - labels or milestones? 
> *** add a comment/ prepend a link to the source Jira issue on github side, > *** add a comment/ prepend a link on the jira side to the new issue on > github side (for people who access jira from blogs, mailing list archives and > other sources that will have stale links), > *** convert cross-issue automatic links in comments/ descriptions (as > suggested by Robert), > *** strategy to deal with sub-issues (hierarchies), > *** maybe prefix (or postfix) the issue title on github side with the > original LUCENE-XYZ key so that it is easier to search for a particular issue > there? > *** how to deal with user IDs (author, reporter, commenters)? Do they have > to be github users? Will information about people not registered on github be > lost? > *** create an extra mapping file of old-issue-new-issue URLs for any > potential future uses. > *** what to do with issue numbers in git/svn commits? These could be > rewritten but it'd change the entire git history tree - I don't think this is > practical, while doable. > * Build the convention for issue label/milestone management > ** Do some experiments on a sandbox repository > [https://github.com/mocobeta/sandbox-lucene-10557] > ** Make documentation for metadata (label/milestone) management > * Enable Github issue on the lucene's repository > ** Raise an issue on INFRA > ** (Create an issue-only private repository for sensitive issues if it's > needed and allowed) > ** Set a mail hook to > [issues@lucene.apache.org|mailto:issues@lucene.apache.org] (many thanks to > the general mail group name) > * Set a schedule for migration > ** Give some time to committers to play around with issues/labels/milestones > before the actual migration > ** Make an announcement on the mail lists > ** Show some text messages when opening a new Jira issue (in issue template?) 
-- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-10627) Using CompositeByteBuf to Reduce Memory Copy
[ https://issues.apache.org/jira/browse/LUCENE-10627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] LuYunCheng updated LUCENE-10627: Description: Code: [https://github.com/apache/lucene/pull/987] I see When Lucene Do flush and merge store fields, need many memory copies: {code:java} Lucene Merge Thread #25940]" #906546 daemon prio=5 os_prio=0 cpu=20503.95ms elapsed=68.76s tid=0x7ee990002c50 nid=0x3aac54 runnable [0x7f17718db000] java.lang.Thread.State: RUNNABLE at org.apache.lucene.store.ByteBuffersDataOutput.toArrayCopy(ByteBuffersDataOutput.java:271) at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.flush(CompressingStoredFieldsWriter.java:239) at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.finishDocument(CompressingStoredFieldsWriter.java:169) at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:654) at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:228) at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:105) at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4760) at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4364) at org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5923) at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:624) at org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:100) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:682) {code} When Lucene *CompressingStoredFieldsWriter* do flush documents, it needs many memory copies: With Lucene90 using {*}LZ4WithPresetDictCompressionMode{*}: # bufferedDocs.toArrayCopy copy blocks into one continue content for chunk compress # compressor copy dict and data into one block buffer # do compress # copy compressed data out With Lucene90 
using {*}DeflateWithPresetDictCompressionMode{*}: # bufferedDocs.toArrayCopy copy blocks into one continue content for chunk compress # do compress # copy compressed data out I think we can use CompositeByteBuf to reduce temp memory copies: # we do not have to *bufferedDocs.toArrayCopy* when just need continues content for chunk compress I write a simple mini benchamrk in test code: *LZ4WithPresetDict run* Capacity:41943040(bytes) , iter 10times: Origin elapse:5391ms , New elapse:5297ms *DeflateWithPresetDict run* Capacity:41943040(bytes), iter 10times: Origin elapse:{*}115ms{*}, New elapse:{*}12ms{*} And I run runStoredFieldsBenchmark with doc_limit=-1: shows: ||Msec to index||BEST_SPEED ||BEST_COMPRESSION|| |Baseline|318877.00|606288.00| |Candidate|314442.00|604719.00| was: Code: https://github.com/apache/lucene/pull/987 I see When Lucene Do flush and merge store fields, need many memory copies: {code:java} Lucene Merge Thread #25940]" #906546 daemon prio=5 os_prio=0 cpu=20503.95ms elapsed=68.76s tid=0x7ee990002c50 nid=0x3aac54 runnable [0x7f17718db000] java.lang.Thread.State: RUNNABLE at org.apache.lucene.store.ByteBuffersDataOutput.toArrayCopy(ByteBuffersDataOutput.java:271) at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.flush(CompressingStoredFieldsWriter.java:239) at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.finishDocument(CompressingStoredFieldsWriter.java:169) at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:654) at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:228) at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:105) at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4760) at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4364) at org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5923) at 
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:624) at org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:100) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:682) {code} When Lucene *CompressingStoredFieldsWriter* do flush documents, it needs many memory copies: With Lucene90 using {*}LZ4WithPresetDictCompressionMode{*}: # bufferedDocs.toArrayCopy copy blocks into one continue content for chunk compress # compressor copy dict and data into one block buffer # do compress # copy compressed data out With Lucene90 using {*}DeflateWithPresetDictCompressionMode{*}: # bufferedDocs.toArrayCopy copy blocks into one continue content for chunk compress # do compress # copy
[GitHub] [lucene] mayya-sharipova commented on pull request #926: VectorSimilarityFunction reverse removal
mayya-sharipova commented on PR #926: URL: https://github.com/apache/lucene/pull/926#issuecomment-1167351533 @alessandrobenedetti Thanks for running the tests; the results look good to me. I was also wondering if you have addressed Mike S.'s earlier [comment](https://github.com/apache/lucene/pull/926#issuecomment-1164418508). I assume that your train files (e.g. `sift-128-euclidean.hdf5-test`) are not in hdf5 format, but just have it in their name.
[GitHub] [lucene] mocobeta opened a new pull request, #988: LUCENE-10557: temporarily enable GitHub issues
mocobeta opened a new pull request, #988: URL: https://github.com/apache/lucene/pull/988 This temporarily enables GitHub issues for testing (LUCENE-10557). https://cwiki.apache.org/confluence/display/INFRA/Git+-+.asf.yaml+features#Git.asf.yamlfeatures-Repositoryfeatures After checking that it works, I'll re-disable the feature until the actual migration.
[jira] [Commented] (LUCENE-10571) Monitor alternative "TermFilter" Presearcher for sparse filter fields
[ https://issues.apache.org/jira/browse/LUCENE-10571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17559248#comment-17559248 ] Chris M. Hostetter commented on LUCENE-10571: - /ping [~romseygeek] ... curious if you have any thoughts on this? > Monitor alternative "TermFilter" Presearcher for sparse filter fields > - > > Key: LUCENE-10571 > URL: https://issues.apache.org/jira/browse/LUCENE-10571 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/monitor >Reporter: Chris M. Hostetter >Priority: Major > Attachments: LUCENE-10571.patch > > > One of the things that surprised me the most when looking into how the > {{TermFilteredPresearcher}} worked was what happens when Queries and/or > Documents do _NOT_ have a value in a configured filter field. > per the javadocs... > {quote}Filtering by additional fields can be configured by passing a set of > field names. Documents that contain values in those fields will only be > checked against \{@link MonitorQuery} instances that have the same > fieldname-value mapping in their metadata. > {quote} > ...which is straightforward and useful in the tested example where every > registered Query has {{"language"}} metadata, and every Document has a > {{"language"}} field, but gives unintuitive results when a Query or Document > does *NOT* have a {{"language"}} > A more "intuitive" & useful (in my opinions) implementation would be > something that could be documented as ... > {quote}Filtering by additional fields can be configured by passing a set of > field names. Documents that contain values in those fields will only be > checked against \{@link MonitorQuery} instances > that have the same fieldname-value mapping in their metadata or have no > mapping for that fieldname. > Documents that do not contain values in those fields will only be checked > against \{@link MonitorQuery} instances that also have no mapping for that > fieldname. 
> {quote} > ...ie: instead of being a straight "filter candidate queries by what we find > in the filter fields in the documents" we can instead "derive the queries > that are viable candidates for each document if we were restricting the set > of documents by those values during a "forward search"
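The proposed candidate-selection semantics can be captured in a tiny predicate. This is an illustrative sketch in plain Java, not the Monitor module's actual API; `isCandidate` and the use of `null` to mean "no mapping for the filter field" are my own assumptions for the example.

```java
// Sketch of the proposed "TermFilter" semantics: a query is a candidate for
// a document iff, for the configured filter field, either the query has no
// mapping for that field, or the document's value equals the query's mapped
// value. A null argument models "no value / no mapping" for the field.
public class TermFilterSketch {

  static boolean isCandidate(String docValue, String queryValue) {
    if (queryValue == null) {
      return true; // query has no mapping for the field: always a candidate
    }
    return queryValue.equals(docValue); // otherwise values must match exactly
  }

  public static void main(String[] args) {
    // A document with language=en matches en-queries and unfiltered queries.
    System.out.println(isCandidate("en", "en"));  // true
    System.out.println(isCandidate("en", "de"));  // false
    System.out.println(isCandidate("en", null));  // true
    // A document without a language value matches only unfiltered queries.
    System.out.println(isCandidate(null, "en"));  // false
    System.out.println(isCandidate(null, null));  // true
  }
}
```

This mirrors how a "forward search" with a filter on that field would behave, which is the intuition the comment argues for.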
[GitHub] [lucene] uschindler commented on pull request #978: Remove/deprecate obsolete constants in oal.util.Constants; remove code which is no longer executed after Java 9
uschindler commented on PR #978: URL: https://github.com/apache/lucene/pull/978#issuecomment-1167583777 I will merge this later this evening unless somebody complains :-)
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira
[ https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17559276#comment-17559276 ] Michael McCandless commented on LUCENE-10557: - {quote} Jira markup is converted into Markdown for rendering. * There are many conversion errors and need close investigation.{quote} This seems perhaps solvable, relatively quickly – the conversion tool is open-source right? Tables seem flaky ... what other markup? I can try to dive deep on this if I can make some time. Let's not rush this conversion. {quote}"attachments" (patches, images, etc) cannot be migrated with basic GitHub API functionality. * There could be workarounds; e.g. save them in another github repo and rewrite attachment links to refer to them.{quote} I thought the "unofficial" migration API might support attachments? Or are there big problems with using that API? {quote}As a reference I will migrate existing all issues into a test repository in shortly. Hope we can make a decision by looking at it - I mean, I'll be not able to further invest my time in this PoC. I'll post the PoC migration result to the dev list to ask if we should proceed with it or not next week. {quote} +1! Thank you for pushing so hard on this [~tomoko]! Let's not rush the decision ... others can try to push your PoC forwards too to improve the migration quality. This is worth the one-time investment. And hey, maybe we enable something that future Jira -> GitHub issues migrations can use. > Migrate to GitHub issue from Jira > - > > Key: LUCENE-10557 > URL: https://issues.apache.org/jira/browse/LUCENE-10557 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Tomoko Uchida >Assignee: Tomoko Uchida >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > A few (not the majority) Apache projects already use the GitHub issue instead > of Jira. 
For example, > Airflow: [https://github.com/apache/airflow/issues] > BookKeeper: [https://github.com/apache/bookkeeper/issues] > So I think it'd be technically possible that we move to GitHub issue. I have > little knowledge of how to proceed with it, I'd like to discuss whether we > should migrate to it, and if so, how to smoothly handle the migration. > The major tasks would be: > * (/) Get a consensus about the migration among committers > * Choose issues that should be moved to GitHub > ** Discussion thread > [https://lists.apache.org/thread/1p3p90k5c0d4othd2ct7nj14bkrxkr12] > ** -Conclusion for now: We don't migrate any issues. Only new issues should > be opened on GitHub.- > ** Write a prototype migration script - the decision could be made on that. > Things to consider: > *** version numbers - labels or milestones? > *** add a comment/ prepend a link to the source Jira issue on github side, > *** add a comment/ prepend a link on the jira side to the new issue on > github side (for people who access jira from blogs, mailing list archives and > other sources that will have stale links), > *** convert cross-issue automatic links in comments/ descriptions (as > suggested by Robert), > *** strategy to deal with sub-issues (hierarchies), > *** maybe prefix (or postfix) the issue title on github side with the > original LUCENE-XYZ key so that it is easier to search for a particular issue > there? > *** how to deal with user IDs (author, reporter, commenters)? Do they have > to be github users? Will information about people not registered on github be > lost? > *** create an extra mapping file of old-issue-new-issue URLs for any > potential future uses. > *** what to do with issue numbers in git/svn commits? These could be > rewritten but it'd change the entire git history tree - I don't think this is > practical, while doable. 
> * Build the convention for issue label/milestone management > ** Do some experiments on a sandbox repository > [https://github.com/mocobeta/sandbox-lucene-10557] > ** Make documentation for metadata (label/milestone) management > * Enable Github issue on the lucene's repository > ** Raise an issue on INFRA > ** (Create an issue-only private repository for sensitive issues if it's > needed and allowed) > ** Set a mail hook to > [issues@lucene.apache.org|mailto:issues@lucene.apache.org] (many thanks to > the general mail group name) > * Set a schedule for migration > ** Give some time to committers to play around with issues/labels/milestones > before the actual migration > ** Make an announcement on the mail lists > ** Show some text messages when opening a new Jira issue (in issue template?)
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira
[ https://issues.apache.org/jira/browse/LUCENE-10557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17559285#comment-17559285 ] Uwe Schindler commented on LUCENE-10557: Once we have done this: Should we rewrite CHANGES.txt and replace all LUCENE- links to GITHUB# links? > Migrate to GitHub issue from Jira > - > > Key: LUCENE-10557 > URL: https://issues.apache.org/jira/browse/LUCENE-10557 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Tomoko Uchida >Assignee: Tomoko Uchida >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > A few (not the majority) Apache projects already use the GitHub issue instead > of Jira. For example, > Airflow: [https://github.com/apache/airflow/issues] > BookKeeper: [https://github.com/apache/bookkeeper/issues] > So I think it'd be technically possible that we move to GitHub issue. I have > little knowledge of how to proceed with it, I'd like to discuss whether we > should migrate to it, and if so, how to smoothly handle the migration. > The major tasks would be: > * (/) Get a consensus about the migration among committers > * Choose issues that should be moved to GitHub > ** Discussion thread > [https://lists.apache.org/thread/1p3p90k5c0d4othd2ct7nj14bkrxkr12] > ** -Conclusion for now: We don't migrate any issues. Only new issues should > be opened on GitHub.- > ** Write a prototype migration script - the decision could be made on that. > Things to consider: > *** version numbers - labels or milestones? 
> *** add a comment/ prepend a link to the source Jira issue on github side, > *** add a comment/ prepend a link on the jira side to the new issue on > github side (for people who access jira from blogs, mailing list archives and > other sources that will have stale links), > *** convert cross-issue automatic links in comments/ descriptions (as > suggested by Robert), > *** strategy to deal with sub-issues (hierarchies), > *** maybe prefix (or postfix) the issue title on github side with the > original LUCENE-XYZ key so that it is easier to search for a particular issue > there? > *** how to deal with user IDs (author, reporter, commenters)? Do they have > to be github users? Will information about people not registered on github be > lost? > *** create an extra mapping file of old-issue-new-issue URLs for any > potential future uses. > *** what to do with issue numbers in git/svn commits? These could be > rewritten but it'd change the entire git history tree - I don't think this is > practical, while doable. > * Build the convention for issue label/milestone management > ** Do some experiments on a sandbox repository > [https://github.com/mocobeta/sandbox-lucene-10557] > ** Make documentation for metadata (label/milestone) management > * Enable Github issue on the lucene's repository > ** Raise an issue on INFRA > ** (Create an issue-only private repository for sensitive issues if it's > needed and allowed) > ** Set a mail hook to > [issues@lucene.apache.org|mailto:issues@lucene.apache.org] (many thanks to > the general mail group name) > * Set a schedule for migration > ** Give some time to committers to play around with issues/labels/milestones > before the actual migration > ** Make an announcement on the mail lists > ** Show some text messages when opening a new Jira issue (in issue template?) 
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira
Dawid Weiss commented on LUCENE-10557 Re: Migrate to GitHub issue from Jira I toyed with attachments a bit. I've modified Tomoko's code a bit so that it fetches attachments for each issue and places them under attachments/LUCENE-xyz/blob.ext. I fetched about half of the attachments from Jira and they total ~350MB. So they're quite large but not unbearably large. I created a separate test repository (https://github.com/dweiss/lucene-jira-migration), with a subset of attachment blobs and an example issue (https://github.com/dweiss/lucene-jira-migration/issues/1) that links to them via gh-pages service URLs. Seems to work (mime types, etc.). The test repository has an orphaned (separate-root) branch for just the attachment blobs, but they're still downloaded when you clone the master branch (which I had kind of hoped could be avoided). This means that we'd have to either ask infra to create a separate repository for the ported attachments or keep those attachments in the main Lucene repository (and pay the price of an extra ~1GB of download size when doing a full clone). I didn't check for multiple attachments with the same name (perhaps it's uncommon but definitely possible) - these would have to be saved under a subfolder or something, so that they can be distinguished. A mapping of original attachment URLs and new attachment URLs could also be preserved/written. Since the attachments are a git repository, they should be searchable, but for some reason it didn't work for me (maybe the index needs time to update). This message was sent by Atlassian Jira (v8.20.10#820010-sha1:ace47f9)
[jira] [Comment Edited] (LUCENE-10557) Migrate to GitHub issue from Jira
Dawid Weiss edited a comment on LUCENE-10557 Re: Migrate to GitHub issue from Jira
I toyed with attachments a bit.
* I've modified Tomoko's code a bit so that it fetches attachments for each issue and places them under {{{}attachments/LUCENE-xyz/blob.ext{}}}.
* I fetched about half of the attachments from Jira and they total ~350MB. So they're quite large but not unbearably large.
* I created a separate test repository ( [ https://github.com/dweiss/lucene-jira-migration ] ), with a subset of attachment blobs and an example issue ( [ https://github.com/dweiss/lucene-jira-migration/issues/1 ] ) that links to them via gh-pages service URLs. Seems to work (mime types, etc.).
* The test repository has an orphaned (separate-root) branch for just the attachment blobs, but they're still downloaded when you clone the master branch (which I had kind of hoped could be avoided). This means that we'd have to either ask infra to create a separate repository for the ported attachments or keep those attachments in the main Lucene repository (and pay the price of an extra ~1GB of download size when doing a full clone).
* I didn't check for multiple attachments with the same name (perhaps it's uncommon but definitely possible) - these would have to be saved under a subfolder or something, so that they can be distinguished.
* A mapping of original attachment URLs and new attachment URLs could also be preserved/written.
* Since the attachments are a git repository, they should be searchable, but for some reason it didn't work for me (maybe the index needs time to update).
This is just an experiment, I don't mean to imply it has to be done (or should). I was just curious as to what's possible.
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira
Tomoko Uchida commented on LUCENE-10557 Re: Migrate to GitHub issue from Jira
[Michael McCandless]
> This seems perhaps solvable, relatively quickly – the conversion tool is open-source right? Tables seem flaky ... what other markup? I can try to dive deep on this if I can make some time. Let's not rush this conversion.
Besides tables, even simple bullet lists are broken. I haven't closely looked at it yet, but I suspect there may be problems in the source text (Jira dump). It could be easily fixed once we find the root cause.
> I thought the "unofficial" migration API might support attachments? Or are there big problems with using that API?
The unofficial import API does not support binaries; you can only import texts to GitHub with the official or unofficial APIs. They have to be stored in other places, outside the main repository (a file storage or another repository).
> Let's not rush the decision ... others can try to push your PoC forwards too to improve the migration quality. This is worth the one-time investment. And hey, maybe we enable something that future Jira -> GitHub issues migrations can use.
I understand we can't push others to make a decision though, a progress report could be useful since I think we have not reached any conclusion yet. As for "others can try to push your PoC forwards too to improve the migration quality", yes it could happen, but to be honest I don't expect there are other people who want to be involved in this task.
[Uwe Schindler]
> Once we have done this: Should we rewrite CHANGES.txt and replace all LUCENE- links to GITHUB# links?
I'm not sure if it should be done. Just for your information, the current changes2html.pl supports only Pull Requests, so it should be changed if we want to mention GitHub issues in CHANGES. 
Dawid Weiss:
> I created a separate test repository (https://github.com/dweiss/lucene-jira-migration), with a subset of attachment blobs and an example issue (https://github.com/dweiss/lucene-jira-migration/issues/1) that links to them via gh-pages service URLs. Seems to work (mime types, etc.).

Do we need a git repository at all? We won't need version control for the files. Is a file storage sufficient and easy to handle if we can have one?

> This means that we'd have to either ask infra to create a separate repository for the ported attachments or keep those attachments in the main Lucene repository (and pay the price of an extra ~1GB of download size when doing a full clone).

This is actually the main concern to me. Unfortunately I don't think I'll be able to explain our needs and request support from the infra team. I'm sure I won't be able to be a good negotiator for this even if I want to. We need another person if we want to pursue pulling up all attachments from Jira.
[GitHub] [lucene] madrob opened a new pull request, #989: Add back-compat indices for 8.11.2
madrob opened a new pull request, #989: URL: https://github.com/apache/lucene/pull/989 Regenerated the index manually, not using the wizard. Spent a lot of time trying to isolate the failures, but couldn't figure them out. New index seems to work but I would appreciate other folks testing it. Generated from a download of `lucene-8.11.2-src.tgz` with ant 1.9 and java 8. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-10557) Migrate to GitHub issue from Jira
Tomoko Uchida edited a comment on LUCENE-10557 Re: Migrate to GitHub issue from Jira

[~mikemccand]
bq. This seems perhaps solvable, relatively quickly – the conversion tool is open-source right? Tables seem flaky ... what other markup? I can try to dive deep on this if I can make some time. Let's not rush this conversion.

Besides tables, even simple bullet lists are broken. I haven't closely looked at it yet, but I suspect there may be problems in the source text (Jira dump). It could be easily fixed once we find the root cause. The script uses this converter https://github.com/catcombo/jira2markdown for the PoC; if the cause of the broken markdown is the tool's bug, there could be other tools, or of course we could write our own parser/converter from the Jira markup spec: https://jira.atlassian.com/secure/WikiRendererHelpAction.jspa?section=all

bq. I thought the "unofficial" migration API might support attachments? Or are there big problems with using that API?

The unofficial import API does not support binaries; you can import only "texts" to GitHub with official or unofficial APIs. They have to be stored in other places, maybe outside the main repository (a file storage or another repository).

bq. Let's not rush the decision ... others can try to push your PoC forwards too to improve the migration quality. This is worth the one-time investment. And hey, maybe we enable something that future Jira -> GitHub issues migrations can use.

I understand we can't push others to make a decision, though a progress report could be useful since I think we have not reached any conclusion yet. As for "others can try to push your PoC forwards too to improve the migration quality", yes it could happen, but to be honest I don't expect there are other people who want to be involved in this task.

[~uschindler]
bq. Once we have done this: Should we rewrite CHANGES.txt and replace all LUCENE- links to GITHUB# links?

I'm not sure if it should be done.
Just for your information, the current {{changes2html.pl}} supports only Pull Requests, so the script should be changed if we want to mention GitHub issues in CHANGES. (I have little experience with perl, but I'll take a look if it's needed. Maybe we should also support issues in the near future.)

[~dweiss]
bq. I created a separate test repository (https://github.com/dweiss/lucene-jira-migration), with a subset of attachment blobs and an example issue (https://github.com/dweiss/lucene-jira-migration/issues/1) that links to them via gh-pages service URLs. Seems to work (mime types, etc.).

Do we need a git repository at all? We won't need version control for the files. Is a file storage sufficient and easy to handle if we can have one?

bq. This means that we'd have to either ask infra to create a separate repository for the ported attachments or keep those attachments in the main Lucene repository (and pay the price of an extra ~1GB of download size when doing a full clone).

This is actually the main concern to me. Unfortunately I don't think I'll be able to explain our needs and request support from the infra team. I'm sure I won't be able to be a good negotiator for this even if I want to. We need another person if we want to pursue pulling up all attachments from Jira.
[GitHub] [lucene] gsmiller commented on a diff in pull request #974: LUCENE-10614: Properly support getTopChildren in RangeFacetCounts
gsmiller commented on code in PR #974: URL: https://github.com/apache/lucene/pull/974#discussion_r907916354

## lucene/facet/src/java/org/apache/lucene/facet/range/RangeFacetCounts.java ##

```
@@ -232,20 +233,43 @@ public FacetResult getAllChildren(String dim, String... path) throws IOException
     return new FacetResult(dim, path, totCount, labelValues, labelValues.length);
   }

-  // The current getTopChildren method is not returning "top" ranges. Instead, it returns all
-  // user-provided ranges in
-  // the order the user specified them when instantiating. This concept is being introduced and
-  // supported in the
-  // getAllChildren functionality in LUCENE-10550. getTopChildren is temporarily calling
-  // getAllChildren to maintain its
-  // current behavior, and the current implementation will be replaced by an actual "top children"
-  // implementation
-  // in LUCENE-10614
-  // TODO: fix getTopChildren in LUCENE-10614
   @Override
   public FacetResult getTopChildren(int topN, String dim, String... path) throws IOException {
     validateTopN(topN);
-    return getAllChildren(dim, path);
+    validateDimAndPathForGetChildren(dim, path);
+
+    int resultSize = Math.min(topN, counts.length);
+    PriorityQueue<LabelAndValue> pq =
+        new PriorityQueue<>(resultSize) {
+          @Override
+          protected boolean lessThan(LabelAndValue a, LabelAndValue b) {
+            int cmp = Integer.compare(a.value.intValue(), b.value.intValue());
+            if (cmp == 0) {
+              cmp = b.label.compareTo(a.label);
+            }
+            return cmp < 0;
+          }
+        };
+
+    for (int i = 0; i < counts.length; i++) {
+      if (pq.size() < resultSize) {
+        pq.add(new LabelAndValue(ranges[i].label, counts[i]));
```

Review Comment: I wonder if we should only add to the pq when the count is > 0, to be consistent with other Facet implementations. What do you think?

## lucene/demo/src/java/org/apache/lucene/demo/facet/DistanceFacetsExample.java ##

```
@@ -212,7 +212,26 @@ public static Query getBoundingBoxQuery(
   }

   /** User runs a query and counts facets. */
-  public FacetResult search() throws IOException {
+  public FacetResult searchAllChildren() throws IOException {
+
+    FacetsCollector fc = searcher.search(new MatchAllDocsQuery(), new FacetsCollectorManager());
+
+    Facets facets =
+        new DoubleRangeFacetCounts(
+            "field",
+            getDistanceValueSource(),
+            fc,
+            getBoundingBoxQuery(ORIGIN_LATITUDE, ORIGIN_LONGITUDE, 10.0),
+            ONE_KM,
+            TWO_KM,
+            FIVE_KM,
+            TEN_KM);
+
+    return facets.getAllChildren("field");
+  }
+
+  /** User runs a query and counts facets. */
+  public FacetResult searchTopChildren() throws IOException {
```

Review Comment: I'm not totally sold we need to demo the `getTopChildren` functionality. It feels like it will be a little obscure for range faceting to me. What do you think of just changing the existing example code in-place to use `getAllChildren` instead of `getTopChildren`, since that's probably the more common use-case? Curious what you think though. Do you think we should demo `getTopChildren` as well?

## lucene/facet/src/java/org/apache/lucene/facet/range/RangeFacetCounts.java ##

```
@@ -232,20 +233,43 @@ public FacetResult getAllChildren(String dim, String... path) throws IOException
     return new FacetResult(dim, path, totCount, labelValues, labelValues.length);
   }

-  // The current getTopChildren method is not returning "top" ranges. Instead, it returns all
-  // user-provided ranges in
-  // the order the user specified them when instantiating. This concept is being introduced and
-  // supported in the
-  // getAllChildren functionality in LUCENE-10550. getTopChildren is temporarily calling
-  // getAllChildren to maintain its
-  // current behavior, and the current implementation will be replaced by an actual "top children"
-  // implementation
-  // in LUCENE-10614
-  // TODO: fix getTopChildren in LUCENE-10614
   @Override
   public FacetResult getTopChildren(int topN, String dim, String... path) throws IOException {
     validateTopN(topN);
-    return getAllChildren(dim, path);
+    validateDimAndPathForGetChildren(dim, path);
+
+    int resultSize = Math.min(topN, counts.length);
+    PriorityQueue<LabelAndValue> pq =
+        new PriorityQueue<>(resultSize) {
+          @Override
+          protected boolean lessThan(LabelAndValue a, LabelAndValue b) {
+            int cmp = Integer.compare(a.value.intValue(), b.value.intValue());
+            if (cmp == 0) {
+              cmp = b.label.compareTo(a.label);
+            }
+            return cmp < 0;
+          }
+        };
+
+    for (int i = 0; i < counts.length; i++) {
+      if (pq.size() < resultSize) {
+        pq.add(new LabelAndValue(ranges[i].label, counts[i]));
+      } else {
+        int topValue = pq.top().value.intVa
```
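The review thread above centers on a bounded priority queue that keeps the top-N ranges by count, breaking ties by label. As a self-contained illustration (a sketch only: `TopNRanges`, `topChildren`, and the sample labels/counts are hypothetical, and the JDK's `java.util.PriorityQueue` stands in for Lucene's own `PriorityQueue` with its `lessThan` contract):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TopNRanges {

  // Hypothetical helper (not Lucene API): pick the topN labels by count,
  // mirroring the patch's lessThan() ordering: higher counts win, and
  // ties go to the lexicographically smaller label.
  static List<String> topChildren(int topN, String[] labels, int[] counts) {
    int resultSize = Math.min(topN, counts.length);
    // Min-heap of indices whose head is the weakest entry: lowest count,
    // and among equal counts the lexicographically largest label.
    PriorityQueue<Integer> pq =
        new PriorityQueue<>(
            Comparator.comparingInt((Integer i) -> counts[i])
                .thenComparing((Integer i) -> labels[i], Comparator.reverseOrder()));
    for (int i = 0; i < labels.length; i++) {
      pq.add(i);
      if (pq.size() > resultSize) {
        pq.poll(); // evict the current weakest entry
      }
    }
    // Drain weakest-first, then reverse so the best entry comes first.
    List<String> out = new ArrayList<>();
    while (!pq.isEmpty()) {
      out.add(labels[pq.poll()]);
    }
    Collections.reverse(out);
    return out;
  }

  public static void main(String[] args) {
    String[] labels = {"0-1 km", "1-2 km", "2-5 km", "5-10 km"};
    int[] counts = {3, 7, 7, 1};
    System.out.println(topChildren(2, labels, counts)); // prints [1-2 km, 2-5 km]
  }
}
```

The comparator reproduces the quoted `lessThan`: among equal counts the larger label is "lesser" and gets evicted first, so the final top-N favors earlier labels.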
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira
Tomoko Uchida commented on LUCENE-10557 Re: Migrate to GitHub issue from Jira

As for the attachments, just a rough idea... perhaps we could have these files in our personal space under "https://home.apache.org/~user"? I have never used this space, so "https://home.apache.org/~tomoko" is still empty. I don't know what maximum storage size is allowed per user; if it is too small to store the whole data, we could distribute the files across multiple accounts.
[GitHub] [lucene] LuXugang merged pull request #967: LUCENE-10623: Error implementation of docValueCount for SortingSortedSetDocValues
LuXugang merged PR #967: URL: https://github.com/apache/lucene/pull/967
[GitHub] [lucene] LuXugang commented on a diff in pull request #967: LUCENE-10623: Error implementation of docValueCount for SortingSortedSetDocValues
LuXugang commented on code in PR #967: URL: https://github.com/apache/lucene/pull/967#discussion_r908003266

## lucene/core/src/java/org/apache/lucene/index/SortedSetDocValuesWriter.java ##

```
@@ -415,34 +419,45 @@ public BytesRef lookupOrd(long ord) throws IOException {
     public long getValueCount() {
       return in.getValueCount();
     }
+
+    private void initCount() {
+      assert docID >= 0;
+      ordUpto = ords.offsets[docID] - 1;
+      count = (int) ords.docValueCounts.get(docID);
+      limit = ordUpto + count;
+    }
   }

   static final class DocOrds {
     final long[] offsets;
     final PackedLongValues ords;
+    final GrowableWriter docValueCounts;
+
+    public static final int START_BITS_PER_VALUE = 2;

     DocOrds(
         int maxDoc,
         Sorter.DocMap sortMap,
         SortedSetDocValues oldValues,
-        float acceptableOverheadRatio)
+        float acceptableOverheadRatio,
+        int bitsPerValue)
         throws IOException {
       offsets = new long[maxDoc];
       PackedLongValues.Builder builder = PackedLongValues.packedBuilder(acceptableOverheadRatio);
-      long ordOffset = 1; // 0 marks docs with no values
+      docValueCounts = new GrowableWriter(bitsPerValue, maxDoc, acceptableOverheadRatio);
+      long ordOffset = 1;
```

Review Comment: Thanks, I saw `SortingSortedNumericDocValues` has the same logic; maybe we could fix it in a separate issue.
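The `initCount()` logic quoted above derives a document's ord range from `offsets` and `docValueCounts`. A minimal sketch with plain arrays (assumptions: `DocOrdsSketch` and `ordsForDoc` are illustrative names, and plain `long[]`/`int[]` stand in for Lucene's `PackedLongValues` and `GrowableWriter`):

```java
import java.util.Arrays;

public class DocOrdsSketch {
  // offsets[doc] = 1 + start position of the doc's ords in the flat array;
  // 0 is the sentinel for "doc has no values" (hence ordOffset starts at 1).
  final long[] offsets;
  final int[] docValueCounts; // number of ords per doc
  final long[] ords; // all docs' ords, concatenated

  DocOrdsSketch(long[] offsets, int[] docValueCounts, long[] ords) {
    this.offsets = offsets;
    this.docValueCounts = docValueCounts;
    this.ords = ords;
  }

  // Mirrors initCount(): start = offsets[doc] - 1, range length = docValueCounts[doc].
  long[] ordsForDoc(int doc) {
    if (offsets[doc] == 0) {
      return new long[0]; // sentinel: no values for this doc
    }
    int start = (int) offsets[doc] - 1; // undo the +1 sentinel shift
    return Arrays.copyOfRange(ords, start, start + docValueCounts[doc]);
  }

  public static void main(String[] args) {
    // doc 0 has ords {4, 9}, doc 1 has none, doc 2 has ord {7}
    DocOrdsSketch d =
        new DocOrdsSketch(new long[] {1, 0, 3}, new int[] {2, 0, 1}, new long[] {4, 9, 7});
    System.out.println(Arrays.toString(d.ordsForDoc(0))); // [4, 9]
    System.out.println(Arrays.toString(d.ordsForDoc(2))); // [7]
  }
}
```

This also shows why the fix stores per-doc counts explicitly: with a separate `docValueCounts`, `docValueCount()` no longer has to be inferred from adjacent offsets.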
[jira] [Commented] (LUCENE-10623) Error implementation of docValueCount for SortingSortedSetDocValues
ASF subversion and git services commented on LUCENE-10623 Re: Error implementation of docValueCount for SortingSortedSetDocValues Commit d8fb47b67480afe5fffca68f1565774ef6874d60 in lucene's branch refs/heads/main from Lu Xugang [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=d8fb47b6748 ] LUCENE-10623: Error implementation of docValueCount for SortingSortedSetDocValues (#967)
[jira] [Commented] (LUCENE-10603) Improve iteration of ords for SortedSetDocValues
Lu Xugang commented on LUCENE-10603 Re: Improve iteration of ords for SortedSetDocValues Hi Greg Miller, LUCENE-10623 was resolved; we could continue to work on this issue if you have some free time.
[jira] [Resolved] (LUCENE-10623) Error implementation of docValueCount for SortingSortedSetDocValues
Lu Xugang resolved as Fixed: Lucene - Core / LUCENE-10623 Error implementation of docValueCount for SortingSortedSetDocValues

Change By: Lu Xugang
Resolution: Fixed
Status: Open -> Resolved
[GitHub] [lucene] LuXugang merged pull request #990: Add entry
LuXugang merged PR #990: URL: https://github.com/apache/lucene/pull/990
[GitHub] [lucene] zacharymorn commented on pull request #972: LUCENE-10480: Use BMM scorer for 2 clauses disjunction
zacharymorn commented on PR #972: URL: https://github.com/apache/lucene/pull/972#issuecomment-1168197720

> > I feel the effect would be similar?
>
> Indeed, sorry I had misread your code!

No worries, thanks still for the suggestion!

> No, it shouldn't matter. Bulk scorers sometimes help yield better performance because it's easier for them to amortize computation across docs, but if they don't yield better performance, there's no point in using a bulk scorer instead of a regular scorer.

Ok I see, makes sense.

> I agree that it looks like a great speedup, we should get this in! The benchmark only tests performance of top-level disjunctions of term queries that have two clauses. I'd be curious to get performance numbers for queries like the below ones to see if we need to fine-tune a bit more when this new scorer gets used. Note that I don't think we need to get the performance better for all these queries to merge the change; we could start by only using this new scorer for the (common) case of a top-level disjunction of 2 term queries, and later see if this scorer can handle more disjunctions.
>
> ```
> OrAndHigMedAndHighMed: (+including +looking) (+date +finished)  # disjunction of conjunctions, which don't have as good score upper bounds as term queries
> OrHighPhraseHighPhrase: "united states" "new york"  # disjunction of phrase queries, which don't have as good score upper bounds as term queries and are slow to advance
> AndHighOrMedMed: +be +(mostly interview)  # disjunction within conjunction that leads iteration
> AndMedOrHighHigh: +interview +(at united)  # disjunction within conjunction that doesn't lead iteration
> ```

Sounds good!
I have run these queries through the benchmark and the results look somewhat consistent:

```
Task                       QPS baseline  StdDev   QPS my_modified_version  StdDev   Pct diff              p-value
OrHighPhraseHighPhrase            28.89   (8.7%)                    24.19   (4.7%)   -16.3% ( -27% -  -3%)  0.000
AndHighOrMedMed                  101.24   (6.6%)                   101.09   (3.0%)    -0.1% (  -9% -  10%)  0.927
AndMedOrHighHigh                  81.44   (6.3%)                    81.62   (3.7%)     0.2% (  -9% -  10%)  0.895
OrAndHigMedAndHighMed            128.26   (7.0%)                   136.94   (3.7%)     6.8% (  -3% -  18%)  0.000
PKLookup                         221.47  (11.7%)                   236.93   (9.1%)     7.0% ( -12% -  31%)  0.035
```

```
Task                       QPS baseline  StdDev   QPS my_modified_version  StdDev   Pct diff              p-value
OrHighPhraseHighPhrase            27.73   (9.1%)                    23.73   (4.6%)   -14.4% ( -25% -   0%)  0.000
AndHighOrMedMed                   97.09  (13.1%)                    99.30   (4.3%)     2.3% ( -13% -  22%)  0.462
AndMedOrHighHigh                  75.87  (15.2%)                    80.04   (5.7%)     5.5% ( -13% -  31%)  0.128
PKLookup                         219.70  (15.7%)                   238.75  (12.4%)     8.7% ( -16% -  43%)  0.053
OrAndHigMedAndHighMed            121.83  (13.7%)                   134.79   (4.4%)    10.6% (  -6% -  33%)  0.001
```

```
Task                       QPS baseline  StdDev   QPS my_modified_version  StdDev   Pct diff              p-value
OrHighPhraseHighPhrase            27.42  (16.2%)                    23.99   (4.0%)   -12.5% ( -28% -   9%)  0.001
AndHighOrMedMed                   96.61  (15.8%)                   100.09   (3.6%)     3.6% ( -13% -  27%)  0.321
AndMedOrHighHigh                  75.72  (16.8%)                    79.53   (4.9%)     5.0% ( -14% -  32%)  0.200
OrAndHigMedAndHighMed            122.33  (16.9%)                   136.60   (4.5%)    11.7% (  -8% -  39%)  0.003
PKLookup                         207.94  (21.6%)                   233.10  (16.5%)    12.1% ( -21% -  63%)  0.046
```

Looks like we may need to restrict the scorer to only term queries, or improve it for phrase queries?
[GitHub] [lucene] zacharymorn commented on pull request #972: LUCENE-10480: Use BMM scorer for 2 clauses disjunction
zacharymorn commented on PR #972: URL: https://github.com/apache/lucene/pull/972#issuecomment-1168202563

For `OrHighPhraseHighPhrase`, the JFR CPU sampling result looks similar, but with the modified version calling `advanceShallow` more often, suggesting the BMM implementation might be doing boundary adjustment more often?

Modified:
```
PERCENT  CPU SAMPLES  STACK
8.63%    1389         org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsPostingsEnum#advance()
5.24%    843          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsPostingsEnum#advanceShallow()
3.18%    511          java.nio.DirectByteBuffer#get()
2.79%    449          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader#findFirstGreater()
2.72%    438          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsPostingsEnum#refillPositions()
2.48%    399          jdk.internal.misc.ScopedMemoryAccess#getByteInternal()
2.19%    353          org.apache.lucene.search.ExactPhraseMatcher$1#getImpacts()
2.11%    339          org.apache.lucene.search.PhraseScorer$1#matches()
2.06%    331          org.apache.lucene.codecs.lucene90.Lucene90ScoreSkipReader#skipTo()
1.83%    294          org.apache.lucene.search.ExactPhraseMatcher$1$1#getImpacts()
1.63%    263          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsPostingsEnum#nextPosition()
1.49%    240          org.apache.lucene.store.ByteBufferGuard#getByte()
1.24%    200          org.apache.lucene.search.ExactPhraseMatcher#advancePosition()
1.24%    200          org.apache.lucene.search.ConjunctionDISI#doNext()
1.21%    194          java.util.zip.Inflater#inflateBytesBytes()
1.18%    190          org.apache.lucene.search.ExactPhraseMatcher#nextMatch()
1.13%    182          org.apache.lucene.store.DataInput#readVLong()
1.12%    181          org.apache.lucene.search.ExactPhraseMatcher$1#advanceShallow()
1.11%    178          org.apache.lucene.search.ImpactsDISI#advanceShallow()
1.07%    172          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsPostingsEnum#skipPositions()
0.89%    143          java.lang.Class#isArray()
0.81%    131          org.apache.lucene.codecs.lucene90.ForUtil#expand8()
0.75%    121          org.apache.lucene.search.TwoPhaseIterator$TwoPhaseIteratorAsDocIdSetIterator#doNext()
0.74%    119          org.apache.lucene.codecs.lucene90.PForUtil#innerPrefixSum32()
0.71%    115          org.apache.lucene.search.ConjunctionDISI#docID()
0.71%    115          org.apache.lucene.codecs.lucene90.ForUtil#shiftLongs()
0.70%    113          org.apache.lucene.search.PhraseScorer#docID()
0.70%    112          org.apache.lucene.codecs.lucene90.PForUtil#decode()
0.68%    110          org.apache.lucene.search.ExactPhraseMatcher#maxFreq()
0.68%    109          org.apache.lucene.search.ImpactsDISI#docID()
```

Baseline:
```
PERCENT  CPU SAMPLES  STACK
8.66%    1196         org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsPostingsEnum#advance()
3.88%    536          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader#findFirstGreater()
2.96%    409          java.nio.DirectByteBuffer#get()
2.78%    384          org.apache.lucene.search.ExactPhraseMatcher$1$1#getImpacts()
2.50%    345          org.apache.lucene.codecs.lucene90.Lucene90ScoreSkipReader#skipTo()
2.46%    340          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsPostingsEnum#advanceShallow()
1.73%    239          org.apache.lucene.search.PhraseScorer$1#matches()
1.72%    237          org.apache.lucene.search.ExactPhraseMatcher$1#getImpacts()
1.48%    204          java.util.zip.Inflater#inflateBytesBytes()
1.48%    204          org.apache.lucene.codecs.lucene90.ForUtil#expand8()
1.23%    170          jdk.internal.misc.ScopedMemoryAccess#getByteInternal()
1.21%    167          org.apache.lucene.search.ConjunctionDISI#doNext()
1.20%    166          org.apache.lucene.codecs.lucene90.PForUtil#innerPrefixSum32()
1.19%    165          org.apache.lucene.store.ByteBufferGuard#getByte()
1.12%    155          org.apache.lucene.codecs.lucene90.PForUtil#prefixSum32()
1.07%    148          java.lang.Class#isArray()
1.06%    147          org.apache.lucene.codecs.lucene90.PForUtil#expand32()
0.98%    135          org.apache.lucene.codecs.lucene90.PForUtil#decode()
0.96%    133          org.apache.lucene.search.ConjunctionDISI#docID()
0.91%    125          org.apache.lucene.search.Exac
```
[GitHub] [lucene] Yuti-G commented on a diff in pull request #974: LUCENE-10614: Properly support getTopChildren in RangeFacetCounts
Yuti-G commented on code in PR #974: URL: https://github.com/apache/lucene/pull/974#discussion_r908038451

## lucene/demo/src/java/org/apache/lucene/demo/facet/DistanceFacetsExample.java ##

```
@@ -212,7 +212,26 @@ public static Query getBoundingBoxQuery(
   }

   /** User runs a query and counts facets. */
-  public FacetResult search() throws IOException {
+  public FacetResult searchAllChildren() throws IOException {
+
+    FacetsCollector fc = searcher.search(new MatchAllDocsQuery(), new FacetsCollectorManager());
+
+    Facets facets =
+        new DoubleRangeFacetCounts(
+            "field",
+            getDistanceValueSource(),
+            fc,
+            getBoundingBoxQuery(ORIGIN_LATITUDE, ORIGIN_LONGITUDE, 10.0),
+            ONE_KM,
+            TWO_KM,
+            FIVE_KM,
+            TEN_KM);
+
+    return facets.getAllChildren("field");
+  }
+
+  /** User runs a query and counts facets. */
+  public FacetResult searchTopChildren() throws IOException {
```

Review Comment: I do not have a strong opinion about this. `getAllChildren` does make more sense for range faceting. I will replace the `getTopChildren` with `getAllChildren` in the original demo. Thanks!
[jira] [Commented] (LUCENE-10623) Error implementation of docValueCount for SortingSortedSetDocValues
ASF subversion and git services commented on LUCENE-10623 Re: Error implementation of docValueCount for SortingSortedSetDocValues Commit fb261e6ff48e5a57d9dff7fd960e21ec2634294d in lucene's branch refs/heads/branch_9x from Lu Xugang [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=fb261e6ff48 ] LUCENE-10623: Error implementation of docValueCount for SortingSortedSetDocValues (#967)
[GitHub] [lucene] alessandrobenedetti commented on pull request #926: VectorSimilarityFunction reverse removal
alessandrobenedetti commented on PR #926: URL: https://github.com/apache/lucene/pull/926#issuecomment-1168280355

> I was also wondering if you have addressed the previous Mike S.'s https://github.com/apache/lucene/pull/926#issuecomment-1164418508. I assume that your train files (e.g. sift-128-euclidean.hdf5-test) are not in hdf5 format, but just called like this

Yes @mayya-sharipova, the latest benchmarks reported used the pre-processing @msokolov suggested. That's just the name of the file that's automatically generated by that script :)
[GitHub] [lucene] Yuti-G commented on a diff in pull request #974: LUCENE-10614: Properly support getTopChildren in RangeFacetCounts
Yuti-G commented on code in PR #974: URL: https://github.com/apache/lucene/pull/974#discussion_r908084991

## lucene/facet/src/java/org/apache/lucene/facet/range/RangeFacetCounts.java ##

```
@@ -232,20 +233,43 @@ public FacetResult getAllChildren(String dim, String... path) throws IOException
     return new FacetResult(dim, path, totCount, labelValues, labelValues.length);
   }

-  // The current getTopChildren method is not returning "top" ranges. Instead, it returns all
-  // user-provided ranges in
-  // the order the user specified them when instantiating. This concept is being introduced and
-  // supported in the
-  // getAllChildren functionality in LUCENE-10550. getTopChildren is temporarily calling
-  // getAllChildren to maintain its
-  // current behavior, and the current implementation will be replaced by an actual "top children"
-  // implementation
-  // in LUCENE-10614
-  // TODO: fix getTopChildren in LUCENE-10614
   @Override
   public FacetResult getTopChildren(int topN, String dim, String... path) throws IOException {
     validateTopN(topN);
-    return getAllChildren(dim, path);
+    validateDimAndPathForGetChildren(dim, path);
+
+    int resultSize = Math.min(topN, counts.length);
+    PriorityQueue<LabelAndValue> pq =
+        new PriorityQueue<>(resultSize) {
+          @Override
+          protected boolean lessThan(LabelAndValue a, LabelAndValue b) {
+            int cmp = Integer.compare(a.value.intValue(), b.value.intValue());
+            if (cmp == 0) {
+              cmp = b.label.compareTo(a.label);
+            }
+            return cmp < 0;
+          }
+        };
+
+    for (int i = 0; i < counts.length; i++) {
+      if (pq.size() < resultSize) {
+        pq.add(new LabelAndValue(ranges[i].label, counts[i]));
```

Review Comment: In this case, I propose we also change the `getAllChildren` functionality in RangeFacetCounts to populate LabelAndValue only when the count is > 0, to be consistent with `getAllChildren` in other Facet implementations, since if top-N is equal to all, getAllChildren and getTopChildren should return the same results. Please let me know what you think. Thanks!
[jira] [Commented] (LUCENE-10557) Migrate to GitHub issue from Jira
Dawid Weiss commented on LUCENE-10557 Re: Migrate to GitHub issue from Jira

> Do we need a git repository at all? We won't need version control for the files. Is a file storage sufficient and easy to handle if we can have one?

My hope was that these attachments could be stored in the primary git repository for convenience - keeping the historical artifacts together and having them served for free via GitHub's infrastructure. It's also just convenient, as it can be modified/updated by multiple people (and those same people can freeze the repository for updates once the migration is complete). Having those artifacts elsewhere (on home.apache.org) lacks some of these conveniences, but it's fine too, of course. Also, I don't think infra will have any problem adding a repository called "lucene-archives" or something like this. I can ask if we decide to push in this direction.