[GitHub] [lucene] rmuir commented on pull request #874: LUCENE-10471 Increase max dims for vectors to 2048
rmuir commented on PR #874:
URL: https://github.com/apache/lucene/pull/874#issuecomment-1286914129

The performance with e.g. 768 dimensions is incredibly painful: hours and hours to index just 1M documents. It already doesn't scale with the current limit!

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
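The following is a back-of-envelope sketch of why the dimension count matters. It assumes float32 vectors (4 bytes per dimension) and a plain dot-product similarity. The class and method names are hypothetical, and the byte counts cover only raw vector data, not the HNSW graph or any codec overhead.

Both raw storage and the cost of each similarity comparison scale linearly with the number of dimensions. Going from 768 to 2048 dimensions therefore makes every comparison roughly 2.67x more work.

```java
/**
 * Hypothetical back-of-envelope helper: raw float32 vector storage and the
 * per-comparison cost of a dot product, both linear in the dimension count.
 * Ignores the HNSW graph and codec overhead, so it is a lower bound only.
 */
public class VectorCost {

  /** Raw bytes for numVectors float32 vectors of the given dimension. */
  static long rawVectorBytes(long numVectors, int dims) {
    return numVectors * dims * (long) Float.BYTES;
  }

  /** One similarity comparison: dims multiply-adds, so its cost is O(dims). */
  static float dotProduct(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("dimension mismatch");
    }
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  public static void main(String[] args) {
    long docs = 1_000_000L;
    System.out.println(rawVectorBytes(docs, 768));  // prints 3072000000
    System.out.println(rawVectorBytes(docs, 2048)); // prints 8192000000
    System.out.println(2048.0 / 768.0);             // per-comparison cost ratio, about 2.67
  }
}
```

Any graph-based index performs many such comparisons per inserted vector, so the per-comparison factor compounds with the document count.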
[GitHub] [lucene-solr] risdenk commented on pull request #1728: SOLR-14596: equals/hashCode for common SolrRequest classes
risdenk commented on PR #1728:
URL: https://github.com/apache/lucene-solr/pull/1728#issuecomment-1287203950

@gerlowskija is this still valid? It looks like it might be, modulo some merge conflicts in main.
[GitHub] [lucene-solr] risdenk commented on pull request #2206: BlobDirectoryFactory correctly deletes directories in the blob store
risdenk commented on PR #2206:
URL: https://github.com/apache/lucene-solr/pull/2206#issuecomment-1287219142

@bruno-roustant / @dsmiley is this still valid?
[GitHub] [lucene-solr] risdenk commented on pull request #2137: SOLR-14251 Add option skipFreeSpaceCheck to skip checking for available disk space before splitting shards. Useful with shared file syste
risdenk commented on PR #2137:
URL: https://github.com/apache/lucene-solr/pull/2137#issuecomment-1287219774

This is most likely still valid.
[GitHub] [lucene-solr] epugh commented on pull request #802: SOLR-13626: document the SystemInfoHandler
epugh commented on PR #802:
URL: https://github.com/apache/lucene-solr/pull/802#issuecomment-1287316769

This has been replaced by https://github.com/apache/solr/pull/1104.
[GitHub] [lucene-solr] epugh closed pull request #802: SOLR-13626: document the SystemInfoHandler
epugh closed pull request #802: SOLR-13626: document the SystemInfoHandler
URL: https://github.com/apache/lucene-solr/pull/802
[GitHub] [lucene-solr] risdenk commented on a diff in pull request #116: SOLR-9775 fixed NPEs
risdenk commented on code in PR #116:
URL: https://github.com/apache/lucene-solr/pull/116#discussion_r1002129826

## solr/core/src/java/org/apache/solr/search/QueryResultKey.java: ##
@@ -49,12 +49,14 @@ public QueryResultKey(Query query, List<Query> filters, Sort sort, int nc_flags)
       for (Query filt : filters)
         // NOTE: simple summation used here so keys with the same filters but in
         // different orders get the same hashCode
-        h += filt.hashCode();
+        if (filt != null)
+          h += filt.hashCode();
     }

     sfields = (this.sort !=null) ? this.sort.getSort() : defaultSort;
     for (SortField sf : sfields) {
-      h = h*29 + sf.hashCode();
+      if (sf != null)
+        h = h*29 + sf.hashCode();

Review Comment:
add `{}` around the if statement

## solr/core/src/java/org/apache/solr/search/QueryResultKey.java: ##
@@ -49,12 +49,14 @@ public QueryResultKey(Query query, List<Query> filters, Sort sort, int nc_flags)
       for (Query filt : filters)
         // NOTE: simple summation used here so keys with the same filters but in
         // different orders get the same hashCode
-        h += filt.hashCode();
+        if (filt != null)
+          h += filt.hashCode();

Review Comment:
add `{}` around the if statement
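The following is a minimal sketch of the null-safe hashing in the diff above, written with the braces the reviewer requested. It is not Solr's `QueryResultKey`. `Object` stands in for `Query` and `SortField`, and the class and method names are hypothetical.

Filters are summed, so their order does not affect the hash. Sort fields are folded in with a multiplier of 29, so their order does.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of null-safe, order-independent filter hashing.
 * Object stands in for Solr's Query and SortField types.
 */
public class NullSafeKeyHash {

  static int hash(List<?> filters, List<?> sortFields) {
    int h = 0;
    if (filters != null) {
      for (Object filt : filters) {
        // Simple summation: the same filters in a different order give the same hash.
        if (filt != null) {
          h += filt.hashCode();
        }
      }
    }
    for (Object sf : sortFields) {
      // Multiplicative combination: sort order does affect the hash.
      if (sf != null) {
        h = h * 29 + sf.hashCode();
      }
    }
    return h;
  }

  public static void main(String[] args) {
    List<String> sort = List.of("score");
    int a = hash(Arrays.asList("f1", null, "f2"), sort);
    int b = hash(Arrays.asList("f2", "f1"), sort);
    System.out.println(a == b); // prints true: filter order and null entries do not change the hash
  }
}
```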
[GitHub] [lucene] mdmarshmallow opened a new issue, #11869: Add RangeOnRangeFacetCounts
mdmarshmallow opened a new issue, #11869:
URL: https://github.com/apache/lucene/issues/11869

### Description

We currently have `LongRangeFacetCounts` and `DoubleRangeFacetCounts`, which count facets based on doc values points that fall into a given list of ranges. It would be nice to have a corresponding `RangeOnRangeFacetCounts` that counts facets based on indexed ranges (`LongRangeDocValuesField`, for example) given a list of ranges. We can let the user supply a `RangeFieldQuery#QueryType` to determine how a range is counted (like in `LongRangeSlowRangeQuery`).

I know that the `RangeFacetCounts` classes currently pack the provided ranges into a data structure to enable faster counting, but I think for the first iteration we could skip that optimization and just do a simple linear scan.
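The following is a minimal sketch of the linear-scan approach proposed above. It assumes each document carries a single indexed `[min, max]` long range and each facet bucket is a query range.

The class, the local `QueryType` enum, and the relation logic are hypothetical illustrations. The constant names mirror `RangeFieldQuery.QueryType`, but the semantics encoded here are assumptions, not Lucene code. For example, this sketch treats CROSSES as "intersects but not within".

```java
import java.util.Arrays;

/**
 * Hypothetical linear-scan range-on-range facet counting.
 * QueryType names mirror Lucene's RangeFieldQuery.QueryType; the enum and
 * the relation logic below are local assumptions.
 */
public class RangeOnRangeSketch {

  enum QueryType { INTERSECTS, WITHIN, CONTAINS, CROSSES }

  /** True if the indexed range [docMin, docMax] relates to [qMin, qMax] as type requires. */
  static boolean matches(long docMin, long docMax, long qMin, long qMax, QueryType type) {
    boolean intersects = docMin <= qMax && docMax >= qMin;
    boolean within = docMin >= qMin && docMax <= qMax;
    switch (type) {
      case INTERSECTS:
        return intersects;
      case WITHIN:
        return within;
      case CONTAINS:
        return docMin <= qMin && docMax >= qMax;
      case CROSSES:
        return intersects && !within;
      default:
        throw new AssertionError(type);
    }
  }

  /** For each query range, counts matching doc ranges by brute force: O(docs * ranges). */
  static int[] count(long[][] docRanges, long[][] queryRanges, QueryType type) {
    int[] counts = new int[queryRanges.length];
    for (long[] doc : docRanges) {
      for (int i = 0; i < queryRanges.length; i++) {
        if (matches(doc[0], doc[1], queryRanges[i][0], queryRanges[i][1], type)) {
          counts[i]++;
        }
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    long[][] docs = {{0, 5}, {3, 8}, {10, 20}};
    long[][] buckets = {{0, 10}, {4, 6}};
    System.out.println(Arrays.toString(count(docs, buckets, QueryType.INTERSECTS))); // prints [3, 2]
  }
}
```

Each call costs O(documents * ranges). That is the trade-off the issue accepts for a first iteration. Packing the ranges into a faster lookup structure, as the existing range facet counts do, would be the later optimization.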
[GitHub] [lucene] mdmarshmallow commented on issue #11869: Add RangeOnRangeFacetCounts
mdmarshmallow commented on issue #11869:
URL: https://github.com/apache/lucene/issues/11869#issuecomment-1287402896

I plan on working on this issue.
[GitHub] [lucene-solr] epugh closed pull request #116: SOLR-9775 fixed NPEs
epugh closed pull request #116: SOLR-9775 fixed NPEs
URL: https://github.com/apache/lucene-solr/pull/116
[GitHub] [lucene-solr] epugh commented on pull request #116: SOLR-9775 fixed NPEs
epugh commented on PR #116:
URL: https://github.com/apache/lucene-solr/pull/116#issuecomment-1287413360

See https://github.com/apache/solr/pull/1107
[GitHub] [lucene-solr] gezapeti commented on pull request #2137: SOLR-14251 Add option skipFreeSpaceCheck to skip checking for available disk space before splitting shards. Useful with shared file syst
gezapeti commented on PR #2137:
URL: https://github.com/apache/lucene-solr/pull/2137#issuecomment-1287434191

I'll create a PR for the solr repo instead of this one. Should I create a PR for Solr 8 as well?
[GitHub] [lucene-solr] gezapeti commented on pull request #2137: SOLR-14251 Add option skipFreeSpaceCheck to skip checking for available disk space before splitting shards. Useful with shared file syst
gezapeti commented on PR #2137:
URL: https://github.com/apache/lucene-solr/pull/2137#issuecomment-1287484266

Ohh, Solr 8.x is no longer maintained. I've filed it against main instead: https://github.com/apache/solr/pull/1108
[GitHub] [lucene-solr] gezapeti closed pull request #2137: SOLR-14251 Add option skipFreeSpaceCheck to skip checking for available disk space before splitting shards. Useful with shared file systems li
gezapeti closed pull request #2137: SOLR-14251 Add option skipFreeSpaceCheck to skip checking for available disk space before splitting shards. Useful with shared file systems like HDFS
URL: https://github.com/apache/lucene-solr/pull/2137
[GitHub] [lucene] mikemccand commented on a diff in pull request #11867: Add monster test that indexes 1M vectors
mikemccand commented on code in PR #11867:
URL: https://github.com/apache/lucene/pull/11867#discussion_r1001623486

## lucene/core/src/test/org/apache/lucene/document/TestManyKnnVectors.java: ##
@@ -61,11 +61,13 @@
 @Monster("takes ~2 hours and needs 2GB heap")
 public class TestManyKnnVectors extends LuceneTestCase {
   public void testLargeSegment() throws Exception {
-    // Make sure to use the default codec instead of a random one
-    IndexWriterConfig iwc = newIndexWriterConfig().setCodec(TestUtil.getDefaultCodec());
+    IndexWriterConfig iwc = new IndexWriterConfig();
+    iwc.setCodec(TestUtil.getDefaultCodec()); // Make sure to use the default codec instead of a random one
+    iwc.setRAMBufferSizeMB(3_000); // Use a 3GB buffer to create a single large segment

Review Comment:
> And I think it's good to keep monster tests less monstrous.

LOL
[GitHub] [lucene] mikemccand commented on pull request #11815: Support deletions in rearrange (#11814)
mikemccand commented on PR #11815:
URL: https://github.com/apache/lucene/pull/11815#issuecomment-1286757349

> This change is technically not backwards compatible. Not just because of changes to the rearrange API, but also because now we no longer make the deletes disappear from the rearranged index. They become live instead.

I think this is OK -- this is a new tool, not heavily used. And keeping the deletes live is the purpose of this change :) And `IndexRearranger` is already marked with `@lucene.experimental`, warning users that we might make non-backwards-compatible changes.
[GitHub] [lucene] mikemccand commented on pull request #11815: Support deletions in rearrange (#11814)
mikemccand commented on PR #11815:
URL: https://github.com/apache/lucene/pull/11815#issuecomment-1286765622

What happens if the delete selector deletes 100% of the documents in a segment? `IndexWriter` would normally drop such segments ... does it do so in this case too?

Indeed, `testDeleteEverything` seems to confirm it does, great!

Maybe explain this caveat in the javadocs? I.e. one cannot produce a segment geometry that includes segments with 100% deleted documents.
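The following is a toy model of the caveat suggested for the javadocs. It assumes, as `testDeleteEverything` reportedly confirms, that a segment whose documents are all deleted is dropped.

`Segment` and `surviving` are hypothetical stand-ins, not Lucene's segment metadata classes. The sketch only shows why a requested geometry containing a 100%-deleted segment cannot survive.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model: after deletes are applied, a segment with no live
 * documents is dropped. Segment is a local stand-in, not Lucene's
 * SegmentCommitInfo.
 */
public class FullyDeletedSegments {

  static final class Segment {
    final String name;
    final int maxDoc;
    final int delCount;

    Segment(String name, int maxDoc, int delCount) {
      if (delCount < 0 || delCount > maxDoc) {
        throw new IllegalArgumentException("delCount must be in [0, maxDoc]");
      }
      this.name = name;
      this.maxDoc = maxDoc;
      this.delCount = delCount;
    }
  }

  /** Keeps only segments that still have at least one live document. */
  static List<Segment> surviving(List<Segment> segments) {
    List<Segment> kept = new ArrayList<>();
    for (Segment s : segments) {
      if (s.delCount < s.maxDoc) {
        kept.add(s);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<Segment> requested =
        List.of(
            new Segment("_0", 10, 10), // fully deleted: dropped
            new Segment("_1", 5, 2)); // partially deleted: kept
    for (Segment s : surviving(requested)) {
      System.out.println(s.name + " live=" + (s.maxDoc - s.delCount)); // prints _1 live=3
    }
  }
}
```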
[GitHub] [lucene] mikemccand commented on a diff in pull request #11815: Support deletions in rearrange (#11814)
mikemccand commented on code in PR #11815:
URL: https://github.com/apache/lucene/pull/11815#discussion_r1001630980

## lucene/misc/src/java/org/apache/lucene/misc/index/IndexRearranger.java: ##
@@ -139,6 +203,47 @@ private static void addOneSegment(
     writer.addIndexes(readers);
   }

+  private static void applyDeletes(
+      IndexWriter writer, IndexReader reader, DocumentSelector selector)
+      throws ExecutionException, InterruptedException {
+    if (selector == null) {
+      // There are no deletes to be applied
+      return;
+    }
+
+    ExecutorService executor =
+        Executors.newFixedThreadPool(
+            Math.min(Runtime.getRuntime().availableProcessors(), reader.leaves().size()),
+            new NamedThreadFactory("rearranger"));
+    ArrayList<Future<Void>> futures = new ArrayList<>();
+
+    for (LeafReaderContext context : reader.leaves()) {
+      Callable<Void> applyDeletesToSegment =
+          () -> {
+            applyDeletesToOneSegment(writer, (CodecReader) context.reader(), selector);
+            return null;
+          };
+      futures.add(executor.submit(applyDeletesToSegment));
+    }
+
+    for (Future<Void> future : futures) {
+      future.get();
+    }
+    executor.shutdown();
+  }
+
+  private static void applyDeletesToOneSegment(
+      IndexWriter writer, CodecReader segmentReader, DocumentSelector selector) throws IOException {
+    Bits deletedDocs = selector.getFilteredDocs(segmentReader);
+    for (int i = 0; i < segmentReader.maxDoc(); ++i) {
+      if (deletedDocs.get(i)) {
+        if (writer.tryDeleteDocument(segmentReader, i) == -1) {
+          throw new IllegalStateException("tryDeleteDocument failed and there's no plan B");

Review Comment:
LOL. Fortunately, as `tryDeleteDocument` is implemented today, it should never fail, since you have disabled merging in this writer.

## lucene/misc/src/java/org/apache/lucene/misc/index/IndexRearranger.java: ##
@@ -139,6 +203,47 @@ private static void addOneSegment(
     writer.addIndexes(readers);
   }

+  private static void applyDeletes(
+      IndexWriter writer, IndexReader reader, DocumentSelector selector)
+      throws ExecutionException, InterruptedException {
+    if (selector == null) {
+      // There are no deletes to be applied
+      return;
+    }
+
+    ExecutorService executor =
+        Executors.newFixedThreadPool(
+            Math.min(Runtime.getRuntime().availableProcessors(), reader.leaves().size()),
+            new NamedThreadFactory("rearranger"));
+    ArrayList<Future<Void>> futures = new ArrayList<>();
+
+    for (LeafReaderContext context : reader.leaves()) {
+      Callable<Void> applyDeletesToSegment =
+          () -> {
+            applyDeletesToOneSegment(writer, (CodecReader) context.reader(), selector);
+            return null;
+          };
+      futures.add(executor.submit(applyDeletesToSegment));
+    }
+
+    for (Future<Void> future : futures) {
+      future.get();
+    }
+    executor.shutdown();
+  }
+
+  private static void applyDeletesToOneSegment(
+      IndexWriter writer, CodecReader segmentReader, DocumentSelector selector) throws IOException {
+    Bits deletedDocs = selector.getFilteredDocs(segmentReader);
+    for (int i = 0; i < segmentReader.maxDoc(); ++i) {

Review Comment:
Could we rename `i` to `docid`?
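The following is a stdlib-only sketch of the fan-out/fan-in pattern used by `applyDeletes` in the diff above. It submits one task per segment to a pool of `min(availableProcessors, segments)` threads, then waits on every `Future`.

Segments are modeled as hypothetical `boolean[]` liveness arrays, and the selector as an `IntPredicate`. These stand in for `CodecReader` and `DocumentSelector`; nothing here calls Lucene. The sketch also adopts the `docid` naming from the review.

Two small hardening choices are suggestions, not part of the PR:
- `shutdown()` sits in a `finally` block, so the pool is released even if a task fails.
- An empty segment list returns early, because `Executors.newFixedThreadPool` rejects a pool size of 0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntPredicate;

/**
 * Hypothetical stdlib-only model of per-segment parallel deletes.
 * boolean[] liveness arrays stand in for segments; IntPredicate stands in
 * for the document selector.
 */
public class ParallelSegmentDeletes {

  /** Marks every selected, still-live docid as deleted; returns total deletions. */
  static int applyDeletes(List<boolean[]> segments, IntPredicate selector)
      throws ExecutionException, InterruptedException {
    if (selector == null || segments.isEmpty()) {
      return 0; // nothing to delete; also avoids a zero-sized thread pool
    }
    int threads = Math.min(Runtime.getRuntime().availableProcessors(), segments.size());
    ExecutorService executor = Executors.newFixedThreadPool(threads);
    try {
      List<Future<Integer>> futures = new ArrayList<>();
      for (boolean[] live : segments) {
        Callable<Integer> task = () -> applyDeletesToOneSegment(live, selector);
        futures.add(executor.submit(task));
      }
      int total = 0;
      for (Future<Integer> future : futures) {
        total += future.get(); // rethrows a task failure as ExecutionException
      }
      return total;
    } finally {
      executor.shutdown(); // runs even if a task failed
    }
  }

  private static int applyDeletesToOneSegment(boolean[] live, IntPredicate selector) {
    int deleted = 0;
    for (int docid = 0; docid < live.length; ++docid) {
      if (live[docid] && selector.test(docid)) {
        live[docid] = false;
        deleted++;
      }
    }
    return deleted;
  }

  public static void main(String[] args) throws Exception {
    List<boolean[]> segments = new ArrayList<>();
    for (int size : new int[] {4, 3}) {
      boolean[] live = new boolean[size];
      Arrays.fill(live, true);
      segments.add(live);
    }
    // Delete even docids: {0, 2} in each segment.
    System.out.println(applyDeletes(segments, docid -> docid % 2 == 0)); // prints 4
  }
}
```

Each task writes only to its own segment's array, so no extra synchronization is needed. `Future.get()` surfaces task failures as `ExecutionException`. It also makes each task's writes visible to the calling thread.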