[GitHub] [lucene] rmuir commented on pull request #874: LUCENE-10471 Increse max dims for vectors to 2048

2022-10-21 Thread GitBox


rmuir commented on PR #874:
URL: https://github.com/apache/lucene/pull/874#issuecomment-1286914129

   The performance with e.g. 768 dimensions is incredibly painful: hours and 
hours to index just 1M documents. It already doesn't scale with the current limit! 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] risdenk commented on pull request #1728: SOLR-14596: equals/hashCode for common SolrRequest classes

2022-10-21 Thread GitBox


risdenk commented on PR #1728:
URL: https://github.com/apache/lucene-solr/pull/1728#issuecomment-1287203950

   @gerlowskija is this still valid? It looks like it might be modulo some 
merge conflicts in main.





[GitHub] [lucene-solr] risdenk commented on pull request #2206: BlobDirectoryFactory correctly deletes directories in the blob store

2022-10-21 Thread GitBox


risdenk commented on PR #2206:
URL: https://github.com/apache/lucene-solr/pull/2206#issuecomment-1287219142

   @bruno-roustant / @dsmiley is this still valid?





[GitHub] [lucene-solr] risdenk commented on pull request #2137: SOLR-14251 Add option skipFreeSpaceCheck to skip checking for availble disk space before splitting shards. Useful with shared file syste

2022-10-21 Thread GitBox


risdenk commented on PR #2137:
URL: https://github.com/apache/lucene-solr/pull/2137#issuecomment-1287219774

   This is most likely still valid.





[GitHub] [lucene-solr] epugh commented on pull request #802: SOLR-13626: document the SystemInfoHandler

2022-10-21 Thread GitBox


epugh commented on PR #802:
URL: https://github.com/apache/lucene-solr/pull/802#issuecomment-1287316769

   This has been replaced by https://github.com/apache/solr/pull/1104.





[GitHub] [lucene-solr] epugh closed pull request #802: SOLR-13626: document the SystemInfoHandler

2022-10-21 Thread GitBox


epugh closed pull request #802: SOLR-13626: document the SystemInfoHandler
URL: https://github.com/apache/lucene-solr/pull/802





[GitHub] [lucene-solr] risdenk commented on a diff in pull request #116: SOLR-9775 fixed NPEs

2022-10-21 Thread GitBox


risdenk commented on code in PR #116:
URL: https://github.com/apache/lucene-solr/pull/116#discussion_r1002129826


##
solr/core/src/java/org/apache/solr/search/QueryResultKey.java:
##
@@ -49,12 +49,14 @@ public QueryResultKey(Query query, List<Query> filters, Sort sort, int nc_flags)
       for (Query filt : filters)
         // NOTE: simple summation used here so keys with the same filters but in
         // different orders get the same hashCode
-        h += filt.hashCode();
+        if (filt != null)
+          h += filt.hashCode();
     }
 
     sfields = (this.sort !=null) ? this.sort.getSort() : defaultSort;
     for (SortField sf : sfields) {
-      h = h*29 + sf.hashCode();
+      if (sf != null)
+        h = h*29 + sf.hashCode();

Review Comment:
   Add `{}` around the if statement.



##
solr/core/src/java/org/apache/solr/search/QueryResultKey.java:
##
@@ -49,12 +49,14 @@ public QueryResultKey(Query query, List<Query> filters, Sort sort, int nc_flags)
       for (Query filt : filters)
         // NOTE: simple summation used here so keys with the same filters but in
         // different orders get the same hashCode
-        h += filt.hashCode();
+        if (filt != null)
+          h += filt.hashCode();

Review Comment:
   Add `{}` around the if statement.
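
   A minimal sketch of the braced form the review asks for, combined with the null checks from the diff. This is not the actual `QueryResultKey` code: the class `FilterHashSketch` and the method `orderInsensitiveHash` are hypothetical names used only for illustration. It keeps the simple summation from the original comment, so the same filters in a different order still hash to the same value.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of Solr.
final class FilterHashSketch {
  // Sums the element hash codes so that the same filters in a different
  // order produce the same result. Null entries are skipped.
  static int orderInsensitiveHash(List<?> filters) {
    int h = 0;
    if (filters != null) {
      for (Object filt : filters) {
        if (filt != null) {
          h += filt.hashCode();
        }
      }
    }
    return h;
  }

  public static void main(String[] args) {
    List<String> a = Arrays.asList("fq:a", null, "fq:b");
    List<String> b = Arrays.asList("fq:b", "fq:a");
    // Same non-null filters in a different order: the hashes are equal.
    System.out.println(orderInsensitiveHash(a) == orderInsensitiveHash(b));
  }
}
```

   With braces on both the loop and the condition, adding a second statement to either body later cannot silently fall outside it.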






[GitHub] [lucene] mdmarshmallow opened a new issue, #11869: Add RangeOnRangeFacetCounts

2022-10-21 Thread GitBox


mdmarshmallow opened a new issue, #11869:
URL: https://github.com/apache/lucene/issues/11869

   ### Description
   
   We currently have `LongRangeFacetCounts` and `DoubleRangeFacetCounts`, which 
count facets based on doc values points that fall into a given list of ranges. 
It would be nice to have a corresponding `RangeOnRangeFacetCounts` that counts 
facets based on indexed ranges (`LongRangeDocValuesField`, for example) given a 
list of ranges. We can let the user supply a `RangeFieldQuery#QueryType` to 
determine how the range is counted (as in `LongRangeSlowRangeQuery`).
   
   I know that the `RangeFacetCounts` classes currently pack the provided ranges 
into a data structure to enable faster counting, but I think for the first 
iteration we could probably skip that optimization and just do a simple linear scan. 
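
   The linear scan described above could look roughly like the following. This is a sketch under stated assumptions, not a proposed implementation: `RangeOnRangeSketch`, `LongRange`, `Relation`, `matches` and `count` are all hypothetical names, the ranges are plain in-memory closed intervals rather than doc values, and `Relation` only loosely mirrors three of the relations in `RangeFieldQuery#QueryType` (`CROSSES` is left out).

```java
import java.util.Arrays;

// Hypothetical sketch; none of these names are Lucene APIs.
final class RangeOnRangeSketch {
  // Loosely mirrors three relations of RangeFieldQuery.QueryType.
  enum Relation { INTERSECTS, WITHIN, CONTAINS }

  // A closed range [min, max].
  static final class LongRange {
    final long min;
    final long max;

    LongRange(long min, long max) {
      this.min = min;
      this.max = max;
    }
  }

  // True if the indexed (document) range matches the facet range under rel.
  static boolean matches(LongRange doc, LongRange facet, Relation rel) {
    switch (rel) {
      case INTERSECTS:
        return doc.min <= facet.max && doc.max >= facet.min;
      case WITHIN:
        return doc.min >= facet.min && doc.max <= facet.max;
      case CONTAINS:
        return doc.min <= facet.min && doc.max >= facet.max;
      default:
        throw new AssertionError(rel);
    }
  }

  // Linear scan: every document range is tested against every facet range.
  static int[] count(LongRange[] docs, LongRange[] facets, Relation rel) {
    int[] counts = new int[facets.length];
    for (LongRange doc : docs) {
      for (int i = 0; i < facets.length; i++) {
        if (matches(doc, facets[i], rel)) {
          counts[i]++;
        }
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    LongRange[] docs = {new LongRange(0, 10), new LongRange(5, 20), new LongRange(30, 40)};
    LongRange[] facets = {new LongRange(0, 15), new LongRange(10, 50)};
    System.out.println(Arrays.toString(count(docs, facets, Relation.INTERSECTS))); // [2, 3]
    System.out.println(Arrays.toString(count(docs, facets, Relation.WITHIN))); // [1, 1]
  }
}
```

   The cost is one comparison per document range per facet range, which is the trade-off the issue accepts for a first iteration.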





[GitHub] [lucene] mdmarshmallow commented on issue #11869: Add RangeOnRangeFacetCounts

2022-10-21 Thread GitBox


mdmarshmallow commented on issue #11869:
URL: https://github.com/apache/lucene/issues/11869#issuecomment-1287402896

   I plan on working on this issue





[GitHub] [lucene-solr] epugh closed pull request #116: SOLR-9775 fixed NPEs

2022-10-21 Thread GitBox


epugh closed pull request #116: SOLR-9775 fixed NPEs
URL: https://github.com/apache/lucene-solr/pull/116





[GitHub] [lucene-solr] epugh commented on pull request #116: SOLR-9775 fixed NPEs

2022-10-21 Thread GitBox


epugh commented on PR #116:
URL: https://github.com/apache/lucene-solr/pull/116#issuecomment-1287413360

   See https://github.com/apache/solr/pull/1107





[GitHub] [lucene-solr] gezapeti commented on pull request #2137: SOLR-14251 Add option skipFreeSpaceCheck to skip checking for availble disk space before splitting shards. Useful with shared file syst

2022-10-21 Thread GitBox


gezapeti commented on PR #2137:
URL: https://github.com/apache/lucene-solr/pull/2137#issuecomment-1287434191

   I'll create a PR for the solr repo instead of this one. Should I create a PR 
for Solr 8 as well?





[GitHub] [lucene-solr] gezapeti commented on pull request #2137: SOLR-14251 Add option skipFreeSpaceCheck to skip checking for availble disk space before splitting shards. Useful with shared file syst

2022-10-21 Thread GitBox


gezapeti commented on PR #2137:
URL: https://github.com/apache/lucene-solr/pull/2137#issuecomment-1287484266

   Ohh, Solr 8x is not maintained anymore. I've filed a PR against main: 
https://github.com/apache/solr/pull/1108





[GitHub] [lucene-solr] gezapeti closed pull request #2137: SOLR-14251 Add option skipFreeSpaceCheck to skip checking for availble disk space before splitting shards. Useful with shared file systems li

2022-10-21 Thread GitBox


gezapeti closed pull request #2137: SOLR-14251 Add option skipFreeSpaceCheck to 
skip checking for availble disk space before splitting shards. Useful with 
shared file systems like HDFS
URL: https://github.com/apache/lucene-solr/pull/2137





[GitHub] [lucene] mikemccand commented on a diff in pull request #11867: Add monster test that indexes 1M vectors

2022-10-21 Thread GitBox


mikemccand commented on code in PR #11867:
URL: https://github.com/apache/lucene/pull/11867#discussion_r1001623486


##
lucene/core/src/test/org/apache/lucene/document/TestManyKnnVectors.java:
##
@@ -61,11 +61,13 @@
 @Monster("takes ~2 hours and needs 2GB heap")
 public class TestManyKnnVectors extends LuceneTestCase {
   public void testLargeSegment() throws Exception {
-    // Make sure to use the default codec instead of a random one
-    IndexWriterConfig iwc = newIndexWriterConfig().setCodec(TestUtil.getDefaultCodec());
+    IndexWriterConfig iwc = new IndexWriterConfig();
+    iwc.setCodec(TestUtil.getDefaultCodec()); // Make sure to use the default codec instead of a random one
+    iwc.setRAMBufferSizeMB(3_000); // Use a 3GB buffer to create a single large segment

Review Comment:
   > And I think its good to keep monster tests less monstrous.
   
   LOL






[GitHub] [lucene] mikemccand commented on pull request #11815: Support deletions in rearrange (#11814)

2022-10-21 Thread GitBox


mikemccand commented on PR #11815:
URL: https://github.com/apache/lucene/pull/11815#issuecomment-1286757349

   > This change is technically not backwards compatible. Not just because of 
changes to the rearrange API, but also because now we no longer make the 
deletes disappear from the rearranged index. They become live instead.
   
   I think this is OK -- this is a new tool, not heavily used.  And keeping the 
deletes live is the purpose of this change :)
   
   And `IndexRearranger` is already marked with `@lucene.experimental`, warning 
users that we might make non-backwards-compatible changes.





[GitHub] [lucene] mikemccand commented on pull request #11815: Support deletions in rearrange (#11814)

2022-10-21 Thread GitBox


mikemccand commented on PR #11815:
URL: https://github.com/apache/lucene/pull/11815#issuecomment-1286765622

   What happens if the delete selector deletes 100% of the documents in a 
segment?
   
   `IndexWriter` would normally drop such segments ... does it do so in this 
case too?  Indeed, `testDeleteEverything` seems to confirm it does, great!
   
   Maybe explain this caveat in the javadocs?  I.e. one cannot produce a 
segment geometry that includes segments with 100% deleted documents.





[GitHub] [lucene] mikemccand commented on a diff in pull request #11815: Support deletions in rearrange (#11814)

2022-10-21 Thread GitBox


mikemccand commented on code in PR #11815:
URL: https://github.com/apache/lucene/pull/11815#discussion_r1001630980


##
lucene/misc/src/java/org/apache/lucene/misc/index/IndexRearranger.java:
##
@@ -139,6 +203,47 @@ private static void addOneSegment(
     writer.addIndexes(readers);
   }
 
+  private static void applyDeletes(
+      IndexWriter writer, IndexReader reader, DocumentSelector selector)
+      throws ExecutionException, InterruptedException {
+    if (selector == null) {
+      // There are no deletes to be applied
+      return;
+    }
+
+    ExecutorService executor =
+        Executors.newFixedThreadPool(
+            Math.min(Runtime.getRuntime().availableProcessors(), reader.leaves().size()),
+            new NamedThreadFactory("rearranger"));
+    ArrayList<Future<Void>> futures = new ArrayList<>();
+
+    for (LeafReaderContext context : reader.leaves()) {
+      Callable<Void> applyDeletesToSegment =
+          () -> {
+            applyDeletesToOneSegment(writer, (CodecReader) context.reader(), selector);
+            return null;
+          };
+      futures.add(executor.submit(applyDeletesToSegment));
+    }
+
+    for (Future<Void> future : futures) {
+      future.get();
+    }
+    executor.shutdown();
+  }
+
+  private static void applyDeletesToOneSegment(
+      IndexWriter writer, CodecReader segmentReader, DocumentSelector selector) throws IOException {
+    Bits deletedDocs = selector.getFilteredDocs(segmentReader);
+    for (int i = 0; i < segmentReader.maxDoc(); ++i) {
+      if (deletedDocs.get(i)) {
+        if (writer.tryDeleteDocument(segmentReader, i) == -1) {
+          throw new IllegalStateException("tryDeleteDocument failed and there's no plan B");

Review Comment:
   LOL.
   
   Fortunately, as `tryDeleteDocument` is implemented today, it 
should never fail, since you have disabled merging in this writer.
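
   For readers following the diff, the one-task-per-segment shape of `applyDeletes` can be sketched with the standard library alone. Everything here is a stand-in: `FanOutSketch` and `processAll` are hypothetical names, each "segment" is just an `int[]` where `1` marks a document to delete, and a counter replaces the `IndexWriter` call. The `try`/`finally` around the pool and the `Math.max(1, ...)` guard are choices made for this sketch and are not in the PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the fan-out/join pattern; not Lucene code.
final class FanOutSketch {
  // Submits one task per segment, waits for all of them, and returns the
  // number of entries marked for deletion (value 1) across all segments.
  static int processAll(List<int[]> segments) throws ExecutionException, InterruptedException {
    AtomicInteger deleted = new AtomicInteger();
    // At least one thread: a fixed pool of size 0 is rejected by the JDK.
    int threads =
        Math.max(1, Math.min(Runtime.getRuntime().availableProcessors(), segments.size()));
    ExecutorService executor = Executors.newFixedThreadPool(threads);
    try {
      List<Future<Void>> futures = new ArrayList<>();
      for (int[] segment : segments) {
        Callable<Void> task =
            () -> {
              for (int docid = 0; docid < segment.length; ++docid) {
                if (segment[docid] == 1) {
                  deleted.incrementAndGet();
                }
              }
              return null;
            };
        futures.add(executor.submit(task));
      }
      // Join: get() rethrows any task failure as an ExecutionException.
      for (Future<Void> future : futures) {
        future.get();
      }
    } finally {
      // Runs even if a task failed, so the pool threads do not linger.
      executor.shutdown();
    }
    return deleted.get();
  }

  public static void main(String[] args) throws Exception {
    List<int[]> segments = Arrays.asList(new int[] {1, 0, 1}, new int[] {0, 0}, new int[] {1});
    System.out.println(processAll(segments)); // 3
  }
}
```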



##
lucene/misc/src/java/org/apache/lucene/misc/index/IndexRearranger.java:
##
@@ -139,6 +203,47 @@ private static void addOneSegment(
     writer.addIndexes(readers);
   }
 
+  private static void applyDeletes(
+      IndexWriter writer, IndexReader reader, DocumentSelector selector)
+      throws ExecutionException, InterruptedException {
+    if (selector == null) {
+      // There are no deletes to be applied
+      return;
+    }
+
+    ExecutorService executor =
+        Executors.newFixedThreadPool(
+            Math.min(Runtime.getRuntime().availableProcessors(), reader.leaves().size()),
+            new NamedThreadFactory("rearranger"));
+    ArrayList<Future<Void>> futures = new ArrayList<>();
+
+    for (LeafReaderContext context : reader.leaves()) {
+      Callable<Void> applyDeletesToSegment =
+          () -> {
+            applyDeletesToOneSegment(writer, (CodecReader) context.reader(), selector);
+            return null;
+          };
+      futures.add(executor.submit(applyDeletesToSegment));
+    }
+
+    for (Future<Void> future : futures) {
+      future.get();
+    }
+    executor.shutdown();
+  }
+
+  private static void applyDeletesToOneSegment(
+      IndexWriter writer, CodecReader segmentReader, DocumentSelector selector) throws IOException {
+    Bits deletedDocs = selector.getFilteredDocs(segmentReader);
+    for (int i = 0; i < segmentReader.maxDoc(); ++i) {

Review Comment:
   Could we rename `i` to `docid`?


