date:20210706

[GitHub] [lucene] jtibshirani commented on a change in pull request #204: LUCENE-10020 DocComparator don't skip docs of same docID

2021-07-06 Thread GitBox



jtibshirani commented on a change in pull request #204:
URL: https://github.com/apache/lucene/pull/204#discussion_r664107815



##
File path: 
lucene/core/src/test/org/apache/lucene/search/TestSortOptimization.java
##
@@ -332,6 +333,71 @@ public void testFloatSortOptimization() throws IOException 
{
 dir.close();
   }
 
+  /**
+   * Test that a search with sort on [_doc, other fields] across multiple 
indices doesn't miss any
+   * documents.
+   */
+  public void testDocSortOptimizationMultipleIndices() throws IOException {
+final int numIndices = 3;
+final int numDocsInIndex = atLeast(50);
+Directory[] dirs = new Directory[numIndices];
+IndexReader[] readers = new IndexReader[numIndices];
+for (int i = 0; i < numIndices; i++) {
+  dirs[i] = newDirectory();
+  final int remainder = i % 3;

Review comment:
   Since `numIndices` is 3, do we need to take a mod here?

##
File path: 
lucene/core/src/java/org/apache/lucene/search/comparators/DocComparator.java
##
@@ -81,7 +87,12 @@ public Integer value(int slot) {
 public DocLeafComparator(LeafReaderContext context) {
   this.docBase = context.docBase;
   if (enableSkipping) {
-this.minDoc = topValue + 1;
+// For a single sort on _doc, we want to skip all docs before topValue.
+// For multiple fields sort on [_doc, other fields], we want to 
include docs with the same
+// docID.
+// This is needed in a distributed search, where there are docs from 
different indices with
+// the same docID.
+this.minDoc = singleSort ? topValue + 1 : topValue;

Review comment:
   This seems to work and matches the approach in `NumericComparator`. I 
guess it doesn't specifically address the case where `_doc` is the last sort, 
for example a sort on `["some_field", "_doc"]`, where we could also use 
`topValue + 1`. 
   
   One thing I wondered: is keeping track of `singleSort` really important, or 
could we simplify and just always use `topValue`? At most we'd consider one 
extra document. A similar simplification would apply to `NumericComparator`. 
The skipping logic is a bit complex and I'm thinking about the performance/ 
simplicity trade-off.

##
File path: 
lucene/core/src/test/org/apache/lucene/search/TestSortOptimization.java
##
@@ -332,6 +333,71 @@ public void testFloatSortOptimization() throws IOException 
{
 dir.close();
   }
 
+  /**
+   * Test that a search with sort on [_doc, other fields] across multiple 
indices doesn't miss any
+   * documents.
+   */
+  public void testDocSortOptimizationMultipleIndices() throws IOException {
+final int numIndices = 3;
+final int numDocsInIndex = atLeast(50);
+Directory[] dirs = new Directory[numIndices];
+IndexReader[] readers = new IndexReader[numIndices];
+for (int i = 0; i < numIndices; i++) {
+  dirs[i] = newDirectory();
+  final int remainder = i % 3;
+  Function valueSupplier = docID -> (docID * 3 + 
remainder);

Review comment:
   Maybe this could be a simple variable assignment instead of using a 
supplier. Also I think we can replace 3 with `numIndices`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] dsmiley commented on a change in pull request #205: LUCENE-9959: Rename TermVectorsReader to TermVectorsReaderBase

2021-07-06 Thread GitBox



dsmiley commented on a change in pull request #205:
URL: https://github.com/apache/lucene/pull/205#discussion_r664111224



##
File path: lucene/core/src/java/org/apache/lucene/index/SegmentCoreReaders.java
##
@@ -57,7 +57,7 @@
   final NormsProducer normsProducer;
 
   final StoredFieldsReader fieldsReaderOrig;
-  final TermVectorsReader termVectorsReaderOrig;
+  final TermVectorsReaderBase termVectorsReaderOrig;

Review comment:
   In the subsequent PR that re-introduces "TermVectorsReader", I assume 
you will change some of the references of TermVectorsReaderBase back to 
TermVectorsReader, such as right here?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] zacharymorn commented on pull request #205: LUCENE-9959: Rename TermVectorsReader to TermVectorsReaderBase

2021-07-06 Thread GitBox



zacharymorn commented on pull request #205:
URL: https://github.com/apache/lucene/pull/205#issuecomment-874499672


   > I'm also okay with introducing the new/simpler TermVectorsReader 
abstraction if it would reduce this PR diff a lot. Up to you.
   
   Thanks @dsmiley for the review and approval! I've thought about having the 
new `TermVectorsReader` here as well, but I'm leaning more towards introducing 
it in the other PR after some digging, as I see so far the majority of 
`TermVectorsReaderBase` usage here involves using the existing codec APIs 
`checkIntegrity/clone/getMergeInstance`, so the newer `TermVectorsReader` class 
(that may not have any method yet) may not be usable directly. In addition, 
introducing it here may also lead to some git merge conflicts with the other PR 
later as well I feel, as `TermVectors` was already created there as the super 
class of `TermVectorsReader`.  So I think saving it for the other PR may be an 
easier approach overall?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] zacharymorn commented on a change in pull request #205: LUCENE-9959: Rename TermVectorsReader to TermVectorsReaderBase

2021-07-06 Thread GitBox



zacharymorn commented on a change in pull request #205:
URL: https://github.com/apache/lucene/pull/205#discussion_r664278681



##
File path: lucene/core/src/java/org/apache/lucene/index/SegmentCoreReaders.java
##
@@ -57,7 +57,7 @@
   final NormsProducer normsProducer;
 
   final StoredFieldsReader fieldsReaderOrig;
-  final TermVectorsReader termVectorsReaderOrig;
+  final TermVectorsReaderBase termVectorsReaderOrig;

Review comment:
   > In the subsequent PR that re-introduces "TermVectorsReader", I assume 
you will change some of the references of TermVectorsReaderBase back to 
TermVectorsReader
   
   Yes I'm planning to do that! 
   
   > such as right here?
   
   For this particular one though, later in the code its `close` method is 
called via:
   
   
https://github.com/apache/lucene/blob/167bd99c23520f8e252ad6f98d1e3065d20d5ae6/lucene/core/src/java/org/apache/lucene/index/SegmentCoreReaders.java#L185-L192
   
   So it may still need to use `TermVectorsReaderBase` there, as that class 
still implements `Closeable` per 
https://github.com/apache/lucene/pull/180#issuecomment-871155896 ? 
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] mayya-sharipova commented on a change in pull request #204: LUCENE-10020 DocComparator don't skip docs of same docID

2021-07-06 Thread GitBox



mayya-sharipova commented on a change in pull request #204:
URL: https://github.com/apache/lucene/pull/204#discussion_r664573881



##
File path: 
lucene/core/src/java/org/apache/lucene/search/comparators/DocComparator.java
##
@@ -81,7 +87,12 @@ public Integer value(int slot) {
 public DocLeafComparator(LeafReaderContext context) {
   this.docBase = context.docBase;
   if (enableSkipping) {
-this.minDoc = topValue + 1;
+// For a single sort on _doc, we want to skip all docs before topValue.
+// For multiple fields sort on [_doc, other fields], we want to 
include docs with the same
+// docID.
+// This is needed in a distributed search, where there are docs from 
different indices with
+// the same docID.
+this.minDoc = singleSort ? topValue + 1 : topValue;

Review comment:
   Great comment! +1 for simplifying the code at the expense for extra 
single case in `DocComparator`.  
   
   For `NumericComparator` though this is not the case, and there could be huge 
number of docs with the same value, so extra optimization for `singleSort` is 
important.
   
   > I guess it doesn't specifically address the case where _doc is the last 
sort, for example a sort on ["some_field", "_doc"], where we could also use 
topValue + 1.
   
   No, the sort optimizations in `DocComparator` are not applicable where 
`_doc` is the 1st sort. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] mayya-sharipova commented on pull request #204: LUCENE-10020 DocComparator don't skip docs of same docID

2021-07-06 Thread GitBox



mayya-sharipova commented on pull request #204:
URL: https://github.com/apache/lucene/pull/204#issuecomment-874777903


   @jtibshirani Thanks for the review. I've tried to address all your feedback 
in b5af441a39611d63df30168452cdec521ce4d578


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] dsmiley commented on a change in pull request #205: LUCENE-9959: Rename TermVectorsReader to TermVectorsReaderBase

2021-07-06 Thread GitBox



dsmiley commented on a change in pull request #205:
URL: https://github.com/apache/lucene/pull/205#discussion_r664625324



##
File path: lucene/core/src/java/org/apache/lucene/index/SegmentCoreReaders.java
##
@@ -57,7 +57,7 @@
   final NormsProducer normsProducer;
 
   final StoredFieldsReader fieldsReaderOrig;
-  final TermVectorsReader termVectorsReaderOrig;
+  final TermVectorsReaderBase termVectorsReaderOrig;

Review comment:
   Perhaps then TermVectorsReader should also implement Closeable? (to be 
done in another PR)
   
   I called out the line above because it is not harmonious with the other 
"*Reader" suffixed classes declared next to it.  




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene-solr] thelabdude opened a new pull request #2522: SOLR-15460: Implement LIKE, IS NOT NULL, IS NULL, and support wildcard * in equals string literal

2021-07-06 Thread GitBox



thelabdude opened a new pull request #2522:
URL: https://github.com/apache/lucene-solr/pull/2522


   Backport of https://github.com/apache/solr/pull/173 to branch_8x


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] jtibshirani commented on a change in pull request #204: LUCENE-10020 DocComparator don't skip docs of same docID

2021-07-06 Thread GitBox



jtibshirani commented on a change in pull request #204:
URL: https://github.com/apache/lucene/pull/204#discussion_r664698038



##
File path: 
lucene/core/src/java/org/apache/lucene/search/comparators/DocComparator.java
##
@@ -81,7 +87,12 @@ public Integer value(int slot) {
 public DocLeafComparator(LeafReaderContext context) {
   this.docBase = context.docBase;
   if (enableSkipping) {
-this.minDoc = topValue + 1;
+// For a single sort on _doc, we want to skip all docs before topValue.
+// For multiple fields sort on [_doc, other fields], we want to 
include docs with the same
+// docID.
+// This is needed in a distributed search, where there are docs from 
different indices with
+// the same docID.
+this.minDoc = singleSort ? topValue + 1 : topValue;

Review comment:
   > For `NumericComparator` though this is not the case, and there could 
be huge number of docs with the same value, so extra optimization for 
`singleSort` is important.
   
   Oh I see, `NumericComparator` isn't just used in 'search after' cases so 
there could be many docs that share the same value.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene-solr] thelabdude merged pull request #2522: SOLR-15460: Implement LIKE, IS NOT NULL, IS NULL, and support wildcard * in equals string literal

2021-07-06 Thread GitBox



thelabdude merged pull request #2522:
URL: https://github.com/apache/lucene-solr/pull/2522


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] uschindler commented on pull request #203: LUCENE-10019: Align file starts in CFS files to have proper alignment (8 bytes)

2021-07-06 Thread GitBox



uschindler commented on pull request #203:
URL: https://github.com/apache/lucene/pull/203#issuecomment-874900622


   After reviewing the code by DocValues and DirectPackedReader, I figured out 
that the alignment should be configurable. Depending of a RandomAccessInput 
based on an IndexInput slice, the alignment should be adapted to the inner 
entry size (byte, int, short, long).
   
   So I changed the IndexOutput method that allows to align the file pointer to 
take any power of 2 as alignment. I also added extensive tests, so it works 
correct.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[jira] [Commented] (LUCENE-10019) Align file starts in CFS files to have proper alignment (8 bytes)

2021-07-06 Thread Uwe Schindler (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17375832#comment-17375832
 ] 

Uwe Schindler commented on LUCENE-10019:


After reviewing the code by DocValues and DirectPackedReader, I figured out 
that the alignment should be configurable. Depending of a RandomAccessInput 
based on an IndexInput slice, the alignment should be adapted to the inner 
entry size (byte, int, short, long).

So I changed the IndexOutput method that allows to align the file pointer to 
take any power of 2 as alignment. I also added extensive tests, so it works 
correct.

Due to the changes I will wait a bit longer until I will merge the PR.

> Align file starts in CFS files to have proper alignment (8 bytes)
> -
>
> Key: LUCENE-10019
> URL: https://issues.apache.org/jira/browse/LUCENE-10019
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs, core/store
>Affects Versions: main (9.0)
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
>Priority: Blocker
> Fix For: 9.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> While discussing about MMapDirectory and fast access to file contents through 
> MMap (https://github.com/apache/lucene/pull/177 and previous versions of this 
> draft, also), I figured out that for most Lucene files, the data inside is 
> not aligned at all.
> We can't fix this easily and it's also not always important, but some files 
> should really have a CPU fieldly alignment from beginning! This is 
> escpecially important when we use slices().
> I got many tests with aligned VarHandles to pass, but it broke instantly, if 
> the file was inside a Compound CFS file.
> CompoundFormat.write() just appends all data to the IndexOutput and writes 
> the offset to the entries file. The fix to make at least file starts aligned 
> is to just write some null-bytes between the files, so startOffset is aligned 
> to multiples of 8 bytes.
> At a later stage we could also think of aligning to LBA 
> blocks/sectors/whatever to make OS paging work better. But for performance of 
> index access, slices of compound files when memory mapped should at least 
> align to 8 bytes.
> Fix is easy: Just add some modulo on startOffset and write some extra bytes 
> before the next file is serialized. The change is only 2 lines. It does not 
> even change index format!
> I'd like to get this in for 9.0 so we can at least say: our CFS files are 
> aligned. Aligning other files like docvalues to better help CPU is then 
> possible.
> I will provide a simple pull request for Lucene90CompoundFormat soon. If you 
> don't see any problems, this is a no-brainer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene-solr] thelabdude opened a new pull request #2523: SOLR-15451: SolrSchema (for Parallel SQL) should use PKI principal for internal request to /admin/luke to get table metadata

2021-07-06 Thread GitBox



thelabdude opened a new pull request #2523:
URL: https://github.com/apache/lucene-solr/pull/2523


   Backport of https://github.com/apache/solr/pull/168 to branch_8x


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene-solr] thelabdude merged pull request #2523: SOLR-15451: SolrSchema (for Parallel SQL) should use PKI principal for internal request to /admin/luke to get table metadata

2021-07-06 Thread GitBox



thelabdude merged pull request #2523:
URL: https://github.com/apache/lucene-solr/pull/2523


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[jira] [Commented] (LUCENE-10003) Disallow C-style array declarations

2021-07-06 Thread David Smiley (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17375864#comment-17375864
 ] 

David Smiley commented on LUCENE-10003:
---

I did some experimentation with using a regexp in 
{{validate-source-patterns.gradle}} {{invalidJavaOnlyPatterns}} and came up 
with this:
{noformat}
  (~$/(?m)^(?!\s*(/[/*]|\*).*).*\b\w+\s+\w+(\[])+\s*[=,;]/$) : 'C style 
array declarations disallowed; move the brackets'
{noformat}
The first part is a negative lookahead to ensure the line doesn't start with a 
comment, because there are a number of comments that would otherwise match this 
expression.  There is only one place where there's a false positive 
(MemoryIndex line 949), and it can be addressed trivially via adding a newline.

I'll push a PR that just does the changes, and separately push another PR 
oriented on the automated detection using the above regexp.

> Disallow C-style array declarations
> ---
>
> Key: LUCENE-10003
> URL: https://issues.apache.org/jira/browse/LUCENE-10003
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
>
> The Google Java Format, that which we adhere to, disallows c-style array 
> declarations: https://google.github.io/styleguide/javaguide.html#s4.8.3-arrays
> It's also known to "Error Prone":
> https://errorprone.info/bugpattern/MixedArrayDimensions



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene-solr] thelabdude opened a new pull request #2524: SOLR-15456: Get field type info from luke for custom fields instead of defaulting to String in Parallel SQL

2021-07-06 Thread GitBox



thelabdude opened a new pull request #2524:
URL: https://github.com/apache/lucene-solr/pull/2524


   Backport of https://github.com/apache/solr/pull/181


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene-solr] thelabdude merged pull request #2524: SOLR-15456: Get field type info from luke for custom fields instead of defaulting to String in Parallel SQL

2021-07-06 Thread GitBox



thelabdude merged pull request #2524:
URL: https://github.com/apache/lucene-solr/pull/2524


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] dsmiley opened a new pull request #206: LUCENE-10003 No C style array declaration

2021-07-06 Thread GitBox



dsmiley opened a new pull request #206:
URL: https://github.com/apache/lucene/pull/206


   https://issues.apache.org/jira/browse/LUCENE-10003
   
   The first commit was purely automated by IntelliJ.
   The second commit are my hand edits to other spots IntellIJ didn't find 
because it was in a comment or some other file like a .jflex.  These hand edits 
were found via 
   ```
   ^(?!\s*(\/\/|\/?\*).*).*\b((?!instanceof)\w+)\s+\w+(\[])+\s*[=,;)]
   ```
   Expect a separate PR to add checks to prevent this syntax via tooling.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] rmuir commented on pull request #206: LUCENE-10003 No C style array declaration

2021-07-06 Thread GitBox



rmuir commented on pull request #206:
URL: https://github.com/apache/lucene/pull/206#issuecomment-874976665


   Why are we not doing this with spotless? As mentioned on the JIRA, I'm 
really -1 on the approach of enforcing *formatting* with anything but spotless. 
Sorry, it will just be annoying. The purpose of spotless is to avoid troubles 
like these.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene-solr] thelabdude opened a new pull request #2525: SOLR-15461: Upgrade Apache Calcite to latest release

2021-07-06 Thread GitBox



thelabdude opened a new pull request #2525:
URL: https://github.com/apache/lucene-solr/pull/2525


   Backport https://github.com/apache/solr/pull/177 to branch_8x


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene-solr] markrmiller commented on pull request #2525: SOLR-15461: Upgrade Apache Calcite to latest release

2021-07-06 Thread GitBox



markrmiller commented on pull request #2525:
URL: https://github.com/apache/lucene-solr/pull/2525#issuecomment-875004571


   Didn't see anything that looks off to me. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] mayya-sharipova commented on a change in pull request #204: LUCENE-10020 DocComparator don't skip docs of same docID

2021-07-06 Thread GitBox



mayya-sharipova commented on a change in pull request #204:
URL: https://github.com/apache/lucene/pull/204#discussion_r664804322



##
File path: 
lucene/core/src/java/org/apache/lucene/search/comparators/DocComparator.java
##
@@ -81,7 +87,12 @@ public Integer value(int slot) {
 public DocLeafComparator(LeafReaderContext context) {
   this.docBase = context.docBase;
   if (enableSkipping) {
-this.minDoc = topValue + 1;
+// For a single sort on _doc, we want to skip all docs before topValue.
+// For multiple fields sort on [_doc, other fields], we want to 
include docs with the same
+// docID.
+// This is needed in a distributed search, where there are docs from 
different indices with
+// the same docID.
+this.minDoc = singleSort ? topValue + 1 : topValue;

Review comment:
   Right, `singleSort` check is used not only for search after case.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] mayya-sharipova merged pull request #204: LUCENE-10020 DocComparator don't skip docs of same docID

2021-07-06 Thread GitBox



mayya-sharipova merged pull request #204:
URL: https://github.com/apache/lucene/pull/204


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[jira] [Commented] (LUCENE-10020) DocComparator should not skip docs with the same docID on multiple sorts with search after

2021-07-06 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17375948#comment-17375948
 ] 

ASF subversion and git services commented on LUCENE-10020:
--

Commit 64d9f8c58710eaec8177408b8301a72064581fb1 in lucene's branch 
refs/heads/main from Mayya Sharipova
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=64d9f8c ]

LUCENE-10020 DocComparator don't skip docs of same docID (#204)

DocComparator should not skip docs with the same docID on multiple
sorts with search after.

Because of the optimization introduced in LUCENE-9449, currently when
searching with sort on [_doc, other fields] with search after,
DocComparator can efficiently skip all docs before and including
the provided [search after docID]. This is a desirable behaviour
in a single index search. But in a distributed search, where multiple
indices have docs with the same docID, and when searching on
 [_doc, other fields], the sort optimization should NOT skip
documents with the same docIDs.

This PR fixes this.

Relates to LUCENE-9449

> DocComparator should not skip docs with the same docID on multiple sorts with 
> search after
> --
>
> Key: LUCENE-10020
> URL: https://issues.apache.org/jira/browse/LUCENE-10020
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Mayya Sharipova
>Priority: Minor
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Because of the optimization introduced in 
> [LUCENE-9449|https://issues.apache.org/jira/browse/LUCENE-9449], when 
> searching with sort on  [_doc, other fields] with search after,  
> DocComparator can efficiently skip all docs before and including the provided 
> [search after docID].
> This is a desirable behaviour in a single index search. But in a distributed 
> search, where multiple indices have docs with the same docID, and when 
> searching on [_doc, other fields], the sort optimization should not skip 
> documents with the same docIDs.  
> Relates to LUCENE-9449



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[jira] [Commented] (LUCENE-9449) Skip non-competitive documents when sort by _doc with search after

2021-07-06 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-9449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17375950#comment-17375950
 ] 

ASF subversion and git services commented on LUCENE-9449:
-

Commit 64d9f8c58710eaec8177408b8301a72064581fb1 in lucene's branch 
refs/heads/main from Mayya Sharipova
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=64d9f8c ]

LUCENE-10020 DocComparator don't skip docs of same docID (#204)

DocComparator should not skip docs with the same docID on multiple
sorts with search after.

Because of the optimization introduced in LUCENE-9449, currently when
searching with sort on [_doc, other fields] with search after,
DocComparator can efficiently skip all docs before and including
the provided [search after docID]. This is a desirable behaviour
in a single index search. But in a distributed search, where multiple
indices have docs with the same docID, and when searching on
 [_doc, other fields], the sort optimization should NOT skip
documents with the same docIDs.

This PR fixes this.

Relates to LUCENE-9449

> Skip non-competitive documents when sort by _doc with search after
> --
>
> Key: LUCENE-9449
> URL: https://issues.apache.org/jira/browse/LUCENE-9449
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Mayya Sharipova
>Priority: Minor
> Fix For: 8.7
>
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> Enhance DocComparator to provide an iterator over competitive documents when 
> search ing with "after" FieldDoc.
> This iterator can quickly position on the desired "after" document, and skip 
> all documents or even segments that contain documents before "after"
> This is especially efficient when "after" is high.
>  
> Related to LUCENE-9280



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[jira] [Commented] (LUCENE-9449) Skip non-competitive documents when sort by _doc with search after

2021-07-06 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-9449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17375949#comment-17375949
 ] 

ASF subversion and git services commented on LUCENE-9449:
-

Commit 64d9f8c58710eaec8177408b8301a72064581fb1 in lucene's branch 
refs/heads/main from Mayya Sharipova
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=64d9f8c ]

LUCENE-10020 DocComparator don't skip docs of same docID (#204)

DocComparator should not skip docs with the same docID on multiple
sorts with search after.

Because of the optimization introduced in LUCENE-9449, currently when
searching with sort on [_doc, other fields] with search after,
DocComparator can efficiently skip all docs before and including
the provided [search after docID]. This is a desirable behaviour
in a single index search. But in a distributed search, where multiple
indices have docs with the same docID, and when searching on
 [_doc, other fields], the sort optimization should NOT skip
documents with the same docIDs.

This PR fixes this.

Relates to LUCENE-9449

> Skip non-competitive documents when sort by _doc with search after
> --
>
> Key: LUCENE-9449
> URL: https://issues.apache.org/jira/browse/LUCENE-9449
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Mayya Sharipova
>Priority: Minor
> Fix For: 8.7
>
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> Enhance DocComparator to provide an iterator over competitive documents when 
> search ing with "after" FieldDoc.
> This iterator can quickly position on the desired "after" document, and skip 
> all documents or even segments that contain documents before "after"
> This is especially efficient when "after" is high.
>  
> Related to LUCENE-9280



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] dsmiley commented on pull request #206: LUCENE-10003 No C style array declaration

2021-07-06 Thread GitBox



dsmiley commented on pull request #206:
URL: https://github.com/apache/lucene/pull/206#issuecomment-875011653


   This PR isn't about automatic enforcement so I think your veto is intended 
at the conversation in the JIRA issue about how to enforce.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene-solr] thelabdude merged pull request #2525: SOLR-15461: Upgrade Apache Calcite to latest release

2021-07-06 Thread GitBox



thelabdude merged pull request #2525:
URL: https://github.com/apache/lucene-solr/pull/2525


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene-solr] thelabdude opened a new pull request #2526: SOLR-15489: Implement OFFSET & FETCH for LIMIT SQL queries

2021-07-06 Thread GitBox



thelabdude opened a new pull request #2526:
URL: https://github.com/apache/lucene-solr/pull/2526


   Backport https://github.com/apache/solr/pull/191 to branch_8x


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene-solr] thelabdude merged pull request #2526: SOLR-15489: Implement OFFSET & FETCH for LIMIT SQL queries

2021-07-06 Thread GitBox



thelabdude merged pull request #2526:
URL: https://github.com/apache/lucene-solr/pull/2526


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene-solr] thelabdude opened a new pull request #2527: SOLR-15208: Add the countDist aggregation to the stats, facet and timeseries Streaming Expressions

2021-07-06 Thread GitBox



thelabdude opened a new pull request #2527:
URL: https://github.com/apache/lucene-solr/pull/2527


   Backport to branch_8x


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene-solr] thelabdude merged pull request #2527: SOLR-15208: Add the countDist aggregation to the stats, facet and timeseries Streaming Expressions

2021-07-06 Thread GitBox



thelabdude merged pull request #2527:
URL: https://github.com/apache/lucene-solr/pull/2527


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene-solr] thelabdude opened a new pull request #2528: SOLR-15475: Implement COUNT and APPROX_COUNT_DISTINCT aggregation functions for Parallel SQL

2021-07-06 Thread GitBox



thelabdude opened a new pull request #2528:
URL: https://github.com/apache/lucene-solr/pull/2528


   Backport https://github.com/apache/solr/pull/194 to branch_8x


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene-solr] thelabdude merged pull request #2528: SOLR-15475: Implement COUNT and APPROX_COUNT_DISTINCT aggregation functions for Parallel SQL

2021-07-06 Thread GitBox



thelabdude merged pull request #2528:
URL: https://github.com/apache/lucene-solr/pull/2528


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene-solr] thelabdude opened a new pull request #2529: SOLR-15499: StatsStream implement ParallelMetricsRollup to allow for tiered computation of SQL metrics over collection aliases

2021-07-06 Thread GitBox



thelabdude opened a new pull request #2529:
URL: https://github.com/apache/lucene-solr/pull/2529


   Backport https://github.com/apache/solr/pull/197 to branch_8x


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene-solr] thelabdude merged pull request #2529: SOLR-15499: StatsStream implement ParallelMetricsRollup to allow for tiered computation of SQL metrics over collection aliases

2021-07-06 Thread GitBox



thelabdude merged pull request #2529:
URL: https://github.com/apache/lucene-solr/pull/2529


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene-solr] mayya-sharipova opened a new pull request #2530: LUCENE-10020 DocComparator don't skip docs of same docID

2021-07-06 Thread GitBox



mayya-sharipova opened a new pull request #2530:
URL: https://github.com/apache/lucene-solr/pull/2530


   DocComparator should not skip docs with the same docID on multiple
   sorts with search after.
   
   Because of the optimization introduced in LUCENE-9449, currently when
   searching with sort on [_doc, other fields] with search after,
   DocComparator can efficiently skip all docs before and including
   the provided [search after docID]. This is a desirable behaviour
   in a single index search. But in a distributed search, where multiple
   indices have docs with the same docID, and when searching on
   [_doc, other fields], the sort optimization should NOT skip
   documents with the same docIDs.
   
   This PR fixes this.
   
   Backport for https://github.com/apache/lucene/pull/204
   Relates to LUCENE-9449


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene-solr] mayya-sharipova merged pull request #2530: LUCENE-10020 DocComparator don't skip docs of same docID

2021-07-06 Thread GitBox



mayya-sharipova merged pull request #2530:
URL: https://github.com/apache/lucene-solr/pull/2530


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[jira] [Commented] (LUCENE-9449) Skip non-competitive documents when sort by _doc with search after

2021-07-06 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-9449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17376132#comment-17376132
 ] 

ASF subversion and git services commented on LUCENE-9449:
-

Commit bdef1be78fb46ab7bf1924386daab2a7db1a4316 in lucene-solr's branch 
refs/heads/branch_8x from Mayya Sharipova
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=bdef1be ]

LUCENE-10020 DocComparator don't skip docs of same docID (#2530)

DocComparator should not skip docs with the same docID on multiple
sorts with search after.

Because of the optimization introduced in LUCENE-9449, currently when
searching with sort on [_doc, other fields] with search after,
DocComparator can efficiently skip all docs before and including
the provided [search after docID]. This is a desirable behaviour
in a single index search. But in a distributed search, where multiple
indices have docs with the same docID, and when searching on
[_doc, other fields], the sort optimization should NOT skip
documents with the same docIDs.

This PR fixes this.

Backport for https://github.com/apache/lucene/pull/204
Relates to LUCENE-9449

> Skip non-competitive documents when sort by _doc with search after
> --
>
> Key: LUCENE-9449
> URL: https://issues.apache.org/jira/browse/LUCENE-9449
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Mayya Sharipova
>Priority: Minor
> Fix For: 8.7
>
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> Enhance DocComparator to provide an iterator over competitive documents when 
> search ing with "after" FieldDoc.
> This iterator can quickly position on the desired "after" document, and skip 
> all documents or even segments that contain documents before "after"
> This is especially efficient when "after" is high.
>  
> Related to LUCENE-9280



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[jira] [Commented] (LUCENE-9449) Skip non-competitive documents when sort by _doc with search after

2021-07-06 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-9449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17376131#comment-17376131
 ] 

ASF subversion and git services commented on LUCENE-9449:
-

Commit bdef1be78fb46ab7bf1924386daab2a7db1a4316 in lucene-solr's branch 
refs/heads/branch_8x from Mayya Sharipova
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=bdef1be ]

LUCENE-10020 DocComparator don't skip docs of same docID (#2530)

DocComparator should not skip docs with the same docID on multiple
sorts with search after.

Because of the optimization introduced in LUCENE-9449, currently when
searching with sort on [_doc, other fields] with search after,
DocComparator can efficiently skip all docs before and including
the provided [search after docID]. This is a desirable behaviour
in a single index search. But in a distributed search, where multiple
indices have docs with the same docID, and when searching on
[_doc, other fields], the sort optimization should NOT skip
documents with the same docIDs.

This PR fixes this.

Backport for https://github.com/apache/lucene/pull/204
Relates to LUCENE-9449

> Skip non-competitive documents when sort by _doc with search after
> --
>
> Key: LUCENE-9449
> URL: https://issues.apache.org/jira/browse/LUCENE-9449
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Mayya Sharipova
>Priority: Minor
> Fix For: 8.7
>
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> Enhance DocComparator to provide an iterator over competitive documents when 
> search ing with "after" FieldDoc.
> This iterator can quickly position on the desired "after" document, and skip 
> all documents or even segments that contain documents before "after"
> This is especially efficient when "after" is high.
>  
> Related to LUCENE-9280



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[jira] [Commented] (LUCENE-10020) DocComparator should not skip docs with the same docID on multiple sorts with search after

2021-07-06 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17376130#comment-17376130
 ] 

ASF subversion and git services commented on LUCENE-10020:
--

Commit bdef1be78fb46ab7bf1924386daab2a7db1a4316 in lucene-solr's branch 
refs/heads/branch_8x from Mayya Sharipova
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=bdef1be ]

LUCENE-10020 DocComparator don't skip docs of same docID (#2530)

DocComparator should not skip docs with the same docID on multiple
sorts with search after.

Because of the optimization introduced in LUCENE-9449, currently when
searching with sort on [_doc, other fields] with search after,
DocComparator can efficiently skip all docs before and including
the provided [search after docID]. This is a desirable behaviour
in a single index search. But in a distributed search, where multiple
indices have docs with the same docID, and when searching on
[_doc, other fields], the sort optimization should NOT skip
documents with the same docIDs.

This PR fixes this.

Backport for https://github.com/apache/lucene/pull/204
Relates to LUCENE-9449

> DocComparator should not skip docs with the same docID on multiple sorts with 
> search after
> --
>
> Key: LUCENE-10020
> URL: https://issues.apache.org/jira/browse/LUCENE-10020
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Mayya Sharipova
>Priority: Minor
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Because of the optimization introduced in 
> [LUCENE-9449|https://issues.apache.org/jira/browse/LUCENE-9449], when 
> searching with sort on  [_doc, other fields] with search after,  
> DocComparator can efficiently skip all docs before and including the provided 
> [search after docID].
> This is a desirable behaviour in a single index search. But in a distributed 
> search, where multiple indices have docs with the same docID, and when 
> searching on [_doc, other fields], the sort optimization should not skip 
> documents with the same docIDs.  
> Relates to LUCENE-9449



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] zacharymorn commented on a change in pull request #205: LUCENE-9959: Rename TermVectorsReader to TermVectorsReaderBase

2021-07-06 Thread GitBox



zacharymorn commented on a change in pull request #205:
URL: https://github.com/apache/lucene/pull/205#discussion_r665060449



##
File path: lucene/core/src/java/org/apache/lucene/index/SegmentCoreReaders.java
##
@@ -57,7 +57,7 @@
   final NormsProducer normsProducer;
 
   final StoredFieldsReader fieldsReaderOrig;
-  final TermVectorsReader termVectorsReaderOrig;
+  final TermVectorsReaderBase termVectorsReaderOrig;

Review comment:
   Yeah this indeed looks inconsistent. I think the main thing here is, the 
overall understanding I got from 
https://github.com/apache/lucene/pull/180#discussion_r650720327 & 
https://github.com/apache/lucene/pull/180#discussion_r653357562 is that the new 
`TermVectors / TermVectorsReader` abstraction is supposed to be an index API, 
while the original `TermVectorsReader` / now `TermVectorsReaderBase` is a codec 
API, which has `clone/getMergeInstance/checkIntegrity` methods, and also 
implements `Closeable` (and all other `*Reader/Producer` codec classes also 
implement `Closeable` as well). If we were to move `Closeable` from 
`TermVectorsReaderBase` to the new `TermVectorsReader`, wouldn't that break the 
separation there?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] zacharymorn commented on a change in pull request #205: LUCENE-9959: Rename TermVectorsReader to TermVectorsReaderBase

2021-07-06 Thread GitBox



zacharymorn commented on a change in pull request #205:
URL: https://github.com/apache/lucene/pull/205#discussion_r665060449



##
File path: lucene/core/src/java/org/apache/lucene/index/SegmentCoreReaders.java
##
@@ -57,7 +57,7 @@
   final NormsProducer normsProducer;
 
   final StoredFieldsReader fieldsReaderOrig;
-  final TermVectorsReader termVectorsReaderOrig;
+  final TermVectorsReaderBase termVectorsReaderOrig;

Review comment:
   Yeah this indeed looks inconsistent. 
   
   I think the main thing here is, the overall understanding I got from 
https://github.com/apache/lucene/pull/180#discussion_r650720327 & 
https://github.com/apache/lucene/pull/180#discussion_r653357562 (this comment 
specifically asked not to move up `Closeable` from the original 
`TermVectorsReader`) is that, the new `TermVectors / TermVectorsReader` 
abstraction is supposed to be an index API, while the original 
`TermVectorsReader` / now `TermVectorsReaderBase` is a codec API, which has 
`clone/getMergeInstance/checkIntegrity` methods, and also implements 
`Closeable` (and all other `*Reader/Producer` codec classes also implement 
`Closeable` as well). If we were to move `Closeable` from 
`TermVectorsReaderBase` to the new `TermVectorsReader`, wouldn't that break the 
separation there?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

43 matches

Mail list logo