[jira] [Created] (LUCENE-9358) BKDTree: remove unnecessary tree rotation for the one dimensional case

2020-05-04 Thread Ignacio Vera (Jira)
Ignacio Vera created LUCENE-9358:


 Summary: BKDTree: remove unnecessary tree rotation for the one 
dimensional case 
 Key: LUCENE-9358
 URL: https://issues.apache.org/jira/browse/LUCENE-9358
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Ignacio Vera


This is a spin-off of LUCENE-9807. The reason we need to rotate the one 
dimensional tree is that the expected representation when we pack the index is 
different from the tree generated by the one dimensional logic. It would be easy 
to harmonise how we generate this tree representation so that it is the same in 
the one dimensional and multi-dimensional cases, and then change the 
index-packing logic to work on that representation.
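The shared representation described above can be pictured as a balanced split tree stored in heap order: node i keeps its split value at array index i, with children at 2i and 2i+1, so packing the index becomes a plain array walk and no rotation is needed. A toy sketch of that layout (my own illustration under a power-of-two-leaves assumption, not Lucene's builder code):

```java
// Toy heap-ordered split tree: the kind of intermediate representation the
// one- and multi-dimensional BKD builders could share. Assumes the number of
// leaves is a power of two so internal nodes fill indices 1..numLeaves-1.
public class HeapSplitTree {
  final long[] splitValues; // 1-based heap array; index 0 is unused

  HeapSplitTree(int numLeaves) {
    splitValues = new long[numLeaves];
  }

  static int leftChild(int node)  { return 2 * node; }
  static int rightChild(int node) { return 2 * node + 1; }

  // Recursively record the median split value for each internal node over a
  // sorted range of leaf values [from, to).
  void build(int node, long[] sortedLeafValues, int from, int to) {
    if (to - from <= 1) {
      return; // a single leaf needs no split node
    }
    int mid = (from + to) >>> 1;
    splitValues[node] = sortedLeafValues[mid];
    build(leftChild(node), sortedLeafValues, from, mid);
    build(rightChild(node), sortedLeafValues, mid, to);
  }
}
```

Because both builders would emit the same array, the packer only has to walk indices 1..numLeaves-1 in order.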

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (SOLR-7111) Better handling of relative links for SimplePostTool

2020-05-04 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SOLR-7111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl resolved SOLR-7111.
---
Resolution: Won't Fix

This is not a full-blown crawler, let's not try to make it one :)

> Better handling of relative links for SimplePostTool
> 
>
> Key: SOLR-7111
> URL: https://issues.apache.org/jira/browse/SOLR-7111
> Project: Solr
>  Issue Type: Bug
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
>
> The very simplistic crawler in SimplePostTool could handle links such as 
> {{href="./foo"}}, {{href="../foo"}} and {{href="#foo"}} better. Also, in some 
> cases there will be double {{//}} when concatenating base URL and relative 
> links.
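For illustration only (this is not SimplePostTool's actual code), `java.net.URI` already handles the cases listed above; a hedged sketch with a hypothetical `LinkResolver` helper:

```java
import java.net.URI;

// Hedged sketch: URI.resolve() handles "./foo", "../foo" and "#foo", and the
// extra collapse step removes accidental "//" in the path while leaving the
// "http://" scheme separator alone. LinkResolver is a hypothetical helper name.
public class LinkResolver {
  public static String resolve(String base, String link) {
    URI r = URI.create(base).resolve(link).normalize();
    // Collapse duplicate slashes in the path only, never in "scheme://".
    String path = r.getRawPath() == null ? "" : r.getRawPath().replaceAll("//+", "/");
    StringBuilder sb = new StringBuilder();
    if (r.getScheme() != null) {
      sb.append(r.getScheme()).append("://").append(r.getRawAuthority());
    }
    sb.append(path);
    if (r.getRawQuery() != null) sb.append('?').append(r.getRawQuery());
    if (r.getRawFragment() != null) sb.append('#').append(r.getRawFragment());
    return sb.toString();
  }
}
```

For example, `resolve("http://example.com/docs/index.html", "../foo")` yields `http://example.com/foo`, and a base ending in `//` no longer produces a doubled slash in the concatenated link.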






[GitHub] [lucene-solr] iverase opened a new pull request #1481: LUCENE-9358: remove unnecessary tree rotation for the one dimensional case

2020-05-04 Thread GitBox


iverase opened a new pull request #1481:
URL: https://github.com/apache/lucene-solr/pull/1481


   This commit changes the way the multi-dimensional tree builder generates the 
intermediate tree representation so that it matches the one dimensional case. 
The index-packing logic can therefore work directly on that representation and 
avoid unnecessary rotation of the tree and its leaves.
   
   A new interface is introduced to avoid copying intermediate List arrays.
   
   Split values and split dimensions are handled in separate arrays, which 
increases the maximum number of points the tree builders can handle.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org






[GitHub] [lucene-solr] jpountz opened a new pull request #1482: LUCENE-7822: CodecUtil#checkFooter should throw a CorruptIndexException as the main exception.

2020-05-04 Thread GitBox


jpountz opened a new pull request #1482:
URL: https://github.com/apache/lucene-solr/pull/1482


   See https://issues.apache.org/jira/browse/LUCENE-7822.






[jira] [Commented] (LUCENE-7822) IllegalArgumentException thrown instead of a CorruptIndexException

2020-05-04 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-7822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098750#comment-17098750
 ] 

Adrien Grand commented on LUCENE-7822:
--

I opened [https://github.com/apache/lucene-solr/pull/1482/files?w=1] to discuss 
what it could look like. I have a slight preference for changing checkFooter 
instead of calling checksumEntireFile up-front since the former guarantees that 
we verify the checksum of the exact bytes that we just read, but I could be 
convinced otherwise.

> IllegalArgumentException thrown instead of a CorruptIndexException
> --
>
> Key: LUCENE-7822
> URL: https://issues.apache.org/jira/browse/LUCENE-7822
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 6.5.1
>Reporter: Martin Amirault
>Priority: Minor
> Attachments: LUCENE-7822.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Similarly to LUCENE-7592, when an {{*.si}} file is corrupted in a very 
> specific part, an IllegalArgumentException is thrown instead of a 
> CorruptIndexException.
> StackTrace (Lucene 6.5.1):
> {code}
> java.lang.IllegalArgumentException: Illegal minor version: 12517381
>   at 
> __randomizedtesting.SeedInfo.seed([1FEB5987CFA44BE:B8755B5574F9F3BF]:0)
>   at org.apache.lucene.util.Version.(Version.java:385)
>   at org.apache.lucene.util.Version.(Version.java:371)
>   at org.apache.lucene.util.Version.fromBits(Version.java:353)
>   at 
> org.apache.lucene.codecs.lucene62.Lucene62SegmentInfoFormat.read(Lucene62SegmentInfoFormat.java:97)
>   at 
> org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:357)
>   at 
> org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:288)
>   at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:448)
>   at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:445)
>   at 
> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:692)
>   at 
> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:644)
>   at 
> org.apache.lucene.index.SegmentInfos.readLatestCommit(SegmentInfos.java:450)
>   at 
> org.apache.lucene.index.DirectoryReader.listCommits(DirectoryReader.java:260)
> {code}
> A simple fix would be to add IllegalArgumentException to the catch list at 
> {{org/apache/lucene/index/SegmentInfos.java:289}}.
> Other variations for the stacktraces:
> {code}
> java.lang.IllegalArgumentException: invalid codec filename '_�.cfs', must 
> match: _[a-z0-9]+(_.*)?\..*
>   at 
> __randomizedtesting.SeedInfo.seed([8B3FDE317B8D634A:A8EE07E5EB4B0B13]:0)
>   at 
> org.apache.lucene.index.SegmentInfo.checkFileNames(SegmentInfo.java:270)
>   at org.apache.lucene.index.SegmentInfo.addFiles(SegmentInfo.java:252)
>   at org.apache.lucene.index.SegmentInfo.setFiles(SegmentInfo.java:246)
>   at 
> org.apache.lucene.codecs.lucene62.Lucene62SegmentInfoFormat.read(Lucene62SegmentInfoFormat.java:248)
>   at 
> org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:357)
>   at 
> org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:288)
>   at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:448)
>   at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:445)
>   at 
> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:692)
>   at 
> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:644)
>   at 
> org.apache.lucene.index.SegmentInfos.readLatestCommit(SegmentInfos.java:450)
>   at 
> org.apache.lucene.index.DirectoryReader.listCommits(DirectoryReader.java:260)
> {code}
> {code}
> java.lang.IllegalArgumentException: An SPI class of type 
> org.apache.lucene.codecs.Codec with name 'LucenI62' does not exist.  You need 
> to add the corresponding JAR file supporting this SPI to your classpath.  The 
> current classpath supports the following names: [Lucene62, Lucene50, 
> Lucene53, Lucene54, Lucene60]
>   at 
> __randomizedtesting.SeedInfo.seed([925DE160F7260F99:B026EB9373CB6368]:0)
>   at org.apache.lucene.util.NamedSPILoader.lookup(NamedSPILoader.java:116)
>   at org.apache.lucene.codecs.Codec.forName(Codec.java:116)
>   at org.apache.lucene.index.SegmentInfos.readCodec(SegmentInfos.java:424)
>   at 
> org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:356)
>   at 
> org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:288)
>   at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:448)
>   at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:445)
>   at 
> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:692)
>   

[jira] [Updated] (LUCENE-9328) SortingGroupHead to reuse DocValues

2020-05-04 Thread Mikhail Khludnev (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated LUCENE-9328:
-
Attachment: LUCENE-9328.patch
Status: Patch Available  (was: Patch Available)

> SortingGroupHead to reuse DocValues
> ---
>
> Key: LUCENE-9328
> URL: https://issues.apache.org/jira/browse/LUCENE-9328
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/grouping
>Reporter: Mikhail Khludnev
>Assignee: Mikhail Khludnev
>Priority: Minor
> Attachments: LUCENE-9328.patch, LUCENE-9328.patch, LUCENE-9328.patch, 
> LUCENE-9328.patch, LUCENE-9328.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> That's why 
> https://issues.apache.org/jira/browse/LUCENE-7701?focusedCommentId=17084365&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17084365






[jira] [Resolved] (SOLR-8579) TLP website identity updates

2020-05-04 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SOLR-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl resolved SOLR-8579.
---
Resolution: Fixed

> TLP website identity updates
> 
>
> Key: SOLR-8579
> URL: https://issues.apache.org/jira/browse/SOLR-8579
> Project: Solr
>  Issue Type: Task
>  Components: website
>Reporter: Jan Høydahl
>Priority: Major
>  Labels: newdev
> Attachments: download-button.patch
>
>
> The TLP site http://lucene.apache.org/ could use some logo updates, and I 
> feel that the TLP site should better reflect that it is a *Project* and not a 
> *Product* site. As it is now it looks almost identical to the Lucene-Java 
> site. It would be nice to start from scratch, or alternatively fix the old one.
> h2. Option A: New responsive TLP site
> Create a super clean new TLP site with responsive design. Content could be 
> limited to describing the three sub-projects with logo, short presentation, 
> download button and link to product sites. Perhaps also promote the community 
> in the form of some auto-updated stats (active committers, ML activity, link 
> to the last board report?). No slideshow, no endless news, no duplicate menus...
> h2. Option B: Refresh the existing site
> *Should*
> * The top branding contains a Lucene+ASF logo. Make a new top with the brand 
> new ASF feather logo, Lucene logo and Solr logo
> * Replace old orange Solr logo in slideshow with the new red one
> * Color scheme is the Lucene pale green, same as for Lucene-core. Choose 
> another color scheme for the TLP!
> * Remove the discontinued OpenRelevance project from top menu and intro 
> bullet list. Keep a link "OpenRelevance (discontinued)" in right-menu?
> *Optional*
> * Color of the Solr Download button should be changed to Solr-RED™
> * Likewise, color of the Lucene Download button could take Lucene-GREEN™ ?
> * Main title says *Welcome to Apache Lucene*. Perhaps it should say *Welcome 
> to Apache Lucene/Solr*?
> * Update the slide show images and texts to better describe the project as of 
> 2016...






[jira] [Created] (LUCENE-9359) SegmentInfos.readCommit should verify checksums in case of error

2020-05-04 Thread Adrien Grand (Jira)
Adrien Grand created LUCENE-9359:


 Summary: SegmentInfos.readCommit should verify checksums in case 
of error
 Key: LUCENE-9359
 URL: https://issues.apache.org/jira/browse/LUCENE-9359
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand


SegmentInfos.readCommit only calls checkFooter if reading the commit succeeded. 
We should also call it in case of errors in order to be able to distinguish 
hardware errors from Lucene bugs.
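The proposed pattern can be illustrated with a self-contained toy (my own sketch using `java.util.zip.CRC32`, not Lucene's CodecUtil code):

```java
import java.util.zip.CRC32;

// Toy illustration of the proposed pattern: when parsing a file body throws,
// verify the stored checksum before propagating. A mismatch points at corrupt
// bytes (hardware/IO); a matching checksum means the bytes are intact and the
// failure is more likely a software bug.
public class CheckFooterSketch {
  static class CorruptDataException extends RuntimeException {
    CorruptDataException(String msg) { super(msg); }
  }

  // body = the file's payload bytes; expectedCrc = checksum from the "footer"
  static String parse(byte[] body, long expectedCrc) {
    try {
      return decodeBody(body); // may throw on unexpected content
    } catch (RuntimeException parseError) {
      CRC32 crc = new CRC32();
      crc.update(body, 0, body.length);
      if (crc.getValue() != expectedCrc) {
        // The bytes we read are not the bytes that were written.
        throw new CorruptDataException("checksum mismatch: data corruption likely");
      }
      throw parseError; // checksum OK: the parser itself is at fault
    }
  }

  // A deliberately picky decoder standing in for SegmentInfos parsing.
  static String decodeBody(byte[] body) {
    if (body.length == 0 || body[0] != 'v') {
      throw new IllegalArgumentException("unknown version marker");
    }
    return new String(body, 1, body.length - 1);
  }
}
```

A flipped byte then surfaces as a corruption report rather than an opaque parse exception, which is the distinction the issue asks for.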






[GitHub] [lucene-solr] romseygeek commented on a change in pull request #1462: LUCENE-9328: open group.sort docvalues once per segment.

2020-05-04 Thread GitBox


romseygeek commented on a change in pull request #1462:
URL: https://github.com/apache/lucene-solr/pull/1462#discussion_r419289810



##
File path: 
lucene/grouping/src/test/org/apache/lucene/search/grouping/DocValuesPoolingReaderTest.java
##
@@ -0,0 +1,150 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.search.grouping;
+
+import java.io.IOException;
+
+import org.apache.lucene.analysis.MockAnalyzer;
+import org.apache.lucene.document.BinaryDocValuesField;
+import org.apache.lucene.document.Document;
+import org.apache.lucene.document.NumericDocValuesField;
+import org.apache.lucene.document.SortedDocValuesField;
+import org.apache.lucene.document.SortedNumericDocValuesField;
+import org.apache.lucene.document.SortedSetDocValuesField;
+import org.apache.lucene.index.BinaryDocValues;
+import org.apache.lucene.index.DirectoryReader;
+import org.apache.lucene.index.LeafReader;
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.index.NumericDocValues;
+import org.apache.lucene.index.RandomIndexWriter;
+import org.apache.lucene.index.SortedDocValues;
+import org.apache.lucene.index.SortedNumericDocValues;
+import org.apache.lucene.index.SortedSetDocValues;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.LuceneTestCase;
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+
+public class DocValuesPoolingReaderTest extends LuceneTestCase {
+
+  private static RandomIndexWriter w;
+  private static Directory dir;
+  private static DirectoryReader reader;
+
+  @BeforeClass
+  public static void index() throws IOException {
+    dir = newDirectory();
+    w = new RandomIndexWriter(
+        random(),
+        dir,
+        newIndexWriterConfig(new MockAnalyzer(random())).setMergePolicy(newLogMergePolicy()));
+    Document doc = new Document();
+    doc.add(new BinaryDocValuesField("bin", new BytesRef("binary")));
+    doc.add(new BinaryDocValuesField("bin2", new BytesRef("binary2")));
+
+    doc.add(new NumericDocValuesField("num", 1L));
+    doc.add(new NumericDocValuesField("num2", 2L));
+
+    doc.add(new SortedNumericDocValuesField("sortnum", 3L));
+    doc.add(new SortedNumericDocValuesField("sortnum2", 4L));
+
+    doc.add(new SortedDocValuesField("sort", new BytesRef("sorted")));
+    doc.add(new SortedDocValuesField("sort2", new BytesRef("sorted2")));
+
+    doc.add(new SortedSetDocValuesField("sortset", new BytesRef("sortedset")));
+    doc.add(new SortedSetDocValuesField("sortset2", new BytesRef("sortedset2")));
+
+    w.addDocument(doc);
+    w.commit();
+    reader = w.getReader();
+    w.close();
+  }
+
+  public void testDVCache() throws IOException {
+    assertFalse(reader.leaves().isEmpty());
+    for (LeafReaderContext leaf : reader.leaves()) {
+      final DocValuesPoolingReader caching = new DocValuesPoolingReader(leaf.reader());
+
+      assertSame(assertBinaryDV(caching, "bin", "binary"),
+          caching.getBinaryDocValues("bin"));
+      assertSame(assertBinaryDV(caching, "bin2", "binary2"),
+          caching.getBinaryDocValues("bin2"));
+
+      assertSame(assertNumericDV(caching, "num", 1L),
+          caching.getNumericDocValues("num"));
+      assertSame(assertNumericDV(caching, "num2", 2L),
+          caching.getNumericDocValues("num2"));
+
+      assertSame(assertSortedNumericDV(caching, "sortnum", 3L),
+          caching.getSortedNumericDocValues("sortnum"));
+      assertSame(assertSortedNumericDV(caching, "sortnum2", 4L),
+          caching.getSortedNumericDocValues("sortnum2"));
+
+      assertSame(assertSortedDV(caching, "sort", "sorted"),
+          caching.getSortedDocValues("sort"));
+      assertSame(assertSortedDV(caching, "sort2", "sorted2"),
+          caching.getSortedDocValues("sort2"));

Review comment:
   I think this still doesn't test iteration through a single doc's values 
on two instances of the shared DV? We need to pull the iterator twice, advance 
both to the same doc, and then iterate through the values on both - as I read 
it, currently if you iterate through the values on one, then you can't iterate 
through them on the other.
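The concern about sharing one forward-only iterator can be sketched with a toy pooling reader (hypothetical names, not the actual DocValuesPoolingReader API):

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the review concern: doc-values iterators are forward-only, so
// if a pooling reader hands the *same* cached iterator to two consumers, the
// second consumer can no longer see docs the first one already advanced past.
public class PoolingSketch {
  // A minimal forward-only "docs with a value" iterator.
  static class Values {
    private final int[] docs;
    private int idx = -1;
    Values(int... docs) { this.docs = docs; }
    int advance(int target) { // first remaining doc >= target, or MAX_VALUE
      while (++idx < docs.length) {
        if (docs[idx] >= target) return docs[idx];
      }
      return Integer.MAX_VALUE;
    }
  }

  private final Map<String, Values> pool = new HashMap<>();

  Values getValues(String field) {
    // Returning the same instance for the same field is the point of pooling,
    // but it is also why two independent iterations over one field conflict.
    return pool.computeIfAbsent(field, f -> new Values(3, 7, 12));
  }
}
```

Pulling the iterator twice and advancing both, as the reviewer suggests, would expose exactly this shared-state behaviour.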

[jira] [Created] (SOLR-14458) Solr Replica locked in recovering state after a Zookeeper disconnection

2020-05-04 Thread Endika Posadas (Jira)
Endika Posadas created SOLR-14458:
-

 Summary: Solr Replica locked in recovering state after a Zookeeper 
disconnection
 Key: SOLR-14458
 URL: https://issues.apache.org/jira/browse/SOLR-14458
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: SolrCloud
Affects Versions: 8.4.1
 Environment: A Solr cluster with 2 replicas that each have 2 shards, 
split across 2 Windows VMs.
They use a 3-node ZooKeeper ensemble across 3 VMs.
Reporter: Endika Posadas
 Attachments: replica7.log, solr-thread-dump.log, solr.log

In a Solr cluster, a Solr instance containing two shards has lost connection 
with ZooKeeper. Upon reconnecting, it checked its status with the leader 
and started a recovery. However, it is stuck in recovering status without 
making further progress (it has been like that for days now).

Upon checking a thread dump, `recoveryExecutor-7-thread-3-processing-n` is 
trying to acquire the lock to create a new IndexWriter: `at 
org.apache.solr.update.DefaultSolrCoreState.lock(DefaultSolrCoreState.java:179)` 
(after lock(iwLock.writeLock());). However, the 
ReentrantLock it is waiting for is never released. Moreover, no thread can be 
found holding the lock, leaving a Solr restart as the only solution.

There is no error in the logs that can help with the issue. I have attached 
solr.log and a grep with node 7 lines, as well as a thread dump.

My hypothesis is that 
org.apache.solr.update.DefaultSolrCoreState#closeIndexWriter(org.apache.solr.core.SolrCore, 
boolean) was called once but for some reason openIndexWriter was skipped.
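The described hang is easy to reproduce in miniature (toy code, not Solr's DefaultSolrCoreState): once a write lock is taken and its matching unlock is skipped, every later acquirer parks forever, which is why pairing acquire and release in try/finally matters.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hedged sketch of the failure mode: a "close writer" step takes the write
// lock, the "open writer" step that would release it is skipped, and a later
// recovery thread blocks indefinitely trying to create a new IndexWriter.
public class StuckLockDemo {
  public static void main(String[] args) throws InterruptedException {
    ReentrantReadWriteLock iwLock = new ReentrantReadWriteLock();

    // Simulates closeIndexWriter() acquiring the lock...
    iwLock.writeLock().lock();
    // ...and the matching unlock in openIndexWriter() never running.

    Thread recovery = new Thread(() -> {
      try {
        // A recovery thread trying to take the write lock just parks here.
        boolean acquired = iwLock.writeLock().tryLock(200, TimeUnit.MILLISECONDS);
        System.out.println("recovery acquired lock: " + acquired); // false
        if (acquired) iwLock.writeLock().unlock();
      } catch (InterruptedException ignored) {}
    });
    recovery.start();
    recovery.join();

    iwLock.writeLock().unlock(); // what a try/finally pairing would guarantee
  }
}
```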






[jira] [Updated] (SOLR-14458) Solr Replica locked in recovering state after a Zookeeper disconnection

2020-05-04 Thread Endika Posadas (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Endika Posadas updated SOLR-14458:
--
Description: 
In a Solr cluster, a Solr instance containing two shards has lost connection 
with ZooKeeper. Upon reconnecting, it checked its status with the leader 
and started a recovery. However, it is stuck in recovering status without 
making further progress (it has been like that for days now).

Upon checking a thread dump, `recoveryExecutor-7-thread-3-processing-n` is 
trying to acquire the lock to create a new IndexWriter: `at 
org.apache.solr.update.DefaultSolrCoreState.lock(DefaultSolrCoreState.java:179)` 
(after lock(iwLock.writeLock());). However, the 
ReentrantLock it is waiting for is never released. Moreover, no thread can be 
found holding the lock, leaving a Solr restart as the only solution.

There is no error in the logs that can help with the issue. I have attached 
solr.log and a grep with node 7 lines, as well as a thread dump.

There is also no other recovery currently running. In the Solr metrics, 4 
recoveries have started, 3 have completed and 1 is running (forever).

My hypothesis is that 
org.apache.solr.update.DefaultSolrCoreState#closeIndexWriter(org.apache.solr.core.SolrCore, 
boolean) was called once but for some reason openIndexWriter was skipped.

  was:
In a Solr cluster, a Solr instance containing two shards has lost connection 
with ZooKeeper. Upon reconnecting, it checked its status with the leader 
and started a recovery. However, it is stuck in recovering status without 
making further progress (it has been like that for days now).

Upon checking a thread dump, `recoveryExecutor-7-thread-3-processing-n` is 
trying to acquire the lock to create a new IndexWriter: `at 
org.apache.solr.update.DefaultSolrCoreState.lock(DefaultSolrCoreState.java:179)` 
(after lock(iwLock.writeLock());). However, the 
ReentrantLock it is waiting for is never released. Moreover, no thread can be 
found holding the lock, leaving a Solr restart as the only solution.

There is no error in the logs that can help with the issue. I have attached 
solr.log and a grep with node 7 lines, as well as a thread dump.

My hypothesis is that 
org.apache.solr.update.DefaultSolrCoreState#closeIndexWriter(org.apache.solr.core.SolrCore, 
boolean) was called once but for some reason openIndexWriter was skipped.


> Solr Replica locked in recovering state after a Zookeeper disconnection
> ---
>
> Key: SOLR-14458
> URL: https://issues.apache.org/jira/browse/SOLR-14458
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Affects Versions: 8.4.1
> Environment: A Solr cluster with 2 replicas that each have 2 shards, 
> split across 2 Windows VMs.
> They use a 3-node ZooKeeper ensemble across 3 VMs.
>Reporter: Endika Posadas
>Priority: Major
> Attachments: replica7.log, solr-thread-dump.log, solr.log
>
>
> In a Solr cluster, a Solr instance containing two shards has lost connection 
> with ZooKeeper. Upon reconnecting, it checked its status with the leader 
> and started a recovery. However, it is stuck in recovering status without 
> making further progress (it has been like that for days now).
>  
> Upon checking a thread dump, `recoveryExecutor-7-thread-3-processing-n` is 
> trying to acquire the lock to create a new IndexWriter: `at 
> org.apache.solr.update.DefaultSolrCoreState.lock(DefaultSolrCoreState.java:179)` 
> (after lock(iwLock.writeLock());). However, the 
> ReentrantLock it is waiting for is never released. Moreover, no thread can be 
> found holding the lock, leaving a Solr restart as the only solution.
> There is no error in the logs that can help with the issue. I have attached 
> solr.log and a grep with node 7 lines, as well as a thread dump.
>  
> There is also no other recovery currently running. In the Solr metrics, 4 
> recoveries have started, 3 have completed and 1 is running (forever).
>  
> My hypothesis is that 
> org.apache.solr.update.DefaultSolrCoreState#closeIndexWriter(org.apache.solr.core.SolrCore, 
> boolean) was called once but for some reason openIndexWriter was skipped.






[jira] [Created] (LUCENE-9360) ToParentDocValues uses advanceExact() of underneath DocValues

2020-05-04 Thread Mikhail Khludnev (Jira)
Mikhail Khludnev created LUCENE-9360:


 Summary: ToParentDocValues uses advanceExact() of underneath 
DocValues
 Key: LUCENE-9360
 URL: https://issues.apache.org/jira/browse/LUCENE-9360
 Project: Lucene - Core
  Issue Type: Sub-task
Reporter: Mikhail Khludnev


Currently {{ToParentDocValues.advanceExact()}} propagates the call to 
{{DocValues.advance()}} on the underlying values, as advised in LUCENE-7871. 
This causes some problems in LUCENE-9328 and seems not really reasonable. The 
latter jira has a patch attached which resolves this. The question is: why (not)?
cc [~jpountz]
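The difference between the two contracts can be sketched with a toy iterator (hypothetical class, not Lucene's actual API):

```java
// Toy "docs with a value" iterator contrasting the two methods discussed
// above: advance(target) jumps to the first doc at or after target and loses
// the current position, while advanceExact(target) only positions on `target`
// itself and reports whether it has a value.
public class AdvanceSketch {
  private final int[] docsWithValue;
  private int idx = -1;

  AdvanceSketch(int... docsWithValue) { this.docsWithValue = docsWithValue; }

  // Lucene-style advance: first doc >= target.
  int advance(int target) {
    while (++idx < docsWithValue.length) {
      if (docsWithValue[idx] >= target) return docsWithValue[idx];
    }
    return Integer.MAX_VALUE; // stands in for NO_MORE_DOCS
  }

  // advanceExact: move up to `target`, return whether it has a value.
  boolean advanceExact(int target) {
    while (idx + 1 < docsWithValue.length && docsWithValue[idx + 1] <= target) {
      idx++;
    }
    return idx >= 0 && docsWithValue[idx] == target;
  }
}
```

With docs {2, 5}, `advanceExact(3)` simply answers "no value here" without skipping doc 5, whereas `advance(3)` consumes up to doc 5; mapping one onto the other changes which docs remain visible to the caller.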






[GitHub] [lucene-solr] jpountz opened a new pull request #1483: LUCENE-9359: Always call checkFooter in SegmentInfos#readCommit.

2020-05-04 Thread GitBox


jpountz opened a new pull request #1483:
URL: https://github.com/apache/lucene-solr/pull/1483


   See https://issues.apache.org/jira/browse/LUCENE-9359.
   
   This builds on top of LUCENE-7822. To make the review easier I'd advise 
adding `?w=1` to the URL to ignore whitespace changes.






[GitHub] [lucene-solr] juanka588 commented on a change in pull request #1473: LUCENE-9353: Move terms metadata to its own file.

2020-05-04 Thread GitBox


juanka588 commented on a change in pull request #1473:
URL: https://github.com/apache/lucene-solr/pull/1473#discussion_r419337819



##
File path: 
lucene/core/src/java/org/apache/lucene/codecs/blocktree/BlockTreeTermsWriter.java
##
@@ -1060,36 +1052,35 @@ public void close() throws IOException {
       return;
     }
     closed = true;
-
+
+    final String metaName = IndexFileNames.segmentFileName(state.segmentInfo.name, state.segmentSuffix, BlockTreeTermsReader.TERMS_META_EXTENSION);
     boolean success = false;
-    try {
-
-      final long dirStart = termsOut.getFilePointer();
-      final long indexDirStart = indexOut.getFilePointer();
+    try (IndexOutput metaOut = state.directory.createOutput(metaName, state.context)) {
+      CodecUtil.writeIndexHeader(metaOut, BlockTreeTermsReader.TERMS_META_CODEC_NAME, BlockTreeTermsReader.VERSION_CURRENT,
+          state.segmentInfo.getId(), state.segmentSuffix);
 
-      termsOut.writeVInt(fields.size());
+      metaOut.writeVInt(fields.size());

Review comment:
   @jpountz here I see the same lack of paired serializer write/read code - 
could we have such a thing? It would improve readability and unit testing: we 
could mock the field metadata and check that serialization is applied 
correctly.








[jira] [Updated] (SOLR-14458) Solr Replica locked in recovering state after a Zookeeper disconnection

2020-05-04 Thread Endika Posadas (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Endika Posadas updated SOLR-14458:
--
Attachment: solrrecovering.png

> Solr Replica locked in recovering state after a Zookeeper disconnection
> ---
>
> Key: SOLR-14458
> URL: https://issues.apache.org/jira/browse/SOLR-14458
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Affects Versions: 8.4.1
> Environment: A Solr cluster with 2 replicas that each have 2 shards, 
> split across 2 Windows VMs.
> They use a 3-node ZooKeeper ensemble across 3 VMs.
>Reporter: Endika Posadas
>Priority: Major
> Attachments: replica7.log, solr-thread-dump.log, solr.log, 
> solrrecovering.png
>
>
> In a Solr cluster, a Solr instance containing two shards has lost connection 
> with ZooKeeper. Upon reconnecting, it checked its status with the leader 
> and started a recovery. However, it is stuck in recovering status without 
> making further progress (it has been like that for days now).
>  
> Upon checking a thread dump, `recoveryExecutor-7-thread-3-processing-n` is 
> trying to acquire the lock to create a new IndexWriter: `at 
> org.apache.solr.update.DefaultSolrCoreState.lock(DefaultSolrCoreState.java:179)` 
> (after lock(iwLock.writeLock());). However, the 
> ReentrantLock it is waiting for is never released. Moreover, no thread can be 
> found holding the lock, leaving a Solr restart as the only solution.
> There is no error in the logs that can help with the issue. I have attached 
> solr.log and a grep with node 7 lines, as well as a thread dump.
>  
> There is also no other recovery currently running. In the Solr metrics, 4 
> recoveries have started, 3 have completed and 1 is running (forever).
>  
> My hypothesis is that 
> org.apache.solr.update.DefaultSolrCoreState#closeIndexWriter(org.apache.solr.core.SolrCore, 
> boolean) was called once but for some reason openIndexWriter was skipped.






[jira] [Created] (SOLR-14459) Close SolrClientCache in ColStatus

2020-05-04 Thread Andrzej Bialecki (Jira)
Andrzej Bialecki created SOLR-14459:
---

 Summary: Close SolrClientCache in ColStatus
 Key: SOLR-14459
 URL: https://issues.apache.org/jira/browse/SOLR-14459
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Andrzej Bialecki
Assignee: Andrzej Bialecki


As David pointed out in SOLR-13292, {{ColStatus}} creates a new 
{{SolrClientCache}} and never closes it.
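The fix pattern is the standard one for any closeable resource a handler creates per request: open it in try-with-resources (or close it in a finally block) instead of leaking it. A hedged sketch with a stand-in class (names hypothetical, not Solr's actual code):

```java
// Toy illustration: a SolrClientCache-like closeable resource that a
// ColStatus-like handler creates and is responsible for closing.
public class ClientCacheSketch {
  static class SolrClientCacheLike implements AutoCloseable {
    boolean closed = false;
    @Override public void close() { closed = true; }
  }

  static SolrClientCacheLike runColStatusLike() {
    SolrClientCacheLike cache = new SolrClientCacheLike();
    try (SolrClientCacheLike c = cache) {
      // ... gather per-shard status using clients from the cache ...
    }
    return cache; // already closed by the time we return
  }
}
```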






[jira] [Commented] (SOLR-13292) Provide extended per-segment status of a collection

2020-05-04 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098857#comment-17098857
 ] 

Andrzej Bialecki commented on SOLR-13292:
-

Good point, David - I opened SOLR-14459.

> Provide extended per-segment status of a collection
> ---
>
> Key: SOLR-13292
> URL: https://issues.apache.org/jira/browse/SOLR-13292
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: 8.0, master (9.0)
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: 8.1, master (9.0)
>
> Attachments: SOLR-13292.patch, SOLR-13292.patch, adminSegments.json, 
> adminSegments.json, colstatus.json, colstatus.json
>
>
> When changing a collection configuration or schema there may be non-obvious 
> conflicts between existing data and the new configuration or the newly 
> declared schema. A similar situation arises when upgrading Solr to a new 
> version while keeping the existing data.
> Currently the {{SegmentsInfoRequestHandler}} provides insufficient 
> information to detect such conflicts. Also, there's no collection-wide 
> command to gather such status from all shard leaders.
> This issue proposes extending the {{/admin/segments}} handler to provide more 
> low-level Lucene details about the segments, including potential conflicts 
> between existing segments' data and the current declared schema. It also adds 
> a new COLSTATUS collection command to report an aggregated status from all 
> shards, and optionally for all collections.






[jira] [Resolved] (SOLR-14450) SegmentsInfoRequestHandler doesn't properly close ref-counted IW

2020-05-04 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki resolved SOLR-14450.
-
Resolution: Duplicate

> SegmentsInfoRequestHandler doesn't properly close ref-counted IW
> 
>
> Key: SOLR-14450
> URL: https://issues.apache.org/jira/browse/SOLR-14450
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.5
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Minor
>
> As reported on the mailing list by Tiziano Degaetano:
> I’m digging into an issue with timeouts when doing a managed schema change 
> using the schema api.
>  The call hangs reloading the cores (it does not recover until restarting the 
> node):
> sun.misc.Unsafe.park​(Native Method)
>  java.util.concurrent.locks.LockSupport.parkNanos​(Unknown Source)
>  
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos​(Unknown 
> Source)
>  
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos​(Unknown
>  Source)
>  java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock​(Unknown 
> Source)
>  
> org.apache.solr.update.DefaultSolrCoreState.lock​(DefaultSolrCoreState.java:179)
>  
> org.apache.solr.update.DefaultSolrCoreState.newIndexWriter​(DefaultSolrCoreState.java:230)
>  org.apache.solr.core.SolrCore.reload​(SolrCore.java:696)
>  org.apache.solr.core.CoreContainer.reload​(CoreContainer.java:1558)
>  org.apache.solr.schema.SchemaManager.doOperations​(SchemaManager.java:133)
>  
> org.apache.solr.schema.SchemaManager.performOperations​(SchemaManager.java:92)
>  
> org.apache.solr.handler.SchemaHandler.handleRequestBody​(SchemaHandler.java:90)
>  
> org.apache.solr.handler.RequestHandlerBase.handleRequest​(RequestHandlerBase.java:211)
>  org.apache.solr.core.SolrCore.execute​(SolrCore.java:2596)
>  org.apache.solr.servlet.HttpSolrCall.execute​(HttpSolrCall.java:802)
>  org.apache.solr.servlet.HttpSolrCall.call​(HttpSolrCall.java:579)
> After a while I realized it had simply deadlocked, after I used the Admin UI 
> to view the segments info of the core.
> So my question: is this line correct? If withCoreInfo is false iwRef.decref() 
> will not be called to release the reader lock, preventing any further writer 
> locks.
>  
> [https://github.com/apache/lucene-solr/blob/3a743ea953f0ecfc35fc7b198f68d142ce99d789/solr/core/src/java/org/apache/solr/handler/admin/SegmentsInfoRequestHandler.java#L144]
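The deadlock mechanism described above can be demonstrated in isolation: while
a read lock is still held (the un-decref'ed iwRef), any attempt to take the
write lock from the same lock (e.g. newIndexWriter during core reload) cannot
succeed. Illustrative sketch, not the actual Solr code path:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hedged illustration of the reported deadlock: a leaked read lock (the
// missing iwRef.decref()) blocks every later attempt to take the write lock.
// ReentrantReadWriteLock does not allow upgrading read -> write, so tryLock
// on the write lock fails while the read lock is held.
class ReadLockLeakSketch {
    public static boolean writerCanProceed(boolean readerReleased) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.readLock().lock();          // simulates the un-decref'ed reader ref
        if (readerReleased) {
            lock.readLock().unlock();    // the missing decref would do this
        }
        boolean acquired = lock.writeLock().tryLock();
        if (acquired) {
            lock.writeLock().unlock();
        }
        return acquired;
    }
}
```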






[jira] [Updated] (SOLR-14431) SegmentsInfoRequestHandler.java does not release IndexWriter

2020-05-04 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-14431:

Fix Version/s: 8.6

> SegmentsInfoRequestHandler.java does not release IndexWriter
> 
>
> Key: SOLR-14431
> URL: https://issues.apache.org/jira/browse/SOLR-14431
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Affects Versions: 8.1.1, 8.5.1
>Reporter: Tiziano Degaetano
>Assignee: Andrzej Bialecki
>Priority: Minor
> Fix For: 8.6
>
>
> If withCoreInfo is false iwRef.decref() will not
> be called to release the reader lock, preventing any further writer locks.
> https://github.com/apache/lucene-solr/blob/3a743ea953f0ecfc35fc7b198f68d142ce99d789/solr/core/src/java/org/apache/solr/handler/admin/SegmentsInfoRequestHandler.java#L144
> Line 130 should be moved inside the if statement L144.
> [~ab] FYI
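The fix pattern the report calls for can be sketched with minimal stand-ins
(not Solr's actual org.apache.solr.util.RefCounted or
SegmentsInfoRequestHandler): acquire the ref-counted writer only in the branch
that needs it, and decref in finally so every path releases exactly once.

```java
// Minimal stand-ins illustrating the fix: the ref is used and released only
// inside the branch that needs the writer, with decref guaranteed by finally.
class RefCounted<T> {
    private final T resource;
    private int refCount = 1;

    RefCounted(T resource) { this.resource = resource; }

    T get() { return resource; }

    synchronized void decref() { refCount--; }

    synchronized int getRefCount() { return refCount; }
}

class SegmentsInfoSketch {
    static int inspect(boolean withCoreInfo, RefCounted<String> iwRef) {
        if (!withCoreInfo) {
            return 0; // the writer ref is never used, so nothing to release here
        }
        try {
            return iwRef.get().length();
        } finally {
            iwRef.decref(); // always runs, so later write locks are not blocked
        }
    }
}
```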






[jira] [Assigned] (SOLR-14431) SegmentsInfoRequestHandler.java does not release IndexWriter

2020-05-04 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki reassigned SOLR-14431:
---

Assignee: Andrzej Bialecki

> SegmentsInfoRequestHandler.java does not release IndexWriter
> 
>
> Key: SOLR-14431
> URL: https://issues.apache.org/jira/browse/SOLR-14431
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Affects Versions: 8.1.1, 8.5.1
>Reporter: Tiziano Degaetano
>Assignee: Andrzej Bialecki
>Priority: Minor
>
> If withCoreInfo is false iwRef.decref() will not
> be called to release the reader lock, preventing any further writer locks.
> https://github.com/apache/lucene-solr/blob/3a743ea953f0ecfc35fc7b198f68d142ce99d789/solr/core/src/java/org/apache/solr/handler/admin/SegmentsInfoRequestHandler.java#L144
> Line 130 should be moved inside the if statement L144.
> [~ab] FYI






[jira] [Commented] (LUCENE-9348) Rework grouping tests to make it simpler to add new GroupSelector implementations

2020-05-04 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098862#comment-17098862
 ] 

ASF subversion and git services commented on LUCENE-9348:
-

Commit 0c58687a978ef19a4913b3a9350492d4ae6af40d in lucene-solr's branch 
refs/heads/master from Alan Woodward
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=0c58687 ]

LUCENE-9348: Add a base grouping test for use with different GroupSelector 
implementations (#1461)

The grouping module tests currently all try and test both grouping by term and
grouping by ValueSource. They are quite difficult to follow, however, and it is
not at all easy to add tests for a new grouping type. This commit adds a new
BaseGroupSelectorTestCase class which can be extended to test particular
GroupSelector implementations, and adds tests for TermGroupSelector and
ValueSourceGroupSelector.  It also adds a separate test for Block grouping,
so that the distinct grouping types are tested separately.
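The refactoring pattern the commit describes can be sketched as an abstract
base class plus small per-selector subclasses. Names below are illustrative
stand-ins, not the actual Lucene test classes:

```java
// Hedged sketch of the test-refactoring pattern: the abstract base class owns
// the shared test logic, and each GroupSelector implementation only supplies
// the piece that differs.
abstract class BaseSelectorTestSketch {
    // Subclasses supply the selector under test.
    abstract String selectGroup(String doc);

    // Shared test logic runs unchanged against every implementation.
    boolean groupsConsistently(String doc) {
        return selectGroup(doc).equals(selectGroup(doc));
    }
}

class TermSelectorTestSketch extends BaseSelectorTestSketch {
    @Override
    String selectGroup(String doc) {
        return doc.split(":")[0]; // group by the leading "term"
    }
}
```

Adding tests for a new grouping type then only requires a new subclass, while
the base class exercises the common invariants.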

> Rework grouping tests to make it simpler to add new GroupSelector 
> implementations
> -
>
> Key: LUCENE-9348
> URL: https://issues.apache.org/jira/browse/LUCENE-9348
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The grouping module tests currently all try and test both grouping by term 
> and grouping by ValueSource.  They are quite difficult to follow, however, 
> and it is not at all easy to add tests for a new grouping type.  We should 
> refactor things into an abstract base class that can then be extended by 
> tests for each specific grouping type.






[jira] [Resolved] (LUCENE-9348) Rework grouping tests to make it simpler to add new GroupSelector implementations

2020-05-04 Thread Alan Woodward (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward resolved LUCENE-9348.
---
Fix Version/s: 8.6
   Resolution: Fixed

> Rework grouping tests to make it simpler to add new GroupSelector 
> implementations
> -
>
> Key: LUCENE-9348
> URL: https://issues.apache.org/jira/browse/LUCENE-9348
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Fix For: 8.6
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The grouping module tests currently all try and test both grouping by term 
> and grouping by ValueSource.  They are quite difficult to follow, however, 
> and it is not at all easy to add tests for a new grouping type.  We should 
> refactor things into an abstract base class that can then be extended by 
> tests for each specific grouping type.






[jira] [Commented] (LUCENE-9348) Rework grouping tests to make it simpler to add new GroupSelector implementations

2020-05-04 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098864#comment-17098864
 ] 

ASF subversion and git services commented on LUCENE-9348:
-

Commit 5fd36c4d56b5b9ecbcc6d5f7736fb3f42672b3d4 in lucene-solr's branch 
refs/heads/branch_8x from Alan Woodward
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5fd36c4 ]

LUCENE-9348: Add a base grouping test for use with different GroupSelector 
implementations (#1461)

The grouping module tests currently all try and test both grouping by term and
grouping by ValueSource. They are quite difficult to follow, however, and it is
not at all easy to add tests for a new grouping type. This commit adds a new
BaseGroupSelectorTestCase class which can be extended to test particular
GroupSelector implementations, and adds tests for TermGroupSelector and
ValueSourceGroupSelector.  It also adds a separate test for Block grouping,
so that the distinct grouping types are tested separately.

> Rework grouping tests to make it simpler to add new GroupSelector 
> implementations
> -
>
> Key: LUCENE-9348
> URL: https://issues.apache.org/jira/browse/LUCENE-9348
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The grouping module tests currently all try and test both grouping by term 
> and grouping by ValueSource.  They are quite difficult to follow, however, 
> and it is not at all easy to add tests for a new grouping type.  We should 
> refactor things into an abstract base class that can then be extended by 
> tests for each specific grouping type.






[jira] [Commented] (LUCENE-9360) ToParentDocValues uses advanceExact() of underneath DocValues

2020-05-04 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098872#comment-17098872
 ] 

Adrien Grand commented on LUCENE-9360:
--

I looked at the Jira but it's not clear to me what the problem is with calling 
advance() under the hood, can you explain? I'm also a bit confused that the Jira 
has both patches and a PR attached, which do not have the same content; what 
would eventually get merged?

> ToParentDocValues uses advanceExact() of underneath DocValues
> -
>
> Key: LUCENE-9360
> URL: https://issues.apache.org/jira/browse/LUCENE-9360
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Mikhail Khludnev
>Priority: Major
>
> Currently {{ToParentDocValues.advanceExact()}} propagates it to 
> {{DocValues.advance()}} as advised at LUCENE-7871. It causes some problems at 
> LUCENE-9328 and seems not really reasonable. The latter jira has a patch 
> attached which resolves this. The question is: why (not)?
> cc [~jpountz]
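For context, the distinction being discussed: advance(target) moves to the
first document at or after target that has a value, while advanceExact(target)
positions exactly on target and reports whether it has a value. A minimal
stand-in (not Lucene's full DocIdSetIterator/DocValuesIterator contract):

```java
// Hedged sketch of the two contracts: advance() skips docs without values,
// advanceExact() lands on the requested doc and says whether a value exists.
class DocValuesSketch {
    private final boolean[] hasValue;
    int doc = -1;

    DocValuesSketch(boolean... hasValue) { this.hasValue = hasValue; }

    // advance: returns the first doc >= target that has a value.
    int advance(int target) {
        for (int d = target; d < hasValue.length; d++) {
            if (hasValue[d]) {
                return doc = d;
            }
        }
        return doc = Integer.MAX_VALUE; // stand-in for NO_MORE_DOCS
    }

    // advanceExact: positions on target and reports whether it has a value.
    boolean advanceExact(int target) {
        doc = target;
        return hasValue[target];
    }
}
```

The difference matters to a caller that needs to stay positioned on a specific
document (such as a join over child docs) rather than skip past it.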






[GitHub] [lucene-solr] jpountz commented on a change in pull request #1473: LUCENE-9353: Move terms metadata to its own file.

2020-05-04 Thread GitBox


jpountz commented on a change in pull request #1473:
URL: https://github.com/apache/lucene-solr/pull/1473#discussion_r419391721



##
File path: 
lucene/core/src/java/org/apache/lucene/codecs/blocktree/BlockTreeTermsWriter.java
##
@@ -1060,36 +1052,35 @@ public void close() throws IOException {
   return;
 }
 closed = true;
-
+
+final String metaName = 
IndexFileNames.segmentFileName(state.segmentInfo.name, state.segmentSuffix, 
BlockTreeTermsReader.TERMS_META_EXTENSION);
 boolean success = false;
-try {
-  
-  final long dirStart = termsOut.getFilePointer();
-  final long indexDirStart = indexOut.getFilePointer();
+try (IndexOutput metaOut = state.directory.createOutput(metaName, 
state.context)) {
+  CodecUtil.writeIndexHeader(metaOut, 
BlockTreeTermsReader.TERMS_META_CODEC_NAME, 
BlockTreeTermsReader.VERSION_CURRENT,
+  state.segmentInfo.getId(), state.segmentSuffix);
 
-  termsOut.writeVInt(fields.size());
+  metaOut.writeVInt(fields.size());

Review comment:
   Your proposal sounds orthogonal to this pull request to me?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9328) SortingGroupHead to reuse DocValues

2020-05-04 Thread Lucene/Solr QA (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1709#comment-1709
 ] 

Lucene/Solr QA commented on LUCENE-9328:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  8s{color} 
| {color:red} LUCENE-9328 does not apply to master. Rebase required? Wrong 
Branch? See 
https://wiki.apache.org/lucene-java/HowToContribute#Contributing_your_work for 
help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | LUCENE-9328 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13001971/LUCENE-9328.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-LUCENE-Build/267/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.



> SortingGroupHead to reuse DocValues
> ---
>
> Key: LUCENE-9328
> URL: https://issues.apache.org/jira/browse/LUCENE-9328
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/grouping
>Reporter: Mikhail Khludnev
>Assignee: Mikhail Khludnev
>Priority: Minor
> Attachments: LUCENE-9328.patch, LUCENE-9328.patch, LUCENE-9328.patch, 
> LUCENE-9328.patch, LUCENE-9328.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> That's why 
> https://issues.apache.org/jira/browse/LUCENE-7701?focusedCommentId=17084365&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17084365






[GitHub] [lucene-solr] mikemccand commented on a change in pull request #1482: LUCENE-7822: CodecUtil#checkFooter should throw a CorruptIndexException as the main exception.

2020-05-04 Thread GitBox


mikemccand commented on a change in pull request #1482:
URL: https://github.com/apache/lucene-solr/pull/1482#discussion_r419407888



##
File path: lucene/core/src/java/org/apache/lucene/codecs/CodecUtil.java
##
@@ -448,24 +448,27 @@ public static void checkFooter(ChecksumIndexInput in, 
Throwable priorException)
   checkFooter(in);
 } else {
   try {
+// If we have evidence of corruption then we return the corruption as 
the
+// main exception and the prior exception gets suppressed. Otherwise we
+// return the prior exception with a suppressed exception that notifies
+// the user that checksums matched.
 long remaining = in.length() - in.getFilePointer();
 if (remaining < footerLength()) {
   // corruption caused us to read into the checksum footer already: we 
can't proceed
-  priorException.addSuppressed(new CorruptIndexException("checksum 
status indeterminate: remaining=" + remaining +
- ", please run 
checkindex for more details", in));
+  throw new CorruptIndexException("checksum status indeterminate: 
remaining=" + remaining +
+  ", please run checkindex for more 
details", in);
 } else {
   // otherwise, skip any unread bytes.
   in.skipBytes(remaining - footerLength());
   
   // now check the footer
-  try {
-long checksum = checkFooter(in);
-priorException.addSuppressed(new CorruptIndexException("checksum 
passed (" + Long.toHexString(checksum) + 
-   "). 
possibly transient resource issue, or a Lucene or JVM bug", in));
-  } catch (CorruptIndexException t) {
-priorException.addSuppressed(t);
-  }
+  long checksum = checkFooter(in);
+  priorException.addSuppressed(new CorruptIndexException("checksum 
passed (" + Long.toHexString(checksum) +

Review comment:
   Do we normally (in other places) also use `CorruptIndexException` to 
indicate a valid checksum?  I feel like we need a `NotCorruptIndexException` 
for this :)

##
File path: lucene/core/src/java/org/apache/lucene/codecs/CodecUtil.java
##
@@ -448,24 +448,27 @@ public static void checkFooter(ChecksumIndexInput in, 
Throwable priorException)
   checkFooter(in);
 } else {
   try {
+// If we have evidence of corruption then we return the corruption as 
the
+// main exception and the prior exception gets suppressed. Otherwise we
+// return the prior exception with a suppressed exception that notifies
+// the user that checksums matched.
 long remaining = in.length() - in.getFilePointer();
 if (remaining < footerLength()) {
   // corruption caused us to read into the checksum footer already: we 
can't proceed
-  priorException.addSuppressed(new CorruptIndexException("checksum 
status indeterminate: remaining=" + remaining +
- ", please run 
checkindex for more details", in));
+  throw new CorruptIndexException("checksum status indeterminate: 
remaining=" + remaining +
+  ", please run checkindex for more 
details", in);

Review comment:
   Nitpick: `;` instead of `,` since these are really two separate 
sentences?
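The behavior change under review can be sketched as follows (simplified, not
the real CodecUtil.checkFooter): when corruption is detected, the
CorruptIndexException becomes the primary exception with the prior one
suppressed; when checksums pass, the prior exception stays primary with an
informative suppressed note.

```java
// Hedged sketch of the exception-priority change: corruption is promoted to
// the main exception; otherwise the prior exception remains the main one.
class CheckFooterSketch {
    static RuntimeException resolve(RuntimeException prior, boolean corrupt) {
        if (corrupt) {
            RuntimeException corruption =
                new RuntimeException("checksum status indeterminate");
            corruption.addSuppressed(prior);  // corruption becomes the main exception
            return corruption;
        }
        prior.addSuppressed(new RuntimeException("checksum passed"));
        return prior;                          // prior stays the main exception
    }
}
```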








[jira] [Commented] (SOLR-14431) SegmentsInfoRequestHandler.java does not release IndexWriter

2020-05-04 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098901#comment-17098901
 ] 

ASF subversion and git services commented on SOLR-14431:


Commit 96a8c6a91c050b4db3c0f39ba68a55db4936f348 in lucene-solr's branch 
refs/heads/branch_8x from Andrzej Bialecki
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=96a8c6a ]

SOLR-14431: SegmentsInfoRequestHandler does not release IndexWriter.


> SegmentsInfoRequestHandler.java does not release IndexWriter
> 
>
> Key: SOLR-14431
> URL: https://issues.apache.org/jira/browse/SOLR-14431
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Affects Versions: 8.1.1, 8.5.1
>Reporter: Tiziano Degaetano
>Assignee: Andrzej Bialecki
>Priority: Minor
> Fix For: 8.6
>
>
> If withCoreInfo is false iwRef.decref() will not
> be called to release the reader lock, preventing any further writer locks.
> https://github.com/apache/lucene-solr/blob/3a743ea953f0ecfc35fc7b198f68d142ce99d789/solr/core/src/java/org/apache/solr/handler/admin/SegmentsInfoRequestHandler.java#L144
> Line 130 should be moved inside the if statement L144.
> [~ab] FYI






[jira] [Commented] (SOLR-14431) SegmentsInfoRequestHandler.java does not release IndexWriter

2020-05-04 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098902#comment-17098902
 ] 

ASF subversion and git services commented on SOLR-14431:


Commit 5eea489e447be1bbc291d1465fca8b40c1f46d11 in lucene-solr's branch 
refs/heads/master from Andrzej Bialecki
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5eea489 ]

SOLR-14431: SegmentsInfoRequestHandler does not release IndexWriter.


> SegmentsInfoRequestHandler.java does not release IndexWriter
> 
>
> Key: SOLR-14431
> URL: https://issues.apache.org/jira/browse/SOLR-14431
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Affects Versions: 8.1.1, 8.5.1
>Reporter: Tiziano Degaetano
>Assignee: Andrzej Bialecki
>Priority: Minor
> Fix For: 8.6
>
>
> If withCoreInfo is false iwRef.decref() will not
> be called to release the reader lock, preventing any further writer locks.
> https://github.com/apache/lucene-solr/blob/3a743ea953f0ecfc35fc7b198f68d142ce99d789/solr/core/src/java/org/apache/solr/handler/admin/SegmentsInfoRequestHandler.java#L144
> Line 130 should be moved inside the if statement L144.
> [~ab] FYI






[jira] [Resolved] (SOLR-14431) SegmentsInfoRequestHandler.java does not release IndexWriter

2020-05-04 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki resolved SOLR-14431.
-
Resolution: Fixed

I applied your fix and modified a unit test to ensure the fix works as 
intended. Thanks Tiziano!

> SegmentsInfoRequestHandler.java does not release IndexWriter
> 
>
> Key: SOLR-14431
> URL: https://issues.apache.org/jira/browse/SOLR-14431
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Affects Versions: 8.1.1, 8.5.1
>Reporter: Tiziano Degaetano
>Assignee: Andrzej Bialecki
>Priority: Minor
> Fix For: 8.6
>
>
> If withCoreInfo is false iwRef.decref() will not
> be called to release the reader lock, preventing any further writer locks.
> https://github.com/apache/lucene-solr/blob/3a743ea953f0ecfc35fc7b198f68d142ce99d789/solr/core/src/java/org/apache/solr/handler/admin/SegmentsInfoRequestHandler.java#L144
> Line 130 should be moved inside the if statement L144.
> [~ab] FYI






[GitHub] [lucene-solr] romseygeek opened a new pull request #1484: LUCENE-7889: Allow grouping on Double/LongValuesSource

2020-05-04 Thread GitBox


romseygeek opened a new pull request #1484:
URL: https://github.com/apache/lucene-solr/pull/1484


   The grouping module currently allows grouping on a SortedDocValues field, or on
   a ValueSource. The latter groups only on exact values, and so will not perform
   well on numeric-valued fields. This commit adds the ability to group by defined
   ranges from a Long or DoubleValuesSource.
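The range-grouping idea can be sketched as bucketing each numeric value into a
caller-defined range instead of grouping on the exact value. Illustrative
only, not the actual Lucene GroupSelector API:

```java
// Hedged sketch of range grouping: documents fall into the range whose lower
// boundary is the largest one not exceeding the value, so many distinct
// numeric values collapse into a small number of groups.
class RangeGroupSketch {
    private final double[] boundaries; // sorted ascending

    RangeGroupSketch(double... boundaries) { this.boundaries = boundaries; }

    // Returns the index of the range containing value, or -1 if below all ranges.
    int groupFor(double value) {
        int group = -1;
        for (int i = 0; i < boundaries.length; i++) {
            if (value >= boundaries[i]) {
                group = i;
            }
        }
        return group;
    }
}
```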






[jira] [Commented] (LUCENE-7889) Allow grouping on DoubleValuesSource ranges

2020-05-04 Thread Alan Woodward (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-7889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098908#comment-17098908
 ] 

Alan Woodward commented on LUCENE-7889:
---

Bringing this up to date, the refactored base test classes make this much 
easier to test now.

> Allow grouping on DoubleValuesSource ranges
> ---
>
> Key: LUCENE-7889
> URL: https://issues.apache.org/jira/browse/LUCENE-7889
> Project: Lucene - Core
>  Issue Type: New Feature
>Affects Versions: 7.0
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Attachments: LUCENE-7889.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> LUCENE-7701 made it easier to define new ways of grouping results.  This 
> issue adds functionality to group the values of a DoubleValuesSource into a 
> set of ranges.






[jira] [Commented] (LUCENE-9191) Fix linefiledocs compression or replace in tests

2020-05-04 Thread Michael McCandless (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098909#comment-17098909
 ] 

Michael McCandless commented on LUCENE-9191:


OK, I can repro the above failure!  
https://issues.apache.org/jira/browse/LUCENE-9191?focusedCommentId=17094930&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17094930

Digging...

> Fix linefiledocs compression or replace in tests
> 
>
> Key: LUCENE-9191
> URL: https://issues.apache.org/jira/browse/LUCENE-9191
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
>Assignee: Michael McCandless
>Priority: Major
> Fix For: 8.6
>
> Attachments: LUCENE-9191.patch, LUCENE-9191.patch
>
>
> LineFileDocs(random) is very slow, even to open. It does a very slow "random 
> skip" through a gzip compressed file.
> For the analyzers tests, in LUCENE-9186 I simply removed its usage, since 
> TestUtil.randomAnalysisString is superior, and fast. But we should address 
> other tests using it, since LineFileDocs(random) is slow!
> I think it is also the case that every lucene test has probably tested every 
> LineFileDocs line many times now, whereas randomAnalysisString will invent 
> new ones.
> Alternatively, we could "fix" LineFileDocs(random), e.g. special compression 
> options (in blocks)... deflate supports such stuff. But it would make it even 
> hairier than it is now.
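The "compress in blocks" alternative mentioned above can be sketched: if each
block is deflated independently and block offsets are recorded, a reader can
seek to a random block and decompress only that block, instead of streaming
through the whole gzip file to reach a random position. Illustrative sketch,
not the actual LineFileDocs format:

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hedged sketch of block-wise deflate: each block compresses on its own, so
// reading block i never requires decompressing blocks 0..i-1.
class BlockCompressedSketch {
    final List<byte[]> blocks = new ArrayList<>();

    void addBlock(byte[] raw) {
        Deflater d = new Deflater();
        d.setInput(raw);
        d.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!d.finished()) {
            out.write(buf, 0, d.deflate(buf));
        }
        d.end();
        blocks.add(out.toByteArray());
    }

    // Random access: decompress only the requested block.
    byte[] readBlock(int index, int rawLength) {
        Inflater inf = new Inflater();
        inf.setInput(blocks.get(index));
        byte[] raw = new byte[rawLength];
        try {
            int n = 0;
            while (n < rawLength) {
                n += inf.inflate(raw, n, rawLength - n);
            }
        } catch (DataFormatException e) {
            throw new RuntimeException(e);
        }
        inf.end();
        return raw;
    }
}
```

The trade-off the email alludes to: per-block compression costs some ratio and
adds bookkeeping, which is the "hairier" part.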






[GitHub] [lucene-solr] juanka588 commented on a change in pull request #1473: LUCENE-9353: Move terms metadata to its own file.

2020-05-04 Thread GitBox


juanka588 commented on a change in pull request #1473:
URL: https://github.com/apache/lucene-solr/pull/1473#discussion_r419448277



##
File path: 
lucene/core/src/java/org/apache/lucene/codecs/blocktree/BlockTreeTermsWriter.java
##
@@ -1060,36 +1052,35 @@ public void close() throws IOException {
   return;
 }
 closed = true;
-
+
+final String metaName = 
IndexFileNames.segmentFileName(state.segmentInfo.name, state.segmentSuffix, 
BlockTreeTermsReader.TERMS_META_EXTENSION);
 boolean success = false;
-try {
-  
-  final long dirStart = termsOut.getFilePointer();
-  final long indexDirStart = indexOut.getFilePointer();
+try (IndexOutput metaOut = state.directory.createOutput(metaName, 
state.context)) {

Review comment:
   Why is this file not created at the same time as the indexOut and termsOut?








[GitHub] [lucene-solr] juanka588 commented on a change in pull request #1473: LUCENE-9353: Move terms metadata to its own file.

2020-05-04 Thread GitBox


juanka588 commented on a change in pull request #1473:
URL: https://github.com/apache/lucene-solr/pull/1473#discussion_r419449861



##
File path: 
lucene/core/src/java/org/apache/lucene/codecs/blocktree/BlockTreeTermsWriter.java
##
@@ -1060,36 +1052,35 @@ public void close() throws IOException {
   return;
 }
 closed = true;
-
+
+final String metaName = 
IndexFileNames.segmentFileName(state.segmentInfo.name, state.segmentSuffix, 
BlockTreeTermsReader.TERMS_META_EXTENSION);
 boolean success = false;
-try {
-  
-  final long dirStart = termsOut.getFilePointer();
-  final long indexDirStart = indexOut.getFilePointer();
+try (IndexOutput metaOut = state.directory.createOutput(metaName, 
state.context)) {
+  CodecUtil.writeIndexHeader(metaOut, 
BlockTreeTermsReader.TERMS_META_CODEC_NAME, 
BlockTreeTermsReader.VERSION_CURRENT,
+  state.segmentInfo.getId(), state.segmentSuffix);
 
-  termsOut.writeVInt(fields.size());
+  metaOut.writeVInt(fields.size());

Review comment:
   Yes, but maybe this is an opportunity to move the code and add more 
testability in BlockTreeTermsReader.java 
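For readers following the PR, the file layout being introduced (a dedicated
metadata file with a codec header, the per-field metadata, and a checksum
footer) can be sketched with plain java.io stand-ins for Lucene's IndexOutput
and CodecUtil. Names and the exact layout below are illustrative assumptions,
not the real terms-metadata format:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.CRC32;

// Hedged sketch of a header + payload + checksum-footer file layout, the
// general shape used for the separate terms metadata file.
class TermsMetaSketch {
    static byte[] writeMeta(int fieldCount) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF("TermsMeta");     // header: codec name stand-in
            out.writeInt(1);               // header: version
            out.writeInt(fieldCount);      // payload: number of fields
            CRC32 crc = new CRC32();
            crc.update(bytes.toByteArray());
            out.writeLong(crc.getValue()); // footer: checksum over preceding bytes
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e); // cannot happen for in-memory output
        }
    }
}
```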








[GitHub] [lucene-solr] madrob commented on a change in pull request #1482: LUCENE-7822: CodecUtil#checkFooter should throw a CorruptIndexException as the main exception.

2020-05-04 Thread GitBox


madrob commented on a change in pull request #1482:
URL: https://github.com/apache/lucene-solr/pull/1482#discussion_r419467116



##
File path: lucene/core/src/test/org/apache/lucene/util/TestOfflineSorter.java
##
@@ -353,12 +352,10 @@ protected void corruptFile() throws IOException {
 
   // This corruption made OfflineSorter fail with its own exception, but 
we verify it also went and added (as suppressed) that the

Review comment:
   no longer as suppressed 








[GitHub] [lucene-solr] madrob commented on pull request #1480: SOLR-14456: Fix Content-Type header forwarding on compressed requests

2020-05-04 Thread GitBox


madrob commented on pull request #1480:
URL: https://github.com/apache/lucene-solr/pull/1480#issuecomment-623493855


   Sorry, accidentally closed the PR because I misunderstood what a button in 
the IntelliJ plugin was doing (I thought it was to close a popup, not close the 
PR).






[GitHub] [lucene-solr] madrob commented on a change in pull request #341: SOLR-12131: ExternalRoleRuleBasedAuthorizationPlugin

2020-05-04 Thread GitBox


madrob commented on a change in pull request #341:
URL: https://github.com/apache/lucene-solr/pull/341#discussion_r419480359



##
File path: solr/core/src/java/org/apache/solr/security/RuleBasedAuthorizationPlugin.java
##
@@ -16,329 +16,45 @@
  */
 package org.apache.solr.security;
 
-import java.io.IOException;
 import java.lang.invoke.MethodHandles;
 import java.security.Principal;
-import java.util.ArrayList;
 import java.util.HashMap;
-import java.util.HashSet;
-import java.util.List;
 import java.util.Map;
 import java.util.Set;
-import java.util.function.Function;
 
-import org.apache.solr.common.SpecProvider;
-import org.apache.solr.common.util.Utils;
-import org.apache.solr.common.util.ValidatingJsonMap;
-import org.apache.solr.common.util.CommandOperation;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
-import static java.util.Arrays.asList;
-import static java.util.Collections.unmodifiableMap;
-import static java.util.function.Function.identity;
-import static java.util.stream.Collectors.toMap;
-import static org.apache.solr.handler.admin.SecurityConfHandler.getListValue;
 import static org.apache.solr.handler.admin.SecurityConfHandler.getMapValue;
 
-
-public class RuleBasedAuthorizationPlugin implements AuthorizationPlugin, ConfigEditablePlugin, SpecProvider {
+/**
+ * Original implementation of Rule Based Authz plugin which configures user/role
+ * mapping in the security.json configuration
+ */
+public class RuleBasedAuthorizationPlugin extends RuleBasedAuthorizationPluginBase {

Review comment:
   Yea, planning for back compat is fine, I think.








[GitHub] [lucene-solr] madrob commented on pull request #341: SOLR-12131: ExternalRoleRuleBasedAuthorizationPlugin

2020-05-04 Thread GitBox


madrob commented on pull request #341:
URL: https://github.com/apache/lucene-solr/pull/341#issuecomment-623502714


   LGTM






[jira] [Created] (LUCENE-9361) Consider removing CachingCollector

2020-05-04 Thread Alan Woodward (Jira)
Alan Woodward created LUCENE-9361:
-

 Summary: Consider removing CachingCollector
 Key: LUCENE-9361
 URL: https://issues.apache.org/jira/browse/LUCENE-9361
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Alan Woodward
Assignee: Alan Woodward


We have a caching collector implementation that is used by the grouping module 
to avoid running queries twice when doing two-phase grouping queries.  This 
stores things in a flat array of int[maxDoc], and optionally stores scores as 
well in a parallel float[] array.  Given that we have a much more efficient way 
of caching matching documents per-segment in the built-in QueryCache, should we 
consider removing this collector entirely and advise users to configure query 
caches appropriately instead?
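For readers unfamiliar with the collector under discussion, the flat-array caching idea can be sketched in standalone Java. This is a minimal sketch with illustrative names, not Lucene's actual CachingCollector API: matching doc IDs (and optionally scores) are recorded on a first pass and replayed without re-running the query.

```java
import java.util.Arrays;
import java.util.function.ObjIntConsumer;

public class FlatArrayDocCache {
    private int[] docs = new int[16]; // cached doc IDs, grown on demand
    private float[] scores;           // parallel score array, optional
    private int size = 0;

    public FlatArrayDocCache(boolean cacheScores) {
        if (cacheScores) {
            scores = new float[16];
        }
    }

    /** First pass: record a matching doc (and its score, if caching scores). */
    public void collect(int doc, float score) {
        if (size == docs.length) {
            docs = Arrays.copyOf(docs, size * 2);
            if (scores != null) {
                scores = Arrays.copyOf(scores, size * 2);
            }
        }
        docs[size] = doc;
        if (scores != null) {
            scores[size] = score;
        }
        size++;
    }

    /** Second pass: replay cached hits into another consumer. */
    public void replay(ObjIntConsumer<Float> consumer) {
        for (int i = 0; i < size; i++) {
            consumer.accept(scores != null ? scores[i] : Float.NaN, docs[i]);
        }
    }

    public int size() {
        return size;
    }
}
```

The memory trade-off raised in the issue is visible here: the parallel float[] doubles the footprint whenever scores are cached.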



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Commented] (LUCENE-9361) Consider removing CachingCollector

2020-05-04 Thread Michael McCandless (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099014#comment-17099014
 ] 

Michael McCandless commented on LUCENE-9361:


Does the query cache also cache scores too now?

> Consider removing CachingCollector
> --
>
> Key: LUCENE-9361
> URL: https://issues.apache.org/jira/browse/LUCENE-9361
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
>
> We have a caching collector implementation that is used by the grouping 
> module to avoid running queries twice when doing two-phase grouping queries.  
> This stores things in a flat array of int[maxDoc], and optionally stores 
> scores as well in a parallel float[] array.  Given that we have a much more 
> efficient way of caching matching documents per-segment in the built-in 
> QueryCache, should we consider removing this collector entirely and advise 
> users to configure query caches appropriately instead?






[jira] [Commented] (LUCENE-9361) Consider removing CachingCollector

2020-05-04 Thread Alan Woodward (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099018#comment-17099018
 ] 

Alan Woodward commented on LUCENE-9361:
---

Note that we may want to move the implementation to Solr for the time being, as 
the Solr query cache is used slightly differently.

> Consider removing CachingCollector
> --
>
> Key: LUCENE-9361
> URL: https://issues.apache.org/jira/browse/LUCENE-9361
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
>
> We have a caching collector implementation that is used by the grouping 
> module to avoid running queries twice when doing two-phase grouping queries.  
> This stores things in a flat array of int[maxDoc], and optionally stores 
> scores as well in a parallel float[] array.  Given that we have a much more 
> efficient way of caching matching documents per-segment in the built-in 
> QueryCache, should we consider removing this collector entirely and advise 
> users to configure query caches appropriately instead?






[jira] [Commented] (LUCENE-9361) Consider removing CachingCollector

2020-05-04 Thread Alan Woodward (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099020#comment-17099020
 ] 

Alan Woodward commented on LUCENE-9361:
---

> Does the query cache also cache scores too now?

It doesn't; I guess the question is how useful this is compared to how much 
memory it's likely to use.

> Consider removing CachingCollector
> --
>
> Key: LUCENE-9361
> URL: https://issues.apache.org/jira/browse/LUCENE-9361
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
>
> We have a caching collector implementation that is used by the grouping 
> module to avoid running queries twice when doing two-phase grouping queries.  
> This stores things in a flat array of int[maxDoc], and optionally stores 
> scores as well in a parallel float[] array.  Given that we have a much more 
> efficient way of caching matching documents per-segment in the built-in 
> QueryCache, should we consider removing this collector entirely and advise 
> users to configure query caches appropriately instead?






[GitHub] [lucene-solr] HoustonPutman commented on pull request #1480: SOLR-14456: Fix Content-Type header forwarding on compressed requests

2020-05-04 Thread GitBox


HoustonPutman commented on pull request #1480:
URL: https://github.com/apache/lucene-solr/pull/1480#issuecomment-623527178


   I think this issue should exist in `Http2SolrClient`; I see similar logic 
there.






[GitHub] [lucene-solr] HoustonPutman edited a comment on pull request #1480: SOLR-14456: Fix Content-Type header forwarding on compressed requests

2020-05-04 Thread GitBox


HoustonPutman edited a comment on pull request #1480:
URL: https://github.com/apache/lucene-solr/pull/1480#issuecomment-623527178


   I think this issue should exist in `Http2SolrClient`; I see similar logic 
there. That was just a cursory look, though.






[jira] [Updated] (LUCENE-9191) Fix linefiledocs compression or replace in tests

2020-05-04 Thread Michael McCandless (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-9191:
---
Attachment: LUCENE-9191.patch
Status: Reopened  (was: Reopened)

OK I found the issue!

You cannot seek to the middle of a multi-byte UTF-8 encoded Unicode character.

The attached patch fixes it by putting back logic that used to be there, which I 
didn't understand and stupidly removed :)  I added a comment explaining the 
situation...

I'll commit soon.
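The rule behind the fix follows from UTF-8's self-synchronizing design: continuation bytes always match the bit pattern 10xxxxxx, so after seeking to an arbitrary byte offset you can skip forward to the next character boundary. A hypothetical helper illustrating the check (not the code in the patch itself):

```java
public class Utf8Boundary {
    /** Returns true if b is a UTF-8 continuation byte (bit pattern 10xxxxxx). */
    static boolean isContinuation(byte b) {
        return (b & 0xC0) == 0x80;
    }

    /**
     * After seeking to an arbitrary byte offset, advance past any
     * continuation bytes so we land on a character boundary instead of
     * the middle of a multi-byte encoded character.
     */
    static int nextBoundary(byte[] data, int pos) {
        while (pos < data.length && isContinuation(data[pos])) {
            pos++;
        }
        return pos;
    }
}
```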

> Fix linefiledocs compression or replace in tests
> 
>
> Key: LUCENE-9191
> URL: https://issues.apache.org/jira/browse/LUCENE-9191
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
>Assignee: Michael McCandless
>Priority: Major
> Fix For: 8.6
>
> Attachments: LUCENE-9191.patch, LUCENE-9191.patch, LUCENE-9191.patch
>
>
> LineFileDocs(random) is very slow, even to open. It does a very slow "random 
> skip" through a gzip compressed file.
> For the analyzers tests, in LUCENE-9186 I simply removed its usage, since 
> TestUtil.randomAnalysisString is superior, and fast. But we should address 
> other tests using it, since LineFileDocs(random) is slow!
> I think it is also the case that every lucene test has probably tested every 
> LineFileDocs line many times now, whereas randomAnalysisString will invent 
> new ones.
> Alternatively, we could "fix" LineFileDocs(random), e.g. special compression 
> options (in blocks)... deflate supports such stuff. But it would make it even 
> hairier than it is now.






[jira] [Commented] (SOLR-14400) two small SolrMetricsContext cleanups

2020-05-04 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099079#comment-17099079
 ] 

ASF subversion and git services commented on SOLR-14400:


Commit 9c3b2b665471f563ef4d3de681fd229bf40803fe in lucene-solr's branch 
refs/heads/master from Christine Poerschke
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=9c3b2b6 ]

SOLR-14400: DirectUpdateHandler2 no longer needs to override 
getSolrMetricsContext


> two small SolrMetricsContext cleanups
> -
>
> Key: SOLR-14400
> URL: https://issues.apache.org/jira/browse/SOLR-14400
> Project: Solr
>  Issue Type: Task
>  Components: metrics
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
>Priority: Minor
> Attachments: SOLR-14400.patch
>
>
> (details to follow)






[jira] [Commented] (SOLR-14400) two small SolrMetricsContext cleanups

2020-05-04 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099080#comment-17099080
 ] 

ASF subversion and git services commented on SOLR-14400:


Commit b81083142c4c6391b73a3f0d41af817f8ed0c238 in lucene-solr's branch 
refs/heads/master from Christine Poerschke
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=b810831 ]

SOLR-14400: SuggestComponent can use parent class' SolrMetricsContext


> two small SolrMetricsContext cleanups
> -
>
> Key: SOLR-14400
> URL: https://issues.apache.org/jira/browse/SOLR-14400
> Project: Solr
>  Issue Type: Task
>  Components: metrics
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
>Priority: Minor
> Attachments: SOLR-14400.patch
>
>
> (details to follow)






[jira] [Updated] (SOLR-14400) two small SolrMetricsContext cleanups

2020-05-04 Thread Christine Poerschke (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christine Poerschke updated SOLR-14400:
---
Fix Version/s: master (9.0)
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Builds upon the SOLR-13858 change and so is applicable to the 9.0 (master) branch only.

> two small SolrMetricsContext cleanups
> -
>
> Key: SOLR-14400
> URL: https://issues.apache.org/jira/browse/SOLR-14400
> Project: Solr
>  Issue Type: Task
>  Components: metrics
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
>Priority: Minor
> Fix For: master (9.0)
>
> Attachments: SOLR-14400.patch
>
>
> (details to follow)






[jira] [Commented] (SOLR-14423) static caches in StreamHandler ought to move to CoreContainer lifecycle

2020-05-04 Thread Christine Poerschke (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099091#comment-17099091
 ] 

Christine Poerschke commented on SOLR-14423:


bq. ... StreamHandler (at "/stream") has several statically declared caches. 
... SolrClientCache ...

I had noticed that GraphHandler references StreamHandler's client cache -- e.g. 
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.5.1/solr/core/src/java/org/apache/solr/handler/GraphHandler.java#L150
 -- and it sounds like the above solution could take care of that too then.

> static caches in StreamHandler ought to move to CoreContainer lifecycle
> ---
>
> Key: SOLR-14423
> URL: https://issues.apache.org/jira/browse/SOLR-14423
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: streaming expressions
>Reporter: David Smiley
>Priority: Major
>
> StreamHandler (at "/stream") has several statically declared caches.  I think 
> this is problematic, such as in testing wherein multiple nodes could be in 
> the same JVM.  One of them is more serious -- SolrClientCache which is 
> closed/cleared via a SolrCore close hook.  That's bad for performance but 
> also dangerous since another core might want to use one of these clients!
> CC [~jbernste]
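The lifecycle problem described in the issue can be sketched in standalone Java. This is an illustrative sketch only (the class and method names are hypothetical, not Solr's actual API): a statically declared cache is shared by every node in the JVM and outlives any one core, whereas a cache owned by a container instance is created and closed with it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class ContainerOwnedCache implements AutoCloseable {
    // Anti-pattern being discussed: one JVM-wide map, tied to no lifecycle,
    // shared even when multiple nodes run in the same JVM (as in tests).
    static final Map<String, Object> STATIC_CACHE = new ConcurrentHashMap<>();

    // Preferred: each container instance owns its cache and closes it.
    private final Map<String, Object> cache = new ConcurrentHashMap<>();
    private boolean closed = false;

    public Object computeIfAbsent(String key, Function<String, Object> loader) {
        if (closed) {
            throw new IllegalStateException("container closed");
        }
        return cache.computeIfAbsent(key, loader);
    }

    @Override
    public void close() {
        closed = true;
        cache.clear();
    }
}
```

With the instance-owned version, closing one container cannot yank a cached client out from under another core, which is the hazard the issue calls out.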






[jira] [Commented] (SOLR-14454) support for UTF-8 (string) types with DocValuesType.BINARY

2020-05-04 Thread Michael Gibney (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099107#comment-17099107
 ] 

Michael Gibney commented on SOLR-14454:
---

I'm sorry I missed SOLR-10255, thanks for calling attention to it! There's 
definitely some overlap there (and I'm happy to close this issue and pursue 
these changes under SOLR-10255 if you think that's appropriate).

Although the two issues could be considered duplicative, the approach and 
emphasis appear to differ considerably. I'm curious what you think of the 
difference in the two approaches. [PR 
#1478|https://github.com/apache/lucene-solr/pull/1478] has a smaller footprint 
addressing a narrower set of use cases, and enables functionality 
(export/streaming expressions) that is otherwise completely unavailable. The 
patch for SOLR-10255 seems to focus more on improving performance of existing 
stored-field use cases. Is the approach taken by [PR 
#1478|https://github.com/apache/lucene-solr/pull/1478] something like what you 
had in mind with the following comment (re: SOLR-10255) – perhaps not, but 
still worth asking?:
{quote}I think this feature would go hand-in-hand with a compressing BDV-only 
DocValuesFormat... which should be done in parallel to this
{quote}
Point well taken regarding compression. But even if it doesn't make sense to 
include {{CompressedBinaryDocValuesStringField}} in this PR, it still works as 
a POC of a custom fieldType that could leverage the added functionality (of 
respecting utf8 fields that have binary docValues). Also, codec-level dv 
compression need not be mutually exclusive with support for per-value 
compression implemented at the field level, even if there's no canonical 
implementation of the latter; though in any case you wouldn't _use_ both at the 
same time. Since I have the export use case foremost in my mind, it's possible 
that for bulk arbitrary-order access (as in the export case), compression 
implemented per-doc/per-value in a custom fieldType might be preferable to 
block-based codec-level compression (esp. since a custom fieldType could 
presumably support more customization than codec-layer compression – e.g., 
deflate {{dictionaryFile}} in the POC implementation above).

> support for UTF-8 (string) types with DocValuesType.BINARY
> --
>
> Key: SOLR-14454
> URL: https://issues.apache.org/jira/browse/SOLR-14454
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Schema and Analysis
>Affects Versions: master (9.0)
>Reporter: Michael Gibney
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The goal is to add support for string fields with arbitrarily large values in 
> the {{/export}} handler and streaming expressions.
> {{StrField}} values are currently limited to 32766 bytes for the case where 
> {{indexed=true}} or {{docValues=true}}. Exceeding this value triggers an 
> "immense field" warning, and causes indexing to fail for the associated input 
> doc.
> Configuring a {{StrField}} field as "{{indexed=false docValues=false}}" 
> removes this size limitation, so it is already possible to have large 
> _stored_ {{StrField}} values. But the "{{docValues=true}}" prerequisite for 
> the {{/export}} handler (and consequently for streaming expressions) limits 
> the size of field that can be used in conjunction with these features.
> Adding support for UTF-8/string field types with {{DocValuesType.BINARY}} 
> would address this limitation and allow considerable flexibility in the 
> implementation of custom field types. N.b.: this would address field value 
> retrieval use cases only (e.g., {{/export}} and {{useDocValuesAsStored}}); 
> neither sorting nor faceting would be supported.
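The 32766-byte limit mentioned above applies to the UTF-8 encoding of the value, so multi-byte characters hit it in fewer than 32766 chars. A small sketch of the check (the constant matches Lucene's IndexWriter.MAX_TERM_LENGTH; the helper class itself is hypothetical, not Solr code):

```java
import java.nio.charset.StandardCharsets;

public class ImmenseFieldCheck {
    // Lucene's per-term byte limit (IndexWriter.MAX_TERM_LENGTH is 32766).
    static final int MAX_TERM_BYTES = 32766;

    /**
     * True if the string's UTF-8 encoding fits within the term limit,
     * i.e. it could be indexed / given SORTED doc values without
     * triggering the "immense field" failure described above.
     */
    static boolean fitsTermLimit(String value) {
        return value.getBytes(StandardCharsets.UTF_8).length <= MAX_TERM_BYTES;
    }
}
```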






[jira] [Commented] (LUCENE-9191) Fix linefiledocs compression or replace in tests

2020-05-04 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099163#comment-17099163
 ] 

ASF subversion and git services commented on LUCENE-9191:
-

Commit 1783c4ad47990d1a88ac3bb44b2da2c2d2abcc79 in lucene-solr's branch 
refs/heads/master from Michael McCandless
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=1783c4a ]

LUCENE-9191: ensure LineFileDocs random seeking effort does not seek into the 
middle of a multi-byte UTF-8 encoded Unicode character


> Fix linefiledocs compression or replace in tests
> 
>
> Key: LUCENE-9191
> URL: https://issues.apache.org/jira/browse/LUCENE-9191
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
>Assignee: Michael McCandless
>Priority: Major
> Fix For: 8.6
>
> Attachments: LUCENE-9191.patch, LUCENE-9191.patch, LUCENE-9191.patch
>
>
> LineFileDocs(random) is very slow, even to open. It does a very slow "random 
> skip" through a gzip compressed file.
> For the analyzers tests, in LUCENE-9186 I simply removed its usage, since 
> TestUtil.randomAnalysisString is superior, and fast. But we should address 
> other tests using it, since LineFileDocs(random) is slow!
> I think it is also the case that every lucene test has probably tested every 
> LineFileDocs line many times now, whereas randomAnalysisString will invent 
> new ones.
> Alternatively, we could "fix" LineFileDocs(random), e.g. special compression 
> options (in blocks)... deflate supports such stuff. But it would make it even 
> hairier than it is now.






[GitHub] [lucene-solr] samuelgmartinez commented on pull request #1480: SOLR-14456: Fix Content-Type header forwarding on compressed requests

2020-05-04 Thread GitBox


samuelgmartinez commented on pull request #1480:
URL: https://github.com/apache/lucene-solr/pull/1480#issuecomment-623599049


   > I think this issue should exist in `Http2SolrClient`, I see similar logic 
there. That was just a cursory look though.
   
   I reviewed it, and it's now properly implemented. It relies on two different 
methods for getting the encoding: one is Jetty's 
`Response.CompleteListener#getEncoding` (which is the charset, not the 
content-encoding), and the other is `Http2SolrClient#getEncoding` (which relies 
on "manual" parsing of the charset attribute from the content-type header).
   
   I think I should refactor `Http2SolrClient#getEncoding` to just get the 
encoding using the HttpClient's `ContentType` classes. I don't think it's 
perfect, as the http2 Jetty-based implementation would depend on an HttpClient 
class, but the dependency [is there 
already](https://github.com/apache/lucene-solr/blob/13f19f65559290a860df84fa1b5ac2db903b27ec/solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java#L686). Any opinion on the refactor?
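As a rough illustration of the kind of "manual" charset parsing being discussed, a simplified sketch follows. This is not the actual `Http2SolrClient#getEncoding` code, and a production parser should follow RFC 7231's quoting and whitespace rules more carefully:

```java
public class ContentTypeCharset {
    /**
     * Extract the charset attribute from a Content-Type header value,
     * falling back to the given default when none is present.
     * Handles the common forms "type/subtype; charset=X" and a simple
     * double-quoted value; this is an illustration, not a full parser.
     */
    static String charsetOf(String contentType, String defaultCharset) {
        if (contentType == null) {
            return defaultCharset;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            int eq = p.indexOf('=');
            if (eq > 0 && p.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                String v = p.substring(eq + 1).trim();
                if (v.length() >= 2 && v.startsWith("\"") && v.endsWith("\"")) {
                    v = v.substring(1, v.length() - 1); // strip surrounding quotes
                }
                return v.isEmpty() ? defaultCharset : v;
            }
        }
        return defaultCharset;
    }
}
```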






[GitHub] [lucene-solr] samuelgmartinez edited a comment on pull request #1480: SOLR-14456: Fix Content-Type header forwarding on compressed requests

2020-05-04 Thread GitBox


samuelgmartinez edited a comment on pull request #1480:
URL: https://github.com/apache/lucene-solr/pull/1480#issuecomment-623599049


   > I think this issue should exist in `Http2SolrClient`, I see similar logic 
there. That was just a cursory look though.
   
   I reviewed it, and the existing code is properly implemented. It relies on 
two different methods for getting the encoding: one is Jetty's 
`Response.CompleteListener#getEncoding` (which is the charset, not the 
content-encoding), and the other is `Http2SolrClient#getEncoding` (which relies 
on "manual" parsing of the charset attribute from the content-type header).
   
   I think I should refactor `Http2SolrClient#getEncoding` to just get the 
encoding using the HttpClient's `ContentType` classes. I don't think it's 
perfect, as the http2 Jetty-based implementation would depend on an HttpClient 
class, but the dependency [is there 
already](https://github.com/apache/lucene-solr/blob/13f19f65559290a860df84fa1b5ac2db903b27ec/solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java#L686). Any opinion on the refactor?






[jira] [Commented] (LUCENE-9191) Fix linefiledocs compression or replace in tests

2020-05-04 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099166#comment-17099166
 ] 

ASF subversion and git services commented on LUCENE-9191:
-

Commit eec79e0b2be7f1198d20c5d24e5a99d456d7b05c in lucene-solr's branch 
refs/heads/branch_8x from Michael McCandless
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=eec79e0 ]

LUCENE-9191: ensure LineFileDocs random seeking effort does not seek into the 
middle of a multi-byte UTF-8 encoded Unicode character


> Fix linefiledocs compression or replace in tests
> 
>
> Key: LUCENE-9191
> URL: https://issues.apache.org/jira/browse/LUCENE-9191
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
>Assignee: Michael McCandless
>Priority: Major
> Fix For: 8.6
>
> Attachments: LUCENE-9191.patch, LUCENE-9191.patch, LUCENE-9191.patch
>
>
> LineFileDocs(random) is very slow, even to open. It does a very slow "random 
> skip" through a gzip compressed file.
> For the analyzers tests, in LUCENE-9186 I simply removed its usage, since 
> TestUtil.randomAnalysisString is superior, and fast. But we should address 
> other tests using it, since LineFileDocs(random) is slow!
> I think it is also the case that every lucene test has probably tested every 
> LineFileDocs line many times now, whereas randomAnalysisString will invent 
> new ones.
> Alternatively, we could "fix" LineFileDocs(random), e.g. special compression 
> options (in blocks)... deflate supports such stuff. But it would make it even 
> hairier than it is now.






[jira] [Commented] (LUCENE-9148) Move the BKD index to its own file.

2020-05-04 Thread Michael McCandless (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099192#comment-17099192
 ] 

Michael McCandless commented on LUCENE-9148:


{quote}So I started working on splitting it into multiple files.
{quote}
Do you mean one file per field?

Or two files (data file, index file) per segment, so all BKD fields in that 
segment still need just the two files?

> Move the BKD index to its own file.
> ---
>
> Key: LUCENE-9148
> URL: https://issues.apache.org/jira/browse/LUCENE-9148
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Lucene60PointsWriter stores both inner nodes and leaf nodes in the same file, 
> interleaved. For instance if you have two fields, you would have 
> {{}}. It's not 
> ideal since leaves and inner nodes have quite different access patterns. 
> Should we split this into two files? In the case when the BKD index is 
> off-heap, this would also help force it into RAM with 
> {{MMapDirectory#setPreload}}.
> Note that Lucene60PointsFormat already has a file that it calls "index" but 
> it's really only about mapping fields to file pointers in the other file and 
> not what I'm discussing here. But we could possibly store the BKD indices in 
> this existing file if we want to avoid creating a new one.






[GitHub] [lucene-solr] MarcusSorealheis commented on pull request #1471: SOLR-14014 PR Against Master

2020-05-04 Thread GitBox


MarcusSorealheis commented on pull request #1471:
URL: https://github.com/apache/lucene-solr/pull/1471#issuecomment-623629267


   > Here, I take a UI instance that was previously started with the Admin UI 
enabled, disable the Admin UI via a system property, and then show the result. 
The first page change is possible because it's a single-page app and I didn't 
reload the page (but you can see the system property):
   > 
   > 
![disabled-gif](https://user-images.githubusercontent.com/2353608/80791537-ce19f280-8b46-11ea-834b-e5bf59f6be80.gif)
   
   This image doesn't apply anymore, as you now need to set the environment 
variable at startup rather than use the Java property directly.






[jira] [Commented] (LUCENE-9321) Port documentation task to gradle

2020-05-04 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099229#comment-17099229
 ] 

Dawid Weiss commented on LUCENE-9321:
-

I read your comment in detail (the previous time as well, Uwe). I would like 
to avoid having to render javadocs twice... but if we can't dodge this then 
sure - your approach sounds OK. 

Tomoko did a great job in renderJavadocs. If we are to render them twice then 
the code from renderJavadocs could be turned into a task and just declared 
twice, with different options (target folder, link rendering). Sounds good?

> Port documentation task to gradle
> -
>
> Key: LUCENE-9321
> URL: https://issues.apache.org/jira/browse/LUCENE-9321
> Project: Lucene - Core
>  Issue Type: Sub-task
>  Components: general/build
>Reporter: Tomoko Uchida
>Assignee: Uwe Schindler
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: screenshot-1.png
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> This is a placeholder issue for porting ant "documentation" task to gradle. 
> The generated documents should be able to be published on lucene.apache.org 
> web site on "as-is" basis.






[jira] [Commented] (LUCENE-9321) Port documentation task to gradle

2020-05-04 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099249#comment-17099249
 ] 

Uwe Schindler commented on LUCENE-9321:
---

Hi Dawid,

That's my plan. I will work on that. I will also make the absolute "base URL" 
configurable, like in Ant. This allows generating Maven snapshot artifacts on 
Jenkins that have working links.

Last Saturday, I was not sure what you wanted to say, but after reading your 
answer multiple times, I got your point. Sorry. A smartphone is too small a 
device, and the Jira mobile interface is a usability disaster.

> Port documentation task to gradle
> -
>
> Key: LUCENE-9321
> URL: https://issues.apache.org/jira/browse/LUCENE-9321
> Project: Lucene - Core
>  Issue Type: Sub-task
>  Components: general/build
>Reporter: Tomoko Uchida
>Assignee: Uwe Schindler
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: screenshot-1.png
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> This is a placeholder issue for porting ant "documentation" task to gradle. 
> The generated documents should be able to be published on lucene.apache.org 
> web site on "as-is" basis.






[jira] [Commented] (LUCENE-9321) Port documentation task to gradle

2020-05-04 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099256#comment-17099256
 ] 

Dawid Weiss commented on LUCENE-9321:
-

Thanks. It could be me just not expressing myself clearly. I know it's possible 
to work on a pull request together somehow, so I can chip in, and Tomoko is 
probably at javadoc-expert level by now. ;) Let us know if you need help.

> Port documentation task to gradle
> -
>
> Key: LUCENE-9321
> URL: https://issues.apache.org/jira/browse/LUCENE-9321
> Project: Lucene - Core
>  Issue Type: Sub-task
>  Components: general/build
>Reporter: Tomoko Uchida
>Assignee: Uwe Schindler
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: screenshot-1.png
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> This is a placeholder issue for porting ant "documentation" task to gradle. 
> The generated documents should be able to be published on lucene.apache.org 
> web site on "as-is" basis.






[jira] [Commented] (LUCENE-9148) Move the BKD index to its own file.

2020-05-04 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099293#comment-17099293
 ] 

Adrien Grand commented on LUCENE-9148:
--

Not one file per field; that would be horrible. :) My current prototype has 3 
files:
 - One meta file that is fully read when opening the index. It contains 
metadata about each field, like the number of dimensions, and offsets into the 
index and data files.
 - An index file that stores the inner nodes of the BKD tree.
 - A data file that stores the leaf nodes.

The motivation for splitting the index and data files is that they have 
different access patterns. For instance, finding nearest neighbors is pretty 
intense on the index, and I believe some users might want to keep it in RAM, 
so having it in a separate file from the data file will help users leverage 
MMapDirectory#setPreload and FileSwitchDirectory to do so.

We could go without a meta file by storing its content at the beginning of the 
index or data file, but a separate file makes the write logic easier, since 
there is no need to buffer a lot of content before writing. It also gives 
better error messages in case of corruption: we can verify the file's content 
against a checksum when opening the index, which avoids e.g. trying to create 
slices with out-of-bounds offsets.
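
The read-meta-fully-and-verify idea can be sketched in miniature with only the 
JDK. Everything here (the `MetaSketch` name, the field layout) is illustrative 
and not Lucene's actual on-disk format; it just shows why a checksum verified 
at open time prevents acting on bogus offsets:

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Illustrative sketch: a tiny "meta" blob holding offsets into index/data
// files, with a CRC32 footer verified before any offset is trusted.
class MetaSketch {
  static byte[] writeMeta(int numDims, long indexOffset, long dataOffset) {
    ByteBuffer body = ByteBuffer.allocate(4 + 8 + 8);
    body.putInt(numDims).putLong(indexOffset).putLong(dataOffset);
    CRC32 crc = new CRC32();
    crc.update(body.array());
    // append the checksum as a footer
    return ByteBuffer.allocate(body.capacity() + 8)
        .put(body.array()).putLong(crc.getValue()).array();
  }

  // Reads the whole blob up front; throws before returning corrupt offsets.
  static long[] readMeta(byte[] meta) {
    CRC32 crc = new CRC32();
    crc.update(meta, 0, meta.length - 8); // checksum everything but the footer
    ByteBuffer in = ByteBuffer.wrap(meta);
    int numDims = in.getInt();
    long indexOffset = in.getLong();
    long dataOffset = in.getLong();
    if (in.getLong() != crc.getValue()) {
      throw new IllegalStateException("corrupt meta: checksum mismatch");
    }
    return new long[] {numDims, indexOffset, dataOffset};
  }

  public static void main(String[] args) {
    long[] f = readMeta(writeMeta(2, 128L, 4096L));
    System.out.println(f[0] + " " + f[1] + " " + f[2]); // 2 128 4096
  }
}
```

Flipping any byte of the blob before calling `readMeta` makes it throw instead 
of returning out-of-bounds offsets, which is the error-message benefit 
described above.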


> Move the BKD index to its own file.
> ---
>
> Key: LUCENE-9148
> URL: https://issues.apache.org/jira/browse/LUCENE-9148
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Lucene60PointsWriter stores both inner nodes and leaf nodes in the same file, 
> interleaved. For instance if you have two fields, you would have 
> {{}}. It's not 
> ideal since leaves and inner nodes have quite different access patterns. 
> Should we split this into two files? In the case when the BKD index is 
> off-heap, this would also help force it into RAM with 
> {{MMapDirectory#setPreload}}.
> Note that Lucene60PointsFormat already has a file that it calls "index" but 
> it's really only about mapping fields to file pointers in the other file and 
> not what I'm discussing here. But we could possibly store the BKD indices in 
> this existing file if we want to avoid creating a new one.






[GitHub] [lucene-solr] jpountz commented on a change in pull request #1473: LUCENE-9353: Move terms metadata to its own file.

2020-05-04 Thread GitBox


jpountz commented on a change in pull request #1473:
URL: https://github.com/apache/lucene-solr/pull/1473#discussion_r419694045



##
File path: 
lucene/core/src/java/org/apache/lucene/codecs/blocktree/BlockTreeTermsWriter.java
##
@@ -1060,36 +1052,35 @@ public void close() throws IOException {
   return;
 }
 closed = true;
-
+
+final String metaName = 
IndexFileNames.segmentFileName(state.segmentInfo.name, state.segmentSuffix, 
BlockTreeTermsReader.TERMS_META_EXTENSION);
 boolean success = false;
-try {
-  
-  final long dirStart = termsOut.getFilePointer();
-  final long indexDirStart = indexOut.getFilePointer();
+try (IndexOutput metaOut = state.directory.createOutput(metaName, 
state.context)) {

Review comment:
   That would work too. I like keeping the index output open for as little 
time as possible when it doesn't make things worse otherwise.








[jira] [Commented] (LUCENE-9357) AssertingSorted(Set|Numeric)DocValues should be unwrappable

2020-05-04 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099311#comment-17099311
 ] 

Adrien Grand commented on LUCENE-9357:
--

+1 to this proposal

> AssertingSorted(Set|Numeric)DocValues should be unwrappable
> ---
>
> Key: LUCENE-9357
> URL: https://issues.apache.org/jira/browse/LUCENE-9357
> Project: Lucene - Core
>  Issue Type: Sub-task
>  Components: modules/test-framework
>Reporter: Mikhail Khludnev
>Priority: Minor
>
> # Obviously, singular docValues might mimic multivalued ones via 
> {{DocValues.singleton()}}. However, some algorithms prefer to 
> {{DocValues.unwrap()}} them if possible. 
> # AssertingDocValues blocks this unwrapping, slightly changing the codepath 
> for singular DVs.
> h3. AS IS
> {{AssertingDV -> Singleton -> SingularDV}}
> h3. TODO
> {{Singleton -> AssertingDV -> SingularDV}}
> I think it's trivial, worthwhile and 0% risk. Are there any concerns?   
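
The AS IS / TODO wrapping orders above can be sketched with plain stand-in 
classes (all names here are illustrative toys, not Lucene's real types): 
unwrapping only succeeds when the singleton wrapper is the outermost layer.

```java
// Toy model of the wrapper-order problem: unwrap() recognizes only an
// outermost Singleton wrapper, mirroring the shape of DocValues.unwrapSingleton.
class WrapOrderDemo {
  interface Values {}
  static class SingularDV implements Values {}   // stand-in for singular doc values
  static class Singleton implements Values {     // stand-in for the singleton wrapper
    final Values in;
    Singleton(Values in) { this.in = in; }
  }
  static class AssertingDV implements Values {   // stand-in for the asserting wrapper
    final Values in;
    AssertingDV(Values in) { this.in = in; }
  }

  static Values unwrap(Values v) {
    return (v instanceof Singleton) ? ((Singleton) v).in : null;
  }

  public static void main(String[] args) {
    Values singular = new SingularDV();
    // AS IS: AssertingDV -> Singleton -> SingularDV -- unwrap fails
    System.out.println(unwrap(new AssertingDV(new Singleton(singular))));         // null
    // TODO: Singleton -> AssertingDV -> SingularDV -- unwrap succeeds
    System.out.println(unwrap(new Singleton(new AssertingDV(singular))) != null); // true
  }
}
```

Swapping the wrapping order restores unwrappability without weakening the 
assertions, which is why the change reads as low risk.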






[GitHub] [lucene-solr] mkhludnev commented on a change in pull request #1462: LUCENE-9328: open group.sort docvalues once per segment.

2020-05-04 Thread GitBox


mkhludnev commented on a change in pull request #1462:
URL: https://github.com/apache/lucene-solr/pull/1462#discussion_r419706467



##
File path: 
lucene/grouping/src/test/org/apache/lucene/search/grouping/DocValuesPoolingReaderTest.java
##
@@ -0,0 +1,150 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.search.grouping;
+
+import java.io.IOException;
+
+import org.apache.lucene.analysis.MockAnalyzer;
+import org.apache.lucene.document.BinaryDocValuesField;
+import org.apache.lucene.document.Document;
+import org.apache.lucene.document.NumericDocValuesField;
+import org.apache.lucene.document.SortedDocValuesField;
+import org.apache.lucene.document.SortedNumericDocValuesField;
+import org.apache.lucene.document.SortedSetDocValuesField;
+import org.apache.lucene.index.BinaryDocValues;
+import org.apache.lucene.index.DirectoryReader;
+import org.apache.lucene.index.LeafReader;
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.index.NumericDocValues;
+import org.apache.lucene.index.RandomIndexWriter;
+import org.apache.lucene.index.SortedDocValues;
+import org.apache.lucene.index.SortedNumericDocValues;
+import org.apache.lucene.index.SortedSetDocValues;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.LuceneTestCase;
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+
+public class DocValuesPoolingReaderTest extends LuceneTestCase {
+  
+  private static RandomIndexWriter w;
+  private static Directory dir;
+  private static DirectoryReader reader;
+
+  @BeforeClass
+  public static void index() throws IOException {
+dir = newDirectory();
+w = new RandomIndexWriter(
+random(),
+dir,
+newIndexWriterConfig(new 
MockAnalyzer(random())).setMergePolicy(newLogMergePolicy()));
+Document doc = new Document();
+doc.add(new BinaryDocValuesField("bin", new BytesRef("binary")));
+doc.add(new BinaryDocValuesField("bin2", new BytesRef("binary2")));
+
+doc.add(new NumericDocValuesField("num", 1L));
+doc.add(new NumericDocValuesField("num2", 2L));
+
+doc.add(new SortedNumericDocValuesField("sortnum", 3L));
+doc.add(new SortedNumericDocValuesField("sortnum2", 4L));
+
+doc.add(new SortedDocValuesField("sort",  new BytesRef("sorted")));
+doc.add(new SortedDocValuesField("sort2",  new BytesRef("sorted2")));
+
+doc.add(new SortedSetDocValuesField("sortset", new BytesRef("sortedset")));
+doc.add(new SortedSetDocValuesField("sortset2", new 
BytesRef("sortedset2")));
+
+w.addDocument(doc);
+w.commit();
+reader = w.getReader();
+w.close();
+  }
+  
+  public void testDVCache() throws IOException {
+assertFalse(reader.leaves().isEmpty());
+for (LeafReaderContext leaf : reader.leaves()) {
+  final DocValuesPoolingReader caching = new 
DocValuesPoolingReader(leaf.reader());
+  
+  assertSame(assertBinaryDV(caching, "bin", "binary"), 
+  caching.getBinaryDocValues("bin"));
+  assertSame(assertBinaryDV(caching, "bin2", "binary2"), 
+  caching.getBinaryDocValues("bin2"));
+  
+  assertSame(assertNumericDV(caching, "num", 1L), 
+  caching.getNumericDocValues("num"));
+  assertSame(assertNumericDV(caching, "num2", 2L), 
+  caching.getNumericDocValues("num2"));
+  
+  assertSame(assertSortedNumericDV(caching, "sortnum", 3L), 
+  caching.getSortedNumericDocValues("sortnum"));
+  assertSame(assertSortedNumericDV(caching, "sortnum2", 4L), 
+  caching.getSortedNumericDocValues("sortnum2"));
+  
+  assertSame(assertSortedDV(caching, "sort", "sorted"), 
+  caching.getSortedDocValues("sort"));
+  assertSame(assertSortedDV(caching, "sort2", "sorted2"), 
+  caching.getSortedDocValues("sort2"));

Review comment:
   Hi, @romseygeek , thanks for feedback. I agree to add more tests. I 
suppose such functionality is not necessary. Docs arrive into grouping 
collector in-order. Thus, different group heads might read vals one by one also 
in-order. When it needs to jump back and read cached values? 
   Currently, test 

[GitHub] [lucene-solr] mkhludnev commented on a change in pull request #1462: LUCENE-9328: open group.sort docvalues once per segment.

2020-05-04 Thread GitBox


mkhludnev commented on a change in pull request #1462:
URL: https://github.com/apache/lucene-solr/pull/1462#discussion_r419713904



##
File path: 
lucene/grouping/src/java/org/apache/lucene/search/grouping/DocValuesPoolingReader.java
##
@@ -0,0 +1,175 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.search.grouping;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.Map;
+
+import org.apache.lucene.index.BinaryDocValues;
+import org.apache.lucene.index.DocValues;
+import org.apache.lucene.index.FilterLeafReader;
+import org.apache.lucene.index.LeafReader;
+import org.apache.lucene.index.NumericDocValues;
+import org.apache.lucene.index.SortedDocValues;
+import org.apache.lucene.index.SortedNumericDocValues;
+import org.apache.lucene.index.SortedSetDocValues;
+import org.apache.lucene.index.TermsEnum;
+import org.apache.lucene.search.DocIdSetIterator;
+import org.apache.lucene.util.BytesRef;
+
+/**
+ * Caches docValues for the given {@linkplain LeafReader}.
+ * It is only necessary when a consumer retrieves the same docValues many
+ * times per segment. Returned docValues should be iterated forward only.
+ * Caveat: {@link #getContext()} is completely misleading for this class since
+ * it loses baseDoc and ord from the underlying context.
+ * @lucene.experimental
+ */
+class DocValuesPoolingReader extends FilterLeafReader {
+
+  @FunctionalInterface
+  interface DVSupplier<T extends DocIdSetIterator> {
+    T getDocValues(String field) throws IOException;
+  }
+
+  private Map<String, DocIdSetIterator> cache = new HashMap<>();
+
+  DocValuesPoolingReader(LeafReader in) {
+    super(in);
+  }
+
+  @SuppressWarnings("unchecked")
+  protected <T extends DocIdSetIterator> T computeIfAbsent(String field, DVSupplier<T> supplier) throws IOException {
+    T dv;
+    if ((dv = (T) cache.get(field)) == null) {
+      dv = supplier.getDocValues(field);
+      cache.put(field, dv);
+    }
+    return dv;
+  }
+
+  @Override
+  public CacheHelper getReaderCacheHelper() {
+    return null;
+  }
+
+  @Override
+  public CacheHelper getCoreCacheHelper() {
+    return null;
+  }
+
+  @Override
+  public BinaryDocValues getBinaryDocValues(String field) throws IOException {
+    return computeIfAbsent(field, in::getBinaryDocValues);
+  }
+
+  @Override
+  public NumericDocValues getNumericDocValues(String field) throws IOException {
+    return computeIfAbsent(field, in::getNumericDocValues);
+  }
+
+  @Override
+  public SortedNumericDocValues getSortedNumericDocValues(String field) throws IOException {
+    return computeIfAbsent(field, in::getSortedNumericDocValues);
+  }
+
+  public SortedDocValues getSortedDocValues(String field) throws IOException {
+    return computeIfAbsent(field, in::getSortedDocValues);
+  }
+
+  @Override
+  public SortedSetDocValues getSortedSetDocValues(String field) throws IOException {
+    return computeIfAbsent(field, field1 -> {
+      final SortedSetDocValues sortedSet = in.getSortedSetDocValues(field1);
+      final SortedDocValues singleton = DocValues.unwrapSingleton(sortedSet);

Review comment:
   To get past the default strict singleton wrapper. If I comment it out, I get
   ```
   java.lang.IllegalStateException: iterator has already been used: docID=0
   at __randomizedtesting.SeedInfo.seed([2825543CB6DDF1D2:83DF4929690177FC]:0)
   at org.apache.lucene.index.SingletonSortedSetDocValues.getSortedDocValues(SingletonSortedSetDocValues.java:45)
   at org.apache.lucene.index.DocValues.unwrapSingleton(DocValues.java:280)
   at org.apache.lucene.search.SortedSetSelector.wrap(SortedSetSelector.java:74)
   at org.apache.lucene.search.SortedSetSortField$1.getSortedDocValues(SortedSetSortField.java:125)
   at org.apache.lucene.search.FieldComparator$TermOrdValComparator.getLeafComparator(FieldComparator.java:714)
   at org.apache.lucene.search.grouping.AllGroupHeadsCollector$SortingGroupHead.<init>(AllGroupHeadsCollector.java:276)
   ```
   While the second group head obtains the DV, SortedSetSelector unwraps the 
SetDV, and then the Singleton complains about an already-moved iterator. 






[GitHub] [lucene-solr] mkhludnev commented on a change in pull request #1462: LUCENE-9328: open group.sort docvalues once per segment.

2020-05-04 Thread GitBox


mkhludnev commented on a change in pull request #1462:
URL: https://github.com/apache/lucene-solr/pull/1462#discussion_r419715037



##
File path: 
lucene/grouping/src/java/org/apache/lucene/search/grouping/DocValuesPoolingReader.java
##
@@ -0,0 +1,175 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.search.grouping;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.Map;
+
+import org.apache.lucene.index.BinaryDocValues;
+import org.apache.lucene.index.DocValues;
+import org.apache.lucene.index.FilterLeafReader;
+import org.apache.lucene.index.LeafReader;
+import org.apache.lucene.index.NumericDocValues;
+import org.apache.lucene.index.SortedDocValues;
+import org.apache.lucene.index.SortedNumericDocValues;
+import org.apache.lucene.index.SortedSetDocValues;
+import org.apache.lucene.index.TermsEnum;
+import org.apache.lucene.search.DocIdSetIterator;
+import org.apache.lucene.util.BytesRef;
+
+/**
+ * Caches docValues for the given {@linkplain LeafReader}.
+ * It is only necessary when a consumer retrieves the same docValues many
+ * times per segment. Returned docValues should be iterated forward only.
+ * Caveat: {@link #getContext()} is completely misleading for this class since
+ * it loses baseDoc and ord from the underlying context.
+ * @lucene.experimental
+ */
+class DocValuesPoolingReader extends FilterLeafReader {
+
+  @FunctionalInterface
+  interface DVSupplier<T extends DocIdSetIterator> {
+    T getDocValues(String field) throws IOException;
+  }
+
+  private Map<String, DocIdSetIterator> cache = new HashMap<>();
+
+  DocValuesPoolingReader(LeafReader in) {
+    super(in);
+  }
+
+  @SuppressWarnings("unchecked")
+  protected <T extends DocIdSetIterator> T computeIfAbsent(String field, DVSupplier<T> supplier) throws IOException {
+    T dv;
+    if ((dv = (T) cache.get(field)) == null) {
+      dv = supplier.getDocValues(field);
+      cache.put(field, dv);
+    }
+    return dv;
+  }
+
+  @Override
+  public CacheHelper getReaderCacheHelper() {
+    return null;
+  }
+
+  @Override
+  public CacheHelper getCoreCacheHelper() {
+    return null;
+  }
+
+  @Override
+  public BinaryDocValues getBinaryDocValues(String field) throws IOException {
+    return computeIfAbsent(field, in::getBinaryDocValues);
+  }
+
+  @Override
+  public NumericDocValues getNumericDocValues(String field) throws IOException {
+    return computeIfAbsent(field, in::getNumericDocValues);
+  }
+
+  @Override
+  public SortedNumericDocValues getSortedNumericDocValues(String field) throws IOException {
+    return computeIfAbsent(field, in::getSortedNumericDocValues);
+  }
+
+  public SortedDocValues getSortedDocValues(String field) throws IOException {
+    return computeIfAbsent(field, in::getSortedDocValues);
+  }
+
+  @Override
+  public SortedSetDocValues getSortedSetDocValues(String field) throws IOException {
+    return computeIfAbsent(field, field1 -> {
+      final SortedSetDocValues sortedSet = in.getSortedSetDocValues(field1);
+      final SortedDocValues singleton = DocValues.unwrapSingleton(sortedSet);

Review comment:
   To get past the default strict singleton wrapper. If I comment it out, I get
   ```
   java.lang.IllegalStateException: iterator has already been used: docID=0
   at __randomizedtesting.SeedInfo.seed([2825543CB6DDF1D2:83DF4929690177FC]:0)
   at org.apache.lucene.index.SingletonSortedSetDocValues.getSortedDocValues(SingletonSortedSetDocValues.java:45)
   at org.apache.lucene.index.DocValues.unwrapSingleton(DocValues.java:280)
   at org.apache.lucene.search.SortedSetSelector.wrap(SortedSetSelector.java:74)
   at org.apache.lucene.search.SortedSetSortField$1.getSortedDocValues(SortedSetSortField.java:125)
   at org.apache.lucene.search.FieldComparator$TermOrdValComparator.getLeafComparator(FieldComparator.java:714)
   at org.apache.lucene.search.grouping.AllGroupHeadsCollector$SortingGroupHead.<init>(AllGroupHeadsCollector.java:276)
   ```
   
   While the second group head obtains the DV, SortedSetSelector unwraps the 
SetDV, and then the Singleton complains about an already-moved iterator.
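
The per-field pooling under review can be sketched without any Lucene types 
(the `FieldPool` name and shape are illustrative; the real class keys on the 
various DocValues kinds): a per-field map hands back the same instance on 
repeated lookups, so each field's values are opened only once per segment.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Minimal sketch of the computeIfAbsent pattern in DocValuesPoolingReader:
// the first lookup for a field creates the value, later lookups reuse it.
class FieldPool<V> {
  private final Map<String, V> cache = new HashMap<>();
  private final Function<String, V> supplier;

  FieldPool(Function<String, V> supplier) {
    this.supplier = supplier;
  }

  V get(String field) {
    V v = cache.get(field);
    if (v == null) {              // open the per-field value only once
      v = supplier.apply(field);
      cache.put(field, v);
    }
    return v;
  }

  public static void main(String[] args) {
    int[] opens = new int[1];
    FieldPool<Object> pool = new FieldPool<>(field -> { opens[0]++; return new Object(); });
    boolean same = pool.get("sort") == pool.get("sort"); // second call hits the cache
    System.out.println(same + " opens=" + opens[0]);     // true opens=1
  }
}
```

The trade-off discussed in the thread follows directly: pooled values are 
shared, so every consumer must iterate them forward only.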






[GitHub] [lucene-solr] mkhludnev commented on pull request #1462: LUCENE-9328: open group.sort docvalues once per segment.

2020-05-04 Thread GitBox


mkhludnev commented on pull request #1462:
URL: https://github.com/apache/lucene-solr/pull/1462#issuecomment-623695126


   @romseygeek, the tests in the patch are a placeholder. The key evidence is 
that the existing grouping tests keep passing. 






[jira] [Updated] (LUCENE-9328) SortingGroupHead to reuse DocValues

2020-05-04 Thread Mikhail Khludnev (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated LUCENE-9328:
-
Attachment: LUCENE-9328.patch
Status: Patch Available  (was: Patch Available)

Attaching a proper git patch.

> SortingGroupHead to reuse DocValues
> ---
>
> Key: LUCENE-9328
> URL: https://issues.apache.org/jira/browse/LUCENE-9328
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/grouping
>Reporter: Mikhail Khludnev
>Assignee: Mikhail Khludnev
>Priority: Minor
> Attachments: LUCENE-9328.patch, LUCENE-9328.patch, LUCENE-9328.patch, 
> LUCENE-9328.patch, LUCENE-9328.patch, LUCENE-9328.patch
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> That's why 
> https://issues.apache.org/jira/browse/LUCENE-7701?focusedCommentId=17084365&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17084365






[jira] [Updated] (LUCENE-9360) NOT NEEDED. ToParentDocValues uses advanceExact() of underneath DocValues

2020-05-04 Thread Mikhail Khludnev (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated LUCENE-9360:
-
Summary: NOT NEEDED. ToParentDocValues uses advanceExact() of underneath 
DocValues  (was: ToParentDocValues uses advanceExact() of underneath DocValues)

> NOT NEEDED. ToParentDocValues uses advanceExact() of underneath DocValues
> -
>
> Key: LUCENE-9360
> URL: https://issues.apache.org/jira/browse/LUCENE-9360
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Mikhail Khludnev
>Priority: Major
>
> Currently {{ToParentDocValues.advanceExact()}} propagates to 
> {{DocValues.advance()}}, as advised in LUCENE-7871. It causes some problems 
> in LUCENE-9328 and does not seem really reasonable. The latter jira has a 
> patch attached which resolves this. The question is: why (not)?
> cc [~jpountz]






[jira] [Commented] (LUCENE-9360) NOT NEEDED. ToParentDocValues uses advanceExact() of underneath DocValues

2020-05-04 Thread Mikhail Khludnev (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099354#comment-17099354
 ] 

Mikhail Khludnev commented on LUCENE-9360:
--

[~jpountz], I was rushing through test failures. Today the LUCENE-9328 tests 
passed without the change in {{ToParentDocValues}}. I attached the right patch 
to LUCENE-9328, at least I think so.

> NOT NEEDED. ToParentDocValues uses advanceExact() of underneath DocValues
> -
>
> Key: LUCENE-9360
> URL: https://issues.apache.org/jira/browse/LUCENE-9360
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Mikhail Khludnev
>Priority: Major
>
> Currently {{ToParentDocValues.advanceExact()}} propagates to 
> {{DocValues.advance()}}, as advised in LUCENE-7871. It causes some problems 
> in LUCENE-9328 and does not seem really reasonable. The latter jira has a 
> patch attached which resolves this. The question is: why (not)?
> cc [~jpountz]






[GitHub] [lucene-solr] mkhludnev commented on a change in pull request #1462: LUCENE-9328: open group.sort docvalues once per segment.

2020-05-04 Thread GitBox


mkhludnev commented on a change in pull request #1462:
URL: https://github.com/apache/lucene-solr/pull/1462#discussion_r419713904



##
File path: 
lucene/grouping/src/java/org/apache/lucene/search/grouping/DocValuesPoolingReader.java
##
@@ -0,0 +1,175 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.search.grouping;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.Map;
+
+import org.apache.lucene.index.BinaryDocValues;
+import org.apache.lucene.index.DocValues;
+import org.apache.lucene.index.FilterLeafReader;
+import org.apache.lucene.index.LeafReader;
+import org.apache.lucene.index.NumericDocValues;
+import org.apache.lucene.index.SortedDocValues;
+import org.apache.lucene.index.SortedNumericDocValues;
+import org.apache.lucene.index.SortedSetDocValues;
+import org.apache.lucene.index.TermsEnum;
+import org.apache.lucene.search.DocIdSetIterator;
+import org.apache.lucene.util.BytesRef;
+
+/**
+ * Caches docValues for the given {@linkplain LeafReader}.
+ * It only necessary when consumer retrieves same docValues many times per
+ * segment. Returned docValues should be iterated forward only.
+ * Caveat: {@link #getContext()} is completely misguiding for this class since 
+ * it looses baseDoc, ord from underneath context.
+ * @lucene.experimental   
+ * */
+class DocValuesPoolingReader extends FilterLeafReader {
+
+  @FunctionalInterface
+  interface DVSupplier{
+T getDocValues(String field) throws IOException;
+  } 
+  
+  private Map cache = new HashMap<>();
+
+  DocValuesPoolingReader(LeafReader in) {
+super(in);
+  }
+
+  @SuppressWarnings("unchecked")
+  protected  T computeIfAbsent(String field, 
DVSupplier supplier) throws IOException {
+T dv;
+if ((dv = (T) cache.get(field)) == null) {
+ dv = supplier.getDocValues(field);
+ cache.put(field, dv);
+}
+return dv;
+  }
+
+  @Override
+  public CacheHelper getReaderCacheHelper() {
+return null;
+  }
+
+  @Override
+  public CacheHelper getCoreCacheHelper() {
+return null;
+  }
+
+  @Override
+  public BinaryDocValues getBinaryDocValues(String field) throws IOException {
+return computeIfAbsent(field, in::getBinaryDocValues);
+  }
+  
+  @Override
+  public NumericDocValues getNumericDocValues(String field) throws IOException {
+    return computeIfAbsent(field, in::getNumericDocValues);
+  }
+
+  @Override
+  public SortedNumericDocValues getSortedNumericDocValues(String field) throws IOException {
+    return computeIfAbsent(field, in::getSortedNumericDocValues);
+  }
+
+  @Override
+  public SortedDocValues getSortedDocValues(String field) throws IOException {
+    return computeIfAbsent(field, in::getSortedDocValues);
+  }
+  
+  @Override
+  public SortedSetDocValues getSortedSetDocValues(String field) throws IOException {
+    return computeIfAbsent(field, field1 -> {
+      final SortedSetDocValues sortedSet = in.getSortedSetDocValues(field1);
+      final SortedDocValues singleton = DocValues.unwrapSingleton(sortedSet);

Review comment:
   This is needed to satisfy the strict singleton wrapper that is applied by 
default. If I comment it out I get:
   `java.lang.IllegalStateException: iterator has already been used: docID=0
at 
__randomizedtesting.SeedInfo.seed([2825543CB6DDF1D2:83DF4929690177FC]:0)
at 
org.apache.lucene.index.SingletonSortedSetDocValues.getSortedDocValues(SingletonSortedSetDocValues.java:45)
at org.apache.lucene.index.DocValues.unwrapSingleton(DocValues.java:280)
at 
org.apache.lucene.search.SortedSetSelector.wrap(SortedSetSelector.java:74)
at 
org.apache.lucene.search.SortedSetSortField$1.getSortedDocValues(SortedSetSortField.java:125)
at 
org.apache.lucene.search.FieldComparator$TermOrdValComparator.getLeafComparator(FieldComparator.java:714)
at 
org.apache.lucene.search.grouping.AllGroupHeadsCollector$SortingGroupHead.<init>(AllGroupHeadsCollector.java:276)`
   When the second group head obtains the docValues, SortedSetSelector unwraps 
the SortedSetDocValues, and the singleton wrapper then complains that the 
iterator has already been advanced.
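The per-field caching idiom that `computeIfAbsent` in the patch above relies on can be sketched in isolation. This is a hypothetical, self-contained simplification (the class and supplier names are made up; the real code caches Lucene docValues iterators, which must then only be walked forward):

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for DocValuesPoolingReader's caching: one value is
// created per field name, then shared by every subsequent caller.
class FieldValuePool {

    @FunctionalInterface
    interface Supplier<T> {
        T get(String field) throws IOException;
    }

    private final Map<String, Object> cache = new HashMap<>();

    @SuppressWarnings("unchecked")
    <T> T computeIfAbsent(String field, Supplier<T> supplier) throws IOException {
        T value = (T) cache.get(field);
        if (value == null) {
            value = supplier.get(field);  // invoked only on the first request
            cache.put(field, value);      // later requests reuse this instance
        }
        return value;
    }
}
```

Because all callers share one instance per field, a second consumer sees an iterator that may already have been advanced, which is exactly the forward-only caveat discussed in the review comment.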





This is an automated message from the Apache Git Service.
To respond to the message, please log

[jira] [Commented] (SOLR-14014) Allow Solr to start with Admin UI disabled

2020-05-04 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099374#comment-17099374
 ] 

ASF subversion and git services commented on SOLR-14014:


Commit 6f775bfa69db5b2488ac3070e1da657919c816b9 in lucene-solr's branch 
refs/heads/master from Marcus
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=6f775bf ]

SOLR-14014 Allow disabling AdminUI at launch (#1471)



> Allow Solr to start with Admin UI disabled
> --
>
> Key: SOLR-14014
> URL: https://issues.apache.org/jira/browse/SOLR-14014
> Project: Solr
>  Issue Type: Improvement
>  Components: Admin UI, security
>Affects Versions: master (9.0), 8.3.1
>Reporter: Jason Gerlowski
>Priority: Major
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Currently Solr always runs the Admin UI. With the history of XSS issues and 
> other security concerns that have been found in the Admin UI, Solr should 
> offer a mode where the Admin UI is disabled. Maybe, and this is a topic 
> that'll need some serious discussion, this should even be the default when 
> Solr starts.
> NOTE: Disabling the Admin UI removes XSS and other attack vectors. But even 
> with the Admin UI disabled, Solr will still be inherently unsafe without 
> firewall protection on a public network.
> *Proposed design:*
> A java system property called *headless* will be used as an internal flag for 
> starting Solr in headless mode. This property will default to true. A java 
> property can be used at startup to set this flag to false.
> Here is an example:
> {code:java}
>  bin/solr start -Dheadless=false {code}
> A message will be added following startup describing the mode.
> In headless mode the following message will be displayed:
> "solr is running in headless mode. The admin console is unavailable. To 
> turn off headless mode and allow the admin console, use the following 
> startup parameter:
> -Dheadless=false 
>   
> In non-headless mode the following message will be displayed:
> "solr is running with headless mode turned off. The admin console is 
> available in this mode. Disabling the Admin UI removes XSS and other attack 
> vectors"  
> If a user attempts to access the admin console while Solr is in headless 
> mode, Solr will return 401 unauthorized.
>  
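As a hedged sketch of the proposed design above (hypothetical class and method names, not the actual SOLR-14014 patch, which wires the flag into the servlet setup), the property handling amounts to:

```java
// Hypothetical sketch of the proposed "headless" flag (names are invented;
// the real patch integrates this with Solr's Jetty configuration).
class HeadlessFlag {

    // Per the proposed design the property defaults to true; starting Solr
    // with -Dheadless=false re-enables the Admin UI.
    static boolean isHeadless() {
        return Boolean.parseBoolean(System.getProperty("headless", "true"));
    }

    // Startup message describing the current mode, as proposed above.
    static String startupMessage() {
        return isHeadless()
            ? "solr is running in headless mode. The admin console is unavailable."
            : "solr is running with headless mode turned off. The admin console is available in this mode.";
    }
}
```

Note that `Boolean.parseBoolean` treats any value other than (case-insensitive) "true" as false, so a misspelled `-Dheadless=flase` would silently keep the UI enabled under this sketch.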



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (SOLR-14014) Allow Solr to start with Admin UI disabled

2020-05-04 Thread Mike Drob (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Drob resolved SOLR-14014.
--
Fix Version/s: master (9.0)
   Resolution: Fixed

Thanks for taking this on, [~marcussorealheis]! Merged into master.

> Allow Solr to start with Admin UI disabled
> --
>
> Key: SOLR-14014
> URL: https://issues.apache.org/jira/browse/SOLR-14014
> Project: Solr
>  Issue Type: Improvement
>  Components: Admin UI, security
>Affects Versions: master (9.0), 8.3.1
>Reporter: Jason Gerlowski
>Assignee: Mike Drob
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Currently Solr always runs the Admin UI. With the history of XSS issues and 
> other security concerns that have been found in the Admin UI, Solr should 
> offer a mode where the Admin UI is disabled. Maybe, and this is a topic 
> that'll need some serious discussion, this should even be the default when 
> Solr starts.
> NOTE: Disabling the Admin UI removes XSS and other attack vectors. But even 
> with the Admin UI disabled, Solr will still be inherently unsafe without 
> firewall protection on a public network.
> *Proposed design:*
> A java system property called *headless* will be used as an internal flag for 
> starting Solr in headless mode. This property will default to true. A java 
> property can be used at startup to set this flag to false.
> Here is an example:
> {code:java}
>  bin/solr start -Dheadless=false {code}
> A message will be added following startup describing the mode.
> In headless mode the following message will be displayed:
> "solr is running in headless mode. The admin console is unavailable. To 
> turn off headless mode and allow the admin console, use the following 
> startup parameter:
> -Dheadless=false 
>   
> In non-headless mode the following message will be displayed:
> "solr is running with headless mode turned off. The admin console is 
> available in this mode. Disabling the Admin UI removes XSS and other attack 
> vectors"  
> If a user attempts to access the admin console while Solr is in headless 
> mode, Solr will return 401 unauthorized.
>  






[jira] [Assigned] (SOLR-14014) Allow Solr to start with Admin UI disabled

2020-05-04 Thread Mike Drob (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Drob reassigned SOLR-14014:


Assignee: Mike Drob

> Allow Solr to start with Admin UI disabled
> --
>
> Key: SOLR-14014
> URL: https://issues.apache.org/jira/browse/SOLR-14014
> Project: Solr
>  Issue Type: Improvement
>  Components: Admin UI, security
>Affects Versions: master (9.0), 8.3.1
>Reporter: Jason Gerlowski
>Assignee: Mike Drob
>Priority: Major
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Currently Solr always runs the Admin UI. With the history of XSS issues and 
> other security concerns that have been found in the Admin UI, Solr should 
> offer a mode where the Admin UI is disabled. Maybe, and this is a topic 
> that'll need some serious discussion, this should even be the default when 
> Solr starts.
> NOTE: Disabling the Admin UI removes XSS and other attack vectors. But even 
> with the Admin UI disabled, Solr will still be inherently unsafe without 
> firewall protection on a public network.
> *Proposed design:*
> A java system property called *headless* will be used as an internal flag for 
> starting Solr in headless mode. This property will default to true. A java 
> property can be used at startup to set this flag to false.
> Here is an example:
> {code:java}
>  bin/solr start -Dheadless=false {code}
> A message will be added following startup describing the mode.
> In headless mode the following message will be displayed:
> "solr is running in headless mode. The admin console is unavailable. To 
> turn off headless mode and allow the admin console, use the following 
> startup parameter:
> -Dheadless=false 
>   
> In non-headless mode the following message will be displayed:
> "solr is running with headless mode turned off. The admin console is 
> available in this mode. Disabling the Admin UI removes XSS and other attack 
> vectors"  
> If a user attempts to access the admin console while Solr is in headless 
> mode, Solr will return 401 unauthorized.
>  






[GitHub] [lucene-solr] tflobbe commented on a change in pull request #1456: SOLR-13289: Support for BlockMax WAND

2020-05-04 Thread GitBox


tflobbe commented on a change in pull request #1456:
URL: https://github.com/apache/lucene-solr/pull/1456#discussion_r419791575



##
File path: solr/core/src/test/org/apache/solr/request/TestFaceting.java
##
@@ -931,5 +934,28 @@ public void testListedTermCounts() throws Exception {
 
"//lst[@name='facet_fields']/lst[@name='title_ws']/int[2][@name='Book2']",
 
"//lst[@name='facet_fields']/lst[@name='title_ws']/int[3][@name='Book3']");
   }
+  
+  @Test
+  public void testFacetCountsWithMinExactHits() throws Exception {
+final int NUM_DOCS = 20;
+for (int i = 0; i < NUM_DOCS ; i++) {
+  assertU(adoc("id", String.valueOf(i), "title_ws", "Book1"));
+  assertU(commit());

Review comment:
   Actually, I wanted to have multiple segments. I could do something like 
"sometimes()", but since the number of docs is low, I didn't think it was 
necessary to add any randomization or more complex logic.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] tflobbe commented on a change in pull request #1456: SOLR-13289: Support for BlockMax WAND

2020-05-04 Thread GitBox


tflobbe commented on a change in pull request #1456:
URL: https://github.com/apache/lucene-solr/pull/1456#discussion_r419791807



##
File path: solr/core/src/test/org/apache/solr/search/SolrIndexSearcherTest.java
##
@@ -0,0 +1,200 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.search;
+
+import org.apache.lucene.index.Term;
+import org.apache.lucene.search.TermQuery;
+import org.apache.solr.SolrTestCaseJ4;
+import org.apache.solr.common.HitCountRelation;
+import org.junit.Before;
+import org.junit.BeforeClass;
+
+import java.io.IOException;
+
+public class SolrIndexSearcherTest extends SolrTestCaseJ4 {
+  
+  private final static int NUM_DOCS = 20;
+
+  @BeforeClass
+  public static void setUpClass() throws Exception {
+initCore("solrconfig.xml", "schema.xml");
+for (int i = 0 ; i < NUM_DOCS ; i ++) {
+  assertU(adoc("id", String.valueOf(i), "field1_s", "foo", "field2_s", 
String.valueOf(i % 2), "field3_s", String.valueOf(i)));
+  assertU(commit());

Review comment:
   Same as above, wanted multiple segments








[jira] [Commented] (SOLR-13132) Improve JSON "terms" facet performance when sorted by relatedness

2020-05-04 Thread Chris M. Hostetter (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099437#comment-17099437
 ] 

Chris M. Hostetter commented on SOLR-13132:
---

[~mgibney] - beefing up the randomized testing of the code paths involved with 
MultiAcc has uncovered 2 bugs. I committed some test changes showing these, 
but they can also be reproduced fairly easily with {{bin/solr -e techproducts}} 
...

What both cases have in common is:
 * limit==-1 to trigger single pass collection
 * 1 or more "non-sweepable" stats are being collected in addition to 
relatedness (so MultiAcc can't be completely optimized away)


Way to trigger the first bug...
{noformat}
curl -sS -X POST http://localhost:8983/solr/techproducts/query -d 
'rows=0&q=inStock:true
&back=*:*  
&fore=popularity:[10 TO *]
&json.facet={
  hobby : {
type : terms,
field : cat,
limit : -1,
facet : {
  min : "min(price)",
  skg : { type : func,
  func : "relatedness($fore,$back)",
  sweep_collection: false,
  }
}  
  }
}'
{noformat}
...if sweeping is explicitly disabled, then the "skg" stat completely drops out 
of the results, probably related to {{MultiAcc}} making some assumptions about 
the {{SweepableAcc}} even when the call to {{foo.registerSweepingAccs(...)}} 
returned a non-null result?
{noformat}
{
  "responseHeader":{
"status":0,
"QTime":88,
"params":{
  "q":"inStock:true\n",
  "json.facet":"{\n  hobby : {\ntype : terms,\nfield : cat,\n
limit : -1,\nfacet : {\n  min : \"min(price)\",\n  skg : { type : 
func,\n  func : \"relatedness($fore,$back)\",\n  
sweep_collection: false,\n  }\n}  \n  }\n}",
  "back":"*:*  \n",
  "rows":"0",
  "fore":"popularity:[10 TO *]\n"}},
  "response":{"numFound":17,"start":0,"docs":[]
  },
  "facets":{
"count":17,
"hobby":{
  "buckets":[{
  "val":"electronics",
  "count":8,
  "min":74.98999786376953},
{
  "val":"currency",
  "count":4},
{
  "val":"memory",
  "count":3,
  "min":74.98999786376953},
{
  "val":"hard drive",
  "count":2,
  "min":92.0},

...
{noformat}

Way to trigger the second bug...
{noformat}
curl -sS -X POST http://localhost:8983/solr/techproducts/query -d 
'rows=0&q=inStock:true
&back=*:*  
&fore=popularity:[10 TO *]
&json.facet={
  hobby : {
type : terms,
field : cat,
limit : -1,
facet : {
  skg : { type : func,
  func : "relatedness($fore,$back)",
  sweep_collection: true,
  },
  max : "max(price)",
  min : "min(price)"
}  
  }
}'
{noformat}
...when there are multiple non-sweeping stats in the MultiAcc, we get an AIOOBE 
(it's possible that the order of the stats in the input matters; I didn't dig 
very deep)...
{noformat}
2020-05-05 00:25:21.371 ERROR (qtp1839168128-22) [   x:techproducts] 
o.a.s.s.HttpSolrCall null:java.lang.ArrayIndexOutOfBoundsException: 
arraycopy: last destination index 3 out of bounds for object array[2]
at java.base/java.lang.System.arraycopy(Native Method)
at org.apache.lucene.util.ArrayUtil.growExact(ArrayUtil.java:221)
at 
org.apache.solr.search.facet.FacetFieldProcessor$MultiAcc.registerSweepingAccs(FacetFieldProcessor.java:777)
at 
org.apache.solr.search.facet.FacetFieldProcessor.registerSweepingAccIfSupportedByCollectAcc(FacetFieldProcessor.java:797)
at 
org.apache.solr.search.facet.FacetFieldProcessorByArrayUIF.collectDocs(FacetFieldProcessorByArrayUIF.java:68)
at 
org.apache.solr.search.facet.FacetFieldProcessorByArray.calcFacets(FacetFieldProcessorByArray.java:112)
at 
org.apache.solr.search.facet.FacetFieldProcessorByArray.process(FacetFieldProcessorByArray.java:62)
at 
org.apache.solr.search.facet.FacetRequest.process(FacetRequest.java:416)
at 
org.apache.solr.search.facet.FacetProcessor.processSubs(FacetProcessor.java:474)
at 
org.apache.solr.search.facet.FacetProcessor.fillBucket(FacetProcessor.java:431)
at 
org.apache.solr.search.facet.FacetQueryProcessor.process(FacetQuery.java:64)
at 
org.apache.solr.search.facet.FacetRequest.process(FacetRequest.java:416)
at 
org.apache.solr.search.facet.FacetModule.process(FacetModule.java:147)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:328)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:209)

{noformat}

> Improve JSON "terms" facet performance when sorted by relatedness 
> --
>
> Key: SOLR-13132
> URL: https://issues.apach

[jira] [Commented] (LUCENE-9328) SortingGroupHead to reuse DocValues

2020-05-04 Thread Lucene/Solr QA (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099444#comment-17099444
 ] 

Lucene/Solr QA commented on LUCENE-9328:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
20s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m 
17s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 58s{color} 
| {color:red} lucene_grouping generated 1 new + 100 unchanged - 0 fixed = 101 
total (was 100) {color} |
| {color:green}+1{color} | {color:green} Release audit (RAT) {color} | 
{color:green}  1m  9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Check forbidden APIs {color} | 
{color:green}  0m 58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Validate source patterns {color} | 
{color:green}  0m 58s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
19s{color} | {color:green} grouping in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
18s{color} | {color:green} join in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
32s{color} | {color:green} test-framework in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 71m 47s{color} 
| {color:red} core in the patch failed. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 83m  5s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | solr.search.grouping.AllGroupHeadsCollectorTest |
|   | solr.TestGroupingSearch |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | LUCENE-9328 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13002026/LUCENE-9328.patch |
| Optional Tests |  compile  javac  unit  ratsources  checkforbiddenapis  
validatesourcepatterns  |
| uname | Linux lucene2-us-west.apache.org 4.4.0-170-generic #199-Ubuntu SMP 
Thu Nov 14 01:45:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-LUCENE-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh
 |
| git revision | master / 6f775bf |
| ant | version: Apache Ant(TM) version 1.9.6 compiled on July 20 2018 |
| Default Java | LTS |
| javac | 
https://builds.apache.org/job/PreCommit-LUCENE-Build/268/artifact/out/diff-compile-javac-lucene_grouping.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-LUCENE-Build/268/artifact/out/patch-unit-solr_core.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-LUCENE-Build/268/testReport/ |
| modules | C: lucene/grouping lucene/join lucene/test-framework solr/core U: . 
|
| Console output | 
https://builds.apache.org/job/PreCommit-LUCENE-Build/268/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.



> SortingGroupHead to reuse DocValues
> ---
>
> Key: LUCENE-9328
> URL: https://issues.apache.org/jira/browse/LUCENE-9328
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/grouping
>Reporter: Mikhail Khludnev
>Assignee: Mikhail Khludnev
>Priority: Minor
> Attachments: LUCENE-9328.patch, LUCENE-9328.patch, LUCENE-9328.patch, 
> LUCENE-9328.patch, LUCENE-9328.patch, LUCENE-9328.patch
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> That's why 
> https://issues.apache.org/jira/browse/LUCENE-7701?focusedCommentId=17084365&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17084365






[jira] [Commented] (SOLR-14453) Solr proximity search highlighting issue

2020-05-04 Thread amit naliyapara (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099549#comment-17099549
 ] 

amit naliyapara commented on SOLR-14453:


Thank you very much for the prompt response. As per your suggestion, we will 
check IntervalQuery and verify the result.

> Solr proximity search highlighting issue
> 
>
> Key: SOLR-14453
> URL: https://issues.apache.org/jira/browse/SOLR-14453
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: highlighter
>Affects Versions: 8.4.1
>Reporter: amit naliyapara
>Priority: Major
> Attachments: Highlighted-response.PNG, Not-Highlighted-response.PNG, 
> managed-schema, solr-doc-Id-1.txt
>
>
> I found a problem in the highlighting module: not all of the search terms are 
> getting highlighted.
> Sample query: q={!complexphrase+inOrder=true}"pos1 (pos2 OR pos3)"~30&hl=true
> Indexed text: "pos1 pos2 pos3 pos4"
> You can see that only two terms are highlighted, like "pos1 
> pos2 pos3 pos4"
> Please find the attached Not-Highlighted-response screenshot for the same.
> The scenario occurs when the term positions are in order in both the document 
> and the query.
> If the term positions are not in order, then it works properly:
> Sample query: q={!complexphrase+inOrder=false}"pos3 (pos1 OR pos2)"~30&hl=true
> You can see that all three terms are highlighted, like "pos1 
> pos2 pos3 pos4"
> Please find the attached Highlighted-response screenshot for the same.
> The behavior has been the same in the Solr source code for a long time (I have 
> checked Solr versions 4 through 7).






[jira] [Commented] (LUCENE-9360) NOT NEEDED. ToParentDocValues uses advanceExact() of underneath DocValues

2020-05-04 Thread Mikhail Khludnev (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099577#comment-17099577
 ] 

Mikhail Khludnev commented on LUCENE-9360:
--

Turns out it's necessary, because 
https://issues.apache.org/jira/secure/attachment/13002026/LUCENE-9328.patch 
fails 
https://builds.apache.org/job/PreCommit-LUCENE-Build/268/artifact/out/patch-unit-solr_core.txt
 
{code}
   [junit4] Suite: org.apache.solr.search.grouping.AllGroupHeadsCollectorTest
   [junit4]   2> 4047854 INFO  
(SUITE-AllGroupHeadsCollectorTest-seed#[A8E569347CC3E5CE]-worker) [ ] 
o.a.s.SolrTestCase Setting 'solr.default.confdir' system property to 
test-framework derived value of 
'/home/jenkins/jenkins-slave/workspace/PreCommit-LUCENE-Build/sourcedir/solr/server/solr/configsets/_default/conf'
   [junit4]   2> NOTE: reproduce with: ant test  
-Dtestcase=AllGroupHeadsCollectorTest -Dtests.method=testBasicBlockJoin 
-Dtests.seed=A8E569347CC3E5CE -Dtests.multiplier=2 -Dtests.slow=true 
-Dtests.locale=pa-Guru-IN -Dtests.timezone=Antarctica/South_Pole 
-Dtests.asserts=true -Dtests.file.encoding=UTF-8
   [junit4] FAILURE 0.12s J0 | AllGroupHeadsCollectorTest.testBasicBlockJoin <<<
   [junit4]> Throwable #1: java.lang.AssertionError
   [junit4]>at 
__randomizedtesting.SeedInfo.seed([A8E569347CC3E5CE:9E22703862079502]:0)
   [junit4]>at 
org.apache.lucene.index.AssertingLeafReader$AssertingSortedDocValues.advance(AssertingLeafReader.java:757)
   [junit4]>at 
org.apache.lucene.search.grouping.DocValuesPoolingReader$1.advance(DocValuesPoolingReader.java:127)
   [junit4]>at 
org.apache.lucene.search.SortedSetSelector$MaxValue.advance(SortedSetSelector.java:186)
   [junit4]>at 
org.apache.lucene.search.ConjunctionDISI.doNext(ConjunctionDISI.java:200)
   [junit4]>at 
org.apache.lucene.search.ConjunctionDISI.advance(ConjunctionDISI.java:230)
   [junit4]>at 
org.apache.lucene.search.join.ToParentDocValues.advanceExact(ToParentDocValues.java:259)
   [junit4]>at 
org.apache.lucene.search.join.ToParentDocValues$SortedDVs.advanceExact(ToParentDocValues.java:84)
   [junit4]>at 
org.apache.lucene.search.FieldComparator$TermOrdValComparator.getOrdForDoc(FieldComparator.java:643)
   [junit4]>at 
org.apache.lucene.search.FieldComparator$TermOrdValComparator.copy(FieldComparator.java:691)
   [junit4]>at 
org.apache.lucene.search.grouping.AllGroupHeadsCollector$SortingGroupHead.<init>(AllGroupHeadsCollector.java:278)
   [junit4]>at 
org.apache.lucene.search.grouping.AllGroupHeadsCollector$SortingGroupHeadsCollector.newGroupHead(AllGroupHeadsCollector.java:260)
   [junit4]>at 
org.apache.lucene.search.grouping.AllGroupHeadsCollector.collect(AllGroupHeadsCollector.java:137)
   [junit4]>at 
org.apache.lucene.search.AssertingLeafCollector.collect(AssertingLeafCollector.java:49)
   [junit4]>at 
org.apache.lucene.search.AssertingCollector$1.collect(AssertingCollector.java:58)
   [junit4]>at 
org.apache.lucene.search.AssertingLeafCollector.collect(AssertingLeafCollector.java:49)
   [junit4]>at 
org.apache.lucene.search.AssertingLeafCollector.collect(AssertingLeafCollector.java:49)
   [junit4]>at 
org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:254)
   [junit4]>at 
org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:205)
   [junit4]>at 
org.apache.lucene.search.AssertingBulkScorer.score(AssertingBulkScorer.java:81)
   [junit4]>at 
org.apache.lucene.search.AssertingBulkScorer.score(AssertingBulkScorer.java:65)
   [junit4]>at 
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:741)
   [junit4]>at 
org.apache.lucene.search.AssertingIndexSearcher.search(AssertingIndexSearcher.java:72)
   [junit4]>at 
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:528)
   [junit4]>at 
org.apache.solr.search.grouping.AllGroupHeadsCollectorTest.testBasicBlockJoin(AllGroupHeadsCollectorTest.java:150)
{code}
I'll check whether the earlier fix with {{advanceExact()}} can heal it. 
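The forward-only contract that the failing assertion above enforces can be illustrated with a toy iterator. This is a deliberate, hypothetical simplification, not Lucene's actual DocIdSetIterator/docValues API: `advance()` may never be asked to move backwards, which is exactly what happens when `advanceExact()` is naively delegated to `advance()` for a target the iterator has already reached.

```java
// Toy forward-only iterator (hypothetical; drastically simplified from
// Lucene's DocIdSetIterator contract discussed in this thread).
class ToyIterator {
    private final int[] docs;  // sorted doc IDs that carry a value
    private int idx = -1;
    private int docID = -1;

    ToyIterator(int... docs) { this.docs = docs; }

    int docID() { return docID; }

    // advance() may only move forward; asking for a target at or before the
    // current docID trips the same kind of assertion seen in the test failure.
    int advance(int target) {
        if (target <= docID) {
            throw new IllegalStateException(
                "backwards advance to " + target + " from " + docID);
        }
        while (++idx < docs.length) {
            if (docs[idx] >= target) {
                return docID = docs[idx];
            }
        }
        return docID = Integer.MAX_VALUE; // stands in for NO_MORE_DOCS
    }

    // advanceExact() implemented by delegating to advance(): only safe while
    // every caller also moves strictly forward, which is the point at issue.
    boolean advanceExact(int target) {
        return advance(target) == target;
    }
}
```

Under this sketch, a second consumer calling `advanceExact()` for a docID the first consumer already passed triggers the "backwards advance" failure, mirroring the AssertingLeafReader stack trace above.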

> NOT NEEDED. ToParentDocValues uses advanceExact() of underneath DocValues
> -
>
> Key: LUCENE-9360
> URL: https://issues.apache.org/jira/browse/LUCENE-9360
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Mikhail Khludnev
>Priority: Major
>
> Currently {{ToParentDocValues.advanceExact()}} propagates to 
> {{DocValues.advance()}} as advised in LUCENE-7871. It causes some problems in 
> LUCENE-9328 and does not seem really reasonable. The latter jira has a patch 
> attached which resolves this. Th

[jira] [Updated] (LUCENE-9360) might be NEEDED. ToParentDocValues uses advanceExact() of underneath DocValues

2020-05-04 Thread Mikhail Khludnev (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated LUCENE-9360:
-
Summary: might be NEEDED. ToParentDocValues uses advanceExact() of 
underneath DocValues  (was: NOT NEEDED. ToParentDocValues uses advanceExact() 
of underneath DocValues)

> might be NEEDED. ToParentDocValues uses advanceExact() of underneath DocValues
> --
>
> Key: LUCENE-9360
> URL: https://issues.apache.org/jira/browse/LUCENE-9360
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Mikhail Khludnev
>Priority: Major
>
> Currently {{ToParentDocValues.advanceExact()}} propagates to 
> {{DocValues.advance()}} as advised in LUCENE-7871. It causes some problems in 
> LUCENE-9328 and does not seem really reasonable. The latter jira has a patch 
> attached which resolves this. The question is: why (not)?
> cc [~jpountz]


