[jira] [Created] (SOLR-14890) Refactor code to use annotations for cluster API

2020-09-23 Thread Noble Paul (Jira)
Noble Paul created SOLR-14890:
-

 Summary: Refactor code to use annotations for cluster API
 Key: SOLR-14890
 URL: https://issues.apache.org/jira/browse/SOLR-14890
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Noble Paul
Assignee: Noble Paul









[GitHub] [lucene-solr] arafalov commented on pull request #1863: SOLR-14701: GuessSchemaFields URP to replace AddSchemaFields URP in schemaless mode

2020-09-23 Thread GitBox


arafalov commented on pull request #1863:
URL: https://github.com/apache/lucene-solr/pull/1863#issuecomment-697179182


   Strong words there, "worse than useless", especially considering that this - 
to me - seems a strong improvement on the current schemaless mode, as it looks 
at more values and actually supports single/multivalued fields.
   
   In general, I was trying to implement Hoss's proposal, but I am open to 
other ideas, if we can clarify the use case.
   
   My understanding is that the use case is having a lot of data whose shape 
one does not quite know. So, they want to index it quickly, explore, and then 
make some manual adjustments. I am not expecting this to be anywhere near 
production. Schemaless mode should not have been either.
   
   I am not sure how many people will know how to do step 6, but currently they 
don't even have that option. Switching from single-valued to multi-valued is 
impossible (very hard?) once the actual values are in the index. One basically 
has to delete everything and start again, as happens in the films example if 
one misses the README. With this one, they can look at the field definitions in 
the Admin UI and remove or add fields as required without the underlying Lucene 
indexes throwing complaints.
   
   The way I am seeing this (as well as the other examples) is to have a super 
minimal learning configuration where every additional field is quite obvious. 
That learning schema, clearly, would not need step 2 as it would be all set up. 
I thought your question was about how you would test the code for yourself.
   
   Additionally, to help see what was changed, I think tagging the JIRA could 
be helpful. And frankly, in my imagination, it is not a cloud setup but a 
simple learning one. Whether that, by itself, is a breaking point for you, we 
shall have to see.
   
   Generating schema JSON raises its own questions, such as the shape of the 
schema it will be applied to, since guessing currently happens as a 
differential against the existing schema. Also, this does not seem like code 
that should live in this particular URP; it is more of a general utility. If 
one existed, maybe it would make sense to build on top of it.
   
   In general, I am open to implementing it any way that seems most useful. I 
will wait for another couple of opinions rather than chasing one very strong 
one.
   
   






[GitHub] [lucene-solr] s1monw commented on pull request #1909: LUCENE-9539: Remove caches from SortingCodecReader

2020-09-23 Thread GitBox


s1monw commented on pull request #1909:
URL: https://github.com/apache/lucene-solr/pull/1909#issuecomment-697184182


   > I think our current DV format pulls doc values for a single field several 
times when flushing/merging, e.g. first to figure out whether the field is 
single-valued and how many bits per value are needed, and a second time to 
actually write data. Should we at least cache the last DVs that got pulled so 
that the second time you pull them, we don't re-do a lot of work?
   
   That's correct; some of them are pulled something like 5 times. I added a 
very simple cache, plus assertions that make sure we can reuse the same 
instance if it's pulled more than once.
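
   For illustration, a minimal sketch of that single-slot cache idea, assuming 
only Lucene's IOSupplier interface; the class and field names below are 
hypothetical, and the actual change is the SortingCodecReader diff quoted 
later in this digest.

import java.io.IOException;
import java.util.Objects;
import org.apache.lucene.util.IOSupplier;

final class LastPulledCache {
  private String cachedField;
  private Object cachedValue;

  // Recompute only when a different field is requested; repeated pulls of
  // the same field reuse the previously created instance.
  synchronized Object getOrCreate(String field, IOSupplier<Object> supplier) throws IOException {
    if (Objects.equals(field, cachedField) == false) {
      cachedValue = supplier.get();
      cachedField = field;
    }
    return cachedValue;
  }
}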






[GitHub] [lucene-solr] arafalov commented on pull request #1863: SOLR-14701: GuessSchemaFields URP to replace AddSchemaFields URP in schemaless mode

2020-09-23 Thread GitBox


arafalov commented on pull request #1863:
URL: https://github.com/apache/lucene-solr/pull/1863#issuecomment-697188648


   Also, I am not even sure there is a pathway to return a non-error message 
from commit that bin/post will echo to the user as a positive statement. For 
queries, yes. But we are talking about an Update handler with a URP chain.






[GitHub] [lucene-solr] noblepaul commented on pull request #1863: SOLR-14701: GuessSchemaFields URP to replace AddSchemaFields URP in schemaless mode

2020-09-23 Thread GitBox


noblepaul commented on pull request #1863:
URL: https://github.com/apache/lucene-solr/pull/1863#issuecomment-697194702


   >Strong words there "worse than useless", especially considering that this - 
to me - seems a strong improvement on the current schemaless mode as it looks 
at more values and actually supports single/multivalued fields.
   
   I was referring to the current solution we have in Solr (the schemaless, 
guess-schema thing). It's not a comment on the new solution. The current 
solution is indeed worse than useless.
   
   >Generating Schema JSON raises its own questions, such as the shape of the 
schema it will be applied to, as guessing is currently happening as a 
differential to the existing schema. 
   
   The command is only relevant for that moment. If you execute it right away, 
it's useful. Users will most likely just copy-paste the command (and edit it, 
if required).






[GitHub] [lucene-solr] noblepaul opened a new pull request #1911: SOLR-14890: Refactor code to use annotations for cluster API

2020-09-23 Thread GitBox


noblepaul opened a new pull request #1911:
URL: https://github.com/apache/lucene-solr/pull/1911


   
   
   
   # Description
   
   Please provide a short description of the changes you're making with this 
pull request.
   
   # Solution
   
   Please provide a short description of the approach taken to implement your 
solution.
   
   # Tests
   
   Please describe the tests you've developed or run to confirm this patch 
implements the feature or solves the problem.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [ ] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   - [ ] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [ ] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [ ] I have developed this patch against the `master` branch.
   - [ ] I have run `./gradlew check`.
   - [ ] I have added tests for my changes.
   - [ ] I have added documentation for the [Ref 
Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) 
(for Solr changes only).
   






[GitHub] [lucene-solr] jpountz commented on a change in pull request #1909: LUCENE-9539: Remove caches from SortingCodecReader

2020-09-23 Thread GitBox


jpountz commented on a change in pull request #1909:
URL: https://github.com/apache/lucene-solr/pull/1909#discussion_r493279172



##
File path: lucene/core/src/java/org/apache/lucene/index/SortingCodecReader.java
##
@@ -510,4 +457,52 @@ public LeafMetaData getMetaData() {
     return metaData;
   }
 
+  // we try to cache the last used DV or Norms instance since during merge
+  // this instance is used more than once. We could, in addition to this single instance,
+  // also cache the fields that are used for sorting since we do the work twice for these fields
+  private String cachedField;
+  private Object cachedObject;
+  private boolean cacheIsNorms;
+
+  private <T> T getOrCreateNorms(String field, IOSupplier<T> supplier) throws IOException {
+    return getOrCreate(field, true, supplier);
+  }
+
+  @SuppressWarnings("unchecked")
+  private synchronized <T> T getOrCreate(String field, boolean norms, IOSupplier<T> supplier) throws IOException {
+    if ((field.equals(cachedField) && cacheIsNorms == norms) == false) {
+      assert assertCreatedOnlyOnce(field, norms);
+      cachedObject = supplier.get();
+      cachedField = field;
+      cacheIsNorms = norms;
+    }
+    assert cachedObject != null;
+    return (T) cachedObject;
+  }
+
+  private final Map<String, Integer> cacheStats = new HashMap<>(); // only with assertions enabled
+  private boolean assertCreatedOnlyOnce(String field, boolean norms) {
+    assert Thread.holdsLock(this);
+    // this is mainly there to make sure that if we change anything in the way we merge, we realize it early
+    Integer timesCached = cacheStats.compute(field + "N:" + norms, (s, i) -> i == null ? 1 : i.intValue() + 1);
+    if (timesCached > 1) {
+      assert norms == false : "[" + field + "] norms must not be cached twice";

Review comment:
   I think we might cache norms twice if full-text is indexed, as we'd pull 
norms once for merging norms, and another time to index impacts in postings for 
the same field.








[GitHub] [lucene-solr] noblepaul closed pull request #1599: SOLR-14586: replace the second function parameter in computeIfAbsent …

2020-09-23 Thread GitBox


noblepaul closed pull request #1599:
URL: https://github.com/apache/lucene-solr/pull/1599


   






[jira] [Created] (SOLR-14891) Upgrade Jetty to 9.4.28+ to fix Startup Warning

2020-09-23 Thread Bernd Wahlen (Jira)
Bernd Wahlen created SOLR-14891:
---

 Summary: Upgrade Jetty to 9.4.28+ to fix Startup Warning
 Key: SOLR-14891
 URL: https://issues.apache.org/jira/browse/SOLR-14891
 Project: Solr
  Issue Type: Wish
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Bernd Wahlen


Solr is currently using Jetty 9.4.27, which displays a strange warning at 
startup. I think it is fixed in 9.4.28:
https://github.com/eclipse/jetty.project/issues/4631

2020-09-23 09:57:57.346 WARN  (main) [   ] o.e.j.x.XmlConfiguration Ignored 
arg:
<Arg>
  <New class="com.codahale.metrics.jetty9.InstrumentedQueuedThreadPool">
    <Arg name="registry">
      <Call class="com.codahale.metrics.SharedMetricRegistries" name="getOrCreate">
        <Arg>solr.jetty</Arg>
      </Call>
    </Arg>
  </New>
</Arg>






[jira] [Closed] (SOLR-14357) solrj: using insecure namedCurves

2020-09-23 Thread Bernd Wahlen (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Wahlen closed SOLR-14357.
---

> solrj: using insecure namedCurves
> -
>
> Key: SOLR-14357
> URL: https://issues.apache.org/jira/browse/SOLR-14357
> Project: Solr
>  Issue Type: Bug
>Reporter: Bernd Wahlen
>Priority: Major
>
> I tried to run our backend with solrj 8.4.1 on JDK 14 and got the 
> following error:
> Caused by: java.lang.IllegalArgumentException: Error in security property. 
> Constraint unknown: c2tnb191v1
> After I removed all the X9.62 algorithms from the property 
> jdk.disabled.namedCurves in
> /usr/lib/jvm/java-14-openjdk-14.0.0.36-1.rolling.el7.x86_64/conf/security/java.security
> everything is running.
> This does not happen on staging (I think because it has only 1 Solr node and 
> so does not use the lb client).
> We do not set or change any ssl settings in solr.in.sh.
> I don't know how to fix that (default config? apache client settings?), but 
> I think using insecure algorithms may be a security risk and not only a 
> JDK 14 issue.






[jira] [Closed] (SOLR-13862) JDK 13+Shenandoah stability/recovery problems

2020-09-23 Thread Bernd Wahlen (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Wahlen closed SOLR-13862.
---

> JDK 13+Shenandoah stability/recovery problems
> -
>
> Key: SOLR-13862
> URL: https://issues.apache.org/jira/browse/SOLR-13862
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 8.2
>Reporter: Bernd Wahlen
>Priority: Major
>
> After updating my cluster (CentOS 7.7, Solr 8.2, JDK 12) to JDK 13 (3 nodes, 4 
> collections, 1 shard), everything was running well (with lower p95) for some 
> hours. Then 2 nodes (not the leader) went into recovery state, with ~"Recovery 
> failed: Error opening new searcher". I tried a rolling restart of the cluster, 
> but recovery was not working. After I switched back to JDK 11, recovery worked 
> again. In summary, JDK 11 and JDK 12 ran stable; JDK 13 did not.
> This is my solr.in.sh:
> GC_TUNE="-XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC"
> SOLR_TIMEZONE="CET"
> GC_LOG_OPTS="-Xlog:gc*:file=/var/log/solr/solr_gc.log:time:filecount=9,filesize=20M:safepoint"
> I also tried ADDREPLICA during my attempt to repair the cluster, which 
> caused Out of Memory on JDK 13 and worked after going back to JDK 11.






[jira] [Updated] (SOLR-14891) Upgrade Jetty to 9.4.28+ to fix Startup Warning

2020-09-23 Thread Bernd Wahlen (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Wahlen updated SOLR-14891:

Affects Version/s: 8.6.2

> Upgrade Jetty to 9.4.28+ to fix Startup Warning
> ---
>
> Key: SOLR-14891
> URL: https://issues.apache.org/jira/browse/SOLR-14891
> Project: Solr
>  Issue Type: Wish
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.6.2
>Reporter: Bernd Wahlen
>Priority: Minor
>
> Solr is currently using Jetty 9.4.27, which displays a strange warning at 
> startup. I think it is fixed in 9.4.28:
> https://github.com/eclipse/jetty.project/issues/4631
> 2020-09-23 09:57:57.346 WARN  (main) [   ] o.e.j.x.XmlConfiguration Ignored 
> arg:
> <Arg>
>   <New class="com.codahale.metrics.jetty9.InstrumentedQueuedThreadPool">
>     <Arg name="registry">
>       <Call class="com.codahale.metrics.SharedMetricRegistries" name="getOrCreate">
>         <Arg>solr.jetty</Arg>
>       </Call>
>     </Arg>
>   </New>
> </Arg>






[jira] [Commented] (LUCENE-9535) Investigate recent indexing slowdown for wikimedium documents

2020-09-23 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200698#comment-17200698
 ] 

Adrien Grand commented on LUCENE-9535:
--

I tried to reproduce the slowdown locally, but the results do not look 
significant. Since I don't have as many cores as Mike's beast, only 24, I ran 
with half the index buffer size and half the number of threads, i.e. 1024MB of 
index buffer and 18 threads, on the wikimediumall corpus.

Baseline (master):
 - 247GB/h 224 flushes
 - 259GB/h 225 flushes
 - 248GB/h 226 flushes
 - 262GB/h 224 flushes

Patch (stored fields ignored in IndexingChain memory accounting):
 - 256GB/h 224 flushes
 - 258GB/h 223 flushes

While the nightly benchmarks are seeing a ~10% slowdown, I'm not seeing a 
significant change. I'm running out of ideas, so I will decrease the block 
size of stored fields later today to see whether that makes a difference for 
the nightly benchmarks, which might help confirm whether stored fields are 
actually the problem or whether it's something else.

> Investigate recent indexing slowdown for wikimedium documents
> -
>
> Key: LUCENE-9535
> URL: https://issues.apache.org/jira/browse/LUCENE-9535
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: cpu_profile.svg
>
>
> Nightly benchmarks report a ~10% slowdown for 1kB documents as of September 
> 9th: [http://people.apache.org/~mikemccand/lucenebench/indexing.html].
> On that day, we added stored fields in DWPT accounting (LUCENE-9511), so I 
> first thought this could be due to smaller flushed segments and more merging, 
> but I still wonder whether there's something else. The benchmark runs with 
> 8GB of heap, 2GB of RAM buffer and 36 indexing threads. So it's about 2GB/36 
> = 57MB of RAM buffer per thread in the worst-case scenario that all DWPTs get 
> full at the same time. Stored fields account for about 0.7MB of memory, or 1% 
> of the indexing buffer size. How can a 1% reduction of buffering capacity 
> explain a 10% indexing slowdown? I looked into this further by running 
> indexing benchmarks locally with 8 indexing threads and 128MB of indexing 
> buffer memory, which would make this issue even more apparent if the smaller 
> RAM buffer were the cause, but I'm not seeing a regression, and I'm actually 
> seeing a similar number of flushes when I disable memory accounting for 
> stored fields.
> I ran indexing under a profiler to see whether something else could cause 
> this slowdown, e.g. slow implementations of ramBytesUsed on stored fields 
> writers, but nothing surprising showed up and the profile looked just like I 
> would have expected.
> Another question I have is why the 4kB benchmark is not affected at all.







[GitHub] [lucene-solr] Hronom commented on a change in pull request #1864: SOLR-14850 ExactStatsCache NullPointerException when shards.tolerant=true

2020-09-23 Thread GitBox


Hronom commented on a change in pull request #1864:
URL: https://github.com/apache/lucene-solr/pull/1864#discussion_r493408910



##
File path: solr/core/src/java/org/apache/solr/search/stats/ExactStatsCache.java
##
@@ -94,6 +94,12 @@ protected ShardRequest doRetrieveStatsRequest(ResponseBuilder rb) {
   protected void doMergeToGlobalStats(SolrQueryRequest req, List<ShardResponse> responses) {
     Set<Object> allTerms = new HashSet<>();
     for (ShardResponse r : responses) {
+      if ("true".equalsIgnoreCase(req.getParams().get(ShardParams.SHARDS_TOLERANT)) && r.getException() != null) {

Review comment:
   @sigram thank you for the details on where to put this; let me work on 
this, I will ping you back when it's done.








[jira] [Commented] (LUCENE-9535) Investigate recent indexing slowdown for wikimedium documents

2020-09-23 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200713#comment-17200713
 ] 

ASF subversion and git services commented on LUCENE-9535:
-

Commit 12dd19427e4888421202115fd86d87d0bb04eae6 in lucene-solr's branch 
refs/heads/master from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=12dd194 ]

LUCENE-9535: Reduce the size of compressed blocks of stored fields by 2x.

In order to see whether this has any effect on nightly benchmarks.


> Investigate recent indexing slowdown for wikimedium documents
> -
>
> Key: LUCENE-9535
> URL: https://issues.apache.org/jira/browse/LUCENE-9535
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: cpu_profile.svg
>
>
> Nightly benchmarks report a ~10% slowdown for 1kB documents as of September 
> 9th: [http://people.apache.org/~mikemccand/lucenebench/indexing.html].
> On that day, we added stored fields in DWPT accounting (LUCENE-9511), so I 
> first thought this could be due to smaller flushed segments and more merging, 
> but I still wonder whether there's something else. The benchmark runs with 
> 8GB of heap, 2GB of RAM buffer and 36 indexing threads. So it's about 2GB/36 
> = 57MB of RAM buffer per thread in the worst-case scenario that all DWPTs get 
> full at the same time. Stored fields account for about 0.7MB of memory, or 1% 
> of the indexing buffer size. How can a 1% reduction of buffering capacity 
> explain a 10% indexing slowdown? I looked into this further by running 
> indexing benchmarks locally with 8 indexing threads and 128MB of indexing 
> buffer memory, which would make this issue even more apparent if the smaller 
> RAM buffer were the cause, but I'm not seeing a regression, and I'm actually 
> seeing a similar number of flushes when I disable memory accounting for 
> stored fields.
> I ran indexing under a profiler to see whether something else could cause 
> this slowdown, e.g. slow implementations of ramBytesUsed on stored fields 
> writers, but nothing surprising showed up and the profile looked just like I 
> would have expected.
> Another question I have is why the 4kB benchmark is not affected at all.






[jira] [Commented] (LUCENE-9535) Investigate recent indexing slowdown for wikimedium documents

2020-09-23 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200714#comment-17200714
 ] 

ASF subversion and git services commented on LUCENE-9535:
-

Commit 12664ddbc188c4c1c7f73de7493f341befe32fd0 in lucene-solr's branch 
refs/heads/branch_8x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=12664dd ]

LUCENE-9535: Reduce the size of compressed blocks of stored fields by 2x.

In order to see whether this has any effect on nightly benchmarks.


> Investigate recent indexing slowdown for wikimedium documents
> -
>
> Key: LUCENE-9535
> URL: https://issues.apache.org/jira/browse/LUCENE-9535
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: cpu_profile.svg
>
>
> Nightly benchmarks report a ~10% slowdown for 1kB documents as of September 
> 9th: [http://people.apache.org/~mikemccand/lucenebench/indexing.html].
> On that day, we added stored fields in DWPT accounting (LUCENE-9511), so I 
> first thought this could be due to smaller flushed segments and more merging, 
> but I still wonder whether there's something else. The benchmark runs with 
> 8GB of heap, 2GB of RAM buffer and 36 indexing threads. So it's about 2GB/36 
> = 57MB of RAM buffer per thread in the worst-case scenario that all DWPTs get 
> full at the same time. Stored fields account for about 0.7MB of memory, or 1% 
> of the indexing buffer size. How can a 1% reduction of buffering capacity 
> explain a 10% indexing slowdown? I looked into this further by running 
> indexing benchmarks locally with 8 indexing threads and 128MB of indexing 
> buffer memory, which would make this issue even more apparent if the smaller 
> RAM buffer were the cause, but I'm not seeing a regression, and I'm actually 
> seeing a similar number of flushes when I disable memory accounting for 
> stored fields.
> I ran indexing under a profiler to see whether something else could cause 
> this slowdown, e.g. slow implementations of ramBytesUsed on stored fields 
> writers, but nothing surprising showed up and the profile looked just like I 
> would have expected.
> Another question I have is why the 4kB benchmark is not affected at all.






[GitHub] [lucene-solr] jpountz opened a new pull request #1912: LUCENE-9535: Try to do larger flushes.

2020-09-23 Thread GitBox


jpountz opened a new pull request #1912:
URL: https://github.com/apache/lucene-solr/pull/1912


   DWPTPool currently always returns the last DWPT that was added to the
   pool. By returning the largest DWPT instead, we could try to do larger
   flushes by finishing DWPTs that are close to being full instead of the
   last one that was added to the pool, which might be close to being
   empty.
   
   When indexing wikimediumall, this change did not seem to improve the
   indexing rate significantly, but it didn't slow things down either and
   the number of flushes went from 224-226 to 216, about 4% less.
   
   My expectation is that our nightly benchmarks are a best-case scenario
   for DWPTPool as the same number of threads is dedicated to indexing over
   time, but in the case when you have e.g. a single fixed threadpool that
   is responsible for indexing into several indices, the number of indexing
   threads that contribute to a given index might greatly vary over time.






[GitHub] [lucene-solr] jimczi commented on a change in pull request #1903: Fix bug in sort optimization

2020-09-23 Thread GitBox


jimczi commented on a change in pull request #1903:
URL: https://github.com/apache/lucene-solr/pull/1903#discussion_r493439005



##
File path: 
lucene/core/src/test/org/apache/lucene/search/TestFieldSortOptimizationSkipping.java
##
@@ -432,7 +439,48 @@ public void testDocSortOptimization() throws IOException {
       assertTrue(topDocs.totalHits.value < 10); // assert that very few docs were collected
     }
 
+    reader.close();
+    dir.close();
+  }
+
+  /**
+   * Test that sorting on _doc works correctly.
+   * This test goes through DefaultBulkSorter::scoreRange, where scorerIterator is BitSetIterator.
+   * As a conjunction of this BitSetIterator with DocComparator's iterator, we get BitSetConjunctionDISI.
+   * BitSetConjunctionDISI advances based on the DocComparator's iterator, and doesn't consider
+   * that its BitSetIterator may have advanced past a certain doc.

Review comment:
   Should we consider this a bug in `BitSetIterator` ?
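
   For context, a hedged sketch of the invariant the test description points 
at: a conjunction must only return docs on which all sub-iterators agree. The 
leapfrog loop below uses the real DocIdSetIterator API but is illustrative, 
not the actual BitSetConjunctionDISI code.

import java.io.IOException;
import org.apache.lucene.search.DocIdSetIterator;

final class ConjunctionSketch {
  // Advance both iterators until they agree on the same docID; advancing by
  // only one side's iterator can skip docs the other side has already passed.
  static int nextMatch(DocIdSetIterator lead, DocIdSetIterator other) throws IOException {
    int doc = lead.nextDoc();
    while (doc != DocIdSetIterator.NO_MORE_DOCS) {
      int otherDoc = other.docID() < doc ? other.advance(doc) : other.docID();
      if (otherDoc == doc) {
        return doc; // both iterators are positioned on the same doc
      }
      doc = lead.advance(otherDoc); // leapfrog the leader forward
    }
    return DocIdSetIterator.NO_MORE_DOCS;
  }
}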








[GitHub] [lucene-solr] s1monw commented on a change in pull request #1909: LUCENE-9539: Remove caches from SortingCodecReader

2020-09-23 Thread GitBox


s1monw commented on a change in pull request #1909:
URL: https://github.com/apache/lucene-solr/pull/1909#discussion_r493477140



##
File path: lucene/core/src/java/org/apache/lucene/index/SortingCodecReader.java
##
@@ -510,4 +457,52 @@ public LeafMetaData getMetaData() {
     return metaData;
   }
 
+  // we try to cache the last used DV or Norms instance since during merge
+  // this instance is used more than once. We could, in addition to this single instance,
+  // also cache the fields that are used for sorting since we do the work twice for these fields
+  private String cachedField;
+  private Object cachedObject;
+  private boolean cacheIsNorms;
+
+  private <T> T getOrCreateNorms(String field, IOSupplier<T> supplier) throws IOException {
+    return getOrCreate(field, true, supplier);
+  }
+
+  @SuppressWarnings("unchecked")
+  private synchronized <T> T getOrCreate(String field, boolean norms, IOSupplier<T> supplier) throws IOException {
+    if ((field.equals(cachedField) && cacheIsNorms == norms) == false) {
+      assert assertCreatedOnlyOnce(field, norms);
+      cachedObject = supplier.get();
+      cachedField = field;
+      cacheIsNorms = norms;
+    }
+    assert cachedObject != null;
+    return (T) cachedObject;
+  }
+
+  private final Map<String, Integer> cacheStats = new HashMap<>(); // only with assertions enabled
+  private boolean assertCreatedOnlyOnce(String field, boolean norms) {
+    assert Thread.holdsLock(this);
+    // this is mainly there to make sure that if we change anything in the way we merge, we realize it early
+    Integer timesCached = cacheStats.compute(field + "N:" + norms, (s, i) -> i == null ? 1 : i.intValue() + 1);
+    if (timesCached > 1) {
+      assert norms == false : "[" + field + "] norms must not be cached twice";

Review comment:
   can you point me to the place where we do this? If that is the case, our 
tests are not good enough here.








[GitHub] [lucene-solr] s1monw commented on a change in pull request #1909: LUCENE-9539: Remove caches from SortingCodecReader

2020-09-23 Thread GitBox


s1monw commented on a change in pull request #1909:
URL: https://github.com/apache/lucene-solr/pull/1909#discussion_r493482012



##
File path: lucene/core/src/java/org/apache/lucene/index/SortingCodecReader.java
##
@@ -510,4 +457,52 @@ public LeafMetaData getMetaData() {
     return metaData;
   }
 
+  // we try to cache the last used DV or Norms instance since during merge
+  // this instance is used more than once. We could, in addition to this single instance,
+  // also cache the fields that are used for sorting since we do the work twice for these fields
+  private String cachedField;
+  private Object cachedObject;
+  private boolean cacheIsNorms;
+
+  private <T> T getOrCreateNorms(String field, IOSupplier<T> supplier) throws IOException {
+    return getOrCreate(field, true, supplier);
+  }
+
+  @SuppressWarnings("unchecked")
+  private synchronized <T> T getOrCreate(String field, boolean norms, IOSupplier<T> supplier) throws IOException {
+    if ((field.equals(cachedField) && cacheIsNorms == norms) == false) {
+      assert assertCreatedOnlyOnce(field, norms);
+      cachedObject = supplier.get();
+      cachedField = field;
+      cacheIsNorms = norms;
+    }
+    assert cachedObject != null;
+    return (T) cachedObject;
+  }
+
+  private final Map<String, Integer> cacheStats = new HashMap<>(); // only with assertions enabled
+  private boolean assertCreatedOnlyOnce(String field, boolean norms) {
+    assert Thread.holdsLock(this);
+    // this is mainly there to make sure that if we change anything in the way we merge, we realize it early
+    Integer timesCached = cacheStats.compute(field + "N:" + norms, (s, i) -> i == null ? 1 : i.intValue() + 1);
+    if (timesCached > 1) {
+      assert norms == false : "[" + field + "] norms must not be cached twice";

Review comment:
   I think what we do here is that we pull the already merged norms instance 
from disk instead of the one from the source reader. Is that what you mean in 
`PushPostingsWriterBase`?








[GitHub] [lucene-solr] noblepaul edited a comment on pull request #1863: SOLR-14701: GuessSchemaFields URP to replace AddSchemaFields URP in schemaless mode

2020-09-23 Thread GitBox


noblepaul edited a comment on pull request #1863:
URL: https://github.com/apache/lucene-solr/pull/1863#issuecomment-697194702


   >Strong words there "worse than useless", especially considering that this - 
to me - seems a strong improvement on the current schemaless mode as it looks 
at more values and actually supports single/multivalued fields.
   
   I was referring to the current solution we have in Solr (the schemaless, 
guess-schema thing). It's not a comment on the new solution. The current 
schemaless is indeed worse than useless.
   
   >Generating Schema JSON raises its own questions, such as the shape of the 
schema it will be applied to, as guessing is currently happening as a 
differential to the existing schema. 
   
   The command is only relevant for that moment. If you execute it right away, 
it's useful. Users will most likely just copy-paste the command (and edit it, 
if required).






[GitHub] [lucene-solr] noblepaul edited a comment on pull request #1863: SOLR-14701: GuessSchemaFields URP to replace AddSchemaFields URP in schemaless mode

2020-09-23 Thread GitBox


noblepaul edited a comment on pull request #1863:
URL: https://github.com/apache/lucene-solr/pull/1863#issuecomment-697194702


   >Strong words there "worse than useless", especially considering that this - 
to me - seems a strong improvement on the current schemaless mode as it looks 
at more values and actually supports single/multivalued fields.
   
   I'm sorry for the confusion.
   
   I was referring to the current solution we have in Solr (the schemaless, 
guess-schema thing). It's not a comment on the new solution. The current 
schemaless is indeed worse than useless.
   
   >Generating Schema JSON raises its own questions, such as the shape of the 
schema it will be applied to, as guessing is currently happening as a 
differential to the existing schema. 
   
   The command is only relevant for that moment. If you execute it right away, 
it's useful. Users will most likely just copy-paste the command (and edit it, 
if required).






[GitHub] [lucene-solr] jpountz commented on a change in pull request #1909: LUCENE-9539: Remove caches from SortingCodecReader

2020-09-23 Thread GitBox


jpountz commented on a change in pull request #1909:
URL: https://github.com/apache/lucene-solr/pull/1909#discussion_r493490751



##
File path: lucene/core/src/java/org/apache/lucene/index/SortingCodecReader.java
##
@@ -510,4 +457,52 @@ public LeafMetaData getMetaData() {
     return metaData;
   }
 
+  // we try to cache the last used DV or Norms instance since during merge
+  // this instance is used more than once. We could, in addition to this single instance,
+  // also cache the fields that are used for sorting since we do the work twice for these fields
+  private String cachedField;
+  private Object cachedObject;
+  private boolean cacheIsNorms;
+
+  private <T> T getOrCreateNorms(String field, IOSupplier<T> supplier) throws IOException {
+    return getOrCreate(field, true, supplier);
+  }
+
+  @SuppressWarnings("unchecked")
+  private synchronized <T> T getOrCreate(String field, boolean norms, IOSupplier<T> supplier) throws IOException {
+    if ((field.equals(cachedField) && cacheIsNorms == norms) == false) {
+      assert assertCreatedOnlyOnce(field, norms);
+      cachedObject = supplier.get();
+      cachedField = field;
+      cacheIsNorms = norms;
+    }
+    assert cachedObject != null;
+    return (T) cachedObject;
+  }
+
+  private final Map<String, Integer> cacheStats = new HashMap<>(); // only with assertions enabled
+  private boolean assertCreatedOnlyOnce(String field, boolean norms) {
+    assert Thread.holdsLock(this);
+    // this is mainly there to make sure that if we change anything in the way we merge, we realize it early
+    Integer timesCached = cacheStats.compute(field + "N:" + norms, (s, i) -> i == null ? 1 : i.intValue() + 1);
+    if (timesCached > 1) {
+      assert norms == false : "[" + field + "] norms must not be cached twice";

Review comment:
   Ah, I had forgotten we were doing things this way. Then ignore my comment!








[jira] [Updated] (SOLR-14890) Refactor code to use annotations for configset API

2020-09-23 Thread Noble Paul (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-14890:
--
Summary: Refactor code to use annotations for configset API  (was: Refactor 
code to use annotations for cluster API)

> Refactor code to use annotations for configset API
> --
>
> Key: SOLR-14890
> URL: https://issues.apache.org/jira/browse/SOLR-14890
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>







[GitHub] [lucene-solr] noblepaul merged pull request #1911: SOLR-14890: Refactor code to use annotations for configset API

2020-09-23 Thread GitBox


noblepaul merged pull request #1911:
URL: https://github.com/apache/lucene-solr/pull/1911


   






[jira] [Commented] (SOLR-14890) Refactor code to use annotations for configset API

2020-09-23 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200779#comment-17200779
 ] 

ASF subversion and git services commented on SOLR-14890:


Commit fd0c08615df9440061e5ae664dcfa3f5a7600568 in lucene-solr's branch 
refs/heads/master from Noble Paul
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=fd0c086 ]

SOLR-14890: Refactor code to use annotations for configset API (#1911)



> Refactor code to use annotations for configset API
> --
>
> Key: SOLR-14890
> URL: https://issues.apache.org/jira/browse/SOLR-14890
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>







[GitHub] [lucene-solr] cpoerschke opened a new pull request #1913: SOLR-11167: Avoid $SOLR_STOP_WAIT use during 'bin/solr start' if $SOLR_START_WAIT is supplied.

2020-09-23 Thread GitBox


cpoerschke opened a new pull request #1913:
URL: https://github.com/apache/lucene-solr/pull/1913


   https://issues.apache.org/jira/browse/SOLR-11167








[jira] [Commented] (SOLR-11167) bin/solr uses $SOLR_STOP_WAIT during start

2020-09-23 Thread Christine Poerschke (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-11167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200786#comment-17200786
 ] 

Christine Poerschke commented on SOLR-11167:


Oops, a three-year-old ticket; not quite sure what happened here, apologies 
[~omar_abdelnabi]. Thanks for attaching a patch!

After all this time the patch unfortunately no longer applies to the current 
master branch. Hence I've replaced it with 
[https://github.com/apache/lucene-solr/pull/1913] instead, with two small 
differences:
 * {{solr.in.cmd}} changes are left out of scope, i.e. since {{solr.cmd}} does 
not yet use $SOLR_STOP_WAIT, it would be clearer to separately add 
$SOLR_START_WAIT and $SOLR_STOP_WAIT support for it
 * instead of initialising {{SOLR_START_WAIT=180}} (if no SOLR_START_WAIT was 
supplied), using {{SOLR_START_WAIT=$SOLR_STOP_WAIT}} will help ensure backwards 
compatibility for users that currently customise SOLR_STOP_WAIT, e.g. if 
anyone is currently setting {{SOLR_STOP_WAIT=42}}, they will continue to see 
42s used for both stop and start even if they don't explicitly configure 
{{SOLR_START_WAIT=42}}

> bin/solr uses $SOLR_STOP_WAIT during start
> --
>
> Key: SOLR-11167
> URL: https://issues.apache.org/jira/browse/SOLR-11167
> Project: Solr
>  Issue Type: Improvement
>  Components: scripts and tools
>Reporter: Christine Poerschke
>Priority: Minor
> Attachments: SOLR-11167.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> bin/solr using $SOLR_STOP_WAIT during start is unexpected, I think it would 
> be clearer to have a separate $SOLR_START_WAIT variable.
> related minor thing: SOLR_STOP_WAIT is mentioned in solr.in.sh but not in 
> solr.in.cmd equivalent.






[jira] [Assigned] (SOLR-11167) bin/solr uses $SOLR_STOP_WAIT during start

2020-09-23 Thread Christine Poerschke (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-11167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christine Poerschke reassigned SOLR-11167:
--

Assignee: Christine Poerschke

> bin/solr uses $SOLR_STOP_WAIT during start
> --
>
> Key: SOLR-11167
> URL: https://issues.apache.org/jira/browse/SOLR-11167
> Project: Solr
>  Issue Type: Improvement
>  Components: scripts and tools
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
>Priority: Minor
> Attachments: SOLR-11167.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> bin/solr using $SOLR_STOP_WAIT during start is unexpected, I think it would 
> be clearer to have a separate $SOLR_START_WAIT variable.
> related minor thing: SOLR_STOP_WAIT is mentioned in solr.in.sh but not in 
> solr.in.cmd equivalent.






[jira] [Updated] (SOLR-11167) bin/solr uses $SOLR_STOP_WAIT during start

2020-09-23 Thread Christine Poerschke (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-11167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christine Poerschke updated SOLR-11167:
---
Fix Version/s: 8.7
   master (9.0)

> bin/solr uses $SOLR_STOP_WAIT during start
> --
>
> Key: SOLR-11167
> URL: https://issues.apache.org/jira/browse/SOLR-11167
> Project: Solr
>  Issue Type: Improvement
>  Components: scripts and tools
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
>Priority: Minor
> Fix For: master (9.0), 8.7
>
> Attachments: SOLR-11167.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> bin/solr using $SOLR_STOP_WAIT during start is unexpected, I think it would 
> be clearer to have a separate $SOLR_START_WAIT variable.
> related minor thing: SOLR_STOP_WAIT is mentioned in solr.in.sh but not in 
> solr.in.cmd equivalent.






[jira] [Commented] (LUCENE-9539) Improve memory footprint of SortingCodecReader

2020-09-23 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200789#comment-17200789
 ] 

ASF subversion and git services commented on LUCENE-9539:
-

Commit 17c285d61743da0c06735e06235b20bd5aac4e14 in lucene-solr's branch 
refs/heads/master from Simon Willnauer
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=17c285d ]

LUCENE-9539: Remove caches from SortingCodecReader (#1909)

SortingCodecReader keeps all docvalues in memory that are loaded from this 
reader.
Yet, this reader should only be used for merging which happens sequentially. 
This makes
caching docvalues unnecessary.

Co-authored-by: Jim Ferenczi 

> Improve memory footprint of SortingCodecReader
> --
>
> Key: LUCENE-9539
> URL: https://issues.apache.org/jira/browse/LUCENE-9539
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Simon Willnauer
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> SortingCodecReader is very memory heavy since it needs to re-sort and load 
> large parts of the index into memory. We can try to make it more efficient by 
> using more compact internal data structures and removing the caches it uses, 
> provided we define its usage as a merge-only reader wrapper. Ultimately we 
> need to find a way to allow the reader or some other structure to minimize 
> its heap memory. One way is to slice existing readers and merge them in 
> multiple steps. There will be multiple steps towards a more usable version 
> of this class.






[GitHub] [lucene-solr] s1monw merged pull request #1909: LUCENE-9539: Remove caches from SortingCodecReader

2020-09-23 Thread GitBox


s1monw merged pull request #1909:
URL: https://github.com/apache/lucene-solr/pull/1909


   






[jira] [Commented] (LUCENE-9539) Improve memory footprint of SortingCodecReader

2020-09-23 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200795#comment-17200795
 ] 

ASF subversion and git services commented on LUCENE-9539:
-

Commit 427e11c7f644a05be93bb801ca394b90dccf8df6 in lucene-solr's branch 
refs/heads/branch_8x from Simon Willnauer
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=427e11c ]

LUCENE-9539: Remove caches from SortingCodecReader (#1909)

SortingCodecReader keeps all docvalues in memory that are loaded from this 
reader.
Yet, this reader should only be used for merging which happens sequentially. 
This makes
caching docvalues unnecessary.

Co-authored-by: Jim Ferenczi 

> Improve memory footprint of SortingCodecReader
> --
>
> Key: LUCENE-9539
> URL: https://issues.apache.org/jira/browse/LUCENE-9539
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Simon Willnauer
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> SortingCodecReader is very memory heavy since it needs to re-sort and load 
> large parts of the index into memory. We can try to make it more efficient by 
> using more compact internal data structures and removing the caches it uses, 
> provided we define its usage as a merge-only reader wrapper. Ultimately we 
> need to find a way to allow the reader or some other structure to minimize 
> its heap memory. One way is to slice existing readers and merge them in 
> multiple steps. There will be multiple steps towards a more usable version 
> of this class.






[jira] [Assigned] (SOLR-14503) Solr does not respect waitForZk (SOLR_WAIT_FOR_ZK) property

2020-09-23 Thread Munendra S N (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Munendra S N reassigned SOLR-14503:
---

Assignee: Munendra S N

> Solr does not respect waitForZk (SOLR_WAIT_FOR_ZK) property
> ---
>
> Key: SOLR-14503
> URL: https://issues.apache.org/jira/browse/SOLR-14503
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 7.1, 7.2, 7.2.1, 7.3, 7.3.1, 7.4, 7.5, 7.6, 7.7, 7.7.1, 
> 7.7.2, 8.0, 8.1, 8.2, 7.7.3, 8.1.1, 8.3, 8.4, 8.3.1, 8.5, 8.4.1, 8.5.1
>Reporter: Colvin Cowie
>Assignee: Munendra S N
>Priority: Minor
> Attachments: SOLR-14503.patch, SOLR-14503.patch
>
>
> When starting Solr in cloud mode, if zookeeper is not available within 30 
> seconds, then core container intialization fails and the node will not 
> recover when zookeeper is available.
>  
> I believe SOLR-5129 should have addressed this issue, however it doesn't 
> quite do so for two reasons:
>  # 
> [https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/servlet/SolrDispatchFilter.java#L297]
>  it calls {{SolrZkClient(String zkServerAddress, int zkClientTimeout)}} 
> rather than {{SolrZkClient(String zkServerAddress, int zkClientTimeout, int 
> zkClientConnectTimeout)}} so the DEFAULT_CLIENT_CONNECT_TIMEOUT of 30 seconds 
> is used even when you specify a different waitForZk value
>  # bin/solr contains a script to set -DwaitForZk from the SOLR_WAIT_FOR_ZK 
> environment variable 
> [https://github.com/apache/lucene-solr/blob/master/solr/bin/solr#L2148] but 
> there is no corresponding assignment in bin/solr.cmd, while SOLR_WAIT_FOR_ZK 
> appears in the solr.in.cmd as an example.
>  
> I will attach a patch that fixes the above.
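
For illustration, a hedged sketch of what fix #1 amounts to, using the two 
constructors quoted above; the property lookup and the unit conversion are 
assumptions, not the actual patch:

{code:java}
// Hedged sketch, not the actual patch: in the ZK client setup, pass the
// configured waitForZk value through as the connect timeout instead of
// letting the 30-second DEFAULT_CLIENT_CONNECT_TIMEOUT apply.
int waitForZkSeconds = Integer.getInteger("waitForZk", 30); // assumed: seconds
SolrZkClient zkClient = new SolrZkClient(
    zkServerAddress,            // as before
    zkClientTimeout,            // session timeout, as before
    waitForZkSeconds * 1000);   // zkClientConnectTimeout in ms (assumption)
{code}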



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14503) Solr does not respect waitForZk (SOLR_WAIT_FOR_ZK) property

2020-09-23 Thread Munendra S N (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200808#comment-17200808
 ] 

Munendra S N commented on SOLR-14503:
-

I'm planning to commit current patch and handle other cases of zkClientTimeout 
usage in a separate issue

> Solr does not respect waitForZk (SOLR_WAIT_FOR_ZK) property
> ---
>
> Key: SOLR-14503
> URL: https://issues.apache.org/jira/browse/SOLR-14503
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 7.1, 7.2, 7.2.1, 7.3, 7.3.1, 7.4, 7.5, 7.6, 7.7, 7.7.1, 
> 7.7.2, 8.0, 8.1, 8.2, 7.7.3, 8.1.1, 8.3, 8.4, 8.3.1, 8.5, 8.4.1, 8.5.1
>Reporter: Colvin Cowie
>Assignee: Munendra S N
>Priority: Minor
> Attachments: SOLR-14503.patch, SOLR-14503.patch
>
>
> When starting Solr in cloud mode, if zookeeper is not available within 30 
> seconds, then core container initialization fails and the node will not 
> recover when zookeeper is available.
>  
> I believe SOLR-5129 should have addressed this issue, however it doesn't 
> quite do so for two reasons:
>  # 
> [https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/servlet/SolrDispatchFilter.java#L297]
>  it calls {{SolrZkClient(String zkServerAddress, int zkClientTimeout)}} 
> rather than {{SolrZkClient(String zkServerAddress, int zkClientTimeout, int 
> zkClientConnectTimeout)}} so the DEFAULT_CLIENT_CONNECT_TIMEOUT of 30 seconds 
> is used even when you specify a different waitForZk value
>  # bin/solr contains a script to set -DwaitForZk from the SOLR_WAIT_FOR_ZK 
> environment variable 
> [https://github.com/apache/lucene-solr/blob/master/solr/bin/solr#L2148] but 
> there is no corresponding assignment in bin/solr.cmd, while SOLR_WAIT_FOR_ZK 
> appears in the solr.in.cmd as an example.
>  
> I will attach a patch that fixes the above.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Assigned] (SOLR-14333) Implement toString() in CollapsingPostFilter

2020-09-23 Thread Munendra S N (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Munendra S N reassigned SOLR-14333:
---

Assignee: Munendra S N

> Implement toString() in CollapsingPostFilter
> 
>
> Key: SOLR-14333
> URL: https://issues.apache.org/jira/browse/SOLR-14333
> Project: Solr
>  Issue Type: Improvement
>Reporter: Munendra S N
>Assignee: Munendra S N
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> {{toString()}} is not overridden in CollapsingPostFilter. Debug component 
> returns {{parsed_filter_queries}}, for multiple CollapsingPostFilter in 
> request, value in {{parsed_filter_queries}} is always 
> {{CollapsingPostFilter()}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14503) Solr does not respect waitForZk (SOLR_WAIT_FOR_ZK) property

2020-09-23 Thread Colvin Cowie (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200811#comment-17200811
 ] 

Colvin Cowie commented on SOLR-14503:
-

Hi [~munendrasn], thanks. Sorry I've not got any time at the moment. Thanks

> Solr does not respect waitForZk (SOLR_WAIT_FOR_ZK) property
> ---
>
> Key: SOLR-14503
> URL: https://issues.apache.org/jira/browse/SOLR-14503
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 7.1, 7.2, 7.2.1, 7.3, 7.3.1, 7.4, 7.5, 7.6, 7.7, 7.7.1, 
> 7.7.2, 8.0, 8.1, 8.2, 7.7.3, 8.1.1, 8.3, 8.4, 8.3.1, 8.5, 8.4.1, 8.5.1
>Reporter: Colvin Cowie
>Assignee: Munendra S N
>Priority: Minor
> Attachments: SOLR-14503.patch, SOLR-14503.patch
>
>
> When starting Solr in cloud mode, if zookeeper is not available within 30 
> seconds, then core container initialization fails and the node will not 
> recover when zookeeper is available.
>  
> I believe SOLR-5129 should have addressed this issue, however it doesn't 
> quite do so for two reasons:
>  # 
> [https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/servlet/SolrDispatchFilter.java#L297]
>  it calls {{SolrZkClient(String zkServerAddress, int zkClientTimeout)}} 
> rather than {{SolrZkClient(String zkServerAddress, int zkClientTimeout, int 
> zkClientConnectTimeout)}} so the DEFAULT_CLIENT_CONNECT_TIMEOUT of 30 seconds 
> is used even when you specify a different waitForZk value
>  # bin/solr contains a script to set -DwaitForZk from the SOLR_WAIT_FOR_ZK 
> environment variable 
> [https://github.com/apache/lucene-solr/blob/master/solr/bin/solr#L2148] but 
> there is no corresponding assignment in bin/solr.cmd, while SOLR_WAIT_FOR_ZK 
> appears in the solr.in.cmd as an example.
>  
> I will attach a patch that fixes the above.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] mayya-sharipova commented on a change in pull request #1903: Fix bug in sort optimization

2020-09-23 Thread GitBox


mayya-sharipova commented on a change in pull request #1903:
URL: https://github.com/apache/lucene-solr/pull/1903#discussion_r493568956



##
File path: 
lucene/core/src/test/org/apache/lucene/search/TestFieldSortOptimizationSkipping.java
##
@@ -432,7 +439,48 @@ public void testDocSortOptimization() throws IOException {
   assertTrue(topDocs.totalHits.value < 10); // assert that very few docs 
were collected
 }
 
+reader.close();
+dir.close();
+  }
+
+  /**
+   * Test that sorting on _doc works correctly.
+   * This test goes through DefaultBulkSorter::scoreRange, where 
scorerIterator is BitSetIterator.
+   * As a conjunction of this BitSetIterator with DocComparator's iterator, we 
get BitSetConjunctionDISI.
+   * BitSetConjunctionDISI advances based on the DocComparator's iterator, and 
doesn't consider
+   * that its BitSetIterator may have advanced past a certain doc. 

Review comment:
   I will create an issue for this. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] mayya-sharipova merged pull request #1903: Fix bug in sort optimization

2020-09-23 Thread GitBox


mayya-sharipova merged pull request #1903:
URL: https://github.com/apache/lucene-solr/pull/1903


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] munendrasn commented on pull request #1900: SOLR-14036: Remove explicit distrib=false from /terms handler

2020-09-23 Thread GitBox


munendrasn commented on pull request #1900:
URL: https://github.com/apache/lucene-solr/pull/1900#issuecomment-697360102


   I have included the changes and the upgrade entry. Instead of adding the 
upgrade entry to `solr-upgrade-notes.adoc`, I have added it to 
`major-changes-in-solr-9.adoc`, as mentioned in the former doc.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] munendrasn opened a new pull request #1914: Move 9x upgrade notes out of changes.txt

2020-09-23 Thread GitBox


munendrasn opened a new pull request #1914:
URL: https://github.com/apache/lucene-solr/pull/1914


   Upgrade notes have been moved out of changes.txt. While working on PR #1900, 
I found there were a few entries which were still present in changes.txt (most 
likely added at a later time).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14787) Inequality support in Payload Check query parser

2020-09-23 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200828#comment-17200828
 ] 

Gus Heck commented on SOLR-14787:
-

I have found something interesting WRT the failing case you mention... it only 
fails when I run the test in my IDE. If I use the ant build, it passes. I noticed 
some interesting differences in startup for these two scenarios... 

build:

 
{code:java}
   [junit4] Suite: org.apache.solr.search.TestPayloadCheckQParserPlugin
   [junit4]   2> 1454 INFO  
(SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] 
o.a.s.SolrTestCase Setting 'solr.default.confdir' system property to 
test-framework derived value of 
'/home/gus/projects/apache/lucene-solr/fork/lucene-solr8/solr/server/solr/configsets/_default/conf'
   [junit4]   2> 1475 INFO  
(SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] 
o.a.s.SolrTestCaseJ4 Created dataDir: 
/home/gus/projects/apache/lucene-solr/fork/lucene-solr8/solr/build/solr-core/test/J0/temp/solr.search.TestPayloadCheckQParserPlugin_AB5E0FC0380BB866-001/data-dir-1-001
   [junit4]   2> 1551 INFO  
(SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] 
o.a.s.SolrTestCaseJ4 Using TrieFields (NUMERIC_POINTS_SYSPROP=false) 
w/NUMERIC_DOCVALUES_SYSPROP=true
   [junit4]   2> 1592 INFO  
(SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] 
o.e.j.u.log Logging initialized @1620ms to org.eclipse.jetty.util.log.Slf4jLog
   [junit4]   2> 1597 INFO  
(SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] 
o.a.s.SolrTestCaseJ4 Randomized ssl (false) and clientAuth (true) via: 
@org.apache.solr.util.RandomizeSSL(reason=, ssl=NaN, value=NaN, clientAuth=NaN)
   [junit4]   2> 1621 INFO  
(SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] 
o.a.s.SolrTestCaseJ4 SecureRandom sanity checks: 
test.solr.allowed.securerandom=null & java.security.egd=file:/dev/./urandom
   [junit4]   2> 1626 INFO  
(SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] 
o.a.s.SolrTestCaseJ4 initCore
   [junit4]   2> 1757 INFO  
(SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] 
o.a.s.c.SolrConfig Using Lucene MatchVersion: 8.7.0
   [junit4]   2> 1901 INFO  
(SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] 
o.a.s.s.IndexSchema Schema name=example
   [junit4]   2> 1931 WARN  
(SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] 
o.a.s.c.SolrResourceLoader Solr loaded a deprecated plugin/analysis class 
[solr.TrieIntField]. Please consult documentation how to replace it accordingly.
   [junit4]   2> 1936 WARN  
(SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] 
o.a.s.c.SolrResourceLoader Solr loaded a deprecated plugin/analysis class 
[solr.TrieFloatField]. Please consult documentation how to replace it 
accordingly.
   [junit4]   2> 1940 WARN  
(SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] 
o.a.s.c.SolrResourceLoader Solr loaded a deprecated plugin/analysis class 
[solr.TrieLongField]. Please consult documentation how to replace it 
accordingly.
   [junit4]   2> 1944 WARN  
(SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] 
o.a.s.c.SolrResourceLoader Solr loaded a deprecated plugin/analysis class 
[solr.TrieDoubleField]. Please consult documentation how to replace it 
accordingly.
   [junit4]   2> 1966 WARN  
(SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] 
o.a.s.c.SolrResourceLoader Solr loaded a deprecated plugin/analysis class 
[solr.TrieDateField]. Please consult documentation how to replace it 
accordingly.
   [junit4]   2> 2202 WARN  
(SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] 
o.a.s.c.SolrResourceLoader Solr loaded a deprecated plugin/analysis class 
[solr.GeoHashField]. Please consult documentation how to replace it accordingly.
   [junit4]   2> 2208 WARN  
(SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] 
o.a.s.c.SolrResourceLoader Solr loaded a deprecated plugin/analysis class 
[solr.LatLonType]. Please consult documentation how to replace it accordingly.
   [junit4]   2> 2217 WARN  
(SUITE-TestPayloadCheckQParserPlugin-seed#[AB5E0FC0380BB866]-worker) [ ] 
o.a.s.c.SolrResourceLoader Solr loaded a deprecated plugin/analysis class 
[solr.EnumField]. Please consult documentation how to replace it accordingly.


{code}
IDE (Intellij)

 

 
{code:java}
1172 INFO  (SUITE-TestPayloadCheckQParserPlugin-seed#[5A2517E33080AEE6]-worker) 
[ ] o.a.s.SolrTestCase Setting 'solr.default.confdir' system property to 
test-framework derived value of 
'/home/gus/projects/apache/lucene-solr/fork/lucene-solr/solr/server/solr/configsets/_default/conf'
1190 INFO  (SUITE-TestPayloa

[GitHub] [lucene-solr] munendrasn commented on pull request #1914: Move 9x upgrade notes out of changes.txt

2020-09-23 Thread GitBox


munendrasn commented on pull request #1914:
URL: https://github.com/apache/lucene-solr/pull/1914#issuecomment-697368032


   @noblepaul @sigram Please review. I have moved the entries added by you 
guys, so I would prefer your reviews.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] mocobeta commented on pull request #1836: LUCENE-9317: Clean up split package in analyzers-common

2020-09-23 Thread GitBox


mocobeta commented on pull request #1836:
URL: https://github.com/apache/lucene-solr/pull/1836#issuecomment-697375125


   @uschindler seems busy.
   
   I don't want to maintain this branch for very long (the diff is so large), 
but I need at least one reviewer to proceed with this.
   @dweiss would you take care of this, if you have some time?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (LUCENE-9541) BitSetConjunctionDISI can advance backwards from its components

2020-09-23 Thread Mayya Sharipova (Jira)
Mayya Sharipova created LUCENE-9541:
---

 Summary: BitSetConjunctionDISI can advance backwards from its 
components
 Key: LUCENE-9541
 URL: https://issues.apache.org/jira/browse/LUCENE-9541
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Mayya Sharipova


Not completely sure if this is a bug.

BitSetConjunctionDISI advances based on its lead DocIdSetIterator, and doesn't 
consider that its other component, a BitSetIterator, may have already advanced 
past a certain doc. This may result in duplicate documents.

This behaviour was exposed in this PR. 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-9541) BitSetConjunctionDISI can advance to docs before its components

2020-09-23 Thread Mayya Sharipova (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayya Sharipova updated LUCENE-9541:

Summary: BitSetConjunctionDISI can advance to docs before its components  
(was: BitSetConjunctionDISI can advance backwards from its components)

> BitSetConjunctionDISI can advance to docs before its components
> ---
>
> Key: LUCENE-9541
> URL: https://issues.apache.org/jira/browse/LUCENE-9541
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Mayya Sharipova
>Priority: Minor
>
> Not completely sure if this is a bug.
> BitSetConjunctionDISI advances based on its lead DocIdSetIterator, and doesn't 
> consider that its other component, a BitSetIterator, may have already advanced 
> past a certain doc. This may result in duplicate documents.
> This behaviour was exposed in this PR. 
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-9541) BitSetConjunctionDISI can advance to docs before its components

2020-09-23 Thread Mayya Sharipova (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayya Sharipova updated LUCENE-9541:

Description: 
Not completely sure if this is a bug.

BitSetConjunctionDISI advances based on its lead DocIdSetIterator, and doesn't 
consider that its other component, a BitSetIterator, may have already advanced 
past a certain doc. This may result in duplicate documents.

This behaviour was exposed in this 
[PR|https://github.com/apache/lucene-solr/pull/1903]. 

 

  was:
Not completely sure if this is a bug.

BitSetConjunctionDISI advances based on its lead DocIdSetIterator, and doesn't 
consider that its other component, a BitSetIterator, may have already advanced 
past a certain doc. This may result in duplicate documents.

This behaviour was exposed in this PR. 

 


> BitSetConjunctionDISI can advance to docs before its components
> ---
>
> Key: LUCENE-9541
> URL: https://issues.apache.org/jira/browse/LUCENE-9541
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Mayya Sharipova
>Priority: Minor
>
> Not completely sure if this is a bug.
> BitSetConjunctionDISI advances based on its lead DocIdSetIterator, and doesn't 
> consider that its other component, a BitSetIterator, may have already advanced 
> past a certain doc. This may result in duplicate documents.
> This behaviour was exposed in this 
> [PR|https://github.com/apache/lucene-solr/pull/1903]. 
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-9541) BitSetConjunctionDISI can advance to docs before its components

2020-09-23 Thread Mayya Sharipova (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayya Sharipova updated LUCENE-9541:

Description: 
Not completely sure if this is a bug.

BitSetConjunctionDISI advances based on its lead DocIdSetIterator, and doesn't 
consider that its other component, a BitSetIterator, may have already advanced 
past a certain doc. This may result in duplicate documents.

For example, suppose BitSetConjunctionDISI _disi_ is composed of DocIdSetIterator 
_a_ over docs [0,1] and BitSetIterator _b_ over docs [0,1]. Calling `b.nextDoc()` 
collects doc0; calling `disi.nextDoc()` afterwards collects the same doc0 again.

It seems that other conjunction iterators don't have this behaviour: if we 
advance any of their components past a certain document, the whole conjunction 
iterator is also advanced past this document. 

 

This behaviour was exposed in this 
[PR|https://github.com/apache/lucene-solr/pull/1903]. 

 

  was:
Not completely sure if this is a bug.

BitSetConjunctionDISI advances based on its lead DocIdSetIterator, and doesn't 
consider that its other component, a BitSetIterator, may have already advanced 
past a certain doc. This may result in duplicate documents.

This behaviour was exposed in this 
[PR|https://github.com/apache/lucene-solr/pull/1903]. 

 


> BitSetConjunctionDISI can advance to docs before its components
> ---
>
> Key: LUCENE-9541
> URL: https://issues.apache.org/jira/browse/LUCENE-9541
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Mayya Sharipova
>Priority: Minor
>
> Not completely sure if this is a bug.
> BitSetConjunctionDISI advances based on its lead DocIdSetIterator, and doesn't 
> consider that its other component, a BitSetIterator, may have already advanced 
> past a certain doc. This may result in duplicate documents.
> For example, suppose BitSetConjunctionDISI _disi_ is composed of DocIdSetIterator 
> _a_ over docs [0,1] and BitSetIterator _b_ over docs [0,1]. Calling `b.nextDoc()` 
> collects doc0; calling `disi.nextDoc()` afterwards collects the same doc0 
> again.
> It seems that other conjunction iterators don't have this behaviour: if we 
> advance any of their components past a certain document, the whole 
> conjunction iterator is also advanced past this document. 
>  
> This behaviour was exposed in this 
> [PR|https://github.com/apache/lucene-solr/pull/1903]. 
>  
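
A hedged, self-contained illustration of the invariant being described; 
FixedBitSet and BitSetIterator are real Lucene classes, but the scenario is a 
simplified assumption rather than a test from the PR:

{code:java}
import java.io.IOException;

import org.apache.lucene.util.BitSetIterator;
import org.apache.lucene.util.FixedBitSet;

public class AdvanceInvariantSketch {
  public static void main(String[] args) throws IOException {
    FixedBitSet bits = new FixedBitSet(2);
    bits.set(0);
    bits.set(1);
    BitSetIterator b = new BitSetIterator(bits, 2); // a component iterator

    int doc = b.nextDoc(); // the component advances to doc 0
    // Invariant a conjunction over `b` must respect: once a component has
    // consumed doc 0, the conjunction may not emit doc 0 again. The report
    // above says BitSetConjunctionDISI only follows its lead iterator, so it
    // can re-emit doc 0 and produce duplicate documents.
    System.out.println("component is at doc " + doc);
  }
}
{code}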



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-9541) BitSetConjunctionDISI doesn't advance based on its components

2020-09-23 Thread Mayya Sharipova (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayya Sharipova updated LUCENE-9541:

Summary: BitSetConjunctionDISI doesn't advance based on its components  
(was: BitSetConjunctionDISI can advance to docs before its components)

> BitSetConjunctionDISI doesn't advance based on its components
> -
>
> Key: LUCENE-9541
> URL: https://issues.apache.org/jira/browse/LUCENE-9541
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Mayya Sharipova
>Priority: Minor
>
> Not completely sure if this is a bug.
> BitSetConjunctionDISI advances based on its lead DocIdSetIterator, and doesn't 
> consider that its other component, a BitSetIterator, may have already advanced 
> past a certain doc. This may result in duplicate documents.
> For example, suppose BitSetConjunctionDISI _disi_ is composed of DocIdSetIterator 
> _a_ over docs [0,1] and BitSetIterator _b_ over docs [0,1]. Calling `b.nextDoc()` 
> collects doc0; calling `disi.nextDoc()` afterwards collects the same doc0 
> again.
> It seems that other conjunction iterators don't have this behaviour: if we 
> advance any of their components past a certain document, the whole 
> conjunction iterator is also advanced past this document. 
>  
> This behaviour was exposed in this 
> [PR|https://github.com/apache/lucene-solr/pull/1903]. 
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] mayya-sharipova commented on a change in pull request #1903: Fix bug in sort optimization

2020-09-23 Thread GitBox


mayya-sharipova commented on a change in pull request #1903:
URL: https://github.com/apache/lucene-solr/pull/1903#discussion_r493610854



##
File path: 
lucene/core/src/test/org/apache/lucene/search/TestFieldSortOptimizationSkipping.java
##
@@ -432,7 +439,48 @@ public void testDocSortOptimization() throws IOException {
   assertTrue(topDocs.totalHits.value < 10); // assert that very few docs 
were collected
 }
 
+reader.close();
+dir.close();
+  }
+
+  /**
+   * Test that sorting on _doc works correctly.
+   * This test goes through DefaultBulkSorter::scoreRange, where 
scorerIterator is BitSetIterator.
+   * As a conjunction of this BitSetIterator with DocComparator's iterator, we 
get BitSetConjunctionDISI.
+   * BitSetConjunctionDISI advances based on the DocComparator's iterator, and 
doesn't consider
+   * that its BitSetIterator may have advanced past a certain doc. 

Review comment:
   Issue created: https://issues.apache.org/jira/browse/LUCENE-9541





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] arafalov commented on pull request #1863: SOLR-14701: GuessSchemaFields URP to replace AddSchemaFields URP in schemaless mode

2020-09-23 Thread GitBox


arafalov commented on pull request #1863:
URL: https://github.com/apache/lucene-solr/pull/1863#issuecomment-697406760


   Ok, I am glad we are on the same page that the current (let's call it _Add_) 
solution is rather bad despite all the great work put into it. Let's now get 
onto the same page about the next step you are actually proposing. I can read 
the rest of your statement in one of the following ways:
   
   1. Neither the original _Add_ nor the proposed _Guess_ solution will address 
the problem. **Next step: that discussion is not about code and should be taken 
up in the parent JIRA.** 
   That's exactly what it is there for, and this code/PR is here to push the 
discussion from theoretical to practical.
   2. The _Guess_ approach is OK overall, but the schema creation is still bad; 
could it return schema generation commands instead? I just double-checked the 
code and there is no way for the current architecture to return non-error 
feedback (from either the processCommit or the SimplePostTool side). **Next 
step: propose a way this could be done.** 
   Do note that the reason we are still a URP is that any schema guessing or 
creation depends on the previous URPs in the chain always being enabled (e.g. 
for custom date formats); that is one of the things really broken with the 
enable/disable flag in the _Add_ solution and why I am doing the single-URP-level 
flag.
   3. We need some other _Guess_ approach. **Next action: propose an alternative 
architecture, preferably as a straw-man implementation.**
   This would give people on JIRA a chance to select from TWO ways forward; that 
would be amazing whether we end up with one, the other, or a merged solution.
   4. ??? Use a veto and keep the status quo until somebody else has a much 
better idea than the people in the last 3 JIRAs? 
   5. ??? (I don't claim to read your mind, but I want to move this discussion 
forward in concrete non-blocking steps)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-8281) Add RollupMergeStream to Streaming API

2020-09-23 Thread Gus Heck (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-8281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200848#comment-17200848
 ] 

Gus Heck commented on SOLR-8281:


This seems related to something I wanted to do for a client... I had reduce() 
with group() and I wanted to then feed the groups to an arbitrary streaming 
expression for further processing, and have the result show up in the groups 
(the result would have been a matrix). The problem I stopped on was how to 
express the stream that processes the group without it having a source (the 
source is the group).

> Add RollupMergeStream to Streaming API
> --
>
> Key: SOLR-8281
> URL: https://issues.apache.org/jira/browse/SOLR-8281
> Project: Solr
>  Issue Type: Bug
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Major
>
> The RollupMergeStream merges the aggregate results emitted by the 
> RollupStream on *worker* nodes.
> This is designed to be used in conjunction with the HashJoinStream to perform 
> rollup Aggregations on the joined Tuples. The HashJoinStream will require the 
> tuples to be partitioned on the Join keys. To avoid needing to repartition on 
> the *group by* fields for the RollupStream, we can perform a merge of the 
> rolled up Tuples coming from the workers.
> The construct would look like this:
> {code}
> mergeRollup (...
>   parallel (...
> rollup (...
> hashJoin (
>   search(...),
>   search(...),
>   on="fieldA" 
> )
>  )
>  )
>)
> {code}
> The pseudo code above would push the *hashJoin* and *rollup* to the *worker* 
> nodes. The emitted rolled up tuples would be merged by the mergeRollup.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] HoustonPutman commented on pull request #1863: SOLR-14701: GuessSchemaFields URP to replace AddSchemaFields URP in schemaless mode

2020-09-23 Thread GitBox


HoustonPutman commented on pull request #1863:
URL: https://github.com/apache/lucene-solr/pull/1863#issuecomment-697420014


   Purely responding to the URP response part, it’s definitely not possible for 
a URP to send non-error responses. I do think it’s something we should implement 
though, since it will expand the use cases that URPs can solve. I’ll create a 
JIRA for it.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] mayya-sharipova opened a new pull request #1915: Fix bug in sort optimization (#1903)

2020-09-23 Thread GitBox


mayya-sharipova opened a new pull request #1915:
URL: https://github.com/apache/lucene-solr/pull/1915


   Fix a bug in how the iterator with skipping functionality
   advances and produces docs
   
   Relates to #1725
   Backport for #1903



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] mayya-sharipova merged pull request #1915: Fix bug in sort optimization (#1903)

2020-09-23 Thread GitBox


mayya-sharipova merged pull request #1915:
URL: https://github.com/apache/lucene-solr/pull/1915


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dalbani opened a new pull request #1916: Fix minor typo

2020-09-23 Thread GitBox


dalbani opened a new pull request #1916:
URL: https://github.com/apache/lucene-solr/pull/1916


   Ignoring the default issue template given that this PR is about a tiny fix 
for a typo. Right?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] Hronom commented on a change in pull request #1864: SOLR-14850 ExactStatsCache NullPointerException when shards.tolerant=true

2020-09-23 Thread GitBox


Hronom commented on a change in pull request #1864:
URL: https://github.com/apache/lucene-solr/pull/1864#discussion_r493662797



##
File path: solr/core/src/java/org/apache/solr/search/stats/ExactStatsCache.java
##
@@ -94,6 +94,12 @@ protected ShardRequest 
doRetrieveStatsRequest(ResponseBuilder rb) {
   protected void doMergeToGlobalStats(SolrQueryRequest req, 
List responses) {
 Set allTerms = new HashSet<>();
 for (ShardResponse r : responses) {
+  if 
("true".equalsIgnoreCase(req.getParams().get(ShardParams.SHARDS_TOLERANT)) && 
r.getException() != null) {

Review comment:
   @sigram @madrob I added a test that reproduces the problem in 
`TestExactStatsCache`; it fails with a NullPointerException if you remove my fix.
   
   Could you please adjust it (if needed) so it fits nicely into the Solr test 
suites? I have now set `Allow edits by maintainers`.
   
   The tricky part of this issue is that it is only reproducible when at least 
one shard is fully down (no healthy replica). This is why I didn't use 
`setDistributedParams`, since it adds one working replica, so all shards are 
healthy and there is no situation where one shard is completely down.
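
   For readers skimming the thread, a minimal sketch of what the guard in the 
diff above does, assuming the loop body otherwise dereferences the shard's 
response:

{code:java}
// Sketch of the guard from the diff (context simplified): with
// shards.tolerant=true, a downed shard yields a ShardResponse that carries
// an exception and no usable response payload, so it must be skipped
// instead of triggering a NullPointerException downstream.
for (ShardResponse r : responses) {
  if ("true".equalsIgnoreCase(req.getParams().get(ShardParams.SHARDS_TOLERANT))
      && r.getException() != null) {
    continue; // tolerate the failed shard and merge stats from the rest
  }
  // ... merge this shard's term statistics into the global stats ...
}
{code}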





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] madrob commented on pull request #1916: Fix minor typo

2020-09-23 Thread GitBox


madrob commented on pull request #1916:
URL: https://github.com/apache/lucene-solr/pull/1916#issuecomment-697511015


   Thank you for finding and correcting this!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] madrob merged pull request #1916: Fix minor typo

2020-09-23 Thread GitBox


madrob merged pull request #1916:
URL: https://github.com/apache/lucene-solr/pull/1916


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-8281) Add RollupMergeStream to Streaming API

2020-09-23 Thread Joel Bernstein (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-8281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200886#comment-17200886
 ] 

Joel Bernstein commented on SOLR-8281:
--

[~gus], feel free to send me an email to discuss.

> Add RollupMergeStream to Streaming API
> --
>
> Key: SOLR-8281
> URL: https://issues.apache.org/jira/browse/SOLR-8281
> Project: Solr
>  Issue Type: Bug
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Major
>
> The RollupMergeStream merges the aggregate results emitted by the 
> RollupStream on *worker* nodes.
> This is designed to be used in conjunction with the HashJoinStream to perform 
> rollup Aggregations on the joined Tuples. The HashJoinStream will require the 
> tuples to be partitioned on the Join keys. To avoid needing to repartition on 
> the *group by* fields for the RollupStream, we can perform a merge of the 
> rolled up Tuples coming from the workers.
> The construct would look like this:
> {code}
> mergeRollup (...
>   parallel (...
> rollup (...
> hashJoin (
>   search(...),
>   search(...),
>   on="fieldA" 
> )
>  )
>  )
>)
> {code}
> The pseudo code above would push the *hashJoin* and *rollup* to the *worker* 
> nodes. The emitted rolled up tuples would be merged by the mergeRollup.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jpountz opened a new pull request #1917: LUCENE-9535: Make ByteBuffersDataOutput#ramBytesUsed run in constant-time.

2020-09-23 Thread GitBox


jpountz opened a new pull request #1917:
URL: https://github.com/apache/lucene-solr/pull/1917


   This is called transitively from 
`DocumentsWriterFlushControl#doAfterDocument` which is synchronized and appears 
to be a point of contention.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9535) Investigate recent indexing slowdown for wikimedium documents

2020-09-23 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200946#comment-17200946
 ] 

Adrien Grand commented on LUCENE-9535:
--

I might have found something. When profiling indexing I noticed some contention 
in {{DocumentsWriterFlushControl#doAfterDocument}}, which happens to 
transitively call {{IndexingChain#ramBytesUsed}}, which was changed in 
LUCENE-9511 to call {{StoredFieldsWriter#ramBytesUsed}}. And 
{{StoredFieldsWriter#ramBytesUsed}} calls 
{{ByteBuffersDataOutput#ramBytesUsed}} which is a bit slow since it iterates 
over all pages. So we might have increased contention on 
{{DocumentsWriterFlushControl#doAfterDocument}} in LUCENE-9511, and this is 
only noticeable on Mike's beast because of the very high number of indexing 
threads (36). I opened https://github.com/apache/lucene-solr/pull/1917.
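
The gist of the fix in #1917, as a hedged sketch (the real class tracks more 
state; this only shows the O(n)-per-call to O(1) change):

{code:java}
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch with simplified, assumed names: maintain a running
// counter at page-allocation time instead of walking every page on each call.
class PagedOutputSketch {
  private final List<ByteBuffer> blocks = new ArrayList<>();
  private long ramBytesUsed; // kept up to date incrementally

  void addBlock(int capacity) {
    blocks.add(ByteBuffer.allocate(capacity));
    ramBytesUsed += capacity; // O(1) bookkeeping when the page is created
  }

  // Called transitively from a synchronized flush-control path, so it must
  // be cheap:
  long ramBytesUsed() {
    return ramBytesUsed; // constant-time: no iteration over blocks
  }
}
{code}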

> Investigate recent indexing slowdown for wikimedium documents
> -
>
> Key: LUCENE-9535
> URL: https://issues.apache.org/jira/browse/LUCENE-9535
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: cpu_profile.svg
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Nightly benchmarks report a ~10% slowdown for 1kB documents as of September 
> 9th: [http://people.apache.org/~mikemccand/lucenebench/indexing.html].
> On that day, we added stored fields in DWPT accounting (LUCENE-9511), so I 
> first thought this could be due to smaller flushed segments and more merging, 
> but I still wonder whether there's something else. The benchmark runs with 
> 8GB of heap, 2GB of RAM buffer and 36 indexing threads. So it's about 2GB/36 
> = 57MB of RAM buffer per thread in the worst-case scenario that all DWPTs get 
> full at the same time. Stored fields account for about 0.7MB of memory, or 1% 
> of the indexing buffer size. How can a 1% reduction of buffering capacity 
> explain a 10% indexing slowdown? I looked into this further by running 
> indexing benchmarks locally with 8 indexing threads and 128MB of indexing 
> buffer memory, which would make this issue even more apparent if the smaller 
> RAM buffer was the cause, but I'm not seeing a regression and actually I'm 
> seeing a similar number of flushes when I disabled memory accounting for stored 
> fields.
> I ran indexing under a profiler to see whether something else could cause 
> this slowdown, e.g. slow implementations of ramBytesUsed on stored fields 
> writers, but nothing surprising showed up and the profile looked just like I 
> would have expected.
> Another question I have is why the 4kB benchmark is not affected at all.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jpountz merged pull request #1917: LUCENE-9535: Make ByteBuffersDataOutput#ramBytesUsed run in constant-time.

2020-09-23 Thread GitBox


jpountz merged pull request #1917:
URL: https://github.com/apache/lucene-solr/pull/1917


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9535) Investigate recent indexing slowdown for wikimedium documents

2020-09-23 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201008#comment-17201008
 ] 

ASF subversion and git services commented on LUCENE-9535:
-

Commit d226abd4481a5bd837264a7c53d1b13f417842ad in lucene-solr's branch 
refs/heads/master from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=d226abd ]

LUCENE-9535: Make ByteBuffersDataOutput#ramBytesUsed run in constant-time. 
(#1917)



> Investigate recent indexing slowdown for wikimedium documents
> -
>
> Key: LUCENE-9535
> URL: https://issues.apache.org/jira/browse/LUCENE-9535
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: cpu_profile.svg
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Nightly benchmarks report a ~10% slowdown for 1kB documents as of September 
> 9th: [http://people.apache.org/~mikemccand/lucenebench/indexing.html].
> On that day, we added stored fields in DWPT accounting (LUCENE-9511), so I 
> first thought this could be due to smaller flushed segments and more merging, 
> but I still wonder whether there's something else. The benchmark runs with 
> 8GB of heap, 2GB of RAM buffer and 36 indexing threads. So it's about 2GB/36 
> = 57MB of RAM buffer per thread in the worst-case scenario that all DWPTs get 
> full at the same time. Stored fields account for about 0.7MB of memory, or 1% 
> of the indexing buffer size. How can a 1% reduction of buffering capacity 
> explain a 10% indexing slowdown? I looked into this further by running 
> indexing benchmarks locally with 8 indexing threads and 128MB of indexing 
> buffer memory, which would make this issue even more apparent if the smaller 
> RAM buffer was the cause, but I'm not seeing a regression and actually I'm 
> seeing a similar number of flushes when I disabled memory accounting for stored 
> fields.
> I ran indexing under a profiler to see whether something else could cause 
> this slowdown, e.g. slow implementations of ramBytesUsed on stored fields 
> writers, but nothing surprising showed up and the profile looked just like I 
> would have expected.
> Another question I have is why the 4kB benchmark is not affected at all.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9535) Investigate recent indexing slowdown for wikimedium documents

2020-09-23 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201011#comment-17201011
 ] 

ASF subversion and git services commented on LUCENE-9535:
-

Commit a83c2c2ab00fea84ea48053a53276db905f05000 in lucene-solr's branch 
refs/heads/branch_8x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=a83c2c2 ]

LUCENE-9535: Make ByteBuffersDataOutput#ramBytesUsed run in constant-time. 
(#1917)



> Investigate recent indexing slowdown for wikimedium documents
> -
>
> Key: LUCENE-9535
> URL: https://issues.apache.org/jira/browse/LUCENE-9535
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: cpu_profile.svg
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Nightly benchmarks report a ~10% slowdown for 1kB documents as of September 
> 9th: [http://people.apache.org/~mikemccand/lucenebench/indexing.html].
> On that day, we added stored fields in DWPT accounting (LUCENE-9511), so I 
> first thought this could be due to smaller flushed segments and more merging, 
> but I still wonder whether there's something else. The benchmark runs with 
> 8GB of heap, 2GB of RAM buffer and 36 indexing threads. So it's about 2GB/36 
> = 57MB of RAM buffer per thread in the worst-case scenario that all DWPTs get 
> full at the same time. Stored fields account for about 0.7MB of memory, or 1% 
> of the indexing buffer size. How can a 1% reduction of buffering capacity 
> explain a 10% indexing slowdown? I looked into this further by running 
> indexing benchmarks locally with 8 indexing threads and 128MB of indexing 
> buffer memory, which would make this issue even more apparent if the smaller 
> RAM buffer was the cause, but I'm not seeing a regression and actually I'm 
> seeing a similar number of flushes when I disabled memory accounting for stored 
> fields.
> I ran indexing under a profiler to see whether something else could cause 
> this slowdown, e.g. slow implementations of ramBytesUsed on stored fields 
> writers, but nothing surprising showed up and the profile looked just like I 
> would have expected.
> Another question I have is why the 4kB benchmark is not affected at all.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org





[GitHub] [lucene-solr] arafalov commented on pull request #1863: SOLR-14701: GuessSchemaFields URP to replace AddSchemaFields URP in schemaless mode

2020-09-23 Thread GitBox


arafalov commented on pull request #1863:
URL: https://github.com/apache/lucene-solr/pull/1863#issuecomment-697846565


   > Purely responding to the URP response part, it’s definitely not possible 
for URP to send non-error responses. I do think it’s something we should 
implement though, since it will expand the use cases that URPs can solve. I’ll 
create a JIRA for it.
   
   It may be possible to future-proof this implementation by making 
**guess-schema** a mode switch instead of the current present/absent flag. So, 
maybe rename it to **guess-mode** with the following options (a SolrJ sketch 
follows the list): 
   - **update** - basically the current (and only) option, 
   - **show** - (if/when there is a way to return suggested JSON), 
   - **update-all** - (if we wanted to - sometimes - have specific fields even 
if a dynamicField definition matches; could be done now if useful), 
   - **none** - to make tool support easier. 
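
   A hedged SolrJ sketch of what setting such a mode switch could look like from
   a client; the `guess-mode` parameter and its values are the proposal above,
   not an existing Solr parameter, and the collection/field names are examples:

   ```java
   import org.apache.solr.client.solrj.SolrClient;
   import org.apache.solr.client.solrj.impl.HttpSolrClient;
   import org.apache.solr.client.solrj.request.UpdateRequest;
   import org.apache.solr.common.SolrInputDocument;

   public class GuessModeSketch {
     public static void main(String[] args) throws Exception {
       try (SolrClient client =
           new HttpSolrClient.Builder("http://localhost:8983/solr/films").build()) {
         SolrInputDocument doc = new SolrInputDocument();
         doc.addField("id", "1");
         doc.addField("name", "The Matrix");
         UpdateRequest req = new UpdateRequest();
         req.add(doc);
         // Hypothetical parameter from the proposal above.
         req.setParam("guess-mode", "show");
         req.process(client);
       }
     }
   }
   ```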



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] s1monw opened a new pull request #1918: LUCENE-9535: Commit DWPT bytes used before locking indexing

2020-09-23 Thread GitBox


s1monw opened a new pull request #1918:
URL: https://github.com/apache/lucene-solr/pull/1918


   Currently we calculate the ramBytesUsed by the DWPT under the flushControl
   lock. We can do this calculation safely outside of the lock without any
   downside. The FlushControl lock should be used with care since it's a
   central part of indexing and might block all indexing.
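
   A minimal sketch of the pattern, with hypothetical names standing in for the
   real DWPT/FlushControl classes: do the (possibly non-trivial) size
   calculation before entering the shared lock, so the central lock only guards
   cheap bookkeeping.

   ```java
   // Hypothetical stand-ins, not the actual Lucene classes.
   class FlushControlSketch {
     private long activeBytes;

     synchronized long commitBytesUsed(long dwptBytes) {
       activeBytes += dwptBytes; // only cheap bookkeeping under the lock
       return activeBytes;
     }
   }

   class IndexingThreadSketch {
     static long ramBytesUsed() { // stands in for DWPT#ramBytesUsed
       return 1024;               // hypothetical value
     }

     static void afterDocIndexed(FlushControlSketch flushControl) {
       long bytes = ramBytesUsed();         // computed outside the lock
       flushControl.commitBytesUsed(bytes); // lock held only for the add
     }
   }
   ```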
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (SOLR-14892) shards.info with shards.tolerant can yield an empty key

2020-09-23 Thread David Smiley (Jira)
David Smiley created SOLR-14892:
---

 Summary: shards.info with shards.tolerant can yield an empty key
 Key: SOLR-14892
 URL: https://issues.apache.org/jira/browse/SOLR-14892
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: search
Reporter: David Smiley


When using shards.tolerant=true and shards.info=true, if a shard isn't 
available (and maybe in other circumstances), the shards.info section of the 
response may have an empty-string key child with a value that is ambiguous as 
to which shard(s) couldn't be reached.

This problem can be revealed by modifying 
org.apache.solr.cloud.TestDownShardTolerantSearch#searchingShouldFailWithoutTolerantSearchSetToTrue
 to add shards.info and then examine the response in a debugger.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-14892) shards.info with shards.tolerant can yield an empty key

2020-09-23 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated SOLR-14892:

Attachment: solr14892.png

> shards.info with shards.tolerant can yield an empty key
> ---
>
> Key: SOLR-14892
> URL: https://issues.apache.org/jira/browse/SOLR-14892
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: search
>Reporter: David Smiley
>Priority: Minor
> Attachments: solr14892.png
>
>
> When using shards.tolerant=true and shards.info=true, if a shard isn't 
> available (and maybe in other circumstances), the shards.info section of the 
> response may have an empty-string key child with a value that is ambiguous as 
> to which shard(s) couldn't be reached.
> This problem can be revealed by modifying 
> org.apache.solr.cloud.TestDownShardTolerantSearch#searchingShouldFailWithoutTolerantSearchSetToTrue
>  to add shards.info and then examine the response in a debugger.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (SOLR-14893) Allow UpdateRequestProcessors to add non-error messages to the response

2020-09-23 Thread Houston Putman (Jira)
Houston Putman created SOLR-14893:
-

 Summary: Allow UpdateRequestProcessors to add non-error messages 
to the response
 Key: SOLR-14893
 URL: https://issues.apache.org/jira/browse/SOLR-14893
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
  Components: UpdateRequestProcessors
Reporter: Houston Putman


There are many reasons why an UpdateRequestProcessor would want to send a 
response back to the user:
 * Informing the user on the results when they use schema-guessing mode 
(SOLR-14701)
 * Building a new Processor that uses the lucene monitor library to alert on 
incoming documents that match saved queries
 * The Language detection URPs could respond with the languages selected for 
each document.

Currently URPs can be passed the Response object via the URPFactory that 
creates them. However, whenever the URP is placed in the chain after the 
DistributedURP, the response that it sends back will be dismissed by the DURP 
and not merged and sent back to the user.

The bulk of the logic here would be to add logic in the DURP to accept custom 
messages in the responses of the updates it sends, and then merge those into an 
overall response to send to the user. Each URP could be responsible for merging 
its section of responses, because that will likely contain business logic for 
the URP that the DURP is not aware of.

 

The SolrJ classes would also need updates to give the user an easy way to read 
response messages.
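
For illustration, a hedged sketch of how an URP can put a non-error message on the response it was constructed with today; per this issue, such messages are currently lost whenever the URP runs after the DistributedURP. The "alerts" key and the message format are made up for illustration:

{code:java}
import java.io.IOException;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class AlertingProcessorFactory extends UpdateRequestProcessorFactory {
  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
                                            SolrQueryResponse rsp,
                                            UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        // Visible in the response only if no DURP discards it downstream.
        rsp.add("alerts", "doc " + cmd.getPrintableId() + " matched a saved query");
        super.processAdd(cmd);
      }
    };
  }
}
{code}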



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14892) shards.info with shards.tolerant can yield an empty key

2020-09-23 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201038#comment-17201038
 ] 

David Smiley commented on SOLR-14892:
-

I chased this down to 
org.apache.solr.handler.component.HttpShardHandler#createSliceShardsStr which, 
when given an empty list, returns an empty string. It should probably return 
null. But null has ripple effects in many places which assume non-null values 
and maybe were written without shards.tolerant in mind. Let's say it remains an 
empty string. SearchHandler.handleRequestBody loops over "sreq.actualShards", 
which can yield that empty string. I hoped simply "continue"-ing the loop on 
this occurrence might help, but it led to some other mystery. The code involved 
here is, in general, awfully messy.
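
A tiny sketch of the symptom (assumed mechanics, based on the description above): joining an empty list of replica URLs yields an empty string rather than null, which then surfaces as an empty shards.info key:

{code:java}
import java.util.Collections;
import java.util.List;

public class EmptyKeySketch {
  public static void main(String[] args) {
    List<String> replicaUrls = Collections.emptyList(); // e.g. all replicas down
    String sliceShardsStr = String.join("|", replicaUrls);
    System.out.println("[" + sliceShardsStr + "]"); // prints []: "" and not null
  }
}
{code}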

> shards.info with shards.tolerant can yield an empty key
> ---
>
> Key: SOLR-14892
> URL: https://issues.apache.org/jira/browse/SOLR-14892
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: search
>Reporter: David Smiley
>Priority: Minor
> Attachments: solr14892.png
>
>
> When using shards.tolerant=true and shards.info=true, if a shard isn't 
> available (and maybe in other circumstances), the shards.info section of the 
> response may have an empty-string key child with a value that is ambiguous as 
> to which shard(s) couldn't be reached.
> This problem can be revealed by modifying 
> org.apache.solr.cloud.TestDownShardTolerantSearch#searchingShouldFailWithoutTolerantSearchSetToTrue
>  to add shards.info and then examine the response in a debugger.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dweiss commented on pull request #1836: LUCENE-9317: Clean up split package in analyzers-common

2020-09-23 Thread GitBox


dweiss commented on pull request #1836:
URL: https://github.com/apache/lucene-solr/pull/1836#issuecomment-697932731


   Hi Tomoko. The patch looks good to me (precommit doesn't pass though). I 
would commit it once you get precommit to work - this issue has been out 
there for a while and nobody objected. If there is a need for changes (on 
master), we'll just follow up.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dweiss commented on a change in pull request #1905: LUCENE-9488 Release with Gradle Part 2

2020-09-23 Thread GitBox


dweiss commented on a change in pull request #1905:
URL: https://github.com/apache/lucene-solr/pull/1905#discussion_r493853135



##
File path: lucene/build.gradle
##
@@ -15,8 +15,56 @@
  * limitations under the License.
  */
 
+// Should we do this as :lucene:packaging similar to how Solr does it?
+// Or is this fine here?
+
+plugins {
+  id 'distribution'
+}
+
 description = 'Parent project for Apache Lucene Core'
 
 subprojects {
   group "org.apache.lucene"
-}
\ No newline at end of file
+}
+
+distributions {
+  main {
+  // This is empirically wrong, but it is mostly a copy from `ant 
package-zip`

Review comment:
   Haven't forgotten about it, just busy with work. Those release scripts 
will have to be adjusted for Solr and Lucene being released independently in 
the future. Which requires independent builds, which requires the repo split. 
Will have to get to it, eventually. Sigh.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1900: SOLR-14036: Remove explicit distrib=false from /terms handler

2020-09-23 Thread GitBox


dsmiley commented on a change in pull request #1900:
URL: https://github.com/apache/lucene-solr/pull/1900#discussion_r493862362



##
File path: solr/solr-ref-guide/src/major-changes-in-solr-9.adoc
##
@@ -128,6 +128,8 @@ _(raw; not yet edited)_
 * SOLR-14510: The `writeStartDocumentList` in `TextResponseWriter` now 
receives an extra boolean parameter representing the "exactness" of the 
numFound value (exact vs approximation).
   Any custom response writer extending `TextResponseWriter` will need to 
implement this abstract method now (instead previous with the same name but 
without the new boolean parameter).
 
+* SOLR-14036: Implicit /terms handler now supports distributed search by 
default, when running in cloud mode.

Review comment:
   Reworded to help a user think through upgrading:
   ```suggestion
   * SOLR-14036: Implicit /terms handler now returns terms across all shards in 
SolrCloud instead of only the local core.  Users/apps may be assuming the old 
behavior.  A request can be modified via the standard distrib=false param to 
only use the local core receiving the request.
   ```
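
   For upgraders, a hedged SolrJ sketch of opting back into the old behavior
   with the standard `distrib` parameter; the collection and field names below
   are just examples:

   ```java
   import org.apache.solr.client.solrj.SolrClient;
   import org.apache.solr.client.solrj.SolrQuery;
   import org.apache.solr.client.solrj.impl.HttpSolrClient;
   import org.apache.solr.client.solrj.response.QueryResponse;

   public class LocalTermsSketch {
     public static void main(String[] args) throws Exception {
       try (SolrClient client =
           new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
         SolrQuery q = new SolrQuery();
         q.setRequestHandler("/terms");
         q.set("terms", true);
         q.set("terms.fl", "title");
         q.set("distrib", false); // only the core receiving the request
         QueryResponse rsp = client.query("films", q);
         System.out.println(rsp.getResponse());
       }
     }
   }
   ```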





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14354) HttpShardHandler send requests in async

2020-09-23 Thread Ishan Chattopadhyaya (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201067#comment-17201067
 ] 

Ishan Chattopadhyaya commented on SOLR-14354:
-

This doesn't have associated performance benchmarks for 8.7. 

bq. Would you recommend reverting from 8x? I'm not sure; it hasn't been shown 
to cause test failures that we can attribute here so seems safe from that end. 
At least where I work, it's something we'll use in our 8x fork and can serve as 
a canary.
We need to stop treating our users as guinea pigs.

-1 for 8.7 unless this is somehow made optional or there are performance 
benchmarks to prove its efficiency.

> HttpShardHandler send requests in async
> ---
>
> Key: SOLR-14354
> URL: https://issues.apache.org/jira/browse/SOLR-14354
> Project: Solr
>  Issue Type: Improvement
>Reporter: Cao Manh Dat
>Assignee: Cao Manh Dat
>Priority: Blocker
> Fix For: master (9.0), 8.7
>
> Attachments: image-2020-03-23-10-04-08-399.png, 
> image-2020-03-23-10-09-10-221.png, image-2020-03-23-10-12-00-661.png
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> h2. 1. Current approach (problem) of Solr
> Below is a diagram describing how a request is currently handled.
> !image-2020-03-23-10-04-08-399.png!
> The main thread that handles the search request will submit n requests (n 
> equal to the number of shards) to an executor. So each request will correspond 
> to a thread; after sending a request, that thread basically does nothing but 
> wait for a response from the other side. That thread will be swapped out and 
> the CPU will try to handle another thread (this is called a context switch: the 
> CPU saves the context of the current thread and switches to another one). When 
> some data (not all) come back, that thread will be called to parse these data, 
> then it will wait until more data come back. So there will be lots of context 
> switching in the CPU. That is quite an inefficient use of threads. Basically we 
> want fewer threads, and most of them must be busy all the time, because threads 
> are not free, and neither is context switching. That is the main idea behind 
> everything, like executors.
> h2. 2. Async call of Jetty HttpClient
> Jetty HttpClient offers async API like this.
> {code:java}
> httpClient.newRequest("http://domain.com/path")
> // Add request hooks
> .onRequestQueued(request -> { ... })
> .onRequestBegin(request -> { ... })
> // Add response hooks
> .onResponseBegin(response -> { ... })
> .onResponseHeaders(response -> { ... })
> .onResponseContent((response, buffer) -> { ... })
> .send(result -> { ... }); {code}
> Therefore after calling {{send()}} the thread will return immediately without 
> blocking. Then when the client receives the header from the other side, it 
> will call the {{onHeaders()}} listeners. When the client receives some 
> {{byte[]}} (not the whole response) it will call the {{onContent(buffer)}} 
> listeners. When everything is finished it will call the {{onComplete}} 
> listeners. One main thing to notice here is that all listeners should finish 
> quickly; if a listener blocks, all further data of that request won't be 
> handled until the listener finishes.
> h2. 3. Solution 1: Sending requests async but spin one thread per response
>  Jetty HttpClient already provides several listeners, one of them is 
> InputStreamResponseListener. This is how it is get used
> {code:java}
> InputStreamResponseListener listener = new InputStreamResponseListener();
> client.newRequest(...).send(listener);
> // Wait for the response headers to arrive
> Response response = listener.get(5, TimeUnit.SECONDS);
> if (response.getStatus() == 200) {
>   // Obtain the input stream on the response content
>   try (InputStream input = listener.getInputStream()) {
> // Read the response content
>   }
> } {code}
> In this case, there will be two threads:
>  * one thread trying to read the response content from the InputStream
>  * one thread (a short-lived task) feeding content to the above 
> InputStream whenever some byte[] is available. Note that if this thread is 
> unable to feed data into the InputStream, it will wait.
> By using this one, the model of HttpShardHandler can be written into 
> something like this
> {code:java}
> handler.sendReq(req, (is) -> {
>   executor.submit(() ->
> try (is) {
>   // Read the content from InputStream
> }
>   )
> }) {code}
>  The first diagram will be changed into this
> !image-2020-03-23-10-09-10-221.png!
> Notice that although "sending req to shard1" is wide, it won't take a long time 
> since sending a req is a very quick operation. With this change, handling 
> threads won't be spun up until the first bytes are sent back. Notice that i

[GitHub] [lucene-solr] madrob commented on a change in pull request #1905: LUCENE-9488 Release with Gradle Part 2

2020-09-23 Thread GitBox


madrob commented on a change in pull request #1905:
URL: https://github.com/apache/lucene-solr/pull/1905#discussion_r493884949



##
File path: lucene/build.gradle
##
@@ -15,8 +15,56 @@
  * limitations under the License.
  */
 
+// Should we do this as :lucene:packaging similar to how Solr does it?
+// Or is this fine here?
+
+plugins {
+  id 'distribution'
+}
+
 description = 'Parent project for Apache Lucene Core'
 
 subprojects {
   group "org.apache.lucene"
-}
\ No newline at end of file
+}
+
+distributions {
+  main {
+  // This is empirically wrong, but it is mostly a copy from `ant 
package-zip`

Review comment:
   My goal here with getting things releasable is to also turn the smoke 
tester back on so that we can hopefully catch issues before we actually go to 
do the release. I understand there’s going to be more split related work, but 
that shouldn’t stop us from working on the pieces that we can work on before 
that. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9537) Add Indri Search Engine Functionality to Lucene

2020-09-23 Thread Cameron VandenBerg (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201089#comment-17201089
 ] 

Cameron VandenBerg commented on LUCENE-9537:


Hi Adrien,

Unfortunately, the smoothing score that we use is document specific, so I am 
not sure if I could make it "transferable".  I am definitely interested in 
brainstorming ways that we can make Indri fit into the Lucene architecture 
better though.  Perhaps an example of how Indri smoothing scores work would be 
helpful.

 

Suppose we have an index with 4 documents (so sorry for the political nature 
of the documents... it's just what I can easily think of at the moment):

1) Donald Trump is the president of the United States.

2) There are three branches of government.  The president is the head of the 
executive branch.

3) Jane Doe is president of the PTO.

4) Trump was elected in the 2016 election.

 

Say that the query is: President Trump.

In this index, the term president occurs more than the term Trump.  The 
smoothing score acts like an idf for the query terms, so that documents with 
just the term Trump will be ranked higher than documents with just the term 
president.

 

Consider documents 3 & 4, which have the same length and each have one search 
term, but Document 4 has the rarer search term.  Therefore the smoothing 
score for the term Trump in Document 3 will be lower than the smoothing score 
for the term president in Document 4.  The addition of the smoothing scores for 
the terms that don't exist allows Document 4 to get a higher score and be 
ranked above Document 3.
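
A small self-contained sketch of standard Dirichlet smoothing (the flavor Indri uses) applied to this toy example; mu, the collection term counts, and the document lengths below are made-up illustration values, not anything from the patch:

{code:java}
public class DirichletSketch {
  // p(t|d) = (tf + mu * p(t|C)) / (|d| + mu); with tf = 0 this is the
  // "smoothing score" a document earns for a query term it doesn't contain.
  static double smoothedProb(int tf, int docLen, double collectionProb, double mu) {
    return (tf + mu * collectionProb) / (docLen + mu);
  }

  public static void main(String[] args) {
    double mu = 10;               // toy value; real defaults are much larger
    double pPresident = 3.0 / 40; // "president" is the more common term
    double pTrump = 2.0 / 40;     // "Trump" is the rarer term
    int docLen = 7;               // documents 3 and 4 have equal length

    // Document 3 matches "president" only; Document 4 matches "Trump" only.
    double doc3 = Math.log(smoothedProb(1, docLen, pPresident, mu))
                + Math.log(smoothedProb(0, docLen, pTrump, mu));
    double doc4 = Math.log(smoothedProb(1, docLen, pTrump, mu))
                + Math.log(smoothedProb(0, docLen, pPresident, mu));

    // doc4 scores higher: matching the rarer term beats matching the common one.
    System.out.printf("doc3=%.3f doc4=%.3f%n", doc3, doc4);
  }
}
{code}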

 

Let me know whether this example makes sense.  Can you see a way that I can 
refactor the smoothing score so that it better fits into Lucene's existing 
architecture?  Or let me know if I misunderstood your comment and you still 
feel that what you suggested will work.

 

Thank you!

> Add Indri Search Engine Functionality to Lucene
> ---
>
> Key: LUCENE-9537
> URL: https://issues.apache.org/jira/browse/LUCENE-9537
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Reporter: Cameron VandenBerg
>Priority: Major
>  Labels: patch
> Attachments: LUCENE-INDRI.patch
>
>
> Indri ([http://lemurproject.org/indri.php]) is an academic search engine 
> developed by The University of Massachusetts and Carnegie Mellon University.  
> The major difference between Lucene and Indri is that Indri will give a 
> document a "smoothing score" to a document that does not contain the search 
> term, which has improved the search ranking accuracy in our experiments.  I 
> have created an Indri patch, which adds the search code needed to implement 
> the Indri AND logic as well as Indri's implementation of Dirichlet Smoothing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13682) command line option to export data to a file

2020-09-23 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201092#comment-17201092
 ] 

David Smiley commented on SOLR-13682:
-

The ref-guide addition to solr-control-script-reference.adoc is nice, but I was 
unable to find it there as a user.  I only found it using my committer 
sleuthing experience.  My first action as a user was to search the ref guide 
search box for the word "export" which uncovered exporting-result-sets.adoc.  
That page definitely seemed like it was spot-on, yet it didn't have information 
about this new cool tool.  Can you add a link there [~noble.paul]?

> command line option to export data to a file
> 
>
> Key: SOLR-13682
> URL: https://issues.apache.org/jira/browse/SOLR-13682
> Project: Solr
>  Issue Type: Improvement
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Major
> Fix For: 8.3
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> example
> {code:java}
> bin/solr export -url http://localhost:8983/solr/gettingstarted
> {code}
> This will export all the docs in a collection called {{gettingstarted}} into 
> a file called {{gettingstarted.json}}
> additional options are
>  * {{format}} : {{jsonl}} (default) or {{javabin}}
>  * {{out}} : export file name 
>  * {{query}} : a custom query, default is {{*:*}}
>  * {{fields}}: a comma-separated list of fields to be exported
>  * {{limit}} : no. of docs; default is 100, send {{-1}} to export all the 
> docs
> h2. Importing using {{curl}}
> importing json file
> {code:java}
> curl -X POST -d @gettingstarted.json 
> http://localhost:18983/solr/gettingstarted/update/json/docs?commit=true
> {code}
> importing javabin format file
> {code:java}
> curl -X POST --header "Content-Type: application/javabin" --data-binary 
> @gettingstarted.javabin 
> http://localhost:7574/solr/gettingstarted/update?commit=true
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] uschindler commented on pull request #1836: LUCENE-9317: Clean up split package in analyzers-common

2020-09-23 Thread GitBox


uschindler commented on pull request #1836:
URL: https://github.com/apache/lucene-solr/pull/1836#issuecomment-697990504


   > Create fake factory base classes in o.a.l.a.util for backward 
compatibility (?)
   
   We do this only in Lucene 9, so more important to add all changes to 
MIGRATE.md
   
   > Fix tests
   
   I mentioned this, as the META-INF/services files are not updated. This makes 
renamed analyzers fail to load, as SPI can't find them.
   As said before, we need an SPI load test that ensures that all analyzer 
components have a factory that loads successfully with SPI. Maybe move that test 
(abstract) to test-framework and create a test implementation instance for each 
module containing factories. The test in analysis/common is not enough anymore.
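
   As a hedged illustration of what such a test guards (package location shown
   as of 8.x; the split-package work may move it): factories are looked up by
   name via SPI, so a factory missing from META-INF/services simply fails to
   load:

   ```java
   import java.util.HashMap;
   import org.apache.lucene.analysis.util.TokenFilterFactory;

   public class SpiLookupSketch {
     public static void main(String[] args) {
       // Throws IllegalArgumentException if SPI can't find the factory,
       // e.g. when META-INF/services wasn't updated after a rename.
       TokenFilterFactory f = TokenFilterFactory.forName("lowercase", new HashMap<>());
       System.out.println(f.getClass().getName());
     }
   }
   ```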
   
   > Fix gradle scripts (?)
   
   jflex regenerate may need to be adapted.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] goankur commented on a change in pull request #1893: LUCENE-9444 Utility class to get facet labels from taxonomy for a fac…

2020-09-23 Thread GitBox


goankur commented on a change in pull request #1893:
URL: https://github.com/apache/lucene-solr/pull/1893#discussion_r493919329



##
File path: 
lucene/facet/src/test/org/apache/lucene/facet/taxonomy/TestTaxonomyLabels.java
##
@@ -0,0 +1,192 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.facet.taxonomy;
+
+import org.apache.lucene.document.Document;
+import org.apache.lucene.facet.FacetField;
+import org.apache.lucene.facet.FacetTestCase;
+import org.apache.lucene.facet.FacetsCollector;
+import org.apache.lucene.facet.FacetsCollector.MatchingDocs;
+import org.apache.lucene.facet.FacetsConfig;
+import org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyReader;
+import org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter;
+import org.apache.lucene.index.IndexWriterConfig;
+import org.apache.lucene.index.RandomIndexWriter;
+import org.apache.lucene.search.DocIdSetIterator;
+import org.apache.lucene.search.IndexSearcher;
+import org.apache.lucene.search.MatchAllDocsQuery;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.util.IOUtils;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+public class TestTaxonomyLabels extends FacetTestCase {
+
+  private List<Document> prepareDocuments() {
+    List<Document> docs = new ArrayList<>();
+
+    Document doc = new Document();
+    doc.add(new FacetField("Author", "Bob"));
+    doc.add(new FacetField("Publish Date", "2010", "10", "15"));
+    docs.add(doc);
+
+    doc = new Document();
+    doc.add(new FacetField("Author", "Lisa"));
+    doc.add(new FacetField("Publish Date", "2010", "10", "20"));
+    docs.add(doc);
+
+    doc = new Document();
+    doc.add(new FacetField("Author", "Tom"));
+    doc.add(new FacetField("Publish Date", "2012", "1", "1"));
+    docs.add(doc);
+
+    doc = new Document();
+    doc.add(new FacetField("Author", "Susan"));
+    doc.add(new FacetField("Publish Date", "2012", "1", "7"));
+    docs.add(doc);
+
+    doc = new Document();
+    doc.add(new FacetField("Author", "Frank"));
+    doc.add(new FacetField("Publish Date", "1999", "5", "5"));
+    docs.add(doc);
+
+    return docs;
+  }
+
+  private List<Integer> allDocIds(MatchingDocs m, boolean decreasingDocIds) throws IOException {
+    DocIdSetIterator disi = m.bits.iterator();
+    List<Integer> docIds = new ArrayList<>();
+    while (disi.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
+      docIds.add(disi.docID());
+    }
+
+    if (decreasingDocIds == true) {
+      Collections.reverse(docIds);
+    }
+    return docIds;
+  }
+
+  private List<FacetLabel> lookupFacetLabels(TaxonomyFacetLabels taxoLabels,
+                                             List<MatchingDocs> matchingDocs) throws IOException {
+    return lookupFacetLabels(taxoLabels, matchingDocs, null, false);
+  }
+
+  private List<FacetLabel> lookupFacetLabels(TaxonomyFacetLabels taxoLabels,
+                                             List<MatchingDocs> matchingDocs,
+                                             String dimension) throws IOException {
+    return lookupFacetLabels(taxoLabels, matchingDocs, dimension, false);
+  }
+
+  private List<FacetLabel> lookupFacetLabels(TaxonomyFacetLabels taxoLabels, List<MatchingDocs> matchingDocs, String dimension,
+                                             boolean decreasingDocIds) throws IOException {
+    List<FacetLabel> facetLabels = new ArrayList<>();
+
+    for (MatchingDocs m : matchingDocs) {
+      TaxonomyFacetLabels.FacetLabelReader facetLabelReader = taxoLabels.getFacetLabelReader(m.context);
+      List<Integer> docIds = allDocIds(m, decreasingDocIds);
+      FacetLabel facetLabel;
+      for (Integer docId : docIds) {
+        while (true) {
+          if (dimension != null) {
+            facetLabel = facetLabelReader.nextFacetLabel(docId, dimension);
+          } else {
+            facetLabel = facetLabelReader.nextFacetLabel(docId);
+          }
+
+          if (facetLabel == null) {
+            break;
+          }
+          facetLabels.add(facetLabel);
+        }
+      }
+    }
+
+    return facetLabels;
+  }
+
+
+  public void testBasic() throws Exception {

Review comment:
   Done in this revision




[jira] [Updated] (SOLR-14889) improve templated variable escaping in ref-guide _config.yml

2020-09-23 Thread Chris M. Hostetter (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris M. Hostetter updated SOLR-14889:
--
Attachment: SOLR-14889.patch
Status: Open  (was: Open)

I thought this would be straightforward, but there's clearly still a lot about 
the gradle lifecycle / order-of-evaluation that I don't understand.

The key change in the attached patch that this whole idea hinges on is..
{noformat}
-expand(templateProps)
+expand( templateProps.collectEntries({ k, v -> [k, 
v.replaceAll("'","''")]}) )
{noformat}
But for reasons I don't understand, this seems to bypass the changes made to 
{{templateProps}} in ' {{setupLazyProps.doFirst}} ', where the ivy version 
values are added...
{noformat}
Execution failed for task ':solr:solr-ref-guide:prepareSources'.
> Could not copy file 
> '/home/hossman/lucene/dev/solr/solr-ref-guide/src/_config.yml.template' to 
> '/home/hossman/lucene/dev/solr/solr-ref-guide/build/content/_config.yml'.
   > Missing property (ivyCommonsCodec) for Groovy template expansion. Defined 
keys [javadocLink, solrGuideDraftStatus, solrRootPath, solrDocsVersion, 
solrGuideVersionPath, htmlSolrJavadocs, htmlLuceneJavadocs, buildDate, 
buildYear, out].
{noformat}
(I'm also not clear where that 'out' key is coming from, but I have no idea 
whether it pre-dates this change.)

I experimented with adding a {{doFirst}} block to {{prepareSources}} that would 
copy the (escaped) templateProps into a newly defined Map in that task, which 
would then be used in the {{expand(...)}} call – but that still seemed to result 
in the {{expand(..)}} being evaluated before the {{doFirst}} modified the map 
(see the big commented-out nocommit block in the patch for what I mean).

[~uschindler] / [~dweiss] - can you help me understand what's going on here and 
how to do this "the right way" ?

> improve templated variable escaping in ref-guide _config.yml
> 
>
> Key: SOLR-14889
> URL: https://issues.apache.org/jira/browse/SOLR-14889
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: documentation
>Reporter: Chris M. Hostetter
>Assignee: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-14889.patch
>
>
> SOLR-14824 ran into windows failures when we switched from using a hardcoded 
> "relative" path to the solrRootPath to using groovy/project variables to get 
> the path.  The reason for the failures was that the path is used as a 
> variable templated into {{_config.yml.template}} to build the {{_config.yml}} 
> file, but on windows the path separator of '\' was being parsed by 
> jekyll/YAML as a string escape character.
> (This wasn't a problem we ran into before, even on windows, prior to the 
> SOLR-14824 changes, because the hardcoded relative path only used '/' 
> delimiters, which (j)ruby was happy to work with, even on windows.)
> As Uwe pointed out when hotfixing this...
> {quote}Problem was that backslashes are used to escape strings, but windows 
> paths also have those. Fix was to add StringEscapeUtils, but I don't like 
> this too much. Maybe we find a better solution to make special characters in 
> those properties escaped correctly when used in strings inside templates.
> {quote}
> ...the current fix of using {{StringEscapeUtils.escapeJava}} -- only for this 
> one variable -- doesn't really protect other variables that might have 
> special characters in them down the road, and while "escapeJava" works ok for 
> the "\" issue, it isn't necessarily consistent with all YAML escapes, which 
> could lead to even weird bugs/confusion down the road.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14889) improve templated variable escaping in ref-guide _config.yml

2020-09-23 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201157#comment-17201157
 ] 

Uwe Schindler commented on SOLR-14889:
--

That's very easy to explain: the expansion is done when the project is 
configured!

Previously it was working because you just set a pointer to the (still changing) 
props. Here the problem is that the collect loop is running during 
configuration phase.

To fix this, the whole expand must be delayed using lazy evaluation. It's late; 
will try before going to bed.

> improve templated variable escaping in ref-guide _config.yml
> 
>
> Key: SOLR-14889
> URL: https://issues.apache.org/jira/browse/SOLR-14889
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: documentation
>Reporter: Chris M. Hostetter
>Assignee: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-14889.patch
>
>
> SOLR-14824 ran into windows failures when we switched from using a hardcoded 
> "relative" path to the solrRootPath to using groovy/project variables to get 
> the path.  The reason for the failures was that the path is used as a 
> variable templated into {{_config.yml.template}} to build the {{_config.yml}} 
> file, but on windows the path separator of '\' was being parsed by 
> jekyll/YAML as a string escape character.
> (This wasn't a problem we ran into before, even on windows, prior to the 
> SOLR-14824 changes, because the hardcoded relative path only used '/' 
> delimiters, which (j)ruby was happy to work with, even on windows.)
> As Uwe pointed out when hotfixing this...
> {quote}Problem was that backslashes are used to escape strings, but windows 
> paths also have those. Fix was to add StringEscapeUtils, but I don't like 
> this too much. Maybe we find a better solution to make special characters in 
> those properties escaped correctly when used in strings inside templates.
> {quote}
> ...the current fix of using {{StringEscapeUtils.escapeJava}} -- only for this 
> one variable -- doesn't really protect other variables that might have 
> special characters in them down the road, and while "escapeJava" works ok for 
> the "\" issue, it isn't necessarily consistent with all YAML escapes, which 
> could lead to even weird bugs/confusion down the road.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-14889) improve templated variable escaping in ref-guide _config.yml

2020-09-23 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201157#comment-17201157
 ] 

Uwe Schindler edited comment on SOLR-14889 at 9/23/20, 11:13 PM:
-

That's very easy to explain: The expansion is done when the project is 
configured!

Previously it was working because you just set a pointer to the (still 
changing) props. Here the problem is that the collect loop is running during 
configuration phase and you set a pointer to the result during configuration.

To fix this the whole expand must be delayed using lazy evaluation. It's later, 
will try before going to bed.


was (Author: thetaphi):
That's very easy to explain: The expansion is done when the project is 
configured!

Previously it was working because you just set a pointer to the (still changing 
props). Here the problem is that the collect loop is running during 
configuration phase.

To fix this the whole expand must be delayed using lazy evaluation. It's later, 
will try before going to bed.

> improve templated variable escaping in ref-guide _config.yml
> 
>
> Key: SOLR-14889
> URL: https://issues.apache.org/jira/browse/SOLR-14889
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: documentation
>Reporter: Chris M. Hostetter
>Assignee: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-14889.patch
>
>
> SOLR-14824 ran into windows failures when we switched from using a hardcoded 
> "relative" path to the solrRootPath to using groovy/project variables to get 
> the path.  The reason for the failures was that the path is used as a 
> variable templated into {{_config.yml.template}} to build the {{_config.yml}} 
> file, but on windows the path separator of '\' was being parsed by 
> jekyll/YAML as a string escape character.
> (This wasn't a problem we ran into before, even on windows, prior to the 
> SOLR-14824 changes, because the hardcoded relative path only used '/' 
> delimiters, which (j)ruby was happy to work with, even on windows.)
> As Uwe pointed out when hotfixing this...
> {quote}Problem was that backslashes are used to escape strings, but windows 
> paths also have those. Fix was to add StringEscapeUtils, but I don't like 
> this too much. Maybe we find a better solution to make special characters in 
> those properties escaped correctly when used in strings inside templates.
> {quote}
> ...the current fix of using {{StringEscapeUtils.escapeJava}} -- only for this 
> one variable -- doesn't really protect other variables that might have 
> special characters in them down the road, and while "escapeJava" works ok for 
> the "\" issue, it isn't necessarily consistent with all YAML escapes, which 
> could lead to even weird bugs/confusion down the road.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-14889) improve templated variable escaping in ref-guide _config.yml

2020-09-23 Thread Uwe Schindler (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated SOLR-14889:
-
Attachment: SOLR-14889.patch

> improve templated variable escaping in ref-guide _config.yml
> 
>
> Key: SOLR-14889
> URL: https://issues.apache.org/jira/browse/SOLR-14889
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: documentation
>Reporter: Chris M. Hostetter
>Assignee: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-14889.patch, SOLR-14889.patch
>
>
> SOLR-14824 ran into windows failures when we switched from using a hardcoded 
> "relative" path to the solrRootPath to using groovy/project variables to get 
> the path.  The reason for the failures was that the path is used as a 
> variable templated into {{_config.yml.template}} to build the {{_config.yml}} 
> file, but on windows the path separator of '\' was being parsed by 
> jekyll/YAML as a string escape character.
> (This wasn't a problem we ran into before, even on windows, prior to the 
> SOLR-14824 changes, because the hardcoded relative path only used '/' 
> delimiters, which (j)ruby was happy to work with, even on windows.)
> As Uwe pointed out when hotfixing this...
> {quote}Problem was that backslashes are used to escape strings, but windows 
> paths also have those. Fix was to add StringEscapeUtils, but I don't like 
> this too much. Maybe we find a better solution to make special characters in 
> those properties escaped correctly when used in strings inside templates.
> {quote}
> ...the current fix of using {{StringEscapeUtils.escapeJava}} -- only for this 
> one variable -- doesn't really protect other variables that might have 
> special characters in them down the road, and while "escapeJava" works ok for 
> the "\" issue, it isn't necessarily consistent with all YAML escapes, which 
> could lead to even weird bugs/confusion down the road.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14889) improve templated variable escaping in ref-guide _config.yml

2020-09-23 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201162#comment-17201162
 ] 

Uwe Schindler commented on SOLR-14889:
--

Here is my fix:
 [^SOLR-14889.patch] 

You need to create the empty map first and then populate it with escaped 
properties in doFirst. During configuration, the expand() method gets the empty 
map, which is populated in doFirst.

This is a quick hack; I don't like it. Maybe I have an idea this night.

> improve templated variable escaping in ref-guide _config.yml
> 
>
> Key: SOLR-14889
> URL: https://issues.apache.org/jira/browse/SOLR-14889
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: documentation
>Reporter: Chris M. Hostetter
>Assignee: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-14889.patch, SOLR-14889.patch
>
>
> SOLR-14824 ran into windows failures when we switched from using a hardcoded 
> "relative" path to the solrRootPath to using groovy/project variables to get 
> the path.  The reason for the failures was that the path is used as a 
> variable templated into {{_config.yml.template}} to build the {{_config.yml}} 
> file, but on windows the path separator of '\' was being parsed by 
> jekyll/YAML as a string escape character.
> (This wasn't a problem we ran into before, even on windows, prior to the 
> SOLR-14824 changes, because the hardcoded relative path only used '/' 
> delimiters, which (j)ruby was happy to work with, even on windows.)
> As Uwe pointed out when hotfixing this...
> {quote}Problem was that backslashes are used to escape strings, but windows 
> paths also have those. Fix was to add StringEscapeUtils, but I don't like 
> this too much. Maybe we find a better solution to make special characters in 
> those properties escaped correctly when used in strings inside templates.
> {quote}
> ...the current fix of using {{StringEscapeUtils.escapeJava}} -- only for this 
> one variable -- doesn't really protect other variables that might have 
> special characters in them down the road, and while "escapeJava" works ok for 
> the "\" issue, it isn't necessarily consistent with all YAML escapes, which 
> could lead to even weird bugs/confusion down the road.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-14889) improve templated variable escaping in ref-guide _config.yml

2020-09-23 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201162#comment-17201162
 ] 

Uwe Schindler edited comment on SOLR-14889 at 9/23/20, 11:27 PM:
-

Here is my fix:
 [^SOLR-14889.patch] 

You need to create the empty map first and then populate it with escaped 
properties in doFirst. During configuration, the expand() method gets the empty 
map, which is populated in doFirst.

This is a quick hack; I don't like it. Maybe I have an idea this night.


was (Author: thetaphi):
Here is my fix:
 [^SOLR-14889.patch] 

You need to create the expty map first and then populate it with escaped 
properties in doFirst. During configuration, the expand() method gets the empty 
map, which is populated in doFirst.

This is a quick hack; I don't like it. Maybe I have an idea this night.

> improve templated variable escaping in ref-guide _config.yml
> 
>
> Key: SOLR-14889
> URL: https://issues.apache.org/jira/browse/SOLR-14889
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: documentation
>Reporter: Chris M. Hostetter
>Assignee: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-14889.patch, SOLR-14889.patch
>
>
> SOLR-14824 ran into windows failures when we switched from using a hardcoded 
> "relative" path to the solrRootPath to using groovy/project variables to get 
> the path.  The reason for the failures was that the path is used as a 
> variable templated into {{_config.yml.template}} to build the {{_config.yml}} 
> file, but on windows the path separator of '\' was being parsed by 
> jekyll/YAML as a string escape character.
> (This wasn't a problem we ran into before, even on windows, prior to the 
> SOLR-14824 changes, because the hardcoded relative path only used '/' 
> delimiters, which (j)ruby was happy to work with, even on windows.)
> As Uwe pointed out when hotfixing this...
> {quote}Problem was that backslashes are used to escape strings, but windows 
> paths also have those. Fix was to add StringEscapeUtils, but I don't like 
> this too much. Maybe we find a better solution to make special characters in 
> those properties escaped correctly when used in strings inside templates.
> {quote}
> ...the current fix of using {{StringEscapeUtils.escapeJava}} -- only for this 
> one variable -- doesn't really protect other variables that might have 
> special characters in them down the road, and while "escapeJava" works ok for 
> the "\" issue, it isn't necessarily consistent with all YAML escapes, which 
> could lead to even weird bugs/confusion down the road.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14889) improve templated variable escaping in ref-guide _config.yml

2020-09-23 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201164#comment-17201164
 ] 

Uwe Schindler commented on SOLR-14889:
--

I also changed the logger.warn to logger.lifecycle when outputting the 
properties.

> improve templated variable escaping in ref-guide _config.yml
> 
>
> Key: SOLR-14889
> URL: https://issues.apache.org/jira/browse/SOLR-14889
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: documentation
>Reporter: Chris M. Hostetter
>Assignee: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-14889.patch, SOLR-14889.patch
>
>
> SOLR-14824 ran into windows failures when we switched from using a hardcoded 
> "relative" path to the solrRootPath to using groovy/project variables to get 
> the path.  The reason for the failures was that the path is used as a 
> variable templated into {{_config.yml.template}} to build the {{_config.yml}} 
> file, but on windows the path separator of '\' was being parsed by 
> jekyll/YAML as a string escape character.
> (This wasn't a problem we ran into before, even on windows, prior to the 
> SOLR-14824 changes, because the hardcoded relative path only used '/' 
> delimiters, which (j)ruby was happy to work with, even on windows.)
> As Uwe pointed out when hotfixing this...
> {quote}Problem was that backslashes are used to escape strings, but windows 
> paths also have those. Fix was to add StringEscapeUtils, but I don't like 
> this too much. Maybe we find a better solution to make special characters in 
> those properties escaped correctly when used in strings inside templates.
> {quote}
> ...the current fix of using {{StringEscapeUtils.escapeJava}} -- only for this 
> one variable -- doesn't really protect other variables that might have 
> special characters in them down the road, and while "escapeJava" works ok for 
> the "\" issue, it isn't necessarily consistent with all YAML escapes, which 
> could lead to even weird bugs/confusion down the road.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-14889) improve templated variable escaping in ref-guide _config.yml

2020-09-23 Thread Uwe Schindler (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated SOLR-14889:
-
Attachment: SOLR-14889.patch

> improve templated variable escaping in ref-guide _config.yml
> 
>
> Key: SOLR-14889
> URL: https://issues.apache.org/jira/browse/SOLR-14889
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: documentation
>Reporter: Chris M. Hostetter
>Assignee: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-14889.patch, SOLR-14889.patch, SOLR-14889.patch
>
>
> SOLR-14824 ran into windows failures when we switched from using a hardcoded 
> "relative" path to the solrRootPath to using groovy/project variables to get 
> the path.  The reason for the failures was that the path is used as a 
> variable templated into {{_config.yml.template}} to build the {{_config.yml}} 
> file, but on windows the path separator of '\' was being parsed by 
> jekyll/YAML as a string escape character.
> (This wasn't a problem we ran into before, even on windows, prior to the 
> SOLR-14824 changes, because the hardcoded relative path only used '/' 
> delimiters, which (j)ruby was happy to work with, even on windows.)
> As Uwe pointed out when hotfixing this...
> {quote}Problem was that backslashes are used to escape strings, but windows 
> paths also have those. Fix was to add StringEscapeUtils, but I don't like 
> this too much. Maybe we find a better solution to make special characters in 
> those properties escaped correctly when used in strings inside templates.
> {quote}
> ...the current fix of using {{StringEscapeUtils.escapeJava}} -- only for this 
> one variable -- doesn't really protect other variables that might have 
> special characters in them down the road, and while "escapeJava" works ok for 
> the "\" issue, it isn't necessarily consistent with all YAML escapes, which 
> could lead to even weird bugs/confusion down the road.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14889) improve templated variable escaping in ref-guide _config.yml

2020-09-23 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201177#comment-17201177
 ] 

Uwe Schindler commented on SOLR-14889:
--

Small update:  [^SOLR-14889.patch] 

"replaceAll" is wrong; it must be "replace" (as we don't use a regex). A 
typical Java error!
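
A minimal sketch of the difference (illustrative only, not the build code 
itself):

{code:java}
String path = "C:\\Users\\jenkins\\workspace";   // a windows path
// replace() does literal substitution: each '\' becomes '\\'
String escaped = path.replace("\\", "\\\\");
// replaceAll() compiles its first argument as a regex, where a lone
// backslash is an incomplete escape sequence:
String broken = path.replaceAll("\\", "\\\\");   // throws PatternSyntaxException
{code}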

> improve templated variable escaping in ref-guide _config.yml
> 
>
> Key: SOLR-14889
> URL: https://issues.apache.org/jira/browse/SOLR-14889
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: documentation
>Reporter: Chris M. Hostetter
>Assignee: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-14889.patch, SOLR-14889.patch, SOLR-14889.patch
>
>
> SOLR-14824 ran into windows failures when we switched from using a hardcoded 
> "relative" path to the solrRootPath to using groovy/project variables to get 
> the path. The reason for the failures was that the path is used as a 
> variable templated into {{_config.yml.template}} to build the {{_config.yml}} 
> file, but on windows the path separator of '\' was being parsed by 
> jekyll/YAML as a string escape character.
> (This wasn't a problem we ran into before, even on windows, prior to the 
> SOLR-14824 changes, because the hardcoded relative path only used '/' 
> delimiters, which (j)ruby was happy to work with, even on windows.)
> As Uwe pointed out when hotfixing this...
> {quote}Problem was that backslashes are used to escape strings, but windows 
> paths also have those. Fix was to add StringEscapeUtils, but I don't like 
> this too much. Maybe we can find a better solution to make special characters 
> in those properties escaped correctly when used in strings inside templates.
> {quote}
> ...the current fix of using {{StringEscapeUtils.escapeJava}} -- only for this 
> one variable -- doesn't really protect other variables that might have 
> special characters in them down the road, and while "escapeJava" works ok for 
> the "\" issue, it isn't necessarily consistent with all YAML escapes, which 
> could lead to weird bugs/confusion down the road.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14354) HttpShardHandler send requests in async

2020-09-23 Thread Cao Manh Dat (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201179#comment-17201179
 ] 

Cao Manh Dat commented on SOLR-14354:
-

Thanks Mark for your nice words. 

[~ichattopadhyaya] I will try to do a benchmark based on your project above. If 
I'm not able to finish it before the 8.7 release, then reverting it will be a 
good option.

> HttpShardHandler send requests in async
> ---
>
> Key: SOLR-14354
> URL: https://issues.apache.org/jira/browse/SOLR-14354
> Project: Solr
>  Issue Type: Improvement
>Reporter: Cao Manh Dat
>Assignee: Cao Manh Dat
>Priority: Blocker
> Fix For: master (9.0), 8.7
>
> Attachments: image-2020-03-23-10-04-08-399.png, 
> image-2020-03-23-10-09-10-221.png, image-2020-03-23-10-12-00-661.png
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> h2. 1. Current approach (problem) of Solr
> Below is a diagram describing the model of how a request is currently handled.
> !image-2020-03-23-10-04-08-399.png!
> The main thread that handles the search request will submit n requests (n 
> equal to the number of shards) to an executor. So each request will correspond 
> to a thread; after sending a request, that thread basically does nothing but 
> wait for the response from the other side. That thread will be swapped out and 
> the CPU will try to handle another thread (this is called a context switch: 
> the CPU saves the context of the current thread and switches to another one). 
> When some data (not all) comes back, that thread will be called to parse that 
> data, then it will wait until more data comes back. So there will be lots of 
> context switching in the CPU, which is a quite inefficient use of threads. 
> Basically we want fewer threads, with most of them busy all the time, because 
> threads are not free, and neither are context switches. That is the main idea 
> behind everything, like executors.
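> A hypothetical sketch of this thread-per-shard blocking pattern 
> ({{shardUrls}}, {{executor}} and {{fetchShardResponse}} are illustrative 
> names, not Solr's actual code):
> {code:java}
> List<Future<byte[]>> futures = new ArrayList<>();
> for (String url : shardUrls) {
>   // one thread per shard; each thread mostly blocks waiting on network I/O
>   futures.add(executor.submit(() -> fetchShardResponse(url)));
> }
> for (Future<byte[]> f : futures) {
>   byte[] response = f.get(); // blocks until that shard responds
>   // merge the shard response here (checked exceptions elided in this sketch)
> }
> {code}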
> h2. 2. Async call of Jetty HttpClient
> Jetty HttpClient offers async API like this.
> {code:java}
> httpClient.newRequest("http://domain.com/path")
> // Add request hooks
> .onRequestQueued(request -> { ... })
> .onRequestBegin(request -> { ... })
> // Add response hooks
> .onResponseBegin(response -> { ... })
> .onResponseHeaders(response -> { ... })
> .onResponseContent((response, buffer) -> { ... })
> .send(result -> { ... }); {code}
> Therefore, after calling {{send()}} the thread returns immediately without 
> blocking. When the client receives the headers from the other side, it will 
> call the {{onHeaders()}} listeners. When the client receives some {{byte[]}} 
> (not the whole response) of the data, it will call the {{onContent(buffer)}} 
> listeners. When everything is finished it will call the {{onComplete}} 
> listeners. One main thing to notice here is that all listeners should finish 
> quickly; if a listener blocks, no further data of that request will be 
> handled until the listener finishes.
> h2. 3. Solution 1: Sending requests async but spinning up one thread per 
> response
>  Jetty HttpClient already provides several listeners, one of which is 
> InputStreamResponseListener. This is how it is used:
> {code:java}
> InputStreamResponseListener listener = new InputStreamResponseListener();
> client.newRequest(...).send(listener);
> // Wait for the response headers to arrive
> Response response = listener.get(5, TimeUnit.SECONDS);
> if (response.getStatus() == 200) {
>   // Obtain the input stream on the response content
>   try (InputStream input = listener.getInputStream()) {
> // Read the response content
>   }
> } {code}
> In this case, there will be 2 threads:
>  * one thread trying to read the response content from the InputStream
>  * one thread (a short-lived task) feeding content to the above 
> InputStream whenever some byte[] is available. Note that if this thread is 
> unable to feed data into the InputStream, it will wait.
> By using this, the model of HttpShardHandler can be rewritten into 
> something like this
> {code:java}
> handler.sendReq(req, (is) -> {
>   executor.submit(() ->
> try (is) {
>   // Read the content from InputStream
> }
>   )
> }) {code}
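> A hypothetical sketch of how such a {{sendReq}} could be wired to jetty's 
> InputStreamResponseListener ({{executor}} and the {{Consumer}}-style handler 
> are assumptions, not the actual HttpShardHandler code):
> {code:java}
> void sendReq(Request req, Consumer<InputStream> handler) {
>   InputStreamResponseListener listener = new InputStreamResponseListener();
>   req.send(listener); // returns immediately, no blocking
>   // a thread is spun up only to consume the response stream
>   executor.submit(() -> {
>     try (InputStream is = listener.getInputStream()) {
>       handler.accept(is);
>     } catch (IOException e) {
>       // handle or propagate the failure
>     }
>   });
> }
> {code}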
>  The first diagram will be changed into this
> !image-2020-03-23-10-09-10-221.png!
> Notice that although “sending req to shard1” is wide, it won’t take a long 
> time, since sending a request is a very quick operation. With this approach, 
> handling threads won’t be spun up until the first bytes are sent back. Notice 
> that in this approach we still have active threads waiting for more data from 
> the InputStream.
> h2. 4. Solution 2: Buffering data and handling it inside jetty’s thread.
> Jetty has another listener called BufferingResponseListener. This is how it 
> is used:
> {code:java}
> // (completed per jetty's documented BufferingResponseListener usage)
> client.newRequest("http://domain.com/path")
> .send(new BufferingResponseListener() {
>   @Override
>   public void onComplete(Result result) {
> if (!result.isFailed()) {
>   byte[] content = getContent(); // the whole response, buffered in memory
>   // handle the buffered content here, inside jetty's thread
> }
>   }
> }); {code}
