Re: Solr 7.7 UpdateRequestProcessor broken

2019-02-15 Thread Jan Høydahl
Hi

This is a subtle change which is not detected by our langid unit tests, as I 
think it only happens when a document is transferred with SolrJ and the JavaBin codec.
It was introduced in https://issues.apache.org/jira/browse/SOLR-12992

Please create a new JIRA issue for langid so we can try to fix it in 7.7.1.

Other SolrInputDocument consumers that assume String values for string fields 
would also be affected.
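
Until the fix lands, custom code can guard itself with something like this
(an untested sketch; the class and field names are just examples):

import org.apache.solr.common.SolrInputDocument;

public class FieldValues {
  // Returns a field value as a String whether it arrives as a String
  // (XML/JSON updates) or as a ByteArrayUtf8CharSequence (javabin);
  // both implement CharSequence.
  public static String asString(SolrInputDocument doc, String field) {
    Object value = doc.getFieldValue(field);
    return (value instanceof CharSequence) ? value.toString() : null;
  }
}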

I have a patch ready that you could test:

Index: solr/contrib/langid/src/java/org/apache/solr/update/processor/LangDetectLanguageIdentifierUpdateProcessor.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
--- solr/contrib/langid/src/java/org/apache/solr/update/processor/LangDetectLanguageIdentifierUpdateProcessor.java  (revision 8c831daf4eb41153c25ddb152501ab5bae3ea3d5)
+++ solr/contrib/langid/src/java/org/apache/solr/update/processor/LangDetectLanguageIdentifierUpdateProcessor.java  (date 1550217809000)
@@ -60,12 +60,12 @@
     Collection<Object> fieldValues = doc.getFieldValues(fieldName);
     if (fieldValues != null) {
       for (Object content : fieldValues) {
-        if (content instanceof String) {
-          String stringContent = (String) content;
+        if (content instanceof CharSequence) {
+          CharSequence stringContent = (CharSequence) content;
           if (stringContent.length() > maxFieldValueChars) {
-            detector.append(stringContent.substring(0, maxFieldValueChars));
+            detector.append(stringContent.subSequence(0, maxFieldValueChars).toString());
           } else {
-            detector.append(stringContent);
+            detector.append(stringContent.toString());
           }
           detector.append(" ");
         } else {
Index: solr/contrib/langid/src/java/org/apache/solr/update/processor/LanguageIdentifierUpdateProcessor.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
--- solr/contrib/langid/src/java/org/apache/solr/update/processor/LanguageIdentifierUpdateProcessor.java  (revision 8c831daf4eb41153c25ddb152501ab5bae3ea3d5)
+++ solr/contrib/langid/src/java/org/apache/solr/update/processor/LanguageIdentifierUpdateProcessor.java  (date 1550217691000)
@@ -413,10 +413,10 @@
     Collection<Object> fieldValues = doc.getFieldValues(fieldName);
     if (fieldValues != null) {
       for (Object content : fieldValues) {
-        if (content instanceof String) {
-          String stringContent = (String) content;
+        if (content instanceof CharSequence) {
+          CharSequence stringContent = (CharSequence) content;
           if (stringContent.length() > maxFieldValueChars) {
-            sb.append(stringContent.substring(0, maxFieldValueChars));
+            sb.append(stringContent.subSequence(0, maxFieldValueChars));
           } else {
             sb.append(stringContent);
           }
@@ -449,8 +449,8 @@
     Collection<Object> contents = doc.getFieldValues(field);
     if (contents != null) {
       for (Object content : contents) {
-        if (content instanceof String) {
-          docSize += Math.min(((String) content).length(), maxFieldValueChars);
+        if (content instanceof CharSequence) {
+          docSize += Math.min(((CharSequence) content).length(), maxFieldValueChars);
         }
       }
 


--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 14 Feb 2019, at 16:02, Andreas Hubold wrote:
> 
> Hi,
> 
> while trying to update from Solr 7.6 to 7.7 I ran into some unexpected 
> incompatibilities with UpdateRequestProcessors.
> 
> The SolrInputDocument passed to UpdateRequestProcessor#processAdd does not 
> return Strings for string fields anymore but instances of 
> org.apache.solr.common.util.ByteArrayUtf8CharSequence. I found some related 
> JIRA issues (SOLR-12983?) but nothing under the "Upgrade Notes" section.
> 
> I can adapt our UpdateRequestProcessor implementations but at least the 
> org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessor 
> is broken now as well and needs to be fixed in Solr. It expects String values 
> and logs messages such as the following now:
> 
> 2019-02-14 13:14:47.537 WARN  (qtp802600647-19) [   x:studio] 
> o.a.s.u.p.LangDetectLanguageIdentifierUpdateProcessor Field name_tokenized 
> not a String value, not including in detection
> 
> I wonder what kind of plugins are affected by the change. Does this only 
> affect UpdateRequestProcessors or more plugins? Do I need to handle these 
> ByteArrayUtf8CharSequence instances in SolrJ clients now as well?
> 
> Cheers,
> Andreas
> 
> 



Re: Solr 7.7 UpdateRequestProcessor broken

2019-02-15 Thread Andreas Hubold

Hi,

thank you, Jan.

I've created https://issues.apache.org/jira/browse/SOLR-13255. Maybe you 
want to add your patch to that ticket. I did not have time to test it yet.


So I guess all SolrJ usages have to handle CharSequence for string fields 
now? Well, this really sounds like a major breaking change for custom 
code.


Thanks,
Andreas

Jan Høydahl wrote on 15.02.19 at 09:14:

Hi

This is a subtle change which is not detected by our langid unit tests, as I
think it only happens when a document is transferred with SolrJ and the
JavaBin codec.
It was introduced in https://issues.apache.org/jira/browse/SOLR-12992

Please create a new JIRA issue for langid so we can try to fix it in 7.7.1.

[patch and earlier quoted messages snipped]

RE: Solr 7.7 UpdateRequestProcessor broken

2019-02-15 Thread Markus Jelsma
I stumbled upon this too yesterday and created SOLR-13249. In local unit tests 
we get String but in distributed unit tests we get a ByteArrayUtf8CharSequence 
instead.

https://issues.apache.org/jira/browse/SOLR-13249 

 
 
-Original message-
> From: Andreas Hubold 
> Sent: Friday 15th February 2019 10:10
> To: solr-user@lucene.apache.org
> Subject: Re: Solr 7.7 UpdateRequestProcessor broken
> 
> Hi,
> 
> thank you, Jan.
> 
> I've created https://issues.apache.org/jira/browse/SOLR-13255. Maybe you 
> want to add your patch to that ticket. I did not have time to test it yet.
> 
> So I guess all SolrJ usages have to handle CharSequence for string fields 
> now? Well, this really sounds like a major breaking change for custom 
> code.
> 
> Thanks,
> Andreas
> 
> Jan Høydahl wrote on 15.02.19 at 09:14:
> > Hi
> >
> > This is a subtle change which is not detected by our langid unit tests, as
> > I think it only happens when a document is transferred with SolrJ and the
> > JavaBin codec.
> > It was introduced in https://issues.apache.org/jira/browse/SOLR-12992
> >
> > [patch and earlier quoted messages snipped]

Re: Indexing in one collection affects index in another collection

2019-02-15 Thread Zheng Lin Edwin Yeo
Hi Shawn,

This issue is also occurring in the new Solr 7.7.0, with the same data
size of only 20 GB.

Regards,
Edwin

On Fri, 8 Feb 2019 at 23:53, Zheng Lin Edwin Yeo 
wrote:

> Hi Shawn,
>
> Thanks for your reply.
>
> Although the space in the OS disk cache could be the issue, we didn't
> face this problem previously, especially in our other setup using Solr
> 6.5.1, which contains much more data (more than 1 TB), as compared to our
> current setup in Solr 7.6.0, in which the data size is only 20 GB.
>
> Regards,
> Edwin
>
>
>
> On Wed, 6 Feb 2019 at 23:52, Shawn Heisey  wrote:
>
>> On 2/6/2019 7:58 AM, Zheng Lin Edwin Yeo wrote:
>> > Hi everyone,
>> >
>> > Does anyone has further updates on this issue?
>>
>> It is my strong belief that all the software running on this server
>> OTHER than Solr is competing with Solr for space in the OS disk cache,
>> and that Solr's data is getting pushed out of that cache.
>>
>> Best guess is that with only one collection, the disk cache was able to
>> hold onto Solr's data better, and that with another collection present,
>> there's not enough disk cache space available to cache both of them
>> effectively.
>>
>> I think you're going to need a dedicated machine for Solr, so Solr isn't
>> competing for system resources.
>>
>> Thanks,
>> Shawn
>>
>


Re: Intermittent timeout on the Slave

2019-02-15 Thread damian.pawski
Updating the TIME_WAIT setting on the server seems to have fixed the issue, as per 
https://support.solarwinds.com/Success_Center/Server_Application_Monitor_(SAM)/Knowledgebase_Articles/Application_monitor_using_port_443_periodically_goes_down

Thank you
Damian



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: SolrCloud exclusive features

2019-02-15 Thread David Hastings
>streaming expressions are only available in
SolrCloud mode and not in Solr master-slave mode?

Yes, and it's annoying, as there are features of SolrCloud I do not like.
As far as a comprehensive list goes, I don't know of one, but I would be
interested in one as well.

On Thu, Feb 14, 2019 at 5:07 PM Arnold Bronley 
wrote:

> Hi,
>
> Are there any features that are only exclusive to SolrCloud?
>
> e.g. when I am reading Streaming Expressions documentation, first sentence
> there says 'Streaming Expressions provide a simple yet powerful stream
> processing language for Solr Cloud.'
>
> So, does this mean that streaming expressions are only available in
> SolrCloud mode and not in Solr master-slave mode?
>
> If yes, is there a list of such features that only exclusively available in
> SolrCloud?
>


Suggest Component, prefix match (sur-)name

2019-02-15 Thread David '-1' Schmid
Hello solr-users!

I'm a bit stumped and after some days of trial-and-error, I've come to
the conclusion that I cannot figure this out by myself.

Where I'm at:

Solr 7.7 in cloud mode:
- 3 shards,
- 1 replication factor,
- 1 shard per node,
- 3 nodes,
  - coordinated with external zookeeper
  - running on three different VMs

What I do:

I'm building a search backend for academic citations; some of the most
important data are the author names. They are stored as:

.. managed-schema:
. [author field definition lost; the XML tags were stripped by the list archive]
.

and a random sample from the relevant data:

 "author":["Stefan Diepenbrock", "Timo Ropinski", "Klaus H. Hinrichs"],


What I'd like to achieve:

I'd like to provide (auto-complete) suggestions based on the names.


Starting with the easy case:

Someone sends a query for
  'diepen'

I'd want to match case-insensitive on all authors having 'diepen' as
prefix in their (sur-)names.
In this example, matching
  'Stefan [Diepen]brock'

I got this working by defining a new field type for the suggester

.. managed-schema:
. [definition of the "text_prefix" field type lost; the XML tags were
.  stripped by the list archive]
.

and using that in the searchComponent

.. solrconfig.xml:
.
. <searchComponent name="authorsuggest" class="solr.SuggestComponent">
.   <lst name="suggester">
.     <str name="name">default</str>
.     <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
.     <str name="dictionaryImpl">DocumentDictionaryFactory</str>
.     <str name="field">author</str>
.     <str name="allTermsRequired">true</str>
.     <str name="highlight">true</str>
.     <int name="minPrefixChars">4</int>
.     <str name="suggestAnalyzerFieldType">text_prefix</str>
.     <str name="buildOnStartup">false</str>
.   </lst>
. </searchComponent>
.
. <requestHandler name="/authors" class="solr.SearchHandler" startup="lazy">
.   <lst name="defaults">
.     <str name="suggest">true</str>
.     <str name="suggest.count">10</str>
.   </lst>
.   <arr name="components">
.     <str>authorsuggest</str>
.   </arr>
. </requestHandler>
.

After building with
  curl 'http://localhost:8983/solr/dblp/authors?suggest.build=true'
this will yield something along the lines of

.. curl 'http://localhost:8983/solr/dblp/authors?suggest.q=Diepen'   
. {
.   "suggest":{"default":{
.   "Diepen":{
. "numFound":10,
. "suggestions":[{
. "term":"M. Diepenhorst",
. "weight":0,
. "payload":""},
.   {
. "term":"Sjoerd Diepen",
. "weight":0,
. "payload":""},
.   {
. "term":"Stefan Diepenbrock",
. "weight":0,
. "payload":""},
.   {
./* abbreviated */
.

This might all have worked out by accident.
So if you see something weird: this is what I ended up with after
running against this wall, trying out different things.


Now the tricky part:


If someone were to type two prefixes of an author's name:

  'Stef Diep' or 'Diep Stef'

I want to match these whitespace-separated prefixes on all names of the
author and deliver the results where *both* prefixes match before the
others.

Because with this, curl yields:

.. curl 'http://localhost:8983/solr/dblp/authors?suggest.q=Stef%20Diep'
. {
.   "suggest":{"default":{
.   "Stef Diepen":{
. "numFound":10,
. "suggestions":[{
. "term":"J. Gregory Steffan",
. "weight":0,
. "payload":""},
.   {
. "term":"Stefano Spaccapietra",
. "weight":0,
. "payload":""},
.   {
./* abbreviated */
.

even when providing the full name as "suggest.q=Stefan%20Diepenbrock".

Other stuff that's weird:
- I'm getting duplicates, like ten times the same name
- Suggester results are non-deterministic

These are not as important, and I guess they're due to running in
cloud mode.

I've tried:
- reading
  - through some of the lucene JavaDocs, since the
solr-ref-guide is a bit sparse on information about the variables.
  - the ref-guide, over and over
  - many blogs based on old Solr versions (ab)using spellcheck for
suggestions,
  - and several other pages I found.
- other combinations of analyzers, tokenizers and filters
- other Dict and Lookup Implementations (the wrong ones?)

but no such luck.

I hope I did not leave anything relevant out.

regards,
-1


RE: Suggest Component, prefix match (sur-)name

2019-02-15 Thread Tannen, Lev (USAEO) [Contractor]
Hi David, 
If I understood your requirement correctly, you should use "AND" rather than 
"or". Also, I believe "AND" should be in capital letters, but I am not sure. 
Good luck.
Lev Tannen

-Original Message-
From: David '-1' Schmid  
Sent: Friday, February 15, 2019 10:23 AM
To: solr-user@lucene.apache.org
Subject: Suggest Component, prefix match (sur-)name

Hello solr-users!

[...]

If someone were to type two prefixes of an author's name:

  'Stef Diep' or 'Diep Stef'

I want to match these whitespace-separated prefixes on all names of the
author and deliver the results where *both* prefixes match before the
others.

[...]

regards,
-1


Re: solr cloud version upgrade 7.6 to 7.7 collection indexes all marked as down

2019-02-15 Thread Erick Erickson
Hmmm. I'm assuming that "nothing in the logs" is node/logs/solr.log, and that
you're not finding errors/exceptions. Just sanity checking here.

My guess: you're picking up the default SOLR_HOME, which is in your new
installation directory, while all your replicas are under the old install
directory.

There should be some kind of message in the log files indicating that
Solr is at least trying to load replicas, something similar to:

Using system property solr.solr.home:
/Users/Erick/apache/solrVersions/playspace/solr/example/cloud/node1/solr

and/or:

CorePropertiesLocator Found 3 core definitions underneath
/Users/Erick/apache/solrVersions/playspace/solr/example/cloud/node1/solr

A bit of background: when Solr starts up, it recursively descends from
SOLR_HOME, and whenever it finds a "core.properties" file it says "Aha,
this must be a core, I'll try to load it". So if SOLR_HOME doesn't point
to an ancestor of your existing replicas, Solr won't find any replicas and
everything will stay down. _If_ SOLR_HOME is defined in solr.in.sh, this
should just be picked up.
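
If that's the case, pointing the new install at the old data location in
solr.in.sh should bring the replicas back. The path below is only an example;
use the actual parent directory of your replicas:

SOLR_HOME=/opt/solr-7.6/server/solr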

Best,
Erick

On Thu, Feb 14, 2019 at 7:43 PM Zheng Lin Edwin Yeo
 wrote:
>
> Hi,
>
> Which version of zookeeper are you using?
>
> Also, if you tried to query the index, did you get any error message?
>
> Regards,
> Edwin
>
>
> On Fri, 15 Feb 2019 at 02:34, Jeff Courtade  wrote:
>
> > Hi,
> >
> > I am working on doing a simple point upgrade from Solr 7.6 to 7.7 cloud.
> >
> > 6 servers
> > 3 zookeepers
> > one simple test collection using the prepackaged _default config.
> >
> > I stop all Solr servers, leaving the ZooKeepers up.
> >
> > I change out the binaries and put the solr.in.sh file back in place with
> > memory and directory stuff.
> >
> > The index directory does not move; the files don't change.
> >
> > I start up the new binaries, and it starts with no errors in the logs, but
> > all of the indexes are "down".
> >
> > I have no clue here. Nothing in the logs.
> >


Re: Migrate from solr 5.3.1 to 7.5.0

2019-02-15 Thread Erick Erickson
Basically, you have to re-index whenever you change the schema, with very
few exceptions. Some changes cause outright exceptions; some changes just
fail to return the correct results. Etc.

You can _add_ completely new fields without reindexing, but
they won't have any values for existing documents. You can
_remove_ fields completely, but they'll still be present in the
old documents.

"When in doubt, re-index"

On Thu, Feb 14, 2019 at 5:39 PM ramyogi  wrote:
>
> Do we need to reindex if we change synonymQueryStyle values for a fieldType ?
> I hope not.
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re: Getting repeated Error - RunExecutableListener java.io.IOException

2019-02-15 Thread Hemant Verma
Thanks Jan
We are using Solr version 6.6.3.
We didn't configure RunExecutableListener in solrconfig.xml; it seems to be
configured in configoverlay.json by default, even though we don't want to
use RunExecutableListener at all.

Is it mandatory to use configoverlay.json, or can we get rid of it? If so,
can you share details?

Attached the solrconfig.xml




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


tracing ssl between master and slave

2019-02-15 Thread Baruch Volkov
Hi,

I can't find a way to trace ssl traffic between master and slave

I have on master:

-DSOLR_SSL_KEY_STORE=/ssl/s1/rcm.nyvmcs9.JKS.keystore
-DSOLR_SSL_KEY_STORE_PASSWORD=123456
-DSOLR_SSL_KEY_STORE_TYPE=JKS
-DSOLR_SSL_TRUST_STORE=/ssl/s1/rcm.nyvmcs9.JKS.truststore
-DSOLR_SSL_TRUST_STORE_PASSWORD=123456
-DSOLR_SSL_TRUST_STORE_TYPE=JKS
-DSOLR_SSL_NEED_CLIENT_AUTH=false
-DSOLR_SSL_WANT_CLIENT_AUTH=false
-Djavax.net.debug=ssl

I have on slave:

-Djavax.net.ssl.trustStore=/ssl/s1/rcm.nyvmcs9.JKS.keystore
-Djavax.net.ssl.trustStorePassword=123456
-Djavax.net.debug=ssl


However, -Djavax.net.debug only traces the Java stores and Solr traffic.

Is there any way to generate such a trace?


Thank you

BARUCH VOLKOV
Senior Customer Support Engineer L2
NICE Actimize





Errors in Solr log

2019-02-15 Thread Hemant Verma
We are using Solr 6.6.3 with Sitecore. Solr is installed on Windows.
Below are a few errors coming up repeatedly in the logs. What could be the
reason for these errors, and what would be a possible fix?
FYI, we are not interested in using configoverlay.json; this could be one
cause of the errors, and the file exists by default. Can we remove/delete
configoverlay.json?


o.a.s.u.p.DistributedUpdateProcessor Error sending update to
http://10.1.6.4:8983/solr
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
from server at
http://10.1.6.4:8983/solr/sitecore_master_index_shard1_replica2:
RunExecutableListener is deprecated and disabled by default for security
reasons. Legacy applications still using it must explicitely pass
'-Dsolr.enableRunExecutableListener=true' to the Solr command line. Be aware
that you should really disable API-based config editing at the same time,
using '-Ddisable.configEdit=true'!
at
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:612)
at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:279)
at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:268)
at
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient.request(ConcurrentUpdateSolrClient.java:430)
at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
at
org.apache.solr.update.SolrCmdDistributor.doRequest(SolrCmdDistributor.java:299)
at
org.apache.solr.update.SolrCmdDistributor.lambda$submit$0(SolrCmdDistributor.java:288)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at
com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)




2019-02-12 13:55:29.063 WARN  (qtp1543727556-94768) [   ]
o.a.s.h.a.LukeRequestHandler Error getting file length for [segments_mq5]
java.nio.file.NoSuchFileException:
C:\Solr\solr-6.6.3\server\solr\sitecore_master_index_shard1_replica2\data\index\segments_mq5
at sun.nio.fs.WindowsException.translateToIOException(Unknown Source)
at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
at sun.nio.fs.WindowsFileAttributeViews$Basic.readAttributes(Unknown
Source)
at sun.nio.fs.WindowsFileAttributeViews$Basic.readAttributes(Unknown
Source)
at sun.nio.fs.WindowsFileSystemProvider.readAttributes(Unknown Source)
at java.nio.file.Files.readAttributes(Unknown Source)
at java.nio.file.Files.size(Unknown Source)
at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:243)
at
org.apache.lucene.store.NRTCachingDirectory.fileLength(NRTCachingDirectory.java:128)
at
org.apache.solr.handler.admin.LukeRequestHandler.getFileLength(LukeRequestHandler.java:615)
at
org.apache.solr.handler.admin.LukeRequestHandler.getIndexInfo(LukeRequestHandler.java:588)
at
org.apache.solr.handler.admin.CoreAdminOperation.getCoreStatus(CoreAdminOperation.java:348)
at org.apache.solr.handler.admin.StatusOp.execute(StatusOp.java:48)
at
org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:384)
at
org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:388)
at
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:174)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
at 
org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:748)
at
org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:729)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:510)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at
org.eclipse.jetty.server.handler.ContextHandler.doHand

Re: solr cloud version upgrade 7.6 to 7.7 collection indexes all marked as down

2019-02-15 Thread Jeff Courtade
Yes... "nothing in the logs" means there was nothing of interest; I do have
actual log entries.

This is a test environment so this isn't an emergency. Thanks for the
clarification about what I should be seeing.

I was just so flabbergasted by this, because it's so strange, that I had to
tell somebody and yell at the universe, basically, so I yelled at the Solr
mailing list.

This is an automated upgrade, so the next step is to go through and
manually perform all the steps and see if I get the same behavior.

I am fairly certain it's just going to be some dumb thing that I'm doing, and
I will be happy to update the mailing list when I figure this out, for
everyone's mutual entertainment.
--
Jeff Courtade
M: 240.507.6116

On Fri, Feb 15, 2019, 12:33 PM Erick Erickson wrote:

> Hmmm. I'm assuming that "nothing in the logs" is node/logs/solr.log, and
> that you're not finding errors/exceptions. Just sanity checking here.
>
> My guess: you're picking up the default SOLR_HOME, which is in your new
> installation directory, while all your replicas are under the old install
> directory.
>
> [rest of message and earlier quotes snipped]


Re: Delete by id

2019-02-15 Thread Dwane Hall
Thanks Matt,

I was thinking the same regarding Solr treating it as an update, not a delete.
Sorry about the second "longhand" example; yes, that was a copy-paste issue and
the format is incorrect. I was playing around with a few options for the JSON
format. I'll keep testing. The only difference I could see between our examples
was that you keep your unique id field named "id" rather than a custom value
("DOC_ID" in my instance). It seems minor, but I've run out of any other ideas
and am fishing at the moment.

Thanks again,

Dwane



From: Matt Pearce 
Sent: Wednesday, 13 February 2019 10:40 PM
To: solr-user@lucene.apache.org
Subject: Re: Delete by id

Hi Dwane,

The error suggests that Solr is trying to add a document, rather than
delete one, and is complaining that the DOC_ID is missing.

I tried each of your examples (without the smart quotes), and they all
worked as expected, both from curl and the admin UI. There's an error in
your longhand example, which should read
{ "delete": { "id": "123!12345" }}
However, even using your example, I didn't get a complaint about the
field being missing.

Using curl, my command was:
curl -XPOST -H 'Content-type: application/json'
http://localhost:8983/solr/testCollection/update -d '{ "delete":
"123!12345" }'

Are you doing anything differently from that?
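
For completeness, the SolrJ equivalent would be roughly the following (a
sketch; "solrClient" is any configured SolrClient, and the collection name
is assumed):

    // delete by id, then commit so the deletion becomes visible
    solrClient.deleteById("testCollection", "123!12345");
    solrClient.commit("testCollection");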

Thanks,
Matt


On 11/02/2019 23:24, Dwane Hall wrote:
> Hey Solr community,
>
> I’m having an issue deleting documents from my Solr index and am seeking some 
> community advice when somebody gets a spare minute. It seems like a really 
> simple problem …a requirement to delete a document by its id.
>
> Here’s how my documents are mapped in solr
>
> <uniqueKey>DOC_ID</uniqueKey>
> <field name="DOC_ID" type="string" indexed="true" stored="true"
>  required="true" multiValued="false" />
>
> My json format to delete the document (all looks correct according to 
> https://lucene.apache.org/solr/guide/7_6/uploading-data-with-index-handlers.html
>  “The JSON update format allows for a simple delete-by-id. The value of a 
> delete can be an array which contains a list of zero or more specific 
> document id’s (not a range) to be deleted. For example, a single document”)
>
> Attempt 1 – “shorthand”
> {“delete”:”123!12345”}
>
> Attempt 2 – “longhand”
> {“delete”:“DOC_ID”:”123!12345”}
> {“delete”:{“DOC_ID”:”123!12345”}}
>
> ..the error is the same in all instances 
> “org.apache.solr.common.SolrException: Document is missing mandatory 
> uniqueKey field: DOC_ID”
>
> Can anyone see any obvious details I’m overlooking?
>
> I’ve tried all the update handlers below (both curl and through admin ui)
>
> /update/
> /update/json
> /update/json/docs
>
> My environment
> Solr cloud 7.6
> Single node
>
> As always any advice would be greatly appreciated,
>
> Thanks,
>
> Dwane
>

--
Matt Pearce
Flax - Open Source Enterprise Search
www.flax.co.uk


Re: Under-utilization during streaming expression execution

2019-02-15 Thread Joel Bernstein
You can run in parallel and that should help quite a bit. But a really
large batch job is better done like this:

https://joelsolr.blogspot.com/2016/10/solr-63-batch-jobs-parallel-etl-and.html

Joel Bernstein
http://joelsolr.blogspot.com/


On Thu, Feb 14, 2019 at 6:10 PM Gus Heck  wrote:

> Hi Folks,
>
> I'm looking for ideas on how to speed up processing for a streaming
> expression. I can't post the full details because it's customer related,
> but the structure is shown here: https://imgur.com/a/98sENVT What that
> does
> is take the results of two queries, join them and push them back into the
> collection as a new (denormalized) doc. The second (hash) join just updates
> a field that distinguishes the new docs from either of the old docs so it's
> hashing exactly one value, and thus this is not of concern for performance
> (if there were a good way to tell select to modify only one field and keep
> all the rest without listing the fields explicitly it wouldn't be needed) .
>
>
> When I run it across a test index with 1377364 and 5146620 docs for the two
> queries. The result is that it inserts 4742322 new documents, in ~10
> minutes. This seems pretty spiffy except this test index is ~1/1000 of the
> real index... so obviously I want to find *at least* a factor of 10
> improvement. So far I managed a factor of about 3 to get it down to
> slightly over 200 seconds by programmatically building the queries
> partitioning based on a set of percentiles from a stats query on one of the
> fields that is a floating point number with good distribution, but this
> seems to stop helping 10-12 splits on my 50 node cluster, scaling up to
> split to all 50 nodes brings things back to ~400 seconds.
>
> The CPU utilization on the machines mostly stabilizes around 30-50%, Disk
> metrics don't seem to look bad (disk idle stat in AWS stays over 90%).
> Still trying to get a good handle on network numbers, but I'm guessing that
> I'm either network limited or there's an inefficiency with contention
> somewhere inside solr (no I haven't put a profiler on it yet).
>
> Here's the interesting bit. I happen to know that the join key in the
> leftJoin is on a key that is used for document routing, so we're only
> joining up with documents on the same node. Furthermore, the id generated
> is a concatenation of these id's with a value from one of the fields and
> should also route to the same node... Is there any way to make the whole
> expression run locally on the nodes to avoid throwing the data back and
> forth across the network needlessly?
>
> Any other ideas for making this go another factor of 2-3 faster?
>
> -Gus
>


Re: Under-utilization during streaming expression execution

2019-02-15 Thread Joel Bernstein
Use large batches, fetch instead of hashJoin, and lots of parallel
workers.
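
Roughly along these lines (a sketch only; the collection names, fields and
join key below are all placeholders):

    update(destColl, batchSize=500,
           fetch(otherColl,
                 search(mainColl,
                        q="*:*",
                        fl="id,author_id",
                        sort="id asc",
                        qt="/export"),
                 fl="author_name",
                 on="author_id=id"))

fetch() looks up the extra fields in batches as tuples stream past, so you
avoid hashing an entire second stream in memory the way hashJoin does.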

Joel Bernstein
http://joelsolr.blogspot.com/


On Fri, Feb 15, 2019 at 7:48 PM Joel Bernstein wrote:

> You can run in parallel and that should help quite a bit. But a really
> large batch job is better done like this:
>
> https://joelsolr.blogspot.com/2016/10/solr-63-batch-jobs-parallel-etl-and.html
>
> [earlier quoted messages snipped]