Re: Basic auth and index replication

2019-04-23 Thread Dwane Hall
Hi guys,

Did anyone get an opportunity to confirm this behaviour? If not, is the 
community happy for me to raise a JIRA ticket for this issue?

Thanks,

Dwane

From: Dwane Hall 
Sent: Wednesday, 3 April 2019 7:15 PM
To: solr-user@lucene.apache.org
Subject: Basic auth and index replication

Hey Solr community.



I’ve been following a couple of open JIRA tickets relating to the use of the basic 
auth plugin in a Solr cluster (https://issues.apache.org/jira/browse/SOLR-12584 , 
https://issues.apache.org/jira/browse/SOLR-12860), and recently I’ve noticed 
similar behaviour when adding tlog replicas to an existing Solr collection.  
The problem appears to occur when Solr attempts to replicate the leader's index 
to a follower on another Solr node and fails authentication in the process.



My environment:

* Solr cloud 7.6
* Basic auth plugin enabled
* SSL
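
For reference, a minimal security.json of the kind used to enable the basic auth 
plugin looks roughly like the sketch below (placeholder credentials and roles, 
not my actual config):

  {
    "authentication": {
      "blockUnknown": true,
      "class": "solr.BasicAuthPlugin",
      "credentials": { "solr": "<base64 sha256 hash> <base64 salt>" }
    },
    "authorization": {
      "class": "solr.RuleBasedAuthorizationPlugin",
      "user-role": { "solr": "admin" },
      "permissions": [ { "name": "all", "role": "admin" } ]
    }
  }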



Has anyone else noticed similar behaviour when using tlog replicas?



Thanks,



Dwane



2019-04-03T13:27:22,774 5000851636 WARN  : [   ] 
org.apache.solr.handler.IndexFetcher : Master at: 
https://myserver:myport/solr/mycollection_shard1_replica_n17/ is not available. 
Index fetch failed by exception: 
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error 
from server at https://myserver:myport/solr/mycollection_shard1_replica_n17: 
Expected mime type application/octet-stream but got text/html. 





Error 401 require authentication



HTTP ERROR 401

Problem accessing /solr/mycollection_shard1_replica_n17/replication. Reason:

require authentication







Re: LTR: Normalize Feature Weights

2019-04-23 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Hi Kamal, 

You can use a MinMaxNormalizer [1] and get min and max from historical data. 
For the original score this won't guarantee that the value will be **always** 
between 0 and 1, but it should be in the majority of cases; if the 0..1 
constraint is not super strict I would rather use a StandardNormalizer [2].
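
For illustration, the normalizer is attached to the feature inside the model 
definition; a minimal sketch (assuming a feature named originalScore of type 
OriginalScoreFeature is already in the feature store, and that min/max were 
estimated from historical data) could look like:

  {
    "class": "org.apache.solr.ltr.model.LinearModel",
    "name": "myNormalizedModel",
    "features": [
      {
        "name": "originalScore",
        "norm": {
          "class": "org.apache.solr.ltr.norm.MinMaxNormalizer",
          "params": { "min": "0.0", "max": "50.0" }
        }
      }
    ],
    "params": { "weights": { "originalScore": 1.0 } }
  }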

For the value of -1: at the moment there is no way to assign a default value to 
a feature, but it is something that I'm planning to contribute. You might 
achieve that by playing with the default value of the field in the schema.

If you can, could you please explain why you need to normalize only these two 
fields? What are you trying to do?

Cheers,
Diego

[1] 
https://lucene.apache.org/solr/6_6_0//solr-ltr/org/apache/solr/ltr/norm/MinMaxNormalizer.html
[2] 
https://lucene.apache.org/solr/6_6_0//solr-ltr/org/apache/solr/ltr/norm/StandardNormalizer.html


From: solr-user@lucene.apache.org At: 04/18/19 21:13:14 To: 
solr-user@lucene.apache.org
Subject: LTR: Normalize Feature Weights

Hi,

Is there a way to normalize the value of the fieldValueFeature
and OriginalScoreFeature features within some range, i.e. 0-1?

Let's suppose I have 4 products with some field values; I wish to normalize the
weight between 0 and 1 using the function (val - min) / (max - min).

Product   FieldValue   Normalized Value
P1        4            1
P2        3            0.6
P3        2            0.3
P4        1            0
P5        -            -1

If the product does not contain the field value, set the feature value to -1
(some static default).

I tried to use the scale function, but since the scale function works on the
whole index, it is not suitable for our case, and adding multiple functions
here would impact performance.
I have looked at the Solr LTR source code and there is some normalization
functionality, but I am not sure how to apply it in our case.

Regards
Kamal




Different Parsed query for solr cloud and master slave with same solr version

2019-04-23 Thread Anant Bhargatiya
Hello,

We are migrating from a Solr 5.5 master-slave setup to a Solr 8.0 cloud deployment.
For exactly the same index and config, we are getting different results.

We also compared Solr cloud (Solr 8.0) and master-slave (Solr 8.0) with the
same indexed data.

In all of the above scenarios we observed that different edismax queries are
being generated for the same search term.

In our setup IDF and TF are set to 1 by overriding the similarity class.
[ also did not make any difference in our observation]
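
Roughly, the similarity override looks like the sketch below (class name is 
illustrative, not our exact code); it is registered in the schema with 
<similarity class="com.example.FlatTfIdfSimilarity"/>.

  import org.apache.lucene.search.similarities.ClassicSimilarity;

  // Sketch of a similarity that neutralizes TF and IDF.
  public class FlatTfIdfSimilarity extends ClassicSimilarity {

    @Override
    public float tf(float freq) {
      // every matching term counts once, regardless of frequency
      return freq > 0 ? 1.0f : 0.0f;
    }

    @Override
    public float idf(long docFreq, long docCount) {
      // ignore how rare a term is across the index
      return 1.0f;
    }
  }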

Our main concern is why edismax is generating different queries in master-slave
and cloud even though the Solr version is the same.

Is there any way to generate the same edismax queries in these three different
scenarios (Solr 5.5 master-slave, Solr 8.0 master-slave, and Solr 8.0 cloud)?


More info:

In cloud, we are using sharding based on one field F1.
We are fetching grouped results based on field F1.
Ungrouped queries are returning different results.



-- 
ANANT BHARGATIYA
Sr. Software Developer
Ph No.-> 08553660598


Determining Solr heap requirements and analyzing memory usage

2019-04-23 Thread Brian Ecker
Hello,

We are currently running into a situation where Solr (version 7.4) is
slowly using up all available memory allocated to the heap, and then
eventually hitting an OutOfMemoryError. We have tried increasing the heap
size and also tuning the GC settings, but this does not seem to solve the
issue. What we see is a slow increase in G1 Old Gen heap utilization until
it eventually takes all of the heap space and causes instances to crash.
Previously we tried running each instance with 10GB of heap space
allocated. We then tried running with 20GB of heap space, and we ran into
the same issue. I have attached a histogram of the heap captured from an
instance using nearly all the available heap when allocated 10GB. What I’m
trying to determine is (1) How much heap does this setup need before it
stabilizes and stops crashing with OOM errors, (2) can this requirement
somehow be reduced so that we can use less memory, and (3) from the heap
histogram, what is actually using memory (lots of primitive type arrays and
data structures, but what part of Solr is using those)?

I am aware that distributing the index would reduce the requirements for
each shard, but we'd like to avoid that for as long as possible due to the
associated operational difficulties. As far as I can tell, very few of the
conditions listed in the
https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap section
actually apply to our instance. We don’t have a very large index, we never
update in production (only query), the documents don’t seem very large
(~4KB each), we don’t use faceting, caches are reasonably small (~3GB max),
RAMBufferSizeMB is 100MB, we don’t use RAMDirectoryFactory (as far as I can
tell), and we don’t use sort parameters. The solr instance is used for a
full-text complete-as-you-type use case. The typical query looks something
like the following (field names anonymized):

?q=(single_value_f1:"baril" OR multivalue_f1:"baril")^=1 (single_value_f2:(baril) OR multivalue_f2:(baril))^=0.5
&fl=score,myfield1,myfield2,myfield3:myfield3.ar
&bf=product(def(myfield3.ar,0),1)
&rows=200
&df=dummy
&spellcheck=on
&spellcheck.dictionary=spellchecker.es
&spellcheck.dictionary=spellchecker.und
&spellcheck.q=baril
&spellcheck.accuracy=0.5
&spellcheck.count=1
&fq=+myfield1:(100 OR 200 OR 500)
&fl=score&fl=myfield1&fl=myfield2&fl=myfield3:myfield3.ar

I have attached in various screenshots details from top on a running Solr
instance, GC logs, solr-config.xml, and also a heap histogram sampled with
Java Mission Control. I also provide various additional details below
related to how the instances are set up and details about their
configuration.

Operational summary:
We run multiple Solr instances, each acting as a completely independent
node. They are not a cluster and are not set up using Solr Cloud. Each
replica contains the entire index. These replicas run in Kubernetes on GCP.

GC Settings:
-XX:+UnlockExperimentalVMOptions -Xlog:gc*,heap=info
-XX:+ParallelRefProcEnabled -XX:MaxGCPauseMillis=50
-XX:InitiatingHeapOccupancyPercent=40 -XX:-G1UseAdaptiveIHOP

Index summary:
* ~2,100,000 documents
* Total size: 9.09 GB
* Average document size = 9.09 GB / 2,100,000 docs = 4.32 KB/doc
* 215 fields per document
* 77 are stored.
* 137 are multivalued
* Makes use of many spell checkers for different languages (see
solrconfig.xml)
* Most fields include some sort of tokenization and analysis.
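
For illustration, a representative field type of the kind we use for the
complete-as-you-type fields looks roughly like this (a sketch, not our exact
schema; the real field types vary per language):

  <fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- index-time edge n-grams so partial input matches -->
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>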

Please let me know if there is any additional information required.




${solr.indexVersion:7.4.0}

  ${solr.data.dir}/${solr.indexVersion:7.4.0}/${solr.core.name}/data










100
${solr.lock.type:native}


0
0









1024







true
200
200









false
2






max-age=604800, public





10
key
OR


spellcheck





edismax
5
OR

on
spellchecker.en
20
true
f



spellcheck







identifier_id
ping





spellchecker_analyzer
solr.DirectSolrSpellChecker
internal 
2 
1 
10 
3 
0.2 
0.01
.01



spellchecker.und
spellchecker.und
solr.DirectSolrSpellChecker


spellchecker.en
spellchecker.en
solr.DirectSolrSpellChecker


spellchecker.d

Re: Basic auth and index replication

2019-04-23 Thread Jan Høydahl
The MetricsHistory issue is now resolved. I think the replication issue already 
has a JIRA here: https://issues.apache.org/jira/browse/SOLR-11904 ?
Feel free to vote on that issue and perhaps add your own comments. And if you 
have an idea for a solution, that is of course great!

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 23 Apr 2019, at 11:27, Dwane Hall  wrote:
> 
> Hi guys,
> 
> Did anyone get an opportunity to confirm this behaviour? If not, is the 
> community happy for me to raise a JIRA ticket for this issue?
> 
> Thanks,
> 
> Dwane
> 
> From: Dwane Hall 
> Sent: Wednesday, 3 April 2019 7:15 PM
> To: solr-user@lucene.apache.org
> Subject: Basic auth and index replication
> 
> Hey Solr community.
> 
> 
> 
> I’ve been following a couple of open JIRA tickets relating to use of the 
> basic auth plugin in a Solr cluster 
> (https://issues.apache.org/jira/browse/SOLR-12584 , 
> https://issues.apache.org/jira/browse/SOLR-12860) and recently I’ve noticed 
> similar behaviour when adding tlog replicas to an existing Solr collection.  
> The problem appears to occur when Solr attempts to replicate the leader's 
> index to a follower on another Solr node and fails authentication in the 
> process.
> 
> 
> 
> My environment
> 
> Solr cloud 7.6
> 
> Basic auth plugin enabled
> 
> SSL
> 
> 
> 
> Has anyone else noticed similar behaviour when using tlog replicas?
> 
> 
> 
> Thanks,
> 
> 
> 
> Dwane
> 
> 
> 
> 2019-04-03T13:27:22,774 5000851636 WARN  : [   ] 
> org.apache.solr.handler.IndexFetcher : Master at: 
> https://myserver:myport/solr/mycollection_shard1_replica_n17/ is not 
> available. Index fetch failed by exception: 
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error 
> from server at https://myserver:myport/solr/mycollection_shard1_replica_n17: 
> Expected mime type application/octet-stream but got text/html. 
> 
> 
> 
> 
> 
> Error 401 require authentication
> 
> 
> 
> HTTP ERROR 401
> 
> Problem accessing /solr/mycollection_shard1_replica_n17/replication. 
> Reason:
> 
> require authentication
> 
> 
> 
> 
> 



Re: bin/post command not working when run from crontab

2019-04-23 Thread Carsten Agger


On 4/18/19 4:51 PM, Erik Hatcher wrote:
> Jason - thanks for replying 
>
> and I concur, it makes sense to open a JIRA for this.I'm glad there 
> is an acceptable workaround, at least.
>
> I recall doing a fair bit of trial and error, asking 'nix folk and 
> stackoverflow how to handle this stdin situation and honing in on what's 
> there now.   But it's obviously weirdly broken, sorry.  

Thanks a lot for the feedback, I've filed a report:

https://issues.apache.org/jira/projects/SOLR/issues/SOLR-13422?filter=allopenissues

Best
Carsten

-- 
Carsten Agger

Chief Technologist
Magenta ApS
Skt. Johannes Allé 2
8000 Århus C

Tlf  +45 5060 1476
http://www.magenta-aps.dk
carst...@magenta-aps.dk



Re: Different Parsed query for solr cloud and master slave with same solr version

2019-04-23 Thread Shawn Heisey

On 4/23/2019 2:04 AM, Anant Bhargatiya wrote:

We are migrating from solr 5.5 master slave to solr 8.0 cloud deployment.
for exactly same index and config, we are getting different results.


We'll need to see the configs you are working with as well as the raw 
and parsed queries from both.


Every time Solr executes a query, that query's full list of parameters 
is logged in solr.log.  If you could provide those logs from both the 
cloud and the non-cloud deployments, we can make sure that what Solr is 
executing is actually identical between the two.
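
If you want to compare the parsed queries directly, adding debugQuery=true to 
the same request on each deployment will return the parsed query in the debug 
section of the response, for example (field names and the term here are 
placeholders):

  /select?q=some+term&defType=edismax&qf=field1+field2&debugQuery=true

Then compare the "parsedquery" entries from the two responses.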


Thanks,
Shawn


Re: Determining Solr heap requirements and analyzing memory usage

2019-04-23 Thread Shawn Heisey

On 4/23/2019 6:34 AM, Brian Ecker wrote:
What I’m trying to determine is (1) How much heap does 
this setup need before it stabilizes and stops crashing with OOM errors, 
(2) can this requirement somehow be reduced so that we can use less 
memory, and (3) from the heap histogram, what is actually using memory 
(lots of primitive type arrays and data structures, but what part of 
Solr is using those)?


Exactly one attachment made it through:  The file named 
solrconfig-anonymized.xml.  Attachments can't be used to share files 
because the mailing list software is going to eat them and we won't see 
them.  You'll need to use a file sharing website.  Dropbox is often a 
good choice.


We won't be able to tell anything about what's using all the memory from 
a histogram.  We would need an actual heap dump from Java.  This file 
will be huge -- if you have a 10GB heap, and that heap is full, the file 
will likely be larger than 10GB.
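
If you need to capture one, a dump can typically be taken with the JDK's jmap 
tool, for example (path and pid are placeholders):

  jmap -dump:live,format=b,file=/tmp/solr-heap.hprof <solr-pid>

Alternatively, start Solr with -XX:+HeapDumpOnOutOfMemoryError so the JVM 
writes a dump automatically when the OOME actually happens.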


There is no way for us to know how much heap you need.  With a large 
amount of information about your setup, we can make a guess, but that 
guess will probably be wrong.  Info we'll need to make a start:


*) How many documents is this Solr instance handling?  You find this out 
by looking at every core and adding up all the "maxDoc" numbers.


*) How much disk space is the index data taking?  This could be found 
either by getting a disk usage value for the solr home, or looking at 
every core and adding up the size of each one.


*) What kind of queries are you running?  Anything with facets, or 
grouping?  Are you using a lot of sort fields?


*) What kind of data is in each document, and how large is that data?

Your cache sizes are reasonable.  So you can't reduce heap requirements 
by much by reducing cache sizes.


Here's some info about what takes a lot of heap and ideas for reducing 
the requirements:


https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap

That page also reiterates what I said above:  It's unlikely that anybody 
will be able to tell you exactly how much heap you need at a minimum. 
We can make guesses, but those guesses might be wrong.


Thanks,
Shawn


Re: Determining Solr heap requirements and analyzing memory usage

2019-04-23 Thread Brian Ecker
Thanks for your response. Please see below for detailed responses.

On Tue, Apr 23, 2019 at 6:04 PM Shawn Heisey  wrote:

> On 4/23/2019 6:34 AM, Brian Ecker wrote:
> > What I’m trying to determine is (1) How much heap does
> > this setup need before it stabilizes and stops crashing with OOM errors,
> > (2) can this requirement somehow be reduced so that we can use less
> > memory, and (3) from the heap histogram, what is actually using memory
> > (lots of primitive type arrays and data structures, but what part of
> > Solr is using those)?
>
> Exactly one attachment made it through:  The file named
> solrconfig-anonymized.xml.  Attachments can't be used to share files
> because the mailing list software is going to eat them and we won't see
> them.  You'll need to use a file sharing website.  Dropbox is often a
> good choice.
>

I see. The other files I meant to attach were the GC log (
https://pastebin.com/raw/qeuQwsyd), the heap histogram (
https://pastebin.com/raw/aapKTKTU), and the screenshot from top (
http://oi64.tinypic.com/21r0bk.jpg).

>
> We won't be able to tell anything about what's using all the memory from
> a histogram.  We would need an actual heap dump from Java.  This file
> will be huge -- if you have a 10GB heap, and that heap is full, the file
> will likely be larger than 10GB.


I'll work on getting the heap dump, but would it also be sufficient to use
say a 5GB dump from when it's half full and then extrapolate to the
contents of the heap when it's full? That way the dump would be a bit
easier to work with.

>
> There is no way for us to know how much heap you need.  With a large
> amount of information about your setup, we can make a guess, but that
> guess will probably be wrong.  Info we'll need to make a start:
>

I believe I already provided most of this information in my original post,
as I understand that it's not trivial to make this assessment accurately.
I'll reiterate below, but please see the original post too, because I tried
to provide as much detail as possible.

>
> *) How many documents is this Solr instance handling?  You find this out
> by looking at every core and adding up all the "maxDoc" numbers.
>

There are around 2,100,000 documents.

>
> *) How much disk space is the index data taking?  This could be found
> either by getting a disk usage value for the solr home, or looking at
> every core and adding up the size of each one.
>

The data takes around 9GB on disk.

>
> *) What kind of queries are you running?  Anything with facets, or
> grouping?  Are you using a lot of sort fields?


No facets or grouping, and no sort fields. The application performs a
full-text complete-as-you-type search function. Much of this is done using
prefix analyzers and edge ngrams. We also make heavy use of spellchecking.
An example of one of the queries produced is the following:

?q=(single_value_f1:"baril" OR multivalue_f1:"baril")^=1 (single_value_f2:(baril) OR multivalue_f2:(baril))^=0.5
&fl=score,myfield1,myfield2,myfield3:myfield3.ar
&bf=product(def(myfield3.ar,0),1)
&rows=200
&df=dummy
&spellcheck=on
&spellcheck.dictionary=spellchecker.es
&spellcheck.dictionary=spellchecker.und
&spellcheck.q=baril
&spellcheck.accuracy=0.5
&spellcheck.count=1
&fq=+myfield1:(100 OR 200 OR 500)
&fl=score&fl=myfield1&fl=myfield2&fl=myfield3:myfield3.ar


> *) What kind of data is in each document, and how large is that data?
>

The data contained is mostly 1-5 words of text in various fields and in
various languages. We apply different tokenizers and some language specific
analyzers for different fields, but almost every field is tokenized. There
are 215 fields in total, 77 of which are stored. Based on the index size on
disk and the number of documents, I guess that gives 4.32 KB/doc on
average.

>
> Your cache sizes are reasonable.  So you can't reduce heap requirements
> by much by reducing cache sizes.
>
> Here's some info about what takes a lot of heap and ideas for reducing
> the requirements:
>
> https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap


Thank you, but I've seen that page already and that's part of why I'm
confused, as I believe most of those points that usually take a lot of heap
don't seem to apply to my setup.

>
>
> That page also reiterates what I said above:  It's unlikely that anybody
> will be able to tell you exactly how much heap you need at a minimum.
> We can make guesses, but those guesses might be wrong.
>
> Thanks,
> Shawn
>


Zk Status Error

2019-04-23 Thread Sadiki Latty
Hi,

I am in the process of upgrading my Solr cloud from 7.2.1 to the latest 
7.7.1, and it seems to have installed successfully: my data is still there and 
still searchable, and the other tabs under the Cloud section are still working, 
except for the ZK Status section. I am seeing an error when I try to view the 
"ZK Status" tab in the Solr Admin UI.

https://pasteboard.co/Ibv53co.png



Here are the two errors in the Solr Logging section:

RequestHandlerBase  java.lang.ArrayIndexOutOfBoundsException: 1
"java.lang.ArrayIndexOutOfBoundsException: 1
at 
org.apache.solr.handler.admin.ZookeeperStatusHandler.monitorZookeeper(ZookeeperStatusHandler.java:185)
at 
org.apache.solr.handler.admin.ZookeeperStatusHandler.getZkStatus(ZookeeperStatusHandler.java:99)
at 
org.apache.solr.handler.admin.ZookeeperStatusHandler.handleRequestBody(ZookeeperStatusHandler.java:77)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
at 
org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:735)
at 
org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:716)
at 
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:496)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:395)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:341)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1588)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1557)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at 
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at org.eclipse.jetty.server.Server.handle(Server.java:502)
at 
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:364)
at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
at 
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
at 
org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
at 
org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
at 
org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:765)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683)
at java.lang.Thread.run(Thread.java:748)
"


Error #2:
HttpSolrCall null:java.lang.ArrayIndexOutOfBoundsException: 1


Re: Determining Solr heap requirements and analyzing memory usage

2019-04-23 Thread Shawn Heisey

On 4/23/2019 11:48 AM, Brian Ecker wrote:

I see. The other files I meant to attach were the GC log (
https://pastebin.com/raw/qeuQwsyd), the heap histogram (
https://pastebin.com/raw/aapKTKTU), and the screenshot from top (
http://oi64.tinypic.com/21r0bk.jpg).


I have no idea what to do with the histogram.  I doubt it's all that 
useful anyway, as it wouldn't have any information about which parts of 
the system are using the most memory.


The GC log is not complete.  It only covers 2 min 47 sec 674 ms of time. 
 To get anything useful out of a GC log, it would probably need to 
cover hours of runtime.


But if you are experiencing OutOfMemoryError, then either you have run 
into something where a memory leak exists, or there's something about 
your index or your queries that needs more heap than you have allocated. 
 Memory leaks are not super common in Solr, but they have happened.


Tuning GC will never help OOME problems.

The screenshot looks like it matches the info below.


I'll work on getting the heap dump, but would it also be sufficient to use
say a 5GB dump from when it's half full and then extrapolate to the
contents of the heap when it's full? That way the dump would be a bit
easier to work with.


That might be useful.  The only way to know for sure is to take a look 
at it to see if the part of the code using lots of heap is detectable.



There are around 2,100,000 documents.



The data takes around 9GB on disk.


Ordinarily, I would expect that level of data to not need a whole lot of 
heap.  10GB would be more than I would think necessary, but if your 
queries are big consumers of memory, I could be wrong.  I ran indexes 
with 30 million documents taking up 50GB of disk space on an 8GB heap. 
I probably could have gone lower with no problems.


I have absolutely no idea what kind of requirements the spellcheck 
feature has.  I've never used that beyond a few test queries.  If the 
query information you sent is complete, I wouldn't expect the 
non-spellcheck parts to require a whole lot of heap.  So perhaps 
spellcheck is the culprit here.  Somebody else will need to comment on that.


Thanks,
Shawn


Re: Zk Status Error

2019-04-23 Thread Shawn Heisey

On 4/23/2019 12:14 PM, Sadiki Latty wrote:

Here are the 2 errors in the Solr Logging section

RequestHandlerBase  java.lang.ArrayIndexOutOfBoundsException: 1

“java.lang.ArrayIndexOutOfBoundsException: 1

     at 
org.apache.solr.handler.admin.ZookeeperStatusHandler.monitorZookeeper(ZookeeperStatusHandler.java:185)


That line of code in Solr 7.7.1 is:

  obj.put(line.split("\t")[0], line.split("\t")[1]);

The error is saying that the array created by the split function didn't 
have two elements, so asking for the entry numbered 1 doesn't work.
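
A more defensive version of that parsing might look roughly like this (a 
sketch, not the actual Solr code):

  // Each mntr line should be a tab-separated key/value pair, e.g. "zk_version\t3.4.13".
  String[] parts = line.split("\t", 2);
  if (parts.length == 2) {
    obj.put(parts[0], parts[1]);
  } else {
    log.warn("Unexpected line in mntr output: {}", line);
  }

That would skip malformed lines instead of throwing 
ArrayIndexOutOfBoundsException.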


Which means that the output Solr received from the ZK "mntr" command was 
not what Solr expected.


What version of ZK do you have running your ZK ensemble?  Is your ZK 
ensemble working correctly?  The ZK client version in Solr 7.7.1 is 
3.4.13.  If your server version is different, maybe there was a change 
in the output from the mntr command.


What happens if you issue the mntr command yourself directly?
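
(The four-letter-word commands can usually be sent with something like 
"echo mntr | nc <zk-host> 2181"; note that on newer ZooKeeper versions they 
may need to be whitelisted via 4lw.commands.whitelist.)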

Solr should, at the very least, detect possible problems with decoding 
the response from ZK and display a more helpful message.  If there was a 
change in the command output, Solr should be resilient enough to handle it.


Thanks,
Shawn