Solr Cloud: Zookeeper failure modes

2019-01-02 Thread Pavel Micka
Hi,
We are currently implementing SolrCloud and as part of this effort we are 
investigating which failure modes can occur between Solr and ZooKeeper.

We have found quite a lot of articles describing the "happy path" failure, where ZK 
stops (loses its majority) and the Solr cluster ceases to serve write requests (while 
reads continue to work as expected). Once the ZK cluster is reconciled and a majority 
is achieved again, everything continues working as expected.

What we have not been able to find is what happens when the ZK cluster 
catastrophically fails and loses its data, either completely (scenario A) or because 
it is restarted from a backup (scenario B).

So now the questions:

1)  Scenario A - Is an existing SolrCloud cluster able to start against a 
clean ZooKeeper and reconstruct all the ZK data from its internal state (using 
some kind of emergency recovery; it may take a long time)?

2)  Scenario B - What is the worst case backup/restore scenario? For 
example when

a.   ZK is backed up

b.   Cluster performs some transition between states "X -> Y" (such as 
committing a shard, electing a new leader, etc.)

c.   ZK fails completely

d.   ZK is restored from backup created in step a

e.   Solr Cloud is in state "Y", while ZK is in state "X"

Thanks in advance,

Pavel



Solr Size Limitation upto 32 KB files

2019-01-02 Thread Kranthi Kumar K
Hi,

We are currently using Solr version 4.2.1 in our project and everything is 
going well. But recently we have been facing an issue with Solr Data Import: it is 
not importing files larger than 32766 bytes (i.e., 32 KB) and shows two 
exceptions:


  1.  java.lang.IllegalArgumentException
  2.  org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException


Please find the attached screenshot for reference.

We have searched for solutions in many forums and didn't find an exact 
solution for this issue. Interestingly, we found an article suggesting that changing 
the type of the field from 'string' to 'text_general' might solve the issue. 
Please have a look at the forum post below:

https://stackoverflow.com/questions/29445323/adding-a-document-to-the-index-in-solr-document-contains-at-least-one-immense-t

Schema.xml:
Changed from:
''

Changed to:
''

We have tried it, but it still does not import files larger than 32 KB (32766 bytes).

Could you please let us know the solution to fix this issue? We'll be awaiting 
your reply.


Thanks & Regards,
Kranthi Kumar.K,
Software Engineer,
Ccube Fintech Global Services Pvt Ltd.,
Email/Skype: 
kranthikuma...@ccubefintech.com,
Mobile: +91-8978078449.




How to access the Solr Admin GUI (2)

2019-01-02 Thread solr

First I want to thank you for your comments.
Second I'll add some background information.

Here Solr is part of a complex information management project, which I  
developed for a customer and which includes different source  
databases, containing edited/imported/crawled content.
This project runs on a Debian root server, which is hosted by an ISP  
and maintained by the ISP's support team and - a little bit - by me.

This setting was required by my customer.

Solr searches are created and processed on this server from a PHP/MySQL  
stack, and port 8983 is only available internally.
I agree that opening port 8983 to the public is dangerous; I've  
experienced that.
Nevertheless from time to time I need access to the Solr Admin GUI on  
that server.


My ISP's support team is not familiar with Solr, but willing to help.
So I'll forward your comments to them and discuss with them.

Thank you again.
Walter


Shawn Heisey  schrieb am 01.01.2019 20:00:13:

If you've blocked the Solr port, then you can't access Solr at all,  
including the admin UI.  The UI is accessed through the same port as  
the rest of Solr.


The admin UI is a static set of resources (html, css, javascript,  
images, etc) that gets downloaded and runs within the browser,  
accessing the same API that anything else would.  When you issue a  
query with the admin UI, it is your browser that makes the query,  
not the server.


If you set up a reverse proxy that blocks URL paths for the API  
while allowing URL paths for the admin UI, then the admin UI won't  
work -- because everything the admin UI displays or does is  
accomplished by your browser making calls to the API.


Thanks,
Shawn



Terry Steichen  schrieb am 01.01.2019 19:39:04:


I think a better approach to tunneling would be:

ssh -p  -L :localhost:8983 use...@myremoteserver.example.com

This requires you to set up a different port () rather than use the
standard 22 port (on your router and on your sshd config).  I've been
running something like this for about a year and have rarely if ever had
it attacked.  Prior to changing the port (to ), however, I was under
constant hacking attacks - they find port 22 too attractive to ignore.

Also, regarding my use of port : if you have the server running on
several local machines (as I do), the use of the  port may help
prevent confusion (as to whether your browser is accessing a local -
defaulted to 8983 - or a remote solr server).

Note: you might find that the ssh connection will drop out after some
inactivity, and need to be restarted occasionally.  Pretty simple to do
- just run the ssh line above again.

Note: I also add authorization controls to the AdminUI (and its functions)



Jörn Franke  schrieb am 01.01.2019 19:11:18:

You could configure a reverse proxy to provide one or more means of  
authentication.


However, I agree that the purpose why this is done should be clarified.



Kay Wrobel  schrieb am 01.01.2019 19:02:10:


You can use ssh to tunnel in.

ssh -L8983:localhost:8983 use...@myremoteserver.example.com

This will only require port 22 to be exposed to the public.


Sent from my iPhone



Walter Underwood  schrieb am 01.01.2019 19:00:31:


Yes, exposing the admin UI on the web is very dangerous. Anyone who finds it
can delete all your collections. That UI is designed for “back  
office” use only.


wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



Gus Heck  schrieb am 01.01.2019 18:43:02:


Why would you want to expose the administration GUI on the web? This is a
very hazardous thing to do. Never mind that it normally also runs on 8983
and all its functionality relies on the ability to interact with 8983-hosted
API endpoints.

What are you actually trying to solve?



Jörn Franke  schrieb am 31.12.2018 23:07:49:


Reverse proxy?



"aleksander_goncha...@yahoo.de"   
schrieb am 31.12.2018 23:22:59:



Hi Walter,

I had a similar case here. It was solved with a proxy: we "simply" put Nginx  
in between.


Best regards
Alexander


s...@cid.is schrieb am 31.12.2018 22:48:55:


Hi all,

is there a way, or better a solution, to access the Solr Admin GUI from  
outside the server (via the public web) while the Solr port 8983 is  
closed by a firewall and only available inside the server via  
localhost?


Thanks in advance
Walter Claassen

Alexandraweg 32
D 64287 Darmstadt
Fon +49-6151-4937961
Fax +49-6151-4937969
c...@cid.is




Re: How to access the Solr Admin GUI (2)

2019-01-02 Thread Jörn Franke
In this case create a VPN and then access it.

> Am 02.01.2019 um 11:03 schrieb s...@cid.is:
> 
> First I want to thank you for your comments.
> Second I'll add some background information.
> 
> Here Solr is part of a complex information management project, which I 
> developed for a customer and which includes different source databases, 
> containing edited/imported/crawled content.
> This project runs on a Debian root server, which is hosted by an ISP and 
> maintained by the ISP's support team and - a little bit - by me.
> This setting was required by my customer.
> 
> Solr searches are created and processed on this server from a PHP MySQL 
> stack, and port 8983 is only available internally.
> I agree the opening port 8983 to the public is dangerous, I've experienced 
> that.
> Nevertheless from time to time I need access to the Solr Admin GUI on that 
> server.
> 
> My ISP's support team is not familiar with Solr, but willing to help.
> So I'll forward your comments to them and discuss with them.
> 
> Thank you again.
> Walter
> 
> 
> Shawn Heisey  schrieb am 01.01.2019 20:00:13:
> 
>> If you've blocked the Solr port, then you can't access Solr at all, 
>> including the admin UI.  The UI is accessed through the same port as the 
>> rest of Solr.
>> 
>> The admin UI is a static set of resources (html, css, javascript, images, 
>> etc) that gets downloaded and runs within the browser, accessing the same 
>> API that anything else would.  When you issue a query with the admin UI, it 
>> is your browser that makes the query, not the server.
>> 
>> If you set up a reverse proxy that blocks URL paths for the API while 
>> allowing URL paths for the admin UI, then the admin UI won't work -- because 
>> everything the admin UI displays or does is accomplished by your browser 
>> making calls to the API.
>> 
>> Thanks,
>> Shawn
> 
> 
> Terry Steichen  schrieb am 01.01.2019 19:39:04:
> 
>> I think a better approach to tunneling would be:
>> 
>> ssh -p  -L :localhost:8983 use...@myremoteserver.example.com
>> 
>> This requires you to set up a different port () rather than use the
>> standard 22 port (on your router and on your sshd config).  I've been
>> running something like this for about a year and have rarely if ever had
>> it attacked.  Prior to changing the port (to ), however, I was under
>> constant hacking attacks - they find port 22 too attractive to ignore.
>> 
>> Also, regarding my use of port : if you have the server running on
>> several local machines (as I do), the use of the  port may help
>> prevent confusion (as to whether your browser is accessing a local -
>> defaulted to 8983 - or a remote solr server).
>> 
>> Note: you might find that the ssh connection will drop out after some
>> inactivity, and need to be restarted occasionally.  Pretty simple to do
>> - just run the ssh line above again.
>> 
>> Note: I also add authorization controls to the AdminUI (and its functions)
> 
> 
> Jörn Franke  schrieb am 01.01.2019 19:11:18:
> 
>> You could configure a reverse proxy to provide one or more means of 
>> authentication.
>> 
>> However, I agree that the purpose why this is done should be clarified.
> 
> 
> Kay Wrobel  schrieb am 01.01.2019 19:02:10:
> 
>> You can use ssh to tunnel in.
>> 
>> ssh -L8983:localhost:8983 use...@myremoteserver.example.com
>> 
>> This will only require port 22 to be exposed to the public.
>> 
>> 
>> Sent from my iPhone
> 
> 
> Walter Underwood  schrieb am 01.01.2019 19:00:31:
> 
>> Yes, exposing the admin UI on the web is very dangerous. Anyone who finds it
>> can delete all your collections. That UI is designed for “back office” use 
>> only.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
> 
> 
> Gus Heck  schrieb am 01.01.2019 18:43:02:
> 
>> Why would you want to expose the administration gui on the web? This is a
>> very hazardous thing to do. Never mind that it normally also runs on 8983
>> and all it's functionality relies on the ability to interact with 8983
>> hosted api end points.
>> 
>> What are you actually trying to solve?
> 
> 
> Jörn Franke  schrieb am 31.12.2018 23:07:49:
> 
>> Reverse proxy?
> 
> 
> "aleksander_goncha...@yahoo.de"  schrieb am 
> 31.12.2018 23:22:59:
> 
>> Hi Walter,
>> 
>> I had a similar case here. It was solved with a proxy: we "simply" put Nginx 
>> in between.
>> 
>> Best regards
>> Alexander
> 
> s...@cid.is schrieb am 31.12.2018 22:48:55:
> 
>> Hi all,
>> 
>> is there a way, better a solution, to access the Solr Admin GUI from  
>> outside the server (via public web) while the Solr port 8983 is closed  by a 
>> firewall and only available inside the server via localhost?
>> 
>> Thanks in advance
>> Walter Claassen
>> 
>> Alexandraweg 32
>> D 64287 Darmstadt
>> Fon +49-6151-4937961
>> Fax +49-6151-4937969
>> c...@cid.is
> 


Re: Last Modified Timestamp

2019-01-02 Thread Jason Gerlowski
Hi Antony,

I don't know a ton about DIH, so I can't answer your question myself.
But you might have better luck getting an answer from others if you
include more information about the behavior you're curious about.
Where do you see this Last Modified timestamp (in the Solr admin UI?
On your filesystem? If so, on what files?), how are you importing
documents (what is your DIH config?), etc.

Best,

Jason

On Wed, Dec 19, 2018 at 11:56 AM Antony A  wrote:
>
> Hello Solr Users,
>
> I am trying to figure out if there was a reason for "Last Modified: about
> 20 hours ago" remaining unchanged after a full data import into solr. I am
> running solr cloud on 7.2.1.
>
> I do see this value and also the numDocs value change on a Delta import.
>
> Thanks,
> Antony


Re: Debugging Solr Search results & Issues with Distributed IDF

2019-01-02 Thread Charlie Hull

On 01/01/2019 23:03, Lavanya Thirumalaisami wrote:


Hi,

I am trying to debug a query to find out why one document gets a higher score than 
the other. Below are two similar products.


You might take a look at OSC's Splainer http://splainer.io/ or some of 
the other tools I've written about recently at 
http://www.flax.co.uk/blog/2018/11/15/defining-relevance-engineering-part-4-tools/ 
- note that this also covers some commercial offerings (and also that 
I'm very happy to take any comments or additions!).


Cheers

Charlie


Below are the debug results I get from the Solr admin console:

"Doc1":
15.20965 = sum of:
  4.7573533 = max of:
    4.7573533 = weight(All:2x in 962) [], result of:
      4.7573533 = score(doc=962, freq=2.0 = termFreq=2.0), product of:
        3.4598935 = idf(docFreq=1346, docCount=42836)
        1.375 = tfNorm, computed from:
          2.0 = termFreq=2.0
          1.2 = parameter k1
          0.0 = parameter b (norms omitted for field)
  10.452296 = max of:
    5.9166136 = weight(All:powerpoint in 962) [], result of:
      5.9166136 = score(doc=962, freq=2.0 = termFreq=2.0), product of:
        4.302992 = idf(docFreq=579, docCount=42836)
        1.375 = tfNorm, computed from:
          2.0 = termFreq=2.0
          1.2 = parameter k1
          0.0 = parameter b (norms omitted for field)
    10.452296 = weight(All:"socket outlet" in 962) [], result of:
      10.452296 = score(doc=962, freq=2.0 = phraseFreq=2.0), product of:
        7.60167 = idf(), sum of:
          3.5370626 = idf(docFreq=1246, docCount=42836)
          4.064607 = idf(docFreq=735, docCount=42836)
        1.375 = tfNorm, computed from:
          2.0 = phraseFreq=2.0
          1.2 = parameter k1
          0.0 = parameter b (norms omitted for field)

"Doc15":
13.258003 = sum of:
  5.7317085 = max of:
    5.7317085 = weight(All:doubl in 2122) [], result of:
      5.7317085 = score(doc=2122, freq=2.0 = termFreq=2.0), product of:
        4.168515 = idf(docFreq=663, docCount=42874)
        1.375 = tfNorm, computed from:
          2.0 = termFreq=2.0
          1.2 = parameter k1
          0.0 = parameter b (norms omitted for field)
    4.7657394 = weight(All:2x in 2122) [], result of:
      4.7657394 = score(doc=2122, freq=2.0 = termFreq=2.0), product of:
        3.4659925 = idf(docFreq=1339, docCount=42874)
        1.375 = tfNorm, computed from:
          2.0 = termFreq=2.0
          1.2 = parameter k1
          0.0 = parameter b (norms omitted for field)
    5.390302 = weight(All:2g in 2122) [], result of:
      5.390302 = score(doc=2122, freq=2.0 = termFreq=2.0), product of:
        3.9202197 = idf(docFreq=850, docCount=42874)
        1.375 = tfNorm, computed from:
          2.0 = termFreq=2.0
          1.2 = parameter k1
          0.0 = parameter b (norms omitted for field)
  7.526294 = max of:
    5.8597584 = weight(All:powerpoint in 2122) [], result of:
      5.8597584 = score(doc=2122, freq=2.0 = termFreq=2.0), product of:
        4.2616425 = idf(docFreq=604, docCount=42874)
        1.375 = tfNorm, computed from:
          2.0 = termFreq=2.0
          1.2 = parameter k1
          0.0 = parameter b (norms omitted for field)
    7.526294 = weight(All:"socket outlet" in 2122) [], result of:
      7.526294 = score(doc=2122, freq=1.0 = phraseFreq=1.0), product of:
        7.526294 = idf(), sum of:
          3.4955401 = idf(docFreq=1300, docCount=42874)
          4.030754 = idf(docFreq=761, docCount=42874)
        1.0 = tfNorm, computed from:
          1.0 = phraseFreq=1.0
          1.2 = parameter k1
          0.0 = parameter b (norms omitted for field)

  


My Questions

1.  IDF: I understand from the Solr documentation that IDF is calculated for each 
shard separately. I have added the following statsCache config to solrconfig.xml 
and reloaded the collection:



But even after that there is no change in the calculated IDF.

2.  What are parameter b and parameter K1?

3.  Why are there many more parameters included in my Doc15 than in Doc1?

Is there any documentation I can refer to, to understand the Solr query 
calculations in depth?

We are using Solr 6.1 in Cloud mode with 3 ZooKeepers, 3 masters and 3 replicas.

Regards,
Lavanya




--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk


Re: Solr Size Limitation upto 32 KB files

2019-01-02 Thread Bernd Fehling

Hi,
I don't know the limits about Solr 4.2.1 but the RefGuide of Solr 6.6
says about Field Types for Class StrField:
"String (UTF-8 encoded string or Unicode). Strings are intended for
small fields and are not tokenized or analyzed in any way.
They have a hard limit of slightly less than 32K."

If you are trying to add larger content then you have to "chop" it up
yourself and add it as a multiValued field. This can be done in a self-written loader.
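
For illustration, a rough sketch of such a loader (the field name, chunk size, URL
and client class are placeholders; on Solr 4.x you would use HttpSolrServer instead
of the newer HttpSolrClient):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class ChunkingLoader {
    // 8000 chars * max 4 bytes per UTF-8 char = 32000 bytes, safely under the 32766-byte limit
    private static final int MAX_CHUNK_CHARS = 8000;

    public static void main(String[] args) throws Exception {
        String bigValue = args[0];   // the oversized content to index

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        // Add the content as several values of a multiValued field instead of one huge value.
        for (int off = 0; off < bigValue.length(); off += MAX_CHUNK_CHARS) {
            int end = Math.min(bigValue.length(), off + MAX_CHUNK_CHARS);
            doc.addField("content_chunks", bigValue.substring(off, end));
        }

        try (SolrClient client =
                 new HttpSolrClient.Builder("http://localhost:8983/solr/collection1").build()) {
            client.add(doc);
            client.commit();
        }
    }
}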

Don't forget, Solr/Lucene is an indexer and not a fulltext engine.

Regards
Bernd


Am 02.01.19 um 10:23 schrieb Kranthi Kumar K:

Hi,

We are currently using Solr 4.2.1 version in our project and everything is 
going well. But recently, we are facing an issue with Solr Data Import. It is 
not importing the files with size greater than 32766 bytes (i.e, 32 kb) and 
showing 2 exceptions:


   1.  java.lang.illegalargumentexception
   2.  org.apache.lucene.util.bytesref hash$maxbyteslengthexceededexception


Please find the attached screenshot for reference.

We have searched for solutions in many forums and didn't find the exact 
solution for this issue. Interestingly, we found in the article, by changing 
the type of the 'field' from sting to  'text_general' might solve the issue. 
Please have a look in the below forum:

https://stackoverflow.com/questions/29445323/adding-a-document-to-the-index-in-solr-document-contains-at-least-one-immense-t

Schema.xml:
Changed from:
''

Changed to:
''

We have tried it but still it is not importing the files > 32 KB or 32766 bytes.

Could you please let us know the solution to fix this issue? We'll be awaiting 
your reply.


[image001]
Thanks & Regards,
Kranthi Kumar.K,
Software Engineer,
Ccube Fintech Global Services Pvt Ltd.,
Email/Skype: 
kranthikuma...@ccubefintech.com,
Mobile: +91-8978078449.





Re: Debugging Solr Search results & Issues with Distributed IDF

2019-01-02 Thread Doug Turnbull
On (2) these are BM25 parameters. There are several articles that discuss
BM25 in depth

https://opensourceconnections.com/blog/2015/10/16/bm25-the-next-generation-of-lucene-relevation/

https://www.elastic.co/blog/practical-bm25-part-2-the-bm25-algorithm-and-its-variables
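
To give a rough idea (my own summary, not from the articles): with Lucene's BM25
similarity the term-frequency part of each score is computed approximately as

  tfNorm = (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength))

k1 (default 1.2) controls how quickly repeated occurrences of a term saturate, and b
(default 0.75) controls how strongly the score is normalized by field length. In your
explain output b is 0.0 because norms are omitted for the field, so for freq = 2 this
reduces to (2 * 2.2) / (2 + 1.2) = 1.375, which is exactly the tfNorm value shown for
both documents.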





On Tue, Jan 1, 2019 at 6:04 PM Lavanya Thirumalaisami
 wrote:

>
> Hi,
>
> I am trying to debug a query to find out why one documentgets more score
> than the other. The below are two similar products.
>
> Below is the debug results I get from Solr admin console.
>
>  "Doc1": "\n15.20965 = sum of:\n 4.7573533 = max of:\n4.7573533=
> weight(All:2x in 962) [], result of:\n   4.7573533 =
> score(doc=962,freq=2.0 =termFreq=2.0\n), product of:\n   3.4598935 =
> idf(docFreq=1346, docCount=42836)\n1.375 = tfNorm, computed
> from:\n  2.0 = termFreq=2.0\n  1.2 = parameter
> k1\n  0.0 = parameter b (norms omitted forfield)\n  10.452296 = max
> of:\n5.9166136 = weight(All:powerpoint in 962)[], result of:\n
> 5.9166136 =score(doc=962,freq=2.0 = termFreq=2.0\n), product of:\n
> 4.302992 = idf(docFreq=579,docCount=42836)\n1.375 = tfNorm,computed
> from:\n  2.0 =termFreq=2.0\n  1.2 = parameterk1\n
> 0.0 = parameter b (normsomitted for field)\n10.452296
> =weight(All:\"socket outlet\" in 962) [], result of:\n  10.452296 =
> score(doc=962,freq=2.0 =phraseFreq=2.0\n), product of:\n   7.60167 =
> idf(), sum of:\n 3.5370626 = idf(docFreq=1246,
> docCount=42836)\n  4.064607 =
> idf(docFreq=735,docCount=42836)\n1.375 = tfNorm,computed
> from:\n  2.0 =phraseFreq=2.0\n  1.2 =
> parameterk1\n  0.0 = parameter b (normsomitted for field)\n",
>
> "Doc15":"\n13.258003 = sum of:\n  5.7317085 = max of:\n5.7317085 =
> weight(All:doubl in 2122) [],result of:\n  5.7317085
> =score(doc=2122,freq=2.0 = termFreq=2.0\n), product of:\n4.168515 =
> idf(docFreq=663,docCount=42874)\n1.375 = tfNorm,computed
> from:\n  2.0 =termFreq=2.0\n  1.2 = parameterk1\n
> 0.0 = parameter b (normsomitted for field)\n4.7657394 =weight(All:2x in
> 2122) [], result of:\n 4.7657394 = score(doc=2122,freq=2.0 =
> termFreq=2.0\n), productof:\n3.4659925 =idf(docFreq=1339,
> docCount=42874)\n   1.375 = tfNorm, computed from:\n 2.0 =
> termFreq=2.0\n  1.2= parameter k1\n  0.0 = parameterb
> (norms omitted for field)\n5.390302= weight(All:2g in 2122) [], result
> of:\n 5.390302 = score(doc=2122,freq=2.0 = termFreq=2.0\n), product
> of:\n3.9202197 = idf(docFreq=850,docCount=42874)\n1.375 =
> tfNorm,computed from:\n  2.0 = termFreq=2.0\n  1.2 =
> parameter k1\n  0.0 = parameter b (norms omitted forfield)\n
> 7.526294 = max of:\n5.8597584 = weight(All:powerpoint in 2122)[],
> result of:\n  5.8597584 =score(doc=2122,freq=2.0 = termFreq=2.0\n),
> product of:\n4.2616425 = idf(docFreq=604,docCount=42874)\n
> 1.375 = tfNorm,computed from:\n  2.0 = termFreq=2.0\n  1.2
> = parameter k1\n  0.0 = parameter b (norms omitted forfield)\n
> 7.526294 =weight(All:\"socket outlet\" in 2122) [], result of:\n
> 7.526294 = score(doc=2122,freq=1.0 =phraseFreq=1.0\n), product
> of:\n   7.526294 = idf(), sum of:\n 3.4955401 =
> idf(docFreq=1300, docCount=42874)\n  4.030754 =
> idf(docFreq=761,docCount=42874)\n1.0 = tfNorm,computed
> from:\n  1.0 =phraseFreq=1.0\n  1.2 =
> parameterk1\n  0.0 = parameter b (normsomitted for field)\n",
>
>
>
> My Questions
>
> 1.  IDF : I understand from solr documents that IDFis calculated for
> each separate shards, I have added the following stats cacheconfig to
> solrconfig.xml and reloaded collection
>
> 
>
> But even after that there is no change incalculated IDF.
>
> 2.  What are parameter b and parameter K1?
>
> 3.  Why there are lots of parameters included in myDoc15 rather than
> Doc1?
>
> Is there any documentations I can refer to understand thesolr query
> calculations in depth.
>
> We are using  Solr 6.1in Cloud with 3 zookeepers and 3 masters and 3
> replicas.
>
> Regards,
> Lavanya
>
-- 
Doug Turnbull | CTO | OpenSource Connections, LLC | 240.476.9983
Author: Relevant Search
This e-mail and all contents, including attachments, is considered to be
Company Confidential unless explicitly stated otherwise, regardless
of whether attachments are marked as such.


Re: Zookeeper timeout issue -

2019-01-02 Thread AshB
Hi Shawn,

Answers to your questions.

1. Yes, we are aware of fault tolerance in our architecture, but it's our dev
env, so we are working in SolrCloud mode with limited machines.

2. Solr is running as a separate app; it's not on WebLogic. We are using
WebLogic for REST services which in turn connect to ZooKeeper <--> Solr.

3. We used JConsole to monitor the Solr, ZooKeeper and WebLogic processes. In the
WebLogic process it looks like threads are getting stuck. One such thread
related to ZooKeeper is below:

Name: zkConnectionManagerCallback-9207-thread-1
State: WAITING on
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@396cda76
Total blocked: 0  Total waited: 1

Stack trace:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)

I have attached a file containing snapshots of the processes, as well as the Solr
GC log report (GCeasy-report-gc.pdf) and TimoutIssue.docx captured during the load
activity.

--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: How to access the Solr Admin GUI (2)

2019-01-02 Thread Gus Heck
I typically resolve this sort of situation with an SSH proxy, such as

ssh -f  user@123.456.789.012 -L :127.0.0.1:8983 -N

Then I can access the solr GUI from localhost: on my machine, and all
the traffic is secured by SSH. Pick your local port ( here) as desired
of course. Sometimes I have to do two layers of this if there is only a
single machine accepting SSH for the remote network. Of course this
requires that the client be able to give you SSH shell access.


Re: How to access the Solr Admin GUI (2)

2019-01-02 Thread Kay Wrobel
In case of multiple "jumps", might I suggest the -J switch which allows you to 
specify a jump host.

Kay

> On Jan 2, 2019, at 9:37 AM, Gus Heck  wrote:
> 
> I typically resolve this sort of situation with a ssh proxy such as
> 
> ssh -f  user@123.456.789.012 -L :127.0.0.1:8983 -N
> 
> Then I can access the solr GUI from localhost: on my machine, and all
> the traffic is secured by SSH. Pick your local port ( here) as desired
> of course. Sometimes I have to do two layers of this if there is only a
> single machine accepting SSH for the remote network. Of course this
> requires that the client be able to give you SSH shell access.


-- 

The information in this e-mail is confidential and is intended solely for 
the addressee(s). Access to this email by anyone else is unauthorized. If 
you are not an intended recipient, you may not print, save or otherwise 
store the e-mail or any of the contents thereof in electronic or physical 
form, nor copy, use or disseminate the information contained in the email.  
If you are not an intended recipient,  please notify the sender of this 
email immediately.


Re: Solr Size Limitation upto 32 KB files

2019-01-02 Thread Erick Erickson
Adding to what Bernd said, _string_ fields that large are almost always
a result of misunderstanding the use case. Especially if you
find yourself searching with the q=field:*word* pattern.

If you're trying to search within the string you need a
TextField-based type, not a StrField.

Best,
Erick

On Wed, Jan 2, 2019 at 4:03 AM Bernd Fehling
 wrote:
>
> Hi,
> I don't know the limits about Solr 4.2.1 but the RefGuide of Solr 6.6
> says about Field Types for Class StrField:
> "String (UTF-8 encoded string or Unicode). Strings are intended for
> small fields and are not tokenized or analyzed in any way.
> They have a hard limit of slightly less than 32K."
>
> If you are trying to add larger content then you have to "chop" that
> by yourself and add it as multivalued. Can be done within a self written 
> loader.
>
> Don't forget, Solr/Lucene is an indexer and not a fulltext engine.
>
> Regards
> Bernd
>
>
> Am 02.01.19 um 10:23 schrieb Kranthi Kumar K:
> > Hi,
> >
> > We are currently using Solr 4.2.1 version in our project and everything is 
> > going well. But recently, we are facing an issue with Solr Data Import. It 
> > is not importing the files with size greater than 32766 bytes (i.e, 32 kb) 
> > and showing 2 exceptions:
> >
> >
> >1.  java.lang.illegalargumentexception
> >2.  org.apache.lucene.util.bytesref hash$maxbyteslengthexceededexception
> >
> >
> > Please find the attached screenshot for reference.
> >
> > We have searched for solutions in many forums and didn't find the exact 
> > solution for this issue. Interestingly, we found in the article, by 
> > changing the type of the 'field' from sting to  'text_general' might solve 
> > the issue. Please have a look in the below forum:
> >
> > https://stackoverflow.com/questions/29445323/adding-a-document-to-the-index-in-solr-document-contains-at-least-one-immense-t
> >
> > Schema.xml:
> > Changed from:
> > ' > multiValued="true" />'
> >
> > Changed to:
> > ' > multiValued="true" />'
> >
> > We have tried it but still it is not importing the files > 32 KB or 32766 
> > bytes.
> >
> > Could you please let us know the solution to fix this issue? We'll be 
> > awaiting your reply.
> >
> >
> > [image001]
> > Thanks & Regards,
> > Kranthi Kumar.K,
> > Software Engineer,
> > Ccube Fintech Global Services Pvt Ltd.,
> > Email/Skype: 
> > kranthikuma...@ccubefintech.com,
> > Mobile: +91-8978078449.
> >
> >
> >


Re: Solr Cloud: Zookeeper failure modes

2019-01-02 Thread Erick Erickson
1> No. At one point this could be done, in the sense that the
collections would be reconstructed (legacyCloud), but that turned out
to have... side effects. Even in that case, though, Solr couldn't
reconstruct the configsets. (Insert rant that you really must store
your configsets in a VCS system somewhere, IMO.)

2> Should be fine, as long as the state changes don't include things
like adding replicas or collections or you've changed your configsets.
ZK has nothing to do with commits for instance. Leader election is
recorded in ZK, but other leaders will be elected if necessary. Again,
though, if you've changed the topology (added replicas and/or
collections and/or shards if using implicit routing) between the time
you took the snapshot and ZK failed you'll have an incomplete restored
state.

Now, all that said, ZooKeeper data is "just data". Apart from blobs
stored in ZK, you can manually reconstruct the whole thing with a
text editor and upload it. This would be tedious and error-prone to be
sure, but do-able. Periodically storing away a copy of the Collections
API CLUSTERSTATUS would help a lot.
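
For example, a small SolrJ sketch of that idea (assuming SolrJ 7.x; the ZooKeeper
hosts and the output path are placeholders):

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Optional;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;
import org.apache.solr.client.solrj.response.CollectionAdminResponse;

public class ClusterStatusSnapshot {
    public static void main(String[] args) throws Exception {
        try (CloudSolrClient client = new CloudSolrClient.Builder(
                Arrays.asList("zk1:2181", "zk2:2181", "zk3:2181"), Optional.empty()).build()) {
            // Fetch CLUSTERSTATUS from the Collections API ...
            CollectionAdminResponse rsp =
                    CollectionAdminRequest.getClusterStatus().process(client);
            // ... and stash the raw response somewhere safe (VCS, backup volume, etc.).
            Files.write(Paths.get("clusterstatus-snapshot.txt"),
                    rsp.getResponse().toString().getBytes(StandardCharsets.UTF_8));
        }
    }
}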

Another approach would be to simply re-create your collections with
the exact same shard count. That'll create replicas with the same
ranges etc. Then shut your Solr instances down and copy the data
directory from the correct old replica to the correct new replica.
Once you're satisfied that things are running, you can delete the old
(unused) data. As an aside, in this case I'd create my new
collection(s) as leader-only (1 replica), then copy as necessary and
verify that things were as expected. Once that was done, I'd use
ADDREPLICA to build out the new collection(s). This pre-supposes you
can get your configsets back from VCS as well as any binary data
you've stored in ZK (e.g. jar files for custom code and the like).
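
In SolrJ terms, the API side of that sequence might look roughly like this (the
collection, configset and shard names and the counts are placeholders; the
data-directory copy in step 2 happens outside of Solr, with the nodes stopped):

import java.util.Arrays;
import java.util.Optional;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class RebuildCollection {
    public static void main(String[] args) throws Exception {
        try (CloudSolrClient client = new CloudSolrClient.Builder(
                Arrays.asList("zk1:2181", "zk2:2181", "zk3:2181"), Optional.empty()).build()) {
            // 1) Re-create the collection with the same shard count, one replica per shard.
            CollectionAdminRequest.createCollection("mycollection", "myconfig", 4, 1)
                    .process(client);
            // 2) Stop Solr, copy each old replica's data directory into the matching new
            //    replica, restart, and verify the contents look right.
            // 3) Once satisfied, grow back out shard by shard:
            CollectionAdminRequest.addReplicaToShard("mycollection", "shard1").process(client);
        }
    }
}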

So overall it's do-able even without ZK snapshots _assuming_ you can
find copies of your configsets and any custom code you've stored in
ZK. Not something I'd really _like_ to do, but in an emergency you
have options.

But backing up ZK snapshots in a safe place would be, by far, the
easiest and safest thing to do.

HTH,
Erick

On Wed, Jan 2, 2019 at 12:36 AM Pavel Micka  wrote:
>
> Hi,
> We are currently implementing Solr cloud and as part of this effort we are 
> investigating, which failure modes may happen between Solr and Zookeeper.
>
> We have found quite a lot articles describing the "happy path" failure, when 
> ZK stops (loses majority) and the Solr Cluster ceases to serve write requests 
> (& read continues to work as expected). Once ZK cluster is reconciled and 
> majority achieved again, everything continues working as expected.
>
> What we have not been able to find is what happens when ZK cluster 
> catastrophically fails and loses its data. Either completely (scenario A) or 
> is restarted from backup (scenario B).
>
> So now the questions:
>
> 1)  Scenario A - Is existing Solr Cloud cluster able to start against a 
> clean Zookeeper and reconstruct all the ZK data from its internal state 
> (using some king of emergency recovery; it may take long)?
>
> 2)  Scenario B - What is the worst case backup/restore scenario? For 
> example when
>
> a.   ZK is backed up
>
> b.   Cluster performs some transition between states "X -> Y" (such as 
> commit shard, elect new leader etc.)
>
> c.   ZK fails completely
>
> d.   ZK is restored from backup created in step a
>
> e.   Solr Cloud is in state "Y", while ZK is in state "X"
>
> Thanks in advance,
>
> Pavel
>


Re: Please unsubscribe me from solr-user emails

2019-01-02 Thread Erick Erickson
Please follow the instructions here:
http://lucene.apache.org/solr/community.html#mailing-lists-irc. You
must use the _exact_ same e-mail as you used to subscribe.

If the initial try doesn't work and following the suggestions at the
"problems" link doesn't work for you, let us know. But note you need
to show us the _entire_ return header to allow anyone to diagnose the
problem.

Best,
Erick

On Tue, Jan 1, 2019 at 11:34 PM Gaurav Srivastava  wrote:
>
> Hi Team,
>
> I tried automated way to unsubscribe from solr-user emails. could you
> please help me in unsubscribing the emails ?
>
> --
> Regards
> Gaurav Srivastava


Re: How to access the Solr Admin GUI (2)

2019-01-02 Thread Gus Heck
If the keys line up nicely across jumps...

On Wed, Jan 2, 2019, 10:49 AM Kay Wrobel wrote:
> In case of multiple "jumps", might I suggest the -J switch which allows you
> to specify a jump host.
>
> Kay
>
> > On Jan 2, 2019, at 9:37 AM, Gus Heck  wrote:
> >
> > I typically resolve this sort of situation with a ssh proxy such as
> >
> > ssh -f  user@123.456.789.012 -L :127.0.0.1:8983 -N
> >
> > Then I can access the solr GUI from localhost: on my machine, and all
> > the traffic is secured by SSH. Pick your local port ( here) as
> desired
> > of course. Sometimes I have to do two layers of this if there is only a
> > single machine accepting SSH for the remote network. Of course this
> > requires that the client be able to give you SSH shell access.
>
>
> --
>
> The information in this e-mail is confidential and is intended solely for
> the addressee(s). Access to this email by anyone else is unauthorized. If
> you are not an intended recipient, you may not print, save or otherwise
> store the e-mail or any of the contents thereof in electronic or physical
> form, nor copy, use or disseminate the information contained in the email.
>
> If you are not an intended recipient,  please notify the sender of this
> email immediately.
>


Re: Solr Cloud: Zookeeper failure modes

2019-01-02 Thread Gus Heck
I thought jar files for custom code were meant to go into the '.system'
collection, not zookeeper. Did I miss a new/old storage option?

On Wed, Jan 2, 2019, 12:25 PM Erick Erickson wrote:
> 1> no. At one point, this could be done in the sense that the
> collections would be reconstructed, (legacyCloud) but that turned out
> to have.. side effects. Even in that case, though, Solr couldn't
> reconstruct the configsets. (insert rant that you really must store
> your configsets in a VCS system somewhere IMO).
>
> 2> Should be fine, as long as the state changes don't include things
> like adding replicas or collections or you've changed your configsets.
> ZK has nothing to do with commits for instance. Leader election is
> recorded in ZK, but other leaders will be elected if necessary. Again,
> though, if you've changed the topology (added replicas and/or
> collections and/or shards if using implicit routing) between the time
> you took the snapshot and ZK failed you'll have an incomplete restored
> state.
>
> Now, all that said ZooKeeper data is "just data". Apart from blobs
> stored in ZK, you can manually reconstruct the whole thing  with a
> text editor and upload it. this would be tedious and error-prone to be
> sure, but do-able. Periodically storing away a copy of the Collections
> API CLUSTERSTATUS would help a lot.
>
> Another approach would be to simply re-create your collections with
> the exact same shard count. That'll create replicas with the same
> ranges etc. Then shut your Solr instances down and copy the data
> directory from the correct old replica to the correct new replica.
> Once you're satisfied that things are running, you can delete the old
> (unused) data. As an aside, in this case I'd create my new
> collection(s) as leader-only (1 replica), then copy as necessary and
> verify that things were as expected. Once that was done, I'd use
> ADDREPLICA to build out the new collection(s). This pre-supposes you
> can get your configsets back from VCS as well as any binary data
> you've stored in ZK (e.g. jar files for custom code and the like).
>
> So overall it's do-able even without ZK snapshots _assuming_ you can
> find copies of your configsets and any custom code you've stored in
> ZK. Not something I'd really _like_ to do, but in an emergency you
> have options.
>
> But backing up ZK snapshots in a safe place would be, by far, the
> easiest and safest thing to do
>
> HTH,
> Erick
>
> On Wed, Jan 2, 2019 at 12:36 AM Pavel Micka 
> wrote:
> >
> > Hi,
> > We are currently implementing Solr cloud and as part of this effort we
> are investigating, which failure modes may happen between Solr and
> Zookeeper.
> >
> > We have found quite a lot articles describing the "happy path" failure,
> when ZK stops (loses majority) and the Solr Cluster ceases to serve write
> requests (& read continues to work as expected). Once ZK cluster is
> reconciled and majority achieved again, everything continues working as
> expected.
> >
> > What we have not been able to find is what happens when ZK cluster
> catastrophically fails and loses its data. Either completely (scenario A)
> or is restarted from backup (scenario B).
> >
> > So now the questions:
> >
> > 1)  Scenario A - Is existing Solr Cloud cluster able to start
> against a clean Zookeeper and reconstruct all the ZK data from its internal
> state (using some king of emergency recovery; it may take long)?
> >
> > 2)  Scenario B - What is the worst case backup/restore scenario? For
> example when
> >
> > a.   ZK is backed up
> >
> > b.   Cluster performs some transition between states "X -> Y" (such
> as commit shard, elect new leader etc.)
> >
> > c.   ZK fails completely
> >
> > d.   ZK is restored from backup created in step a
> >
> > e.   Solr Cloud is in state "Y", while ZK is in state "X"
> >
> > Thanks in advance,
> >
> > Pavel
> >
>


Re: How to debug empty ParsedQuery from Edismax Query Parser

2019-01-02 Thread Kay Wrobel
Is there any way I can debug the parser? Especially the edismax parser, which 
does *not* raise any exception but produces an empty parsedQuery? Please, can 
anyone help? I feel very lost and without guidance, and Google searches have 
not provided me with any help at all.

> On Dec 28, 2018, at 9:57 AM, Kay Wrobel  wrote:
> 
> Here are my log entries:
> 
> SOLR 7.x (non-working)
> 2018-12-28 15:36:32.786 INFO  (qtp1769193365-20) [   x:collection1] 
> o.a.s.c.S.Request [collection1]  webapp=/solr path=/select 
> params={q=ac6023*&qf=tm_field_product^21.0&qf=tm_title_field^8.0&EchoParams=all&rows=10&wt=xml&debugQuery=true}
>  hits=0 status=0 QTime=2
> 
> SOLR 4.x (working)
> INFO  - 2018-12-28 15:43:41.938; org.apache.solr.core.SolrCore; [collection1] 
> webapp=/solr path=/select 
> params={q=ac6023*&qf=tm_field_product^21.0&qf=tm_title_field^8.0&EchoParams=all&rows=10&wt=xml&debugQuery=true}
>  hits=32 status=0 QTime=8 
> 
> EchoParams=all did not show anything different in the resulting XML from SOLR 
> 7.x.
> 
> 
> I found out something curious yesterday. When I try to force the Standard 
> query parser on SOLR 7.x using the same query, but adding "defType=lucene" at 
> the beginning, SOLR 7 raises a SolrException with this message: "analyzer 
> returned too many terms for multiTerm term: ac6023" (full response: 
> https://pastebin.com/ijdBj4GF)
> 
> Log entry for that request:
> 2018-12-28 15:50:58.804 ERROR (qtp1769193365-15) [   x:collection1] 
> o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: analyzer 
> returned too many terms for multiTerm term: ac6023
>at 
> org.apache.solr.schema.TextField.analyzeMultiTerm(TextField.java:180)
>at 
> org.apache.solr.parser.SolrQueryParserBase.analyzeIfMultitermTermText(SolrQueryParserBase.java:992)
>at 
> org.apache.solr.parser.SolrQueryParserBase.getPrefixQuery(SolrQueryParserBase.java:1173)
>at 
> org.apache.solr.parser.SolrQueryParserBase.handleBareTokenQuery(SolrQueryParserBase.java:781)
>at org.apache.solr.parser.QueryParser.Term(QueryParser.java:421)
>at org.apache.solr.parser.QueryParser.Clause(QueryParser.java:278)
>at org.apache.solr.parser.QueryParser.Query(QueryParser.java:162)
>at 
> org.apache.solr.parser.QueryParser.TopLevelQuery(QueryParser.java:131)
>at 
> org.apache.solr.parser.SolrQueryParserBase.parse(SolrQueryParserBase.java:254)
>at org.apache.solr.search.LuceneQParser.parse(LuceneQParser.java:49)
>at org.apache.solr.search.QParser.getQuery(QParser.java:173)
>at 
> org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:160)
>at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:279)
>at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
>at org.apache.solr.core.SolrCore.execute(SolrCore.java:2541)
>at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:709)
>at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:515)
>at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)
>at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)
>at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)
>at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
>at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
>at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
>at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
>at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
>at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)
>at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
>at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
>at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
>at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
>at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1219)
>at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
>at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)
>at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
>at 
> org.eclipse.jetty.server.handler.HandlerWrappe

Re: How to debug empty ParsedQuery from Edismax Query Parser

2019-01-02 Thread Gus Heck
If you mean attach a debugger, Solr is just like any other Java program.
Pass in the standard Java options at startup to have it listen or connect
as usual. The port is just a TCP port, so SSH-tunneling the debugger port
can bridge the gap to a remote machine (or a VPN).

That said, the prior posts in this thread make it sound like we are looking for a
case where the query parser, or something above it, is inappropriately eating
an exception relating to too many terms.

Did 5.x impose a new or reduced limit there?

On Wed, Jan 2, 2019, 1:20 PM Kay Wrobel wrote:
> Is there any way I can debug the parser? Especially, the edismax parser
> which does *not* raise any exception but produces an empty parsedQuery?
> Please, if anyone can help. I feel very lost and without guidance, and
> Google search has not provided me with any help at all.
>
> > On Dec 28, 2018, at 9:57 AM, Kay Wrobel  wrote:
> >
> > Here are my log entries:
> >
> > SOLR 7.x (non-working)
> > 2018-12-28 15:36:32.786 INFO  (qtp1769193365-20) [   x:collection1]
> o.a.s.c.S.Request [collection1]  webapp=/solr path=/select
> params={q=ac6023*&qf=tm_field_product^21.0&qf=tm_title_field^8.0&EchoParams=all&rows=10&wt=xml&debugQuery=true}
> hits=0 status=0 QTime=2
> >
> > SOLR 4.x (working)
> > INFO  - 2018-12-28 15:43:41.938; org.apache.solr.core.SolrCore;
> [collection1] webapp=/solr path=/select
> params={q=ac6023*&qf=tm_field_product^21.0&qf=tm_title_field^8.0&EchoParams=all&rows=10&wt=xml&debugQuery=true}
> hits=32 status=0 QTime=8
> >
> > EchoParams=all did not show anything different in the resulting XML from
> SOLR 7.x.
> >
> >
> > I found out something curious yesterday. When I try to force the
> Standard query parser on SOLR 7.x using the same query, but adding
> "defType=lucene" at the beginning, SOLR 7 raises a SolrException with this
> message: "analyzer returned too many terms for multiTerm term: ac6023"
> (full response: https://pastebin.com/ijdBj4GF)
> >
> > Log entry for that request:
> > 2018-12-28 15:50:58.804 ERROR (qtp1769193365-15) [   x:collection1]
> o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: analyzer
> returned too many terms for multiTerm term: ac6023
> >at
> org.apache.solr.schema.TextField.analyzeMultiTerm(TextField.java:180)
> >at
> org.apache.solr.parser.SolrQueryParserBase.analyzeIfMultitermTermText(SolrQueryParserBase.java:992)
> >at
> org.apache.solr.parser.SolrQueryParserBase.getPrefixQuery(SolrQueryParserBase.java:1173)
> >at
> org.apache.solr.parser.SolrQueryParserBase.handleBareTokenQuery(SolrQueryParserBase.java:781)
> >at org.apache.solr.parser.QueryParser.Term(QueryParser.java:421)
> >at org.apache.solr.parser.QueryParser.Clause(QueryParser.java:278)
> >at org.apache.solr.parser.QueryParser.Query(QueryParser.java:162)
> >at
> org.apache.solr.parser.QueryParser.TopLevelQuery(QueryParser.java:131)
> >at
> org.apache.solr.parser.SolrQueryParserBase.parse(SolrQueryParserBase.java:254)
> >at
> org.apache.solr.search.LuceneQParser.parse(LuceneQParser.java:49)
> >at org.apache.solr.search.QParser.getQuery(QParser.java:173)
> >at
> org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:160)
> >at
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:279)
> >at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
> >at org.apache.solr.core.SolrCore.execute(SolrCore.java:2541)
> >at
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:709)
> >at
> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:515)
> >at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)
> >at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)
> >at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)
> >at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
> >at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
> >at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
> >at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
> >at
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
> >at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
> >at
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
> >at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)
> >at
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
> >at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
> >at
> org.eclipse.jetty.server.sess

Re: How to debug empty ParsedQuery from Edismax Query Parser

2019-01-02 Thread Kay Wrobel
Well, I was putting that info out there because I am literally hunting down 
this issue without any guidance. The real problem for me still is that the edismax 
query parser behaves abnormally from version 5 up to the current release, giving me an 
empty parsedQuery. Forcing the request through the Lucene parser was one way I 
was hoping to get to the bottom of this. Frankly, multi-term analysis seems to be *the* 
new feature that was introduced in SOLR 5, and so I am jumping to conclusions 
here.

I would hate to go as low-level as debugging SOLR source just to find out what 
is going on here, but it sure seems that way. By the way, I have tried a 
multitude of other search terms (ending in *), like:
602* (works)
602K* (does NOT work)
A3F* (works!, but is also single changing characters, so...)
AC* (works, but MANY results for obvious reasons)
6023* (works)

So again, it seems that as soon as there is more than one character involved 
and a "word" is somewhat detected, the parser fails (in my specific case).

I am contemplating going down to the source-code level and debugging this 
issue; I am a programmer and should be able to understand some of it. That 
said, it seems like a very time-consuming thing to do. One last attempt right 
now is for me to change some logging levels in the SOLR 7 instance and see what 
it spits out. I changed the following to "DEBUG":
org.apache.solr.search.PayloadCheckQParserPlugin
org.apache.solr.search.SurroundQParserPlugin
org.apache.solr.search.join.ChildFieldValueSourceParser

That didn't add any new information in the log file at all.


> On Jan 2, 2019, at 12:40 PM, Gus Heck  wrote:
> 
> If you mean attach a debugger, solr is just like any other java program.
> Pass in the standard java options at start up to have it listen or connect
> as usual. The port is just a TCP port so ssh tunneling the debugger port
> can bridge the gap with a remote machine (or a vpn).
> 
> That said the prior thread posts makes it sound like we are looking for a
> case where the query parser or something above it is inappropriately eating
> an exception relating to too many terms.
> 
> Did 5.x impose a new or reduced limit there?
> 
> On Wed, Jan 2, 2019, 1:20 PM Kay Wrobel wrote:
>> Is there any way I can debug the parser? Especially, the edismax parser
>> which does *not* raise any exception but produces an empty parsedQuery?
>> Please, if anyone can help. I feel very lost and without guidance, and
>> Google search has not provided me with any help at all.
>> 
>>> On Dec 28, 2018, at 9:57 AM, Kay Wrobel  wrote:
>>> 
>>> Here are my log entries:
>>> 
>>> SOLR 7.x (non-working)
>>> 2018-12-28 15:36:32.786 INFO  (qtp1769193365-20) [   x:collection1]
>> o.a.s.c.S.Request [collection1]  webapp=/solr path=/select
>> params={q=ac6023*&qf=tm_field_product^21.0&qf=tm_title_field^8.0&EchoParams=all&rows=10&wt=xml&debugQuery=true}
>> hits=0 status=0 QTime=2
>>> 
>>> SOLR 4.x (working)
>>> INFO  - 2018-12-28 15:43:41.938; org.apache.solr.core.SolrCore;
>> [collection1] webapp=/solr path=/select
>> params={q=ac6023*&qf=tm_field_product^21.0&qf=tm_title_field^8.0&EchoParams=all&rows=10&wt=xml&debugQuery=true}
>> hits=32 status=0 QTime=8
>>> 
>>> EchoParams=all did not show anything different in the resulting XML from
>> SOLR 7.x.
>>> 
>>> 
>>> I found out something curious yesterday. When I try to force the
>> Standard query parser on SOLR 7.x using the same query, but adding
>> "defType=lucene" at the beginning, SOLR 7 raises a SolrException with this
>> message: "analyzer returned too many terms for multiTerm term: ac6023"
>> (full response: https://pastebin.com/ijdBj4GF)
>>> 
>>> Log entry for that request:
>>> 2018-12-28 15:50:58.804 ERROR (qtp1769193365-15) [   x:collection1]
>> o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: analyzer
>> returned too many terms for multiTerm term: ac6023
>>>   at
>> org.apache.solr.schema.TextField.analyzeMultiTerm(TextField.java:180)
>>>   at
>> org.apache.solr.parser.SolrQueryParserBase.analyzeIfMultitermTermText(SolrQueryParserBase.java:992)
>>>   at
>> org.apache.solr.parser.SolrQueryParserBase.getPrefixQuery(SolrQueryParserBase.java:1173)
>>>   at
>> org.apache.solr.parser.SolrQueryParserBase.handleBareTokenQuery(SolrQueryParserBase.java:781)
>>>   at org.apache.solr.parser.QueryParser.Term(QueryParser.java:421)
>>>   at org.apache.solr.parser.QueryParser.Clause(QueryParser.java:278)
>>>   at org.apache.solr.parser.QueryParser.Query(QueryParser.java:162)
>>>   at
>> org.apache.solr.parser.QueryParser.TopLevelQuery(QueryParser.java:131)
>>>   at
>> org.apache.solr.parser.SolrQueryParserBase.parse(SolrQueryParserBase.java:254)
>>>   at
>> org.apache.solr.search.LuceneQParser.parse(LuceneQParser.java:49)
>>>   at org.apache.solr.search.QParser.getQuery(QParser.java:173)
>>>   at
>> org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:160)
>>>   at
>> org.apache.solr.handler.component.SearchHandle

Re: How to debug empty ParsedQuery from Edismax Query Parser

2019-01-02 Thread Shawn Heisey

On 12/28/2018 8:57 AM, Kay Wrobel wrote:

Here are my log entries:

SOLR 7.x (non-working)
2018-12-28 15:36:32.786 INFO  (qtp1769193365-20) [   x:collection1] o.a.s.c.S.Request [collection1]  
webapp=/solr path=/select 
params={q=ac6023*&qf=tm_field_product^21.0&qf=tm_title_field^8.0&EchoParams=all&rows=10&wt=xml&debugQuery=true}
 hits=0 status=0 QTime=2

SOLR 4.x (working)
INFO  - 2018-12-28 15:43:41.938; org.apache.solr.core.SolrCore; [collection1] webapp=/solr path=/select 
params={q=ac6023*&qf=tm_field_product^21.0&qf=tm_title_field^8.0&EchoParams=all&rows=10&wt=xml&debugQuery=true}
 hits=32 status=0 QTime=8


Neither of those requests includes anything that would change from the 
default lucene parser to edismax.  The logging *would* include all 
parameters set by the configuration as well as those specified on the URL.


You ought to try adding "defType=edismax" to the URL parameters, or to 
the definition of "/select" in solrconfig.xml.
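
For example, roughly like this from SolrJ (just a sketch; the base URL is a placeholder 
and the qf values are the ones from your log lines):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class EdismaxDebug {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient client =
                 new HttpSolrClient.Builder("http://localhost:8983/solr/collection1").build()) {
            SolrQuery q = new SolrQuery("ac6023*");
            q.set("defType", "edismax");   // force the edismax parser for this request
            q.set("qf", "tm_field_product^21.0 tm_title_field^8.0");
            q.set("debugQuery", "true");
            q.set("echoParams", "all");
            QueryResponse rsp = client.query(q);
            // With debugQuery=true the parsed edismax query shows up under "parsedquery".
            System.out.println(rsp.getDebugMap().get("parsedquery"));
        }
    }
}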



EchoParams=all did not show anything different in the resulting XML from SOLR 
7.x.


The parameter requested was "echoParams" and not "EchoParams".  There 
*is* a difference, and the latter will not work.



I found out something curious yesterday. When I try to force the Standard query parser on SOLR 7.x 
using the same query, but adding "defType=lucene" at the beginning, SOLR 7 raises a 
SolrException with this message: "analyzer returned too many terms for multiTerm term: 
ac6023"


I do not know what this is about.  I did find the message in the source 
code.  I don't understand the low-level code, and it looks to me like 
that section of code will *always* throw an exception, which isn't very 
useful.


Thanks,
Shawn



Re: How to debug empty ParsedQuery from Edismax Query Parser

2019-01-02 Thread Kay Wrobel
Thanks for your thoughts, Shawn. Are you a developer on SOLR?

Anyway, the configuration (solrconfig.xml) was provided by search_api_solr 
(Drupal 7 module) and is untouched. You can find it here:
https://cgit.drupalcode.org/search_api_solr/tree/solr-conf/7.x/solrconfig.xml?h=7.x-1.x

Thank you for pointing out the capital E on echoParams. I re-ran the query, but 
it doesn't change the output (at least on the surface of it).

> On Jan 2, 2019, at 1:11 PM, Shawn Heisey  wrote:
> 
> On 12/28/2018 8:57 AM, Kay Wrobel wrote:
>> Here are my log entries:
>> 
>> SOLR 7.x (non-working)
>> 2018-12-28 15:36:32.786 INFO  (qtp1769193365-20) [   x:collection1] 
>> o.a.s.c.S.Request [collection1] webapp=/solr path=/select 
>> params={q=ac6023*&qf=tm_field_product^21.0&qf=tm_title_field^8.0&EchoParams=all&rows=10&wt=xml&debugQuery=true}
>>  hits=0 status=0 QTime=2
>> 
>> SOLR 4.x (working)
>> INFO  - 2018-12-28 15:43:41.938; org.apache.solr.core.SolrCore; 
>> [collection1] webapp=/solr path=/select 
>> params={q=ac6023*&qf=tm_field_product^21.0&qf=tm_title_field^8.0&EchoParams=all&rows=10&wt=xml&debugQuery=true}
>>  hits=32 status=0 QTime=8
> 
> Neither of those requests includes anything that would change from the 
> default lucene parser to edismax.  The logging *would* include all parameters 
> set by the configuration as well as those specified on the URL.
> 
> You ought to try adding "defType=edismax" to the URL parameters, or to the 
> definition of "/select" in solrconfig.xml.
> 
>> EchoParams=all did not show anything different in the resulting XML from 
>> SOLR 7.x.
> 
> The parameter requested was "echoParams" and not "EchoParams".  There *is* a 
> difference, and the latter will not work.
> 
>> I found out something curious yesterday. When I try to force the Standard 
>> query parser on SOLR 7.x using the same query, but adding "defType=lucene" 
>> at the beginning, SOLR 7 raises a SolrException with this message: "analyzer 
>> returned too many terms for multiTerm term: ac6023"
> 
> I do not know what this is about.  I did find the message in the source code. 
>  I don't understand the low-level code, and it looks to me like that section 
> of code will *always* throw an exception, which isn't very useful.
> 
> Thanks,
> Shawn




Re: ConcurrentUpdateSolrClient - notify on success/failure?

2019-01-02 Thread Mikhail Khludnev
Also worth mentioning are the handleError() and onSuccess() callbacks.
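
A minimal sketch of wiring both up (assuming SolrJ 7.x, where ConcurrentUpdateSolrClient exposes handleError(Throwable) and onSuccess(HttpResponse) as overridable extension points; the URL, queue size and thread count below are placeholders):

import org.apache.http.HttpResponse;
import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient;

public class NotifyingUpdateClient extends ConcurrentUpdateSolrClient {

  public NotifyingUpdateClient(ConcurrentUpdateSolrClient.Builder builder) {
    super(builder);
  }

  @Override
  public void handleError(Throwable ex) {
    // Called when a queued update request fails; the stock implementation only logs it.
    System.err.println("Update request failed: " + ex);
  }

  @Override
  public void onSuccess(HttpResponse resp) {
    // Called after an update request completes successfully.
    System.out.println("Update request succeeded: " + resp.getStatusLine());
  }

  public static void main(String[] args) throws Exception {
    ConcurrentUpdateSolrClient.Builder b =
        new ConcurrentUpdateSolrClient.Builder("http://localhost:8983/solr/collection1")
            .withQueueSize(100)
            .withThreadCount(2);
    try (NotifyingUpdateClient client = new NotifyingUpdateClient(b)) {
      // client.add(doc); ... client.blockUntilFinished(); client.commit();
    }
  }
}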

On Wed, Jan 2, 2019 at 6:14 AM deniz  wrote:

> thanks a lot for the explanation :)
>
>
>
> -
> Zeki ama calismiyor... Calissa yapar...
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


-- 
Sincerely yours
Mikhail Khludnev


RE: Query kills Solrcloud

2019-01-02 Thread Webster Homer
We are still having serious problems with our solrcloud failing due to this 
problem.
The problem is clearly data related. 
How can I determine what documents are being searched? Is it possible to get 
Solr/lucene to output the docids being searched?

I believe that this is a lucene bug, but I need to narrow the focus to a 
smaller number of records, and I'm not certain how to do that efficiently. Are 
there debug parameters that could help?

-Original Message-
From: Webster Homer  
Sent: Thursday, December 20, 2018 3:45 PM
To: solr-user@lucene.apache.org
Subject: Query kills Solrcloud

We are experiencing almost nightly solr crashes due to Japanese queries. I’ve 
been able to determine that one of our field types seems to be a culprit. When 
I run a much reduced version of the query against our DEV solrcloud I see the 
memory usage jump from less than a gb to 5gb using only a single field in the 
query. The collection is fairly small, ~411,000 documents, of which only ~25,000 
have searchable Japanese fields. I have been able to simplify the query to run 
against a single Japanese field in the schema. The JVM memory jumps from less 
than a gig to close to 5 gb, and back down. The QTime is 36959 which seems high 
for 2500 documents. Indeed the single field that I’m using in my test case has 
2031 documents.

I extended the query to 5 fields and watched the memory usage in the Solr Console 
application. The memory usage goes to almost 6gb with a QTime of 100909. The 
Solr console shows connection errors, and when I look at the Cloud graph all the 
replicas on the node where I submitted the query are down. In dev the replicas 
eventually recover. In production, with the full query which has a lot more 
fields in the qf parameter, the solr cloud dies.
One example query term:
ジエチルアミノヒドロキシベンゾイル安息香酸ヘキシル

This is the field type that we have defined:

[Note: the XML markup was stripped by the list archive; the definition below is reconstructed from the attribute values preserved in the copy quoted further down the thread. Tag and class names are inferred, the field type name did not survive, and additional filters may have been lost entirely.]

<fieldType name="..." class="solr.TextField" positionIncrementGap="1"
           autoGeneratePhraseQueries="false">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory"
        pattern="([\p{IsHangul}\p{IsHan}\p{IsKatakana}\p{IsHiragana}]+)\s+(?=[\p{IsHangul}\p{IsHan}\p{IsKatakana}\p{IsHiragana}])"
        replacement="$1"/>
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <filter class="solr.ICUTransformFilterFactory" id="Traditional-Simplified"/>
    <filter class="solr.ICUTransformFilterFactory" id="Hiragana-Katakana"/>
    <filter class="solr.CJKBigramFilterFactory" hiragana="true" katakana="true"
        hangul="true" outputUnigrams="true"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.PatternReplaceCharFilterFactory"
        pattern="([\p{IsHangul}\p{IsHan}\p{IsKatakana}\p{IsHiragana}]+)\s+(?=[\p{IsHangul}\p{IsHan}\p{IsKatakana}\p{IsHiragana}])"
        replacement="$1"/>
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true" tokenizerFactory="solr.ICUTokenizerFactory"/>
    <filter class="solr.ICUTransformFilterFactory" id="Traditional-Simplified"/>
    <filter class="solr.ICUTransformFilterFactory" id="Hiragana-Katakana"/>
    <filter class="solr.CJKBigramFilterFactory" hiragana="true" katakana="true"
        hangul="true" outputUnigrams="true"/>
  </analyzer>
</fieldType>

Why is searching even 1 field of this type so expensive?
I suspect that this is data related, as other queries return in far less than a 
second. What are good strategies for determining what documents are causing the 
problem? I’m new to debugging Solr so I could use some help. I’d like to reduce 
the number of records to a minimum to create a small dataset to reproduce the 
problem.
Right now our only option is to stop using this fieldtype, but it does improve 
the relevancy of searches that don’t cause Solr to crash.

It would be a great help if the Solr console would not time out on these queries; is there a way to turn off the timeout?
We are running Solr 7.2


Re: Query kills Solrcloud

2019-01-02 Thread Gus Heck
Are you able to re-index a subset into a new collection?

For control of timeouts I would suggest Postman or curl, or some other
non-browser client.
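
For example (illustrative; adjust host, collection, fields and the query term):

curl -m 600 "http://localhost:8983/solr/collection1/select?defType=edismax&q=<url-encoded+Japanese+term>&qf=<your+Japanese+field>&debug=timing&rows=0"

curl's -m flag is a client-side timeout in seconds, so you decide how long to wait instead of the browser. debug=timing (or debug=true) shows where the time goes per search component, and adding fl=id,[docid] to a matching query returns the internal Lucene docid of each hit, which may help narrow down which documents are involved.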

On Wed, Jan 2, 2019 at 2:55 PM Webster Homer <
webster.ho...@milliporesigma.com> wrote:

> We are still having serious problems with our solrcloud failing due to
> this problem.
> The problem is clearly data related.
> How can I determine what documents are being searched? Is it possible to
> get Solr/lucene to output the docids being searched?
>
> I believe that this is a lucene bug, but I need to narrow the focus to a
> smaller number of records, and I'm not certain how to do that efficiently.
> Are there debug parameters that could help?
>
> -Original Message-
> From: Webster Homer 
> Sent: Thursday, December 20, 2018 3:45 PM
> To: solr-user@lucene.apache.org
> Subject: Query kills Solrcloud
>
> We are experiencing almost nightly solr crashes due to Japanese queries.
> I’ve been able to determine that one of our field types seems to be a
> culprit. When I run a much reduced version of the query against out DEV
> solrcloud I see the memory usage jump from less than a gb to 5gb using only
> a single field in the query. The collection is fairly small ~411,000
> documents of which only ~25,000 have searchable Japanese fields. I have
> been able to simplify the query to run against a single Japanese field in
> the schema. The JVM memory jumps from less than a gig to close to 5 gb, and
> back down. The QTime is 36959 which seems high for 2500 documents. Indeed
> the single field that I’m using in my test case has 2031 documents.
>
> I extended the query to 5 fields and watch the memory usage in the Solr
> Console application. The memory usage goes to almost 6gb with a QTime of
> 100909. The Solrconsole shows connection errors, and when I look at the
> Cloud graph all the replicas on the node where I submitted the query are
> down. In dev the replicas eventually recover. In production, with the full
> query which has a lot more fields in the qf parameter, the solr cloud dies.
> One example query term:
> ジエチルアミノヒドロキシベンゾイル安息香酸ヘキシル
>
> This is the field type that we have defined:
> positionIncrementGap="1" autoGeneratePhraseQueries="false">
>  
> 
> pattern="([\p{IsHangul}\p{IsHan}\p{IsKatakana}\p{IsHiragana}]+)\s+(?=[\p{IsHangul}\p{IsHan}\p{IsKatakana}\p{IsHiragana}])"
> replacement="$1"/>
>  
> 
> 
> 
>  id="Traditional-Simplified"/>
> 
>  id="Hiragana-Katakana"/>
>
>  hiragana="true" katakana="true" hangul="true" outputUnigrams="true" />
>   
>
>  
>  
> pattern="([\p{IsHangul}\p{IsHan}\p{IsKatakana}\p{IsHiragana}]+)\s+(?=[\p{IsHangul}\p{IsHan}\p{IsKatakana}\p{IsHiragana}])"
> replacement="$1"/>
>
>
> synonyms="synonyms.txt" ignoreCase="true" expand="true"
> tokenizerFactory="solr.ICUTokenizerFactory" />
> 
> 
>  id="Traditional-Simplified"/>
> 
>  id="Hiragana-Katakana"/>
>
>  hiragana="true" katakana="true" hangul="true" outputUnigrams="true" />
>   
> 
>
> Why is searching even 1 field of this type so expensive?
> I suspect that this is data related, as other queries return in far less
> than a second. What are good strategies for determining what documents are
> causing the problem? I’m new to debugging Solr so I could use some help.
> I’d like to reduce the number of records to a minimum to create a small
> dataset to reproduce the problem.
> Right now our only option is to stop using this fieldtype, but it does
> improve the relevancy of searches that don’t cause Solr to crash.
>
> It would be a great help if the Solrconsole would not timeout on these
> queries, is there a way to turn off the timeout?
> We are running Solr 7.2
>


-- 
http://www.the111shift.com


Re: Solr Cloud: Zookeeper failure modes

2019-01-02 Thread Erick Erickson
Right, don't quite know what I was thinking about. Even so, if
ZooKeeper is gone you'd still have to rebuild the .system collection
too. Or at least figure out how to access it again.
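
To make the "keep copies outside ZooKeeper" advice quoted below concrete (illustrative commands; adjust hosts, ports and paths), a periodic

curl "http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS&wt=json" > clusterstatus.json

captures the collection/shard/replica layout, and something like

bin/solr zk cp -r zk:/configs file:/backups/zk-configs -z localhost:2181

pulls the configsets out of ZK to local disk. Neither replaces proper ZooKeeper snapshot backups, but both make manual reconstruction much less painful.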

On Wed, Jan 2, 2019 at 10:21 AM Gus Heck  wrote:
>
> I thought jar files for custom code were meant to go into the '.system'
> collection, not zookeeper. Did I miss a new/old storage option?
>
> On Wed, Jan 2, 2019, 12:25 PM Erick Erickson 
> > 1> no. At one point, this could be done in the sense that the
> > collections would be reconstructed, (legacyCloud) but that turned out
> > to have.. side effects. Even in that case, though, Solr couldn't
> > reconstruct the configsets. (insert rant that you really must store
> > your configsets in a VCS system somewhere IMO).
> >
> > 2> Should be fine, as long as the state changes don't include things
> > like adding replicas or collections or you've changed your configsets.
> > ZK has nothing to do with commits for instance. Leader election is
> > recorded in ZK, but other leaders will be elected if necessary. Again,
> > though, if you've changed the topology (added replicas and/or
> > collections and/or shards if using implicit routing) between the time
> > you took the snapshot and ZK failed you'll have an incomplete restored
> > state.
> >
> > Now, all that said ZooKeeper data is "just data". Apart from blobs
> > stored in ZK, you can manually reconstruct the whole thing  with a
> > text editor and upload it. this would be tedious and error-prone to be
> > sure, but do-able. Periodically storing away a copy of the Collections
> > API CLUSTERSTATUS would help a lot.
> >
> > Another approach would be to simply re-create your collections with
> > the exact same shard count. That'll create replicas with the same
> > ranges etc. Then shut your Solr instances down and copy the data
> > directory from the correct old replica to the correct new replica.
> > Once you're satisfied that things are running, you can delete the old
> > (unused) data. As an aside, in this case I'd create my new
> > collection(s) as leader-only (1 replica), then copy as necessary and
> > verify that things were as expected. Once that was done, I'd use
> > ADDREPLICA to build out the new collection(s). This pre-supposes you
> > can get your configsets back from VCS as well as any binary data
> > you've stored in ZK (e.g. jar files for custom code and the like).
> >
> > So overall it's do-able even without ZK snapshots _assuming_ you can
> > find copies of your configsets and any custom code you've stored in
> > ZK. Not something I'd really _like_ to do, but in an emergency you
> > have options.
> >
> > But backing up ZK snapshots in a safe place would be, by far, the
> > easiest and safest thing to do
> >
> > HTH,
> > Erick
> >
> > On Wed, Jan 2, 2019 at 12:36 AM Pavel Micka 
> > wrote:
> > >
> > > Hi,
> > > We are currently implementing Solr cloud and as part of this effort we
> > are investigating, which failure modes may happen between Solr and
> > Zookeeper.
> > >
> > > We have found quite a lot articles describing the "happy path" failure,
> > when ZK stops (loses majority) and the Solr Cluster ceases to serve write
> > requests (& read continues to work as expected). Once ZK cluster is
> > reconciled and majority achieved again, everything continues working as
> > expected.
> > >
> > > What we have not been able to find is what happens when ZK cluster
> > catastrophically fails and loses its data. Either completely (scenario A)
> > or is restarted from backup (scenario B).
> > >
> > > So now the questions:
> > >
> > > 1)  Scenario A - Is existing Solr Cloud cluster able to start
> > against a clean Zookeeper and reconstruct all the ZK data from its internal
> > state (using some king of emergency recovery; it may take long)?
> > >
> > > 2)  Scenario B - What is the worst case backup/restore scenario? For
> > example when
> > >
> > > a.   ZK is backed up
> > >
> > > b.   Cluster performs some transition between states "X -> Y" (such
> > as commit shard, elect new leader etc.)
> > >
> > > c.   ZK fails completely
> > >
> > > d.   ZK is restored from backup created in step a
> > >
> > > e.   Solr Cloud is in state "Y", while ZK is in state "X"
> > >
> > > Thanks in advance,
> > >
> > > Pavel
> > >
> >