ConcurrentUpdateSolrClient - notify on success/failure?

2018-12-27 Thread deniz
I am trying to figure out whether I can log anything or fire events for other
listeners (a JEE event and so on) once ConcurrentUpdateSolrClient sends the
updates to Solr (i.e. the internal queue is emptied and the request with all
of the data is made to Solr itself) from Java code. Basically, I am trying to
find a way to add some logic based on the "flushing" status of
ConcurrentUpdateSolrClient.

After digging a bit, I found some methods that might be useful, but I
couldn't find any explanation of them:

blockUntilFinished() -> seems like it might be useful, but I couldn't find
any example cases.
handleError(Throwable ex) -> only logs the error.
onSuccess(HttpResponse resp) -> empty method body; needs overriding.

There is also shutdownNow(), but it doesn't seem useful for the
functionality I am looking for.

Are there any other ways to listen for the flushing? And could anyone
explain some details about blockUntilFinished(), please? In what cases is it
useful?
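In SolrJ the usual route is to subclass ConcurrentUpdateSolrClient and override onSuccess()/handleError(), which are invoked as queued requests complete; blockUntilFinished() blocks the caller until the internal queue has been drained and outstanding requests have finished, typically before issuing a commit. As a self-contained sketch of the notification pattern itself (plain java.util.concurrent, no SolrJ; all names here are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch (not SolrJ) of a queueing updater that fires a
// listener each time its internal queue has been drained ("flushed"),
// plus a blockUntilFinished()-style wait for all queued work.
class NotifyingUpdater {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final AtomicInteger pending = new AtomicInteger();
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Runnable onFlush;   // fired each time the queue empties
    private volatile boolean closed = false;

    NotifyingUpdater(Runnable onFlush) {
        this.onFlush = onFlush;
        worker.submit(this::drainLoop);
    }

    void add(String doc) {
        pending.incrementAndGet();
        queue.add(doc);
    }

    private void drainLoop() {
        List<String> batch = new ArrayList<>();
        while (!closed) {
            try {
                String first = queue.poll(100, TimeUnit.MILLISECONDS);
                if (first == null) continue;
                batch.clear();
                batch.add(first);
                queue.drainTo(batch);     // grab everything queued so far
                send(batch);              // the actual update request would go here
                if (queue.isEmpty()) onFlush.run();  // queue drained: notify listeners
                pending.addAndGet(-batch.size());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    private void send(List<String> batch) { /* HTTP call in a real client */ }

    // Analogous to blockUntilFinished(): block until every queued update
    // has been handed off, e.g. before issuing a commit.
    void blockUntilFinished() {
        while (pending.get() > 0) {
            try { TimeUnit.MILLISECONDS.sleep(10); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); return; }
        }
    }

    void shutdown() {
        closed = true;
        worker.shutdown();
    }
}
```

A SolrJ subclass would put the event-firing logic in onSuccess(HttpResponse) instead of a Runnable, but the queue-drain/notify shape is the same.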







-
Smart, but it doesn't work... If it worked, it would do the job...
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr reload process flow

2018-12-27 Thread Vadim Ivanov
Hi!
(Solr 7.6, TLOG replicas)
I have an issue while reloading a collection with 100 shards and 3 replicas
per shard residing on 5 nodes. The configuration of that collection is pretty
complex (90 external file fields). When a node starts, its cores always load
successfully.

When I reload the collection with the Collections API command:
/admin/collections?action=RELOAD&name=col
all 5 nodes stop responding and I have a dead cluster. Only restarting Solr
on all nodes revives it.

When I decreased the number of shards/cores by a factor of 5 (to 20 shards
instead of 100), the collection reloaded successfully. My guess is that
during a collection RELOAD the limit on threads is not honored and all cores
try to reload simultaneously.

Erick wrote here ( 
http://lucene.472066.n3.nabble.com/collection-reload-leads-to-OutOfMemoryError-td4380754.html#a4380791
 )
➢ There are a limited number of threads that load in parallel when 
➢ starting up, depends on the configuration. The defaults are 3 threads 
➢ in stand-alone and 8 in Cloud (see: NodeConfig.java) 
➢
➢ public static final int DEFAULT_CORE_LOAD_THREADS = 3; 
➢ public static final int DEFAULT_CORE_LOAD_THREADS_IN_CLOUD = 8; 

But unfortunately, stumbling through the source, I can't find the place to
confirm whether this thread limit plays any role in collection reload or not
(I lack the necessary Java skills).
Maybe somebody can give a hint where to look?

There was a discussion here as well:
http://lucene.472066.n3.nabble.com/Solr-reload-process-flow-td4379966.html#none
-- 
Vadim




Re: MoreLikeThis & Synonyms

2018-12-27 Thread Nicolas Paris
On Wed, Dec 26, 2018 at 09:09:02PM -0800, Erick Erickson wrote:
> bq. However multiword synonyms are only compatible with queryTime synonyms
> expansion.
> 
> Why do you say this? What version of Solr? Query-time mult-word
> synonyms were _added_, but AFAIK the capability of multi-word synonyms
> was not taken away. 

From this blog post [1] I deduced that multi-word synonyms are only
compatible with query-time expansion.

> Or are you saying that MLT doesn't play nice at all with multi-word
> synonyms?

From my tests, MLT does not expand the query with synonyms. So it is not
possible to use query-time synonyms, multi-word or otherwise. Only index-time
expansion is possible, with the limitations it has [1].

> What version of Solr are you using?

I am running Solr 7.6.

[1] 
https://lucidworks.com/2017/04/18/multi-word-synonyms-solr-adds-query-time-support/

-- 
nicolas


CloudSolrClient with implicit route and disable distributed mode routes to unexpected cores

2018-12-27 Thread Jaroslaw Rozanski

Hi all,

I found an interesting problem in Solr 7.5.0 regarding the implicit router
when the _route_ param is provided in a non-distributed request.


Imagine following set-up...

1. Collection: foo
2. Physical nodes: nodeA, nodeB
3. Shards: shard1, shard2
4. Replication factor: 2 (pure NRT)

- nodeA
-- foo_shard1_replica_n1
-- foo_shard2_replica_n1
- nodeB
-- foo_shard1_replica_n2
-- foo_shard2_replica_n2

TL;DR: two shards, two replicas each, co-sharing nodes.


Request: new SolrQuery("filter:value").setParam("_route_", 
"shard1").setParam("distrib", "false");


This request returns unpredictable results, depending on which core it hits.



The reason is that CloudSolrClient resolves node URLs to the collection
rather than to cores. This is the critical snippet in the code:


--- Start from line 1072 ---

List<String> replicas = new ArrayList<>();
String joinedInputCollections = StrUtils.join(inputCollections, ',');
for (Slice slice : slices.values()) {
  for (ZkNodeProps nodeProps : slice.getReplicasMap().values()) {
    ZkCoreNodeProps coreNodeProps = new ZkCoreNodeProps(nodeProps);
    String node = coreNodeProps.getNodeName();
    if (!liveNodes.contains(node) // Must be a live node to continue
        || Replica.State.getState(coreNodeProps.getState()) != Replica.State.ACTIVE) // Must be an ACTIVE replica to continue
      continue;
    if (seenNodes.add(node)) { // if we haven't yet collected a URL to this node...
      String url = ZkCoreNodeProps.getCoreUrl(nodeProps.getStr(ZkStateReader.BASE_URL_PROP), joinedInputCollections); // BOOM!
      if (sendToLeaders && coreNodeProps.isLeader()) {
        theUrlList.add(url); // put leaders here eagerly (if sendToLeaders mode)
      } else {
        replicas.add(url); // replicas here
      }
    }
  }
}



The URL of the replica is formed using the collection name, not the core
name (line 1082):
ZkCoreNodeProps.getCoreUrl(nodeProps.getStr(ZkStateReader.BASE_URL_PROP), joinedInputCollections)



Instead of getting URLs like:
- http://nodeA/solr/foo_shard1_replica_n1
- http://nodeB/solr/foo_shard1_replica_n2

We end up with:
- http://nodeA/solr/foo
- http://nodeB/solr/foo

Because the shards in this example share physical nodes, the request is
sometimes routed to a core of the proper shard and sometimes not.


Should CloudSolrClient resolve exact core URLs when distrib=false? I am
guessing yes.
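To make the collection-vs-core distinction concrete, here is a self-contained sketch (plain Java, no SolrJ; the layout map just mirrors the hypothetical cluster above) of the resolution the client arguably should perform when distrib=false and _route_ is set, next to what it builds today:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch of resolving _route_ to exact core URLs, versus what the
// 7.5.0 code effectively produces (base URL + collection name).
class RouteResolver {
    // shard -> (node base URL -> core name); mirrors the example cluster
    static final Map<String, Map<String, String>> LAYOUT = Map.of(
        "shard1", Map.of("http://nodeA/solr", "foo_shard1_replica_n1",
                         "http://nodeB/solr", "foo_shard1_replica_n2"),
        "shard2", Map.of("http://nodeA/solr", "foo_shard2_replica_n1",
                         "http://nodeB/solr", "foo_shard2_replica_n2"));

    // Exact core URLs for a _route_ value: unambiguous targets.
    static List<String> coreUrls(String route) {
        return LAYOUT.get(route).entrySet().stream()
            .map(e -> e.getKey() + "/" + e.getValue())
            .sorted()
            .collect(Collectors.toList());
    }

    // What CloudSolrClient currently builds: one URL per node, named
    // after the collection, so the node picks an arbitrary local core.
    static List<String> collectionUrls(String collection) {
        return LAYOUT.get("shard1").keySet().stream()
            .map(base -> base + "/" + collection)
            .sorted()
            .collect(Collectors.toList());
    }
}
```

With the collection-named URLs, a node hosting cores of both shards can legally answer from either one, which is exactly the unpredictability described above.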



--
Jaroslaw Rozanski | e: m...@jarekrozanski.eu



Re: Nested Child document doesn't return in query

2018-12-27 Thread Mikhail Khludnev
It might be checked with the explainOther request param.

On Mon, Dec 17, 2018 at 9:08 PM Stephon Harris <
shar...@enterprise-knowledge.com> wrote:

> I ingested some nested documents into a Solr 7.4 core . When I search with
> the following it's not returning a child document that I expected:
>
>
>
> ```
>
> {!child of=cont_type:overview}id:2
>
> ```
>
>
>
> I can see that the document I'm looking for exists with the query:
>
>
>
> ```
>
> q=id:2-1
>
> ```
>
>
>
> I'm wondering why the document with id "2-1" no longer returns with the
> Block Join Child Query Parser? It previously did. Is there a way someone
> could have un-nested a child document?
>
> --
> Stephon Harris
>
> *Enterprise Knowledge, LLC*
> *Web: *http://www.enterprise-knowledge.com/
> *E-mail:* shar...@enterprise-knowledge.com/
> *Cell:* 832-628-8352
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Solr reload process flow

2018-12-27 Thread Erick Erickson
You can set it in solr.xml; see
http://lucene.apache.org/solr/guide/7_6/format-of-solr-xml.html
if you'd like to experiment with bumping it higher.

As for "stumbling around in the code": what this _sounds_ like is some
kind of deadlock, and bumping the number of threads would be, at best, a
band-aid.

There are two things I'd try:

1> assuming you're on a *nix system, make sure you've bumped your
ulimits for processes and files. There's a warning when Solr starts up
about the limits for both processes and open files; both should be
quite high (65K).

2> take a thread dump of Solr when it's stuck; here's a pretty good
overview: https://dzone.com/articles/how-analyze-java-thread-dumps
What you're looking for in particular is DEADLOCK status. That'll give
you a stack trace of any threads that are deadlocked and indicate where
to start looking. If you do find deadlocked threads _and_ you have
bumped your ulimits, it's probably worth a JIRA...
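The same check a thread dump's "Found one Java-level deadlock" section performs is exposed by the JDK's ThreadMXBean, so it can also be scripted. A self-contained demonstration (two deliberately deadlocked daemon threads; not Solr code):

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.CountDownLatch;

// Demonstrates programmatic deadlock detection via ThreadMXBean,
// the same cycle check jstack reports in its DEADLOCK section.
class DeadlockCheck {

    // Returns ids of deadlocked threads, or null when none exist.
    static long[] findDeadlockedThreads() {
        return ManagementFactory.getThreadMXBean().findDeadlockedThreads();
    }

    // Spawns two daemon threads that acquire locks a and b in opposite
    // order, guaranteeing a classic lock-ordering deadlock.
    static void createDeadlock() {
        Object a = new Object(), b = new Object();
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        Runnable r1 = () -> {
            synchronized (a) {
                bothHoldFirstLock.countDown();
                await(bothHoldFirstLock);
                synchronized (b) { }   // blocks forever: r2 holds b
            }
        };
        Runnable r2 = () -> {
            synchronized (b) {
                bothHoldFirstLock.countDown();
                await(bothHoldFirstLock);
                synchronized (a) { }   // blocks forever: r1 holds a
            }
        };
        Thread t1 = new Thread(r1); t1.setDaemon(true); t1.start();
        Thread t2 = new Thread(r2); t2.setDaemon(true); t2.start();
        await(bothHoldFirstLock);
        try { Thread.sleep(300); }     // let both threads block on the second lock
        catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }

    private static void await(CountDownLatch latch) {
        try { latch.await(); }
        catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }
}
```

Running jstack against the stuck Solr PID shows the same information, together with the stack traces pointing at the offending code.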

And, of course, take a look at your Solr logs to see if there are any
clues there.

Best,
Erick

On Thu, Dec 27, 2018 at 1:51 AM Vadim Ivanov
 wrote:
>
> Hi!
> (Solr 7.6 ,  Tlog replicas)
> I have an issue while reloading collection with 100 shards and 3 replicas per 
> shard residing on 5 nodes.
> Configuration of that collection is pretty complex (90 external file fields)
> When node starts cores load always successfully.
>
> When I reload collection with collection api command: 
> /admin/collections?action=RELOAD&name=col
> all 5 nodes stop responding and I have dead cluster. Only restarting solr on 
> all nodes revives it.
>
> When I decreased number of shards/cores by 5 times (to 20 shards instead of 
> 100)  Collection reloaded successfully.
> My guess is that during Collection RELOAD , limit on threads is not honored 
> and all cores try to reload simultaneously.
>
> Erick wrote here ( 
> http://lucene.472066.n3.nabble.com/collection-reload-leads-to-OutOfMemoryError-td4380754.html#a4380791
>  )
> ➢ There are a limited number of threads that load in parallel when
> ➢ starting up, depends on the configuration. The defaults are 3 threads
> ➢ in stand-alone and 8 in Cloud (see: NodeConfig.java)
> ➢
> ➢ public static final int DEFAULT_CORE_LOAD_THREADS = 3;
> ➢ public static final int DEFAULT_CORE_LOAD_THREADS_IN_CLOUD = 8;
>
> But unfortunately  stumbling about source I can't find out the place and 
> approve
> whether these "threads limit" plays any role in reload collection or not...   
> though I lack the necessary skills in java
> Maybe somebody can give a hint where to look?
>
> There was discussion here as well
> http://lucene.472066.n3.nabble.com/Solr-reload-process-flow-td4379966.html#none
> --
> Vadim
>
>


Re: Starting optimize... Reading and rewriting the entire index! Use with care

2018-12-27 Thread Erick Erickson
1> Not sure. You can get stats after the fact if that would help.

2, 3, 4> Well, optimize is a configuration parameter in DIH that
defaults to true, so set it to false and you'll get rid of the
optimize. See:
https://lucene.apache.org/solr/guide/6_6/uploading-structured-data-store-data-with-the-data-import-handler.html

On Wed, Dec 26, 2018 at 11:23 PM talhanather  wrote:
>
> Hi Erick,
>
> My DIH is working perfectly, As I have done the full import of index from
> SQL Server after installation.
>
> I had 25000 records which was imported to Solr through Full import using DIH
> config only.
> But my additional requirement is to import the new records through delta
> import (through some automation job) whenever the new records are getting
> created in the database.
> But when I click on Delta Import,  Its not working for some reason and
> nothing is showing in logs.
>
> While verifying logs, I saw the below warning occurring recursively and
> could see that the new records are getting indexed without clicking on delta
> import.
>
> "2018-12-19 16:13:21.927 WARN  (qtp736709391-15) [   x:solrprod]
> o.a.s.u.DirectUpdateHandler2 Starting optimize... Reading and rewriting the
> entire index! Use with care."
>
> I have mentioned my queries below,  Kindly suggest.
>
> 1. Without clicking on delta import, How my new records are getting indexed
> ?
> 2. I didnt click on the optimize, Then how its continuously doing the same.
> 3. How to stop optimization ?
> 4. As per the requirement my new records are getting indexed (without delta
> import), whether it will stop import of index when i stop optimization.
>
> Thanks,
> Talha
>
>
>
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Zookeeper timeout issue -

2018-12-27 Thread AshB
Hi Jan,

Setup Details

Mach-1 --> 20GB RAM.
Apps running: Oracle DB, WebLogic Server (services deployed to call
Solr), *one Solr node*, *one ZooKeeper node*
Mach-2 --> 20GB RAM.
Apps running: *one Solr node*, *two ZooKeeper nodes*.

Solr collection details: ~8k docs, ~140MB on disk, one shard on machine 1
and two replicas on mach-1 and mach-2.

We did JMeter load testing with 50 users and 30 iterations, i.e. 1500
requests. In each call, Solr is queried three times due to requirements.

What we noticed is that when the load on mach-1 goes high (up to ~12)
and memory utilization goes high, some requests time out.

Is this expected from ZooKeeper when the load is too high?

-Ashish







How to debug empty ParsedQuery from Edismax Query Parser

2018-12-27 Thread Kay Wrobel
Hi everyone.

I have the task of converting our old SOLR 4.10.2 instance to the current SOLR 
7.6.0 version. We're using SOLR as our Search API backend for a Drupal 7 
Commerce web site. One of the most basic queries is that a customer would enter 
a part number or a portion of a part number on our web site and then get a list 
of part numbers back. Under the hood, we are using the "search_api_solr" module 
for Drupal 7 which produces a rather large query request. I have simplified a 
sample request for the sake of this discussion.

On SOLR 4.10.2, when I issue the following to our core:
/select?qf=tm_field_product^21.0&qf=tm_title_field^8.0&q=ac6023*&wt=xml&rows=10&debugQuery=true

I get 32 rows returned (out of a 1.4M indexed documents). Here is a link to the 
response (edited to focus on debugging info):
https://pastebin.com/JHuFcbGG 

Notice how "parsedQuery" has a proper DisjunctionMaxQuery based on the two 
query fields.

Now starting from SOLR version 5+, I receive zero (0) results back, but more 
importantly, the Query Parser produces an empty parsedQuery.

Here is the same query issued to SOLR 7.6.0 (current version):
https://pastebin.com/XcNhfdUD 

Notice how "parsedQuery" now shows "+()"; an empty query string.

As I understand it, the wildcard is a perfectly legal character in a query and 
has the following meaning:
http://lucene.apache.org/solr/guide/7_6/the-standard-query-parser.html#wildcard-searches
 


So why does this not work? I have installed SOLR 5.5.5 and 6.6.5 as well,
just to test when this behavior started, and it starts as early as SOLR 5.
I've been searching Google for the past week on the matter and cannot for
the life of me find an answer to this issue, so I am turning to this mailing
list for any advice and assistance.

Kind regards, and thanks.

Kay
-- 

The information in this e-mail is confidential and is intended solely for 
the addressee(s). Access to this email by anyone else is unauthorized. If 
you are not an intended recipient, you may not print, save or otherwise 
store the e-mail or any of the contents thereof in electronic or physical 
form, nor copy, use or disseminate the information contained in the email.  
If you are not an intended recipient,  please notify the sender of this 
email immediately.


Re: How to debug empty ParsedQuery from Edismax Query Parser

2018-12-27 Thread Shawn Heisey

On 12/27/2018 10:47 AM, Kay Wrobel wrote:

Now starting from SOLR version 5+, I receive zero (0) results back, but more 
importantly, the Query Parser produces an empty parsedQuery.

Here is the same query issued to SOLR 7.6.0 (current version):
https://pastebin.com/XcNhfdUD 

Notice how "parsedQuery" now shows "+()"; an empty query string.


I can duplicate this result on a 7.5.0 example config by sending an 
edismax query with undefined parameters for df and qf. The other 
field-related parameters for edismax are also undefined.  The following 
URL parameters with the default example config will produce that parsed 
query:


q=ac6023*&defType=edismax&df=&qf=&debugQuery=on

When a query is made and Solr's logging configuration is at its default 
setting, Solr will log a line into its logfile containing all of the 
parameters in the query, both those provided on the URL and those set by 
Solr's configuration (solrconfig.xml).  Can you share this log line from 
both the version that works and the version that doesn't?


This is the log line created when I ran the query mentioned above:

2018-12-27 23:03:23.199 INFO  (qtp315932542-23) [   x:baz] 
o.a.s.c.S.Request [baz]  webapp=/solr path=/select 
params={q=ac6023*&defType=edismax&df=&qf=&debugQuery=on} hits=0 status=0 
QTime=0


What I'm thinking is that there is a difference in the configuration of 
the two servers or the actual query being sent is different.  Either 
way, there's something different.  The two log lines that I have asked 
for are likely to be different from each other in some way that will 
explain what you're seeing.


Thanks,
Shawn



Re: How to debug empty ParsedQuery from Edismax Query Parser

2018-12-27 Thread Alexandre Rafalovitch
echoParams=all may also be helpful to pinpoint differences in params
from all sources, including request handler defaults.

Regards,
Alex

On Thu, Dec 27, 2018, 8:25 PM Shawn Heisey wrote:
> On 12/27/2018 10:47 AM, Kay Wrobel wrote:
> > Now starting from SOLR version 5+, I receive zero (0) results back, but
> more importantly, the Query Parser produces an empty parsedQuery.
> >
> > Here is the same query issued to SOLR 7.6.0 (current version):
> > https://pastebin.com/XcNhfdUD 
> >
> > Notice how "parsedQuery" now shows "+()"; an empty query string.
>
> I can duplicate this result on a 7.5.0 example config by sending an
> edismax query with undefined parameters for df and qf. The other
> field-related parameters for edismax are also undefined.  The following
> URL parameters with the default example config will produce that parsed
> query:
>
> q=ac6023*&defType=edismax&df=&qf=&debugQuery=on
>
> When a query is made and Solr's logging configuration is at its default
> setting, Solr will log a line into its logfile containing all of the
> parameters in the query, both those provided on the URL and those set by
> Solr's configuration (solrconfig.xml).  Can you share this log line from
> both the version that works and the version that doesn't?
>
> This is the log line created when I ran the query mentioned above:
>
> 2018-12-27 23:03:23.199 INFO  (qtp315932542-23) [   x:baz]
> o.a.s.c.S.Request [baz]  webapp=/solr path=/select
> params={q=ac6023*&defType=edismax&df=&qf=&debugQuery=on} hits=0 status=0
> QTime=0
>
> What I'm thinking is that there is a difference in the configuration of
> the two servers or the actual query being sent is different.  Either
> way, there's something different.  The two log lines that I have asked
> for are likely to be different from each other in some way that will
> explain what you're seeing.
>
> Thanks,
> Shawn
>
>


UnifiedHighlighter returns an error when setting hl.maxAnalyzedChars=-1

2018-12-27 Thread Yasufumi Mizoguchi
Hi,

I ran into a UnifiedHighlighter error when setting hl.maxAnalyzedChars=-1
in Solr 7.6. Here is the procedure to reproduce it:

$ bin/solr -e techproducts
$ curl -XGET
"localhost:8983/solr/techproducts/select?hl.fl=name&hl.maxAnalyzedChars=-1&hl.method=unified&hl=on&q=memory&df=name"

I have written a patch that replaces negative values of the parameter with
Integer.MAX_VALUE - 1 (because the UnifiedHighlighter seems not to accept
maxAnalyzedChars=Integer.MAX_VALUE, unlike the other highlighters...)
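The fix described amounts to a one-line clamp on the parameter; a hedged sketch of the idea (class and method names here are hypothetical, not the actual patch):

```java
// Hypothetical sketch of the clamp the patch describes: treat any
// negative hl.maxAnalyzedChars as "no limit", using MAX_VALUE - 1
// because the UnifiedHighlighter rejects Integer.MAX_VALUE itself.
class HighlightClamp {
    static int effectiveMaxAnalyzedChars(int requested) {
        return requested < 0 ? Integer.MAX_VALUE - 1 : requested;
    }
}
```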

Can I open a JIRA about this issue and post my patch there?

Thanks,
Yasufumi.