Solr 7.4 - LTR reranker not adhering to Elevate Plugin

2020-05-14 Thread Ashwin Ramesh
Hi everybody,

We are running a query with both elevateIds=1,2,3 & a reranker phase using
LTR plugin.

We noticed that the results are not returned in the expected order per the
elevateIds param.
Example LTR rq param: {!ltr model=foo reRankDocs=250 efi.query=$q}

When I used the standard reranker ({!rerank reRankQuery=$titleQuery
reRankDocs=1000 reRankWeight=3}), it did adhere.

I assumed it's because the elevate plugin runs before the reranker (LTR);
however, I'm finding it hard to confirm. The model is a linear model.
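For reference, the combined request looks roughly like this (collection name
and values are made up; the rq syntax is the standard {!ltr ...} form):

```
/solr/mycollection/select?q=shirt
    &elevateIds=1,2,3
    &enableElevation=true
    &forceElevation=true
    &rq={!ltr model=foo reRankDocs=250 efi.query=$q}
```

Note that forceElevation only forces elevated documents to the top of the
sorted results; whether a rerank phase then re-orders them afterwards is
exactly the question here.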

Is this expected behaviour?

Regards,

Ash


Re: Secure communication between Solr and Zookeeper

2020-05-14 Thread ChienHuaWang
Hi Jan,
Could you provide more detail on the steps to set up secure communication
between ZooKeeper & Solr?



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Solr TLS for CDCR

2020-05-14 Thread ChienHuaWang
Does anyone have experience setting up TLS for Solr CDCR?

I read the documentation:
https://lucene.apache.org/solr/guide/7_6/enabling-ssl.html
Would this apply to CDCR once enabled, or do we need additional configuration
for CDCR?

Appreciate any feedback




--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Secure communication between Solr and Zookeeper

2020-05-14 Thread Jan Høydahl
I’m sorry, I don’t have the possibility of completing that now. As I said, you
have some pointers in https://issues.apache.org/jira/browse/SOLR-7893, but it
is not completed, so this is currently an undocumented (and unsupported)
feature. That means you’re on your own.
The documentation from the ZK project should be enough; I got it working in a
local test once, but I needed to use JDK 14 and not JDK 11, I don’t know why.
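For anyone who wants to try anyway, the ZooKeeper-side settings look roughly
like this (property names are from the ZooKeeper project's SSL guide; paths
and passwords are placeholders, and none of this is supported by Solr yet):

```
# zoo.cfg (ZooKeeper 3.5+)
secureClientPort=2281
serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
ssl.keyStore.location=/path/to/zk-keystore.jks
ssl.keyStore.password=changeit
ssl.trustStore.location=/path/to/zk-truststore.jks
ssl.trustStore.password=changeit
```

The Solr JVM would then need the matching client properties
(-Dzookeeper.client.secure=true,
-Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty, and
the corresponding -Dzookeeper.ssl.* store settings).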

If you make it work, please contribute insights in the above JIRA issue.

Jan

> 14. mai 2020 kl. 05:47 skrev ChienHuaWang :
> 
> Hi Jan,
> Could you provide more detail what are the steps to setup between zookeeper
> & Solr?
> 
> 
> 
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Filtering large amount of values

2020-05-14 Thread Rudenko, Artur
Hi,
We have a requirement of implementing a boolean filter with up to 500k values.

We took the approach of post filter.

Our environment has 7 servers with 128GB RAM and 64 CPUs each. We have
20-40m very large documents. Each Solr instance has 64 shards with 2 replicas,
and the JVM memory Xms and Xmx are set to 31GB.

We are seeing that using single post filter with 1000 on 20m documents takes 
about 4.5 seconds.

Logic in our collect method:

numericDocValues =
    reader.getNumericDocValues(FileFilterPostQuery.this.metaField);

if (numericDocValues != null && numericDocValues.advanceExact(docNumber)) {
    longVal = numericDocValues.longValue();
} else {
    return;
}

if (numericValuesSet.contains(longVal)) {
    super.collect(docNumber);
}


Is it the best we can get?


Thanks,
Artur Rudenko


This electronic message may contain proprietary and confidential information of 
Verint Systems Inc., its affiliates and/or subsidiaries. The information is 
intended to be for the use of the individual(s) or entity(ies) named above. If 
you are not the intended recipient (or authorized to receive this e-mail for 
the intended recipient), you may not use, copy, disclose or distribute to 
anyone this message or any information contained in this message. If you have 
received this electronic message in error, please notify us by replying to this 
e-mail.


Re: Filtering large amount of values

2020-05-14 Thread Mikhail Khludnev
Hi, Artur.

Please don't tell me that you obtain docValues for every doc? It's deadly
slow; see https://issues.apache.org/jira/browse/LUCENE-9328 for a related
problem.
Make sure you obtain them once per segment, when the leaf reader is injected.
Recently some new method(s) were added for {!terms}; I'm wondering if any of
them might solve the problem.
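Concretely, "once per segment" means hoisting the docValues lookup out of
collect() into the method that receives the leaf reader. A sketch against
Solr's DelegatingCollector, reusing Artur's field names (not compiled or
tested here):

```java
@Override
public void doSetNextReader(LeafReaderContext context) throws IOException {
    super.doSetNextReader(context);
    // Fetch the docValues once per segment, not once per document.
    numericDocValues = context.reader().getNumericDocValues(metaField);
}

@Override
public void collect(int docNumber) throws IOException {
    if (numericDocValues != null
            && numericDocValues.advanceExact(docNumber)
            && numericValuesSet.contains(numericDocValues.longValue())) {
        super.collect(docNumber);
    }
}
```

The {!terms} route would replace the custom post filter entirely, e.g.
fq={!terms f=meta_id method=docValuesTermsFilter}1,7,42,... (the field name
here is assumed; the method param was added in recent releases).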

On Thu, May 14, 2020 at 2:36 PM Rudenko, Artur 
wrote:

> Hi,
> We have a requirement of implementing a boolean filter with up to 500k
> values.
>
> We took the approach of post filter.
>
> Our environment has 7 servers of 128gb ram and 64cpus each server. We have
> 20-40m very large documents. Each solr instance has 64 shards with 2
> replicas and JVM memory xms and xmx set to 31GB.
>
> We are seeing that using single post filter with 1000 on 20m documents
> takes about 4.5 seconds.
>
> Logic in our collect method:
> numericDocValues =
> reader.getNumericDocValues(FileFilterPostQuery.this.metaField);
>
> if (numericDocValues != null &&
> numericDocValues.advanceExact(docNumber)) {
> longVal = numericDocValues.longValue();
> } else {
> return;
> }
> }
>
> if (numericValuesSet.contains(longVal)) {
> super.collect(docNumber);
> }
>
>
> Is it the best we can get?
>
>
> Thanks,
> Artur Rudenko
>
>
>


-- 
Sincerely yours
Mikhail Khludnev


How to determine why solr stops running?

2020-05-14 Thread Ryan W
Hi all,

I manage a site where solr has stopped running a couple times in the past
week. The server hasn't been rebooted, so that's not the reason.  What else
causes solr to stop running?  How can I investigate why this is happening?

Thank you,
Ryan


Re: How to determine why solr stops running?

2020-05-14 Thread James Greene
Check the log for an OOM crash.  Fatal exceptions will be in the main
solr log and out-of-memory errors will be in their own -oom log.

I've encountered quite a few Solr crashes, and usually it's when there's a
threshold of concurrent users and/or indexing happening.
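For example, on a default service install the check might look like this (the
log location varies by how Solr was installed):

```
grep -i "OutOfMemoryError" /var/solr/logs/solr.log
ls -l /var/solr/logs/solr_oom_killer-*.log
```

The solr_oom_killer-<port>-<timestamp>.log files only appear if Solr was
started by the bin/solr script with its OOM handling in place.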



On Thu, May 14, 2020, 9:23 AM Ryan W  wrote:

> Hi all,
>
> I manage a site where solr has stopped running a couple times in the past
> week. The server hasn't been rebooted, so that's not the reason.  What else
> causes solr to stop running?  How can I investigate why this is happening?
>
> Thank you,
> Ryan
>


Performance issue in Query execution in Solr 8.3.0 and 8.5.1

2020-05-14 Thread vishal patel
I am upgrading Solr 6.1.0 to Solr 8.3.0 or Solr 8.5.1.

I see a performance issue in query execution in Solr 8.3.0 and 8.5.1 when
one field in the query has many values and a grouping field is applied.

My Solr URL : 
https://drive.google.com/file/d/1UqFE8I6M451Z1wWAu5_C1dzqYEOGjuH2/view
My Solr config and schema : 
https://drive.google.com/drive/folders/1pJBxL0OOwAJSEC5uK_87ikaHEVGdDEEn

It takes 34 seconds in Solr 8.3.0 or Solr 8.5.1. Same URL takes 1.5 seconds in 
Solr 6.1.0.

Are there any changes or issues related to grouping in Solr 8.3.0 or 8.5.1?


Regards,
Vishal Patel



Re: DIH nested entity repeating query in verbose output

2020-05-14 Thread matthew sporleder
I think this is just an issue in the verbose/debug output.  tcpdump
does not show the same issue.

On Wed, May 13, 2020 at 7:39 PM matthew sporleder  wrote:
>
> I am attempting to use nested entities to populate documents from
> different tables and verbose/debug output is showing repeated queries
> on import.  The doc number repeats the sqls.
>
> "verbose-output":
> [ "entity:parent",
> ..
> [ "document#5", [
> ...
> "entity:nested1", [
> "query", "SELECT body AS nested1 FROM table WHERE p_id = '1234'",
> "query", "SELECT body AS nested1 FROM table WHERE p_id = '1234'",
> "query", "SELECT body AS nested1 FROM table WHERE p_id = '1234'",
> "query", "SELECT body AS nested1 FROM table WHERE p_id = '1234",
> "query", "SELECT body AS nested1 FROM table WHERE p_id = '1234",
> "time-taken", "0:0:0.1",
> "time-taken", "0:0:0.1",
> "time-taken", "0:0:0.1",
> "time-taken", "0:0:0.1",
> "time-taken", "0:0:0.1" ],
>
>
> The counts appear to be correct?
> Requests: 61 , Fetched: 20 , Skipped: 0 , Processed: 20
>
>
> I have a config like:
>
> <entity dataSource="database"
>   name="parent"
>   pk="id"
>   query="SELECT .."
>   deltaImportQuery="SELECT.."
>   deltaQuery="SELECT.."
>   >
>   <entity name="child1"
>     query="SELECT body AS nested1 FROM table WHERE p_id = '${parent.id}'"
>     deltaQuery=...
>     parentDeltaQuery=...
>     etc
>   >
>   </entity>
>   <entity name="child2"
>     query="SELECT body AS nested2 FROM table WHERE p_id = '${parent.id}'"
>     deltaQuery=...
>     parentDeltaQuery=...
>     etc
>   >
>   </entity>
>   <entity name="child3"
>     query="SELECT body AS nested3 FROM table WHERE p_id = '${parent.id}'"
>     deltaQuery=...
>     parentDeltaQuery=...
>     etc
>   >
>   </entity>
> </entity>


nested entities and DIH indexing time

2020-05-14 Thread matthew sporleder
It appears that adding entities to my entities in my data import
config is slowing down my import process by a lot.  Is there a good
way to speed this up?  I see the IDs are individually queried instead
of using IN() or similar normal techniques to make things faster.

Just looking for some tips.  I prefer this architecture to the way we
currently do it with complex SQL, inserting weird strings, and then
splitting on them (gross but faster).


Re: 404 response from Schema API

2020-05-14 Thread Mark H. Wood
On Fri, Apr 17, 2020 at 10:11:40AM -0600, Shawn Heisey wrote:
> On 4/16/2020 10:07 AM, Mark H. Wood wrote:
> > I need to ask Solr 4.10 for the name of the unique key field of a
> > schema.  So far, no matter what I've done, Solr is returning a 404.
> > 
> > This works:
> > 
> >curl 'https://toolshed.wood.net:8443/isw6_3/solr/statistics/select'
> > 
> > This gets a 404:
> > 
> >curl 
> > 'https://toolshed.wood.net:8443/isw6_3/solr/statistics/schema/uniquekey'
> > 
> > So does this:
> > 
> >curl 'https://toolshed.wood.net:8443/isw6_3/solr/statistics/schema'
> > 
> > We normally use the ClassicIndexSchemaFactory.  I tried switching to
> > ManagedIndexSchemaFactory but it made no difference.  Nothing is
> > logged for the failed requests.
> 
>  From what I can see, the schema API handler was introduced in version 
> 5.0.  The SchemaHandler class exists in the released javadoc for the 5.0 
> version, but not the 4.10 version.  You'll need a newer version of Solr.

*sigh*  That's what I see too, when I dig through the JARs.  For some
reason, many folks believe that the Schema API existed at least as
far back as 4.2:

  
https://stackoverflow.com/questions/7247221/does-solr-has-api-to-read-solr-schema-xml

Perhaps because the _Apache Solr Reference Guide 4.10_ says so, on
page 53.

This writer thinks it worked, read-only, on 4.10.3:

  
https://stackoverflow.com/questions/33784998/solr-rest-api-for-schema-updates-returns-method-not-allowed-405

But it doesn't work here, on 4.10.4:

  curl 'https://toolshed.wood.net:8443/isw6/solr/statistics/schema?wt=json'

  14-May-2020 15:07:03.805 INFO [https-jsse-nio-fec0:0:0:1:0:0:0:7-8443-exec-60]
  org.restlet.engine.log.LogFilter.afterHandle 2020-05-14 15:07:03
  fec0:0:0:1:0:0:0:7 - fec0:0:0:1:0:0:0:7 8443 GET /isw6/solr/schema wt=json
  404 0 0 0 https://toolshed.wood.net:8443 curl/7.69.1 -

Strangely, Solr dropped the core-name element of the path!

Any idea what happened?

Anyway, I'll be reading up on how to upgrade to 5.  (Hopefully not
farther, just yet -- changes between, I think, 5 and 6 mean I'd have
to spend a week reloading 10 years worth of data.  For now I don't
want to go any farther than I have to, to make this work.)

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Re: 404 response from Schema API

2020-05-14 Thread Mark H. Wood
On Thu, May 14, 2020 at 03:13:07PM -0400, Mark H. Wood wrote:
> Anyway, I'll be reading up on how to upgrade to 5.  (Hopefully not
> farther, just yet -- changes between, I think, 5 and 6 mean I'd have
> to spend a week reloading 10 years worth of data.  For now I don't
> want to go any farther than I have to, to make this work.)

Nope, my memory was faulty:  those changes happened in 5.0.  (The
schemas I've been given, used since time immemorial, are chock full of
IntField and DateField.)  I'm stuck with reloading.  Might as well go
to 8.x.  Or give up on asking Solr for the schema's uniqueKey, configure
the client with the field name, and cross my fingers.

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu




Re: nested entities and DIH indexing time

2020-05-14 Thread Shawn Heisey

On 5/14/2020 9:36 AM, matthew sporleder wrote:

> It appears that adding entities to my entities in my data import
> config is slowing down my import process by a lot.  Is there a good
> way to speed this up?  I see the ID's are individually queried instead
> of using IN() or similar normal techniques to make things faster.
>
> Just looking for some tips.  I prefer this architecture to the way we
> currently do it with complex SQL, inserting weird strings, and then
> splitting on them (gross but faster).


When you have nested entities, this is how DIH works.  A separate SQL 
query for the inner entity is made for each row returned on the outer 
entity.  Nested entities tend to be extremely slow for this reason.


The best way to work around this is to make the database server do the 
heavy lifting -- using JOIN or other methods so that you only need one 
entity and one SQL query.  Doing this will mean that you'll need to 
split the data after import, using either the DIH config or the analysis 
configuration in the schema.
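As a hedged sketch of that single-query approach (table, column, and
separator names are invented; GROUP_CONCAT is MySQL syntax, and the
post-import splitting is done by DIH's RegexTransformer):

```xml
<entity name="parent" transformer="RegexTransformer"
        query="SELECT p.id, p.title,
                      GROUP_CONCAT(c.body SEPARATOR '|') AS nested1
                 FROM parent p LEFT JOIN child c ON c.p_id = p.id
                GROUP BY p.id">
  <field column="nested1" splitBy="\|"/>
</entity>
```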


Thanks,
Shawn


Re: 404 response from Schema API

2020-05-14 Thread Shawn Heisey

On 5/14/2020 1:13 PM, Mark H. Wood wrote:
> On Fri, Apr 17, 2020 at 10:11:40AM -0600, Shawn Heisey wrote:
>> On 4/16/2020 10:07 AM, Mark H. Wood wrote:
>>> I need to ask Solr 4.10 for the name of the unique key field of a
>>> schema.  So far, no matter what I've done, Solr is returning a 404.

The Luke Request Handler, normally assigned to the /admin/luke path, 
will give you the info you're after.  On a stock Solr install, the 
following URL would work:


/solr/admin/luke?show=schema

I have tried this on solr 4.10.4 and can confirm that the response does 
have the information.
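For example, against the core from Mark's messages (the exact JSON layout may
differ slightly between versions):

```
curl 'https://toolshed.wood.net:8443/isw6_3/solr/statistics/admin/luke?show=schema&wt=json'
# the unique key field is reported under:
#   "schema": { "uniqueKeyField": "id", ... }
```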


Since you are working with a different context path, you'll need to 
adjust your URL to match.


Note that as of Solr 5.0, running with a different context path is not 
supported.  The admin UI and the more advanced parts of the startup 
scripts are hardcoded for the /solr context.


Thanks,
Shawn


Re: nested entities and DIH indexing time

2020-05-14 Thread matthew sporleder
On Thu, May 14, 2020 at 4:46 PM Shawn Heisey  wrote:
>
> On 5/14/2020 9:36 AM, matthew sporleder wrote:
> > It appears that adding entities to my entities in my data import
> > config is slowing down my import process by a lot.  Is there a good
> > way to speed this up?  I see the ID's are individually queried instead
> > of using IN() or similar normal techniques to make things faster.
> >
> > Just looking for some tips.  I prefer this architecture to the way we
> > currently do it with complex SQL, inserting weird strings, and then
> > splitting on them (gross but faster).
>
> When you have nested entities, this is how DIH works.  A separate SQL
> query for the inner entity is made for each row returned on the outer
> entity.  Nested entities tend to be extremely slow for this reason.
>
> The best way to work around this is to make the database server do the
> heavy lifting -- using JOIN or other methods so that you only need one
> entity and one SQL query.  Doing this will mean that you'll need to
> split the data after import, using either the DIH config or the analysis
> configuration in the schema.
>
> Thanks,
> Shawn

This is too bad, because the nested approach is very clean and the
JOIN/CONCAT/SPLIT method is very gross.

I was also hoping to use different delta queries for each nested entity.

Can a non-nested entity write into existing docs, or do they always
have to produce document-per-entity?


RE: using aliases in topic stream

2020-05-14 Thread Nightingale, Jonathan A (US)
Currently playing with 8.1 but 7.4 is what's in our production environment.

-Original Message-
From: Joel Bernstein  
Sent: Wednesday, May 13, 2020 1:11 PM
To: solr-user@lucene.apache.org
Subject: Re: using aliases in topic stream

*** WARNING ***
EXTERNAL EMAIL -- This message originates from outside our organization.


What version of Solr are you using? The topic stream in master seems to have 
the code in place to query aliases.

Joel Bernstein
http://joelsolr.blogspot.com/


On Wed, May 13, 2020 at 12:33 PM Nightingale, Jonathan A (US) < 
jonathan.nighting...@baesystems.com> wrote:

> Hi Everyone,
>
> I'm trying to run this stream and I get the following error
>
> topic(topics,collection1,
> q="classes:GXP/INDEX",fl="uuid",id="feed-8",initialCheckpoint=0,checkp
> ointEvery=-1)
>
> {
>   "result-set": {
> "docs": [
>   {
> "EXCEPTION": "Slices not found for collection1",
> "EOF": true,
> "RESPONSE_TIME": 6
>   }
> ]
>   }
> }
>
> "collection1" is an alias. I can search using the alias perfectly 
> fine. In fact the search stream operation works fine with the alias. 
> It's just this topic one I've seen so far. Does anyone know why this is?
>
> Thanks!
> Jonathan Nightingale
>
>


RE: using aliases in topic stream

2020-05-14 Thread Nightingale, Jonathan A (US)
I'm looking at master on GitHub; the solrj tests assume aliases are never used.
Just as an example, that’s all over the place in the tests:

https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/test/org/apache/solr/client/solrj/io/stream/StreamDecoratorTest.java

@Test
  public void testTerminatingDaemonStream() throws Exception {
Assume.assumeTrue(!useAlias);

-Original Message-
From: Joel Bernstein  
Sent: Wednesday, May 13, 2020 1:11 PM
To: solr-user@lucene.apache.org
Subject: Re: using aliases in topic stream


What version of Solr are you using? The topic stream in master seems to have 
the code in place to query aliases.

Joel Bernstein
http://joelsolr.blogspot.com/


On Wed, May 13, 2020 at 12:33 PM Nightingale, Jonathan A (US) < 
jonathan.nighting...@baesystems.com> wrote:

> Hi Everyone,
>
> I'm trying to run this stream and I get the following error
>
> topic(topics,collection1,
> q="classes:GXP/INDEX",fl="uuid",id="feed-8",initialCheckpoint=0,checkp
> ointEvery=-1)
>
> {
>   "result-set": {
> "docs": [
>   {
> "EXCEPTION": "Slices not found for collection1",
> "EOF": true,
> "RESPONSE_TIME": 6
>   }
> ]
>   }
> }
>
> "collection1" is an alias. I can search using the alias perfectly 
> fine. In fact the search stream operation works fine with the alias. 
> It's just this topic one I've seen so far. Does anyone know why this is?
>
> Thanks!
> Jonathan Nightingale
>
>


Re: using aliases in topic stream

2020-05-14 Thread Joel Bernstein
This is where the alias work was done:

https://issues.apache.org/jira/browse/SOLR-9077

It could be though that there is a bug here. I'll see if I can reproduce it
locally.



Joel Bernstein
http://joelsolr.blogspot.com/


On Thu, May 14, 2020 at 6:24 PM Nightingale, Jonathan A (US) <
jonathan.nighting...@baesystems.com> wrote:

> I'm looking on master on git hub, the solrj tests assume never use aliases
> Just as an example. that’s all over the place in the tests
>
>
> https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/test/org/apache/solr/client/solrj/io/stream/StreamDecoratorTest.java
>
> @Test
>   public void testTerminatingDaemonStream() throws Exception {
> Assume.assumeTrue(!useAlias);
>
> -Original Message-
> From: Joel Bernstein 
> Sent: Wednesday, May 13, 2020 1:11 PM
> To: solr-user@lucene.apache.org
> Subject: Re: using aliases in topic stream
>
>
> What version of Solr are you using? The topic stream in master seems to
> have the code in place to query aliases.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Wed, May 13, 2020 at 12:33 PM Nightingale, Jonathan A (US) <
> jonathan.nighting...@baesystems.com> wrote:
>
> > Hi Everyone,
> >
> > I'm trying to run this stream and I get the following error
> >
> > topic(topics,collection1,
> > q="classes:GXP/INDEX",fl="uuid",id="feed-8",initialCheckpoint=0,checkp
> > ointEvery=-1)
> >
> > {
> >   "result-set": {
> > "docs": [
> >   {
> > "EXCEPTION": "Slices not found for collection1",
> > "EOF": true,
> > "RESPONSE_TIME": 6
> >   }
> > ]
> >   }
> > }
> >
> > "collection1" is an alias. I can search using the alias perfectly
> > fine. In fact the search stream operation works fine with the alias.
> > It's just this topic one I've seen so far. Does anyone know why this is?
> >
> > Thanks!
> > Jonathan Nightingale
> >
> >
>


Terraform and EC2

2020-05-14 Thread Walter Underwood
Anybody building sharded clusters with Terraform on EC2? I’d love some hints.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



Re: Terraform and EC2

2020-05-14 Thread Ganesh Sethuraman
We use Terraform on EC2 to create infrastructure as code for SolrCloud and
the ZooKeeper quorum (a 3-node auto-scaling target group Terraform module),
and Solr likewise with an n-node auto-scaling group module. The auto-scaling
target group is just to make it easy to create the cluster infrastructure. We
also need to create a security group to attach to both ZooKeeper and Solr.

Ganesh
Ganesh

On Thu, May 14, 2020, 7:57 PM Walter Underwood 
wrote:

> Anybody building sharded clusters with Terraform on EC2? I’d love some
> hints.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>


Re: nested entities and DIH indexing time

2020-05-14 Thread Shawn Heisey
On 5/14/2020 3:14 PM, matthew sporleder wrote:
> Can a non-nested entity write into existing docs, or do they always
> have to produce document-per-entity?

This is the only thing I found on this topic, and it is on a third-party
website, so I can't say much about how accurate it is:


https://stackoverflow.com/questions/21006045/can-solr-dih-do-atomic-updates

I have never used a ScriptTransformer, so I do not know how to actually 
do this.


Your schema would have to be compatible with atomic updates.

Thanks,
Shawn



Dynamic Stopwords

2020-05-14 Thread A Adel
Hi - Is there a way to configure stop words dynamically per document, based
on the detected language of a multilingual text field? Combining all
languages' stop words in one set is a possibility, but it introduces false
positives for some language combinations, such as German and English.
Thanks, A.
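The usual pattern, rather than truly dynamic stop words, is to detect the
language at index time and route the text into per-language fields, each with
its own stop filter. A sketch (field, type, and chain names are assumptions):

```xml
<!-- schema: one field type per language, each with its own stop words -->
<fieldType name="text_en" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_en.txt"/>
  </analyzer>
</fieldType>

<!-- solrconfig: detect language, map content to content_en, content_de, ... -->
<updateRequestProcessorChain name="langid">
  <processor class="solr.LangDetectLanguageIdentifierUpdateProcessorFactory">
    <str name="langid.fl">content</str>
    <str name="langid.langField">language_s</str>
    <bool name="langid.map">true</bool>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```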