Partial results from streaming expressions (i.e. making them "stream")

2018-01-15 Thread Radu Gheorghe
Hello fellow solr-users!

Currently, if I do an HTTP request to receive some data via streaming
expressions, like:

curl --data-urlencode 'expr=search(test,
   q="foo_s:*",
   fl="foo_s",
   sort="foo_s asc",
   qt="/export")'
http://localhost:8983/solr/test/stream

I get all results at once. This is more obvious if I simply introduce
a one-second sleep in CloudSolrStream: with three documents, the
request takes about three seconds, and I get all three docs after
three seconds.

Instead, I would like to get documents in a more "streaming" way. For
example, after X seconds give me what you already have. Or if an
Y-sized buffer fills up, give me all the tuples you have, then resume.

Any ideas/opinions in terms of how I could achieve this? With or
without changing Solr's code?

Here's what I have so far:
- this is normal with non-chunked HTTP/1.1: you get all results at
once. If I revert this patch[1] and get Solr to use chunked encoding,
I get partial results every time some threshold is reached, which
seems to be somewhere between 16KB and 32KB
- I assume that threshold is a buffer size, but I couldn't find a way
to change it manually. I've tried calling Jetty's
response.setBufferSize() in HttpSolrCall (maybe the wrong place to do
it?) and also tried changing the default 8KB buffer in FastWriter
- manually flushing the writer (in JSONResponseWriter) gives the
expected results (in combination with chunking)

The thing is, even if I manage to change the buffer size, I assume
that will apply to all requests (not just streaming expressions). I
assume that ideally it would be configurable per request. As for
manual flushing, that would require changes to the streaming
expressions themselves. Would that be the way to go? What do you
think?
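
To make the "after X seconds or once a Y-sized buffer fills up" idea concrete, here is a minimal sketch of such a flush decision, independent of Solr's code. The class name FlushPolicy and its parameters are hypothetical and exist only for illustration; nothing like it is claimed to exist in Solr or Jetty.

```python
import time


class FlushPolicy:
    """Decide when buffered tuples should be pushed to the client.

    Hypothetical helper for illustration only; not part of Solr.
    """

    def __init__(self, max_bytes, max_seconds, clock=time.monotonic):
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.clock = clock
        self.buffered = 0
        self.last_flush = clock()

    def add(self, tuple_bytes):
        """Record one serialized tuple; return True if a flush is due."""
        self.buffered += len(tuple_bytes)
        too_big = self.buffered >= self.max_bytes
        too_old = self.clock() - self.last_flush >= self.max_seconds
        return too_big or too_old

    def flushed(self):
        """Reset the counters after the writer has actually been flushed."""
        self.buffered = 0
        self.last_flush = self.clock()
```

A response writer would call add() for every tuple it serializes and, whenever it returns True, flush the underlying stream and then call flushed(). Making max_bytes and max_seconds request parameters would keep the behaviour per request rather than global.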

[1] https://issues.apache.org/jira/secure/attachment/12787283/SOLR-8669.patch

Best regards,
Radu
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


[ANNOUNCE] Apache Solr 7.2.1 released

2018-01-15 Thread jim ferenczi
15 January 2018, Apache Solr™ 7.2.1 available

The Lucene PMC is pleased to announce the release of Apache Solr 7.2.1

Solr is the popular, blazing fast, open source NoSQL search platform from
the Apache Lucene project. Its major features include powerful full-text
search, hit highlighting, faceted search and analytics, rich document
parsing, geospatial search, extensive REST APIs as well as parallel SQL.
Solr is enterprise grade, secure and highly scalable, providing fault
tolerant distributed search and indexing, and powers the search and
navigation features of many of the world's largest internet sites.

This release includes 4 bug fixes since the 7.2.0 release:

* Overseer can never process some last messages.

* Rename core in solr standalone mode is not persisted.

* QueryComponent's rq parameter parsing no longer considers the defType
parameter.

* Fix NPE in SolrQueryParser when the query terms inside a filter clause
reduce to nothing.

Furthermore, this release includes Apache Lucene 7.2.1 which includes 1 bug
fix since the 7.2.0 release.

The release is available for immediate download at:

http://www.apache.org/dyn/closer.lua/lucene/solr/7.2.1

Please read CHANGES.txt for a detailed list of changes:

https://lucene.apache.org/solr/7_2_1/changes/Changes.html

Please report any feedback to the mailing lists (
http://lucene.apache.org/solr/discussion.html)

Note: The Apache Software Foundation uses an extensive mirroring network
for distributing releases. It is possible that the mirror you are using may
not have replicated the release yet. If that is the case, please try
another mirror. This also goes for Maven access.


RE: Got unexpected results.

2018-01-15 Thread Peter Lancaster
Shouldn't the query just be something like title:"to order this report"?
Then it should work.


-Original Message-
From: Sanjeet Kumar [mailto:sanjeetkumar...@gmail.com]
Sent: 15 January 2018 06:20
To: solr-user@lucene.apache.org
Subject: Got unexpected results.

Hi,

I am using Solr-6.4.2. I ran the query title:("to order this report"~0) on a
"text_en" field, and it matched ("title":"Forrester Research cites SAP Hybris as
a leader in B2B Order Management report").

As per my understanding, this should not match, as there is a word "Management"
between "Order" and "report". Can somebody explain this?

Thanks.


This message is confidential and may contain privileged information. You should 
not disclose its contents to any other person. If you are not the intended 
recipient, please notify the sender named above immediately. It is expressly 
declared that this e-mail does not constitute nor form part of a contract or 
unilateral obligation. Opinions, conclusions and other information in this 
message that do not relate to the official business of findmypast shall be 
understood as neither given nor endorsed by it.


__

This email has been checked for virus and other malicious content prior to 
leaving our network.
__


Solr cloud upgrade from 5 to 6

2018-01-15 Thread Novin Novin
Hi Guys,

I need a piece of advice about upgrading Solr cloud. Would I need to
re-index data if I upgrade Solr cloud from 5.5.4 to 6.6.2?

Thanks in advance.
Navin


How to implement the function of W/N in Solr?

2018-01-15 Thread xizhen.w...@incoshare.com
Hello,

I'm using Solr 4.10.3. I want to match documents where "A" and "B" appear
together, "C" and "D" appear together, and the terms "B" and "C" are no more
than 3 terms away from each other. I tried {!surround} 3w("A B", "C D"), but
it doesn't work. Is there any other way to do this?
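
To pin down what is being asked for, here is a small sketch of the matching rule itself, written against a plain list of tokens rather than any Solr query parser. The function names are hypothetical, and reading "no more than 3 terms away" as a position difference of at most 3 between "B" and "C" is an assumption.

```python
def adjacent_positions(tokens, first, second):
    """Return positions i where tokens[i] == first and tokens[i + 1] == second."""
    return [i for i in range(len(tokens) - 1)
            if tokens[i] == first and tokens[i + 1] == second]


def matches_w_n(tokens, max_gap=3):
    """True if "A B" is followed by "C D" with B and C at most max_gap positions apart."""
    ab = adjacent_positions(tokens, "A", "B")
    cd = adjacent_positions(tokens, "C", "D")
    for i in ab:
        b_pos = i + 1  # position of "B" in the adjacent pair
        for c_pos in cd:
            if 0 < c_pos - b_pos <= max_gap:
                return True
    return False
```

Any query syntax chosen in Solr can be checked against this rule with a handful of test documents.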

Any help is appreciated.



xizhen.w...@incoshare.com


Uncheck dataimport checkboxes by default

2018-01-15 Thread Daniel Carrasco
Hello,

My question is just what I've summarized on the subject: Is there any way
to change the default state of the checkboxes on dataimport admin page?

I want to change the default state of the "clean" checkbox to unchecked
because sometimes I import incremental data and forget to uncheck that
box; then all data is cleared and I have to import everything again.



Thanks!!

-- 
_

  Daniel Carrasco Marín
  Ingeniería para la Innovación i2TIC, S.L.
  Tlf:  +34 911 12 32 84 Ext: 223
  www.i2tic.com
_


Re: How to implement the function of W/N in Solr?

2018-01-15 Thread Atita Arora
Did you give Proximity Search a try?


On Mon, Jan 15, 2018 at 1:34 PM, xizhen.w...@incoshare.com <
xizhen.w...@incoshare.com> wrote:

> Hello,
>
> I'm using Solr 4.10.3, and I want "A" and "B" are together, "C" and "D"
> are together, and the terms "B" and "C" are no more than 3 terms away from
> each other, by using {!surround} 3w("A B", "C D"), but it doesn't work.  Is
> there any other useful way?
>
> Any help is appreciated.
>
>
>
> xizhen.w...@incoshare.com
>


Announcing the OpenMinTED Open Tender Phase II Funding opportunity for text mining/IR developers

2018-01-15 Thread Martin Krallinger
*Announcing the OpenMinTED Open Tender Phase II Funding opportunity for
information retrieval technology developers*

(Apologies for cross-posting)



OpenMinTED (openminted.eu) Open Tender Phase II Funding Award for
information retrieval, machine learning and text preprocessing technologies
applied to scientific publications.



Over the past years there has been a considerable accumulation of scholarly
literature, which has motivated the implementation of IR and text mining
solutions to automatically extract valuable scientific information.



The OpenMinTeD project aspires to enable the creation of an open
infrastructure that facilitates the use of text mining and IR technologies
in the scientific publications world, building on existing tools, and
renders them interoperable through appropriate registries and a
standards-based interoperability layer.


OpenMinTeD, as part of its open tender call initiative, invites
researchers, service providers and SMEs to submit proposals related to the
development and integration of existing IR, text preprocessing and text
mining/NLP applications or software components as well as knowledge
resources that can align and interoperate with the OpenMinTeD
infrastructure.

*Why you should apply*

*Winners of this call will be awarded € 7.000 (for small bids) or € 17.500
(for larger bids)* including VAT and expenses to implement the integration
of their software or components into the OpenMinTeD infrastructure.



For details on the submission process and technical support please visit
the OpenMinTeD Open Tender call page at



https://openminted.bsc.es/





*You can apply for this call until 26th January 2018*. Winners of the call
will be awarded a sum of money to implement their plans.





For informal inquiries, please contact:



Montserrat Marimon Felipe

montserrat.mari...@gmail.com
Barcelona Supercomputing Center –

Centro Nacional de Supercomputación,

Life Sciences Department,

Barcelona 08034,
Spain


Re: Solr server configuration

2018-01-15 Thread Emir Arnautović
Hi Deepak,
Here is another blog post containing some thoughts on how it can be estimated.

http://www.od-bits.com/2018/01/solrelasticsearch-capacity-planning.html 


HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 12 Jan 2018, at 17:08, Erick Erickson  wrote:
> 
> First, it's totally impossible to answer in the abstract, see:
> https://lucidworks.com/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
> 
> Second, indexing DB tables directly into Solr is usually the wrong
> approach. Solr is not a replacement for a relational DB, it does not
> function as a DB, is not optimized for joins etc. It's  a _search
> engine_ and does that superlatively.
> 
> At the very least, the most common recommendation if you have the
> space is to de-normalize the data. My point is you need to think about
> this problem in terms of _search_, not "move some tables to Solr and
> use Solr like a DB". Which means that even if someone can answer your
> questions, it won't help much.
> 
> Best,
> Erick
> 
> On Thu, Jan 11, 2018 at 9:53 PM, Deepak Nair  wrote:
>> Hello,
>> 
>> We want to implement Solr 7.x for one of our client with below requirement.
>> 
>> 
>> 1.   The data to be index will be from 2 Oracle databases with 2 tables 
>> each and around 10 columns.
>> 
>> 2.   The data volume is expected to be reach around 10 million in each 
>> table.
>> 
>> 3.   4000+ users will query the indexed data from a UI. The peak load is 
>> expected to be around 2000 queries/sec.
>> 
>> 4.   The implementation will be on a standalone or clustered Unix 
>> environment.
>> 
>> I want to know what should be the best server configuration for this kind of 
>> requirement. Eg: how many VMs, what should be the RAM, Heap size etc.
>> 
>> Thanks,
>> Deepak



Re: Mixing simple and nested docs in same update?

2018-01-15 Thread Jan Høydahl
Radio silence…

Here is a GIST for easy reproduction. Is this by design?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 11. jan. 2018 kl. 00:42 skrev Jan Høydahl :
> 
> Hi,
> 
> We index several large nested documents. We found that querying the data 
> behaves differently depending on how the documents are indexed.
> 
> To reproduce:
> 
> solr start
> solr create -c nested
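> # Index one plain document, "friend", and a nested one, "mother" and "daughter", in the same request:
> curl localhost:8983/solr/nested/update -d '
> <add>
>   <doc>
>     <field name="id">friend</field>
>     <field name="type">other</field>
>   </doc>
>   <doc>
>     <field name="id">mother</field>
>     <field name="type">parent</field>
>     <doc>
>       <field name="id">daughter</field>
>       <field name="type">child</field>
>     </doc>
>   </doc>
> </add>
> '
> 
> # Query for mother's children using either the child transformer or the child query parser
> curl "localhost:8983/solr/nested/query?q=id:mother&fl=%2A%2C%5Bchild%20parentFilter%3Dtype%3Aparent%5D"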
> {
>  "responseHeader":{
>"zkConnected":true,
>"status":0,
>"QTime":4,
>"params":{
>  "q":"id:mother",
>  "fl":"*,[child parentFilter=type:parent]"}},
>  "response":{"numFound":1,"start":0,"docs":[
>  {
>"id":"mother",
>"type":["parent"],
>"_version_":1589249812802306048,
>"type_str":["parent"],
>"_childDocuments_":[
>{
>  "id":"friend",
>  "type":["other"],
>  "_version_":1589249812729954304,
>  "type_str":["other"]},
>{
>  "id":"daughter",
>  "type":["child"],
>  "_version_":1589249812802306048,
>  "type_str":["child"]}]}]
>  }}
> 
> As you can see, the “friend” got included as a child of “mother”.
> If you index the exact same request, putting “friend” after “mother” in the 
> xml,
> the query works as expected.
> 
> Inspecting the index, everything looks correct, and only “daughter” and 
> “mother” have _root_=mother.
> Is there a rule that you should start a new update request for each type of 
> parent/child relationship
> that you need to index, and not mix them in the same request?
> 
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> 
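
A simplified model of how block join picks children may help explain the result above. As far as I understand it, children are not looked up via _root_ at query time; they are the documents that sit immediately before the parent in the index, back to the previous document matched by parentFilter. The function below is a hypothetical in-memory stand-in for that logic, not Solr code, and it assumes children are indexed before their parent within a block.

```python
def children_of(block, is_parent, parent_id):
    """Return ids of the docs treated as children of parent_id.

    block: list of doc dicts in index order (children precede their parent).
    is_parent: predicate standing in for the parentFilter query.
    """
    children = []
    for doc in block:
        if is_parent(doc):
            if doc["id"] == parent_id:
                return [d["id"] for d in children]
            children = []  # any matched parent closes the previous block
        else:
            children.append(doc)
    return []
```

With index order friend, daughter, mother and a filter equivalent to type:parent, "friend" is not recognised as a parent and ends up in mother's block. A filter that matches every top-level document (for example type:parent OR type:other in this data set) would leave only "daughter".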



Parallelizing queries without Custom Component

2018-01-15 Thread Max Bridgewater
Hi,

My index is composed of product reviews. Each review contains the id of the
product it refers to. But it also contains a rating for this product and
the number of negative feedback provided on this product.

{
   id: solr doc id,
   rating: number between 0 and 5,
   product_id: the product that is being reviewed,
   negative_feedback: how many negative feedbacks on this product
}

The query below returns the "worst" review for the given product  7453632.
Worst is defined as  rated 1 to 3 and having the highest number of negative
feedback.

/select?q=product_id:7453632&fq=rating:[1 TO 3]&sort=negative_feedback
desc&rows=1

The query works as intended. Now the challenging part is to extend this
query to support many product_id. If executed with many product Id, the
result should be the list of worst reviews for all the provided products.

A query of the following form would return the list of worst reviews for
products: 7453632,645454,534664.

/select?q=product_id:[7453632,645454,534664]&fq=rating:[1 TO
3]&sort=negative_feedback desc

Is there a way to do this in Solr without custom component?

Thanks.
Max


Re: Parallelizing queries without Custom Component

2018-01-15 Thread Emir Arnautović
Hi Max,
It seems to me that you are looking for the grouping
https://lucene.apache.org/solr/guide/6_6/result-grouping.html
or field collapsing
https://lucene.apache.org/solr/guide/6_6/collapse-and-expand-results.html
feature.
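
As a rough sketch of the collapsing variant, the request parameters could be built as below. The function name is hypothetical, and the sketch assumes product_id is a single-valued field, negative_feedback is numeric, and the default query operator is OR.

```python
from urllib.parse import urlencode


def worst_review_params(product_ids):
    """Build /select parameters returning one 'worst' review per product."""
    ids = " ".join(str(p) for p in product_ids)
    return [
        ("q", "product_id:(%s)" % ids),
        ("fq", "rating:[1 TO 3]"),
        # Keep, per product_id, only the document with the highest negative_feedback.
        ("fq", "{!collapse field=product_id max=negative_feedback}"),
        ("sort", "negative_feedback desc"),
    ]


query_string = urlencode(worst_review_params([7453632, 645454, 534664]))
print("/select?" + query_string)
```

In SolrCloud, collapsing expects all documents with the same product_id to live on the same shard, so that is worth checking before relying on this.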

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 15 Jan 2018, at 17:27, Max Bridgewater  wrote:
> 
> Hi,
> 
> My index is composed of product reviews. Each review contains the id of the
> product it refers to. But it also contains a rating for this product and
> the number of negative feedback provided on this product.
> 
> {
>   id: solr doc id,
>   rating: number between 0 and 5,
>   product_id: the product that is being reviewed,
>   negative_feedback: how many negative feedbacks on this product
> }
> 
> The query below returns the "worst" review for the given product  7453632.
> Worst is defined as  rated 1 to 3 and having the highest number of negative
> feedback.
> 
> /select?q=product_id:7453632&fq=rating:[1 TO 3]&sort=negative_feedback
> desc&rows=1
> 
> The query works as intended. Now the challenging part is to extend this
> query to support many product_id. If executed with many product Id, the
> result should be the list of worst reviews for all the provided products.
> 
> A query of the following form would return the list of worst products for
> products: 7453632,645454,534664.
> 
> /select?q=product_id:[7453632,645454,534664]&fq=rating:[1 TO
> 3]&sort=negative_feedback desc
> 
> Is there a way to do this in Solr without custom component?
> 
> Thanks.
> Max



Re: request for instructions to add a another solr node

2018-01-15 Thread Sushil K Tripathi
Hi Erick,


Thanks a lot for your help.


The port was a typo in my email; I double-checked the URL with the
correct port. I am reinstalling the server and will update you. In addition,
can you please confirm whether, after following the steps you mentioned, index
data and replication are set up automatically on the new target node, or
whether we need to configure that? If we need to configure it, can you please
help with the commands or configuration we need to follow?
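
On the replication question: to my knowledge, a node that joins a SolrCloud cluster holds no index data until a replica is created on it, for example with the Collections API ADDREPLICA action. The sketch below only builds such a request URL; the collection name, shard name and node name are placeholders, and the node name should be copied from live_nodes rather than guessed.

```python
from urllib.parse import urlencode


def add_replica_url(base_url, collection, shard, node):
    """Build a Collections API ADDREPLICA request URL."""
    params = {
        "action": "ADDREPLICA",
        "collection": collection,
        "shard": shard,
        "node": node,  # as listed under live_nodes, e.g. host:port_solr
    }
    return base_url.rstrip("/") + "/admin/collections?" + urlencode(params)


# Placeholder values for illustration only.
print(add_replica_url("http://localhost:8983/solr",
                      "mycollection", "shard1", "localhost:8987_solr"))
```

Once the replica exists, it should sync its index from the shard leader without further configuration.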



With Warm Regards...
Sushil K. Tripathi



From: Erick Erickson 
Sent: Friday, January 12, 2018 11:29 PM
To: solr-user
Subject: Re: request for instructions to add a another solr node

What is the cause reported in the solr log? This should be in:

example/cloud/node3/solr/logs

that often gives a much more complete statement of what went wrong.

You don't really need the -cloud parameter, the -z parameter implies
that it's a SolrCloud
installation. That's not the root of your problem, more of an aside.

What's inconsistent here is that you started your third node on port
8987, but the URL you
accessed was 8983. That makes no sense to me. Forgetting the bits
about adding a new
Solr instance, do you see a healthy Solr cluster in the admin UI
before you add the
new instance? My bet is that your basic installation is messed up and
the new Solr node is
a red herring.

FWIW, I routinely spin up multiple Solr JVMs with :

mkdir -p ./example/cloud/node1/solr

cp ./server/solr/solr.xml ./example/cloud/node1/solr

then

bin/solr start -z localhost:2181 -p 8981 -s example/cloud/node1/solr

Typically I use ports 8981, 8982, 8983, 8984 just because it makes keeping track
easier, but there's no reason 8987 wouldn't work.

Finally, assuming the Solr node starts successfully, you won't see
anything in the
admin UI unless you look under "live_nodes" in the
admin UI>>cloud>>tree
view

Best,
Erick


Re: request for instructions to add a another solr node

2018-01-15 Thread Erick Erickson
NP, I was new at this once too ;).

Erick

On Mon, Jan 15, 2018 at 8:38 AM, Sushil K Tripathi
 wrote:
> Hi Erick,
>
>
> Thanks a lot for your help.
>
>
> The port is a typo while writing email but i double checked the URL with 
> correct port. I am reinstalling the server and update you. In addition can 
> you please confirm, after following the steps you mentioned, index data and 
> replication used to automatically set on new target node or we need to 
> configure that? If we need to configure that then can you please help with 
> commands or configuration we need do follow.
>
>
>
> With Warm Regards...
> Sushil K. Tripathi
>
>
> 
> From: Erick Erickson 
> Sent: Friday, January 12, 2018 11:29 PM
> To: solr-user
> Subject: Re: request for instructions to add a another solr node
>
> What is the cause reported in the solr log? This should be in:
>
> example/cloud/node3/solr/logs
>
> that often gives a much more complete statement of what went wrong.
>
> You don't really need the -cloud parameter, the -z parameter implies
> that it's a SolrCloud
> installation. That's not the root of your problem, more of an aside.
>
> What's inconsistent here is that you started your third node on port
> 8987, but the URL you
> accessed was 8983. That makes no sense to me. Forgetting the bits
> about adding a new
> Solr instance, do you see a healthy Solr cluster in the admin UI
> before you add the
> new instance? My bet is that your basic installation is messed up and
> the new Solr node is
> a red herring.
>
> FWIW, I routinely spin up multiple Solr JVMs with :
>
> mkdir ./example/cloud/node1
>
> cp ./server/solr/solr.xml ./example/cloud/node1/solr
>
> then
>
> bin/solr start -z localhost:2181 -p 8981 -s example/cloud/node1/solr
>
> Typically I use ports 8981, 8982, 8983, 8984 just because it makes keeping 
> track
> easier, but there's no reason 8987 wouldn't work.
>
> Finally, assuming the Solr node starts successfully, you won't see
> anything in the
> admin UI unless you look under "live_nodes" in the
> admin UI>>cloud>>tree
> view
>
> Best,
> Erick


Re: Uncheck dataimport checkboxes by default

2018-01-15 Thread Erick Erickson
Daniel:

There's no preferences section in the admin UI. That said, it's
all AngularJS and the source is there wherever you unpacked
the package, so you could just change it. There's no need to
rebuild Solr, etc.

BTW, the mail server is pretty aggressive about stripping attachments;
your (presumed) screenshot is blank.

Best,
Erick

On Mon, Jan 15, 2018 at 2:30 AM, Daniel Carrasco 
wrote:

> Hello,
>
> My question is just what I've summarized on the subject: Is there any way
> to change the default state of the checkboxes on dataimport admin page?
>
> I want to change the default state of the "clean" checkbox to uncheck
> because sometimes I import incremental data and I forgot to uncheck that
> box, then all data is cleared and I've to import all again.
>
>
>
> Thanks!!​
>
> --
> _
>
>   Daniel Carrasco Marín
>   Ingeniería para la Innovación i2TIC, S.L.
>   Tlf:  +34 911 12 32 84 Ext: 223
>   www.i2tic.com
> _
>


Re: Solr cloud upgrade from 5 to 6

2018-01-15 Thread Erick Erickson
No. Solr works hard to guarantee compatibility
one major revision back, so any 6x version should
be able to read an index built with any 5x version.

A couple of free bits of advice, worth what you pay
for them:

1> don't just use your configs from 5x in 6x. Rather
start with the stock 6x configs and customize them
as you did for 5x.

2> Really look over the CHANGES.txt, particularly the
upgrade section for all versions between 5.5.4 and
6.6.2.

3> If you _can_ reindex, I always do if for no other reason
than that'll force me to look at what's new and make use
of it. Again you don't _have_ to though.

Best,
Erick

On Mon, Jan 15, 2018 at 2:10 AM, Novin Novin  wrote:
> Hi Guys,
>
> I would need a piece of advise about upgrading solr cloud.  Would I need to
> re-index data If upgrade Solr cloud from 5.5.4 to 6.6.2?
>
> Thanks in advance.
> Navin


Re: Got unexpected results.

2018-01-15 Thread Erick Erickson
What do you see when you add &debug=query?
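
One thing the debug output may reveal (an assumption, not something confirmed in this thread): if the text_en field removes stop words such as "to" and "this" but keeps their positions, the phrase turns into "order", one gap, "report", and "Order Management report" fits that pattern exactly. The toy analyzer below illustrates the effect; the stop word set and function names are made up for the example and are not Solr's implementation.

```python
STOP_WORDS = {"to", "this", "as", "a", "in"}  # assumed subset of the stop word list


def analyze(text):
    """Lowercase, split on whitespace, drop stop words but keep original positions."""
    return [(pos, tok) for pos, tok in enumerate(text.lower().split())
            if tok not in STOP_WORDS]


def phrase_matches(query, document):
    """Exact (slop 0) phrase match that honours position gaps left by stop words."""
    q = analyze(query)
    if not q:
        return False
    # Offsets of the surviving query terms relative to the first one.
    offsets = [(pos - q[0][0], tok) for pos, tok in q]
    doc_positions = dict(analyze(document))
    return any(
        all(doc_positions.get(start + off) == tok for off, tok in offsets)
        for start in doc_positions
    )
```

Under this model "to order this report" matches the title, while "order report" does not, because the latter leaves no gap between the two terms.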

Best,
Erick

On Mon, Jan 15, 2018 at 1:23 AM, Peter Lancaster
 wrote:
> Shouldn't the query just be something like title: "to order this report" and 
> then it will work.
>
>
> -Original Message-
> From: Sanjeet Kumar [mailto:sanjeetkumar...@gmail.com]
> Sent: 15 January 2018 06:20
> To: solr-user@lucene.apache.org
> Subject: Got unexpected results.
>
> Hi,
>
> I am using Solr-6.4.2, did a query (*title*:("to order this report"~*0*)) on 
> "*text_en*" field and matched ("title":"Forrester Research cites SAP Hybris 
> as a leader in B2B Order Management report").
>
> As per my understanding, this could not match as there is a word "Management"
> between "Order' and "report". Can somebody explain this?.
>
> Thanks.
> 
>


Re: Uncheck dataimport checkboxes by default

2018-01-15 Thread Daniel Carrasco
Thanks Erick, I'll take a look into the js.


Greetings!!

2018-01-15 17:46 GMT+01:00 Erick Erickson :

> Daniel:
>
> There's no preferences section in the admin UI. That said, it's
> all angular js and the source is there wherever you unpacked
> the package you could just change it. There's no need to
> rebuild Solr etc
>
> BTW, the mail server is pretty aggressive about stripping attachments,
> your (presumed) screenshot is blank
>
> Best,
> Erick
>
> On Mon, Jan 15, 2018 at 2:30 AM, Daniel Carrasco 
> wrote:
>
> > Hello,
> >
> > My question is just what I've summarized on the subject: Is there any way
> > to change the default state of the checkboxes on dataimport admin page?
> >
> > I want to change the default state of the "clean" checkbox to uncheck
> > because sometimes I import incremental data and I forgot to uncheck that
> > box, then all data is cleared and I've to import all again.
> >
> >
> >
> > Thanks!!​
> >
> > --
> > _
> >
> >   Daniel Carrasco Marín
> >   Ingeniería para la Innovación i2TIC, S.L.
> >   Tlf:  +34 911 12 32 84 Ext: 223
> >   www.i2tic.com
> > _
> >
>



-- 
_

  Daniel Carrasco Marín
  Ingeniería para la Innovación i2TIC, S.L.
  Tlf:  +34 911 12 32 84 Ext: 223
  www.i2tic.com
_


Re: Parallelizing queries without Custom Component

2018-01-15 Thread Max Bridgewater
Thanks Emir. Looks indeed like what I need.

On Mon, Jan 15, 2018 at 11:33 AM, Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> Hi Max,
> It seems to me that you are looking for the grouping
> https://lucene.apache.org/solr/guide/6_6/result-grouping.html
> or field collapsing
> https://lucene.apache.org/solr/guide/6_6/collapse-and-expand-results.html
> feature.
>
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 15 Jan 2018, at 17:27, Max Bridgewater 
> wrote:
> >
> > Hi,
> >
> > My index is composed of product reviews. Each review contains the id of
> the
> > product it refers to. But it also contains a rating for this product and
> > the number of negative feedback provided on this product.
> >
> > {
> >   id: solr doc id,
> >   rating: number between 0 and 5,
> >   product_id: the product that is being reviewed,
> >   negative_feedback: how many negative feedbacks on this product
> > }
> >
> > The query below returns the "worst" review for the given product
> 7453632.
> > Worst is defined as  rated 1 to 3 and having the highest number of
> negative
> > feedback.
> >
> > /select?q=product_id:7453632&fq=rating:[1 TO 3]&sort=negative_feedback
> > desc&rows=1
> >
> > The query works as intended. Now the challenging part is to extend this
> > query to support many product_id. If executed with many product Id, the
> > result should be the list of worst reviews for all the provided products.
> >
> > A query of the following form would return the list of worst products for
> > products: 7453632,645454,534664.
> >
> > /select?q=product_id:[7453632,645454,534664]&fq=rating:[1 TO
> > 3]&sort=negative_feedback desc
> >
> > Is there a way to do this in Solr without custom component?
> >
> > Thanks.
> > Max
>
>


Re: Solr cloud upgrade from 5 to 6

2018-01-15 Thread Novin Novin
Thank you very much for your advice. I really appreciate it.

On Mon, 15 Jan 2018 at 16:51 Erick Erickson  wrote:

> No, Solr works hard to guarantee compatibility
> one major revision back so any 6x version should
> be able to work fine with any 5x version.
>
> A couple of free bits of advice, worth what you pay
> for them:
>
> 1> don't just use your configs from 5x in 6x. Rather
> start with the stock 6x configs and customize them
> as you did for 5x.
>
> 2> Really look over the CHANGES.txt, particularly hte
> upgrade section for all versions between 5.5.4 and
> 6.6.2.
>
> 3> If you _can_ reindex, I always do if for no other reason
> than that'll force me to look at what's new and make use
> of it. Again you don't _have_ to though.
>
> Best,
> Erick
>
> On Mon, Jan 15, 2018 at 2:10 AM, Novin Novin  wrote:
> > Hi Guys,
> >
> > I would need a piece of advise about upgrading solr cloud.  Would I need
> to
> > re-index data If upgrade Solr cloud from 5.5.4 to 6.6.2?
> >
> > Thanks in advance.
> > Navin
>


cursorMark and Solrcloud

2018-01-15 Thread Webster Homer
I have noticed strange behavior using cursorMark for deep paging in an
application. We use solrcloud for searching. We have several clouds for
development. For our development systems we have two different clouds. One
cloud has 2 shards with 1 replica per shard. All of our other clouds are
set up with 2 shards and 2 replicas per shard.

The application sorts the data by score descending, and the schema's unique
id ascending. According to the documentation, cursor mark requires that the
tie breaker be the schema's unique id.

When I run against the first cloud, I always get consistent results for the
same query. That is not the case with the second cloud. Some queries return
a different number of results each time they are run. In the code I return
the number found from solr, and I count the number of results for all
iterations against the cursor mark. Sometimes it returns more rows than the
numFound and sometimes less.

I figured that the problem was in my code or in the data. To make it easier
to find the problem, I changed the sort to just be the unique id from the
schema. The problem went away.

1. The Number Found from solr was always the same
2. It worked when there was only 1 replica per shard
3. From debug statements it appears to return different total counts from
different replicas. When there were 2 replicas per shard I saw 4 different
values being returned.
4. Not sorting on score, and only on the unique id provides consistent
results.

So it appears that we should not include score in the sort when using
cursor mark and solrcloud.

We use solrj and CloudSolrClient. We are currently using the Solr 6.2 solrj
client with Solr 7.2 in our dev environment. We are in the process of
moving completely to 7.2.

Is this a known issue with cursormark and solrcloud?
For debugging purposes can I determine which solr node that cloudSolrClient
is using for a particular query?

I have not yet created a standalone test case for the issue, I'm still not
100% convinced that it is solrcloud, but it certainly looks like it is.

Thanks,
Webster

-- 


This message and any attachment are confidential and may be privileged or 
otherwise protected from disclosure. If you are not the intended recipient, 
you must not copy this message or attachment or disclose the contents to 
any other person. If you have received this transmission in error, please 
notify the sender immediately and delete the message and any attachment 
from your system. Merck KGaA, Darmstadt, Germany and any of its 
subsidiaries do not accept liability for any omissions or errors in this 
message which may arise as a result of E-Mail-transmission or for damages 
resulting from any unauthorized changes of the content of this message and 
any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its 
subsidiaries do not guarantee that this message is free of viruses and does 
not accept liability for any damages caused by any virus transmitted 
therewith.

Click http://www.emdgroup.com/disclaimer to access the German, French, 
Spanish and Portuguese versions of this disclaimer.


Re: cursorMark and Solrcloud

2018-01-15 Thread Shawn Heisey

On 1/15/2018 11:56 AM, Webster Homer wrote:

> I have noticed strange behavior using cursorMark for deep paging in an
> application. We use solrcloud for searching. We have several clouds for
> development. For our development systems we have two different clouds. One
> cloud has 2 shards with 1 replica per shard. All or our other clouds are
> set up with 2 shards and 2 replicas per shard.


A cloud doesn't get set up with shards and replicas.  A collection 
does.  One SolrCloud cluster can contain many collections.


When you say "cloud" are you referring to a collection, or are you 
referring to a set of servers running ZooKeeper and Solr? The latter is 
what I would expect cloud to mean.



> When I run against the first cloud, I always get consistent results for the
> same query. That is not the case with the second cloud. Some queries return
> different numbers of results each time it's called. In the code I return
> the number found from solr, and I count the number of results for all
> iterations against the cursor mark. Sometimes it returns more rows than the
> numFound and sometimes less.
>
> I figured that the problem was in my code or in the data to make it easier
> to find the problem I changed the sort to just be the unique id from the
> schema. The problem went away.
>
> 1. The Number Found from solr was always the same
> 2. It worked when there was only 1 replica per shard
> 3. From debug statements it appears to return different total counts from
> different replicas. When there were 2 replicas per shard I saw 4 different
> values being returned.
> 4. Not sorting on score, and only on the unique id provides consistent
> results.


When you have multiple replicas, each replica may have different numbers 
of deleted documents.  Deleted documents will almost always affect 
scoring.  Because SolrCloud load balances across replicas, one page of 
your cursorMark query can be served by a different replica than the next 
one, so the order of results can differ.


When sorting by unique ID, deleted documents will not affect sort 
order.  When there is only one replica, then sorting by score will 
always produce the same order, unless the index gets modified.
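
To make the paging mechanics concrete, here is a minimal sketch of a cursorMark loop. It assumes a hypothetical `fetch` callable that sends the parameters to a /select handler and returns the parsed JSON response; `fake_fetch` is only a stand-in so the sketch runs without a Solr server. The sort is assumed to be on the uniqueKey field (called `id` here), which is the case described above as unaffected by deleted documents.

```python
from typing import Callable, Dict, Iterator


def iter_cursor(fetch: Callable[[Dict[str, str]], dict],
                query: str,
                sort: str = "id asc",
                rows: int = 100) -> Iterator[dict]:
    """Yield every document for `query` by following Solr's cursorMark."""
    params = {"q": query, "sort": sort, "rows": str(rows), "cursorMark": "*"}
    while True:
        # `fetch` is a hypothetical transport: it sends params to /select
        # and returns the parsed JSON response.
        rsp = fetch(params)
        yield from rsp["response"]["docs"]
        next_mark = rsp["nextCursorMark"]
        # Solr signals the end of the results by returning the mark it was given.
        if next_mark == params["cursorMark"]:
            return
        params["cursorMark"] = next_mark


def fake_fetch(params: Dict[str, str]) -> dict:
    """Stand-in for a real Solr call: two pages of docs, then an empty page."""
    pages = {
        "*": {"response": {"docs": [{"id": "a"}, {"id": "b"}]}, "nextCursorMark": "m1"},
        "m1": {"response": {"docs": [{"id": "c"}]}, "nextCursorMark": "m2"},
        "m2": {"response": {"docs": []}, "nextCursorMark": "m2"},
    }
    return pages[params["cursorMark"]]


ids = [doc["id"] for doc in iter_cursor(fake_fetch, "*:*")]
print(ids)
```

If score is part of the sort, each call of `fetch` may land on a replica that computes slightly different scores, so the mark from one page may not line up with the ordering used for the next page.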


Thanks,
Shawn



Re: cursorMark and Solrcloud

2018-01-15 Thread Webster Homer
The problem is that the cursor mark query returns different numbers of
documents each time it is called when the collection has multiple replicas
per shard.

I meant collection. The same collection is on different clouds. In cloud
#1 the collection has 2 shards with 1 replica per shard. In the second
cloud the collection has 2 shards with 2 replicas per shard.

The same query using cursorMark against the second cloud returns different
numbers of documents. It appears that each replica returns a slightly
different number of documents. When run against cloud #1 it always returns
the same documents.
Here is a little bit from my debug statements.
count is the number found; solr_returned is a counter for all the
documents actually returned over all the calls with the cursor mark. Why are
they different?
Each of these represents a search against our collection.

"count": 1382,
"solr_returned": 1281,

"count": 1382,
"solr_returned": 1366,

"count": 1382,
"solr_returned": 1225,

"count": 1382,
"solr_returned": 1397,


Taking score out of the sort, cloud #2 will return consistent result sets.
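
A comparison like the one above can be taken one step further by recording the IDs seen on each page, which shows whether a mismatch comes from repeated documents, skipped documents, or both. The sketch below is only an illustration: `summarize_pages` and its field names are hypothetical, and it assumes the client collects the uniqueKey values of every page it receives.

```python
from collections import Counter
from typing import Dict, Iterable, List


def summarize_pages(num_found: int, pages: Iterable[List[str]]) -> Dict[str, object]:
    """Compare numFound with the IDs actually returned across all cursor pages."""
    seen = Counter(doc_id for page in pages for doc_id in page)
    return {
        "count": num_found,                  # numFound reported by Solr
        "solr_returned": sum(seen.values()),  # rows received, duplicates included
        "unique": len(seen),                 # distinct documents received
        "duplicates": sorted(d for d, n in seen.items() if n > 1),
        "missing": num_found - len(seen),    # documents never returned
    }


# Made-up pages: "b" is returned twice and one of the five documents never appears.
report = summarize_pages(5, [["a", "b"], ["b", "c"], ["d"]])
print(report)
```

A run with solr_returned above count would then show up as a non-empty duplicates list, and a run below count as a positive missing value.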



On Mon, Jan 15, 2018 at 1:28 PM, Shawn Heisey  wrote:

> On 1/15/2018 11:56 AM, Webster Homer wrote:
>
>> I have noticed strange behavior using cursorMark for deep paging in an
>> application. We use solrcloud for searching. We have several clouds for
>> development. For our development systems we have two different clouds. One
>> cloud has 2 shards with 1 replica per shard. All or our other clouds are
>> set up with 2 shards and 2 replicas per shard.
>>
>
> A cloud doesn't get set up with shards and replicas.  A collection does.
> One SolrCloud cluster can contain many collections.
>
> When you say "cloud" are you referring to a collection, or are you
> referring to a set of servers running ZooKeeper and Solr? The latter is
> what I would expect cloud to mean.
>
> When I run against the first cloud, I always get consistent results for the
>> same query. That is not the case with the second cloud. Some queries
>> return
>> different numbers of results each time it's called. In the code I return
>> the number found from solr, and I count the number of results for all
>> iterations against the cursor mark. Sometimes it returns more rows than
>> the
>> numFound and sometimes less.
>>
>> I figured that the problem was in my code or in the data to make it easier
>> to find the problem I changed the sort to just be the unique id from the
>> schema. The problem went away.
>>
>> 1. The Number Found from solr was always the same
>> 2. It worked when there was only 1 replica per shard
>> 3. From debug statements it appears to return different total counts from
>> different replicas. When there were 2 replicas per shard I saw 4 different
>> values being returned.
>> 4. Not sorting on score, and only on the unique id provides consistent
>> results.
>>
>
> When you have multiple replicas, each replica may have different numbers
> of deleted documents.  Deleted documents will almost always affect
> scoring.  Because SolrCloud load balances across replicas, one page of your
> cursorMark query can be served by a different replica than the next one, so
> the order of results can differ.
>
> When sorting by unique ID, deleted documents will not affect sort order.
> When there is only one replica, then sorting by score will always produce
> the same order, unless the index gets modified.
>
> Thanks,
> Shawn
>
>



Re: cursorMark and Solrcloud

2018-01-15 Thread Webster Homer
When I don't have score in the sort, the solr_returned and count are the
same

On Mon, Jan 15, 2018 at 1:50 PM, Webster Homer 
wrote:

> The problem is that the cursor mark query returns different numbers of
> documents each time it is called when the collection has multiple replicas
> per shard.
>
> I meant collection. The same collection is on different clouds. The
> collection in one cloud 1 has 2 shards with 1 replica per shard. In the
> second cloud the collection has 2 shards with 2 replicas per shard.
>
> The same query using cursorMark against the second cloud returns different
> numbers of documents. It appears that each replica returns a slightly
> different number of documents. when run against cloud #1 it always returns
> the same documents.
> Here is a little bit from my debug statements.
> count is the number found, solr_retrieved is a counter for all the
> documents actually returned over all the calls to the cursor mark Why are
> they different?
> Each of these represent a search against our collection.
>
> "count": 1382,
> "solr_returned": 1281,
>
> "count": 1382,
> "solr_returned": 1366,
>
> "count": 1382,
> "solr_returned": 1225,
>
> "count": 1382,
> "solr_returned": 1397,
>
>
> Taking score out of the sort, cloud #2 will return consistent result sets.
>
>
>
> On Mon, Jan 15, 2018 at 1:28 PM, Shawn Heisey  wrote:
>
>> On 1/15/2018 11:56 AM, Webster Homer wrote:
>>
>>> I have noticed strange behavior using cursorMark for deep paging in an
>>> application. We use solrcloud for searching. We have several clouds for
>>> development. For our development systems we have two different clouds.
>>> One
>>> cloud has 2 shards with 1 replica per shard. All or our other clouds are
>>> set up with 2 shards and 2 replicas per shard.
>>>
>>
>> A cloud doesn't get set up with shards and replicas.  A collection does.
>> One SolrCloud cluster can contain many collections.
>>
>> When you say "cloud" are you referring to a collection, or are you
>> referring to a set of servers running ZooKeeper and Solr? The latter is
>> what I would expect cloud to mean.
>>
>> When I run against the first cloud, I always get consistent results for
>>> the
>>> same query. That is not the case with the second cloud. Some queries
>>> return
>>> different numbers of results each time it's called. In the code I return
>>> the number found from solr, and I count the number of results for all
>>> iterations against the cursor mark. Sometimes it returns more rows than
>>> the
>>> numFound and sometimes less.
>>>
>>> I figured that the problem was in my code or in the data to make it
>>> easier
>>> to find the problem I changed the sort to just be the unique id from the
>>> schema. The problem went away.
>>>
>>> 1. The Number Found from solr was always the same
>>> 2. It worked when there was only 1 replica per shard
>>> 3. From debug statements it appears to return different total counts from
>>> different replicas. When there were 2 replicas per shard I saw 4
>>> different
>>> values being returned.
>>> 4. Not sorting on score, and only on the unique id provides consistent
>>> results.
>>>
>>
>> When you have multiple replicas, each replica may have different numbers
>> of deleted documents.  Deleted documents will almost always affect
>> scoring.  Because SolrCloud load balances across replicas, one page of your
>> cursorMark query can be served by a different replica than the next one, so
>> the order of results can differ.
>>
>> When sorting by unique ID, deleted documents will not affect sort order.
>> When there is only one replica, then sorting by score will always produce
>> the same order, unless the index gets modified.
>>
>> Thanks,
>> Shawn
>>
>>
>



Adding a child doc incrementally

2018-01-15 Thread S G
Hi,

We have a use-case where a single document can contain thousands of child
documents.
However, I could not find any way to add them incrementally.
The only way is to read the full document from Solr, add the new child
document to it, and then re-index the full document with all of its child
documents again.
This causes a lot of reads from Solr just to form the document with one
extra child document.
Ideally, I would like to send only the parent ID and the new child document
as part of an "incremental update" command to Solr.

Is there a way to incrementally add a child document to a parent document?
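
For reference, the read-modify-reindex workaround described above can be sketched as follows. This is only an illustration under assumptions: the parent has already been fetched together with its children, nested documents are sent in Solr's JSON update format under the `_childDocuments_` key, and the function and field names are hypothetical.

```python
import copy
import json


def with_new_child(parent: dict, child: dict) -> dict:
    """Return a copy of the full parent block with one more child appended."""
    updated = copy.deepcopy(parent)
    # The whole block (parent plus every child) must be sent again.
    updated.setdefault("_childDocuments_", []).append(child)
    return updated


# Parent as previously read back from Solr, with its existing children.
parent = {
    "id": "p1",
    "type": "parent",
    "_childDocuments_": [{"id": "p1-c1", "type": "child"}],
}
new_child = {"id": "p1-c2", "type": "child"}

# Body for a JSON update request; the transport itself is left out.
payload = json.dumps([with_new_child(parent, new_child)])
print(payload)
```

The sketch makes the cost visible: the payload grows with the number of existing children, even though only one child is new.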

Thanks
SG


Re: cursorMark and Solrcloud

2018-01-15 Thread Erick Erickson
bq: When I don't have score in the sort, the solr_returned and count
are the same.

Hmmm, I don't know the inner workings of cursorMark all that well. But can
you tell what the score of one of the omitted documents is and how it
compares against the score of the mark returned on the previous call?

Say the mark returned was 10.333 and an omitted doc's score was 10.332. That
would be a hint at there being an issue with score being used as the
primary sort.

You could further nail it down by firing a query like
solr/collection/collection_shard1_replica1/select?q=(your original query
here)&fq=docId:(the doc IDs in question)&distrib=false
at both replicas in the shard once you've found a doc that's omitted.
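
A small helper can build that per-replica request for each core. This is a sketch under assumptions: the base URLs, core names and the `docId` field are placeholders, and `fl` is added here (it is not part of the suggestion above) so that the scores can be compared side by side.

```python
from typing import List
from urllib.parse import urlencode


def replica_check_urls(base_urls: List[str], query: str,
                       id_field: str, doc_ids: List[str]) -> List[str]:
    """Build one non-distributed /select URL per replica core."""
    params = {
        "q": query,
        "fq": "%s:(%s)" % (id_field, " OR ".join(doc_ids)),
        "fl": "%s,score" % id_field,  # assumption: return id and score only
        "distrib": "false",           # query this core only, no fan-out
    }
    query_string = urlencode(params)
    return ["%s/select?%s" % (base.rstrip("/"), query_string) for base in base_urls]


# Hypothetical core URLs for the two replicas of shard1.
urls = replica_check_urls(
    ["http://localhost:8983/solr/collection_shard1_replica1",
     "http://localhost:8984/solr/collection_shard1_replica2"],
    query="title:foo",
    id_field="docId",
    doc_ids=["doc1", "doc2"],
)
for url in urls:
    print(url)
```

If the two replicas report different scores for the same document, that supports the theory that score-based ordering differs between replicas.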


The other possibility is to use distributed IDF; see "Configuring
statsCache" here:
https://lucene.apache.org/solr/guide/7_0/distributed-requests.html
I'm not entirely sure that'd fix the problem, but if it did it would be
another bit of evidence.

I'm assuming there's no indexing going on.

Best,
Erick

On Mon, Jan 15, 2018 at 11:52 AM, Webster Homer  wrote:
> When I don't have score in the sort, the solr_returned and count are the
> same
>
> On Mon, Jan 15, 2018 at 1:50 PM, Webster Homer 
> wrote:
>
>> The problem is that the cursor mark query returns different numbers of
>> documents each time it is called when the collection has multiple replicas
>> per shard.
>>
>> I meant collection. The same collection is on different clouds. The
>> collection in one cloud 1 has 2 shards with 1 replica per shard. In the
>> second cloud the collection has 2 shards with 2 replicas per shard.
>>
>> The same query using cursorMark against the second cloud returns different
>> numbers of documents. It appears that each replica returns a slightly
>> different number of documents. when run against cloud #1 it always returns
>> the same documents.
>> Here is a little bit from my debug statements.
>> count is the number found, solr_retrieved is a counter for all the
>> documents actually returned over all the calls to the cursor mark Why are
>> they different?
>> Each of these represent a search against our collection.
>>
>> "count": 1382,
>> "solr_returned": 1281,
>>
>> "count": 1382,
>> "solr_returned": 1366,
>>
>> "count": 1382,
>> "solr_returned": 1225,
>>
>> "count": 1382,
>> "solr_returned": 1397,
>>
>>
>> Taking score out of the sort, cloud #2 will return consistent result sets.
>>
>>
>>
>> On Mon, Jan 15, 2018 at 1:28 PM, Shawn Heisey  wrote:
>>
>>> On 1/15/2018 11:56 AM, Webster Homer wrote:
>>>
>>>> I have noticed strange behavior using cursorMark for deep paging in an
>>>> application. We use solrcloud for searching. We have several clouds for
>>>> development. For our development systems we have two different clouds.
>>>> One
>>>> cloud has 2 shards with 1 replica per shard. All or our other clouds are
>>>> set up with 2 shards and 2 replicas per shard.

>>>
>>> A cloud doesn't get set up with shards and replicas.  A collection does.
>>> One SolrCloud cluster can contain many collections.
>>>
>>> When you say "cloud" are you referring to a collection, or are you
>>> referring to a set of servers running ZooKeeper and Solr? The latter is
>>> what I would expect cloud to mean.
>>>
>>> When I run against the first cloud, I always get consistent results for
>>>> the
>>>> same query. That is not the case with the second cloud. Some queries
>>>> return
>>>> different numbers of results each time it's called. In the code I return
>>>> the number found from solr, and I count the number of results for all
>>>> iterations against the cursor mark. Sometimes it returns more rows than
>>>> the
>>>> numFound and sometimes less.
>>>>
>>>> I figured that the problem was in my code or in the data to make it
>>>> easier
>>>> to find the problem I changed the sort to just be the unique id from the
>>>> schema. The problem went away.
>>>>
>>>> 1. The Number Found from solr was always the same
>>>> 2. It worked when there was only 1 replica per shard
>>>> 3. From debug statements it appears to return different total counts from
>>>> different replicas. When there were 2 replicas per shard I saw 4
>>>> different
>>>> values being returned.
>>>> 4. Not sorting on score, and only on the unique id provides consistent
>>>> results.

>>>
>>> When you have multiple replicas, each replica may have different numbers
>>> of deleted documents.  Deleted documents will almost always affect
>>> scoring.  Because SolrCloud load balances across replicas, one page of your
>>> cursorMark query can be served by a different replica than the next one, so
>>> the order of results can differ.
>>>
>>> When sorting by unique ID, deleted documents will not affect sort order.
>>> When there is only one replica, then sorting by score will always produce
>>> the same order, unless the index gets modified.
>>>
>>> Thanks,
>>> Shawn
>>>
>>>
>>
>

strdist on nested doc field

2018-01-15 Thread Dariusz Wojtas
Hi,

Is it possible to use the strdist() function to return distance on the
child document field?
Let's say I have:


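<doc>
   <field name="id">1</field>
   <field name="type">record</field>
   <field name="firstName">Adam</field>
   <doc>
     <field name="id">A1</field>
     <field name="type">address</field>
     <field name="address.type">business</field>
     <field name="address.street">Shakespeare</field>
   </doc>
   <doc>
     <field name="id">A2</field>
     <field name="type">address</field>
     <field name="address.type">correspondence</field>
     <field name="address.street">Baker Street</field>
   </doc>
</doc>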


What I want to do is to search for documents:
  type:record
  firstName:Adam
  and return max strdist('Shakespeare', address.street, edit) as the
resulting score?

or
  type:record
  firstName:Adam
  and return max strdist('Shakespeare', address.street, edit) of
"address.type:business" as the resulting score?

I am trying with the {!parent} mode and {!function}, various combinations.
But I do not get what I'd expect.

Best regards,
Dariusz Wojtas


Re: PayloadScoreQuery always returns score of zero

2018-01-15 Thread Chris Hostetter

what does your full request, including the results block look like when 
you search on one of these queries with "fl=*,score" ?

I'm suspicious that perhaps the problem isn't the payload encoding, or the 
PayloadScoreQuery -- but perhaps it's simply a bug in the Explanation 
produced by those queries?

: Date: Wed, 13 Dec 2017 14:15:48 -0600
: From: John Anonymous 
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: PayloadScoreQuery always returns score of zero
: 
: The PayloadScoreQuery always returns a score of zero, regardless of
: payloads.  The PayloadCheckQParser works fine, so I know that I am
: successfully indexing the payloads.   Details below
: 
: *payload field that I am searching on:*
: 
: 
: *definition of payload field type:*
: 
: 
: 
: 
: 
: 
: 
: 
: 
: 
: 
: 
: 
: 
: 
: 
: 
: 
: 
: 
: *Adding some documents with payloads in my test:*
: assertU(adoc(
: "key", "1",
: "report", "apple¯0 apple¯0 apple¯0"
: ));
: assertU(adoc(
: "key", "2",
: "report", "apple¯1 apple¯1 text¯1"
: ));
: 
: 
: *query:*
: {!payload_score f=report v=apple func=sum}
: 
: *score (both documents have a score of zero):*
: 
: 
: 0.0 = SumPayloadFunction.docScore()
: 
: 
: 0.0 = SumPayloadFunction.docScore()
: 
:   
: 
: I have tried using func=max as well, but it makes no difference.  Can
: anyone help me with what I am missing here?
: Thanks!
: Johnathan
: 

-Hoss
http://www.lucidworks.com/

Re: cursorMark and Solrcloud

2018-01-15 Thread Shawn Heisey

On 1/15/2018 12:52 PM, Webster Homer wrote:

> When I don't have score in the sort, the solr_returned and count are the
> same


I don't know what "solr_returned" means.  I haven't encountered that 
before, and nothing useful turns up in a google search.


If you're getting different numFound values for the same query and the 
index hasn't changed, there are two possible causes that I know of.  One 
is replicas out of sync as already described, the other is having 
documents with the same uniqueKey value in more than one shard.  If the 
count is always the same with one sort, then I am leaning towards the 
latter cause.
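
One way to check for the second cause is to list the uniqueKey values held by each shard and look for IDs that appear in more than one of them. The sketch below assumes the per-shard ID lists have already been collected (for example with distrib=false queries against one core per shard); the function name and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def ids_in_multiple_shards(ids_by_shard: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Map each uniqueKey value found in more than one shard to those shards."""
    shards_for_id = defaultdict(set)
    for shard, ids in ids_by_shard.items():
        for doc_id in ids:
            shards_for_id[doc_id].add(shard)
    return {doc_id: sorted(shards)
            for doc_id, shards in shards_for_id.items()
            if len(shards) > 1}


# Made-up data: document "7" lives in both shards.
overlap = ids_in_multiple_shards({
    "shard1": ["1", "3", "7"],
    "shard2": ["2", "7", "9"],
})
print(overlap)
```

An empty result would rule out duplicated uniqueKey values for the documents that were checked.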


Which router does your collection use?  If it's implicit, how are you 
deciding which shard gets which document?  If it's compositeId, have you 
changed your hash ranges without deleting everything and building the 
index again?


Thanks,
Shawn



Re: strdist on nested doc field

2018-01-15 Thread Mikhail Khludnev
Hello, Dariusz.

It should be something like
q=+firstName:Adam +{!parent which=type:record
v=$chq}&chq=+type:address +strdist('Shakespeare',
address.street, edit)
Post the exception if it doesn't work.

On Tue, Jan 16, 2018 at 1:39 AM, Dariusz Wojtas  wrote:

> Hi,
>
> Is it possible to use the strdist() function to return distance on the
> child document field?
> Let's say I have:
>
> 
>1
>record
> Adam
> 
>   A1
>   address
>   business
>   Shakespeare
> 
> 
>   A2
>   address
>   correspondence
>   Baker Street
> 
> 
>
> What I want to do is to search for documents:
>   type:record
>   firstName:Adam
>   and return max strdist('Shakespeare', address.street, edit) as the
> resulting score?
>
> or
>   type:record
>   firstName:Adam
>   and return max strdist('Shakespeare', address.street, edit) of
> "address.type:business" as the resulting score?
>
> I am trying with the {!parent} mode and {!function}, various combinations.
> But I do not get what I'd expect.
>
> Best regards,
> Dariusz Wojtas
>



-- 
Sincerely yours
Mikhail Khludnev


Re: Got unexpected results.

2018-01-15 Thread Sanjeet Kumar
It would be nice if you could explain what's wrong with the query
(title:("to order this report"~0)).

On Mon, Jan 15, 2018 at 2:53 PM, Peter Lancaster <
peter.lancas...@findmypast.com> wrote:

> Shouldn't the query just be something like title: "to order this report"
> and then it will work.
>
>
> -Original Message-
> From: Sanjeet Kumar [mailto:sanjeetkumar...@gmail.com]
> Sent: 15 January 2018 06:20
> To: solr-user@lucene.apache.org
> Subject: Got unexpected results.
>
> Hi,
>
> I am using Solr-6.4.2, did a query (*title*:("to order this report"~*0*))
> on "*text_en*" field and matched ("title":"Forrester Research cites SAP
> Hybris as a leader in B2B Order Management report").
>
> As per my understanding, this could not match as there is a word
> "Management"
> between "Order' and "report". Can somebody explain this?.
>
> Thanks.
> 
>
> This message is confidential and may contain privileged information. You
> should not disclose its contents to any other person. If you are not the
> intended recipient, please notify the sender named above immediately. It is
> expressly declared that this e-mail does not constitute nor form part of a
> contract or unilateral obligation. Opinions, conclusions and other
> information in this message that do not relate to the official business of
> findmypast shall be understood as neither given nor endorsed by it.
> 
>
> __
>
> This email has been checked for virus and other malicious content prior to
> leaving our network.
> __
>


Re: Got unexpected results.

2018-01-15 Thread Sanjeet Kumar
Thanks, Erick! Please find the attached screenshot of this debug query
(title:("to order this report"~0)).

On Mon, Jan 15, 2018 at 10:21 PM, Erick Erickson 
wrote:

> What do you see when you add &debug=query?
>
> Best,
> Erick
>
> On Mon, Jan 15, 2018 at 1:23 AM, Peter Lancaster
>  wrote:
> > Shouldn't the query just be something like title: "to order this report"
> and then it will work.
> >
> >
> > -Original Message-
> > From: Sanjeet Kumar [mailto:sanjeetkumar...@gmail.com]
> > Sent: 15 January 2018 06:20
> > To: solr-user@lucene.apache.org
> > Subject: Got unexpected results.
> >
> > Hi,
> >
> > I am using Solr-6.4.2, did a query (*title*:("to order this
> report"~*0*)) on "*text_en*" field and matched ("title":"Forrester Research
> cites SAP Hybris as a leader in B2B Order Management report").
> >
> > As per my understanding, this could not match as there is a word
> "Management"
> > between "Order' and "report". Can somebody explain this?.
> >
> > Thanks.
>


RE: solr cluster: solr auto suggestion with requestHandler

2018-01-15 Thread Venkata MR
Any input on this would be really appreciated.

Thanks & Regards
Venkata MR
+91 98455 77125

From: Venkata MR
Sent: Tuesday, January 09, 2018 5:25 PM
To: 'solr-user@lucene.apache.org' 
Cc: Deepak Udapudi ; Nareshkumar P 
Subject: solr cluster: solr auto suggestion with requestHandler

Hi All,

Problem: Not able to build suggest data on all solr cluster nodes

Configured three Solr nodes using an external ZooKeeper.
Configured the requestHandler for auto-suggestion as below:



  true
  5
  Name


  suggest




 
  Name
  name
  name
  AnalyzingInfixLookupFactory
  name_suggester_infix_dir
  DocumentDictionaryFactory
  key
  lowercase
  name_suggestor_dictionary
  string
 


When we manually issue a request with suggest.build=true on one of the nodes
to build the suggest data, it is built for that particular node only; the
other nodes of the cluster do not build it.
Is there a configuration mismatch?

Thanks a lot.

Thanks & Regards
Venkata MR
+91 98455 77125

::DISCLAIMER::
--
The contents of this e-mail and any attachment(s) are confidential and intended 
for the named recipient(s) only. E-mail transmission is not guaranteed to be 
secure or error-free as information could be intercepted, corrupted, lost, 
destroyed, arrive late or incomplete, or may contain viruses in transmission. 
The e mail and its contents (with or without referred errors) shall therefore 
not attach any liability on the originator or HCL or its affiliates. Views or 
opinions, if any, presented in this email are solely those of the author and 
may not necessarily reflect the views or opinions of HCL or its affiliates. Any 
form of reproduction, dissemination, copying, disclosure, modification, 
distribution and / or publication of this message without the prior written 
consent of authorized representative of HCL is strictly prohibited. If you have 
received this email in error please delete it and notify the sender 
immediately. Before opening any email and/or attachments, please check them for 
viruses and other defects.
--


Re: strdist on nested doc field

2018-01-15 Thread Dariusz Wojtas
Hello Mikhail,

I've tried it and the query executes, but it does not treat strdist() as a
function to execute.
It looks like each part of the function (its name and parameters) is
treated as keywords to search for against the default field.

If I try something different:

   q=+firstName:Adam +{!parent which=type:record v=$chq}&chq=+type:address
+{!func}strdist('Shakespeare',address.street, edit)

then I get an exception:
  org.apache.solr.search.SyntaxError: Missing end to unquoted value
starting at 37 str='strdist('Shakespeare',address.street,'

Best regards,
Dariusz Wojtas




On Tue, Jan 16, 2018 at 4:04 AM, Mikhail Khludnev  wrote:

> Hello, Dariusz.
>
> It should be something like
> q=+firstName:Adam +{!parent which=type:record
> v=$chq}&chq=+type:address +strdist('Shakespeare',
> address.street, edit)
> post exception if it doesn't work.
>
> On Tue, Jan 16, 2018 at 1:39 AM, Dariusz Wojtas  wrote:
>
> > Hi,
> >
> > Is it possible to use the strdist() function to return distance on the
> > child document field?
> > Let's say I have:
> >
> > 
> >1
> >record
> > Adam
> > 
> >   A1
> >   address
> >   business
> >   Shakespeare
> > 
> > 
> >   A2
> >   address
> >   correspondence
> >   Baker Street
> > 
> > 
> >
> > What I want to do is to search for documents:
> >   type:record
> >   firstName:Adam
> >   and return max strdist('Shakespeare', address.street, edit) as the
> > resulting score?
> >
> > or
> >   type:record
> >   firstName:Adam
> >   and return max strdist('Shakespeare', address.street, edit) of
> > "address.type:business" as the resulting score?
> >
> > I am trying with the {!parent} mode and {!function}, various
> combinations.
> > But I do not get what I'd expect.
> >
> > Best regards,
> > Dariusz Wojtas
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


Re: Parallelizing queries without Custom Component

2018-01-15 Thread Mikhail Khludnev
It can also be done with {!join .. score=max}, but this join is usually
slow on big indices.
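
Apart from a join, the single-product query from the original question (quoted below) can also be issued once per product from the client and run in parallel. The following is a minimal sketch of that fan-out, not a recommendation: `fetch` is a hypothetical callable that sends the parameters to /select and returns the parsed JSON, and `fake_fetch` with its data is made up so the sketch runs without Solr.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def worst_review_params(product_id: str) -> Dict[str, str]:
    """Parameters of the single-product query from the original question."""
    return {
        "q": "product_id:%s" % product_id,
        "fq": "rating:[1 TO 3]",
        "sort": "negative_feedback desc",
        "rows": "1",
    }


def worst_reviews(fetch: Callable[[Dict[str, str]], dict],
                  product_ids: Iterable[str],
                  max_workers: int = 4) -> Dict[str, dict]:
    """Run one query per product in parallel and keep the top hit of each."""
    product_ids = list(product_ids)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns the responses in the same order as product_ids.
        responses = list(pool.map(lambda pid: fetch(worst_review_params(pid)),
                                  product_ids))
    result = {}
    for pid, rsp in zip(product_ids, responses):
        docs = rsp["response"]["docs"]
        if docs:  # products without a review rated 1 to 3 are skipped
            result[pid] = docs[0]
    return result


def fake_fetch(params: Dict[str, str]) -> dict:
    """Stand-in for Solr: only product 7453632 has a matching review."""
    data = {"product_id:7453632": [{"id": "r9", "negative_feedback": 12}]}
    return {"response": {"docs": data.get(params["q"], [])}}


print(worst_reviews(fake_fetch, ["7453632", "645454", "534664"]))
```

This costs one request per product, so it suits short product lists better than long ones.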

On Mon, Jan 15, 2018 at 7:27 PM, Max Bridgewater 
wrote:

> Hi,
>
> My index is composed of product reviews. Each review contains the id of the
> product it refers to. But it also contains a rating for this product and
> the number of negative feedback provided on this product.
>
> {
>id: solr doc id,
>rating: number between 0 and 5,
>product_id: the product that is being reviewed,
>negative_feedback: how many negative feedbacks on this product
> }
>
> The query below returns the "worst" review for the given product  7453632.
> Worst is defined as  rated 1 to 3 and having the highest number of negative
> feedback.
>
> /select?q=product_id:7453632&fq=rating:[1 TO 3]&sort=negative_feedback
> desc&rows=1
>
> The query works as intended. Now the challenging part is to extend this
> query to support many product_id. If executed with many product Id, the
> result should be the list of worst reviews for all the provided products.
>
> A query of the following form would return the list of worst products for
> products: 7453632,645454,534664.
>
> /select?q=product_id:[7453632,645454,534664]&fq=rating:[1 TO
> 3]&sort=negative_feedback desc
>
> Is there a way to do this in Solr without custom component?
>
> Thanks.
> Max
>



-- 
Sincerely yours
Mikhail Khludnev


Re: strdist on nested doc field

2018-01-15 Thread Mikhail Khludnev
On Tue, Jan 16, 2018 at 10:30 AM, Dariusz Wojtas  wrote:

> Hello Mikhail,
>
> I've tried it and the query executes, but it does not treat strdist() as a
> function to execute.
> Looks like each part of the function - it's name and parameters - are
> treated as keywords to search for against the default field.
>
Can you post the exact observations, rather than interpretation?


>
> If I try something different:
>
>q=+firstName:Adam +{!parent which=type:record v=$chq}&chq=+type:address
> +{!func}strdist('Shakespeare',address.street, edit)
>
> then I get exception:
>   org.apache.solr.search.SyntaxError: Missing end to unquoted value
> starting at 37 str='strdist('Shakespeare',address.street,'
>
This particular query failed because of the space. Here is my pet peeve in
Solr: the syntax {!foo} captures the whole string if it's at the beginning of
the string, but in the middle of the string it captures only the substring up
to the first space.
So, after removing the space it should work fine. Other potential issues are
the single quotes (are they even supported?) and the dot in the field name
(you never know).
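
As an illustration of the fix, the request parameters can be built so that the function call after {!func} contains no spaces. This is a sketch only: it changes nothing except the space, keeps the single quotes and the dotted field name from the original query (both flagged above as untested), and the helper name is hypothetical.

```python
from typing import Dict
from urllib.parse import parse_qs, urlencode


def parent_strdist_params(first_name: str, street: str) -> Dict[str, str]:
    """Build q and chq so that the {!func} clause contains no spaces."""
    # No space after any comma: a mid-string {!func} stops at the first space.
    child_func = "{!func}strdist('%s',address.street,edit)" % street
    return {
        "q": "+firstName:%s +{!parent which=type:record v=$chq}" % first_name,
        "chq": "+type:address +" + child_func,
    }


params = parent_strdist_params("Adam", "Shakespeare")
query_string = urlencode(params)  # URL-encodes the braces, quotes and spaces
print(params["chq"])
print(query_string)
```

Note that a street value containing a space, such as "Baker Street", would reintroduce a space into the function call, so this sketch only covers single-word values.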


>
> Best regards,
> Dariusz Wojtas
>
>
>
>
> On Tue, Jan 16, 2018 at 4:04 AM, Mikhail Khludnev  wrote:
>
> > Hello, Dariusz.
> >
> > It should be something like
> > q=+firstName:Adam +{!parent which=type:record
> > v=$chq}&chq=+type:address +strdist('Shakespeare',
> > address.street, edit)
> > post exception if it doesn't work.
> >
> > On Tue, Jan 16, 2018 at 1:39 AM, Dariusz Wojtas 
> wrote:
> >
> > > Hi,
> > >
> > > Is it possible to use the strdist() function to return distance on the
> > > child document field?
> > > Let's say I have:
> > >
> > > 
> > >1
> > >record
> > > Adam
> > > 
> > >   A1
> > >   address
> > >   business
> > >   Shakespeare
> > > 
> > > 
> > >   A2
> > >   address
> > >   correspondence
> > >   Baker Street
> > > 
> > > 
> > >
> > > What I want to do is to search for documents:
> > >   type:record
> > >   firstName:Adam
> > >   and return max strdist('Shakespeare', address.street, edit) as the
> > > resulting score?
> > >
> > > or
> > >   type:record
> > >   firstName:Adam
> > >   and return max strdist('Shakespeare', address.street, edit) of
> > > "address.type:business" as the resulting score?
> > >
> > > I am trying with the {!parent} mode and {!function}, various
> > combinations.
> > > But I do not get what I'd expect.
> > >
> > > Best regards,
> > > Dariusz Wojtas
> > >
> >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> >
>



-- 
Sincerely yours
Mikhail Khludnev