SOLR Cursor Pagination Issue

2020-09-25 Thread vmakovsky

Good afternoon,
Could you please suggest a solution? While data is being updated in 
SolrCloud, requests that use cursorMark return incorrect data. I suspect 
that consecutive pages are inconsistent with each other during indexing 
because the data has not yet had time to replicate between the nodes.

Kind regards,
Vladislav Makovski
Developer
XB Software Ltd. | Minsk, Belarus
Site: https://xbsoftware.com
Skype: vlad__makovski
Cell:  +37529 6484100


SOLR Cursor Pagination Issue

2020-09-28 Thread vmakovsky

Good afternoon,
Could you please suggest a solution? While data is being updated in 
SolrCloud, requests that use cursorMark return incorrect data. I suspect 
that consecutive pages are inconsistent with each other during indexing 
because the data has not yet had time to replicate between the nodes.

Kind regards,
Vladislav Makovski



Re: SOLR Cursor Pagination Issue

2020-09-28 Thread vmakovsky

Hi, Erick

I have a Python script that sends requests with cursorMark. The script 
reports the following values:

Expected series criteria:
Collected series:
Number of requests:
Collected unique series:
The request looks like this:

select?indent=off
&defType=edismax
&wt=json
&facet.query={!key=NUM_DOCS}NOT SERIES_ID:0
&fq=NOT SERIES_ID:0
&spellcheck=true
&spellcheck.collate=true
&spellcheck.extendedResults=true
&facet.limit=-1
&q=(* AND *)
&qf=all_text_stemming all_text
&fq=facet_db_code:( "CN" )
&fq=-SERIES_CODE:( "TEST" )
&fl=SERIES_ID
&sort=score desc,docId asc
&bq=SERIES_STATUS:T^5
&bq=KEY_SERIES_FLAG:1^5
&bq=accuracy_name:0
&bq=SERIES_STATUS:C^-30
&rows=1
&cursorMark=*


docId does not change during the data update. While data is being updated 
in SolrCloud, the script returns an incorrect Number of requests and 
Collected series.
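
For readers following along, the paging loop and the counters described 
above can be sketched roughly as follows. This is not the actual script 
from this thread. The `fetch_page` callable and the `fake_fetch` stand-in 
are hypothetical; a real version would send the select request and read 
the documents and `nextCursorMark` from the JSON response. The loop stops 
when Solr returns the same cursorMark that was sent, which is how Solr 
signals the end of the results.

```python
from typing import Callable, Dict, List, Tuple

# (SERIES_ID values on this page, nextCursorMark returned with the page)
Page = Tuple[List[str], str]


def collect_series(fetch_page: Callable[[str], Page]) -> Dict[str, int]:
    """Walk a cursor until the returned cursorMark equals the one sent."""
    cursor = "*"  # initial cursorMark
    collected: List[str] = []
    requests = 0
    while True:
        ids, next_cursor = fetch_page(cursor)
        requests += 1
        collected.extend(ids)
        if next_cursor == cursor:  # no more results
            break
        cursor = next_cursor
    return {
        "collected": len(collected),   # "Collected series"
        "requests": requests,          # "Number of requests"
        "unique": len(set(collected)), # "Collected unique series"
    }


def fake_fetch(cursor: str) -> Page:
    """Hypothetical stand-in for Solr: three canned pages keyed by cursorMark."""
    pages = {"*": (["1", "2"], "A"), "A": (["3"], "B"), "B": ([], "B")}
    return pages[cursor]


stats = collect_series(fake_fetch)
print(stats)
```

If "collected" is larger than "unique", some documents were returned on 
more than one page; if "unique" is smaller than the expected count, some 
documents were never returned.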


Best,
Vlad



Mon, 28 Sep 2020 08:54:57 -0400, Erick Erickson wrote:


Define “incorrect” please. Also, showing the exact query you use 
would be helpful.


That said, indexing data at the same time you are using CursorMark 
is not guaranteed to find all documents. Consider a sort with date 
asc, id asc. doc53 has a date of 2001 and you’ve already returned the 
doc.


Next, you update doc53 to 2020. It now appears sometime later in the 
results due to the changed data. 

Or the other way, doc53 starts with 2020, and while your cursormark 
label is in 2010, you change doc53 to have a date of 2001. It will 
never be returned.


Similarly for anything else you change that’s relevant to the sort 
criteria you’re using.


CursorMark doesn’t remember _documents_, just, well, call it the 
fingerprint (i.e. sort criteria values) of the last document returned 
so far.
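
The two doc53 scenarios above can be reproduced with a small in-memory 
simulation. This is only a sketch of the idea, not Solr's implementation: 
the cursor is modelled as the (date, id) sort key of the last document 
returned, and the helper names `next_page` and `walk` are made up for 
illustration.

```python
def next_page(docs, cursor, rows):
    """Return up to `rows` (date, doc_id) keys that sort strictly after `cursor`."""
    keys = sorted((date, doc_id) for doc_id, date in docs.items())
    return [k for k in keys if cursor is None or k > cursor][:rows]


def walk(docs, rows, update_after_first_page):
    """Page through `docs`, applying one update after the first page is read."""
    seen, cursor, first = [], None, True
    while True:
        page = next_page(docs, cursor, rows)
        if not page:
            return seen
        seen.extend(doc_id for _, doc_id in page)
        cursor = page[-1]  # the cursor keeps only the last sort key
        if first:
            docs.update(update_after_first_page)
            first = False


# doc53 is returned with date 2001, then updated to 2020: it shows up again.
seen_twice = walk(
    {"doc53": 2001, "doc1": 2005, "doc2": 2010, "doc3": 2015},
    rows=2,
    update_after_first_page={"doc53": 2020},
)

# doc53 starts at 2020 and is moved to 2001 once the cursor has reached 2010:
# it now sorts before the cursor and is never returned.
missed = walk(
    {"doc1": 2005, "doc2": 2010, "doc3": 2015, "doc53": 2020},
    rows=2,
    update_after_first_page={"doc53": 2001},
)

print(seen_twice)
print(missed)
```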


Best,
Erick




Re: SOLR Cursor Pagination Issue

2020-09-29 Thread vmakovsky
Hi Erick,

"You still haven’t given an example of the results you’re seeing 
that are unexpected."

Here is an example of the data I received. Before starting the data 
update I have:

solrCloud:
Expected series criteria: 386062
Collected series: 386062
Number of requests: 40
Collected unique series: 386062

The results are similar for the individual nodes in the SolrCloud.

During the series update I have:

solrCloud:
Expected series criteria: 386062
Collected series: 445550
Number of requests: 124
Collected unique series: 386062

First node:
Expected series criteria: 386062
Collected series: 1442775
Number of requests: 146
Collected unique series: 386062

Second node:
Expected series criteria: 386062
Collected series: 242823
Number of requests: 26
Collected unique series: 242823

After the data update completes, I get the same results as before the 
update.
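
To make the numbers above easier to compare, the following short sketch 
derives two figures from them: how many series were returned more than 
once (collected minus unique) and how many were never returned (expected 
minus unique). The dictionary layout and the names `runs` and `summary` 
are only illustrative; the counts are copied from the results above.

```python
# Counts reported during the data update, copied from the message above.
runs = {
    "solrCloud":   {"expected": 386062, "collected": 445550,  "unique": 386062},
    "first node":  {"expected": 386062, "collected": 1442775, "unique": 386062},
    "second node": {"expected": 386062, "collected": 242823,  "unique": 242823},
}

summary = {
    name: {
        # series returned on more than one page
        "duplicates": r["collected"] - r["unique"],
        # series that never appeared on any page
        "missing": r["expected"] - r["unique"],
    }
    for name, r in runs.items()
}

for name, s in summary.items():
    print(f"{name}: duplicates={s['duplicates']}, missing={s['missing']}")
```

Under this reading, the SolrCloud run and the first node return every 
series but repeat many of them, while the second node returns no repeats 
but stops before reaching all series.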

Best,
Vlad
 
Mon, 28 Sep 2020 10:51:01 -0400, Erick Erickson wrote:


I said nothing about docId changing. _Any_ sort criterion changing is 
an issue. You’re sorting by score. As you index documents, the new 
docs change the values used to calculate scores for _all_ documents, 
thus changing the sort order and potentially causing unexpected 
results when using cursorMark. That said, I don’t think you’re getting 
any different scores at all if you’re really searching for "(* AND *)". 
Try returning score in the fl list; are the scores different?


You still haven’t given an example of the results you’re seeing that 
are unexpected. And my assumption is that you are seeing odd results 
when you call this query again with a cursorMark returned by a 
previous call. Or are you saying that you don’t think facet.query is 
returning the correct count? Be aware that Solr doesn’t support true 
Boolean logic, see: 
https://lucidworks.com/post/why-not-and-or-and-not/


There’s special handling for the form "fq=NOT something" to change 
it to "fq=*:* NOT something" that’s not present in something like 
"q=NOT something". How that plays in facet.query I’m not sure, but 
try "facet.query=*:* NOT something" if the facet count is what the 
problem is.
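
As a rough illustration of the two diagnostics suggested so far (returning 
score in fl, and writing the facet.query as "*:* NOT something"), the 
request parameters could be assembled like this. It is a sketch only: the 
bq and spellcheck parameters from the original request are left out for 
brevity, and combining the {!key=NUM_DOCS} local parameter with the 
"*:* NOT" form is an assumption, not something confirmed in this thread.

```python
from urllib.parse import urlencode, parse_qs

# A list of pairs is used because fq is repeated.
params = [
    ("defType", "edismax"),
    ("wt", "json"),
    ("q", "(* AND *)"),
    ("qf", "all_text_stemming all_text"),
    ("fq", "NOT SERIES_ID:0"),
    ("fq", 'facet_db_code:( "CN" )'),
    ("fq", '-SERIES_CODE:( "TEST" )'),
    # Explicit "*:* NOT ..." form for the facet count (assumed to combine
    # with the local-params key as shown).
    ("facet.query", "{!key=NUM_DOCS}*:* NOT SERIES_ID:0"),
    # Return the score so it can be compared between runs.
    ("fl", "SERIES_ID,score"),
    ("sort", "score desc,docId asc"),
    ("rows", "1"),
    ("cursorMark", "*"),
]

query_string = urlencode(params)
decoded = parse_qs(query_string)  # round-trip check of the encoding
print("select?" + query_string)
```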


I have no idea what you’re trying to accomplish with (* AND *) 
unless those are just placeholders and you put real text in them. 
That’s rather odd. *:* is "select everything"...


BTW, returning 10,000 docs is somewhat of an anti-pattern. If you 
really require that many documents, consider streaming.
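
One possible reading of "streaming" here is Solr's /export request 
handler, which streams a full sorted result set instead of paging through 
it. The sketch below only builds such a request URL. The base URL and the 
collection name "series" are hypothetical, and it assumes that docId and 
SERIES_ID are docValues fields, which /export requires for the sort and 
fl fields. /export does not sort by score, so the sort here uses docId 
only.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

SOLR_BASE = "http://localhost:8983/solr"  # assumption: local Solr instance
COLLECTION = "series"                     # hypothetical collection name


def export_url(base, collection, fq_list):
    """Build an /export URL; q, fl and sort are required by the handler."""
    params = [("q", "*:*"), ("fl", "SERIES_ID"), ("sort", "docId asc")]
    params += [("fq", fq) for fq in fq_list]
    return f"{base}/{collection}/export?{urlencode(params)}"


url = export_url(
    SOLR_BASE,
    COLLECTION,
    ["NOT SERIES_ID:0", 'facet_db_code:( "CN" )', '-SERIES_CODE:( "TEST" )'],
)
parts = urlsplit(url)
query = parse_qs(parts.query)
print(url)
```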


