Hi Erick,
"You still haven’t given an example of the results you’re seeing
that are unexpected."
Here is an example of the data I receive. Before starting the data update,
I get:
SolrCloud:
Expected series criteria: 386062
Collected series: 386062
Number of requests: 40
Collected unique series: 386062
I get similar results when querying the individual nodes of the SolrCloud.
During the data update, I get:
SolrCloud:
Expected series criteria: 386062
Collected series: 445550
Number of requests: 124
Collected unique series: 386062
First node:
Expected series criteria: 386062
Collected series: 1442775
Number of requests: 146
Collected unique series: 386062
Second node:
Expected series criteria: 386062
Collected series: 242823
Number of requests: 26
Collected unique series: 242823
After the data update completes, I get the same results as before the
update.
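Reading the counts above: "Collected series" minus "Collected unique series" is the number of documents returned more than once, and "Expected series criteria" minus "Collected unique series" is the number never returned. A small Python sketch of that arithmetic, using only the figures reported in this message (the helper name summarize is illustrative):

```python
# Figures copied from the runs reported above.
EXPECTED = 386062

# name -> (collected series, number of requests, collected unique series)
runs = {
    "SolrCloud, before update": (386062, 40, 386062),
    "SolrCloud, during update": (445550, 124, 386062),
    "First node, during update": (1442775, 146, 386062),
    "Second node, during update": (242823, 26, 242823),
}


def summarize(collected, unique, expected=EXPECTED):
    """Return (duplicates, missing) for one run."""
    duplicates = collected - unique  # documents returned more than once
    missing = expected - unique      # documents never returned at all
    return duplicates, missing


for name, (collected, requests, unique) in runs.items():
    dup, miss = summarize(collected, unique)
    print(f"{name}: {requests} requests, {dup} duplicates, {miss} missing")
```

By this arithmetic, the cluster-wide and first-node runs during the update returned every series at least once but with 59488 and 1056713 repeats respectively, while the second node returned no repeats but missed 143239 series.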
Best,
Vlad
Mon, 28 Sep 2020 10:51:01 -0400, Erick Erickson
wrote:
I said nothing about docId changing. _Any_ change to the sort criteria is
an issue. You’re sorting by score. As you index documents, the new docs
change the values used to calculate scores, so the scores of _all_
documents change, which changes the sort order and can cause
unexpected results when using cursorMark. That said, I don’t think
you’re getting any different scores at all if you’re really searching
for “(* AND *)”. Try returning score in the fl list: are they
different?
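To check this, score can be added to the fl list and the distinct scores on a returned page collected. A minimal Python sketch, assuming the standard Solr JSON response layout (a "response" object holding a "docs" list); the trimmed parameter list and the helper name distinct_scores are illustrative, not part of the original script:

```python
import json
from urllib.parse import urlencode

# Trimmed, hypothetical parameter list: the only change from the original
# request is adding "score" to fl next to SERIES_ID.
params = [
    ("defType", "edismax"),
    ("q", "(* AND *)"),
    ("fl", "SERIES_ID,score"),
    ("wt", "json"),
]
query_string = urlencode(params)


def distinct_scores(response_json):
    """Return the distinct score values in one Solr JSON response, highest first."""
    docs = json.loads(response_json)["response"]["docs"]
    return sorted({doc["score"] for doc in docs}, reverse=True)
```

A single value in the result means score contributes nothing to the sort order on that page; several values mean score-based ordering (and therefore the cursor position) can shift while documents are being indexed.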
You still haven’t given an example of the results you’re seeing that
are unexpected. And my assumption is that you are seeing odd results
when you call this query again with a cursorMark returned by a
previous call. Or are you saying that you don’t think facet.query is
returning the correct count? Be aware that Solr doesn’t support true
Boolean logic, see:
https://lucidworks.com/post/why-not-and-or-and-not/
There’s special handling for the form “fq=NOT something” to change
it to “fq=*:* NOT something”, which isn’t applied to something like
“q=NOT something”. How that plays out in facet.query I’m not sure, but
try “facet.query=*:* NOT something” if the facet count is the
problem.
I have no idea what you’re trying to accomplish with (* AND *)
unless those are just placeholders and you put real text in them.
That’s rather odd. *:* is “select everything”...
BTW, returning 10,000 docs is somewhat of an anti-pattern; if you
really require that many documents, consider streaming.
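A sketch of what a streaming request could look like, using a search() streaming expression with qt="/export" sent to the collection's /stream handler as the expr parameter. The collection name "series" is hypothetical (the thread never names it), and q="*:*" assumes a match-all is what (* AND *) was meant to do. As far as I know, /export needs docValues on every fl and sort field and sorts by field values rather than by score, so the score-based sort would have to go:

```python
from urllib.parse import urlencode

# Hypothetical collection name; the thread never names the collection.
COLLECTION = "series"

# search() with qt="/export" streams the whole sorted result set instead of
# paging through it with cursorMark.
expr = (
    f"search({COLLECTION}, "
    'q="*:*", '
    'fl="SERIES_ID", '
    'sort="SERIES_ID asc", '
    'qt="/export")'
)

# Form-encoded body for a POST to /solr/<collection>/stream.
body = urlencode({"expr": expr})
print(expr)
```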
On Sep 28, 2020, at 10:21 AM, vmakov...@xbsoftware.by wrote:
Hi, Erick
I have a Python script that sends requests with cursorMark. The script
checks the data using the following metrics:
Expected series criteria
Collected series
Number of requests
Collected unique series
The request looks like this:
select?indent=off&defType=edismax&wt=json&facet.query={!key=NUM_DOCS}NOT SERIES_ID:0&fq=NOT SERIES_ID:0&spellcheck=true&spellcheck.collate=true&spellcheck.extendedResults=true&facet.limit=-1&q=(* AND *)&qf=all_text_stemming all_text&fq=facet_db_code:( "CN" )&fq=-SERIES_CODE:( "TEST" )&fl=SERIES_ID&sort=score desc,docId asc&bq=SERIES_STATUS:T^5&bq=KEY_SERIES_FLAG:1^5&bq=accuracy_name:0&bq=SERIES_STATUS:C^-30&rows=1&cursorMark=*
DocId does not change during the data update. While the data is being
updated in SolrCloud, the script returns an incorrect Number of requests
and Collected series.
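The script itself is not shown in the thread. A minimal Python sketch of how such a checker could compute the four figures, assuming a fetch function that wraps the /select request and returns the SERIES_ID values of one page plus nextCursorMark (the fetch interface, the function name collect_series and the fake pages are hypothetical):

```python
from typing import Callable, Iterable, Tuple

# A page fetcher takes a cursorMark and returns (series_ids, nextCursorMark).
# In a real script this would be an HTTP call to /select with
# cursorMark=<mark>; it is left abstract here.
Fetch = Callable[[str], Tuple[Iterable[int], str]]


def collect_series(fetch: Fetch, expected: int) -> dict:
    """Page through results with cursorMark and tally the checker's metrics."""
    cursor = "*"      # the first request always uses cursorMark=*
    collected = 0     # every id returned, repeats included
    unique = set()    # distinct ids seen so far
    requests = 0
    while True:
        ids, next_cursor = fetch(cursor)
        requests += 1
        ids = list(ids)
        collected += len(ids)
        unique.update(ids)
        # Solr signals the end by returning the same cursorMark that was sent.
        if next_cursor == cursor:
            break
        cursor = next_cursor
    return {
        "Expected series criteria": expected,
        "Collected series": collected,
        "Number of requests": requests,
        "Collected unique series": len(unique),
    }


# Fake pages for illustration: series 2 shows up on two pages.
pages = {"*": ([1, 2], "A"), "A": ([3, 2], "B"), "B": ([], "B")}
print(collect_series(lambda mark: pages[mark], expected=3))
```

With these fake pages the sketch reports 4 collected series, 3 requests and 3 unique series, the same "collected > unique" pattern as the runs during the update.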
Best,
Vlad
Mon, 28 Sep 2020 08:54:57 -0400, Erick Erickson
wrote:
Define “incorrect” please. Also, showing the exact query you use
would be helpful.
That said, paging with CursorMark while you are indexing data is not
guaranteed to find all documents. Consider a sort of date asc, id asc.
doc53 has a date of 2001 and you’ve already returned the doc.
Next, you update doc53 to 2020. It now appears later in the results due
to the changed data, so it is returned a second time. Or the other way:
doc53 starts with 2020, and while your cursorMark is in 2010, you change
doc53 to have a date of 2001. It will never be returned.
Similarly for anything else you change that’s relevant to the sort
criteria you’re using.
CursorMark doesn’t remember _documents_, just, well, call it the
fingerprint (i.e. sort criteria values) of the last document returned
so far.
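The doc53 example can be reproduced with a toy model in Python. This is not Solr code: the "index" is a plain dict of doc id to date, results are sorted by (date, id), and, like cursorMark, the cursor keeps only the sort values of the last document returned. The ids other than doc53 and the helper names are made up for illustration:

```python
def next_page(index, after, rows):
    """Return up to `rows` (date, id) keys that sort strictly after `after`."""
    ordered = sorted((date, doc_id) for doc_id, date in index.items())
    return [key for key in ordered if after is None or key > after][:rows]


def page_all(index, rows, update_after_first_page=None):
    """Page through `index`, optionally applying one update after page one."""
    after, returned, first = None, [], True
    while True:
        page = next_page(index, after, rows)
        if not page:
            return returned
        returned.extend(doc_id for _, doc_id in page)
        after = page[-1]  # the "fingerprint": sort values of the last doc
        if first and update_after_first_page:
            update_after_first_page(index)
            first = False


def set_date(doc_id, date):
    """Build an update that changes one document's sort value."""
    def apply(index):
        index[doc_id] = date
    return apply


# doc53 (2001) is returned on page one, then moved to 2020: it comes back.
case1 = {"doc10": 2001, "doc53": 2001, "doc70": 2010, "doc90": 2020}
print(page_all(case1, rows=2, update_after_first_page=set_date("doc53", 2020)))
# ['doc10', 'doc53', 'doc70', 'doc53', 'doc90']

# doc53 (2020) is moved to 2001 while the cursor sits in 2010: never returned.
case2 = {"doc10": 2001, "doc53": 2020, "doc70": 2010, "doc90": 2020}
print(page_all(case2, rows=2, update_after_first_page=set_date("doc53", 2001)))
# ['doc10', 'doc70', 'doc90']
```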
Best,
Erick
On Sep 28, 2020, at 3:32 AM, vmakov...@xbsoftware.by wrote:
Good afternoon,
Could you please suggest a solution: while data is being updated
in SolrCloud, requests with cursorMark return incorrect data. I
suspect that consecutive pages are not consistent with each other
during indexing because the data doesn't have enough time to be
replicated between the nodes.
Kind regards,
Vladislav Makovski
Developer
XB Software Ltd. | Minsk, Belarus
Site: https://xbsoftware.com
Skype: vlad__makovski
Cell: +37529 6484100