Hello

Apologies for the lack of actual detail in this; we're still digging into
it ourselves. I will provide more detail, and maybe some logs, once I have
a better idea of what is actually happening.
But I thought I might as well ask if anyone knows of changes that were made
in the Solr 8.3 release that are likely to have caused an issue like this?

We were on Solr 8.1.1 for several months and moved to 8.2.0 for about 2
weeks before moving to 8.3.0 last week.
We didn't see this issue at all on the previous releases. Since moving to
8.3 we have had a consistent (but non-deterministic) set of failing tests,
on Windows and Linux.

The issue we are seeing is that during updates, the data we have sent is
*sometimes* corrupted, as though a buffer has been used incorrectly. For
example, if the well-formed data sent was
*'fieldName':"this is a long string"*
The error we see from Solr might be that
unknown field * 'fieldNamis a long string" *
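
To illustrate the kind of buffer misuse this looks like, here is a minimal
sketch in plain Java. It is not Solr code and makes no claim about where the
real problem is; the class name, method name and second payload are made up
for illustration. It only shows how a shared buffer that is overwritten
before a pending read completes yields a string stitched together from two
payloads.

```java
import java.nio.charset.StandardCharsets;

public class BufferReuseSketch {
    // Hypothetical illustration only: nothing here is taken from Solr.
    static String corrupt(String first, String second) {
        byte[] a = first.getBytes(StandardCharsets.UTF_8);
        byte[] b = second.getBytes(StandardCharsets.UTF_8);
        byte[] shared = new byte[Math.max(a.length, b.length)];

        // The first payload is written; a reader still expects a.length bytes.
        System.arraycopy(a, 0, shared, 0, a.length);
        int pendingLength = a.length;

        // The buffer is reused for a second payload before the first is read.
        System.arraycopy(b, 0, shared, 0, b.length);

        // The late reader sees the start of the second payload followed by
        // whatever is left of the first one.
        return new String(shared, 0, pendingLength, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(
            corrupt("'fieldName':\"this is a long string\"", "'id':\"abcd\""));
    }
}
```

The result is neither payload: it starts with the second one and ends with
the tail of the first, which is the same flavour of mangling as the unknown
field names above.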

And variations of that kind of behaviour, where part of the data is missing
or corrupted. The data we are indexing does include fields which store
(escaped) serialized JSON strings, if that might have any bearing, but
the error isn't always on those fields.
For example, given a valid document that looks like this when returned with
the JSON response writer (I've replaced the values by hand, so if the JSON
is messed up here, that's not relevant):

{
    "id": "abcd",
    "testField": "blah",
    "jsonField": "{\"thing\":{\"abcd\":\"value\",\"xyz\":[\"abc\",\"def\",\"ghi\"],\"nnn\":\"xyz\"},\"stuff\":[{\"qqq\":\"rrr\"}],\"ttt\":0,\"mmm\":\"Some string\",\"someBool\":true}"
}

We've had errors during indexing like:
*unknown field
'testField:"value","xyz":["abc","def","ghi"],"nnn":"xyz"},"stuff":[{"qqq":"rrr"}],"ttt":0,"mmm":"Some
string","someBool":true}���������������������������'*
(those � unprintable characters are part of it)

So far we've not been able to reproduce the problem on a collection with a
single shard, so it does seem like the problem is only happening internally
when updates are distributed to the other shards... But that's not been
totally verified.

We've also only encountered the problem on one of the collections we build.
The data within each collection is generally the same; the ids are slightly
different, but still strings. The main difference is that the problematic
index is built by passing an Iterator<SolrInputDocument> to SolrJ's
*org.apache.solr.client.solrj.SolrClient.add(String,
Iterator<SolrInputDocument>)* (the *SolrInputDocument*s are not being
reused in the client, I checked that), while the other index is built by
streaming CSVs to Solr.


We will look into it further, but if anyone has any ideas of what might
have changed in 8.3 from 8.1 / 8.2 that could cause this, that would be
helpful.

Cheers
Colvin
