Re: Reindex of document leaves old fields behind

2015-05-22 Thread tuxedomoon
This is fixed. My SolrJ client was putting a JSON object into a multivalued field in the SolrInputDocument. Solr returned a 0 status code but did not add the bad object; instead, it performed what looks like an atomic update, as described above. Once I removed the illegal JSON object from the SolrI
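The failure mode described here (an object-valued field making an add behave like an atomic update) can be caught before a document is sent. What follows is only a minimal sketch in Python, assuming the document is held as a plain dict. The function name and the sample document are hypothetical; in SolrJ the analogous check would look for `java.util.Map` values among the fields of the `SolrInputDocument`.

```python
from typing import Any, Dict, List


def find_object_valued_fields(doc: Dict[str, Any]) -> List[str]:
    """Return the names of fields whose value is, or contains, a dict."""
    suspicious = []
    for name, value in doc.items():
        # Treat single values and multivalued fields uniformly.
        values = value if isinstance(value, (list, tuple)) else [value]
        if any(isinstance(v, dict) for v in values):
            suspicious.append(name)
    return suspicious


# Hypothetical document: "tags" accidentally carries a nested JSON object.
doc = {
    "id": "1234",
    "title": "example",
    "tags": ["a", {"nested": "object"}],
}
bad_fields = find_object_valued_fields(doc)
print(bad_fields)
```

Rejecting or flattening any field reported by such a check keeps the request a plain add, provided an atomic update was not intended.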

Re: Reindex of document leaves old fields behind

2015-05-21 Thread tuxedomoon
I'm posting the fields from one of my problem documents, based on this comment I found from Shawn on Grokbase.
>> If you are trying to use a Map object as the value of a field, that is
>> probably why it is interpreting your add request as an atomic update.
>> If this is the case, and you're doin

Re: Reindex of document leaves old fields behind

2015-05-21 Thread tuxedomoon
A few further clues to this unresolved problem:
1. I found one of my 5 ZooKeeper instances was down.
2. I tried another reindex of a bad document, but there was no change on the Solr side.
3. I deleted and reindexed the same doc, and that worked (obviously, but at this point I don't know what to expect).

Re: Reindex of document leaves old fields behind

2015-05-21 Thread tuxedomoon
I'm relying on an autocommit of 60 secs. I just ran the same test via my SolrJ client and the result was the same: the SolrCloud query always returns the correct number of fields. Is there a way to find out which shard and replica a particular document lives on?
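One way to approach the shard question is Solr's `[shard]` document transformer, which, when added to the `fl` parameter, reports the shard a document was returned from. The sketch below only builds such a query URL. The base URL, the collection name, and the uniqueKey field name `id` are assumptions, and a document ID containing query-syntax characters would need escaping first.

```python
from urllib.parse import urlencode


def shard_lookup_url(collection_url: str, unique_key: str, doc_id: str) -> str:
    """Build a select URL that asks Solr to report the shard for one document."""
    params = {
        "q": f"{unique_key}:{doc_id}",
        # [shard] is a document transformer that adds shard info to each hit.
        "fl": f"{unique_key},[shard]",
        "wt": "json",
    }
    return f"{collection_url}/select?{urlencode(params)}"


# Hypothetical host and collection name.
url = shard_lookup_url("http://localhost:8983/solr/collection1", "id", "1234")
print(url)
```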

Re: Reindex of document leaves old fields behind

2015-05-21 Thread Erick Erickson
My guess is that you're not committing from your SolrJ program. That's automatic when you post.
Best,
Erick
On Thu, May 21, 2015 at 10:13 AM, tuxedomoon wrote:
> OK it is composite
>
> I've just used post.sh to index a test doc with 3 fields to leader 1 of my
> SolrCloud. I then reindexed it wi

Re: Reindex of document leaves old fields behind

2015-05-21 Thread tuxedomoon
OK, it is composite. I've just used post.sh to index a test doc with 3 fields to leader 1 of my SolrCloud. I then reindexed it with 1 field removed, and the query on it shows 2 fields. I repeated this a few times and always get the correct field count from Solr. I'm now wondering if SolrJ is so

Re: Reindex of document leaves old fields behind

2015-05-21 Thread Shawn Heisey
On 5/21/2015 9:54 AM, tuxedomoon wrote:
> I'm doing all my index to leader 1 and have not specified any router
> configuration. But there is an equal distribution of 240M docs across 5
> shards. I think I've been stating I have 3 shards in these posts, I have 5,
> sorry.
>
> How do I know what k

Re: Reindex of document leaves old fields behind

2015-05-21 Thread tuxedomoon
I'm doing all my indexing to leader 1 and have not specified any router configuration. But there is an equal distribution of 240M docs across 5 shards. I think I've been stating I have 3 shards in these posts; I have 5, sorry. How do I know what kind of routing I am using?

Re: Reindex of document leaves old fields behind

2015-05-21 Thread Shawn Heisey
On 5/21/2015 9:02 AM, tuxedomoon wrote:
>>> If it is "implicit" then
>>> you may have indexed the new document to a different shard, which means
>>> that it is now in your index more than once, and which one gets returned
>>> may not be predictable.
>
> If a document with uniqueKey "1234" is ass

Re: Reindex of document leaves old fields behind

2015-05-21 Thread tuxedomoon
>> If it is "implicit" then
>> you may have indexed the new document to a different shard, which means
>> that it is now in your index more than once, and which one gets returned
>> may not be predictable.
If a document with uniqueKey "1234" is assigned to a shard by SolrCloud, implicit routing w
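The routing distinction in the quoted text can be illustrated with a toy model. This is only a sketch under stated assumptions: `zlib.crc32` with a modulo stands in for Solr's real hash-based routing (which uses MurmurHash3 and per-shard hash ranges), and both function names are hypothetical. With hash-based routing the shard is a pure function of the uniqueKey, so a reindex lands where the old copy lives. With implicit routing the document stays on whichever shard receives it.

```python
import zlib


def hash_route(unique_key: str, num_shards: int) -> int:
    """Pick a shard purely from the key (stand-in for hash-based routing)."""
    return zlib.crc32(unique_key.encode("utf-8")) % num_shards


def implicit_route(receiving_shard: int) -> int:
    """Implicit routing: the document stays on the shard that received it."""
    return receiving_shard


NUM_SHARDS = 5

# Hash-based: indexing key "1234" twice always targets the same shard,
# so the second add can replace the first.
first = hash_route("1234", NUM_SHARDS)
second = hash_route("1234", NUM_SHARDS)
print(first == second)

# Implicit: sending the same key to shard 0 and later to shard 3
# leaves one copy on each shard.
copies = {implicit_route(0), implicit_route(3)}
print(len(copies))
```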

Re: Reindex of document leaves old fields behind

2015-05-21 Thread tuxedomoon
>> let's see the code.
Simplified code and some comments:
1. solrUrl points at leader 1 of 3 leaders, each with a replica
2. createSolrDoc takes a full Mongo doc and returns a valid SolrInputDocument
3. I have done dumps of the returned solrDoc and verified it does not have the unwanted fiel

Re: Reindex of document leaves old fields behind

2015-05-20 Thread Erick Erickson
Well, let's see the code. Standard updates should replace the previous docs; reindexing the same unique ID with fewer fields should show fewer fields. So something's weird here. Just for yucks, though, do issue a query on some of the unique ids in question. I'd be curious if you get more than on
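The expected behaviour described here, and the symptom reported in this thread, can be contrasted with a toy in-memory model. This is a sketch only: a dict keyed by unique ID stands in for the index, and `full_add` and `atomic_set` are hypothetical names, not Solr APIs.

```python
from typing import Any, Dict

Index = Dict[str, Dict[str, Any]]


def full_add(index: Index, doc: Dict[str, Any]) -> None:
    """Standard add: the new document replaces any existing one with the same id."""
    index[doc["id"]] = dict(doc)


def atomic_set(index: Index, doc: Dict[str, Any]) -> None:
    """Atomic-style update: only the listed fields change; the others persist."""
    existing = index.setdefault(doc["id"], {"id": doc["id"]})
    existing.update(doc)


# Reindex with field "c" removed, as a standard add.
replaced: Index = {}
full_add(replaced, {"id": "1234", "a": 1, "b": 2, "c": 3})
full_add(replaced, {"id": "1234", "a": 1, "b": 2})

# Same reindex, but handled as an atomic-style update.
merged: Index = {}
full_add(merged, {"id": "1234", "a": 1, "b": 2, "c": 3})
atomic_set(merged, {"id": "1234", "a": 1, "b": 2})

print(sorted(replaced["1234"]))
print(sorted(merged["1234"]))
```

In this model the standard add drops field `c`, while the atomic-style path keeps it, which matches the "new doc merged with an old doc" symptom.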

Re: Reindex of document leaves old fields behind

2015-05-20 Thread tuxedomoon
The uniqueKey value is the same. The new documents contain fewer fields than the already indexed ones. Could this cause the updates to be treated as atomic? With the persisting fields treated as un-updated? Routing should be implicit since the collection was created using numShards. Many req

Re: Reindex of document leaves old fields behind

2015-05-20 Thread Shawn Heisey
On 5/20/2015 4:43 PM, tuxedomoon wrote:
> I'm reindexing Mongo docs into SolrCloud. The new docs have had a few fields
> removed so upon reindexing those fields should be gone in Solr. They are
> not. So the result is a new doc merged with an old doc rather than a
> replacement which is what I n