What do you mean "indexed for the next shard"? With EmbeddedSolrServer,
you will have to map the documents into the proper shard (probably a
separate directory?), keep very careful track of which shard each
sub-index belongs to, and then move them correctly to the proper "real"
Solr server and... Simply indexing a bunch of documents to a sub-index
and then copying that out to a shard is guaranteed not to give
consistent results.

IOW, you'll have a _lot_ of work to do to make this right, requiring a deep
understanding of all the cloud document routing. Given that, I'm not even
going to speculate on exactly what went wrong; I wouldn't even know where
to start.
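
To give a rough picture of what "cloud document routing" means here: each
shard owns a contiguous slice of a 32-bit hash range, and a document
belongs to whichever shard's range covers the hash of its uniqueKey. Here
is a toy Java sketch of the idea only (CRC32 is just a stand-in; the real
compositeId router uses MurmurHash3 over the id, so don't use this to
compute actual shard assignments):

// Toy illustration of hash-range routing, NOT Solr's actual router.
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class RoutingSketch {

    // Stand-in hash; SolrCloud's compositeId router uses MurmurHash3.
    static long hash32(String id) {
        CRC32 crc = new CRC32();
        crc.update(id.getBytes(StandardCharsets.UTF_8));
        return crc.getValue(); // 0 .. 2^32-1
    }

    // Map the hash into one of numShards equal ranges (shard1, shard2, ...).
    static String shardFor(String id, int numShards) {
        long bucketSize = (1L << 32) / numShards;
        int slot = (int) Math.min(numShards - 1, hash32(id) / bucketSize);
        return "shard" + (slot + 1);
    }

    public static void main(String[] args) {
        // The shard a document belongs to is a function of its id, not of
        // whichever directory you happened to build the sub-index in.
        for (String id : new String[] {"123", "456", "789"}) {
            System.out.println(id + " -> " + shardFor(id, 3));
        }
    }
}

The point being: copying a sub-index wholesale into a shard ignores that
mapping entirely, so the documents end up on shards that don't "own" them.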

And there's no point at all to doing this. See the MapReduceIndexerTool in the
contrib area and just go with that.

bq: I have read somewhere that the consistency of the cloud is broken if
different shards hold a value for the same UniqueID field.

Yes, if the same document is on different shards, it'll be considered two
separate documents.
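
For what it's worth, if you index into the live cluster through SolrJ's
CloudSolrServer (pointed at ZooKeeper), the routing is handled for you and
a given id always lands on the same shard, so that situation can't arise.
A bare-bones sketch; the zkHost string, collection name and field names
below are placeholders for your setup:

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CloudIndexSketch {
    public static void main(String[] args) throws Exception {
        // Reads the cluster state from ZooKeeper and routes each document
        // to the shard that owns its uniqueKey.
        CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        server.setDefaultCollection("collection1");

        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < 1000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", Integer.toString(i));
            doc.addField("name", "doc " + i);
            batch.add(doc);
            if (batch.size() == 500) { // send in batches, not one doc at a time
                server.add(batch);
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            server.add(batch);
        }
        server.commit(); // in production, prefer autoCommit to explicit commits
        server.shutdown();
    }
}

It may not match MapReduceIndexerTool for raw bulk throughput, but it does
keep the routing consistent, which is the part that's biting you here.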



On Mon, Jun 16, 2014 at 1:06 AM, Vineet Mishra <clearmido...@gmail.com> wrote:
> Hi Erick,
>
> Thanks for your response; well, I got it resolved. I think the indexes were
> not properly distributed, and moreover I saw some uneven behavior while
> indexing, so to elaborate:
>
> I had three shards in my collection. I started indexing with
> EmbeddedSolrServer and indexed around 50 million documents (15 GB index
> size without replication). Thereafter I indexed another 50 million to a
> different directory for the next shard, but when I checked the stats of
> the indexing the next day (it had probably been running for 15 hrs or so)
> it was still running and the index size had grown to 60 GB (I didn't
> understand why such a huge disk allocation had taken place for the same
> amount of data, 50 million, that I had indexed previously). Eventually I
> stopped the process, since I wasn't getting any better updates, and
> copied the indexes over to the next shard.
>
> #When I queried later with *:* I got a response of 69 million
> documents (it was supposed to be 100 million).
>
> ##I am not sure where the other 30 million went, but the problem started
> once I indexed the remaining 30 million (the ones not showing up in
> queries at #) to the next shard again.
>
> I have read somewhere that the consistency of the cloud is broken if
> different shards hold a value for the same UniqueID field.
>
> With this, I have a few things to clarify:
> *Was the inconsistent behavior because of the step I took at ## ?
> *If the inconsistency was because of ##, then why weren't all 100 million
> documents present after # ?
> *When the same set of data previously took just 15 GB to index, why did
> the index size for the next 50 million grow to 60 GB?
> *For indexing huge amounts of data into SolrCloud in reasonable time,
> what approach should be taken, if EmbeddedSolrServer is not a better
> choice?
>
> Looking forward to a response.
>
> Thanks!
>
>
> On Sat, Jun 14, 2014 at 12:31 AM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> It seems like for some reason you have shards that are not reachable.
>> What does your cloud stat in the admin UI tell you when you don't get
>> all the docs back?
>>
>> Best,
>> Erick
>>
>> On Fri, Jun 13, 2014 at 1:37 AM, Vineet Mishra <clearmido...@gmail.com>
>> wrote:
>> > Hi All,
>> >
>> > I have a cloud setup with 3 shards and 2 replicas running on 3
>> > Tomcats with 3 external ZooKeepers, all running on a single machine.
>> > I have indexed around 70 million documents, which seem to be querying
>> > back fine. When I index another 30 million to the same, the results
>> > are inconsistent: with the query *:* it sometimes returns results
>> > from 2 shards and sometimes from all the shards.
>> > So to make it clear, if I query the 100 million index with *:* it
>> > should return 100 million docs, but sometimes it returns 70 million
>> > and sometimes 100 million (the actual result) with the same query.
>> >
>> > This is not just the case with the *:* query; even if I query by id,
>> >
>> > q=id:123
>> >
>> > it sometimes comes back with the result and sometimes not.
>> >
>> > Looking for a possible solution.
>> >
>> > Thanks!
>>
