Thanks for responding, Erick. I set "start" to zero and "rows" to 100 on
every query. I create one CloudSolrClient instance and use it both to query
and to index, and I sleep for 5 secs between batches to allow for any auto
commits.

So query --> client.add(100 docs) --> wait --> query again
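
To be concrete, the loop looks roughly like this (a simplified sketch; the
ZooKeeper hosts, collection name, and the "badfield" field/query are
placeholders, not our real schema):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class Reindexer {
    public static void main(String[] args) throws Exception {
        CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181");
        client.setDefaultCollection("mycollection");

        // always start=0, rows=100; match only docs that still have the bad value
        SolrQuery q = new SolrQuery("badfield:*accidental*");
        q.setStart(0);
        q.setRows(100);
        q.setFields("id", "badfield");

        while (true) {
            QueryResponse rsp = client.query(q);
            List<SolrDocument> docs = rsp.getResults();
            if (docs.isEmpty()) {
                break;   // this is where the while loop exits on 0 results
            }

            List<SolrInputDocument> updates = new ArrayList<>();
            for (SolrDocument doc : docs) {
                SolrInputDocument update = new SolrInputDocument();
                update.setField("id", doc.getFieldValue("id"));
                String fixed = ((String) doc.getFieldValue("badfield")).replace("accidental", "");
                // atomic update with "set" so the other fields are left untouched
                update.setField("badfield", Collections.singletonMap("set", fixed));
                updates.add(update);
            }
            client.add(updates);
            Thread.sleep(5000);   // wait 5 secs for any auto commits
        }
        client.close();
    }
}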

But the weird thing I noticed was that after 8 or 9 batches, i.e. 800/900
docs, the "query again" returns zero docs, causing my while loop to
exit... so I was trying to see if I was doing the right thing, or if there
is an alternate way to do heavy indexing.
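
If the deep paging / cursorMark approach from the link below is the better
route, I was thinking of switching the paging to something like this
(untested sketch, same placeholder field names and the same client as in
the loop above):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrQuery.SortClause;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.CursorMarkParams;

SolrQuery q = new SolrQuery("badfield:*accidental*");
q.setRows(100);
q.setFields("id", "badfield");
q.setSort(SortClause.asc("id"));   // cursorMark requires a sort on the uniqueKey field

String cursorMark = CursorMarkParams.CURSOR_MARK_START;   // "*"
boolean done = false;
while (!done) {
    q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
    QueryResponse rsp = client.query(q);
    // build the atomic "set" updates and client.add(...) exactly as in the loop above
    String nextCursorMark = rsp.getNextCursorMark();
    if (cursorMark.equals(nextCursorMark)) {
        done = true;   // cursor did not advance, nothing left to fetch
    }
    cursorMark = nextCursorMark;
}
client.commit();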

Thanks

Ravi Kiran Bhaskar



On Friday, September 25, 2015, Erick Erickson <erickerick...@gmail.com>
wrote:

> How are you querying Solr? You say you query for 100 docs,
> update then get the next set. What are you using for a marker?
> If you're using the start parameter, and somehow a commit is
> creeping in, things might be weird, especially if you're using any
> of the internal Lucene doc IDs. If you're absolutely sure no commits
> are taking place even that should be OK.
>
> The "deep paging" stuff could be helpful here, see:
>
> https://lucidworks.com/blog/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/
>
> Best,
> Erick
>
> On Fri, Sep 25, 2015 at 3:13 PM, Ravi Solr <ravis...@gmail.com> wrote:
> > No problem Walter, it's all fun. Was just wondering if there was some
> > other good way that I did not know of, that's all 😀
> >
> > Thanks
> >
> > Ravi Kiran Bhaskar
> >
> > On Friday, September 25, 2015, Walter Underwood <wun...@wunderwood.org>
> > wrote:
> >
> >> Sorry, I did not mean to be rude. The original question did not say
> >> that you don’t have the docs outside of Solr. Some people jump to the
> >> advanced features and miss the simple ones.
> >>
> >> It might be faster to fetch all the docs from Solr and save them in
> >> files. Then modify them. Then reload all of them. No guarantee, but it
> >> is worth a try.
> >>
> >> Good luck.
> >>
> >> wunder
> >> Walter Underwood
> >> wun...@wunderwood.org
> >> http://observer.wunderwood.org/  (my blog)
> >>
> >>
> >> > On Sep 25, 2015, at 2:59 PM, Ravi Solr <ravis...@gmail.com> wrote:
> >> >
> >> > Walter, not in a mood for banter right now.... It's 6:00pm on a
> >> > Friday and I am stuck here trying to figure out reindexing issues :-)
> >> > I don't have the source of the docs, so I have to query Solr, modify,
> >> > and put it back, and that is turning out to be quite a task in 5.3.0.
> >> > I did reindex several times with 4.7.2 in a master/slave env without
> >> > any issue. Since then we have moved to cloud and it has been a pain
> >> > all day.
> >> >
> >> > Thanks
> >> >
> >> > Ravi Kiran Bhaskar
> >> >
> >> > On Fri, Sep 25, 2015 at 5:25 PM, Walter Underwood <wun...@wunderwood.org>
> >> > wrote:
> >> >
> >> >> Sure.
> >> >>
> >> >> 1. Delete all the docs (no commit).
> >> >> 2. Add all the docs (no commit).
> >> >> 3. Commit.
> >> >>
> >> >> wunder
> >> >> Walter Underwood
> >> >> wun...@wunderwood.org
> >> >> http://observer.wunderwood.org/  (my blog)
> >> >>
> >> >>
> >> >>> On Sep 25, 2015, at 2:17 PM, Ravi Solr <ravis...@gmail.com> wrote:
> >> >>>
> >> >>> I have been trying to re-index the docs (about 1.5 million) as one
> >> >>> of the fields needed part of its string value removed (accidentally
> >> >>> introduced). I was issuing a query for 100 docs, getting 4 fields,
> >> >>> and updating the docs (atomic update with "set") via the
> >> >>> CloudSolrClient in batches. However, from time to time the query
> >> >>> returns 0 results, which exits the re-indexing program.
> >> >>>
> >> >>> I can't understand why the cloud returns 0 results when there are
> >> >>> about 1.4 million docs which have the "accidental" string in them.
> >> >>>
> >> >>> Is there another way to do bulk massive updates?
> >> >>>
> >> >>> Thanks
> >> >>>
> >> >>> Ravi Kiran Bhaskar
> >> >>
> >> >>
> >>
> >>
>
