Re: bulk reindexing 5.3.0 issue

2015-09-28 Thread Ravi Solr
Gili, I was constantly checking the cloud admin UI and it always stayed green, which is why I initially overlooked sync issues. Finally, when all options dried up, I went to each node individually and queried it, and that is when I found the out-of-sync issue. The way I resolved my issue was shut down t

Re: bulk reindexing 5.3.0 issue

2015-09-28 Thread Gili Nachum
Were all of the shard replicas in active state (green color in the admin UI) before starting? Sounds like it; otherwise you wouldn't hit the replica that is out of sync. Replicas can get out of sync, and report being in sync, after a sequence of stops and starts w/o a chance to complete sync. See if it might have hap

Re: bulk reindexing 5.3.0 issue

2015-09-26 Thread Ravi Solr
Erick... There is only one type of String "sun.org.mozilla.javascript.internal.NativeString:" and no other variations of that in my index, so no question of missing it. Point taken regarding the cursorMark stuff; yes, you are correct. My head is so numb at this point after working 3 days on this, I wasn

Re: bulk reindexing 5.3.0 issue

2015-09-26 Thread Erick Erickson
bq: 3. Erick, I wasnt getting all 1.4 mill in one shot. I was initially using 100 docs batch, which, I later increased to 500 docs per batch. Also it would not be a infinite loop if I commit for each batch, right !!??
That's not the point at all. Look at the basic logic here: You run for a while

Re: bulk reindexing 5.3.0 issue

2015-09-26 Thread Ravi Solr
Erick & Shawn, I incorporated your suggestions. 0. Shut off all other indexing processes. 1. As Shawn mentioned, set batch size to 1. 2. Loved Erick's suggestion about not using a filter at all, sorting by uniqueId, and putting the last known uniqueId as the next query's start while still using cursor marks as
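The "sort by uniqueId and remember the last known uniqueId" idea above can be sketched without Solr at all. The following is only an illustration under stated assumptions: the `TreeMap` is a hypothetical stand-in for an index sorted by unique id, `nextBatch` stands in for a sorted query restricted to ids greater than the last one seen, and the class and method names are invented for this example. It is not the code from the thread and it does not use SolrJ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class HighWaterMarkPaging {
    // The unwanted string quoted in the thread.
    static final String BAD = "sun.org.mozilla.javascript.internal.NativeString:";

    // Hypothetical stand-in for a query sorted by unique id:
    // returns up to 'rows' ids strictly greater than lastId.
    static List<String> nextBatch(TreeMap<String, String> index, String lastId, int rows) {
        List<String> ids = new ArrayList<>();
        for (String id : index.tailMap(lastId, false).keySet()) {
            if (ids.size() == rows) break;
            ids.add(id);
        }
        return ids;
    }

    // Walks the whole index in id order, fixing each value, and returns
    // how many docs were visited.
    static int fixAll(TreeMap<String, String> index, int rows) {
        String lastId = ""; // the empty string sorts before every non-empty id
        int visited = 0;
        while (true) {
            List<String> batch = nextBatch(index, lastId, rows);
            if (batch.isEmpty()) break; // nothing beyond the high-water mark
            for (String id : batch) {
                // Updating the value does not change the sort key (the id).
                index.put(id, index.get(id).replace(BAD, ""));
                visited++;
            }
            lastId = batch.get(batch.size() - 1); // advance the high-water mark
        }
        return visited;
    }

    public static void main(String[] args) {
        TreeMap<String, String> index = new TreeMap<>();
        for (int i = 0; i < 7; i++) {
            index.put("doc" + i, BAD + "uuid-" + i);
        }
        System.out.println("visited: " + fixAll(index, 3));
        System.out.println(index);
    }
}
```

Because the position in the result set is tracked by id rather than by offset, updating a doc's other fields between batches does not shift which docs come next.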

Re: bulk reindexing 5.3.0 issue

2015-09-26 Thread Ravi Solr
Thank you, Erick & Shawn, for taking significant time off your weekends to debug and explain in great detail. I will try to address the main points from your emails to provide more context for a better understanding of my situation. 1. Erick, as part of our upgrade from 4.7.2 to 5.3.0 I re-in

Re: bulk reindexing 5.3.0 issue

2015-09-26 Thread Erick Erickson
Oh, one more thing. _assuming_ you can't change the indexing process that gets the docs from the system of record, why not just add an update processor that does this at index time? See: https://cwiki.apache.org/confluence/display/solr/Update+Request+Processors, in particular the StatelessScriptUpd

Re: bulk reindexing 5.3.0 issue

2015-09-26 Thread Shawn Heisey
On 9/26/2015 10:41 AM, Shawn Heisey wrote: > 30 This needs to include openSearcher=false, as Erick mentioned. I'm sorry I screwed that up: 30 false Thanks, Shawn

Re: bulk reindexing 5.3.0 issue

2015-09-26 Thread Erick Erickson
Well, let's forget the cursormark stuff for a bit. There's no reason you should be getting all 1.4 million rows. Presumably you've been running this program occasionally and blanking strings like "sun.org.mozilla.javascript.internal.NativeString:" in the uuid field. Then you turn around and run th
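For reference, the string correction itself is small. A minimal sketch, assuming the unwanted text is exactly the literal quoted above and that it should simply be removed wherever it occurs in the uuid value; the class and method names are hypothetical and not taken from the thread's code:

```java
public class UuidCleaner {
    // The unwanted string quoted in the thread.
    static final String BAD = "sun.org.mozilla.javascript.internal.NativeString:";

    // Removes every occurrence of the unwanted string; values that do not
    // contain it come back unchanged, so re-running is harmless.
    static String clean(String uuid) {
        if (uuid == null) {
            return null;
        }
        return uuid.replace(BAD, "");
    }

    public static void main(String[] args) {
        System.out.println(clean(BAD + "abc-123"));
        System.out.println(clean("abc-123"));
    }
}
```

Since already-clean values pass through untouched, a doc that was fixed in an earlier run no longer contains the bad string and would not need to be fetched again.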

Re: bulk reindexing 5.3.0 issue

2015-09-26 Thread Shawn Heisey
On 9/25/2015 10:10 PM, Ravi Solr wrote: > thank you for taking time to help me out. Yes I was not using cursorMark, I > will try that next. This is what I was doing, its a bit shabby coding but > what can I say my brain was fried :-) FYI this is a side process just to > correct a messed up string.

Re: bulk reindexing 5.3.0 issue

2015-09-25 Thread Ravi Solr
Erick, I fixed the "missing content stream" issue as well, by making sure I am not adding an empty list. However, my very first issue of getting zero docs once in a while is still haunting me, even after using cursorMarks, disabling auto commit and soft commit. I ran the code two times and you can see the

Re: bulk reindexing 5.3.0 issue

2015-09-25 Thread Ravi Solr
Erick, as per your advice I used cursorMarks (see code below). It was slightly better, but Solr throws exceptions randomly. Please look at the code and stack trace below.
2015-09-26 01:00:45 INFO [a.b.c.AdhocCorrectUUID] - Indexed 500/1453133
2015-09-26 01:00:49 INFO [a.b.c.AdhocCorrectUUID] - Index

Re: bulk reindexing 5.3.0 issue

2015-09-25 Thread Ravi Solr
Thank you for taking the time to help me out. Yes, I was not using cursorMark; I will try that next. This is what I was doing; it's a bit shabby coding, but what can I say, my brain was fried :-) FYI, this is a side process just to correct a messed-up string. The actual indexing process was working all the

Re: bulk reindexing 5.3.0 issue

2015-09-25 Thread Erick Erickson
Wait, query again how? You've got to have something that keeps you from getting the same 100 docs back so you have to be sorting somehow. Or you have a high water mark. Or something. Waiting 5 seconds for any commit also doesn't really make sense to me. I mean how do you know 1> that you're going

Re: bulk reindexing 5.3.0 issue

2015-09-25 Thread Ravi Solr
Thanks for responding, Erick. I always set "start" to zero and "rows" to 100. I create a CloudSolrClient instance and use it both to query and to index. But I do sleep for 5 secs just to allow for any auto commits. So: query --> client.add(100 docs) --> wait --> query again. But the weird thin

Re: bulk reindexing 5.3.0 issue

2015-09-25 Thread Erick Erickson
How are you querying Solr? You say you query for 100 docs, update, then get the next set. What are you using for a marker? If you're using the start parameter, and somehow a commit is creeping in, things might be weird, especially if you're using any of the internal Lucene doc IDs. If you're absolute
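One way offset paging can go "weird" when changes become visible between pages is sketched below. This is a toy model, not Solr: it assumes the query only matches docs that still need fixing (a filter of that kind is mentioned elsewhere in the thread) and that each fix is visible to the very next query. The `page` method is a hypothetical stand-in for a query with `start` and `rows`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OffsetPagingPitfall {
    // Hypothetical stand-in for "filter on bad docs, start, rows":
    // returns positions of up to 'rows' still-bad docs after skipping 'start' of them.
    static List<Integer> page(boolean[] bad, int start, int rows) {
        List<Integer> hits = new ArrayList<>();
        int seen = 0;
        for (int i = 0; i < bad.length; i++) {
            if (!bad[i]) continue;           // fixed docs no longer match
            if (seen++ < start) continue;    // skip the first 'start' matches
            if (hits.size() == rows) break;
            hits.add(i);
        }
        return hits;
    }

    // Fixes docs page by page and returns how many are still bad at the end.
    static int remainingAfterRun(int total, int rows, boolean advanceStart) {
        boolean[] bad = new boolean[total];
        Arrays.fill(bad, true);
        int start = 0;
        while (true) {
            List<Integer> hits = page(bad, start, rows);
            if (hits.isEmpty()) break;
            for (int i : hits) bad[i] = false; // fix is visible to the next query
            if (advanceStart) start += rows;
        }
        int remaining = 0;
        for (boolean b : bad) if (b) remaining++;
        return remaining;
    }

    public static void main(String[] args) {
        System.out.println("advancing start, still bad: " + remainingAfterRun(10, 2, true));
        System.out.println("start fixed at 0, still bad: " + remainingAfterRun(10, 2, false));
    }
}
```

In this model, advancing `start` while the matching set shrinks skips docs, whereas keeping `start` at zero reaches every doc only because each fix is visible before the next query. If the fixes were not yet visible, the start-at-zero variant would keep returning the same docs.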

Re: bulk reindexing 5.3.0 issue

2015-09-25 Thread Ravi Solr
No problem Walter, it's all fun. Was just wondering if there was some other good way that I did not know of, that's all 😀 Thanks Ravi Kiran Bhaskar On Friday, September 25, 2015, Walter Underwood wrote: > Sorry, I did not mean to be rude. The original question did not say that > you don’t have

Re: bulk reindexing 5.3.0 issue

2015-09-25 Thread Walter Underwood
Sorry, I did not mean to be rude. The original question did not say that you don’t have the docs outside of Solr. Some people jump to the advanced features and miss the simple ones. It might be faster to fetch all the docs from Solr and save them in files. Then modify them. Then reload all of t

Re: bulk reindexing 5.3.0 issue

2015-09-25 Thread Ravi Solr
Walter, not in a mood for banter right now. It's 6:00pm on a Friday and I am stuck here trying to figure out reindexing issues :-) I don't have the source of the docs, so I have to query Solr, modify, and put it back, and that is proving to be quite a task in 5.3.0. I did reindex several times with 4.7.2 in a

Re: bulk reindexing 5.3.0 issue

2015-09-25 Thread Walter Underwood
Sure. 1. Delete all the docs (no commit). 2. Add all the docs (no commit). 3. Commit. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Sep 25, 2015, at 2:17 PM, Ravi Solr wrote: > > I have been trying to re-index the docs (about 1.5 million) as one

bulk reindexing 5.3.0 issue

2015-09-25 Thread Ravi Solr
I have been trying to re-index the docs (about 1.5 million) as one of the fields needed part of its string value removed (it was accidentally introduced). I was issuing a query for 100 docs, getting 4 fields, and updating the docs (atomic update with "set") via the CloudSolrClient in batches. However, from time t