As you guessed, I'm trying to build a non-HDFS collection from index files
in HDFS (built by MRIT, the MapReduceIndexerTool). To give you the overall
picture, here is my workflow (sorry if it is long):

I have collection-A serving an index, and I'm replacing it with another
collection, collection-B. I use an alias to switch over to collection-B,
and I have done this before. The problem I'm having now is bringing up
collection-B from the MRIT indices. Specifically, copying the index files
is where I'm stuck.
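For reference, the alias switch itself is a single Collections API call. A minimal sketch that builds the request URL; the base URL and the alias name `live` are placeholders, not anything from our setup. Re-issuing CREATEALIAS for an alias that already exists repoints it, which is how the alias moves from collection-A to collection-B.

```python
from urllib.parse import urlencode


def create_alias_url(base_url: str, alias: str, collection: str) -> str:
    """Build a Collections API CREATEALIAS request.

    If the alias already exists, this request repoints it to `collection`.
    """
    params = {"action": "CREATEALIAS", "name": alias, "collections": collection}
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


# Hypothetical base URL and alias name, for illustration only.
print(create_alias_url("http://localhost:8983/solr", "live", "collection-B"))
```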

Previously (on Solr 4.4.0; yes, very outdated, we were on CDH4), I created
collection-B on the cluster with no data. I had a request handler that took
an HDFS location and copied the index from HDFS into the appropriate local
core data directory, replacing the original content (which was essentially
just a segments file for an empty index).
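The copy step of that old handler amounts to "empty the core's index directory, then drop the prebuilt segment files in". Here is a filesystem-only sketch of that step. It assumes the shard's MRIT output has already been pulled from HDFS into a local staging directory (for example with `hdfs dfs -get`) and that nothing is writing to the target index directory; `replace_index` is a hypothetical helper, not our actual handler code.

```python
import shutil
from pathlib import Path


def replace_index(staging_dir: Path, index_dir: Path) -> list[str]:
    """Replace everything in index_dir with the files from staging_dir.

    Assumes staging_dir already holds the MRIT segment files for one shard
    (pulled down from HDFS beforehand) and that the core is not writing.
    Returns the sorted file names now present in index_dir.
    """
    index_dir.mkdir(parents=True, exist_ok=True)
    # Remove the original content (the empty index's segments file, etc.).
    for entry in index_dir.iterdir():
        if entry.is_dir():
            shutil.rmtree(entry)
        else:
            entry.unlink()
    # Copy the prebuilt segment files into place.
    for src in staging_dir.iterdir():
        if src.is_file():
            shutil.copy2(src, index_dir / src.name)
    return sorted(p.name for p in index_dir.iterdir())
```

On 4.4.0 this plus a core reload was enough; on 4.10.3 the reloaded core deletes these files, as described next.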

After that, once I reloaded the core(s), they picked up the new index,
collection-B was ready, and I could switch the alias. All good.
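The reload is a CoreAdmin RELOAD per core. A minimal URL-building sketch; the base URL and core name are hypothetical placeholders.

```python
from urllib.parse import urlencode


def reload_core_url(base_url: str, core: str) -> str:
    """Build a CoreAdmin RELOAD request for a single core."""
    params = {"action": "RELOAD", "core": core}
    return f"{base_url.rstrip('/')}/admin/cores?{urlencode(params)}"


# Hypothetical core name; one request per core in collection-B.
print(reload_core_url("http://localhost:8983/solr", "collection-B_shard1_replica1"))
```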

With the newer version of Solr (I'm trying 4.10.3), replacing the contents
of the index directory for the cores in collection-B no longer works. When
I reload a core, the IndexWriter's in-memory segment info has no references
to the new files I just placed in the core's index directory, so it deletes
every file it doesn't know about (I guess to prevent index corruption).
Once the core comes back up, it is empty again. So copy-and-reload doesn't
work anymore.

To work around this, I create a new core per shard with a data directory
that I supply, pre-populated with the index copied from HDFS. The new core
loads that index happily. I do this for every shard.
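Creating such a core goes through CoreAdmin CREATE, which accepts `collection`, `shard`, and `dataDir` so the core registers as a replica of the right shard and opens the pre-populated directory (the index itself typically lives under `dataDir/index`). A minimal sketch; all names and paths below are hypothetical.

```python
from urllib.parse import urlencode


def create_core_url(base_url: str, core: str, collection: str,
                    shard: str, data_dir: str) -> str:
    """Build a CoreAdmin CREATE request for a core bound to one shard.

    data_dir is assumed to already contain the index copied from HDFS.
    """
    params = {
        "action": "CREATE",
        "name": core,
        "collection": collection,
        "shard": shard,
        "dataDir": data_dir,
    }
    return f"{base_url.rstrip('/')}/admin/cores?{urlencode(params)}"


# Hypothetical core name and data directory for shard1.
print(create_core_url("http://localhost:8983/solr", "collB_shard1_new",
                      "collection-B", "shard1", "/data/collB_shard1"))
```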

Next, I unload/delete the old core so that the new core is the only one
serving the shard. That leaves collection-B with all its shards served by
cores that have the MRIT indices.
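Removing the old (empty) core is a CoreAdmin UNLOAD. `deleteIndex=true` also removes that core's index, which should be safe here because the new core has its own data directory. A minimal sketch; the core name is hypothetical.

```python
from urllib.parse import urlencode


def unload_core_url(base_url: str, core: str, delete_index: bool = False) -> str:
    """Build a CoreAdmin UNLOAD request; optionally delete the core's index."""
    params = {"action": "UNLOAD", "core": core}
    if delete_index:
        params["deleteIndex"] = "true"
    return f"{base_url.rstrip('/')}/admin/cores?{urlencode(params)}"


# Hypothetical name of the original, empty core for shard1.
print(unload_core_url("http://localhost:8983/solr",
                      "collection-B_shard1_replica1", delete_index=True))
```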

Problem: once I unload the old core, the new core gets stuck in the
"recovering" state. This was my original problem. The new core never
becomes the leader because its last published state was "recovering",
which leaves collection-B unusable.
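To spot the stuck cores, one option is the Collections API CLUSTERSTATUS action (Solr 4.8+) and a scan of the returned replicas. A minimal sketch over the parsed JSON response; it assumes the usual `cluster -> collections -> shards -> replicas` layout with a per-replica `state` field, and does not cross-check `live_nodes`.

```python
def non_active_replicas(cluster_status: dict, collection: str) -> list[tuple]:
    """Return (shard, core, state) for every replica not in state 'active'.

    cluster_status is the parsed JSON from
    /admin/collections?action=CLUSTERSTATUS&wt=json (assumed layout).
    """
    shards = cluster_status["cluster"]["collections"][collection]["shards"]
    stuck = []
    for shard_name, shard in sorted(shards.items()):
        for replica in shard.get("replicas", {}).values():
            if replica.get("state") != "active":
                stuck.append((shard_name, replica.get("core"), replica.get("state")))
    return stuck
```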

Update: over the weekend, I found a workaround. I added a custom handler
that takes a core name in the request and publishes an "active" state for
that core to the Overseer. Now, on the next recovery retry in the recovery
strategy, the code checks lastPublishedState, sees "active", joins the
leader election, and becomes the leader. Collection-B is then ready with
the new index, and I can switch the alias to point to it.
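Before repointing the alias, it seems worth confirming that every shard really ended up with an active leader. A minimal check over the same CLUSTERSTATUS response; it assumes the leader flag is reported as the string `"true"`, as in `clusterstate.json`.

```python
def ready_to_switch(cluster_status: dict, collection: str) -> bool:
    """True if every shard of `collection` has an active leader replica.

    Assumes the CLUSTERSTATUS layout cluster -> collections -> shards ->
    replicas, with "leader": "true" on the leader replica.
    """
    shards = cluster_status["cluster"]["collections"][collection]["shards"]
    for shard in shards.values():
        replicas = shard.get("replicas", {}).values()
        if not any(r.get("leader") == "true" and r.get("state") == "active"
                   for r in replicas):
            return False
    # An empty shard map is not "ready".
    return bool(shards)
```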

Sorry for the long description; I just wanted to describe the problem
clearly.

We also have a case where we could use the go-live option, since there the
new collection is kept on HDFS. But go-live effectively triggers a copy of
the index from HDFS to HDFS, which could be replaced by a rename/move on
HDFS, so we are using the above workaround for the HDFS index as well.
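The rename alternative is a single `hdfs dfs -mv`. Within one HDFS filesystem this is a NameNode metadata operation, so no index bytes are copied. A minimal sketch that shells out to the CLI; the paths are hypothetical, and it assumes the `hdfs` client is on the PATH.

```python
import subprocess


def hdfs_move_cmd(src: str, dst: str) -> list[str]:
    """argv for an HDFS rename (same filesystem only)."""
    return ["hdfs", "dfs", "-mv", src, dst]


def hdfs_move(src: str, dst: str) -> None:
    """Run the rename; raises CalledProcessError if the CLI fails."""
    subprocess.run(hdfs_move_cmd(src, dst), check=True)


# Hypothetical paths: MRIT output for one shard -> that core's index dir.
# hdfs_move("/tmp/mrit-out/shard1/index", "/solr/collection-B/shard1/data/index")
```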



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Load-pre-built-index-to-Solr-tp4263162p4263734.html
Sent from the Solr - User mailing list archive at Nabble.com.
