As you guessed, I'm trying to build a non-HDFS collection from the index files in HDFS (constructed by MRIT). To give you the overall picture, below is my workflow (sorry if it is too long):
I have collection-A serving an index, and I'm replacing it with another collection, collection-B, using aliasing to switch over to collection-B. I have been doing this for a while. The problem I'm having now is in bringing up collection-B from the MRIT indices; specifically, copying the index files is where it breaks.

Previously (I was on 4.4.0; sorry, yes, very outdated, we were on CDH4), I created collection-B on the cluster, clean and without any data. I had a request handler that takes an HDFS location and copies the index from HDFS into the appropriate local core data directory, replacing the original content (which is basically just one segment file without any documents). After that, once I reloaded the core(s), they picked up the new index, collection-B was ready, and I could switch the alias. All good.

With the newer version of Solr (I'm trying 4.10.3), replacing the contents of the index directory for the cores in collection-B does not work as it used to. When I reload a core, the in-memory segment info of the index writer has no references to the new files that I just placed in the core's index directory. It deletes all the files it doesn't know about (I guess to prevent index corruption). So once it comes back up, the core is empty again. Copy and reload no longer works.

To get around this, I'm creating a new core per shard with a data directory that I give it (pre-populated with the index copied from HDFS). This new core happily loads the index. I'm doing this for all the shards, by the way. Then I unload/delete the old core so that the new core is the only one serving the shard. That should leave collection-B with all its shards served by cores that have the MRIT indices.

Problem: the new core gets stuck in the "recovering" state once I unload the old core. This was my original problem.
This new core will never become the leader, since its last published state was "recovering", and that leaves collection-B unusable.

Update: Over the weekend, I found a way around this. I added a custom handler that takes a core name in the request and puts that core in the "active" state (publishing to the overseer). Now, on the recovery retry in the recovery strategy, the code looks at the lastPublishedState, sees it as "active", joins the election process, and becomes the leader. Collection-B is then ready with the new index, and I can switch the alias to point to collection-B.

Sorry for such a long description. I just wanted to describe the problem clearly.

We actually also have a case where we could use the go-live option, since the new collection in this case is kept on HDFS. But since the go-live option effectively triggers a copy of the index from HDFS to HDFS, which could be replaced by a rename/move operation on HDFS, we are going with the above workaround for the HDFS index as well.
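
In case it helps anyone following along, here is a rough sketch of the core-level steps of the workaround (create a new core on a pre-populated data directory, unload the old core, switch the alias), written as a small Python script that only builds the CoreAdmin and Collections API URLs. The base URL, core names, alias name, and data directory are hypothetical placeholders, not values from my setup, and the function names are just for illustration. The custom handler that publishes the "active" state is my own code and not a standard Solr API, so it is not shown.

```python
from urllib.parse import urlencode

# Hypothetical base URL; adjust to your own node.
SOLR = "http://localhost:8983/solr"


def create_core_url(base, core, collection, shard, data_dir):
    """CoreAdmin CREATE: a new core for a shard, pointed at a data
    directory that already holds the index copied from HDFS."""
    params = {
        "action": "CREATE",
        "name": core,
        "collection": collection,
        "shard": shard,
        "dataDir": data_dir,
    }
    return base + "/admin/cores?" + urlencode(params)


def unload_core_url(base, core):
    """CoreAdmin UNLOAD: drop the old (empty) core for the shard."""
    params = {"action": "UNLOAD", "core": core}
    return base + "/admin/cores?" + urlencode(params)


def create_alias_url(base, alias, collections):
    """Collections API CREATEALIAS: point the alias at the new collection."""
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        "collections": ",".join(collections),
    }
    return base + "/admin/collections?" + urlencode(params)


if __name__ == "__main__":
    # All names and paths below are made up for illustration.
    print(create_core_url(SOLR, "collection-B_shard1_new", "collection-B",
                          "shard1", "/data/solr/b_shard1"))
    print(unload_core_url(SOLR, "collection-B_shard1_replica1"))
    print(create_alias_url(SOLR, "myalias", ["collection-B"]))
```

The script only prints the URLs; each one would be issued as a plain HTTP GET against the node hosting the core, once per shard for the first two calls and once overall for the alias.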