Load pre-built index to Solr
I'm building an index on HDFS using the MapReduceIndexerTool, which I'd later like to load into my Solr cores with minimal delay. With Solr 4.4, I was able to swap out the underlying index directory of a core (I don't need to keep any of the existing index) and reload the core, and it worked fine.

I'm upgrading to Solr 4.10.3, which behaves a little differently. On reload, it deletes all the index files that are not referenced by the in-memory SegmentInfos (which knows nothing about the new index files), so I end up with a clean index directory after the reload.

To work around this, I'm creating a new core whose data directory already contains the index I built for the same shard, and then unloading the original core, hoping the new core becomes the leader. The problem is that the new core gets stuck in the "recovering" state and cannot join the leader election, since its last published state is "recovering". However, after one hour, something (judging from the logs) updates the status of these cores to "down" and they are brought back up; the core then registers itself as the leader.

Firstly, I'm trying to force a leader election that includes this recovering core. Secondly, I'm very curious what happens every hour (this is probably a timeout); I just want to understand it. Thirdly, is there a better way to load a pre-built index quickly, like before?

Can anyone help me find answers to the above questions? Thanks in advance.

--
View this message in context: http://lucene.472066.n3.nabble.com/Load-pre-built-index-to-Solr-tp4263162.html
Sent from the Solr - User mailing list archive at Nabble.com.
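For reference, the create-then-unload workaround can be sketched against the CoreAdmin API. This is only an illustration: the host, core names, and data directory path below are hypothetical placeholders, not values from this thread.

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # hypothetical Solr node


def create_core_url(name, collection, shard, data_dir):
    """CoreAdmin CREATE with an explicit dataDir that already
    holds the pre-built index for this shard."""
    params = {"action": "CREATE", "name": name, "collection": collection,
              "shard": shard, "dataDir": data_dir}
    return SOLR + "/admin/cores?" + urlencode(params)


def unload_core_url(name):
    """CoreAdmin UNLOAD for the old core, so the new core is left
    as the shard's only replica."""
    return SOLR + "/admin/cores?" + urlencode({"action": "UNLOAD", "core": name})


# These URLs would be issued (e.g. with urllib.request.urlopen)
# once per shard: first CREATE the new core, then UNLOAD the old one.
print(create_core_url("shard1_new", "collection-B", "shard1", "/data/prebuilt/shard1"))
print(unload_core_url("shard1_old"))
```

Whether the new core then becomes leader cleanly is exactly the open question in this thread.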
Re: Load pre-built index to Solr
Thank you Erick. I actually only have an index; I do not have a collection B that hosts it. The reducers of the MR job build the index (one reducer per shard). I'm looking to load these generated Lucene index files into the cores of a collection (new or existing, and then I can work with aliasing after that, as you suggested).
Re: Load pre-built index to Solr
As you guessed, I'm trying to build a non-HDFS collection from the index files in HDFS (constructed by MRIT). To give you the overall picture, below is my workflow (sorry if it is too long).

I have collection-A serving an index, and I'm replacing it with another one, collection-B, using aliasing to switch over. I have been doing this previously. The problem I'm having now is bringing up collection-B from the MRIT indices; copying the index files in is precisely where I'm stuck.

Previously (I was on 4.4.0 — sorry, yes, so outdated; we were on CDH4), I created collection-B on the cluster, clean and without any data. I had a request handler that took an HDFS location and copied the index from HDFS into the appropriate local core data directory, replacing the original content (which was basically just one segments file without any documents). After that, once I reloaded the core(s), they picked up the new index, collection-B was ready, and I could switch the alias. All good.

With the newer version of Solr (I'm trying 4.10.3), replacing the contents of the index directory for collection-B's cores no longer works as it used to. When I reload a core, the in-memory segment info for the index writer has no references to the new files I just placed in the core's index directory, and it deletes all the files it doesn't know about (I guess to prevent index corruption). So once the core comes back up, it is empty again; copy-and-reload doesn't work.

To get around the above situation, I'm creating a new core per shard with a data directory that I supply, pre-populated with the index copied from HDFS. This new core happily loads the index. (I'm doing this for all the shards, btw.) Then I unload/delete the old core so that the new core is the only one serving the shard. That leaves collection-B with all its shards served by cores that have the MRIT indices.

Problem: the new core gets stuck in the "recovering" state once I unload the old core. This was my original problem. The new core will never become the leader, since its last published state was "recovering", which leaves collection-B unusable.

Update: over the weekend, I found a way to get around this. I added a custom handler that takes a core name in the request and puts that core in the "active" state (publishing it to the overseer). Now, on the recovery retry, the recovery strategy looks at lastPublishedState, sees it as "active", joins the election process, and the core becomes the leader. Collection-B is then ready with the new index and I can switch the alias to point to it.

Sorry for such a long description; I just wanted to describe the problem clearly. We actually also have a case where we could use the go-live option, since the new collection in that case is kept on HDFS. But since go-live effectively triggers a copy of the index from HDFS to HDFS, which could instead be a rename/move operation on HDFS, we are going with the above workaround for the HDFS index as well.
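Once collection-B's cores are active and leading, the final step above is a single Collections API call. As a sketch (the alias name and host below are hypothetical placeholders):

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # hypothetical Solr node


def create_alias_url(alias, target_collection):
    """Collections API CREATEALIAS: repoint the serving alias
    at the newly built collection."""
    params = {"action": "CREATEALIAS", "name": alias,
              "collections": target_collection}
    return SOLR + "/admin/collections?" + urlencode(params)


# Flip the alias that clients query from collection-A to collection-B:
print(create_alias_url("live", "collection-B"))
```

CREATEALIAS overwrites an existing alias of the same name, which is what makes the atomic switch from collection-A to collection-B possible.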
Solr: IndexNotFoundException: no segments* file HdfsDirectoryFactory
I'm trying to write some integration tests against SolrCloud, for which I'm setting up a Solr instance backed by a ZooKeeper and pointed at a namenode (all in memory, using the Hadoop testing utilities and JettySolrRunner). I'm getting the following error when I try to create a collection (btw, the exact same configuration works just fine in dev with SolrCloud):

org.apache.lucene.index.IndexNotFoundException: no segments* file found in NRTCachingDirectory(HdfsDirectory@2ea2a4e4 lockFactory=org.apache.solr.store.hdfs.HdfsLockFactory@4cf0e472; maxCacheMB=192.0 maxMergeSizeMB=16.0): files: [HdfsDirectory@6bf4fc1c lockFactory=org.apache.solr.store.hdfs.hdfslockfact...@51115f81-write.lock]

The error occurs while creating the collection (precisely, when Solr tries to open a searcher on the new index). There are no segments files in the index directory on HDFS, so the error is expected when opening a searcher on that index, but I thought the segments file was created the first time, i.e. when the collection is being created. After some debugging, I noticed that the IndexWriter is initialized explicitly in APPEND mode, overriding the default CREATE_OR_APPEND mode, which means the segments file won't be created if one doesn't already exist. I'm not sure why this is the case, and I may also be going down the wrong path with this error. Again, this only happens in my in-memory SolrCloud setup. Can someone help me with this? Thanks.
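For context, an HDFS-backed core in this Solr version is configured via the directory factory in solrconfig.xml, roughly as below. This is only a sketch: the namenode URL is a placeholder, and in an in-memory test it would point at the mini HDFS cluster instead.

```xml
<directoryFactory name="DirectoryFactory"
                  class="solr.HdfsDirectoryFactory">
  <!-- placeholder namenode URL; tests would use the in-memory cluster's URI -->
  <str name="solr.hdfs.home">hdfs://namenode:8020/solr</str>
</directoryFactory>
```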
Re: Solr: IndexNotFoundException: no segments* file HdfsDirectoryFactory
I had missed Norgorn's reply above. But in the past, and as also suggested above, I think the following lock type solved this problem for me: ${solr.lock.type:hdfs} in the indexConfig in your solrconfig.xml.
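Concretely, the setting referred to above lives in the indexConfig block of solrconfig.xml:

```xml
<indexConfig>
  <!-- use the HDFS lock factory when the index lives on HDFS -->
  <lockType>${solr.lock.type:hdfs}</lockType>
</indexConfig>
```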