Load pre-built index to Solr

2016-03-10 Thread praneethvarma
I'm building an index on HDFS using the MapReduceIndexerTool, which I'd later
like to load into my Solr cores with minimal delay. With Solr 4.4, I was able
to swap out the underlying index directory of a core (I don't need to keep any
of the existing index) and reload the core, and it worked fine. I'm upgrading
to Solr 4.10.3, which behaves a little differently: upon reload, it deletes
all the index files that are not referenced by the in-memory SegmentInfos
(which knows nothing about the new index files), so I end up with an empty
index directory after the reload. To work around this, I'm creating a new core
with a data directory that already contains the index I built for the same
shard, and then unloading the original core, hoping for this new core to
become the leader. But the problem here is that the new core gets stuck in
the recovering state and cannot join the leader election, since its state is
"recovering". However, after one hour, something (judging from the logs)
updates the status of these cores to "down" and they are brought back up;
then the core registers itself as the leader.
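
For concreteness, the create-then-unload swap I'm doing looks roughly like
this (a sketch; host, core, and path names are placeholders, plus whatever
instanceDir/config parameters your setup needs):

    # Create a replacement core whose dataDir already holds the pre-built index
    curl "http://localhost:8983/solr/admin/cores?action=CREATE&name=coll_shard1_new&collection=coll&shard=shard1&dataDir=/data/prebuilt/shard1"

    # Unload the original core so the new one is left serving the shard
    curl "http://localhost:8983/solr/admin/cores?action=UNLOAD&core=coll_shard1_replica1"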

Firstly, I'm trying to find a way to force a leader election that includes
this recovering core.

Secondly, I'm very curious about what happens at the one-hour mark (this is
probably a timeout); I just want to understand it.

Thirdly, is there a better way to load a pre-built index quickly, as I could
before?

Can anyone help me find answers to the above questions?

Thanks in advance





Re: Load pre-built index to Solr

2016-03-11 Thread praneethvarma
Thank you Erick. 

I actually only have an index; I do not have a collection B that hosts it.
The reducers of the MR job build the index (one reducer per shard). I'm
looking to load these generated Lucene index files into the cores of a
collection (new or existing, and then I can work with aliasing after that, as
you suggested).
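
(For that aliasing step, I'm planning on the Collections API, something like
the following, with a placeholder alias name:)

    # Point the serving alias at the freshly loaded collection;
    # CREATEALIAS also re-points an alias that already exists
    curl "http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=serving&collections=collection-B"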





Re: Load pre-built index to Solr

2016-03-14 Thread praneethvarma
As you guessed, I'm trying to build a non-HDFS collection from the index
files in HDFS (constructed by MRIT). To give you the overall picture, below
is my workflow (sorry if it is too long).

I have collection-A serving an index, and I'm replacing it with another
collection, collection-B, using aliasing to switch over. I have been doing
this previously. The problem I'm having now is in bringing up collection-B
from the MRIT indices; copying the index files is precisely where I'm stuck.

Previously (I was on 4.4.0; sorry, yes, so outdated, we were on CDH4), I was
creating collection-B on the cluster (clean, without any data). I had a
request handler that takes an HDFS location and copies the index from HDFS
into the appropriate local core data directory, replacing the original
content (which is basically just one segments file without any documents).

After this, once I reloaded the core(s), they picked up the new index,
collection-B was ready, and I could switch the alias. All good.
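
(The handler's effect was essentially equivalent to the following, sketched
with the stock HDFS CLI and CoreAdmin calls; paths and core names are
placeholders:)

    # Copy the MRIT output for this shard into the core's local data directory...
    hdfs dfs -copyToLocal /mrit/output/part-00000/data/index /var/solr/collection-B_shard1_replica1/data/

    # ...then reload the core so it opens the new segments
    curl "http://localhost:8983/solr/admin/cores?action=RELOAD&core=collection-B_shard1_replica1"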

With the newer version of Solr (I'm trying 4.10.3), replacing the contents
of the index directory for cores in collection-B does not work as it used
to. When I reload a core, the in-memory SegmentInfos for the index writer
has no references to the new files I just placed in the core's index
directory, and Solr deletes all the files it doesn't know about (I guess to
prevent index corruption). So once the core comes back up, it is empty
again; copy-and-reload doesn't work.

To get around the above situation, I'm creating a new core per shard with a
data directory that I supply (pre-populated with the index copied from HDFS).
This new core happily loads the index. I'm doing this for all the shards, by
the way.

Now I'm unloading/deleting the old core so that the new core is the only one
serving the shard. Then I have collection-B with all its shards served by
cores that hold the MRIT indices (scripted per shard, roughly as sketched
below).
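
(Scripted across shards, the create/unload pair looks roughly like this; all
names and paths are placeholders:)

    # For each shard: create the pre-populated core, then drop the old one
    for i in 1 2 3; do
      curl "http://localhost:8983/solr/admin/cores?action=CREATE&name=collB_shard${i}_new&collection=collection-B&shard=shard${i}&dataDir=/data/prebuilt/shard${i}"
      curl "http://localhost:8983/solr/admin/cores?action=UNLOAD&core=collB_shard${i}_old"
    done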

Problem: the new core gets stuck in the "recovering" state once I unload the
old core. This was my original problem. The new core will never become the
leader, since its last published state was "recovering", and that leaves
collection-B unusable.

Update: over the weekend, I found a way around this. I added a custom
handler that takes a core name through the request and puts that core in the
"active" state (publishing it to the overseer). Now, on the next recovery
retry, the code looks at the lastPublishedState, sees it as "active", joins
the election process, and becomes the leader. Collection-B is then ready
with the new index, and I can switch the alias to point to it.
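
(The handler itself is just a thin wrapper around publishing the state; the
endpoint below is hypothetical, whatever name you register the handler under
in solrconfig.xml:)

    # Hypothetical custom endpoint: looks up the named core and publishes its
    # state as "active" to the overseer, so the next recovery retry sees
    # lastPublishedState=active and joins the leader election
    curl "http://localhost:8983/solr/collection-B_shard1_replica1/forceActive?core=collection-B_shard1_replica1"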

Sorry for such a long message; I just wanted to describe the problem clearly.

We actually also have a case where we could use the go-live option, since
the new collection there is kept on HDFS. But because go-live effectively
triggers a copy of the index from HDFS to HDFS, which could instead be a
rename/move operation on HDFS, we are going with the above workaround for
the HDFS index as well.
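
(That is, for an HDFS-backed collection one could in principle just move the
MRIT output into the core's index location instead of copying it; a sketch,
with placeholder paths:)

    # A rename on HDFS is a cheap metadata operation, unlike go-live's copy
    hdfs dfs -mv /mrit/output/part-00000/data/index /solr/collection-B/core_node1/data/index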





Solr: IndexNotFoundException: no segments* file HdfsDirectoryFactory

2014-05-29 Thread praneethvarma
I'm trying to write some integration tests against SolrCloud, for which I'm
setting up a Solr instance backed by a ZooKeeper and pointed at a namenode
(all in memory, using the Hadoop testing utilities and JettySolrRunner). I'm
getting the following error when I try to create a collection (by the way,
the exact same configuration works just fine in dev with SolrCloud):

org.apache.lucene.index.IndexNotFoundException: no segments* file found
in NRTCachingDirectory(HdfsDirectory@2ea2a4e4
lockFactory=org.apache.solr.store.hdfs.HdfsLockFactory@4cf0e472;
maxCacheMB=192.0 maxMergeSizeMB=16.0): files: [HdfsDirectory@6bf4fc1c
lockFactory=org.apache.solr.store.hdfs.HdfsLockFactory@51115f81-write.lock]

I'm getting this error when I try to create a collection (precisely, when
Solr is actually trying to open a searcher on the new index). There are no
segments files in the index directory on HDFS, so the error is expected when
opening a searcher on that index, but I thought the segments file was created
the first time around (when a collection is being created).

After some debugging, I noticed that the IndexWriter is being initialized
explicitly with APPEND mode, overriding the default CREATE_OR_APPEND mode
(IndexWriterConfig.OpenMode), which means the segments files won't be created
if at least one doesn't already exist. I'm not sure why this is the case, and
I may also be going down the wrong path with this error. Again, this only
happens in my in-memory SolrCloud setup.

Can someone help me with this? Thanks






Re: Solr: IndexNotFoundException: no segments* file HdfsDirectoryFactory

2015-01-08 Thread praneethvarma
I had missed Norgorn's reply above. But in the past, and as also suggested
above, I think the following lock type solved the problem for me, in the
indexConfig section of solrconfig.xml:

    <indexConfig>
      <lockType>${solr.lock.type:hdfs}</lockType>
    </indexConfig>


