RE: Solr Replication: How to restore data from last snapshot

2009-11-08 Thread Osborn Chan
What happens if it is multi-core?

Thanks

-----Original Message-----
From: noble.p...@gmail.com [mailto:noble.p...@gmail.com] On Behalf Of Noble Paul
Sent: Friday, November 06, 2009 10:49 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr Replication: How to restore data from last snapshot

If it is a single core, you will have to restart the master.

On Sat, Nov 7, 2009 at 1:55 AM, Osborn Chan wrote:
> Thanks. But I have the following use cases:
>
> 1) The master index is corrupted, but the corruption has not replicated to the slave servers.
>        - In this case, I only need to restore the last snapshot.
> 2) The master index is corrupted, and the corruption has replicated to the slave servers.
>        - In this case, I need to restore the last snapshot and make sure the
>          slave servers replicate the restored index from the index server as well.
>
> Assume both cases are in a production environment, where I cannot shut down the
> master and slave servers.
> Is there a REST API call, or anything else I can do, that avoids manually running
> Linux commands and restarting?
>
> Thanks,
>
> Osborn
>
> -----Original Message-----
> From: Matthew Runo [mailto:matthew.r...@gmail.com]
> Sent: Friday, November 06, 2009 12:20 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr Replication: How to restore data from last snapshot
>
> If your master index is corrupt and it hasn't been replicated out, you
> should be able to shut down the server and remove the corrupted index
> files. Then copy the replicated index back onto the master and start
> everything back up.
>
> As far as I know, the indexes on the replicated slaves are exactly
> what you'd have on the master, so this method should work.
>
> --Matthew Runo
>
> On Fri, Nov 6, 2009 at 11:41 AM, Osborn Chan wrote:
>> Hi,
>>
>> I have set up Solr's ReplicationHandler for index replication to the slave.
>> Does anyone know how to restore a corrupted index from a snapshot on the master,
>> and force replication of the restored index to the slave?
>>
>>
>> Thanks,
>>
>> Osborn
>>
>



-- 
Noble Paul | Principal Engineer| AOL | http://aol.com
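
On the question of an HTTP call: Solr 1.4's ReplicationHandler accepts commands over HTTP, which covers part of the second use case. The sketch below only builds the URLs, in the order one might call them: pause polling on the slaves, reload the master core once the snapshot files are back in place, then force each slave to pull with fetchindex and resume polling. The host names, the core name "core0", the /admin/cores admin path and the helper names are assumptions for illustration. As far as I know, Solr 1.4 has no command that restores a snapshot in place, so copying the snapshot files back remains a manual step, and whether a core reload is enough after swapping files under a running master is not confirmed in this thread.

```python
from urllib.parse import urlencode


def replication_url(base_url, command, core=None):
    """Build a ReplicationHandler command URL.

    base_url is the webapp root (hypothetical, e.g. http://master:8080/solr).
    core is only needed when the webapp hosts multiple cores.
    """
    root = base_url.rstrip("/")
    if core:
        root = root + "/" + core
    query = urlencode({"command": command})
    return root + "/replication?" + query


def core_reload_url(base_url, core):
    """Build a CoreAdmin RELOAD URL (assumes adminPath is /admin/cores)."""
    query = urlencode({"action": "RELOAD", "core": core})
    return base_url.rstrip("/") + "/admin/cores?" + query


def recovery_steps(master_url, slave_urls, core):
    """Return the URLs to call, in order, around a manual snapshot restore."""
    # 1. Stop the slaves from polling while the master index is being repaired.
    steps = [replication_url(s, "disablepoll", core) for s in slave_urls]
    # 2. (Manual, not shown) copy the snapshot files back into the master index.
    # 3. Reload the master core so it reopens the restored index.
    steps.append(core_reload_url(master_url, core))
    # 4. Force every slave to pull the restored index, then resume polling.
    steps += [replication_url(s, "fetchindex", core) for s in slave_urls]
    steps += [replication_url(s, "enablepoll", core) for s in slave_urls]
    return steps
```

For a single-core setup, leave `core` as None in `replication_url`; the reload step then has no equivalent, which matches the advice above to restart the master.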


EOF IOException Query

2010-01-11 Thread Osborn Chan
Hi all,

I got the following exception from Solr, but the index is still searchable (at 
least for the query "*:*").
I am wondering what the root cause is.

Thanks,
Osborn

INFO: [publicGalleryPostMaster] webapp=/multicore path=/select params={wt=javabin&rows=12&start=0&sort=/gallery/1/postlist/1Rank_i+desc&q=%2B(communityList_s_m:/gallery/1/postlist/1)+%2Bstate_s:A&version=1} status=500 QTime=3
Jan 11, 2010 12:23:01 PM org.apache.solr.common.SolrException log
SEVERE: java.io.IOException: read past EOF
    at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:151)
    at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
    at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:80)
    at org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:112)
    at org.apache.lucene.search.FieldCacheImpl$StringIndexCache.createValue(FieldCacheImpl.java:712)
    at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:208)
    at org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:676)
    at org.apache.lucene.search.FieldComparator$StringOrdValComparator.setNextReader(FieldComparator.java:667)
    at org.apache.lucene.search.TopFieldCollector$OneComparatorNonScoringCollector.setNextReader(TopFieldCollector.java:94)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:245)
    at org.apache.lucene.search.Searcher.search(Searcher.java:171)
    at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:988)
    at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:884)
    at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:341)
    at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:182)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
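
Decoding the logged request parameters makes the failing query easier to read. The stack trace passes through FieldCacheImpl.getStringIndex and FieldComparator$StringOrdValComparator, so the "read past EOF" happens while the sort field is being loaded into the field cache. A plain "*:*" query without that sort would not touch it, which may be why it still works. A minimal sketch using only the Python standard library; the parameter string is copied from the log line above, and the helper name is made up for illustration.

```python
from urllib.parse import parse_qs

# Request parameters copied from the status=500 log line.
LOGGED_PARAMS = (
    "wt=javabin&rows=12&start=0"
    "&sort=/gallery/1/postlist/1Rank_i+desc"
    "&q=%2B(communityList_s_m:/gallery/1/postlist/1)+%2Bstate_s:A"
    "&version=1"
)


def decode_params(raw):
    """Return the URL-decoded request parameters as a flat dict."""
    # parse_qs turns '+' into a space and '%2B' into a literal '+'.
    return {name: values[0] for name, values in parse_qs(raw).items()}


params = decode_params(LOGGED_PARAMS)
print("query:", params["q"])
print("sort: ", params["sort"])
```

Decoded, the query is `+(communityList_s_m:/gallery/1/postlist/1) +state_s:A`, sorted by `/gallery/1/postlist/1Rank_i desc`.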


Index Corruption after replication by new Solr 1.4 Replication

2010-01-15 Thread Osborn Chan
Hi all,

I recently migrated from Solr 1.2 with NFS mounting to the new Solr 1.4 
Replication feature with multicore support. The following exceptions appear in 
catalina.log from time to time, including some EOF exceptions, which lead me to 
believe the slave index files are corrupted after replication from the index 
server. My Solr 1.4 configuration is described below; please correct me if it 
is configured incorrectly. 

(The index files are not corrupted on the master server, only on the 
slave servers. Usually only one of the slave servers is corrupted with an EOF 
exception, not all of them.)
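
Since usually only one slave is affected, one way to narrow it down is to compare each slave's index version and generation with the master's, per core. The sketch below builds the ReplicationHandler "details" URL for a core and compares values that the caller has already read from the responses. It deliberately does not parse the response, because the exact response layout is not shown in this thread. The host names and helper names are assumptions; the /multicore webapp path and the core name are taken from the logs in this message.

```python
from urllib.parse import urlencode


def details_url(base_url, core):
    """Build the ReplicationHandler 'details' URL for one core."""
    query = urlencode({"command": "details"})
    return base_url.rstrip("/") + "/" + core + "/replication?" + query


def out_of_sync(master_state, slave_states):
    """Return the slaves whose (index version, generation) differs from the master's.

    master_state is a (version, generation) tuple; slave_states maps a slave
    name to the same kind of tuple, read by hand or by a script from 'details'.
    """
    return sorted(name for name, state in slave_states.items() if state != master_state)
```

A slave that has not yet polled after the latest optimize will also show up as different, so a mismatch is only suspicious if it persists beyond one poll interval.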

1 Master Server: (Index Server)
- 8 indexes with multicore configuration.
- All indexes are configured to "replicateAfter" optimize only.
- Index sizes vary. The smallest index is only 2.5 MB; the biggest is ~100 MB. 
- Optimize calls are infrequent (one optimize call every ~30 minutes to 
6 hours, depending on the index).
- There are many commit calls to all indexes. (But commit and optimize 
never run concurrently on any index.)
- "commitReserveDuration" is not configured in ReplicationHandler; the 
default value is used.

4 Slave Servers (Search Server)
- 8 indexes with multicore configuration.
- All indexes are configured to poll every ~15 minutes.
- All update handler configuration is removed from solrconfig-slave.xml 
(solrconfig.xml) in order to prevent add/commit/optimize calls. 
- (Search slave servers are only responsible for search operations.)
-   removed.
-  removed.
- <requestHandler ... class="solr.BinaryUpdateRequestHandler" /> removed.

A) FileNotFoundException

INFO: Total time taken for download : 1 secs
Jan 15, 2010 10:34:16 AM org.apache.solr.handler.ReplicationHandler doFetch
SEVERE: SnapPull failed
org.apache.solr.common.SolrException: Index fetch failed :
    at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:329)
    at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:264)
    at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:417)
    at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:280)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:135)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:65)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:142)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:166)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
    at java.lang.Thread.run(Thread.java:595)
Caused by: java.io.FileNotFoundException: File does not exist /slaveIndexData/publicGalleryTagDef/index.20100115103415/_al.fdx
    at org.apache.solr.common.util.FileUtils.sync(FileUtils.java:55)
    at org.apache.solr.handler.SnapPuller$FileFetcher$1.run(SnapPuller.java:911)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:417)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)
    at java.util.concurrent.FutureTask.run(FutureTask.java:123)
    ... 3 more
Jan 15, 2010 10:34:17 AM org.apache.solr.core.SolrCore execute
INFO: [publicGalleryPostMaster] webapp=/multicore path=/select params={wt=javabin&rows=10&start=0&sort=createTime_dt+desc&q=%2B(profileId_s:/community/sfly/publicprofile/0AcM27Nw3aNWLi4)+%2Bstate_s:A&version=1} hits=1 status=0 QTime=1

B) LockReleaseFailedException

SEVERE: SnapPull failed
org.apache.solr.common.SolrException: Index fetch failed :
    at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:329)
    at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:264)
    at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:417)
    at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:280)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:135)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:65)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:142)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:166)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
    at java.util.concurrent.ThreadPoolExecutor$W

RE: Index Corruption after replication by new Solr 1.4 Replication

2010-01-15 Thread Osborn Chan
Hi Otis,

Thanks. There is no NFS anymore, and all index files are local. We migrated to 
the new Solr 1.4 Replication in order to avoid all the NFS stale exceptions. 

Thanks,

Osborn

-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] 
Sent: Friday, January 15, 2010 12:31 PM
To: solr-user@lucene.apache.org
Subject: Re: Index Corruption after replication by new Solr 1.4 Replication

This is not a direct answer to your question, but can you avoid NFS?  My first 
guess would be that NFS somehow causes this problem.  If you check the ML 
archives for "NFS lock", you will see what I mean.

Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch




RE: Index Corruption after replication by new Solr 1.4 Replication

2010-02-10 Thread Osborn Chan
Hi All,

I found out that there is a file corruption issue when "EmbeddedSolrServer" and 
"Solr 1.4 Java-based replication" are used together on a slave server.


On my slave server, I have 2 webapps in one Tomcat instance: 
1) the "multicore" webapp with the slave config
2) "my custom" webapp, which uses EmbeddedSolrServer to query the Solr index data.
 
Both webapps were set up according to the instructions on the Solr wiki.
However, I found a multi-threading issue that causes index file 
corruption.

The following is the root cause:
EmbeddedSolrServer requires a CoreContainer object as a parameter. 
However, creating the CoreContainer object loads the slave Solr 
configuration, which silently creates an extra ReplicationHandler 
(SnapPuller) in the background. A ReplicationHandler (SnapPuller) has 
already been created by the multicore webapp because of the slave configuration.

As a result, 2 threads replicate files at the same time, which causes 
index corruption with various IOExceptions.
After I replaced EmbeddedSolrServer with CommonsHttpSolrServer 
(and so stopped creating a CoreContainer object on the slave server), Solr 1.4 
Java-based replication works perfectly without any file corruption.

In order to use EmbeddedSolrServer on a slave server, I think we need a 
way to create a CoreContainer object with the slave configuration without 
creating an extra thread to replicate files.
Should I file a bug?

Thanks,

Osborn
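
Querying the multicore webapp over HTTP, as CommonsHttpSolrServer does, keeps the custom webapp from loading the slave configuration at all, so only one SnapPuller exists per core. A rough sketch of the kind of request involved, shown in Python for brevity rather than SolrJ; the host, port and helper name are assumptions, while the /multicore webapp path and the core name come from the logs earlier in this thread.

```python
from urllib.parse import urlencode


def select_url(base_url, core, query, **params):
    """Build a /select URL against a core of the multicore webapp."""
    # urlencode percent-encodes the query, so '+', ':' and '/' are safe to pass in.
    encoded = urlencode({"q": query, **params})
    return base_url.rstrip("/") + "/" + core + "/select?" + encoded


# Hypothetical host and port; the webapp path and core name are from the logs.
url = select_url("http://localhost:8080/multicore", "publicGalleryPostMaster", "*:*", rows=10)
print(url)
```

The trade-off is an HTTP round trip per query instead of an in-process call, in exchange for a single owner of the index directory.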


