SOLR/Tomcat6 keeping references to deleted tlog files

2013-10-22 Thread Eric Bus
Hi,

I've been running a SolrCloud setup on SOLR 4.4, consisting of 3 nodes, for 
some time. The cloud hosts about 40 small collections that receive updates 
once a day. The collections use different shard and replication 
configurations (varying from 2 shards without replication to 2 shards with 3 
replicas).

After running Tomcat for a couple of weeks, I notice the number of open files 
is dramatically increasing. Most of those files are deleted tlog files that 
SOLR keeps open:

eric@node1:/ # lsof -np 16810 | grep deleted | wc -l
36345

Those files are no longer on disk, but SOLR still has a handle open. My disk 
use is going through the roof. 6GB is currently 'in use' by deleted but still 
open files. When I restart Tomcat, the space is freed and it starts all over 
again. All of my nodes experience this behavior.
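The 6GB figure can be cross-checked from the same lsof output: the SIZE/OFF 
column (column 7 in the default format) holds each open file's size, so 
summing it over the deleted entries gives the space the kernel cannot 
reclaim. A sketch, shown here on two made-up lsof lines (the PID, paths and 
sizes are invented; in practice pipe `lsof -np <PID>` instead of the sample):

```shell
# Two sample lsof lines whose NAME ends in "(deleted)"; sizes are in bytes.
sample='java 16810 solr 211w REG 8,1 1048576 12 /var/solr/tlog.001 (deleted)
java 16810 solr 212w REG 8,1 2097152 13 /var/solr/tlog.002 (deleted)'

# Sum column 7 (SIZE/OFF) over deleted entries; on a live system use:
#   lsof -np 16810 | awk '/deleted/ { b += $7 } END { print b }'
echo "$sample" | awk '/deleted/ { b += $7 } END { print b }'
# → 3145728
```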

First I thought it had something to do with the lack of commits. But it happens 
on all my collections, even the ones with fast autoCommit:

<autoCommit>
  <maxTime>5000</maxTime>
  <maxDocs>12</maxDocs>
  <openSearcher>false</openSearcher>
</autoCommit>

My update process always triggers a commit or rollback and updates are showing 
up correctly.

I read something about SOLR having TCP connections in CLOSE_WAIT. The only 
CLOSE_WAIT connections I see are between the nodes, and there are only about 
10 of them. Those connections can't be causing 36k open files, right?
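The CLOSE_WAIT count can be checked directly, since the TCP state is column 6 
of `netstat -tn` output. A sketch on two made-up netstat lines (addresses are 
invented; pipe the real `netstat -tnp` or `ss -tn` output instead):

```shell
# Sample netstat lines; column 6 is the TCP state.
sample='tcp 0 0 10.0.0.1:8080 10.0.0.2:45210 CLOSE_WAIT
tcp 0 0 10.0.0.1:8080 10.0.0.3:45211 ESTABLISHED'

# Count connections stuck in CLOSE_WAIT (one in this sample).
echo "$sample" | awk '$6 == "CLOSE_WAIT"' | wc -l
```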

Any suggestions/tips? At the moment, I have to restart my leader every couple 
of weeks and that's not really something I would like to do :)

Best regards,
Eric Bus



Using all SolrCloud servers in round-robin setup

2013-11-05 Thread Eric Bus
Hi,

I'm currently using a SolrCloud setup with 3 nodes. The setup hosts about 50 
(small) collections of a few thousand documents each. In the past, I've used 
collections with replicationFactor = 3. So each node has a replica of all the 
collections.

But now I want to add an extra node. New collections can then be created on 
servers 1, 2 and 4, or on 1, 3 and 4; I'm not specifying nodes at creation 
time. My problem is that I cannot use every node in the cluster to query my 
collections: if a collection is not hosted on node 2, I cannot use node 2 to 
query that collection. Is that normal behavior? Does that mean that I'll have 
to keep a list of nodes per collection (or query and cache it from 
ZooKeeper) and use that in my client application?
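The per-collection node list mentioned above lives in ZooKeeper's 
/clusterstate.json, where each replica entry carries a base_url. A sketch of 
extracting that list, run here against a minimal made-up cluster state (the 
collection and node names are invented; fetch the real JSON from ZooKeeper or 
from a node's /solr/zookeeper admin endpoint):

```shell
# Minimal, made-up clusterstate.json fragment for one collection.
state='{"website1":{"shards":{"shard1":{"replicas":{
"core_node1":{"base_url":"http://node1:8983/solr"},
"core_node3":{"base_url":"http://node3:8983/solr"}}}}}}'

# List the unique nodes hosting a replica of the collection.
echo "$state" | grep -o '"base_url":"[^"]*"' | cut -d'"' -f4 | sort -u
# → http://node1:8983/solr
# → http://node3:8983/solr
```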

Currently I'm using one of the nodes as a fixed IP in my client application. 
This node contains all the collections, because new collections are always 
created on that node. But when it goes down, there is no other node that 
contains all the collections.

Best regards,
Eric Bus


SolrCloud keeps repeating exception 'SolrCoreState already closed'

2013-11-07 Thread Eric Bus
Hi,

I'm having a problem with one of my shards. Since yesterday, SOLR keeps 
repeating the same exception over and over for this shard. The web interface 
for this SOLR instance is also not working (it hangs on the Loading 
indicator).

Nov 7, 2013 9:08:12 AM org.apache.solr.update.processor.LogUpdateProcessor 
finish
INFO: [website1_shard1_replica3] webapp=/solr path=/update 
params={update.distrib=TOLEADER&wt=javabin&version=2} {} 0 0
Nov 7, 2013 9:08:12 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.RuntimeException: SolrCoreState already closed
    at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:79)
    at org.apache.solr.update.DirectUpdateHandler2.delete(DirectUpdateHandler2.java:276)
    at org.apache.solr.update.processor.RunUpdateProcessor.processDelete(RunUpdateProcessorFactory.java:77)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processDelete(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalDelete(DistributedUpdateProcessor.java:460)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.versionDelete(DistributedUpdateProcessor.java:1036)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processDelete(DistributedUpdateProcessor.java:721)
    at org.apache.solr.update.processor.LogUpdateProcessor.processDelete(LogUpdateProcessorFactory.java:121)
    at org.apache.solr.handler.loader.XMLLoader.processDelete(XMLLoader.java:346)
    at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:277)
    at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
    at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:448)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:269)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
    at java.lang.Thread.run(Thread.java:662)

I have about 3GB of logfiles for this single message. Reloading the collection 
does not work. Reloading the specific shard core returns the same exception. 
The only option seems to be to restart the server. But because it's the leader 
for a lot of collections, I want to know why this is happening. I've seen this 
problem before, and I haven't figured out what is causing it.
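The trigger for such a flood usually hides in the log lines just before its 
first occurrence, so grepping for the first match with leading context is a 
quick way to isolate it. Demonstrated here on two sample lines (in practice, 
point it at catalina.out with a larger -B window):

```shell
# Print the first match plus the line before it; on a real log:
#   grep -B 40 -m 1 'SolrCoreState already closed' catalina.out
printf 'INFO: some earlier event\nSEVERE: SolrCoreState already closed\n' |
  grep -B 1 -m 1 'SolrCoreState already closed'
```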

I reported a different problem a few days ago about 'hanging' deleted tlog 
files. Could this be related? Could the hanging tlog files prevent a new 
Searcher from opening? I've updated two of my three hosts to 4.5.1, but after 
only 2 days of uptime I'm still seeing about 11,000 deleted tlog files in the 
lsof output.

Best regards,
Eric Bus




RE: How to remove a Solr Node and its cores from a cluster SolrCloud and from collection

2013-11-29 Thread Eric Bus
Hi Sébastien,

Maybe this can help?

"Add a collection admin command to remove a replica"
https://issues.apache.org/jira/browse/SOLR-5310

It's part of the new 4.6.0 update.
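SOLR-5310 adds a DELETEREPLICA command to the Collections API, so a dead 
node's replica can be dropped with a single HTTP request. A sketch of 
building that call (the node, collection, shard, and replica names below are 
examples, not Sébastien's actual ones):

```shell
# Build the Collections API URL (Solr 4.6+); then run: curl "$url"
base='http://node1:8983/solr/admin/collections'
url="${base}?action=DELETEREPLICA&collection=website1&shard=shard1&replica=core_node4"
echo "$url"
# → http://node1:8983/solr/admin/collections?action=DELETEREPLICA&collection=website1&shard=shard1&replica=core_node4
```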

Best regards,
Eric


-----Original Message-----
From: Seb Geek [mailto:geek...@gmail.com]
Sent: Friday, 29 November 2013 12:47
To: solr-user@lucene.apache.org
Subject: How to remove a Solr Node and its cores from a cluster SolrCloud and 
from collection

Hello,

I have a cluster of 4 SolrCloud nodes (N1, N2, N3, N4), running Solr version 
4.5.1. One of these nodes (N4) has died completely (the CPU, RAM and disks 
are all lost). I have added another node (N5) to the cluster and copied all 
the core configuration previously on node N4 to that node (solr.xml and 
core.properties in the data dir). Node N5 has replicated the full index of my 
collection and is already able to respond to requests for the replicas it 
owns.

In the cluster state, I can still see the old replicas on the dead node N4. 
How can I remove these replicas from my collection?

Thanks
Sébastien


RE: SolrCloud keeps repeating exception 'SolrCoreState already closed'

2013-12-03 Thread Eric Bus
Are you currently running SOLR under Tomcat or standalone with Jetty? I 
switched from Tomcat to Jetty and the problems went away.

- Eric


-----Original Message-----
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com]
Sent: Tuesday, 3 December 2013 12:44
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud keeps repeating exception 'SolrCoreState already 
closed'

I just ran into this issue on solr 4.6 on an EC2 machine while indexing 
wikipedia dump with DIH. I'm trying to isolate exceptions before the 
SolrCoreState already closed exception.

On Sun, Nov 10, 2013 at 11:58 PM, Mark Miller  wrote:
> Can you isolate any exceptions that happened just before that exception
> started repeating?
>
> - Mark