SOLR/Tomcat6 keeping references to deleted tlog files
Hi,

I've been running a SolrCloud setup (SOLR 4.4, 3 nodes) for some time. The cloud hosts about 40 small collections that receive updates once a day. The collections use different shard and replication configurations (varying from 2 shards without replication to 2 shards with 3 replicas).

After running Tomcat for a couple of weeks, I notice the number of open files increasing dramatically. Most of those files are deleted tlog files that SOLR keeps open:

    eric@node1:/ # lsof -np 16810 | grep deleted | wc -l
    36345

Those files are no longer on disk, but SOLR still has a handle open. My disk use is going through the roof: 6GB is currently 'in use' by deleted but still open files. When I restart Tomcat, the space is freed and it starts all over again. All of my nodes show this behavior.

At first I thought it had something to do with a lack of commits. But it happens on all my collections, even the ones with a fast autoCommit:

    <autoCommit>
      <maxTime>5000</maxTime>
      <maxDocs>12</maxDocs>
      <openSearcher>false</openSearcher>
    </autoCommit>

My update process always triggers a commit or rollback, and updates show up correctly.

I read something about SOLR having TCP connections in CLOSE_WAIT. The only CLOSE_WAIT connections I see are between the nodes, and there are only about 10 of them. Those connections can't be causing 36k open files, right?

Any suggestions/tips? At the moment I have to restart my leader every couple of weeks, and that's not really something I would like to keep doing :)

Best regards,
Eric Bus
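(To quantify how much disk space those deleted-but-open handles are still pinning, the lsof output can be tallied per process. A minimal sketch, assuming the default `lsof -np <pid>` column layout where SIZE/OFF is the 7th field; the PID, paths, and sizes below are made up.)

```python
def tally_deleted(lsof_output):
    """Count deleted-but-still-open files in `lsof -np <pid>` output
    and sum their sizes (SIZE/OFF column, 7th whitespace field)."""
    count, total = 0, 0
    for line in lsof_output.splitlines():
        if "(deleted)" not in line:
            continue
        count += 1
        fields = line.split()
        try:
            total += int(fields[6])  # SIZE/OFF holds the size for regular files
        except (IndexError, ValueError):
            pass  # skip non-regular files where SIZE/OFF is an offset
    return count, total

# Hypothetical lsof output for illustration:
sample = """\
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 16810 solr 213r REG 8,1 1048576 5521 /var/solr/tlog/tlog.001 (deleted)
java 16810 solr 214r REG 8,1 2097152 5522 /var/solr/tlog/tlog.002 (deleted)
java 16810 solr 215r REG 8,1 4096 5523 /var/solr/conf/solrconfig.xml
"""
count, total = tally_deleted(sample)
print(count, total)  # 2 deleted handles, 3145728 bytes still held
```

Running this over the real lsof output would show which tlog files dominate the 6GB of unreclaimed space.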
Using all SolrCloud servers in round-robin setup
Hi,

I'm currently using a SolrCloud setup with 3 nodes. The setup hosts about 50 (small) collections of a few thousand documents each. In the past, I've created collections with replicationFactor=3, so each node has a replica of every collection. But now I want to add an extra node. New collections can then be created on servers 1, 2 and 4, or on 1, 3 and 4; I'm not specifying specific nodes at creation time.

My problem is that I cannot use every node in the cluster to query my collections. If a collection is not hosted on node 2, I cannot use node 2 to query that collection. Is that normal behavior? Does that mean I'll have to keep a list of nodes per collection (or query and cache it from ZooKeeper) and use that in my client application?

Currently I'm using one of the nodes as a fixed IP in my client application. This node contains all the collections, because new collections are always created on that node. But when it goes down, there is no other node that contains all the collections.

Best regards,
Eric Bus
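(The "query and cache it from ZooKeeper" approach amounts to reading clusterstate.json and mapping each collection to the nodes that host it, which is what SolrJ's CloudSolrServer does automatically. A rough sketch of the lookup, using an invented clusterstate fragment; the hostnames and collection name are placeholders.)

```python
def nodes_for_collection(clusterstate, collection):
    """Return the base URLs hosting at least one active replica of
    `collection`, given a parsed clusterstate.json-style dict."""
    urls = set()
    for shard in clusterstate.get(collection, {}).get("shards", {}).values():
        for replica in shard.get("replicas", {}).values():
            if replica.get("state") == "active":
                urls.add(replica["base_url"])
    return urls

# Made-up fragment in the shape of a 4.x clusterstate.json:
state = {
    "website1": {"shards": {
        "shard1": {"replicas": {
            "core_node1": {"base_url": "http://node1:8080/solr", "state": "active"},
            "core_node2": {"base_url": "http://node3:8080/solr", "state": "active"},
        }},
    }},
}
print(sorted(nodes_for_collection(state, "website1")))
```

A client could round-robin over the returned URLs and refresh the cache on a ZooKeeper watch, instead of pinning one fixed IP.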
SolrCloud keeps repeating exception 'SolrCoreState already closed'
Hi,

I'm having a problem with one of my shards. Since yesterday, SOLR keeps repeating the same exception over and over for this shard. The web interface for this SOLR instance is also not working (it hangs on the Loading indicator).

Nov 7, 2013 9:08:12 AM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [website1_shard1_replica3] webapp=/solr path=/update params={update.distrib=TOLEADER&wt=javabin&version=2} {} 0 0
Nov 7, 2013 9:08:12 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.RuntimeException: SolrCoreState already closed
    at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:79)
    at org.apache.solr.update.DirectUpdateHandler2.delete(DirectUpdateHandler2.java:276)
    at org.apache.solr.update.processor.RunUpdateProcessor.processDelete(RunUpdateProcessorFactory.java:77)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processDelete(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalDelete(DistributedUpdateProcessor.java:460)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.versionDelete(DistributedUpdateProcessor.java:1036)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processDelete(DistributedUpdateProcessor.java:721)
    at org.apache.solr.update.processor.LogUpdateProcessor.processDelete(LogUpdateProcessorFactory.java:121)
    at org.apache.solr.handler.loader.XMLLoader.processDelete(XMLLoader.java:346)
    at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:277)
    at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
    at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:448)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:269)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
    at java.lang.Thread.run(Thread.java:662)

I have about 3GB of log files with this single message. Reloading the collection does not work, and reloading the specific shard core returns the same exception. The only option seems to be to restart the server. But because it's the leader for a lot of collections, I want to know why this is happening. I've seen this problem before, and I haven't figured out what is causing it.

I reported a different problem a few days ago with 'hanging' deleted tlog files. Could this be related? Could the hanging tlog files prevent a new Searcher from opening? I've updated two of my three hosts to 4.5.1, but after only 2 days of uptime I'm still seeing about 11,000 deleted tlog files in the lsof output.

Best regards,
Eric Bus
RE: How to remove a Solr Node and its cores from a cluster SolrCloud and from collection
Hi Sébastien,

Maybe this can help? "Add a collection admin command to remove a replica"
https://issues.apache.org/jira/browse/SOLR-5310

It's part of the new 4.6.0 release.

Best regards,
Eric

-----Original message-----
From: Seb Geek [mailto:geek...@gmail.com]
Sent: Friday, November 29, 2013 12:47 PM
To: solr-user@lucene.apache.org
Subject: How to remove a Solr Node and its cores from a cluster SolrCloud and from collection

Hello,

I have a cluster of 4 SolrCloud nodes (N1, N2, N3, N4) running Solr 4.5.1. One of these nodes (N4) has completely died (all CPU, RAM and disks are lost). I have added another node (N5) to the SolrCloud cluster and copied all the core configuration previously on node N4 to that node (solr.xml and core.properties in the data dir). The N5 node has replicated the full index of my collection and is already able to respond to requests for the cores (replicas) it owns. In the cluster state, I can still see the old replicas on the dead node N4. How can I remove these replicas from my collection?

Thanks,
Sébastien
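(For what it's worth, once on 4.6 the cleanup is a single Collections API call (DELETEREPLICA), sent to any live node. A small sketch of building the request URL; the host, collection, shard, and replica names here are made up.)

```python
from urllib.parse import urlencode

def delete_replica_url(base_url, collection, shard, replica):
    """Build the Solr 4.6+ Collections API call that removes a
    (dead) replica from the cluster state."""
    params = {"action": "DELETEREPLICA",
              "collection": collection,
              "shard": shard,
              "replica": replica}
    return "%s/admin/collections?%s" % (base_url, urlencode(params))

# Hypothetical names; `core_node4` would be the replica name shown
# in clusterstate.json for the dead node:
url = delete_replica_url("http://n1:8983/solr", "mycoll", "shard1", "core_node4")
print(url)
# -> http://n1:8983/solr/admin/collections?action=DELETEREPLICA&collection=mycoll&shard=shard1&replica=core_node4
```

Issuing that URL (e.g. with curl or a browser) against any live node removes the stale replica entry without touching the surviving nodes.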
RE: SolrCloud keeps repeating exception 'SolrCoreState already closed'
Are you currently running SOLR under Tomcat, or standalone with Jetty? I switched from Tomcat to Jetty and the problems went away.

- Eric

-----Original message-----
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com]
Sent: Tuesday, December 3, 2013 12:44 PM
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud keeps repeating exception 'SolrCoreState already closed'

I just ran into this issue on Solr 4.6 on an EC2 machine while indexing a Wikipedia dump with DIH. I'm trying to isolate exceptions before the SolrCoreState already closed exception.

On Sun, Nov 10, 2013 at 11:58 PM, Mark Miller wrote:
> Can you isolate any exceptions that happened just before that exception
> started repeating?
>
> - Mark
>
>> On Nov 7, 2013, at 9:09 AM, Eric Bus wrote:
>>
>> Hi,
>>
>> I'm having a problem with one of my shards. Since yesterday, SOLR keeps
>> repeating the same exception over and over for this shard.
>> The web interface for this SOLR instance is also not working (it hangs on
>> the Loading indicator).
>> [quoted log output and stack trace snipped; identical to the original message above]