Solr 5.2.1 - ReplicationHandler - No route to a host that is long gone
Hey guys!

We have a SolrCloud 5.2.1 deployment composed of 5 to 8 Amazon Linux EC2 c3.2xlarge instances. Our main core holds 4M docs (6GB) and we serve an average of 70 req/s per machine. We are using ZooKeeper 3.4.6 for cluster synchronization.

The thing is, we are noticing some weird "No route to host" exceptions in our logs. It seems that the ReplicationHandler is trying to contact another server that used to be the cluster leader but is long gone now. This behaviour is triggered when accessing this specific core's "Dashboard".

http://my-server/solr/admin/collections?action=clusterstatus tells me this former leader is down, so ZooKeeper knows about it. Any ideas on why the ReplicationHandler is still trying to contact it?

I'll attach the stacktrace just to illustrate the situation. Any help will be greatly appreciated. Thanks!

Best,

Eric

2015-Oct-06 18:18:02,446 [qtp1690716179-12764] org.apache.solr.handler.ReplicationHandler WARN Exception while invoking 'details' method for replication on master
org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://10.10.10.10:8983/solr/my-core
  at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:574)
  at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:235)
  at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:227)
  at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1220)
  at org.apache.solr.handler.IndexFetcher.getDetails(IndexFetcher.java:1563)
  at org.apache.solr.handler.ReplicationHandler.getReplicationDetails(ReplicationHandler.java:821)
  at org.apache.solr.handler.ReplicationHandler.handleRequestBody(ReplicationHandler.java:305)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
  at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
  at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
  at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
  at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
  at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
  at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
  at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
  at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
  at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
  at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
  at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
  at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
  at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
  at org.eclipse.jetty.server.Server.handle(Server.java:497)
  at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
  at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
  at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
  at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
  at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
  at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.NoRouteToHostException: No route to host
  at java.net.PlainSocketImpl.socketConnect(Native Method)
  at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)
  at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
  at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
  at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
  at java.net.Socket.connect(Socket.java:589)
  at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:117)
  at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:177)
  at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:304)
  at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:611)
  at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:446)
  at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
  at org.apache.http.impl.client.CloseableHttpCl
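For context on when this fires: the core's Dashboard in the admin UI asks the ReplicationHandler for its "details", which makes the handler poll the node it believes is the master (the trace above goes through ReplicationHandler.getReplicationDetails and IndexFetcher.getDetails). A sketch of the equivalent direct request, with host and core names as placeholders:

curl "http://my-server:8983/solr/my-core/replication?command=details&wt=json"

When the IndexFetcher's notion of the master still points at the departed leader, a call like this is what surfaces the NoRouteToHostException.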
Re: Pressed optimize and now SOLR is not indexing while optimize is going on
Hello Shawn,

I'm sorry to diverge this thread a little bit, but could you please point me to resources that explain in depth how the OS uses the non-Java memory to cache index data?

> Whatever RAM is left over after you give 12GB to Java for Solr will be
> used automatically by the operating system to cache index data on the
> disk. Solr is completely reliant on that caching for good performance.

I'm puzzled as to why the physical memory of Solr's host machine is always used up, and I think some resources on that would help me understand it.

Thanks

On Tue, Oct 6, 2015 at 5:07 PM, Siddhartha Singh Sandhu <sandhus...@gmail.com> wrote:
> Thank you for helping out.
>
> Further inquiry: I am committing records to my Solr implementation and they
> are not showing up in my search. I am searching on the default id.
> Is this related to the fact that I don't have enough memory, so my Solr is
> taking a lot of time to actually make the indexed documents available?
>
> I also looked at the Solr log when I sent in my curl commit with my
> record (which I cannot see in the Solr instance even after sending it
> repeatedly), but it didn't throw an error.
>
> I got this as my response on insertion of that record:
>
> {"responseHeader":{"status":0,"QTime":57}}
>
> Thank you.
>
> Sid.
>
> On Tue, Oct 6, 2015 at 3:21 PM, Shawn Heisey wrote:
>
> > On 10/6/2015 8:18 AM, Siddhartha Singh Sandhu wrote:
> > > I have a few questions about optimize. Is the search index fully
> > > searchable after a commit?
> >
> > If openSearcher is true on the commit, then changes to the index
> > (additions, replacements, deletions) will be visible when the commit
> > completes.
> >
> > > How much time does one have to wait in case of a hard commit for the
> > > index to be available?
> >
> > This is impossible to answer. It will take as long as it takes, and the
> > time will depend on many factors, so it is nearly impossible to
> > predict. The only way to know is to try it ... and the number you get
> > on one test may be very different from what you actually see once the
> > system is in production.
> >
> > > I have an index of 180G. Do I need to hit optimize on this chunk?
> > > This is a single core. Say I cannot get in a cloud env because of cost,
> > > but this is a fairly large Amazon machine where I have given Solr 12G
> > > of memory.
> >
> > Whatever RAM is left over after you give 12GB to Java for Solr will be
> > used automatically by the operating system to cache index data on the
> > disk. Solr is completely reliant on that caching for good performance.
> > A perfectly ideal system for that index and heap size would have 192GB
> > of RAM, which is enough to entirely cache the index. I personally
> > wouldn't expect good performance with less than 96GB. Some systems with
> > a 180GB index and a 12GB heap might be OK with 64GB total memory, while
> > others with the same size index will require more.
> >
> > https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
> >
> > If the index is on SSD, then RAM is *slightly* less important, and
> > performance usually goes up with SSD ... but an SSD cannot completely
> > replace RAM, because RAM is much faster. With SSD, you can get away
> > with less RAM than you can on a spinning disk system, but depending on
> > a bunch of factors, it may not be a LOT less RAM.
> >
> > https://wiki.apache.org/solr/SolrPerformanceProblems
> >
> > Optimizing the index is almost never necessary with recent versions. In
> > almost all cases optimizing will get you a performance increase, but it
> > comes at a huge cost in terms of resource utilization to DO the
> > optimize. While the optimize is happening, performance will likely be
> > worse, possibly a LOT worse. Newer versions of Solr (Lucene) have
> > closed the gap on performance with non-optimized indexes, so it doesn't
> > gain you as much in performance as it did in earlier versions.
> >
> > Thanks,
> > Shawn
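Since the visibility question above hinges on openSearcher, here is a minimal sketch of the relevant solrconfig.xml settings (the intervals are illustrative, not a recommendation from this thread):

<autoCommit>
  <maxTime>60000</maxTime>            <!-- hard commit every 60s, for durability -->
  <openSearcher>false</openSearcher>  <!-- do not open a new searcher on hard commit -->
</autoCommit>
<autoSoftCommit>
  <maxTime>120000</maxTime>           <!-- soft commit every 2 min opens a searcher -->
</autoSoftCommit>

With a configuration like this, newly indexed documents become visible when the soft commit fires, not when the hard commit does, which is why a successful insert (status 0) can still be absent from search results for a while.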
Re: Pressed optimize and now SOLR is not indexing while optimize is going on
Cool, Toke and Shawn! That's exactly what I was looking for. I'll have a look at those resources, and if something is still unclear I'll open a thread for it.

Thanks for the information,

Eric

On Wed, Oct 7, 2015 at 10:29 AM, Shawn Heisey wrote:
> On 10/7/2015 4:03 AM, Eric Torti wrote:
> > I'm sorry to diverge this thread a little bit, but could you please point
> > me to resources that explain in depth how the OS uses the non-Java
> > memory to cache index data?
> >
> >> Whatever RAM is left over after you give 12GB to Java for Solr will be
> >> used automatically by the operating system to cache index data on the
> >> disk. Solr is completely reliant on that caching for good performance.
> >
> > I'm puzzled as to why the physical memory of Solr's host machine is
> > always used up, and I think some resources on that would help me
> > understand it.
>
> Toke's reply is excellent, and describes the situation from Lucene's
> perspective. Solr is a Lucene program, so the same information applies.
>
> Here's more generic information on how the OS uses memory for caching
> for most programs:
>
> https://en.wikipedia.org/wiki/Page_cache
>
> Note that some programs, like MySQL and Microsoft Exchange, skip the OS
> cache and take care of caching internally.
>
> Thanks,
> Shawn
Is solr.StandardDirectoryFactory an MMapDirectory?
Hello,

I'm running a 5.2.1 SolrCloud cluster and I see that one of my cores is configured under solrconfig.xml to use:

<directoryFactory class="${solr.directoryFactory:solr.StandardDirectoryFactory}" name="DirectoryFactory"/>

I'm just starting to grasp the different strategies for Directory implementation. Can I assume that solr.StandardDirectoryFactory is an MMapDirectory as described by Uwe Schindler in this post about the use of virtual memory?
[http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html]

Thanks!

Best,

Eric Torti
Re: Is solr.StandardDirectoryFactory an MMapDirectory?
Thanks, Shawn.

> After a look at the code, I found that StandardDirectoryFactory should
> use MMap if the OS and Java version support it. If support isn't there,
> it will use conventional file access methods. As far as I know, all
> 64-bit Java versions and 64-bit operating systems will support MMap.

Considering our JVM is 64-bit, that probably explains why we're experiencing MMapDirectory-like behaviour on our cluster (i.e. high non-JVM-related memory use).

As to NRTCachingDirectoryFactory, when looking up the docs we were in doubt about what it means to have a "highish reopen rate":

> public class NRTCachingDirectory
>
> This class is likely only useful in a near-real-time context, where indexing
> rate is lowish but reopen rate is highish, resulting in many tiny files
> being written.

Can we read "high reopen rate" as "frequent soft commits"? (In our case, hard commits do not open a searcher, but soft commits do.)

Assuming it does mean "frequent soft commits", I'd say that it doesn't fit our setup, because we have an indexing rate of about 10 updates/s and we perform a soft commit every 15 min. So our scenario is not near real time in that sense. In light of this, do you think using NRTCachingDirectory is still convenient?

Best,

Eric

On Wed, Oct 7, 2015 at 12:08 PM, Shawn Heisey wrote:
> On 10/7/2015 8:48 AM, Eric Torti wrote:
>> <directoryFactory class="${solr.directoryFactory:solr.StandardDirectoryFactory}"
>> name="DirectoryFactory"/>
>>
>> I'm just starting to grasp the different strategies for Directory
>> implementation. Can I assume that solr.StandardDirectoryFactory is an
>> MMapDirectory as described by Uwe Schindler in this post about the use
>> of virtual memory?
>> [http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html]
>
> After a look at the code, I found that StandardDirectoryFactory should
> use MMap if the OS and Java version support it. If support isn't there,
> it will use conventional file access methods. As far as I know, all
> 64-bit Java versions and 64-bit operating systems will support MMap.
>
> The factory you *should* be using is NRTCachingDirectoryFactory, and you
> should enable the updateLog to ensure data reliability.
>
> Thanks,
> Shawn
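For reference, switching factories is a one-line change in solrconfig.xml. A sketch of the setting Shawn recommends (the system-property fallback mirrors the stock example configs):

<directoryFactory name="DirectoryFactory"
                  class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>

NRTCachingDirectory wraps the underlying (typically MMap-backed) directory, so the virtual-memory behaviour discussed above still applies to the segments that reach disk.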
Re: Is solr.StandardDirectoryFactory an MMapDirectory?
Correcting: When I mentioned high non-JVM memory usage, what I probably meant was high virtual memory allocation.

On Wed, Oct 7, 2015 at 3:00 PM, Eric Torti wrote:
> Thanks, Shawn.
>
>> After a look at the code, I found that StandardDirectoryFactory should
>> use MMap if the OS and Java version support it. If support isn't there,
>> it will use conventional file access methods. As far as I know, all
>> 64-bit Java versions and 64-bit operating systems will support MMap.
>
> Considering our JVM is 64-bit, that probably explains why we're
> experiencing MMapDirectory-like behaviour on our cluster (i.e. high
> non-JVM-related memory use).
>
> As to NRTCachingDirectoryFactory, when looking up the docs we were in
> doubt about what it means to have a "highish reopen rate":
>
>> public class NRTCachingDirectory
>>
>> This class is likely only useful in a near-real-time context, where indexing
>> rate is lowish but reopen rate is highish, resulting in many tiny files
>> being written.
>
> Can we read "high reopen rate" as "frequent soft commits"? (In our
> case, hard commits do not open a searcher, but soft commits do.)
>
> Assuming it does mean "frequent soft commits", I'd say that it doesn't
> fit our setup, because we have an indexing rate of about 10 updates/s
> and we perform a soft commit every 15 min. So our scenario is not near
> real time in that sense. In light of this, do you think using
> NRTCachingDirectory is still convenient?
>
> Best,
>
> Eric
>
> On Wed, Oct 7, 2015 at 12:08 PM, Shawn Heisey wrote:
>> On 10/7/2015 8:48 AM, Eric Torti wrote:
>>> <directoryFactory class="${solr.directoryFactory:solr.StandardDirectoryFactory}"
>>> name="DirectoryFactory"/>
>>>
>>> I'm just starting to grasp the different strategies for Directory
>>> implementation. Can I assume that solr.StandardDirectoryFactory is an
>>> MMapDirectory as described by Uwe Schindler in this post about the use
>>> of virtual memory?
>>> [http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html]
>>
>> After a look at the code, I found that StandardDirectoryFactory should
>> use MMap if the OS and Java version support it. If support isn't there,
>> it will use conventional file access methods. As far as I know, all
>> 64-bit Java versions and 64-bit operating systems will support MMap.
>>
>> The factory you *should* be using is NRTCachingDirectoryFactory, and you
>> should enable the updateLog to ensure data reliability.
>>
>> Thanks,
>> Shawn
Re: Is solr.StandardDirectoryFactory an MMapDirectory?
Ok, thanks Shawn! That makes sense. We'll be experimenting with it.

Best,

Eric

On Wed, Oct 7, 2015 at 5:54 PM, Shawn Heisey wrote:
> On 10/7/2015 12:00 PM, Eric Torti wrote:
>> Can we read "high reopen rate" as "frequent soft commits"? (In our
>> case, hard commits do not open a searcher, but soft commits do.)
>>
>> Assuming it does mean "frequent soft commits", I'd say that it doesn't
>> fit our setup, because we have an indexing rate of about 10 updates/s
>> and we perform a soft commit every 15 min. So our scenario is not near
>> real time in that sense. In light of this, do you think using
>> NRTCachingDirectory is still convenient?
>
> The NRT factory achieves high speed in NRT situations by flushing very
> small updates to RAM instead of the disk. As more updates come in,
> older index segments sitting in RAM will eventually be flushed to disk,
> so a sustained flood of updates doesn't really achieve a speed increase,
> but a short burst of updates will be searchable *very* quickly.
>
> NRTCachingDirectoryFactory was chosen for Solr examples (and I think
> it's the Solr default) because it has no real performance downsides, but
> has a strong possibility to be noticeably faster than the standard
> factory in NRT situations.
>
> The only problem with it is that small index segments from recent
> updates might only exist in RAM, and not get flushed to disk, so they
> would be lost if Solr dies or is killed suddenly. This is part of why
> the updateLog feature exists -- when Solr is started, the transaction
> logs will be replayed, inserting/replacing (at a minimum) all documents
> indexed since the last hard commit. When the replay is finished, you
> will not lose data. This does require a defined uniqueKey to operate
> correctly.
>
> Thanks,
> Shawn
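For anyone wiring this up, the updateLog Shawn mentions lives in the <updateHandler> section of solrconfig.xml. A minimal sketch matching the stock example configs:

<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>  <!-- empty default places the log under the core's data dir -->
  </updateLog>
</updateHandler>

With this in place, a node that dies suddenly replays its transaction log on startup, re-indexing (at a minimum) everything since the last hard commit; as Shawn notes, a uniqueKey must be defined for the replay to work correctly.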
Re: slow queries
Hi, Lorenzo,

I don't think this has a direct relation to your problem, but it looks like you're setting -DzkClientTimeout twice. From what I know about setting VM arguments twice, you're probably ending up with the last one being enforced. Just something to be aware of, I guess.

I don't think this relates to your problem because the GC pauses are not longer than 30s, which seems to be the time ZooKeeper would let a node be unresponsive before considering it in recovery.

Best,

Eric Torti

On Thu, Oct 15, 2015 at 6:51 AM, Lorenzo Fundaró wrote:
> On 14 October 2015 at 20:35, Pushkar Raste wrote:
>
>> You may want to start solr with the following settings to enable logging
>> GC details. Here are some flags you might want to enable.
>>
>> -Xloggc:/gc.log
>> -XX:+PrintGCDetails
>> -XX:+PrintGCDateStamps
>> -XX:+PrintGCTimeStamps
>> -XX:+PrintTenuringDistribution
>> -XX:+PrintGCApplicationStoppedTime
>> -XX:+PrintHeapAtGC
>>
>> Once you have GC logs, look for the string "Total time for which
>> application threads were stopped" to check if you have long pauses (you
>> may get long pauses even with young generation GC).
>
> Yes, there are several lines indicating that threads are being stopped.
> There is one in particular that drew my attention, because a second after
> it happened, 2 of my replicas went into recovery mode, including the one
> that suffered the thread stop.
>
> solr_gc.log.1.current:2015-10-15T07:47:03.263+0000: 251173.653: Total time
> for which application threads were stopped: 1.4936161 seconds, Stopping
> threads took: 0.502 seconds
>
> (Is a second of stopped threads enough to put a node in recovery mode?)
> When this happened, the leader had a couple of connection resets while
> trying to communicate with this replica.
>
> And on this server the highest stop takes 4s:
>
> solr_gc.log.1.current:2015-10-14T20:24:01.353+0000: 210191.743: Total time
> for which application threads were stopped: 4.0111066 seconds, Stopping
> threads took: 0.776 seconds
>
> These are the JVM flags:
>
> -XX:NewSize=256m -XX:MaxNewSize=256m
> /usr/lib/jvm/java-8-oracle/bin/java -server -Xss256k -Xms16g -Xmx16g
> -XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90
> -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
> -XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 -XX:+CMSScavengeBeforeRemark
> -XX:PretenureSizeThreshold=64m -XX:+UseCMSInitiatingOccupancyOnly
> -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000
> -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -verbose:gc
> -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps
> -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution
> -XX:+PrintGCApplicationStoppedTime -Xloggc:/solr_gc.log
> -DzkClientTimeout=15000 -Duser.timezone=UTC
> -Djava.net.preferIPv4Stack=true -DzkClientTimeout=3 -XX:NewSize=256m
> -XX:MaxNewSize=256m
>
> All of the options are the defaults that come with the solr startup
> script; only the ones specified in the first line are being put in by us.
>
>> --
>> Pushkar Raste
>>
>> On Wed, Oct 14, 2015 at 11:47 AM, Lorenzo Fundaró <
>> lorenzo.fund...@dawandamail.com> wrote:
>>
>> > <<... &debug=true to the query?>>
>> >
>> > "debug": {
>> >   "rawquerystring": "*:*",
>> >   "querystring": "*:*",
>> >   "parsedquery": "(+MatchAllDocsQuery(*:*))/no_coord",
>> >   "parsedquery_toString": "+*:*",
>> >   "explain": {
>> >     "Product:47047358": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n",
>> >     "Product:3223": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n",
>> >     "Product:30852121": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n",
>> >     "Product:35018929": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n",
>> >     "Product:31682082": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n",
>> >     "Product:31077677": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n",
>> >     "Product:22298365": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n",
>> >     "Product:41094514": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n",
>> >     "Product:13106166": "\n1.0 = (MATCH) Matc
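A quick way to pull the worst pauses out of a GC log written with the flags above (a sketch, assuming the log lives wherever -Xloggc points, solr_gc.log here):

grep -o 'stopped: [0-9.]* seconds' solr_gc.log | sort -k2 -rn | head

This lists the longest "application threads were stopped" times first; any pause approaching the effective zkClientTimeout is a candidate for pushing a replica into recovery.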
`cat /dev/null > solr-8983-console.log` frees host's memory
Hey guys!

I had a 52GB solr-8983-console.log on my Solr 5.2.1 Amazon Linux 64-bit box and decided to `cat /dev/null > solr-8983-console.log` to free disk space.

The weird thing is that when I checked Sematext, I noticed the OS had freed a lot of memory at the exact instant I did that.

https://dl.dropboxusercontent.com/u/10240770/disk-space.png
https://dl.dropboxusercontent.com/u/10240770/memory-usage.png

I'm really not very familiar with how the OS allocates memory, but it seemed weird that truncating a file would actually free some of it. Any pointers on what I'm missing here would be greatly appreciated.

Best,

Eric Torti
Re: `cat /dev/null > solr-8983-console.log` frees host's memory
Thank you Shawn, Timothy, Emir and Rajani.

Sorry, Shawn, I ended up cropping out the legend, but you were right in your guess. Indeed, Timothy, this log is completely redundant. Will get rid of it soon.

I'll look into the resources you all pointed out. Thanks!

Best,

Eric Torti

On Wed, Oct 21, 2015 at 8:21 PM, Rajani Maski wrote:
> The details in this link[1] might be of help.
>
> [1] https://support.lucidworks.com/hc/en-us/articles/207072137
>
> On Wed, Oct 21, 2015 at 7:42 AM, Emir Arnautovic <
> emir.arnauto...@sematext.com> wrote:
>
>> Hi Eric,
>> As Shawn explained, the memory is freed because it was used to cache a
>> portion of the log file.
>>
>> Since you are already with Sematext, I guess you are aware, but it
>> doesn't hurt to remind you that we also have Logsene that you can use to
>> manage your logs: http://sematext.com/logsene/index.html
>>
>> Thanks,
>> Emir
>>
>> --
>> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
>> Solr & Elasticsearch Support * http://sematext.com/
>>
>> On 20.10.2015 17:42, Shawn Heisey wrote:
>>> On 10/20/2015 9:19 AM, Eric Torti wrote:
>>>> I had a 52GB solr-8983-console.log on my Solr 5.2.1 Amazon Linux
>>>> 64-bit box and decided to `cat /dev/null > solr-8983-console.log` to
>>>> free disk space.
>>>>
>>>> The weird thing is that when I checked Sematext, I noticed the OS had
>>>> freed a lot of memory at the exact instant I did that.
>>>
>>> On that memory graph, the legend doesn't indicate which of the graph
>>> colors represents each of the four usage types at the top -- they all
>>> have blue checkboxes, so I can't tell for sure what changed.
>>>
>>> If the number that dropped is "cached" (which I think is likely) then
>>> everything is working exactly as it should. The OS had simply cached a
>>> large chunk of the logfile, exactly as it is designed to do, and once
>>> the file was truncated, it stopped reserving that memory and made it
>>> available.
>>>
>>> https://en.wikipedia.org/wiki/Page_cache
>>>
>>> Thanks,
>>> Shawn
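To see the page-cache behaviour Shawn describes first-hand, here is a small sketch for a Linux shell (the file name is illustrative):

free -m                                  # note the "cached" (or "buff/cache") column
cat solr-8983-console.log > /dev/null    # reading the file pulls it into the page cache
cat /dev/null > solr-8983-console.log    # truncating it lets the kernel drop those pages
free -m                                  # "cached" shrinks; "free" grows by about the same amount

Cached memory was never unavailable to applications, since the kernel reclaims it on demand; this is why a Solr box whose physical memory looks "used up" is usually perfectly healthy.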