Hi,

I've been seeing the same thing on CentOS: high physical memory use alongside low JVM-Memory use. I came to the conclusion that this is expected behaviour. Using top I noticed that my solr user's java process has virtual memory allocated of about twice the size of the index, while its actual (resident) memory stays within the limits I set when Jetty starts. I infer from this that most of that 98% of physical memory is being used to cache the index. Walter, Erick and others are constantly reminding people on the list to have RAM the size of the index available -- I think 98% physical memory use is exactly why.

Here is an excerpt from Uwe Schindler's well-written piece <http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html> which explains this in greater detail:

*"Basically mmap does the same like handling the Lucene index as a swap file. The mmap() syscall tells the O/S kernel to virtually map our whole index files into the previously described virtual address space, and make them look like RAM available to our Lucene process. We can then access our index file on disk just like it would be a large byte[] array (in Java this is encapsulated by a ByteBuffer interface to make it safe for use by Java code). If we access this virtual address space from the Lucene code we don’t need to do any syscalls, the processor’s MMU and TLB handles all the mapping for us. If the data is only on disk, the MMU will cause an interrupt and the O/S kernel will load the data into file system cache. If it is already in cache, MMU/TLB map it directly to the physical memory in file system cache. It is now just a native memory access, nothing more! We don’t have to take care of paging in/out of buffers, all this is managed by the O/S kernel. Furthermore, we have no concurrency issue, the only overhead over a standard byte[] array is some wrapping caused by Java’s ByteBuffer interface (it is still slower than a real byte[] array, but that is the only way to use mmap from Java and is much faster than all other directory implementations shipped with Lucene). We also waste no physical memory, as we operate directly on the O/S cache, avoiding all Java GC issues described before."*
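To make the mmap part concrete, here is a tiny sketch of my own (not Lucene's actual code, and the index file path is made up) showing roughly what MMapDirectory asks the OS to do with each index file:

    import java.io.IOException;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class MmapSketch {
        public static void main(String[] args) throws IOException {
            // Map a (hypothetical) index file read-only into our virtual address space.
            try (FileChannel channel = FileChannel.open(
                    Paths.get("/path/to/index/_0.cfs"), StandardOpenOption.READ)) {
                MappedByteBuffer buf =
                        channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
                // The whole file now counts toward the process's VIRT size, but nothing
                // is resident until a read touches a page; the kernel then loads that
                // page into the file system cache and serves it as ordinary memory.
                byte firstByte = buf.get(0);
                System.out.println("first byte: " + firstByte);
            }
        }
    }

Because every mapped file is counted in full against the process's virtual size as soon as it is mapped, top's VIRT column can easily be index size plus heap plus shared libraries, which I believe is part of why the numbers look so big.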
*"Basically mmap does the same like handling the Lucene index as a swap file. The mmap() syscall tells the O/S kernel to virtually map our whole index files into the previously described virtual address space, and make them look like RAM available to our Lucene process. We can then access our index file on disk just like it would be a large byte[] array (in Java this is encapsulated by a ByteBuffer interface to make it safe for use by Java code). If we access this virtual address space from the Lucene code we don’t need to do any syscalls, the processor’s MMU and TLB handles all the mapping for us. If the data is only on disk, the MMU will cause an interrupt and the O/S kernel will load the data into file system cache. If it is already in cache, MMU/TLB map it directly to the physical memory in file system cache. It is now just a native memory access, nothing more! We don’t have to take care of paging in/out of buffers, all this is managed by the O/S kernel. Furthermore, we have no concurrency issue, the only overhead over a standard byte[] array is some wrapping caused by Java’s ByteBuffer interface (it is still slower than a real byte[] array, but that is the only way to use mmap from Java and is much faster than all other directory implementations shipped with Lucene). We also waste no physical memory, as we operate directly on the O/S cache, avoiding all Java GC issues described before."* * * Is it odd that my index is ~16GB but top shows 30GB in virtual memory? Would the extra be for the field and filter caches I've increased in size? I went through a few Java tuning steps relating to OutOfMemoryErrors when using DataImportHandler with Solr. The first thing is that when using the FileEntityProcessor for each file in the file system to be indexed an entry is made and stored in heap before any indexing actually occurs. When I started pointing this at very large directories I started running out of heap. One work-around is to divide the job up into smaller batches, but I was able to allocate more memory so that everything fit. The next thing is that with more memory allocated the limiting factor was too many open files. After allowing the solr user to open more files I was able to get past this as well. There was a sweet spot where indexing with just enough memory was slow enough that I didn't experience the too many open files error but why go slow? Now I'm able to index ~4M documents (newspaper articles and fulltext monographs) in about 7 hours. I hope someone will correct me if I'm wrong about anything I've said here and especially if there is a better way to do things. Best of luck, Tricia On Wed, Aug 28, 2013 at 12:12 PM, Dan Davis <dansm...@gmail.com> wrote: > This could be an operating systems problem rather than a Solr problem. > CentOS 6.4 (linux kernel 2.6.32) may have some issues with page flushing > and I would read-up up on that. > The VM parameters can be tuned in /etc/sysctl.conf > > > On Sun, Aug 25, 2013 at 4:23 PM, Furkan KAMACI <furkankam...@gmail.com > >wrote: > > > Hi Erick; > > > > I wanted to get a quick answer that's why I asked my question as that > way. 
I hope someone will correct me if I'm wrong about anything I've said here, and especially if there is a better way to do things.

Best of luck,
Tricia

On Wed, Aug 28, 2013 at 12:12 PM, Dan Davis <dansm...@gmail.com> wrote:

> This could be an operating systems problem rather than a Solr problem.
> CentOS 6.4 (Linux kernel 2.6.32) may have some issues with page flushing,
> and I would read up on that.
> The VM parameters can be tuned in /etc/sysctl.conf.
>
>
> On Sun, Aug 25, 2013 at 4:23 PM, Furkan KAMACI <furkankam...@gmail.com> wrote:
>
> > Hi Erick;
> >
> > I wanted to get a quick answer; that's why I asked my question that way.
> >
> > Error is as follows:
> >
> > INFO - 2013-08-21 22:01:30.978; org.apache.solr.update.processor.LogUpdateProcessor; [collection1] webapp=/solr path=/update params={wt=javabin&version=2} {add=[com.deviantart.reachmehere:http/gallery/, com.deviantart.reachstereo:http/, com.deviantart.reachstereo:http/art/SE-mods-313298903, com.deviantart.reachtheclouds:http/, com.deviantart.reachthegoddess:http/, com.deviantart.reachthegoddess:http/art/retouched-160219962, com.deviantart.reachthegoddess:http/badges/, com.deviantart.reachthegoddess:http/favourites/, com.deviantart.reachthetop:http/art/Blue-Jean-Baby-82204657 (1444006227844530177), com.deviantart.reachurdreams:http/, ... (163 adds)]} 0 38790
> > ERROR - 2013-08-21 22:01:30.979; org.apache.solr.common.SolrException; java.lang.RuntimeException: [was class org.eclipse.jetty.io.EofException] early EOF
> >   at com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
> >   at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
> >   at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
> >   at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
> >   at org.apache.solr.handler.loader.XMLLoader.readDoc(XMLLoader.java:393)
> >   at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:245)
> >   at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
> >   at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
> >   at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> >   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> >   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1812)
> >   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
> >   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
> >   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
> >   at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
> >   at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
> >   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
> >   at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
> >   at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
> >   at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
> >   at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
> >   at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
> >   at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
> >   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
> >   at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
> >   at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
> >   at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
> >   at org.eclipse.jetty.server.Server.handle(Server.java:365)
> >   at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
> >   at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
> >   at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:937)
> >   at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:998)
> >   at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:948)
> >   at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
> >   at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
> >   at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
> >   at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
> >   at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
> >   at java.lang.Thread.run(Thread.java:722)
> > Caused by: org.eclipse.jetty.io.EofException: early EOF
> >   at org.eclipse.jetty.server.HttpInput.read(HttpInput.java:65)
> >   at java.io.InputStream.read(InputStream.java:101)
> >   at com.ctc.wstx.io.UTF8Reader.loadMore(UTF8Reader.java:365)
> >   at com.ctc.wstx.io.UTF8Reader.read(UTF8Reader.java:110)
> >   at com.ctc.wstx.io.MergedReader.read(MergedReader.java:101)
> >   at com.ctc.wstx.io.ReaderSource.readInto(ReaderSource.java:84)
> >   at com.ctc.wstx.io.BranchingReaderSource.readInto(BranchingReaderSource.java:57)
> >   at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:992)
> >   at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4628)
> >   at com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126)
> >   at com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701)
> >   at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649)
> >   ... 36 more
> >
> > ERROR - 2013-08-21 22:01:30.980; org.apache.solr.common.SolrException; null:java.lang.RuntimeException: [was class org.eclipse.jetty.io.EofException] early EOF
> >   at com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
> >   at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
> >   at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
> >   at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
> >   at org.apache.solr.handler.loader.XMLLoader.readDoc(XMLLoader.java:393)
> >   at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:245)
> >   at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
> >   at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
> >   at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> >   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> >   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1812)
> >   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
> >   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
> >
> > I use Nutch (which uses Hadoop) to send documents from HBase to Solr. I am
> > not indexing documents in Hadoop.
> > I just send documents via Map/Reduce jobs into my SolrCloud. Nutch sends documents like this:
> >
> > ...
> > SolrServer solr = new CommonsHttpSolrServer(solrUrl);
> > ...
> > private final List<SolrInputDocument> inputDocs = new ArrayList<SolrInputDocument>();
> > ...
> > solr.add(inputDocs);
> > ...
> >
> > inputDocs holds a maximum of 1000 documents. After I add inputDocs into the Solr
> > server I clear the inputDocs list, then add the next 1000 documents to it, until every
> > document has been sent to SolrCloud. When all documents have been sent I call the commit command.
> >
> > My Hadoop job could not send documents into SolrCloud and stops sending documents
> > to Solr (the Hadoop job fails). When I open my Solr Admin Page I see this:
> >
> > Physical Memory        98.1%
> > Swap Space             NaN%
> > File Descriptor Count  2.5%
> > JVM-Memory             1.6%
> >
> > All in all, I think the problem is physical memory. I stopped indexing and physical
> > memory usage is still the same (it does not go down). My machine uses CentOS 6.4.
> > Should I drop caches when the percentage goes up, or what do you do in such situations?
> >
> >
> > 2013/8/24 Erick Erickson <erickerick...@gmail.com>
> >
> > > This is sounding like an XY problem. What are you measuring
> > > when you say RAM usage is 99%? Is this virtual memory? See:
> > > http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
> > >
> > > What errors are you seeing when you say "my node stops receiving
> > > documents"?
> > >
> > > How are you sending 10M documents? All at once in a huge packet
> > > or some smaller number at a time? From where? How?
> > >
> > > And what does Hadoop have to do with anything? Are you putting
> > > the Solr index on Hadoop? How? The recent contrib?
> > >
> > > In short, you haven't provided very many details. You've been around
> > > long enough that I'm surprised you're saying "it doesn't work, how can
> > > I fix it?" without providing much in the way of details to help us help
> > > you.
> > >
> > > Best
> > > Erick
> > >
> > >
> > > On Sat, Aug 24, 2013 at 1:52 PM, Furkan KAMACI <furkankam...@gmail.com> wrote:
> > >
> > > > I ran a test on my SolrCloud. I tried to send 100 million documents into my
> > > > node, which has no replica, via Hadoop. When the document count sent to that
> > > > node is around 30 million, the RAM usage of my machine reaches 99% (Solr heap
> > > > usage is not 99%, it uses just 3GB - 4GB of RAM). Some time later my node
> > > > stops receiving documents to index and the Indexer Job fails as well.
> > > >
> > > > How can I force the OS cache to be cleaned (if it is the OS cache that blocks me),
> > > > or what should I do (maybe send 10 million documents and wait a little, etc.)?
> > > > What do people do in heavy indexing situations?