Hi,

I've been seeing the same thing on CentOS: high physical memory use with
low JVM memory use.  I came to the conclusion that this is expected
behaviour.  Using top I noticed that my solr user's java process has
virtual memory allocated at about twice the size of the index, while
resident memory stays within the limits I set when Jetty starts.  I infer
from this that 98% of physical memory is being used to cache the index.
Walter, Erick and others are constantly reminding people on the list to
have RAM the size of the index available -- I think 98% physical memory
use is exactly why.  Here is an excerpt from Uwe Schindler's well-written
piece <http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html>
which explains it in greater detail:

*"Basically mmap does the same like handling the Lucene index as a swap
file. The mmap() syscall tells the O/S kernel to virtually map our whole
index files into the previously described virtual address space, and make
them look like RAM available to our Lucene process. We can then access our
index file on disk just like it would be a large byte[] array (in Java this
is encapsulated by a ByteBuffer interface to make it safe for use by Java
code). If we access this virtual address space from the Lucene code we
don’t need to do any syscalls, the processor’s MMU and TLB handles all the
mapping for us. If the data is only on disk, the MMU will cause an
interrupt and the O/S kernel will load the data into file system cache. If
it is already in cache, MMU/TLB map it directly to the physical memory in
file system cache. It is now just a native memory access, nothing more! We
don’t have to take care of paging in/out of buffers, all this is managed by
the O/S kernel. Furthermore, we have no concurrency issue, the only
overhead over a standard byte[] array is some wrapping caused by
Java’s ByteBuffer
interface (it is still slower than a real byte[] array, but that is the
only way to use mmap from Java and is much faster than all other directory
implementations shipped with Lucene). We also waste no physical memory, as
we operate directly on the O/S cache, avoiding all Java GC issues described
before."*
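To make the excerpt concrete, here is a minimal, self-contained Java sketch
(mine, not from Uwe's post; the temp-file name is made up) showing how
FileChannel.map exposes a file's contents as if they were a byte[] in
memory -- the same mechanism Lucene's MMapDirectory uses:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapDemo {
    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for a Lucene segment file.
        Path p = Files.createTempFile("mmap-demo", ".bin");
        Files.write(p, "hello mmap".getBytes(StandardCharsets.UTF_8));

        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.READ)) {
            // The file is mapped into virtual address space; the O/S pages it
            // in on demand. It counts toward VIRT in top, not necessarily RES.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte[] bytes = new byte[buf.remaining()];
            buf.get(bytes);   // reads go straight to the file system cache
            System.out.println(new String(bytes, StandardCharsets.UTF_8));
        }
        Files.deleteIfExists(p);
    }
}
```

Run it and the mapped bytes print straight back; the mapping itself never
shows up as Java heap, which is exactly the VIRT-vs-heap gap described above.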
Is it odd that my index is ~16GB but top shows 30GB in virtual memory?
 Would the extra be for the field and filter caches I've increased in size?
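One way to see where virtual space goes is to look at the process's own
VIRT/RES numbers. Here is a small Linux-only Java sketch (my illustration,
nothing Solr-specific) that reads /proc/self/status, where VmSize and VmRSS
are the same distinction top makes between VIRT and RES:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class VmStats {
    public static void main(String[] args) throws IOException {
        // Linux-only: /proc/self/status reports this process's virtual
        // (VmSize) and resident (VmRSS) memory in kB.
        for (String line : Files.readAllLines(Paths.get("/proc/self/status"))) {
            if (line.startsWith("VmSize") || line.startsWith("VmRSS")) {
                System.out.println(line);
            }
        }
    }
}
```

Pointed at the Solr process (e.g. /proc/<pid>/status), the VmSize figure
would include every mapped index file, which is why it can exceed the index
size plus the heap.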

I went through a few Java tuning steps relating to OutOfMemoryErrors when
using the DataImportHandler with Solr.  The first thing is that when using
the FileEntityProcessor, an entry is made and stored in heap for each file
in the file system to be indexed before any indexing actually occurs.  When
I started pointing this at very large directories I began running out of
heap.  One work-around is to divide the job into smaller batches, but I was
able to allocate more memory so that everything fit.  The next thing is
that with more memory allocated, the limiting factor became too many open
files.  After allowing the solr user to open more files I was able to get
past this as well.  There was a sweet spot where indexing with just enough
memory was slow enough that I didn't hit the too-many-open-files error --
but why go slow?  Now I'm able to index ~4M documents (newspaper articles
and full-text monographs) in about 7 hours.
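The divide-into-batches work-around can be sketched generically.  This is
my own minimal Java illustration, not the DataImportHandler API; the
`indexBatch` callback stands in for whatever actually indexes a batch, and
only one batch's worth of entries is held at a time:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchWalker {
    // Split a (possibly huge) list of file names into fixed-size batches so
    // the whole listing never has to sit on the heap at once.
    static int processInBatches(List<String> files, int batchSize,
                                Consumer<List<String>> indexBatch) {
        int batches = 0;
        for (int i = 0; i < files.size(); i += batchSize) {
            List<String> batch =
                files.subList(i, Math.min(i + batchSize, files.size()));
            indexBatch.accept(batch);   // e.g. hand this batch to the indexer
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> fake = new ArrayList<>();
        for (int i = 0; i < 10; i++) fake.add("doc" + i + ".xml");
        int n = processInBatches(fake, 3, b -> System.out.println(b));
        System.out.println(n + " batches");
    }
}
```

With 10 files and a batch size of 3 this yields four batches; in practice
the batch size is whatever keeps the per-batch bookkeeping inside your heap.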

I hope someone will correct me if I'm wrong about anything I've said here
and especially if there is a better way to do things.

Best of luck,
Tricia



On Wed, Aug 28, 2013 at 12:12 PM, Dan Davis <dansm...@gmail.com> wrote:

> This could be an operating-system problem rather than a Solr problem.
> CentOS 6.4 (Linux kernel 2.6.32) may have some issues with page flushing,
> and I would read up on that.
> The VM parameters can be tuned in /etc/sysctl.conf
>
>
> On Sun, Aug 25, 2013 at 4:23 PM, Furkan KAMACI <furkankam...@gmail.com
> >wrote:
>
> > Hi Erick;
> >
> > I wanted to get a quick answer that's why I asked my question as that
> way.
> >
> > Error is as follows:
> >
> > INFO  - 2013-08-21 22:01:30.978; org.apache.solr.update.processor.LogUpdateProcessor; [collection1] webapp=/solr path=/update params={wt=javabin&version=2} {add=[com.deviantart.reachmehere:http/gallery/, com.deviantart.reachstereo:http/, com.deviantart.reachstereo:http/art/SE-mods-313298903, com.deviantart.reachtheclouds:http/, com.deviantart.reachthegoddess:http/, com.deviantart.reachthegoddess:http/art/retouched-160219962, com.deviantart.reachthegoddess:http/badges/, com.deviantart.reachthegoddess:http/favourites/, com.deviantart.reachthetop:http/art/Blue-Jean-Baby-82204657 (1444006227844530177), com.deviantart.reachurdreams:http/, ... (163 adds)]} 0 38790
> > ERROR - 2013-08-21 22:01:30.979; org.apache.solr.common.SolrException; java.lang.RuntimeException: [was class org.eclipse.jetty.io.EofException] early EOF
> >     at com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
> >     at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
> >     at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
> >     at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
> >     at org.apache.solr.handler.loader.XMLLoader.readDoc(XMLLoader.java:393)
> >     at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:245)
> >     at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
> >     at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
> >     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> >     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> >     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1812)
> >     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
> >     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
> >     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
> >     at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
> >     at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
> >     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
> >     at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
> >     at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
> >     at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
> >     at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
> >     at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
> >     at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
> >     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
> >     at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
> >     at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
> >     at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
> >     at org.eclipse.jetty.server.Server.handle(Server.java:365)
> >     at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
> >     at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
> >     at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:937)
> >     at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:998)
> >     at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:948)
> >     at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
> >     at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
> >     at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
> >     at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
> >     at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
> >     at java.lang.Thread.run(Thread.java:722)
> > Caused by: org.eclipse.jetty.io.EofException: early EOF
> >     at org.eclipse.jetty.server.HttpInput.read(HttpInput.java:65)
> >     at java.io.InputStream.read(InputStream.java:101)
> >     at com.ctc.wstx.io.UTF8Reader.loadMore(UTF8Reader.java:365)
> >     at com.ctc.wstx.io.UTF8Reader.read(UTF8Reader.java:110)
> >     at com.ctc.wstx.io.MergedReader.read(MergedReader.java:101)
> >     at com.ctc.wstx.io.ReaderSource.readInto(ReaderSource.java:84)
> >     at com.ctc.wstx.io.BranchingReaderSource.readInto(BranchingReaderSource.java:57)
> >     at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:992)
> >     at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4628)
> >     at com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126)
> >     at com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701)
> >     at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649)
> >     ... 36 more
> >
> > ERROR - 2013-08-21 22:01:30.980; org.apache.solr.common.SolrException; null:java.lang.RuntimeException: [was class org.eclipse.jetty.io.EofException] early EOF
> >     at com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
> >     at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
> >     at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
> >     at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
> >     at org.apache.solr.handler.loader.XMLLoader.readDoc(XMLLoader.java:393)
> >     at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:245)
> >     at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
> >     at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
> >     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> >     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> >     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1812)
> >     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
> >     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
> >
> > I use Nutch (which uses Hadoop) to send documents from HBase to Solr. I am
> > not indexing documents in Hadoop; I just send documents via Map/Reduce jobs
> > into my SolrCloud. Nutch sends documents like this:
> >
> > ...
> > SolrServer solr = new CommonsHttpSolrServer(solrUrl);
> > ...
> > private final List<SolrInputDocument> inputDocs = new ArrayList<SolrInputDocument>();
> > ...
> > solr.add(inputDocs);
> > ...
> >
> > inputDocs holds a maximum of 1000 documents. After I add inputDocs to the
> > Solr server I clear the list, then add the next 1000 documents to it,
> > until every document has been sent to SolrCloud. When all documents have
> > been sent I call the commit command.
> >
> > My Hadoop job could not send documents into SolrCloud and stopped
> > sending documents to Solr (the Hadoop job fails). When I open my Solr
> > Admin page I see this:
> >
> > Physical Memory  98.1%
> > Swap Space NaN%
> > File Descriptor Count 2.5%
> > JVM-Memory 1.6%
> >
> > All in all, I think the problem is physical memory. I stopped indexing
> > and physical memory usage is still the same (it does not go down). My
> > machine runs CentOS 6.4. Should I drop caches when the percentage goes
> > up, or what do you do in this kind of situation?
> >
> >
> >
> > 2013/8/24 Erick Erickson <erickerick...@gmail.com>
> >
> > > This is sounding like an XY problem. What are you measuring
> > > when you say RAM usage is 99%? is this virtual memory? See:
> > >
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
> > >
> > > What errors are you seeing when you say: "my node stops to receiving
> > > documents"?
> > >
> > > How are you sending 10M documents? All at once in a huge packet
> > > or some smaller number at a time? From where? How?
> > >
> > > And what does Hadoop have to do with anything? Are you putting
> > > the Solr index on Hadoop? How? The recent contrib?
> > >
> > > In short, you haven't provided very many details. You've been around
> > > long enough that I'm surprised you're saying "it doesn't work, how can
> > > I fix it?" without providing much in the way of details to help us help
> > > you.
> > >
> > > Best
> > > Erick
> > >
> > >
> > >
> > > On Sat, Aug 24, 2013 at 1:52 PM, Furkan KAMACI <furkankam...@gmail.com
> > > >wrote:
> > >
> > > > I ran a test on my SolrCloud: I tried to send 100 million documents
> > > > via Hadoop into my node, which has no replica. When the document
> > > > count sent to that node reached around 30 million, RAM usage on my
> > > > machine became 99% (Solr heap usage is not 99%; it uses just 3GB -
> > > > 4GB of RAM). A while later my node stopped receiving documents to
> > > > index and the Indexer job failed as well.
> > > >
> > > > How can I force the OS cache to be cleared (if it is the OS cache
> > > > that is blocking me), or what else should I do (maybe send 10 million
> > > > documents and wait a little, etc.)? What do folks do in heavy
> > > > indexing situations?
> > > >
> > >
> >
>
