I remember there is another implementation that uses the Lucene index file as the
lookup table instead of the in-memory FST. The FST has the advantage in speed, but
if you write documents at runtime, reconstructing the FST may cause performance
problems.
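
For reference, the rebuild that is failing in the stack traces below boils down to
something like this (a minimal sketch against the 3.x suggester API shown in the
traces; the index path and field name here are made up):

import java.io.File;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.spell.HighFrequencyDictionary;
import org.apache.lucene.search.suggest.Lookup;
import org.apache.lucene.search.suggest.fst.FSTLookup;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class RebuildSketch {
    public static void main(String[] args) throws Exception {
        Directory dir = FSDirectory.open(new File("data/index")); // illustrative path
        IndexReader reader = IndexReader.open(dir);
        try {
            // build() iterates every term of the field and reconstructs the
            // whole automaton in RAM each time -- with ~2.8M long composite
            // terms, that is where the OutOfMemoryErrors below come from.
            Lookup lookup = new FSTLookup();
            lookup.build(new HighFrequencyDictionary(reader, "name", 0.0f)); // "name" is illustrative
        } finally {
            reader.close();
        }
    }
}
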
On Tue, Jan 17, 2012 at 11:08 AM, Robert Muir <rcm...@gmail.com> wrote:
> looks like https://issues.apache.org/jira/browse/SOLR-2888.
>
> Previously, FST would need to hold all the terms in RAM during
> construction, but with the patch it uses offline sorts/temporary
> files.
> I'll reopen the issue to backport this to the 3.x branch.
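
Conceptually, that patch replaces "hold every term in a RAM array, sort it, then
build" with "spill terms to temporary files, sort them on disk, and stream the
sorted result into the FST builder", which works because the FST builder only
requires its input in sorted order. A rough illustration of the offline-sort idea
(my own sketch, not the actual patch code):

import java.io.*;
import java.util.*;

public class OfflineSortSketch {

    // Phase 1: sort terms a bounded chunk at a time, spilling each sorted
    // chunk to its own temporary file, so the heap never holds more than
    // `chunk` terms at once.
    static List<File> spillSortedChunks(BufferedReader in, int chunk) throws IOException {
        List<File> spills = new ArrayList<File>();
        List<String> buf = new ArrayList<String>(chunk);
        String term;
        while ((term = in.readLine()) != null) {
            buf.add(term);
            if (buf.size() == chunk) { spills.add(writeSorted(buf)); buf.clear(); }
        }
        if (!buf.isEmpty()) spills.add(writeSorted(buf));
        return spills;
    }

    static File writeSorted(List<String> buf) throws IOException {
        Collections.sort(buf);
        File f = File.createTempFile("fst-sort", ".tmp");
        f.deleteOnExit();
        PrintWriter w = new PrintWriter(new FileWriter(f));
        for (String s : buf) w.println(s);
        w.close();
        return f;
    }

    // Phase 2: k-way merge of the sorted spill files. An FST builder could
    // consume this stream directly, since it only needs sorted input.
    static void merge(List<File> spills, PrintWriter out) throws IOException {
        final List<BufferedReader> readers = new ArrayList<BufferedReader>();
        // queue entries are {currentTerm, readerIndex}, ordered by term
        PriorityQueue<String[]> pq = new PriorityQueue<String[]>(
                Math.max(1, spills.size()), new Comparator<String[]>() {
                    public int compare(String[] a, String[] b) { return a[0].compareTo(b[0]); }
                });
        for (File f : spills) {
            BufferedReader r = new BufferedReader(new FileReader(f));
            readers.add(r);
            String first = r.readLine();
            if (first != null) pq.add(new String[] { first, Integer.toString(readers.size() - 1) });
        }
        while (!pq.isEmpty()) {
            String[] top = pq.poll();
            out.println(top[0]); // next term in global sorted order
            String next = readers.get(Integer.parseInt(top[1])).readLine();
            if (next != null) pq.add(new String[] { next, top[1] });
        }
        for (BufferedReader r : readers) r.close();
    }

    public static void main(String[] args) throws IOException {
        // reads unsorted terms on stdin, writes them sorted on stdout
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        PrintWriter out = new PrintWriter(new OutputStreamWriter(System.out));
        merge(spillSortedChunks(in, 100000), out);
        out.flush();
    }
}
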
> On Mon, Jan 16, 2012 at 8:31 PM, Dave <dla...@gmail.com> wrote:
> > I'm trying to figure out what my memory needs are for a rather large
> > dataset. I'm trying to build an auto-complete system for every
> > city/state/country in the world. I've got a geographic database, and have
> > set up the DIH to pull the proper data in. There are 2,784,937 documents,
> > which I've formatted into JSON-like output, so there's a bit of data
> > associated with each one. Here is an example record:
> >
> > Brooklyn, New York, United States?{ "id": "2620829", "timezone":
> > "America/New_York", "type": "3", "country": { "id": "229" }, "region":
> > { "id": "3608" }, "city": { "id": "2616971", "plainname": "Brooklyn",
> > "name": "Brooklyn, New York, United States" }, "hint": "2300664",
> > "label": "Brooklyn, New York, United States", "value": "Brooklyn, New
> > York, United States", "title": "Brooklyn, New York, United States" }
> >
> > I've got the spellchecker / suggester module set up, and I can confirm
> > that everything works properly with a smaller dataset (i.e. just a couple
> > of countries' worth of cities/states). However, I'm running into a big
> > problem when I try to index the entire dataset. The
> > dataimport?command=full-import works and the system comes to an idle
> > state. It generates the following data/index/ directory (I'm including it
> > in case it gives any indication of memory requirements):
> >
> > -rw-rw---- 1 root root 2.2G Jan 17 00:13 _2w.fdt
> > -rw-rw---- 1 root root  22M Jan 17 00:13 _2w.fdx
> > -rw-rw---- 1 root root  131 Jan 17 00:13 _2w.fnm
> > -rw-rw---- 1 root root 134M Jan 17 00:13 _2w.frq
> > -rw-rw---- 1 root root  16M Jan 17 00:13 _2w.nrm
> > -rw-rw---- 1 root root 130M Jan 17 00:13 _2w.prx
> > -rw-rw---- 1 root root 9.2M Jan 17 00:13 _2w.tii
> > -rw-rw---- 1 root root 1.1G Jan 17 00:13 _2w.tis
> > -rw-rw---- 1 root root   20 Jan 17 00:13 segments.gen
> > -rw-rw---- 1 root root  291 Jan 17 00:13 segments_2
> >
> > Next I try to run the suggest?spellcheck.build=true command, and I get
> > the following error:
> >
> > Jan 16, 2012 4:01:47 PM org.apache.solr.spelling.suggest.Suggester build
> > INFO: build()
> > Jan 16, 2012 4:03:27 PM org.apache.solr.common.SolrException log
> > SEVERE: java.lang.OutOfMemoryError: GC overhead limit exceeded
> > at java.util.Arrays.copyOfRange(Arrays.java:3209)
> > at java.lang.String.<init>(String.java:215)
> > at org.apache.lucene.index.TermBuffer.toTerm(TermBuffer.java:122)
> > at org.apache.lucene.index.SegmentTermEnum.term(SegmentTermEnum.java:184)
> > at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:203)
> > at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:172)
> > at org.apache.lucene.index.SegmentReader.docFreq(SegmentReader.java:509)
> > at org.apache.lucene.index.DirectoryReader.docFreq(DirectoryReader.java:719)
> > at org.apache.solr.search.SolrIndexReader.docFreq(SolrIndexReader.java:309)
> > at org.apache.lucene.search.spell.HighFrequencyDictionary$HighFrequencyIterator.isFrequent(HighFrequencyDictionary.java:75)
> > at org.apache.lucene.search.spell.HighFrequencyDictionary$HighFrequencyIterator.hasNext(HighFrequencyDictionary.java:125)
> > at org.apache.lucene.search.suggest.fst.FSTLookup.build(FSTLookup.java:157)
> > at org.apache.lucene.search.suggest.Lookup.build(Lookup.java:70)
> > at org.apache.solr.spelling.suggest.Suggester.build(Suggester.java:133)
> > at org.apache.solr.handler.component.SpellCheckComponent.prepare(SpellCheckComponent.java:109)
> > at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:173)
> > at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)
> > at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
> > at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
> > at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> > at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
> > at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> > at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
> > at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> > at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
> > at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> > at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
> > at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> > at org.mortbay.jetty.Server.handle(Server.java:326)
> > at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
> > at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
> >
> > I also get an error if, after the dataimport command completes, I just
> > exit the SOLR process and restart it:
> >
> > Jan 16, 2012 4:06:15 PM org.apache.solr.common.SolrException log
> > SEVERE: java.lang.OutOfMemoryError: Java heap space
> > at org.apache.lucene.util.fst.NodeHash.rehash(NodeHash.java:158)
> > at org.apache.lucene.util.fst.NodeHash.add(NodeHash.java:128)
> > at org.apache.lucene.util.fst.Builder.compileNode(Builder.java:161)
> > at org.apache.lucene.util.fst.Builder.compilePrevTail(Builder.java:247)
> > at org.apache.lucene.util.fst.Builder.add(Builder.java:364)
> > at org.apache.lucene.search.suggest.fst.FSTLookup.buildAutomaton(FSTLookup.java:486)
> > at org.apache.lucene.search.suggest.fst.FSTLookup.build(FSTLookup.java:179)
> > at org.apache.lucene.search.suggest.Lookup.build(Lookup.java:70)
> > at org.apache.solr.spelling.suggest.Suggester.build(Suggester.java:133)
> > at org.apache.solr.spelling.suggest.Suggester.reload(Suggester.java:153)
> > at org.apache.solr.handler.component.SpellCheckComponent$SpellCheckerListener.newSearcher(SpellCheckComponent.java:675)
> > at org.apache.solr.core.SolrCore$3.call(SolrCore.java:1181)
> > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> > at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> > at java.lang.Thread.run(Thread.java:662)
> >
> > Jan 16, 2012 4:06:15 PM org.apache.solr.core.SolrCore registerSearcher
> > INFO: [places] Registered new searcher Searcher@34b0ede5 main
> >
> > Basically this means that once I've run a full-import, I cannot exit the
> > SOLR process, because I receive this error whenever I restart it. I've
> > tried different -Xmx arguments, and I'm really at a loss at this point.
> > Is there any guideline for how much RAM I need? I've got 8GB on this
> > machine, although that could be increased if necessary. However, I can't
> > understand why it would need so much memory. Could I have something
> > configured incorrectly? I've been over the configs several times, trying
> > to get them down to the bare minimum.
> >
> > Thanks for any assistance!
> >
> > Dave
>
> --
> lucidimagination.com

--
Regards
Qiu - chiqiu....@gmail.com