I remember there is another implementation that uses the Lucene index file as the
lookup table instead of the in-memory FST. The FST has the advantage in speed, but
if you write documents at runtime, reconstructing the FST may cause performance
problems.
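
For reference, the rebuild that is failing in the stack traces below boils down to
something like this (a minimal sketch against the 3.x suggester API shown in the
traces; the index path and field name here are made up):

import java.io.File;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.spell.HighFrequencyDictionary;
import org.apache.lucene.search.suggest.Lookup;
import org.apache.lucene.search.suggest.fst.FSTLookup;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class RebuildSketch {
    public static void main(String[] args) throws Exception {
        Directory dir = FSDirectory.open(new File("data/index")); // illustrative path
        IndexReader reader = IndexReader.open(dir);
        try {
            // build() iterates every term of the field and reconstructs the
            // whole automaton in RAM each time -- with ~2.8M long composite
            // terms, that is where the OutOfMemoryErrors below come from.
            Lookup lookup = new FSTLookup();
            lookup.build(new HighFrequencyDictionary(reader, "name", 0.0f)); // "name" is illustrative
        } finally {
            reader.close();
        }
    }
}
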
On Tue, Jan 17, 2012 at 11:08 AM, Robert Muir <rcm...@gmail.com> wrote:
> looks like https://issues.apache.org/jira/browse/SOLR-2888.
>
> Previously, FST would need to hold all the terms in RAM during
> construction, but with the patch it uses offline sorts/temporary
> files.
> I'll reopen the issue to backport this to the 3.x branch.
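
Conceptually, that patch replaces "hold every term in a RAM array, sort it, then
build" with "spill terms to temporary files, sort them on disk, and stream the
sorted result into the FST builder", which works because the FST builder only
requires its input in sorted order. A rough illustration of the offline-sort idea
(my own sketch, not the actual patch code):

import java.io.*;
import java.util.*;

public class OfflineSortSketch {

    // Phase 1: sort terms a bounded chunk at a time, spilling each sorted
    // chunk to its own temporary file, so the heap never holds more than
    // `chunk` terms at once.
    static List<File> spillSortedChunks(BufferedReader in, int chunk) throws IOException {
        List<File> spills = new ArrayList<File>();
        List<String> buf = new ArrayList<String>(chunk);
        String term;
        while ((term = in.readLine()) != null) {
            buf.add(term);
            if (buf.size() == chunk) { spills.add(writeSorted(buf)); buf.clear(); }
        }
        if (!buf.isEmpty()) spills.add(writeSorted(buf));
        return spills;
    }

    static File writeSorted(List<String> buf) throws IOException {
        Collections.sort(buf);
        File f = File.createTempFile("fst-sort", ".tmp");
        f.deleteOnExit();
        PrintWriter w = new PrintWriter(new FileWriter(f));
        for (String s : buf) w.println(s);
        w.close();
        return f;
    }

    // Phase 2: k-way merge of the sorted spill files. An FST builder could
    // consume this stream directly, since it only needs sorted input.
    static void merge(List<File> spills, PrintWriter out) throws IOException {
        final List<BufferedReader> readers = new ArrayList<BufferedReader>();
        // queue entries are {currentTerm, readerIndex}, ordered by term
        PriorityQueue<String[]> pq = new PriorityQueue<String[]>(
                Math.max(1, spills.size()), new Comparator<String[]>() {
                    public int compare(String[] a, String[] b) { return a[0].compareTo(b[0]); }
                });
        for (File f : spills) {
            BufferedReader r = new BufferedReader(new FileReader(f));
            readers.add(r);
            String first = r.readLine();
            if (first != null) pq.add(new String[] { first, Integer.toString(readers.size() - 1) });
        }
        while (!pq.isEmpty()) {
            String[] top = pq.poll();
            out.println(top[0]); // next term in global sorted order
            String next = readers.get(Integer.parseInt(top[1])).readLine();
            if (next != null) pq.add(new String[] { next, top[1] });
        }
        for (BufferedReader r : readers) r.close();
    }

    public static void main(String[] args) throws IOException {
        // reads unsorted terms on stdin, writes them sorted on stdout
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        PrintWriter out = new PrintWriter(new OutputStreamWriter(System.out));
        merge(spillSortedChunks(in, 100000), out);
        out.flush();
    }
}
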
> On Mon, Jan 16, 2012 at 8:31 PM, Dave <dla...@gmail.com> wrote:
> > I'm trying to figure out what my memory needs are for a rather large
> > dataset. I'm trying to build an auto-complete system for every
> > city/state/country in the world. I've got a geographic database, and have
> > set up the DIH to pull the proper data in. There are 2,784,937 documents,
> > which I've formatted into JSON-like output, so there's a bit of data
> > associated with each one. Here is an example record:
> >
> > Brooklyn, New York, United States?{ "id": "2620829", "timezone":
> > "America/New_York", "type": "3", "country": { "id": "229" }, "region":
> > { "id": "3608" }, "city": { "id": "2616971", "plainname": "Brooklyn",
> > "name": "Brooklyn, New York, United States" }, "hint": "2300664",
> > "label": "Brooklyn, New York, United States", "value": "Brooklyn, New
> > York, United States", "title": "Brooklyn, New York, United States" }
> >
> > I've got the spellchecker / suggester module set up, and I can confirm
> > that everything works properly with a smaller dataset (i.e. just a couple
> > of countries' worth of cities/states). However, I'm running into a big
> > problem when I try to index the entire dataset. The
> > dataimport?command=full-import works and the system comes to an idle
> > state. It generates the following data/index/ directory (I'm including it
> > in case it gives any indication of memory requirements):
> >
> > -rw-rw---- 1 root root 2.2G Jan 17 00:13 _2w.fdt
> > -rw-rw---- 1 root root  22M Jan 17 00:13 _2w.fdx
> > -rw-rw---- 1 root root  131 Jan 17 00:13 _2w.fnm
> > -rw-rw---- 1 root root 134M Jan 17 00:13 _2w.frq
> > -rw-rw---- 1 root root  16M Jan 17 00:13 _2w.nrm
> > -rw-rw---- 1 root root 130M Jan 17 00:13 _2w.prx
> > -rw-rw---- 1 root root 9.2M Jan 17 00:13 _2w.tii
> > -rw-rw---- 1 root root 1.1G Jan 17 00:13 _2w.tis
> > -rw-rw---- 1 root root   20 Jan 17 00:13 segments.gen
> > -rw-rw---- 1 root root  291 Jan 17 00:13 segments_2
> >
> > Next I try to run the suggest?spellcheck.build=true command, and I get
> > the following error:
> >
> > Jan 16, 2012 4:01:47 PM org.apache.solr.spelling.suggest.Suggester build
> > INFO: build()
> > Jan 16, 2012 4:03:27 PM org.apache.solr.common.SolrException log
> > SEVERE: java.lang.OutOfMemoryError: GC overhead limit exceeded
> > at java.util.Arrays.copyOfRange(Arrays.java:3209)
> > at java.lang.String.<init>(String.java:215)
> > at org.apache.lucene.index.TermBuffer.toTerm(TermBuffer.java:122)
> > at org.apache.lucene.index.SegmentTermEnum.term(SegmentTermEnum.java:184)
> > at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:203)
> > at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:172)
> > at org.apache.lucene.index.SegmentReader.docFreq(SegmentReader.java:509)
> > at org.apache.lucene.index.DirectoryReader.docFreq(DirectoryReader.java:719)
> > at org.apache.solr.search.SolrIndexReader.docFreq(SolrIndexReader.java:309)
> > at org.apache.lucene.search.spell.HighFrequencyDictionary$HighFrequencyIterator.isFrequent(HighFrequencyDictionary.java:75)
> > at org.apache.lucene.search.spell.HighFrequencyDictionary$HighFrequencyIterator.hasNext(HighFrequencyDictionary.java:125)
> > at org.apache.lucene.search.suggest.fst.FSTLookup.build(FSTLookup.java:157)
> > at org.apache.lucene.search.suggest.Lookup.build(Lookup.java:70)
> > at org.apache.solr.spelling.suggest.Suggester.build(Suggester.java:133)
> > at org.apache.solr.handler.component.SpellCheckComponent.prepare(SpellCheckComponent.java:109)
> > at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:173)
> > at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)
> > at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
> > at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
> > at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> > at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
> > at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> > at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
> > at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> > at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
> > at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> > at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
> > at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> > at org.mortbay.jetty.Server.handle(Server.java:326)
> > at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
> > at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
> >
> > I also get an error if, after the dataimport command completes, I just
> > exit the SOLR process and restart it:
> >
> > Jan 16, 2012 4:06:15 PM org.apache.solr.common.SolrException log
> > SEVERE: java.lang.OutOfMemoryError: Java heap space
> > at org.apache.lucene.util.fst.NodeHash.rehash(NodeHash.java:158)
> > at org.apache.lucene.util.fst.NodeHash.add(NodeHash.java:128)
> > at org.apache.lucene.util.fst.Builder.compileNode(Builder.java:161)
> > at org.apache.lucene.util.fst.Builder.compilePrevTail(Builder.java:247)
> > at org.apache.lucene.util.fst.Builder.add(Builder.java:364)
> > at org.apache.lucene.search.suggest.fst.FSTLookup.buildAutomaton(FSTLookup.java:486)
> > at org.apache.lucene.search.suggest.fst.FSTLookup.build(FSTLookup.java:179)
> > at org.apache.lucene.search.suggest.Lookup.build(Lookup.java:70)
> > at org.apache.solr.spelling.suggest.Suggester.build(Suggester.java:133)
> > at org.apache.solr.spelling.suggest.Suggester.reload(Suggester.java:153)
> > at org.apache.solr.handler.component.SpellCheckComponent$SpellCheckerListener.newSearcher(SpellCheckComponent.java:675)
> > at org.apache.solr.core.SolrCore$3.call(SolrCore.java:1181)
> > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> > at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> > at java.lang.Thread.run(Thread.java:662)
> >
> > Jan 16, 2012 4:06:15 PM org.apache.solr.core.SolrCore registerSearcher
> > INFO: [places] Registered new searcher Searcher@34b0ede5 main
> >
> > Basically this means that once I've run a full-import, I cannot exit the
> > SOLR process, because I receive this error whenever I restart it. I've
> > tried different -Xmx arguments, and I'm really at a loss at this point.
> > Is there any guideline for how much RAM I need? I've got 8GB on this
> > machine, although that could be increased if necessary. However, I can't
> > understand why it would need so much memory. Could I have something
> > configured incorrectly? I've been over the configs several times, trying
> > to get them down to the bare minimum.
> >
> > Thanks for any assistance!
> >
> > Dave
>
> --
> lucidimagination.com

--
Regards
Qiu - chiqiu....@gmail.com