In my original post I included one of my terms:

Brooklyn, New York, United States?{ |id|: |2620829|, |timezone|: |America/New_York|, |type|: |3|, |country|: { |id|: |229| }, |region|: { |id|: |3608| }, |city|: { |id|: |2616971|, |plainname|: |Brooklyn|, |name|: |Brooklyn, New York, United States| }, |hint|: |2300664|, |label|: |Brooklyn, New York, United States|, |value|: |Brooklyn, New York, United States|, |title|: |Brooklyn, New York, United States| }
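For context, here is a minimal sketch of how a term like that can be unpacked on the client, assuming the handler just splits on the first "?" and maps the "|" placeholders back to double quotes before parsing. The function name parseSuggestion is illustrative only, not my actual code:

```javascript
// Split the returned suggestion on the first '?', turn the '|'
// placeholders back into double quotes, and parse the remainder as JSON.
// (parseSuggestion is a hypothetical name for illustration.)
function parseSuggestion(term) {
  const idx = term.indexOf('?');
  const payload = term.slice(idx + 1).replace(/\|/g, '"');
  return { label: term.slice(0, idx), data: JSON.parse(payload) };
}
```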
I'm matching on the first part of the term (the part before the ?); the rest is passed through as JSON-like text and converted back into a real JSON object by the JavaScript on the client. Here is my data-config.xml file, in case it sheds any light:

<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="" user="" password="" encoding="UTF-8"/>
  <document>
    <entity name="countries" pk="id"
            query="select p.id as placeid, c.id, c.plainname, c.name, p.timezone from countries c, places p where p.regionid = 1 AND p.cityid = 1 AND c.id=p.countryid AND p.settingid=1"
            transformer="TemplateTransformer">
      <field column="id" name="countryid"/>
      <field column="plainname" name="countryname"/>
      <field column="name" name="fullcountryname"/>
      <field column="placeid" name="place_id"/>
      <field column="timezone" name="timezone"/>
      <field column="countryinfo" template="${countries.plainname}?{ |id|: |${countries.placeid}|, |timezone|:|${countries.timezone}|,|type|: |1|, |country|: { |id| : |${countries.id}|, |plainname|: |${countries.plainname}|, |name|: |${countries.plainname}| }, |region|: { |id| : |0| }, |city|: { |id|: |0| }, |hint|: ||, |label|: |${countries.plainname}|, |value|: |${countries.plainname}|, |title|: |${countries.plainname}| }"/>
    </entity>
    <entity name="regions" pk="id"
            query="select p.id as placeid, p.countryid as countryid, c.plainname as countryname, p.timezone as timezone, r.id as regionid, r.plainname as regionname, r.population as regionpop from places p, regions r, countries c where r.id = p.regionid AND p.settingid = 1 AND p.regionid > 1 AND p.countryid=c.id AND p.cityid=1 AND r.population > 0"
            transformer="TemplateTransformer">
      <field column="regionid" name="regionid"/>
      <field column="regionname" name="regionname"/>
      <field column="regionpop" name="regionpop"/>
      <field column="countryid" name="countryid"/>
      <field column="timezone" name="timezone"/>
      <field column="regioninfo" template="${regions.regionname}, ${regions.countryname}?{ |id|: |${regions.placeid}|, |timezone|:|${regions.timezone}|,|type|: |2|, |country|: { |id| : |${regions.countryid}| }, |region|: { |id| : |${regions.regionid}|, |plainname|: |${regions.regionname}|, |name|: |${regions.regionname}, ${regions.countryname}| }, |city|: { |id|: |0| }, |hint|: |${regions.regionpop}|, |label|: |${regions.regionname}, ${regions.countryname}|, |value|: |${regions.regionname}, ${regions.countryname}|, |title|: |${regions.regionname}, ${regions.countryname}| }"/>
    </entity>
    <entity name="cities" pk="id"
            query="select c2.id as cityid, c2.plainname as cityname, c2.population as citypop, p.id as placeid, p.countryid as countryid, c.plainname as countryname, p.timezone as timezone, r.id as regionid, r.plainname as regionname from places p, regions r, countries c, cities c2 where c2.id = p.cityid AND p.settingid = 1 AND p.regionid > 1 AND p.countryid=c.id AND r.id=p.regionid"
            transformer="TemplateTransformer">
      <field column="cityid" name="cityid"/>
      <field column="cityname" name="cityname"/>
      <field column="citypop" name="citypop"/>
      <field column="placeid" name="place_id2"/>
      <field column="regionid" name="regionid"/>
      <field column="regionname" name="regionname"/>
      <field column="countryid" name="countryid"/>
      <field column="plainname" name="countryname"/>
      <field column="timezone" name="timezone"/>
      <field column="fullplacename" template="${cities.cityname}, ${cities.regionname}, ${cities.countryname}?{ |id|: |${cities.placeid}|, |timezone|:|${cities.timezone}|,|type|: |3|, |country|: { |id| : |${cities.countryid}| }, |region|: { |id| : |${cities.regionid}| }, |city|: { |id|: |${cities.cityid}|, |plainname|: |${cities.cityname}|, |name|: |${cities.cityname}, ${cities.regionname}, ${cities.countryname}| }, |hint|: |${cities.citypop}|, |label|: |${cities.cityname}, ${cities.regionname}, ${cities.countryname}|, |value|: |${cities.cityname}, ${cities.regionname}, ${cities.countryname}|, |title|: |${cities.cityname}, ${cities.regionname}, ${cities.countryname}| }"/>
    </entity>
  </document>
</dataConfig>

On Thu, Jan 19, 2012 at 11:52 AM, Robert Muir <rcm...@gmail.com> wrote:
> I don't think the problem is FST, since it sorts offline in your case.
>
> More importantly, what are you trying to put into the FST?
>
> it appears you are indexing terms from your term dictionary, but your
> term dictionary is over 1GB, why is that?
>
> what do your terms look like? 1GB for 2,784,937 documents does not make sense.
> for example, all place names in geonames (7.2M documents) creates a
> term dictionary of 22MB.
>
> So there is something wrong with your data importing and/or analysis
> process, your terms are not what you think they are.
>
> On Thu, Jan 19, 2012 at 11:27 AM, Dave <dla...@gmail.com> wrote:
> > I'm also seeing the error when I try to start up the SOLR instance:
> >
> > SEVERE: java.lang.OutOfMemoryError: Java heap space
> > at org.apache.lucene.util.ArrayUtil.grow(ArrayUtil.java:344)
> > at org.apache.lucene.util.ArrayUtil.grow(ArrayUtil.java:352)
> > at org.apache.lucene.util.fst.FST$BytesWriter.writeByte(FST.java:975)
> > at org.apache.lucene.util.fst.FST.writeLabel(FST.java:395)
> > at org.apache.lucene.util.fst.FST.addNode(FST.java:499)
> > at org.apache.lucene.util.fst.Builder.compileNode(Builder.java:182)
> > at org.apache.lucene.util.fst.Builder.freezeTail(Builder.java:270)
> > at org.apache.lucene.util.fst.Builder.add(Builder.java:365)
> > at org.apache.lucene.search.suggest.fst.FSTCompletionBuilder.buildAutomaton(FSTCompletionBuilder.java:228)
> > at org.apache.lucene.search.suggest.fst.FSTCompletionBuilder.build(FSTCompletionBuilder.java:202)
> > at org.apache.lucene.search.suggest.fst.FSTCompletionLookup.build(FSTCompletionLookup.java:199)
> > at org.apache.lucene.search.suggest.Lookup.build(Lookup.java:70)
> > at org.apache.solr.spelling.suggest.Suggester.build(Suggester.java:133)
> > at org.apache.solr.spelling.suggest.Suggester.reload(Suggester.java:153)
> > at org.apache.solr.handler.component.SpellCheckComponent$SpellCheckerListener.newSearcher(SpellCheckComponent.java:675)
> > at org.apache.solr.core.SolrCore$3.call(SolrCore.java:1184)
> > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> > at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> > at java.lang.Thread.run(Thread.java:662)
> >
> > On Wed, Jan 18, 2012 at 5:24 PM, Dave <dla...@gmail.com> wrote:
> >
> >> Unfortunately, that doesn't look like it solved my problem. I built the
> >> new .war file, dropped it in, and restarted the server. When I tried to
> >> build the spellchecker index, it ran out of memory again. Is there anything
> >> I needed to change in the configuration? Did I need to upload new .jar
> >> files, or was replacing the .war file enough?
> >>
> >> Jan 18, 2012 2:20:25 PM org.apache.solr.spelling.suggest.Suggester build
> >> INFO: build()
> >>
> >> Jan 18, 2012 2:22:06 PM org.apache.solr.common.SolrException log
> >> SEVERE: java.lang.OutOfMemoryError: Java heap space
> >> at org.apache.lucene.util.ArrayUtil.grow(ArrayUtil.java:344)
> >> at org.apache.lucene.util.ArrayUtil.grow(ArrayUtil.java:352)
> >> at org.apache.lucene.util.fst.FST$BytesWriter.writeByte(FST.java:975)
> >> at org.apache.lucene.util.fst.FST.writeLabel(FST.java:395)
> >> at org.apache.lucene.util.fst.FST.addNode(FST.java:499)
> >> at org.apache.lucene.util.fst.Builder.compileNode(Builder.java:182)
> >> at org.apache.lucene.util.fst.Builder.freezeTail(Builder.java:270)
> >> at org.apache.lucene.util.fst.Builder.add(Builder.java:365)
> >> at org.apache.lucene.search.suggest.fst.FSTCompletionBuilder.buildAutomaton(FSTCompletionBuilder.java:228)
> >> at org.apache.lucene.search.suggest.fst.FSTCompletionBuilder.build(FSTCompletionBuilder.java:202)
> >> at org.apache.lucene.search.suggest.fst.FSTCompletionLookup.build(FSTCompletionLookup.java:199)
> >> at org.apache.lucene.search.suggest.Lookup.build(Lookup.java:70)
> >> at org.apache.solr.spelling.suggest.Suggester.build(Suggester.java:133)
> >> at org.apache.solr.handler.component.SpellCheckComponent.prepare(SpellCheckComponent.java:109)
> >> at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174)
> >> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> >> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1375)
> >> at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:358)
> >> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:253)
> >> at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> >> at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
> >> at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> >> at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
> >> at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> >> at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
> >> at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> >> at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
> >> at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> >> at org.mortbay.jetty.Server.handle(Server.java:326)
> >> at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
> >> at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
> >> at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
> >>
> >> On Tue, Jan 17, 2012 at 8:59 AM, Robert Muir <rcm...@gmail.com> wrote:
> >>
> >>> I committed it already: so you can try out branch_3x if you want.
> >>>
> >>> you can either wait for a nightly build or compile from svn
> >>> (http://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/).
> >>>
> >>> On Tue, Jan 17, 2012 at 8:35 AM, Dave <dla...@gmail.com> wrote:
> >>> > Thank you Robert, I'd appreciate that. Any idea how long it will take to
> >>> > get a fix? Would I be better switching to trunk? Is trunk stable enough for
> >>> > someone who's very much a SOLR novice?
> >>> >
> >>> > Thanks,
> >>> > Dave
> >>> >
> >>> > On Mon, Jan 16, 2012 at 10:08 PM, Robert Muir <rcm...@gmail.com> wrote:
> >>> >
> >>> >> looks like https://issues.apache.org/jira/browse/SOLR-2888.
> >>> >>
> >>> >> Previously, FST would need to hold all the terms in RAM during
> >>> >> construction, but with the patch it uses offline sorts/temporary
> >>> >> files.
> >>> >> I'll reopen the issue to backport this to the 3.x branch.
> >>> >>
> >>> >> On Mon, Jan 16, 2012 at 8:31 PM, Dave <dla...@gmail.com> wrote:
> >>> >> > I'm trying to figure out what my memory needs are for a rather large
> >>> >> > dataset. I'm trying to build an auto-complete system for every
> >>> >> > city/state/country in the world. I've got a geographic database, and have
> >>> >> > setup the DIH to pull the proper data in. There are 2,784,937 documents
> >>> >> > which I've formatted into JSON-like output, so there's a bit of data
> >>> >> > associated with each one. Here is an example record:
> >>> >> >
> >>> >> > Brooklyn, New York, United States?{ |id|: |2620829|,
> >>> >> > |timezone|:|America/New_York|,|type|: |3|, |country|: { |id| : |229| },
> >>> >> > |region|: { |id| : |3608| }, |city|: { |id|: |2616971|, |plainname|:
> >>> >> > |Brooklyn|, |name|: |Brooklyn, New York, United States| }, |hint|:
> >>> >> > |2300664|, |label|: |Brooklyn, New York, United States|, |value|:
> >>> >> > |Brooklyn, New York, United States|, |title|: |Brooklyn, New York,
> >>> >> > United States| }
> >>> >> >
> >>> >> > I've got the spellchecker / suggester module setup, and I can confirm that
> >>> >> > everything works properly with a smaller dataset (i.e. just a couple of
> >>> >> > countries worth of cities/states). However I'm running into a big problem
> >>> >> > when I try to index the entire dataset. The dataimport?command=full-import
> >>> >> > works and the system comes to an idle state. It generates the following
> >>> >> > data/index/ directory (I'm including it in case it gives any indication on
> >>> >> > memory requirements):
> >>> >> >
> >>> >> > -rw-rw---- 1 root root 2.2G Jan 17 00:13 _2w.fdt
> >>> >> > -rw-rw---- 1 root root 22M Jan 17 00:13 _2w.fdx
> >>> >> > -rw-rw---- 1 root root 131 Jan 17 00:13 _2w.fnm
> >>> >> > -rw-rw---- 1 root root 134M Jan 17 00:13 _2w.frq
> >>> >> > -rw-rw---- 1 root root 16M Jan 17 00:13 _2w.nrm
> >>> >> > -rw-rw---- 1 root root 130M Jan 17 00:13 _2w.prx
> >>> >> > -rw-rw---- 1 root root 9.2M Jan 17 00:13 _2w.tii
> >>> >> > -rw-rw---- 1 root root 1.1G Jan 17 00:13 _2w.tis
> >>> >> > -rw-rw---- 1 root root 20 Jan 17 00:13 segments.gen
> >>> >> > -rw-rw---- 1 root root 291 Jan 17 00:13 segments_2
> >>> >> >
> >>> >> > Next I try to run the suggest?spellcheck.build=true command, and I get the
> >>> >> > following error:
> >>> >> >
> >>> >> > Jan 16, 2012 4:01:47 PM org.apache.solr.spelling.suggest.Suggester build
> >>> >> > INFO: build()
> >>> >> > Jan 16, 2012 4:03:27 PM org.apache.solr.common.SolrException log
> >>> >> > SEVERE: java.lang.OutOfMemoryError: GC overhead limit exceeded
> >>> >> > at java.util.Arrays.copyOfRange(Arrays.java:3209)
> >>> >> > at java.lang.String.<init>(String.java:215)
> >>> >> > at org.apache.lucene.index.TermBuffer.toTerm(TermBuffer.java:122)
> >>> >> > at org.apache.lucene.index.SegmentTermEnum.term(SegmentTermEnum.java:184)
> >>> >> > at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:203)
> >>> >> > at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:172)
> >>> >> > at org.apache.lucene.index.SegmentReader.docFreq(SegmentReader.java:509)
> >>> >> > at org.apache.lucene.index.DirectoryReader.docFreq(DirectoryReader.java:719)
> >>> >> > at org.apache.solr.search.SolrIndexReader.docFreq(SolrIndexReader.java:309)
> >>> >> > at org.apache.lucene.search.spell.HighFrequencyDictionary$HighFrequencyIterator.isFrequent(HighFrequencyDictionary.java:75)
> >>> >> > at org.apache.lucene.search.spell.HighFrequencyDictionary$HighFrequencyIterator.hasNext(HighFrequencyDictionary.java:125)
> >>> >> > at org.apache.lucene.search.suggest.fst.FSTLookup.build(FSTLookup.java:157)
> >>> >> > at org.apache.lucene.search.suggest.Lookup.build(Lookup.java:70)
> >>> >> > at org.apache.solr.spelling.suggest.Suggester.build(Suggester.java:133)
> >>> >> > at org.apache.solr.handler.component.SpellCheckComponent.prepare(SpellCheckComponent.java:109)
> >>> >> > at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:173)
> >>> >> > at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> >>> >> > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)
> >>> >> > at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
> >>> >> > at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
> >>> >> > at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> >>> >> > at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
> >>> >> > at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> >>> >> > at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
> >>> >> > at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> >>> >> > at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
> >>> >> > at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> >>> >> > at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
> >>> >> > at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> >>> >> > at org.mortbay.jetty.Server.handle(Server.java:326)
> >>> >> > at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
> >>> >> > at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
> >>> >> >
> >>> >> > I also get an error if after the dataimport command completes, I just exit
> >>> >> > the SOLR process and restart it:
> >>> >> >
> >>> >> > Jan 16, 2012 4:06:15 PM org.apache.solr.common.SolrException log
> >>> >> > SEVERE: java.lang.OutOfMemoryError: Java heap space
> >>> >> > at org.apache.lucene.util.fst.NodeHash.rehash(NodeHash.java:158)
> >>> >> > at org.apache.lucene.util.fst.NodeHash.add(NodeHash.java:128)
> >>> >> > at org.apache.lucene.util.fst.Builder.compileNode(Builder.java:161)
> >>> >> > at org.apache.lucene.util.fst.Builder.compilePrevTail(Builder.java:247)
> >>> >> > at org.apache.lucene.util.fst.Builder.add(Builder.java:364)
> >>> >> > at org.apache.lucene.search.suggest.fst.FSTLookup.buildAutomaton(FSTLookup.java:486)
> >>> >> > at org.apache.lucene.search.suggest.fst.FSTLookup.build(FSTLookup.java:179)
> >>> >> > at org.apache.lucene.search.suggest.Lookup.build(Lookup.java:70)
> >>> >> > at org.apache.solr.spelling.suggest.Suggester.build(Suggester.java:133)
> >>> >> > at org.apache.solr.spelling.suggest.Suggester.reload(Suggester.java:153)
> >>> >> > at org.apache.solr.handler.component.SpellCheckComponent$SpellCheckerListener.newSearcher(SpellCheckComponent.java:675)
> >>> >> > at org.apache.solr.core.SolrCore$3.call(SolrCore.java:1181)
> >>> >> > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> >>> >> > at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> >>> >> > at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >>> >> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >>> >> > at java.lang.Thread.run(Thread.java:662)
> >>> >> >
> >>> >> > Jan 16, 2012 4:06:15 PM org.apache.solr.core.SolrCore registerSearcher
> >>> >> > INFO: [places] Registered new searcher Searcher@34b0ede5 main
> >>> >> >
> >>> >> > Basically this means once I've run a full-import, I cannot exit the SOLR
> >>> >> > process because I receive this error no matter what when I restart the
> >>> >> > process. I've tried with different -Xmx arguments, and I'm really at a loss
> >>> >> > at this point. Is there any guideline to how much RAM I need? I've got 8GB
> >>> >> > on this machine, although that could be increased if necessary. However, I
> >>> >> > can't understand why it would need so much memory. Could I have something
> >>> >> > configured incorrectly? I've been over the configs several times, trying to
> >>> >> > get them down to the bare minimum.
> >>> >> >
> >>> >> > Thanks for any assistance!
> >>> >> >
> >>> >> > Dave
> >>> >>
> >>> >> --
> >>> >> lucidimagination.com
> >>>
> >>> --
> >>> lucidimagination.com
>
> --
> lucidimagination.com
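P.S. Following up on Robert's point above about the 1GB term dictionary: one thing I'm going to double-check is whether the suggest source field is being tokenized. If the whole JSON-like payload goes through a standard analyzer, every document would contribute dozens of terms instead of one. A hypothetical schema.xml fieldType (the name text_suggest is my own; solr.KeywordTokenizerFactory is the stock tokenizer that emits the entire value as a single token):

```xml
<!-- Hypothetical fieldType for the suggest source field: KeywordTokenizerFactory
     keeps each record as one term instead of tokenizing the whole payload. -->
<fieldType name="text_suggest" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```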