Re: Trying to understand SOLR memory requirements
What is the largest -Xmx value you have tried? Your index size does not seem very big. Try -Xmx2048m; it should work.

On Tue, Jan 17, 2012 at 9:31 AM, Dave wrote:
> I'm trying to figure out what my memory needs are for a rather large
> dataset. I'm trying to build an auto-complete system for every
> city/state/country in the world. I've got a geographic database, and have
> set up the DIH to pull the proper data in. There are 2,784,937 documents
> which I've formatted into JSON-like output, so there's a bit of data
> associated with each one. Here is an example record:
>
> Brooklyn, New York, United States?{ |id|: |2620829|,
> |timezone|: |America/New_York|, |type|: |3|, |country|: { |id|: |229| },
> |region|: { |id|: |3608| }, |city|: { |id|: |2616971|, |plainname|:
> |Brooklyn|, |name|: |Brooklyn, New York, United States| }, |hint|:
> |2300664|, |label|: |Brooklyn, New York, United States|, |value|:
> |Brooklyn, New York, United States|, |title|: |Brooklyn, New York, United
> States| }
>
> I've got the spellchecker / suggester module set up, and I can confirm that
> everything works properly with a smaller dataset (i.e. just a couple of
> countries' worth of cities/states). However I'm running into a big problem
> when I try to index the entire dataset. The dataimport?command=full-import
> works and the system comes to an idle state. It generates the following
> data/index/ directory (I'm including it in case it gives any indication of
> memory requirements):
>
> -rw-rw 1 root root 2.2G Jan 17 00:13 _2w.fdt
> -rw-rw 1 root root  22M Jan 17 00:13 _2w.fdx
> -rw-rw 1 root root  131 Jan 17 00:13 _2w.fnm
> -rw-rw 1 root root 134M Jan 17 00:13 _2w.frq
> -rw-rw 1 root root  16M Jan 17 00:13 _2w.nrm
> -rw-rw 1 root root 130M Jan 17 00:13 _2w.prx
> -rw-rw 1 root root 9.2M Jan 17 00:13 _2w.tii
> -rw-rw 1 root root 1.1G Jan 17 00:13 _2w.tis
> -rw-rw 1 root root   20 Jan 17 00:13 segments.gen
> -rw-rw 1 root root  291 Jan 17 00:13 segments_2
>
> Next I try to run the suggest?spellcheck.build=true command, and I get the
> following error:
>
> Jan 16, 2012 4:01:47 PM org.apache.solr.spelling.suggest.Suggester build
> INFO: build()
> Jan 16, 2012 4:03:27 PM org.apache.solr.common.SolrException log
> SEVERE: java.lang.OutOfMemoryError: GC overhead limit exceeded
> at java.util.Arrays.copyOfRange(Arrays.java:3209)
> at java.lang.String.<init>(String.java:215)
> at org.apache.lucene.index.TermBuffer.toTerm(TermBuffer.java:122)
> at org.apache.lucene.index.SegmentTermEnum.term(SegmentTermEnum.java:184)
> at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:203)
> at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:172)
> at org.apache.lucene.index.SegmentReader.docFreq(SegmentReader.java:509)
> at org.apache.lucene.index.DirectoryReader.docFreq(DirectoryReader.java:719)
> at org.apache.solr.search.SolrIndexReader.docFreq(SolrIndexReader.java:309)
> at org.apache.lucene.search.spell.HighFrequencyDictionary$HighFrequencyIterator.isFrequent(HighFrequencyDictionary.java:75)
> at org.apache.lucene.search.spell.HighFrequencyDictionary$HighFrequencyIterator.hasNext(HighFrequencyDictionary.java:125)
> at org.apache.lucene.search.suggest.fst.FSTLookup.build(FSTLookup.java:157)
> at org.apache.lucene.search.suggest.Lookup.build(Lookup.java:70)
> at org.apache.solr.spelling.suggest.Suggester.build(Suggester.java:133)
> at org.apache.solr.handler.component.SpellCheckComponent.prepare(SpellCheckComponent.java:109)
> at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:173)
> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)
> at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
> at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
> at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
> at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
> at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
> at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> at org.mortbay.jetty.Server.handle(Server.java:326)
> at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
> at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java
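[For context on the suggest?spellcheck.build=true call above: a suggest request handler is usually wired to the suggest search component roughly as below in solrconfig.xml. This is only a sketch; the handler name and parameter values are assumptions, not Dave's actual configuration.]

    <requestHandler name="/suggest" class="org.apache.solr.handler.component.SearchHandler">
      <lst name="defaults">
        <!-- turn the spellcheck/suggest component on and point it at the
             dictionary named "suggest" (name assumed here) -->
        <str name="spellcheck">true</str>
        <str name="spellcheck.dictionary">suggest</str>
        <str name="spellcheck.count">10</str>
      </lst>
      <arr name="components">
        <str>suggest</str>
      </arr>
    </requestHandler>

Hitting this handler with spellcheck.build=true is what triggers Suggester.build() in the stack trace above.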
Re: Trying to understand SOLR memory requirements
You may disable the FST lookup and use the Lucene index as the suggest method. The FST lookup holds its whole dictionary in memory while it is built; you can use the Lucene spell checker instead.

On Tue, Jan 17, 2012 at 10:31 AM, Dave wrote:
> I've tried up to -Xmx5g
>
> On Mon, Jan 16, 2012 at 9:15 PM, qiu chi wrote:
> > What is the largest -Xmx value you have tried?
> > Your index size does not seem very big.
> > Try -Xmx2048m; it should work.
> >
> > On Tue, Jan 17, 2012 at 9:31 AM, Dave wrote:
> > > I'm trying to figure out what my memory needs are for a rather large
> > > dataset. I'm trying to build an auto-complete system for every
> > > city/state/country in the world. [...]
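[The suggestion above to "use the lucene spell checker instead" would amount to swapping the FST-based suggester for an index-backed spellchecker in solrconfig.xml. A minimal sketch only: the field name "name" and the index directory below are assumptions, not taken from Dave's setup.]

    <!-- Sketch: index-backed spellchecker instead of the in-memory FST suggester.
         The field "name" and the directory path are assumptions. -->
    <searchComponent name="suggest" class="solr.SpellCheckComponent">
      <lst name="spellchecker">
        <str name="name">suggest</str>
        <!-- IndexBasedSpellChecker keeps its dictionary in a separate Lucene
             index on disk rather than building an FST in the Java heap -->
        <str name="classname">solr.IndexBasedSpellChecker</str>
        <str name="field">name</str>
        <str name="spellcheckIndexDir">./suggest_index</str>
        <str name="buildOnCommit">true</str>
      </lst>
    </searchComponent>

With a layout like this, the build step writes the dictionary to spellcheckIndexDir instead of constructing the whole lookup structure in RAM, at the cost of slower lookups than the FST.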
Re: Trying to understand SOLR memory requirements
I remembered there is another implementation that uses the Lucene index file as the lookup table rather than the in-memory FST. The FST has the advantage in speed, but if you write documents at runtime, reconstructing the FST may cause performance issues.

On Tue, Jan 17, 2012 at 11:08 AM, Robert Muir wrote:
> looks like https://issues.apache.org/jira/browse/SOLR-2888.
>
> Previously, FST would need to hold all the terms in RAM during
> construction, but with the patch it uses offline sorts/temporary
> files.
> I'll reopen the issue to backport this to the 3.x branch.
>
> On Mon, Jan 16, 2012 at 8:31 PM, Dave wrote:
> > I'm trying to figure out what my memory needs are for a rather large
> > dataset. I'm trying to build an auto-complete system for every
> > city/state/country in the world. [...]
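[A side note on the build path in the stack trace (HighFrequencyDictionary.isFrequent): if you stay on the FST suggester without the SOLR-2888 patch, the dictionary can be thinned with a frequency threshold before terms ever reach the FST builder. This is a sketch under assumptions: the field name, the 0.005 value, and the 3.x-style lookupImpl class name are illustrative, not settings from this thread.]

    <!-- Sketch: FST suggester with a frequency threshold. Terms appearing in
         fewer than 0.5% of documents are skipped by HighFrequencyDictionary,
         which shrinks what the FST build has to hold in RAM. -->
    <searchComponent name="suggest" class="solr.SpellCheckComponent">
      <lst name="spellchecker">
        <str name="name">suggest</str>
        <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
        <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup</str>
        <str name="field">name</str>
        <float name="threshold">0.005</float>
        <str name="buildOnCommit">true</str>
      </lst>
    </searchComponent>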
Re: CCU of Solr?
ab is not the best tool for testing website performance. Try LoadRunner or Apache HttpClient instead; they act more like a browser.

On Sat, Dec 19, 2009 at 11:20 AM, Olala wrote:
>
> I used ab (apache bench) to test with 1000 requests, with a maximum of
> 300 requests running concurrently (ab -n 1000 -c 300), and then I received
> the output as follows:
>
> Concurrency Level:      300
> Time taken for tests:   6.797 seconds
> Complete requests:      1000
> Failed requests:        0
> Write errors:           0
> Non-2xx responses:      1000
> Total transferred:      162000 bytes
> HTML transferred:       0 bytes
> Requests per second:    147.13 [#/sec] (mean)
> Time per request:       2039.063 [ms] (mean)
> Time per request:       6.797 [ms] (mean, across all concurrent requests)
> Transfer rate:          23.28 [Kbytes/sec] received
>
> Connection Times (ms)
>               min  mean[+/-sd] median   max
> Connect:        0    1   4.7      0      78
> Processing:   234  979 310.6   1016    4141
> Waiting:      219  632 230.1    609    4141
> Total:        234  980 310.7   1016    4141
>
> Percentage of the requests served within a certain time (ms)
>   50%   1016
>   66%   1141
>   75%   1203
>   80%   1219
>   90%   1313
>   95%   1453
>   98%   1469
>   99%   1484
>  100%   4141 (longest request)
>
> I wonder whether 147 requests per second is too low?
>
>
> Erick Erickson wrote:
> >
> > You can't test it until you have a working SOLR instance in
> > your specific problem space.
> >
> > But assuming you have a SOLR setup, there are a plethora of
> > tools, just google "SOLR load testing". JMeter has been mentioned,
> > as well as others.
> >
> > You can also write your own load tester that just spawns a bunch of
> > threads that query your SOLR server, should take you about a day.
> >
> > Erick
> >
> > On Fri, Dec 18, 2009 at 3:42 AM, Olala wrote:
> >
> >> Thanks for your answer! But how can I test this? Do you know any tool
> >> that helps me do that?
> >>
> >> Noble Paul നോബിള്‍ नोब्ळ्-2 wrote:
> >> >
> >> > it is very difficult to say. It depends on the cache hit ratio. If
> >> > everything is served out of cache you may go up to around 1000 req/sec
> >> >
> >> > On Fri, Dec 18, 2009 at 1:39 PM, Olala wrote:
> >> >>
> >> >> Hi all!
> >> >>
> >> >> I am developing an online dictionary application by using Solr, but I
> >> >> wonder how many concurrent requests Solr can process.
> >> >> --
> >> >> View this message in context:
> >> >> http://old.nabble.com/CCU-of-Solr--tp26840318p26840318.html
> >> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >> >
> >> > --
> >> > Noble Paul | Systems Architect | AOL | http://aol.com
> >>
> >> --
> >> View this message in context:
> >> http://old.nabble.com/CCU-of-Solr--tp26840318p26840598.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
>
> --
> View this message in context:
> http://old.nabble.com/CCU-of-Solr--tp26840318p26852460.html
> Sent from the Solr - User mailing list archive at Nabble.com.

--
Regards
Qiu
chiqiu@gmail.com
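[Picking up Noble Paul's point in the quoted thread that throughput "depends on the cache hit ratio": the knobs for that live in the <query> section of solrconfig.xml. The sizes below are illustrative assumptions for a mostly read-only dictionary index, not measurements or recommendations from this thread.]

    <!-- Sketch: cache sizing in solrconfig.xml; all numbers are assumptions to tune. -->
    <query>
      <!-- caches filter (fq) results as doc sets -->
      <filterCache class="solr.LRUCache" size="16384" initialSize="4096" autowarmCount="4096"/>
      <!-- caches ordered doc-id lists for whole queries; matters most for repeated lookups -->
      <queryResultCache class="solr.LRUCache" size="16384" initialSize="4096" autowarmCount="1024"/>
      <!-- caches stored fields for documents -->
      <documentCache class="solr.LRUCache" size="16384" initialSize="4096"/>
      <queryResultWindowSize>20</queryResultWindowSize>
    </query>

If repeated queries are answered from queryResultCache, each request avoids touching the index at all, which is what pushes throughput toward the ~1000 req/sec figure mentioned above rather than the ~147 req/sec measured with ab.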