Re: Trying to understand SOLR memory requirements

2012-01-16 Thread qiu chi
What is the largest -Xmx value you have tried?
Your index size doesn't seem very big.
Try -Xmx2048m; it should work.

On Tue, Jan 17, 2012 at 9:31 AM, Dave  wrote:

> I'm trying to figure out what my memory needs are for a rather large
> dataset. I'm trying to build an auto-complete system for every
> city/state/country in the world. I've got a geographic database, and have
> set up the DIH to pull the proper data in. There are 2,784,937 documents
> which I've formatted into JSON-like output, so there's a bit of data
> associated with each one. Here is an example record:
>
> Brooklyn, New York, United States?{ |id|: |2620829|,
> |timezone|:|America/New_York|,|type|: |3|, |country|: { |id| : |229| },
> |region|: { |id| : |3608| }, |city|: { |id|: |2616971|, |plainname|:
> |Brooklyn|, |name|: |Brooklyn, New York, United States| }, |hint|:
> |2300664|, |label|: |Brooklyn, New York, United States|, |value|:
> |Brooklyn, New York, United States|, |title|: |Brooklyn, New York, United
> States| }
>
> I've got the spellchecker / suggester module set up, and I can confirm that
> everything works properly with a smaller dataset (i.e. just a couple of
> countries' worth of cities/states). However, I'm running into a big problem
> when I try to index the entire dataset. The dataimport?command=full-import
> works and the system comes to an idle state. It generates the following
> data/index/ directory (I'm including it in case it gives any indication on
> memory requirements):
>
> -rw-rw 1 root   root   2.2G Jan 17 00:13 _2w.fdt
> -rw-rw 1 root   root    22M Jan 17 00:13 _2w.fdx
> -rw-rw 1 root   root    131 Jan 17 00:13 _2w.fnm
> -rw-rw 1 root   root   134M Jan 17 00:13 _2w.frq
> -rw-rw 1 root   root    16M Jan 17 00:13 _2w.nrm
> -rw-rw 1 root   root   130M Jan 17 00:13 _2w.prx
> -rw-rw 1 root   root   9.2M Jan 17 00:13 _2w.tii
> -rw-rw 1 root   root   1.1G Jan 17 00:13 _2w.tis
> -rw-rw 1 root   root     20 Jan 17 00:13 segments.gen
> -rw-rw 1 root   root    291 Jan 17 00:13 segments_2
>
> Next I try to run the suggest?spellcheck.build=true command, and I get the
> following error:
>
> Jan 16, 2012 4:01:47 PM org.apache.solr.spelling.suggest.Suggester build
> INFO: build()
> Jan 16, 2012 4:03:27 PM org.apache.solr.common.SolrException log
> SEVERE: java.lang.OutOfMemoryError: GC overhead limit exceeded
>         at java.util.Arrays.copyOfRange(Arrays.java:3209)
>         at java.lang.String.<init>(String.java:215)
>         at org.apache.lucene.index.TermBuffer.toTerm(TermBuffer.java:122)
>         at org.apache.lucene.index.SegmentTermEnum.term(SegmentTermEnum.java:184)
>         at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:203)
>         at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:172)
>         at org.apache.lucene.index.SegmentReader.docFreq(SegmentReader.java:509)
>         at org.apache.lucene.index.DirectoryReader.docFreq(DirectoryReader.java:719)
>         at org.apache.solr.search.SolrIndexReader.docFreq(SolrIndexReader.java:309)
>         at org.apache.lucene.search.spell.HighFrequencyDictionary$HighFrequencyIterator.isFrequent(HighFrequencyDictionary.java:75)
>         at org.apache.lucene.search.spell.HighFrequencyDictionary$HighFrequencyIterator.hasNext(HighFrequencyDictionary.java:125)
>         at org.apache.lucene.search.suggest.fst.FSTLookup.build(FSTLookup.java:157)
>         at org.apache.lucene.search.suggest.Lookup.build(Lookup.java:70)
>         at org.apache.solr.spelling.suggest.Suggester.build(Suggester.java:133)
>         at org.apache.solr.handler.component.SpellCheckComponent.prepare(SpellCheckComponent.java:109)
>         at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:173)
>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)
>         at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
>         at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>         at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>         at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>         at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>         at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>         at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>         at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>         at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>         at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>         at org.mortbay.jetty.Server.handle(Server.java:326)
>         at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>         at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java

Re: Trying to understand SOLR memory requirements

2012-01-16 Thread qiu chi
You could disable the FST lookup and use the Lucene index as the suggestion method.

The FST lookup loads all the terms into memory; you can use the Lucene spell
checker instead.
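
A minimal sketch of what that index-based alternative could look like in
solrconfig.xml, assuming Solr 3.x (the component name, field, and index
directory below are illustrative assumptions, not taken from Dave's actual
configuration):

  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <lst name="spellchecker">
      <!-- Lucene index-based spell checker: keeps its dictionary on disk
           rather than building an in-memory FST -->
      <str name="name">default</str>
      <str name="classname">solr.IndexBasedSpellChecker</str>
      <!-- illustrative field to draw terms from -->
      <str name="field">name</str>
      <!-- on-disk location for the spell-check index -->
      <str name="spellcheckIndexDir">./spellchecker</str>
    </lst>
  </searchComponent>

The request handler would then point at this dictionary
(spellcheck.dictionary=default) instead of the FST-backed suggester.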

On Tue, Jan 17, 2012 at 10:31 AM, Dave  wrote:

> I've tried up to -Xmx5g
>
> On Mon, Jan 16, 2012 at 9:15 PM, qiu chi  wrote:
>
> > What is the largest -Xmx value you have tried?
> > Your index size doesn't seem very big.
> > Try -Xmx2048m; it should work.

Re: Trying to understand SOLR memory requirements

2012-01-16 Thread qiu chi
I remember there is another implementation that uses the Lucene index file as the
lookup table rather than the in-memory FST.

The FST has a speed advantage, but if you write documents at runtime,
reconstructing the FST may cause performance issues.
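
For context, a suggester of the kind being discussed is typically declared
along these lines in solrconfig.xml (Solr 3.x syntax; the field name and
threshold value are illustrative assumptions, and the exact lookupImpl class
path can vary by version). The lookupImpl entry is where the in-memory FST is
selected, and raising the threshold prunes low-frequency terms before the
structure is built, which may ease memory pressure during the build:

  <searchComponent name="suggest" class="solr.SpellCheckComponent">
    <lst name="spellchecker">
      <str name="name">suggest</str>
      <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
      <!-- in-memory FST lookup; swapping this class changes the lookup strategy -->
      <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup</str>
      <!-- illustrative field to build suggestions from -->
      <str name="field">name</str>
      <!-- minimum fraction of documents a term must appear in to be kept (illustrative) -->
      <float name="threshold">0.005</float>
    </lst>
  </searchComponent>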
On Tue, Jan 17, 2012 at 11:08 AM, Robert Muir  wrote:

> looks like https://issues.apache.org/jira/browse/SOLR-2888.
>
> Previously, FST would need to hold all the terms in RAM during
> construction, but with the patch it uses offline sorts/temporary
> files.
> I'll reopen the issue to backport this to the 3.x branch.

Re: CCU of Solr?

2009-12-18 Thread qiu chi
ab is not the best tool for testing website performance.
Try LoadRunner or Apache HttpClient instead; they act more like a browser.


On Sat, Dec 19, 2009 at 11:20 AM, Olala  wrote:

>
> I used ab (Apache Bench) to run 1000 requests, with a maximum of
> 300 requests running concurrently (ab -n 1000 -c 300), and received
> the following output:
>
> Concurrency Level:  300
> Time taken for tests:   6.797 seconds
> Complete requests:  1000
> Failed requests:0
> Write errors:   0
> Non-2xx responses:  1000
> Total transferred:  162000 bytes
> HTML transferred:   0 bytes
> Requests per second:    147.13 [#/sec] (mean)
> Time per request:   2039.063 [ms] (mean)
> Time per request:   6.797 [ms] (mean, across all concurrent requests)
> Transfer rate:  23.28 [Kbytes/sec] received
>
> Connection Times (ms)
>               min  mean[+/-sd] median   max
> Connect:        0    1    4.7      0    78
> Processing:   234  979  310.6   1016  4141
> Waiting:      219  632  230.1    609  4141
> Total:        234  980  310.7   1016  4141
>
> Percentage of the requests served within a certain time (ms)
>  50%   1016
>  66%   1141
>  75%   1203
>  80%   1219
>  90%   1313
>  95%   1453
>  98%   1469
>  99%   1484
>  100%   4141 (longest request)
>
>
> I wonder whether 147 requests per second is too low.
>
>
>
> Erick Erickson wrote:
> >
> > You can't test it until you have a working SOLR instance in
> > your specific problem space.
> >
> > But assuming you have a SOLR setup, there are a plethora of
> > tools, just google "SOLR load testing". JMeter has been mentioned,
> > as well as others.
> >
> > You can also write your own load tester that just spawns a bunch of
> > threads that query your SOLR server, should take you about a day.
> >
> > Erick
> >
> > On Fri, Dec 18, 2009 at 3:42 AM, Olala  wrote:
> >
> >>
> >> Thanks for your answer! But how can I test this? Do you know any tool
> >> that helps me do that?
> >>
> >>
> >> Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
> >> >
> >> > It is very difficult to say. It depends on the cache hit ratio. If
> >> > everything is served out of the cache, you may get up to around 1000 req/sec.
> >> >
> >> > On Fri, Dec 18, 2009 at 1:39 PM, Olala  wrote:
> >> >>
> >> >> Hi all!
> >> >>
> >> >> I am developing an online dictionary application using Solr, but I
> >> >> wonder how many concurrent requests Solr can process.
> >> >> --
> >> >> View this message in context:
> >> >> http://old.nabble.com/CCU-of-Solr--tp26840318p26840318.html
> >> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >> >>
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > -
> >> > Noble Paul | Systems Architect| AOL | http://aol.com
> >> >
> >> >
> >>
> >> --
> >> View this message in context:
> >> http://old.nabble.com/CCU-of-Solr--tp26840318p26840598.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
> >>
> >
> >
>
> --
> View this message in context:
> http://old.nabble.com/CCU-of-Solr--tp26840318p26852460.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


-- 
Regards
Qiu
- chiqiu@gmail.com