> [...]&f.cs_rep.separator=%5E" --data-binary @- \
>   -H 'Content-type:text/plain; charset=utf-8'
> EnD)
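The quoted curl command above is cut off; for reference, a minimal SolrJ sketch of the same kind of CSV update (a reconstruction, not the poster's command: the URL, file path, and handler path are placeholder assumptions, SolrJ 6+ assumed; `cs_rep` and its `^` separator come from the fragment):

import java.io.File;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class CsvUpdateExample {
    public static void main(String[] args) throws Exception {
        // Placeholder URL; point this at the real collection.
        HttpSolrClient client =
            new HttpSolrClient.Builder("http://localhost:8983/solr/collection1").build();

        // On older Solr versions the path may be /update/csv instead of /update.
        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update");
        req.addFile(new File("data.csv"), "text/csv; charset=utf-8");
        // Same parameter as in the curl fragment: split the cs_rep field on '^'.
        req.setParam("f.cs_rep.separator", "^");
        req.process(client);
        client.close();
    }
}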
-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Tuesday, April 08, 2014 2:21 PM
To: solr-user@lucene.apache.org
Subject: Re: solr4 performance question
What do you have for your _softcommit_ settings in solrconfig.xml? I'm
guessing you're using SolrJ or similar, but the solrconfig settings
will trip a commit as well.
For that matter, what are all your commit settings in solrconfig.xml,
both hard and soft?
Best,
Erick
On Tue, Apr 8, 2014 at 10:28
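For reference alongside Erick's question: the server-side knobs he is asking about are the <autoCommit> and <autoSoftCommit> settings in solrconfig.xml. On the client side, a minimal SolrJ sketch of the usual alternatives to committing on every request (SolrJ 6+ assumed; URL, field names, and intervals are illustrative placeholders):

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.SolrInputDocument;

public class CommitWithinExample {
    public static void main(String[] args) throws Exception {
        HttpSolrClient client =
            new HttpSolrClient.Builder("http://localhost:8983/solr/collection1").build();

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");

        // Let the server decide when to commit instead of committing per batch:
        // the document becomes searchable within 60s at the latest.
        UpdateRequest req = new UpdateRequest();
        req.add(doc);
        req.setCommitWithin(60_000);
        req.process(client);

        // Or, if an explicit commit is unavoidable, prefer a soft commit
        // (args: waitFlush, waitSearcher, softCommit).
        client.commit(true, true, true);
        client.close();
    }
}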
Hi Joshi,
Go to the Plugins/Stats section under your collection in the Solr Admin UI.
You will see cache statistics for the different cache types; hitratio
and evictions are good statistics to look at first. You should also
read here: https://wiki.apache.org/solr/SolrPerformanceFac
Hi,
We have a 10-node SolrCloud cluster (5 shards, 2 replicas) with a 30 GB JVM on
60 GB machines and 40 GB of index.
We constantly notice that Solr queries take longer while an update (with
commit=false) is in progress. A query that usually takes 0.5 seconds can
take up to 2 minutes while up
On Wed, Nov 6, 2013 at 12:07 PM, andres wrote:
> I'm debating whether or not to set the facet.missing parameter to true by
> default when faceting. What is the performance impact of setting
> facet.missing to true?
It really depends on the faceting method. For some faceting methods
(like e
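Yonik's answer is cut off above. For reference, the parameter as actually spelled in Solr is facet.missing (with a per-field override f.<field>.facet.missing); a minimal SolrJ sketch of turning it on for one field (core and field names are placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FacetMissingExample {
    public static void main(String[] args) throws Exception {
        HttpSolrClient client =
            new HttpSolrClient.Builder("http://localhost:8983/solr/collection1").build();

        SolrQuery q = new SolrQuery("*:*");
        q.setFacet(true);
        q.addFacetField("category");              // placeholder field name
        q.set("f.category.facet.missing", true);  // count docs with no value in this field

        QueryResponse rsp = client.query(q);
        // The "missing" bucket shows up as an extra count under a null facet value.
        System.out.println(rsp.getFacetField("category").getValues());
        client.close();
    }
}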
[...] using softCommit=true in the update URL and check if it
gives us the desired performance.
Thanks for looking into this. Appreciate your help.
-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Tuesday, August 13, 2013 8:12 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr4 update and query perfor
1> That's hard-coded at present. There's anecdotal evidence that there
are throughput improvements with larger batch sizes, but no action
yet.
2> Yep, all searchers are also re-opened, caches re-warmed, etc.
3> Odd. I'm assuming your Solr3 was a master/slave setup? Seeing the
queries wo
Hi,
We have a SolrCloud (4.4.0) cluster (5 shards, 2 replicas) on 10 boxes with
about 450 million documents (~90 million per shard). We load 1000 or fewer
documents in CSV format every few minutes. In Solr3, with 300 million
documents, it used to take 30 seconds to load 1000 documents, while in Solr4,
From: "Steven Bower-2 [via Lucene]"
mailto:ml-node+s472066n4082569...@n3.nabble.com>>
Date: Monday, August 5, 2013 9:14 AM
To: "Smiley, David W." mailto:dsmi...@mitre.org>>
Subject: Re: Performance question on Spatial Search
So after re-feeding our data with
So after re-feeding our data with a new boolean field that is true when
data exists and false when it doesn't, our search times have gone from an avg
of about 20s to around 150ms... a pretty amazing change in perf... It seems
like https://issues.apache.org/jira/browse/SOLR-5093 might alleviate many
peopl
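What the thread converged on, per the message above: pre-filter with a cheap boolean field so the expensive spatial filter only sees documents that actually have spatial data. A minimal SolrJ sketch of that pattern (the field names has_geo and geo, and the point/distance, are hypothetical placeholders, not the poster's schema):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class SpatialPrefilterExample {
    public static void main(String[] args) throws Exception {
        HttpSolrClient client =
            new HttpSolrClient.Builder("http://localhost:8983/solr/collection1").build();

        SolrQuery q = new SolrQuery("*:*");
        // Cheap filter: only docs that actually have spatial data.
        q.addFilterQuery("has_geo:true");
        // The expensive geo filter then runs over a much smaller candidate set.
        q.addFilterQuery("{!geofilt sfield=geo pt=40.7,-74.0 d=50}");

        System.out.println(client.query(q).getResults().getNumFound());
        client.close();
    }
}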
The list of IDs does change relatively frequently, but this doesn't seem to
have very much impact on the performance of the query as far as I can tell.
Attached are the stacks.
thanks,
steve
On Wed, Jul 31, 2013 at 6:33 AM, Mikhail Khludnev
<mkhlud...@griddynamics.com> wrote:
> On Wed, Jul 31
On Wed, Jul 31, 2013 at 1:10 AM, Steven Bower wrote:
>
> not sure what you mean by good hit ratio?
>
I mean such queries are really expensive (even on a cache hit), so if the
list of ids changes every time, it never hits the cache and hence executes
these heavy queries every time. It's a well-known perf
[...] @Bill will look into that, I'm not certain it will support the
particular queries that are being executed but
@David I will certainly update when we get the data re-fed... and if you
have things you'd like to investigate or try out please let me know... I'm
happy to eval things at scale here... we will be taking this index from its
current 45m records to 6-700m over the next few months as well.
steve
On
Very good read... Already using MMap... verified using pmap and vsz from
top.
Not sure what you mean by good hit ratio?
Here are the stacks...
Name                                                       Time (ms)  Own Time (ms)
org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet(AtomicReaderContext, Bits)  300879  203478
org.apache.luc
[...] really shouldn't be a big issue, your index files should be MMapped.
Let's try the crude thing first and give the JVM more memory.
FWIW,
Erick
On Tue, Jul 30, 2013 at 12:45 AM, Steven Bower wrote:
>
> - Most of my time (98%) is being spent in
> java.nio.Bits.copyToByteArray(long,Object,long,long) which is being
Steven, please see
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html . My
benchmarking experience shows that
bq: i've added {!cache=false}
Ahh, ok. Forget my comments on warming then; they're irrelevant. Heap probably
isn't relevant either given that, as you say, you don't see pressure there.
What puzzles me then is why you're spending all your time in
copyToByteArray(long,Object,long,long). I _suppose_ (an
@Erick it is a lot of hw, but basically I'm trying to create a "best case
scenario" to take HW out of the question. Will try increasing heap size
tomorrow... I haven't seen it get close to the max heap size yet... but it's
worth trying...
Note that these queries look something like:
q=*:*
fq=[date range
Can you compare with the old geo handler as a baseline?
Bill Bell
Sent from mobile
On Jul 29, 2013, at 4:25 PM, Erick Erickson wrote:
This is very strange. I'd expect slow queries on
the first few queries while these caches were
warmed, but after that I'd expect things to
be quite fast.
For a 12G index and 256G RAM, you have on the
surface a LOT of hardware to throw at this problem.
You can _try_ giving the JVM, say, 18G but tha
I've been doing some performance analysis of a spatial search use case I'm
implementing in Solr 4.3.0. Basically I'm seeing search times a lot higher
than I'd like them to be and I'm hoping people may have some suggestions
for how to optimize further.
Here are the specs of what I'm doing now:
Machine:
- 16 cores @ 2.8ghz
- 256gb RAM
- 1TB (RAID 1+0 on 10 SSD)
Content:
- 45M docs (not very big, only a few fields with no large textual content)
- 1 geo field (using config below)
- index is 12gb
- 1 shard
- Using MMapDirectory
Field config:
<fieldType name="geo" class="solr.SpatialRecursivePrefixTreeFieldType"
    distErrPct="0.025" maxDistErr="0.00045"
    spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
    units="degrees"/>
<field name="[...]" required="false" stored="true" type="geo"/>
What I've figured out so far:
- Most of my time (98%) is being spent in
  java.nio.Bits.copyToByteArray(long,Object,long,long), which is being
  driven by BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadBlock(),
  which from what I gather is basically reading terms from the .tim file
  in blocks
- I moved from Java 1.6 to 1.7 based upon what I read here:
  http://blog.vlad1.com/2011/10/05/looking-at-java-nio-buffer-performance/
  and it definitely had some positive impact (I haven't been able to
  measure this independently yet)
- I changed maxDistErr from 0.000009 (which is ~1m precision) to
  0.00045 (~50m precision)
- It looks to me that the .tim files are being memory mapped fully (i.e.
  they show up in pmap output); the virtual size of the JVM is ~18gb
  (heap is 6gb)
- I've optimized the index, but this doesn't have a dramatic impact on
  performance
Changing the precision and the JVM upgrade yielded a drop from ~18s
avg query time to ~9s avg query time. This is fantastic, but I want to
get this down into the 1-2 second range.
At this point it seems that I am basically bottlenecked on copying memory
out of the mapped .tim file, which leads me to think that the only solution
to my problem would be to read less data or somehow read it more
efficiently.
If anyone has any suggestions of where to go with this I'd love to know.
thanks,
steve
From: [...]rag.k...@gmail.com]
Sent: Tuesday, February 19, 2013 1:46 PM
To: solr-user@lucene.apache.org
Subject: SOLR Performance question
Hi everybody.
I stored 42 fields in Solr and indexed 34 fields.
I am going to store 4-6 more columns and index 3-5 of them.
In total I have stored about 250 docs, and maybe it
Should I shift the machine to m1.large for 250 docs, or for 500?
Or will it work for now?
Another revelation...
I can see that there is a time difference in the Solr output for adding
these documents when I watch it in real time.
Here are some rows from the 3.5 solr server:
Jan 23, 2013 11:57:23 AM org.apache.solr.core.SolrCore execute
INFO: [gxdResult] webapp=/solr path=/update/javabin
pa
I'm still poking around trying to find the differences. I found a couple of
things that may or may not be relevant.
First, when I start up my 3.5 Solr, I get all sorts of warnings that my
solrconfig is old and will run using 2.4 emulation.
Of course I had to upgrade the solrconfig for the 4.0 instance
Do you mean commenting out the <updateLog>...</updateLog> tag? Because
that I already commented out. Or do I also need to remove the entire
<updateHandler> tag? Sorry, I am not too familiar with everything in the
solrconfig file. I have an <updateHandler> tag that essentially looks like
this: everything inside is commented out.
-Kevin
On 1/23/13
It's hard to guess, but I might start by looking at what the new UpdateLog is
costing you. Take its definition out of solrconfig.xml and try your test
again. Then let's take it from there.
- Mark
On Jan 23, 2013, at 11:00 AM, Kevin Stone wrote:
> I am having some difficulty migrating our sol
I am having some difficulty migrating our Solr indexing scripts from Solr 3.5
to Solr 4.0. Notably, I am trying to track down why our indexing performance
in Solr 4.0 is about 5-10 times slower than before. Querying is still quite
fast.
The code adds documents in groups of 1000, and adds e
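The snippet cuts off here; for reference, a minimal SolrJ sketch of the batching pattern being described (a sketch only: names, field layout, and sizes are illustrative placeholders, not the poster's actual script):

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class BatchIndexExample {
    public static void main(String[] args) throws Exception {
        SolrClient client =
            new HttpSolrClient.Builder("http://localhost:8983/solr/collection1").build();

        List<SolrInputDocument> batch = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", Integer.toString(i));
            doc.addField("name", "doc " + i);
            batch.add(doc);
            if (batch.size() == 1000) {   // send in groups of 1000
                client.add(batch);
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            client.add(batch);            // flush the final partial batch
        }
        client.commit();                  // one commit at the end, not per batch
        client.close();
    }
}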
The size of the index does matter practically speaking.
Bill Bell
Sent from mobile
On Mar 19, 2012, at 11:41 AM, Mikhail Khludnev wrote:
Exactly. That's what I mean.
On Mon, Mar 19, 2012 at 6:15 PM, Jamie Johnson wrote:
Mikhail,
Thanks for the response. Just to be clear you're saying that the size
of the index does not matter, it's more the size of the results?
On Fri, Mar 16, 2012 at 2:43 PM, Mikhail Khludnev wrote:
Hello,
Frankly speaking, the computational complexity of a Lucene search depends on
the size of the search result, roughly numFound*log(start+rows), not on the
size of the index. For example, a query matching 1M docs with rows=10 does
about the same work whether the index holds 2M or 200M documents overall.
Regards
On Fri, Mar 16, 2012 at 9:34 PM, Jamie Johnson wrote:
I'm curious if anyone can tell me how Solr/Lucene performs in a situation
where you have 100,000 documents each with 100 tokens vs. having
1,000,000 documents each with 10 tokens. Should I expect the
performance to be the same? Any information would be greatly
appreciated.
> You don't lose copyField capability with dynamic fields. You can copy
> dynamic fields into a fixed field name like *_s => text or dynamic fields
> into another dynamic field like *_s => *_t
Ahhh...I missed that little detail. Nice!
Ok, so there are no negatives to using dynamic fields then
You don't lose copyField capability with dynamic fields. You can copy
dynamic fields into a fixed field name like *_s => text or dynamic
fields into another dynamic field like *_s => *_t
Erik
On Jan 6, 2010, at 9:35 AM, A. Steven Anderson wrote:
Strictly speaking there is some ins
> Strictly speaking there are some insignificant distinctions in performance
> related to how a field name is resolved -- Grant alluded to this
> earlier in this thread -- but it only comes into play when you actually
> refer to that field by name and Solr has to "look them up" in the
> metadata. S
: > So, in general, there is no *significant* performance difference with using
: > dynamic fields. Correct?
:
: Correct. There's not even really an "insignificant" performance difference.
: A dynamic field is the same as a regular field in practically every way on the
: search side of things.
On Jan 4, 2010, at 12:04 AM, A. Steven Anderson wrote:
dynamic fields don't make it worse ... the number of actual field names
you sort on makes it worse.
If you sort on 100 fields, the cost is the same regardless of whether all
100 of those fields exist because of a single <dynamicField/>
declaration
>
> dynamic fields don't make it worse ... the number of actual field names
> you sort on makes it worse.
>
> If you sort on 100 fields, the cost is the same regardless of whether all
> 100 of those fields exist because of a single <dynamicField/> declaration,
> or 100 distinct declarations.
>
Ahh...thanks for t
: > If you sort on many of your dynamic fields your memory use will
: > explode, and the same with index norms and disk space.
: Thanks for the info. In general, I knew sorting was expensive, but I didn't
: realize that dynamic fields made it worse.
dynamic fields don't make it worse ... the nu
Sorting and index norms have space penalties.
Sorting on a field creates an array of Java ints, one for every
document in the index (at four bytes per int, that's roughly 40 MB of heap
per sorted field for a 10M-document index). Index norms (used for boosting
documents and other things) create an array of bytes in the Lucene index
files, one for every document in the index.
If you sort on m
> There can be an impact if you are searching against a lot of fields or if
> you are indexing a lot of fields on every document, but for the most part in
> most applications it is negligible.
>
We index a lot of fields at one time, but we can tolerate the performance
impact at index time.
It pro
On Dec 29, 2009, at 2:19 PM, A. Steven Anderson wrote:
> Greetings!
>
> Is there any significant negative performance impact of using a
> dynamicField?
There can be an impact if you are searching against a lot of fields or if you
are indexing a lot of fields on every document, but for the most
Greetings!
Is there any significant negative performance impact of using a
dynamicField?
Likewise for multivalued fields?
The reason why I ask is that our system basically aggregates data from many
disparate data sources (structured, unstructured, and semi-structured), and
the management of the
[...] could be the JVM being busy sweeping the garbage out, etc.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
----- Original Message -----
From: Robert Purdy <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Thursday, November 15, 2007 4:05:00 PM
Subject: Performance qu
On Nov 15, 2007 4:05 PM, Robert Purdy <[EMAIL PROTECTED]> wrote:
> I was looking in the logs on the production server and noticed some queries
> were taking about 15 seconds
Could be a number of reasons... first make sure a major garbage
collection wasn't triggered at that point in time.
-Yonik
[Is] the time next to the query in the log greater on the production server?
If so, what is the best way to configure Tomcat to deal with that issue?
Thanks, Robert.
On 11/5/07, Haishan Chen <[EMAIL PROTECTED]> wrote:
> As for the first issues. The number of different phrase queries have
> performance issues I found so far are about 10.
If these are normal phrase queries (no slop), a good solution might be
to simply index and query these phrases as a single t
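Yonik's suggestion cuts off here. One common way to index adjacent word pairs as single tokens is Lucene's ShingleFilter; a minimal analyzer sketch under that assumption (an illustration of the general technique, not necessarily the approach Yonik was finishing; modern Lucene 5+ API assumed):

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.shingle.ShingleFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class ShingleExample {
    public static void main(String[] args) throws Exception {
        // Analyzer that emits single terms plus two-word shingles, so
        // "auto repair" is also indexed as the single token "auto repair".
        Analyzer analyzer = new Analyzer() {
            @Override
            protected TokenStreamComponents createComponents(String fieldName) {
                WhitespaceTokenizer source = new WhitespaceTokenizer();
                ShingleFilter shingles = new ShingleFilter(source, 2, 2);
                return new TokenStreamComponents(source, shingles);
            }
        };

        try (TokenStream ts = analyzer.tokenStream("body", "cheap auto repair shop")) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                // prints: cheap, "cheap auto", auto, "auto repair", repair, ...
                System.out.println(term.toString());
            }
            ts.end();
        }
    }
}

In Solr itself the same thing would normally be configured as an analyzer chain in the schema rather than in code; a phrase query then becomes a cheap single-term lookup.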
On 11/5/07, Haishan Chen <[EMAIL PROTECTED]> wrote:
> If I limit the documents returned based on a score threshold (filter by
> score) will it be able to improve query performance?
No.
Taking a different approach can really speed up queries though.
To figure out what approach you should take, we
[Can yo]u offer advice on the best way to implement a score threshold in
Solr with minimum overhead?
I'd appreciate it if anyone can help.
Thank you
Haishan
: It still feels to me that you are trying to do something unique with your
: phrase queries. Unfortunately, you still haven't said what you are trying to
: do in general terms, which makes it very difficult for people to help you.
Agreed. This seems like a very special case, but we don't know what th
> Date: Fri, 2 Nov 2007 07:32:30 -0700
> Subject: Re: Phrase Query Performance Question
> From: [EMAIL PROTECTED]
> To: solr-user@lucene.apache.org
>
> He means "extremely frequent" and I agree. --wunder
Then it means a PHRASE (combination of terms exc
He means "extremely frequent" and I agree. --wunder
On 11/2/07 1:51 AM, "Haishan Chen" <[EMAIL PROTECTED]> wrote:
> Thanks for the advice. You certainly have a point. I believe you mean a query
> term that appears in 5-10% of an index in a natural language corpus is
> extremely INFREQUENT?
On 31-Oct-07, at 11:54 PM, Haishan Chen wrote:
> Date: Wed, 31 Oct 2007 17:54:53 -0700
> Subject: Re: Phrase Query Performance Question
> From: [EMAIL PROTECTED]
> To: solr-[EMAIL PROTECTED]
>
> "hurricane katrina" is a very expensive query against a collection
: ("auto repair") 100384 hits 946 ms(auto repair) 100384 hits 31ms("car
: repair"~100) 112183 hits 766 ms(car repair) 112183 hits 63
: ms("business service"~100) 1209751 hits 1500 ms(business service)
: 1209751 hits 234 ms("shopping center"~100) 119481 hits 359
: ms(shopping c
"hurricane katrina" is a very expensive query against a collection
focused on Hurricane Katrina. There will be many matches in many
documents. If you want to measure worst-case, this is fine.
I'd try other things, like:
* ninth ward
* Ray Nagin
* Audubon Park
* Canal Street
* French Quarter
* FEM
On 31-Oct-07, at 2:40 PM, Haishan Chen wrote:
http://mail-archives.apache.org/mod_mbox/lucene-java-user/200512.mbox/[EMAIL PROTECTED]
It mentioned that http://websearch.archive.org/katrina/ (in nutch)
had 10M documents and a search of "hurricane katrina" was able to
return in 1.35 seconds
Thanks a lot for replying Yonik!
I am running Solr on a Windows 2003 server (standard version), Intel Xeon CPU
3.00GHz, with 4.00 GB RAM.
The index is located on RAID5 with 2 million documents. Is there any way to
improve query performance without moving to a more powerful computer?
I understand
I am a new Solr user and wonder if anyone can help me with these questions. I
used Solr to index about two million documents and query them using the
standard request handler. I disabled all caches. I found phrase queries were
substantially slower than regular queries. The statistics I collected are as
follo
Thanks Yonik. I think both of the conditions hold true for our application
;).
On 3/27/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:
On 3/26/07, climbingrose <[EMAIL PROTECTED]> wrote:
I'm developing an application that potentially creates thousands of dynamic
fields. Does anyone know if a large number of dynamic fields will degrade
Solr performance?
Thousands of fields won't be a problem if
- you don't sort on most of them (
Hi all,
I'm developing an application that potentially creates thousands of dynamic
fields. Does anyone know if a large number of dynamic fields will degrade
Solr performance?
Thanks.
--
Regards,
Cuong Hoang
Ok, I've restructured the tests and am now seeing performance differences
very close to the claims in the javadocs. Thanks much, Yonik and Hoss.
Took 953ms to get 5000 bitset intersection counts
Took 516ms to get 5000 openbitset intersection counts
New code...
public void testMultipleOpenBi
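The posted code is cut off here. A self-contained sketch of this kind of comparison (a reconstruction, not the poster's actual test: class name, seed, and sizes are illustrative; OpenBitSet is the Lucene 4.x-era org.apache.lucene.util class, and the sets are built over the same range, per Yonik's point below about BitSet stopping at its highest set bit):

import java.util.BitSet;
import java.util.Random;
import org.apache.lucene.util.OpenBitSet;

public class BitSetBenchmark {
    public static void main(String[] args) {
        final int size = 500_000;   // universe of document ids
        final int iters = 5000;
        Random rnd = new Random(42);

        // Fill both implementations with the same random bits over the SAME
        // range, so neither gets an artificial advantage from a small
        // highest-set-bit.
        BitSet a = new BitSet(size), b = new BitSet(size);
        OpenBitSet oa = new OpenBitSet(size), ob = new OpenBitSet(size);
        for (int i = 0; i < size / 10; i++) {
            int x = rnd.nextInt(size), y = rnd.nextInt(size);
            a.set(x); oa.set(x);
            b.set(y); ob.set(y);
        }

        long t0 = System.currentTimeMillis();
        long count = 0;
        for (int i = 0; i < iters; i++) {
            // BitSet has no intersection-count; materialize and count.
            BitSet tmp = (BitSet) a.clone();
            tmp.and(b);
            count += tmp.cardinality();
        }
        System.out.println("Took " + (System.currentTimeMillis() - t0)
                + "ms to get " + iters + " bitset intersection counts (" + count + ")");

        t0 = System.currentTimeMillis();
        count = 0;
        for (int i = 0; i < iters; i++) {
            // intersectionCount computes the cardinality without materializing
            // the intersection, which is where OpenBitSet wins.
            count += OpenBitSet.intersectionCount(oa, ob);
        }
        System.out.println("Took " + (System.currentTimeMillis() - t0)
                + "ms to get " + iters + " openbitset intersection counts (" + count + ")");
    }
}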
You are essentially testing sets of size 5000 against sets of size 500,000.
BitSet keeps track of the largest bit you set (which is 5000) and
doesn't actually calculate the intersection or the populationCount
beyond that. OpenBitSet does not (it tries to do the minimum
necessary and make everythi
DocSets will be your friend ... the fact that Solr will choose between
HashDocSets and BitDocSets depending on how many set docs there are in a
particular set is your friend's really cool roomate, and the filterCache
will be your friend's really sweet apartment -- both of which will make
your fri
: Took 421ms to get 5000 bitset intersection counts
: Took 1465ms to get 5000 openbitset intersection counts
:
: ...and I'm wondering what I've done wrong. The results are consistent
: across different JVMs and different hardware setups. I'm using the 7/22
: nightly of Solr. See my test code b
Hello all,
I'm newish to both Lucene and Solr, but I'm loving learning both. I was
intrigued by the comments in the OpenBitSet javadocs...
OpenBitSet is faster than java.util.BitSet in most operations and *much*
faster at calculating cardinality of sets and results of set operations.
...so I