Re: solr4 performance question

2014-04-08 Thread Erick Erickson
> \
> &f.cs_rep.separator=%5E" --data-binary @- -H 'Content-type:text/plain; charset=utf-8'
> EnD)
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Tuesday, April 08, 2014 2:21 PM
> To: solr-user@lucene.apache.o

RE: solr4 performance question

2014-04-08 Thread Joshi, Shital
Erick Erickson [mailto:erickerick...@gmail.com] Sent: Tuesday, April 08, 2014 2:21 PM To: solr-user@lucene.apache.org Subject: Re: solr4 performance question What do you have for your _softcommit_ settings in solrconfig.xml? I'm guessing you're using SolrJ or similar, but the solrconfig setti

Re: solr4 performance question

2014-04-08 Thread Erick Erickson
What do you have for your _softcommit_ settings in solrconfig.xml? I'm guessing you're using SolrJ or similar, but the solrconfig settings will trip a commit as well. For that matter, what are all your commit settings in solrconfig.xml, both hard and soft? Best, Erick On Tue, Apr 8, 2014 at 10:28

Re: solr4 performance question

2014-04-08 Thread Furkan KAMACI
Hi Joshi; Go to the Plugins/Stats section under your collection in the Solr Admin UI. You will see the cache statistics for the different types of caches. hitratio and evictions are good statistics to look at first. You should also read this: https://wiki.apache.org/solr/SolrPerformanceFac

solr4 performance question

2014-04-08 Thread Joshi, Shital
Hi, We have 10 node Solr Cloud (5 shards, 2 replicas) with 30 GB JVM on 60GB machine and 40 GB of index. We're constantly noticing that Solr queries take longer while an update (with commit=false setting) is in progress. A query which usually takes .5 seconds takes up to 2 minutes while up
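For anyone trying to reproduce the update pattern described here, a minimal sketch of building the update URL with explicit commit parameters might look like the following. The host, the collection name and the helper name `build_update_url` are assumptions for illustration; only `commit`, `softCommit` and `f.cs_rep.separator` come from the thread.

```python
from urllib.parse import urlencode


def build_update_url(base_url, collection, commit=False, soft_commit=None, extra=None):
    """Build a Solr update URL with explicit commit flags.

    base_url and collection are hypothetical placeholders; adjust to your cluster.
    """
    params = {"commit": "true" if commit else "false"}
    if soft_commit is not None:
        # Only add softCommit when the caller asked for it explicitly.
        params["softCommit"] = "true" if soft_commit else "false"
    params.update(extra or {})
    # urlencode percent-encodes values, so "^" becomes "%5E".
    return f"{base_url}/{collection}/update?{urlencode(params)}"


url = build_update_url(
    "http://localhost:8983/solr",  # assumed host
    "collection1",                 # assumed collection name
    commit=False,
    extra={"f.cs_rep.separator": "^"},
)
print(url)
```

Passing `soft_commit=True` would add `softCommit=true` to the same URL, which is the variant discussed in the 2013 thread further down.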

Re: Performance Question: 'facets.missing'

2013-11-06 Thread Yonik Seeley
On Wed, Nov 6, 2013 at 12:07 PM, andres wrote: > I'm debating whether or not to set the 'facets.missing' parameter to true by > default when faceting. What is the performance impact of setting > 'facets.missing' to true? It really depends on the faceting method. For some faceting methods (like e

Performance Question: 'facets.missing'

2013-11-06 Thread andres
I'm debating whether or not to set the 'facets.missing' parameter to true by default when faceting. What is the performance impact of setting 'facets.missing' to true? -- View this message in context: http://lucene.472066.n3.nabble.com/Performance-Question-facets-mi

Re: Solr4 update and query performance question

2013-08-15 Thread Erick Erickson
sing softCommit=true in update url and check if it > gives us desired performance. > > Thanks for looking into this. Appreciate your help. > > -Original Message- > From: Erick Erickson [mailto:erickerick...@gmail.com] > Sent: Tuesday, August 13, 2013 8:12 AM > To:

RE: Solr4 update and query performance question

2013-08-14 Thread Joshi, Shital
check if it gives us desired performance. Thanks for looking into this. Appreciate your help. -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Tuesday, August 13, 2013 8:12 AM To: solr-user@lucene.apache.org Subject: Re: Solr4 update and query perfor

Re: Solr4 update and query performance question

2013-08-13 Thread Erick Erickson
1> That's hard-coded at present. There's anecdotal evidence that there are throughput improvements with larger batch sizes, but no action yet. 2> Yep, all searchers are also re-opened, caches re-warmed, etc. 3> Odd. I'm assuming your Solr3 was master/slave setup? Seeing the queries wo

Solr4 update and query performance question

2013-08-12 Thread Joshi, Shital
Hi, We have SolrCloud (4.4.0) cluster (5 shards and 2 replicas) on 10 boxes with about 450 mil documents (~90 mil per shard). We're loading 1000 or less documents in CSV format every few minutes. In Solr3, with 300 mil documents, it used to take 30 seconds to load 1000 documents while in Solr4,

Re: Performance question on Spatial Search

2013-08-05 Thread David Smiley (@MITRE.org)
From: "Steven Bower-2 [via Lucene]" <ml-node+s472066n4082569...@n3.nabble.com> Date: Monday, August 5, 2013 9:14 AM To: "Smiley, David W." <dsmi...@mitre.org> Subject: Re: Performance question on Spatial Search So after re-feeding our data with

Re: Performance question on Spatial Search

2013-08-05 Thread Shawn Heisey
On 8/5/2013 7:13 AM, Steven Bower wrote: > So after re-feeding our data with a new boolean field that is true when > data exists and false when it doesn't our search times have gone from avg > of about 20s to around 150ms... pretty amazing change in perf... It seems > like https://issues.apache.org

Re: Performance question on Spatial Search

2013-08-05 Thread Steven Bower
So after re-feeding our data with a new boolean field that is true when data exists and false when it doesn't our search times have gone from avg of about 20s to around 150ms... pretty amazing change in perf... It seems like https://issues.apache.org/jira/browse/SOLR-5093 might alleviate many peopl
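A rough sketch of the re-feeding step Steven describes, adding a boolean flag that records whether a field has data, could look like this. The field names `geo` and `has_geo` and the helper `add_presence_flag` are made up for illustration; the thread does not show the actual schema or feeding code.

```python
def add_presence_flag(doc, source_field, flag_field):
    """Return a copy of doc with flag_field set to True when source_field has data.

    Field names are hypothetical; the thread only says the flag is
    "true when data exists and false when it doesn't".
    """
    value = doc.get(source_field)
    flagged = dict(doc)
    # Missing, empty-string and empty-list values all count as "no data".
    flagged[flag_field] = value not in (None, "", [])
    return flagged


docs = [
    {"id": "1", "geo": "POINT(-71.06 42.36)"},
    {"id": "2"},
]
flagged_docs = [add_presence_flag(d, "geo", "has_geo") for d in docs]
print(flagged_docs)
```

With such a flag indexed, the existence check can be expressed as a plain term filter such as `fq=has_geo:true` rather than a query that has to walk the terms of the original field.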

Re: Performance question on Spatial Search

2013-07-31 Thread Steven Bower
the list of IDs does change relatively frequently, but this doesn't seem to have very much impact on the performance of the query as far as I can tell. attached are the stacks thanks, steve On Wed, Jul 31, 2013 at 6:33 AM, Mikhail Khludnev < mkhlud...@griddynamics.com> wrote: > On Wed, Jul 31

Re: Performance question on Spatial Search

2013-07-31 Thread Mikhail Khludnev
On Wed, Jul 31, 2013 at 1:10 AM, Steven Bower wrote: > > not sure what you mean by good hit ratio? > I mean such queries are really expensive (even on cache hit), so if the list of ids changes every time, it never hits the cache and hence executes these heavy queries every time. It's well known perf

Re: Performance question on Spatial Search

2013-07-30 Thread Luis Cappa Banda
> cache.
>
> @Bill will look into that, I'm not certain it will support the particular
> queries that are being executed but

Re: Performance question on Spatial Search

2013-07-30 Thread Smiley, David W.
> erickerickson@ wrote:
>
> This is very strange. I

Re: Performance question on Spatial Search

2013-07-30 Thread Steven Bower
@David I will certainly update when we get the data refed... and if you have things you'd like to investigate or try out please let me know.. I'm happy to eval things at scale here... we will be taking this index from its current 45m records to 6-700m over the next few months as well.. steve On

Re: Performance question on Spatial Search

2013-07-30 Thread Steven Bower
Very good read... Already using MMap... verified using pmap and vsz from top.. not sure what you mean by good hit ratio? Here are the stacks...

Name    Time (ms)    Own Time (ms)
org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet(AtomicReaderContext, Bits)    300879    203478
org.apache.luc

Re: Performance question on Spatial Search

2013-07-30 Thread Luis Cappa Banda
> really shouldn't be a big issue, your index files
> should be MMaped.
>
> Let's try the crude thing first and give the JVM
> more memory.
>
> FWIW
> Erick
>
> On Mon, Jul 29, 2013 at 4:45 PM, Steven Bower <smb-apache@> wrote:
>> I've been doing some performance analysis of a spatial search use case I'm
>> implementing in Solr 4.3.0. Basically I'm seeing search times a lot higher
>> than I'd like them to be and I'm hoping people may have some suggestions
>> for how to optimize further.
>>
>> Here are the specs of what I'm doing now:
>>
>> Machine:
>> - 16 cores @ 2.8ghz
>> - 256gb RAM
>> - 1TB (RAID 1+0 on 10 SSD)
>>
>> Content:
>> - 45M docs (not very big only a few fields with no large textual content)
>> - 1 geo field (using config below)
>> - index is 12gb
>> - 1 shard
>> - Using MMapDirectory
>>
>> Field config:
>>
>> class="solr.SpatialRecursivePrefixTreeFieldType"
>> distErrPct="0.025" maxDistErr="0.00045"
>> spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
>> units="degrees"/>
>>
>> required="false" stored="true" type="geo"/>
>>
>> What I've figured out so far:
>>
>> - Most of my time (98%) is being spent in
>> java.nio.Bits.copyToByteArray(long,Object,long,long) which is being
>> driven by BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadBlock()
>> which from what I gather is basically reading terms from the .tim file
>> in blocks
>>
>> - I moved from Java 1.6 to 1.7 based upon what I read here:
>> http://blog.vlad1.com/2011/10/05/looking-at-java-nio-buffer-performance/
>> and it definitely had some positive impact (I haven't been able to
>> measure this independently yet)
>>
>> - I changed maxDistErr from 0.09 (which is 1m precision per docs)
>> to 0.00045 (50m precision) ..
>>
>> - It looks to me that the .tim files are being memory mapped fully (ie
>> they show up in pmap output) the virtual size of the jvm is ~18gb
>> (heap is 6gb)
>>
>> - I've optimized the index but this doesn't have a dramatic impact on
>> performance
>>
>> Changing the precision and the JVM upgrade yielded a drop from ~18s
>> avg query time to ~9s avg query time.. This is fantastic but I want to
>> get this down into the 1-2 second range.
>>
>> At this point it seems that basically I am bottle-necked on basically
>> copying memory out of the mapped .tim file which leads me to think
>> that the only solution to my problem would be to read less data or
>> somehow read it more efficiently..
>>
>> If anyone has any suggestions of where to go with this I'd love to know
>>
>> thanks,
>>
>> steve
>
> -----
> Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Performance-question-on-Spatial-Search-tp4081150p4081309.html
> Sent from the Solr - User mailing list archive at Nabble.com.

--
- Luis Cappa
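Since maxDistErr is given in degrees here (units="degrees"), a quick back-of-the-envelope conversion helps check the precision figures quoted in the message. This is only a sketch: it assumes a spherical Earth with a mean radius of 6,371 km and measures along a great circle, which is not necessarily how Solr or spatial4j compute it.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius in metres


def degrees_to_meters(deg):
    """Approximate great-circle length, in metres, of an arc of `deg` degrees."""
    return math.radians(deg) * EARTH_RADIUS_M


def meters_to_degrees(meters):
    """Inverse of degrees_to_meters under the same spherical assumption."""
    return math.degrees(meters / EARTH_RADIUS_M)


print(round(degrees_to_meters(0.00045), 1))   # roughly 50 m, matching "50m precision"
print(round(degrees_to_meters(0.09)))         # on the order of 10 km
print(meters_to_degrees(1.0))                 # degrees corresponding to about 1 m
```

By this estimate 0.00045 degrees is indeed about 50 m, while 0.09 degrees comes out near 10 km, so the "1m precision" value shown in the archived text may have lost digits somewhere along the way. That is a guess; the thread itself does not confirm the original number.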

Re: Performance question on Spatial Search

2013-07-30 Thread Smiley, David W.

Re: Performance question on Spatial Search

2013-07-30 Thread Mikhail Khludnev
On Tue, Jul 30, 2013 at 12:45 AM, Steven Bower wrote: > > - Most of my time (98%) is being spent in > java.nio.Bits.copyToByteArray(long,Object,long,long) which is being Steven, please see http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html. My benchmarking experience shows that

Re: Performance question on Spatial Search

2013-07-30 Thread Steven Bower
> ally I'm seeing search times alot higher
> than I'd like them to be and I'm hoping people may have some suggestions
> for how to optimize further.

Re: Performance question on Spatial Search

2013-07-30 Thread Steven Bower
> - 16 cores @ 2.8ghz
> - 256gb RAM
> - 1TB (RAID 1+0 on 10 SSD)
>
> Content:
> - 45M docs (not very big only a few fields with no large textual

Re: Performance question on Spatial Search

2013-07-30 Thread Steven Bower

Re: Performance question on Spatial Search

2013-07-30 Thread Smiley, David W.

Re: Performance question on Spatial Search

2013-07-30 Thread Steven Bower
> - I moved from Java 1.6 to 1.7 based upon what I read here:
> http://blog.vlad1.com/2011/10/05/looking-at-java-nio-buffer-performance/
> and it definitely had some positive impact (i haven't been able to

Re: Performance question on Spatial Search

2013-07-30 Thread David Smiley (@MITRE.org)

Re: Performance question on Spatial Search

2013-07-30 Thread Erick Erickson
bq: i've added {!cache=false} Ahh, ok. forget my comments on warming then, they're irrelevant. Heap probably isn't relevant either given, as you say, you don't see pressure there. What puzzles me then is why you're spending all your time in copyToByteArray(long,Object,long,long). I _suppose_ (an

Re: Performance question on Spatial Search

2013-07-29 Thread Steven Bower
@Erick it is a lot of hw, but basically trying to create a "best case scenario" to take HW out of the question. Will try increasing heap size tomorrow.. I haven't seen it get close to the max heap size yet.. but it's worth trying... Note that these queries look something like: q=*:* fq=[date range

Re: Performance question on Spatial Search

2013-07-29 Thread Bill Bell
Can you compare with the old geo handler as a baseline? Bill Bell Sent from mobile On Jul 29, 2013, at 4:25 PM, Erick Erickson wrote: > This is very strange. I'd expect slow queries on > the first few queries while these caches were > warmed, but after that I'd expect things to > be quite fa

Re: Performance question on Spatial Search

2013-07-29 Thread Erick Erickson
This is very strange. I'd expect slow queries on the first few queries while these caches were warmed, but after that I'd expect things to be quite fast. For a 12G index and 256G RAM, you have on the surface a LOT of hardware to throw at this problem. You can _try_ giving the JVM, say, 18G but tha

Performance question on Spatial Search

2013-07-29 Thread Steven Bower
I've been doing some performance analysis of a spatial search use case I'm implementing in Solr 4.3.0. Basically I'm seeing search times a lot higher than I'd like them to be and I'm hoping people may have some suggestions for how to optimize further. Here are the specs of what I'm doing now: Mach

RE: SOLR Performance question

2013-02-19 Thread Harshvardhan Ojha
rag.k...@gmail.com] Sent: Tuesday, February 19, 2013 1:46 PM To: solr-user@lucene.apache.org Subject: SOLR Performance question Hi everybody. I stored 42 fields in Solr and indexed 34 of them, and I am going to store 4-6 more columns and index 3-5 of them. The total number of docs I have stored is 250 and may be it

SOLR Performance question

2013-02-19 Thread anurag.jain
i shift machine to m1.large for 250 data or for 500?? or it will work for now ?? -- View this message in context: http://lucene.472066.n3.nabble.com/SOLR-Performance-question-tp4041245.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr 4.0 indexing performance question

2013-01-23 Thread Kevin Stone
Another revelation... I can see that there is a time difference in the Solr output for adding these documents when I watch it realtime. Here are some rows from the 3.5 solr server: Jan 23, 2013 11:57:23 AM org.apache.solr.core.SolrCore execute INFO: [gxdResult] webapp=/solr path=/update/javabin pa

Re: Solr 4.0 indexing performance question

2013-01-23 Thread Kevin Stone
I'm still poking around trying to find the differences. I found a couple things that may or may not be relevant. First, when I start up my 3.5 solr, I get all sorts of warnings that my solrconfig is old and will run using 2.4 emulation. Of course I had to upgrade the solrconfig for the 4.0 instance

Re: Solr 4.0 indexing performance question

2013-01-23 Thread Kevin Stone
Do you mean commenting out the ... tag? Because that I already commented out. Or do I also need to remove the entire tag? Sorry, I am not too familiar with everything in the solrconfig file. I have a tag that essentially looks like this: Everything inside is commented out. -Kevin On 1/23/13

Re: Solr 4.0 indexing performance question

2013-01-23 Thread Mark Miller
It's hard to guess, but I might start by looking at what the new UpdateLog is costing you. Take its definition out of solrconfig.xml and try your test again. Then let's take it from there. - Mark On Jan 23, 2013, at 11:00 AM, Kevin Stone wrote: > I am having some difficulty migrating our sol

Solr 4.0 indexing performance question

2013-01-23 Thread Kevin Stone
I am having some difficulty migrating our solr indexing scripts from using 3.5 to solr 4.0. Notably, I am trying to track down why our performance in solr 4.0 is about 5-10 times slower when indexing documents. Querying is still quite fast. The code adds documents in groups of 1000, and adds e
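The "groups of 1000" batching Kevin mentions can be sketched as a small generator. This is not his indexing script (the thread does not show it); the name `batches` and the default size are placeholders, and the actual call that sends each batch to Solr is left out.

```python
from itertools import islice


def batches(docs, size=1000):
    """Yield successive lists of at most `size` documents from any iterable."""
    iterator = iter(docs)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Each chunk would be handed to whatever client call adds documents to Solr.
sizes = [len(chunk) for chunk in batches(range(2500), size=1000)]
print(sizes)  # [1000, 1000, 500]
```

Timing each batch separately is one simple way to see whether the 3.5 versus 4.0 slowdown is spread evenly across adds or concentrated around commits.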

Re: Performance Question

2012-03-19 Thread Bill Bell
The size of the index does matter practically speaking. Bill Bell Sent from mobile On Mar 19, 2012, at 11:41 AM, Mikhail Khludnev wrote: > Exactly. That's what I mean. > > On Mon, Mar 19, 2012 at 6:15 PM, Jamie Johnson wrote: > >> Mikhail, >> >> Thanks for the response. Just to be clear

Re: Performance Question

2012-03-19 Thread Mikhail Khludnev
Exactly. That's what I mean. On Mon, Mar 19, 2012 at 6:15 PM, Jamie Johnson wrote: > Mikhail, > > Thanks for the response. Just to be clear you're saying that the size > of the index does not matter, it's more the size of the results? > > On Fri, Mar 16, 2012 at 2:43 PM, Mikhail Khludnev > wro

Re: Performance Question

2012-03-19 Thread Jamie Johnson
Mikhail, Thanks for the response. Just to be clear you're saying that the size of the index does not matter, it's more the size of the results? On Fri, Mar 16, 2012 at 2:43 PM, Mikhail Khludnev wrote: > Hello, > > Frankly speaking the computational complexity of Lucene search depends from > siz

Re: Performance Question

2012-03-16 Thread Mikhail Khludnev
Hello, Frankly speaking, the computational complexity of a Lucene search depends on the size of the search result: numFound*log(start+rows), not on the size of the index. Regards On Fri, Mar 16, 2012 at 9:34 PM, Jamie Johnson wrote: > I'm curious if anyone tell me how Solr/Lucene performs in a situation > wh
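One way to see where a numFound*log(start+rows) estimate comes from is a bounded priority queue: every matching document is offered to a heap that never holds more than start+rows entries. The sketch below illustrates that idea only; it is not Lucene's collector code, and the function name `top_hits` is made up.

```python
import heapq


def top_hits(scored_docs, start, rows):
    """Return doc ids for the page [start, start+rows) ordered by descending score.

    scored_docs is an iterable of (doc_id, score) pairs, one per matching
    document, so its length plays the role of numFound.
    """
    k = start + rows
    if k <= 0:
        return []
    heap = []  # min-heap of (score, doc_id), never larger than k
    for doc_id, score in scored_docs:
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))        # O(log k)
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, doc_id))     # O(log k)
    ranked = sorted(heap, key=lambda entry: entry[0], reverse=True)
    return [doc_id for _score, doc_id in ranked[start:start + rows]]


matches = [(1, 0.5), (2, 0.9), (3, 0.1), (4, 0.7)]
print(top_hits(matches, start=0, rows=2))  # [2, 4]
print(top_hits(matches, start=1, rows=2))  # [4, 1]
```

The loop runs once per match and each heap operation costs at most log(start+rows), which is why deep paging (a large start) gets more expensive even when the index itself does not grow.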

Performance Question

2012-03-16 Thread Jamie Johnson
I'm curious if anyone can tell me how Solr/Lucene performs in a situation where you have 100,000 documents each with 100 tokens vs having 1,000,000 documents each with 10 tokens. Should I expect the performance to be the same? Any information would be greatly appreciated.

Re: performance question

2010-01-06 Thread A. Steven Anderson
> You don't lose copyField capability with dynamic fields. You can copy > dynamic fields into a fixed field name like *_s => text or dynamic fields > into another dynamic field like *_s => *_t Ahhh...I missed that little detail. Nice! Ok, so there are no negatives to using dynamic fields then

Re: performance question

2010-01-06 Thread Erik Hatcher
You don't lose copyField capability with dynamic fields. You can copy dynamic fields into a fixed field name like *_s => text or dynamic fields into another dynamic field like *_s => *_t Erik On Jan 6, 2010, at 9:35 AM, A. Steven Anderson wrote: Strictly speaking there is some ins
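To make the two mappings in Erik's message concrete (`*_s => text` and `*_s => *_t`), here is a toy resolver for leading-wildcard copy rules. It only illustrates the naming behaviour described in the message; it is not Solr's schema code, and `copy_targets` is a made-up helper.

```python
def copy_targets(field_name, rules):
    """Return destination field names for field_name under (source, dest) rules.

    Only leading-wildcard sources such as "*_s" are handled in this sketch.
    A "*" in dest is replaced by the part of field_name matched by the wildcard.
    """
    targets = []
    for source, dest in rules:
        if not source.startswith("*"):
            if source == field_name:
                targets.append(dest)
            continue
        suffix = source[1:]
        if field_name.endswith(suffix):
            stem = field_name[:len(field_name) - len(suffix)]
            targets.append(dest.replace("*", stem, 1))
    return targets


rules = [("*_s", "text"), ("*_s", "*_t")]
print(copy_targets("title_s", rules))  # ['text', 'title_t']
print(copy_targets("price_i", rules))  # []
```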

Re: performance question

2010-01-06 Thread A. Steven Anderson
> Strictly speaking there are some insignificant distinctions in performance > related to how a field name is resolved -- Grant alluded to this > earlier in this thread -- but it only comes into play when you actually > refer to that field by name and Solr has to "look them up" in the > metadata. S

Re: performance question

2010-01-05 Thread Chris Hostetter
: > So, in general, there is no *significant* performance difference with using : > dynamic fields. Correct? : : Correct. There's not even really an "insignificant" performance difference. : A dynamic field is the same as a regular field in practically every way on the : search side of things.

Re: performance question

2010-01-04 Thread Erik Hatcher
On Jan 4, 2010, at 12:04 AM, A. Steven Anderson wrote: dynamic fields don't make it worse ... the number of actual field names you sort on makes it worse. If you sort on 100 fields, the cost is the same regardless of whether all 100 of those fields exist because of a single declaration

Re: performance question

2010-01-03 Thread A. Steven Anderson
> > dynamic fields don't make it worse ... the number of actual field names > you sort on makes it worse. > > If you sort on 100 fields, the cost is the same regardless of whether all > 100 of those fields exist because of a single declaration, > or 100 distinct declarations. > Ahh...thanks for t

Re: performance question

2010-01-03 Thread Chris Hostetter
: > If you sort on many of your dynamic fields your memory use will : > explode, and the same with index norms and disk space. : Thanks for the info. In general, I knew sorting was expensive, but I didn't : realize that dynamic fields made it worse. dynamic fields don't make it worse ... the nu

Re: performance question

2010-01-03 Thread A. Steven Anderson
> Sorting and index norms have space penalties. > Sorting on a field creates an array of Java ints, one for every > document in the index. Index norms (used for boosting documents and > other things) create an array of bytes in the Lucene index files, one > for every document in the index. > If you

Re: performance question

2010-01-02 Thread Lance Norskog
Sorting and index norms have space penalties. Sorting on a field creates an array of Java ints, one for every document in the index. Index norms (used for boosting documents and other things) create an array of bytes in the Lucene index files, one for every document in the index. If you sort on m
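Lance's description (one Java int per document for each sorted field, one byte per document for each field with norms) lends itself to a rough size estimate. The sketch below takes those per-document sizes from the message and ignores everything else, such as string sort fields or JVM object overhead, so treat the numbers as a lower bound at best.

```python
INT_BYTES = 4   # a Java int, per the message
NORM_BYTES = 1  # one byte per document per field with norms, per the message


def sort_and_norms_bytes(num_docs, sorted_fields, fields_with_norms):
    """Estimate (sort array bytes, norms bytes) for an index of num_docs documents."""
    sort_bytes = INT_BYTES * num_docs * sorted_fields
    norm_bytes = NORM_BYTES * num_docs * fields_with_norms
    return sort_bytes, norm_bytes


# Example figures are invented: 10 million docs, 100 sorted fields, 100 fields with norms.
sort_bytes, norm_bytes = sort_and_norms_bytes(10_000_000, 100, 100)
print(sort_bytes / 1e9, "GB for sort arrays")   # 4.0
print(norm_bytes / 1e9, "GB for norms")         # 1.0
```

The estimate grows with the number of distinct field names actually sorted on, which is consistent with the later replies: it does not matter whether those names come from one dynamicField declaration or from many explicit ones.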

Re: performance question

2009-12-30 Thread A. Steven Anderson
> There can be an impact if you are searching against a lot of fields or if > you are indexing a lot of fields on every document, but for the most part in > most applications it is negligible. > We index a lot of fields at one time, but we can tolerate the performance impact at index time. It pro

Re: performance question

2009-12-30 Thread Grant Ingersoll
On Dec 29, 2009, at 2:19 PM, A. Steven Anderson wrote: > Greetings! > > Is there any significant negative performance impact of using a > dynamicField? There can be an impact if you are searching against a lot of fields or if you are indexing a lot of fields on every document, but for the most

performance question

2009-12-29 Thread A. Steven Anderson
Greetings! Is there any significant negative performance impact of using a dynamicField? Likewise for multivalued fields? The reason why I ask is that our system basically aggregates data from many disparate data sources (structured, unstructured, and semi-structured), and the management of the

Re: Performance question: Solr 64 bit java vs 32 bit mode.

2007-11-20 Thread Otis Gospodnetic
, could be the JVM being busy sweeping the garbage out, etc. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Robert Purdy <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent: Thursday, November 15, 2007 4:05:00 PM Subject: Performance qu

Re: Performance question: Solr 64 bit java vs 32 bit mode.

2007-11-17 Thread Yonik Seeley
On Nov 15, 2007 4:05 PM, Robert Purdy <[EMAIL PROTECTED]> wrote: > I was looking in the logs on the production server and noticed some queries > were taking about 15 seconds Could be a number of reasons... first make sure a major garbage collection wasn't triggered at that point in time. -Yonik
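One way to check whether garbage collection lines up with a slow query is to read the JVM's own collector counters before and after the slow period. The sketch below uses the standard `java.lang.management` beans; the class name `GcSnapshot` is made up for illustration. It reports on the JVM it runs in, so for a Solr instance inside Tomcat the same beans would have to be read from within that JVM or remotely over JMX, which is an assumption about the setup rather than something the thread spells out.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Reads cumulative GC counters for the current JVM (hypothetical helper class).
final class GcSnapshot {

    // Total number of collections across all collectors; a collector that
    // reports -1 ("undefined") is counted as 0.
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionCount());
        }
        return total;
    }

    // Total accumulated collection time in milliseconds across all collectors.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionTime());
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.printf("total: %d collections, %d ms%n", totalGcCount(), totalGcMillis());
    }
}
```

If the total time jumps by several seconds across the window in which the 15-second query was logged, a major collection is a plausible cause.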

Performance question: Solr 64 bit java vs 32 bit mode.

2007-11-15 Thread Robert Purdy
the time next to the query in the log greater on the production server? If so what is the best way to configure tomcat to deal with that issue? Thanks Robert. -- View this message in context: http://www.nabble.com/Performance-question%3A-Solr-64-bit-java-vs-32-bit-mode.-tf4817186.html#a13781791

Re: Phrase Query Performance Question and score threshold

2007-11-05 Thread Yonik Seeley
On 11/5/07, Haishan Chen <[EMAIL PROTECTED]> wrote: > As for the first issue: the number of different phrase queries with > performance issues that I have found so far is about 10. If these are normal phrase queries (no slop), a good solution might be to simply index and query these phrases as a single t
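The snippet above is cut off, so the following is only a sketch of one reading of the suggestion: rewrite each known problem phrase into one unit before both indexing and querying, so that matching it is a single-term lookup rather than a positional phrase match. The class `PhraseCollapser` and the underscore convention are hypothetical. In a real Solr setup this would live in the field's analysis chain, which is not shown here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Collapses a fixed list of phrases into single units, e.g.
// "auto repair" -> "auto_repair". Hypothetical helper, not a Solr API.
final class PhraseCollapser {

    private final List<String> phrases; // expected in lower case

    PhraseCollapser(List<String> phrases) {
        this.phrases = phrases;
    }

    // Apply the same rewrite to document text and to query text so that
    // both sides agree on the collapsed form.
    String collapse(String text) {
        String out = text.toLowerCase(Locale.ROOT);
        for (String phrase : phrases) {
            String joined = phrase.replace(' ', '_');
            // Word boundaries keep "semiauto repairs" from being rewritten.
            out = Pattern.compile("\\b" + Pattern.quote(phrase) + "\\b")
                    .matcher(out)
                    .replaceAll(Matcher.quoteReplacement(joined));
        }
        return out;
    }

    public static void main(String[] args) {
        PhraseCollapser collapser =
                new PhraseCollapser(Arrays.asList("auto repair", "car repair"));
        System.out.println(collapser.collapse("Cheap Auto Repair near the shopping center"));
    }
}
```

With only about ten problem phrases, a fixed list like this stays small. The trade-off is that the list has to be maintained and the affected documents reindexed whenever it changes.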

RE: Phrase Query Performance Question and score threshold

2007-11-05 Thread Haishan Chen
> Date: Mon, 5 Nov 2007 14:55:21 -0500 > From: [EMAIL PROTECTED] > To: solr-user@lucene.apache.org > Subject: Re: Phrase Query Performance Question and score threshold > > On 11/5/07, Haishan Chen <[EMAIL PROTECTED]> wrote: > > > If I limit the docume

Re: Phrase Query Performance Question and score threshold

2007-11-05 Thread Yonik Seeley
On 11/5/07, Haishan Chen <[EMAIL PROTECTED]> wrote: > If I limit the documents returned based on a score threshold (filter by > score) will it be able to improve query performance? No. Taking a different approach can really speed up queries though. To figure out what approach you should take, we

RE: Phrase Query Performance Question and score threshold

2007-11-05 Thread Haishan Chen
u offer advice on the best way to implement a score threshold in Solr with minimum overhead? I would appreciate it if anyone can help. Thank you, Haishan > Date: Fri, 2 Nov 2007 12:31:29 -0700 > From: [EMAIL PROTECTED] > To: solr-user@lucene.apache.org > Subject: Re: Phrase Query Per

RE: Phrase Query Performance Question

2007-11-02 Thread Haishan Chen
> Date: Fri, 2 Nov 2007 12:31:29 -0700 > From: [EMAIL PROTECTED] > To: solr-user@lucene.apache.org > Subject: Re: Phrase Query Performance Question > > : It still feels to me that you are trying to do something unique with your > : phrase queries. Unfortuna

Re: Phrase Query Performance Question

2007-11-02 Thread Chris Hostetter
: It still feels to me that you are trying to do something unique with your : phrase queries. Unfortunately, you still haven't said what you are trying to : do in general terms, which makes it very difficult for people to help you. Agreed. This seems a very special case, but we don't know what th

Re: Phrase Query Performance Question

2007-11-02 Thread Mike Klaas
On 2-Nov-07, at 10:03 AM, Haishan Chen wrote: > Date: Fri, 2 Nov 2007 07:32:30 -0700 > Subject: Re: Phrase Query Performance Question > From: [EMAIL PROTECTED] > To: solr- [EMAIL PROTECTED] > > He means "extremely frequent" and I agree. --wunder Then it means

RE: Phrase Query Performance Question

2007-11-02 Thread Haishan Chen
> Date: Fri, 2 Nov 2007 07:32:30 -0700 > Subject: Re: Phrase Query Performance Question > From: [EMAIL PROTECTED] > To: solr-user@lucene.apache.org > > He means "extremely frequent" and I agree. --wunder Then it means a PHRASE (combination of terms exc

Re: Phrase Query Performance Question

2007-11-02 Thread Walter Underwood
He means "extremely frequent" and I agree. --wunder On 11/2/07 1:51 AM, "Haishan Chen" <[EMAIL PROTECTED]> wrote: > Thanks for the advice. You certainly have a point. I believe you mean a query > term that appears in 5-10% of an index in a natural language corpus is > extremely INFREQUENT?

RE: Phrase Query Performance Question

2007-11-02 Thread Haishan Chen
> From: [EMAIL PROTECTED] > Subject: Re: Phrase Query Performance Question > Date: Thu, 1 Nov 2007 11:25:26 -0700 > To: solr-user@lucene.apache.org > > On 31-Oct-07, at 11:54 PM, Haishan Chen wrote: > > >> Date: Wed, 31 Oct 2007 17:54:53 -070

Re: Phrase Query Performance Question

2007-11-01 Thread Mike Klaas
On 31-Oct-07, at 11:54 PM, Haishan Chen wrote: > Date: Wed, 31 Oct 2007 17:54:53 -0700 > Subject: Re: Phrase Query Performance Question > From: [EMAIL PROTECTED] > To: solr- [EMAIL PROTECTED] > > "hurricane katrina" is a very expensive query against a collection

RE: Phrase Query Performance Question

2007-10-31 Thread Haishan Chen
> Date: Wed, 31 Oct 2007 17:54:53 -0700 > Subject: Re: Phrase Query Performance Question > From: [EMAIL PROTECTED] > To: solr-user@lucene.apache.org > > "hurricane katrina" is a very expensive query against a collection > focused on Hurricane Kat

RE: Phrase Query Performance Question

2007-10-31 Thread Haishan Chen
> Date: Wed, 31 Oct 2007 19:19:07 -0700 > From: [EMAIL PROTECTED] > To: solr-user@lucene.apache.org > Subject: RE: Phrase Query Performance Question > > : ("auto repair") 100384 hits 946 ms > : (auto repair) 100384 hits 31 ms > : ("car repair

RE: Phrase Query Performance Question

2007-10-31 Thread Chris Hostetter
: ("auto repair") 100384 hits 946 ms
: (auto repair) 100384 hits 31 ms
: ("car repair"~100) 112183 hits 766 ms
: (car repair) 112183 hits 63 ms
: ("business service"~100) 1209751 hits 1500 ms
: (business service) 1209751 hits 234 ms
: ("shopping center"~100) 119481 hits 359 ms
: (shopping c

Re: Phrase Query Performance Question

2007-10-31 Thread Walter Underwood
"hurricane katrina" is a very expensive query against a collection focused on Hurricane Katrina. There will be many matches in many documents. If you want to measure worst-case, this is fine. I'd try other things, like:
* ninth ward
* Ray Nagin
* Audubon Park
* Canal Street
* French Quarter
* FEM

RE: Phrase Query Performance Question

2007-10-31 Thread Haishan Chen
> From: [EMAIL PROTECTED] > Subject: Re: Phrase Query Performance Question > Date: Wed, 31 Oct 2007 15:25:42 -0700 > To: solr-user@lucene.apache.org > > On 31-Oct-07, at 2:40 PM, Haishan Chen wrote: > > >> http://mail-archives.apache.org/mod_mbox/l

Re: Phrase Query Performance Question

2007-10-31 Thread Mike Klaas
On 31-Oct-07, at 2:40 PM, Haishan Chen wrote: http://mail-archives.apache.org/mod_mbox/lucene-java-user/200512.mbox/[EMAIL PROTECTED] It mentioned that http://websearch.archive.org/katrina/ (in nutch) had 10M documents and a search of "hurricane katrina" was able to return in 1.35 second

RE: Phrase Query Performance Question

2007-10-31 Thread Haishan Chen
> From: [EMAIL PROTECTED] > Subject: Re: Phrase Query Performance Question > Date: Tue, 30 Oct 2007 11:22:17 -0700 > To: solr-user@lucene.apache.org > > On 30-Oct-07, at 6:09 AM, Yonik Seeley wrote: > > > On 10/30/07, Haishan Chen <[EMAIL PROTECTED

Re: Phrase Query Performance Question

2007-10-30 Thread Mike Klaas
On 30-Oct-07, at 6:09 AM, Yonik Seeley wrote: On 10/30/07, Haishan Chen <[EMAIL PROTECTED]> wrote: Thanks a lot for replying Yonik! I am running Solr on a Windows 2003 server (standard version), Intel Xeon CPU 3.00GHz, with 4.00 GB RAM. The index is located on RAID 5 with 2 million documents.

Re: Phrase Query Performance Question

2007-10-30 Thread Yonik Seeley
On 10/30/07, Haishan Chen <[EMAIL PROTECTED]> wrote: > Thanks a lot for replying Yonik! > > I am running Solr on a Windows 2003 server (standard version), Intel Xeon CPU > 3.00GHz, with 4.00 GB RAM. > The index is located on RAID 5 with 2 million documents. Is there any way to > improve query perfo

RE: Phrase Query Performance Question

2007-10-30 Thread Haishan Chen
Thanks a lot for replying Yonik! I am running Solr on a Windows 2003 server (standard version), Intel Xeon CPU 3.00GHz, with 4.00 GB RAM. The index is located on RAID 5 with 2 million documents. Is there any way to improve query performance without moving to a more powerful computer? I understand

Phrase Query Performance Question

2007-10-26 Thread Haishan Chen
I am a new Solr user and wonder if anyone can help me with these questions. I used Solr to index about two million documents and query them using the standard request handler. I disabled all caches. I found that phrase queries were substantially slower than ordinary queries. The statistics I collected are as follo

Re: Dynamic fields performance question

2007-03-26 Thread climbingrose
Thanks Yonik. I think both of the conditions hold true for our application ;). On 3/27/07, Yonik Seeley <[EMAIL PROTECTED]> wrote: On 3/26/07, climbingrose <[EMAIL PROTECTED]> wrote: > I'm developing an application that potentially creates thousands of dynamic > fields. Does anyone know if lar

Re: Dynamic fields performance question

2007-03-26 Thread Yonik Seeley
On 3/26/07, climbingrose <[EMAIL PROTECTED]> wrote: I'm developing an application that potentially creates thousands of dynamic fields. Does anyone know if a large number of dynamic fields will degrade Solr performance? Thousands of fields won't be a problem if - you don't sort on most of them (

Dynamic fields performance question

2007-03-25 Thread climbingrose
Hi all, I'm developing an application that potentially creates thousands of dynamic fields. Does anyone know if a large number of dynamic fields will degrade Solr performance? Thanks. -- Regards, Cuong Hoang

Re: OpenBitSet performance question

2006-07-22 Thread Cass Costello
Ok, I've restructured the tests and am now seeing performance differences very close to the claims in the javadocs. Thanks much, Yonik and Hoss. Took 953ms to get 5000 bitset intersection counts Took 516ms to get 5000 openbitset intersection counts New code... public void testMultipleOpenBi

Re: OpenBitSet performance question

2006-07-22 Thread Yonik Seeley
You are essentially testing sets of size 5000 against sets of size 500,000. BitSet keeps track of the largest bit you set (which is 5000) and doesn't actually calculate the intersection or the populationCount beyond that. OpenBitSet does not (it tries to do the minimum necessary and make everythi
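The effect described above can be seen with `java.util.BitSet` alone. The sketch below is not the original test code, which is not shown in this snippet. It only illustrates, with made-up helper names, that a `BitSet` allocated for 500,000 bits but populated only below bit 5,000 reports a logical length of at most 5,000, while one populated across the whole range does not. The sizes 5,000 and 500,000 are taken from the message; the seeds and counts are arbitrary.

```java
import java.util.BitSet;
import java.util.Random;

// Shows how the highest set bit, not the allocated capacity, determines
// how much of a java.util.BitSet is "in use". Helper names are hypothetical.
final class BitSetRangeDemo {

    // Sets up to `count` random bits in [0, maxBitExclusive); duplicates are possible.
    static BitSet randomBits(int capacity, int maxBitExclusive, int count, long seed) {
        BitSet bits = new BitSet(capacity);
        Random rnd = new Random(seed);
        for (int i = 0; i < count; i++) {
            bits.set(rnd.nextInt(maxBitExclusive));
        }
        return bits;
    }

    // Intersection size computed on a copy so neither input is modified.
    static int intersectionCount(BitSet a, BitSet b) {
        BitSet copy = (BitSet) a.clone();
        copy.and(b);
        return copy.cardinality();
    }

    public static void main(String[] args) {
        BitSet narrow = randomBits(500_000, 5_000, 5_000, 1L);
        BitSet wide = randomBits(500_000, 500_000, 5_000, 2L);

        // length() is the index of the highest set bit plus one.
        System.out.println("narrow length: " + narrow.length());
        System.out.println("wide length:   " + wide.length());
        System.out.println("intersection:  " + intersectionCount(narrow, wide));
    }
}
```

For a like-for-like timing comparison, the test data should spread bits over the full range, as in `wide` above, so that both implementations have the same amount of work to do.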

Re: OpenBitSet performance question

2006-07-22 Thread Cass Costello
DocSets will be your friend ... the fact that Solr will choose between HashDocSets and BitDocSets depending on how many set docs there are in a particular set is your friend's really cool roommate, and the filterCache will be your friend's really sweet apartment -- both of which will make your fri

Re: OpenBitSet performance question

2006-07-22 Thread Chris Hostetter
: Took 421ms to get 5000 bitset intersection counts : Took 1465ms to get 5000 openbitset intersection counts : : ...and I'm wondering what I've done wrong. The results are consistent : across different jvms and different hardware setups. I'm using the 7/22 : nightly of Solr. See my test code b

OpenBitSet performance question

2006-07-22 Thread Cass Costello
Hello all, I'm newish to both Lucene and Solr, but I'm loving learning both. I was intrigued by the comments in the OpenBitSet javadocs... OpenBitSet is faster than java.util.BitSet in most operations and *much* faster at calculating cardinality of sets and results of set operations. ...so I