Re: Performance Question: 'facets.missing'

2013-11-06 Thread Yonik Seeley
On Wed, Nov 6, 2013 at 12:07 PM, andres wrote: > I'm debating whether or not to set the 'facets.missing' parameter to true by > default when faceting. What is the performance impact of setting > 'facets.missing' to true? It really depends on the faceting method. For some faceting methods (like e

Re: Performance question on Spatial Search

2013-08-05 Thread David Smiley (@MITRE.org)
From: "Steven Bower-2 [via Lucene]" <ml-node+s472066n4082569...@n3.nabble.com> Date: Monday, August 5, 2013 9:14 AM To: "Smiley, David W." <dsmi...@mitre.org> Subject: Re: Performance question on Spatial Search So after re-feeding our data with

Re: Performance question on Spatial Search

2013-08-05 Thread Shawn Heisey
On 8/5/2013 7:13 AM, Steven Bower wrote: > So after re-feeding our data with a new boolean field that is true when > data exists and false when it doesn't our search times have gone from avg > of about 20s to around 150ms... pretty amazing change in perf... It seems > like https://issues.apache.org

Re: Performance question on Spatial Search

2013-08-05 Thread Steven Bower
So after re-feeding our data with a new boolean field that is true when data exists and false when it doesn't our search times have gone from avg of about 20s to around 150ms... pretty amazing change in perf... It seems like https://issues.apache.org/jira/browse/SOLR-5093 might alleviate many peopl
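
The change Steven describes (index a boolean that records whether the data exists, then filter on that boolean instead of on field:*) can be sketched roughly as below. This is only an illustration in Python, not the code used in the thread: `pp` is the field name that appears in the thread's query, while `has_pp`, `with_presence_flag` and `presence_filter` are hypothetical names chosen for this example.

```python
def with_presence_flag(doc, source_field="pp", flag_field="has_pp"):
    """Return a copy of doc with a boolean recording whether source_field has a value.

    flag_field ("has_pp") is a hypothetical name, not one taken from the thread.
    """
    flagged = dict(doc)
    value = doc.get(source_field)
    flagged[flag_field] = value is not None and value != ""
    return flagged


def presence_filter(flag_field="has_pp"):
    """Build the fq clause that would stand in for the expensive `pp:*` filter."""
    return f"{flag_field}:true"


# Two sample documents: one with the field populated, one without it.
docs = [
    {"id": "1", "pp": "40.7,-74.0"},
    {"id": "2"},
]
indexed = [with_presence_flag(d) for d in docs]
print(indexed)
print(presence_filter())
```

The flag is computed once at feed time, so the filter becomes a match on a single term rather than a walk over every term in the field.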

Re: Performance question on Spatial Search

2013-07-31 Thread Steven Bower
the list of IDs does change relatively frequently, but this doesn't seem to have very much impact on the performance of the query as far as I can tell. attached are the stacks thanks, steve On Wed, Jul 31, 2013 at 6:33 AM, Mikhail Khludnev < mkhlud...@griddynamics.com> wrote: > On Wed, Jul 31

Re: Performance question on Spatial Search

2013-07-31 Thread Mikhail Khludnev
On Wed, Jul 31, 2013 at 1:10 AM, Steven Bower wrote: > > not sure what you mean by good hit ratio? > I mean such queries are really expensive (even on a cache hit), so if the list of ids changes every time, it never hits the cache and hence executes these heavy queries every time. It's well known perf

Re: Performance question on Spatial Search

2013-07-30 Thread Luis Cappa Banda
Thank you very much, David. That was a great explanation! Regards, - Luis Cappa 2013/7/30 Smiley, David W. > Luis, > > field:* and field:[* TO *] are semantically equivalent -- they have the > same effect. But they internally work differently depending on the field > type. The field type ha

Re: Performance question on Spatial Search

2013-07-30 Thread Smiley, David W.
Luis, field:* and field:[* TO *] are semantically equivalent -- they have the same effect. But they internally work differently depending on the field type. The field type has the chance to intercept the range query to do something smart (FieldType.getRangeQuery(...)). Numeric/Date (trie) field

Re: Performance question on Spatial Search

2013-07-30 Thread Steven Bower
@David I will certainly update when we get the data refed... and if you have things you'd like to investigate or try out please let me know.. I'm happy to eval things at scale here... we will be taking this index from its current 45m records to 6-700m over the next few months as well.. steve On

Re: Performance question on Spatial Search

2013-07-30 Thread Steven Bower
Very good read... Already using MMap... verified using pmap and vsz from top.. not sure what you mean by good hit ratio? Here are the stacks... Name Time (ms) Own Time (ms) org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet(AtomicReaderContext, Bits) 300879 203478 org.apache.luc

Re: Performance question on Spatial Search

2013-07-30 Thread Luis Cappa Banda
Hey, David, I've been reading the thread and I think it is one of the most educational mail threads I've read on the Solr mailing list. Just out of curiosity: internally, does Solr treat a query like "field:*" the same as "field:[* TO *]"? I think that it's expected to receive the same number of numFound

Re: Performance question on Spatial Search

2013-07-30 Thread Smiley, David W.
Steve, The FieldCache and DocValues are irrelevant to this problem. Solr's FilterCache is, and Lucene has no counterpart. Perhaps it would be cool if Solr could look for expensive field:* usages when parsing its queries and re-write them to use the FilterCache. That's quite doable, I think. I ju

Re: Performance question on Spatial Search

2013-07-30 Thread Mikhail Khludnev
On Tue, Jul 30, 2013 at 12:45 AM, Steven Bower wrote: > > - Most of my time (98%) is being spent in > java.nio.Bits.copyToByteArray(long,Object,long,long) which is being Steven, please see http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html . My benchmarking experience shows that

Re: Performance question on Spatial Search

2013-07-30 Thread Steven Bower
I am curious why the field:* walks the entire terms list.. could this be discovered from a field cache / docvalues? steve On Tue, Jul 30, 2013 at 2:00 PM, Steven Bower wrote: > Until I get the data refed I there was another field (a date field) that > was there and not when the geo field was/w

Re: Performance question on Spatial Search

2013-07-30 Thread Steven Bower
Until I get the data re-fed: there was another field (a date field) that was present whenever the geo field was present and absent whenever it was not... I tried that field:* and query times come down to 2.5s .. also just removing that filter brings the query down to 30ms.. so I'm very hopeful that with just a boolean i'll be

Re: Performance question on Spatial Search

2013-07-30 Thread Steven Bower
Will give the boolean thing a shot... makes sense... On Tue, Jul 30, 2013 at 11:53 AM, Smiley, David W. wrote: > I see the problem -- it's +pp:*. It may look innocent but it's a > performance killer. What you're telling Lucene to do is iterate over > *every* term in this index to find all document

Re: Performance question on Spatial Search

2013-07-30 Thread Smiley, David W.
I see the problem -- it's +pp:*. It may look innocent but it's a performance killer. What you're telling Lucene to do is iterate over *every* term in this index to find all documents that have this data. Most fields are pretty slow to do that. Lucene/Solr does not have some kind of cache for this. I

Re: Performance question on Spatial Search

2013-07-30 Thread Steven Bower
#1 Here is my query: sort=vid asc start=0 rows=1000 defType=edismax q=*:* fq=recordType:"xxx" fq=vt:"X12B" AND fq=(cls:"3" OR cls:"8") fq=dt:[2013-05-08T00:00:00.00Z TO 2013-07-08T00:00:00.00Z] fq=(vid:86XXX73 OR vid:86XXX20 OR vid:89XXX60 OR vid:89XXX72 OR vid:89XXX48 OR vid:89XXX31 OR vid:89XXX2

Re: Performance question on Spatial Search

2013-07-30 Thread David Smiley (@MITRE.org)
Steve, (1) Can you give a specific example of how you are specifying the spatial query? I'm looking to ensure you are not using "IsWithin", which is not meant for point data. If your query shape is a circle or the bounding box of a circle, you should use the geofilt query parser, otherwise use

Re: Performance question on Spatial Search

2013-07-30 Thread Erick Erickson
bq: i've added {!cache=false} Ahh, ok. forget my comments on warming then, they're irrelevant. Heap probably isn't relevant either given, as you say, you don't see pressure there. What puzzles me then is why you're spending all your time in copyToByteArray(long,Object,long,long). I _suppose_ (an

Re: Performance question on Spatial Search

2013-07-29 Thread Steven Bower
@Erick it is a lot of hw, but basically trying to create a "best case scenario" to take HW out of the question. Will try increasing heap size tomorrow.. I haven't seen it get close to the max heap size yet.. but it's worth trying... Note that these queries look something like: q=*:* fq=[date range

Re: Performance question on Spatial Search

2013-07-29 Thread Bill Bell
Can you compare with the old geo handler as a baseline? Bill Bell Sent from mobile On Jul 29, 2013, at 4:25 PM, Erick Erickson wrote: > This is very strange. I'd expect slow queries on > the first few queries while these caches were > warmed, but after that I'd expect things to > be quite fa

Re: Performance question on Spatial Search

2013-07-29 Thread Erick Erickson
This is very strange. I'd expect slow queries on the first few queries while these caches were warmed, but after that I'd expect things to be quite fast. For a 12G index and 256G RAM, you have on the surface a LOT of hardware to throw at this problem. You can _try_ giving the JVM, say, 18G but tha

Re: Performance Question

2012-03-19 Thread Bill Bell
The size of the index does matter practically speaking. Bill Bell Sent from mobile On Mar 19, 2012, at 11:41 AM, Mikhail Khludnev wrote: > Exactly. That's what I mean. > > On Mon, Mar 19, 2012 at 6:15 PM, Jamie Johnson wrote: > >> Mikhail, >> >> Thanks for the response. Just to be clear

Re: Performance Question

2012-03-19 Thread Mikhail Khludnev
Exactly. That's what I mean. On Mon, Mar 19, 2012 at 6:15 PM, Jamie Johnson wrote: > Mikhail, > > Thanks for the response. Just to be clear you're saying that the size > of the index does not matter, it's more the size of the results? > > On Fri, Mar 16, 2012 at 2:43 PM, Mikhail Khludnev > wro

Re: Performance Question

2012-03-19 Thread Jamie Johnson
Mikhail, Thanks for the response. Just to be clear you're saying that the size of the index does not matter, it's more the size of the results? On Fri, Mar 16, 2012 at 2:43 PM, Mikhail Khludnev wrote: > Hello, > > Frankly speaking the computational complexity of Lucene search depends from > siz

Re: Performance Question

2012-03-16 Thread Mikhail Khludnev
Hello, Frankly speaking, the computational complexity of a Lucene search depends on the size of the search result, numFound*log(start+rows), but not on the size of the index. Regards On Fri, Mar 16, 2012 at 9:34 PM, Jamie Johnson wrote: > I'm curious if anyone can tell me how Solr/Lucene performs in a situation > wh
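
One way to read the numFound*log(start+rows) estimate: every matching document is visited once, and each visit may cost one operation on a priority queue that holds at most start+rows entries. The Python sketch below is a simplified model of that idea under those assumptions, not Lucene's actual collector code; `top_hits` and the sample scores are made up for illustration.

```python
import heapq


def top_hits(scored_hits, start, rows):
    """Stream over all matches, keeping only the best start+rows of them.

    scored_hits is an iterable of (doc_id, score) pairs; its length plays
    the role of numFound. Each heap operation costs O(log(start+rows)).
    """
    k = start + rows
    if k <= 0:
        return []
    heap = []  # min-heap of (score, doc_id); the weakest kept hit is heap[0]
    for doc_id, score in scored_hits:
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            # The new hit beats the weakest kept hit, so swap it in.
            heapq.heapreplace(heap, (score, doc_id))
    ranked = sorted(heap, reverse=True)  # best score first
    return [doc_id for _score, doc_id in ranked[start:start + rows]]


# Hypothetical matches: five documents with made-up scores.
hits = [(1, 0.2), (2, 0.9), (3, 0.5), (4, 0.7), (5, 0.1)]
print(top_hits(hits, start=1, rows=2))
```

In this model, documents that do not match the query never enter the loop, which is the sense in which the cost follows the result size rather than the index size.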

Re: performance question

2010-01-06 Thread A. Steven Anderson
> You don't lose copyField capability with dynamic fields. You can copy > dynamic fields into a fixed field name like *_s => text or dynamic fields > into another dynamic field like *_s => *_t Ahhh...I missed that little detail. Nice! Ok, so there are no negatives to using dynamic fields then

Re: performance question

2010-01-06 Thread Erik Hatcher
You don't lose copyField capability with dynamic fields. You can copy dynamic fields into a fixed field name like *_s => text or dynamic fields into another dynamic field like *_s => *_t Erik On Jan 6, 2010, at 9:35 AM, A. Steven Anderson wrote: Strictly speaking there is some ins
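
The two mappings Erik mentions, *_s => text and *_s => *_t, can be modeled roughly as below. This is a Python sketch of the name mapping only, not Solr's implementation: `resolve_copy_dest` is a hypothetical helper, and it assumes source patterns with a single leading asterisk, as in the examples above.

```python
def resolve_copy_dest(field_name, source_pattern, dest_pattern):
    """Return the destination field for field_name, or None if it does not match.

    Assumes source_pattern looks like '*_s'. dest_pattern is either a fixed
    name ('text') or another leading-asterisk pattern ('*_t').
    """
    if not source_pattern.startswith("*"):
        raise ValueError("this sketch only handles patterns like '*_s'")
    suffix = source_pattern[1:]
    if not field_name.endswith(suffix):
        return None
    # The part of the name that the asterisk matched.
    stem = field_name[:len(field_name) - len(suffix)]
    if dest_pattern.startswith("*"):
        return stem + dest_pattern[1:]
    return dest_pattern


print(resolve_copy_dest("title_s", "*_s", "text"))
print(resolve_copy_dest("title_s", "*_s", "*_t"))
print(resolve_copy_dest("price_i", "*_s", "*_t"))
```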

Re: performance question

2010-01-06 Thread A. Steven Anderson
> Strictly speaking there are some insignificant distinctions in performance > related to how a field name is resolved -- Grant alluded to this > earlier in this thread -- but it only comes into play when you actually > refer to that field by name and Solr has to "look them up" in the > metadata. S

Re: performance question

2010-01-05 Thread Chris Hostetter
: > So, in general, there is no *significant* performance difference with using : > dynamic fields. Correct? : : Correct. There's not even really an "insignificant" performance difference. : A dynamic field is the same as a regular field in practically every way on the : search side of things.

Re: performance question

2010-01-04 Thread Erik Hatcher
On Jan 4, 2010, at 12:04 AM, A. Steven Anderson wrote: dynamic fields don't make it worse ... the number of actual field names you sort on makes it worse. If you sort on 100 fields, the cost is the same regardless of whether all 100 of those fields exist because of a single declaration

Re: performance question

2010-01-03 Thread A. Steven Anderson
> > dynamic fields don't make it worse ... the number of actual field names > you sort on makes it worse. > > If you sort on 100 fields, the cost is the same regardless of whether all > 100 of those fields exist because of a single declaration, > or 100 distinct declarations. > Ahh...thanks for t

Re: performance question

2010-01-03 Thread Chris Hostetter
: > If you sort on many of your dynamic fields your memory use will : > explode, and the same with index norms and disk space. : Thanks for the info. In general, I knew sorting was expensive, but I didn't : realize that dynamic fields made it worse. dynamic fields don't make it worse ... the nu

Re: performance question

2010-01-03 Thread A. Steven Anderson
> Sorting and index norms have space penalties. > Sorting on a field creates an array of Java ints, one for every > document in the index. Index norms (used for boosting documents and > other things) create an array of bytes in the Lucene index files, one > for every document in the index. > If you

Re: performance question

2010-01-02 Thread Lance Norskog
Sorting and index norms have space penalties. Sorting on a field creates an array of Java ints, one for every document in the index. Index norms (used for boosting documents and other things) create an array of bytes in the Lucene index files, one for every document in the index. If you sort on m
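
A back-of-the-envelope version of the space cost Lance describes, as a Python sketch. It assumes exactly what the message states (one 4-byte Java int per document per sorted field, one byte per document per field with norms) and ignores everything else, such as the term values held for string sorts. The document count is a made-up figure; the 100-field count echoes the example used elsewhere in this thread.

```python
def sort_cache_bytes(num_docs, num_sort_fields, bytes_per_entry=4):
    """One int-sized entry per document for every field that is sorted on."""
    return num_docs * num_sort_fields * bytes_per_entry


def norms_bytes(num_docs, num_fields_with_norms):
    """One byte per document for every field that keeps norms."""
    return num_docs * num_fields_with_norms


num_docs = 10_000_000  # hypothetical index size
fields = 100           # number of distinct fields sorted on / carrying norms

print(f"sort arrays: {sort_cache_bytes(num_docs, fields) / 2**30:.2f} GiB")
print(f"norms:       {norms_bytes(num_docs, fields) / 2**30:.2f} GiB")
```

Both figures grow with the number of distinct field names actually used, regardless of whether those names come from one dynamicField declaration or from many explicit ones.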

Re: performance question

2009-12-30 Thread A. Steven Anderson
> There can be an impact if you are searching against a lot of fields or if > you are indexing a lot of fields on every document, but for the most part in > most applications it is negligible. > We index a lot of fields at one time, but we can tolerate the performance impact at index time. It pro

Re: performance question

2009-12-30 Thread Grant Ingersoll
On Dec 29, 2009, at 2:19 PM, A. Steven Anderson wrote: > Greetings! > > Is there any significant negative performance impact of using a > dynamicField? There can be an impact if you are searching against a lot of fields or if you are indexing a lot of fields on every document, but for the most

Re: Performance question: Solr 64 bit java vs 32 bit mode.

2007-11-20 Thread Otis Gospodnetic
Solr runs equally well on both 64-bit and 32-bit systems. Your 15 second problem could be caused by IO bottleneck (not likely if your index is small and fits in RAM), could be concurrency (esp. if you are using compound index format), could be something else on production killing your CPU, coul

Re: Performance question: Solr 64 bit java vs 32 bit mode.

2007-11-17 Thread Yonik Seeley
On Nov 15, 2007 4:05 PM, Robert Purdy <[EMAIL PROTECTED]> wrote: > I was looking in the logs on the production server and noticed some queries > were taking about 15 seconds Could be a number of reasons... first make sure a major garbage collection wasn't triggered at that point in time. -Yonik