From: "Steven Bower-2 [via Lucene]"
<mailto:ml-node+s472066n4082569...@n3.nabble.com>
Date: Monday, August 5, 2013 9:14 AM
To: "Smiley, David W." <mailto:dsmi...@mitre.org>
Subject: Re: Performance question on Spatial Search
On 8/5/2013 7:13 AM, Steven Bower wrote:
So after re-feeding our data with a new boolean field that is true when
data exists and false when it doesn't, our search times have gone from an
average of about 20s to around 150ms... pretty amazing change in perf... It
seems like https://issues.apache.org/jira/browse/SOLR-5093 might alleviate many
peopl
the list of IDs does change relatively frequently, but this doesn't seem to
have very much impact on the performance of the query as far as I can tell.
attached are the stacks
thanks,
steve
On Wed, Jul 31, 2013 at 6:33 AM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:
On Wed, Jul 31, 2013 at 1:10 AM, Steven Bower wrote:
>
> not sure what you mean by good hit ratio?
>
I mean such queries are really expensive (even on a cache hit), so if the
list of ids changes every time, it never hits the cache and hence executes
these heavy queries every time. It's well known perf
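
A minimal sketch of what a non-cached filter on a changing id list could look like, assuming the `vid` field from the query posted in this thread and Solr's `{!cache=false}` local parameter, which is mentioned elsewhere in the thread. The helper name `vid_filter` is hypothetical and only builds the parameter string; it does not talk to Solr.

```python
def vid_filter(vids, cache=True):
    """Build an fq clause for a list of vid values.

    With cache=False the clause is prefixed with {!cache=false}, so a list
    that changes on every request is not stored in the filter cache.
    """
    clause = "vid:(" + " OR ".join(str(v) for v in vids) + ")"
    return clause if cache else "{!cache=false}" + clause


# Example with masked ids taken from the thread's query.
fq = vid_filter(["86XXX73", "86XXX20"], cache=False)
print(fq)
```

Whether skipping the cache helps depends on how often the same list repeats; this only shows the syntax.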
Thank you very much, David. That was a great explanation!
Regards,
- Luis Cappa
2013/7/30 Smiley, David W.
Luis,
field:* and field:[* TO *] are semantically equivalent -- they have the
same effect. But they internally work differently depending on the field
type. The field type has the chance to intercept the range query to do
something smart (FieldType.getRangeQuery(...)). Numeric/Date (trie)
field
@David I will certainly update when we get the data re-fed... and if you
have things you'd like to investigate or try out, please let me know. I'm
happy to eval things at scale here... we will be taking this index from its
current 45m records to 600-700m over the next few months as well.
steve
On
Very good read... Already using MMap... verified using pmap and vsz from
top..
not sure what you mean by good hit ratio?
Here are the stacks...
Name                                                                                         Time (ms)  Own Time (ms)
org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet(AtomicReaderContext, Bits)  300879     203478
org.apache.luc
Hey, David,
I've been reading the thread, and I think it is one of the most educational
mail threads I've read on the Solr mailing list. Just out of curiosity:
internally, does Solr treat a query like "field:*" the same as
"field:[* TO *]"? I think that it's expected to receive the same number of numFound
Steve,
The FieldCache and DocValues are irrelevant to this problem. Solr's
FilterCache is, and Lucene has no counterpart. Perhaps it would be cool
if Solr could look for expensive field:* usages when parsing its queries
and re-write them to use the FilterCache. That's quite doable, I think.
I ju
On Tue, Jul 30, 2013 at 12:45 AM, Steven Bower wrote:
>
> - Most of my time (98%) is being spent in
> java.nio.Bits.copyToByteArray(long,Object,long,long) which is being
Steven, please see
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html . My
benchmarking experience shows that
I am curious why the field:* walks the entire terms list.. could this be
discovered from a field cache / docvalues?
steve
On Tue, Jul 30, 2013 at 2:00 PM, Steven Bower wrote:
Until I get the data re-fed: there is another field (a date field) that is
present or absent exactly when the geo field is... I tried that field:*
and query times came down to 2.5s. Also, just removing that filter brings
the query down to 30ms, so I'm very hopeful that with just a boolean I'll
be
Will give the boolean thing a shot... makes sense...
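
A minimal sketch of the boolean-field approach discussed in this thread: set a flag at feed time that records whether the document has data in the expensive field, then filter on that flag instead of `pp:*`. The field name `pp` comes from the `+pp:*` clause quoted in the thread; the flag name `has_pp` and both helper functions are hypothetical, and the sample documents are made up for illustration.

```python
from urllib.parse import urlencode


def add_presence_flag(doc, source_field="pp", flag_field="has_pp"):
    """Return a copy of doc with a boolean recording whether source_field has data."""
    flagged = dict(doc)
    value = doc.get(source_field)
    flagged[flag_field] = value not in (None, "", [])
    return flagged


def presence_filter(flag_field="has_pp"):
    """Filter on the single term 'true' rather than enumerating all terms via pp:*."""
    return f"{flag_field}:true"


# Feed-time side: flag each document before sending it to the index.
docs = [{"id": "1", "pp": "40.7,-74.0"}, {"id": "2"}]
flagged = [add_presence_flag(d) for d in docs]

# Query-time side: the filter now matches one indexed term.
params = urlencode([("q", "*:*"), ("fq", presence_filter())])
print(flagged)
print(params)
```

The flag has to be populated for every document, which is why the thread talks about re-feeding the data before the faster filter can be used.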
On Tue, Jul 30, 2013 at 11:53 AM, Smiley, David W. wrote:
I see the problem -- it's +pp:*. It may look innocent but it's a
performance killer. What you're telling Lucene to do is iterate over
*every* term in this index to find all documents that have this data.
Most fields are pretty slow to do that. Lucene/Solr does not have some
kind of cache for this. I
#1 Here is my query:
sort=vid asc
start=0
rows=1000
defType=edismax
q=*:*
fq=recordType:"xxx"
fq=vt:"X12B" AND
fq=(cls:"3" OR cls:"8")
fq=dt:[2013-05-08T00:00:00.00Z TO 2013-07-08T00:00:00.00Z]
fq=(vid:86XXX73 OR vid:86XXX20 OR vid:89XXX60 OR vid:89XXX72 OR vid:89XXX48
OR vid:89XXX31 OR vid:89XXX2
Steve,
(1) Can you give a specific example of how you are specifying the spatial
query? I'm looking to ensure you are not using "IsWithin", which is not
meant for point data. If your query shape is a circle or the bounding box
of a circle, you should use the geofilt query parser, otherwise use
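
A minimal sketch of a circle filter using Solr's geofilt query parser, as recommended above for circular query shapes. The parameters `sfield`, `pt`, and `d` are geofilt's own; using `pp` as the spatial field and the sample point and distance are assumptions for illustration, and the helper name `geofilt_fq` is hypothetical.

```python
def geofilt_fq(sfield, lat, lon, distance_km):
    """Build an fq using the geofilt parser: a circle of radius d (km) around pt."""
    return f"{{!geofilt sfield={sfield} pt={lat},{lon} d={distance_km}}}"


# Hypothetical field name and coordinates.
fq = geofilt_fq("pp", 45.15, -93.85, 5)
print(fq)
```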
bq: i've added {!cache=false}
Ahh, ok. forget my comments on warming then, they're irrelevant. Heap probably
isn't relevant either given, as you say, you don't see pressure there.
What puzzles me then is why you're spending all your time in
copyToByteArray(long,Object,long,long). I _suppose_ (an
@Erick it is a lot of hw, but basically trying to create a "best case
scenario" to take HW out of the question. Will try increasing heap size
tomorrow.. I haven't seen it get close to the max heap size yet.. but it's
worth trying...
Note that these queries look something like:
q=*:*
fq=[date range
Can you compare with the old geo handler as a baseline?
Bill Bell
Sent from mobile
On Jul 29, 2013, at 4:25 PM, Erick Erickson wrote:
This is very strange. I'd expect slow queries on
the first few queries while these caches were
warmed, but after that I'd expect things to
be quite fast.
For a 12G index and 256G RAM, you have on the
surface a LOT of hardware to throw at this problem.
You can _try_ giving the JVM, say, 18G but tha