Re: Performance question on Spatial Search

2013-08-05 Thread David Smiley (@MITRE.org)
From: "Steven Bower-2 [via Lucene]" <ml-node+s472066n4082569...@n3.nabble.com> Date: Monday, August 5, 2013 9:14 AM To: "Smiley, David W." <dsmi...@mitre.org> Subject: Re: Performance question on Spatial Search So after re-feeding our data with

Re: Performance question on Spatial Search

2013-08-05 Thread Shawn Heisey
On 8/5/2013 7:13 AM, Steven Bower wrote: > So after re-feeding our data with a new boolean field that is true when > data exists and false when it doesn't our search times have gone from avg > of about 20s to around 150ms... pretty amazing change in perf... It seems > like https://issues.apache.org

Re: Performance question on Spatial Search

2013-08-05 Thread Steven Bower
So after re-feeding our data with a new boolean field that is true when data exists and false when it doesn't, our search times have gone from an average of about 20s to around 150ms... a pretty amazing change in perf... It seems like https://issues.apache.org/jira/browse/SOLR-5093 might alleviate many peopl
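
For anyone skimming the thread, here is a rough sketch of what the change above amounts to on the query side. It assumes the slow filter was an existence check on the spatial field and that the new boolean flag is queried instead. The field names `geo` and `has_geo` and the helper `build_query` are made up for illustration; the thread does not give the real names.

```python
from urllib.parse import urlencode

def build_query(use_flag_field):
    """Build a Solr query string that filters on "document has spatial data".

    use_flag_field=False: existence check on the spatial field itself
                          (hypothetical field name "geo").
    use_flag_field=True:  term query on a boolean flag set at index time
                          (hypothetical field name "has_geo").
    """
    fq = "has_geo:true" if use_flag_field else "geo:[* TO *]"
    return urlencode([("q", "*:*"), ("fq", fq)])

# Both strings can be appended to a /select URL; only the filter differs.
slow_style = build_query(use_flag_field=False)
fast_style = build_query(use_flag_field=True)
```

The flag has to be populated when documents are fed, which is why the data needed a re-feed before the speedup showed up.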

Re: Performance question on Spatial Search

2013-07-31 Thread Steven Bower
the list of IDs does change relatively frequently, but this doesn't seem to have very much impact on the performance of the query as far as I can tell. attached are the stacks thanks, steve On Wed, Jul 31, 2013 at 6:33 AM, Mikhail Khludnev < mkhlud...@griddynamics.com> wrote: > On Wed, Jul 31

Re: Performance question on Spatial Search

2013-07-31 Thread Mikhail Khludnev
On Wed, Jul 31, 2013 at 1:10 AM, Steven Bower wrote: > > not sure what you mean by good hit ratio? > I mean such queries are really expensive (even on a cache hit), so if the list of ids changes every time, it never hits the cache and hence executes these heavy queries every time. It's well known perf

Re: Performance question on Spatial Search

2013-07-30 Thread Luis Cappa Banda
Thank you very much, David. That was a great explanation! Regards, - Luis Cappa 2013/7/30 Smiley, David W. > Luis, > > field:* and field:[* TO *] are semantically equivalent -- they have the > same effect. But they internally work differently depending on the field > type. The field type ha

Re: Performance question on Spatial Search

2013-07-30 Thread Smiley, David W.
Luis, field:* and field:[* TO *] are semantically equivalent -- they have the same effect. But they internally work differently depending on the field type. The field type has the chance to intercept the range query to do something smart (FieldType.getRangeQuery(...)). Numeric/Date (trie) field

Re: Performance question on Spatial Search

2013-07-30 Thread Steven Bower
@David I will certainly update when we get the data re-fed... and if you have things you'd like to investigate or try out please let me know. I'm happy to eval things at scale here... we will be taking this index from its current 45m records to 6-700m over the next few months as well. steve On

Re: Performance question on Spatial Search

2013-07-30 Thread Steven Bower
Very good read... Already using MMap... verified using pmap and vsz from top.. not sure what you mean by good hit ratio? Here are the stacks... Name Time (ms) Own Time (ms) org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet(AtomicReaderContext, Bits) 300879 203478 org.apache.luc

Re: Performance question on Spatial Search

2013-07-30 Thread Luis Cappa Banda
Hey, David, I've been reading the thread and I think it is one of the most educational mail threads I've read on the Solr mailing list. Just out of curiosity: internally for Solr, is a query like "field:*" the same as "field:[* TO *]"? I think that it's expected to receive the same number of numFound

Re: Performance question on Spatial Search

2013-07-30 Thread Smiley, David W.
Steve, The FieldCache and DocValues are irrelevant to this problem. Solr's FilterCache is, and Lucene has no counterpart. Perhaps it would be cool if Solr could look for expensive field:* usages when parsing its queries and re-write them to use the FilterCache. That's quite doable, I think. I ju
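
A client-side approximation of the rewrite idea above, as a minimal sketch only: it pulls bare existence clauses out of the main query and sends them as separate `fq` parameters, so Solr's filter cache gets a chance to reuse them. This is not how Solr's query parser works today and not the rewrite David proposes doing inside Solr. The function names are invented, and it assumes all clauses are ANDed together.

```python
import re

# Matches a bare existence clause such as "geo:*" or "geo:[* TO *]".
EXISTENCE_CLAUSE = re.compile(r"^(\w+):(\*|\[\* TO \*\])$")

def split_existence_clauses(clauses):
    """Separate existence checks from the remaining query clauses."""
    main, filters = [], []
    for clause in clauses:
        if EXISTENCE_CLAUSE.match(clause):
            filters.append(clause)
        else:
            main.append(clause)
    return main, filters

def to_solr_params(clauses):
    """Return (name, value) pairs with existence checks moved into fq.

    Assumes the clauses are meant to be combined with AND; moving a
    clause from q to fq is only equivalent under that assumption.
    """
    main, filters = split_existence_clauses(clauses)
    params = [("q", " AND ".join(main) if main else "*:*")]
    params.extend(("fq", f) for f in filters)
    return params
```

For example, `to_solr_params(["title:solr", "geo:*"])` keeps `title:solr` in `q` and moves `geo:*` to an `fq`, where repeated requests can hit the filter cache instead of re-enumerating the field's terms.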

Re: Performance question on Spatial Search

2013-07-30 Thread Mikhail Khludnev
On Tue, Jul 30, 2013 at 12:45 AM, Steven Bower wrote: > > - Most of my time (98%) is being spent in > java.nio.Bits.copyToByteArray(long,Object,long,long) which is being Steven, please see http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html . My benchmarking experience shows that

Re: Performance question on Spatial Search

2013-07-30 Thread Steven Bower
> ally I'm seeing search times a lot higher than I'd like them to be and I'm hoping people may have some suggestions for how to optimize further.

Re: Performance question on Spatial Search

2013-07-30 Thread Steven Bower
> ... content)
> - 1 geo field (using config below)
> - index is 12gb
> - 1 shard
> - Using MMapDirectory
>
> Field config:
>
> <fieldType ... class="solr.SpatialRecursivePrefixTreeFieldType"
>     distErrPct="0.025" maxDistErr="0.00045"
>     spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
>     units="degrees"/>
>
> <field ... required="false" stored="true" type="geo"/>
>
> What I've figured out so far:
>
> - Most of my time (98%) is being spent in java.nio.Bits.copyToByteArray(long,Object,long,long), which is being driven by BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadBlock(), which from what I gather is basically reading terms from the .tim file in blocks
>
> - I moved from Java 1.6 to 1.7 based upon what I read here: http://blog.vlad1.com/2011/10/05/looking-at-java-nio-buffer-performance/ and it definitely had some positive impact (I haven't been able to measure this independently yet)
>
> - I changed maxDistErr from 0.000009 (which is 1m precision per docs) to 0.00045 (50m precision)
>
> - It looks to me that the .tim files are being memory mapped fully (i.e. they show up in pmap output); the virtual size of the JVM is ~18gb (heap is 6gb)
>
> - I've optimized the index but this doesn't have a dramatic impact on performance
>
> Changing the precision and the JVM upgrade yielded a drop from ~18s avg query time to ~9s avg query time. This is fantastic but I want to get this down into the 1-2 second range.
>
> At this point it seems that I am basically bottlenecked on copying memory out of the mapped .tim file, which leads me to think that the only solution to my problem would be to read less data or somehow read it more efficiently.
>
> If anyone has any suggestions of where to go with this I'd love to know
>
> thanks,
>
> steve
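
Since maxDistErr is configured with units="degrees", it can help to sanity-check what a given value means on the ground. The sketch below is only a back-of-the-envelope conversion along a great circle on a spherical Earth. The radius constant and the function name are assumptions for illustration, not anything taken from Solr or Spatial4j.

```python
import math

# Mean Earth radius in meters (assumption: simple spherical model).
EARTH_MEAN_RADIUS_M = 6_371_008.8

def degrees_to_meters(deg):
    """Approximate great-circle distance in meters covered by `deg` degrees of arc."""
    return math.radians(deg) * EARTH_MEAN_RADIUS_M

# maxDistErr="0.00045" from the field config works out to roughly 50 m,
# and about 0.000009 degrees corresponds to roughly 1 m.
fifty_meter_setting = degrees_to_meters(0.00045)
one_meter_setting = degrees_to_meters(0.000009)
```

A coarser maxDistErr should mean fewer grid levels and so fewer terms to read per shape, which fits the drop in query time reported above.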

Re: Performance question on Spatial Search

2013-07-30 Thread Steven Bower

Re: Performance question on Spatial Search

2013-07-30 Thread Smiley, David W.

Re: Performance question on Spatial Search

2013-07-30 Thread Steven Bower
> - I moved from Java 1.6 to 1.7 based upon what I read here: http://blog.vlad1.com/2011/10/05/looking-at-java-nio-buffer-performance/ and it definitely had some positive impact (I haven't been able to

Re: Performance question on Spatial Search

2013-07-30 Thread David Smiley (@MITRE.org)
> 0.00045 (50m precision)
>
> - It looks to me that the .tim files are being memory mapped fully (i.e. they show up in pmap output); the virtual size of the JVM is ~18gb (heap is 6gb)
>
> - I've optimized the index but this doesn't have a dramatic impact on performance
>
> Changing the precision and the JVM upgrade yielded a drop from ~18s avg query time to ~9s avg query time. This is fantastic but I want to get this down into the 1-2 second range.
>
> At this point it seems that I am basically bottlenecked on copying memory out of the mapped .tim file, which leads me to think that the only solution to my problem would be to read less data or somehow read it more efficiently.
>
> If anyone has any suggestions of where to go with this I'd love to know
>
> thanks,
>
> steve

Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book

Re: Performance question on Spatial Search

2013-07-30 Thread Erick Erickson
bq: i've added {!cache=false} Ahh, ok. forget my comments on warming then, they're irrelevant. Heap probably isn't relevant either given, as you say, you don't see pressure there. What puzzles me then is why you're spending all your time in copyToByteArray(long,Object,long,long). I _suppose_ (an

Re: Performance question on Spatial Search

2013-07-29 Thread Steven Bower
@Erick it is a lot of hardware, but basically I'm trying to create a "best case scenario" to take HW out of the question. Will try increasing heap size tomorrow.. I haven't seen it get close to the max heap size yet.. but it's worth trying... Note that these queries look something like: q=*:* fq=[date range

Re: Performance question on Spatial Search

2013-07-29 Thread Bill Bell
Can you compare with the old geo handler as a baseline. ? Bill Bell Sent from mobile On Jul 29, 2013, at 4:25 PM, Erick Erickson wrote: > This is very strange. I'd expect slow queries on > the first few queries while these caches were > warmed, but after that I'd expect things to > be quite fa

Re: Performance question on Spatial Search

2013-07-29 Thread Erick Erickson
This is very strange. I'd expect slow queries on the first few queries while these caches were warmed, but after that I'd expect things to be quite fast. For a 12G index and 256G RAM, you have on the surface a LOT of hardware to throw at this problem. You can _try_ giving the JVM, say, 18G but tha

Performance question on Spatial Search

2013-07-29 Thread Steven Bower
I've been doing some performance analysis of a spatial search use case I'm implementing in Solr 4.3.0. Basically I'm seeing search times a lot higher than I'd like them to be and I'm hoping people may have some suggestions for how to optimize further. Here are the specs of what I'm doing now: Mach