#1 Here is my query:

sort=vid asc
start=0
rows=1000
defType=edismax
q=*:*
fq=recordType:"xxx"
fq=vt:"X12B"
fq=(cls:"3" OR cls:"8")
fq=dt:[2013-05-08T00:00:00.00Z TO 2013-07-08T00:00:00.00Z]
fq=(vid:86XXX73 OR vid:86XXX20 OR vid:89XXX60 OR vid:89XXX72 OR vid:89XXX48
OR vid:89XXX31 OR vid:89XXX28 OR vid:89XXX67 OR vid:90XXX76 OR vid:90XXX33
OR vid:90XXX47 OR vid:90XXX97 OR vid:90XXX69 OR vid:90XXX31 OR vid:90XXX44
OR vid:91XXX82 OR vid:91XXX08 OR vid:91XXX32 OR vid:91XXX13 OR vid:91XXX87
OR vid:91XXX82 OR vid:91XXX48 OR vid:91XXX34 OR vid:91XXX31 OR vid:91XXX94
OR vid:91XXX29 OR vid:91XXX31 OR vid:91XXX43 OR vid:91XXX55 OR vid:91XXX67
OR vid:91XXX15 OR vid:91XXX59 OR vid:92XXX95 OR vid:92XXX24 OR vid:92XXX13
OR vid:92XXX07 OR vid:92XXX92 OR vid:92XXX22 OR vid:92XXX25 OR vid:92XXX99
OR vid:92XXX53 OR vid:92XXX55 OR vid:92XXX27 OR vid:92XXX65 OR vid:92XXX41
OR vid:92XXX89 OR vid:92XXX11 OR vid:93XXX45 OR vid:93XXX05 OR vid:93XXX98
OR vid:93XXX70 OR vid:93XXX24 OR vid:93XXX39 OR vid:93XXX69 OR vid:93XXX28
OR vid:93XXX79 OR vid:93XXX66 OR vid:94XXX13 OR vid:94XXX16 OR vid:94XXX10
OR vid:94XXX37 OR vid:94XXX69 OR vid:94XXX29 OR vid:94XXX70 OR vid:94XXX58
OR vid:94XXX08 OR vid:94XXX64 OR vid:94XXX32 OR vid:94XXX44 OR vid:94XXX56
OR vid:95XXX59 OR vid:95XXX72 OR vid:95XXX14 OR vid:95XXX08 OR vid:96XXX10
OR vid:96XXX54 )
fq=gp:"Intersects(POLYGON((47.0 30.0, 47.0 27.0, 52.0 27.0, 52.0 30.0, 47.0
30.0)))" AND NOT pp:"Intersects(POLYGON((47.0 30.0, 47.0 27.0, 52.0 27.0,
52.0 30.0, 47.0 30.0)))" AND +pp:*

Basically I'm looking for a set of records by "vid", then checking whether
each record's gp is in one polygon while its pp is not in the other (and it
has a pp)... essentially looking to see if a record moved between two
polygons (gp = current, pp = previous) during a time period.
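For reference, here's a minimal Python sketch of how I assemble those filter queries programmatically (the helper names and the commented-out URL are mine and hypothetical; only the field names, polygon, and parameter values come from the query above):

```python
# Sketch: build the Solr params for the "moved between polygons" search.
# Field names (vid, gp, pp, dt, cls, vt, recordType) come from the query
# above; build_params/moved_into_polygon_fq are hypothetical helpers.
from urllib.parse import urlencode

POLY = "POLYGON((47.0 30.0, 47.0 27.0, 52.0 27.0, 52.0 30.0, 47.0 30.0))"

def moved_into_polygon_fq(poly):
    # current position (gp) inside the polygon, previous position (pp)
    # outside it, and pp must exist
    return ('gp:"Intersects(%s)" AND NOT pp:"Intersects(%s)" AND pp:*'
            % (poly, poly))

def build_params(vids, start_date, end_date):
    fqs = [
        'recordType:"xxx"',
        'vt:"X12B"',
        '(cls:"3" OR cls:"8")',
        'dt:[%s TO %s]' % (start_date, end_date),
        "(" + " OR ".join("vid:%s" % v for v in vids) + ")",
        moved_into_polygon_fq(POLY),
    ]
    params = [("q", "*:*"), ("defType", "edismax"), ("sort", "vid asc"),
              ("start", "0"), ("rows", "1000")]
    params += [("fq", fq) for fq in fqs]
    return params

# query_string = urlencode(build_params(["8600073"],
#     "2013-05-08T00:00:00Z", "2013-07-08T00:00:00Z"))
```

Building both the gp and pp clauses from the same polygon string keeps the two shapes in sync.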

#2 Yes on JTS (unless the query above shows I don't actually need it);
however, this is only an initial use case and I suspect we'll need more
complex shapes in the future.

#3 The data is distributed globally, but along generally fixed paths, and
it clusters around certain areas... for example, the polygon above contains
about 11k points (with no date filtering). So basically some areas will be
very dense and most areas will not, and the majority of searches will be
around the dense areas.

#4 It's very likely to be less than 1M results (with filters)... is there
any functionality loss with LatLonType fields?

Thanks,

steve


On Tue, Jul 30, 2013 at 10:49 AM, David Smiley (@MITRE.org) <
dsmi...@mitre.org> wrote:

> Steve,
> (1)  Can you give a specific example of how you are specifying the spatial
> query?  I'm looking to ensure you are not using "IsWithin", which is not
> meant for point data.  If your query shape is a circle or the bounding box
> of a circle, you should use the geofilt query parser, otherwise use the
> quirky syntax that allows you to specify the spatial predicate with
> "Intersects".
> (2) Do you actually need JTS?  i.e. are you using Polygons, etc.
> (3) How "dense" would you estimate the data is at the 50m resolution you've
> configured the data?  If it's very dense then I'll tell you how to raise
> the "prefix grid scan level" to a # closer to max-levels.
> (4) Do all of your searches find less than a million points, considering
> all filters?  If so then it's worth comparing the results with LatLonType.
>
> ~ David Smiley
>
>
> Steven Bower wrote
> > @Erick it is a lot of hw, but basically trying to create a "best case
> > scenario" to take HW out of the question. Will try increasing heap size
> > tomorrow.. I haven't seen it get close to the max heap size yet.. but
> > it's worth trying...
> >
> > Note that these queries look something like:
> >
> > q=*:*
> > fq=[date range]
> > fq=geo query
> >
> > on the fq for the geo query I've added {!cache=false} to prevent it from
> > ending up in the filter cache.. once it's in the filter cache, queries
> > come back in 10-20ms. For my use case I need the first unique geo search
> > query to come back in a more reasonable time, so I am currently ignoring
> > the cache.
> >
> > @Bill will look into that, I'm not certain it will support the particular
> > queries that are being executed but I'll investigate..
> >
> > steve
> >
> >
> > On Mon, Jul 29, 2013 at 6:25 PM, Erick Erickson <erickerickson@> wrote:
> >
> >> This is very strange. I'd expect slow queries on
> >> the first few queries while these caches were
> >> warmed, but after that I'd expect things to
> >> be quite fast.
> >>
> >> For a 12G index and 256G RAM, you have on the
> >> surface a LOT of hardware to throw at this problem.
> >> You can _try_ giving the JVM, say, 18G but that
> >> really shouldn't be a big issue, your index files
> >> should be MMaped.
> >>
> >> Let's try the crude thing first and give the JVM
> >> more memory.
> >>
> >> FWIW
> >> Erick
> >>
> >> On Mon, Jul 29, 2013 at 4:45 PM, Steven Bower <smb-apache@> wrote:
> >> > I've been doing some performance analysis of a spatial search use case
> >> > I'm implementing in Solr 4.3.0. Basically I'm seeing search times a lot
> >> > higher than I'd like them to be and I'm hoping people may have some
> >> > suggestions for how to optimize further.
> >> >
> >> > Here are the specs of what I'm doing now:
> >> >
> >> > Machine:
> >> > - 16 cores @ 2.8GHz
> >> > - 256GB RAM
> >> > - 1TB (RAID 1+0 on 10 SSDs)
> >> >
> >> > Content:
> >> > - 45M docs (not very big, only a few fields with no large textual
> >> > content)
> >> > - 1 geo field (using config below)
> >> > - index is 12GB
> >> > - 1 shard
> >> > - Using MMapDirectory
> >> >
> >> > Field config:
> >> >
> >> > <fieldType name="geo" class="solr.SpatialRecursivePrefixTreeFieldType"
> >> >   spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
> >> >   distErrPct="0.025" maxDistErr="0.00045" units="degrees"/>
> >> >
> >> > <field name="geopoint" indexed="true" multiValued="false"
> >> >   required="false" stored="true" type="geo"/>
> >> >
> >> >
> >> > What I've figured out so far:
> >> >
> >> > - Most of my time (98%) is being spent in
> >> > java.nio.Bits.copyToByteArray(long,Object,long,long), which is being
> >> > driven by
> >> > BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadBlock(),
> >> > which from what I gather is basically reading terms from the .tim
> >> > file in blocks
> >> >
> >> > - I moved from Java 1.6 to 1.7 based upon what I read here:
> >> > http://blog.vlad1.com/2011/10/05/looking-at-java-nio-buffer-performance/
> >> > and it definitely had some positive impact (I haven't been able to
> >> > measure this independently yet)
> >> >
> >> > - I changed maxDistErr from 0.000009 (which is 1m precision per docs)
> >> > to 0.00045 (50m precision) ..
> >> >
> >> > - It looks to me that the .tim files are being memory mapped fully
> >> > (i.e. they show up in pmap output); the virtual size of the JVM is
> >> > ~18GB (heap is 6GB)
> >> >
> >> > - I've optimized the index but this doesn't have a dramatic impact on
> >> > performance
> >> >
> >> > Changing the precision and the JVM upgrade yielded a drop from ~18s
> >> > avg query time to ~9s avg query time.. This is fantastic but I want to
> >> > get this down into the 1-2 second range.
> >> >
> >> > At this point it seems that I am basically bottlenecked on copying
> >> > memory out of the mapped .tim file, which leads me to think that the
> >> > only solution to my problem would be to read less data or somehow
> >> > read it more efficiently..
> >> >
> >> > If anyone has any suggestions of where to go with this I'd love to know
> >> >
> >> >
> >> > thanks,
> >> >
> >> > steve
> >>
>
>
>
>
>
> -----
>  Author:
> http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Performance-question-on-Spatial-Search-tp4081150p4081309.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
