Re: Performance question on Spatial Search

Steven Bower Mon, 29 Jul 2013 19:43:14 -0700

@Erick, it is a lot of hardware, but I'm basically trying to create a "best
case scenario" to take hardware out of the question. I'll try increasing the
heap size tomorrow. I haven't seen it get close to the max heap size yet, but
it's worth trying.
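
A rough check of the memory reasoning in Erick's reply, using the figures from this thread (256 GB RAM, 12 GB index, 6 GB heap today, 18 GB suggested). The 4 GB reserved for the OS and other processes is a hypothetical allowance, not a measured number.

```java
// Back-of-envelope memory budget for an mmapped index (figures from this thread).
public class MemoryBudget {

    /** GB of RAM left for the OS page cache after the JVM heap and other overhead. */
    static double pageCacheHeadroomGb(double ramGb, double heapGb, double otherGb) {
        return ramGb - heapGb - otherGb;
    }

    /** True if the whole index could stay resident in the page cache at once. */
    static boolean indexFitsInPageCache(double ramGb, double heapGb, double otherGb, double indexGb) {
        return pageCacheHeadroomGb(ramGb, heapGb, otherGb) >= indexGb;
    }

    public static void main(String[] args) {
        double ramGb = 256;   // from the machine specs
        double indexGb = 12;  // from the content specs
        double otherGb = 4;   // HYPOTHETICAL allowance for the OS and other processes
        for (double heapGb : new double[] {6, 18}) { // current heap vs. suggested heap
            System.out.printf("heap=%.0f GB: page-cache headroom=%.0f GB, index fits=%b%n",
                    heapGb,
                    pageCacheHeadroomGb(ramGb, heapGb, otherGb),
                    indexFitsInPageCache(ramGb, heapGb, otherGb, indexGb));
        }
    }
}
```

With either heap size, well over 200 GB would remain for the page cache, so raising the heap should not starve the mmapped 12 GB index.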


Note that these queries look something like:

q=*:*
fq=[date range]
fq=geo query

On the fq for the geo query I've added {!cache=false} to prevent it from
ending up in the filter cache. Once it's in the filter cache, queries come
back in 10-20ms. For my use case I need the first unique geo search query to
come back in a more reasonable time, so I am currently ignoring the cache.
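
A sketch of the request described above as URL-encoded parameters: q=*:*, a date-range fq that is cached normally, and the geo fq prefixed with the {!cache=false} local parameter. The date field name `timestamp` and the Intersects rectangle are hypothetical placeholders (the actual filters aren't shown in the thread); only `geopoint` comes from the schema quoted below, and the rectangle syntax is Solr 4.x RPT syntax as I understand it.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

// Builds q=*:*, a date-range fq, and a geo fq marked {!cache=false}
// so that Solr does not store the geo filter in the filterCache.
public class SpatialQueryParams {

    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    static String buildQueryString(String dateRangeFq, String geoFq) {
        // A list rather than a map, because "fq" appears more than once.
        List<String[]> params = new ArrayList<>();
        params.add(new String[] {"q", "*:*"});
        params.add(new String[] {"fq", dateRangeFq});              // cached as usual
        params.add(new String[] {"fq", "{!cache=false}" + geoFq}); // bypasses the filterCache
        StringBuilder sb = new StringBuilder();
        for (String[] p : params) {
            if (sb.length() > 0) sb.append('&');
            sb.append(p[0]).append('=').append(enc(p[1]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // HYPOTHETICAL filters: "timestamp" is an assumed date field, and the
        // rectangle (minX minY maxX maxY, in degrees) is an arbitrary example.
        String dateFq = "timestamp:[NOW-7DAYS TO NOW]";
        String geoFq = "geopoint:\"Intersects(-74.1 40.6 -73.8 40.9)\"";
        System.out.println(buildQueryString(dateFq, geoFq));
    }
}
```

Only the geo filter skips the cache here; the date-range filter is likely to repeat across queries, so leaving it cacheable seems reasonable.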

@Bill, I'll look into that. I'm not certain it will support the particular
queries that are being executed, but I'll investigate.

steve


On Mon, Jul 29, 2013 at 6:25 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> This is very strange. I'd expect the first few
> queries to be slow while these caches were
> warmed, but after that I'd expect things to
> be quite fast.
>
> For a 12G index and 256G RAM, you have on the
> surface a LOT of hardware to throw at this problem.
> You can _try_ giving the JVM, say, 18G, but that
> really shouldn't be a big issue; your index files
> should be mmapped.
>
> Let's try the crude thing first and give the JVM
> more memory.
>
> FWIW
> Erick
>
> On Mon, Jul 29, 2013 at 4:45 PM, Steven Bower <smb-apa...@alcyon.net> wrote:
> > I've been doing some performance analysis of a spatial search use case
> > I'm implementing in Solr 4.3.0. Basically I'm seeing search times a lot
> > higher than I'd like them to be, and I'm hoping people may have some
> > suggestions for how to optimize further.
> >
> > Here are the specs of what I'm doing now:
> >
> > Machine:
> > - 16 cores @ 2.8 GHz
> > - 256 GB RAM
> > - 1 TB (RAID 1+0 on 10 SSDs)
> >
> > Content:
> > - 45M docs (not very big: only a few fields, with no large textual content)
> > - 1 geo field (using config below)
> > - index is 12 GB
> > - 1 shard
> > - Using MMapDirectory
> >
> > Field config:
> >
> > <fieldType name="geo" class="solr.SpatialRecursivePrefixTreeFieldType"
> >     distErrPct="0.025" maxDistErr="0.00045"
> >     spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
> >     units="degrees"/>
> >
> > <field name="geopoint" indexed="true" multiValued="false"
> >     required="false" stored="true" type="geo"/>
> >
> >
> > What I've figured out so far:
> >
> > - Most of my time (98%) is being spent in
> >   java.nio.Bits.copyToByteArray(long,Object,long,long), which is being
> >   driven by
> >   BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadBlock(),
> >   which, from what I gather, basically reads terms from the .tim file
> >   in blocks.
> >
> > - I moved from Java 1.6 to 1.7 based on what I read here:
> >   http://blog.vlad1.com/2011/10/05/looking-at-java-nio-buffer-performance/
> >   and it definitely had some positive impact (I haven't been able to
> >   measure this independently yet).
> >
> > - I changed maxDistErr from 0.000009 (1 m precision, according to the
> >   docs) to 0.00045 (50 m precision).
> >
> > - It looks to me like the .tim files are being memory-mapped fully (i.e.,
> >   they show up in pmap output); the virtual size of the JVM is ~18 GB
> >   (heap is 6 GB).
> >
> > - I've optimized the index, but this doesn't have a dramatic impact on
> >   performance.
> >
> > Changing the precision and upgrading the JVM yielded a drop from ~18 s
> > average query time to ~9 s. This is fantastic, but I want to get this
> > down into the 1-2 second range.
> >
> > At this point it seems that I am bottlenecked on copying memory out of
> > the mapped .tim file, which leads me to think that the only solution to
> > my problem would be to read less data or somehow read it more
> > efficiently.
> >
> > If anyone has any suggestions on where to go with this, I'd love to know.
> >
> >
> > thanks,
> >
> > steve
>
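
For reference, a rough conversion of the two maxDistErr values in the quoted message from degrees to meters, plus an estimate of the prefix-tree depth each implies. It assumes a spherical Earth (mean radius ~6371 km) and the geohash prefix tree (the default for SpatialRecursivePrefixTreeFieldType, as far as I know), and approximates the depth as the shortest geohash whose cells are no larger than maxDistErr in both dimensions. Lucene's exact level selection may differ; this is not its code.

```java
// Rough precision arithmetic for a geohash-based prefix tree (approximation, see lead-in).
public class PrefixTreePrecision {
    // Mean Earth radius in km; spherical-Earth ASSUMPTION.
    static final double EARTH_RADIUS_KM = 6371.0087714;
    static final double KM_PER_DEGREE = 2 * Math.PI * EARTH_RADIUS_KM / 360.0;
    // ASSUMED cap on geohash length.
    static final int MAX_GEOHASH_LEN = 24;

    static double degreesToMeters(double deg) {
        return deg * KM_PER_DEGREE * 1000.0;
    }

    /** {width (longitude), height (latitude)} in degrees of a geohash cell of length len. */
    static double[] geohashCellSize(int len) {
        int bits = 5 * len;          // each geohash character encodes 5 bits
        int lonBits = (bits + 1) / 2; // bits interleave starting with longitude
        int latBits = bits / 2;
        return new double[] {360.0 / Math.pow(2, lonBits), 180.0 / Math.pow(2, latBits)};
    }

    /** Shortest geohash length whose cells fit within maxDistErr in both dimensions. */
    static int levelsForMaxDistErr(double maxDistErrDeg) {
        for (int len = 1; len <= MAX_GEOHASH_LEN; len++) {
            double[] wh = geohashCellSize(len);
            if (wh[0] <= maxDistErrDeg && wh[1] <= maxDistErrDeg) return len;
        }
        return MAX_GEOHASH_LEN;
    }

    public static void main(String[] args) {
        for (double maxDistErr : new double[] {0.000009, 0.00045}) { // old vs. new setting
            System.out.printf("maxDistErr=%s deg ~ %.1f m -> ~%d geohash levels%n",
                    maxDistErr, degreesToMeters(maxDistErr), levelsForMaxDistErr(maxDistErr));
        }
    }
}
```

Under these assumptions the old setting (~1 m) needs about 11 levels and the new one (~50 m) about 8, so each indexed point yields fewer, coarser cells. That is consistent with the drop in query time reported in the quoted message.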

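To make the bottleneck in the profile concrete, here is a plain-NIO sketch (not Lucene's code) of the kind of copy it points at. As I understand it, MMapDirectory reads index files such as .tim through memory-mapped ByteBuffers, and each block load ends in a bulk get() that copies bytes from the mapping into a heap byte[]; on a direct buffer, large bulk gets roughly go through the native java.nio.Bits copy routines. The temp file, its size, the offsets, and the class name are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

public class MappedBlockRead {

    /** Maps a whole file read-only; the mapping stays valid after the channel closes. */
    static MappedByteBuffer mapReadOnly(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    /** Copies len bytes starting at offset out of the buffer into a new heap array. */
    static byte[] readBlock(ByteBuffer mapped, int offset, int len) {
        ByteBuffer view = mapped.duplicate(); // own position; shares the underlying content
        view.position(offset);
        byte[] block = new byte[len];
        view.get(block, 0, len); // bulk copy: mapped memory -> Java heap
        return block;
    }

    public static void main(String[] args) throws IOException {
        // HYPOTHETICAL stand-in for a .tim file: 4 KB of patterned bytes.
        Path file = Files.createTempFile("tim-sketch", ".bin");
        file.toFile().deleteOnExit();
        byte[] data = new byte[4096];
        for (int i = 0; i < data.length; i++) data[i] = (byte) (i % 251);
        Files.write(file, data);

        MappedByteBuffer mapped = mapReadOnly(file); // map once, then read many blocks
        byte[] block = readBlock(mapped, 1000, 16);
        System.out.println(Arrays.toString(block));
    }
}
```

Every block the terms reader loads pays for one such copy, which is why reading fewer terms (for example via a coarser maxDistErr) shows up directly in the time spent there.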