Re: Help getting a document by unique ID

Stefan Matheis Tue, 19 Mar 2013 14:18:20 -0700

It's a bit off-topic, but worth mentioning since Brian brought up graph 
databases: Neo4j uses (or can use) Lucene, so depending on the use case it 
may be worth a look.

On Tuesday, March 19, 2013 at 10:11 PM, Shawn Heisey wrote:

> On 3/19/2013 2:31 PM, Brian Hurt wrote:
> > Which is the problem: you might think that 60ms unique key accesses
> > (what I'm seeing) are more than good enough, and for most use cases
> > you'd be right. But it's not unusual for a single web-page hit to
> > generate many dozens, if not low hundreds, of calls to get a document
> > by id, at which point 60ms hits pile up fast.
> 
> 
> 
> I have to concur with Jack's assessment that 60ms may indicate a general 
> performance issue, possibly caused by not having enough memory in your 
> server.
> 
> I've got a distributed index with 77 million documents in it, seven 
> shards, total index size about 85GB. It's running Solr 4.2.
> 
> I tried some uncached unique id queries on it. This search kicks off 
> seven shard searches against two servers, collates the results, then 
> returns them to the browser. The results came back with a QTime of 7-8 
> milliseconds. When I try a different uncached query against one of the 
> shard servers directly (14GB index size), the QTime value is zero.
> 
> I have this performance level because I have plenty of extra RAM, which 
> lets the OS cache the index files effectively. Each server has half the 
> index (over 40GB on disk) and 64GB of RAM. Of that 64GB, 6GB is 
> allocated to Solr. If we say the OS takes up 1GB (which it most likely 
> does not), that leaves 57GB of OS disk cache. Java's garbage collector 
> is highly tuned in my setup, because without that tuning I experience 
> very long GC pauses.
> 
> 
> Here's some additional info that may or may not be useful to you:
> 
> The BloomFilter postings format for Lucene is rumored to have amazing 
> performance improvements for searching unique keys.
> 
> An obstacle: Solr does not currently have an out-of-the-box way to 
> actually use it. A high-level solution has been proposed, but no code 
> has been written yet. The following issue describes the current state:
> 
> https://issues.apache.org/jira/browse/SOLR-3950
> 
> You could always write your own custom postings format instead of 
> waiting for someone (most likely me) to figure out how to go about 
> including it directly in Solr. If you do this, I hope you'll be able to 
> attach your code to the issue so everyone benefits.
> 
> Thanks,
> Shawn
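
For anyone who wants to plug in their own numbers, here is a rough 
back-of-the-envelope sketch of the arithmetic in the quoted messages. The 
function names are made up for illustration. The 100-lookup count and the 
42.5GB per-server index size are assumptions (the thread only says "low 
hundreds" and "over 40GB"), and the lookups are assumed to run one after 
another rather than in parallel.

```python
def total_lookup_ms(per_lookup_ms: float, lookups: int) -> float:
    """Cost of issuing `lookups` get-by-id calls serially (assumption: no parallelism)."""
    return per_lookup_ms * lookups


def os_cache_gb(total_ram_gb: float, jvm_heap_gb: float, os_overhead_gb: float) -> float:
    """RAM left for the OS disk cache after the Java heap and the OS itself."""
    return total_ram_gb - jvm_heap_gb - os_overhead_gb


def cache_coverage(cache_gb: float, index_gb: float) -> float:
    """Fraction of the on-disk index that could fit in the OS disk cache."""
    return cache_gb / index_gb


if __name__ == "__main__":
    # Brian's case: 60ms per lookup, assumed 100 lookups per page hit.
    print("60ms lookups:", total_lookup_ms(60, 100), "ms per page")
    # Shawn's distributed case: 8ms per lookup, same assumed lookup count.
    print("8ms lookups: ", total_lookup_ms(8, 100), "ms per page")

    # Shawn's servers: 64GB RAM, 6GB Java heap, 1GB (generous) for the OS.
    cache = os_cache_gb(64, 6, 1)
    print("OS disk cache:", cache, "GB")
    # Assumed 42.5GB per server (half of the ~85GB total index).
    print("Cache covers whole index:", cache_coverage(cache, 42.5) >= 1.0)
```

With Shawn's figures the disk cache is larger than the index on each 
server, which fits his explanation for the low QTime values.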
