This is a bit off-topic, but since Brian mentioned a graph database: Neo4j uses (or can use) Lucene, so depending on the use case it may be worth a look.

On Tuesday, March 19, 2013 at 10:11 PM, Shawn Heisey wrote:
> On 3/19/2013 2:31 PM, Brian Hurt wrote:
> > Which is the problem- you might think that 60ms unique key accesses
> > (what I'm seeing) is more than good enough- and for most use cases,
> > you'd be right. But it's not unusual for a single web-page hit to
> > generate many dozens, if not low hundreds, of calls to get document by
> > id. At which point, 60ms hits pile up fast.
>
> I have to concur with Jack's assessment that 60ms may indicate a general
> performance issue, possibly caused by not having enough memory in your
> server.
>
> I've got a distributed index with 77 million documents in it, seven
> shards, total index size about 85GB. It's running 4.2.
>
> I tried some uncached unique id queries on it. This search kicks off
> seven shard searches against two servers, collates the results, then
> returns them to the browser. The results came back with a QTime of 7-8
> milliseconds. When I try a different uncached query against one of the
> shard servers directly (14GB index size), the QTime value is zero.
>
> I have this performance level because I have plenty of extra RAM, which
> lets the OS cache the index files effectively. Each server has half the
> index (over 40GB on disk) and 64GB of RAM. Of that 64GB, 6GB is
> allocated to Solr. If we say the OS takes up 1GB (which it most likely
> does not), that leaves 57GB of OS disk cache. Java's garbage collector
> is highly tuned in my setup, because without it, I experience very long
> GC pauses.
>
> Here's some additional info that may or may not be useful to you:
>
> The BloomFilter postings format for Lucene is rumored to have amazing
> performance improvements for searching unique keys.
>
> An obstacle: Solr does not currently have an out-of-the-box way to
> actually use it. A high-level solution has been proposed, but no code
> has been written yet. The following issue describes the current state:
>
> https://issues.apache.org/jira/browse/SOLR-3950
>
> You could always write your own custom postings format instead of
> waiting for someone (most likely me) to figure out how to go about
> including it directly in Solr. If you do this, I hope you'll be able to
> attach your code to the issue so everyone benefits.
>
> Thanks,
> Shawn
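
For anyone wondering why a Bloom filter would help with unique key lookups, here is a rough sketch of the general idea in Python. It is not Lucene's or Solr's code: BloomFilter, Segment and get_by_id are made-up names for illustration, and the filter size and hash count are arbitrary. The assumption is that an id lookup has to ask every segment for the key, so a small in-memory filter per segment that can answer "definitely not here" lets most segments be skipped without touching the term dictionary.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive num_hashes bit positions by salting a single hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


class Segment:
    """Hypothetical stand-in for one index segment (id -> document)."""

    def __init__(self, docs):
        self.docs = dict(docs)
        self.bloom = BloomFilter()
        for doc_id in self.docs:
            self.bloom.add(doc_id)
        self.lookups = 0  # counts the "expensive" term lookups performed

    def get(self, doc_id):
        if not self.bloom.might_contain(doc_id):
            return None  # definitely absent: skip the expensive lookup
        self.lookups += 1
        return self.docs.get(doc_id)


def get_by_id(segments, doc_id):
    # Ask each segment in turn; most are rejected by their filter alone.
    for segment in segments:
        doc = segment.get(doc_id)
        if doc is not None:
            return doc
    return None


if __name__ == "__main__":
    segments = [
        Segment({f"doc-{s}-{n}": f"body {s}-{n}" for n in range(20)})
        for s in range(7)
    ]
    print(get_by_id(segments, "doc-6-3"))
    print([seg.lookups for seg in segments])
```

The lookups counters show how many segments actually had to do a real lookup. A filter can still say "maybe" for a key that is not there, so the real lookup remains the source of truth.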