Solr RPS is painfully low

2008-01-02 Thread Alex Benjamen
Hello, I have a situation where I'm using Solr with a 3 GB complete index (in RAM) on a dual-core AMD machine, and I'm only getting about 1.3 RPS on cold queries (for the most part there is little chance of two queries being identical). Is this normal? The index contains about 20MM documents a

Performance stats for indeces with over 10MM documents

2008-01-02 Thread Alex Benjamen
Hi, I'm very interested in sharing performance stats with those who have indices that contain more than 10MM documents. It seems that response times and QPS degrade drastically as the number of documents in the index grows. This makes sense overall, but it would be good to know what kind of QPS o

RE: Solr RPS is painfully low

2008-01-02 Thread Alex Benjamen
Walter: >How many rows are you requesting? Are you sorting? --wunder I'm only requesting 20 rows, and I'm not specifically sorting by any field. Does Solr apply a sort by default, and if so, how do I disable it? Thanks, Alex

RE: Performance stats for indeces with over 10MM documents

2008-01-02 Thread Alex Benjamen
JDS: > That's too slow. Can you provide more details about your schema, queries etc? Of course - I'm using the standard config which comes with Solr, and I've added the following fields :

RE: Performance stats for indeces with over 10MM documents

2008-01-02 Thread Alex Benjamen
Mike, Thanks for the input, it's really valuable. Several forum users have suggested using fq to separate the caching of filters, and I can immediately see how this would help. I'm changing the code right now and am going to run some benchmarks, hopefully I'll see a big gain just from that > - use
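
A minimal sketch of the fq suggestion, assuming a stock Solr select handler: the free-text clause goes in q, and each structured restriction goes in its own fq parameter so that Solr can cache each filter separately. The host, port, and field names (favorite_show, gender, age) are hypothetical and are not taken from the thread.

```python
from urllib.parse import urlencode


def build_select_url(base_url, text_query, filters, rows=20):
    """Build a select URL with the main query in q and one fq per filter."""
    params = [("q", text_query), ("rows", str(rows))]
    # Each filter gets its own fq parameter instead of being ANDed into q.
    params.extend(("fq", f) for f in filters)
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and field names, for illustration only.
url = build_select_url(
    "http://localhost:8983/solr/select",
    "favorite_show:lost",
    ["gender:f", "age:[20 TO 30]"],
)
print(url)
```

Keeping the frequently repeated restrictions in fq means that only the q part varies between otherwise "cold" queries.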

RE: Performance stats for indeces with over 10MM documents

2008-01-03 Thread Alex Benjamen
we currently use a relational system, and it doesn't perform. Also, even though a lot of our queries are structured, we do combine them with text search, so for instance, there could be an additional clause which is a free text search for a favorite TV show -- I had exactly

Cache size clarification

2008-01-28 Thread Alex Benjamen
I need some clarification on the cache size parameters in the solrconfig. Suppose I'm using these values: What does size="5" mean... Is this 5 bytes, kilobytes, megabytes... or is it the number of documents that can be cached? In other words, how do I calculate the memory usage ba

SEVERE: java.lang.OutOfMemoryError: Java heap space

2008-01-28 Thread Alex Benjamen
We're now running several solr instances on quad-cores and getting fairly good RPS even on the largest index (26MM documents) after implementing faceted queries. Things are looking good except for this OutOfMemoryError which occurs every 2 hrs at peak. Note: I have browsed, searched the forum

RE: SEVERE: java.lang.OutOfMemoryError: Java heap space

2008-01-28 Thread Alex Benjamen
>We use 10GB of ram in one of our solr installs. You need to make sure >your java is 64 bit though. Alex, what does your java -version show? >Mine shows >java version "1.6.0_03" >Java(TM) SE Runtime Environment (build 1.6.0_03-b05) >Java HotSpot(TM) 64-Bit Server VM (build 1.6.0_03-b05, mixed m

RE: SEVERE: java.lang.OutOfMemoryError: Java heap space

2008-01-28 Thread Alex Benjamen
>Install the AMD64 version. (Confusingly, AMD64 is a spec name for >EM64T, which is now what both AMD and Intel use) >If that still doesn't work, is it possible that your machine/kernel is >not set up to support 64 bit? I was confused by the naming convention. Seems to work fine now, well, I me

RE: SEVERE: java.lang.OutOfMemoryError: Java heap space

2008-01-31 Thread Alex Benjamen
Thanks to all who responded. Things are running well! The IBM version of the JRE for Intel 64 seems to run well, and the stalling issue (where the Solr instance stops responding and freezes up) has disappeared. What I learned is that Solr is a great product but needs "tuning" to fit the usage.

Master/Slave setup

2008-02-28 Thread Alex Benjamen
I'm trying to figure out how best to handle the replication for our system. (We're not using the rsync mechanism because we don't want to have frequent updates on slaves) Current process: 1. Master builds new incremental index once an hour. Commit/Optimize, copy over index to an nfs export

Top N terms of an indexed field

2008-02-28 Thread Alex Benjamen
I was wondering if it is possible to retrieve the top 20 terms for a given field in an index. For example, if we're indexing user profile data and one of the fields is "interests" - it would be great to get the top 20 terms for interests found in the index. -Alex
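
One possible approach, sketched under assumptions rather than taken from the thread: Solr's field faceting (facet.field with facet.limit) returns per-term document counts, which approximates "top N terms" for a field. The match-all query *:* and rows=0 are assumptions about how one might call it; the helper name is hypothetical.

```python
from urllib.parse import urlencode


def top_terms_params(field, limit=20):
    """Request parameters asking for facet counts on one field, no documents."""
    return [
        ("q", "*:*"),          # assumed match-all query
        ("rows", "0"),         # only the facet counts are wanted
        ("facet", "true"),
        ("facet.field", field),
        ("facet.limit", str(limit)),
    ]


query_string = urlencode(top_terms_params("interests"))
print(query_string)
```

Note that facet counts are numbers of matching documents containing each term, not raw term frequencies.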

RE: How long does optimize take on your Solr installation?

2008-02-28 Thread Alex Benjamen
It mostly depends on whether the index is completely new or incremental:
4Gb, 28MM docs, ~30min (new index)
4Gb, 28MM docs, 30s (incremental)

RE: Optimization taking days/weeks

2008-02-28 Thread Alex Benjamen
This sounds too familiar... >java settings used - java -Xmx1024M -Xms1024M Sounds like your settings are pretty low... if you're using a 64-bit JVM, you should be able to set these much higher, maybe give it something like 8 GB. Another thing, you may want to look at reducing the index size... is there a

RE: Master/Slave setup

2008-02-29 Thread Alex Benjamen
OK, I'll give it a shot... Couple of issues I see with the snappuller: 1. When the master performs a commit, and then optimize, there is nothing to prevent snappuller from pulling a non-optimized index? 2. Do uncommitted updates constitute a different index version... suppose I post 10 XML fi

question about snappuller script

2008-02-29 Thread Alex Benjamen
I'm looking at the snappuller script and the only thing I see it doing is managing the snapshot pulling via rsync. And then once the new distribution is in ${data_dir}/${name}-wip it simply moves it to the index dir: # move into place atomically mv ${data_dir}/${name}-wip ${data_dir}/${name} What
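
The "move into place atomically" step quoted above can be illustrated with a small sketch. This is not the snappuller script itself; it only shows the general pattern of staging into a -wip directory and then renaming it, which on POSIX is atomic when both paths are on the same filesystem. The directory and file names below are hypothetical.

```python
import os
import tempfile


def install_snapshot(wip_dir, final_dir):
    """Rename the staged directory to its final name in a single step."""
    # Assumes final_dir does not exist yet and is on the same filesystem.
    os.rename(wip_dir, final_dir)


# Demo in a throwaway directory; names are placeholders.
data_dir = tempfile.mkdtemp()
name = "snapshot.example"
wip = os.path.join(data_dir, name + "-wip")
os.mkdir(wip)
with open(os.path.join(wip, "segments"), "w") as f:
    f.write("placeholder")

final = os.path.join(data_dir, name)
install_snapshot(wip, final)
print(os.listdir(data_dir))
```

Readers of the directory see either no snapshot or the complete one, never a half-copied one, which is the point of staging under a temporary name first.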