Re: Cache use

2007-12-06 Thread sfox
One possible explanation is that the OS's native file system caching is 
already keeping these files mostly in RAM most of the time, so the 
performance benefit of 'forcing' them into RAM with tmpfs isn't 
significant.


If so, the slowness of the queries is the result of being CPU bound 
rather than IO bound.  The cache within Solr is faster because it saves 
and returns results for which the CPU-bound work has already been done.
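One rough way to sanity-check that hypothesis on Linux is to compare accumulated CPU time against iowait time while queries are running. This is only a diagnostic sketch, and it assumes the standard layout of the aggregate "cpu" line in /proc/stat:

```shell
# Rough CPU-bound vs IO-bound check (Linux only): the first line of
# /proc/stat reports jiffies spent in user/nice/system/idle/iowait.
# A large busy figure with a small iowait figure during query load
# suggests the queries are CPU bound rather than IO bound.
read -r _ user nice system idle iowait _ < /proc/stat
echo "busy=$((user + nice + system)) iowait=$iowait"
```

Sampling this twice, a few seconds apart, and diffing the counters gives the split over that interval.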


Just one possible explanation.

Sean Fox

Matthew Phillips wrote:
No one has a suggestion? I must be missing something, because as I 
understand it from Dennis' email, all of his queries are very quick 
(cached-type response times) whereas mine are not. I can clearly see time 
differences between queries that are cached (things that have been 
auto-warmed) and queries that are not. This seems odd, as my whole index 
is loaded on a tmpfs memory-based file system. Thanks for the help.


Matt

On Dec 4, 2007, at 3:55 PM, Matthew Phillips wrote:

Thanks for the suggestion, Dennis. I decided to implement this as you 
described on my collection of about 400,000 documents, but I did not 
receive the results I expected.


Prior to putting the indexes on a tmpfs, I did a bit of benchmarking 
and found that it usually takes a little under two seconds for each 
facet query. After moving my indexes from disk to a tmpfs file system, 
I seem to get about the same result from facet queries: about two 
seconds.
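For reference, a minimal timing harness along these lines is enough for this kind of benchmark. The URL and facet field are hypothetical placeholders, and the curl line is commented out so the sketch stands alone (date +%s%N is GNU date; it is not available on all platforms):

```shell
# Time a single facet query in milliseconds (URL/field are hypothetical).
URL='http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=category'
start=$(date +%s%N)                 # nanoseconds since epoch (GNU date)
# curl -s "$URL" > /dev/null        # uncomment to hit a live Solr instance
end=$(date +%s%N)
echo "elapsed_ms=$(( (end - start) / 1000000 ))"
```

Running the same query twice back to back also makes the cached-vs-uncached difference obvious.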


Does anyone have any insight into this? Doesn't it seem odd that my 
response times are about the same? Thanks for the help.


Matt Phillips

Dennis Kubes wrote:
One way to do this, if you are running on Linux, is to create a tmpfs 
(which is RAM-backed) and mount it over your index directory.  Your 
index then looks normal to the application but is essentially served 
from RAM.  This is how we serve the Nutch Lucene indexes on our web 
search engine (www.visvo.com), which covers ~100M pages.  Below is how 
you can achieve this, assuming your indexes are in /path/to/indexes:

mv /path/to/indexes /path/to/indexes.dist
mkdir /path/to/indexes
cd /path/to
mount -t tmpfs -o size=2684354560 none /path/to/indexes
rsync --progress -aptv indexes.dist/* indexes/
chown -R user:group indexes

This would of course be limited by the amount of RAM you have on the 
machine.  But with this approach most searches are sub-second.
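The size= value in the mount command above is plain bytes; assuming the intent was 2.5 GiB, the arithmetic checks out, and tmpfs also accepts k/m/g suffixes if you prefer a readable form:

```shell
# 2.5 GiB expressed in bytes, matching the size= option above
echo $(( 5 * 1024 * 1024 * 1024 / 2 ))   # prints 2684354560

# equivalent, using tmpfs's size suffix support:
#   mount -t tmpfs -o size=2560m none /path/to/indexes
```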

Dennis Kubes
Evgeniy Strokin wrote:

Hello,...
we have a 110M-record index under Solr. Some queries take a while, 
but we need sub-second results. I guess the only solution is caching 
(or is there something else?)...
We use the standard LRUCache. The docs say (as far as I understood) 
that it loads a view of the index into memory and then works from 
memory instead of the hard drive.
So, my question: hypothetically, we could have the whole index in 
memory if we had enough RAM, right? In that case the results should 
come back very fast. We have very rare updates, so I think this could 
be a solution.

How should I configure the cache to achieve such approach?
Thanks for any advice.
Gene
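To Gene's configuration question: the caches are declared in solrconfig.xml. A sketch of the relevant entries follows; the sizes are illustrative, not recommendations, and should be tuned to your actual filter/query entry counts. Note that documentCache cannot be usefully autowarmed, since internal document ids change between index searchers:

```xml
<!-- solrconfig.xml cache settings (sizes are illustrative) -->
<filterCache
    class="solr.LRUCache"
    size="16384"
    initialSize="4096"
    autowarmCount="4096"/>

<queryResultCache
    class="solr.LRUCache"
    size="16384"
    initialSize="4096"
    autowarmCount="1024"/>

<!-- documentCache is not autowarmed: doc ids change on commit -->
<documentCache
    class="solr.LRUCache"
    size="16384"
    initialSize="16384"/>
```

Keep in mind these caches hold query results and documents, not the raw index files; keeping the index files themselves in RAM is the job of the OS page cache (or tmpfs), as discussed above in this thread.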




Re: How long does optimize take on your Solr installation?

2008-02-28 Thread sfox

767 MB, 76 seconds

(single, local SATA 7200rpm disk, unloaded XServe G5)

Sean Fox

Walter Underwood wrote:

Please answer with the size of your index (post-optimize) and how long
an optimize takes. I'll collect the data and see if I can draw a line
through it.

190 MB, 55 seconds

$ du -sk /apps/wss/solr_home/data/index
191592  /apps/wss/solr_home/data/index
$  grep commit /apps/wss/tomcat/logs/stdout.log
Feb 28, 2008 11:55:11 AM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=true,waitFlush=false,waitSearcher=true)
Feb 28, 2008 11:56:06 AM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
$ uname -a
Linux spiderman4 2.6.9-22.EL #1 SMP Mon Sep 19 17:52:20 EDT 2005 ppc64 ppc64
ppc64 GNU/Linux

wunder



--
[EMAIL PROTECTED] | Technical Director | SERC | Carleton College