Re: duplicate entries being returned, possible caching issue?
I would guess you are seeing a view of the index after adding some documents but before the duplicates have been removed. Are you using Solr's replication scripts?

-Yonik

On Feb 1, 2008 6:01 PM, Rachel McConnell <[EMAIL PROTECTED]> wrote:
> We have just started seeing an intermittent problem in our production
> Solr instances, where the same document is returned twice in one
> request. Most of the content of the response consists of duplicates.
> It's not consistent; maybe 1/3 of the time this is happening, and the
> rest of the time one document is returned per actual Solr document.
>
> We recently made some changes to our caching strategy, basically to
> increase the values across the board. This is the only change to our
> Solr instance for quite some time.
>
> Our production system consists of the following:
>
> * 'write', a Solr server used as the master index, optimized for
>   writes. All 3 application servers use this.
> * 'read1' & 'read2', Solr servers optimized for reads, which sync
>   from the master every 20 minutes. These two are behind a Pound load
>   balancer. Two application servers use these for searching.
> * 'read3', a Solr server identical to read1 & read2, but which is not
>   load balanced, and is used by only one application server.
>
> Does anyone have ideas on how to start debugging this? What information
> should I be looking for that could shed some light on this?
>
> Thanks for any advice,
> Rachel
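One way to narrow down where the duplicates come from is to check whether they exist in the slave's index at all, as opposed to being introduced somewhere on the request/response path. A hedged sketch, assuming the uniqueKey field is named "id", the slave runs on the default port, and the Solr version in use supports the facet.mincount parameter:

    http://read1:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=id&facet.mincount=2

If any facet counts of two or more come back, the snapshot that slave is serving really does contain duplicate documents (consistent with catching the index between the add of a new copy and the delete of the old one); if none do, the problem is more likely in the load balancer or client layer. Faceting on a unique key is expensive on a large index, so this is a one-off debugging step rather than something to leave in place.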
Field Compression
I just finished watching this talk about a column-store RDBMS, which has a long section on column compression. Specifically, it talks about the gains from compressing similar data together, and how lazily decompressing data only when it must be processed is great for memory/CPU cache usage.

http://youtube.com/watch?v=yrLd-3lnZ58

While interesting, it's not relevant to Lucene's stored field storage. On the other hand, it did get me thinking about stored field compression and lazy field loading.

Can anyone give me some pointers about compressThreshold values that would be worth experimenting with? Our stored fields are often between 20 and 300 characters, and we're willing to spend more time indexing if it will make searching less I/O bound.

Thanks,

Stu Hood
Architecture Software Developer
Mailtrust, a Rackspace Company
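The compressThreshold knob being asked about is, in the Solr of this era (1.2/1.3), set per stored field in schema.xml. A minimal sketch, with an illustrative field name and threshold value rather than anything taken from the original thread:

    <!-- compress the stored value, but only when the plaintext is at least
         100 characters long; shorter values are stored uncompressed -->
    <field name="summary" type="text" indexed="true" stored="true"
           compressed="true" compressThreshold="100"/>

Since the fields in question run 20-300 characters, a threshold in this range means only the larger values pay the compression/decompression cost; whether that saves enough I/O to matter is exactly what experimentation on a small index would show.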
Re: Field Compression
On 3-Feb-08, at 1:34 PM, Stu Hood wrote:

> I just finished watching this talk about a column-store RDBMS, which
> has a long section on column compression. Specifically, it talks about
> the gains from compressing similar data together, and how lazily
> decompressing data only when it must be processed is great for
> memory/CPU cache usage.
>
> http://youtube.com/watch?v=yrLd-3lnZ58
>
> While interesting, it's not relevant to Lucene's stored field storage.
> On the other hand, it did get me thinking about stored field
> compression and lazy field loading.
>
> Can anyone give me some pointers about compressThreshold values that
> would be worth experimenting with? Our stored fields are often between
> 20 and 300 characters, and we're willing to spend more time indexing
> if it will make searching less I/O bound.

Field compression can save you space, and it converts the field into a binary field, which is lazy-loaded more efficiently than a string field.

As for the threshold, I use 200 on a multi-kilobyte field, but that doesn't mean it isn't effective on smaller fields. Experimentation on small indices, followed by calculating the average stored bytes per document, is usually fruitful.

Of course, the best way to improve performance in this regard is to store the less-frequently-used fields in a parallel Solr index. This only works if the largest fields are the rarely-used ones, though (like retrieving the doc contents to create a summary).

-Mike
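One pointer to go with the lazy-loading part of the answer: in the solrconfig.xml of the same era, lazy field loading is a single switch under the <query> section. A minimal sketch, showing only that element:

    <query>
      <!-- stored fields not requested via the fl parameter are loaded lazily -->
      <enableLazyFieldLoading>true</enableLazyFieldLoading>
    </query>

With this enabled, a request that limits its field list (for example fl=id,score) can skip decompressing the large compressed fields entirely, which is where compression and lazy loading complement each other.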