A field-wide remove duplicate tokens filter

2014-12-17 Thread Varun Rajput
The org.apache.solr.analysis.RemoveDuplicatesTokenFilter, as per its description, "Filters out any tokens which are at the same logical position in the tokenstream as a previous token with the same text." A very useful filter would be one which filters out duplicate tokens throughout the field,
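
To make the distinction in the message above concrete, here is a minimal sketch in plain Python, not Lucene or Solr code. It models a token stream as hypothetical (text, position_increment) pairs and contrasts same-position de-duplication, as the quoted description states it, with the field-wide variant being proposed. The function names, the tuple representation, and the choice to carry a dropped token's position increment forward to the next kept token are all assumptions made for illustration.

```python
from typing import Iterable, Iterator, Set, Tuple

# Hypothetical token model: (text, position_increment).
# An increment of 0 means "same logical position as the previous token".
Token = Tuple[str, int]


def dedupe_same_position(tokens: Iterable[Token]) -> Iterator[Token]:
    """Drop a token only if the same text was already seen at the same position."""
    seen_here: Set[str] = set()
    for text, pos_inc in tokens:
        if pos_inc > 0:
            # A new logical position starts, so forget the previous one.
            seen_here = set()
        if text in seen_here:
            continue
        seen_here.add(text)
        yield (text, pos_inc)


def dedupe_field_wide(tokens: Iterable[Token]) -> Iterator[Token]:
    """Drop a token if the same text appeared anywhere earlier in the field."""
    seen: Set[str] = set()
    pending_inc = 0  # increments of dropped tokens (assumed design choice)
    for text, pos_inc in tokens:
        if text in seen:
            # Remember the gap so later tokens keep their original positions.
            pending_inc += pos_inc
            continue
        seen.add(text)
        yield (text, pos_inc + pending_inc)
        pending_inc = 0
```

With the input `[("quick", 1), ("fast", 0), ("fast", 0), ("fox", 1), ("quick", 1)]`, the same-position version removes only the repeated "fast", while the field-wide version also removes the second "quick". A real implementation would be a Lucene TokenFilter; this sketch only illustrates the intended behaviour.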

Re: Solr Cloud Segments and Merging Issues

2014-03-13 Thread Varun Rajput
Hey Shawn, > The config with the old policy used to be the literal name "mergeFactor". With TieredMergePolicy, there are now three settings that must be changed in order to actually be the same as what mergeFactor used to do. The following config snippet is the equivalent config to a mergeF

Re: Solr Cloud Segments and Merging Issues

2014-03-13 Thread Varun Rajput
Hi Remi, I read your post and, like you, I have also identified that running Solr 4.6.0 in cloud mode results in higher response times, which has something to do with the merging of documents from the various shards. Looking at the source code, we couldn't understand why it would take so much time for m

Solr Cloud Segments and Merging Issues

2014-03-13 Thread Varun Rajput
I am using Solr 4.6.0 in cloud mode. The setup has 4 shards, one on each machine, with a ZooKeeper quorum running on 3 other machines. The index size on each shard is about 15GB. I noticed that the number of segments in the second shard was 42, and in the remaining shards it was between 25 and 30. I am basical