Distributed Search component question

2015-06-19 Thread Mihran Shahinian
Hi all, I have the following search components, and at the moment I don't have a way to get them working in distributed mode on Solr 4.10.4. [standard query component] [search component-1] (StageID - 2500): handleResponses: get a few values from docs and populate parameters for stats component

PatternReplaceCharFilter + solr.WhitespaceTokenizerFactory behaviour

2015-05-11 Thread Mihran Shahinian
I must be missing something obvious. I have a simple regex that removes a pattern. The unit test below works fine, but when I plug it into the schema and query, the regex does not match, since the input already gets split on spaces (further below). My understanding is that charFilter would operate on raw input stri
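The symptom described above can be reproduced outside Solr. The following is only a sketch: the original regex and schema are not included in the excerpt, so the pattern and the helper names (PATTERN, analyze_whole, analyze_presplit) are hypothetical. It shows why a pattern that spans whitespace matches when the filter sees the whole string, but not when the input has already been split on spaces before the filter runs.

```python
import re

# Hypothetical pattern (not from the original post): it spans a space,
# so it can only match if the filter sees the unsplit input.
PATTERN = re.compile(r"\bunit \d+\b")


def char_filter(text):
    """Stand-in for a pattern-replace char filter: delete every match."""
    return PATTERN.sub("", text)


def analyze_whole(text):
    """Char filter runs on the raw string, then whitespace tokenization."""
    return char_filter(text).split()


def analyze_presplit(text):
    """Input is split on whitespace first; each chunk is analyzed alone."""
    tokens = []
    for chunk in text.split():
        tokens.extend(char_filter(chunk).split())
    return tokens


raw = "main street unit 12"
print(analyze_whole(raw))     # pattern removed before tokenizing
print(analyze_presplit(raw))  # pattern never sees "unit 12" together
```

If the two lists differ for a given input, the regex depends on seeing text across a space, which matches the behaviour reported in the message.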

Re: Relevancy : Keyword stuffing

2015-03-16 Thread Mihran Shahinian
Thank you, Markus and Chris, for the pointers. For SweetSpotSimilarity I am thinking perhaps a set of closed ranges exposed via similarity config is easier to maintain as data changes than making adjustments to fit a function. Another piece of info that would've been handy is to know the average position inf

Relevancy : Keyword stuffing

2015-03-16 Thread Mihran Shahinian
Hi all, I have a use case where the data is generated by SEO-minded authors, and more often than not they perfectly guess the synonym expansions for the document titles, skewing results in their favor. At the moment I don't have an offline processing infrastructure to detect these (I can't punish the

boosting by geodist - GC Overhead Limit exceeded

2015-01-21 Thread Mihran Shahinian
I am running Solr 4.10.2 with geofilt (~20% of docs have 30+ lat/lon points) and everything works hunky-dory. Then I added a bf with geodist along the lines of: recip(geodist(),5,20,5). After a few hours of running I end up with an OOM: GC overhead limit exceeded. I've seen this https://issues.apache.
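For readers unfamiliar with the boost function quoted above: Solr's recip(x,m,a,b) evaluates to a / (m*x + b), so recip(geodist(),5,20,5) decays from 4.0 at zero distance toward 0 as distance grows. A minimal sketch of the arithmetic follows; the Python function names are illustrative only, and the distances are made-up sample values, not data from the thread.

```python
def recip(x, m, a, b):
    """Mirror of Solr's recip(x, m, a, b) function query: a / (m * x + b)."""
    return a / (m * x + b)


def geodist_boost(distance_km):
    """Boost for recip(geodist(),5,20,5), with geodist() given in km."""
    return recip(distance_km, 5, 20, 5)


# Sample distances (hypothetical) to show how quickly the boost falls off.
for d in (0, 1, 10, 100):
    print(f"{d:>4} km -> boost {geodist_boost(d):.4f}")
```

This only illustrates the shape of the boost; it says nothing about the memory behaviour reported in the message.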