Re: Prevention of heavy wildcard queries

2013-06-02 Thread Isaac Hebsh
Hi everyone. I came across another need for term extraction: I want to find pairs of words that appear together in queries. All of the "clustering" work is ready, and the only gap is how to get the basic terms from the query. Has nobody tried this before? Is there no clean way to do it? On Tue, May

Re: Prevention of heavy wildcard queries

2013-05-27 Thread Isaac Hebsh
I don't want to affect the (correctness of the) real query parsing, so creating a QParserPlugin is risky. Instead, if I parse the query in my search component, it will be detached from the real query parsing (obviously this causes double parsing, but assume it's OK)... On Tue, May 28, 2013

Re: Prevention of heavy wildcard queries

2013-05-27 Thread Roman Chyla
Hi Isaac, it is as you say, except that you create a QParserPlugin, not a search component: * create a QParserPlugin and give it some name, e.g. 'nw' * make a copy of the pipeline - your component should be at the same place as, or just above, the wildcard processor. Also make sure you are setti

Re: Prevention of heavy wildcard queries

2013-05-27 Thread Isaac Hebsh
Thanks, Roman. Based on some of your suggestions, will the steps below do the job? * Create (and register) a new SearchComponent * In its prepare method, do the following for Q and all of the FQs (so this SearchComponent should run AFTER QueryComponent, in order to see all of the FQs) * Create org.apache.lucene

Re: Prevention of heavy wildcard queries

2013-05-27 Thread Roman Chyla
You are right that starting to parse the query before the query component can soon get very ugly and complicated. You should take advantage of the flex parser, which is already in Lucene contrib - but if you are interested in the better version, look at https://issues.apache.org/jira/browse/LUCENE-501

Prevention of heavy wildcard queries

2013-05-27 Thread Isaac Hebsh
Hi. Searching for terms with a leading wildcard is solved by ReversedWildcardFilterFactory. But what about terms with a wildcard at both the start AND the end? Such a query is heavy, and I want to disallow these queries for my users. I'm looking for a way to make these queries fail. I guess there i
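
For illustration only, here is a minimal plain-Java sketch of the check described in the original question: flagging terms that carry a wildcard at both the start and the end. It does not use the Solr or Lucene query parsers. The class name WildcardGuard, its method names, and the naive whitespace and "field:" splitting are all assumptions made for this example, so it ignores quoting, escaping, and nested syntax that a real parser would handle.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Solr or Lucene.
class WildcardGuard {

    // The two wildcard characters of the classic Lucene query syntax.
    static boolean isWildcard(char c) {
        return c == '*' || c == '?';
    }

    // True when the term both starts and ends with a wildcard, e.g. "*foo*".
    // A lone "*" is not flagged. Escaped wildcards (such as "\*") are NOT
    // recognised by this sketch.
    static boolean hasLeadingAndTrailingWildcard(String term) {
        if (term.length() < 2) {
            return false;
        }
        return isWildcard(term.charAt(0))
                && isWildcard(term.charAt(term.length() - 1));
    }

    // Naive scan of a raw query string: split on whitespace, drop an optional
    // "field:" prefix, and collect the terms that would be rejected.
    static List<String> offendingTerms(String rawQuery) {
        List<String> hits = new ArrayList<>();
        for (String token : rawQuery.trim().split("\\s+")) {
            int colon = token.indexOf(':');
            String term = colon >= 0 ? token.substring(colon + 1) : token;
            if (hasLeadingAndTrailingWildcard(term)) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Print whichever terms of a sample query the check would reject.
        System.out.println(offendingTerms("title:*foo* bar baz*"));
    }
}
```

A caller could reject the request whenever offendingTerms returns a non-empty list. Where such a check should live (a search component, a QParserPlugin, or a step in the flex parser pipeline) is exactly what the replies in this thread discuss.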