Re: Prevention of heavy wildcard queries

Roman Chyla Mon, 27 May 2013 17:53:52 -0700

Hi Isaac,
it is as you say, except that you create a QParserPlugin, not a
search component.


* create a QParserPlugin and give it a name, e.g. 'nw'
* make a copy of the processor pipeline; your processor should sit at the
same place as, or just above, the wildcard processor

Also make sure you set your qparser for the FQ queries as well, i.e.
fq="{!nw}foo"
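
For illustration, here is a minimal sketch of the check such a processor could
apply to each term. It is plain Java with no Lucene or Solr dependencies, and
the class and method names (WildcardGuard, isDoubleEnded, check) are
hypothetical. It assumes both '*' and '?' count as wildcards and that a
backslash escapes the following character; inside a real node processor the
term text would come from the query node rather than from a plain String.

```java
/** Hypothetical helper, not part of Lucene or Solr. */
final class WildcardGuard {
    private WildcardGuard() {}

    private static boolean isWildcard(char c) {
        return c == '*' || c == '?';
    }

    /** True if the term starts and ends with an unescaped wildcard, e.g. *foo*. */
    static boolean isDoubleEnded(String term) {
        if (term == null || term.length() < 2) {
            return false; // a lone "*" is not treated as double-ended here
        }
        int last = term.length() - 1;
        if (!isWildcard(term.charAt(0)) || !isWildcard(term.charAt(last))) {
            return false;
        }
        // Count the backslashes directly before the last character:
        // an odd count means the trailing wildcard is escaped (literal).
        int backslashes = 0;
        for (int i = last - 1; i >= 0 && term.charAt(i) == '\\'; i--) {
            backslashes++;
        }
        return backslashes % 2 == 0;
    }

    /** Rejects the term by throwing, which is what would fail the query. */
    static void check(String term) {
        if (isDoubleEnded(term)) {
            throw new IllegalArgumentException(
                "Leading and trailing wildcard in the same term is not allowed: " + term);
        }
    }
}
```

Calling WildcardGuard.check("*foo*") throws, while "foo*" and "*foo" pass
through untouched, so ReversedWildcardFilterFactory can still serve
leading-wildcard terms.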


On Mon, May 27, 2013 at 5:01 PM, Isaac Hebsh <isaac.he...@gmail.com> wrote:

> Thanks Roman.
> Based on some of your suggestions, will the steps below do the work?
>
> * Create (and register) a new SearchComponent
> * In its prepare method, do the following for Q and all of the FQs (so
> this SearchComponent should run AFTER QueryComponent, in order to see all
> of the FQs):
> * Create org.apache.lucene.queryparser.flexible.standard.StandardQueryParser,
> with a special implementation of QueryNodeProcessorPipeline, which contains
> my NodeProcessor at the top of its list.
> * Set my analyzer into that StandardQueryParser
> * My NodeProcessor will be called for each term in the query, so it can
> throw an exception if a (basic) query node contains a wildcard at both the
> start and the end of the term.
>
> Is there a way to avoid reimplementing the whole StandardQueryParser
> class?
>

You can try subclassing it, if the class allows it.


> Will this work for both LuceneQParser and EdismaxQParser queries?
>

This will not work for edismax; nothing short of changing the edismax
qparser itself will do the trick.


>
> Any other solution/work-around? How do other production environments of
> Solr overcome this issue?
>

You can also try modifying the standard Solr parser, or even the
JavaCC-generated classes.
I believe many people do just that (or some sort of preprocessing).
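
As a rough idea of what the preprocessing route could look like, here is a
naive sketch that scans the raw query string before it reaches any parser.
The class and method names are hypothetical. It assumes terms are separated
by whitespace, parentheses or a field colon, treats everything inside double
quotes as literal text, and ignores backslash escapes, local params and
nested queries, which is exactly the ugliness discussed in this thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-parser scan, not part of Lucene or Solr. */
final class WildcardPreprocessor {
    private WildcardPreprocessor() {}

    private static boolean isWildcard(char c) {
        return c == '*' || c == '?';
    }

    private static boolean isDoubleEnded(CharSequence t) {
        return t.length() >= 2
            && isWildcard(t.charAt(0))
            && isWildcard(t.charAt(t.length() - 1));
    }

    /** Returns every unquoted term that starts and ends with a wildcard. */
    static List<String> doubleEndedTerms(String query) {
        List<String> hits = new ArrayList<>();
        StringBuilder term = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i <= query.length(); i++) {
            // A trailing space acts as a sentinel so the last term is flushed.
            char c = i < query.length() ? query.charAt(i) : ' ';
            if (c == '"') {
                // Flush whatever preceded an opening quote, then toggle.
                if (!inQuotes && isDoubleEnded(term)) {
                    hits.add(term.toString());
                }
                term.setLength(0);
                inQuotes = !inQuotes;
            } else if (inQuotes) {
                // Quoted text is assumed to be literal here; skip it.
            } else if (Character.isWhitespace(c) || c == '(' || c == ')' || c == ':') {
                if (isDoubleEnded(term)) {
                    hits.add(term.toString());
                }
                term.setLength(0);
            } else if (term.length() == 0 && (c == '+' || c == '-')) {
                // Skip a leading required/prohibited operator.
            } else {
                term.append(c);
            }
        }
        return hits;
    }
}
```

A caller would reject the request whenever the returned list is non-empty,
for both q and every fq.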

roman


>
>
> On Mon, May 27, 2013 at 10:15 PM, Roman Chyla <roman.ch...@gmail.com>
> wrote:
>
> > You are right that starting to parse the query before the query component
> > can soon get very ugly and complicated. You should take advantage of the
> > flex parser, which is already in lucene contrib; if you are interested in
> > a better version, look at
> > https://issues.apache.org/jira/browse/LUCENE-5014
> >
> > The way you can solve this is:
> >
> > 1. use the standard syntax grammar (which allows *foo*)
> > 2. add (or modify) WildcardQueryNodeProcessor to allow or disallow that
> > case, raise an error, etc.
> >
> > This way you are changing semantics but don't need to touch the syntax
> > definition. Of course, you may also change the grammar and allow only one
> > instance of a wildcard (or some combination), but for that you should
> > probably use LUCENE-5014.
> >
> > roman
> >
> > On Mon, May 27, 2013 at 2:18 PM, Isaac Hebsh <isaac.he...@gmail.com>
> > wrote:
> >
> > > Hi.
> > >
> > > Searching for terms with a leading wildcard is solved by
> > > ReversedWildcardFilterFactory. But what about terms with a wildcard at
> > > both the start AND the end?
> > >
> > > Such a query is heavy, and I want to prevent my users from running it.
> > >
> > > I'm looking for a way to cause these queries to fail.
> > > I guess there is no built-in support for my need, so it is OK to write a
> > > new solution.
> > >
> > > My current plan is to create a search component (which will run before
> > > QueryComponent). It should analyze the query string and drop the query
> > > if "too heavy" wildcards are found.
> > >
> > > Another option is to create a query parser, which wraps the current
> > > (specified or default) qparser, and does the same work as above.
> > >
> > > These two options require an analysis of the query text, which might be
> > > ugly work (just think about nested queries [using _query_], or even many
> > > more basic scenarios like quoted terms, etc.)
> > >
> > > Am I missing a simple and clean way to do this?
> > > What would you do?
> > >
> > > P.S. If no simple solution exists, the timeAllowed limit is the best
> > > work-around I could think of. Any other suggestions?
> > >
> >
>
