Re: Prevention of heavy wildcard queries

Isaac Hebsh Mon, 27 May 2013 21:09:32 -0700

I don't want to affect the (correctness of the) real query parsing, so
creating a QParserPlugin is risky.
Instead, if I parse the query in my search component, it will be
detached from the real query parsing (obviously this causes double
parsing, but assume that's OK)...
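
To make the check itself concrete, here is a minimal sketch of the per-term test that such a component (or a node processor) could apply. It is plain Java with no Lucene or Solr types, and the names WildcardGuard, isHeavy and check are hypothetical, made up for illustration only. It assumes the term has already been isolated by some parser, and that a backslash escapes the following character as in the standard syntax.

```java
// Hypothetical helper, not part of Solr or Lucene.
class WildcardGuard {

    // The two wildcard characters of the standard Lucene query syntax.
    static boolean isWildcard(char c) {
        return c == '*' || c == '?';
    }

    // True when the term both starts and ends with an unescaped wildcard,
    // e.g. *foo*. A lone "*" is not flagged, so match-all clauses still pass.
    static boolean isHeavy(String term) {
        if (term == null || term.length() < 2) {
            return false;
        }
        int last = term.length() - 1;
        if (!isWildcard(term.charAt(0)) || !isWildcard(term.charAt(last))) {
            return false;
        }
        // The trailing wildcard is escaped if an odd number of
        // backslashes sits directly in front of it.
        int slashes = 0;
        for (int i = last - 1; i >= 0 && term.charAt(i) == '\\'; i--) {
            slashes++;
        }
        return slashes % 2 == 0;
    }

    // Rejects a heavy term by throwing, which is what the plan calls for.
    static void check(String term) {
        if (isHeavy(term)) {
            throw new IllegalArgumentException(
                "Wildcard at both start and end is not allowed: " + term);
        }
    }
}
```

Calling WildcardGuard.check(term) for every term the parser reports would fail the request on the first offending term; how the exception is turned into an error response is left open here.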



On Tue, May 28, 2013 at 3:52 AM, Roman Chyla <roman.ch...@gmail.com> wrote:

> Hi Isaac,
> it is as you say, with the exception that you create a QParserPlugin, not a
> search component
>
> * create QParserPlugin, give it some name, eg. 'nw'
> * make a copy of the pipeline - your component should be at the same place,
> or just above, the wildcard processor
>
> also make sure you are setting your qparser for FQ queries, ie.
> fq="{!nw}foo"
>
>
> On Mon, May 27, 2013 at 5:01 PM, Isaac Hebsh <isaac.he...@gmail.com>
> wrote:
>
> > Thanks Roman.
> > Based on some of your suggestions, will the steps below do the work?
> >
> > * Create (and register) a new SearchComponent
> > * In its prepare method: do the following for Q and all of the FQs (so
> > this SearchComponent should run AFTER QueryComponent, in order to see
> > all of the FQs)
> > * Create
> > org.apache.lucene.queryparser.flexible.standard.StandardQueryParser, with
> > a special implementation of QueryNodeProcessorPipeline, which contains my
> > NodeProcessor at the top of its list.
> > * Set my analyzer into that StandardQueryParser
> > * My NodeProcessor will be called for each term in the query, so it can
> > throw an exception if a (basic) query node contains a wildcard at both the
> > start and the end of the term.
> >
> > Is there a way to avoid reimplementing the whole StandardQueryParser
> > class?
> >
>
> you can try subclassing it, if it allows it
>
>
> > Will this work for both LuceneQParser and EdismaxQParser queries?
> >
>
> this will not work for edismax, nothing but changing the edismax qparser
> will do the trick
>
>
> >
> > Any other solution/work-around? How do other production environments of
> > Solr overcome this issue?
> >
>
> you can also try modifying the standard solr parser, or even the JavaCC
> generated classes
> I believe many people do just that (or some sort of preprocessing)
>
> roman
>
>
> >
> >
> > On Mon, May 27, 2013 at 10:15 PM, Roman Chyla <roman.ch...@gmail.com>
> > wrote:
> >
> > > You are right that starting to parse the query before the query
> > > component can soon get very ugly and complicated. You should take
> > > advantage of the flex parser, it is already in lucene contrib - but if
> > > you are interested in the better version, look at
> > > https://issues.apache.org/jira/browse/LUCENE-5014
> > >
> > > The way you can solve this is:
> > >
> > > 1. use the standard syntax grammar (which allows *foo*)
> > > 2. add (or modify) WildcardQueryNodeProcessor to dis/allow that case, or
> > > raise error etc
> > >
> > > this way, you are changing semantics - but don't need to touch the
> > > syntax definition; of course, you may also change the grammar and allow
> > > only one instance of wildcard (or some combination) but for that you
> > > should probably use LUCENE-5014
> > >
> > > roman
> > >
> > > On Mon, May 27, 2013 at 2:18 PM, Isaac Hebsh <isaac.he...@gmail.com>
> > > wrote:
> > >
> > > > Hi.
> > > >
> > > > Searching for terms with a wildcard at their start is solved with
> > > > ReversedWildcardFilterFactory. But what about terms with a wildcard at
> > > > both the start AND the end?
> > > >
> > > > Such a query is heavy, and I want to prevent my users from running
> > > > queries like it.
> > > >
> > > > I'm looking for a way to cause these queries to fail.
> > > > I guess there is no built-in support for my need, so it is OK to
> > > > write a new solution.
> > > >
> > > > My current plan is to create a search component (which will run
> > > > before QueryComponent). It should analyze the query string, and drop
> > > > the query if "too heavy" wildcards are found.
> > > >
> > > > Another option is to create a query parser, which wraps the current
> > > > (specified or default) qparser, and does the same work as above.
> > > >
> > > > These two options require an analysis of the query text, which might
> > > > be ugly work (just think about nested queries [using _query_], or even
> > > > a lot of more basic scenarios like quoted terms, etc.)
> > > >
> > > > Am I missing a simple and clean way to do this?
> > > > What would you do?
> > > >
> > > > P.S. if no simple solution exists, the timeAllowed limit is the best
> > > > work-around I could think of. Any other suggestions?
> > > >
> > >
> >
>
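
For the "some sort of preprocessing" route mentioned in the thread, a rough sketch of scanning the raw query text might look like the following. It is plain Java, and the names QueryPreScan and hasHeavyWildcard are hypothetical. It assumes the standard syntax: double-quoted phrases are skipped, a backslash escapes the next character, and field prefixes (title:), the operators + - ! and boosts (^2) are stripped before the check. It does not understand local params such as {!nw} or nested _query_ clauses, which is exactly the kind of gap that makes text-level analysis fragile.

```java
// Hypothetical pre-scan of a raw query string, not part of Solr or Lucene.
class QueryPreScan {

    // Returns true if any unquoted term starts and ends with a wildcard.
    static boolean hasHeavyWildcard(String query) {
        StringBuilder token = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            if (c == '\\' && i + 1 < query.length()) {
                // Escaped character: keep a neutral placeholder so it can
                // never count as a wildcard, a quote or a field separator.
                if (!inQuotes) {
                    token.append('_');
                }
                i++;
            } else if (c == '"') {
                // A quote ends the current token and toggles phrase mode.
                if (isHeavy(token)) {
                    return true;
                }
                token.setLength(0);
                inQuotes = !inQuotes;
            } else if (inQuotes) {
                continue; // phrase content is ignored
            } else if (Character.isWhitespace(c) || c == '(' || c == ')') {
                if (isHeavy(token)) {
                    return true;
                }
                token.setLength(0);
            } else {
                token.append(c);
            }
        }
        return isHeavy(token);
    }

    // Strips boost, field prefix and leading operators, then tests both ends.
    static boolean isHeavy(CharSequence token) {
        String t = token.toString();
        int caret = t.indexOf('^');
        if (caret >= 0) {
            t = t.substring(0, caret);
        }
        t = t.substring(t.lastIndexOf(':') + 1);
        while (!t.isEmpty() && "+-!".indexOf(t.charAt(0)) >= 0) {
            t = t.substring(1);
        }
        return t.length() >= 2
            && isWildcard(t.charAt(0))
            && isWildcard(t.charAt(t.length() - 1));
    }

    static boolean isWildcard(char c) {
        return c == '*' || c == '?';
    }
}
```

A search component could call hasHeavyWildcard on q and on each fq value and reject the request when it returns true. Because this never touches the real parser it cannot change query semantics, but it can also disagree with the parser on unusual input, so it is best treated as a coarse filter.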
