Thank you, Toke; your comments made a lot of sense to me. Luckily, we do not 
have many patterns, so we have decided to consider only the literal prefix of 
each pattern, up to the first wildcard. That way we no longer have to deal with 
patterns at all.
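
A minimal sketch of that prefix extraction, assuming % and _ are the only 
wildcards and that no escape character is in use. The function name 
literal_prefix and the sample data are hypothetical, taken from the example 
patterns in the original question:

```python
def literal_prefix(pattern, wildcards="%_"):
    """Return the part of a LIKE-style pattern before its first wildcard."""
    for i, ch in enumerate(pattern):
        if ch in wildcards:
            return pattern[:i]
    # No wildcard at all: the whole pattern is a literal.
    return pattern


# Sample patterns from the original question.
doc_patterns = {
    "Doc 1": ["AB%CD", "AB%CD%"],
    "Doc 2": ["AB_CD"],
}

# One de-duplicated, sorted list of prefixes per document.
doc_prefixes = {
    doc: sorted({literal_prefix(p) for p in patterns})
    for doc, patterns in doc_patterns.items()
}
```

The resulting prefixes could then be indexed as ordinary field values. Note 
that a prefix is less selective than the full pattern, so this trades some 
precision for simplicity.
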
Alexander

-----Original Message-----
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk] 
Sent: Friday, 10 August 2012 13:29
To: solr-user@lucene.apache.org
Subject: Re: Indexing wildcard patterns

On Fri, 2012-08-10 at 10:07 +0200, Lochschmied, Alexander wrote:
> Coming from a SQL database based search system, we already have a set of 
> defined patterns associated with our searchable documents.
> 
> % matches no or any number of characters
> _ matches one character
> 
> Example:
> Doc 1: 'AB%CD', 'AB%CD%'
> Doc 2: 'AB_CD'

As I understand it: You have a list of (simple) patterns and want to find those 
that match a given input. When you do it in SQL, it iterates over all patterns 
and applies them one at a time.

I am not aware of any mechanism in Lucene/Solr that provides this 
functionality. Implementing a new Query type for this would be a possibility, 
and speed could be somewhat optimized by compiling the patterns only once; but 
as long as the underlying algorithm is "iterate all patterns and see if they 
match", this will not scale very well.
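
To illustrate the "compile once, then iterate" approach outside of Lucene/Solr, 
here is a rough Python sketch. It is not a Solr Query implementation; the names 
like_to_regex, COMPILED and matching_docs are made up for the example, and it 
assumes % and _ are the only wildcards with no escape character:

```python
import re


def like_to_regex(pattern):
    """Translate a LIKE-style pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # zero or more characters
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


# Patterns from the original question, compiled a single time up front.
PATTERNS = [("Doc 1", "AB%CD"), ("Doc 1", "AB%CD%"), ("Doc 2", "AB_CD")]
COMPILED = [(doc, like_to_regex(p)) for doc, p in PATTERNS]


def matching_docs(text):
    """Return the documents having at least one pattern that matches text."""
    # Still a linear scan over every pattern for each input.
    return sorted({doc for doc, rx in COMPILED if rx.fullmatch(text)})
```

Compiling up front removes the per-query translation cost, but every lookup 
still touches every pattern, which is the scaling problem described above.
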

Before speculating any further, it would be nice to know the scale of your 
problem: How many unique patterns are we talking about? Is there any "pattern 
to the patterns", such as specific lengths, maximum number of substitutions or 
literal prefixes?
