On Fri, 2012-08-10 at 10:07 +0200, Lochschmied, Alexander wrote:
> Coming from a SQL database based search system, we already have a set of 
> defined patterns associated with our searchable documents.
> 
> % matches no or any number of characters
> _ matches one character
> 
> Example:
> Doc 1: 'AB%CD', 'AB%CD%'
> Doc 2: 'AB_CD'

As I understand it, you have a list of (simple) patterns and want to
find those that match a given input. When you do it in SQL, the database
iterates over all patterns and applies them one at a time.

I am not aware of any mechanism in Lucene/Solr that provides this
functionality. Implementing a new Query type for this would be a
possibility, and speed could be somewhat optimized by compiling the
patterns only once; but as long as the underlying algorithm is "iterate
all patterns and see if they match", this will not scale very well.
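
To make the "iterate all patterns" approach concrete, here is a minimal
sketch in Python rather than as a Lucene Query. It assumes the patterns
use only % and _ as wildcards, with no escape character, and that a
pattern must match the whole input. The names like_to_regex,
DOC_PATTERNS and matching_docs are made up for illustration and are not
part of Lucene or Solr.

```python
import re

def like_to_regex(pattern):
    """Translate a SQL LIKE-style pattern into a compiled regex.

    Assumption: % means "zero or more characters", _ means "exactly one
    character", and every other character is a literal.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)

# Hypothetical data mirroring the example from the original mail.
DOC_PATTERNS = {
    "Doc 1": ["AB%CD", "AB%CD%"],
    "Doc 2": ["AB_CD"],
}

# Compile every pattern once, up front, instead of once per lookup.
COMPILED = [
    (doc, like_to_regex(pattern))
    for doc, patterns in DOC_PATTERNS.items()
    for pattern in patterns
]

def matching_docs(text):
    """Return the documents that have at least one pattern matching text.

    This is still a linear scan over all compiled patterns.
    """
    return sorted({doc for doc, rx in COMPILED if rx.fullmatch(text)})
```

Compiling once saves the per-lookup translation cost, but each lookup
still touches every pattern, so the running time grows linearly with
the number of patterns.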

Before speculating any further, it would be nice to know the scale of
your problem: How many unique patterns are we talking about? Is there
any "pattern to the patterns", such as specific lengths, maximum number
of substitutions or literal prefixes?
