At sesam.no we want to replace a FAST (fast.no) Query Matching Server
with a Solr index.

The index we are trying to replace is not a regular index. It is specially
configured to perform phrase (and sub-phrase) matches against several
large lists (think of an index with only a 'title' field).

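I'm not sure what the correct name is for the behavior we are after,
but it is something like a combination of shingles and exact matching:
a list entry should match only if the whole entry appears as a
contiguous sequence of words in the query.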

Some examples should explain it well.

Let's say we have the following list:
> one two three
> one two
> two three
> one
> two
> three
> three two
> two one
> one three
> three one

For the query "one two three", we need hits against, and only against:
> one two three
> one two
> two three
> one
> two
> three

For the query "one two", we need hits against, and only against:
> one two
> one
> two

For the query "one three four" (or "four one three"), we need hits
against, and only against:
> one three
> one
> three

For the query "one two sesam three", we need hits against, and only
against:
> one two
> one
> two
> three
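To pin down the semantics, here is a small Python sketch. It is not Solr
code, only a model of the behavior we want, and it reproduces the
examples above. The query is split on whitespace and expanded into every
contiguous word n-gram (shingle) of every length, unigrams included. A
list entry matches only if it is exactly equal to one of those shingles.
The whitespace tokenization and the function names are our own
assumptions for illustration.

```python
def tokens(text):
    # Assumption: plain whitespace tokenization, no stemming or lowercasing.
    return text.split()


def shingles(words):
    """All contiguous word n-grams of the query, for n = 1..len(words)."""
    return {
        " ".join(words[i:j])
        for i in range(len(words))
        for j in range(i + 1, len(words) + 1)
    }


def matching_entries(query, entries):
    """Entries that are exactly equal to one of the query's shingles."""
    grams = shingles(tokens(query))
    return [entry for entry in entries if entry in grams]


ENTRIES = [
    "one two three", "one two", "two three", "one", "two", "three",
    "three two", "two one", "one three", "three one",
]

if __name__ == "__main__":
    for q in ["one two three", "one two", "one three four",
              "four one three", "one two sesam three"]:
        print(q, "->", matching_entries(q, ENTRIES))
```

Note that "two three" does not match "one two sesam three", because
"sesam" breaks the contiguous sequence. In Solr terms this roughly means
each list entry is indexed as one untokenized term, and the query is
expanded into all of its shingles before exact matching.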

We have been testing Solr with the ShingleFilter for this, but
without luck.
I am unsure whether the reason is a misconfiguration in schema.xml or
that the ShingleFilter simply doesn't support this type of behavior.
Attached is our current schema.xml
(it differs from the one in my post to the solr-dev mailing list: the
shingle "fieldType" is now of class "solr.StrField").
Also attached are screenshots of solr/admin/analysis.jsp run against
this configuration.

I'd like to know whether the ShingleFilter is able to do what we want
at all.
 If it is: how should I configure schema.xml?
 If not: are there any other solutions we can incorporate into Solr
that will give us this behavior?

If there is no existing solution, we will probably end up writing our
own by extending the ShingleFilter, and we would gladly contribute it
to the Solr project =)

Thanks for a great product,
Glenn-Erik

Attachment: schema.xml
Description: XML document
