Hi, I've been using Solr for some time in the simplest possible way (as a backend to a search engine for English documents) and I've been really happy with it. However, now I need to do something a bit non-standard, and unfortunately I am desperately stuck. To make things more complicated, I am using Solr in a Django application through Haystack [http://haystacksearch.org], but I am pretty sure that there's no funny business going on between Haystack and Solr.
So, we have a database of movies and series, and as the data comes from many sources of varying reliability, we'd like to be able to do fuzzy string matching on the titles of episodes (the default matching mechanisms operate at the word level, which is not good enough for short strings like titles). I have used n-gram approximate matching in the past, and I was very happy to find that Lucene (and Solr) supports something like this out of the box.

I assumed that I need a special field type for this, so I added the following field type to my schema.xml:

    <fieldType name="trigrams" stored="true" class="solr.StrField">
      <analyzer type="index">
        <tokenizer class="solr.analysis.NGramTokenizerFactory" minGramSize="3" maxGramSize="5" />
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

and changed the appropriate field in the schema to:

    <field name="title" type="trigrams" indexed="true" stored="true" multiValued="false" />

However, this is not working as I expected. The query analysis looks correct, but I don't get any results, which makes me believe that something goes wrong at index time (i.e. the title is indexed like a default string field instead of a trigram field).

Moreover, I would like to be able to do something more. I'd like to lowercase the string, remove all punctuation marks and spaces, remove English stopwords and THEN change the string into trigrams. However, the filters are applied only after the string has been tokenized...

Could you please suggest a solution to this problem?

Thanks in advance for your answers.

--
Ryszard Szopa
--
http://gryziemy.net
http://robimy.net
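P.S. To make the intended processing concrete, here is a rough Python sketch of what I'd like the analysis chain to do to a title, outside of Solr. It is only an illustration: the function names and the tiny stopword list are made up for this example, and I am assuming the stopwords have to be dropped while the word boundaries still exist, i.e. before the spaces are removed. The Jaccard overlap at the end is just one simple way of comparing two trigram sets, not what Solr computes.

```python
import re

# Tiny illustrative stopword list (an assumption; a real setup would use a fuller one).
STOPWORDS = {"a", "an", "and", "of", "the", "in", "on", "to"}


def normalize(title):
    """Lowercase, strip punctuation, drop stopwords, then remove the spaces."""
    words = re.sub(r"[^a-z0-9\s]", "", title.lower()).split()
    return "".join(w for w in words if w not in STOPWORDS)


def trigrams(title, n=3):
    """Set of character n-grams of the normalized title."""
    s = normalize(title)
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def similarity(a, b):
    """Jaccard overlap of the two trigram sets, between 0.0 and 1.0."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)
```

With this, "The Lord of the Rings!" and "lord of rings" normalize to the same string, so they end up with identical trigram sets, while a single-character typo only lowers the overlap a little instead of breaking the match.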