Check out http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.EdgeNGramFilterFactory

I don't know whether it works with phrases, though.
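
For illustration, here is a minimal sketch of a field type that applies solr.EdgeNGramFilterFactory at index time only. The field type name "text_prefix" and the gram sizes are assumptions made up for this example, not taken from your schema; you would have to point common_names at a type like this and reindex.

```xml
<!-- Hypothetical field type: the name "text_prefix" and the gram sizes are illustrative only -->
<fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
  <!-- Index time: store the leading prefixes of each word (mankind becomes m, ma, man, mank, ...) -->
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <!-- Query time: no n-grams, so a query word such as "man" is looked up as one whole term -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With something like this, "man" should match "mankind" (which has the prefix token "man") but not "womankind" (whose prefixes are w, wo, wom, ...). The positionIncrementGap is there on the assumption that you want to keep matches within a single value of the multiValued field. Whether the "companion man"~10 proximity query still behaves as expected depends on the positions the filter assigns to the grams, so it is worth checking on the analysis page in the Solr admin UI first.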
--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 31 March 2011, at 16:49, Brian Lamb wrote:

> No, I don't really want to break down the words into subwords. In the
> example I provided, I would not want "kind" to match either record because
> it is not at the beginning of the word even though "kind" appears in both
> records as part of a word.
>
> On Wed, Mar 30, 2011 at 4:42 PM, lboutros <boutr...@gmail.com> wrote:
>
>> Do you want to tokenize subwords based on dictionaries? A bit like
>> disagglutination of German words?
>>
>> If so, something like this could help: DictionaryCompoundWordTokenFilter
>>
>> http://search.lucidimagination.com/search/document/CDRG_ch05_5.8.8
>>
>> Ludovic
>>
>> http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/analysis/compound/DictionaryCompoundWordTokenFilter.html
>>
>> 2011/3/30 Brian Lamb [via Lucene] <
>> ml-node+2754668-300063934-383...@n3.nabble.com>
>>
>>> Hi all,
>>>
>>> I have a field set up like this:
>>>
>>> <field name="common_names" multiValued="true" type="text" indexed="true"
>>> stored="true" required="false" />
>>>
>>> And I have some records:
>>>
>>> RECORD1
>>> <arr name="common_names">
>>> <str>companion to mankind</str>
>>> <str>pooch</str>
>>> </arr>
>>>
>>> RECORD2
>>> <arr name="common_names">
>>> <str>companion to womankind</str>
>>> <str>man's worst enemy</str>
>>> </arr>
>>>
>>> I would like to write a query that will match the beginning of a word
>>> within the term. Here is the query I would use as it exists now:
>>>
>>> http://localhost:8983/solr/search/?q=*:*&fq={!q.op=AND%20df=common_names}"companion man"~10
>>>
>>> In the above example, I would want to return only RECORD1.
>>>
>>> The query as it exists right now is designed to only match records where
>>> both words are present in the same term. So if I changed man to mankind in
>>> the query, RECORD1 will be returned.
>>>
>>> Even though the phrases companion and man exist in the same term in
>>> RECORD2, I do not want RECORD2 to be returned because 'man' is not at the
>>> beginning of the word.
>>>
>>> How can I achieve this?
>>>
>>> Thanks,
>>>
>>> Brian Lamb
>>
>> -----
>> Jouve
>> France.
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Matching-the-beginning-of-a-word-within-a-term-tp2754668p2755561.html
>> Sent from the Solr - User mailing list archive at Nabble.com.