I once used a spell checker to break up compound words. It was slow, but worked pretty well.
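For illustration only, the general idea of dictionary-driven splitting can be sketched as below. This is not the code I used back then: the function name `split_compound`, the `min_part` cutoff, and the tiny vocabulary are all assumptions made up for this sketch, and a real spell checker dictionary would be far larger.

```python
from functools import lru_cache


def split_compound(word, vocabulary, min_part=3):
    """Split `word` into known vocabulary words; return [word] if no split exists.

    Hypothetical helper: greedy longest-prefix match with backtracking.
    """
    vocab = frozenset(w.lower() for w in vocabulary)

    @lru_cache(maxsize=None)
    def segment(rest):
        # An empty remainder means everything before it matched.
        if not rest:
            return ()
        # Try the longest candidate prefix first, never shorter than min_part.
        for end in range(len(rest), min_part - 1, -1):
            head = rest[:end]
            if head in vocab:
                tail = segment(rest[end:])
                if tail is not None:
                    return (head,) + tail
        # No prefix leads to a complete segmentation.
        return None

    parts = segment(word.lower())
    return list(parts) if parts else [word]


if __name__ == "__main__":
    vocab = {"great", "smart", "phone"}  # stand-in for a real word list
    print(split_compound("smartphone", vocab))  # ['smart', 'phone']
```

The output of something like this could be used offline to generate synonym entries of the `smartphone => smart phone` kind that Erick describes below, rather than running it at query time, which is where the slowness hurts.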
wunder

On Mar 1, 2012, at 5:53 AM, Erick Erickson wrote:

> Right, there's nothing in Solr that I know of that'll help here. How would
> a tokenizer understand that "smartphone" should be "smart" "phone"?
> There's no general solution for this issue.
>
> You can do domain-specific solutions with synonyms for instance, or
> some other word list that contains terms you're interested in, entries
> like smartphone => smart phone
> but that has the obvious drawback of requiring that you know all the
> terms that might be smashed together.
>
> You *might* be able to do something with shingles, but I'm a little unclear
> on how.
>
> Best
> Erick
>
> On Tue, Feb 28, 2012 at 4:05 PM, PeterKerk <vettepa...@hotmail.com> wrote:
>> I have the following in my schema.xml
>>
>> <field name="title" type="text_ws" indexed="true" stored="true"/>
>> <field name="title_search" type="text" indexed="true" stored="true"/>
>>
>>
>> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>> <analyzer type="index">
>> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>> <filter class="solr.StopFilterFactory" ignoreCase="true"
>> words="stopwords_dutch.txt"/>
>> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>> generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>> catenateAll="0" splitOnCaseChange="1"/>
>> <filter class="solr.LowerCaseFilterFactory"/>
>> <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>> </analyzer>
>> <analyzer type="query">
>> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>> ignoreCase="true" expand="true"/>
>> <filter class="solr.StopFilterFactory" ignoreCase="true"
>> words="stopwords_dutch.txt"/>
>> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>> generateNumberParts="1" catenateWords="0" catenateNumbers="0"
>> catenateAll="0" splitOnCaseChange="1"/>
>> <filter class="solr.LowerCaseFilterFactory"/>
>> <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>> </analyzer>
>> </fieldType>
>>
>>
>> I want to search on field "title".
>> Now my field title holds the value "great smartphone".
>> If I search on "smartphone" the item is found. But I want the item also to
>> be found on "great" or "phone" it doesnt work.
>> I have been playing around with the tokenizer test function, but have failed
>> to find the definition for the "text" fieldtype I need.
>> Help? :)
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Need-tokenization-that-finds-part-of-stringvalue-tp3785366p3785366.html
>> Sent from the Solr - User mailing list archive at Nabble.com.

--
Walter Underwood
wun...@wunderwood.org