Hi Chris,

Yes, you've identified the problem :-)
I've tried using the keyword tokeniser, but that seems to merge all
comma-separated lists of synonyms into one.

The pattern tokeniser would seem to be a candidate, but can you pass the
pattern attribute to the tokeniser attribute in the synonym filter?

Example synonym lines which are problematic:

termA1,termA2,termA3, phrase termA, termA4 => normalisedTermA
termB1,termB2,termB3 => normalisedTermB

When the synonym filter uses the keyword tokeniser, only "phrase termA"
ends up being matched as a synonym :-)

lee

On 6 February 2011 12:58, lee carroll <lee.a.carr...@googlemail.com> wrote:

> Hi Bill,
>
> quoting in the synonyms file did not produce the correct expansion :-(
>
> Looking at Chris's comments now
>
> cheers
>
> lee
>
>
> On 5 February 2011 23:38, Bill Bell <billnb...@gmail.com> wrote:
>
>> OK that makes sense.
>>
>> If you double quote the synonyms file will that help for white space?
>>
>> Bill
>>
>>
>> On 2/5/11 4:37 PM, "Chris Hostetter" <hossman_luc...@fucit.org> wrote:
>>
>> >
>> >: You need to switch the order. Do synonyms and expansion first, then
>> >: shingles..
>> >
>> >except then he would be building shingles out of all the permutations of
>> >"words" in his synonyms -- including the multi-word synonyms. i don't
>> >*think* that's what he wants based on his example (but i may be wrong)
>> >
>> >: Have you tried using analysis.jsp ?
>> >
>> >he already mentioned he has, in his original mail, and that's how he can
>> >tell it's not working.
>> >
>> >lee: based on your followup post about seeing problems in the synonyms
>> >output, i suspect the problem you are having is with how the
>> >synonymfilter "parses" the synonyms file -- by default it assumes it
>> >should split on certain characters to create multi-word synonyms -- but
>> >in your case the tokens you are feeding synonym filter (the output of
>> >your shingle filter) really do have whitespace in them
>> >
>> >there is a "tokenizerFactory" option that Koji added a while back to the
>> >SynonymFilterFactory that lets you specify the classname of a
>> >TokenizerFactory to use when parsing the synonym rule -- that may be what
>> >you need to get your synonyms with spaces in them (so they work properly
>> >with your shingles)
>> >
>> >(assuming of course that i really understand your problem)
>> >
>> >
>> >-Hoss
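
For reference, a minimal sketch of what the "tokenizerFactory" suggestion above might look like in schema.xml. The field type name, the whitespace tokenizer, the shingle size and the synonyms file name are assumptions made for illustration; none of them are given in this thread. Whether further tokenizer attributes (such as a pattern for the pattern tokeniser) can be passed through the synonym filter is not settled here, so the sketch sticks to the keyword tokenizer.

```xml
<!-- Hypothetical field type: name, tokenizer, shingle size and file name are placeholders. -->
<fieldType name="text_shingle_syn" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Shingles are built first, so tokens reaching the synonym filter may contain spaces. -->
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="true"/>
    <!-- tokenizerFactory controls how each entry in the synonyms file is tokenized
         when the rules are parsed; the keyword tokenizer keeps an entry as one token. -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="false"
            tokenizerFactory="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
```

With a keyword tokenizer, any space left after a comma in the synonyms file may end up as part of the term, depending on the Solr version, so it is probably safer to write the entries without spaces around the commas (for example "termA3,phrase termA,termA4").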