Hi Chris,

Yes you've identified the problem :-)

I've tried using the keyword tokeniser, but that seems to merge all the
comma-separated lists of synonyms into one.

The pattern tokeniser would seem to be a candidate, but can you pass the
pattern attribute through to the tokeniser named in the synonym filter's
tokeniser attribute?

Example synonym lines which are problematic:

termA1,termA2,termA3, phrase termA, termA4 => normalisedTermA
termB1,termB2,termB3 => normalisedTermB

When the synonym filter uses the keyword tokeniser, only "phrase termA"
ends up being matched as a synonym :-)
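
To make the difference concrete, here is a rough sketch of how I understand the rule parsing. It is only a simplified model in Python, not the actual SynonymFilterFactory code, and the function names (parse_rule, whitespace_tokenize, keyword_tokenize) are made up for illustration. It assumes a rule is split on "=>" and on commas, and that each term is then handed to whichever tokeniser is configured.

```python
def parse_rule(line, tokenize):
    """Split one synonym rule into (inputs, outputs).

    Each side is a list of terms, and each term is the list of tokens
    produced by the supplied tokenize function (hypothetical helper).
    """
    lhs, _, rhs = line.partition("=>")

    def side(text):
        # Split on commas, drop surrounding whitespace, skip empty entries.
        return [tokenize(term.strip()) for term in text.split(",") if term.strip()]

    return side(lhs), side(rhs)


def whitespace_tokenize(term):
    # Models splitting on whitespace: a phrase becomes several tokens.
    return term.split()


def keyword_tokenize(term):
    # Models a keyword tokeniser: the whole term stays a single token.
    return [term]


rule = "termA1,termA2,termA3, phrase termA, termA4 => normalisedTermA"

ws_inputs, ws_outputs = parse_rule(rule, whitespace_tokenize)
kw_inputs, kw_outputs = parse_rule(rule, keyword_tokenize)

print(ws_inputs[3])  # ['phrase', 'termA']
print(kw_inputs[3])  # ['phrase termA']
```

With whitespace splitting, the phrase becomes a two-token sequence, which can never equal a single shingle token that contains a space. With keyword-style tokenising it stays one token, which is the form the shingle output has. This does not by itself explain why the single-word terms stop matching.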


lee


On 6 February 2011 12:58, lee carroll <lee.a.carr...@googlemail.com> wrote:

> Hi Bill,
>
> quoting in the synonyms file did not produce the correct expansion :-(
>
> Looking at Chris's comments now
>
> cheers
>
> lee
>
>
> On 5 February 2011 23:38, Bill Bell <billnb...@gmail.com> wrote:
>
>> OK that makes sense.
>>
>> If you double quote the synonyms file will that help for white space?
>>
>> Bill
>>
>>
>> On 2/5/11 4:37 PM, "Chris Hostetter" <hossman_luc...@fucit.org> wrote:
>>
>> >
>> >: You need to switch the order. Do synonyms and expansion first, then
>> >: shingles..
>> >
>> >except then he would be building shingles out of all the permutations of
>> >"words" in his synonyms -- including the multi-word synonyms.  i don't
>> >*think* that's what he wants based on his example (but i may be wrong)
>> >
>> >: Have you tried using analysis.jsp ?
>> >
>> >he already mentioned he has, in his original mail, and that's how he can
>> >tell it's not working.
>> >
>> >lee: based on your followup post about seeing problems in the synonyms
>> >output, i suspect the problem you are having is with how the
>> >synonymfilter "parses" the synonyms file -- by default it assumes it
>> >should split on certain characters to create multi-word synonyms -- but
>> >in your case the tokens you are feeding synonym filter (the output of
>> >your shingle filter) really do have whitespace in them
>> >
>> >there is a "tokenizerFactory" option that Koji added a while back to the
>> >SynonymFilterFactory that lets you specify the classname of a
>> >TokenizerFactory to use when parsing the synonym rule -- that may be what
>> >you need to get your synonyms with spaces in them (so they work properly
>> >with your shingles)
>> >
>> >(assuming of course that i really understand your problem)
>> >
>> >
>> >-Hoss
>>
>>
>>
>
