Hi Lee,

Sorry, I think Erick and I both thought the issue was converting the
synonyms, not removing the other words.

To keep only the words that match a list, use the
KeepWordFilterFactory, with a keep list made up of your synonyms (the
terms you want left in the index, e.g. scenic and words).

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.KeepWordFilterFactory

I'd put the synonym filter first in your configuration for the field,
then the keep words filter factory.
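
A minimal, untested sketch of that ordering in schema.xml follows. The
field type name "text_synonyms_only" and the file name "keepwords.txt"
are assumptions for illustration, not names from this thread, and the
whitespace tokenizer is just one reasonable choice.

```xml
<!-- Hypothetical field type: "text_synonyms_only" and "keepwords.txt" are made-up names. -->
<fieldType name="text_synonyms_only" class="solr.TextField" positionIncrementGap="100">
  <!-- A single analyzer with no type attribute is used for both indexing and querying. -->
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- 1. Map source words to their normalised form, e.g. "pretty => scenic". -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="false"/>
    <!-- 2. Drop every token that is not listed in keepwords.txt. -->
    <filter class="solr.KeepWordFilterFactory" words="keepwords.txt"
            ignoreCase="true"/>
  </analyzer>
</fieldType>
```

Here keepwords.txt would hold the right-hand sides of the mappings, one
per line (scenic, words), so that "this is a pretty line of text" should
leave only scenic and words in the index.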

Tom




On Tue, Dec 7, 2010 at 12:06 PM, lee carroll
<lee.a.carr...@googlemail.com> wrote:
> ok thanks for your response
>
> To summarise the solution then:
>
> To only index synonyms you must only send words that will match the synonym
> list. If words without synonym matches are in the field to be indexed, these
> words will be indexed. No way to avoid this by using schema.xml config.
>
> thanks lee c
>
> On 7 December 2010 13:21, Erick Erickson <erickerick...@gmail.com> wrote:
>
>> OK, the light finally dawns....
>>
>> *If* you have a defined list of words to remove, you can put them in
>> with your stopwords and add a stopword filter to the field in
>> schema.xml.
>>
>> Otherwise, you'll have to do some pre-processing and only send to
>> solr words you want. I'm assuming you have a list of valid words
>> (i.e. the words in your synonyms file) and could pre-filter the input
>> to remove everything else. In that case you don't need a synonyms
>> filter since you're controlling the whole process anyway....
>>
>> Best
>> Erick
>>
>> On Tue, Dec 7, 2010 at 6:07 AM, lee carroll <lee.a.carr...@googlemail.com> wrote:
>>
>> > Hi tom
>> >
>> > This seems to place in the index:
>> > "this is a scenic line of words"
>> > I just want "scenic" and "words" in the index.
>> >
>> > I'm not at a terminal at the moment but will try again to make sure. I'm
>> > sure I'm missing the obvious
>> >
>> > Cheers lee
>> > On 7 Dec 2010 07:40, "Tom Hill" <solr-l...@worldware.com> wrote:
>> > > Hi Lee,
>> > >
>> > >
>> > > On Mon, Dec 6, 2010 at 10:56 PM, lee carroll
>> > > <lee.a.carr...@googlemail.com> wrote:
>> > >> Hi Erik
>> > >
>> > > Nope, Erik is the other one. :-)
>> > >
>> > >> thanks for the reply. I only want the synonyms to be in the index.
>> > >> How can I achieve that? Sorry, probably missing something obvious in the
>> > >> docs
>> > >
>> > > Exactly what he said, use the => syntax. You've already got it. Add the lines
>> > >
>> > > pretty => scenic
>> > > text => words
>> > >
>> > > to synonyms.txt, and it will do what you want.
>> > >
>> > > Tom
>> > >
>> > >> On 7 Dec 2010 01:28, "Erick Erickson" <erickerick...@gmail.com> wrote:
>> > >>> See:
>> > >>>
>> > >>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
>> > >>>
>> > >>> with the => syntax, I think that's what you're looking for
>> > >>>
>> > >>> Best
>> > >>> Erick
>> > >>>
>> > >>> On Mon, Dec 6, 2010 at 6:34 PM, lee carroll <lee.a.carr...@googlemail.com> wrote:
>> > >>>
>> > >>>> Hi, can the following use case be achieved?
>> > >>>>
>> > >>>> value to be analysed at index time "this is a pretty line of text"
>> > >>>>
>> > >>>> synonym list is pretty => scenic , text => words
>> > >>>>
>> > >>>> value placed in the index is "scenic words"
>> > >>>>
>> > >>>> That is to say, only the matching synonyms. Basically I want to produce a
>> > >>>> normalised set of phrases for faceting.
>> > >>>>
>> > >>>> Cheers Lee C
>> > >>>>
>> > >>
>> >
>>
>
