Barring the horrible name, I am wondering if folks would be interested
in having something like this as an alternative to the standard
kstemmer.  This is largely based on the SynonymFilter except it builds
tokens using the kstemmer and the original input.  I've created a JIRA
for this to start discussion.  I'd be really interested in
comments/thoughts on this.

https://issues.apache.org/jira/browse/SOLR-3231
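
For anyone skimming the thread, here is a rough plain-Java sketch of the idea, with no Lucene classes involved. It is only an illustration and not the code attached to the JIRA issue: the Token class, the expand method and toyStem are hypothetical names, and toyStem is a crude stand-in for KStem rather than the real algorithm. The position increment of 0 on the stem is an assumption modeled on how stacked synonyms sit at the same position as the original token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

/** Plain-Java sketch (no Lucene dependency) of "keep the original token and add its stem". */
public class KeepOriginalStemSketch {

    /** A token plus its position increment; 0 means "same position as the previous token". */
    public static final class Token {
        public final String text;
        public final int positionIncrement;

        public Token(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return text + "(+" + positionIncrement + ")";
        }
    }

    /**
     * Emits every input token unchanged, followed by its stem at the same
     * position whenever the stem differs from the original.
     */
    public static List<Token> expand(List<String> input, UnaryOperator<String> stemmer) {
        List<Token> out = new ArrayList<>();
        for (String original : input) {
            out.add(new Token(original, 1));      // original advances the position
            String stem = stemmer.apply(original);
            if (!stem.equals(original)) {
                out.add(new Token(stem, 0));      // stem is stacked on the original
            }
        }
        return out;
    }

    /** Hypothetical stand-in for KStem: strips a trailing "ing" or "s". Not the real algorithm. */
    public static String toyStem(String word) {
        if (word.endsWith("ing") && word.length() > 5) {
            return word.substring(0, word.length() - 3);
        }
        if (word.endsWith("s") && word.length() > 3) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    public static void main(String[] args) {
        List<Token> tokens =
            expand(Arrays.asList("walking", "dogs", "bark"), KeepOriginalStemSketch::toyStem);
        // prints [walking(+1), walk(+0), dogs(+1), dog(+0), bark(+1)]
        System.out.println(tokens);
    }
}
```

Skipping the extra token when the stem equals the original avoids indexing the same term twice at one position. Weighting the two forms differently (the payload idea mentioned below) is not covered here.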


On Fri, Mar 9, 2012 at 4:04 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> So I've thrown something together fairly quickly which is based on
> what Ahmet had sent that I believe will preserve the original token as
> well as the stemmed version.  I didn't go as far as weighting them
> differently using the payloads however.  I am not sure how to use the
> preserveOriginal attribute from WordDelimiterFilterFactory.  Can anyone
> provide guidance on that?
>
> On Fri, Mar 9, 2012 at 2:53 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>> Further digging leads me to believe this is not the case.  The Synonym
>> Filter supports this, but the Stemming Filter does not.
>>
>> Ahmet,
>>
>> Would you be willing to provide your filter as well?  I wonder if we
>> can make it aware of the preserveOriginal attribute on
>> WordDelimiterFilterFactory?
>>
>>
>> On Fri, Mar 9, 2012 at 2:27 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>> Ok, so I'm digging through the code and I noticed in
>>> org.apache.lucene.analysis.synonym.SynonymFilter there are mentions of
>>> a keepOrig attribute.  Doing some googling led me to
>>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters which
>>> speaks of an attribute preserveOriginal="1" on
>>> solr.WordDelimiterFilterFactory.  So it seems like I can get the
>>> functionality I am looking for by setting preserveOriginal, is that
>>> correct?
>>>
>>>
>>> On Fri, Mar 9, 2012 at 9:53 AM, Ahmet Arslan <iori...@yahoo.com> wrote:
>>>>> I'd be very interested to see how you
>>>>> did this if it is available. Does
>>>>> this seem like something useful to the community at large?
>>>>
>>>> I PMed it to you. The filter is not a big deal; it is just modified from
>>>> org.apache.lucene.wordnet.SynonymTokenFilter. If requested, I can
>>>> provide it publicly too.
