Barring the horrible name, I am wondering if folks would be interested
in having something like this as an alternative to the standard
kstemmer. This is largely based on the SynonymFilter except it builds
tokens using the kstemmer and the original input. I've created a JIRA
for this to start discu
So I've thrown something together fairly quickly which is based on
what Ahmet had sent that I believe will preserve the original token as
well as the stemmed version. I didn't go as far as weighting them
differently using payloads, however. I am not sure how to use the
preserveOriginal attribu
Further digging leads me to believe this is not the case. The Synonym
Filter supports this, but the Stemming Filter does not.
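For anyone following along, here is a rough sketch of the "keep the original and the stem" idea under discussion. It is plain Java with no Lucene dependency, so it only imitates what a token filter would do: the original token is emitted with a position increment of 1 and the stem is stacked on the same position with an increment of 0, which is how stacked synonyms are represented. The class name PreserveOriginalSketch, the Token class, and toyStem are all made up for illustration; toyStem is a trivial suffix stripper and is not KStem or the Porter stemmer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class PreserveOriginalSketch {

    /** One emitted token: its text and its position increment relative to the previous token. */
    static final class Token {
        final String text;
        final int positionIncrement;

        Token(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return text + "(+" + positionIncrement + ")";
        }
    }

    /** Hypothetical stand-in for a real stemmer: strips a trailing "ing" or "s". */
    static String toyStem(String word) {
        if (word.length() > 5 && word.endsWith("ing")) {
            return word.substring(0, word.length() - 3);
        }
        if (word.length() > 3 && word.endsWith("s")) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    /** Emits each lowercased word, plus its stem at the same position when the stem differs. */
    static List<Token> analyze(String text) {
        List<Token> out = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue; // blank input yields a single empty string from split
            }
            String original = raw.toLowerCase(Locale.ROOT);
            out.add(new Token(original, 1)); // original advances the position
            String stem = toyStem(original);
            if (!stem.equals(original)) {
                out.add(new Token(stem, 0)); // stem is stacked on the same position
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Indexing tokens fast"));
    }
}
```

Skipping the stem when it equals the original avoids indexing the same term twice at one position, which keeps the extra index size limited to words the stemmer actually changes.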
Ahmet,
Would you be willing to provide your filter as well? I wonder if we
can make it aware of the preserveOriginal attribute on
WordDelimiterFilterFactory?
On Fri, Ma
Ok, so I'm digging through the code and I noticed in
org.apache.lucene.analysis.synonym.SynonymFilter there are mentions of
a keepOrig attribute. Doing some googling led me to
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters which
speaks of an attribute preserveOriginal="1" on
solr.Word
> I'd be very interested to see how you
> did this if it is available. Does
> this seem like something useful to the community at large?
I PMed it to you. Filter is not a big deal. Just modified from {@link
org.apache.lucene.wordnet.SynonymTokenFilter}. If requested, I can provide it
publicly t
I'd be very interested to see how you did this if it is available. Does
this seem like something useful to the community at large?
On Thursday, March 8, 2012, Ahmet Arslan wrote:
>> Thanks the KeywordMarkerFilterFactory
>> seems to be what I was looking
>> for. I'm still wondering about keeping
> Thanks the KeywordMarkerFilterFactory
> seems to be what I was looking
> for. I'm still wondering about keeping the unstemmed
> word as a token
> though. While I know that this would increase the
> index size slightly
> I wonder what the negative of doing such a thing would
> be? Just seems
>
Thanks, the KeywordMarkerFilterFactory seems to be what I was looking
for. I'm still wondering about keeping the unstemmed word as a token,
though. While I know that this would increase the index size slightly,
I wonder what the downside of doing such a thing would be? Just seems
less destructive s
> I was previously using the
> PorterStemmer to do stemming and ran into
> an issue where it was overly aggressive with some words or
> abbreviations which I needed to stop. I have recently
> switched to
> KStem and I believe the issue is less, but I was wondering
> still if
> there was a way to s