You can use your new FilterFactory in your schema.xml.
--- On Sat, 1/24/09, Thushara Wijeratna wrote:
> From: Thushara Wijeratna
> Subject: Re: Solr stemming -> preserve original words
> To: solr-user@lucene.apache.org, iori...@yahoo.com
> Date: Saturday, January 24, 2009, 1:53 AM
Chris, Ahmet - thanks for the responses.
Ahmet - yes, I want to see "run" as a top term, plus the original words that
formed that term.
The reason is that, due to mis-stemming, the terms could become non-English.
For example, "permanent" would stem to "perman", and "archive" would become "archiv".
I need to extract
I don't understand exactly what you want.
If a document has run(10), running(20), runner(2), runners(8)
(assuming the stemmer reduces all of those words to "run"),
with the non-stemmed field you will see:
running(20)
run(10)
runners(8)
runner(2)
With the stemmed field you will see:
run(40)
You want to see run as a top te
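The counting Ahmet describes can be sketched in Java: group the raw term counts by stem, sum them, and keep the original words alongside each stem, which gives the "run plus the words that formed it" view Thushara asks for. This is a minimal, self-contained sketch, not Solr's or Lucene's API; the TOY_STEMS lookup table is a hypothetical stand-in for a real stemmer such as Porter, and the class and method names are illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class StemmedTopTerms {

    // Hypothetical stand-in for a real stemmer (e.g. Porter): a fixed lookup table.
    // Words not in the table are treated as their own stem.
    static final Map<String, String> TOY_STEMS = Map.of(
            "running", "run",
            "runner", "run",
            "runners", "run");

    static String stem(String word) {
        return TOY_STEMS.getOrDefault(word, word);
    }

    // Groups raw term counts by stem, keeping each original word and its count.
    static Map<String, Map<String, Integer>> groupByStem(Map<String, Integer> rawCounts) {
        Map<String, Map<String, Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> e : rawCounts.entrySet()) {
            groups.computeIfAbsent(stem(e.getKey()), k -> new TreeMap<>())
                  .merge(e.getKey(), e.getValue(), Integer::sum);
        }
        return groups;
    }

    // Frequency of a stem = sum of the frequencies of the words that reduce to it.
    static int total(Map<String, Integer> originals) {
        return originals.values().stream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        Map<String, Integer> raw = new LinkedHashMap<>();
        raw.put("run", 10);
        raw.put("running", 20);
        raw.put("runner", 2);
        raw.put("runners", 8);

        for (Map.Entry<String, Map<String, Integer>> g : groupByStem(raw).entrySet()) {
            System.out.println(g.getKey() + "(" + total(g.getValue()) + ") <- " + g.getValue());
        }
    }
}
```

With the counts from Ahmet's example, main prints `run(40) <- {run=10, runner=2, runners=8, running=20}`: the stemmed total and the non-stemmed breakdown in one line.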
It seems like what's desired is not so much a stemmer as what you might call
a "canonicalizer", which would translate each source word not into its
"stem" but into its "most canonical form". Critically, the latter, by
definition, is always a legitimate word, e.g. "run". What's more, it's
always the
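The "canonicalizer" idea can be approximated after stemming by mapping each stem back to one of the words that actually produced it, so the displayed term is always a real word. Because the message above is cut off, the selection rule used here (the most frequent original word per stem, ties going to the lexicographically smaller word) is an assumption, not necessarily the criterion intended. TOY_STEMS is again a hypothetical stand-in for a real stemmer, and the sample counts in main are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class Canonicalizer {

    // Hypothetical stemmer stand-in (a lookup table, not a real Porter implementation).
    static final Map<String, String> TOY_STEMS = Map.of(
            "permanent", "perman",
            "permanently", "perman",
            "archive", "archiv",
            "archived", "archiv",
            "archives", "archiv");

    static String stem(String w) {
        return TOY_STEMS.getOrDefault(w, w);
    }

    // For each stem, pick the original word seen most often as its canonical form.
    // Ties go to the lexicographically smaller word (an assumed rule).
    static Map<String, String> canonicalForms(Map<String, Integer> rawCounts) {
        Map<String, String> best = new HashMap<>();
        Map<String, Integer> bestCount = new HashMap<>();
        for (Map.Entry<String, Integer> e : rawCounts.entrySet()) {
            String word = e.getKey();
            String s = stem(word);
            int count = e.getValue();
            String current = best.get(s);
            if (current == null
                    || count > bestCount.get(s)
                    || (count == bestCount.get(s) && word.compareTo(current) < 0)) {
                best.put(s, word);
                bestCount.put(s, count);
            }
        }
        return best;
    }

    // Shows a stemmed top term in its canonical form, falling back to the stem itself.
    static String display(String stem, Map<String, String> canon) {
        return canon.getOrDefault(stem, stem);
    }

    public static void main(String[] args) {
        Map<String, Integer> raw = new HashMap<>();
        raw.put("permanent", 12);
        raw.put("permanently", 3);
        raw.put("archive", 5);
        raw.put("archived", 7);
        raw.put("archives", 2);
        Map<String, String> canon = canonicalForms(raw);
        System.out.println(display("perman", canon)); // permanent
        System.out.println(display("archiv", canon)); // archived
    }
}
```

The design choice is to keep stemming for aggregation and only swap the label at display time, so the non-English stems never reach the user.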
Hi Ahmet,
thanks. When I look at the non_stemmed_text field to get the top terms, I
lose the useful feature of aggregating many related words into one (which
stemming provides).
For example, if a document has run(10), running(20), runner(2), runners(8), I
would like to see a "top t
I think the best way to get non-stemmed top terms is to index the field using a
fieldType that does not employ any stem filter. For example:
By using copyField you can store two (or more) versions of a field: one stemmed
and one non-stemmed.
Just a new field:
And a copy field:
Schema Brow
Hello,
Is it possible to retrieve the original words once Solr (Porter algorithm)
stems them?
I need to index a bunch of data, store it in Solr, and get back a list of
the most frequent terms from Solr, and I want to see the non-stemmed version
of those terms.
So basically, I want to enhance this:
ht