Re: Solr stemming -> preserve original words

2009-01-24 Thread AHMET ARSLAN
You can use your new FilterFactory in your schema.xml --- On Sat, 1/24/09, Thushara Wijeratna wrote: > From: Thushara Wijeratna > Subject: Re: Solr stemming -> preserve original words > To: solr-user@lucene.apache.org, iori...@yahoo.com > Date: Saturday, January 24, 2009, 1:53 AM

Re: Solr stemming -> preserve original words

2009-01-23 Thread Thushara Wijeratna
Chris, Ahmet - thanks for the responses. Ahmet - yes, I want to see "run" as a top term plus the original words that formed that term. The reason is that, due to mis-stemming, the terms can become non-English, e.g. "permanent" would stem to "perm" and "archive" would become "archiv". I need to extract

Re: Solr stemming -> preserve original words

2009-01-23 Thread AHMET ARSLAN
I didn't understand exactly what you want. Say a document has run(10), running(20), runner(2), runners(8), and assume the stemmer reduces all of those words to "run". With the non-stemmed field you will see: running(20) run(10) runners(8) runner(2). With the stemmed field you will see: run(40). You want to see run as a top te
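The arithmetic in this message can be sketched in a few lines of Python. This is only an illustration, not Solr code: `TOY_STEMS` is a hypothetical lookup table standing in for a real stemmer, and it simply encodes the assumption stated above that all four words reduce to "run".

```python
from collections import Counter

# Hypothetical stem table standing in for a real stemmer. As in the message
# above, it ASSUMES all four surface forms reduce to "run".
TOY_STEMS = {"run": "run", "running": "run", "runner": "run", "runners": "run"}

# Term frequencies from the example document.
raw_counts = Counter({"run": 10, "running": 20, "runner": 2, "runners": 8})


def stemmed_counts(counts, stems):
    """Sum the frequency of every surface form under its stem."""
    totals = Counter()
    for word, n in counts.items():
        # Words missing from the table are kept unchanged.
        totals[stems.get(word, word)] += n
    return totals


# Non-stemmed view: each surface form is ranked on its own.
print(raw_counts.most_common())
# Stemmed view: all four forms collapse into one term with the summed count.
print(stemmed_counts(raw_counts, TOY_STEMS).most_common())
```

The first line printed lists running, run, runners, runner with their individual counts; the second contains the single entry run with a count of 40.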

Re: Solr stemming -> preserve original words

2009-01-23 Thread Chris Harris
It seems like what's desired is not so much a stemmer as what you might call a "canonicalizer", which would translate each source word not into its "stem" but into its "most canonical form". Critically, the latter, by definition, is always a legitimate word, e.g. "run". What's more, it's always the
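One rough way to approximate this idea, sketched in Python under stated assumptions: group the original words by stem, then label each group with its most frequent original word, so the label is always a word that actually occurred. The frequency heuristic and the `TOY_STEMS` table are illustrative assumptions, not something proposed in the thread, and the most frequent form is not guaranteed to be the most canonical one (with the counts from the earlier example it would pick "running", not "run").

```python
from collections import Counter, defaultdict

# Hypothetical stem table. "perm" and "archiv" are the mis-stems mentioned in
# the thread; the other entries are made up for illustration.
TOY_STEMS = {
    "permanent": "perm",
    "permanently": "perm",
    "archive": "archiv",
    "archives": "archiv",
}


def canonical_forms(counts, stems):
    """Map each stem to (most frequent original word, total count)."""
    by_stem = defaultdict(Counter)
    for word, n in counts.items():
        by_stem[stems.get(word, word)][word] += n
    return {
        stem: (forms.most_common(1)[0][0], sum(forms.values()))
        for stem, forms in by_stem.items()
    }


# Made-up frequencies, chosen so that no two forms of a stem tie.
counts = {"permanent": 7, "permanently": 2, "archive": 5, "archives": 3}
for stem, (label, total) in canonical_forms(counts, TOY_STEMS).items():
    print(f"{label}({total})  [stem: {stem}]")
```

Here the aggregated counts are still computed per stem, but the term shown to the user is "permanent" or "archive" rather than "perm" or "archiv".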

Re: Solr stemming -> preserve original words

2009-01-23 Thread Thushara Wijeratna
Hi Ahmet, thanks. When I look at the non_stemmed_text field to get the top terms, I will not get the useful feature of aggregating many related words into one (which is what stemming does). For example, if a document has run(10), running(20), runner(2), runners(8) - I would like to see a "top t

Re: Solr stemming -> preserve original words

2009-01-23 Thread AHMET ARSLAN
I think the best way to get non-stemmed top terms is to index the field using a fieldType that does not employ any stem filter. For example: By using copyField you can store two (or more) versions of a field: stemmed and non-stemmed. Just a new field: And a copy field: Schema Brow

Solr stemming -> preserve original words

2009-01-23 Thread Thushara Wijeratna
Hello, is it possible to retrieve the original words once Solr (Porter algorithm) stems them? I need to index a bunch of data, store it in Solr, and get back a list of the most frequent terms out of Solr, and I want to see the non-stemmed version of this data. So basically, I want to enhance this: ht