Hi Ahmet, thanks. When I look at the non_stemmed_text field to get the top terms, I lose the useful effect of stemming: aggregating many related words into one.
For example, if a document has run(10), running(20), runner(2), runners(8), I would like to see "run" as a top term. With the non-stemmed solution, I think I will see run, running, runner and runners as separate top terms, so if the term "weather" happens to occur 21 times in the document, it will outrank every individual form of "run" as the top term.

Of course I could go back to the stemmed text field for top terms, where I will see "run", but some of the terms in that field will not be English words (stemmed beyond English, e.g. archiv, perman). So how can I tell whether a term I see in the text field is a "badly stemmed" word or not?

Maybe at this point I could use a dictionary? If a term in the text field is not in the dictionary, I would try to find a prefix match from the non-stemmed field. Or maybe there's a better way?

thanks,
thushara

On Fri, Jan 23, 2009 at 11:37 AM, AHMET ARSLAN <iori...@yahoo.com> wrote:
> I think the best way to get non-stemmed top terms is to index the field using a
> fieldType that does not employ any stem filter. For example:
>
> <fieldType name="non_stemmed_text" class="solr.TextField">
>   <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
> </fieldType>
>
> By using copyField you can store two (or more) versions of a field: stemmed
> and non-stemmed.
>
> Just a new field:
> <field name="text" type="non_stemmed_text" indexed="true" stored="true" />
>
> And a copy field:
> <copyField source="your_original_field" dest="text" />
>
> Schema Browser (Field: text) will give you top terms.
>
> > Is it possible to retrieve the original words once solr (Porter algorithm)
> > stems them?
> > I need to index a bunch of data, store it in solr, and get back a list of
> > most frequent terms out of solr. and i want to see the non-stemmed version
> > of this data.
> >
> > so basically, i want to enhance this:
> > http://localhost:8983/solr/admin/schema.jsp to see the "top terms" in
> > non-stemmed form.
> >
> > thanks,
> > thushara
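
A rough Python sketch of the dictionary-plus-prefix-match idea raised in the reply, working on term counts that have already been pulled out of Solr. Everything in it is hypothetical: the function name label_stemmed_terms, the tiny dictionary, and the counts for archive/permanent are made up for illustration, and none of it is a Solr API. It also assumes that a stem is a plain prefix of its surface forms, which is only approximately true (a short stem can match unrelated words, and some stems are not prefixes of the original word at all).

```python
def label_stemmed_terms(stemmed_terms, surface_counts, dictionary):
    """Map each stemmed term to (display_label, combined_count).

    stemmed_terms  : terms read from the stemmed field
    surface_counts : {word: frequency} read from the non-stemmed field
    dictionary     : set of known English words (hypothetical word list)
    """
    labelled = {}
    for stem in stemmed_terms:
        # Surface forms that begin with the stem. This is an approximation:
        # a prefix match can pull in unrelated words and miss irregular stems.
        forms = {w: c for w, c in surface_counts.items() if w.startswith(stem)}
        total = sum(forms.values())
        if stem in dictionary or not forms:
            # The stem is already a real word (or nothing matched): keep it.
            label = stem
        else:
            # "Badly stemmed" term: show its most frequent surface form instead.
            label = max(forms, key=forms.get)
        labelled[stem] = (label, total)
    return labelled


# Counts for the "run" family and "weather" come from the example in the
# message; the archive/permanent counts are invented for illustration.
surface_counts = {
    "run": 10, "running": 20, "runner": 2, "runners": 8,
    "weather": 21,
    "archive": 3, "archived": 4,
    "permanent": 5,
}
stemmed_terms = ["run", "weather", "archiv", "perman"]
dictionary = {"run", "weather"}  # stand-in for a real word list

labelled = label_stemmed_terms(stemmed_terms, surface_counts, dictionary)

# Rank by combined count, highest first.
for stem, (label, total) in sorted(labelled.items(), key=lambda kv: -kv[1][1]):
    print(label, total)
```

With these counts the "run" family adds up to 40 and ranks above "weather" at 21, while "archiv" and "perman" are displayed under a real word taken from the non-stemmed field.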