Way to go, Bess! This is great stuff you're sharing.
I have a question though...

On Jan 16, 2007, at 11:48 AM, Bess Sadler wrote:
> Currently, we are assigning all fields, no matter what language, to
> type string, defined as
>
> <fieldtype name="string" class="solr.StrField" sortMissingLast="true"/>
>
> This does string matching very well, but doesn't do any stop words,
> or stemming, or anything fancy. We are toying with the idea of a
> custom Tibetan indexer to better break up the Tibetan into discrete
> words, but for this particular project (because it mostly has to do
> with proper names, not long passages of text) this hasn't been a
> problem yet, and the above solution seems to be doing the trick.

Why are you assigning all fields to a "string" type?  That indexes  
each field as-is, with no tokenization at all.  How are you using  
that field from the front-end?   I'd think you'd want to copyField  
everything into a "text" field.
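
Roughly what I have in mind is sketched below. This is only an illustration, not your actual schema: the field names name_en and name_bo are made up, and the whitespace tokenizer plus lowercase filter is just a placeholder analysis chain. It will not segment Tibetan properly, since Tibetan does not separate words with spaces.

```xml
<!-- Sketch of schema.xml fragments. Field names are hypothetical. -->

<!-- In the types section: a tokenized type alongside "string". -->
<fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Placeholder analysis: split on whitespace, then lowercase. -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>

<!-- In the fields section: keep the exact-match string fields and
     add one catch-all tokenized field for searching. -->
<field name="name_en" type="string" indexed="true" stored="true"/>
<field name="name_bo" type="string" indexed="true" stored="true"/>
<field name="text" type="text" indexed="true" stored="false" multiValued="true"/>

<!-- Copy each source field into the catch-all field at index time. -->
<copyField source="name_en" dest="text"/>
<copyField source="name_bo" dest="text"/>
```

With something like this, the string fields stay available for sorting and exact matches, while queries from the front-end can go against the tokenized "text" field.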

> Elizabeth (Bess) Sadler
> Head, Technical and Metadata Services
> Digital Scholarship Services
> Box 400129
> Alderman Library
> University of Virginia
> Charlottesville, VA 22904

Just two floors down.... what amazing folks we have on this!

        Erik
