Way to go, Bess! This is great stuff you're sharing.
I have a question though...
On Jan 16, 2007, at 11:48 AM, Bess Sadler wrote:
> Currently, we are assigning all fields, no matter what language, to
> type string, defined as
>
>   <fieldtype name="string" class="solr.StrField"
>              sortMissingLast="true"/>
>
> This does string matching very well, but doesn't do any stop words,
> or stemming, or anything fancy. We are toying with the idea of a
> custom Tibetan indexer to better break up the Tibetan into discrete
> words, but for this particular project (because it mostly has to do
> with proper names, not long passages of text) this hasn't been a
> problem yet, and the above solution seems to be doing the trick.
Why are you assigning all fields to a "string" type? That indexes
each field as-is, with no tokenization at all. How are you using
that field from the front-end? I'd think you'd want to copyField
everything into a "text" field.
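
To make that concrete, here is a rough sketch of what I mean in
schema.xml. The field names ("name", "text") are made up for
illustration, and the whitespace-plus-lowercase analyzer is only an
assumption standing in for whatever analysis you end up wanting; it
will not segment Tibetan into words, so a custom tokenizer would
slot in there later.

    <!-- tokenized type for searching; analyzer chain is a placeholder -->
    <fieldtype name="text" class="solr.TextField"
               positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldtype>

    <!-- hypothetical field: kept as an exact string for sorting/display -->
    <field name="name" type="string" indexed="true" stored="true"/>

    <!-- hypothetical catch-all field that searches run against -->
    <field name="text" type="text" indexed="true" stored="false"
           multiValued="true"/>

    <!-- copy the raw value into the tokenized field at index time -->
    <copyField source="name" dest="text"/>

With something like that, the string field still gives you exact
matching and sorting, while queries against "text" can match
individual words inside a value.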
> Elizabeth (Bess) Sadler
> Head, Technical and Metadata Services
> Digital Scholarship Services
> Box 400129
> Alderman Library
> University of Virginia
> Charlottesville, VA 22904
Just two floors down.... what amazing folks we have on this!
Erik