On Wed, 6 Aug 2008 20:21:28 -0400
"Ian Connor" <[EMAIL PROTECTED]> wrote:

> In order to preserve case for the data, but not for indexing, I have
> created two fields. One is type Author that is defined as:
> 
>     <fieldType name="author" class="solr.StrField"
> sortMissingLast="true" omitNorms="true">
>               <analyzer>
>                       <tokenizer class="solr.KeywordTokenizerFactory"/>
>                       <filter class="solr.StandardTokenizerFactory"/>
>                       <filter class="solr.LowerCaseFilterFactory"/>
>               </analyzer>
>     </fieldType>
> 
> and the other is just string:
> 
>     <fieldType name="string" class="solr.StrField"
> sortMissingLast="true" omitNorms="true"/>

Hi Ian,
the analyzers and filters apply to the data that is indexed (and, of course, to
queries on the field), NOT to what is stored. IOW, you don't have to do anything
to have Solr return the data in your fields untouched.

> this is used then for the author lists:
>    <field name="authors" type="author" indexed="true" stored="true"
> omitNorms="true" multiValued="true"/>
>    <field name="all_authors" type="string" indexed="true"
> stored="true" omitNorms="true" multiValued="true"/>
> 
> Is there any other way than to have two fields like this? One for
> searching and one for displaying. 

Of course, you can do this but, for the reason explained above, it isn't needed.
As a matter of fact, as configured you will be indexing and storing both fields.
If you wanted one field for indexing/searching and the other for retrieval, you'd
have to set the indexed and stored properties on each field accordingly.
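
A minimal sketch of that split, reusing your two field names. Which field takes
which role is my assumption, so adjust as needed:

```xml
<!-- searched but never returned: analyzed by the "author" type -->
<field name="authors" type="author" indexed="true" stored="false"
       omitNorms="true" multiValued="true"/>
<!-- returned but never searched: kept verbatim, case preserved -->
<field name="all_authors" type="string" indexed="false" stored="true"
       multiValued="true"/>
```

With this layout both fields need the same values at index time; a
<copyField source="all_authors" dest="authors"/> line in the schema should be
able to fill one from the other. If a single field is enough, keeping
indexed="true" stored="true" on "authors" alone gives you both behaviours.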

> People's names can be very case
> sensitive for display purposes (e.g. McDonald, DeBros) but I don't want
> people to miss results because they search for "lee" instead of "Lee".

your definition of fieldType author:

>     <fieldType name="author" class="solr.StrField"
> sortMissingLast="true" omitNorms="true">
>               <analyzer>
>                       <tokenizer class="solr.KeywordTokenizerFactory"/>
>                       <filter class="solr.StandardTokenizerFactory"/>
>                       <filter class="solr.LowerCaseFilterFactory"/>
>               </analyzer>
>     </fieldType>

should do that: it tells Solr (Lucene?) to tokenize each piece of data indexed
in a field of this type and then to change it to lower case, both at indexing
and at query time.
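
Two caveats, as far as I know: analyzers are only applied to field types based
on solr.TextField (solr.StrField indexes the value verbatim), and
StandardTokenizerFactory is a tokenizer, so it cannot be listed as a <filter>.
A sketch of the type with those two points changed, assuming you want names
split into words and then lower-cased:

```xml
<fieldType name="author" class="solr.TextField"
           sortMissingLast="true" omitNorms="true">
  <analyzer>
    <!-- exactly one tokenizer; swap in solr.KeywordTokenizerFactory
         to keep each whole name as a single token instead -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- lower-case every token; with no type attribute on <analyzer>,
         the same chain is used at index and at query time -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The stored value is not affected by any of this, so "McDonald" still comes back
as "McDonald" in the results.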

> 
> Also, can anyone see danger in using StandardTokenizerFactory for
> people's names?

I don't know, give it a try :) You can use the analysis page in /admin/ to see
how your data would be treated both at index and query time...

good luck,
B

_________________________
{Beto|Norberto|Numard} Meijome

"As far as the laws of mathematics refer to reality, they are not certain, and
as far as they are certain, they do not refer to reality." Albert Einstein

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.
