That's what I thought, but my experiments show otherwise. Here are the details:

I have a number of fields of type "text" (the default type as packaged).

I have a type 'textCorrectlySpelled' that uses KeepWordFilterFactory in its
index-time analysis, with a file of terms known to be correctly spelled.

I have a type 'textDictionary' that has no index-time analysis.
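For illustration only, here is a minimal Python sketch of how the two types
might treat the same text at index time. The tokenizer (whitespace split plus
lowercasing) and the words in KEEP_WORDS are assumptions. The filtering step
mirrors what KeepWordFilterFactory does: it keeps only tokens that appear in
its word file. "No index-time analysis" is modeled as tokenizing with no
filters, which is also an assumption.

```python
# Hypothetical keep-word list standing in for the file of correctly spelled terms.
KEEP_WORDS = {"electrical", "contractor", "wiring"}


def tokenize(text: str) -> list[str]:
    """Assumed tokenizer: split on whitespace and lowercase each token."""
    return [tok.lower() for tok in text.split()]


def analyze_text_correctly_spelled(text: str) -> list[str]:
    """Sketch of 'textCorrectlySpelled': keep only tokens in the keep-word
    list, which is the behavior of KeepWordFilterFactory."""
    return [tok for tok in tokenize(text) if tok in KEEP_WORDS]


def analyze_text_dictionary(text: str) -> list[str]:
    """Sketch of 'textDictionary': tokenize only, no filtering."""
    return tokenize(text)


if __name__ == "__main__":
    sample = "Electical contractor wiring"
    print(analyze_text_correctly_spelled(sample))  # ['contractor', 'wiring']
    print(analyze_text_dictionary(sample))  # ['electical', 'contractor', 'wiring']
```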

I have the following fields:
<field name="CorrectlySpelledTerms" type="textCorrectlySpelled" indexed="false" stored="false" multiValued="true"/>
<field name="TermsDictionary" type="textDictionary" indexed="true" stored="false" multiValued="true"/>

I want 'TermsDictionary' to contain only the correctly spelled terms from some
fields, plus all terms from a couple of other fields (CompanyName and
ContactName) as is. I use several copyField directives as follows:

<copyField source="Field1" dest="CorrectlySpelledTerms"/>
<copyField source="Field2" dest="CorrectlySpelledTerms"/>
<copyField source="Field3" dest="CorrectlySpelledTerms"/>

<copyField source="Name" dest="TermsDictionary"/>
<copyField source="Contact" dest="TermsDictionary"/>
<copyField source="CorrectlySpelledTerms" dest="TermsDictionary"/>
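The copyField directives above can be modeled as a routing step that runs
before any field is analyzed. This is a minimal Python sketch, not Solr code:
apply_copy_fields and the sample document are hypothetical. It assumes Solr's
documented behavior that copying is done from the incoming document at the
source level and that no copy feeds into another copy, so each directive
copies raw, unanalyzed values.

```python
# The copyField directives from the schema, as (source, dest) pairs.
COPY_FIELDS = [
    ("Field1", "CorrectlySpelledTerms"),
    ("Field2", "CorrectlySpelledTerms"),
    ("Field3", "CorrectlySpelledTerms"),
    ("Name", "TermsDictionary"),
    ("Contact", "TermsDictionary"),
    ("CorrectlySpelledTerms", "TermsDictionary"),
]


def apply_copy_fields(doc: dict[str, list[str]],
                      copy_fields: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Route raw (unanalyzed) values from each source field to its dest field.

    Sources are looked up only in the original input document, so a field
    that is itself a copyField destination passes nothing on (no chaining).
    """
    routed = {name: list(values) for name, values in doc.items()}
    for source, dest in copy_fields:
        routed.setdefault(dest, []).extend(doc.get(source, []))
    return routed


if __name__ == "__main__":
    # Hypothetical input document; real documents are not shown in the thread.
    doc = {
        "Field1": ["electical wiring"],
        "Name": ["Acme Electric"],
        "Contact": ["Jane Smith"],
    }
    routed = apply_copy_fields(doc, COPY_FIELDS)
    print(routed["CorrectlySpelledTerms"])  # ['electical wiring']
    print(routed["TermsDictionary"])  # ['Acme Electric', 'Jane Smith']
```

Under these assumptions, the raw (still misspelled) text reaches
CorrectlySpelledTerms before any keep-word filtering, and the last directive
adds nothing to TermsDictionary because CorrectlySpelledTerms is not a field
in the incoming document.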

If I query 'Field1' for a term I know is misspelled ("electical"), I get
results. If I query 'TermsDictionary' for the same term, I get none.

These results suggest that 'TermsDictionary' contains only the terms left over
after the text analysis on 'CorrectlySpelledTerms' stripped out the
misspellings.

To put it another way, you can probably see what I'm after: a spellchecker
source that contains only correctly spelled terms plus proper names. Should I
have gone about this differently?

-----Original Message-----
From: Stephen Duncan Jr [mailto:stephen.dun...@gmail.com] 
Sent: Monday, August 22, 2011 9:30 AM
To: solr-user@lucene.apache.org
Subject: Re: Text Analysis and copyField

On Mon, Aug 22, 2011 at 9:25 AM, Herman Kiefus <herm...@angieslist.com> wrote:
> Is my thinking correct?
>
> I have a field 'F1' of type 'T1' whose index-time analysis employs the
> StopFilterFactory.
>
> I also have a field 'F2' of type 'T2' whose index-time analysis does NOT
> employ the StopFilterFactory.
>
> There is a copyField directive source="F1" dest="F2"
>
> F2 will not contain any stop words because they were filtered out as F1 was 
> populated.
>

No, F2 will contain stop words. copyField does not pass input through a chain;
it sends the original content to each field, so the analysis of each field is
completely independent.
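A minimal Python model of the F1/F2 example makes the answer concrete: the raw
F1 text is copied first, and then each field runs its own analyzer. The
tokenizer, the STOP_WORDS set, and the function names are hypothetical
stand-ins for the T1 and T2 analyzer chains, not Solr APIs.

```python
import re

# Hypothetical stop-word list standing in for T1's StopFilterFactory config.
STOP_WORDS = {"a", "an", "and", "of", "the"}


def tokenize(text: str) -> list[str]:
    """Assumed tokenizer: lowercase, then extract runs of word characters."""
    return re.findall(r"\w+", text.lower())


def analyze_t1(text: str) -> list[str]:
    """Type T1: tokenize, then drop stop words (the StopFilterFactory step)."""
    return [tok for tok in tokenize(text) if tok not in STOP_WORDS]


def analyze_t2(text: str) -> list[str]:
    """Type T2: tokenize only; no stop filter."""
    return tokenize(text)


def index_document(raw_f1: str) -> dict[str, list[str]]:
    """Model copyField source="F1" dest="F2": F2 receives the raw F1 text,
    and each field then runs its own analyzer independently."""
    raw_fields = {"F1": raw_f1, "F2": raw_f1}  # the copy happens before analysis
    analyzers = {"F1": analyze_t1, "F2": analyze_t2}
    return {name: analyzers[name](value) for name, value in raw_fields.items()}


if __name__ == "__main__":
    indexed = index_document("The State of the Art")
    print(indexed["F1"])  # ['state', 'art']
    print(indexed["F2"])  # ['the', 'state', 'of', 'the', 'art']
```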

--
Stephen Duncan Jr
www.stephenduncanjr.com
