Re: Text Analysis and copyField

Erick Erickson Mon, 22 Aug 2011 10:16:33 -0700

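I suspect that the terms going into TermsDictionary come from fields other than
CorrectlySpelledTerms.

In other words, I don't think anything is getting into TermsDictionary from
CorrectlySpelledTerms at all: copyField directives are not chained, so content
that arrives in CorrectlySpelledTerms via copyField is not copied on to
TermsDictionary.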
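To make this concrete, here is a toy Python model of how copyField behaves for the schema Herman describes. It assumes, as the Solr Reference Guide states, that copying starts from the values sent in the incoming document and that copy fields cannot be chained. The keep-word list, the sample document and the analyzer stand-ins are hypothetical. This is a sketch under those assumptions, not Solr's code.

```python
# Toy model of copyField semantics for Herman's schema (an illustration
# under stated assumptions, not Solr's implementation).

# Hypothetical keep-word list standing in for the KeepWordFilterFactory file.
KEEP_WORDS = {"electrical", "wiring", "plumbing"}


def keep_words(text):
    # 'textCorrectlySpelled': lowercase, split on whitespace, keep known words.
    return [t for t in text.lower().split() if t in KEEP_WORDS]


def whitespace_only(text):
    # 'textDictionary': assumed here to just lowercase and split on whitespace.
    return text.lower().split()


ANALYZERS = {"CorrectlySpelledTerms": keep_words, "TermsDictionary": whitespace_only}

# (source, dest) pairs mirroring the copyField directives in the thread.
COPY_FIELDS = [
    ("Field1", "CorrectlySpelledTerms"),
    ("Field2", "CorrectlySpelledTerms"),
    ("Field3", "CorrectlySpelledTerms"),
    ("Name", "TermsDictionary"),
    ("Contact", "TermsDictionary"),
    ("CorrectlySpelledTerms", "TermsDictionary"),
]


def index_document(doc):
    copied = {}
    for source, dest in COPY_FIELDS:
        # Sources are read from the incoming document only, so a value that
        # reached CorrectlySpelledTerms via copyField is never copied onward.
        for value in doc.get(source, []):
            copied.setdefault(dest, []).append(value)
    # Each destination analyzes the raw copied text with its own analyzer.
    return {
        dest: [tok for value in values for tok in ANALYZERS[dest](value)]
        for dest, values in copied.items()
    }


doc = {"Field1": ["Electical wiring"], "Name": ["Acme Plumbing"]}
terms = index_document(doc)
print(terms)
# {'CorrectlySpelledTerms': ['wiring'], 'TermsDictionary': ['acme', 'plumbing']}
```

Under this model TermsDictionary holds only the Name terms. Neither the misspelled "electical" nor the correctly spelled "wiring" from Field1 reaches it. Querying TermsDictionary for a correctly spelled word that appears only in Field1 would be a quick way to check this against a real index.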

Be careful to remove the index between schema changes, just to be sure that
you're not seeing old data.

Best
Erick

On Mon, Aug 22, 2011 at 11:41 AM, Herman Kiefus <herm...@angieslist.com> wrote:
> That's what I thought, but my experiments show differently.  In actuality:
>
> I have a number of fields that are of type "text" (the default as it is 
> packaged).
>
> I have a type 'textCorrectlySpelled' that utilizes KeepWordFilterFactory in 
> index-time analysis, using a file of terms which are known to be correctly 
> spelled.
>
> I have a type 'textDictionary' that has no index-time analysis.
>
> I have the fields:
> <field name="CorrectlySpelledTerms" type="textCorrectlySpelled" 
> indexed="false" stored="false" multiValued="true"/>
> <field name="TermsDictionary" type="textDictionary" indexed="true" 
> stored="false" multiValued="true"/>
>
> I want 'TermsDictionary' to contain only those terms from some fields that
> are correctly spelled, plus the terms from a couple of other fields
> (CompanyName and ContactName) as is.  I use several copyField directives as
> follows:
>
> <copyField source="Field1" dest="CorrectlySpelledTerms"/>
> <copyField source="Field2" dest="CorrectlySpelledTerms"/>
> <copyField source="Field3" dest="CorrectlySpelledTerms"/>
>
> <copyField source="Name" dest="TermsDictionary"/>
> <copyField source="Contact" dest="TermsDictionary"/>
> <copyField source="CorrectlySpelledTerms" dest="TermsDictionary"/>
>
> If I query 'Field1' for a term that I know is misspelled (electical) it 
> yields results.
> If I query 'TermsDictionary' for the same term it yields no results.
>
> It would seem from these results that 'TermsDictionary' only contains those
> terms with misspellings stripped as a result of the text analysis on the
> field 'CorrectlySpelledTerms'.
>
> Asked another way (I think you can see what I'm getting at): I want a source
> for the spellchecker that contains only correctly spelled terms plus proper
> names. Should I have gone about this in a different way?
>
> -----Original Message-----
> From: Stephen Duncan Jr [mailto:stephen.dun...@gmail.com]
> Sent: Monday, August 22, 2011 9:30 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Text Analysis and copyField
>
> On Mon, Aug 22, 2011 at 9:25 AM, Herman Kiefus <herm...@angieslist.com> wrote:
>> Is my thinking correct?
>>
>> I have a field 'F1' of type 'T1' whose index time analysis employs the 
>> StopFilterFactory.
>>
>> I also have a field 'F2' of type 'T2' whose index time analysis does NOT 
>> employ the StopFilterFactory.
>>
>> There is a copyField directive source="F1" dest="F2"
>>
>> F2 will not contain any stop words because they were filtered out as F1 was 
>> populated.
>>
>
> No, F2 will contain stop words.  copyField does not process input through a
> chain; it sends the original content to each field, so analysis is totally
> independent.
>
> --
> Stephen Duncan Jr
> www.stephenduncanjr.com
>
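Stephen's answer can be sketched with a similar toy Python model (an illustration, not Solr internals): F2 receives the raw F1 value and runs its own analyzer on it, never F1's output tokens. The stop-word list, the function names and the sample text are hypothetical stand-ins for the StopFilterFactory configuration.

```python
# Toy model: copyField source="F1" dest="F2" with independent analysis.

# Hypothetical stop-word list standing in for StopFilterFactory's word file.
STOP_WORDS = {"a", "an", "and", "the", "of"}


def analyze_t1(text):
    # Type T1: lowercase, whitespace tokenize, then drop stop words.
    return [t for t in text.lower().split() if t not in STOP_WORDS]


def analyze_t2(text):
    # Type T2: lowercase and whitespace tokenize, no stop filter.
    return text.lower().split()


def index_with_copy_field(raw_f1):
    # F2 is given raw_f1 itself, not the tokens produced by F1's analyzer.
    return {"F1": analyze_t1(raw_f1), "F2": analyze_t2(raw_f1)}


fields = index_with_copy_field("The Art of Wiring")
print(fields)
# {'F1': ['art', 'wiring'], 'F2': ['the', 'art', 'of', 'wiring']}
```

The stop words survive in F2 because F2's own analysis chain has no stop filter; what F1 does to the same text has no effect on it.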
