It had crossed my mind, but for now we have a 'DictionarySource' field whose type uses KeepWordFilterFactory with a text file containing all correctly spelled words (thanks to Scrabble), location, last, and first names (courtesy of the US Census Bureau), and a few other additions (month and day names). A file this large does not seem to have a material impact on indexing.
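
For anyone following along, a field like that might be declared roughly as below. This is only a sketch under assumptions, not the actual schema: the type name 'textKeepWords', the file name 'dictionary.txt', the tokenizer and lowercase filter, and the 'Field1' source are all placeholders. Only 'DictionarySource' and KeepWordFilterFactory come from the description above.

```xml
<!-- Hypothetical type name; tokenizer and lowercase filter are assumed. -->
<fieldType name="textKeepWords" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Keep only tokens listed in the words file; all other tokens are dropped.
         'dictionary.txt' is a placeholder for the combined word/name list. -->
    <filter class="solr.KeepWordFilterFactory" words="dictionary.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>

<field name="DictionarySource" type="textKeepWords" indexed="true" stored="false" multiValued="true"/>

<!-- 'Field1' stands in for whichever source fields feed the dictionary. -->
<copyField source="Field1" dest="DictionarySource"/>
```

Since copyField passes the original, unanalyzed text to the destination, the keep-word filtering happens in the destination field's own index-time analysis, whatever analysis the source field uses.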

We also have a field 'TermsMisspelled' that uses the same text file with StopFilterFactory, and what we're seeing there now is almost pure misspellings and some contractions (can't, won't, don't, etc.).

Thank you everyone for your help here, this is a truly fine community.

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Wednesday, August 24, 2011 1:00 PM
To: solr-user@lucene.apache.org
Subject: Re: Text Analysis and copyField

Have you considered having two dictionaries and using AJAX to query them both and intermingling the results in your suggestions? It'd be some work, but I think it might accomplish what you want.

Best
Erick

On Tue, Aug 23, 2011 at 1:48 PM, Herman Kiefus <herm...@angieslist.com> wrote:
> To close, I found this article from Hoss:
> http://lucene.472066.n3.nabble.com/CopyField-into-another-CopyField-td3122408.html
>
> Since I cannot use one copyField directive to copy from another copyField's
> dest[ination], I cannot achieve what I desire: some terms that are subject to
> KeepWordFilterFactory and some that are not.
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Monday, August 22, 2011 1:16 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Text Analysis and copyField
>
> I suspect that the things going into TermsDictionary are from fields other
> than CorrectlySpelledTerms.
>
> In other words I don't think that anything is getting into TermsDictionary
> from CorrectlySpelledTerms...
>
> Be careful to remove the index between schema changes, just to be sure that
> you're not seeing old data.
>
> Best
> Erick
>
> On Mon, Aug 22, 2011 at 11:41 AM, Herman Kiefus <herm...@angieslist.com> wrote:
>> That's what I thought, but my experiments show differently. In actuality:
>>
>> I have a number of fields that are of type "text" (the default as it is
>> packaged).
>>
>> I have a type 'textCorrectlySpelled' that utilizes KeepWordFilterFactory in
>> index-time analysis, using a file of terms which are known to be correctly
>> spelled.
>>
>> I have a type 'textDictionary' that has no index-time analysis.
>>
>> I have the fields:
>> <field name="CorrectlySpelledTerms" type="textCorrectlySpelled"
>>   indexed="false" stored="false" multiValued="true"/>
>> <field name="TermsDictionary" type="textDictionary" indexed="true"
>>   stored="false" multiValued="true"/>
>>
>> I want 'TermsDictionary' to contain only those terms from some fields that
>> are correctly spelled plus those terms from a couple other fields
>> (CompanyName and ContactName) as is. I use several copyField directives as
>> follows:
>>
>> <copyField source="Field1" dest="CorrectlySpelledTerms"/>
>> <copyField source="Field2" dest="CorrectlySpelledTerms"/>
>> <copyField source="Field3" dest="CorrectlySpelledTerms"/>
>>
>> <copyField source="Name" dest="TermsDictionary"/>
>> <copyField source="Contact" dest="TermsDictionary"/>
>> <copyField source="CorrectlySpelledTerms" dest="TermsDictionary"/>
>>
>> If I query 'Field1' for a term that I know is misspelled (electical) it
>> yields results.
>> If I query 'TermsDictionary' for the same term it yields no results.
>>
>> It would seem by these results that 'TermsDictionary' only contains those
>> terms with misspellings stripped as a result of the text analysis on the
>> field 'CorrectlySpelledTerms'.
>>
>> Asked another way, I think you can see what I'm getting at: a source for the
>> spellchecker that only contains correctly spelled terms plus proper names;
>> should I have gone about this in a different way?
>>
>> -----Original Message-----
>> From: Stephen Duncan Jr [mailto:stephen.dun...@gmail.com]
>> Sent: Monday, August 22, 2011 9:30 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Text Analysis and copyField
>>
>> On Mon, Aug 22, 2011 at 9:25 AM, Herman Kiefus <herm...@angieslist.com> wrote:
>>> Is my thinking correct?
>>>
>>> I have a field 'F1' of type 'T1' whose index time analysis employs the
>>> StopFilterFactory.
>>>
>>> I also have a field 'F2' of type 'T2' whose index time analysis does NOT
>>> employ the StopFilterFactory.
>>>
>>> There is a copyField directive source="F1" dest="F2"
>>>
>>> F2 will not contain any stop words because they were filtered out as F1 was
>>> populated.
>>>
>>
>> No, F2 will contain stop words. Copy fields does not process input through
>> a chain, it sends the original content to each field and therefore analysis
>> is totally independent.
>>
>> --
>> Stephen Duncan Jr
>> www.stephenduncanjr.com