> This may sound a bit too KISS, but another approach could be
> based on synonyms, i.e. if the number of abbreviations is
> limited and defined ("All US States"), you can simply define
> the complete state name for each abbreviation. This way
> "Chicago, IL" will be "translated" (...) into "Chicago,
> Illinois" during indexing and/or querying. But this may
> depend on the Tokenizer you use and how your index is defined
> (does a search for "Chicago, Illinois" on a field give you a
> doc with "Chicago, Cook, Illinois" in some (other/same) field?)
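For a small, fixed set like the US states, the synonym idea can be sketched in plain Python. This is only an illustration of the mapping, not Solr's SynonymFilterFactory. The table is a hypothetical hand-written subset, and the same function would have to run at both index time and query time so that "Chicago, IL" and "Chicago, Illinois" end up as the same text.

```python
# Hypothetical, hand-written abbreviation table; a real one would list every state.
STATE_SYNONYMS = {
    "IL": "Illinois",
    "NY": "New York",
    "CA": "California",
}


def expand_abbreviations(text: str) -> str:
    """Replace comma-separated parts that are known state abbreviations
    with the full state name, e.g. "Chicago, IL" -> "Chicago, Illinois".
    Parts that are not in the table are kept as written (whitespace trimmed)."""
    parts = [part.strip() for part in text.split(",")]
    expanded = [STATE_SYNONYMS.get(part, part) for part in parts]
    return ", ".join(expanded)
```

Here expand_abbreviations("Chicago, IL") returns "Chicago, Illinois", while "Chicago, Cook, Illinois" passes through unchanged. In Solr the equivalent mapping would live in a synonyms file read by the field's analyzer, and multi-word names such as "New York" then depend on how the tokenizer splits them, which is the caveat in the question above.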

Thanks for the suggestion!  The problem is that there are over 1M places (it's a
database of historic places worldwide), and most of them are written in several
different ways.  A complete synonym file would be pretty large.
Issuing queries before indexing the docs would be preferable to a
~100-megabyte synonym file, especially because it's a wiki and people can
add new places at any time, so I'd have to rebuild the synonym file on a
regular basis.
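A rough sketch of that alternative: resolve each place against the live data just before a document is indexed, instead of relying on a static synonym file. The lookup callable, the "place"/"place_normalized" field names, and the dict standing in for the database are all hypothetical; in practice the lookup would be a query against the existing place index.

```python
from typing import Callable, Dict, Optional

# Hypothetical lookup: maps a written variant to its canonical place name,
# or returns None when the variant is unknown. Stands in for a real query.
Lookup = Callable[[str], Optional[str]]


def normalize_place(raw: str, lookup: Lookup) -> str:
    """Return the canonical name if the lookup knows this variant,
    otherwise keep the (trimmed) text as written."""
    cleaned = raw.strip()
    canonical = lookup(cleaned)
    return canonical if canonical is not None else cleaned


def prepare_doc(doc: Dict[str, str], lookup: Lookup) -> Dict[str, str]:
    """Copy the document and add a normalized place field before indexing.
    The input dict is left unmodified."""
    prepared = dict(doc)
    prepared["place_normalized"] = normalize_place(doc["place"], lookup)
    return prepared
```

With places = {"Chicago, IL": "Chicago, Cook, Illinois"}, calling prepare_doc({"place": "Chicago, IL"}, places.get) adds "Chicago, Cook, Illinois" as the normalized value. Because the lookup runs per document, a place added to the wiki is picked up as soon as the lookup can see it, with no synonym file to rebuild; the trade-off is one lookup per document at index time.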

I sure wish I could figure out how to access the Solr core (SolrCore) object
from my token filter class, though.

-dallan
