> This may sound a bit too KISS, but another approach could be
> based on synonyms, i.e. if the number of abbreviations is
> limited and defined ("All US States"), you can simply define the
> complete state name for each abbreviation. This way
> "Chicago, IL" will be "translated" (...) into "Chicago,
> Illinois" during indexing and/or querying... but this may
> depend on the Tokenizer you use and how your index is defined
> (does a search for "Chicago, Illinois" on a field give you a
> doc with "Chicago, Cook, Illinois" in some (other/same) field?)
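For illustration, the synonym idea quoted above could be sketched roughly as follows. This is plain Java, not Solr code: the class name `StateSynonyms`, the `expand` method, and the three-entry table are all hypothetical, and in Solr itself the mapping would more likely live in a synonyms file read by the synonym filter at index and/or query time.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Solr: expands a trailing US state
// abbreviation into the full state name before indexing or querying.
final class StateSynonyms {
    // Deliberately tiny sample table (an assumption for this sketch);
    // a real one would cover every abbreviation in use.
    private static final Map<String, String> STATES = Map.of(
        "IL", "Illinois",
        "NY", "New York",
        "CA", "California");

    // Matches a two-letter uppercase token after the last comma, e.g. ", IL".
    private static final Pattern TRAILING_ABBREV =
        Pattern.compile(",\\s*([A-Z]{2})\\s*$");

    static String expand(String place) {
        Matcher m = TRAILING_ABBREV.matcher(place);
        if (!m.find()) {
            return place; // no trailing abbreviation, leave unchanged
        }
        String full = STATES.get(m.group(1));
        if (full == null) {
            return place; // abbreviation not in the table, leave unchanged
        }
        // Keep everything before the final comma and append the full name.
        return place.substring(0, m.start()) + ", " + full;
    }

    public static void main(String[] args) {
        System.out.println(expand("Chicago, IL"));
        System.out.println(expand("Chicago, Cook, Illinois"));
    }
}
```

A fixed table like this only works while the set of abbreviations is small and closed. It does nothing for the second question in the quote, i.e. whether "Chicago, Illinois" should also match "Chicago, Cook, Illinois".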
Thanks for the suggestion! The problem is that there are over 1M places (it's a database of historic places worldwide), most with multiple variations in the way they're written, so a complete synonym file would be pretty large. Issuing queries before indexing the docs would be preferable to a ~100-megabyte synonym file, especially because it's a wiki and people can add new places at any time, so I'd have to rebuild the synonym file on a regular basis.

I sure wish I could figure out how to access the Solr core object in my token filter class, though.

-dallan