Dallan, got money to spend on solving this problem?  I believe this is 
something that tools like LingPipe can solve through language model training 
and named entity extraction.
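To make the suggestion concrete: here is a toy sketch of dictionary-based place-name extraction. It is not LingPipe's actual API (a real LingPipe setup would use its chunkers and trained models); it only illustrates the idea of matching text against a gazetteer of known places. The gazetteer contents are hypothetical.

```python
# Toy stand-in for gazetteer-based entity extraction; NOT LingPipe's API,
# just the underlying idea: match tokens against a dictionary of places.
KNOWN_PLACES = {"chicago", "cook", "illinois", "new york"}  # hypothetical sample

def find_places(text):
    """Return tokens from `text` that match the place gazetteer."""
    tokens = [t.strip(",.") for t in text.split()]
    return [t for t in tokens if t.lower() in KNOWN_PLACES]
```

A trained NER model would additionally handle multi-word names and ambiguous tokens, which a flat lookup like this cannot.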


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


----- Original Message ----
> From: Dallan Quass <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Friday, May 30, 2008 4:22:37 PM
> Subject: RE: Issuing queries during analysis?
> 
> > This may sound a bit too KISS, but another approach could be 
> > based on synonyms: if the number of abbreviations is 
> > limited and well-defined ("all US states"), you can simply define 
> > the complete state name for each abbreviation, so that 
> > "Chicago, IL" is "translated" into "Chicago, 
> > Illinois" during indexing and/or querying... but this may 
> > depend on the Tokenizer you use and how your index is defined 
> > (does a search for "Chicago, Illinois" on a field give you a 
> > doc with "Chicago, Cook, Illinois" in some (other/same) field?)
> 
> Thanks for the suggestion!  The problem is that there are over 1M places (it's
> a database of historic places worldwide), most with multiple variations in the
> way they're written, so a complete synonym file would be pretty large.
> Issuing queries before indexing the docs would be preferable to a
> ~100-megabyte synonym file, especially because it's a wiki and people can
> add new places at any time, so I'd have to rebuild the synonym file on a
> regular basis.
> 
> I sure wish I could figure out how to access the Solr core object from my
> token filter class, though.
> 
> -dallan
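The synonym approach discussed above can be sketched as follows. The mapping here is a tiny illustrative sample, not a full list; in Solr itself this would be a SynonymFilter with one `synonyms.txt` entry per abbreviation (e.g. `IL => Illinois`) rather than custom code.

```python
# Minimal sketch of the synonym idea from the thread: rewrite a trailing
# state abbreviation to its full name at index/query time. STATE_NAMES is
# a small illustrative sample, not a complete mapping.
STATE_NAMES = {"IL": "Illinois", "NY": "New York", "CA": "California"}

def expand_place(place):
    """Rewrite 'Chicago, IL' as 'Chicago, Illinois'; leave others unchanged."""
    head, sep, abbrev = place.rpartition(", ")
    if sep and abbrev in STATE_NAMES:
        return "%s, %s" % (head, STATE_NAMES[abbrev])
    return place
```

As Dallan notes, this works for a closed set like US states but does not scale to 1M+ place names with free-form variants, which is why a lookup against the index itself at index time is attractive.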
