Re: Issuing queries during analysis?

Grant Ingersoll Fri, 30 May 2008 08:02:24 -0700


On May 30, 2008, at 10:22 AM, Dallan Quass wrote:

This may sound a bit too KISS, but another approach could be
based on synonyms: if the number of abbreviations is
limited and well defined ("all US states"), you can simply define
the complete state name for each abbreviation. This way
"Chicago, IL" will be "translated" (...) into "Chicago,
Illinois" during indexing and/or querying. This may
depend on the Tokenizer you use and how your index is defined,
though (does a search for "Chicago, Illinois" on one field give you a
doc with "Chicago, Cook, Illinois" in some other, or the same, field?).
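
For illustration only, the synonym idea suggested above could be sketched roughly as follows. This is plain Python rather than anything Solr-specific: the STATE_SYNONYMS table and the expand_abbreviations function are hypothetical names, and a real setup would rely on the synonym handling of the analysis chain instead.

```python
# Hypothetical abbreviation table; a real one would list every state.
STATE_SYNONYMS = {"IL": "Illinois", "NY": "New York"}


def expand_abbreviations(text, synonyms=STATE_SYNONYMS):
    """Replace each comma-separated part that is a known abbreviation."""
    parts = [part.strip() for part in text.split(",")]
    # Parts that are not in the table are passed through unchanged.
    return ", ".join(synonyms.get(part, part) for part in parts)


print(expand_abbreviations("Chicago, IL"))  # prints "Chicago, Illinois"
```

The same mapping would have to be applied at index time, at query time, or both, and it only stays manageable while the table is small.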
Thanks for the suggestion! The problem is there are over 1M places (it's a database of historic places worldwide), most with multiple variations in the way that they're written. A complete synonym file would be pretty large.
Issuing queries before indexing the docs would be preferable to a
~100-megabyte synonym file, especially because it's a wiki and people can add new places anytime so I'd have to re-build the synonym file on a regular
basis.

Can you describe your indexing process a bit more? Do you just have one or two tokens that you have to "translate", or are you going to query on every token in your text? I just don't see how it will perform at all to look up every token in some index, so maybe if we have some more info, something more obvious will arise.
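
To make the concern concrete, here is a rough sketch of what "query on every token" amounts to. It assumes a hypothetical lookup callable standing in for a query against the places index; PLACES and analyze are illustrative names, not Solr or Lucene APIs.

```python
def analyze(tokens, lookup):
    """Issue one lookup per token and count how many were issued."""
    issued = 0
    out = []
    for token in tokens:
        issued += 1  # every token costs one lookup, match or not
        out.append(lookup(token) or token)
    return out, issued


# Hypothetical stand-in for a query against a places index.
PLACES = {"IL": "Illinois"}

tokens = "born in Chicago , IL in 1850".split()
result, issued = analyze(tokens, PLACES.get)
print(issued, "lookups for", len(tokens), "tokens")
```

The number of lookups grows with the number of tokens in the text, not with the number of tokens that actually need translating, which is why knowing how many tokens need translation matters.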

I sure wish I could figure out how to access the solr core object in my
token filter class though.

-dallan


--------------------------
Grant Ingersoll
http://www.lucidimagination.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
