How often does your collection change or get updated?

A slight alternative is to create a small, simple Lucene index that contains your translations and do the lookup as a pre-indexing step, before the documents are sent to Solr. The code for such a searcher is quite simple, although it isn't Solr.
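
To make the pre-indexing idea concrete, here is a minimal sketch, not a definitive implementation. The class name PlaceStandardizer and its methods are hypothetical, and a plain HashMap stands in for the small Lucene index so that the example runs on its own. In the Lucene variant, the body of standardize() would presumably run a term query against that index instead of a map lookup.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: standardizes "place" field values before a document
// is sent to Solr. The HashMap is a stand-in for the small lookup index.
class PlaceStandardizer {
    // Maps a normalized variant (alternative or abbreviated name) to its standard form.
    private final Map<String, String> variantToStandard = new HashMap<>();

    // Registers one alternative name for a standard place name.
    void addVariant(String variant, String standard) {
        variantToStandard.put(normalize(variant), standard);
    }

    // Returns the standard form, or the input unchanged when no variant matches.
    String standardize(String rawPlace) {
        if (rawPlace == null) {
            return null;
        }
        String standard = variantToStandard.get(normalize(rawPlace));
        return standard != null ? standard : rawPlace;
    }

    // Trims, lowercases, and collapses runs of whitespace so that
    // "  NYC " and "nyc" hit the same entry.
    private static String normalize(String s) {
        return s.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
    }
}
```

The indexing client would call standardize() on each of the 1-3 place fields and then send the rewritten document to Solr as usual, so no custom Analyzer is needed inside Solr.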

Otherwise, you'd have to hack the SolrResourceLoader to recognize your Analyzer as being SolrCoreAware, but, geez, I don't know what the full ramifications of that would be, so caveat emptor.

-Grant

On May 31, 2008, at 12:51 AM, Dallan Quass wrote:

Hi Grant,

Can you describe your indexing process a bit more?  Do you
just have one or two tokens that you have to "translate", or is
it that you are going to query on every token in your text?
I just don't see how that will perform at all to look up
every token in some index, so maybe if we have some more
info, something more obvious will arise.

One more clarification -- I don't need to do this for every token in the text; just for "place" fields in the document. Each document has 1-3 place fields that need to be converted to standard form when the document is indexed.

There is a special set of (~1M) "Place" documents that contain information about alternative/abbreviated place names, how places are nested inside each other, etc. Either before or during tokenization of the regular documents I want to query these "Place" documents to determine how to standardize the place fields in the regular documents.

Thank you again!

-dallan


--------------------------
Grant Ingersoll
http://www.lucidimagination.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
