How often does your collection change or get updated?
A slight alternative would be to create a small, simple Lucene index
that contains your translations and then do the lookups before
indexing. The code for such a searcher is quite simple, although it
isn't Solr.
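Roughly, a sketch against the 2.x Lucene API (the "variant"/"standard"
field names, the in-memory directory, and the TranslationLookup class
are just placeholders for illustration):

import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.RAMDirectory;

public class TranslationLookup {
  public static void main(String[] args) throws Exception {
    // Build the small side index once, or whenever the translation
    // data changes.
    RAMDirectory dir = new RAMDirectory();
    IndexWriter writer = new IndexWriter(dir, new KeywordAnalyzer(), true);
    Document d = new Document();
    // "variant" is the form you expect to see coming in,
    // "standard" is the stored form you want to index instead.
    d.add(new Field("variant", "nyc",
                    Field.Store.NO, Field.Index.UN_TOKENIZED));
    d.add(new Field("standard", "New York City, New York, United States",
                    Field.Store.YES, Field.Index.NO));
    writer.addDocument(d);
    writer.optimize();
    writer.close();

    // At indexing time, look the incoming value up and use the stored
    // standard form if there is a hit; otherwise keep the raw value.
    IndexSearcher searcher = new IndexSearcher(dir);
    Hits hits = searcher.search(new TermQuery(new Term("variant", "nyc")));
    String standard = hits.length() > 0 ? hits.doc(0).get("standard") : "nyc";
    System.out.println(standard);
    searcher.close();
  }
}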
Otherwise, you'd have to hack the SolrResourceLoader to recognize your
Analyzer as SolrCoreAware, but, geez, I don't know what the full
ramifications of that would be, so caveat emptor.
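For what it's worth, and with the same caveat, the kind of factory that
hack would have to let through might look roughly like this (the class
name is made up, and SolrResourceLoader won't accept a SolrCoreAware
token filter factory without being modified, which is exactly the part
you'd be hacking):

import org.apache.lucene.analysis.TokenStream;
import org.apache.solr.analysis.BaseTokenFilterFactory;
import org.apache.solr.core.SolrCore;
import org.apache.solr.util.plugin.SolrCoreAware;

// Hypothetical factory: SolrResourceLoader would have to be changed to
// accept a TokenFilterFactory that implements SolrCoreAware before this
// would even load.
public class PlaceNormalizingFilterFactory extends BaseTokenFilterFactory
    implements SolrCoreAware {

  private SolrCore core;

  // Called once the core is ready; grab whatever core-level resources
  // the filter needs here.
  public void inform(SolrCore core) {
    this.core = core;
  }

  public TokenStream create(TokenStream input) {
    // Return a TokenFilter that rewrites tokens using resources obtained
    // in inform(); shown as a pass-through to keep the sketch short.
    return input;
  }
}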
-Grant
On May 31, 2008, at 12:51 AM, Dallan Quass wrote:
Hi Grant,
> Can you describe your indexing process a bit more? Do you
> just have one or two tokens that you have to "translate", or
> are you going to query on every token in your text?
> I just don't see how looking up every token in some index
> will perform at all, so maybe if we have some more info,
> something more obvious will arise.
One more clarification -- I don't need to do this for every token in
the text; just for "place" fields in the document. Each document has
1-3 place fields that need to be converted to standard form when the
document is indexed.
There is a special set of (~1M) "Place" documents that contain
information about alternative/abbreviated place names, how places are
nested inside each other, etc. Either before or during tokenization of
the regular documents, I want to query these "Place" documents to
determine how to standardize the place fields in the regular documents.
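As a rough sketch of that lookup step at indexing time (the
PlaceStandardizer class, the "variant"/"standard" field names, and the
shape of the pre-built place index are all just placeholders for
illustration):

import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;

// Hypothetical helper: resolves raw place strings against a pre-built
// place index that holds a "variant" term and a stored "standard" form.
public class PlaceStandardizer {
  private final IndexSearcher placeSearcher;

  public PlaceStandardizer(IndexSearcher placeSearcher) {
    this.placeSearcher = placeSearcher;
  }

  public String standardize(String rawPlace) throws IOException {
    Hits hits = placeSearcher.search(
        new TermQuery(new Term("variant", rawPlace.toLowerCase())));
    return hits.length() > 0 ? hits.doc(0).get("standard") : rawPlace;
  }

  // Rewrite the 1-3 place fields on a regular document before it goes
  // to the index writer; all other fields are left untouched.
  public void standardizePlaceFields(Document doc, String[] placeFields)
      throws IOException {
    for (String name : placeFields) {
      String raw = doc.get(name);
      if (raw != null) {
        doc.removeFields(name);
        doc.add(new Field(name, standardize(raw),
                          Field.Store.YES, Field.Index.TOKENIZED));
      }
    }
  }
}

The idea being that each regular document's place fields get replaced
with their standardized values before the document is handed to the
index writer.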
Thank-you again!
-dallan
--------------------------
Grant Ingersoll
http://www.lucidimagination.com
Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ