We receive about 100 documents a day, of various sizes.  Each document
could pertain to any of the 40,000 contacts stored in our database, and may
mention more than one.  For each file we have, we maintain a list of contacts
that are related to or involved in that file.  I know it will never be exact,
but I'd like to index possible names in the text, and then attempt to identify
which files a document might pertain to by looking for files that are tied to
contacts named in the document.

I've found some regex code to parse names from the text, but does anyone have
any ideas on how to set up the index?  There are currently approximately
900,000 documents in our library.
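
To make the question concrete, here is a rough Python sketch of the kind of
lookup I have in mind.  Everything in it is an assumption for illustration:
the sample contacts and file IDs, the function names, and the simple
capitalized-word pattern, which is only a stand-in for the name-parsing regex.
It keeps two in-memory maps (normalized name to contact IDs, and contact ID to
files) and ranks files by how many distinct contacts from the text are tied to
each one.

```python
import re
from collections import Counter, defaultdict

# Hypothetical sample data; real contacts and file links would come from the database.
CONTACTS = {1: "John Smith", 2: "Mary Jones", 3: "Alan Turing"}
FILE_CONTACTS = {"F-100": {1, 2}, "F-200": {3}, "F-300": {2}}

# Stand-in for the name-parsing regex: two or more consecutive capitalized words.
NAME_RE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b")


def normalize(name):
    """Lowercase and collapse whitespace so lookups ignore case and spacing."""
    return " ".join(name.lower().split())


def build_contact_lookup(contacts):
    """Map normalized contact name -> set of contact ids (names may collide)."""
    lookup = defaultdict(set)
    for contact_id, name in contacts.items():
        lookup[normalize(name)].add(contact_id)
    return lookup


def build_contact_files(file_contacts):
    """Invert file -> contacts into contact -> files."""
    contact_files = defaultdict(set)
    for file_id, contact_ids in file_contacts.items():
        for contact_id in contact_ids:
            contact_files[contact_id].add(file_id)
    return contact_files


def candidate_files(text, lookup, contact_files):
    """Rank files by how many distinct contacts named in the text are tied to them."""
    found = set()
    for match in NAME_RE.finditer(text):
        found |= lookup.get(normalize(match.group()), set())
    scores = Counter()
    for contact_id in found:
        for file_id in contact_files.get(contact_id, ()):
            scores[file_id] += 1
    return scores.most_common()
```

The same name-to-IDs shape could presumably be persisted per document (name to
document IDs) for the existing library, but I don't know whether that holds up
at 900,000 documents, which is really what I'm asking about.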

--Warren
