This kind of text processing is called entity extraction. I'm not up to date on 
what is available in Solr, but searching on that term is a good place to start.
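
Because the 40,000 contact names are already known, one simple starting point,
independent of Solr, is a dictionary lookup rather than open-ended name parsing:
check each document against the contact list, then rank files by how many of
their contacts turn up. Below is a minimal Python sketch of that idea.
Everything in it is an assumption made for illustration (the sample CONTACTS
and FILE_CONTACTS data, the function names, and the rule of exact matching
after normalization). It is not a Solr feature, and it will miss nicknames,
bare initials, and misspellings.

```python
import re
from collections import Counter

# Hypothetical sample data: contact id -> name, file id -> related contact ids.
CONTACTS = {1: "Warren Prince", 2: "Jane Q. Doe", 3: "John Smith"}
FILE_CONTACTS = {"file-A": {1, 2}, "file-B": {3}, "file-C": {2, 3}}


def normalize(text):
    """Lowercase the text and reduce it to space-separated alphanumeric tokens."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def find_contacts(document, contacts):
    """Return the ids of contacts whose normalized name occurs in the document."""
    # Pad with spaces so a name only matches on whole-token boundaries.
    padded = " " + normalize(document) + " "
    return {cid for cid, name in contacts.items()
            if " " + normalize(name) + " " in padded}


def rank_files(document, contacts, file_contacts):
    """Rank files by how many of their related contacts appear in the document."""
    found = find_contacts(document, contacts)
    scores = Counter()
    for file_id, ids in file_contacts.items():
        overlap = len(ids & found)
        if overlap:
            scores[file_id] = overlap
    return scores.most_common()


if __name__ == "__main__":
    doc = "Letter from Jane Q. Doe to Mr. John Smith regarding the appeal."
    for file_id, hits in rank_files(doc, CONTACTS, FILE_CONTACTS):
        print(file_id, hits)
```

With the sample data, a document that mentions Jane Q. Doe and John Smith ranks
file-C first, because both of its contacts appear in the text. The candidates
are only suggestions, so a person would still need to confirm each match.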

wunder

On Jun 26, 2013, at 10:26 AM, Warren H. Prince wrote:

>       We receive about 100 documents a day of various sizes.  The documents 
> could pertain to any of 40,000 contacts stored in our database, and could 
> include more than one.   For each file we have, we maintain a list of 
> contacts that are related to or involved in that file.  I know it will never 
> be exact, but I'd like to index possible names in the text, and then attempt 
> to identify which files the document might pertain to, looking for files 
> that are tied to contacts named in the document.
> 
> I've found some regex code to parse names from the text, but does anyone have 
> any ideas on how to set up the index?  There are currently approximately 
> 900,000 documents in our library.
> 
> --Warren



