What you want to do is essentially named entity recognition. We have a very
similar use case (medical/scientific documents in which we need to find
disease names, drug names, MeSH terms, etc.).

Take a look at David Smiley's Solr Text Tagger
(https://github.com/OpenSextant/SolrTextTagger), which we've been using with
some success for this task.
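
To make the idea concrete, here is a minimal sketch of dictionary-based tagging with a greedy longest match over tokens. It is only an illustration of the general technique, not the Solr Text Tagger's implementation or API; the function names (tokenize, build_dictionary, tag) and the sample terms are made up for this example, and a plain Python dict would need more thought before scaling to 3 million terms.

```python
import re

def tokenize(text):
    # Lowercased word tokens together with their character offsets.
    return [(m.group().lower(), m.start(), m.end())
            for m in re.finditer(r"\w+", text)]

def build_dictionary(terms):
    # Hypothetical helper: map each term's token tuple to the original term.
    index = {}
    for term in terms:
        key = tuple(tok for tok, _, _ in tokenize(term))
        if key:
            index[key] = term
    max_len = max((len(k) for k in index), default=0)
    return index, max_len

def tag(text, index, max_len):
    # Greedy longest match: at each position, try the longest candidate
    # token span first and skip past it once a dictionary term is found.
    tokens = tokenize(text)
    tags = []
    i = 0
    while i < len(tokens):
        matched = False
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tok for tok, _, _ in tokens[i:i + n])
            if key in index:
                start, end = tokens[i][1], tokens[i + n - 1][2]
                tags.append((start, end, index[key]))
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags

if __name__ == "__main__":
    index, max_len = build_dictionary(["back pain", "breast cancer", "pain"])
    text = "Patient reports back pain and a history of breast cancer."
    for start, end, term in tag(text, index, max_len):
        print(start, end, term)
```

Note that this matches contiguous tokens only, so a reordered phrase such as "cancer in breast" would need its own dictionary entry (or some normalization step) to map to "breast cancer".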

best

-Simon

On Mon, Aug 24, 2015 at 2:13 PM, afrooz <afr.rahm...@gmail.com> wrote:

> Thanks Erick,
> I will explain the scenario in detail so you might be able to suggest a
> solution:
> I want to annotate a medical document based only on a medical dictionary. I
> don't need to annotate the non-medical words in the document at all.
> The medical dictionary contains terms that consist of multiple words, and
> together these words have a specific medical meaning. For example, in "back
> pain", "back" and "pain" are two separate words, but together they have
> another meaning. These terms might appear in different word orders in a
> sentence, all with the same meaning. E.g. "breast cancer" and "cancer in
> breast" should be considered the same.
> We also have terms longer than 6 words.
>
> So the question is: I have a document of around 700 words, and I need
> to annotate it based on a medical terminology of 3 million records.
> Any idea how to do this?
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/how-to-index-document-with-multiple-words-phrases-and-words-permutation-tp4224919p4224970.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
