These look like requirements for a generic Solr search, maybe with a
focus on proximity and/or phrase matching. If you have a fixed set of
words you care about, you could also add a white-listing filter, e.g.
KeepWordFilter in the analyzer chain:
http://www.solr-start.com/info/analyzers/#KeepWordFilterFactory
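
To make the white-listing idea concrete, here is a rough sketch in plain
Python of what a keep-word step does to a token stream. It only imitates the
effect and is not Solr's implementation; the keep_words helper, the regex
tokenizer and the tiny word list are all made up for illustration.

```python
import re

def keep_words(text, whitelist):
    """Return the lowercased tokens of text that appear in whitelist, in order."""
    allowed = {w.lower() for w in whitelist}
    # Naive tokenizer (an assumption): runs of ASCII letters/digits only.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t in allowed]

# Hypothetical mini-dictionary of single words taken from the medical terms.
medical_words = {"back", "pain", "breast", "cancer"}

kept = keep_words("Patient reports cancer in the left breast.", medical_words)
print(kept)  # ['cancer', 'breast']
```

In Solr itself the word list would normally live in a text file referenced
from the field type's analyzer definition.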

There are a couple of options for proximity matching: one in the standard
query parser (the "..."~N slop syntax) and one in a specialized parser,
the Surround query parser:
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-SurroundQueryParser
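
As a rough illustration of the two query styles, the sketch below builds the
query strings in Python. The helper names are hypothetical, and the slop and
distance values are placeholders you would need to tune; the two parsers do
not measure distance the same way, so the numbers are not interchangeable.

```python
def standard_proximity(words, slop):
    """Sloppy phrase query for the standard query parser, e.g. "a b"~3."""
    phrase = " ".join(words)
    return f'"{phrase}"~{slop}'

def surround_proximity(words, distance, ordered=False):
    """Surround parser query: Nw(...) is ordered, Nn(...) is unordered."""
    op = "w" if ordered else "n"
    terms = ", ".join(words)
    return f"{{!surround}} {distance}{op}({terms})"

term = ["breast", "cancer"]
print(standard_proximity(term, 3))   # "breast cancer"~3
print(surround_proximity(term, 3))   # {!surround} 3n(breast, cancer)
```

Either string would be sent as the q parameter against the field holding the
document text.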

I'd start with the most basic setup and then iterate a couple of times.

Regards,
   Alex.
----
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 24 August 2015 at 14:13, afrooz <afr.rahm...@gmail.com> wrote:
> Thanks Erick,
> I will explain the scenario in detail so you might give me a solution:
> I want to annotate a medical document based only on a medical dictionary. I
> don't need to annotate the non-medical words of the document at all.
> The medical dictionary contains terms that consist of multiple words, and
> together these words have a specific medical meaning. For example, in "back
> pain", "back" and "pain" are two separate words, but together they have
> another meaning. These terms might be used in different orders in a
> sentence, all with the same meaning. E.g. "breast cancer" and "cancer in
> breast" should be considered the same...
> We also have terms of more than 6 words.
>
> So the question is: "I have a document of around 700 words and I need
> to annotate this document based on a medical terminology of 3 million
> records."
> Any idea how to do this?
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/how-to-index-document-with-multiple-words-phrases-and-words-permutation-tp4224919p4224970.html
> Sent from the Solr - User mailing list archive at Nabble.com.
