These look like requirements for a generic Solr search, maybe with a focus on proximity and/or phrase matching. If you have a fixed set of words you care about, you could also add a whitelisting filter, e.g. KeepWordFilter in the analyzer chain: http://www.solr-start.com/info/analyzers/#KeepWordFilterFactory
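As a rough illustration of the whitelist-plus-proximity idea, here is a minimal Python sketch. It only mimics the effect of a keep-word filter on the client side and builds a sloppy phrase query string for the standard query parser; it does not talk to Solr. The field name `text`, the slop of 3, the tiny word list and both helper names are assumptions made up for the example, so tune them against your own data.

```python
import re

# Hypothetical whitelist: every single word that occurs in any dictionary term.
KEEP_WORDS = {"back", "pain", "breast", "cancer"}


def keep_words(text, keep=KEEP_WORDS):
    """Lowercase and tokenize the text, then drop tokens that are not whitelisted.

    This imitates what a keep-word filter does in an analyzer chain;
    it is not the Solr implementation.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t in keep]


def proximity_query(term, field="text", slop=3):
    """Build a sloppy phrase query string for one dictionary term.

    The field name and the slop value are assumptions, not recommendations.
    """
    phrase = " ".join(term.lower().split())
    return f'{field}:"{phrase}"~{slop}'


if __name__ == "__main__":
    print(keep_words("Patient reports Back pain and fever"))
    print(proximity_query("Breast Cancer"))
```

A sloppy phrase query also tolerates reordered words when the slop is large enough, which is what the "breast cancer" vs. "cancer in breast" case needs, but the right slop value is something to check experimentally.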
There are a couple of options for proximity matching: one in the standard query parser and one as a specialized parser:
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-SurroundQueryParser

I'd start with the most basic thing and then iterate a couple of times.

Regards,
   Alex.
----
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/

On 24 August 2015 at 14:13, afrooz <afr.rahm...@gmail.com> wrote:
> Thanks Erick,
> I will explain the detail scenario so you might give me a solution:
> I want to annotate a medical document base on only medical dictionary. I
> don't need to annotate non medical words of document at all.
> The medical dictionary contains terms which contains multiple words, and
> these terms all together has a specific medical meanings. For example "back
> Pain", "back" and "pain" are two separate words but together they have
> another meaning. these terms might be using in different orders in a
> sentences but all with a same meaning. Ex "breast cancer" or "cancer in
> breast" should be consider the same...
> We have terms even more than 6 words also.
>
> So the question is that "I have a document with around 700 words and i need
> to annotate this document base on medical terminology of 3 million size in
> records"
> any idea how to do this?
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/how-to-index-document-with-multiple-words-phrases-and-words-permutation-tp4224919p4224970.html
> Sent from the Solr - User mailing list archive at Nabble.com.