Have you thought about using copyField with two different processing pipelines? Then you could search both variants with different weights.
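At the Lucene level, that idea looks roughly like the sketch below: the same text is indexed into an "exact" field and a "stemmed" field, then queried across both with different boosts. This is a minimal, illustrative sketch only; the field names, analyzers, boost values and sample text are my own assumptions, and it targets Lucene 5.3+ style APIs. In Solr itself it is just a second field type, a copyField directive, and per-field boosts in (e)dismax's qf.

import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.analysis.miscellaneous.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.BoostQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class ExactPlusStemmedSketch {
  public static void main(String[] args) throws Exception {
    // "copyField"-style duplication: two fields, two analysis pipelines.
    Map<String, Analyzer> perField = new HashMap<>();
    perField.put("body_exact", new StandardAnalyzer());   // no stemming
    perField.put("body_stemmed", new EnglishAnalyzer());  // Porter-stemmed
    Analyzer analyzer = new PerFieldAnalyzerWrapper(new StandardAnalyzer(), perField);

    Directory dir = new RAMDirectory();
    try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
      Document doc = new Document();
      String text = "treatment of renal infarction";
      // The same text goes into both fields, each analyzed by its own chain.
      doc.add(new TextField("body_exact", text, Store.NO));
      doc.add(new TextField("body_stemmed", text, Store.NO));
      writer.addDocument(doc);
    }

    // Search both variants, weighting the exact field higher so literal matches win.
    // (Terms are written pre-analyzed here; a real query parser would analyze per field.)
    BooleanQuery query = new BooleanQuery.Builder()
        .add(new BoostQuery(new TermQuery(new Term("body_exact", "infarction")), 4.0f), Occur.SHOULD)
        .add(new BoostQuery(new TermQuery(new Term("body_stemmed", "infarct")), 1.0f), Occur.SHOULD)
        .build();

    try (DirectoryReader reader = DirectoryReader.open(dir)) {
      TopDocs hits = new IndexSearcher(reader).search(query, 10);
      System.out.println("matches: " + hits.totalHits);
    }
  }
}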
Regards,
   Alex.
----
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/

On 4 March 2015 at 14:18, fredericbaroz <fredericba...@gmail.com> wrote:
> Hello,
>
> My name is Frédéric Baroz. I work as an in-hospital physician in Internal
> Medicine in Switzerland (I speak French) and as a software engineer. I work
> in medical informatics and I am currently doing some research on "semantic
> search" for in-hospital physicians who are daily confronted with searching
> for medical information.
>
> I am quite a newbie in Lucene/Solr and I have spent most of my time this
> last year getting acquainted with this brilliant technology. In the context
> of my work, I noticed that analysis, at index time or query time, sometimes
> needs to expand the text by injecting more or less processed tokens one
> after the other.
>
> One common scenario is to have the system "prefer" exact word matches by
> injecting into the index a stemmed version along with the unmolested version
> of each token. Other token filters have similar behaviour, like
> KeywordRepeatFilter, which injects two versions of each processed token, one
> of which is flagged so that it skips the stemming phase. A last example is
> AutoPhrasingTokenFilter, a contribution from Lucidworks which offers a
> "workaround" for multi-term synonym matching (see
> http://lucidworks.com/blog/automatic-phrase-tokenization-improving-lucene-search-precision-by-more-precise-linguistic-analysis/).
>
> One problem with this approach, as I understand it, is that filters that
> adopt this behaviour break the analysis capabilities of subsequent filters.
> For example, if we use KeywordRepeatFilter and then AutoPhrasingTokenFilter,
> the latter will have no effect, since it *never sees* the token sequence it
> was waiting for: an extra word has been added after each word because of
> KeywordRepeatFilter.
>
> In my opinion, tokens "to be injected" should be injected all at once, after
> the original token stream has been emitted, and not after each token seen by
> the filter. That way they would not break the ordered sequence of tokens,
> which in my opinion carries important information.
>
> So my question is: has anyone already addressed this problem, and are there
> any workarounds that one might have thought of?
>
> And for the record, today Google is no friend to me ;)
>
> Thanks in advance for your help,
>
> Frédéric Baroz
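As an aside, the keyword-repeat pattern described above can be reproduced with a few lines of Lucene code. This is a minimal sketch only, assuming Lucene 5.x+ APIs; the tokenizer choice, field name and sample text are illustrative, not taken from the thread.

import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.en.PorterStemFilter;
import org.apache.lucene.analysis.miscellaneous.KeywordRepeatFilter;
import org.apache.lucene.analysis.miscellaneous.RemoveDuplicatesTokenFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

public class KeywordRepeatDemo {
  public static void main(String[] args) throws IOException {
    // KeywordRepeatFilter emits each token twice: the first copy is flagged as a
    // keyword so the stemmer leaves it alone, the second copy gets stemmed.
    Analyzer analyzer = new Analyzer() {
      @Override
      protected TokenStreamComponents createComponents(String fieldName) {
        Tokenizer source = new WhitespaceTokenizer();
        TokenStream sink = new KeywordRepeatFilter(source);
        sink = new PorterStemFilter(sink);
        sink = new RemoveDuplicatesTokenFilter(sink); // drop copies the stemmer left unchanged
        return new TokenStreamComponents(source, sink);
      }
    };

    try (TokenStream ts = analyzer.tokenStream("body", "renal infarction")) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      PositionIncrementAttribute posInc = ts.addAttribute(PositionIncrementAttribute.class);
      ts.reset();
      while (ts.incrementToken()) {
        System.out.println(term + " (+" + posInc.getPositionIncrement() + ")");
      }
      ts.end();
    }
  }
}

Running it prints the unstemmed copy followed by the stemmed copy at the same position (position increment 0, e.g. "infarction (+1)" then "infarct (+0)"). That interleaving of extra tokens into the stream is exactly what a downstream phrase-aware filter such as AutoPhrasingTokenFilter never expects, which is the breakage described in the question.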