Hi Isaac,

In the process of writing Solr in Action (http://solrinaction.com), I built a solution to SOLR-5053 for the multilingual search chapter (I didn't realize this ticket existed at the time). I called it a "MultiTextField". Essentially, the field lets you map a list of defined prefixes to field types and dynamically substitute in one or more field types based upon the incoming content.
For example:

#schema.xml#

  <fieldType name="multiText" class="sia.ch14.MultiTextField"
             sortMissingLast="true" defaultFieldType="text_general"
             fieldMappings="en:text_english, es:text_spanish, fr:text_french"/>
  <fieldType name="text_english" ... />
  <fieldType name="text_spanish" ... />
  <fieldType name="text_french" ... />

  <field name="content" type="multiText" indexed="true" ... />

#document#

  <add><doc>
    <field name="id">1</field>
    <field name="content">en,es|the schools, la escuela</field>
  </doc></add>

#Outputted Token Stream#:

  [Position 1]  [Position 2]  [Position 3]  [Position 4]
  the           school        la            escuela
                schools                     escuel

#query on two languages#

  q=en,es|la OR en,es|escuela

Essentially, this MultiText field type lets you dynamically combine one or more Analyzers (each taken from a defined field type) and stack the resulting tokens based upon the term positions within each independent Analyzer. The use case here was multiple

To answer your original question: at query time, this implementation requires that you pass the prefix before EACH term in the query, not just the first term (you can see this in the "q=" I demonstrated above). If you have developed your own Token Filter, you could probably accomplish what you are trying to do the same way, by passing your parameter as a prefix on each term. I think you could also write a custom QParserPlugin that adds the prefix to each term for you.

Alternatively, it may be possible to create a similar implementation that makes use of a dynamic field name (i.e. "content|en,fr" as the field name), which would pull the prefix from the field name and apply it to all tokens instead of requiring/allowing each token to specify its own prefix. I haven't done this in my implementation, but I could see where it might be more user-friendly for many Solr users.

I'm just finishing up the "multilingual search" chapter and code now, and will be happy to post it to SOLR-5053 once I finish in the next few days if this would be helpful to you.
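As a rough illustration of the "prefix before EACH term" requirement, here is a minimal sketch in plain Java of the kind of rewriting a client (or a custom query parser) could do before the query reaches the field. It is not the code from the book and it does not use any Solr or Lucene API: the class name QueryPrefixer and the method prefixTerms are hypothetical, and it assumes a simple whitespace-separated query that uses only the AND, OR and NOT operators (no phrases, parentheses, or field:term syntax).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, for illustration only: prepends a language prefix
// such as "en,es" to every term of a simple whitespace-separated query.
final class QueryPrefixer {

    // Boolean operators that are passed through without a prefix.
    private static final Set<String> OPERATORS =
            new HashSet<>(Arrays.asList("AND", "OR", "NOT"));

    static String prefixTerms(String prefix, String query) {
        List<String> out = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // an empty query yields a single empty token
            }
            if (OPERATORS.contains(token) || token.indexOf('|') >= 0) {
                // Operators and terms that already carry a prefix stay as they are.
                out.add(token);
            } else {
                out.add(prefix + "|" + token);
            }
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        // Rewrites the bare query into the form used in the "q=" example.
        System.out.println(prefixTerms("en,es", "la OR escuela"));
    }
}
```

Calling prefixTerms("en,es", "la OR escuela") produces the "en,es|la OR en,es|escuela" string from the query example. Doing this inside a real QParserPlugin would mean working with the parsed query instead of the raw string, which this sketch does not attempt.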
-Trey

On Sat, Sep 21, 2013 at 4:15 PM, Isaac Hebsh <isaac.he...@gmail.com> wrote:

> Thought about that again,
> We can do this work as a search component, manipulating the query string.
> The cons are the double QParser work, and the double tokenization work.
>
> Another approach which might solve this issue easily is "Dynamic query
> analyze chain": https://issues.apache.org/jira/browse/SOLR-5053
>
> What would you do?
>
>
> On Tue, Sep 17, 2013 at 10:31 PM, Isaac Hebsh <isaac.he...@gmail.com>
> wrote:
>
> > Hi everyone,
> >
> > We developed a TokenFilter.
> > It should act differently, depending on a parameter supplied in the
> > query (for the query chain only, not the index one, of course).
> > We found no way to pass that parameter into the TokenFilter flow. I guess
> > that the root cause is that TokenFilter is a pure Lucene object.
> >
> > As a last resort, we tried to pass the parameter as the first term in the
> > query text (q=...), and save it as a member of the TokenFilter instance.
> >
> > Although it is ugly, it might work fine.
> > But the problem is that it is not guaranteed that all the terms of a
> > particular query will be analyzed by the same instance of a TokenFilter.
> > In this case, some terms will be analyzed without the required information
> > of that "parameter". We can produce such a race very easily.
> >
> > How should I overcome this issue?
> > Does anyone have a better resolution?
> >