Hi Isaac,

In the process of writing Solr in Action (http://solrinaction.com), I
built a solution to SOLR-5053 for the multilingual search chapter (I
didn't realize this ticket existed at the time).  The solution was
something I called a "MultiTextField".  Essentially, the field lets you
map a list of defined prefixes to field types and dynamically substitute
in one or more of those field types based upon the incoming content.

For example:

#schema.xml#
 <fieldType name="multiText"
        class="sia.ch14.MultiTextField" sortMissingLast="true"
        defaultFieldType="text_general"
        fieldMappings="en:text_english,
                       es:text_spanish,
                       fr:text_french"/>

<fieldType name="text_english" ... />
<fieldType name="text_spanish" ... />
<fieldType name="text_french" ... />

<field name="content" type="multiText" indexed="true" ... />

#document#
<add><doc>
  <field name="id">1</field>
  <field name="content">en,es|the schools, la escuela</field>
</doc></add>

#Output Token Stream#
[Position 1]   [Position 2]   [Position 3]   [Position 4]
the            school         la             escuela
               schools                       escuel
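To make the "stacking" concrete, here is a minimal sketch that merges the per-position output of several analyzers and drops duplicate terms at the same position, reproducing the stream above. It is an illustration, not the MultiTextField code: it assumes each analyzer emits exactly one token per position (no stopword gaps), and it uses plain lists instead of Lucene's TokenStream. In a real token stream, the first token at a position would carry a position increment of 1 and each stacked token an increment of 0. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch: stacks the outputs of several analyzers by position.
 * Plain lists stand in for Lucene TokenStreams so it runs without dependencies.
 */
public class TokenStacker {

    /** Merges analyzer outputs position by position, dropping duplicate terms at the same position. */
    public static List<Set<String>> stack(List<List<String>> analyzerOutputs) {
        List<Set<String>> positions = new ArrayList<>();
        for (List<String> tokens : analyzerOutputs) {
            for (int pos = 0; pos < tokens.size(); pos++) {
                // Grow the position list if this analyzer produced more tokens than earlier ones.
                while (positions.size() <= pos) {
                    positions.add(new LinkedHashSet<>());
                }
                // A set keeps "the" and "la" from appearing twice at one position.
                positions.get(pos).add(tokens.get(pos));
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        // Assumed outputs of text_english and text_spanish for "the schools, la escuela".
        List<String> english = List.of("the", "school", "la", "escuela");
        List<String> spanish = List.of("the", "schools", "la", "escuel");
        List<Set<String>> stacked = stack(List.of(english, spanish));
        for (int i = 0; i < stacked.size(); i++) {
            System.out.println("Position " + (i + 1) + ": " + stacked.get(i));
        }
    }
}
```

Running it prints one line per position, with "school" and "schools" sharing Position 2 and "escuela" and "escuel" sharing Position 4.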

#query on two languages#
q=en,es|la OR en,es|escuela

In short, this MultiText field type lets you dynamically combine one or
more Analyzers (each taken from a defined field type) and stack their
tokens based upon term positions within each independent Analyzer.  The
use case here was multiple languages in a single field.
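As a rough illustration of the prefix handling described above, the sketch below parses a `fieldMappings` string like the one in the schema and picks the field types named by an "en,es|..." value, falling back to `defaultFieldType` when there is no prefix. This is not the actual sia.ch14 implementation. The class and method names are hypothetical, and it assumes that unknown prefixes also fall back to the default type and that the first "|" always ends the prefix.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of MultiTextField-style prefix parsing; not the book's code. */
public class MultiTextPrefix {

    /** Parses "en:text_english, es:text_spanish" into an ordered prefix -> field type map. */
    public static Map<String, String> parseMappings(String fieldMappings) {
        Map<String, String> mappings = new LinkedHashMap<>();
        for (String entry : fieldMappings.split(",")) {
            // trim() also removes the newlines/indentation used in schema.xml.
            String[] pair = entry.trim().split(":", 2);
            if (pair.length == 2) {
                mappings.put(pair[0].trim(), pair[1].trim());
            }
        }
        return mappings;
    }

    /** Returns the field types selected by the value's prefix, or the default type if none is given. */
    public static List<String> selectFieldTypes(String value, Map<String, String> mappings, String defaultType) {
        List<String> types = new ArrayList<>();
        int bar = value.indexOf('|');
        if (bar < 0) {
            types.add(defaultType); // no prefix: use defaultFieldType
            return types;
        }
        for (String prefix : value.substring(0, bar).split(",")) {
            // Assumption: an unknown prefix falls back to the default type.
            String type = mappings.getOrDefault(prefix.trim(), defaultType);
            if (!types.contains(type)) {
                types.add(type);
            }
        }
        return types;
    }

    /** Strips the "en,es|" prefix, leaving only the text to analyze. */
    public static String stripPrefix(String value) {
        int bar = value.indexOf('|');
        return bar < 0 ? value : value.substring(bar + 1);
    }

    public static void main(String[] args) {
        Map<String, String> mappings = parseMappings("en:text_english,\n es:text_spanish,\n fr:text_french");
        String value = "en,es|the schools, la escuela";
        System.out.println(selectFieldTypes(value, mappings, "text_general")); // [text_english, text_spanish]
        System.out.println(stripPrefix(value)); // the schools, la escuela
    }
}
```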

To answer your original question... at query time, this implementation
requires that you pass the prefix before EACH term in the query, not just
the first term (you can see this in the "q=" I demonstrated above).  Since
you have developed your own TokenFilter, you "could" probably accomplish
what you are trying to do the same way.
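Because the prefix has to precede every term, a client (or a custom query parser) could add it automatically. Below is a minimal string-level sketch of that rewrite. It assumes whitespace-separated terms and uppercase AND/OR/NOT operators only; it does not handle phrases, parentheses, or field:term syntax, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: prepends a language prefix to every term of a simple query string. */
public class QueryPrefixer {

    private static final Set<String> OPERATORS = Set.of("AND", "OR", "NOT");

    public static String prefixTerms(String query, String languages) {
        if (query.isBlank()) {
            return query; // nothing to prefix
        }
        List<String> out = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) {
            if (OPERATORS.contains(token) || token.contains("|")) {
                out.add(token); // keep boolean operators and already-prefixed terms unchanged
            } else {
                out.add(languages + "|" + token);
            }
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        System.out.println(prefixTerms("la OR escuela", "en,es")); // en,es|la OR en,es|escuela
    }
}
```

Leaving already-prefixed terms alone means a user can still override the language for an individual term.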

I think you could write a custom QParserPlugin that adds the prefix to
each term for you.  Alternatively, it may be possible to create a similar
implementation that makes use of a dynamic field name (e.g. "content|en,fr"
as the field name), which would pull the prefix from the field name and
apply it to all tokens instead of requiring (or allowing) each token to
specify its own prefix.  I haven't done this in my implementation, but I
could see where it might be more user-friendly for many Solr users.
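The field-name variant boils down to splitting a name like "content|en,fr" into the real field and a language list, then applying that list as the prefix for every token. The sketch below shows only that string handling. It is a hypothetical illustration: it does not check whether Solr's query syntax accepts "|" in field names unescaped, and the class and record names are made up for this example.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: extracts languages from a dynamic field name such as "content|en,fr". */
public class LanguageFieldName {

    /** The real field name plus the languages parsed from the suffix. */
    public record Parsed(String baseField, List<String> languages) {}

    public static Parsed parse(String fieldName) {
        int bar = fieldName.indexOf('|');
        if (bar < 0) {
            // No languages given: a caller would fall back to the default field type.
            return new Parsed(fieldName, List.of());
        }
        List<String> languages = Arrays.stream(fieldName.substring(bar + 1).split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .toList();
        return new Parsed(fieldName.substring(0, bar), languages);
    }

    public static void main(String[] args) {
        Parsed parsed = parse("content|en,fr");
        // The same prefix would be applied to every token in the query.
        String prefix = String.join(",", parsed.languages());
        System.out.println(parsed.baseField()); // content
        System.out.println(prefix + "|escuela"); // en,fr|escuela
    }
}
```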

I'm just finishing up the "multilingual search" chapter and code now, and
will be happy to post the code to SOLR-5053 once I finish in the next few
days, if that would be helpful to you.

-Trey


On Sat, Sep 21, 2013 at 4:15 PM, Isaac Hebsh <isaac.he...@gmail.com> wrote:

> Thought about that again,
> We can do this work as a search component, manipulating the query string.
> The downsides are doing the QParser work twice and the tokenization twice.
>
> Another approach which might solve this issue easily is "Dynamic query
> analyze chain": https://issues.apache.org/jira/browse/SOLR-5053
>
> What would you do?
>
>
> On Tue, Sep 17, 2013 at 10:31 PM, Isaac Hebsh <isaac.he...@gmail.com>
> wrote:
>
> > Hi everyone,
> >
> > We developed a TokenFilter.
> > It should act differently depending on a parameter supplied in the
> > query (for the query chain only, not the index one, of course).
> > We found no way to pass that parameter into the TokenFilter flow. I guess
> > the root cause is that TokenFilter is a pure Lucene object.
> >
> > As a last resort, we tried to pass the parameter as the first term in the
> > query text (q=...), and save it as a member of the TokenFilter instance.
> >
> > Although it is ugly, it might work fine.
> > But the problem is that it is not guaranteed that all the terms of a
> > particular query will be analyzed by the same instance of a TokenFilter. In
> > this case, some terms will be analyzed without the required information of
> > that "parameter". We can produce such a race very easily.
> >
> > How should I overcome this issue?
> > Does anyone have a better solution?
> >
>
