Hi Alessandro, hi Alexandre,

Thanks a lot for your replies, considerations, and hints. We use a web front end that comes bundled with Solr. It currently uses a single-core approach. We would like to stick to the original setup as closely as possible, both to avoid administrative overhead and to keep open the option of using several cores in a different context later. This is why we would like to hide the language fields completely from the front end, apart from specifying an additional language parameter. Language detection at indexing time is currently not an issue for us: we receive the input in a standardized format and can therefore determine the language beforehand.

https://github.com/treygrainger/solr-in-action/blob/master/example-docs/ch14/cores/multi-language-field/conf/schema.xml shows an example of how the multiText field type makes use of language-specific field types to specify the analyzers that are used. The core issue for us (pun intended ;-)) is to find out whether this approach can be extended to return only the selected language(s), i.e. to transparently add something like nested documents.
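
For illustration only, the "constant field name plus language parameter" idea can be approximated outside the schema with Solr's field aliasing in the fl parameter (fl=alias:field). The sketch below assumes per-language stored fields named <field>_<code> (e.g. title_fr), a lang value of the form "XX_YY" whose first part is the language code, and a thin layer in front of Solr that adds the fl value. The field names and the helper are hypothetical, and this does not make the multiText field type itself language-aware.

```python
# Hypothetical generic field names that the front end should keep seeing.
GENERIC_FIELDS = ("title", "text")


def fl_for_language(lang, fields=GENERIC_FIELDS):
    """Build an fl value that aliases per-language fields to generic names.

    Assumes stored fields such as title_en or title_fr exist in the schema.
    """
    # "fr_FR" -> "fr"; a bare "fr" is accepted as well.
    code = lang.split("_", 1)[0].lower()
    if not code.isalpha():
        raise ValueError(f"unexpected language value: {lang!r}")
    aliases = [f"{name}:{name}_{code}" for name in fields]
    return ",".join(["id"] + aliases)


if __name__ == "__main__":
    print(fl_for_language("fr_FR"))
```

With such an fl value, the response would carry the content of title_fr under the key title, so the front end would only need to pass the language parameter along.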

Best regards

Johannes


On 06.06.2016 10:10, Alessandro Benedetti wrote:
Hi Johannes,
Nothing out of the box, unfortunately, but it could be a nice idea and
contribution.
If a multi-core setup is not an option (out of curiosity, can I ask
why?), you could proceed in this way:

1) In the schema, you define N variations of each field you are interested
in, where N is the number of languages you want to support.
Given, for example, the text field, you define:
text not indexed, only stored
text_en indexed
text_fr indexed
text_it indexed ...

2) At indexing time, you can develop a custom UpdateRequestProcessor that
identifies the language (Solr's internal libraries offer support for
that) and routes the content to the correct text field for indexing.
If you also want to index translations, you need to rely on third-party
libraries for that.

3) At query time, you can search all the fields you want in parallel, for
example with the edismax query parser.

4) For rendering the results, it is not entirely clear to me what you want:

a) Translate the document content into the language you want. You could
develop a custom DocTransformer that takes the language as input and
translates, but I don't see much benefit in that.

b) Return only the documents that were originally in that language. This
case is easy: you add an fq at query time to keep only the documents in the
language you want (the language is identified at indexing time).

c) Return the original content of the document. This is quite easy: you can
store the generic "text" field and always return that.
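
As a rough illustration of steps 3, 4b and 4c, the request parameters could be assembled on the client side as below. This is only a sketch: the language list, the "language" field used in the fq, and the helper name are assumptions, not something Solr provides out of the box.

```python
from urllib.parse import urlencode

# Assumed language suffixes; the schema is expected to define text_en,
# text_fr and text_it, plus a "language" field filled at indexing time.
LANGUAGES = ("en", "fr", "it")


def build_params(user_query, field="text", only_lang=None):
    """Return Solr request parameters as a list of (name, value) pairs."""
    params = [
        ("q", user_query),
        ("defType", "edismax"),
        # Step 3: search all language variations of the field in parallel.
        ("qf", " ".join(f"{field}_{lang}" for lang in LANGUAGES)),
        # Step 4c: always return the stored generic field.
        ("fl", f"id,{field}"),
    ]
    if only_lang is not None:
        if only_lang not in LANGUAGES:
            raise ValueError(f"unsupported language: {only_lang!r}")
        # Step 4b: keep only documents originally written in that language.
        params.append(("fq", f"language:{only_lang}"))
    return params


if __name__ == "__main__":
    # The encoded string can be appended to a core's /select URL.
    print(urlencode(build_params("maison", only_lang="fr")))
```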

Let us know for further discussion,

Cheers

On Sun, Jun 5, 2016 at 9:57 PM, Riedl, Johannes <
johannes.ri...@uni-tuebingen.de> wrote:

Hi all,

we are currently looking for a way to switch between different
languages in the query results while keeping the possibility to
search in several languages in parallel. The overall aim is a
constant field name and an additional Solr parameter "lang=XX_YY" that
returns the results in the chosen language while searches are
applied to all languages. Setting up several cores to obtain a generic
field name is not an option. Does anyone know of a clean way to achieve
this, in particular routing content indexed to a generic field (e.g. title)
to a "background field" (e.g. title_en, title_fr) on the fly and
retrieving it from there depending on the language chosen?

Background: So far, we have investigated the multi-language field approach
offered by Trey Grainger in the code examples for "Solr in Action" (
https://github.com/treygrainger/solr-in-action.git, chapter 14). It is an
extension of the ordinary textField that allows the use of a generic field
name: the language is encoded at the beginning of the field content, and
the appropriate index and query analyzers are taken from dummy fields in
schema.xml. If there is a way to store data in these dummy fields and to
add the lang parameter on top of that, we might be done.

Thanks a lot, best regards

Johannes



