Permutations of entries in a multivalued field

2015-12-16 Thread Johannes Riedl

Hello all,

we are facing the following problem: we use a multivalued string field 
whose entries have the form A/B/C/, where A, B and C are terms.
We are looking for a simple way to also find all permutations of 
A/B/C, e.g. B/A/C. As a workaround, we added a second field that contains 
every entry with its terms sorted alphabetically, and we sort the query 
terms on the user side. Since this is limited in some ways: is there a 
simple way either to index such that only A/B/C and its permutations 
are found (type=text is not an option, since a term could 
occur in a different entry of the multivalued field), or to trigger an 
alphabetical sorting of incoming queries?
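
A minimal sketch of the sorted-field workaround described above, assuming 
entries are slash-separated and that plain code-point ordering is an 
acceptable "alphabetical" sort. The function name normalize_entry is 
hypothetical; the same function is applied to entries at index time and to 
queries before they are sent to Solr.

```python
def normalize_entry(entry):
    """Sort the slash-separated terms of an entry and keep the trailing slash."""
    terms = [term for term in entry.split("/") if term]
    return "/".join(sorted(terms)) + "/"


# Index time: the second (sorted) field holds the normalized form of each entry.
indexed_entries = ["A/B/C/", "D/E/"]
sorted_field = {normalize_entry(entry) for entry in indexed_entries}

# Query time: normalize the incoming query the same way, then match exactly.
assert normalize_entry("B/A/C/") in sorted_field
assert normalize_entry("C/D/") not in sorted_field
```

Because both sides go through the same normalization, every permutation of 
A/B/C maps to the same string, and terms from different entries can never 
combine into a match.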


Thanks a lot for your feedback, best regards

Johannes



Re: Permutations of entries in a multivalued field

2015-12-21 Thread Johannes Riedl

Thanks a lot for these useful hints.

Best,

Johannes

On 18.12.2015 20:59, Allison, Timothy B. wrote:

Duh, I didn't realize you could set inOrder in Solr. Yes, that's the better 
solution.

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Friday, December 18, 2015 2:27 PM
To: solr-user 
Subject: Re: Permutations of entries in a multivalued field

The other thing to check is the ComplexPhraseQueryParser, see:
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser

It uses the Span queries to build up the query...
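
A small sketch of what such a request could look like, assuming a text 
field named entries_txt (a hypothetical name) that holds the terms as 
separate tokens. It only builds the request parameters; inOrder=false is 
the parser option referred to elsewhere in this thread.

```python
from urllib.parse import urlencode


def complexphrase_params(field, terms):
    """Build Solr parameters for an unordered phrase query on `field`."""
    phrase = " ".join(terms)
    # Doubled braces are literal braces inside an f-string.
    return {"q": f'{{!complexphrase inOrder=false}}{field}:"{phrase}"'}


params = complexphrase_params("entries_txt", ["A", "B", "C"])
print(params["q"])        # the raw query string
print(urlencode(params))  # URL-encoded form for a /select request
```

Whether a phrase without slop matches every permutation in a given setup 
should be verified against the index, since it depends on how the field is 
tokenized.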

Best,
Erick

On Fri, Dec 18, 2015 at 11:23 AM, Allison, Timothy B.
 wrote:

Hi Johannes,
   I suspect that Scott's answer would be more efficient than the following, 
and I may be misunderstanding the problem!

  This type of search is supported at the Lucene level by a SpanNearQuery 
with inOrder set to false.

  So, how do you get a SpanQuery in Solr?  You might want to look at the 
SurroundQueryParser, and I have an alternate (LUCENE-5205/SOLR-5410) here: 
https://github.com/tballison/lucene-addons.

  If you do find an appropriate parser, make sure that the position 
increment gap (positionIncrementGap) on your text field definition is > 0. 
Then you'd never incorrectly get a hit that spans separate entries of the 
field, such as:

[0] A B
[1] C
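
To illustrate why the gap matters, here is a small simulation (not Lucene 
code) of token positions in a multivalued field and of an unordered "near" 
match. The function names and the matching rule are simplifying 
assumptions: a match needs all terms at distinct positions inside a window 
that is at most `slop` positions wider than the number of terms.

```python
from itertools import product


def positions(values, gap):
    """Map each term to its token positions, adding `gap` between values."""
    index = {}
    pos = 0
    for i, value in enumerate(values):
        if i > 0:
            pos += gap  # extra distance between consecutive field entries
        for term in value.split():
            index.setdefault(term, []).append(pos)
            pos += 1
    return index


def unordered_near(index, terms, slop=0):
    """True if all terms occur, in any order, within the allowed window."""
    if any(term not in index for term in terms):
        return False
    for combo in product(*(index[term] for term in terms)):
        distinct = len(set(combo)) == len(terms)
        if distinct and max(combo) - min(combo) - (len(terms) - 1) <= slop:
            return True
    return False


entries = ["A B", "C"]
# Without a gap, the terms of two different entries look adjacent.
assert unordered_near(positions(entries, gap=0), ["C", "A", "B"])
# With a gap, the same query no longer matches across entries.
assert not unordered_near(positions(entries, gap=100), ["C", "A", "B"])
# A single entry still matches in any order.
assert unordered_near(positions(["A B C"], gap=100), ["B", "C", "A"])
```

In this model a cross-entry hit would only reappear if the slop were at 
least as large as the gap, which is why the gap is usually chosen much 
larger than any slop used in queries.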

Best,
Tim

On Wed, Dec 16, 2015 at 8:38 AM, Johannes Riedl < 
johannes.ri...@uni-tuebingen.de> wrote:






--
Scott Stults | Founder & Solutions Architect | OpenSource Connections,
LLC
| 434.409.2780
http://www.opensourceconnections.com




Re: Multilingual Solr

2016-06-06 Thread Johannes Riedl

Hi Alessandro, hi Alexandre,

Thanks a lot for your reply and your considerations and hints. We use a 
web front end that comes bundled with Solr. It currently uses a single 
core approach. We would like to stick to the original setup as closely 
as possible to avoid administrative overhead and to not prevent the 
possible use of several cores in a different context in the future. This 
is the reason why we would like to hide the language fields completely 
from the front end apart from specifying an additional language 
parameter. Language detection on indexing is currently not an issue for 
us, as we get the input in a standardized format and thus can determine 
the language beforehand.


https://github.com/treygrainger/solr-in-action/blob/master/example-docs/ch14/cores/multi-language-field/conf/schema.xml 
shows an example of how the multiText field type uses language-specific 
field types to specify the analyzers to be applied. The 
core issue for us (pun intended ;-)) is to find out whether it is 
possible to extend this approach so that only the selected 
language(s) are returned, i.e. to transparently add something like nested 
documents.
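
One way to picture the desired behaviour, sketched here as a client-side 
rewrite rather than anything built into Solr: a thin layer translates the 
"lang" parameter into field aliases in the fl parameter, so the front end 
only ever sees the generic field names. The function name, the per-language 
suffix convention (title_en, title_fr) and the handling of "XX_YY" values 
are assumptions made for illustration.

```python
def fl_for_language(generic_fields, lang):
    """Return an fl value that aliases language fields to generic names."""
    # Assumption: "fr_FR" selects fields with the suffix "_fr".
    suffix = lang.split("_")[0].lower()
    return ",".join(f"{name}:{name}_{suffix}" for name in generic_fields)


print(fl_for_language(["title", "text"], "fr_FR"))
# title:title_fr,text:text_fr
```

This keeps the language fields out of the front end, but it is not 
transparent on the Solr side; the rewrite would have to live in a proxy or 
in a custom component.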


Best regards

Johannes


On 06.06.2016 10:10, Alessandro Benedetti wrote:

Hi Johannes,
there is nothing out of the box, unfortunately, but it could be a nice idea
and contribution.
If a multi-core setup is not an option (out of curiosity, can I
ask why?),
you could proceed in this way:

1) In the schema, you define N variants of each field you are interested
in, where N is the number of languages you support.
For the text field, for example, you define:
text: not indexed, only stored
text_en: indexed
text_fr: indexed
text_it: indexed ...

2) At indexing time, you can develop a custom updateRequestProcessor that
identifies the language (Solr's internal libraries offer support for
that) and routes the content to the matching text field.
If you also want to index translations, you need to rely on third-party
libraries for that.

3) At query time, you can search all the fields you want in parallel, for
example with the edismax query parser.

4) For rendering the results, it is not entirely clear to me what you want:

a) Translate the document content into the language you want. You could
develop a custom DocTransformer that takes the language as input and
translates, but I don't see much benefit in that.

b) Return only the documents that were originally in that language. This
case is easy: you add an fq at query time to keep only the documents of the
language you want (the language is identified at indexing time).

c) Return the original content of the document. This is quite easy: you
store the generic "text" field and always return that.
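
Steps 1) to 4b) can be sketched roughly as follows. This is only an 
illustration in client-side Python, not an updateRequestProcessor; the 
"language" field used for filtering and the helper names are assumptions, 
and the language is taken as known in advance instead of being detected.

```python
LANGUAGES = ("en", "fr", "it")


def route_document(doc_id, text, lang):
    """Build a document with the stored generic field and one indexed variant."""
    if lang not in LANGUAGES:
        raise ValueError(f"unsupported language: {lang}")
    return {
        "id": doc_id,
        "text": text,            # stored only, returned to the front end
        f"text_{lang}": text,    # indexed with the language-specific analyzer
        "language": lang,        # hypothetical field used for filtering
    }


def build_query(user_query, lang=None):
    """Search all language variants; optionally keep one language only."""
    params = {
        "defType": "edismax",
        "q": user_query,
        "qf": " ".join(f"text_{code}" for code in LANGUAGES),
        "fl": "id,text",
    }
    if lang is not None:
        params["fq"] = f"language:{lang}"
    return params


doc = route_document("1", "bonjour le monde", "fr")
query = build_query("monde", lang="fr")
```

The search always runs over every text_XX field, while the optional fq 
restricts the hits to documents of one language, which corresponds to 
option b) above.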

Let us know for further discussion,

Cheers

On Sun, Jun 5, 2016 at 9:57 PM, Riedl, Johannes <
johannes.ri...@uni-tuebingen.de> wrote:


Hi all,

we are currently looking for a way to switch between different
languages in the query results while keeping the possibility to
search in several languages in parallel.  The overall aim would be a
constant field name and an additional Solr parameter "lang=XX_YY" that
returns the results in the chosen language while searches are
applied to all languages. Setting up several cores to obtain a generic
field name is not an option. Does anyone know of a clean way to achieve
this, in particular routing content indexed to a generic field (e.g. title)
to a "background field" (e.g. title_en, title_fr, etc.) on the fly and
retrieving it from there depending on the language chosen?

Background: So far, we have investigated the multi-language field approach
offered by Trey Grainger in the code examples for "Solr in Action" (
https://github.com/treygrainger/solr-in-action.git, chapter 14). It is an
extension of the ordinary textField that allows a generic field name to be
used: the language is encoded at the beginning of the field content, and
the appropriate index and query analyzers are taken from dummy fields in
schema.xml. If there is a way to store data in these dummy fields, and
a lang parameter is added on top of that, we might be done.
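
To make the "language encoded at the beginning of the field content" idea 
concrete, here is a small sketch of splitting such a value. The exact 
encoding is an assumption: a comma-separated list of language codes 
followed by a "|" delimiter is used purely for illustration, and the 
function name is hypothetical.

```python
def split_language_prefix(value, default_lang="en", delimiter="|"):
    """Split 'en,fr|some text' into (['en', 'fr'], 'some text')."""
    prefix, found, rest = value.partition(delimiter)
    if not found:
        # No prefix present: fall back to the default language.
        return [default_lang], value
    langs = [code.strip() for code in prefix.split(",") if code.strip()]
    return (langs or [default_lang]), rest


langs, content = split_language_prefix("fr|bonjour le monde")
# langs selects which language-specific analyzers (or fields) to use;
# content is what actually gets analyzed.
```

With the languages separated from the content, the same information could 
in principle be used to pick a per-language target field instead of only an 
analyzer.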

Thanks a lot, best regards

Johannes