Permutations of entries in a multivalued field
Hello all,
we are facing the following problem: we use a multivalued string field that contains entries of the kind A/B/C/, where A, B and C are terms. We are now looking for a simple way to also find all permutations of A/B/C, e.g. B/A/C. As a workaround we added a new field that contains all entries with their terms sorted alphabetically, and we guarantee the same sorting of queries on the user side. However, since this is limited in some ways: is there a simple way either to index such that solely A/B/C and all its permutations are found (using e.g. type=text is not an option, since a term could occur in a different entry of the multivalued field), or to trigger an alphabetical sorting of incoming queries?
Thanks a lot for your feedback, best regards
Johannes
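The workaround described in this message (a second field holding each entry with its terms sorted, plus the same sorting applied to incoming queries) could be sketched roughly as follows. This is only an illustration, not the poster's actual code: the function name `canonical_entry` and the handling of the trailing `/` are assumptions.

```python
def canonical_entry(entry: str, sep: str = "/") -> str:
    """Return the entry with its terms sorted alphabetically.

    Empty terms (e.g. from a trailing separator as in "A/B/C/") are dropped.
    """
    terms = [t.strip() for t in entry.split(sep) if t.strip()]
    return sep.join(sorted(terms))


# Index side: store the canonical form in an extra field next to the original.
entries = ["A/B/C/", "D/A"]
sorted_entries = [canonical_entry(e) for e in entries]

# Query side: canonicalize the user's input the same way before searching.
query = canonical_entry("B/A/C")
print(sorted_entries, query, query in sorted_entries)
```

Because both sides are normalized with the same function, every permutation of an entry maps to one string, and an exact match on the extra string field is enough.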
Re: Permutations of entries in a multivalued field
Thanks a lot for these useful hints.
Best, Johannes

On 18.12.2015 20:59, Allison, Timothy B. wrote:
Duh, didn't realize you could set inOrder in Solr. Y, that's the better solution.

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Friday, December 18, 2015 2:27 PM
To: solr-user
Subject: Re: Permutations of entries in a multivalued field
The other thing to check is the ComplexPhraseQueryParser, see: https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser
It uses the Span queries to build up the query...
Best, Erick

On Fri, Dec 18, 2015 at 11:23 AM, Allison, Timothy B. wrote:
Hi Johannes,
I suspect that Scott's answer would be more efficient than the following, and I may be misunderstanding the problem! This type of search is supported at the Lucene level by a SpanNearQuery with inOrder set to false. So, how do you get a SpanQuery in Solr? You might want to look at the SurroundQueryParser, and I have an alternate (LUCENE-5205/SOLR-5410) here: https://github.com/tballison/lucene-addons.
If you do find an appropriate parser, make sure that your position increment gap is > 0 on your text field definition, and then you'd never incorrectly get a hit across field entries of:
[0] A B
[1] C
Best, Tim

On Wed, Dec 16, 2015 at 8:38 AM, Johannes Riedl <johannes.ri...@uni-tuebingen.de> wrote:
Hello all,
we are facing the following problem: we use a multivalued string field that contains entries of the kind A/B/C/, where A,B,C are terms. We are now looking for a simple way to also find all permutations of A/B/C, so e.g. B/A/C. As a workaround we added a new field that contains all entries alphabetically sorted and guarantee sorting on the user side. However - since this is limited in some ways - is there a simple way to either index in a way such that solely A/B/C and all permutations are found (using e.g. type=text is not an option since a term could occur in a different entry of the multivalued field) or trigger an alphabetical sorting of incoming queries.
Thanks a lot for your feedback, best regards
Johannes

--
Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC | 434.409.2780
http://www.opensourceconnections.com
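To make Tim's point about the position increment gap concrete, here is a small pure-Python model of unordered proximity matching over a multivalued field. It is a simplified simulation, not Lucene's SpanNearQuery implementation: the helper names `positions` and `unordered_near` are made up for this sketch, and tokenization is assumed to be plain whitespace splitting.

```python
from itertools import product


def positions(values, gap):
    """Assign token positions across the values of a multivalued field.

    Simplified model: tokens are split on whitespace, and `gap` extra
    positions are skipped between consecutive values.
    """
    index, pos = {}, 0
    for i, value in enumerate(values):
        if i > 0:
            pos += gap
        for token in value.split():
            index.setdefault(token, []).append(pos)
            pos += 1
    return index


def unordered_near(index, terms, slop=0):
    """True if all terms occur within one window, in any order."""
    if any(t not in index for t in terms):
        return False
    for combo in product(*(index[t] for t in terms)):
        if len(set(combo)) < len(terms):
            continue  # the same position cannot serve two terms
        if max(combo) - min(combo) + 1 - len(terms) <= slop:
            return True
    return False


entries = ["A B", "C"]  # the two field entries from Tim's example
print(unordered_near(positions(entries, gap=0), ["C", "A", "B"]))
print(unordered_near(positions(entries, gap=100), ["C", "A", "B"]))
print(unordered_near(positions(["B A C"], gap=100), ["A", "B", "C"]))
```

With a gap of 0 the three terms look adjacent even though they come from different entries; with a positive gap only terms inside one entry can match. On the Solr side, the equivalent request would presumably use the ComplexPhraseQueryParser with its inOrder local parameter, along the lines of `{!complexphrase inOrder=false}myfield:"A B C"`, where `myfield` is a hypothetical tokenized field.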
Re: Multilingual Solr
Hi Alessandro, hi Alexandre,
Thanks a lot for your replies, considerations and hints. We use a web front end that comes bundled with Solr. It currently uses a single-core approach. We would like to stick to the original setup as closely as possible, both to avoid administrative overhead and to keep open the possibility of using several cores in a different context in the future. This is why we would like to hide the language fields completely from the front end, apart from specifying an additional language parameter. Language detection at indexing time is currently not an issue for us, as we get the input in a standardized format and can thus determine the language beforehand.
https://github.com/treygrainger/solr-in-action/blob/master/example-docs/ch14/cores/multi-language-field/conf/schema.xml shows an example of how the multiText field type makes use of language-specific field types to specify the analyzers that are being used. The core issue for us (pun intended ;-)) is to find out whether it is possible to extend this approach to only return the selected language(s), i.e. to transparently add something like nested documents.
Best regards
Johannes

On 06.06.2016 10:10, Alessandro Benedetti wrote:
Hi Johannes,
nothing out of the box unfortunately, but it could be a nice idea and contribution. If having a multi-core setup is not an option (out of curiosity, can I ask why?), you could proceed in this way:
1) In the schema, you define N field variations for each field you are interested in, where N is the number of languages you support. Given for example the text field, you define:
text: not indexed, only stored
text_en: indexed
text_fr: indexed
text_it: indexed
...
2) At indexing time you can develop a custom updateRequestProcessor that identifies the language (Solr internal libraries offer support for that) and addresses the correct text field to index the content. If you also want to index translations, you need to rely on some third-party libraries to do that.
3) At query time you can address all the fields you want in parallel, for example with the edismax query parser.
4) For rendering the results, it is not entirely clear to me what you want. Do you want to:
a) translate the document content into the language you want? You could develop a custom DocTransformer that takes the language as input and translates, but I don't see much benefit in that.
b) return only the documents that were originally in that language? This case is easy: you add an fq at query time to filter only the documents of the language you want (at indexing time you identify the language).
c) return the original content of the document? This is quite easy: you can store the generic "text" field and always return that.
Let us know for further discussion,
Cheers

On Sun, Jun 5, 2016 at 9:57 PM, Riedl, Johannes <johannes.ri...@uni-tuebingen.de> wrote:
Hi all,
we are currently in search of a solution for switching between different languages in the query results while keeping the possibility to perform a search in several languages in parallel. The overall aim would be a constant field name and an additional Solr parameter "lang=XX_YY" that allows returning the results in the chosen language while searches are applied to all languages. Setting up several cores to obtain a generic field name is not an option. Does anyone know of a clean way to achieve this, particularly routing content indexed to a generic field (e.g. title) to a "background field" (e.g. title_en, title_fr etc.) on the fly and retrieving it from there depending on the language chosen?
Background: So far, we have investigated the multi-language field approach offered by Trey Grainger in the code examples for "Solr in Action" (https://github.com/treygrainger/solr-in-action.git, chapter 14). It is an extension to the ordinary textField that allows using a generic field name: the language is encoded at the beginning of the field content, and appropriate index and query analyzers are associated with dummy fields in schema.xml. If there is a way to store data in these dummy fields, and the lang parameter is added in addition, we might be done.
Thanks a lot, best regards
Johannes
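The field-per-language scheme from Alessandro's steps 1 to 3, together with the fq filter from option 4b, could be sketched on the client side roughly like this. It is a minimal sketch under several assumptions: the language list, the two-letter codes (rather than the "XX_YY" form), the `language` field and both function names are hypothetical, and only the request parameters `q`, `defType`, `qf` and `fq` are standard Solr parameters.

```python
LANGUAGES = ("en", "fr", "it")  # assumed set of supported languages


def route_fields(doc, lang, generic_fields=("title", "text")):
    """Index side: copy generic fields to their language-specific variants."""
    if lang not in LANGUAGES:
        raise ValueError(f"unsupported language: {lang}")
    routed = dict(doc)  # keep the generic (stored-only) fields as they are
    for name in generic_fields:
        if name in doc:
            routed[f"{name}_{lang}"] = doc[name]
    routed["language"] = lang  # hypothetical field used for filtering
    return routed


def query_params(user_query, lang=None, field="text"):
    """Query side: search all language variants, optionally filter by language."""
    params = {
        "q": user_query,
        "defType": "edismax",
        "qf": " ".join(f"{field}_{code}" for code in LANGUAGES),
    }
    if lang is not None:
        params["fq"] = f"language:{lang}"
    return params


print(route_fields({"id": "1", "text": "bonjour"}, "fr"))
print(query_params("bonjour", lang="fr"))
```

This keeps the generic field name visible to the front end while the language-specific fields stay in the background; whether the routing should live in the client or in a custom update request processor inside Solr is a separate design decision.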