Thanks Trey! Last week I ordered the eBook. I look forward to seeing the information in it.
Jeremy

On Thu, Mar 27, 2014 at 6:03 PM, Trey Grainger <solrt...@gmail.com> wrote:
> In addition to the two approaches Liu Bo mentioned (a separate core per language and a separate field per language), it is also possible to put multiple languages in a single field. This saves you the overhead of multiple cores and of having to search across multiple fields at query time. The idea is that you can run multiple analyzers (e.g. one for German, one for English, one for Chinese) and stack the resulting TokenStreams from each of them within a single field. If you really need to for advanced use cases, it is also possible to swap out the languages you want to use on a case-by-case basis (i.e. per document, per field, or even per word).
>
> All three of these methods, including code examples and the pros and cons of each, are discussed in the Multilingual Search chapter of Solr in Action, which Alexandre referenced. If you don't have the book, you can also download and run the code examples for free, though they may be harder to follow without the context from the book.
>
> Thanks,
>
> Trey Grainger
> Co-author, Solr in Action
> Director of Engineering, Search & Analytics @CareerBuilder
>
> On Wed, Mar 26, 2014 at 4:34 AM, Liu Bo <diabl...@gmail.com> wrote:
>
> > Hi Jeremy
> >
> > There are a lot of multi-language discussions; the two main approaches are:
> > 1. Like yours: one core per language.
> > 2. Everything in one core, with a separate field for each language.
> >
> > We have multi-language support in a single core; each multilingual field has its own suffix, such as name_en_US. We customized the query handler to hide the query details from the client.
> >
> > The main reason we do this is near-real-time (NRT) indexing and search. Take a product, for example: price and quantity are common to every language and are used for filtering and sorting, while name and description are multilingual fields. If we split products into different cores, updating a common field may end up as an update in every one of the language cores.
> >
> > As to scalability, we don't change Solr cores/collections when a new language is added, but we probably need to update our customized indexing process and run a full re-index.
> >
> > This approach suits our requirements for now, but you may have your own concerns.
> >
> > We have a "suggest filter" problem similar to yours: we want to return suggestion results filtered by store. I couldn't find a way to build the dictionary from a query in my version of Solr (4.6).
> >
> > What I do instead is run a query on an N-gram analyzed field, with filter queries on the store_id field. The "suggest" is actually a query. It may not perform as well as the suggester, but it does the trick.
> >
> > You could try building an additional N-gram field for suggestions only and searching on it with an fq on your "Locale" field.
> >
> > All the best
> >
> > Liu Bo
> >
> > On 25 March 2014 09:15, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
> >
> > > Solr in Action has a significant discussion of the multilingual approach. They also have some code samples out there. Might be worth a look.
> > >
> > > Regards,
> > > Alex.
> > > Personal website: http://www.outerthoughts.com/
> > > LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> > > - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working.
> > > (Anonymous - via GTD book)
> > >
> > > On Tue, Mar 25, 2014 at 4:43 AM, Jeremy Thomerson <jer...@thomersonfamily.com> wrote:
> > > > I recently deployed Solr to back the site search feature of a site I work on. The site itself is available in hundreds of languages. With the initial release of site search, we have enabled the feature for ten of those languages. These are distributed across eight cores: the two Chinese languages plus Korean are combined into one CJK core, and each of the other seven languages has its own core. The reason for splitting these into separate cores was so that we could use the same field names across all cores while configuring analyzers, etc., differently per core.
> > > >
> > > > Now I have some questions about this approach.
> > > >
> > > > 1) Scalability: Considering I need to scale this to many dozens more languages, perhaps hundreds more, is there a better way, so that I don't end up needing dozens or hundreds of cores? My initial plan was that the many languages without special support in Solr would simply be lumped into a single "default" core with default analyzers that are applicable to the majority of languages.
> > > >
> > > > 1b) Related to this: is there a practical limit to the number of cores that can be run on one Solr instance?
> > > >
> > > > 2) Auto-suggest: In phase two I intend to add auto-suggestions as a user types a query. Having reviewed how this is implemented and how the suggestion dictionary is built, I have concerns. If I have more than one language in a single core (and I keep the same field name for suggestions on all languages within a core), then it seems that I could get suggestions from another language returned by a suggest query. Is there a way to build a separate dictionary for each language, but keep these languages within the same core?
> > > >
> > > > If it's helpful to know: I have a "Locale" field in every core. Its values will be the locale of the document's language, e.g. "en", "es", "zh_hans". I'd like to be able to: 1) when building a suggestion dictionary, divide it into multiple dictionaries grouped by locale, and 2) supply a parameter to the suggest query that allows the suggest component to return suggestions only from the dictionary for that locale.
> > > >
> > > > If the answer to #1 is "keep splitting groups of languages that have different analyzers into their own cores" and the answer to #2 is "that's not supported", then I'd be curious: where would I start to write my own extension that supports #2? I looked last night at the suggest lookup classes, dictionary classes, etc., but I didn't see a clear point where it would be clean to implement something like what I'm suggesting above.
> > > >
> > > > Best Regards,
> > > > Jeremy Thomerson