Thanks Trey! Last week I ordered the eBook. I look forward to seeing the
information in it.

Jeremy


On Thu, Mar 27, 2014 at 6:03 PM, Trey Grainger <solrt...@gmail.com> wrote:

> In addition to the two approaches Liu Bo mentioned (separate core per
> language and separate field per language), it is also possible to put
> multiple languages in a single field. This saves you the overhead of
> multiple cores and of having to search across multiple fields at query
> time. The idea here is that you can run multiple analyzers (e.g. one for
> German, one for English, one for Chinese) and stack the resulting
> TokenStreams for each of these within a single field. It is also possible
> to swap out the languages you want to use on a case-by-case basis (e.g.
> per document, per field, or even per word) if you really need to for
> advanced use cases.
>
> All three of these methods, including code examples and the pros and cons
> of each, are discussed in the Multilingual Search chapter of Solr in Action,
> which Alexandre referenced. If you don't have the book, you can also just
> download and run the code examples for free, though they may be harder to
> follow without the context from the book.
>
> Thanks,
>
> Trey Grainger
> Co-author, Solr in Action
> Director of Engineering, Search & Analytics @CareerBuilder
>
>
>
>
>
> On Wed, Mar 26, 2014 at 4:34 AM, Liu Bo <diabl...@gmail.com> wrote:
>
> > Hi Jeremy
> >
> > There are a lot of multi-language discussions, with two main approaches:
> >  1. like yours, one core per language
> >  2. everything in one core, with each language in its own field.
> >
> > We have multi-language support in a single core; each multilingual field
> > has its own suffix, such as name_en_US. We customized the query handler to
> > hide the query details from the client.
> > The main reason we do this is NRT indexing and search. Take a product,
> > for example:
> >
> > A product has price and quantity, which are common fields used for
> > filtering and sorting, while name and description are multi-language
> > fields. If we split products across different cores, updating a common
> > field may end up requiring an update in every one of the language cores.
> >
> > As to scalability, we don't change Solr cores/collections when a new
> > language is added, but we probably need to update our customized indexing
> > process and run a full re-index.
> >
> > This approach suits our requirement for now, but you may have your own
> > concerns.
> >
> > We have a "suggest filter" problem similar to yours: we want to return
> > suggestions filtered by store. I can't find a way to build the dictionary
> > from a query in my version of Solr, 4.6.
> >
> > What I do is run a query on an N-gram analyzed field with filter queries
> > on the store_id field. The "suggest" is actually a query. It may not
> > perform as well as the suggester, but it does the trick.
> >
> > You can try building an additional N-gram field for suggestions only and
> > searching on it with an fq on your "Locale" field.
> >
> > All the best
> >
> > Liu Bo
> >
> >
> >
> >
> > On 25 March 2014 09:15, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
> >
> > > Solr In Action has a significant discussion on the multi-lingual
> > > approach. They also have some code samples out there. Might be worth a
> > > look
> > >
> > > Regards,
> > >    Alex.
> > > Personal website: http://www.outerthoughts.com/
> > > LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> > > - Time is the quality of nature that keeps events from happening all
> > > at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> > > book)
> > >
> > >
> > > On Tue, Mar 25, 2014 at 4:43 AM, Jeremy Thomerson
> > > <jer...@thomersonfamily.com> wrote:
> > > > I recently deployed Solr to back the site search feature of a site I
> > > > work on. The site itself is available in hundreds of languages. With
> > > > the initial release of site search we have enabled the feature for ten
> > > > of those languages. This is distributed across eight cores, with two
> > > > Chinese languages plus Korean combined into one CJK core and each of
> > > > the other seven languages in their own individual cores. The reason
> > > > for splitting these into separate cores was so that we could have the
> > > > same field names across all cores but have different configuration for
> > > > analyzers, etc, per core.
> > > >
> > > > Now I have some questions on this approach.
> > > >
> > > > 1) Scalability: Considering I need to scale this to many dozens more
> > > > languages, perhaps hundreds more, is there a better way so that I
> > > > don't end up needing dozens or hundreds of cores? My initial plan was
> > > > that many languages that didn't have special support within Solr would
> > > > simply get lumped into a single "default" core that has some default
> > > > analyzers that are applicable to the majority of languages.
> > > >
> > > > 1b) Related to this: is there a practical limit to the number of cores
> > > > that can be run on one instance of Lucene?
> > > >
> > > > 2) Auto Suggest: In phase two I intend to add auto-suggestions as a
> > > > user types a query. In reviewing how this is implemented and how the
> > > > suggestion dictionary is built I have concerns. If I have more than
> > > > one language in a single core (and I keep the same field name for
> > > > suggestions on all languages within a core) then it seems that I could
> > > > get suggestions from another language returned with a suggest query.
> > > > Is there a way to build a separate dictionary for each language, but
> > > > keep these languages within the same core?
> > > >
> > > > If it's helpful to know: I have a field in every core for "Locale".
> > > > Values will be the locale of the language of that document, e.g. "en",
> > > > "es", "zh_hans", etc. I'd like to be able to: 1) when building a
> > > > suggestion dictionary, divide it into multiple dictionaries, grouping
> > > > them by locale, and 2) supply a parameter to the suggest query that
> > > > allows the suggest component to only return suggestions from the
> > > > appropriate dictionary for that locale.
> > > >
> > > > If the answer to #1 is "keep splitting groups of languages that have
> > > > different analyzers into their own cores" and the answer to #2 is
> > > > "that's not supported", then I'd be curious: where would I start to
> > > > write my own extension that supported #2? I looked last night at the
> > > > suggest lookup classes, dictionary classes, etc. But I didn't see a
> > > > clear point where it would be clean to implement something like I'm
> > > > suggesting above.
> > > >
> > > > Best Regards,
> > > > Jeremy Thomerson
> > >
> >
> >
> >
> > --
> > All the best
> >
> > Liu Bo
> >
>
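For reference, the query-based suggestion Liu Bo describes in the thread (search an N-gram field, restrict by a filter query) could be sketched roughly as below. This is only an illustration under assumptions: the field name suggest_ngram and the helper names are hypothetical, "Locale" is the field Jeremy mentions, and q, fq, rows and wt are standard Solr request parameters. It is not the handler anyone in the thread actually runs.

```python
from urllib.parse import urlencode


def solr_phrase(value):
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_suggest_params(prefix, locale, rows=5,
                         ngram_field="suggest_ngram",  # hypothetical field name
                         locale_field="Locale"):       # field named in the thread
    """Build request parameters for a suggestion served by a normal query."""
    return [
        ("q", f"{ngram_field}:{solr_phrase(prefix)}"),    # match what the user typed
        ("fq", f"{locale_field}:{solr_phrase(locale)}"),  # only this locale's documents
        ("rows", str(rows)),
        ("wt", "json"),
    ]


# Query string to append to the core's select handler URL.
query_string = urlencode(build_suggest_params("lap", "en"))
```

Because the locale restriction is an ordinary filter query, one core can serve suggestions for several languages without a separate dictionary per language, at the cost of running a full query for every keystroke.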

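The field-per-language approach (a suffix per locale, such as name_en_US, with the handler hiding the field names from the client) might look something like the following on the query side. This is a sketch under assumptions: the function names and the list of multilingual fields are made up for illustration, and the search terms are assumed to be already escaped.

```python
def localized_field(base, locale):
    """Return the per-language field name, e.g. name + en_US -> name_en_US."""
    return f"{base}_{locale}"


def rewrite_query(terms, locale, multilingual=("name", "description")):
    """Expand plain search terms into a query over the localized fields.

    The client only supplies the terms and a locale; the suffixed field
    names stay hidden on the server side.
    """
    fields = [localized_field(base, locale) for base in multilingual]
    return " OR ".join(f"{field}:({terms})" for field in fields)


query = rewrite_query("red shoes", "en_US")
```

Common fields such as price and quantity keep a single un-suffixed name, so updating them touches one document rather than one document per language core.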