Just one clarification: when you say ICUFilterFactory, am I correct in thinking
it's ICUFoldingFilterFactory?
Thanks,
Rishi.
-Original Message-
From: Tom Burton-West
To: solr-user
Sent: Wed, Feb 25, 2015 4:33 pm
Subject: Re: Basic Multilingual search capability
Hi Rishi,
As others have indicated, multilingual search is very difficult to do well.
At HathiTrust we've been using the ICUTokenizer and ICUFilterFactory to
deal with having materials in 400 languages. We also added the
CJKBigramFilter to get better precision on CJK queries. We don'
: Re: Basic Multilingual search capability
Given the limited needs, I would probably do something like this:
1) Put a language identifier in the UpdateRequestProcessor chain
during indexing and route out at least known problematic languages,
such as Chinese, Japanese, Arabic into individual fields
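The routing step in 1) can be sketched roughly as follows. This is only an illustration of the idea, not how Solr's language-identifier processors are implemented: the field names (text_zh, text_ja, text_ar, text_general) and the helper functions are hypothetical, and the language code is assumed to come from whatever detector sits earlier in the chain.

```python
# Hypothetical field names; the thread does not say what the real fields are called.
DEDICATED_FIELDS = {"zh": "text_zh", "ja": "text_ja", "ar": "text_ar"}
CATCH_ALL_FIELD = "text_general"


def target_field(lang_code):
    """Return the field that text in the given language should be indexed into."""
    # Normalise codes such as "zh-cn" or "ZH_TW" down to the bare language part.
    base = lang_code.lower().replace("_", "-").split("-")[0]
    return DEDICATED_FIELDS.get(base, CATCH_ALL_FIELD)


def route(doc, lang_code):
    """Return a copy of doc with its "text" value moved to the routed field."""
    routed = {k: v for k, v in doc.items() if k != "text"}
    routed[target_field(lang_code)] = doc["text"]
    return routed
```

Languages without a dedicated entry fall through to the single catch-all field, so only the known problematic languages need their own analysis chains.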
forward to trying out once it's integrated into main.
Thanks,
Rishi.
-Original Message-
From: Trey Grainger
To: solr-user
Sent: Tue, Feb 24, 2015 1:40 am
Subject: Re: Basic Multilingual search capability
Hi Rishi,
I don't generally recommend a language-insensitive approach excep
Given the limited needs, I would probably do something like this:
1) Put a language identifier in the UpdateRequestProcessor chain
during indexing and route out at least known problematic languages,
such as Chinese, Japanese, Arabic into individual fields
2) Put everything else together into one f
use index-time language detection to map to the appropriate
fields/analyzers if you are otherwise unaware of the languages of the
content from your application layer. The third option requires custom code
(included in the large Multilingual Search chapter of Solr in Action
<http://solrinaction.com
solr-user
Sent: Mon, Feb 23, 2015 11:17 pm
Subject: Re: Basic Multilingual search capability
It isn’t just complicated, it can be impossible.
Do you have content in Chinese or Japanese? Those languages (and some others) do
not separate words with spaces. You cannot even do word search without a
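A small illustration of the point about languages that do not put spaces between words: splitting on whitespace leaves a Chinese sentence as one giant token, while overlapping character bigrams at least give the index something to match on. This is a simplified sketch of the bigram idea only. It is not Solr's CJKBigramFilter, which applies bigramming to CJK characters specifically, and both function names are made up for the example.

```python
def whitespace_tokens(text):
    """Split on whitespace, as a naive space-based tokenizer would."""
    return text.split()


def overlapping_bigrams(text):
    """Emit overlapping two-character tokens from each whitespace-free run."""
    tokens = []
    for run in text.split():
        if len(run) < 2:
            # A single character cannot form a bigram; keep it as-is.
            tokens.append(run)
        else:
            tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
    return tokens


sentence = "我喜欢搜索"
print(whitespace_tokens(sentence))    # the whole sentence comes back as one token
print(overlapping_bigrams(sentence))  # four overlapping two-character tokens
```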
>
> Thanks,
> Rishi.
>
> -Original Message-
> From: Alexandre Rafalovitch
> To: solr-user
> Sent: Mon, Feb 23, 2015 5:49 pm
> Subject: Re: Basic Multilingual search capability
>
>
> Which languages are you expecting to deal with? Multilingual suppor
Subject: Re: Basic Multilingual search capability
Which languages are you expecting to deal with? Multilingual support
is a complex issue. Even if you think you don't need much, it is
usually a lot more complex than expected, especially around relevancy.
Regards,
Alex.
xt documents from our end users,
> which can be in any language (sometimes combination) and we cannot determine
> the language of the incoming text. Language detection at index time is not
> necessary.
>
> Which analyzer is recommended to achieve basic multilingual search capability
>
nd we cannot determine the language
of the incoming text. Language detection at index time is not necessary.
Which analyzer is recommended to achieve basic multilingual search capability
for a use case like this?
I have read a bunch of posts about using a combination of StandardTokenizer or
ICUtoke
> 1. Modify the qf parameter directly by either adding the "_xx" language
> suffix to each field in qf, or replacing the "xx" for any qf fields that
> already have an "_xx" suffix.
> 2. Have separate "qf_xx" parameters which are customized for specific
> languages and then copy the language-spec
" parameters which are customized for specific
languages and then copy the language-specific "qf_xx" parameter to the main
qf parameter based on the language that is detected.
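Both options could be done in the application layer before the request reaches Solr. The sketch below is only an illustration under a few assumptions: language suffixes are always two lowercase letters, the detected language code is already known, and the function names are invented for the example. Note that the suffix pattern would also strip a non-language ending such as "_id", so real field names would need checking.

```python
import re

# Matches a trailing two-letter suffix such as "_en" or "_fr" (an assumption).
_LANG_SUFFIX = re.compile(r"_[a-z]{2}$")


def localize_qf(qf, lang):
    """Option 1: add or replace the "_xx" suffix on every field listed in qf."""
    out = []
    for term in qf.split():
        # Keep any boost, e.g. "title^2" becomes field "title" and boost "2".
        field, sep, boost = term.partition("^")
        field = _LANG_SUFFIX.sub("", field) + "_" + lang
        out.append(field + sep + boost)
    return " ".join(out)


def pick_qf(params, lang):
    """Option 2: copy a per-language "qf_xx" parameter into qf when one exists."""
    chosen = dict(params)
    override = params.get("qf_" + lang)
    if override is not None:
        chosen["qf"] = override
    return chosen
```

For example, localize_qf("title^2 body_en", "fr") gives "title_fr^2 body_fr", and pick_qf leaves qf untouched when no "qf_xx" entry exists for the detected language.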
-- Jack Krupansky
-Original Message-
From: Paul Libbrecht
Sent: Friday, July 4, 2014 11:36 AM
T
s to use for the "qf" parameter.
>
> -- Jack Krupansky
>
> -Original Message- From: benjelloun
> Sent: Friday, July 4, 2014 10:52 AM
> To: solr-user@lucene.apache.org
> Subject: multilingual search
>
> Hello,
>
> what i need to do i
2014 10:52 AM
To: solr-user@lucene.apache.org
Subject: multilingual search
Hello,
What I need to do is detect the language of my fields. Then, when I search with
the "/select" RequestHandler, how can I make the search detect the language of
the query words and choose which field_langid to use?
my con
or on language. I don't want to add stopwords_fr to
stopwords_en.
What I want is to detect the language before the select search, then choose
the field_langid to search.
Best regards,
Anass BENJELLOUN
--
View this message in context:
http://lucene.472066.n3.nabble.com/multilingual-search-tp4145639.html
: I want to do multilingual search in single-core Solr. That requires
: defining language-specific tokenizers in schema.xml. Say, for example, I have
: two tokenizers, one for English ("en") and one for Simplified Chinese
: ("zh-cn"). Can I just put following definit
only one field element?
There should be two, shouldn't there?
One for each language.
paul
On 14 Feb 2012 at 07:34, bing wrote:
>
> Hi, all,
>
> I want to do multilingual search in single-core Solr. That requires
> defining language-specific tokenizers in schema.xml. Say for exampl
Hi, Erick,
Thanks for commenting on this thread, and I think my problem has been
solved. I might start another thread raising technical questions about using
SolrJ.
Thank you again.
Best Regards,
Bing
--
View this message in context:
http://lucene.472066.n3.nabble.com/Multilingual
find many start-up tutorials about that, thus would be grateful if
> any suggestions and hints brought about.
>
> Best
> Bing
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Multilingual-search-in-multicore-solr-tp3698969p3705556.html
> Sent from the Solr - User mailing list archive at Nabble.com.
core, but still is a concern.
> Thanks.
>
> Best Regards,
> Ni Bing
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Multilingual-search-in-multicore-solr-tp3698969p3702041.html
> Sent from the Solr - User mailing list archive at Nabble.com.
, returned with a set of scores. Is it
safe to conclude that the highest score means the highest confidence in
the results?
Thanks.
Best Regards,
Ni Bing
--
View this message in context:
http://lucene.472066.n3.nabble.com/Multilingual-search-in-multicore-solr-tp3698969p3702041.html
Sent from the
to stick on multicore solr and want to
> see whether the problems can be solved. Meanwhile, I am aware that
> multilingual search does not necessarily need multicore solr, which I have
> learned in previous thread.
> http://lucene.472066.n3.nabble.com/Tika0-10-language-identifier-in-Solr3-5-0-
Hi, all,
I am going to do multilingual search in multicore Solr. Specifically, the
design of the Solr server is like this: I have several cores corresponding to
different languages, where each core has its own configuration files and data.
I have the following questions:
1. While indexing a document, I use
Hi Jan,
I totally agree with what you said.
In a), you talked about boosting. I guess you meant to boost at the client
side, right?
I still have a question:
>> does Solr choose the appropriate analysis for the query. i.e., if a query is
>> compared to a document having English free text (tex
Hi,
I have chosen the same approach as you, indexing content into text_
fields with custom analysis, and it works great. Solr does not have any
overhead with this even if there are hundreds of languages, due to the
schema-less nature of Lucene.
And if you know which language is being searched,
Hi,
I know this topic has been treated many times in the (distant) past, but I
wonder whether there are newer best practices or trends.
In my application, I'm dealing with documents in different languages. Each
document is monolingual; it has some fields containing free text and a set of
fiel
/Malayalam_script
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
> From: Sachit P. Menon <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Saturday, May 17, 2008 8:44:06 AM
> Subject: RE: MultiLingual Search
>
> Hi A
Hi All,
I would like to know if Indian regional languages (like Malayalam, Kannada,
Tamil, etc.) can also be indexed through Solr.
Thanks and Regards
Sachit P. Menon
DISCLAIMER:
This message (including attachment if any) is confidential and may be
privileged. If you have received this message
On Mon, 12 May 2008 16:16:28 +0530
"Sachit P. Menon" <[EMAIL PROTECTED]> wrote:
> My project requires having the same content (mostly) in multiple languages.
hi Sachit,
please search the archives of the list. this topic seems to come up twice a
week or thereabouts :)
You are of course encoura
> from
> the front end.
>
>
>
> Can anyone tell me the steps to implement multilingual search in Solr as
> I'm
> very new to Solr?
>
>
>
>
>
> Thanks,
>
> Sachit
to implement multilingual search in Solr as I'm
very new to Solr?
Thanks,
Sachit
Yes. Solr handles UTF-8 and has many analyzers for non-English
languages.
-Grant
On May 9, 2008, at 7:23 AM, Sachit P. Menon wrote:
Can we have a multilingual search using Solr
Thanks and Regards
Sachit P. Menon| Programmer Analyst| MindTree Ltd. |West Campus,
Phase-1,
Global Village
Can we have a multilingual search using Solr
Thanks and Regards
Sachit P. Menon| Programmer Analyst| MindTree Ltd. |West Campus, Phase-1,
Global Village, RVCE Post, Mysore Road, Bangalore-560 059, INDIA |Voice +91
80 26264000 |Extn 65377|Fax +91 80 26264100 | Mob : +91
9986747356
36 matches