Re: Language support

2016-08-23 Thread Walter Underwood
Synonyms are also domain specific. A synonym set for one area may be completely wrong in another. In cooking, arugula and rocket are the same thing. In military or aerospace, missile and rocket are very similar. I would start with librarians. They maintain controlled vocabularies (called “thes

Language support

2016-08-23 Thread Bradley Belyeu
Hi, I’m trying to find a synonym list for any of the following languages: Catalan, Farsi, Hindi, Korean, Latvian, Dutch, Romanian, Thai, and Turkish. Does anyone know of resources where I can get a synonym list for these languages?

Re: What are the best practices on Multiple Language support in Solr Cloud ?

2014-05-05 Thread shamik

Re: What are the best practices on Multiple Language support in Solr Cloud ?

2014-05-02 Thread Nicole Lacoste
Hi Shamik, I don't have an answer for you, just a couple of comments. Why not use dynamic field definitions in the schema? As you say, most of your fields are not analysed, so you just add a language tag (_en, _fr, _de, ...) to the field when you index or query. Then you can add languages as you need
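A minimal sketch of the suffix convention described above, assuming a schema that already declares matching dynamic fields (for example `*_en`, `*_fr`, `*_de`). The helper names `localized_field` and `localize_document` and the tag list are hypothetical and only illustrate how a client might rename fields before indexing or querying; they are not part of Solr or SolrJ.

```python
from typing import Dict, Iterable

# Hypothetical tag list; the message gives _en, _fr, _de as examples.
LANGUAGE_TAGS = ("en", "fr", "de")


def localized_field(base_name: str, lang: str) -> str:
    """Return the field name with a language suffix, e.g. title -> title_fr."""
    if lang not in LANGUAGE_TAGS:
        raise ValueError(f"unsupported language tag: {lang!r}")
    return f"{base_name}_{lang}"


def localize_document(doc: Dict[str, str], lang: str,
                      text_fields: Iterable[str]) -> Dict[str, str]:
    """Rename only the listed text fields; other fields (ids etc.) are kept."""
    to_rename = set(text_fields)
    return {
        (localized_field(name, lang) if name in to_rename else name): value
        for name, value in doc.items()
    }
```

Adding a language then means adding one tag and one dynamic field pattern, while the client code stays the same.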

What are the best practices on Multiple Language support in Solr Cloud ?

2014-04-30 Thread Shamik Bandopadhyay
Hi, I'm trying to implement multiple language support in Solr Cloud (4.7). Although we have different languages in the index, we were only supporting English in terms of index and query. To provide some context, our current index size is 35 GB with close to 15 million documents. We

Re: eDisMax, multiple language support and stopwords

2013-11-11 Thread Liu Bo
; > -Original message- > > > From:Tom Mortimer > > > Sent: Thursday 7th November 2013 12:50 > > > To: solr-user@lucene.apache.org > > > Subject: eDisMax, multiple language support and stopwords > > > > > > Hi all, > > > > >

Re: eDisMax, multiple language support and stopwords

2013-11-07 Thread Tom Mortimer
085 > > -Original message- > > From:Tom Mortimer > > Sent: Thursday 7th November 2013 12:50 > > To: solr-user@lucene.apache.org > > Subject: eDisMax, multiple language support and stopwords > > > > Hi all, > > > > Thanks for the help and advice I'v

RE: eDisMax, multiple language support and stopwords

2013-11-07 Thread Markus Jelsma
-Minimum-Match-Stopwords-Bug-td493483.html https://issues.apache.org/jira/browse/SOLR-3085 -Original message- > From:Tom Mortimer > Sent: Thursday 7th November 2013 12:50 > To: solr-user@lucene.apache.org > Subject: eDisMax, multiple language support and stopwords > > H

eDisMax, multiple language support and stopwords

2013-11-07 Thread Tom Mortimer
Hi all, Thanks for the help and advice I've got here so far! Another question - I want to support stopwords at search time, so that e.g. the query "oscar and wilde" is equivalent to "oscar wilde" (this is with lowercaseOperators=false). Fair enough, I have stopword "and" in the query analyser cha
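The intended equivalence can be sketched in plain Python as follows. This is only an illustration of the behaviour being asked for, not Solr's analysis chain: the stopword set and the function name `strip_stopwords` are assumptions made for the example.

```python
# Hypothetical stopword list containing only the word used in the message.
STOPWORDS = {"and"}


def strip_stopwords(query: str) -> str:
    """Lowercase the query, split on whitespace, and drop stopword tokens."""
    tokens = query.lower().split()
    return " ".join(token for token in tokens if token not in STOPWORDS)
```

With this sketch, "oscar and wilde" and "oscar wilde" reduce to the same token sequence, which is the result the question is after.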

Re: copyField at search time / multi-language support

2011-03-29 Thread Erick Erickson
This may not be all that helpful, but have you looked at edismax? https://issues.apache.org/jira/browse/SOLR-1553 It allows the full Solr query syntax while preserving the goodness of dismax. This is standard equipment on 3.1, which is being released even as we speak, and I also know it's being u

Re: copyField at search time / multi-language support

2011-03-29 Thread lboutros
wrote: > > > From: Markus Jelsma <[hidden email]> > > > Subject: Re: copyField at search time / multi-language support > > > To: [hi

Re: copyField at search time / multi-language support

2011-03-28 Thread Markus Jelsma
a wrote: > > From: Markus Jelsma > > Subject: Re: copyField at search time / multi-language support > > To: solr-user@lucene.apache.org > > Cc: "Andy" > > Date: Tuesday, March 29, 2011, 1:29 AM > > https://issues.apache.org/jira/browse/SOLR-1979 &g

Re: copyField at search time / multi-language support

2011-03-28 Thread Andy
Thanks Markus. Do you know if this patch is good enough for production use? Thanks. Andy --- On Tue, 3/29/11, Markus Jelsma wrote: > From: Markus Jelsma > Subject: Re: copyField at search time / multi-language support > To: solr-user@lucene.apache.org > Cc: "Andy" >

Re: copyField at search time / multi-language support

2011-03-28 Thread Markus Jelsma
ct: copyField at search time / multi-language support > > To: solr-user@lucene.apache.org > > Date: Monday, March 28, 2011, 4:45 AM > > Hi, > > > > Here's my problem: I'm indexing a corpus with text in a > > variety of > > languages. I'm pla

Re: copyField at search time / multi-language support

2011-03-28 Thread Andy
Tom, Could you share the method you use to perform language detection? Any open source tools that do that? Thanks. --- On Mon, 3/28/11, Tom Mortimer wrote: > From: Tom Mortimer > Subject: copyField at search time / multi-language support > To: solr-user@lucene.apache.org >

Re: copyField at search time / multi-language support

2011-03-28 Thread Gora Mohanty
On Mon, Mar 28, 2011 at 2:15 PM, Tom Mortimer wrote: > Hi, > > Here's my problem: I'm indexing a corpus with text in a variety of > languages. I'm planning to detect these at index time and send the > text to one of a suitably-configured field (e.g. "mytext_de" for > German, "mytext_cjk" for Chine

copyField at search time / multi-language support

2011-03-28 Thread Tom Mortimer
Hi, Here's my problem: I'm indexing a corpus with text in a variety of languages. I'm planning to detect these at index time and send the text to a suitably configured field (e.g. "mytext_de" for German, "mytext_cjk" for Chinese/Japanese/Korean etc.) At search time I want to search all of
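A rough sketch of the routing described in this message. The field names "mytext_de" and "mytext_cjk" come from the message; the language codes, the mapping, and the helpers `field_for_language` and `build_query` are hypothetical. The search-time part assumes the truncated sentence refers to searching all of the language fields at once, which the snippet does not confirm.

```python
# Field names are from the message; the language codes are assumptions.
LANGUAGE_FIELDS = {
    "de": "mytext_de",
    "zh": "mytext_cjk",
    "ja": "mytext_cjk",
    "ko": "mytext_cjk",
}


def field_for_language(lang: str) -> str:
    """Index time: pick the field that the detected language maps to."""
    return LANGUAGE_FIELDS[lang]


def build_query(term: str) -> str:
    """Search time (assumed): OR the same phrase across every language field."""
    escaped = term.replace('"', '\\"')
    fields = sorted(set(LANGUAGE_FIELDS.values()))
    return " OR ".join(f'{field}:"{escaped}"' for field in fields)
```

The query string uses the standard `field:"phrase"` form; a query parser that accepts a list of fields would make the explicit OR unnecessary.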

Re: Help on Multi-language support

2011-03-06 Thread Jan Høydahl
ther language, or > shall i just change my schema to accomodate those additional language > related fields? > > > Thanks. Your help is appreciated.

Re: Help on Multi-language support

2011-03-04 Thread cyang2010
This is the solr schema:

Help on Multi-language support

2011-03-04 Thread cyang2010

Re: slovene language support

2010-07-19 Thread Robert Muir
Hello, There is some information here (prototype stemmer) about support in snowball. But Martin Porter had some unanswered questions/reservations so nothing ever got added to snowball: http://snowball.tartarus.org/archives/snowball-discuss/0725.html

slovene language support

2010-07-19 Thread Markus Goldbach
Hi, I want to set up a Solr instance with support for several languages. The language list includes Slovene; unfortunately, I found nothing about it in the wiki. Does anyone have experience with Solr 1.4 and Slovene? Thanks for your help, Markus

Re: Polish language support?

2010-07-09 Thread Robert Muir
IRC trying to help someone find Polish-language support for Solr. > > Seems lucene has nothing to offer? Found one stemmer that looks to be > compatibly licensed in case someone wants to take a shot at > incorporating it: http://www.getopt.org/stempel/ > > -Peter > >

Polish language support?

2010-07-09 Thread Peter Wolanin
In IRC trying to help someone find Polish-language support for Solr. Seems lucene has nothing to offer? Found one stemmer that looks to be compatibly licensed in case someone wants to take a shot at incorporating it: http://www.getopt.org/stempel/ -Peter -- Peter M. Wolanin, Ph.D. Momentum

Re: Hindi language support in solr

2010-01-22 Thread Ranveer kumar
Hi Robert, Thanks for the reply. As you wrote, I used "textgen", but I am still not able to search Hindi text. I might be missing some important configuration. The following is my schema.xml configuration

Re: Hindi language support in solr

2010-01-21 Thread Robert Muir
Hello, take a look at the field type "textgen" (a general unstemmed text field). The WhitespaceTokenizer + WordDelimiterFilter used by this type will work correctly for Hindi tokenization and punctuation. On Thu, Jan 21, 2010 at 10:55 AM, Ranveer kumar wrote: > Hi all, > > I am very new in solr. > I

Hindi language support in solr

2010-01-21 Thread Ranveer kumar
Hi all, I am very new to Solr. I downloaded the latest release, 1.4, and installed it. For indexing and searching I am using the SolrJ API. My question is: "How do I enable Solr to search Hindi-language text?" Please help me. Thanks, with regards, Ranveer K Kumar

Re: Multi language support

2010-01-13 Thread Lance Norskog
Robert Muir: Thank you for the pointer to that paper! On Wed, Jan 13, 2010 at 6:29 AM, Paul Libbrecht wrote: > Isn't the conclusion here that some "stopword and stemming free matching" > should be the best match if ever and to then gently degrade to weaker forms > of matching? > > paul > > > Le

Re: Multi language support

2010-01-13 Thread Paul Libbrecht
Isn't the conclusion here that some "stopword and stemming free matching" should be the best match if ever and to then gently degrade to weaker forms of matching? paul On 13 Jan 2010, at 07:08, Walter Underwood wrote: There is a band named "The The". And a producer named "Don Was". For

Re: Multi language support

2010-01-13 Thread Robert Muir
Right, but we should not encourage users to significantly degrade overall relevance for all movies due to a few movies and a band (very special cases, as I said). In English, not using stopwords doesn't really degrade relevance that much, so it's a reasonable decision to make. This is not tr

Re: Multi language support

2010-01-12 Thread Walter Underwood
There is a band named "The The". And a producer named "Don Was". For a list of all-stopword movie titles at Netflix, see this post: http://wunderwood.org/most_casual_observer/2007/05/invisible_titles.html My favorite is "To Be and To Have (Être et Avoir)", which is all stopwords in two language

Re: Multi language support

2010-01-12 Thread Robert Muir
Sorry, I forgot to include this 2009 paper comparing what stopwords do across 3 languages: http://doc.rero.ch/lm.php?url=1000,43,4,20091218142456-GY/Dolamic_Ljiljana_-_When_Stopword_Lists_Make_the_Difference_20091218.pdf In my opinion, if stopwords annoy your users for very special cases like 'th

Re: Multi language support

2010-01-12 Thread Robert Muir
I don't think this is something to consider across the board for all languages. The same grammatical units that are part of a word in one language (and removed by stemmers) are independent morphemes in others (and should be stopwords) so please take this advice on a case-by-case basis for each lan

Re: Multi language support

2010-01-12 Thread Lance Norskog
There are a lot of projects that don't use stopwords any more. You might consider dropping them altogether. On Mon, Jan 11, 2010 at 2:25 PM, Don Werve wrote: > This is the way I've implemented multilingual search as well. > > 2010/1/11 Markus Jelsma > >> Hello, >> >> >> We have implemented langu

Re: Multi language support

2010-01-11 Thread Don Werve
This is the way I've implemented multilingual search as well. 2010/1/11 Markus Jelsma > Hello, > > > We have implemented language specific search in Solr using language > specific fields and field types. For instance, an en_text field type can > use an English stemmer, and list of stopwords and

Re: Multi language support

2010-01-11 Thread Markus Jelsma
Hello, We have implemented language-specific search in Solr using language-specific fields and field types. For instance, an en_text field type can use an English stemmer and a list of stopwords and synonyms. We, however, did not use specific stopwords; instead we used one list shared by both langu

Multi language support

2010-01-11 Thread Daniel Persson
Hi Solr users. I'm trying to set up a site with Solr search integrated, and I use the Solr Java API to feed the index with search documents. At the moment I have only activated search on the English portion of the site. I'm interested in using as many features of Solr as possible. Synonyms, Stopword

Re: Multi-language support

2009-04-14 Thread Grant Ingersoll
On Apr 9, 2009, at 7:09 AM, revas wrote: Hi, To reframe my earlier question: some languages have only analyzers but no stemmer from Snowball/Porter; does the analyzer then take care of stemming as well? Some languages have only the stemmer from Snowball but no analyzer. Some have both. C

Multi-language support

2009-04-09 Thread revas
Hi, To reframe my earlier question: some languages have only analyzers but no stemmer from Snowball/Porter; does the analyzer then take care of stemming as well? Some languages have only the stemmer from Snowball but no analyzer. Some have both. Can we say then that solr supports all the abo

Re: Multiple language support

2008-12-29 Thread Otis Gospodnetic
he.org > Sent: Monday, December 29, 2008 4:52:19 AM > Subject: Multiple language support > > Hi All, > > I have a multiple language supporting schema in which there is a separate > field > for every language. > > I have a field "product_name" to store p

Multiple language support

2008-12-29 Thread Deshpande, Mukta
Hi All, I have a multiple language supporting schema in which there is a separate field for every language. I have a field "product_name" to store product name and its description that can be in any user preferred language. This can be stored in fields product_name_EN if user prefers English

Re: Language support

2008-03-20 Thread Benson Margulies
Oh, Walter! Hello! I thought that name was familiar. Greetings from Basis. All that makes sense. On Thu, Mar 20, 2008 at 1:00 PM, Walter Underwood <[EMAIL PROTECTED]> wrote: > Extreme, but guaranteed to work and it avoids bad IDF when there are > inter-language collisions. In Ultraseek, we only s

Re: Language support

2008-03-20 Thread Walter Underwood
Extreme, but guaranteed to work and it avoids bad IDF when there are inter-language collisions. In Ultraseek, we only stored the hash, so the size of the source token didn't matter. Trademarks are a bad source of collisions and anomalous IDF. If you have LaserJet support docs in 20 languages, the

Re: Language support

2008-03-20 Thread Benson Margulies
Token/by/token seems a bit extreme. Are you concerned with macaronic documents? On Thu, Mar 20, 2008 at 12:42 PM, Walter Underwood <[EMAIL PROTECTED]> wrote: > Nice list. > > You may still need to mark the language of each document. There are > plenty of cross-language collisions: "die" and "boot

Re: Language support

2008-03-20 Thread Benson Margulies
;> > >> I guess what I'm asking is, if my approach seems convoluted, I'm > >> probably doing it wrong, so how *a*re people solving the problem of > >> searching over multiple languages? What is the canonical way to do > >> this? > >> > >> > >&g

Re: Language support

2008-03-20 Thread Walter Underwood
Nice list. You may still need to mark the language of each document. There are plenty of cross-language collisions: "die" and "boot" have different meanings in German and English. Proper nouns ("Laserjet") may be the same in all languages, a different problem if you are trying to get answers in on

Re: Language support

2008-03-20 Thread David King
ing it wrong, so how *a*re people solving the problem of searching over multiple languages? What is the canonical way to do this? Nicolas -Message d'origine- De : David King [mailto:[EMAIL PROTECTED] Envoyé : mercredi 19 mars 2008 20:07 À : solr-user@lucene.apache.org Obj

Re: Language support

2008-03-20 Thread Benson Margulies
; > > > > > > > Nicolas > > > > -Message d'origine- > > De : David King [mailto:[EMAIL PROTECTED] > > Envoyé : mercredi 19 mars 2008 20:07 > > À : solr-user@lucene.apache.org > > Objet : Language support > > > > This has

Re: Language support

2008-03-20 Thread David King
2008 20:07 To: solr-user@lucene.apache.org Subject: Language support This has probably been asked before, but I'm having trouble finding it. Basically, we want to be able to search for content across several languages, given that we know what language a datum and a query are in. Is there an obvious way

RE: Language support

2008-03-20 Thread nicolas . dessaigne
-user@lucene.apache.org Subject: Language support This has probably been asked before, but I'm having trouble finding it. Basically, we want to be able to search for content across several languages, given that we know what language a datum and a query are in. Is there an obvious way to do this?

Language support

2008-03-19 Thread David King
This has probably been asked before, but I'm having trouble finding it. Basically, we want to be able to search for content across several languages, given that we know what language a datum and a query are in. Is there an obvious way to do this? Here's the longer version: I am trying to in