Re: looking for multilanguage indexing best practice/hint

2008-12-21 Thread Julian Davchev
Dude, there was already a warning about stealing the thread. Please do something about it as advised. Start your own thread if you want answers to your problem. Cheers, Sujatha Arun wrote: > Thanks Daniel and Erik, > > The requirement from the user end is to only search in that particular > language and not acro

Re: looking for multilanguage indexing best practice/hint

2008-12-19 Thread Sujatha Arun
Thanks Daniel and Erik, The requirement from the user end is to only search in that particular language and not across languages. Also, going forward we will be adding more languages. So if I have separate fields for each language, then we need to change the schema every time and that will not sca

Re: looking for multilanguage indexing best practice/hint

2008-12-18 Thread Julian Davchev
Thanks Erick, I think I will go with different language fields, as I want to give each language different stop words, analyzers etc. I might also consider a schema per language so scaling is more flexible, as I was already advised, but this will really make sense only if I have more than one server I guess, else just all

Re: looking for multilanguage indexing best practice/hint

2008-12-18 Thread Chris Hostetter
: Subject: looking for multilanguage indexing best practice/hint : References: <49483388.8030...@drun.net> : <502b8706-828b-4eaa-886d-af0dccf37...@stylesight.com> : <8c0c601f0812170825j766cf005i9546b2604a19f...@mail.gmail.com> : In-Reply-To: <8c0c601f0812170825j766cf005i9546b2604a19

RE: looking for multilanguage indexing best practice/hint

2008-12-18 Thread Daniel Alheiros
you can pre-define some base query parts and also do score boosting behind the scenes. I hope it helps. Regards, Daniel -Original Message- From: Sujatha Arun [mailto:suja.a...@gmail.com] Sent: 18 December 2008 04:15 To: solr-user@lucene.apache.org Subject: Re: looking for multilanguage

Re: looking for multilanguage indexing best practice/hint

2008-12-18 Thread Erick Erickson
See the CJKAnalyzer for a start, StandardAnalyzer won't help you much. Also, tell us a little more about your requirements. For instance, if a user submits a query in Japanese, do you want to search across documents in the other languages too? And will you want to associate different analyzers wit

Re: looking for multilanguage indexing best practice/hint

2008-12-17 Thread Sujatha Arun
Hi, I am prototyping language search using Solr 1.3. I have 3 fields in the schema: id, content and language. I am indexing 3 PDF files; the languages are foroyo, Chinese and Japanese. I use xpdf to convert the content of each PDF to text and push the text to Solr in the content field. What is the ana

RE: looking for multilanguage indexing best practice/hint

2008-12-17 Thread Feak, Todd
Don't forget to consider scaling concerns (if there are any). There are strong differences in the number of searches we receive for each language. We chose to create separate schema and config per language so that we can throw servers at a particular language (or set of languages) if we needed to.
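
A minimal sketch of how a client might route searches under the schema-and-config-per-language setup described above, assuming one Solr instance per language. The host names, the `SOLR_BY_LANGUAGE` table and the `select_url` helper are hypothetical and not taken from the thread; only the `/select` handler and the `q` parameter are standard Solr.

```python
from urllib.parse import urlencode

# Hypothetical base URLs: one Solr instance (with its own schema and
# config) per language, so busy languages can get more servers.
SOLR_BY_LANGUAGE = {
    "en": "http://solr-en.example.com:8983/solr",
    "ja": "http://solr-ja.example.com:8983/solr",
}


def select_url(language, user_query):
    """Build the search URL for the instance that serves `language`."""
    base = SOLR_BY_LANGUAGE[language]  # KeyError for an unconfigured language
    params = urlencode({"q": user_query})
    return base + "/select?" + params
```

Adding capacity for one language then only means changing that language's entry in the table (or pointing it at a load balancer); the other languages are untouched.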

Re: looking for multilanguage indexing best practice/hint

2008-12-17 Thread Alexander Ramos Jardim
I think this is up to your needs. If you will make one search in many languages, and your docs won't get too big, you can put all the data in one schema.xml and configure your field types on a per-language basis. 2008/12/17 Julian Davchev > Hi, > From my study on solr and lucene so far it seems t
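
A minimal sketch of the single-schema approach, assuming each language gets its own content field whose field type carries that language's analyzer. The `content_<lang>` field names and both helper functions are hypothetical; the `id` and `language` fields follow the schema mentioned elsewhere in the thread. The user query is inserted without escaping, which a real client would need to handle.

```python
# Hypothetical mapping from language code to the per-language field
# that would be declared in schema.xml with a matching field type.
LANGUAGE_FIELDS = {
    "en": "content_en",
    "ja": "content_ja",
    "zh": "content_zh",
}


def build_document(doc_id, language, text):
    """Put the text in the field for its language so it is analyzed correctly."""
    field = LANGUAGE_FIELDS[language]
    return {"id": doc_id, "language": language, field: text}


def build_query_params(language, user_query):
    """Search only the field of the requested language and filter on it."""
    field = LANGUAGE_FIELDS[language]
    return {
        "q": field + ":(" + user_query + ")",
        "fq": "language:" + language,
    }
```

With this layout, supporting a new language means adding one field (and field type) to the schema and one entry to the mapping.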