Dude, there was already a warning about thread-stealing. Please do as advised. Start your own thread if you want answers to your problem. Cheers,
Sujatha Arun wrote:
> Thanks Daniel and Erik,
>
> The requirement from the user end is to search only in that particular
> language and not across languages.
>
> Also, going forward we will be adding more languages.
>
> So if I have separate fields for each language, then we need to change the
> schema every time, and that will not scale very well.
>
> So there are two options: either use dynamic fields or use multicore.
>
> Please advise which is better in terms of scaling and optimum use of existing
> resources (the available RAM is about 4GB for several instances of Solr).
>
> If we use multicore, will it degrade in terms of speed etc.?
>
> Any pointers will be helpful.
>
> Regards
> Sujatha
>
> On 12/19/08, Julian Davchev <j...@drun.net> wrote:
>> Thanks Erick,
>> I think I will go with different language fields, as I want to use
>> different stop words, analyzers, etc.
>> I might also consider a schema per language, so scaling is more flexible,
>> as I was already advised, but that will really make sense only if I have
>> more than one server, I guess; otherwise all the other data is duplicated
>> for no reason.
>> We have already decided that the language will be passed with each search,
>> so it won't make sense to search the query in every language.
>>
>> As for CJKAnalyzer, at first look it doesn't seem to be in Solr (I haven't
>> tried yet), and since I am a noob in Java, I will check how it's done.
>> Will definitely give it a try.
>>
>> Thanks a lot for the help.
>>
>> Erick Erickson wrote:
>>> See the CJKAnalyzer for a start; StandardAnalyzer won't
>>> help you much.
>>>
>>> Also, tell us a little more about your requirements. For instance,
>>> if a user submits a query in Japanese, do you want to search
>>> across documents in the other languages too? And will you want
>>> to associate different analyzers with the content from different
>>> languages?
>>>
>>> You really have two options:
>>>
>>> If you want different analyzers used with the different languages,
>>> you probably have to index the content in different fields. That is,
>>> a Chinese document would have a chinese_content field, a Japanese
>>> document would have a japanese_content field, etc. Now you can
>>> associate a different analyzer with each *_content field.
>>>
>>> If the same analyzer would work for all three languages, you
>>> can just index all the content in a "content" field, and if you
>>> need to restrict searching to the language in which the query
>>> was submitted, you could always add a clause on the
>>> language, e.g. AND language:chinese
>>>
>>> Hope this helps
>>> Erick
>>>
>>> On Wed, Dec 17, 2008 at 11:15 PM, Sujatha Arun <suja.a...@gmail.com> wrote:
>>>> Hi,
>>>>
>>>> I am prototyping language search using Solr 1.3. I have 3 fields in the
>>>> schema: id, content and language.
>>>>
>>>> I am indexing 3 PDF files; the languages are foroyo, Chinese and Japanese.
>>>>
>>>> I use xpdf to convert the content of the PDFs to text and push the text
>>>> to Solr in the content field.
>>>>
>>>> What is the analyzer that I need to use for the above?
>>>>
>>>> By using the default text analyzer and posting this content to Solr,
>>>> I am not getting any results.
>>>>
>>>> Does Solr support stemming for the above languages?
>>>>
>>>> Regards
>>>> Sujatha
>>>>
>>>> On 12/18/08, Feak, Todd <todd.f...@smss.sony.com> wrote:
>>>>> Don't forget to consider scaling concerns (if there are any). There are
>>>>> strong differences in the number of searches we receive for each
>>>>> language. We chose to create a separate schema and config per language
>>>>> so that we can throw servers at a particular language (or set of
>>>>> languages) if we need to. We see 2 orders of magnitude difference
>>>>> between our most popular language and our least popular.
>>>>>
>>>>> -Todd Feak
>>>>>
>>>>> -----Original Message-----
>>>>> From: Julian Davchev [mailto:j...@drun.net]
>>>>> Sent: Wednesday, December 17, 2008 11:31 AM
>>>>> To: solr-user@lucene.apache.org
>>>>> Subject: looking for multilanguage indexing best practice/hint
>>>>>
>>>>> Hi,
>>>>> From my study of Solr and Lucene so far, it seems that I will use a
>>>>> single schema... at least I don't see a scenario where I'd need more
>>>>> than that.
>>>>> So the question is how to approach multilanguage indexing and
>>>>> multilanguage searching. Will it really make sense to search just by
>>>>> word, or should I supply a lang param to the search as well?
>>>>>
>>>>> I see there are those filters and was already advised on them, but I
>>>>> guess the question is more one of best practice:
>>>>> solr.ISOLatin1AccentFilterFactory, solr.SnowballPorterFilterFactory
>>>>>
>>>>> So the solution I see is, using copyField, to have the same field in
>>>>> different languages, or something using a distinct filter.
>>>>> Cheers
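For anyone landing on this thread later: Erick's first option (a content field per language, each with its own analyzer) can be sketched in schema.xml roughly as below. This is only a sketch, not a tested config: the field and type names (text_cjk, content_zh, content_ja) are illustrative, and CJKAnalyzer lives in the Lucene contrib analyzers jar, which has to be on Solr's classpath — Solr 1.3 lets you point a fieldType at a full analyzer class directly.

```xml
<!-- Sketch only: per-language field types, names are illustrative -->
<types>
  <!-- Lucene's contrib CJKAnalyzer indexes Chinese/Japanese as bigrams -->
  <fieldType name="text_cjk" class="solr.TextField">
    <analyzer class="org.apache.lucene.analysis.cjk.CJKAnalyzer"/>
  </fieldType>
  <fieldType name="text_plain" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
</types>
<fields>
  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <field name="language" type="string" indexed="true" stored="true"/>
  <!-- one content field per language, each with its own analysis chain -->
  <field name="content_zh" type="text_cjk" indexed="true" stored="true"/>
  <field name="content_ja" type="text_cjk" indexed="true" stored="true"/>
</fields>
```

At index time each document then only populates the content_* field matching its language, and queries go against that one field.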
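Sujatha's objection — that separate per-language fields mean editing the schema for every new language — can be softened with a dynamic field, as mentioned in the thread. A minimal sketch, assuming a "text_plain" fieldType exists; the pattern name is illustrative:

```xml
<!-- Sketch: one dynamicField pattern covers future languages without
     schema edits; any field posted as content_<lang> matches it -->
<dynamicField name="content_*" type="text_plain" indexed="true" stored="true"/>
```

The trade-off is that every field matching the pattern shares the same analyzer, so languages that need their own analysis chain (e.g. CJK) still want an explicit field definition, which takes precedence over the dynamic pattern.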
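Erick's second option (one shared "content" field plus a language restriction) only needs an extra clause or filter at query time. Illustrative request parameters, not from the thread:

```
q=content:WORD AND language:chinese    (query clause, as Erick suggests)
fq=language:chinese&q=content:WORD     (filter query; cached independently,
                                        usually cheaper when repeated)
```

The fq form fits the stated requirement that the client always passes the language with each search.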