Hi Erick, thanks for your help! I need some technical help though... let me put it this way:
1. I deleted everything in the index with:

   curl http://localhost:8983/solr/update -F stream.body=' <delete><query>*:*</query></delete>'
   curl http://localhost:8983/solr/update -F stream.body=' <commit />'

2. I created 2 documents with fields: name_en, answer_en, name_es, answer_es

3. I made a query through the admin page, with this response:

   <response>
     <lst name="responseHeader">
       <int name="status">0</int>
       <int name="QTime">9</int>
       <lst name="params">
         <str name="indent">on</str>
         <str name="start">0</str>
         <str name="q">Jakub </str>
         <str name="version">2.2</str>
         <str name="rows">10</str>
       </lst>
     </lst>
     <result name="response" numFound="2" start="0">
       <doc>
         <arr name="answer_en_t">
           <str>My name is Jakub</str>
         </arr>
         <arr name="answer_es_t">
           <str>Me llamo Jakub.</str>
         </arr>
         <arr name="id">
           <str>Question:1</str>
         </arr>
         <arr name="name_en_t">
           <str>What is your name?</str>
         </arr>
         <arr name="name_es_t">
           <str>Como te llamas?</str>
         </arr>
         <arr name="pk_s">
           <str>1</str>
         </arr>
         <arr name="spell">
           <str>What is your name?</str>
           <str>My name is Jakub</str>
           <str>Como te llamas?</str>
           <str>Me llamo Jakub.</str>
         </arr>
       </doc>
       <doc>
         <arr name="answer_en_t">
           <str>I am in the kitchen Jakub!</str>
         </arr>
         <arr name="answer_es_t">
           <str>Estoy en la cocina.</str>
         </arr>
         <arr name="id">
           <str>Question:2</str>
         </arr>
         <arr name="name_en_t">
           <str>Where are you?</str>
         </arr>
         <arr name="name_es_t">
           <str>Donde estas?</str>
         </arr>
         <arr name="pk_s">
           <str>2</str>
         </arr>
         <arr name="spell">
           <str>Where are you?</str>
           <str>I am in the kitchen Jakub!</str>
           <str>Donde estas?</str>
           <str>Estoy en la cocina.</str>
         </arr>
       </doc>
     </result>
   </response>

4. Now I need two dismax handlers to make it work in two separate languages.
Let's say I just want to search the *_en fields, so I created a dismax handler:

   <requestHandler name="/English" class="solr.SearchHandler">
     <lst name="defaults">
       <str name="defType">dismax</str>
       <str name="echoParams">explicit</str>
       <float name="tie">0.01</float>
       <str name="qf">
         name_en_t^0.5 answer_en_t^1.0
       </str>
     </lst>
   </requestHandler>

5. Hitting the URL http://localhost:8982/solr/English/?q=Jakub gave me an error:

   there are more terms than documents in field "name_en_t", but it's impossible to sort on tokenized fields

6. I know that I should create a separate dismax handler for Spanish.

My questions:

1. Why are those fields named with *_t? I saw in schema.xml that they are created dynamically. Can/should I create my own predefined fields in schema.xml? Is this the place where you specify how the field should be interpreted by the indexer?
2. Why is the error in step 5 being thrown? I know that you cannot sort on tokenized fields, but I don't see myself trying to index anything or tokenize.
3. How should it be changed to work properly?

Thank you, and I ask for your patience, as this can help many rookies like me get started.

Jakub.

2010/10/21 Erick Erickson <erickerick...@gmail.com>

> See below:
>
> But also search the archives for multilanguage, this topic has been discussed
> many times before. Lucid Imagination maintains a Solr-powered (of course)
> searchable list at: http://www.lucidimagination.com/search/
>
> On Wed, Oct 20, 2010 at 9:03 AM, Jakub Godawa <jakub.god...@gmail.com> wrote:
>
> > Hi everyone! (my first post)
> >
> > I am new, but really curious about the usefulness of lucene/solr for
> > document search in web applications. I use Ruby on Rails to create one, with
> > the plugin "acts_as_solr_reloaded", which makes the connection between the
> > web app and solr easy.
> >
> > So I am at a point where I know that a good solution is to prepare
> > multi-language documents with fields like:
> > question_en, answer_en,
> > question_fr, answer_fr,
> > question_pl, answer_pl... etc.
> >
> > I need to create an index that would work with 6 languages: English,
> > French, German, Russian, Ukrainian and Polish.
> >
> > My questions are:
> > 1. Is it doable to have just one search field that behaves like Google's
> > for all those documents? It can be an option to indicate a language to
> > search.
> >
>
> This depends on what you mean by do-able. Are you going to allow a French
> user to search an English document (& etc)? But the real answer is "yes, you
> can if you .....". There'll be tradeoffs.
>
> Take a look at the dismax handler. It's kind of hard to grok all at once,
> but you can cause it to search across multiple fields. That is, the user
> types "language", and you can turn it into a complex query under the covers
> like lang_en:language lang_fr:language lang_ru:language, etc. You can also
> apply boosts. Note that this has obvious problems with, say, Russian. Half
> your job will be figuring out what will satisfy the user.....
>
> You could also have a #different# dismax handler defined for various
> languages. Say the user was coming from Spanish. Consider a browseES
> handler. See solrconfig.xml for the default dismax handler. The Solr book
> mentioned above describes this.
>
> > 2. How should I begin changing the solr/conf/schema.xml (or other) file
> > to tailor it to my needs? As I am a real rookie here, I am still a bit
> > confused about "fields", "fieldTypes" and their connection with a
> > particular field (e.g. answer_fr) and the "tokenizers" and "analyzers".
> > If someone can provide a basic step-by-step tutorial on how to make it
> > work in two languages I would be more than happy.
> >
>
> You have several choices here:
>
> The books "Lucene in Action" and "Solr 1.4 Enterprise Search Server" both
> have discussions here.
>
> Spend some time on the solr/admin/analysis page. That page allows you to
> see pretty much exactly what each of the steps in an analyzer chain
> accomplishes.
>
> > 3. Are all those languages supported (officially/unofficially) by
> > lucene/solr?
> >
>
> See:
> http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/analysis/Analyzer.html
> Remember that Solr is built on Lucene, so these analyzers are available.
>
> > Thank you for help,
> > Jakub Godawa.
> >
>
> Best
> Erick
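
For reference, the separate Spanish handler mentioned in step 6 could be sketched as below. This is only a mirror of the /English handler from step 4, not a tested configuration: the handler name /Spanish is hypothetical, and the tie and boost values are simply assumed to be the same as in the English one.

```xml
<!-- Sketch only: "/Spanish" is an assumed name; tie and boosts are copied from the /English handler -->
<requestHandler name="/Spanish" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <!-- Search only the Spanish fields that appear in the query response from step 3 -->
    <str name="qf">
      name_es_t^0.5 answer_es_t^1.0
    </str>
  </lst>
</requestHandler>
```

It would presumably be queried the same way as the English handler, with /Spanish in place of /English in the URL.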