Hi, just a question: I thought that if I write <analyzer> without defining a type such as index or query, it would apply to both. Is that right?
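To make the question concrete, here is a minimal sketch of the two forms I am comparing. The field type names text_it_single and text_it_split are made up for illustration; the tokenizer and filters are taken from the text_it field type in my schema. As far as I understand the Solr analyzer documentation, an <analyzer> with no type attribute is used at both index and query time, so these two definitions should behave the same:

```xml
<!-- Form 1: a single untyped analyzer (my assumption: used for both indexing and querying) -->
<fieldtype name="text_it_single" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Italian"/>
  </analyzer>
</fieldtype>

<!-- Form 2: the same chain spelled out once for index time and once for query time -->
<fieldtype name="text_it_split" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Italian"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Italian"/>
  </analyzer>
</fieldtype>
```

If that is right, the split form would only be needed when the two sides should differ, as in the example "text" field type where synonyms are applied only at query time.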
Thanks,

John E. McBride wrote:
> In your schema you define each field as follows:
>
> <fieldtype name="text_it" class="solr.TextField">
>   <analyzer>
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StandardFilterFactory"/>
>     <filter class="solr.ISOLatin1AccentFilterFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="Italian"/>
>   </analyzer>
> </fieldtype>
>
> etc.
>
> However, you have not defined the query filters. If you do not do this,
> then you will not get any matches for searches in different languages.
>
> For example, in English, if you index the sentence "the joyful boy played
> tennis", this would typically get stored as "joy boy play tennis" due to
> the analysis filters. If you then made a query for "joyful" without
> applying the same filters on the query side, you would get no matches.
>
> You will also want to get some multilingual stop word lists from the
> Snowball website, e.g.
> http://snowball.tartarus.org/algorithms/german/stop.txt.
>
> sunnyfr wrote:
>> What is the problem with the way that I've done it?
>> Does that mean that there are some languages that we won't manage by
>> search?
>> There are too many languages. The application will be for video;
>> we will manage around 10 languages, but in our database we have around 25
>> languages.
>> Should I create a core text and others like text_en, text_fr, text_es,
>> and should all the videos that are not in the languages managed by the
>> search engine be stored in text?
>>
>> Because even if they are on the English website, they should be able to
>> find French videos if they enter a French word, "chien" for "dog".
>> I don't know if I'm clear?
>>
>> And even so, should text manage all the other languages which are not
>> managed in the other cores?
>>
>> thanks
>>
>> John E. McBride wrote:
>>> Well, it's this section shown below, which would change from geography
>>> to geography.
>>> Parameterise the EnglishPorterFilterFactory and protwords.
>>>
>>> You could introduce logic in the front end which asks if num results is
>>> zero and then makes a call to the English language core, but it doesn't
>>> make logical sense: why would a search in the Italian language bring up
>>> anything in the English index?
>>>
>>> I think you need to explain your application in a little more detail.
>>>
>>> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>>>   <analyzer type="index">
>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>     <!-- in this example, we will only use synonyms at query time
>>>     <filter class="solr.SynonymFilterFactory"
>>>             synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
>>>     -->
>>>     <!-- Case insensitive stop word removal.
>>>          enablePositionIncrements=true ensures that a 'gap' is left to
>>>          allow for accurate phrase queries.
>>>     -->
>>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>             words="stopwords.txt" enablePositionIncrements="true"/>
>>>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>>>             generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>>>             catenateAll="0" splitOnCaseChange="1"/>
>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>     <filter class="solr.EnglishPorterFilterFactory"
>>>             protected="protwords.txt"/>
>>>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>>   </analyzer>
>>>   <analyzer type="query">
>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>>>             ignoreCase="true" expand="true"/>
>>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>             words="stopwords.txt"/>
>>>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>>>             generateNumberParts="1" catenateWords="0" catenateNumbers="0"
>>>             catenateAll="0" splitOnCaseChange="1"/>
>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>     <filter class="solr.EnglishPorterFilterFactory"
>>>             protected="protwords.txt"/>
>>>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>>   </analyzer>
>>> </fieldType>
>>>
>>> sunnyfr wrote:
>>>> Hi,
>>>>
>>>> Thanks guys for your answers, but I don't think I can use multi-core
>>>> for each language, because, for example, if somebody is connected from
>>>> Italy and there are not that many Italian books, then by default I will
>>>> show the few Italian books but all the English ones as well.
>>>>
>>>> Do you have an example?
>>>> I'm quite lost about it.
>>>>
>>>> John E. McBride wrote:
>>>>> Fairly nebulous requirements, but I was recently involved in a
>>>>> multilingual search platform.
>>>>>
>>>>> The approach, translated to Solr 1.3, would be to use multicore: one
>>>>> core per geography. Then a schema.xml per core, each with a different
>>>>> language in the porter algorithm, stopwords etc., taken from Snowball.
>>>>>
>>>>> Then on the German front end you make requests to the de core; on the
>>>>> English front end you make requests to the English core.
>>>>>
>>>>> This is much simpler than storing every language in the one index; for
>>>>> example, German queries will need to be run through the German query
>>>>> filters etc. If you have all languages in one schema, then you will
>>>>> have to do some front end logic to map the query to the correct field.
>>>>>
>>>>> You have failed to consider internationalisation of the query side of
>>>>> the process: your field types merely have analysis filters.
>>>>>
>>>>> Additionally, if the data source for each geography is different, it
>>>>> makes sense to separate the indexes and subsequently the ingestion
>>>>> mechanisms and schedules.
>>>>>
>>>>> Just a few thoughts.
>>>>>
>>>>> John
>>>>>
>>>>> sunnyfr wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I would like to properly manage a multi-language search engine,
>>>>>> and I would like your advice about what I have done.
>>>>>>
>>>>>> Solr1.3
>>>>>> tomcat55
>>>>>>
>>>>>> http://www.nabble.com/file/p19954805/schema.xml schema.xml
>>>>>>
>>>>>> Thanks a lot,

--
View this message in context: http://www.nabble.com/Multi-language-solr1.3-what-would-you-reckon--tp19954805p20059666.html
Sent from the Solr - User mailing list archive at Nabble.com.