Jack,

I have these in schema.xml, which define "features" as type text_tr.
But unfortunately, the search still fails.

<field name="features" type="text_tr" indexed="true" stored="true" multiValued="true"/>
<copyField source="features" dest="text"/>

<fieldType name="text_tr" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.TurkishLowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Turkish"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.TurkishLowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Turkish"/>
  </analyzer>
</fieldType>

On Mon, Dec 3, 2012 at 1:15 PM, Jack Krupansky <j...@basetechnology.com> wrote:

> Ah! See where it says "<str name="parsedquery_toString">text:baş</str>"?
> Your query is against the "text" field, which probably doesn't have the
> Turkish analysis.
>
> There is probably a copyField from "features" to "text". You use the
> "text_tr" field type for "features", but probably not for the "text" field.
>
> -- Jack Krupansky
>
> -----Original Message----- From: Erol Akarsu
> Sent: Monday, December 03, 2012 1:06 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Luke and SOLR search giving different results
>
> Jack,
>
> I have already set the Tomcat server to UTF-8 encoding. I have added
> URIEncoding="UTF-8" to all <Connector ..> elements in server.xml in Tomcat
> 7.
>
> As you see below, when I search for the word "baş" in debug mode, I get an
> empty response. But when I search for the word "baştan", I get the correct
> response.
>
> It seems to me that TurkishAnalyzer is not being used in SOLR search,
> because only the full word "baştan" can be found, not the root word
> "baş". Probably an English analyzer is being used and cannot find the root
> word. For example, in Luke, if I change "Analyser to use for query parsing"
> to EnglishAnalyzer, then it cannot find the word "baş"; it finds it only
> with TurkishAnalyzer. I guess SOLR is not using TurkishAnalyzer.
>
> Is this assumption true? I could not find any other reason.
>
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">58</int>
>     <lst name="params">
>       <str name="debugQuery">true</str>
>       <str name="q">baş</str>
>       <str name="wt">xml</str>
>     </lst>
>   </lst>
>   <result name="response" numFound="0" start="0" />
>   <lst name="debug">
>     <str name="rawquerystring">baş</str>
>     <str name="querystring">baş</str>
>     <str name="parsedquery">text:baş</str>
>     <str name="parsedquery_toString">text:baş</str>
>     <lst name="explain" />
>     <str name="QParser">LuceneQParser</str>
>     <lst name="timing">
>       <double name="time">38.0</double>
>       <lst name="prepare">
>         <double name="time">16.0</double>
>         <lst name="org.apache.solr.handler.component.QueryComponent">
>           <double name="time">3.0</double>
>         </lst>
>         <lst name="org.apache.solr.handler.component.FacetComponent">
>           <double name="time">0.0</double>
>         </lst>
>         <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
>           <double name="time">0.0</double>
>         </lst>
>         <lst name="org.apache.solr.handler.component.HighlightComponent">
>           <double name="time">0.0</double>
>         </lst>
>         <lst name="org.apache.solr.handler.component.StatsComponent">
>           <double name="time">0.0</double>
>         </lst>
>         <lst name="org.apache.solr.handler.component.DebugComponent">
>           <double name="time">0.0</double>
>         </lst>
>       </lst>
>       <lst name="process">
>         <double name="time">10.0</double>
>         <lst name="org.apache.solr.handler.component.QueryComponent">
>           <double name="time">0.0</double>
>         </lst>
>         <lst name="org.apache.solr.handler.component.FacetComponent">
>           <double name="time">0.0</double>
>         </lst>
>         <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
>           <double name="time">0.0</double>
>         </lst>
>         <lst name="org.apache.solr.handler.component.HighlightComponent">
>           <double name="time">0.0</double>
>         </lst>
>         <lst name="org.apache.solr.handler.component.StatsComponent">
>           <double name="time">0.0</double>
>         </lst>
>         <lst name="org.apache.solr.handler.component.DebugComponent">
>           <double name="time">10.0</double>
>         </lst>
>       </lst>
>     </lst>
>   </lst>
> </response>
>
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">2</int>
>     <lst name="params">
>       <str name="debugQuery">true</str>
>       <str name="q">baştan</str>
>       <str name="wt">xml</str>
>     </lst>
>   </lst>
>   <result name="response" numFound="1" start="0">
>     <doc>
>       <str name="url">htt://111.a.b1</str>
>       <str name="id">6H500F0XXXX</str>
>       <str name="lang">tr</str>
>       <str name="name">Maxtor DiamondMax 11 - hard drive - 500 GB - SATA-300</str>
>       <str name="manu">Maxtor Corp.</str>
>       <str name="manu_id_s">maxtor</str>
>       <arr name="cat">
>         <str>electronics</str>
>         <str>hard drive</str>
>       </arr>
>       <arr name="features">
>         <str>SATA 3.0Gb/s, NCQ</str>
>         <str>8.5ms seek</str>
>         <str>16MB cache</str>
>         <str>
>           Firmalarsa "Nasılsa buldum oynatacak ünlüyü, neyleyim senaryoyu!" diyerek
>           baştan savma reklamlarla kotarmaya bakıyor işi. Futbolcu Arda Turan
>           ve büyük umutlarla Türkiye'ye getirilen Paris Hilton'un oynatıldığı
>           giyim firması reklamı da tam bir fiyasko. Birbirinden ünlü bu iki
>           ismin oynadığı reklam Arda'nın kabinde papağan gibi tekrarladığı
>           "My darling!" repliği, sonunda Paris'i görünce anlam veremediğimiz
>           uyduruk bayılma sahnesi, bir de Paris'in ancak 5 kez izledikten
>           sonra anlaşılan "Paris seçti, firma yaptı, Arda bayıldı."
>           sözleriyle kazındı hafızalara, "Keşke unutabilsek!" dedirterek.
>         </str>
>       </arr>
>       <float name="price">350.0</float>
>       <str name="price_c">350,USD</str>
>       <int name="popularity">6</int>
>       <bool name="inStock">true</bool>
>       <date name="manufacturedate_dt">2006-02-13T15:26:37Z</date>
>       <long name="_version_">1420300467908378624</long>
>     </doc>
>   </result>
>   <lst name="debug">
>     <str name="rawquerystring">baştan</str>
>     <str name="querystring">baştan</str>
>     <str name="parsedquery">text:baştan</str>
>     <str name="parsedquery_toString">text:baştan</str>
>     <lst name="explain">
>       <str name="6H500F0XXXX">
> 0.028767452 = (MATCH) weight(text:baştan in 0) [DefaultSimilarity], result of:
>   0.028767452 = fieldWeight in 0, product of:
>     1.0 = tf(freq=1.0), with freq of:
>       1.0 = termFreq=1.0
>     0.30685282 = idf(docFreq=1, maxDocs=1)
>     0.09375 = fieldNorm(doc=0)
>       </str>
>     </lst>
>     <str name="QParser">LuceneQParser</str>
>     <lst name="timing">
>       <double name="time">2.0</double>
>       <lst name="prepare">
>         <double name="time">1.0</double>
>         <lst name="org.apache.solr.handler.component.QueryComponent">
>           <double name="time">1.0</double>
>         </lst>
>         <lst name="org.apache.solr.handler.component.FacetComponent">
>           <double name="time">0.0</double>
>         </lst>
>         <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
>           <double name="time">0.0</double>
>         </lst>
>         <lst name="org.apache.solr.handler.component.HighlightComponent">
>           <double name="time">0.0</double>
>         </lst>
>         <lst name="org.apache.solr.handler.component.StatsComponent">
>           <double name="time">0.0</double>
>         </lst>
>         <lst name="org.apache.solr.handler.component.DebugComponent">
>           <double name="time">0.0</double>
>         </lst>
>       </lst>
>       <lst name="process">
>         <double name="time">1.0</double>
>         <lst name="org.apache.solr.handler.component.QueryComponent">
>           <double name="time">0.0</double>
>         </lst>
>         <lst name="org.apache.solr.handler.component.FacetComponent">
>           <double name="time">0.0</double>
>         </lst>
>         <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
>           <double name="time">0.0</double>
>         </lst>
>         <lst name="org.apache.solr.handler.component.HighlightComponent">
>           <double name="time">0.0</double>
>         </lst>
>         <lst name="org.apache.solr.handler.component.StatsComponent">
>           <double name="time">0.0</double>
>         </lst>
>         <lst name="org.apache.solr.handler.component.DebugComponent">
>           <double name="time">1.0</double>
>         </lst>
>       </lst>
>     </lst>
>   </lst>
> </response>
>
> On Mon, Dec 3, 2012 at 12:30 PM, Jack Krupansky <j...@basetechnology.com> wrote:
>
>> Two points:
>>
>> 1. Possibly an encoding problem with your container? Is UTF-8 encoding
>> enabled?
>> 2. Add &debugQuery=true to your query (from the browser) and see if the
>> parsedquery has the expected term that matches what Luke reports for the
>> index and what Solr Admin Analysis also reports for index analysis.
>>
>> -- Jack Krupansky
>>
>> -----Original Message----- From: Erol Akarsu
>> Sent: Monday, December 03, 2012 11:35 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Luke and SOLR search giving different results
>>
>> Jack,
>>
>> Yes.
>>
>> I expect SOLR to give the same search results as Luke does.
>>
>> The term analyzer gives the correct answer in SOLR, as expected. But SOLR
>> does not return the correct search results.
>>
>> I don't know why.
>>
>> Erol Akarsu
>>
>> On Mon, Dec 3, 2012 at 11:21 AM, Jack Krupansky <j...@basetechnology.com> wrote:
>>
>>> So, does that highlight the problem for you or not? Is the term analyzed
>>> as you expected?
>>>
>>> -- Jack Krupansky
>>>
>>> From: Erol Akarsu
>>> Sent: Monday, December 03, 2012 8:44 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Luke and SOLR search giving different results
>>>
>>> Jack,
>>>
>>> Thanks for the help.
>>>
>>> I removed the SOLR data folder and indexed this sample doc from scratch,
>>> so it is now the only document in SOLR.
>>>
>>> When I analysed it, I could see that the stemming is correct, and I could
>>> see the words "bul", "baş", "gör" and "umut" in the SF row.
>>> I attached the analysis screens.
>>>
>>> Erol Akarsu
>>>
>>> On Sun, Dec 2, 2012 at 11:00 PM, Jack Krupansky <j...@basetechnology.com> wrote:
>>>
>>> Have you tried using the Solr Admin Analysis page, using the word and a
>>> few words of context for index analysis and the word alone for query
>>> analysis?
>>>
>>> And be sure to fully reindex if you change ANYTHING in the schema fields
>>> or field types.
>>>
>>> -- Jack Krupansky
>>>
>>> From: Erol Akarsu
>>> Sent: Sunday, December 02, 2012 10:38 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Luke and SOLR search giving different results
>>>
>>> Hi,
>>>
>>> I am trying to apply SOLR to the Turkish language for my research.
>>>
>>> Instead of using language identification, I manually assigned the Turkish
>>> language to a sample test document. I configured the SOLR schema.xml and
>>> activated the part below. I added the attached document
>>> testTurkishDoc.xml, which is inserted into the SOLR database.
>>>
>>> But searching the raw Lucene index through Luke and searching through the
>>> SOLR 4.0 GUI give different results. In picture Selection_006.png, the
>>> word "baş" is listed as the top term. I searched for the word "baş" in
>>> Luke and got the only document as the result, shown in Selection_004.png.
>>>
>>> But in the SOLR GUI, I get an empty result for the word "baş", shown in
>>> picture Selection_002.png.
>>>
>>> In the text we have the features field, which contains the word "baştan",
>>> derived from the root word "baş" in Turkish grammar. Somehow, the SOLR GUI
>>> searches differently than Luke. I could not figure out why SOLR cannot
>>> find it while Luke can. The same thing happens for the words "umut",
>>> "bul" and "gör".
>>>
>>> I would appreciate it if you could help me get the same results from the
>>> SOLR UI.
>>>
>>> <field name="features">
>>> Firmalarsa "Nasılsa buldum oynatacak ünlüyü, neyleyim senaryoyu!"
>>> diyerek baştan savma reklamlarla kotarmaya bakıyor işi. Futbolcu Arda Turan
>>> ve büyük umutlarla Türkiye'ye getirilen Paris Hilton'un oynatıldığı giyim
>>> firması reklamı da tam bir fiyasko. Birbirinden ünlü bu iki ismin oynadığı
>>> reklam Arda'nın kabinde papağan gibi tekrarladığı "My darling!" repliği,
>>> sonunda Paris'i görünce anlam veremediğimiz uyduruk bayılma sahnesi, bir de
>>> Paris'in ancak 5 kez izledikten sonra anlaşılan "Paris seçti, firma yaptı,
>>> Arda bayıldı." sözleriyle kazındı hafızalara, "Keşke unutabilsek!"
>>> dedirterek.
>>> </field>
>>>
>>> Added to schema.xml for SOLR:
>>>
>>> <field name="features" type="text_tr" indexed="true" stored="true" multiValued="true"/>
>>> <fieldType name="text_tr" class="solr.TextField" positionIncrementGap="100">
>>>   <analyzer type="index">
>>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>>     <filter class="solr.TurkishLowerCaseFilterFactory"/>
>>>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
>>>     <filter class="solr.SnowballPorterFilterFactory" language="Turkish"/>
>>>   </analyzer>
>>>   <analyzer type="query">
>>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>>     <filter class="solr.TurkishLowerCaseFilterFactory"/>
>>>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
>>>     <filter class="solr.SnowballPorterFilterFactory" language="Turkish"/>
>>>   </analyzer>
>>> </fieldType>
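
Note on the diagnosis earlier in this thread (the parsed query is text:baş, so the search runs against "text", not "features"): the stemmed term "baş" would only match if the field being queried is analyzed with the Turkish chain. The fragment below is a minimal sketch of one way to try that, not a confirmed fix. It assumes the catch-all "text" field is declared as in the stock Solr 4.0 example schema (indexed, not stored, multiValued); the actual declaration is not shown in this thread, so check yours before changing it.

```xml
<!-- Sketch only. Assumption: "text" is the default search field and is
     currently declared with a non-Turkish type such as text_general. -->

<!-- Give the catch-all field the same Turkish analysis as "features",
     so copied values are lowercased and stemmed the same way at index
     and query time. -->
<field name="text" type="text_tr" indexed="true" stored="false" multiValued="true"/>

<!-- Already present in the schema quoted at the top of this message. -->
<copyField source="features" dest="text"/>
```

This changes the analysis for everything copied into "text", including the English product fields, so it may not suit a mixed-language index. A full reindex is needed after any such schema change. A less invasive check is to query the field directly with q=features:baş and see whether the document comes back.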