Luke and SOLR search giving different results

2012-12-02 Thread Erol Akarsu
Hi,

I am trying to use SOLR with the Turkish language for my research.

Instead of using language identification, I manually assigned the Turkish
language to a sample test document. I configured the SOLR schema.xml and
activated the part below. I then indexed the attached document
testTurkishDoc.xml into SOLR.

But searching the raw Lucene index through Luke and searching through the
SOLR 4.0 GUI give different results. In picture Selection_006.png, the word
"baş" is listed as the top term. When I search for "baş" in Luke, I get a
single result, the only document, as shown in Selection_004.png.

But in the SOLR GUI, I get an empty result for the word "baş", as shown in
picture Selection_002.png.

The document has a features field that contains the word "baştan", which is
derived from the root word "baş" in Turkish grammar. Somehow, the SOLR GUI
searches differently from Luke. I could not figure out why SOLR cannot find
the document when Luke can. The same thing happens for the words "umut",
"bul" and "gör".

I would appreciate any help in getting the same results from the SOLR UI.



   Firmalarsa "Nasılsa buldum oynatacak ünlüyü, neyleyim senaryoyu!"
diyerek baştan savma reklamlarla kotarmaya bakıyor işi. Futbolcu Arda Turan
ve büyük umutlarla Türkiye'ye getirilen Paris Hilton'un oynatıldığı giyim
firması reklamı da tam bir fiyasko. Birbirinden ünlü bu iki ismin oynadığı
reklam Arda'nın kabinde papağan gibi tekrarladığı "My darling!" repliği,
sonunda Paris'i görünce anlam veremediğimiz uyduruk bayılma sahnesi, bir de
Paris'in ancak 5 kez izledikten sonra anlaşılan "Paris seçti, firma yaptı,
Arda bayıldı." sözleriyle kazındı hafızalara, "Keşke unutabilsek!"
dedirterek.
  



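Added to schema.xml for SOLR:

  ... multiValued="true"/>
  ... positionIncrementGap="100">
    ... words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
    ... language="Turkish"/>
    ... words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
    ... language="Turkish"/>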


  htt://111.a.b1
  6H500F0
  tr
  Maxtor DiamondMax 11 - hard drive - 500 GB - SATA-300
  Maxtor Corp.
  
  maxtor
  electronics
  hard drive
  SATA 3.0Gb/s, NCQ
  8.5ms seek
  16MB cache
  350
  6
  true
  
   Firmalarsa “Nasılsa buldum oynatacak ünlüyü, neyleyim senaryoyu!” diyerek baştan savma reklamlarla kotarmaya bakıyor işi. Futbolcu Arda Turan ve büyük umutlarla Türkiye’ye getirilen Paris Hilton’un oynatıldığı giyim firması reklamı da tam bir fiyasko. Birbirinden ünlü bu iki ismin oynadığı reklam Arda’nın kabinde papağan gibi tekrarladığı “My darling!” repliği, sonunda Paris’i görünce anlam veremediğimiz uyduruk bayılma sahnesi, bir de Paris’in ancak 5 kez izledikten sonra anlaşılan “Paris seçti, firma yaptı, Arda bayıldı.” sözleriyle kazındı hafızalara, “Keşke unutabilsek!” dedirterek.
   
  
  2006-02-13T15:26:37Z
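
For readers without the attachment: a Turkish field type of the kind described in this message typically looks like the sketch below. It assumes the text_tr definition shipped with the Solr 4.0 example schema, split into index and query analyzers. The tokenizer and filter class names, and most of the field attributes, are assumptions, since the original markup was not preserved in the archive.

```xml
<!-- schema.xml (sketch; class names and attributes are assumed, not taken from the thread) -->
<field name="features" type="text_tr" indexed="true" stored="true" multiValued="true"/>

<fieldType name="text_tr" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Turkish-aware lowercasing (dotted and dotless i) -->
    <filter class="solr.TurkishLowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="false"
            words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
    <!-- Snowball stemmer; in this thread it turns "baştan" into the term "baş" -->
    <filter class="solr.SnowballPorterFilterFactory" language="Turkish"/>
  </analyzer>
  <analyzer type="query">
    <!-- Same chain at query time, so query terms are stemmed the same way -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.TurkishLowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="false"
            words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Turkish"/>
  </analyzer>
</fieldType>
```

A field type like this only affects fields declared with it; a query against any other field goes through that other field's analyzer.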





Re: Luke and SOLR search giving different results

2012-12-03 Thread Erol Akarsu
Jack,

Thanks for the help.

I removed SOLR's data folder and indexed this sample document from scratch,
so it is the only document in the index.

When I ran the analysis, the stemming looked correct: I can see the terms
"bul", "baş", "gör" and "umut" in the SF row.
I have attached the analysis screenshots.

Erol Akarsu

On Sun, Dec 2, 2012 at 11:00 PM, Jack Krupansky wrote:

> Have you tried using the Solr Admin Analysis page, using the word and a
> few words of context for index analysis and the word alone for query
> analysis?
>
> And be sure to fully reindex if you change ANYTHING in the schema fields
> or field types.
>
> -- Jack Krupansky


Re: Luke and SOLR search giving different results

2012-12-03 Thread Erol Akarsu
Jack,

Yes.

I expect SOLR to give the same search results as Luke does.

The term analysis in SOLR gives the correct answer, as expected. But SOLR
does not return the correct search results.

I don't know why.

Erol Akarsu

On Mon, Dec 3, 2012 at 11:21 AM, Jack Krupansky wrote:

> So, does that highlight the problem for you or not? Is the term analyzed
> as you expected?
>
> -- Jack Krupansky


Re: Luke and SOLR search giving different results

2012-12-03 Thread Erol Akarsu
> [...] expected term that matches what Luke reports for the
> index and what Solr Admin Analysis also reports for index analysis.
>
> -- Jack Krupansky


Re: Luke and SOLR search giving different results

2012-12-03 Thread Erol Akarsu
Jack,

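I have the following in schema.xml, which defines "features" with the type
text_tr.

But unfortunately, the search still fails.

  ... multiValued="true"/>

  ... positionIncrementGap="100">
    ... words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
    ... language="Turkish"/>
    ... words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
    ... language="Turkish"/>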


On Mon, Dec 3, 2012 at 1:15 PM, Jack Krupansky wrote:

> Ah! See where it says "text:baş"?
> Your query is against the "text" field, which probably doesn't have the
> Turkish analysis.
>
> There is probably a copyField from "features" to "text". You use the
> "text_tr" field type for "features", but probably not for the "text" field.
>
>
> -- Jack Krupansky
>
> -Original Message- From: Erol Akarsu
> Sent: Monday, December 03, 2012 1:06 PM
>
> To: solr-user@lucene.apache.org
> Subject: Re: Luke and SOLR search giving different results
>
> Jack,
>
> I had already configured the Tomcat server for UTF-8 encoding. I added
> URIEncoding="UTF-8" to all Connector elements in server.xml in Tomcat 7.
>
> As you can see below, when I search for the word "baş" in debug mode I
> get an empty response. But when I search for the word "baştan", I get the
> correct response.
>
> It seems to me that TurkishAnalyzer is not being used in the SOLR search,
> because only the full word "baştan" matches, not the root word "baş".
> Probably EnglishAnalyzer is being used and cannot find the root word. For
> example, in Luke, if I change "Analyser to use for query parsing" to
> EnglishAnalyzer, it cannot find the word "baş"; it only finds it with
> TurkishAnalyzer. I guess SOLR is not using TurkishAnalyzer.
>
> Is this assumption true? I could not find any other reason.
>
>
> Response for "baş":
>
>   responseHeader: status 0, QTime 58
>   params: true (debug), q = baş, wt = xml
>   result: empty
>   rawquerystring: baş
>   querystring: baş
>   parsedquery: text:baş
>   parsedquery_toString: text:baş
>   QParser: LuceneQParser
>   timing: 38.0
>     prepare: 16.0
>       org.apache.solr.handler.component.QueryComponent: 3.0
>       FacetComponent, MoreLikeThisComponent, HighlightComponent,
>       StatsComponent, DebugComponent: 0.0 each
>     process: 10.0
>       org.apache.solr.handler.component.DebugComponent: 10.0
>       QueryComponent, FacetComponent, MoreLikeThisComponent,
>       HighlightComponent, StatsComponent: 0.0 each
>
> Response for "baştan":
>
>   responseHeader: status 0, QTime 2
>   params: true (debug), q = baştan, wt = xml
>   result: the document, with these field values:
>     htt://111.a.b1
>     6H500F0
>     tr
>     Maxtor DiamondMax 11 - hard drive - 500 GB - SATA-300
>     Maxtor Corp.
>     maxtor
>     electronics
>     hard drive
>     SATA 3.0Gb/s, NCQ
>     8.5ms seek
>     16MB cache
>     Firmalarsa "Nasılsa buldum oynatacak ünlüyü, neyleyim senaryoyu!"
>     diyerek baştan savma reklamlarla kotarmaya bakıyor işi. Futbolcu Arda
>     Turan ve büyük umutlarla Türkiye'ye getirilen Paris Hilton'un
>     oynatıldığı

Re: Luke and SOLR search giving different results

2012-12-03 Thread Erol Akarsu
Jack,

I see something interesting now.

Instead of "baş", I tried "features:baş" as the search query in the "q"
field of the SOLR GUI. And I got a result!

In this one document, I have some fields of type text_eng and text_general,
and one field, features, of type text_tr. If I don't specify a field name,
SOLR uses EnglishAnalyzer. If I do, it uses the analyzer of the field
specified in the search query string.

Is this true?

Erol Akarsu


Re: Luke and SOLR search giving different results

2012-12-04 Thread Erol Akarsu
Thanks Shawn and Jack,

I changed solrconfig to set the default query field (qf) to field content. It
works fine now.

Erol Akarsu
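
The change described here might look like the following in solrconfig.xml. This is only a sketch: the handler name /select and the field name features are assumptions, since the thread does not show the actual configuration. With the standard Lucene query parser the relevant parameter is df; qf applies when (e)dismax is in use.

```xml
<!-- solrconfig.xml (sketch; handler name and field name are assumed) -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <!-- Unqualified terms such as q=baş are searched in this field,
         so they go through this field's query analyzer -->
    <str name="df">features</str>
  </lst>
</requestHandler>
```

With a default like this, q=baş and q=features:baş should be parsed to the same query, which can be checked in the parsedquery entry of the debug output.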

On Mon, Dec 3, 2012 at 5:03 PM, Shawn Heisey  wrote:

> On 12/3/2012 1:44 PM, Erol Akarsu wrote:
>
>> I tried  as search query  not "baş" but "features:baş" in field "q" in
>> SOLR
>> GUI. And, I got result!
>>
>> In the one document, I had some fields type of text_eng, text_general and
>> one field features type of text_tr. If I don't specify field name, SOLR
>> use
>> EnglishAnalyzer. If I do, it uses the analyzer specific to field specified
>> in search query string.
>>
>
> Your config is set up to search against a field named "text" by default -
> either by a setting in schema.xml or a "df" parameter in your search
> handler definition in solrconfig.xml.  If you are using (e)dismax, it might
> be qf/pf parameters instead of df.
>
> The field named text is not properly set up for this search.  Your
> attachment at the beginning of this thread indicates that either you do not
> have a text field for this document at all, or that field is not stored.
>  If the text field is a copyField as Jack has mentioned, note that it
> doesn't matter what analysis you are doing on features -- the copy is done
> before analysis, so it is completely separate.
>
> Thanks,
> Shawn
>
>
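
Shawn's point about copyField can be illustrated with a small schema sketch. The destination field name text_tr_all is hypothetical, and the attributes shown are assumptions rather than the poster's actual schema. Because copyField copies the raw input before analysis, the destination needs a Turkish field type of its own if stemmed terms such as "baş" are to be indexed there.

```xml
<!-- schema.xml (sketch; text_tr_all is a hypothetical field name) -->
<field name="features" type="text_tr" indexed="true" stored="true" multiValued="true"/>

<!-- Catch-all destination with its own Turkish analysis chain -->
<field name="text_tr_all" type="text_tr" indexed="true" stored="false" multiValued="true"/>

<!-- The raw, unanalyzed value of features is copied; analysis then follows
     the destination field's type, not the source field's type -->
<copyField source="features" dest="text_tr_all"/>
```

If the destination were declared with a non-Turkish type instead, "baştan" would be indexed there without Turkish stemming, and a query for "baş" against that field would not match.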


Parallel merge of indexes

2020-02-04 Thread Erol Akarsu
I need some help merging indexes in parallel, in a much faster way. I am
using the IndexMergeTool provided by Lucene, but it seems very slow. Is there
a way to speed up the process?

What I do is create 16 shards with no replication and then add a replica for
every node and every shard. In the last step, I merge the indexes. The first
two steps finish quickly, but the final merge step takes a long time.

I would appreciate your help.

Erol Akarsu
