Re: Korean Tokenizer in solr
Are you sure it's not a spelling error or something similarly odd? Solr
ships with that filter in its example schema, so you can compare what you
are doing differently with that.

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853

On Mon, Jul 14, 2014 at 1:58 PM, Poornima Jay wrote:
> I have upgraded the Solr version to 4.8.1, but after making changes in
> the schema file I am getting the error below:
>
>   Error instantiating class:
>   'org.apache.lucene.analysis.cjk.CJKBigramFilterFactory'
>
> I assume CJKBigramFilterFactory and CJKFoldingFilterFactory are
> supported in 4.8.1. Do I need to make any configuration changes to get
> this working?
>
> Please advise.
>
> Regards,
> Poornima
>
> On Thursday, 10 July 2014 2:45 PM, Alexandre Rafalovitch wrote:
>> I would suggest you read through all 12 (?) articles in this series:
>> http://discovery-grindstone.blogspot.com/2013/10/cjk-with-solr-for-libraries-part-1.html
>> It will probably lay out most of the issues for you.
>>
>> And if you are starting out, I would really suggest using the latest
>> Solr (4.9). A lot more people remember what the latest version has
>> than what was in 3.6. And, as the series above will tell you, some
>> relevant issues have been fixed in more recent Solr versions.
>>
>> Regards,
>>    Alex.
>
> On Thu, Jul 10, 2014 at 4:11 PM, Poornima Jay wrote:
>> Until now I was thinking Solr would support a Korean tokenizer out of
>> the box; I haven't used any other third-party one.
>>
>> The issue I am facing is that I need to integrate English, Chinese,
>> Japanese, and Korean language search in a single site. The fields will
>> be queried according to the language the user selects for the search.
>>
>> I tried using one CJK field type for all three languages, as below, but
>> only a few search terms work for Chinese and Japanese, and nothing
>> works for Korean:
>>
>>   positionIncrementGap="1" autoGeneratePhraseQueries="false">
>>   class="edu.stanford.lucene.analysis.CJKFoldingFilterFactory"/>
>>   id="Traditional-Simplified"/>
>>   id="Katakana-Hiragana"/>
>>   hiragana="true" katakana="true" hangul="true" outputUnigrams="true" />
>>
>> So I tried to implement an individual field type for each language, as
>> below.
>>
>> Chinese:
>>   positionIncrementGap="1000" autoGeneratePhraseQueries="false">
>>
>> Japanese:
>>   autoGeneratePhraseQueries="false">
>>   tags="stoptags_ja.txt" />
>>   words="stopwords_ja.txt" />
>>   minimumLength="4"/>
>>
>> Korean:
>>   autoGeneratePhraseQueries="false">
>>   hasCNoun="true" bigrammable="true"/>
>>   words="stopwords_kr.txt"/>
>>   hasCNoun="false" bigrammable="false"/>
>>   words="stopwords_kr.txt"/>
>>
>> I am really stuck on how to implement this. Please help me.
>>
>> Thanks,
>> Poornima
>
> On Thursday, 10 July 2014 2:22 PM, Alexandre Rafalovitch wrote:
>> I don't think Solr ships with a Korean tokenizer, does it?
>>
>> If you are using a third-party one, you need to give the full class
>> name, not just solr.Korean... And you need the library added in the
>> lib statement in solrconfig.xml (at least in Solr 4).
>>
>> Regards,
>>    Alex.
>
> On Thu, Jul 10, 2014 at 3:23 PM, Poornima Jay wrote:
>> I have defined the fieldType inside the fields section. When I checked
>> the error log I found the errors below:
>>
>>   Caused by: java.lang.ClassNotFoundException: solr.KoreanTokenizerFactory
>>
>>   SEVERE: org.apache.solr.common.SolrException: analyzer without class
>>   or tokenizer & filter list
>>
>> Do I need to add any libraries for the Korean tokenizer?
>>
>> Regards,
>> Poornima
>
> On Thursday, 10 July 2014 1:03 PM, Alexandre Rafalovitch wrote:
>> Double-check your XML file so that you don't - for example - define
>> your fieldType outside of the fields section. Or maybe you have an
>> exception earlier about some component in the type definition.
>>
>> This is not about the Korean language, it seems. It's something more
>> fundamental about the XML config.
>>
>> Regards,
>>    Alex.
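The mailing-list archive stripped the XML markup from the configs quoted
above, leaving only attribute fragments. A plausible reconstruction of the
combined CJK field type, inferred from those fragments and from the setup
described in the Discovery Grindstone series linked in the thread (the
tokenizer and the folding filter before the bigrammer are assumptions),
might look like this:

  <fieldType name="text_cjk" class="solr.TextField"
             positionIncrementGap="1" autoGeneratePhraseQueries="false">
    <analyzer>
      <!-- assumed: the series uses the ICU tokenizer for CJK scripts -->
      <tokenizer class="solr.ICUTokenizerFactory"/>
      <filter class="edu.stanford.lucene.analysis.CJKFoldingFilterFactory"/>
      <filter class="solr.ICUTransformFilterFactory" id="Traditional-Simplified"/>
      <filter class="solr.ICUTransformFilterFactory" id="Katakana-Hiragana"/>
      <!-- assumed: case/diacritic folding before bigramming -->
      <filter class="solr.ICUFoldingFilterFactory"/>
      <filter class="solr.CJKBigramFilterFactory" han="true" hiragana="true"
              katakana="true" hangul="true" outputUnigrams="true"/>
    </analyzer>
  </fieldType>

Note that the ICU classes (analysis-extras contrib) and the Stanford
CJKFoldingFilter jar are not on Solr's default classpath, so they need lib
directives in solrconfig.xml; a missing jar could produce exactly the
"Error instantiating class" reported in this thread.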
Re: Korean Tokenizer in solr
Yes. Below is my defined fieldType:

  positionIncrementGap="100">
  han="true"/>
  generateWordParts="1" generateNumberParts="1" catenateWords="1"
  catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"
  preserveOriginal="1"/>
  han="true"/>
  generateWordParts="1" generateNumberParts="1" catenateWords="0"
  catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"
  preserveOriginal="1"/>

Please correct me if I am doing anything wrong here.

Regards,
Poornima

On Monday, 14 July 2014 12:33 PM, Alexandre Rafalovitch wrote:
> Are you sure it's not a spelling error or something similarly odd? Solr
> ships with that filter in its example schema, so you can compare what
> you are doing differently with that.
>
> Regards,
>    Alex.
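The hasCNoun and bigrammable attributes quoted in this thread do not
belong to any factory bundled with Solr 4.x, so they presumably come from
a third-party Korean analyzer, which is exactly why the advice above about
the full class name and the lib statement applies. A hypothetical sketch
(the jar path and class name below are placeholders, not the poster's
actual setup):

  <!-- in solrconfig.xml: load the third-party Korean analyzer jar -->
  <lib dir="/opt/solr/custom-libs/" regex=".*korean.*\.jar"/>

  <!-- in schema.xml: use the full class name, not solr.KoreanTokenizerFactory -->
  <tokenizer class="com.example.analysis.ko.KoreanTokenizerFactory"/>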
Re: Korean Tokenizer in solr
What happens if you make a new collection with the absolute minimum in it
and then add the definition? Start from something like:
https://github.com/arafalov/simplest-solr-config

Also, is there a longer exception earlier in the log? It may have more
clues.

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853

On Mon, Jul 14, 2014 at 2:15 PM, Poornima Jay wrote:
> Yes. Below is my defined fieldType:
>
>   positionIncrementGap="100">
>   han="true"/>
>   generateWordParts="1" generateNumberParts="1" catenateWords="1"
>   catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"
>   preserveOriginal="1"/>
>   han="true"/>
>   generateWordParts="1" generateNumberParts="1" catenateWords="0"
>   catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"
>   preserveOriginal="1"/>
>
> Please correct me if I am doing anything wrong here.
>
> Regards,
> Poornima
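From the surviving fragments, the field type appears to be a CJK bigram
setup (han="true") followed by a WordDelimiterFilter, with catenation
enabled only at index time. A plausible reconstruction (the tag names and
the tokenizer are assumptions; the filter class is taken from the "Error
instantiating class" message earlier in the thread) would be:

  <fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <!-- assumed tokenizer; the original tag was stripped -->
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="org.apache.lucene.analysis.cjk.CJKBigramFilterFactory"
              han="true"/>
      <filter class="solr.WordDelimiterFilterFactory"
              generateWordParts="1" generateNumberParts="1" catenateWords="1"
              catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"
              preserveOriginal="1"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="org.apache.lucene.analysis.cjk.CJKBigramFilterFactory"
              han="true"/>
      <filter class="solr.WordDelimiterFilterFactory"
              generateWordParts="1" generateNumberParts="1" catenateWords="0"
              catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"
              preserveOriginal="1"/>
    </analyzer>
  </fieldType>

Referring to the factory as solr.CJKBigramFilterFactory (the usual
shorthand) rather than by the full Lucene class name is one difference
worth comparing against the stock example schema.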
Re: Korean Tokenizer in solr
When I try to index, the error below comes up:

  java.io.FileNotFoundException:
  /home/searchuser/multicore/apac_content/data/tlog/tlog.000
  (No such file or directory)

On Monday, 14 July 2014 2:07 PM, Poornima Jay wrote:
> Yes. Below is my defined fieldType. Please correct me if I am doing
> anything wrong here.
>
> Regards,
> Poornima
Re: Solr irregularly having QTime > 50000ms, stracing solr cures the problem
Thanks IJ for the link. I am not sure this can solve my problem, because I
have only one machine in play anyway.

Harald.

On 12.07.2014 20:49, IJ wrote:
> Guess what - I had the same issues as you. They were resolved
> (http://lucene.472066.n3.nabble.com/Slow-QTimes-5-seconds-for-Small-sized-Collections-td4143681.html)
> by adding an explicit host mapping entry in /etc/hosts for inter-node
> Solr communication, thereby bypassing DNS lookups.
Re: Reference numbers for major page faults per second, index size, query throughput
Hello Erick,

thanks for the reply. Indeed the CPUs are kind of idling during the load
test. They are not under 20%, but they clearly don't get far beyond 40%.
Changing the number of threads in jmeter has only minor effects on the
qps, but it increases the average latency as soon as the threads outnumber
the CPUs --- expected behavior, I would say.

I varied the number of results returned between 20 and 10 with no
remarkable changes in performance. I restricted to fl=id, and even this
increased the throughput only minimally (meanwhile the index has 16
million documents; qps went from 2.x to 3). Jmeter reported a reduction in
average transferred size from 10 kBytes to 2.5 kBytes. This is not really
the issue here, and in the end we need more than the IDs in production
anyway.

What really bugs me currently is that htop reports an IORR (supposed to be
read(2) calls) of between 100 and 200 MByte/s during the load test. This
somehow runs contrary to my understanding of why Solr uses mmapped files.
There should be no read(2) calls, and certainly not 200 MB/s :-/ And this
did not drop when I restricted to fl=id. I will try to check this with
strace to see where it is reading from. Hints appreciated.

With a bit of luck, I'll get more RAM and can compare then.

Thanks,
Harald.

On 12.07.2014 17:58, Erick Erickson wrote:
> If the stats you're reporting are during the load test, your CPU is kind
> of idling along at < 20%, which supports your theory.
>
> Just to cover all bases, when you bump the number of threads jmeter is
> firing, does it make any difference? And how many rows are you
> returning? This latter is important because to return documents, Solr
> needs to go out to disk, possibly generating your page faults (guessing
> here).
>
> One note about your index size: it's largely useless to measure the
> index on disk, if for no other reason than that the _stored_ data
> doesn't really count towards memory requirements for search. The *.fdt
> and *.fdx segment files contain the stored data, so subtract them out.
>
> Speaking of which, try just returning the id (&fl=id). That should
> reduce the disk seeks due to assembling the docs. But 4 qps for simple
> term queries seems very slow at first blush.
>
> FWIW,
> Erick
>
> On Thu, Jul 10, 2014 at 7:30 AM, Harald Kirsch wrote:
>> Hi everyone,
>>
>> currently I am taking some performance measurements on a Solr
>> installation, and I am trying to figure out whether what I see mostly
>> fits expectations. The data is as follows:
>>
>> - Solr 4.8.1
>> - 8 million documents, mostly office documents with real text content,
>>   stored
>> - index size on disk 90G
>> - full index memory-mapped into virtual memory
>> - this is on a vmware server, 4 cores, 16 GB RAM
>>
>>   PID  PR  NI  VIRT   RES  SHR   S  %CPU  %MEM  TIME+      nFLT
>>   961  20   0  93.9g  10g  6.0g  S    19  64.5  718:39.81  757k
>>
>> When I start running a jmeter query test sending requests as fast as
>> possible with a few threads, it peaks at about 4 qps with a real-world
>> query replay of mostly 1, 2, sometimes more terms.
>>
>> What I see are around 150 to 200 major page faults per second, meaning
>> that Solr is not really happy with what happens to be in memory at any
>> instant in time. My hunch is that this hints at a too-small RAM
>> footprint: much more RAM is needed to get the number of major page
>> faults down.
>>
>> Would anyone agree or disagree with this analysis? Is someone out
>> there saying "200 major page faults/second are normal, there must be
>> another problem"?
>>
>> Thanks,
>> Harald.
Re: Solr irregularly having QTime > 50000ms, stracing solr cures the problem
This problem seems to completely disappear under load. I started making
load tests despite fearing them to be useless. It turns out that there are
no more 50-second delays under load.

Harald.

On 09.07.2014 09:50, Harald Kirsch wrote:
> Good point. I will see if I can get the necessary access rights on this
> machine to run tcpdump.
>
> Thanks for the suggestion,
> Harald.

On 09.07.2014 00:32, Steve McKay wrote:
> Sure sounds like a socket bug, doesn't it? I turn to tcpdump when Solr
> starts behaving strangely in a socket-related way. Knowing exactly
> what's happening at the transport level is worth a month of guessing
> and poking.

On Jul 8, 2014, at 3:53 AM, Harald Kirsch wrote:
> Hi all,
>
> This is what happens when I run a regular wget query to log the current
> number of documents indexed:
>
>   2014-07-08:07:23:28 QTime=20    numFound="5720168"
>   2014-07-08:07:24:28 QTime=12    numFound="5721126"
>   2014-07-08:07:25:28 QTime=19    numFound="5721126"
>   2014-07-08:07:27:18 QTime=50071 numFound="5721126"
>   2014-07-08:07:29:08 QTime=50058 numFound="5724494"
>   2014-07-08:07:30:58 QTime=50033 numFound="5730710"
>   2014-07-08:07:31:58 QTime=13    numFound="5730710"
>   2014-07-08:07:33:48 QTime=50065 numFound="5734069"
>   2014-07-08:07:34:48 QTime=16    numFound="5737742"
>   2014-07-08:07:36:38 QTime=50037 numFound="5737742"
>   2014-07-08:07:37:38 QTime=12    numFound="5738190"
>   2014-07-08:07:38:38 QTime=23    numFound="5741208"
>   2014-07-08:07:40:29 QTime=50034 numFound="5742067"
>   2014-07-08:07:41:29 QTime=12    numFound="5742067"
>   2014-07-08:07:42:29 QTime=17    numFound="5742067"
>   2014-07-08:07:43:29 QTime=20    numFound="5745497"
>   2014-07-08:07:44:29 QTime=13    numFound="5745981"
>   2014-07-08:07:45:29 QTime=23    numFound="5746420"
>
> As you can see, the QTime is just over 50 seconds at irregular
> intervals. This happens independent of whether I am indexing documents
> at around 20 dps or not. First I thought about a dependence on the
> auto-commit of 5 minutes, but the 50-second hits are too irregular.
>
> Furthermore, and this is *really strange*: when hooking strace onto the
> Solr process, the 50-second QTimes disappear completely and
> consistently --- a real Heisenbug. Nevertheless, strace shows that
> there is a socket timeout of 50 seconds defined in calls like this:
>
>   [pid 1253] 09:09:37.857413 poll([{fd=96, events=POLLIN|POLLERR}], 1,
>   50000) = 1 ([{fd=96, revents=POLLIN}]) <0.000040>
>
> where fd=96 is the result of
>
>   [pid 25446] 09:09:37.855235 accept(122, {sa_family=AF_INET,
>   sin_port=htons(57236), sin_addr=inet_addr("ip address of local
>   host")}, [16]) = 96 <0.000054>
>
> and where again fd=122 is the TCP port on which Solr was started. My
> hunch is that this is communication between the cores of Solr.
>
> I tried to search the internet for such a strange connection between
> socket timeouts and strace, but could not find anything (the
> stackoverflow entry from yesterday is my own :-( ). This smells a bit
> like a race-condition/deadlock kind of thing which is broken up by
> timing differences introduced by stracing the process. Any hints
> appreciated.
>
> For completeness, here is my setup:
> - solr-4.8.1
> - cloud version running
> - 10 shards on 10 cores in one instance
> - hosted on SUSE Linux Enterprise Server 11 (x86_64), VERSION 11,
>   PATCHLEVEL 2
> - hosted on a vmware, 4 CPU cores, 16 GB RAM
> - single-digit million docs indexed, exact number does not matter
> - zero query load
>
> Harald.
Of, To, and Other Small Words
Hello all,

I am working with Solr 4.9.0 and am searching for phrases that contain
words like "of" or "to" that Solr seems to be ignoring at index time.
Here's what I tried (the archive stripped the XML from the original
message; the payload was presumably the standard add/doc envelope):

curl http://localhost/solr/update?commit=true -H "Content-Type: text/xml"
--data-binary '<add><doc><field name="id">100</field><field
name="content">blah blah blah knowledge of science blah blah
blah</field></doc></add>'

Then, using a browser:

http://localhost/solr/collection1/select?q="knowledge+of+science"&fq=id:100

I get zero hits. Search for "knowledge" or "science" and I'll get hits;
"knowledge of" or "of science" and I get zero hits. I don't want to use
proximity if I can avoid it, as this may introduce too many undesirable
results. stopwords.txt is blank, yet clearly Solr is ignoring "of" and
"to", and possibly more words that I have not discovered through testing
yet. Is there some other configuration file that contains these small
words? Is there any way to force Solr to pay attention to them and not
drop them from the phrase? Any advice is appreciated! Thanks!

-Teague
Re: Of, To, and Other Small Words
Hi Teague,

The StopFilterFactory (which I think you're using) by default uses
lang/stopwords_en.txt (which wouldn't be empty if you check). What you're
looking at is stopwords.txt. You could either empty that file out or
change the field type for your field.

On Mon, Jul 14, 2014 at 12:53 PM, Teague James wrote:
> I am working with Solr 4.9.0 and am searching for phrases that contain
> words like "of" or "to" that Solr seems to be ignoring at index time.
> Is there some other configuration file that contains these small words?
> Is there any way to force Solr to pay attention to them and not drop
> them from the phrase?

--
Anshum Gupta
http://www.anshumgupta.net
Re: Of, To, and Other Small Words
Or, if you happen to leave off the "words" attribute of the stop filter
(or misspell the attribute name), it will use the internal Lucene
hardwired list of stop words.

-- Jack Krupansky

-----Original Message-----
From: Anshum Gupta
Sent: Monday, July 14, 2014 4:03 PM
To: solr-user@lucene.apache.org
Subject: Re: Of, To, and Other Small Words

> The StopFilterFactory (which I think you're using) by default uses
> lang/stopwords_en.txt (which wouldn't be empty if you check). What
> you're looking at is stopwords.txt. You could either empty that file
> out or change the field type for your field.
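In config terms, the difference Jack describes is between these two forms
(a sketch; only the second falls back to Lucene's built-in English list):

  <!-- reads the stopword list from the named file -->
  <filter class="solr.StopFilterFactory" ignoreCase="true"
          words="stopwords.txt"/>

  <!-- no "words" attribute: uses Lucene's hardwired English stopwords,
       which include "of" and "to" -->
  <filter class="solr.StopFilterFactory" ignoreCase="true"/>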
Strategies for effective prefix queries?
I'm working on using Solr for autocompleting usernames. I'm running into a
problem with the wildcard queries (e.g. username:al*).

We are tokenizing usernames so that a username like "solr-user" will be
tokenized into "solr" and "user", and will match both "sol" and "use"
prefixes. The problem is that when we get "solr-u" as a prefix, I'm having
to split that up on the client side before I construct the query
"username:solr* username:u*". I'm basically using a regex as a poor man's
tokenizer.

Is there a better way to approach this? Is there a way to tell Solr to
tokenize a string and use the parts as prefixes?

- Hayden
RE: Of, To, and Other Small Words
Hi Anshum,

Thanks for replying and suggesting this, but the field type I am using (a
modified text_general) in my schema has the file set to 'stopwords.txt':

  positionIncrementGap="100">
  ignoreCase="true" words="stopwords.txt" />
  minGramSize="3" maxGramSize="10" />
  ignoreCase="true" words="stopwords.txt" />
  synonyms="synonyms.txt" ignoreCase="true" expand="true"/>

Just to be doubly sure, I cleared the list in stopwords_en.txt, restarted
Solr, re-indexed, and searched, with still zero results. Any other
suggestions on where I might be able to control this behavior?

-Teague

-----Original Message-----
From: Anshum Gupta [mailto:ans...@anshumgupta.net]
Sent: Monday, July 14, 2014 4:04 PM
To: solr-user@lucene.apache.org
Subject: Re: Of, To, and Other Small Words

> The StopFilterFactory (which I think you're using) by default uses
> lang/stopwords_en.txt (which wouldn't be empty if you check). What
> you're looking at is stopwords.txt. You could either empty that file
> out or change the field type for your field.
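The archive stripped the markup from that field type. A plausible
reconstruction from the surviving fragments (the tokenizer and the
lower-case filter are assumptions based on the stock text_general type;
the EdgeNGram filter is confirmed later in the thread):

  <fieldType name="text_general" class="solr.TextField"
             positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true"
              words="stopwords.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="3"
              maxGramSize="10"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true"
              words="stopwords.txt"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
              ignoreCase="true" expand="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

Note that with minGramSize="3", two-letter tokens such as "of" and "to"
produce no grams at all at index time, which matches the behavior reported
here and the resolution further down the thread.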
Re: Of, To, and Other Small Words
Have you tried the Admin UI's Analyze screen? It will show you what
happens to the text as it progresses through the tokenizers and filters.
No need to reindex.

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853

On Tue, Jul 15, 2014 at 8:10 AM, Teague James wrote:
> Thanks for replying and suggesting this, but the field type I am using
> (a modified text_general) in my schema has the file set to
> 'stopwords.txt'. Just to be doubly sure, I cleared the list in
> stopwords_en.txt, restarted Solr, re-indexed, and searched, with still
> zero results. Any other suggestions on where I might be able to control
> this behavior?
Re: Strategies for effective prefix queries?
Search against both fields (one split, one not split)? Keep both the
original and the tokenized form? I am doing something similar with class
name autocompletes here:
https://github.com/arafalov/Solr-Javadoc/blob/master/JavadocIndex/JavadocCollection/conf/schema.xml#L24

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853

On Tue, Jul 15, 2014 at 8:04 AM, Hayden Muhl wrote:
> I'm working on using Solr for autocompleting usernames. I'm running
> into a problem with the wildcard queries (e.g. username:al*). Is there
> a better way to approach this? Is there a way to tell Solr to tokenize
> a string and use the parts as prefixes?
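A minimal sketch of that two-field approach (all field and type names here
are hypothetical, not taken from the linked schema): the untokenized copy
serves whole-string prefixes like "solr-u*", while the split copy serves
per-part prefixes.

  <fieldType name="username_whole" class="solr.TextField">
    <analyzer>
      <!-- keeps the username as a single token for whole-string prefixes -->
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  <fieldType name="username_parts" class="solr.TextField">
    <analyzer>
      <!-- splits on the same delimiters the client-side regex handles -->
      <tokenizer class="solr.PatternTokenizerFactory" pattern="[-_.]+"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  <field name="username"       type="username_whole" indexed="true" stored="true"/>
  <field name="username_split" type="username_parts" indexed="true" stored="false"/>
  <copyField source="username" dest="username_split"/>

A prefix such as "solr-u" then matches directly against the whole-string
field (username:solr\-u*), with username_split still covering per-part
prefixes like use*, so no client-side splitting of the whole string is
needed.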
RE: Of, To, and Other Small Words
Jack,

Thanks for replying and for the suggestion. I replied to another
suggestion with my field type, and I do have the "words" attribute set.
There's nothing in the stopwords.txt. I even cleaned out stopwords_en.txt
just to be certain. Any other suggestions on how to control this behavior?

-Teague

-----Original Message-----
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Monday, July 14, 2014 4:26 PM
To: solr-user@lucene.apache.org
Subject: Re: Of, To, and Other Small Words

> Or, if you happen to leave off the "words" attribute of the stop filter
> (or misspell the attribute name), it will use the internal Lucene
> hardwired list of stop words.
>
> -- Jack Krupansky
RE: Of, To, and Other Small Words
Alex,

Thanks! Great suggestion. I figured out that it was the
EdgeNGramFilterFactory. Taking that out of the mix did it.

-Teague

-----Original Message-----
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
Sent: Monday, July 14, 2014 9:14 PM
To: solr-user
Subject: Re: Of, To, and Other Small Words

> Have you tried the Admin UI's Analyze screen? It will show you what
> happens to the text as it progresses through the tokenizers and
> filters. No need to reindex.
>
> Regards,
>    Alex.
Re: Of, To, and Other Small Words
You could try experimenting with CommonGramsFilterFactory and
CommonGramsQueryFilter (slightly different). There are actually a lot of
cool analyzers bundled with Solr. You can find the full list on my site
at: http://www.solr-start.com/info/analyzers

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853

On Tue, Jul 15, 2014 at 8:42 AM, Teague James wrote:
> Thanks! Great suggestion. I figured out that it was the
> EdgeNGramFilterFactory. Taking that out of the mix did it.
>
> -Teague
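For phrase searches that must keep words like "of" and "to", the
CommonGrams pair works roughly like this: at index time common words are
glued to their neighbors ("knowledge of science" also yields
"knowledge_of" and "of_science"), and the query-side variant produces just
those grams so the phrase still matches. A sketch, assuming the same
stopwords.txt from earlier in the thread:

  <fieldType name="text_commongrams" class="solr.TextField"
             positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- emits word pairs such as "knowledge_of" and "of_science"
           alongside the plain tokens -->
      <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt"
              ignoreCase="true"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- query-side variant keeps only the grams where a common word
           is involved, so phrases with stopwords match precisely -->
      <filter class="solr.CommonGramsQueryFilterFactory" words="stopwords.txt"
              ignoreCase="true"/>
    </analyzer>
  </fieldType>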
Re: External File Field eating memory
Hey Kamal,

What config changes have you made to replicate the external files, and how
have you disabled core reloading?

On Wed, Jul 9, 2014 at 11:30 AM, Kamal Kishore Aggarwal
<kkroyal@gmail.com> wrote:
> Hi All,
>
> It was found that the external file, which was getting replicated every
> 10 minutes, was reloading the core as well. This was increasing the
> query time.
>
> Thanks,
> Kamal Kishore
>
> On Thu, Jul 3, 2014 at 12:48 PM, Kamal Kishore Aggarwal wrote:
>> With the replication configuration below, the eff file is getting
>> replicated to core/conf/data/external_eff_views (a new "data" dir is
>> being created in the conf dir), but it is not getting replicated to
>> core/data/external_eff_views on the slave.
>>
>> Please help.
>>
>> On Thu, Jul 3, 2014 at 12:21 PM, Kamal Kishore Aggarwal wrote:
>>> Thanks for your guidance, Alexandre Rafalovitch.
>>>
>>> I am looking into this seriously.
>>>
>>> Another question: I am facing an error in the replication of the eff
>>> file. This is the master replication configuration in
>>> core/conf/solrconfig.xml:
>>>
>>>   commit
>>>   startup
>>>   ../data/external_eff_views
>>>
>>> The eff file is present at the core/data/external_eff_views location.
>>>
>>> On Thu, Jul 3, 2014 at 11:50 AM, Shalin Shekhar Mangar wrote:
>>>> This might be related:
>>>> https://issues.apache.org/jira/browse/SOLR-3514
>>>>
>>>> On Sat, Jun 28, 2014 at 5:34 PM, Kamal Kishore Aggarwal wrote:
>>>>> Hi Team,
>>>>>
>>>>> I have recently implemented EFF in Solr. There are about 1.5 lakh
>>>>> (unsorted) values in the external file. Since this implementation,
>>>>> the server has become slow, and the Solr query time has also
>>>>> increased.
>>>>>
>>>>> Can anybody confirm whether these issues are because of this
>>>>> implementation? Is it memory that EFF eats up?
>>>>>
>>>>> Regards,
>>>>> Kamal Kishore
>>>>
>>>> --
>>>> Regards,
>>>> Shalin Shekhar Mangar.

--
Thanks & Regards,
Apoorva
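The archive stripped the XML around the three values quoted above
(commit, startup, ../data/external_eff_views). Given the standard
replication-handler syntax, the master config was presumably something
along these lines (element placement is an assumption):

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">commit</str>
      <str name="replicateAfter">startup</str>
      <!-- confFiles are replicated into the slave's conf/ directory,
           which may explain why the file lands in core/conf/data/...
           instead of core/data/... -->
      <str name="confFiles">../data/external_eff_views</str>
    </lst>
  </requestHandler>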