Re: leaks in solr

2012-07-27 Thread roz dev
In my case, I see only 1 searcher and no field cache - still, Old Gen is almost
full at 22 GB.

Does it have to do with the index or some other configuration?

-Saroj

On Thu, Jul 26, 2012 at 7:41 PM, Lance Norskog  wrote:

> What does the "Statistics" page in the Solr admin say? There might be
> several "searchers" open: org.apache.solr.search.SolrIndexSearcher
>
> Each searcher holds open different generations of the index. If
> obsolete index files are held open, it may be old searchers. How big
> are the caches? How long does it take to autowarm them?
>
> On Thu, Jul 26, 2012 at 6:15 PM, Karthick Duraisamy Soundararaj
>  wrote:
> > Mark,
> > We use solr 3.6.0 on freebsd 9. Over a period of time, it
> > accumulates lots of space!
> >
> > On Thu, Jul 26, 2012 at 8:47 PM, roz dev  wrote:
> >
> >> Thanks Mark.
> >>
> >> We are never calling commit or optimize with openSearcher=false.
> >>
> >> As per logs, this is what is happening
> >>
> >>
> openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false}
> >>
> >> --
> >> But, We are going to use 4.0 Alpha and see if that helps.
> >>
> >> -Saroj
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> On Thu, Jul 26, 2012 at 5:12 PM, Mark Miller 
> >> wrote:
> >>
> >> > I'd take a look at this issue:
> >> > https://issues.apache.org/jira/browse/SOLR-3392
> >> >
> >> > Fixed late April.
> >> >
> >> > On Jul 26, 2012, at 7:41 PM, roz dev  wrote:
> >> >
> >> > > it was from 4/11/12
> >> > >
> >> > > -Saroj
> >> > >
> >> > > On Thu, Jul 26, 2012 at 4:21 PM, Mark Miller  >
> >> > wrote:
> >> > >
> >> > >>
> >> > >> On Jul 26, 2012, at 3:18 PM, roz dev  wrote:
> >> > >>
> >> > >>> Hi Guys
> >> > >>>
> >> > >>> I am also seeing this problem.
> >> > >>>
> >> > >>> I am using SOLR 4 from Trunk and seeing this issue repeat every
> day.
> >> > >>>
> >> > >>> Any inputs about how to resolve this would be great
> >> > >>>
> >> > >>> -Saroj
> >> > >>
> >> > >>
> >> > >> Trunk from what date?
> >> > >>
> >> > >> - Mark
> >> > >>
> >> > >>
> >> > >>
> >> > >>
> >> > >>
> >> > >>
> >> > >>
> >> > >>
> >> > >>
> >> > >>
> >> >
> >> > - Mark Miller
> >> > lucidimagination.com
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >>
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>


too many instances of "org.tartarus.snowball.Among" in the heap

2012-07-27 Thread roz dev
Hi All

I am trying to find out the reason for very high memory use and ran jmap
-histo.

It is showing that I have too many instances of org.tartarus.snowball.Among.

Any ideas what this class is for and why I am getting so many instances of it?

num     #instances   #bytes        Class description
----------------------------------------------------------------------
1:      46728110     1869124400    org.tartarus.snowball.Among
2:      5244210      1840458960    byte[]
3:      526519495969839368         char[]
4:      10008928     864769280     int[]
5:      10250527     410021080     java.util.LinkedHashMap$Entry
6:      4672811      268474232     org.tartarus.snowball.Among[]
7:      8072312      258313984     java.util.HashMap$Entry
8:      466514       246319392     org.apache.lucene.util.fst.FST$Arc[]
9:      1828542      237600432     java.util.HashMap$Entry[]
10:     3834312      153372480     java.util.TreeMap$Entry
11:     2684700      128865600     org.apache.lucene.util.fst.Builder$UnCompiledNode
12:     4712425      113098200     org.apache.lucene.util.BytesRef
13:     3484836      111514752     java.lang.String
14:     2636045      105441800     org.apache.lucene.index.FieldInfo
15:     1813561      101559416     java.util.LinkedHashMap
16:     6291619      100665904     java.lang.Integer
17:     2684700      85910400      org.apache.lucene.util.fst.Builder$Arc
18:     956998       84215824      org.apache.lucene.index.TermsHashPerField
19:     2892957      69430968      org.apache.lucene.util.AttributeSource$State
20:     2684700      64432800      org.apache.lucene.util.fst.Builder$Arc[]
21:     685595       60332360      org.apache.lucene.util.fst.FST
22:     933451       59210944      java.lang.Object[]
23:     957043       53594408      org.apache.lucene.util.BytesRefHash
24:     591463       42585336      org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader
25:     424801       40780896      org.tartarus.snowball.ext.EnglishStemmer
26:     424801       40780896      org.apache.lucene.analysis.miscellaneous.WordDelimiterFilter
27:     1549670      37192080      org.apache.lucene.index.Term
28:     849602       33984080      org.apache.lucene.analysis.miscellaneous.WordDelimiterFilter$WordDelimiterConcatenation
29:     424801       27187264      org.apache.lucene.analysis.core.WhitespaceTokenizer
30:     478499       26795944      org.apache.lucene.index.FreqProxTermsWriterPerField
31:     535521       25705008      org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray
32:     219081       24537072      org.apache.lucene.codecs.BlockTreeTermsWriter$TermsWriter
33:     478499       22967952      org.apache.lucene.index.FieldInvertState
34:     956998       22967952      org.apache.lucene.index.TermsHashPerField$PostingsBytesStartArray
35:     478499       22967952      org.apache.lucene.index.TermVectorsConsumerPerField
36:     478499       22967952      org.apache.lucene.index.NormsConsumerPerField
37:     316582       22793904      org.apache.lucene.store.MMapDirectory$MMapIndexInput
38:     906708       21760992      org.apache.lucene.util.AttributeSource$State[]
39:     906708       21760992      org.apache.lucene.analysis.tokenattributes.OffsetAttributeImpl
40:     883588       21206112      java.util.ArrayList
41:     438192       21033216      org.apache.lucene.store.RAMOutputStream
42:     860601       20654424      java.lang.StringBuilder
43:     424801       20390448      org.apache.lucene.analysis.miscellaneous.WordDelimiterIterator
44:     424801       20390448      org.apache.lucene.analysis.core.StopFilter
45:     424801       20390448      org.apache.lucene.analysis.miscellaneous.KeywordMarkerFilter
46:     424801       20390448      org.apache.lucene.analysis.snowball.SnowballFilter
47:     839390       20145360      org.apache.lucene.index.DocumentsWriterDeleteQueue$TermNode


-Saroj


Re: Upgrade solr 1.4.1 to 3.6

2012-07-27 Thread alexander81
Yes, the index.
Do you know of any link/documentation about upgrading Solr 1.4.1 -> 3.6?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Upgrade-solr-1-4-1-to-3-6-tp3996952p3997678.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Skip first word

2012-07-27 Thread Finotti Simone
Hi Chantal,

if I understand correctly, this implies that I have to populate different 
fields according to their length. Since I'm not aware of any logical condition 
you can apply to the copyField directive, it means that this logic has to be 
implemented by the process that populates the Solr core. Is this assumption 
correct?

That's kind of bad, because I'd like to have these kinds of rules in the Solr 
configuration. Of course, if that's the only way... :)

Thank you 


From: Chantal Ackermann [c.ackerm...@it-agenten.com]
Sent: Thursday, 26 July 2012 18:32
To: solr-user@lucene.apache.org
Subject: Re: Skip first word

Hi,

use two fields:
1. KeywordTokenizer (= single token) with ngram minsize=1 and maxsize=2 for 
inputs of length < 3,
2. the other one tokenized as appropriate with minsize=3 and longer for all 
longer inputs


Cheers,
Chantal


On 26.07.2012 at 09:05, Finotti Simone wrote:

> Hi Ahmet,
> business asked me to apply EdgeNGram with minGramSize=1 on the first term and 
> with minGramSize=3 on the latter terms.
>
> We are developing a search suggestion mechanism, the idea is that if the user 
> types "D", the engine should suggest "Dolce & Gabbana", but if we type "G", 
> it should suggest other brands. Only if users type "Gab" it should suggest 
> "Dolce & Gabbana".
>
> Thanks
> S
> 
> From: Ahmet Arslan [iori...@yahoo.com]
> Sent: Wednesday, 25 July 2012 18:10
> To: solr-user@lucene.apache.org
> Subject: Re: Skip first word
>
>> is there a tokenizer and/or a combination of filter to
>> remove the first term from a field?
>>
>> For example:
>> The quick brown fox
>>
>> should be tokenized as:
>> quick
>> brown
>> fox
>
> There is no such filter that i know of. Though, you can implement one with 
> modifying source code of LengthFilterFactory or StopFilterFactory. They both 
> remove tokens. Out of curiosity, what is the use case for this?
>
>
>
>







R: Skip first word

2012-07-27 Thread Finotti Simone
Could you elaborate on that, please? 

thanks
S


From: in.abdul [in.ab...@gmail.com]
Sent: Thursday, 26 July 2012 20:36
To: solr-user@lucene.apache.org
Subject: Re: Skip first word

That is the best option. I had also used the shingle filter factory.
On Jul 26, 2012 10:03 PM, "Chantal Ackermann-2 [via Lucene]" <
ml-node+s472066n399748...@n3.nabble.com> wrote:

> Hi,
>
> use two fields:
> 1. KeywordTokenizer (= single token) with ngram minsize=1 and maxsize=2
> for inputs of length < 3,
> 2. the other one tokenized as appropriate with minsize=3 and longer for
> all longer inputs
>
>
> Cheers,
> Chantal
>
>
> On 26.07.2012 at 09:05, Finotti Simone wrote:
>
> > Hi Ahmet,
> > business asked me to apply EdgeNGram with minGramSize=1 on the first
> term and with minGramSize=3 on the latter terms.
> >
> > We are developing a search suggestion mechanism, the idea is that if the
> user types "D", the engine should suggest "Dolce & Gabbana", but if we type
> "G", it should suggest other brands. Only if users type "Gab" it should
> suggest "Dolce & Gabbana".
> >
> > Thanks
> > S
> > 
> > From: Ahmet Arslan [[hidden email]]
> >
> > Sent: Wednesday, 25 July 2012 18:10
> > To: [hidden email]
> > Subject: Re: Skip first word
> >
> >> is there a tokenizer and/or a combination of filter to
> >> remove the first term from a field?
> >>
> >> For example:
> >> The quick brown fox
> >>
> >> should be tokenized as:
> >> quick
> >> brown
> >> fox
> >
> > There is no such filter that i know of. Though, you can implement one
> with modifying source code of LengthFilterFactory or StopFilterFactory.
> They both remove tokens. Out of curiosity, what is the use case for this?
> >
> >
> >
> >
>
>
>
> --
>  If you reply to this email, your message will be added to the discussion
> below:
> http://lucene.472066.n3.nabble.com/Skip-first-word-tp3997277p3997480.html
>




-
THANKS AND REGARDS,
SYED ABDUL KATHER
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Skip-first-word-tp3997277p3997509.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: Skip first word

2012-07-27 Thread Chantal Ackermann
Hi Simone,

No, I meant that you populate the two fields with the same input - best done via 
the copyField directive.

The first field will contain ngrams of size 1 and 2. The other field will 
contain ngrams of size 3 and longer (you might want to set a decent maxsize 
there).

The query for the autocomplete list uses the first field when the input (typed 
in by the user) is one or two characters long. Your example was: "D", "G", or 
then "Do" or "Ga". The search would then use only the single-token field, which 
for the input "Dolce & Gabbana" contains only the ngrams "D" and "Do". So, only 
the input "D" or "Do" would result in a hit on "Dolce & Gabbana".
Once the user has typed in the third letter: "Dol" or "Gab", you query the 
second, more tokenized field which would contain for "Dolce & Gabbana" the 
ngrams "Dol" "Dolc" "Dolce" "Gab" "Gabb" "Gabba" etc.
Both inputs "Gab" and "Dol" would then return "Dolce & Gabbana".

1. First field type:

2. Second field type:

3. Field declarations:
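
The schema snippets did not come through in the archived message; a minimal sketch of
the three pieces described above might look like this, assuming EdgeNGramFilterFactory
for the ngrams (all type and field names are placeholders, not from the original mail):

  <!-- 1. single-token type for 1-2 character prefixes -->
  <fieldType name="suggest_short" class="solr.TextField">
    <analyzer type="index">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="2"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  <!-- 2. tokenized type for prefixes of 3 characters and longer -->
  <fieldType name="suggest_long" class="solr.TextField">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="25"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  <!-- 3. two fields populated from the same source via copyField -->
  <field name="brand" type="string" indexed="true" stored="true"/>
  <field name="brand_suggest_short" type="suggest_short" indexed="true" stored="false"/>
  <field name="brand_suggest_long" type="suggest_long" indexed="true" stored="false"/>
  <copyField source="brand" dest="brand_suggest_short"/>
  <copyField source="brand" dest="brand_suggest_long"/>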







Chantal

On 27.07.2012 at 11:05, Finotti Simone wrote:

> Hi Chantal,
> 
> if I understand correctly, this implies that I have to populate different 
> fields according to their lenght. Since I'm not aware of any logical 
> condition you can apply to copyField directive, it means that this logic has 
> to be implementend by the process that populates the Solr core. Is this 
> assumption correct?
> 
> That's kind of bad, because I'd like to have this kind of "rules" in the Solr 
> configuration. Of course, if that's the only way... :)
> 
> Thank you 
> 
> 
> From: Chantal Ackermann [c.ackerm...@it-agenten.com]
> Sent: Thursday, 26 July 2012 18:32
> To: solr-user@lucene.apache.org
> Subject: Re: Skip first word
> 
> Hi,
> 
> use two fields:
> 1. KeywordTokenizer (= single token) with ngram minsize=1 and maxsize=2 for 
> inputs of length < 3,
> 2. the other one tokenized as appropriate with minsize=3 and longer for all 
> longer inputs
> 
> 
> Cheers,
> Chantal
> 
> 
> On 26.07.2012 at 09:05, Finotti Simone wrote:
> 
>> Hi Ahmet,
>> business asked me to apply EdgeNGram with minGramSize=1 on the first term 
>> and with minGramSize=3 on the latter terms.
>> 
>> We are developing a search suggestion mechanism, the idea is that if the 
>> user types "D", the engine should suggest "Dolce & Gabbana", but if we type 
>> "G", it should suggest other brands. Only if users type "Gab" it should 
>> suggest "Dolce & Gabbana".
>> 
>> Thanks
>> S
>> 
>> From: Ahmet Arslan [iori...@yahoo.com]
>> Sent: Wednesday, 25 July 2012 18:10
>> To: solr-user@lucene.apache.org
>> Subject: Re: Skip first word
>> 
>>> is there a tokenizer and/or a combination of filter to
>>> remove the first term from a field?
>>> 
>>> For example:
>>> The quick brown fox
>>> 
>>> should be tokenized as:
>>> quick
>>> brown
>>> fox
>> 
>> There is no such filter that i know of. Though, you can implement one with 
>> modifying source code of LengthFilterFactory or StopFilterFactory. They both 
>> remove tokens. Out of curiosity, what is the use case for this?
>> 
>> 
>> 
>> 
> 
> 
> 
> 
> 



dynamic EdgeNGramFilter

2012-07-27 Thread Alexander Helhorn

hi

is there a possibility to configure the minGramSize (EdgeNGramFilter) 
dynamically while searching for a term?


All my content is indexed with minGramSize=3 and that is OK, but when I 
want to search for a term like *communic*... Solr should not return results 
like *com*puter, *com*mander, *com*a, ...


I know I can avoid this when I use quotes like "communic", but isn't 
there a better way? It would be nice if I could tell Solr (for 
instance with a query parameter) which amount of characters must be 
identical with the search term --> a dynamic minGramSize.


I hope someone can help me.

--
Kind regards
Alexander Helhorn
BA-Student/IT-Service

Kommunale Immobilien Jena
Paradiesstr. 6
07743 Jena

Tel.: 0 36 41 49- 55 11
Fax:  0 36 41 49- 11 55 11
E-Mail: alexander.helh...@jena.de
Internet: www.kij.de







Re: Skip first word

2012-07-27 Thread Finotti Simone
Brilliant!
Thank you very much :)


From: Chantal Ackermann [c.ackerm...@it-agenten.com]
Sent: Friday, 27 July 2012 11:20
To: solr-user@lucene.apache.org
Subject: Re: Skip first word

Hi Simone,

no I meant that you populate the two fields with the same input - best done via 
copyField directive.

The first field will contain ngrams of size 1 and 2. The other field will 
contain ngrams of size 3 and longer (you might want to set a decent maxsize 
there).

The query for the autocomplete list uses the first field when the input (typed 
in by the user) is one or two characters long. Your example was: "D", "G", or 
then "Do" or "Ga". The result would search only on the single token field that 
contains for the input "Dolce & Gabbana" only the ngrams "D" and "Do". So, only 
the input "D" or "Do" would result in a hit on "Dolce & Gabbana".
Once the user has typed in the third letter: "Dol" or "Gab", you query the 
second, more tokenized field which would contain for "Dolce & Gabbana" the 
ngrams "Dol" "Dolc" "Dolce" "Gab" "Gabb" "Gabba" etc.
Both inputs "Gab" and "Dol" would then return "Dolce & Gabbana".

1. First  field type:




2. Second field type:





3. field declarations:







Chantal

On 27.07.2012 at 11:05, Finotti Simone wrote:

> Hi Chantal,
>
> if I understand correctly, this implies that I have to populate different 
> fields according to their lenght. Since I'm not aware of any logical 
> condition you can apply to copyField directive, it means that this logic has 
> to be implementend by the process that populates the Solr core. Is this 
> assumption correct?
>
> That's kind of bad, because I'd like to have this kind of "rules" in the Solr 
> configuration. Of course, if that's the only way... :)
>
> Thank you
>
> 
> From: Chantal Ackermann [c.ackerm...@it-agenten.com]
> Sent: Thursday, 26 July 2012 18:32
> To: solr-user@lucene.apache.org
> Subject: Re: Skip first word
>
> Hi,
>
> use two fields:
> 1. KeywordTokenizer (= single token) with ngram minsize=1 and maxsize=2 for 
> inputs of length < 3,
> 2. the other one tokenized as appropriate with minsize=3 and longer for all 
> longer inputs
>
>
> Cheers,
> Chantal
>
>
> On 26.07.2012 at 09:05, Finotti Simone wrote:
>
>> Hi Ahmet,
>> business asked me to apply EdgeNGram with minGramSize=1 on the first term 
>> and with minGramSize=3 on the latter terms.
>>
>> We are developing a search suggestion mechanism, the idea is that if the 
>> user types "D", the engine should suggest "Dolce & Gabbana", but if we type 
>> "G", it should suggest other brands. Only if users type "Gab" it should 
>> suggest "Dolce & Gabbana".
>>
>> Thanks
>> S
>> 
>> From: Ahmet Arslan [iori...@yahoo.com]
>> Sent: Wednesday, 25 July 2012 18:10
>> To: solr-user@lucene.apache.org
>> Subject: Re: Skip first word
>>
>>> is there a tokenizer and/or a combination of filter to
>>> remove the first term from a field?
>>>
>>> For example:
>>> The quick brown fox
>>>
>>> should be tokenized as:
>>> quick
>>> brown
>>> fox
>>
>> There is no such filter that i know of. Though, you can implement one with 
>> modifying source code of LengthFilterFactory or StopFilterFactory. They both 
>> remove tokens. Out of curiosity, what is the use case for this?
>>
>>
>>
>>
>
>
>
>
>







Solr - customize Fragment using hl.fragmenter and hl.regex.pattern

2012-07-27 Thread meghana


I want Solr highlighting in a specific format.

Below is the string format for which I need to provide the highlighting feature:
---
130s: LISTEN! LISTEN! 138s: [THUMP] 143s: WHAT IS THAT? 144s: HEAR THAT?
152s: EVERYBODY, SHH. SHH. 156s: STAY UP THERE. 163s: [BOAT CREAKING] 165s:
WHAT IS THAT? 167s: [SCREAMING] 191s: COME ON! 192s: OH, GOD! 193s: AAH!
249s: OK. WE'VE HAD SOME PROBLEMS 253s: AT THE FACILITY. 253s: WHAT WE'RE
ATTEMPTING TO ACHIEVE 256s: HERE HAS NEVER BEEN DONE. 256s: WE'RE THIS CLOSE
259s: TO THE REACTIVATION 259s: OF A HUMAN BRAIN CELL. 260s: DOCTOR, THE 200
MILLION 264s: I'VE SUNK INTO THIS COMPANY 264s: IS DUE IN GREAT PART 266s:
TO YOUR RESEARCH.
---

After a user search, I want to provide the user a fragment in the format below:

*Previous Line of Highlight + Line containing Highlight + Next Line of
Highlight*

For example, if the user searched for the term "hear", then one typical highlight
fragment should look like this:

*143s: WHAT IS THAT? 144s: HEAR THAT? 152s: EVERYBODY, SHH.
SHH.*

The above is my ultimate plan, but right now I am trying to get fragments that
start with "ns:", where n is a number between 0 and 

I use hl.regex.slop=0.6 and hl.fragsize=120, and below is the regex for
that:

\b(?=\s*\d{1,4}s:){50,200}

Using the above regular expression, my fragments do not always start with "ns:".
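
For reference, assuming the standard regex fragmenter, the parameters above would be
passed together roughly like this (the field name is just a placeholder, and the
pattern would need URL-encoding in a real request):

  q=hear
  &hl=true
  &hl.fl=transcript_text
  &hl.fragmenter=regex
  &hl.fragsize=120
  &hl.regex.slop=0.6
  &hl.regex.pattern=\b(?=\s*\d{1,4}s:){50,200}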

Please suggest how I can achieve the ultimate plan described above.

Thanks




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-customize-Fragment-using-hl-fragmenter-and-hl-regex-pattern-tp3997693.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Skip first word

2012-07-27 Thread Chantal Ackermann
You're welcome :-)
C


Re: too many instances of "org.tartarus.snowball.Among" in the heap

2012-07-27 Thread Bernd Fehling
It is something internal to the Snowball analyzer (stemmer).

To find out more you should take a heap dump and look into it with the
Eclipse Memory Analyzer (MAT): http://www.eclipse.org/mat/
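
For example, with the standard JDK tools this is one way to take such a dump
(replace <pid> with the Solr JVM's process id):

  jmap -dump:live,format=b,file=solr-heap.hprof <pid>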

Regards,
Bernd


On 27.07.2012 09:53, roz dev wrote:
> Hi All
> 
> I am trying to find out the reason for very high memory use and ran JMAP
> -hist
> 
> It is showing that i have too many instances of org.tartarus.snowball.Among
> 
> Any ideas what is this for and why am I getting so many of them
> 
> num     #instances   #bytes        Class description
> ----------------------------------------------------------------------
> 1:      46728110     1869124400    org.tartarus.snowball.Among
> 2:      5244210      1840458960    byte[]
> 3:      526519495969839368         char[]
> 4:      10008928     864769280     int[]
> 5:      10250527     410021080     java.util.LinkedHashMap$Entry
> 6:      4672811      268474232     org.tartarus.snowball.Among[]
> 7:      8072312      258313984     java.util.HashMap$Entry
> 8:      466514       246319392     org.apache.lucene.util.fst.FST$Arc[]
> 9:      1828542      237600432     java.util.HashMap$Entry[]
> 10:     3834312      153372480     java.util.TreeMap$Entry
> 11:     2684700      128865600     org.apache.lucene.util.fst.Builder$UnCompiledNode
> 12:     4712425      113098200     org.apache.lucene.util.BytesRef
> 13:     3484836      111514752     java.lang.String
> 14:     2636045      105441800     org.apache.lucene.index.FieldInfo
> 15:     1813561      101559416     java.util.LinkedHashMap
> 16:     6291619      100665904     java.lang.Integer
> 17:     2684700      85910400      org.apache.lucene.util.fst.Builder$Arc
> 18:     956998       84215824      org.apache.lucene.index.TermsHashPerField
> 19:     2892957      69430968      org.apache.lucene.util.AttributeSource$State
> 20:     2684700      64432800      org.apache.lucene.util.fst.Builder$Arc[]
> 21:     685595       60332360      org.apache.lucene.util.fst.FST
> 22:     933451       59210944      java.lang.Object[]
> 23:     957043       53594408      org.apache.lucene.util.BytesRefHash
> 24:     591463       42585336      org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader
> 25:     424801       40780896      org.tartarus.snowball.ext.EnglishStemmer
> 26:     424801       40780896      org.apache.lucene.analysis.miscellaneous.WordDelimiterFilter
> 27:     1549670      37192080      org.apache.lucene.index.Term
> 28:     849602       33984080      org.apache.lucene.analysis.miscellaneous.WordDelimiterFilter$WordDelimiterConcatenation
> 29:     424801       27187264      org.apache.lucene.analysis.core.WhitespaceTokenizer
> 30:     478499       26795944      org.apache.lucene.index.FreqProxTermsWriterPerField
> 31:     535521       25705008      org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray
> 32:     219081       24537072      org.apache.lucene.codecs.BlockTreeTermsWriter$TermsWriter
> 33:     478499       22967952      org.apache.lucene.index.FieldInvertState
> 34:     956998       22967952      org.apache.lucene.index.TermsHashPerField$PostingsBytesStartArray
> 35:     478499       22967952      org.apache.lucene.index.TermVectorsConsumerPerField
> 36:     478499       22967952      org.apache.lucene.index.NormsConsumerPerField
> 37:     316582       22793904      org.apache.lucene.store.MMapDirectory$MMapIndexInput
> 38:     906708       21760992      org.apache.lucene.util.AttributeSource$State[]
> 39:     906708       21760992      org.apache.lucene.analysis.tokenattributes.OffsetAttributeImpl
> 40:     883588       21206112      java.util.ArrayList
> 41:     438192       21033216      org.apache.lucene.store.RAMOutputStream
> 42:     860601       20654424      java.lang.StringBuilder
> 43:     424801       20390448      org.apache.lucene.analysis.miscellaneous.WordDelimiterIterator
> 44:     424801       20390448      org.apache.lucene.analysis.core.StopFilter
> 45:     424801       20390448      org.apache.lucene.analysis.miscellaneous.KeywordMarkerFilter
> 46:     424801       20390448      org.apache.lucene.analysis.snowball.SnowballFilter
> 47:     839390       20145360      org.apache.lucene.index.DocumentsWriterDeleteQueue$TermNode
> 
> 
> -Saroj
> 

-- 
*
Bernd FehlingUniversitätsbibliothek Bielefeld
Dipl.-Inform. (FH)LibTec - Bibliothekstechnologie
Universitätsstr. 25 und Wissensmanagement
33615 Bielefeld
Tel. +49 521 106-4060   bernd.fehling(at)uni-bielefeld.de

BASE - Bielefeld Academic Search Engine - www.base-search.net
*




Re: too many instances of "org.tartarus.snowball.Among" in the heap

2012-07-27 Thread Alexandre Rafalovitch
Try taking a couple of thread dumps and see where in the stack the
snowball classes show up. That might give you a clue.
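
For example, with the standard JDK tools (replace <pid> with the Solr JVM's process id):

  jstack <pid> > thread-dump-1.txt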

Did you customize the parameters to the stemmer? If so, maybe it has
problems with the file you gave it.

Just some generic thoughts that might help.

Regards,
   Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Fri, Jul 27, 2012 at 3:53 AM, roz dev  wrote:
> Hi All
>
> I am trying to find out the reason for very high memory use and ran JMAP
> -hist
>
> It is showing that i have too many instances of org.tartarus.snowball.Among
>
> Any ideas what is this for and why am I getting so many of them
>
> num     #instances   #bytes        Class description
> ----------------------------------------------------------------------
> 1:      46728110     1869124400    org.tartarus.snowball.Among
> 2:      5244210      1840458960    byte[]


Re: leaks in solr

2012-07-27 Thread Karthick Duraisamy Soundararaj
I have tons of these open.
searcherName : Searcher@24be0446 main
caching : true
numDocs : 1331167
maxDoc : 1338549
reader : SolrIndexReader{this=5585c0de,r=ReadOnlyDirectoryReader@5585c0de
,refCnt=1,segments=18}
readerDir : org.apache.lucene.store.NIOFSDirectory@
/usr/local/solr/highlander/data/..@2f2d9d89
indexVersion : 1336499508709
openedAt : Fri Jul 27 09:45:16 EDT 2012
registeredAt : Fri Jul 27 09:45:19 EDT 2012
warmupTime : 0

In my custom handler, I have the following implementation (it's not
the full code, but it gives an overall idea of the implementation):

  class CustomHandler extends SearchHandler {

      void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) {

          SolrCore core = req.getCore();
          // one set of parameters per sub-request:
          // requestParams.get(i) => parameters of the i-th request
          Vector<...> requestParams = new Vector<...>();
          /* parse the params ... */

          // one local sub-request per parameter set, built against the core
          Vector<...> subQueries = ...; // built from (core, requestParams.get(i))
          try {
              for (int i = 0; i < subQueries.size(); i++) {
                  ResponseBuilder rb = new ResponseBuilder();
                  rb.req = req;

                  // calls SearchHandler's handleRequestBody, whose signature I have modified
                  handleRequestBody(req, rsp, rb);
              }
          } finally {
              for (int i = 0; i < subQueries.size(); i++)
                  subQueries.get(i).close();
          }
      }
  }

  *Search Handler Changes*

  class SearchHandler {

      void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp,
                             ResponseBuilder rb, ArrayList comps) {
          // ResponseBuilder rb = new ResponseBuilder();
          ...
      }

      void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) {
          ResponseBuilder rb = new ResponseBuilder(req, rsp, new ResponseBuilder());
          handleRequestBody(req, rsp, rb, comps);
      }
  }


I don't see the old index searcher getting closed after warming up the
new one... Because I replicate every 5 minutes, it crashes in 2 hours.

On Fri, Jul 27, 2012 at 3:36 AM, roz dev  wrote:

> in my case, I see only 1 searcher, no field cache - still Old Gen is almost
> full at 22 GB
>
> Does it have to do with index or some other configuration
>
> -Saroj
>
> On Thu, Jul 26, 2012 at 7:41 PM, Lance Norskog  wrote:
>
> > What does the "Statistics" page in the Solr admin say? There might be
> > several "searchers" open: org.apache.solr.search.SolrIndexSearcher
> >
> > Each searcher holds open different generations of the index. If
> > obsolete index files are held open, it may be old searchers. How big
> > are the caches? How long does it take to autowarm them?
> >
> > On Thu, Jul 26, 2012 at 6:15 PM, Karthick Duraisamy Soundararaj
> >  wrote:
> > > Mark,
> > > We use solr 3.6.0 on freebsd 9. Over a period of time, it
> > > accumulates lots of space!
> > >
> > > On Thu, Jul 26, 2012 at 8:47 PM, roz dev  wrote:
> > >
> > >> Thanks Mark.
> > >>
> > >> We are never calling commit or optimize with openSearcher=false.
> > >>
> > >> As per logs, this is what is happening
> > >>
> > >>
> >
> openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false}
> > >>
> > >> --
> > >> But, We are going to use 4.0 Alpha and see if that helps.
> > >>
> > >> -Saroj
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> On Thu, Jul 26, 2012 at 5:12 PM, Mark Miller 
> > >> wrote:
> > >>
> > >> > I'd take a look at this issue:
> > >> > https://issues.apache.org/jira/browse/SOLR-3392
> > >> >
> > >> > Fixed late April.
> > >> >
> > >> > On Jul 26, 2012, at 7:41 PM, roz dev  wrote:
> > >> >
> > >> > > it was from 4/11/12
> > >> > >
> > >> > > -Saroj
> > >> > >
> > >> > > On Thu, Jul 26, 2012 at 4:21 PM, Mark Miller <
> markrmil...@gmail.com
> > >
> > >> > wrote:
> > >> > >
> > >> > >>
> > >> > >> On Jul 26, 2012, at 3:18 PM, roz dev  wrote:
> > >> > >>
> > >> > >>> Hi Guys
> > >> > >>>
> > >> > >>> I am also seeing this problem.
> > >> > >>>
> > >> > >>> I am using SOLR 4 from Trunk and seeing this issue repeat every
> > day.
> > >> > >>>
> > >> > >>> Any inputs about how to resolve this would be great
> > >> > >>>
> > >> > >>> -Saroj
> > >> > >>
> > >> > >>
> > >> > >> Trunk from what date?
> > >> > >>
> > >> > >> - Mark
> > >> > >>
> > >> > >>
> > >> > >>
> > >> > >>
> > >> > >>
> > >> > >>
> > >> > >>
> > >> > >>
> > >> > >>
> > >> > >>
> > >> >
> > >> > - Mark Miller
> > >> > lucidimagination.com
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >>
> >
> >
> >
> > --
> > Lance Norskog
> > goks...@gmail.com
> >
>


how solr will apply regex fragmenter

2012-07-27 Thread meghana
I was looking at the regex fragmenter for customizing my highlight fragments. I was
wondering how the regex fragmenter works within Solr and googled for it, but
didn't find any results.

Can anybody tell me how the regex fragmenter works within Solr?

And when the regex fragmenter applies the regex to fragments, do I first get a
fragment using the default Solr operation and then apply the regex to it? Or does
it directly apply the regex to the search term and then return the fragment?











--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-solr-will-apply-regex-fragmenter-tp3997749.html
Sent from the Solr - User mailing list archive at Nabble.com.


Problem with Solr 4.0-ALPHA and JSON response

2012-07-27 Thread Federico Valeri
Hi all, I'm new to Solr and I have a problem with the JSON format. This is my Java
client code:

PrintWriter out = res.getWriter();
res.setContentType("text/plain");
String query = req.getParameter("query");
SolrServer solr = new HttpSolrServer(solrServer);
ModifiableSolrParams params = new ModifiableSolrParams();
params.set("qt", "/select");
params.set("q", "contenuto:(" + query + ")");
params.set("hl", "true");
params.set("hl.fl", "id,contenuto,score");
params.set("wt", "json");

QueryResponse response = solr.query(params);
log.debug(response.toString());
out.print(response.toString());
out.flush();

Now the problem is that I receive the response but it doesn't trigger the
JavaScript callback function.
I see "wt=javabin" in the SolrCore.execute log, even though I set wt=json in the
parameters; is this normal?
This is the jQuery call to the server:

$.getJSON('solrServer.html', {query:
escape($('input[name=query]:visible').val())}, function(data){
var view = '';
for (var i=0; i';
}
$('#placeholder').html(view);
});

Thanks for reading.


Deduplication in SolrCloud

2012-07-27 Thread Daniel Brügge
Hi,

In my old Solr setup I have used the deduplication feature in the update chain
with a couple of fields.


 
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <bool name="overwriteDupes">false</bool>
    <str name="fields">uuid,type,url,content_hash</str>
    <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

This worked fine. But when I now use this in my 2-shard SolrCloud setup while
inserting 150,000 documents,
I always get an error:

INFO: end_commit_flush
Jul 27, 2012 3:29:36 PM org.apache.solr.common.SolrException log
SEVERE: null:java.lang.RuntimeException: java.lang.OutOfMemoryError:
unable to create new native thread
        at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:456)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:284)

I am inserting the documents via CSV import and curl command and split them
also into 50k chunks.

Without the dedupe chain, the import finishes after 40secs.

The curl command writes to one of my shards.


Do you have an idea why this happens? Should I reduce the fields to one? I
have read that not using the id as the dedupe field could be an issue.


I have searched for deduplication with SolrCloud and I am wondering if it
already works correctly; see e.g.
http://lucene.472066.n3.nabble.com/SolrCloud-deduplication-td3984657.html

Thanks & regards

Daniel


RE: Deduplication in SolrCloud

2012-07-27 Thread Markus Jelsma
This issue doesn't really describe your problem but a more general problem of 
distributed deduplication:
https://issues.apache.org/jira/browse/SOLR-3473
 
 
-Original message-
> From:Daniel Brügge 
> Sent: Fri 27-Jul-2012 17:38
> To: solr-user@lucene.apache.org
> Subject: Deduplication in SolrCloud
> 
> Hi,
> 
> in my old Solr Setup I have used the deduplication feature in the update
> chain
> with couple of fields.
> 
> 
>  
> true
>  signature
> false
>  uuid,type,url,content_hash
>  name="signatureClass">org.apache.solr.update.processor.Lookup3Signature
>  
> 
>  
> 
> 
> This worked fine. When I now use this in my 2 shards SolrCloud setup when
> inserting 150.000 documents,
> I am always getting an error:
> 
> *INFO: end_commit_flush*
> *Jul 27, 2012 3:29:36 PM org.apache.solr.common.SolrException log*
> *SEVERE: null:java.lang.RuntimeException: java.lang.OutOfMemoryError:
> unable to create new native thread*
> * at
> org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:456)
> *
> * at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:284)
> *
> 
> I am inserting the documents via CSV import and curl command and split them
> also into 50k chunks.
> 
> Without the dedupe chain, the import finishes after 40secs.
> 
> The curl command writes to one of my shards.
> 
> 
> Do you have an idea why this happens? Should I reduce the fields to one? I
> have read that not using the id as
> dedupe fields could be an issue?
> 
> 
> I have searched for deduplication with SolrCloud and I am wondering if it
> is already working correctly? see e.g.
> http://lucene.472066.n3.nabble.com/SolrCloud-deduplication-td3984657.html
> 
> Thanks & regards
> 
> Daniel
> 


question(s) re lucene spatial toolkit aka LSP aka spatial4j

2012-07-27 Thread solr-user
Hopefully someone is using the Lucene spatial toolkit aka LSP aka spatial4j
and can answer this question.

We are using this spatial tool for doing searches. Overall, it seems to
work very well. However, finding documentation is difficult.

I have a couple of questions:

1. I have a geohash field in my Solr schema that contains indexed geographic
polygon data. I want to find all docs where that polygon intersects a given
lat/long. I was experimenting with returning distance in the result set and
with sorting by distance, and found that the following query works. However,
I don't know what distance means in the query - i.e., is it the distance from
the point to the polygon centroid, to the closest outer edge of the polygon, or
is it a useless random value, etc.? Does anyone know?

http://solrserver:solrport/solr/core0/select?q=*:*&fq={!v=$geoq%20cache=false}&geoq=wkt_search:%22Intersects(Circle(-97.057%2047.924%20d=0.01))%22&sort=query($geoq)+asc&fl=catchment_wkt1_trimmed,school_name,latitude,longitude,dist:query($geoq,-1),loc_city,loc_state

2. Some of the polygons, being geographic representations, are very big (i.e.
state/province polygons). When Solr starts processing a spatial query (like
the one above), I can see ("INFO: Building Cache [xx]") that it fills in some
sort of memory cache
(org.apache.lucene.spatial.strategy.util.ShapeFieldCache) of the indexed
polygon data. We are encountering Java OOM issues when this occurs (even
when we boosted the memory to 7GB). I know that some of the polygons can have
more than 2300 points, but heavy trimming isn't really an option due to
level-of-detail issues. Can we control this caching, or the indexing of the
polygons, in any way to reduce the memory requirements?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/question-s-re-lucene-spatial-toolkit-aka-LSP-aka-spatial4j-tp3997757.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Bulk Indexing

2012-07-27 Thread Zhang, Lisheng
Hi,

Previously I asked a similar question; I have not fully implemented it yet.

My plan is:
1) use Solr only for search, not for indexing
2) have a separate java process to index (calling lucene API directly, maybe
   can call Solr API, I need to check more details).

As other people pointed out earlier, the problem with the above plan is that Solr
does not know when to reload the IndexSearcher (namely the underlying IndexReader)
after indexing is done, since the indexer and Solr are two separate processes.

My plan is to have Solr not cache any IndexReader (each time a search is
performed, just create a new IndexSearcher), because:

1) our app is made of many lucene indexed data folders (in Solr language, many
   cores), caching IndexSearcher would be too expensive.
2) in my experience, without caching search is still quite fast (this is 
   maybe partially due to the fact our indexed data is not large, per folder).

This is just my plan (not fully implemented yet).
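
A minimal sketch of the "fresh IndexSearcher per request" idea against the Lucene 3.x
API (the index path and field name are placeholders, not from the actual setup):

  import java.io.File;

  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.queryParser.QueryParser;
  import org.apache.lucene.search.IndexSearcher;
  import org.apache.lucene.search.Query;
  import org.apache.lucene.search.TopDocs;
  import org.apache.lucene.store.FSDirectory;
  import org.apache.lucene.util.Version;

  public class FreshSearcherExample {
      public static void main(String[] args) throws Exception {
          // Open the index folder, run one query, and close everything again.
          FSDirectory dir = FSDirectory.open(new File("/path/to/index/folder"));
          IndexReader reader = IndexReader.open(dir);   // fresh reader per search
          IndexSearcher searcher = new IndexSearcher(reader);
          try {
              QueryParser parser = new QueryParser(Version.LUCENE_36, "content",
                      new StandardAnalyzer(Version.LUCENE_36));
              Query query = parser.parse("solr");
              TopDocs hits = searcher.search(query, 10);
              System.out.println("hits: " + hits.totalHits);
          } finally {
              searcher.close();   // nothing is cached between requests
              reader.close();
              dir.close();
          }
      }
  }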

Best regards, Lisheng

-Original Message-
From: Sohail Aboobaker [mailto:sabooba...@gmail.com]
Sent: Friday, July 27, 2012 6:56 AM
To: solr-user@lucene.apache.org
Subject: Bulk Indexing


Hi,

We have created a search service which is responsible for providing
interface between Solr and rest of our application. It basically takes one
document at a time and updates or adds it to appropriate index.

Now, in the application, we have processes that add products (our documents are
based on products) in bulk using a bulk data load process. At this point,
we use the same search service to add the documents in a loop. These can be
up to 20,000 documents in one load.

In a recent solr user discussion, it seems like this is a no-no strategy
with red flags all around it.

What are other alternatives?

Thanks,

Regards,
Sohail Aboobaker.


Re: Bulk Indexing

2012-07-27 Thread Alexandre Rafalovitch
Haven't tried this but:
1) I think SOLR 4 supports on-the-fly core attach/detach/select. Can
somebody confirm this?
2) If 1) is true, run everything as two cores.
3) One core is live in production
4) Second core is detached from SOLR and attached to something like
SolrJ, which I believe can index without going over the network
5) Once SolrJ finishes the bulk import indexing, switch the cores around
(see the sketch below)

Or if you are not live, just use SolrJ to run the index and then
attach the finished core to SOLR.
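
A sketch of the swap in step 5 using the CoreAdmin API (host and core names are
placeholders):

  http://localhost:8983/solr/admin/cores?action=SWAP&core=live_core&other=rebuild_core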

Regards,
   Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Fri, Jul 27, 2012 at 9:55 AM, Sohail Aboobaker  wrote:
> Hi,
>
> We have created a search service which is responsible for providing
> interface between Solr and rest of our application. It basically takes one
> document at a time and updates or adds it to appropriate index.
>
> Now, in application, we have processes, that add products (our document are
> based on products) in bulk using a data bulk load process. At this point,
> we use the same search service to add the documents in a loop. These can be
> up to 20,000 documents in one load.
>
> In a recent solr user discussion, it seems like this is a no-no strategy
> with red flags all around it.
>
> What are other alternatives?
>
> Thanks,
>
> Regards,
> Sohail Aboobaker.


Solr not getting OpenText document name and metadata

2012-07-27 Thread eShard
Hi,
I'm currently using ManifoldCF (v.5.1) to crawl OpenText (v10.5) and the
output is sent to Solr (4.0 alpha).
All I see in the index is an id equal to the OpenText download URL and a version
(a big integer value).
What I don't see is the document name from OpenText or any of the OpenText
metadata.
Does anyone know how I can get this data? Because I can't even search by
document name or by document extension!
Only a few of the documents actually have a title in the Solr index, but the
OpenText name of the document is nowhere to be found.
If I know some text within the document, I can search for that.
I'm using the default schema with Tika as the extraction handler.
I'm also using uprefix=attr to get all of the ignored properties, but most
of those are useless.
Please advise...



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-not-getting-OpenText-document-name-and-metadata-tp3997786.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Bulk Indexing

2012-07-27 Thread Sohail Aboobaker
We will be using a Solr 3.x version. I was wondering if we need to worry
about this, as we have only 10k index entries at a time. It sounds like a
very low number, and we have only one document type at this point.

Should we worry about directly using SolrJ for indexing and searching for
this low-volume, simple schema?


Re: Solr edismax NOT operator behavior

2012-07-27 Thread Jack Krupansky
"can any one explain" - add the &debugQuery=true option to your request and 
Solr will give an explanation, including the parsed query and the Lucene 
scoring of documents.


If you think Solr is wrong, show us a sample document that either is 
supposed to appear that doesn't, or doesn't appear and should. How are the 
results "unexpected"?


Then do simple queries, each using the id value for the unexplained document 
and each of the clauses in your expression.


-- Jack Krupansky

-Original Message- 
From: Alok Bhandari

Sent: Friday, July 27, 2012 1:55 AM
To: solr-user@lucene.apache.org
Subject: Solr edismax NOT operator behavior

Hello,

I am using Edismax parser and query submitted by application is of the
format

price:1000 AND ( NOT ( launch_date:[2007-06-07T00:00:00.000Z TO
2009-04-07T23:59:59.999Z] AND product_type:electronic)).

Solr while executing gives unexpected result. I am suspecting it is because
of the AND ( NOT  portion of the query .
Please can any one explain me how this structure is handled.

I am using solr 3.6

Any help is appreciated ..

Thanks
Alok





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-edismax-NOT-operator-behavior-tp3997663.html
Sent from the Solr - User mailing list archive at Nabble.com. 



RE: Bulk Indexing

2012-07-27 Thread Lan
I assume you're indexing on the same server that is used to execute search
queries. Adding 20K documents in bulk could cause the Solr server to 'stop
the world', where the server would stop responding to queries.

My suggestion is:
- Set up master/slave replication to insulate your clients from 'stop the world'
events during indexing.
- Update in batches, with a commit at the end of each batch (see the sketch below).
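
A minimal sketch of the batch-and-commit approach with SolrJ 3.x (URL, field names,
and batch size are placeholders; CommonsHttpSolrServer is the SolrJ 3.x HTTP client):

  import java.util.ArrayList;
  import java.util.List;

  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class BulkLoader {
      public static void main(String[] args) throws Exception {
          // Replace the URL with your core's URL.
          SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

          List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
          for (int i = 0; i < 20000; i++) {
              SolrInputDocument doc = new SolrInputDocument();
              doc.addField("id", "product-" + i);      // placeholder fields
              doc.addField("name", "Product " + i);
              batch.add(doc);

              // Send documents in chunks instead of one add() per document.
              if (batch.size() == 1000) {
                  solr.add(batch);
                  batch.clear();
              }
          }
          if (!batch.isEmpty()) {
              solr.add(batch);
          }
          // One commit at the end of the load, not one per document.
          solr.commit();
      }
  }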



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Bulk-Indexing-tp3997745p3997815.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: leaks in solr

2012-07-27 Thread Karthick Duraisamy Soundararaj
Hello all,
            While running it in my Eclipse with a set of queries, this
works fine, but when I run it on the test production server, the searchers are
leaked. Any hint would be appreciated. I have not used CoreContainer.

Considering that the SearchHandler runs fine, I am not able to think
of a reason why my extended version wouldn't work. Does anyone have any
idea?

On Fri, Jul 27, 2012 at 10:19 AM, Karthick Duraisamy Soundararaj <
karthick.soundara...@gmail.com> wrote:

> I have tons of these open.
> searcherName : Searcher@24be0446 main
> caching : true
> numDocs : 1331167
> maxDoc : 1338549
> reader : SolrIndexReader{this=5585c0de,r=ReadOnlyDirectoryReader@5585c0de
> ,refCnt=1,segments=18}
> readerDir : org.apache.lucene.store.NIOFSDirectory@
> /usr/local/solr/highlander/data/..@2f2d9d89
> indexVersion : 1336499508709
> openedAt : Fri Jul 27 09:45:16 EDT 2012
> registeredAt : Fri Jul 27 09:45:19 EDT 2012
> warmupTime : 0
>
> In my custom handler, I have the following code
> I have the following problem
> Although in my custom handler, I have the following implementation(its not
> the full code but it gives an overall idea of the implementation) and it
>
>   class CustomHandler extends SearchHandler {
>
> void handleRequestBody(SolrQueryRequest req,SolrQueryResponse
> rsp)
>
>  SolrCore core= req.getCore();
>  vector> requestParams =
> new   vector>();
> /*parse the params such a way that
>  requestParams[i] -=> parameter of the ith
> request
>   */
> ..
>
>   try {
>vector subQueries = new
>  vector(solrcore, requestParams[i]);
>
>for(i=0;i   ResponseBuilder rb = new ResponseBuilder()
>   rb.req = req;
>
>   handlerRequestBody(req,rsp,rb); //this would
> call search handler's handler request body, whose signature, i have modified
>  }
>  } finally {
>   for(i=0; i  subQueries.get(i).close();
>  }
>   }
>
> *Search Handler Changes*
>   class SearchHandler {
> void handleRequestBody(SolrQueryRequest req, SolrQueryResponse
> rsp, ResponseBuilder rb, ArrayList comps) {
>//  ResponseBuilder rb = new ResponseBuilder()  ;
>
>..
>  }
> void handleRequestBody(SolrQueryRequest req,
> SolrQueryResponse) {
>  ResponseBuilder rb = new ResponseBuilder(req,rsp, new
> ResponseBuilder());
>  handleRequestBody(req, rsp, rb, comps) ;
>  }
>   }
>
>
> I don see the index old index searcher geting closed after warming up the
> new guy... Because I replicate every 5 mintues, it crashes in 2 hours..
>
>  On Fri, Jul 27, 2012 at 3:36 AM, roz dev  wrote:
>
>> in my case, I see only 1 searcher, no field cache - still Old Gen is
>> almost
>> full at 22 GB
>>
>> Does it have to do with index or some other configuration
>>
>> -Saroj
>>
>> On Thu, Jul 26, 2012 at 7:41 PM, Lance Norskog  wrote:
>>
>> > What does the "Statistics" page in the Solr admin say? There might be
>> > several "searchers" open: org.apache.solr.search.SolrIndexSearcher
>> >
>> > Each searcher holds open different generations of the index. If
>> > obsolete index files are held open, it may be old searchers. How big
>> > are the caches? How long does it take to autowarm them?
>> >
>> > On Thu, Jul 26, 2012 at 6:15 PM, Karthick Duraisamy Soundararaj
>> >  wrote:
>> > > Mark,
>> > > We use solr 3.6.0 on freebsd 9. Over a period of time, it
>> > > accumulates lots of space!
>> > >
>> > > On Thu, Jul 26, 2012 at 8:47 PM, roz dev  wrote:
>> > >
>> > >> Thanks Mark.
>> > >>
>> > >> We are never calling commit or optimize with openSearcher=false.
>> > >>
>> > >> As per logs, this is what is happening
>> > >>
>> > >>
>> >
>> openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false}
>> > >>
>> > >> --
>> > >> But, We are going to use 4.0 Alpha and see if that helps.
>> > >>
>> > >> -Saroj
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >> On Thu, Jul 26, 2012 at 5:12 PM, Mark Miller 
>> > >> wrote:
>> > >>
>> > >> > I'd take a look at this issue:
>> > >> > https://issues.apache.org/jira/browse/SOLR-3392
>> > >> >
>> > >> > Fixed late April.
>> > >> >
>> > >> > On Jul 26, 2012, at 7:41 PM, roz dev  wrote:
>> > >> >
>> > >> > > it was from 4/11/12
>> > >> > >
>> > >> > > -Saroj
>> > >> > >
>> > >> > > On Thu, Jul 26, 2012 at 4:21 PM, Mark Miller <
>> markrmil...@gmail.com
>> > >
>> > >> > wrote:
>> > >> > >
>> > >> > >>
>> > >> > >> On Jul 26, 2012, at 3:18 PM, roz dev 
>> wrote:

Re: leaks in solr

2012-07-27 Thread Karthick Duraisamy Soundararaj
Just to clarify, the leak happens every time a new searcher is opened.

On Fri, Jul 27, 2012 at 8:28 PM, Karthick Duraisamy Soundararaj <
karthick.soundara...@gmail.com> wrote:

> Hello all,
> While running in my eclipse and run a set of queries, this
> works fine, but when I run it in test production server, the searchers are
> leaked. Any hint would be appreciated. I have not used CoreContainer.
>
> Considering that the SearchHandler is running fine, I am not able to think
> of a reason why my extended version wouldnt work.. Does anyone have any
> idea?
>
> On Fri, Jul 27, 2012 at 10:19 AM, Karthick Duraisamy Soundararaj <
> karthick.soundara...@gmail.com> wrote:
>
>> I have tons of these open.
>> searcherName : Searcher@24be0446 main
>> caching : true
>> numDocs : 1331167
>> maxDoc : 1338549
>> reader : SolrIndexReader{this=5585c0de,r=ReadOnlyDirectoryReader@5585c0de
>> ,refCnt=1,segments=18}
>> readerDir : org.apache.lucene.store.NIOFSDirectory@
>> /usr/local/solr/highlander/data/..@2f2d9d89
>> indexVersion : 1336499508709
>> openedAt : Fri Jul 27 09:45:16 EDT 2012
>> registeredAt : Fri Jul 27 09:45:19 EDT 2012
>> warmupTime : 0
>>
>> In my custom handler, I have the following code
>> I have the following problem
>> Although in my custom handler, I have the following implementation(its
>> not the full code but it gives an overall idea of the implementation) and it
>>
>>   class CustomHandler extends SearchHandler {
>>
>> void handleRequestBody(SolrQueryRequest req,SolrQueryResponse
>> rsp)
>>
>>  SolrCore core= req.getCore();
>>  vector> requestParams =
>> new   vector>();
>> /*parse the params such a way that
>>  requestParams[i] -=> parameter of the
>> ith request
>>   */
>> ..
>>
>>   try {
>>vector subQueries = new
>>  vector(solrcore, requestParams[i]);
>>
>>for(i=0;i>   ResponseBuilder rb = new ResponseBuilder()
>>   rb.req = req;
>>
>>   handlerRequestBody(req,rsp,rb); //this
>> would call search handler's handler request body, whose signature, i have
>> modified
>>  }
>>  } finally {
>>   for(i=0; i>  subQueries.get(i).close();
>>  }
>>   }
>>
>> *Search Handler Changes*
>>   class SearchHandler {
>> void handleRequestBody(SolrQueryRequest req,
>> SolrQueryResponse rsp, ResponseBuilder rb, ArrayList comps) {
>>//  ResponseBuilder rb = new ResponseBuilder()  ;
>>
>>..
>>  }
>> void handleRequestBody(SolrQueryRequest req,
>> SolrQueryResponse) {
>>  ResponseBuilder rb = new ResponseBuilder(req,rsp,
>> new ResponseBuilder());
>>  handleRequestBody(req, rsp, rb, comps) ;
>>  }
>>   }
>>
>>
>> I don see the index old index searcher geting closed after warming up the
>> new guy... Because I replicate every 5 mintues, it crashes in 2 hours..
>>
>>  On Fri, Jul 27, 2012 at 3:36 AM, roz dev  wrote:
>>
>>> in my case, I see only 1 searcher, no field cache - still Old Gen is
>>> almost
>>> full at 22 GB
>>>
>>> Does it have to do with index or some other configuration
>>>
>>> -Saroj
>>>
>>> On Thu, Jul 26, 2012 at 7:41 PM, Lance Norskog 
>>> wrote:
>>>
>>> > What does the "Statistics" page in the Solr admin say? There might be
>>> > several "searchers" open: org.apache.solr.search.SolrIndexSearcher
>>> >
>>> > Each searcher holds open different generations of the index. If
>>> > obsolete index files are held open, it may be old searchers. How big
>>> > are the caches? How long does it take to autowarm them?
>>> >
>>> > On Thu, Jul 26, 2012 at 6:15 PM, Karthick Duraisamy Soundararaj
>>> >  wrote:
>>> > > Mark,
>>> > > We use solr 3.6.0 on freebsd 9. Over a period of time, it
>>> > > accumulates lots of space!
>>> > >
>>> > > On Thu, Jul 26, 2012 at 8:47 PM, roz dev  wrote:
>>> > >
>>> > >> Thanks Mark.
>>> > >>
>>> > >> We are never calling commit or optimize with openSearcher=false.
>>> > >>
>>> > >> As per logs, this is what is happening
>>> > >>
>>> > >>
>>> >
>>> openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false}
>>> > >>
>>> > >> --
>>> > >> But, We are going to use 4.0 Alpha and see if that helps.
>>> > >>
>>> > >> -Saroj
>>> > >>
>>> > >>
>>> > >>
>>> > >>
>>> > >>
>>> > >>
>>> > >>
>>> > >>
>>> > >>
>>> > >>
>>> > >> On Thu, Jul 26, 2012 at 5:12 PM, Mark Miller >> >
>>> > >> wrote:
>>> > >>
>>> > >> > I'd take a look at this issue:
>>> > >> > https://issues.apache.org/jira/browse/SOLR-3392
>>> > >> >
>>> > >> > Fixed late April.
>>> > >

Re: leaks in solr

2012-07-27 Thread Lance Norskog
A finally clause can throw exceptions. Can this throw an exception?
 subQueries.get(i).close();

 If so, each close() call should be in a try-catch block.
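
Something along these lines, assuming the sub-queries are SolrQueryRequest objects
(a sketch, not the actual code from the handler above):

  import java.util.List;

  import org.apache.solr.request.SolrQueryRequest;

  class SubQueryCleanup {
      /** Close every sub-request, even if one of the close() calls throws. */
      static void closeAll(List<SolrQueryRequest> subQueries) {
          for (SolrQueryRequest sub : subQueries) {
              try {
                  sub.close();
              } catch (Exception e) {
                  // log and keep closing the rest instead of propagating
                  e.printStackTrace();
              }
          }
      }
  }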

On Fri, Jul 27, 2012 at 5:28 PM, Karthick Duraisamy Soundararaj
 wrote:
> Hello all,
> While running in my eclipse and run a set of queries, this
> works fine, but when I run it in test production server, the searchers are
> leaked. Any hint would be appreciated. I have not used CoreContainer.
>
> Considering that the SearchHandler is running fine, I am not able to think
> of a reason why my extended version wouldnt work.. Does anyone have any
> idea?
>
> On Fri, Jul 27, 2012 at 10:19 AM, Karthick Duraisamy Soundararaj <
> karthick.soundara...@gmail.com> wrote:
>
>> I have tons of these open.
>> searcherName : Searcher@24be0446 main
>> caching : true
>> numDocs : 1331167
>> maxDoc : 1338549
>> reader : SolrIndexReader{this=5585c0de,r=ReadOnlyDirectoryReader@5585c0de
>> ,refCnt=1,segments=18}
>> readerDir : org.apache.lucene.store.NIOFSDirectory@
>> /usr/local/solr/highlander/data/..@2f2d9d89
>> indexVersion : 1336499508709
>> openedAt : Fri Jul 27 09:45:16 EDT 2012
>> registeredAt : Fri Jul 27 09:45:19 EDT 2012
>> warmupTime : 0
>>
>> In my custom handler, I have the following code
>> I have the following problem
>> Although in my custom handler, I have the following implementation(its not
>> the full code but it gives an overall idea of the implementation) and it
>>
>>   class CustomHandler extends SearchHandler {
>>
>> void handleRequestBody(SolrQueryRequest req,SolrQueryResponse
>> rsp)
>>
>>  SolrCore core= req.getCore();
>>  vector> requestParams =
>> new   vector>();
>> /*parse the params such a way that
>>  requestParams[i] -=> parameter of the ith
>> request
>>   */
>> ..
>>
>>   try {
>>vector subQueries = new
>>  vector(solrcore, requestParams[i]);
>>
>>for(i=0;i>   ResponseBuilder rb = new ResponseBuilder()
>>   rb.req = req;
>>
>>   handlerRequestBody(req,rsp,rb); //this would
>> call search handler's handler request body, whose signature, i have modified
>>  }
>>  } finally {
>>   for(i=0; i>  subQueries.get(i).close();
>>  }
>>   }
>>
>> *Search Handler Changes*
>>   class SearchHandler {
>> void handleRequestBody(SolrQueryRequest req, SolrQueryResponse
>> rsp, ResponseBuilder rb, ArrayList comps) {
>>//  ResponseBuilder rb = new ResponseBuilder()  ;
>>
>>..
>>  }
>> void handleRequestBody(SolrQueryRequest req,
>> SolrQueryResponse) {
>>  ResponseBuilder rb = new ResponseBuilder(req,rsp, new
>> ResponseBuilder());
>>  handleRequestBody(req, rsp, rb, comps) ;
>>  }
>>   }
>>
>>
>> I don see the index old index searcher geting closed after warming up the
>> new guy... Because I replicate every 5 mintues, it crashes in 2 hours..
>>
>>  On Fri, Jul 27, 2012 at 3:36 AM, roz dev  wrote:
>>
>>> in my case, I see only 1 searcher, no field cache - still Old Gen is
>>> almost
>>> full at 22 GB
>>>
>>> Does it have to do with index or some other configuration
>>>
>>> -Saroj
>>>
>>> On Thu, Jul 26, 2012 at 7:41 PM, Lance Norskog  wrote:
>>>
>>> > What does the "Statistics" page in the Solr admin say? There might be
>>> > several "searchers" open: org.apache.solr.search.SolrIndexSearcher
>>> >
>>> > Each searcher holds open different generations of the index. If
>>> > obsolete index files are held open, it may be old searchers. How big
>>> > are the caches? How long does it take to autowarm them?
>>> >
>>> > On Thu, Jul 26, 2012 at 6:15 PM, Karthick Duraisamy Soundararaj
>>> >  wrote:
>>> > > Mark,
>>> > > We use solr 3.6.0 on freebsd 9. Over a period of time, it
>>> > > accumulates lots of space!
>>> > >
>>> > > On Thu, Jul 26, 2012 at 8:47 PM, roz dev  wrote:
>>> > >
>>> > >> Thanks Mark.
>>> > >>
>>> > >> We are never calling commit or optimize with openSearcher=false.
>>> > >>
>>> > >> As per logs, this is what is happening
>>> > >>
>>> > >>
>>> >
>>> openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false}
>>> > >>
>>> > >> --
>>> > >> But, We are going to use 4.0 Alpha and see if that helps.
>>> > >>
>>> > >> -Saroj
>>> > >>
>>> > >>
>>> > >>
>>> > >>
>>> > >>
>>> > >>
>>> > >>
>>> > >>
>>> > >>
>>> > >>
>>> > >> On Thu, Jul 26, 2012 at 5:12 PM, Mark Miller 
>>> > >> wrote:
>>> > >>
>>> > >> > I'd take a look at this issue:
>>> > >> > https://issues.apache.org/jira/browse/SOLR-3392
>>> > >

Re: Deduplication in SolrCloud

2012-07-27 Thread Lance Norskog
Should the old Signature code be removed? Given that the goal is to
have everyone use SolrCloud, maybe this kind of landmine should be
removed?

On Fri, Jul 27, 2012 at 8:43 AM, Markus Jelsma
 wrote:
> This issue doesn't really describe your problem but a more general problem of 
> distributed deduplication:
> https://issues.apache.org/jira/browse/SOLR-3473
>
>
> -Original message-
>> From:Daniel Brügge 
>> Sent: Fri 27-Jul-2012 17:38
>> To: solr-user@lucene.apache.org
>> Subject: Deduplication in SolrCloud
>>
>> Hi,
>>
>> in my old Solr Setup I have used the deduplication feature in the update
>> chain
>> with couple of fields.
>>
>> 
>>  
>> true
>>  signature
>> false
>>  uuid,type,url,content_hash
>> > name="signatureClass">org.apache.solr.update.processor.Lookup3Signature
>>  
>> 
>>  
>> 
>>
>> This worked fine. When I now use this in my 2 shards SolrCloud setup when
>> inserting 150.000 documents,
>> I am always getting an error:
>>
>> *INFO: end_commit_flush*
>> *Jul 27, 2012 3:29:36 PM org.apache.solr.common.SolrException log*
>> *SEVERE: null:java.lang.RuntimeException: java.lang.OutOfMemoryError:
>> unable to create new native thread*
>> * at
>> org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:456)
>> *
>> * at
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:284)
>> *
>>
>> I am inserting the documents via CSV import and curl command and split them
>> also into 50k chunks.
>>
>> Without the dedupe chain, the import finishes after 40secs.
>>
>> The curl command writes to one of my shards.
>>
>>
>> Do you have an idea why this happens? Should I reduce the fields to one? I
>> have read that not using the id as
>> dedupe fields could be an issue?
>>
>>
>> I have searched for deduplication with SolrCloud and I am wondering if it
>> is already working correctly? see e.g.
>> http://lucene.472066.n3.nabble.com/SolrCloud-deduplication-td3984657.html
>>
>> Thanks & regards
>>
>> Daniel
>>



-- 
Lance Norskog
goks...@gmail.com
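
For reference, the signature that the chain quoted above computes per document boils
down to something like the following. This is only a minimal sketch, assuming the
standard Lookup3Signature API in org.apache.solr.update.processor; the field values
are invented, and the real work is done by SignatureUpdateProcessorFactory at
update time:

    import org.apache.solr.update.processor.Lookup3Signature;

    public class SignatureSketch {
        public static void main(String[] args) {
            Lookup3Signature sig = new Lookup3Signature();
            // the values of the configured fields are added in order:
            // uuid, type, url, content_hash
            sig.add("de305d54-75b4-431b-adb2-eb6b9e546014");
            sig.add("article");
            sig.add("http://example.com/some/page");
            sig.add("c0ffee00c0ffee00");
            byte[] hash = sig.getSignature();  // stored in the "signature" field
            System.out.println(new java.math.BigInteger(1, hash).toString(16));
        }
    }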


Re: querying using filter query and lots of possible values

2012-07-27 Thread Chris Hostetter

: the list of IDs is constant for a longer time. I will take a look at
: these join thematic.
: Maybe another solution would be to really create a whole new
: collection or set of documents containing the aggregated documents (from the
: ids) from scratch and to execute queries on this collection. Then this
: would take
: some time, but maybe it's worth it because the querying will thank you.

Another avenue to consider...

http://lucene.apache.org/solr/api-4_0_0-ALPHA/org/apache/solr/schema/ExternalFileField.html

...would allow you to map values in your "source_id" to some numeric 
values (many to many), and these numeric values would then be accessible in 
functions -- so you could use something like fq={!frange ...} to select 
all docs with value 67, where your external file field says that value 67 
is mapped to the following thousand source_id values.

The external file field can then be modified at any time just by doing a 
commit on your index.



-Hoss
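
A minimal sketch of how that query could look from a client, assuming Solr 4.x with
SolrJ, an ExternalFileField named "source_group", and an external_source_group file
in the index data directory that maps each source_id to a numeric group value. The
field name, group value, and URL here are illustrative, not from the thread:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class FrangeByExternalField {
        public static void main(String[] args) throws Exception {
            HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr");
            SolrQuery q = new SolrQuery("*:*");
            // keep only documents whose external-file value is exactly 67,
            // i.e. the docs whose source_id was mapped to group 67
            q.addFilterQuery("{!frange l=67 u=67}source_group");
            QueryResponse rsp = solr.query(q);
            System.out.println("matches: " + rsp.getResults().getNumFound());
            solr.shutdown();
        }
    }

The external_source_group file itself is just key=value lines (one source_id per
line), so the mapping can be edited and made visible with a commit, without
reindexing any documents.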


Re: leaks in solr

2012-07-27 Thread Karthick Duraisamy Soundararaj
First no. Because i do the following
for(i=0;i<subQueries.size();i++) {
   subQueries.get(i).close();
}

Second, I dont see any exception until the first searcher leak happens.


On Fri, Jul 27, 2012 at 9:04 PM, Lance Norskog  wrote:

> A finally clause can throw exceptions. Can this throw an exception?
>  subQueries.get(i).close();
>
>  If so, each close() call should be in a try-catch block.
>
> On Fri, Jul 27, 2012 at 5:28 PM, Karthick Duraisamy Soundararaj
>  wrote:
> > Hello all,
> > While running in my eclipse and run a set of queries, this
> > works fine, but when I run it in test production server, the searchers
> are
> > leaked. Any hint would be appreciated. I have not used CoreContainer.
> >
> > Considering that the SearchHandler is running fine, I am not able to
> think
> > of a reason why my extended version wouldnt work.. Does anyone have any
> > idea?
> >
> > On Fri, Jul 27, 2012 at 10:19 AM, Karthick Duraisamy Soundararaj <
> > karthick.soundara...@gmail.com> wrote:
> >
> >> I have tons of these open.
> >> searcherName : Searcher@24be0446 main
> >> caching : true
> >> numDocs : 1331167
> >> maxDoc : 1338549
> >> reader :
> SolrIndexReader{this=5585c0de,r=ReadOnlyDirectoryReader@5585c0de
> >> ,refCnt=1,segments=18}
> >> readerDir : org.apache.lucene.store.NIOFSDirectory@
> >> /usr/local/solr/highlander/data/..@2f2d9d89
> >> indexVersion : 1336499508709
> >> openedAt : Fri Jul 27 09:45:16 EDT 2012
> >> registeredAt : Fri Jul 27 09:45:19 EDT 2012
> >> warmupTime : 0
> >>
> >> In my custom handler, I have the following code
> >> I have the following problem
> >> Although in my custom handler, I have the following implementation(its
> not
> >> the full code but it gives an overall idea of the implementation) and it
> >>
> >>   class CustomHandler extends SearchHandler {
> >>
> >> void handleRequestBody(SolrQueryRequest
> req,SolrQueryResponse
> >> rsp)
> >>
> >>  SolrCore core= req.getCore();
> >>  vector> requestParams
> =
> >> new   vector>();
> >> /*parse the params such a way that
> >>  requestParams[i] -=> parameter of the
> ith
> >> request
> >>   */
> >> ..
> >>
> >>   try {
> >>vector subQueries = new
> >>  vector(solrcore, requestParams[i]);
> >>
> >>for(i=0;i >>   ResponseBuilder rb = new ResponseBuilder()
> >>   rb.req = req;
> >>
> >>   handlerRequestBody(req,rsp,rb); //this
> would
> >> call search handler's handler request body, whose signature, i have
> modified
> >>  }
> >>  } finally {
> >>   for(i=0; i >>  subQueries.get(i).close();
> >>  }
> >>   }
> >>
> >> *Search Handler Changes*
> >>   class SearchHandler {
> >> void handleRequestBody(SolrQueryRequest req,
> SolrQueryResponse
> >> rsp, ResponseBuilder rb, ArrayList comps) {
> >>//  ResponseBuilder rb = new ResponseBuilder()  ;
> >>
> >>..
> >>  }
> >> void handleRequestBody(SolrQueryRequest req,
> >> SolrQueryResponse) {
> >>  ResponseBuilder rb = new ResponseBuilder(req,rsp,
> new
> >> ResponseBuilder());
> >>  handleRequestBody(req, rsp, rb, comps) ;
> >>  }
> >>   }
> >>
> >>
> >> I don see the index old index searcher geting closed after warming up
> the
> >> new guy... Because I replicate every 5 mintues, it crashes in 2 hours..
> >>
> >>  On Fri, Jul 27, 2012 at 3:36 AM, roz dev  wrote:
> >>
> >>> in my case, I see only 1 searcher, no field cache - still Old Gen is
> >>> almost
> >>> full at 22 GB
> >>>
> >>> Does it have to do with index or some other configuration
> >>>
> >>> -Saroj
> >>>
> >>> On Thu, Jul 26, 2012 at 7:41 PM, Lance Norskog 
> wrote:
> >>>
> >>> > What does the "Statistics" page in the Solr admin say? There might be
> >>> > several "searchers" open: org.apache.solr.search.SolrIndexSearcher
> >>> >
> >>> > Each searcher holds open different generations of the index. If
> >>> > obsolete index files are held open, it may be old searchers. How big
> >>> > are the caches? How long does it take to autowarm them?
> >>> >
> >>> > On Thu, Jul 26, 2012 at 6:15 PM, Karthick Duraisamy Soundararaj
> >>> >  wrote:
> >>> > > Mark,
> >>> > > We use solr 3.6.0 on freebsd 9. Over a period of time, it
> >>> > > accumulates lots of space!
> >>> > >
> >>> > > On Thu, Jul 26, 2012 at 8:47 PM, roz dev 
> wrote:
> >>> > >
> >>> > >> Thanks Mark.
> >>> > >>
> >>> > >> We are never calling commit or optimize with openSearcher=false.
> >>> > >>
> >>> > >> As per logs, this is what is happening
> >>> > >>
> >>> > >>
> >>> >
> >>>
> openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false}
> >>> > >>
> >>> > >> --
> >>> > >> But, We ar

Re: leaks in solr

2012-07-27 Thread Karthick Duraisamy Soundararaj
SimpleOrderedMap commonRequestParams;           // This holds the common request params.
Vector<SimpleOrderedMap> subQueryRequestParams; // This holds the request params of sub queries.

I use the above to create multiple LocalSolrQueryRequests. To add a little more
information, I create a new ResponseBuilder for each request.

I also hold a reference to the query component as a private member in my
CustomHandler. Considering that the component is initialized only once
during startup, I assume this isn't a cause of concern.

On Fri, Jul 27, 2012 at 9:49 PM, Karthick Duraisamy Soundararaj <
karthick.soundara...@gmail.com> wrote:

> First no. Because i do the following
> for(i=0;i<subQueries.size();i++) {
>   subQueries.get(i).close();
> }
>
> Second, I dont see any exception until the first searcher leak happens.
>
>
> On Fri, Jul 27, 2012 at 9:04 PM, Lance Norskog  wrote:
>
>> A finally clause can throw exceptions. Can this throw an exception?
>>  subQueries.get(i).close();
>>
>>  If so, each close() call should be in a try-catch block.
>>
>> On Fri, Jul 27, 2012 at 5:28 PM, Karthick Duraisamy Soundararaj
>>  wrote:
>> > Hello all,
>> > While running in my eclipse and run a set of queries, this
>> > works fine, but when I run it in test production server, the searchers
>> are
>> > leaked. Any hint would be appreciated. I have not used CoreContainer.
>> >
>> > Considering that the SearchHandler is running fine, I am not able to
>> think
>> > of a reason why my extended version wouldnt work.. Does anyone have any
>> > idea?
>> >
>> > On Fri, Jul 27, 2012 at 10:19 AM, Karthick Duraisamy Soundararaj <
>> > karthick.soundara...@gmail.com> wrote:
>> >
>> >> I have tons of these open.
>> >> searcherName : Searcher@24be0446 main
>> >> caching : true
>> >> numDocs : 1331167
>> >> maxDoc : 1338549
>> >> reader :
>> SolrIndexReader{this=5585c0de,r=ReadOnlyDirectoryReader@5585c0de
>> >> ,refCnt=1,segments=18}
>> >> readerDir : org.apache.lucene.store.NIOFSDirectory@
>> >> /usr/local/solr/highlander/data/..@2f2d9d89
>> >> indexVersion : 1336499508709
>> >> openedAt : Fri Jul 27 09:45:16 EDT 2012
>> >> registeredAt : Fri Jul 27 09:45:19 EDT 2012
>> >> warmupTime : 0
>> >>
>> >> In my custom handler, I have the following code
>> >> I have the following problem
>> >> Although in my custom handler, I have the following implementation(its
>> not
>> >> the full code but it gives an overall idea of the implementation) and
>> it
>> >>
>> >>   class CustomHandler extends SearchHandler {
>> >>
>> >> void handleRequestBody(SolrQueryRequest
>> req,SolrQueryResponse
>> >> rsp)
>> >>
>> >>  SolrCore core= req.getCore();
>> >>  vector>
>> requestParams =
>> >> new   vector>();
>> >> /*parse the params such a way that
>> >>  requestParams[i] -=> parameter of the
>> ith
>> >> request
>> >>   */
>> >> ..
>> >>
>> >>   try {
>> >>vector subQueries = new
>> >>  vector(solrcore, requestParams[i]);
>> >>
>> >>for(i=0;i> >>   ResponseBuilder rb = new
>> ResponseBuilder()
>> >>   rb.req = req;
>> >>
>> >>   handlerRequestBody(req,rsp,rb); //this
>> would
>> >> call search handler's handler request body, whose signature, i have
>> modified
>> >>  }
>> >>  } finally {
>> >>   for(i=0; i> >>  subQueries.get(i).close();
>> >>  }
>> >>   }
>> >>
>> >> *Search Handler Changes*
>> >>   class SearchHandler {
>> >> void handleRequestBody(SolrQueryRequest req,
>> SolrQueryResponse
>> >> rsp, ResponseBuilder rb, ArrayList comps) {
>> >>//  ResponseBuilder rb = new ResponseBuilder()  ;
>> >>
>> >>..
>> >>  }
>> >> void handleRequestBody(SolrQueryRequest req,
>> >> SolrQueryResponse) {
>> >>  ResponseBuilder rb = new ResponseBuilder(req,rsp,
>> new
>> >> ResponseBuilder());
>> >>  handleRequestBody(req, rsp, rb, comps) ;
>> >>  }
>> >>   }
>> >>
>> >>
>> >> I don see the index old index searcher geting closed after warming up
>> the
>> >> new guy... Because I replicate every 5 mintues, it crashes in 2 hours..
>> >>
>> >>  On Fri, Jul 27, 2012 at 3:36 AM, roz dev  wrote:
>> >>
>> >>> in my case, I see only 1 searcher, no field cache - still Old Gen is
>> >>> almost
>> >>> full at 22 GB
>> >>>
>> >>> Does it have to do with index or some other configuration
>> >>>
>> >>> -Saroj
>> >>>
>> >>> On Thu, Jul 26, 2012 at 7:41 PM, Lance Norskog 
>> wrote:
>> >>>
>> >>> > What does the "Statistics" page in the Solr admin say? There might
>> be
>> >>>

Re: leaks in solr

2012-07-27 Thread Karthick Duraisamy Soundararaj
subQueries.get(i).close() is nothing but pulling the reference from the
vector and closing it. So yes, it wouldn't throw an exception.

vector subQueries

Please let me know if you need any more information
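
For reference, the usual pattern that avoids this kind of leak is to treat both the
searcher and any locally built request as reference-counted resources and release
them in a finally block. The following is a minimal sketch of that pattern using the
standard Solr APIs (SolrCore.getSearcher(), RefCounted, LocalSolrQueryRequest); it
is not the poster's actual code, and the method and parameter names are illustrative:

    import org.apache.solr.common.params.ModifiableSolrParams;
    import org.apache.solr.core.SolrCore;
    import org.apache.solr.request.LocalSolrQueryRequest;
    import org.apache.solr.search.SolrIndexSearcher;
    import org.apache.solr.util.RefCounted;

    public class SearcherRefSketch {
        void runSubQuery(SolrCore core, ModifiableSolrParams params) throws Exception {
            LocalSolrQueryRequest req = new LocalSolrQueryRequest(core, params);
            RefCounted<SolrIndexSearcher> ref = core.getSearcher();
            try {
                SolrIndexSearcher searcher = ref.get();
                // ... run the sub-query against 'searcher' ...
            } finally {
                ref.decref();  // release the searcher reference we acquired
                req.close();   // releases any searcher the request itself acquired
            }
        }
    }

If any reference is not decremented, the old searcher stays registered after
replication opens a new one, which matches the symptom of searchers piling up.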

On Fri, Jul 27, 2012 at 10:14 PM, Karthick Duraisamy Soundararaj <
karthick.soundara...@gmail.com> wrote:

> SimpleOrderedMap commonRequestParams; //This holds the common
> request params.
> Vector> subQueryRequestParams;  // This holds the
> request params of sub Queries
>
> I use the above to create multiple localQueryRequests. To add a little
> more information, I create new ResponseBuilder for each request
>
> I also hold a reference to query component as a private member in my
> CustomHandler. Considering that the component is initialized only once
> during the start up, I assume this isnt a cause of concernt.
>
> On Fri, Jul 27, 2012 at 9:49 PM, Karthick Duraisamy Soundararaj <
> karthick.soundara...@gmail.com> wrote:
>
>> First no. Because i do the following
>> for(i=0;i<subQueries.size();i++) {
>>   subQueries.get(i).close();
>> }
>>
>> Second, I dont see any exception until the first searcher leak happens.
>>
>>
>> On Fri, Jul 27, 2012 at 9:04 PM, Lance Norskog  wrote:
>>
>>> A finally clause can throw exceptions. Can this throw an exception?
>>>  subQueries.get(i).close();
>>>
>>>  If so, each close() call should be in a try-catch block.
>>>
>>> On Fri, Jul 27, 2012 at 5:28 PM, Karthick Duraisamy Soundararaj
>>>  wrote:
>>> > Hello all,
>>> > While running in my eclipse and run a set of queries, this
>>> > works fine, but when I run it in test production server, the searchers
>>> are
>>> > leaked. Any hint would be appreciated. I have not used CoreContainer.
>>> >
>>> > Considering that the SearchHandler is running fine, I am not able to
>>> think
>>> > of a reason why my extended version wouldnt work.. Does anyone have any
>>> > idea?
>>> >
>>> > On Fri, Jul 27, 2012 at 10:19 AM, Karthick Duraisamy Soundararaj <
>>> > karthick.soundara...@gmail.com> wrote:
>>> >
>>> >> I have tons of these open.
>>> >> searcherName : Searcher@24be0446 main
>>> >> caching : true
>>> >> numDocs : 1331167
>>> >> maxDoc : 1338549
>>> >> reader :
>>> SolrIndexReader{this=5585c0de,r=ReadOnlyDirectoryReader@5585c0de
>>> >> ,refCnt=1,segments=18}
>>> >> readerDir : org.apache.lucene.store.NIOFSDirectory@
>>> >> /usr/local/solr/highlander/data/..@2f2d9d89
>>> >> indexVersion : 1336499508709
>>> >> openedAt : Fri Jul 27 09:45:16 EDT 2012
>>> >> registeredAt : Fri Jul 27 09:45:19 EDT 2012
>>> >> warmupTime : 0
>>> >>
>>> >> In my custom handler, I have the following code
>>> >> I have the following problem
>>> >> Although in my custom handler, I have the following
>>> implementation(its not
>>> >> the full code but it gives an overall idea of the implementation) and
>>> it
>>> >>
>>> >>   class CustomHandler extends SearchHandler {
>>> >>
>>> >> void handleRequestBody(SolrQueryRequest
>>> req,SolrQueryResponse
>>> >> rsp)
>>> >>
>>> >>  SolrCore core= req.getCore();
>>> >>  vector>
>>> requestParams =
>>> >> new   vector>();
>>> >> /*parse the params such a way that
>>> >>  requestParams[i] -=> parameter of
>>> the ith
>>> >> request
>>> >>   */
>>> >> ..
>>> >>
>>> >>   try {
>>> >>vector subQueries = new
>>> >>  vector(solrcore, requestParams[i]);
>>> >>
>>> >>for(i=0;i>> >>   ResponseBuilder rb = new
>>> ResponseBuilder()
>>> >>   rb.req = req;
>>> >>
>>> >>   handlerRequestBody(req,rsp,rb); //this
>>> would
>>> >> call search handler's handler request body, whose signature, i have
>>> modified
>>> >>  }
>>> >>  } finally {
>>> >>   for(i=0; i>> >>  subQueries.get(i).close();
>>> >>  }
>>> >>   }
>>> >>
>>> >> *Search Handler Changes*
>>> >>   class SearchHandler {
>>> >> void handleRequestBody(SolrQueryRequest req,
>>> SolrQueryResponse
>>> >> rsp, ResponseBuilder rb, ArrayList comps) {
>>> >>//  ResponseBuilder rb = new ResponseBuilder()  ;
>>> >>
>>> >>..
>>> >>  }
>>> >> void handleRequestBody(SolrQueryRequest req,
>>> >> SolrQueryResponse) {
>>> >>  ResponseBuilder rb = new
>>> ResponseBuilder(req,rsp, new
>>> >> ResponseBuilder());
>>> >>  handleRequestBody(req, rsp, rb, comps) ;
>>> >>  }
>>> >>   }
>>> >>
>>> >>
>>> >> I don see the index old index searcher geting closed after warming up
>>> the
>>> >> new guy... Because I replicate every 5 mintues, it crashes in 2

Re: Can Solr handle large text files?

2012-07-27 Thread Peter Spam
Has the performance of highlighting large text documents been improved in Solr 
4?


Thanks!
Pete

On Nov 5, 2011, at 9:03 AM, Erick Erickson  wrote:

> Sure, if you write a custom update handler. But I'm not at all sure
> this is "ideal".
> You're requiring all that data to be transmitted across the wire and processed
> by Solr. Assuming you have more than one input source, the Solr server in
> the background will be handling up to N documents simultaneously. Plus
> the effort to index. I think I'd recommend splitting them up on the client 
> side.
> 
> Best
> Erick
> 
> On Fri, Nov 4, 2011 at 3:23 AM, Peter Spam  wrote:
>> Solr 4.0 (11/1 snapshot)
>> Data: 80k files, average size 2.5MB, largest is 750MB;
>> Solr: Each document is max 256k; total docs = 800k
>> Machine: Early 2009 Mac Pro, 6GB RAM, 1GBmin/2GBmax given to Solr Java; 
>> Admin shows 30% mem usage
>> 
>> I originally tried injecting the entire file into a single Solr document, 
>> and this had disastrous results when trying to highlight.  I've now tried 
>> splitting each file into 256k segments per Solr document, and the results 
>> are better, but still not what I was hoping for.  Queries are around 2-8 
>> seconds, with some reaching into 30+ second territory.
>> 
>> Ideally, I'd like to feed Solr the metadata and the entire file at once, and 
>> have the back-end split the file into thousands of pieces.  Is this possible?
>> 
>> 
>> Thanks!
>> Pete
>> 
>> On Nov 1, 2011, at 5:15 PM, Peter Spam wrote:
>> 
>>> Wow, 50 lines is tiny!  Is that how small you need to go, to get good 
>>> highlighting performance?
>>> 
>>> I'm looking at documents that can be up to 800MB in size, so I've decided 
>>> to split them down into 256k chunks.  I'm still indexing right now - I'm 
>>> curious to see how performance is when the injection is finished.
>>> 
>>> Has anyone done analysis on where the knee in the curve is, wrt document 
>>> size vs. # of documents?
>>> 
>>> 
>>> Thanks!
>>> Pete
>>> 
>>> On Oct 31, 2011, at 9:28 PM, anand.ni...@rbs.com wrote:
>>> 
 Hi,
 
 Basically I need to index very large log files. I have modified the 
 ExtractingDocumentLoader to create a new document for every 50 lines of the 
 log file being indexed (the 50-line chunk size is configurable via a system 
 property). The 'Filename' field for documents created from one log file is 
 kept the same, and a unique id is generated by appending the line numbers to 
 the file name, e.g. 'log.txt (line no. 100 -150)'. Each doc is given a custom 
 score, stored in a field called 'custom_score', which is directly 
 proportional to its distance from the beginning of the file.
 
 I have also found 'hitGrouped.vm' on the net. Since I am reading only 50 
 lines for each document, the default max chunk size works for me, but it 
 can be easily adjusted depending on the number of lines you are reading per 
 doc.
 
 Now I group the results on the 'filename' field and show the results from 
 the docs having the highest score; as a result I am able to show the last 
 matching results from the log file. The query parameters that I am using for 
 search are:
 
 http://localhost:8080/solr/select?defType=dismax&qf=Content&q=Solr&fl=id,score&defType=dismax&bf=sub(1000,caprice_score)&group=true&group.field=FileName
 
 The results are amazing: I am able to index and search very large log 
 files (a few hundred MB) with very low memory requirements. Highlighting is 
 also working fine.
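
 A minimal sketch of this client-side splitting idea (also what was recommended
 earlier in the thread): index one Solr document per 50 lines, keeping FileName
 constant so results can be grouped with group.field. It assumes SolrJ and a
 server at http://localhost:8080/solr; the field names (id, FileName, Content)
 are illustrative, not taken from the poster's schema:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class LogSplitter {
        public static void main(String[] args) throws Exception {
            String file = "log.txt";
            int linesPerDoc = 50;
            HttpSolrServer solr = new HttpSolrServer("http://localhost:8080/solr");
            BufferedReader in = new BufferedReader(new FileReader(file));
            StringBuilder chunk = new StringBuilder();
            int lineNo = 0, start = 1;
            String line;
            while ((line = in.readLine()) != null) {
                chunk.append(line).append('\n');
                lineNo++;
                if (lineNo % linesPerDoc == 0) {
                    addChunk(solr, file, start, lineNo, chunk.toString());
                    chunk.setLength(0);
                    start = lineNo + 1;
                }
            }
            if (chunk.length() > 0) addChunk(solr, file, start, lineNo, chunk.toString());
            in.close();
            solr.commit();
            solr.shutdown();
        }

        // one Solr document per chunk; FileName is shared by all chunks of a file
        static void addChunk(HttpSolrServer solr, String file, int from, int to,
                             String text) throws Exception {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", file + " (line no. " + from + " - " + to + ")");
            doc.addField("FileName", file);
            doc.addField("Content", text);
            solr.add(doc);
        }
    }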
 
 Thanks & Regards,
 Anand
 
 
 
 
 
 Anand Nigam
 RBS Global Banking & Markets
 Office: +91 124 492 5506
 
 -Original Message-
 From: Peter Spam [mailto:ps...@mac.com]
 Sent: 21 October 2011 23:04
 To: solr-user@lucene.apache.org
 Subject: Re: Can Solr handle large text files?
 
 Thanks for your note, Anand.  What was the maximum chunk size for you?  
 Could you post the relevant portions of your configuration file?
 
 
 Thanks!
 Pete
 
 On Oct 21, 2011, at 4:20 AM, anand.ni...@rbs.com wrote:
 
> Hi,
> 
> I was also facing the issue of highlighting the large text files. I 
> applied the solution proposed here and it worked. But I am getting 
> following error :
> 
> 
> Basically 'hitGrouped.vm' is not found. I am using solr-3.4.0. Where
> can I get this file from. Its reference is present in browse.vm
> 
> 
> #if($response.response.get('grouped'))
>  #foreach($grouping in $response.response.get('grouped'))
>#parse("hitGrouped.vm")
>  #end
> #else
>  #foreach($doc in $response.results)
>#parse("hit.vm")
>  #end
> #end
> 
> 
> 
> HTTP Status 500 - Can't find resource 'hitGrouped.vm' in classpath or
> 'C:\caprice\workspace\caprice\dist\DEV\solr\.\conf/',
> cwd=C:\glassfish3\glassfish\domains\domain1\config
> jav