Re: Solr and Garbage Collection

2009-09-26 Thread Jonathan Ariel
Yes, it seems like a bug. I will update my JVM, try again, and let you know the results :) On 9/26/09, Mark Miller wrote: > Jonathan Ariel wrote: >> Ok. After the server ran for more than 12 hours, the time spent on GC >> decreased from 11% to 3.4%, but 5 hours later it crashed. This is the >> thread dump

Re: Solr and Garbage Collection

2009-09-26 Thread Mark Miller
Sorry Walter. Half the time I type faster than I think. I was mixing up concurrent with parallel. I do agree with you on the concurrent part for batch processing (and likely other things). It would likely be far better to use as many CPUs as you can (as many as make sense) collecting in parallel while
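The parallel-vs-concurrent distinction above maps onto HotSpot launch flags roughly as follows. This is a sketch for JVMs of that era; the heap size and `start.jar` invocation are placeholders, not from the thread:

```shell
# Throughput collector: stop-the-world young and old collections spread
# across many CPUs; good for batch indexing, where total work done matters
# more than individual pause length.
java -XX:+UseParallelGC -XX:+UseParallelOldGC -Xmx2g -jar start.jar

# Mostly-concurrent old-generation collector: trades some throughput for
# shorter pauses; usually the better fit for a query-serving Solr instance.
java -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -Xmx2g -jar start.jar
```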

Re: Solr and Garbage Collection

2009-09-26 Thread Mark Miller
Also, in case the info might help track something down: it's pretty darn odd that both your survivor spaces are full. I've never seen that ever in one of these dumps. Always one is empty. When one is filled, it's moved to the other. Then back. And forth. For a certain number of times until it's moved
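One way to watch the survivor-space ping-pong Mark describes is `jstat`; a diagnostic sketch, where `<pid>` stands in for the Solr JVM's process id:

```shell
# Print heap occupancy percentages every 1000 ms. In a healthy young
# generation the S0 and S1 columns alternate: one near 0%, the other
# holding the current survivors. Both staying full, as in this thread,
# points at a collector problem rather than normal promotion.
jstat -gcutil <pid> 1000
```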

Re: Solr and Garbage Collection

2009-09-26 Thread Mark Miller
Jonathan Ariel wrote: > Ok. After the server ran for more than 12 hours, the time spent on GC > decreased from 11% to 3.4%, but 5 hours later it crashed. This is the thread > dump, maybe you can help identify what happened? > Well, that's a tough one ;) My guess is it's a bug :) Your two survivor spaces

Re: Solr and Garbage Collection

2009-09-26 Thread Jonathan Ariel
Ok. After the server ran for more than 12 hours, the time spent on GC decreased from 11% to 3.4%, but 5 hours later it crashed. This is the thread dump, maybe you can help identify what happened? # # An unexpected error has been detected by Java Runtime Environment: # # SIGSEGV (0xb) at pc=0x2

Re: Using two Solr documents to represent one logical document/file

2009-09-26 Thread Matt Weber
Check out the field collapsing patch: http://wiki.apache.org/solr/FieldCollapsing https://issues.apache.org/jira/browse/SOLR-236 Thanks, Matt Weber On Sep 25, 2009, at 3:15 AM, Peter Ledbrook wrote: Hi, I want to index both the contents of a document/file and metadata associated with tha

Punctuation marks in documents prevent recognition of synonyms at indexing?

2009-09-26 Thread G.S.J. Lobbestael
Hi, The wiki uses the example: With "dog, canine" in syn.txt and a document with "I have a dog, Bob.", "dog" is not seen as a synonym. With a document "I have a dog Bob" it is. We could replace the WhitespaceTokenizerFactory with a PatternTokenizer
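A sketch of that idea (the field name and pattern are illustrative, not from the thread): tokenize on whitespace *or* punctuation, so "dog," yields the bare token "dog" before the synonym filter runs:

```xml
<!-- Hypothetical fieldType: the pattern splits on runs of whitespace
     or punctuation, stripping the trailing comma that defeats the
     whitespace tokenizer in the wiki example. -->
<fieldType name="text_syn" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s\p{Punct}]+"/>
    <filter class="solr.SynonymFilterFactory" synonyms="syn.txt"
            ignoreCase="true" expand="false"/>
  </analyzer>
</fieldType>
```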

Re: DIH & RSS > 1.4 nightly 2009-09-25 > full-import&clean=false always clean and import command do nothing

2009-09-26 Thread Shalin Shekhar Mangar
On Fri, Sep 25, 2009 at 6:48 PM, Brahim Abdesslam < brahim.abdess...@maecia.com> wrote: > Hello everybody, > > we are using Solr to index some RSS feeds for a news aggregator application. > > We've got some difficulties with the publication date of each item because > each site uses a homemade date

Re: Punctuation marks in documents prevent recognition of synonyms at indexing?

2009-09-26 Thread AHMET ARSLAN
> Hi,
>
> The wiki uses the example:
>
>     <fieldType name="…" class="solr.TextField">
>       <analyzer>
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <filter class="solr.SynonymFilterFactory" synonyms="syn.txt" ignoreCase="true" expand="false"/>
>       </analyzer>
>     </fieldType>
>
> With "dog, canine" in syn.txt and a document

Re: Solr and Garbage Collection

2009-09-26 Thread Mark Miller
Jonathan Ariel wrote: > I have around 8M documents. > Thats actually not so bad - I take it you are faceting/sorting on quite a few unique fields? > I set up my server to use a different collector and it seems like it > decreased from 11% to 4%, of course I need to wait a bit more because it is

Re: Punctuation marks in documents prevent recognition of synonyms at indexing?

2009-09-26 Thread G.S.J. Lobbestael
> > The wiki uses the example:
> >
> >     <fieldType name="…" class="solr.TextField">
> >       <analyzer>
> >         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >         <filter class="solr.SynonymFilterFactory" synonyms="syn.txt" ignoreCase="true" expand="false"/>
> >       </analyzer>
> >     </fieldType>
> >
> > With "dog, canine" in

Re: DIH & RSS > 1.4 nightly 2009-09-25 > full-import&clean=false always clean and import command do nothing

2009-09-26 Thread Shalin Shekhar Mangar
On Sat, Sep 26, 2009 at 9:41 PM, Brahim Abdesslam < brahim.abdess...@maecia.com> wrote: > > on a Linux system the command: > curl > http://192.168.0.14:8983/solr/dataimport?command=full-import&clean=false > just doesn't work like this command: > curl " > http://192.168.0.14:8983/solr/dataimport?co
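The quoting matters because an unquoted `&` is the shell's background operator: the shell runs `curl ...?command=full-import` in the background and treats `clean=false` as a separate (useless) variable assignment, so Solr never sees the second parameter. A minimal demonstration with `echo` standing in for `curl` (host and port are placeholders):

```shell
# Unquoted: '&' backgrounds the echo of the truncated URL, then the shell
# performs the variable assignment clean=false. Solr would only receive
# command=full-import.
echo http://localhost:8983/solr/dataimport?command=full-import&clean=false

# Quoted: the whole query string is one argument and reaches the server intact.
echo "http://localhost:8983/solr/dataimport?command=full-import&clean=false"
```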

Re: DIH & RSS > 1.4 nightly 2009-09-25 > full-import&clean=false always clean and import command do nothing

2009-09-26 Thread Shalin Shekhar Mangar
On Fri, Sep 25, 2009 at 6:48 PM, Brahim Abdesslam < brahim.abdess...@maecia.com> wrote: > * when I do a simple import, nothing seems to be done. > That was a bug. It is fixed in trunk now. Thanks! -- Regards, Shalin Shekhar Mangar.

Re: Solr + Jboss + Custom Transformers

2009-09-26 Thread Shalin Shekhar Mangar
On Fri, Sep 25, 2009 at 11:24 PM, Papiya Misra wrote: > > I could use the source code to create solr.war that includes the > CustomTransformer class. Is there any other option - one that preferably > does not include re-packaging solr.war ? > > You should add your custom transformers (actually an
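One repackaging-free option (a sketch; the directory name is an assumption, not from the thread): drop the jar containing your transformer class under the Solr instance dir and point solrconfig.xml at it:

```xml
<!-- solrconfig.xml: load every jar in ./lib relative to the instance dir,
     so the custom DIH transformer is on the classpath without touching
     solr.war at all. -->
<lib dir="./lib" />
```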

Re: Punctuation marks in documents prevent recognition of synonyms at indexing?

2009-09-26 Thread AHMET ARSLAN
> You lose the WordDelimiterFilterFactory functionality: > > Syn.txt has: ADC, HIV-dementie > Search on "ADC" doesn't find document with "HIV-dementie". The synonym filter can handle multi-word synonyms. Change Syn.txt so it has: ADC, HIV dementie And a search on "ADC" will find the document with
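The suggested entry, spelled out in the synonyms-file format (comma-separated alternatives; multi-word synonyms are written with plain spaces so they match the tokens the analyzer actually produces):

```
# syn.txt sketch: with expand="false", the two-token sequence
# "HIV dementie" is mapped to the first entry, "ADC", at index time,
# so a search on ADC matches the document.
ADC, HIV dementie
```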

Re: DIH & RSS > 1.4 nightly 2009-09-25 > full-import&clean=false always clean and import command do nothing

2009-09-26 Thread Brahim Abdesslam
Shalin Shekhar Mangar wrote: On Fri, Sep 25, 2009 at 6:48 PM, Brahim Abdesslam < brahim.abdess...@maecia.com> wrote: we are using Solr to index some RSS feeds for a news aggregator application. We've got some difficulties with the publication date of each item because each site uses a homemade