Re: Out of memory during the indexing

2011-12-02 Thread Jeff Crump
Can anyone advise techniques for limiting the size of the RAM buffers to
begin with?  As the index grows, I shouldn't have to keep increasing the
heap.  We have a high-ingest, low-query-rate environment and I'm not as
much concerned with the query-time caches as I am with the segment core
readers/SolrIndexSearchers themselves.
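
The only writer-side knob I know of is capping the RAM buffer in
solrconfig.xml, something like:

   <indexDefaults>
     <!-- flush the in-memory index to disk once it reaches this size;
          32 MB is the 1.4 default -->
     <ramBufferSizeMB>32</ramBufferSizeMB>
   </indexDefaults>

but as far as I can tell that only bounds the uncommitted buffer, not what
the segment readers hold once data is committed.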

On 9 November 2011 06:10, Andre Bois-Crettez  wrote:

> How much memory do you actually allocate to the JVM?
> http://wiki.apache.org/solr/SolrPerformanceFactors#Memory_allocated_to_the_Java_VM
> You need to increase the -Xmx value, otherwise your large RAM buffers
> won't fit in the Java heap.
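>
> For example, if you run the stock Jetty example, something like this
> (the sizes are only an illustration, tune them to your data):
>
>   java -Xms512m -Xmx2048m -jar start.jar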
>
>
>
> sivaprasad wrote:
>
>> Hi,
>>
>> I am getting the following error during indexing. I am trying to index
>> 14 million records, but each document is very small.
>>
>> *Error:*
>> 2011-11-08 14:53:24,634 ERROR [STDERR] (Thread-12)
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>
>>
>>
> [...]
>
>> Do I need to increase the heap size for the JVM?
>>
>> My solrconfig settings are given below.
>>
>> 
>>  false
>>
>>25
>>2
>>   1024
>>2147483647
>>1
>>1000
>>1
>>
>> and the main index values are
>> false
>>512
>>10
>>2147483647
>>1
>>
>> Do I need to increase ramBufferSizeMB to something higher?
>>
>> Please provide your inputs.
>>
>> Regards,
>> Siva
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Out-of-memory-during-the-indexing-tp3492701p3492701.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
>>
>
> --
> André Bois-Crettez
>
> Search technology, Kelkoo
> http://www.kelkoo.com/
>
>


Re: Out of memory during the indexing

2011-12-05 Thread Jeff Crump
Yes, and without doing much in the way of queries, either. Basically, our
test data has large numbers of distinct terms, each of which can itself be
large. Heap usage is a straight line -- up -- and 75 percent of the heap is
consumed by byte[] allocations at the leaves of an object graph like so:

SolrCore
  SolrIndexSearcher
    DirectoryReader
      SegmentReader
        SegmentCoreReaders
          PerFieldPostingsFormat$FieldsReader
            ...
              FST
                byte[]

Our application is less concerned with query performance than it is with
making sure our index doesn't OOM. My suspicion is that we're looking at
just the in-memory representation of the index rather than any caching
(it's all turned down to the levels suggested in other documentation);
plus, we're not doing much querying in this test anyway.
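
For the record, the path above is a dominator trace from a heap dump in
Eclipse MAT; a quick histogram along these lines (assuming a Sun JDK with
jmap available) shows the byte[] arrays at the top as well:

   jmap -histo:live <pid> | head -20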

Any suggestions or places to go for further information?

On 5 December 2011 08:38, Erick Erickson  wrote:

> There's no good way to say to Solr "use only this
> much memory for searching". You can certainly
> limit the size somewhat by configuring your caches
> to be small. But if you're sorting, Lucene will
> still use memory for things like the field cache.
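>
> For example, in solrconfig.xml you can shrink the caches to something
> like this (the sizes here are only a sketch, not a recommendation):
>
>   <filterCache class="solr.FastLRUCache" size="64" initialSize="64" autowarmCount="0"/>
>   <queryResultCache class="solr.LRUCache" size="64" initialSize="64" autowarmCount="0"/>
>   <documentCache class="solr.LRUCache" size="64" initialSize="64"/>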
>
> Are you actually running into problems?
>
> Best
> Erick
>
> On Fri, Dec 2, 2011 at 2:26 PM, Jeff Crump 
> wrote:
> > Can anyone advise techniques for limiting the size of the RAM buffers to
> > begin with?  As the index grows, I shouldn't have to keep increasing the
> > heap.  We have a high-ingest, low-query-rate environment and I'm not as
> > much concerned with the query-time caches as I am with the segment core
> > readers/SolrIndexSearchers themselves.
> >
> > On 9 November 2011 06:10, Andre Bois-Crettez wrote:
> >
> > [...]
>


Two XPathEntityProcessor questions

2011-05-18 Thread Jeff Crump
Hi,

Can anyone tell me if the XPathEntityProcessor handles expressions like this:

xpath="/a/b[c='value']/d/e"

That is, return a node whose enclosing element has a child with a given text value?

I would like to map various XPath expressions of that form to the same
document in the index (I have a unique key constraint).

Also, is it possible to assign a value to a unique key from an HTTP
parameter?  Something like this:

${dataimporter.request.id}

I'm using a ContentStreamDataSource to fetch data from a POST.
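
For context, the entity I have in mind looks roughly like this (a sketch
only; the entity name, data source name, and field names are made up, and
whether the predicate xpath works is exactly my question):

   <entity name="rec" processor="XPathEntityProcessor"
           dataSource="streamsrc" forEach="/a/b"
           transformer="TemplateTransformer">
     <field column="id" template="${dataimporter.request.id}"/>
     <field column="e_val" xpath="/a/b[c='value']/d/e"/>
   </entity>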

Thanks,

Jeff


Re: very slow commits and overlapping commits

2011-05-22 Thread Jeff Crump
I don't have an answer to this but only another question:  I don't think I
can use auto-commit in my application, as I have to "checkpoint" my index
submissions and I don't know of any callback mechanism that would let me
know a commit has happened.  Is there one?
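
The closest I've found is the postCommit listener hook in solrconfig.xml,
e.g. the stock snapshooter example:

   <listener event="postCommit" class="solr.RunExecutableListener">
     <str name="exe">solr/bin/snapshooter</str>
     <str name="dir">.</str>
     <bool name="wait">true</bool>
   </listener>

but that fires on the server side, not in the client doing the
submissions, so it doesn't really give me a checkpoint.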

2011/5/21 Erick Erickson 

> Well, committing less often is a possibility. Here's what's probably
> happening: when you pass certain thresholds, segments are merged, which can
> take quite some time. How are you triggering commits? If it's external,
> think about using autocommit instead.
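>
> Autocommit goes in the <updateHandler> section of solrconfig.xml,
> something like this (the thresholds are just an illustration):
>
>   <autoCommit>
>     <maxDocs>10000</maxDocs>
>     <maxTime>60000</maxTime>
>   </autoCommit>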
>
> Best
> Erick
> On May 20, 2011 6:04 PM, "Bill Au"  wrote:
> > On my Solr 1.4.1 master I am doing commits regularly at a fixed interval.
> > I noticed that from time to time a commit will take longer than the commit
> > interval, causing commits to overlap. Then things get worse, as commits
> > take longer and longer. Here are the logs for a long commit:
> >
> >
> > [2011-05-18 23:47:30.071] start
> >
>
> commit(optimize=false,waitFlush=false,waitSearcher=false,expungeDeletes=false)
> > [2011-05-18 23:49:48.119] SolrDeletionPolicy.onCommit: commits:num=2
> > [2011-05-18 23:49:48.119]
> >
>
> commit{dir=/var/opt/resin3/5062/solr/data/index,segFN=segments_5cpa,version=1247782702272,generation=249742,filenames=[_4dqu_2g.del,
> > _4e66.tis, _4e3r.tis, _4e59.nrm, _4e68_1.del, _4e4n.prx, _4e4n.fnm,
> > _4e67.fnm, _4e3r.frq, _4e3r.tii, _4e6d.fnm, _4e6c.prx, _4e68.fdx,
> _4e68.nrm,
> > _4e6a.frq, _4e68.fdt, _4dqu.fnm, _4e4n.tii, _4e69.fdx, _4e69.fdt,
> _4e0e.nrm,
> > _4e4n.tis, _4e6e.fnm, _4e3r.prx, _4e66.fnm, _4e3r.nrm, _4e0e.prx,
> _4e4c.fdx,
> > _4dx1.prx, _4e5v.frq, _4e3r.fdt, _4e4c.tis, _4e41_6.del, _4e6b.tis,
> > _4e6b_1.del, _4e4y_3.del, _4e6b.tii, _4e3r.fdx, _4dx1.nrm, _4e4y.frq,
> > _4e4c.fdt, _4e4c.tii, _4e6d.fdt, _4e5k.fnm, _4e41.fnm, _4e69.fnm,
> _4e67.fdt,
> > _4e0e.tii, _4dty_h.del, _4e6b.fnm, _4e0e_h.del, _4e6d.fdx, _4e67.fdx,
> > _4e0e.tis, _4e5v.nrm, _4dx1.fnm, _4e5v.tii, _4dqu.fdt, segments_5cpa,
> > _4e5v.prx, _4dqu.fdx, _4e59.fnm, _4e6d.prx, _4e59_5.del, _4e4c.prx,
> > _4e4c.nrm, _4e5k.prx, _4e66.fdx, _4dty.frq, _4e6c.frq, _4e5v.tis,
> _4e6e.tii,
> > _4e66.fdt, _4e6b.fdx, _4e68.prx, _4e59.fdx, _4e6e.fdt, _4e41.prx,
> _4dx1.tii,
> > _4dx1.fdt, _4e6b.fdt, _4e5v_4.del, _4e4n.fdt, _4e6e.fdx, _4dx1.fdx,
> > _4e41.nrm, _4e4n.fdx, _4e6e.tis, _4e66.tii, _4e4c.fnm, _4e6b.prx,
> _4e67.prx,
> > _4e0e.fnm, _4e4n.nrm, _4e67.nrm, _4e5k.nrm, _4e6a.prx, _4e68.fnm,
> > _4e4c_4.del, _4dx1.tis, _4e6e.nrm, _4e59.tii, _4e68.tis, _4e67.frq,
> > _4e3r.fnm, _4dty.nrm, _4e4y.prx, _4e6e.prx, _4dty.tis, _4e4y.tis,
> _4e6b.nrm,
> > _4e6a.fdt, _4e4n.frq, _4e6d.frq, _4e59.fdt, _4e6a.fdx, _4e6a.fnm,
> _4dqu.tii,
> > _4e41.tii, _4e67_1.del, _4e41.tis, _4dty.fdt, _4e69.tis, _4dqu.frq,
> > _4dty.fdx, _4dx1.frq, _4e6e.frq, _4e66_1.del, _4e69.prx, _4e6d.tii,
> > _4e5k.tii, _4e0e.fdt, _4dqu.tis, _4e6d.tis, _4e69.nrm, _4dqu.prx,
> _4e4y.fnm,
> > _4e67.tis, _4e69_1.del, _4e6d.nrm, _4e6c.tis, _4e0e.fdx, _4e6c.tii,
> > _4dx1_n.del, _4e5v.fnm, _4e5k.tis, _4e59.tis, _4e67.tii, _4dqu.nrm,
> > _4e5k_8.del, _4e6c.fdx, _4e6c.fdt, _4e41.frq, _4e4y.fdx, _4e69.frq,
> > _4e6a.tis, _4dty.prx, _4e66.frq, _4e5k.frq, _4e6a.tii, _4e69.tii,
> _4e6c.nrm,
> > _4dty.fnm, _4e59.prx, _4e59.frq, _4e66.prx, _4e68.frq, _4e5k.fdx,
> _4e4y.tii,
> > _4e6c.fnm, _4e0e.frq, _4e6b.frq, _4e41.fdt, _4e4n_2.del, _4dty.tii,
> > _4e4y.fdt, _4e66.nrm, _4e4c.frq, _4e6a.nrm, _4e5k.fdt, _4e3r_i.del,
> > _4e5v.fdt, _4e4y.nrm, _4e68.tii, _4e5v.fdx, _4e41.fdx]
> > [2011-05-18 23:49:48.119]
> >
>
> commit{dir=/var/opt/resin3/5062/solr/data/index,segFN=segments_5cpb,version=1247782702273,generation=249743,filenames=[_4dqu_2g.del,
> > _4e66.tis, _4e59.nrm, _4e3r.tis, _4e4n.fnm, _4e67.fnm, _4e3r.tii,
> _4e6d.fnm,
> > _4e68.fdx, _4e68.fdt, _4dqu.fnm, _4e4n.tii, _4e69.fdx, _4e69.fdt,
> _4e4n.tis,
> > _4e6e.fnm, _4e0e.prx, _4e4c.tis, _4e5v.frq, _4e4y_3.del, _4e6b_1.del,
> > _4e4c.tii, _4e6f.fnm, _4e5k.fnm, _4e6c_1.del, _4e41.fnm, _4dx1.fnm,
> > _4e5v.nrm, _4e5v.tii, _4e5v.prx, _4e5k.prx, _4e4c.nrm, _4dty.frq,
> _4e66.fdx,
> > _4e5v.tis, _4e66.fdt, _4e6e.tii, _4e59.fdx, _4e6b.fdx, _4e41.prx,
> _4e6b.fdt,
> > _4e41.nrm, _4e6e.tis, _4e4c.fnm, _4e66.tii, _4e6b.prx, _4e0e.fnm,
> _4e5k.nrm,
> > _4e6a.prx, _4e6e.nrm, _4e59.tii, _4e67.frq, _4dty.nrm, _4e4y.tis,
> _4e6a.fdt,
> > _4e6b.nrm, _4e59.fdt, _4e6a.fdx, _4e41.tii, _4e41.tis, _4e67_1.del,
> > _4dty.fdt, _4dty.fdx, _4e69.tis, _4e66_1.del, _4e6e.frq, _4e5k.tii,
> > _4dqu.prx, _4e67.tis, _4e69_1.del, _4e6c.tis, _4e6c.tii, _4e5v.fnm,
> > _4e5k.tis, _4e59.tis, _4e67.tii, _4e6c.fdx, _4e4y.fdx, _4e41.frq,
> _4e6c.fdt,
> > _4dty.prx, _4e66.frq, _4e69.tii, _4e6c.nrm, _4e59.frq, _4e66.prx,
> _4e5k.fdx,
> > _4e68.frq, _4e4y.tii, _4e4n_2.del, _4e41.fdt, _4e6b.frq, _4e4y.fdt,
> > _4e66.nrm, _4e4c.frq, _4e3r_i.del, _4e5k.fdt, _4e4y.nrm, _4e41.fdx,
> > _4e4n.prx, _4e68_1.del, _4e3r.frq, _4e6f.fdt, _4e6f.fdx, _4e6c.prx,
> > _4e68.nrm, _4e6a.frq, _4e0e.nrm, _4e3r.prx, _4e66.fnm, _4e3r.nrm,
> _4

Basic questions about Solr cost in programming time

2010-01-26 Thread Jeff Crump
Hi, 
I hope this message is OK for this list.
 
I'm looking into search solutions for an intranet site built with Drupal.
Eventually we'd like to scale to enterprise search, which would include the
Drupal site, a document repository, and Jive SBS (collaboration software).
I'm interested in Lucene/Solr because of its scalability, faceted search and
optimization features, and because it is free. Our problem is that we are a
non-profit organization with only three very busy programmers/sys admins
supporting our employees around the world. 
 
To help me argue for Solr in terms of total cost, I'm hoping that members of
this list can share their insights about the following:
 
* About how many hours of programming did it take you to set up your
instance of Lucene/Solr (not counting time spent on optimization)?
 
* Are there any disadvantages of going with a certified distribution rather
than the standard distribution?
 
 
Thanks and best regards,
Jeff
 
Jeff Crump
jcr...@hq.mercycorps.org