I've also indexed a concatenation of 50k journal articles (making a single document of several hundred MB of text) and it did not give me an OOM.
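In case it's useful, a minimal sketch of what that looks like with the embedded server and SolrJ. The solr home path, core name ("core0"), and field names ("id", "text") are placeholders for your own setup, and note that the whole text is held on the heap as one String, so the JVM has to be sized for it:

import java.io.File;

import org.apache.commons.io.FileUtils;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.core.CoreContainer;

public class BigDocIndexer {
    public static void main(String[] args) throws Exception {
        // Point Solr at an existing solr home (solr.xml plus a configured core).
        System.setProperty("solr.solr.home", "/path/to/solr/home");
        CoreContainer container = new CoreContainer.Initializer().initialize();
        SolrServer server = new EmbeddedSolrServer(container, "core0");

        // The whole concatenation goes in as a single document with one big
        // text field. FileUtils is Commons IO; the resulting String lives in
        // memory, so run with enough -Xmx to hold the file.
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "journal-concat-1");
        doc.addField("text", FileUtils.readFileToString(new File("/path/to/concat.txt")));

        server.add(doc);
        server.commit();
        container.shutdown();
    }
}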
-glen

On 16 March 2010 15:57, Erick Erickson <erickerick...@gmail.com> wrote:
> Why do you think you'd hit OOM errors? How big is "very large"? I've
> indexed, as a single document, a 26-volume encyclopedia of civil war
> records...
>
> Although as much as I like the technology, if I could get away without
> using two technologies, I would. Are you completely sure you can't get
> what you want with clever Oracle querying?
>
> Best
> Erick
>
> On Tue, Mar 16, 2010 at 3:20 PM, Neil Chaudhuri <
> nchaudh...@potomacfusion.com> wrote:
>
>> I am working on an application that currently hits a database containing
>> millions of very large documents. I use Oracle Text Search at the moment,
>> and things work fine. However, there is a request for faceting capability,
>> and Solr seems like a technology I should look at. Suffice it to say I am
>> new to Solr, but at the moment I see two approaches, each with drawbacks:
>>
>> 1) Have Solr index document metadata (id, subject, date). Then use Oracle
>> Text to do a content search based on the criteria. Finally, query the
>> Solr index for all documents whose ids match the set of ids returned by
>> Oracle Text. That strikes me as an unmanageable Boolean query (e.g.
>> id:4 OR id:33432323 OR ...).
>>
>> 2) Remove Oracle Text from the equation and use Solr to query document
>> content based on search criteria. The indexing process, though, will
>> almost certainly encounter an OutOfMemoryError given the number and size
>> of the documents.
>>
>> I am using the embedded server and the Solr Java APIs to do the indexing
>> and querying.
>>
>> I would welcome your thoughts on the best way to approach this situation.
>> Please let me know if I should provide additional information.
>>
>> Thanks.
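And on approach 1, for what it's worth: the id join can at least be generated rather than written by hand, though with millions of candidate ids it will trip maxBooleanClauses (1024 by default in solrconfig.xml), so you'd have to raise that limit or batch the ids across several queries. A rough SolrJ sketch, assuming an already-configured SolrServer and using the "subject" field from your metadata as the facet:

import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.response.QueryResponse;

public class IdJoinSearch {

    // Builds "id:(4 OR 33432323 OR ...)" from the ids Oracle Text returned.
    static String idClause(List<String> oracleIds) {
        StringBuilder q = new StringBuilder("id:(");
        for (int i = 0; i < oracleIds.size(); i++) {
            if (i > 0) q.append(" OR ");
            q.append(oracleIds.get(i));
        }
        return q.append(')').toString();
    }

    static QueryResponse facetedSearch(SolrServer server, List<String> oracleIds)
            throws SolrServerException {
        SolrQuery query = new SolrQuery(idClause(oracleIds));
        query.setFacet(true);
        query.addFacetField("subject"); // any indexed metadata field works here
        query.setRows(100);
        // Past ~1024 ids this hits the maxBooleanClauses limit, so either
        // raise it in solrconfig.xml or issue the query in id batches.
        return server.query(query);
    }
}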