Certainly, I could use some basic SQL COUNT(*) queries to achieve faceted results, but I am not sure that approach is flexible, extensible, or scalable. And from what I have read, Oracle Text doesn't do faceting out of the box.
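To make the comparison concrete, here is a minimal sketch of what a facet is: a count of matching documents per distinct field value, which is exactly what a per-field SQL `SELECT field, COUNT(*) ... GROUP BY field` produces (and what Solr computes natively with `facet.field`). The "document type" values below are hypothetical, not from the real schema.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class FacetSketch {
    // Count how many documents carry each distinct value of one field.
    // A SQL equivalent would be: SELECT doc_type, COUNT(*) FROM docs GROUP BY doc_type.
    public static Map<String, Long> facetCounts(List<String> fieldValues) {
        return fieldValues.stream()
                .collect(Collectors.groupingBy(v -> v, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        Map<String, Long> facets =
                facetCounts(List.of("memo", "report", "memo", "letter", "memo"));
        System.out.println(facets); // {letter=1, memo=3, report=1}
    }
}
```

The drawback of the SQL route is that each faceted field needs its own GROUP BY query re-run against the full result set of the text search, which is where the flexibility and scalability concerns come from.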
Each document is a few MB, and there will be millions of them. I suppose it depends on how I index them. I am pretty sure my current approach of using Hibernate to load all rows, constructing Solr POJOs from them, and then passing the POJOs to the embedded server would lead to an OOM error. I should probably look into the other options.

Thanks.

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Tuesday, March 16, 2010 3:58 PM
To: solr-user@lucene.apache.org
Subject: Re: Moving From Oracle Text Search To Solr

Why do you think you'd hit OOM errors? How big is "very large"? I've indexed, as a single document, a 26-volume encyclopedia of Civil War records.

Although as much as I like the technology, if I could get away without using two technologies, I would. Are you completely sure you can't get what you want with clever Oracle querying?

Best
Erick

On Tue, Mar 16, 2010 at 3:20 PM, Neil Chaudhuri <nchaudh...@potomacfusion.com> wrote:

> I am working on an application that currently hits a database containing
> millions of very large documents. I use Oracle Text Search at the moment,
> and things work fine. However, there is a request for faceting capability,
> and Solr seems like a technology I should look at. Suffice to say I am new
> to Solr, but at the moment I see two approaches, each with drawbacks:
>
> 1) Have Solr index document metadata (id, subject, date). Then use
> Oracle Text to do a content search based on criteria. Finally, query the
> Solr index for all documents whose ids match the set of ids returned by
> Oracle Text. That strikes me as an unmanageable Boolean query (e.g.
> id:4 OR id:33432323 OR ...).
>
> 2) Remove Oracle Text from the equation and use Solr to query document
> content based on search criteria. The indexing process, though, will almost
> certainly encounter an OutOfMemoryError given the number and size of
> documents.
>
> I am using the embedded server and Solr Java APIs to do the indexing and
> querying.
>
> I would welcome your thoughts on the best way to approach this situation.
> Please let me know if I should provide additional information.
>
> Thanks.
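The load-all-rows OOM concern in option 2 is usually avoided by streaming and batching rather than materializing every row. A minimal sketch, assuming the real code would replace the plain iterator with a Hibernate cursor (e.g. `ScrollableResults`) and the consumer with a SolrJ call such as `server.add(batch)` plus a periodic commit; the batch size of 500 is an arbitrary illustration:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

public class BatchedIndexer {
    // Pull documents one at a time, buffer a fixed-size batch, hand the
    // batch to the indexer, then drop the buffer so the heap stays bounded
    // regardless of how many rows the database holds. Returns the number
    // of batches flushed.
    public static <T> int indexInBatches(Iterator<T> docs, int batchSize,
                                         Consumer<List<T>> indexer) {
        List<T> batch = new ArrayList<>(batchSize);
        int batches = 0;
        while (docs.hasNext()) {
            batch.add(docs.next());
            if (batch.size() == batchSize) {
                indexer.accept(batch);
                batch = new ArrayList<>(batchSize); // release references for GC
                batches++;
            }
        }
        if (!batch.isEmpty()) { // flush the final partial batch
            indexer.accept(batch);
            batches++;
        }
        return batches;
    }
}
```

With this shape, memory usage is proportional to the batch size, not the table size, which is why indexing millions of multi-MB documents need not hit an OutOfMemoryError.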
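The Boolean id query from option 1 can be sketched as follows. Note that Lucene caps the number of clauses in a Boolean query (`maxClauseCount`, 1024 by default, configurable via `maxBooleanClauses` in solrconfig.xml), so a filter built from millions of Oracle Text hits would have to be chunked across many queries, which is what makes the approach unmanageable:

```java
import java.util.List;
import java.util.stream.Collectors;

public class IdQueryBuilder {
    // Join a list of document ids into a single Solr query string of the
    // form id:(4 OR 33432323 OR ...). Works only while the id list stays
    // under Lucene's Boolean clause limit.
    public static String idQuery(List<String> ids) {
        return ids.stream().collect(Collectors.joining(" OR ", "id:(", ")"));
    }

    public static void main(String[] args) {
        System.out.println(idQuery(List.of("4", "33432323")));
        // id:(4 OR 33432323)
    }
}
```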