That is a great article, David. 

For the moment, I am trying an all-Solr approach, but I have run into a small 
problem. The documents are stored as XML CLOBs using Oracle's OPAQUE object. 
Is there any facility to unpack these into the actual text, or must I do 
that in the SQL query?
Thanks.
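
For reference, here is a minimal sketch of the client-side alternative to 
doing the unpacking in SQL. It assumes the query already hands back each 
document's XML as a plain Java String, and that only the text nodes need to 
be indexed. The class name XmlText and the method extract are hypothetical, 
and only JDK classes are used. It is an illustration, not a Solr facility.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: strips the markup from one XML document held in memory.
final class XmlText {
    private XmlText() {}

    /** Returns the document's text nodes concatenated, with whitespace runs collapsed. */
    static String extract(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Reject DOCTYPE declarations so a document cannot pull in external entities.
        // Assumption: the stored documents carry no DOCTYPE; drop this line if they do.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        // getTextContent() concatenates all descendant text, skipping comments and PIs.
        String text = doc.getDocumentElement().getTextContent();
        return text.replaceAll("\\s+", " ").trim();
    }
}
```

This builds a full DOM, so each multi-megabyte document is held in memory 
while it is parsed. A streaming parser may be a better fit if that turns out 
to be too heavy.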


-----Original Message-----
From: Smiley, David W. [mailto:dsmi...@mitre.org] 
Sent: Tuesday, March 16, 2010 4:45 PM
To: solr-user@lucene.apache.org
Subject: Re: Moving From Oracle Text Search To Solr

If you do stay with Oracle, please report back to the list how that went.  In 
order to get decent filtering and faceting performance, I believe you will need 
to use "bitmapped indexes" which Oracle and some other databases support.

You may want to check out my article on this subject: 
http://www.packtpub.com/article/text-search-your-database-or-solr

~ David Smiley
Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/


On Mar 16, 2010, at 4:13 PM, Neil Chaudhuri wrote:

> Certainly I could use some basic SQL count(*) queries to achieve faceted 
> results, but I am not sure of the flexibility, extensibility, or scalability 
> of that approach. And from what I have read, Oracle Text doesn't do faceting 
> out of the box.
> 
> Each document is a few MB, and there will be millions of them. I suppose it 
> depends on how I index them. I am pretty sure my current approach of using 
> Hibernate to load all rows, constructing Solr POJOs from them, and then 
> passing the POJOs to the embedded server would lead to an OOM error. I should 
> probably look into the other options.
> 
> Thanks.
> 
> 
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com] 
> Sent: Tuesday, March 16, 2010 3:58 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Moving From Oracle Text Search To Solr
> 
> Why do you think you'd hit OOM errors? How big is "very large"? I've
> indexed, as a single document, a 26 volume encyclopedia of civil war
> records...
> 
> Although as much as I like the technology, if I could get away without using
> two technologies, I would. Are you completely sure you can't get what you
> want with clever Oracle querying?
> 
> Best
> Erick
> 
> On Tue, Mar 16, 2010 at 3:20 PM, Neil Chaudhuri <
> nchaudh...@potomacfusion.com> wrote:
> 
>> I am working on an application that currently hits a database containing
>> millions of very large documents. I use Oracle Text Search at the moment,
>> and things work fine. However, there is a request for faceting capability,
>> and Solr seems like a technology I should look at. Suffice it to say I am new
>> to Solr, but at the moment I see two approaches, each with drawbacks:
>> 
>> 
>> 1)      Have Solr index document metadata (id, subject, date). Then use
>> Oracle Text to do a content search based on criteria. Finally, query the
>> Solr index for all documents whose ids match the set of ids returned by
>> Oracle Text. That strikes me as an unmanageable Boolean query (e.g.
>> id:4 OR id:33432323 OR ...).
>> 
>> 2)      Remove Oracle Text from the equation and use Solr to query document
>> content based on search criteria. The indexing process though will almost
>> certainly encounter an OutOfMemoryError given the number and size of
>> documents.
>> 
>> 
>> 
>> I am using the embedded server and Solr Java APIs to do the indexing and
>> querying.
>> 
>> 
>> 
>> I would welcome your thoughts on the best way to approach this situation.
>> Please let me know if I should provide additional information.
>> 
>> 
>> 
>> Thanks.
>> 
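
On the OutOfMemoryError concern raised for approach 2: a minimal sketch of 
batching, assuming the rows can be read through a streaming Iterator rather 
than loaded all at once. The names BatchIndexer and indexInBatches are 
hypothetical, and the sink is a stand-in for whatever hands documents to 
Solr. Only one batch is referenced at a time, so memory use is bounded by the 
batch size as long as the sink does not retain the lists it is given.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: feeds rows to an indexing sink in fixed-size batches.
final class BatchIndexer {
    private BatchIndexer() {}

    /** Sends rows to the sink in lists of at most batchSize; returns the number of batches sent. */
    static <T> int indexInBatches(Iterator<T> rows, int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<T> batch = new ArrayList<>(batchSize);
        int batches = 0;
        while (rows.hasNext()) {
            batch.add(rows.next());
            if (batch.size() == batchSize) {
                sink.accept(batch);
                // Start a fresh list so the previous batch can be garbage collected.
                batch = new ArrayList<>(batchSize);
                batches++;
            }
        }
        // Flush the final, possibly shorter, batch.
        if (!batch.isEmpty()) {
            sink.accept(batch);
            batches++;
        }
        return batches;
    }
}
```

With five rows and a batch size of two, the sink would be called three times 
with lists of two, two, and one rows.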



