Mike, I would be very interested in the answer to that question too. My hunch is also that the answer is no. I have a few text databases, ranging from 200MB to about 60GB, on which I could run some tests. I will have some downtime in early July and will post the results.
From what I can tell the Guardian newspaper is doing just that:

http://www.guardian.co.uk/open-platform/blog/what-is-powering-the-content-api
http://www.lucidimagination.com/blog/2010/04/29/for-the-guardian-solr-is-the-new-database/

Cheers

François

On Jun 20, 2011, at 9:05 AM, Mike Sokolov wrote:

> I'd be very interested in this, as well, if you do it before me and are
> willing to share...
>
> A related question I have tried to ask on this list, and have never really
> gotten a good answer to, is whether it makes sense to just chuck the external
> storage and treat the lucene index as the primary storage for documents. I
> have a feeling the answer is no; perhaps because of increased I/O costs for
> lucene and solr, but I don't really know. I've been considering doing some
> experimentation, but would really love an expert opinion...
>
> -Mike
>
> On 06/20/2011 08:41 AM, Jamie Johnson wrote:
>> I am trying to index data where I'm concerned that storing the contents of a
>> specific field will be a bit of a hog so we are planning to retrieve this
>> information as needed for highlighting from an external source. I am
>> looking to extend the default solr highlighting capability to work with
>> information pulled from this external source and it looks like this is
>> possible by extending DefaultSolrHighlighter (line 418 to pull a particular
>> field from external source) for standard highlighting and
>> BaseFragmentsBuilder (line 99) for FastVectorHighlighter. I could just hard
>> code this to say if the field name is a specific value look into the
>> external source, is this the best way to accomplish this? Are there any
>> other extension points to do what I'm suggesting?
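
For what it's worth, the "if the field name is a specific value, look into the external source" idea from Jamie's message can be sketched on its own, without touching any Solr classes. Everything named below (ExternalTextSource, FieldTextResolver, resolve) is hypothetical and is not part of Solr or Lucene. How this would be wired into DefaultSolrHighlighter or BaseFragmentsBuilder depends on the Solr version and is not shown here. The only point is that the set of externally stored fields can be configuration rather than a hard-coded name check.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical abstraction over the external store (not a Solr/Lucene type).
interface ExternalTextSource {
    // Returns the field text for the document, or null if the store has none.
    String fetch(String docId, String fieldName);
}

// Hypothetical helper that a highlighter subclass could delegate to when it
// needs the text of a field for a given document.
final class FieldTextResolver {
    private final Set<String> externalFields; // fields kept outside the index
    private final ExternalTextSource source;

    FieldTextResolver(Set<String> externalFields, ExternalTextSource source) {
        this.externalFields = externalFields;
        this.source = source;
    }

    // Looks in the external source for configured fields and falls back to
    // the stored value from the index (which may itself be null).
    String resolve(String docId, String fieldName, Map<String, String> storedFields) {
        if (externalFields.contains(fieldName)) {
            String text = source.fetch(docId, fieldName);
            if (text != null) {
                return text;
            }
        }
        return storedFields.get(fieldName);
    }
}
```

With a resolver like this, the list of externally stored fields can come from configuration, and the highlighter override only needs one call to resolve() at the point where it would otherwise read the stored field.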