It depends on how large and how numerous your documents are.
With Solr/Lucene you have two options. You can store the documents in
the same index as the search index, or you can keep only a "key" in
the search index and use it to look up the document in a second index
that has two fields: the key (indexed) and the document (stored, maybe
compressed too). The choice depends on how much the stored documents
add to the size of the index. It is also very application dependent,
because the ratio of search hits to document accesses matters: if a
document is fetched 1% of the time it appears in a search result, the
impact is very different than if it is fetched 99% of the time.
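
To make the second option concrete, here is a rough sketch in plain Python. It does not use the Solr or Lucene APIs; the class and function names (SearchIndex, DocumentStore, index_document) are made up for illustration, and zlib stands in for whatever compression the real store would use. The point is only that the search side holds keys, and a document body is decompressed only when someone actually asks for it.

```python
import zlib


class SearchIndex:
    """Toy inverted index (hypothetical): term -> set of document keys.
    It deliberately stores no document bodies."""

    def __init__(self):
        self._postings = {}

    def add(self, key, text):
        # Index each distinct lowercased term under the document's key.
        for term in set(text.lower().split()):
            self._postings.setdefault(term, set()).add(key)

    def search(self, term):
        # Return matching keys only; bodies live in the document store.
        return sorted(self._postings.get(term.lower(), set()))


class DocumentStore:
    """Toy 'second index' (hypothetical): key -> compressed document."""

    def __init__(self):
        self._blobs = {}
        self.fetches = 0  # counts how often a body is actually read

    def put(self, key, text):
        self._blobs[key] = zlib.compress(text.encode("utf-8"))

    def get(self, key):
        self.fetches += 1
        return zlib.decompress(self._blobs[key]).decode("utf-8")


def index_document(index, store, key, text):
    """Write the searchable terms to one place and the body to the other."""
    index.add(key, text)
    store.put(key, text)


index = SearchIndex()
store = DocumentStore()
index_document(index, store, "doc1", "Solr is a search server built on Lucene")
index_document(index, store, "doc2", "Berkeley DB is an embedded key value store")

hits = index.search("lucene")  # keys only, no body is touched here
body = store.get(hits[0])      # body is fetched only on demand
```

If most hits are never opened, store.fetches stays small relative to the number of searches, which is the case where keeping bodies out of the search index is most likely to pay off.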

I think you could do this with 2 cores in Solr, if I understand Solr correctly.

I have also had good experience with BDB (Berkeley DB) for (non-networked) document storage.

Glen Newton
http://zzzoot.blogspot.com/

2009/5/26 Peter Keane <pke...@mail.utexas.edu>:
> Hi All-
>
> I've just recently begun playing with Apache Solr, and it seems to be
> a perfect fit for our project (http://code.google.com/p/dase/).  I've
> been quite surprised at both how easy Solr was to get up and running
> and how flexible it seems to be. I've been tempted to use it for not
> just search, but document storage as well.  Seems, though, this is not
> the best road to go down.
>
> I'd like to know if there are recommendations for a document store (or
> distributed hash table) that would work well alongside Solr.
> Basically, I'd like to be able to deploy in Tomcat and interact w/ the
> store over http (like Solr).  Recommendations I've seen include
> Project Voldemort, BDB, BananaDB, CouchDB, etc.  I'd be quite
> interested to hear comments about pluses/minuses of any of those or
> other options OR comments about Solr suitability as a document store.
>
> thanks-
> Peter Keane
> daseproject.org
>


