Would anyone care to comment on the merits of storing indexed full-text documents in Solr versus storing them externally?

It seems there are three options for us:

1) store documents both in Solr and externally. This is what we are doing now; it gives us a lot of flexibility, but it doesn't seem like the most scalable option, at least in terms of storage space and the I/O required when inserting or updating documents.

2) store documents externally only. At the moment, the only thing that requires us to store documents in Solr is highlighting, both in search result snippets and in full document views. We are considering hunting for, or writing, a Highlighter extension that could pull the document text from an external source (e.g. the filesystem).

3) store documents only in Solr. We'd just retrieve document text as a Solr field value rather than reading it from the filesystem. Somehow this strikes me as the wrong thing to do, though I'm not sure why, and it could work. Perhaps it causes a lot of unnecessary I/O during index merges. It also makes it hard to grep the documents or use other filesystem tools, I suppose.
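
To make option 2 a bit more concrete, here is a rough sketch of the idea in Python rather than as a real Solr Highlighter extension. It only illustrates "fetch the text from outside the index, then highlight it". The function names, the <root>/<doc_id>.txt file layout, and the <em> markers are all assumptions made up for this example; none of it is Solr API. It also matches literal query terms only, so it would not agree with whatever analysis (stemming, synonyms) the index applies.

```python
import re
from pathlib import Path


def load_text(doc_id, root):
    """Read a document's text from the filesystem.

    Assumes a hypothetical layout of <root>/<doc_id>.txt.
    """
    return (Path(root) / f"{doc_id}.txt").read_text(encoding="utf-8")


def _term_pattern(terms):
    """Build a case-insensitive, whole-word regex matching any of the terms."""
    alternatives = "|".join(re.escape(t) for t in terms)
    return re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)


def highlight(text, terms, pre="<em>", post="</em>"):
    """Wrap every whole-word match of any term in pre/post markers."""
    if not terms:
        return text
    return _term_pattern(terms).sub(lambda m: pre + m.group(0) + post, text)


def snippet(text, terms, width=40):
    """Return a highlighted window of text around the first match, or None."""
    if not terms:
        return None
    match = _term_pattern(terms).search(text)
    if match is None:
        return None
    start = max(0, match.start() - width)
    end = min(len(text), match.end() + width)
    return highlight(text[start:end], terms)
```

In this sketch the search would return only document ids, and the application would call load_text and then snippet (for result lists) or highlight (for full document views) itself.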

Which one of these sounds best to you? Under which circumstances? Are there other possibilities?

Thanks!

--

Michael Sokolov
Engineering Director
www.ifactory.com
