Would anyone care to comment on the merits of storing indexed full-text
documents in Solr versus storing them externally?
It seems there are three options for us:
1) Store documents both in Solr and externally. This is what we are
doing now. It gives us all sorts of flexibility, but doesn't seem like
the most scalable option, at least in terms of storage space and the
I/O required when updating or inserting documents.
2) Store documents externally only. For the moment, the only thing that
requires us to store documents in Solr is the need to highlight them,
both in search result snippets and in full document views. We are
considering hunting for or writing a Highlighter extension that could
pull in the document text from an external source (e.g. the filesystem).
3) Store documents only in Solr. We'd just retrieve document text as a
Solr field value rather than reading from the filesystem. This could
work, but somehow it strikes me as the wrong thing to do, and I'm not
sure why. Perhaps it causes a lot of unnecessary I/O when segments are
merged. It also makes it hard to grep the documents or use other
filesystem tools, I suppose.
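
To make option 2 a bit more concrete, here is a rough Python sketch of
the idea: Solr returns only document ids, and the text is read from
disk and highlighted outside Solr. Everything in it is an assumption
for illustration: the one-file-per-id layout, the function names, and
above all the naive whole-word matcher, which knows nothing about
Solr's analysis chain (stemming, synonyms and so on), so its highlights
could disagree with what the query actually matched. It is not a Solr
Highlighter extension, just the shape of one.

```python
import re
from pathlib import Path


def load_text(doc_id, root):
    # Hypothetical layout: one UTF-8 file per document, named <id>.txt.
    return (Path(root) / f"{doc_id}.txt").read_text(encoding="utf-8")


def _pattern(terms):
    # Whole-word, case-insensitive match on any of the query terms.
    # Longer terms go first so they win over shorter alternatives.
    ordered = sorted(terms, key=len, reverse=True)
    alternatives = "|".join(re.escape(t) for t in ordered)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def highlight(text, terms, pre="<em>", post="</em>"):
    # Full document view: wrap every match in the pre/post markers.
    if not terms:
        return text
    return _pattern(terms).sub(lambda m: pre + m.group(0) + post, text)


def snippet(text, terms, context=30, pre="<em>", post="</em>"):
    # Search result view: a window of `context` characters on each side
    # of the first match, with every match inside the window marked up.
    matches = list(_pattern(terms).finditer(text)) if terms else []
    if not matches:
        return text[: 2 * context]
    start = max(0, matches[0].start() - context)
    end = min(len(text), matches[0].end() + context)
    out, pos = [], start
    for m in matches:
        if m.end() > end:
            break
        out.append(text[pos:m.start()])
        out.append(pre + m.group(0) + post)
        pos = m.end()
    out.append(text[pos:end])
    return "".join(out)
```

With something like this, the search request would ask Solr for ids
only, and each hit would be rendered with
snippet(load_text(doc_id, root), query_terms). The cost is an extra
file read per displayed hit, which is the trade against keeping a
second copy of the text in the index.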
Which one of these sounds best to you? Under which circumstances? Are
there other possibilities?
Thanks!
--
Michael Sokolov
Engineering Director
www.ifactory.com