Hi all,
We are considering Solr for a large database of XML documents. I have
some newbie questions. If there is a place where I can read about
these, do let me know and I will go read up :)
1. Currently we are able to pull the XML files from a file system
using FileDataSource. The DIH is convenient since I can map my XML
fields using the XPathEntityProcessor. This works for an initial
load. However, after the initial load, we would like to 'post'
changed XML documents to Solr whenever an XML document is updated in
a separate system. I know we can post XML with 'add', but I am not
sure how to do this and still keep the DIH mapping I use in
data-config.xml. I don't want to save the file to disk and then call
the DIH; I would prefer to post it directly. Do I need to use SolrJ
for this?
2. If my Solr schema.xml changes, do I HAVE to reindex all the old
documents? Suppose that in the future we have newer XML documents
that contain an additional XML field. The old documents that are
already indexed don't have this field, so I don't need to search
them by this field. However, the new ones need to be searchable on
this new field. Can I just add the new field to the Solr schema,
restart the servers, and post only the new documents, or do I need to
reindex everything?
3. Can I back up the index directory, so that in case of a disk crash
I can restore this directory and bring Solr up? I realize that any
documents indexed after this backup would be lost. I can, however,
keep track of these outside Solr and simply re-index documents
'newer' than the backup date. This question is really important to
me in the context of using a master server with a replicated index.
I would like to run this backup on the master.
4. In general, what happens when the Solr application is bounced
(restarted)? Is the index affected, for example because part of it
is maintained only in memory?
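To make question 1 more concrete, here is a rough sketch of what I
imagine a direct post would involve if the DIH is bypassed: the field
mapping from data-config.xml would have to be repeated on the client
side, and the source XML converted into Solr's 'add' message. The
field names, the paths in FIELD_MAP, and the update URL are made up
for illustration and are not from our real configuration. The paths
use Python's ElementTree syntax (relative to the root element), which
is not identical to the XPath strings the DIH accepts.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical mapping: Solr field name -> path inside the source XML.
# This duplicates by hand what data-config.xml expresses for the DIH.
FIELD_MAP = {
    "id": "./id",
    "title": "./header/title",
    "author": "./header/author",
}


def to_solr_add(source_xml, field_map=FIELD_MAP):
    """Convert one source XML document into a Solr <add><doc> message."""
    root = ET.fromstring(source_xml)
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for solr_field, path in field_map.items():
        # A path may match several nodes; each becomes its own <field>.
        for node in root.findall(path):
            if node.text and node.text.strip():
                field = ET.SubElement(doc, "field", name=solr_field)
                field.text = node.text.strip()
    return ET.tostring(add, encoding="unicode")


def post_to_solr(add_xml, update_url="http://localhost:8983/solr/update"):
    """POST the message to the update handler (URL is an assumption)."""
    request = urllib.request.Request(
        update_url,
        data=add_xml.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    sample = "<record><id>42</id><header><title>Hello</title></header></record>"
    print(to_solr_add(sample))
```

As far as I understand, a separate <commit/> message would still have
to be posted to the same URL before the document becomes searchable.
What I would like to avoid is maintaining the mapping in two places
like this, hence the question.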
Regards
Guna
- Newbie Design Question... Gunaranjan Chandraraju