: I am very new to Solr. I am looking to index an xml file and search its
: contents. Its structure resembles something like this
...
: Is it essential to use the DIH to import this data into Solr? Isn't there
: any simpler way to accomplish the task? Can it be done through SolrJ as I am
Ignore for a minute that your data happens to be in an XML file....
* you have some structured data
* you want to put it in Solr
* you want to search it
In order to do these things, you have to understand (for yourself, but if
you want help from others you have to be able to explain it to us as well)
what that structure means, and how you want to be able to search it.
If you are familiar with relational databases, ask yourself: if I were
putting my data into a table, what would my rows be? What would my
columns be? What data types would I use for each column? What pieces of
my data would I put into each column/row?
You have to ask yourself the same types of questions when you use Solr to
decide what you want your schema.xml to look like, and what you want to
model as "documents" -- and depending on your answers, you can then decide
how to index the data.
Do you have to use DIH to index an XML file? Not at all.
You do have to use *something* to pull the pieces of data you want out of
your XML file (or out of your CSV file, or out of your relational
database, etc...) and model them as "Documents" containing "Fields" that
can be put into Solr. You might find DIH useful for that, or you might
also find the ExtractingRequestHandler useful for that, or you might just
want to implement your own bit of code that pulls what you want out of
your XML files and sends it to Solr as SolrInputDocuments (using SolrJ),
or you might want to write a bit of python/ruby/perl/lua/haskell code that
does the same thing and sends it to Solr as XML or JSON using the format
Solr expects for indexing commands.
that's entirely up to you.
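
To make the "write a bit of code" option concrete, here is a rough Python
sketch. Everything specific in it is an assumption for illustration only:
your actual XML structure wasn't shown, so the <library>/<book> layout, the
field names (id, title, author), the core name "mycore", and the update URL
are all made up. The URL and the accepted JSON format also depend on your
Solr version, so check them against your own install and schema.xml.

```python
import json
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical input: a stand-in for whatever your real XML looks like.
SAMPLE_XML = """
<library>
  <book id="1"><title>Solr Basics</title><author>A. Writer</author></book>
  <book id="2"><title>Searching Text</title><author>B. Author</author></book>
</library>
"""


def xml_to_docs(xml_text):
    """Model each <book> element as one flat document (a dict of fields)."""
    root = ET.fromstring(xml_text.strip())
    docs = []
    for book in root.iter("book"):
        docs.append({
            "id": book.get("id"),               # taken from the XML attribute
            "title": book.findtext("title"),    # taken from a child element
            "author": book.findtext("author"),
        })
    return docs


def build_update_request(docs, url):
    """Build (but do not send) an HTTP POST carrying the docs as a JSON array."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )


docs = xml_to_docs(SAMPLE_XML)
# Assumed URL: host, port, core name, and handler path all vary by setup.
request = build_update_request(
    docs, "http://localhost:8983/solr/mycore/update?commit=true"
)
# With a running Solr whose schema defines these fields, you would send it:
# urllib.request.urlopen(request)
```

The part that matters is xml_to_docs: that is where you decide what counts
as a "document" and which pieces of the XML become which fields. The
sending step is the same no matter where the data came from.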
-Hoss