: I am very new to Solr. I am looking to index an xml file and search its
: contents. Its structure resembles something like this
        ...
: Is it essential to use the DIH to import this data into Solr? Isn't there
: any simpler way to accomplish the task? Can it be done through SolrJ as I am

Ignore for a minute that your data happens to be in an XML file....

 * you have some structured data
 * you want to put it in solr
 * you want to search it

In order to do these things, you have to understand (for yourself, but if 
you want help from others you have to be able to explain it to us as well) 
what that structure means, and how you want to be able to search it.

If you are familiar with relational databases, ask yourself: if I were 
putting my data into a table, what would my rows be? What would my 
columns be? What data types would I use for each column? What pieces of 
my data would I put into each column/row? 

You have to ask yourself the same types of questions when you use Solr to 
decide what you want your schema.xml to look like, and what you want to 
model as "documents" -- and depending on your answers, then you can decide 
how to index the data.

Do you have to use DIH to index an XML file?  Not at all.  

You do have to use *something* to pull the pieces of data you want out of 
your XML file (or out of your CSV file, or out of your relational 
database, etc...) to model them as "Documents" containing "Fields" that 
can be put into Solr.  You might find DIH useful for that, or you might 
also find the ExtractingRequestHandler useful for that, or you might just 
want to implement your own bit of code that pulls what you want out of 
your XML files and sends them to Solr as SolrInputDocuments (using SolrJ), 
or you might want to write a bit of python/ruby/perl/lua/haskell code that 
does the same thing and sends it to Solr as xml or json using the format 
Solr expects for indexing commands.

That's entirely up to you.
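
As a rough sketch of the "write your own bit of code" option, here is what 
pulling fields out of XML and modeling them as documents might look like 
in Python, using only the standard library.  Everything specific here is 
an assumption: your XML structure wasn't shown, so the <book> layout, the 
field names, and the Solr URL are all made up for illustration, and a JSON 
array of documents is just one of the update formats Solr accepts (check 
what your version expects).

```python
import json
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical input: this <book> layout and these field names are
# invented for illustration, not taken from the original question.
SAMPLE_XML = """
<library>
  <book id="b1">
    <title>Solr Basics</title>
    <author>A. Writer</author>
    <year>2010</year>
  </book>
  <book id="b2">
    <title>Searching XML</title>
    <author>B. Author</author>
    <year>2011</year>
  </book>
</library>
"""

# Hypothetical update URL; the host, port and core name depend on your setup.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycore/update?commit=true"


def xml_to_docs(xml_text):
    """Model each <book> element as one flat document (a dict of fields)."""
    root = ET.fromstring(xml_text)
    docs = []
    for book in root.findall("book"):
        docs.append({
            "id": book.get("id"),
            "title": book.findtext("title"),
            "author": book.findtext("author"),
            "year": int(book.findtext("year")),
        })
    return docs


def to_update_body(docs):
    """Serialize the documents as a JSON array of field/value objects."""
    return json.dumps(docs).encode("utf-8")


def build_request(body, url=SOLR_UPDATE_URL):
    """Build (but do not send) the HTTP POST carrying the documents."""
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )


docs = xml_to_docs(SAMPLE_XML)
body = to_update_body(docs)
request = build_request(body)
# To actually index, you would send it: urllib.request.urlopen(request)
```

The decisions that matter are all in xml_to_docs: what counts as one 
document, which pieces become fields, and what type each field gets.  
Those have to line up with whatever you put in your schema.xml.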


-Hoss
