: I am very new to Solr. I am looking to index an xml file and search its
: contents. Its structure resembles something like this ...
: Is it essential to use the DIH to import this data into Solr? Isn't there
: any simpler way to accomplish the task? Can it be done through SolrJ as I am

Ignore for a minute that your data happens to be in an XML file....

 * you have some structured data
 * you want to put it in Solr
 * you want to search it

In order to do these things, you have to understand (for yourself, but if you want help from others you have to be able to explain it to us as well) what that structure means, and how you want to be able to search it.

If you are familiar with relational databases, ask yourself: if I were putting my data into a table, what would my rows be? What would my columns be? What data types would I use for each column? What pieces of my data would I put into each column/row?

You have to ask yourself the same types of questions when you use Solr to decide what you want your schema.xml to look like, and what you want to model as "documents" -- and depending on your answers, you can then decide how to index the data.

Do you have to use DIH to index an XML file? Not at all.

You do have to use *something* to pull the pieces of data you want out of your XML file (or out of your CSV file, or out of your relational database, etc...) and model them as "Documents" containing "Fields" so that you can put them into Solr.

You might find DIH useful for that, or you might also find the ExtractingRequestHandler useful for that, or you might just want to implement your own bit of code that pulls what you want out of your XML files and sends it to Solr as SolrInputDocuments (using SolrJ), or you might want to write a bit of python/ruby/perl/lua/haskell code that does the same thing and sends it to Solr as XML or JSON using the format Solr expects for indexing commands.

That's entirely up to you.

-Hoss
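
To make the "write your own bit of code" option concrete, here is a rough Python sketch. The XML layout, the field names (id, title, author), the core name "mycore" and the localhost URL are all made up for illustration, since the XML from the original question is not shown; your schema.xml would need matching fields. It pulls one flat "document" out of each element and builds a JSON update request of the kind Solr's /update handler accepts.

```python
import json
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical sample data: the structure from the original question is unknown.
SAMPLE_XML = """
<library>
  <book id="b1"><title>Solr Basics</title><author>A. Writer</author></book>
  <book id="b2"><title>Lucene Notes</title><author>B. Author</author></book>
</library>
"""


def xml_to_docs(xml_text):
    """Model each <book> element as one flat document (field name -> value)."""
    root = ET.fromstring(xml_text)
    docs = []
    for book in root.iter("book"):
        docs.append({
            "id": book.get("id"),                # assumed unique key field
            "title": book.findtext("title"),     # assumed schema field
            "author": book.findtext("author"),   # assumed schema field
        })
    return docs


def build_update_request(docs, update_url):
    """Build (but do not send) a POST carrying the documents as a JSON array."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        update_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


docs = xml_to_docs(SAMPLE_XML)

# Assumed URL: a local Solr with a core named "mycore".
request = build_update_request(
    docs, "http://localhost:8983/solr/mycore/update?commit=true"
)

# Sending is left commented out so the sketch runs without a Solr server:
# urllib.request.urlopen(request)
```

The interesting part is xml_to_docs: that is where you decide what a "row" is and which pieces of the XML become which fields. How the documents then get to Solr (this JSON request, SolrJ, DIH) is a separate and much smaller decision.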