Well, here's a place to start if you want to patch the code: http://wiki.apache.org/solr/HowToContribute
If you do want to take this on, hop on over to the dev list and start a
discussion. I'd start with some posts on that list before entering or
working on a JIRA issue; just ask for some guidance. A good place to start
is pretty much what you've done here: state your problem, and what you
think the correct behavior is. Be prepared for things to be brought up you
never thought of <G>... which is the point of starting the discussion there.

A very good way to start is to get the code, compile it, and then run some
of the test cases in an IDE, stepping through the test case in the debugger.
Sometimes that doesn't work easily, but if it does it gives you an idea of
how the code works. There are instructions at the above link for setting
things up in an IDE (Eclipse and IntelliJ are popular).

Just loading the project and looking for files that begin with CSV might
be a place to start. Then look for files that begin with TestCSV. Both of
these "look promising".

Anyway, if you get that far, then go over to the dev list and say "I'm
thinking of XXX, this code appears to be handled in YYY and I'm thinking
of changing it like ZZZ" and it will be well received.

Of course if you want to go ahead and make your changes and submit a patch,
that's even better, but it's often best to get a bit of guidance first.

Best
Erick

On Thu, Jun 9, 2011 at 5:17 PM, Helmut Hoffer von Ankershoffen
<helmut...@googlemail.com> wrote:
> On Thu, Jun 9, 2011 at 11:05 PM, Ken Krugler
> <kkrugler_li...@transpac.com> wrote:
>
>>
>> On Jun 9, 2011, at 1:27pm, Helmut Hoffer von Ankershoffen wrote:
>>
>> > Hi,
>> >
>> > ... that would be an option if there is a defined set of field names and a
>> > single column/CSV layout. The scenario however is different csv files (from
>> > different shops) with individual column layouts (separators, encodings
>> > etc.). The idea is to map known field names to defined field names in the
>> > solr schema. If I understand the capabilities of the CSVLoader correctly
>> > (sorry, I am completely new to Solr, started work on it today) this is not
>> > possible - is it?
>>
>> As per the documentation on
>> http://wiki.apache.org/solr/UpdateCSV#fieldnames, you can specify the
>> names/positions of fields in the CSV file, and ignore fieldnames.
>>
>> So this seems like it would solve your requirement, as each different
>> layout could specify its own such mapping during import.
>>
> Sure, but the requirement (to keep the process of integrating new shops
> efficient) is not to have one mapping per import (cp. the Email regarding
> "more or less schema free") but to enhance one mapping that maps common
> field names to defined fields disregarding order of known fields/columns. As
> far as I understand that is not a problem at all with DIH, however DIH and
> CSV are not a perfect match ,-)
>
>> It could be handy to provide a fieldname map (versus the value map that
>> UpdateCSV supports).
>
> Definitely. Either a fieldname map in CSVLoader or a robust CSVLoader in DIH
> ...
>
>> Then you could use the header, and just provide a mapping from header
>> fieldnames to schema fieldnames.
>>
> That's the idea -)
>
> => what's the best way to progress. Either someone enhances the CSVLoader by
> a field mapper (with multiple input field names mapping to one field name in
> the Solr schema) or someone enhances the DIH with a robust CSV loader ,-).
> As I am completely new to this Community, please give me the direction to go
> (or wait :-).
>
> best regards
>
>> -- Ken
>>
>> > On Thu, Jun 9, 2011 at 10:12 PM, Yonik Seeley
>> > <yo...@lucidimagination.com> wrote:
>> >
>> >> On Thu, Jun 9, 2011 at 4:07 PM, Helmut Hoffer von Ankershoffen
>> >> <helmut...@googlemail.com> wrote:
>> >>> Hi,
>> >>> yes, it's about CSV files loaded via HTTP from shops to be fed into a
>> >>> shopping search engine.
>> >>> The CSV Loader cannot map fields (only field values) etc.
>> >>
>> >> You can provide your own list of fieldnames and optionally ignore the
>> >> first line of the CSV file (assuming it contains the field names).
>> >> http://wiki.apache.org/solr/UpdateCSV#fieldnames
>> >>
>> >> -Yonik
>> >> http://www.lucidimagination.com
>>
>> --------------------------
>> Ken Krugler
>> +1 530-210-6378
>> http://bixolabs.com
>> custom data mining solutions
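
Until something like that exists in CSVLoader, the fieldname map discussed in the quoted thread can be approximated on the client side: read each shop's header line, translate it through one shared alias table, and pass the result as the fieldnames parameter. The following is only a rough sketch under assumptions. The alias table, the helper names and the _ignored_N placeholders are made up for illustration, and the fieldnames, header, skip and separator parameters are used as I understand them from the UpdateCSV wiki page, so check them against your Solr version.

```python
import csv
import io
from urllib.parse import urlencode

# Hypothetical alias table: several shop-specific header names map to one
# field in the Solr schema. Extend this single table as new shops are added.
FIELD_ALIASES = {
    "id": ["id", "sku", "article_no"],
    "name": ["name", "title", "product_name"],
    "price": ["price", "preis", "cost"],
}


def build_lookup(aliases):
    """Invert the alias table into {lowercased header name: schema field}."""
    return {
        alias.strip().lower(): schema_field
        for schema_field, names in aliases.items()
        for alias in names
    }


def update_csv_params(csv_text, aliases=FIELD_ALIASES, separator=","):
    """Build UpdateCSV request parameters from the header line of a CSV file.

    Known columns are renamed to their schema field regardless of column
    order. Unknown columns get a placeholder name and are listed in `skip`.
    """
    header = next(csv.reader(io.StringIO(csv_text), delimiter=separator))
    lookup = build_lookup(aliases)
    fieldnames, skipped = [], []
    for position, column in enumerate(header):
        schema_field = lookup.get(column.strip().lower())
        if schema_field is None:
            # Placeholder name (an assumption of this sketch), skipped below.
            schema_field = f"_ignored_{position}"
            skipped.append(schema_field)
        fieldnames.append(schema_field)
    params = {
        "fieldnames": ",".join(fieldnames),
        # Assumption: with fieldnames given, header=true tells Solr that the
        # first line holds column names and should not be indexed as data.
        "header": "true",
        "separator": separator,
    }
    if skipped:
        params["skip"] = ",".join(skipped)
    return params


if __name__ == "__main__":
    sample = "Title;Preis;Colour\nShoe;9.99;red\n"
    print(urlencode(update_csv_params(sample, separator=";")))
```

The resulting query string would be appended to the CSV update URL when posting the file. This keeps one mapping for all shops rather than one per import, but it does not handle encodings or two columns that map to the same schema field; those are the kind of details worth raising on the dev list.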