On Jun 9, 2011, at 2:21pm, Helmut Hoffer von Ankershoffen wrote:

> Hi,
>
> btw: there seems to be somewhat of a non-match between the efforts to enhance DIH
> regarding the CSV format (James Dyer) and the effort to maintain the
> CSVLoader (Ken Krugler). How about merging your efforts and migrating the
> CSVLoader to a CSVEntityProcessor (cp. my initial email)? :-)

While I'm a CSVLoader user (and I've found/fixed one bug in it), I'm not involved in any active development/maintenance of that piece of code. If James or you can make progress on merging support for CSV into DIH, that's great.

-- Ken

> On Thu, Jun 9, 2011 at 11:17 PM, Helmut Hoffer von Ankershoffen <helmut...@googlemail.com> wrote:
>
>> On Thu, Jun 9, 2011 at 11:05 PM, Ken Krugler <kkrugler_li...@transpac.com> wrote:
>>
>>> On Jun 9, 2011, at 1:27pm, Helmut Hoffer von Ankershoffen wrote:
>>>
>>>> Hi,
>>>>
>>>> ... that would be an option if there is a defined set of field names and a
>>>> single column/CSV layout. The scenario however is different csv files (from
>>>> different shops) with individual column layouts (separators, encodings
>>>> etc.). The idea is to map known field names to defined field names in the
>>>> solr schema. If I understand the capabilities of the CSVLoader correctly
>>>> (sorry, I am completely new to Solr, started work on it today) this is not
>>>> possible - is it?
>>>
>>> As per the documentation on
>>> http://wiki.apache.org/solr/UpdateCSV#fieldnames, you can specify the
>>> names/positions of fields in the CSV file, and ignore fieldnames.
>>>
>>> So this seems like it would solve your requirement, as each different
>>> layout could specify its own such mapping during import.
>>
>> Sure, but the requirement (to keep the process of integrating new shops
>> efficient) is not to have one mapping per import (cp. the Email regarding
>> "more or less schema free") but to enhance one mapping that maps common
>> field names to defined fields disregarding order of known fields/columns. As
>> far as I understand that is not a problem at all with DIH, however DIH and
>> CSV are not a perfect match ,-)
>>
>>> It could be handy to provide a fieldname map (versus the value map that
>>> UpdateCSV supports).
>>
>> Definitely. Either a fieldname map in CSVLoader or a robust CSVLoader in
>> DIH ...
>>> Then you could use the header, and just provide a mapping from header
>>> fieldnames to schema fieldnames.
>>
>> That's the idea -)
>>
>> => what's the best way to progress? Either someone enhances the CSVLoader
>> by a field mapper (with multiple input field names mapping to one field name
>> in the Solr schema) or someone enhances the DIH with a robust CSV loader
>> ,-). As I am completely new to this Community, please give me the direction
>> to go (or wait :-).
>>
>> best regards
>>
>>> -- Ken
>>>
>>>> On Thu, Jun 9, 2011 at 10:12 PM, Yonik Seeley <yo...@lucidimagination.com> wrote:
>>>>
>>>>> On Thu, Jun 9, 2011 at 4:07 PM, Helmut Hoffer von Ankershoffen
>>>>> <helmut...@googlemail.com> wrote:
>>>>>> Hi,
>>>>>> yes, it's about CSV files loaded via HTTP from shops to be fed into a
>>>>>> shopping search engine.
>>>>>> The CSV Loader cannot map fields (only field values) etc.
>>>>>
>>>>> You can provide your own list of fieldnames and optionally ignore the
>>>>> first line of the CSV file (assuming it contains the field names).
>>>>> http://wiki.apache.org/solr/UpdateCSV#fieldnames
>>>>>
>>>>> -Yonik
>>>>> http://www.lucidimagination.com
>>>
>>> --------------------------
>>> Ken Krugler
>>> +1 530-210-6378
>>> http://bixolabs.com
>>> custom data mining solutions

--------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
custom data mining solutions
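
For illustration only, the fieldname map discussed in this thread (several shop-specific header names mapping to one schema field, regardless of column order or separator) might look roughly like the Python sketch below. This is not CSVLoader or DIH code. The alias table, the schema field names, and the function names are all hypothetical, and the mapping is done client-side before anything is sent to Solr.

```python
import csv
import io

# Hypothetical alias table: each schema field lists the header names
# that different shops might use for it. All names here are made up.
FIELD_ALIASES = {
    "id": ["id", "sku", "artikelnummer"],
    "name": ["name", "title", "produktname"],
    "price": ["price", "preis", "cost"],
}


def build_lookup(aliases):
    """Invert the alias table into {normalized header name: schema field}."""
    lookup = {}
    for schema_field, names in aliases.items():
        for name in names:
            lookup[name.strip().lower()] = schema_field
    return lookup


def map_header(header, lookup):
    """Map each CSV column to a schema field, or None if the column is unknown."""
    return [lookup.get(column.strip().lower()) for column in header]


def read_documents(text, delimiter, lookup):
    """Parse CSV text and return one dict per row, keyed by schema field.

    Columns whose header is not in the lookup are dropped.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    fields = map_header(next(reader), lookup)
    return [
        {field: value for field, value in zip(fields, row) if field is not None}
        for row in reader
    ]


LOOKUP = build_lookup(FIELD_ALIASES)

# Two shops with different separators, column orders, and header names.
shop_a = "SKU;Title;Preis\n1;Shoe;9.99\n"
shop_b = "price,id,colour,name\n5.00,2,red,Hat\n"

docs_a = read_documents(shop_a, ";", LOOKUP)
docs_b = read_documents(shop_b, ",", LOOKUP)
```

With one shared alias table, adding a new shop would only mean adding any new header spellings to the table rather than writing a separate mapping per import. The list returned by `map_header` could presumably also be joined into a value for the `fieldnames` parameter described on the UpdateCSV wiki page, but how unknown columns should be represented there is left open in this sketch.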