Hi, yes, it's about CSV files loaded via HTTP from shops to be fed into a shopping search engine.
The CSV Loader cannot map fields (only field values) etc. DIH is flexible
enough for building the importing part of such a thing but misses elegant
handling of CSV data ...

Regards

On Thu, Jun 9, 2011 at 9:50 PM, Yonik Seeley <yo...@lucidimagination.com> wrote:

> On Thu, Jun 9, 2011 at 3:31 PM, Helmut Hoffer von Ankershoffen
> <helmut...@googlemail.com> wrote:
> > Hi,
> >
> > there seems to be no way to index CSV using the DataImportHandler.
>
> Looking over the features you want, it looks like you're starting from
> a CSV file (as opposed to CSV stored in a database).
> Is there a reason that you need to use DIH and can't directly use the
> CSV loader?
> http://wiki.apache.org/solr/UpdateCSV
>
> -Yonik
> http://www.lucidimagination.com
>
> > Using a combination of
> > LineEntityProcessor <http://wiki.apache.org/solr/DataImportHandler#LineEntityProcessor>
> > and RegexTransformer <http://wiki.apache.org/solr/DataImportHandler#RegexTransformer>
> > as proposed in
> > http://robotlibrarian.billdueber.com/an-exercise-in-solr-and-dataimporthandler-hathitrust-data/
> > is not working for real world CSV files.
> >
> > E.g. many CSV files have double-quotes enclosing some but not all columns -
> > there is no elegant way to segment this using a simple regular expression.
> >
> > As CSV is still very common esp. in E-Commerce scenarios, I propose that
> > Solr provides a CSVEntityProcessor that:
> > 1) Handles the case of CSV files with/without and with some double-quote
> > enclosed columns
> > 2) Allows for a configurable column separator (';',',','\t' etc.)
> > 3) Allows for a leading row containing column headings
> > 4) If there is a leading row with column headings provides a possibility to
> > address columns by their column names and map them to Solr fields (similar
> > to the XPathEntityProcessor)
> > 5) Auto-detects encoding of the file (UTF-8 etc.)
> >
> > This would make it A LOT easier to use Solr for E-Commerce scenarios.
> >
> > If there is no such entity processor in the works i will develop one ... So
> > please let me know.
> >
> > Regards
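
To make the five requirements in the quoted proposal concrete, here is a rough sketch in Python using only the standard csv module. It is not a DIH entity processor and not Solr code, just an illustration of the parsing behaviour asked for. The names csv_to_docs, decode_bytes and field_map, and the Solr field names title and price_s, are made up for this example. The "encoding detection" is only a crude try-in-order fallback, not real auto-detection.

```python
import csv
import io


def decode_bytes(raw, encodings=("utf-8-sig", "cp1252")):
    """Crude stand-in for encoding detection: try each candidate in order."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Last resort: latin-1 maps every byte value, so this cannot fail.
    return raw.decode("latin-1")


def csv_to_docs(raw, field_map, delimiter=";"):
    """Turn raw CSV bytes into a list of dicts keyed by target field names.

    field_map maps CSV column headings to (hypothetical) Solr field names.
    The first row is treated as the heading row. Columns may be quoted or
    unquoted, mixed freely within the same file.
    """
    text = decode_bytes(raw)
    reader = csv.DictReader(
        io.StringIO(text, newline=""),
        delimiter=delimiter,
        quotechar='"',
    )
    docs = []
    for row in reader:
        # Address columns by heading name and rename them to the target fields.
        docs.append({target: row[column] for column, target in field_map.items()})
    return docs


if __name__ == "__main__":
    sample = (
        "id;name;price\n"
        '1;"Espresso machine; steel";199.00\n'
        '2;Grinder;"49,90"\n'
    ).encode("utf-8")
    for doc in csv_to_docs(sample, {"id": "id", "name": "title", "price": "price_s"}):
        print(doc)
```

The point of the sample data is that only some columns are quoted and one quoted value contains the separator itself, which is the case a simple regular expression does not segment cleanly.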