Hi,

yes, it's about CSV files loaded via HTTP from shops to be fed into a
shopping search engine.

The CSV Loader cannot map fields (it can only map field values), among
other limitations. DIH is flexible enough to build the import side of such
a system, but it lacks elegant handling of CSV data ...
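
To illustrate the kind of handling I mean, here is a rough sketch in Python (not a DIH entity processor, and not an existing Solr API). The function name rows_to_docs, the field_map argument and the sample column names are all made up for this example. It only shows a configurable separator, a header row, partially quoted columns, and mapping column names to Solr field names:

```python
import csv
import io


def rows_to_docs(text, field_map, delimiter=";"):
    """Parse CSV text that has a header row and rename columns to Solr fields.

    field_map is a hypothetical mapping {csv_column_name: solr_field_name};
    columns that are not listed in it are dropped.
    """
    # csv.DictReader uses the first row as column names and copes with
    # rows where only some columns are enclosed in double quotes.
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    docs = []
    for row in reader:
        docs.append(
            {solr: row[col] for col, solr in field_map.items() if col in row}
        )
    return docs


# Sample feed: the title is quoted in the first row only, and the quoted
# value contains the separator character.
sample = 'id;title;price\n1;"Shoe; red";19.90\n2;Plain hat;5.00\n'
docs = rows_to_docs(sample, {"id": "id", "title": "name", "price": "price_f"})
```

A regular expression that simply splits on the separator would break the first row into four columns, which is the problem with the LineEntityProcessor plus RegexTransformer approach.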

Regards

On Thu, Jun 9, 2011 at 9:50 PM, Yonik Seeley <yo...@lucidimagination.com> wrote:

> On Thu, Jun 9, 2011 at 3:31 PM, Helmut Hoffer von Ankershoffen
> <helmut...@googlemail.com> wrote:
> > Hi,
> >
> > there seems to be no way to index CSV using the DataImportHandler.
>
> Looking over the features you want, it looks like you're starting from
> a CSV file (as opposed to CSV stored in a database).
> Is there a reason that you need to use DIH and can't directly use the
> CSV loader?
> http://wiki.apache.org/solr/UpdateCSV
>
>
> -Yonik
> http://www.lucidimagination.com
>
>
>
> > Using a combination of LineEntityProcessor
> > <http://wiki.apache.org/solr/DataImportHandler#LineEntityProcessor>
> > and RegexTransformer
> > <http://wiki.apache.org/solr/DataImportHandler#RegexTransformer>
> > as proposed in
> > http://robotlibrarian.billdueber.com/an-exercise-in-solr-and-dataimporthandler-hathitrust-data/
> > is not working for real world CSV files.
> >
> > E.g. many CSV files have double-quotes enclosing some but not all columns -
> > there is no elegant way to segment this using a simple regular expression.
> >
> > As CSV is still very common esp. in E-Commerce scenarios, I propose that
> > Solr provides a CSVEntityProcessor that:
> > 1) Handles the case of CSV files with/without and with some double-quote
> > enclosed columns
> > 2) Allows for a configurable column separator (';',',','\t' etc.)
> > 3) Allows for a leading row containing column headings
> > 4) If there is a leading row with column headings provides a possibility to
> > address columns by their column names and map them to Solr fields (similar
> > to the XPathEntityProcessor)
> > 5) Auto-detects encoding of the file (UTF-8 etc.)
> >
> > This would make it A LOT easier to use Solr for E-Commerce scenarios.
> >
> > If there is no such entity processor in the works I will develop one ... So
> > please let me know.
> >
> > Regards
> >
>
