Re: Processing/Indexing CSV

Helmut Hoffer von Ankershoffen Fri, 10 Jun 2011 06:23:15 -0700

Hi,

Thanks for the intro, will do next week :-)


Greetings from Berlin

On Fri, Jun 10, 2011 at 2:49 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> Well, here's a place to start if you want to patch the code:
>
> http://wiki.apache.org/solr/HowToContribute
>
> If you do want to take this on, hop on over to the dev list
> and start a discussion. I'd start with some posts on that list
> before entering or working on a JIRA issue, just ask for
> some guidance. A good place to start is pretty much what
> you've done here, state your problem, and what you think
> the correct behavior is.
>
> Be prepared for things to be brought up you never thought
> of <G>... which is the point of starting the discussion there.
>
> A very good way to start is to get the code, compile it, and then
> run some of the test cases in an IDE, stepping through the test
> case in the debugger. Sometimes that doesn't work easily, but
> if it does it gives you an idea of how the code works. There are
> instructions at the above link for setting things up in an IDE
> (Eclipse and IntelliJ are popular).
>
> Just loading the project and looking for files that begin with
> CSV might be a place to start. Then look for files that begin
> with TestCSV. Both of these "look promising".
>
> Anyway, if you get that far, then go over to the dev list and say
> "I'm thinking of XXX, this code appears to be handled in YYY and
> I'm thinking of changing it like ZZZ" and it will be well received.
>
> Of course if you want to go ahead and make your changes and submit
> a patch, that's even better, but it's often best to get a bit of guidance
> first.
>
> Best
> Erick
>
> On Thu, Jun 9, 2011 at 5:17 PM, Helmut Hoffer von Ankershoffen
> <helmut...@googlemail.com> wrote:
> > On Thu, Jun 9, 2011 at 11:05 PM, Ken Krugler <kkrugler_li...@transpac.com> wrote:
> >
> >>
> >> On Jun 9, 2011, at 1:27pm, Helmut Hoffer von Ankershoffen wrote:
> >>
> >> > Hi,
> >> >
> >> > ... that would be an option if there is a defined set of field names and a
> >> > single column/CSV layout. The scenario, however, is different CSV files (from
> >> > different shops) with individual column layouts (separators, encodings,
> >> > etc.). The idea is to map known field names to defined field names in the
> >> > Solr schema. If I understand the capabilities of the CSVLoader correctly
> >> > (sorry, I am completely new to Solr, started work on it today), this is not
> >> > possible - is it?
> >>
> >> As per the documentation on
> >> http://wiki.apache.org/solr/UpdateCSV#fieldnames, you can specify the
> >> names/positions of fields in the CSV file, and ignore fieldnames.
> >>
> >> So this seems like it would solve your requirement, as each different
> >> layout could specify its own such mapping during import.
> >>
> > Sure, but the requirement (to keep the process of integrating new shops
> > efficient) is not to have one mapping per import (cf. the email regarding
> > "more or less schema free") but to enhance one mapping that maps common
> > field names to defined fields, disregarding the order of known fields/columns.
> > As far as I understand, that is not a problem at all with DIH; however, DIH
> > and CSV are not a perfect match ,-)
> >
> >
> >> It could be handy to provide a fieldname map (versus the value map that
> >> UpdateCSV supports).
> >
> > Definitely. Either a fieldname map in CSVLoader or a robust CSVLoader in DIH
> > ...
> >
> >
> >> Then you could use the header, and just provide a mapping from header
> >> fieldnames to schema fieldnames.
> >>
> > That's the idea -)
> >
> > => What's the best way to progress? Either someone enhances the CSVLoader
> > with a field mapper (with multiple input field names mapping to one field
> > name in the Solr schema) or someone enhances the DIH with a robust CSV
> > loader ,-). As I am completely new to this community, please give me the
> > direction to go (or wait :-).
> >
> > best regards
> >
> >
> >> -- Ken
> >>
> >> > On Thu, Jun 9, 2011 at 10:12 PM, Yonik Seeley <yo...@lucidimagination.com> wrote:
> >> >
> >> >> On Thu, Jun 9, 2011 at 4:07 PM, Helmut Hoffer von Ankershoffen
> >> >> <helmut...@googlemail.com> wrote:
> >> >>> Hi,
> >> >>> yes, it's about CSV files loaded via HTTP from shops to be fed into a
> >> >>> shopping search engine.
> >> >>> The CSV Loader cannot map fields (only field values) etc.
> >> >>
> >> >> You can provide your own list of fieldnames and optionally ignore the
> >> >> first line of the CSV file (assuming it contains the field names).
> >> >> http://wiki.apache.org/solr/UpdateCSV#fieldnames
> >> >>
> >> >> -Yonik
> >> >> http://www.lucidimagination.com
> >> >>
> >>
> >> --------------------------
> >> Ken Krugler
> >> +1 530-210-6378
> >> http://bixolabs.com
> >> custom data mining solutions
> >>
> >
>
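
The header-to-schema fieldname mapping discussed in this thread could be sketched outside Solr roughly as follows. This is a minimal illustration, not part of CSVLoader or DIH: the alias table and the functions `map_header` and `fieldnames_query` are hypothetical names. It also assumes, to be verified against the UpdateCSV wiki page, that `header=true` combined with `fieldnames` makes the loader skip the header line, and that an empty name in `fieldnames` skips that column.

```python
import csv
import io
from urllib.parse import urlencode

# Hypothetical alias table: schema field name -> header names seen in shop feeds.
ALIASES = {
    "id": ["id", "sku", "artikelnummer"],
    "name": ["name", "title", "produktname"],
    "price": ["price", "preis"],
}

# Invert the table into a lookup keyed by normalized (trimmed, lowercase) header name.
HEADER_TO_FIELD = {
    alias.strip().lower(): field
    for field, aliases in ALIASES.items()
    for alias in aliases
}


def map_header(header):
    """Map a CSV header row to schema field names, independent of column order.

    Unknown columns become '' (assumed here to mean "skip this column").
    """
    return [HEADER_TO_FIELD.get(col.strip().lower(), "") for col in header]


def fieldnames_query(csv_text, separator=","):
    """Read the first line of one feed and build a query string for its import."""
    header = next(csv.reader(io.StringIO(csv_text), delimiter=separator))
    params = {
        "separator": separator,
        "header": "true",  # assumption: with fieldnames given, the header line is skipped
        "fieldnames": ",".join(map_header(header)),
    }
    return urlencode(params)
```

With this approach, integrating a new shop would only mean adding its header names to the alias table; the per-feed `fieldnames` value is derived from the feed's own header line rather than maintained by hand.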
