Hi, thanks for the intro, will do next week :-)
Greetings from Berlin

On Fri, Jun 10, 2011 at 2:49 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> Well, here's a place to start if you want to patch the code:
>
> http://wiki.apache.org/solr/HowToContribute
>
> If you do want to take this on, hop on over to the dev list
> and start a discussion. I'd start with some posts on that list
> before entering or working on a JIRA issue, just ask for
> some guidance. A good place to start is pretty much what
> you've done here, state your problem, and what you think
> the correct behavior is.
>
> Be prepared for things to be brought up you never thought
> of <G>... which is the point of starting the discussion there.
>
> A very good way to start is to get the code, compile it, and then
> run some of the test cases in an IDE, stepping through the test
> case in the debugger. Sometimes that doesn't work easily, but
> if it does it gives you an idea of how the code works. There are
> instructions at the above link for setting things up in an IDE
> (Eclipse and IntelliJ are popular).
>
> Just loading the project and looking for files that begin with
> CSV might be a place to start. Then look for files that begin
> with TestCSV. Both of these "look promising".
>
> Anyway, if you get that far, then go over to the dev list and say
> "I'm thinking of XXX, this code appears to be handled in YYY and
> I'm thinking of changing it like ZZZ" and it will be well received.
>
> Of course if you want to go ahead and make your changes and submit
> a patch, that's even better, but it's often best to get a bit of guidance
> first.
>
> Best
> Erick
>
> On Thu, Jun 9, 2011 at 5:17 PM, Helmut Hoffer von Ankershoffen
> <helmut...@googlemail.com> wrote:
> > On Thu, Jun 9, 2011 at 11:05 PM, Ken Krugler <kkrugler_li...@transpac.com> wrote:
> >
> >>
> >> On Jun 9, 2011, at 1:27pm, Helmut Hoffer von Ankershoffen wrote:
> >>
> >> > Hi,
> >> >
> >> > ... that would be an option if there is a defined set of field names and a
> >> > single column/CSV layout. The scenario however is different CSV files (from
> >> > different shops) with individual column layouts (separators, encodings
> >> > etc.). The idea is to map known field names to defined field names in the
> >> > Solr schema. If I understand the capabilities of the CSVLoader correctly
> >> > (sorry, I am completely new to Solr, started work on it today) this is not
> >> > possible - is it?
> >>
> >> As per the documentation on
> >> http://wiki.apache.org/solr/UpdateCSV#fieldnames, you can specify the
> >> names/positions of fields in the CSV file, and ignore fieldnames.
> >>
> >> So this seems like it would solve your requirement, as each different
> >> layout could specify its own such mapping during import.
> >>
> > Sure, but the requirement (to keep the process of integrating new shops
> > efficient) is not to have one mapping per import (cf. the email regarding
> > "more or less schema free") but to enhance one mapping that maps common
> > field names to defined fields disregarding order of known fields/columns. As
> > far as I understand that is not a problem at all with DIH, however DIH and
> > CSV are not a perfect match ,-)
> >
> >
> >> It could be handy to provide a fieldname map (versus the value map that
> >> UpdateCSV supports).
> >
> > Definitely. Either a fieldname map in CSVLoader or a robust CSVLoader in DIH
> > ...
> >
> >
> >> Then you could use the header, and just provide a mapping from header
> >> fieldnames to schema fieldnames.
> >>
> > That's the idea :-)
> >
> > => What's the best way to proceed? Either someone enhances the CSVLoader with
> > a field mapper (with multiple input field names mapping to one field name in
> > the Solr schema) or someone enhances the DIH with a robust CSV loader ,-).
> > As I am completely new to this community, please give me the direction to go
> > (or wait :-).
> >
> > best regards
> >
> >
> >> -- Ken
> >>
> >> > On Thu, Jun 9, 2011 at 10:12 PM, Yonik Seeley <yo...@lucidimagination.com> wrote:
> >> >
> >> >> On Thu, Jun 9, 2011 at 4:07 PM, Helmut Hoffer von Ankershoffen
> >> >> <helmut...@googlemail.com> wrote:
> >> >>> Hi,
> >> >>> yes, it's about CSV files loaded via HTTP from shops to be fed into a
> >> >>> shopping search engine.
> >> >>> The CSV Loader cannot map fields (only field values) etc.
> >> >>
> >> >> You can provide your own list of fieldnames and optionally ignore the
> >> >> first line of the CSV file (assuming it contains the field names).
> >> >> http://wiki.apache.org/solr/UpdateCSV#fieldnames
> >> >>
> >> >> -Yonik
> >> >> http://www.lucidimagination.com
> >> >>
> >>
> >> --------------------------
> >> Ken Krugler
> >> +1 530-210-6378
> >> http://bixolabs.com
> >> custom data mining solutions
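The fieldname map proposed in this thread maps several shop-specific header names to one schema field, regardless of column order. The thread describes the CSV loader as lacking this feature, but it can be approximated outside Solr as a pre-processing step. What follows is a minimal Python sketch under stated assumptions. The alias table `FIELD_ALIASES` and the functions `build_lookup` and `remap_csv` are hypothetical and are not part of any Solr API. The shop file is assumed to be already decoded with the shop's own encoding, and its separator is assumed to be known. The sketch renames known columns to schema field names and drops unknown ones, so a new shop only needs its header names added to one shared table.

```python
import csv
import io

# Hypothetical alias table: each Solr schema field lists the header names
# that different shops might use for it. The names are examples only.
FIELD_ALIASES = {
    "name": ["name", "title", "product_name", "artikelname"],
    "price": ["price", "preis", "cost"],
    "url": ["url", "link", "deeplink"],
}


def build_lookup(aliases):
    """Invert the alias table into a case-insensitive header -> field map."""
    lookup = {}
    for field, names in aliases.items():
        for name in names:
            lookup[name.strip().lower()] = field
    return lookup


def remap_csv(text, aliases, delimiter=","):
    """Rename known columns to schema fields and drop everything else.

    Column order in the input does not matter. If several columns map to
    the same field, only the first one is kept. Assumes `text` is already
    decoded and has a header line.
    """
    lookup = build_lookup(aliases)
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)

    keep = []  # (column index, schema field) pairs, in input order
    seen = set()
    for index, column in enumerate(header):
        field = lookup.get(column.strip().lower())
        if field is not None and field not in seen:
            keep.append((index, field))
            seen.add(field)

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([field for _, field in keep])
    for row in reader:
        # Short rows are padded with empty values instead of raising.
        writer.writerow([row[i] if i < len(row) else "" for i, _ in keep])
    return out.getvalue()


if __name__ == "__main__":
    shop_csv = "Preis;Artikelname;Farbe\n9.99;Mug;red\n"
    # prints "price,name" then "9.99,Mug"; the unknown "Farbe" column is dropped
    print(remap_csv(shop_csv, FIELD_ALIASES, delimiter=";"), end="")
```

The rewritten text has schema field names in its header line, so it could then be sent to Solr's CSV update handler. The exact request parameters are documented on the UpdateCSV wiki page linked in the thread. When two headers map to the same field, the sketch keeps the first matching column; this is an arbitrary choice, not behavior taken from Solr.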