Re: Processing/Indexing CSV

2011-06-10 Thread Helmut Hoffer von Ankershoffen
Hi, thanks for the intro, will do next week :-) Greetings from Berlin On Fri, Jun 10, 2011 at 2:49 PM, Erick Erickson wrote: > Well, here's a place to start if you want to patch the code: > > http://wiki.apache.org/solr/HowToContribute > > If you do want to take this on, hop on over to the dev

Re: Processing/Indexing CSV

2011-06-10 Thread Erick Erickson
Well, here's a place to start if you want to patch the code: http://wiki.apache.org/solr/HowToContribute If you do want to take this on, hop on over to the dev list and start a discussion. I'd start with some posts on that list before entering or working on a JIRA issue, just ask for some guidanc

Re: Processing/Indexing CSV

2011-06-09 Thread Ken Krugler
On Jun 9, 2011, at 2:21pm, Helmut Hoffer von Ankershoffen wrote: > Hi, > > btw: there seems to be somewhat of a mismatch between the efforts to enhance DIH > regarding the CSV format (James Dyer) and the effort to maintain the > CSVLoader (Ken Krugler). How about merging your efforts and migrating t

Re: Processing/Indexing CSV

2011-06-09 Thread Helmut Hoffer von Ankershoffen
Hi, btw: there seems to be somewhat of a mismatch between the efforts to enhance DIH regarding the CSV format (James Dyer) and the effort to maintain the CSVLoader (Ken Krugler). How about merging your efforts and migrating the CSVLoader to a CSVEntityProcessor (cf. my initial email)? :-) Best Regard

Re: Processing/Indexing CSV

2011-06-09 Thread Helmut Hoffer von Ankershoffen
On Thu, Jun 9, 2011 at 11:05 PM, Ken Krugler wrote: > > On Jun 9, 2011, at 1:27pm, Helmut Hoffer von Ankershoffen wrote: > > > Hi, > > > > ... that would be an option if there is a defined set of field names and > a > > single column/CSV layout. The scenario however is different csv files > (from

Re: Processing/Indexing CSV

2011-06-09 Thread Ken Krugler
On Jun 9, 2011, at 1:27pm, Helmut Hoffer von Ankershoffen wrote: > Hi, > > ... that would be an option if there is a defined set of field names and a > single column/CSV layout. The scenario however is different csv files (from > different shops) with individual column layouts (separators, encod

Re: Processing/Indexing CSV

2011-06-09 Thread Helmut Hoffer von Ankershoffen
Hi, ... that would be an option if there is a defined set of field names and a single column/CSV layout. The scenario however is different csv files (from different shops) with individual column layouts (separators, encodings etc.). The idea is to map known field names to defined field names in th
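
The per-shop mapping idea described in this message can be sketched roughly as follows. This is only an illustration in Python, not DIH or CSVLoader code: the shop identifiers, column names, and the `SHOP_LAYOUTS` table are all hypothetical, and the input is assumed to be text that has already been decoded from the shop's encoding.

```python
import csv
import io

# Hypothetical per-shop layouts: the separator plus a map from the shop's
# column names to the field names used in the index. All names are made up.
SHOP_LAYOUTS = {
    "shop_a": {"separator": ";", "columns": {"Artikel": "name", "Preis": "price"}},
    "shop_b": {"separator": "\t", "columns": {"title": "name", "cost": "price"}},
}


def normalize_rows(shop_id, text):
    """Return one dict per CSV row, keyed by the index's field names."""
    layout = SHOP_LAYOUTS[shop_id]
    reader = csv.DictReader(io.StringIO(text), delimiter=layout["separator"])
    docs = []
    for row in reader:
        # Keep only the columns that have a known mapping; drop the rest.
        docs.append({
            target: row[source]
            for source, target in layout["columns"].items()
            if source in row
        })
    return docs
```

Calling `normalize_rows("shop_a", ...)` and `normalize_rows("shop_b", ...)` on differently laid-out files would then yield documents with the same field names, which is the property the message asks for.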

Re: Processing/Indexing CSV

2011-06-09 Thread Yonik Seeley
On Thu, Jun 9, 2011 at 4:07 PM, Helmut Hoffer von Ankershoffen wrote: > Hi, > yes, it's about CSV files loaded via HTTP from shops to be fed into a > shopping search engine. > The CSV Loader cannot map fields (only field values) etc. You can provide your own list of fieldnames and optionally igno
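
One way to read the suggestion about supplying your own field names is a request to Solr's CSV update handler with the `fieldnames`, `header`, `separator`, and `skip` parameters. The sketch below only builds such a URL in Python; the helper name, the base URL, and the example field names are assumptions, and the exact parameter behaviour should be checked against the documentation for your Solr version.

```python
from urllib.parse import urlencode


def csv_update_url(base_url, fieldnames, separator=",", skip=()):
    """Build a CSV update URL that overrides the file's own column names.

    `fieldnames` replaces whatever the header line says; `skip` lists
    fields that should not be indexed. The helper itself is hypothetical.
    """
    params = {
        "separator": separator,
        "fieldnames": ",".join(fieldnames),
        "header": "true",  # assumes the first line of the file is a header
    }
    if skip:
        params["skip"] = ",".join(skip)
    return base_url.rstrip("/") + "/update/csv?" + urlencode(params)
```

A per-shop configuration could hold the `fieldnames`, `separator`, and `skip` values, so each shop's file is posted with its own parameters.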

Re: Processing/Indexing CSV

2011-06-09 Thread Helmut Hoffer von Ankershoffen
Hi, yes, it's about CSV files loaded via HTTP from shops to be fed into a shopping search engine. The CSV Loader cannot map fields (only field values) etc. DIH is flexible enough for building the importing part of such a thing but misses elegant handling of CSV data ... Regards On Thu, Jun 9, 2

Re: Processing/Indexing CSV

2011-06-09 Thread Helmut Hoffer von Ankershoffen
ith encodings but I'm not sure this will be >> an issue either... >> >> James Dyer >> E-Commerce Systems >> Ingram Content Group >> (615) 213-4311 >> >> -Original Message- >> From: Helmut Hoffer von Ankershoffen [mailto:hel

Re: Processing/Indexing CSV

2011-06-09 Thread Helmut Hoffer von Ankershoffen
ngs but I'm not sure this will be > an issue either... > > James Dyer > E-Commerce Systems > Ingram Content Group > (615) 213-4311 > > -Original Message- > From: Helmut Hoffer von Ankershoffen [mailto:helmut...@googlemail.com] > Sent: Thursday, June 09, 201

Re: Processing/Indexing CSV

2011-06-09 Thread Yonik Seeley
On Thu, Jun 9, 2011 at 3:31 PM, Helmut Hoffer von Ankershoffen wrote: > Hi, > > there seems to be no way to index CSV using the DataImportHandler. Looking over the features you want, it looks like you're starting from a CSV file (as opposed to CSV stored in a database). Is there a reason that you

RE: Processing/Indexing CSV

2011-06-09 Thread Dyer, James
t: Thursday, June 09, 2011 2:32 PM To: solr-user@lucene.apache.org Subject: Processing/Indexing CSV Hi, there seems to be no way to index CSV using the DataImportHandler. Using a combination of LineEntityProcessor<http://wiki.apache.org/solr/DataImportHandler#LineEntityProcessor>

Re: Processing/Indexing CSV

2011-06-09 Thread Helmut Hoffer von Ankershoffen
Hi, to make my point clearer: if the CSV has a fixed schema / column layout, using the RegexTransformer is of course a possibility (however awkward). But if you want to implement a (more or less) schema-free shopping search engine ... regards On Thu, Jun 9, 2011 at 9:31 PM, Helmut Hoffer von
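
To illustrate why a regex over a fixed column layout tends to be awkward for CSV, here is a small Python analogue (not DIH configuration). The three-column layout, the field names, and both helper functions are assumptions made for the example.

```python
import csv
import re

# Hypothetical fixed layout: exactly three comma-separated columns.
LINE_PATTERN = re.compile(r"^([^,]*),([^,]*),([^,]*)$")
FIELDS = ("name", "price", "stock")


def parse_with_regex(line):
    """Split a line with a fixed regex; returns None if it does not match."""
    match = LINE_PATTERN.match(line)
    if match is None:
        return None
    return dict(zip(FIELDS, match.groups()))


def parse_with_csv(line):
    """Split a line with a real CSV parser, which understands quoting."""
    values = next(csv.reader([line]))
    return dict(zip(FIELDS, values))
```

The regex handles a plain line such as `Hose,19.90,3`, but a quoted value containing the separator, such as `"Hose, blau",19.90,3`, no longer matches, whereas the CSV parser treats the quoted part as one field. A different separator or column count would need yet another pattern.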

Processing/Indexing CSV

2011-06-09 Thread Helmut Hoffer von Ankershoffen
Hi, there seems to be no way to index CSV using the DataImportHandler. Using a combination of LineEntityProcessor and RegexTransformer as proposed in http://robotli