Re: a new DIH manifestEnityProcessor SOLR-1060 on jira

2009-03-10 Thread Fergus McMenemie
OK, SOLR-1060 created. >To this requirement I would add the basic requirement that this file >(what Fergus calls the manifest, a name I still don't agree with) >represents an update-set and that there should be a delete-set as well. > >ChangeSetEntityProcessor, on there I would jump with two feet.

Re: a new DIH manifestEnityProcessor

2009-03-10 Thread Paul Libbrecht
To this requirement I would add the basic requirement that this file (what Fergus calls the manifest, a name I still don't agree with) represents an update-set and that there should be a delete-set as well. ChangeSetEntityProcessor, on there I would jump with two feet. paul On 10-Mar-09 at 05:4

Re: a new DIH manifestEnityProcessor

2009-03-09 Thread Noble Paul നോബിള്‍ नोब्ळ्
Hi Fergus, open a JIRA issue anyway. Put in your thoughts and we can refine the requirements as part of the discussion. Basically the requirements are: 1) read a file line by line, 2) filter out lines (include or exclude) based on a regex, 3) extract parts (named parts) from the line using another
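
The three requirements listed above can be sketched roughly as follows. This is only an illustration in Python, not DIH code and not the SOLR-1060 implementation. The function name `process_lines`, the parameter names `accept`, `skip` and `extract`, and the `rawLine` key are all hypothetical. The message is cut off after "using another", so the sketch assumes that step 3 uses a second regex with named groups.

```python
import re
from typing import Dict, Iterable, Iterator, Optional


def process_lines(
    lines: Iterable[str],
    accept: Optional[str] = None,   # hypothetical: keep only lines matching this regex
    skip: Optional[str] = None,     # hypothetical: drop lines matching this regex
    extract: Optional[str] = None,  # hypothetical: regex with named groups for step 3
) -> Iterator[Dict[str, str]]:
    """Yield one row (a dict) per line that survives the filters."""
    accept_re = re.compile(accept) if accept else None
    skip_re = re.compile(skip) if skip else None
    extract_re = re.compile(extract) if extract else None

    # 1) read the input line by line
    for raw in lines:
        line = raw.rstrip("\n")

        # 2) filter lines (include or exclude) based on a regex
        if accept_re and not accept_re.search(line):
            continue
        if skip_re and skip_re.search(line):
            continue

        row = {"rawLine": line}

        # 3) extract named parts from the line (assumed: a second regex)
        if extract_re:
            match = extract_re.search(line)
            if match:
                row.update(match.groupdict())
        yield row


if __name__ == "__main__":
    sample = ["# comment\n", "add docs/a.xml\n", "del docs/b.xml\n", "\n"]
    for row in process_lines(
        sample,
        accept=r"\S",
        skip=r"^#",
        extract=r"^(?P<action>\w+)\s+(?P<path>\S+)$",
    ):
        print(row)
```

Because the function accepts any iterable of strings, an open file handle can be passed in directly, so a large manifest is never loaded into memory in one go.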

Re: a new DIH manifestEnityProcessor

2009-03-09 Thread Fergus McMenemie
>Hi Fergus, >The idea is that we have something generic which can be applicable to >a large set of users. If the manifest is a text file it can be read in >some standard way (say line by line). So we can have an EntityProcessor >which reads a text file line and filters it by a regex like the way >'gre

Re: a new DIH manifestEnityProcessor

2009-03-09 Thread Noble Paul നോബിള്‍ नोब्ळ्
Hi Fergus, The idea is that we have something generic which can be applicable to a large set of users. If the manifest is a text file it can be read in some standard way (say line by line). So we can have an EntityProcessor which reads a text file line and filters it by a regex like the way 'grep' wor

Re: a new DIH manifestEnityProcessor

2009-03-09 Thread Fergus McMenemie
>Manifest processing has a very limited use case. Why can't it be >processed using a PlainTextEntityProcessor and write a Transformer to >read lines using regex? > Ehmmm Ok. The PlainTextEntityProcessor docs do not give me enough insight to see how this could be used to index each of the files listed

Re: a new DIH manifestEnityProcessor

2009-03-09 Thread Noble Paul നോബിള്‍ नोब्ळ्
Manifest processing has a very limited use case. Why can't it be processed using a PlainTextEntityProcessor and write a Transformer to read lines using regex? --Noble On Mon, Mar 9, 2009 at 8:30 PM, Fergus McMenemie wrote: > Hello, > > I have almost finished a new DIH EntityProcessor which > I a