Hi Fergus,

Open a JIRA issue anyway. Put in your thoughts and we can refine the requirements as part of the discussion.
Basically, the requirements are:
1) read a file line by line
2) filter lines (include or exclude) based on a regex
3) extract parts (named parts) from the line using another regex

Noble

On Tue, Mar 10, 2009 at 1:50 AM, Fergus McMenemie <fer...@twig.me.uk> wrote:
>>Hi Fergus,
>>The idea is that we have something generic which can be applicable to
>>a large set of users. If the manifest is a text file it can be read in
>>some standard way (say line by line). So we can have an EntityProcessor
>>which reads a text file line by line and filters it by a regex, the way
>>'grep' works.
> Yes. That is what I have written. It is just an alternate form of the
> FileListEntityProcessor except that rather than walking the file system
> it reads from a file, line by line, and identifies the portion of the
> line containing the filename using a regexp.
>
>
>>
>>On Mon, Mar 9, 2009 at 10:44 PM, Fergus McMenemie <fer...@twig.me.uk> wrote:
>>>>manifest processing has a very limited use case. Why can't it be
>>>>processed using a PlainTextEntityProcessor, with a Transformer written
>>>>to read lines using regex?
>>>>
>>> Ehmmm Ok. The PlainTextEntityProcessor docs do not give me enough
>>> insight to see how this could be used to index each of the files
>>> listed by a 'tar xvf' report. Can you explain further?
>>>
>>> About the limited use case. Verity thought it was useful enough
>>> to have their own "bulk insert file" or bif file format that
>>> did the same and was far less flexible.
>>>
>>> In my experience we generally start off with some kind of
>>> file walker or crawler looking after file repositories. But
>>> these always proved slow and unreliable and over time they
>>> were always replaced with some kind of manifest-based
>>> control of the indexer. Where we could get a report of changes
>>> we always used it, and only relied on walkers or crawlers
>>> where we had to.
>>>
>>> Fergus
>>>
>>>>
>>>>--Noble
>>>>
>>>>On Mon, Mar 9, 2009 at 8:30 PM, Fergus McMenemie <fer...@twig.me.uk> wrote:
>>>>> Hello,
>>>>>
>>>>> I have almost finished a new DIH EntityProcessor which
>>>>> I am calling the ManifestEntityProcessor. It is designed
>>>>> around the idea that whatever daemon is used to maintain
>>>>> your set of a few 100,000 XML documents, it is likely to
>>>>> drop a report or log file explaining what has been changed
>>>>> within your content store. This assumes a file-based
>>>>> content repository.
>>>>>
>>>>> The ManifestEntityProcessor is used as follows
>>>>>
>>>>> <entity name="jc"
>>>>>    processor="ManifestEntityProcessor"
>>>>>    baseDir="/Volumes/Techmore/ts/aaa/schema/data"
>>>>>    rootEntity="false"
>>>>>    dataSource="null"
>>>>>
>>>>>    allowRegex="^.*\.xml$"
>>>>>    manifestFileName="/Volumes/ts/man-find.txt"
>>>>>    manifestAddRegex="(.*)$"
>>>>>    >
>>>>>
>>>>> The idea is you have a log file or other report, perhaps
>>>>> from tar or zip, and you wish to use this to control the
>>>>> indexing of the new content. The new entity fields are as
>>>>> follows.
>>>>>
>>>>> manifestFileName  is the name of the manifest file. If
>>>>>                   this value is relative, it is assumed to
>>>>>                   be relative to baseDir. Required.
>>>>>
>>>>> manifestAddRegex  is a required regex to identify lines
>>>>>                   which when matched should cause docs to
>>>>>                   be added to the index.
>>>>>
>>>>> manifestDelRegex  is an optional regex to identify
>>>>>                   documents which when matched should
>>>>>                   be deleted from the index **PLANNED**
>>>>>
>>>>> allowRegex        a required regex to identify the portion
>>>>>                   of the ADD/DELete line identified above
>>>>>                   which contains the file or pathname to be
>>>>>                   ADDed or DELeted. If the resulting value
>>>>>                   is relative, it is assumed to be relative
>>>>>                   to baseDir.
>>>>>
>>>>> What do I do next?
>>>>> Raise a JIRA issue and add the code?
>>>>> Is DIH the right place to add this?
>>>>> Suggestions for a different name?
>>>>> Suggestions on how to do the delete bitty from within an entity?
>>>>>
>>>>> Regards Fergus.
>>>>--Noble Paul
>>>
>>> --
>>>
>>> ===============================================================
>>> Fergus McMenemie               Email:fer...@twig.me.uk
>>> Techmore Ltd                   Phone:(UK) 07721 376021
>>>
>>> Unix/Mac/Intranets             Analyst Programmer
>>> ===============================================================
>>>
>>
>>
>>
>>--
>>--Noble Paul
>
> --
>
> ===============================================================
> Fergus McMenemie               Email:fer...@twig.me.uk
> Techmore Ltd                   Phone:(UK) 07721 376021
>
> Unix/Mac/Intranets             Analyst Programmer
> ===============================================================
>
--
--Noble Paul
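
For reference, the three requirements listed in Noble's latest reply (read a file line by line, include or exclude lines by regex, extract named parts with another regex) could be sketched roughly as below. This is plain Python, not DIH code: scan_manifest, its parameter names, the "file" group name and the sample manifest lines are all made up for illustration. The baseDir handling only follows Fergus's description that a relative name is taken relative to baseDir.

```python
import posixpath
import re
from typing import Dict, Iterable, Iterator, Optional


def scan_manifest(
    lines: Iterable[str],
    extract_regex: str,
    include_regex: Optional[str] = None,
    exclude_regex: Optional[str] = None,
    base_dir: Optional[str] = None,
) -> Iterator[Dict[str, str]]:
    """Yield one dict of named groups per accepted manifest line.

    Hypothetical helper: none of these names come from Solr or DIH.
    """
    include = re.compile(include_regex) if include_regex else None
    exclude = re.compile(exclude_regex) if exclude_regex else None
    extract = re.compile(extract_regex)

    # Requirement 1: consume the input one line at a time.
    for raw in lines:
        line = raw.rstrip("\r\n")

        # Requirement 2: grep-style include / exclude filtering.
        if include and not include.search(line):
            continue
        if exclude and exclude.search(line):
            continue

        # Requirement 3: pull the named parts out of the line.
        match = extract.search(line)
        if match is None:
            continue
        row = match.groupdict()

        # Assumed convention: a group called "file" holds the path, and a
        # relative path is resolved against base_dir. posixpath is used so
        # the result does not depend on the platform running the sketch.
        path = row.get("file")
        if path and base_dir and not posixpath.isabs(path):
            row["file"] = posixpath.join(base_dir, path)
        yield row


# Made-up manifest lines, loosely shaped like a tar extraction report.
manifest = [
    "x docs/a.xml\n",
    "x docs/readme.txt\n",
    "x /abs/b.xml\n",
]

rows = list(
    scan_manifest(
        manifest,
        extract_regex=r"^x (?P<file>.+)$",
        include_regex=r"\.xml$",
        base_dir="/data",
    )
)
for row in rows:
    print(row["file"])
```

Passing an open file object as lines gives the line-by-line read from disk. Keeping the filter regexes separate from the extraction regex mirrors the split between requirements 2 and 3, so the same reader could serve logs that have nothing to do with file names.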