>Hi Fergus,
>The idea is that we have something generic which can be applicable to
>a large set of users. If the manifest is a text file it can be read in
>some standard way (say line by line). So we can have an EntityProcessor
>which reads a text file line by line and filters it by a regex, the way
>'grep' works.
Yes. That is what I have written. It is just an alternate form of the
FileListEntityProcessor except that rather than walking the file system
it reads from a file, line by line, and identifies the portion of the
line containing the filename using a regexp.
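
For anyone following along, here is a rough sketch of that loop. It is
Python rather than the actual DIH Java code, and it is only an illustration:
the function name manifest_files and its parameters are hypothetical, and it
assumes that the first capture group of the "add" regex holds the filename
and that the "allow" regex then filters the result. The real processor may
split those two roles differently.

```python
import posixpath
import re


def manifest_files(lines, add_regex, allow_regex, base_dir):
    """Yield one path per manifest line that should be indexed.

    Assumption: group 1 of add_regex (or the whole match, if the regex
    has no group) is the filename; allow_regex then filters filenames.
    """
    add_pat = re.compile(add_regex)
    allow_pat = re.compile(allow_regex)
    for line in lines:
        line = line.rstrip("\r\n")
        match = add_pat.search(line)
        if match is None:
            continue  # not an "add" line, ignore it
        name = match.group(1) if add_pat.groups else match.group(0)
        if not allow_pat.search(name):
            continue  # filename is not one we want to index
        # Relative names are resolved against base_dir.
        if not posixpath.isabs(name):
            name = posixpath.join(base_dir, name)
        yield name
```

Called with the lines of a manifest, an add regex such as "^x (.*)$" and an
allow regex such as "^.*\.xml$", it yields only the XML paths, with relative
ones resolved against the base directory.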


>
>On Mon, Mar 9, 2009 at 10:44 PM, Fergus McMenemie <fer...@twig.me.uk> wrote:
>>>manifest processing has a very limited use case. Why can't it be
>>>processed using a PlainTextEntityProcessor plus a Transformer written
>>>to read lines using regex?
>>>
>> Ehmmm Ok. The PlainTextEntityProcessor docs do not give me enough
>> insight to see how this could be used to index each of the files
>> listed by a 'tar xvf' report. Can you explain further?
>>
>> About the limited use case. Verity thought it was useful enough
>> to have their own "bulk insert file" or bif file format that
>> did the same and was far less flexible.
>>
>> In my experience we generally started off with some kind of
>> file walker or crawler looking after file repositories. But
>> these always proved slow and unreliable and over time they
>> were always replaced with some kind of manifest based
>> control of the indexer. Where we could get a report of changes
>> we always used it, and only relied on walkers or crawlers
>> where we had to.
>>
>> Fergus
>>
>>>
>>>--Noble
>>>
>>>On Mon, Mar 9, 2009 at 8:30 PM, Fergus McMenemie <fer...@twig.me.uk> wrote:
>>>> Hello,
>>>>
>>>> I have almost finished a new DIH EntityProcessor which
>>>> I am calling the ManifestEntityProcessor. It is designed
>>>> around the idea that whatever daemon is used to maintain
>>>> your set of a few 100,000 XML documents, it is likely to
>>>> drop a report or log file explaining what has been changed
>>>> within your content store. This assumes a file based
>>>> content repository.
>>>>
>>>> The ManifestEntityProcessor is used as follows
>>>>
>>>>       <entity name="jc"
>>>>               processor="ManifestEntityProcessor"
>>>>               baseDir="/Volumes/Techmore/ts/aaa/schema/data"
>>>>               rootEntity="false"
>>>>               dataSource="null"
>>>>
>>>>               allowRegex="^.*\.xml$"
>>>>               manifestFileName="/Volumes/ts/man-find.txt"
>>>>               manifestAddRegex="(.*)$"
>>>>               >
>>>>
>>>> The idea is you have a log file or other report, perhaps
>>>> from tar or zip, and you wish to use this to control the
>>>> indexing of the new content. The new entity fields are as
>>>> follows.
>>>>
>>>> manifestFileName is the name of the manifest file. If
>>>>                 this value is relative, it is assumed to
>>>>                 be relative to baseDir. Required.
>>>>
>>>> manifestAddRegex is a required regex to identify lines
>>>>                 which when matched should cause docs to
>>>>                 be added to the index.
>>>>
>>>> manifestDelRegex is an optional value of a regex to
>>>>                 identify documents which when matched should
>>>>                 be deleted from the index **PLANNED**
>>>>
>>>> allowRegex       a required regex to identify the portion
>>>>                 of the ADD/DELete line identified above
>>>>                 which contains the file or pathname to be
>>>>                 ADDed or DELeted. If the resulting value
>>>>                 is relative, it is assumed to be relative
>>>>                 to baseDir.
>>>>
>>>> What do I do next?
>>>>   Raise a JIRA issue and add the code?
>>>>   Is DIH the right place to add this?
>>>>   Suggestions for a different name?
>>>>   Suggestions on how to do the delete bitty from within an entity?
>>>>
>>>> Regards Fergus.
>>>--Noble Paul
>>
>> --
>>
>> ===============================================================
>> Fergus McMenemie               Email:fer...@twig.me.uk
>> Techmore Ltd                   Phone:(UK) 07721 376021
>>
>> Unix/Mac/Intranets             Analyst Programmer
>> ===============================================================
>>
>
>
>
>-- 
>--Noble Paul

-- 

===============================================================
Fergus McMenemie               Email:fer...@twig.me.uk
Techmore Ltd                   Phone:(UK) 07721 376021

Unix/Mac/Intranets             Analyst Programmer
===============================================================
