I have to write a script to parse XML files we receive daily. Each XML file is an individual story, but every batch comes with an index file that contains a block of information for each story, as shown below. I need to run through this index file and, for each story, grab the NewsItemID, the Time, and the SourceFilePath.
From there I need to open up the individual stories and do some formatting, but for now I need to get past this :) I was planning on going line by line through the file but am not sure how I would go about grabbing the information I require. Sometimes there is a SourceFilePath but sometimes it's missing. Any help would be greatly appreciated.

<ContentItem>
  <Comment NewsItemID="780023, " Time="28-05-02 13:43"/>
  <Comment SlugLine="Canada-U.S.-Protectionism"/>
  <DataContent>
    <CPOnlineFile Type="IndexStoryItem">
      <JavaScript ScriptLanguage="&JavaScriptLanguage;">&CPJavaScriptOpenWindow;</JavaScript>
      <CPIndexStoryHead>Chretien pushes Bush on softwood, agriculture, but gets no promises</CPIndexStoryHead>
      <CPStory>
        <CPStoryPara Number="1" ParaSpace="FALSE">
          (CP) - Prime Minister Jean Chretien said he pressed U.S. President George W. Bush on Tuesday to address festering trade disputes between the two countries, but got no assurances that disagreements over softwood lumber or agricultural subsidies would be resolved. Chretien, who raised the matters after a NATO meeting in the Italian capital, said he was "very forceful" with Bush. But he said the president blamed Congress for the logjam.
        </CPStoryPara>
        <CPStoryPara Number="2" ParaSpace="FALSE">
          "It's always like that when you deal with the president of the United States: 'Yes, but the Congress and the Senate . . . ' In Canada you blame the prime minister or you congratulate the prime minister because he cannot pass the buck to anyone else."
        </CPStoryPara>
      </CPStory>
      <CPLink Type="StoryFile" Number="1" SourceFilePath="./n052814A.xml"/>
    </CPOnlineFile>
  </DataContent>
</ContentItem>
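
For what it's worth, going line by line may be fragile here, since an element can span several lines. One possible alternative, sketched in Python with the standard library's xml.etree.ElementTree, is to parse the whole index and walk the ContentItem elements. This is only a sketch under a few assumptions that are not confirmed by the sample: the function name parse_index and the dictionary keys are made up for illustration, the custom entities (&JavaScriptLanguage; and so on) are assumed not to matter for this step and are simply escaped so the parser accepts them, and the trailing ", " in NewsItemID is assumed to be noise that can be stripped.

```python
import re
import xml.etree.ElementTree as ET

# Matches entity references other than the five predefined XML entities
# and numeric character references, e.g. &CPJavaScriptOpenWindow;
_UNKNOWN_ENTITY = re.compile(r"&(?!(?:amp|lt|gt|quot|apos);|#)([\w.-]+);")


def parse_index(xml_text):
    """Return one dict per ContentItem with its id, time and story path.

    Hypothetical helper: the name and the dict keys are illustrative only.
    """
    # Assumption: the custom entities are not needed for this step, so they
    # are escaped to literal text instead of being resolved.
    safe_text = _UNKNOWN_ENTITY.sub(r"&amp;\1;", xml_text)
    root = ET.fromstring(safe_text)

    stories = []
    # iter() also yields the root itself, so a lone ContentItem works too.
    for item in root.iter("ContentItem"):
        news_id = time = None
        for comment in item.iter("Comment"):
            # Only the Comment that carries NewsItemID is of interest.
            if "NewsItemID" in comment.attrib:
                # Assumption: the trailing ", " in the ID is noise.
                news_id = comment.get("NewsItemID").strip(" ,")
                time = comment.get("Time")
                break

        # SourceFilePath is sometimes missing, so fall back to None.
        link = item.find(".//CPLink[@SourceFilePath]")
        path = link.get("SourceFilePath") if link is not None else None

        stories.append({"id": news_id, "time": time, "path": path})
    return stories
```

Calling parse_index on the text of an index file would give a list that can be looped over to open each story; entries whose "path" is None are the ones with no SourceFilePath and can be skipped or logged.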