I have to write a script to parse the XML files we receive daily. Each XML file is an
individual story, but each batch also comes with an index file that contains a block of
information for each story, like the one below. I need to run through this index file
and, for each story, grab the NewsItemID, the Time, and the SourceFilePath.

From there I need to open up the individual stories and do some formatting, but for
now I just need to get past this :) I was planning on going through the file line by
line, but am not sure how I would go about grabbing the information I require.
Sometimes there is a SourceFilePath, but sometimes it's missing.
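A minimal sketch of the line-by-line plan, written in Python (the post names no language, so that choice is an assumption). It assumes the index blocks look like the sample in this post: `<ContentItem>` and `</ContentItem>` on their own lines, and each wanted attribute (NewsItemID, Time, SourceFilePath) on a single line. Fields that never show up, such as a missing SourceFilePath, simply stay `None`. The helper name `scan_index` is hypothetical.

```python
import re

# Matches name="value" attribute pairs anywhere on a line.
_ATTR = re.compile(r'(\w+)="([^"]*)"')
_WANTED = ("NewsItemID", "Time", "SourceFilePath")


def scan_index(lines):
    """Yield one dict per <ContentItem> block, reading line by line.

    Hypothetical helper. Assumes <ContentItem> and </ContentItem> sit on
    their own lines and each wanted attribute is on a single line.
    Fields that never appear stay None.
    """
    record = None
    for line in lines:
        if "<ContentItem>" in line:
            record = dict.fromkeys(_WANTED)  # new story, every field None
        elif "</ContentItem>" in line:
            if record is not None:
                yield record  # end of story: hand back whatever was found
            record = None
        elif record is not None:
            for name, value in _ATTR.findall(line):
                # Keep only the first occurrence of each wanted attribute.
                if name in record and record[name] is None:
                    if name == "NewsItemID":
                        # The sample value is "780023,  ": drop comma and spaces.
                        value = value.strip().rstrip(",").strip()
                    record[name] = value
```

An open file object can be passed straight in (`for story in scan_index(f): ...`), since iterating over a file yields its lines. The trade-off is fragility: this only works while the attributes you need stay on one line, so a change in how the index is wrapped (the story headline in the sample already spans two lines) could silently break it.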

Any help would be greatly appreciated.

<ContentItem>

<Comment NewsItemID="780023,  " Time="28-05-02 13:43"/>
<Comment SlugLine="Canada-U.S.-Protectionism"/>

<DataContent>
<CPOnlineFile Type="IndexStoryItem">
<JavaScript ScriptLanguage="&JavaScriptLanguage;">&CPJavaScriptOpenWindow;</JavaScript>
<CPIndexStoryHead>Chretien pushes Bush on softwood, agriculture, but gets no 
promises</CPIndexStoryHead>
<CPStory>
<CPStoryPara Number="1" ParaSpace="FALSE">
(CP) - Prime Minister Jean Chretien said he pressed U.S. President George W. Bush on 
Tuesday to address festering trade disputes between the two countries, but got no 
assurances that disagreements over softwood lumber or agricultural subsidies would be 
resolved. Chretien, who raised the matters after a NATO meeting in the Italian 
capital, said he was "very forceful" with Bush. But he said the president blamed 
Congress for the logjam.

</CPStoryPara>
<CPStoryPara Number="2" ParaSpace="FALSE">
"It's always like that when you deal with the president of the United States: 'Yes, 
but the Congress and the Senate . . . ' In Canada you blame the prime minister or you 
congratulate the prime minister because he cannot pass the buck to anyone else."

</CPStoryPara>
</CPStory>
<CPLink Type="StoryFile" Number="1" SourceFilePath="./n052814A.xml"/>

</CPOnlineFile>
</DataContent>
</ContentItem>
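If the index file is a well-formed document with a single root element, a real XML parser is sturdier than matching lines, because it does not care how tags are wrapped. Below is a minimal sketch using Python's standard `xml.etree.ElementTree`, under these assumptions: `&JavaScriptLanguage;` and `&CPJavaScriptOpenWindow;` are custom entities whose DTD is not available to the parser (which would otherwise stop with an "undefined entity" error), so the sketch escapes them to literal text first; the trailing comma in NewsItemID is noise; and the helper name `parse_index` is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# An "&" that does not start one of XML's five predefined entities or a
# numeric character reference. Assumption: custom entities such as
# &JavaScriptLanguage; have no DTD available, so we escape them.
_UNDECLARED_AMP = re.compile(r"&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9a-fA-F]+);)")


def parse_index(xml_text):
    """Return a list of (news_item_id, time, source_file_path) tuples.

    Hypothetical helper. source_file_path is None when a story has no
    <CPLink SourceFilePath="..."/> element.
    """
    # Escape undeclared entities so they survive as literal text.
    cleaned = _UNDECLARED_AMP.sub("&amp;", xml_text)
    root = ET.fromstring(cleaned)

    stories = []
    # iter() includes the root itself, so this works whether the root is a
    # single <ContentItem> or a wrapper element holding many of them.
    for item in root.iter("ContentItem"):
        news_id = time = path = None
        head = item.find(".//Comment[@NewsItemID]")
        if head is not None:
            # The sample value is "780023,  ": drop the comma and spaces.
            news_id = head.get("NewsItemID").strip().rstrip(",").strip()
            time = head.get("Time")
        link = item.find(".//CPLink[@SourceFilePath]")
        if link is not None:
            path = link.get("SourceFilePath")
        stories.append((news_id, time, path))
    return stories
```

For the sample block above this should give one tuple, `("780023", "28-05-02 13:43", "./n052814A.xml")`. If the index does ship with a DTD that defines those entities, you may prefer to let the parser expand them rather than escaping them.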
