Hello:
I have a series of newspaper articles from a Canadian newspaper database 
(Canadian Newsstand) that look just like below.

I've read through this vignette 
(http://cran.r-project.org/web/packages/tm/vignettes/extensions.pdf) about 
creating a custom reader to extract meta-data, but I can't understand how to 
apply this in the context of a text document, rather than in the tabular format 
as in the vignette.  You can see there's all kinds of valuable information in 
each document -Author, page number, publication year, section, publication 
title....
Can anyone provide some suggestions to someone unfamiliar with the tm package 
as to how to go about creating a custom reader for this situation?
Yours truly,
Simon Kiss

____________________________________________________________

Document 1 of 40
First Nation agrees not to block trains
Author: SHAWN BERRY Legislature Bureau
Publication info: Daily Gleaner [Fredericton, N.B] 07 Jan 2013: A.3.
http://remote.libproxy.wlu.ca/login?url=http://search.proquest.com/docview/1266701269?accountid=15090
Abstract: Participants are also concerned about Chief Theresa Spence who 
stopped eating solid food on Dec. 11 in a bid to secure a meeting between First 
Nations leaders, Prime Minister Stephen Harper and Gov. Gen. David Johnston to 
discuss the treaty relationship.
Links: null
Full Text: A bunch of text about a story here
Subject: Railroads; Native North Americans; Meetings; Injunctions
Title: First Nation agrees not to block trains
Publication title: Daily Gleaner
First page: A.3
Publication year: 2013
Publication date: Jan 7, 2013
Year: 2013
Section: Main
Publisher: Infomart, a division of Postmedia Network Inc.
Place of publication: Fredericton, N.B.
Country of publication: Canada
Journal subject: GENERAL INTEREST PERIODICALS--UNITED STATES
ISSN: 08216983
Source type: Newspapers
Language of publication: English
Document type: News
ProQuest document ID: 1266701269
Document URL: 
http://remote.libproxy.wlu.ca/login?url=http://search.proquest.com/docview/1266701269?accountid=15090
Copyright: (Copyright (c) 2013 The Daily Gleaner (Fredericton))
Last updated: 2013-01-07
Database: Canadian Newsstand Complete


*********************************
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9
Cell: +1 905 746 7606

Please avoid sending me Word, PowerPoint or Excel attachments. Sending these 
documents puts pressure on many people to use Microsoft software and helps to 
deny them any other choice. In effect, you become a buttress of the Microsoft 
monopoly.

To convert to plain text choose Text Only or Text Document as the Save As Type. 
 Your computer may also have a program to convert to PDF format. Select File, 
then Print. Scroll through available printers and select the PDF converter. 
Click on the Print button and enter a name for the PDF file when requested.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to