WinXP, R-2.9.1

L.S.,
I have been trying to solve what is, for me, a tricky problem. No matter what I try, I just can't find a way to do it. The problem is this: I have a text file (ANSI text), "titles.txt", with lines of text. Here is an example of such a file:

>>>>>
a brief history of polio vaccines
anti-vaccination movements and their interpretations
early warning in the light of theories of technological change
international mobility among nordic doctoral students
land of hope and glory: exploring cochlear implantation in the netherlands
making science - between nature and society
medical innovations in historical-perspective
photographing medicine - images and power in britain and america since 1840
shifts in global immunisation goals (1984-2004): unfinished agendas and mixed results
striking the mother lode in science - the importance of age, place, and time
technology assessment and the sociopolitics of health technologies
the policy of science and technology - evolution of research policy - france, the united-kingdom, the federal-republic-of-germany, japan, the united-states - french
vaccine independence, local competences and globalisation: lessons from the history of pertussis vaccines
external assessment and conditional financing of research in dutch universities
histories of cochlear implantation
lock in, the state and vaccine development: lessons from the history of the polio vaccines
peerless science - peer-review and united-states science policy
technology, science, and obstetric practice - the origins and transformation of cephalopelvimetry
the rhetoric and counter-rhetoric of a ''bionic'' technology
vaccine innovation and adoption: polio vaccines in the uk, the netherlands and west germany, 1955-1965
<<<<<

Some of the lines in such a file are very long (not in this example). The file contains the titles and abstracts of scientific articles. In addition to this file, I also have a file, "words.txt", that contains a set of words I want to analyze.
Part of this file:

>>>>>
technology
technological
innovations
science
policy
society
history
<<<<<

What I want is to create a matrix in which cell [i,j] contains the number of times word i (i.e., the ith word from "words.txt") appears in line j of "titles.txt". So, for the data above this would yield (barring any typos on my side):

technology     0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 1 1 0
technological  0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
innovations    0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
science        0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 2 1 0 0
policy         0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 1 0 0 0
society        0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
history        1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0

This is a first step toward co-word analysis and some basic statistics on these titles and abstracts.

I have always had a hard time working with text in R and still have no idea how to achieve the result above. I am probably overlooking something fairly straightforward, but right now I am completely in the dark.

Any help is very much appreciated,
Peter Verbeet

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
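One possible way to build this words-by-lines count matrix in base R is sketched below, under a few assumptions. A "word" is taken to be a maximal run of the letters a-z after lowercasing, so "technology," and "''bionic''" reduce to "technology" and "bionic", "peer-review" splits into "peer" and "review", and "technologies" or "histories" do not count as "technology" or "history". "words.txt" is assumed to hold single words separated by whitespace. The name word_line_matrix() is hypothetical, chosen for this illustration only. The sketch sticks to long-standing base functions (strsplit(), factor(), table(), sapply()) rather than newer helpers such as vapply() or paste0().

```r
# Sketch: count occurrences of each word in `words` in each line of `titles`.
# word_line_matrix() is a hypothetical helper name, not an existing R function.
# Assumption: a "word" is a maximal run of the letters a-z after lowercasing.
word_line_matrix <- function(words, titles) {
  words <- unique(tolower(words))          # factor() needs distinct levels
  # Split each line on anything that is not a letter a-z, so that
  # "technology," -> "technology" and "peer-review" -> "peer", "review".
  tokens <- strsplit(tolower(titles), "[^a-z]+")
  # Tabulate each line's tokens against the word list; tokens that are not
  # in `words` (including empty strings) become NA and are not counted.
  counts <- sapply(tokens, function(tok)
    as.integer(table(factor(tok, levels = words))))
  # sapply() returns a plain vector when there is only one word, so
  # rebuild an explicit words x lines matrix in every case.
  matrix(counts, nrow = length(words),
         dimnames = list(words, paste("line", seq_along(titles), sep = "")))
}

# Small inline demo using three of the titles from the post.
demo_titles <- c(
  "making science - between nature and society",
  "peerless science - peer-review and united-states science policy",
  "technology, science, and obstetric practice - the origins and transformation of cephalopelvimetry"
)
demo_words <- c("technology", "science", "policy", "society")
print(word_line_matrix(demo_words, demo_titles))  # science row: 1 2 1

# With the real files (assumed to be in the working directory):
if (file.exists("words.txt") && file.exists("titles.txt")) {
  words  <- scan("words.txt", what = character(), quiet = TRUE)
  titles <- readLines("titles.txt", warn = FALSE)
  m <- word_line_matrix(words, titles)
  print(m)
}
```

Giving factor() the word list as its levels makes table() return a count for every word, zeros included, in the order of "words.txt", while all other tokens are ignored. Multi-word terms such as "science policy" would not be matched by this sketch, since scan() splits them on whitespace and each line is compared token by token.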