Re: [R] Burt table from word frequency list

Murray Cooper Sun, 29 Mar 2009 19:23:44 -0700

The usual approach is to count co-occurrences within a fixed number of words of each other.

A typical window runs from 5 words before to 5 words after a given word.

So for each word in the document, you look for occurrences of all other words at offsets -5 -4 -3 -2 -1 0 1 2 3 4 5. Depending on the language and the question being asked, certain words may be excluded.

This is not a simple function! I don't know if anyone has written a package for this type of analysis, but with over 2000 packages floating around you might get lucky.
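
For illustration only, the windowed count described above might be sketched in base R as follows. The function name cooccurrence_window and its arguments are hypothetical (not from any package), the sketch skips offset 0 (the word itself), and it drops excluded words before the window is applied, which is only one of several reasonable choices.

```r
# Sketch: symmetric co-occurrence counts within +/- `window` words.
# `cooccurrence_window`, `window` and `exclude` are illustrative names.
cooccurrence_window <- function(tokens, window = 5, exclude = character(0)) {
  # Assumption: excluded words are removed BEFORE windowing,
  # so they do not count towards the distance between words.
  tokens <- tokens[!(tokens %in% exclude)]
  vocab  <- sort(unique(tokens))
  counts <- matrix(0, nrow = length(vocab), ncol = length(vocab),
                   dimnames = list(vocab, vocab))
  n <- length(tokens)
  for (i in seq_len(n)) {
    lo <- max(1, i - window)
    hi <- min(n, i + window)
    # Neighbours at offsets -window..window, skipping offset 0
    neighbours <- tokens[setdiff(lo:hi, i)]
    for (w in neighbours) {
      counts[tokens[i], w] <- counts[tokens[i], w] + 1
    }
  }
  counts
}

# Toy input; a real corpus needs proper tokenisation
text   <- "the cat sat on the mat and the dog sat on the rug"
tokens <- strsplit(tolower(text), "\\s+")[[1]]
m <- cooccurrence_window(tokens, window = 2, exclude = c("the", "on", "and"))
print(m)
```

The double loop is slow on a large corpus; it is written this way to keep the offsets visible rather than to be efficient.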

Murray M Cooper, Ph.D.
Richland Statistics
9800 N 24th St
Richland, MI, USA 49083
Mail: richs...@earthlink.net

----- Original Message -----
From: "Ted Harding" <ted.hard...@manchester.ac.uk>
To: "Joan-Josep Vallbé" <pep.val...@uab.cat>
Cc: <r-help@r-project.org>
Sent: Sunday, March 29, 2009 2:46 PM
Subject: Re: [R] Burt table from word frequency list


On 29-Mar-09 16:32:11, Joan-Josep Vallbé wrote:

Ok, thank you. And is there any function to get the table directly
from the original corpus?

best,
joan-josep vallbé


You will have to think about what you are doing. As Duncan said,
you need "counts of pairs of words" or, more precisely, of
co-occurrence. But co-occurrence within what?

Adjacent?
Within the same sentence?
Within the same paragraph?
Within the same chapter?
Within the same document (if your corpus incorporates several
 documents)?
Within documents by the same author?
 If so, then is there an additional classification by
 individual document?

Etc., etc., etc.

In short, what is the structure of your corpus, and how do
you wish this to be represented in the Burt table?
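
As a rough sketch of one possible answer, suppose the unit of co-occurrence is the document (or sentence, or paragraph). Each unit can then be reduced to a 0/1 word-incidence row, and the cross-product of that matrix counts the units in which two words appear together. The toy corpus and the object names docs, incidence and cooc below are made up for illustration, and whether the resulting table is the one wanted for correspondence analysis depends on the questions above.

```r
# Toy corpus: each element is one unit (document, sentence, paragraph, ...)
docs <- list(
  d1 = c("cat", "sat", "mat"),
  d2 = c("dog", "sat", "rug"),
  d3 = c("cat", "dog")
)
vocab <- sort(unique(unlist(docs)))

# Unit-by-word incidence matrix: 1 if the word appears in the unit, else 0
incidence <- t(sapply(docs, function(d) as.integer(vocab %in% d)))
colnames(incidence) <- vocab

# crossprod(X) is t(X) %*% X: entry [i, j] is the number of units that
# contain both word i and word j; the diagonal is the number of units
# containing each word.
cooc <- crossprod(incidence)
print(cooc)
```

Changing the unit only changes how docs is built; the cross-product step stays the same.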

Hoping this helps to move you forward,
Ted.

On Mar 29, 2009, at 2:00 PM, Duncan Murdoch wrote:

On 29/03/2009 7:02 AM, Joan-Josep Vallbé wrote:

Dear all,
I have a word frequency list from a corpus (say, in .csv), where
the first column is a word and the second is the occurrence
frequency of that word in the corpus. Is it possible to obtain a
Burt table (a table crossing all words with each other, i.e.,
where rows and columns are the words) from that frequency list
with R? I'm exploring the "ca" package but I'm not able to solve
this detail.


No, because you don't have any information on that.  You only have
marginal counts.  You need counts of pairs of words (from the
original corpus, or already summarized.)
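
A small made-up example of why marginal counts are not enough: the two token sequences below have identical word frequencies but different pair counts. The helper names word_freq and adjacent_pairs are hypothetical, and adjacency is used only as the simplest possible definition of a pair.

```r
# Two toy "corpora" with the same word frequencies
a <- c("x", "y", "x", "y")
b <- c("x", "x", "y", "y")

# Marginal counts: what a word frequency list contains
word_freq <- function(tokens) table(tokens)

# Ordered counts of adjacent pairs (word at position i, word at position i + 1)
adjacent_pairs <- function(tokens) table(head(tokens, -1), tail(tokens, -1))

print(all(word_freq(a) == word_freq(b)))  # same frequency list
print(adjacent_pairs(a))                  # but the pair tables differ
print(adjacent_pairs(b))
```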

Duncan Murdoch


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.hard...@manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 29-Mar-09                                       Time: 18:46:40
------------------------------ XFMail ------------------------------

