Re: [R] bibtex::read.bib -- extracting bibentry keys

Michael Friendly Mon, 06 Aug 2012 10:49:31 -0700

On 8/6/2012 11:54 AM, Achim Zeileis wrote:

On Mon, 6 Aug 2012, Michael Friendly wrote:
I have two versions of a bibtex database which have gotten badly out of sync. I need to find all the entries in bib2 which are not contained in bib1, according to their bibtex keys. But I can't figure out how to extract a list of the bibentry keys in these databases.
read.bib() returns a "bibentry" object so you can simply do this as usual
for "bibentry" objects with $key:

One thing that was confusing was that read.bib returns a "bibentry" object, all of whose elements are also "bibentry" objects.


x <- read.bib(...)
x$key

or maybe

unlist(x$key)

Whatever is more convenient for you. See ?bibentry for more details.
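For readers without a .bib file at hand, here is a minimal sketch of the same $key extraction on a hand-built object. It assumes only utils::bibentry() from base R as a stand-in for what read.bib() returns; the two keys are borrowed from this thread, and the authors and titles are placeholders, not real references.

```r
# Build a tiny two-entry "bibentry" object by hand (stand-in for read.bib() output).
# Keys come from this thread; author/title values are placeholders.
x <- c(
  utils::bibentry(bibtype = "Misc", key = "Langren:1646",
                  author = "A. Placeholder", title = "Placeholder title one",
                  year = "1646"),
  utils::bibentry(bibtype = "Misc", key = "Fisher:1915a",
                  author = "B. Placeholder", title = "Placeholder title two",
                  year = "1915")
)

class(x)       # the collection is a "bibentry" ...
class(x[[1]])  # ... and so is each single element

x$key                  # a list with one key per entry
keys <- unlist(x$key)  # flattened to a plain character vector
print(keys)
```

With more than one entry, x$key comes back as a list, which is why unlist() is usually the more convenient form for set comparisons.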

That is what I was missing -- it would have helped to find a link to utils::bibentry in the [rather scanty] documentation for read.bib. I'm now a happy camper in this regard. What I wanted is given by:

bib1 <- read.bib("C:/localtexmf/bibtex/bib/timeref.bib")
length(bib1)
keys1 <- unlist(bib1$key)

bib2 <- read.bib("W:/texmf/bibtex/bib/timeref.bib")
length(bib2)
keys2 <- unlist(bib2$key)


> which(! keys1 %in% keys2)

[1] 133 249 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628

> keys1[which(! keys1 %in% keys2)]
[1] "Langren:1646" "Fisher:1915a" "Stigler:2012"
[4] "Wainer:2011" "Minard:1860a" "CNAM:1906"
[7] "Wainer:2012" "Wainer-Ramsay:2010" "Stephenson-Galneder:1969"
[10] "Waters:1964" "Agathe:1988" "Gascoigne:2007"
[13] "Krzywinski:2009" "Bolle:1929" "Balbi:1829"
[16] "Bills-Li:2005" "Lewi:2006" "Fletcher:1851"
[19] "Perrot:1976"
>

As a side note, I searched extensively for bibtex tools that would help me resolve the differences between two related bibtex files, but none was as simple as this, once I could get the keys. Thanks to Roman for providing this infrastructure!

So, ignoring for now differences in the contents of the bibentries, a useful tool for my purpose is bibdiff(),


# Report the bibtex keys found in only one of two "bibentry" objects
bibdiff <- function(bib1, bib2) {
    keys1 <- unlist(bib1$key)
    keys2 <- unlist(bib2$key)
    only1 <- keys1[!keys1 %in% keys2]   # keys missing from bib2
    only2 <- keys2[!keys2 %in% keys1]   # keys missing from bib1
    cat("Only in bib1:\n")
    print(only1)
    cat("Only in bib2:\n")
    print(only2)
}

> bibdiff(bib1, bib2)
Only in bib1:
[1] "Langren:1646" "Fisher:1915a" "Stigler:2012"
[4] "Wainer:2011" "Minard:1860a" "CNAM:1906"
[7] "Wainer:2012" "Wainer-Ramsay:2010" "Stephenson-Galneder:1969"
[10] "Waters:1964" "Agathe:1988" "Gascoigne:2007"
[13] "Krzywinski:2009" "Bolle:1929" "Balbi:1829"
[16] "Bills-Li:2005" "Lewi:2006" "Fletcher:1851"
[19] "Perrot:1976"
Only in bib2:
[1] "Langren:1644" "Quetelet:1842"
>

which gives me the complete answer, as far as it goes.
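If the differences are needed for further processing rather than just printed, a variant that returns them may be handier. The following is only a sketch: bibdiff2() and the helper mk() are hypothetical names, the test entries are placeholders built with utils::bibentry(), and setdiff() drops duplicate keys, which the %in% version above does not.

```r
# Hypothetical variant of bibdiff(): returns the key sets instead of printing them.
bibdiff2 <- function(bib1, bib2) {
  keys1 <- unlist(bib1$key)
  keys2 <- unlist(bib2$key)
  list(only1 = setdiff(keys1, keys2),    # keys present only in bib1
       only2 = setdiff(keys2, keys1),    # keys present only in bib2
       both  = intersect(keys1, keys2))  # keys present in both
}

# Hypothetical helper: build placeholder "Misc" entries for a vector of keys.
mk <- function(keys) {
  do.call(c, lapply(keys, function(k)
    utils::bibentry(bibtype = "Misc", key = k, title = "Placeholder")))
}

a <- mk(c("Langren:1646", "Fisher:1915a"))
b <- mk(c("Langren:1644", "Fisher:1915a"))

d <- bibdiff2(a, b)
str(d)
```

The returned list can then be used to pull the differing entries out of either object, e.g. by logical indexing on the keys.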

A minor question: Is there some way to prevent read.bib from ignoring entries that do not contain all required fields?
Also not really an issue with read.bib itself. read.bib() wants to return a "bibentry" object, but bibentry() only allows creating objects that are valid BibTeX, i.e., that have all required fields.

It turns out that read.bib seems to be pickier than bibtex itself -- it does not accommodate crossref= fields, used for InCollection items; these resolve correctly using bibtex.
For some books in my database, the publisher is unknown. bibtex generates warnings (I think) and does include the references. It would be nicer if there was an argument to read.bib, e.g., strict = {T/F}, where strict=FALSE would allow entries not containing all required fields. But perhaps that's buried too deep in the implementation.
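To illustrate where the strictness comes from, here is a small sketch that calls utils::bibentry() directly, outside of read.bib(). The wrapper try_entry() is a hypothetical helper, the author and title are placeholders, and filling in publisher = "Unknown" is only an assumed stopgap, not something read.bib() does for you.

```r
# Hypothetical helper: return the bibentry, or the error message if validation fails.
try_entry <- function(...) {
  tryCatch(utils::bibentry(...), error = function(e) conditionMessage(e))
}

# A Book without a publisher is rejected by bibentry() itself.
incomplete <- try_entry(bibtype = "Book", key = "Cotes:1722",
                        author = "A. Placeholder", title = "Placeholder title",
                        year = "1722")

# The same entry with a dummy publisher passes validation.
complete <- try_entry(bibtype = "Book", key = "Cotes:1722",
                      author = "A. Placeholder", title = "Placeholder title",
                      year = "1722", publisher = "Unknown")

print(incomplete)                     # the validation error message
print(inherits(complete, "bibentry"))
```

So an entry dropped by read.bib() can often be rescued by adding a dummy value for the missing field in the .bib file, at the cost of touching the database.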

> bib1 <- read.bib("C:/localtexmf/bibtex/bib/timeref.bib")
ignoring entry 'Donoho-etal:1988' (line 40) because :
    A bibentry of bibtype ‘InCollection’ has to correctly specify the field(s): booktitle

ignoring entry 'Martonne:1919:map' (line 90) because :
    A bibentry of bibtype ‘InCollection’ has to correctly specify the field(s): booktitle, publisher, year

ignoring entry 'Touraine:2002' (line 5423) because :
    A bibentry of bibtype ‘Book’ has to correctly specify the field(s): publisher

ignoring entry 'Cotes:1722' (line 6004) because :
    A bibentry of bibtype ‘Book’ has to correctly specify the field(s): publisher

ignoring entry 'Quetelet:1842' (line 6605) because :
    A bibentry of bibtype ‘Book’ has to correctly specify the field(s): publisher

ignoring entry 'Wenzlick:1950' (line 6663) because :
    A bibentry of bibtype ‘Unpublished’ has to correctly specify the field(s): note

ignoring entry 'Verniquet:1791' (line 6695) because :
    A bibentry of bibtype ‘Book’ has to correctly specify the field(s): publisher


> length(bib1)
[1] 628
>
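Since the seven ignored entries never make it into bib1, a key comparison based on read.bib() cannot see them. One rough workaround, sketched here under assumptions, is to pull the keys straight from the raw text. bib_keys_raw() is a hypothetical helper, not part of the bibtex package; it assumes every entry starts on its own line in the form @Type{key, and it does no real BibTeX parsing.

```r
# Hypothetical helper: extract entry keys from raw .bib lines with a regex.
# Assumes each entry opens on its own line as "@Type{key," (or "@Type(key,").
bib_keys_raw <- function(lines) {
  pat <- "^\\s*@([A-Za-z]+)\\s*[{(]\\s*([^,\\s]+)\\s*,.*$"
  hits <- grepl(pat, lines, perl = TRUE)
  sub(pat, "\\2", lines[hits], perl = TRUE)
}

# Placeholder file contents; only the keys are taken from this thread.
tmp <- tempfile(fileext = ".bib")
writeLines(c(
  "@String{jan = \"January\"}",
  "@Book{Cotes:1722,",
  "  author = {Placeholder},",
  "  title = {Placeholder}",
  "}",
  "",
  "@InCollection{Donoho-etal:1988,",
  "  crossref = {Placeholder:1988}",
  "}"
), tmp)

raw_keys <- bib_keys_raw(readLines(tmp))
print(raw_keys)
```

Comparing such raw keys with unlist(bib1$key) via setdiff() would also list exactly the entries that read.bib() skipped.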

A suggestion: it would be nice if bibtex provided some extractor functions for bibentry fields.
So that only a subset of fields is read as opposed to all fields?
If you read all fields, you can easily subset afterwards (again using $-notation).

No, it was only lack of documentation, and perhaps an example or two for read.bib, that caused me to stumble.
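In that spirit, here is the kind of example that might have helped, as a sketch only. It uses hand-built entries from utils::bibentry() in place of read.bib() output; the keys are from this thread, while the titles are placeholders and the year values are simply assumed for illustration.

```r
# Placeholder entries standing in for read.bib() output.
x <- c(
  utils::bibentry(bibtype = "Misc", key = "Langren:1646",
                  title = "Placeholder title one", year = "1646"),
  utils::bibentry(bibtype = "Misc", key = "Fisher:1915a",
                  title = "Placeholder title two", year = "1915"),
  utils::bibentry(bibtype = "Misc", key = "Stigler:2012",
                  title = "Placeholder title three", year = "2012")
)

# Any field can be extracted with $, just like the key.
years  <- as.integer(unlist(x$year))
titles <- unlist(x$title)

# Subset the entries afterwards with an ordinary logical index.
old <- x[years < 1900]
old_keys <- unlist(old$key)
print(old_keys)
```

The same pattern works for any field present in all selected entries; fields missing from some entries come back as NULL in the list and are silently dropped by unlist().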


hth,
Z



--
Michael Friendly     Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University      Voice: 416 736-2100 x66249 Fax: 416 736-5814
4700 Keele Street    Web:   http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
