On 8/6/2012 11:54 AM, Achim Zeileis wrote:
On Mon, 6 Aug 2012, Michael Friendly wrote:

I have two versions of a bibtex database which have gotten badly out of sync. I need to find all the entries in bib2 which are not contained in bib1, according to their bibtex keys. But I can't figure out how to extract a list of the bibentry keys in these databases.

read.bib() returns a "bibentry" object so you can simply do this as usual
for "bibentry" objects with $key:
One thing that was confusing was that read.bib returns a "bibentry" object, all of whose
elements are also "bibentry" objects.

x <- read.bib(...)
x$key

or maybe

unlist(x$key)

Whatever is more convenient for you. See ?bibentry for more details.
That is what I was missing -- it would have helped to find a link to utils::bibentry in the [rather scanty] documentation for read.bib. I'm now a happy camper in this regard. What I wanted is given by:

bib1 <- read.bib("C:/localtexmf/bibtex/bib/timeref.bib")
length(bib1)
keys1 <- unlist(bib1$key)

bib2 <- read.bib("W:/texmf/bibtex/bib/timeref.bib")
length(bib2)
keys2 <- unlist(bib2$key)


> which(! keys1 %in% keys2)
[1] 133 249 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628
> keys1[which(! keys1 %in% keys2)]
[1] "Langren:1646" "Fisher:1915a" "Stigler:2012"
[4] "Wainer:2011" "Minard:1860a" "CNAM:1906"
[7] "Wainer:2012" "Wainer-Ramsay:2010" "Stephenson-Galneder:1969"
[10] "Waters:1964" "Agathe:1988" "Gascoigne:2007"
[13] "Krzywinski:2009" "Bolle:1929" "Balbi:1829"
[16] "Bills-Li:2005" "Lewi:2006" "Fletcher:1851"
[19] "Perrot:1976"
>

As a side note, I searched extensively for bibtex tools that would help me resolve the differences between two related bibtex files, but none was as simple as this, once I could get the keys. Thanks to Roman for providing this
infrastructure!

So, ignoring for now differences in the contents of the bibentries, a useful tool for my purpose is bibdiff(),

bibdiff <- function(bib1, bib2) {
    # keys of each database as plain character vectors
    keys1 <- unlist(bib1$key)
    keys2 <- unlist(bib2$key)
    # keys present in one database but not the other
    only1 <- keys1[which(! keys1 %in% keys2)]
    only2 <- keys2[which(! keys2 %in% keys1)]
    cat("Only in bib1:\n")
    print(only1)
    cat("Only in bib2:\n")
    print(only2)
}

> bibdiff(bib1, bib2)
Only in bib1:
[1] "Langren:1646" "Fisher:1915a" "Stigler:2012"
[4] "Wainer:2011" "Minard:1860a" "CNAM:1906"
[7] "Wainer:2012" "Wainer-Ramsay:2010" "Stephenson-Galneder:1969"
[10] "Waters:1964" "Agathe:1988" "Gascoigne:2007"
[13] "Krzywinski:2009" "Bolle:1929" "Balbi:1829"
[16] "Bills-Li:2005" "Lewi:2006" "Fletcher:1851"
[19] "Perrot:1976"
Only in bib2:
[1] "Langren:1644" "Quetelet:1842"
>

which gives me the complete answer, as far as it goes.
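
A variant of bibdiff() that returns the two key sets instead of printing them might look like the sketch below. It assumes only that both arguments are "bibentry" objects. The helper mk_entry(), the name bibdiff_keys(), and the toy entries are made up for illustration; the entries are built with utils::bibentry() so that the example runs without any .bib file.

```r
# Hypothetical helper: build a minimal "Misc" entry (Misc has no required fields).
mk_entry <- function(key, year) {
    utils::bibentry(bibtype = "Misc", key = key,
                    title = paste("Placeholder title for", key),
                    author = utils::person("A.", "Author"), year = year)
}

# Toy stand-ins for the two databases that read.bib() would return.
toy1 <- c(mk_entry("Langren:1646", "1646"),
          mk_entry("Fisher:1915a", "1915"),
          mk_entry("Wainer:2011", "2011"))
toy2 <- c(mk_entry("Langren:1644", "1644"),
          mk_entry("Wainer:2011", "2011"))

# Return the keys unique to each side as a list, using setdiff().
bibdiff_keys <- function(bib1, bib2) {
    keys1 <- unlist(bib1$key)
    keys2 <- unlist(bib2$key)
    list(only1 = setdiff(keys1, keys2),
         only2 = setdiff(keys2, keys1))
}

res <- bibdiff_keys(toy1, toy2)
print(res)
```

Returning a list makes it possible to index the original object afterwards, e.g. toy1[unlist(toy1$key) %in% res$only1], rather than only reading the keys off the console.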


A minor question: Is there some way to prevent read.bib from ignoring entries that do not contain all required fields?

Also not really an issue with read.bib itself. read.bib() wants to return a "bibentry" object, but bibentry() only allows creating objects that are valid BibTeX, i.e., that have all required fields.

It turns out that read.bib seems to be pickier than bibtex itself -- it does not accommodate crossref= fields, used for InCollection items; these resolve correctly using bibtex.
For some books in my database, the publisher is unknown. bibtex generates warnings (I think) and does include the references. It would be nicer if there were an argument to read.bib, e.g., strict = {T/F}, where strict=FALSE would allow entries not containing all required fields. But perhaps that's buried too deep in the implementation.

> bib1 <- read.bib("C:/localtexmf/bibtex/bib/timeref.bib")
ignoring entry 'Donoho-etal:1988' (line 40) because :
A bibentry of bibtype ‘InCollection’ has to correctly specify the field(s): booktitle

ignoring entry 'Martonne:1919:map' (line 90) because :
A bibentry of bibtype ‘InCollection’ has to correctly specify the field(s): booktitle, publisher, year

ignoring entry 'Touraine:2002' (line 5423) because :
A bibentry of bibtype ‘Book’ has to correctly specify the field(s): publisher

ignoring entry 'Cotes:1722' (line 6004) because :
A bibentry of bibtype ‘Book’ has to correctly specify the field(s): publisher

ignoring entry 'Quetelet:1842' (line 6605) because :
A bibentry of bibtype ‘Book’ has to correctly specify the field(s): publisher

ignoring entry 'Wenzlick:1950' (line 6663) because :
A bibentry of bibtype ‘Unpublished’ has to correctly specify the field(s): note

ignoring entry 'Verniquet:1791' (line 6695) because :
A bibentry of bibtype ‘Book’ has to correctly specify the field(s): publisher

> length(bib1)
[1] 628
>
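
The check behind these messages can be reproduced with utils::bibentry() directly. The sketch below is only an illustration: the helper make_entry() and all field values are placeholders, and only the bibtype and the missing publisher matter. Relabelling an entry as Misc is shown as one possible way around the check, not as something suggested in this thread.

```r
# Hypothetical helper: same fields every time, only the bibtype varies.
make_entry <- function(bibtype) {
    utils::bibentry(bibtype = bibtype, key = "Cotes:1722",
                    title = "Placeholder title",
                    author = utils::person("A.", "Author"),
                    year = "1722")
}

# A Book without a publisher: bibentry() signals an error
# instead of returning an object.
book <- tryCatch(make_entry("Book"), error = function(e) e)
if (inherits(book, "error")) message(conditionMessage(book))

# The same fields are accepted as Misc, which has no required fields.
misc <- make_entry("Misc")
print(misc$key)
```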

A suggestion: it would be nice if bibtex provided some extractor functions for bibentry fields.

So that only a subset of fields is read as opposed to all fields?

If you read all fields, you can easily subset afterwards (again using $-notation).

No, it was only lack of documentation, and perhaps an example or two for read.bib that caused me to
stumble.
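
For the record, the $-based extraction and subsetting mentioned above can be sketched as follows. The entries are toy data built with utils::bibentry(); the helper mk_entry() and the field values are made up for illustration.

```r
# Hypothetical helper: build a minimal "Misc" entry.
mk_entry <- function(key, year) {
    utils::bibentry(bibtype = "Misc", key = key,
                    title = paste("Placeholder title for", key),
                    author = utils::person("A.", "Author"), year = year)
}

x <- c(mk_entry("Langren:1646", "1646"),
       mk_entry("Wainer:2011", "2011"),
       mk_entry("Stigler:2012", "2012"))

# $ on a multi-entry "bibentry" object returns a list with one element
# per entry; unlist() flattens it to a character vector.
years <- unlist(x$year)
titles <- unlist(x$title)

# Ordinary [ indexing keeps the "bibentry" class, so entries can be
# filtered on any extracted field.
recent <- x[as.integer(years) >= 2000]
print(unlist(recent$key))
```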

hth,
Z


--
Michael Friendly     Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University      Voice: 416 736-2100 x66249 Fax: 416 736-5814
4700 Keele Street    Web:   http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA
