On Jan 5, 2011, at 3:19 PM, Anthony Staines wrote:

> Dear colleagues,
>
> This may be a question with a really obvious answer, but I
> can't find it. I have access to a large file with real
> medical record identifiers (mixed strings of characters and
> numbers) in it. These represent medical events for many
> thousands of people. It's important to be able to link
> events for the same people.
>
> It's much more important that the real record numbers are
> strongly obscured. I'm interested in some kind of strong
> one-way hash function to which I can feed the real numbers
> and get back unique codes for each record identifier fed
> in. I can do this on the health service system, and I have
> to do this before making further use of the data!
>
> There is the 'digest' function, in the digest package, but
> this seems to work on the whole vector of IDs, producing, in
> my case, a vector with 60,000 identical entries.
>
> H.Out$P_ID = digest(H.In$MRNr,serialize=FALSE, algo='md5')
>
> I could do this in Perl, but I'd have to do quite a bit of
> work to get it installed.
>
> Any quick suggestions?
> Anthony Staines

Try using sapply():

L <- replicate(60000, paste(sample(letters, 10, replace = TRUE), collapse = ""))

> str(L)
 chr [1:60000] "dfederergw" "nwphehurvb" "avzmvltrhn" ...

> head(L)
[1] "dfederergw" "nwphehurvb" "avzmvltrhn" "ecmeiasmbk" "kmlcxydygl"
[6] "wpftnyrzwe"

# Use sapply() to run digest() over each element of L
> system.time(L.Digest <- sapply(L, digest))
   user  system elapsed
  6.920   0.031   7.361

> str(L.Digest)
 Named chr [1:60000] "6d5861904ee004d251504cb0f731a69a" ...
 - attr(*, "names")= chr [1:60000] "dfederergw" "nwphehurvb" "avzmvltrhn" "ecmeiasmbk" ...

> head(L.Digest)
                        dfederergw                         nwphehurvb
"6d5861904ee004d251504cb0f731a69a" "bf8ee61f69c83468988cad681a9f7ad0"
                        avzmvltrhn                         ecmeiasmbk
"ba1c66af41359cf1a3f5e91f22c6dfe5" "95ca2deaa6c1118852c9ffed71994a7f"
                        kmlcxydygl                         wpftnyrzwe
"f3647a7937a2c484123ef33bb52a27ac" "e84f17180703e4805493d88a760be682"

HTH,

Marc Schwartz
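
A variation on the sapply() approach above, offered only as a minimal sketch: it hashes each distinct identifier once and maps the codes back, so repeated IDs share a code and events for the same person remain linkable. The vector `ids` is a hypothetical stand-in for the real ID column (H.In$MRNr in the question), and the choice of algo = "sha256" with serialize = FALSE is an assumption of this sketch, not what the reply above ran (it used the digest() defaults, which hash the serialized R object with MD5).

```r
library(digest)

# Hypothetical example data standing in for H.In$MRNr
ids <- c("A123", "B456", "A123", "C789")

# Hash each distinct identifier once. serialize = FALSE hashes the
# string itself rather than R's serialized form of the object.
uniq  <- unique(ids)
codes <- vapply(uniq, digest, character(1),
                algo = "sha256", serialize = FALSE,
                USE.NAMES = FALSE)

# Map the codes back onto the full vector; identical IDs get identical codes.
p_id <- codes[match(ids, uniq)]
```

USE.NAMES = FALSE matters here: with the default, the original identifiers would be carried along as the names of the result, as they are in the str(L.Digest) output above.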