[R] Pattern Matching within Vector?

2009-09-21 Thread Anne-Marie Ternes
Dear mailing list,

I'm stuck with a tricky problem here - at least it seems tricky to me,
being not really talented in pattern matching and regex matters.

I'm analysing amino acid mutations by position and type of mutation.
E.g. (fictitious example) in position 92, I can find L92V, L92MV,
L92I... L is in this example the wild-type amino-acid, and everything
behind the position number is a mutation (single amino acid or
mixture). I'm only interested in the mutation information, so:

Say I've got this vector:
bla <- c("V", "MV", "I", "IL", "PT", "M", "E", "OM")

I'd like to count only those elements that are "truly unique"
mutations, i.e. count "V", "MV" as 1, "I", "IL" as 1, "PT" as 1, "M" as
1, "E" as 1, not count "OM".

I could do it iteratively:
Element 1: V. Keep.
Element 2: MV. Match Keep vs New -> 1. I already have a V, so don't count.
Element 3: I. Match Keep vs New -> 0. I is new, keep. Keep = V,I
Element 4: IL. Match Keep vs New -> 1. I already have an I, so don't count.
Element 5: PT. Match Keep vs New -> 0. PT is new, keep. Keep = V,I,PT
Element 6: M. Match Keep vs New -> 0. M is new, keep. Keep = V,I,PT,M
Element 7: E. Match Keep vs New -> 0. E is new, keep. Keep = V,I,PT,M,E
Element 8: OM. Match Keep vs New -> 1. I already have an M, so don't count.

Keep vector= (V,I,PT,M,E), count =5

OK. There must be a more elegant way to do this! Something with
vector-wise pattern matching or so?... By the way, I don't care e.g.
which of "V" or "MV" is counted, what is important is that they are
only counted as 1.

Thanks for your help!

Anne-Marie
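
The iterative procedure described above can be written down directly in R. This is only a minimal sketch: an element is kept if it shares no single letter with any element kept so far. The function name keep_distinct is made up for illustration, and the result depends on the order of the vector, just as the step-by-step walk-through does.

```r
bla <- c("V", "MV", "I", "IL", "PT", "M", "E", "OM")

# Hypothetical helper: keep an element only if none of its letters
# already occurs in an element that was kept earlier.
keep_distinct <- function(x) {
  kept <- character(0)   # elements accepted so far
  seen <- character(0)   # single letters of the accepted elements
  for (el in x) {
    letters_el <- strsplit(el, "", fixed = TRUE)[[1]]
    if (!any(letters_el %in% seen)) {
      kept <- c(kept, el)
      seen <- c(seen, letters_el)
    }
  }
  kept
}

kept <- keep_distinct(bla)
n_distinct <- length(kept)
```

Here kept holds the "Keep" vector and n_distinct the count. Because "MV" is skipped, "M" is still treated as new when it turns up later, which matches the walk-through.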

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] position legend below x-axis title

2008-07-02 Thread Anne-Marie Ternes
Dear helpers,

I'm using an R script on several different datasets, so the axis
scales may vary quite a lot from dataset to dataset. What I'm looking
for is a way to automatically position the (horizontal) legend in the
space below the x-axis title, and to make sure that the legend stays
within the limits of the lower inner or outer margin.

I'm aware of plot, device and figure regions, and of the "din, fin,
pin, usr, mai, mar, omi, oma and xpd" parameters, of inner and outer
margins.

I cannot simply position my legend on "minus something", as that
depends on "usr" coordinates, and those depend on the scale of the
y-axis.

I tried finding the ideal spot by taking the figure height and
subtracting the upper margin, the height of the plot region, and half
of the lower margin. This is quite tedious and doesn't help me, as I
get a spot in inches, which doesn't correspond to the "usr"
coordinates.

I also tried to convert between inches and "usr" coordinates using
"xy.coords", but the result did not correspond to the position
returned by "locator".

I'm also aware of "mtext" and its fabulous arguments "side" and
"line", but there I lose the functionality of displaying the legend
symbols.

Finally, I found a hint about using "layout", but there I admit I
would need more than a hint - a small tutorial or graphical example
with code would be very helpful.

So, the question is whether there is by any chance a way to pass "side"
and "line" arguments to "legend", and if not, what is the best way to
do what I'm trying to do? BTW, the plot types involved are mostly line
plots and barplots.

Thanks a lot for your help,

Anne-Marie
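
One possible way to place the legend independently of the y-axis scale is to let R do the unit conversion with grconvertX() and grconvertY() from the graphics package, and to allow drawing in the margin with xpd = NA. The sketch below is only an illustration: the data, the margin size of 7 lines and the 2% offset from the bottom of the figure region are arbitrary assumptions.

```r
# Enlarge the bottom margin so there is room below the x-axis title.
op <- par(mar = c(7, 4, 2, 1))

x <- 1:10
plot(x, x^2, type = "l", xlab = "x", ylab = "y")
lines(x, 10 * x, lty = 2)

# Horizontal centre of the plot region, expressed in "usr" coordinates.
x_mid <- grconvertX(0.5, from = "npc", to = "user")
# A height 2% above the bottom edge of the figure region, in "usr" coordinates.
y_bottom <- grconvertY(0.02, from = "nfc", to = "user")

# xpd = NA allows drawing outside the plot region; yjust = 0 puts the
# bottom edge of the legend at y_bottom, xjust = 0.5 centres it on x_mid.
leg <- legend(x_mid, y_bottom, legend = c("quadratic", "linear"),
              lty = 1:2, horiz = TRUE, bty = "n",
              xjust = 0.5, yjust = 0, xpd = NA)

usr <- par("usr")   # plot region limits, kept for checking the placement
par(op)
```

Since the position is given relative to the figure region ("nfc") and then converted to "usr" units, the same code should work whatever the scale of the y-axis is. The value returned by legend() contains the rectangle that was drawn, which can be used to check that it fits into the margin.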



Re: [R] Sweave / Latex per-chapter output

2008-07-02 Thread Anne-Marie Ternes
Hello to all who have helped me on this topic,

first I need to apologize for apparently replying only now... In fact
I use the "Pan" Newsreader to read the list, and I posted a reply to
the thread a week after your suggestions through Pan, and I only now
realised that the posting never arrived on the list although Pan gave
me no error message at all!

So, let me try again using the good old email-to-email way.

You all helped me so much! This is what I'm doing now:

- I separated my large file into several chapter files
- In each file, I include a Sweave options file using "\SweaveInput"
- I make sure to first run a "pre" file, which checks if some files
containing data from the database that I need repeatedly are there or
not, and if they are not too old. If the files are absent or expired,
the database is queried and the files recreated.
- In each file, I then "source" an init.R file, which reads the
previously created files and sets some "global" variables I'll need
all the time. So, the querying is done at most once, while the file
reading and variable setting is re-run for each chapter, which doesn't
seem to be a problem (performance-wise).
- In my master tex file, I include the different chapters. If I want
to generate PDF for only a single chapter, I cannot use the
"\includeonly" directive, because I will always need to run "pre"
first, and then the chapter I want. So, I just comment out the things
I don't want to run.
- In "my" Makefile (it's Mark's really, with some minor adaptations),
I specify the following to make sure "pre" is run first before running
the individual chapters:

RNWFILES = pre.Rnw intro.Rnw $(wildcard c*.Rnw)

So, with all this I can get "whole document" PDFs or "per-chapter" PDFs - Great!

Actually, then I go on and feed the tex file(s) to "latex2html", a
great tool, to generate HTML equivalents all ready with navigation,
buttons and all.

Maybe it's not yet perfect, but for sure it is much, much better than before!

Greetings from Luxembourg,

Anne-Marie
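
The "pre" step described above (re-query only if a data file is absent or too old) could look roughly like the following. This is a sketch under assumptions, not the actual pre file: the function names, the 24-hour limit and the fake query standing in for the database call are all invented for illustration.

```r
# Return TRUE if the cache file exists and is younger than max_age_hours.
cache_is_fresh <- function(path, max_age_hours = 24) {
  if (!file.exists(path)) return(FALSE)
  age <- as.numeric(difftime(Sys.time(), file.info(path)$mtime,
                             units = "hours"))
  age < max_age_hours
}

# Re-create the cache with fetch() only when needed, then read it back.
# fetch is a placeholder for the real database query.
cached_data <- function(path, fetch, max_age_hours = 24) {
  if (!cache_is_fresh(path, max_age_hours)) {
    saveRDS(fetch(), path)
  }
  readRDS(path)
}

# Illustration with a fake query that counts how often it is called.
n_queries <- 0
fake_query <- function() {
  n_queries <<- n_queries + 1
  data.frame(region = c("A", "B"), population = c(1000, 2500))
}

cache_file <- tempfile(fileext = ".rds")
first  <- cached_data(cache_file, fake_query)   # queries and writes the file
second <- cached_data(cache_file, fake_query)   # served from the file
```

With this layout each chapter can call cached_data() on its own, and the expensive query only runs when the file is missing or expired.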



[R] recursively divide a value to get a sequence

2008-07-09 Thread Anne-Marie Ternes
Hi,

if given the value of, say, 15000, I would like to be able to divide
that value recursively by, say, 5, and to get a vector of a determined
length, say 9, the last value being (set to) zero- i.e. like this:

15000 3000 600 120 24 4.8 0.96 0.192 0

These are in fact concentration values from an experiment. For my
script, I get only the starting value (here 15000), and the factor by
which concentration is divided for each well, the last one having, by
definition, no antagonist at all.

I have tried to use "seq", but it can "only" do positive or negative
increments. I didn't find a way with "rep", "sweep" etc. either. These
functions normally start from an existing vector, which is not the case
here, as I have only got a single value to start with.

I suppose I could do something "loopy", but I'm sure there is a better
way to do it.

Thanks a lot for your help, hope the question is not too dumb...

Anne-Marie
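
A loop-free way to build such a series is to divide the starting value by increasing powers of the factor and append the final zero. This is a minimal sketch; the function name dilution_series is made up for illustration.

```r
# start:  first concentration
# factor: divisor applied from one well to the next
# n:      total length of the result, including the final 0
dilution_series <- function(start, factor, n) {
  powers <- seq_len(n - 1) - 1      # 0, 1, ..., n - 2
  c(start / factor^powers, 0)       # geometric series, then the zero well
}

conc <- dilution_series(15000, 5, 9)
```

Since the arithmetic operators in R are vectorised, factor^powers gives all the divisors at once, so no explicit recursion or loop is needed.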



Re: [R] recursively divide a value to get a sequence

2008-07-09 Thread Anne-Marie Ternes
Keith,

I am simply baffled! Didn't think a second about doing it this way, tsss -
Great!

Thanks also for Daniel's, Jim's and Bart's proposals!

R is cool, I realise it every day again :-)

Thanks!!

On Wed, Jul 9, 2008 at 12:33 PM, Jim Lemon <[EMAIL PROTECTED]> wrote:
> On Wed, 2008-07-09 at 11:40 +0200, Anne-Marie Ternes wrote:
>> Hi,
>>
>> if given the value of, say, 15000, I would like to be able to divide
>> that value recursively by, say, 5, and to get a vector of a determined
>> length, say 9, the last value being (set to) zero- i.e. like this:
>>
>> 15000 3000 600 120 24 4.8 0.96 0.192 0
>>
>> These are in fact concentration values from an experiment. For my
>> script, I get only the starting value (here 15000), and the factor by
>> which concentration is divided for each well, the last one having, by
>> definition, no antagonist at all.
>>
>> I have tried to use "seq", but it can "only" do positive or negative
>> increment. I didn't either find a way with "rep", "sweep" etc. These
>> function normally start from an existing vector, which is not the case
>> here, I have only got a single value to start with.
>>
>> I suppose I could do something "loopy", but I'm sure there is a better
>> way to do it.
>>
> Well, if you really want to do it recursively (and maybe loopy as well)
>
> recursivdiv<-function(x,denom,lendiv,firstpass=TRUE) {
>  # on the first call, reserve one slot for the starting value x
>  if(firstpass) lendiv<-lendiv-1
>  if(lendiv > 1) {
>   # next value, followed by the rest of the series
>   divvec<-c(x/denom,recursivdiv(x/denom,denom,lendiv-1,FALSE))
>  }
>  else divvec<-0 # last slot is zero by definition
>  if(firstpass) divvec<-c(x,divvec)
>  return(divvec)
> }
>
> Jim
>
>
>



[R] tryCatch - return from function to main script

2008-07-15 Thread Anne-Marie Ternes
Dear helpers,

I've got a main script, which calls a function 4 times, once on each of
4 different datasets. This function runs "nls" and is located in
another R script which is sourced into my main script.

What I would like to have is this:

If, e.g. in the 3rd call of the function, nls fails, because it can't
converge, I would like it to return an error (value or message), and
continue with the 4th call in my main script.

I've tried "try", but it always completely stops execution. I've also
played around with "tryCatch", but to be honest, the help page is
quite cryptic to me. I'm sure "tryCatch" has a way of being told to
"ok, stop this, and continue with the main script".

As I'm quite in a hurry (need to finish this before leaving tomorrow),
I'd be glad if you could give me a very practical example - I promise
to look deeper into the details of exception handling in R when I'm
back.

Thanks a lot!!

Anne-Marie
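
A practical pattern for this is to wrap the "nls" call in "tryCatch" with an error handler that returns a value (here NULL) instead of stopping, so the calling script simply goes on to the next dataset. The following is a sketch with invented toy data: the model, the start values and the function name fit_safely are assumptions, and the third dataset is built so that nls cannot identify the parameters.

```r
# Fit the model; on error, report the message and return NULL instead
# of stopping the calling script.
fit_safely <- function(d) {
  tryCatch(
    nls(y ~ a * exp(b * x), data = d, start = list(a = 2, b = 0.3)),
    error = function(e) {
      message("nls failed: ", conditionMessage(e))
      NULL
    }
  )
}

# Toy data: an exponential curve with a little deterministic noise.
good <- data.frame(x = 1:10)
good$y <- 2 * exp(0.3 * good$x) * rep(c(1.01, 0.99), 5)

# With a constant x, a and b cannot be separated, so nls gives an error.
bad <- data.frame(x = rep(1, 10), y = good$y)

datasets <- list(good, good, bad, good)
results <- lapply(datasets, fit_safely)   # the 4th call still runs
failed <- vapply(results, is.null, logical(1))
```

Afterwards failed shows which calls did not produce a fit, and the main script can decide what to do with those cases.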



[R] genotype analysis

2008-03-26 Thread Anne-Marie Ternes
Dear mailing list,

I'm still quite a newbie in the statistical analysis of
genotype/allele data, and more generally in the analysis of
categorical variables. Moreover, I'm currently totally confused by the
many R packages available to do such analysis.

Here is my case: I've got a list of genes, and a number of
case-control population pairs, and for each population and gene, the
various genotypes that have been found. I've got both aggregate data
(e.g. gene1: homozygote wildtype: 201, heterozygote mutation carrier:
34, homozygote mutation carrier: 5) and per-gene data (i.e. for gene1
a list of e.g. "V/V", "V/I", "I/I" etc).

The question asked is whether there is a difference in the mutation
pattern between the case and the control groups influencing the
outcome, both at the level of a single gene, and at the level of their
combination. Moreover, I would like to check for linkage
disequilibrium (LD), as I know that some of these genes are located
quite closely on the chromosome.

OK, so up to now I've been doing the Chi-square tests, McNemar matched
pairs test, Fisher test if my numbers were too small.

As for the LD question, if I have understood correctly, I have to use
log-linear regression. I have been trying several R packages, and I'm
so confused now, because I don't know which one is best suited for my
problem. I have to add that I'm new also to log-linear regression...

I've used "hwde", and read the paper on which it is based (see hwde
doc), but the package leaves out certain output rows that are shown in
the paper, and it doesn't show which of the output rows is
significant, as the paper does. Is there any simple way to interpret
"hwde" output (something like a p-value)?

Then there are the "GeneticsBase", "Genetics", "mapLD",
"Hardy-Weinberg" packages. Some work only for a single gene, some
apply a thing called "MLE", some "generalized linear models", etc.

I know these questions are as much about basic statistics as about R.
But I'd be glad if you could help me find the best solution for my
type of analysis, or point me to good resources that show me how to
do this. The problem is that most resources show "how to" do the
analysis, but they don't explain at all how to *interpret* their
output.

Thanks a lot in advance,

Anne-Marie
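
As a small worked example of a test that gives a single p-value, the aggregate counts quoted above for gene1 (201 / 34 / 5) can be checked against Hardy-Weinberg proportions with a plain chi-square test in base R. This is only a sketch of the textbook calculation, not of what "hwde" does, and the variable names are invented.

```r
# Genotype counts quoted in the post: wild-type homozygotes,
# heterozygotes, homozygous mutation carriers.
obs <- c(wt_wt = 201, wt_mut = 34, mut_mut = 5)
n <- sum(obs)

# Estimated frequency of the wild-type allele.
p <- unname((2 * obs["wt_wt"] + obs["wt_mut"]) / (2 * n))
q <- 1 - p

# Genotype counts expected under Hardy-Weinberg equilibrium.
expected <- n * c(wt_wt = p^2, wt_mut = 2 * p * q, mut_mut = q^2)

# Pearson chi-square statistic with 1 degree of freedom
# (3 classes - 1 - 1 estimated allele frequency).
chi_sq <- sum((obs - expected)^2 / expected)
p_value <- pchisq(chi_sq, df = 1, lower.tail = FALSE)
```

A small p_value suggests that the genotype counts deviate from Hardy-Weinberg proportions. Note that the expected count for the rarest genotype is small here, so the chi-square approximation is rough and an exact test may be preferable.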



[R] Sweave / Latex per-chapter output

2008-05-21 Thread Anne-Marie Ternes
Dear R-help,

I am using Sweave and pdflatex to generate a large report from data
contained in my database (Postgres via RODBC). Currently, I work with
a single R/Sweave file, containing several "chapter" indications for
the Latex engine. My master tex file sets the document class, and
includes the introduction, the main Sweave file, and a conclusions and
reference file. I use a makefile to produce the final PDF (based on
the thread "Sweave, R and complex latex projects":
http://tolstoy.newcastle.edu.au/R/e2/help/06/11/4891.html).

What I would like to do, is to be able to get 2 types of output with
the same code (I'm lazy ;-) ):
1. my large report in a single PDF file, for printing out and distributing
2. a PDF and HTML file *per chapter*, for displaying on our website
and allowing people to download individual chapters

I have tried the following things:
- see if pdflatex has an option to split PDF output per chapter; as
far as I see, it doesn't
- separate the Sweave file into chapter parts. The problems here are
1) that I do a certain number of R preparations (variables setting,
table querying) which are data that I will need in later parts of the
code, 2) that I would need to embed the generated tex files with
per-chapter master tex files setting the documentclass and other
options and including the chapter; I also tried to see if it was
possible to tell pdflatex to assume documentclass X even if it wasn't
specified in the file, but that doesn't seem to work either
- generate my large PDF report as usual and manually cut it into
chapters (tedious)
- use R via PHP to output per-chapter HTMLs which I could turn into
PDFs using output buffering; this works for the graphics, but I'm
unable to get back e.g. tabular data for proper display; also I would
lose the latex-y beauty of my PDF

As I'm a novice in Latex and Makefile usage, I'd be glad if you could
tell me if what I want to do is feasible (I'm sure it is), and which
would be the best, fussless method to do it (i.e. generate both types
of output without changing the R/Sweave code).

I know you'll probably tell me to break my long Sweave code into
smaller parts, but as I briefly said above, I do some variable setting
and table querying at the start - things I will repeatedly need in
later chapters (e.g. I query a population table for computing
incidence rates several times in later chapters). If there is a better
way to split the code without having to requery the database at each
chapter, I'll be glad to know about that too!

BTW, I'm working on Ubuntu Linux.

Thanks a lot for your insight,

Anne-Marie
