Re: [R] OT: A philosophical question about statistics

avi.e.gross Tue, 06 May 2025 12:53:31 -0700

Actually, what I would love to see discussed here, as more on topic, is which 
functions and packages commonly used in R use the various kinds of methods you 
mention. Do newer packages lean one way or the other?

What are the drawbacks and advantages? As an example, if a simulation method 
can return a different answer each time it is called, then I see anomalies if 
you try comparing it to results obtained another way. I can imagine two 
methods that each return an answer between .99 and 1.01, where one is a tad 
higher 90% of the time but lower the other 10%. Comparing the results to each 
other, or to a classical method that always returns 1.00, can be misleading as 
it is not always ...
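
To make that concrete, here is a minimal sketch in Python (standard library 
only) of a deterministic estimate next to a bootstrap-style one. The data 
values and the names classical_mean and bootstrap_mean are made up for 
illustration and are not taken from any package.

```python
import random
import statistics

def classical_mean(data):
    """Deterministic: the same input always gives the same answer."""
    return statistics.fmean(data)

def bootstrap_mean(data, n_resamples=1000, rng=None):
    """Average of the means of resamples drawn with replacement."""
    if rng is None:
        rng = random.Random()  # unseeded, so each call can differ
    n = len(data)
    means = [statistics.fmean(rng.choices(data, k=n))
             for _ in range(n_resamples)]
    return statistics.fmean(means)

# Hypothetical measurements scattered around 1.00.
data = [0.97, 1.02, 0.99, 1.01, 1.00, 0.98, 1.03, 1.00]

exact = classical_mean(data)
# Five runs with five different seeds give five slightly different answers.
runs = [bootstrap_mean(data, rng=random.Random(seed)) for seed in range(5)]

print("classical:", exact)
print("bootstrap runs:", runs)
print("spread across runs:", max(runs) - min(runs))
```

A single bootstrap run landing above or below the classical value says little 
by itself. Looking at the spread over several runs, or fixing the seed when 
you need reproducibility, is the fairer comparison.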

One final thought. Sometimes a classical approach may still be useful even if a 
newer twist seems to have advantages. A recent example I noticed was a 
discussion in the book The Da Vinci Code of the אתבש (atbash) substitution 
cipher. The suggestion was to write out the alephbet/alphabet (22 letters in 
ancient Hebrew) from one direction to the other and then rewrite it immediately 
below in the opposite order. For example, the short alphabet ABCDEF would be 
shown as:

ABCDEF
FEDCBA

And you could encode or decode by finding the letter in one line and 
substituting the corresponding letter in the other line.

A character suggested a new trick that avoids the duplication by just folding 
one copy of the line to make:

ABC
FED

You can now find the letter you need in either line and simply replace it with 
the letter in the same position in the other line. This is an elegant solution, 
albeit one that needs a small adjustment when the number of letters is odd.

But which is easier to do on a computer? Obviously both can easily be done, and 
perhaps the first is easier to code even if it occupies a bit more space. 
Basically, you do some form of search in the top line, get the index where the 
letter is found, and reference the entry at the same index in the second line. 
Then again, the second method can use a simple trick with indices to get a 
result in less space. But some might argue that in a language like Python, an 
even simpler way is to create a dictionary/hash and skip any linear 
representation by just asking for atbash['A'] and letting it compute a hash and 
find the address in roughly constant time, on average, no matter how large the 
alphabet being used gets.
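
Here is a rough sketch of the three approaches in Python. The function names 
are my own, and the "folded" version is only my reading of the index trick: 
keep one copy of the alphabet and mirror the index instead of storing a 
second line.

```python
import string

ALPHABET = string.ascii_uppercase  # 26 letters; any alphabet string would do

def atbash_two_lines(ch, alphabet=ALPHABET):
    """Method 1: search the top line, read the same index in the bottom line."""
    reversed_line = alphabet[::-1]      # the second, reversed copy
    i = alphabet.find(ch)               # linear search in the top line
    return reversed_line[i] if i >= 0 else ch

def atbash_folded(ch, alphabet=ALPHABET):
    """Method 2: one copy only; the partner sits at the mirrored index."""
    i = alphabet.find(ch)
    return alphabet[len(alphabet) - 1 - i] if i >= 0 else ch

# Method 3: build a dictionary once, then each lookup is a hash lookup.
ATBASH = dict(zip(ALPHABET, reversed(ALPHABET)))

def atbash_dict(ch):
    return ATBASH.get(ch, ch)           # unknown characters pass through

def encode(text, substitute):
    """Apply a per-character substitution function to a whole string."""
    return "".join(substitute(ch) for ch in text)

for method in (atbash_two_lines, atbash_folded, atbash_dict):
    print(method.__name__, encode("ABCDEF", method))
```

All three give the same mapping, and since atbash is its own inverse, 
encoding twice returns the original text. The first two search the alphabet 
on every call, while the dictionary pays the setup cost once.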

The example may be contrived, but I have seen countless places where people 
debate which of many methods to use, and often the answer turns out to be that 
there are tradeoffs. In one language, there is a sort algorithm that realizes 
that sorting one, two, three and maybe four things is trivially done with a 
few IF statements and only does whatever complex sort is needed if the number 
of items is larger. It works really fast for small routine tasks, and even 
something like a merge sort can be faster this way, as it speeds through the 
regions where it is down to sorting a few things and can skip some of the 
deeper recursive function calls.
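
As a sketch of that idea, and not the actual algorithm of any particular 
language, here is a merge sort in Python that stops recursing once a piece 
has three or fewer items and sorts it with a few comparisons instead. The 
cutoff of three and the names sort_small and hybrid_sort are my own choices.

```python
def sort_small(items):
    """Sort a list of at most three items with a few IF statements."""
    a = list(items)
    if len(a) >= 2 and a[1] < a[0]:
        a[0], a[1] = a[1], a[0]
    if len(a) == 3:
        if a[2] < a[1]:                 # move the largest to the end
            a[1], a[2] = a[2], a[1]
        if a[1] < a[0]:                 # then fix the first pair again
            a[0], a[1] = a[1], a[0]
    return a

def merge(left, right):
    """Merge two already sorted lists into one sorted list."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if right[j] < left[i]:
            out.append(right[j])
            j += 1
        else:
            out.append(left[i])
            i += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out

def hybrid_sort(items):
    """Merge sort that handles tiny pieces directly instead of recursing."""
    if len(items) <= 3:
        return sort_small(items)
    mid = len(items) // 2
    return merge(hybrid_sort(items[:mid]), hybrid_sort(items[mid:]))

print(hybrid_sort([5, 2, 9, 1, 5, 6, 0]))
```

Whether the shortcut pays off in practice depends on the language and the 
data, which is the tradeoff point again.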

R also has some interesting twists in how calculations are done that may help 
guide which of the available statistical functions make sense, as some accept 
various data structures and others do not. Sometimes you can use a matrix or 
one of many kinds of data.frame, for example. 

So I am wondering whether, besides the base R functions, there are fairly 
detailed packages for statistics that are perhaps a bit like the tidyverse, and 
whether some people prefer to use a well-designed and integrated ...?

-----Original Message-----
From: R-help <r-help-boun...@r-project.org> On Behalf Of Kevin Zembower via 
R-help
Sent: Tuesday, May 6, 2025 9:15 AM
To: r-help@r-project.org
Subject: Re: [R] OT: A philosophical question about statistics

Thank you to everyone who responded. I gained a lot of insight into
statistical methods and the nature of statistical thinking. I replied
to some people privately, to limit the traffic on this OT question.

And thank you for the patience of all who were annoyed by this off-
topic question, and who didn't write to complain. I promise to limit
off-topic questions in the future.

-Kevin

On Mon, 2025-05-05 at 15:17 +0000, Kevin Zembower wrote:
> I marked this posting as Off Topic because it doesn’t specifically
> apply to R and Statistics, but is rather a general question about
> statistics and the teaching of statistics. If this is annoying to you,
> I apologize.
> 
> As I wrap up my work in my beginning statistics course, I’d like to ask
> a philosophical question regarding statistics.
> 
> In my course, we’ve learned two different ways to solve statistical
> problems: simulations, using bootstraps and randomized distributions,
> and theoretical methods, using Normal (z) and t-distributions. We’ve
> learned that both systems solve all the questions we’ve asked of them,
> and that both give comparable answers. Out of six chapters that we’ve
> studied in our textbook, the first four only used simulation methods.
> Only the last two used theoretical methods.
> 
> My questions are:
> 
> 1) Why don’t professional statisticians settle on one or the other, and
> just apply that system to their problems and work? What advantage does
> one system have over the other?
> 
> 2) As beginning statistics students, why is it important for us to
> learn both systems? Do you think that beginning statistics students
> will still be learning both systems in the future?
> 
> Thank you very much for your time and effort in answering my questions.
> I really appreciate the thoughts of the members of this group.
> 
> -Kevin

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
