Re: [R] Problem with filling dataframe's column

avi.e.gross Wed, 14 Jun 2023 22:52:19 -0700

Richard, it is indeed possible for different languages to choose different 
approaches.
 
If your point is that an R named list can simulate a Python dictionary (or, for 
that matter, a set), there is some validity to that. You can also use 
environments similarly.
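A minimal sketch of the environment approach, with made-up keys and values. An environment created by new.env() is hashed and modified in place (reference semantics), unlike a named list, which is copied on modification:

```r
# Minimal sketch: an environment used as a mutable, hashed key-value store.
# The keys and values here are made up for illustration.
d <- new.env(hash = TRUE)

assign("apple", 3, envir = d)   # add an entry with assign()
d[["banana"]] <- 5              # ...or with [[<-, as for a named list

d[["apple"]]                                    # look up: 3
exists("cherry", envir = d, inherits = FALSE)   # membership test: FALSE
ls(d)                                           # keys (sorted): "apple" "banana"

rm(list = "apple", envir = d)                   # remove an entry
ls(d)                                           # "banana"
```

Because environments are not copied, a function that receives `d` can change the caller's store, which is closer to how a Python dict behaves than a list is.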
 
Arguably there are differences, including in which notations are built into 
the language. Looked at from the other side, Python chose to make lists a major 
feature: a list can hold any combination of things and can even emulate a 
matrix with sub-lists, and Python also has tuples, which are similar but 
immutable. What it initially neglected was something as simple as a vector 
containing just one kind of content. Today many people simply load numpy (and 
often pandas) to get functionality that is faster and that comes by default in R.
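For readers coming from Python, the contrast can be seen in a couple of lines: an R atomic vector holds a single type and coerces mixed input to a common type, while an R list, like a Python list, keeps each element's own type.

```r
# Atomic vectors hold one type: mixing types coerces everything to character here
v <- c(1, "a", TRUE)
class(v)            # "character"
v                   # "1" "a" "TRUE"

# A list keeps each element's own type, much like a Python list
l <- list(1, "a", TRUE)
sapply(l, class)    # "numeric" "character" "logical"
```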
 
I think this discussion was about my (amended) offhand remark suggesting that R 
factors stored the plain text in a vector attached to the variable, with the 
number stored in the main factor vector serving as the offset into it. If the 
internals changed to use something hashed like a dictionary, fine. I have often 
built data structures like the one in your example to store named items, but I 
called them named lists rather than dictionaries. In one sense the two map onto 
each other, but I would argue that differences remain. For example, in Python 
you can use something immutable like a tuple as a key. 
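To make the tuple point concrete: names in an R list (or environment) are character strings, so a composite key such as Python's `(2023, "June")` has to be encoded as a single string. One common workaround is `paste()`; the helper `make_key()` and the `"|"` separator below are hypothetical choices for illustration only.

```r
# Hypothetical helper: encode a multi-part key as one string.
# The "|" separator is arbitrary and must not occur inside any part.
make_key <- function(...) paste(..., sep = "|")

m <- list()
m[[make_key(2023, "June")]] <- 14
m[[make_key(2023, "July")]] <- 2

names(m)                       # "2023|June" "2023|July"
m[[make_key(2023, "June")]]    # 14
```

Unlike a Python tuple key, the parts are not recoverable with their original types without parsing the string back, which is one of the differences alluded to above.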
 
This is not an argument about which language is better. Each has developed to 
fill particular needs, each has been extended, and quite a few things can now be 
done in either one. Still, it can be interesting to combine the two inside 
RStudio so that each does the part it does better, faster, or in a way you find 
more natural.
 
 
From: Richard O'Keefe <rao...@gmail.com> 
Sent: Wednesday, June 14, 2023 10:34 PM
To: avi.e.gr...@gmail.com
Cc: Bert Gunter <bgunter.4...@gmail.com>; R-help@r-project.org
Subject: Re: [R] Problem with filling dataframe's column
 
Consider
 
  m <- list(foo = c(1, 2), "B'ar" = matrix(1:4, 2, 2), "!*#" = c(FALSE, TRUE))
 
It is a collection of elements of different types/structures, accessible
via string keys (and also by position).  Entries can be added:
 
  m[["fred"]] <- 47
 
Entries can be removed:
 
  m[["!*#"]] <- NULL
 
How much more like a Python dictionary do you need it to be?
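Richard's operations can be extended with the other things one usually does with a Python dictionary. A short sketch follows. It uses `matrix(1:4, 2, 2)`, since `as.matrix()` ignores extra dimension arguments and would return a 4 x 1 matrix.

```r
# Richard's example, using matrix() so that "B'ar" really is a 2 x 2 matrix
m <- list(foo = c(1, 2), "B'ar" = matrix(1:4, 2, 2), "!*#" = c(FALSE, TRUE))
m[["fred"]] <- 47          # add an entry
m[["!*#"]] <- NULL         # remove an entry

names(m)                   # the keys: "foo" "B'ar" "fred"
"fred" %in% names(m)       # membership test: TRUE
m[[2]]                     # access by position also works

# Iterate over key/value pairs
for (k in names(m)) cat(k, "->", class(m[[k]])[1], "\n")

# Caveat: assigning NULL deletes an entry, so storing a NULL value
# needs single-bracket assignment of list(NULL)
m["empty"] <- list(NULL)
length(m)                  # 4
```

Two differences from a Python dict remain visible here: list names need not be unique (`[[` returns the first match), and `NULL` cannot be stored with `[[<-`.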
 
 
 
On Wed, 14 Jun 2023 at 11:25, <avi.e.gr...@gmail.com> wrote:
Bert,

I stand corrected. What I said may once have been true, but the implementation 
apparently has changed at some level.

I did not factor that in.

Nevertheless, whether you use an index as a key or as an offset into an 
attached vector of labels, it seems to work the same, and I think my comment 
applies well enough: changing a few labels instead of scanning lots of entries 
can sometimes be a good thing. As far as I can tell, the external interface 
seems the same for now. 
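The relabeling point can be illustrated with a short sketch on made-up data. It contrasts renaming a factor level with replacing values in a plain character vector; whether the factor route is measurably faster in a given case is not tested here.

```r
# Made-up data: 300,000 entries but only three distinct labels
f <- factor(rep(c("NY", "LA", "SF"), times = 100000))
levels(f)                       # "LA" "NY" "SF" (sorted alphabetically)

# Renaming a label rewrites only the levels attribute (three strings);
# the 300,000 integer codes are left untouched
levels(f)[levels(f) == "LA"] <- "Los Angeles"
levels(f)                       # "Los Angeles" "NY" "SF"

# The equivalent change on a plain character vector tests every element
x <- rep(c("NY", "LA", "SF"), times = 100000)
x[x == "LA"] <- "Los Angeles"

stopifnot(identical(as.character(f), x))   # same values either way
```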

One issue with R for a long time was that it did not offer something more like a 
Python dictionary, and it looks like …

ABOVE

From: Bert Gunter <bgunter.4...@gmail.com> 
Sent: Tuesday, June 13, 2023 6:15 PM
To: avi.e.gr...@gmail.com
Cc: javad bayat <j.bayat...@gmail.com>; R-help@r-project.org
Subject: Re: [R] Problem with filling dataframe's column

Below.

On Tue, Jun 13, 2023 at 2:18 PM <avi.e.gr...@gmail.com> wrote:
>
>  
> Javad,
>
> There may be nothing wrong with the methods people are showing you, and if they 
> satisfy you, great.
>
> But I note you have lots of data in over a quarter million rows. If much of 
> the text data is redundant, and you want to simplify some operations such as 
> changing some of the values to others in multiple ways, have you done any 
> learning about an R feature very useful for dealing with categorical data 
> called "factors"?
>
> If you have a vector or a column in a data.frame that contains text, then it 
> can be replaced by a factor that often takes way less space as it stores a 
> sort of dictionary of all the unique values and just records numbers like 
> 1,2,3 to tell which one each item is.
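Whatever the storage of plain character vectors (the subject of the reply that follows), a factor itself is stored as integer codes plus a `levels` attribute, which can be checked directly with made-up data:

```r
x <- c("red", "blue", "red", "red", "green")
f <- factor(x)

levels(f)        # "blue" "green" "red"  (unique values, sorted)
as.integer(f)    # 3 1 3 3 2             (codes indexing into levels)
typeof(f)        # "integer"
attributes(f)    # $levels and $class ("factor")
```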

-- This is false. It used to be true a **long time ago**, but R has for quite a 
while used hashing/global string tables to avoid this problem. See 
<https://stackoverflow.com/questions/50310092/why-does-r-use-factors-to-store-characters> 
for details/references.
As a result, I think many would argue that working with strings *as strings*, 
not factors, is often a better default, though of course there are still 
situations where factors are useful (e.g. in ordering results by factor levels 
where the desired level order is not alphabetical).
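Bert's example of a case where factors still help, ordering by a non-alphabetical level order, can be shown briefly (the size labels are made up):

```r
sizes <- c("medium", "small", "large", "small")

sort(sizes)       # alphabetical: "large" "medium" "small" "small"

# Declaring the levels fixes the order used by sort(), table(), plots, etc.
f <- factor(sizes, levels = c("small", "medium", "large"))
sort(f)           # small small medium large
table(f)          # counts reported in level order: small, medium, large
```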

**I would appreciate correction/clarification if my claims are wrong or 
misleading!**

In any case, please do check such claims before making them on this list.

Cheers,
Bert

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list -- To 
UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
