Bert,
 
I stand corrected. What I said may once have been true, but the 
implementation has apparently changed at some level.
 
I did not factor that in.
 
Nevertheless, whether you use an index as a key or as an offset into an 
attached vector of labels, it seems to work the same, and I think my comment 
still applies well enough: changing a few labels instead of scanning lots of 
entries can sometimes be a good thing. As far as I can tell, the external 
interfaces seem the same for now. 
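
As a minimal sketch of what I mean by changing a few labels (the data and 
the names x and y are made up for illustration, not taken from Javad's file):

```r
# Hypothetical data: a column with only a few distinct values
x <- factor(c("apple", "pear", "apple", "fig", "pear"))

# Relabel one level: only the short vector of labels is edited,
# with no entry-by-entry comparison against the data itself
levels(x)[levels(x) == "pear"] <- "quince"

# The equivalent change on a plain character vector compares every entry
y <- c("apple", "pear", "apple", "fig", "pear")
y[y == "pear"] <- "quince"

# Both routes give the same visible result
identical(as.character(x), y)
```

Whether the factor route is actually faster or smaller on real data is 
something to measure rather than assume.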
 
One issue with R for a long time was how they did not do something more like a 
Python dictionary and it looks like …
 
ABOVE
 
From: Bert Gunter <bgunter.4...@gmail.com> 
Sent: Tuesday, June 13, 2023 6:15 PM
To: avi.e.gr...@gmail.com
Cc: javad bayat <j.bayat...@gmail.com>; R-help@r-project.org
Subject: Re: [R] Problem with filling dataframe's column
 
Below.


On Tue, Jun 13, 2023 at 2:18 PM <avi.e.gr...@gmail.com> wrote:
>
>  
> Javad,
>
> There may be nothing wrong with the methods people are showing you and if it 
> satisfied you, great.
>
> But I note you have lots of data in over a quarter million rows. If much of 
> the text data is redundant, and you want to simplify some operations such as 
> changing some of the values to others in multiple ways, have you looked into 
> an R feature very useful for dealing with categorical data called 
> "factors"?
>
> If you have a vector or a column in a data.frame that contains text, then it 
> can be replaced by a factor that often takes way less space as it stores a 
> sort of dictionary of all the unique values and just records numbers like 
> 1,2,3 to tell which one each item is.
 
-- This is false. It used to be true a **long time ago**, but R has for quite a 
while used hashing/global string tables to avoid this problem. See here 
<https://stackoverflow.com/questions/50310092/why-does-r-use-factors-to-store-characters>
  for details/references.
As a result, I think many would argue that working with strings *as strings,* 
not factors, is often a better default, though of course there are still 
situations where factors are useful (e.g. in ordering results by factor levels 
where the desired level order is not alphabetical).
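
A small sketch of that ordering case, using made-up size labels (the names 
sizes and f are only for illustration):

```r
# Hypothetical labels whose natural order is not alphabetical
sizes <- c("large", "small", "medium", "small")

# Sorting the strings themselves gives alphabetical order
sort(sizes)

# A factor with explicitly ordered levels sorts by level position instead
f <- factor(sizes, levels = c("small", "medium", "large"))
as.character(sort(f))
```

Here the factor sorts as small, small, medium, large, following the order 
given in levels rather than the alphabet.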
 
**I would appreciate correction/clarification if my claims are wrong or 
misleading!**
 
In any case, please do check such claims before making them on this list.
 
Cheers,
Bert
 
 


______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
