Bert, I stand corrected. What I said may once have been true, but it seems the implementation has changed at some level, and I did not factor that in. Nevertheless, whether you use an index as a key or as an offset into an attached vector of labels, it seems to work the same, and I think my comment still applies well enough: changing a few labels instead of scanning lots of entries can sometimes be a good thing. As far as I can tell, the external interface seems the same for now. One issue with R for a long time was that it did not offer something more like a Python dictionary, and it looks like …

ABOVE

From: Bert Gunter <bgunter.4...@gmail.com>
Sent: Tuesday, June 13, 2023 6:15 PM
To: avi.e.gr...@gmail.com
Cc: javad bayat <j.bayat...@gmail.com>; R-help@r-project.org
Subject: Re: [R] Problem with filling dataframe's column

Below.
On Tue, Jun 13, 2023 at 2:18 PM <avi.e.gr...@gmail.com> wrote:
>
> Javad,
>
> There may be nothing wrong with the methods people are showing you, and if
> they satisfied you, great.
>
> But I note you have lots of data, over a quarter million rows. If much of
> the text data is redundant, and you want to simplify some operations such as
> changing some of the values to others in multiple ways, have you done any
> learning about an R feature very useful for dealing with categorical data
> called "factors"?
>
> If you have a vector or a column in a data.frame that contains text, then it
> can be replaced by a factor that often takes way less space, as it stores a
> sort of dictionary of all the unique values and just records numbers like
> 1,2,3 to tell which one each item is.

This is false. It used to be true a **long time ago**, but R has for quite a while used hashing/global string tables to avoid this problem. See here <https://stackoverflow.com/questions/50310092/why-does-r-use-factors-to-store-characters> for details/references. As a result, I think many would argue that working with strings *as strings,* not factors, is often a better default, though of course there are still situations where factors are useful (e.g. in ordering results by factor levels where the desired level order is not alphabetical).

**I would appreciate correction/clarification if my claims are wrong or misleading!**

In any case, please do check such claims before making them on this list.

Cheers,
Bert
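A minimal R sketch of the factor representation discussed in this thread may help; the vector `x` and its color values are hypothetical example data, not Javad's actual column. It shows the two parts of a factor that Avi describes (the `levels` vector of unique values and one integer code per element) and his relabeling point: editing `levels(f)` changes a label once while the integer codes stay the same, whereas the same relabel on a plain character vector compares every element.

```r
# Hypothetical example data: a short, repetitive character vector
x <- c("red", "green", "red", "blue", "green", "red")

f <- factor(x)
levels(f)      # unique values, sorted: "blue" "green" "red"
as.integer(f)  # integer codes into levels(f): 3 2 3 1 2 3

# Relabel "red" as "crimson" by editing only the short levels vector
codes_before <- as.integer(f)
levels(f)[levels(f) == "red"] <- "crimson"
identical(codes_before, as.integer(f))  # TRUE: the per-element codes did not change
f

# The same relabel on a plain character vector tests every element
x[x == "red"] <- "crimson"
identical(as.character(f), x)  # TRUE: both now hold the same labels
```

After the relabel the levels are `"blue" "green" "crimson"`, no longer alphabetical: assigning into `levels(f)` keeps each level in its position, which is why the codes are untouched.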
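Bert's example of where factors remain useful, ordering by a level order that is not alphabetical, can be sketched as follows (the `sizes` values are hypothetical). Sorted as strings, the values come out alphabetically; as a factor built with an explicit `levels` argument, `sort()` and `table()` follow the order given.

```r
# Hypothetical example data
sizes <- c("medium", "small", "large", "small", "medium")

# As plain strings, sort() is alphabetical
sort(sizes)   # "large" "medium" "medium" "small" "small"

# As a factor with an explicit level order, sort() and table() follow that order
size_f <- factor(sizes, levels = c("small", "medium", "large"))
sort(size_f)  # small small medium medium large
table(size_f) # small: 2, medium: 2, large: 1
```

Adding `ordered = TRUE` to the `factor()` call would additionally make comparisons such as `size_f < "large"` meaningful.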