I found this piece of code on GitHub:
https://gist.github.com/kljensen/5452382
import pandas
from sklearn.feature_extraction import DictVectorizer

def one_hot_dataframe(data, cols, replace=False):
    """ Takes a dataframe and a list of columns that need to be encoded.
    Returns a 3-tuple comprising the data, the vectorized data,
    and the fitted vectorizer.
    """
    vec = DictVectorizer()
    mkdict = lambda row: dict((col, row[col]) for col in cols)
    #<<<<<<<<<<<<<<<<<<
    vecData = pandas.DataFrame(vec.fit_transform(data[cols].apply(mkdict,
        axis=1)).toarray())
    # newer scikit-learn releases use vec.get_feature_names_out() instead
    vecData.columns = vec.get_feature_names()
    vecData.index = data.index
    if replace is True:
        data = data.drop(cols, axis=1)
        data = data.join(vecData)
    return (data, vecData, vec)
I don't understand how that lambda expression works.
For starters, where did row come from?
How did it know it was working on data?
Any help with understanding this would be appreciated.
I tried the code out and it works exactly as it is supposed to. I just
don't understand how.
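
For anyone following along, the line in question can be reduced to a
sketch that needs neither pandas nor scikit-learn. This assumes plain
dicts stand in for DataFrame rows; the column names and sample data are
made up for illustration, and the list comprehension is only an
approximation of what apply does with axis=1.

```python
cols = ["color", "size"]  # hypothetical column names

# Same lambda as in the gist: row is just the name of its one parameter.
mkdict = lambda row: dict((col, row[col]) for col in cols)

# Made-up rows; each dict stands in for one DataFrame row.
rows = [
    {"color": "red", "size": "S", "price": 10},
    {"color": "blue", "size": "M", "price": 12},
]

# Roughly what data[cols].apply(mkdict, axis=1) does: call mkdict once
# per row and collect the results. (These rows still carry "price";
# data[cols] would already have dropped it.)
dicts = [mkdict(row) for row in rows]
print(dicts)
```

If that reading is right, row is nothing more than the lambda's
parameter name, and it is apply(mkdict, axis=1) that hands each row of
data[cols] to mkdict in turn.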
Thanks,
Mike
_______________________________________________
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor