On Sat, Dec 14, 2013 at 12:14 PM, Michael Crawford <dalu...@gmail.com> wrote:
> I found this piece of code on github
>
> https://gist.github.com/kljensen/5452382
>
> def one_hot_dataframe(data, cols, replace=False):
>     """ Takes a dataframe and a list of columns that need to be encoded.
>         Returns a 3-tuple comprising the data, the vectorized data,
>         and the fitted vectorizor.
>     """
>     vec = DictVectorizer()
>     mkdict = lambda row: dict((col, row[col]) for col in cols)  #<<<<<<<<<<<<<<<<<<
>     vecData = pandas.DataFrame(vec.fit_transform(data[cols].apply(mkdict,
>         axis=1)).toarray())
>     vecData.columns = vec.get_feature_names()
>     vecData.index = data.index
>     if replace is True:
>         data = data.drop(cols, axis=1)
>         data = data.join(vecData)
>     return (data, vecData, vec)
>
> I don't understand how that lambda expression works.
> For starters where did row come from?
> How did it know it was working on data?

Consider this simple example:

>>> l = lambda x: x**2
>>> apply(l, (3,))
9

A lambda is an anonymous function; x is simply its parameter name. When you call apply(), the lambda l receives the value 3 as x and returns x**2, which is 9 in this case. (apply() here is the Python 2 built-in; in Python 3 you would call l(3) directly.)

Your code works the same way: row is the lambda's parameter, and data[cols].apply(mkdict, axis=1) calls mkdict once for each row of data[cols], passing that row in as row. The lambda itself knows nothing about data; it only sees whatever it is called with.

Hope this helps you.

Best,
Amit.
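
P.S. In case it helps to see the mechanism without pandas, here is a minimal sketch. apply_rows is a hypothetical stand-in I made up for what DataFrame.apply(func, axis=1) does conceptually (it is not pandas' actual implementation), and the toy data with its "color", "size" and "price" columns is invented for illustration. Only the mkdict line is taken from the gist.

```python
# Hypothetical stand-in for pandas' DataFrame.apply(func, axis=1):
# it calls func once per row and collects the results.
def apply_rows(rows, func):
    return [func(row) for row in rows]

# Invented toy data: each dict plays the role of one DataFrame row.
data = [
    {"color": "red", "size": "S", "price": 10},
    {"color": "blue", "size": "M", "price": 12},
]
cols = ["color", "size"]

# Same lambda as in the gist. `row` is only a parameter name;
# it has no value until somebody calls mkdict(...).
mkdict = lambda row: dict((col, row[col]) for col in cols)

# `row` gets its value here, each time apply_rows calls func(row).
result = apply_rows(data, mkdict)
print(result)
# [{'color': 'red', 'size': 'S'}, {'color': 'blue', 'size': 'M'}]
```

The lambda could equally be written as a normal function, def mkdict(row): return dict((col, row[col]) for col in cols). Either way, it is the caller that decides what row is.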