This is a fairly long question, so please bear with me. I have four data
frames. In the first, each row holds a user's name in the first column,
followed by the pages that user has liked, one page per column. The number of
filled columns therefore differs from user to user: the row for 'random guy'
can have a different length from the row for 'mank nion'. 'BlackBuck', '500
Startups', etc. are page names. Let's call this data frame User_page:
random guy  | BlackBuck    | GiveMeSport   | Analytics Ninja
mank nion   | DJ CHETAS    | Celina Jaitly | Gurkeerat Singh
pop rajuel  | WOW Editions | 500 Startups  | Biswapati Sarkar
Roshan ghai | MensXP       | No Abuse      | the smartian
The second data frame has the same layout, but each page name is replaced by
that page's category. As you may know, Facebook pages have categories; for
example, the category of 'BlackBuck' is 'Transport/Freight'. Different pages
can share a name but have different categories, which is why I can't use the
page name alone as a key. Let's call this data frame User_category:
random guy  | Transport/Freight  | Sport     | Insurance Company
mank nion   | Arts/Entertainment | Actress   | Actor/Director
pop rajuel  | Concert Tour       | App Page  | Actor/Director
Roshan ghai | News/Media Website | Community | Public Figure
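Since a page name alone is not a reliable key, one option is to pair each page
with its category by position. Below is a minimal pandas sketch of that idea.
The column names ("user", "like_1", ...), the helper to_long, and the shortened
second row are assumptions made for illustration, not part of the real data.

```python
import pandas as pd

# Illustrative rows; None pads the user who has fewer likes.
user_page = pd.DataFrame(
    [["random guy", "BlackBuck", "GiveMeSport", "Analytics Ninja"],
     ["mank nion", "DJ CHETAS", "Celina Jaitly", None]],
    columns=["user", "like_1", "like_2", "like_3"],
)
user_category = pd.DataFrame(
    [["random guy", "Transport/Freight", "Sport", "Insurance Company"],
     ["mank nion", "Arts/Entertainment", "Actress", None]],
    columns=["user", "like_1", "like_2", "like_3"],
)


def to_long(wide, value_name):
    """Turn one row per user into one row per like, dropping the padding."""
    long = wide.melt(id_vars=["user"], var_name="slot", value_name=value_name)
    return long.dropna(subset=[value_name])


# Matching on (user, slot) keeps every page next to its own category,
# even when two different pages share a name.
likes = to_long(user_page, "page").merge(
    to_long(user_category, "category"), on=["user", "slot"]
)
print(likes)
```

The result has one row per (user, liked page) with columns user, slot, page
and category, which is usually easier to join against lookup tables than the
ragged wide layout.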
I have two more data frames. The third has the Facebook page name as its first
column and 162 further columns, one per tag. The cell in row i, column j is 1
if the i-th page has the j-th tag and is left empty otherwise. Let's call this
data frame Page_tag:
name of page tag1 tag2 tag3
BlackBuck 1 1
GiveMeSport 1 1
Analytics Ninja 1 1
DJ CHETAS 1 1
The fourth has category names as its first column and the same 162 tag columns,
filled in the same way. Let's call this data frame Category_tag:
category_name tag1 tag2 tag3
Sport 1 1
App Page 1 1
Actor/Director 1
Public Figure 1 1
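For counting, the "1 or empty" cells are easier to work with as a plain 0/1
matrix indexed by name. Below is a minimal pandas sketch, assuming the table
comes from CSV-like text. The cell values are made up for illustration and do
not reproduce the real Page_tag; the same treatment would apply to
Category_tag.

```python
import io

import pandas as pd

# Made-up sample: an empty cell means the page does not have that tag.
raw = io.StringIO(
    "name,tag1,tag2,tag3\n"
    "BlackBuck,1,,1\n"
    "GiveMeSport,,1,1\n"
)

# Use the name column as the index, then turn the blanks into zeros.
page_tag = pd.read_csv(raw, index_col=0).fillna(0).astype(int)

# Name lookups are only unambiguous when each name appears once.
assert page_tag.index.is_unique
print(page_tag)
```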
Now I need the tag counts for each user, based on the pages that user has
liked. Step 1: for each liked page, check whether it exists in Page_tag (the
third data frame); if it does, add that page's tags to the user's counts.
Step 2: Page_tag covers only a limited number of pages, so for each page not
found there, take its category from User_category (the second data frame),
look that category up in Category_tag (the fourth data frame), and add the
category's tags instead. Summing both steps gives, for each user, how many
times each tag appears. The output should look something like this:
username tag1 tag2 tag3
random guy 1 2 2
mank nion 2 1 3
pop rajuel 4 0 2
Roshan ghai 0 2 1
Element (i, j) of the output table is the number of times the j-th tag appears
for the i-th user. I have written code for this and more in R, but I am stuck
on this particular step. My R code wasn't optimal, because it used loops in
many places. I would like to do this efficiently, and hopefully it can be done
in pandas. Please let me know if any clarification is needed. Any help will be
appreciated. Thank you.
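
The two-step lookup described above can be sketched without explicit loops as
follows. This is only an illustration under assumptions: the likes are assumed
to be already reshaped to one row per (user, page, category), the tag tables
are assumed to be 0/1 matrices indexed by unique names, and all tag values are
made up, so the numbers do not reproduce the output table above.

```python
import numpy as np
import pandas as pd

# Hypothetical likes: one row per liked page, with that page's category.
likes = pd.DataFrame(
    {
        "user": ["random guy", "random guy", "random guy",
                 "mank nion", "mank nion"],
        "page": ["BlackBuck", "GiveMeSport", "Analytics Ninja",
                 "DJ CHETAS", "Celina Jaitly"],
        "category": ["Transport/Freight", "Sport", "Insurance Company",
                     "Arts/Entertainment", "Actress"],
    }
)

# Made-up 0/1 tag tables, indexed by page name and by category name.
page_tag = pd.DataFrame(
    {"tag1": [1, 0], "tag2": [1, 1], "tag3": [0, 1]},
    index=["BlackBuck", "GiveMeSport"],
)
category_tag = pd.DataFrame(
    {"tag1": [1, 0], "tag2": [0, 0], "tag3": [1, 1]},
    index=["Insurance Company", "Actress"],
)
tag_cols = list(page_tag.columns)

# Step 1: one tag row per like from Page_tag (all NaN if the page is unknown).
by_page = page_tag.reindex(index=likes["page"]).to_numpy(dtype=float)
# Step 2: one tag row per like from Category_tag, used only as a fallback.
by_category = (
    category_tag.reindex(index=likes["category"])[tag_cols].to_numpy(dtype=float)
)

# Keep the page's tags where the page is known, else the category's tags.
found = likes["page"].isin(page_tag.index).to_numpy()
tags = np.where(found[:, None], by_page, by_category)

# Likes matched by neither table contribute zeros; then sum per user.
per_like = pd.DataFrame(tags, columns=tag_cols).fillna(0).astype(int)
per_like.insert(0, "user", likes["user"].to_numpy())
result = per_like.groupby("user", sort=False)[tag_cols].sum()
print(result)
```

With these made-up values, 'random guy' gets 2 for every tag: two pages are
found in page_tag, and 'Analytics Ninja' falls back to its category.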
--
https://mail.python.org/mailman/listinfo/python-list