from:"Rahul Gupta"

I want to read data from the CSV that I wrote using the Python csv module, but apart from field names and row count I am unable to read the rest of the data

2020-04-12 Thread Rahul Gupta

The cells in the CSV that I wrote look like this:
['82#201#426#553#602#621#811#908#1289#1342#1401#1472#1593#1641#1794#2290#2341#2391#3023#3141#3227#3240#3525#3529#3690#3881#4406#4421#4497#4719#4722#4920#5053#5146#5433']
and the cells which are empty look like ['']
I have tried the following code:

import csv
import numpy as np
with open("D:\PHD\obranking\\cell_split_demo.csv", mode='r') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    print(csv_reader.fieldnames)
    col_count = print(len(csv_reader.fieldnames))
    print(sum(1 for row in csv_file))
    for line in csv_reader:
        print(line)

But when I print line, it shows nothing.
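
A likely explanation for the empty loop, offered as a sketch rather than a confirmed diagnosis: `sum(1 for row in csv_file)` reads the underlying file object to the end, so the `DictReader` has nothing left to yield afterwards. The example below reproduces this with an in-memory file (`io.StringIO`, the sample data and both helper functions are assumptions made for illustration) and shows one way around it: read the rows into a list once and count the list.

```python
import csv
import io

# Hypothetical stand-in for cell_split_demo.csv: a header plus two data rows.
SAMPLE = "0,1,2\n82#201,,426\n,553#602,\n"


def rows_after_counting(text):
    """Mimic the original script: count lines on the file, then iterate the reader."""
    csv_file = io.StringIO(text)
    csv_reader = csv.DictReader(csv_file)
    _ = csv_reader.fieldnames                 # reads the header line
    line_count = sum(1 for _row in csv_file)  # consumes the rest of the file
    remaining = list(csv_reader)              # nothing is left to read
    return line_count, remaining


def rows_read_once(text):
    """Read every row into a list first, then count the list."""
    csv_file = io.StringIO(text)
    csv_reader = csv.DictReader(csv_file)
    rows = list(csv_reader)
    return len(rows), rows


exhausted_count, exhausted_rows = rows_after_counting(SAMPLE)
row_count, rows = rows_read_once(SAMPLE)
print(exhausted_count, exhausted_rows)
print(row_count, rows[0])
```

Keeping the rows in a list costs memory for large files; rewinding the file is an alternative, with the caveat that the header line is then read again.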
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: I want to read data from the CSV that I wrote using the Python csv module, but apart from field names and row count I am unable to read the rest of the data

2020-04-12 Thread Rahul Gupta

On Sunday, April 12, 2020 at 1:35:10 PM UTC+5:30, Rahul Gupta wrote:
> The cells in the CSV that I wrote look like this:
> ['82#201#426#553#602#621#811#908#1289#1342#1401#1472#1593#1641#1794#2290#2341#2391#3023#3141#3227#3240#3525#3529#3690#3881#4406#4421#4497#4719#4722#4920#5053#5146#5433']
> and the cells which are empty look like ['']
> I have tried the following code:
>
> import csv
> import numpy as np
> with open("D:\PHD\obranking\\cell_split_demo.csv", mode='r') as csv_file:
>     csv_reader = csv.DictReader(csv_file)
>     print(csv_reader.fieldnames)
>     col_count = print(len(csv_reader.fieldnames))
>     print(sum(1 for row in csv_file))
>     for line in csv_reader:
>         print(line)
>
> But when I print line, it shows nothing.
@Peter Otten thanks, that problem got solved, but now when I try to access a
particular column for every row in the CSV I get an error.
The code used in addition to the above code:

for line in enumerate(csv_reader):
    print(line[csv_reader.fieldnames[1]])

The errors are as follows:

"C:\Users\Rahul Gupta\PycharmProjects\CSVLearn\venv\Scripts\python.exe" "C:/Users/Rahul Gupta/PycharmProjects/CSVLearn/test10.py"
Traceback (most recent call last):
  File "C:/Users/Rahul Gupta/PycharmProjects/CSVLearn/test10.py", line 16, in <module>
    print(line[csv_reader.fieldnames[1]])
TypeError: tuple indices must be integers or slices, not str
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', 
'14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', 
'27', '28', '29', '30', '31', '32', '33', '34', '35', '36', '37', '38', '39', 
'40', '41', '42', '43', '44', '45', '46', '47', '48', '49', '50', '51', '52', 
'53', '54', '55', '56', '57', '58', '59', '60', '61', '62', '63', '64', '65', 
'66', '67', '68', '69', '70', '71', '72', '73', '74', '75', '76', '77', '78', 
'79', '80', '81', '82', '83', '84', '85', '86', '87', '88', '89', '90', '91', 
'92', '93', '94', '95', '96', '97', '98', '99', '100', '101', '102', '103', 
'104', '105', '106', '107', '108', '109', '110', '111', '112', '113', '114', 
'115', '116', '117', '118', '119', '120', '121', '122', '123', '124', '125', 
'126', '127', '128', '129', '130', '131', '132', '133', '134', '135', '136', 
'137', '138', '139', '140', '141', '142', '143', '144', '145', '146', '147', 
'148', '149', '150', '151', '152', '153', '154', '155', '156', '157', 
'158', '159', '160', '161', '162', '163', '164', '165', '166', '167', '168', 
'169', '170', '171', '172', '173', '174', '175', '176', '177', '178', '179', 
'180', '181', '182', '183', '184', '185', '186', '187', '188', '189', '190', 
'191', '192', '193', '194', '195', '196', '197', '198', '199', '200', '201', 
'202', '203', '204', '205', '206', '207', '208', '209', '210', '211', '212', 
'213', '214', '215', '216', '217', '218', '219', '220', '221', '222', '223', 
'224', '225', '226', '227', '228', '229', '230', '231', '232', '233', '234', 
'235', '236', '237', '238', '239', '240', '241', '242', '243', '244', '245', 
'246', '247', '248', '249', '250', '251', '252', '253', '254', '255', '256', 
'257', '258', '259', '260', '261', '262', '263', '264', '265', '266', '267', 
'268', '269', '270', '271', '272', '273', '274', '275', '276', '277', '278', 
'279', '280', '281', '282', '283', '284', '285', '286', '287', '288', '289', 
'290', '291', '292', '293', '294', '295', '296', '297', '298', '299']
300

Process finished with exit code 1
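
The TypeError in this traceback is consistent with `enumerate()` yielding `(index, row)` tuples rather than the row dictionaries themselves, so `line[...]` ends up indexing a tuple with a string. A minimal sketch of unpacking the pair, using a made-up in-memory CSV (the sample data and variable names are assumptions, not the poster's file):

```python
import csv
import io

# Hypothetical sample with the same style of numeric column names.
SAMPLE = "0,1,2\na0,b0,c0\na1,b1,c1\n"

csv_reader = csv.DictReader(io.StringIO(SAMPLE))
second_column = csv_reader.fieldnames[1]  # the header of the second column

values = []
# enumerate() yields (index, row); unpack both instead of indexing the tuple.
for row_number, row in enumerate(csv_reader):
    values.append(row[second_column])
    print(row_number, row[second_column])
```

If the row number is not needed, iterating over `csv_reader` directly avoids the tuple altogether.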


Re: I want to read data from the CSV that I wrote using the Python csv module, but apart from field names and row count I am unable to read the rest of the data

2020-04-12 Thread Rahul Gupta





import csv
import numpy as np
with open("D:\PHD\obranking\\cell_split_demo.csv", mode='r') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    print(csv_reader.fieldnames)
    col_count = print(len(csv_reader.fieldnames))
    #print(sum(1 for row in csv_file))
    row_count = 0

    for line in enumerate(csv_reader):
        print(line[csv_reader.fieldnames[1]])

@Peter Otten the above is test10.py.
Below I am posting how I created cell_split_demo.csv using test9.py.
This is test9.py:
import csv
import numpy as np

with open("D:\PHD\obranking\\demo.csv", mode='r') as csv_file1, \
        open("D:\PHD\obranking\\demo.csv", mode='r') as csv_file2:
    csv_reader1 = csv.DictReader(csv_file1)
    csv_reader2 = csv.DictReader(csv_file2)

    #csv_contents = list(csv_reader)
    #for i in csv_contents:
        #print(i['label'])
    #print(csv_contents)

    filename = "cell_split_demo.csv"
    with open("D:\PHD\obranking\\cell_split_demo.csv", 'w') as csvfilew1:
        fields = (range(0, 300))
        csvwriter1 = csv.DictWriter(csvfilew1, fieldnames=fields)
        csvwriter1.writeheader()

        for i, row in enumerate(csv_reader1):
            Mat = np.full([1, 300], '', dtype='object')
            matrixrows = dict().fromkeys(fields)
            for j, line in enumerate(csv_reader2):
                if j != 300:
                    matrixrows[j] = []
                    if row['label'] != line['label']:
                        for k in range(1,5502):
                            if row[csv_reader1.fieldnames[k]] != line[csv_reader2.fieldnames[k]]:
                                if Mat[0][j] == '':
                                    Mat[0][j] = str(k)
                                else:
                                    Mat[0][j] += '#' + str(k)
                                #print(Mat[0][j])
                    print(i)
                    #print(j)
                    matrixrows[j].append(Mat[0][j])
                if j == 299:
                    csvwriter1.writerow(matrixrows)
                    csv_file2.seek(0)



Re: I want to read data from the CSV that I wrote using the Python csv module, but apart from field names and row count I am unable to read the rest of the data

2020-04-12 Thread Rahul Gupta

@Peter Thanks a lot

To apply pca for a large csv

2020-04-14 Thread Rahul Gupta

Hello all, I have a CSV of 1 GB which consists of 25000 columns and 2 rows. 
I want to apply PCA, and I have seen that scikit-learn has built-in functionality 
for that. But I have seen that to do so you have to load the data into a data frame. 
My machine is an i5 with 8 GB of RAM, which fails to load all this data into a data 
frame and shows a memory error. Is there any alternative way to still apply 
PCA to the same data set on the same machine?

Re: To apply pca for a large csv

2020-04-14 Thread Rahul Gupta

64 bit version

Incremental PCA

2020-04-18 Thread Rahul Gupta

I wanted to implement incremental PCA.
I got this code from Stack Overflow, but I am wondering what y = chunk.pop("y") 
does and what this argument "y" to pop is.

from sklearn.decomposition import IncrementalPCA
import csv
import sys
import numpy as np
import pandas as pd

dataset = sys.argv[1]
chunksize_ = 5 * 25000
dimensions = 300

reader = pd.read_csv(dataset, sep = ',', chunksize = chunksize_)
sklearn_pca = IncrementalPCA(n_components=dimensions)
for chunk in reader:
    y = chunk.pop("Y")
    sklearn_pca.partial_fit(chunk)

# Computed mean per feature
mean = sklearn_pca.mean_
# and stddev
stddev = np.sqrt(sklearn_pca.var_)

Xtransformed = None
for chunk in pd.read_csv(dataset, sep = ',', chunksize = chunksize_):
    y = chunk.pop("Y")
    Xchunk = sklearn_pca.transform(chunk)
    # "is None" rather than "== None": == on a NumPy array compares elementwise
    if Xtransformed is None:
        Xtransformed = Xchunk
    else:
        Xtransformed = np.vstack((Xtransformed, Xchunk))
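
On the question asked in this post: in pandas, `DataFrame.pop(label)` removes the named column from the frame and returns it as a Series, so `chunk.pop("Y")` presumably separates a target column named "Y" from the feature columns before they are passed on. The column names and values below are invented for illustration only.

```python
import pandas as pd

# Hypothetical chunk: two feature columns and a target column named "Y".
chunk = pd.DataFrame({"f1": [1.0, 2.0], "f2": [3.0, 4.0], "Y": [0, 1]})

y = chunk.pop("Y")                      # returns the column and drops it from chunk
remaining_columns = list(chunk.columns)

print(y.tolist())
print(remaining_columns)
```

The argument is simply the column label; if the CSV has no column with that exact name, `pop` raises a `KeyError`.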

chi square test in sklearn printing NAN values for most of the columns

2020-04-27 Thread Rahul Gupta

Hi, I am trying to use the chi-square test to select the most important columns among 
5501 columns. But for most of the columns I am getting NaN as the chi-square test 
value.

import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.feature_selection import chi2
cols = []
cols.append(int(0))
#for i in range(1, 5502):
cols.append(int(10))

df = pd.read_csv("D:\PHD\obranking\\demo.csv", usecols=cols)
df.apply(LabelEncoder().fit_transform)
X = df.drop(labels='label', axis=1)
Y = df['label']
chi_scores = chi2(X, Y)
print(chi_scores)

In this code I printed the chi-square value for the 10th column, but for most of the 
columns it behaves like below:

"C:\Users\Rahul Gupta\PycharmProjects\CSVLearn\venv\Scripts\python.exe" "C:/Users/Rahul Gupta/PycharmProjects/CSVLearn/ChiSq_learn.py"
(array([nan]), array([nan]))

Process finished with exit code 0
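
One possible cause, stated as a hypothesis since the data is not shown: scikit-learn's `chi2` compares observed per-class feature sums with expected sums, and a column whose values are all zero has an expected sum of zero in every class, which makes the statistic 0/0. The pure-Python sketch below mirrors that calculation on invented data; `chi2_statistic` is a hypothetical helper written for this illustration, not the library function.

```python
import math


def chi2_statistic(feature, labels):
    """Chi-square statistic of one non-negative feature against class labels.

    Observed = per-class sum of the feature; expected = class frequency
    times the overall feature sum. Returns NaN when an expected value is 0.
    """
    total = sum(feature)
    n = len(labels)
    statistic = 0.0
    for cls in sorted(set(labels)):
        observed = sum(v for v, lab in zip(feature, labels) if lab == cls)
        expected = total * labels.count(cls) / n
        if expected == 0:
            return math.nan  # 0/0, the case that shows up as nan
        statistic += (observed - expected) ** 2 / expected
    return statistic


labels = ["BW", "BW", "XX", "XX"]  # invented class labels
informative = [3, 2, 0, 1]         # invented counts
all_zero = [0, 0, 0, 0]            # a column that is zero in every row

print(chi2_statistic(informative, labels))
print(math.isnan(chi2_statistic(all_zero, labels)))
```

If this is what is happening, the affected columns in demo.csv would each sum to zero.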

unable to write content in csv file

2020-04-27 Thread Rahul Gupta

Following is my code:

import pandas as pd
import csv
from sklearn.preprocessing import LabelEncoder
from sklearn.feature_selection import chi2
with open("D:\PHD\obranking\\test_chi.csv", 'w') as csvfilew1:
    fields = ['index', 'feature name', 'p_value']
    csvwriter1 = csv.DictWriter(csvfilew1, fieldnames=fields)
    csvwriter1.writeheader()
    for i in range(1, 5502):
        csv_data = dict().fromkeys(fields)
        csv_data['index'] = i
        cols = []
        cols.append(int(0))
        cols.append(int(i))

        df = pd.read_csv("D:\PHD\obranking\\demo.csv", usecols=cols)
        df.apply(LabelEncoder().fit_transform)
        X = df.drop(labels='label', axis=1)
        Y = df['label']
        chi_scores = chi2(X, Y)
        if(chi_scores[1] < 0.05):
            f_name = str(X.columns)
            f_name = f_name[8:-19]
            csv_data['feature name'] = f_name
            p_val = str(chi_scores[1])
            p_val = p_val[1:-1]
            csv_data['p_value'] = p_val
            print(csv_data)
            csvwriter1.writerow(csv_data)
        #print(csv_data)
        #print(f_name + p_val)
        #print(str(X.col + str(chi_scores[1]))

test_chi.csv is created but it remains empty after execution of the code. 
When I print csv_data it gets printed, but it is not written to the CSV 
using writerow(csv_data). There are also no field names in the CSV; even 
writeheader() seems not to work. I am confused about what is wrong. Could someone 
help?

error in CSV resetting with seek(0)

2020-05-01 Thread Rahul Gupta

Consider the following code:

import csv
import numpy as np

with open("D:\PHD\obranking\\demo.csv", mode='r') as csv_file1, \
        open("D:\PHD\obranking\\demo.csv", mode='r') as csv_file2:
    csv_reader1 = csv.DictReader(csv_file1)
    csv_reader2 = csv.DictReader(csv_file2)

    filename = "cell_split_demo.csv"
    with open("D:\PHD\obranking\\cell_split_demo.csv", 'w') as csvfilew1:
        fields = (range(0, 300))
        csvwriter1 = csv.DictWriter(csvfilew1, fieldnames=fields)
        csvwriter1.writeheader()

        for i, row in enumerate(csv_reader1):
            print(f"value_i({i}) label({row['label']})")
            for j, line in enumerate(csv_reader2):
                if j <= i:
                    matrixrows[j] = []
                    if row['label'] != line['label']:
                        print(f"value_j({j})Unequal label({line['label']})")
                    else:
                        print(f"value_j({j})   equal label({line['label']})")
                        pass
                else:
                    break
            csv_file2.seek(0)

Here are some samples of the output:

value_i(0) label(BW)
value_j(0)   equal label(BW)
value_i(1) label(BW)
value_j(0)   Unequal label(label)
value_j(1)   equal label(BW)
value_i(2) label(BW)
value_j(0)   Unequal label(label)
value_j(1)   equal label(BW)
value_j(2)   equal label(BW)

You can see that for j=0, while i goes from 1 to n, it is not able to access the 
line['label'] value.
Kindly help: what is wrong with this?
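
The `label(label)` entries in the output suggest that, after `csv_file2.seek(0)`, the existing `DictReader` treats the header line as an ordinary data row: it has already read the field names once and does not read them again. A sketch with invented in-memory data that shows the effect and one possible workaround, skipping the header line by hand after each rewind (the sample text and variable names are assumptions for illustration):

```python
import csv
import io

SAMPLE = "label,f1\nBW,1\nBW,2\n"  # invented stand-in for demo.csv

csv_file = io.StringIO(SAMPLE)
reader = csv.DictReader(csv_file)

# First pass: the header is consumed as field names, only data rows come back.
first_pass = [row["label"] for row in reader]

csv_file.seek(0)
# The reader keeps its field names, so the header line now comes back as data.
second_pass = [row["label"] for row in reader]

csv_file.seek(0)
next(csv_file)  # skip the header line manually after rewinding
third_pass = [row["label"] for row in reader]

print(first_pass)
print(second_pass)
print(third_pass)
```

Creating a fresh `csv.DictReader(csv_file)` after each `seek(0)` is another option, since a new reader reads the header again.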

Re: error in CSV resetting with seek(0)

2020-05-02 Thread Rahul Gupta

@peter Otten thanks

Ram memory not freed after executing python script on ubuntu system

2020-05-27 Thread Rahul Gupta



I have an Ubuntu system with 125 GB of RAM. I executed a few Python 
scripts on that system. Those scripts use NumPy arrays and pandas. Execution 
is now over, but 50 GB of RAM, 2 GB of cache and 8.4 GB of swap are still 
occupied. At the moment nothing is running on the system. I have googled it, and 
most of the results say that the Python garbage collector performs poorly. I 
want this memory to be cleaned up and reclaimed. One of the easiest ways is to 
restart the system, but I don't want to restart; I want a way to do this while the 
system is up and running. Kindly tell me how to do this. Thanks

Re: Ram memory not freed after executing python script on ubuntu system

2020-05-28 Thread Rahul Gupta

On Thursday, May 28, 2020 at 11:20:05 AM UTC+5:30, Rahul Gupta wrote:
> I have an Ubuntu system with 125 GB of RAM. I executed a few Python 
> scripts on that system. Those scripts use NumPy arrays and pandas. Execution 
> is now over, but 50 GB of RAM, 2 GB of cache and 8.4 GB of swap are still 
> occupied. At the moment nothing is running on the system. I have googled it, 
> and most of the results say that the Python garbage collector performs 
> poorly. I want this memory to be cleaned up and reclaimed. One of the 
> easiest ways is to restart the system, but I don't want to restart; I want a way 
> to do this while the system is up and running. Kindly tell me how to do this. 
> Thanks
Yes, I am sure there is 125 GB of RAM.
And you talked about references; 
see these links:
https://stackoverflow.com/questions/39100971/how-do-i-release-memory-used-by-a-pandas-dataframe
http://effbot.org/pyfaq/why-doesnt-python-release-the-memory-when-i-delete-a-large-object.htm
