On Sunday, August 30, 2015 at 1:16:12 PM UTC-4, MRAB wrote:
> On 2015-08-30 17:31, kbtyo wrote:
> > On Saturday, August 29, 2015 at 10:50:18 PM UTC-4, MRAB wrote:
> >> On 2015-08-30 03:05, kbtyo wrote:
> >> > I am using Jupyter Notebook and Python 3.4. I have a data structure in
> >> > the format, (type list):
> >> >
> >> > [{'AccountNumber': N,
> >> > 'Amount': '0',
> >> > 'Answer': '12:00:00 PM',
> >> > 'ID': None,
> >> > 'Type': 'WriteLetters',
> >> > 'Amount': '10',
> >> > {'AccountNumber': Y,
> >> > 'Amount': '0',
> >> > 'Answer': ' 12:00:00 PM',
> >> > 'ID': None,
> >> > 'Type': 'Transfer',
> >> > 'Amount': '2'}]
> >> >
> >> > The end goal is to write this out to CSV.
> >> >
> >> > For the above example the output would look like:
> >> >
> >> > AccountNumber, Amount, Answer, ID, Type, Amount
> >> > N,0,12:00:00 PM,None,WriteLetters,10
> >> > Y,2,12:00:00 PM,None,Transfer,2
> >> >
> >> > Below is the function that I am using to write out this data structure.
> >> > Please excuse any indentation formatting issues. The data structure is
> >> > returned through the function "construct_results(get_just_xml_data)".
> >> >
> >> > The data that is returned is in the format as above.
> >> > "construct_headers(get_just_xml_data)" returns a list of headers.
> >> > Writing out the row for "headers_list" works.
> >> >
> >> > The list comprehension "data" is to maintain the integrity of the column
> >> > headers and the values for each new instance of the data structure
> >> > (where the keys in the dictionary are the headers and values - row
> >> > instances). The keys in this specific data structure are meant to check
> >> > if there is a value instance, and if there is not - place an ''.
> >> >
> >> > def write_to_csv(results, headers):
> >> >
> >> > headers = construct_headers(get_just_xml_data)
> >> > results = construct_results(get_just_xml_data)
> >> > headers_list = list(headers)
> >> >
> >> > with open('real_csv_output.csv', 'wt') as f:
> >> > writer = csv.writer(f)
> >> > writer.writerow(headers_list)
> >> > for row in results:
> >> > data = [row.get(index, '') for index in results]
> >> > writer.writerow(data)
> >> >
> >> >
> >> >
> >> > However, when I run this, I receive this error:
> >> >
> >> > ---------------------------------------------------------------------------
> >> > TypeError Traceback (most recent call
> >> > last)
> >> > <ipython-input-747-7746797fc9a5> in <module>()
> >> > ----> 1 write_to_csv(results, headers)
> >> >
> >> > <ipython-input-746-c822437eeaf0> in write_to_csv(results, headers)
> >> > 9 writer.writerow(headers_list)
> >> > 10 for item in results:
> >> > ---> 11 data = [item.get(index, '') for index in results]
> >> > 12 writer.writerow(data)
> >> >
> >> > <ipython-input-746-c822437eeaf0> in <listcomp>(.0)
> >> > 9 writer.writerow(headers_list)
> >> > 10 for item in results:
> >> > ---> 11 data = [item.get(index, '') for index in results]
> >> > 12 writer.writerow(data)
> >> >
> >> > TypeError: unhashable type: 'dict'
> >> >
> >> >
> >> > I have done some research, namely, the following:
> >> >
> >> > https://mail.python.org/pipermail//tutor/2011-November/086761.html
> >> >
> >> > http://stackoverflow.com/questions/27435798/unhashable-type-dict-type-error
> >> >
> >> > http://stackoverflow.com/questions/1957396/why-dict-objects-are-unhashable-in-python
> >> >
> >> > However, I am still perplexed by this error. Any feedback is welcomed.
> >> > Thank you.
> >> >
> >> You're taking the index values from 'results' instead of 'headers'.
> >
> > Would you be able to elaborate on this? I partially understand what you
> > mean. However, each dictionary (of results) has the same keys to map to
> > (aka, headers when written out to CSV). I am wondering if you would be able
> > to explain how the index is being used in this case?
> >
> In the list comprehension on line 11, you have "item.get(index, '')".
>
> What is 'index'?
>
> You have "for index in results" in the list comprehension, and 'results'
> is a list of dicts, therefore 'index' is a _dict_.
>
> That means that you're trying to look up an entry in the 'item' dict
> using a _dict_ as the key.
>
> Oh, and incidentally, line 12 should be indented to the same level as
> line 11.
Yes, as mentioned in my OP, please forgive the indentation issues in the
formatting.

I feel that I need to provide some context to avoid any confusion over my
motivations for the approach I chose.

My original task was to parse an XML data structure stored in a CSV file
alongside other data types, and then add the elements back as headers and their
text as row values. I went back to the drawing board and created a "results"
list of dictionaries whose values are lists, using this:
def convert_list_to_dict(get_just_xml_data):
    d = {}
    for item in get_just_xml_data(get_all_data):
        for k, v in item.items():
            try:
                d[k].append(v)
            except KeyError:
                d[k] = [v]
    return d
This creates a dictionary keyed by XML tag, where each value is a list of that
tag's text values - for example:
{
'Number1': ['0'],
'Number2': ['0'],
'Number3': ['0'],
'Number4': ['0'],
'Number5': ['0'],
'RepgenName': [None],
'RTypes': ['Execution', 'Letters'],
'RTID': ['3', '5']}
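
The grouping step in convert_list_to_dict could probably also be written with
collections.defaultdict, which avoids the try/except. This is only a minimal
sketch: the function name "group_values_by_key" and the sample records are
hypothetical stand-ins for whatever get_just_xml_data actually returns.

```python
from collections import defaultdict


def group_values_by_key(records):
    """Collect the values of each key across an iterable of dicts."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in record.items():
            # A missing key starts out as an empty list automatically.
            grouped[key].append(value)
    return dict(grouped)


# Hypothetical sample records standing in for the parsed XML data.
sample = [
    {'RTypes': 'Execution', 'RTID': '3', 'Number1': '0'},
    {'RTypes': 'Letters', 'RTID': '5'},
]
grouped = group_values_by_key(sample)
```

Keys that appear in only some records simply end up with shorter lists, which
matches the example above.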
I then used this to create a "headers" set (to prevent duplicates from being
added) and the list of dictionaries that I mentioned in my OP.
I achieve this via:
# just headers
def construct_headers(convert_list_to_dict):
    header = set()
    with open('real.csv', 'rU') as infile:
        reader = csv.DictReader(infile)
        for row in reader:
            xml_data = convert_list_to_dict(get_just_xml_data)
            # get_just_xml_data(get_all_data)
            row.update(xml_data)
            header.update(row.keys())
    return header
# get all of the results
def construct_results(convert_list_to_dict):
    header = set()
    results = []
    with open('real.csv', 'rU') as infile:
        reader = csv.DictReader(infile)
        for row in reader:
            xml_data = convert_list_to_dict(get_just_xml_data)
            # get_just_xml_data(get_all_data)
            # print(row)
            row.update(xml_data)
            # print(row)
            results.append(row)
            # print(results)
            header.update(row.keys())
    # print(type(results))
    return results
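
Since both functions read the same file and do the same merge, they could
perhaps be combined into a single pass that also keeps the headers in
first-seen order (a list plus a "seen" set, rather than a bare set). The sketch
below is an assumption about how that might look, not my working code:
"read_rows_and_headers", "extra_fields" and the in-memory sample file are
hypothetical names used only for illustration.

```python
import csv
import io


def read_rows_and_headers(infile, extra_fields):
    """Read CSV rows once, merging extra_fields into each row.

    Returns (headers, results); headers keeps first-seen order.
    """
    headers = []
    seen = set()
    results = []
    reader = csv.DictReader(infile)
    for row in reader:
        merged = dict(row)
        merged.update(extra_fields)
        results.append(merged)
        # CSV columns first (in file order), then the extra keys, sorted
        # so the column order is deterministic.
        for key in list(reader.fieldnames) + sorted(extra_fields):
            if key not in seen:
                seen.add(key)
                headers.append(key)
    return headers, results


# Hypothetical in-memory stand-in for 'real.csv'.
sample_csv = io.StringIO("AccountNumber,Amount\nN,0\nY,2\n")
headers, results = read_rows_and_headers(sample_csv, {'RTID': ['3', '5']})
```

Because "headers" is a list here, the same order can be reused for the header
row and for every data row.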
I guess I am still using the headers list that was originally written out. My
initial thought is to just write out the values corresponding to each
transaction. For example, given this data structure:
{
'Number1': ['0'],
'Number2': ['0'],
'Number3': ['0'],
'Number4': ['0'],
'Number5': ['0'],
'RPN': [None],
'RTypes': ['Execution', 'Letters'],
'RTID': ['3', '5']}
I would get a CSV like this:
Number1, Number2, Number3, Number4, Number5, RPN, RTypes, RTID
0, 0, 0, 0, 0, None, Execution, 3
None, None, None, None, None, None, Letters, 5
I am wondering how I would achieve this when the headers set is unordered
(should I sort it before writing this out?). Also, since I have millions of
transactions, I want to make sure that the values for each header are placed
in the correct column for every row. Any guidance would be very helpful.
Thanks.
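
To make the question concrete, here is a minimal sketch of one possible
approach, under the assumption that sorting the headers once and reusing that
same list for every row is acceptable. It uses itertools.zip_longest to pad the
shorter lists with None; "dict_of_lists_to_rows" is a hypothetical helper name,
and the output goes to an in-memory buffer rather than a real file.

```python
import csv
import io
from itertools import zip_longest


def dict_of_lists_to_rows(data, headers):
    """Turn {header: [values]} into rows, padding short columns with None."""
    columns = [data.get(header, []) for header in headers]
    return [list(row) for row in zip_longest(*columns, fillvalue=None)]


data = {
    'Number1': ['0'],
    'Number2': ['0'],
    'Number3': ['0'],
    'Number4': ['0'],
    'Number5': ['0'],
    'RPN': [None],
    'RTypes': ['Execution', 'Letters'],
    'RTID': ['3', '5'],
}

# Fix the column order once; every row is then built in that same order.
headers = sorted(data)
rows = dict_of_lists_to_rows(data, headers)

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(headers)
writer.writerows(rows)
csv_text = buffer.getvalue()
```

Note that csv.writer writes None as an empty field, so the padded cells come
out blank rather than as the text "None"; a different fillvalue could be passed
if the literal string is wanted.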
--
https://mail.python.org/mailman/listinfo/python-list