[Tutor] Attacking this problem (2 parts): JSON object to CSV file

2015-06-19 Thread Saran Ahluwalia
Good Evening,

I have a conundrum regarding JSON objects and converting them to CSV:

*Context*


   - I am converting XML files to a JSON object (please see snippet below)
   and then finally producing a CSV file. Here is an example JSON object:


"PAC": {
    "Account": [{
        "PC": "0",
        "CMC": "0",
        "WC": "0",
        "DLA": "0",
        "CN": null,
        "FC": {
            "Int32": ["0", "0", "0", "0", "0"]
        },
        "F": {
            "Description": null,
            "Code": "0"
        }

In general, when I convert any of the files from JSON to CSV, I have been
successful when using the following:


import csv
import json
import sys

def hook(obj):
    return obj

def flatten(obj):
    for k, v in obj:
        if isinstance(v, list):
            yield from flatten(v)
        else:
            yield k, v

if __name__ == "__main__":
    with open("somefilename.json") as f:
        data = json.load(f, object_pairs_hook=hook)

    pairs = list(flatten(data))

    writer = csv.writer(sys.stdout)
    header = writer.writerow([k for k, v in pairs])
    # use writer.writerows for any other iterable object
    row = writer.writerow([v for k, v in pairs])


However, with the example JSON object above, I receive the following error
when applying this function:

ValueError: too many values to unpack
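A minimal reproduction, assuming Python 3 (the yield from in the script already requires it). My reading of the cause, not something confirmed in the thread: because hook returns the pair list unchanged, nested objects and JSON arrays both reach flatten as plain lists, so the array under "Account" gets unpacked as though each of its elements were a (key, value) pair. The trimmed input below is hypothetical.

```python
import json


def hook(obj):
    # object_pairs_hook receives a list of (key, value) tuples; returning it
    # unchanged turns every JSON object into a plain Python list.
    return obj


def flatten(obj):
    for k, v in obj:
        if isinstance(v, list):
            # Nested objects AND JSON arrays are both lists at this point,
            # so arrays get unpacked as if they held (key, value) pairs.
            yield from flatten(v)
        else:
            yield k, v


# Trimmed, hypothetical version of the "PAC" sample above.
text = '{"PAC": {"Account": [{"PC": "0", "CMC": "0", "CN": null}]}}'
data = json.loads(text, object_pairs_hook=hook)
print(data)  # [('PAC', [('Account', [[('PC', '0'), ('CMC', '0'), ('CN', None)]])])]

try:
    list(flatten(data))
except ValueError as exc:
    # The Account array's only element is a list of three pairs, and
    # "for k, v in ..." cannot unpack three items into two names.
    print("ValueError:", exc)
```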

Here are some more samples.


   1. "FC": {"Int32": ["0","0","0","0","0","0"]}
   2. "PBA": {"Double": ["0","0","0","0","0","0","0","0"]}
   3. "PBDD": {"DateTime": ["1/1/0001 12:00:00 AM",
                            "1/1/0001 12:00:00 AM",
                            "1/1/0001 12:00:00 AM",
                            "1/1/0001 12:00:00 AM",
                            "1/1/0001 12:00:00 AM",
                            "1/1/0001 12:00:00 AM",
                            "1/1/0001 12:00:00 AM",
                            "1/1/0001 12:00:00 AM"]}



In the above examples, I would like to remove the keys *Int32*, *Double* and
*DateTime*. Is there a function or methodology that would let me drop such
nested keys, use the outer keys (in the examples above, *FC*, *PBA* and
*PBDD*) as column headers in a CSV, and concatenate all of the values within
each list into the corresponding field?
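One way to get that layout, sketched under the assumption that any object with exactly one key whose value is a list (such as {"Int32": [...]}) is a type wrapper: drop the wrapper key, keep the outer key (FC, PBA, PBDD) as the column header, and join the list items into one field. The ";" separator and the collapse/to_row names are hypothetical choices, not an existing library function.

```python
import csv
import io
import json


def collapse(value, sep=";"):
    """Turn one column's JSON value into a single CSV field.

    Assumption: a dict with exactly one key whose value is a list, such as
    {"Int32": [...]}, is a type wrapper; the wrapper key is dropped and the
    list items are joined with `sep` (";" because DateTimes contain spaces).
    """
    if isinstance(value, dict) and len(value) == 1:
        (inner,) = value.values()
        if isinstance(inner, list):
            return sep.join("" if v is None else str(v) for v in inner)
    if value is None:
        return ""
    return value  # anything else is passed through unchanged


def to_row(obj):
    """Map each outer key (FC, PBA, PBDD, ...) to one collapsed field."""
    return {key: collapse(value) for key, value in obj.items()}


# Hypothetical input built from the three samples above (lists shortened).
text = '''{"FC": {"Int32": ["0", "0", "0"]},
          "PBA": {"Double": ["0", "0"]},
          "PBDD": {"DateTime": ["1/1/0001 12:00:00 AM", "1/1/0001 12:00:00 AM"]}}'''
row = to_row(json.loads(text))

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(row))
writer.writeheader()   # FC,PBA,PBDD
writer.writerow(row)   # one field per outer key
print(out.getvalue())
```

Objects with several keys, such as "F" in the first sample, are passed through unchanged here and would need their own rule (for example, prefixed headers like F_Code).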

Also, here is how I approached my XML to CSV conversion (in case it is of
any use):


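import csv
import json
import itertools
import string
import token
import tokenize
try:
    # C-accelerated parser on Python 2; this module was removed in Python 3.9
    import xml.etree.cElementTree as ElementTree
except ImportError:
    import xml.etree.ElementTree as ElementTree
from xml.etree.ElementTree import XMLParser
try:
    from collections import OrderedDict
except ImportError:
    # Python 2.6 fallback: third-party backports
    from ordereddict import OrderedDict
    import simplejson as json
import six
from csvkit import CSVKitWriter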


class XmlListConfig(list):
    def __init__(self, aList):
        for element in aList:
            if len(element):
                # treat like dict
                if len(element) == 1 or element[0].tag != element[1].tag:
                    self.append(XmlDictConfig(element))
                # treat like list
                elif element[0].tag == element[1].tag:
                    self.append(XmlListConfig(element))
            elif element.text:
                text = element.text.strip()
                if text:
                    self.append(text)


class XmlDictConfig(dict):
    '''
    Example usage:

    >>> tree = ElementTree.parse('your_file.xml')
    >>> root = tree.getroot()
    >>> xmldict = XmlDictConfig(root)

    Or, if you want to use an XML string:

    >>> root = ElementTree.XML(xml_string)
    >>> xmldict = XmlDictConfig(root)

    And then use xmldict for what it is: a dictionary.
    '''
    def __init__(self, parent_element):
        if parent_element.items():
            self.update(dict(parent_element.items()))
        for element in parent_element:
            if len(element):
                # treat like dict - we assume that if the first two tags
                # in a series are different, then they are all different.
                if len(element) == 1 or element[0].tag != element[1].tag:
                    aDict = XmlDictConfig(element)
                # treat like list - we assume that if the first two tags
                # in a series are the same, then the rest are the same.
                else:
                    # here, we put the list in dictionary; the key is the
                    # tag name the list elements all share in common, and
                    # the value is the list itself
                    aDict = {element[0].tag: XmlListConfig(element)}
                # if the tag has attributes, add those to the dict
                if element.items():
                    aDict.update(dict(element.items()))
                self.update({element.tag: aDict})
            # this assumes that if you've got an attribute in a tag,
            # you won't be having any text.
            elif element.items():
                self.update({element.tag: dict(element.items())})
            # finally, if there are no child tags and no attributes,
            # extract the text
            else:
                self.update({element.tag: element.text})
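The comments in XmlDictConfig note that it decides between "dict" and "list" by looking only at the first two child tags. A different technique (an alternative, not a fix to the recipe above) is to group all children by tag and make a list only where a tag actually repeats. A minimal standard-library sketch; etree_to_dict and the "#text" key are hypothetical names:

```python
import json
import xml.etree.ElementTree as ET
from collections import defaultdict


def etree_to_dict(elem):
    """Convert an Element into nested dicts, lists and strings.

    Children are grouped by tag; a tag that occurs more than once under
    the same parent becomes a list, a tag that occurs once stays a scalar.
    """
    children = list(elem)
    if not children:
        text = (elem.text or "").strip()
        if elem.attrib:
            # Keep attributes; store any text under a "#text" key.
            leaf = dict(elem.attrib)
            if text:
                leaf["#text"] = text
            return leaf
        return text
    grouped = defaultdict(list)
    for child in children:
        grouped[child.tag].append(etree_to_dict(child))
    result = dict(elem.attrib)
    for tag, values in grouped.items():
        result[tag] = values if len(values) > 1 else values[0]
    return result


xml = "<PAC><FC><Int32>0</Int32><Int32>0</Int32></FC><CN/></PAC>"
print(json.dumps(etree_to_dict(ET.fromstring(xml))))
# {"FC": {"Int32": ["0", "0"]}, "CN": ""}
```

One caveat: a tag that happens to appear only once (a single Int32, say) comes out as a plain string rather than a one-item list.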

[Tutor] Newbie to Python: enumerate XML tags (keys that will become headers) along with text (values) and write to CSV in one row (as opposed to "stacked" values with one header)

2015-06-25 Thread Saran Ahluwalia
My question can be found here:


http://stackoverflow.com/questions/31058100/enumerate-column-headers-in-csv-that-belong-to-the-same-tag-key-in-python


Here is an additional sample of the XML that I am working with:



(The XML tags were not preserved in the archive; only the element text survives.)

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0

0 0 0 0 0 0 0 0

1/1/0001 12:00:00 AM (8 times)

False False False False False 0
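A sketch of the "one row" layout the question asks for: every leaf value gets its own column, and values that share a parent tag are numbered (FC_1, FC_2, ...) so the headers stay unique. The archived sample lost its tags, so the XML below is hypothetical; it borrows the FC/Int32 and PBDD/DateTime names from the earlier post, and the Flag tag and the enumerate_columns name are made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical XML: the real tag names did not survive in the archive.
SAMPLE = """
<Record>
  <FC><Int32>0</Int32><Int32>0</Int32><Int32>0</Int32></FC>
  <PBDD><DateTime>1/1/0001 12:00:00 AM</DateTime><DateTime>1/1/0001 12:00:00 AM</DateTime></PBDD>
  <Flag>False</Flag>
</Record>
"""


def enumerate_columns(record):
    """Return (headers, values) for one record, one column per leaf value.

    A child with its own children (e.g. FC) yields FC_1, FC_2, ...;
    a leaf child (e.g. Flag) yields a single column named after its tag.
    """
    headers, values = [], []
    for child in record:
        leaves = list(child)
        if leaves:
            for i, leaf in enumerate(leaves, start=1):
                headers.append("%s_%d" % (child.tag, i))
                values.append((leaf.text or "").strip())
        else:
            headers.append(child.tag)
            values.append((child.text or "").strip())
    return headers, values


headers, values = enumerate_columns(ET.fromstring(SAMPLE))
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(headers)  # all headers on one line...
writer.writerow(values)   # ...and all values in a single row beneath them
print(out.getvalue())
```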




--
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


[Tutor] Advice on Strategy for Attacking IO Program

2015-03-29 Thread Saran Ahluwalia
Hello:

Here is what I am trying to have my program do:


• Monitor a folder for files that are dropped throughout the day

• When a file is dropped in the folder the program should scan the file

o IF all the contents in the file have the same length

o THEN the file should be moved to a "success" folder and a text file
written indicating the total number of records processed

o IF the file is empty OR the contents are not all of the same length

o THEN the file should be moved to a "failure" folder and a text file
written indicating the cause for failure (for example: Empty file or line
100 was not the same length as the rest).
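The per-file check in these rules can be sketched as one function that returns a pass/fail flag plus the text for the report. This assumes "same length" means the same number of characters per line (ignoring the line ending) and compares every line against line 1; check_file is a hypothetical name.

```python
def check_file(path):
    """Return (ok, message) for one dropped file.

    Assumption: "same length" means every line has the same number of
    characters once the line ending is stripped; line 1 is the reference.
    """
    with open(path) as f:
        lengths = [len(line.rstrip("\r\n")) for line in f]
    if not lengths:
        return False, "Empty file"
    for lineno, length in enumerate(lengths, start=1):
        if length != lengths[0]:
            return False, "Line %d is not the same length as line 1" % lineno
    return True, "%d records processed" % len(lengths)
```

The message doubles as the content of the success or failure text file, so the caller only has to decide where to move the file.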


Below are the functions that I have been experimenting with. I am not sure
how best to combine these constituent parts into a working program. I could
use decorators (sacrificing speed) or simply pass one function into another.

[code]
import fnmatch
import os
import shutil
import sys
import time


# If you want to write to a file, and if it doesn't exist, do this:
def create_if_missing(filepath):
    if not os.path.exists(filepath):
        open(filepath, 'w').close()  # create an empty file


# If you want to read a file, and if it exists, do the following:
def read_if_exists(filepath):
    try:
        with open(filepath) as f:
            return f.read()
    except IOError:
        print('I will be moving this to the ')


# Changing a directory to "/home/newdir"
def change_dir(newdir="/home/newdir"):
    os.chdir(newdir)


def move(src, dest):
    shutil.move(src, dest)


def fileinfo(file):
    filename = os.path.basename(file)
    rootdir = os.path.dirname(file)
    lastmod = time.ctime(os.path.getmtime(file))
    creation = time.ctime(os.path.getctime(file))
    filesize = os.path.getsize(file)

    print("%s**\t%s\t%s\t%s\t%s" % (rootdir, filename, lastmod, creation,
                                   filesize))


searchdir = r'D:\Your\Directory\Root'
matches = []


def search():
    for root, dirnames, filenames in os.walk(searchdir):
        ##  for filename in fnmatch.filter(filenames, '*.c'):
        for filename in filenames:
            ##  matches.append(os.path.join(root, filename))
            fileinfo(os.path.join(root, filename))
    ## print(matches)


def process(path):
    # Hypothetical placeholder: process() is called below but not defined
    # in this post; here it just prints the file's details.
    fileinfo(path)


def get_files(src_dir):
    # traverse root directory, and list directories as dirs and files as files
    for root, dirs, files in os.walk(src_dir):
        for file in files:
            path = os.path.join(root, file)
            process(path)
            os.remove(path)


def del_dirs(src_dir):
    # Walk bottom-up so subdirectories are visited before their parents;
    # src_dir itself comes last, so stop there instead of deleting it.
    for dirpath, _, _ in os.walk(src_dir, topdown=False):
        if dirpath == src_dir:
            break
        try:
            os.rmdir(dirpath)  # only succeeds on empty directories
        except OSError as ex:
            print(ex)


def main(src_dir):
    get_files(src_dir)
    del_dirs(src_dir)


if __name__ == "__main__":
    main(sys.argv[1])  # directory to process, from the command line
[/code]


[Tutor] Feedback on Script for Pandas DataFrame Written into XML

2015-03-29 Thread Saran Ahluwalia
Hello:

I would appreciate your feedback on whether I wrote my XML correctly. I am
exporting a DataFrame and writing it to an XML file using the ElementTree
library. The DataFrame has 11 rows and 8 columns (excluding the index
column).

#My schema assumption:
#
#[
#Some number row
#Sample text 
#]
#

import xml.etree.ElementTree as ET

document = ET.Element("list")

def make_message(document, row):
    # one <message> element per row, one child element per column
    msg = ET.SubElement(document, "message")
    for field in row.index:
        field_element = ET.SubElement(msg, str(field))
        # ElementTree can only serialize strings, so convert each value
        field_element.text = str(row[field])
    return msg

def add_to_document(row):
    return make_message(document, row)

# df.apply(add_to_document, axis=1) ---> if I were to import a DataFrame
# stored in the variable "df", I would simply APPLY the add_to_document
# function to each row (axis=1) and COMBINE this into a document

ET.dump(document)

Thank you, in advance for your help.
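A sketch of the row-per-message layout that does not rely on df.apply: one message element per row, one child element per column, with every value converted to a string (ElementTree only serializes strings). rows_to_xml and the sample data are hypothetical; with a real DataFrame the call would presumably be rows_to_xml(df.columns, df.itertuples(index=False)), and column names are assumed to be valid XML tag names.

```python
import xml.etree.ElementTree as ET


def rows_to_xml(columns, rows, root_tag="list", row_tag="message"):
    """Build <list><message><col>value</col>...</message>...</list>."""
    document = ET.Element(root_tag)
    for row in rows:
        msg = ET.SubElement(document, row_tag)
        for column, value in zip(columns, row):
            # ElementTree only serializes strings, so convert every value.
            ET.SubElement(msg, str(column)).text = str(value)
    return document


# Hypothetical two-row table standing in for the 11 x 8 DataFrame.
columns = ["date", "text"]
rows = [("2015-01-01", "Sample text"), ("2015-01-02", "More text")]
doc = rows_to_xml(columns, rows)
print(ET.tostring(doc, encoding="unicode"))
```

Newer pandas releases (1.3 and later) also include DataFrame.to_xml(), which may make a hand-written version unnecessary.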


Re: [Tutor] Feedback on Script for Pandas DataFrame Written into XML

2015-03-30 Thread Saran Ahluwalia
Good Morning Martin:

Thank you for your feedback.

I have attached a .html file (I recommend downloading it first and then
opening it) and my .py script. Here is the data.


My function (included in the prior message) and my schema are based on my
interpretation that each row of the DataFrame is a message and each field
is within the message.

Each column's index and its corresponding field are nested within each
message (for example, "date"). I chose this hypothetical example because one
of the columns contains the date/timestamp of a correspondence. My question
is as follows:

1. Is this the correct translation/interpretation of the data set? Or am I
overthinking the schema and interpretation of the DataFrame?

I welcome your thoughts and feedback.

Sincerely,

Saran

On Sun, Mar 29, 2015 at 10:32 PM, Martin A. Brown wrote:

>
> Good evening again,
>
> I'm replying to your second post, because I replied to the first. This may
> be a more specific request than is typically handled on Python tutor.  This
> involves specific knowledge of the xml.etree.ElementTree and
> pandas.DataFrame objects.
>
>  I would appreciate your feedback on whether I correctly wrote my XML. I
>> am exporting a DataFrame and writing into a XML file. I used the
>> ElementTree library. The DataFrame has 11 rows and 8 columns (excluding the
>> index column).
>>
>
> Side note:  Hard to know or give any advice without considerably more
> detail on the data involved.  But
>
>  #My schema assumption:
>> #
>> #[
>> #Some number row
>> #Sample text 
>> #]
>> #
>>
>
> That shows 6 (XML) elements.  This is neither 8 nor 11.
>
> This is a more general inquiry and is probably better suited for the lxml
> (ElementTree) mailing list ...
>
>   https://mailman-mail5.webfaction.com/listinfo/lxml
>
> ... or maybe the Pandas mailing list:
>
>   https://groups.google.com/forum/#!forum/pydata
>
> Best of luck,
>
> -Martin
>
> --
> Martin A. Brown
> http://linux-ip.net/
>


[Tutor] Question for Strategy for Directory Monitor

2015-04-02 Thread Saran Ahluwalia
Good Evening:

Here is what I want my program to do:

• *Monitor* a folder for files that are dropped throughout the day

• When a file is dropped in the folder the program should scan the file

o IF all the contents in the file have the same length (let's assume line
length)

o THEN the file should be moved to a "success" folder and a text file
written indicating the total number of records/lines/words processed

o IF the file is empty OR the contents are not all of the same length

o THEN the file should be moved to a "failure" folder and a text file
written indicating the cause for failure (for example: Empty file or line
100 was not the same length as the rest).

I want to thank Martin Brown for his guidance and feedback. I welcome any
and all feedback on the following:

import logging
import os
import shutil
import sys
import time


def initialize_logger(output_dir):
    # Note: every call adds three more handlers; calling this once at
    # start-up avoids duplicate log lines.
    logger = logging.getLogger()
    logger.setLevel(logging.DEBUG)
    formatter = logging.Formatter("%(levelname)s - %(message)s")

    # create console handler and set level to info
    handler = logging.StreamHandler()
    handler.setLevel(logging.INFO)
    handler.setFormatter(formatter)
    logger.addHandler(handler)

    # create error file handler and set level to error
    # (delay=True postpones creating the file until the first record)
    handler = logging.FileHandler(os.path.join(output_dir, "error.log"), "w",
                                  encoding=None, delay=True)
    handler.setLevel(logging.ERROR)
    handler.setFormatter(formatter)
    logger.addHandler(handler)

    # create debug file handler and set level to debug
    handler = logging.FileHandler(os.path.join(output_dir, "all.log"), "w")
    handler.setLevel(logging.DEBUG)
    handler.setFormatter(formatter)
    logger.addHandler(handler)


# Helper functions for the Success and Failure folder outcomes, respectively

def file_len(filename):
    # Count the lines; an empty file returns 0.
    with open(filename) as f:
        return sum(1 for _ in f)


def copyFile(src, dest):
    try:
        shutil.copy(src, dest)
    # e.g. src and dest are the same file
    except shutil.Error as e:
        print('Error: %s' % e)
    # e.g. source or destination doesn't exist
    except IOError as e:
        print('Error: %s' % e.strerror)


def move_to_failure_folder_and_return_error_file(filename):
    if not os.path.isdir('Failure'):  # os.mkdir fails if it already exists
        os.mkdir('Failure')
    copyFile(filename, 'Failure')
    initialize_logger('Failure')
    logging.error("Either this file is empty or the lines")


def move_to_success_folder_and_read(filename):
    if not os.path.isdir('Success'):
        os.mkdir('Success')
    copyFile(filename, 'Success')
    print("Success", filename)
    return file_len(filename)


# This simply checks the file information by name
def fileinfo(file):
    filename = os.path.basename(file)
    rootdir = os.path.dirname(file)
    lastmod = time.ctime(os.path.getmtime(file))
    creation = time.ctime(os.path.getctime(file))
    filesize = os.path.getsize(file)
    return filename, rootdir, lastmod, creation, filesize


def validate_files(filename):
    # Hypothetical stand-in: validate_files() is used but not defined in this
    # post. A file passes if it is non-empty and all lines are equally long.
    with open(filename) as f:
        lengths = set(len(line.rstrip('\r\n')) for line in f)
    if len(lengths) == 1:
        return move_to_success_folder_and_read(filename)
    return move_to_failure_folder_and_return_error_file(filename)


def main(dirslist):
    seen = set()  # names already validated, so each file is handled once
    while True:
        for name in os.listdir(dirslist):
            path = os.path.join(dirslist, name)
            if name not in seen and os.path.isfile(path):
                seen.add(name)
                validate_files(path)
        time.sleep(5)


# Start only after all of the functions above have been defined.
if __name__ == "__main__":
    main(sys.argv[1])  # folder to monitor, from the command line

I am specifically trying to address these problems with the present code:

   - It assumes that all filenames come directly from the command line; there
   is no searching of a directory.

   - It does not move any files to success or failure directories (I have
   added functions at the end that could serve as substitutes).

   - It doesn't calculate anything or write to a text file.

   - It runs once through the names and terminates. It doesn't "monitor"
   anything; I think that I have added the correct while loop.

   - It doesn't check for zero-length files.

I have attempted to address these but, as a novice, am not sure what is
best practice.

Sincerely,

Saran
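The list of gaps above names two missing pieces: moving the file and writing the text file. A minimal sketch of just those two steps, assuming Python 3 and assuming some separate check (not shown) has already produced an ok flag and a message such as "Empty file". route_file and the report file name are hypothetical. It uses shutil.move rather than a copy, since the rules ask for the file to be moved.

```python
import os
import shutil


def route_file(path, ok, message, success_dir="Success", failure_dir="Failure"):
    """Move `path` into the success or failure folder and write a report.

    The report is written next to the moved file as "<name>.report.txt"
    (a hypothetical naming scheme) and contains `message`.
    """
    target_dir = success_dir if ok else failure_dir
    os.makedirs(target_dir, exist_ok=True)  # no error if the folder exists
    moved = shutil.move(path, os.path.join(target_dir, os.path.basename(path)))
    with open(moved + ".report.txt", "w") as report:
        report.write(message + "\n")
    return moved
```

Because the file leaves the watched folder, a polling loop will not pick it up a second time.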


[Tutor] New to Programming: TypeError: coercing to Unicode: need string or buffer, list found

2015-04-02 Thread Saran Ahluwalia
Good Morning:

I understand the error message I get when I run this code. However, I am
curious to know the most Pythonic way to convert the list to a string. I use
Python 2.7.

"Traceback (most recent call last):
before = dict([(f, None) for f in os.listdir(dirlist)])
TypeError: coercing to Unicode: need string or buffer, list found"


The sample code that I am trying to run is:

import os
import time

path = "/Users/Desktop/Projects/"
dirlist = os.listdir(path)
before = dict([(f, None) for f in os.listdir(dirlist)])

def main(dirlist):
    while True:
        time.sleep(10)  # time between update checks
        after = dict([(f, None) for f in os.listdir(dirlist)])
        added = [f for f in after if not f in before]
        if added:
            print('Successfully added new file - ready to validate')

if __name__ == "__main__":
    main()


Re: [Tutor] New to Programming: TypeError: coercing to Unicode: need string or buffer, list found

2015-04-02 Thread Saran Ahluwalia
Danny,

You were spot on with that issue, and I have debugged it. Here are my two
commits for my homework: Starting with pyinotify
<https://github.com/ahlusar1989/WGProjects/blob/master/pyinotifyWGrun.py>
and OS agnostic?
<https://github.com/ahlusar1989/WGProjects/commit/b8a2de25d45b1dc02b1dc189d2cfb71683fbdd9a?diff=unified#diff-0700a6bf321f99757963c11d7866aea4>
I am still working on the latter, adding more customization to fit the
homework specifications.

Feel free to send me any critical feedback when you can.

Cheers,

Saran

On Thu, Apr 2, 2015 at 1:45 PM, Danny Yoo wrote:

>
> On Apr 2, 2015 9:45 AM, "Saran Ahluwalia" wrote:
> >
> > Good Morning:
> >
> > I understand this error message when I run this code. However, I am
> curious
> > to know what the most pythonic way is to convert  the list to a string? I
> > use Python 2.7.
> >
> > "Traceback (most recent call last):
> > before = dict([(f, None) for f in os.listdir(dirlist)])
> > TypeError: coercing to Unicode: need string or buffer, list found"
>
> I actually do not understand the error.  :p
>
> In particular, I do not understand the subexpression:
>
> os.listdir(dirlist)
>
> Can you explain what you are trying to do there?  The listdir function
> only consumes single strings, so this looks like a mistake unless dirlist
> is a string.
>
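Following Danny's point that os.listdir() takes a single directory path (a string) rather than a list, here is a minimal polling sketch that compares two directory snapshots as sets. new_files and watch are hypothetical names, and handle is whatever validation function you pass in.

```python
import os
import time


def new_files(path, before):
    """Return (sorted names added since `before`, current snapshot)."""
    after = set(os.listdir(path))  # listdir takes ONE directory path string
    return sorted(after - before), after


def watch(path, handle, interval=10):
    """Poll `path` forever, calling handle(full_path) for each new entry."""
    before = set(os.listdir(path))
    while True:
        time.sleep(interval)  # time between update checks
        added, before = new_files(path, before)
        for name in added:
            handle(os.path.join(path, name))
```

Usage might look like watch("/Users/Desktop/Projects/", print). Polling can pick up a file that is still being copied into the folder; a more careful monitor might wait until the file size stops changing, or use an event-based library such as pyinotify (Linux only) or watchdog.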