Thanks, Martin, for the detailed breakdown; it actually helped me solve one of my other problems.
My apologies to begin with; it seems I didn't state my problem clearly for this particular case - perhaps I/O was not the best way to describe it. We have a system fully developed in Python, with thousands of lines of code. I know I can use the logging facility for this task, but that means I have to go into the code and edit it to log the specifics of what I need, which is going to take a while, and it also implies that all other users adding modules must include logging statements. Specifically, I would like to track all inputs/outputs to modules/functions - if a module retrieved and used files, ran some analysis on them, and produced other files in return, I would like to take note of this. That is, I want to record the inputs and outputs of a module, and also to record all parameters and attribute values used by that module. I thought I would build a wrapper around the original Python program, or perhaps pick this information up at the OS level. Sorry for the confusion.

Jojo

On Fri, Sep 17, 2010 at 12:45 AM, Martin A. Brown <mar...@linux-ip.net> wrote:

> [apologies in advance for an answer that is partially off topic]
>
> Hi there JoJo,
>
>  : I could begin with tracing I/O calls in my App.. if its
>  : sufficient enough i may not need i/o calls for the OS.
>
> What do you suspect? Filesystem I/O?
>
>  * open(), close(), opendir(), closedir() filesystem latency?
>  * read(), write() latency?
>  * low read() and write() throughput?
>
> Network I/O?
>
>  * Are name lookups taking a long time?
>  * Do you have slow network throughput? (Consider tcpdump.)
>
> Rather than writing code (at first glance), why not use a system
> call profiler to check this out? It is very unlikely that python
> itself is the problem. Could it be the filesystem/network? Could
> it be DNS? A system call profiler can help you find this.
>
> Are you asking this because you plan on diagnosing I/O performance
> issues in your application?
> Is this a one-time thing in a production environment that is
> sensitive to application latency? If so, you might try tickling
> the application and attaching to the process with a system call
> tracer. Under CentOS you should be able to install 'strace'. If
> you can run the proggie on the command line:
>
>     strace -o /tmp/trace-output-file.txt -f python yourscript.py args
>
> Then, go learn how to read the /tmp/trace-output-file.txt.
>
> Suggested options:
>
>   -f       follow children
>   -ttt     sane Unix-y timestamps
>   -T       total time spent in each system call
>   -s 256   256 byte limit on string output (default is 32)
>   -o file  store trace data in a file
>   -p pid   attach to running process of pid
>   -c       only show a summary of cumulative time per system call
>
>  : > But this is extremely dependent on the Operating System - you
>  : > will basically have to intercept the system calls. So, which
>  : > OS are you using? And how familiar are you with its API?
>  :
>  : I am using centos, however i don't even have admin privileges.
>  : Which API are you referring to?
>
> You shouldn't need admin privileges if you can run the program as
> yourself. If you have setuid/setgid bits, then you will need
> somebody with administrative privileges to help you.
>
> OK, so let's say that you have already done this and understand
> all of the above, you know it's not the system and you really want
> to understand where your application is susceptible to bad
> performance or I/O issues. Now, we're back to python land.
>
>  * look at the profile module
>    http://docs.python.org/library/profile.html
>
>  * instrument your application by using the logging module
>    http://docs.python.org/library/logging.html
>
> You might ask how it is a benefit to use the logging module.
> Well, if your program generates logging data (let's say to STDERR)
> and you do not include timestamps on each log line, you can
> trivially add timestamps to the logging data using your system's
> logging facilities:
>
>     { python thingy.py >/dev/null ; } 2>&1 | logger -ist 'thingy.py' --
>
> Or, if you like DJB tools:
>
>     { python thingy.py >/dev/null ; } 2>&1 | multilog t ./directory/
>
> Either of which solutions leaves you (implicitly) with timing
> information.
>
>  : > Also, while you can probably do this in Python, it is likely
>  : > to have a serious impact on the OS performance; it will slow
>  : > down the performance quite noticeably. I'd normally recommend
>  : > using C for something like this.
>
> Alan's admonition bears repeating. Trapping all application I/O is
> probably just fine for development, instrumenting and diagnosing,
> but you may wish to support that in an easily removable manner,
> especially if performance is paramount.
>
> Good luck,
>
> -Martin
>
> --
> Martin A. Brown
> http://linux-ip.net/
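For the case JoJo describes above (recording the inputs, outputs, and parameters of existing functions without editing thousands of lines), one low-effort approach in Python itself is a tracing decorator applied across a module after import, optionally combined with shadowing the built-in open() to note which files are read or written. The sketch below is only an illustration of that idea, not a drop-in solution: trace_io, wrap_module, and analyse are hypothetical names, and it uses the Python 3 builtins module (on Python 2 the equivalent is __builtin__).

```python
import builtins
import functools
import logging
import types

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")

def trace_io(func):
    """Log the arguments and return value of each call,
    without editing the function body itself."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logging.info("CALL %s args=%r kwargs=%r", func.__name__, args, kwargs)
        result = func(*args, **kwargs)
        logging.info("RETURN %s -> %r", func.__name__, result)
        return result
    return wrapper

def wrap_module(module):
    """Apply trace_io to every plain function in an
    already-imported module, so its callers need no edits."""
    for name in dir(module):
        obj = getattr(module, name)
        if isinstance(obj, types.FunctionType):
            setattr(module, name, trace_io(obj))

# To also record which files the program opens (and in what mode),
# the built-in open() can be shadowed with an auditing wrapper:
_real_open = builtins.open

def audited_open(file, mode="r", *args, **kwargs):
    logging.info("OPEN %s mode=%s", file, mode)
    return _real_open(file, mode, *args, **kwargs)

builtins.open = audited_open

@trace_io
def analyse(path):  # hypothetical stand-in for one of the system's functions
    return path.upper()
```

After importing one of the existing modules, a single wrap_module(themodule) call instruments all of its functions at once. A still more global hook is sys.setprofile, which sees every call and return in the interpreter, but it carries exactly the performance cost Alan warns about above, so it is best kept easily removable.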
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor