I'd like to propose a new PEP [no, that isn't a redundant 'process' in there :-)--pre-PEP is a different process than PEP], for a standard library module that deals with files and file paths in an object oriented manner. I believe this module should be included as part of the standard Python distribution.
Background ========== Some time ago, I wrote such a module for myself, and have found it extremely useful. Recently, I found a reference to a similar module, http://www.jorendorff.com/articles/python/path/ by Jeff Orendorff. There are of course differences--I think mine is more comprehensive but probably less stable--but the similarities in thought are striking. Both work by creating a class representing file paths, and then using that class to unify methods from shutil, os.path, and some builtin functions such as 'open' (and maybe some other stuff I can't remember). I haven't looked at Jeff's code yet, but for my own, a major enabler of the enhanced functionality has been the inclusion of generators in Python. This allows, for example, a method which yields all of the lines in a file and automatically closes that file after. The availability of attributes also makes certain things cleaner than was the case in previous versions of python. Fit With Python Philosophy ========================= One of the strengths of Python is that it is a highly object-oriented language, but this is not true when it comes to handling files. As far as python is concerned a file path is just a string, and there are a bunch of things you can do with it, but they all have to be done with function calls (not methods) since there is no concept of a file path object. Even worse, these functions are spread out across various modules, and often have cryptic names that hardly make it obvious what they do. Given that two different people concluded that such a module was desirable, and independently implemented modules that are actually very similar, I suspect there is an 'object-oriented mindset' to which this way of addressing files and file paths is natural. And that should be part of Python. Pragmatic Justification ================= I've been using my module for about a year and a half now. The ease- of-use and uniformity make a huge (I'm tempted to say 'vast') difference in dealing with files. I believe other users would experience an increase in efficiency when dealing with files ranging from 'significant' to 'very large' (in precise technical terms :-) ) Also, I think this type of API would be much easier for new users to learn and use. Examples ======== A few examples are in order. Again, these are from my own library, since I'm not too familiar with Jeff's. Also, this is stuff I'm just typing in right now as an illustration--there may be syntactic errors. (However, all of this functionality is present.) And these by no means represent the full functionality that is already defined. # define a new path object mydir = filepath("#&*$directory") # Note that special characters are automatically escaped # by filepath, as necessary for the current OS. If a character # is illegal in a file name no matter what (cannot be escaped), # an exception will be raised. # A file in that directory f = mydir / "some.txt" # Go through the lines in the file. When all lines are done, # the file will be closed automatically. If the file does not # actually exist, an appropriate exception will be raised. for line in f.iterlines(): ...do something... #The directory containing f is, of course, 'mydir' assert f.parent == mydir #Another path aPath = filepath(....) #In my module (not in Jeff's), a file path is considered # semantically as a sequence of directory names terminated # by the name of a file or directory. This makes it easy to # obtain the name of the file at the end of a path: theFile = aPath[-1] # or the directory leading to that file parentDir = aPath[0:-1] #of course, these two common indexes/slices are accessible through attributes theFile = aPath.basename parentDir = aPath.parent # A more powerful 'walk'-type method is included. Below, # the 'recursive' indicates that directories should be recursively # walked, and the 'preorder' indicates that directories should # be included in the iteration _before_ their contents are given. # There is also a 'postorder' argument, and both may be used to # yield directories both before and after their contents. aPath.iterfiles(recursive=True, preorder=True) # With the advent of the 'itertools' module in python, there is no # need to provide an argument taking a function that is applied # during the walk process, so in that sense, iterfiles is actually simpler than walk. #...and more. All of the various file capabilities available in Python are provided # in a unified package in this module. -- http://mail.python.org/mailman/listinfo/python-list
