On 05/21/2012 06:38 AM, wolfrage8...@gmail.com wrote:
> All, I have had a curious idea for awhile, and was wondering the best
> way to implement it in Python and if it is even possible. The concept
> is this, a file that is actually a folder that contains multiple files
> (Like an Archive format). The actual files are really un-important.
> What I want is for the folder to be represented as a single file by
> any normal file browser, but to be able to access the files with-in
> via Python. I will actually use the word archive to represent my
> mystical folder as a file concept for the rest of this message. Some
> additional things I would like to be possible: is for multiple copies
> of the program to write to the same archive, but different files
> with-in at the same time (Reading & Writing to the archive should not
> lock the archive as long as they are different files); and for just
> the desired files with-in the archive to be loaded to memory with out
> having to hold the entire archive in memory.
>
> Use case for these additional capabilities. I was reading about how
> some advanced word processing programs (MS Word) actually save
> multiple working copies of the file with-in a single file
> representation and then just prior to combining the working copies it
> locks the original file and saves the working changes. That is what I
> would like to do. I want the single file because it is easy for a user
> to grasp that they need to copy a single file or that they are working
> on a single file, but it is not so easy for them to grasp the multiple
> file concepts.
>
> MS Word uses Binary streams as shown here:
> http://download.microsoft.com/download/5/0/1/501ED102-E53F-4CE0-AA6B-B0F93629DDC6/WindowsCompoundBinaryFileFormatSpecification.pdf
> Is this easy to do with python? Does it prevent file locking if you
> use streams? Is this worth the trouble, or should I just use a
> directory and forget this magical idea?
>
> A piece of reference for my archive thoughts, ISO/IEC 26300:2006 chapter 17.2
>
When I first read your description, I assumed you were talking about the streams supported by NTFS, where it's possible to store multiple independent, named streams of data within a single file. That particular feature isn't portable to other operating systems, nor even to non-NTFS filesystems. And the user could get tripped up by copying what he thought was the whole file, but which in fact was only the unnamed stream.

However, thanks for the link. That specification describes building an actual filesystem inside the file, which is a lot of work. It does not mention anything about stream locking, and my experience with MSWORD (which was several years ago, now) indicates that the entire archive is locked whenever one instance of MSWORD is working on it.

I think if you're trying to do something as flexible as MSWORD apparently does (or even worse - adding locking to it) you're asking for subtle bugs and race conditions. If the data you are storing is structured such that you can simplify the MS approach, for example if each stream is always 4k, then it may be worthwhile.

--
DaveA
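
P.S. To make the "each stream is always 4k" simplification concrete, here is a minimal sketch, not a tested design. It assumes every stream fits in one fixed 4096-byte slot, so a stream's position in the archive is just its index times the slot size. The names SLOT_SIZE, write_slot and read_slot are made up for illustration, and the code does not attempt any locking, so it says nothing about what happens when two processes write at once.

```python
import os

SLOT_SIZE = 4096  # assumption: every stream occupies exactly one 4k slot


def write_slot(path, index, data):
    """Write `data` (bytes) into slot `index`, zero-padded to SLOT_SIZE."""
    if len(data) > SLOT_SIZE:
        raise ValueError("data does not fit in one slot")
    # Open for read/write, creating the file if needed, without truncating
    # whatever other slots already hold. O_BINARY only exists on Windows.
    flags = os.O_RDWR | os.O_CREAT | getattr(os, "O_BINARY", 0)
    fd = os.open(path, flags)
    with os.fdopen(fd, "r+b") as f:
        f.seek(index * SLOT_SIZE)
        f.write(data.ljust(SLOT_SIZE, b"\0"))


def read_slot(path, index):
    """Read only slot `index`; the rest of the archive is never loaded."""
    with open(path, "rb") as f:
        f.seek(index * SLOT_SIZE)
        return f.read(SLOT_SIZE)
```

Because the padding is zero bytes, a caller that wants the original data back has to strip the trailing zeros (or store a length inside the slot), for example read_slot("archive.bin", 2).rstrip(b"\0"). Anything with variable-sized streams needs an index of offsets, which is where the real work in the MS format starts.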