On Wed, Sep 2, 2009 at 10:11 AM, Robert Kern <robert.k...@gmail.com> wrote:
> On Wed, Sep 2, 2009 at 09:38, Gökhan Sever <gokhanse...@gmail.com> wrote:
> > Hello,
> >
> > I want to be able to parse a binary file which holds information regarding
> > the experiment configuration and, obviously, the data. Both the
> > configuration and data sections are variable-length. A chunk of this data
> > is shown below (after a binary read operation):
> >
> > '\x00\x00@\x00$\x00\x02\x00\x12\x00\xff\x00\x00\x00U\xaa\xfa\xffd\x00\x08\x00\x01\x00\x08\x00\xff\x00\x00\x00U\xaa\xfb\xffl\x00\xab\x00\x01\x00\xab\x00\xff\x00\x00\x00U\xaa\xe7\x03\x17\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00U\xaa\xd9\x07\x04\x00\x02\x00\r\x00\x06\x00\x03\x00\x00\x00\x01\x00\x00\x00\xd9\x07\x04\x00\x02\x00\r\x00\x06\x00\x03\x00\x00\x00\x01\x00\x00\x00prj.300\x00; Version = 1\n',
> > 'ProjectName = PME1 2009 King Air N825ST\n', 'FlightId = \n',
> > 'AircraftType = WMI King Air 200\n', 'AircraftId = N825ST\n',
> > 'OperatorName = Weather Modification Inc.\n', 'Comments = \n', '\x00\x00@
> >
> > In binary form the file is 1.3MB, and when written to a txt file it
> > expands to 3.7MB, totalling approximately 4 million characters. When fully
> > processed (with an IDL code) it produces 86 separate configuration files
> > and 46 ASCII files for data, covering about 10-15 different instruments in
> > various combinations and sampling rates.
> >
> > I attempted to use the re module, but the time it takes to parse the file
> > is much longer than I expected. What would be the wisest and fastest way
> > to tackle this issue? Upon successful reconstruction of the data and
> > metadata, I am planning to use a more modular structure like HDF5 or
> > netCDF4 for easy data storage and analysis.
>
> Are there fixed delimiters? Like '\x00\...@\x00' perhaps? It might be
> faster to search for those using .find() instead of regexes.
>
> Without more information about how the file format gets split up, I'm
> not sure we can make good suggestions.
>
> --
> Robert Kern
>
> "I have come to believe that the whole world is an enigma, a harmless
> enigma that is made terrible by our own mad attempt to interpret it as
> though it had an underlying truth."
>   -- Umberto Eco
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

Fixed delimiters, yes. That is what I used to parse the metadata with a
regex, something like:

    r = re.compile("\0;.+?\...@\0\$", re.DOTALL)

which extracts the portions that I am interested in. However, I have yet to
figure out how to parse the separate data streams. I couldn't find a way to
see which data blocks go with which device.

I put the test binary file I am using at: http://drop.io/1plh5rt

--
Gökhan
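
For reference, here is a minimal sketch of the .find() approach suggested above, written for Python 3 bytes objects. The helper name split_on_delimiter, the DELIM value, and the sample bytes are all assumptions made for illustration: DELIM is a placeholder modelled on the first bytes of the sample chunk, not a confirmed record separator for this file format.

```python
def split_on_delimiter(buf, delim):
    """Return the non-empty chunks of `buf` lying between occurrences of `delim`."""
    chunks = []
    start = 0
    while True:
        pos = buf.find(delim, start)  # plain substring search, no regex engine
        if pos == -1:
            # No more delimiters: keep whatever trails the last one.
            if start < len(buf):
                chunks.append(buf[start:])
            break
        if pos > start:
            chunks.append(buf[start:pos])
        start = pos + len(delim)
    return chunks


# Hypothetical delimiter and made-up sample bytes, for illustration only.
DELIM = b"\x00\x00@\x00"
sample = DELIM + b"Version = 1\n" + DELIM + b"AircraftId = N825ST\n" + DELIM
chunks = split_on_delimiter(sample, DELIM)
```

For a real file the buffer would come from open(path, "rb").read(). This only separates the blocks; working out which block belongs to which device still needs knowledge of the header layout.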