I seem to have misunderstood what buffy is meant to do. I thought I could create a spec for the entire file (using its dynamic frames feature) and pass that spec and the file name to buffy. When I queried a certain field, it would skip the required number of bytes and load into memory only the bytes that make up that field. I thought this would be an efficient way to read only part of a file in a structured manner.
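For comparison, the "skip to the field, then load only its bytes" idea can be done in plain Java NIO, outside buffy. The sketch below is an assumption-laden illustration, not buffy's API: it memory-maps the file read-only, walks over size-prefixed fields by moving the buffer position (the equivalent of `readInt()`/`readShort()` followed by `skip(size)`), and returns a zero-copy view of the requested field. The class name `LengthPrefixedFields`, the methods `readSize`, `field` and `mapFile`, and the `prefixWidths` layout array are hypothetical; it also assumes unsigned, big-endian size prefixes of 1, 2 or 4 bytes, which may not match the real format.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

public class LengthPrefixedFields {

    // Reads an unsigned size prefix that is 1, 2 or 4 bytes wide (big-endian,
    // ByteBuffer's default byte order) and advances the buffer past it.
    static long readSize(ByteBuffer buf, int prefixWidth) {
        switch (prefixWidth) {
            case 1: return Byte.toUnsignedLong(buf.get());
            case 2: return Short.toUnsignedLong(buf.getShort());
            case 4: return Integer.toUnsignedLong(buf.getInt());
            default: throw new IllegalArgumentException("unsupported prefix width: " + prefixWidth);
        }
    }

    // Returns a zero-copy view of field number `target` (0-based).
    // prefixWidths[i] is the (hypothetical, caller-supplied) width of field i's size prefix.
    static ByteBuffer field(ByteBuffer file, int[] prefixWidths, int target) {
        ByteBuffer buf = file.duplicate();              // own position/limit, shared bytes
        for (int i = 0; i < target; i++) {
            long size = readSize(buf, prefixWidths[i]);
            buf.position(Math.toIntExact(buf.position() + size)); // the skip(size) step
        }
        int size = Math.toIntExact(readSize(buf, prefixWidths[target]));
        ByteBuffer view = buf.slice();                  // starts at the field's first byte
        view.limit(size);                               // ends after its last byte
        return view;
    }

    // Maps the whole file read-only. The mapping stays valid after the channel is
    // closed, and the OS normally pages data in lazily, only when it is accessed.
    static ByteBuffer mapFile(Path path) throws IOException {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    public static void main(String[] args) throws IOException {
        // Tiny sample file: three fields with 1-, 2- and 4-byte size prefixes.
        ByteBuffer out = ByteBuffer.allocate(32);
        out.put((byte) 3).put("abc".getBytes(StandardCharsets.US_ASCII));
        out.putShort((short) 2).put("de".getBytes(StandardCharsets.US_ASCII));
        out.putInt(5).put("hello".getBytes(StandardCharsets.US_ASCII));
        Path tmp = Files.createTempFile("fields", ".bin");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, Arrays.copyOf(out.array(), out.position()));

        ByteBuffer view = field(mapFile(tmp), new int[] {1, 2, 4}, 2);
        byte[] bytes = new byte[view.remaining()];
        view.get(bytes);
        System.out.println(new String(bytes, StandardCharsets.US_ASCII)); // hello
    }
}
```

Mapping a 500MB file this way reserves address space but does not copy 500MB onto the heap; only the pages actually touched while skipping and reading are loaded. A single `FileChannel.map` call is limited to `Integer.MAX_VALUE` bytes, which is well above the file sizes mentioned here.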
That, however, seems to be incorrect. If I create a spec for the entire file, the buffer passed to buffy has to contain the data for the entire file. For a 500MB file, I would have to read all 500MB into a buffer and pass that to buffy. That seems inefficient when I only want to query a few fields from the spec.

The binary file stores its data in a custom format. Much of it is stored as pairs: the size of the data (stored in 1 to 4 bytes) followed by the data bytes themselves. The existing code is littered with custom `readInt()` and `readShort()` methods to read the size and `skip(size)` calls to skip over the data, repeated many times until it reaches the start of the interesting data. I was hoping to avoid that by using buffy and its spec.

On Wednesday, 21 October 2015 20:09:58 UTC+5:30, Amith George wrote:
>
> I am interested in using buffy[1] to read data from multiple binary files. The files vary in size from 10MB to 500MB. From the documentation, buffy seems to work directly on a buffer and not on a file. It can either create a heap or off-heap buffer whose size equals the size of the spec, or it can wrap a passed-in buffer. *With the former, how does it know which file to read from?*
>
> If we choose the latter, i.e. pass in an existing buffer, how do we go about creating that buffer? I am new to Java, so what should I take into account? Reading blog posts, the general trend seems to be to create a buffer from the FileChannel of a RandomAccessFile opened in read mode. The buffer's size can either match the file size or be fixed. Depending on the size of the buffer, `buffer.flip()` is called once, or once per iteration of the read loop. The other alternative seems to be to create a memory-mapped buffer, either the size of the file or of a fixed size.
>
> Since my files won't exceed 500MB and I can create a direct buffer using standard allocation code, do I need to use a memory-mapped buffer? If I am not using a memory-mapped buffer, do I need to call `buffer.flip()` before passing it to buffy?
>
> Also, how does buffy handle reading data that is larger than the fixed-size buffer? In this specific scenario, I am interested in only about 5-10MB of data, located somewhere in the middle of the 500MB file. I don't see the value in creating a 500MB buffer. Can I create a fixed-size 1MB buffer and tell buffy to read a dynamic-type field whose size in that file happens to be 2.5MB?
>
> I looked at other binary file reading libraries, and their documentation doesn't mention how to create the buffer either. I feel like I am overlooking something basic.
>
> [1] - https://github.com/clojurewerkz/buffy

-- 
You received this message because you are subscribed to the Google Groups "Clojure" group.
To post to this group, send email to [email protected]
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/clojure?hl=en
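On the quoted questions about `buffer.flip()` and about a field larger than a fixed 1MB buffer, here is a minimal plain-Java sketch of reading one window of a file at a given offset, independent of buffy. The class `WindowRead` and method `readWindow` are hypothetical names. It uses `FileChannel.read(ByteBuffer, long)`, which reads at an explicit position without moving the channel's own position, and it calls `flip()` once the reads are done, because after filling, the buffer's position sits at the end of the data and a consumer reading from the current position would see nothing.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class WindowRead {

    // Reads up to `length` bytes starting at `offset` into a new direct buffer,
    // then flips it so a consumer sees exactly the bytes read, from index 0.
    static ByteBuffer readWindow(FileChannel ch, long offset, int length) throws IOException {
        ByteBuffer buf = ByteBuffer.allocateDirect(length);
        long pos = offset;
        while (buf.hasRemaining()) {
            int n = ch.read(buf, pos);  // positional read; may return fewer bytes than asked
            if (n < 0) break;           // reached end of file
            pos += n;
        }
        // Before flip: position == bytes read, so remaining() would report only the unused space.
        buf.flip();                     // now limit == bytes read and position == 0
        return buf;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("window", ".bin");
        tmp.toFile().deleteOnExit();
        byte[] data = new byte[100];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        Files.write(tmp, data);

        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
            ByteBuffer w = readWindow(ch, 40, 16);
            System.out.println(w.remaining() + " bytes, first = " + w.get(0)); // 16 bytes, first = 40
            ByteBuffer tail = readWindow(ch, 95, 16);   // only 5 bytes left in the file
            System.out.println(tail.remaining());       // 5
        }
    }
}
```

A buffer returned by `FileChannel.map` already starts at position 0 with its limit at the mapped size, so it does not need a `flip()`. For the 2.5MB field, one option under these assumptions is to read the field's size prefix first and then size the window (or a `map` of just that region) to that length, rather than forcing the field through a fixed 1MB buffer; whether buffy can then consume such a window directly is not something this sketch demonstrates.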
