I seem to have misunderstood what buffy is meant to do. I thought I could 
create a spec (using its dynamic frames feature) for the entire file and 
pass to buffy this spec and the file name. When I query for a certain 
field, it would skip the required number of bytes and load into memory only 
those bytes that comprise the field I queried. I thought this would be an 
efficient way to read only a part of the file in a structured manner. 

That, however, seems to be incorrect. If I create the spec for the entire 
file, then the buffer passed to buffy has to contain the data for the 
entire file. In the case of a 500MB file, I would have to read the entire 
500MB into a buffer and pass that to buffy. That kinda seems inefficient 
when I only wish to query certain fields from the spec. 
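
For reference, plain Java NIO can already hand back a buffer that covers only a slice of the file, which is roughly the behaviour I was hoping for. This is a minimal sketch and not buffy's API: `RegionMapper` and `mapRegion` are names I made up, and I have not confirmed that buffy can wrap such a buffer directly. It also assumes the offset and length of the interesting region are already known.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of buffy.
class RegionMapper {
    // Maps only `length` bytes starting at `offset`, read-only.
    // Nothing outside that window is exposed through the returned buffer.
    static ByteBuffer mapRegion(Path file, long offset, long length) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping remains valid after the channel is closed.
            return channel.map(FileChannel.MapMode.READ_ONLY, offset, length);
        }
    }
}
```

The returned buffer starts at position 0 with its limit set to `length`, so it needs no `flip()` before being read.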

The binary file has its data stored in a custom format. A lot of data is 
stored as a pair representing size of data (1 to 4 bytes of data is used to 
store the size) and actual bytes of data. Existing code is littered with 
custom `readInt()` and `readShort()` methods to read the size of the data and 
`skip(size)` calls to skip over it. This is repeated many times until it reaches the 
beginning of the interesting data. I was hoping to somehow avoid that using 
buffy and its spec. 
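
To make the pattern concrete, here is a rough sketch of what the existing code does, reduced to its core. It is not the real code: `SizePrefixedSkipper` and its methods are hypothetical names, and it assumes the size prefix is big-endian and has the same width for every record, which may not hold for the actual format.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical illustration of the read-size-then-skip pattern.
class SizePrefixedSkipper {
    // Reads an unsigned big-endian size stored in `width` bytes (1 to 4).
    static long readSize(RandomAccessFile in, int width) throws IOException {
        long size = 0;
        for (int i = 0; i < width; i++) {
            size = (size << 8) | in.readUnsignedByte();
        }
        return size;
    }

    // Skips `count` records without reading their payloads and
    // returns the file offset of the record that follows them.
    static long skipRecords(RandomAccessFile in, int count, int width) throws IOException {
        for (int i = 0; i < count; i++) {
            long size = readSize(in, width);
            in.seek(in.getFilePointer() + size);
        }
        return in.getFilePointer();
    }

    // Reads the payload of the record at the current position.
    static byte[] readRecord(RandomAccessFile in, int width) throws IOException {
        int size = Math.toIntExact(readSize(in, width));
        byte[] payload = new byte[size];
        in.readFully(payload);
        return payload;
    }
}
```

Only the size prefixes and the one payload of interest are read; the skipped payloads are never loaded into memory.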


On Wednesday, 21 October 2015 20:09:58 UTC+5:30, Amith George wrote:
>
> I am interested in using buffy[1] to read data from multiple binary files. 
> The files have sizes varying from 10MB to 500MB. From the documentation, 
> buffy seems to work directly on a buffer and not a file. It can either 
> create a heap or off-heap buffer of size equaling the size of the spec or 
> it can wrap a passed in buffer. *With the former, how does it know which 
> file to read from?*
>
> If we choose the latter, ie pass in an existing buffer, how do we go about 
> creating that buffer? I am new to Java, so what should I take into account? 
> Reading blog posts, the general trend seems to be to create a buffer from 
> the inchannel of a RandomAccessFile opened in read mode. The size of the 
> buffer can either match the file size or be a fixed size. Depending on the 
> size of the buffer, `buffer.flip()` is called once or once for each 
> iteration of the read loop. The other alternative seems to be to create a 
> memory mapped buffer, either of size equalling file size or of a fixed 
> size. Since my file size won't go beyond 500MB and I can create a direct 
> buffer using standard allocation code, do I need to use a memory mapped 
> buffer? If I am not using a memory mapped buffer, do I need to call 
> buffer.flip() before passing it to buffy? 
>
> Also, how does buffy handle reading in data that is of size larger than 
> the fixed size buffer? In this specific scenario, I would be interested in 
> only about 5-10MB of data, located somewhere in the middle of the 500MB 
> sized file. I don't see the value in creating a buffer of size 500MB. Can I 
> create a 1MB fixed size buffer and tell buffy to read in a dynamic type 
> field whose size in that file happens to be 2.5MB? 
>
> I looked at other binary file reading libraries and their documentation 
> also doesn't mention how to create the buffer. I feel like I am overlooking 
> something basic. 
>
> [1] - https://github.com/clojurewerkz/buffy
>
