The inner blob parser is expecting an io.Reader. But perhaps I can change
that to pass a Decoder instead, based on what you are saying. For some
reason I hadn't grokked that that is how Decoder works. Just to reiterate
what I think you are saying (and in case anyone stumbles across this
thread later), assume a file with this type of structure (call each of
the outer blobs A, B, C for reference):
{
[
{...},
{...}
]
}
{
[
{...},
{...}
]
}
[
{...},
{...}
]
The first call to Decoder() will move the pointer to the first `{` in A.
Something like exponent-io/jsonpath's SeekTo() could be used to advance
to A's `[`.
The second call to Decoder(), with the embedded reader, will set the
position at A's first inner {...}
Each subsequent call to Decode() will process each inner {...} of A one
at a time until More() is false, at which point the position is at A's `]`
The third call to Decoder() will move the pointer to the first `{` in B.
*Question: Is this in fact correct? If not, how do I get the reader to
this point in the stream?*
The fourth call to Decoder() will allow me to stream-read to B's `[` (in
this case using exponent-io/jsonpath's SeekTo() or some other mechanism).
Each subsequent call to Decode() will process each inner {...} of B one
at a time until More() is false, at which point the position is at B's `]`
The fifth call to Decoder() will move the pointer to the first `[` in C.
Each subsequent call to Decode() will process each inner {...} of C one
at a time until More() is false
I realize this may not be what is actually going on internally inside
these packages, but at a high level, is that conceptually close to what
is happening?
If this is true, I gotta say this is one of the things I *LOVE* about Go.
I cannot count the number of times I've had some complicated problem that
Go makes a whole lot easier. Or, put another way: I was over-complicating
the problem and not recognizing the underlying code defect that needed to
change. In fact, refactoring this code, even though it's used in about
100 places, would be trivial. I could probably just use perl -pi -e to
fix the code. And, if I may be a bit indulgent here, the quality of the
answers that comes out of the Golang community is just amazing. I love
reading this mailing list even though I've only posted to it a few times.
- Greg
On Sunday, March 28, 2021 at 1:26:17 AM UTC-7 Brian Candler wrote:
> > This works, but the downside is that each {...} of bytes has to be
> > pulled into memory. And the function that is called is already designed
> > to receive an io.Reader and parse the VERY large inner blob in an
> > efficient manner.
>
> Is the inner blob decoder actually using a json.Decoder, as shown in your
> example func secondDecoder()? In that case, the simplest and most
> efficient answer is to create a persistent json.Decoder which wraps the
> underlying io.Reader directly, and just keep calling w2.Decode(&v) on each
> call. It will happily consume the stream, one object at a time.
>
> If that's not possible for some reason, then it sounds like you want to
> break the outer stream at outer object boundaries, i.e. { ... }, without
> fully parsing it. You can do that with json.RawMessage:
> https://play.golang.org/p/BitE6l27160
>
> However, you've still read each object as a stream of bytes into memory,
> and you've still done some of the work of parsing the JSON to find the
> start and end of each object. You can turn it back into an io.Reader by
> creating a bytes.NewBuffer around it, if that's what the inner parser
> requires. However if each object is large, and you really need to avoid
> reading it into memory at all, then you'd need some sort of rewindable
> stream.
>
> Another approach is to stop the source generating pretty-printed JSON, and
> make it generate in JSON-Lines <https://jsonlines.org/> format instead.
> It sounds like you're unable to change the source, but you might be able to
> un-prettyprint the JSON by using an external tool (perhaps jq can do
> this). Then I am thinking you could make a custom io.Reader which returns
> data up to a newline, then sends EOF and sends you a fresh io.Reader for
> the next line.
>
> But this is all very complicated, when keeping the inner Decoder around
> from object to object is a simple solution to the problem that you
> described. Is there some other constraint which prevents you from doing
> this?
>
> On Saturday, 27 March 2021 at 19:42:40 UTC [email protected] wrote:
>
>> Good afternoon,
>>
>> For the case where there's a file containing a sequence of hashes (it
>> could be arrays too, as the underlying object type seems irrelevant) as
>> per RFC 7464, I cannot figure out how to handle this in a
>> memory-efficient way that doesn't involve pulling each blob into memory.
>>
>> I've tried to express this on Go playground here:
>> https://play.golang.org/p/Aqx0gnc39rn
>> Note that I'm using exponent-io/jsonpath as the JSON decoder, but
>> certainly that could be swapped for something else.
>>
>> In essence here is an example of the input bytes:
>>
>> {
>> "elements" : [
>> {
>> "Space" : "YCbCr",
>> "Point" : {
>> "Cb" : 0,
>> "Y" : 255,
>> "Cr" : -10
>> }
>> },
>> {
>> "Point" : {
>> "B" : 255,
>> "R" : 98,
>> "G" : 218
>> },
>> "Space" : "RGB"
>> }
>> ]
>> }
>> {
>> "elements" : [
>> {
>> "Space" : "YCbCr",
>> "Point" : {
>> "Cb" : 3000,
>> "Y" : 355,
>> "Cr" : -310
>> }
>> },
>> {
>> "Space" : "RGB",
>> "Point" : {
>> "B" : 355,
>> "G" : 318,
>> "R" : 108
>> }
>> }
>> ]
>> }
>> {
>> "elements" : [
>> {
>> "Space" : "YCbCr",
>> "Point" : {
>> "Cr" : -410,
>> "Cb" : 400,
>> "Y" : 455
>> }
>> },
>> {
>> "Space" : "RGB",
>> "Point" : {
>> "B" : 455,
>> "R" : 118,
>> "G" : 418
>> }
>> }
>> ]
>> }
>>
>> I can iterate through that with this code:
>>
>> w := json.NewDecoder(bytes.NewReader(j))
>> for w.More() {
>> var v interface{}
>> w.Decode(&v)
>> fmt.Printf("%+v\n", v)
>> }
>>
>> This works, but the downside is that each {...} of bytes has to be
>> pulled into memory. And the function that is called is already designed
>> to receive an io.Reader and parse the VERY large inner blob in an
>> efficient manner.
>>
>> So in principle, this is kind of what I want to do, but maybe I'm
>> looking at it all wrong:
>>
>>
>> w := json.NewDecoder(bytes.NewReader(j))
>> for w.More() {
>> reader2 := ???? // some io.Reader that represents each of the 3
>> json-seq blocks
>> secondDecoder(reader2)
>> }
>>
>> func secondDecoder(reader io.Reader) {
>> w2 := json.NewDecoder(reader)
>> var v interface{}
>> w2.Decode(&v)
>> fmt.Printf("%+v\n", v)
>> }
>>
>> Any ideas on how to solve this problem?
>>
>> I should note that it is not possible for the input to change in this
>> case as the system that consumes it is not the same one that has been
>> generating it for the past 5 years.
>>
>> Thanks!
>>
>> - Greg
>>
>>
--
You received this message because you are subscribed to the Google Groups
"golang-nuts" group.
To view this discussion on the web visit
https://groups.google.com/d/msgid/golang-nuts/58472f92-aa24-43a1-b22a-adc8f872e8ccn%40googlegroups.com.