The error model for Microsoft's Midori project (2009-2013) is a nice read:
<http://joeduffyblog.com/2016/02/07/the-error-model/>
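For anyone reading this in the archives later: in Julia 1.0 and newer, `Nullable` no longer exists and `tryparse` returns `nothing` on failure, so the tryparse approach quoted below might look roughly like this. It is only a sketch under a few assumptions: `parsefile_checked` and the callback argument `f` are names I made up for illustration, and I assume each row has as many fields as the schema. Rows are handed to `f` one at a time, so nothing has to fit in memory at once.

```julia
# Sketch for Julia >= 1.0, where tryparse returns `nothing` instead of a Nullable.
# `parsefile_checked` and `f` are hypothetical names used for illustration.
function parsefile_checked(f, io::IO, schema)
    for (line, text) in enumerate(eachline(io))
        strings = split(text, ';')
        row = Real[]
        for (column, (T, str)) in enumerate(zip(schema, strings))
            value = tryparse(T, str)
            # Line and column are both known here, so the message is built here.
            value === nothing &&
                error("could not parse \"$str\" as $T in line $line, column $column")
            push!(row, value)
        end
        f(row)  # process one row at a time; nothing is accumulated here
    end
end

# A well-formed input: collect the row sums.
sums = Int[]
parsefile_checked(row -> push!(sums, sum(row)), IOBuffer("1;2;3\n4;5;6\n"), fill(Int, 3))

# The malformed input from the thread: capture the error message.
test_file = """
1;2;3
4;5;6
7;error;9
"""

msg = try
    parsefile_checked(identity, IOBuffer(test_file), fill(Int, 3))
    ""
catch err
    sprint(showerror, err)
end
println(msg)
```

The point, in the spirit of the Midori post, is that a parse failure is an expected outcome rather than a bug, so the field parser reports it as a value and the caller, which knows the position, decides how to report it.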
On Friday, November 4, 2016 at 5:45:39 AM UTC-4, Tamas Papp wrote:
>
> I figured it out (posting the solution for the archives, and possibly
> for comments). Reading the Julia issues about exceptions, I came across
> a blog post about the Midori error model [1], and also some discussions
> on how exceptions are not the way to handle errors which are not
> bugs. So I realized I need a version of parse that returns a
> Nullable, then found that it already exists (tryparse).
>
> So here is my solution (for the self-contained stylized example, the
> actual code is much more complex):
>
> parsefield{T <: Real}(::Type{T}, string) = tryparse(T, string)
>
> function parsefile(io, schema)
>     line = 1
>     while !eof(io)
>         strings = split(chomp(readline(io)), ';')
>         values = parsefield.(schema, strings)
>         function checked(column, value)
>             if isnull(value)
>                 error("could not parse \"$(strings[column])\" as " *
>                       "$(schema[column]) in line $(line), column $(column)")
>             else
>                 value
>             end
>         end
>         # do something with this
>         [checked(column,value) for (column, value) in enumerate(values)]
>         line += 1
>     end
> end
>
> test_file = """
> 1;2;3
> 4;5;6
> 7;error;9
> """
>
> parsefile(IOBuffer(test_file), fill(Int, 3))
>
> I still need to figure out type stability etc, but I think I am on the
> right track.
>
> [1] http://joeduffyblog.com/2016/02/07/the-error-model/
>
> On Thu, Nov 03 2016, Tamas Papp wrote:
>
> > Unfortunately, the data is too large to fit in memory -- I must process
> > it in a stream.
> >
> > I will look at some libraries, hoping to find an idiomatic solution. I
> > am sure that I am not the first one encountering this pattern.
> >
> > On Thu, Nov 03 2016, Jeffrey Sarnoff wrote:
> >
> >> or split the string into rows of strings and rows into individual
> >> value-keeper strings and put that into a matrix of strings and process the
> >> matrix, tracking row and col and checking for "error"
> >>
> >> On Thursday, November 3, 2016 at 5:15:06 AM UTC-4, Jeffrey Sarnoff wrote:
> >>>
> >>> Or, redefine the question :>
> >>>
> >>> If you are not tied to string processing, reading the test_file as a
> >>> string (if it is) and then splitting the string
> >>> ```julia
> >>> # need the map to avoid SubString results, if it matters
> >>> rowstrings = map(String, split(test_file, '\n'))
> >>> # then split the rows on ';' and convert to ?Float64 with NaN for error
> >>> # or ?Nullable Ints
> >>> # and put the values in a matrix; processing the matrix, you have the
> >>> # rows and cols
> >>> ```
> >>>
> >>>
> >>> On Thursday, November 3, 2016 at 4:34:53 AM UTC-4, Tamas Papp wrote:
> >>>>
> >>>> Jeffrey,
> >>>>
> >>>> Thanks, but my question was about how to have line and column in the
> >>>> error message. So I would like to have an error message like this:
> >>>>
> >>>> ERROR: Failed to parse "error" as type Int64 in column 2, line 3.
> >>>>
> >>>> My best idea so far: catch the error at each level, and add i and line
> >>>> number. But this requires two try-catch-end blocks with rethrow.
> >>>>
> >>>> Extremely convoluted mess with rethrow here:
> >>>> https://gist.github.com/tpapp/6f67ff36a228f47a1792e011d9b0fc13
> >>>>
> >>>> It does what I want, but it is ugly. A simpler solution would be
> >>>> appreciated. I am sure I am missing something.
> >>>>
> >>>> Best,
> >>>>
> >>>> Tamas
> >>>>
> >>>> On Thu, Nov 03 2016, Jeffrey Sarnoff wrote:
> >>>>
> >>>> > Tamas,
> >>>> >
> >>>> > running this
> >>>> >
> >>>> >
> >>>> >
> >>>> > typealias AkoString Union{String, SubString{String}}
> >>>> >
> >>>> > function parsefield{T <: Real, S <: AkoString}(::Type{T}, str::S)
> >>>> >     result = T(0)
> >>>> >     try
> >>>> >         result = parse(T, str)
> >>>> >     catch ArgumentError
> >>>> >         errormsg = string("Failed to parse \"",str,"\" as type ", T)
> >>>> >         throw(ErrorException(errormsg))
> >>>> >     end
> >>>> >     return result
> >>>> > end
> >>>> >
> >>>> > function parserow(schema, strings)
> >>>> >     # keep i for reporting column, currently not used
> >>>> >     [parsefield(T, string) for (i, (T, string)) in
> >>>> >      enumerate(zip(schema, strings))]
> >>>> > end
> >>>> >
> >>>> > function parsefile(io, schema)
> >>>> >     line = 1
> >>>> >     while !eof(io)
> >>>> >         strings = split(chomp(readline(io)), ';')
> >>>> >         parserow(schema, strings)
> >>>> >         line += 1 # currently not used, use for error reporting
> >>>> >     end
> >>>> > end
> >>>> >
> >>>> > test_file = """
> >>>> > 1;2;3
> >>>> > 4;5;6
> >>>> > 7;8;error
> >>>> > """
> >>>> >
> >>>> > parsefile(IOBuffer(test_file), fill(Int, 3))
> >>>> >
> >>>> >
> >>>> >
> >>>> >
> >>>> > by evaluating parsefile(...), results in
> >>>> >
> >>>> >
> >>>> >
> >>>> > julia> parsefile(IOBuffer(test_file), fill(Int, 3))
> >>>> > ERROR: Failed to parse "error" as type Int64
> >>>> >  in parsefield(::Type{Int64}, ::SubString{String}) at ./REPL[2]:7
> >>>> >  in (::##1#2)(::Tuple{Int64,Tuple{DataType,SubString{String}}}) at ./<missing>:0
> >>>> >  in collect_to!(::Array{Int64,1}, ::Base.Generator{Enumerate{Base.Zip2{Array{DataType,1},Array{SubString{String},1}}},##1#2}, ::Int64, ::Tuple{Int64,Tuple{Int64,Int64}}) at ./array.jl:340
> >>>> >  in collect(::Base.Generator{Enumerate{Base.Zip2{Array{DataType,1},Array{SubString{String},1}}},##1#2}) at ./array.jl:308
> >>>> >  in parsefile(::Base.AbstractIOBuffer{Array{UInt8,1}}, ::Array{DataType,1}) at ./REPL[4]:5
> >>>> >
> >>>> >
> >>>> >
> >>>> >
> >>>> >
> >>>> > On Wednesday, November 2, 2016 at 1:01:30 PM UTC-4, Tamas Papp wrote:
> >>>> >>
> >>>> >> This is a conceptual question. Consider the following (extremely
> >>>> >> stylized, but self-contained) code
> >>>> >>
> >>>> >> parsefield{T <: Real}(::Type{T}, string) = parse(T, string)
> >>>> >>
> >>>> >> function parserow(schema, strings)
> >>>> >>     # keep i for reporting column, currently not used
> >>>> >>     [parsefield(T, string) for (i, (T, string)) in
> >>>> >>      enumerate(zip(schema, strings))]
> >>>> >> end
> >>>> >>
> >>>> >> function parsefile(io, schema)
> >>>> >>     line = 1
> >>>> >>     while !eof(io)
> >>>> >>         strings = split(chomp(readline(io)), ';')
> >>>> >>         parserow(schema, strings)
> >>>> >>         line += 1 # currently not used, use for error reporting
> >>>> >>     end
> >>>> >> end
> >>>> >>
> >>>> >> test_file = """
> >>>> >> 1;2;3
> >>>> >> 4;5;6
> >>>> >> 7;8;error
> >>>> >> """
> >>>> >>
> >>>> >> parsefile(IOBuffer(test_file), fill(Int, 3))
> >>>> >>
> >>>> >> This will fail with an error message
> >>>> >>
> >>>> >> ERROR: ArgumentError: invalid base 10 digit 'e' in "error"
> >>>> >>  in tryparse_internal(::Type{Int64}, ::SubString{String}, ::Int64, ::Int64, ::Int64, ::Bool) at ./parse.jl:88
> >>>> >>  in parse(::Type{Int64}, ::SubString{String}) at ./parse.jl:152
> >>>> >>  in parsefield(::Type{Int64}, ::SubString{String}) at ./REPL[152]:1
> >>>> >>  in (::##5#6)(::Tuple{Int64,Tuple{DataType,SubString{String}}}) at ./<missing>:0
> >>>> >>  in collect_to!(::Array{Int64,1}, ::Base.Generator{Enumerate{Base.Zip2{Array{DataType,1},Array{SubString{String},1}}},##5#6}, ::Int64, ::Tuple{Int64,Tuple{Int64,Int64}}) at ./array.jl:340
> >>>> >>  in collect(::Base.Generator{Enumerate{Base.Zip2{Array{DataType,1},Array{SubString{String},1}}},##5#6}) at ./array.jl:308
> >>>> >>  in parsefile(::Base.AbstractIOBuffer{Array{UInt8,1}}, ::Array{DataType,1}) at ./REPL[154]:5
> >>>> >>
> >>>> >> Instead, I would like to report something like this:
> >>>> >>
> >>>> >> ERROR: Failed to parse "error" as Int on line 3, column 3.
> >>>> >>
> >>>> >> What's the idiomatic way of doing this in Julia? My problem is that
> >>>> >> parsefield fails without knowing line or column (i in parserow). I could
> >>>> >> catch and rethrow, constructing an error object gradually. Or I could
> >>>> >> pass line and column numbers to parserow and parsefield for error
> >>>> >> reporting, but that seems somehow inelegant (I have seen it in code
> >>>> >> though).
> >>>> >>
> >>>> >> Best,
> >>>> >>
> >>>> >> Tamas
> >>>> >>
> >>>>
> >>>
>
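For comparison, the catch-and-rethrow option from the original question can be sketched as below, again for Julia 1.0 and newer. This is an illustration and not anyone's actual code from the thread: `FieldParseError` and `parsefile_rethrow` are hypothetical names. One detail worth knowing is that `catch ArgumentError` in Julia binds the caught exception to a variable with that name; it does not filter by type, hence the explicit `isa` check here.

```julia
# Hypothetical exception type; the name and fields are illustrative only.
struct FieldParseError <: Exception
    str::String
    target::Type
    line::Int
    column::Int
end

Base.showerror(io::IO, e::FieldParseError) =
    print(io, "Failed to parse \"", e.str, "\" as ", e.target,
          " on line ", e.line, ", column ", e.column, ".")

function parsefile_rethrow(io::IO, schema)
    for (line, text) in enumerate(eachline(io))
        for (column, (T, str)) in enumerate(zip(schema, split(text, ';')))
            try
                parse(T, str)  # a real parser would keep this value
            catch err
                # Only translate parse failures; pass anything else through.
                err isa ArgumentError || rethrow()
                throw(FieldParseError(String(str), T, line, column))
            end
        end
    end
end

# Capture the message produced for the malformed input from the question.
report = try
    parsefile_rethrow(IOBuffer("1;2;3\n4;5;6\n7;8;error\n"), fill(Int, 3))
    "no error"
catch err
    sprint(showerror, err)
end
println(report)
```

Doing the `try` in the loop that already knows both indices avoids the two nested try-catch-end blocks, at the cost of still using exceptions for what is arguably an expected failure.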