On Thu, 24 Aug 2017 07:40:21 BST roger peppe <[email protected]> wrote:
> On 24 August 2017 at 06:39, Bakul Shah <[email protected]> wrote:
> >
> > Finally, for better performance it may make sense to store the
> > FSM as a vector of vectors or vector of maps so that a slice
> > of inputs may be processed in one function call. Probably best
> > done with a FSM generator.
>
> That's interesting. What might that look like?
Something like this:
type state struct {
	next   byte // index of the next state (row of fsm)
	action byte // low 3 bits: action; upper bits: token kind
}

type Token struct {
	text []byte
	kind byte
}

type Scanner struct {
	st     state
	parser *Parser
	token  Token
	// ...
}

const (
	err byte = iota
	skip
	flush
	unget
	emit
	// ...
)

var fsm [][]state   // indexed as fsm[state][class]
var class [256]byte // maps each input byte to a character class

func init() {
	// initialize class, e.g. letter for [A-Za-z]
	// create the FSM from a more compact spec
}
func (sc *Scanner) Scan(str []byte) {
	st := sc.st
	p := sc.parser
	start := 0 // index in str where the current token begins
	for i := 0; i < len(str); {
		b := str[i]   // no unicode, just 8 bit chars!
		c := class[b] // map char to a much smaller char-class
		st = fsm[st.next][c]
		i++
		switch st.action & 7 {
		case err:
			// handle errors...
		case skip:
			continue
		case flush:
			start = i // e.g. at the end of a comment
		case unget:
			i-- // this byte belongs to the next token; rescan it
			fallthrough
		case emit:
			sc.token = Token{str[start:i], st.action >> 3}
			start = i
			p.Parse(sc.token)
		}
	}
	sc.st = st // remember the state for the next call
}
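
The init function above is left as comments. As a rough sketch of
what "a more compact spec" could look like, the following fills both
tables from short lists of byte ranges and transitions. The names
span, rule, buildClass and buildFSM, the digit and other classes, and
the two-state example machine are all made up for illustration; they
are not part of the original outline.

```go
package main

import "fmt"

type state struct {
	next   byte
	action byte
}

// Hypothetical character classes for this sketch.
const (
	other byte = iota
	letter
	digit
	nclass // number of classes
)

// Actions, in the same order as in the outline above.
const (
	err byte = iota
	skip
	flush
	unget
	emit
)

// span assigns one class to an inclusive byte range (hypothetical).
type span struct {
	lo, hi, class byte
}

// rule is one non-default FSM transition (hypothetical).
type rule struct {
	from, class byte
	to          state
}

var (
	class [256]byte
	fsm   [][]state
)

// buildClass fills the byte-to-class table; unlisted bytes stay other (0).
func buildClass(spans []span) {
	for _, s := range spans {
		for b := int(s.lo); b <= int(s.hi); b++ {
			class[b] = s.class
		}
	}
}

// buildFSM expands rules into a dense table; unlisted entries stay {0, err}.
func buildFSM(nstates int, rules []rule) {
	fsm = make([][]state, nstates)
	for i := range fsm {
		fsm[i] = make([]state, int(nclass))
	}
	for _, r := range rules {
		fsm[r.from][r.class] = r.to
	}
}

func init() {
	buildClass([]span{{'A', 'Z', letter}, {'a', 'z', letter}, {'0', '9', digit}})
	buildFSM(2, []rule{
		{0, letter, state{1, skip}},
		{1, letter, state{1, skip}},
		{1, digit, state{1, skip}},
		{1, other, state{0, emit}},
	})
}

func main() {
	fmt.Println(class['q'], class['7'], class['+']) // 1 2 0
	fmt.Println(fsm[1][class['7']])                 // {1 1}
}
```

Leaving unlisted transitions at the zero value means any input the
spec does not mention falls through to the err action.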
Scan is called every time a line is read. When a full token is
recognized, Parse is called. st.action packs the recognized
terminal, if any, into its upper bits and the next action into
its low three bits.
If an extra char had to be read to recognize the token, the
action is unget: the char is pushed back and then Parse is
called. The parser in turn may trigger things downstream when
some semantic action has to be taken.
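
To make the layout of the action byte concrete, here is a small
sketch of packing and unpacking it, matching the &7 and >>3 used in
Scan. The helpers pack and unpack and the token kinds ident and
number are hypothetical names chosen for the example.

```go
package main

import "fmt"

// Actions occupy the low 3 bits of the action byte.
const (
	err byte = iota
	skip
	flush
	unget
	emit
)

// Hypothetical token kinds, stored in the upper 5 bits.
const (
	none byte = iota
	ident
	number
)

const (
	actionBits = 3
	actionMask = 1<<actionBits - 1 // 7
)

// pack combines a token kind and an action into one byte.
func pack(kind, action byte) byte {
	return kind<<actionBits | action
}

// unpack splits an action byte back into kind and action.
func unpack(packed byte) (kind, action byte) {
	return packed >> actionBits, packed & actionMask
}

func main() {
	a := pack(number, unget)
	kind, action := unpack(a)
	fmt.Println(a, kind, action) // 19 2 3
}
```

With three bits for the action, a byte leaves five bits for the
kind, so this layout allows at most 32 distinct token kinds.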
Scan can be called with the contents of a whole file or even a
single-byte string, because the FSM state is saved in the Scanner
between calls. With this structure the main loop can poll/select
on a number of input connections.
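
As a rough end-to-end illustration, the following toy scanner
recognizes words and numbers and gives the same tokens whether the
input arrives whole or in arbitrary pieces. Several things here are
assumptions of the sketch, not the outline above: the space class,
the word and number kinds, the three-state table, the scanChunks
helper, an out slice standing in for the parser, and a pending
buffer that carries a partial token across calls (the outline slices
str directly and does not show that case).

```go
package main

import "fmt"

type state struct{ next, action byte }

type Token struct {
	text string
	kind byte
}

const ( // actions, low 3 bits of state.action
	err byte = iota
	skip
	flush
	unget
	emit
)

const ( // character classes (hypothetical)
	other byte = iota
	letter
	digit
	space
)

const ( // token kinds, upper bits of state.action (hypothetical)
	word byte = iota + 1
	number
)

var class [256]byte

// fsm[s][c] is the transition taken from state s on class c.
// States: 0 = between tokens, 1 = inside a word, 2 = inside a number.
var fsm = [3][4]state{
	0: {other: {0, err}, letter: {1, skip}, digit: {2, skip}, space: {0, flush}},
	1: {other: {0, unget | word<<3}, letter: {1, skip}, digit: {1, skip}, space: {0, unget | word<<3}},
	2: {other: {0, unget | number<<3}, letter: {0, unget | number<<3}, digit: {2, skip}, space: {0, unget | number<<3}},
}

func init() {
	for b := 0; b < 256; b++ {
		switch {
		case 'a' <= b && b <= 'z', 'A' <= b && b <= 'Z':
			class[b] = letter
		case '0' <= b && b <= '9':
			class[b] = digit
		case b == ' ', b == '\t', b == '\n':
			class[b] = space
		}
	}
}

type Scanner struct {
	st      byte    // current FSM state, kept between Scan calls
	pending []byte  // bytes of the token in progress, kept between calls
	out     []Token // stands in for the parser
}

func (sc *Scanner) Scan(str []byte) {
	for i := 0; i < len(str); {
		b := str[i]
		st := fsm[sc.st][class[b]]
		sc.st = st.next
		sc.pending = append(sc.pending, b)
		i++
		switch st.action & 7 {
		case skip: // b belongs to the token in progress
		case err, flush: // discard what was collected
			sc.pending = sc.pending[:0]
		case unget: // b ended the token; rescan it in the new state
			i--
			sc.pending = sc.pending[:len(sc.pending)-1]
			fallthrough
		case emit:
			sc.out = append(sc.out, Token{string(sc.pending), st.action >> 3})
			sc.pending = sc.pending[:0]
		}
	}
}

// scanChunks feeds each chunk to one Scanner, as a main loop might.
func scanChunks(chunks ...string) []Token {
	var sc Scanner
	for _, c := range chunks {
		sc.Scan([]byte(c))
	}
	return sc.out
}

func main() {
	fmt.Println(scanChunks("foo 42 bar7\n"))
	fmt.Println(scanChunks("fo", "o 4", "2 ba", "r7\n"))
}
```

Both calls print [{foo 1} {42 2} {bar7 1}]. Note that a token is
only emitted once the byte after it has been seen, so input that
ends without a trailing delimiter leaves its last token pending.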