I suppose what I was implying is that I believe there exists a way to deterministically keep "the smallest necessary cache" at all times, as opposed to falling back on less accurate techniques like a cache window. I also believe that such a "smallest necessary cache" would save more memory than the techniques currently in use. The example I gave in my original email was just one simple case where you know for sure you can dump the cache. Allow me to present another example:
iteration = "while" condition block
          / "for" condition block

If we pretend that "iteration" is the top-level rule for this grammar, then after you have successfully parsed "while" you know for sure that you will either succeed in parsing the rest of the rule or fail the parse completely. In other words, there is no successful backtracking available, and thus it is no longer necessary to keep memos at all. You have essentially "collapsed" the backtracking opportunity, and your grammar now behaves as if it were just sequences. This particular case was examined in this paper: http://www.ialab.cs.tsukuba.ac.jp/~mizusima/publications/paste513-mizushima.pdf

But again, this is just one case. Another case where this happens is in the last option of a choice rule:

choice = one / two / three

Once you are in "three", you know that after every character advance you can dump all the memos behind you, because you won't be backtracking. Again, in the last option of a choice you have essentially "collapsed" the backtracking nature of the grammar. This has the interesting consequence that, where possible, you should place the more probable parsing option last, so that you can save memory.

I have successfully implemented the above heuristics in my own PEG parser and gotten very positive results: I have reduced my memory consumption a tremendous amount, and have done so without sacrificing accurate memoization at all. However, I am still a far cry from having what I actually want, which is a complete set of rules for determining at every step what is needed and what isn't (perhaps it's not possible to do this, but I think it is).

On Jul 31, 2011, at 2:22 PM, Dustin Voss wrote:

> It is always safe to dump memos, so long as you are prepared to recreate them
> or read directly from the input. Memoization just speeds things up and
> probably makes algorithmic analysis more tractable.
> You ask about dumping memos, but really what you want to do is reduce the
> memory footprint, and there are several ways to accomplish that while keeping
> some of the speed-up that memos give you.
>
> The simplest option is to use a cache window. Drop results for earlier
> positions as you examine later positions, and drop results for later
> positions if you have to backtrack to an earlier position. Consider it like
> this: either your early matches are accurate, or they leave you on the wrong
> track. If the former, you don’t need to keep information for earlier
> positions, and if the latter, by the time you get on the right track the
> information for later positions is likely to be inapplicable anyway.
>
> Another option is to cache the results of only certain rules, skipping ones
> that are not likely to be reexamined. There was a paper that indicated that
> selective caching actually resulted in faster performance than ubiquitous
> caching; I forget whether that paper included selection criteria.
>
> A third option is to employ some sort of sparse or lazy cache. Perhaps you
> can initially assume that no results at any input position need to be cached,
> but if you ever revisit a position, assume that you’ll revisit it again in
> the future and start caching results for it.
>
> What I would like is some sort of probabilistic evaluation of a grammar.
> Something that tells me, "how likely is it that the results of this rule at
> such-and-such a position would be referenced during backtracking?" Is there
> research along those lines?
>
> On Jul 2, 2011, at 12:50 PM, Francisco Tolmasky wrote:
>
>> Hi,
>>
>> I've only recently started working on PEG parsers (you can see the one I'm
>> working on here: http://github.com/tolmasky/language/ ). I was wondering if
>> there were any recommendations for good papers or other material relating to
>> when it is safe to dump "memos" in a memoizing PEG parser.
>> For example, in the following grammar:
>>
>> start = A A
>> A = something / something_else
>>
>> After successfully parsing through the first "A", it is safe to dump some
>> entries from the memoization table, since you know for sure that you will
>> not be backtracking to them ever again. Unfortunately, as far as I can tell,
>> this optimization can only be used when none of your ancestor rules are
>> ordered-choice rules (or, I suppose, any rule that may backtrack), so it can
>> only be applied occasionally. I've read up a little and have a preliminary
>> implementation of cut support (
>> http://www.ialab.cs.tsukuba.ac.jp/~mizusima/publications/paste513-mizushima.pdf
>> ), but I am looking to see if there are any simpler heuristics like the one
>> I showed above that don't require me to change the grammar.
>>
>> Any help is greatly appreciated.
>>
>> Thanks,
>>
>> Francisco
>>
>> _______________________________________________
>> PEG mailing list
>> [email protected]
>> https://lists.csail.mit.edu/mailman/listinfo/peg

Francisco Tolmasky
714-224-6179
[email protected]
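P.S. In case it helps make the last-alternative heuristic concrete, here is a rough Python sketch. All the names here are made up for illustration; this is not code from my actual parser, and it prunes only once on entering the final alternative rather than after every character advance. The idea: a choice counts how many non-final alternatives are open above it; when it enters its final alternative with none open, nothing can ever rewind before the current position, so memos for earlier positions can be discarded.

```python
# Hypothetical sketch of "the last option of a choice collapses backtracking".

class Parser:
    def __init__(self, text):
        self.text = text
        self.memo = {}              # (rule_name, pos) -> (matched, end_pos)
        self.open_alternatives = 0  # non-final choice alternatives above us

    def literal(self, s, pos):
        """Match a literal string at pos."""
        end = pos + len(s)
        return (True, end) if self.text[pos:end] == s else (False, pos)

    def rule(self, name, body, pos):
        """Packrat-style memoized rule invocation."""
        key = (name, pos)
        if key not in self.memo:
            self.memo[key] = body(pos)
        return self.memo[key]

    def choice(self, alternatives, pos):
        """Ordered choice that prunes memos on entering its final option."""
        for i, alt in enumerate(alternatives):
            last = (i == len(alternatives) - 1)
            if last and self.open_alternatives == 0:
                # Nothing above us can backtrack to before `pos`, so memos
                # at earlier positions can never be consulted again.
                self.memo = {k: v for k, v in self.memo.items() if k[1] >= pos}
            if not last:
                self.open_alternatives += 1
            try:
                matched, end = alt(pos)
            finally:
                if not last:
                    self.open_alternatives -= 1
            if matched:
                return True, end
        return False, pos


# A grammar like:  start = X choice ;  choice = one / two / three
p = Parser("xac")
matched, mid = p.rule("X", lambda q: p.literal("x", q), 0)   # memo at pos 0
alts = [lambda q: p.rule("one",   lambda r: p.literal("aa", r), q),
        lambda q: p.rule("two",   lambda r: p.literal("ab", r), q),
        lambda q: p.rule("three", lambda r: p.literal("ac", r), q)]
matched, end = p.choice(alts, mid)
# On entering "three", the memo for ("X", 0) is discarded, yet the parse
# still succeeds and consumes the whole input.
```

A real implementation would also need the open-alternatives count to account for lookahead operators and repetition, which can backtrack internally as well.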
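As for the cache window Dustin describes above, a minimal sketch of that bookkeeping might look like the following. Again, every name here is hypothetical; it only shows the two dropping rules (forget results older than the window as you advance, and forget later results when you backtrack past the window), not a full parser.

```python
# Hypothetical sketch of a cache-window memo table for a packrat parser.

class WindowedMemo:
    def __init__(self, window):
        self.window = window   # how many positions behind the front to keep
        self.high_water = 0    # furthest input position examined so far
        self.memo = {}         # (rule_name, pos) -> result

    def note_position(self, pos):
        if pos > self.high_water:
            # Advancing: drop results for positions that fell out the back
            # of the window.
            self.high_water = pos
            cutoff = pos - self.window
            self.memo = {k: v for k, v in self.memo.items() if k[1] >= cutoff}
        elif pos < self.high_water - self.window:
            # Backtracked past the window: results for later positions are
            # likely inapplicable now, so drop them.
            self.memo = {k: v for k, v in self.memo.items() if k[1] <= pos}
            self.high_water = pos

    def get(self, name, pos):
        self.note_position(pos)
        return self.memo.get((name, pos))

    def put(self, name, pos, result):
        self.memo[(name, pos)] = result


m = WindowedMemo(window=2)
m.put("A", 0, (True, 1))
m.put("A", 5, (True, 6))
hit = m.get("A", 5)       # advancing to 5 evicts the memo at position 0
miss = m.get("A", 0)      # backtracking past the window finds nothing
```

This trades accuracy for bounded memory, which is exactly the fallback I would like the "smallest necessary cache" rules to make unnecessary.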
_______________________________________________
PEG mailing list
[email protected]
https://lists.csail.mit.edu/mailman/listinfo/peg
