It is always safe to dump memos, so long as you are prepared to recreate them 
or read directly from the input. Memoization only speeds things up (full 
memoization is what makes packrat parsing's linear-time bound go through); 
discarding entries never changes what the parser accepts. You ask about 
dumping memos, but really what you want is to reduce the memory footprint, 
and there are several ways to accomplish that while keeping some of the 
speed-up that memos give you.

The simplest option is to use a cache window. Drop results for earlier 
positions as you examine later positions and drop results for later positions 
if you have to backtrack to an earlier position. Consider it like this: either 
your early matches are accurate, or they leave you on the wrong track. If the 
former, you don’t need to keep information for earlier positions, and if the 
latter, by the time you get on the right track the information for later 
positions is likely to be inapplicable anyway.
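Here is a minimal sketch of that windowing policy in Python. All names are hypothetical, not from any particular parser generator; it assumes memo entries are keyed by (rule, position) pairs, and the window size is just an illustrative knob:

```python
class WindowedMemoTable:
    """Memo table that keeps entries only within `window` positions of
    the furthest input position examined so far, and discards entries
    beyond a position we backtrack to."""

    def __init__(self, window=64):
        self.window = window
        self.entries = {}      # (rule, pos) -> parse result
        self.high_water = 0    # furthest input position seen so far

    def put(self, rule, pos, result):
        self.entries[(rule, pos)] = result
        if pos > self.high_water:
            self.high_water = pos
            cutoff = pos - self.window
            # Drop results for earlier positions that fell out of the window.
            self.entries = {k: v for k, v in self.entries.items()
                            if k[1] >= cutoff}

    def get(self, rule, pos):
        return self.entries.get((rule, pos))

    def on_backtrack(self, pos):
        # We are back on an earlier track; results memoized beyond `pos`
        # were probably computed on the wrong track, so drop them.
        self.high_water = pos
        self.entries = {k: v for k, v in self.entries.items()
                        if k[1] <= pos}
```

The window size trades memory for re-parsing: a small window behaves almost like no memoization, a large one approaches full packrat behavior.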

Another option is to cache the results of only certain rules, skipping ones 
that are not likely to be reexamined. There was a paper that indicated that 
selective caching actually resulted in faster performance than ubiquitous 
caching; I forget whether that paper included selection criteria.
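One way to implement that, assuming you can name the rules worth caching up front (the selection set is an input here, since the selection criteria are exactly the part I don't remember):

```python
def selective_memo(cached_rules):
    """Wrap rule functions so that only rules named in `cached_rules`
    are memoized; every other rule simply reparses on each call.
    Hypothetical interface: a rule function takes a position and
    returns a parse result."""
    memo = {}  # (rule_name, pos) -> result

    def wrap(rule_name, rule_fn):
        if rule_name not in cached_rules:
            return rule_fn  # not worth caching: just reparse
        def memoized(pos):
            key = (rule_name, pos)
            if key not in memo:
                memo[key] = rule_fn(pos)
            return memo[key]
        return memoized

    return wrap
```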

A third option is to employ some sort of sparse or lazy cache. Perhaps you can 
initially assume that no results at any input position need to be cached, but 
if you ever revisit a position, assume that you’ll revisit it again in the 
future and start caching results for it.
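Sketched in Python, that policy might look like the following (a hypothetical heuristic, not tuned against real grammars):

```python
class LazyMemoTable:
    """Start caching results at an input position only after that
    position has been visited more than once, on the bet that a
    revisited position will be revisited again."""

    def __init__(self):
        self.visits = {}   # pos -> number of rule invocations seen here
        self.entries = {}  # (rule, pos) -> cached result

    def apply(self, rule, pos, rule_fn):
        key = (rule, pos)
        if key in self.entries:
            return self.entries[key]
        result = rule_fn(pos)
        self.visits[pos] = self.visits.get(pos, 0) + 1
        if self.visits[pos] > 1:
            # Revisited position: start keeping results for it.
            self.entries[key] = result
        return result
```

The cost is one redundant parse per position that turns out to be hot; positions touched only once never occupy any cache space.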

What I would like is some sort of probabilistic evaluation of a grammar: 
something that tells me, "how likely is it that the result of this rule at 
such-and-such a position will be referenced during backtracking?" Is there 
research along those lines?


On Jul 2, 2011, at 12:50 PM, Francisco Tolmasky wrote:

> Hi,
> 
> I've only recently started working on PEG parsers (you can see the one I'm 
> working on here: http://github.com/tolmasky/language/ ). I was wondering if 
> there were any recommendations for good papers or other material relating to 
> when it is safe to dump "memos" in a memoizing PEG parser. For example, in 
> the following grammar:
> 
> start = A A
> A = something / something_else
> 
> After successfully parsing through the first "A", it is safe to dump some 
> entries from the memoization table since you know for sure that you will not 
> be backtracking to them ever again. Unfortunately, as far as I can tell, this 
> optimization can only be used when none of your ancestor rules are ordered 
> choice rules (or, I suppose, any rule that may backtrack), so it can only be 
> applied occasionally. I've read up a little and have a preliminary 
> implementation for cut support ( 
> http://www.ialab.cs.tsukuba.ac.jp/~mizusima/publications/paste513-mizushima.pdf
>  ) but I am looking to see if there are any simpler heuristics like the one I 
> showed above that don't require me to change the grammar.
> 
> Any help is greatly appreciated.
> 
> Thanks,
> 
> Francisco
> 
> 
> _______________________________________________
> PEG mailing list
> [email protected]
> https://lists.csail.mit.edu/mailman/listinfo/peg

