I suppose what I was implying is that I believe there exists a way to 
deterministically keep "the smallest necessary cache" at all times, as opposed 
to having to fall back to less accurate techniques like a cache window. I also 
believe that such a "smallest necessary cache" would save more memory than the 
techniques currently in use. The example I gave in my original email was just 
one simple case where you know for sure you can dump the cache.
Allow me to present another example:

iteration = "while" condition block / "for" condition block

If we pretend that "iteration" is the top level rule for this grammar, then 
after you have successfully parsed "while" you know for sure you will either 
succeed in parsing the rest of the rule or fail the parse completely: in other 
words, there is no successful backtracking available, and thus it is no longer 
necessary to keep memos at all. You have essentially "collapsed" the 
backtracking opportunity, and your grammar now behaves as if it were just a 
sequence. This particular case is examined in this paper: 
http://www.ialab.cs.tsukuba.ac.jp/~mizusima/publications/paste513-mizushima.pdf
But again, this is just one case.
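
To make the "collapse" concrete, here is a minimal sketch in Python of a 
memoizing matcher for the iteration rule above. The condition/block stubs and 
all names are made up for illustration; the point is only the memos.clear() 
call once "while" (or "for") has committed, since no alternative can succeed 
from an earlier position after that:

```python
# Sketch: iteration = "while" condition block / "for" condition block
# condition and block are placeholder stubs, not a real grammar.

class Parser:
    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.memos = {}  # (rule, pos) -> (matched, next_pos)

    def literal(self, s):
        if self.text.startswith(s, self.pos):
            self.pos += len(s)
            return True
        return False

    def memoized(self, rule, fn):
        key = (rule, self.pos)
        if key in self.memos:
            matched, self.pos = self.memos[key]
            return matched
        start = self.pos
        matched = fn()
        if not matched:
            self.pos = start
        self.memos[key] = (matched, self.pos)
        return matched

    def condition(self):
        return self.memoized("condition", lambda: self.literal("(x)"))

    def block(self):
        return self.memoized("block", lambda: self.literal("{}"))

    def iteration(self):
        start = self.pos
        if self.literal("while"):
            self.memos.clear()  # collapse: the "for" branch can no longer win
            return self.condition() and self.block()
        self.pos = start
        if self.literal("for"):
            self.memos.clear()  # same collapse once "for" has matched
            return self.condition() and self.block()
        return False
```

After a successful parse of "while(x){}", the memo table only contains the 
entries recorded after the clear, never anything from before the keyword.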

Another case where this happens is in the last option of a choice rule:

choice = one / two / three

Once you are in "three", you know that after every character advance, you can 
dump all the memos before you because you won't be backtracking. Again, in the 
last option of a choice, you have essentially "collapsed" the backtracking 
nature of the grammar. This has the interesting consequence that, where 
possible, you should place the most probable option last in a choice so that 
you can save memory.
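
This pruning might be sketched like so (illustrative Python, not from any real 
implementation; the "!" suffix is only there so each alternative spans two 
memoized steps): while matching the last alternative, every advance of the 
input position evicts the memos recorded behind it.

```python
# Sketch: choice = one / two / three, each followed by a literal "!".

class Parser:
    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.memos = {}        # (rule, pos) -> (matched, next_pos)
        self.pruning = False   # true while inside the last alternative

    def advance(self, n):
        self.pos += n
        if self.pruning:
            # dump every memo recorded behind the new position
            self.memos = {k: v for k, v in self.memos.items()
                          if k[1] >= self.pos}

    def memo_literal(self, s):
        key = (s, self.pos)
        if key in self.memos:
            matched, self.pos = self.memos[key]
            return matched
        start = self.pos
        matched = self.text.startswith(s, self.pos)
        if matched:
            self.advance(len(s))
        self.memos[key] = (matched, self.pos if matched else start)
        return matched

    def choice(self):
        start = self.pos
        for alt, is_last in (("one", False), ("two", False), ("three", True)):
            self.pos = start
            self.pruning = is_last  # collapse: nothing backtracks past here
            if self.memo_literal(alt) and self.memo_literal("!"):
                return True
        return False
```

On the input "three!", the failure memos left behind by "one" and "two" are 
swept away as the last alternative advances.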

I have successfully implemented the above heuristics in my own PEG parser and 
gotten very positive results: memory consumption is reduced by a tremendous 
amount, without sacrificing accurate memoization at all. However, I am still a 
far cry from what I actually want, which is a complete set of rules for 
determining, at every step, which memos are needed and which aren't (perhaps 
it's not possible to do this, but I think it is).

On Jul 31, 2011, at 2:22 PM, Dustin Voss wrote:

> It is always safe to dump memos, so long as you are prepared to recreate them 
> or read directly from the input. Memoization just speeds things up and 
> probably makes algorithmic analysis more tractable. You ask about dumping 
> memos, but really what you want to do is reduce the memory footprint, and 
> there are several ways to accomplish that while keeping some of the speed-up 
> that memos give you.
> 
> The simplest option is to use a cache window. Drop results for earlier 
> positions as you examine later positions and drop results for later positions 
> if you have to backtrack to an earlier position. Consider it like this: 
> either your early matches are accurate, or they leave you on the wrong track. 
> If the former, you don’t need to keep information for earlier positions, and 
> if the latter, by the time you get on the right track the information for 
> later positions is likely to be inapplicable anyway.
> 
> Another option is to cache the results of only certain rules, skipping ones 
> that are not likely to be reexamined. There was a paper that indicated that 
> selective caching actually resulted in faster performance than ubiquitous 
> caching; I forget whether that paper included selection criteria.
> 
> A third option is to employ some sort of sparse or lazy cache. Perhaps you 
> can initially assume that no results at any input position need to be cached, 
> but if you ever revisit a position, assume that you’ll revisit it again in 
> the future and start caching results for it.
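
That lazy-cache heuristic might be sketched as follows (invented names; the 
"revisit once, then start caching" threshold is an assumption, and a real 
parser would track visits per position rather than per (rule, position) pair 
if memory is tight):

```python
# Sketch of a lazy cache: nothing is memoized until a key is revisited.

class LazyMemo:
    def __init__(self):
        self.seen = set()   # (rule, pos) keys evaluated at least once
        self.memos = {}     # cached results, only for revisited keys

    def apply(self, rule, pos, fn):
        key = (rule, pos)
        if key in self.memos:
            return self.memos[key]
        value = fn()
        if key in self.seen:
            self.memos[key] = value  # second visit: assume it will recur
        else:
            self.seen.add(key)       # first visit: just remember we were here
        return value
```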
> 
> What I would like is some sort of probabilistic evaluation of a grammar. 
> Something that would tell me, "how likely is it that the results of this rule at 
> such-and-such a position would be referenced during backtracking?" Is there 
> research along those lines?
> 
> 
> On Jul 2, 2011, at 12:50 PM, Francisco Tolmasky wrote:
> 
>> Hi,
>> 
>> I've only recently started working on PEG parsers (you can see the one I'm 
>> working on here: http://github.com/tolmasky/language/ ). I was wondering if 
>> there were any recommendations for good papers or other material relating to 
>> when it is safe to dump "memos" in a memoizing PEG parser. For example, in 
>> the following grammar:
>> 
>> start = A A
>> A = something / something_else
>> 
>> After successfully parsing through the first "A", it is safe to dump some 
>> entries from the memoization table since you know for sure that you will not 
>> be backtracking to them ever again. Unfortunately as far as I can tell this 
>> optimization can only be used when none of your ancestor rules are ordered 
>> choice rules (or I suppose any rule that may backtrack), so it can only be 
>> applied occasionally. I've read up a little and have a preliminary 
>> implementation for cut support ( 
>> http://www.ialab.cs.tsukuba.ac.jp/~mizusima/publications/paste513-mizushima.pdf
>>  ) but I am looking to see if there are any simpler heuristics like the one 
>> I showed above that don't require me to change the grammar.
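
The start = A A observation above might be sketched like this (illustrative 
Python; the two-literal stub for A and all names are made up). Once the first 
A succeeds at the top level, nothing can backtrack before its end position, so 
earlier memos are dropped:

```python
# Sketch: start = A A, with A = something / something_else stubbed
# as the literals "ab" / "cd".

def parse_start(text):
    memos = {}  # (rule, pos) -> (matched, next_pos)

    def A(pos):
        key = ("A", pos)
        if key in memos:
            return memos[key]
        for alt in ("ab", "cd"):
            if text.startswith(alt, pos):
                memos[key] = (True, pos + len(alt))
                break
        else:
            memos[key] = (False, pos)
        return memos[key]

    ok, pos = A(0)
    if not ok:
        return False
    # first A committed: every memo recorded before `pos` is now dead weight
    for key in [k for k in memos if k[1] < pos]:
        del memos[key]
    ok, pos = A(pos)
    return ok and pos == len(text)
```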
>> 
>> Any help is greatly appreciated.
>> 
>> Thanks,
>> 
>> Francisco
>> 
>> 
>> _______________________________________________
>> PEG mailing list
>> [email protected]
>> https://lists.csail.mit.edu/mailman/listinfo/peg
> 

Francisco Tolmasky
714-224-6179
[email protected]

