[PHP] Aggressive PHP Smart Caching

2007-06-18 Thread Alexander Romanovich

I'm a PHP developer looking for feedback on a caching approach I put
together recently. It's informed by thoughts people have shared on this
list and in other places over the years. My goal was to come up with an
extremely lightweight flat-file caching system that addresses various
concerns about portability, speed, and specific feature implementations.


Included is a wishlist I generated to explain the approach. The code is
small, documented, and easy to test for those interested. A feedback form is
attached to the following web page (but this newsgroup is as good a forum as
any, too).


http://technologies.babywhale.net/cache/


Re: [PHP] Aggressive PHP Smart Caching

2007-06-26 Thread Alexander Romanovich

Thank you for your reply Nathan.

You are right that this method of caching is different from the two
types you have outlined below. I would not say that it is a new
method, though; in fact, "pushing static files" to the server is very
common. If I had not designed this method to allow a very small amount
of PHP overhead to handle dynamic updating of the cache, I could even
have gone the extra mile and pushed HTML files that would be loaded
directly by the end user without PHP being initialized at all. (My
reasons for not taking this last step should become apparent to those
who read the wishlist I produced at
http://technologies.babywhale.net/cache/ )


Understand that this method does not *exclude* using the other two
methods you have outlined. In fact, I personally make use of
memcached and APC where I feel it is appropriate in my application
design. That does not mean I cannot also write a cache layer that
makes the application itself and its variables unnecessary for most
site hits (hence a major optimization).


To answer your other questions:

1) Caching on disk could easily be handled instead by caching in  
memory, but this approach is meant to be ultra-portable and work  
everywhere. There are situations where a viable memory storage  
mechanism is simply not available, and other cases where it is not  
desirable to consume memory for this purpose and plenty of hard drive  
storage space is a good alternative. I think you will find this  
caching method is intensely speed-tuned and a fast implementation of  
a portable file-system-based method. I would also point out that in
my line of work, where I chiefly have to adapt to environments that are
configured under rather political circumstances, it is consistently
this type of caching that the system administrators argue for. As
someone has already pointed out, there may not even be a significant  
difference between disk and memory based storage mechanisms on your  
server.
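
A minimal sketch of the flat-file idea in point 1, for readers who want something concrete. The function names cache_path() and cache_fetch(), the md5-based file naming, and the ".cache" extension are assumptions made for illustration; they are not taken from the script on the web page. The caller decides how old a file may be by passing a $validSince timestamp.

```php
<?php
// Sketch only: cache_path() and cache_fetch() are hypothetical names,
// not functions from the script discussed in this thread.

/** Map a request URI to a flat file inside the cache directory. */
function cache_path(string $dir, string $uri): string
{
    return rtrim($dir, '/') . '/' . md5($uri) . '.cache';
}

/**
 * Return the cached output for $uri if the file was written at or after
 * $validSince; otherwise run $render, store its output, and return it.
 */
function cache_fetch(string $dir, string $uri, int $validSince, callable $render): string
{
    $file = cache_path($dir, $uri);
    clearstatcache(true, $file); // make sure filemtime() is not stale

    if (is_file($file) && filemtime($file) >= $validSince) {
        $cached = file_get_contents($file);
        if ($cached !== false) {
            return $cached; // cache hit: the application never runs
        }
    }

    $output = (string) $render(); // cache miss: run the application once

    // Write to a temporary file and rename it, so a concurrent reader
    // never sees a half-written cache file.
    $tmp = $file . '.' . uniqid('', true) . '.tmp';
    if (file_put_contents($tmp, $output) !== false) {
        rename($tmp, $file);
    }
    return $output;
}
```

In a real page the $render callable would typically wrap the application in ob_start() and ob_get_clean(); only PHP built-ins and the local file system are needed, which is the portability argument made above.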


2) Again, one of the main principles behind this method is portability.
To avoid relying on cron, server queries, or other external
checks for a stale cache, I have gone with a "refresh interval", which
has been proposed on this list in the past. It proposes that dynamic
content should be refreshed once every X seconds/minutes/hours. This  
script avoids PHP date manipulations and instead performs some basic  
math to handle the refresh rate, but also to *sync* content to some  
degree, so portions of dynamic content are less likely to haphazardly  
refresh independently and therefore not match. I think this is a  
slight improvement over code that has been posted here before. In a  
practical sense, this means that your application fires and produces  
content only once every X minutes, and not each and every time the  
page is hit. Furthermore, because in this case it is known ahead of  
time when that page will expire, a cache header can be sent with an  
exact expiration time so repeated hits by the same end user will not  
even trigger a transmission of cached content from the server.
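
The "basic math" in point 2 can be sketched as follows. This is an illustration under stated assumptions, not the code from the script itself: the function names are hypothetical, and it assumes refresh windows are aligned to multiples of the interval counted from the Unix epoch, which is one simple way to get the syncing behaviour described. The only date formatting is the gmdate() call needed for the Expires header.

```php
<?php
// Sketch only: all function names here are hypothetical.

/**
 * Start of the refresh window that contains $now. Every page using the
 * same $interval lands in the same window, so separately cached pieces
 * of content roll over at the same moment.
 */
function cache_window_start(int $now, int $interval): int
{
    return $now - ($now % $interval);
}

/** Moment at which the current window ends and the cache goes stale. */
function cache_window_expiry(int $now, int $interval): int
{
    return cache_window_start($now, $interval) + $interval;
}

/** A cached file is fresh if it was written inside the current window. */
function cache_is_fresh(int $fileMtime, int $now, int $interval): bool
{
    return $fileMtime >= cache_window_start($now, $interval);
}

/** Headers telling the browser exactly when the cached page expires. */
function cache_expiry_headers(int $now, int $interval): array
{
    $expiry = cache_window_expiry($now, $interval);
    return [
        'Expires: ' . gmdate('D, d M Y H:i:s', $expiry) . ' GMT',
        'Cache-Control: max-age=' . ($expiry - $now),
    ];
}
```

For example, with a 300-second interval and a current time of 1000, the window started at 900 and ends at 1200, so the page is sent with max-age=200. Each string returned by cache_expiry_headers() would be passed to header() before any output.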


3) With regard to daily purging: for one, if you are going for a
scheduled refresh of content, then you probably already have a
refresh rate that is less than 24 hours, so an additional daily
trigger of recaching should be acceptable. But more specifically,
the reason behind this is that a file-system-based caching method
does not natively support a TTL on cached files, and there has to be
some way to handle a cache of a script that has since been deleted.
Note that if 24 hours is not acceptable for some reason, this script
can easily be modified to increase that interval without negatively
affecting anything else.
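
One way a purge like the one in point 3 could look is sketched below. It assumes the purge is based on file modification time and that cache files share a ".cache" extension; both are assumptions for illustration, as is the function name cache_purge(). The actual script may decide what to delete differently.

```php
<?php
// Sketch only: cache_purge() is a hypothetical name, and purging by
// modification time is an assumption about how the purge could work.

/**
 * Delete cache files that have not been rewritten within $maxAge seconds
 * (24 hours by default). Returns the number of files removed.
 */
function cache_purge(string $dir, int $now, int $maxAge = 86400): int
{
    $removed = 0;
    clearstatcache(); // avoid acting on stale modification times

    // glob() can return false on error, so fall back to an empty list.
    foreach (glob(rtrim($dir, '/') . '/*.cache') ?: [] as $file) {
        $mtime = filemtime($file);
        if ($mtime !== false && $now - $mtime > $maxAge && unlink($file)) {
            $removed++;
        }
    }
    return $removed;
}
```

A cache file whose originating script has been deleted is never rewritten, so its modification time eventually falls outside the limit and the file is removed. Passing a larger $maxAge lengthens the purge interval without touching the refresh logic.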


On Jun 24, 2007, at 11:55 PM, Nathan Nobbe wrote:


Alexander,

sorry to see nobody has replied to your post, im sure you worked  
very hard on the cache system and are eager for feedback..


so to me it looks like youve introduced a somewhat new style of  
caching here (though im sure there are other such approaches); for  
instance i know of 2 main uses for caches at this time [as caching  
pertains to php].


caching php intermediate code
caching application variables

both of these caching techniques are designed to overcome  
limitations of the language as it ships out of the box, more or  
less; afaik.
it appears you are interested in caching the output of php scripts,  
which is, i suppose, a third technique that could be added to the  
list.
so i have a criticism about your system and a couple questions as  
well.

criticism
why cache script output on disk?  if a fast cache is your goal, why  
not store the result of script output in memory rather than on  
disk; that would be much faster

questions
how does your cache system know when cached output is stale and  
allow fresh contents to be delivered from the original script  
rather than being served from the cache?