[PHP] heavy parsing of text, storing both versions

Justin French Thu, 19 Feb 2004 15:35:38 -0800

Hi all,

I'm building a CMS that does heavy parsing of an "HTML shorthand" plain-text format into XHTML Strict, in a similar way to Textile <http://www.textism.com/tools/textile/>.

The problem is that this conversion might take place on 2-3 columns of text plus an unlimited number of other fields (my CMS has user-defined data models), and since users will need to edit this text at a later date, I need to do one of the following:

1. Parse the text on demand into HTML -- the parsing script is too heavy/slow for this.

2. Store both the plain (shorthand HTML) text and the parsed XHTML version of each field -- the problem with this being that I'm storing double the data in the database... combine this with versioning of each 'page', and I'm going to be storing a LOT of data in the DB.

100 articles x 3 versions each x 500 words x 6 chars per word = 900,000 chars; add a whole bunch of XHTML to this, and it's looking pretty huge. Double the articles or versions, and it's scary :)

It also means I need to have two fields for each "field" (input and parsed), which makes the MySQL tables a lot more complex, etc.

3. Write a reverse set of functions which converts the XHTML back to the shorthand on demand for editing -- this seems great, but I don't like the idea of maintaining two functions for such a beast.
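To make option 2 concrete, here is a rough sketch of the "parse once on save, keep both versions" idea. It is only an illustration under assumptions: shorthand_to_xhtml() is a tiny hypothetical stand-in for the real (slow) parser, build_field_row() is a made-up helper, and the column names body_source and body_xhtml are invented for the example.

```php
<?php
// Hypothetical stand-in for the real, slow shorthand parser: it only
// escapes the text, turns *word* into <em>word</em>, and wraps it in <p>.
function shorthand_to_xhtml($source)
{
    $escaped = htmlspecialchars($source, ENT_QUOTES);
    $inline  = preg_replace('/\*([^*]+)\*/', '<em>$1</em>', $escaped);
    return '<p>' . $inline . '</p>';
}

// Build the row for one field: parse once at save time and keep both
// versions. The column names here are assumptions, not a real schema.
function build_field_row($source)
{
    return array(
        'body_source' => $source,                     // shown in the edit form
        'body_xhtml'  => shorthand_to_xhtml($source), // served to visitors
    );
}

$row = build_field_row('Hello *world*');
echo $row['body_xhtml'], "\n"; // prints <p>Hello <em>world</em></p>
```

With this shape the expensive parse runs once per save rather than once per page view, at the cost of the second column per field described above.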

Has anyone got any further ideas?


---
Justin French
http://indent.com.au

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
