Re: [PHP] Re: heavy parsing of text, storing both versions

Justin French Thu, 19 Feb 2004 17:26:01 -0800

On Friday, February 20, 2004, at 10:58 AM, Torsten Schabdach wrote:

Hi Justin,

I'm building a CMS that does heavy parsing of an "HTML shorthand" plain text to XHTML strict, in a similar way to Textile <http://www.textism.com/tools/textile/>. The problem is this conversion might take place on 2-3 columns of text, and unlimited other fields (my CMS has user-defined data models),
Could you please provide a short example or a URL of your data model?
This would clear things up a little bit.

Well, I'm still trying to define the DM, but something like this:

pages
id,path,status,dataModel_id
1,"/products/foo/",1,"3col"
2,"/home/",1,"2col"
etc

dataModel_3col
pageid, title,desc,keywords,col1,col2,col3

dataModel_2col
pageid, title,desc,keywords,col1,col2

... the above would be enough if it were all plain text, but since I'm doing a lot of parsing (especially on <textarea>s), dataModel "3col" might look more like this:

dataModel_3col
pageid,title,desc_in,desc_out,keywords,col1_in,col1_out,col2_in,col2_out,col3_in,col3_out

There may be an entirely better way of doing this though :)
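
As a rough sketch of that layout, here is the "3col" model with paired _in/_out columns. It uses Python's sqlite3 module only so that it runs on its own; the column types, the save_3col() helper and the parse callback are all assumptions for illustration, not anything settled in the DM.

```python
import sqlite3

# Column types are assumed; only the column names come from the layout above.
SCHEMA = """
CREATE TABLE pages (
    id           INTEGER PRIMARY KEY,
    path         TEXT NOT NULL UNIQUE,
    status       INTEGER NOT NULL,
    dataModel_id TEXT NOT NULL
);
CREATE TABLE dataModel_3col (
    pageid   INTEGER PRIMARY KEY REFERENCES pages(id),
    title    TEXT,
    desc_in  TEXT, desc_out TEXT,
    keywords TEXT,
    col1_in  TEXT, col1_out TEXT,
    col2_in  TEXT, col2_out TEXT,
    col3_in  TEXT, col3_out TEXT
);
"""

def make_db():
    """Create an in-memory database holding the two tables."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db

def save_3col(db, pageid, title, keywords, fields, parse):
    """Store each shorthand field next to its parsed version.

    `fields` maps desc/col1/col2/col3 to shorthand text; `parse` is a
    hypothetical stand-in for the shorthand-to-XHTML converter.
    """
    row = {"pageid": pageid, "title": title, "keywords": keywords}
    for name in ("desc", "col1", "col2", "col3"):
        row[name + "_in"] = fields[name]
        row[name + "_out"] = parse(fields[name])
    cols = ", ".join(row)
    marks = ", ".join(":" + key for key in row)
    db.execute(
        f"INSERT OR REPLACE INTO dataModel_3col ({cols}) VALUES ({marks})", row
    )
```

Reading a page back for display would then only ever touch the _out columns, and the edit form only the _in columns.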

2. [...] I'm going to be storing a LOT of data in the DB.
Why not save both as files? Maybe you can find a structure for this, e.g. http://example.org/articles/2004/02/19/text34_v3.html or something like that. You save the "editfiles" and cache the XHTML output.

That's a good idea which I'll definitely look into... ideally, I was hoping for everything to sit in one place -- either a database OR a file structure -- not both. It makes for an easier learning curve, easier back-ups, etc.

The other problem with the above is that I don't wish to merge the fields with the template at this point -- I want that to be on-the-fly. So the question is how to store the multiple fields (in their parsed state) in flat files. I guess simple XML would be an option...

<page>
  <pageid>4</pageid>
  <title>Parsed title</title>
  <desc>Parsed desc</desc>
  <keywords>...</keywords>
  <col1>...</col1>
  <col2>...</col2>
  <col3>...</col3>
</page>

... but that's adding complexity that perhaps doesn't need to be there.
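
To give a feel for how much complexity that is, here is a minimal sketch of writing and reading such a file with Python's standard xml.etree.ElementTree. XML needs a single root element, so the fields are wrapped in one called page (my name for it, nothing settled); fields_to_xml() and xml_to_fields() are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

def fields_to_xml(pageid, fields):
    """Serialise the parsed fields of one page as a small XML document."""
    page = ET.Element("page")  # assumed root element name
    ET.SubElement(page, "pageid").text = str(pageid)
    for name, value in fields.items():
        # The parsed XHTML is stored as escaped text, not as child nodes.
        ET.SubElement(page, name).text = value
    return ET.tostring(page, encoding="unicode")

def xml_to_fields(xml_text):
    """Read the document back into a plain dict of field name to string."""
    page = ET.fromstring(xml_text)
    return {child.tag: child.text or "" for child in page}
```

Because each field is stored as escaped text, the XHTML comes back out exactly as it went in, ready to be merged with the template on the fly.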

I think I'd rather keep everything in the database. Perhaps I need to keep all the versions of the input text in one table, and all the versions of the output in another -- in this case, I could choose to keep ALL versions of the input, but perhaps only 1-5 versions of the output... reducing the "double data".

So, the process of adding/editing a page might be:

- user hits submit
- input saved into dataModel_3col_in (all versions are stored)
- input parsed
- output saved into dataModel_3col_out
- older versions of output (eg any more than 3) are cleaned out

And the process of retrieving data for output to a page would be simply retrieving the latest from dataModel_3col_out.

My only concern here is that there are two tables for every data model, which need to "match" at all times.
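
The submit and retrieve steps above could be sketched roughly like this, again using Python's sqlite3 only to keep the example self-contained. The version column, the single col1 field, the KEEP_OUTPUTS limit and the parse callback are assumptions for illustration. Doing both inserts inside one transaction is one possible way to keep the two tables in step, since either both rows are written or neither is.

```python
import sqlite3

KEEP_OUTPUTS = 3  # assumed limit on stored output versions per page

def make_db():
    """In-memory tables, cut down to one text column for brevity."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE dataModel_3col_in  (pageid INTEGER, version INTEGER,
                                         col1 TEXT,
                                         PRIMARY KEY (pageid, version));
        CREATE TABLE dataModel_3col_out (pageid INTEGER, version INTEGER,
                                         col1 TEXT,
                                         PRIMARY KEY (pageid, version));
    """)
    return db

def save_page(db, pageid, col1, parse):
    """Store the input, store the parsed output, prune old outputs."""
    with db:  # one transaction: both writes commit, or both roll back
        (latest,) = db.execute(
            "SELECT COALESCE(MAX(version), 0) FROM dataModel_3col_in "
            "WHERE pageid = ?", (pageid,)).fetchone()
        version = latest + 1
        db.execute("INSERT INTO dataModel_3col_in VALUES (?, ?, ?)",
                   (pageid, version, col1))
        db.execute("INSERT INTO dataModel_3col_out VALUES (?, ?, ?)",
                   (pageid, version, parse(col1)))
        # Keep only the newest KEEP_OUTPUTS output versions for this page.
        db.execute("DELETE FROM dataModel_3col_out "
                   "WHERE pageid = ? AND version <= ?",
                   (pageid, version - KEEP_OUTPUTS))
    return version

def latest_output(db, pageid):
    """Return the newest parsed text for a page, or None if there is none."""
    row = db.execute(
        "SELECT col1 FROM dataModel_3col_out WHERE pageid = ? "
        "ORDER BY version DESC LIMIT 1", (pageid,)).fetchone()
    return row[0] if row else None
```

After five saves of the same page, the _in table would hold five versions and the _out table only the last three.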

3. write a reverse set of functions which converts the XHTML back to the shorthand on demand for editing -- this seems great, but I don't like the idea of maintaining two functions for such a beast.
Only parse one way. I mean only from your format to XHTML. There is no need for converting back (I believe). Your people edit the text, and after that they open up the browser and hit refresh. That's when your parser comes into play. And with caching, only once per document.
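
A minimal sketch of that parse-once idea, assuming an in-memory cache keyed by document id plus a hash of the shorthand (a real setup would presumably cache to disk or the database). render() and the parse callback are hypothetical names.

```python
import hashlib

# doc_id -> (SHA-1 digest of the shorthand, parsed XHTML)
_cache = {}

def render(doc_id, shorthand, parse):
    """Return XHTML for a document, re-parsing only when its text changed."""
    digest = hashlib.sha1(shorthand.encode("utf-8")).hexdigest()
    hit = _cache.get(doc_id)
    if hit is not None and hit[0] == digest:
        return hit[1]  # shorthand unchanged: serve the cached output
    xhtml = parse(shorthand)  # one-way conversion, shorthand to XHTML
    _cache[doc_id] = (digest, xhtml)
    return xhtml
```

The shorthand stays the only editable copy, so no reverse conversion is ever needed.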


---
Justin French
http://indent.com.au

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
