Hi,
Thursday, January 15, 2004, 10:41:57 AM, you wrote:
TR> Hi,
TR> Thursday, January 15, 2004, 3:07:02 AM, you wrote:
RS>> Hello,
RS>> This question may border on OT...
RS>> I have a web form where visitors must enter large amounts of text at one
RS>> time (text area). Once submitted, the large amount of text is stored as
RS>> a CLOB in an Oracle database.
RS>> Some of my visitors create their text in Ms-Word and then cut and paste
RS>> it into the text area and then submit the form.
RS>> When I retrieve it from the database, I do a stripslashes, htmlentities
RS>> and nl2br in that order to preserve the format of the submitted text.
RS>> When I view this text, single or double quotes show up as little white
RS>> square blocks. I've tested this out with MS-Word on a windows machine
RS>> and a mac machine. Same thing happens with either OS. This only
RS>> happens when they cut and paste from MS-Word into the text area. If
RS>> they type text into the text area directly, everything is fine...
RS>> I know I can search through their submitted text and swap out the
RS>> unrecognized character and insert the proper one. I just don't know
RS>> what to look for as being the unrecognized character.
RS>> I've googled all over looking at ASCII charts and keyboard maps.
RS>> Nothing mentions MS-Word specific information though.
RS>> Anyone out there dealt with this before?
RS>> Thanks,
RS>> R
TR> The quotes are actually a sequence of three bytes with values like
TR> 226 128 156
TR> 226 128 157
TR> for the 2 quotes
TR> here is a bit of code to fix them and a few others, I would be
TR> interested if anyone knew the complete set of these weirdos :)
TR> $crap = array(
TR>     chr(226).chr(128).chr(147), chr(226).chr(128).chr(156),
TR>     chr(226).chr(128).chr(157), chr(226).chr(128).chr(153)
TR> );
TR> $clean = array('-','"','"',"'");
TR> $content = str_replace($crap,$clean,$text);
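For what it's worth, those three-byte sequences look like the UTF-8 encodings of Unicode punctuation: 226 128 156 is E2 80 9C in hex, which is U+201C (left double quote), and 226 128 157 is E2 80 9D, U+201D (right double quote). Below is a minimal sketch of the same replacement written with hex escapes and a few more characters added, assuming the text really is UTF-8. The function name utf8_smart_to_ascii is made up for this example and is not from the thread.

```php
<?php
// Hypothetical helper (not from the thread): map UTF-8 "smart" punctuation
// to plain ASCII. Assumes $text is UTF-8 encoded.
function utf8_smart_to_ascii($text)
{
    $map = array(
        "\xE2\x80\x93" => '-',   // U+2013 en dash            (226 128 147)
        "\xE2\x80\x94" => '-',   // U+2014 em dash
        "\xE2\x80\x98" => "'",   // U+2018 left single quote
        "\xE2\x80\x99" => "'",   // U+2019 right single quote (226 128 153)
        "\xE2\x80\x9C" => '"',   // U+201C left double quote  (226 128 156)
        "\xE2\x80\x9D" => '"',   // U+201D right double quote (226 128 157)
        "\xE2\x80\xA6" => '...', // U+2026 horizontal ellipsis
    );
    // strtr() with an array replaces every key with its value in one pass.
    return strtr($text, $map);
}
```

Calling utf8_smart_to_ascii() on the pasted text before storing or displaying it should leave ordinary characters untouched, since none of these byte sequences occur in plain ASCII.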
TR> --
TR> regards,
TR> Tom
I am probably misleading you ... sorry
It seems Scintilla is the one creating the 3-byte sequence for me from
an MS Word paste. Here is a function to clean it to entities:
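function clean_ms_word($text){
    // Windows-1252 byte values of the characters MS Word inserts
    $bytes = array(
        0x82,0x83,0x84,0x85,0x86,0x87,0x88,0x89,
        0x8a,0x8b,0x8c,0x91,0x92,0x93,0x94,0x95,
        0x96,0x97,0x98,0x99,0x9a,0x9b,0x9c,0x9f
    );
    // str_replace() compares strings, so turn each value into a one-byte string
    $crap = array_map('chr', $bytes);
    // HTML entities in the same order as $bytes; 0x88 is simply removed
    $clean = array(
        '&sbquo;','&fnof;','&bdquo;','&hellip;','&dagger;','&Dagger;','','&permil;',
        '&Scaron;','&lsaquo;','&OElig;','&lsquo;','&rsquo;','&ldquo;','&rdquo;','&bull;',
        '&ndash;','&mdash;','&tilde;','&trade;','&scaron;','&rsaquo;','&oelig;','&Yuml;'
    );
    $content = str_replace($crap,$clean,$text);
    return $content;
}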
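One caveat with replacing single bytes: if the text is already UTF-8, bytes in the 0x80-0xBF range also occur inside multi-byte characters, so a blind replace can damage them. Here is a rough sketch that only applies the Windows-1252 map when the input is not valid UTF-8. It assumes the form data arrives as either UTF-8 or Windows-1252; the name word_chars_to_entities and the shortened maps are mine, for illustration only.

```php
<?php
// Hypothetical helper: convert Word "smart" punctuation to HTML entities,
// choosing the byte map based on whether the input is valid UTF-8.
function word_chars_to_entities($text)
{
    // Single-byte Windows-1252 values (a subset of the list above).
    $cp1252 = array(
        "\x85" => '&hellip;',
        "\x91" => '&lsquo;',
        "\x92" => '&rsquo;',
        "\x93" => '&ldquo;',
        "\x94" => '&rdquo;',
        "\x96" => '&ndash;',
        "\x97" => '&mdash;',
    );
    // The same characters as UTF-8 byte sequences.
    $utf8 = array(
        "\xE2\x80\xA6" => '&hellip;',
        "\xE2\x80\x98" => '&lsquo;',
        "\xE2\x80\x99" => '&rsquo;',
        "\xE2\x80\x9C" => '&ldquo;',
        "\xE2\x80\x9D" => '&rdquo;',
        "\xE2\x80\x93" => '&ndash;',
        "\xE2\x80\x94" => '&mdash;',
    );
    // With the /u modifier, preg_match() returns 1 on a valid UTF-8 subject
    // and false when the subject contains invalid UTF-8.
    $is_utf8 = (preg_match('//u', $text) === 1);
    return strtr($text, $is_utf8 ? $utf8 : $cp1252);
}
```

If the result is later passed through htmlentities(), the ampersands in these entities get escaped a second time by default, so the replacement probably belongs after that step rather than before it.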
--
regards,
Tom
--
PHP General Mailing List (http://www.php.net/)