from:"a . h . s . boy"

Re: [PHP] OT- "Private Registrations" is bogus

2002-11-23 Thread a . h . s . boy

FOR THE RECORD:
I (spud who really is at nothingness.org) did NOT post this original 
message to the list, despite the apparent sender address (and the 
authentic signature at the bottom). Even the message-id header was 
forged to appear to have emanated from my server (whose logs I checked, 
to be sure it hadn't). Funny thing is, I haven't been subscribed to 
this list from that address in weeks, having moved all my subscriptions 
to "spudlists(at)nothingness.org".

I'm not sure the original SMTP envelope is anywhere to be found, but 
I'd love to see it and find out who's been posing as me. In any case, I 
can spell "something" correctly, and in general I believe my grammar to 
be better than this...

So caveat lector...someone IS spamming this list, but it isn't me...

Cheers,
spud.

On Saturday, November 23, 2002, at 02:09  AM, Hugh Danaher wrote:

Did you register just to plug this?
- Original Message -
From: <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, November 22, 2002 9:02 PM
Subject: [PHP] OT- "Private Registrations" for Domains



Has anyone heard of or used somthing like this, just received this 
notice
from my registrar?


-

Dear Valued Customer,
Great news! We now offer Private Registrations.

What is a private registration? A private registration allows you to

shield your personal information from the WhoIs database when 
registering a
domain while retaining the full benefits of ownership. With a private 
domain
registration, you keep your personal information private.

This process is new and so unique that it is supported by two patent

applications.


The way it works is simple:
-Domains By Proxy(TM), a sister company of WORXdoamins, becomes the

registrant of record for any new, existing or transferred domain name 
you
designates.
-The "WHOIS" database is then populated with Domains By Proxy's 
contact
information, not yours!

-Domains By Proxy becomes the registrant of record for any domain 
name.
-Domains By Proxy's proprietary registration and e-mail handling 
systems
even let you elect whether or not to receive postal mail or email


Best of all, you still retain the full benefits of ownership! You can

cancel, sell, renew or transfer your domain names; set-up name servers 
for
the private domain name; resolve disputes involving the domain name; 
and
more.

Getting a Private Registration will:
+ Stop domain-related spam
+ End data mining
+ Deter identity theft
+ Prevent harassers & stalkers
+ Protect your family
+ And more!

To use this great new service just use this link:
http://private.worxdomains.com

-

spud.

---
a.h.s. boy
spud(at)nothingness.org "as yes is to if,love is to yes"
http://www.nothingness.org
---




--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php







---
a.h.s. boy
spud(at)nothingness.org"as yes is to if,love is to yes"
http://www.nothingness.org/
---



[PHP] Cleaning pasted Word text

2002-10-29 Thread a . h . s . boy

I'm working on a PHP-based CMS that allows users to post lengthy  
article texts by submitting through a form. The short version of my  
quandary is this: How can I create a conversion routine that reliably  
substitutes HTML-acceptable output for high-ASCII characters pasted  
into the form (from a variety of operating systems)?

The longer version is this:
In order to prevent scripting vulnerabilities and a variety of other  
undesirable content, I run the body of the text through a cleantext()  
function. This function first strips out illegal HTML tags and  
JavaScript. So far so good.

Then it attempts to perform some character conversions to clean up  
8-bit ASCII characters in the text, so smart quotes, en- and em-dashes,  
ellipses, etc. are converted to suitable alternatives, or to HTML  
entities. I'm using:

// Reference:
// chr(133) = ellipsis
// chr(145) = left curly single quote
// chr(146) = right curly single quote (apostrophe)
// chr(147) = left curly double quote
// chr(148) = right curly double quote
// chr(149) = bullet
// chr(150) = en dash
// chr(151) = em dash
// chr(153) = trademark
// chr(160) = non-breaking space
// chr(161) = inverted exclamation mark
// chr(169) = copyright symbol
// chr(171) = left guillemet
// chr(173) = soft hyphen
// chr(174) = registered trademark
// chr(187) = right guillemet
// chr(188) = 1/4 fraction
// chr(189) = 1/2 fraction
// chr(190) = 3/4 fraction
// chr(191) = inverted question mark
$changearr = array(" "=>" ",
	"\r"=>"\n",
	"\r\n"=>"\n",
	"\n\n\n" => "\n\n",
	chr(133)=>"...",
	chr(145)=>"'",
	chr(146)=>"'",
	chr(147)=>"\"",
	chr(148)=>"\"",
	chr(149)=>"*",
	chr(150)=>"-",
	chr(151)=>"--",
	chr(153)=>"(TM)",
	chr(160)=>" ",
	chr(161)=>"&iexcl;",
	chr(169)=>"&copy;",
	chr(171)=>"&laquo;",
	chr(173)=>"-",
	chr(174)=>"(R)",
	chr(187)=>"&raquo;",
	chr(188)=>"1/4",
	chr(189)=>"1/2",
	chr(190)=>"3/4",
	chr(191)=>"&iquest;");
$returnstr = strtr($returnstr,$changearr);

The server's on a Linux box (RedHat 7.2, standard US installation);  
users can obviously post from any sort of operating system.

This routine seems to work well on Word text pasted in from my Mac (OS  
X 10.2.1), but I see a number of articles appearing on the site with  
text like:

Wouldnâ(TM)t you say?

(That's "Wouldn[a circumflex][Euro symbol](TM)t" instead of "Wouldn't".)

...which was almost definitely pasted in from a Windows-based Microsoft  
Word, and the conversion routines are failing. (And inserting even  
weirder characters...why would the single quote be replaced by _3_  
character substitutions?)
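
One likely explanation, assuming the browser submitted the form as UTF-8: the right single quote (U+2019) arrives as the three bytes 226, 128, 153, and a byte-wise strtr() table like the one above only recognizes the last of them, chr(153), which it turns into "(TM)". The sketch below reproduces that with a two-entry subset of the table; the sample string is made up for illustration.

```php
<?php
// Assumption: the form data arrived as UTF-8, so the right single
// quote U+2019 is the byte sequence E2 80 99 (226, 128, 153).
$input = "Wouldn\xE2\x80\x99t you say?";

// Two entries from the conversion table, keyed by single bytes.
$changearr = array(
    chr(146) => "'",     // Windows-1252 right single quote
    chr(153) => "(TM)",  // Windows-1252 trademark sign
);

// strtr() works on bytes: it never sees byte 146, but it does
// find byte 153 at the end of the UTF-8 sequence.
$output = strtr($input, $changearr);

// The leading bytes 226 and 128 are left behind untouched.
echo bin2hex(substr($output, 6, 2)), "\n";  // e280
echo substr($output, 8), "\n";              // (TM)t you say?
```

Displayed as Windows-1252, the leftover bytes 226 and 128 are an a-circumflex and the Euro sign, which matches the garbage described above.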

I understand that Windows may well use a different character set for  
high-ASCII, but I frankly don't understand how to work that knowledge  
into this situation. And the combination of original text, Linux,  
chr(), and ord() stuff just doesn't make sense to me. For example, if I  
post text (from my Mac) containing only:


“”‘’…

(that's [open-double-quote][close-double-quote][open-single-quote][close-single-quote][ellipsis])

and have PHP run this:

$mailstr = '';  // initialize before appending to avoid a notice
for ($x = 0; $x < strlen($str); $x++) {
   $mailstr .= $str[$x].' is '.ord($str[$x])."\n";
}
mail('me','Characters',$mailstr);

I get mail that says (in parentheses is a description of the character):

ì is 147 (accent-grave-i)
î is 148 (circumflex-i)
ë is 145 (umlaut-e)
í is 146 (accent-acute-i)
Ö is 133 (umlaut capital o)

...which means that PHP "recognizes" the correct ASCII value (147) of a  
double-quote, though my Linux box seems to think that the character is  
a lowercase "i" with a grave accent on it. With this kind of strange  
sub-conversion going on, I'm not all that surprised that things are  
getting mucked up.
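
A hedged note on the output above: 147, 148, 145, 146 and 133 are the Windows-1252 codes for the two double quotes, the two single quotes and the ellipsis, so PHP appears to have received the expected bytes. The accented letters are what those same bytes look like when shown as MacRoman, so the mail reader's charset, rather than the Linux box, may be what is relabelling them. Dumping the codes without echoing the raw byte sidesteps that ambiguity. describe_bytes() is a hypothetical helper name, not part of the original code.

```php
<?php
// Hypothetical debugging helper: list each byte of a string as hex
// and decimal, without printing the raw (charset-dependent) byte.
function describe_bytes($str)
{
    $lines = array();
    for ($i = 0; $i < strlen($str); $i++) {
        $byte = ord($str[$i]);
        $lines[] = sprintf('0x%02X is %d', $byte, $byte);
    }
    return implode("\n", $lines);
}

// Sample input: the Windows-1252 bytes for a pair of curly quotes.
echo describe_bytes("\x93\x94"), "\n";
```

Because the output is plain ASCII, it reads the same in any mail client or terminal, whatever charset it assumes.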

Is there some way of getting pasted Word text from Windows "clean" in  
this manner, as well as accommodating the already-working-right Mac  
Word text?
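
One possible approach, sketched under a stated assumption: the submitted text is either valid UTF-8 or Windows-1252, and nothing else (MacRoman bytes are not handled, and text mixing both encodings is not detected). normalize_pasted_text() is a made-up name, and the replacements simply mirror the table earlier in this message.

```php
<?php
// Hypothetical helper, not the original cleantext() routine.
// Assumption: input is either valid UTF-8 or Windows-1252.
function normalize_pasted_text($text)
{
    // UTF-8 byte sequences for the usual Word "smart" characters.
    $utf8Map = array(
        "\xE2\x80\x98" => "'",    // U+2018 left single quote
        "\xE2\x80\x99" => "'",    // U+2019 right single quote
        "\xE2\x80\x9C" => '"',    // U+201C left double quote
        "\xE2\x80\x9D" => '"',    // U+201D right double quote
        "\xE2\x80\x93" => "-",    // U+2013 en dash
        "\xE2\x80\x94" => "--",   // U+2014 em dash
        "\xE2\x80\xA6" => "...",  // U+2026 ellipsis
        "\xE2\x80\xA2" => "*",    // U+2022 bullet
        "\xE2\x84\xA2" => "(TM)", // U+2122 trademark
        "\xC2\xA0"     => " ",    // U+00A0 non-breaking space
    );
    // The same characters as single Windows-1252 bytes.
    $cp1252Map = array(
        chr(145) => "'", chr(146) => "'",
        chr(147) => '"', chr(148) => '"',
        chr(150) => "-", chr(151) => "--",
        chr(133) => "...", chr(149) => "*",
        chr(153) => "(TM)", chr(160) => " ",
    );

    // With the /u modifier, preg_match() fails on invalid UTF-8,
    // which makes an empty pattern a cheap validity check.
    $isUtf8 = (preg_match('//u', $text) === 1);

    return strtr($text, $isUtf8 ? $utf8Map : $cp1252Map);
}

// The same curly quotes, once as Windows-1252 and once as UTF-8.
echo normalize_pasted_text("\x93hi\x94"), "\n";
echo normalize_pasted_text("\xE2\x80\x9Chi\xE2\x80\x9D"), "\n";
```

Choosing the table per submission, instead of applying the single-byte table to everything, avoids rewriting a byte that is really the tail of a multi-byte UTF-8 character.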

Cheers,
spud.

-
a.h.s. boy
[EMAIL PROTECTED]
dadaIMC support
http://www.dadaimc.org/
-


Re: [PHP] Cleaning pasted Word text

2002-10-29 Thread a . h . s . boy

Brent --

Thanks for the pointer, but it doesn't really address the problem. I am 
specifying the character set for the page (ISO-8859-1), and I'm 
inserting an ACCEPT-CHARSET parameter into the FORM element, but it 
specifies acceptable charsets as UTF-8, ISO-8859-1, and Windows 1252. 
The problem isn't accepting or displaying the characters correctly, the 
problem is figuring out what characters PHP thinks it's looking at!

After further investigation, I find that ISO-8859-1 doesn't even use 
ASCII codes 128-159, so when a user types in a smart quote, it can't 
_really_ be using Latin 1 (but could be Windows Latin 1).

Oddly enough, I've set the page charset to "ISO-8859-1" (which doesn't 
have a smart quote), and my browser is set to "Use character set 
specified by server", and it displays a smart quote just fine with 
chr(147). If I manually change my browser to use "Latin 1", it displays 
a ? (unknown character symbol). So between browsers, character sets, 
meta tags, and operating systems, I'm beginning to think that 
interpreting high-ASCII input is an art rather than a science...

spud.

On Tuesday, October 29, 2002, at 02:51  PM, Brent Baisley wrote:

I think you have posted before and probably didn't get an answer. I'm 
not going to give you an answer (because I don't have one), but 
perhaps I can point you in the right direction.
Look at http://www.w3.org/TR/REC-html40/charset.html and see if that 
helps you. Below is a paragraph I pulled from it.

---
a.h.s. boy
spud(at)nothingness.org"as yes is to if,love is to yes"
http://www.nothingness.org/
---



Re: [PHP] Cleaning pasted Word text

2002-10-29 Thread a . h . s . boy

Errr...I'm not sure how this is applicable to my situation. I'm 
concerned, above all, with converting

curly double quotes
curly single quotes
em and en dashes
inverted exclamation points
inverted question marks
ellipses
non-breaking spaces
registered trademark symbols
bullets
left and right guillemets

Many of these characters do not exist in the ISO Latin 1 character set, 
but can nonetheless be inserted by a browser which defaults to MacRoman 
or Windows Latin 1 (1252) character sets.

The big questions, I suppose, are:

1) What character/ASCII code does PHP interpret “ (left curly quote) 
as, when pasted into a form?
2) Does it interpret it the same way pasted in on a Mac as on a Windows 
box?
3) What influence does the page charset meta tag have on such a 
submission?
4) What influence does the form ACCEPT-CHARSET parameter have?
5) What influence does the browser encoding setting have on such 
submissions?
and finally,
6) If all of these factors can influence the final interpretation of a 
character, what's the best way to approach handling all possible 
combinations?

All of this would be so much easier if I'd just get my hands on a 
Windows box for testing. Guess I'll have to do that. I'm just a bit 
surprised that no one seems to have tackled this problem already...it 
can't be that uncommon.

Then again, I've seen any number of CMS-driven web sites that obviously 
haven't done this sort of conversion, including large news corporation 
sites. And given the paucity of Mac-friendly programming on the web, 
it's not too surprising that so few sites attempt to accommodate Mac 
users. (Testing for Mac compatibility tends to be on par with testing 
for Netscape 3.0 compatibility...not usually a very high priority, 
despite IE 5 for the Mac supposedly being more standards-compliant than 
the Windows version.)

spud.

On Tuesday, October 29, 2002, at 08:49  PM, Jimmy Brake wrote:

for file maker pro (windows/mac) -- word (windows/mac)

function make_safe($text)
{
$text = preg_replace("/(\cM)/", " ", $text);
$text = preg_replace("/(\c])/", " ", $text);
$text = str_replace("\r\n", " ", $text);
$text = str_replace("\x0B", " ", $text);
$text = str_replace('"', " ", $text);
$text = explode("\n", $text);
$text = implode(" ", $text);
$text = addslashes(trim($text));
return($text);
}

function make_safe2($text)
{
$text = str_replace("\r\n", "\n", $text);
$text = preg_replace("/(\cM)/", "\n", $text);
$text = preg_replace("/(\c])/", "\n", $text);
$text = str_replace("\x0B", "\n", $text);
$text = addslashes($text);
return($text);
}

cannot remember why I put in two functions ... but anyhow have fun, you
will probably not need the implode / explode either



On Tue, 2002-10-29 at 16:39, Daniel Guerrier wrote:

Paste into notepad, then copy the text from notepad.
Notepad should remove the high ASCII text.
--- Brent Baisley <[EMAIL PROTECTED]> wrote:

I think you have posted before and probably didn't get an answer. I'm
not going to give you an answer (because I don't have one), but perhaps
I can point you in the right direction.
Look at http://www.w3.org/TR/REC-html40/charset.html and see if that
helps you. Below is a paragraph I pulled from it.

The document character set, however, does not suffice to allow user
agents to correctly interpret HTML documents as they are typically
exchanged -- encoded as a sequence of bytes in a file or during a
network transmission. User agents must also know the specific character
encoding that was used to transform the document character stream into a
byte stream.


On Tuesday, October 29, 2002, at 02:20 PM, a.h.s. boy wrote:

I'm working on a PHP-based CMS that allows users to post lengthy
article texts by submitting through a form. The short version of my
quandary is this: How can I create a conversion routine that reliably
substitutes HTML-acceptable output for high-ASCII characters pasted
into the form (from a variety of operating systems)?



--
Brent Baisley
Systems Architect
Landover Associates, Inc.
Search & Advisory Services for Advanced Technology
Environments
p: 212.759.6400/800.759.0577





---
a.h.s. boy
spud(at)nothingness.org"as yes is to if,love is to yes"
http://www.nothingness.org/
---



[PHP] More on cleaning Windows characters...

2002-11-03 Thread a . h . s . boy

After considerable investigation into the form input of non-Latin 1 
characters to be processed by PHP on a Linux box, I've been able to 
distill the issue down considerably, though a solution (and one oddity) 
remains elusive.

I found a very helpful web page entitled "On the use of some MS Windows 
characters in HTML" that explains my problem rather well at 
http://www.cs.tut.fi/~jkorpela/www/windows-chars.html. Recommended 
reading for anyone displaying text that may have been entered by 
Windows users, especially text pasted in from word-processing apps.

Basically, the problem is this: on a Windows machine using Windows 1252 
("Windows Latin 1"), a pair of smart quotes are ASCII characters 147 
and 148. There are a number of other "special" characters that Windows 
maps onto ASCII 128-159, like em dashes and trademark symbols.

Unfortunately, _true_ Latin 1 (iso-8859-1) reserves chars 128-159 for 
control characters. So, while you may type ALT-0147 to type a smart 
quote into your word processing app (or allow Word to create them 
automagically when you type a quote), when that very same character is 
pasted into a web page form set to accept iso-8859-1 or UTF-8 encoding, 
it DOES NOT MAP to chr(147) when processed by PHP on a Linux box.

Strangely, pasting in a Word-created smart quote character into a web 
form and processing it with PHP produces VERY ODD results. Take the 
string

=“=

where the quotation mark is a curly-style quote. Tell PHP to step 
through the characters and print their ASCII value. The two equal signs 
are fine (char 61), but the curly quote comes across as THREE 
characters: (226)(128)(156). Where this comes from, I do not understand.
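
Those three values are consistent with UTF-8: 226, 128, 156 is the UTF-8 encoding of U+201C, the left curly double quote, which suggests the browser sent the form data as UTF-8 and not as a single-byte charset. A minimal sketch of the arithmetic follows; the function name is made up for illustration, and it only handles the three-byte case.

```php
<?php
// Hypothetical helper: decode one three-byte UTF-8 sequence
// (pattern 1110xxxx 10xxxxxx 10xxxxxx) into its code point.
function utf8_3byte_to_codepoint($b1, $b2, $b3)
{
    return (($b1 & 0x0F) << 12)   // low 4 bits of the lead byte
         | (($b2 & 0x3F) << 6)    // low 6 bits of byte two
         |  ($b3 & 0x3F);         // low 6 bits of byte three
}

$codepoint = utf8_3byte_to_codepoint(226, 128, 156);
printf("U+%04X\n", $codepoint);  // U+201C
```

The same arithmetic applied to 226, 128, 153 gives U+2019, the right single quote.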

I'm inclined to think that if I _don't_ try to specify the 
accept-charset parameter on the form, and _don't_ try to convert em 
dashes, curly quotes, etc that I'll probably end up with cleaner text 
than I do now.

Still, if anyone has any really helpful input on this topic, please 
write me and let me know. We're getting into the ugly guts of page 
charset vs. form accept-charset vs. browser input charset vs. latin 1 
vs. Windows latin 1 vs. MacRoman here, but I'm surprised that no one 
has chimed in on this. Does anyone else ever run into this problem, or 
do everyone else's forms just handle all of this magically without 
any intervention?

spud.

---
a.h.s. boy
spud(at)nothingness.org"as yes is to if,love is to yes"
http://www.nothingness.org/
---



[PHP] ForceType hack with Apache 2?

2002-06-10 Thread a . h . s . boy


I've built an application framework in PHP that makes heavy use of the 
"smart URL" technique for passing variables, which works great with 
Apache 1.3.22. I have reports, however, that it breaks under Apache 2.x, 
and would like to verify whether or not anyone can confirm this.

I'm using URLs to pass parameters in the manner of the relatively 
well-known method, e.g.

http://www.server.com/info/display/123/index.php

with


   ForceType  application/x-httpd-php


in the httpd.conf file. It works like a charm (on Linux, anyway). I've 
run into problems with it under FreeBSD, but I was forewarned that it 
might not work there.

Another user of the framework, however, just installed Apache 2/PHP 
4.2.1 on Linux, and reports that the "smart URLs" aren't being so smart, 
and generate 404s.

Strangely, the URL

http://www.server.com/info

will correctly execute as PHP a script called "info", but

http://www.server.com/info/display/123

will NOT execute the info script, instead looking for a directory path 
that doesn't exist, and generating a 404 error.

Since I don't have Apache 2 installed, I can't test it myself. Has 
anyone used this trick (and made it work) with Apache 2? I don't know 
whether to blame a fundamental change in Apache 2, or to look for some 
other configuration error in the user's system.
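
For reference, the receiving end of this technique usually reads the trailing path from $_SERVER['PATH_INFO']. The sketch below is not the framework's actual code; parse_smart_url() and the 'action' and 'id' keys are assumptions for illustration. On the server side, Apache 2 has an AcceptPathInfo directive that controls whether trailing path information is accepted or rejected with a 404, so that may be worth checking in the user's configuration.

```php
<?php
// Hypothetical helper: split "/display/123/index.php" into parts.
function parse_smart_url($pathInfo)
{
    // Drop leading/trailing slashes, then split on the rest.
    $trimmed = trim((string) $pathInfo, '/');
    $segments = ($trimmed === '') ? array() : explode('/', $trimmed);

    return array(
        'action' => isset($segments[0]) ? $segments[0] : null,
        'id'     => isset($segments[1]) ? (int) $segments[1] : null,
    );
}

// With the URL /info/display/123/index.php and a script named
// "info", the remainder of the URL is expected in PATH_INFO.
$pathInfo = isset($_SERVER['PATH_INFO'])
    ? $_SERVER['PATH_INFO']
    : '/display/123/index.php';  // sample value for CLI runs
$params = parse_smart_url($pathInfo);
```

If PATH_INFO never reaches the script under Apache 2, the problem lies in how the server resolves the URL, and no change to the PHP side would help.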

Cheers,
spud.


---
a.h.s. boy
spud(at)nothingness.org"as yes is to if,love is to yes"
http://www.nothingness.org/
PGP Fingerprint: 7B5B 2E7A FA96 865A D9D9  5D6D 54CD D2C1 3429 56B4
---



[PHP] internationalization and gettext

2002-06-13 Thread a . h . s . boy


I developed a rather large and extensive PHP application for maintaining 
a news publishing site. All static text was, when I created it, written 
in English. Form field labels, long explanatory texts, navigational 
links, everything. The popularity of the application, however, has 
drifted outside the Anglo boundaries, and now I need to internationalize 
the code to support other languages for text output and date formatting. 
(Localization I'll leave up to those better qualified). I'm looking for 
feedback on various methods of allowing easy localization.

The date formatting stuff is easy enough with built-in PHP functions. 
All my dates are instantiated as objects anyway, so output format is 
nicely encapsulated in class methods, making that part easy.

The manner of supporting localization of text strings, however, allows 
for a few more options. I've taken note of how a few different PHP apps 
have broached the topic:

-- phpMyAdmin. Minimal static text. Loads a localized text file 
containing localized variable assignments, and always puts out text by 
variable reference rather than hard-coded text.

-- FUDforum2. Fairly large quantity of text. Also uses a text file of 
variables and their localized value, but seems to have a more indirect 
manner of putting out the variable-based text. (Part of the apparent 
"indirection" is that perusing the code reveals TONS of hard-coded 
English phrases that don't seem to be localized. But I haven't actually 
seen, as a web user, the FUDforum2 operating in a non-English language, 
so I can't really vouch for whether all the text is internationalized or 
not).

-- The PHP Manual speaks highly of gettext() calls, relying on a 
standard, mature UNIX function to automatically handle 
internationalization. It sounded good, and I liked the idea of it 
leveraging the power of something that was _designed_ for 
internationalization, and not a "good hack job of variable 
substitutions" attempting the same functionality.

So I've begun the process of modifying all of my text output from

echo ("My English phrase dangles");
to
echo _("My English phrase dangles");

and the like. It's beautiful. 3 extra characters -- "_()" -- to wrap 
around strings, and poof! it's 90% of the way towards the United Nations.

I created a quick English string file of a few webpages' worth of text, 
threw together a rough French translation, and tried viewing the site in 
French. Seemed to work wonderfully.
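
For anyone trying the same thing, here is a minimal sketch of the setup that usually sits behind those _() calls. The "messages" domain, the ./locale directory and the helper names are assumptions, not the application's real values; gettext conventionally looks for compiled catalogs at a path like locale/fr_FR/LC_MESSAGES/messages.mo.

```php
<?php
// Hypothetical setup helper. The domain name and directory are
// placeholders; adjust them to wherever the .mo files live.
function init_translations($locale, $domain = 'messages', $dir = './locale')
{
    if (!function_exists('gettext')) {
        return false;  // gettext extension not loaded
    }
    // LC_MESSAGES is not defined on every platform.
    $category = defined('LC_MESSAGES') ? LC_MESSAGES : LC_ALL;
    setlocale($category, $locale);
    bindtextdomain($domain, $dir);
    textdomain($domain);
    return true;
}

// Thin wrapper: falls back to the original string when the
// extension is missing, so pages still render in English.
function t($msgid)
{
    return function_exists('gettext') ? gettext($msgid) : $msgid;
}

init_translations('fr_FR.UTF-8');
echo t("My English phrase dangles"), "\n";
```

When no catalog is found for the requested locale, gettext returns the original string, so an untranslated phrase degrades to English instead of failing.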

So, thus far, I'm happy, and continuing to modify the code to 
internationalize the output. I'm only a bit wary, however, because it 
seems relatively easy, yet I haven't run into any PHP applications that 
have internationalized their interface in this way. (That doesn't mean 
there aren't any out there, but I haven't run into them). Are there any 
potential problems that I need to be aware of? Substantial performance 
hits? Bogeymen?

I'd love to hear success stories from anyone using gettext() with a 
substantial number of strings (like 500-1000).

Cheers,
spud.

---
a.h.s. boy
spud(at)nothingness.org"as yes is to if,love is to yes"
http://www.nothingness.org/
PGP Fingerprint: 7B5B 2E7A FA96 865A D9D9  5D6D 54CD D2C1 3429 56B4
---


