Richard Davey wrote:
Hello Axel,
Monday, February 23, 2004, 7:03:38 PM, you wrote:
AIM> Guys, this isn't THAT stupid of a question is it? From my perspective,
AIM> the way PHP seems to see it is that I should already know what kind of
AIM> file I'm looking at. In most cases that's not an unrea
Alternatively, count unigrams in the first 1000 characters and compute the
Euclidean distance to a sample from, e.g., an English text, a French
text, a Chinese text, etc.
- Lucas
--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
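A rough sketch of the unigram idea above, not a tested classifier. It assumes "unigram" means a single byte (so multi-byte encodings such as Chinese text are compared byte by byte), and the function names unigram_profile(), profile_distance() and closest_profile() are made up for illustration. The reference samples are whatever texts or binary snippets you supply yourself.

```php
<?php
// Relative frequency of each byte value (0..255) in the first $limit bytes.
function unigram_profile(string $data, int $limit = 1000): array
{
    $sample = (string) substr($data, 0, $limit);
    $length = strlen($sample);
    $profile = array_fill(0, 256, 0.0);
    if ($length === 0) {
        return $profile;
    }
    // count_chars() mode 0 returns counts indexed by byte value.
    foreach (count_chars($sample, 0) as $byte => $count) {
        $profile[$byte] = $count / $length;
    }
    return $profile;
}

// Euclidean distance between two 256-entry profiles.
function profile_distance(array $a, array $b): float
{
    $sum = 0.0;
    for ($i = 0; $i < 256; $i++) {
        $sum += ($a[$i] - $b[$i]) ** 2;
    }
    return sqrt($sum);
}

// Name (array key) of the reference sample whose profile is closest to $data.
function closest_profile(string $data, array $references): string
{
    $target = unigram_profile($data);
    $best = '';
    $bestDistance = INF;
    foreach ($references as $name => $sampleText) {
        $distance = profile_distance($target, unigram_profile($sampleText));
        if ($distance < $bestDistance) {
            $best = $name;
            $bestDistance = $distance;
        }
    }
    return $best;
}
```

If one of the reference samples is a chunk of a known binary file, the same comparison gives a crude "text or binary" guess; how well that works depends entirely on the samples chosen.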
On Monday 23 February 2004 03:02 pm, Axel IS Main wrote:
> That's not bad, but I found a way to do it simply using chr() and
> passing it a value. It turns out that if I go 0-31 almost nothing will
> get through. Even the simplest HTML has something in there from that
> list. However, by just looking
That's not bad, but I found a way to do it simply using chr() and
passing it a value. It turns out that if I go 0-31 almost nothing will
get through. Even the simplest HTML has something in there from that
list. However, by just looking between 14 and 26, one more than carriage
return, and one le
Hello Evan,
Monday, February 23, 2004, 8:57:43 PM, you wrote:
>> It would be wise to check for characters from 0 to 31, if they appear
>> then it's almost certainly (but not guaranteed) binary.
EN> Assuming that's decimal, you're including 0x09 0x0a and 0x0d which are,
EN> respectively, tab, lin
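One way the 0-31 check might look once tab, line feed and carriage return are let through, as the point about 0x09, 0x0a and 0x0d suggests. This is only a sketch: has_control_bytes() is a made-up name, and it also flags vertical tab (0x0b) and form feed (0x0c), which do turn up in some text files.

```php
<?php
// True if $data contains an ASCII control byte (0-31) other than
// tab (0x09), line feed (0x0a) or carriage return (0x0d).
function has_control_bytes(string $data): bool
{
    // No /u modifier, so the pattern is matched byte by byte.
    return preg_match('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', $data) === 1;
}
```

A hit means the data is probably, but not certainly, binary; no hit does not prove it is text.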
On Monday 23 February 2004 11:55 am, Richard Davey wrote:
> Hello Axel,
>
> Monday, February 23, 2004, 7:38:25 PM, you wrote:
>
> AIM> Thanks, you just gave me the solution, I think. I don't have to strip
> AIM> out every character above standard ascii, I just have to look for them.
> AIM> If one i
Thanks, that's very helpful. It beats the heck out of doing it the way
I've been doing it.
Richard Davey wrote:
Hello Axel,
Monday, February 23, 2004, 7:38:25 PM, you wrote:
AIM> Thanks, you just gave me the solution, I think. I don't have to strip
AIM> out every character above standard ascii
Hello Axel,
Monday, February 23, 2004, 7:38:25 PM, you wrote:
AIM> Thanks, you just gave me the solution, I think. I don't have to strip
AIM> out every character above standard ascii, I just have to look for them.
AIM> If one is there, then just get rid of it. It's true that an OS can't
AIM> tell
Generally, binaries have \0 in them, but it is not necessary.
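A minimal sketch of the \0 test. The name has_nul_byte() and the 8192-byte window are arbitrary choices for illustration. As noted, a binary need not contain a NUL byte, and the reverse also fails sometimes: UTF-16 encoded text is full of them.

```php
<?php
// True if a NUL byte appears in the first $limit bytes of $data.
function has_nul_byte(string $data, int $limit = 8192): bool
{
    $head = (string) substr($data, 0, $limit);
    return strpos($head, "\0") !== false;
}
```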
Axel IS Main wrote:
Guys, this isn't THAT stupid of a question is it? From my perspective,
the way PHP seems to see it is that I should already know what kind of
file I'm looking at. In most cases that's not an unreasonable
assumptio
On Mon, 2004-02-23 at 14:19, Axel IS Main wrote:
> Yes, and in fact that is what I am doing now. This is a spider bot
> though, so I'm having to think of every single type of binary file that
> could be linked to on the web. So far I'm up to 28 with no end in sight.
> What about a .com file? I c
Well, actually, to check for .com, just make sure the URL contains a / followed
later by .com. That will filter out yahoo.com but keep yahoo.com/downloadme.com.
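A sketch of the same idea using parse_url(), so that the "//" after the scheme and the .com in the host name are not mistaken for a file extension. url_path_extension() is a made-up helper, and it assumes absolute URLs with a scheme; for a bare "yahoo.com" parse_url() treats the whole string as a path.

```php
<?php
// Lowercased extension of the path component of $url, or '' if there is none.
function url_path_extension(string $url): string
{
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        // No path component (e.g. "http://yahoo.com") or an unparsable URL.
        return '';
    }
    return strtolower(pathinfo($path, PATHINFO_EXTENSION));
}
```

The result can then be compared against whatever list of binary extensions the spider already keeps.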
On Mon, 2004-02-23 at 14:19, Axel IS Main wrote:
> Yes, and in fact that is what I am doing now. This is a spider bot
> though, so I'm having to think of every
Hello Axel,
Monday, February 23, 2004, 7:03:38 PM, you wrote:
AIM> Guys, this isn't THAT stupid of a question is it? From my perspective,
AIM> the way PHP seems to see it is that I should already know what kind of
AIM> file I'm looking at. In most cases that's not an unreasonable
AIM> assumption
Well, you can check the MIME type of the file, e.g.:
$mimes = array("1" => "application/octet-stream",
"2" => "image/jpeg",
etc.
For more info...
http://us4.php.net/manual/en/ref.filesystem.php
Just like the upload file function you can check for the mime types...
http://us4.p
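For files fetched over HTTP, the MIME type usually arrives in the Content-Type response header. A sketch of reading it, under some assumptions: the helper names are made up, the list of "text" types is just an example, and servers can send a wrong or missing Content-Type.

```php
<?php
// Media type from the last Content-Type line in $headers, lowercased,
// or null if there is none. After redirects there may be several header
// sets in the array, so the last match wins.
function content_type_from_headers(array $headers): ?string
{
    $type = null;
    foreach ($headers as $line) {
        if (preg_match('/^Content-Type:\s*([^;\s]+)/i', $line, $m)) {
            $type = strtolower($m[1]);
        }
    }
    return $type;
}

// True for types the spider would treat as text (example list only).
function is_text_type(?string $type): bool
{
    if ($type === null) {
        return false;
    }
    return strncmp($type, 'text/', 5) === 0
        || in_array($type, ['application/xhtml+xml', 'application/xml'], true);
}
```

After a file_get_contents() call on an http:// URL, PHP fills the local variable $http_response_header with the response header lines, so that array could be passed straight to content_type_from_headers().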
Yes, and in fact that is what I am doing now. This is a spider bot
though, so I'm having to think of every single type of binary file that
could be linked to on the web. So far I'm up to 28 with no end in sight.
What about a .com file? I can't omit links that end in .com can I? That
would be co
Couldn't you just check the extension on the file?
On Mon, 2004-02-23 at 14:03, Axel IS Main wrote:
> Guys, this isn't THAT stupid of a question is it? From my perspective,
> the way PHP seems to see it is that I should already know what kind of
> file I'm looking at. In most cases that's not a
Guys, this isn't THAT stupid of a question is it? From my perspective,
the way PHP seems to see it is that I should already know what kind of
file I'm looking at. In most cases that's not an unreasonable
assumption. Unfortunately, that's only good for most cases. PHP is rich
in ways to work wit
I'm using file_get_contents() to open URLs. Does anyone know if there is
a way to look at the result and determine if the file is binary? I'd
like to be able to block binaries from being processed without having to
try to think of all the possible binary extensions and omit them with a
function