Re: [PHP] Detecting Binaries

2004-02-23 Thread Shane Nelson
Richard Davey wrote: Hello Axel, Monday, February 23, 2004, 7:03:38 PM, you wrote: AIM> Guys, this isn't THAT stupid of a question is it? From my perspective, AIM> the way PHP seems to see it is that I should already know what kind of AIM> file I'm looking at. In most cases that's not an unrea

Re: Re[4]: [PHP] Detecting Binaries

2004-02-23 Thread Lucas Gonze
Alternatively, count unigrams in the first 1000 characters and compute the Euclidean distance to a sample from, e.g., an English text, a French text, a Chinese text, etc. - Lucas -- PHP General Mailing List (http://www.php.net/) To unsubscribe, visit: http://www.php.net/unsub.php
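
The unigram idea in this message could be sketched roughly as follows. This is only an illustration, not code from the thread: the function names unigram_profile() and profile_distance() are made up here, the profile is a per-byte relative frequency (so it ignores multi-byte encodings), and the reference samples for each language would have to be supplied by the caller.

```php
<?php
// Hypothetical helper (not from the thread): relative frequency of each
// byte value (0..255) in the first $limit bytes of $data.
function unigram_profile(string $data, int $limit = 1000): array
{
    $sample = substr($data, 0, $limit);
    $length = strlen($sample);
    // Mode 0 returns an array keyed 0..255 with the count of every byte.
    $counts = count_chars($sample, 0);
    $profile = [];
    foreach ($counts as $byte => $count) {
        $profile[$byte] = $length > 0 ? $count / $length : 0.0;
    }
    return $profile;
}

// Euclidean distance between two 256-entry profiles.
function profile_distance(array $a, array $b): float
{
    $sum = 0.0;
    for ($i = 0; $i < 256; $i++) {
        $diff = $a[$i] - $b[$i];
        $sum += $diff * $diff;
    }
    return sqrt($sum);
}
```

A small distance to any known text sample would suggest text, and a large distance to all of them would suggest a binary. The cutoff is not given in the thread and would have to be found by experiment.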

Re: [PHP] Detecting Binaries

2004-02-23 Thread Evan Nemerson
On Monday 23 February 2004 03:02 pm, Axel IS Main wrote: > That's not bad, but I found a way to do it simply using chr() and > passing it a value. It turns out that if I go 0-31 almost nothing will > get through. Even the simplest HTML has something in there from that > list. However, by just looking

Re: [PHP] Detecting Binaries

2004-02-23 Thread Axel IS Main
That's not bad, but I found a way to do it simply using chr() and passing it a value. It turns out that if I go 0-31, almost nothing will get through. Even the simplest HTML has something in there from that list. However, by just looking between 14 and 26, one more than carriage return, and one le
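
A minimal sketch of the chr() approach described here, assuming the range is 14 to 26 inclusive as the message states. The function name is hypothetical and the rest of the original code is not shown in the preview, so this is a guess at the shape of the check rather than Axel's actual implementation.

```php
<?php
// Hypothetical helper: true if $data contains any byte with a value
// from 14 to 26 inclusive (the range mentioned in the message).
function has_control_14_to_26(string $data): bool
{
    for ($code = 14; $code <= 26; $code++) {
        // chr() turns the numeric value into the one-byte string to look for.
        if (strpos($data, chr($code)) !== false) {
            return true;
        }
    }
    return false;
}
```

Tabs, line feeds and carriage returns (values 9, 10 and 13) fall outside this range, so ordinary HTML with normal line endings does not trigger the check.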

Re[4]: [PHP] Detecting Binaries

2004-02-23 Thread Richard Davey
Hello Evan, Monday, February 23, 2004, 8:57:43 PM, you wrote: >> It would be wise to check for characters from 0 to 31, if they appear >> then it's almost certainly (but not guaranteed) binary. EN> Assuming that's decimal, you're including 0x09 0x0a and 0x0d which are, EN> respectively, tab, lin
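
One way to act on Evan's objection is to test for bytes below 32 while leaving out 0x09, 0x0a and 0x0d. The sketch below is an assumption about how that might look, not code from the thread; the function name is made up, and treating vertical tab (0x0b) and form feed (0x0c) as suspicious is a choice made here for illustration.

```php
<?php
// Hypothetical helper: flags control bytes that rarely appear in text.
function has_suspicious_control_bytes(string $data): bool
{
    // Matches bytes 0-8, 11, 12 and 14-31: everything below 32 except
    // tab (0x09), line feed (0x0a) and carriage return (0x0d).
    return preg_match('/[\x00-\x08\x0b\x0c\x0e-\x1f]/', $data) === 1;
}
```

As the quoted message says, a match makes binary content very likely but does not guarantee it.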

Re: Re[2]: [PHP] Detecting Binaries

2004-02-23 Thread Evan Nemerson
On Monday 23 February 2004 11:55 am, Richard Davey wrote: > Hello Axel, > > Monday, February 23, 2004, 7:38:25 PM, you wrote: > > AIM> Thanks, you just gave me the solution, I think. I don't have to strip > AIM> out every character above standard ascii, I just have to look for > them. AIM> If one i

Re: [PHP] Detecting Binaries

2004-02-23 Thread Axel IS Main
Thanks, that's very helpful. It beats the heck out of doing it the way I've been doing it. Richard Davey wrote: Hello Axel, Monday, February 23, 2004, 7:38:25 PM, you wrote: AIM> Thanks, you just gave me the solution, I think. I don't have to strip AIM> out every character above standard ascii

Re[2]: [PHP] Detecting Binaries

2004-02-23 Thread Richard Davey
Hello Axel, Monday, February 23, 2004, 7:38:25 PM, you wrote: AIM> Thanks, you just gave me the solution, I think. I don't have to strip AIM> out every character above standard ascii, I just have to look for them. AIM> If one is there, then just get rid of it. It's true that an OS can't AIM> tell

Re: [PHP] Detecting Binaries

2004-02-23 Thread Marek Kilimajer
Generally, binaries have \0 in them, but not necessarily. Axel IS Main wrote: Guys, this isn't THAT stupid of a question is it? From my perspective, the way PHP seems to see it is that I should already know what kind of file I'm looking at. In most cases that's not an unreasonable assumptio
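
The NUL-byte heuristic Marek mentions might be sketched like this. The function name and the $limit parameter (how many leading bytes to inspect) are assumptions added for illustration; as the message notes, a binary does not have to contain \0, so a negative result proves nothing.

```php
<?php
// Hypothetical helper: true if a NUL byte occurs in the first $limit bytes.
function contains_nul(string $data, int $limit = 8192): bool
{
    // Only the leading part is inspected to keep the check cheap.
    return strpos(substr($data, 0, $limit), "\0") !== false;
}
```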

Re: [PHP] Detecting Binaries

2004-02-23 Thread Adam Bregenzer
On Mon, 2004-02-23 at 14:19, Axel IS Main wrote: > Yes, and in fact that is what I am doing now. This is a spider bot > though, so I'm having to think of every single type of binary file that > could be linked to on the web. So far I'm up to 28 with no end in sight. > What about a .com file? I c

Re: [PHP] Detecting Binaries

2004-02-23 Thread Adam Voigt
Well actually to check .com, just make sure it contains a / then the .com, that will filter yahoo.com, but keep yahoo.com/downloadme.com On Mon, 2004-02-23 at 14:19, Axel IS Main wrote: > Yes, and in fact that is what I am doing now. This is a spider bot > though, so I'm having to think of every
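
A sketch of the "slash, then .com" idea: take the extension from the URL's path only, so the host name never counts. The function name is hypothetical, and the sketch assumes absolute URLs with a scheme such as http://, because parse_url() treats a bare "yahoo.com" as a path.

```php
<?php
// Hypothetical helper: lower-cased file extension of the URL's path,
// or an empty string when there is no path or no extension.
function url_path_extension(string $url): string
{
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return ''; // no path component (or an unparsable URL)
    }
    return strtolower(pathinfo($path, PATHINFO_EXTENSION));
}
```

With this, http://yahoo.com yields an empty extension while http://yahoo.com/downloadme.com yields "com", which matches the distinction drawn in the message.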

Re[2]: [PHP] Detecting Binaries

2004-02-23 Thread Richard Davey
Hello Axel, Monday, February 23, 2004, 7:03:38 PM, you wrote: AIM> Guys, this isn't THAT stupid of a question is it? From my perspective, AIM> the way PHP seems to see it is that I should already know what kind of AIM> file I'm looking at. In most cases that's not an unreasonable AIM> assumption

Re: [PHP] Detecting Binaries

2004-02-23 Thread Jas
Well, you can check the MIME type of the file, e.g. $mimes = array("1" => "application/octet-stream", "2" => "image/jpeg", etc. For more info... http://us4.php.net/manual/en/ref.filesystem.php Just like the upload file function you can check for the mime types... http://us4.p
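
A MIME-type check along these lines might look like the sketch below. It swaps the list of binary types in the message for a short allow-list of text types, which avoids having to enumerate every binary format. The function name and the allow-list entries are assumptions, and the MIME type string is assumed to have been obtained elsewhere, for example from an HTTP Content-Type header.

```php
<?php
// Hypothetical allow-list check; the MIME type is assumed to have been
// obtained elsewhere, for example from an HTTP Content-Type header.
function is_text_mime(string $contentType): bool
{
    // Drop parameters such as "; charset=utf-8" and normalise case.
    $type = strtolower(trim(explode(';', $contentType)[0]));
    // Non-"text/" types that are still treated as text (illustrative list).
    $allowed = ['application/xhtml+xml', 'application/xml'];
    return strncmp($type, 'text/', 5) === 0 || in_array($type, $allowed, true);
}
```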

Re: [PHP] Detecting Binaries

2004-02-23 Thread Axel IS Main
Yes, and in fact that is what I am doing now. This is a spider bot though, so I'm having to think of every single type of binary file that could be linked to on the web. So far I'm up to 28 with no end in sight. What about a .com file? I can't omit links that end in .com can I? That would be co

Re: [PHP] Detecting Binaries

2004-02-23 Thread Adam Voigt
Couldn't you just check the extension on the file? On Mon, 2004-02-23 at 14:03, Axel IS Main wrote: > Guys, this isn't THAT stupid of a question is it? From my perspective, > the way PHP seems to see it is that I should already know what kind of > file I'm looking at. In most cases that's not a

Re: [PHP] Detecting Binaries

2004-02-23 Thread Axel IS Main
Guys, this isn't THAT stupid of a question is it? From my perspective, the way PHP seems to see it is that I should already know what kind of file I'm looking at. In most cases that's not an unreasonable assumption. Unfortunately, that's only good for most cases. PHP is rich in ways to work wit

[PHP] Detecting Binaries

2004-02-22 Thread Axel IS Main
I'm using file_get_contents() to open URLs. Does anyone know if there is a way to look at the result and determine if the file is binary? I'd like to be able to block binaries from being processed without having to try to think of all the possible binary extensions and omit them with a function