Re: [PHP] New Service - Short URL Unshortener
On Sun, Sep 25, 2011 at 20:08, Mike Mackintosh wrote:
> Hey All,
>
> Wanted to pass a kind word of a new service we launched called Unshortenr
> (www.unshortenr.com) - which was linkrater.com.
>
> Input a short url and you'll get the target address, the page title, and a
> description of the page.
>
> Soon to add an option to preview it with a images on/off type of deal.
>
> Feedback would be appreciated.

Please don't send this type of thing to this list again, Mike. It may be a great service, and I wish you all the best of luck, but these lists are not for advertisements (even for sites such as this which were written in PHP).

--
Network Infrastructure Manager
http://www.php.net/

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
[PHP] Sequential access of XML nodes.
Hi.

I've got a project which will need to iterate some very large XML files (around 250 files ranging in size from around 50MB to several hundred MB - 2 of them are in excess of 500MB).

The XML files have a root node and then a collection of products. In total, across all the files, there will be several million product details. Each XML feed will have a different structure, as each relates to a different source of data.

I plan to have an abstract reader class, with the concrete classes being extensions of it, each covering the specifics of the format being received and having the ability to return a standardised view of the data for importing into MySQL and eventually MongoDB.

I want to use an XML iterator so that I can say something along the lines of:

1 - Instantiate the XML iterator with the XML's URL.
2 - Iterate the XML, getting back one node at a time without keeping all the nodes in memory.

e.g.

$o_XML = new SomeExtendedXMLReader('http://www.site.com/data.xml');
foreach($o_XML as $o_Product) {
    // Process product.
}

Add to this that some of the XML feeds come gzipped (.gz), and I want to be able to stream the XML out of the .gz file without having to extract the entire file first.

I've not got access to the XML feeds yet (they are coming from the various affiliate networks around, and I'm a remote user so need to get credentials and the like).

If you have any pointers on the capabilities of the various XML reader classes, based upon this scenario, then I'd be very grateful.

In this instance, the memory limitation is important. The current code is string based and, whilst it works, you can imagine the complexity of it.

The structure of each product will differ internally, but I will be happy to get back a nested array or an XML fragment, as long as the iterator only holds onto one array/fragment at a time and doesn't cache the massive number of products per file.

Thanks.

Richard.
--
Richard Quadling
Twitter : EE : Zend : PHPDoc
@RQuadling : e-e.com/M_248814.html : bit.ly/9O8vFY : bit.ly/lFnVea

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
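For what it's worth, the foreach-style interface described above can be sketched with XMLReader. The element names and file paths below are invented, and the generator syntax assumes PHP 5.5+ (newer than this thread); the same shape can be built by implementing the Iterator interface by hand:

```php
// Hypothetical sketch: stream <product> elements from a large XML file,
// holding only one fragment in memory at a time.
function iterateProducts($uri)
{
    $reader = new XMLReader();
    $reader->open($uri); // also accepts compress.zlib:// URIs for .gz feeds
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT
                && $reader->name === 'product') {
            // Hand back one product as a SimpleXML fragment.
            yield simplexml_load_string($reader->readOuterXml());
        }
    }
    $reader->close();
}

// Usage: memory stays flat regardless of how many products the file holds.
// foreach (iterateProducts('http://www.site.com/data.xml') as $product) {
//     // Process product.
// }
```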
Re: [PHP] Sequential access of XML nodes.
On 26 Sep 2011, at 17:24, Richard Quadling wrote:
> I've got a project which will be needing to iterate some very large
> XML files (around 250 files ranging in size from around 50MB to
> several hundred MB - 2 of them are in excess of 500MB).
>
> [snip]
>
> The structure of each product internally will be different, but I will
> be happy to get back a nested array or an XML fragment, as long as the
> iterator is only holding onto 1 array/fragment at a time and not
> caching the massive number of products per file.

As far as I'm aware, XML Parser can handle all of this: http://php.net/xml

It's a SAX parser, so you can feed it the data chunk by chunk. You can use gzopen to open gzipped files and manually feed the data into xml_parse. Be sure to read the docs carefully, because there's a lot to be aware of when parsing an XML document in pieces.

-Stuart

--
Stuart Dallas
3ft9 Ltd
http://3ft9.com/

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
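A minimal sketch of the gzopen-plus-xml_parse approach Stuart describes (the file name and element names are invented, and the example writes a tiny gzipped file first so it is self-contained):

```php
// Write a tiny gzipped feed so the sketch is self-contained.
$file = sys_get_temp_dir() . '/feed.xml.gz';
file_put_contents($file, gzencode(
    '<catalog><product>Widget</product><product>Gadget</product></catalog>'));

$products  = array();
$inProduct = false;

// ext/xml is a SAX parser: it fires callbacks as data is fed in, so the
// document is never held in memory as a whole.
$parser = xml_parser_create();
xml_set_element_handler($parser,
    function ($p, $name, $attrs) use (&$inProduct) {
        if ($name === 'PRODUCT') { $inProduct = true; } // names uppercased by default
    },
    function ($p, $name) use (&$inProduct) {
        if ($name === 'PRODUCT') { $inProduct = false; }
    });
xml_set_character_data_handler($parser,
    function ($p, $data) use (&$inProduct, &$products) {
        if ($inProduct) { $products[] = $data; }
    });

// gzread decompresses on the fly, so the archive is never extracted to disk.
$fp = gzopen($file, 'r');
while (!gzeof($fp)) {
    $chunk = gzread($fp, 8192);
    if (!xml_parse($parser, $chunk, gzeof($fp))) {
        die(sprintf('XML error: %s at line %d',
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser)));
    }
}
gzclose($fp);
xml_parser_free($parser);
```

Note that the character data handler can fire more than once for a single text node (for example across chunk boundaries), so real code should accumulate text rather than assume one call per value.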
Re: [PHP] New Service - Short URL Unshortener
On Monday 26 September 2011 at 02:08, Mike Mackintosh wrote:
> Hey All,
>
> Wanted to pass a kind word of a new service we launched called Unshortenr
> (www.unshortenr.com) - which was linkrater.com.
>
> [snip]
>
> Thanks,
>
> Mike

I have also created a short-URL system under a free license [1], and I hope to build up a team of contributors.

[1] http://urlshort.eu

--
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x092164A7
gpg --keyserver pgp.mit.edu --recv-key 092164A7
http://urlshort.eu
fakessh @

pgpoHofB2G7Ou.pgp
Description: PGP signature
Re: [PHP] Sequential access of XML nodes.
On Mon, Sep 26, 2011 at 12:24 PM, Richard Quadling wrote:
> Hi.
>
> I've got a project which will be needing to iterate some very large
> XML files (around 250 files ranging in size from around 50MB to
> several hundred MB - 2 of them are in excess of 500MB).
>
> [snip]
>
> In this instance, the memory limitation is important. The current code
> is string based and whilst it works, you can imagine the complexity of
> it.

I believe the XMLReader allows you to pull node by node, and it's really easy to work with:
http://www.php.net/manual/en/intro.xmlreader.php

In terms of dealing with various forms of compression, I believe you can use the compression streams to handle this:
http://stackoverflow.com/questions/1190906/php-open-gzipped-xml
http://us3.php.net/manual/en/wrappers.compression.php

Adam

--
Nephtali: A simple, flexible, fast, and security-focused PHP framework
http://nephtaliproject.com
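Combining the two links Adam gives, a sketch might look like this (element names are hypothetical, and a small gzipped feed is generated inline so the example stands alone):

```php
// Generate a small gzipped feed inline so the sketch is self-contained.
$file = sys_get_temp_dir() . '/products.xml.gz';
file_put_contents($file, gzencode(
    '<catalog><product><sku>1</sku></product><product><sku>2</sku></product></catalog>'));

$reader = new XMLReader();
// The compress.zlib:// wrapper decompresses the stream on the fly,
// so the archive never has to be extracted first.
$reader->open('compress.zlib://' . $file);

$skus = array();
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'product') {
        // Only one product fragment is materialised at a time.
        $product = simplexml_load_string($reader->readOuterXml());
        $skus[] = (string) $product->sku;
    }
}
$reader->close();
```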
[PHP] Question about losing port number
Hi:

I have a general question about PHP. Basically I have a link, and I want the href to be absolute, so I do:

'https://' . $_SERVER['HTTP_HOST'] . '/login';

This gives me https://127.0.0.1/login on my local machine; however, what I really want is https://127.0.0.1:9090/login, so it is missing ":9090". I have also tried to use $_SERVER['SERVER_PORT'], but $_SERVER['SERVER_PORT'] doesn't give me 9090, it gives me 80.

Could anyone help me?
Thx
[PHP] Re: Question about losing port number
On 09/26/2011 05:45 PM, vince chan wrote:
> So basically I have a link, and I want the href to be absolute, so I
> do 'https://' . $_SERVER['HTTP_HOST'] . '/login'; this gives me
> https://127.0.0.1/login on my local; however, what I really want is
> https://127.0.0.1:9090/login, it is missing ":9090". I also have tried
> to use $_SERVER['SERVER_PORT'], but it doesn't give me 9090, it gives
> me 80.
>
> Could anyone help me?
> Thx

If the page that you are on is connected via port 80, then $_SERVER['SERVER_PORT'] will be 80. If it is connected via 9090, then $_SERVER['SERVER_PORT'] will be 9090. If you want it to be different from how it is currently connected, then you will have to hard-code the port number in the href.

--
Thanks!
-Shawn
http://www.spidean.com

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
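When the request really does arrive on port 9090, the port can be recovered along these lines (the server variables below are hard-coded stand-ins for a real request; note that HTTP_HOST normally already carries a non-default port when the browser sent one):

```php
// Hard-coded stand-ins for a real request's server variables.
$_SERVER['HTTP_HOST']   = '127.0.0.1';
$_SERVER['SERVER_PORT'] = '9090';

// HTTP_HOST already contains ":port" when the Host header carried one;
// otherwise fall back to SERVER_PORT, skipping the scheme defaults.
$host = $_SERVER['HTTP_HOST'];
$port = $_SERVER['SERVER_PORT'];
if (strpos($host, ':') === false && $port !== '80' && $port !== '443') {
    $host .= ':' . $port;
}
$url = 'https://' . $host . '/login';
// $url is "https://127.0.0.1:9090/login"
```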
Re: [PHP] Question about losing port number
On 26 Sep 2011 at 23:45, vince chan wrote:
> I have a general question about PHP:
> [snip]
> I also have tried to
> use $_SERVER['SERVER_PORT'], but $_SERVER['SERVER_PORT'] doesn't give me
> 9090, it gives me 80.

Where does the 9090 come from?

--
Cheers -- Tim

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP] Querying a database for 50 users' information: 50 queries or a WHERE array?
On Thu, Sep 15, 2011 at 00:56, Alex Nikitin wrote:
> MySQL real escape string doesn't work; it's a bad solution to a problem
> that has been with the internets since the very beginning, and if people
> program like they are taught to by books, it doesn't look like it's
> going away any time soon. The problem, of course, is that various
> programming languages don't know how to talk to other languages, and we
> as devs see no better way to do this than concatenate strings.
> Basically this is the core reason why XSS and SQL injection are rampant
> on the interwebs. Escaping only seems like a good idea, but if you
> analyze what it does and compare it to today's technology, you quickly
> realize how wrong a concept it actually is. Escaping looks for certain
> characters, and if found, escapes them in some form. The problem here
> is that rather than defining all safe characters, it defines what the
> developers believe to be bad characters, and the effect you get is not
> dissimilar to creating a firewall rule set where the bottom rule is
> "accept all": as long as my character doesn't match what they thought
> was a bad character, it is allowed. This was fine in the days of ASCII,
> but the tubes are hardly ASCII anymore; with Unicode and UTF-16 there
> are 1,112,064 code points, and they are not even called characters
> anymore, because they really aren't. And if you are familiar with
> best-fit mapping, you would know that there are now dozens of
> characters that can represent any single symbol in ASCII, meaning that
> using the above type of blocking mechanism is silly and technically
> insecure.

I agree with this point, except that MySQL does not parse any other (unicode) character as the single quote.

> Another problem with it is the fact that, security-wise, this again is
> a bad solution from another perspective. A programmer comes in and
> starts debugging code; the first thing they always seem to do is turn
> off the security and comment out the escape line, and you know what
> happens: the bug gets found and fixed completely elsewhere, but the
> security never gets re-enabled. This is called failing open, and it
> goes with the concept above, where the escape in itself fails open as
> well.

This has not been my experience. As for turning off the escape line, that is another argument for using the array that I demonstrated previously.

> So if you look into the problem at the core, what you have are two
> types of code: code that you know is good, and crap data that you have
> to somehow make safe. So how do you do it in the same language? Right,
> you assign that data to a storage container called a variable, and the
> interpreter knows "this data here I execute, and that data there I use
> as data and don't execute". Well, what happens when you add another
> language into the mix? Language A passes known-good code that it
> string-concatenates to bad code, and what you get as a result is the
> second language's parser thinking "hey, all of this stuff is good code,
> let me execute it!"... This is why a stringent delimiter between
> known-good and not-good data needs to be conveyed to the second
> language.
>
> How do we do it with SQL? There are a few ways; one of the more common
> ones is to use a prepared statement, which clearly separates the code
> from the data for the SQL interpreter on the other side. This works
> really well, with one HUGE downside: it can be a REAL pain in the butt
> to use, and the more complex your query gets, the more of a pain
> prepared statements become.

I just googled prepared statements in PHP and I see that they don't need to be pre-prepared in the database. I must have been conflating them with stored procedures. Thanks, I'll play around and possibly adopt the use of prepared statements.
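For reference, the prepared-statement pattern being discussed looks like this in PDO (an in-memory SQLite database stands in for MySQL here purely so the sketch is self-contained; the calls are identical with a mysql: DSN):

```php
// In-memory SQLite stands in for MySQL so this sketch runs anywhere;
// with MySQL only the DSN on the first line changes.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)');
$db->exec("INSERT INTO users (email) VALUES ('alice@example.com')");

// User input travels only as a bound parameter, never as SQL text,
// so no escaping is involved and injection attempts arrive as plain data.
$untrusted = "alice@example.com' OR '1'='1";
$stmt = $db->prepare('SELECT id, email FROM users WHERE email = :email');
$stmt->execute(array(':email' => $untrusted));
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
// $rows is empty: the injection attempt matched nothing.
```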
> Another way, and this works for mostly any language, is to use an
> in-common function that jumbles the known-bad data on one end and
> unjumbles it as data on the other, for example base64. It works
> extremely well: you take any data on the PHP side, base64 encode it,
> and send it to SQL or JS or whatever. You can string-concatenate the
> b64'd data, because you know what b64'd data looks like? Yep, data;
> it's not JS, it's not SQL, just a bunch of garbled junk. You can then
> use b64decode on that data, and by the design of the function the
> result will be just that: data. So with this you keep the code/data
> separation even with string concatenation...

This is not good for searching the data afterwards. If I have a specific non-searchable field meant for code or such, then I do base64 encode.

> Base64 performs really well, and is well worth the few extra cycles for
> the above-mentioned guaranteed code/data separation barrier; it's easy
> to implement. More importantly, this by default fails closed. You would
> have to disable at least 4 security points and change 2 queries to
> disable this (and if you are usin
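The base64 round trip described above can be sketched in a few lines (the sample data is illustrative):

```php
// Arbitrary user data, including characters that would normally need escaping.
$data = "Robert'); DROP TABLE Students;--";

// On the way in: base64 output only ever contains [A-Za-z0-9+/=], so it
// can never terminate a quoted SQL string or break out of a JS literal.
$encoded = base64_encode($data);

// On the way out: decoding restores the original bytes exactly.
$decoded = base64_decode($encoded);
```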
Re: [PHP] Sequential access of XML nodes.
On Mon, 26 Sep 2011 14:17:43 -0400, Adam Richardson wrote:
> I believe the XMLReader allows you to pull node by node, and it's really
> easy to work with:
> http://www.php.net/manual/en/intro.xmlreader.php
>
> In terms of dealing with various forms of compression, I believe you can
> use the compression streams to handle this:
> http://stackoverflow.com/questions/1190906/php-open-gzipped-xml
> http://us3.php.net/manual/en/wrappers.compression.php

+1 here. XMLReader is easy and fast, and will do the job you want, albeit without the nice foreach(...) loop Richard specs. You just loop over reading the XML and checking the node type, watching the state of your stream to see how to handle each iteration.

e.g. (assuming $xml is an open XMLReader, $db is PDO in this example):

$text = '';
$haveRecord = FALSE;
$records = 0;

// prepare insert statement
$sql = '
    insert into Product (ID, Product, ...)
    values (:ID, :Product, ...)
';
$cmd = $db->prepare($sql);

// set list of allowable fields and their parameter type
$fields = array(
    'ID'      => PDO::PARAM_INT,
    'Product' => PDO::PARAM_STR,
    ...
);

while ($xml->read()) {
    switch ($xml->nodeType) {
        case XMLReader::ELEMENT:
            if ($xml->name === 'Product') {
                // start of Product element,
                // reset command parameters to empty
                foreach ($fields as $name => $type) {
                    $cmd->bindValue(":$name", NULL, PDO::PARAM_NULL);
                }
                $haveRecord = TRUE;
            }
            $text = '';
            break;

        case XMLReader::END_ELEMENT:
            if ($xml->name === 'Product') {
                // end of Product element, save record
                if ($haveRecord) {
                    $result = $cmd->execute();
                    $records++;
                }
                $haveRecord = FALSE;
            }
            elseif ($haveRecord) {
                // still inside a Product element,
                // record field value and move on
                $name = $xml->name;
                if (array_key_exists($name, $fields)) {
                    $cmd->bindValue(":$name", $text, $fields[$name]);
                }
            }
            $text = '';
            break;

        case XMLReader::TEXT:
        case XMLReader::CDATA:
            // record value (or part value) of text or cdata node
            $text .= $xml->value;
            break;

        default:
            break;
    }
}

return $records;

--
Ross McKay, Toronto, NSW Australia
"Tuesday is Soylent Green day"

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php