If you encode your binary data as pure ASCII, in and out, any database will do
just fine.

Encodings to consider are Base64 (https://metacpan.org/pod/MIME::Base64) and
Q-encoding / quoted-printable (https://en.wikipedia.org/wiki/Quoted-printable)
- both are lossless and can represent any binary data.  A hexadecimal
representation of the binary data also works but is less efficient.  You can
also compress the data before converting it to a pure-ASCII representation to
help address space concerns.  XML with "character entities" ( &#ddddd;
encoding) is another option - or some mix of these.

To store large runs of text in the database you can break the text into
segments.  If the "binary files" are mostly text with a few "binary"
characters, you can Q-encode them and key each segment by fileid, line,
segment (three fields of type NUMBER, each storing an integer).  If the data
is truly binary (no discernible line structure), key it by fileid, block,
segment (again three NUMBER fields).  Either way, store the encoded data
itself in a VARCHAR of 256 or 512 or 1024 (etc.) characters.

Databases which offer BLOB (Binary Large OBject) or CLOB (Character Large
OBject) capability often do so in a way that trades efficiency for flexibility
and standards compliance.  The method I suggest needs nothing beyond the
standard capabilities that every database, MySQL included, offers.  It may be
less efficient than choosing a specific database and exploiting that
database's specific capabilities.
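As a sketch of the segmenting idea (the sub name and the segment width here
are my own choices, not anything standard), cutting an encoded string into
VARCHAR-sized, numbered pieces might look like this:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split an encoded string into numbered segments, each no longer than
# $width characters, so each fits in a VARCHAR($width) column.
# Returns a list of [segment_number, text] pairs, ready to be inserted
# alongside a fileid (and line or block) key.
sub segments {
    my ($encoded, $width) = @_;
    my @out;
    my $n = 0;
    while (length $encoded) {
        # 4-argument substr: take the first $width chars and remove them.
        push @out, [ $n++, substr($encoded, 0, $width, '') ];
    }
    return @out;
}
```

Reading the file back is just a SELECT ordered by the segment number,
concatenating the text fields before decoding.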

If you stick to LCD (Least Common Denominator) for your database interactions 
then you can even make the database type configurable at run-time.

I once stored arbitrary text (log) files in an Oracle database which had a
maximum VARCHAR size of 2048.  My code broke lines up into segments of no more
than 512 characters and Q-encoded any non-ASCII characters.  The source
character encoding was strictly one byte per character (Windows-1252), so the
worst case was expanding one character to three ( "=3d" represented "=" ,
while "=FC" represented "LATIN SMALL LETTER U WITH DIAERESIS" ).
I didn't have to worry about the database character set as long as it was a
superset of ASCII.  I could store files of any size, with lines of any length.
I chose simplicity over fighting with the non-standard BLOB and CLOB APIs.
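The sizing arithmetic behind that scheme: if, as the numbers above suggest,
segments are cut from the raw text first and encoded second, then a
512-character raw segment can grow to at most 3 x 512 = 1536 encoded
characters, which still fits in a VARCHAR of 2048.  A quick sanity check:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $raw_limit    = 512;    # raw characters per segment
my $worst_factor = 3;      # worst case: every character becomes "=XX"
my $varchar_max  = 2048;   # the database's VARCHAR limit

my $encoded_max = $raw_limit * $worst_factor;    # 1536
die "segments could overflow the column\n" if $encoded_max > $varchar_max;
print "worst-case encoded segment: $encoded_max characters\n";
```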

If the files had been really binary I could have used 
https://metacpan.org/pod/MIME::Base64 .

This is the Perl code I used for MIME Q-encoding:

sub mimeq
{
    my ($in) = @_;
    my ($out);

    # Escape every non-word character as "=" plus two lowercase hex
    # digits, e.g. "=" itself becomes "=3d" and a space becomes "=20".
    $out =  $in;
    $out =~ s/(\W)/sprintf('=%02x', ord($1))/eg;

    return $out;
}

sub unmimeq
{
    my ($in) = @_;
    my ($out);

    # Reverse the escaping: turn any "=XX" sequence (upper- or
    # lowercase hex) back into the character it encodes.
    $out =  $in;
    $out =~ s/=([\da-fA-F][\da-fA-F])/chr hex $1/eg;

    return $out;
}
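A quick round-trip check of the pair (the two subs are repeated here so the
example runs standalone):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Q-encode: escape every non-word character as "=XX".
sub mimeq {
    my ($in) = @_;
    (my $out = $in) =~ s/(\W)/sprintf('=%02x', ord($1))/eg;
    return $out;
}

# Decode: turn "=XX" (either hex case) back into the original character.
sub unmimeq {
    my ($in) = @_;
    (my $out = $in) =~ s/=([\da-fA-F][\da-fA-F])/chr hex $1/eg;
    return $out;
}

my $original = "a=b c!";
my $encoded  = mimeq($original);
print "$encoded\n";                      # a=3db=20c=21
die "round trip failed" unless unmimeq($encoded) eq $original;
```

Because "=" itself is always escaped, decoding is unambiguous.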

And a variation I sometimes use, which leaves more punctuation unescaped:

sub mimeq_permissive
{
    my ($in) = @_;
    my ($out);

    # Escape anything outside a generous "safe" list of characters.
    # "=" is deliberately not in the list, so it is always escaped
    # and decoding stays unambiguous.
    $out = $in;
    $out =~ s{([^]\-\s|#'a-zA-Z0-9,+.?()*,_.:;%@&/[])}{sprintf("=%02x", ord($1))}eg;

    return $out;
}

I agree that using a filename or a URL/URI in the database and storing the 
actual file data someplace else, pointed to by the database, is another option, 
and one that is often used.

If your "binary data" isn't very big and you really want to, you can just use
Base64 and a VARCHAR and go happily about your business.
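A minimal sketch of that route with the core MIME::Base64 module:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# Some "binary" bytes, Base64-encoded into a pure-ASCII string that a
# VARCHAR column can hold; the '' suppresses the trailing line breaks
# that encode_base64 otherwise inserts.
my $blob    = pack('C*', 0, 1, 2, 253, 254, 255);
my $encoded = encode_base64($blob, '');
my $decoded = decode_base64($encoded);

die "round trip failed" unless $decoded eq $blob;
print length($encoded), " ASCII characters\n";    # 8
```

Base64 expands the data by only about a third, versus up to 3x for Q-encoding
of dense binary, which is why it is the better fit for truly binary data.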

The choice is yours.

Perl - there is more than one way to do it.




