Do mp3 concatenation programs exist?

Peter Philipp Fri, 20 Oct 2006 13:47:57 -0700

Hi,

Three months ago I asked whether any programs exist that do this, in this
thread:


http://marc.theaimsgroup.com/?l=openbsd-misc&m=115298981814514&w=2

A bunch of you jumped on me like starving wolves, and well, I didn't get
any of the help I was looking for.  So tonight, at the beginning of the
weekend, I set out to learn how an MP3 frame is constructed and I "reversed"
the GNU code of MPlayer.  Feel free to use this information under a BSD
license for your own programs.  Now all I gotta do is bang out my program
based on this info. :-)

Kind regards,

-peter

MP3 Header

[explanation from RFC 3119]

2. The Structure of MP3 Frames

   In this section we give a brief overview of the structure of a MP3
   frame.  (For more detailed description, see the MPEG 1 audio [3] and
   MPEG 2 audio [4] specifications.)

   Each MPEG audio frame begins with a 4-byte header.  Information
   defined by this header includes:

   -  Whether the audio is MPEG 1 or MPEG 2.
   -  Whether the audio is layer I, II, or III.
      (The remainder of this document assumes layer III, i.e., "MP3"
      frames)
   -  Whether the audio is mono or stereo.
   -  Whether or not there is a 2-byte CRC field following the header.
   -  (indirectly) The size of the frame.

   The following structures appear after the header:
   -  (optionally) A 2-byte CRC field
   -  A "side info" structure.  This has the following length:
      -  32 bytes for MPEG 1 stereo
      -  17 bytes for MPEG 1 mono, or for MPEG 2 stereo
      -  9 bytes for MPEG 2 mono
   -  Encoded audio data, plus optional ancillary data (filling out the
      rest of the frame)

   For the purpose of this document, the "side info" structure is the
   most important, because it defines the location and size of the
   "Application Data Unit" (ADU) that an MP3 decoder will process.  In
   particular, the "side info" structure defines:

   -  "main_data_begin": This is a back-pointer (in bytes) to the start
      of the ADU.  The back-pointer is counted from the beginning of the
      frame, and counts only encoded audio data and any ancillary data
      (i.e., ignoring any header, CRC, or "side info" fields).

   An MP3 decoder processes each ADU independently.  The ADUs will
   generally vary in length, but their average length will, of course,
   be that of the MP3 frames (minus the length of the header,
   CRC, and "side info" fields).  (In MPEG literature, this ADU is
   sometimes referred to as a "bit reservoir".)
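
To make the back-pointer a bit more concrete, here is a minimal sketch of
reading "main_data_begin" out of the side info.  It assumes (from the MPEG
audio specs, not from the RFC text quoted here) that the field is the first
9 bits of the side info for MPEG 1 and the first 8 bits for MPEG 2.  The
function name and its arguments are made up for this example.

```c
/*
 * Hypothetical helper, not taken from MPlayer or the RFC.
 * 'frame' points at the 4-byte header of a layer III frame, 'has_crc' is
 * nonzero if a 2-byte CRC follows the header, and 'is_mpeg1' is nonzero
 * for MPEG 1 (zero for MPEG 2).
 * Returns main_data_begin, the back-pointer in bytes.
 */
static unsigned int
mp3_main_data_begin(const unsigned char *frame, int has_crc, int is_mpeg1)
{
        /* the side info starts right after the header and optional CRC */
        const unsigned char *si = frame + 4 + (has_crc ? 2 : 0);

        if (is_mpeg1)   /* assumed: 9-bit field */
                return ((unsigned int)si[0] << 1) | (si[1] >> 7);
        return si[0];   /* assumed: 8-bit field */
}
```

A result of 0 means the ADU starts inside its own frame, which is probably
the easiest place to join two files, but treat that as a guess until tested.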

---

[Reverse engineered from MPlayer-1.0pre7/libmpdemux/mpeg_hdr.c]
[// based on libmpeg2/header.c by Aaron Holtzman <[EMAIL PROTECTED]>]
[which has the following license: ]
/*
 * header.c
 * Copyright (C) 2000-2003 Michel Lespinasse <[EMAIL PROTECTED]>
 * Copyright (C) 2003      Regis Duchesne <[EMAIL PROTECTED]>
 * Copyright (C) 1999-2000 Aaron Holtzman <[EMAIL PROTECTED]>
 *
 * This file is part of mpeg2dec, a free MPEG-2 video stream decoder.
 * See http://libmpeg2.sourceforge.net/ for updates.
 *
 * mpeg2dec is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * mpeg2dec is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
 *
 *
 * Modified for use with MPlayer, see libmpeg-0.4.0.diff for the exact changes.
 * detailed CVS changelog at http://www.mplayerhq.hu/cgi-bin/cvsweb.cgi/main/
 * $Id: header.c,v 1.18 2005/02/19 02:32:12 diego Exp $
 */

+----------------+----------------+----------------+----------------+
|                      |[D]|[B]|[E|  [F]  |[C]|G|  |[H]|   [I]      |
|  11 bits set [A]     |   |   |] |       |   |*|  |   |            |
+----------------+----------------+----------------+----------------+

[ following struct written by Peter Philipp (could be erroneous) based on ]
[ the information taken from the above mentioned files ]

/* network byte order */
struct mp3_header {
        u_int16_t first;
#define FIRST_11_BITS   0xFFE0          /* sync, all 11 bits must be set */
#define NO_CRC          0x1             /* if NOT set, CRC trails header */
#define HAS_LAYER3      0x2
#define HAS_LAYER2      0x4
#define HAS_LAYER1      0x6             /* both layer bits set */
#define HAS_MPEG1       0x8             /* if not set MPEG 2.0 (lsf = 1) */
#define HAS_MPEG1ORMPEG2 0x10           /* if not set MPEG 2.5 */
        u_int16_t second;
#define HAS_MONO                0xC0    /* if both bits set is MONO (1 channel) */
#define HAS_PADDING             0x200   /* if set has padding byte */
/* sample frequency index, if none set it's index 0 */
#define HAS_SAMPLE_FQ1          0x400   /* index 1 */
#define HAS_SAMPLE_FQ2          0x800   /* index 2 (both bits set is invalid) */
/* bitrate index, values in kbit/s */
                                        /* layers 1, 2, 3 respectively */
#define HAS_FREQ_SLOT1          0x1000  /* 32, 32, 32   with lsf 32, 8, 8 */
#define HAS_FREQ_SLOT2          0x2000  /* 64, 48, 40   with lsf 48, 16, 16 */
#define HAS_FREQ_SLOT3          0x3000  /* 96, 56, 48   with lsf 56, 24, 24 */
#define HAS_FREQ_SLOT4          0x4000  /* 128, 64, 56  with lsf 64, 32, 32 */
#define HAS_FREQ_SLOT5          0x5000  /* 160, 80, 64  with lsf 80, 40, 40 */
#define HAS_FREQ_SLOT6          0x6000  /* 192, 96, 80  with lsf 96, 48, 48 */
#define HAS_FREQ_SLOT7          0x7000  /* 224, 112, 96 with lsf 112, 56, 56 */
#define HAS_FREQ_SLOT8          0x8000  /* 256, 128, 112 with lsf 128, 64, 64 */
#define HAS_FREQ_SLOT9          0x9000  /* 288, 160, 128 with lsf 144, 80, 80 */
#define HAS_FREQ_SLOT10         0xA000  /* 320, 192, 160 with lsf 160, 96, 96 */
#define HAS_FREQ_SLOT11         0xB000  /* 352, 224, 192 with lsf 176, 112, 112 */
#define HAS_FREQ_SLOT12         0xC000  /* 384, 256, 224 with lsf 192, 128, 128 */
#define HAS_FREQ_SLOT13         0xD000  /* 416, 320, 256 with lsf 224, 144, 144 */
#define HAS_FREQ_SLOT14         0xE000  /* 448, 384, 320 with lsf 256, 160, 160 */
#define HAS_FREQ_SLOT15         0xF000  /* not used */
};
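
A minimal sketch of getting the 4 header bytes out of a file buffer and
checking the sync bits.  It joins the bytes by hand so host byte order does
not matter.  Note that the mask 0xFFE0 on the first 16 bits covers the top
11 bits.  The function names are my own and not from MPlayer.

```c
#include <stdint.h>

/* Join the 4 header bytes (network byte order) into one 32-bit word. */
static uint32_t
mp3_load_header(const unsigned char *p)
{
        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Nonzero if the sync bits (mask 0xFFE0 on the first 16 bits) are all set. */
static int
mp3_has_sync(uint32_t head)
{
        uint16_t first = (uint16_t)(head >> 16);

        return (first & 0xFFE0) == 0xFFE0;
}
```

The other sketches in this mail take the 32-bit word as returned by
mp3_load_header().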

---
[explanation of 4 byte header drawing]

[A] If the first 11 bits are not all set, this is not an MP3 frame

[B] defines layer (2 bits); if neither bit is set (0x0) it is not layer 1/2/3
    0x1 = layer3
    0x2 = layer2
    0x3 = layer1

[C] sampling frequency index (2 bits), valid values are 0, 1 and 2.
    0x0
    0x1
    0x2
    0x3 Invalid

[D] MPEG version (2 bits), which also gives LSF ("low sampling frequency")
    0x1 if set the sampling frequency index stays as is, if not set add 0x3
        to it.
        lsf = 1 if not set else 0
    0x2 if set indicates MPEG 1.0 or MPEG 2.0, if not set MPEG 2.5

    if MPEG 2.5 add 0x6 to the sampling frequency index.
    lsf = 1.

static long freqs[9] = { 44100, 48000, 32000,   // MPEG 1.0
                         22050, 24000, 16000,   // MPEG 2.0
                         11025, 12000,  8000};  // MPEG 2.5

     [D] determines the row of the table above: MPEG 1.0 if bits 0x1 and
     0x2 are both set, MPEG 2.0 if only bit 0x2 is set, MPEG 2.5 if bit 0x2
     is not set.
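
The [C] and [D] rules can be sketched as one small function.  This is only
my reading of the rules above written out in C.  The name mp3_sample_rate
is made up, and the bit positions (10 for [C], 19 and 20 for [D]) are
counted from the least significant bit of the 32-bit header word.

```c
#include <stdint.h>

static const long mp3_freqs[9] = { 44100, 48000, 32000,   /* MPEG 1.0 */
                                   22050, 24000, 16000,   /* MPEG 2.0 */
                                   11025, 12000,  8000 }; /* MPEG 2.5 */

/*
 * Decode [C] and [D] from the 32-bit header word.  Stores lsf (0 or 1)
 * in *lsf and returns the sample rate in Hz, or -1 if [C] is 0x3.
 */
static long
mp3_sample_rate(uint32_t head, int *lsf)
{
        int idx = (head >> 10) & 0x3;           /* [C] */

        if (idx == 3)
                return -1;
        if (head & (1UL << 20)) {               /* [D] 0x2 set: MPEG 1.0/2.0 */
                *lsf = (head & (1UL << 19)) ? 0 : 1;
                idx += *lsf * 3;
        } else {                                /* [D] 0x2 clear: MPEG 2.5 */
                *lsf = 1;
                idx += 6;
        }
        return mp3_freqs[idx];
}
```

For example the header bytes FF FB 90 00 have both [D] bits set and [C] = 0,
so this picks 44100 with lsf = 0.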

[E] if this bit is NOT set, CRC trails 4 byte header (2 bytes CRC)

[F] 4 bits (1 nibble) indicating the bitrate index
        
[G] if set, the frame has padding (1 byte, or 4 bytes for layer 1)

[H] channel mode (2 bits), indicates whether stereo or mono
    0x0 stereo (2 channels)
    0x1 joint stereo (2 channels)
    0x2 dual channel (2 channels)
    0x3 mono (1 channel)

ssize is the size of the "side info" in bytes:
if lsf is set, ssize is 9 if mono, else 17 (stereo)
if lsf is not set, ssize is 17 if mono, else 32 (stereo)

add 2 to ssize if [E] is not set (the 2-byte CRC is present)
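
A sketch of the ssize rules in C.  Careful with [E]: this treats a cleared
bit as "CRC present", which is how the MPEG specs define the protection
bit.  The function name mp3_ssize is made up, and lsf has to be worked out
from [D] beforehand.

```c
#include <stdint.h>

/*
 * Side info size in bytes, plus 2 if a CRC is present.
 * 'head' is the 32-bit header word, 'lsf' is 0 for MPEG 1.0 and
 * 1 for MPEG 2.0/2.5.
 */
static int
mp3_ssize(uint32_t head, int lsf)
{
        int channels = (((head >> 6) & 0x3) == 3) ? 1 : 2;     /* [H] */
        int ssize;

        if (lsf)
                ssize = (channels == 1) ? 9 : 17;
        else
                ssize = (channels == 1) ? 17 : 32;
        if (!((head >> 16) & 0x1))      /* [E] clear: 2-byte CRC follows */
                ssize += 2;
        return ssize;
}
```

The encoded audio data of a frame then starts 4 + ssize bytes after the
start of the header.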

[I] Unused here (mode extension, copyright, original, emphasis).

FRAMESIZE is then determined by this TABLE-MATRIX (entries are bitrates in
kbit/s)

/* tabsel_123[lsf][layer-1][bitrate_index]: first block is MPEG 1.0 (no lsf),
 * second block is MPEG 2.0/2.5 (with lsf); rows are layer 1, 2, 3 */
static int tabsel_123[2][3][16] = {
   { {0,32,64,96,128,160,192,224,256,288,320,352,384,416,448,0},
     {0,32,48,56, 64, 80, 96,112,128,160,192,224,256,320,384,0},
     {0,32,40,48, 56, 64, 80, 96,112,128,160,192,224,256,320,0} },

   { {0,32,48,56,64,80,96,112,128,144,160,176,192,224,256,0},
     {0,8,16,24,32,40,48,56,64,80,96,112,128,144,160,0},
     {0,8,16,24,32,40,48,56,64,80,96,112,128,144,160,0} }
};

framesize = tabsel_123[lsf][layer-1][bitrate_index] * mult[layer-1];

In MP3's case (layer 3, so row layer-1 = 2) this is only the following:

no lsf     {0,32,40,48,56,64,80,96,112,128,160,192,224,256,320,0}
with lsf   {0,8,16,24,32,40,48,56,64,80,96,112,128,144,160,0}

Take your pick with the bitrate index (0 through 15; the entries for 0 and
15 are 0, which gives no usable framesize)

The table value is then multiplied by the multiplier table (mult), which
depends on the layer:

Layer 1 - 12000
Layer 2 - 144000
Layer 3 - 144000

----

    framesize /= freqs[sampling_frequency]<<lsf;        

    Divide by the sampling frequency, left-shifted by lsf (so doubled for
    MPEG 2.0/2.5).  The result is in bytes.
        
    if(layer==1)
      framesize = (framesize+padding)*4;
    else
      framesize += padding;

     Add the padding if [G] is set (layer 1 counts in 4-byte units).
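
Putting the pieces together, here is a sketch of the whole framesize
calculation as one function.  It follows the steps described in this mail,
with the multipliers taken as 12000 for layer 1 and 144000 for layers 2 and
3, which is what the usual frame size formulas work out to.  The names
(mp3_framesize and the mp3_ prefixed tables) are mine, and this is not the
MPlayer function itself.

```c
#include <stdint.h>

static const int mp3_tabsel[2][3][16] = {
   { {0,32,64,96,128,160,192,224,256,288,320,352,384,416,448,0},
     {0,32,48,56, 64, 80, 96,112,128,160,192,224,256,320,384,0},
     {0,32,40,48, 56, 64, 80, 96,112,128,160,192,224,256,320,0} },
   { {0,32,48,56,64,80,96,112,128,144,160,176,192,224,256,0},
     {0,8,16,24,32,40,48,56,64,80,96,112,128,144,160,0},
     {0,8,16,24,32,40,48,56,64,80,96,112,128,144,160,0} }
};
static const long mp3_freqs[9] = { 44100, 48000, 32000,
                                   22050, 24000, 16000,
                                   11025, 12000,  8000 };
static const long mp3_mult[3] = { 12000, 144000, 144000 };

/* Frame size in bytes (header included), or -1 if the header is unusable. */
static long
mp3_framesize(uint32_t head)
{
        int layer, lsf, sfreq, bitrate_index, padding;
        long framesize;

        if ((head & 0xFFE00000UL) != 0xFFE00000UL)      /* [A] sync */
                return -1;
        layer = 4 - (int)((head >> 17) & 0x3);          /* [B] */
        if (layer == 4)
                return -1;
        sfreq = (head >> 10) & 0x3;                     /* [C] */
        if (sfreq == 3)
                return -1;
        if (head & (1UL << 20)) {                       /* [D] */
                lsf = (head & (1UL << 19)) ? 0 : 1;
                sfreq += lsf * 3;
        } else {
                lsf = 1;
                sfreq += 6;
        }
        bitrate_index = (head >> 12) & 0xF;             /* [F] */
        padding = (head >> 9) & 0x1;                    /* [G] */

        framesize = (long)mp3_tabsel[lsf][layer - 1][bitrate_index] *
            mp3_mult[layer - 1];
        if (framesize == 0)
                return -1;      /* bitrate index 0 or 15 */
        framesize /= mp3_freqs[sfreq] << lsf;
        if (layer == 1)
                framesize = (framesize + padding) * 4;
        else
                framesize += padding;
        return framesize;
}
```

Worked example: a 128 kbit/s, 44100 Hz MPEG 1.0 layer 3 frame gives
128 * 144000 / 44100 = 417 bytes, or 418 with the padding bit set.  To walk
a file you would jump ahead by that many bytes from the start of one header
to find the next one.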


-- 
Here my ticker tape .signature #### My name is Peter Philipp #### lynx -dump 
"http://en.wikipedia.org/w/index.php?title=Pufferfish&oldid=20768394"; | sed -n 
131,137p #### http://centroid.eu #### So long and thanks for all the fish!!!
