P1L1	
P1L2	
P1L3	
P1L4	Network Working Group                                     H. Schulzrinne
P1L5	Request for Comments: 3551                           Columbia University
P1L6	Obsoletes: 1890                                                S. Casner
P1L7	Category: Standards Track                                  Packet Design
P1L8	                                                               July 2003
P1L9	
P1L10	
P1L11	              RTP Profile for Audio and Video Conferences
P1L12	                          with Minimal Control
P1L13	
P1L14	Status of this Memo
P1L15	
P1L16	   This document specifies an Internet standards track protocol for the
P1L17	   Internet community, and requests discussion and suggestions for
P1L18	   improvements.  Please refer to the current edition of the "Internet
P1L19	   Official Protocol Standards" (STD 1) for the standardization state
P1L20	   and status of this protocol.  Distribution of this memo is unlimited.
P1L21	
P1L22	Copyright Notice
P1L23	
P1L24	   Copyright (C) The Internet Society (2003).  All Rights Reserved.
P1L25	
P1L26	Abstract
P1L27	
P1L28	   This document describes a profile called "RTP/AVP" for the use of the
P1L29	   real-time transport protocol (RTP), version 2, and the associated
P1L30	   control protocol, RTCP, within audio and video multiparticipant
P1L31	   conferences with minimal control.  It provides interpretations of
P1L32	   generic fields within the RTP specification suitable for audio and
P1L33	   video conferences.  In particular, this document defines a set of
P1L34	   default mappings from payload type numbers to encodings.
P1L35	
P1L36	   This document also describes how audio and video data may be carried
P1L37	   within RTP.  It defines a set of standard encodings and their names
P1L38	   when used within RTP.  The descriptions provide pointers to reference
P1L39	   implementations and the detailed standards.  This document is meant
P1L40	   as an aid for implementors of audio, video and other real-time
P1L41	   multimedia applications.
P1L42	
P1L43	   This memorandum obsoletes RFC 1890.  It is mostly backwards-
P1L44	   compatible except for functions removed because two interoperable
P1L45	   implementations were not found.  The additions to RFC 1890 codify
P1L46	   existing practice in the use of payload formats under this profile
P1L47	   and include new payload formats defined since RFC 1890 was published.
P1L48	
P2L1	Table of Contents
P2L2	
P2L3	   1.  Introduction .................................................  3
P2L4	       1.1  Terminology .............................................  3
P2L5	   2.  RTP and RTCP Packet Forms and Protocol Behavior ..............  4
P2L6	   3.  Registering Additional Encodings .............................  6
P2L7	   4.  Audio ........................................................  8
P2L8	       4.1  Encoding-Independent Rules ..............................  8
P2L9	       4.2  Operating Recommendations ...............................  9
P2L10	       4.3  Guidelines for Sample-Based Audio Encodings ............. 10
P2L11	       4.4  Guidelines for Frame-Based Audio Encodings .............. 11
P2L12	       4.5  Audio Encodings ......................................... 12
P2L13	            4.5.1   DVI4 ............................................ 13
P2L14	            4.5.2   G722 ............................................ 14
P2L15	            4.5.3   G723 ............................................ 14
P2L16	            4.5.4   G726-40, G726-32, G726-24, and G726-16 .......... 18
P2L17	            4.5.5   G728 ............................................ 19
P2L18	            4.5.6   G729 ............................................ 20
P2L19	            4.5.7   G729D and G729E ................................. 22
P2L20	            4.5.8   GSM ............................................. 24
P2L21	            4.5.9   GSM-EFR ......................................... 27
P2L22	            4.5.10  L8 .............................................. 27
P2L23	            4.5.11  L16 ............................................. 27
P2L24	            4.5.12  LPC ............................................. 27
P2L25	            4.5.13  MPA ............................................. 28
P2L26	            4.5.14  PCMA and PCMU ................................... 28
P2L27	            4.5.15  QCELP ........................................... 28
P2L28	            4.5.16  RED ............................................. 29
P2L29	            4.5.17  VDVI ............................................ 29
P2L30	   5.  Video ........................................................ 30
P2L31	       5.1  CelB .................................................... 30
P2L32	       5.2  JPEG .................................................... 30
P2L33	       5.3  H261 .................................................... 30
P2L34	       5.4  H263 .................................................... 31
P2L35	       5.5  H263-1998 ............................................... 31
P2L36	       5.6  MPV ..................................................... 31
P2L37	       5.7  MP2T .................................................... 31
P2L38	       5.8  nv ...................................................... 32
P2L39	   6.  Payload Type Definitions ..................................... 32
P2L40	   7.  RTP over TCP and Similar Byte Stream Protocols ............... 34
P2L41	   8.  Port Assignment .............................................. 34
P2L42	   9.  Changes from RFC 1890 ........................................ 35
P2L43	   10. Security Considerations ...................................... 38
P2L44	   11. IANA Considerations .......................................... 39
P2L45	   12. References ................................................... 39
P2L46	       12.1 Normative References .................................... 39
P2L47	       12.2 Informative References .................................. 39
P2L48	   13. Current Locations of Related Resources ....................... 41
P3L1	   14. Acknowledgments .............................................. 42
P3L2	   15. Intellectual Property Rights Statement ....................... 43
P3L3	   16. Authors' Addresses ........................................... 43
P3L4	   17. Full Copyright Statement ..................................... 44
P3L5	
P3L6	1. Introduction
P3L7	
P3L8	   This profile defines aspects of RTP left unspecified in the RTP
P3L9	   Version 2 protocol definition (RFC 3550) [1].  This profile is
P3L10	   intended for use within audio and video conferences with minimal
P3L11	   session control.  In particular, no support for the negotiation of
P3L12	   parameters or membership control is provided.  The profile is
P3L13	   expected to be useful in sessions where no negotiation or membership
P3L14	   control is used (e.g., using the static payload types and the
P3L15	   membership indications provided by RTCP), but this profile may also
P3L16	   be useful in conjunction with a higher-level control protocol.
P3L17	
P3L18	   Use of this profile may be implicit in the use of the appropriate
P3L19	   applications; there may be no explicit indication by port number,
P3L20	   protocol identifier or the like.  Applications such as session
P3L21	   directories may use the name for this profile specified in Section
P3L22	   11.
P3L23	
P3L24	   Other profiles may make different choices for the items specified
P3L25	   here.
P3L26	
P3L27	   This document also defines a set of encodings and payload formats for
P3L28	   audio and video.  These payload format descriptions are included here
P3L29	   only as a matter of convenience since they are too small to warrant
P3L30	   separate documents.  Use of these payload formats is NOT REQUIRED to
P3L31	   use this profile.  Only the binding of some of the payload formats to
P3L32	   static payload type numbers in Tables 4 and 5 is normative.
P3L33	
P3L34	1.1 Terminology
P3L35	
P3L36	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
P3L37	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
P3L38	   document are to be interpreted as described in RFC 2119 [2] and
P3L39	   indicate requirement levels for implementations compliant with this
P3L40	   RTP profile.
P3L41	
P3L42	   This document defines the term media type as dividing encodings of
P3L43	   audio and video content into three classes: audio, video and
P3L44	   audio/video (interleaved).
P3L45	
P4L1	2. RTP and RTCP Packet Forms and Protocol Behavior
P4L2	
P4L3	   The section "RTP Profiles and Payload Format Specifications" of RFC
P4L4	   3550 enumerates a number of items that can be specified or modified
P4L5	   in a profile.  This section addresses these items.  Generally, this
P4L6	   profile follows the default and/or recommended aspects of the RTP
P4L7	   specification.
P4L8	
P4L9	   RTP data header: The standard format of the fixed RTP data
P4L10	      header is used (one marker bit).
P4L11	
P4L12	   Payload types: Static payload types are defined in Section 6.
P4L13	
P4L14	   RTP data header additions: No additional fixed fields are
P4L15	      appended to the RTP data header.
P4L16	
P4L17	   RTP data header extensions: No RTP header extensions are
P4L18	      defined, but applications operating under this profile MAY use
P4L19	      such extensions.  Thus, applications SHOULD NOT assume that the
P4L20	      RTP header X bit is always zero and SHOULD be prepared to ignore
P4L21	      the header extension.  If a header extension is defined in the
P4L22	      future, that definition MUST specify the contents of the first 16
P4L23	      bits in such a way that multiple different extensions can be
P4L24	      identified.
P4L25	
P4L26	   RTCP packet types: No additional RTCP packet types are defined
P4L27	      by this profile specification.
P4L28	
P4L29	   RTCP report interval: The suggested constants are to be used for
P4L30	      the RTCP report interval calculation.  Sessions operating under
P4L31	      this profile MAY specify a separate parameter for the RTCP traffic
P4L32	      bandwidth rather than using the default fraction of the session
P4L33	      bandwidth.  The RTCP traffic bandwidth MAY be divided into two
P4L34	      separate session parameters for those participants which are
P4L35	      active data senders and those which are not.  Following the
P4L36	      recommendation in the RTP specification [1] that 1/4 of the RTCP
P4L37	      bandwidth be dedicated to data senders, the RECOMMENDED default
P4L38	      values for these two parameters would be 1.25% and 3.75%,
P4L39	      respectively.  For a particular session, the RTCP bandwidth for
P4L40	      non-data-senders MAY be set to zero when operating on
P4L41	      unidirectional links or for sessions that don't require feedback
P4L42	      on the quality of reception.  The RTCP bandwidth for data senders
P4L43	      SHOULD be kept non-zero so that sender reports can still be sent
P4L44	      for inter-media synchronization and to identify the source by
P4L45	      CNAME.  The means by which the one or two session parameters for
P4L46	      RTCP bandwidth are specified is beyond the scope of this memo.
P4L47	
P4L48	
P5L1	   SR/RR extension: No extension section is defined for the RTCP SR
P5L2	      or RR packet.
P5L3	
P5L4	   SDES use: Applications MAY use any of the SDES items described
P5L5	      in the RTP specification.  While CNAME information MUST be sent
P5L6	      every reporting interval, other items SHOULD only be sent every
P5L7	      third reporting interval, with NAME sent seven out of eight times
P5L8	      within that slot and the remaining SDES items cyclically taking up
P5L9	      the eighth slot, as defined in Section 6.2.2 of the RTP
P5L10	      specification.  In other words, NAME is sent in RTCP packets 1, 4,
P5L11	      7, 10, 13, 16, 19, while, say, EMAIL is used in RTCP packet 22.
P5L12	
P5L13	   Security: The RTP default security services are also the default
P5L14	      under this profile.
P5L15	
P5L16	   String-to-key mapping: No mapping is specified by this profile.
P5L17	
P5L18	   Congestion: RTP and this profile may be used in the context of
P5L19	      enhanced network service, for example, through Integrated Services
P5L20	      (RFC 1633) [4] or Differentiated Services (RFC 2475) [5], or they
P5L21	      may be used with best effort service.
P5L22	
P5L23	      If enhanced service is being used, RTP receivers SHOULD monitor
P5L24	      packet loss to ensure that the service that was requested is
P5L25	      actually being delivered.  If it is not, then they SHOULD assume
P5L26	      that they are receiving best-effort service and behave
P5L27	      accordingly.
P5L28	
P5L29	      If best-effort service is being used, RTP receivers SHOULD monitor
P5L30	      packet loss to ensure that the packet loss rate is within
P5L31	      acceptable parameters.  Packet loss is considered acceptable if a
P5L32	      TCP flow across the same network path and experiencing the same
P5L33	      network conditions would achieve an average throughput, measured
P5L34	      on a reasonable timescale, that is not less than the RTP flow is
P5L35	      achieving.  This condition can be satisfied by implementing
P5L36	      congestion control mechanisms to adapt the transmission rate (or
P5L37	      the number of layers subscribed for a layered multicast session),
P5L38	      or by arranging for a receiver to leave the session if the loss
P5L39	      rate is unacceptably high.
P5L40	
P5L41	      The comparison to TCP cannot be specified exactly, but is intended
P5L42	      as an "order-of-magnitude" comparison in timescale and throughput.
P5L43	      The timescale on which TCP throughput is measured is the round-
P5L44	      trip time of the connection.  In essence, this requirement states
P5L45	      that it is not acceptable to deploy an application (using RTP or
P5L46	      any other transport protocol) on the best-effort Internet which
P5L47	      consumes bandwidth arbitrarily and does not compete fairly with
P5L48	      TCP within an order of magnitude.
P6L1	   Underlying protocol: The profile specifies the use of RTP over
P6L2	      unicast and multicast UDP as well as TCP.  (This does not preclude
P6L3	      the use of these definitions when RTP is carried by other lower-
P6L4	      layer protocols.)
P6L5	
P6L6	   Transport mapping: The standard mapping of RTP and RTCP to
P6L7	      transport-level addresses is used.
P6L8	
P6L9	   Encapsulation: This profile leaves to applications the
P6L10	      specification of RTP encapsulation in protocols other than UDP.
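
   The RTCP report interval item above can be illustrated with a small
   calculation.  This is only a sketch: the function and parameter
   names are hypothetical, and the 5% total is assumed to be the default
   RTCP fraction of the session bandwidth given in the RTP
   specification.  Exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction

def rtcp_bandwidth_split(session_bw_bps,
                         rtcp_fraction=Fraction(5, 100),
                         sender_share=Fraction(1, 4)):
    """Return (sender_bps, non_sender_bps) of RTCP bandwidth.

    rtcp_fraction: assumed default RTCP share of the session bandwidth.
    sender_share:  portion of the RTCP bandwidth dedicated to data senders.
    """
    rtcp_bw = session_bw_bps * rtcp_fraction
    senders = rtcp_bw * sender_share
    non_senders = rtcp_bw - senders
    return senders, non_senders

# A 64,000 bit/s session: 1.25% for senders, 3.75% for non-senders.
senders, non_senders = rtcp_bandwidth_split(64_000)
print(senders, non_senders)
```

   A session that specifies the two parameters separately would simply
   supply its own values instead of deriving them from one fraction.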
P6L11	
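   The SDES schedule described in the SDES use item can be sketched as
   follows.  The generator name and the order in which the remaining
   items rotate are assumptions made for illustration; CNAME is carried
   in every packet and is therefore not listed.

```python
from itertools import cycle

def sdes_schedule(num_packets, other_items=("EMAIL", "PHONE", "LOC", "TOOL")):
    """Yield (packet_number, extra_item) for RTCP packets 1..num_packets.

    extra_item is the SDES item sent in addition to CNAME, or None.
    The rotation order of other_items is an illustrative assumption.
    """
    others = cycle(other_items)
    for n in range(1, num_packets + 1):
        if (n - 1) % 3 != 0:
            # Not a "third interval" packet: CNAME only.
            yield n, None
            continue
        slot = (n - 1) // 3
        # Seven of every eight slots carry NAME; the eighth rotates.
        yield n, "NAME" if slot % 8 < 7 else next(others)

for packet, item in sdes_schedule(22):
    if item:
        print(packet, item)
```

   With these assumptions, NAME appears in packets 1, 4, 7, 10, 13, 16
   and 19, and the first rotating item appears in packet 22.
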
P6L12	3.  Registering Additional Encodings
P6L13	
P6L14	   This profile lists a set of encodings, each of which consists of
P6L15	   a particular media data compression or representation plus a payload
P6L16	   format for encapsulation within RTP.  Some of those payload formats
P6L17	   are specified here, while others are specified in separate RFCs.  It
P6L18	   is expected that additional encodings beyond the set listed here will
P6L19	   be created in the future and specified in additional payload format
P6L20	   RFCs.
P6L21	
P6L22	   This profile also assigns to each encoding a short name which MAY be
P6L23	   used by higher-level control protocols, such as the Session
P6L24	   Description Protocol (SDP), RFC 2327 [6], to identify encodings
P6L25	   selected for a particular RTP session.
P6L26	
P6L27	   In some contexts it may be useful to refer to these encodings in the
P6L28	   form of a MIME content-type.  To facilitate this, RFC 3555 [7]
P6L29	   provides registrations for all of the encoding names listed here as
P6L30	   MIME subtype names under the "audio" and "video" MIME types through
P6L31	   the MIME registration procedure as specified in RFC 2048 [8].
P6L32	
P6L33	   Any additional encodings specified for use under this profile (or
P6L34	   others) may also be assigned names registered as MIME subtypes with
P6L35	   the Internet Assigned Numbers Authority (IANA).  This registry
P6L36	   provides a means to ensure that the names assigned to the additional
P6L37	   encodings are kept unique.  RFC 3555 specifies the information that
P6L38	   is required for the registration of RTP encodings.
P6L39	
P6L40	   In addition to assigning names to encodings, this profile also
P6L41	   assigns static RTP payload type numbers to some of them.  However,
P6L42	   the payload type number space is relatively small and cannot
P6L43	   accommodate assignments for all existing and future encodings.
P6L44	   During the early stages of RTP development, it was necessary to use
P6L45	   statically assigned payload types because no other mechanism had been
P6L46	   specified to bind encodings to payload types.  It was anticipated
P6L47	   that non-RTP means beyond the scope of this memo (such as directory
P6L48	   services or invitation protocols) would be specified to establish a
P7L1	   dynamic mapping between a payload type and an encoding.  Now,
P7L2	   mechanisms for defining dynamic payload type bindings have been
P7L3	   specified in the Session Description Protocol (SDP) and in other
P7L4	   protocols such as ITU-T Recommendation H.323/H.245.  These mechanisms
P7L5	   associate the registered name of the encoding/payload format, along
P7L6	   with any additional required parameters, such as the RTP timestamp
P7L7	   clock rate and number of channels, with a payload type number.  This
P7L8	   association is effective only for the duration of the RTP session in
P7L9	   which the dynamic payload type binding is made.  Because the
P7L10	   association applies only to that session, the numbers can be re-used
P7L11	   for different encodings in different sessions, so the number space
P7L12	   limitation is avoided.
P7L13	
P7L14	   This profile reserves payload type numbers in the range 96-127
P7L15	   exclusively for dynamic assignment.  Applications SHOULD first use
P7L16	   values in this range for dynamic payload types.  Those applications
P7L17	   which need to define more than 32 dynamic payload types MAY bind
P7L18	   codes below 96, in which case it is RECOMMENDED that unassigned
P7L19	   payload type numbers be used first.  However, the statically assigned
P7L20	   payload types are default bindings and MAY be dynamically bound to
P7L21	   new encodings if needed.  Redefining payload types below 96 may cause
P7L22	   incorrect operation if an attempt is made to join a session without
P7L23	   obtaining session description information that defines the dynamic
P7L24	   payload types.
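
   The preference order for dynamic payload type numbers described above
   might be sketched as below.  The function and its parameters are
   hypothetical; the caller is assumed to supply the unassigned and
   statically assigned numbers below 96 from Tables 4 and 5.

```python
DYNAMIC_RANGE = range(96, 128)  # reserved exclusively for dynamic use

def allocate_payload_types(count, unassigned_below_96=(), static_below_96=()):
    """Pick `count` payload type numbers in the profile's preferred order.

    Order: 96-127 first, then unassigned numbers below 96, and only then
    statically assigned numbers (rebinding these is a last resort).
    """
    candidates = (list(DYNAMIC_RANGE)
                  + list(unassigned_below_96)
                  + list(static_below_96))
    if count > len(candidates):
        raise ValueError("not enough payload type numbers available")
    return candidates[:count]

print(allocate_payload_types(3))
```

   Any number chosen below 96 still has to be conveyed through session
   description information, as the paragraph above notes.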
P7L25	
P7L26	   Dynamic payload types SHOULD NOT be used without a well-defined
P7L27	   mechanism to indicate the mapping.  Systems that expect to
P7L28	   interoperate with others operating under this profile SHOULD NOT make
P7L29	   their own assignments of proprietary encodings to particular, fixed
P7L30	   payload types.
P7L31	
P7L32	   This specification establishes the policy that no additional static
P7L33	   payload types will be assigned beyond the ones defined in this
P7L34	   document.  Establishing this policy avoids the problem of trying to
P7L35	   create a set of criteria for accepting static assignments and
P7L36	   encourages the implementation and deployment of the dynamic payload
P7L37	   type mechanisms.
P7L38	
P7L39	   The final set of static payload type assignments is provided in
P7L40	   Tables 4 and 5.
P7L41	
P8L1	4.  Audio
P8L2	
P8L3	4.1  Encoding-Independent Rules
P8L4	
P8L5	   Since the ability to suppress silence is one of the primary
P8L6	   motivations for using packets to transmit voice, the RTP header
P8L7	   carries both a sequence number and a timestamp to allow a receiver to
P8L8	   distinguish between lost packets and periods of time when no data was
P8L9	   transmitted.  Discontiguous transmission (silence suppression) MAY be
P8L10	   used with any audio payload format.  Receivers MUST assume that
P8L11	   senders may suppress silence unless this is restricted by signaling
P8L12	   specified elsewhere.  (Even if the transmitter does not suppress
P8L13	   silence, the receiver should be prepared to handle periods when no
P8L14	   data is present since packets may be lost.)
P8L15	
P8L16	   Some payload formats (see Sections 4.5.3 and 4.5.6) define a "silence
P8L17	   insertion descriptor" or "comfort noise" frame to specify parameters
P8L18	   for artificial noise that may be generated during a period of silence
P8L19	   to approximate the background noise at the source.  For other payload
P8L20	   formats, a generic Comfort Noise (CN) payload format is specified in
P8L21	   RFC 3389 [9].  When the CN payload format is used with another
P8L22	   payload format, different values in the RTP payload type field
P8L23	   distinguish comfort-noise packets from those of the selected payload
P8L24	   format.
P8L25	
P8L26	   For applications which send either no packets or occasional comfort-
P8L27	   noise packets during silence, the first packet of a talkspurt, that
P8L28	   is, the first packet after a silence period during which packets have
P8L29	   not been transmitted contiguously, SHOULD be distinguished by setting
P8L30	   the marker bit in the RTP data header to one.  The marker bit in all
P8L31	   other packets is zero.  The beginning of a talkspurt MAY be used to
P8L32	   adjust the playout delay to reflect changing network delays.
P8L33	   Applications without silence suppression MUST set the marker bit to
P8L34	   zero.
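
   A minimal sketch of the marker bit rule follows.  It assumes a
   hypothetical per-interval voice activity list, ignores occasional
   comfort-noise packets, and treats the stream as already active before
   the first interval; none of these choices come from this memo.

```python
def packets_with_marker(voice_active, silence_suppression=True):
    """Return (interval_index, marker_bit) for each packet that is sent.

    voice_active: one boolean per packetization interval.
    """
    packets = []
    prev_sent = True  # assumption: stream was active before index 0
    for i, active in enumerate(voice_active):
        if not silence_suppression:
            # Without silence suppression every interval is sent, marker 0.
            packets.append((i, 0))
            continue
        if active:
            # First packet after a gap starts a talkspurt: marker 1.
            packets.append((i, 0 if prev_sent else 1))
        prev_sent = active
    return packets

print(packets_with_marker([True, True, False, False, True, True]))
```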
P8L35	
P8L36	   The RTP clock rate used for generating the RTP timestamp is
P8L37	   independent of the number of channels and the encoding; it usually
P8L38	   equals the number of sampling periods per second.  For N-channel
P8L39	   encodings, each sampling period (say, 1/8,000 of a second) generates
P8L40	   N samples.  (This terminology is standard, but somewhat confusing, as
P8L41	   the total number of samples generated per second is then the sampling
P8L42	   rate times the channel count.)
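
   The distinction between sampling periods and samples can be made
   concrete with a short calculation.  The helper names are
   hypothetical, and the sketch assumes the usual case in which the RTP
   clock rate equals the sampling rate.

```python
def timestamp_increment(clock_rate_hz, packet_ms):
    """RTP timestamp units (sampling periods) covered by one packet."""
    return clock_rate_hz * packet_ms // 1000

def samples_per_packet(clock_rate_hz, packet_ms, channels):
    """Total samples in one packet: N samples per sampling period."""
    return timestamp_increment(clock_rate_hz, packet_ms) * channels

# 20 ms of 8,000 Hz stereo: the timestamp advances by 160,
# but the packet carries 320 samples.
print(timestamp_increment(8000, 20), samples_per_packet(8000, 20, 2))
```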
P8L43	
P8L44	   If multiple audio channels are used, channels are numbered left-to-
P8L45	   right, starting at one.  In RTP audio packets, information from
P8L46	   lower-numbered channels precedes that from higher-numbered channels.
P8L47	
P8L48	
P9L1	   For more than two channels, the convention followed by the AIFF-C
P9L2	   audio interchange format SHOULD be followed [3], using the following
P9L3	   notation, unless some other convention is specified for a particular
P9L4	   encoding or payload format:
P9L5	
P9L6	      l  left
P9L7	      r  right
P9L8	      c  center
P9L9	      S  surround
P9L10	      F  front
P9L11	      R  rear
P9L12	
P9L13	      channels  description  channel
P9L14	                                1     2   3   4   5   6
P9L15	      _________________________________________________
P9L16	      2         stereo          l     r
P9L17	      3                         l     r   c
P9L18	      4                         l     c   r   S
P9L19	      5                        Fl     Fr  Fc  Sl  Sr
P9L20	      6                         l     lc  c   r   rc  S
P9L21	
P9L22	         Note: RFC 1890 defined two conventions for the ordering of four
P9L23	         audio channels.  Since the ordering is indicated implicitly by
P9L24	         the number of channels, this was ambiguous.  In this revision,
P9L25	         the order described as "quadrophonic" has been eliminated to
P9L26	         remove the ambiguity.  This choice was based on the observation
P9L27	         that the quadrophonic consumer audio format did not become popular
P9L28	         whereas surround-sound subsequently has.
P9L29	
P9L30	   Samples for all channels belonging to a single sampling instant MUST
P9L31	   be within the same packet.  The interleaving of samples from
P9L32	   different channels depends on the encoding.  General guidelines are
P9L33	   given in Sections 4.3 and 4.4.
P9L34	
P9L35	   The sampling frequency SHOULD be drawn from the set:  8,000, 11,025,
P9L36	   16,000, 22,050, 24,000, 32,000, 44,100 and 48,000 Hz.  (Older Apple
P9L37	   Macintosh computers had a native sample rate of 22,254.54 Hz, which
P9L38	   can be converted to 22,050 with acceptable quality by dropping 4
P9L39	   samples in a 20 ms frame.)  However, most audio encodings are defined
P9L40	   for a more restricted set of sampling frequencies.  Receivers SHOULD
P9L41	   be prepared to accept multi-channel audio, but MAY choose to only
P9L42	   play a single channel.
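
   The figure of 4 dropped samples per 20 ms frame in the parenthetical
   remark above can be checked with simple arithmetic.  The function
   name is hypothetical and the sketch only reproduces the count; it
   does not say which samples to drop.

```python
def samples_to_drop(native_rate_hz, target_rate_hz, frame_ms=20):
    """Samples to discard per frame to approximate the target rate."""
    native = native_rate_hz * frame_ms / 1000   # samples captured per frame
    target = target_rate_hz * frame_ms / 1000   # samples wanted per frame
    return round(native - target)

# About 445.09 captured versus 441 wanted in each 20 ms frame.
print(samples_to_drop(22_254.54, 22_050))
```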
P9L43	
P9L44	4.2  Operating Recommendations
P9L45	
P9L46	   The following recommendations are default operating parameters.
P9L47	   Applications SHOULD be prepared to handle other values.  The ranges
P9L48	   given are meant to give guidance to application writers, allowing a
P10L1	   set of applications conforming to these guidelines to interoperate
P10L2	   without additional negotiation.  These guidelines are not intended to
P10L3	   restrict operating parameters for applications that can negotiate a
P10L4	   set of interoperable parameters, e.g., through a conference control
P10L5	   protocol.
P10L6	
P10L7	   For packetized audio, the default packetization interval SHOULD have
P10L8	   a duration of 20 ms or one frame, whichever is longer, unless
P10L9	   otherwise noted in Table 1 (column "ms/packet").  The packetization
P10L10	   interval determines the minimum end-to-end delay; longer packets
P10L11	   introduce less header overhead but higher delay and make packet loss
P10L12	   more noticeable.  For non-interactive applications such as lectures
P10L13	   or for links with severe bandwidth constraints, a higher
P10L14	   packetization delay MAY be used.  A receiver SHOULD accept packets
P10L15	   representing between 0 and 200 ms of audio data.  (For framed audio
P10L16	   encodings, a receiver SHOULD accept packets with a number of frames
P10L17	   equal to 200 ms divided by the frame duration, rounded up.)  This
P10L18	   restriction allows reasonable buffer sizing for the receiver.
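
   The two rules above, the default packetization interval and the
   number of frames a receiver should be ready to accept, can be
   sketched as follows.  The function names are hypothetical, and the
   per-encoding exceptions listed in Table 1 are not modeled.

```python
import math

def default_packet_ms(frame_ms=None):
    """Default packetization interval: 20 ms or one frame, if longer.

    frame_ms is None for sample-based encodings.
    """
    return 20 if frame_ms is None else max(20, frame_ms)

def max_frames_to_accept(frame_ms, limit_ms=200):
    """Frames a receiver should accept: limit / frame duration, rounded up."""
    return math.ceil(limit_ms / frame_ms)

for frame_ms in (2.5, 10, 20, 30):
    print(frame_ms, default_packet_ms(frame_ms), max_frames_to_accept(frame_ms))
```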
P10L19	
P10L20	4.3  Guidelines for Sample-Based Audio Encodings
P10L21	
P10L22	   In sample-based encodings, each audio sample is represented by a
P10L23	   fixed number of bits.  Within the compressed audio data, codes for
P10L24	   individual samples may span octet boundaries.  An RTP audio packet
P10L25	   may contain any number of audio samples, subject to the constraint
P10L26	   that the number of bits per sample times the number of samples per
P10L27	   packet yields an integral octet count.  Fractional encodings produce
P10L28	   less than one octet per sample.
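
   The integral octet constraint can be checked mechanically.  The
   sketch below uses a hypothetical helper and takes the bits/sample
   values from Table 1.

```python
def payload_octets(bits_per_sample, samples_per_packet):
    """Octets of sample data, or ValueError if not a whole number."""
    total_bits = bits_per_sample * samples_per_packet
    if total_bits % 8:
        raise ValueError(
            f"{samples_per_packet} samples of {bits_per_sample} bits "
            "do not fill a whole number of octets")
    return total_bits // 8

# 20 ms at 8,000 Hz is 160 samples; at 5 bits each that is 100 octets.
print(payload_octets(5, 160))
```

   For a 3-bit encoding, for example, the sample count has to be a
   multiple of 8 for the check to pass.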
P10L29	
P10L30	   The duration of an audio packet is determined by the number of
P10L31	   samples in the packet.
P10L32	
P10L33	   For sample-based encodings producing one or more octets per sample,
P10L34	   samples from different channels sampled at the same sampling instant
P10L35	   SHOULD be packed in consecutive octets.  For example, for a two-
P10L36	   channel encoding, the octet sequence is (left channel, first sample),
P10L37	   (right channel, first sample), (left channel, second sample), (right
P10L38	   channel, second sample), ....  For multi-octet encodings, octets
P10L39	   SHOULD be transmitted in network byte order (i.e., most significant
P10L40	   octet first).
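
   As an illustration of the interleaving and byte order rules, the
   following sketch packs multi-channel audio, assuming 16-bit signed
   samples.  The function name and the input layout (one list of
   samples per channel, lowest-numbered channel first) are assumptions.

```python
import struct

def pack_interleaved_16bit(channels):
    """Interleave per-channel sample lists and pack them big-endian."""
    if len({len(c) for c in channels}) != 1:
        raise ValueError("need at least one channel, all of equal length")
    # zip(*channels) groups the samples of one sampling instant together,
    # so channel 1 precedes channel 2 within each instant.
    flat = [sample for instant in zip(*channels) for sample in instant]
    # ">" selects network byte order; "h" is a 16-bit signed integer.
    return struct.pack(f">{len(flat)}h", *flat)

left, right = [1, 2], [-1, 256]
print(pack_interleaved_16bit([left, right]).hex())
```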
P10L41	
P10L42	   The packing of sample-based encodings producing less than one octet
P10L43	   per sample is encoding-specific.
P10L44	
P10L45	   The RTP timestamp reflects the instant at which the first sample in
P10L46	   the packet was sampled, that is, the oldest information in the
P10L47	   packet.
P10L48	
P11L1	4.4  Guidelines for Frame-Based Audio Encodings
P11L2	
P11L3	   Frame-based encodings encode a fixed-length block of audio into
P11L4	   another block of compressed data, typically also of fixed length.
P11L5	   For frame-based encodings, the sender MAY choose to combine several
P11L6	   such frames into a single RTP packet.  The receiver can tell the
P11L7	   number of frames contained in an RTP packet, if all the frames have
P11L8	   the same length, by dividing the RTP payload length by the audio
P11L9	   frame size which is defined as part of the encoding.  This does not
P11L10	   work when carrying frames of different sizes unless the frame sizes
P11L11	   are relatively prime.  If not, the frames MUST indicate their size.
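
   For the common case in which all frames have the same length, the
   division described above might look like this.  The helper name is
   hypothetical, and the 33-octet frame size in the example is used
   purely for illustration.

```python
def frames_in_payload(payload_len, frame_size):
    """Number of equal-size frames in an RTP payload.

    Raises ValueError if the payload is empty or is not a whole
    number of frames.
    """
    frames, remainder = divmod(payload_len, frame_size)
    if frames == 0 or remainder:
        raise ValueError("payload is not a whole number of frames")
    return frames

print(frames_in_payload(66, 33))
```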
P11L12	
P11L13	   For frame-based codecs, the channel order is defined for the whole
P11L14	   block.  That is, for two-channel audio, right and left samples SHOULD
P11L15	   be coded independently, with the encoded frame for the left channel
P11L16	   preceding that for the right channel.
P11L17	
P11L18	   All frame-oriented audio codecs SHOULD be able to encode and decode
P11L19	   several consecutive frames within a single packet.  Since the frame
P11L20	   size for the frame-oriented codecs is given, there is no need to use
P11L21	   a separate designation for the same encoding with a different
P11L22	   number of frames per packet.
P11L23	
P11L24	   RTP packets SHALL contain a whole number of frames, with frames
P11L25	   inserted according to age within a packet, so that the oldest frame
P11L26	   (to be played first) occurs immediately after the RTP packet header.
P11L27	   The RTP timestamp reflects the instant at which the first sample in
P11L28	   the first frame was sampled, that is, the oldest information in the
P11L29	   packet.
P11L30	
P12L1	4.5 Audio Encodings
P12L2	
P12L3	   name of                              sampling              default
P12L4	   encoding  sample/frame  bits/sample      rate  ms/frame  ms/packet
P12L5	   __________________________________________________________________
P12L6	   DVI4      sample        4                var.                   20
P12L7	   G722      sample        8              16,000                   20
P12L8	   G723      frame         N/A             8,000        30         30
P12L9	   G726-40   sample        5               8,000                   20
P12L10	   G726-32   sample        4               8,000                   20
P12L11	   G726-24   sample        3               8,000                   20
P12L12	   G726-16   sample        2               8,000                   20
P12L13	   G728      frame         N/A             8,000       2.5         20
P12L14	   G729      frame         N/A             8,000        10         20
P12L15	   G729D     frame         N/A             8,000        10         20
P12L16	   G729E     frame         N/A             8,000        10         20
P12L17	   GSM       frame         N/A             8,000        20         20
P12L18	   GSM-EFR   frame         N/A             8,000        20         20
P12L19	   L8        sample        8                var.                   20
P12L20	   L16       sample        16               var.                   20
P12L21	   LPC       frame         N/A             8,000        20         20
P12L22	   MPA       frame         N/A              var.      var.
P12L23	   PCMA      sample        8                var.                   20
P12L24	   PCMU      sample        8                var.                   20
P12L25	   QCELP     frame         N/A             8,000        20         20
P12L26	   VDVI      sample        var.             var.                   20
P12L27	
P12L28	   Table 1: Properties of Audio Encodings (N/A: not applicable; var.:
P12L29	            variable)
P12L30	
P12L31	   The characteristics of the audio encodings described in this document
P12L32	   are shown in Table 1; they are listed in order of their payload type
P12L33	   in Table 4.  While most audio codecs are only specified for a fixed
P12L34	   sampling rate, some sample-based algorithms (indicated by an entry of
P12L35	   "var." in the sampling rate column of Table 1) may be used with
P12L36	   different sampling rates, resulting in different coded bit rates.
P12L37	   When used with a sampling rate other than that for which a static
P12L38	   payload type is defined, non-RTP means beyond the scope of this memo
P12L39	   MUST be used to define a dynamic payload type and MUST indicate the
P12L40	   selected RTP timestamp clock rate, which is usually the same as the
P12L41	   sampling rate for audio.
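
   As an informal illustration of how the columns of Table 1 relate for
   sample-based encodings, the following minimal sketch computes the
   number of payload octets in one packet.  The function name and the
   restriction to a single channel are assumptions made for this
   example only and are not part of this profile.

```python
def payload_octets(bits_per_sample, sampling_rate_hz, packet_ms):
    """Octets of single-channel audio in one packet of a sample-based encoding."""
    if (sampling_rate_hz * packet_ms) % 1000:
        raise ValueError("packet duration is not a whole number of samples")
    samples = sampling_rate_hz * packet_ms // 1000
    total_bits = samples * bits_per_sample
    if total_bits % 8:
        raise ValueError("packet does not end on an octet boundary")
    return total_bits // 8


# Default 20 ms packets, using the bits/sample values from Table 1.
pcmu_octets = payload_octets(8, 8000, 20)       # PCMU at 8,000 Hz
g726_32_octets = payload_octets(4, 8000, 20)    # G726-32
l16_octets = payload_octets(16, 44100, 20)      # L16 at 44,100 Hz
```

   With these inputs the sketch yields 160, 80, and 1764 octets,
   respectively.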
P12L42	
P12L43	
P12L44	
P12L45	
P12L46	
P12L47	
P12L48	
P13L1	4.5.1 DVI4
P13L2	
P13L3	   DVI4 uses an adaptive delta pulse code modulation (ADPCM) encoding
P13L4	   scheme that was specified by the Interactive Multimedia Association
P13L5	   (IMA) as the "IMA ADPCM wave type".  However, the encoding defined
P13L6	   here as DVI4 differs in three respects from the IMA specification:
P13L7	
P13L8	   o  The RTP DVI4 header contains the predicted value rather than the
P13L9	      first sample value contained in the IMA ADPCM block header.
P13L10	
P13L11	   o  IMA ADPCM blocks contain an odd number of samples, since the first
P13L12	      sample of a block is contained just in the header (uncompressed),
P13L13	      followed by an even number of compressed samples.  DVI4 has an
P13L14	      even number of compressed samples only, using the `predict' word
P13L15	      from the header to decode the first sample.
P13L16	
P13L17	   o  For DVI4, the 4-bit samples are packed with the first sample in
P13L18	      the four most significant bits and the second sample in the four
P13L19	      least significant bits.  In the IMA ADPCM codec, the samples are
P13L20	      packed in the opposite order.
P13L21	
P13L22	   Each packet contains a single DVI block.  This profile only defines
P13L23	   the 4-bit-per-sample version, while IMA also specified a 3-bit-per-
P13L24	   sample encoding.
P13L25	
P13L26	   The "header" word for each channel has the following structure:
P13L27	
P13L28	      int16  predict;  /* predicted value of first sample
P13L29	                          from the previous block (L16 format) */
P13L30	      u_int8 index;    /* current index into stepsize table */
P13L31	      u_int8 reserved; /* set to zero by sender, ignored by receiver */
P13L32	
P13L33	   Each octet following the header contains two 4-bit samples, thus the
P13L34	   number of samples per packet MUST be even because there is no means
P13L35	   to indicate a partially filled last octet.
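
   The header layout and nibble order described above can be sketched
   as follows for a single channel.  This is a minimal illustration,
   not a decoder: it only separates the header fields from the 4-bit
   codes.  The function name is hypothetical, and network byte order
   for the 16-bit "predict" field is assumed here on the basis of its
   description as L16 format.

```python
import struct


def parse_dvi4_block(payload):
    """Split a single-channel DVI4 block into (predict, index, codes)."""
    if len(payload) < 4:
        raise ValueError("DVI4 block is shorter than its 4-octet header")
    # int16 predict, u_int8 index, u_int8 reserved (ignored by receivers).
    predict, index, _reserved = struct.unpack("!hBB", payload[:4])
    codes = []
    for octet in payload[4:]:
        codes.append(octet >> 4)      # first sample: four most significant bits
        codes.append(octet & 0x0F)    # second sample: four least significant bits
    return predict, index, codes


example = struct.pack("!hBB", -2, 5, 0) + bytes([0x1F, 0xA0])
predict, index, codes = parse_dvi4_block(example)
```

   For the example block, the sketch returns a predicted value of -2,
   an index of 5, and the codes 1, 15, 10, 0.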
P13L36	
P13L37	   Packing of samples for multiple channels is for further study.
P13L38	
P13L39	   The IMA ADPCM algorithm was described in the document IMA Recommended
P13L40	   Practices for Enhancing Digital Audio Compatibility in Multimedia
P13L41	   Systems (version 3.0).  However, the Interactive Multimedia
P13L42	   Association ceased operations in 1997.  Resources for an archived
P13L43	   copy of that document and a software implementation of the RTP DVI4
P13L44	   encoding are listed in Section 13.
P13L45	
P13L46	
P13L47	
P13L48	
P14L1	4.5.2 G722
P14L2	
P14L3	   G722 is specified in ITU-T Recommendation G.722, "7 kHz audio-coding
P14L4	   within 64 kbit/s".  The G.722 encoder produces a stream of octets,
P14L5	   each of which SHALL be octet-aligned in an RTP packet.  The first bit
P14L6	   transmitted in the G.722 octet, which is the most significant bit of
P14L7	   the higher sub-band sample, SHALL correspond to the most significant
P14L8	   bit of the octet in the RTP packet.
P14L9	
P14L10	   Even though the actual sampling rate for G.722 audio is 16,000 Hz,
P14L11	   the RTP clock rate for the G722 payload format is 8,000 Hz because
P14L12	   that value was erroneously assigned in RFC 1890 and must remain
P14L13	   unchanged for backward compatibility.  The octet rate or sample-pair
P14L14	   rate is 8,000 Hz.
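
   The practical consequence of the two rates can be illustrated with a
   short calculation.  The names below are chosen for this example
   only; the sketch assumes one octet per pair of samples, as implied
   by the sample-pair rate stated above.

```python
G722_SAMPLING_RATE_HZ = 16000   # actual audio sampling rate
G722_RTP_CLOCK_HZ = 8000        # clock rate retained from RFC 1890


def g722_packet_figures(packet_ms):
    """Return (audio samples, payload octets, RTP timestamp increment)."""
    samples = G722_SAMPLING_RATE_HZ * packet_ms // 1000
    octets = samples // 2                       # one octet per sample pair
    ts_increment = G722_RTP_CLOCK_HZ * packet_ms // 1000
    return samples, octets, ts_increment


samples, octets, ts_increment = g722_packet_figures(20)
```

   For a 20 ms packet this gives 320 samples carried in 160 octets,
   while the RTP timestamp advances by only 160.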
P14L15	
P14L16	4.5.3 G723
P14L17	
P14L18	   G723 is specified in ITU-T Recommendation G.723.1, "Dual-rate speech
P14L19	   coder for multimedia communications transmitting at 5.3 and 6.3
P14L20	   kbit/s".  The G.723.1 5.3/6.3 kbit/s codec was defined by the ITU-T
P14L21	   as a mandatory codec for ITU-T H.324 GSTN videophone terminal
P14L22	   applications.  The algorithm has a floating point specification in
P14L23	   Annex B to G.723.1, a silence compression algorithm in Annex A to
P14L24	   G.723.1 and a scalable channel coding scheme for wireless
P14L25	   applications in G.723.1 Annex C.
P14L26	
P14L27	   This Recommendation specifies a coded representation that can be used
P14L28	   for compressing the speech signal component of multi-media services
P14L29	   at a very low bit rate.  Audio is encoded in 30 ms frames, with an
P14L30	   additional delay of 7.5 ms due to look-ahead.  A G.723.1 frame can be
P14L31	   one of three sizes:  24 octets (6.3 kb/s frame), 20 octets (5.3 kb/s
P14L32	   frame), or 4 octets.  These 4-octet frames are called SID frames
P14L33	   (Silence Insertion Descriptor) and are used to specify comfort noise
P14L34	   parameters.  There is no restriction on how 4, 20, and 24 octet
P14L35	   frames are intermixed.  The least significant two bits of the first
P14L36	   octet in the frame determine the frame size and codec type:
P14L37	
P14L38	         bits  content                      octets/frame
P14L39	         00    high-rate speech (6.3 kb/s)            24
P14L40	         01    low-rate speech  (5.3 kb/s)            20
P14L41	         10    SID frame                               4
P14L42	         11    reserved
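
   Because the frame size follows from the two least significant bits
   of the first octet, a receiver can walk a payload containing
   intermixed frames.  The following is a minimal sketch under that
   reading of the table above; the names are hypothetical, and
   rejecting the reserved value is a choice made for this example.

```python
# Frame size in octets, keyed by the two least significant bits
# of the first octet of the frame.
G723_FRAME_OCTETS = {0b00: 24, 0b01: 20, 0b10: 4}


def split_g723_frames(payload):
    """Split an RTP payload into its G.723.1 frames."""
    frames = []
    pos = 0
    while pos < len(payload):
        kind = payload[pos] & 0x03
        size = G723_FRAME_OCTETS.get(kind)
        if size is None:
            raise ValueError("reserved frame type")
        if pos + size > len(payload):
            raise ValueError("truncated frame")
        frames.append(payload[pos:pos + size])
        pos += size
    return frames


payload = bytes([0x00] * 24) + bytes([0x01] + [0] * 19) + bytes([0x02, 0, 0, 0])
sizes = [len(frame) for frame in split_g723_frames(payload)]
```

   The example payload holds one high-rate frame, one low-rate frame,
   and one SID frame, so the resulting sizes are 24, 20, and 4.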
P14L43	
P14L44	
P14L45	
P14L46	
P14L47	
P14L48	
P15L1	   It is possible to switch between the two rates at any 30 ms frame
P15L2	   boundary.  Both (5.3 kb/s and 6.3 kb/s) rates are a mandatory part of
P15L3	   the encoder and decoder.  Receivers MUST accept both data rates and
P15L4	   MUST accept SID frames unless restriction of these capabilities has
P15L5	   been signaled.  The MIME registration for G723 in RFC 3555 [7]
P15L6	   specifies parameters that MAY be used with MIME or SDP to restrict to
P15L7	   a single data rate or to restrict the use of SID frames.  This coder
P15L8	   was optimized to represent speech with near-toll quality at the above
P15L9	   rates using a limited amount of complexity.
P15L10	
P15L11	   The packing of the encoded bit stream into octets and the
P15L12	   transmission order of the octets is specified in Rec. G.723.1 and is
P15L13	   the same as that produced by the G.723.1 C code reference
P15L14	   implementation.  For the 6.3 kb/s data rate, this packing is
P15L15	   illustrated as follows, where the header (HDR) bits are always "0 0"
P15L16	   as shown in Fig. 1 to indicate operation at 6.3 kb/s, and the Z bit
P15L17	   is always set to zero.  The diagrams show the bit packing in "network
P15L18	   byte order", also known as big-endian order.  The bits of each 32-bit
P15L19	   word are numbered 0 to 31, with the most significant bit on the left
P15L20	   and numbered 0.  The octets (bytes) of each word are transmitted most
P15L21	   significant octet first.  The bits of each data field are numbered in
P15L22	   the order of the bit stream representation of the encoding (least
P15L23	   significant bit first).  The vertical bars indicate the boundaries
P15L24	   between field fragments.
P15L25	
P15L26	
P15L27	
P15L28	
P15L29	
P15L30	
P15L31	
P15L32	
P15L33	
P15L34	
P15L35	
P15L36	
P15L37	
P15L38	
P15L39	
P15L40	
P15L41	
P15L42	
P15L43	
P15L44	
P15L45	
P15L46	
P15L47	
P15L48	
P16L1	    0                   1                   2                   3
P16L2	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
P16L3	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
P16L4	   |    LPC    |HDR|      LPC      |      LPC      |    ACL0   |LPC|
P16L5	   |           |   |               |               |           |   |
P16L6	   |0 0 0 0 0 0|0 0|1 1 1 1 0 0 0 0|2 2 1 1 1 1 1 1|0 0 0 0 0 0|2 2|
P16L7	   |5 4 3 2 1 0|   |3 2 1 0 9 8 7 6|1 0 9 8 7 6 5 4|5 4 3 2 1 0|3 2|
P16L8	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
P16L9	   |  ACL2   |ACL|A| GAIN0 |ACL|ACL|    GAIN0      |    GAIN1      |
P16L10	   |         | 1 |C|       | 3 | 2 |               |               |
P16L11	   |0 0 0 0 0|0 0|0|0 0 0 0|0 0|0 0|1 1 0 0 0 0 0 0|0 0 0 0 0 0 0 0|
P16L12	   |4 3 2 1 0|1 0|6|3 2 1 0|1 0|6 5|1 0 9 8 7 6 5 4|7 6 5 4 3 2 1 0|
P16L13	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
P16L14	   | GAIN2 | GAIN1 |     GAIN2     |     GAIN3     | GRID  | GAIN3 |
P16L15	   |       |       |               |               |       |       |
P16L16	   |0 0 0 0|1 1 0 0|1 1 0 0 0 0 0 0|0 0 0 0 0 0 0 0|0 0 0 0|1 1 0 0|
P16L17	   |3 2 1 0|1 0 9 8|1 0 9 8 7 6 5 4|7 6 5 4 3 2 1 0|3 2 1 0|1 0 9 8|
P16L18	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
P16L19	   |   MSBPOS    |Z|POS|  MSBPOS   |     POS0      |POS|   POS0    |
P16L20	   |             | | 0 |           |               | 1 |           |
P16L21	   |0 0 0 0 0 0 0|0|0 0|1 1 1 0 0 0|0 0 0 0 0 0 0 0|0 0|1 1 1 1 1 1|
P16L22	   |6 5 4 3 2 1 0| |1 0|2 1 0 9 8 7|9 8 7 6 5 4 3 2|1 0|5 4 3 2 1 0|
P16L23	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
P16L24	   |     POS1      | POS2  | POS1  |     POS2      | POS3  | POS2  |
P16L25	   |               |       |       |               |       |       |
P16L26	   |0 0 0 0 0 0 0 0|0 0 0 0|1 1 1 1|1 1 0 0 0 0 0 0|0 0 0 0|1 1 1 1|
P16L27	   |9 8 7 6 5 4 3 2|3 2 1 0|3 2 1 0|1 0 9 8 7 6 5 4|3 2 1 0|5 4 3 2|
P16L28	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
P16L29	   |     POS3      |   PSIG0   |POS|PSIG2|  PSIG1  |  PSIG3  |PSIG2|
P16L30	   |               |           | 3 |     |         |         |     |
P16L31	   |1 1 0 0 0 0 0 0|0 0 0 0 0 0|1 1|0 0 0|0 0 0 0 0|0 0 0 0 0|0 0 0|
P16L32	   |1 0 9 8 7 6 5 4|5 4 3 2 1 0|3 2|2 1 0|4 3 2 1 0|4 3 2 1 0|5 4 3|
P16L33	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
P16L34	
P16L35	                  Figure 1: G.723 (6.3 kb/s) bit packing
P16L36	
P16L37	   For the 5.3 kb/s data rate, the header (HDR) bits are always "0 1",
P16L38	   as shown in Fig. 2, to indicate operation at 5.3 kb/s.
P16L39	
P16L40	
P16L41	
P16L42	
P16L43	
P16L44	
P16L45	
P16L46	
P16L47	
P16L48	
P17L1	    0                   1                   2                   3
P17L2	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
P17L3	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
P17L4	   |    LPC    |HDR|      LPC      |      LPC      |   ACL0    |LPC|
P17L5	   |           |   |               |               |           |   |
P17L6	   |0 0 0 0 0 0|0 1|1 1 1 1 0 0 0 0|2 2 1 1 1 1 1 1|0 0 0 0 0 0|2 2|
P17L7	   |5 4 3 2 1 0|   |3 2 1 0 9 8 7 6|1 0 9 8 7 6 5 4|5 4 3 2 1 0|3 2|
P17L8	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
P17L9	   |  ACL2   |ACL|A| GAIN0 |ACL|ACL|     GAIN0     |     GAIN1     |
P17L10	   |         | 1 |C|       | 3 | 2 |               |               |
P17L11	   |0 0 0 0 0|0 0|0|0 0 0 0|0 0|0 0|1 1 0 0 0 0 0 0|0 0 0 0 0 0 0 0|
P17L12	   |4 3 2 1 0|1 0|6|3 2 1 0|1 0|6 5|1 0 9 8 7 6 5 4|7 6 5 4 3 2 1 0|
P17L13	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
P17L14	   | GAIN2 | GAIN1 |     GAIN2     |    GAIN3      | GRID  | GAIN3 |
P17L15	   |       |       |               |               |       |       |
P17L16	   |0 0 0 0|1 1 0 0|1 1 0 0 0 0 0 0|0 0 0 0 0 0 0 0|0 0 0 0|1 1 0 0|
P17L17	   |3 2 1 0|1 0 9 8|1 0 9 8 7 6 5 4|7 6 5 4 3 2 1 0|3 2 1 0|1 0 9 8|
P17L18	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
P17L19	   |     POS0      | POS1  | POS0  |     POS1      |     POS2      |
P17L20	   |               |       |       |               |               |
P17L21	   |0 0 0 0 0 0 0 0|0 0 0 0|1 1 0 0|1 1 0 0 0 0 0 0|0 0 0 0 0 0 0 0|
P17L22	   |7 6 5 4 3 2 1 0|3 2 1 0|1 0 9 8|1 0 9 8 7 6 5 4|7 6 5 4 3 2 1 0|
P17L23	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
P17L24	   | POS3  | POS2  |     POS3      | PSIG1 | PSIG0 | PSIG3 | PSIG2 |
P17L25	   |       |       |               |       |       |       |       |
P17L26	   |0 0 0 0|1 1 0 0|1 1 0 0 0 0 0 0|0 0 0 0|0 0 0 0|0 0 0 0|0 0 0 0|
P17L27	   |3 2 1 0|1 0 9 8|1 0 9 8 7 6 5 4|3 2 1 0|3 2 1 0|3 2 1 0|3 2 1 0|
P17L28	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
P17L29	
P17L30	                  Figure 2: G.723 (5.3 kb/s) bit packing
P17L31	
P17L32	   The packing of G.723.1 SID (silence) frames, which are indicated by
P17L33	   the header (HDR) bits having the pattern "1 0", is depicted in Fig.
P17L34	   3.
P17L35	
P17L36	    0                   1                   2                   3
P17L37	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
P17L38	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
P17L39	   |    LPC    |HDR|      LPC      |      LPC      |   GAIN    |LPC|
P17L40	   |           |   |               |               |           |   |
P17L41	   |0 0 0 0 0 0|1 0|1 1 1 1 0 0 0 0|2 2 1 1 1 1 1 1|0 0 0 0 0 0|2 2|
P17L42	   |5 4 3 2 1 0|   |3 2 1 0 9 8 7 6|1 0 9 8 7 6 5 4|5 4 3 2 1 0|3 2|
P17L43	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
P17L44	
P17L45	                   Figure 3: G.723 SID mode bit packing
P17L46	
P17L47	
P17L48	
P18L1	4.5.4  G726-40, G726-32, G726-24, and G726-16
P18L2	
P18L3	   ITU-T Recommendation G.726 describes, among others, the algorithm
P18L4	   recommended for conversion of a single 64 kbit/s A-law or mu-law PCM
P18L5	   channel encoded at 8,000 samples/sec to and from a 40, 32, 24, or 16
P18L6	   kbit/s channel.  The conversion is applied to the PCM stream using an
P18L7	   Adaptive Differential Pulse Code Modulation (ADPCM) transcoding
P18L8	   technique.  The ADPCM representation consists of a series of
P18L9	   codewords with a one-to-one correspondence to the samples in the PCM
P18L10	   stream.  The G726 data rates of 40, 32, 24, and 16 kbit/s have
P18L11	   codewords of 5, 4, 3, and 2 bits, respectively.
P18L12	
P18L13	   The 16 and 24 kbit/s encodings do not provide toll quality speech.
P18L14	   They are designed for use in overloaded Digital Circuit
P18L15	   Multiplication Equipment (DCME).  ITU-T G.726 recommends that the 16
P18L16	   and 24 kbit/s encodings should be alternated with higher data rate
P18L17	   encodings to provide an average sample size of between 3.5 and 3.7
P18L18	   bits per sample.
P18L19	
P18L20	   The encodings of G.726 are here denoted as G726-40, G726-32, G726-24,
P18L21	   and G726-16.  Prior to 1990, G721 described the 32 kbit/s ADPCM
P18L22	   encoding, and G723 described the 40, 32, and 16 kbit/s encodings.
P18L23	   Thus, G726-32 designates the same algorithm as G721 in RFC 1890.
P18L24	
P18L25	   A stream of G726 codewords contains no information on the encoding
P18L26	   being used; therefore, transitions between G726 encoding types are
P18L27	   not permitted within a sequence of packed codewords.  Applications
P18L28	   MUST determine the encoding type of packed codewords from the RTP
P18L29	   payload identifier.
P18L30	
P18L31	   No payload-specific header information SHALL be included as part of
P18L32	   the audio data.  A stream of G726 codewords MUST be packed into
P18L33	   octets as follows:  the first codeword is placed into the first octet
P18L34	   such that the least significant bit of the codeword aligns with the
P18L35	   least significant bit in the octet, the second codeword is then
P18L36	   packed so that its least significant bit coincides with the least
P18L37	   significant unoccupied bit in the octet.  When a complete codeword
P18L38	   cannot be placed into an octet, the bits overlapping the octet
P18L39	   boundary are placed into the least significant bits of the next
P18L40	   octet.  Packing MUST end with a completely packed final octet.  The
P18L41	   number of codewords packed will therefore be a multiple of 8, 2, 8,
P18L42	   and 4 for G726-40, G726-32, G726-24, and G726-16, respectively.  An
P18L43	   example of the packing scheme for G726-32 codewords is shown below,
P18L44	   where bit 7 is the least significant bit of the first octet, and bit
P18L45	   A3 is the least significant bit of the first codeword:
P18L46	
P18L47	
P18L48	
P19L1	          0                   1
P19L2	          0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
P19L3	         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
P19L4	         |B B B B|A A A A|D D D D|C C C C| ...
P19L5	         |0 1 2 3|0 1 2 3|0 1 2 3|0 1 2 3|
P19L6	         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
P19L7	
P19L8	   An example of the packing scheme for G726-24 codewords follows, where
P19L9	   again bit 7 is the least significant bit of the first octet, and bit
P19L10	   A2 is the least significant bit of the first codeword:
P19L11	
P19L12	          0                   1                   2
P19L13	          0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
P19L14	         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
P19L15	         |C C|B B B|A A A|F|E E E|D D D|C|H H H|G G G|F F| ...
P19L16	         |1 2|0 1 2|0 1 2|2|0 1 2|0 1 2|0|0 1 2|0 1 2|0 1|
P19L17	         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
P19L18	
P19L19	   Note that the "little-endian" direction in which samples are packed
P19L20	   into octets in the G726-16, -24, -32 and -40 payload formats
P19L21	   specified here is consistent with ITU-T Recommendation X.420, but is
P19L22	   the opposite of what is specified in ITU-T Recommendation I.366.2
P19L23	   Annex E for ATM AAL2 transport.  A second set of RTP payload formats
P19L24	   matching the packetization of I.366.2 Annex E and identified by MIME
P19L25	   subtypes AAL2-G726-16, -24, -32 and -40 will be specified in a
P19L26	   separate document.
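
   The packing rule for the G726 payload formats defined in this
   section can be sketched with a small bit accumulator that fills each
   octet from its least significant bit upward.  This is a minimal
   illustration; the function name is hypothetical, and raising an
   error on an incomplete final octet is a choice made for this
   example.

```python
def pack_g726(codewords, bits):
    """Pack G726 codewords of the given width (5, 4, 3, or 2 bits)."""
    mask = (1 << bits) - 1
    acc = 0        # bits waiting to be emitted, least significant first
    nbits = 0      # number of valid bits in acc
    out = bytearray()
    for codeword in codewords:
        acc |= (codeword & mask) << nbits
        nbits += bits
        while nbits >= 8:
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:
        raise ValueError("packing must end with a completely packed octet")
    return bytes(out)


g726_32 = pack_g726([0x1, 0x2], 4)                  # codewords A, B
g726_24 = pack_g726([1, 2, 3, 4, 5, 6, 7, 0], 3)    # codewords A through H
```

   Here the G726-32 result is the single octet 0x21 (codeword B in the
   upper four bits, A in the lower four), and the G726-24 result is the
   three octets 0xD1, 0x58, 0x1F.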
P19L27	
P19L28	4.5.5 G728
P19L29	
P19L30	   G728 is specified in ITU-T Recommendation G.728, "Coding of speech at
P19L31	   16 kbit/s using low-delay code excited linear prediction".
P19L32	
P19L33	   A G.728 encoder translates 5 consecutive audio samples into a 10-bit
P19L34	   codebook index, resulting in a bit rate of 16 kb/s for audio sampled
P19L35	   at 8,000 samples per second.  The group of five consecutive samples
P19L36	   is called a vector.  Four consecutive vectors, labeled V1 to V4
P19L37	   (where V1 is to be played first by the receiver), build one G.728
P19L38	   frame.  The four vectors of 40 bits are packed into 5 octets, labeled
P19L39	   B1 through B5.  B1 SHALL be placed first in the RTP packet.
P19L40	
P19L41	   Referring to the figure below, the principle for bit order is
P19L42	   "maintenance of bit significance".  Bits from an older vector are
P19L43	   more significant than bits from newer vectors.  The MSB of the frame
P19L44	   goes to the MSB of B1 and the LSB of the frame goes to LSB of B5.
P19L45	
P19L46	
P19L47	
P19L48	
P20L1	                   1         2         3        3
P20L2	         0         0         0         0        9
P20L3	         ++++++++++++++++++++++++++++++++++++++++
P20L4	         <---V1---><---V2---><---V3---><---V4---> vectors
P20L5	         <--B1--><--B2--><--B3--><--B4--><--B5--> octets
P20L6	         <------------- frame 1 ---------------->
P20L7	
P20L8	   In particular, B1 contains the eight most significant bits of V1,
P20L9	   with the MSB of V1 being the MSB of B1.  B2 contains the two least
P20L10	   significant bits of V1, the more significant of the two in its MSB,
P20L11	   and the six most significant bits of V2.  B1 SHALL be placed first in
P20L12	   the RTP packet and B5 last.
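
   The "maintenance of bit significance" principle amounts to treating
   the frame as one 40-bit big-endian integer.  The following minimal
   sketch packs and unpacks a frame on that basis; the function names
   are chosen for this example only.

```python
def pack_g728_frame(v1, v2, v3, v4):
    """Pack four 10-bit codebook indices (V1 first) into octets B1..B5."""
    value = 0
    for index in (v1, v2, v3, v4):
        if not 0 <= index < (1 << 10):
            raise ValueError("codebook index must fit in 10 bits")
        value = (value << 10) | index    # older vectors are more significant
    return value.to_bytes(5, "big")      # B1 is transmitted first


def unpack_g728_frame(frame):
    """Recover (V1, V2, V3, V4) from a 5-octet G.728 frame."""
    if len(frame) != 5:
        raise ValueError("a G.728 frame is 5 octets")
    value = int.from_bytes(frame, "big")
    return tuple((value >> shift) & 0x3FF for shift in (30, 20, 10, 0))


frame = pack_g728_frame(0x201, 0, 0, 1)
```

   In this example B1 is 0x80 (the eight most significant bits of V1)
   and B2 is 0x40 (the two least significant bits of V1 followed by the
   six most significant bits of V2).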
P20L13	
P20L14	4.5.6 G729
P20L15	
P20L16	   G729 is specified in ITU-T Recommendation G.729, "Coding of speech at
P20L17	   8 kbit/s using conjugate structure-algebraic code excited linear
P20L18	   prediction (CS-ACELP)".  A reduced-complexity version of the G.729
P20L19	   algorithm is specified in Annex A to Rec. G.729.  The speech coding
P20L20	   algorithms in the main body of G.729 and in G.729 Annex A are fully
P20L21	   interoperable with each other, so there is no need to further
P20L22	   distinguish between them.  An implementation that signals or accepts
P20L23	   use of G729 payload format may implement either G.729 or G.729A
P20L24	   unless restricted by additional signaling specified elsewhere related
P20L25	   specifically to the encoding rather than the payload format.  The
P20L26	   G.729 and G.729 Annex A codecs were optimized to represent speech
P20L27	   with high quality, where G.729 Annex A trades some speech quality for
P20L28	   an approximate 50% complexity reduction [10].  See the next Section
P20L29	   (4.5.7) for other data rates added in later G.729 Annexes.  For all
P20L30	   data rates, the sampling frequency (and RTP timestamp clock rate) is
P20L31	   8,000 Hz.
P20L32	
P20L33	   A voice activity detector (VAD) and comfort noise generator (CNG)
P20L34	   algorithm in Annex B of G.729 is RECOMMENDED for digital simultaneous
P20L35	   voice and data applications and can be used in conjunction with G.729
P20L36	   or G.729 Annex A.  A G.729 or G.729 Annex A frame contains 10 octets,
P20L37	   while the G.729 Annex B comfort noise frame occupies 2 octets.
P20L38	   Receivers MUST accept comfort noise frames if restriction of their
P20L39	   use has not been signaled.  The MIME registration for G729 in RFC
P20L40	   3555 [7] specifies a parameter that MAY be used with MIME or SDP to
P20L41	   restrict the use of comfort noise frames.
P20L42	
P20L43	   A G729 RTP packet may consist of zero or more G.729 or G.729 Annex A
P20L44	   frames, followed by zero or one G.729 Annex B frames.  The presence
P20L45	   of a comfort noise frame can be deduced from the length of the RTP
P20L46	   payload.  The default packetization interval is 20 ms (two frames),
P20L47	   but in some situations it may be desirable to send 10 ms packets.  An
P20L48	
P21L1	   example would be a transition from speech to comfort noise in the
P21L2	   first 10 ms of the packet.  For some applications, a longer
P21L3	   packetization interval may be required to reduce the packet rate.
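
   The way a receiver can deduce the presence of a comfort noise frame
   from the payload length can be sketched as follows.  The names are
   hypothetical, and treating any other remainder as an error is a
   choice made for this example.

```python
G729_SPEECH_OCTETS = 10   # one G.729 or G.729 Annex A frame
G729_CN_OCTETS = 2        # one G.729 Annex B comfort noise frame


def split_g729_payload(payload):
    """Return (list of speech frames, comfort noise frame or None)."""
    n_speech, remainder = divmod(len(payload), G729_SPEECH_OCTETS)
    if remainder not in (0, G729_CN_OCTETS):
        raise ValueError("payload length does not match G729 framing")
    end = n_speech * G729_SPEECH_OCTETS
    speech = [payload[i:i + G729_SPEECH_OCTETS]
              for i in range(0, end, G729_SPEECH_OCTETS)]
    comfort_noise = payload[end:] if remainder else None
    return speech, comfort_noise


speech, comfort_noise = split_g729_payload(bytes(22))
```

   A 22-octet payload is thus read as two speech frames followed by one
   comfort noise frame, and a 2-octet payload as a comfort noise frame
   alone.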
P21L4	
P21L5	       0                   1                   2                   3
P21L6	       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
P21L7	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
P21L8	      |L|      L1     |    L2   |    L3   |       P1      |P|    C1   |
P21L9	      |0|             |         |         |               |0|         |
P21L10	      | |0 1 2 3 4 5 6|0 1 2 3 4|0 1 2 3 4|0 1 2 3 4 5 6 7| |0 1 2 3 4|
P21L11	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
P21L12	      |       C1      |  S1   | GA1 |  GB1  |    P2   |      C2       |
P21L13	      |          1 1 1|       |     |       |         |               |
P21L14	      |5 6 7 8 9 0 1 2|0 1 2 3|0 1 2|0 1 2 3|0 1 2 3 4|0 1 2 3 4 5 6 7|
P21L15	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
P21L16	      |   C2    |  S2   | GA2 |  GB2  |
P21L17	      |    1 1 1|       |     |       |
P21L18	      |8 9 0 1 2|0 1 2 3|0 1 2|0 1 2 3|
P21L19	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
P21L20	
P21L21	                    Figure 4: G.729 and G.729A bit packing
P21L22	
P21L23	   The transmitted parameters of a G.729/G.729A 10-ms frame, consisting
P21L24	   of 80 bits, are defined in Recommendation G.729, Table 8/G.729.  The
P21L25	   mapping of these parameters is given in Fig. 4.  The
P21L26	   diagrams show the bit packing in "network byte order", also known as
P21L27	   big-endian order.  The bits of each 32-bit word are numbered 0 to 31,
P21L28	   with the most significant bit on the left and numbered 0.  The octets
P21L29	   (bytes) of each word are transmitted most significant octet first.
P21L30	   The bits of each data field are numbered in the order as produced by
P21L31	   the G.729 C code reference implementation.
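
   As an informal aid to reading Fig. 4, the following minimal sketch
   slices an 80-bit frame into the fields shown there, using the field
   widths taken from the figure.  The names are hypothetical.  The
   sketch assumes that, within each field, the bit transmitted first is
   the most significant; this assumption should be checked against the
   G.729 reference implementation before relying on the values.

```python
# (field name, width in bits), in transmission order as drawn in Fig. 4.
G729_FIELDS = [
    ("L0", 1), ("L1", 7), ("L2", 5), ("L3", 5), ("P1", 8), ("P0", 1),
    ("C1", 13), ("S1", 4), ("GA1", 3), ("GB1", 4),
    ("P2", 5), ("C2", 13), ("S2", 4), ("GA2", 3), ("GB2", 4),
]


def unpack_g729_frame(frame):
    """Slice a 10-octet G.729/G.729A frame into its parameter fields."""
    if len(frame) != 10:
        raise ValueError("a G.729 frame is 10 octets")
    value = int.from_bytes(frame, "big")   # network byte order
    bits_left = 80
    fields = {}
    for name, width in G729_FIELDS:
        bits_left -= width
        # Assumption: first transmitted bit of a field is its MSB.
        fields[name] = (value >> bits_left) & ((1 << width) - 1)
    return fields


fields = unpack_g729_frame(bytes([0x80] + [0] * 8 + [0x0F]))
```

   The field widths sum to 80 bits, matching the frame size stated
   above.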
P21L32	
P21L33	   The packing of the G.729 Annex B comfort noise frame is shown in Fig.
P21L34	   5.
P21L35	
P21L36	          0                   1
P21L37	          0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
P21L38	         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
P21L39	         |L|  LSF1   |  LSF2 |   GAIN  |R|
P21L40	         |S|         |       |         |E|
P21L41	         |F|         |       |         |S|
P21L42	         |0|0 1 2 3 4|0 1 2 3|0 1 2 3 4|V|    RESV = Reserved (zero)
P21L43	         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
P21L44	
P21L45	                       Figure 5: G.729 Annex B bit packing
P21L46	
P21L47	
P21L48	
P22L1	4.5.7 G729D and G729E
P22L2	
P22L3	   Annexes D and E to ITU-T Recommendation G.729 provide additional data
P22L4	   rates.  Because the data rate is not signaled in the bitstream, the
P22L5	   different data rates are given distinct RTP encoding names which are
P22L6	   mapped to distinct payload type numbers.  G729D indicates a 6.4
P22L7	   kbit/s coding mode (G.729 Annex D, for momentary reduction in channel
P22L8	   capacity), while G729E indicates an 11.8 kbit/s mode (G.729 Annex E,
P22L9	   for improved performance with a wide range of narrow-band input
P22L10	   signals, e.g., music and background noise).  Annex E has two
P22L11	   operating modes, backward adaptive and forward adaptive, which are
P22L12	   signaled by the first two bits in each frame (the most significant
P22L13	   two bits of the first octet).
P22L14	
P22L15	   The voice activity detector (VAD) and comfort noise generator (CNG)
P22L16	   algorithm specified in Annex B of G.729 may be used with Annex D and
P22L17	   Annex E frames in addition to G.729 and G.729 Annex A frames.  The
P22L18	   algorithm details for the operation of Annexes D and E with the Annex
P22L19	   B CNG are specified in G.729 Annexes F and G.  Note that Annexes F
P22L20	   and G do not introduce any new encodings.  Receivers MUST accept
P22L21	   comfort noise frames if restriction of their use has not been
P22L22	   signaled.  The MIME registrations for G729D and G729E in RFC 3555 [7]
P22L23	   specify a parameter that MAY be used with MIME or SDP to restrict the
P22L24	   use of comfort noise frames.
P22L25	
P22L26	   For G729D, an RTP packet may consist of zero or more G.729 Annex D
P22L27	   frames, followed by zero or one G.729 Annex B frame.  Similarly, for
P22L28	   G729E, an RTP packet may consist of zero or more G.729 Annex E
P22L29	   frames, followed by zero or one G.729 Annex B frame.  The presence of
P22L30	   a comfort noise frame can be deduced from the length of the RTP
P22L31	   payload.
P22L32	
P22L33	   A single RTP packet must contain frames of only one data rate,
P22L34	   optionally followed by one comfort noise frame.  The data rate may be
P22L35	   changed from packet to packet by changing the payload type number.
P22L36	   G.729 Annexes D, E and H describe what the encoding and decoding
P22L37	   algorithms must do to accommodate a change in data rate.
P22L38	
P22L39	   For G729D, the bits of a G.729 Annex D frame are formatted as shown
P22L40	   below in Fig. 6 (cf. Table D.1/G.729).  The frame length is 64 bits.
P22L41	
P22L42	
P22L43	
P22L44	
P22L45	
P22L46	
P22L47	
P22L48	
P23L1	       0                   1                   2                   3
P23L2	       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
P23L3	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
P23L4	      |L|      L1     |    L2   |    L3   |        P1     |     C1    |
P23L5	      |0|             |         |         |               |           |
P23L6	      | |0 1 2 3 4 5 6|0 1 2 3 4|0 1 2 3 4|0 1 2 3 4 5 6 7|0 1 2 3 4 5|
P23L7	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
P23L8	      | C1  |S1 | GA1 | GB1 |  P2   |        C2       |S2 | GA2 | GB2 |
P23L9	      |     |   |     |     |       |                 |   |     |     |
P23L10	      |6 7 8|0 1|0 1 2|0 1 2|0 1 2 3|0 1 2 3 4 5 6 7 8|0 1|0 1 2|0 1 2|
P23L11	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
P23L12	
P23L13	                     Figure 6: G.729 Annex D bit packing
P23L14	
P23L15	   The net bit rate for the G.729 Annex E algorithm is 11.8 kbit/s and a
P23L16	   total of 118 bits are used.  Two bits are appended as "don't care"
P23L17	   bits to complete an integer number of octets for the frame.  For
P23L18	   G729E, the bits of a data frame are formatted as shown in the next
P23L19	   two diagrams (cf. Table E.1/G.729).  The fields for the G729E forward
P23L20	   adaptive mode are packed as shown in Fig. 7.
P23L21	
P23L22	       0                   1                   2                   3
P23L23	       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
P23L24	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
P23L25	      |0 0|L|      L1     |    L2   |    L3   |        P1     |P| C0_1|
P23L26	      |   |0|             |         |         |               |0|     |
P23L27	      |   | |0 1 2 3 4 5 6|0 1 2 3 4|0 1 2 3 4|0 1 2 3 4 5 6 7| |0 1 2|
P23L28	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
P23L29	      |       |   C1_1      |     C2_1    |   C3_1      |    C4_1     |
P23L30	      |       |             |             |             |             |
P23L31	      |3 4 5 6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|
P23L32	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
P23L33	      | GA1 |  GB1  |    P2   |   C0_2      |     C1_2    |   C2_2    |
P23L34	      |     |       |         |             |             |           |
P23L35	      |0 1 2|0 1 2 3|0 1 2 3 4|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2 3 4 5|
P23L36	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
P23L37	      | |    C3_2     |     C4_2    | GA2 | GB2   |DC |
P23L38	      | |             |             |     |       |   |
P23L39	      |6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2|0 1 2 3|0 1|
P23L40	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
P23L41	
P23L42	         Figure 7: G.729 Annex E (forward adaptive mode) bit packing
P23L43	
P23L44	   The fields for the G729E backward adaptive mode are packed as shown
P23L45	   in Fig. 8.
P23L46	
P23L47	
P23L48	
P24L1	       0                   1                   2                   3
P24L2	       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
P24L3	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
P24L4	      |1 1|       P1      |P|       C0_1              |     C1_1      |
P24L5	      |   |               |0|                    1 1 1|               |
P24L6	      |   |0 1 2 3 4 5 6 7|0|0 1 2 3 4 5 6 7 8 9 0 1 2|0 1 2 3 4 5 6 7|
P24L7	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
P24L8	      |   |  C2_1       | C3_1        | C4_1        |GA1  | GB1   |P2 |
P24L9	      |   |             |             |             |     |       |   |
P24L10	      |8 9|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2|0 1 2 3|0 1|
P24L11	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
P24L12	      |     |          C0_2           |       C1_2        |    C2_2   |
P24L13	      |     |                    1 1 1|                   |           |
P24L14	      |2 3 4|0 1 2 3 4 5 6 7 8 9 0 1 2|0 1 2 3 4 5 6 7 8 9|0 1 2 3 4 5|
P24L15	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
P24L16	      | |    C3_2     |     C4_2    | GA2 | GB2   |DC |
P24L17	      | |             |             |     |       |   |
P24L18	      |6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2|0 1 2 3|0 1|
P24L19	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
P24L20	
P24L21	         Figure 8: G.729 Annex E (backward adaptive mode) bit packing
P24L22	
P24L23	4.5.8 GSM
P24L24	
P24L25	   GSM (Groupe Special Mobile) denotes the European GSM 06.10 standard
P24L26	   for full-rate speech transcoding, ETS 300 961, which is based on
P24L27	   RPE/LTP (regular pulse excitation/long term prediction) coding at a
P24L28	   rate of 13 kb/s [11,12,13].  The text of the standard can be obtained
P24L29	   from:
P24L30	
P24L31	   ETSI (European Telecommunications Standards Institute)
P24L32	   ETSI Secretariat: B.P.152
P24L33	   F-06561 Valbonne Cedex
P24L34	   France
P24L35	   Phone: +33 92 94 42 00
P24L36	   Fax:   +33 93 65 47 16
P24L37	
P24L38	   Blocks of 160 audio samples are compressed into 33 octets, for an
P24L39	   effective data rate of 13,200 b/s.
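
   As a quick check of the figure above, the rate follows from the
   frame size and the frame duration.  The sketch below is
   illustrative only; the function name is hypothetical, and the
   8,000 Hz sampling rate is the GSM clock rate listed in Table 4.

```python
from fractions import Fraction


def frame_bit_rate(octets_per_frame, samples_per_frame, sampling_rate_hz):
    """Return the payload bit rate in b/s of a fixed-size frame codec."""
    frame_duration_s = Fraction(samples_per_frame, sampling_rate_hz)
    return Fraction(octets_per_frame * 8) / frame_duration_s


# 33 octets carry 160 samples taken at 8,000 Hz (one 20 ms frame).
gsm_rate = frame_bit_rate(octets_per_frame=33,
                          samples_per_frame=160,
                          sampling_rate_hz=8000)
print(gsm_rate)  # 13200
```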
P24L40	
P24L41	4.5.8.1  General Packaging Issues
P24L42	
P24L43	   The GSM standard (ETS 300 961) specifies the bit stream produced by
P24L44	   the codec, but does not specify how these bits should be packed for
P24L45	   transmission.  The packetization specified here has subsequently been
P24L46	   adopted in ETSI Technical Specification TS 101 318.  Some software
P24L47	   implementations of the GSM codec use a different packing than that
P24L48	   specified here.
P25L1	               field  field name  bits  field  field name  bits
P25L2	               ________________________________________________
P25L3	               1      LARc[0]     6     39     xmc[22]     3
P25L4	               2      LARc[1]     6     40     xmc[23]     3
P25L5	               3      LARc[2]     5     41     xmc[24]     3
P25L6	               4      LARc[3]     5     42     xmc[25]     3
P25L7	               5      LARc[4]     4     43     Nc[2]       7
P25L8	               6      LARc[5]     4     44     bc[2]       2
P25L9	               7      LARc[6]     3     45     Mc[2]       2
P25L10	               8      LARc[7]     3     46     xmaxc[2]    6
P25L11	               9      Nc[0]       7     47     xmc[26]     3
P25L12	               10     bc[0]       2     48     xmc[27]     3
P25L13	               11     Mc[0]       2     49     xmc[28]     3
P25L14	               12     xmaxc[0]    6     50     xmc[29]     3
P25L15	               13     xmc[0]      3     51     xmc[30]     3
P25L16	               14     xmc[1]      3     52     xmc[31]     3
P25L17	               15     xmc[2]      3     53     xmc[32]     3
P25L18	               16     xmc[3]      3     54     xmc[33]     3
P25L19	               17     xmc[4]      3     55     xmc[34]     3
P25L20	               18     xmc[5]      3     56     xmc[35]     3
P25L21	               19     xmc[6]      3     57     xmc[36]     3
P25L22	               20     xmc[7]      3     58     xmc[37]     3
P25L23	               21     xmc[8]      3     59     xmc[38]     3
P25L24	               22     xmc[9]      3     60     Nc[3]       7
P25L25	               23     xmc[10]     3     61     bc[3]       2
P25L26	               24     xmc[11]     3     62     Mc[3]       2
P25L27	               25     xmc[12]     3     63     xmaxc[3]    6
P25L28	               26     Nc[1]       7     64     xmc[39]     3
P25L29	               27     bc[1]       2     65     xmc[40]     3
P25L30	               28     Mc[1]       2     66     xmc[41]     3
P25L31	               29     xmaxc[1]    6     67     xmc[42]     3
P25L32	               30     xmc[13]     3     68     xmc[43]     3
P25L33	               31     xmc[14]     3     69     xmc[44]     3
P25L34	               32     xmc[15]     3     70     xmc[45]     3
P25L35	               33     xmc[16]     3     71     xmc[46]     3
P25L36	               34     xmc[17]     3     72     xmc[47]     3
P25L37	               35     xmc[18]     3     73     xmc[48]     3
P25L38	               36     xmc[19]     3     74     xmc[49]     3
P25L39	               37     xmc[20]     3     75     xmc[50]     3
P25L40	               38     xmc[21]     3     76     xmc[51]     3
P25L41	
P25L42	                      Table 2: Ordering of GSM variables
P25L43	
P25L44	
P25L45	
P25L46	
P25L47	
P25L48	
P26L1	   Octet  Bit 0   Bit 1   Bit 2   Bit 3   Bit 4   Bit 5   Bit 6   Bit 7
P26L2	   _____________________________________________________________________
P26L3	       0    1       1       0       1    LARc0.0 LARc0.1 LARc0.2 LARc0.3
P26L4	       1 LARc0.4 LARc0.5 LARc1.0 LARc1.1 LARc1.2 LARc1.3 LARc1.4 LARc1.5
P26L5	       2 LARc2.0 LARc2.1 LARc2.2 LARc2.3 LARc2.4 LARc3.0 LARc3.1 LARc3.2
P26L6	       3 LARc3.3 LARc3.4 LARc4.0 LARc4.1 LARc4.2 LARc4.3 LARc5.0 LARc5.1
P26L7	       4 LARc5.2 LARc5.3 LARc6.0 LARc6.1 LARc6.2 LARc7.0 LARc7.1 LARc7.2
P26L8	       5  Nc0.0   Nc0.1   Nc0.2   Nc0.3   Nc0.4   Nc0.5   Nc0.6  bc0.0
P26L9	       6  bc0.1   Mc0.0   Mc0.1  xmaxc00 xmaxc01 xmaxc02 xmaxc03 xmaxc04
P26L10	       7 xmaxc05 xmc0.0  xmc0.1  xmc0.2  xmc1.0  xmc1.1  xmc1.2  xmc2.0
P26L11	       8 xmc2.1  xmc2.2  xmc3.0  xmc3.1  xmc3.2  xmc4.0  xmc4.1  xmc4.2
P26L12	       9 xmc5.0  xmc5.1  xmc5.2  xmc6.0  xmc6.1  xmc6.2  xmc7.0  xmc7.1
P26L13	      10 xmc7.2  xmc8.0  xmc8.1  xmc8.2  xmc9.0  xmc9.1  xmc9.2  xmc10.0
P26L14	      11 xmc10.1 xmc10.2 xmc11.0 xmc11.1 xmc11.2 xmc12.0 xmc12.1 xmc12.2
P26L15	      12  Nc1.0   Nc1.1   Nc1.2   Nc1.3   Nc1.4   Nc1.5   Nc1.6   bc1.0
P26L16	      13  bc1.1   Mc1.0   Mc1.1  xmaxc10 xmaxc11 xmaxc12 xmaxc13 xmaxc14
P26L17	      14 xmaxc15 xmc13.0 xmc13.1 xmc13.2 xmc14.0 xmc14.1 xmc14.2 xmc15.0
P26L18	      15 xmc15.1 xmc15.2 xmc16.0 xmc16.1 xmc16.2 xmc17.0 xmc17.1 xmc17.2
P26L19	      16 xmc18.0 xmc18.1 xmc18.2 xmc19.0 xmc19.1 xmc19.2 xmc20.0 xmc20.1
P26L20	      17 xmc20.2 xmc21.0 xmc21.1 xmc21.2 xmc22.0 xmc22.1 xmc22.2 xmc23.0
P26L21	      18 xmc23.1 xmc23.2 xmc24.0 xmc24.1 xmc24.2 xmc25.0 xmc25.1 xmc25.2
P26L22	      19  Nc2.0   Nc2.1   Nc2.2   Nc2.3   Nc2.4   Nc2.5   Nc2.6   bc2.0
P26L23	      20  bc2.1   Mc2.0   Mc2.1  xmaxc20 xmaxc21 xmaxc22 xmaxc23 xmaxc24
P26L24	      21 xmaxc25 xmc26.0 xmc26.1 xmc26.2 xmc27.0 xmc27.1 xmc27.2 xmc28.0
P26L25	      22 xmc28.1 xmc28.2 xmc29.0 xmc29.1 xmc29.2 xmc30.0 xmc30.1 xmc30.2
P26L26	      23 xmc31.0 xmc31.1 xmc31.2 xmc32.0 xmc32.1 xmc32.2 xmc33.0 xmc33.1
P26L27	      24 xmc33.2 xmc34.0 xmc34.1 xmc34.2 xmc35.0 xmc35.1 xmc35.2 xmc36.0
P26L28	      25 xmc36.1 xmc36.2 xmc37.0 xmc37.1 xmc37.2 xmc38.0 xmc38.1 xmc38.2
P26L29	      26  Nc3.0   Nc3.1   Nc3.2   Nc3.3   Nc3.4   Nc3.5   Nc3.6   bc3.0
P26L30	      27  bc3.1   Mc3.0   Mc3.1  xmaxc30 xmaxc31 xmaxc32 xmaxc33 xmaxc34
P26L31	      28 xmaxc35 xmc39.0 xmc39.1 xmc39.2 xmc40.0 xmc40.1 xmc40.2 xmc41.0
P26L32	      29 xmc41.1 xmc41.2 xmc42.0 xmc42.1 xmc42.2 xmc43.0 xmc43.1 xmc43.2
P26L33	      30 xmc44.0 xmc44.1 xmc44.2 xmc45.0 xmc45.1 xmc45.2 xmc46.0 xmc46.1
P26L34	      31 xmc46.2 xmc47.0 xmc47.1 xmc47.2 xmc48.0 xmc48.1 xmc48.2 xmc49.0
P26L35	      32 xmc49.1 xmc49.2 xmc50.0 xmc50.1 xmc50.2 xmc51.0 xmc51.1 xmc51.2
P26L36	
P26L37	                        Table 3: GSM payload format
P26L38	
P26L39	   In the GSM packing used by RTP, the bits SHALL be packed beginning
P26L40	   from the most significant bit.  Every 160-sample GSM frame is coded
P26L41	   into one 33-octet (264-bit) buffer.  Every such buffer begins with a
P26L42	   4-bit signature (0xD), followed by the MSB encoding of the fields of
P26L43	   the frame.  The first octet thus contains 1101 in the 4 most
P26L44	   significant bits (0-3) and the 4 most significant bits of F1 (0-3) in
P26L45	   the 4 least significant bits (4-7).  The second octet contains the 2
P26L46	   least significant bits of F1 in bits 0-1, and F2 in bits 2-7, and so
P26L47	   on.  The order of the fields in the frame is described in Table 2.
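
   The packing rule described above can be sketched as follows.  This
   is a minimal illustration, not a reference implementation: the
   names FIELD_BITS and pack_gsm_frame are hypothetical, and the
   caller is assumed to supply the 76 field values in the order of
   Table 2 as non-negative integers.

```python
# Field widths in Table 2 order: 8 LARc values, then for each of the
# four sub-frames Nc, bc, Mc, xmaxc and 13 xmc values.
LARC_BITS = [6, 6, 5, 5, 4, 4, 3, 3]
SUBFRAME_BITS = [7, 2, 2, 6] + [3] * 13
FIELD_BITS = LARC_BITS + SUBFRAME_BITS * 4

GSM_SIGNATURE = 0xD  # 4-bit signature that starts every buffer


def pack_gsm_frame(fields):
    """Pack 76 GSM field values into one 33-octet buffer, MSB first."""
    if len(fields) != len(FIELD_BITS):
        raise ValueError("expected 76 field values")
    acc = GSM_SIGNATURE
    nbits = 4
    for value, width in zip(fields, FIELD_BITS):
        if not 0 <= value < (1 << width):
            raise ValueError("field value does not fit its width")
        # Appending on the right keeps each field's MSB first.
        acc = (acc << width) | value
        nbits += width
    assert nbits == 264
    return acc.to_bytes(33, "big")


# LARc[0] = 101010b: its top 4 bits land in octet 0, the rest in octet 1.
frame = pack_gsm_frame([0b101010] + [0] * 75)
print(frame[:2].hex())  # da80
```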
P26L48	
P27L1	4.5.8.2   GSM Variable Names and Numbers
P27L2	
P27L3	   In the RTP encoding we have the bit pattern described in Table 3,
P27L4	   where F.i signifies the ith bit of the field F, bit 0 is the most
P27L5	   significant bit, and the bits of every octet are numbered from 0 to 7
P27L6	   from most to least significant.
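
   The bit naming convention can be made concrete by generating the
   label of any bit position from the field widths of Table 2.  The
   sketch below is illustrative; gsm_bit_labels and bit_label are
   hypothetical names, and the labels always use the "F.i" form, so
   the abbreviated Table 3 entry "xmaxc00" appears here as
   "xmaxc0.0".

```python
def gsm_bit_labels():
    """Return the 264 bit labels of a GSM buffer in transmission order."""
    names = ["LARc%d" % i for i in range(8)]
    widths = [6, 6, 5, 5, 4, 4, 3, 3]
    for sub in range(4):
        names += ["Nc%d" % sub, "bc%d" % sub, "Mc%d" % sub, "xmaxc%d" % sub]
        widths += [7, 2, 2, 6]
        for k in range(13):
            names.append("xmc%d" % (13 * sub + k))
            widths.append(3)
    labels = ["1", "1", "0", "1"]  # the 0xD signature
    for name, width in zip(names, widths):
        # Bit 0 of a field is its most significant bit.
        labels += ["%s.%d" % (name, i) for i in range(width)]
    return labels


LABELS = gsm_bit_labels()


def bit_label(octet, bit):
    """Label of the given bit (0 = most significant) of the given octet."""
    return LABELS[octet * 8 + bit]


print(bit_label(5, 7))   # bc0.0
print(bit_label(32, 7))  # xmc51.2
```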
P27L7	
P27L8	4.5.9 GSM-EFR
P27L9	
P27L10	   GSM-EFR denotes GSM 06.60 enhanced full rate speech transcoding,
P27L11	   specified in ETS 300 726 which is available from ETSI at the address
P27L12	   given in Section 4.5.8.  This codec has a frame length of 244 bits.
P27L13	   For transmission in RTP, each codec frame is packed into a 31-octet
P27L14	   (248-bit) buffer beginning with a 4-bit signature 0xC in a manner
P27L15	   similar to that specified here for the original GSM 06.10 codec.  The
P27L16	   packing is specified in ETSI Technical Specification TS 101 318.
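
   Because the two GSM buffer types differ in both signature and
   length, a receiver can sanity-check a buffer against them.  The
   sketch below is only an illustration with hypothetical names; in
   practice the payload type, not the signature, identifies the
   encoding.

```python
# Signature nibble -> (encoding name, buffer size in octets).
GSM_SIGNATURES = {
    0xD: ("GSM", 33),      # GSM 06.10: 4 + 260 bits = 264 bits
    0xC: ("GSM-EFR", 31),  # GSM 06.60: 4 + 244 bits = 248 bits
}


def identify_gsm_buffer(buf):
    """Return "GSM", "GSM-EFR", or None if signature and size disagree."""
    if not buf:
        return None
    entry = GSM_SIGNATURES.get(buf[0] >> 4)  # top 4 bits of first octet
    if entry is None:
        return None
    name, size = entry
    return name if len(buf) == size else None


print(identify_gsm_buffer(bytes([0xC0]) + bytes(30)))  # GSM-EFR
```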
P27L17	
P27L18	4.5.10 L8
P27L19	
P27L20	   L8 denotes linear audio data samples, using 8 bits of precision with
P27L21	   an offset of 128, that is, the most negative signal is encoded as
P27L22	   zero.
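
   A minimal sketch of the offset representation, assuming the
   application holds samples as signed integers in the range -128 to
   127 (the function names are hypothetical):

```python
def l8_encode(sample):
    """Map a signed sample (-128..127) to an L8 octet value (0..255)."""
    if not -128 <= sample <= 127:
        raise ValueError("sample out of range for 8-bit audio")
    return sample + 128


def l8_decode(octet):
    """Map an L8 octet value (0..255) back to a signed sample."""
    if not 0 <= octet <= 255:
        raise ValueError("not an octet value")
    return octet - 128


print(l8_encode(-128), l8_encode(0), l8_encode(127))  # 0 128 255
```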
P27L23	
P27L24	4.5.11 L16
P27L25	
P27L26	   L16 denotes uncompressed audio data samples, using 16-bit signed
P27L27	   representation with 65,535 equally divided steps between minimum and
P27L28	   maximum signal level, ranging from -32,768 to 32,767.  The value is
P27L29	   represented in two's complement notation and transmitted in network
P27L30	   byte order (most significant byte first).
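
   The byte order can be illustrated with a short sketch.  The helper
   names are hypothetical, and the samples are assumed to be a single
   sequence of signed 16-bit integers already in transmission order.

```python
import struct


def l16_pack(samples):
    """Serialize signed 16-bit samples, most significant byte first."""
    return struct.pack(">%dh" % len(samples), *samples)


def l16_unpack(payload):
    """Parse an L16 payload back into a list of signed samples."""
    if len(payload) % 2:
        raise ValueError("L16 payload must hold whole 16-bit samples")
    return list(struct.unpack(">%dh" % (len(payload) // 2), payload))


payload = l16_pack([1, -2, 32767, -32768])
print(payload.hex())  # 0001fffe7fff8000
```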
P27L31	
P27L32	   The MIME registration for L16 in RFC 3555 [7] specifies parameters
P27L33	   that MAY be used with MIME or SDP to indicate that analog pre-
P27L34	   emphasis was applied to the signal before quantization or to indicate
P27L35	   that a multiple-channel audio stream follows a different channel
P27L36	   ordering convention than is specified in Section 4.1.
P27L37	
P27L38	4.5.12 LPC
P27L39	
P27L40	   LPC designates an experimental linear predictive encoding contributed
P27L41	   by Ron Frederick, which is based on an implementation written by Ron
P27L42	   Zuckerman posted to the Usenet group comp.dsp on June 26, 1992.  The
P27L43	   codec generates 14 octets for every frame.  The frame size is set to
P27L44	   20 ms, resulting in a bit rate of 5,600 b/s.
P27L45	
P27L46	
P27L47	
P27L48	
P28L1	4.5.13 MPA
P28L2	
P28L3	   MPA denotes MPEG-1 or MPEG-2 audio encapsulated as elementary
P28L4	   streams.  The encoding is defined in ISO standards ISO/IEC 11172-3
P28L5	   and 13818-3.  The encapsulation is specified in RFC 2250 [14].
P28L6	
P28L7	   The encoding may be at any of three levels of complexity, called
P28L8	   Layer I, II and III.  The selected layer as well as the sampling rate
P28L9	   and channel count are indicated in the payload.  The RTP timestamp
P28L10	   clock rate is always 90,000, independent of the sampling rate.
P28L11	   MPEG-1 audio supports sampling rates of 32, 44.1, and 48 kHz (ISO/IEC
P28L12	   11172-3, section 1.1; "Scope").  MPEG-2 supports sampling rates of
P28L13	   16, 22.05 and 24 kHz.  The number of samples per frame is fixed, but
P28L14	   the frame size will vary with the sampling rate and bit rate.
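
   Because the clock rate is fixed at 90,000 Hz, the duration of one
   audio frame in timestamp units depends on the sampling rate.  The
   sketch below is illustrative only.  The samples-per-frame values
   are not given in this memo; they are assumptions taken from the
   ISO audio specifications, and the names are hypothetical.

```python
from fractions import Fraction

RTP_CLOCK_HZ = 90000

# Assumed samples per frame, keyed by (MPEG version, layer).
SAMPLES_PER_FRAME = {
    ("MPEG-1", "I"): 384,
    ("MPEG-1", "II"): 1152,
    ("MPEG-1", "III"): 1152,
    ("MPEG-2", "I"): 384,
    ("MPEG-2", "II"): 1152,
    ("MPEG-2", "III"): 576,
}


def frame_duration_ticks(version, layer, sampling_rate_hz):
    """Duration of one audio frame in 90 kHz ticks (exact fraction)."""
    samples = SAMPLES_PER_FRAME[(version, layer)]
    return Fraction(samples * RTP_CLOCK_HZ, sampling_rate_hz)


print(frame_duration_ticks("MPEG-1", "II", 48000))  # 2160
print(frame_duration_ticks("MPEG-1", "II", 44100))  # 115200/49
```

   The second result is not an integer, so under these assumptions a
   sender would have to track the exact frame time and round each
   timestamp rather than add a constant increment.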
P28L15	
P28L16	   The MIME registration for MPA in RFC 3555 [7] specifies parameters
P28L17	   that MAY be used with MIME or SDP to restrict the selection of layer,
P28L18	   channel count, sampling rate, and bit rate.
P28L19	
P28L20	4.5.14 PCMA and PCMU
P28L21	
P28L22	   PCMA and PCMU are specified in ITU-T Recommendation G.711.  Audio
P28L23	   data is encoded as eight bits per sample, after logarithmic scaling.
P28L24	   PCMU denotes mu-law scaling, PCMA A-law scaling.  A detailed
P28L25	   description is given by Jayant and Noll [15].  Each G.711 octet SHALL
P28L26	   be octet-aligned in an RTP packet.  The sign bit of each G.711 octet
P28L27	   SHALL correspond to the most significant bit of the octet in the RTP
P28L28	   packet (i.e., assuming the G.711 samples are handled as octets on the
P28L29	   host machine, the sign bit SHALL be the most significant bit of the
P28L30	   octet as defined by the host machine format).  The 56 kb/s and 48
P28L31	   kb/s modes of G.711 are not applicable to RTP, since PCMA and PCMU
P28L32	   MUST always be transmitted as 8-bit samples.
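
   For orientation, mu-law compression of one sample can be sketched
   as below.  This is a commonly circulated segment-search
   formulation, not text from G.711; the 16-bit signed input range,
   the constants, and the function name are assumptions of this
   sketch (G.711 itself defines mu-law on 14-bit uniform input).

```python
ULAW_BIAS = 0x84    # 132, added before the segment search
ULAW_CLIP = 32635   # largest magnitude that survives the bias


def linear16_to_ulaw(sample):
    """Compress one signed 16-bit sample to a mu-law octet."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), ULAW_CLIP) + ULAW_BIAS
    # Find the segment: position of the highest set bit, from bit 14
    # (segment 7) down to bit 7 (segment 0).
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # All bits are inverted for transmission, so after inversion the
    # most significant bit is 1 for non-negative samples.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


print(hex(linear16_to_ulaw(0)))      # 0xff
print(hex(linear16_to_ulaw(32767)))  # 0x80
```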
P28L33	
P28L34	   See Section 4.1 regarding silence suppression.
P28L35	
P28L36	4.5.15 QCELP
P28L37	
P28L38	   The Electronic Industries Association (EIA) & Telecommunications
P28L39	   Industry Association (TIA) standard IS-733, "TR45: High Rate Speech
P28L40	   Service Option for Wideband Spread Spectrum Communications Systems",
P28L41	   defines the QCELP audio compression algorithm for use in wireless
P28L42	   CDMA applications.  The QCELP CODEC compresses each 20 milliseconds
P28L43	   of 8,000 Hz, 16-bit sampled input speech into one of four different
P28L44	   size output frames:  Rate 1 (266 bits), Rate 1/2 (124 bits), Rate 1/4
P28L45	   (54 bits) or Rate 1/8 (20 bits).  For typical speech patterns, this
P28L46	   results in an average output of 6.8 kb/s for normal mode and 4.7 kb/s
P28L47	   for reduced rate mode.  The packetization of the QCELP audio codec is
P28L48	   described in [16].
P29L1	4.5.16 RED
P29L2	
P29L3	   The redundant audio payload format "RED" is specified by RFC 2198
P29L4	   [17].  It defines a means by which multiple redundant copies of an
P29L5	   audio packet may be transmitted in a single RTP stream.  Each packet
P29L6	   in such a stream contains, in addition to the audio data for that
P29L7	   packetization interval, a (more heavily compressed) copy of the data
P29L8	   from a previous packetization interval.  This allows an approximation
P29L9	   of the data from lost packets to be recovered upon decoding of a
P29L10	   subsequent packet, giving much improved sound quality when compared
P29L11	   with silence substitution for lost packets.
P29L12	
P29L13	4.5.17 VDVI
P29L14	
P29L15	   VDVI is a variable-rate version of DVI4, yielding speech bit rates of
P29L16	   between 10 and 25 kb/s.  It is specified for single-channel operation
P29L17	   only.  Samples are packed into octets starting at the most-
P29L18	   significant bit.  The last octet is padded with 1 bits if the last
P29L19	   sample does not fill the last octet.  This padding is distinct from
P29L20	   the valid codewords.  The receiver needs to detect the padding
P29L21	   because there is no explicit count of samples in the packet.
P29L22	
P29L23	   It uses the following encoding:
P29L24	
P29L25	            DVI4 codeword  VDVI bit pattern
P29L26	            _______________________________
P29L27	                        0  00
P29L28	                        1  010
P29L29	                        2  1100
P29L30	                        3  11100
P29L31	                        4  111100
P29L32	                        5  1111100
P29L33	                        6  11111100
P29L34	                        7  11111110
P29L35	                        8  10
P29L36	                        9  011
P29L37	                       10  1101
P29L38	                       11  11101
P29L39	                       12  111101
P29L40	                       13  1111101
P29L41	                       14  11111101
P29L42	                       15  11111111
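
   The encoding table and the padding rule can be exercised with the
   following sketch.  It is illustrative only; the function names are
   hypothetical, and the input is assumed to be a sequence of DVI4
   codewords (0 to 15) that has already been produced by the encoder.

```python
VDVI_CODES = {
    0: "00", 1: "010", 2: "1100", 3: "11100",
    4: "111100", 5: "1111100", 6: "11111100", 7: "11111110",
    8: "10", 9: "011", 10: "1101", 11: "11101",
    12: "111101", 13: "1111101", 14: "11111101", 15: "11111111",
}
VDVI_DECODE = {bits: codeword for codeword, bits in VDVI_CODES.items()}


def vdvi_pack(codewords):
    """Pack DVI4 codewords MSB first, padding the last octet with 1s."""
    bits = "".join(VDVI_CODES[c] for c in codewords)
    bits += "1" * (-len(bits) % 8)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def vdvi_unpack(payload):
    """Recover DVI4 codewords; trailing 1 bits are treated as padding."""
    bits = "".join(format(octet, "08b") for octet in payload)
    codewords = []
    current = ""
    for bit in bits:
        current += bit
        if current in VDVI_DECODE:
            codewords.append(VDVI_DECODE[current])
            current = ""
    # At most seven 1 bits may remain; they never form a codeword.
    if "0" in current:
        raise ValueError("trailing bits are not valid padding")
    return codewords


packed = vdvi_pack([0, 8, 15])
print(packed.hex())         # 2fff
print(vdvi_unpack(packed))  # [0, 8, 15]
```

   The padding is unambiguous because no bit pattern of fewer than
   eight 1 bits is a codeword, so the decoder never mistakes padding
   for a sample.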
P29L43	
P29L44	
P29L45	
P29L46	
P29L47	
P29L48	
P30L1	5.  Video
P30L2	
P30L3	   The following sections describe the video encodings that are defined
P30L4	   in this memo and give their abbreviated names used for
P30L5	   identification.  These video encodings and their payload types are
P30L6	   listed in Table 5.
P30L7	
P30L8	   All of these video encodings use an RTP timestamp frequency of 90,000
P30L9	   Hz, the same as the MPEG presentation time stamp frequency.  This
P30L10	   frequency yields exact integer timestamp increments for the typical
P30L11	   24 (HDTV), 25 (PAL), 29.97 (NTSC) and 30 Hz (HDTV) frame rates
P30L12	   and 50, 59.94 and 60 Hz field rates.  While 90 kHz is the RECOMMENDED
P30L13	   rate for future video encodings used within this profile, other rates
P30L14	   MAY be used.  However, it is not sufficient to use the video frame
P30L15	   rate (typically between 15 and 30 Hz) because that does not provide
P30L16	   adequate resolution for typical synchronization requirements when
P30L17	   calculating the RTP timestamp corresponding to the NTP timestamp in
P30L18	   an RTCP SR packet.  The timestamp resolution MUST also be sufficient
P30L19	   for the jitter estimate contained in the receiver reports.
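
   The per-frame timestamp increments for the frame rates named above
   can be computed as in the sketch below.  It assumes that the
   nominal 29.97 Hz NTSC rate is exactly 30000/1001 Hz; the names are
   hypothetical.

```python
from fractions import Fraction

RTP_VIDEO_CLOCK_HZ = 90000

FRAME_RATES_HZ = {
    "24 (HDTV)": Fraction(24),
    "25 (PAL)": Fraction(25),
    "29.97 (NTSC)": Fraction(30000, 1001),  # assumed exact NTSC rate
    "30 (HDTV)": Fraction(30),
}


def ticks_per_frame(frame_rate_hz):
    """Timestamp increment between consecutive frames, as a fraction."""
    return RTP_VIDEO_CLOCK_HZ / frame_rate_hz


INCREMENTS = {name: ticks_per_frame(rate)
              for name, rate in FRAME_RATES_HZ.items()}
# 24 Hz -> 3750, 25 Hz -> 3600, 29.97 Hz -> 3003, 30 Hz -> 3000
```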
P30L20	
P30L21	   For most of these video encodings, the RTP timestamp encodes the
P30L22	   sampling instant of the video image contained in the RTP data packet.
P30L23	   If a video image occupies more than one packet, the timestamp is the
P30L24	   same on all of those packets.  Packets from different video images
P30L25	   are distinguished by their different timestamps.
P30L26	
P30L27	   Most of these video encodings also specify that the marker bit of the
P30L28	   RTP header SHOULD be set to one in the last packet of a video frame
P30L29	   and otherwise set to zero.  Thus, it is not necessary to wait for a
P30L30	   following packet with a different timestamp to detect that a new
P30L31	   frame should be displayed.
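
   The timestamp and marker bit conventions can be sketched as
   follows.  This is a simplified illustration with hypothetical
   names: real payload formats add their own payload headers and
   fragment at codec-specific boundaries, which is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class VideoPacket:
    timestamp: int   # 32-bit RTP timestamp
    marker: bool     # True only on the last packet of the frame
    payload: bytes


def packetize_frame(frame, timestamp, max_payload):
    """Split one coded frame into packets sharing a single timestamp."""
    if not frame or max_payload <= 0:
        raise ValueError("need a non-empty frame and a positive size")
    chunks = [frame[i:i + max_payload]
              for i in range(0, len(frame), max_payload)]
    last = len(chunks) - 1
    return [VideoPacket(timestamp & 0xFFFFFFFF, index == last, chunk)
            for index, chunk in enumerate(chunks)]


packets = packetize_frame(bytes(2500), timestamp=3000, max_payload=1000)
print([p.marker for p in packets])  # [False, False, True]
```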
P30L32	
P30L33	5.1  CelB
P30L34	
P30L35	   The CELL-B encoding is a proprietary encoding proposed by Sun
P30L36	   Microsystems.  The byte stream format is described in RFC 2029 [18].
P30L37	
P30L38	5.2 JPEG
P30L39	
P30L40	   The encoding is specified in ISO Standards 10918-1 and 10918-2.  The
P30L41	   RTP payload format is as specified in RFC 2435 [19].
P30L42	
P30L43	5.3 H261
P30L44	
P30L45	   The encoding is specified in ITU-T Recommendation H.261, "Video codec
P30L46	   for audiovisual services at p x 64 kbit/s".  The packetization and
P30L47	   RTP-specific properties are described in RFC 2032 [20].
P30L48	
P31L1	5.4 H263
P31L2	
P31L3	   The encoding is specified in the 1996 version of ITU-T Recommendation
P31L4	   H.263, "Video coding for low bit rate communication".  The
P31L5	   packetization and RTP-specific properties are described in RFC 2190
P31L6	   [21].  The H263-1998 payload format is RECOMMENDED over this one for
P31L7	   use by new implementations.
P31L8	
P31L9	5.5 H263-1998
P31L10	
P31L11	   The encoding is specified in the 1998 version of ITU-T Recommendation
P31L12	   H.263, "Video coding for low bit rate communication".  The
P31L13	   packetization and RTP-specific properties are described in RFC 2429
P31L14	   [22].  Because the 1998 version of H.263 is a superset of the 1996
P31L15	   syntax, this payload format can also be used with the 1996 version of
P31L16	   H.263, and is RECOMMENDED for this use by new implementations.  This
P31L17	   payload format does not replace RFC 2190, which continues to be used
P31L18	   by existing implementations, and may be required for backward
P31L19	   compatibility in new implementations.  Implementations using the new
P31L20	   features of the 1998 version of H.263 MUST use the payload format
P31L21	   described in RFC 2429.
P31L22	
P31L23	5.6 MPV
P31L24	
P31L25	   MPV designates the use of MPEG-1 and MPEG-2 video encoding elementary
P31L26	   streams as specified in ISO Standards ISO/IEC 11172 and 13818-2,
P31L27	   respectively.  The RTP payload format is as specified in RFC 2250
P31L28	   [14], Section 3.
P31L29	
P31L30	   The MIME registration for MPV in RFC 3555 [7] specifies a parameter
P31L31	   that MAY be used with MIME or SDP to restrict the selection of the
P31L32	   type of MPEG video.
P31L33	
P31L34	5.7 MP2T
P31L35	
P31L36	   MP2T designates the use of MPEG-2 transport streams, for either audio
P31L37	   or video.  The RTP payload format is described in RFC 2250 [14],
P31L38	   Section 2.
P31L39	
P31L40	
P31L41	
P31L42	
P31L43	
P31L44	
P31L45	
P31L46	
P31L47	
P31L48	
P32L1	5.8 nv
P32L2	
P32L3	   The encoding is implemented in the program `nv', version 4, developed
P32L4	   at Xerox PARC by Ron Frederick.  Further information is available
P32L5	   from the author:
P32L6	
P32L7	   Ron Frederick
P32L8	   Blue Coat Systems Inc.
P32L9	   650 Almanor Avenue
P32L10	   Sunnyvale, CA 94085
P32L11	   United States
P32L12	   EMail: ronf@bluecoat.com
P32L13	
P32L14	6.  Payload Type Definitions
P32L15	
P32L16	   Tables 4 and 5 define this profile's static payload type values for
P32L17	   the PT field of the RTP data header.  In addition, payload type
P32L18	   values in the range 96-127 MAY be defined dynamically through a
P32L19	   conference control protocol, which is beyond the scope of this
P32L20	   document.  For example, a session directory could specify that for a
P32L21	   given session, payload type 96 indicates PCMU encoding, 8,000 Hz
P32L22	   sampling rate, 2 channels.  Entries in Tables 4 and 5 with payload
P32L23	   type "dyn" have no static payload type assigned and are only used
P32L24	   with a dynamic payload type.  Payload type 2 was assigned to G721 in
P32L25	   RFC 1890 and to its equivalent successor G726-32 in draft versions of
P32L26	   this specification, but its use is now deprecated and that static
P32L27	   payload type is marked reserved due to conflicting use for the
P32L28	   payload formats G726-32 and AAL2-G726-32 (see Section 4.5.4).
P32L29	   Payload type 13 indicates the Comfort Noise (CN) payload format
P32L30	   specified in RFC 3389 [9].  Payload type 19 is marked "reserved"
P32L31	   because some draft versions of this specification assigned that
P32L32	   number to an earlier version of the comfort noise payload format.
P32L33	   The payload type range 72-76 is marked "reserved" so that RTCP and
P32L34	   RTP packets can be reliably distinguished (see Section "Summary of
P32L35	   Protocol Constants" of the RTP protocol specification).
P32L36	
P32L37	   The payload types currently defined in this profile are assigned to
P32L38	   exactly one of three categories or media types:  audio only, video
P32L39	   only and those combining audio and video.  The media types are marked
P32L40	   in Tables 4 and 5 as "A", "V" and "AV", respectively.  Payload types
P32L41	   of different media types SHALL NOT be interleaved or multiplexed
P32L42	   within a single RTP session, but multiple RTP sessions MAY be used in
P32L43	   parallel to send multiple media types.  An RTP source MAY change
P32L44	   payload types within the same media type during a session.  See the
P32L45	   section "Multiplexing RTP Sessions" of RFC 3550 for additional
P32L46	   explanation.
P32L47	
P32L48	
P33L1	               PT   encoding    media type  clock rate   channels
P33L2	                    name                    (Hz)
P33L3	               ___________________________________________________
P33L4	               0    PCMU        A            8,000       1
P33L5	               1    reserved    A
P33L6	               2    reserved    A
P33L7	               3    GSM         A            8,000       1
P33L8	               4    G723        A            8,000       1
P33L9	               5    DVI4        A            8,000       1
P33L10	               6    DVI4        A           16,000       1
P33L11	               7    LPC         A            8,000       1
P33L12	               8    PCMA        A            8,000       1
P33L13	               9    G722        A            8,000       1
P33L14	               10   L16         A           44,100       2
P33L15	               11   L16         A           44,100       1
P33L16	               12   QCELP       A            8,000       1
P33L17	               13   CN          A            8,000       1
P33L18	               14   MPA         A           90,000       (see text)
P33L19	               15   G728        A            8,000       1
P33L20	               16   DVI4        A           11,025       1
P33L21	               17   DVI4        A           22,050       1
P33L22	               18   G729        A            8,000       1
P33L23	               19   reserved    A
P33L24	               20   unassigned  A
P33L25	               21   unassigned  A
P33L26	               22   unassigned  A
P33L27	               23   unassigned  A
P33L28	               dyn  G726-40     A            8,000       1
P33L29	               dyn  G726-32     A            8,000       1
P33L30	               dyn  G726-24     A            8,000       1
P33L31	               dyn  G726-16     A            8,000       1
P33L32	               dyn  G729D       A            8,000       1
P33L33	               dyn  G729E       A            8,000       1
P33L34	               dyn  GSM-EFR     A            8,000       1
P33L35	               dyn  L8          A            var.        var.
P33L36	               dyn  RED         A                        (see text)
P33L37	               dyn  VDVI        A            var.        1
P33L38	
P33L39	               Table 4: Payload types (PT) for audio encodings
P33L40	
P33L41	
P33L42	
P33L43	
P33L44	
P33L45	
P33L46	
P33L47	
P33L48	
P34L1	               PT      encoding    media type  clock rate
P34L2	                       name                    (Hz)
P34L3	               _____________________________________________
P34L4	               24      unassigned  V
P34L5	               25      CelB        V           90,000
P34L6	               26      JPEG        V           90,000
P34L7	               27      unassigned  V
P34L8	               28      nv          V           90,000
P34L9	               29      unassigned  V
P34L10	               30      unassigned  V
P34L11	               31      H261        V           90,000
P34L12	               32      MPV         V           90,000
P34L13	               33      MP2T        AV          90,000
P34L14	               34      H263        V           90,000
P34L15	               35-71   unassigned  ?
P34L16	               72-76   reserved    N/A         N/A
P34L17	               77-95   unassigned  ?
P34L18	               96-127  dynamic     ?
P34L19	               dyn     H263-1998   V           90,000
P34L20	
P34L21	               Table 5: Payload types (PT) for video and combined
P34L22	                        encodings
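
   For implementors, Tables 4 and 5 can be transcribed into a lookup
   structure.  The sketch below is one possible representation with
   hypothetical names; it records only the encoding name, media type
   and clock rate of each static payload type.

```python
# PT -> (encoding name, media type, clock rate in Hz)
STATIC_PAYLOAD_TYPES = {
    0: ("PCMU", "A", 8000), 3: ("GSM", "A", 8000),
    4: ("G723", "A", 8000), 5: ("DVI4", "A", 8000),
    6: ("DVI4", "A", 16000), 7: ("LPC", "A", 8000),
    8: ("PCMA", "A", 8000), 9: ("G722", "A", 8000),
    10: ("L16", "A", 44100), 11: ("L16", "A", 44100),
    12: ("QCELP", "A", 8000), 13: ("CN", "A", 8000),
    14: ("MPA", "A", 90000), 15: ("G728", "A", 8000),
    16: ("DVI4", "A", 11025), 17: ("DVI4", "A", 22050),
    18: ("G729", "A", 8000),
    25: ("CelB", "V", 90000), 26: ("JPEG", "V", 90000),
    28: ("nv", "V", 90000), 31: ("H261", "V", 90000),
    32: ("MPV", "V", 90000), 33: ("MP2T", "AV", 90000),
    34: ("H263", "V", 90000),
}
RESERVED_PAYLOAD_TYPES = {1, 2, 19} | set(range(72, 77))


def classify_payload_type(pt):
    """Return "static", "reserved", "unassigned" or "dynamic"."""
    if not 0 <= pt <= 127:
        raise ValueError("payload type is a 7-bit value")
    if pt in STATIC_PAYLOAD_TYPES:
        return "static"
    if pt in RESERVED_PAYLOAD_TYPES:
        return "reserved"
    if pt >= 96:
        return "dynamic"
    return "unassigned"


print(classify_payload_type(18), classify_payload_type(96))
# static dynamic
```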
P34L23	
P34L24	   Session participants agree through mechanisms beyond the scope of
P34L25	   this specification on the set of payload types allowed in a given
P34L26	   session.  This set MAY, for example, be defined by the capabilities
P34L27	   of the applications used, negotiated by a conference control protocol
P34L28	   or established by agreement between the human participants.
P34L29	
P34L30	   Audio applications operating under this profile SHOULD, at a minimum,
P34L31	   be able to send and/or receive payload types 0 (PCMU) and 5 (DVI4).
P34L32	   This allows interoperability without format negotiation and ensures
P34L33	   successful negotiation with a conference control protocol.
P34L34	
P34L35	7.  RTP over TCP and Similar Byte Stream Protocols
P34L36	
P34L37	   Under special circumstances, it may be necessary to carry RTP in
P34L38	   protocols offering a byte stream abstraction, such as TCP, possibly
P34L39	   multiplexed with other data.  The application MUST define its own
P34L40	   method of delineating RTP and RTCP packets (RTSP [23] provides an
P34L41	   example of such an encapsulation specification).
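
   As an illustration of what such a method might look like, the
   sketch below prefixes each packet with a 16-bit length in network
   byte order.  This framing is a hypothetical choice made for the
   example; it is not defined by this memo and is not the RTSP
   encapsulation.

```python
import struct


def frame_packet(packet):
    """Prefix one RTP or RTCP packet with its 16-bit length."""
    if len(packet) > 0xFFFF:
        raise ValueError("packet too large for a 16-bit length field")
    return struct.pack(">H", len(packet)) + packet


def split_stream(buffer):
    """Return (complete packets, unconsumed tail) from received bytes."""
    packets = []
    offset = 0
    while len(buffer) - offset >= 2:
        (length,) = struct.unpack_from(">H", buffer, offset)
        if len(buffer) - offset - 2 < length:
            break  # wait for the rest of this packet to arrive
        packets.append(bytes(buffer[offset + 2:offset + 2 + length]))
        offset += 2 + length
    return packets, bytes(buffer[offset:])


stream = frame_packet(b"abc") + frame_packet(b"hello")[:4]
print(split_stream(stream))  # ([b'abc'], b'\x00\x05he')
```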
P34L42	
P34L43	8.  Port Assignment
P34L44	
P34L45	   As specified in the RTP protocol definition, RTP data SHOULD be
P34L46	   carried on an even UDP port number and the corresponding RTCP packets
P34L47	   SHOULD be carried on the next higher (odd) port number.
P34L48	
P35L1	   Applications operating under this profile MAY use any such UDP port
P35L2	   pair.  For example, the port pair MAY be allocated randomly by a
P35L3	   session management program.  A single fixed port number pair cannot
P35L4	   be required because multiple applications using this profile are
P35L5	   likely to run on the same host, and there are some operating systems
P35L6	   that do not allow multiple processes to use the same UDP port with
P35L7	   different multicast addresses.
P35L8	
P35L9	   However, port numbers 5004 and 5005 have been registered for use with
P35L10	   this profile for those applications that choose to use them as the
P35L11	   default pair.  Applications that operate under multiple profiles MAY
P35L12	   use this port pair as an indication to select this profile if they
P35L13	   are not subject to the constraint of the previous paragraph.
P35L14	   Applications need not have a default and MAY require that the port
P35L15	   pair be explicitly specified.  The particular port numbers were
P35L16	   chosen to lie in the range above 5000 to accommodate port number
P35L17	   allocation practice within some versions of the Unix operating
P35L18	   system, where port numbers below 1024 can only be used by privileged
P35L19	   processes and port numbers between 1024 and 5000 are automatically
P35L20	   assigned by the operating system.
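
   The port pairing convention can be sketched as follows.  The
   function name and the default range are assumptions made for the
   example, and the sketch does not check whether the chosen ports
   are actually free on the host.

```python
import random

DEFAULT_PORT_PAIR = (5004, 5005)  # registered default (RTP, RTCP)


def pick_port_pair(rng, low=5002, high=65534):
    """Pick a random even RTP port and the next higher RTCP port."""
    even_low = low + (low % 2)                    # round up to even
    rtp_port = rng.randrange(even_low, high, 2)   # even, below high
    return rtp_port, rtp_port + 1


rtp_port, rtcp_port = pick_port_pair(random.Random())
assert rtp_port % 2 == 0 and rtcp_port == rtp_port + 1
```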
P35L21	
P35L22	9.  Changes from RFC 1890
P35L23	
P35L24	   This RFC revises RFC 1890.  It is mostly backwards-compatible with
P35L25	   RFC 1890 except for functions removed because two interoperable
P35L26	   implementations were not found.  The additions to RFC 1890 codify
P35L27	   existing practice in the use of payload formats under this profile.
P35L28	   Since this profile may be used without using any of the payload
P35L29	   formats listed here, the addition of new payload formats in this
P35L30	   revision does not affect backwards compatibility.  The changes are
P35L31	   listed below, categorized into functional and non-functional changes.
P35L32	
P35L33	   Functional changes:
P35L34	
P35L35	   o  Section 11, "IANA Considerations" was added to specify the
P35L36	      registration of the name for this profile.  That section also
P35L37	      references a new Section 3 "Registering Additional Encodings"
P35L38	      which establishes a policy that no additional registration of
P35L39	      static payload types for this profile will be made beyond those
P35L40	      added in this revision and included in Tables 4 and 5.  Instead,
P35L41	      additional encoding names may be registered as MIME subtypes for
P35L42	      binding to dynamic payload types.  Non-normative references were
P35L43	      added to RFC 3555 [7] where MIME subtypes for all the listed
P35L44	      payload formats are registered, some with optional parameters for
P35L45	      use of the payload formats.
P35L46	
P35L47	
P35L48	
P36L1	   o  Static payload types 4, 16, 17 and 34 were added to incorporate
P36L2	      IANA registrations made since the publication of RFC 1890, along
P36L3	      with the corresponding payload format descriptions for G723 and
P36L4	      H263.
P36L5	
P36L6	   o  Following working group discussion, static payload types 12 and 18
P36L7	      were added along with the corresponding payload format
P36L8	      descriptions for QCELP and G729.  Static payload type 13 was
P36L9	      assigned to the Comfort Noise (CN) payload format defined in RFC
P36L10	      3389.  Payload type 19 was marked reserved because it had been
P36L11	      temporarily allocated to an earlier version of Comfort Noise
P36L12	      present in some draft revisions of this document.
P36L13	
P36L14	   o  The payload format for G721 was renamed to G726-32 following the
P36L15	      ITU-T renumbering, and the payload format description for G726 was
P36L16	      expanded to include the -16, -24 and -40 data rates.  Because of
P36L17	      confusion regarding draft revisions of this document, some
P36L18	      implementations of these G726 payload formats packed samples into
P36L19	      octets starting with the most significant bit rather than the
P36L20	      least significant bit as specified here.  To partially resolve
P36L21	      this incompatibility, new payload formats named AAL2-G726-16, -24,
P36L22	      -32 and -40 will be specified in a separate document (see note in
P36L23	      Section 4.5.4), and use of static payload type 2 is deprecated as
P36L24	      explained in Section 6.
P36L25	
P36L26	   o  Payload formats G729D and G729E were added following the ITU-T
P36L27	      addition of Annexes D and E to Recommendation G.729.  Listings
P36L28	      were added for payload formats GSM-EFR, RED, and H263-1998
P36L29	      published in other documents subsequent to RFC 1890.  These
P36L30	      additional payload formats are referenced only by dynamic payload
P36L31	      type numbers.
P36L32	
P36L33	   o  The descriptions of the payload formats for G722, G728, GSM and
P36L34	      VDVI were expanded.
P36L35	
P36L36	   o  The payload format for 1016 audio was removed and its static
P36L37	      payload type assignment 1 was marked "reserved" because two
P36L38	      interoperable implementations were not found.
P36L39	
P36L40	   o  Requirements for congestion control were added in Section 2.
P36L41	
P36L42	   o  This profile follows the suggestion in the revised RTP spec that
P36L43	      RTCP bandwidth may be specified separately from the session
P36L44	      bandwidth and separately for active senders and passive receivers.
P36L45	
P36L46	   o  The mapping of a user pass-phrase string into an encryption key
P36L47	      was deleted from Section 2 because two interoperable
P36L48	      implementations were not found.
P37L1	   o  The "quadrophonic" sample ordering convention for four-channel
P37L2	      audio was removed to eliminate an ambiguity as noted in Section
P37L3	      4.1.
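
   The octet packing difference noted in the G726 item of the list above
   can be illustrated for the 4-bit codewords of G726-32.  The following
   is a minimal sketch under stated assumptions, not a normative
   description: the function name pack_g726_32 and its lsb_first
   parameter are hypothetical names introduced here for illustration.
   With lsb_first set, the first codeword occupies the least significant
   bits of the octet, as this profile specifies; otherwise it occupies
   the most significant bits, which is the ordering used by the
   incompatible implementations mentioned above.

```python
def pack_g726_32(codewords, lsb_first=True):
    """Pack 4-bit G726-32 codewords two per octet (illustrative only).

    lsb_first=True places the first codeword of each pair in the low
    nibble; lsb_first=False places it in the high nibble.
    """
    if len(codewords) % 2 != 0:
        raise ValueError("expected an even number of 4-bit codewords")
    packed = bytearray()
    for first, second in zip(codewords[0::2], codewords[1::2]):
        if not (0 <= first <= 0xF and 0 <= second <= 0xF):
            raise ValueError("codewords must fit in 4 bits")
        if lsb_first:
            packed.append((second << 4) | first)
        else:
            packed.append((first << 4) | second)
    return bytes(packed)


if __name__ == "__main__":
    codes = [0x1, 0x2, 0x3, 0x4]
    print(pack_g726_32(codes).hex())                   # 2143
    print(pack_g726_32(codes, lsb_first=False).hex())  # 1234
```

   The same four codewords yield different octets under the two
   orderings, which is why a receiver that assumes the wrong ordering
   decodes a different codeword sequence.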
P37L4	
P37L5	   Non-functional changes:
P37L6	
P37L7	   o  In Section 4.1, it is now explicitly stated that silence
P37L8	      suppression is allowed for all audio payload formats.  (This has
P37L9	      always been the case and derives from a fundamental aspect of
P37L10	      RTP's design and the motivations for packet audio, but was not
P37L11	      explicitly stated before.)  The use of comfort noise is also
P37L12	      explained.
P37L13	
P37L14	   o  In Section 4.1, the requirement level for setting of the marker
P37L15	      bit on the first packet after silence for audio was changed from
P37L16	      "is" to "SHOULD be", and it was clarified that the marker bit is
P37L17	      set only when packets are intentionally not sent.
P37L18	
P37L19	   o  Similarly, text was added to specify that the marker bit SHOULD be
P37L20	      set to one on the last packet of a video frame, and that video
P37L21	      frames are distinguished by their timestamps.
P37L22	
P37L23	   o  RFC references were added for payload formats published after RFC
P37L24	      1890.
P37L25	
P37L26	   o  The security considerations and full copyright sections were
P37L27	      added.
P37L28	
P37L29	   o  According to Peter Hoddie of Apple, only pre-1994 Macintosh used
P37L30	      the 22254.54 rate and none the 11127.27 rate, so the latter was
P37L31	      dropped from the discussion of suggested sampling frequencies.
P37L32	
P37L33	   o  Table 1 was corrected to move some values from the "ms/packet"
P37L34	      column to the "default ms/packet" column where they belonged.
P37L35	
P37L36	   o  Since the Interactive Multimedia Association ceased operations, an
P37L37	      alternate resource was provided for a referenced IMA document.
P37L38	
P37L39	   o  A note has been added for G722 to clarify a discrepancy between
P37L40	      the actual sampling rate and the RTP timestamp clock rate.
P37L41	
P37L42	   o  Small clarifications of the text have been made in several places,
P37L43	      some in response to questions from readers.  In particular:
P37L44	
P37L45	      -  A definition for "media type" is given in Section 1.1 to allow
P37L46	         the explanation of multiplexing RTP sessions in Section 6 to be
P37L47	         more clear regarding the multiplexing of multiple media.
P37L48	
P38L1	      -  The explanation of how to determine the number of audio frames
P38L2	         in a packet from the length was expanded.
P38L3	
P38L4	      -  More description of the allocation of bandwidth to SDES items
P38L5	         is given.
P38L6	
P38L7	      -  A note was added that the convention for the order of channels
P38L8	         specified in Section 4.1 may be overridden by a particular
P38L9	         encoding or payload format specification.
P38L10	
P38L11	      -  The terms MUST, SHOULD, MAY, etc. are used as defined in RFC
P38L12	         2119.
P38L13	
P38L14	   o  A second author for this document was added.
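
   The video marker bit item in the list above states that video frames
   are distinguished by their timestamps and that the marker bit SHOULD
   be set on the last packet of a frame.  A receiver-side reading of
   that rule might be sketched as follows.  This is an illustration
   only: the function name group_video_frames and the
   (sequence number, timestamp, marker) tuple layout are assumptions
   made for this example, and the sketch assumes packets arrive in
   order, without loss and without sequence number wraparound.

```python
from itertools import groupby


def group_video_frames(packets):
    """Group (seq, timestamp, marker) tuples into frames by timestamp.

    Assumes packets are already in sequence order with no loss or
    wraparound; a real receiver would have to handle both.
    """
    frames = []
    for timestamp, group in groupby(packets, key=lambda p: p[1]):
        members = list(group)
        frames.append({
            "timestamp": timestamp,
            "seqs": [seq for seq, _, _ in members],
            # Records whether the sender set the marker bit on the
            # last packet carrying this timestamp.
            "marker_on_last": bool(members[-1][2]),
        })
    return frames


if __name__ == "__main__":
    received = [(1, 3000, 0), (2, 3000, 1), (3, 6000, 0), (4, 6000, 0)]
    for frame in group_video_frames(received):
        print(frame)
```

   Grouping on the timestamp rather than on the marker bit keeps the
   sketch working when a sender does not follow the SHOULD, as in the
   second frame of the usage example.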
P38L15	
P38L16	10. Security Considerations
P38L17	
P38L18	   Implementations using the profile defined in this specification are
P38L19	   subject to the security considerations discussed in the RTP
P38L20	   specification [1].  This profile does not specify any different
P38L21	   security services.  The primary function of this profile is to list a
P38L22	   set of data compression encodings for audio and video media.
P38L23	
P38L24	   Confidentiality of the media streams is achieved by encryption.
P38L25	   Because the data compression used with the payload formats described
P38L26	   in this profile is applied end-to-end, encryption may be performed
P38L27	   after compression so there is no conflict between the two operations.
P38L28	
P38L29	   A potential denial-of-service threat exists for data encodings using
P38L30	   compression techniques that have non-uniform receiver-end
P38L31	   computational load.  The attacker can inject pathological datagrams
P38L32	   into the stream which are complex to decode and cause the receiver to
P38L33	   be overloaded.
P38L34	
P38L35	   As with any IP-based protocol, in some circumstances a receiver may
P38L36	   be overloaded simply by the receipt of too many packets, either
P38L37	   desired or undesired.  Network-layer authentication MAY be used to
P38L38	   discard packets from undesired sources, but the processing cost of
P38L39	   the authentication itself may be too high.  In a multicast
P38L40	   environment, source pruning is implemented in IGMPv3 (RFC 3376) [24]
P38L41	   and in multicast routing protocols to allow a receiver to select
P38L42	   which sources are allowed to reach it.
P38L43	
P39L1	11. IANA Considerations
P39L2	
P39L3	   The RTP specification establishes a registry of profile names for use
P39L4	   by higher-level control protocols, such as the Session Description
P39L5	   Protocol (SDP), RFC 2327 [6], to refer to transport methods.  This
P39L6	   profile registers the name "RTP/AVP".
P39L7	
P39L8	   Section 3 establishes the policy that no additional registration of
P39L9	   static RTP payload types for this profile will be made beyond those
P39L10	   added in this document revision and included in Tables 4 and 5.  IANA
P39L11	   may reference that section in declining to accept any additional
P39L12	   registration requests.  In Tables 4 and 5, note that types 1 and 2
P39L13	   have been marked reserved and the set of "dyn" payload types included
P39L14	   has been updated.  These changes are explained in Sections 6 and 9.
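
   As an illustration of how a control protocol refers to this profile
   by its registered name, the sketch below reads a single SDP media
   description that uses "RTP/AVP" and resolves each listed payload
   type, either through an "a=rtpmap" binding (as used for dynamic
   payload types) or through a static assignment.  It is a minimal
   sketch under stated assumptions and not a complete SDP parser: the
   function name resolve_payload_types and the table name STATIC_SUBSET
   are hypothetical, the table lists only a few static assignments
   rather than the full contents of Tables 4 and 5, and only one media
   description is handled.

```python
# A small, deliberately incomplete subset of the static assignments.
STATIC_SUBSET = {
    0: "PCMU/8000",
    8: "PCMA/8000",
    18: "G729/8000",
    34: "H263/90000",
}


def resolve_payload_types(sdp_text):
    """Map payload types on an RTP/AVP "m=" line to encoding names.

    Returns a dict of payload type -> "name/clock" string, or None
    when neither an rtpmap binding nor a known static entry exists.
    """
    listed = []
    rtpmap = {}
    for raw in sdp_text.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            fields = line[2:].split()
            # Fields are: media, port, transport profile, format list.
            if len(fields) >= 4 and fields[2] == "RTP/AVP":
                listed = [int(f) for f in fields[3:]]
        elif line.startswith("a=rtpmap:"):
            pt, _, encoding = line[len("a=rtpmap:"):].partition(" ")
            rtpmap[int(pt)] = encoding.strip()
    return {pt: rtpmap.get(pt) or STATIC_SUBSET.get(pt) for pt in listed}


if __name__ == "__main__":
    example = (
        "v=0\r\n"
        "m=audio 49170 RTP/AVP 0 96\r\n"
        "a=rtpmap:96 GSM-EFR/8000\r\n"
    )
    print(resolve_payload_types(example))
```

   In the usage example, payload type 0 resolves through the static
   table while payload type 96 resolves through its rtpmap binding,
   which mirrors the policy that new encodings are bound to dynamic
   payload types rather than given new static assignments.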
P39L15	
P39L16	12. References
P39L17	
P39L18	12.1 Normative References
P39L19	
P39L20	   [1]  Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson,
P39L21	        "RTP:  A Transport Protocol for Real-Time Applications", RFC
P39L22	        3550, July 2003.
P39L23	
P39L24	   [2]  Bradner, S., "Key Words for Use in RFCs to Indicate Requirement
P39L25	        Levels", BCP 14, RFC 2119, March 1997.
P39L26	
P39L27	   [3]  Apple Computer, "Audio Interchange File Format AIFF-C", August
P39L28	        1991.  (also ftp://ftp.sgi.com/sgi/aiff-c.9.26.91.ps.Z).
P39L29	
P39L30	12.2 Informative References
P39L31	
P39L32	   [4]  Braden, R., Clark, D. and S. Shenker, "Integrated Services in
P39L33	        the Internet Architecture: an Overview", RFC 1633, June 1994.
P39L34	
P39L35	   [5]  Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z. and W.
P39L36	        Weiss, "An Architecture for Differentiated Service", RFC 2475,
P39L37	        December 1998.
P39L38	
P39L39	   [6]  Handley, M. and V. Jacobson, "SDP: Session Description
P39L40	        Protocol", RFC 2327, April 1998.
P39L41	
P39L42	   [7]  Casner, S. and P. Hoschka, "MIME Type Registration of RTP
P39L43	        Payload Types", RFC 3555, July 2003.
P39L44	
P39L45	   [8]  Freed, N., Klensin, J. and J. Postel, "Multipurpose Internet
P39L46	        Mail Extensions (MIME) Part Four: Registration Procedures", BCP
P39L47	        13, RFC 2048, November 1996.
P39L48	
P40L1	   [9]  Zopf, R., "Real-time Transport Protocol (RTP) Payload for
P40L2	        Comfort Noise (CN)", RFC 3389, September 2002.
P40L3	
P40L4	   [10] Deleam, D. and J.-P. Petit, "Real-time implementations of the
P40L5	        recent ITU-T low bit rate speech coders on the TI TMS320C54X
P40L6	        DSP: results, methodology, and applications", in Proc. of
P40L7	        International Conference on Signal Processing, Technology, and
P40L8	        Applications (ICSPAT), (Boston, Massachusetts), pp. 1656--1660,
P40L9	        October 1996.
P40L10	
P40L11	   [11] Mouly, M. and M.-B. Pautet, "The GSM system for mobile
P40L12	        communications", Lassay-les-Chateaux, France: Europe Media
P40L13	        Duplication, 1993.
P40L14	
P40L15	   [12] Degener, J., "Digital Speech Compression", Dr. Dobb's Journal,
P40L16	        December 1994.
P40L17	
P40L18	   [13] Redl, S., Weber, M. and M. Oliphant, "An Introduction to GSM",
P40L19	        Boston: Artech House, 1995.
P40L20	
P40L21	   [14] Hoffman, D., Fernando, G., Goyal, V. and M. Civanlar, "RTP
P40L22	        Payload Format for MPEG1/MPEG2 Video", RFC 2250, January 1998.
P40L23	
P40L24	   [15] Jayant, N. and P. Noll, "Digital Coding of Waveforms--Principles
P40L25	        and Applications to Speech and Video", Englewood Cliffs, New
P40L26	        Jersey: Prentice-Hall, 1984.
P40L27	
P40L28	   [16] McKay, K., "RTP Payload Format for PureVoice(tm) Audio", RFC
P40L29	        2658, August 1999.
P40L30	
P40L31	   [17] Perkins, C., Kouvelas, I., Hodson, O., Hardman, V., Handley, M.,
P40L32	        Bolot, J.-C., Vega-Garcia, A. and S. Fosse-Parisis, "RTP Payload
P40L33	        for Redundant Audio Data", RFC 2198, September 1997.
P40L34	
P40L35	   [18] Speer, M. and D. Hoffman, "RTP Payload Format of Sun's CellB
P40L36	        Video Encoding", RFC 2029, October 1996.
P40L37	
P40L38	   [19] Berc, L., Fenner, W., Frederick, R., McCanne, S. and P. Stewart,
P40L39	        "RTP Payload Format for JPEG-Compressed Video", RFC 2435,
P40L40	        October 1998.
P40L41	
P40L42	   [20] Turletti, T. and C. Huitema, "RTP Payload Format for H.261 Video
P40L43	        Streams", RFC 2032, October 1996.
P40L44	
P40L45	   [21] Zhu, C., "RTP Payload Format for H.263 Video Streams", RFC 2190,
P40L46	        September 1997.
P40L47	
P41L1	   [22] Bormann, C., Cline, L., Deisher, G., Gardos, T., Maciocco, C.,
P41L2	        Newell, D., Ott, J., Sullivan, G., Wenger, S. and C. Zhu, "RTP
P41L3	        Payload Format for the 1998 Version of ITU-T Rec. H.263 Video
P41L4	        (H.263+)", RFC 2429, October 1998.
P41L5	
P41L6	   [23] Schulzrinne, H., Rao, A. and R. Lanphier, "Real Time Streaming
P41L7	        Protocol (RTSP)", RFC 2326, April 1998.
P41L8	
P41L9	   [24] Cain, B., Deering, S., Kouvelas, I., Fenner, B. and A.
P41L10	        Thyagarajan, "Internet Group Management Protocol, Version 3",
P41L11	        RFC 3376, October 2002.
P41L12	
P41L13	13. Current Locations of Related Resources
P41L14	
P41L15	   Note:  Several sections below refer to the ITU-T Software Tool
P41L16	   Library (STL).  It is available from the ITU Sales Service, Place des
P41L17	   Nations, CH-1211 Geneve 20, Switzerland (also check
P41L18	   http://www.itu.int).  The ITU-T STL is covered by a license defined
P41L19	   in ITU-T Recommendation G.191, "Software tools for speech and audio
P41L20	   coding standardization".
P41L21	
P41L22	   DVI4
P41L23	
P41L24	   An archived copy of the document IMA Recommended Practices for
P41L25	   Enhancing Digital Audio Compatibility in Multimedia Systems (version
P41L26	   3.0), which describes the IMA ADPCM algorithm, is available at:
P41L27	
P41L28	      http://www.cs.columbia.edu/~hgs/audio/dvi/
P41L29	
P41L30	   An implementation is available from Jack Jansen at
P41L31	
P41L32	      ftp://ftp.cwi.nl/local/pub/audio/adpcm.shar
P41L33	
P41L34	   G722
P41L35	
P41L36	   An implementation of the G.722 algorithm is available as part of the
P41L37	   ITU-T STL, described above.
P41L38	
P41L39	   G723
P41L40	
P41L41	   The reference C code implementation defining the G.723.1 algorithm
P41L42	   and its Annexes A, B, and C is available as an integral part of
P41L43	   Recommendation G.723.1 from the ITU Sales Service, address listed
P41L44	   above.  Both the algorithm and C code are covered by a specific
P41L45	   license.  The ITU-T Secretariat should be contacted to obtain such
P41L46	   licensing information.
P41L47	
P42L1	   G726
P42L2	
P42L3	   G726 is specified in the ITU-T Recommendation G.726, "40, 32, 24, and
P42L4	   16 kb/s Adaptive Differential Pulse Code Modulation (ADPCM)".  An
P42L5	   implementation of the G.726 algorithm is available as part of the
P42L6	   ITU-T STL, described above.
P42L7	
P42L8	   G729
P42L9	
P42L10	   The reference C code implementation defining the G.729 algorithm and
P42L11	   its Annexes A through I is available as an integral part of
P42L12	   Recommendation G.729 from the ITU Sales Service, listed above.  Annex
P42L13	   I contains the integrated C source code for all G.729 operating
P42L14	   modes.  The G.729 algorithm and associated C code are covered by a
P42L15	   specific license.  The contact information for obtaining the license
P42L16	   is available from the ITU-T Secretariat.
P42L17	
P42L18	   GSM
P42L19	
P42L20	   A reference implementation was written by Carsten Bormann and Jutta
P42L21	   Degener (then at TU Berlin, Germany).  It is available at
P42L22	
P42L23	      http://www.dmn.tzi.org/software/gsm/
P42L24	
P42L25	   Although the RPE-LTP algorithm is not an ITU-T standard, there is a C
P42L26	   code implementation of the RPE-LTP algorithm available as part of the
P42L27	   ITU-T STL.  The STL implementation is an adaptation of the TU Berlin
P42L28	   version.
P42L29	
P42L30	   LPC
P42L31	
P42L32	   An implementation is available at
P42L33	
P42L34	      ftp://parcftp.xerox.com/pub/net-research/lpc.tar.Z
P42L35	
P42L36	   PCMU, PCMA
P42L37	
P42L38	   An implementation of these algorithms is available as part of the
P42L39	   ITU-T STL, described above.
P42L40	
P42L41	14. Acknowledgments
P42L42	
P42L43	   The comments and careful review of Simao Campos, Richard Cox and AVT
P42L44	   Working Group participants are gratefully acknowledged.  The GSM
P42L45	   description was adopted from the IMTC Voice over IP Forum Service
P42L46	   Interoperability Implementation Agreement (January 1997).  Fred Burg
P42L47	   and Terry Lyons helped with the G.729 description.
P42L48	
P43L1	15. Intellectual Property Rights Statement
P43L2	
P43L3	   The IETF takes no position regarding the validity or scope of any
P43L4	   intellectual property or other rights that might be claimed to
P43L5	   pertain to the implementation or use of the technology described in
P43L6	   this document or the extent to which any license under such rights
P43L7	   might or might not be available; neither does it represent that it
P43L8	   has made any effort to identify any such rights.  Information on the
P43L9	   IETF's procedures with respect to rights in standards-track and
P43L10	   standards-related documentation can be found in BCP-11.  Copies of
P43L11	   claims of rights made available for publication and any assurances of
P43L12	   licenses to be made available, or the result of an attempt made to
P43L13	   obtain a general license or permission for the use of such
P43L14	   proprietary rights by implementors or users of this specification can
P43L15	   be obtained from the IETF Secretariat.
P43L16	
P43L17	   The IETF invites any interested party to bring to its attention any
P43L18	   copyrights, patents or patent applications, or other proprietary
P43L19	   rights which may cover technology that may be required to practice
P43L20	   this standard.  Please address the information to the IETF Executive
P43L21	   Director.
P43L22	
P43L23	16. Authors' Addresses
P43L24	
P43L25	   Henning Schulzrinne
P43L26	   Department of Computer Science
P43L27	   Columbia University
P43L28	   1214 Amsterdam Avenue
P43L29	   New York, NY 10027
P43L30	   United States
P43L31	
P43L32	   EMail: schulzrinne@cs.columbia.edu
P43L33	
P43L34	
P43L35	   Stephen L. Casner
P43L36	   Packet Design
P43L37	   3400 Hillview Avenue, Building 3
P43L38	   Palo Alto, CA 94304
P43L39	   United States
P43L40	
P43L41	   EMail: casner@acm.org
P43L42	
P44L1	17. Full Copyright Statement
P44L2	
P44L3	   Copyright (C) The Internet Society (2003).  All Rights Reserved.
P44L4	
P44L5	   This document and translations of it may be copied and furnished to
P44L6	   others, and derivative works that comment on or otherwise explain it
P44L7	   or assist in its implementation may be prepared, copied, published
P44L8	   and distributed, in whole or in part, without restriction of any
P44L9	   kind, provided that the above copyright notice and this paragraph are
P44L10	   included on all such copies and derivative works.  However, this
P44L11	   document itself may not be modified in any way, such as by removing
P44L12	   the copyright notice or references to the Internet Society or other
P44L13	   Internet organizations, except as needed for the purpose of
P44L14	   developing Internet standards in which case the procedures for
P44L15	   copyrights defined in the Internet Standards process must be
P44L16	   followed, or as required to translate it into languages other than
P44L17	   English.
P44L18	
P44L19	   The limited permissions granted above are perpetual and will not be
P44L20	   revoked by the Internet Society or its successors or assigns.
P44L21	
P44L22	   This document and the information contained herein is provided on an
P44L23	   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
P44L24	   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
P44L25	   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
P44L26	   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
P44L27	   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
P44L28	
P44L29	Acknowledgement
P44L30	
P44L31	   Funding for the RFC Editor function is currently provided by the
P44L32	   Internet Society.
P44L33	