Óscar del-Ama-Esteban, Sergio Sanz-Rodríguez, Manuel de-Frutos-López,
Fernando Díaz-de-María
A Cauchy-density-based rate controller for
H.264/AVC in low-delay environments
Conference object, Postprint version
This version is available at http://dx.doi.org/10.14279/depositonce-5773.
Suggested Citation
del-Ama-Esteban, Óscar; Sanz-Rodríguez, Sergio; de-Frutos-López, Manuel; Díaz-de-María, Fernando: A
Cauchy-density-based rate controller for H.264/AVC in low-delay Environments. - In: 2009 Picture
Coding Symposium : PCS. - New York, NY [u.a.] : IEEE, 2009. - ISBN: 978-1-4244-4593-6. - pp. 1-4. - DOI:
10.1109/PCS.2009.5167438. (Postprint version is cited, page numbers differ.)
Terms of Use
© 2009 IEEE. Personal use of this material is permitted. Permission from IEEE must be
obtained for all other uses, in any current or future media, including reprinting/republishing
this material for advertising or promotional purposes, creating new collective works, for
resale or redistribution to servers or lists, or reuse of any copyrighted component of this
work in other works.
A CAUCHY-DENSITY-BASED RATE CONTROLLER FOR H.264/AVC
IN LOW-DELAY ENVIRONMENTS
Óscar del-Ama-Esteban, Sergio Sanz-Rodríguez, Manuel de-Frutos-López, Fernando Díaz-de-María
Department of Signal Theory and Communications
Universidad Carlos III, Leganés (Madrid), Spain
ABSTRACT
The accuracy of the Cauchy probability density function for modeling the distribution of discrete cosine transform coefficients has already been demonstrated for the frame layer of the rate control subsystem of a hybrid video coder. Nevertheless, some specific applications operating in real-time low-delay environments call for a basic unit layer, which provides a good trade-off between quality and delay control. In this paper, a novel basic unit bit allocation for H.264/AVC is proposed, based on a simplified Cauchy probability density function source model. The experimental results show that the proposed algorithm improves the average peak signal-to-noise ratio by 0.28 and 0.35 dB with respect to two well-known rate control schemes, while maintaining a similar peak signal-to-noise ratio standard deviation and buffer occupancy evolution.
Index Terms—H.264, low-delay, rate control, bit allocation,
basic unit.
1. INTRODUCTION
The inherent variability of video information means that a video encoder normally produces a variable output bit rate. This rate must be controlled to comply with the nominal network rate required by real-time communications over constant bit rate channels. Therefore, the inclusion of a rate control (RC) algorithm in video encoders is of paramount importance. Although RC is not a normative tool in video coding standards, several schemes have been recommended during the different standardization processes, such as TM5 for MPEG-2 [1], VM8 for MPEG-4 [2], TMN8 for H.263 [3], and AVC-TM for H.264/AVC [4].
In almost any RC algorithm, a virtual buffer is considered at the output of the encoder. This buffer models the behavior of the decoder buffer and absorbs the difference between the nominal bit rate of the network and the variable source bit rate per picture. The RC algorithm must keep the buffer at safe levels, avoiding both overflow and underflow.
To keep the output rate within the buffer limits without degrading visual quality, an RC scheme should assign the most appropriate number of bits and the corresponding quantization step (Q) to each coding unit, according to the buffer status and the picture complexity. Traditional RC schemes select the Q value using an analytical rate-quantization (R-Q) function, which is derived from a source model of the discrete cosine transform (DCT) coefficients. For instance, a Gaussian probability density function (PDF) leads to a logarithmic R-Q function [5]. For a Laplacian PDF, linear [6], quadratic [2] and ρ-domain-based [7] R-Q models have been proposed.
Kamaci et al. demonstrated in [8] the accuracy of the Cauchy PDF for DCT coefficient modeling. Starting from this distribution, they obtained a simple exponential R-Q model. When applied to an RC algorithm for H.264/AVC, this model achieves a mean PSNR improvement over schemes based on the Laplacian PDF.
Like many other algorithms, the one reported in [8] employs a frame layer that assigns the Q value on a frame basis. However, in low-delay applications the buffer size is very restricted and a finer-grained RC is desirable. These scenarios, such as videophone and videoconferencing, are growing in popularity. Consequently, several RC algorithms have been proposed for different video coding standards that employ an additional basic unit (BU) layer, a BU being a group of macroblocks in raster scan order that share the same Q value. In [1], [3], [7] and [9] a single macroblock is proposed as the BU size, while in [4] a larger size is recommended for a better trade-off between quality smoothness and target frame size adjustment.
In this paper, a Cauchy-PDF-based BU bit allocation algorithm is proposed. We show that the exponential R-Q and rate-distortion (R-D) models work properly at the BU layer.
The paper is organized as follows. Section 2 describes the solutions adopted for the group of pictures (GOP), picture and BU layers of the RC. Section 3 presents and analyzes the experimental results of the proposed method in comparison with other popular RC schemes. Finally, some conclusions are drawn in Section 4.
2. PROPOSED CAUCHY-PDF-BASED RATE CONTROL
Since the main contribution of this paper is a new BU layer, several state-of-the-art algorithms have been studied and their solutions adopted for the GOP and picture layers of the proposed RC scheme. These solutions, as well as the BU layer itself, are briefly described in the following subsections.
2.1. GOP Layer
The most common coding pattern in low-delay applications is the IP...P structure, since it involves no structural delay. Moreover, I pictures need many more bits than P or B pictures. A single GOP is therefore used to encode the whole sequence in order to reduce the buffer delay.
The GOP layer computes the QP values for the first picture of each type, $QP^{I}_{0}$ and $QP^{P}_{0}$. These values depend on the average number of target bits per pixel (see [4] for further details).
Before encoding the current $j$th picture, the total number of bits available to encode the remaining pictures, $B(j)$, and the buffer fullness, $V(j)$, are calculated as follows:
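$$B(j) = \begin{cases} J \cdot \dfrac{R}{f}, & j = 1 \\ B(j-1) - t(j-1), & \text{otherwise} \end{cases} \qquad (1)$$
$$V(j) = \begin{cases} 0, & j = 1 \\ V(j-1) + t(j-1) - \dfrac{R}{f}, & \text{otherwise} \end{cases} \qquad (2)$$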
where $J$ represents the number of pictures, $R$ is the target bit rate, $f$ is the frame rate and $t(j-1)$ is the total number of bits used to encode the previous picture.
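The recursions (1) and (2) amount to simple per-picture bookkeeping. The following is a minimal Python sketch, assuming bits are tracked as floats. The names `GopState`, `init_gop` and `update_gop` are hypothetical and are not taken from JM or from [4].

```python
from dataclasses import dataclass


@dataclass
class GopState:
    """Remaining GOP budget B(j) and virtual-buffer fullness V(j), eqs. (1)-(2)."""
    remaining_bits: float   # B(j)
    buffer_fullness: float  # V(j)


def init_gop(num_pictures: int, bit_rate: float, frame_rate: float) -> GopState:
    # j = 1: B(1) = J * R / f and the buffer starts empty, V(1) = 0.
    return GopState(num_pictures * bit_rate / frame_rate, 0.0)


def update_gop(state: GopState, bits_used: float,
               bit_rate: float, frame_rate: float) -> GopState:
    # j > 1: the bits t(j-1) spent on the previous picture leave the budget B,
    # enter the buffer V, and one picture's worth of channel bits R/f drains out.
    return GopState(
        state.remaining_bits - bits_used,
        state.buffer_fullness + bits_used - bit_rate / frame_rate,
    )
```

For instance, take J = 300 pictures at R = 64 kbits/s and f = 25 f/s. Then B(1) = 768000 bits. If the first I picture consumes 12000 bits, B(2) = 756000 and V(2) = 12000 − 2560 = 9440 bits.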
2.2. Picture Layer
At the picture layer, the number of target bits for the $j$th picture, $\hat{T}(j)$, is obtained as a portion of the remaining bits $B(j)$:
$$\hat{T}(j) = \frac{B(j)}{N_{p,r}}, \qquad (3)$$
where $N_{p,r}$ represents the number of remaining pictures.
In low-delay environments, an additional buffer-status-aware bit allocation is highly recommended. It achieves tight control of the buffer level and compensates for the extra bits needed to encode the first I picture. This model obeys:
$$\tilde{T}(j) = \frac{R}{f} + \delta\,\big(S(j) - V(j)\big), \qquad (4)$$
where $\delta$ is a constant set to 0.5, which achieves a sufficiently fast adjustment to the target buffer level without increasing the QP fluctuation. $S(j)$ is the linear model of the target buffer level (see [10] for details). Thus, the total number of target bits to encode the $j$th picture is calculated as a combination of (3) and (4):
$$T(j) = \beta\,\hat{T}(j) + (1-\beta)\,\tilde{T}(j), \qquad (5)$$
where $\beta$ is a constant set to 0.5 as a good trade-off between PSNR and buffer fluctuation. Finally, $T(j)$ is bounded to conform to the hypothetical reference decoder (HRD) constraints [4].
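Equations (3) to (5) combine into a single target computation, sketched below in Python. $S(j)$ is passed in as a number because its linear model is only referenced here [10]. The function name and the optional `lower`/`upper` clamp are hypothetical. The clamp merely stands in for the HRD bounds of [4], which this sketch does not reproduce.

```python
from typing import Optional


def picture_target_bits(remaining_bits: float, remaining_pictures: int,
                        bit_rate: float, frame_rate: float,
                        target_buffer_level: float, buffer_fullness: float,
                        delta: float = 0.5, beta: float = 0.5,
                        lower: Optional[float] = None,
                        upper: Optional[float] = None) -> float:
    # Eq. (3): share of the remaining GOP budget.
    t_hat = remaining_bits / remaining_pictures
    # Eq. (4): channel bits per picture plus a correction toward S(j).
    t_tilde = bit_rate / frame_rate + delta * (target_buffer_level - buffer_fullness)
    # Eq. (5): weighted combination of both targets.
    t = beta * t_hat + (1.0 - beta) * t_tilde
    # Hypothetical placeholder for the HRD bounding of [4].
    if lower is not None:
        t = max(t, lower)
    if upper is not None:
        t = min(t, upper)
    return t
```

As an example, take B(j) = 100000, N_{p,r} = 40, R/f = 2560, S(j) = 4000 and V(j) = 6000. These give T̂(j) = 2500, T̃(j) = 1560 and T(j) = 2030 bits. The buffer being above its target pulls the allocation down.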
2.3. Basic Unit Layer
As pointed out in [3] and [9], picture layer RC schemes are generally not suitable for real-time low-delay applications, where the overall delay between transmitter and receiver is severely constrained. In such schemes, deviations from the frame target bits can cause larger fluctuations of the buffer fullness and, therefore, a higher risk of buffer overflow or underflow.
A BU is a group of macroblocks in raster scan order that share the same quantization value. The number of macroblocks in a BU is set before the encoding process begins. It is generally chosen so that each picture contains an integer number of BUs.
2.3.1. Bit Allocation
We assume that the $j$th picture is composed of $K$ BUs ($K > 1$) and that the previous $k-1$ BUs in the picture have already been encoded. We have a budget of $T_{r,k}(j)$ bits for the remaining BUs ($k, k+1, \ldots, K$), which is calculated as:
$$T_{r,k}(j) = \begin{cases} T(j), & k = 1 \\ T_{r,k-1}(j) - t_{k-1}(j), & \text{otherwise} \end{cases} \qquad (6)$$
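Equation (6) says the budget before each BU is the previous budget minus the bits actually spent on the previous BU. The short Python sketch below makes this bookkeeping concrete. The function name and the list-based interface are hypothetical.

```python
from typing import List


def remaining_bu_budgets(picture_target: float, bits_per_bu: List[float]) -> List[float]:
    """Eq. (6): budget T_{r,k}(j) available before coding BU k = 1..K.

    bits_per_bu[k-1] holds t_k(j), the bits actually spent on BU k.
    """
    budgets = [picture_target]          # k = 1: T_{r,1}(j) = T(j)
    for used in bits_per_bu[:-1]:       # the last BU's spending does not feed a later BU
        budgets.append(budgets[-1] - used)
    return budgets
```

For example, `remaining_bu_budgets(1000.0, [100.0, 200.0, 300.0])` returns `[1000.0, 900.0, 700.0]`.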
where $t_{k-1}(j)$ represents the number of bits used to encode the previous BU. We then want to find the target texture bits $\tilde{b}_l(j)$ for the $l$th BU such that:
$$T_{r,k}(j) = \sum_{l=k}^{K}\left(\tilde{b}_l(j) + \tilde{h}_l(j)\right), \qquad (7)$$
where $\tilde{h}_l(j)$ is a prediction of the number of header and motion data bits of the $l$th remaining BU.
To maintain a consistent video quality within each frame, we propose that the bit allocation for the current BU be subject to:
$$D_l(j) \approx D_m(j), \quad l \neq m, \qquad (8)$$
where $D_l(j)$ represents the distortion of the $l$th BU in the $j$th picture. At the frame level, it is assumed that the same QP value in consecutive pictures produces similar distortion [8], since the global variation between consecutive pictures is usually moderate. At the BU level, however, this assumption no longer holds because neighboring BUs may differ greatly in complexity. Therefore, although we start from the same models, the theoretical approach to bit allocation must differ from that used in [8]. Specifically, we propose to use the simplified R-D function derived from a Cauchy-type model of the DCT coefficient distribution, namely:
$$D(R) \approx c\,R^{-\gamma}, \qquad (9)$$
where $c$ and $\gamma$ are the model parameters. Using this R-D model and the constant distortion (or quality) condition (8), the target texture bits for the $l$th BU, $\tilde{b}_l(j)$, can be obtained from those of the current one, $\tilde{b}_k(j)$, as follows:
$$\tilde{b}_l(j) = c_{l,j}^{1/\gamma_l}\, c_{k,j}^{-1/\gamma_l}\, \tilde{b}_k(j)^{\gamma_k/\gamma_l}. \qquad (10)$$
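For reference, (10) follows in three algebraic steps. First, apply (9) to the $l$th and $k$th BUs, each with its own parameters and with the texture bits as the rate. Then equate the two distortions, as required by (8), and solve for $\tilde{b}_l(j)$:

```latex
c_{l,j}\,\tilde{b}_l(j)^{-\gamma_l} = c_{k,j}\,\tilde{b}_k(j)^{-\gamma_k}
\;\Longrightarrow\;
\tilde{b}_l(j)^{-\gamma_l} = \frac{c_{k,j}}{c_{l,j}}\,\tilde{b}_k(j)^{-\gamma_k}
\;\Longrightarrow\;
\tilde{b}_l(j) = \left(\frac{c_{k,j}}{c_{l,j}}\right)^{-1/\gamma_l}\tilde{b}_k(j)^{\gamma_k/\gamma_l}
= c_{l,j}^{1/\gamma_l}\,c_{k,j}^{-1/\gamma_l}\,\tilde{b}_k(j)^{\gamma_k/\gamma_l}
```

When $c_{l,j} = c_{k,j}$ and $\gamma_l = \gamma_k$, this reduces to $\tilde{b}_l(j) = \tilde{b}_k(j)$, that is, an equal split among BUs.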
As mentioned before, the complexity of neighboring BUs may be significantly different in pictures with high spatial heterogeneity. Thus, the assumptions $c_{l,j} = c_{k,j}$ and $\gamma_l = \gamma_k$, with $l \neq k$, are no longer valid. Combining (7) and (10), the following BU bit allocation is obtained:
$$T_{r,k}(j) = \sum_{l=k}^{K}\left(c_{l,j}^{1/\gamma_l}\, c_{k,j}^{-1/\gamma_l}\, \tilde{b}_k(j)^{\gamma_k/\gamma_l} + \tilde{h}_l(j)\right). \qquad (11)$$
Finally, assuming the same number of header and motion data bits for the $N_r = K - k + 1$ remaining BUs within the picture, (11) can be rewritten as:
$$T_{r,k}(j) = \sum_{l=k}^{K} c_{l,j}^{1/\gamma_l}\, c_{k,j}^{-1/\gamma_l}\, \tilde{b}_k(j)^{\gamma_k/\gamma_l} + N_r\,\tilde{h}_k(j), \qquad (12)$$
where $\tilde{h}_k(j)$ can be obtained as the average number of header and motion data bits used to encode the previous $k-1$ BUs. Newton's method can be used to iteratively obtain a solution for $\tilde{b}_k(j)$.
2.3.2. Quantization Parameter Estimation
When the $j$th picture is to be encoded, the quantization parameter (QP) value for the $k$th BU, $QP_k(j)$, is obtained from the corresponding $Q_k(j)$. This value is computed using the exponential R-Q model described in [8], adapted to the BU level:
$$Q_k(j) = \left(\frac{\tilde{b}_k(j)}{a_{k,j}}\right)^{-1/\alpha_k}, \qquad (13)$$
where $a_{k,j}$ and $\alpha_k$ are model parameters. Finally, the $QP_k(j)$ value is bounded at the BU and picture levels as stated in [4], in order to avoid noticeable quality variations within the frame.
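A minimal Python sketch of (13) followed by a Q-to-QP conversion is shown below. It uses the standard H.264/AVC quantizer step sizes, in which the step doubles every six QP values. Picking the nearest QP in the log domain is our assumption, not necessarily how JM performs the conversion. The optional `qp_min`/`qp_max` arguments are a hypothetical stand-in for the BU- and picture-level bounds of [4], which are not reproduced here.

```python
import math

# H.264/AVC quantizer step sizes for QP 0..5; the step doubles every 6 QP values.
_QSTEP_BASE = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp: int) -> float:
    return _QSTEP_BASE[qp % 6] * (1 << (qp // 6))


def bu_qstep(texture_bits: float, a: float, alpha: float) -> float:
    # Eq. (13): invert R = a * Q**(-alpha), giving Q = (R / a)**(-1 / alpha).
    return (texture_bits / a) ** (-1.0 / alpha)


def qstep_to_qp(q: float, qp_min: int = 0, qp_max: int = 51) -> int:
    # Nearest QP in the log domain within [qp_min, qp_max].
    return min(range(qp_min, qp_max + 1),
               key=lambda qp: abs(math.log(qstep(qp) / q)))
```

For example, with $a_{k,j} = 32000$ and $\alpha_k = 1$, a target of 2000 texture bits yields $Q_k(j) = 16$. This maps exactly to QP 28.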
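Table 1. $\alpha_k$ and $\gamma_k$ coefficients update.
| $B_{pp,k}$ | $\alpha_k$ | $B_{pp,k}$ | $\gamma_k$ |
|---|---|---|---|
| < 0.05 | 1.60 | < 0.07 | 0.50 |
| 0.05–0.10 | 1.40 | 0.07–0.20 | 0.70 |
| > 0.10 | 1.20 | > 0.20 | 1.00 |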
2.3.3. Rate-Quantization and Rate-Distortion Models Update
After encoding the $k$th BU of the $j$th inter picture, the coefficients $a_{k,j+1}$ and $c_{k,j+1}$ are updated as follows:
$$a_{k,j+1} = b_k(j)\,\tilde{Q}_k(j)^{\alpha_k}, \qquad (14)$$
$$c_{k,j+1} = d_k(j)\,b_k(j)^{\gamma_k}, \qquad (15)$$
where $b_k(j)$ is the number of texture bits used to encode the $k$th BU, and $d_k(j)$ is the sum of squared errors between the original and reconstructed luminance pixels in the BU. $\tilde{Q}_k(j)$ is the Q value obtained from $QP_k(j)$, the QP value for the $k$th BU of the $j$th frame. After the first inter picture in the sequence is encoded, Table 1 is used to initialize the parameters $\alpha_k$ and $\gamma_k$. The initialization depends on the average number of texture bits per pixel of the $k$th BU, $B_{pp,k}$. The thresholds used to set the initial $\alpha_k$ values have been obtained from [8].
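The updates (14) and (15) are just the R-Q and R-D models solved for $a$ and $c$ at the observed operating point. The Python sketch below implements them together with the Table 1 initialization, under two assumptions. The function names are hypothetical. Values lying exactly on a Table 1 boundary (0.05, 0.10, 0.07, 0.20) are assigned to the middle row, because the table does not specify this.

```python
from typing import Tuple


def update_bu_models(texture_bits: float, qstep_used: float, ssd: float,
                     alpha: float, gamma: float) -> Tuple[float, float]:
    """Eqs. (14)-(15): new a_k and c_k after coding the k-th BU."""
    a_next = texture_bits * qstep_used ** alpha   # from R = a * Q**(-alpha)
    c_next = ssd * texture_bits ** gamma          # from D = c * R**(-gamma)
    return a_next, c_next


def init_alpha_gamma(bits_per_pixel: float) -> Tuple[float, float]:
    """Table 1: initial alpha_k and gamma_k from the BU's texture bits per pixel."""
    if bits_per_pixel < 0.05:
        alpha = 1.60
    elif bits_per_pixel <= 0.10:
        alpha = 1.40
    else:
        alpha = 1.20
    if bits_per_pixel < 0.07:
        gamma = 0.50
    elif bits_per_pixel <= 0.20:
        gamma = 0.70
    else:
        gamma = 1.00
    return alpha, gamma
```

Both updates are single closed-form evaluations per BU, with no regression over past BUs.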
3. EXPERIMENTS AND RESULTS
The proposed BU layer RC algorithm was implemented in the Joint Video Team (JVT) reference software, version JM 10.2 [11]. To assess its performance, it was compared to two Laplacian-PDF-based schemes: the RC algorithm adopted by the JVT [4] and the one proposed in [12]. Common CIF and QCIF video sequences, as well as concatenations of them, were used in the experiments.
A row of macroblocks was used as the BU size to achieve a good trade-off among quality, buffer fluctuation and computational cost. The complete test configuration is as follows:
• QCIF sequences: "MobileCalendar", "MotherDaughter", "News", "Salesman", "Silent" and "Paris-Football"
• CIF sequences: "Coastguard", "Container", "Football", "Foreman", "Paris" and "Bus-Foreman"
• Profile: baseline
• Number of pictures: 300
• Frame rate: f = 25 frames/s
• Target rates (QCIF/CIF): R = 32/128, 64/256, 96/512, 128/1024 kbits/s
• Buffer size: 500 ms
• R-D optimization: disabled
• Symbol mode: CAVLC
Tables 2 and 3 show the luminance peak signal-to-noise ratio (PSNR), in terms of mean and standard deviation, and the output bit rate for some QCIF and CIF sequences, respectively. The proposed algorithm achieves an average PSNR improvement of 0.35 dB over the RC scheme reported in [12], since the latter does not take the complexity of the BUs into account when assigning target bits. Furthermore, we achieve a 0.28 dB average PSNR improvement over the default RC algorithm of JM 10.2 [4]. That algorithm employs a BU bit allocation based on a complexity ratio that takes into account the remaining BUs of a picture. Moreover, the results in terms of PSNR standard deviation and output bit rate are similar to those of the reference schemes.
Table 2. PSNR (mean and standard deviation) and output rate for several QCIF sequences and target rates.
| Sequence | Target rate (kbits/s) | Algorithm on JM 10.2 | Mean PSNR (dB) | PSNR deviation (dB) | Output rate (kbits/s) |
|---|---|---|---|---|---|
| News | 64 | [12] | 36.22 | 0.82 | 64.16 |
| News | 64 | [4] | 36.41 | 0.81 | 64.12 |
| News | 64 | Proposed | 36.91 | 0.95 | 64.05 |
| News | 128 | [12] | 40.95 | 1.08 | 128.24 |
| News | 128 | [4] | 41.02 | 0.98 | 128.17 |
| News | 128 | Proposed | 41.42 | 1.10 | 128.14 |
| Paris-Football | 64 | [12] | 27.29 | 2.82 | 64.05 |
| Paris-Football | 64 | [4] | 27.27 | 2.81 | 63.13 |
| Paris-Football | 64 | Proposed | 27.68 | 2.95 | 64.03 |
| Paris-Football | 128 | [12] | 31.55 | 3.76 | 127.98 |
| Paris-Football | 128 | [4] | 31.52 | 3.67 | 128.12 |
| Paris-Football | 128 | Proposed | 31.89 | 3.91 | 127.99 |
Table 3. PSNR (mean and standard deviation) and output rate for several CIF sequences and target rates.
| Sequence | Target rate (kbits/s) | Algorithm on JM 10.2 | Mean PSNR (dB) | PSNR deviation (dB) | Output rate (kbits/s) |
|---|---|---|---|---|---|
| Bus-Foreman | 256 | [12] | 31.32 | 3.92 | 256.29 |
| Bus-Foreman | 256 | [4] | 31.48 | 3.78 | 256.19 |
| Bus-Foreman | 256 | Proposed | 31.49 | 3.75 | 256.08 |
| Bus-Foreman | 1024 | [12] | 37.65 | 3.58 | 1024.79 |
| Bus-Foreman | 1024 | [4] | 37.92 | 3.31 | 1024.35 |
| Bus-Foreman | 1024 | Proposed | 37.91 | 3.31 | 1024.19 |
| Paris | 256 | [12] | 31.76 | 0.59 | 256.18 |
| Paris | 256 | [4] | 31.95 | 0.64 | 256.09 |
| Paris | 256 | Proposed | 32.57 | 0.81 | 256.07 |
| Paris | 1024 | [12] | 40.29 | 0.69 | 1024.71 |
| Paris | 1024 | [4] | 40.91 | 0.80 | 1024.21 |
| Paris | 1024 | Proposed | 41.19 | 0.89 | 1023.95 |
Finally, the PSNR and encoder buffer occupancy versus picture number are plotted in Figs. 1 and 2, respectively. The proposed method achieves good quality, with a buffer occupancy evolution similar to those of [12] and [4]. Furthermore, in sequences with scene cuts, such as "Paris-Football" and "Bus-Foreman", our proposal adapts faster to the complexity of a new scene.
Fig. 1. PSNR evolution for several video sequences and target rates.
4. CONCLUSIONS
In this paper, a BU layer RC algorithm has been proposed for H.264/AVC. Starting from a simplified Cauchy PDF model of the DCT coefficient distribution, a novel bit allocation has been developed and the exponential R-Q model has been applied on a BU basis. To assess its performance, the RC was integrated into an H.264/AVC encoder configured for a real-time low-delay application. It was then compared to two Laplacian-PDF-based BU layer RC schemes. The experimental results show an average PSNR improvement over the default RC algorithm of JM 10.2 [4] and the algorithm described in [12], with similar buffer occupancy evolution. Furthermore, our proposal achieves a similar PSNR standard deviation and output bit rate.
Another characteristic of the proposed RC algorithm is its low computational cost. The coefficients $a_k$ and $c_k$ of the exponential R-Q and R-D models associated with each BU in the picture are updated through the simple closed-form expressions (14) and (15). In contrast, the Laplacian-PDF-based method used in the reference schemes updates the parameters of both the quadratic R-Q model and the complexity prediction model. It does so by means of a regression analysis of previously encoded BUs, with the consequent overhead.
Fig. 2. Encoder buffer occupancy evolution for several video sequences and target rates.
5. REFERENCES
[1] Test Model 5, http://www.mpeg.org/MPEG/MSSG/tm5.
[2] T. Chiang and Y.-Q. Zhang, "A new rate control scheme using quadratic rate distortion model," in Proc. IEEE International Conference on Image Processing, 1996, vol. 2, pp. 73–76.
[3] J. Ribas-Corbera and S. Lei, "Rate control in DCT video coding for low-delay communications," IEEE Transactions on Circuits and Systems for Video Technology, vol. 9, no. 1, pp. 172–185, 1999.
[4] S. Ma, Z. Li, and F. Wu, "Proposed draft of adaptive rate control," JVT-H017, 8th JVT Meeting, Geneva, Switzerland, May 2003.
[5] B. Tao, B. W. Dickinson, and H. A. Peterson, "Adaptive model-driven bit allocation for MPEG video coding," IEEE Transactions on Circuits and Systems for Video Technology, vol. 10, no. 1, pp. 147–157, Feb. 2000.
[6] S. Ma, W. Gao, and Y. Lu, "Rate-distortion analysis for H.264/AVC video coding and its application to rate control," IEEE Transactions on Circuits and Systems for Video Technology, vol. 15, no. 12, pp. 1533–1544, 2005.
[7] Z. He, Y. K. Kim, and S. K. Mitra, "Low-delay rate control for DCT video coding via ρ-domain source modeling," IEEE Transactions on Circuits and Systems for Video Technology, vol. 11, no. 8, pp. 928–940, 2001.
[8] N. Kamaci, Y. Altunbasak, and R. M. Mersereau, "Frame bit allocation for the H.264/AVC video coder via Cauchy-density-based rate and distortion models," IEEE Transactions on Circuits and Systems for Video Technology, vol. 15, no. 8, pp. 994–1006, 2005.
[9] M. Jiang and N. Ling, "Low-delay rate control for real-time H.264/AVC video coding," IEEE Transactions on Multimedia, vol. 8, no. 3, pp. 467–477, 2006.
[10] S. Ma, W. Gao, P. Gao, and Y. Lu, "Rate control for advance video coding (AVC) standard," in Proc. 2003 International Symposium on Circuits and Systems (ISCAS '03), 2003, vol. 2, pp. II-892–II-895.
[11] JM 10.2, http://iphome.hhi.de/suehring/tml/download/old_jm/.
[12] Z. Li, W. Gao, F. Pan, S. Ma, G. N. Feng, K. P. Lim, X. Lin, S. Rahardja, and H. Q. Lu, "Adaptive rate control with HRD consideration," JVT-H014, 8th JVT Meeting, Geneva, Switzerland, May 2003.