Sergio Sanz-Rodríguez, Óscar del-Ama-Esteban, Manuel de-Frutos-López,
Fernando Díaz-de-María
Cauchy-density-based basic unit layer rate
controller for H.264/AVC
Article, Postprint version
This version is available at http://dx.doi.org/10.14279/depositonce-5747.
Suggested Citation
Sanz-Rodríguez, S.; del-Ama-Esteban, Ó.; de-Frutos-López, M.; Díaz-de-María, F.: Cauchy-density-based basic unit layer rate controller for H.264/AVC. - In: IEEE transactions on circuits and systems for video technology : a publication of the Circuits and Systems Society. - ISSN: 1558-2205 (online), 1051-8215 (print). - 20 (2010), 8. - pp. 1139–1143. - DOI: 10.1109/TCSVT.2010.2051369. (Postprint version is cited, available at http://dx.doi.org/10.14279/depositonce-5747)
Terms of Use
© 2010 IEEE. Personal use of this material is permitted. Permission from IEEE must be
obtained for all other uses, in any current or future media, including reprinting/republishing
this material for advertising or promotional purposes, creating new collective works, for
resale or redistribution to servers or lists, or reuse of any copyrighted component of this
work in other works.
Cauchy-Density-Based Basic Unit Layer
Rate Controller for H.264/AVC
Sergio Sanz-Rodríguez, Student Member, IEEE, Óscar del-Ama-Esteban, Manuel de-Frutos-López, Student Member, IEEE, Fernando Díaz-de-María, Member, IEEE
Abstract—The rate control problem has been extensively studied in parallel with the development of the different video coding standards. Bit allocation via Cauchy-density-based rate-distortion (R-D) modeling of the discrete cosine transform (DCT) coefficients has proved to be one of the most accurate solutions at picture level. Nevertheless, in some specific applications operating in real-time low-delay environments, a basic unit (BU) layer is recommended in order to provide a good trade-off between picture quality and delay control. In this paper, a novel BU bit allocation for H.264/AVC is proposed based on a simplified Cauchy probability density function (PDF) source modeling. The experimental results are twofold: 1) the proposed rate control algorithm (RCA) achieves an average PSNR improvement of 0.28 dB with respect to a well-known BU layer RCA, while maintaining a similar buffer occupancy evolution; and 2) it notably reduces the buffer occupancy fluctuations with respect to a well-known picture layer RCA, while maintaining similar quality levels.
Index Terms—H.264/AVC, low-delay, rate control, basic unit,
bit allocation
I. INTRODUCTION
THE inherent variability of video information implies
a variable output bit rate, which must be
controlled in order to comply with the requirements of real-
time video communications through constant bit rate (CBR)
channels. For this purpose, several schemes for rate control
have been recommended during the standardization process of
video codecs, such as TM5 for MPEG-2 [1], VM8 for MPEG-
4 [2], TMN8 for H.263 [3], and AVC-TM for H.264/AVC [4].
The hypothetical reference decoder (HRD), a normative part
of the standard that describes a set of requirements to transmit
and decode bit streams, is taken into account in order to restrict
the target bit assignment per picture by means of two (lower
and upper) bounds for a virtual buffer fullness. The RCA must
keep the buffer at safe levels, avoiding both overflow and
underflow. Since the decoder must wait for enough
bits to begin the decoding process of a frame, the final delay
is directly related to the buffer size.
In order to maintain the output rate within the buffer lim-
its without noticeable visual quality degradation, the RCA
must assign the most appropriate amount of bits and the
corresponding quantization step (Q) to each coding unit. The
traditional RCAs select this Q value using an analytical rate-quantization (R-Q) function, which is derived from a source model of the DCT coefficients. For instance, using a Gaussian PDF, a logarithmic R-Q function can be inferred [5]. On the other hand, considering a Laplacian PDF, different linear [6], quadratic [2] or ρ-domain-based [7] R-Q models have been proposed. Finally, assuming a Cauchy PDF, Kamaci et al. [8] obtain a simple exponential R-Q model which outperforms the previous algorithms.
The authors are with the Department of Signal Theory and Communications, Universidad Carlos III de Madrid, Leganés, Madrid 28911, Spain (e-mail: {sescalona, oaesteban, mfrutos, fdiaz}@tsc.uc3m.es).
In the particular case of low-delay applications, where the buffer size is very small, the selection of Q on a frame basis is not accurate enough, and a finer rate control is desirable. Due to the growing popularity of these scenarios, such as videophone or videoconferencing, several RCAs have been proposed which employ an additional BU layer, where a BU is defined as a group of macroblocks in raster scan order which share the same Q value. In [1], [3], [7] and [9] a single macroblock is proposed as BU size, while in [4] a larger size is recommended for a better trade-off between quality smoothness and target frame size adjustment.
In this paper, a Cauchy-PDF-based BU rate controller is de-
rived, showing that exponential R-Q and R-D models, similar
to those applied in [8] on a frame basis, work properly at the
BU layer.
The paper is organized as follows. In Section II, the proposed
BU layer RCA for H.264/AVC is described. In Section III,
a pseudocode summary of the proposed algorithm is given.
Experimental results are shown and analyzed in Section IV.
Finally, in Section V some conclusions are drawn.
II. BASIC UNIT LAYER RATE CONTROLLER
In real-time low-delay applications, the feasible buffer size
is quite limited. As pointed out in [3] and [9], picture layer RCAs are generally not suitable for this kind of environment, since misadjustments in the frame target bits increase the risk of buffer overflow or underflow. Therefore, a fine quantization parameter (QP) variation within each frame is recommended for a better adjustment to the frame target bits.
In the following subsections we describe the proposed Cauchy-
PDF-based RCA for low-delay environments. It consists of
three layers: the group of pictures (GOP) and picture layers
are similar to those described in [4] and [8], while the BU layer
is novel and entails a bit allocation algorithm, the consequent
QP estimation and, finally, the R-Q and R-D model update.
A. GOP Layer
This layer computes the QP value for the first picture of each type, $(QP^I_0, QP^P_0, QP^B_0)$, which depends on the average number of target bits per pixel. The amount of remaining bits for the $j$th picture in the $i$th GOP is determined as follows:

$$B_i(j) = \begin{cases} \dfrac{J \cdot R}{f} - V_i(j), & \text{if } j = 1 \\ B_i(j-1) - t_i(j-1), & \text{otherwise,} \end{cases} \tag{1}$$

where $J$ represents the number of pictures in a GOP, $R$ is the target bit rate, $f$ is the frame rate, $V_i(j)$ is the current encoder buffer fullness, and $t_i(j-1)$ is the number of bits used to encode the last picture.
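As a minimal sketch of the bookkeeping in (1), the following Python function tracks the remaining GOP budget. The function and argument names are illustrative and do not come from the reference software; the empty buffer and the 12 000 bits spent on the first picture are assumptions made only for the example.

```python
def remaining_gop_bits(j, J, R, f, V=0.0, prev_B=None, prev_t=None):
    """Remaining bits B_i(j) for picture j (1-based) of a GOP, as in (1)."""
    if j == 1:
        # First picture: whole GOP budget minus the current buffer fullness.
        return J * R / f - V
    if prev_B is None or prev_t is None:
        raise ValueError("prev_B and prev_t are required when j > 1")
    # Later pictures: subtract the bits spent on the previous picture.
    return prev_B - prev_t


# Example: 300-picture GOP, 64 kbits/s, 25 frames/s (values from Section IV),
# with an empty buffer (assumption) and 12 000 bits spent on picture 1 (assumption).
B1 = remaining_gop_bits(1, J=300, R=64_000, f=25, V=0.0)
B2 = remaining_gop_bits(2, J=300, R=64_000, f=25, prev_B=B1, prev_t=12_000)
print(B1, B2)
```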
B. Picture Layer
Similarly to [4], the bit allocation algorithm at picture level consists of two main terms, one modeling the target bits and the other the buffer status, and obeys the following expression:

$$T_i(j) = \begin{cases} \hat{T}_i(j), & \text{if I picture} \\ (1-\beta)\,\widetilde{T}_i(j) + \beta\,\hat{T}_i(j), & \text{otherwise,} \end{cases} \tag{2}$$

with

$$\hat{T}_i(j) = \tilde{b}_i(j) + \tilde{h}_i(j), \tag{3}$$

and

$$\widetilde{T}_i(j) = \frac{R}{f} + \delta\,\bigl(S_i(j) - V_i(j)\bigr). \tag{4}$$

The first term, $\hat{T}_i(j)$, is a target bit model of the remaining bits in a GOP, $B_i(j)$. The amount of target texture bits of the $j$th frame in the $i$th GOP, $\tilde{b}_i(j)$, is computed following [8]. The number of header and motion data bits, $\tilde{h}_i(j)$, is predicted by computing an average of the header and motion data bits of the previously encoded pictures in the GOP.
The second term, $\widetilde{T}_i(j)$, is derived from the buffer occupancy, $V_i(j)$. $S_i(j)$ is the target encoder buffer level after encoding the $j$th picture in the GOP, which is computed either by means of a linear model for IP...P GOP patterns or a saw-tooth shaped model [10] for GOP patterns with B pictures. $\delta$ is chosen as a trade-off between QP variation and target buffer level adaptation, and $\beta$ determines the relative importance of the two bit allocation models.
Finally, $T_i(j)$ is bounded to satisfy the HRD constraints (see [4] for further details).
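The combination in (2)-(4) can be sketched in a few lines of Python. This is only an illustration: the function name and the numeric inputs are assumptions, the texture target `b_tex` is taken as given (the paper computes it following [8]), and the final HRD bounding step is omitted.

```python
def picture_target_bits(is_intra, b_tex, h_hdr, R, f, S, V, beta=0.5, delta=0.5):
    """Picture-level target bits T_i(j) following (2)-(4), without HRD bounding."""
    T_hat = b_tex + h_hdr                  # (3): texture plus header/motion bits
    if is_intra:
        return T_hat                       # (2), I picture
    T_tilde = R / f + delta * (S - V)      # (4): buffer-driven term
    return (1.0 - beta) * T_tilde + beta * T_hat   # (2), P/B picture


# Hypothetical P picture at 64 kbits/s and 25 frames/s: the buffer sits
# 1000 bits above its target level, so the buffer term asks for fewer bits.
T = picture_target_bits(False, b_tex=2000, h_hdr=400, R=64_000, f=25,
                        S=9_600, V=10_600)
print(T)
```

The defaults `beta=0.5` and `delta=0.5` are the values reported in Section IV.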
C. Basic Unit Bit Allocation
As mentioned in Section I, a BU is a group of macroblocks in raster scan order which share the same quantization value. Therefore, the set of possible BU sizes, generally an integer fraction of the total number of macroblocks in a picture, goes from one macroblock to an entire frame.
We assume that the $j$th picture in the $i$th GOP is composed of $K$ BUs ($K > 1$), and that the previous $k-1$ BUs in the picture have already been encoded. We have a budget of $T_{r,i}(j)$ bits to encode the remaining BUs $(k, k+1, \ldots, K)$, which is calculated as:

$$T_{r,i}(j) = \begin{cases} T_i(j), & \text{if } k = 1 \\ T_{r,i}(j) - t_{k-1,i}(j), & \text{otherwise,} \end{cases} \tag{5}$$

where $t_{k-1,i}(j)$ represents the number of bits used to encode the previous basic unit. Then, we want to find the $\tilde{b}_{l,i}(j)$ target texture bits for the $l$th BU so that:

$$T_{r,i}(j) = \sum_{l=k}^{K} \left[ \tilde{b}_{l,i}(j) + \tilde{h}_{l,i}(j) \right], \tag{6}$$

where $\tilde{h}_{l,i}(j)$ is a prediction of the number of header and motion data bits of the remaining basic units.
In order to maintain consistent video quality within the frames, the bit allocation for the current BU is subject to:

$$D_{l,i}(j) \approx D_{m,i}(j), \quad \forall\, l \neq m, \tag{7}$$

where $D_{l,i}(j)$ represents the distortion of the $l$th BU in the $j$th picture. In [8] it is assumed that the same QP value in consecutive pictures produces similar distortion, since the global variation between consecutive pictures is usually moderate. However, at BU level, this assumption is no longer valid because of the potentially large differences in complexity between neighboring BUs. Thus, the theoretical approach differs from that used in [8]. Specifically, we propose to use the simplified R-D function derived from a Cauchy-type model of the DCT coefficient distribution, i.e.:

$$D(R) \approx c\,R^{-\gamma}, \tag{8}$$
where $c$ and $\gamma$ are the model parameters. Using this R-D model and the constant quality condition (7), the target texture bits for the $l$th basic unit, $\tilde{b}_{l,i}(j)$, can be obtained from those of the current one, $\tilde{b}_{k,i}(j)$, as follows:

$$\tilde{b}_{l,i}(j) = c_{l,j}^{\frac{1}{\gamma_{l,j}}}\; c_{k,j}^{-\frac{1}{\gamma_{l,j}}}\; \tilde{b}_{k,i}(j)^{\frac{\gamma_{k,j}}{\gamma_{l,j}}}. \tag{9}$$
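To make the step from (7) and (8) to (9) explicit, the following short derivation may help. The indices $i$ and $j$ are dropped for brevity, and the approximation in (7) is treated as an equality, which is an assumption of this sketch.

```latex
\begin{align*}
c_l\,\tilde{b}_l^{-\gamma_l} &= c_k\,\tilde{b}_k^{-\gamma_k}
  && \text{(equal distortion, from (7) and (8))} \\
\tilde{b}_l^{\gamma_l} &= \frac{c_l}{c_k}\,\tilde{b}_k^{\gamma_k}
  && \text{(invert both sides and multiply by } c_l\text{)} \\
\tilde{b}_l &= c_l^{1/\gamma_l}\; c_k^{-1/\gamma_l}\; \tilde{b}_k^{\gamma_k/\gamma_l}
  && \text{(raise to the power } 1/\gamma_l\text{)}
\end{align*}
```

When $c_l = c_k$ and $\gamma_l = \gamma_k$ the last line reduces to $\tilde{b}_l = \tilde{b}_k$, i.e. a uniform split of the texture budget.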
The model parameters $c_{l,j}$ and $\gamma_{l,j}$ can take different values according to the type of picture, as described in Subsection II-E. As mentioned before, the complexity of neighboring BUs may be significantly different in pictures with large spatial heterogeneity. Thus, the assumptions $c_{l,j} = c_{k,j}$ and $\gamma_{l,j} = \gamma_{k,j}$, with $l \neq k$, are no longer valid. Therefore, combining the expressions (6) and (9), the following BU bit allocation is obtained:

$$T_{r,i}(j) = \sum_{l=k}^{K} \left[ c_{l,j}^{\frac{1}{\gamma_{l,j}}}\; c_{k,j}^{-\frac{1}{\gamma_{l,j}}}\; \tilde{b}_{k,i}(j)^{\frac{\gamma_{k,j}}{\gamma_{l,j}}} + \tilde{h}_{l,i}(j) \right]. \tag{10}$$
Finally, assuming the same header and motion data bits for the $N_r$ remaining BUs within the picture, the expression (10) can be rewritten as:

$$T_{r,i}(j) = \sum_{l=k}^{K} c_{l,j}^{\frac{1}{\gamma_{l,j}}}\; c_{k,j}^{-\frac{1}{\gamma_{l,j}}}\; \tilde{b}_{k,i}(j)^{\frac{\gamma_{k,j}}{\gamma_{l,j}}} + N_r\,\tilde{h}_{k,i}(j), \tag{11}$$

where $\tilde{h}_{k,i}(j)$ can be obtained as the average number of header and motion data bits used to encode the previous $k-1$ basic units. Newton's method can be applied to (11) to iteratively obtain a solution for $\tilde{b}_{k,i}(j)$.
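TABLE I
$\alpha_{k,P/B}$ COEFFICIENT UPDATE.

Range of $B_{pp,k}$              $\alpha_{k,P}$   $\alpha_{k,B}$
$B_{pp,k} < 0.05$                1.60             2.00
$0.05 \le B_{pp,k} < 0.10$       1.40             1.80
$0.10 \le B_{pp,k}$              1.20             1.60

TABLE II
$\gamma_{k,P/B}$ COEFFICIENT UPDATE.

Range of $B_{pp,k}$              $\gamma_{k,P}$   $\gamma_{k,B}$
$B_{pp,k} < 0.07$                0.50             0.45
$0.07 \le B_{pp,k} < 0.13$       0.70             0.60
$0.13 \le B_{pp,k} < 0.20$       0.70             0.75
$0.20 \le B_{pp,k}$              1.00             0.75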
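Tables I and II amount to two threshold lookups, which can be sketched in Python as follows. The names are illustrative, and the treatment of values that fall exactly on a threshold (assigned to the higher range) is an assumption.

```python
from bisect import bisect_right

# Thresholds on B_pp,k and the coefficient for each range (Tables I and II).
ALPHA_EDGES = [0.05, 0.10]
ALPHA = {"P": [1.60, 1.40, 1.20], "B": [2.00, 1.80, 1.60]}
GAMMA_EDGES = [0.07, 0.13, 0.20]
GAMMA = {"P": [0.50, 0.70, 0.70, 1.00], "B": [0.45, 0.60, 0.75, 0.75]}


def lookup_alpha_gamma(bpp, picture_type):
    """Return (alpha, gamma) for a BU with `bpp` texture bits per pixel.

    picture_type is "P" or "B". A value equal to a threshold is assigned to
    the higher range (assumption).
    """
    alpha = ALPHA[picture_type][bisect_right(ALPHA_EDGES, bpp)]
    gamma = GAMMA[picture_type][bisect_right(GAMMA_EDGES, bpp)]
    return alpha, gamma


print(lookup_alpha_gamma(0.08, "B"))
```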
D. Quantization Parameter Estimation
When the $j$th P or B picture within the $i$th GOP is to be encoded, the quantization parameter value for the $k$th basic unit, $QP_{k,i}(j)$, an integer value between 0 and 51, is obtained from the corresponding quantization step, $Q_{k,i}(j)$, which is computed using the exponential model:

$$Q_{k,i}(j) = \left( \frac{\tilde{b}_{k,i}(j)}{a_{k,j}} \right)^{-\frac{1}{\alpha_{k,j}}}. \tag{12}$$
The remaining operations for QP selection at BU layer such
as:
- the QP value to be assigned to the first BU in the picture,
- the QP increment to be applied when the frame bit budget
is over and there are still some BUs to be encoded,
- and the clipping algorithms to bound the final QP value,
are the same as those proposed in [4].
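The mapping from target texture bits to a QP value through (12) can be sketched as follows. The conversion from quantization step to QP uses the widely quoted approximation $Q \approx 2^{(QP-4)/6}$ (the step doubles every 6 QP units), which is an assumption of this sketch; an encoder would use the exact step table of the standard, and the first-BU rule, QP increment and clipping taken from [4] are not modeled here.

```python
import math


def qstep_from_bits(b_target, a, alpha):
    """Quantization step from the exponential R-Q model, as in (12)."""
    if b_target <= 0:
        raise ValueError("b_target must be positive")
    return (b_target / a) ** (-1.0 / alpha)


def qp_from_qstep(q_step):
    """Nearest integer QP in [0, 51], assuming Qstep ~= 2 ** ((QP - 4) / 6)."""
    qp = round(4 + 6 * math.log2(q_step))
    return max(0, min(51, qp))


# Hypothetical BU: a = 8000, alpha = 1.5, target of 1000 texture bits.
q = qstep_from_bits(1000.0, a=8000.0, alpha=1.5)
qp = qp_from_qstep(q)
print(q, qp)
```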
For intra pictures other than the first one, QP estimation at BU layer is not recommended, since object movements may reduce the correlation between co-located BUs when the distance between intra pictures is large. Consequently, the picture layer RCA described in [8] is used to obtain a single quantization parameter value to encode all BUs in these intra pictures.
E. Rate-Quantization and Rate-Distortion Model Update
After encoding the $k$th BU of the $j$th inter picture, the coefficients $a_{k,P/B}$ and $c_{k,P/B}$ are updated as follows:

$$a_{k,P/B} = b_{k,i}(j)\,\widetilde{Q}_{k,i}(j)^{\alpha_{k,P/B}}, \tag{13}$$

$$c_{k,P/B} = d_{k,i}(j)\,b_{k,i}(j)^{\gamma_{k,P/B}}, \tag{14}$$

where $b_{k,i}(j)$ is the number of texture bits used to encode the $k$th basic unit, and $d_{k,i}(j)$ is the sum of squared errors between the original and reconstructed luminance pixels in the basic unit. $\widetilde{Q}_{k,i}(j)$ is the quantization step value obtained from $QP_{k,i}(j)$, the quantization parameter for the $k$th BU of the $j$th frame in the $i$th GOP. Furthermore, if the P/B picture is the first one in the GOP, Tables I and II are employed to update the parameters $\alpha_{k,P/B}$ and $\gamma_{k,P/B}$ according to the average number of texture bits per pixel of the $k$th basic unit, $B_{pp,k}$, which obeys the following expression:

$$B_{pp,k} = \frac{b_{k,i}(j)}{N_{p,BU}}, \tag{15}$$

where $N_{p,BU}$ is the number of pixels per BU (if the BU size is one macroblock, $N_{p,BU} = 16 \times 16 \times 1.5$ for a 4:2:0 format sequence). The thresholds used to set the $\alpha_{k,P/B}$ value (see Table I) are the same as those applied to update the parameter $\alpha_{P/B}$ at picture layer [8], and the thresholds used to set the $\gamma_{k,P/B}$ value have been determined empirically (see Table II).
Finally, after encoding a whole I/P/B picture, the model coefficient at picture layer, $a_{I/P/B}$, must also be updated (see [8]) for the next picture bit allocation. The required $Q_i(j)$ value is obtained as the quantization step associated with $QP_i(j)$, the average of the $QP_{k,i}(j)$ values.
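The per-BU updates (13)-(15) are closed-form, as the following Python sketch illustrates. Function names and the sample numbers are assumptions; the BU of 11 macroblocks corresponds to one macroblock row of a QCIF picture (176 pixels wide).

```python
def update_models(b_used, d_sse, q_step, alpha, gamma):
    """Return the updated (a, c) coefficients after encoding one BU."""
    a = b_used * q_step ** alpha      # (13): R-Q coefficient
    c = d_sse * b_used ** gamma       # (14): R-D coefficient
    return a, c


def bits_per_pixel(b_used, mbs_per_bu, chroma_factor=1.5):
    """Texture bits per pixel of a BU, as in (15); 1.5 accounts for 4:2:0 chroma."""
    return b_used / (mbs_per_bu * 16 * 16 * chroma_factor)


# Hypothetical BU: 1000 texture bits, SSE of 5000, quantization step 4.0.
a, c = update_models(b_used=1000.0, d_sse=5000.0, q_step=4.0, alpha=1.5, gamma=0.5)
bpp = bits_per_pixel(1000.0, mbs_per_bu=11)
print(a, c, bpp)
```

Feeding the resulting `a` back into (12) with a target of 1000 bits returns the same step of 4.0, which is the sense in which (13) re-fits the R-Q model to the BU just encoded.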
III. ALGORITHM OUTLINE
The complete RCA is summarized as follows:
A. GOP and Picture Layers
1. Initialize GOP number: $i = 1$
2. Compute initial QP values $QP^I_0$ / $QP^P_0$ / $QP^B_0$
3. For each picture $j = 1$ to $J$, do
     Update buffer level $V_i(j)$: Ref. [4]
     Update $B_i(j)$ remaining bits to encode GOP: (1)
     Compute $T_i(j)$ target bits: (2), (3), (4), Ref. [8]
     If I picture
       Compute $QP_i(1)$: Ref. [8]
       Encode picture
     Else
       Go to step B1
     End
     Update $a_{I/P/B}$: Ref. [8]
     If first I/P/B picture in GOP
       Update $\alpha_{I/P/B}$: Ref. [8]
     End
   End
4. $i = i + 1$
5. Go to step A3
B. Basic Unit Layer
1. For each BU $k = 1$ to $K$, do
     Update $T_{r,i}(j)$ remaining bits to encode frame: (5)
     Compute $\tilde{b}_{k,i}(j)$ target texture bits: (11)
     Compute $QP_{k,i}(j)$: (12), Ref. [4]
     Encode basic unit
     Update $a_{k,P/B}$ and $c_{k,P/B}$: (13), (14)
     If first P/B picture in GOP
       Update $\alpha_{k,P/B}$ and $\gamma_{k,P/B}$: Tables I, II
     End
   End
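The BU loop of step B1 can be exercised with a toy model, sketched below in Python. Everything here is an assumption made for illustration: the "encoder" is a synthetic function that follows the exponential R-Q and R-D models exactly, header bits are set to zero, the quantization step is used directly instead of an integer QP, and the QP rules taken from [4] are left out.

```python
def solve_b(T_r, h, c, gamma, iters=30):
    """Newton iterations on (11); index 0 of c/gamma is the current BU."""
    n = len(c)
    budget = T_r - n * h
    if budget <= 0:
        return 1.0                      # assumption: minimal fallback allocation
    A = [(cl / c[0]) ** (1 / gl) for cl, gl in zip(c, gamma)]
    p = [gamma[0] / gl for gl in gamma]
    b = budget / n
    for _ in range(iters):
        g = sum(a * b ** q for a, q in zip(A, p)) - budget
        dg = sum(a * q * b ** (q - 1) for a, q in zip(A, p))
        b_next = b - g / dg
        b = b_next if b_next > 0 else b / 2
    return b


def make_toy_encoder(a_true, c_true, alpha, gamma):
    """Synthetic encoder that obeys the R-Q and R-D models exactly (assumption)."""
    def encode(k, q):
        bits = a_true[k] * q ** (-alpha[k])
        sse = c_true[k] * bits ** (-gamma[k])
        return bits, sse
    return encode


def encode_picture(T, a, c, alpha, gamma, encoder, h=0.0):
    """One pass over the BUs of a P/B picture; a and c are updated in place."""
    T_r, log = T, []
    for k in range(len(a)):
        b_target = solve_b(T_r, h, c[k:], gamma[k:])   # (11)
        q = (b_target / a[k]) ** (-1 / alpha[k])       # (12)
        bits, sse = encoder(k, q)                      # "Encode basic unit"
        a[k] = bits * q ** alpha[k]                    # (13)
        c[k] = sse * bits ** gamma[k]                  # (14)
        T_r -= bits + h                                # (5)
        log.append((b_target, bits, sse))
    return log


alpha = [1.4] * 4
gamma = [0.5, 0.7, 0.7, 1.0]
a0 = [4000.0, 8000.0, 2000.0, 6000.0]
c0 = [1.0e5, 3.0e5, 5.0e4, 2.0e5]
encoder = make_toy_encoder(list(a0), list(c0), alpha, gamma)
log = encode_picture(8000.0, list(a0), list(c0), alpha, gamma, encoder)
total_bits = sum(bits for _, bits, _ in log)
print(total_bits, [round(sse, 3) for _, _, sse in log])
```

Because the toy encoder matches the models exactly, the picture budget is met and every BU ends up with the same modeled distortion; with a real encoder the models are only approximate, which is why they are re-fitted after each BU.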
IV. EXPERIMENTS AND RESULTS
The proposed BU layer RCA was implemented on the Joint
Video Team (JVT) H.264/AVC reference software version JM
10.2 [11]. In order to assess its performance, it was compared
to the Laplacian-PDF-based scheme adopted by JVT [4] and
the Cauchy-PDF-based picture layer RCA proposed in [8].
Two sets of color video sequences were used in the experiments. The first one consisted of the following QCIF sequences: "Carphone", "MotherDaughter", "News", "Salesman", "Silent" and "Paris-Football". The second one consisted of the following CIF sequences: "Coastguard", "Container", "Football", "Foreman", "Paris" and "Bus-Foreman". Note that the last sequence in each group contained at least one scene change. Thus, the behavior of the proposed method in non-stationary situations could be evaluated as well.
The aforementioned sequences were encoded with the following configuration, corresponding to a low-delay application:
- Profile: main
- Number of pictures: 300
- Frame rate: f = 25 frames/s
- GOP: 300 pictures IP...P
- Target rates (QCIF/CIF): R = 32/128, 64/256, 96/512, 128/1024 kbits/s
- Buffer size: 300 ms
- BU size: a row of macroblocks
- R-D optimization: enabled
- Symbol mode: CABAC
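To give a feeling for what this configuration means in bits, the following Python sketch converts it into per-frame and buffer budgets. It assumes that 1 kbit = 1000 bits and that a buffer size given in milliseconds is converted by multiplying by the target rate; the function name is illustrative.

```python
def low_delay_setup(rate_kbps, fps, buffer_ms, width, height):
    """Derived quantities for a CBR low-delay configuration (illustrative)."""
    rate_bps = rate_kbps * 1000
    return {
        "bits_per_frame": rate_bps / fps,           # average budget per picture
        "buffer_bits": rate_bps * buffer_ms / 1000, # buffer size in bits
        "mbs_per_bu": width // 16,                  # BU = one row of macroblocks
        "bus_per_picture": height // 16,            # K, number of BUs per picture
    }


qcif = low_delay_setup(64, 25, 300, width=176, height=144)
cif = low_delay_setup(256, 25, 300, width=352, height=288)
print(qcif)
print(cif)
```

At 64 kbits/s, for instance, the 300 ms buffer holds the equivalent of 7.5 average frames, which leaves little room for picture-level misallocations.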
Given a sequence and a target bit rate, the quantization parameter value for the first picture of the sequence, $QP^I_0$, was the same for all the compared RCAs. It was selected so that the encoder buffer did not overflow during the encoding process. Moreover, a row of macroblocks was chosen as BU size to achieve a good trade-off among quality, buffer fluctuation and computational cost.
The $\delta$ value in the expression (4) was set to 0.5 to achieve a fast enough adjustment to the target buffer level without increasing the QP fluctuation. After studying the behaviour of the proposed algorithm for different $\beta$ values, this parameter was fixed to 0.5 for our buffer-constrained scenario, which is a good trade-off between PSNR and buffer fluctuation, and agrees with the value employed in [4]. Furthermore, for comparison purposes, the RCA described in [8] was also adapted to the low-delay requirement by including the same bit allocation model described in Subsection II-B, with identical $\delta$ and $\beta$ values.
Some representative sequences and target rates have been selected to summarize the encoding results. Luminance PSNR (mean and standard deviation) and output bit rate are shown in Table III. In addition, the evolution of PSNR and encoder buffer occupancy versus picture number are depicted in Figs. 1 and 2, respectively. When compared to another scheme including a BU layer [4], our proposal achieves an average PSNR improvement of 0.28 dB while exhibiting a similar buffer occupancy evolution. When compared to a picture layer rate controller [8], the proposed algorithm achieves, as expected, a finer control of both the output bit rate and the buffer occupancy evolution, notably reducing the risk of underflow and overflow (see for example the buffer occupancy evolution around picture #130 for "Paris-Football" or around picture #155 for "Bus-Foreman"), while maintaining a similar quality level.
TABLE III
PSNR (MEAN AND STANDARD DEVIATION) AND OUTPUT RATE FOR
SEVERAL SEQUENCES AND TARGET RATES.

Sequence                Target rate   Algorithm    Mean PSNR   PSNR deviation   Output rate
                        (kbits/s)     on JM 10.2   (dB)        (dB)             (kbits/s)
News (QCIF)             64            [4]          37.20       1.07             64.13
                                      [8]          37.27       1.10             63.92
                                      Proposed     37.53       1.19             64.20
News (QCIF)             128           [4]          41.68       0.93             128.22
                                      [8]          41.66       0.93             128.01
                                      Proposed     42.12       1.08             128.06
Paris-Football (QCIF)   64            [4]          28.24       2.85             64.57
                                      [8]          28.59       3.07             64.16
                                      Proposed     28.49       3.19             64.02
Paris-Football (QCIF)   128           [4]          32.14       3.54             128.12
                                      [8]          32.54       3.90             130.16
                                      Proposed     32.62       3.74             127.98
Silent (QCIF)           64            [4]          35.78       1.02             64.04
                                      [8]          36.21       1.37             60.01
                                      Proposed     36.36       1.32             64.04
Silent (QCIF)           128           [4]          40.35       1.07             128.04
                                      [8]          40.82       1.26             127.94
                                      Proposed     40.67       1.18             128.02
Bus-Foreman (CIF)       256           [4]          32.31       3.85             256.13
                                      [8]          32.54       3.79             255.77
                                      Proposed     32.41       3.81             256.06
Bus-Foreman (CIF)       1024          [4]          38.64       3.30             1024.22
                                      [8]          38.79       3.26             1023.75
                                      Proposed     38.63       3.30             1024.05
Football (CIF)          256           [4]          25.84       0.64             255.90
                                      [8]          25.96       0.66             256.19
                                      Proposed     25.91       0.63             255.78
Football (CIF)          1024          [4]          31.77       0.71             1023.59
                                      [8]          31.84       0.72             1024.20
                                      Proposed     31.96       0.62             1023.69
Paris (CIF)             256           [4]          32.58       0.69             256.12
                                      [8]          33.24       0.91             255.99
                                      Proposed     33.47       0.89             256.01
Paris (CIF)             1024          [4]          41.58       0.85             1023.97
                                      [8]          41.85       0.85             1023.72
                                      Proposed     41.86       0.89             1023.93
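As a quick check on the rows listed in Table III, the per-row PSNR gain of the proposed RCA over the BU layer scheme of [4] can be tabulated with a few lines of Python. Note that Table III contains only a representative subset of the tested sequences and rates, so the mean over these rows need not coincide with the 0.28 dB average reported for the experiments as a whole.

```python
# (sequence, target rate in kbits/s): (mean PSNR of [4], mean PSNR of proposed), in dB
TABLE_III = {
    ("News", 64): (37.20, 37.53),
    ("News", 128): (41.68, 42.12),
    ("Paris-Football", 64): (28.24, 28.49),
    ("Paris-Football", 128): (32.14, 32.62),
    ("Silent", 64): (35.78, 36.36),
    ("Silent", 128): (40.35, 40.67),
    ("Bus-Foreman", 256): (32.31, 32.41),
    ("Bus-Foreman", 1024): (38.64, 38.63),
    ("Football", 256): (25.84, 25.91),
    ("Football", 1024): (31.77, 31.96),
    ("Paris", 256): (32.58, 33.47),
    ("Paris", 1024): (41.58, 41.86),
}

# Gain of the proposed RCA over [4] for each row, rounded to table precision.
gains = {key: round(prop - ref, 2) for key, (ref, prop) in TABLE_III.items()}
mean_gain = sum(gains.values()) / len(gains)
best = max(gains, key=gains.get)
worst = min(gains, key=gains.get)
print(f"mean gain over listed rows: {mean_gain:.2f} dB")
print("largest gain:", best, gains[best])
print("smallest gain:", worst, gains[worst])
```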
V. CONCLUSIONS
In this paper, a BU layer RCA has been proposed for H.264/AVC. Starting from a simplified Cauchy PDF model of the DCT coefficient distribution, a novel bit allocation has been developed and the exponential R-Q model has been applied on a BU basis.
In order to assess its performance, the RCA was integrated into an H.264/AVC encoder, which was configured according to a real-time low-delay application, and compared to two well-known RCAs. First, the proposed algorithm achieves a 0.28 dB average PSNR improvement over the Laplacian-PDF-based BU layer rate controller of JM 10.2 [4], with a similar buffer occupancy evolution. Second, our proposal notably reduces the buffer fluctuation in sequences with time-varying complexity while maintaining a similar quality level, when compared to the picture layer RCA described in [8]. Therefore, the proposed BU extension is recommended for buffer-constrained scenarios.
The computational cost of the proposed BU layer algorithm is necessarily higher than that of the picture layer RCA reported in [8], since the rate control operations are performed on a BU basis. Nevertheless, the computational requirements to update the R-Q and R-D model coefficients associated with each BU in P/B pictures, $a_{k,P/B}$ and $c_{k,P/B}$, are low when compared to the Laplacian-PDF-based BU layer scheme used in JM 10.2 [4]. The latter involves updating the parameters of both the quadratic R-Q model and the complexity prediction model by means of a regression analysis of previously encoded BUs. Our proposal simply updates $a_{k,P/B}$ and $c_{k,P/B}$ following the expressions (13) and (14), which are simple closed-form functions of the number of texture bits, the distortion generated by the $k$th BU, and the quantization step value.
Fig. 1. PSNR evolution for several video sequences and target rates.
Fig. 2. Encoder buffer occupancy evolution for several video sequences and target rates.
REFERENCES
[1] “Test Model 5,” [Online], http://www.mpeg.org/MPEG/MSSG/tm5.
[2] T. Chiang and Y.-Q. Zhang, “A new rate control scheme using quadratic rate distortion model,” in Image Processing, 1996. Proceedings., International Conference on, vol. 1, 1996, pp. 73–76 vol.2.
[3] J. Ribas-Corbera and S. Lei, “Rate control in DCT video coding for low-delay communications,” Circuits and Systems for Video Technology, IEEE Transactions on, vol. 9, no. 1, pp. 172–185, 1999.
[4] S. Ma, Z. Li, and F. Wu, “Proposed draft of adaptive rate control,” JVT-H017, 8th JVT Meeting, Geneva, Switzerland, May 2003.
[5] B. Tao, B. Dickinson, and H. Peterson, “Adaptive model-driven bit allocation for MPEG video coding,” Circuits and Systems for Video Technology, IEEE Transactions on, vol. 10, no. 1, pp. 147–157, Feb. 2000.
[6] S. Ma, W. Gao, and Y. Lu, “Rate-distortion analysis for H.264/AVC video coding and its application to rate control,” Circuits and Systems for Video Technology, IEEE Transactions on, vol. 15, no. 12, pp. 1533–1544, 2005.
[7] Z. He, Y. K. Kim, and S. Mitra, “Low-delay rate control for DCT video coding via ρ-domain source modeling,” Circuits and Systems for Video Technology, IEEE Transactions on, vol. 11, no. 8, pp. 928–940, 2001.
[8] N. Kamaci, Y. Altunbasak, and R. Mersereau, “Frame bit allocation for the H.264/AVC video coder via Cauchy-density-based rate and distortion models,” Circuits and Systems for Video Technology, IEEE Transactions on, vol. 15, no. 8, pp. 994–1006, 2005.
[9] M. Jiang and N. Ling, “Low-delay rate control for real-time H.264/AVC video coding,” Multimedia, IEEE Transactions on, vol. 8, no. 3, pp. 467–477, 2006.
[10] S. Sanz-Rodriguez, M. de-Frutos-Lopez, I. Gonzalez-Diaz, and J. Cid-Sueiro, “A rate control algorithm for low-delay H.264 video coding with stored-B pictures,” in Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on, vol. 1, 15-20 April 2007, pp. I–1153–I–1156.
[11] “JM 10.2,” [Online], http://iphome.hhi.de/suehring/tml/download/old_jm/.