
Óscar del-Ama-Esteban, Sergio Sanz-Rodríguez, Manuel de-Frutos-López,

Fernando Díaz-de-María

A Cauchy-density-based rate controller for

H.264/AVC in low-delay environments

Conference object, Postprint version

This version is available at http://dx.doi.org/10.14279/depositonce-5773.

Suggested Citation

del-Ama-Esteban, Óscar; Sanz-Rodríguez, Sergio; de-Frutos-López, Manuel; Díaz-de-María, Fernando: A

Cauchy-density-based rate controller for H.264/AVC in low-delay Environments. - In: 2009 Picture

Coding Symposium : PCS. - New York, NY [u.a.] : IEEE, 2009. - ISBN: 978-1-4244-4593-6. - pp. 1-4. - DOI:

10.1109/PCS.2009.5167438. (Postprint version is cited, page numbers differ.)


A CAUCHY-DENSITY-BASED RATE CONTROLLER FOR H.264/AVC

IN LOW-DELAY ENVIRONMENTS

Óscar del-Ama-Esteban, Sergio Sanz-Rodríguez, Manuel de-Frutos-López, Fernando Díaz-de-María

Department of Signal Theory and Communications

Universidad Carlos III, Leganés (Madrid), Spain

ABSTRACT

The accuracy of the Cauchy probability density function for mod-

eling of the discrete cosine transform coefficient distribution has al-

ready been proved for the frame layer of the rate control subsystem

of a hybrid video coder. Nevertheless, in some specific applications

operating in real-time low-delay environments, a basic unit layer is

recommended in order to provide a good trade-off between quality

and delay control. In this paper, a novel basic unit bit allocation

for H.264/AVC is proposed based on a simplified Cauchy probability density function source modeling. The experimental results show that the proposed algorithm improves the average peak signal-to-noise ratio by 0.28 and 0.35 dB with respect to two well-known rate control schemes, while maintaining similar peak signal-to-noise ratio standard deviation and buffer occupancy evolution.

Index Terms—H.264, low-delay, rate control, bit allocation,

basic unit.

1. INTRODUCTION

The inherent variability of video information implies that the video

encoder normally produces a variable output bit rate, which must

be controlled in order to comply with the nominal network rate re-

quired by real-time communications through constant bit rate chan-

nels. Therefore, the inclusion of a rate control (RC) algorithm in

video encoders is of paramount importance. Although it is not a nor-

mative tool for video coding standards, several schemes have been

recommended during the different standardization processes, such

as TM5 for MPEG-2 [1], VM8 for MPEG-4 [2], TMN8 for H.263

[3], and AVC-TM for H.264/AVC [4].

In almost any RC algorithm, a virtual buffer is considered at the output of the encoder. This buffer aims at modeling the decoder buffer behavior and absorbs the difference between the nominal bit rate of the network and the variable source bit rate per picture. The RC algorithm must keep the buffer at safe levels, avoiding both overflow and underflow.

In order to maintain the output rate within the buffer limits without visual quality degradation, an RC scheme should assign, according to the buffer status and the picture complexity, the most appropriate amount of bits and the corresponding quantization step (Q) to each coding unit. Traditional RC schemes select the Q value using an analytical rate-quantization (R-Q) function. This function is derived by means of a source modeling of the discrete cosine transform (DCT) coefficients. For instance, using a Gaussian probability density function (PDF), a logarithmic R-Q function can be inferred [5], while using a Laplacian PDF, different linear [6], quadratic [2] or ρ-domain-based [7] R-Q models have been proposed.

Kamaci et al. demonstrated in [8] the accuracy of the Cauchy PDF for DCT coefficient modeling. Starting from this distribution, a simple exponential R-Q model is obtained which, applied to an RC algorithm for H.264/AVC, achieves a mean PSNR improvement when compared to schemes based on the Laplacian PDF.

Like many other algorithms, the one reported in [8] employs a frame layer which assigns the Q value on a frame basis. However, in the case of low-delay applications, the buffer size is very restricted and a finer RC is desirable. Given the growing popularity of these scenarios, such as videophone or videoconferencing, several RC algorithms have been proposed for different video coding standards, where an additional basic unit (BU) layer, defined as a group of macroblocks in raster scan order which share the same Q value, is employed. In [1], [3], [7] and [9] a single macroblock is proposed as BU size, while in [4] a greater size is recommended for a better trade-off between quality smoothness and target frame size adjustment.

In this paper, a Cauchy-PDF-based BU bit allocation algorithm is

proposed, showing that the exponential R-Q and rate-distortion mod-

els work properly at the BU layer RC.

The paper is organized as follows. Section 2 describes the solutions

adopted for group of pictures (GOP), picture and BU layer RC. Sec-

tion 3 shows and analyzes the experimental results of the suggested

method in comparison with other popular RC schemes. Finally, in

Section 4 some conclusions are drawn.

2. PROPOSED CAUCHY-PDF-BASED RATE CONTROL

Since the main contribution of this paper involves a new BU layer, some state-of-the-art algorithms have been studied and their solutions adopted for the GOP and picture layers of the proposed RC scheme. These solutions are briefly described in the following subsections, together with the BU layer itself.

2.1. GOP Layer

The most common coding pattern in low-delay applications is the IP...P structure, since no structural delay is involved. Moreover, since I pictures need a larger amount of bits than P or B pictures, a single GOP is adopted to encode the whole sequence in order to reduce the buffer delay.

The GOP layer computes the QP values for the first picture of each type ($QP_0^I$ and $QP_0^P$). These values depend on the average number of target bits per pixel (see [4] for further details).

Before encoding the current $j$th picture, the total number of bits to encode the remaining pictures, $B(j)$, and the buffer fullness, $V(j)$, are calculated as follows:

$$B(j) = \begin{cases} \dfrac{J \cdot R}{f}, & \text{if } j = 1 \\ B(j-1) - t(j-1), & \text{otherwise,} \end{cases} \tag{1}$$

$$V(j) = \begin{cases} 0, & \text{if } j = 1 \\ V(j-1) + t(j-1) - \dfrac{R}{f}, & \text{otherwise,} \end{cases} \tag{2}$$

where $J$ represents the number of pictures, $R$ is the target bit rate, $f$ is the frame rate and $t(j-1)$ is the total number of bits used to encode the last picture.
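The recursions (1) and (2) can be illustrated with a minimal sketch. The function name `gop_layer_state` and its argument names are hypothetical and chosen only for this example; they are not taken from the JM software.

```python
def gop_layer_state(j, J, R, f, B_prev=None, V_prev=None, t_prev=None):
    """Return (B(j), V(j)) following equations (1) and (2).

    j      : 1-based index of the picture about to be encoded
    J      : number of pictures in the sequence
    R      : target bit rate in bits/s
    f      : frame rate in frames/s
    B_prev : B(j-1), bits that were left before the previous picture
    V_prev : V(j-1), buffer fullness before the previous picture
    t_prev : t(j-1), bits actually spent on the previous picture
    """
    if j == 1:
        # First picture: full budget available, empty buffer.
        return J * R / f, 0.0
    B = B_prev - t_prev          # eq. (1): remove the bits already spent
    V = V_prev + t_prev - R / f  # eq. (2): fill with t(j-1), drain R/f
    return B, V


# Example: 300 pictures at 64 kbits/s and 25 frames/s,
# assuming the first picture consumed 10000 bits.
B1, V1 = gop_layer_state(1, J=300, R=64000, f=25)
B2, V2 = gop_layer_state(2, J=300, R=64000, f=25,
                         B_prev=B1, V_prev=V1, t_prev=10000)
```

In this example the channel drains R/f = 2560 bits per picture, so a 10000-bit first picture leaves 7440 bits in the buffer.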

2.2. Picture Layer

At picture layer, the amount of target bits for the $j$th picture, $\hat{T}(j)$, is obtained as a portion of the remaining bits $B(j)$:

$$\hat{T}(j) = \frac{B(j)}{N_{p,r}}, \tag{3}$$

where $N_{p,r}$ represents the number of remaining pictures.

In low-delay environments an additional buffer-status-aware bit allocation algorithm is highly recommended to achieve a tight buffer level control and to compensate for the extra bits needed for encoding the first I picture. This model obeys:

$$\tilde{T}(j) = \frac{R}{f} + \delta \left( S(j) - V(j) \right), \tag{4}$$

where $\delta$ is a constant which is set to 0.5 to achieve a fast enough adjustment to the target buffer level without increasing the QP fluctuation, and $S(j)$ is the linear model of the target buffer level (see [10] for details). Thus, the total number of target bits to encode the $j$th picture is calculated as a combination of (3) and (4):

$$T(j) = \beta \hat{T}(j) + (1 - \beta) \tilde{T}(j), \tag{5}$$

where $\beta$ is a constant which is set to 0.5 as a good trade-off between PSNR and buffer fluctuation. Finally, $T(j)$ is bounded to conform to hypothetical reference decoder (HRD) constraints [4].
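As a rough illustration of how (3), (4) and (5) combine, the following sketch computes the picture target before HRD bounding. The function and argument names are hypothetical. The target buffer level S(j) is taken as an input because its linear model is defined in [10], and the HRD bounding of [4] is deliberately left out.

```python
def picture_target_bits(B_j, n_remaining, R, f, S_j, V_j, delta=0.5, beta=0.5):
    """Unbounded picture target T(j) from equations (3)-(5).

    B_j         : bits left for the remaining pictures, B(j)
    n_remaining : number of remaining pictures, N_{p,r}
    R, f        : target bit rate (bits/s) and frame rate (frames/s)
    S_j, V_j    : target buffer level and current buffer fullness
    delta, beta : constants, both set to 0.5 in the text
    """
    t_hat = B_j / n_remaining               # eq. (3): share of remaining bits
    t_tilde = R / f + delta * (S_j - V_j)   # eq. (4): buffer-aware target
    return beta * t_hat + (1.0 - beta) * t_tilde  # eq. (5): blend of both


# Example: buffer 1000 bits above its target level pulls the target down.
T_j = picture_target_bits(B_j=250000, n_remaining=100, R=64000, f=25,
                          S_j=1000, V_j=2000)
```

Here the budget-based term gives 2500 bits and the buffer-based term gives 2060 bits, so the blended target is 2280 bits.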

2.3. Basic Unit Layer

As pointed out in [3] and [9], in real-time low-delay applications,

where the overall delay between the transmitter and receiver is

severely constrained, the picture layer RC schemes are generally not

suitable approaches, since the misadjustments to the frame target

bits could result in larger fluctuations of the buffer fullness and,

therefore, the buffer overflow or underflow risk is higher.

A BU is a group of macroblocks in raster scan order which share the

same quantization value. The number of macroblocks in a BU is set

before beginning the encoding process and it is generally an integer

fraction of the total number of macroblocks in a picture.

2.3.1. Bit Allocation

We assume that the $j$th picture is composed of $K$ BUs ($K > 1$), and the previous $k-1$ BUs in the picture have already been encoded. We have a budget of $T_{r,k}(j)$ bits for the remaining BUs ($k, k+1, \ldots, K$), which is calculated as:

$$T_{r,k}(j) = \begin{cases} T(j), & \text{if } k = 1 \\ T_{r,k-1}(j) - t_{k-1}(j), & \text{otherwise,} \end{cases} \tag{6}$$

where $t_{k-1}(j)$ represents the number of bits used to encode the previous BU. Then, we want to find the target texture bits $\tilde{b}_l(j)$ for the $l$th BU so that:

$$T_{r,k}(j) = \sum_{l=k}^{K} \left( \tilde{b}_l(j) + \tilde{h}_l(j) \right), \tag{7}$$

where $\tilde{h}_l(j)$ is a prediction of the number of header and motion data bits of the remaining BUs.

In order to maintain a consistent video quality within the frames, we propose that the proper bit allocation for the current BU must be subject to:

$$D_l(j) \approx D_m(j), \quad l \neq m, \tag{8}$$

where $D_l(j)$ represents the distortion of the $l$th BU in the $j$th picture. At frame level, it is assumed that the same QP value in consecutive pictures produces similar distortion [8], since the global variation between consecutive pictures is usually moderate. However, at BU level, this assumption is no longer valid because of the potentially large differences in complexity of neighboring BUs. Therefore, although we start from the same models, the theoretical approach to the bit allocation must be different from that used in [8]. Specifically, we propose to use the simplified R-D function derived from a Cauchy-type model of the DCT coefficient distribution, namely:

$$D(R) \approx c R^{-\gamma}, \tag{9}$$

where $c$ and $\gamma$ are the model parameters. Using this R-D model and the constant distortion (or quality) condition (8), the target texture bits for the $l$th BU, $\tilde{b}_l(j)$, can be obtained from those of the current one, $\tilde{b}_k(j)$, as follows:

$$\tilde{b}_l(j) = c_{l,j}^{1/\gamma_l} \, c_{k,j}^{-1/\gamma_l} \, \tilde{b}_k(j)^{\gamma_k/\gamma_l}. \tag{10}$$
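The step from (8) and (9) to (10) can be written out explicitly. The derivation below is a sketch that treats the approximate equality in (8) as an equality and identifies the rate $R$ in (9) with the target texture bits of each BU:

```latex
\begin{align*}
D_l(j) = D_k(j)
&\;\Longrightarrow\; c_{l,j}\,\tilde{b}_l(j)^{-\gamma_l} = c_{k,j}\,\tilde{b}_k(j)^{-\gamma_k} \\
&\;\Longrightarrow\; \tilde{b}_l(j)^{\gamma_l} = \frac{c_{l,j}}{c_{k,j}}\,\tilde{b}_k(j)^{\gamma_k} \\
&\;\Longrightarrow\; \tilde{b}_l(j) = c_{l,j}^{1/\gamma_l}\,c_{k,j}^{-1/\gamma_l}\,\tilde{b}_k(j)^{\gamma_k/\gamma_l}.
\end{align*}
```

When all BUs share the same parameters, the last line reduces to $\tilde{b}_l(j) = \tilde{b}_k(j)$, that is, an equal split of the texture bits.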

As mentioned before, the complexity of neighboring BUs may be significantly different in those pictures with high spatial heterogeneity. Thus, the assumptions $c_{l,j} = c_{k,j}$ and $\gamma_{l,j} = \gamma_{k,j}$, with $l \neq k$, are no longer valid. Then, combining the expressions (7) and (10), the following BU bit allocation is obtained:

$$T_{r,k}(j) = \sum_{l=k}^{K} \left( c_{l,j}^{1/\gamma_l} \, c_{k,j}^{-1/\gamma_l} \, \tilde{b}_k(j)^{\gamma_k/\gamma_l} + \tilde{h}_l(j) \right). \tag{11}$$

Finally, assuming the same header and motion data bits for the $N_r$ remaining BUs within the picture, the expression (11) can be rewritten as:

$$T_{r,k}(j) = \sum_{l=k}^{K} \left( c_{l,j}^{1/\gamma_l} \, c_{k,j}^{-1/\gamma_l} \, \tilde{b}_k(j)^{\gamma_k/\gamma_l} \right) + N_r \tilde{h}_k(j), \tag{12}$$

where $\tilde{h}_k(j)$ can be obtained as the average number of header and motion data bits used to encode the previous $k-1$ BUs. Newton's method can be used to iteratively obtain a solution for $\tilde{b}_k(j)$.
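A minimal sketch of a Newton iteration for (12) is given below. It is only one possible realization: the function name, the equal-split starting point, the positivity safeguard and the stopping rule are assumptions made for this example, and $N_r$ is assumed to count the BUs from $k$ to $K$ inclusive.

```python
def solve_bu_texture_bits(T_rk, c, gamma, h_k, tol=1e-6, max_iter=50):
    """Solve eq. (12) for the texture bits of the current BU.

    T_rk  : bit budget for the remaining BUs, T_{r,k}(j)
    c     : R-D parameters c_{l,j} for l = k..K (index 0 is the current BU)
    gamma : R-D exponents gamma_l for l = k..K (same ordering as c)
    h_k   : predicted header and motion bits per BU
    """
    n_r = len(c)
    texture_budget = T_rk - n_r * h_k
    if texture_budget <= 0:
        return 0.0  # nothing left for texture once headers are paid
    # Each term of the sum in (12) has the form A_l * b ** p_l.
    A = [(c_l / c[0]) ** (1.0 / g_l) for c_l, g_l in zip(c, gamma)]
    p = [gamma[0] / g_l for g_l in gamma]
    b = texture_budget / n_r  # starting point: equal split
    for _ in range(max_iter):
        f_val = sum(a * b ** e for a, e in zip(A, p)) - texture_budget
        f_der = sum(a * e * b ** (e - 1.0) for a, e in zip(A, p))
        b_new = b - f_val / f_der   # Newton update
        if b_new <= 0:
            b_new = b / 2.0         # safeguard: keep the iterate positive
        if abs(b_new - b) < tol:
            return b_new
        b = b_new
    return b


# Two remaining BUs; the second is modeled as more complex than the first.
b_k = solve_bu_texture_bits(T_rk=66.0, c=[1.0, 4.0], gamma=[1.0, 0.5], h_k=0.0)
```

With these illustrative parameters the equation becomes b + 16 b² = 66, whose positive root is b = 2: the current BU receives 2 bits and the more complex BU the remaining 64.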

2.3.2. Quantization Parameter Estimation

When the $j$th picture is to be encoded, the quantization parameter (QP) value for the $k$th BU, $QP_k(j)$, is obtained from the corresponding $Q_k(j)$, which is computed using the exponential R-Q model described in [8] adapted for BU level:

$$Q_k(j) = \left( \frac{\tilde{b}_k(j)}{a_{k,j}} \right)^{-1/\alpha_k}, \tag{13}$$

where $a_{k,j}$ and $\alpha_k$ are model parameters. Finally, the $QP_k(j)$ value is bounded at BU and picture levels as stated in [4] in order to avoid noticeable quality variations within the frame.
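The computation in (13) can be sketched as follows. The conversion from Q to QP is not spelled out in the text, so the example assumes the usual H.264/AVC relationship in which the quantization step is 0.625 at QP = 0 and doubles every 6 QP units. The function names are hypothetical, and the BU- and picture-level bounding of [4] is not reproduced.

```python
import math


def q_from_rq_model(b_target, a, alpha):
    """Quantization step from the exponential R-Q model, eq. (13).

    b_target : target texture bits for the BU (must be positive)
    a, alpha : R-Q model parameters of the BU
    """
    return (b_target / a) ** (-1.0 / alpha)


def qp_from_qstep(q):
    """Nearest QP for a quantization step (assumed mapping, see lead-in).

    Uses Qstep ~= 0.625 * 2 ** (QP / 6) and clips to the H.264 range 0..51.
    """
    qp = round(6.0 * math.log2(q / 0.625))
    return max(0, min(51, qp))


# Example: a BU with a = 6400 and alpha = 2 that is given 100 texture bits.
q = q_from_rq_model(b_target=100.0, a=6400.0, alpha=2.0)
qp = qp_from_qstep(q)
```

Fewer target bits give a larger Q and therefore a larger QP, which is the expected direction of the model.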

Table 1. $\alpha_k$ and $\gamma_k$ coefficients update.

B_pp,k         α_k      B_pp,k         γ_k

< 0.05         1.60     < 0.07         0.50

0.05 – 0.10    1.40     0.07 – 0.20    0.70

> 0.10         1.20     > 0.20         1.00

2.3.3. Rate-Quantization and Rate-Distortion Models Update

After encoding the $k$th BU of the $j$th inter picture, the coefficients $a_{k,j+1}$ and $c_{k,j+1}$ are updated as follows:

$$a_{k,j+1} = b_k(j) \, Q_k(j)^{\alpha_k}, \tag{14}$$

$$c_{k,j+1} = d_k(j) \, b_k(j)^{\gamma_k}, \tag{15}$$

where $b_k(j)$ is the number of texture bits used to encode the $k$th BU, and $d_k(j)$ is the sum of squared errors between the original and reconstructed luminance pixels in the BU. $Q_k(j)$ is the Q value obtained from $QP_k(j)$, the QP value for the $k$th BU of the $j$th frame. After encoding the first inter picture in the sequence, Table 1 is employed to initialize the parameters $\alpha_k$ and $\gamma_k$ according to the average number of texture bits to encode a pixel of the $k$th BU, $B_{pp,k}$. Thresholds used to set the initial $\alpha_k$ values have been obtained from [8].
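The update rules (14) and (15) and the lookup of Table 1 are simple enough to sketch directly. The function names are hypothetical, and the handling of values that fall exactly on a threshold (assigned here to the middle row) is an assumption, since Table 1 does not specify it.

```python
def init_alpha(bpp):
    """Initial alpha_k from Table 1, given texture bits per pixel of the BU."""
    if bpp < 0.05:
        return 1.60
    if bpp <= 0.10:   # boundary handling is an assumption
        return 1.40
    return 1.20


def init_gamma(bpp):
    """Initial gamma_k from Table 1, given texture bits per pixel of the BU."""
    if bpp < 0.07:
        return 0.50
    if bpp <= 0.20:   # boundary handling is an assumption
        return 0.70
    return 1.00


def update_models(b_used, q_used, d_sse, alpha, gamma):
    """Return (a_{k,j+1}, c_{k,j+1}) from equations (14) and (15).

    b_used : texture bits actually spent on the BU, b_k(j)
    q_used : Q value corresponding to the QP used for the BU, Q_k(j)
    d_sse  : sum of squared luminance errors of the BU, d_k(j)
    """
    a_next = b_used * q_used ** alpha   # eq. (14)
    c_next = d_sse * b_used ** gamma    # eq. (15)
    return a_next, c_next


# Example: a BU coded with 100 texture bits at Q = 8 and an SSE of 50.
a_next, c_next = update_models(100.0, 8.0, 50.0, alpha=2.0, gamma=1.0)
```

Both updates are closed-form and need only the statistics of the BU that has just been encoded.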

3. EXPERIMENTS AND RESULTS

The proposed BU layer RC algorithm was implemented on the Joint

Video Team (JVT) software version JM 10.2 [11]. In order to as-

sess its performance, it was compared to two Laplacian-PDF-based

schemes: the RC algorithm adopted by JVT [4] and the one proposed in [12]. Common CIF and QCIF video sequences, as well as concatenations of them, were used in the experiments.

A row of macroblocks was employed as BU size to achieve a good

trade-off among quality, buffer fluctuation and computational cost.

The complete configuration of our tests is stated here:

• QCIF sequences: “MobileCalendar”, “MotherDaughter”, “News”, “Salesman”, “Silent” and “Paris-Football”

• CIF sequences: “Coastguard”, “Container”, “Football”, “Foreman”, “Paris” and “Bus-Foreman”

• Profile: baseline

• Number of pictures: 300

• Frame rate: f = 25 frames/s

• Target rates (QCIF/CIF): R = 32/128, 64/256, 96/512, 128/1024 kbits/s

• Buffer size: 500 ms

• R-D optimization: disabled

• Symbol mode: CAVLC

Luminance peak signal-to-noise ratio (PSNR), in terms of mean and

standard deviation, and output bit rate are shown in Tables 2 and 3

for some QCIF and CIF sequences, respectively. The proposed algo-

rithm achieves an average PSNR improvement of 0.35 dB over the

RC scheme reported in [12], since the latter does not take into ac-

count the complexities of BUs for better target bit assignment. Fur-

thermore, we achieve a 0.28 dB average PSNR improvement over

the default RC algorithm of JM 10.2 [4], which employs a BU bit

allocation algorithm based on a complexity ratio that takes into account the remaining BUs of a picture. Moreover, the results in terms of PSNR standard deviation and output bit rate are similar to those of the reference schemes.

Table 2. PSNR (mean and standard deviation) and output rate for several QCIF sequences and target rates.

Sequence          Target Rate    Algorithm     Mean PSNR    PSNR Std. Dev.    Output Rate
                  (kbits/s)      on JM 10.2    (dB)         (dB)              (kbits/s)
News              64             [12]          36.22        0.82              64.16
                                 [4]           36.41        0.81              64.12
                                 Proposed      36.91        0.95              64.05
News              128            [12]          40.95        1.08              128.24
                                 [4]           41.02        0.98              128.17
                                 Proposed      41.42        1.10              128.14
Paris-Football    64             [12]          27.29        2.82              64.05
                                 [4]           27.27        2.81              63.13
                                 Proposed      27.68        2.95              64.03
Paris-Football    128            [12]          31.55        3.76              127.98
                                 [4]           31.52        3.67              128.12
                                 Proposed      31.89        3.91              127.99

Table 3. PSNR (mean and standard deviation) and output rate for several CIF sequences and target rates.

Sequence          Target Rate    Algorithm     Mean PSNR    PSNR Std. Dev.    Output Rate
                  (kbits/s)      on JM 10.2    (dB)         (dB)              (kbits/s)
Bus-Foreman       256            [12]          31.32        3.92              256.29
                                 [4]           31.48        3.78              256.19
                                 Proposed      31.49        3.75              256.08
Bus-Foreman       1024           [12]          37.65        3.58              1024.79
                                 [4]           37.92        3.31              1024.35
                                 Proposed      37.91        3.31              1024.19
Paris             256            [12]          31.76        0.59              256.18
                                 [4]           31.95        0.64              256.09
                                 Proposed      32.57        0.81              256.07
Paris             1024           [12]          40.29        0.69              1024.71
                                 [4]           40.91        0.80              1024.21
                                 Proposed      41.19        0.89              1023.95

Finally, the PSNR and encoder buffer occupancy versus picture number are plotted in Figs. 1 and 2, respectively. It can be seen that the proposed method achieves a good quality performance and similar buffer occupancy evolution when compared to [12] and [4]. Furthermore, in sequences with scene cuts, such as “Paris-Football” and “Bus-Foreman”, a faster adaptation to the complexity of a new scene is reached with our proposal.

4. CONCLUSIONS

In this paper, a BU layer RC algorithm has been proposed for H.264/AVC. Starting from a simplified Cauchy PDF for DCT coefficient distribution modeling, a novel bit allocation has been developed and the exponential R-Q model has been applied on a BU basis. In order to assess its performance, the RC algorithm was integrated into an H.264/AVC encoder, which was configured according to a real-time low-delay application, and compared to two Laplacian-PDF-based BU layer RC schemes. The experimental results show an average PSNR improvement over the default RC algorithm of JM 10.2 [4] and the algorithm described in [12], with similar buffer occupancy evolutions. Furthermore, our proposal achieves similar PSNR standard deviation and output bit rate.

Fig. 1. PSNR evolution for several video sequences and target rates.

Another characteristic of the proposed RC algorithm is the low computational requirement to update the coefficients of the exponential R-Q and R-D models associated with each BU in the picture, $a_k$ and $c_k$, with the linear expressions (14) and (15). In contrast, the Laplacian-PDF-based method used in the reference schemes involves updating parameters of both the quadratic R-Q and the complexity prediction models by means of a regression analysis of previously encoded BUs, with the consequent overhead.

Fig. 2. Encoder buffer occupancy evolution for several video sequences and target rates.

5. REFERENCES

[1] Test Model 5, “http://www.mpeg.org/MPEG/MSSG/tm5”.

[2] Tihao Chiang and Ya-Qin Zhang, “A new rate control scheme using quadratic rate distortion model,” in Image Processing, 1996. Proceedings., International Conference on, 1996, vol. 1, pp. 73–76 vol.2.

[3] J. Ribas-Corbera and Shawmin Lei, “Rate control in DCT video coding for low-delay communications,” Circuits and Systems for Video Technology, IEEE Transactions on, vol. 9, no. 1, pp. 172–185, 1999.

[4] S. Ma, Z. Li, and F. Wu, “Proposed draft of adaptive rate control,” JVT-H017, 8th JVT Meeting, Geneva, Switzerland, May 2003.

[5] Bo Tao, B.W. Dickinson, and H.A. Peterson, “Adaptive model-driven bit allocation for MPEG video coding,” Circuits and Systems for Video Technology, IEEE Transactions on, vol. 10, no. 1, pp. 147–157, Feb 2000.

[6] S. Ma, Wen Gao, and Yan Lu, “Rate-distortion analysis for H.264/AVC video coding and its application to rate control,” Circuits and Systems for Video Technology, IEEE Transactions on, vol. 15, no. 12, pp. 1533–1544, 2005.

[7] Zhihai He, Yong Kwan Kim, and S.K. Mitra, “Low-delay rate control for DCT video coding via ρ-domain source modeling,” Circuits and Systems for Video Technology, IEEE Transactions on, vol. 11, no. 8, pp. 928–940, 2001.

[8] N. Kamaci, Y. Altunbasak, and R.M. Mersereau, “Frame bit allocation for the H.264/AVC video coder via Cauchy-density-based rate and distortion models,” Circuits and Systems for Video Technology, IEEE Transactions on, vol. 15, no. 8, pp. 994–1006, 2005.

[9] Minqiang Jiang and Nam Ling, “Low-delay rate control for real-time H.264/AVC video coding,” Multimedia, IEEE Transactions on, vol. 8, no. 3, pp. 467–477, 2006.

[10] Siwei Ma, Wen Gao, Peng Gao, and Yan Lu, “Rate control for advance video coding (AVC) standard,” in Circuits and Systems, 2003. ISCAS ’03. Proceedings of the 2003 International Symposium on, 2003, vol. 2, pp. II–892–II–895 vol.2.

[11] JM 10.2, “http://iphome.hhi.de/suehring/tml/download/old_jm/”.

[12] Z. Li, W. Gao, F. Pan, S. Ma, G.N. Feng, K.P. Lim, X. Lin, S. Rahardja, and H.Q. Lu, “Adaptive rate control with HRD consideration,” JVT-H014, 8th JVT Meeting, Geneva, Switzerland, May 2003.