
Sergio Sanz-Rodríguez, Óscar del-Ama-Esteban, Manuel de-Frutos-López, Fernando Díaz-de-María

Cauchy-density-based basic unit layer rate

controller for H.264/AVC

Article, Postprint version

This version is available at http://dx.doi.org/10.14279/depositonce-5747.

Suggested Citation

Sanz-Rodríguez, S.; del-Ama-Esteban, Ó.; de-Frutos-López, M.; Díaz-de-María, F.: Cauchy-density-based basic unit layer rate controller for H.264/AVC. - In: IEEE transactions on circuits and systems for video technology : a publication of the Circuits and Systems Society. - ISSN: 1558-2205 (online), 1051-8215 (print). - 20 (2010), 8. - pp. 1139–1143. - DOI: 10.1109/TCSVT.2010.2051369. (Postprint version is cited, available at http://dx.doi.org/10.14279/depositonce-5747)

obtained for all other uses, in any current or future media, including reprinting/republishing

this material for advertising or promotional purposes, creating new collective works, for

resale or redistribution to servers or lists, or reuse of any copyrighted component of this

work in other works.

Cauchy-Density-Based Basic Unit Layer

Rate Controller for H.264/AVC

Sergio Sanz-Rodríguez, Student Member, IEEE, Óscar del-Ama-Esteban, Manuel de-Frutos-López, Student Member, IEEE, Fernando Díaz-de-María, Member, IEEE

Abstract—The rate control problem has been extensively stud-

ied in parallel to the development of the different video coding

standards. The bit allocation via Cauchy-density-based rate-

distortion (R-D) modeling of the discrete cosine transform (DCT)

coefficients has proved to be one of the most accurate solutions at

picture level. Nevertheless, in some specific applications operating

in real-time low-delay environments, a basic unit (BU) layer

is recommended in order to provide a good trade-off between

picture quality and delay control. In this paper, a novel BU

bit allocation for H.264/AVC is proposed based on a simplified

Cauchy probability density function (PDF) source modeling. The

experimental results are twofold: 1) the proposed rate control

algorithm (RCA) achieves an average PSNR improvement of 0.28

dB with respect to a well-known BU layer RCA, while maintaining a similar buffer occupancy evolution; and 2) it notably reduces the buffer occupancy fluctuations with respect to a well-known picture layer RCA, while maintaining similar quality levels.

Index Terms—H.264/AVC, low-delay, rate control, basic unit,

bit allocation

I. INTRODUCTION

THE inherent variability of video content leads to a variable output bit rate, which must be controlled in order to comply with the requirements of real-time video communications through constant bit rate (CBR)

channels. For this purpose, several schemes for rate control

have been recommended during the standardization process of

video codecs, such as TM5 for MPEG-2 [1], VM8 for MPEG-

4 [2], TMN8 for H.263 [3], and AVC-TM for H.264/AVC [4].

The hypothetical reference decoder (HRD), a normative part

of the standard that describes a set of requirements to transmit

and decode bit streams, is taken into account in order to restrict

the target bit assignment per picture by means of two (lower

and upper) bounds for a virtual buffer fullness. The RCA must

keep the buffer at safe levels, avoiding both overflow and

underflow and, given that the decoder must wait for enough

bits to begin the decoding process of a frame, the final delay

is directly related to the buffer size.

In order to maintain the output rate within the buffer lim-

its without noticeable visual quality degradation, the RCA

must assign the most appropriate amount of bits and the

corresponding quantization step (Q) to each coding unit. The

traditional RCAs select this Q value using an analytical rate-quantization (R-Q) function. This function is derived by means of a source modeling of the DCT coefficients. For instance, using a Gaussian PDF, a logarithmic R-Q function can be inferred [5]. On the other hand, considering a Laplacian PDF, different linear [6], quadratic [2] or ρ-domain-based [7] R-Q models have been proposed. Finally, assuming a Cauchy PDF, Kamaci et al. [8] obtain a simple exponential R-Q model which outperforms the previous algorithms.

The authors are with the Department of Signal Theory and Communications, Universidad Carlos III de Madrid, Leganés, Madrid 28911 Spain (e-mail: {sescalona, oaesteban, mfrutos, fdiaz}@tsc.uc3m.es).

In the particular case of low-delay applications, where the

buffer size is very small, the selection of Q on a frame

basis is not accurate enough, and a finer rate control is

desirable. Due to the growing popularity of these scenarios,

such as in videophone or videoconferencing, several RCAs

have been proposed, which employ an additional BU layer,

defined as a group of macroblocks in raster scan order which

share the same Q value. In [1], [3], [7] and [9] a single

macroblock is proposed as BU size, while in [4] a greater

size is recommended for a better trade-off between quality

smoothness and target frame size adjustment.

In this paper, a Cauchy-PDF-based BU rate controller is de-

rived, showing that exponential R-Q and R-D models, similar

to those applied in [8] on a frame basis, work properly at the

BU layer.

The paper is organized as follows. In Section II, the proposed

BU layer RCA for H.264/AVC is described. In Section III,

a pseudocode summary of the proposed algorithm is given.

Experimental results are shown and analyzed in Section IV.

Finally, in Section V some conclusions are drawn.

II. BASIC UNIT LAYER RATE CONTROLLER

In real-time low-delay applications, the feasible buffer size

is quite limited. As pointed out in [3] and [9], picture layer RCAs are generally not suitable for this kind of environment, since misadjustments in the frame target bits increase the risk of buffer overflow or underflow. Therefore, a

fine quantization parameter (QP) value variation within each

frame is recommended for a better adjustment to the frame

target bits.

In the following subsections we describe the proposed Cauchy-

PDF-based RCA for low-delay environments. It consists of

three layers: the group of pictures (GOP) and picture layers

are similar to those described in [4] and [8], while the BU layer

is novel and entails a bit allocation algorithm, the consequent

QP estimation and, finally, the R-Q and R-D model update.

A. GOP Layer

This layer computes the QP value for the first picture of each type, $(QP_0^I, QP_0^P, QP_0^B)$, which depends on the average number of target bits per pixel. The amount of remaining bits for the $j$th picture in the $i$th GOP is determined as follows:

$$B_i(j) = \begin{cases} \dfrac{J \cdot R}{f} - V_i(j), & \text{if } j = 1 \\ B_i(j-1) - t_i(j-1), & \text{otherwise,} \end{cases} \tag{1}$$

where $J$ represents the number of pictures in a GOP, $R$ is the target bit rate, $f$ is the frame rate, $V_i(j)$ is the current encoder buffer fullness, and $t_i(j-1)$ is the number of bits used to encode the last picture.
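The GOP-level budget update in (1) can be illustrated with a minimal Python sketch. The function name, argument names and the numeric values below are assumptions chosen for illustration (a 300-picture GOP at 64 kbit/s and 25 f/s, with an empty buffer); they are not taken from the reference software.

```python
def remaining_gop_bits(j, J, R, f, V, B_prev=0.0, t_prev=0.0):
    """Remaining bits B_i(j) for the j-th picture of a GOP, following (1).

    j      : picture index within the GOP (1-based)
    J      : number of pictures in the GOP
    R, f   : target bit rate (bits/s) and frame rate (frames/s)
    V      : current encoder buffer fullness (bits)
    B_prev : B_i(j-1), remaining bits before the previous picture
    t_prev : t_i(j-1), bits actually spent on the previous picture
    """
    if j == 1:
        # Fresh GOP budget, corrected by the current buffer fullness.
        return J * R / f - V
    # Otherwise subtract what the previous picture consumed.
    return B_prev - t_prev


# Illustrative values only.
B1 = remaining_gop_bits(j=1, J=300, R=64_000, f=25.0, V=0.0)
B2 = remaining_gop_bits(j=2, J=300, R=64_000, f=25.0, V=0.0,
                        B_prev=B1, t_prev=20_000)
```

With these values the first picture starts from a budget of 768000 bits, and spending 20000 bits on it leaves 748000 bits for the rest of the GOP.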

B. Picture Layer

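Similarly to [4], the bit allocation algorithm at picture level consists of two main terms, one modeling the target bits and the other the buffer status, and obeys the following expression:

$$T_i(j) = \begin{cases} \hat{T}_i(j), & \text{if I picture} \\ (1-\beta)\,\tilde{T}_i(j) + \beta\,\hat{T}_i(j), & \text{otherwise,} \end{cases} \tag{2}$$

with

$$\hat{T}_i(j) = \tilde{b}_i(j) + \tilde{h}_i(j), \tag{3}$$

and

$$\tilde{T}_i(j) = \frac{R}{f} + \delta \left( S_i(j) - V_i(j) \right). \tag{4}$$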

The first term, $\hat{T}_i(j)$, is a target bit model of the remaining bits in a GOP, $B_i(j)$. The amount of target texture bits of the $j$th frame in the $i$th GOP, $\tilde{b}_i(j)$, is computed following [8]. The number of header and motion data bits, $\tilde{h}_i(j)$, is predicted by computing an average of the header and motion data bits of the previously encoded pictures in the GOP.

The second term, $\tilde{T}_i(j)$, is derived from the buffer occupancy, $V_i(j)$. $S_i(j)$ is the target encoder buffer level after encoding the $j$th picture in the GOP, which is computed either by means of a linear model for IP...P GOP patterns or a saw-tooth shaped model [10] for GOP patterns with B pictures. $\delta$ is chosen as a trade-off between QP variation and target buffer level adaptation, and $\beta$ determines the relative importance of the two bit allocation models.

Finally, $T_i(j)$ is bounded to satisfy the HRD constraints (see [4] for further details).
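As a rough illustration of how the two terms are blended, the following Python sketch evaluates (2), (3) and (4) for one picture. It assumes that (4) reads R/f + δ(S − V), and it leaves out the HRD bounding step, which follows [4]. All names and numbers are hypothetical.

```python
def picture_target_bits(is_intra, b_tex, h_hdr, R, f, S, V,
                        beta=0.5, delta=0.5):
    """Target bits T_i(j) for one picture, before HRD bounding.

    b_tex, h_hdr : predicted texture bits and header/motion bits, as in (3)
    R, f         : target bit rate (bits/s) and frame rate (frames/s)
    S, V         : target and current encoder buffer levels (bits)
    """
    t_hat = b_tex + h_hdr                      # (3): remaining-bits model
    if is_intra:
        return t_hat                           # (2), I-picture branch
    t_tilde = R / f + delta * (S - V)          # (4): buffer-status model
    return (1.0 - beta) * t_tilde + beta * t_hat   # (2), P/B branch


# Illustrative values only: 64 kbit/s at 25 f/s, buffer 400 bits above target.
T_p = picture_target_bits(False, b_tex=2000, h_hdr=400,
                          R=64_000, f=25.0, S=9_600, V=10_000)
```

Here the buffer term asks for 2360 bits and the remaining-bits term for 2400 bits, so with β = 0.5 the picture receives 2380 bits.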

C. Basic Unit Bit Allocation

As mentioned in Section I, a BU is a group of macroblocks

in raster scan order which share the same quantization value.

Therefore, the set of possible BU sizes, generally an integer fraction of the total number of macroblocks in a picture, goes from one macroblock to an entire frame.
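To make the set of admissible BU sizes concrete, the short Python sketch below lists the sizes that divide the macroblock count of a picture exactly. The helper name is hypothetical, and the restriction to exact divisors is an assumption drawn from the sentence above.

```python
def valid_bu_sizes(width, height, mb_size=16):
    """BU sizes (in macroblocks) that divide the picture's macroblock count."""
    mbs_per_row = width // mb_size
    mb_rows = height // mb_size
    total = mbs_per_row * mb_rows
    return [n for n in range(1, total + 1) if total % n == 0]


qcif_sizes = valid_bu_sizes(176, 144)   # QCIF: 11 x 9 = 99 macroblocks
cif_sizes = valid_bu_sizes(352, 288)    # CIF: 22 x 18 = 396 macroblocks
```

For QCIF this yields 1, 3, 9, 11, 33 and 99 macroblocks, where 11 corresponds to one row of macroblocks and 99 to the whole frame.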

We assume that the $j$th picture in the $i$th GOP is composed of $K$ BUs ($K > 1$), and the previous $k-1$ BUs in the picture have already been encoded. We have a budget of $T_{r,i}(j)$ bits to encode the remaining BUs $(k, k+1, \ldots, K)$, which is calculated as:

$$T_{r,i}(j) = \begin{cases} T_i(j), & \text{if } k = 1 \\ T_{r,i}(j) - t_{k-1,i}(j), & \text{otherwise,} \end{cases} \tag{5}$$

where $t_{k-1,i}(j)$ represents the number of bits used to encode the previous basic unit. Then, we want to find the $\tilde{b}_{l,i}(j)$ target texture bits for the $l$th BU so that:

$$T_{r,i}(j) = \sum_{l=k}^{K} \left[ \tilde{b}_{l,i}(j) + \tilde{h}_{l,i}(j) \right], \tag{6}$$

where $\tilde{h}_{l,i}(j)$ is a prediction of the number of header and motion data bits of the remaining basic units.

In order to maintain consistent video quality within the frames, the proper bit allocation for the current BU is subject to:

$$D_{l,i}(j) \approx D_{m,i}(j), \quad l \neq m, \tag{7}$$

where $D_{l,i}(j)$ represents the distortion of the $l$th BU in the $j$th picture. In [8] it is assumed that the same QP value in consecutive pictures produces similar distortion, since the global variation between consecutive pictures is usually moderate. However, at BU level, this assumption is no longer valid because of the potentially large differences in complexity of neighboring BUs. Thus, the theoretical approach becomes different from that used in [8]. Specifically, we propose to use the simplified R-D function derived from a Cauchy-type model of the DCT coefficient distribution, i.e.:

$$D(R) \approx c R^{-\gamma}, \tag{8}$$

where $c$ and $\gamma$ are the model parameters. Using this R-D model and the constant quality condition (7), the target texture bits for the $l$th basic unit, $\tilde{b}_{l,i}(j)$, can be obtained from those of the current one, $\tilde{b}_{k,i}(j)$, as follows:

$$\tilde{b}_{l,i}(j) = c_{l,j}^{\frac{1}{\gamma_{l,j}}} \, c_{k,j}^{-\frac{1}{\gamma_{l,j}}} \, \tilde{b}_{k,i}(j)^{\frac{\gamma_{k,j}}{\gamma_{l,j}}}. \tag{9}$$
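The step from the constant quality condition to (9) can be spelled out as follows. This is a short worked derivation, assuming that the R-D model (8) holds for each BU with its own parameters $c_{l,j}$, $\gamma_{l,j}$ and that the approximation in (7) is treated as an equality.

```latex
\begin{align*}
c_{l,j}\,\tilde{b}_{l,i}(j)^{-\gamma_{l,j}}
  &= c_{k,j}\,\tilde{b}_{k,i}(j)^{-\gamma_{k,j}}
  && \text{(equal distortion for BUs $l$ and $k$)} \\
\tilde{b}_{l,i}(j)^{\gamma_{l,j}}
  &= \frac{c_{l,j}}{c_{k,j}}\,\tilde{b}_{k,i}(j)^{\gamma_{k,j}}
  && \text{(invert both sides and rearrange)} \\
\tilde{b}_{l,i}(j)
  &= c_{l,j}^{\frac{1}{\gamma_{l,j}}}\,
     c_{k,j}^{-\frac{1}{\gamma_{l,j}}}\,
     \tilde{b}_{k,i}(j)^{\frac{\gamma_{k,j}}{\gamma_{l,j}}}
  && \text{(raise to the power $1/\gamma_{l,j}$)}
\end{align*}
```

When both BUs share the same parameters, the last line reduces to $\tilde{b}_{l,i}(j) = \tilde{b}_{k,i}(j)$, i.e., a uniform split of the texture bits.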

The model parameters $c_{l,j}$ and $\gamma_{l,j}$ can take different values according to the type of picture, as described in Subsection II-E. As mentioned before, the complexity of neighboring BUs may be significantly different in those pictures with large spatial heterogeneity. Thus, the assumptions $c_{l,j} = c_{k,j}$ and $\gamma_{l,j} = \gamma_{k,j}$, with $l \neq k$, are no longer valid. Therefore, combining the expressions (6) and (9), the following BU bit allocation is obtained:

$$T_{r,i}(j) = \sum_{l=k}^{K} \left[ c_{l,j}^{\frac{1}{\gamma_{l,j}}} \, c_{k,j}^{-\frac{1}{\gamma_{l,j}}} \, \tilde{b}_{k,i}(j)^{\frac{\gamma_{k,j}}{\gamma_{l,j}}} + \tilde{h}_{l,i}(j) \right]. \tag{10}$$

Finally, assuming the same header and motion data bits for the $N_r$ remaining BUs within the picture, the expression (10) can be rewritten as:

$$T_{r,i}(j) = \sum_{l=k}^{K} c_{l,j}^{\frac{1}{\gamma_{l,j}}} \, c_{k,j}^{-\frac{1}{\gamma_{l,j}}} \, \tilde{b}_{k,i}(j)^{\frac{\gamma_{k,j}}{\gamma_{l,j}}} + N_r \tilde{h}_{k,i}(j), \tag{11}$$

where $\tilde{h}_{k,i}(j)$ can be obtained as the average number of header and motion data bits used to encode the previous $k-1$ basic units. Newton's method can be applied in (11) to iteratively obtain a solution for $\tilde{b}_{k,i}(j)$.
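A minimal Python sketch of this Newton iteration is given below. It reads (11) as the scalar equation sum over l = k..K of (c_l / c_k)^(1/γ_l) · x^(γ_k/γ_l) + N_r · h_k = T_r, with x the target texture bits of the current BU. The function name, the starting point (a uniform split), the positivity clamp and the stopping rule are all assumptions of this sketch. How the parameters of the BUs that are still to be encoded are predicted is not covered here.

```python
def solve_bu_texture_bits(T_r, h_k, c, gamma, tol=1e-6, max_iter=50):
    """Solve (11) for the current BU's target texture bits by Newton's method.

    T_r   : bits left for the remaining BUs of the picture
    h_k   : predicted header/motion bits per remaining BU
    c     : R-D parameters c_l for BUs k..K (c[0] is the current BU)
    gamma : R-D parameters gamma_l for BUs k..K (gamma[0] is the current BU)
    """
    n_r = len(c)                              # remaining BUs, current included
    budget = T_r - n_r * h_k                  # texture bits after headers
    if budget <= 0:
        return 0.0
    # Coefficient and exponent of x contributed by each remaining BU.
    gains = [(cl / c[0]) ** (1.0 / gl) for cl, gl in zip(c, gamma)]
    powers = [gamma[0] / gl for gl in gamma]
    x = budget / n_r                          # start from a uniform split
    for _ in range(max_iter):
        F = sum(g * x ** p for g, p in zip(gains, powers)) - budget
        dF = sum(g * p * x ** (p - 1.0) for g, p in zip(gains, powers))
        x_new = max(x - F / dF, 1e-9)         # keep the iterate positive
        if abs(x_new - x) <= tol * max(1.0, x):
            return x_new
        x = x_new
    return x


# Two remaining BUs; the second is four times "harder" (illustrative values).
bits_k = solve_bu_texture_bits(T_r=1740, h_k=20, c=[100.0, 400.0],
                               gamma=[0.5, 0.5])
```

In this example both exponents equal one, so the equation is linear: the current BU receives 100 texture bits and the harder BU the remaining 1600. When all BUs share the same parameters the result is the uniform split of the texture budget.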

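TABLE I
$\alpha_{k,P/B}$ COEFFICIENT UPDATE.

Range of $B_{pp,k}$              $\alpha_{k,P}$    $\alpha_{k,B}$
$B_{pp,k} < 0.05$                1.60              2.00
$0.05 \le B_{pp,k} < 0.10$       1.40              1.80
$0.10 \le B_{pp,k}$              1.20              1.60

TABLE II
$\gamma_{k,P/B}$ COEFFICIENT UPDATE.

Range of $B_{pp,k}$              $\gamma_{k,P}$    $\gamma_{k,B}$
$B_{pp,k} < 0.07$                0.50              0.45
$0.07 \le B_{pp,k} < 0.13$       0.70              0.60
$0.13 \le B_{pp,k} < 0.20$       0.70              0.75
$0.20 \le B_{pp,k}$              1.00              0.75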

D. Quantization Parameter Estimation

When the $j$th P or B picture within the $i$th GOP is to be encoded, the quantization parameter value for the $k$th basic unit, $QP_{k,i}(j)$, an integer value between 0 and 51, is obtained from the corresponding quantization step, $Q_{k,i}(j)$, which is computed using the exponential model:

$$Q_{k,i}(j) = \left( \frac{\tilde{b}_{k,i}(j)}{a_{k,j}} \right)^{-\frac{1}{\alpha_{k,j}}}. \tag{12}$$
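The mapping from target texture bits to a QP value can be sketched in Python as follows. The first function evaluates (12) directly. The second one is an assumption of this sketch: it uses the common approximation that the H.264/AVC quantization step doubles every 6 QP units and equals 1 at QP = 4, whereas the first-BU rule, the QP increment and the clipping rules taken from [4] are not reproduced here.

```python
import math


def quant_step(b_target, a, alpha):
    """Quantization step from the exponential R-Q model, as in (12)."""
    return (b_target / a) ** (-1.0 / alpha)


def qp_from_step(q_step):
    """Nearest QP in [0, 51], assuming Qstep ~ 2 ** ((QP - 4) / 6)."""
    qp = round(4.0 + 6.0 * math.log2(q_step))
    return min(51, max(0, qp))


# Illustrative values only: a = 8000, alpha = 1.6, 500 target texture bits.
q = quant_step(b_target=500, a=8000, alpha=1.6)
qp = qp_from_step(q)
```

With these numbers the step is about 5.66 and the sketch returns QP 19. Fewer target bits give a larger step and therefore a higher QP.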

The remaining operations for QP selection at BU layer such

as:

- the QP value to be assigned to the first BU in the picture,

- the QP increment to be applied when the frame bit budget

is over and there are still some BUs to be encoded,

- and the clipping algorithms to bound the final QP value,

are the same as those proposed in [4].

For intra pictures other than the first one, the QP estima-

tion at BU layer is not recommended, since eventual object

movements might reduce the correlation between co-located

BUs when the distance between intra pictures is large. Con-

sequently, the picture layer RCA described in [8] is used to

obtain a unique quantization parameter value to encode all

BUs in these intra pictures.

E. Rate-Quantization and Rate-Distortion Model Update

After encoding the $k$th BU of the $j$th inter picture, the coefficients $a_{k,P/B}$ and $c_{k,P/B}$ are updated as follows:

$$a_{k,P/B} = b_{k,i}(j) \, \tilde{Q}_{k,i}(j)^{\alpha_{k,P/B}}, \tag{13}$$

$$c_{k,P/B} = d_{k,i}(j) \, b_{k,i}(j)^{\gamma_{k,P/B}}, \tag{14}$$

where $b_{k,i}(j)$ is the number of texture bits used to encode the $k$th basic unit, and $d_{k,i}(j)$ is the sum of the squared errors between the original and reconstructed luminance pixels in the basic unit. $\tilde{Q}_{k,i}(j)$ is the quantization step value obtained from $QP_{k,i}(j)$, the quantization parameter for the $k$th BU of the $j$th frame in the $i$th GOP. Furthermore, if the P/B picture is the first one in the GOP, Tables I and II are employed to update the parameters $\alpha_{k,P/B}$ and $\gamma_{k,P/B}$ according to the average number of texture bits per pixel of the $k$th basic unit, $B_{pp,k}$, which obeys the following expression:

$$B_{pp,k} = \frac{b_{k,i}(j)}{N_{p,BU}}, \tag{15}$$

where $N_{p,BU}$ is the number of pixels per BU (if the BU size is one macroblock, $N_{p,BU} = 16 \times 16 \times 1.5$ for a 4:2:0 format sequence). Thresholds used to set the $\alpha_{k,P/B}$ value (see Table I) are the same as those applied to update the parameter $\alpha_{P/B}$ at picture layer [8], and thresholds used to set the $\gamma_{k,P/B}$ value have been determined empirically (see Table II).

Finally, after encoding a whole I/P/B picture, the model coefficient at picture layer, $a_{I/P/B}$, must also be updated (see [8]) for the next picture bit allocation. The required $Q_i(j)$ value is obtained as the quantization step associated to $QP_i(j)$, the average of the quantization parameter values $QP_{k,i}(j)$.
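The per-BU model update can be sketched in Python as follows, for P pictures only. The update formulas follow (13), (14) and (15), and the thresholds are those of Tables I and II. The function names, the data layout and the choice to refresh α and γ after computing a and c (the order used in the algorithm outline) are assumptions of this sketch.

```python
# Upper thresholds on bits per pixel and the values they select (P pictures).
ALPHA_P = ((0.05, 0.10), (1.60, 1.40, 1.20))              # Table I
GAMMA_P = ((0.07, 0.13, 0.20), (0.50, 0.70, 0.70, 1.00))  # Table II


def lookup(bpp, thresholds, values):
    """Return the table value for the first range that contains bpp."""
    for upper, value in zip(thresholds, values):
        if bpp < upper:
            return value
    return values[-1]


def update_bu_model(b_tex, sse, q_step, n_pixels, alpha, gamma, first_in_gop):
    """Update (a, c) after encoding one BU; refresh (alpha, gamma) if needed.

    b_tex    : texture bits spent on the BU
    sse      : sum of squared luminance errors of the BU
    q_step   : quantization step actually used
    n_pixels : pixels per BU, e.g. 16 * 16 * 1.5 per macroblock in 4:2:0
    """
    a = b_tex * q_step ** alpha               # (13)
    c = sse * b_tex ** gamma                  # (14)
    if first_in_gop:
        bpp = b_tex / n_pixels                # (15)
        alpha = lookup(bpp, *ALPHA_P)
        gamma = lookup(bpp, *GAMMA_P)
    return a, c, alpha, gamma


# One row of 11 macroblocks in QCIF (illustrative numbers).
a, c, alpha, gamma = update_bu_model(b_tex=300, sse=1000.0, q_step=1.0,
                                     n_pixels=11 * 16 * 16 * 1.5,
                                     alpha=1.6, gamma=0.5, first_in_gop=True)
```

In this example the BU spends about 0.071 bits per pixel, so the tables select α = 1.40 and γ = 0.70 for subsequent BUs.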

III. ALGORITHM OUTLINE

The complete RCA is summarized as follows:

A. GOP and Picture Layers

1. Initialize GOP number: $i = 1$
2. Compute initial QP values $QP_0^I / QP_0^P / QP_0^B$
3. For each picture $j = 1$ to $J$, do
     Update buffer level $V_i(j)$: Ref. [4]
     Update $B_i(j)$ remaining bits to encode GOP: (1)
     Compute $T_i(j)$ target bits: (2), (3), (4), Ref. [8]
     If I picture
       Compute $QP_i(1)$: Ref. [8]
       Encode picture
     Else
       Go to step B1
     End
     Update $a_{I/P/B}$: Ref. [8]
     If first I/P/B picture in GOP
       Update $\alpha_{I/P/B}$: Ref. [8]
     End
4. $i = i + 1$
5. Go to step A3

B. Basic Unit Layer

1. For each BU $k = 1$ to $K$, do
     Update $T_{r,i}(j)$ remaining bits to encode frame: (5)
     Compute $\tilde{b}_{k,i}(j)$ target texture bits: (11)
     Compute $QP_{k,i}(j)$: (12), Ref. [4]
     Encode basic unit
     Update $a_{k,P/B}$ and $c_{k,P/B}$: (13), (14)
     If first P/B picture in GOP
       Update $\alpha_{k,P/B}$ and $\gamma_{k,P/B}$: Tables I, II
     End

IV. EXPERIMENTS AND RESULTS

The proposed BU layer RCA was implemented on the Joint

Video Team (JVT) H.264/AVC reference software version JM

10.2 [11]. In order to assess its performance, it was compared

to the Laplacian-PDF-based scheme adopted by JVT [4] and

the Cauchy-PDF-based picture layer RCA proposed in [8].

Two sets of color video sequences were used in the experiments. The first one consisted of the following QCIF sequences: “Carphone”, “MotherDaughter”, “News”, “Salesman”, “Silent” and “Paris-Football”. The second one consisted of the following CIF sequences: “Coastguard”, “Container”, “Football”, “Foreman”, “Paris” and “Bus-Foreman”. Note that the last sequence in each group contained at least one scene change. Thus, the behavior of the proposed method in non-stationary situations could be evaluated as well.

The aforementioned sequences were encoded with the follow-

ing configuration according to a low-delay application:

• Profile: main
• Number of pictures: 300
• Frame rate: $f = 25$ f/s
• GOP: 300 pictures IP...P
• Target rates (QCIF/CIF): $R$ = 32/128, 64/256, 96/512, 128/1024 kbits/s
• Buffer size: 300 ms
• BU size: a row of macroblocks
• R-D optimization: enabled
• Symbol mode: CABAC
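To give a feeling for the orders of magnitude implied by this configuration, the following Python sketch derives a few quantities from it. It assumes that 1 kbit = 1000 bits, that the 300 ms buffer corresponds to the amount of data sent at the target rate in 300 ms, and that a BU is one row of 16 x 16 macroblocks in 4:2:0 format. The helper name is hypothetical.

```python
def low_delay_budget(width, height, rate_kbps, fps=25.0, buffer_ms=300, mb=16):
    """Per-frame and per-BU quantities implied by the test configuration."""
    mbs_per_row = width // mb
    return {
        "bits_per_frame": rate_kbps * 1000 / fps,     # average frame budget
        "buffer_bits": rate_kbps * buffer_ms,         # kbit/s x ms = bits
        "bus_per_frame": height // mb,                # one BU per macroblock row
        "pixels_per_bu": mbs_per_row * mb * mb * 1.5  # luma + chroma, 4:2:0
    }


qcif_64 = low_delay_budget(176, 144, rate_kbps=64)
cif_256 = low_delay_budget(352, 288, rate_kbps=256)
```

For QCIF at 64 kbits/s this gives an average of 2560 bits per frame, a buffer of 19200 bits (7.5 average frames), and 9 BUs of 4224 pixels each, which illustrates how little slack a picture layer controller has in this setting.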

Given a sequence and a target bit rate, the quantization parameter value for the first picture of the sequence, $QP_0^I$, was the same for all the compared RCAs. It was selected so that the encoder buffer did not overflow during the encoding process. Moreover, a row of macroblocks was chosen as BU size to achieve a good trade-off among quality, buffer fluctuation and computational cost.

The $\delta$ value in expression (4) was set to 0.5 to achieve a fast enough adjustment to the target buffer level without increasing the QP fluctuation. After studying the behaviour of the proposed algorithm for different $\beta$ values, this parameter was fixed to 0.5 for our buffer-constrained scenario, which is a good trade-off between PSNR and buffer fluctuation, and agrees with the value employed in [4]. Furthermore, for comparison purposes, the RCA described in [8] was also adapted to the low-delay requirement by including the same bit allocation model described in Subsection II-B, with identical $\delta$ and $\beta$ values.

Some representative sequences and target rates have been selected to summarize the encoding results. Luminance PSNR (mean and standard deviation) and output bit rate are shown in Table III. In addition, the evolution of PSNR and encoder buffer occupancy versus picture number are depicted in Figs. 1 and 2, respectively. When compared to another scheme including a BU layer [4], our proposal achieves an average PSNR improvement of 0.28 dB while exhibiting a similar buffer occupancy evolution. When compared to a picture layer rate controller [8], the proposed algorithm achieves, as expected, a finer control of both the output bit rate and the buffer occupancy evolution, notably reducing the risk of underflow and overflow (see for example the buffer occupancy evolution around picture #130 for “Paris-Football” or around picture #155 for “Bus-Foreman”), while maintaining a similar quality level.

TABLE III
PSNR (MEAN AND STANDARD DEVIATION) AND OUTPUT RATE FOR SEVERAL SEQUENCES AND TARGET RATES.

Sequence                Target rate  Algorithm    Mean PSNR  PSNR deviation  Output rate
                        (kbits/s)    on JM 10.2   (dB)       (dB)            (kbits/s)

News (QCIF)             64           [4]          37.20      1.07            64.13
                                     [8]          37.27      1.10            63.92
                                     Proposed     37.53      1.19            64.20
News (QCIF)             128          [4]          41.68      0.93            128.22
                                     [8]          41.66      0.93            128.01
                                     Proposed     42.12      1.08            128.06
Paris-Football (QCIF)   64           [4]          28.24      2.85            64.57
                                     [8]          28.59      3.07            64.16
                                     Proposed     28.49      3.19            64.02
Paris-Football (QCIF)   128          [4]          32.14      3.54            128.12
                                     [8]          32.54      3.90            130.16
                                     Proposed     32.62      3.74            127.98
Silent (QCIF)           64           [4]          35.78      1.02            64.04
                                     [8]          36.21      1.37            60.01
                                     Proposed     36.36      1.32            64.04
Silent (QCIF)           128          [4]          40.35      1.07            128.04
                                     [8]          40.82      1.26            127.94
                                     Proposed     40.67      1.18            128.02
Bus-Foreman (CIF)       256          [4]          32.31      3.85            256.13
                                     [8]          32.54      3.79            255.77
                                     Proposed     32.41      3.81            256.06
Bus-Foreman (CIF)       1024         [4]          38.64      3.30            1024.22
                                     [8]          38.79      3.26            1023.75
                                     Proposed     38.63      3.30            1024.05
Football (CIF)          256          [4]          25.84      0.64            255.90
                                     [8]          25.96      0.66            256.19
                                     Proposed     25.91      0.63            255.78
Football (CIF)          1024         [4]          31.77      0.71            1023.59
                                     [8]          31.84      0.72            1024.20
                                     Proposed     31.96      0.62            1023.69
Paris (CIF)             256          [4]          32.58      0.69            256.12
                                     [8]          33.24      0.91            255.99
                                     Proposed     33.47      0.89            256.01
Paris (CIF)             1024         [4]          41.58      0.85            1023.97
                                     [8]          41.85      0.85            1023.72
                                     Proposed     41.86      0.89            1023.93

V. CONCLUSIONS

In this paper, a BU layer RCA has been proposed for

H.264/AVC. Starting from a simplified Cauchy PDF for mod-

eling of DCT coefficient distribution, a novel bit allocation

has been developed and the exponential R-Q model has been

applied on a BU basis.

In order to assess its performance, the RCA was integrated into an H.264/AVC encoder, which was configured according to a real-time low-delay application and compared to two well-known RCAs. First, the proposed algorithm achieves a 0.28 dB average PSNR improvement over the Laplacian-PDF-based BU layer rate controller of JM 10.2 [4], with similar buffer occupancy evolutions. Second, our proposal notably reduces the buffer fluctuation in sequences with time-varying complexity while maintaining a similar quality level, when compared to the picture layer RCA described in [8]. Therefore, the proposed BU extension is recommended for buffer-constrained scenarios.

The computational cost of the proposed BU layer algorithm obviously increases with respect to that of the picture layer RCA reported in [8], since the rate control operations are performed on a BU basis. Nevertheless, the computational requirements to update the R-Q and R-D models associated to each BU in P/B pictures, $a_{k,P/B}$ and $c_{k,P/B}$, are low when compared to the Laplacian-PDF-based BU layer scheme used in JM 10.2 [4]. This last RCA involves updating parameters on both the quadratic R-Q and the complexity prediction models by means of a regression analysis of previously encoded BUs. Our proposal simply updates $a_{k,P/B}$ and $c_{k,P/B}$ following the expressions (13) and (14), which depend linearly on the number of texture bits, the distortion generated by the $k$th BU, and the quantization step value.

Fig. 1. PSNR evolution for several video sequences and target rates.

Fig. 2. Encoder buffer occupancy evolution for several video sequences and target rates.

REFERENCES

[1] “Test Model 5 [Online], http://www.mpeg.org/MPEG/MSSG/tm5.”

[2] T. Chiang and Y.-Q. Zhang, “A new rate control scheme using quadratic rate distortion model,” in Image Processing, 1996. Proceedings., International Conference on, vol. 1, 1996, pp. 73–76 vol.2.

[3] J. Ribas-Corbera and S. Lei, “Rate control in DCT video coding for low-delay communications,” Circuits and Systems for Video Technology, IEEE Transactions on, vol. 9, no. 1, pp. 172–185, 1999.

[4] S. Ma, Z. Li, and F. Wu, “Proposed draft of adaptive rate control,” JVT-H017, 8th JVT Meeting, Geneva, Switzerland, May 2003.

[5] B. Tao, B. Dickinson, and H. Peterson, “Adaptive model-driven bit allocation for MPEG video coding,” Circuits and Systems for Video Technology, IEEE Transactions on, vol. 10, no. 1, pp. 147–157, Feb 2000.

[6] S. Ma, W. Gao, and Y. Lu, “Rate-distortion analysis for H.264/AVC video coding and its application to rate control,” Circuits and Systems for Video Technology, IEEE Transactions on, vol. 15, no. 12, pp. 1533–1544, 2005.

[7] Z. He, Y. K. Kim, and S. Mitra, “Low-delay rate control for DCT video coding via ρ-domain source modeling,” Circuits and Systems for Video Technology, IEEE Transactions on, vol. 11, no. 8, pp. 928–940, 2001.

[8] N. Kamaci, Y. Altunbasak, and R. Mersereau, “Frame bit allocation for the H.264/AVC video coder via Cauchy-density-based rate and distortion models,” Circuits and Systems for Video Technology, IEEE Transactions on, vol. 15, no. 8, pp. 994–1006, 2005.

[9] M. Jiang and N. Ling, “Low-delay rate control for real-time H.264/AVC video coding,” Multimedia, IEEE Transactions on, vol. 8, no. 3, pp. 467–477, 2006.

[10] S. Sanz-Rodriguez, M. de-Frutos-Lopez, I. Gonzalez-Diaz, and J. Cid-Sueiro, “A rate control algorithm for low-delay H.264 video coding with stored-B pictures,” in Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on, vol. 1, 15-20 April 2007, pp. I–1153–I–1156.

[11] “JM 10.2 [Online], http://iphome.hhi.de/suehring/tml/download/old_jm/.”