In-layer multibuffer framework for rate-controlled scalable video coding [original]

Sergio Sanz-Rodríguez, Fernando Díaz-de-María

In-layer multibuffer framework for rate-

controlled scalable video coding

Article, Postprint version

This version is available at http://dx.doi.org/10.14279/depositonce-5748.

Suggested Citation

Sanz-Rodríguez, S.; Díaz-de-María, F.: In-layer multibuffer framework for rate-controlled scalable video

coding. - In: IEEE transactions on circuits and systems for video technology : a publication of the

Circuits and Systems Society. - ISSN: 1558-2205 (online), 1051-8215 (print). - 22 (2012), 8. - pp.

1199–1212. - DOI: 10.1109/TCSVT.2012.2198089. (Postprint version cited, available at

http://dx.doi.org/10.14279/depositonce-5747.)

obtained for all other uses, in any current or future media, including reprinting/republishing

this material for advertising or promotional purposes, creating new collective works, for

resale or redistribution to servers or lists, or reuse of any copyrighted component of this

work in other works.

In-Layer Multi-Buffer Framework for

Rate-Controlled Scalable Video Coding

Sergio Sanz-Rodr´

ıguez, Member, IEEE, Fernando D´

ıaz-de-Mar´

ıa, Member, IEEE

Abstract—Temporal scalability is supported in scalable video

coding (SVC) by means of hierarchical prediction structures,

where the higher layers can be ignored for frame rate reduction.

Nevertheless, this kind of scalability is not totally exploited by

the rate control (RC) algorithms since the hypothetical reference

decoder (HRD) requirement is only satisfied for the highest frame

rate sub-stream of every dependency (spatial or coarse grain

scalability) layer. In this paper we propose a novel RC approach

that aims to deliver several HRD-compliant temporal resolutions

within a particular dependency layer. Instead of using the com-

mon SVC encoder configuration consisting of a dependency layer

per each temporal resolution, a compact configuration that does

not require additional dependency layers for providing different

HRD-compliant temporal resolutions is proposed. Specifically,

the proposed framework for rate-controlled SVC uses a set of

virtual buffers within a dependency layer so that their levels

can be simultaneously controlled for overflow and underflow

prevention while minimizing the reconstructed video distortion

of the corresponding sub-streams. This in-layer multi-buffer

approach has been built on top of a baseline H.264/SVC RC

algorithm for variable bit rate applications. The experimental

results show that our proposal achieves a good performance in

terms of mean quality, quality consistency, and buffer control

using a reduced number of layers.

Index Terms—H.264/advanced video coding (AVC),

H.264/SVC, hypothetical reference decoder (HRD), rate

control, scalable video coding (SVC), variable bit rate (VBR).

I. INTRODUCTION

DURING the last years video applications have grown

in popularity because of the increasing advances on

network infrastructure, data storage, and computational and

memory capacity of multimedia devices. Within this techno-

logical framework, scalable video coding (SVC) provides an

attractive solution for bit rate adaptation to certain application

requirements, such as display resolutions and computational

capabilities of target devices, or varying channel conditions.

Specifically, SVC enables the extraction of either one or a sub-

set of sub-streams from a high-quality bit stream, so that these

simpler sub-streams, bearing lower spatio-temporal resolutions

or reduced quality versions of the original sequence, can

be decoded by a given target receiver. Furthermore, unequal

error protection (UEP) or unequal erasure protection (UXP)

Manuscript received XX X, 2011; revised XX XX, 20XX. This work has

been partially supported by the National Grant TEC2011-26807 of the Spanish

Ministry of Science and Innovation.

The authors are with the Department of Signal Theory and Communica-

tions, Universidad Carlos III de Madrid, Legan´

es, Madrid 28911 Spain (e-

mail: {sescalona, fdiaz}@tsc.uc3m.es).

However, permission to use this material for any other purposes must be

obtained from the IEEE by sending an email to [email protected].

techniques [1] can be used to ensure an error free transmission

of more important sub-streams, such as that associated with the

lowest spatio-temporal resolution. UEP/UXP would be located

on top of the already existing channel forward error correction.

Several industries and application areas, from video confer-

ence or video surveillance [2] to Internet protocol television

(IPTV) broadcast [3], have benefited from these SVC features

for multimedia information delivery.

Scalable profiles have been developed for video coding

standards prior to H.264/advanced video coding (AVC) [4],

such as MPEG-2 [5], H.263 [6], and MPEG-4 Visual [7].

Nevertheless, most of these extensions have been rarely used

in real applications. Several factors have caused that limited

deployment: on the one hand, the unsuitability of traditional

video transmission systems and the lack of an actual diversity

of decoding devices; on the other hand, the loss in coding

efficiency and the increase in decoding complexity when com-

pared to non-scalable profiles [8]. Consequently, for earlier

coding standards, alternative approaches such as simulcasting

or transcoding have been preferred to scalable profiles. In

contrast, nowadays, the transmission systems have evolved

to properly manage this kind of traffic, and the diversity

of devices has become an apparent reality. Furthermore, the

recently standardized scalable extension of H.264/AVC, named

H.264/SVC [8], [9], is able to provide both coding efficiency

and decoding complexity more similar to those achieved using

non-scalable coding.

As prior scalable standards, H.264/SVC supports spatial,

temporal, and quality (or signal-to-noise ratio, SNR) scalabil-

ity. For spatial scalability, a layered coding approach is used

to encode different picture sizes of an input video sequence.

The base layer provides an H.264/AVC compatible bit stream

for the lowest spatial resolution, while larger picture sizes are

encoded as enhancement layers. In addition, the redundancies

between contiguous spatial layers can be exploited via inter-

layer prediction tools in order to improve the coding efficiency.

Moreover, each spatial layer is capable of supporting tem-

poral scalability by means of hierarchical prediction structures,

which go from these very efficient ones using hierarchical

bipredictive (B) pictures to those with zero structural delay.

The pictures of the temporal base layer can only use previous

pictures of the same layer as references. The pictures of a

temporal enhancement layer can be bidirectionally predicted

from pictures of a lower layer. The number of temporal layers

in a spatial layer is determined by the group of pictures

(GoP) size, defined in H.264/SVC as the distance between

two consecutive intra (I) or predictive pictures, also named

key pictures.

When SNR scalability is considered, different reconstruc-

tion quality levels with the same spatio-temporal resolution

are provided. In particular, the H.264/SVC standard defines

two types of SNR scalable coding: coarse grain scalability

(CGS) and medium grain scalability (MGS). The first is a

special case of spatial scalability with identical picture sizes.

The second employs a multi-layer approach within a spatial

layer in order to provide a finer bit rate granularity in the

rate-distortion (R-D) space.

For a variety of video coding applications, the rate control

(RC) algorithm is a key subsystem in both scalable (multi-

layer) and non-scalable (single-layer) video coding systems.

The RC algorithm works in two steps. First, a bit budget is

allocated to a video segment such as GoP, picture, or mac-

roblock according to the video content, the target bit rate, and

the buffer constraints imposed by the hypothetical reference

decoder (HRD) [10] (additionally, for digital storage, the bit

allocation method must be aware of the maximum allowed

storage capacity). Second, a quantization parameter (QP) value

is assigned to the video segment in order to satisfy these buffer

and/or budget constraints, while minimizing the reconstructed

video distortion. In the case of SVC, it is also worth noting that

the RC algorithm actually consists of a set of rate controllers,

each one located at each dependency (spatial scalability or

CGS) or MGS layer, to provide a set of HRD-compliant

scalable sub-streams, each one for a certain target bit rate

suitable for a target decoding terminal managing a particular

spatio-temporal resolution or computational capability.

The RC problem has been extensively studied for both

single-layer video coding and SVC. According to the target

application, two kinds of RC methods have been proposed:

constant bit rate (CBR) and variable bit rate (VBR) control

algorithms. On the one hand, the CBR controllers, commonly

used for real-time video conference, pursue a short-term target

bit rate adjustment to guarantee a low buffer delay. On the

other hand, the VBR controllers, typically used for video

streaming or digital storage, manage a long-term target bit rate

adaptation at the expense of a longer buffer delay to maintain

a high visual quality consistency [11], [12].

Most of the CBR controllers have focused on modeling the

discrete cosine transform (DCT) coefficients to provide ana-

lytical R-D functions for QP estimation. In single-layer video

coding, several R-D functions have been proposed: logarithmic

[13]–[15], linear [16], [17], quadratic [18]–[22] (in particular,

Chen et al. [21] proposed separate R-D models for luminance

and chrominance DCT coefficients, whilst Kwon et al. [22]

proposed separate rate models for source and header bits),

ρ-domain [23] and exponential [24], [25]. Although the RC

algorithm is not a normative part of video coding standards, it

usually forms part of their reference implementations, such as

the Test Model Version 5 for MPEG-2 [16], the Verification

Model Version 8 for MPEG-4 [18], the Test Model Near-Term

8 for H.263 [14], and the Joint Model for H.264/AVC [19].

Likewise, most of the CBR control algorithms proposed for

SVC also rely on analytical R-D models for QP estimation;

in particular: logarithmic [26], linear [27], quadratic [28], ρ-

domain [29], [30], and exponential [31], [32] models have

been proposed.

Regarding VBR controllers, several solutions for single-

layer coding have been proposed for a variety of applications,

such as video streaming and broadcast [33], [34], one-pass

digital storage [35], [36], or two-pass digital storage [37], [38].

Other methods, such as those in [39] and [40], take advantage

of networking infrastructures supporting VBR transport [12]

to improve the visual quality while reducing the buffer delay.

For SVC, a few approaches have been proposed for video

streaming [41], [42], broadcast [43], as well as applications

dealing with varying channel conditions [44]. From the R-D

modeling point of view, while some of these methods rely on

analytical R-D functions for QP estimation [33], [35], [37],

[41], [44], others estimate a QP increment with respect to a

reference QP value [34], [36], [39], [40], [43], to reduce the

QP variation for the sake of visual quality consistency.

The bit allocation problem has also been studied for SVC.

In particular, R-D models for optimal bit allocation among

spatial, quality, and temporal layers have been proposed in

[31] and [32]. Likewise, the optimal distribution of the total

target bit rate among dependency layers for visual quality

maximization has been addressed in [45]. It is also worth

noting that quality scalability was specially investigated for

MPEG-4 fine grain scalability (FGS) [41], [44] and H.264

MGS [31], [42], [46], [47].

Nevertheless, all these previous RC approaches for SVC

only guarantee the HRD requirement for the highest temporal

layer of each dependency layer. Therefore, temporal scala-

bility is not fully exploited since, in order to deliver HRD-

compliant sub-streams, it is necessary to increase the number

of dependency layers. For instance, if a video transmission

service offered the same quality of service (QoS) to two

target decoders with identical spatial resolutions but different

temporal resolutions, the SVC encoder would have to use

two CGS layers, one per temporal layer. Although the two

desired HRD-compliant sub-streams are provided, temporal

scalability is underused since each one of the highest temporal

layers actually also contains the lower frame rate. In summary,

the common SVC encoder configuration for rate-controlled

video may incur in redundant dependency layers, producing

an unnecessary increase in bit rate and coding complexity.

In this paper we propose a novel RC approach for delivering

HRD-compliant temporal resolutions within a particular de-

pendency layer. Specifically, the proposed method uses a set of

virtual buffers (one per HRD-compliant temporal resolution)

within a dependency layer, so that the buffer levels can be

simultaneously controlled for overflow and underflow preven-

tion, while minimizing the reconstructed video distortion of

the corresponding sub-streams. The proposed in-layer multi-

buffer (IL-MB) approach has been built on top of a baseline

RC algorithm described in [43], which relies on an effective

radial basis function (RBF)-based model for QP estimation in

VBR scenarios.

The paper is organized as follows. In Section II, an overview

of the baseline RC algorithm for H.264/SVC is given. In

Section III a detailed description of the proposed IL-MB

VBR controller is provided. First, a general description of

the proposed method is given. Second, the proposed VBR

controller is described stage by stage, making special emphasis

TABLE I

SUMMARY OF NOTATION

DNumber of dependency layers

dDependency layer identifier

tTemporal layer identifier

jCurrent picture number

BD Buffer size in seconds

nT F Normalized target buffer fullness with respect to BD

HGaussian-type function

LNumber of Gaussian-type functions

C,Σ,w,w0Centers, widths, weights, and bias of the RBF network

For each layer d

T(d)Number of temporal layers

t(d)

max Maximum temporal layer identifier

t(d)

min Minimum involved temporal layer identifier

RC(d)Rate control module

kTemporal layer index that goes from t(d)

min to t(d)

max

R(d,k)Target bit rate for the sub-stream k

f(d,k)

out Output frame rate of the sub-stream k

QP (d)QP value

QP (d)

REF Reference QP value

∆QP (d)QP increment

QP(d)Set of previous QPs

BS(d,k)Buffer size in bits associated with the sub-stream k

V(d,k)Buffer fullness associated with the sub-stream k

V(d,k)Set of involved buffers after encoding a picture at the

layer t

nV(d)Set of normalized versions of all the buffer fullness

nV (d)Normalized version of the buffer fullness

G(d,t,k)AU target bits at the layer tto meet R(d,k)

AU(d,t)AU output bits of a picture at the layer t

nAU(d)Set of normalized versions of the AU output bits

nAU(d)Normalized version of the AU output bits

C(d,t)

T EX Average texture complexity of the layer t

C(d,t)

MOT Average motion complexity of the layer t

X(d)Input vector to the RBF network

on the buffer modeling stage, which is used to properly manage

the set of virtual buffers. Section IV reports and discusses

the experimental results. Finally, in Section V conclusions are

drawn and future work is outlined.

II. BASELINE VBR CONTROLLER SUMMARY [43]

A. System Overview

In order to make the reading easier, the notation used along

the paper has been summarized in Table I. In this way, the

reader may turn to it when necessary and some superfluous

definitions may be skipped in the text to make it more readable.

The baseline RC scheme is illustrated in dark gray in Fig. 1

for an encoder consisting of two dependency layers. Let us

denote as Dthe number of dependency layers, identified as

d={0,1. . . D −1}, and let us denote as T(d)the number of

temporal layers for a particular dependency layer, identified as

t=0,1. . . T(d)−1. Alternatively, for the sake of notation

consistency with the proposed method, we will also refer to

the maximum temporal layer identifier T(d)−1as t(d)

max.

Each dependency layer dinvolves a rate controller RC(d)

and a virtual buffer. The virtual buffer at the layer dreceives

the contributions of those layers with identifiers from (0,0) to

(d, t(d)

max)and simulates the encoder buffering process of the

highest temporal resolution sub-stream. Thus, both the virtual

buffer and the corresponding sub-stream will be identified as

(d, t(d)

max)to indicate that the video packets with higher spatio-

temporal identifiers will be discarded by the target decoder.

The generation of an HRD-compliant sub-stream depends

on two fundamental parameters: the target bit rate R(d,t(d)

max)

and the output frame rate f(d,t(d)

max)

out . It should be noticed that

R(d,t(d)

max)must be higher than those associated with lower

layers, i.e., R(d−x,t(d)

max−y)≤R(d,t(d)

max), with x= 0,1. . . d,

and y=0,1. . . t(d)

max, since those lower layers form part of the

sub-stream d, t(d)

max.

If a particular dependency layer contained additional Q(d)

MGS refinements, denoted as q=1. . . Q(d)−1(it should

be noticed that q= 0 refers to the quality base layer for

a given dependency layer), a rate controller RC(d,q)and

the corresponding virtual buffer would be located at each

spatio-quality layer (d, q). However, in order to make the

notation easier, hereafter we will only consider spatial/CGS

and temporal scalability.

In order to encode the jth picture with spatio-temporal iden-

tifier (d, t), the RC(d)module should provide an appropriate

QP(d)

jvalue, on a frame basis, so that the QP fluctuation is

minimized (to improve visual quality consistency), while the

buffer fullness V(d,t(d)

max)is maintained at secure levels. To

this end, the RC(d)module operation leans on three input

parameters:

1) The fullness V(d,t(d)

max)of the corresponding virtual buffer.

2) The amount of bits yield by the encoding of the spatial

layers 0to dfor a given time instant. Henceforth, follow-

ing the H.264/SVC nomenclature [8], we will refer to this

amount of bits as access unit (AU) output bits AU(d,t).

3) The QP value used to encode the previous picture of the

same dependency layer QP(d)

j−1.

A proper QP increment ∆QP(d)is estimated from the two

firsts, and QP(d)

j−1is employed as a reference value to obtain

the final quantization parameter as follows:

QP(d)

j=QP(d)

j−1+∆QP(d).(1)

Furthermore, in the case of CGS scalability, the QP obtained

is lower bounded by the QP of the reference layer, so that a

higher quality for the enhancement layer is ensured:

QP(d)

j=minhQP(d−1)

j, QP(d)

ji.(2)

The VBR control algorithm for a specific spatial or CGS

layer, i.e., the algorithm that estimates an appropriate QP

increment for the jth picture with identifier (d, t)is illustrated

in Fig. 2. As shown in the figure, the RC module RC(d)

is organized in two stages, named parameter updating and

RBF-based QP increment estimation. These stages are briefly

described bellow.

B. Parameter Updating

After encoding the (j−1)th picture with layer identifier

(d, t0)(t0is used instead of tbecause the previous picture can

belong to a different temporal layer), two parameters required

to estimate the QP increment are updated: 1) a normalized

version of the buffer fullness, denoted as nV (d); and 2) a

normalized version of the amount of bits generated by the

Fig. 1. Block diagram of the baseline H.264/SVC RC scheme for two

dependency layers (D=2).

AU, denoted as nAU(d). These normalized versions of the

buffer fullness and the AU output bits are defined as follows:

nV (d)=max "0,min"V(d,t(d)

max)

BS(d,t(d)

max),1##,(3)

nAU(d)=max "1

2,min"AU(d,t0)

G(d,t0,t(d)

max),2##,(4)

where BS(d,t(d)

max)denotes the buffer size in bits, V(d,t(d)

max)and

AU(d,t0)have already been defined, and G(d,t0,t(d)

max)denotes

the bit budget for the AU at the layer (d, t0)in order for the

sub-stream (d, t(d)

max)to satisfy the target bit rate constraint

R(d,t(d)

max). The updating equations for V(d,t(d)

max),AU(d,t0), and

G(d,t0,t(d)

max)are briefly summarized next.

The buffer fullness V(d,t(d)

max)is updated as follows:

V(d,t(d)

max)

j=V(d,t(d)

max)

j−1+AU(d,t0)

j−1−R(d,t(d)

max)

f(d,t(d)

max)

out

,(5)

The amount AU output bits AU(d,t0)is updated as:

AU(d,t0)

j−1=

m=0 b(m,t0)

j−1+h(m,t0)

j−1,(6)

where b(m,t0)

j−1and h(m,t0)

j−1are, respectively, the amount of

texture bits and header plus motion data bits generated by

the (j−1)th picture with layer identifier (m, t0).

Finally, G(d,t0,t(d)

max)is determined by the following model:

G(d,t0,t(d)

max)=G(d,t(d)

max)

NOM +∆G(d,t0,t(d)

max)

T EX +∆G(d,t0,t(d)

max)

MOT ,(7)

where G(d,t(d)

max)

NOM is the nominal bit budget:

G(d,t(d)

max)

NOM =R(d,t(d)

max)

f(d,t(d)

max)

out

,(8)

Fig. 2. Block diagram of the rate controller module RC(d)for a specific

dependency layer d.

and ∆G(d,t0,t(d)

max)

T EX and ∆G(d,t0,t(d)

max)

MOT represent texture and

motion bit increments, respectively, i.e.:

∆G(d,t0,t(d)

max)

T EX =R(d,t(d)

max)

f(d,t(d)

max)

out







C(d,t0)

T EX Pt(d)

max

u=0 N(d,u)

Pt(d)

max

u=0 C(d,u)

T EXN(d,u)−1





,(9)

∆G(d,t0,t(d)

max)

MOT =C(d,t0)

MOT −

C(d,t0)

T EX Pt(d)

max

u=0 C(d,u)

MOTN(d,u)

Pt(d)

max

u=0 C(d,u)

T EXN(d,u),(10)

where N(d,u)is the total number of pictures per GoP with

layer identifier (d, u), and C(d,t0)

T EX and C(d,t0)

MOT denote, respec-

tively, the average texture and motion complexities of the

encoded pictures at the dependency layers 0to dbelonging

to the temporal layer t0. The following updating equations for

both complexity measurements are proposed:

C(d,t0)

T EX =α

m=0Q(m)

j−1b(m,t0)

j−1+(1 −α)C(d,t0)

T EX ,(11)

C(d,t0)

MOT =β

m=0

h(m,t0)

j−1+(1 −β)C(d,t0)

MOT ,(12)

where αand βare forgetting factors that are set to 0.5in

our experiments, and Q(m)

j−1is the quantization step value

associated with QP(m)

j−1.

C. RBF-based QP Increment Estimation

Before encoding the jth picture, the proper QP increment

∆QP(d)with respect to QP(d)

j−1should be estimated from

nV (d)in Eq. (3) and nAU(d)in Eq. (4). Furthermore, two

additional constant parameters are considered as inputs to this

process in order to provide a solution suitable for a variety of

scenarios. The first, denoted as nTF, is the normalized target

buffer fullness with respect to the buffer size; and the second,

denoted as BD, is the maximum buffering delay (or buffer

size in seconds), which is related to that measured in bits as

BS(d,t(d)

max)=BD×R(d,t(d)

max).(13)

Thus, the proposed ∆QP(d)estimation method operates on

the following input vector:

X(d)=nV (d), nAU(d), nTF, BDT

,(14)

implicitly assuming that all the virtual buffers share the same

nTF and BD values. Since the input parameters nT F and

BD are set before starting the encoding process, the proposed

∆QP(d)prediction function can be seen as a surface whose

shape depends on these constants.

An RBF network is used to estimate ∆QP(d)from the

input vector X(d)for any dependency layer. This RBF-based

estimation obeys:

∆QP(d)=round"w0+

i=1

wiHiX(d)#,(15)

where Lis the number of basis functions HiX(d)i=1,...,L,

withe output weights, and w0the bias. The output of the

RBF network is then converted into an integer, given the

discrete nature of the QP in H.264/SVC. The basis functions

are Gaussian-type functions with centers Ciand widths Σ,

that is:

HiX(d)

=exp



−

j=1 X(d)

j−Cij2

Σ2

j



.(16)

The training of the RBF network relies on a data set

containing pairs input vector-desired output, which have to be

previously generated. Once these training data were generated,

it was observed that their distributions for key (K) and non-

K (NK) pictures were different enough to justify the design

of two RBF networks, one for K pictures and the other for

NK pictures. Furthermore, some validation experiments were

performed to properly dimension the RBF networks whose

results led to 7Gaussian functions in both cases.

Finally, since some unnecessary fluctuations of the QP value

at NK pictures were observed in cases of stationary video

complexity when the buffer level approached the target buffer

fullness, a simple post-processing stage of the output of the

NK-picture RBF network was proposed, that obeys:

∆QP(d)=









−1if ∆QP(d)=−2

0if ∆QP(d)=−1

0if ∆QP(d)=1

1if ∆QP(d)=2.

(17)

In doing so, the number of short-term QP fluctuations

happening in stationary complexity situations was minimized

without decreasing the performance in time-varying situations.

III. IN-LAYER MULTI-BUFFER VBR CONTROLLER

A. System Overview

The proposed VBR control scheme is illustrated in Fig. 3.

For clarity reasons, only the dependency base layer (d= 0) of

the SVC encoder is shown. The blocks depicted in dark gray

are the extensions required by the baseline VBR controller

shown in Fig. 1 to become an IL-MB rate controller.

Each dependency layer dinvolves an RC module RC(d)and

a set of virtual buffers. Each of these virtual buffers simulates

the encoder buffering process of the sub-stream corresponding

to certain temporal resolution. In order to properly formulate

the IL-MB rate controller, a parameter t(d)

min is introduced that

indicates which of those temporal resolution sub-streams from

(d, 0) to (d, t(d)

max)should comply with the HRD constraints.

Specifically, when t(d)

min =t(d)

max, the proposed IL-MB RC

scheme becomes the baseline algorithm.

To make the explanation of the IL-MB framework easier,

let us follow the example illustrated in Fig. 3. In particular, the

input video is a quarter common intermediate format (QCIF)

sequence at 25 Hz using a GoP size of 8pictures, so that

encoded video from QCIF@3.125 Hz to QCIF@25 Hz can be

provided. Setting t(0)

min = 1 means that the three higher tempo-

ral resolution sub-streams (0,1),(0,2), and (0,3) should be

HRD-compliant and, consequently, their corresponding virtual

buffers should be controlled for proper video content delivery.

For the lowest temporal resolution sub-stream, however, the

HRD compliance would not be guaranteed.

Following with the example, when the jth picture with layer

identifier (0,2) (see Fig. 3) is going to be encoded, the goal of

the the RC module RC(0) is to provide an appropriate QP(0)

value, so that the set of virtual buffers involved are maintained

at secure levels. Specifically, the set of virtual buffers involved

in the encoding of the jth picture with layer identifier (0,2)

are:

V(0,2) =nV(0,k)ok=max

ht(0)

min,2i...t(0)

max

where V(0,k)denotes the buffer fullness associated with the

sub-stream (0, k), with k= maxht(0)

min,2i. . . t(0)

max. It should

be noticed that, since t(0)

min = 1, the lowest kvalue is 2and,

therefore, the two higher virtual buffers are updated. However,

if the picture belonged to a temporal layer lower than or equal

to t(0)

min, the three virtual buffers would be updated. From now

on, we will refer to the virtual buffers to be updated at the

time instant jas involved buffers.

It is also worth mentioning that all the involved buffers

must be taken into account to estimate the current QP value,

since a proper behavior is not guaranteed in all of them

otherwise. Thus, the method for properly controlling any set

V(d,t)becomes the main focus of the proposed IL-MB VBR

controller.

The rate controller RC(d), similarly to what was described

for the baseline RC approach, obtains a reference QP, QP(d)

REF ,

estimates a ∆QP(d)value, and finally computes the desired

QP(d)

jas follows:

QP(d)

j=QP(d)

REF +∆QP(d),(18)

The reference QP is computed from those QPs used for en-

coding the last pictures belonging to the sub-streams (d, t(d)

min)

to (d, t(d)

max)(see Subsection III-B2 for details). This set of

previous QPs, defined as

QP(d)=nQP(d,k)ok=t(d)

min...t(d)

max

is updated on a frame basis according to the involved buffers

at the time instant j, as described in Algorithm 1. It should be

noticed that the storage of this set of QPs requires a memory

block (see Fig. 3) that was not necessary in the baseline

approach (see Fig. 1), where there was just a delay line to

make previous QP value available.

Fig. 3. Block diagram of the proposed H.264/SVC RC scheme for IL-MB

control. Only the spatial base layer is depicted for the sake of clarity.

Algorithm 1 QP(d)updating procedure

1. for k=max ht(d)

min, tito t(d)

max do {involved buffers}

2. QP(d,k)←QP (d)

3. end for

The QP increment is selected to provide a slow QP variation

so that the visual quality consistency is improved. Similarly

to what was described for the baseline VBR control algo-

rithm, the following input parameters are required to compute

∆QP(d):

1) The current fullness of the virtual buffers (d, t(d)

min)to

(d, t(d)

max).

2) The amount AU(d,t)of AU output bits.

In the following subsection, a detailed description of the RC

module for IL-MB control at a specific dependency layer is

given.

B. The Rate Controller Module RC(d)

The MB-based RC module RC(d)is illustrated in Fig.

4. The estimation of QP(d)

jis performed in three stages,

namely: parameter updating,buffer modeling and RBF-based

QP increment estimation, which are described in more detail

through the next subsections.

1) Parameter Updating: After encoding the (j−1)th picture

with layer identifier (d, t0), two parameter sets, required to esti-

mate the QP increment, should be updated: 1) the normalized

versions of the buffer levels (d, t(d)

min)to (d, t(d)

max), denoted

as nV(d); and 2) the normalized versions of AU(d,t0)for the

sub-streams (d, t(d)

min)to (d, t(d)

max), denoted as nAU(d). These

parameter sets are defined as follows:

nV(d)=V(d,k)

BS(d,k)k=t(d)

min...t(d)

max

nAU(d)=AU(d,t0)

G(d,t0,k)k=t(d)

min...t(d)

max

where BS(d,k)is the buffer size in bits for the sub-stream

(d, k), which is computed from the buffer size in seconds,

Fig. 4. Block diagram of the MB-based rate controller module RC(d)for

a specific dependency layer d.

BD, and the target bit rate, R(d,k)(see Eq. (13)); and G(d,t0,k)

is the AU target bits at the layer (d, t0)to satisfy R(d,k).

These updating equations require the previous update of

the involved buffers V(d,t0)and the estimation of the set

of AU target bits {G(d,t0,k)}. In turn, the update of the

involved buffers requires to obtain the AU output bits AU(d,t0),

and the estimation of the set of AU target bits requires the

previous update of average texture and motion complexities

for each temporal layer ufrom 0to t(d)

max,C(d,u)

T EX and C(d,u)

MOT ,

respectively.

The virtual buffer levels, the AU output bits, the AU target

bits, as well as the average texture and motion complexities

are updated as in Subsection II-B, but replacing t(d)

max by the

index k, which takes values from t(d)

min to t(d)

max. Algorithm 2

summarizes the complete updating procedure for nV(d)and

nAU(d).

Algorithm 2 nV(d)and nAU(d)updating procedure

1. Compute AU(d,t0)(6)

2. Update C(d,t0)

T EX (11)

3. Update C(d,t0)

MOT (12)

4. for k=max ht(d)

min, t0ito t(d)

max do {involved buffers}

5. Update V(d,k)(5)

6. Compute G(d,t0,k)(7), (8), (9), (10)

7. nV (d,k)←max h0,minhV(d,k)

BS(d,k),1ii

8. nAU(d,k)←max h1

2,minhAU(d,t0)

G(d,t0,k),2ii

9. end for

2) Buffer Modeling: In this stage three parameters required

to estimate the QP value are computed. These parameters are

representative values of the sets QP(d)(Algorithm 1), nV(d),

and nAU(d)(Algorithm 2), which are denoted as QP(d)

REF ,

nV (d), and nAU(d), respectively. The first parameter is used

as reference QP in Eq. (18), while the last two are required

for ∆QP(d)estimation.

The buffer modeling algorithm suggested for estimating the

aforementioned values is made up of several decision rules that

are described next. If none of the involved buffer levels is close

to overflow or underflow, then nV (d),nAU(d)and QP(d)

REF are

computed as the arithmetic average of the parameters corre-

sponding to the involved temporal resolutions. Otherwise, only

the parameters coming from that temporal resolution showing

the most critical buffer fullness is considered. Nevertheless,

given that more than one involved buffer fullness could be

considered as critical at a certain time instant, the following

precedence rules have been established (relying on certain

observations about the time evolution of the virtual buffers

for a variety of video sequences):

1) Since the overflow risk is more likely than the underflow

risk, especially when encoding I pictures, the overflow

risk is given precedence in each involved buffer.

2) Since the buffer of the lowest temporal resolution usually

exhibits the largest fluctuations and, therefore, the highest

overflow and underflow risks (since its buffer size in bits

is the smallest for a given pre-established BD value), the

involved buffer levels are given precedence according to

their temporal layer identifier.

The pseudocode given in Algorithm 3 summarizes the

proposed buffer modeling process.

Algorithm 3 nV (d),nAU(d)and QP(d)

REF updating procedure

1. nV (d)=nAU(d)=QP(d)

REF =0

2. NumOfInvolBuffers=t(d)

max −maxht(d)

min, ti+1

3. for k=max ht(d)

min, tito t(d)

max do {involved buffers}

4. if nV (d,k)≥0.8then {overflow risk}

5. nV (d)←nV (d,k)

6. nAU(d)←nAU(d,k)

7. QP(d)

REF ←QP (d,k)

8. break for

9. else if nV (d,k)≤0.2then {underflow risk}

10. nV (d)←nV (d,k)

11. nAU(d)←nAU(d,k)

12. QP(d)

REF ←QP (d,k)

13. break for

14. else {secure level}

15. nV (d)←nV (d)+nV (d,k)

16. nAU(d)←nAU(d)+nAU(d,k)

17. QP(d)

REF ←QP (d)

REF +QP (d,k)

18. if k=t(d)

max then {all buffers at secure levels}

19. nV (d)←nV (d)

NumOfInvolBuffers

20. nAU(d)←nAU(d)

NumOfInvolBuffers

21. QP(d)

REF ←roundQP (d)

REF

NumOfInvolBuffers 

22. end if

23. end if

24. end for

It is worth noticing that, although the given description of

the buffer modeling stage is tied to the baseline RC algorithm

formulation, the underlying ideas migth be adapted to any

other RC algorithm for SVC in order to obtain the proper

values of the required parameters for QP estimation.

3) RBF-based QP Increment Estimation: As in the baseline

RC scheme, the four-dimensional input vector given in Eq.

(14) is fed into an RBF network to produce a ∆QP(d)

estimation. Actually, two different networks are used, one for

K pictures and the other for NK pictures. The architecture

of each RBF network is the same as that given in Eqs. (15)

and (16); however, the RBF network parameters must be

specifically trained to cope with the proposed IL-MB method,

where the buffer and distortion constraints for QP selection are

tougher; in particular, the RBF network parameters should be

chosen to properly deal with the fact that several buffers have

to be simultaneously controlled within a dependency layer.

In order to find the most suitable RBF network parameters,

a training data set was previously generated. Subsequently, the

training and parameter selection processes were performed. To

this purpose, the general methodology described in [43] was

followed; nevertheless, the cost function used for data labeling

had to be modified so that the desired QP increment would

adapt to the IL-MB framework, providing a good tradeoff

between the control of the involved buffers and the quality

consistency of the corresponding sub-streams. The adapted

cost function is given in Appendix A.

The training and validation results led us to select 10

Gaussian-type functions for both (K- and NK-picture) RBF

networks, whose parameters are also given in Appendix A.

Finally, the post-processing stage of the output of the NK-

picture RBF network given in Eq. (17) is also performed in

order to reduce unnecessary QP fluctuations.

IV. EXPERIMENTS AND RESULTS

The Joint Scalable Video Model (JSVM) H.264/SVC refer-

ence software version JSVM 9.16 [48] was used to implement

the proposed IL-MB VBR controller. Its performance was

compared to other two methods: 1) constant QP (CQP) encod-

ing1, which can be seen as an unconstrained VBR controller

[11] and was used as a reference for nearly constant quality

video; and 2) our baseline VBR controller described in [43],

which can be seen as a particular case of the proposed method

when t(d)

min =t(d)

max for every dependency layer.

In the following subsections, the SVC encoder and RC

configurations employed for comparisons are described, the

experimental results are given, and a discussion concerning

these results is provided.

A. Description of the SVC Encoder and RC Configurations

According to the SVC testing conditions recommended in

[49], the mobile live streaming scenario described in [43] was

used to assess the aforementioned algorithms. In particular,

the following five-dependency layer H.264/SVC encoder con-

figuration was used for the baseline VBR controller:

a) Number of pictures: 900.

b) GoP size/Intra period: 8/32 pictures.

c) GoP structure: hierarchical B pictures.

d) Search range for motion estimation: 16×16 pixels.

e) Number of dependency layers: D=5.

i) d=0 : QCIF, f(0,1)

out =6.25 Hz T(0) =2.

ii) d=1 : QCIF, f(1,2)

out =12.5Hz T(1) =3.

iii) d=2 : CIF, f(2,2)

out =12.5Hz T(2) =3.

1CQP encoding means that every temporal layer within a spatial or quality

layer shares the same QP value, while the QP value of each spatial or quality

layer can be different in order to reach the pre-established target bit rate.

iv) d=3 : CIF, f(3,2)

out =12.5Hz T(3) =3.

v) d=4 : CIF, f(4,3)

out =25 Hz T(4) =4.

f) Symbol mode: CAVLC.

The RC parameters were set as follows: target buffer full-

ness nTF = 50% and buffer size BD = 3 s. Henceforth, we

will refer to this SVC configuration as baseline configuration

(BC) and to the rate-controlled SVC (RC-SVC) as single-

buffer BC (SB-BC).

For the proposed IL-MB VBR controller, the following

three-dependency layer H.264/SVC encoder configuration was

used:

a) Number of pictures: 900.

b) GoP size/Intra period: 8/32 pictures.

c) GoP structure: hierarchical B pictures.

d) Search range for motion estimation: 16×16 pixels.

e) Number of dependency layers: D=3.

i) d=0: QCIF, f(0,2)

out =12.5Hz T(0) =3.

ii) d=1: CIF, f(1,2)

out =12.5Hz T(1) =3.

iii) d=2: CIF, f(2,3)

out =25 Hz T(2) =4.

f) Symbol mode: CAVLC.

We will refer to this SVC encoder configuration as compact

configuration (CC) since it consists of only three layers in

comparison to BC, which is made of five layers. The RC

parameters took the following values: nTF = 50% and

BD =3 s., the same as for SB-BC, and t(0)

min = 1,t(1)

min = 2,

and t(2)

min = 2. As can be observed, t(0)

min and t(2)

min were

set such that HRD-compliant sub-streams for QCIF@6.25 Hz

(d= 0) and high-quality (HQ) CIF@12.5Hz (d= 2) were

available, as for SB-BC. Henceforth, this RC-SVC encoder

will be referred to as MB-CC.

Furthermore, in order to analyze the behavior of the pro-

posed VBR controller if only one buffer per dependency layer

was controlled (that corresponding to the highest frame rate),

an additional H.264/SVC encoder and RC configuration with

t(d)

min =t(d)

max for every dependency layer was also studied. We

will refer to it as SB-CC.

Two sets of video sequences at 25 Hz exhibiting a variety

of complexities were used in our experiments. The first set

consisted of four well-known test sequences recommended

in [49] for streaming applications: Bus,Football,Foreman,

and Mobile. These sequences were concatenated to themselves

several times to reach the aforementioned number of pictures.

The second set consisted of three sequences displaying scene

changes: Soccer-Mobile-Foreman,Spiderman (movie), and

The Lord of the Rings (movie). Soccer-Mobile-Foreman was

formed by concatenating 300 frames of each sequence. The

other two were extracted from HQ Digital Versatile Disks and

downsampled to either QCIF or CIF format, and have been

made available on-line in [50]. They show many scene cuts,

so they are challenging from the RC point of view.

All the sequences were encoded using the set of constant

QP values that best approached some pre-established target bit

rates. We will refer to this RC-SVC encoder as CQP-CC. For

the first group of sequences, the target bit rates for the highest

temporal resolution of each layer d, i.e., QCIF@12.5Hz (0,2),

low-quality (LQ) CIF@12.5Hz (1,2) and HQ CIF@25 Hz

TABLE II

TARGET BIT RATES ASSIGNED TO EACH LAYER OF THE COMPARED

RC-SVC ENCODERS

Layer RC-SVC Resolution Assigned R(d,t)

(d,t) Encoder from CQP-CC

(0,1) SB-BC

[email protected] Hz R(0,1)

out

- -

(0,1) MB-CC

(1,2) SB-BC

[email protected] Hz R(0,2)

out

(0,2) SB-CC

(0,2) MB-CC

(2,2) SB-BC

LQ [email protected] Hz R(1,2)

out

(1,2) SB-CC

(1,2) MB-CC

(3,2) SB-BC

HQ [email protected] Hz R(2,2)

out

- -

(2,2) MB-CC

(4,3) SB-BC

HQ CIF@25 Hz R(2,3)

out

(2,3) SB-CC

(2,3) MB-CC

TABLE III

AVERAGE RESULTS ACHIEVED BY THE SB-BC, THE SB-CC, AND THE

PROPOSED MB-CC VBR CONTROLLERS. INCREMENTAL RESULTS ARE

GIVEN WITH RESPECT TO CQP-CC ENCODING

Layer RC-SVC ∆µPSNR ∆σPSNR,jBit Rate #O/#U µV

(d,t) encoder. (dB) (dB) Error (%) (%)

(0,1) SB-BC -0.12 0.09 1.00 0/0 52.34

(0,1) SB-CC 0.05 0.22 2.68 5/0 59.90

(0,1) MB-CC 0.05 0.19 1.48 0/0 55.72

(1,2) SB-BC -0.25 0.15 1.86 1/0 64.77

(0,2) SB-CC 0.10 0.19 0.94 0/0 55.09

(0,2) MB-CC 0.08 0.16 0.84 0/0 54.66

(2,2) SB-BC -0.16 0.11 0.94 0/0 52.98

(1,2) SB-CC 0.00 0.09 0.93 0/0 55.26

(1,2) MB-CC 0.00 0.09 1.02 0/0 55.19

(3,2) SB-BC -0.10 0.07 0.59 0/0 52.42

(2,2) SB-CC 0.05 0.12 1.45 0/0 56.42

(2,2) MB-CC 0.06 0.11 0.79 0/0 54.17

(4,3) SB-BC -0.21 0.06 1.57 0/0 64.82

(2,3) SB-CC 0.09 0.11 0.48 0/0 54.02

(2,3) MB-CC 0.08 0.10 0.51 0/0 53.43

(2,3) were those suggested in [49] for the spatial/CGS testing

scenario. For the second group, the following medium-quality

target bit rates associated with the highest temporal resolution

of each layer dwere selected: 96 (0,2),192 (1,2), and 512

kbps (2,3). The output bit rates R(d,t)

out generated by the CQP-

CC encoding for the five target spatio-temporal resolutions

were used as target bit rates R(d,t)for the three assessed RC-

SVC encoders, i.e.: SB-BC, SB-CC, and MB-CC. The same

target bit rates were assigned to each involved spatio-temporal

layer for all the RC-SVC encoders so that all the compared

encoders operated under the same bit rate constraints. The

actual R(d,t)values are listed in Table II. It should be noted

that the low temporal resolution for both QCIF and HQ CIF

layers is not rate-controlled in SB-CC.

B. Experimental Results and Discussion

In order to assess the performance of the proposed IL-

MB VBR controller from the quality point of view, the

average luminance peak SNR (PSNR) µP SNR was used. The

Bjøntegaard recommendation [51] was followed to properly

TABLE IV

PERFORMANCE COMPARISON AMONG THE SB-BC, THE SB-CC, AND THE

PROPOSED MB-CC VBR CONTROLLERS,FOR A SPECIFIC STATIONARY

COMPLEXITY VIDEO SEQUENCE,Bus. THE RESULTS ACHIEVED BY

CQP-CC ENCODING HAVE ALSO BEEN INCLUDED FOR REFERENCE

Layer R(d,t)RC-SVC µPSNR σPSNR,jBit Rate #O/#U µV

(d,t) (kbps) Scheme (dB) (dB) Error (%) (%)

(0,1)

73.89

CQP-CC 31.24 0.31 - 0/0 51.97

(0,1) SB-BC 31.24 0.31 -0.03 0/0 51.89

(0,1) SB-CC 31.23 0.39 0.95 0/0 56.07

(0,1) MB-CC 31.22 0.40 0.96 0/0 55.53

(0,2)

101.61

CQP-CC 31.11 0.27 - 0/0 52.70

(1,2) SB-BC 31.00 0.32 1.27 0/0 63.15

(0,2) SB-CC 31.10 0.35 0.52 0/0 53.94

(0,2) MB-CC 31.10 0.36 0.50 0/0 52.88

(1,2)

202.67

CQP-CC 26.94 0.16 - 0/0 51.91

(2,2) SB-BC 26.86 0.19 -0.09 0/0 51.03

(1,2) SB-CC 26.92 0.23 0.26 0/0 53.44

(1,2) MB-CC 26.92 0.23 0.44 0/0 53.15

(2,2)

404.97

CQP-CC 30.01 0.19 - 0/0 52.01

(3,2) SB-BC 29.99 0.19 -0.04 0/0 52.15

(2,2) SB-CC 30.01 0.25 0.40 0/0 54.76

(2,2) MB-CC 30.02 0.24 0.53 0/0 53.66

(2,3)

517.67

CQP-CC 30.05 0.17 - 0/0 52.20

(4,3) SB-BC 29.91 0.18 1.5 0/0 65.57

(2,3) SB-CC 30.05 0.22 0.31 0/0 54.43

(2,3) MB-CC 30.06 0.21 0.46 0/0 53.41

compare the µP SNR values obtained by the compared algo-

rithms. The average results over all the test video sequences

are summarized in Table III in terms of PSNR increments

∆µP SNR with respect to CQP-CC encoding. Three rows per

spatio-temporal layer are shown, one for each assessed RC-

SVC encoder. As can be observed, the average PSNR achieved

by SB-CC and MB-CC at every spatio-temporal layer were

similar to that of CQP-CC and higher than that of SB-BC,

which, for the same target bit rate R(d,t), is encoding more

layers.

A detailed comparison of the algorithms is shown in Tables

IV and V. Table IV shows the results achieved for Bus,

a representative example of video sequence with stationary

complexity, and Table V shows the results for The Lord of the

Rings, a representative example of video sequence with scene

changes. The results in terms of average PSNR indicate that,

for non-stationary complexity sequences, the performance of

either SB-CC or MB-CC improved that of the nearly constant

quality system at most spatio-temporal layers. However, for

stationary complexity sequences, the performance achieved by

the three VBR controllers were very close to that of the nearly

constant quality system.

Representative behaviors of the encoder buffer occupancy,

PSNR and QP time evolutions corresponding to the two lower

spatio-temporal resolutions, QCIF@6.25 Hz and QCIF@12.5

Hz, are depicted in Figs. 5 and 6 for Bus, and Figs. 7

and 8 for The Lord of the Rings, where the QCP-CC plots

have been removed for clarity reasons. High quality plots

including those of CQP-CC encoding can be found in [50]

for every spatio-temporal resolution. As can be shown, in the

stationary scenario the three assessed VBR controllers were

able to keep the QP fluctuation low most of the time, thus

providing a nearly constant PSNR time evolution. However,

some high buffer levels and QP fluctuations were observed

TABLE V

PERFORMANCE COMPARISON AMONG THE SB-BC, THE SB-CC, AND THE

PROPOSED MB-CC VBR CONTROLLERS,FOR A SPECIFIC

NON-STATIONARY COMPLEXITY VIDEO SEQUENCE,The Lord of the

Rings. THE RESULTS ACHIEVED BY CQP-CC ENCODING HAVE ALSO

BEEN INCLUDED FOR REFERENCE

Layer R(d,t)RC-SVC µPSNR σPSNR,jBit Rate #O/#U µV

(d,t) (kbps) Scheme (dB) (dB) Error (%) (%)

(0,1)

66.50

CQP-CC 34.45 0.66 - 42/48 49.70

(0,1) SB-BC 34.40 0.91 2.31 0/0 53.89

(0,1) SB-CC 34.75 0.96 5.53 36/0 70.17

(0,1) MB-CC 34.77 0.94 1.70 0/0 62.76

(0,2)

93.99

CQP-CC 34.36 0.66 - 104/113 46.71

(1,2) SB-BC 34.23 0.99 2.59 0/0 60.90

(0,2) SB-CC 34.80 0.97 1.05 0/0 54.38

(0,2) MB-CC 34.76 0.94 0.20 0/0 50.47

(1,2)

186.51

CQP-CC 32.87 0.90 - 98/113 47.26

(2,2) SB-BC 32.77 1.09 1.90 0/0 52.02

(1,2) SB-CC 33.15 1.08 1.00 0/0 55.21

(1,2) MB-CC 33.12 1.07 1.18 0/0 55.88

(2,2)

385.34

CQP-CC 35.25 0.83 - 93/114 45.16

(3,2) SB-BC 35.31 0.94 1.73 0/0 51.98

(2,2) SB-CC 35.54 1.00 3.24 0/0 71.54

(2,2) MB-CC 35.58 0.94 0.92 0/0 58.58

(2,3)

507.26

CQP-CC 35.29 0.81 - 217/241 45.11

(4,3) SB-BC 35.27 0.95 2.26 0/0 58.81

(2,3) SB-CC 35.69 0.99 0.42 0/0 53.58

(2,3) MB-CC 35.67 0.94 0.04 0/0 49.95

at certain time instants for SB-BC (see Fig. 6) because more

layers were encoded for a given target bit rate. In the non-

stationary scenario the three assessed algorithms made, with

some exceptions that will be dicussed, a proper use of the

buffer fullness to provide PSNR and QP evolutions closer

to those of the nearly constant quality system, as expected

for VBR control algorithms, given that larger amount of bits

were assigned to more complex scenes. The undesirable buffer

levels observed in the SB-CC VBR controller at the layer

(0,1) (see Fig. 7) were due to the fact that only the highest

temporal resolution buffer associated with the layer (0,2) was

considered for QP estimation. Furthermore, as in the stationary

scenario, some undesirable buffer levels and QP fluctuations

also happened at the highest temporal resolution sub-stream

for SB-BC (see Fig. 8), again due to the fact that it is coding

more layers.

From the quality consistency point of view, the performance

of the VBR controllers was also assessed by means of a

time-local version of the PSNR standard deviation, denoted as

σP SNR,j, which attempts to measure the quality consistency

within a scene by reducing the impact of the scene cuts on

the PSNR standard deviation (the reader is referred to [43] for

details). Thus, a low value of σP SNR,j indicates good quality

consistency, and vice versa. The average results over all the test

video sequences in terms of σP SNR,j increment with respect

to CQP-CC encoding, ∆σP SNR,j , are provided in Table III.

As can be seen, the three VBR controllers achieved a quality

consistency close to that of CQP-CC encoding. Furthermore,

the σP SNR,j differences among them were not significant

either in particular stationary (see Table IV) or non-stationary

scenarios (see Table V), as expected, since the VBR controllers

were specially designed to provide consistent-quality scalable

sub-streams.

Fig. 5. Encoder buffer occupancy, PSNR, and QP time evolutions correspond-

ing to the spatio-temporal resolution QCIF@6.25 Hz for Bus. A high-quality

plot is available on-line in [50].

Fig. 6. Encoder buffer occupancy, PSNR, and QP time evolutions correspond-

ing to the spatio-temporal resolution QCIF@12.5Hz for Bus. A high-quality

plot is available on-line in [50].

The VBR controllers were also quantitatively compared in

terms of target bit rate adjustment and buffer level behavior.

To this end, the following metrics were employed: output

bit rate error with respect to that of CQP-CC encoding,

number of pictures in which either an overflow (#O) or an

underflow (#U) occurred, and mean buffer level (µV). As

can be seen in Table III, the average output bit rate errors

achieved by the three VBR controllers at every spatio-temporal

layer were generally below 2%, that is the maximum bit

rate error recommended in [49] for the spatial/CGS testing

scenario. Nevertheless, in some sequences with time-varying

complexity, such as The Lord of the Rings, higher bit rate

errors occurred in some spatio-temporal layers for the SB-BC

and SB-CC VBR controllers (see Table V). Specifically, for

the SB-BC VBR controller, such bit rate mismatches together

with the large µVvalues observed in layers (1,2) and (4,3)

indicate that the corresponding target bit rates were not high

Fig. 7. Encoder buffer occupancy, PSNR, and QP time evolutions corre-

sponding to the spatio-temporal resolution QCIF@6.25 Hz for The Lord of

the Rings. A high-quality plot is available on-line in [50].

Fig. 8. Encoder buffer occupancy, PSNR, and QP time evolutions corre-

sponding to the spatio-temporal resolution QCIF@12.5Hz for The Lord of

the Rings. A high-quality plot is available on-line in [50].

enough to encode all the spatio-temporal layers. For the SB-

CC VBR controller, the results in terms of bit rate error,

mean buffer level, and number of overflows shown in Table

V for layers (0,1) (see also Fig. 7) and (2,2), proved the

need of simultaneously controlling all the involved buffers

within a dependency layer, as in the MB-CC VBR controller,

which was able to prevent overflow and underflow in all the

encodings (see Tables III–V).

From the complexity point of view, although the computa-

tional cost of the IL-MB VBR controller is slightly higher

than that of its baseline version, the increment is clearly

justified by two facts: 1) the computational cost of the baseline

rate controller proved to be significantly lower than those of

conventional approaches [43]; therefore, there is room enough

to allocate some moderate increment as the one proposed; and

2) the IL-MB approach actually reduces the number of spatio-

temporal layers to be encoded; for example, in the simulated

Related document tools

Prepare academic work with more confidence

Plag helps compare academic text against other sources. Identific can help confirm that document-related processes are easier to manage. Together, they help make academic review more transparent.

plag.ai

mobile live streaming scenario, MB-CC uses three dependency

layers instead of five (used by SB-BC) for delivering HRD-

compliant video content to five target terminals, thus substan-

tially improving the overall coding efficiency.

It should be noticed that the good performance achieved

by the proposed MB-CC VBR controller, specifically at the

lowest and the highest dependency layers, could be partly

due to the fact that the total bit rate per dependency layer

was optimally distributed among temporal layers since the

corresponding R(d,t)values were previously obtained using

CQP-CC encoding. In real-time video coding applications,

the optimal distribution of the target bit rate among temporal

layers is not known in advance because it depends on the video

content. For instance, the target bit rate for a sequence with

high spatial detail but low motion content should be shared

out among temporal layers such that the bit resources are

mainly allocated to K pictures. However, for a sequence with

medium-low spatial detail but high motion content, a more

balanced target bit rate distribution between K and NK pictures

is desirable to encode better the motion information.

In order to explore the sensitivity of the proposed MB-

CC VBR controller to target bit rate deviations with respect

to those obtained by CQP-CC encoding, we performed an

“ad hoc” experiment. This experiment involved modifying the

target bit rates of the low temporal resolutions of those layers

encoded using the IL-MB approach. In particular, assuming

that the target bit rates for the highest temporal resolutions can

be set in advance following, for instance, the recommendation

in [49], the target bit rates for QCIF@6.25 Hz (0,1) and HQ

CIF@12.5Hz (2,2) were deviated ±2%, ±5%, and ±10%

from their corresponding reference target bit rates. The average

results over all the test video sequences in terms of ∆µP SNR,

∆σP SNR,j, and bit rate error with respect to those achieved

without R(d,t)deviations, as well as the number of overflows

and underflows and mean buffer level, are summarized in Table

VI. As can be observed, target bit rate deviations of 10% led

to noticeable loss of qualty consistency (due to the increase of

QP fluctuations caused by the sub-optimal target bit rates), bit

rate errors above 2%, and mean buffer levels close to either

overflow or underflow.

It is also interesting to notice how the sub-optimal distri-

bution of the target bit rate affects to the buffer levels of

the involved temporal layers. To this end, let us focus on

the results from layers (0,{1,2})for an R(d,t)deviation of

+10%. As can be observed, the corresponding µVtook op-

posite values: the low temporal resolution buffer was close to

underflow, while the high temporal resolution buffer was close

to overflow. This mirror-like behavior of the buffers is due to

the fact that the buffer modeling stage averages the current

encoding states of the involved temporal resolutions at many

time instants for nV (0),nAU(0) and QP(0)

REF computation.

Although optimum adjustment to R(0,{1,2})or nTF were not

achieved, neither overflows nor underfllows ocurred in most of

the assessed video sequences. However, if the highest temporal

resolution buffer was only considered for QP estimation (as in

SB-CC), a suitable adaptation to both R(0,2) and nTF would

be achieved at the expense of a higher underflow risk at the

lowest temporal resolution buffer. In short, when the target bit

TABLE VI

AVERAGE RESULTS ACHIEVED BY THE PROPOSED MB-CC VBR

CONTROLLER FOR DIFFERENT TARGET BIT RATE DEVIATIONS AT

LAYERS (0,1) AND (2,2). INCREMENTAL RESULTS ARE GIVEN WITH

RESPECT TO THOSE ACHIEVED BY CQP-CC ENCODING

Layer R(d,t)∆µPSNR ∆σPSNR,jBit Rate #O/#U µV

(d,t) Dev. (%) (dB) (dB) Error (%) (%)

(0,1)

+10 0.80 0.18 1.60 0/0 31.90

+5 0.44 0.02 1.05 0/0 41.56

+2 0.14 0.00 0.89 0/0 49.93

-2 -0.28 -0.01 1.75 0/0 59.36

-5 -0.62 0.03 2.17 0/0 64.51

-10 -1.20 0.14 2.69 1/0 71.01

(0,2) 0

-0.04 0.22 2.15 1/0 67.42

0.03 0.04 1.52 0/0 61.16

0.00 0.01 1.06 0/0 57.29

-0.08 0.00 0.66 0/0 51.78

-0.17 0.03 0.72 0/0 48.84

-0.34 0.17 0.74 0/0 45.09

(1,2) 0

-0.07 0.02 0.75 0/0 55.30

-0.04 0.01 0.97 0/0 55.23

-0.04 0.01 0.84 0/0 55.18

-0.06 0.00 0.91 0/0 54.98

-0.09 -0.01 0.77 0/0 54.59

-0.16 -0.02 0.70 0/0 54.88

(2,2)

+10 0.81 0.24 2.29 0/0 29.98

+5 0.47 0.04 1.29 0/0 39.88

+2 0.17 0.01 0.42 0/0 48.80

-2 -0.27 -0.01 1.34 0/0 58.38

-5 -0.61 0.02 1.88 0/0 64.87

-10 -1.18 0.17 2.47 0/0 72.20

(2,3) 0

-0.01 0.21 1.87 0/0 67.68

0.05 0.03 1.28 0/0 61.34

0.01 0.01 0.78 0/0 57.04

-0.08 0.00 0.51 0/0 49.75

-0.17 0.03 0.60 0/0 47.06

-0.33 0.21 0.77 0/0 43.25

rate distributions among temporal resolutions are not optimally

distributed, the proposed method for IL-MB control makes its

best to provide a good tradeoff between quality consistency

and buffer control in all the involved buffers.

V. CONCLUSIONS AND FURTHER WORK

In this paper a novel IL-MB approach built on top of a

baseline VBR controller for H.264/SVC has been proposed.

Given a dependency layer, our proposal aims to deliver HRD-

compliant sub-streams with different temporal resolutions. In

doing so, temporal scalability is fully exploited by reducing

the number of dependency layers required to provide the

same spatial or quality level for decoding terminals requiring

different frame rates. For this purpose, the proposed IL-MB

VBR controller estimates, on a frame basis, the most proper

QP value such that the virtual buffers, each one associated

with a temporal resolution of the same dependency layer, are

maintained at secure levels, while minimizing the distortion

of the corresponding sub-streams. Furthermore, the decision

rules suggested for simultaneously controlling the set of virtual

buffers might be used in any other RC algorithm for SVC.

In order to guarantee robust performance, the proposed IL-

MB framework requires proper target bit rates for the lower

temporal resolution sub-streams to be known in advance. An

effective method to estimate such target bit rates is left for

future work.

APPENDIX A

RBF NETWORK DESIGN

The methodology described in [43] was followed to find the

most suitable RBF network parameters for both K and NK

pictures. This methodology may be structured in three stages:

training data generation,training process and parameter

selection process. These stages are summarized in the sequel.

1) Training Data Generation

The first stage focuses on the extraction of a training

data set consisting of pairs input vector-desired output, i.e.:

X(d),∆QP∗(d). To this end, a representative set of video

sequences exhibiting a large variety of spatio-temporal con-

tents was employed and some of their GoPs were encoded

using Φdifferent configurations involving several encoder-

and RC-related parameters: number of dependency layers,

spatial resolutions, GoP size, target bit rate, minimum available

temporal layer identifier, target buffer level, and buffer size.

Given an input vector X(d)extracted from a picture with

identifier (d, t0)encoded with a configuration φat the time

instant (j−1), the goal was to find, from a set of Qquantiza-

tion increments n∆QP(d)

qoq=1,...,Q, the most appropriate QP

increment ∆QP∗(d)for the next picture with identifier (d, t)

so that each involved buffer k, with k=maxht(d)

min, ti. . . t(d)

max,

was maintained at secure levels, while minimizing the coding

distortion of the corresponding sub-streams. Specifically, t(d)

min

was fixed to t(d)

max −2so that three buffers (at most) could be

simultaneously controlled in an IL-MB framework2.

To satisfy these buffer and distortion constraints, the

∆QP(d)

qvalue that minimized certain cost function Ψwas

chosen as optimum:

∆QP∗(d)=argmin

∆QP (d)

∆QP(d)

q.(19)

The proposed cost function, which was designed “ad hoc”

for this problem, balances three conflicting factors: quality

consistency, buffer control, and QP consistency. Specifically,

Ψobeys:

Ψ∆QP(d)

q=λ1θ





D(d)

j−D(d,k)

255

t(d)

max −maxht(d)

min, ti+1







λ2







k V(d,k)

j+1

BDφ×R(d,k)

−nTFφ

t(d)

max −maxht(d)

min, ti+1







λ3 ∆QP(d)

∆QP(d)

MAX!2

.(20)

2For a sequence frame rate of 25 Hz, t(d)

min =t(d)

max −2means that the

minimum output frame rate of encoded video ensuring the HRD constraints is

the fourth part (6.25 Hz), which is a sufficient temporal resolution in practical

SVC applications [49].

The first term monitors the quality consistency by means

of the squared mean of the differences between the distortion

D(d)

jof the current picture and the average distortion D(d,k)

of each sub-stream (d, k). The distortion metric used was the

mean of absolute error between the original and reconstructed

luminance pictures. Furthermore, θis a scaling factor so that

the dynamic range of this term was similar to the remaining

terms. In particular, θwas set to 100 in our experiments.

The second term considers the buffer control through the

squared mean of the differences between the normalized

current buffer level V(d,k)

j+1 /BDφ×R(d,k)

φassociated with each

sub-stream (d, k)and the normalized target buffer fullness

nTFφ.

The third term watches over the QP consistency by means

of the squared ratio between the considered QP increment and

the maximum allowed QP increment ∆QP (d)

MAX .

Finally, the weight vector (λ1, λ2, λ3)Tis meant to establish

a proper tradeoff among the considered conflicting factors.

2) RBF Network Training and Parameter Selection

To consider different tradeoffs among the three terms of

the cost function for data labeling, a reduced set of tentative

weight vectors was previously selected. Subsequently, for each

pre-established weight vector, two training data sets, one for K

pictures and the other for NK pictures, were generated. Each

RBF network was trained several times considering each one

of the pre-established weight vectors, different random initial-

izations, and different numbers Lof radial basis functions. For

this purpose, a training algorithm based on Gaussian processes

(GP) [52] was used because it provides a robust solution for

the parameters that relies on maximizing a marginal likelihood.

In particular, the sparse approximation GP toolbox for Matlab

[53] due to Snelson and Gharahmani [54] was used.

Finally, the validation process for parameter selection

led us to select the weight vectors (0.90,0.09,0.01)Tand

(0.75,0.24,0.01)Tfor K- and NK-picture RBF networks,

respectively, as well as a total of 10 Gaussian-type functions

for each RBF network. Specifically, their centers, widths and

weights are the following (also available on line in electronic

format in [50]):

i. K-picture RBF parameters

w0=−2.11439,w=







−947.17558

17.91979

99.79803

−119.72409

87.53142

14.05907

60.23655

−797.73548

1605.45805

−82.09706







,Σ=





0.92748

3.20320

1.23924

8.84978













0.43803 1.27831 0.13142 2.61346

0.76851 1.13763 0.65991 2.79565

−0.75232 0.79498 1.60194 1.76489

−1.23805 −0.62409 −0.45549 2.01148

0.26089 2.77186 0.38882 0.19505

0.66948 3.32571 0.31369 2.04133

0.92787 1.04185 −0.27238 1.67820

0.29267 1.88389 0.28556 2.79760

0.35347 1.49620 0.20293 2.77878

−0.39515 0.50965 1.25654 0.14932







ii. NK-picture RBF parameters

w0=−0.25419,w=







12511.56054

−54.23826

−30.39539

26.81211

−4.73220

−16.14186

−12507.11582

4.66150

11.06643

0.66867







,Σ=





0.59234

2.05179

1.00957

3.00504













0.19710 1.71061 0.12047 3.04580

−0.67315 −0.68530 −0.17373 1.42105

0.39981 −0.66020 0.89182 −0.90448

0.58803 1.82533 0.24637 −0.95955

0.66092 0.77316 0.57093 3.35614

0.70296 1.74486 −0.15198 0.65384

0.19696 1.71090 0.12112 3.04637

0.88774 0.42078 0.61288 1.74001

0.92236 2.50876 0.15902 2.95167

−0.12642 0.67930 0.67757 1.23198







REFERENCES

[1] G. Liebl, M. Wagner, J. Pandel, and W. Weng, “An RTP payload format

for erasure-resilient transmission of progressive multimedia streams,”

Document draft-ietf-avt-uxp-07.txt, Oct. 2004.

[2] R. Schaefer, H. Schwarz, D. Marpe, T. Schierl, and T. Wiegand, “MCTF

and scalability extension of H.264/AVC and its application to video

transmission, storage, and surveillance,” in Proceedings of VCIP 2005,

Peking, China, July 2005, pp. 596 011: 1–12.

[3] T. Wiegand, L. Noblet, and F. Rovati, “Scalable video coding for IPTV

services,” Broadcasting, IEEE Transactions on, vol. 55, pp. 527–538,

june 2009.

[4] Joint Video Team (JVT), “Advanced video coding for generic audiovi-

sual services,” ITU-T Recommendation International Standard of Joint

Video Specification, ITU-T Rec. H.264/ISO/IEC 14496-10 AVC, Version

1, JVT-G50, May 2003.

[5] ISO/IEC, “Generic coding of moving pictures and associated audio

information - Part 2: Video,” ITU-T Recommendation H.262-ISO/IEC

13818-2, MPEG-2, Nov. 1994.

[6] ITU-T, “Video coding for low bitrate communication,” ITU-T Draft

Recommendation H.263 Version 1, Nov. 1995.

[7] ISO/IEC, “Coding of audio-visual objects - Part 2: Visual,” ISO/IEC

14496-2, MPGEG-4 Visual Version 1, Apr. 1999.

[8] H. Schwarz, D. Marpe, and T. Wiegand, “Overview of the scalable video

coding extension of the H.264/AVC standard,” Circuits and Systems for

Video Technology, IEEE Transactions on, vol. 17, no. 9, pp. 1103–1120,

Sept. 2007.

[9] M. Wien, H. Schwarz, and T. Oelbaum, “Performance analysis of SVC,”

Circuits and Systems for Video Technology, IEEE Transactions on,

vol. 17, no. 9, pp. 1194–1203, Sept. 2007.

[10] J. Ribas-Corbera, P. Chou, and S. Regunathan, “A generalized hypothet-

ical reference decoder for H.264/AVC,” Circuits and Systems for Video

Technology, IEEE Transactions on, vol. 13, no. 7, pp. 674–687, 2003.

[11] T. Lakshman, A. Ortega, and A. Reibman, “VBR video: tradeoffs and

potentials,” Proceedings of the IEEE, vol. 86, no. 5, pp. 952–973, 1998.

[12] A. Ortega, “Variable bit-rate video coding,” in Compressed Video over

Networks, M.-T. Sun and A. R. Reibman, Eds. New York: Marcel Dekker,

pp. 343–382, 2000.

[13] H.-M. Hang and J.-J. Chen, “Source model for transform video coder

and its application. I. Fundamental theory,” Circuits and Systems for

Video Technology, IEEE Transactions on, vol. 7, no. 2, pp. 287–298,

1997.

[14] J. Ribas-Corbera and S. Lei, “Rate control in DCT video coding for

low-delay communications,” Circuits and Systems for Video Technology,

IEEE Transactions on, vol. 9, no. 1, pp. 172–185, 1999.

[15] B. Tao, B. Dickinson, and H. Peterson, “Adaptive model-driven bit

allocation for MPEG video coding,” Circuits and Systems for Video

Technology, IEEE Transactions on, vol. 10, no. 1, pp. 147–157, Feb.

2000.

[16] ISO/IEC, “MPEG Test Model 5,” ISO/IEC JTC/SC29/WG11, MPEG

Test Model 5, April 1993.

[17] S. Ma, W. Gao, and Y. Lu, “Rate-distortion analysis for H.264/AVC

video coding and its application to rate control,” Circuits and Systems

for Video Technology, IEEE Transactions on, vol. 15, no. 12, pp. 1533–

1544, 2005.

[18] T. Chiang and Y.-Q. Zhang, “A new rate control scheme using quadratic

rate distortion model,” Circuits and Systems for Video Technology, IEEE

Transactions on, vol. 7, no. 1, pp. 246–250, Feb. 1997.

[19] S. Ma, Z. Li, and F. We, “Proposed draft of adaptive rate control,” JVT-

H017, 8th JVT Meeting, Geneva, Switzerland, May 2003.

[20] B. Xie and W. Zeng, “A sequence-based rate control framework for

consistent quality real-time video,” Circuits and Systems for Video

Technology, IEEE Transactions on, vol. 16, no. 1, pp. 56–71, 2006.

[21] Z. Chen and K. N. Ngan, “Towards rate-distortion tradeoff in real-time

color video coding,” Circuits and Systems for Video Technology, IEEE

Transactions on, vol. 17, no. 2, pp. 158–167, Feb. 2007.

[22] D.-K. Kwon, M.-Y. Shen, and C.-C. J. Kuo, “Rate control for H.264

video with enhanced rate and distortion models,” Circuits and Systems

for Video Technology, IEEE Transactions on, vol. 17, no. 5, pp. 517–529,

2007.

[23] Z. He, Y. K. Kim, and S. Mitra, “Low-delay rate control for DCT video

coding via ρ-domain source modeling,” Circuits and Systems for Video

Technology, IEEE Transactions on, vol. 11, no. 8, pp. 928–940, 2001.

[24] N. Kamaci, Y. Altunbasak, and R. Mersereau, “Frame bit allocation for

the H.264/AVC video coder via Cauchy-density-based rate and distortion

models,” Circuits and Systems for Video Technology, IEEE Transactions

on, vol. 15, no. 8, pp. 994–1006, 2005.

[25] S. Sanz-Rodr´

ıguez, O. del-Ama-Esteban, M. de-Frutos-L´

opez, and

F. D´

ıaz-de-Mar´

ıa, “Cauchy-density-based basic unit layer rate controller

for H.264/AVC,” Circuits and Systems for Video Technology, IEEE

Transactions on, vol. 20, no. 8, pp. 1139–1143, Aug. 2010.

[26] L. Xu, W. Gao, X. Ji, D. Zhao, and S. Ma, “Rate control for spatial

scalable coding in SVC,” in Picture Coding Symposium, 2007. PCS

2007, Nov. 2007.

[27] Y. Liu, Z. G. Li, and Y. C. Soh, “Rate control of H.264/AVC scalable ex-

tension,” Circuits and Systems for Video Technology, IEEE Transactions

on, vol. 18, no. 1, pp. 116–121, Jan. 2008.

[28] A. Leontaris and A. M. Tourapis, “Rate control for the Joint Scalable

Video Model (JSVM),” Video Team of ISO/IEC MPEG and ITU-T

VCEG, JVT-W043, San Jose, California, April 2007.

[29] Y. Pitrey, M. Babel, and O. Deforges, “One-pass bitrate control for

MPEG-4 scalable video coding using ρ-domain,” Broadband Multime-

dia Systems and Broadcasting, 2009. BMSB ’09. IEEE International

Symposium on, pp. 1–5, May 2009.

[30] M. Liu, Y. Guo, H. Li, and C.-W. Chen, “Low-complexity rate control

based on ρ-domain model for scalable video coding,” in Image Process-

ing, 2010. ICIP 2010. IEEE International Conference on, Sept. 2010,

pp. 1277–1280.

[31] Y. Cho, J. Liu, D.-K. Kwon, and C.-C. Kuo, “Joint quality-temporal (Q-

T) bit allocation for H.264/SVC,” in Circuits and Systems, 2009. ISCAS

2009. IEEE International Symposium on, May 2009, pp. 2361–2364.

[32] J. Liu, Y. Cho, Z. Guo, and J. Kuo, “Bit allocation for spatial scalability

coding of H.264/SVC with dependent rate-distortion analysis,” Circuits

and Systems for Video Technology, IEEE Transactions on, vol. 20, no. 7,

pp. 967–981, 2010.

[33] N. Mohsenian, R. Rajagopalan, and C. A. Gonzales, “Single-pass

constant- and variable-bit-rate MPEG-2 video compression,” IBM Jour-

nal of Research and Development, vol. 43, no. 4, pp. 489–509, Jul.

1999.

[34] M. Rezaei, M. Hannuksela, and M. Gabbouj, “Semi-fuzzy rate controller

for variable bit rate video,” Circuits and Systems for Video Technology,

IEEE Transactions on, vol. 18, no. 5, pp. 633–645, May 2008.

[35] A. Jagmohan and K. Ratakonda, “MPEG-4 one-pass VBR rate control

for digital storage,” Circuits and Systems for Video Technology, IEEE

Transactions on, vol. 13, no. 5, pp. 447–452, 2003.

[36] M. de-Frutos-L´

opez, O. del-Ama-Esteban, S. Sanz-Rodr´

ıguez, and

F. D´

ıaz-de-Mar´

ıa, “A two-level sliding-window VBR controller for real-

time hierarchical video coding,” in Image Processing, 2010. ICIP 2010.

IEEE International Conference on, Sep. 2010, pp. 4217–4220.

[37] P. H. Westerink, R. Rajagopalan, and C. A. Gonzales, “Two-pass

MPEG-2 variable-bit-rate encoding,” IBM Journal of Research and

Development, vol. 43, no. 4, pp. 471–488, 1999.

[38] Y. Yu, J. Zhou, Y. Wang, and C. W. Chen, “A novel two-pass VBR

coding algorithm for fixed-size storage application,” Circuits and Sys-

tems for Video Technology, IEEE Transactions on, vol. 11, no. 3, pp.

345–356, Mar. 2001.

[39] W. Ding, “Joint encoder and channel rate control of VBR video over

ATM networks,” Circuits and Systems for Video Technology, IEEE

Transactions on, vol. 7, no. 2, pp. 266–278, 1997.

[40] J. Bai, Q. Liao, X. Lin, and X. Zhuang, “Rate-distortion model based rate

control for real-time VBR video coding and low-delay communications,”

Signal Processing: Image Communication, vol. 17, no. 2, pp. 187–199,

2002.

[41] M. Dai, D. Loguinov, and H. Radha, “Rate-distortion analysis and

quality control in scalable internet streaming,” Multimedia, IEEE Trans-

actions on, vol. 8, no. 6, pp. 1135–1146, 2006.

[42] H. Lee, Y. Lee, D. Lee, J. Lee, and H. Shin, “Implementing rate

allocation and control for real-time H.264/SVC encoding,” in Consumer

Electronics (ICCE), 2010 Digest of Technical Papers International

Conference on, 2010, pp. 269–270.

[43] S. Sanz-Rodr´

ıguez and F. D´

ıaz-de-Mar´

ıa, “RBF-based QP estimation

model for VBR control in H.264/SVC,” Circuits and Systems for Video

Technology, IEEE Transactions on, vol. 21, no. 9, pp. 1263 –1277, Sep.

2011.

[44] X. M. Zhang, A. Vetro, Y. Shi, and H. Sun, “Constant quality constrained

rate allocation for FGS-coded video,” Circuits and Systems for Video

Technology, IEEE Transactions on, vol. 13, no. 2, pp. 121–130, Feb.

2003.

[45] A. Unterweger and H. Thoma, “The influence of bit rate allocation to

scalability layers on video quality in H.264 SVC,” in Picture Coding

Symposium, 2007. PCS 2007, Nov. 2007.

[46] H. Mansour, V. Krishnamurthy, and P. Nasiopoulos, “Rate and distortion

modeling of medium grain scalable video coding,” in Image Processing,

2008. ICIP 2008. IEEE International Conference on, Oct. 2008, pp.

2564–2567.

[47] M. Hannuksela, H. Zhu, H. Li, and M. Gabbouj, “Congestion-aware

transmission rate control using medium grain scalability of scalable

video coding,” in Image Processing (ICIP), 2010 17th IEEE Interna-

tional Conference on, Sep. 2010, pp. 2929–2932.

[48] J. Vieron, M. Wien, and H. Schwarz, “JSVM 11 software,” 24th Joint

Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG Meeting, Geneva,

Doc. JVT-X203, Jul. 2007.

[49] M. Wien and H. Schwarz, “Testing conditions for SVC coding efficiency

and JSVM performance evaluation,” 16th Joint Video Team (JVT) of

ISO/IEC MPEG & ITU-T VCEG Meeting, Poznan, Doc. JVT-Q205,

Poznan, Poland, Jul. 2005.

[50] [OnLine], “http://www.tsc.uc3m.es/∼sescalona/RbfVbrSvc/MultiBuffer/.”

[51] G. Bjøntegaard, “Calculation of average PSNR differences between RD

curves,” VCEG contribution, VCEG-M33, Austin, Apr. 2001.

[52] C. E. Rasmussen and C. Williams, Gaussian Processes for Machine

Learning. MIT Press, 2006.

[53] E. Snelson, “Matlab code for sparse pseudo-input Gaussian

processes (SPGP), [On-Line] http://www.gatsby.ucl.ac.uk/ snelson/,”

2005. [Online]. Available: http://www.gatsby.ucl.ac.uk/ snelson/

[54] E. Snelson and Z. Ghahramani, “Sparse Gaussian processes using

pseudo-inputs,” in Advances in Neural Information Processing Systems

18. MIT Press, 2006, pp. 1259–1266.

Sergio Sanz-Rodr´

ıguez (M’12) received the de-

gree in Technical Telecommunication Engineering in

2001, the M.S. degree in Telecommunication Engi-

neering in 2005, both from Universidad Polit´

ecnica

de Madrid, Madrid, Spain, and the Ph.D degree

in Multimedia and Communications in 2011 from

Universidad Carlos III de Madrid, Madrid, Spain.

His primary research interests include rate control

for video coding, scalable video coding, perceptual

video coding and video signal processing.

Fernando D´

ıaz-de-Mar´

ıa (M’97) received the M.S.

degree in Telecommunication Engineering in 1991

and Ph.D. degree in Telecommunication Engineer-

ing in 1996, both from Universidad Polit´

ecnica de

Madrid, Madrid, Spain.

Since October 1996, Dr. D´

ıaz-de-Mar´

ıa has been

an Associate Professor with the Department of

Signal Theory and Communications, Universidad

Carlos III de Madrid, Madrid, Spain. His current

research interests include image and video analy-

sis and coding. He has led numerous projects and

contracts in these fields. He is co-author of several papers in prestigious in-

ternational journals and quite a few papers in revised national and international

conferences.