Gervin Thomas, Karthik Chandrasekar, Benny Åkesson, Ben Juurlink, Kees
Goossens
A predictor-based power-saving policy for
DRAM memories
Conference object, Postprint version
This version is available at http://dx.doi.org/10.14279/depositonce-5780.
Suggested Citation
Thomas, Gervin; Chandrasekar, Karthik; Åkesson, Benny; Juurlink, Ben; Goossens, Kees: A predictor-based power-saving policy for DRAM memories. - In: 2012 15th Euromicro Conference on Digital System Design: Architectures, Methods and Tools: DSD. - New York, NY [u.a.]: IEEE, 2012. - ISBN: 978-1-4673-2498-4. - pp. 882-889. - DOI: 10.1109/DSD.2012.11. (Postprint version is cited, page numbers differ.)
Terms of Use
© 2012 IEEE. Personal use of this material is permitted. Permission from IEEE must be
obtained for all other uses, in any current or future media, including reprinting/republishing
this material for advertising or promotional purposes, creating new collective works, for
resale or redistribution to servers or lists, or reuse of any copyrighted component of this
work in other works.
A Predictor-based Power-Saving Policy
for DRAM Memories
Gervin Thomas, Karthik Chandrasekar, Benny Åkesson, Ben Juurlink and Kees Goossens
Technische Universität Berlin, Department of Computer Engineering and Microelectronics
Embedded Systems Architecture, Berlin, Germany
Technische Universiteit Delft, Department of Computer Engineering, Delft, The Netherlands
Technische Universiteit Eindhoven, Department of Electrical Engineering
Electronic Systems, Eindhoven, The Netherlands
Abstract—Reducing power/energy consumption is an impor-
tant goal for all computer systems, from servers to battery-driven
hand-held devices. To achieve this goal, the energy consumption
of all system components needs to be reduced. One of the most
power-hungry components is the off-chip DRAM, even when it
is idle. DRAMs support different power-saving modes, such as
self-refresh and power-down, but employing them every time
the DRAM is idle reduces performance due to their power-up
latencies. The self-refresh mode offers large power savings, but
incurs a long power-up latency. The power-down mode, on the
other hand, has a shorter power-up latency, but provides lower
power savings.
In this paper, we propose and evaluate a novel power-saving
policy that combines the best of both power-saving modes in
order to achieve significant power reductions with a marginal
performance penalty. To accomplish this, we use a history-based
predictor to forecast the duration of an idle period and then
either employ self-refresh, or power-down, or a combination
of both power saving modes. Significant refinements are made
to the predictor to maximize the energy savings and minimize
the performance penalty. The presented policy is evaluated
using several applications from the multimedia domain and
the experimental results show that it reduces the total DRAM
energy consumption between 68.8% and 79.9% at a negligible
performance penalty between 0.3% and 2.2%.
Index Terms—Predictor-based Power Saving Policy, Predictor,
DRAM-Memory, Self-Refresh, Power-Down.
I. INTRODUCTION
Power/energy consumption is an important constraint for all kinds of computing systems: not only for battery-powered embedded systems, but also for high-performance servers and every computing system in between. Battery-driven embedded systems, such as cell phones, have limited power budgets as well as high performance requirements, and these two requirements conflict. High-end server systems, on the other hand, also require reduced energy consumption, because it lowers operating costs and cooling effort.
DRAM memories contribute significantly to the overall system energy consumption. For example, memory accounts for up to 20% of the energy consumption in mobile devices [22] and up to 25% in data center servers [13]. The DRAM energy consumption profiles in [3] and [4] show that DRAMs consume significant amounts of power even when they are idle.
To reduce DRAM energy consumption during idle periods, different power-saving modes are available, such as power-down and self-refresh. The drawback of the self-refresh mode is that it takes hundreds of clock cycles to power up the DRAM, whereas the drawback of the power-down mode is that it saves much less power than self-refresh.
To make this discussion more concrete, let us consider a 1 Gb DDR3-800 MICRON memory [12]. This memory draws around 50 mA of current when idle, which corresponds to about 75 mW of power. DDR3 memories support two important power-saving modes, namely power-down and self-refresh. The power-down mode reduces the power consumption to 18 mW, while the self-refresh mode brings it down to 9 mW [12]. Hence, it is possible to reduce the power consumption during idle periods by factors of about 4.2 (76%) and 8.3 (88%), respectively.
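The idle-power figures above follow from P = I · VDD. The short Python sketch below reproduces them; it assumes the nominal DDR3 supply voltage of 1.5 V (consistent with 50 mA corresponding to about 75 mW) and uses the currents listed for this device in Section III-B.

```python
# Assumption: nominal DDR3 supply voltage of 1.5 V (50 mA -> 75 mW)
VDD = 1.5  # volts

# Currents of the 1 Gb DDR3-800 device in mA (values quoted in Section III-B)
CURRENTS_MA = {
    "idle (IDD2N)": 50,
    "power-down (IDD2P0)": 12,
    "self-refresh (IDD6)": 6,
}


def power_mw(current_ma: float, vdd: float = VDD) -> float:
    """Power in mW for a current in mA: P = I * VDD."""
    return current_ma * vdd


idle_mw = power_mw(CURRENTS_MA["idle (IDD2N)"])
for mode, current in CURRENTS_MA.items():
    p = power_mw(current)
    # Reduction factor and relative saving with respect to plain idle
    print(f"{mode:22s} {p:5.1f} mW  factor {idle_mw / p:4.2f}  saving {1 - p / idle_mw:.0%}")
```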
Both modes, however, incur a performance penalty due to their power-up latencies. For the 1 Gb MICRON DDR3-800 memory, the power-down mode has a relatively small penalty of around 25 ns (10 memory clock cycles), while the self-refresh mode has a very large penalty of about 1280 ns (512 memory clock cycles) [7]. Thus, the larger the power saving, the larger the power-up latency and hence the performance penalty, so a trade-off needs to be made.
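Since DDR3-800 transfers data at 800 MT/s on both clock edges, its memory clock runs at 400 MHz, i.e. 2.5 ns per cycle. A minimal sketch of the conversion between the power-up latencies in cycles and in nanoseconds:

```python
# DDR3-800: 800 MT/s on a double-data-rate bus -> 400 MHz clock -> 2.5 ns per cycle
CLOCK_MHZ = 400
T_CK_NS = 1e3 / CLOCK_MHZ

# Power-up latencies in memory clock cycles (values from the text)
POWER_UP_CYCLES = {
    "power-down (XPDLL)": 10,
    "self-refresh (XSDLL)": 512,
}


def cycles_to_ns(cycles: int, t_ck_ns: float = T_CK_NS) -> float:
    """Convert a latency in memory clock cycles to nanoseconds."""
    return cycles * t_ck_ns


for mode, cycles in POWER_UP_CYCLES.items():
    print(f"{mode:22s} {cycles:4d} cc = {cycles_to_ns(cycles):7.1f} ns")
```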
In order to reduce power dissipation while not incurring
a large performance penalty, this paper employs and hones
a generic history-based predictor to anticipate the length of
the next memory idle period. Depending on the predicted
idle period length, the presented power-saving policy employs
either power-down, or self-refresh, or a combination of both
power-saving modes. Furthermore, in an effort to completely
avoid the power-up latency, the power-saving policy uses a
conservative prediction to power up the memory just in time
before it is accessed again. The three main contributions
of this paper can be summarized as follows:
1) We significantly extend and fine-tune a versatile prediction algorithm so that it can be applied to the problem at hand. For example, to apply the prediction algorithm to the problem of reducing DRAM energy consumption, levels of idle period lengths need to be introduced, because the idle period lengths vary enormously (by up to 4-5 orders of magnitude).
2) We present a novel power-saving policy based on the fine-
tuned predictor that, depending on the predicted duration
of the idle period, employs either power-down, or self-
refresh, or a combination of both power-saving modes.
Furthermore, to avoid powering down the memory during
short idle periods, which would hardly save any power but
incur a large penalty, a time-out strategy is employed.
3) We evaluate the proposed power-saving policy using sev-
eral applications from the multimedia domain. Experi-
mental results, obtained by employing a trace player that
simulates the application behavior in a SystemC model
of an MPSoC, show that we save between 68.6% and
79.9% of total memory energy with a marginal increase
of execution time between 0.3% and 2.2%.
The rest of this paper is organized as follows. Section II presents a brief overview of related work that employs prediction for different components of an MPSoC, as well as work
targeted at reducing DRAM energy consumption. Background
information, such as the baseline prediction algorithm and
basic DRAM operations and their power-saving modes, is
given in Section III. Section IV describes how the baseline prediction algorithm is extended and fine-tuned so that it can be applied to the problem of reducing DRAM energy consumption. Based on the modified prediction
algorithm, the proposed power-saving policy is presented in
detail in Section V. The proposed policy is experimentally
evaluated in Section VI. Finally, Section VII summarizes and
highlights the contributions of this work and presents our final
conclusions.
II. RELATED WORK
Predictors have been used in many areas of MPSoC research. In the Networks-on-Chip domain, [14], [20] used
predictors to forecast end-to-end traffic. In [20], a history-
based predictor is used to forecast traffic patterns by searching
for similar traffic shapes in a history buffer. In [14], a model
based on state space representation is used to predict the
availability of input buffers in routers.
In DRAM research, predictors have been used to reduce
memory access times for DRAMs connected to MPSoCs.
In [19], [23], a predictor was employed to reduce the number of precharges and activates and thereby the average DRAM access latency. The forecast is used to determine whether an open DRAM row should be closed or kept open, and which row to open next, to minimize the average access latency. Similarly, in [1], a predictor tracks the number of accesses to a given DRAM page to predict DRAM locality and make page-closing decisions. The work in [15] proposed a dynamic memory mode control scheme with a predictor that predicts whether the next memory reference causes a page hit or not. This is used to exploit locality and reduce the number of activates and precharges to a bank. However, all of these prediction-based solutions targeted active power reduction, not idle power reduction.
Other works have proposed DRAM power reduction solutions without using explicit prediction logic. For instance, in [11], a DRAM precharge policy based on address analysis is presented. Statistical information from instructions waiting to access the memory is collected and analyzed to determine the next bank access and decide which bank to precharge, thus reducing the average memory latency. In [10], the authors proposed a hardware prefetching technique assisted by static (design-time) analysis of data access patterns for efficient data prefetching. However, this idea would only be useful for improving cache performance and can increase main memory power consumption due to mispredicted prefetches. The authors of [21] proposed combining multiple reads/writes within a single activate-precharge pair to obtain significant energy savings, and [8] achieves low power consumption in DRAM memories by changing the processor and memory clock frequencies. However, these solutions also target active power reduction.
In [6], the authors proposed reducing idle memory power by using a compiler-directed selective power-down and a hardware-assisted prediction-based run-time power-down. However, the former is not suitable for run-time use, and the latter only employs the power-down mode and does not exploit the self-refresh mode.
The reluctance to use the self-refresh mode stems from its large power-up latency. As a result, there has been no known effort to combine prediction with the self-refresh mode to obtain memory idle energy savings. In this paper, we propose an efficient power-saving policy that uses a predictor to employ both the self-refresh and power-down modes, reducing idle energy consumption significantly while keeping the performance penalty negligible.
III. BACKGROUND
This section introduces the history-based predictor used in
this paper and briefly discusses basic DRAM operations and their different power-saving modes.
A. Predictor
The generic history-based predictor used in this paper was
originally proposed in [20], where it was used to forecast
traffic patterns for rerouting in networks. In this paper, we
modify and employ this predictor to forecast memory idle
periods. The generic predictor is briefly introduced here.
The predictor probes a history of data points: it takes the most recent data points as a reference pattern, searches the history for similar patterns, and uses them to predict future data points. The prediction algorithm considers the latest $m$ data points as a reference pattern (shown in Figure 1) from the history set $Y = (y_0, y_1, \dots, y_n)$ of $n + 1$ data points (where $m < n$) and searches the past for similar patterns of the same length. A parameter width ($w$) is used to identify whether a set of past data points fits the reference pattern: if any data point of a past pattern differs from the corresponding point of the reference pattern by more than $w/2$, that pattern is not used by the predictor to forecast the next data point. The algorithm compares the reference pattern with all patterns in the history, moving one data point at a time. If the algorithm finds several similar patterns in the history, it forecasts the next data point by considering all of them.
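As an illustration only (not the implementation of [20]), the matching rule above can be sketched in Python. The function names are hypothetical, and, as a simplification, the successors of all matching windows are averaged with equal weight:

```python
from typing import Optional, Sequence


def fits(reference: Sequence[float], candidate: Sequence[float], w: float) -> bool:
    """A past window fits if no point differs from the reference point by more than w/2."""
    return all(abs(r - c) <= w / 2 for r, c in zip(reference, candidate))


def predict_next(history: Sequence[float], m: int, w: float) -> Optional[float]:
    """Forecast y_{n+1} from y_0 .. y_n using a reference pattern of length m."""
    n = len(history) - 1
    if m < 1 or n < m:
        return None                          # not enough data points yet
    reference = history[n - m + 1:]          # y_{n-m+1} .. y_n
    successors = []
    # Slide one data point at a time over every past window y_{g-m+1} .. y_g
    # that still has a known successor y_{g+1}
    for g in range(m - 1, n):
        window = history[g - m + 1:g + 1]
        if fits(reference, window, w):
            successors.append(history[g + 1])
    if not successors:
        return None                          # no similar pattern in the history
    return sum(successors) / len(successors)
```

For example, `predict_next([1, 2, 3, 1, 2, 3, 1, 2], m=2, w=0.5)` finds two past occurrences of the reference pattern [1, 2], both followed by 3, and therefore forecasts 3.0.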
The predictor builds up a history before forecasting future data points. The reference pattern, i.e. the latest data points from $y_{n-m+1}$ to $y_n$, is compared with the different patterns from the history. The algorithm also defines a parameter history length, which limits the number of past data points taken into account for the prediction. In Figure 1, the history includes all shown data points, but any history length can be used for the analysis. If there is a pattern of length $m$ in the past that is very similar to the reference pattern, like the pattern from $y_{\gamma-m+1}$ to $y_\gamma$, the algorithm predicts that the next data point $y_{n+1}$ is very similar to the data point $y_{\gamma+1}$ that follows the past pattern. As stated before, the matching is not limited to a single past pattern in the history. If multiple past patterns are similar to the reference pattern, a weighted sum of the data points following them, with weights based on their similarity, is calculated to forecast the next data point [20].

[Figure: data value over time, showing the history length, the reference pattern $y_{n-m+1} \dots y_n$, a similar past pattern $y_{\gamma-m+1} \dots y_\gamma$ followed by $y_{\gamma+1}$, the tolerance $w/2$, and the forecast $y_{n+1}$.]
Fig. 1. Working of the Predictor
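The similarity-weighted forecast and the history-length limit described in this subsection can be sketched as follows. The weighting actually used in [20] is not reproduced here: the weight 1/(1 + d), with d the mean absolute deviation between a past window and the reference pattern, is an illustrative assumption, and `hl`, `m` and `w` stand for the history length, reference pattern length and width:

```python
from typing import Optional, Sequence


def weighted_predict(history: Sequence[float], m: int, w: float, hl: int) -> Optional[float]:
    """Forecast the next data point using only the last `hl` points of `history`.

    Every past window of length m whose points all lie within w/2 of the
    reference pattern contributes the data point that follows it, weighted by
    1 / (1 + d), where d is the window's mean absolute deviation from the
    reference pattern (hypothetical weighting, not taken from [20]).
    """
    if m < 1 or hl < 1:
        return None
    recent = list(history)[-hl:]            # the history length bounds the search
    n = len(recent) - 1
    if n < m:
        return None                         # not enough data points yet
    reference = recent[n - m + 1:]          # y_{n-m+1} .. y_n
    weighted_sum = 0.0
    total_weight = 0.0
    for g in range(m - 1, n):               # window y_{g-m+1} .. y_g, successor y_{g+1}
        window = recent[g - m + 1:g + 1]
        deviations = [abs(r - c) for r, c in zip(reference, window)]
        if max(deviations) > w / 2:
            continue                        # pattern neglected: a point lies outside w/2
        weight = 1.0 / (1.0 + sum(deviations) / m)
        weighted_sum += weight * recent[g + 1]
        total_weight += weight
    return weighted_sum / total_weight if total_weight > 0 else None
```

With `history = [1, 2, 7, 1, 2, 3, 1, 2]`, `m = 2` and `w = 0.5`, a history length of 5 only sees the most recent earlier occurrence of [1, 2] (followed by 3) and forecasts 3.0, whereas a history length of 8 also sees the older occurrence (followed by 7) and forecasts 5.0.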
In this paper, we adapt and extend this versatile generic predictor to our memory power optimization problem. We use memory idle period lengths as data points and predict the lengths of future memory idle periods.
The predictor needs further extensions because the statically
computed width parameter can produce wrong estimations for
memory idle periods with large variations. The improvements
and extensions of this predictor are explained in Section IV.
B. DRAM Basics and Power-Saving Modes
Dynamic Random Access Memory (DRAM) is the most widely used type of main memory in mobile phones, laptops, gaming consoles and servers. DRAMs consist of several banks, in which data are stored in rows and columns. When data is read from or written to the memory, the data of the addressed row is moved to the row buffer, and then to the I/O buffers to complete the data transfer. If the data is retained in the row buffer after the operation is finished, the memory stays in the active state. If it is moved back to the memory row, the memory moves to the precharged state. The memory can be idle in either of these two states.
Each bit of data is stored as charge in a capacitor. Since the capacitors leak charge over time, the memory has to be refreshed at regular intervals to avoid losing data, and DRAM is a volatile memory whose data is lost when it is turned off. However, when the memory is on, it is not used all the time, and depending on the application there can be several memory idle periods of varying lengths. The memory consumes a significant amount of energy during these idle periods, which can be reduced using power-saving modes such as power-down or self-refresh. The power-down mode can be employed in either the active or the precharged state, while the self-refresh mode can be employed only in the precharged state. In general, precharged power-down saves more power than active power-down. For simplicity, in this paper we ensure that the memory is in the precharged state at the end of every read or write transaction, which makes it easier to employ the precharged power-down and self-refresh modes.
Comparing these two modes, the power-down mode saves
less power than the self-refresh mode, but the memory can
power-up from the power-down mode much faster than from
the self-refresh mode. The goal of this work is to use self-
refresh as often as possible to maximize the power savings
while keeping the performance penalty low.
For our analysis, we consider a MICRON 1 Gb DDR3-800
memory device [12]. For this memory, the current consumed
during power-up cycles and in the precharged idle mode
(denoted by IDD2N) is 50 mA. When in self-refresh mode
(SR), the memory draws a current of 6 mA (denoted by IDD6) and needs XSDLL clock cycles (equal to 512 cc) to power up. In precharged power-down mode (PD), 12 mA of current is consumed (denoted by IDD2P0), and the power-up latency is given by XPDLL (equal to 10 cc).
IV. EXTENDING THE PREDICTOR
This section extends and fine-tunes the generic predictor from [20] to serve two purposes: (1) to efficiently select a power-saving mode for any given idle period length, and (2) to forecast idle periods in DRAMs.
A. Efficient Power-Saving Mode Selection
Selecting the best power-saving mode depends on the length
of the idle period. For short idle periods (up to a few thousand
clock cycles for DDR3-800), the power-down mode is more
gainful because of its short power-up latency compared to self-
refresh. For longer idle periods, the self-refresh mode becomes
more gainful because of its lower power consumption that
compensates for its long power-up latency.
To estimate the minimum idle period length at which the self-refresh mode saves more energy than power-down (including the power-up time), we need to consider the energy consumption while the memory is in the power-down or self-refresh mode, in addition to the energy consumption during the corresponding power-up cycles. This minimum idle duration defines the self-refresh threshold (SRT) employed by the prediction algorithm in this paper.
To derive the SRT, we estimate the energy consumption when employing the self-refresh (ESR) and power-down (EPD) modes (including their power-up latencies) over SRT clock cycles. The self-refresh mode keeps the memory in the self-refresh state for SRT − XSDLL clock cycles and powers up the memory during XSDLL clock cycles (its power-up latency). During the self-refresh period, the memory draws IDD6 current, and during the XSDLL cycles it draws IDD2N current, as shown in Equation (1). Similarly, when the power-down mode is selected, the memory is in the power-down state for SRT − XPDLL clock cycles and consumes IDD2P0 current. During its power-up period of XPDLL cycles, it consumes IDD2N current, as given by Equation (2). In these equations, VDD corresponds to the supply voltage and clk to the clock period of the memory clock.
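$$E_{SR} = \left[I_{DD6} \cdot (\mathit{SRT} - \mathit{XSDLL})\right] \cdot V_{DD} \cdot \mathit{clk} + \left[I_{DD2N} \cdot \mathit{XSDLL}\right] \cdot V_{DD} \cdot \mathit{clk} \tag{1}$$

$$E_{PD} = \left[I_{DD2P0} \cdot (\mathit{SRT} - \mathit{XPDLL})\right] \cdot V_{DD} \cdot \mathit{clk} + \left[I_{DD2N} \cdot \mathit{XPDLL}\right] \cdot V_{DD} \cdot \mathit{clk} \tag{2}$$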
Equating them and solving for SRT, as shown in Equation (3), gives, after rounding to the nearest clock cycle, the minimum idle period length at which self-refresh saves more energy than power-down. For the 1 Gb Micron DDR3-800 memory discussed in Section I, SRT equates to 3691 clock cycles.
$$\mathit{SRT} = \frac{\mathit{XSDLL} \cdot (I_{DD6} - I_{DD2N})}{I_{DD6} - I_{DD2P0}} + \frac{\mathit{XPDLL} \cdot (I_{DD2N} - I_{DD2P0})}{I_{DD6} - I_{DD2P0}} \tag{3}$$
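Equation (3) can be checked numerically. The Python sketch below uses the datasheet values from Section III-B (currents in mA, latencies in clock cycles) and omits the common factor VDD · clk, which cancels when Equations (1) and (2) are equated; the two fractions of Equation (3) are combined into one:

```python
# Datasheet values for the 1 Gb DDR3-800 device (Section III-B)
IDD2N = 50.0   # mA, precharged idle current, also drawn while powering up
IDD2P0 = 12.0  # mA, precharged power-down current
IDD6 = 6.0     # mA, self-refresh current
XSDLL = 512    # cc, self-refresh power-up latency
XPDLL = 10     # cc, power-down power-up latency


def energy_sr(length: float) -> float:
    """Equation (1) divided by VDD * clk, in mA * cycles."""
    return IDD6 * (length - XSDLL) + IDD2N * XSDLL


def energy_pd(length: float) -> float:
    """Equation (2) divided by VDD * clk, in mA * cycles."""
    return IDD2P0 * (length - XPDLL) + IDD2N * XPDLL


# Equation (3): the idle length at which energy_sr == energy_pd
srt_exact = (XSDLL * (IDD6 - IDD2N) + XPDLL * (IDD2N - IDD2P0)) / (IDD6 - IDD2P0)
print(f"SRT = {srt_exact:.2f} cc")

for length in (3000, 3691, 3692, 10000):
    cheaper = "self-refresh" if energy_sr(length) < energy_pd(length) else "power-down"
    print(length, energy_sr(length), energy_pd(length), cheaper)
```

The exact solution is about 3691.3 cycles: at 3691 cycles the two energies differ by only 2 mA·cc (in favour of power-down), and from 3692 cycles onwards self-refresh is strictly cheaper, so the two modes are practically equivalent at the threshold.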
B. Applying the Predictor to DRAMs
This section describes how we significantly extend and fine-tune the generic predictor described in Section III-A. To adapt the predictor to idle periods in DRAM memories, we define a set of values for the history length, the reference pattern length and the width parameters (see Section III-A). For this, we use the insights from [20] on the impact of these parameters on prediction accuracy. We perform a design-time analysis over combinations of useful values for these parameters and statically select the combination with the best prediction accuracy for a particular application.
After selecting the parameters for the problem at hand, we
observed that the width parameter, which defines the allowed
difference between the reference pattern and the patterns from
the history, has an adverse effect on the prediction accuracy
if there are large variations in the idle period lengths. This is because, for a set of highly variable idle periods, a small width ignores idle patterns in the history even for relatively small variations. To resolve this issue, we propose an extension that enables the predictor to accurately predict the length of idle periods even when there are large variations.
As an extension, we introduce a new parameter to the prediction algorithm, called levels. Levels are used to classify the different idle period lengths into a set of pre-defined ranges. We employ the different levels of the idle periods
(representing their lengths) as data points in the generic
prediction algorithm, introduced in Section III-A. The width
parameter is now applied on the allowed difference in levels
and not on exact idle period lengths and hence, a small value
for the width parameter is sufficient. We show an illustrative
example of using levels to predict idle period lengths in
Figure 2.
As can be seen in the figure, the range $[x_{i-1}, x_i)$ refers to level $l_i$. Assume SRT = 3691, as derived in Section IV-A. To ensure that self-refresh is only selected when it is better than power-down, level 1 includes all idle periods of lengths from 0 cc to 3690 cc (SRT − 1 cc). Level 2 includes all idle periods from 3691 cc (SRT) to 7381 cc. The bounds of every subsequent level are derived by doubling the length of the current one; therefore, level 3 includes idle period lengths from 7382 cc to 14762 cc. By choosing levels in this manner, level 1 indicates that self-refresh is not gainful, and all other levels indicate that self-refresh is the favoured power-saving mode. The introduction of levels reduces the variation in idleness to a range from level 1 to level 7 instead of 0 cc to 236162 cc. Hence, a small width parameter can be employed successfully. However, since the prediction is performed on levels, as shown in Figure 2, predicting a level $l_i$ equates to the conservative value $x_{i-1}$: we use the lower bound of the range of $l_i$ as the predicted value to reduce the misprediction penalty. The history now consists of the levels of past idle periods, and the pattern comparison is done on the basis of levels. Note that since levels represent ranges of idleness, predictions may not be accurate at the level of single clock cycles. Also, since the predictor forecasts conservatively, some idle cycles in an idle period may not be exploited by the prediction.

[Figure: the idle period length axis divided at $x_0, x_1, \dots, x_4$ into levels $l_1, \dots, l_4$.]
Fig. 2. Idleness Prediction on levels
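A minimal Python sketch of the level encoder and the conservative decoder, assuming SRT = 3691 cc and seven levels. Levels 1 to 3 reproduce the bounds stated above; for higher levels the sketch assumes that the inclusive upper bound keeps doubling (14762, 29524, ...), and it treats the top level as open-ended. The function names are hypothetical:

```python
from bisect import bisect_right
from typing import List

SRT = 3691  # self-refresh threshold in clock cycles (Section IV-A)


def level_lower_bounds(srt: int = SRT, num_levels: int = 7) -> List[int]:
    """Lower bound x_{i-1} of each level l_i for i = 1 .. num_levels."""
    lower = [0, srt]              # level 1 = [0, SRT - 1], level 2 starts at SRT
    upper = 2 * srt - 1           # inclusive upper bound of level 2 (7381 cc)
    while len(lower) < num_levels:
        lower.append(upper + 1)   # the next level starts right after the current one
        upper *= 2                # assumption: inclusive upper bounds keep doubling
    return lower


LOWER = level_lower_bounds()


def encode(idle_cycles: int) -> int:
    """Level encoder: idle period length -> level (1-based); the top level is open-ended."""
    return bisect_right(LOWER, idle_cycles)


def decode(level: int) -> int:
    """Level decoder: conservative length for a predicted level (its lower bound)."""
    return LOWER[level - 1]
```

For example, an idle period of 5000 cc is encoded as level 2; if level 2 is later predicted, the decoder returns the conservative length 3691 cc, and a predicted level 1 decodes to 0 cc, i.e. no self-refresh.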
V. POWER SAVING POLICY
In this section, we propose a novel power-saving policy that employs the prediction algorithm from [20] in combination with a time-out strategy to identify memory idleness. This enables the use of either the self-refresh mode or the power-down mode or both, while avoiding or reducing the penalty cycles. The standard time-out strategy [16], briefly discussed in Section V-A, is used to rule out speculative usage of the self-refresh mode in short idle periods. Additionally, we employ the prediction algorithm multiple times in every idle period for effective use of the self-refresh mode, as described in Section V-B, and also use the power-down mode speculatively when some idle cycles are not exploited by the self-refresh mode, as described in Section V-C.
A. Time-Out
To avoid unnecessary usage of the self-refresh mode, we use the standard time-out strategy, which waits for a system to be idle for a pre-defined time-out interval before powering it down. The idle periods of the memory can vary from a few to several hundred thousand clock cycles, as explained in Section IV-B. The self-refresh mode is not gainful for short idle periods because of its long power-up latency: for short idle periods, it always results in a performance penalty and no energy gain. Hence, using a time-out interval avoids the unnecessary power-up penalty for short idle periods.
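The time-out filter can be sketched as follows. The idle period lengths and the time-out value in the example are hypothetical, and the sketch only counts which idle periods outlast the time-out and how many idle cycles remain after it; it does not model power-up latencies:

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class TimeoutResult:
    filtered: int      # idle periods that end before the time-out expires
    exploitable: int   # idle periods that outlast the time-out
    cycles_left: int   # idle cycles remaining after the time-out in those periods


def apply_timeout(idle_periods: Iterable[int], timeout: int) -> TimeoutResult:
    filtered = exploitable = cycles_left = 0
    for length in idle_periods:
        if length <= timeout:
            filtered += 1                    # short period: no mode switch, no penalty
        else:
            exploitable += 1
            cycles_left += length - timeout  # only cycles after the time-out are usable
    return TimeoutResult(filtered, exploitable, cycles_left)
```

For instance, with a time-out of 100 cc, the idle periods [20, 50, 100, 5000] yield three filtered periods and one period with 4900 cycles left for a power-saving mode.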
B. Prediction for Self-refresh
In this subsection, we analyse how the prediction algorithm can be used to employ self-refresh efficiently in combination with the time-out strategy described earlier.
The predictor conservatively predicts idle periods longer than SRT in order to avoid penalty cycles when employing the self-refresh mode. This allows the memory to power up in anticipation of the next request. Furthermore, due to the time-out strategy, the predictor only forecasts idle periods longer than the pre-defined time-out period. If the predictor forecasts an idle period shorter than SRT, the self-refresh mode is not used for that period. It is possible that the predictor under- or over-estimates the idle period length. The former is more probable, since the predictor always provides a conservative estimate of the idle period length. How to solve the under-/over-estimation problem is explained in the following two subsections.
1) Reasons for Estimation Problems: If the predictor forecasts idle period lengths shorter than the actual idle period, there can be two reasons. First, since the prediction is done in terms of levels and provides the lower bound of a particular level as the predicted value, it may neglect some idle clock cycles in the corresponding idle period. Second, very long idle periods can be rare and far apart. If a long idle period has not been observed before, or it is already out of the limited history buffer, the predictor under-estimates because a similarly large value is missing from the history. It is also possible that the predictor over-estimates the idle period length due to a mismatch in the history or an unexpected extreme variation in the idleness.
2) Solution for Estimation Problems: To solve the under-estimation problem, we propose to employ multiple predictions in a given idle period. This extends the use of the self-refresh mode where possible and powers up the memory as late as possible, just in time before it is accessed again. This policy is depicted in Figure 3.
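[Figure: timeline with requests $r_1$, $r_2$, $r_3$ and idle periods $i_1$, $i_2$; after each time-out $t_{out}$, predictions pred11, pred12 and pred21 schedule self-refresh periods SR11, SR12 and SR21, with power-up starting at $p_1$, $p_2$ and ending at $e_1$, $e_2$, $e_3$; SR11 and SR12 merge into SRtotal, and the remaining idle cycles are covered by power-down (PD).]
Fig. 3. Multiple predictions for Self-Refresh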
In the figure, requests are shown as hatched bars and denoted as $r_x$, and idle periods are denoted as $i_x$. The initial time-out is denoted as $t_{out}$. After this time-out, the predicted value pred11 is checked to see whether it is above level 1 (i.e., at least SRT cycles), and if so, a self-refresh (SR11) is scheduled. The memory is expected to start powering up at $p_1$ so that it is completely powered up at $e_1$. However, at $p_1$ the history is temporarily updated with the elapsed idle cycles and an additional prediction is invoked. If the new prediction indicates that an additional self-refresh is not gainful, the memory continues to power up. However, if the predicted value is still gainful (pred12), the memory stays in self-refresh. All predicted self-refresh periods (SR11, SR12) are combined into one longer continuous self-refresh period (SRtotal). At the end of the real idle period, the temporarily set history value is overwritten by the real idle period length. At every $p_i$, an additional prediction can be made to extend the self-refresh period. The maximum number of predictions per idle period can be limited to avoid unnecessary penalties due to over-estimation. The actual number of predictions is lower than this maximum when the predictor forecasts that a further self-refresh is not gainful or when a misprediction occurs. As already explained, the prediction is done conservatively on levels, and therefore self-refresh cannot exploit 100% of the idle cycles in every idle period.
Figure 3 also shows the idle period $i_2$, which has an over-estimation problem. After an initial time-out, a self-refresh (SR21) is scheduled based on the predicted value (pred21). The predicted value is larger than the actual idle period, and a wake-up penalty arises. The wake-up penalty can be either the total power-up latency or a fraction of it. The latter occurs if the memory has already started powering up when the next request arrives. In this case, the penalty is the difference between the power-up latency and the number of cycles the memory has already spent powering up.
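The multiple-prediction scheme, together with the full and partial wake-up penalties, can be sketched as a small simulation of one idle period. This is a simplified model under stated assumptions: the first self-refresh period starts right after the time-out and must end at the time-out plus the first predicted length; each further prediction is made at the power-up point $p_k$ and, if it is at least SRT cycles, moves the predicted end to $p_k$ plus that prediction; the predicted lengths are supplied by the caller rather than by the predictor; and all names, including the limit `max_predictions`, are hypothetical:

```python
from typing import Sequence, Tuple


def simulate_self_refresh(
    actual: int,
    timeout: int,
    predictions: Sequence[int],
    srt: int = 3691,
    xsdll: int = 512,
    max_predictions: int = 4,
) -> Tuple[int, int]:
    """Return (cycles in self-refresh, wake-up penalty) for one idle period.

    actual      -- true length of the idle period in cycles (the request arrives then)
    predictions -- successive conservative predicted lengths, supplied by the caller
    """
    if actual <= timeout or not predictions or predictions[0] < srt:
        return 0, 0                        # filtered by the time-out, or not gainful
    sr_start = timeout                     # self-refresh starts when the time-out expires
    end = timeout + predictions[0]         # e_1: memory must be awake by this cycle
    used = 1
    limit = min(max_predictions, len(predictions))
    while True:
        power_up = end - xsdll             # p_k: power-up has to start here
        if actual <= power_up:
            # Over-estimation: the request arrives during self-refresh,
            # so the full power-up latency is exposed
            return actual - sr_start, xsdll
        if used < limit and predictions[used] >= srt:
            end = power_up + predictions[used]  # still gainful: stay in self-refresh
            used += 1
            continue
        # Memory powers up during [power_up, end); partial penalty if the request is early
        return power_up - sr_start, max(0, end - actual)
```

With a time-out of 100 cc, `simulate_self_refresh(10000, 100, [4000, 5000])` extends the first self-refresh period once and returns (7976, 0), i.e. no penalty because the memory is awake before the request. `simulate_self_refresh(3000, 100, [7382])` models the over-estimation case of $i_2$ and returns (2900, 512), exposing the full power-up latency, while `simulate_self_refresh(3700, 100, [3691])` returns a partial penalty of 91 cycles.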
C. Proposed Power-Saving Policy
As stated in Section V-B, by performing multiple predictions per idle period, it is possible to exploit most of the idle cycles using the self-refresh mode. However, since the prediction is done on levels, 100% exploitation of idle cycles is not possible for all idle periods. To resolve this issue for the unexploited idle cycles, we propose to combine the multiple predictions for self-refresh with speculative use of the power-down mode whenever not all idle cycles in a given idle period are exploited by self-refresh. The power-down mode saves a considerable amount of power, though less than self-refresh, but at a much lower power-up penalty (10 cc versus 512 cc for self-refresh for DDR3-800). Thus, the proposed policy uses (a) self-refresh, when the prediction exploits all idle cycles of an idle period, (b) power-down, when the prediction forecasts idle periods shorter than SRT clock cycles (level 1), and (c) a combination of the two modes to exploit most idle cycles by self-refresh and the rest by power-down, all at a nominal performance penalty.
The idle cycles not covered by the self-refresh prediction
include (1) the short idle periods where using the self-refresh
mode is not gainful (level 1), (2) the cycles spent during the
initial time-out, and (3) the idle cycles not exploited by the
prediction and self-refresh due to the conservative estimates on
levels. The proposed power-saving policy thus uses prediction
for self-refresh and also schedules a speculative power-down
for idle clock cycles not covered by self-refresh, thereby
covering 100% of the idle cycles by either of the two power-
saving modes.
Figure 3 also depicts this proposed power-saving policy. All cycles exploited by power-down are denoted as PD. For all idle cycles not exploited by self-refresh, the power-down mode is used, which only marginally increases the execution time. In other words, scheduling a speculative power-down results in a minor penalty for a considerable energy gain.
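The energy effect of covering the leftover idle cycles with power-down can be sketched with a simple charge count (current × cycles, in mA·cc; multiplying by VDD · clk gives energy). The model is a simplification and its assumptions are ours: the time-out and the cycles after the self-refresh power-up are each covered by one speculative power-down stretch that pays XPDLL cycles at IDD2N to power up, stretches shorter than XPDLL stay in plain idle, and the memory is assumed to be awake before the next request:

```python
# Datasheet values for the 1 Gb DDR3-800 device (Section III-B)
IDD2N, IDD2P0, IDD6 = 50, 12, 6   # mA: idle/power-up, power-down, self-refresh
XSDLL, XPDLL = 512, 10            # cc: self-refresh and power-down power-up latencies


def pd_stretch_charge(cycles: int) -> int:
    """Charge (mA*cc) of an idle stretch covered by one speculative power-down."""
    if cycles < XPDLL:
        return IDD2N * cycles                         # too short: stay in plain idle
    return IDD2P0 * (cycles - XPDLL) + IDD2N * XPDLL  # power-down, then power-up


def idle_period_charge(length: int, timeout: int, sr_cycles: int) -> int:
    """Charge of one idle period under the combined self-refresh/power-down policy."""
    if sr_cycles == 0:
        return pd_stretch_charge(length)              # level 1: power-down only
    awake_at = timeout + sr_cycles + XSDLL            # end of the self-refresh power-up
    if awake_at > length:
        raise ValueError("sketch assumes the memory is awake before the next request")
    charge = pd_stretch_charge(timeout)               # time-out covered by power-down
    charge += IDD6 * sr_cycles + IDD2N * XSDLL        # self-refresh plus its power-up
    charge += pd_stretch_charge(length - awake_at)    # leftover cycles: power-down again
    return charge


baseline = IDD2N * 10000                              # staying idle for 10000 cc
proposed = idle_period_charge(10000, timeout=100, sr_cycles=7976)
print(proposed, baseline, f"{1 - proposed / baseline:.1%}")
```

For a 10000 cc idle period with a 100 cc time-out and 7976 cc of self-refresh, the sketch gives 92360 mA·cc against 500000 mA·cc for staying idle at IDD2N, i.e. roughly an 82% reduction for that period under these assumptions.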
In the next section, we compare the proposed power-saving policy, which uses multiple predictions for self-refresh, speculative power-down, or a combination of both modes, against a naive speculative usage of the self-refresh mode, using several applications from the multimedia domain.
VI. EXPERIMENTAL EVALUATION
The goal of the proposed power-saving policy is to reduce
memory energy consumption while only marginally increasing
the execution time due to the power-up latencies. In this
section, we first briefly introduce the experimental setup and the integration of the predictor into the system in Section VI-A. Then, in Section VI-B, we analyse the impact of (a) the time-out strategy when speculatively employing the self-refresh mode, (b) using multiple predictions for self-refresh in an idle period, and (c) employing the power-down mode speculatively for the idle cycles not exploited by the prediction in the self-refresh mode. For these evaluations, different applications from the multimedia domain, such as those in MediaBench [9], are used.
A. Experimental Setup
In our experiments, we employ a 1 Gb Micron DDR3-
800 [12] memory and evaluate the proposed predictor-based
power-saving policy with respect to energy savings and the
impact on execution time. Three different multimedia appli-
cations are used: (1) H263 decoder, (2) Ray Tracer, and (3)
JPEG encoder. To evaluate the proposed power-saving policy, we run each application on the SimpleScalar simulator [2] with a 16 kB L1 data cache, a 16 kB L1 instruction cache, a 256 kB shared L2 cache and 256 B cache lines. We filter out the L2 cache misses and obtain the transactions to the DRAM memory. We then employ a trace player to simulate the application behavior in a SystemC simulation model of the CompSOC platform [18]. We forward these transactions from the trace player to a DRAM memory controller [17] in the platform, which is modified to employ our predictor and the power-saving policy logic.
For all power analyses, we employed our open-source DRAM energy estimation tool [5], which is based on the power model presented in [4]. Current and voltage numbers are obtained from the DRAM vendor's datasheet [12].
The predictor is placed in the front-end of the memory
controller adjacent to the arbiter-bus. An overview of the
CompSOC platform including the predictor and the memory
controller is depicted in Figure 4.
The predictor monitors the inputs of the bus for incoming requests and records the time stamps of the beginning of the first transaction after every idle period and of the end of the last transaction before every idle period. The lengths of the idle periods are encoded into levels. Using these levels, the predictor builds up a history of idle-period lengths in a history buffer. It uses the contents of the history buffer to forecast the prospective levels, which are decoded to conservative lengths of future idle periods,
as described in Section IV-B. Using the inferences from our previous work [20] on the predictor and an initial design-time analysis, we set the history buffer length (hl), the reference pattern length (pl), and the predictor width parameter (w) as listed in Table I. The table also includes the best initial time-out (to) parameter (see Section V-A) as well as the maximum number of predictor invocations (pi). These two parameters are explained in more detail in Section VI-B.
[Figure: IP1 and IP2 connect through the NoC to the memory controller (front-end: request/response buffers, bus, predictor; back-end: memory map, command generator), which drives the DRAM. Predictor: level encoder, history buffer, reference pattern, level predictor, level decoder, and power-saving policy issuing SR/PD decisions.]
Fig. 4. CompSOC overview including the predictor
TABLE I
PREDICTOR PARAMETERS
Application | hl | pl | w | to | pi
H263 decoder | 50 | 2 | 4 | 230 | 150
Ray Tracer | 50 | 2 | 4 | 250 | 40
JPEG encoder | 50 | 2 | 4 | 0 | 200
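The following Python sketch illustrates the kind of level-based prediction described above, using the Table I values hl = 50, pl = 2, and w = 4. It is a hypothetical illustration, not the predictor of Section IV-B: the logarithmic level encoding (level k >= 1 covers lengths from w^k up to w^(k+1) - 1 cycles), the backward search for the reference pattern, and the decoding of a level to the lower bound of its range are all assumptions, as are the names `encode_level` and `LevelPredictor`.

```python
from collections import deque
from typing import Optional


def encode_level(idle_cycles: int, w: int) -> int:
    """Assumed encoding: level 0 below w cycles, level k for [w**k, w**(k+1))."""
    level, bound = 0, w
    while idle_cycles >= bound:
        level += 1
        bound *= w
    return level


class LevelPredictor:
    """Hypothetical history-based level predictor (not the paper's exact design)."""

    def __init__(self, hl: int, pl: int, w: int) -> None:
        self.pl = pl                     # reference pattern length
        self.w = w                       # width parameter, used here as the level base
        self.history = deque(maxlen=hl)  # history buffer holding the last hl levels

    def record(self, idle_cycles: int) -> None:
        """Encode an observed idle-period length and append it to the history."""
        self.history.append(encode_level(idle_cycles, self.w))

    def predict_level(self) -> Optional[int]:
        """Find the most recent earlier occurrence of the last pl levels and
        return the level that followed it, or None if there is no match."""
        h = list(self.history)
        if len(h) <= self.pl:
            return None
        pattern = h[-self.pl:]
        for start in range(len(h) - self.pl - 1, -1, -1):
            if h[start:start + self.pl] == pattern:
                return h[start + self.pl]
        return None

    def decode(self, level: int) -> int:
        """Conservative length: the smallest idle length encoded as this level."""
        return 0 if level == 0 else self.w ** level


predictor = LevelPredictor(hl=50, pl=2, w=4)
for cycles in [5, 20, 70, 5, 20, 70, 5, 20]:  # made-up idle-period lengths
    predictor.record(cycles)
level = predictor.predict_level()
print(level, predictor.decode(level))  # 3 64
```

The made-up lengths encode to the repeating levels 1, 2, 3. The last two levels (1, 2) were previously followed by level 3, so the sketch predicts level 3 and decodes it to 64 cycles, below the observed 70-cycle periods; decoding to the lower bound of a level is what keeps the predicted length conservative.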
B. Evaluation of the Proposed Policy
In this section, we evaluate the impact of (a) the time-out strategy (Section V-A) on the speculative use of the self-refresh mode, (b) using multiple predictions for self-refresh (Section V-B), and (c) employing speculative power-down (Section V-C) for idle cycles not exploited by prediction.
The Pareto plots in Figure 5 show the total memory energy consumption and the performance penalty (in the form of impact on execution time) for the different media applications when employing the time-out strategy, multiple predictions with self-refresh, and the proposed power-saving policy, which selects self-refresh, power-down, or a combination of both modes for each idle period. Figure 5a presents the results for the H263 decoder, Figure 5b for the Ray Tracer, and Figure 5c for the JPEG encoder. Additionally, Table II lists how much the energy consumption is reduced, both as a percentage and as a factor, and by what percentage the execution time increases due to the power-up penalties.
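Table II reports each energy reduction both as a percentage and as a factor. Assuming savings are measured relative to Base and the factor is E_Base / E_policy, the two are linked by factor = 1 / (1 - s/100). The short Python sketch below (function names are illustrative) computes both quantities, plus the relative execution-time increase, and reproduces the SSR factors of Table II from their percentages.

```python
def savings_percent(e_base: float, e_policy: float) -> float:
    """Energy savings of a policy relative to Base, in percent."""
    return 100.0 * (e_base - e_policy) / e_base


def reduction_factor(savings_pct: float) -> float:
    """E_base / E_policy expressed through the savings percentage."""
    return 1.0 / (1.0 - savings_pct / 100.0)


def exec_time_increase_percent(t_base: float, t_policy: float) -> float:
    """Relative increase in execution time caused by power-up penalties."""
    return 100.0 * (t_policy - t_base) / t_base


# SSR savings for the H263 decoder, Ray Tracer and JPEG encoder (Table II)
for s in (72.2, 79.9, 68.3):
    print(f"{s}% -> factor {reduction_factor(s):.1f}")  # 3.6, 5.0 and 3.2
```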
In both Figure 5 and Table II, Base denotes the baseline results when no prediction or power-saving mode is employed. Therefore, Base has the lowest execution time (no penalty) and the highest energy consumption.
In our first experiment, we compare our solution against speculative use of the self-refresh mode whenever the memory is idle. This is denoted in Figure 5 and Table II by SSR(time) (Speculative Self-Refresh), where time is the time-out threshold employed. Table I lists the best time-out threshold values for the three applications. As can be seen in Figures 5a and 5b, increasing the time-out threshold up to a certain limit makes speculative self-refresh yield higher energy savings and a lower performance penalty. This confirms the usefulness of the time-out strategy for these two applications: they have many idle periods shorter than the time-out threshold, and a higher threshold prevents speculative self-refresh from being entered during these short periods.
[Figure: three Pareto plots of energy [µJ] versus execution time [cc] (×10^6). (a) H263 decoder: Base, SSR(0), SSR(100), SSR(200), SSR(230), PSR(1), PSR(5), PSR(50), PSR(100), PSR(150), PSRS(150). (b) Ray Tracer: Base, SSR(0), SSR(100), SSR(200), SSR(250), PSR(1), PSR(10), PSR(20), PSR(30), PSR(40), PSRS(40). (c) JPEG encoder: Base, SSR(0), SSR(50), SSR(100), PSR(1), PSR(50), PSR(100), PSR(150), PSR(200), PSRS(200).]
Fig. 5. Pareto plot of energy and execution time
However, in the case of the JPEG encoder (Figure 5c), design-time analysis shows that any time-out threshold greater than zero increases the energy consumption while only marginally reducing the penalty, which results in a poor trade-off. Hence, the time-out strategy is not employed for this application (to = 0 in Table I). For all applications, speculative self-refresh combined with the time-out strategy reduces the energy consumption by a factor between 3.2 (68.3%) and 5 (79.9%). However, using self-refresh speculatively, without prediction, incurs high penalties and increases the execution time by up to 11.2%. In short, the energy savings are high because most idle cycles, except those filtered out by the time-out, are covered by the self-refresh mode. The power-up penalties are unavoidable because the self-refresh mode is used speculatively: the memory leaves self-refresh only when the next request arrives, so that request must wait for the full power-up latency. Note that without the time-out strategy, speculative use of self-refresh would result in much higher penalties.
TABLE II
POWER SAVINGS AND PENALTY CYCLES
Each cell: increase in execution time [%]; energy savings [%] / factor
Policy | H263 decoder | Ray Tracer | JPEG encoder
Base | 0; 0 / 1 | 0; 0 / 1 | 0; 0 / 1
SSR | 2.72; 72.2 / 3.6 | 0.67; 79.9 / 5 | 11.2; 68.3 / 3.2
PSR | 2.04; 53.5 / 2.2 | 0.25; 68.5 / 3.2 | 0.33; 31.2 / 1.6
PSRS | 2.2; 73.1 / 3.7 | 0.32; 79.9 / 5 | 0.55; 68.6 / 3.2
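The speculative self-refresh with time-out evaluated above can be sketched as a simple cycle-accounting model in Python. This is an assumed, simplified model: every idle period longer than the time-out enters self-refresh once, and the request that ends it always pays a fixed exit latency. `exit_latency` is a placeholder to be taken from the DRAM datasheet, minimum self-refresh residency and entry/exit energy are ignored, and the idle-period lengths in the example are made up.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class SSRStats:
    sr_cycles: int = 0           # idle cycles spent in self-refresh
    unexploited_cycles: int = 0  # idle cycles inside the time-out window
    sr_entries: int = 0          # number of self-refresh entries
    penalty_cycles: int = 0      # power-up latency seen by waiting requests


def speculative_self_refresh(idle_periods: Iterable[int], timeout: int,
                             exit_latency: int) -> SSRStats:
    stats = SSRStats()
    for length in idle_periods:
        if length <= timeout:
            # short period filtered out by the time-out: no self-refresh
            stats.unexploited_cycles += length
            continue
        stats.unexploited_cycles += timeout
        stats.sr_cycles += length - timeout
        stats.sr_entries += 1
        # the memory wakes up only when the next request arrives,
        # so that request always waits for the full exit latency
        stats.penalty_cycles += exit_latency
    return stats


idle = [10, 300, 50, 1000]  # hypothetical idle-period lengths in cycles
print(speculative_self_refresh(idle, timeout=0, exit_latency=20))
print(speculative_self_refresh(idle, timeout=100, exit_latency=20))
```

In this toy example, a time-out of 100 cycles skips the two short periods, halving the number of self-refresh entries (4 to 2) and hence the accumulated penalty (80 to 40 cycles), at the cost of leaving the time-out cycles unexploited.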
In our next experiment, we evaluate the use of multiple
predictions per idle period for employing the self-refresh
mode in combination with the previously mentioned time-
out strategy. This is represented by PSR(limit) (Prediction
for Self-Refresh), where the parameter limit gives the
maximum number of invocations of the predictor in a single
idle period. As can be seen in Figure 5, increasing the number of predictions per idle period exploits more idle cycles with the self-refresh mode and reduces the energy consumption. At the same time, the prediction is used to wake up the memory before the next request arrives, which avoids most of the penalty observed when using self-refresh speculatively. Accordingly, Figure 5 shows that, compared to Base, using prediction only slightly increases the execution time while reducing the energy consumption.
Using design-time analysis, we derive a limit on the number of predictor invocations per idle period, beyond which additional predictions are not gainful. This limit corresponds to the point at which the predictor either forecasts that remaining in self-refresh is no longer gainful or starts over-estimating the idleness, which increases the performance penalty. The best values of the limit parameter for the different applications are listed in Table I (pi). Using prediction for employing self-refresh reduces the energy consumption significantly across the different applications, by a factor between 1.6 (31.2%) and 3.2 (68.5%). These savings are smaller than with speculative self-refresh, since the predictions are made conservatively on levels and therefore cover fewer idle cycles with the self-refresh mode. On the other hand, the predictor avoids many of the penalty cycles incurred by speculative self-refresh, limiting the increase in execution time to at most 2.04%. The energy savings and the increase in execution time are shown in Table II.
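The PSR(limit) behaviour can be sketched for a single idle period as follows. This Python model is an illustration under stated assumptions, not the controller logic: `predict` is a hypothetical stand-in for the level predictor that returns a conservative number of idle cycles (0 meaning no gainful self-refresh window), the memory is assumed to be awake again at the end of each predicted window, and only an over-prediction costs a fixed, placeholder `exit_latency`. Cycles not covered by self-refresh are returned as unexploited.

```python
from typing import Callable, Tuple


def psr_idle_period(actual_idle: int, timeout: int, limit: int,
                    predict: Callable[[], int],
                    exit_latency: int) -> Tuple[int, int, int]:
    """Return (sr_cycles, unexploited_cycles, penalty_cycles) for one idle period."""
    if actual_idle <= timeout:
        return 0, actual_idle, 0          # filtered out by the time-out
    elapsed, sr_cycles, penalty = timeout, 0, 0
    for _ in range(limit):                # at most `limit` predictor invocations
        predicted = predict()
        if predicted <= 0:                # no gainful self-refresh window predicted
            break
        remaining = actual_idle - elapsed
        if predicted > remaining:         # over-prediction: request hits self-refresh
            sr_cycles += remaining
            penalty += exit_latency
            break
        sr_cycles += predicted            # woken up in time, no penalty
        elapsed += predicted
        if elapsed == actual_idle:
            break
    return sr_cycles, actual_idle - sr_cycles, penalty


preds = iter([64, 64, 16, 0])             # hypothetical conservative predictions
print(psr_idle_period(300, timeout=100, limit=5,
                      predict=lambda: next(preds, 0), exit_latency=20))
```

Here the three positive predictions cover 144 of the 300 idle cycles without penalty; with `limit=2`, only 128 cycles would be covered. The unexploited cycles (the time-out window plus the tail of the period) are exactly what PSRS later fills with speculative power-down.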
In our final experiment, we evaluate the proposed power-saving policy, which combines the time-out strategy and the predictions for self-refresh, and additionally schedules a speculative power-down to maximize energy savings while still avoiding most of the power-up penalties. Under this policy, the self-refresh mode is used when the prediction is above level 1 and covers all idle cycles in that idle period; the power-down mode is used for predictions of level 0 and for cycles ignored due to the time-out threshold; and a combination of both modes is used when self-refresh alone, as guided by the prediction, cannot exploit all the idle cycles in the idle period. The policy is denoted by PSRS(limit) (Prediction for Self-Refresh with Speculative power-down), where limit is the maximum number of predictor invocations. With this policy, every idle period is fully exploited using the self-refresh mode, the power-down mode, or a combination of both. The proposed policy matches or exceeds the energy savings of the other policies for all applications, reducing the energy consumption by a factor between 3.2 (68.6%) and 5 (79.9%), at a low performance penalty between 0.32% and 2.2%. It harnesses the benefits of the predictor and efficiently combines the self-refresh and power-down modes to obtain maximum energy savings at considerably lower performance penalties than using the self-refresh mode speculatively.
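A PSRS schedule for one idle period can be sketched as a list of (mode, cycles) segments. In this hypothetical Python model, time-out cycles go to power-down, a level-0 prediction (modelled as a predicted length of 0) ends the predictions, each positive prediction opens a self-refresh window, and whatever remains is covered by speculative power-down. The model knows `actual_idle` only because a trace is replayed; a real controller learns it when the next request arrives. Power-up penalties are omitted.

```python
from typing import Callable, List, Tuple

Segment = Tuple[str, int]  # ("SR" or "PD", number of idle cycles)


def psrs_schedule(actual_idle: int, timeout: int, limit: int,
                  predict: Callable[[], int]) -> List[Segment]:
    """Cover every idle cycle with self-refresh (SR) or power-down (PD)."""
    schedule: List[Segment] = []

    def add(mode: str, cycles: int) -> None:
        if cycles <= 0:
            return
        if schedule and schedule[-1][0] == mode:
            schedule[-1] = (mode, schedule[-1][1] + cycles)  # merge with previous
        else:
            schedule.append((mode, cycles))

    elapsed = min(actual_idle, timeout)
    add("PD", elapsed)                 # time-out window: speculative power-down
    for _ in range(limit):
        if elapsed >= actual_idle:
            break
        predicted = predict()          # 0 models a level-0 prediction
        if predicted <= 0:
            break
        sr = min(predicted, actual_idle - elapsed)
        add("SR", sr)
        elapsed += sr
    add("PD", actual_idle - elapsed)   # cycles not exploited by prediction
    return schedule


preds = iter([64, 64, 16, 0])          # hypothetical conservative predictions
print(psrs_schedule(300, timeout=100, limit=5, predict=lambda: next(preds, 0)))
# [('PD', 100), ('SR', 144), ('PD', 56)]
```

In every case, the segment lengths sum to `actual_idle`, which mirrors the claim that PSRS exploits all idle cycles with one of the two power-saving modes.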
VII. CONCLUSIONS
In this paper, we have significantly extended and fine-tuned a generic prediction algorithm so that it can be employed to reduce DRAM power/energy consumption. Furthermore, based on the prediction algorithm, we have proposed a novel power-saving policy that leverages DRAM idle periods to put the DRAM in the self-refresh mode, the power-down mode, or a combination of both power-saving modes in a given idle period, depending on the predicted duration of the idle period. The power-saving policy, referred to as prediction for self-refresh with speculative power-down (PSRS), places the memory in self-refresh mode provided the predicted idle period length is sufficient to save power. Otherwise, it exploits the idle period by scheduling a speculative power-down. If the predicted idle period length is shorter than the actual idle period length, it schedules a speculative power-down for the clock cycles not exploited by the prediction. This policy hence exploits all the idle cycles in all idle periods. Experimental results for several multimedia benchmarks have shown that this policy significantly reduces the total DRAM energy consumption with negligible performance penalties compared to using the self-refresh mode speculatively. The proposed policy achieves very high energy savings (between 68.6% and 79.9%) at a marginal performance penalty (between 0.32% and 2.2%).
REFERENCES
[1] M. Awasthi, D. W. Nellans, R. Balasubramonian, and A. Davis. Pre-
diction Based DRAM Row-Buffer Management in the Many-Core
Era. In Proc. 20th International Conference on Parallel Architecture
and Compilation Techniques (PACT), pages 183–184, Galveston Island,
Texas, October 2011. Poster Track.
[2] D. Burger and T. M. Austin. The SimpleScalar tool set, version 2.0.
ACM SIGARCH Comput. Archit. News, 25:13–25, June 1997.
[3] R. Can, F. Koch, I. Choi, I. Suh, H. Byun, and P. Blumstengel. Save
power and improve efficiency in virtualized environment of datacenter
by right choice of memory. Technical report, Microsoft Technology
Center & Samsung Semiconductor, 2011.
[4] K. Chandrasekar, B. Åkesson, and K. Goossens. Improved Power Modeling of DDR SDRAMs. In 14th Euromicro Conference on Digital System Design Architectures, Methods and Tools (DSD), pages 99–108, 2011.
[5] K. Chandrasekar et al. DRAMPower: Open Source DRAM Power &
Energy Estimation Tool. www.es.ele.tue.nl/drampower, 2012.
[6] V. Delaluz, M. Kandemir, N. Vijaykrishnan, A. Sivasubramaniam, and M. J. Irwin. Hardware and software techniques for controlling DRAM power modes. IEEE Transactions on Computers, 50(11):1154–1173, 2001.
[7] JEDEC SST Association. DDR3 SDRAM Standard, 2010. JESD79-3E.
[8] Y. Joo, Y. Choi, and H. Shim. Energy exploration and reduction of
SDRAM memory systems. In Proc. 39th Design Automation Conf, pages
892–897, 2002.
[9] C. Lee, M. Potkonjak, and W. H. Mangione-Smith. MediaBench: a tool for evaluating and synthesizing multimedia and communications systems. In Proc. 30th ACM/IEEE International Symposium on Microarchitecture, MICRO 30, pages 330–335, Washington, DC, USA, 1997. IEEE Computer Society.
[10] J. Lee, C. Park, and S. Ha. Memory access pattern analysis and stream
cache design for multimedia applications. In Proc. Asia and South
Pacific Design Automation Conf the ASP-DAC 2003, pages 22–27, 2003.
[11] C. Ma and S. Chen. A DRAM Precharge Policy Based on Address
Analysis. In 10th Euromicro Conference on Digital System Design
Architectures, Methods and Tools (DSD), pages 244–248, 2007.
[12] Micron Technology Inc. DDR3 SDRAM 1Gb Data Sheet, 2006.
[13] L. Minas and B. Ellison. Energy Efficiency for Information Technology:
How to Reduce Power Consumption in Servers and Data Centers. Intel
Press, 2009.
[14] U. Y. Ogras and R. Marculescu. Prediction-based Flow Control for
Network-on-Chip Traffic. In Proc. 43rd Design Automation Conference,
DAC ’06, pages 839–844, New York, NY, USA, 2006.
[15] S.-I. Park and I.-C. Park. History-based memory mode prediction for
improving memory performance. In Proc. Int. Symp. Circuits and
Systems ISCAS ’03, volume 5, 2003.
[16] M. Pedram. Power optimization and management in embedded systems.
In Proc. Asia and South Pacific Design Automation Conference, ASP-
DAC ’01, pages 239–244, New York, NY, USA, 2001. ACM.
[17] B. Åkesson and K. Goossens. Architectures and modeling of predictable memory controllers for improved system integration. In Proc. Design, Automation, and Test in Europe (DATE), pages 851–856, 2011.
[18] B. Åkesson, A. Molnos, A. Hansson, J. Ambrose Angelo, and K. Goossens. Composability and Predictability for Independent Application Development, Verification, and Execution. In M. Hübner and J. Becker, editors, Multiprocessor System-on-Chip Hardware Design and Tool Integration, chapter 2. Springer, Dec. 2010.
[19] V. Stankovic and N. Milenkovic. DRAM Controller with a Complete Predictor: Preliminary Results. In Proc. 7th International Conference on Telecommunications in Modern Satellite, Cable and Broadcasting Services, volume 2, pages 593–596, Sept. 2005.
[20] G. Thomas, B. Juurlink, and D. Tutsch. Traffic Prediction for NoCs using Fuzzy Logic. In Proc. 2nd International Workshop on New Frontiers in High-performance and Hardware-aware Computing (in conjunction with HPCA-17), pages 33–40, San Antonio, Texas, USA, February 2011. KIT Scientific Publishing.
[21] J. Trajkovic, A. V. Veidenbaum, and A. Kejariwal. Improving SDRAM
access energy efficiency for low-power embedded systems. ACM Trans.
Embed. Comput. Syst., 7:24:1–24:21, May 2008.
[22] O. Vargas. Achieve minimum power consumption in mobile memory
subsystems. Technical report, Infineon Technologies AG, 2006.
[23] Y. Xu, A. S. Agarwal, and B. T. Davis. Prediction in Dynamic SDRAM
Controller Policies. In Proc. 9th International Workshop on Embedded
Computer Systems: Architectures, Modeling, and Simulation, SAMOS
’09, pages 128–138, Berlin, Heidelberg, 2009. Springer-Verlag.