`damo report heatmap` modernization for snapshots, page level monitoring and intervals auto-tuning
Table of Contents
TL; DR: damo report heatmap
has recently advanced to support modern DAMON
features including age tracking, snapshots, page level monitoring, and
monitoring intervals auto-tuning. It will help users intuitively understand
the monitored access patterns at a glance.
DAMON in The Past: Full Recording based Monitoring
At the beginning, DAMON was providing only the access frequency of each memory
region in real time. Hence heatmap visualization, which shows the access
frequency of each memory area in the timeline was the first and one of the best
ways to see the access pattern. DAMON user-space tool (damo
) supported such
collections and visualizations of the data via damo record
and damo report heatmap
, like below example.
$ sudo damo record "../masim/masim ../masim/configs/stairs.cfg"
[...]
$ sudo damo report heatmap -i damon.data
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000004777777600000
00000000000000000000000000000000000000000000000000000000000000000005888888700000
00000000000000000000000000000000000000000000000000000000000000000003555555400000
00000000000000000000000000000000000000000000000000000000000048888883000000000000
00000000000000000000000000000000000000000000000000000000000068888883000000000000
00000000000000000000000000000000000000000000000000000000000068888883000000000000
00000000000000000000000000000000000000000000000000000111111100000000000000000000
00000000000000000000000000000000000000000000000000004888888700000000000000000000
00000000000000000000000000000000000000000000000000005888888700000000000000000000
00000000000000000000000000000000000000000000000000001222222100000000000000000000
00000000000000000000000000000000000000000000088888883000000000000000000000000000
00000000000000000000000000000000000000000000088888883000000000000000000000000000
00000000000000000000000000000000000000000000056666662000000000000000000000000000
00000000000000000000000000000000000001777776600000000000000000000000000000000000
00000000000000000000000000000000000004888888700000000000000000000000000000000000
00000000000000000000000000000000000004888888800000000000000000000000000000000000
00000000000000000000000000000023333331000000000000000000000000000000000000000000
00000000000000000000000000000088888883000000000000000000000000000000000000000000
00000000000000000000000000000588888883000000000000000000000000000000000000000000
00000000000000000000011111111332222221000000000000000000000000000000000000000000
00000000000000000000038888888870000000000000000000000000000000000000000000000000
00000000000000000000038888888870000000000000000000000000000000000000000000000000
00000000000000000000037666666650000000000000000000000000000000000000000000000000
00000000000002888888888000000000000000000000000000000000000000000000000000000000
00000000000003888888888000000000000000000000000000000000000000000000000000000000
00000000000003888888888000000000000000000000000000000000000000000000000000000000
00000004444555610000000000000000000000000000000000000000000000000000000000000000
00000005888888820000000000000000000000000000000000000000000000000000000000000000
00000005888888820000000000000000000000000000000000000000000000000000000000000000
44444446722222200000000000000000000000000000000000000000000000000000000000000000
88888888800000000000000000000000000000000000000000000000000000000000000000000000
88888888800000000000000000000000000000000000000000000000000000000000000000000000
66666666600000000000000000000000000000000000000000000000000000000000000000000000
# access_frequency: 0123456789
# x-axis: space [127.934 TiB, 127.934 TiB) (101.930 MiB)
# y-axis: time [2 h 33 m 56.451 s, 2 h 34 m 56.333 s) (59.882 s)
# resolution: 80x40 (1.274 MiB and 1.497 s for each character)
Each character on the middle of the output shows when (row) what address range
(column) of the memory was how frequently (number) accessed. masim
with
stairs.cfg
allocated 10 regions of 10 MiB size, and accesses those one by
one. The above heatmap is hence showing the pattern.
For the visualization, however, users have to record the entire DAMON-observed access frequency of each region for every moment. For long time recording, storage usage of the recorded data was not negligible. Heatmap-visualization of the huge record data was also time consuming. Hence this was useful for lab environments, but arguably not optimized for production environments.
$ ls -alh damon.data
-rw------- 1 root root 95K Jun 8 12:11 damon.data
Evolvement of Snapshots-based Monitoring
DAMON has evolved to provide not only access frequency of each region, but also
how long the current access frequency of the region was kept, namely ‘age’. It
was mainly developed for access-aware system operations, namely DAMON-based
Operation Schemes (DAMOS). But, we found the information can also be useful
for lightweight but practical monitoring. We therefore made yet another DAMON
feature for getting only the current snapshot of the DAMON monitoring results.
For easy capturing and visualization of the snapshot data, we implemented yet
another user-space tool feature, namely damo report access
. damo record
,
which is the user-space tool feature for capturing entire DAMON monitoring
results for the heatmap-like visualizations, has also been updated to support
only snapshot capturing (--snapshot
option). Nowadays, the snapshot based
visualization is the main feature of DAMON user-space tool for production
environments.
For example, users can start DAMON to monitor the workload asynchronously,
using damo start
.
$ sudo damo start "../masim/masim ../masim/configs/stairs.cfg"
Then, users can collect and show the snapshot in various visualization styles,
using damo report access
, like below.
$ sudo damo report access
heatmap: 00000000000000000000000000000002777777866[...]3333333333333333333333337899965555555447[...]8
# min/max temperatures: -2,210,000,000, 220,010,000, column size: 2.623 MiB
0 addr 85.672 TiB size 20.957 MiB access 0 % age 21.700 s
1 addr 85.672 TiB size 20.664 MiB access 0 % age 22.100 s
2 addr 85.672 TiB size 20.902 MiB access 0 % age 21.800 s
3 addr 85.672 TiB size 20.617 MiB access 0 % age 21.400 s
4 addr 85.672 TiB size 17.754 MiB access 0 % age 1.600 s
5 addr 85.672 TiB size 2.004 MiB access 100 % age 1.100 s
6 addr 85.672 TiB size 4.500 MiB access 0 % age 5.100 s
7 addr 127.047 TiB size 41.785 MiB access 0 % age 12.500 s
8 addr 127.047 TiB size 20.957 MiB access 0 % age 11.900 s
9 addr 127.047 TiB size 4.047 MiB access 0 % age 1.800 s
10 addr 127.047 TiB size 9.516 MiB access 100 % age 2.200 s
11 addr 127.047 TiB size 20.855 MiB access 0 % age 6.500 s
12 addr 127.047 TiB size 5.145 MiB access 0 % age 10 s
13 addr 127.994 TiB size 120.000 KiB access 0 % age 10.900 s
14 addr 127.994 TiB size 8.000 KiB access 35 % age 0 ns
15 addr 127.994 TiB size 4.000 KiB access 0 % age 1.600 s
memory bw estimate: 2.250 GiB per second
total size: 209.832 MiB
monitoring intervals: sample 5 ms, aggr 100 ms
$ sudo damo report access --style hot
heatmap: 00000000000000000000000000000002666666888[...]3222222222222222888998777777764444444448[...]8
# min/max temperatures: -3,020,000,000, 100,010,000, column size: 2.623 MiB
|99999999999999999999999999999999| 9.266 MiB access 100 % 1 s
|9999999999999999999999999999| 368.000 KiB access 100 % 200 ms
|222222222222222222222222222| 12.000 KiB access 30 % 100 ms
|6| 1.273 MiB access 75 % 0 ns
|5| 1.035 MiB access 65 % 0 ns
|2| 1.730 MiB access 30 % 0 ns
|000000000000000000000000000000000| 4.445 MiB access 0 % 1.900 s
|0000000000000000000000000000000000| 4.508 MiB access 0 % 2.200 s
|00000000000000000000000000000000000| 120.000 KiB access 0 % 4.600 s
|000000000000000000000000000000000000| 20.168 MiB access 0 % 5.900 s
|000000000000000000000000000000000000| 17.082 MiB access 0 % 7.800 s
|00000000000000000000000000000000000000| 20.512 MiB access 0 % 13.100 s
|00000000000000000000000000000000000000| 4.398 MiB access 0 % 15.700 s
|000000000000000000000000000000000000000| 41.785 MiB access 0 % 20.700 s
|000000000000000000000000000000000000000| 20.414 MiB access 0 % 29.600 s
|000000000000000000000000000000000000000| 20.957 MiB access 0 % 29.900 s
|000000000000000000000000000000000000000| 20.898 MiB access 0 % 29.900 s
|0000000000000000000000000000000000000000| 20.871 MiB access 0 % 30.200 s
memory bw estimate: 2.300 GiB per second
total size: 209.832 MiB
monitoring intervals: sample 5 ms, aggr 100 ms
$ sudo damo report access --style recency-percentiles
# total recency percentiles
<percentile> <idle time>
0 0 ns | |
1 0 ns | |
25 12.500 s |***** |
50 24.900 s |********** |
75 48.800 s |******************* |
99 49.100 s |********************|
100 49.100 s |********************|
memory bw estimate: 2.317 GiB per second
total size: 209.832 MiB
monitoring intervals: sample 5 ms, aggr 100 ms
Users can also periodically collect snapshots and save those as a file, using
damo record
. For example, the below command collects the snapshot three
times with a five seconds interval between each snapshot, and saves the output
as a file named damon.data
. The size of the file is much smaller than the
entire results record. damo report access
can be used for further
visualizing the saved data.
$ sudo damo record --snapshot 5s 3
[...]
$ ls -alh ./damon.data
-rw------- 1 root root 1.3K Jun 8 12:18 ./damon.data
$ sudo damo report access --input_file ./damon.data --style recency-percentiles
kdamond 0 / context 0 / scheme 0 / target id None / recorded for 100 ms from 485947 h 18 m 12.956 s
# total recency percentiles
<percentile> <idle time>
0 0 ns | |
1 0 ns | |
25 16.800 s |*** |
50 22.100 s |**** |
75 1 m 30.300 s |******************* |
99 1 m 30.600 s |********************|
100 1 m 30.600 s |********************|
memory bw estimate: 2.212 GiB per second
total size: 209.934 MiB
monitoring intervals: sample 5 ms, aggr 100 ms
kdamond 0 / context 0 / scheme 0 / target id None / recorded for 100 ms from 485947 h 18 m 18.051 s
# total recency percentiles
<percentile> <idle time>
0 0 ns | |
1 0 ns | |
25 11.600 s |** |
50 24.700 s |***** |
75 1 m 35.300 s |******************* |
99 1 m 35.600 s |********************|
100 1 m 35.600 s |********************|
memory bw estimate: 2.176 GiB per second
total size: 209.934 MiB
monitoring intervals: sample 5 ms, aggr 100 ms
kdamond 0 / context 0 / scheme 0 / target id None / recorded for 100 ms from 485947 h 18 m 23.139 s
# total recency percentiles
<percentile> <idle time>
0 0 ns | |
1 0 ns | |
25 11.800 s |** |
50 23 s |**** |
75 1 m 40.300 s |******************* |
99 1 m 40.600 s |********************|
100 1 m 40.600 s |********************|
memory bw estimate: 2.175 GiB per second
total size: 209.934 MiB
monitoring intervals: sample 5 ms, aggr 100 ms
DAMON has gained more features including page level monitoring and intervals
auto-tuning. Snapshot collecting and visualization features (damo report access
and damo record
) also advanced together to support the modern
features. With intervals auto-tuning, we believe DAMON can be enabled on
production environments always, and snapshot-based data collection and
visualization can be useful for system observability.
Meanwhile, heatmap visualization was not actively updated following the new
DAMON features. It was just not working at all for snapshot data. Though
snapshot based access visualization was proven to be useful, we recently
learned the old style heatmap visualization is what allows people to get the
intuitive glance of the access pattern in an easy way. We therefore started
working on updating the damo report heatmap
to support the modern features,
starting from v2.8.3.
Modernized damo report heatmap
By v2.8.4, we expect damo report heatmap
will show reliable and useful
heatmap visualization of snapshot data. It shows the access frequency of each
memory region like it was doing before. But if the input is the DAMON
monitoring results of snapshot[s] rather than entire results of every moment,
it fills the timeline based on the ‘age’ information of the region on the
snapshots. For example, the three snapshots data, which is collected above,
can be visualized as a heatmap like below.
$ sudo damo report heatmap -i damon.data
[...]
00000000000000000000000000000000000000000000000000 0000000000000000000
0000000000000000000000000000000000000000000000000000000000 0000000000000000000
00000000000000000000000000000000000000001100000000000000058881000000000000000000
00000000000000000000000000000000000000000000000000000000000000 0000000000000000
00000000000000000000000000000000000000012000000000000000000007887000000000000000
00000000000000000000000000000000000000000000000000000000000000000 000000000000
00000000000000000000000000000000000000000020000000000000000000000888500000000000
# access_frequency: 0123456789
# x-axis: space [85.672 TiB, 85.672 TiB) (202.891 MiB)
# y-axis: time [485947 h 16 m 42.456 s, 485947 h 18 m 23.239 s) (1 m 40.783 s)
# resolution: 80x40 (2.536 MiB and 2.520 s for each character)
The format is the same as the above record-based heatmap. But, now there are blank characters. Those are memory regions that we cannot find the access information from the snapshot. Each snapshot shows the access frquency of each region, and how long the access frequency was kept. Let’s say there is an oldest snapshot saying the first 100 MiB region was not accessed at all for 10 seconds, and the next 10 MiB region was accessed with a high access frequency level for 5 seconds. Then, we cannot know what was the access frequency of the 10 MiB region before the 5 seconds. The blank columns are showing that.
In more detail, The first three lines of the output are made from the first
(oldest) snapshot. The snapshot found about 10 MiB of the region (58881
)
were accessed frequently, for about 2.5 seconds. The snapshot says the access
frequency was kept only for about 2.5 seconds, hence it’s access frequency of
the past is unknown, so filled with blank columns. This matches with our
understanding of masim
program’s access pattern.
Next two lines (fourth and fifth) are probably made with the second snapshot.
The third snapshot should made the last two lines. The expected masim
program’s access pattern is continuing to be found there.
The latest version of damo report heatmap
also supports page level monitoring
and intervals auto-tuning. If the snapshot is captured with page level DAMOS
filters, users can plot the heatmap for only DAMOS filters-passed pages, by
passing --df_passed
option to damo report heatmap
. The command also
understands monitoring intervals auto-tuning feature, and hence it handles the
dynamically changed intervals in an appropriate way, when handling the ‘age’
information.
The updated version of the code is available at the next branch, and will be released as v2.8.4 in near future.
Wrapup
The classic heatmap visualization was not actively updated since DAMON changed its strategy for monitoring from full access observation records to partial time information-captured snapshots. Now the classic heatmap visualization is modernized to support the snapshots use case, and expected to be useful at understanding overall access patterns at a glance.