Intel® VROC RAID Logging and Monitoring in Linux

5.1 Intel® VROC RAID Logging

Messages from the MDRAID subsystem in the Linux* kernel are logged for status, warnings, and errors. In most Linux distributions, these entries are stored in:

/var/log/messages

This system log aggregates kernel messages together with RAID-related outputs. Administrators can use it to monitor Intel® VROC RAID activity and identify issues that require attention.

Retrieving Kernel Logs with dmesg

The dmesg command prints the kernel ring buffer, including RAID-related events such as initialization, synchronization, device failures, and recovery operations. (Run dmesg -w to follow new messages as they arrive.)

Example output:

```
# dmesg
[Thu Aug 4 09:19:52 2022] md/raid1:md126: not clean -- starting background reconstruction
[Thu Aug 4 09:19:52 2022] md/raid1:md126: active with 2 out of 2 mirrors
[Thu Aug 4 09:19:52 2022] md126: detected capacity change from 0 to 107374182400
[Thu Aug 4 09:19:52 2022] md: resync of RAID array md126
[Thu Aug 4 09:21:36 2022] md: md126: resync done.
[Thu Aug 4 09:21:43 2022] md126: detected capacity change from 107374182400 to 0
[Thu Aug 4 09:21:43 2022] md: md126 stopped.
[Thu Aug 4 09:21:43 2022] md: md126 stopped.
[Thu Aug 4 09:21:43 2022] md: md127 stopped.
[Thu Aug 4 09:23:14 2022] md126: detected capacity change from 0 to 16003123642368
[Thu Aug 4 09:23:38 2022] md126: detected capacity change from 16003123642368 to 0
[Thu Aug 4 09:23:38 2022] md: md126 stopped.
[Thu Aug 4 09:23:38 2022] md: md127 stopped.
[Fri Aug 5 01:52:54 2022] md/raid:md126: not clean -- starting background reconstruction
[Fri Aug 5 01:52:54 2022] md/raid:md126: device nvme3n1 operational as raid disk 2
[Fri Aug 5 01:52:54 2022] md/raid:md126: device nvme0n1 operational as raid disk 1
[Fri Aug 5 01:52:54 2022] md/raid:md126: device nvme1n1 operational as raid disk 0
[Fri Aug 5 01:52:54 2022] md/raid:md126: raid level 5 active with 3 out of 3 devices, algorithm 0
[Fri Aug 5 01:52:54 2022] md126: detected capacity change from 0 to 214748364800
[Fri Aug 5 01:52:54 2022] md: resync of RAID array md126
[Fri Aug 5 01:54:36 2022] md: md126: resync done.
[Fri Aug 5 01:54:54 2022] md/raid:md126: Disk failure on nvme0n1, disabling device.
md/raid:md126: Operation continuing on 2 devices.
[Fri Aug 5 01:54:54 2022] md: recovery of RAID array md126
[Fri Aug 5 01:56:41 2022] md: md126: recovery done.
[Fri Aug 5 02:00:20 2022] md/raid:md126: Disk failure on nvme3n1, disabling device.
md/raid:md126: Operation continuing on 2 devices.
[Fri Aug 5 02:00:50 2022] md: recovery of RAID array md126
[Fri Aug 5 02:02:46 2022] md: md126: recovery done.
```

These logs provide detailed insights into the lifecycle of a RAID volume, helping administrators quickly identify events such as rebuilds, failures, or capacity changes.

Retrieving System Journal Logs with journalctl

In addition to dmesg, the systemd journal captures the kernel MDRAID messages along with output from the mdmonitor service, and can be queried with journalctl.
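For example, on a systemd-based distribution, RAID-related journal entries can be retrieved as follows (md126 is the array name from the earlier dmesg output):

```shell
# Kernel (dmesg) messages from the journal, filtered for the RAID device
journalctl -k | grep md126

# Follow mdmonitor service messages as they arrive
journalctl -u mdmonitor -f
```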

Reviewing Syslog Messages (/var/log/messages)

On distributions that run a traditional syslog daemon, the same MDRAID messages are also recorded in /var/log/messages.
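RAID-related entries can be filtered out of the system log with grep, for example (md126 follows the earlier examples):

```shell
grep 'md126' /var/log/messages
```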

5.2 RAID Monitoring

Once an Intel® VROC RAID volume is active, the mdmonitor daemon starts automatically. It monitors RAID events such as degraded arrays, drive failures, and rebuild progress. If configured in /etc/mdadm.conf, it can also trigger predefined actions or notifications.

Using the mdadm Monitoring Daemon

You can start the mdadm monitoring service manually with the following command:
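```shell
# Monitor all arrays listed in /etc/mdadm.conf, detach into the
# background, and report events to syslog
mdadm --monitor --scan --daemonise --syslog
```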

This runs mdadm as a background daemon to monitor all RAID devices and report events to syslog. Administrators can then filter syslog entries for RAID-specific events.

Before starting monitoring, you must define an email address in the mdadm.conf file to receive notifications. For example:
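```
# Send mdmonitor alert mail to this address (placeholder address)
MAILADDR raid-admin@example.com
```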

Using systemctl for RAID Monitoring

The mdmonitor daemon is integrated with systemd, allowing you to manage it using systemctl commands:

  • Check service status:

  • Start the service manually:

  • Restart the service:

  • Enable service to start at boot:

  • Stop the service:
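The corresponding commands for the mdmonitor unit are:

```shell
# Check service status
systemctl status mdmonitor

# Start the service manually
systemctl start mdmonitor

# Restart the service
systemctl restart mdmonitor

# Enable the service to start at boot
systemctl enable mdmonitor

# Stop the service
systemctl stop mdmonitor
```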

5.3 RAID Alerts

Intel® VROC reports RAID alerts through the monitoring service in Linux*. Administrators can integrate custom programs with the monitoring service to receive and process these alerts, enabling proactive response to RAID events.

Table 5-1. Intel® VROC RAID Alerts in Linux

| VROC Alert/Event | Severity | Description |
| --- | --- | --- |
| Fail | Critical | A member drive in the RAID has failed. |
| FailSpare | Critical | The spare drive used for rebuild has failed. |
| DeviceDisappeared | Critical | A RAID member device or volume has disappeared (removed or inaccessible). |
| DegradedArray | Critical | The RAID array has entered a degraded state. |
| RebuildStarted | Warning | A degraded RAID has started the rebuild (recovery) process. |
| RebuildNN | Warning | Rebuild progress notification; NN is a two-digit number (e.g., 20, 40, …) indicating that the rebuild has passed that percentage of the total. |
| RebuildFinished | Warning | The rebuild of a degraded RAID is complete or was aborted. |
| SparesMissing | Warning | One or more spare drives defined in mdadm.conf are missing or have been removed. |
| SpareActive | Information | A spare drive has been successfully rebuilt and activated. |
| NewArray | Information | A new RAID array has been detected. |
| MoveSpare | Information | A spare drive has been reassigned from one array to another. |

5.4 Developing a Program to Handle RAID Alerts

The Intel® VROC RAID monitoring service allows administrators to register custom programs to receive and process RAID alerts. This is configured in the /etc/mdadm.conf file, enabling the monitoring service to call the user-defined program whenever an event occurs.

When invoked, the program receives two or three parameters:

  1. Event name – identifies the alert type.

  2. RAID volume device name – indicates the affected RAID device.

  3. Device identifier (optional) – provided when the event relates to a spare or member device.

Below is an example of a simple bash script that handles Intel VROC alerts by writing messages to a log file:
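A minimal sketch of such a handler is shown below; the severity grouping and log-line format are illustrative assumptions, not the original script:

```shell
#!/bin/bash
# Example mdadm/Intel VROC event handler (illustrative sketch).
# mdadm invokes it as: <event-name> <md-device> [<component-device>]

LOGFILE=/tmp/vroc_alerts.log

handle_event() {
    event="$1"
    md_device="$2"
    component="$3"
    stamp=$(date '+%Y-%m-%d %H:%M:%S')

    # Map the event name to a severity label (see Table 5-1)
    case "$event" in
        Fail|FailSpare|DeviceDisappeared|DegradedArray)
            severity="CRITICAL" ;;
        RebuildStarted|Rebuild??|RebuildFinished|SparesMissing)
            severity="WARNING" ;;
        *)
            severity="INFO" ;;
    esac

    echo "$stamp $severity: $event on $md_device $component" >> "$LOGFILE"
}

# mdadm supplies at least the event name and the RAID device
if [ "$#" -ge 2 ]; then
    handle_event "$@"
fi
```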

To enable this handler:

  1. Place the script in /usr/sbin, e.g., /usr/sbin/vroc_linux_events_handler.sh.

  2. Add the following line to /etc/mdadm.conf:
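```
PROGRAM /usr/sbin/vroc_linux_events_handler.sh
```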

Sample output written to /tmp/vroc_alerts.log by this program:
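Assuming the handler writes one timestamped line per event (an illustrative format; the timestamps below follow the drive-failure sequence in the earlier dmesg output), the log might contain:

```
2022-08-05 01:54:54 CRITICAL: Fail on /dev/md126 /dev/nvme0n1
2022-08-05 01:54:54 WARNING: RebuildStarted on /dev/md126
2022-08-05 01:56:41 WARNING: RebuildFinished on /dev/md126
```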
