Resolved
Nov 24 at 09:00am CET
Root Cause Analysis – Cluster Outage on 23 November 2025
- Overview
On Sunday, 23 November 2025, our Ceph cluster at the DE – Maincubes FRA1 site suffered a disruption that made the production RBD storage temporarily unavailable. The root cause was the combined failure of two NVMe OSDs from the same manufacturing batch during an active rebalance, which left several Placement Groups (PGs) irreparably damaged. As a consequence, a full restore from backup was required.
...