Incidents | Informaten | Status
Incidents reported on the status page for Informaten
https://status.informaten.com/

KVM service currently disrupted – we are working on a solution
https://status.informaten.com/incident/772646
Mon, 24 Nov 2025 08:00:00 -0000

Root Cause Analysis – Cluster Outage on 23 November 2025

1. Overview
On Sunday, 23 November 2025, our Ceph cluster at the DE – Maincubes FRA1 site experienced a disruption that led to temporary unavailability of the productive RBD storage. The root cause was the combined failure of two NVMe OSDs from the same manufacturing batch during an active rebalance, which left several Placement Groups (PGs) irreparably damaged. As a consequence, a full restore from backup was required.

2. Timeline
• 23 November 2025, approx. 04:00
Our monitoring system reported the failure of a single OSD in the cluster. Because of the existing redundancy, this was initially classified as non-critical: the cluster is designed to tolerate the loss of an individual OSD without service impact.
• 23 November 2025, approx. 09:00
A technician arrived at the data center and replaced the failed OSD. The cluster then automatically started a rebalance.
• 23 November 2025, approx. 13:50
Monitoring triggered another alert: under the additional load of the ongoing rebalance, a second OSD failed. Post-incident analysis revealed that both failed OSDs originated from the same production batch.
• Later on 23 November 2025
As a result of the second OSD failure, 18 Placement Groups were permanently lost. These PGs contained metadata critical to the RBD storage, which led to significant degradation of the cluster and, ultimately, to the unavailability of the RBD storage.
• 23 November 2025, late afternoon/evening
Extensive efforts were made to recover the affected OSDs and PGs. After thorough internal analysis and consultation with external experts, the affected OSDs had to be marked as "lost"; direct recovery of the corrupted PGs from the cluster was no longer possible.
• 23 November 2025, approx. 21:00 – 24 November 2025, approx. 09:00
To restore the environment, we reverted to our incremental backups, using the backup taken on 23 November 2025 at 03:00. The restore of the entire cluster (nearly 500 VMs) ran overnight and completed on 24 November 2025 at approximately 09:00. At that point, all customer systems were back online.

3. Technical Root Cause
The incident can essentially be attributed to the following factors:
1. Failure of two OSDs from the same batch: Both failed NVMe OSDs came from the same manufacturing batch, suggesting a batch-specific quality or reliability issue.
2. Increased load due to rebalance: The second OSD failed during an active rebalance, which imposed additional I/O load on the drives involved. This increased load likely contributed significantly to the second drive's failure.
3. Loss of critical Placement Groups: Because two OSDs failed across the relevant failure domains, 18 PGs were permanently lost, including PGs holding essential metadata for the RBD pool. This left the affected pool in an inconsistent and unusable state. (A short inspection sketch follows this section.)
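For readers less familiar with Ceph, the following is a minimal sketch (not taken from our runbook) of how PG damage of this kind typically surfaces: it summarizes PG states from the JSON output of `ceph status` and flags states that no longer serve I/O. The exact JSON layout varies between Ceph releases, and the destructive `ceph osd lost` step mentioned in the timeline appears only as a comment.

```python
#!/usr/bin/env python3
# Sketch: summarize PG states from `ceph status`; illustrative only.
# Assumes the ceph CLI is installed and the caller has monitor access.
import json
import subprocess

def pg_states() -> dict:
    """Return a mapping of PG state name -> count from `ceph status`."""
    out = subprocess.run(
        ["ceph", "status", "--format", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    status = json.loads(out)
    # The JSON layout can differ between Ceph releases.
    return {
        entry["state_name"]: entry["count"]
        for entry in status.get("pgmap", {}).get("pgs_by_state", [])
    }

if __name__ == "__main__":
    states = pg_states()
    bad = {s: n for s, n in states.items()
           if "incomplete" in s or "down" in s or "unknown" in s}
    print("PG states:", states)
    if bad:
        print("ATTENTION - PGs not serving I/O:", bad)
    # Last-resort step referenced in the timeline (do NOT run casually):
    #   ceph osd lost <osd-id> --yes-i-really-mean-it
```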
4. Impact
• Temporary unavailability of the RBD storage in the cluster at the DE – Maincubes FRA1 site.
• Service impact on nearly 500 virtual machines hosted on this cluster.
• A full restore from backup was required, based on the backup state from 03:00 on 23 November 2025.

5. Preventive and Corrective Measures
To significantly reduce the risk of similar incidents in the future, we have implemented the following measures (illustrative configuration sketches follow this report):
1. Increased replication level
o The replica size of the affected cluster has been increased to 3, providing additional fault tolerance in the event of simultaneous OSD failures.
2. Expansion and distribution of storage capacity
o Six additional NVMe OSDs were added to the cluster to improve data distribution and reduce the impact of load peaks (e.g., during rebalancing).
3. Enhanced hardware quality control for OSDs
o Proactive removal of another OSD from the same batch as the failed drives to prevent potential follow-up issues.
o Introduction of a standardized validation process for new batches, including S.M.A.R.T. checks, burn-in tests, and benchmark/stress tests before drives are put into production.
4. Internal processes and SOPs
o Creation of an internal Standard Operating Procedure (SOP) covering regular:
▪ S.M.A.R.T. analysis
▪ Benchmark and stress testing of all OSDs
o Clear definition of procedures for handling OSD failures during active rebalance operations.
5. Monitoring improvements
o Tightening of proactive monitoring policies, in particular:
▪ Closer tracking of latency, I/O errors, and reallocations of individual OSDs
▪ Additional alert thresholds for rebalance load and cluster degradation

6. Customer Communication and Compensation
We sincerely regret the inconvenience caused by this incident. All affected customers will be informed separately about the compensation applicable to their individual case. We greatly appreciate our customers' patience and understanding during the disruption and the subsequent restoration process.
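To illustrate measure 1 (increased replication level), here is a minimal sketch of how the replica count of an RBD pool can be raised and verified with the standard Ceph CLI. The pool name `rbd-prod` and the `min_size 2` setting are placeholders for illustration, not the actual configuration of the affected cluster.

```python
#!/usr/bin/env python3
# Sketch: raise and verify a pool's replica count; pool name is a placeholder.
import json
import subprocess

POOL = "rbd-prod"  # hypothetical pool name, not the production pool

def ceph(*args: str) -> str:
    """Run a ceph CLI command and return its stdout."""
    return subprocess.run(["ceph", *args], check=True,
                          capture_output=True, text=True).stdout

# Keep three copies of every object, and keep serving I/O while at least
# two copies are available (illustrative values).
ceph("osd", "pool", "set", POOL, "size", "3")
ceph("osd", "pool", "set", POOL, "min_size", "2")

# Verify the change.
size = json.loads(ceph("osd", "pool", "get", POOL, "size", "--format", "json"))
print(f"{POOL}: size={size['size']}")
```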
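To illustrate the validation process introduced under measure 3, the following sketch reads smartctl's JSON output for an NVMe device and applies example acceptance criteria. The device path and thresholds are placeholders, and the JSON field names depend on the smartmontools version; this is not the exact check used in our SOP.

```python
#!/usr/bin/env python3
# Sketch: pre-production NVMe health check via smartctl JSON output.
# Requires smartmontools >= 7.0 for the -j (JSON) flag; run with root rights.
import json
import subprocess
import sys

DEVICE = "/dev/nvme0n1"  # example device path

proc = subprocess.run(["smartctl", "-a", "-j", DEVICE],
                      capture_output=True, text=True)
# smartctl uses a non-zero exit bitmask even for readable-but-unhealthy
# devices, so parse the JSON instead of relying on the exit code.
report = json.loads(proc.stdout)

# Field names follow smartmontools' JSON schema and may differ by version.
log = report.get("nvme_smart_health_information_log", {})
problems = []
if log.get("critical_warning", 0) != 0:
    problems.append(f"critical_warning={log['critical_warning']}")
if log.get("media_errors", 0) > 0:
    problems.append(f"media_errors={log['media_errors']}")
if log.get("percentage_used", 0) >= 80:  # example wear threshold
    problems.append(f"percentage_used={log['percentage_used']}%")

if problems:
    print(f"{DEVICE}: FAILED validation -> {', '.join(problems)}")
    sys.exit(1)
print(f"{DEVICE}: passed basic S.M.A.R.T. validation")
```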
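Finally, a sketch of the kind of per-OSD latency tracking described under measure 5, based on the output of `ceph osd perf`. The alert threshold is an example only, and the JSON layout differs between Ceph releases, so the field handling below is an assumption rather than our actual monitoring rule.

```python
#!/usr/bin/env python3
# Sketch: flag OSDs whose commit/apply latency exceeds an example threshold.
import json
import subprocess

LATENCY_ALERT_MS = 50  # example threshold, not a recommendation

out = subprocess.run(["ceph", "osd", "perf", "--format", "json"],
                     check=True, capture_output=True, text=True).stdout
perf = json.loads(out)

# Older releases expose "osd_perf_infos" at the top level; newer ones nest
# it under "osdstats". Handle both, since the layout varies by version.
infos = (perf.get("osd_perf_infos")
         or perf.get("osdstats", {}).get("osd_perf_infos", []))

for osd in infos:
    stats = osd.get("perf_stats", {})
    commit = stats.get("commit_latency_ms", 0)
    apply_ = stats.get("apply_latency_ms", 0)
    if max(commit, apply_) > LATENCY_ALERT_MS:
        print(f"osd.{osd.get('id')}: commit={commit}ms apply={apply_}ms -> alert")
```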
KVM service currently disrupted – we are working on a solution
https://status.informaten.com/incident/772646
Mon, 24 Nov 2025 04:51:00 -0000
Many of our services have already been successfully restored. We are continuing to work hard to ensure that all KVM products are running fully and stably again. Please note: some functions will only be available to a limited extent until services are fully restored.
Webspaces recovered
https://status.informaten.com/
Sun, 23 Nov 2025 22:41:00 +0000

KVM service currently disrupted – we are working on a solution
https://status.informaten.com/incident/772646
Sun, 23 Nov 2025 22:24:00 -0000
We are currently still experiencing disruptions to some of our root server and web hosting services. The cause is a problem on our storage platform, which we have now clearly identified. Our engineering team is working hard to gradually restore all systems to normal operation and keep downtime to a minimum.
Component status events (https://status.informaten.com/):
PVE-2 recovered (Sun, 23 Nov 2025 22:08:01 +0000)
PVE-2 went down (Sun, 23 Nov 2025 22:06:35 +0000)
Landingpage & CP recovered (Sun, 23 Nov 2025 21:59:18 +0000)
INF18 recovered (Sun, 23 Nov 2025 21:58:02 +0000)
INF18 went down (Sun, 23 Nov 2025 21:56:53 +0000)
Landingpage & CP went down (Sun, 23 Nov 2025 21:51:12 +0000)
PVE-3 recovered (Sun, 23 Nov 2025 21:44:03 +0000)
PVE-3 went down (Sun, 23 Nov 2025 21:43:56 +0000)
INF12 recovered (Sun, 23 Nov 2025 21:40:01 +0000)
INF12 went down (Sun, 23 Nov 2025 21:33:42 +0000)
Landingpage & CP recovered (Sun, 23 Nov 2025 17:59:18 +0000)
Webspaces went down (Sun, 23 Nov 2025 14:12:59 +0000)
Landingpage & CP went down (Sun, 23 Nov 2025 13:24:12 +0000)

KVM service currently disrupted – we are working on a solution
https://status.informaten.com/incident/772646
Sun, 23 Nov 2025 13:00:00 -0000
We have a disruption in our server service and are working at full speed to restore all affected services.
Scheduled maintenance of the CP
https://status.informaten.com/incident/719634
Sun, 07 Sep 2025 16:00:43 -0000
Maintenance completed

Scheduled maintenance of the CP
https://status.informaten.com/incident/719634
Sat, 06 Sep 2025 22:00:43 -0000
We are currently performing maintenance work on the customer panel of our website. During this time, access to the customer area may be restricted or unavailable. We apologize for any inconvenience and are working to restore service as quickly as possible. Our services such as root servers, web space, and domains, as well as all managed services and colocation, are not affected.