NVIDIA NCP-AII Exam - Topic 1 Question 1 Discussion

Actual exam question for NVIDIA's NCP-AII exam
Question #: 1
Topic #: 1

A system administrator noticed a failure on a DGX H100 server. After a reboot, only the BMC is available. What could be the reason for this behavior?

Suggested Answer: B

On an NVIDIA DGX system, the Baseboard Management Controller (BMC) is an independent processor that keeps running even if the host CPUs and operating system fail to load. If the server reboots and the administrator can still reach the BMC web interface or IPMI console, but the OS (Ubuntu/DGX OS) does not load, the most likely cause is a boot disk failure. The DGX H100 uses two NVMe drives in a RAID-1 mirror for the OS boot volume. If both drives in the mirror fail, or if the boot partition becomes corrupted, the system stops in the BIOS/UEFI firmware, unable to find a bootable device.

The other options fit the symptom less well. Failed power supplies (Option D) would typically make the BMC unreachable as well, and failed network links (Option A) would only block remote network traffic rather than the boot itself. A GPU failure (Option C) would not stop the OS from booting; the system would simply boot with a degraded GPU count. Checking storage health through the BMC's storage logs is therefore the correct diagnostic step.
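
The elimination reasoning above can be written out as a small decision function. This is only an illustrative sketch: the `Symptoms` class, its field names, and the returned labels are hypothetical and are not part of any NVIDIA tool or API. It assumes the only inputs are whether the BMC answers, whether the OS booted, and how many GPUs the OS reports.

```python
from dataclasses import dataclass


@dataclass
class Symptoms:
    """Hypothetical summary of what the administrator observes after the reboot."""
    bmc_reachable: bool      # BMC web interface or IPMI console responds
    os_booted: bool          # DGX OS came up on the host
    gpus_visible: int = 8    # GPUs reported by the OS (only meaningful if it booted)
    gpus_expected: int = 8   # a DGX H100 contains 8 GPUs


def likely_cause(s: Symptoms) -> str:
    """Return the most likely cause, following the explanation's reasoning."""
    if not s.bmc_reachable:
        # Even the BMC is dark: suspect power or the management link first.
        return "power or management network"
    if not s.os_booted:
        # BMC is alive but the host finds no bootable device.
        return "boot disk"
    if s.gpus_visible < s.gpus_expected:
        # The OS boots regardless, just with a degraded GPU count.
        return "gpu"
    return "healthy"


# The scenario in the question: only the BMC is available.
print(likely_cause(Symptoms(bmc_reachable=True, os_booted=False)))
```

For the scenario in the question (BMC reachable, OS not booted), the function returns "boot disk", which matches suggested answer B.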


Contribute your Thoughts:

Glenna
1 day ago
I’m leaning towards multiple GPUs failing, but I feel like that’s less common. Power supplies seem more likely to cause a complete failure.
Mariko
7 days ago
This reminds me of a practice question where a network card issue was mentioned. Could that really be the cause here?
Alaine
12 days ago
I think a failed boot disk could definitely cause this issue, but I also recall something about power supplies being critical too.
Glenn
17 days ago
I remember reading that if only the BMC is available, it could indicate a hardware issue, but I'm not sure which component it would be.
