NVIDIA NCP-AII Exam - Topic 1 Question 1 Discussion

Actual exam question for NVIDIA's NCP-AII exam
Question #: 1
Topic #: 1

A system administrator noticed a failure on a DGX H100 server. After a reboot, only the BMC is available. What could be the reason for this behavior?

Suggested Answer: B

On an NVIDIA DGX system, the Baseboard Management Controller (BMC) is an independent processor that keeps running even if the main CPUs and the operating system fail to load. If the server reboots and the administrator can still reach the BMC web interface or IPMI console, but the OS (DGX OS, which is based on Ubuntu) does not load, the most likely cause is a boot disk failure.

The DGX H100 uses NVMe drives in a RAID-1 configuration for the OS boot volume. If both drives in the mirror fail, or if the boot partition becomes corrupted, the system cannot find a bootable device and stops at the BIOS or UEFI prompt.

The other options do not match the symptom. Failed power supplies (Option D) would typically make the BMC unreachable as well. A failed network link (Option A) would only block remote network traffic; it would not stop the OS from booting from local disks. A GPU failure (Option C) would not stop the OS from booting either; the system would simply boot with a reduced GPU count. The sensible first diagnostic step is therefore to check storage health through the BMC's storage information and logs.
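The RAID-1 reasoning above can be illustrated with a minimal Python sketch. It assumes the drive records have already been fetched from the BMC in Redfish style, where each drive carries a `Status` object with `State` and `Health` fields. The sample records, the drive names, and the helper names `drive_is_healthy` and `classify_boot_mirror` are hypothetical, and how a particular DGX BMC exposes its boot drives is not shown here.

```python
from typing import Iterable, Mapping


def drive_is_healthy(drive: Mapping) -> bool:
    """Treat a drive as healthy only if it is present/enabled and reports OK."""
    status = drive.get("Status") or {}
    return status.get("State") == "Enabled" and status.get("Health") == "OK"


def classify_boot_mirror(drives: Iterable[Mapping]) -> str:
    """Classify a RAID-1 boot mirror from its member drive records.

    'healthy'  : every member is healthy
    'degraded' : at least one member is healthy, so the OS can still boot
    'failed'   : no healthy member, so no bootable device is available
    """
    members = list(drives)
    healthy = sum(1 for d in members if drive_is_healthy(d))
    if not members or healthy == 0:
        return "failed"
    if healthy < len(members):
        return "degraded"
    return "healthy"


# Hypothetical sample data: one mirror member is missing.
sample = [
    {"Name": "boot-drive-0", "Status": {"State": "Enabled", "Health": "OK"}},
    {"Name": "boot-drive-1", "Status": {"State": "Absent", "Health": None}},
]
print(classify_boot_mirror(sample))
```

A single failed member leaves the mirror degraded but bootable; only the "failed" case, where no member is healthy, matches the symptom in the question, with the BMC reachable and no OS.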


Wilda
11 days ago
Could be the network card too, but not sure.
upvoted 0 times
...
Bambi
16 days ago
I think it's definitely a power supply problem.
upvoted 0 times
...
Rana
21 days ago
Sounds like a boot disk issue to me.
upvoted 0 times
...
Margo
26 days ago
How can you tell if the boot disk has failed?
upvoted 0 times
...
Sharen
1 month ago
Wait, multiple GPUs failing? That seems unlikely.
upvoted 0 times
...
Ruthann
1 month ago
Definitely not just the network card.
upvoted 0 times
...
Ollie
1 month ago
I think it could be the power supplies.
upvoted 0 times
...
Micaela
2 months ago
Sounds like a boot disk issue to me.
upvoted 0 times
...
Glenna
2 months ago
I’m leaning towards multiple GPUs failing, but I feel like that’s less common. Power supplies seem more likely to cause a complete failure.
upvoted 0 times
...
Mariko
2 months ago
This reminds me of a practice question where a network card issue was mentioned. Could that really be the cause here?
upvoted 0 times
...
Alaine
2 months ago
I think a failed boot disk could definitely cause this issue, but I also recall something about power supplies being critical too.
upvoted 0 times
...
Glenn
2 months ago
I remember reading that if only the BMC is available, it could indicate a hardware issue, but I'm not sure which component it would be.
upvoted 0 times
...