NVIDIA NCP-AIO Exam - Topic 4 Question 13 Discussion

Actual exam question for NVIDIA's NCP-AIO exam
Question #: 13
Topic #: 4

You are managing a Slurm cluster with multiple GPU nodes, each equipped with different types of GPUs. Some jobs are being allocated GPUs that should be reserved for other purposes, such as display rendering.

How would you ensure that only the intended GPUs are allocated to jobs?

Suggested Answer: A

Explanation:

In Slurm GPU resource management, gres.conf defines the GPUs (generic resources, or GRES) available on each node, typically by device file, while slurm.conf enables the GPU resource type through GresTypes and declares each node's GPU count in its Gres= entry. To keep jobs off GPUs reserved for other purposes (e.g., display rendering), administrators must list only the GPUs intended for compute workloads in these configuration files.

Properly configuring gres.conf allows Slurm to recognize and expose only those GPUs meant for jobs.

The Gres= count for each node in slurm.conf must match the GPUs listed in gres.conf, so that a reserved GPU is never advertised to the scheduler.

Manual GPU assignment using nvidia-smi is not scalable or integrated with Slurm scheduling.

Reinstalling drivers or increasing GPU requests does not solve resource exclusion.

Thus, the correct approach is to verify and configure GPU listings accurately in gres.conf and slurm.conf to restrict job allocations to intended GPUs.
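
As a rough illustration of what "verifying" the two files could look like, here is a minimal Python sketch. Everything in it is an assumption made for the example: the node name node01, the a100 type, the device paths, and the choice of /dev/nvidia3 as the display GPU are hypothetical, and the parsing covers only simple one-node lines with bracketed device ranges and explicit Gres counts, not the full Slurm configuration syntax.

```python
import re

# Hypothetical gres.conf: /dev/nvidia3 (assumed display GPU) is deliberately not listed.
GRES_CONF = """\
NodeName=node01 Name=gpu Type=a100 File=/dev/nvidia[0-2]
"""

# Hypothetical slurm.conf excerpt: the Gres count should match gres.conf.
SLURM_CONF = """\
GresTypes=gpu
NodeName=node01 Gres=gpu:a100:3 State=UNKNOWN
"""

# Assumed site policy: device files that must never be handed to jobs.
RESERVED = {"node01": {"/dev/nvidia3"}}


def expand_file(spec):
    """Expand a File= value such as /dev/nvidia[0-2] or /dev/nvidia[0,2-3]."""
    m = re.fullmatch(r"(.*)\[([\d,\-]+)\]", spec)
    if not m:
        return [spec]
    prefix, body = m.groups()
    devices = []
    for part in body.split(","):
        lo, _, hi = part.partition("-")
        for i in range(int(lo), int(hi or lo) + 1):
            devices.append(f"{prefix}{i}")
    return devices


def parse_fields(line):
    """Split a Key=Value config line into a dict, ignoring trailing comments."""
    line = line.split("#", 1)[0].strip()
    return dict(tok.split("=", 1) for tok in line.split() if "=" in tok)


def gres_devices(text):
    """Map node name -> set of GPU device files listed in gres.conf."""
    result = {}
    for line in text.splitlines():
        fields = parse_fields(line)
        if fields.get("Name") != "gpu" or "File" not in fields:
            continue
        node = fields.get("NodeName", "")
        result.setdefault(node, set()).update(expand_file(fields["File"]))
    return result


def slurm_gpu_counts(text):
    """Map node name -> GPU count declared as Gres=gpu[:type]:N in slurm.conf."""
    counts = {}
    for line in text.splitlines():
        fields = parse_fields(line)
        if "NodeName" not in fields:
            continue
        total = 0
        for item in fields.get("Gres", "").split(","):
            parts = item.split(":")
            if parts[0] == "gpu" and parts[-1].isdigit():
                total += int(parts[-1])
        counts[fields["NodeName"]] = total
    return counts


def check(gres_text, slurm_text, reserved):
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    counts = slurm_gpu_counts(slurm_text)
    for node, devs in gres_devices(gres_text).items():
        leaked = devs & reserved.get(node, set())
        if leaked:
            problems.append(f"{node}: reserved GPU(s) in gres.conf: {sorted(leaked)}")
        if counts.get(node) != len(devs):
            problems.append(
                f"{node}: slurm.conf declares {counts.get(node)} GPU(s), "
                f"gres.conf lists {len(devs)}"
            )
    return problems


if __name__ == "__main__":
    for problem in check(GRES_CONF, SLURM_CONF, RESERVED):
        print(problem)
```

With the sample strings above, check() reports nothing, because the reserved device is absent from gres.conf and both files agree on three GPUs. Changing the File= range to /dev/nvidia[0-3] would produce two findings: the reserved device is exposed, and the counts no longer match.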


Mee
13 days ago
I recall a similar question where we had to manage GPU resources, and I think we ended up using option A as well. It just seems more reliable.
upvoted 0 times
Lashawna
18 days ago
I'm not entirely sure, but I feel like option C about reinstalling the drivers might not be necessary unless there are detection issues.
upvoted 0 times
Wilford
23 days ago
I think option A makes the most sense since it involves checking the configuration files directly. I remember practicing that in a lab session.
upvoted 0 times