[GH-ISSUE #75] [BUG] PIDs are scrambled and No Such Process is printed since update to NVIDIA drivers #45

Closed
opened 2026-05-05 03:23:13 -06:00 by gitea-mirror · 7 comments
Owner

Originally created by @marcreichman-pfi on GitHub (Jun 20, 2023).
Original GitHub issue: https://github.com/XuehaiPan/nvitop/issues/75

Originally assigned to: @XuehaiPan on GitHub.

Required prerequisites

  • I have read the documentation https://nvitop.readthedocs.io.
  • I have searched the Issue Tracker to confirm this hasn't already been reported. (Comment there if it has.)
  • I have tried the latest version of nvitop in a new isolated virtual environment.

What version of nvitop are you using?

git hash 4093334972a334e9057f5acf7661a2c1a96bd021

Operating system and version

Docker image (on a CentOS 7 host)

NVIDIA driver version

535.54.03

NVIDIA-SMI

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 1080 Ti     On  | 00000000:02:00.0 Off |                  N/A |
| 23%   45C    P2              57W / 250W |   2658MiB / 11264MiB |     32%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce GTX 1080 Ti     On  | 00000000:82:00.0 Off |                  N/A |
| 24%   45C    P2              55W / 250W |   3430MiB / 11264MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      2863      C   /opt/deepdetect/build/main/dede            1656MiB |
|    0   N/A  N/A      4520      C   /opt/deepdetect/build/main/dede             368MiB |
|    0   N/A  N/A      5001      C   /opt/deepdetect/build/main/dede             630MiB |
|    1   N/A  N/A      3267      C   /opt/deepdetect/build/main/dede             438MiB |
|    1   N/A  N/A      3675      C   /opt/deepdetect/build/main/dede             308MiB |
|    1   N/A  N/A      4072      C   /opt/deepdetect/build/main/dede            2314MiB |
|    1   N/A  N/A      5565      C   /opt/deepdetect/build/main/dede             366MiB |
+---------------------------------------------------------------------------------------+

Python environment

This is the Docker image built from the latest git head (2023-06-20):

$ sudo docker run -it --rm --runtime=nvidia --gpus=all --pid=host --entrypoint /bin/bash nvitop:4093334972a334e9057f5acf7661a2c1a96bd021
(venv) root@ad4380048e10:/nvitop# python3 -m pip freeze | python3 -c 'import sys; print(sys.version, sys.platform); print("".join(filter(lambda s: any(word in s.lower() for word in ("nvi", "cuda", "nvml", "gpu")), sys.stdin)))'
3.8.10 (default, May 26 2023, 14:05:08)
[GCC 9.4.0] linux
nvidia-ml-py==11.525.112
nvitop @ file:///nvitop

(venv) root@ad4380048e10:/nvitop#

Problem description

The output shows scrambled PIDs for every process after the first one listed for each card, and then shows No Such Process for those bogus PIDs. This only started after the driver update, so I assume something changed in the NVIDIA drivers.

Steps to Reproduce

The Python snippets (if any):


Command lines:

$ sudo docker run -it --rm --runtime=nvidia --gpus=all --pid=host nvitop:4093334972a334e9057f5acf7661a2c1a96bd021 --once
Tue Jun 20 18:35:07 2023
╒═════════════════════════════════════════════════════════════════════════════╕
│ NVITOP 1.1.2       Driver Version: 535.54.03      CUDA Driver Version: 12.2 │
├───────────────────────────────┬──────────────────────┬──────────────────────┤
│ GPU  Name        Persistence-M│ Bus-Id        Disp.A │ Volatile Uncorr. ECC │
│ Fan  Temp  Perf  Pwr:Usage/Cap│         Memory-Usage │ GPU-Util  Compute M. │
╞═══════════════════════════════╪══════════════════════╪══════════════════════╪════════════════════════════════════════════════════════════════════╕
│   0  ..orce GTX 1080 Ti  On   │ 00000000:02:00.0 Off │                  N/A │ MEM: █████████████▍ 23.5%                                          │
│ 28%   42C    P8     9W / 250W │   2650MiB / 11264MiB │      0%      Default │ UTL: ▏ 0%                                                          │
├───────────────────────────────┼──────────────────────┼──────────────────────┼────────────────────────────────────────────────────────────────────┤
│   1  ..orce GTX 1080 Ti  On   │ 00000000:82:00.0 Off │                  N/A │ MEM: █████████████████▍ 30.5%                                      │
│ 29%   44C    P8    10W / 250W │   3430MiB / 11264MiB │      0%      Default │ UTL: ▏ 0%                                                          │
╘═══════════════════════════════╧══════════════════════╧══════════════════════╧════════════════════════════════════════════════════════════════════╛
[ CPU: ██████████████████████████████████████████████████████████████████████████████████████████████████ MAX ]  ( Load Average: 71.03 39.83 35.33 )
[ MEM: ███████████████▊ 16.1%                                                                   USED: 9.49GiB ]  [ SWP: ▏ 0.0%                     ]

╒══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╕
│ Processes:                                                                                                                     root@2f027c15efb1 │
│ GPU     PID      USER  GPU-MEM %SM  %CPU  %MEM     TIME  COMMAND                                                                                 │
╞══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
│   0    2863 C    1000  1648MiB   0   0.0   1.8  2:15:55  /opt/deepdetect/build/main/dede -host 0.0.0.0 -port 8080 a652c745cc9b placeshybrid      │
│   0       0 C     N/A     4KiB   0   N/A   N/A      N/A  No Such Process                                                                         │
│   0 429496. C     N/A       0B   0   N/A   N/A      N/A  No Such Process                                                                         │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│   1    3267 C    1000   438MiB N/A   0.0   1.3  2:15:18  /opt/deepdetect/build/main/dede -host 0.0.0.0 -port 8080 bf55e7b22839 inceptionresnetv2 │
│   1       0 C     N/A     4KiB N/A   N/A   N/A      N/A  No Such Process                                                                         │
│   1 242640. C     N/A      N/A N/A   N/A   N/A      N/A  No Such Process                                                                         │
│   1 429496. C     N/A       0B N/A   N/A   N/A      N/A  No Such Process                                                                         │
╘══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╛
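The truncated `429496.` entries in the table above are almost certainly 4294967295 (0xFFFFFFFF, i.e. -1 as an unsigned 32-bit integer, NVML's "no PID" sentinel) clipped to a 7-character PID column. A minimal sketch of that clipping, for illustration only (the `clip` helper is hypothetical, not nvitop's actual rendering code):

```python
def clip(text: str, width: int) -> str:
    """Clip a field to `width` characters, marking truncation with a trailing dot."""
    if len(text) <= width:
        return text
    return text[: width - 1] + "."

# 0xFFFFFFFF rendered in a 7-char PID column loses its tail, matching the output above.
print(clip(str(4294967295), 7))  # -> 429496.
print(clip(str(2863), 7))        # -> 2863 (real PIDs fit and are shown whole)
```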

Traceback

No response

Logs

$ sudo docker run -it --rm --runtime=nvidia --gpus=all --pid=host -e LOGLEVEL=debug nvitop:4093334972a334e9057f5acf7661a2c1a96bd021 --once
[DEBUG] 2023-06-20 18:35:57,178 nvitop.api.libnvml::nvmlDeviceGetMemoryInfo: NVML memory info version 2 is available.
Tue Jun 20 18:35:57 2023
╒═════════════════════════════════════════════════════════════════════════════╕
│ NVITOP 1.1.2       Driver Version: 535.54.03      CUDA Driver Version: 12.2 │
├───────────────────────────────┬──────────────────────┬──────────────────────┤
│ GPU  Name        Persistence-M│ Bus-Id        Disp.A │ Volatile Uncorr. ECC │
│ Fan  Temp  Perf  Pwr:Usage/Cap│         Memory-Usage │ GPU-Util  Compute M. │
╞═══════════════════════════════╪══════════════════════╪══════════════════════╪════════════════════════════════════════════════════════════════════╕
│   0  ..orce GTX 1080 Ti  On   │ 00000000:02:00.0 Off │                  N/A │ MEM: █████████████▍ 23.5%                                          │
│ 24%   35C    P8     8W / 250W │   2650MiB / 11264MiB │      0%      Default │ UTL: ▏ 0%                                                          │
├───────────────────────────────┼──────────────────────┼──────────────────────┼────────────────────────────────────────────────────────────────────┤
│   1  ..orce GTX 1080 Ti  On   │ 00000000:82:00.0 Off │                  N/A │ MEM: █████████████████▍ 30.5%                                      │
│ 25%   36C    P8     9W / 250W │   3430MiB / 11264MiB │      0%      Default │ UTL: ▏ 0%                                                          │
╘═══════════════════════════════╧══════════════════════╧══════════════════════╧════════════════════════════════════════════════════════════════════╛
[ CPU: █████████████████████████████████████████████████████████████████████████████████████████████████▊ MAX ]  ( Load Average: 84.50 48.19 38.40 )
[ MEM: ███████████████▋ 15.9%                                                                   USED: 9.36GiB ]  [ SWP: ▏ 0.0%                     ]

╒══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╕
│ Processes:                                                                                                                     root@333a2a93dbb1 │
│ GPU     PID      USER  GPU-MEM %SM  %CPU  %MEM     TIME  COMMAND                                                                                 │
╞══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
│   0    2863 C    1000  1648MiB   0   0.0   1.8  2:16:45  /opt/deepdetect/build/main/dede -host 0.0.0.0 -port 8080 a652c745cc9b placeshybrid      │
│   0       0 C     N/A     4KiB   0   N/A   N/A      N/A  No Such Process                                                                         │
│   0 429496. C     N/A       0B   0   N/A   N/A      N/A  No Such Process                                                                         │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│   1    3267 C    1000   438MiB N/A   0.0   1.3  2:16:08  /opt/deepdetect/build/main/dede -host 0.0.0.0 -port 8080 bf55e7b22839 inceptionresnetv2 │
│   1       0 C     N/A     4KiB N/A   N/A   N/A      N/A  No Such Process                                                                         │
│   1 242640. C     N/A      N/A N/A   N/A   N/A      N/A  No Such Process                                                                         │
│   1 429496. C     N/A       0B N/A   N/A   N/A      N/A  No Such Process                                                                         │
╘══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╛

Expected behavior

Prior to the driver update, nvitop showed the same PIDs as nvidia-smi, but with full command lines and per-process resource statistics (GPU, PID, USER, GPU-MEM, %SM, %CPU, %MEM, TIME). Now it seems to fail to parse proper PIDs from the NVIDIA libraries, and everything downstream fails from there.

Additional context

I'm not much of a Python programmer, unfortunately, so I'm not sure where to dig in, but I'd assume the issue lies somewhere around receiving the process list for the cards and deciphering the PIDs. My guess is that something changed in the driver, or in some structure or class, such that the parsing code has broken somewhere.

gitea-mirror 2026-05-05 03:23:13 -06:00
Author
Owner

@XuehaiPan commented on GitHub (Jun 21, 2023):

Hi @marcreichman-pfi, thanks for raising this. I have encountered the same issue before. I think this is a bug upstream in nvidia-ml-py, which is incompatible with the new NVIDIA driver: nvidia-ml-py returns invalid PIDs.

In [1]: import pynvml

In [2]: pynvml.nvmlInit()

In [3]: handle = pynvml.nvmlDeviceGetHandleByIndex(0)

In [4]: [p.pid for p in pynvml.nvmlDeviceGetComputeRunningProcesses(handle)]
Out[4]:
[1184,
 0,
 4294967295,
 4294967295,
 16040,
 0,
 4294967295,
 4294967295,
 19984,
 0,
 4294967295,
 4294967295,
 20884,
 0,
 4294967295,
 4294967295,
 26308,
 0,
 4294967295,
 4294967295,
 16336,
 0,
 4294967295,
 4294967295,
 5368,
 0,
 4294967295,
 4294967295,
 19828,
 0,
 4294967295]

I haven't found a solution for this yet. This may be due to an internal API change in the NVML library. We may need to wait for the next nvidia-ml-py release.

As a temporary workaround, you could downgrade your NVIDIA driver version.
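The pattern in the list above is telling: each real PID is followed by 0 and 4294967295 (0xFFFFFFFF), which is consistent with the driver returning a larger per-process struct than the installed nvidia-ml-py expects, so padding bytes get read back as PIDs. A rough client-side sketch of filtering such sentinel values (a stopgap only; `plausible_pids` is a hypothetical helper, and the real fix is a matching nvidia-ml-py release):

```python
# Sentinel PID values observed when nvidia-ml-py misreads the newer driver's
# process-info struct: 0 and 0xFFFFFFFF (-1 as an unsigned 32-bit value).
SENTINEL_PIDS = {0, 0xFFFFFFFF}

def plausible_pids(pids):
    """Drop PID values that look like struct-padding artifacts, keeping order."""
    return [pid for pid in pids if pid not in SENTINEL_PIDS]

# First eight entries from the nvmlDeviceGetComputeRunningProcesses() output above.
raw = [1184, 0, 4294967295, 4294967295, 16040, 0, 4294967295, 4294967295]
print(plausible_pids(raw))  # -> [1184, 16040]
```

Note this only hides the symptom: the remaining fields of each misaligned struct (memory usage, etc.) would still be unreliable.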

See also:

  • #76
  • giampaolo/psutil#2264
  • giampaolo/psutil#2266
Author
Owner

@marcreichman-pfi commented on GitHub (Jun 21, 2023):

Hi @XuehaiPan and thanks for your response and excellent tool!

We cannot downgrade because we need newer CUDA version support, so for now we'll just have to wait for an updated version with the NVML library fix.

Author
Owner

@XuehaiPan commented on GitHub (Jul 7, 2023):

Hi @marcreichman-pfi, a new release of nvidia-ml-py, version 12.535.77, came out a few hours ago. You can upgrade the nvidia-ml-py package with:

python3 -m pip install --upgrade nvidia-ml-py

This should resolve the unrecognized PIDs with CUDA 12 drivers.

I will also make a new release of nvitop with CUDA 12 driver support.
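To check whether an environment already carries the fix, a simple dotted-version comparison suffices (a sketch; `has_pid_fix` is a hypothetical helper, and the minimum is the 12.535.77 release mentioned above):

```python
def version_tuple(v: str) -> tuple:
    """Parse a dotted version like '12.535.77' into a numerically comparable tuple."""
    return tuple(int(part) for part in v.split("."))

def has_pid_fix(installed: str, minimum: str = "12.535.77") -> bool:
    """True if the installed nvidia-ml-py is at or past the fixed release."""
    return version_tuple(installed) >= version_tuple(minimum)

print(has_pid_fix("11.525.112"))  # the version from this report -> False
print(has_pid_fix("12.535.77"))   # -> True
```

The installed version itself can be read with `python3 -m pip show nvidia-ml-py` or `importlib.metadata.version("nvidia-ml-py")`.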

Author
Owner

@marcreichman-pfi commented on GitHub (Jul 7, 2023):

Thanks @XuehaiPan - is there a way to do this in the docker version?

Author
Owner

@XuehaiPan commented on GitHub (Jul 7, 2023):

> Thanks @XuehaiPan - is there a way to do this in the docker version?

@marcreichman-pfi You could upgrade nvidia-ml-py in your docker container.

Author
Owner

@marcreichman-pfi commented on GitHub (Jul 7, 2023):

Thanks, this did the trick! Here is what I changed in your Dockerfile:

$ git diff Dockerfile
diff --git a/Dockerfile b/Dockerfile
index c3194cf..96874da 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -32,6 +32,7 @@ COPY . /nvitop
 WORKDIR /nvitop
 RUN . /venv/bin/activate && \
   python3 -m pip install . && \
+  python3 -m pip install --upgrade nvidia-ml-py && \
   rm -rf /root/.cache

 # Entrypoint
Author
Owner

@ukejeb commented on GitHub (Aug 15, 2024):

nvitop-1.3.2 with nvidia-ml-py-12.535.161, CUDA 12.2 and Driver Version 535.129.03 also shows No Such Process.
