[GH-ISSUE #75] [BUG] PIDs are scrambled and No Such Process is printed since update to NVIDIA drivers
Originally created by @marcreichman-pfi on GitHub (Jun 20, 2023).
Original GitHub issue: https://github.com/XuehaiPan/nvitop/issues/75
Originally assigned to: @XuehaiPan on GitHub.
Required prerequisites
What version of nvitop are you using?
git hash 4093334972a334e9057f5acf7661a2c1a96bd021

Operating system and version
Docker image (under a CentOS 7 host)
NVIDIA driver version
535.54.03
NVIDIA-SMI
Python environment
This is the Docker image built from the latest git head (June 20, 2023).
Problem description
The output shows scrambled PIDs for the processes after the first process in each card's list, and then shows "No Such Process" for those wrong PIDs. This only started after the driver update, so I assume something changed in the NVIDIA drivers.

Steps to Reproduce
The Python snippets (if any):
Command lines:
Traceback
No response
Logs
Expected behavior
Prior to the driver update, the information was present for the same PIDs included in nvidia-smi, but with the full command lines and the per-process resource statistics (e.g. GPU PID USER GPU-MEM %SM %CPU %MEM TIME). Now it seems to be having an issue parsing proper PIDs from the NVIDIA libraries, and then failing downstream from there.

Additional context
I'm not much of a Python programmer, unfortunately, so I'm not clear where to dig in, but I'd assume the issue is somewhere in the area of receiving the process list for each card and deciphering the PIDs. My assumption is that something changed in the driver, or in some structure or class, such that the parsing code broke somewhere.
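For reference, the PIDs in question come from NVML's per-device process query. The following is a minimal diagnostic sketch, not part of nvitop, assuming the nvidia-ml-py bindings (importable as pynvml) and a Linux /proc filesystem are available; it dumps the raw PIDs NVML reports for each card and flags any that do not exist on the host, which is the condition nvitop ends up rendering as "No Such Process":

```python
# Diagnostic sketch (assumption: nvidia-ml-py installed, Linux host with /proc).
# Dumps the raw PIDs NVML reports per GPU and flags PIDs missing from /proc.
import os

import pynvml

pynvml.nvmlInit()
try:
    for index in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(index)
        print(f'GPU {index}:')
        for proc in pynvml.nvmlDeviceGetComputeRunningProcesses(handle):
            # A PID that is absent from /proc is what the TUI reports as
            # "No Such Process".
            exists = os.path.exists(f'/proc/{proc.pid}')
            print(f'  pid={proc.pid} used_gpu_memory={proc.usedGpuMemory} '
                  f'exists={exists}')
finally:
    pynvml.nvmlShutdown()
```

If the binding/driver mismatch described in the next comment is in play, the PIDs printed by this probe should already be garbled, placing the problem below nvitop itself.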
@XuehaiPan commented on GitHub (Jun 21, 2023):
Hi @marcreichman-pfi, thanks for raising this. I have encountered the same issue before. I think this would be a bug in the upstream (nvidia-ml-py) with the incompatible NVIDIA driver: nvidia-ml-py returns invalid PIDs.

I haven't found a solution for this yet. This may be due to an internal API change in the NVML library. We may need to wait for the next nvidia-ml-py release.

As a temporary workaround, you could downgrade your NVIDIA driver version.
See also:
@marcreichman-pfi commented on GitHub (Jun 21, 2023):
Hi @XuehaiPan and thanks for your response and excellent tool!
We cannot downgrade because we need newer CUDA version support, so for now we'll just have to wait for an updated version with the NVML library fix.
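While waiting, a quick way to see the two sides of the suspected mismatch is to print the versions reported by the running driver and compare them with the installed nvidia-ml-py binding. A short sketch, again assuming the pynvml bindings are importable (not an nvitop command):

```python
# Version probe sketch: print the driver and NVML library versions reported by
# the running driver, for comparison against the installed nvidia-ml-py binding.
import pynvml

pynvml.nvmlInit()
try:
    # Older bindings return bytes here, newer ones return str; print as-is.
    print('Driver version:', pynvml.nvmlSystemGetDriverVersion())
    print('NVML version:  ', pynvml.nvmlSystemGetNVMLVersion())
finally:
    pynvml.nvmlShutdown()
```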
@XuehaiPan commented on GitHub (Jul 7, 2023):
Hi @marcreichman-pfi, a new release of nvidia-ml-py, version 12.535.77, came out several hours ago. You can upgrade your nvidia-ml-py package with pip. This would resolve the unrecognized PIDs with CUDA 12 drivers.
I will also make a new release of nvitop to resolve CUDA 12 driver support.

@marcreichman-pfi commented on GitHub (Jul 7, 2023):
Thanks @XuehaiPan - is there a way to do this in the docker version?
@XuehaiPan commented on GitHub (Jul 7, 2023):
@marcreichman-pfi You could upgrade nvidia-ml-py in your Docker container.

@marcreichman-pfi commented on GitHub (Jul 7, 2023):
Thanks, this did the trick! Here was what I did from your Dockerfile:

@ukejeb commented on GitHub (Aug 15, 2024):
nvitop 1.3.2 with nvidia-ml-py 12.535.161, CUDA 12.2, and driver version 535.129.03 also shows "No Such Process".
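When the symptom reappears like this, one thing worth ruling out is which distribution actually provides the pynvml module in the environment nvitop runs in, since both the nvidia-ml-py package and the older pynvml PyPI package install a module named pynvml, and a stale copy can shadow the fixed one. A small sketch, assuming Python 3.8+ for importlib.metadata:

```python
# Sketch (assumes Python 3.8+): report which distributions that provide the
# `pynvml` module are installed, and their versions, so they can be compared
# against the NVIDIA driver in use.
from importlib import metadata

for dist_name in ('nvidia-ml-py', 'pynvml'):
    try:
        print(f'{dist_name}: {metadata.version(dist_name)}')
    except metadata.PackageNotFoundError:
        print(f'{dist_name}: not installed')
```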