[GH-ISSUE #106] [BUG][exporter] Process metrics still exist when the process is gone #66

Closed
opened 2026-05-05 03:24:18 -06:00 by gitea-mirror · 5 comments

Originally created by @caotangdaiduong on GitHub (Nov 22, 2023).
Original GitHub issue: https://github.com/XuehaiPan/nvitop/issues/106

Originally assigned to: @XuehaiPan on GitHub.

Required prerequisites

  • I have read the documentation https://nvitop.readthedocs.io.
  • I have searched the Issue Tracker that this hasn't already been reported. (comment there if it has.)
  • I have tried the latest version of nvitop in a new isolated virtual environment.

What version of nvitop are you using?

1.3.1

Operating system and version

Ubuntu 20.04.4 LTS

NVIDIA driver version

510.47.03

NVIDIA-SMI

Wed Nov 22 16:23:39 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.85.02    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+

Python environment

3.8.10 (default, May 26 2023, 14:05:08)
[GCC 9.4.0] linux
nvidia-ml-py==12.535.133
nvitop==1.3.1
nvitop-exporter==1.3.1

Problem description

nvitop-exporter cache value

Metric values are retained and not refreshed

Steps to Reproduce

The Python snippets (if any):


Command lines:


Traceback

No response

Logs

No response

Expected behavior

No response

Additional context

No response


@XuehaiPan commented on GitHub (Nov 22, 2023):

> Metric values are retained and not refreshed

Hi @caotangdaiduong, did you set up a Prometheus service to retrieve the latest metrics automatically?


@caotangdaiduong commented on GitHub (Nov 22, 2023):

Currently I'm using cron to restart the service every minute. This may sound crazy, but that way the metrics are completely accurate.


@XuehaiPan commented on GitHub (Nov 22, 2023):

> I know the default nvitop interval is 1s, but I have tried the interval option with different values such as 15s and 30s, and the result is still the same.

@caotangdaiduong I can see the metrics are updating on my side. I'm running watch --differences:

watch --differences 'curl -s http://127.0.0.1:8000/metrics'
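For a scripted alternative to watching the output by eye, the sketch below compares the series (metric name plus labels) found in two scrapes of the `/metrics` text. It assumes each sample line ends with the value and carries no trailing timestamp. The metric name `process_gpu_memory` and the sample values are made up for illustration and are not nvitop-exporter's actual metric names.

```python
def series_keys(metrics_text):
    """Return the set of series identifiers (name plus labels) in a scrape."""
    keys = set()
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and HELP/TYPE comments
        # Assumption: the value is the last space-separated field (no timestamp).
        key, _, _value = line.rpartition(' ')
        keys.add(key)
    return keys


def diff_scrapes(before, after):
    """Return (added, removed, kept) series between two scrapes, each sorted."""
    old, new = series_keys(before), series_keys(after)
    return sorted(new - old), sorted(old - new), sorted(old & new)


# Hypothetical scrapes: PID 1000 has exited before the second scrape,
# yet its series is still being exported alongside the new PID 2000.
before = 'process_gpu_memory{pid="1000"} 512.0\n'
after = (
    'process_gpu_memory{pid="1000"} 512.0\n'
    'process_gpu_memory{pid="2000"} 256.0\n'
)
added, removed, kept = diff_scrapes(before, after)
print('added:', added)
print('removed:', removed)
print('kept:', kept)
```

If a process has exited but its series still shows up under `kept`, the exporter is retaining stale keys rather than failing to refresh values.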

> This is similar to pushgateway: it only updates the value under an existing key name, and a new key gets new values. I think it's similar to the case with many different values (in my case, every time the PID or index changes, it creates a new key, and the old PID and index are still there).

The metrics for GPU processes are actively updated on my side.

I can confirm that when a GPU process is gone, its gauge keys still exist. Do you mean you want these keys removed once the corresponding processes are gone?
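The behaviour being asked for can be sketched as follows. This is a minimal illustration, not the exporter's code: `LabeledGauge` is a hypothetical stand-in for a labeled gauge, and `update_process_gauge` is an invented helper name. On every refresh it sets values for live processes and then drops any label set whose process no longer exists.

```python
class LabeledGauge:
    """Hypothetical stand-in for a labeled gauge (not the prometheus_client API)."""

    def __init__(self, name):
        self.name = name
        self._values = {}  # maps a tuple of label values to the current value

    def set(self, labels, value):
        self._values[labels] = value

    def remove(self, labels):
        self._values.pop(labels, None)

    def label_sets(self):
        return set(self._values)


def update_process_gauge(gauge, alive):
    """Refresh the gauge from `alive` (pid -> value) and drop gone processes."""
    current = {(str(pid),) for pid in alive}
    for pid, value in alive.items():
        gauge.set((str(pid),), value)
    # Without this loop, label sets of exited processes would stay forever.
    for stale in gauge.label_sets() - current:
        gauge.remove(stale)


gauge = LabeledGauge('process_gpu_memory')  # hypothetical metric name
update_process_gauge(gauge, {1000: 512.0})  # first refresh: PID 1000 is alive
update_process_gauge(gauge, {2000: 256.0})  # PID 1000 has exited, PID 2000 is new
print(gauge.label_sets())
```

With the real `prometheus_client` library, labeled metrics offer a `remove(*labelvalues)` method for the same purpose. Whether the eventual fix uses it is not shown in this thread.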


@XuehaiPan commented on GitHub (Nov 22, 2023):

> • You will see that both the old and new PIDs exist when calling curl to the exporter

@caotangdaiduong I can confirm this and have opened PR #107 to resolve it. You can try it via:

python3 -m pip install "git+https://github.com/XuehaiPan/nvitop.git@exporter-remove-gone-process#egg=nvitop-exporter&subdirectory=nvitop-exporter"

@caotangdaiduong commented on GitHub (Nov 23, 2023):

Hi @XuehaiPan

Thanks for your efforts. I tested it and it works as expected.

Reference: github-starred/nvitop#66