[GH-ISSUE #65] [Feature Request] Refresh rate < 1 sec #38

Closed
opened 2026-05-05 03:22:55 -06:00 by gitea-mirror · 9 comments

Originally created by @BlueskyFR on GitHub (Apr 6, 2023).
Original GitHub issue: https://github.com/XuehaiPan/nvitop/issues/65

Originally assigned to: @XuehaiPan on GitHub.

Required prerequisites

  • I have searched the Issue Tracker to confirm this hasn't already been reported (comment there if it has).
  • I have tried the latest version of nvitop in a new isolated virtual environment.

Motivation

I see the current minimum refresh rate is 1 second.
Could it be something like 0.1 sec so that we can get a more accurate overview of what is happening on the GPU?

Solution

Alternatives

Additional context


@XuehaiPan commented on GitHub (Apr 6, 2023):

Duplicate of #32, #63.

Could it be something like 0.1 sec

Hi @BlueskyFR, the latency from the NVML API call is relatively high. I think it's meaningless to support small intervals like 0.1 second. If you want a fine-grained report of resource usage, maybe you should use a profiler instead.

so that we can get a more accurate overview of what is happening on the GPU?

  1. You can select a process and then press the <Enter> key. The metrics on the top row will refresh every 1/4 sec.

     [Screenshot: Process Metrics Screen. Watch metrics for a specific process (shortcut: <Enter> / <Return>).]

  2. Use nvitop.ResourceMetricCollector; see Resource Metric Collector in the README for more information (a short usage sketch follows below).
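
For reference, a minimal usage sketch of nvitop.ResourceMetricCollector, loosely based on the README example. The interval value, the placeholder train_one_epoch(), and the exact metric keys are assumptions and may differ across nvitop versions:

```python
from nvitop import Device, ResourceMetricCollector

# Collect metrics for all visible devices, sampling roughly twice per second.
collector = ResourceMetricCollector(Device.all(), interval=0.5)

with collector(tag='train'):
    train_one_epoch()              # placeholder for your own workload
    metrics = collector.collect()  # dict of metrics averaged since the context began

for name, value in sorted(metrics.items()):
    print(f'{name}: {value}')
```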

@BlueskyFR commented on GitHub (Apr 6, 2023):

Thanks for your reply.
Why are calls to NVML so slow?

nvidia-smi supports refresh intervals as low as 10 ms, for instance


@XuehaiPan commented on GitHub (Apr 6, 2023):

Why are calls to NVML so slow?

nvidia-smi supports refresh intervals as low as 10 ms, for instance

@BlueskyFR nvidia-smi cannot achieve this.

  1. It depends on how many GPU devices are on board.
  2. If persistence mode is disabled, the nvidia-smi command takes much longer (up to a few seconds, e.g., 3 s) to complete a single query.

We can "refresh" the "fake" results every 10ms. But the results may be queried seconds ago. They are not accurate.

Here are some benchmark results from my side. You can try hyperfine on your machine to see the latency.

  • Single NVIDIA 3090 GPU on WSL (persistence mode enabled)
$ hyperfine --warmup 50 --runs 200 nvidia-smi
Benchmark 1: nvidia-smi
  Time (mean ± σ):     113.6 ms ±   8.4 ms    [User: 5.3 ms, System: 3.9 ms]
  Range (min … max):    98.4 ms … 141.4 ms    200 runs
  • 8 x NVIDIA A100 GPU on native Ubuntu (persistence mode enabled)
$ hyperfine --warmup 50 --runs 200 nvidia-smi
Benchmark 1: nvidia-smi
  Time (mean ± σ):      1.920 s ±  0.417 s    [User: 0.007 s, System: 1.298 s]
  Range (min … max):    1.314 s …  4.250 s    200 runs

A single query takes about 2 seconds; it cannot run every 10 ms.


@BlueskyFR commented on GitHub (Apr 6, 2023):

(Quoting @XuehaiPan's reply above in full.)

You may be using it wrong 😊

You can see my post here for more details -> https://github.com/influxdata/telegraf/issues/8534#issue-761112264


@XuehaiPan commented on GitHub (Apr 6, 2023):

You may be using it wrong 😊

You can see my post here for more details -> https://github.com/influxdata/telegraf/issues/8534#issue-761112264

Thanks for the reference. nvitop already uses sparse queries via nvidia-ml-py instead of a full query through nvidia-smi. But several things remain slow, such as gathering process information, especially when the number of processes is large (up to hundreds). Also, as I mentioned above, if you don't enable persistence mode, your nvidia-smi queries will take much longer.
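
For illustration only, here is a sketch of the kind of sparse per-device query nvitop makes through nvidia-ml-py (the pynvml module), with a rough timing of the calls. This is not nvitop's actual code path, and the latency you measure will depend on your driver, GPU count, and persistence mode:

```python
import time

import pynvml  # provided by the nvidia-ml-py package

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

start = time.perf_counter()
util = pynvml.nvmlDeviceGetUtilizationRates(handle)  # GPU / memory utilization
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)         # used / total memory
# Per-process queries such as this one are typically the slow part,
# especially when many processes are running on the device.
procs = pynvml.nvmlDeviceGetComputeRunningProcesses(handle)
elapsed = time.perf_counter() - start

print(f'GPU util: {util.gpu}%  memory used: {mem.used / mem.total:.1%}  '
      f'processes: {len(procs)}  query took {elapsed * 1000:.1f} ms')
pynvml.nvmlShutdown()
```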


@BlueskyFR commented on GitHub (Apr 6, 2023):

So maybe it is more of a design problem?
Perhaps the same amount of information cannot be obtained with nvidia-smi, but I doubt it.


@XuehaiPan commented on GitHub (Apr 6, 2023):

So maybe it is more of a design problem?
Perhaps the same amount of information cannot be obtained with nvidia-smi, but I doubt it.

In your example, you are not querying process information, which is the key feature of nvitop. If you want accurate metrics, I still think you should use a profiler instead. A day-to-day monitor should not run at a high sampling frequency 24/7; that would lead to high power consumption. If you only want to monitor a process for several minutes, why not use a profiler? It would be the more appropriate tool for your use case.


@BlueskyFR commented on GitHub (Apr 13, 2023):

That could be a solution; what profiler do you have in mind, for instance?


@XuehaiPan commented on GitHub (Apr 14, 2023):

That could be a solution; what profiler do you have in mind, for instance?

@BlueskyFR That depends on your use case, because profilers need in-process injection to add hooks that record kernel times, which may require users to update their code. If you are using PyTorch, you can try torch.profiler.profile (pytorch/kineto). It collects fine-grained metrics and comes with a web-based GUI integration. You may also try NVIDIA Nsight Systems, a profiling tool from NVIDIA.
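
For instance, a minimal torch.profiler sketch following the PyTorch profiler recipe; the model and inputs below are placeholders for your own workload:

```python
import torch
from torch.profiler import ProfilerActivity, profile, record_function

model = torch.nn.Linear(1024, 1024).cuda()     # placeholder model
inputs = torch.randn(64, 1024, device='cuda')  # placeholder input batch

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    with record_function('forward'):
        model(inputs)

# Print the hottest CUDA ops; export_chrome_trace() writes a trace viewable in a browser.
print(prof.key_averages().table(sort_by='cuda_time_total', row_limit=10))
prof.export_chrome_trace('trace.json')
```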
