[GH-ISSUE #162] [Question] Can't use nvitop callback in PyTorch Lightning #102

Closed
opened 2026-05-05 03:25:31 -06:00 by gitea-mirror · 2 comments

Originally created by @leo1oel on GitHub (Apr 19, 2025).
Original GitHub issue: https://github.com/XuehaiPan/nvitop/issues/162

Originally assigned to: @XuehaiPan on GitHub.

Required prerequisites

  • I have read the documentation https://nvitop.readthedocs.io.
  • I have searched the Issue Tracker that this hasn't already been reported. (comment there if it has.)
  • I have tried the latest version of nvitop in a new isolated virtual environment.

Questions

When I use the nvitop (version 1.4.2) callback with the latest PyTorch Lightning as described in the README, this error occurs:

Error executing job with overrides: []
Traceback (most recent call last):
  File "/pasteur/u/yiming/small-vlm/src/vlm/vlm.py", line 120, in main
    vlm(cfg)
    ~~~^^^^^
  File "/pasteur/u/yiming/small-vlm/src/vlm/vlm.py", line 106, in vlm
    train(cfg.trainer, model, data_module)
    ~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/pasteur/u/yiming/small-vlm/src/vlm/train/trainer.py", line 55, in train
    trainer.fit(
    ~~~~~~~~~~~^
        model=model,
        ^^^^^^^^^^^^
        datamodule=data_module,
        ^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/pasteur/u/yiming/small-vlm/.venv/lib/python3.13/site-packages/lightning/pytorch/trainer/trainer.py", line 561, in fit
    call._call_and_handle_interrupt(
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        self, self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/pasteur/u/yiming/small-vlm/.venv/lib/python3.13/site-packages/lightning/pytorch/trainer/call.py", line 47, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/pasteur/u/yiming/small-vlm/.venv/lib/python3.13/site-packages/lightning/pytorch/strategies/launchers/subprocess_script.py", line 105, in launch
    return function(*args, **kwargs)
  File "/pasteur/u/yiming/small-vlm/.venv/lib/python3.13/site-packages/lightning/pytorch/trainer/trainer.py", line 599, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
    ~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/pasteur/u/yiming/small-vlm/.venv/lib/python3.13/site-packages/lightning/pytorch/trainer/trainer.py", line 1012, in _run
    results = self._run_stage()
  File "/pasteur/u/yiming/small-vlm/.venv/lib/python3.13/site-packages/lightning/pytorch/trainer/trainer.py", line 1056, in _run_stage
    self.fit_loop.run()
    ~~~~~~~~~~~~~~~~~^^
  File "/pasteur/u/yiming/small-vlm/.venv/lib/python3.13/site-packages/lightning/pytorch/loops/fit_loop.py", line 216, in run
    self.advance()
    ~~~~~~~~~~~~^^
  File "/pasteur/u/yiming/small-vlm/.venv/lib/python3.13/site-packages/lightning/pytorch/loops/fit_loop.py", line 455, in advance
    self.epoch_loop.run(self._data_fetcher)
    ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
  File "/pasteur/u/yiming/small-vlm/.venv/lib/python3.13/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 150, in run
    self.advance(data_fetcher)
    ~~~~~~~~~~~~^^^^^^^^^^^^^^
  File "/pasteur/u/yiming/small-vlm/.venv/lib/python3.13/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 303, in advance
    call._call_callback_hooks(trainer, "on_train_batch_start", batch, batch_idx)
    ~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/pasteur/u/yiming/small-vlm/.venv/lib/python3.13/site-packages/lightning/pytorch/trainer/call.py", line 227, in _call_callback_hooks
    fn(trainer, trainer.lightning_module, *args, **kwargs)
    ~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/pasteur/u/yiming/small-vlm/.venv/lib/python3.13/site-packages/lightning_utilities/core/rank_zero.py", line 41, in wrapped_fn
    return fn(*args, **kwargs)
TypeError: GpuStatsLogger.on_train_batch_start() takes 3 positional arguments but 5 were given
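The last two frames explain the failure: Lightning's `_call_callback_hooks` calls `fn(trainer, trainer.lightning_module, *args, **kwargs)`, and for `on_train_batch_start` the extra positional args are `batch` and `batch_idx`. Counting `self`, the bound hook therefore receives five positional arguments, so a hook declared with only `(self, trainer, pl_module)` (plus optional keywords) raises exactly this `TypeError`. The stdlib-only sketch below reproduces the mismatch without importing Lightning; `call_callback_hook`, `OldStyleLogger` and `NewStyleLogger` are hypothetical stand-ins, not Lightning or nvitop APIs.

```python
from typing import Any


def call_callback_hook(callback: Any, hook_name: str, trainer: Any, pl_module: Any, *args: Any) -> Any:
    """Mimic the dispatch shown in the traceback: fn(trainer, pl_module, *args)."""
    fn = getattr(callback, hook_name)
    return fn(trainer, pl_module, *args)


class OldStyleLogger:
    # Hypothetical hook with the shorter signature: only 3 positional
    # parameters (self, trainer, pl_module); anything else must be a keyword.
    def on_train_batch_start(self, trainer: Any, pl_module: Any, **kwargs: Any) -> str:
        return "old"


class NewStyleLogger:
    # Hypothetical hook matching the call in the traceback:
    # self + (trainer, pl_module, batch, batch_idx).
    def on_train_batch_start(self, trainer: Any, pl_module: Any, batch: Any, batch_idx: int) -> str:
        return f"batch {batch_idx}"


def try_hook(callback: Any) -> str:
    """Invoke the hook like the training loop does and report the result or the error."""
    try:
        return call_callback_hook(callback, "on_train_batch_start", "trainer", "module", [1, 2, 3], 0)
    except TypeError as exc:
        return f"TypeError: {exc}"


if __name__ == "__main__":
    print(try_hook(OldStyleLogger()))  # TypeError: ...takes 3 positional arguments but 5 were given
    print(try_hook(NewStyleLogger()))  # batch 0
```

The old-style hook fails for the same reason as `GpuStatsLogger` above, while the wider signature accepts the call.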

gitea-mirror 2026-05-05 03:25:31 -06:00

@XuehaiPan commented on GitHub (Apr 19, 2025):

Hi @leo1oel, thanks for raising this. Note that nvitop has no plan to keep its integrations in sync with third-party dependencies, and the callback support is going to be deprecated.

Please feel free to copy the callback implementation to your project.
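One way to follow that advice is to keep the copied logging logic and only widen the hook signatures to what Lightning 2.x passes: `on_train_batch_start(trainer, pl_module, batch, batch_idx)` and `on_train_batch_end(trainer, pl_module, outputs, batch, batch_idx)`. The sketch below shows this forwarding pattern on a hypothetical stand-in class (`LegacyStatsLogger`) so it runs with the standard library alone. Applying the same override to a copied or subclassed `GpuStatsLogger` is an assumption about nvitop's internals that should be checked against the source you copy.

```python
from typing import Any, List


class LegacyStatsLogger:
    """Hypothetical stand-in for a callback written against the older, shorter hook signatures."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def on_train_batch_start(self, trainer: Any, pl_module: Any, **kwargs: Any) -> None:
        # In a real logger this is where GPU stats would be sampled.
        self.events.append("start")

    def on_train_batch_end(self, trainer: Any, pl_module: Any, **kwargs: Any) -> None:
        # In a real logger this is where the sampled stats would be logged.
        self.events.append("end")


class PatchedStatsLogger(LegacyStatsLogger):
    """Accept the positional arguments Lightning 2.x passes and forward to the old hooks."""

    def on_train_batch_start(self, trainer: Any, pl_module: Any, batch: Any, batch_idx: int) -> None:
        # batch and batch_idx are accepted but not needed by the legacy logic.
        super().on_train_batch_start(trainer, pl_module)

    def on_train_batch_end(
        self, trainer: Any, pl_module: Any, outputs: Any, batch: Any, batch_idx: int
    ) -> None:
        super().on_train_batch_end(trainer, pl_module)


def run_fake_batch(callback: Any, batch_idx: int) -> None:
    """Call both hooks positionally, the way Lightning 2.x dispatches them."""
    trainer, pl_module, batch, outputs = object(), object(), [0.0], {"loss": 0.0}
    callback.on_train_batch_start(trainer, pl_module, batch, batch_idx)
    callback.on_train_batch_end(trainer, pl_module, outputs, batch, batch_idx)


if __name__ == "__main__":
    logger = PatchedStatsLogger()
    for i in range(2):
        run_fake_batch(logger, i)
    print(logger.events)  # ['start', 'end', 'start', 'end']
```

In a real project the patched class would also need to derive from `lightning.pytorch.callbacks.Callback` (directly or through the copied logger) before being passed to `Trainer(callbacks=[...])`.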


@XuehaiPan commented on GitHub (Apr 19, 2025):

Overall, thanks for using nvitop. The callback is going to be removed and is no longer maintained. Sorry about that.

Reference: github-starred/nvitop#102