Commit graph

723 commits

Author SHA1 Message Date
Xuehai Pan
470245dc3d chore(pre-commit): update pre-commit hooks 2024-03-05 06:07:01 +00:00
Xuehai Pan
8e0c203a1d chore: update license header 2024-02-16 09:58:19 +00:00
Xuehai Pan
1710579c66 lint: update ruff rules 2024-02-16 09:43:47 +00:00
Xuehai Pan
64e35336cd chore(install-nvidia-driver): set LANGUAGE environment variable 2023-12-21 02:31:30 +08:00
dependabot[bot]
327223ef6e deps(workflows): bump actions/download-artifact from 3 to 4 (#115)
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-12-18 15:06:57 +08:00
dependabot[bot]
dca862eb9c deps(workflows): bump actions/upload-artifact from 3 to 4 (#116)
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-12-18 15:06:37 +08:00
Xuehai Pan
9c5d330076 ver: bump version to v1.3.2 2023-12-17 19:18:16 +08:00
Xuehai Pan
bff355bcc4 fix(callbacks/lightning): populate callback for lightning (#114) 2023-12-17 19:13:19 +08:00
dependabot[bot]
b50b83767b deps(workflows): bump actions/setup-python from 4 to 5 (#111)
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
2023-12-11 14:26:02 +08:00
Xuehai Pan
8c8bc18ea0 feat(exporter): remove metrics if process is gone (#107) 2023-11-23 19:08:40 +08:00
Xuehai Pan
83f90f3fa1 chore(install-nvidia-driver): do not call nvidia-smi when the NVIDIA kernel modules are not loaded 2023-11-10 19:45:21 +08:00
Xuehai Pan
36e66bb43c chore(pre-commit): update pre-commit hooks 2023-11-10 19:45:21 +08:00
Xuehai Pan
4bcb6c92b3 style: miscellaneous style housekeeping 2023-11-05 16:05:38 +08:00
Xuehai Pan
1cff66bc03 deps(nvidia-ml-py): add nvidia-ml-py 12.535.133 to support list 2023-11-05 00:17:15 +08:00
Xuehai Pan
5cba62ffe1 chore(pre-commit): update pre-commit hooks 2023-10-25 23:11:17 +08:00
Xuehai Pan
2b3ec124d5 ver: bump version to v1.3.1 2023-10-05 20:04:13 +08:00
Xuehai Pan
a37e63fcf3 deps(python): add Python 3.12 classifiers (#101) 2023-10-05 20:01:13 +08:00
Xuehai Pan
9da41a5d12 fix(libcuda): fix cuDeviceGetUuid() when the UUID contains 0x00 (#100) 2023-10-05 19:48:41 +08:00
dependabot[bot]
49c164cf30 deps(workflows): bump actions/checkout from 3 to 4 (#96)
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-09-11 15:17:47 +08:00
Xuehai Pan
e8dd7e26e2 chore(pre-commit): update pre-commit hooks 2023-09-09 16:39:48 +08:00
Xuehai Pan
ed10216f6a chore(.readthedocs.yaml): remove deprecated config entry 2023-09-09 16:33:25 +08:00
Xuehai Pan
46ea686c33 deps(nvidia-ml-py): add nvidia-ml-py 12.535.108 to support list 2023-08-31 17:15:11 +08:00
Xuehai Pan
410785e283 ver: bump version to 1.3.0 2023-08-26 17:39:40 +00:00
Xuehai Pan
daf72c7bf3 feat(exporter): add Prometheus exporter (#92) 2023-08-27 01:37:04 +08:00
Xuehai Pan
9ff3ec3400 fix(api/libnvml): fix upstream changes for process info v3 APIs on 535.104.05 driver (#94) 2023-08-26 16:41:55 +00:00
Xuehai Pan
6a9663b33f fix(api/libnvml): fix removal for process info v3 APIs on the upstream 535.98 driver (#89) 2023-08-17 16:33:53 +08:00
Xuehai Pan
ec4ad645d2 feat(api/device): add methods to query PCIe and NVLink throughput (#87) 2023-08-13 22:37:36 +08:00
Xuehai Pan
ef77b8b989 fix(api/device): use recent timestamp for GPU process utilization query (#85) 2023-08-04 18:55:57 +08:00
Xuehai Pan
ec53de75b4 ver: bump version to v1.2.0 2023-07-24 16:40:44 +08:00
Xuehai Pan
6265a68e39 chore(pre-commit): update pre-commit hooks 2023-07-24 16:37:10 +08:00
Xuehai Pan
11a03ced70 feat(api/collector): include last snapshot metrics in the log results (#80) 2023-07-17 20:48:39 +08:00
Xuehai Pan
c3487c03b6 fix(api/libnvml): fix process info support for NVIDIA R535 driver (CUDA 12.2+) (#79) 2023-07-17 00:22:53 +08:00
Xuehai Pan
04ac6a0efe deps(nvidia-ml-py): add nvidia-ml-py 11.525.131 to support list 2023-07-01 17:16:53 +08:00
Xuehai Pan
7c74e02eb0 chore(pre-commit): update pre-commit hooks 2023-07-01 17:15:08 +08:00
Xuehai Pan
7ebf9056fb chore(pre-commit): update pre-commit hooks 2023-06-07 06:42:15 +00:00
Xuehai Pan
b5d64d58ba deps: pin pytorch-lightning while building documentation 2023-05-11 07:44:02 +00:00
Xuehai Pan
266edcb3be chore(pre-commit): update pre-commit hooks 2023-05-08 00:10:15 +08:00
Xuehai Pan
f0b055bfcf feat(linter): mypy integration (#73) 2023-05-01 13:02:01 +08:00
Xuehai Pan
2408735f54 chore: pre-commit autoupdate 2023-04-20 05:58:01 +00:00
Xuehai Pan
871c5ca248 style: prefer utf-8 over UTF-8 in code 2023-04-20 05:09:28 +00:00
Xuehai Pan
060ff27725 docs: improve grammar in README 2023-04-14 14:29:55 +00:00
Xuehai Pan
4093334972 fix(api/libcuda): fix inappropriate exception catching in function libcuda.cuDeviceGetUuid 2023-04-12 05:32:56 +00:00
Xuehai Pan
fd1e4148f6 ver: bump version to v1.1.2 2023-04-11 13:28:16 +00:00
Xuehai Pan
7fca245b57 fix(api/device): further isolate the CUDA_VISIBLE_DEVICE parser in a subprocess (#70) 2023-04-11 21:26:12 +08:00
Xuehai Pan
5a0da9239b docs: add notes for upgrading pip for GitHub installation 2023-04-08 15:50:45 +00:00
Xuehai Pan
6b6aee7537 ver: bump version to v1.1.1 2023-04-07 14:39:17 +00:00
Xuehai Pan
1616a486c9 fix(gui/device): fix MIG device support 2023-04-07 14:37:34 +00:00
Xuehai Pan
790ffdf404 ver: bump version to v1.1.0 2023-04-07 14:13:11 +00:00
Xuehai Pan
2ecb5f4bcb feat(cli): support float number as snapshot interval (>= 0.25s) (#67) 2023-04-07 17:22:41 +08:00
Xuehai Pan
c883884073 refactor(api): move TTLCache usage to CLI-only (#66) 2023-04-07 16:51:07 +08:00