Commit graph

  • 5f8013c059 chore(workflows): update build workflows Xuehai Pan 2023-08-18 12:00:39 +00:00
  • 2c309cc5ec chore(workflows): update lint workflows Xuehai Pan 2023-08-18 11:19:42 +00:00
  • fc32f65aa1 feat(exporter): add support for Prometheus exporter Xuehai Pan 2023-05-01 16:20:39 +08:00
  • 9ff3ec3400 fix(api/libnvml): fix upstream changes for process info v3 APIs on 535.104.05 driver (#94) Xuehai Pan 2023-08-26 18:57:54 +08:00
  • c16587511d chore(api/process): remove attribute gpu_cc_protected_memory Xuehai Pan 2023-08-26 10:51:51 +00:00
  • d44e6b7411 docs(CHANGELOG): update CHANGELOG Xuehai Pan 2023-08-26 10:47:56 +00:00
  • 8e22a2cbed chore(pre-commit): update pre-commit hooks Xuehai Pan 2023-08-26 10:36:14 +00:00
  • 6ac518a775 fix(api/libnvml): fix upstream changes for process info v3 APIs on 535.104.05 driver Xuehai Pan 2023-08-26 10:35:18 +00:00
  • 9f3ff53425 refactor(api/libnvml): refactor symbol lookup logic Xuehai Pan 2023-08-26 10:30:10 +00:00
  • 6a9663b33f fix(api/libnvml): fix removal for process info v3 APIs on the upstream 535.98 driver (#89) Xuehai Pan 2023-08-17 16:33:53 +08:00
  • c89acafc1e docs(CHANGELOG): update CHANGELOG Xuehai Pan 2023-08-17 14:39:36 +08:00
  • 4e448c344f fix(api/libnvml): fix upstream removal for process info v3 APIs on 535.98 driver Xuehai Pan 2023-08-16 15:15:30 +08:00
  • ec4ad645d2 feat(api/device): add methods to query PCIe and NVLink throughput (#87) Xuehai Pan 2023-08-13 22:37:36 +08:00
  • a4bb5bfc7e docs(CHANGELOG): update changelog Xuehai Pan 2023-08-13 21:17:28 +08:00
  • c12b01701b fix(callbacks): fix a potential attribute errors for newer PyTorch-Lightning Xuehai Pan 2023-08-13 22:15:56 +08:00
  • d71eb5103a chore(pre-commit): update pre-commit hooks Xuehai Pan 2023-08-13 20:45:51 +08:00
  • 643393be53 style: style libnvml Xuehai Pan 2023-08-13 19:32:33 +08:00
  • 5184b7c9a7 chore(api/device): add extra interval argument for blocking Xuehai Pan 2023-08-08 16:51:13 +08:00
  • 54fe913323 chore(api/utils): use open condition for the right side Xuehai Pan 2023-08-07 00:42:10 +08:00
  • df1df528ab chore(api/libnvml): handle corrupted pynvml installation Xuehai Pan 2023-08-05 02:33:44 +08:00
  • ce098635f2 chore(pre-commit): update pre-commit hooks Xuehai Pan 2023-08-05 02:22:59 +08:00
  • b2dc47eec8 feat(api/device): add shortcut when there is no NVLink available Xuehai Pan 2023-08-05 02:23:29 +08:00
  • 4073d7da75 feat(api/device): add methods to query NVLink throughput Xuehai Pan 2023-08-05 01:37:28 +08:00
  • 2d479deecc feat(api/device): add methods to query PCI-e throughput Xuehai Pan 2023-08-04 21:00:33 +08:00
  • ef77b8b989 fix(api/device): use recent timestamp for GPU process utilization query (#85) Xuehai Pan 2023-08-04 18:55:57 +08:00
  • eacefb05f9 docs(CHANGELOG): update changelog Xuehai Pan 2023-08-04 18:44:38 +08:00
  • 27d22cae37 fix(api/device): add extra 0.25s timestamp margin for GPU process utilization Xuehai Pan 2023-08-04 16:36:41 +08:00
  • b99d90544d feat(api/process): update GPU status for gone processes Xuehai Pan 2023-08-04 15:47:55 +08:00
  • e41e0ea701 fix(api/device): use current epoch timestamp for process utilization Xuehai Pan 2023-08-04 15:42:07 +08:00
  • e776dae3a3 fix(api/device): remove extra timestamp margin for GPU process utilization Xuehai Pan 2023-08-03 21:27:28 +08:00
  • 0eae807fe8 Update pytorch_lightning.py Phúc H. Lê Khắc 2023-08-02 16:43:51 +02:00
  • f85dc71f46 Update pytorch_lightning.py Phúc H. Lê Khắc 2023-08-02 16:23:57 +02:00
  • ec53de75b4 ver: bump version to v1.2.0 v1.2.0 Xuehai Pan 2023-07-24 16:40:44 +08:00
  • 6265a68e39 chore(pre-commit): update pre-commit hooks Xuehai Pan 2023-07-24 16:37:10 +08:00
  • 11a03ced70 feat(api/collector): include last snapshot metrics in the log results (#80) Xuehai Pan 2023-07-17 20:48:39 +08:00
  • 06dbdd3aa0 chore(api/collector): last snapshot criteria Xuehai Pan 2023-07-17 18:35:18 +08:00
  • 886d35af05 docs(CHANGELOG): update changelog Xuehai Pan 2023-07-17 18:23:31 +08:00
  • 47e8fa868c chore(api/collector): pass timestamp as method argument Xuehai Pan 2023-07-17 18:18:46 +08:00
  • 9646556178 chore(api/collector): respect daemon thread snapshot interval Xuehai Pan 2023-07-17 01:07:04 +08:00
  • 41d23463b2 feat(api/collector): include last snapshot metrices in the log results Xuehai Pan 2023-07-17 00:47:42 +08:00
  • 872d20c65e chore(api/collector): lock snapshot intervals Xuehai Pan 2023-07-17 00:32:42 +08:00
  • c3487c03b6 fix(api/libnvml): fix process info support for NVIDIA R535 driver (CUDA 12.2+) (#79) Xuehai Pan 2023-07-17 00:22:53 +08:00
  • 9c7545feee deps(nvidia-ml-py): add nvidia-ml-py 12.535.77 to support list Xuehai Pan 2023-07-17 00:09:00 +08:00
  • db9fb6c9ce chore(api/process): add usedGpuCcProtectedMemory to process snapshot Xuehai Pan 2023-07-17 00:04:30 +08:00
  • 727a4322fe style(api/process): update method name Xuehai Pan 2023-07-16 23:58:13 +08:00
  • 3486b45b11 style(api/libnvml): update private function name Xuehai Pan 2023-07-16 23:45:18 +08:00
  • ce46b3ad1b chore(cli): remove unreachable warnings Xuehai Pan 2023-07-16 23:36:19 +08:00
  • 090cd6baa0 docs(api/libnvml): update docstrings Xuehai Pan 2023-07-16 23:13:15 +08:00
  • 29b047c18d feat(api/process): set used_gpu_cc_protected_memor for GpuProcess Xuehai Pan 2023-07-16 03:32:41 +08:00
  • 788a1fb44e docs(api/libnvml): add comments for type struct fields Xuehai Pan 2023-07-14 11:42:25 +08:00
  • ecb23a66c3 fix(api/libnvml): fix process info support for NVIDIA R535 driver Xuehai Pan 2023-07-14 11:24:12 +08:00
  • 04ac6a0efe deps(nvidia-ml-py): add nvidia-ml-py 11.525.131 to support list Xuehai Pan 2023-07-01 17:16:53 +08:00
  • 7c74e02eb0 chore(pre-commit): update pre-commit hooks Xuehai Pan 2023-07-01 17:15:08 +08:00
  • 7ebf9056fb chore(pre-commit): update pre-commit hooks Xuehai Pan 2023-06-07 06:42:15 +00:00
  • b5d64d58ba deps: pin pytorch-lightning while building documentation Xuehai Pan 2023-05-11 07:42:32 +00:00
  • 266edcb3be chore(pre-commit): update pre-commit hooks Xuehai Pan 2023-05-07 14:12:11 +00:00
  • f0b055bfcf feat(linter): mypy integration (#73) Xuehai Pan 2023-05-01 13:02:01 +08:00
  • 00ab574e85 chore(api): update type annotations Xuehai Pan 2023-05-01 04:58:19 +00:00
  • c730af2106 chore(api/device): update type annotations Xuehai Pan 2023-05-01 03:36:35 +00:00
  • 6697d54895 docs(CHANGELOG): update CHANGELOG.md Xuehai Pan 2023-05-01 01:58:26 +00:00
  • afd9ba2514 docs: add notes to set CUDA_DEVICE_ORDER="PCI_BUS_ID" Xuehai Pan 2023-05-01 02:35:12 +08:00
  • 4bb3da75f3 feat(linter): mypy integration Xuehai Pan 2023-04-30 16:35:41 +00:00
  • 2408735f54 chore: pre-commit autoupdate Xuehai Pan 2023-04-20 05:58:01 +00:00
  • 871c5ca248 style: prefer utf-8 over UTF-8 in code Xuehai Pan 2023-04-20 05:09:28 +00:00
  • 060ff27725 docs: improve grammar in README Xuehai Pan 2023-04-14 14:29:55 +00:00
  • 4093334972 fix(api/libcuda): fix inappropriate exception catching in function libcuda.cuDeviceGetUuid Xuehai Pan 2023-04-12 13:16:07 +08:00
  • fd1e4148f6 ver: bump version to v1.1.2 v1.1.2 Xuehai Pan 2023-04-11 13:28:16 +00:00
  • 7fca245b57 fix(api/device): further isolate the CUDA_VISIBLE_DEVICE parser in a subprocess (#70) Xuehai Pan 2023-04-11 21:26:12 +08:00
  • 12278ccce3 docs(CHANGELOG): update CHANGELOG.md Xuehai Pan 2023-04-10 12:56:59 +00:00
  • 21e0ea00e2 fix(api/device): further isolate the CUDA_VISIBLE_DEVICE parser in a subprocess Xuehai Pan 2023-04-10 12:55:21 +00:00
  • 5a0da9239b docs: add notes for upgrading pip for GitHub installation Xuehai Pan 2023-04-08 15:50:45 +00:00
  • 6b6aee7537 ver: bump version to v1.1.1 v1.1.1 Xuehai Pan 2023-04-07 14:39:07 +00:00
  • 1616a486c9 fix(gui/device): fix MIG device support Xuehai Pan 2023-04-07 14:37:34 +00:00
  • 790ffdf404 ver: bump version to v1.1.0 v1.1.0 Xuehai Pan 2023-04-07 14:11:16 +00:00
  • 2ecb5f4bcb feat(cli): support float number as snapshot interval (>= 0.25s) (#67) Xuehai Pan 2023-04-07 17:22:41 +08:00
  • 8db35947f2 docs(CHANGELOG): update CHANGELOG.md Xuehai Pan 2023-04-07 09:08:04 +00:00
  • ce66cac61f feat(cli): support float number as snapshot interval (>= 0.25s) Xuehai Pan 2023-04-07 09:03:51 +00:00
  • c883884073 refactor(api): move TTLCache usage to CLI-only (#66) Xuehai Pan 2023-04-07 16:51:07 +08:00
  • 7b54f40f97 feat(workflows): add import tests for Python 3.7 Xuehai Pan 2023-04-07 08:46:35 +00:00
  • dbc2f65b97 docs(CHANGELOG): update CHANGELOG.md Xuehai Pan 2023-04-07 08:41:40 +00:00
  • ac56c6a7ab refactor(api): move TTLCache usage to CLI-only Xuehai Pan 2023-04-07 03:24:41 +00:00
  • df42d0c0f0 chore(api/libnvml): always shutdown NVML handle Xuehai Pan 2023-04-07 03:14:44 +00:00
  • 394458ee65 deps(nvidia-ml-py): add nvidia-ml-py 11.525.112 to support list Xuehai Pan 2023-04-06 06:55:15 +00:00
  • 7cee2e32d3 chore(pre-commit): update pre-commit hooks Xuehai Pan 2023-03-30 12:50:51 +00:00
  • 7c50c0854d deps: pin tensorflow-cpu while building documentation Xuehai Pan 2023-03-28 01:03:10 +00:00
  • c95058296d deps: pin tensorflow-cpu while building documentation Xuehai Pan 2023-03-28 07:27:44 +00:00
  • 600c719e9b chore(pre-commit): [pre-commit.ci] autoupdate Xuehai Pan 2023-03-28 01:03:10 +00:00
  • e2982d0d4b deps: use tensorflow-cpu while building documentation Xuehai Pan 2023-03-27 12:12:46 +00:00
  • ae8a08d6ff chore: set default_stages in .pre-commit-config.yaml Xuehai Pan 2023-03-27 08:01:46 +00:00
  • 383cdea81e feat: add codespell integration Xuehai Pan 2023-03-27 07:38:17 +00:00
  • 598e5be4be simple change to accept float numbers as interval Sms-Rk 2023-03-27 00:00:25 +03:30
  • ab95ceea1e chore(pre-commit): [pre-commit.ci] autoupdate pre-commit-ci[bot] 2023-03-21 00:40:46 +00:00
  • 05284ec2f8 fix(select): fix type annotation Xuehai Pan 2023-03-20 13:57:07 +00:00
  • 27b573804d chore(api): remove redundant __str__ method when __repr__ is defined Xuehai Pan 2023-03-20 13:56:05 +00:00
  • 0bc40840a4 feat(gui/host): show more metrics (#59) Xuehai Pan 2023-03-16 20:26:02 +08:00
  • 56a60e9486 docs(CHANGELOG): update CHANGELOG.md Xuehai Pan 2023-03-16 11:37:17 +00:00
  • e00582a8f2 feat(gui/host): show more metrics Xuehai Pan 2023-03-16 11:25:41 +00:00
  • 20313d08bd feat(linter): ruff integration (#57) Xuehai Pan 2023-03-15 17:37:42 +08:00
  • 48f05a3dec test: add minimal import tests Xuehai Pan 2023-03-15 09:30:40 +00:00
  • 80b9d6f75c feat: ruff integration Xuehai Pan 2023-03-15 09:18:14 +00:00