Commit graph

511 commits

Author SHA1 Message Date
Xuehai Pan
65150a4d52 chore(core/process): use WeakValueDictionary to cache instances
Signed-off-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
2022-07-10 21:02:15 +08:00
Xuehai Pan
38cdf5fbc4 docs(core/process): add docstrings for exceptions
Signed-off-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
2022-07-10 16:03:50 +08:00
Xuehai Pan
06821b5f1b docs: add section separators in README.md
Signed-off-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
2022-07-06 17:27:44 +08:00
Xuehai Pan
2d2b16acc3 ver: bump version to v0.6.2
Signed-off-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
2022-07-05 16:04:05 +08:00
Xuehai Pan
cf6ab08e66 chore(core/libnvml): limit the number of saved FunctionNotFound exceptions in logging
Signed-off-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
2022-07-05 15:51:02 +08:00
Xuehai Pan
1459e0828f feat(core/device): add method is_leaf_device and to_leaf_devices
Signed-off-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
2022-07-05 15:23:17 +08:00
Xuehai Pan
5023a2489a Merge pull request #24 from XuehaiPan/pynvml
2022-07-05 13:25:47 +08:00
Xuehai Pan
b214e0a713 chore(cli): add message for missing functions on CUDA 10.x driver
Signed-off-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
2022-07-05 11:35:31 +08:00
Xuehai Pan
9406192302 feat(setup): add extra options to specify nvidia-ml-py version
Signed-off-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
2022-07-05 11:32:02 +08:00
Xuehai Pan
a35d3b9caf docs(core/collector): update docstrings
Signed-off-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
2022-07-05 10:54:33 +08:00
Xuehai Pan
2906d4f043 docs: add more examples
Signed-off-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
2022-07-03 21:18:54 +08:00
Xuehai Pan
cf9e10dc7e docs: update docstring
Signed-off-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
2022-07-03 19:06:13 +08:00
Xuehai Pan
05d1b3dbd0 refactor(core/process): get GPU instance ID and compute instance ID from c_nvmlProcessInfo_t
Signed-off-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
2022-07-03 19:04:30 +08:00
Xuehai Pan
eaca7a640c docs: add quick start
Signed-off-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
2022-07-03 18:37:29 +08:00
Xuehai Pan
2326a72a0a docs: update badges
Signed-off-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
2022-07-02 20:05:30 +08:00
Xuehai Pan
063b541a5f docs: update badges
Signed-off-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
2022-07-02 20:03:17 +08:00
Xuehai Pan
467f4cebc1 docs: use scripts to bypass bugs in the readthedocs build system
Signed-off-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
2022-07-02 19:04:40 +08:00
Xuehai Pan
9f69a755fa docs: resolve sphinx warnings
Signed-off-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
2022-07-02 18:37:22 +08:00
Xuehai Pan
493016b866 docs: resolve sphinx warnings
Signed-off-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
2022-07-02 17:17:55 +08:00
Xuehai Pan
1324777643 docs: resolve sphinx warnings
Signed-off-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
2022-07-02 17:05:11 +08:00
Xuehai Pan
d5f7ada23b docs: resolve sphinx warnings
Signed-off-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
2022-07-02 16:54:06 +08:00
Xuehai Pan
13cfad8eec chore(core/device): alias Device.cuda to CudaDevice
Signed-off-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
2022-07-02 16:03:34 +08:00
Xuehai Pan
d1b8bd44ce ver: bump version to v0.6.1
Signed-off-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
2022-07-01 16:29:28 +08:00
Xuehai Pan
1b1b399321 docs(core/device): fix docstring
Signed-off-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
2022-07-01 15:14:01 +08:00
Xuehai Pan
894a70e5e0 fix(core/device): fix commented property
Signed-off-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
2022-07-01 15:04:38 +08:00
Xuehai Pan
79555da19c chore: add .readthedocs.yaml
Signed-off-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
2022-07-01 14:19:42 +08:00
Xuehai Pan
61ae81fdf8 Merge pull request #22 from XuehaiPan/docs
2022-07-01 13:57:24 +08:00
Xuehai Pan
7512badf82 docs: update link URLs
Signed-off-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
2022-07-01 13:46:46 +08:00
Xuehai Pan
eff645db1a style: resolve pylint warnings
Signed-off-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
2022-07-01 13:24:33 +08:00
Xuehai Pan
102ee45960 docs: add Sphinx-based documents
Signed-off-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
2022-07-01 13:01:22 +08:00
Xuehai Pan
3bb17f6cc9 docs: remove todo list in README.md
Signed-off-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
2022-07-01 10:40:52 +08:00
Xuehai Pan
fab7cf8548 docs(core/utils): add detailed documentation for utilities
Signed-off-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
2022-07-01 10:40:52 +08:00
Xuehai Pan
06c5443a75 docs(core/process): add detailed documentation for process classes
Signed-off-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
2022-07-01 01:51:27 +08:00
Xuehai Pan
db29062e4c docs(core/host): add detailed documentation for psutil shortcuts
Signed-off-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
2022-07-01 01:51:27 +08:00
Xuehai Pan
74b6bab3ca docs(core/libnvml): add detailed documentation for NVML bindings
Signed-off-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
2022-07-01 01:51:27 +08:00
Xuehai Pan
209ec7faef docs(core/device): add detailed documentation for device classes
Signed-off-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
2022-07-01 01:51:26 +08:00
Xuehai Pan
cd801fa8d9 chore(install-nvidia-driver): abort in WSL
Signed-off-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
2022-06-30 18:20:47 +08:00
Xuehai Pan
ef6e523666 fix(core/device): fix CUDA device enumeration for MIG devices
Signed-off-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
2022-06-29 17:23:49 +08:00
Xuehai Pan
226f065a8d docs: update README.md for API reference
Signed-off-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
2022-06-28 16:39:33 +08:00
Xuehai Pan
f5218951a9 chore(core/utils): proper indentation for __str__ of nested snapshot
Signed-off-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
2022-06-28 16:30:22 +08:00
Xuehai Pan
98e7d296a4 feat(core/process): add gpu instance ID and compute instance ID to class GpuProcess
Signed-off-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
2022-06-28 13:00:14 +08:00
Xuehai Pan
dbbcf52b0f docs: update README.md for ResourceMetricCollector
Signed-off-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
2022-06-26 21:06:08 +08:00
Xuehai Pan
cd1e8f28e7 ver: bump version to v0.6.0
Signed-off-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
2022-06-26 20:11:39 +08:00
Xuehai Pan
0672457692 Merge pull request #8 from XuehaiPan/mig-support
2022-06-26 20:03:29 +08:00
Xuehai Pan
7d385e9f09 style: update pylint magic comments
Signed-off-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
2022-06-26 20:03:04 +08:00
Xuehai Pan
32c8ee8e61 Merge pull request #21 from XuehaiPan/collector
2022-06-26 19:58:20 +08:00
Xuehai Pan
0cab53e95c chore(core/collector): close statistics for missing keys
Signed-off-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
2022-06-26 19:12:06 +08:00
Xuehai Pan
7f413e4cd5 docs: update README.md for ResourceMetricCollector
Signed-off-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
2022-06-26 19:11:10 +08:00
Xuehai Pan
4e0c59836a chore(callbacks): add alias for PyTorchLightning
Signed-off-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
2022-06-26 17:44:18 +08:00
Xuehai Pan
229f0e8a95 docs: update README.md for ResourceMetricCollector
Signed-off-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
2022-06-26 17:39:11 +08:00