Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / oap-project/cloudtik issues and pull requests

#1362 - Examples: Horovod on Spark examples GPU support

Pull Request - State: closed - Opened by jerrychenhf over 1 year ago

#1361 - Examples: torch checkpoint to save model in cpu location

Pull Request - State: closed - Opened by jerrychenhf over 1 year ago

#1360 - Examples: synthetic ImageNet example for PyTorch distributed

Pull Request - State: closed - Opened by jerrychenhf over 1 year ago

#1359 - Examples: fix the makedirs permission issue

Pull Request - State: closed - Opened by jerrychenhf over 1 year ago

#1358 - ML: Fix driver NIC issue for Horovod

Pull Request - State: closed - Opened by jerrychenhf over 1 year ago

#1357 - Examples: example folder change the name to examples

Pull Request - State: closed - Opened by jerrychenhf over 1 year ago

#1356 - Examples: PyTorch examples to support GPU

Pull Request - State: closed - Opened by jerrychenhf over 1 year ago

#1355 - Templates: Smaller head for standard and small GPU templates

Pull Request - State: closed - Opened by jerrychenhf over 1 year ago

#1354 - Templates: add very small GPU templates for use of testing

Pull Request - State: closed - Opened by jerrychenhf over 1 year ago

#1353 - Templates: make the GPU templates consistent on naming

Pull Request - State: closed - Opened by jerrychenhf over 1 year ago

#1352 - Alibaba Cloud: add integration test cases

Pull Request - State: closed - Opened by jerrychenhf over 1 year ago

#1351 - Example: add cluster examples for ml (CPU, GPU and oneAPI)

Pull Request - State: closed - Opened by jerrychenhf over 1 year ago

#1350 - Example: ML example for resnet50 with IPEX (need workaround fix)

Pull Request - State: closed - Opened by jerrychenhf over 1 year ago

#1349 - Core: no wait for minimal nodes with an operating quorum

Pull Request - State: closed - Opened by jerrychenhf over 1 year ago

#1348 - Core: fix the quorum launch check logic

Pull Request - State: closed - Opened by jerrychenhf over 1 year ago

#1347 - Core: implement the quorum management of minimal nodes

Pull Request - State: closed - Opened by jerrychenhf over 1 year ago

#1346 - Core: allow minimal nodes cluster to avoid scale on failure

Pull Request - State: closed - Opened by jerrychenhf over 1 year ago

#1345 - Examples: add zookeeper test example

Pull Request - State: closed - Opened by jerrychenhf over 1 year ago

#1344 - Tools: install dlib which was removed from the core ml runtime

Pull Request - State: closed - Opened by jerrychenhf over 1 year ago

#1343 - Tools: use the fixed intelai-models commit

Pull Request - State: closed - Opened by jerrychenhf over 1 year ago

#1342 - Core: by default disable head automatic runtime detection

Pull Request - State: closed - Opened by jerrychenhf over 1 year ago

#1341 - Core: commands to handle GPU resource info

Pull Request - State: closed - Opened by jerrychenhf over 1 year ago

#1340 - Providers: auto detect GPU resources from instance type

Pull Request - State: closed - Opened by jerrychenhf over 1 year ago

#1339 - Azure: built-in GPU templates for Azure

Pull Request - State: closed - Opened by jerrychenhf over 1 year ago

#1338 - AWS: refine GPU templates with a base config

Pull Request - State: closed - Opened by jerrychenhf over 1 year ago

#1337 - GCP: Fix the wait for driver

Pull Request - State: closed - Opened by jerrychenhf over 1 year ago

#1336 - GCP: check driver installation only on worker

Pull Request - State: closed - Opened by jerrychenhf over 1 year ago

#1335 - AWS: gpu templates rename to lower case

Pull Request - State: closed - Opened by jerrychenhf over 1 year ago

#1334 - GCP: built-in templates for GPU instances

Pull Request - State: closed - Opened by jerrychenhf over 1 year ago

#1332 - Core: config merge support advanced list appending

Pull Request - State: closed - Opened by jerrychenhf over 1 year ago

#1331 - AWS: aws gpu templates

Pull Request - State: closed - Opened by jerrychenhf over 1 year ago

#1330 - ML: consistent GPU cuda libraries version

Pull Request - State: closed - Opened by jerrychenhf over 1 year ago

#1329 - GCP: Fix the order of setting image source

Pull Request - State: closed - Opened by jerrychenhf over 1 year ago

#1328 - AWS: update the latest image ids of the regions for GPU

Pull Request - State: closed - Opened by jerrychenhf over 1 year ago

#1327 - Core: docker to use GPU tagged image based on runtime

Pull Request - State: closed - Opened by jerrychenhf over 1 year ago

#1326 - Alibaba Cloud: use cpu or gpu image based on runtime

Pull Request - State: closed - Opened by jerrychenhf over 1 year ago

#1325 - GCP: choose the cpu or gpu image based on runtime

Pull Request - State: closed - Opened by jerrychenhf over 1 year ago

#1324 - Azure: choose cpu or gpu image at bootstrap

Pull Request - State: closed - Opened by jerrychenhf over 1 year ago

#1323 - AWS: auto configure the image id if gpu is configured

Pull Request - State: closed - Opened by jerrychenhf over 1 year ago

#1322 - AWS: refine the database instance management for workspace

Pull Request - State: closed - Opened by jerrychenhf over 1 year ago

#1321 - ML: Fix the ML runtime docker to set the right env

Pull Request - State: closed - Opened by jerrychenhf over 1 year ago

#1320 - Dev: release docker with GPU options

Pull Request - State: closed - Opened by jerrychenhf over 1 year ago

#1319 - ML: Initial code for ML to support GPU

Pull Request - State: closed - Opened by jerrychenhf over 1 year ago

#1318 - Support cloud database for AWS.

Pull Request - State: closed - Opened by haojinIntel over 1 year ago - 1 comment

#1317 - Get HuaweiCloud provider default cluster image

Pull Request - State: closed - Opened by kiwik over 1 year ago - 1 comment

#1316 - Benchmarks: Add models original code about DLRM dist training.

Pull Request - State: closed - Opened by yao531441 over 1 year ago

#1315 - Dev: improve the release docker to release image individually

Pull Request - State: closed - Opened by jerrychenhf over 1 year ago

#1314 - ML: upgrade MLflow from 2.1.1 to 2.2.2

Pull Request - State: closed - Opened by jerrychenhf over 1 year ago

#1313 - Copy the source code of RNNT and SSD-RESNET to CloudTik.

Pull Request - State: closed - Opened by haojinIntel over 1 year ago - 3 comments

#1312 - Add op_svc_userid into worker node metadata

Pull Request - State: closed - Opened by kiwik over 1 year ago

#1310 - ML: upgrade TensorFlow to 2.12.0 for oneAPI ML runtime

Pull Request - State: closed - Opened by jerrychenhf over 1 year ago

#1309 - ML: Horovod upgrade to 0.27.0 for TensorFlow 2.12.0

Pull Request - State: closed - Opened by jerrychenhf over 1 year ago

#1308 - Benchmarks: support to run maskcnn with or without IPEX.

Pull Request - State: closed - Opened by haojinIntel over 1 year ago - 1 comment

#1307 - Add source code for maskcnn of ai-model

Pull Request - State: closed - Opened by haojinIntel over 1 year ago

#1306 - Patch ai models during running bootstrap-models.sh

Pull Request - State: closed - Opened by haojinIntel over 1 year ago - 1 comment

#1304 - Support to run training or inference for ssd-resnet34 without IPEX.

Pull Request - State: closed - Opened by haojinIntel over 1 year ago - 2 comments

#1293 - Add HUAWEICLOUD integration test

Pull Request - State: closed - Opened by kiwik over 1 year ago

#1194 - Can cloudtik support Alicloud?

Issue - State: closed - Opened by george-gu-2021 over 1 year ago - 2 comments

#1011 - [Feature] Add HuaweiCloud provider

Issue - State: open - Opened by kiwik almost 2 years ago - 49 comments