Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / lsds/KungFu issues and pull requests
#369 - add a tiny benchmark
Pull Request -
State: closed - Opened by lgarithm 11 months ago
#368 - Update README.md
Pull Request -
State: open - Opened by luomai over 2 years ago
#367 - Bump numpy from 1.16 to 1.22.0 in /tests
Pull Request -
State: closed - Opened by dependabot[bot] over 2 years ago
- 1 comment
Labels: dependencies
#366 - Bump numpy from 1.16 to 1.22.0 in /docs
Pull Request -
State: closed - Opened by dependabot[bot] over 2 years ago
- 1 comment
Labels: dependencies
#365 - With PairAveragingOptimizer, is it possible that two workers in different iterations average their models?
Issue -
State: closed - Opened by ymlei almost 3 years ago
- 1 comment
#364 - remove one too many optimizer
Pull Request -
State: closed - Opened by marwage about 3 years ago
#363 - support -elastic-mode=reload
Pull Request -
State: closed - Opened by lgarithm over 3 years ago
#362 - support run with `python -m kungfu.cmd ...`
Pull Request -
State: closed - Opened by lgarithm over 3 years ago
#361 - Fix linking
Pull Request -
State: closed - Opened by marwage over 3 years ago
#360 - A question about Horovod central coordinator in the paper of KungFu
Issue -
State: open - Opened by JohanOu over 3 years ago
- 2 comments
#359 - log once
Pull Request -
State: closed - Opened by lgarithm over 3 years ago
- 1 comment
#358 - use singleton instead of extern variable
Pull Request -
State: closed - Opened by lgarithm over 3 years ago
#357 - support nccl subset allreduce
Pull Request -
State: closed - Opened by lgarithm over 3 years ago
#356 - add subset_all_reduce and queue
Pull Request -
State: closed - Opened by lgarithm over 3 years ago
#355 - add ncclAllGather binding
Pull Request -
State: closed - Opened by lgarithm over 3 years ago
- 1 comment
#354 - add a configure flag: --disable-cxx11-abi
Pull Request -
State: closed - Opened by lgarithm over 3 years ago
#353 - support init from json config
Pull Request -
State: open - Opened by lgarithm over 3 years ago
#352 - run github action on PR
Pull Request -
State: closed - Opened by lgarithm over 3 years ago
#351 - remove travis CI
Pull Request -
State: closed - Opened by lgarithm over 3 years ago
- 1 comment
#350 - Failure recovery.
Pull Request -
State: closed - Opened by DingtongHan over 3 years ago
- 2 comments
#349 - fix #348
Pull Request -
State: closed - Opened by lgarithm almost 4 years ago
#348 - code loss
Issue -
State: closed - Opened by rankeey almost 4 years ago
- 2 comments
#347 - Error from pytoch demo
Issue -
State: closed - Opened by rankeey almost 4 years ago
- 1 comment
#346 - support python native resize from config server
Pull Request -
State: closed - Opened by lgarithm almost 4 years ago
#345 - pytorch elastic example using python native API
Pull Request -
State: closed - Opened by lgarithm almost 4 years ago
#344 - export resize as plain python API
Pull Request -
State: closed - Opened by lgarithm almost 4 years ago
#343 - support launch with python multiprocess
Pull Request -
State: closed - Opened by lgarithm almost 4 years ago
- 5 comments
#342 - fix numa_node_count
Pull Request -
State: closed - Opened by lgarithm about 4 years ago
#341 - update NCCL binding
Pull Request -
State: closed - Opened by lgarithm about 4 years ago
#340 - build NCCL binding as a library
Pull Request -
State: closed - Opened by lgarithm about 4 years ago
#339 - Is Windows supported?
Issue -
State: open - Opened by blacksailer about 4 years ago
#338 - bind to cpu socket
Pull Request -
State: closed - Opened by lgarithm about 4 years ago
#337 - support set_affinity
Pull Request -
State: closed - Opened by lgarithm about 4 years ago
#336 - [WIP] update documents
Pull Request -
State: open - Opened by lgarithm about 4 years ago
#335 - fix resize
Pull Request -
State: closed - Opened by lgarithm about 4 years ago
#334 - support boot from MPI for debug
Pull Request -
State: open - Opened by lgarithm about 4 years ago
#333 - Access to Adaptive Batch Size Policy
Issue -
State: open - Opened by aleksficek about 4 years ago
#332 - [WIP] use `detached` instead of `not keep` in resize related APIs
Pull Request -
State: closed - Opened by lgarithm over 4 years ago
#331 - support disk_input in benchmark
Pull Request -
State: closed - Opened by lgarithm over 4 years ago
#330 - elastic example for TF2
Pull Request -
State: closed - Opened by lgarithm over 4 years ago
#329 - Check in TensorFlow policy APIs and examples.
Pull Request -
State: closed - Opened by lgarithm over 4 years ago
#328 - cherrypicks
Pull Request -
State: closed - Opened by lgarithm over 4 years ago
#327 - create some helper functions to manage kungfu global variables
Pull Request -
State: closed - Opened by lgarithm over 4 years ago
#326 - builtin config server
Pull Request -
State: closed - Opened by lgarithm over 4 years ago
#325 - new monitoring API: egress_rates
Pull Request -
State: closed - Opened by lgarithm over 4 years ago
#324 - make resize API consistent with paper
Pull Request -
State: closed - Opened by lgarithm over 4 years ago
#323 - use a dedicated thread for NCCL operations (#317)
Pull Request -
State: closed - Opened by lgarithm over 4 years ago
#322 - new resize API
Pull Request -
State: closed - Opened by lgarithm over 4 years ago
#321 - fix shape
Pull Request -
State: closed - Opened by lgarithm over 4 years ago
#320 - Pytorch allgather support
Pull Request -
State: closed - Opened by marwage over 4 years ago
#319 - Reimplement config server using REST style API
Pull Request -
State: closed - Opened by lgarithm over 4 years ago
#318 - fix pytorch include_dirs
Pull Request -
State: closed - Opened by lgarithm over 4 years ago
#317 - use a dedicated thread for NCCL operations
Issue -
State: closed - Opened by lgarithm over 4 years ago
#316 - Pass over README.md
Pull Request -
State: closed - Opened by prp over 4 years ago
#315 - [WIP] Artifact Evaluation README
Pull Request -
State: open - Opened by luomai over 4 years ago
#314 - Elastic scaling policy
Pull Request -
State: closed - Opened by marwage over 4 years ago
#313 - adaptive communication strategy policy
Pull Request -
State: closed - Opened by kfertakis over 4 years ago
#312 - provide rank and cluster_size as standalone TF OPs
Pull Request -
State: closed - Opened by lgarithm over 4 years ago
#311 - Kf strategy adapt merge
Pull Request -
State: closed - Opened by kfertakis over 4 years ago
#310 - update public API docs
Pull Request -
State: closed - Opened by lgarithm over 4 years ago
#309 - initial support for pytorch
Pull Request -
State: closed - Opened by lgarithm over 4 years ago
- 2 comments
#308 - support NCCL in elastic mode
Pull Request -
State: closed - Opened by lgarithm over 4 years ago
#307 - reimplement NCCL scheduler in C++
Pull Request -
State: closed - Opened by lgarithm over 4 years ago
- 1 comment
#306 - make kungfu-run a Python API
Pull Request -
State: closed - Opened by lgarithm over 4 years ago
#305 - kungfu-notify-start (#302)
Pull Request -
State: closed - Opened by lgarithm over 4 years ago
- 5 comments
#304 - Support for share-memory channels?
Issue -
State: closed - Opened by luomai over 4 years ago
- 1 comment
#303 - add an option to enable XLA
Pull Request -
State: closed - Opened by lgarithm over 4 years ago
- 2 comments
#302 - [doc] request parameters doc when the -init-version=-1
Issue -
State: closed - Opened by zrss over 4 years ago
- 5 comments
#301 - initial support for hierarchical-nccl
Pull Request -
State: closed - Opened by lgarithm over 4 years ago
- 2 comments
#300 - Performance drops when TensorFlow experimental XLA JIT is enabled.
Issue -
State: closed - Opened by rankeey over 4 years ago
- 6 comments
#299 - initial support for local/cross/inplace operations
Pull Request -
State: closed - Opened by lgarithm over 4 years ago
#298 - cleanup build warnings
Pull Request -
State: closed - Opened by lgarithm over 4 years ago
#297 - kungfu job is hang in a inconsistent version when i scale down/up mutiple times
Issue -
State: closed - Opened by zrss over 4 years ago
- 14 comments
#296 - the kungfu-job is hang when it scale down
Issue -
State: closed - Opened by zrss over 4 years ago
- 2 comments
#295 - wait new runner (#294)
Pull Request -
State: closed - Opened by lgarithm over 4 years ago
#294 - failed to establish connection to the newly runner
Issue -
State: closed - Opened by zrss over 4 years ago
- 5 comments
#293 - fix remote runner
Pull Request -
State: closed - Opened by lgarithm over 4 years ago
#292 - cleanup
Pull Request -
State: closed - Opened by lgarithm over 4 years ago
#291 - fix chdir
Pull Request -
State: closed - Opened by lgarithm over 4 years ago
#290 - simplify build dependency
Pull Request -
State: closed - Opened by lgarithm over 4 years ago
#289 - check in experiment code
Pull Request -
State: closed - Opened by lgarithm over 4 years ago
#288 - new APIs
Pull Request -
State: closed - Opened by lgarithm over 4 years ago
#287 - scaling experiments
Pull Request -
State: closed - Opened by lgarithm over 4 years ago
#286 - cleanup
Pull Request -
State: closed - Opened by lgarithm over 4 years ago
#285 - LD bug workaround
Pull Request -
State: closed - Opened by lgarithm over 4 years ago
#284 - Inconsistency detected by ld.so
Issue -
State: closed - Opened by lgarithm over 4 years ago
#283 - fix HTTP connection leak
Pull Request -
State: closed - Opened by lgarithm over 4 years ago
#282 - fix typo on returning error from Server
Pull Request -
State: closed - Opened by kfertakis over 4 years ago
#281 - Support real global batch normalisation
Issue -
State: closed - Opened by luomai over 4 years ago
- 2 comments
#280 - support AllReduce with customised tree
Pull Request -
State: closed - Opened by lgarithm over 4 years ago
- 1 comment
#277 - Elastic hook can't support training from checkpoint.
Issue -
State: closed - Opened by rankeey almost 5 years ago
#276 - After remove the worker from the cluster, it is better to set the rank id to -1.
Issue -
State: closed - Opened by rankeey almost 5 years ago
- 3 comments
#271 - fix NCCL
Pull Request -
State: closed - Opened by lgarithm almost 5 years ago
- 2 comments
#266 - panic error
Issue -
State: closed - Opened by rankeey almost 5 years ago
- 3 comments
#263 - bert demo question
Issue -
State: closed - Opened by rankeey almost 5 years ago
- 4 comments
#253 - Segmentation fault on alpine Linux
Issue -
State: closed - Opened by lgarithm almost 5 years ago
- 1 comment
#248 - Lower speed
Issue -
State: closed - Opened by sondv7 almost 5 years ago
- 4 comments
#217 - support -hostfile flag
Issue -
State: closed - Opened by lgarithm about 5 years ago
#205 - Andrei alphazero
Pull Request -
State: closed - Opened by andrei3131 about 5 years ago
- 2 comments
#200 - [Estimator] Sometime throws an dataset end_of_sequence error.
Issue -
State: closed - Opened by marwage about 5 years ago