Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / lsds/KungFu issues and pull requests

#368 - Update README.md

Pull Request - State: open - Opened by luomai about 2 years ago

#367 - Bump numpy from 1.16 to 1.22.0 in /tests

Pull Request - State: closed - Opened by dependabot[bot] over 2 years ago - 1 comment
Labels: dependencies

#366 - Bump numpy from 1.16 to 1.22.0 in /docs

Pull Request - State: closed - Opened by dependabot[bot] over 2 years ago - 1 comment
Labels: dependencies

#364 - remove one too many optimizer

Pull Request - State: closed - Opened by marwage almost 3 years ago

#363 - support -elastic-mode=reload

Pull Request - State: closed - Opened by lgarithm about 3 years ago

#362 - support run with `python -m kungfu.cmd ...`

Pull Request - State: closed - Opened by lgarithm about 3 years ago

#361 - Fix linking

Pull Request - State: closed - Opened by marwage about 3 years ago

#360 - A question about Horovod central coordinator in the paper of KungFu

Issue - State: open - Opened by JohanOu about 3 years ago - 2 comments

#359 - log once

Pull Request - State: closed - Opened by lgarithm about 3 years ago - 1 comment

#358 - use singleton instead of extern variable

Pull Request - State: closed - Opened by lgarithm about 3 years ago

#357 - support nccl subset allreduce

Pull Request - State: closed - Opened by lgarithm about 3 years ago

#356 - add subset_all_reduce and queue

Pull Request - State: closed - Opened by lgarithm about 3 years ago

#355 - add ncclAllGather binding

Pull Request - State: closed - Opened by lgarithm about 3 years ago - 1 comment

#354 - add a configure flag: --disable-cxx11-abi

Pull Request - State: closed - Opened by lgarithm over 3 years ago

#353 - support init from json config

Pull Request - State: open - Opened by lgarithm over 3 years ago

#352 - run github action on PR

Pull Request - State: closed - Opened by lgarithm over 3 years ago

#351 - remove travis CI

Pull Request - State: closed - Opened by lgarithm over 3 years ago - 1 comment

#350 - Failure recovery.

Pull Request - State: closed - Opened by DingtongHan over 3 years ago - 2 comments

#349 - fix #348

Pull Request - State: closed - Opened by lgarithm over 3 years ago

#348 - code loss

Issue - State: closed - Opened by rankeey over 3 years ago - 2 comments

#347 - Error from pytoch demo

Issue - State: closed - Opened by rankeey over 3 years ago - 1 comment

#346 - support python native resize from config server

Pull Request - State: closed - Opened by lgarithm over 3 years ago

#345 - pytorch elastic example using python native API

Pull Request - State: closed - Opened by lgarithm over 3 years ago

#344 - export resize as plain python API

Pull Request - State: closed - Opened by lgarithm over 3 years ago

#343 - support launch with python multiprocess

Pull Request - State: closed - Opened by lgarithm over 3 years ago - 5 comments

#342 - fix numa_node_count

Pull Request - State: closed - Opened by lgarithm over 3 years ago

#341 - update NCCL binding

Pull Request - State: closed - Opened by lgarithm almost 4 years ago

#340 - build NCCL binding as a library

Pull Request - State: closed - Opened by lgarithm almost 4 years ago

#339 - Is Windows supported?

Issue - State: open - Opened by blacksailer almost 4 years ago

#338 - bind to cpu socket

Pull Request - State: closed - Opened by lgarithm almost 4 years ago

#337 - support set_affinity

Pull Request - State: closed - Opened by lgarithm almost 4 years ago

#336 - [WIP] update documents

Pull Request - State: open - Opened by lgarithm almost 4 years ago

#335 - fix resize

Pull Request - State: closed - Opened by lgarithm almost 4 years ago

#334 - support boot from MPI for debug

Pull Request - State: open - Opened by lgarithm almost 4 years ago

#333 - Access to Adaptive Batch Size Policy

Issue - State: open - Opened by aleksficek almost 4 years ago

#332 - [WIP] use `detached` instead of `not keep` in resize related APIs

Pull Request - State: closed - Opened by lgarithm almost 4 years ago

#331 - support disk_input in benchmark

Pull Request - State: closed - Opened by lgarithm almost 4 years ago

#330 - elastic example for TF2

Pull Request - State: closed - Opened by lgarithm almost 4 years ago

#329 - Check in TensorFlow policy APIs and examples.

Pull Request - State: closed - Opened by lgarithm almost 4 years ago

#328 - cherrypicks

Pull Request - State: closed - Opened by lgarithm almost 4 years ago

#327 - create some helper functions to manage kungfu global variables

Pull Request - State: closed - Opened by lgarithm almost 4 years ago

#326 - builtin config server

Pull Request - State: closed - Opened by lgarithm about 4 years ago

#325 - new monitoring API: egress_rates

Pull Request - State: closed - Opened by lgarithm about 4 years ago

#324 - make resize API consistent with paper

Pull Request - State: closed - Opened by lgarithm about 4 years ago

#323 - use a dedicated thread for NCCL operations (#317)

Pull Request - State: closed - Opened by lgarithm about 4 years ago

#322 - new resize API

Pull Request - State: closed - Opened by lgarithm about 4 years ago

#321 - fix shape

Pull Request - State: closed - Opened by lgarithm about 4 years ago

#320 - Pytorch allgather support

Pull Request - State: closed - Opened by marwage about 4 years ago

#319 - Reimplement config server using REST style API

Pull Request - State: closed - Opened by lgarithm about 4 years ago

#318 - fix pytorch include_dirs

Pull Request - State: closed - Opened by lgarithm about 4 years ago

#317 - use a dedicated thread for NCCL operations

Issue - State: closed - Opened by lgarithm about 4 years ago

#316 - Pass over README.md

Pull Request - State: closed - Opened by prp about 4 years ago

#315 - [WIP] Artifact Evaluation README

Pull Request - State: open - Opened by luomai about 4 years ago

#314 - Elastic scaling policy

Pull Request - State: closed - Opened by marwage about 4 years ago

#313 - adaptive communication strategy policy

Pull Request - State: closed - Opened by kfertakis about 4 years ago

#312 - provide rank and cluster_size as standalone TF OPs

Pull Request - State: closed - Opened by lgarithm about 4 years ago

#311 - Kf strategy adapt merge

Pull Request - State: closed - Opened by kfertakis about 4 years ago

#310 - update public API docs

Pull Request - State: closed - Opened by lgarithm about 4 years ago

#309 - initial support for pytorch

Pull Request - State: closed - Opened by lgarithm about 4 years ago - 2 comments

#308 - support NCCL in elastic mode

Pull Request - State: closed - Opened by lgarithm over 4 years ago

#307 - reimplement NCCL scheduler in C++

Pull Request - State: closed - Opened by lgarithm over 4 years ago - 1 comment

#306 - make kungfu-run a Python API

Pull Request - State: closed - Opened by lgarithm over 4 years ago

#305 - kungfu-notify-start (#302)

Pull Request - State: closed - Opened by lgarithm over 4 years ago - 5 comments

#304 - Support for share-memory channels?

Issue - State: closed - Opened by luomai over 4 years ago - 1 comment

#303 - add an option to enable XLA

Pull Request - State: closed - Opened by lgarithm over 4 years ago - 2 comments

#302 - [doc] request parameters doc when the -init-version=-1

Issue - State: closed - Opened by zrss over 4 years ago - 5 comments

#301 - initial support for hierarchical-nccl

Pull Request - State: closed - Opened by lgarithm over 4 years ago - 2 comments

#300 - Performance drops when TensorFlow experimental XLA JIT is enabled.

Issue - State: closed - Opened by rankeey over 4 years ago - 6 comments

#299 - initial support for local/cross/inplace operations

Pull Request - State: closed - Opened by lgarithm over 4 years ago

#298 - cleanup build warnings

Pull Request - State: closed - Opened by lgarithm over 4 years ago

#297 - kungfu job is hang in a inconsistent version when i scale down/up mutiple times

Issue - State: closed - Opened by zrss over 4 years ago - 14 comments

#296 - the kungfu-job is hang when it scale down

Issue - State: closed - Opened by zrss over 4 years ago - 2 comments

#295 - wait new runner (#294)

Pull Request - State: closed - Opened by lgarithm over 4 years ago

#294 - failed to establish connection to the newly runner

Issue - State: closed - Opened by zrss over 4 years ago - 5 comments

#293 - fix remote runner

Pull Request - State: closed - Opened by lgarithm over 4 years ago

#292 - cleanup

Pull Request - State: closed - Opened by lgarithm over 4 years ago

#291 - fix chdir

Pull Request - State: closed - Opened by lgarithm over 4 years ago

#290 - simplify build dependency

Pull Request - State: closed - Opened by lgarithm over 4 years ago

#289 - check in experiment code

Pull Request - State: closed - Opened by lgarithm over 4 years ago

#288 - new APIs

Pull Request - State: closed - Opened by lgarithm over 4 years ago

#287 - scaling experiments

Pull Request - State: closed - Opened by lgarithm over 4 years ago

#286 - cleanup

Pull Request - State: closed - Opened by lgarithm over 4 years ago

#285 - LD bug workaround

Pull Request - State: closed - Opened by lgarithm over 4 years ago

#284 - Inconsistency detected by ld.so

Issue - State: closed - Opened by lgarithm over 4 years ago

#283 - fix HTTP connection leak

Pull Request - State: closed - Opened by lgarithm over 4 years ago

#282 - fix typo on returning error from Server

Pull Request - State: closed - Opened by kfertakis over 4 years ago

#281 - Support real global batch normalisation

Issue - State: closed - Opened by luomai over 4 years ago - 2 comments

#280 - support AllReduce with customised tree

Pull Request - State: closed - Opened by lgarithm over 4 years ago - 1 comment

#277 - Elastic hook can't support training from checkpoint.

Issue - State: closed - Opened by rankeey over 4 years ago

#276 - After remove the worker from the cluster, it is better to set the rank id to -1.

Issue - State: closed - Opened by rankeey over 4 years ago - 3 comments

#271 - fix NCCL

Pull Request - State: closed - Opened by lgarithm over 4 years ago - 2 comments

#266 - panic error

Issue - State: closed - Opened by rankeey over 4 years ago - 3 comments

#263 - bert demo question

Issue - State: closed - Opened by rankeey over 4 years ago - 4 comments

#253 - Segmentation fault on alpine Linux

Issue - State: closed - Opened by lgarithm over 4 years ago - 1 comment

#248 - Lower speed

Issue - State: closed - Opened by sondv7 over 4 years ago - 4 comments

#217 - support -hostfile flag

Issue - State: closed - Opened by lgarithm almost 5 years ago

#205 - Andrei alphazero

Pull Request - State: closed - Opened by andrei3131 almost 5 years ago - 2 comments

#200 - [Estimator] Sometime throws an dataset end_of_sequence error.

Issue - State: closed - Opened by marwage almost 5 years ago

#199 - [Keras example] Downloading data might conflict among parallel peers

Issue - State: closed - Opened by marwage almost 5 years ago - 2 comments