GitHub / pytorch/torchx issues and pull requests
#1068 - Add tolerations to KubernetesScheduler run opts
Issue -
State: open - Opened by JackWittmayer 2 months ago
#1067 - Add node selector to KubernetesScheduler run opts
Issue -
State: open - Opened by JackWittmayer 2 months ago
#1066 - Add application metadata to the describe API's return value
Pull Request -
State: closed - Opened by kiukchung 3 months ago
- 1 comment
Labels: CLA Signed, fb-exported
#1065 - Add missing Pyre mode headers] [shard:2/N]
Pull Request -
State: open - Opened by facebook-github-bot 3 months ago
- 1 comment
Labels: CLA Signed, fb-exported
#1064 - (torchx/local_scheduler) go back to using os.killpg in local_scheduler
Pull Request -
State: closed - Opened by kiukchung 3 months ago
- 2 comments
Labels: CLA Signed, fb-exported
#1063 - (torchx/local_scheduler) Use os.kill instead of os.killpg when sending SIGTERM to the replica pid. Add runner.wait() for torchx.runner.test.api_test#test_empty_session_id to gracefully wait for the replicas to finish running
Pull Request -
State: closed - Opened by kiukchung 3 months ago
- 2 comments
Labels: CLA Signed, fb-exported
#1062 - (torchx/local_scheduler) Use os.kill instead of os.killpg when sending SIGTERM to the replica pid. Add runner.wait() for torchx.runner.test.api_test#test_empty_session_id to gracefully wait for the replicas to finish running
Pull Request -
State: closed - Opened by kiukchung 3 months ago
- 1 comment
Labels: CLA Signed, fb-exported
#1061 - feat: add quoting support to to_dict (#1052) (#1053)
Pull Request -
State: closed - Opened by andywag 3 months ago
- 1 comment
Labels: CLA Signed, fb-exported
#1060 - introduce img_root_secondary to torchx macros
Pull Request -
State: closed - Opened by burak-turk 3 months ago
- 8 comments
Labels: CLA Signed, fb-exported
#1059 - [torchx][ci] Pin pip-25.0.1 since pip-25.1 breaks editable installs of torchx[dev]
Pull Request -
State: closed - Opened by kiukchung 3 months ago
- 1 comment
Labels: CLA Signed
#1058 - [torchx][CI] Run slurm integration test on linux.24_04.4x
Pull Request -
State: closed - Opened by kiukchung 3 months ago
- 4 comments
Labels: CLA Signed
#1057 - Allow empty session name for APIs that take it
Pull Request -
State: closed - Opened by kiukchung 3 months ago
- 13 comments
Labels: CLA Signed, fb-exported
#1056 - fix __file__ value inside components
Pull Request -
State: closed - Opened by daniel-ohayon 3 months ago
- 3 comments
Labels: CLA Signed, fb-exported
#1055 - fix: ray module not found handling (#1049)
Pull Request -
State: closed - Opened by andywag 3 months ago
- 2 comments
Labels: CLA Signed, fb-exported
#1054 - Use k8s volcano replicas to shrink job manifest size
Issue -
State: open - Opened by clumsy 3 months ago
- 2 comments
#1053 - feat: add quoting support to to_dict (#1052)
Pull Request -
State: closed - Opened by clumsy 3 months ago
- 9 comments
Labels: CLA Signed
#1052 - quoting support for to_dict
Issue -
State: open - Opened by clumsy 3 months ago
- 1 comment
#1051 - Upgrade to torch>=2.7.0 et al. for dev-requirements
Pull Request -
State: closed - Opened by kiukchung 3 months ago
- 2 comments
Labels: CLA Signed, fb-exported
#1050 - feat: list all registered schedulers (#1009)
Pull Request -
State: open - Opened by clumsy 3 months ago
- 8 comments
Labels: CLA Signed
#1049 - fix: ray module not found handling
Pull Request -
State: closed - Opened by clumsy 3 months ago
- 10 comments
Labels: CLA Signed
#1048 - Create //torchx/specs:lib_core
Pull Request -
State: closed - Opened by kiukchung 3 months ago
- 8 comments
Labels: CLA Signed, fb-exported
#1047 - Add missing Pyre mode headers] [shard:2/N]
Pull Request -
State: open - Opened by facebook-github-bot 3 months ago
- 1 comment
Labels: CLA Signed, fb-exported
#1046 - Prevent //torchx/cli/test:cmd_run_test from picking up user's /home/kiuk/.torchxconfig when running test cases
Pull Request -
State: closed - Opened by kiukchung 3 months ago
- 1 comment
Labels: CLA Signed, fb-exported
#1045 - support AppStatus json output format
Pull Request -
State: closed - Opened by nghuiqin 3 months ago
- 6 comments
Labels: CLA Signed, fb-exported
#1044 - decouple torchx native scuba logging and ttfb
Pull Request -
State: closed - Opened by kiukchung 3 months ago
- 7 comments
Labels: CLA Signed, fb-exported
#1043 - flag to log full trace on error
Pull Request -
State: open - Opened by tonykao8080 4 months ago
- 1 comment
Labels: CLA Signed, fb-exported
#1042 - Remove crash from AIPM in torchX at end
Pull Request -
State: open - Opened by andywag 4 months ago
- 2 comments
Labels: CLA Signed, fb-exported
#1041 - Back out "Update pyfmt component on FBS:master"
Pull Request -
State: closed - Opened by VladimirMakaev 4 months ago
- 2 comments
Labels: CLA Signed, fb-exported
#1040 - Unpin urllib3 dependency
Issue -
State: open - Opened by trvachov 4 months ago
#1039 - [torchx][CI] Run k8s integration test on linux.24_04.4x since the *.1…
Pull Request -
State: closed - Opened by kiukchung 4 months ago
- 1 comment
Labels: CLA Signed
#1038 - feat: add metadata parameters to dist/spmd components (#1037)
Pull Request -
State: open - Opened by clumsy 4 months ago
- 2 comments
Labels: CLA Signed
#1037 - Add metadata to dist/spmd components
Issue -
State: open - Opened by clumsy 4 months ago
- 6 comments
#1036 - feat: expose run_name via env in dist/spmd (#1035)
Pull Request -
State: closed - Opened by clumsy 4 months ago
- 1 comment
Labels: CLA Signed
#1035 - Expose run name in dist/spmd components
Issue -
State: closed - Opened by clumsy 4 months ago
#1034 - Fix broken doc-push
Pull Request -
State: closed - Opened by kiukchung 4 months ago
- 2 comments
Labels: CLA Signed, fb-exported
#1033 - Fix broken docs-build by pinning protobuf-3.20.x
Pull Request -
State: closed - Opened by kiukchung 4 months ago
- 8 comments
Labels: CLA Signed, fb-exported
#1032 - Run CI actions on ubuntu 24.04 since ubuntu 20.04 is being deprecated on 04/01/2025
Pull Request -
State: closed - Opened by kiukchung 4 months ago
- 8 comments
Labels: CLA Signed, fb-exported
#1031 - torchx - upgrade kfp to 2.5.0 (PENDING)
Pull Request -
State: open - Opened by tonykao8080 4 months ago
- 2 comments
Labels: CLA Signed, fb-exported
#1030 - Fix flaky compute_world_size test on devservers and make pyre happy
Pull Request -
State: closed - Opened by kiukchung 4 months ago
- 2 comments
Labels: CLA Signed, fb-exported
#1029 - Fix broken component integration test due to compute_world_size app not respecting env vars set by torchrun
Pull Request -
State: closed - Opened by kiukchung 4 months ago
- 2 comments
Labels: CLA Signed, fb-exported
#1028 - Update pyfmt component on FBS:master
Pull Request -
State: closed - Opened by tonykao8080 4 months ago
- 1 comment
Labels: CLA Signed, fb-exported
#1027 - Allow setting app and cfg for all apps
Pull Request -
State: closed - Opened by lgarg26 4 months ago
- 1 comment
Labels: CLA Signed, fb-exported
#1026 - Update etcd.yaml image
Pull Request -
State: open - Opened by omrishiv 4 months ago
- 4 comments
Labels: CLA Signed
#1025 - torchx - fix deps vulnerability for pytorch-lightning and sagemaker
Pull Request -
State: closed - Opened by tonykao8080 4 months ago
- 8 comments
Labels: CLA Signed, fb-exported
#1024 - Validation for pipeline and appdef components
Pull Request -
State: closed - Opened by bobyangyf 4 months ago
- 2 comments
Labels: CLA Signed, fb-exported
#1023 - Component arg parsing for pipelines
Pull Request -
State: closed - Opened by bobyangyf 4 months ago
- 2 comments
Labels: CLA Signed, fb-exported
#1022 - schedulers parameterized
Pull Request -
State: closed - Opened by bobyangyf 4 months ago
- 3 comments
Labels: CLA Signed, fb-exported
#1021 - Suggested way to get timestamp of the job submission?
Issue -
State: open - Opened by HanFa 4 months ago
#1020 - Drop python-3.8 unittest in favor of adding 3.12. Update to python-3.10 for other workflows
Pull Request -
State: closed - Opened by kiukchung 4 months ago
- 15 comments
Labels: CLA Signed, fb-exported
#1019 - Don't prefix examples as test
Pull Request -
State: closed - Opened by bobyangyf 4 months ago
- 1 comment
Labels: CLA Signed, fb-exported
#1018 - Components to get fn returning Any
Pull Request -
State: closed - Opened by bobyangyf 4 months ago
- 4 comments
Labels: CLA Signed, fb-exported
#1017 - Fix //torchx/github/docs:doctest on devservers and rid a few warnings.
Pull Request -
State: closed - Opened by kiukchung 4 months ago
- 13 comments
Labels: CLA Signed, fb-exported
#1016 - Allow injecting validators for component verification
Pull Request -
State: closed - Opened by bobyangyf 4 months ago
- 3 comments
Labels: CLA Signed, fb-exported
#1015 - assertEqual instead of assertEquals
Pull Request -
State: closed - Opened by bobyangyf 5 months ago
- 1 comment
Labels: CLA Signed, fb-exported
#1014 - typo fix
Pull Request -
State: closed - Opened by jsta 5 months ago
- 1 comment
#1013 - warn on unknown runopt passed via command-line
Pull Request -
State: closed - Opened by daniel-ohayon 5 months ago
- 2 comments
Labels: CLA Signed, fb-exported
#1012 - possible Improvement: Using shutdown() Before close() in `server.py`
Issue -
State: open - Opened by allrob23 5 months ago
#1011 - Handle args for customized entrypoint
Pull Request -
State: closed - Opened by hstonec 5 months ago
- 1 comment
Labels: CLA Signed, fb-exported
#1010 - Allow running shell script from fbpkg
Pull Request -
State: closed - Opened by hstonec 5 months ago
- 1 comment
Labels: CLA Signed, fb-exported
#1009 - List all registered torchx.schedulers
Issue -
State: open - Opened by clumsy 5 months ago
- 3 comments
#1007 - torchx support early validation before workspace build
Pull Request -
State: closed - Opened by tonykao8080 6 months ago
- 1 comment
Labels: CLA Signed, fb-exported
#1006 - Fix for issue with APF packaging with multiple roles
Pull Request -
State: closed - Opened by andywag 6 months ago
- 1 comment
Labels: CLA Signed, fb-exported
#1004 - fbcode//torchx/schedulers/test [A]
Pull Request -
State: open - Opened by itamaro 6 months ago
- 1 comment
Labels: CLA Signed, fb-exported
#1003 - fbcode//torchx/runner/test [A]
Pull Request -
State: closed - Opened by itamaro 6 months ago
- 1 comment
Labels: CLA Signed, fb-exported
#1002 - feat: add aws inf2 instance type as named resources
Pull Request -
State: closed - Opened by HanFa 6 months ago
- 5 comments
Labels: CLA Signed
#1001 - fbcode//torchx/runner/test [B]
Pull Request -
State: closed - Opened by itamaro 6 months ago
- 12 comments
Labels: CLA Signed, fb-exported
#1000 - Fixup Optional runopt cfg values handling during cfg_from_json_repr deserialization
Pull Request -
State: closed - Opened by lgarg26 6 months ago
- 1 comment
Labels: CLA Signed, fb-exported
#999 - Kubernetes backend: Can't run hello world
Issue -
State: open - Opened by knexator 6 months ago
#998 - Auto-fix lint violations from Fixit] fbcode//torchx/schedulers/test
Pull Request -
State: open - Opened by aleivag 7 months ago
- 2 comments
Labels: CLA Signed, fb-exported
#997 - fbcode//torchx/runner/test
Pull Request -
State: open - Opened by bajanduncan 7 months ago
- 3 comments
Labels: CLA Signed, fb-exported
#996 - fbcode//torchx/schedulers/test
Pull Request -
State: open - Opened by bajanduncan 7 months ago
- 1 comment
Labels: CLA Signed, fb-exported
#995 - feat: add aws_p5en.48xlarge
Pull Request -
State: open - Opened by clumsy 7 months ago
- 1 comment
Labels: CLA Signed
#994 - run par as an entrypoint if there is no patch or jetter patch.
Pull Request -
State: closed - Opened by yikaiMeta 7 months ago
- 5 comments
Labels: CLA Signed, fb-exported
#993 - fix dump and load behavior to load same set of schedulers
Pull Request -
State: closed - Opened by lgarg26 7 months ago
- 1 comment
Labels: CLA Signed, fb-exported
#992 - allow configurable scheduler load group
Pull Request -
State: closed - Opened by lgarg26 8 months ago
- 7 comments
Labels: CLA Signed, fb-exported
#991 - fix: adjust aws_c5.18xlarge memory size
Pull Request -
State: closed - Opened by clumsy 8 months ago
- 8 comments
Labels: CLA Signed
#990 - torchx - fix RaySchedulerTest unit test failure
Pull Request -
State: closed - Opened by tonykao8080 8 months ago
- 1 comment
Labels: CLA Signed, fb-exported
#989 - torchx - expose scheduler opts to scheduler validate function interface
Pull Request -
State: closed - Opened by tonykao8080 8 months ago
- 2 comments
Labels: CLA Signed, fb-exported
#988 - Add missing license header
Pull Request -
State: closed - Opened by tonykao8080 8 months ago
- 2 comments
Labels: CLA Signed, fb-exported
#987 - Remove unused pyre ignores
Pull Request -
State: open - Opened by rchen152 8 months ago
- 1 comment
Labels: CLA Signed, fb-exported
#986 - sync torchx .pyre_configuration.internal with external config and upg…
Pull Request -
State: closed - Opened by stroxler 8 months ago
- 5 comments
Labels: CLA Signed, fb-exported
#985 - sync torchx .pyre_configuration.internal with external config and upgrade
Pull Request -
State: closed - Opened by dluo 8 months ago
- 1 comment
Labels: CLA Signed, fb-exported
#984 - Fix docs build by freezing apache airflow version
Pull Request -
State: open - Opened by jesszzzz 8 months ago
Labels: CLA Signed
#983 - Tailing kubernetes logs doesn't put logs on new lines
Issue -
State: open - Opened by matthen 8 months ago
- 2 comments
#982 - Fix pydantic incompatibility causing test failures
Pull Request -
State: closed - Opened by jesszzzz 8 months ago
Labels: CLA Signed
#981 - Fix pyre errors
Pull Request -
State: open - Opened by jesszzzz 8 months ago
Labels: CLA Signed
#980 - feat: add aws_c5_18xlarge
Pull Request -
State: closed - Opened by clumsy 8 months ago
- 1 comment
Labels: CLA Signed
#979 - [torchx/lintrunner] Propagate pyre cmd errors in pyre_linter.py
Pull Request -
State: closed - Opened by kiukchung 8 months ago
- 1 comment
Labels: CLA Signed
#978 - [torchx/dev-requirements] Fix broken unittest in CI
Pull Request -
State: closed - Opened by kiukchung 8 months ago
Labels: CLA Signed
#977 - Add AWS Inf2 instances support for aws_batch scheduler
Pull Request -
State: open - Opened by shixianc 8 months ago
- 10 comments
Labels: CLA Signed
#976 - Convert directory fbcode/torchx to use the Ruff Formatter
Pull Request -
State: open - Opened by tpolasek 8 months ago
- 1 comment
Labels: CLA Signed, fb-exported
#975 - Update .pyre_configuration to fix Pyre
Pull Request -
State: closed - Opened by yikaiMeta 8 months ago
Labels: CLA Signed
#974 - Add torchx session id as Environment variable
Pull Request -
State: closed - Opened by yikaiMeta 8 months ago
- 3 comments
Labels: CLA Signed, fb-exported
#973 - Strip lines when log-tailing
Pull Request -
State: closed - Opened by Sanjay-Ganeshan 8 months ago
- 2 comments
Labels: CLA Signed, fb-exported
#972 - torchx - profile scheduler validate call
Pull Request -
State: closed - Opened by tonykao8080 8 months ago
- 7 comments
Labels: CLA Signed, fb-exported
#971 - Fix pyre version not found issue
Pull Request -
State: closed - Opened by hstonec 9 months ago
- 1 comment
Labels: CLA Signed
#970 - Update pyfmt component on FBS:master
Pull Request -
State: closed - Opened by hstonec 9 months ago
- 1 comment
Labels: CLA Signed, fb-exported
#969 - feat: add aws_g6e instances
Pull Request -
State: closed - Opened by clumsy 9 months ago
- 8 comments
Labels: CLA Signed
#968 - upgrade pyre version in `fbcode/torchx` - batch 1
Pull Request -
State: closed - Opened by connernilsen 9 months ago
- 1 comment
Labels: CLA Signed, fb-exported
#967 - Allow setting torchx session id from cmd
Pull Request -
State: closed - Opened by hstonec 9 months ago
- 2 comments
Labels: CLA Signed, fb-exported