Ecosyste.ms: Issues

An open API service that provides issue and pull request metadata for open source projects.

GitHub / facebookincubator/submitit issues and pull requests

#1780 - Gracefully exit rank != 0 job steps on slurm cluster

Pull Request - State: open - Opened by erjel 22 days ago
Labels: CLA Signed

#1779 - Run in container

Issue - State: closed - Opened by philmod-h about 1 month ago - 1 comment

#1778 - Max jobs progressively

Pull Request - State: closed - Opened by NickKocher about 2 months ago - 1 comment

#1777 - Include email parameters in job submission

Issue - State: open - Opened by nicolazilio 2 months ago

#1775 - How to catch timed out array jobs

Issue - State: closed - Opened by lee-jin-gyu96 2 months ago - 2 comments

#1774 - Fix black

Pull Request - State: closed - Opened by jrapin 2 months ago
Labels: CLA Signed

#1773 - Update CI packages

Pull Request - State: closed - Opened by jrapin 2 months ago
Labels: CLA Signed

#1772 - ModuleNotFoundError: No module named "models"

Issue - State: closed - Opened by victoris93 3 months ago - 4 comments

#1771 - Keep original tmp slurm submission file as a hidden symlink

Pull Request - State: closed - Opened by xman1979 3 months ago - 1 comment
Labels: CLA Signed

#1770 - Consider PREEMPTED state as "not done"

Pull Request - State: open - Opened by denizokt 3 months ago - 1 comment

#1769 - Out Of Memory

Issue - State: open - Opened by owaisCS 3 months ago

#1768 - Activate github actions

Pull Request - State: closed - Opened by jrapin 4 months ago
Labels: CLA Signed

#1767 - Allow hiding extra env variables in clean_env()

Pull Request - State: closed - Opened by baldassarreFe 4 months ago - 1 comment
Labels: CLA Signed

#1766 - How to retrieve a running job and cancel it

Issue - State: open - Opened by mzhang2FW 6 months ago - 4 comments

#1764 - Can I use torchrun with submitit?

Issue - State: open - Opened by vasudev-sharma 9 months ago - 1 comment

#1763 - Using `RsyncSnapshot` with a editable package install

Issue - State: closed - Opened by jc-audet 9 months ago - 2 comments

#1762 - Submitit jobs die with no error on cluster with SLURM 19.05

Issue - State: open - Opened by mihdalal 10 months ago - 1 comment

#1760 - Turn off Signal Handling

Issue - State: open - Opened by lukasbm 11 months ago - 2 comments

#1759 - Too many sacct requests for batched tasks

Issue - State: open - Opened by Fadelis98 11 months ago

#1758 - Failed to launch: Invalid wckey specification

Issue - State: open - Opened by rskwesterman 12 months ago - 1 comment

#1757 - When 'submitit' meet 'mpirun', there will be a very strange BUG.

Issue - State: closed - Opened by yinkaaiwu 12 months ago - 3 comments

#1756 - Improving performance with NVidia GPU affinity?

Issue - State: open - Opened by giorgos117 12 months ago

#1755 - Bump version to 1.5.1

Pull Request - State: closed - Opened by jrapin about 1 year ago
Labels: CLA Signed

#1754 - Add optional setup step for local executor

Pull Request - State: closed - Opened by jrapin about 1 year ago - 1 comment
Labels: CLA Signed

#1752 - Update version to 1.5.0 (and drop support for Python 3.6 and 3.7)

Pull Request - State: closed - Opened by jrapin about 1 year ago
Labels: CLA Signed

#1751 - Update pylint version

Pull Request - State: closed - Opened by jrapin about 1 year ago
Labels: CLA Signed

#1750 - Enable python executable selection for local executor

Pull Request - State: closed - Opened by jrapin about 1 year ago - 3 comments
Labels: CLA Signed

#1749 - Update black and isort versions

Pull Request - State: closed - Opened by jrapin about 1 year ago - 1 comment
Labels: CLA Signed

#1748 - Does setting `folder` in `AutoExecutor` interfere with sattach?

Issue - State: open - Opened by fleimgruber about 1 year ago - 1 comment

#1747 - Update mypy (and CI python version to 3.8)

Pull Request - State: closed - Opened by jrapin about 1 year ago - 1 comment
Labels: CLA Signed

#1746 - Make local job instances picklable

Pull Request - State: closed - Opened by jrapin about 1 year ago
Labels: CLA Signed

#1745 - Add an option for not using srun

Pull Request - State: closed - Opened by jrapin about 1 year ago - 1 comment
Labels: CLA Signed

#1744 - Add support for OAR Scheduler

Pull Request - State: open - Opened by ychiat35 about 1 year ago - 3 comments
Labels: CLA Signed

#1743 - Update version tag to 1.4.6 for release

Pull Request - State: closed - Opened by jrapin about 1 year ago
Labels: CLA Signed

#1741 - Support Slurm Heterogeneous Job

Issue - State: open - Opened by sunshine-syz about 1 year ago - 2 comments

#1740 - Add nodelist/mail params/dependency as first class slurm parameters

Pull Request - State: closed - Opened by jrapin over 1 year ago - 1 comment
Labels: CLA Signed

#1739 - Enabling sbatch file re-use.

Issue - State: open - Opened by alexnwang over 1 year ago - 2 comments

#1738 - AttributeError , AutoExecutor attribute not recognised by submitit

Issue - State: closed - Opened by willianck over 1 year ago - 1 comment

#1737 - Conda version out of date

Issue - State: open - Opened by Ubadub over 1 year ago - 1 comment

#1736 - array_parallelism on local machine

Issue - State: closed - Opened by sparisi over 1 year ago - 1 comment

#1735 - Documentation of `executor.update_parameters` arguments

Issue - State: open - Opened by JoeZiminski over 1 year ago - 1 comment

#1734 - Submitit with SLURM sub-scheduling

Issue - State: open - Opened by giorgos117 over 1 year ago - 2 comments

#1733 - Running on Galahad

Pull Request - State: closed - Opened by mb010 over 1 year ago - 1 comment

#1730 - SLURM Jobs keep running after successful job completion.

Issue - State: closed - Opened by subho406 over 1 year ago

#1729 - Add singularity compatibility #1608

Pull Request - State: closed - Opened by gwenzek over 1 year ago - 3 comments
Labels: CLA Signed

#1728 - Add custom options to sbatch command in SLURM

Issue - State: open - Opened by nilskober over 1 year ago - 4 comments

#1727 - duplicate tasks when using `SlurmExecutor.map_array`

Issue - State: closed - Opened by eringrant almost 2 years ago - 3 comments

#1726 - Submitit with sbatch

Issue - State: open - Opened by pfrwilson almost 2 years ago - 6 comments
Labels: question

#1725 - Remove `#SBATCH --nodes=1`

Issue - State: closed - Opened by sgbaird almost 2 years ago - 3 comments

#1724 - Compute Canada

Issue - State: closed - Opened by kaijieshi7 almost 2 years ago - 1 comment

#1723 - Can submitit manage chain dependencies?

Issue - State: closed - Opened by eserie almost 2 years ago - 1 comment

#1722 - Should we submit job on login node?

Issue - State: open - Opened by surajmenon72 almost 2 years ago - 1 comment
Labels: question

#1721 - No user code logging output is shown in logs

Issue - State: closed - Opened by fleimgruber about 2 years ago - 2 comments

#1720 - be tolerating about sacct error?

Issue - State: closed - Opened by min-xu-ai about 2 years ago - 2 comments

#1719 - Consider supporting slurm rest api

Issue - State: open - Opened by zeronewb about 2 years ago - 2 comments
Labels: question

#1718 - array_parallelism for LocalExecutor

Issue - State: closed - Opened by se-ok about 2 years ago - 2 comments

#1716 - InfoWatch might get previous jobid info after slurm restart

Issue - State: open - Opened by Liangtaiwan about 2 years ago - 1 comment
Labels: question

#1715 - reraise exception back to user

Pull Request - State: open - Opened by gwenzek about 2 years ago - 6 comments
Labels: CLA Signed

#1714 - Printing in Signal Handlers May Be Unsafe

Issue - State: open - Opened by Queuecumber about 2 years ago - 1 comment
Labels: enhancement

#1713 - Add a timeout to scontrol requeue + explicitely delete function before pickling

Pull Request - State: closed - Opened by jrapin about 2 years ago - 1 comment
Labels: CLA Signed

#1712 - UnicodeDecodeError fails the job

Issue - State: open - Opened by phtu-cs about 2 years ago - 2 comments
Labels: question

#1711 - Submit Over SSH?

Issue - State: open - Opened by JRJacoby about 2 years ago - 4 comments
Labels: enhancement

#1709 - Switching from USR1 Breaks Pytorch Lightning

Issue - State: open - Opened by Queuecumber about 2 years ago - 4 comments

#1708 - Unwanted behavior after a slurm job time limit

Issue - State: closed - Opened by ofir1080 over 2 years ago - 1 comment

#1707 - Recover jobs after kernel dies

Issue - State: closed - Opened by SamuelGabriel over 2 years ago - 1 comment

#1706 - Fallback to slurm for TorchDistributedEnv

Pull Request - State: open - Opened by jrapin over 2 years ago - 1 comment
Labels: CLA Signed

#1705 - Option to overwrite exported variables in TorchDistributedEnvironment

Pull Request - State: closed - Opened by qasfb over 2 years ago - 3 comments
Labels: CLA Signed

#1704 - Submitit puts all tasks on a single GPU

Issue - State: closed - Opened by Bai-YT over 2 years ago - 3 comments

#1703 - Add helper class to facilitate PyTorch distributed initialization

Pull Request - State: closed - Opened by patricklabatut over 2 years ago - 3 comments
Labels: CLA Signed

#1702 - Revert to using USR2 ("cleaner" option)

Pull Request - State: closed - Opened by jrapin over 2 years ago
Labels: CLA Signed

#1701 - Use SIGHUP as default for preemption signal

Pull Request - State: closed - Opened by jrapin over 2 years ago
Labels: CLA Signed

#1700 - [Tentative] Rerun jobs more easily

Pull Request - State: open - Opened by jrapin over 2 years ago - 1 comment
Labels: CLA Signed

#1699 - Add a helper for temporary removing slurm and submitit env variables

Pull Request - State: closed - Opened by jrapin over 2 years ago
Labels: CLA Signed

#1698 - submit job array to multiple partitions

Issue - State: open - Opened by MinkyuHa over 2 years ago

#1697 - [NCCL Conflict] Use USR2 instead of USR1

Pull Request - State: closed - Opened by jrapin over 2 years ago - 2 comments
Labels: CLA Signed

#1696 - How to specify GPUs when executing locally?

Issue - State: closed - Opened by j0ma over 2 years ago - 5 comments

#1695 - Fix submissions running on Windows

Pull Request - State: open - Opened by tmct over 2 years ago
Labels: CLA Signed

#1694 - Skip unnecessary pickle file checks

Pull Request - State: closed - Opened by tmct over 2 years ago - 5 comments
Labels: CLA Signed

#1693 - Fix truncation of "Executor" in executor class name

Pull Request - State: closed - Opened by tmct over 2 years ago - 5 comments
Labels: CLA Signed

#1692 - NodeList Declaration

Issue - State: closed - Opened by Bontempogianpaolo1 over 2 years ago - 2 comments

#1691 - make a git tag on `make release`

Pull Request - State: closed - Opened by gwenzek over 2 years ago
Labels: CLA Signed

#1690 - Latest versions' tags not on Github

Issue - State: closed - Opened by tmct over 2 years ago - 2 comments

#1689 - Task does not wait for GPU memory resources

Issue - State: closed - Opened by chirico85 over 2 years ago - 1 comment

#1688 - `make integration` should also run `pip install --upgrade`

Pull Request - State: closed - Opened by gwenzek over 2 years ago
Labels: CLA Signed

#1687 - filedescriptor out of range in select()

Issue - State: closed - Opened by timlacroix over 2 years ago - 1 comment

#1686 - Use poll instead of select

Pull Request - State: closed - Opened by timlacroix over 2 years ago - 2 comments
Labels: CLA Signed

#1685 - `tasks_per_node=1` does not keep the number of tasks to 1 for the `LocalExecutor`

Issue - State: open - Opened by ihowell over 2 years ago - 4 comments
Labels: enhancement

#1684 - Progress Bar for Jobs (Implementation)

Issue - State: closed - Opened by yuvalkirstain over 2 years ago - 4 comments

#1683 - Update broken link in nevergrad.md

Pull Request - State: closed - Opened by charmoniumQ over 2 years ago - 3 comments
Labels: CLA Signed

#1682 - Release version 1.4 to pypi

Issue - State: closed - Opened by OhadRubin over 2 years ago - 1 comment

#1681 - [enhancement] Time info, like time taken, within Job objects

Issue - State: open - Opened by mennowitteveen over 2 years ago - 2 comments
Labels: enhancement