Ecosyste.ms: Issues

An open API service that provides issue and pull request metadata for open source projects.

GitHub / Lightning-AI/pytorch-lightning issues and pull requests

#19970 - Make checkpoint saving fully atomic

Issue - State: closed - Opened by radomirgr 5 months ago - 3 comments
Labels: feature, help wanted, checkpointing, ver: 2.2.x

#19969 - Fix minor typo in Trainer's documentation

Pull Request - State: closed - Opened by SamuelLarkin 5 months ago - 1 comment
Labels: docs, pl

#19968 - Update README.md

Pull Request - State: closed - Opened by williamFalcon 5 months ago - 1 comment

#19967 - docs: Bump HPU ref `1.6.0.rc0`

Pull Request - State: closed - Opened by pl-ghost 5 months ago - 2 comments
Labels: docs, pl

#19966 - min_epochs and EarlyStopping in conflict

Issue - State: open - Opened by timlod 5 months ago - 2 comments
Labels: bug, needs triage

#19965 - Loading saved config file fails because of InterpolationMode

Issue - State: closed - Opened by iulialexandra 5 months ago - 1 comment
Labels: bug, needs triage

#19964 - Documentation: writing custom samplers compatible with multi GPU training

Issue - State: open - Opened by fteufel 5 months ago
Labels: help wanted, docs

#19963 - Nicer logging of list of dicts for hyper parameters

Pull Request - State: open - Opened by vork 5 months ago - 1 comment
Labels: fabric

#19962 - modified num_replicas=self.world_size

Pull Request - State: open - Opened by arjunagarwal899 5 months ago
Labels: pl

#19961 - Returning num_replicas=world_size when using distributed sampler in ddp

Issue - State: open - Opened by arjunagarwal899 5 months ago - 3 comments
Labels: duplicate, feature, help wanted, distributed, strategy: ddp

#19959 - Removing numpy package from src/lightning

Pull Request - State: closed - Opened by Bhavay-2001 5 months ago - 3 comments
Labels: fabric

#19958 - Removing numpy package dependency in src/lightning package

Pull Request - State: closed - Opened by Bhavay-2001 5 months ago
Labels: fabric

#19957 - Logging Hyperparameters for list of dicts

Issue - State: open - Opened by vork 5 months ago
Labels: bug, needs triage, ver: 2.2.x

#19956 - ValueError: range() arg 3 must not be zero - Need to Identify the Root Cause

Issue - State: closed - Opened by YuyaWake 5 months ago - 1 comment
Labels: bug, needs triage, ver: 2.0.x

#19955 - Adam optimizer is slower after loading model from checkpoint

Issue - State: closed - Opened by radomirgr 5 months ago - 27 comments
Labels: bug, help wanted, optimization, performance

#19954 - Release 2.3.0

Pull Request - State: closed - Opened by awaelchli 5 months ago - 2 comments
Labels: ready, release, fabric, pl, package, data

#19953 - Make TensorBoardLogger default version creation ascii sortable

Issue - State: open - Opened by jdenhof 5 months ago
Labels: feature, needs triage

#19952 - Class name displayed incorrectly

Issue - State: open - Opened by HelixPiano 5 months ago
Labels: docs, needs triage

#19951 - docker image doesn't have `pytorch_lightning`

Issue - State: closed - Opened by grisaitis 5 months ago - 1 comment

#19950 - Autocast "cache_enabled=True" failing

Issue - State: open - Opened by thomassajot 5 months ago - 1 comment
Labels: bug, needs triage

#19949 - Use lr setter callback instead of `attr_name` in `LearningRateFinder` and `Tuner`

Issue - State: open - Opened by arthurdjn 6 months ago - 4 comments
Labels: feature, needs triage

#19948 - ci/docs: enable dispatch build without warning as errors

Pull Request - State: closed - Opened by Borda 6 months ago - 1 comment
Labels: ready, docs, ci

#19947 - Removing numpy requirement from all files in examples/pytorch/domain_templates

Pull Request - State: closed - Opened by Bhavay-2001 6 months ago - 6 comments
Labels: example, refactor, community, pl

#19946 - Fix strict loading from distributed checkpoints vs PyTorch nightly

Pull Request - State: closed - Opened by awaelchli 6 months ago - 2 comments
Labels: bug, checkpointing, fabric, fun

#19945 - Avoid casting with `numpy()` in `multiprocessing.py`

Issue - State: closed - Opened by Peiffap 6 months ago - 1 comment
Labels: help wanted, refactor

#19943 - `make test` fails with `subprocess-exited-with-error`: `AssertionError: Could not find cmake executable!`

Issue - State: open - Opened by Peiffap 6 months ago
Labels: bug, needs triage, ver: 2.2.x

#19942 - Replace usage of `grep -P` with `perl` in `run_standalone_tests.sh`

Pull Request - State: closed - Opened by Peiffap 6 months ago
Labels: ready, ci, community

#19941 - fix(docs): fix broken link to ensure the docs can be built

Pull Request - State: closed - Opened by yurijmikhalevich 6 months ago - 1 comment
Labels: ready, docs, app

#19940 - Custom batch selection for logging

Issue - State: open - Opened by bhosalems 6 months ago - 3 comments
Labels: feature, needs triage

#19939 - Callback for logging forward, backward and update time

Issue - State: open - Opened by joshim5 6 months ago
Labels: feature, needs triage

#19938 - `grep: Invalid option -- P` when running `./tests/run_standalone_tests.sh` on macOS

Issue - State: closed - Opened by Peiffap 6 months ago - 1 comment
Labels: bug, help wanted, tests, ver: 2.2.x

#19937 - Fix typos in CONTRIBUTING.md

Pull Request - State: closed - Opened by Peiffap 6 months ago
Labels: ci

#19936 - FileNotFoundError: [Errno 2] No such file or directory tfevents file

Issue - State: closed - Opened by bhosalems 6 months ago - 1 comment
Labels: bug, ver: 1.8.x

#19935 - Simplify loading full checkpoint in ModelParallelStrategy

Pull Request - State: closed - Opened by awaelchli 6 months ago
Labels: fabric

#19933 - is `lightning` and `pytorch_lightning` the same?

Issue - State: closed - Opened by stephanielees 6 months ago - 4 comments
Labels: bug, needs triage

#19932 - Continuing training with `ckpt_path="last"` and MLFLowLogger fails in distributed setting

Issue - State: open - Opened by selflein 6 months ago
Labels: bug, logger: mlflow

#19931 - Destroy process group in atexit handler

Pull Request - State: closed - Opened by awaelchli 6 months ago - 2 comments
Labels: ready, fabric, pl

#19929 - ModelCheckpoint does not work when using the monitor

Issue - State: closed - Opened by QianhangFeng 6 months ago - 1 comment
Labels: bug, callback: model checkpoint, repro needed, ver: 2.2.x

#19926 - Update FlopCounterMode usage in throughput.py

Pull Request - State: closed - Opened by IvanYashchuk 6 months ago - 2 comments
Labels: ready, fabric

#19925 - Update code owners file

Pull Request - State: closed - Opened by awaelchli 6 months ago - 1 comment
Labels: ci

#19923 - Lightning Fabric: generic method to get the full state dict

Issue - State: open - Opened by Xynonners 6 months ago
Labels: feature, needs triage

#19922 - Update code owners file

Pull Request - State: closed - Opened by awaelchli 6 months ago - 1 comment
Labels: docs, ci, pl

#19921 - forward method missing required positional argument ‘masks’ in PyTorch Lightning

Issue - State: closed - Opened by YuyaWake 6 months ago - 6 comments
Labels: question

#19920 - The training process will stop unexpectedly

Issue - State: open - Opened by 5huanghuai 6 months ago - 1 comment
Labels: bug, needs triage, repro needed

#19919 - XLA FSDP strategy has undocumented requirement for using activation checkpointing

Issue - State: open - Opened by ebreck 6 months ago
Labels: bug, needs triage

#19918 - Disable skipping training step in distributed training

Pull Request - State: closed - Opened by awaelchli 6 months ago - 2 comments
Labels: ready, breaking change, loops, pl

#19917 - Update docstring for `self.log` about keys in distributed training

Pull Request - State: closed - Opened by awaelchli 6 months ago - 2 comments
Labels: ready, docs, pl

#19916 - KeyboardInterrupt raises an exception which results in a zero exit code

Issue - State: open - Opened by amarckal 6 months ago
Labels: bug, help wanted, environment: slurm, ver: 2.0.x, ver: 2.1.x, ver: 2.2.x

#19915 - Check if CometLogger experiment is alive

Pull Request - State: closed - Opened by EtayLivne 6 months ago - 1 comment
Labels: logger: comet, community, pl

#19913 - Add Studio badge to tensor parallel docs

Pull Request - State: closed - Opened by awaelchli 6 months ago - 1 comment
Labels: ready, docs, fabric, pl

#19911 - Include VertexAI cluster environment for Fabric

Pull Request - State: open - Opened by miguelalba96 6 months ago
Labels: docs, fabric, pl

#19910 - element 0 of tensors does not require grad and does not have a grad_fn in "test_step" and "validation_step"

Issue - State: closed - Opened by SongJgit 6 months ago - 4 comments
Labels: working as intended

#19909 - "save_last" could not save a complete checkpoint

Issue - State: open - Opened by kxgong 6 months ago - 1 comment
Labels: bug, needs triage, ver: 1.9.x

#19906 - Add functionality to save nn.Modules supplied as arguments when initialising LightningModule

Issue - State: closed - Opened by tom-hehir 6 months ago
Labels: feature, needs triage

#19905 - AttributeError: type object 'Trainer' has no attribute 'add_argparse_args'

Issue - State: closed - Opened by Park-yebin 6 months ago - 1 comment
Labels: question, working as intended, ver: 2.0.x, ver: 2.1.x

#19904 - Remove unknown `[metadata]` table from `pyproject.toml`

Pull Request - State: closed - Opened by ringohoffman 6 months ago - 1 comment
Labels: ready, community, package

#19903 - CUDA unknown error

Issue - State: closed - Opened by aniketmaurya 6 months ago - 1 comment
Labels: bug, needs triage

#19902 - Error for unsupported precision types with ModelParallelStrategy

Pull Request - State: closed - Opened by awaelchli 6 months ago - 2 comments
Labels: feature, ready, fabric, pl

#19900 - Creating A Second Comet Logger Disables The First

Issue - State: closed - Opened by EtayLivne 6 months ago
Labels: bug, needs triage, ver: 2.1.x

#19899 - (10/10) Support 2D Parallelism - Port Fabric docs to PL

Pull Request - State: closed - Opened by awaelchli 6 months ago - 1 comment
Labels: ready, docs, fabric, pl

#19898 - Fabric: Incorrect `num_replicas` (ddp/fsdp) when number of GPUs on each node is different

Issue - State: open - Opened by shaibagon 6 months ago - 2 comments
Labels: bug, needs triage

#19897 - docs: prune unused `linkcode`

Pull Request - State: closed - Opened by Borda 6 months ago - 1 comment
Labels: ready, docs, fabric, app

#19896 - docs: fix link to CLIP

Pull Request - State: closed - Opened by Borda 6 months ago - 1 comment
Labels: ready, docs, app

#19895 - Error when fast_dev_run=True or num_sanity_val_steps=0 and using torchmetrics MetricTracker

Issue - State: open - Opened by MoustHolmes 6 months ago
Labels: bug, needs triage, ver: 2.2.x

#19894 - Remove the requirement for FSDPStrategy subclasses to only support GPU

Pull Request - State: closed - Opened by awaelchli 6 months ago - 2 comments
Labels: feature, ready, fabric

#19893 - Patch release 2.2.5

Pull Request - State: closed - Opened by awaelchli 6 months ago - 4 comments
Labels: bug, ready, docs, release, fabric, app (removed), pl, dependencies, package

#19891 - MisconfigurationException: Do not set `gradient_accumulation_steps` in the DeepSpeed config

Issue - State: open - Opened by mxkrn 6 months ago
Labels: bug, needs triage

#19890 - Is "Prepare a config file for the CLI" out of date?

Issue - State: open - Opened by zengchang233 6 months ago - 2 comments
Labels: bug, needs triage

#19889 - MLFlowLogger fails when logging hyperparameters as Trainer already does automatically

Issue - State: open - Opened by CristoJV 6 months ago
Labels: bug, needs triage, ver: 2.1.x

#19888 - (9/n) Support 2D Parallelism - Remaining Checkpoint Logic

Pull Request - State: closed - Opened by awaelchli 6 months ago - 2 comments
Labels: feature, ready, checkpointing, fabric, pl

#19887 - (8/n) Support 2D Parallelism - 2D Parallel Fabric Docs

Pull Request - State: closed - Opened by awaelchli 6 months ago - 1 comment
Labels: ready, docs, fabric, pl

#19886 - Fix state dict loading in bitsandbytes plugin when checkpoint is already quantized

Pull Request - State: closed - Opened by awaelchli 6 months ago - 2 comments
Labels: bug, ready, fabric, pl, precision: bnb

#19884 - (7/n) Support 2D Parallelism - TP Fabric Docs

Pull Request - State: closed - Opened by awaelchli 6 months ago - 2 comments
Labels: ready, docs, fabric, pl

#19883 - Lightning stalls with 2 GPUs on 1 node with SLURM (and apptainer)

Issue - State: open - Opened by sorenwacker 6 months ago - 3 comments
Labels: bug, needs triage

#19882 - Enable loss-parallel in example

Pull Request - State: closed - Opened by awaelchli 6 months ago - 1 comment
Labels: example, ready, fabric, pl

#19881 - WIP: Integrate Collective into strategies

Pull Request - State: open - Opened by awaelchli 6 months ago
Labels: refactor, fabric

#19880 - can't fit with ddp_notebook on a Vertex AI Workbench instance (CUDA initialized)

Issue - State: open - Opened by jasonbrancazio 6 months ago
Labels: bug, needs triage

#19879 - (6/n) Support 2D Parallelism - Trainer example

Pull Request - State: closed - Opened by awaelchli 6 months ago - 1 comment
Labels: example, ready, fabric, pl

#19878 - (5/n) Support 2D Parallelism in Lightning Trainer

Pull Request - State: closed - Opened by awaelchli 6 months ago - 2 comments
Labels: feature, ready, fabric, pl

#19877 - Remove redundant code to set the device on the LightningModule

Pull Request - State: closed - Opened by awaelchli 6 months ago - 2 comments
Labels: ready, refactor, code quality, pl

#19874 - Using the MLflow logger produces Inconsistent metric plots

Issue - State: open - Opened by gboeer 6 months ago - 2 comments
Labels: bug, needs triage

#19873 - AttributeError: module 'pytorch_lightning.callbacks' has no attribute 'ProgressBarBase'. Did you mean: 'ProgressBar'?

Issue - State: closed - Opened by carusyte 6 months ago - 1 comment
Labels: question, progress bar: tqdm

#19872 - (4/n) Support 2D Parallelism - Loading optimizer states correctly

Pull Request - State: closed - Opened by awaelchli 6 months ago - 2 comments
Labels: feature, ready, checkpointing, fabric

#19870 - (3/n) Support 2D Parallelism - Efficient loading of full-state checkpoints

Pull Request - State: closed - Opened by awaelchli 6 months ago - 2 comments
Labels: ready, refactor, fabric, performance, pl

#19869 - Error loading a saved model to run inference (using ddp_notebook strategy)

Issue - State: open - Opened by carlos-havier 6 months ago - 1 comment
Labels: bug, needs triage, ver: 2.1.x

#19868 - Possible bug in recognizing `mps` accelerator even though PyTorch seems to register the `mps` device?

Issue - State: open - Opened by adam2392 6 months ago - 1 comment
Labels: bug, needs triage

#19867 - Added some more potentially robust ways to do learning rate tuning

Pull Request - State: open - Opened by varchasgopalaswamy 6 months ago
Labels: pl

#19865 - Resume training, how to change learning scheduler?

Issue - State: open - Opened by jzhanghzau 6 months ago - 1 comment
Labels: bug, needs triage, ver: 2.2.x

#19862 - Add dog has an error: FileNotFoundError:

Issue - State: closed - Opened by BaiYuanxi-dev 6 months ago - 1 comment
Labels: question

#19858 - Dynamically link arguments in `LightningCLI`?

Issue - State: closed - Opened by EthanMarx 6 months ago - 2 comments
Labels: feature, lightningcli

#19857 - Update Lightning Cloud 0.5.69

Pull Request - State: closed - Opened by tchaton 6 months ago - 1 comment
Labels: ready, app, dependencies

#19856 - Reduce queue fetching

Pull Request - State: closed - Opened by tchaton 6 months ago - 1 comment
Labels: ready, app

#19854 - Is the Lightning App deprecated? (Lightning App doc is not found)

Issue - State: closed - Opened by guyleaf 6 months ago - 1 comment
Labels: bug, app

#19852 - (2/n) Support 2D Parallelism - Distributed Checkpoints

Pull Request - State: closed - Opened by awaelchli 6 months ago - 3 comments
Labels: feature, ready, checkpointing, fabric, pl

#19847 - Fix typo on `estimated_stepping_batches` property

Pull Request - State: closed - Opened by afspies 7 months ago - 2 comments
Labels: docs, community, pl

#19843 - docs: Bump HPU ref `1.5.0`

Pull Request - State: closed - Opened by pl-ghost 7 months ago - 1 comment
Labels: docs, pl