Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / Lightning-AI/lightning issues and pull requests
#18877 - Update Habana integration to 1.2
Pull Request -
State: closed - Opened by carmocca 11 months ago
- 2 comments
Labels: ready, docs, tests, accelerator: hpu (external), pl, dependencies
#18876 - Update logic to parse trainer settings from env vars
Pull Request -
State: closed - Opened by awaelchli 11 months ago
- 5 comments
Labels: bug, has conflicts, lightningcli, strategy: ddp, trainer, pl
#18875 - Feature/15718 add sagemaker experiments logger
Pull Request -
State: closed - Opened by tsenst 11 months ago
Labels: docs, ci, fabric, app, pl, dependencies, package
#18874 - Can't override Trainer defaults via env variables for LightningCLI
Issue -
State: closed - Opened by awaelchli 11 months ago
- 2 comments
Labels: bug, strategy: ddp, ver: 2.1.x
#18873 - fabric.save errors when checkpoint exists; introduce an override_checkpoint=True argument
Issue -
State: closed - Opened by rasbt 11 months ago
- 2 comments
Labels: feature, design, checkpointing, fabric
#18872 - DDP + static graph can result in garbage data returned by `all_gather`
Issue -
State: open - Opened by mooninrain 11 months ago
Labels: bug, 3rd party, ver: 2.0.x, repro needed
#18871 - self.log issue with torch.compile.
Issue -
State: closed - Opened by assassin1991 11 months ago
- 2 comments
Labels: bug, needs triage, ver: 2.1.x
#18870 - Support for returning LRSchedulerConfig on LightningModule.configure_optimizers
Issue -
State: open - Opened by function2-llx 11 months ago
- 1 comment
Labels: feature, help wanted, lightningmodule, lr scheduler
#18869 - Consistent imports in docs for core APIs
Pull Request -
State: closed - Opened by awaelchli 11 months ago
- 1 comment
Labels: ready, docs, pl, fun
#18867 - Fix `ModelCheckpoint` callback for no loggers case
Pull Request -
State: closed - Opened by ioangatop 11 months ago
- 2 comments
Labels: bug, ready, callback: model checkpoint, community, pl
#18866 - Update typing_extensions minimum. Add overrides to ParallelStrategy.
Pull Request -
State: open - Opened by seanbethard 11 months ago
- 4 comments
Labels: has conflicts, fabric, pl, dependencies
#18865 - Callback `ModelCheckpoint` option `save_last` without logger fails on remote FS
Issue -
State: closed - Opened by ioangatop 11 months ago
- 1 comment
Labels: bug, callback: model checkpoint, ver: 2.1.x, ver: 2.2.x
#18864 - Fix `CSVLogger` for remote FS
Pull Request -
State: open - Opened by ioangatop 11 months ago
- 3 comments
Labels: fabric
#18863 - Handle checkpoint dirpath suffix in NeptuneLogger
Pull Request -
State: closed - Opened by AleksanderWWW 11 months ago
- 6 comments
Labels: bug, ready, logger: neptune, community, pl
#18862 - Instantiation of Runners is very slow on Windows
Issue -
State: closed - Opened by newLabAspect 11 months ago
- 4 comments
Labels: question, performance, ver: 2.1.x
#18861 - `CSVLogger` fails on remote FS on version `2.1.0`
Issue -
State: open - Opened by ioangatop 11 months ago
- 2 comments
Labels: bug, logger: csv, ver: 2.1.x
#18860 - Add broadcast to Dataset Optimizer with multiple nodes
Pull Request -
State: closed - Opened by tchaton 11 months ago
- 2 comments
Labels: ready, ci, app, dependencies
#18859 - Restore support for builds without distributed
Pull Request -
State: closed - Opened by carmocca 11 months ago
- 1 comment
Labels: bug, ready, fabric, pl
#18858 - Support for torch without distributed broken
Issue -
State: closed - Opened by adamjstewart 11 months ago
- 3 comments
Labels: bug, distributed, ver: 2.1.x
#18857 - Hanging with NeMo
Issue -
State: open - Opened by szhengac 11 months ago
Labels: bug, needs triage, ver: 2.0.x
#18856 - Update debugging_basic.rst
Pull Request -
State: closed - Opened by rasbt 11 months ago
Labels: ready, docs, community, pl
#18854 - Bugfix/18394 batch size finder max val batches
Pull Request -
State: closed - Opened by BoringDonut 11 months ago
- 2 comments
Labels: bug, ready, tuner, community, pl
#18853 - Loading a distributed checkpoint with Fabric fails with a RuntimeError
Issue -
State: closed - Opened by rasbt 11 months ago
- 2 comments
Labels: bug, fabric, strategy: fsdp, ver: 2.1.x
#18852 - Add `torch.compile` guide to docs
Issue -
State: open - Opened by carmocca 11 months ago
- 1 comment
Labels: docs, fabric, performance, pl, torch.compile
#18851 - [WIP] Avoid moving XLA model to CPU in teardown [TPU]
Pull Request -
State: open - Opened by awaelchli 11 months ago
- 2 comments
Labels: accelerator: tpu, strategy: xla
#18850 - Add distributed support for StreamingDataset
Pull Request -
State: closed - Opened by tchaton 11 months ago
- 1 comment
Labels: ready
#18848 - Add throughput utilities to Fabric and the Trainer
Pull Request -
State: closed - Opened by carmocca 11 months ago
- 1 comment
Labels: feature, ready, docs, callback, fabric, pl
#18847 - Extend warning about reducing non floating types
Pull Request -
State: closed - Opened by carmocca 11 months ago
- 2 comments
Labels: feature, ready, logging, pl
#18846 - Change dangerous default random seed selection
Pull Request -
State: closed - Opened by awaelchli 11 months ago
- 3 comments
Labels: feature, ready, breaking change, fabric, reproducibility, pl
#18845 - Extra GPU usage in ddp and ddp-spawn
Issue -
State: closed - Opened by FANGAreNotGnu 11 months ago
- 2 comments
Labels: bug, strategy: ddp, ver: 2.0.x, repro needed
#18844 - Tensor wrapper subclass to avoid `fabric.backward`
Pull Request -
State: open - Opened by carmocca 11 months ago
- 2 comments
Labels: feature, fabric, pl
#18843 - Fixes in evaluation_basic.rst
Pull Request -
State: closed - Opened by rasbt 11 months ago
Labels: ready, docs, pl
#18842 - `Fabric.configure_module` breaks `@property.setter`
Issue -
State: closed - Opened by busFred 11 months ago
- 2 comments
Labels: bug, ver: 2.0.x, repro needed
#18840 - Rename PrecisionPlugin -> Precision
Pull Request -
State: closed - Opened by awaelchli 11 months ago
- 4 comments
Labels: ready, docs, refactor, fabric, plugin, pl, fun
#18838 - Add example for loading a LightningModule if it has additional init arguments
Pull Request -
State: closed - Opened by rasbt 12 months ago
- 2 comments
Labels: ready, docs, pl
#18837 - ci: fix typo in SHA ref
Pull Request -
State: closed - Opened by Borda 12 months ago
- 1 comment
Labels: ready, ci
#18836 - Stuck at loading the trainer module
Issue -
State: closed - Opened by alalith3298 12 months ago
- 2 comments
Labels: bug, accelerator: cuda, ver: 2.1.x, repro needed
#18835 - Issue with logs when using torch.compile
Issue -
State: open - Opened by Forbu 12 months ago
- 5 comments
Labels: bug, torch.compile, ver: 2.1.x
#18834 - `BatchSizeFinder` limits number of validation batches for the whole training process
Issue -
State: closed - Opened by BoringDonut 12 months ago
- 3 comments
Labels: bug, duplicate, tuner, ver: 2.0.x, ver: 1.8.x
#18833 - Add a `prefix` parameter to `self.log_dict()`
Issue -
State: closed - Opened by GaetanLepage 12 months ago
- 3 comments
Labels: feature, logging
#18832 - Have each DDP worker optimizing a specific layer of a common model
Issue -
State: open - Opened by rob-hen 12 months ago
Labels: feature, needs triage
#18831 - `training_step(dataloader_iter)` no longer moves batch to device in 2.1
Issue -
State: closed - Opened by YichengDWu 12 months ago
- 4 comments
Labels: question, docs, data handling, ver: 2.1.x
#18830 - Missing folder error when using TensorBoardLogger with S3 uri
Issue -
State: open - Opened by celpas 12 months ago
Labels: bug, needs triage, ver: 2.0.x
#18829 - Error resuming checkpoint when using `configure_model` method of `LightningModule`
Issue -
State: closed - Opened by Kinyugo 12 months ago
- 1 comment
Labels: bug, duplicate, ver: 2.0.x
#18828 - Scheduler is still stepped when optimizer stepping is skipped.
Issue -
State: closed - Opened by oguz-hanoglu 12 months ago
- 3 comments
Labels: bug, duplicate, precision: amp, ver: 2.0.x
#18827 - Improve DatasetOptimizer API
Pull Request -
State: closed - Opened by tchaton 12 months ago
- 2 comments
Labels: ready, app, dependencies
#18826 - Fix `BatchSizeFinder` leaving model in train state
Pull Request -
State: open - Opened by tanaymeh 12 months ago
- 10 comments
Labels: bug, tuner, community, pl
#18825 - Provide DDP rank in Module constructor to enable setting requires_grad worker dependent
Issue -
State: closed - Opened by rob-hen 12 months ago
Labels: feature, needs triage
#18824 - `LightningModule.to_torchscript()` does not transfer check_inputs to correct device
Issue -
State: open - Opened by pfeatherstone 12 months ago
- 6 comments
Labels: bug, good first issue, ver: 2.0.x, repro needed
#18823 - LightningCLI logger related tests not being run in pull requests
Issue -
State: closed - Opened by mauvilsa 12 months ago
- 3 comments
Labels: bug, ci, tests, ver: 2.1.x
#18822 - LightningCLI now will not allow setting a class instance as a default
Pull Request -
State: closed - Opened by mauvilsa 12 months ago
- 1 comment
Labels: ready, lightningcli, community, pl, dependencies
#18821 - Fix failing lightning cli entry point
Pull Request -
State: closed - Opened by awaelchli 12 months ago
- 2 comments
Labels: bug, ready, fabric
#18820 - transformer engine (FP8) support for FSDP training
Issue -
State: closed - Opened by naveenkumarmarri 12 months ago
- 1 comment
Labels: feature, needs triage
#18819 - Avoid false-positive warnings about method calls on the Fabric-wrapped module
Pull Request -
State: closed - Opened by awaelchli 12 months ago
- 2 comments
Labels: feature, ready, fabric, pl, fun
#18818 - Fix reduce type in FSDP mixed precision
Pull Request -
State: open - Opened by awaelchli 12 months ago
Labels: fabric, pl
#18817 - Tiny fixes for the Cache & DatasetOptimizer
Pull Request -
State: closed - Opened by tchaton 12 months ago
- 1 comment
Labels: ready
#18816 - Update bug report template for 2.1
Pull Request -
State: closed - Opened by awaelchli 12 months ago
- 1 comment
Labels: ready, ci
#18815 - lightning run cli entry point stopped working after dropping app from top level
Issue -
State: closed - Opened by awaelchli 12 months ago
Labels: bug, app, dependencies, ver: 2.1.x
#18814 - Bump @babel/traverse from 7.18.6 to 7.23.2 in /src/lightning/app/cli/react-ui-template/ui
Pull Request -
State: open - Opened by dependabot[bot] 12 months ago
- 1 comment
Labels: app, javascript
#18813 - BatchSizeFinder leaves model in the train state if used with trainer.validate
Issue -
State: open - Opened by BoringDonut 12 months ago
- 2 comments
Labels: bug, tuner, ver: 2.0.x, ver: 1.7.x, ver: 1.8.x
#18812 - LR Finder fails when using multi-node training
Issue -
State: open - Opened by praritagarwal 12 months ago
- 1 comment
Labels: question, tuner, ver: 2.1.x
#18811 - The classmethod `.load_from_checkpoint` cannot be called on an instance. Please call it on the class type and make sure the return value is used.
Issue -
State: closed - Opened by SergeySakharovskiy 12 months ago
- 1 comment
Labels: question, ver: 2.0.x, ver: 2.1.x
#18809 - Missing Positional Arguments from CLI/Config File
Issue -
State: open - Opened by tommycwh 12 months ago
- 1 comment
Labels: bug, needs triage, ver: 2.0.x
#18808 - train_dataloader not recognized in Data Module
Issue -
State: closed - Opened by jscottcronin 12 months ago
- 2 comments
Labels: needs triage, ver: 2.0.x
#18807 - Add support for text
Pull Request -
State: closed - Opened by tchaton 12 months ago
- 1 comment
Labels: ready, ci
#18806 - Bad doc webpage layout
Issue -
State: closed - Opened by yuzhenmao 12 months ago
- 3 comments
Labels: docs, ver: 2.1.x
#18805 - Access denied to save model checkpoint on AWS S3.
Issue -
State: closed - Opened by celsofranssa 12 months ago
- 1 comment
Labels: bug, needs triage, ver: 2.0.x
#18804 - Modification of the current_epoch attribute or other interesting @properties without setters
Issue -
State: open - Opened by rucky96 12 months ago
Labels: feature, needs triage
#18803 - [Bug] RuntimeError: No backend type associated with device type cpu
Issue -
State: open - Opened by shenoynikhil 12 months ago
- 12 comments
Labels: bug, working as intended, ver: 2.1.x
#18802 - FSDP not working well with BatchNorm and 16-mixed precision
Issue -
State: closed - Opened by DLlearn 12 months ago
- 3 comments
Labels: bug, 3rd party, precision: amp, strategy: fsdp
#18801 - docs: update ref to latest tutorials
Pull Request -
State: closed - Opened by pl-ghost 12 months ago
- 1 comment
Labels: ready, examples
#18800 - Docs website css is buggy
Issue -
State: closed - Opened by busFred 12 months ago
- 1 comment
Labels: docs
#18798 - Cannot use compiled model together with the `ddp` strategy
Issue -
State: closed - Opened by quancs 12 months ago
- 1 comment
Labels: bug, needs triage, ver: 2.0.x
#18796 - Add name and version
Pull Request -
State: closed - Opened by tchaton 12 months ago
- 2 comments
Labels: ready, app, dependencies
#18795 - ci: simplify/unify make docs targets
Pull Request -
State: closed - Opened by Borda 12 months ago
- 1 comment
Labels: ready, docs, ci
#18794 - Update 2.2.0dev development version and changelog
Pull Request -
State: closed - Opened by awaelchli 12 months ago
- 1 comment
Labels: fabric, app, pl, package
#18793 - Fix bug when removing last checkpoint with deepspeed
Pull Request -
State: closed - Opened by hiaoxui 12 months ago
- 1 comment
Labels: bug, ready, callback: model checkpoint, community, pl
#18792 - docs: fix pages on PyPI
Pull Request -
State: closed - Opened by Borda 12 months ago
- 3 comments
Labels: ready, ci, priority: 1, release, fabric, pl, package
#18791 - ci/release: create a PR for release bump
Pull Request -
State: closed - Opened by Borda 12 months ago
- 1 comment
Labels: ready, ci, priority: 1, release
#18790 - ci/docs: create PR only if needed
Pull Request -
State: closed - Opened by Borda 12 months ago
- 1 comment
Labels: ready, ci
#18789 - Adding test for legacy checkpoint created with 2.1.x
Pull Request -
State: closed - Opened by pl-ghost 12 months ago
- 2 comments
Labels: ready, checkpointing, tests, pl
#18788 - Introduce Dataset Optimizer
Pull Request -
State: closed - Opened by tchaton 12 months ago
- 4 comments
Labels: ready, app, dependencies
#18787 - docs: update ref to latest tutorials & fix CI trigger
Pull Request -
State: closed - Opened by pl-ghost 12 months ago
- 1 comment
Labels: ready, docs, ci, pl, examples
#18786 - Support saving and loading remote paths with FSDP
Issue -
State: open - Opened by schmidt-ai 12 months ago
- 3 comments
Labels: feature, help wanted, strategy: fsdp, ver: 2.1.x
#18785 - Revert removal of empty-parameters check for `configure_optimizers()` with FSDP
Pull Request -
State: closed - Opened by awaelchli 12 months ago
- 2 comments
Labels: bug, ready, strategy: fsdp, pl
#18784 - LightningModule.configure_callbacks overrides Trainer callbacks
Issue -
State: open - Opened by adamjstewart 12 months ago
- 12 comments
Labels: feature, discussion, lightningmodule
#18783 - docs: setting cron for periodical update tutorials
Pull Request -
State: closed - Opened by Borda 12 months ago
- 2 comments
Labels: ready, ci
#18782 - Update probot-check-group.yml to v5.4
Pull Request -
State: closed - Opened by carmocca 12 months ago
- 1 comment
Labels: ready, ci
#18781 - The training mode is accidentally enabled in training_step function.
Issue -
State: closed - Opened by w2kun 12 months ago
- 1 comment
Labels: bug, needs triage, ver: 1.9.x
#18780 - warnings: resuming before epoch end is absolutely normal for long trainings
Issue -
State: open - Opened by stas00 12 months ago
- 5 comments
Labels: feature, data handling
#18779 - xfail collective tests
Pull Request -
State: closed - Opened by carmocca 12 months ago
- 1 comment
Labels: ready, fabric, tests
#18778 - Bugfix: Pin `lightning-cloud` version
Pull Request -
State: closed - Opened by ethanwharris 12 months ago
- 1 comment
Labels: ready, app, dependencies
#18777 - `ImportError`: cannot import name 'V1CloudSpaceAppAction' from 'lightning_cloud.openapi.models'
Issue -
State: closed - Opened by ordabayevy 12 months ago
- 3 comments
Labels: bug, app, ver: 2.1.x
#18776 - Raise an exception when calling `fit` twice with spawn
Pull Request -
State: closed - Opened by carmocca 12 months ago
- 2 comments
Labels: ready, breaking change, strategy: ddp, pl, strategy: xla
#18775 - Calling `trainer.fit` twice with spawn strategies won't work as expected
Issue -
State: open - Opened by carmocca 12 months ago
Labels: bug, priority: 1, strategy: ddp, strategy: xla, ver: 2.0.x
#18774 - Minor strategy fixes [TPU]
Pull Request -
State: open - Opened by carmocca 12 months ago
- 2 comments
Labels: bug, ready, fabric, pl
#18773 - Fix spelling errors
Pull Request -
State: closed - Opened by awaelchli 12 months ago
- 1 comment
Labels: ready, docs, fabric, app, pl