GitHub / Lightning-AI/pytorch-lightning issues and pull requests
#20982 - MoE (mixture of experts) support for expert parallel
Issue -
State: open - Opened by MeteorsHub 16 days ago
Labels: feature, needs triage
#20980 - docs: updating flaking links
Pull Request -
State: open - Opened by Borda 16 days ago
- 1 comment
Labels: ci, pl, package
#20979 - fix: failing markdown link test in ci
Pull Request -
State: closed - Opened by deependujha 16 days ago
- 1 comment
Labels: ci
#20977 - docs: update ref to latest tutorials
Pull Request -
State: closed - Opened by pl-ghost 18 days ago
- 1 comment
Labels: docs, examples
#20976 - Rich progress bar error when resume training
Issue -
State: open - Opened by YAndrewL 19 days ago
- 5 comments
Labels: bug, progress bar: rich, ver: 2.5.x
#20975 - fix: remove extra parameter in accelerator registry decorator
Pull Request -
State: open - Opened by YgLK 19 days ago
- 1 comment
Labels: fabric
#20974 - fix: remove extra parameter in accelerator registry decorator
Pull Request -
State: closed - Opened by YgLK 19 days ago
Labels: docs, ci, fabric, pl, dependencies, dockers, package, store, app, data
#20973 - Accelerator registry decorator usage fails with TypeError due to incorrect function signature
Issue -
State: open - Opened by YgLK 19 days ago
Labels: bug, accelerator, ver: 2.5.x
#20972 - MLFlowLogger.save_dir mishandles absolute file: URIs on Windows
Issue -
State: open - Opened by g-sawicki 20 days ago
Labels: bug, logger, logger: mlflow, ver: 2.5.x
#20971 - Add support for nvcr.io/nvidia/pytorch:25.06-py3
Pull Request -
State: closed - Opened by intexcor 20 days ago
Labels: pl, dependencies
#20970 - Proper way to use mixed precision with manual optimization
Issue -
State: open - Opened by aRI0U 21 days ago
Labels: docs, needs triage
#20969 - Recommend uv commands for development scripts
Issue -
State: open - Opened by matsumotosan 21 days ago
Labels: docs, needs triage
#20964 - Fix: Allow trainer to accept CUDAAccelerator instance as accelerator with FSDP strategy
Pull Request -
State: closed - Opened by bhimrazy 27 days ago
Labels: accelerator, strategy: fsdp, pl
#20961 - Add dev env setup guide
Pull Request -
State: closed - Opened by matsumotosan 27 days ago
Labels: ci
#20957 - Strategy `fsdp` requires a GPU accelerator, but got CUDAAccelerator
Issue -
State: closed - Opened by liopeer 28 days ago
- 1 comment
Labels: bug, accelerator, ver: 2.5.x
#20954 - Recommend dev setup / support uv
Issue -
State: closed - Opened by jjh42 29 days ago
- 2 comments
Labels: question, docs, code quality
#20952 - Make asyncio checkpointing work if validate/fit is called more than once
Pull Request -
State: open - Opened by jjh42 29 days ago
Labels: pl
#20951 - docs(csv_logs): Clarify CSV and YAML logging distinction and improve examples
Pull Request -
State: open - Opened by bhimrazy 29 days ago
Labels: docs, logger: csv, pl
#20949 - Add Support for DeepSpeed's `exclude_frozen_parameters` argument in `DeepSpeedStrategy`
Issue -
State: open - Opened by tempoxylophone about 1 month ago
Labels: feature, needs triage
#20948 - docs: Update compatibility matrix in versioning to include Lightning 2.5 series and extend PyTorch support ranges
Pull Request -
State: closed - Opened by bhimrazy about 1 month ago
Labels: docs, pl
#20947 - With automatic_optimization disabled and checkpointing every n steps, the best checkpointed model is the model obtained after backpropagation and not the one used for computing the loss
Issue -
State: open - Opened by Yann-CV about 1 month ago
Labels: bug, needs triage, ver: 2.5.x
#20946 - `sync_dist` works incorrectly when `self.log` gets key-value pairs in different order
Issue -
State: open - Opened by iwyoo about 1 month ago
- 1 comment
Labels: bug, needs triage, ver: 2.5.x
#20945 - update ModelSummary
Pull Request -
State: open - Opened by YChienHung about 1 month ago
- 1 comment
Labels: pl
#20944 - ModelSummary can't show the mode of a model which is frozen
Issue -
State: open - Opened by YChienHung about 1 month ago
Labels: feature, needs triage
#20943 - Improve type hint for `reduce_fx` in `LightningModule.log`
Pull Request -
State: closed - Opened by rittik9 about 1 month ago
Labels: pl
#20942 - ci: disable TPU testing
Pull Request -
State: closed - Opened by Borda about 1 month ago
- 1 comment
Labels: ci
#20941 - Incomplete typing in LightningModule.log method
Issue -
State: closed - Opened by gabriele-marino about 1 month ago
- 4 comments
Labels: bug, needs triage, ver: 2.5.x
#20940 - test: addressing flaky spawn in subprocesses
Pull Request -
State: closed - Opened by Borda about 1 month ago
- 2 comments
Labels: fabric, pl
#20939 - ci/gpu: setting oldest dependencies
Pull Request -
State: closed - Opened by Borda about 1 month ago
- 2 comments
Labels: ci, fabric, pl, dependencies, package
#20938 - Intel GPU support?
Issue -
State: open - Opened by MilesCranmer about 1 month ago
Labels: feature, accelerator
#20937 - LightningModule -> configure_optimizers() return type
Issue -
State: closed - Opened by jmoerk123 about 1 month ago
- 2 comments
Labels: bug, needs triage, ver: 2.5.x
#20936 - Fix wrong behavior of `DDPStrategy` option with simple GAN training using DDP
Pull Request -
State: open - Opened by samsara-ku about 1 month ago
Labels: pl
#20935 - ci/gpu: drop duplicate/confusing dep. installations
Pull Request -
State: closed - Opened by Borda about 1 month ago
- 2 comments
Labels: ci
#20934 - pin `bitsandbytes!=0.46` due to `int8_double_quant` with `ValueError`
Pull Request -
State: closed - Opened by Borda about 1 month ago
- 2 comments
Labels: fabric, pl, dependencies
#20933 - test: addressing flaky spawn "process 0 terminated with signal SIGABRT"
Pull Request -
State: closed - Opened by Borda about 1 month ago
- 2 comments
Labels: ci, pl
#20932 - MlflowException when logging checkpoints with MLFlowLogger
Issue -
State: open - Opened by leike0813 about 1 month ago
- 2 comments
Labels: bug, needs triage, ver: 2.5.x
#20931 - Model checkpointing `save_on_train_epoch_end` default behavior documentation
Pull Request -
State: closed - Opened by matsumotosan about 1 month ago
Labels: pl
#20930 - build(deps): bump mypy from 1.16.0 to 1.16.1 in /requirements
Pull Request -
State: closed - Opened by dependabot[bot] about 1 month ago
Labels: ci, dependencies
#20929 - build(deps): bump pytest from 8.4.0 to 8.4.1 in /requirements
Pull Request -
State: closed - Opened by dependabot[bot] about 1 month ago
- 1 comment
Labels: ci, fabric, pl, dependencies
#20928 - build(deps): bump pytest-random-order from 1.1.1 to 1.2.0 in /requirements
Pull Request -
State: closed - Opened by dependabot[bot] about 1 month ago
- 1 comment
Labels: ci, fabric, pl, dependencies
#20927 - build(deps): bump codecov/codecov-action from 4 to 5
Pull Request -
State: closed - Opened by dependabot[bot] about 1 month ago
Labels: ci
#20926 - compatibility patch for BnB >=0.46
Issue -
State: open - Opened by Borda about 1 month ago
Labels: bug, help wanted, good first issue, 3rd party, ver: 2.5.x
#20925 - docs: update chlog after `2.5.2` release
Pull Request -
State: closed - Opened by Borda about 1 month ago
- 1 comment
Labels: fabric, pl
#20924 - fix: update automated checkpoint messages for consistency
Pull Request -
State: closed - Opened by Borda about 1 month ago
- 1 comment
Labels: ci
#20923 - Fix nested module example
Pull Request -
State: open - Opened by matsumotosan about 1 month ago
- 1 comment
Labels: docs, pl
#20922 - Adding test for legacy checkpoint created with 2.5.2
Pull Request -
State: closed - Opened by pl-ghost about 1 month ago
- 2 comments
Labels: checkpointing, tests, pl
#20921 - Fix: `no_grad` with AMP bug
Pull Request -
State: open - Opened by baskrahmer about 1 month ago
- 2 comments
Labels: pl
#20920 - Code issue in demo
Issue -
State: open - Opened by zihaoli0629 about 1 month ago
Labels: docs, needs triage
#20919 - When checkpointing with a step interval on a validation metric, the checkpointing is done before the validation computation step
Issue -
State: open - Opened by Yann-CV about 1 month ago
Labels: bug, needs triage, ver: 2.5.x
#20918 - Minor patch release `2.5.2` [rebase & merge]
Pull Request -
State: closed - Opened by Borda about 1 month ago
- 2 comments
Labels: docs, ci, release, fabric, pl, dependencies, dockers, package, data
#20916 - Add `save_on_exception` option to `ModelCheckpoint`
Pull Request -
State: open - Opened by vsey about 1 month ago
- 1 comment
Labels: pl
#20915 - Ignore Keyword Arguments Outside of Callback Signature During `Fabric.call`
Issue -
State: open - Opened by ryan-minato about 1 month ago
Labels: refactor, needs triage
#20913 - Fabric: Enable "auto" for `devices` and `accelerator` as cli arguments
Pull Request -
State: closed - Opened by fnhirwa about 1 month ago
- 1 comment
Labels: fabric, pl
#20912 - debugging flaky `test_collective_operations` with SIGABRT
Pull Request -
State: closed - Opened by Borda about 1 month ago
- 4 comments
Labels: ci, fabric
#20911 - bump: `bitsandbytes >=0.45.2,<0.47.0` & compatibility patch for `bnb.functional.int8_double_quant`
Pull Request -
State: closed - Opened by Borda about 1 month ago
- 2 comments
Labels: 3rd party, fabric, pl, dependencies
#20910 - fix check for flaky links in readme
Pull Request -
State: closed - Opened by Borda about 1 month ago
- 2 comments
Labels: docs, ci, fabric, pl, package, data
#20909 - Tqdm print multi lines with refresh
Issue -
State: open - Opened by name-used about 1 month ago
- 5 comments
Labels: bug, needs triage, ver: 2.5.x
#20908 - build(deps): update pandoc requirement from <=2.3,>=1.0 to >=1.0,<=2.4 in /requirements
Pull Request -
State: closed - Opened by dependabot[bot] about 1 month ago
Labels: docs, ci, dependencies
#20907 - build(deps): update jsonargparse[signatures] requirement from <4.40.0,>=4.39.0 to >=4.39.0,<4.41.0 in /requirements
Pull Request -
State: closed - Opened by dependabot[bot] about 1 month ago
Labels: ci, pl, dependencies
#20906 - build(deps): update typing-extensions requirement from <4.14.0,>=4.4.0 to >=4.4.0,<4.15.0 in /requirements
Pull Request -
State: closed - Opened by dependabot[bot] about 1 month ago
- 1 comment
Labels: ci, fabric, pl, dependencies
#20905 - build(deps): bump pytest-cov from 6.1.1 to 6.2.1 in /requirements
Pull Request -
State: closed - Opened by dependabot[bot] about 1 month ago
- 1 comment
Labels: ci, fabric, pl, dependencies
#20904 - build(deps): bump coverage from 7.8.2 to 7.9.1 in /requirements
Pull Request -
State: closed - Opened by dependabot[bot] about 1 month ago
Labels: ci, fabric, pl, dependencies
#20903 - build(deps): update lightning-habana requirement from <1.3.0,>=1.2.0 to >=1.2.0,<1.7.0 in /requirements
Pull Request -
State: closed - Opened by dependabot[bot] about 1 month ago
Labels: ci, dependencies
#20902 - Metrics get mapped twice to the same epoch in MLflow logger
Issue -
State: open - Opened by bb511 about 2 months ago
Labels: bug, logger, logger: mlflow, ver: 2.5.x
#20901 - Fix Typos in Comments and Function Names Across Multiple Files
Pull Request -
State: closed - Opened by vtjl10 about 2 months ago
- 1 comment
Labels: fabric, pl
#20900 - chore: bump `mypy` from 1.15.0 to 1.16.0 and resolve typing issues
Pull Request -
State: closed - Opened by rittik9 about 2 months ago
- 1 comment
Labels: fabric, pl, dependencies
#20899 - Fixing problem for silently supporting jsonnet.
Pull Request -
State: closed - Opened by muthissar about 2 months ago
- 2 comments
Labels: pl, dependencies
#20898 - LightningCLI fails loading jsonnet config files
Issue -
State: closed - Opened by muthissar about 2 months ago
Labels: bug, ver: 2.5.x
#20897 - Fix Typo in TBPTT Documentation and Improve Trainer Docstring
Pull Request -
State: closed - Opened by kilavvy about 2 months ago
Labels: docs, fabric, pl
#20896 - feat: Default to RichProgressBar and RichModelSummary if rich is avai…
Pull Request -
State: open - Opened by littlebullGit about 2 months ago
- 2 comments
Labels: pl
#20895 - DOC: Clarify DeviceStatsMonitor logged metrics
Pull Request -
State: open - Opened by MrAnayDongre about 2 months ago
Labels: pl
#20894 - Weird bug when setting `val_check_interval` dynamically in `setup()`
Issue -
State: open - Opened by davidgill97 about 2 months ago
- 4 comments
Labels: bug, needs triage, ver: 2.5.x
#20893 - fixing various typos
Pull Request -
State: closed - Opened by Borda about 2 months ago
- 2 comments
Labels: ci, fabric, pl, dockers, package
#20892 - Fix typos: "reparametrization" → "reparameterization" and "recommed" → "recommend"
Pull Request -
State: closed - Opened by leopardracer about 2 months ago
- 1 comment
Labels: fabric, pl
#20891 - Error when learning on tpu
Issue -
State: open - Opened by intexcor about 2 months ago
Labels: bug, run TPU, ver: 2.5.x
#20890 - Warnings when learning on tpu
Issue -
State: open - Opened by intexcor about 2 months ago
- 1 comment
Labels: bug, run TPU, ver: 2.5.x
#20889 - refactor: use __all__ in accelerators/__init__.py
Pull Request -
State: closed - Opened by littlebullGit about 2 months ago
Labels: pl
#20888 - build(deps): update pandas requirement from <2.3.0,>1.0 to >1.0,<2.4.0 in /requirements
Pull Request -
State: closed - Opened by dependabot[bot] about 2 months ago
- 2 comments
Labels: ci, pl, dependencies
#20887 - build(deps): bump pytest from 8.3.5 to 8.4.0 in /requirements
Pull Request -
State: closed - Opened by dependabot[bot] about 2 months ago
- 2 comments
Labels: ci, fabric, pl, dependencies
#20886 - Use lazy string formatting in logging statement in setup.py
Pull Request -
State: closed - Opened by KAVYANSHTYAGI about 2 months ago
Labels: package
#20885 - Logging in `on_test_epoch_end` with multiple dataloaders
Issue -
State: open - Opened by pschroeppel about 2 months ago
Labels: bug, help wanted, ver: 2.5.x
#20884 - Add more accelerators for learning
Issue -
State: open - Opened by intexcor about 2 months ago
- 3 comments
Labels: feature, needs triage
#20883 - `pylint` is not happy about `_restricted_classmethod`
Issue -
State: open - Opened by stefanistrate about 2 months ago
- 1 comment
Labels: help wanted, refactor, ver: 2.5.x
#20882 - Unexplained behaviour in accumulate gradients vs in a ddp setting - why are the gradients different?
Issue -
State: closed - Opened by avishek-mondal about 2 months ago
- 1 comment
Labels: bug, needs triage, ver: 2.5.x
#20880 - Error using wandb when learning on tpu
Issue -
State: open - Opened by intexcor about 2 months ago
Labels: bug, logger, ver: 2.5.x
#20879 - bump: update base Ubuntu versions for dockers
Pull Request -
State: closed - Opened by Borda about 2 months ago
- 1 comment
Labels: ci, dockers
#20878 - build(deps): bump torch from 2.7.0 to 2.7.1 in /requirements
Pull Request -
State: closed - Opened by dependabot[bot] about 2 months ago
Labels: ci, dependencies
#20877 - bump: PyTorch to be latest `2.7.1`
Pull Request -
State: closed - Opened by Borda about 2 months ago
- 1 comment
Labels: ci, dockers
#20876 - Ensure correct device is used for autocast when mps is selected as Fabric accelerator
Pull Request -
State: closed - Opened by Armannas about 2 months ago
- 2 comments
Labels: fabric
#20875 - bugs too many
Issue -
State: open - Opened by aaaa928 about 2 months ago
- 1 comment
Labels: question
#20874 - Implement todos tensorboard
Pull Request -
State: closed - Opened by KAVYANSHTYAGI about 2 months ago
- 2 comments
Labels: fabric
#20873 - fix: move `check_inputs` to target device if available during `to_torchscript`.
Pull Request -
State: closed - Opened by GdoongMathew about 2 months ago
Labels: pl
#20872 - bugfix: add support for `global_ordinal`, `local_ordinal`, `world_size` in xla
Pull Request -
State: open - Opened by AlexandrByzov about 2 months ago
- 2 comments
Labels: bug, fabric
#20871 - PR: Fix Duplicate Metric Logging in MLFlowLogger to Prevent MLflow Database Errors
Pull Request -
State: open - Opened by KAVYANSHTYAGI about 2 months ago
Labels: pl, dependencies
#20870 - build(deps): bump mypy from 1.15.0 to 1.16.0 in /requirements
Pull Request -
State: closed - Opened by dependabot[bot] about 2 months ago
- 2 comments
Labels: ci, dependencies
#20869 - Fix progress bar display to correctly handle iterable dataset and max_steps during training
Pull Request -
State: closed - Opened by bandpooja about 2 months ago
- 3 comments
Labels: waiting on author, pl
#20868 - feat: add flops in ModelSummary columns.
Pull Request -
State: closed - Opened by GdoongMathew 2 months ago
Labels: pl
#20866 - Fix GAN training example using DDP due to find_unused_parameters
Issue -
State: open - Opened by samsara-ku 2 months ago
- 2 comments
Labels: bug, needs triage, ver: 2.5.x
#20865 - Mlflow logging LR duplicate key issue with PostgreSQL DB #190
Issue -
State: open - Opened by anaprietonem 2 months ago
Labels: bug, needs triage, ver: 2.5.x
#20864 - Add documentation warning: Don’t use torch.profiler.profile context manager around Trainer methods
Pull Request -
State: open - Opened by KAVYANSHTYAGI 2 months ago
Labels: docs, pl