rwth-i6/returnn issues and pull requests

#1737 - `DistributeFilesDataset`: `distrib_shard_files=True` leads to assertion error

Issue - State: closed - Opened by Icemole 17 days ago - 2 comments

#1726 - Distributed trainings: wait for data at every step necessary?

Issue - State: open - Opened by NeoLegends 2 months ago - 1 comment

#1725 - Add tag_prefix as a parameter to LmDataset

Pull Request - State: closed - Opened by dorian-K 3 months ago

#1724 - PT: No backend type associated with device type cpu

Issue - State: open - Opened by NeoLegends 3 months ago - 4 comments

#1723 - test_rel_pos_self_attention (nondeterministic? rare?) exception: dim_orig is None

Issue - State: open - Opened by albertz 3 months ago

#1722 - get_complete_frac fix when num_seqs is None

Pull Request - State: closed - Opened by albertz 3 months ago

#1721 - Dataset's num_seqs is None with TF backend for Viterbi training

Issue - State: closed - Opened by Marvin84 3 months ago - 2 comments

#1720 - Drop older Python support

Issue - State: open - Opened by albertz 3 months ago

#1719 - tooling: change formatter to ruff

Pull Request - State: closed - Opened by NeoLegends 3 months ago - 16 comments

#1718 - FileCache.get_file error: No space left on device

Issue - State: open - Opened by albertz 3 months ago

#1717 - `DistributeFilesDataset`: allow specifying files via list file

Pull Request - State: closed - Opened by NeoLegends 3 months ago
Labels: enhancement

#1716 - HDFDataset startup for huge dataset is slow

Issue - State: open - Opened by albertz 3 months ago - 1 comment

#1715 - PostprocessingDataset serialization fails due to map_seq_stream_preserves_num_seqs

Issue - State: closed - Opened by albertz 4 months ago

#1714 - RF RunCtx train_flag per func

Pull Request - State: closed - Opened by albertz 4 months ago

#1713 - SimpleHDFWriter, sanity checks

Pull Request - State: closed - Opened by albertz 4 months ago

#1712 - RF train_flag_ctx potentially not what you want, enable_dropout_ctx or enable_regularization_ctx or so instead

Issue - State: closed - Opened by albertz 4 months ago - 6 comments

#1711 - Fix incorrect `complete_frac` passed to `_print_process`

Pull Request - State: closed - Opened by dorian-K 4 months ago - 2 comments

#1710 - `torch.load` crash, due to changed defaults in torch >= 2.6

Issue - State: open - Opened by NeoLegends 4 months ago - 3 comments
Labels: bug

#1709 - FileCache: hold lock and refresh mtime during cleanup

Pull Request - State: closed - Opened by NeoLegends 4 months ago - 14 comments
Labels: bug

#1708 - RF combine inconsistent between native and pure Python

Issue - State: open - Opened by albertz 4 months ago - 2 comments

#1707 - Forward: OOM split batch crash on `epoch` data key

Issue - State: closed - Opened by NeoLegends 4 months ago

#1706 - Tests failing, AttributeError: module 'torch' has no attribute 'compiler', transformers lib

Issue - State: closed - Opened by albertz 4 months ago - 3 comments

#1705 - Error from changes in engine.py

Issue - State: closed - Opened by mmueller00 4 months ago - 2 comments

#1704 - Add tensorboard to torch engine

Pull Request - State: open - Opened by robin-p-schmitt 4 months ago - 2 comments

#1703 - compile_tf_graph.py error when using --rec_step_by_step for an AED network

Issue - State: open - Opened by jiangj-dc 4 months ago

#1702 - LLVM ERROR: Symbol not found: __svml_cosf8_ha

Issue - State: closed - Opened by albertz 4 months ago - 4 comments

#1701 - PostprocessingDataset with multi-processing

Issue - State: open - Opened by albertz 4 months ago - 2 comments

#1700 - MixingDataset needed

Issue - State: open - Opened by albertz 4 months ago - 3 comments

#1699 - RF tests, enable test_single_batch_entry globally

Pull Request - State: closed - Opened by albertz 5 months ago

#1698 - PT: also use `complete_frac` for progress reporting

Pull Request - State: closed - Opened by NeoLegends 5 months ago
Labels: enhancement

#1697 - PT: add randomization to bucket batching

Pull Request - State: closed - Opened by NeoLegends 5 months ago - 1 comment

#1696 - RF conv/pool, fix same padding with striding

Pull Request - State: closed - Opened by albertz 5 months ago

#1695 - RF merge_dims, fix for mult dyn dims

Pull Request - State: closed - Opened by albertz 5 months ago

#1694 - Frontend `merge_dims` problematic on dynamic dims

Issue - State: closed - Opened by albertz 5 months ago

#1693 - Frontend `conv`/`pool` 'same' padding with striding is inconsistent

Issue - State: closed - Opened by albertz 5 months ago - 3 comments

#1692 - RF conv/pool/etc, stft, window, use_mask, new behavior version 23

Pull Request - State: closed - Opened by albertz 5 months ago - 1 comment

#1691 - Frontend: masking for more functions, global setting

Issue - State: closed - Opened by albertz 5 months ago - 7 comments
Labels: potential-new-behavior, returnn-frontend

#1690 - OggZip: add option to resample audio via ffmpeg

Pull Request - State: closed - Opened by NeoLegends 5 months ago

#1689 - PT: pass dist rank/size via env to subprocesses

Pull Request - State: closed - Opened by NeoLegends 5 months ago - 3 comments

#1688 - PT preload_from_files ignores when no params are matching

Issue - State: open - Opened by albertz 6 months ago

#1687 - `rf.set_default_device` (`torch.set_default_device`?) before model creation?

Issue - State: open - Opened by albertz 6 months ago

#1686 - pytest collecting phase is slow

Issue - State: open - Opened by albertz 6 months ago

#1685 - `DFDataset`: do not pickle sharding info

Pull Request - State: closed - Opened by NeoLegends 6 months ago - 6 comments

#1684 - fix build, publish wheels

Pull Request - State: closed - Opened by dimbleby 6 months ago - 1 comment

#1683 - Automatically sorting dataset does not work with Torch engine forward + MetaDatasets

Issue - State: open - Opened by albertz 6 months ago - 4 comments

#1682 - SprintCacheDataset issue with torch backend

Issue - State: open - Opened by robin-p-schmitt 6 months ago

#1681 - Fix assert in LaplaceOrdering

Pull Request - State: closed - Opened by dorian-K 6 months ago - 2 comments

#1680 - Implicit `Tensor.bool` can cause unexpected behavior

Issue - State: closed - Opened by albertz 6 months ago - 1 comment

#1679 - Tensor, disallow bool

Pull Request - State: closed - Opened by albertz 6 months ago - 2 comments

#1678 - DistributeFilesDataset _num_shards issue

Issue - State: open - Opened by Judyxujj 6 months ago - 4 comments

#1677 - Add option to passthrough num_seqs in PostprocessingDataset

Pull Request - State: closed - Opened by dorian-K 7 months ago - 5 comments

#1676 - Dataset: implement global `dataset_distribution` option

Pull Request - State: open - Opened by NeoLegends 7 months ago - 4 comments
Labels: enhancement

#1675 - `FileNotFoundError` when updating mtime of files in file cache

Issue - State: closed - Opened by NeoLegends 7 months ago - 8 comments
Labels: bug

#1674 - Interrupt main thread on Exception in sub thread

Pull Request - State: closed - Opened by NeoLegends 7 months ago - 1 comment

#1673 - Dataset: allow LR scheduling based on `get_complete_frac`

Pull Request - State: closed - Opened by NeoLegends 7 months ago - 21 comments

#1672 - Fix _distribute_evenly_by_size for duplicate entries in files_order

Pull Request - State: closed - Opened by dorian-K 7 months ago - 1 comment

#1671 - SimpleHDFWriter extra seq lens not correct, not supporting custom seq lens

Issue - State: open - Opened by albertz 7 months ago

#1670 - Cleanup `returnn.tf.compat`

Issue - State: open - Opened by albertz 8 months ago
Labels: TensorFlow

#1669 - Loading large HDFDatasets inside MetaDataset is slow

Issue - State: open - Opened by dorian-K 8 months ago - 6 comments

#1668 - Bump required Python 3.7 -> 3.8, drop TF1 support

Pull Request - State: closed - Opened by NeoLegends 8 months ago - 7 comments

#1667 - Unhandled exceptions in threads should halt the program

Issue - State: closed - Opened by albertz 8 months ago - 1 comment

#1666 - TODO add test for batching in RF RelPosSelfAttention

Issue - State: closed - Opened by albertz 8 months ago

#1665 - RF cum_concat_step simplify and other RF things

Pull Request - State: closed - Opened by albertz 8 months ago - 5 comments

#1663 - `LockFile`: inspect other processes to check whether lockfile is held

Pull Request - State: closed - Opened by NeoLegends 8 months ago - 1 comment

#1661 - PT: add uniform likelihood bucket batching

Pull Request - State: closed - Opened by NeoLegends 8 months ago - 4 comments

#1660 - PT: allow custom batching, add bucket batching

Pull Request - State: closed - Opened by NeoLegends 8 months ago - 1 comment

#1659 - Allow skipping sequences in forward config option

Pull Request - State: closed - Opened by dorian-K 8 months ago - 1 comment

#1658 - Allow inserting 0-length elements into HDF other

Pull Request - State: closed - Opened by dorian-K 8 months ago - 1 comment

#1657 - Bump github action versions to their most recent version

Pull Request - State: closed - Opened by dorian-K 8 months ago

#1656 - PT: print padding amount per batch and subepoch

Pull Request - State: closed - Opened by NeoLegends 8 months ago - 1 comment
Labels: enhancement

#1655 - `PPDataset`: implement `BucketOrdering`

Pull Request - State: closed - Opened by NeoLegends 8 months ago - 6 comments

#1654 - Step count is not reset when loading a checkpoint and resetting the epoch

Issue - State: open - Opened by mmueller00 8 months ago - 2 comments

#1653 - Add extra_labels to SimpleHDFWriter

Pull Request - State: closed - Opened by dorian-K 8 months ago

#1652 - `PPDataset`: be strict about `seq_order` and `seq_list` in `init_seq_order`

Pull Request - State: closed - Opened by NeoLegends 8 months ago

#1651 - PostprocessingDataset init_seq_order with given seq_list or seq_order wrong (at least with map_seq_stream)

Issue - State: closed - Opened by albertz 8 months ago

#1650 - MultiEpochDataset: implement get_current_seq_order

Pull Request - State: closed - Opened by dorian-K 8 months ago - 4 comments

#1649 - Train proc manager restarts after Bus error crash, still consumes GPU memory, get OutOfMemoryError

Issue - State: open - Opened by albertz 9 months ago

#1648 - Unexpected bus error encountered in worker

Issue - State: open - Opened by albertz 9 months ago - 2 comments

#1647 - remove Nose dependency

Issue - State: closed - Opened by albertz 9 months ago - 3 comments

#1646 - `MPDataset`: make compatible with being wrapped in `PPDataset`

Pull Request - State: closed - Opened by NeoLegends 9 months ago
Labels: bug

#1645 - Plan for packed dims

Issue - State: open - Opened by albertz 9 months ago - 3 comments

#1644 - FileCache: update mtime of lockfile immediately after acquiring it

Pull Request - State: closed - Opened by NeoLegends 9 months ago - 2 comments

#1643 - FileCache assertion on previous copy attempt age triggered

Issue - State: closed - Opened by NeoLegends 9 months ago - 11 comments

#1642 - RF (PT) meaning of losses with `as_error`

Issue - State: open - Opened by albertz 9 months ago

#1641 - `Tensor` `Dim`, support `Dim.capacity > max(Dim.dyn_size_ext)`

Issue - State: open - Opened by albertz 9 months ago
Labels: TPU, JAX

#1640 - MultiProcDataset, implement get_all_tags

Pull Request - State: closed - Opened by albertz 9 months ago

#1639 - MultiEpochDataset, and some other smaller things

Pull Request - State: closed - Opened by albertz 9 months ago

#1638 - Potential timeout during data caching in multi-node trainings

Issue - State: open - Opened by NeoLegends 9 months ago - 2 comments
Labels: bug

#1637 - Some fix for invalid broadcasting

Pull Request - State: open - Opened by albertz 10 months ago - 5 comments

#1636 - RF cross_entropy (matmul, gather) should maybe have allow_broadcast?

Issue - State: open - Opened by albertz 10 months ago

#1635 - Remove outdated Python header attribs?

Issue - State: open - Opened by albertz 10 months ago

#1634 - Sharding for multi-GPU training

Issue - State: open - Opened by albertz 10 months ago - 2 comments

#1633 - Dim declare_same_as, fix when existing same_as

Pull Request - State: closed - Opened by albertz 10 months ago

#1632 - `LaplaceOrdering`: avoid spiky CPU utilization

Pull Request - State: closed - Opened by NeoLegends 10 months ago - 1 comment
Labels: bug

#1631 - `LaplaceOrdering` interacts badly w/ MultiProcDataset

Issue - State: closed - Opened by NeoLegends 10 months ago
Labels: bug

#1630 - Datasets: implement support for within-dataset sharding

Pull Request - State: closed - Opened by NeoLegends 10 months ago - 4 comments

#1629 - DeepCopyError in gradient checkpointing when using `param_variational_noise`

Issue - State: closed - Opened by mmz33 11 months ago - 5 comments

#1628 - PT: regularly sync progress during eval, fix tensor assignment

Pull Request - State: closed - Opened by NeoLegends 11 months ago - 1 comment

#1627 - PT: regularly sync progress during eval

Pull Request - State: closed - Opened by NeoLegends 11 months ago
Labels: bug

#1624 - `OggZipDataset`: normalize // to / when reading files from archive

Pull Request - State: open - Opened by NeoLegends 11 months ago - 2 comments
