Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / mosaicml/streaming issues and pull requests
#729 - Update huggingface-hub requirement from <0.24,>=0.23.4 to >=0.23.4,<0.25
Pull Request -
State: closed - Opened by dependabot[bot] 4 months ago
- 1 comment
Labels: dependencies
#728 - GCS: Mosiacml-streaming overloads the GCP metadata service when too many processes are used.
Issue -
State: closed - Opened by smspillaz 4 months ago
- 2 comments
Labels: bug
#727 - Incorrect container name in download_from_azure
Issue -
State: closed - Opened by jaehwana2z 4 months ago
- 1 comment
Labels: bug
#726 - Download optimal for device_per_stream batching method.
Issue -
State: open - Opened by huxuan 4 months ago
- 4 comments
#725 - Replication changes sample order
Issue -
State: open - Opened by CodeCreator 4 months ago
- 3 comments
Labels: bug
#724 - Bump fastapi from 0.111.0 to 0.111.1
Pull Request -
State: closed - Opened by dependabot[bot] 4 months ago
Labels: dependencies
#723 - Improve error message on non-0 rank when index file download failed
Pull Request -
State: closed - Opened by bigning 4 months ago
#722 - Using mosaicml streaming with accelerate ?
Issue -
State: closed - Opened by benihime91 4 months ago
- 3 comments
#721 - Add hf - fix lints
Pull Request -
State: closed - Opened by XiaohanZhangCMU 5 months ago
- 2 comments
#720 - Too much disk usage after transforming to MDS format
Issue -
State: closed - Opened by LingxiaoShawn 5 months ago
- 5 comments
#719 - Fix Linting from Pillow version update
Pull Request -
State: closed - Opened by XiaohanZhangCMU 5 months ago
#718 - Bump pydantic from 2.7.4 to 2.8.2
Pull Request -
State: closed - Opened by dependabot[bot] 5 months ago
Labels: dependencies
#717 - 'File exists: "/00000_locals"' when integrated with deepspeed training scripts
Issue -
State: open - Opened by Clement25 5 months ago
- 4 comments
Labels: bug
#716 - All processes allocate memory on rank 0 during StreamingDataset initialization in a distributed setting
Issue -
State: open - Opened by ohallstrom 5 months ago
- 4 comments
Labels: bug
#715 - Bump databricks-sdk from 0.28.0 to 0.29.0
Pull Request -
State: closed - Opened by dependabot[bot] 5 months ago
Labels: dependencies
#714 - Upgrade ci_testing, remove codeql
Pull Request -
State: closed - Opened by snarayan21 5 months ago
#713 - enable adaptive retry for s3 download
Pull Request -
State: closed - Opened by bigning 5 months ago
#712 - Remove duplicate `dbfs:` prefix from error message
Pull Request -
State: closed - Opened by vanshcsingh 5 months ago
#711 - Add HF File System Support to Streaming
Pull Request -
State: closed - Opened by orionw 5 months ago
- 14 comments
#710 - Bump pytest-split from 0.8.2 to 0.9.0
Pull Request -
State: closed - Opened by dependabot[bot] 5 months ago
- 1 comment
Labels: dependencies
#709 - Optional dependency for different storages?
Issue -
State: open - Opened by huxuan 5 months ago
- 2 comments
Labels: enhancement
#708 - fix convert imagenet
Pull Request -
State: closed - Opened by Hprairie 5 months ago
#707 - AttributeError when trying to convert Imagenet1k
Issue -
State: closed - Opened by Hprairie 5 months ago
- 3 comments
Labels: bug
#706 - Fix `drop_first` checking in partitioning to account for `world_size` divisibility
Pull Request -
State: closed - Opened by snarayan21 5 months ago
#705 - Fix linting issues with numpy 2
Pull Request -
State: closed - Opened by snarayan21 5 months ago
#704 - Bump pydantic from 2.7.3 to 2.7.4
Pull Request -
State: closed - Opened by dependabot[bot] 5 months ago
- 1 comment
Labels: dependencies
#703 - Error writing to databricks UC volume
Issue -
State: closed - Opened by JK87iab 5 months ago
- 5 comments
Labels: bug
#702 - Fix edge cases with scalar or empty numpy array encoding
Pull Request -
State: closed - Opened by snarayan21 5 months ago
#701 - Raise IndexError in `Spanner` object instead of `ValueError`
Pull Request -
State: closed - Opened by snarayan21 5 months ago
- 1 comment
#700 - Enable correct resumption from the end of an epoch
Pull Request -
State: closed - Opened by snarayan21 5 months ago
#699 - [QUESTION] Ask about some detailed questions regarding the shuffle algorithm in official website.
Issue -
State: closed - Opened by yanghua 5 months ago
- 3 comments
#698 - Different batch_size for different streams
Issue -
State: closed - Opened by huxuan 5 months ago
- 2 comments
Labels: enhancement
#697 - Bump pytest from 8.2.1 to 8.2.2
Pull Request -
State: closed - Opened by dependabot[bot] 6 months ago
- 1 comment
Labels: dependencies
#696 - Bump pydantic from 2.7.2 to 2.7.3
Pull Request -
State: closed - Opened by dependabot[bot] 6 months ago
Labels: dependencies
#695 - Handle zero-sized ndarray more gracefully
Issue -
State: closed - Opened by huxuan 6 months ago
- 1 comment
Labels: bug
#694 - fix: expand user path for Writer's output directory.
Pull Request -
State: closed - Opened by huxuan 6 months ago
- 1 comment
#693 - Make sure epoch_size is an int
Pull Request -
State: closed - Opened by snarayan21 6 months ago
#692 - Bump pydantic from 2.7.1 to 2.7.2
Pull Request -
State: closed - Opened by dependabot[bot] 6 months ago
- 1 comment
Labels: dependencies
#691 - Bump uvicorn from 0.29.0 to 0.30.1
Pull Request -
State: closed - Opened by dependabot[bot] 6 months ago
- 1 comment
Labels: dependencies
#690 - DeltaTorch Compatability?
Issue -
State: closed - Opened by rangi513 6 months ago
- 3 comments
#689 - Bug that causes FileExistsError in shm
Issue -
State: closed - Opened by Shade5 6 months ago
- 6 comments
Labels: bug
#688 - Warning condition changed for Sequence Parallelism
Pull Request -
State: closed - Opened by XiaohanZhangCMU 6 months ago
#687 - Bump databricks-sdk from 0.27.1 to 0.28.0
Pull Request -
State: closed - Opened by dependabot[bot] 6 months ago
- 2 comments
Labels: dependencies
#686 - Suboptimal usage of 8xH100 GPUs - Streaming dataloader speed significantly fluctuates across batches
Issue -
State: open - Opened by VSehwag 6 months ago
- 7 comments
Labels: bug
#685 - Fix node calculation in `replication` for `World` object
Pull Request -
State: closed - Opened by snarayan21 6 months ago
#684 - Heterogeneous
Pull Request -
State: open - Opened by XiaohanZhangCMU 6 months ago
#683 - Improve local temp directory error when only `remote` is specified
Pull Request -
State: closed - Opened by snarayan21 6 months ago
- 4 comments
#682 - Fix `batch_size` typo for `Stream` object in docs
Pull Request -
State: closed - Opened by snarayan21 6 months ago
#681 - Update CODEOWNERS
Pull Request -
State: closed - Opened by karan6181 6 months ago
#680 - Bump pytest from 8.2.0 to 8.2.1
Pull Request -
State: closed - Opened by dependabot[bot] 6 months ago
- 1 comment
Labels: dependencies
#679 - Bump databricks-sdk from 0.27.0 to 0.27.1
Pull Request -
State: closed - Opened by dependabot[bot] 6 months ago
- 2 comments
Labels: dependencies
#678 - Reading all formats (parquet, csv, tsv, json) etc natively without conversion steps
Issue -
State: closed - Opened by abhijithneilabraham 6 months ago
- 2 comments
Labels: enhancement
#677 - Last entry in the dataset is causing "Relative sample index $x is not present" error
Issue -
State: open - Opened by isidentical 6 months ago
- 3 comments
Labels: bug
#676 - Using minio with StreamingDataset
Issue -
State: closed - Opened by abhijithneilabraham 6 months ago
- 1 comment
Labels: bug
#675 - Update platform references
Pull Request -
State: closed - Opened by aspfohl 6 months ago
- 1 comment
#674 - Use IndexError instead of ValueError in __getitem__
Issue -
State: closed - Opened by keaganlong 6 months ago
- 1 comment
#673 - Helpful error on `py1e` for improperly written datasets
Pull Request -
State: closed - Opened by snarayan21 6 months ago
#672 - Ensure shards cannot be larger than 4GB
Pull Request -
State: closed - Opened by snarayan21 6 months ago
#671 - Shard maximum size should be 4GB for MDS
Issue -
State: closed - Opened by smspillaz 6 months ago
- 1 comment
Labels: bug
#670 - Bump fastapi from 0.110.2 to 0.111.0
Pull Request -
State: closed - Opened by dependabot[bot] 6 months ago
- 1 comment
Labels: dependencies
#669 - Version bump to v0.7.6
Pull Request -
State: closed - Opened by snarayan21 7 months ago
#668 - Fix: having zero bytes files after converting spark dataframe to MDS saved on dbfs:/Volumes
Pull Request -
State: closed - Opened by XiaohanZhangCMU 7 months ago
- 2 comments
#667 - Bump databricks-sdk from 0.23.0 to 0.27.0
Pull Request -
State: closed - Opened by dependabot[bot] 7 months ago
- 1 comment
Labels: dependencies
#666 - Bump pydantic from 2.7.0 to 2.7.1
Pull Request -
State: closed - Opened by dependabot[bot] 7 months ago
- 3 comments
Labels: dependencies
#665 - Bump databricks-sdk from 0.23.0 to 0.26.0
Pull Request -
State: closed - Opened by dependabot[bot] 7 months ago
- 1 comment
Labels: dependencies
#664 - Bump pytest from 8.1.1 to 8.2.0
Pull Request -
State: closed - Opened by dependabot[bot] 7 months ago
- 1 comment
Labels: dependencies
#663 - clean_stale_shared_memory duplicating the master process when called in a train.py script
Issue -
State: open - Opened by antoinedandi 7 months ago
- 2 comments
Labels: bug
#662 - Support large size index.json (20GB +)
Issue -
State: open - Opened by andreamad8 7 months ago
- 2 comments
Labels: enhancement
#661 - Adding `device_per_stream` batching
Pull Request -
State: closed - Opened by snarayan21 7 months ago
#660 - Bump fastapi from 0.110.0 to 0.110.2
Pull Request -
State: closed - Opened by dependabot[bot] 7 months ago
Labels: dependencies
#659 - Integer overflow and data corruption (uncompressed mds file size is larger than 2^32)
Issue -
State: closed - Opened by jarnoseppanen-sc 7 months ago
- 6 comments
Labels: bug
#658 - How does a LRU local cache help with multi-epoch training
Issue -
State: closed - Opened by liangjuf 7 months ago
- 4 comments
Labels: enhancement
#657 - batching_method=random doesn't seem to work properly
Issue -
State: closed - Opened by oscarfossey 7 months ago
- 2 comments
Labels: bug
#656 - Does it support Preference data (for training Reward / DPO)?
Issue -
State: open - Opened by ericxsun 7 months ago
- 4 comments
Labels: enhancement
#655 - Azure Databricks MDS write ops in error: MapInPandas write_mds gives message Spark higher-order functions are not supported in Unity Catalog
Issue -
State: open - Opened by wolliq 7 months ago
- 2 comments
Labels: bug
#654 - Bump databricks-sdk from 0.23.0 to 0.25.1
Pull Request -
State: closed - Opened by dependabot[bot] 7 months ago
- 1 comment
Labels: dependencies
#653 - Bump pydantic from 2.6.4 to 2.7.0
Pull Request -
State: closed - Opened by dependabot[bot] 7 months ago
Labels: dependencies
#652 - Out of Memory when using Streaming Dataloader
Issue -
State: open - Opened by VikaasVarma 7 months ago
- 15 comments
Labels: bug
#651 - Add support for Alipan Storage backend
Pull Request -
State: closed - Opened by PeterDing 8 months ago
- 1 comment
#650 - Version bump to 0.7.5
Pull Request -
State: closed - Opened by snarayan21 8 months ago
#649 - Bump fastapi from 0.110.0 to 0.110.1
Pull Request -
State: closed - Opened by dependabot[bot] 8 months ago
- 1 comment
Labels: dependencies
#648 - Bump databricks-sdk from 0.23.0 to 0.24.0
Pull Request -
State: closed - Opened by dependabot[bot] 8 months ago
- 1 comment
Labels: dependencies
#647 - COCO Dataset fix -- avoids `allow_unsafe_types=True`
Pull Request -
State: closed - Opened by snarayan21 8 months ago
#646 - Augment existing dataset
Issue -
State: open - Opened by LWprogramming 8 months ago
- 3 comments
Labels: enhancement
#645 - Bump gitpython from 3.1.41 to 3.1.43
Pull Request -
State: closed - Opened by dependabot[bot] 8 months ago
- 2 comments
Labels: dependencies
#644 - Bump databricks-sdk from 0.22.0 to 0.23.0
Pull Request -
State: closed - Opened by dependabot[bot] 8 months ago
Labels: dependencies
#643 - GPU utilisation drop between epochs
Issue -
State: closed - Opened by rishabhm12 8 months ago
- 21 comments
Labels: bug
#642 - Bump pytest and fix failing test
Pull Request -
State: closed - Opened by snarayan21 8 months ago
- 1 comment
#641 - Update google-cloud-storage requirement from <2.11.0,>=2.9.0 to >=2.9.0,<2.17.0
Pull Request -
State: open - Opened by dependabot[bot] 8 months ago
- 1 comment
Labels: dependencies
#640 - Bump uvicorn from 0.28.0 to 0.29.0
Pull Request -
State: closed - Opened by dependabot[bot] 8 months ago
- 2 comments
Labels: dependencies
#639 - Bump pydantic from 2.5.3 to 2.6.4
Pull Request -
State: closed - Opened by dependabot[bot] 8 months ago
Labels: dependencies
#638 - Update pytest-cov requirement from <5,>=4 to >=4,<6
Pull Request -
State: closed - Opened by dependabot[bot] 8 months ago
- 3 comments
Labels: dependencies
#637 - When using the simulator, I noticed that all the ports used are 8501. Can I choose other ports?
Issue -
State: closed - Opened by tikboaHIT 8 months ago
Labels: enhancement
#636 - Major overhaul of Streaming documentation
Pull Request -
State: closed - Opened by snarayan21 8 months ago
- 3 comments
#635 - Add batch_size to 1 if not provided for regression testing
Pull Request -
State: closed - Opened by karan6181 8 months ago
#634 - Modify StreamingDataset to support passing process_group as construct…
Pull Request -
State: closed - Opened by jasonkrone 8 months ago
- 4 comments
#633 - Integrating MDS Streaming with HF Dataset Streaming
Issue -
State: closed - Opened by siddk 8 months ago
- 12 comments
Labels: enhancement
#632 - Fixed docstring note for getting sequential sample ordering
Pull Request -
State: closed - Opened by snarayan21 8 months ago
#631 - Bump furo from 2023.7.26 to 2024.1.29
Pull Request -
State: closed - Opened by dependabot[bot] 8 months ago
Labels: dependencies
#630 - Bump pypandoc from 1.12 to 1.13
Pull Request -
State: closed - Opened by dependabot[bot] 8 months ago
- 1 comment
Labels: dependencies