Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / huggingface/datasets issues and pull requests
#7181 - [WIP] Fix datasets export to JSON
Pull Request -
State: open - Opened by varadhbhatnagar about 14 hours ago
#7180 - Memory leak when wrapping datasets into PyTorch Dataset without explicit deletion
Issue -
State: open - Opened by iamwangyabin 1 day ago
#7179 - Support Python 3.11
Pull Request -
State: open - Opened by albertvillanova 3 days ago
- 1 comment
#7178 - Support Python 3.11
Issue -
State: open - Opened by albertvillanova 3 days ago
Labels: enhancement
#7177 - Fix release instructions
Pull Request -
State: closed - Opened by albertvillanova 3 days ago
- 1 comment
#7176 - fix grammar in fingerprint.py
Pull Request -
State: open - Opened by jxmorris12 3 days ago
#7175 - [FSTimeoutError] load_dataset
Issue -
State: open - Opened by cosmo3769 3 days ago
- 2 comments
#7174 - Set dev version
Pull Request -
State: closed - Opened by albertvillanova 4 days ago
- 1 comment
#7173 - Release: 3.0.1
Pull Request -
State: closed - Opened by albertvillanova 4 days ago
- 1 comment
#7172 - Add torchdata as a regular test dependency
Pull Request -
State: closed - Opened by albertvillanova 4 days ago
- 1 comment
#7171 - CI is broken: No solution found when resolving dependencies
Issue -
State: closed - Opened by albertvillanova 4 days ago
Labels: bug
#7170 - Support JSON lines with missing columns
Pull Request -
State: closed - Opened by albertvillanova 5 days ago
- 1 comment
#7169 - JSON lines with missing columns raise CastError
Issue -
State: closed - Opened by albertvillanova 5 days ago
Labels: bug
#7168 - sd1.5 diffusers controlnet training script gives new error
Issue -
State: open - Opened by Night1099 5 days ago
- 2 comments
#7167 - Error Mapping on sd3, sdxl and upcoming flux controlnet training scripts in diffusers
Issue -
State: open - Opened by Night1099 5 days ago
#7166 - fix docstring code example for distributed shuffle
Pull Request -
State: closed - Opened by lhoestq 5 days ago
- 1 comment
#7165 - fix increase_load_count
Pull Request -
State: closed - Opened by lhoestq 6 days ago
- 3 comments
#7164 - fsspec.exceptions.FSTimeoutError when downloading dataset
Issue -
State: open - Opened by timonmerk 6 days ago
- 2 comments
#7163 - Set explicit seed in iterable dataset ddp shuffling example
Issue -
State: closed - Opened by alex-hh 7 days ago
- 1 comment
#7162 - Support JSON lines with empty struct
Pull Request -
State: closed - Opened by albertvillanova 7 days ago
- 1 comment
#7161 - JSON lines with empty struct raise ArrowTypeError
Issue -
State: closed - Opened by albertvillanova 7 days ago
Labels: bug
#7160 - Support JSON lines with missing struct fields
Pull Request -
State: closed - Opened by albertvillanova 7 days ago
- 1 comment
#7159 - JSON lines with missing struct fields raise TypeError: Couldn't cast array
Issue -
State: closed - Opened by albertvillanova 7 days ago
Labels: bug
#7158 - google colab ex
Pull Request -
State: open - Opened by docfhsp 7 days ago
#7157 - Fix zero proba interleave datasets
Pull Request -
State: closed - Opened by lhoestq 8 days ago
- 1 comment
#7156 - interleave_datasets resets shuffle state
Issue -
State: open - Opened by jonathanasdf 9 days ago
#7155 - Dataset viewer not working! Failure due to more than 32 splits.
Issue -
State: closed - Opened by sleepingcat4 12 days ago
- 1 comment
#7154 - Support ndjson data files
Pull Request -
State: closed - Opened by albertvillanova 12 days ago
- 2 comments
#7153 - Support data files with .ndjson extension
Issue -
State: closed - Opened by albertvillanova 12 days ago
Labels: enhancement
#7151 - Align filename prefix splitting with WebDataset library
Pull Request -
State: closed - Opened by albertvillanova 14 days ago
#7150 - WebDataset loader splits keys differently than WebDataset library
Issue -
State: closed - Opened by albertvillanova 14 days ago
Labels: bug
#7149 - Datasets Unknown Keyword Argument Error - task_templates
Issue -
State: closed - Opened by varungupta31 17 days ago
- 1 comment
#7148 - Bug: Error when downloading mteb/mtop_domain
Issue -
State: closed - Opened by ZiyiXia 17 days ago
- 4 comments
#7147 - IterableDataset strange deadlock
Issue -
State: closed - Opened by jonathanasdf 17 days ago
- 6 comments
#7146 - Set dev version
Pull Request -
State: closed - Opened by albertvillanova 19 days ago
- 1 comment
#7145 - Release: 3.0.0
Pull Request -
State: closed - Opened by albertvillanova 19 days ago
- 1 comment
#7144 - Fix key error in webdataset
Pull Request -
State: closed - Opened by ragavsachdeva 19 days ago
- 7 comments
#7143 - Modify add_column() to optionally accept a FeatureType as param
Pull Request -
State: closed - Opened by varadhbhatnagar 22 days ago
- 6 comments
#7142 - Specifying datatype when adding a column to a dataset.
Issue -
State: closed - Opened by varadhbhatnagar 22 days ago
- 1 comment
Labels: enhancement
#7141 - Older datasets throwing safety errors with 2.21.0
Issue -
State: closed - Opened by alvations 23 days ago
- 17 comments
#7139 - Use load_dataset to load imagenet-1K But find a empty dataset
Issue -
State: open - Opened by fscdc 24 days ago
#7138 - Cache only changed columns?
Issue -
State: open - Opened by Modexus 25 days ago
- 2 comments
Labels: enhancement
#7137 - [BUG] dataset_info sequence unexpected behavior in README.md YAML
Issue -
State: open - Opened by ain-soph 25 days ago
- 1 comment
#7136 - Do not consume unnecessary memory during sharding
Pull Request -
State: open - Opened by janEbert 25 days ago
#7135 - Bug: Type Mismatch in Dataset Mapping
Issue -
State: open - Opened by marko1616 26 days ago
- 3 comments
#7134 - Attempting to return a rank 3 grayscale image from dataset.map results in extreme slowdown
Issue -
State: open - Opened by navidmafi 29 days ago
#7133 - remove filecheck to enable symlinks
Pull Request -
State: open - Opened by fschlatt about 1 month ago
- 5 comments
#7132 - Fix data file module inference
Pull Request -
State: open - Opened by HennerM about 1 month ago
- 3 comments
#7129 - Inconsistent output in documentation example: `num_classes` not displayed in `ClassLabel` output
Issue -
State: open - Opened by sergiopaniego about 1 month ago
#7128 - Filter Large Dataset Entry by Entry
Issue -
State: open - Opened by QiyaoWei about 1 month ago
- 3 comments
Labels: enhancement
#7127 - Caching shuffles by np.random.Generator results in unintiutive behavior
Issue -
State: open - Opened by el-hult about 1 month ago
- 5 comments
#7126 - Disable implicit token in CI
Pull Request -
State: closed - Opened by albertvillanova about 1 month ago
- 2 comments
#7125 - Fix wrong SHA in CI tests of HubDatasetModuleFactoryWithParquetExport
Pull Request -
State: closed - Opened by albertvillanova about 1 month ago
- 2 comments
#7124 - Test get_dataset_config_info with non-existing/gated/private dataset
Pull Request -
State: closed - Opened by albertvillanova about 1 month ago
- 2 comments
#7123 - Make dataset viewer more flexible in displaying metadata alongside images
Issue -
State: open - Opened by egrace479 about 1 month ago
- 1 comment
Labels: enhancement
#7122 - [interleave_dataset] sample batches from a single source at a time
Issue -
State: open - Opened by memray about 1 month ago
Labels: enhancement
#7121 - Fix typed examples iterable state dict
Pull Request -
State: closed - Opened by lhoestq about 1 month ago
- 2 comments
#7120 - don't mention the script if trust_remote_code=False
Pull Request -
State: closed - Opened by severo about 1 month ago
- 3 comments
#7119 - Install transformers with numpy-2 CI
Pull Request -
State: closed - Opened by albertvillanova about 1 month ago
- 2 comments
#7118 - Allow numpy-2.1 and test it without audio extra
Pull Request -
State: closed - Opened by albertvillanova about 1 month ago
- 2 comments
#7117 - Audio dataset load everything in RAM and is very slow
Issue -
State: open - Opened by Jourdelune about 1 month ago
- 3 comments
#7116 - datasets cannot handle nested json if features is given.
Issue -
State: closed - Opened by ljw20180420 about 1 month ago
- 3 comments
#7115 - module 'pyarrow.lib' has no attribute 'ListViewType'
Issue -
State: closed - Opened by neurafusionai about 1 month ago
- 1 comment
#7114 - Temporarily pin numpy<2.1 to fix CI
Pull Request -
State: closed - Opened by albertvillanova about 1 month ago
- 2 comments
#7113 - Stream dataset does not iterate if the batch size is larger than the dataset size (related to drop_last_batch)
Issue -
State: closed - Opened by memray about 1 month ago
- 1 comment
#7112 - cudf-cu12 24.4.1, ibis-framework 8.0.0 requires pyarrow<15.0.0a0,>=14.0.1,pyarrow<16,>=2 and datasets 2.21.0 requires pyarrow>=15.0.0
Issue -
State: open - Opened by SoumyaMB10 about 1 month ago
- 2 comments
#7111 - CI is broken for numpy-2: Failed to fetch wheel: llvmlite==0.34.0
Issue -
State: closed - Opened by albertvillanova about 1 month ago
- 2 comments
#7110 - Fix ConnectionError for gated datasets and unauthenticated users
Pull Request -
State: closed - Opened by albertvillanova about 1 month ago
- 4 comments
#7109 - ConnectionError for gated datasets and unauthenticated users
Issue -
State: closed - Opened by albertvillanova about 1 month ago
#7108 - website broken: Create a new dataset repository, doesn't create a new repo in Firefox
Issue -
State: closed - Opened by neoneye about 1 month ago
- 4 comments
#7107 - load_dataset broken in 2.21.0
Issue -
State: closed - Opened by anjor about 1 month ago
- 4 comments
#7106 - Rename LargeList.dtype to LargeList.feature
Pull Request -
State: closed - Opened by albertvillanova about 1 month ago
- 2 comments
#7105 - Use `huggingface_hub` cache
Pull Request -
State: closed - Opened by lhoestq about 2 months ago
- 7 comments
#7104 - remove more script docs
Pull Request -
State: closed - Opened by lhoestq about 2 months ago
- 2 comments
#7103 - Fix args of feature docstrings
Pull Request -
State: closed - Opened by albertvillanova about 2 months ago
- 2 comments
#7102 - Slow iteration speeds when using IterableDataset.shuffle with load_dataset(data_files=..., streaming=True)
Issue -
State: open - Opened by lajd about 2 months ago
- 2 comments
#7101 - `load_dataset` from Hub with `name` to specify `config` using incorrect builder type when multiple data formats are present
Issue -
State: open - Opened by hlky about 2 months ago
- 1 comment
#7100 - IterableDataset: cannot resolve features from list of numpy arrays
Issue -
State: open - Opened by VeryLazyBoy about 2 months ago
#7099 - Set dev version
Pull Request -
State: closed - Opened by albertvillanova about 2 months ago
- 2 comments
#7098 - Release: 2.21.0
Pull Request -
State: closed - Opened by albertvillanova about 2 months ago
- 1 comment
#7097 - Some of DownloadConfig's properties are always being overridden in load.py
Issue -
State: open - Opened by ductai199x about 2 months ago
#7096 - Automatically create `cache_dir` from `cache_file_name`
Pull Request -
State: closed - Opened by ringohoffman about 2 months ago
- 3 comments
#7094 - Add Arabic Docs to Datasets
Pull Request -
State: open - Opened by AhmedAlmaghz about 2 months ago
#7093 - Add Arabic Docs to datasets
Issue -
State: open - Opened by AhmedAlmaghz about 2 months ago
Labels: enhancement
#7092 - load_dataset with multiple jsonlines files interprets datastructure too early
Issue -
State: open - Opened by Vipitis about 2 months ago
- 5 comments
#7090 - The test test_move_script_doesnt_change_hash fails because it runs the 'python' command while the python executable has a different name
Issue -
State: open - Opened by yurivict about 2 months ago
#7089 - Missing pyspark dependency causes the testsuite to error out, instead of a few tests to be skipped
Issue -
State: open - Opened by yurivict about 2 months ago
#7088 - Disable warning when using with_format format on tensors
Issue -
State: open - Opened by Haislich about 2 months ago
Labels: enhancement
#7087 - Unable to create dataset card for Lushootseed language
Issue -
State: closed - Opened by vaishnavsudarshan about 2 months ago
- 2 comments
Labels: enhancement
#7086 - load_dataset ignores cached datasets and tries to hit HF Hub, resulting in API rate limit errors
Issue -
State: open - Opened by tginart about 2 months ago
#7085 - [Regression] IterableDataset is broken on 2.20.0
Issue -
State: closed - Opened by AjayP13 2 months ago
- 3 comments
#7084 - More easily support streaming local files
Issue -
State: open - Opened by fschlatt 2 months ago
Labels: enhancement
#7083 - fix streaming from arrow files
Pull Request -
State: closed - Opened by fschlatt 2 months ago
#7082 - Support HTTP authentication in non-streaming mode
Pull Request -
State: closed - Opened by albertvillanova 2 months ago
- 2 comments
#7081 - Set load_from_disk path type as PathLike
Pull Request -
State: closed - Opened by albertvillanova 2 months ago
- 2 comments
#7080 - Generating train split takes a long time
Issue -
State: open - Opened by alexanderswerdlow 2 months ago
#7079 - HfHubHTTPError: 500 Server Error: Internal Server Error for url:
Issue -
State: closed - Opened by neoneye 2 months ago
- 17 comments
#7078 - Fix CI test_convert_to_parquet
Pull Request -
State: closed - Opened by albertvillanova 2 months ago
- 2 comments
#7077 - column_names ignored by load_dataset() when loading CSV file
Issue -
State: open - Opened by luismsgomes 2 months ago
- 1 comment
#7076 - 🧪 Do not mock create_commit
Pull Request -
State: closed - Opened by coyotte508 2 months ago
- 1 comment