Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / huggingface/datasets issues and pull requests
#5795 - Fix spark imports
Pull Request -
State: closed - Opened by lhoestq over 1 year ago
- 3 comments
#5794 - CI ZeroDivisionError
Issue -
State: closed - Opened by albertvillanova over 1 year ago
- 2 comments
Labels: bug
#5793 - IterableDataset.with_format("torch") not working
Issue -
State: open - Opened by jiangwy99 over 1 year ago
- 1 comment
Labels: bug, enhancement, streaming
#5791 - TIFF/TIF support
Issue -
State: closed - Opened by sebasmos over 1 year ago
- 5 comments
Labels: enhancement
#5790 - Allow to run CI on push to ci-branch
Pull Request -
State: closed - Opened by albertvillanova over 1 year ago
- 2 comments
#5789 - Support streaming datasets that use jsonlines
Issue -
State: open - Opened by albertvillanova over 1 year ago
Labels: enhancement
#5788 - Prepare tests for hfh 0.14
Pull Request -
State: closed - Opened by Wauplin over 1 year ago
- 6 comments
#5787 - Fix inferring module for unsupported data files
Pull Request -
State: closed - Opened by albertvillanova over 1 year ago
- 4 comments
#5786 - Multiprocessing in a `filter` or `map` function with a Pytorch model
Issue -
State: closed - Opened by HugoLaurencon over 1 year ago
- 2 comments
#5785 - Unsupported data files raise TypeError: 'NoneType' object is not iterable
Issue -
State: closed - Opened by albertvillanova over 1 year ago
Labels: bug
#5784 - Raise subprocesses traceback when interrupting
Pull Request -
State: closed - Opened by lhoestq over 1 year ago
- 4 comments
#5783 - Offset overflow while doing regex on a text column
Issue -
State: open - Opened by nishanthcgit over 1 year ago
- 7 comments
#5782 - Support for various audio-loading backends instead of always relying on SoundFile
Issue -
State: closed - Opened by BoringDonut over 1 year ago
- 3 comments
Labels: enhancement
#5781 - Error using `load_datasets`
Issue -
State: closed - Opened by gjyoungjr over 1 year ago
- 2 comments
#5779 - Call fs.makedirs in save_to_disk
Pull Request -
State: closed - Opened by lhoestq over 1 year ago
- 3 comments
#5777 - datasets.load_dataset("code_search_net", "python") : NotADirectoryError: [Errno 20] Not a directory
Issue -
State: closed - Opened by jason-brian-anderson over 1 year ago
- 4 comments
#5775 - ArrowDataset.save_to_disk lost some logic of remote
Issue -
State: closed - Opened by Zoupers over 1 year ago
- 1 comment
#5773 - train_dataset does not implement __len__
Issue -
State: open - Opened by v-yunbin over 1 year ago
- 8 comments
#5771 - Support cloud storage for loading datasets
Issue -
State: closed - Opened by eli-osherovich over 1 year ago
- 1 comment
Labels: duplicate, enhancement
#5770 - Add IterableDataset.from_spark
Pull Request -
State: closed - Opened by maddiedawson over 1 year ago
- 8 comments
#5769 - Tiktoken tokenizers are not pickable
Issue -
State: closed - Opened by markovalexander over 1 year ago
- 1 comment
#5766 - Support custom feature types
Issue -
State: open - Opened by jmontalt over 1 year ago
- 4 comments
Labels: enhancement
#5765 - ValueError: You should supply an encoding or a list of encodings to this method that includes input_ids, but you provided ['text']
Issue -
State: open - Opened by sauravtii over 1 year ago
- 4 comments
#5760 - Multi-image loading in Imagefolder dataset
Issue -
State: open - Opened by vvvm23 over 1 year ago
- 5 comments
Labels: enhancement
#5752 - Streaming dataset loses `.feature` method after `.add_column`
Issue -
State: open - Opened by sanchit-gandhi over 1 year ago
- 2 comments
Labels: bug
#5747 - [WIP] Add Dataset.to_spark
Pull Request -
State: closed - Opened by maddiedawson over 1 year ago
#5744 - [BUG] With Pandas 2.0.0, `load_dataset` raises `TypeError: read_csv() got an unexpected keyword argument 'mangle_dupe_cols'`
Issue -
State: closed - Opened by keyboardAnt over 1 year ago
- 6 comments
#5736 - FORCE_REDOWNLOAD raises "Directory not empty" exception on second run
Issue -
State: open - Opened by rcasero over 1 year ago
- 3 comments
#5735 - Implement sharding on merged iterable datasets
Pull Request -
State: closed - Opened by Hubert-Bonisseur over 1 year ago
- 11 comments
#5729 - Fix nondeterministic sharded data split order
Pull Request -
State: closed - Opened by albertvillanova over 1 year ago
- 3 comments
#5728 - The order of data split names is nondeterministic
Issue -
State: closed - Opened by albertvillanova over 1 year ago
Labels: bug
#5727 - load_dataset fails with FileNotFound error on Windows
Issue -
State: open - Opened by joelkowalewski over 1 year ago
- 3 comments
#5720 - Streaming IterableDatasets do not work with torch DataLoaders
Issue -
State: open - Opened by jlehrer1 over 1 year ago
- 6 comments
#5718 - Reorder default data splits to have validation before test
Pull Request -
State: closed - Opened by albertvillanova over 1 year ago
- 3 comments
#5717 - Error when saving to disk a dataset of images
Issue -
State: open - Opened by jplu over 1 year ago
- 15 comments
#5716 - Handle empty audio
Issue -
State: closed - Opened by v-yunbin over 1 year ago
- 2 comments
#5708 - Dataset sizes are in MiB instead of MB in dataset cards
Issue -
State: closed - Opened by albertvillanova over 1 year ago
- 12 comments
Labels: bug, dataset-viewer
#5706 - Support categorical data types for Parquet
Issue -
State: closed - Opened by kklemon over 1 year ago
- 17 comments
Labels: enhancement
#5701 - Add Dataset.from_spark
Pull Request -
State: closed - Opened by maddiedawson over 1 year ago
- 19 comments
#5699 - Issue when wanting to split in memory a cached dataset
Issue -
State: open - Opened by FrancoisNoyez over 1 year ago
- 2 comments
#5695 - Loading big dataset raises pyarrow.lib.ArrowNotImplementedError
Issue -
State: closed - Opened by amariucaitheodor over 1 year ago
- 7 comments
#5692 - pyarrow.lib.ArrowInvalid: Unable to merge: Field <field> has incompatible types
Issue -
State: open - Opened by cyanic-selkie over 1 year ago
- 6 comments
#5688 - Wikipedia download_and_prepare for GCS
Issue -
State: closed - Opened by adrianfagerland over 1 year ago
- 3 comments
#5678 - Add support to create a Dataset from spark dataframe
Issue -
State: closed - Opened by lu-wang-dl over 1 year ago
- 5 comments
Labels: enhancement
#5674 - Stored XSS
Issue -
State: closed - Opened by Fadavvi over 1 year ago
- 1 comment
#5665 - Feature request: IterableDataset.push_to_hub
Issue -
State: open - Opened by NielsRogge over 1 year ago
- 5 comments
Labels: enhancement
#5659 - [Audio] Soundfile/libsndfile requirements too stringent for decoding mp3 files
Issue -
State: closed - Opened by sanchit-gandhi over 1 year ago
- 13 comments
#5651 - expanduser in save_to_disk
Issue -
State: closed - Opened by RmZeta2718 over 1 year ago
- 5 comments
Labels: good first issue
#5634 - Not all progress bars are showing up when they should for downloading dataset
Issue -
State: closed - Opened by garlandz-db over 1 year ago
- 2 comments
#5613 - Version mismatch with multiprocess and dill on Python 3.10
Issue -
State: open - Opened by adampauls over 1 year ago
- 6 comments
#5612 - Arrow map type in parquet files unsupported
Issue -
State: open - Opened by TevenLeScao over 1 year ago
- 4 comments
#5610 - use datasets streaming mode in trainer ddp mode cause memory leak
Issue -
State: open - Opened by gromzhu over 1 year ago
- 3 comments
#5604 - Problems with downloading The Pile
Issue -
State: closed - Opened by sentialx over 1 year ago
- 7 comments
#5596 - [TypeError: Couldn't cast array of type] Can only load a subset of the dataset
Issue -
State: closed - Opened by loubnabnl over 1 year ago
- 5 comments
#5594 - Error while downloading the xtreme udpos dataset
Issue -
State: closed - Opened by simran-khanuja over 1 year ago
- 21 comments
#5589 - Revert "pass the dataset features to the IterableDataset.from_generator"
Pull Request -
State: closed - Opened by lhoestq over 1 year ago
- 5 comments
#5575 - Metadata for each column
Issue -
State: open - Opened by parsa-ra over 1 year ago
- 5 comments
Labels: enhancement
#5574 - c4 dataset streaming fails with `FileNotFoundError`
Issue -
State: closed - Opened by krasserm over 1 year ago
- 12 comments
#5554 - Add resampy dep
Pull Request -
State: closed - Opened by lhoestq over 1 year ago
- 5 comments
#5545 - Added return methods for URL-references to the pushed dataset
Pull Request -
State: open - Opened by davidberenstein1957 over 1 year ago
- 6 comments
#5537 - Increase speed of data files resolution
Issue -
State: closed - Opened by lhoestq over 1 year ago
- 5 comments
Labels: enhancement, good second issue
#5536 - Failure to hash function when using .map()
Issue -
State: closed - Opened by venzen over 1 year ago
- 14 comments
#5528 - Push to hub in a pull request
Pull Request -
State: open - Opened by AJDERS over 1 year ago
- 11 comments
#5519 - Lint code with `ruff`
Pull Request -
State: closed - Opened by mariosasko over 1 year ago
- 6 comments
#5517 - `with_format("numpy")` silently downcasts float64 to float32 features
Issue -
State: open - Opened by ernestum over 1 year ago
- 13 comments
#5511 - Creating a dummy dataset from a bigger one
Issue -
State: closed - Opened by patrickvonplaten over 1 year ago
- 8 comments
#5498 - TypeError: 'bool' object is not iterable when filtering a datasets.arrow_dataset.Dataset
Issue -
State: closed - Opened by vmuel over 1 year ago
- 3 comments
#5492 - Push_to_hub in a pull request
Issue -
State: closed - Opened by lhoestq over 1 year ago
- 2 comments
Labels: enhancement, good first issue
#5484 - Update docs for `nyu_depth_v2` dataset
Pull Request -
State: closed - Opened by awsaf49 over 1 year ago
- 6 comments
#5481 - Load a cached dataset as iterable
Issue -
State: open - Opened by lhoestq over 1 year ago
- 16 comments
Labels: enhancement, good second issue
#5477 - Unpin sqlalchemy once issue is fixed
Issue -
State: closed - Opened by albertvillanova over 1 year ago
- 2 comments
#5467 - Fix conda command in readme
Pull Request -
State: closed - Opened by lhoestq over 1 year ago
- 4 comments
#5459 - Disable aiohttp requoting of redirection URL
Pull Request -
State: closed - Opened by albertvillanova over 1 year ago
- 7 comments
#5454 - Save and resume the state of a DataLoader
Issue -
State: open - Opened by lhoestq over 1 year ago
- 18 comments
Labels: enhancement, generic discussion
#5451 - ImageFolder BadZipFile: Bad offset for central directory
Issue -
State: closed - Opened by hmartiro over 1 year ago
- 3 comments
#5430 - Support Apache Beam >= 2.44.0
Issue -
State: closed - Opened by albertvillanova over 1 year ago
- 1 comment
Labels: enhancement
#5422 - Datasets load error for saved github issues
Issue -
State: open - Opened by folterj over 1 year ago
- 7 comments
#5364 - Support for writing arrow files directly with BeamWriter
Pull Request -
State: closed - Opened by mariosasko almost 2 years ago
- 6 comments
#5354 - Consider using "Sequence" instead of "List"
Issue -
State: open - Opened by tranhd95 almost 2 years ago
- 8 comments
Labels: enhancement, good first issue
#5339 - Add Video feature, videofolder, and video-classification task
Pull Request -
State: closed - Opened by nateraw almost 2 years ago
- 4 comments
#5337 - Support webdataset format
Issue -
State: closed - Opened by lhoestq almost 2 years ago
- 5 comments
#5335 - Update tasks.json
Pull Request -
State: closed - Opened by sayakpaul almost 2 years ago
- 11 comments
#5331 - Support for multiple configs in packaged modules via metadata yaml info
Pull Request -
State: open - Opened by polinaeterna almost 2 years ago
- 15 comments
#5324 - Fix docstrings and types in documentation that appears on the website
Issue -
State: open - Opened by polinaeterna almost 2 years ago
- 5 comments
Labels: documentation
#5312 - Add DatasetDict.to_pandas
Pull Request -
State: closed - Opened by lhoestq almost 2 years ago
- 12 comments
#5301 - Return a split Dataset in load_dataset
Pull Request -
State: closed - Opened by lhoestq almost 2 years ago
- 2 comments
#5286 - FileNotFoundError: Couldn't find file at https://dumps.wikimedia.org/enwiki/20220301/dumpstatus.json
Issue -
State: closed - Opened by roritol almost 2 years ago
- 2 comments
#5281 - Support cloud storage in load_dataset
Issue -
State: open - Opened by lhoestq almost 2 years ago
- 28 comments
Labels: enhancement, good second issue
#5274 - load_dataset possibly broken for gated datasets?
Issue -
State: closed - Opened by TristanThrush almost 2 years ago
- 8 comments
#5272 - Use pyarrow Tensor dtype
Issue -
State: open - Opened by franz101 almost 2 years ago
- 16 comments
Labels: enhancement
#5264 - `datasets` can't read a Parquet file in Python 3.9.13
Issue -
State: closed - Opened by loubnabnl almost 2 years ago
- 16 comments
Labels: bug
#5249 - Protect the main branch from inadvertent direct pushes
Issue -
State: closed - Opened by albertvillanova almost 2 years ago
- 1 comment
Labels: maintenance
#5243 - Download only split data
Issue -
State: open - Opened by capsabogdan almost 2 years ago
- 6 comments
Labels: enhancement
#5230 - dataclasses error when importing the library in python 3.11
Issue -
State: closed - Opened by yonikremer almost 2 years ago
- 4 comments
#5228 - Loading a dataset from the hub fails if you happen to have a folder of the same name
Issue -
State: open - Opened by dakinggg almost 2 years ago
- 3 comments
#5227 - datasets.data_files.EmptyDatasetError: The directory at wikisql doesn't contain any data files
Issue -
State: closed - Opened by ScottM-wizard almost 2 years ago
- 2 comments
#5224 - Seems to freeze when loading audio dataset with wav files from local folder
Issue -
State: closed - Opened by uriii3 almost 2 years ago
- 4 comments
#5207 - Connection error of the HuggingFace's dataset Hub due to SSLError with proxy
Issue -
State: open - Opened by leemgs almost 2 years ago
- 13 comments
#5172 - Inconsistency behavior between handling local file protocol and other FS protocols
Issue -
State: open - Opened by leoleoasd almost 2 years ago
#5156 - Unable to download dataset using Azure Data Lake Gen 2
Issue -
State: closed - Opened by clarissesimoes almost 2 years ago
- 4 comments