Ecosyste.ms: Issues

An open API service that provides issue and pull request metadata for open source projects.

GitHub / huggingface/datasets issues and pull requests

#5154 - Test latest fsspec in CI

Pull Request - State: closed - Opened by lhoestq almost 2 years ago - 2 comments

#5150 - Problems after upgrading to 2.6.1

Issue - State: open - Opened by pietrolesci almost 2 years ago - 10 comments

#5131 - WikiText 103 tokenizer hangs

Issue - State: closed - Opened by TrentBrick almost 2 years ago - 1 comment
Labels: bug

#5127 - [WIP] WebDataset export

Pull Request - State: closed - Opened by lhoestq almost 2 years ago - 2 comments

#5123 - datasets freezes with streaming mode in multiple-gpu

Issue - State: open - Opened by jackfeinmann5 almost 2 years ago - 11 comments
Labels: bug

#5117 - Progress bars have color red and never completed to 100%

Issue - State: closed - Opened by echatzikyriakidis almost 2 years ago - 5 comments
Labels: bug

#5096 - Transfer some canonical datasets under an organization namespace

Issue - State: closed - Opened by albertvillanova almost 2 years ago - 11 comments
Labels: dataset contribution

#5084 - IterableDataset formatting in numpy/torch/tf/jax

Pull Request - State: closed - Opened by lhoestq almost 2 years ago - 3 comments

#5083 - Support numpy/torch/tf/jax formatting for IterableDataset

Issue - State: closed - Opened by lhoestq almost 2 years ago - 2 comments
Labels: enhancement, streaming, good second issue

#5045 - Automatically revert to last successful commit to hub when a push_to_hub is interrupted

Issue - State: closed - Opened by jorahn about 2 years ago - 5 comments
Labels: enhancement

#5044 - integrate `load_from_disk` into `load_dataset`

Issue - State: open - Opened by stas00 about 2 years ago - 11 comments
Labels: enhancement

#5018 - Create all YAML dataset_info

Pull Request - State: closed - Opened by lhoestq about 2 years ago - 2 comments
Labels: dataset contribution

#5012 - Force JSON format regardless of file naming on S3

Issue - State: closed - Opened by junwang-wish about 2 years ago - 4 comments
Labels: enhancement

#5001 - Support loading XML datasets

Pull Request - State: open - Opened by albertvillanova about 2 years ago - 3 comments

#4983 - How to convert torch.utils.data.Dataset to huggingface dataset?

Issue - State: closed - Opened by DEROOCE about 2 years ago - 15 comments
Labels: enhancement

#4975 - Add `fn_kwargs` param to `IterableDataset.map`

Pull Request - State: closed - Opened by mariosasko about 2 years ago - 4 comments

#4973 - [GH->HF] Load datasets from the Hub

Pull Request - State: closed - Opened by lhoestq about 2 years ago - 2 comments

#4965 - [Apple M1] MemoryError: Cannot allocate write+execute memory for ffi.callback()

Issue - State: closed - Opened by hoangtnm about 2 years ago - 6 comments
Labels: bug

#4952 - Add test-datasets CI job

Pull Request - State: closed - Opened by lhoestq about 2 years ago - 2 comments

#4947 - Try to fix the Windows CI after TF update 2.10

Pull Request - State: closed - Opened by lhoestq about 2 years ago - 1 comment

#4926 - Dataset infos in yaml

Pull Request - State: closed - Opened by lhoestq about 2 years ago - 6 comments
Labels: dataset contribution

#4883 - With dataloader RSS memory consumed by HF datasets monotonically increases

Issue - State: open - Opened by apsdehal about 2 years ago - 44 comments
Labels: bug

#4847 - Test win ci

Pull Request - State: closed - Opened by Mr-Robot-001 about 2 years ago

#4828 - Support PIL Image objects in `add_item`/`add_column`

Pull Request - State: open - Opened by mariosasko about 2 years ago - 3 comments

#4804 - streaming dataset with concatenating splits raises an error

Issue - State: open - Opened by Bing-su about 2 years ago - 4 comments
Labels: bug

#4803 - Support `pipeline` argument in inspect.py functions

Issue - State: open - Opened by severo about 2 years ago - 1 comment
Labels: enhancement

#4800 - support LargeListArray in pyarrow

Pull Request - State: closed - Opened by Jiaxin-Wen about 2 years ago - 22 comments

#4799 - video dataset loader/parser

Issue - State: closed - Opened by nollied about 2 years ago - 3 comments
Labels: enhancement

#4796 - ArrowInvalid: Could not convert <PIL.Image.Image image mode=RGB when adding image to Dataset

Issue - State: open - Opened by NielsRogge about 2 years ago - 17 comments
Labels: bug

#4760 - Issue with offline mode

Issue - State: closed - Opened by SaulLu about 2 years ago - 15 comments
Labels: bug

#4711 - Document how to create a dataset loading script for audio/vision

Issue - State: closed - Opened by albertvillanova about 2 years ago - 1 comment
Labels: documentation

#4702 - Domain specific dataset discovery on the Hugging Face hub

Issue - State: open - Opened by davanstrien about 2 years ago - 11 comments
Labels: enhancement

#4694 - Distributed data parallel training for streaming datasets

Issue - State: open - Opened by cyk1337 about 2 years ago - 6 comments
Labels: enhancement

#4686 - Align logging with Transformers (again)

Pull Request - State: closed - Opened by mariosasko about 2 years ago - 2 comments

#4624 - Remove all paperswithcode_id: null

Pull Request - State: closed - Opened by lhoestq about 2 years ago - 3 comments

#4602 - Upgrade setuptools in windows CI

Pull Request - State: closed - Opened by lhoestq over 2 years ago - 1 comment

#4601 - Upgrade pip in WIN CI

Pull Request - State: closed - Opened by lhoestq over 2 years ago - 2 comments

#4584 - Add binary classification task IDs

Pull Request - State: closed - Opened by lewtun over 2 years ago - 4 comments

#4573 - Fix evaluation metadata for ncbi_disease

Pull Request - State: closed - Opened by lewtun over 2 years ago - 2 comments
Labels: dataset contribution

#4571 - move under the facebook org?

Issue - State: open - Opened by lewtun over 2 years ago - 3 comments

#4567 - Add evaluation data for amazon_reviews_multi

Pull Request - State: closed - Opened by lewtun over 2 years ago - 2 comments
Labels: dataset contribution

#4560 - Add evaluation metadata to imagenet-1k

Pull Request - State: closed - Opened by lewtun over 2 years ago - 2 comments
Labels: dataset contribution

#4558 - Add evaluation metadata to wmt14

Pull Request - State: closed - Opened by lewtun over 2 years ago - 2 comments
Labels: dataset contribution

#4557 - Add evaluation metadata to wmt16

Pull Request - State: closed - Opened by lewtun over 2 years ago - 3 comments
Labels: dataset contribution

#4529 - Ecoset

Issue - State: closed - Opened by DiGyt over 2 years ago - 3 comments
Labels: dataset request

#4504 - Can you please add the Stanford dog dataset?

Issue - State: closed - Opened by dgrnd4 over 2 years ago - 15 comments
Labels: good first issue, dataset request

#4482 - Test that TensorFlow is not imported on startup

Pull Request - State: closed - Opened by lhoestq over 2 years ago - 3 comments

#4463 - Use config_id to check split sizes instead of config name

Pull Request - State: closed - Opened by lhoestq over 2 years ago - 2 comments

#4461 - AttributeError: module 'datasets' has no attribute 'load_dataset'

Issue - State: closed - Opened by AlexNLP over 2 years ago - 4 comments
Labels: bug

#4448 - New Preprocessing Feature - Deduplication [Request]

Issue - State: open - Opened by yuvalkirstain over 2 years ago - 2 comments
Labels: duplicate, enhancement

#4443 - Dataset Viewer issue for openclimatefix/nimrod-uk-1km

Issue - State: open - Opened by ZYMXIXI over 2 years ago - 7 comments

#4395 - Add Pascal VOC dataset

Pull Request - State: closed - Opened by nateraw over 2 years ago - 6 comments
Labels: dataset contribution

#4394 - trainer became extremely slow after reload dataset by `load_from_disk`

Issue - State: open - Opened by conan1024hao over 2 years ago - 5 comments
Labels: bug

#4365 - Remove dots in config names

Pull Request - State: closed - Opened by lhoestq over 2 years ago - 2 comments

#4334 - Adding eval metadata for billsum

Pull Request - State: closed - Opened by sashavor over 2 years ago

#4284 - Issues in processing very large datasets

Issue - State: closed - Opened by sajastu over 2 years ago - 2 comments
Labels: bug

#4197 - Add remove_columns=True

Pull Request - State: closed - Opened by thomasw21 over 2 years ago - 4 comments

#4184 - [Librispeech] Add 'all' config

Pull Request - State: closed - Opened by patrickvonplaten over 2 years ago - 29 comments

#4183 - Document librispeech configs

Pull Request - State: closed - Opened by lhoestq over 2 years ago - 5 comments

#4175 - Add WIT Dataset

Pull Request - State: closed - Opened by thomasw21 over 2 years ago - 6 comments

#4129 - dataset metadata for reproducibility

Issue - State: open - Opened by nbroad1881 over 2 years ago - 1 comment
Labels: enhancement

#4117 - AttributeError: module 'huggingface_hub' has no attribute 'hf_api'

Issue - State: closed - Opened by arymbe over 2 years ago - 13 comments
Labels: bug

#4114 - Allow downloading just some columns of a dataset

Issue - State: open - Opened by osanseviero over 2 years ago - 8 comments
Labels: enhancement

#4104 - Add time series data - stock market

Issue - State: open - Opened by INF800 over 2 years ago - 10 comments
Labels: dataset request

#4102 - [hub] Fix `api.create_repo` call?

Pull Request - State: closed - Opened by julien-c over 2 years ago - 2 comments

#4096 - Add support for streaming Zarr stores for hosted datasets

Issue - State: closed - Opened by jacobbieker over 2 years ago - 11 comments
Labels: enhancement

#4062 - Loading mozilla-foundation/common_voice_7_0 dataset failed

Issue - State: closed - Opened by aapot over 2 years ago - 10 comments
Labels: dataset bug

#4038 - [DO NOT MERGE] Test doc-builder with skipped installation feature

Pull Request - State: closed - Opened by lewtun over 2 years ago - 2 comments

#4036 - Fix building of documentation

Pull Request - State: closed - Opened by albertvillanova over 2 years ago - 2 comments

#3984 - Local and automatic tests fail

Issue - State: closed - Opened by MarkusSagen over 2 years ago - 1 comment
Labels: bug

#3983 - Infinitely attempting lock

Issue - State: closed - Opened by jyrr over 2 years ago - 4 comments

#3979 - Fix google drive streaming for small files

Pull Request - State: closed - Opened by lhoestq over 2 years ago - 4 comments

#3978 - I can't view HFcallback dataset for ASR Space

Issue - State: open - Opened by kingabzpro over 2 years ago - 4 comments

#3960 - Load local dataset error

Issue - State: open - Opened by TXacs over 2 years ago - 13 comments
Labels: bug, dataset bug

#3956 - TypeError: __init__() missing 1 required positional argument: 'scheme'

Issue - State: closed - Opened by amirj over 2 years ago - 8 comments
Labels: bug

#3946 - Add newline to text dataset builder for controlling universal newlines mode

Pull Request - State: closed - Opened by albertvillanova over 2 years ago - 3 comments

#3941 - billsum dataset: Checksums didn't match for dataset source files:

Issue - State: closed - Opened by XingxingZhang over 2 years ago - 3 comments
Labels: bug

#3913 - Deterministic split order in DatasetDict.map

Pull Request - State: closed - Opened by lhoestq over 2 years ago - 3 comments

#3912 - add draft of registering function for pandas

Pull Request - State: closed - Opened by lvwerra over 2 years ago - 3 comments

#3867 - Update for the rename doc-builder -> hf-doc-utils

Pull Request - State: closed - Opened by sgugger over 2 years ago - 4 comments

#3865 - Add logo img

Pull Request - State: closed - Opened by mishig25 over 2 years ago - 2 comments

#3854 - load only England English dataset from common voice english dataset

Issue - State: closed - Opened by amanjaiswal777 over 2 years ago - 2 comments
Labels: question

#3847 - Datasets' cache not re-used

Issue - State: open - Opened by gejinchen over 2 years ago - 26 comments
Labels: bug

#3838 - Add a data type for labeled images (image segmentation)

Issue - State: open - Opened by severo over 2 years ago
Labels: enhancement

#3792 - Checksums didn't match for dataset source

Issue - State: closed - Opened by rafikg over 2 years ago - 26 comments
Labels: dataset-viewer

#3753 - Expanding streaming capabilities

Issue - State: open - Opened by lvwerra over 2 years ago - 6 comments
Labels: enhancement

#3735 - Performance of `datasets` at scale

Issue - State: open - Opened by lvwerra over 2 years ago - 6 comments

#3720 - Builder Configuration Update Required on Common Voice Dataset

Issue - State: closed - Opened by aasem over 2 years ago - 7 comments
Labels: bug

#3700 - Unable to load a dataset

Issue - State: closed - Opened by PaulchauvinAI over 2 years ago - 3 comments
Labels: bug

#3681 - Fix TestCommand to move dataset_infos instead of copying

Pull Request - State: closed - Opened by albertvillanova over 2 years ago - 6 comments

#3658 - Dataset viewer issue for *P3*

Issue - State: closed - Opened by jeffistyping over 2 years ago - 4 comments

#3650 - Allow 'to_json' to run in unordered fashion in order to lower memory footprint

Pull Request - State: closed - Opened by thomasw21 over 2 years ago - 6 comments

#3644 - Add a GROUP BY operator

Issue - State: open - Opened by felix-schneider over 2 years ago - 11 comments
Labels: enhancement

#3638 - AutoTokenizer hash value got change after datasets.map

Issue - State: open - Opened by tshu-w over 2 years ago - 12 comments
Labels: bug

#3618 - TIMIT Dataset not working with GPU

Issue - State: closed - Opened by TheSeamau5 over 2 years ago - 3 comments
Labels: bug

#3595 - Add ImageNet toy datasets from fastai

Pull Request - State: closed - Opened by mariosasko over 2 years ago - 1 comment
Labels: dataset contribution

#3578 - label information get lost after parquet serialization

Issue - State: closed - Opened by Tudyx over 2 years ago - 2 comments
Labels: bug