Ecosyste.ms: Issues

An open API service that provides issue and pull request metadata for open source projects.

GitHub / huggingface/datasets issues and pull requests

#3567 - Fix push to hub to allow individual split push

Pull Request - State: closed - Opened by thomasw21 over 2 years ago - 1 comment

#3547 - Datasets created with `push_to_hub` can't be accessed in offline mode

Issue - State: closed - Opened by TevenLeScao over 2 years ago - 18 comments
Labels: bug

#3504 - Unable to download PUBMED_title_abstracts_2019_baseline.jsonl.zst

Issue - State: closed - Opened by ToddMorrill almost 3 years ago - 10 comments
Labels: bug, dataset bug

#3474 - Decode images when iterating

Pull Request - State: closed - Opened by lhoestq almost 3 years ago

#3468 - Add COCO dataset

Pull Request - State: closed - Opened by mariosasko almost 3 years ago - 7 comments
Labels: dataset contribution

#3465 - Unable to load 'cnn_dailymail' dataset

Issue - State: closed - Opened by talha1503 almost 3 years ago - 4 comments
Labels: bug, duplicate, dataset bug

#3460 - Don't encode lists as strings when using `Value("string")`

Pull Request - State: closed - Opened by lhoestq almost 3 years ago - 3 comments

#3455 - Easier information editing

Issue - State: closed - Opened by borgr almost 3 years ago - 2 comments
Labels: enhancement, generic discussion

#3450 - Unexpected behavior doing Split + Filter

Issue - State: closed - Opened by jbrachat almost 3 years ago - 1 comment
Labels: bug

#3449 - Add `__add__()`, `__iadd__()` and similar to `Dataset` class

Issue - State: closed - Opened by sgraaf almost 3 years ago - 2 comments
Labels: enhancement, generic discussion

#3446 - Remove redundant local path information in audio/image datasets

Pull Request - State: closed - Opened by mariosasko almost 3 years ago - 3 comments
Labels: dataset contribution

#3444 - Align the Dataset and IterableDataset processing API

Issue - State: open - Opened by lhoestq almost 3 years ago - 8 comments
Labels: enhancement, generic discussion

#3401 - Add Wikimedia pre-processed datasets

Issue - State: open - Opened by albertvillanova almost 3 years ago - 1 comment
Labels: dataset request

#3365 - Add task tags for multimodal datasets

Issue - State: closed - Opened by albertvillanova almost 3 years ago - 1 comment
Labels: enhancement

#3338 - [WIP] Add doctests for tutorials

Pull Request - State: closed - Opened by stevhliu almost 3 years ago - 1 comment

#3334 - Integrate Polars library

Issue - State: closed - Opened by albertvillanova almost 3 years ago - 8 comments
Labels: enhancement

#3299 - Add option to find unique elements in nested sequences when calling `Dataset.unique`

Issue - State: open - Opened by mariosasko almost 3 years ago - 4 comments
Labels: enhancement

#3220 - Add documentation about dataset viewer feature

Issue - State: open - Opened by albertvillanova almost 3 years ago - 1 comment
Labels: enhancement, dataset-viewer

#3178 - "Property couldn't be hashed properly" even though fully picklable

Issue - State: closed - Opened by BramVanroy almost 3 years ago - 26 comments
Labels: bug

#3172 - `SystemError 15` thrown in `Dataset.__del__` when using `Dataset.map()` with `num_proc>1`

Issue - State: closed - Opened by vlievin almost 3 years ago - 12 comments
Labels: bug

#3142 - Provide a way to write a streamed dataset to the disk

Issue - State: open - Opened by severo almost 3 years ago - 2 comments
Labels: enhancement, dataset-viewer

#3113 - Loading Data from HDF files

Issue - State: open - Opened by FeryET almost 3 years ago - 7 comments
Labels: enhancement, good second issue

#2976 - Can't load dataset

Issue - State: closed - Opened by mskovalova about 3 years ago - 4 comments
Labels: bug

#2969 - medical-dialog error

Issue - State: closed - Opened by smeyerhot about 3 years ago - 3 comments
Labels: bug

#2964 - Error when calculating Matthews Correlation Coefficient loaded with `load_metric`

Issue - State: closed - Opened by alvarobartt about 3 years ago - 1 comment
Labels: bug

#2956 - Cache problem in the `load_dataset` method for local compressed file(s)

Issue - State: open - Opened by SaulLu about 3 years ago - 1 comment
Labels: bug

#2924 - "File name too long" error for file locks

Issue - State: closed - Opened by gar1t about 3 years ago - 12 comments
Labels: bug

#2869 - TypeError: 'NoneType' object is not callable

Issue - State: closed - Opened by Chenfei-Kang about 3 years ago - 9 comments
Labels: bug

#2868 - Add Common Objects in 3D (CO3D)

Issue - State: open - Opened by nateraw about 3 years ago
Labels: dataset request, vision

#2838 - Add error_bad_chunk to the JSON loader

Pull Request - State: open - Opened by lhoestq about 3 years ago - 4 comments

#2825 - The datasets.map function does not load cached dataset after moving python script

Issue - State: closed - Opened by hobbitlzy about 3 years ago - 6 comments
Labels: bug

#2818 - cannot load data from my local path

Issue - State: closed - Opened by yang-collect about 3 years ago - 1 comment
Labels: bug

#2787 - ConnectionError: Couldn't reach https://raw.githubusercontent.com

Issue - State: closed - Opened by jinec about 3 years ago - 9 comments
Labels: bug

#2775 - `generate_random_fingerprint()` deterministic with 🤗Transformers' `set_seed()`

Issue - State: closed - Opened by mbforbes about 3 years ago - 3 comments
Labels: bug

#2773 - Remove dataset_infos.json

Issue - State: closed - Opened by albertvillanova about 3 years ago - 1 comment
Labels: enhancement, generic discussion

#2763 - English wikipedia datasets is not clean

Issue - State: closed - Opened by lucadiliello about 3 years ago - 1 comment
Labels: bug

#2699 - cannot combine splits merging and streaming?

Issue - State: open - Opened by eyaler about 3 years ago - 5 comments
Labels: bug

#2689 - cannot save the dataset to disk after rename_column

Issue - State: closed - Opened by PaulLerner about 3 years ago - 4 comments
Labels: bug

#2666 - Adds CodeClippy dataset [WIP]

Pull Request - State: closed - Opened by arampacha about 3 years ago - 2 comments
Labels: dataset contribution

#2656 - Change `from_csv` default arguments

Pull Request - State: closed - Opened by SBrandeis about 3 years ago - 1 comment

#2655 - Allow the selection of multiple columns at once

Issue - State: closed - Opened by Dref360 about 3 years ago - 5 comments
Labels: enhancement

#2650 - [load_dataset] shard and parallelize the process

Issue - State: closed - Opened by stas00 about 3 years ago - 4 comments
Labels: enhancement

#2642 - Support multi-worker with streaming dataset (IterableDataset).

Issue - State: open - Opened by cccntu about 3 years ago - 3 comments
Labels: enhancement

#2618 - `filelock.py` Error

Issue - State: closed - Opened by liyucheng09 about 3 years ago - 2 comments
Labels: bug

#2514 - Can datasets remove duplicated rows?

Issue - State: open - Opened by liuxinglan over 3 years ago - 12 comments
Labels: enhancement

#2462 - Merge DatasetDict and Dataset

Issue - State: open - Opened by albertvillanova over 3 years ago - 2 comments
Labels: enhancement, generic discussion

#2377 - ArrowDataset.save_to_disk produces files that cannot be read using pyarrow.feather

Issue - State: open - Opened by Ark-kun over 3 years ago - 4 comments
Labels: bug

#2371 - Align question answering tasks with sub-domains

Issue - State: closed - Opened by lewtun over 3 years ago - 1 comment
Labels: enhancement

#2370 - Adding HendrycksTest dataset

Pull Request - State: closed - Opened by andyzoujm over 3 years ago - 5 comments

#2252 - Slow dataloading with big datasets issue persists

Issue - State: closed - Opened by hwijeen over 3 years ago - 70 comments

#2096 - CoNLL 2003 dataset not including German

Issue - State: closed - Opened by rxian over 3 years ago - 2 comments
Labels: dataset request

#2089 - Add documentation for dataset README.md files

Issue - State: closed - Opened by PhilipMay over 3 years ago - 8 comments

#2060 - Filtering refactor

Pull Request - State: closed - Opened by theo-m over 3 years ago - 10 comments

#2058 - Is it possible to convert a `tfds` to HuggingFace `dataset`?

Issue - State: closed - Opened by abarbosa94 over 3 years ago - 1 comment

#2035 - wiki40b/wikipedia for almost all languages cannot be downloaded

Issue - State: closed - Opened by dorost1234 over 3 years ago - 11 comments

#2003 - Messages are being printed to the `stdout`

Issue - State: closed - Opened by mahnerak over 3 years ago - 3 comments

#1992 - `datasets.map` multi processing much slower than single processing

Issue - State: open - Opened by hwijeen over 3 years ago - 14 comments
Labels: bug

#1933 - Use arrow ipc file format

Pull Request - State: closed - Opened by lhoestq over 3 years ago - 3 comments

#1835 - Add CHiME4 dataset

Issue - State: open - Opened by patrickvonplaten over 3 years ago - 4 comments
Labels: dataset request, speech

#1796 - Filter on dataset too much slowww

Issue - State: open - Opened by ayubSubhaniya over 3 years ago - 9 comments

#1774 - is it possible to make slice to be more compatible like python list and numpy?

Issue - State: closed - Opened by world2vec over 3 years ago - 2 comments

#1742 - Add GLUE Compat (compatible with transformers<3.5.0)

Pull Request - State: closed - Opened by JetRunner over 3 years ago - 2 comments

#1627 - `Dataset.map` disable progress bar

Issue - State: closed - Opened by Nickil21 almost 4 years ago - 3 comments

#1600 - AttributeError: 'DatasetDict' object has no attribute 'train_test_split'

Issue - State: closed - Opened by david-waterworth almost 4 years ago - 7 comments
Labels: question

#1443 - Add OPUS Wikimedia Translations Dataset

Pull Request - State: closed - Opened by abhishekkrthakur almost 4 years ago - 1 comment
Labels: dataset contribution

#1407 - Add Tweet Eval Dataset

Pull Request - State: closed - Opened by abhishekkrthakur almost 4 years ago - 4 comments

#1297 - OPUS Ted Talks 2013

Pull Request - State: closed - Opened by abhishekkrthakur almost 4 years ago

#1245 - Add Google Turkish Treebank Dataset

Pull Request - State: closed - Opened by abhishekkrthakur almost 4 years ago - 1 comment
Labels: dataset contribution

#1243 - Add Google Noun Verb Dataset

Pull Request - State: closed - Opened by abhishekkrthakur almost 4 years ago - 1 comment
Labels: dataset contribution

#1240 - Multi Domain Sentiment Analysis Dataset (MDSA)

Pull Request - State: closed - Opened by abhishekkrthakur almost 4 years ago - 9 comments
Labels: dataset contribution

#1206 - Adding Enriched WebNLG dataset

Pull Request - State: closed - Opened by TevenLeScao almost 4 years ago - 3 comments

#961 - sample multiple datasets

Issue - State: closed - Opened by rabeehk almost 4 years ago - 6 comments

#960 - Add code to automate parts of the dataset card

Pull Request - State: closed - Opened by patrickvonplaten almost 4 years ago

#937 - Local machine/cluster Beam Datasets example/tutorial

Issue - State: closed - Opened by shangw-nvidia almost 4 years ago - 2 comments

#876 - imdb dataset cannot be loaded

Issue - State: closed - Opened by rabeehk almost 4 years ago - 6 comments

#873 - load_dataset('cnn_dalymail', '3.0.0') gives a 'Not a directory' error

Issue - State: closed - Opened by vishal-burman almost 4 years ago - 13 comments

#868 - Consistent metric outputs

Pull Request - State: closed - Opened by lhoestq almost 4 years ago - 2 comments
Labels: transfer-to-evaluate

#856 - Add open book corpus

Pull Request - State: closed - Opened by vblagoje almost 4 years ago - 21 comments

#843 - use_custom_baseline still produces errors for bertscore

Issue - State: closed - Opened by penatbater almost 4 years ago - 5 comments
Labels: metric bug

#824 - Discussion using datasets in offline mode

Issue - State: closed - Opened by mandubian almost 4 years ago - 11 comments
Labels: enhancement, generic discussion

#693 - Rachel ker add dataset/mlsum

Pull Request - State: closed - Opened by pdhg almost 4 years ago - 1 comment

#662 - Created dataset card snli.md

Pull Request - State: closed - Opened by mcmillanmajora about 4 years ago - 1 comment
Labels: Dataset discussion

#645 - Don't use take on dataset table in pyarrow 1.0.x

Pull Request - State: closed - Opened by lhoestq about 4 years ago - 4 comments

#605 - [Datasets] Transmit format to children

Pull Request - State: closed - Opened by thomwolf about 4 years ago - 1 comment

#599 - Add MATINF dataset

Pull Request - State: closed - Opened by JetRunner about 4 years ago - 2 comments

#562 - [Reproductibility] Allow to pin versions of datasets/metrics

Pull Request - State: closed - Opened by thomwolf about 4 years ago - 1 comment

#546 - Very slow data loading on large dataset

Issue - State: closed - Opened by agemagician about 4 years ago - 26 comments

#480 - Column indexing hotfix

Pull Request - State: closed - Opened by TevenLeScao about 4 years ago - 2 comments

#462 - add DoQA (ACL 2020) dataset

Pull Request - State: closed - Opened by mariamabarham about 4 years ago

#461 - Doqa

Pull Request - State: closed - Opened by mariamabarham about 4 years ago

#456 - add crd3(ACL 2020) dataset

Pull Request - State: closed - Opened by mariamabarham about 4 years ago

#449 - add reuters21578 dataset

Pull Request - State: closed - Opened by mariamabarham about 4 years ago - 3 comments

#406 - Faster Shuffling?

Issue - State: closed - Opened by mitchellgordon95 about 4 years ago - 7 comments