Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / huggingface/datasets issues and pull requests

#5965 - "Couldn't cast array of type" in complex datasets

Issue - State: closed - Opened by piercefreeman over 1 year ago - 4 comments

#5941 - Load Data Sets Too Slow In Train Seq2seq Model

Issue - State: closed - Opened by xyx361100238 over 1 year ago - 10 comments

#5912 - Missing elements in `map` a batched dataset

Issue - State: closed - Opened by sachinruk over 1 year ago - 1 comment

#5903 - Relax `ci.yml` trigger for `pull_request` based on modified paths

Pull Request - State: open - Opened by alvarobartt over 1 year ago - 3 comments

#5896 - HuggingFace does not cache downloaded files aggressively/early enough

Issue - State: closed - Opened by geajack over 1 year ago - 2 comments

#5893 - Load cached dataset as iterable

Pull Request - State: open - Opened by mariusz-jachimowicz-83 over 1 year ago - 1 comment

#5892 - User access requests with manual review do not notify the dataset owner

Issue - State: open - Opened by leondz over 1 year ago - 1 comment

#5891 - Make split slicing consistent with list slicing

Pull Request - State: closed - Opened by mariosasko over 1 year ago - 4 comments

#5887 - HuggingsFace dataset example give error

Issue - State: open - Opened by donhuvy over 1 year ago

#5886 - Use work-stealing algorithm when parallel computing

Issue - State: open - Opened by 1014661165 over 1 year ago
Labels: enhancement

#5885 - Modify `is_remote_filesystem` to return True for FUSE-mounted paths

Pull Request - State: closed - Opened by maddiedawson over 1 year ago - 5 comments

#5883 - Fix `Dataset.to_tf_dataset` when encoding-strings & minor improvements

Pull Request - State: open - Opened by alvarobartt over 1 year ago - 9 comments

#5881 - Split dataset by node: index error when sharding iterable dataset

Issue - State: open - Opened by sanchit-gandhi over 1 year ago - 3 comments

#5878 - Prefetching for IterableDataset

Issue - State: open - Opened by vyeevani over 1 year ago - 4 comments
Labels: enhancement

#5877 - Request for text deduplication feature

Issue - State: open - Opened by SupreethRao99 over 1 year ago - 4 comments
Labels: enhancement

#5876 - Incompatibility with DataLab

Issue - State: open - Opened by helpmefindaname over 1 year ago - 1 comment

#5875 - Why split slicing doesn't behave like list slicing ?

Issue - State: closed - Opened by astariul over 1 year ago - 1 comment
Labels: duplicate

#5874 - Using as_dataset on a "parquet" builder

Issue - State: open - Opened by rems75 over 1 year ago

#5873 - Allow setting the environment variable for the lock file path

Issue - State: open - Opened by xin3he over 1 year ago
Labels: enhancement

#5872 - Fix infer module for uppercase extensions

Pull Request - State: closed - Opened by albertvillanova over 1 year ago - 2 comments

#5871 - data configuration hash suffix depends on uncanonicalized data_dir

Issue - State: open - Opened by kylrth over 1 year ago - 1 comment

#5870 - Behaviour difference between datasets.map and IterableDatasets.map

Issue - State: open - Opened by llStringll over 1 year ago - 1 comment

#5869 - Image Encoding Issue when submitting a Parquet Dataset

Issue - State: open - Opened by PhilippeMoussalli over 1 year ago - 7 comments
Labels: bug

#5868 - Is it possible to change a cached file and 're-cache' it instead of re-generating?

Issue - State: closed - Opened by zyh3826 over 1 year ago - 2 comments
Labels: enhancement

#5867 - Add logic for hashing modules/functions optimized with `torch.compile`

Pull Request - State: closed - Opened by mariosasko over 1 year ago - 5 comments

#5866 - Issue with Sequence features

Issue - State: open - Opened by alialamiidrissi over 1 year ago

#5865 - Deprecate task api

Pull Request - State: open - Opened by mariosasko over 1 year ago - 3 comments

#5864 - Slow iteration over Torch tensors

Issue - State: open - Opened by crisostomi over 1 year ago - 1 comment

#5863 - Use a new low-memory approach for tf dataset index shuffling

Pull Request - State: open - Opened by Rocketknight1 over 1 year ago - 20 comments

#5862 - IndexError: list index out of range with data hosted on Zenodo

Issue - State: open - Opened by albertvillanova over 1 year ago - 1 comment
Labels: bug

#5861 - Better error message when combining dataset dicts instead of datasets

Pull Request - State: closed - Opened by lhoestq over 1 year ago - 7 comments

#5860 - Minor tqdm optim

Pull Request - State: closed - Opened by lhoestq over 1 year ago - 3 comments

#5859 - Raise TypeError when indexing a dataset with bool

Pull Request - State: open - Opened by albertvillanova over 1 year ago - 1 comment

#5858 - Throw an error when dataset improperly indexed

Issue - State: open - Opened by sarahwie over 1 year ago - 1 comment

#5857 - Adding chemistry dataset/models in huggingface

Issue - State: open - Opened by knc6 over 1 year ago
Labels: enhancement

#5856 - Error loading natural_questions

Issue - State: open - Opened by Crownor over 1 year ago

#5855 - `to_tf_dataset` consumes too much memory

Issue - State: open - Opened by massquantity over 1 year ago - 6 comments

#5854 - Can not load audiofolder dataset on kaggle

Issue - State: closed - Opened by ILG2021 over 1 year ago - 3 comments

#5853 - [docs] Redirects, migrated from nginx

Pull Request - State: closed - Opened by julien-c over 1 year ago - 3 comments

#5852 - Iterable torch formatting

Pull Request - State: open - Opened by lhoestq over 1 year ago - 1 comment

#5851 - Error message not clear in interleaving datasets

Issue - State: closed - Opened by surya-narayanan over 1 year ago

#5850 - Make packaged builders skip non-supported file formats

Pull Request - State: open - Opened by albertvillanova over 1 year ago - 8 comments

#5849 - CSV datasets should only read the CSV data files in the repo

Issue - State: open - Opened by albertvillanova over 1 year ago
Labels: bug

#5848 - Add `accelerate` as metric's test dependency to fix CI error

Pull Request - State: closed - Opened by mariosasko over 1 year ago - 3 comments

#5847 - Streaming IterableDataset not working with translation pipeline

Issue - State: open - Opened by jlquinn over 1 year ago - 8 comments

#5846 - load_dataset('bigcode/the-stack-dedup', streaming=True) very slow!

Issue - State: closed - Opened by tbenthompson over 1 year ago - 7 comments

#5845 - Add `date_format` param to the CSV reader

Pull Request - State: closed - Opened by mariosasko over 1 year ago - 6 comments

#5843 - Can't add iterable datasets to a Dataset Dict.

Issue - State: open - Opened by surya-narayanan over 1 year ago - 2 comments

#5842 - Remove columns in interable dataset

Issue - State: open - Opened by surya-narayanan over 1 year ago - 3 comments

#5841 - Abusurdly slow on iteration

Issue - State: closed - Opened by fecet over 1 year ago - 4 comments

#5840 - load model error.

Issue - State: closed - Opened by LanShanPi over 1 year ago - 1 comment

#5839 - Make models/functions optimized with `torch.compile` hashable

Issue - State: closed - Opened by mariosasko over 1 year ago
Labels: enhancement

#5838 - Streaming support for `load_from_disk`

Issue - State: closed - Opened by Nilabhra over 1 year ago - 10 comments
Labels: enhancement

#5837 - Use DeepSpeed load myself " .csv " dataset.

Issue - State: open - Opened by LanShanPi over 1 year ago - 3 comments

#5836 - [docs] Custom decoding transforms

Pull Request - State: closed - Opened by stevhliu over 1 year ago - 4 comments

#5835 - Always set nullable fields in the writer

Pull Request - State: closed - Opened by lhoestq over 1 year ago - 4 comments

#5834 - Is uint8 supported?

Issue - State: closed - Opened by Ryou0634 over 1 year ago - 5 comments

#5833 - Unable to push dataset - `create_pr` problem

Issue - State: closed - Opened by agombert over 1 year ago - 14 comments

#5831 - [Bug]504 Server Error when loading dataset which was already cached

Issue - State: open - Opened by SingL3 over 1 year ago - 6 comments

#5830 - Debug windows #2

Pull Request - State: closed - Opened by HyukjinKwon over 1 year ago

#5828 - Stream data concatenation issue

Issue - State: closed - Opened by krishnapriya-18 over 1 year ago - 2 comments

#5827 - load json dataset interrupt when dtype cast problem occured

Issue - State: open - Opened by 1014661165 over 1 year ago - 1 comment

#5826 - Support working_dir in from_spark

Pull Request - State: open - Opened by maddiedawson over 1 year ago - 3 comments

#5825 - FileNotFound even though exists

Issue - State: closed - Opened by Muennighoff over 1 year ago - 4 comments

#5824 - Fix incomplete docstring for `BuilderConfig`

Pull Request - State: closed - Opened by Laurent2916 over 1 year ago - 2 comments

#5823 - [2.12.0] DatasetDict.save_to_disk not saving to S3

Issue - State: closed - Opened by thejamesmarq over 1 year ago - 3 comments

#5822 - Audio Dataset with_format torch problem

Issue - State: closed - Opened by paulbauriegel over 1 year ago - 2 comments

#5821 - IterableDataset Arrow formatting

Pull Request - State: open - Opened by lhoestq over 1 year ago - 5 comments

#5820 - Incomplete docstring for `BuilderConfig`

Issue - State: closed - Opened by Laurent2916 over 1 year ago - 1 comment
Labels: good first issue

#5819 - Cannot pickle error in Dataset.from_generator()

Issue - State: closed - Opened by xinghaow99 over 1 year ago - 2 comments

#5818 - Ability to update a dataset

Issue - State: open - Opened by davidgilbertson over 1 year ago - 3 comments
Labels: enhancement

#5817 - Setting `num_proc` errors when `.map` returns additional items.

Issue - State: closed - Opened by davidgilbertson over 1 year ago - 3 comments

#5816 - Preserve `stopping_strategy` of shuffled interleaved dataset (random cycling case)

Pull Request - State: closed - Opened by mariosasko over 1 year ago - 3 comments

#5815 - Easy way to create a Kaggle dataset from a Huggingface dataset?

Issue - State: open - Opened by hrbigelow over 1 year ago - 4 comments

#5814 - Repro windows crash

Pull Request - State: closed - Opened by maddiedawson over 1 year ago - 1 comment

#5813 - [DO-NOT-MERGE] Debug Windows issue at #3

Pull Request - State: closed - Opened by HyukjinKwon over 1 year ago

#5812 - Cannot shuffle interleaved IterableDataset with "all_exhausted" stopping strategy

Issue - State: closed - Opened by off99555 over 1 year ago
Labels: bug, streaming

#5810 - Add `fn_kwargs` to `map` and `filter` of `IterableDataset` and `IterableDatasetDict`

Pull Request - State: closed - Opened by yuukicammy over 1 year ago - 9 comments

#5809 - wiki_dpr details for Open Domain Question Answering tasks

Issue - State: open - Opened by yulgok22 over 1 year ago - 1 comment

#5807 - Support parallelized downloading in load_dataset with Spark

Pull Request - State: open - Opened by es94129 over 1 year ago - 2 comments

#5806 - Return the name of the currently loaded file in the load_dataset function.

Issue - State: open - Opened by s-JoL over 1 year ago - 13 comments
Labels: enhancement, good first issue

#5805 - Improve `Create a dataset` tutorial

Issue - State: open - Opened by polinaeterna over 1 year ago - 4 comments
Labels: documentation

#5804 - Set dev version

Pull Request - State: closed - Opened by lhoestq over 1 year ago - 3 comments

#5803 - Release: 2.12.0

Pull Request - State: closed - Opened by lhoestq over 1 year ago - 4 comments

#5802 - Validate non-empty data_files

Pull Request - State: closed - Opened by albertvillanova over 1 year ago - 2 comments

#5800 - Change downloaded file permission based on umask

Pull Request - State: closed - Opened by albertvillanova over 1 year ago - 1 comment

#5799 - Files downloaded to cache do not respect umask

Issue - State: closed - Opened by albertvillanova over 1 year ago
Labels: bug

#5798 - Support parallelized downloading and processing in load_dataset with Spark

Issue - State: open - Opened by es94129 over 1 year ago - 13 comments
Labels: enhancement

#5797 - load_dataset is case sentitive?

Issue - State: open - Opened by haonan-li over 1 year ago - 2 comments

#5796 - Spark docs

Pull Request - State: closed - Opened by lhoestq over 1 year ago - 4 comments