Ecosyste.ms: Issues
An open API service that provides issue and pull request metadata for open source projects.
GitHub / huggingface/datasets issues and pull requests
#5965 - "Couldn't cast array of type" in complex datasets
Issue -
State: closed - Opened by piercefreeman over 1 year ago
- 4 comments
#5961 - IterableDataset: split by node and map may preprocess samples that will be skipped anyway
Issue -
State: open - Opened by johnchienbronci over 1 year ago
- 9 comments
#5941 - Load Data Sets Too Slow In Train Seq2seq Model
Issue -
State: closed - Opened by xyx361100238 over 1 year ago
- 10 comments
#5923 - Cannot import datasets - ValueError: pyarrow.lib.IpcWriteOptions size changed, may indicate binary incompatibility
Issue -
State: closed - Opened by ehuangc over 1 year ago
- 25 comments
#5912 - Missing elements in `map` a batched dataset
Issue -
State: closed - Opened by sachinruk over 1 year ago
- 1 comment
#5903 - Relax `ci.yml` trigger for `pull_request` based on modified paths
Pull Request -
State: open - Opened by alvarobartt over 1 year ago
- 3 comments
#5896 - HuggingFace does not cache downloaded files aggressively/early enough
Issue -
State: closed - Opened by geajack over 1 year ago
- 2 comments
#5893 - Load cached dataset as iterable
Pull Request -
State: open - Opened by mariusz-jachimowicz-83 over 1 year ago
- 1 comment
#5892 - User access requests with manual review do not notify the dataset owner
Issue -
State: open - Opened by leondz over 1 year ago
- 1 comment
#5891 - Make split slicing consistent with list slicing
Pull Request -
State: closed - Opened by mariosasko over 1 year ago
- 4 comments
#5889 - Token Alignment for input and output data over train and test batch/dataset.
Issue -
State: open - Opened by akesh1235 over 1 year ago
#5888 - A way to upload and visualize .mp4 files (millions of them) as part of a dataset
Issue -
State: open - Opened by AntreasAntoniou over 1 year ago
- 5 comments
#5887 - HuggingsFace dataset example give error
Issue -
State: open - Opened by donhuvy over 1 year ago
#5886 - Use work-stealing algorithm when parallel computing
Issue -
State: open - Opened by 1014661165 over 1 year ago
Labels: enhancement
#5885 - Modify `is_remote_filesystem` to return True for FUSE-mounted paths
Pull Request -
State: closed - Opened by maddiedawson over 1 year ago
- 5 comments
#5884 - `Dataset.to_tf_dataset` fails when strings cannot be encoded as `np.bytes_`
Issue -
State: open - Opened by alvarobartt over 1 year ago
- 2 comments
#5883 - Fix `Dataset.to_tf_dataset` when encoding-strings & minor improvements
Pull Request -
State: open - Opened by alvarobartt over 1 year ago
- 9 comments
#5881 - Split dataset by node: index error when sharding iterable dataset
Issue -
State: open - Opened by sanchit-gandhi over 1 year ago
- 3 comments
#5880 - load_dataset from s3 file system through streaming can't not iterate data
Issue -
State: open - Opened by janineguo over 1 year ago
#5878 - Prefetching for IterableDataset
Issue -
State: open - Opened by vyeevani over 1 year ago
- 4 comments
Labels: enhancement
#5877 - Request for text deduplication feature
Issue -
State: open - Opened by SupreethRao99 over 1 year ago
- 4 comments
Labels: enhancement
#5876 - Incompatibility with DataLab
Issue -
State: open - Opened by helpmefindaname over 1 year ago
- 1 comment
#5875 - Why split slicing doesn't behave like list slicing ?
Issue -
State: closed - Opened by astariul over 1 year ago
- 1 comment
Labels: duplicate
#5874 - Using as_dataset on a "parquet" builder
Issue -
State: open - Opened by rems75 over 1 year ago
#5873 - Allow setting the environment variable for the lock file path
Issue -
State: open - Opened by xin3he over 1 year ago
Labels: enhancement
#5872 - Fix infer module for uppercase extensions
Pull Request -
State: closed - Opened by albertvillanova over 1 year ago
- 2 comments
#5871 - data configuration hash suffix depends on uncanonicalized data_dir
Issue -
State: open - Opened by kylrth over 1 year ago
- 1 comment
#5870 - Behaviour difference between datasets.map and IterableDatasets.map
Issue -
State: open - Opened by llStringll over 1 year ago
- 1 comment
#5869 - Image Encoding Issue when submitting a Parquet Dataset
Issue -
State: open - Opened by PhilippeMoussalli over 1 year ago
- 7 comments
Labels: bug
#5868 - Is it possible to change a cached file and 're-cache' it instead of re-generating?
Issue -
State: closed - Opened by zyh3826 over 1 year ago
- 2 comments
Labels: enhancement
#5867 - Add logic for hashing modules/functions optimized with `torch.compile`
Pull Request -
State: closed - Opened by mariosasko over 1 year ago
- 5 comments
#5866 - Issue with Sequence features
Issue -
State: open - Opened by alialamiidrissi over 1 year ago
#5865 - Deprecate task api
Pull Request -
State: open - Opened by mariosasko over 1 year ago
- 3 comments
#5864 - Slow iteration over Torch tensors
Issue -
State: open - Opened by crisostomi over 1 year ago
- 1 comment
#5863 - Use a new low-memory approach for tf dataset index shuffling
Pull Request -
State: open - Opened by Rocketknight1 over 1 year ago
- 20 comments
#5862 - IndexError: list index out of range with data hosted on Zenodo
Issue -
State: open - Opened by albertvillanova over 1 year ago
- 1 comment
Labels: bug
#5861 - Better error message when combining dataset dicts instead of datasets
Pull Request -
State: closed - Opened by lhoestq over 1 year ago
- 7 comments
#5860 - Minor tqdm optim
Pull Request -
State: closed - Opened by lhoestq over 1 year ago
- 3 comments
#5859 - Raise TypeError when indexing a dataset with bool
Pull Request -
State: open - Opened by albertvillanova over 1 year ago
- 1 comment
#5858 - Throw an error when dataset improperly indexed
Issue -
State: open - Opened by sarahwie over 1 year ago
- 1 comment
#5857 - Adding chemistry dataset/models in huggingface
Issue -
State: open - Opened by knc6 over 1 year ago
Labels: enhancement
#5856 - Error loading natural_questions
Issue -
State: open - Opened by Crownor over 1 year ago
#5855 - `to_tf_dataset` consumes too much memory
Issue -
State: open - Opened by massquantity over 1 year ago
- 6 comments
#5854 - Can not load audiofolder dataset on kaggle
Issue -
State: closed - Opened by ILG2021 over 1 year ago
- 3 comments
#5853 - [docs] Redirects, migrated from nginx
Pull Request -
State: closed - Opened by julien-c over 1 year ago
- 3 comments
#5852 - Iterable torch formatting
Pull Request -
State: open - Opened by lhoestq over 1 year ago
- 1 comment
#5851 - Error message not clear in interleaving datasets
Issue -
State: closed - Opened by surya-narayanan over 1 year ago
#5850 - Make packaged builders skip non-supported file formats
Pull Request -
State: open - Opened by albertvillanova over 1 year ago
- 8 comments
#5849 - CSV datasets should only read the CSV data files in the repo
Issue -
State: open - Opened by albertvillanova over 1 year ago
Labels: bug
#5848 - Add `accelerate` as metric's test dependency to fix CI error
Pull Request -
State: closed - Opened by mariosasko over 1 year ago
- 3 comments
#5847 - Streaming IterableDataset not working with translation pipeline
Issue -
State: open - Opened by jlquinn over 1 year ago
- 8 comments
#5846 - load_dataset('bigcode/the-stack-dedup', streaming=True) very slow!
Issue -
State: closed - Opened by tbenthompson over 1 year ago
- 7 comments
#5845 - Add `date_format` param to the CSV reader
Pull Request -
State: closed - Opened by mariosasko over 1 year ago
- 6 comments
#5844 - TypeError: Couldn't cast array of type struct<answer: struct<unanswerable: bool, answerType: string, free_form_answer: string, evidence: list<item: string>, evidenceAnnotate: list<item: string>, highlighted_evidence: list<item: string>>> to ...
Issue -
State: open - Opened by chen-coding over 1 year ago
#5843 - Can't add iterable datasets to a Dataset Dict.
Issue -
State: open - Opened by surya-narayanan over 1 year ago
- 2 comments
#5842 - Remove columns in interable dataset
Issue -
State: open - Opened by surya-narayanan over 1 year ago
- 3 comments
#5841 - Abusurdly slow on iteration
Issue -
State: closed - Opened by fecet over 1 year ago
- 4 comments
#5840 - load model error.
Issue -
State: closed - Opened by LanShanPi over 1 year ago
- 1 comment
#5839 - Make models/functions optimized with `torch.compile` hashable
Issue -
State: closed - Opened by mariosasko over 1 year ago
Labels: enhancement
#5838 - Streaming support for `load_from_disk`
Issue -
State: closed - Opened by Nilabhra over 1 year ago
- 10 comments
Labels: enhancement
#5837 - Use DeepSpeed load myself " .csv " dataset.
Issue -
State: open - Opened by LanShanPi over 1 year ago
- 3 comments
#5836 - [docs] Custom decoding transforms
Pull Request -
State: closed - Opened by stevhliu over 1 year ago
- 4 comments
#5835 - Always set nullable fields in the writer
Pull Request -
State: closed - Opened by lhoestq over 1 year ago
- 4 comments
#5834 - Is uint8 supported?
Issue -
State: closed - Opened by Ryou0634 over 1 year ago
- 5 comments
#5833 - Unable to push dataset - `create_pr` problem
Issue -
State: closed - Opened by agombert over 1 year ago
- 14 comments
#5832 - 404 Client Error: Not Found for url: https://huggingface.co/api/models/bert-large-cased
Issue -
State: closed - Opened by varungupta31 over 1 year ago
- 1 comment
#5831 - [Bug]504 Server Error when loading dataset which was already cached
Issue -
State: open - Opened by SingL3 over 1 year ago
- 6 comments
#5830 - Debug windows #2
Pull Request -
State: closed - Opened by HyukjinKwon over 1 year ago
#5829 - (mach-o file, but is an incompatible architecture (have 'arm64', need 'x86_64'))
Issue -
State: closed - Opened by elcolie over 1 year ago
- 2 comments
#5828 - Stream data concatenation issue
Issue -
State: closed - Opened by krishnapriya-18 over 1 year ago
- 2 comments
#5827 - load json dataset interrupt when dtype cast problem occured
Issue -
State: open - Opened by 1014661165 over 1 year ago
- 1 comment
#5826 - Support working_dir in from_spark
Pull Request -
State: open - Opened by maddiedawson over 1 year ago
- 3 comments
#5825 - FileNotFound even though exists
Issue -
State: closed - Opened by Muennighoff over 1 year ago
- 4 comments
#5824 - Fix incomplete docstring for `BuilderConfig`
Pull Request -
State: closed - Opened by Laurent2916 over 1 year ago
- 2 comments
#5823 - [2.12.0] DatasetDict.save_to_disk not saving to S3
Issue -
State: closed - Opened by thejamesmarq over 1 year ago
- 3 comments
#5822 - Audio Dataset with_format torch problem
Issue -
State: closed - Opened by paulbauriegel over 1 year ago
- 2 comments
#5821 - IterableDataset Arrow formatting
Pull Request -
State: open - Opened by lhoestq over 1 year ago
- 5 comments
#5820 - Incomplete docstring for `BuilderConfig`
Issue -
State: closed - Opened by Laurent2916 over 1 year ago
- 1 comment
Labels: good first issue
#5819 - Cannot pickle error in Dataset.from_generator()
Issue -
State: closed - Opened by xinghaow99 over 1 year ago
- 2 comments
#5818 - Ability to update a dataset
Issue -
State: open - Opened by davidgilbertson over 1 year ago
- 3 comments
Labels: enhancement
#5817 - Setting `num_proc` errors when `.map` returns additional items.
Issue -
State: closed - Opened by davidgilbertson over 1 year ago
- 3 comments
#5816 - Preserve `stopping_strategy` of shuffled interleaved dataset (random cycling case)
Pull Request -
State: closed - Opened by mariosasko over 1 year ago
- 3 comments
#5815 - Easy way to create a Kaggle dataset from a Huggingface dataset?
Issue -
State: open - Opened by hrbigelow over 1 year ago
- 4 comments
#5814 - Repro windows crash
Pull Request -
State: closed - Opened by maddiedawson over 1 year ago
- 1 comment
#5813 - [DO-NOT-MERGE] Debug Windows issue at #3
Pull Request -
State: closed - Opened by HyukjinKwon over 1 year ago
#5812 - Cannot shuffle interleaved IterableDataset with "all_exhausted" stopping strategy
Issue -
State: closed - Opened by off99555 over 1 year ago
Labels: bug, streaming
#5811 - load_dataset: TypeError: 'NoneType' object is not callable, on local dataset filename changes
Issue -
State: open - Opened by durapensa over 1 year ago
- 1 comment
#5810 - Add `fn_kwargs` to `map` and `filter` of `IterableDataset` and `IterableDatasetDict`
Pull Request -
State: closed - Opened by yuukicammy over 1 year ago
- 9 comments
#5809 - wiki_dpr details for Open Domain Question Answering tasks
Issue -
State: open - Opened by yulgok22 over 1 year ago
- 1 comment
#5807 - Support parallelized downloading in load_dataset with Spark
Pull Request -
State: open - Opened by es94129 over 1 year ago
- 2 comments
#5806 - Return the name of the currently loaded file in the load_dataset function.
Issue -
State: open - Opened by s-JoL over 1 year ago
- 13 comments
Labels: enhancement, good first issue
#5805 - Improve `Create a dataset` tutorial
Issue -
State: open - Opened by polinaeterna over 1 year ago
- 4 comments
Labels: documentation
#5804 - Set dev version
Pull Request -
State: closed - Opened by lhoestq over 1 year ago
- 3 comments
#5803 - Release: 2.12.0
Pull Request -
State: closed - Opened by lhoestq over 1 year ago
- 4 comments
#5802 - Validate non-empty data_files
Pull Request -
State: closed - Opened by albertvillanova over 1 year ago
- 2 comments
#5800 - Change downloaded file permission based on umask
Pull Request -
State: closed - Opened by albertvillanova over 1 year ago
- 1 comment
#5799 - Files downloaded to cache do not respect umask
Issue -
State: closed - Opened by albertvillanova over 1 year ago
Labels: bug
#5798 - Support parallelized downloading and processing in load_dataset with Spark
Issue -
State: open - Opened by es94129 over 1 year ago
- 13 comments
Labels: enhancement
#5797 - load_dataset is case sentitive?
Issue -
State: open - Opened by haonan-li over 1 year ago
- 2 comments
#5796 - Spark docs
Pull Request -
State: closed - Opened by lhoestq over 1 year ago
- 4 comments