Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / huggingface/datasets issues and pull requests
#6357 - Allow passing a multiprocessing context to functions that support `num_proc`
Issue -
State: open - Opened by bryant1410 11 months ago
Labels: enhancement
#6356 - Add `fsspec` version to the `datasets-cli env` command output
Pull Request -
State: closed - Opened by mariosasko 11 months ago
- 3 comments
#6355 - More hub centric docs
Pull Request -
State: closed - Opened by lhoestq 11 months ago
- 3 comments
#6354 - `IterableDataset.from_spark` does not support multiple workers in pytorch `Dataloader`
Issue -
State: open - Opened by NazyS 11 months ago
- 1 comment
#6353 - load_dataset save_to_disk load_from_disk error
Issue -
State: closed - Opened by brisker 11 months ago
- 5 comments
#6352 - Error loading wikitext data raise NotImplementedError(f"Loading a dataset cached in a {type(self._fs).__name__} is not supported.")
Issue -
State: closed - Opened by Ahmed-Roushdy 11 months ago
- 13 comments
#6351 - Fix use_dataset.mdx
Pull Request -
State: closed - Opened by angel-luis 11 months ago
- 2 comments
#6350 - Different objects are returned from calls that should be returning the same kind of object.
Issue -
State: open - Opened by phalexo 11 months ago
- 2 comments
#6349 - Can't load ds = load_dataset("imdb")
Issue -
State: closed - Opened by vivianc2 11 months ago
- 4 comments
#6348 - Parquet stream-conversion fails to embed images/audio files from gated repos
Issue -
State: open - Opened by severo 11 months ago
Labels: bug
#6347 - Incorrect example code in 'Create a dataset' docs
Issue -
State: closed - Opened by rwood-97 11 months ago
- 2 comments
#6346 - Fix UnboundLocalError if preprocessing returns an empty list
Pull Request -
State: closed - Opened by cwallenwein 11 months ago
- 2 comments
#6345 - support squad structure datasets using a YAML parameter
Issue -
State: open - Opened by MajdTannous1 11 months ago
Labels: enhancement
#6344 - set dev version
Pull Request -
State: closed - Opened by lhoestq 11 months ago
- 3 comments
#6343 - Remove unused argument in `_get_data_files_patterns`
Pull Request -
State: closed - Opened by lhoestq 11 months ago
- 3 comments
#6342 - Release: 2.14.6
Pull Request -
State: closed - Opened by lhoestq 11 months ago
- 5 comments
#6340 - Release 2.14.5
Pull Request -
State: closed - Opened by lhoestq 11 months ago
- 1 comment
#6339 - minor release step improvement
Pull Request -
State: closed - Opened by lhoestq 11 months ago
- 3 comments
#6338 - pin fsspec before it switches to glob.glob
Pull Request -
State: closed - Opened by lhoestq 11 months ago
- 2 comments
#6337 - Pin supported upper version of fsspec
Pull Request -
State: closed - Opened by albertvillanova 11 months ago
- 6 comments
#6336 - unpin-fsspec
Pull Request -
State: closed - Opened by lhoestq 11 months ago
- 3 comments
#6335 - Support fsspec 2023.10.0
Pull Request -
State: closed - Opened by albertvillanova 11 months ago
- 7 comments
#6334 - datasets.filesystems: fix is_remote_filesystems
Pull Request -
State: closed - Opened by ap-- 11 months ago
- 3 comments
#6333 - Support fsspec 2023.10.0
Issue -
State: closed - Opened by albertvillanova 11 months ago
- 4 comments
#6332 - Replace deprecated license_file in setup.cfg
Pull Request -
State: closed - Opened by albertvillanova 11 months ago
- 4 comments
#6331 - Temporarily pin fsspec < 2023.10.0
Pull Request -
State: closed - Opened by albertvillanova 11 months ago
- 3 comments
#6330 - Latest fsspec==2023.10.0 issue with streaming datasets
Issue -
State: closed - Opened by ZachNagengast 11 months ago
- 8 comments
#6329 - Text-to-speech networks first convert the given text into an intermediate representation
Issue -
State: closed - Opened by shabnam706 11 months ago
#6328 - Text-to-speech networks first convert the given text into an intermediate representation
Issue -
State: closed - Opened by shabnam706 11 months ago
- 1 comment
#6327 - FileNotFoundError when trying to load the downloaded dataset with `load_dataset(..., streaming=True)`
Issue -
State: closed - Opened by yzhangcs 11 months ago
- 3 comments
#6326 - Create battery_analysis.py
Pull Request -
State: closed - Opened by vinitkm 11 months ago
#6325 - Create battery_analysis.py
Pull Request -
State: closed - Opened by vinitkm 11 months ago
#6324 - Conversion to Arrow fails due to wrong type heuristic
Issue -
State: closed - Opened by jphme 12 months ago
- 2 comments
#6323 - Loading dataset from large GCS bucket very slow since 2.14
Issue -
State: open - Opened by jbcdnr 12 months ago
- 1 comment
#6322 - Fix regex `get_data_files` formatting for base paths
Pull Request -
State: closed - Opened by ZachNagengast 12 months ago
- 4 comments
#6321 - Fix typos
Pull Request -
State: closed - Opened by python273 12 months ago
- 2 comments
#6320 - Dataset slice splits can't load training and validation at the same time
Issue -
State: closed - Opened by timlac 12 months ago
- 1 comment
#6319 - Datasets.map is severely broken
Issue -
State: open - Opened by phalexo 12 months ago
- 15 comments
#6318 - Deterministic set hash
Pull Request -
State: closed - Opened by lhoestq 12 months ago
- 3 comments
#6317 - sentiment140 dataset unavailable
Issue -
State: closed - Opened by AndreasKarasenko 12 months ago
- 2 comments
#6316 - Fix loading Hub datasets with CSV metadata file
Pull Request -
State: closed - Opened by albertvillanova 12 months ago
- 4 comments
#6315 - Hub datasets with CSV metadata raise ArrowInvalid: JSON parse error: Invalid value. in row 0
Issue -
State: closed - Opened by albertvillanova 12 months ago
Labels: bug
#6314 - Support creating new branch in push_to_hub
Pull Request -
State: closed - Opened by jmif 12 months ago
#6313 - Fix commit message formatting in multi-commit uploads
Pull Request -
State: closed - Opened by qgallouedec 12 months ago
- 2 comments
#6312 - docs: resolving namespace conflict, refactored variable
Pull Request -
State: closed - Opened by smty2018 12 months ago
- 1 comment
#6311 - cast_column to Sequence with length=4 occur exception raise in datasets/table.py:2146
Issue -
State: closed - Opened by neiblegy 12 months ago
- 4 comments
#6310 - Add return_file_name in load_dataset
Pull Request -
State: closed - Opened by juliendenize 12 months ago
- 7 comments
#6309 - Fix get_data_patterns for directories with the word data twice
Pull Request -
State: closed - Opened by albertvillanova 12 months ago
- 7 comments
#6308 - module 'resource' has no attribute 'error'
Issue -
State: closed - Opened by NeoWang9999 12 months ago
- 4 comments
#6307 - Fix typo in code example in docs
Pull Request -
State: closed - Opened by bryant1410 12 months ago
- 2 comments
#6306 - pyinstaller : OSError: could not get source code
Issue -
State: closed - Opened by dusk877647949 12 months ago
- 5 comments
#6305 - Cannot load dataset with `2.14.5`: `FileNotFound` error
Issue -
State: closed - Opened by finiteautomata 12 months ago
- 2 comments
#6304 - Update README.md
Pull Request -
State: closed - Opened by smty2018 12 months ago
- 1 comment
#6303 - Parquet uploads off-by-one naming scheme
Issue -
State: open - Opened by ZachNagengast 12 months ago
- 4 comments
#6302 - ArrowWriter/ParquetWriter `write` method does not increase `_num_bytes` and hence datasets not sharding at `max_shard_size`
Issue -
State: closed - Opened by Rassibassi 12 months ago
- 2 comments
#6301 - Unpin `tensorflow` maximum version
Pull Request -
State: closed - Opened by mariosasko 12 months ago
- 3 comments
#6300 - Unpin `jax` maximum version
Pull Request -
State: closed - Opened by mariosasko 12 months ago
- 6 comments
#6299 - Support for newer versions of JAX
Issue -
State: closed - Opened by ddrous 12 months ago
Labels: enhancement
#6298 - Doc readme improvements
Pull Request -
State: closed - Opened by mariosasko 12 months ago
- 2 comments
#6297 - Fix ArrayXD cast
Pull Request -
State: closed - Opened by mariosasko 12 months ago
- 2 comments
#6296 - Move `exceptions.py` to `utils/exceptions.py`
Pull Request -
State: closed - Opened by mariosasko 12 months ago
- 6 comments
#6295 - Fix parquet columns argument in streaming mode
Pull Request -
State: closed - Opened by lhoestq 12 months ago
- 3 comments
#6294 - IndexError: Invalid key is out of bounds for size 0 despite having a populated dataset
Issue -
State: closed - Opened by ZYM66 12 months ago
- 1 comment
#6293 - Choose columns to stream parquet data in streaming mode
Issue -
State: closed - Opened by lhoestq 12 months ago
Labels: bug
#6292 - how to load the image of dtype float32 or float64
Issue -
State: open - Opened by wanglaofei 12 months ago
- 1 comment
#6291 - Casting type from Array2D int to Array2D float crashes
Issue -
State: closed - Opened by AlanBlanchet 12 months ago
- 1 comment
#6290 - Incremental dataset (e.g. `.push_to_hub(..., append=True)`)
Issue -
State: open - Opened by Wauplin 12 months ago
- 4 comments
Labels: enhancement
#6289 - testing doc-builder
Pull Request -
State: closed - Opened by mishig25 12 months ago
- 2 comments
#6288 - Dataset.from_pandas with a DataFrame of PIL.Images
Issue -
State: open - Opened by lhoestq 12 months ago
- 2 comments
Labels: enhancement
#6287 - map() not recognizing "text"
Issue -
State: closed - Opened by EngineerKhan 12 months ago
- 1 comment
#6286 - Create DefunctDatasetError
Pull Request -
State: closed - Opened by albertvillanova 12 months ago
- 2 comments
#6285 - TypeError: expected str, bytes or os.PathLike object, not dict
Issue -
State: open - Opened by andysingal 12 months ago
- 4 comments
#6284 - Add Belebele multiple-choice machine reading comprehension (MRC) dataset
Issue -
State: closed - Opened by rajveer43 12 months ago
- 1 comment
Labels: enhancement
#6283 - Fix array cast/embed with null values
Pull Request -
State: closed - Opened by mariosasko 12 months ago
- 10 comments
#6282 - Drop data_files duplicates
Pull Request -
State: closed - Opened by lhoestq 12 months ago
- 5 comments
#6281 - Improve documentation of dataset.from_generator
Pull Request -
State: closed - Opened by hartmans 12 months ago
- 2 comments
#6280 - Couldn't cast array of type fixed_size_list to Sequence(Value(float64))
Issue -
State: closed - Opened by jmif 12 months ago
- 4 comments
#6279 - Batched IterableDataset
Issue -
State: open - Opened by lneukom 12 months ago
- 5 comments
Labels: enhancement
#6278 - No data files duplicates
Pull Request -
State: closed - Opened by lhoestq 12 months ago
- 4 comments
#6277 - FileNotFoundError: Couldn't find a module script at /content/paws-x/paws-x.py. Module 'paws-x' doesn't exist on the Hugging Face Hub either.
Issue -
State: closed - Opened by diegogonzalezc 12 months ago
- 1 comment
#6276 - I'm trying to fine tune the openai/whisper model from huggingface using jupyter notebook and i keep getting this error
Issue -
State: open - Opened by valaofficial 12 months ago
- 3 comments
#6275 - Would like to Contribute a dataset
Issue -
State: closed - Opened by vikas70607 12 months ago
- 1 comment
#6274 - FileNotFoundError for dataset with multiple builder config
Issue -
State: closed - Opened by LouisChen15 12 months ago
- 2 comments
#6273 - Broken Link to PubMed Abstracts dataset .
Issue -
State: open - Opened by sameemqureshi 12 months ago
- 5 comments
#6272 - Duplicate `data_files` when named `<split>/<split>.parquet`
Issue -
State: closed - Opened by lhoestq 12 months ago
- 7 comments
Labels: bug
#6271 - Overwriting Split overwrites data but not metadata, corrupting dataset
Issue -
State: closed - Opened by govindrai almost 1 year ago
#6270 - Dataset.from_generator raises with sharded gen_args
Issue -
State: closed - Opened by hartmans almost 1 year ago
- 6 comments
#6269 - Reduce the number of commits in `push_to_hub`
Pull Request -
State: closed - Opened by mariosasko about 1 year ago
- 21 comments
#6268 - Add repo_id to DatasetInfo
Pull Request -
State: open - Opened by lhoestq about 1 year ago
- 9 comments
#6267 - Multi label class encoding
Issue -
State: open - Opened by jmif about 1 year ago
- 7 comments
Labels: enhancement
#6266 - Use LibYAML with PyYAML if available
Pull Request -
State: open - Opened by bryant1410 about 1 year ago
- 5 comments
#6265 - Remove `apache_beam` import in `BeamBasedBuilder._save_info`
Pull Request -
State: closed - Opened by mariosasko about 1 year ago
- 4 comments
#6264 - Temporarily pin tensorflow < 2.14.0
Pull Request -
State: closed - Opened by albertvillanova about 1 year ago
- 4 comments
#6263 - CI is broken: ImportError: cannot import name 'context' from 'tensorflow.python'
Issue -
State: closed - Opened by albertvillanova about 1 year ago
Labels: bug
#6262 - Fix CI 404 errors
Pull Request -
State: closed - Opened by albertvillanova about 1 year ago
- 9 comments
#6261 - Can't load a dataset
Issue -
State: closed - Opened by joaopedrosdmm about 1 year ago
- 5 comments
#6260 - REUSE_DATASET_IF_EXISTS don't work
Issue -
State: closed - Opened by rangehow about 1 year ago
- 3 comments
#6259 - Duplicated Rows When Loading Parquet Files from Root Directory with Subdirectories
Issue -
State: closed - Opened by MF-FOOM about 1 year ago
- 1 comment
#6258 - [DOCS] Fix typo: Elasticsearch
Pull Request -
State: closed - Opened by leemthompo about 1 year ago
- 2 comments
#6257 - HfHubHTTPError - exceeded our hourly quotas for action: commit
Issue -
State: closed - Opened by yuvalkirstain about 1 year ago
- 4 comments