Ecosyste.ms: Issues

An open API service that provides issue and pull request metadata for open source projects.

GitHub / huggingface/datasets issues and pull requests

#6357 - Allow passing a multiprocessing context to functions that support `num_proc`

Issue - State: open - Opened by bryant1410 11 months ago
Labels: enhancement

#6356 - Add `fsspec` version to the `datasets-cli env` command output

Pull Request - State: closed - Opened by mariosasko 11 months ago - 3 comments

#6355 - More hub centric docs

Pull Request - State: closed - Opened by lhoestq 11 months ago - 3 comments

#6353 - load_dataset save_to_disk load_from_disk error

Issue - State: closed - Opened by brisker 11 months ago - 5 comments

#6351 - Fix use_dataset.mdx

Pull Request - State: closed - Opened by angel-luis 11 months ago - 2 comments

#6349 - Can't load ds = load_dataset("imdb")

Issue - State: closed - Opened by vivianc2 11 months ago - 4 comments

#6347 - Incorrect example code in 'Create a dataset' docs

Issue - State: closed - Opened by rwood-97 11 months ago - 2 comments

#6346 - Fix UnboundLocalError if preprocessing returns an empty list

Pull Request - State: closed - Opened by cwallenwein 11 months ago - 2 comments

#6345 - support squad structure datasets using a YAML parameter

Issue - State: open - Opened by MajdTannous1 11 months ago
Labels: enhancement

#6344 - set dev version

Pull Request - State: closed - Opened by lhoestq 11 months ago - 3 comments

#6343 - Remove unused argument in `_get_data_files_patterns`

Pull Request - State: closed - Opened by lhoestq 11 months ago - 3 comments

#6342 - Release: 2.14.6

Pull Request - State: closed - Opened by lhoestq 11 months ago - 5 comments

#6340 - Release 2.14.5

Pull Request - State: closed - Opened by lhoestq 11 months ago - 1 comment

#6339 - minor release step improvement

Pull Request - State: closed - Opened by lhoestq 11 months ago - 3 comments

#6338 - pin fsspec before it switches to glob.glob

Pull Request - State: closed - Opened by lhoestq 11 months ago - 2 comments

#6337 - Pin supported upper version of fsspec

Pull Request - State: closed - Opened by albertvillanova 11 months ago - 6 comments

#6336 - unpin-fsspec

Pull Request - State: closed - Opened by lhoestq 11 months ago - 3 comments

#6335 - Support fsspec 2023.10.0

Pull Request - State: closed - Opened by albertvillanova 11 months ago - 7 comments

#6334 - datasets.filesystems: fix is_remote_filesystems

Pull Request - State: closed - Opened by ap-- 11 months ago - 3 comments

#6333 - Support fsspec 2023.10.0

Issue - State: closed - Opened by albertvillanova 11 months ago - 4 comments

#6332 - Replace deprecated license_file in setup.cfg

Pull Request - State: closed - Opened by albertvillanova 11 months ago - 4 comments

#6331 - Temporarily pin fsspec < 2023.10.0

Pull Request - State: closed - Opened by albertvillanova 11 months ago - 3 comments

#6330 - Latest fsspec==2023.10.0 issue with streaming datasets

Issue - State: closed - Opened by ZachNagengast 11 months ago - 8 comments

#6326 - Create battery_analysis.py

Pull Request - State: closed - Opened by vinitkm 11 months ago

#6325 - Create battery_analysis.py

Pull Request - State: closed - Opened by vinitkm 11 months ago

#6324 - Conversion to Arrow fails due to wrong type heuristic

Issue - State: closed - Opened by jphme 12 months ago - 2 comments

#6323 - Loading dataset from large GCS bucket very slow since 2.14

Issue - State: open - Opened by jbcdnr 12 months ago - 1 comment

#6322 - Fix regex `get_data_files` formatting for base paths

Pull Request - State: closed - Opened by ZachNagengast 12 months ago - 4 comments

#6321 - Fix typos

Pull Request - State: closed - Opened by python273 12 months ago - 2 comments

#6320 - Dataset slice splits can't load training and validation at the same time

Issue - State: closed - Opened by timlac 12 months ago - 1 comment

#6319 - Datasets.map is severely broken

Issue - State: open - Opened by phalexo 12 months ago - 15 comments

#6318 - Deterministic set hash

Pull Request - State: closed - Opened by lhoestq 12 months ago - 3 comments

#6317 - sentiment140 dataset unavailable

Issue - State: closed - Opened by AndreasKarasenko 12 months ago - 2 comments

#6316 - Fix loading Hub datasets with CSV metadata file

Pull Request - State: closed - Opened by albertvillanova 12 months ago - 4 comments

#6314 - Support creating new branch in push_to_hub

Pull Request - State: closed - Opened by jmif 12 months ago

#6313 - Fix commit message formatting in multi-commit uploads

Pull Request - State: closed - Opened by qgallouedec 12 months ago - 2 comments

#6312 - docs: resolving namespace conflict, refactored variable

Pull Request - State: closed - Opened by smty2018 12 months ago - 1 comment

#6310 - Add return_file_name in load_dataset

Pull Request - State: closed - Opened by juliendenize 12 months ago - 7 comments

#6309 - Fix get_data_patterns for directories with the word data twice

Pull Request - State: closed - Opened by albertvillanova 12 months ago - 7 comments

#6308 - module 'resource' has no attribute 'error'

Issue - State: closed - Opened by NeoWang9999 12 months ago - 4 comments

#6307 - Fix typo in code example in docs

Pull Request - State: closed - Opened by bryant1410 12 months ago - 2 comments

#6306 - pyinstaller : OSError: could not get source code

Issue - State: closed - Opened by dusk877647949 12 months ago - 5 comments

#6305 - Cannot load dataset with `2.14.5`: `FileNotFound` error

Issue - State: closed - Opened by finiteautomata 12 months ago - 2 comments

#6304 - Update README.md

Pull Request - State: closed - Opened by smty2018 12 months ago - 1 comment

#6303 - Parquet uploads off-by-one naming scheme

Issue - State: open - Opened by ZachNagengast 12 months ago - 4 comments

#6301 - Unpin `tensorflow` maximum version

Pull Request - State: closed - Opened by mariosasko 12 months ago - 3 comments

#6300 - Unpin `jax` maximum version

Pull Request - State: closed - Opened by mariosasko 12 months ago - 6 comments

#6299 - Support for newer versions of JAX

Issue - State: closed - Opened by ddrous 12 months ago
Labels: enhancement

#6298 - Doc readme improvements

Pull Request - State: closed - Opened by mariosasko 12 months ago - 2 comments

#6297 - Fix ArrayXD cast

Pull Request - State: closed - Opened by mariosasko 12 months ago - 2 comments

#6296 - Move `exceptions.py` to `utils/exceptions.py`

Pull Request - State: closed - Opened by mariosasko 12 months ago - 6 comments

#6295 - Fix parquet columns argument in streaming mode

Pull Request - State: closed - Opened by lhoestq 12 months ago - 3 comments

#6293 - Choose columns to stream parquet data in streaming mode

Issue - State: closed - Opened by lhoestq 12 months ago
Labels: bug

#6292 - how to load the image of dtype float32 or float64

Issue - State: open - Opened by wanglaofei 12 months ago - 1 comment

#6291 - Casting type from Array2D int to Array2D float crashes

Issue - State: closed - Opened by AlanBlanchet 12 months ago - 1 comment

#6290 - Incremental dataset (e.g. `.push_to_hub(..., append=True)`)

Issue - State: open - Opened by Wauplin 12 months ago - 4 comments
Labels: enhancement

#6289 - testing doc-builder

Pull Request - State: closed - Opened by mishig25 12 months ago - 2 comments

#6288 - Dataset.from_pandas with a DataFrame of PIL.Images

Issue - State: open - Opened by lhoestq 12 months ago - 2 comments
Labels: enhancement

#6287 - map() not recognizing "text"

Issue - State: closed - Opened by EngineerKhan 12 months ago - 1 comment

#6286 - Create DefunctDatasetError

Pull Request - State: closed - Opened by albertvillanova 12 months ago - 2 comments

#6285 - TypeError: expected str, bytes or os.PathLike object, not dict

Issue - State: open - Opened by andysingal 12 months ago - 4 comments

#6284 - Add Belebele multiple-choice machine reading comprehension (MRC) dataset

Issue - State: closed - Opened by rajveer43 12 months ago - 1 comment
Labels: enhancement

#6283 - Fix array cast/embed with null values

Pull Request - State: closed - Opened by mariosasko 12 months ago - 10 comments

#6282 - Drop data_files duplicates

Pull Request - State: closed - Opened by lhoestq 12 months ago - 5 comments

#6281 - Improve documentation of dataset.from_generator

Pull Request - State: closed - Opened by hartmans 12 months ago - 2 comments

#6280 - Couldn't cast array of type fixed_size_list to Sequence(Value(float64))

Issue - State: closed - Opened by jmif 12 months ago - 4 comments

#6279 - Batched IterableDataset

Issue - State: open - Opened by lneukom 12 months ago - 5 comments
Labels: enhancement

#6278 - No data files duplicates

Pull Request - State: closed - Opened by lhoestq 12 months ago - 4 comments

#6275 - Would like to Contribute a dataset

Issue - State: closed - Opened by vikas70607 12 months ago - 1 comment

#6274 - FileNotFoundError for dataset with multiple builder config

Issue - State: closed - Opened by LouisChen15 12 months ago - 2 comments

#6273 - Broken Link to PubMed Abstracts dataset.

Issue - State: open - Opened by sameemqureshi 12 months ago - 5 comments

#6272 - Duplicate `data_files` when named `<split>/<split>.parquet`

Issue - State: closed - Opened by lhoestq 12 months ago - 7 comments
Labels: bug

#6270 - Dataset.from_generator raises with sharded gen_args

Issue - State: closed - Opened by hartmans almost 1 year ago - 6 comments

#6269 - Reduce the number of commits in `push_to_hub`

Pull Request - State: closed - Opened by mariosasko about 1 year ago - 21 comments

#6268 - Add repo_id to DatasetInfo

Pull Request - State: open - Opened by lhoestq about 1 year ago - 9 comments

#6267 - Multi label class encoding

Issue - State: open - Opened by jmif about 1 year ago - 7 comments
Labels: enhancement

#6266 - Use LibYAML with PyYAML if available

Pull Request - State: open - Opened by bryant1410 about 1 year ago - 5 comments

#6265 - Remove `apache_beam` import in `BeamBasedBuilder._save_info`

Pull Request - State: closed - Opened by mariosasko about 1 year ago - 4 comments

#6264 - Temporarily pin tensorflow < 2.14.0

Pull Request - State: closed - Opened by albertvillanova about 1 year ago - 4 comments

#6262 - Fix CI 404 errors

Pull Request - State: closed - Opened by albertvillanova about 1 year ago - 9 comments

#6261 - Can't load a dataset

Issue - State: closed - Opened by joaopedrosdmm about 1 year ago - 5 comments

#6260 - REUSE_DATASET_IF_EXISTS don't work

Issue - State: closed - Opened by rangehow about 1 year ago - 3 comments

#6258 - [DOCS] Fix typo: Elasticsearch

Pull Request - State: closed - Opened by leemthompo about 1 year ago - 2 comments

#6257 - HfHubHTTPError - exceeded our hourly quotas for action: commit

Issue - State: closed - Opened by yuvalkirstain about 1 year ago - 4 comments