Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / uber/petastorm issues and pull requests

#806 - Petastorm not working due to PyArrow version hell

Issue - State: open - Opened by kiranzo about 1 month ago - 2 comments

#804 - Petastorm hangs forever in DataBricks

Issue - State: open - Opened by juzzmac 7 months ago - 1 comment

#803 - ParquetDataset has an invalid parameter validate_schema

Issue - State: open - Opened by ayushkarnawat 8 months ago - 1 comment

#802 - chore: Update badge pipeline

Pull Request - State: closed - Opened by Juandavi1 11 months ago - 1 comment

#801 - make_reader fails for example

Issue - State: closed - Opened by phK3 12 months ago - 1 comment

#796 - Add a ThreadPool which respects the order of Parquet dataset pieces.

Pull Request - State: open - Opened by wbeardall over 1 year ago - 3 comments

#795 - String as input in petastorm dataloaders

Issue - State: open - Opened by freud14-tm over 1 year ago - 3 comments

#792 - Add missing field_name in ValueError

Pull Request - State: open - Opened by chasleslr over 1 year ago - 3 comments

#791 - [Test] Run CI against pyspark 3.4

Pull Request - State: open - Opened by WeichenXu123 over 1 year ago - 3 comments

#787 - Make `make_spark_converter` supports creating converter from a saved dataframe path

Pull Request - State: closed - Opened by WeichenXu123 almost 2 years ago - 7 comments

#786 - make_batch_reader Documentation out of date? seed?

Issue - State: open - Opened by Data-drone almost 2 years ago

#785 - Petastorm sharding and setting batch sizes

Issue - State: open - Opened by Data-drone almost 2 years ago

#784 - Prediction issue using Keras and TransformSpec with PySpark

Issue - State: closed - Opened by sdaza almost 2 years ago

#783 - Support results_queue_size parameter in make_batch_reader api

Pull Request - State: closed - Opened by s-udhaya almost 2 years ago - 8 comments

#781 - How to pass pin_memory argument when using make_torch_dataloader

Issue - State: closed - Opened by s-udhaya almost 2 years ago - 2 comments

#780 - Customized dataset

Issue - State: closed - Opened by JiajianLu almost 2 years ago - 1 comment

#779 - Random seed doesn't seem to work well

Issue - State: open - Opened by kisel4363 about 2 years ago - 2 comments

#778 - Update CI to use latest versions of pyarrow and numpy. Drop pyarrow 4 test config.

Pull Request - State: open - Opened by selitvin about 2 years ago - 2 comments

#777 - Remove ``LocalDiskArrowTableCache`` and use latest pickle protocol for local caching

Pull Request - State: closed - Opened by selitvin about 2 years ago - 3 comments

#776 - using SHAP with petastorm dataset

Issue - State: open - Opened by sdaza about 2 years ago - 1 comment

#775 - Future Warning importing SparkDatasetConverter.

Issue - State: closed - Opened by kisel4363 about 2 years ago - 2 comments

#774 - Dynamic shape of lables.

Issue - State: open - Opened by ohindialign about 2 years ago - 3 comments

#773 - in_set predicate raises error unhashable type: 'Series'

Issue - State: open - Opened by Joachim-Sh about 2 years ago

#772 - Add a collate_lists_fn

Pull Request - State: open - Opened by selitvin about 2 years ago - 1 comment

#771 - Update pytorch mnist example with up-to-date make_reader() interface

Pull Request - State: closed - Opened by chongxiaoc about 2 years ago - 1 comment

#770 - weighted_sampling_reader

Issue - State: open - Opened by weidezhang about 2 years ago - 3 comments

#768 - null cache

Issue - State: open - Opened by weidezhang about 2 years ago - 4 comments

#767 - Reader: enable shuffling inside every row group

Pull Request - State: closed - Opened by chongxiaoc about 2 years ago - 2 comments

#766 - upgrade readthedocs to use Py3.7

Pull Request - State: closed - Opened by chongxiaoc about 2 years ago - 1 comment

#763 - PyTorch Batched Non-shuffle Buffer Large Memory Consumption

Issue - State: closed - Opened by chongxiaoc about 2 years ago - 1 comment
Labels: enhancement

#762 - PyTorch: improve memory-efficiency in batched non-shuffle buffer

Pull Request - State: closed - Opened by chongxiaoc about 2 years ago - 3 comments

#761 - dynamic padding via `collate_fn`

Issue - State: open - Opened by Jomonsugi about 2 years ago - 11 comments

#760 - Newer pyarrow versions?

Issue - State: closed - Opened by winding-lines about 2 years ago - 1 comment

#758 - Validate_schema keyword not supported yet

Issue - State: open - Opened by kisel4363 about 2 years ago - 7 comments

#757 - Replace process_iter by pid_exists

Pull Request - State: closed - Opened by MostafaFarahani over 2 years ago - 3 comments

#756 - Performance on large amounts of data

Issue - State: open - Opened by jaycunningham-8451 over 2 years ago - 1 comment

#755 - training from different sources

Issue - State: open - Opened by weidezhang over 2 years ago - 6 comments

#754 - Wrapper for Arrow Datasets & Dataset Pieces

Pull Request - State: open - Opened by aperiodic over 2 years ago - 2 comments

#753 - Update README.rst

Pull Request - State: open - Opened by FeU-aKlos over 2 years ago - 1 comment

#752 - Add Python3.10 to CI docker image

Pull Request - State: open - Opened by selitvin over 2 years ago - 2 comments

#751 - Upgrade CI to use latest packages of tf,pyarrow,numpy in 'latest' CI configuration

Pull Request - State: closed - Opened by selitvin over 2 years ago - 2 comments

#749 - Do not land: Benchmark size of a parquet file with png files

Pull Request - State: closed - Opened by selitvin over 2 years ago

#748 - Enable batch fetching in parallel

Pull Request - State: open - Opened by jarandaf over 2 years ago - 4 comments

#747 - How to reduce parquet size

Issue - State: open - Opened by journey-wang over 2 years ago - 1 comment

#746 - Import ABC from collections.abc for Python 3.10 compatibility

Pull Request - State: closed - Opened by tirkarthi over 2 years ago - 2 comments

#745 - Test using shared_seed with pytorch converter

Pull Request - State: closed - Opened by selitvin over 2 years ago - 1 comment

#743 - tensorflow pyspark

Issue - State: closed - Opened by malinphy over 2 years ago - 4 comments

#741 - `RestrictedUnpickler` is Bypassable

Issue - State: open - Opened by splitline over 2 years ago

#740 - On BatchedDataLoader performance

Issue - State: closed - Opened by jarandaf over 2 years ago - 8 comments

#739 - Speeding up loading data from spark

Issue - State: open - Opened by jmpanfil over 2 years ago - 3 comments

#738 - Ambiguous workflow while using Spark

Issue - State: open - Opened by smartFunX over 2 years ago - 3 comments

#737 - Use highest available pickle protocol when serializing

Pull Request - State: closed - Opened by rbetz over 2 years ago - 9 comments

#736 - Parquet column/modular encryption support for Petastorm

Issue - State: open - Opened by RobindeGrootNL over 2 years ago - 8 comments

#735 - reuse dataset materialized by SparkDatasetConverter

Issue - State: closed - Opened by Riser01 over 2 years ago - 1 comment

#733 - Tensorflow pentastrom , training stuck

Issue - State: closed - Opened by Riser01 over 2 years ago - 6 comments

#732 - Get rid of RuntimeWarning when using process pool

Pull Request - State: closed - Opened by selitvin over 2 years ago - 1 comment

#731 - Support passing multiple url files to make_reader function.

Pull Request - State: closed - Opened by selitvin over 2 years ago - 3 comments

#730 - Allow more than two namenodes in hdfs configuration file.

Pull Request - State: closed - Opened by selitvin over 2 years ago - 1 comment

#729 - Varying number of examples passed by DataLoader to Pytorch Lightning network

Issue - State: open - Opened by trelium almost 3 years ago - 2 comments

#728 - PyDictReaderWorker does not support multiple paths datset_paths

Issue - State: closed - Opened by zhangzhenyu13 almost 3 years ago - 2 comments

#726 - How to stop petastorm dataloaders at end of epoch

Issue - State: open - Opened by jiwidi almost 3 years ago - 3 comments

#725 - Error when using make_spark_converter

Issue - State: closed - Opened by jiwidi almost 3 years ago

#723 - Use assertEqual instead of assertEquals for Python 3.11 compatibility.

Pull Request - State: closed - Opened by tirkarthi almost 3 years ago - 2 comments

#722 - fix typo "suffling" -> "shuffling"

Pull Request - State: closed - Opened by noxthot almost 3 years ago - 5 comments

#721 - not able to disable shuffling using : make_torch_dataloader

Issue - State: open - Opened by Warra07 about 3 years ago - 2 comments

#720 - make_reader() is taking forever

Issue - State: open - Opened by GraceHLiu about 3 years ago - 14 comments

#719 - Any update on shard imbalance issue for parquet dataset?

Issue - State: closed - Opened by PHILO-HE about 3 years ago - 6 comments

#718 - Use make_batch_reader for petastorm parquet dataset

Issue - State: closed - Opened by PHILO-HE about 3 years ago - 2 comments

#717 - Added fsspec support for _default_delete_dir_handler

Pull Request - State: closed - Opened by manjuransari-zz about 3 years ago - 1 comment

#716 - _default_delete_dir_handler throws error when using default handler

Issue - State: closed - Opened by manjuransari-zz about 3 years ago - 8 comments

#711 - Use spark_test_ctx fixture instead of constructing spark manually

Pull Request - State: open - Opened by selitvin about 3 years ago - 1 comment

#702 - Remove very old pickle compatibility code modifying old atg package names

Pull Request - State: open - Opened by selitvin about 3 years ago - 2 comments

#699 - Access a specific row in the dataframe

Issue - State: open - Opened by 2006pmach about 3 years ago - 4 comments

#690 - Support for parquet files with nested structures

Issue - State: open - Opened by mossadhelali over 3 years ago - 21 comments

#663 - fix get_dataset_path() in fs_utils.py

Pull Request - State: open - Opened by dongpohezui over 3 years ago - 4 comments

#656 - Remove Unischema __getattr__ implementation

Pull Request - State: open - Opened by v01dXYZ over 3 years ago - 2 comments

#641 - Pytorch: add AsyncBatchedDataloader

Pull Request - State: closed - Opened by chongxiaoc over 3 years ago - 7 comments