Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / uber/petastorm issues and pull requests
#806 - Petastorm not working due to PyArrow version hell
Issue -
State: open - Opened by kiranzo 3 months ago
- 2 comments
#805 - Petastorm break with pyarrow 13.0 or newer. Stable version of pyarrow is at 16.0 now.
Issue -
State: open - Opened by LauritsDixen 7 months ago
- 2 comments
#804 - Petastorm hangs forever in DataBricks
Issue -
State: open - Opened by juzzmac 9 months ago
- 1 comment
#803 - ParquetDataset has an invalid parameter validate_schema
Issue -
State: open - Opened by ayushkarnawat 10 months ago
- 1 comment
#802 - chore: Update badge pipeline
Pull Request -
State: closed - Opened by Juandavi1 12 months ago
- 1 comment
#801 - make_reader fails for example
Issue -
State: closed - Opened by phK3 about 1 year ago
- 1 comment
#800 - FutureWarning: 'ParquetDataset.partitions' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version.
Issue -
State: open - Opened by ton11111 about 1 year ago
- 1 comment
#799 - make_torch_dataloader using TransformSpec applies transformation on entire dataframe (not lazy loading)
Issue -
State: closed - Opened by davegabe about 1 year ago
- 2 comments
#798 - Bug in ConcurrentVentilator._ventilate() when randomize_item_order=True and random seed is fixed
Issue -
State: open - Opened by JonasRauch over 1 year ago
#797 - Issue with loading nested array type from spark DF to torch
Issue -
State: open - Opened by sardinois over 1 year ago
#796 - Add a ThreadPool which respects the order of Parquet dataset pieces.
Pull Request -
State: open - Opened by wbeardall over 1 year ago
- 3 comments
#795 - String as input in petastorm dataloaders
Issue -
State: open - Opened by freud14-tm over 1 year ago
- 3 comments
#793 - Seeing worse model performance from using petastorm vs normal pytorch dataloader
Issue -
State: open - Opened by AKhazane over 1 year ago
- 1 comment
#792 - Add missing field_name in ValueError
Pull Request -
State: open - Opened by chasleslr over 1 year ago
- 3 comments
#791 - [Test] Run CI against pyspark 3.4
Pull Request -
State: open - Opened by WeichenXu123 over 1 year ago
- 3 comments
#790 - TypeError: __init__() missing 2 required positional arguments: 'instance' and 'token'
Issue -
State: open - Opened by devVipin01 over 1 year ago
#789 - AttributeError: 'bool' object has no attribute 'map' while using Predicate
Issue -
State: open - Opened by xizhenke almost 2 years ago
#788 - How to transform the string data to numerical when using make_batch_reader?
Issue -
State: open - Opened by xizhenke almost 2 years ago
#787 - Make `make_spark_converter` supports creating converter from a saved dataframe path
Pull Request -
State: closed - Opened by WeichenXu123 almost 2 years ago
- 7 comments
#786 - make_batch_reader Documentation out of date? seed?
Issue -
State: open - Opened by Data-drone almost 2 years ago
#785 - Petastorm sharding and setting batch sizes
Issue -
State: open - Opened by Data-drone almost 2 years ago
#784 - Prediction issue using Keras and TransformSpec with PySpark
Issue -
State: closed - Opened by sdaza almost 2 years ago
#783 - Support results_queue_size parameter in make_batch_reader api
Pull Request -
State: closed - Opened by s-udhaya about 2 years ago
- 8 comments
#782 - when hdfs-site.xml file has xi:include tag, the function cann't get hadoop_configuration info
Issue -
State: open - Opened by lytk01 about 2 years ago
#781 - How to pass pin_memory argument when using make_torch_dataloader
Issue -
State: closed - Opened by s-udhaya about 2 years ago
- 2 comments
#780 - Customized dataset
Issue -
State: closed - Opened by JiajianLu about 2 years ago
- 1 comment
#779 - Random seed doesn't seem to work well
Issue -
State: open - Opened by kisel4363 about 2 years ago
- 2 comments
#778 - Update CI to use latest versions of pyarrow and numpy. Drop pyarrow 4 test config.
Pull Request -
State: open - Opened by selitvin about 2 years ago
- 2 comments
#777 - Remove ``LocalDiskArrowTableCache`` and use latest pickle protocol for local caching
Pull Request -
State: closed - Opened by selitvin about 2 years ago
- 3 comments
#776 - using SHAP with petastorm dataset
Issue -
State: open - Opened by sdaza about 2 years ago
- 1 comment
#775 - Future Warning importing SparkDatasetConverter.
Issue -
State: closed - Opened by kisel4363 about 2 years ago
- 2 comments
#774 - Dynamic shape of lables.
Issue -
State: open - Opened by ohindialign about 2 years ago
- 3 comments
#773 - in_set predicate raises error unhashable type: 'Series'
Issue -
State: open - Opened by Joachim-Sh about 2 years ago
#772 - Add a collate_lists_fn
Pull Request -
State: open - Opened by selitvin about 2 years ago
- 1 comment
#771 - Update pytorch mnist example with up-to-date make_reader() interface
Pull Request -
State: closed - Opened by chongxiaoc about 2 years ago
- 1 comment
#770 - weighted_sampling_reader
Issue -
State: open - Opened by weidezhang about 2 years ago
- 3 comments
#769 - make_spark_converter RuntimeError: Vector columns are only supported in pyspark>=3.0
Issue -
State: open - Opened by Alxe1 over 2 years ago
- 4 comments
#768 - null cache
Issue -
State: open - Opened by weidezhang over 2 years ago
- 4 comments
#767 - Reader: enable shuffling inside every row group
Pull Request -
State: closed - Opened by chongxiaoc over 2 years ago
- 2 comments
#766 - upgrade readthedocs to use Py3.7
Pull Request -
State: closed - Opened by chongxiaoc over 2 years ago
- 1 comment
#765 - make_batch_reader loses dtype with list-of-strings columns, causing Tensorflow error when lists contain a None value
Issue -
State: open - Opened by arhan-gunel over 2 years ago
#764 - Will petastorm Dataloader support prefetch like PyTorch Multiprocessing Dataloader?
Issue -
State: closed - Opened by MARD1NO over 2 years ago
- 1 comment
#763 - PyTorch Batched Non-shuffle Buffer Large Memory Consumption
Issue -
State: closed - Opened by chongxiaoc over 2 years ago
- 1 comment
Labels: enhancement
#762 - PyTorch: improve memory-efficiency in batched non-shuffle buffer
Pull Request -
State: closed - Opened by chongxiaoc over 2 years ago
- 3 comments
#761 - dynamic padding via `collate_fn`
Issue -
State: open - Opened by Jomonsugi over 2 years ago
- 11 comments
#760 - Newer pyarrow versions?
Issue -
State: closed - Opened by winding-lines over 2 years ago
- 1 comment
#759 - Can we input a custom collate function as an input variable when creating the dataloader ?
Issue -
State: open - Opened by shamanez over 2 years ago
#758 - Validate_schema keyword not supported yet
Issue -
State: open - Opened by kisel4363 over 2 years ago
- 7 comments
#757 - Replace process_iter by pid_exists
Pull Request -
State: closed - Opened by MostafaFarahani over 2 years ago
- 3 comments
#756 - Performance on large amounts of data
Issue -
State: open - Opened by jaycunningham-8451 over 2 years ago
- 1 comment
#755 - training from different sources
Issue -
State: open - Opened by weidezhang over 2 years ago
- 6 comments
#754 - Wrapper for Arrow Datasets & Dataset Pieces
Pull Request -
State: open - Opened by aperiodic over 2 years ago
- 2 comments
#753 - Update README.rst
Pull Request -
State: open - Opened by FeU-aKlos over 2 years ago
- 1 comment
#752 - Add Python3.10 to CI docker image
Pull Request -
State: open - Opened by selitvin over 2 years ago
- 2 comments
#751 - Upgrade CI to use latest packages of tf,pyarrow,numpy in 'latest' CI configuration
Pull Request -
State: closed - Opened by selitvin over 2 years ago
- 2 comments
#750 - Fix type of the a batch returned by make_batch_reader when TransformSpec's function returns column with all values being None
Pull Request -
State: open - Opened by selitvin over 2 years ago
- 1 comment
#749 - Do not land: Benchmark size of a parquet file with png files
Pull Request -
State: closed - Opened by selitvin over 2 years ago
#748 - Enable batch fetching in parallel
Pull Request -
State: open - Opened by jarandaf over 2 years ago
- 4 comments
#747 - How to reduce parquet size
Issue -
State: open - Opened by journey-wang over 2 years ago
- 1 comment
#746 - Import ABC from collections.abc for Python 3.10 compatibility
Pull Request -
State: closed - Opened by tirkarthi over 2 years ago
- 2 comments
#745 - Test using shared_seed with pytorch converter
Pull Request -
State: closed - Opened by selitvin over 2 years ago
- 1 comment
#744 - Use of transform_spec in make_batch_reader leads to tensorflow error when column is missing values
Issue -
State: open - Opened by oby1 over 2 years ago
- 3 comments
#743 - tensorflow pyspark
Issue -
State: closed - Opened by malinphy over 2 years ago
- 4 comments
#742 - make_batch_reader called by make_torch_loader "got an unexpected keyword argument 'shard_seed'"
Issue -
State: closed - Opened by quocdat32461997 over 2 years ago
- 2 comments
#741 - `RestrictedUnpickler` is Bypassable
Issue -
State: open - Opened by splitline over 2 years ago
#740 - On BatchedDataLoader performance
Issue -
State: closed - Opened by jarandaf over 2 years ago
- 8 comments
#739 - Speeding up loading data from spark
Issue -
State: open - Opened by jmpanfil over 2 years ago
- 3 comments
#738 - Ambiguous workflow while using Spark
Issue -
State: open - Opened by smartFunX almost 3 years ago
- 3 comments
#737 - Use highest available pickle protocol when serializing
Pull Request -
State: closed - Opened by rbetz almost 3 years ago
- 9 comments
#736 - Parquet column/modular encryption support for Petastorm
Issue -
State: open - Opened by RobindeGrootNL almost 3 years ago
- 8 comments
#735 - reuse dataset materialized by SparkDatasetConverter
Issue -
State: closed - Opened by Riser01 almost 3 years ago
- 1 comment
#734 - how to use a single dataset to train multiple input model in tensorflow keras useing pentastorm
Issue -
State: closed - Opened by Riser01 almost 3 years ago
#733 - Tensorflow pentastrom , training stuck
Issue -
State: closed - Opened by Riser01 almost 3 years ago
- 6 comments
#732 - Get rid of RuntimeWarning when using process pool
Pull Request -
State: closed - Opened by selitvin almost 3 years ago
- 1 comment
#731 - Support passing multiple url files to make_reader function.
Pull Request -
State: closed - Opened by selitvin almost 3 years ago
- 3 comments
#730 - Allow more than two namenodes in hdfs configuration file.
Pull Request -
State: closed - Opened by selitvin almost 3 years ago
- 1 comment
#729 - Varying number of examples passed by DataLoader to Pytorch Lightning network
Issue -
State: open - Opened by trelium almost 3 years ago
- 2 comments
#728 - PyDictReaderWorker does not support multiple paths datset_paths
Issue -
State: closed - Opened by zhangzhenyu13 almost 3 years ago
- 2 comments
#727 - Large metadata file: Can't load dataset after using Petastorm row_group_indexer
Issue -
State: open - Opened by marjanAlbouye about 3 years ago
- 1 comment
#726 - How to stop petastorm dataloaders at end of epoch
Issue -
State: open - Opened by jiwidi about 3 years ago
- 3 comments
#725 - Error when using make_spark_converter
Issue -
State: closed - Opened by jiwidi about 3 years ago
#724 - got error AssertionError: Must supply a list of namenodes, but HDFS only supports up to 2 namenode URLs when calling the materialize_dataset() in example
Issue -
State: closed - Opened by Ereebay about 3 years ago
- 3 comments
#723 - Use assertEqual instead of assertEquals for Python 3.11 compatibility.
Pull Request -
State: closed - Opened by tirkarthi about 3 years ago
- 2 comments
#722 - fix typo "suffling" -> "shuffling"
Pull Request -
State: closed - Opened by noxthot about 3 years ago
- 5 comments
#721 - not able to disable shuffling using : make_torch_dataloader
Issue -
State: open - Opened by Warra07 about 3 years ago
- 2 comments
#720 - make_reader() is taking forever
Issue -
State: open - Opened by GraceHLiu about 3 years ago
- 14 comments
#719 - Any update on shard imbalance issue for parquet dataset?
Issue -
State: closed - Opened by PHILO-HE about 3 years ago
- 6 comments
#718 - Use make_batch_reader for petastorm parquet dataset
Issue -
State: closed - Opened by PHILO-HE about 3 years ago
- 2 comments
#717 - Added fsspec support for _default_delete_dir_handler
Pull Request -
State: closed - Opened by manjuransari-zz about 3 years ago
- 1 comment
#716 - _default_delete_dir_handler throws error when using default handler
Issue -
State: closed - Opened by manjuransari-zz about 3 years ago
- 8 comments
#714 - No option to pass storage_options in materialize_dataset()
Issue -
State: open - Opened by manjuransari-zz over 3 years ago
#711 - Use spark_test_ctx fixture instead of constructing spark manually
Pull Request -
State: open - Opened by selitvin over 3 years ago
- 1 comment
#702 - Remove very old pickle compatibility code modifying old atg package names
Pull Request -
State: open - Opened by selitvin over 3 years ago
- 2 comments
#699 - Access a specific row in the dataframe
Issue -
State: open - Opened by 2006pmach over 3 years ago
- 4 comments
#690 - Support for parquet files with nested structures
Issue -
State: open - Opened by mossadhelali over 3 years ago
- 21 comments
#663 - fix get_dataset_path() in fs_utils.py
Pull Request -
State: open - Opened by dongpohezui over 3 years ago
- 4 comments
#656 - Remove Unischema __getattr__ implementation
Pull Request -
State: open - Opened by v01dXYZ over 3 years ago
- 2 comments
#641 - Pytorch: add AsyncBatchedDataloader
Pull Request -
State: closed - Opened by chongxiaoc almost 4 years ago
- 7 comments