Ecosyste.ms: Issues
An open API service that provides issue and pull request metadata for open source projects.
GitHub / aws/sagemaker-pytorch-training-toolkit issues and pull requests
#255 - ModuleNotFoundError: Sagemaker only copies entry_point file to /opt/ml/code/ instead of the holy-cloned source code
Issue -
State: closed - Opened by celsofranssa 12 months ago
- 3 comments
#254 - [FATAL tini (7)] exec train failed: No such file or directory
Issue -
State: open - Opened by celsofranssa 12 months ago
#253 - "Train": executable file not found in $PATH
Issue -
State: open - Opened by celsofranssa 12 months ago
#252 - change: bypass DNS check for studio local exec
Pull Request -
State: closed - Opened by mufaddal-rohawala 12 months ago
- 12 comments
#251 - Fix: pin coverage version to fix pipeline issue
Pull Request -
State: closed - Opened by yl-to over 1 year ago
- 4 comments
#250 - Add PyTorch version environment variable, to facilitate SMTT
Pull Request -
State: closed - Opened by yongyanrao over 1 year ago
- 6 comments
#248 - feature: Add torch_distributed support for Trainium
Pull Request -
State: closed - Opened by satishpasumarthi almost 2 years ago
- 12 comments
#247 - CVE-2007-4559 Patch
Pull Request -
State: open - Opened by TrellixVulnTeam almost 2 years ago
#246 - documentation: update README and contributing guidelines
Pull Request -
State: closed - Opened by satishpasumarthi about 2 years ago
- 4 comments
#245 - Update README.rst with how it related to SMTT
Pull Request -
State: closed - Opened by gilinachum about 2 years ago
- 3 comments
#244 - fix: provide option to use native process launcher
Pull Request -
State: closed - Opened by satishpasumarthi about 2 years ago
- 30 comments
#243 - aaa
Pull Request -
State: closed - Opened by cyberitech about 2 years ago
#242 - aaa
Pull Request -
State: closed - Opened by cyberitech about 2 years ago
#241 - Feature: Support new distribution mechanism for PT-XLA
Pull Request -
State: closed - Opened by Lokiiiiii about 2 years ago
- 8 comments
#240 - Test/fix
Pull Request -
State: closed - Opened by nish2104 about 2 years ago
- 5 comments
#239 - test: empty commit
Pull Request -
State: closed - Opened by nish21 about 2 years ago
- 4 comments
#238 - fix: derive master node from training environment
Pull Request -
State: closed - Opened by satishpasumarthi about 2 years ago
- 8 comments
#237 - upodate
Pull Request -
State: closed - Opened by gijayah213 about 2 years ago
- 4 comments
#236 - feature: add support for native PT DDP distribution
Pull Request -
State: closed - Opened by vishwakaria about 2 years ago
- 28 comments
#235 - feature: Add Heterogeneous Cluster support
Pull Request -
State: closed - Opened by satishpasumarthi about 2 years ago
- 17 comments
#234 - fix: CI changes
Pull Request -
State: closed - Opened by satishpasumarthi about 2 years ago
- 29 comments
#233 - empty commit to trigger ci
Pull Request -
State: closed - Opened by nish21 over 2 years ago
- 16 comments
#232 - [bug] Torch does not find GPU on pytorch-training:1.10.0-gpu-py38 container
Issue -
State: open - Opened by sergii-ivakhno-kidsloop over 2 years ago
#231 - feature: Added Native Pytorch DDP support
Pull Request -
State: closed - Opened by satishpasumarthi over 2 years ago
- 8 comments
#230 - Environment variables set for NCCL and Distributed training are not passed onto the sagemaker-training entrypoint
Issue -
State: closed - Opened by thecooltechguy over 3 years ago
- 1 comment
#229 - model_fn is not recognized. Sagemaker Studio template for model building, training, and deployment
Issue -
State: open - Opened by babarory over 3 years ago
#228 - Dockerfile installation of torch and torchvision from s3, replacing original versions.
Issue -
State: open - Opened by akinolawilson over 3 years ago
#227 - Example use case
Issue -
State: open - Opened by akinolawilson over 3 years ago
- 2 comments
Labels: type: question, type: documentation
#226 - Error importing torchaudio
Issue -
State: open - Opened by bbalaji-ucsd over 3 years ago
- 2 comments
Labels: type: bug
#225 - feature: add reinvent 2020 features
Pull Request -
State: closed - Opened by ChoiByungWook almost 4 years ago
- 73 comments
#224 - fix: not raising excpetion if no image to delete
Pull Request -
State: open - Opened by chuyang-deng almost 4 years ago
- 4 comments
#223 - Getting cudnn error while training on ml.p2.xlarge instance
Issue -
State: closed - Opened by shubhamsharma1609 about 4 years ago
- 2 comments
#222 - cannot recognize num_gpus for more than 1 gpu per instance
Issue -
State: closed - Opened by zhaoanbei about 4 years ago
- 4 comments
Labels: type: feature request
#221 - change: Update main buildspec to only perform CPU integration tests
Pull Request -
State: closed - Opened by bveeramani about 4 years ago
- 15 comments
#220 - change: Pin SageMaker version to less than v2
Pull Request -
State: closed - Opened by bveeramani about 4 years ago
- 3 comments
#219 - docs: Fix docstring style in training.py
Pull Request -
State: closed - Opened by bveeramani about 4 years ago
- 6 comments
#218 - change: Add GPU and unit test buildspecs
Pull Request -
State: closed - Opened by bveeramani about 4 years ago
- 4 comments
#217 - feature: Use MPIRunnerType
Pull Request -
State: closed - Opened by bveeramani about 4 years ago
- 55 comments
#216 - feature: update pytorch vanilla version to 1.6.0
Pull Request -
State: closed - Opened by chuyang-deng about 4 years ago
- 3 comments
#215 - FastAI v1.0.59 causes failed training job
Issue -
State: closed - Opened by dean-cpi over 4 years ago
- 1 comment
#214 - infra: add issue templates
Pull Request -
State: closed - Opened by ajaykarpur over 4 years ago
- 4 comments
#213 - doc: remove confusing information from the Readme.
Pull Request -
State: closed - Opened by nadiaya over 4 years ago
- 3 comments
#212 - infra: do not duplicate test dependencies in tox.ini
Pull Request -
State: closed - Opened by nadiaya over 4 years ago
- 20 comments
#211 - fix: Rename buildspec files.
Pull Request -
State: closed - Opened by nadiaya over 4 years ago
- 4 comments
#210 - fix: bump version of sagemaker-training for script entry point fix
Pull Request -
State: closed - Opened by ajaykarpur over 4 years ago
- 4 comments
#209 - infra: Make docker folder read only, remove unused tests.
Pull Request -
State: closed - Opened by nadiaya over 4 years ago
- 6 comments
#208 - unable to build final dockerfile.cpu
Issue -
State: closed - Opened by Vertika09 over 4 years ago
- 4 comments
#207 - Pytorch 1.5 build issue
Issue -
State: closed - Opened by dwang-sflscientific over 4 years ago
- 2 comments
#206 - change: install ipywidgets in 1.5.0 Python 3 Dockerfiles
Pull Request -
State: closed - Opened by laurenyu over 4 years ago
- 2 comments
#205 - fix: Bump version of sagemaker-training for typing fix
Pull Request -
State: closed - Opened by ajaykarpur over 4 years ago
- 1 comment
#204 - feature: add Python 3.7 support
Pull Request -
State: closed - Opened by ajaykarpur over 4 years ago
- 2 comments
#203 - fix: upgrade dependency versions
Pull Request -
State: closed - Opened by chuyang-deng over 4 years ago
- 7 comments
#202 - Pin Smdebug to the latest version (0.7.2)
Pull Request -
State: closed - Opened by TusharKanekiDey over 4 years ago
- 2 comments
#201 - infra: use tox in buildspecs
Pull Request -
State: closed - Opened by chuyang-deng over 4 years ago
- 3 comments
#200 - feature: add Dockerfiles for PyTorch 1.5.0
Pull Request -
State: closed - Opened by TusharKanekiDey over 4 years ago
- 20 comments
#199 - breaking: Replace sagemaker-containers with sagemaker-training
Pull Request -
State: closed - Opened by ajaykarpur over 4 years ago
- 5 comments
#198 - infra: parallelize SageMaker integ test runs
Pull Request -
State: closed - Opened by laurenyu over 4 years ago
- 2 comments
#197 - RuntimeError in training a model of resnet152 using transfer learning: "models cannot register a hook on a tensor that doesn't require gradient"
Issue -
State: closed - Opened by FurkanArslan over 4 years ago
- 3 comments
Labels: type: question
#196 - fix: change miniconda installation in 1.4.0 Dockerfiles
Pull Request -
State: closed - Opened by laurenyu over 4 years ago
- 1 comment
#195 - infra: remove (unused) model_fn from training scripts
Pull Request -
State: closed - Opened by laurenyu over 4 years ago
- 3 comments
#194 - infra: add requirements.txt integ test
Pull Request -
State: closed - Opened by laurenyu over 4 years ago
- 1 comment
#193 - upgrade pillow etc. to fix safety issues in 1.4.0 dockerfiles
Pull Request -
State: closed - Opened by YYStreet over 4 years ago
- 1 comment
#192 - Upgrade sagemaker-containers and test with more than 1 epoch
Pull Request -
State: closed - Opened by ajaykarpur over 4 years ago
- 1 comment
#191 - upgrade Pillow and use pip to install
Pull Request -
State: closed - Opened by YYStreet over 4 years ago
- 2 comments
#190 - Bump smdebug version
Pull Request -
State: closed - Opened by NihalHarish over 4 years ago
- 2 comments
#189 - requirements.txt not working
Issue -
State: closed - Opened by hrsma2i over 4 years ago
- 2 comments
Labels: type: bug, status: pending release
#188 - infra: run test-toolkit unit tests for release
Pull Request -
State: closed - Opened by laurenyu over 4 years ago
- 1 comment
#187 - fix: upgrade sagemaker-containers to 2.8.2
Pull Request -
State: closed - Opened by laurenyu over 4 years ago
- 1 comment
#186 - Install jupyter_client 5.3.4 in advance for py2 gpu image
Pull Request -
State: closed - Opened by YYStreet over 4 years ago
- 1 comment
#185 - update smdebug
Pull Request -
State: closed - Opened by vandanavk over 4 years ago
- 2 comments
#184 - Revert "Update smdebug to 0.7.0"
Pull Request -
State: closed - Opened by YYStreet over 4 years ago
- 1 comment
#183 - infra: run build steps only when necessary.
Pull Request -
State: closed - Opened by nadiaya over 4 years ago
- 4 comments
#182 - feature: Install toolkit from PyPI.
Pull Request -
State: closed - Opened by nadiaya over 4 years ago
- 5 comments
#181 - Issue with torchvision::nms using custom Pytorch and TorchVision
Issue -
State: closed - Opened by mmaybeno over 4 years ago
- 20 comments
#180 - skip on smexperiments import error
Pull Request -
State: closed - Opened by danabens over 4 years ago
- 3 comments
#179 - install sm experiments always when python 3.6 or greater
Pull Request -
State: closed - Opened by danabens over 4 years ago
- 1 comment
#178 - Custom serving code with framework_version beyond 1.1.0
Issue -
State: closed - Opened by austinmw over 4 years ago
- 5 comments
Labels: type: question
#177 - set min version instead of exact version for sm experiments requirement
Pull Request -
State: closed - Opened by danabens over 4 years ago
- 1 comment
#176 - skip python2 for experiments test
Pull Request -
State: closed - Opened by danabens over 4 years ago
- 1 comment
#175 - install sagemaker-experiments package only for 3.6
Pull Request -
State: closed - Opened by danabens over 4 years ago
- 1 comment
#174 - infra: refactor toolkit tests.
Pull Request -
State: closed - Opened by nadiaya over 4 years ago
- 5 comments
#173 - WA to torchvision dataset issue
Pull Request -
State: closed - Opened by vandanavk over 4 years ago
- 2 comments
#172 - Update smdebug to 0.7.0
Pull Request -
State: closed - Opened by vandanavk over 4 years ago
- 14 comments
#171 - Install awscli from pypi instead of conda for PyTorch containers
Pull Request -
State: closed - Opened by YYStreet over 4 years ago
- 7 comments
#170 - change: install SageMaker Python SDK into Python 3 images
Pull Request -
State: closed - Opened by laurenyu over 4 years ago
- 3 comments
#169 - change: Fix python 2 tox dependencies.
Pull Request -
State: closed - Opened by nadiaya over 4 years ago
- 1 comment
#168 - change: copy all tests to test-toolkit folder.
Pull Request -
State: closed - Opened by nadiaya over 4 years ago
- 2 comments
#167 - feature: Remove unnecessary dependencies.
Pull Request -
State: closed - Opened by nadiaya over 4 years ago
- 1 comment
#166 - Training on GPU with a custom container based on official pytorch-training container
Issue -
State: closed - Opened by jason-morgan over 4 years ago
- 2 comments
Labels: type: question
#165 - update: Update license URL
Pull Request -
State: closed - Opened by saimidu over 4 years ago
- 2 comments
#164 - "bash: cannot set terminal process group (-1): Inappropriate ioctl for device" printed at the start of sagemaker jobs
Issue -
State: open - Opened by abhinavs95 over 4 years ago
- 3 comments
Labels: type: question
#163 - upgrade to latest sagemaker-experiments
Pull Request -
State: closed - Opened by danabens over 4 years ago
- 23 comments
#162 - change: Fix flake8 erros. Add flake configuration to run during PR.
Pull Request -
State: closed - Opened by nadiaya over 4 years ago
- 2 comments
#161 - Add twine section to tox.
Pull Request -
State: closed - Opened by nadiaya over 4 years ago
- 1 comment
#160 - feature: Add release to PyPI. Change package name to sagemaker-pytorch-training.
Pull Request -
State: closed - Opened by nadiaya over 4 years ago
- 5 comments
#159 - fix: remove call to deprecated function download_and_install
Pull Request -
State: closed - Opened by ajaykarpur over 4 years ago
- 3 comments
#158 - Adding changes for PyTorch 1.4.0 DLC
Pull Request -
State: closed - Opened by abhinavs95 over 4 years ago
- 10 comments
#157 - Sagemaker PyTorch Not Recognizing Model_FN
Issue -
State: closed - Opened by zacharyFerretti over 4 years ago
- 9 comments
#152 - Pytorch Lightning pkg pin request in AWS sagemaker Pytorch base container
Issue -
State: open - Opened by amitmukh over 4 years ago
Labels: type: feature request
#139 - Prebuilt PyTorch image difference
Issue -
State: closed - Opened by ruijianw almost 5 years ago
- 15 comments
Labels: type: question