Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / NVIDIA-Merlin/HugeCTR issues and pull requests
#463 - [Question] sok performance
Issue -
State: open - Opened by Orca-bit about 1 month ago
- 1 comment
Labels: question
#462 - [BUG] sok amp mode error
Issue -
State: open - Opened by Orca-bit about 1 month ago
- 1 comment
#461 - [BUG] encounter error when running sok dlrm benchmark
Issue -
State: open - Opened by Orca-bit about 1 month ago
- 2 comments
#460 - [BUG] compile sok error
Issue -
State: closed - Opened by Orca-bit about 1 month ago
- 2 comments
#459 - [Question] What is the difference between HugeCTR/embedding and HugeCTR/src/embeddings?
Issue -
State: closed - Opened by Orca-bit 2 months ago
- 1 comment
Labels: question
#458 - Sync from gitlab
Pull Request -
State: closed - Opened by EmmaQiaoCh 2 months ago
- 2 comments
#457 - Bump actions/download-artifact from 3 to 4.1.7 in /.github/workflows
Pull Request -
State: open - Opened by dependabot[bot] 3 months ago
- 1 comment
Labels: dependencies, github_actions
#456 - [Question]Is there a version of tensorflow 1.15 with a merlin-tensorflow image of sok installed?
Issue -
State: open - Opened by recklessnolove 4 months ago
- 1 comment
Labels: question
#455 - low frequency filter
Pull Request -
State: closed - Opened by ccccjunkang 4 months ago
- 2 comments
#454 - [Requirement] Custom allocator support for gpu_cache
Issue -
State: open - Opened by mfbalin 5 months ago
- 1 comment
#453 - [BUG]The wdl_8gpu.py script execution has halted and training cannot proceed.
Issue -
State: open - Opened by redzhang1990 5 months ago
- 1 comment
#452 - [BUG] Slot calculation error in static_hash_table.cu
Issue -
State: open - Opened by Jiaao-Bai 6 months ago
- 2 comments
#451 - [Question] How to add new models to HPS configuration when using Model Control Mode EXPLICIT?
Issue -
State: closed - Opened by dmac 6 months ago
- 5 comments
Labels: question
#450 - [BUG] I/O error on Linux kernel with 64KiB base page size
Issue -
State: open - Opened by flx42 6 months ago
#449 - [Question] Help converting ONNX to TensorRT with graphsurgeon and HPS plugin
Issue -
State: closed - Opened by dmac 6 months ago
- 4 comments
Labels: question
#448 - Update hierarchical_parameter_server_demo.ipynb
Pull Request -
State: open - Opened by jq 6 months ago
- 1 comment
#447 - Remove some internal files
Pull Request -
State: closed - Opened by EmmaQiaoCh 8 months ago
- 1 comment
#446 - Fix hps docs typo and hps profiler example argument
Pull Request -
State: open - Opened by shyeonn 8 months ago
- 1 comment
#445 - [BUG] Enabling regularization causes CUDNN_STATUS_MAPPING_ERROR for deepfm example
Issue -
State: open - Opened by klmentzer 9 months ago
- 4 comments
#444 - [Question] Is there any related architecture design or documentation for embedding collection
Issue -
State: closed - Opened by Jiaao-Bai 9 months ago
- 2 comments
Labels: question
#443 - [Question] Can i read parquet data from HDFS?
Issue -
State: closed - Opened by wangxingda 9 months ago
- 6 comments
Labels: question
#442 - [BUG]build failed on gtest!
Issue -
State: closed - Opened by SeekPoint 10 months ago
- 5 comments
Labels: bug
#441 - [BUG] cudaErrorIllegalAddress: an illegal memory access was encounteredThread
Issue -
State: closed - Opened by kangna-qi 10 months ago
- 4 comments
#440 - [BUG] Seg Fault When Deploying TF+HPS Model with merlin-tensorflow
Issue -
State: open - Opened by tuanavu 10 months ago
- 9 comments
#439 - [BUG] Run sok tests error
Issue -
State: closed - Opened by kangna-qi 10 months ago
- 1 comment
#438 - [Question] How to dump incremental model to kafka in Release 23.12?
Issue -
State: open - Opened by lausannel 11 months ago
- 2 comments
Labels: question
#437 - [Question] Is there pipeline mechanism to help the lookup requests always be handled on device cache in HugeCTR?
Issue -
State: open - Opened by Lifann 11 months ago
- 1 comment
Labels: question
#436 - support lock-free hashmap backend
Pull Request -
State: open - Opened by ZhuYuJin 12 months ago
#435 - [BUG]preprocess.sh 1 criteo failed with 'Schema' object has no attribute 'write'
Issue -
State: open - Opened by SeekPoint 12 months ago
- 1 comment
#434 - build docker failed with 401 Unauthorized (Set Up the Development Environment With Merlin Containers)
Issue -
State: open - Opened by SeekPoint 12 months ago
- 4 comments
#433 - [BUG] CUDNN_STATUS_MAPPING_ERROR with cudnnSetStream
Issue -
State: closed - Opened by rgandikota 12 months ago
- 21 comments
#432 - sok-experiment static_map empty_key_sentinel and reclaimed_key_sentinel is not right for int64 [BUG]
Issue -
State: closed - Opened by amazingyyc about 1 year ago
- 4 comments
#431 - Trouble installing hugectr_backend for Triton Server
Issue -
State: closed - Opened by sezhiyanhari about 1 year ago
- 1 comment
#430 - fix: typo in kafka broker
Pull Request -
State: open - Opened by lausannel about 1 year ago
- 1 comment
Labels: fea::refactor, fea::chore
#429 - [BUG] Encountered ETC error of din model when training with multiple keyset.
Issue -
State: closed - Opened by dusir about 1 year ago
- 3 comments
#428 - [Question] nv_gpu_cache compiling problem
Issue -
State: closed - Opened by RobertLou about 1 year ago
- 1 comment
Labels: question
#427 - [Question] How can I pre-calculate the GPU memory required for embedding cache size?
Issue -
State: open - Opened by tuanavu about 1 year ago
- 2 comments
Labels: question
#426 - Support for configuration issues
Issue -
State: open - Opened by EmmaQiaoCh about 1 year ago
- 1 comment
#424 - [Question] Difference between Embedding Training Cache and GPU Embedding Cache
Issue -
State: open - Opened by hsezhiyan about 1 year ago
- 9 comments
Labels: question
#423 - Update doc dependencies
Pull Request -
State: closed - Opened by EmmaQiaoCh about 1 year ago
- 1 comment
#422 - [Question] How to serve TF2 SOK model in Triton Inference and convert it to ONNX?
Issue -
State: closed - Opened by tuanavu about 1 year ago
- 1 comment
Labels: question
#421 - [Question] Configuration issues with mlcommon benchmarking
Issue -
State: open - Opened by raghavendrachari08 about 1 year ago
- 2 comments
Labels: question
#420 - [Question] Is there a slack channel or discord server for questions and discussion ?
Issue -
State: open - Opened by lilida about 1 year ago
- 4 comments
Labels: question
#419 - [Question]Running the DCN on a single GPU leads to the illegal memory access
Issue -
State: open - Opened by dusir about 1 year ago
- 1 comment
Labels: question, stage::doing
#418 - [Question] tensorflow 1.15 sok example
Issue -
State: open - Opened by MichoChan about 1 year ago
- 2 comments
Labels: question
#417 - [Question] An illegal memory access was encountered on H800 & Hugectr dcn test
Issue -
State: closed - Opened by dusir about 1 year ago
- 4 comments
Labels: bug, question, P0, stage::doing
#416 - [BUG] cooperative_groups/scan.h not in cuda11.X
Issue -
State: open - Opened by MichoChan about 1 year ago
- 5 comments
#415 - [Question] How can I export keras model with SOK?
Issue -
State: open - Opened by longern over 1 year ago
- 3 comments
Labels: question
#414 - [Question] Does HugeCtr support H800 GPU?
Issue -
State: closed - Opened by sparkling9809 over 1 year ago
- 6 comments
Labels: question
#413 - [Question]Does HugeCtr support read data for training from Kafka ?
Issue -
State: closed - Opened by sparkling9809 over 1 year ago
- 3 comments
Labels: question
#412 - HashMapBackend occupies 10x memory usage than binary data.
Issue -
State: closed - Opened by ZhuYuJin over 1 year ago
- 7 comments
Labels: question
#411 - [Question] Does HugeCTR support all P-series GPUs? and does it support tfserving as inference?
Issue -
State: closed - Opened by Shu-HowTing over 1 year ago
- 2 comments
Labels: question
#410 - [Requirement] FS Support for Azure Blob Storage
Issue -
State: closed - Opened by shivamsbatra over 1 year ago
- 1 comment
#409 - [BUG] Can’t compile sok
Issue -
State: closed - Opened by kangna-qi over 1 year ago
- 2 comments
#408 - [Question] Confused about the additional element of the output of InteractionLayer
Issue -
State: closed - Opened by heroes999 over 1 year ago
- 6 comments
Labels: question
#407 - [Question] Is there any way for hps to load an embedding table into multiple GPUs?
Issue -
State: closed - Opened by sparkling9809 over 1 year ago
- 4 comments
Labels: question
#406 - [Question]link for day_1.gz is invalid
Issue -
State: closed - Opened by zmxdream over 1 year ago
Labels: question
#405 - Update session_inference_test.cpp
Pull Request -
State: closed - Opened by lxh over 1 year ago
#404 - [Question] Multi-node training encounters Runtime error: unhandled system error ncclGroupEnd()
Issue -
State: closed - Opened by heroes999 over 1 year ago
- 9 comments
Labels: question
#403 - [Question] position bias
Issue -
State: closed - Opened by skunkwerk over 1 year ago
- 2 comments
Labels: question
#402 - [Question] A question towards HugeCTR::concurrent_unordered_map::get_insert
Issue -
State: closed - Opened by heroes999 over 1 year ago
- 6 comments
Labels: question
#401 - [BUG]dlrm script has quite a lot compatibility issues.
Issue -
State: closed - Opened by zpcalan over 1 year ago
- 2 comments
#400 - [Question] Are Sharp and IB a must have for multi-node training?
Issue -
State: closed - Opened by heroes999 over 1 year ago
- 3 comments
Labels: question
#399 - [Question]Can I build and use gpu_cache independently?
Issue -
State: closed - Opened by RobertLou over 1 year ago
- 1 comment
Labels: question
#398 - [Question]Any randomness in data reader? Any randomness in Model.fit?
Issue -
State: closed - Opened by heroes999 over 1 year ago
- 3 comments
Labels: question
#397 - [Question]How to process criteo day0(50GB)'s dataset to run ETC?
Issue -
State: closed - Opened by zpcalan over 1 year ago
- 11 comments
Labels: question
#396 - Can't install sparse_operation_kit
Issue -
State: closed - Opened by yourtj over 1 year ago
- 2 comments
#395 - [Question]Can't use ETC to train multiple datasets.
Issue -
State: closed - Opened by zpcalan over 1 year ago
- 3 comments
Labels: question
#394 - [Question]loss_test not stable, sometimes some cases will fail
Issue -
State: closed - Opened by heroes999 over 1 year ago
- 6 comments
Labels: question
#393 - [Question]When setting use_mixed_precision=True, wdl training does not converge.
Issue -
State: closed - Opened by zpcalan over 1 year ago
- 22 comments
Labels: question
#392 - Redirect master pages to main
Pull Request -
State: closed - Opened by alexanderronquillo over 1 year ago
#391 - [Question] Failed to run lookup sparse distribute example
Issue -
State: closed - Opened by Nov11 over 1 year ago
- 1 comment
Labels: question
#390 - [Question] Does ETC training feature support to run on multiple physical nodes?
Issue -
State: closed - Opened by zpcalan over 1 year ago
- 1 comment
Labels: question
#389 - [BUG]Can NOT run wdl_parquet.py: CUDNN_STATUS_MAPPING_ERROR
Issue -
State: closed - Opened by butterluo over 1 year ago
- 6 comments
#388 - [Question] SOK - How to save sok.expertiment.Variable correctly into saved model ?
Issue -
State: closed - Opened by Nov11 over 1 year ago
- 2 comments
Labels: question
#387 - [Question] Calling 'apply_gradients' on sok.experiment.Variable reports Variable not created in the strategy scope
Issue -
State: closed - Opened by Nov11 over 1 year ago
- 2 comments
Labels: question
#386 - [BUG] Failed to process day23 of criteo data.
Issue -
State: closed - Opened by zpcalan over 1 year ago
- 3 comments
#385 - [Question] Having a hard time running demo with tensorflow2 mirrorredstrategy
Issue -
State: closed - Opened by Nov11 over 1 year ago
- 2 comments
Labels: question
#384 - [Question] negative item counts in MovieLens notebook
Issue -
State: closed - Opened by JohnFirth over 1 year ago
- 1 comment
Labels: question
#383 - [BUG] cannot run sok demo with official image
Issue -
State: closed - Opened by ZhuYuJin over 1 year ago
- 5 comments
Labels: bug
#382 - [BUG] EmbeddingCollection Wgrad buffer sizes can overflow a 32 bit integer.
Issue -
State: closed - Opened by zpzim over 1 year ago
- 3 comments
#381 - [Question] Does HugeCTR support feature selection & feature elimination ?
Issue -
State: closed - Opened by wzhgithub over 1 year ago
- 1 comment
Labels: question
#380 - Fix UT failure for l2_regularizer_layer
Pull Request -
State: closed - Opened by EmmaQiaoCh over 1 year ago
- 1 comment
#379 - [Question] How to get the performance of inference
Issue -
State: closed - Opened by liangxuegang almost 2 years ago
- 2 comments
Labels: question
#378 - [BUG] Encountered GPU utilization of 100% while using the SparseOperationKit Experiment API.
Issue -
State: closed - Opened by Acacia124 almost 2 years ago
- 4 comments
#377 - [BUG] Documentation for Optimizer types has a typo.
Issue -
State: closed - Opened by ashish007git almost 2 years ago
- 2 comments
#376 - modify EV name
Pull Request -
State: closed - Opened by Mesilenceki almost 2 years ago
- 1 comment
#375 - [BUG] around 200 layer unit tests fail in my hugectr container, pls lend a hand
Issue -
State: closed - Opened by heroes999 almost 2 years ago
- 2 comments
#374 - [Question] how to not use cuda graph in hugectr?
Issue -
State: closed - Opened by LucQueen almost 2 years ago
- 2 comments
Labels: question
#373 - [Question] How to correctly use Embedding Training Cache feature in HugeCTR
Issue -
State: closed - Opened by yuqie almost 2 years ago
- 2 comments
Labels: question
#372 - [Question]what is about Segmentation fault when i train dlrm in mlperf?
Issue -
State: closed - Opened by LucQueen almost 2 years ago
- 6 comments
Labels: question
#371 - [BUG] Program crashes on garbage collection of inference session / model
Issue -
State: closed - Opened by yakoton almost 2 years ago
- 3 comments
#370 - [Question]How can i debug core dump when i use hugectr
Issue -
State: closed - Opened by LucQueen almost 2 years ago
- 1 comment
Labels: question
#369 - Fix hps doc typo
Pull Request -
State: closed - Opened by yingcanw about 2 years ago
- 1 comment
#368 - [BUG] database
Issue -
State: closed - Opened by zhaozheng09 about 2 years ago
- 1 comment
#367 - original error: libcuda.so.1: cannot open shared object file: No such file or directory,a problem occurred in the docker image nvcr.io/nvidia/tensorflow:22.06-tf2-py3
Issue -
State: closed - Opened by shijiexu09 about 2 years ago
- 2 comments
#366 - [Question]DIN sample slot_size_array and key range overlap
Issue -
State: closed - Opened by liguo88 about 2 years ago
- 2 comments
Labels: question
#365 - [Question] Initialize SOK embedding on CPU to prevent OOM
Issue -
State: closed - Opened by WonderingWJ about 2 years ago
- 1 comment
Labels: question
#364 - [BUG] DIN sample refers to old version of NVTabular and produces error when running w/ 22.09 container
Issue -
State: closed - Opened by jsohn-nvidia about 2 years ago
- 3 comments
Labels: bug, P1
#363 - [Requirement] TLS communication for cloud-hosted HPS
Issue -
State: closed - Opened by Spartee about 2 years ago
- 6 comments
Labels: fea::functional, requirement