Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / NVIDIA-Merlin/HugeCTR issues and pull requests

#463 - [Question] sok performance

Issue - State: open - Opened by Orca-bit about 1 month ago - 1 comment
Labels: question

#462 - [BUG] sok amp mode error

Issue - State: open - Opened by Orca-bit about 1 month ago - 1 comment

#461 - [BUG] encounter error when running sok dlrm benchmark

Issue - State: open - Opened by Orca-bit about 1 month ago - 2 comments

#460 - [BUG] compile sok error

Issue - State: closed - Opened by Orca-bit about 1 month ago - 2 comments

#459 - [Question] What is the difference between HugeCTR/embedding and HugeCTR/src/embeddings?

Issue - State: closed - Opened by Orca-bit 2 months ago - 1 comment
Labels: question

#458 - Sync from gitlab

Pull Request - State: closed - Opened by EmmaQiaoCh 2 months ago - 2 comments

#457 - Bump actions/download-artifact from 3 to 4.1.7 in /.github/workflows

Pull Request - State: open - Opened by dependabot[bot] 3 months ago - 1 comment
Labels: dependencies, github_actions

#455 - low frequency filter

Pull Request - State: closed - Opened by ccccjunkang 4 months ago - 2 comments

#454 - [Requirement] Custom allocator support for gpu_cache

Issue - State: open - Opened by mfbalin 5 months ago - 1 comment

#452 - [BUG] Slot calculation error in static_hash_table.cu

Issue - State: open - Opened by Jiaao-Bai 6 months ago - 2 comments

#451 - [Question] How to add new models to HPS configuration when using Model Control Mode EXPLICIT?

Issue - State: closed - Opened by dmac 6 months ago - 5 comments
Labels: question

#449 - [Question] Help converting ONNX to TensorRT with graphsurgeon and HPS plugin

Issue - State: closed - Opened by dmac 6 months ago - 4 comments
Labels: question

#448 - Update hierarchical_parameter_server_demo.ipynb

Pull Request - State: open - Opened by jq 6 months ago - 1 comment

#447 - Remove some internal files

Pull Request - State: closed - Opened by EmmaQiaoCh 8 months ago - 1 comment

#446 - Fix hps docs typo and hps profiler example argument

Pull Request - State: open - Opened by shyeonn 8 months ago - 1 comment

#444 - [Question] Is there any related architecture design or documentation for embedding collection

Issue - State: closed - Opened by Jiaao-Bai 9 months ago - 2 comments
Labels: question

#443 - [Question] Can I read Parquet data from HDFS?

Issue - State: closed - Opened by wangxingda 9 months ago - 6 comments
Labels: question

#442 - [BUG] Build failed on gtest!

Issue - State: closed - Opened by SeekPoint 10 months ago - 5 comments
Labels: bug

#440 - [BUG] Seg Fault When Deploying TF+HPS Model with merlin-tensorflow

Issue - State: open - Opened by tuanavu 10 months ago - 9 comments

#439 - [BUG] Run sok tests error

Issue - State: closed - Opened by kangna-qi 10 months ago - 1 comment

#438 - [Question] How to dump incremental model to kafka in Release 23.12?

Issue - State: open - Opened by lausannel 11 months ago - 2 comments
Labels: question

#436 - support lock-free hashmap backend

Pull Request - State: open - Opened by ZhuYuJin 12 months ago

#433 - [BUG] CUDNN_STATUS_MAPPING_ERROR with cudnnSetStream

Issue - State: closed - Opened by rgandikota 12 months ago - 21 comments

#431 - Trouble installing hugectr_backend for Triton Server

Issue - State: closed - Opened by sezhiyanhari about 1 year ago - 1 comment

#430 - fix: typo in kafka broker

Pull Request - State: open - Opened by lausannel about 1 year ago - 1 comment
Labels: fea::refactor, fea::chore

#429 - [BUG] Encountered ETC error with DIN model when training with multiple keysets.

Issue - State: closed - Opened by dusir about 1 year ago - 3 comments

#428 - [Question] nv_gpu_cache compiling problem

Issue - State: closed - Opened by RobertLou about 1 year ago - 1 comment
Labels: question

#427 - [Question] How can I pre-calculate the GPU memory required for embedding cache size?

Issue - State: open - Opened by tuanavu about 1 year ago - 2 comments
Labels: question

#426 - Support for configuration issues

Issue - State: open - Opened by EmmaQiaoCh about 1 year ago - 1 comment

#424 - [Question] Difference between Embedding Training Cache and GPU Embedding Cache

Issue - State: open - Opened by hsezhiyan about 1 year ago - 9 comments
Labels: question

#423 - Update doc dependencies

Pull Request - State: closed - Opened by EmmaQiaoCh about 1 year ago - 1 comment

#422 - [Question] How to serve TF2 SOK model in Triton Inference and convert it to ONNX?

Issue - State: closed - Opened by tuanavu about 1 year ago - 1 comment
Labels: question

#421 - [Question] Configuration issues with MLCommons benchmarking

Issue - State: open - Opened by raghavendrachari08 about 1 year ago - 2 comments
Labels: question

#420 - [Question] Is there a Slack channel or Discord server for questions and discussion?

Issue - State: open - Opened by lilida about 1 year ago - 4 comments
Labels: question

#419 - [Question] Running the DCN on a single GPU leads to an illegal memory access

Issue - State: open - Opened by dusir about 1 year ago - 1 comment
Labels: question, stage::doing

#418 - [Question] tensorflow 1.15 sok example

Issue - State: open - Opened by MichoChan about 1 year ago - 2 comments
Labels: question

#417 - [Question] An illegal memory access was encountered on H800 & HugeCTR DCN test

Issue - State: closed - Opened by dusir about 1 year ago - 4 comments
Labels: bug, question, P0, stage::doing

#416 - [BUG] cooperative_groups/scan.h not in CUDA 11.x

Issue - State: open - Opened by MichoChan about 1 year ago - 5 comments

#415 - [Question] How can I export keras model with SOK?

Issue - State: open - Opened by longern over 1 year ago - 3 comments
Labels: question

#414 - [Question] Does HugeCTR support H800 GPU?

Issue - State: closed - Opened by sparkling9809 over 1 year ago - 6 comments
Labels: question

#413 - [Question] Does HugeCTR support reading training data from Kafka?

Issue - State: closed - Opened by sparkling9809 over 1 year ago - 3 comments
Labels: question

#412 - HashMapBackend uses 10x more memory than the binary data.

Issue - State: closed - Opened by ZhuYuJin over 1 year ago - 7 comments
Labels: question

#411 - [Question] Does HugeCTR support all P-series GPUs? and does it support tfserving as inference?

Issue - State: closed - Opened by Shu-HowTing over 1 year ago - 2 comments
Labels: question

#410 - [Requirement] FS Support for Azure Blob Storage

Issue - State: closed - Opened by shivamsbatra over 1 year ago - 1 comment

#409 - [BUG] Can’t compile sok

Issue - State: closed - Opened by kangna-qi over 1 year ago - 2 comments

#408 - [Question] Confused about the additional element of the output of InteractionLayer

Issue - State: closed - Opened by heroes999 over 1 year ago - 6 comments
Labels: question

#407 - [Question] Is there any way for hps to load an embedding table into multiple GPUs?

Issue - State: closed - Opened by sparkling9809 over 1 year ago - 4 comments
Labels: question

#406 - [Question] Link for day_1.gz is invalid

Issue - State: closed - Opened by zmxdream over 1 year ago
Labels: question

#405 - Update session_inference_test.cpp

Pull Request - State: closed - Opened by lxh over 1 year ago

#404 - [Question] Multi-node training encounters Runtime error: unhandled system error ncclGroupEnd()

Issue - State: closed - Opened by heroes999 over 1 year ago - 9 comments
Labels: question

#403 - [Question] position bias

Issue - State: closed - Opened by skunkwerk over 1 year ago - 2 comments
Labels: question

#402 - [Question] A question towards HugeCTR::concurrent_unordered_map::get_insert

Issue - State: closed - Opened by heroes999 over 1 year ago - 6 comments
Labels: question

#401 - [BUG] DLRM script has quite a lot of compatibility issues.

Issue - State: closed - Opened by zpcalan over 1 year ago - 2 comments

#400 - [Question] Are SHARP and IB a must-have for multi-node training?

Issue - State: closed - Opened by heroes999 over 1 year ago - 3 comments
Labels: question

#399 - [Question] Can I build and use gpu_cache independently?

Issue - State: closed - Opened by RobertLou over 1 year ago - 1 comment
Labels: question

#398 - [Question] Any randomness in data reader? Any randomness in Model.fit?

Issue - State: closed - Opened by heroes999 over 1 year ago - 3 comments
Labels: question

#397 - [Question] How to process the Criteo day 0 (50 GB) dataset to run ETC?

Issue - State: closed - Opened by zpcalan over 1 year ago - 11 comments
Labels: question

#396 - Can't install sparse_operation_kit

Issue - State: closed - Opened by yourtj over 1 year ago - 2 comments

#395 - [Question] Can't use ETC to train multiple datasets.

Issue - State: closed - Opened by zpcalan over 1 year ago - 3 comments
Labels: question

#394 - [Question] loss_test is not stable; some cases sometimes fail

Issue - State: closed - Opened by heroes999 over 1 year ago - 6 comments
Labels: question

#393 - [Question] When setting use_mixed_precision=True, WDL training does not converge.

Issue - State: closed - Opened by zpcalan over 1 year ago - 22 comments
Labels: question

#392 - Redirect master pages to main

Pull Request - State: closed - Opened by alexanderronquillo over 1 year ago

#391 - [Question] Failed to run lookup sparse distribute example

Issue - State: closed - Opened by Nov11 over 1 year ago - 1 comment
Labels: question

#390 - [Question] Does ETC training feature support to run on multiple physical nodes?

Issue - State: closed - Opened by zpcalan over 1 year ago - 1 comment
Labels: question

#389 - [BUG] Cannot run wdl_parquet.py: CUDNN_STATUS_MAPPING_ERROR

Issue - State: closed - Opened by butterluo over 1 year ago - 6 comments

#388 - [Question] SOK: How to save sok.experiment.Variable correctly into a SavedModel?

Issue - State: closed - Opened by Nov11 over 1 year ago - 2 comments
Labels: question

#386 - [BUG] Failed to process day 23 of Criteo data.

Issue - State: closed - Opened by zpcalan over 1 year ago - 3 comments

#385 - [Question] Having a hard time running demo with TensorFlow 2 MirroredStrategy

Issue - State: closed - Opened by Nov11 over 1 year ago - 2 comments
Labels: question

#384 - [Question] Negative item counts in MovieLens notebook

Issue - State: closed - Opened by JohnFirth over 1 year ago - 1 comment
Labels: question

#383 - [BUG] cannot run sok demo with official image

Issue - State: closed - Opened by ZhuYuJin over 1 year ago - 5 comments
Labels: bug

#382 - [BUG] EmbeddingCollection Wgrad buffer sizes can overflow a 32 bit integer.

Issue - State: closed - Opened by zpzim over 1 year ago - 3 comments

#381 - [Question] Does HugeCTR support feature selection & feature elimination?

Issue - State: closed - Opened by wzhgithub over 1 year ago - 1 comment
Labels: question

#380 - Fix UT failure for l2_regularizer_layer

Pull Request - State: closed - Opened by EmmaQiaoCh over 1 year ago - 1 comment

#379 - [Question] How to get the performance of inference

Issue - State: closed - Opened by liangxuegang almost 2 years ago - 2 comments
Labels: question

#377 - [BUG] Documentation for Optimizer types has a typo.

Issue - State: closed - Opened by ashish007git almost 2 years ago - 2 comments

#376 - modify EV name

Pull Request - State: closed - Opened by Mesilenceki almost 2 years ago - 1 comment

#375 - [BUG] Around 200 layer unit tests fail in my HugeCTR container, please lend a hand

Issue - State: closed - Opened by heroes999 almost 2 years ago - 2 comments

#374 - [Question] How to disable CUDA Graph in HugeCTR?

Issue - State: closed - Opened by LucQueen almost 2 years ago - 2 comments
Labels: question

#373 - [Question] How to correctly use Embedding Training Cache feature in HugeCTR

Issue - State: closed - Opened by yuqie almost 2 years ago - 2 comments
Labels: question

#372 - [Question] What causes a segmentation fault when I train DLRM in MLPerf?

Issue - State: closed - Opened by LucQueen almost 2 years ago - 6 comments
Labels: question

#371 - [BUG] Program crashes on garbage collection of inference session / model

Issue - State: closed - Opened by yakoton almost 2 years ago - 3 comments

#370 - [Question] How can I debug a core dump when I use HugeCTR?

Issue - State: closed - Opened by LucQueen almost 2 years ago - 1 comment
Labels: question

#369 - Fix HPS doc typo

Pull Request - State: closed - Opened by yingcanw about 2 years ago - 1 comment

#368 - [BUG] database

Issue - State: closed - Opened by zhaozheng09 about 2 years ago - 1 comment

#366 - [Question] DIN sample slot_size_array and key range overlap

Issue - State: closed - Opened by liguo88 about 2 years ago - 2 comments
Labels: question

#365 - [Question] Initialize SOK embedding on CPU to prevent OOM

Issue - State: closed - Opened by WonderingWJ about 2 years ago - 1 comment
Labels: question

#364 - [BUG] DIN sample refers to old version of NVTabular and produces error when running with 22.09 container

Issue - State: closed - Opened by jsohn-nvidia about 2 years ago - 3 comments
Labels: bug, P1

#363 - [Requirement] TLS communication for cloud-hosted HPS

Issue - State: closed - Opened by Spartee about 2 years ago - 6 comments
Labels: fea::functional, requirement