arcee-ai/distillkit issues and pull requests

#19 - Setup broken with PyTorch 2.5.x

Issue - State: open - Opened by juliensimon 10 days ago - 1 comment
Labels: bug

#18 - Offline Distillation using Top K Logits

Issue - State: closed - Opened by raghavgarg97 20 days ago - 2 comments

#17 - Models with same architecture but different tokenizer

Issue - State: closed - Opened by bil-ash 23 days ago - 2 comments

#16 - AttributeError: 'DataParallel' object has no attribute 'device'

Issue - State: open - Opened by Wolfman1219 about 1 month ago

#15 - Support Llama 3.2 1B model ?

Issue - State: closed - Opened by JinYu1998 about 1 month ago - 1 comment

#14 - Is Dense to MoE, MoE to Dense or MoE to MoE Distillation supported?

Issue - State: closed - Opened by linux-leo about 2 months ago - 1 comment

#13 - [News] GKD method

Issue - State: closed - Opened by kashif about 2 months ago - 1 comment

#12 - How can distillation be carried out under these circumstances?

Issue - State: open - Opened by lean-wang 2 months ago

#11 - Can it support multi-node training?

Issue - State: open - Opened by zidong-onepiece1 2 months ago

#10 - Deprecated positional argument(s) used in SFTTrainer

Issue - State: closed - Opened by JinYu1998 3 months ago - 7 comments

#9 - After training, the model output cannot stop

Issue - State: closed - Opened by blackblue9 3 months ago - 5 comments

#8 - Update distil_hidden.py for teacher tokenizer

Pull Request - State: closed - Opened by fernando-neto-ai 3 months ago

#7 - Teacher uses student tokenizer in distill_hidden.py

Issue - State: closed - Opened by HeegonJin 3 months ago - 1 comment

#6 - Plan for larger model?

Issue - State: closed - Opened by YixinSong-e 3 months ago - 2 comments

#5 - encoder model distillation?

Issue - State: closed - Opened by riyajatar37003 3 months ago - 1 comment

#4 - CUDA Out of memory issue

Issue - State: closed - Opened by avemio-digital 3 months ago - 8 comments

#3 - RuntimeError: 'weight' must be 2-D

Issue - State: open - Opened by Hasan-Syed25 3 months ago - 6 comments

#2 - Distillation with distil_hidden.py when launched fails with Pytorch Version 2.4 with "_get_socket_with_port"

Issue - State: closed - Opened by Nottlespike 4 months ago - 5 comments

#1 - added some initial logic to load the teacher logits

Pull Request - State: open - Opened by shamanez 4 months ago - 3 comments
