princeton-nlp/autocompressors issues and pull requests

#29 - The usage of past_key_values in AutoCompressorMixin

Issue - State: open - Opened by RewindL 4 months ago - 1 comment

#28 - Does Auto-Compressors support newer base model like LLama-3 or Qwen-2.5?

Issue - State: closed - Opened by RewindL 4 months ago - 2 comments

#27 - Evaluation datasets needed

Issue - State: open - Opened by Mr-lonely0 5 months ago

#26 - fix: typo

Pull Request - State: closed - Opened by khs0415p 5 months ago

#25 - Your shared model trained on LLAMA2 is not trained on Lora, It's full-finetuned model.

Issue - State: closed - Opened by jason9693 6 months ago - 1 comment

#24 - torchrun error when generating training split

Issue - State: open - Opened by OswaldHe 6 months ago - 3 comments

#23 - substep & segment

Issue - State: closed - Opened by Lu-kuan-lpk 7 months ago - 1 comment

#22 - Install as python package?

Issue - State: closed - Opened by creisle 7 months ago - 1 comment

#21 - Some issue about ICL Experience

Issue - State: closed - Opened by broalantaps 10 months ago - 3 comments

#20 - Inquire on data of Table 1

Issue - State: open - Opened by void-b583x2-NULL 10 months ago - 1 comment

#19 - Question about the data preprocessing

Issue - State: closed - Opened by hxs91 10 months ago - 1 comment

#18 - Reduce the number of summary vectors

Issue - State: closed - Opened by rahulseetharaman 11 months ago - 2 comments

#17 - question about `position_ids`

Issue - State: closed - Opened by hxs91 11 months ago - 5 comments

#16 - Held-out perplexity question

Issue - State: closed - Opened by broalantaps 11 months ago - 3 comments

#15 - RuntimeError: FlashAttention only support fp16 and bf16 data type

Issue - State: closed - Opened by stdKonjac about 1 year ago - 3 comments

#14 - Dimension of last_hidden_state size

Issue - State: closed - Opened by imbalu007 about 1 year ago - 2 comments

#13 - AttributeError: 'SubstepTrainer' object has no attribute 'do_grad_scaling'

Issue - State: closed - Opened by msclar about 1 year ago - 3 comments

#12 - Install instructions are not clear

Issue - State: closed - Opened by imbalu007 about 1 year ago - 2 comments

#11 - Finetuning an autocompressor model

Issue - State: closed - Opened by imbalu007 about 1 year ago - 4 comments

#10 - Some fixes to make Llama train after the merge

Pull Request - State: closed - Opened by mu-arkhipov about 1 year ago - 1 comment

#9 - Merge Llama branch

Pull Request - State: closed - Opened by CodeCreator about 1 year ago

#8 - BUG REPORT

Issue - State: closed - Opened by Patrick-Ni over 1 year ago - 1 comment

#7 - Summary Vector Failures and Incomplete Answers with Numerical Contexts

Issue - State: closed - Opened by iseesaw over 1 year ago - 4 comments

#6 - CUDA out of memory.

Issue - State: closed - Opened by xuguohai over 1 year ago - 3 comments

#5 - Question on the preprocessed data

Issue - State: closed - Opened by LouChao98 over 1 year ago - 3 comments

#4 - Inquiry for the release date of the pre-trained model

Issue - State: closed - Opened by siyuhsu over 1 year ago - 1 comment

#3 - Follow-up on Code Release Timeline

Issue - State: closed - Opened by mpoemsl over 1 year ago - 4 comments

#2 - Timeline for Release of Code and Pre-Trained Models

Issue - State: closed - Opened by mpoemsl over 1 year ago - 2 comments

#1 - Readme: Add link and abstract of paper

Pull Request - State: closed - Opened by EwoutH over 1 year ago - 2 comments
