arcee-ai/mergekit issues and pull requests

#395 - Error at MoE Qwen 1.5B

Issue - State: closed - Opened by ehristoforu 6 months ago - 3 comments

#394 - Null vocab_file Issue with mistral v03 based models when using union tokenizer source

Issue - State: open - Opened by guillermo-gabrielli-fer 6 months ago - 2 comments

#393 - Is there a way to run LORA extraction using multi GPU? 70B LORA extraction OOM on 24GB 3090Ti

Issue - State: open - Opened by Nero10578 6 months ago - 4 comments

#392 - Example case of task_arithmetic needed

Issue - State: open - Opened by Opdoop 6 months ago - 1 comment

#391 - MoE exits itself after expert prompts 100% 2/2

Issue - State: open - Opened by SameedHusayn 6 months ago

#390 - mergekit saves tied and ignored weights unlike what transformers does when saving

Issue - State: open - Opened by nyxkrage 6 months ago

#389 - Create Communication Channels for MergeKit

Issue - State: open - Opened by aditya-cherukuru 6 months ago

#388 - The speed issue with the GTATask.

Issue - State: open - Opened by daidaiershidi 6 months ago - 3 comments

#387 - ABM corrections

Pull Request - State: open - Opened by metric-space 6 months ago

#386 - How to Create a New Merging Method

Issue - State: open - Opened by Guozhenyuan 6 months ago - 1 comment

#385 - Result of merging 2 Gemma2 9B models gains 1B parameters somehow

Issue - State: closed - Opened by jim-plus 6 months ago - 6 comments

#383 - does not appear to have a file named config.json

Issue - State: open - Opened by bxf1001 6 months ago - 2 comments

#382 - Added support for DeepseekV2 model

Pull Request - State: open - Opened by aditya-29 7 months ago - 4 comments

#379 - Does mergekit-moe support Qwen?

Issue - State: open - Opened by hoooooli 7 months ago - 5 comments

#378 - Questions about Config

Issue - State: open - Opened by Zheng-Jay 7 months ago - 2 comments

#377 - mergekit-evolve doesn't account for higher_is_better: false tasks.

Issue - State: open - Opened by mekaneeky 7 months ago - 1 comment

#375 - Network is unreachable

Issue - State: closed - Opened by guanfaqian 7 months ago - 1 comment

#370 - remove strict version of pydantic

Pull Request - State: closed - Opened by sreev 7 months ago - 1 comment

#366 - Add Della merge method

Pull Request - State: closed - Opened by Tej-Deep 7 months ago - 6 comments

#364 - gracefully pause evolutionary optimization?

Issue - State: open - Opened by johnwee1 7 months ago - 1 comment

#360 - Condense a model's layers.

Issue - State: open - Opened by DewEfresh 7 months ago - 1 comment

#357 - NuSLERP

Pull Request - State: closed - Opened by cg123 8 months ago - 1 comment

#350 - qwen2-0.5B cannot be merged into MoE

Issue - State: closed - Opened by letterk 8 months ago - 5 comments

#341 - Evolutionary Merging out of memory

Issue - State: open - Opened by ArcherShirou 8 months ago - 4 comments

#340 - Weights Metrics

Pull Request - State: open - Opened by ElliotStein 8 months ago

#335 - Merge arbitrary pytorch models

Pull Request - State: open - Opened by cg123 8 months ago - 1 comment

#333 - `extract_lora.py` improvements and fixes

Pull Request - State: closed - Opened by jukofyork 9 months ago - 12 comments

#332 - Add --load-in-4bit and --load-in-8bit for HF eval backend

Pull Request - State: closed - Opened by cg123 9 months ago

#319 - How to merge a VLM and LLM with different model type.

Issue - State: open - Opened by tanyakansal30 9 months ago - 1 comment

#312 - Qwen/Qwen1.5-1.8B MoE Merging fails

Issue - State: closed - Opened by dgolchin 9 months ago - 4 comments

#251 - Attempt to make zipit work speak the same language as rest of mergekit

Pull Request - State: closed - Opened by metric-space 10 months ago

#249 - Mainly adding modified M_U computation.

Pull Request - State: closed - Opened by shamanez 10 months ago

#243 - _pickle.UnpicklingError: Unsupported type torch._tensor._rebuild_from_type_v2

Issue - State: open - Opened by rangan2510 10 months ago - 5 comments

#207 - Evolutionary Merging Method

Issue - State: open - Opened by codelauncher444 11 months ago - 19 comments

#198 - Idea: Downscaling the K and/or Q matrices for repeated layers in franken-merges?

Issue - State: open - Opened by jukofyork 11 months ago - 63 comments

#195 - Add support for GPTBigCodeForCausalLM

Pull Request - State: closed - Opened by cg123 11 months ago - 2 comments

#179 - Automatic Weight Calc based on NearSwap

Pull Request - State: closed - Opened by Steel-skull 12 months ago - 2 comments

#168 - Support for Merge methods which require some input data?

Issue - State: closed - Opened by ita9naiwa 12 months ago - 2 comments

#167 - Adds a new method to shuffle/swap values

Pull Request - State: open - Opened by Ar57m 12 months ago - 5 comments

#158 - qwen2 architecture definition

Pull Request - State: closed - Opened by thomasgauthier about 1 year ago - 6 comments

#150 - Fix phi-2 merging to MoE.

Pull Request - State: closed - Opened by PhilipMay about 1 year ago - 4 comments

#101 - moe - ValidationError: 1 validation error for MergeConfiguration

Issue - State: closed - Opened by naseerfaheem about 1 year ago - 1 comment

#100 - Adds a way of merging models with different sizes(B)

Pull Request - State: closed - Opened by Ar57m about 1 year ago - 10 comments

#99 - Add JAISLMHeadModel

Pull Request - State: closed - Opened by cg123 about 1 year ago

#98 - JapaneseStableLMAlphaForCausalLM support

Pull Request - State: closed - Opened by cg123 about 1 year ago

#97 - KeyError: 'model.embed_tokens.weight' when using mergekit-moe

Issue - State: open - Opened by axrwl about 1 year ago - 5 comments

#96 - Support for Qwen model

Issue - State: closed - Opened by sorasoras about 1 year ago - 8 comments

#95 - Why can't I run mergekit-moe command in mixtral branch ?

Issue - State: closed - Opened by ZhangEnmao about 1 year ago - 1 comment

#94 - Can you implement the expansion and merging of hidden_size and expand the original hidden?

Issue - State: open - Opened by win10ogod about 1 year ago - 2 comments

#93 - Just merge models

Issue - State: closed - Opened by ftgreat about 1 year ago - 1 comment

#92 - support for JAISLMHeadModel

Issue - State: closed - Opened by h9-tect about 1 year ago - 7 comments

#91 - Support for JapaneseStableLMAlphaForCausalLM

Issue - State: open - Opened by azulika about 1 year ago - 1 comment

#90 - Don't use Safetensors

Issue - State: closed - Opened by fakerybakery about 1 year ago - 1 comment

#89 - Latest commit to Mixtral branch causes script to never run

Issue - State: closed - Opened by Dakraid about 1 year ago - 1 comment

#88 - Support LLaMA MoE?

Issue - State: closed - Opened by cdj0311 about 1 year ago - 2 comments

#87 - Even when gate_mode is set to random, it is still required to input different positive prompts.

Issue - State: closed - Opened by aoi-naive about 1 year ago - 1 comment

#86 - Can mergekit be applied to merge multiple LoRA checkpoints by weights?

Issue - State: open - Opened by authurlord about 1 year ago - 3 comments

#85 - RuntimeError: Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again

Issue - State: closed - Opened by Quang-elec44 about 1 year ago - 6 comments

#84 - Question

Issue - State: closed - Opened by dillfrescott about 1 year ago - 3 comments

#83 - Add support for GPTBigCodeForCausalLM

Pull Request - State: closed - Opened by cg123 about 1 year ago - 1 comment

#82 - Convert Phi to Llama

Issue - State: closed - Opened by fakerybakery about 1 year ago - 5 comments

#81 - Support for MoE Phi-2

Issue - State: open - Opened by beratcmn about 1 year ago - 2 comments

#80 - Support for GPTBigCodeForCausalLM (StarCoder)

Issue - State: closed - Opened by rpand002 about 1 year ago - 1 comment

#79 - Add tokenizer merging tests

Pull Request - State: closed - Opened by cg123 about 1 year ago

#78 - fix(mega): use config file name for final merge

Pull Request - State: closed - Opened by nyxkrage about 1 year ago - 2 comments

#77 - [Mixtral] Positive Prompt Format

Issue - State: closed - Opened by fakerybakery about 1 year ago - 2 comments

#76 - merge_method on moe

Issue - State: closed - Opened by guidevops about 1 year ago - 2 comments

#75 - no error simple question

Issue - State: open - Opened by kalle07 about 1 year ago - 3 comments

#74 - The differences in principles and effects between the various merging methods

Issue - State: open - Opened by hywchina about 1 year ago - 1 comment

#73 - Tokenizer merge fix

Pull Request - State: closed - Opened by cg123 about 1 year ago - 1 comment

#72 - mergekit-mega: compound merging using multiple yaml documents in a single merge config

Pull Request - State: closed - Opened by nyxkrage about 1 year ago - 5 comments

#71 - Union tokenizer merging seems to break lazy tensor loading

Issue - State: closed - Opened by brucethemoose about 1 year ago - 1 comment

#70 - confuse about parameter t in slerp

Issue - State: open - Opened by zyh3826 about 1 year ago - 5 comments

#69 - Fix fp16 DARE on CPU

Pull Request - State: closed - Opened by cg123 about 1 year ago

#68 - Has anyone tried modal.com for merging models ?

Issue - State: open - Opened by MarcelBP about 1 year ago - 1 comment

#67 - Computational Graph Overhaul

Pull Request - State: closed - Opened by cg123 about 1 year ago

#66 - Phi 2

Pull Request - State: closed - Opened by cg123 about 1 year ago

#65 - Move argument parsing to click

Pull Request - State: closed - Opened by cg123 about 1 year ago

#64 - Mixtral branch : What happens when we give both positive and negative prompts per an expert ?

Issue - State: open - Opened by shamanez about 1 year ago - 4 comments

#63 - While quantized by awq, error KeyError: 'block_sparse_moe.experts.0.w2'

Issue - State: open - Opened by xiechengmude about 1 year ago - 2 comments

#62 - gradient merge

Issue - State: open - Opened by thistleknot about 1 year ago - 1 comment

#61 - Mixtral-moe branch minor issue

Issue - State: closed - Opened by RoseTheLocalFem about 1 year ago - 1 comment

#60 - Convert Mistral -> Llama

Issue - State: closed - Opened by fakerybakery about 1 year ago - 3 comments

#59 - frankenllama_22

Issue - State: closed - Opened by fakerybakery about 1 year ago - 2 comments

#58 - Generate Hugging Face model card

Pull Request - State: closed - Opened by cg123 about 1 year ago - 11 comments

#57 - Lazy tensor loader

Issue - State: open - Opened by sudy-super about 1 year ago - 3 comments

#56 - Eval

Issue - State: open - Opened by darkzbaron about 1 year ago

#55 - Relevant literature for these methods

Issue - State: closed - Opened by petroskarypis about 1 year ago - 1 comment

#54 - mergekit-moe seems to fail

Issue - State: closed - Opened by dillfrescott about 1 year ago - 3 comments

#53 - Could you please explain how passthrough slicing works?

Issue - State: closed - Opened by dillfrescott about 1 year ago - 2 comments

#52 - phi-2 error

Issue - State: closed - Opened by win10ogod about 1 year ago - 4 comments

#51 - Runtime Error, please help

Issue - State: closed - Opened by TuyulBrutal about 1 year ago - 2 comments

#50 - MergeKit models does not behave the same as the original model

Issue - State: open - Opened by casper-hansen about 1 year ago - 2 comments

#49 - Why two different options generate different size of models?

Issue - State: closed - Opened by DopeorNope-Lee about 1 year ago - 1 comment

#48 - add for loop for slices_in

Pull Request - State: closed - Opened by teilomillet about 1 year ago - 1 comment
