erfanzar/easydel issues and pull requests

#187 - inference on colab or kaggle with tpu v3 or v2 ?

Issue - State: open - Opened by karthik4579 11 days ago - 1 comment

#186 - merge upstream

Pull Request - State: closed - Opened by nathom 17 days ago - 1 comment

#184 - OOM issue with same batch size that was running ok on 0.0.80

Issue - State: open - Opened by salrowili about 1 month ago - 23 comments

#183 - Inference on TPU Pod (v4-64)

Issue - State: closed - Opened by creatorrr about 1 month ago - 10 comments

#182 - The code to convert EasyDel state to Torch (HF) is not working

Issue - State: closed - Opened by sultan2050 about 1 month ago - 2 comments

#181 - BUG : memory_stats() is not supported in TPU pod causing the inference in TPU pod to throw an error

Issue - State: closed - Opened by salrowili about 1 month ago - 8 comments

#180 - Nnx 1

Pull Request - State: closed - Opened by erfanzar about 1 month ago

#179 - Update auto_tx.py

Pull Request - State: closed - Opened by sparsh35 about 2 months ago - 1 comment

#178 - Performance Optimization and better Modularity

Pull Request - State: closed - Opened by erfanzar 2 months ago

#177 - There is some issue with trainer , loss gets to zero after 1 epoch full fine tuning.

Issue - State: closed - Opened by sparsh35 3 months ago - 6 comments

#176 - Examples and Docs to demonstrate running inferencing using Llama-3.2-1b or 3b models on TPU V4

Issue - State: closed - Opened by rakesh010101 3 months ago - 1 comment

#175 - Inference is very slow on an TPU v4 instance - 2 Tokens / Sec

Issue - State: closed - Opened by rakesh010101 3 months ago - 7 comments

#174 - Received an error while using pipeline generation using llama model

Issue - State: closed - Opened by rakesh010101 3 months ago - 1 comment

#173 - Update loss_funcs.py

Pull Request - State: closed - Opened by sparsh35 3 months ago

#172 - DPO trainer example

Issue - State: open - Opened by sparsh35 4 months ago - 16 comments

#171 - Custom dataset preprocessing

Issue - State: closed - Opened by ayukh 4 months ago - 4 comments

#170 - Nan losses with Gemma 1 DPO training on Kaggle TPU

Issue - State: closed - Opened by defdet 4 months ago - 5 comments

#169 - How to do sequence classification training ?

Issue - State: closed - Opened by sparsh35 4 months ago - 4 comments

#168 - Issue saving and converting the Gemma 2 model after training

Issue - State: closed - Opened by sparsh35 6 months ago - 2 comments

#167 - Update orpo_trainer.py

Pull Request - State: closed - Opened by sparsh35 6 months ago - 1 comment

#166 - TPU v4-32 set-up not working

Issue - State: closed - Opened by s-smits 7 months ago - 13 comments

#165 - Import error EasyDeL libraries examples/flash_attention_training_example.py

Issue - State: closed - Opened by s-smits 7 months ago - 6 comments

#164 - EasyDeL

Issue - State: closed - Opened by kuangdao 8 months ago - 1 comment

#163 - oom when llama2-7b sft

Issue - State: closed - Opened by kuangdao 8 months ago - 5 comments

#162 - Version `0.0.69`

Pull Request - State: closed - Opened by erfanzar 8 months ago

#161 - TPU-v3 Kaggle not working after update

Issue - State: closed - Opened by s-smits 8 months ago - 5 comments

#159 - Update base_trainer.py for handling total_batch_size>1

Pull Request - State: closed - Opened by s-smits 8 months ago

#158 - value error using flash attention

Issue - State: closed - Opened by heydaari 8 months ago - 1 comment

#157 - Logging into wandb.ai

Issue - State: closed - Opened by heydaari 8 months ago - 2 comments

#156 - NaN loss in ORPOTrainer with legacy_sharded_vanilla

Issue - State: closed - Opened by nyl199310 8 months ago - 9 comments

#155 - Out of Memory issue in new easydel version.

Issue - State: closed - Opened by nyl199310 9 months ago - 6 comments

#154 - Falcon-11B: Dict key mismatch; expected keys: ['input_layernorm', 'mlp', 'self_attention']; dict: {'self_attention': {'query_key_value': {'kernel': Array

Issue - State: closed - Opened by s-smits 9 months ago - 9 comments

#152 - [Feature Request] Add support for tiiuae/falcon-11B

Issue - State: closed - Opened by s-smits 9 months ago - 4 comments

#150 - Import Error

Issue - State: closed - Opened by heydaari 9 months ago - 1 comment

#149 - Mosaic kernels cannot be automatically partitioned. Please wrap the call in a shard_map or xmap

Issue - State: closed - Opened by nyl199310 9 months ago - 3 comments

#148 - Can't load checkpoints continue training

Issue - State: closed - Opened by IvoryTower800 10 months ago - 7 comments

#147 - AssertionError: Precision DEFAULT requested together with quantization.

Issue - State: closed - Opened by peterniu19 10 months ago - 5 comments

#146 - training does not start using latest easydel

Issue - State: closed - Opened by IvoryTower800 10 months ago - 6 comments

#145 - 'LoraWeight' object has no attribute 'tolist'

Issue - State: closed - Opened by defdet 10 months ago - 4 comments

#144 - Please provide support for LLama3 or provide example on how to serve it using Easydel

Issue - State: closed - Opened by jchauhan 10 months ago - 4 comments

#143 - load_in_8bit doesn't work on Kaggle TPU

Issue - State: closed - Opened by IvoryTower800 10 months ago - 2 comments

#142 - Out of memory for serving example

Issue - State: closed - Opened by xu3kev 10 months ago - 3 comments

#141 - Import Union

Pull Request - State: closed - Opened by xu3kev 10 months ago - 1 comment

#140 - Kaggle training examples don't work

Issue - State: closed - Opened by jcole75 10 months ago - 14 comments

#138 - Add support for iterable dataset loading

Pull Request - State: closed - Opened by yhavinga 10 months ago

#136 - Updating Beta Branch

Pull Request - State: closed - Opened by erfanzar 10 months ago

#135 - Add gradient norm logging, fix metric collection on multi-worker setup

Pull Request - State: closed - Opened by yhavinga 10 months ago

#134 - checkpoint's size is increasing everytime.

Issue - State: closed - Opened by IvoryTower800 11 months ago - 3 comments

#133 - Unable to Load EasyDeL State

Issue - State: closed - Opened by w11wo 11 months ago - 6 comments

#132 - Error converting easydel checkpoint to huggingface model.

Issue - State: closed - Opened by IvoryTower800 11 months ago - 2 comments

#131 - How to reduce TPU RAM when finetuning?

Issue - State: closed - Opened by IvoryTower800 11 months ago - 8 comments

#129 - Attention Mask for Packed Sequences (via Attention Bias)

Issue - State: closed - Opened by xingyaoww 11 months ago - 3 comments

#128 - Transformers-like API for inference

Issue - State: closed - Opened by Froggy111 11 months ago - 19 comments

#127 - Add save_total_limit argument to delete older checkpoints

Pull Request - State: closed - Opened by yhavinga 11 months ago

#126 - How to continue training from a previous saved easydel checkpoint?

Issue - State: closed - Opened by IvoryTower800 11 months ago - 9 comments

#125 - a question about how to increase batch size.

Issue - State: closed - Opened by IvoryTower800 11 months ago - 6 comments

#124 - Time whole train loop instead of only call to train step function

Pull Request - State: closed - Opened by yhavinga 11 months ago

#123 - Ignore token label smooth z loss

Pull Request - State: closed - Opened by yhavinga 11 months ago

#122 - Model configs pass attributes to PretrainedConfig to prevent override…

Pull Request - State: closed - Opened by yhavinga 11 months ago

#121 - Docs site is broken https://erfanzar.github.io

Issue - State: closed - Opened by nigh8w0lf 11 months ago - 1 comment

#120 - Training with Ring Attention Failed

Issue - State: closed - Opened by IvoryTower800 11 months ago - 10 comments

#119 - Update Beta branch version to EasyDeL 0.0.55

Pull Request - State: closed - Opened by erfanzar 11 months ago

#118 - Install from git not working

Issue - State: closed - Opened by sr5434 11 months ago - 8 comments

#117 - Training in kaggle's TPU is failing

Issue - State: closed - Opened by saidineshpola 11 months ago - 5 comments

#116 - Output Differs from Hugging Face Transformer Result and EasyDel Results

Issue - State: closed - Opened by jchauhan 11 months ago - 4 comments

#115 - Updating Beta Branch

Pull Request - State: closed - Opened by erfanzar 11 months ago

#114 - [Urgent] Exception while load AdaptLLM/medicine-chat, variant of llama

Issue - State: closed - Opened by jchauhan 12 months ago - 6 comments

#112 - Added zephyr prompter

Pull Request - State: closed - Opened by jchauhan 12 months ago - 1 comment

#111 - Easydel support on TPU v4.8 - getting exception

Issue - State: closed - Opened by jchauhan 12 months ago - 1 comment

#110 - Support HuggingFaceH4/zephyr-7b-beta serving using EasyDel

Issue - State: closed - Opened by jchauhan 12 months ago - 1 comment

#109 - None of the examples scripts works, that used to work earlier. Please test your examples again and update docs

Issue - State: closed - Opened by jchauhan 12 months ago - 1 comment

#108 - GPT2 (150M model) support on Tv2.8. Example scripts goes out of memory

Issue - State: closed - Opened by jchauhan 12 months ago - 1 comment

#107 - Fix unexpected indent error

Pull Request - State: closed - Opened by sr5434 12 months ago - 2 comments

#106 - Exception while running any model - einops.EinopsError: Error while processing rearrange-reduction pattern "b (c n) d -> b c n d".

Issue - State: closed - Opened by jchauhan 12 months ago - 1 comment

#105 - Example shown on https://pypi.org/project/EasyDeL/ to finetune tinyllama raise exception on kaggle

Issue - State: closed - Opened by jchauhan 12 months ago - 3 comments

#104 - Error while finetuning Tinyllama on Kaggle TPU

Issue - State: closed - Opened by jchauhan 12 months ago - 5 comments

#103 - QLoRA Finetune Example

Issue - State: closed - Opened by sr5434 12 months ago - 11 comments

#102 - Add label smoothing, z_loss and ignore <=0 tokens in loss calculation

Pull Request - State: closed - Opened by yhavinga 12 months ago - 13 comments

#100 - Optimize mean loss and accuracy calculation

Pull Request - State: closed - Opened by yhavinga about 1 year ago

#99 - Potential regression causing resource exhausted after recent commit

Issue - State: closed - Opened by yhavinga about 1 year ago - 3 comments

#98 - Error while training GPT2 on the kaggle

Issue - State: closed - Opened by jchauhan about 1 year ago - 3 comments