NVlabs/VILA issues and pull requests

#142 - Long context video module only

Issue - State: open - Opened by mustafahalimeh about 1 month ago

#141 - Context size and examples for LongVILA

Issue - State: open - Opened by yulinzou about 1 month ago

#139 - cannot download dataset

Issue - State: open - Opened by henrycjh 2 months ago - 1 comment

#137 - VILA-1.5-HD coming soon?

Issue - State: open - Opened by collinmccarthy 2 months ago - 1 comment

#135 - ValueError: The checkpoint you are trying to load has model type `llava_llama` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.

Issue - State: open - Opened by eternal8080 2 months ago - 12 comments

#130 - How to run longvila large context, sequence parallel inference?

Issue - State: open - Opened by zadeismael 3 months ago - 17 comments

#126 - TypeError: LlamaRotaryEmbedding.forward() got an unexpected keyword argument 'seq_len' when running VILA model inference

Issue - State: open - Opened by LanceLeonhart 3 months ago - 4 comments

#123 - Random shuffle before dropping the last few samples

Pull Request - State: open - Opened by tongzhoumu 3 months ago

#108 - Add .gitignore

Pull Request - State: open - Opened by zzxslp 3 months ago

#100 - Is there any way to increase the context window?

Issue - State: closed - Opened by ZackBradshaw 4 months ago - 4 comments

#99 - release schedule for the "VILA1.5-34b-4bit-AWQ" model.

Issue - State: closed - Opened by xiexiaoshinick 4 months ago - 1 comment

#98 - question: what does 'repack_multimodal_data' function do?

Issue - State: closed - Opened by orrzohar 4 months ago - 1 comment

#97 - Multi-Image or Multi-Video Inference Example

Issue - State: open - Opened by chancharikmitra 4 months ago - 2 comments

#96 - Support for multi-video captioning with multiple grid image inputs?

Issue - State: closed - Opened by YoungjaeDev 4 months ago - 2 comments

#95 - Whether the visual encoder participates in training

Issue - State: closed - Opened by LoverLost 4 months ago - 3 comments

#94 - Deployment to SageMaker and/or HuggingFace Inference Endpoints Fails With Error

Issue - State: open - Opened by averypfeiffer 4 months ago - 5 comments

#93 - How to convert model to gguf

Issue - State: closed - Opened by dand-milestone 4 months ago - 3 comments

#92 - how to finetune?

Issue - State: closed - Opened by lxw919 4 months ago - 1 comment

#91 - Does VILA support higher resolution than 336px, e.g., 672 or 1008?

Issue - State: closed - Opened by lalalal9797 4 months ago - 2 comments

#90 - About the release of VILA v1.5 technical report or blog

Issue - State: open - Opened by Fr0zenCrane 4 months ago - 1 comment

#89 - About the VILA1.5 3b

Issue - State: open - Opened by Davidup1 4 months ago - 2 comments

#88 - About Intermediate Checkpoints

Issue - State: closed - Opened by ruili33 4 months ago - 2 comments

#87 - Seems video token isn't used in the model during video inference

Issue - State: closed - Opened by Virtualexistence 4 months ago - 2 comments

#86 - Demo on Huggingface Spaces

Issue - State: open - Opened by yvrjsharma 4 months ago - 7 comments

#85 - Update README.md

Pull Request - State: closed - Opened by hongxuyin 4 months ago

#84 - docs: update README.md

Pull Request - State: closed - Opened by eltociear 4 months ago

#83 - How VILA can handle 8 frames from videos?

Issue - State: closed - Opened by KangsanKim07 4 months ago - 5 comments

#82 - Problem training on zero2.json

Issue - State: closed - Opened by Davidup1 4 months ago - 4 comments

#81 - Update perception test eval script and results in README

Pull Request - State: closed - Opened by Xiuyu-Li 4 months ago

#80 - Whether this is a bug?

Issue - State: closed - Opened by jihaonew 4 months ago - 7 comments

#79 - Multi image inference quality

Issue - State: closed - Opened by oroojlooy 5 months ago - 1 comment

#78 - The inference video reports an error: ValueError: Unable to create tensor, you should probably activate padding with 'padding=True' to have batched tensors with the same length.

Issue - State: closed - Opened by changqinyao 5 months ago - 3 comments

#77 - Question about the output

Issue - State: open - Opened by DwanZhang-AI 5 months ago - 4 comments

#76 - What is the --conv-mode of VILA1.5-13b?

Issue - State: closed - Opened by DwanZhang-AI 5 months ago - 2 comments

#75 - added functionality to process a bunch of videos at a time

Pull Request - State: closed - Opened by poorfrombabylon 5 months ago

#74 - OpenVLM leaderboard

Issue - State: open - Opened by oroojlooy 5 months ago - 3 comments

#73 - VILA Context-length

Issue - State: closed - Opened by oroojlooy 5 months ago - 2 comments

#72 - Why setting LLaMa3's padding direction to "right"?

Issue - State: open - Opened by ROIM1998 5 months ago - 2 comments

#71 - Bug in conversation.py

Issue - State: closed - Opened by zhang-jr 6 months ago - 2 comments

#70 - Finetuning

Issue - State: open - Opened by RohanR04 6 months ago - 6 comments

#69 - About VILADistributedSampler and gradient_accumulation_steps

Issue - State: closed - Opened by dreamerlin 6 months ago - 2 comments

#68 - Access to pretrained model weights

Issue - State: open - Opened by zzxslp 6 months ago - 4 comments

#67 - VILA-1.5 details

Issue - State: closed - Opened by Lopa07 6 months ago - 4 comments

#66 - How does the VILA preprocessed video?

Issue - State: closed - Opened by MonolithFoundation 6 months ago - 1 comment

#65 - Does S2 able to unfreeze vit to train?

Issue - State: closed - Opened by MonolithFoundation 6 months ago - 1 comment

#64 - Fix vision engine build

Pull Request - State: closed - Opened by meenchen 6 months ago

#63 - What is the LLM used for VILA 1.5 40B?

Issue - State: closed - Opened by javier-m 6 months ago - 1 comment

#62 - math dataset incomplete description

Issue - State: open - Opened by hubenjm 6 months ago - 2 comments

#61 - YouCook2 code to generate video clips from raw videos?

Issue - State: open - Opened by hubenjm 6 months ago - 4 comments

#60 - RuntimeError: GET was unable to find an engine to execute this computation

Issue - State: closed - Opened by pribadihcr 6 months ago - 2 comments

#59 - No module named 'llava.tf_utils'

Issue - State: closed - Opened by pribadihcr 6 months ago - 5 comments

#58 - Would you consider releasing code that supports lora training 40b model?

Issue - State: closed - Opened by Key-lei 6 months ago - 1 comment

#57 - When will new annotations files be available?

Issue - State: closed - Opened by hubenjm 6 months ago - 1 comment

#56 - "No module named llava"

Issue - State: closed - Opened by vedantroy 6 months ago - 2 comments

#55 - How's the DownSampleBlock performance compare with CAbstractor?

Issue - State: closed - Opened by lucasjinreal 6 months ago - 4 comments

#54 - Potential bug in mm_utils.py process_image function

Issue - State: open - Opened by hubenjm 6 months ago - 1 comment

#53 - working with VLLM

Issue - State: open - Opened by kousun12 6 months ago - 2 comments

#52 - How to evaluate 4shot?

Issue - State: open - Opened by leexinhao 6 months ago

#51 - Running the AWQ models

Issue - State: open - Opened by signine 6 months ago - 3 comments

#50 - Provide ShareGPT4V filtered annotations file

Issue - State: closed - Opened by hubenjm 6 months ago - 1 comment

#49 - About perception testset

Issue - State: open - Opened by mary-0830 6 months ago - 3 comments

#48 - Inference not working - Keyword tensor should have 2 or 3 dimensions, got 1

Issue - State: closed - Opened by signine 6 months ago - 5 comments

#47 - demo_trt_llm/convert_checkpoint.py - AttributeError: 'LlavaLlamaConfig' object has no attribute 'num_attention_heads'

Issue - State: closed - Opened by dimakan 6 months ago - 3 comments

#46 - Hi, Have you compare with s2 [384, 768] scales versus interpolate to 768x768?

Issue - State: open - Opened by OpenJarvisAI 6 months ago - 6 comments

#45 - Add support for GPUs with compute capability lower than 8.0 for awq/kernels installation

Issue - State: closed - Opened by rahulthakur319 6 months ago - 1 comment

#44 - Fix for backwards compatibility

Pull Request - State: closed - Opened by michael-heinrich 6 months ago

#43 - fix: PR #40 other bug.

Pull Request - State: closed - Opened by SeanCraven314 6 months ago - 4 comments

#42 - Request for middle checkpoint

Issue - State: closed - Opened by jihaonew 6 months ago - 3 comments

#41 - Easy backwards compatibility fix

Issue - State: open - Opened by michael-heinrich 6 months ago - 4 comments

#40 - fix: Fix tensor shape error, during llava inference.

Pull Request - State: closed - Opened by SeanCraven314 6 months ago - 1 comment

#39 - Llama-3-VILA1.5-8B Inference error

Issue - State: open - Opened by joebradly 6 months ago - 13 comments

#38 - Updated paper on the latest model (video understanding, etc.)

Issue - State: open - Opened by thecooltechguy 6 months ago - 4 comments

#37 - Chamfer distance's data source

Issue - State: closed - Opened by threegold116 6 months ago - 2 comments

#36 - Instruction for VILA 1.5 with tinychat (llm-awq) doesn't work well due to fixed torch version (==2.0.1)

Issue - State: open - Opened by gigony 6 months ago - 5 comments

#35 - Update readme of VILA1.5

Pull Request - State: closed - Opened by kentang-mit 6 months ago

#34 - vila1.5 release

Pull Request - State: closed - Opened by Efficient-Large-Language-Model 6 months ago

#33 - vila1.5 release

Pull Request - State: closed - Opened by Efficient-Large-Language-Model 6 months ago

#32 - video

Issue - State: closed - Opened by Efficient-Large-Language-Model 6 months ago

#31 - Possibility to support LLama-3?

Issue - State: closed - Opened by hzhang57 7 months ago - 1 comment

#30 - LLM version

Issue - State: closed - Opened by gordonhu608 7 months ago

#29 - Updated Mixtral for Long-context and Fake Gradient

Pull Request - State: closed - Opened by yukang2017 7 months ago

#28 - Model checkpoints before supervised fine-tuning

Issue - State: closed - Opened by CRazorback 7 months ago - 1 comment

#27 - Missing deepspeed config files in training scripts

Issue - State: closed - Opened by AoyuQC 7 months ago - 2 comments

#26 - Question on Multi-Image Input Processing During Training

Issue - State: open - Opened by gaozhihan 7 months ago

#25 - Cannot correctly recognize <im_patch>

Issue - State: closed - Opened by m2408gj 7 months ago - 2 comments

#24 - What're the modifications in `llava/train/transformers_replace`?

Issue - State: open - Opened by ys-zong 7 months ago - 6 comments

#23 - Intermediate stages checkpoints

Issue - State: closed - Opened by sarvghotra 7 months ago - 2 comments

#22 - More data leading to lower indicators?

Issue - State: closed - Opened by uniquehou 7 months ago - 3 comments

#21 - What's the purpose of func repack_multimodal_data?

Issue - State: closed - Opened by BlueBlueFF 8 months ago - 1 comment

#20 - Multi-image Input Inference Script

Issue - State: closed - Opened by gaozhihan 8 months ago - 2 comments

#19 - unexpected keyword argument 'seqlens_in_batch'

Issue - State: closed - Opened by katopz 8 months ago - 3 comments

#18 - Index error when conversations is short. (/aten/src/ATen/native/cuda/IndexKernel.cu:)

Issue - State: closed - Opened by hzhang57 8 months ago - 1 comment

#17 - Multi-image is worse than concat them as single image.

Issue - State: open - Opened by liuweijie19980216 8 months ago - 2 comments

#16 - AWQ Tinychat tensor mismatch RuntimeError

Issue - State: closed - Opened by leon-seidel 8 months ago - 1 comment

#15 - License

Issue - State: closed - Opened by fakerybakery 8 months ago - 2 comments

#14 - Base LLM for the VILA 7B Model

Issue - State: closed - Opened by shikhar-srivastava 8 months ago - 2 comments

#13 - FlashAttention Bug

Issue - State: closed - Opened by rzyfrank 8 months ago - 6 comments

#12 - Is stage2 necessary?

Issue - State: closed - Opened by peibinchen 8 months ago - 6 comments

#11 - KeyError: 'llava_llama'

Issue - State: closed - Opened by huzicong 8 months ago - 2 comments

#10 - Inference has error: TypeError: LlamaForCausalLM.forward() got an unexpected keyword argument 'seqlens_in_batch'

Issue - State: closed - Opened by hzhang57 8 months ago - 4 comments
