casper-hansen/AutoAWQ issues and pull requests

#29 - batching sample; no disable_fused_layers for FP16 model

Pull Request - State: closed - Opened by wanzhenchn over 1 year ago - 2 comments

#28 - [BUG] Fix illegal memory access + Quantized Multi-GPU support

Pull Request - State: closed - Opened by casper-hansen over 1 year ago - 1 comment

#27 - Allow user to use custom calibration data for quantization

Pull Request - State: closed - Opened by boehm-e over 1 year ago - 2 comments

#26 - Implement batch size for speed test

Pull Request - State: closed - Opened by casper-hansen over 1 year ago

#25 - support speedtest to benchmark FP16 model

Pull Request - State: closed - Opened by wanzhenchn over 1 year ago - 3 comments

#24 - remove fixed compute capabilities list

Pull Request - State: closed - Opened by wanzhenchn over 1 year ago - 3 comments

#23 - YaRN support for LLaMa models

Pull Request - State: closed - Opened by casper-hansen over 1 year ago

#22 - add llava model support

Issue - State: closed - Opened by qZhang88 over 1 year ago - 3 comments
Labels: enhancement, good first issue

#21 - fuse_layers bug fix

Pull Request - State: closed - Opened by qwopqwop200 over 1 year ago - 2 comments

#20 - Bug hunt: illegal memory access

Issue - State: closed - Opened by casper-hansen over 1 year ago - 10 comments
Labels: bug, help wanted

#19 - Implement xformers layernorm (2x faster than nn.LayerNorm)

Pull Request - State: closed - Opened by casper-hansen over 1 year ago

#18 - Refactor fused modules

Pull Request - State: closed - Opened by casper-hansen over 1 year ago

#17 - Add multi-gpu support to fused layers

Issue - State: closed - Opened by casper-hansen over 1 year ago - 1 comment
Labels: help wanted

#16 - windows support

Pull Request - State: closed - Opened by qwopqwop200 over 1 year ago - 1 comment

#15 - Support batch input for performance test

Issue - State: closed - Opened by wanzhenchn over 1 year ago - 2 comments

#14 - Windows build support

Issue - State: closed - Opened by casper-hansen over 1 year ago
Labels: help wanted

#13 - Cuda issue when trying to install

Issue - State: closed - Opened by mhenrichsen over 1 year ago - 5 comments

#12 - Recursion error when creating AutoTokenizer for llama-13b-hf

Issue - State: closed - Opened by wanzhenchn over 1 year ago - 4 comments

#11 - Compatibility in Python 3.8 when running entry.py

Issue - State: closed - Opened by wanzhenchn over 1 year ago - 2 comments

#10 - Quantize models with custom datasets

Issue - State: closed - Opened by casper-hansen over 1 year ago
Labels: enhancement

#9 - Release PyPi package + Create GitHub workflow

Pull Request - State: closed - Opened by casper-hansen over 1 year ago - 3 comments

#8 - Create class QuantConfig

Issue - State: closed - Opened by casper-hansen over 1 year ago - 8 comments
Labels: good first issue

#7 - Clean up fused modules

Issue - State: closed - Opened by casper-hansen over 1 year ago - 1 comment
Labels: good first issue

#6 - Interested in Hugging Face transformers integration?

Issue - State: closed - Opened by younesbelkada over 1 year ago - 2 comments

#5 - Implement BigCode models (StarCoder etc.)

Issue - State: closed - Opened by casper-hansen over 1 year ago - 7 comments

#4 - Experiment with implementing AWQ for BERT models

Issue - State: open - Opened by casper-hansen over 1 year ago - 3 comments
Labels: help wanted

#3 - Implement exllama q4_matmul kernel as alternative

Issue - State: open - Opened by casper-hansen over 1 year ago - 5 comments
Labels: enhancement, help wanted

#2 - Implement faster LayerNorm than nn.LayerNorm

Issue - State: closed - Opened by casper-hansen over 1 year ago - 1 comment
Labels: enhancement, help wanted

#1 - Add GPTJ Support

Pull Request - State: closed - Opened by jamesdborin over 1 year ago - 3 comments

Ecosyste.ms: Issues
