Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / deftruth/awesome-llm-inference issues and pull requests
#117 - 🔥🔥[Mooncake] Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving
Pull Request -
State: closed - Opened by DefTruth 18 days ago
#116 - 🔥🔥[DeServe] DESERVE: TOWARDS AFFORDABLE OFFLINE LLM INFERENCE VIA DECENTRALIZATION
Pull Request -
State: closed - Opened by DefTruth 18 days ago
#115 - 🔥🔥[KVDirect] KVDirect: Distributed Disaggregated LLM Inference
Pull Request -
State: closed - Opened by DefTruth 18 days ago
#114 - 🔥🔥[DistServe] DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving
Pull Request -
State: closed - Opened by DefTruth 18 days ago
#113 - [feat] add deepseek-r1
Pull Request -
State: closed - Opened by shaoyuyoung 26 days ago
#112 - add `MiniMax-01` in Trending LLM/VLM Topics and Long Context Attention
Pull Request -
State: closed - Opened by shaoyuyoung about 1 month ago
- 2 comments
#111 - 🔥🔥[FFPA] FFPA: Yet another Faster Flash Prefill Attention with O(1) SRAM complexity for headdim > 256, ~1.5x faster than SDPA EA(@DefTruth)
Pull Request -
State: closed - Opened by DefTruth about 1 month ago
#110 - 🔥🔥[SP: TokenRing] TokenRing: An Efficient Parallelism Framework for Infinite-Context LLMs via Bidirectional Communication
Pull Request -
State: closed - Opened by DefTruth about 1 month ago
#109 - 🔥🔥🔥[DeepSeek-V3] DeepSeek-V3 Technical Report
Pull Request -
State: closed - Opened by DefTruth about 2 months ago
#108 - 🔥🔥[HADACORE] HADACORE: TENSOR CORE ACCELERATED HADAMARD TRANSFORM KERNEL
Pull Request -
State: closed - Opened by DefTruth about 2 months ago
#107 - 🔥[DynamicKV] DynamicKV: Task-Aware Adaptive KV Cache Compression for Long Context LLMs
Pull Request -
State: closed - Opened by DefTruth about 2 months ago
#106 - 🔥🔥[NITRO] NITRO: LLM INFERENCE ON INTEL® LAPTOP NPUS
Pull Request -
State: closed - Opened by DefTruth about 2 months ago
#105 - 🔥🔥[TurboAttention] TURBOATTENTION: EFFICIENT ATTENTION APPROXIMATION FOR HIGH THROUGHPUTS LLMS
Pull Request -
State: closed - Opened by DefTruth about 2 months ago
#104 - 🔥[BatchLLM] BatchLLM: Optimizing Large Batched LLM Inference with Global Prefix Sharing and Throughput-oriented Token Batching
Pull Request -
State: closed - Opened by DefTruth 2 months ago
#103 - 🔥[ClusterKV] ClusterKV: Manipulating LLM KV Cache in Semantic Space for Recallable Compression
Pull Request -
State: closed - Opened by DefTruth 2 months ago
#102 - 🔥[KV Cache Recomputation] Efficient LLM Inference with I/O-Aware Partial KV Cache Recomputation
Pull Request -
State: closed - Opened by DefTruth 3 months ago
#101 - 🔥[Star-Attention: 11x~ speedup] Star Attention: Efficient LLM Inference over Long Sequences
Pull Request -
State: closed - Opened by DefTruth 3 months ago
#100 - 🔥[SparseInfer] SparseInfer: Training-free Prediction of Activation Sparsity for Fast LLM Inference
Pull Request -
State: closed - Opened by DefTruth 3 months ago
#99 - 🔥[Squeezed Attention] SQUEEZED ATTENTION: Accelerating Long Context Length LLM Inference(@UC Berkeley)
Pull Request -
State: closed - Opened by DefTruth 3 months ago
#98 - 🔥[SageAttention-2] SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration(@thu-ml)
Pull Request -
State: closed - Opened by DefTruth 3 months ago
#97 - 🔥[SageAttention] SAGEATTENTION: ACCURATE 8-BIT ATTENTION FOR PLUG-AND-PLAY INFERENCE ACCELERATION(@thu-ml)
Pull Request -
State: closed - Opened by DefTruth 3 months ago
#96 - add vAttention code link
Pull Request -
State: closed - Opened by KevinZeng08 3 months ago
#95 - Add code link to BPT
Pull Request -
State: closed - Opened by DefTruth 3 months ago
#94 - 🔥🔥[TP: Comm Compression] Communication Compression for Tensor Parallel LLM Inference
Pull Request -
State: closed - Opened by DefTruth 3 months ago
#93 - 🔥🔥[SP: BPT] Blockwise Parallel Transformer for Large Context Models
Pull Request -
State: closed - Opened by DefTruth 3 months ago
#92 - Add DP/TP/SP/CP papers with codes
Pull Request -
State: closed - Opened by DefTruth 3 months ago
#91 - 🔥[BitNet] BitNet a4.8: 4-bit Activations for 1-bit LLMs
Pull Request -
State: closed - Opened by DefTruth 3 months ago
#90 - 🔥[Tensor Product] Acceleration of Tensor-Product Operations with Tensor Cores
Pull Request -
State: closed - Opened by DefTruth 4 months ago
#89 - 🔥[Fast Best-of-N] Fast Best-of-N Decoding via Speculative Rejection
Pull Request -
State: closed - Opened by DefTruth 4 months ago
#88 - 🔥[FastAttention] FastAttention: Extend FlashAttention2 to NPUs and Low-resource GPUs for Efficient Inference
Pull Request -
State: closed - Opened by DefTruth 4 months ago
#87 - Efficient Hybrid Inference for LLMs: Reward-Based Token Modelling with Selective Cloud Assistance
Pull Request -
State: closed - Opened by aharshms 4 months ago
#86 - Add paper AdaKV
Pull Request -
State: closed - Opened by FFY0 4 months ago
- 1 comment
#85 - early exit of LLM inference
Pull Request -
State: closed - Opened by boyi-liu 4 months ago
- 1 comment
#84 - 🔥[PARALLELSPEC] PARALLELSPEC: PARALLEL DRAFTER FOR EFFICIENT SPECULATIVE DECODING
Pull Request -
State: closed - Opened by DefTruth 4 months ago
#83 - [LLM Inference] LARGE LANGUAGE MODEL INFERENCE ACCELERATION: A COMPREHENSIVE HARDWARE PERSPECTIVE
Pull Request -
State: closed - Opened by DefTruth 4 months ago
#82 - Large Language Model Performance Benchmarking on Mobile Platforms: A Thorough Evaluation
Pull Request -
State: closed - Opened by DefTruth 4 months ago
#81 - 🔥[LORC] Low-Rank Compression for LLMs KV Cache with a Progressive Compression Strategy
Pull Request -
State: closed - Opened by DefTruth 4 months ago
#80 - [From Author] Link CacheGen and CacheBlend to LMCache
Pull Request -
State: closed - Opened by KuntaiDu 5 months ago
#79 - Bump up to v2.6
Pull Request -
State: closed - Opened by DefTruth 5 months ago
#78 - 🔥[LayerKV] Optimizing Large Language Model Serving with Layer-wise KV Cache Management
Pull Request -
State: closed - Opened by DefTruth 5 months ago
#77 - 🔥[KV-COMPRESS] PAGED KV-CACHE COMPRESSION WITH VARIABLE COMPRESSION RATES PER ATTENTION HEAD
Pull Request -
State: closed - Opened by DefTruth 5 months ago
#76 - 🔥🔥[Tensor Cores] Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores
Pull Request -
State: closed - Opened by DefTruth 5 months ago
#75 - 🔥[AlignedKV] AlignedKV: Reducing Memory Access of KV-Cache with Precision-Aligned Quantization
Pull Request -
State: closed - Opened by DefTruth 5 months ago
#74 - 🔥🔥[HiFloat8] Ascend HiFloat8 Format for Deep Learning
Pull Request -
State: closed - Opened by DefTruth 5 months ago
#73 - [Low-bit] A Survey of Low-bit Large Language Models: Basics, Systems, and Algorithms
Pull Request -
State: closed - Opened by DefTruth 5 months ago
#72 - 🔥🔥[INT-FLASHATTENTION] INT-FLASHATTENTION: ENABLING FLASH ATTENTION FOR INT8 QUANTIZATION
Pull Request -
State: closed - Opened by DefTruth 5 months ago
#71 - fix typo
Pull Request -
State: closed - Opened by DefTruth 5 months ago
#70 - 🔥[VPTQ] VPTQ: EXTREME LOW-BIT VECTOR POST-TRAINING QUANTIZATION FOR LARGE LANGUAGE MODELS
Pull Request -
State: closed - Opened by DefTruth 5 months ago
#69 - Bump up to v2.5
Pull Request -
State: closed - Opened by DefTruth 5 months ago
#68 - 🔥🔥[CRITIPREFILL] CRITIPREFILL: A SEGMENT-WISE CRITICALITYBASED APPROACH FOR PREFILLING ACCELERATION IN LLMS
Pull Request -
State: closed - Opened by DefTruth 5 months ago
#67 - move RetrievalAttention -> long context
Pull Request -
State: closed - Opened by DefTruth 5 months ago
#66 - Update codebase of paper "parallel speculative decoding with adaptive draft length"
Pull Request -
State: closed - Opened by smart-lty 5 months ago
- 1 comment
#65 - 🔥[InstInfer] InstInfer: In-Storage Attention Offloading for Cost-Effective Long-Context LLM Inference
Pull Request -
State: closed - Opened by DefTruth 5 months ago
#64 - Bump up to v2.4
Pull Request -
State: closed - Opened by DefTruth 5 months ago
#63 - 🔥[Inf-MLLM] Inf-MLLM: Efficient Streaming Inference of Multimodal Large Language Models on a Single GPU
Pull Request -
State: closed - Opened by DefTruth 5 months ago
#62 - 🔥[RetrievalAttention] Accelerating Long-Context LLM Inference via Vector Retrieval
Pull Request -
State: closed - Opened by DefTruth 5 months ago
#61 - Bump up to v2.3
Pull Request -
State: closed - Opened by DefTruth 5 months ago
#60 - 🔥[SpMM] High Performance Unstructured SpMM Computation Using Tensor Cores
Pull Request -
State: closed - Opened by DefTruth 6 months ago
#59 - 🔥[CHESS] CHESS: Optimizing LLM Inference via Channel-Wise Thresholding and Selective Sparsification
Pull Request -
State: closed - Opened by DefTruth 6 months ago
#58 - Bump up to v2.2
Pull Request -
State: closed - Opened by DefTruth 6 months ago
#57 - 🔥🔥[Context Distillation] Efficient LLM Context Distillation
Pull Request -
State: closed - Opened by DefTruth 6 months ago
#56 - 🔥🔥[Prompt Compression] Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference
Pull Request -
State: closed - Opened by DefTruth 6 months ago
#55 - 🔥[Speculative Decoding] Boosting Lossless Speculative Decoding via Feature Sampling and Partial Alignment Distillation
Pull Request -
State: closed - Opened by DefTruth 6 months ago
#54 - 🔥[SJF Scheduling] Efficient LLM Scheduling by Learning to Rank
Pull Request -
State: closed - Opened by DefTruth 6 months ago
#53 - 🔥[Decentralized LLM] Decentralized LLM Inference over Edge Networks with Energy Harvesting
Pull Request -
State: closed - Opened by DefTruth 6 months ago
#52 - 🔥[ACTIVATION SPARSITY] TRAINING-FREE ACTIVATION SPARSITY IN LARGE LANGUAGE MODELS
Pull Request -
State: closed - Opened by DefTruth 6 months ago
#51 - Add NanoFlow code link
Pull Request -
State: closed - Opened by DefTruth 6 months ago
#50 - Bump up to v2.1
Pull Request -
State: closed - Opened by DefTruth 6 months ago
#49 - 🔥🔥[FLA] FLA: A Triton-Based Library for Hardware-Efficient Implementa…
Pull Request -
State: closed - Opened by DefTruth 6 months ago
#48 - 🔥[1-bit LLMs] Matmul or No Matmal in the Era of 1-bit LLMs
Pull Request -
State: closed - Opened by DefTruth 6 months ago
#47 - 🔥🔥[MARLIN] MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models
Pull Request -
State: closed - Opened by DefTruth 6 months ago
#46 - Add ABQ-LLM code link
Pull Request -
State: closed - Opened by DefTruth 6 months ago
#45 - add code link【ABQ-LLM】
Issue -
State: closed - Opened by lswzjuer 6 months ago
- 2 comments
#44 - 🔥[MagicDec] MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding
Pull Request -
State: closed - Opened by DefTruth 6 months ago
#43 - 🔥[NanoFlow] NanoFlow: Towards Optimal Large Language Model Serving Throughput
Pull Request -
State: closed - Opened by DefTruth 6 months ago
#42 - 🔥[FocusLLM] FocusLLM: Scaling LLM’s Context by Parallel Decoding
Pull Request -
State: closed - Opened by DefTruth 6 months ago