google/jetstream-pytorch issues and pull requests

#187 - Add model warmup and jax compilation cache flags

Pull Request - State: open - Opened by vivianrwu about 2 months ago

#186 - Fix too many positional arguments lint error

Pull Request - State: closed - Opened by FanhaiLu1 about 2 months ago

#185 - [Feature Request] Per request sampling params

Issue - State: open - Opened by qihqi about 2 months ago - 1 comment

#184 - Switch to NP from Jax to improve attention manager performance

Pull Request - State: closed - Opened by FanhaiLu1 about 2 months ago - 1 comment

#183 - Make sure the server does not crash if the input is too long

Issue - State: open - Opened by qihqi 2 months ago

#182 - [RFC] Formalizing commandline arguments.

Issue - State: open - Opened by qihqi 2 months ago

#181 - Add offline perf ci

Pull Request - State: closed - Opened by qihqi 2 months ago - 6 comments

#180 - Support End To End PagedAttention in JetStream

Pull Request - State: closed - Opened by FanhaiLu1 2 months ago

#179 - Pa decode checkin 1

Pull Request - State: closed - Opened by FanhaiLu1 2 months ago

#178 - Update README for new CLI

Pull Request - State: closed - Opened by qihqi 3 months ago

#177 - Update Jetstream, add optional sampler args.

Pull Request - State: closed - Opened by qihqi 3 months ago

#176 - Add gemma support in better cli

Pull Request - State: closed - Opened by qihqi 3 months ago

#175 - Use kwargs to simplify the call sites a bit

Pull Request - State: closed - Opened by yixinshi 3 months ago

#174 - Add mixtral support to new CLI

Pull Request - State: closed - Opened by qihqi 3 months ago

#173 - Issues with prefill & generate

Issue - State: open - Opened by qihqi 3 months ago

#172 - Fix the performance regression with ragged attention on for llama2 7b.

Pull Request - State: closed - Opened by wang2yn84 3 months ago - 2 comments

#171 - Replace repeat kv with proper GQA handling.

Pull Request - State: closed - Opened by wang2yn84 3 months ago - 3 comments

#170 - fix ray engine crashes on multihost

Pull Request - State: closed - Opened by sixiang-google 3 months ago

#169 - Error Running `run_ray_serve_interleave` with Llama3 8B

Issue - State: open - Opened by ryanaoleary 3 months ago

#168 - Add a script to measure speed of basic ops

Pull Request - State: closed - Opened by qihqi 3 months ago

#167 - Add page attention manager and kvcache manager

Pull Request - State: closed - Opened by FanhaiLu1 3 months ago

#166 - Add page attention manager and kvcache manager

Pull Request - State: closed - Opened by FanhaiLu1 3 months ago

#165 - Fix TPU head resource name for v4 and v5e

Pull Request - State: closed - Opened by richardsliu 4 months ago

#164 - Fix Ray engine crash on multihost

Pull Request - State: closed - Opened by richardsliu 4 months ago

#163 - Fixed exhausted bug between head and workers

Pull Request - State: closed - Opened by FanhaiLu1 4 months ago

#162 - Handle v5e-8 in run_ray_serve_interleave

Pull Request - State: closed - Opened by richardsliu 4 months ago

#161 - Update Ray version in Dockerfile and add v5 configs

Pull Request - State: closed - Opened by richardsliu 4 months ago

#160 - Add newest llama-3 benchmarks

Pull Request - State: closed - Opened by qihqi 4 months ago

#159 - V5e8 ray

Pull Request - State: closed - Opened by FanhaiLu1 4 months ago

#158 - Return np instead of jax array for prefill result tokens

Pull Request - State: closed - Opened by FanhaiLu1 4 months ago

#157 - Correct typo enbedding -> embedding

Pull Request - State: closed - Opened by tengomucho 4 months ago - 1 comment

#156 - commit act quant for conditional ffn

Pull Request - State: open - Opened by qihqi 4 months ago

#155 - Stacked cache mixtral.

Pull Request - State: closed - Opened by wang2yn84 4 months ago

#154 - Stacked cache for MLPerf

Pull Request - State: closed - Opened by wang2yn84 4 months ago

#153 - Add mlperf benchmark for offline for mixtral

Pull Request - State: closed - Opened by qihqi 4 months ago - 2 comments

#152 - Set accumulate type to bf16 in activation quant

Pull Request - State: closed - Opened by lsy323 4 months ago - 1 comment

#151 - Optimize cache update.

Pull Request - State: closed - Opened by wang2yn84 4 months ago - 7 comments

#150 - Ray engine crashes on multihost when fetching Jax.array from prefill_ray

Issue - State: closed - Opened by richardsliu 4 months ago - 1 comment

#149 - Fix blockwise sharding

Pull Request - State: open - Opened by lsy323 4 months ago

#148 - Add mlperf benchmark scripts in-tree.

Pull Request - State: closed - Opened by qihqi 4 months ago

#147 - Make Ray engine and worker process prefill returning first token

Pull Request - State: closed - Opened by richardsliu 4 months ago

#146 - Jetstream + RayServe deployment for interleave mode

Pull Request - State: closed - Opened by richardsliu 4 months ago

#145 - Set JAX_PLATFORMS to "tpu, cpu" for ray worker

Pull Request - State: closed - Opened by richardsliu 4 months ago

#144 - Fix exception in ray_worker

Pull Request - State: closed - Opened by richardsliu 4 months ago

#143 - Make prefilling return first token for loadgen integration

Pull Request - State: closed - Opened by sixiang-google 4 months ago - 1 comment

#142 - Add server tests

Pull Request - State: closed - Opened by bvrockwell 4 months ago - 1 comment

#141 - Update benchmark command in README.md

Pull Request - State: closed - Opened by bhavya01 5 months ago

#140 - add enable jax profiler to run_server

Pull Request - State: closed - Opened by bvrockwell 5 months ago

#139 - Update README.md to state the limitation of accessing GCS when conver…

Pull Request - State: closed - Opened by wang2yn84 5 months ago

#138 - Minor fixes to README

Pull Request - State: closed - Opened by wang2yn84 5 months ago

#137 - Empty response returned for prompt responses when using run_server_with_ray.py and batch_size > 1

Issue - State: open - Opened by richardsliu 5 months ago - 2 comments

#136 - Add layer id in scope for each TransformerBlock layer

Pull Request - State: closed - Opened by FanhaiLu1 5 months ago

#135 - Checkpoint conversion script breaks for meta-llama/llama-2-7b on HF

Issue - State: open - Opened by vivianrwu 5 months ago

#134 - prototyping better UX

Pull Request - State: closed - Opened by qihqi 5 months ago - 2 comments

#133 - Add left aligned cache support.

Pull Request - State: closed - Opened by wang2yn84 5 months ago

#132 - fix mixtral quantization scaler axis when dimension > 2

Pull Request - State: closed - Opened by sixiang-google 5 months ago

#131 - Add test for Mixtral model.

Pull Request - State: closed - Opened by wang2yn84 5 months ago

#130 - make sure GPU works

Pull Request - State: closed - Opened by qihqi 5 months ago

#129 - Update README.md

Pull Request - State: closed - Opened by bhavya01 5 months ago

#128 - Update README.md

Pull Request - State: closed - Opened by qihqi 5 months ago

#127 - Update submodules, prepare for leasing v0.2.4

Pull Request - State: closed - Opened by qihqi 5 months ago - 1 comment

#126 - Add lock in prefill and generate to prevent starvation

Pull Request - State: closed - Opened by FanhaiLu1 5 months ago - 1 comment

#125 - Update summary.md

Pull Request - State: closed - Opened by qihqi 5 months ago - 1 comment

#124 - Remove JSON config mangling for Gemma ckpt

Pull Request - State: closed - Opened by lsy323 5 months ago - 1 comment

#123 - Add different token sampling algorithms to decoder.

Pull Request - State: closed - Opened by bvrockwell 5 months ago - 1 comment

#122 - add script to install for GPU

Pull Request - State: closed - Opened by qihqi 5 months ago - 2 comments

#121 - Fix convert_checkpoint.py for hf and gemma

Pull Request - State: closed - Opened by qihqi 5 months ago

#120 - Mixtral enablement.

Pull Request - State: closed - Opened by wang2yn84 5 months ago - 1 comment

#119 - Add guide on adding HF ckpt conversion support

Pull Request - State: closed - Opened by lsy323 5 months ago

#118 - Support HF LLaMA ckpt conversion

Pull Request - State: closed - Opened by lsy323 5 months ago

#117 - Integrate disaggregated serving with JetStream

Pull Request - State: closed - Opened by FanhaiLu1 5 months ago

#116 - Fix conversion bug

Pull Request - State: closed - Opened by yeandy 5 months ago

#115 - Bug in model conversion script

Issue - State: closed - Opened by yeandy 5 months ago - 2 comments

#114 - Add for readme interleave multiple host with ray

Pull Request - State: closed - Opened by FanhaiLu1 5 months ago - 1 comment

#113 - Metrics bug: server_lib should be config_lib

Pull Request - State: closed - Opened by Bslabe123 6 months ago

#112 - Enable jax profiler server in run with ray

Pull Request - State: closed - Opened by FanhaiLu1 6 months ago

#111 - Jetstream: 8128c8a -> v0.2.2

Pull Request - State: closed - Opened by Bslabe123 6 months ago

#110 - Release JetStream v0.2.2

Pull Request - State: closed - Opened by JoeZijunZhou 6 months ago

#109 - Add run_server with ray for interleave serving

Pull Request - State: closed - Opened by FanhaiLu1 6 months ago

#108 - Update Jetstream commit id

Pull Request - State: closed - Opened by FanhaiLu1 6 months ago

#107 - Return Tuple(interleaveEngList, prefillEngineList, decodeEngineList) in create ray engine

Issue - State: open - Opened by FanhaiLu1 6 months ago

#106 - Ray Disaggregated Serving MVP

Pull Request - State: closed - Opened by FanhaiLu1 6 months ago - 2 comments

#105 - Add activation quantization support to per-channel quantized linear layers

Pull Request - State: closed - Opened by lsy323 6 months ago

#104 - Fix convert script cannot generate bf16 weights

Pull Request - State: closed - Opened by lsy323 6 months ago

#103 - Update run_interactive.py with finer control of profiler.

Pull Request - State: closed - Opened by wang2yn84 6 months ago

#102 - Update run_server.py. metrics_server_config is not supported in JetStream[8128c8a] yet

Pull Request - State: closed - Opened by wang2yn84 6 months ago - 2 comments

#101 - Add support for Llama3-70b

Pull Request - State: closed - Opened by bhavya01 6 months ago - 3 comments

#100 - Fix ray conflict changes

Pull Request - State: closed - Opened by FanhaiLu1 6 months ago - 2 comments

#99 - Pass metrics client config through to Jetstream

Pull Request - State: closed - Opened by Bslabe123 6 months ago - 1 comment

#98 - Fix gemma model, enable_weight_quantization is available through quant_config.

Pull Request - State: closed - Opened by wang2yn84 6 months ago - 1 comment

#97 - Update README.md, the quantize flag is no longer available, quantize_type assumes the role of the original flag.

Pull Request - State: closed - Opened by wang2yn84 6 months ago - 1 comment

#96 - Fix flax and ray dependencies

Pull Request - State: closed - Opened by FanhaiLu1 6 months ago

#95 - Fixes tests. Can now run on CPU by default.

Pull Request - State: closed - Opened by wang2yn84 6 months ago - 4 comments

#94 - Add regression test to detect service broken and performance degradation

Issue - State: open - Opened by FanhaiLu1 6 months ago - 2 comments

#93 - Integrates ragged attention to JetStream Pytorch

Pull Request - State: closed - Opened by wang2yn84 6 months ago

#92 - Move flags in scripts to a common function

Pull Request - State: closed - Opened by lsy323 6 months ago

#91 - Update README.md

Pull Request - State: closed - Opened by qihqi 6 months ago

#90 - Leverage tokens_utils to process result tokens

Pull Request - State: closed - Opened by FanhaiLu1 6 months ago

#89 - Move deps to git submodule

Pull Request - State: closed - Opened by qihqi 6 months ago

#88 - Update version of jetstream; misc fixes

Pull Request - State: closed - Opened by qihqi 6 months ago
