deepjavalibrary/djl-serving issues and pull requests

#2403 - [fix][lmi][specdec] fix issue with json output formatter not returnin…

Pull Request - State: closed - Opened by siddvenk 6 days ago

#2402 - Upgrade to DJL 0.30.0

Pull Request - State: closed - Opened by xyang16 9 days ago

#2401 - [docker] Update vllm to 0.6.1.post2

Pull Request - State: closed - Opened by xyang16 9 days ago

#2400 - [fix] prevent requests being sent to python model until model is full…

Pull Request - State: closed - Opened by siddvenk 9 days ago

#2399 - [python] Update lmi-list generation params after upgrading to vllm 0.5.5

Pull Request - State: closed - Opened by xyang16 11 days ago

#2398 - [cherry-pick] [neo] Fix calib_size dtype bug (#2397)

Pull Request - State: closed - Opened by a-ys 11 days ago

#2397 - [neo] Fix calib_size dtype bug

Pull Request - State: closed - Opened by a-ys 11 days ago

#2396 - [fix][lmi] validate token exists in streaming output formatters to ha…

Pull Request - State: closed - Opened by siddvenk 11 days ago

#2395 - [python] Fix null pointer for empty output tokens

Pull Request - State: closed - Opened by xyang16 11 days ago

#2394 - bump up transformers version for llama 3.1 support

Pull Request - State: closed - Opened by ydm-amazon 11 days ago

#2393 - [fix][lmi] validate token exists in streaming output formatters to ha…

Pull Request - State: closed - Opened by siddvenk 11 days ago

#2392 - [fix][lmi][tests] specify jsonquery for awscurl when using chat compl…

Pull Request - State: closed - Opened by siddvenk 11 days ago

#2391 - [docker] Update vllm to 0.5.5

Pull Request - State: closed - Opened by xyang16 13 days ago

#2390 - [fix][lmi][tests] specify jsonquery for awscurl when using chat compl…

Pull Request - State: closed - Opened by siddvenk 13 days ago

#2389 - Upgrade to support latest vLLM version (max_lora_rank)

Issue - State: open - Opened by dreamiter 14 days ago - 8 comments
Labels: enhancement

#2388 - Adds grpc plugin

Pull Request - State: closed - Opened by frankfliu 14 days ago - 1 comment

#2387 - Support for newer Vision LMs through vllm 0.6.1

Issue - State: open - Opened by rdzotz 15 days ago
Labels: enhancement

#2386 - [ci] Updates gradle to 8.10.1

Pull Request - State: closed - Opened by frankfliu 16 days ago

#2385 - docker 0.29.0-pytorch-inf2 with meta-llama/Meta-Llama-3.1-8B-Instruct fails

Issue - State: open - Opened by yaronr 16 days ago
Labels: bug

#2384 - [cherry-pick] [Neo][vLLM] Accept quant options for awq, fp8 (#2382)

Pull Request - State: closed - Opened by a-ys 17 days ago

#2383 - [do not merge] [cherry-pick] [Neo][vLLM] Accept quant options for awq, fp8

Pull Request - State: closed - Opened by a-ys 17 days ago

#2382 - [Neo][vLLM] Accept quant options for awq, fp8

Pull Request - State: closed - Opened by a-ys 17 days ago - 4 comments

#2381 - [python] check whether last token is generated for json_output_format…

Pull Request - State: closed - Opened by siddvenk 17 days ago

#2380 - 0.29.0 dlc

Pull Request - State: closed - Opened by siddvenk 17 days ago

#2379 - [docker][neuron] Version bumps for vllm 0.6.0

Pull Request - State: closed - Opened by tosterberg 18 days ago

#2378 - NeuronX compiler: specify data type

Issue - State: open - Opened by CoolFish88 18 days ago - 1 comment
Labels: enhancement

#2377 - Transformers NeuronX continuous batching support for Mistral 7b Instruct V3

Issue - State: open - Opened by CoolFish88 18 days ago
Labels: enhancement

#2376 - [python] Update vllm rolling batcher sampling params for 0.6.0 support

Pull Request - State: closed - Opened by tosterberg 19 days ago

#2375 - [python] check whether last token is generated for json_output_formatter

Pull Request - State: closed - Opened by sindhuvahinis 20 days ago

#2374 - [unittest] Remove assert and add self.assertEqual

Pull Request - State: closed - Opened by sindhuvahinis 20 days ago

#2373 - [unittest] add spec decoding multiple tokens generation unit tests

Pull Request - State: closed - Opened by sindhuvahinis 20 days ago

#2372 - [fix][lmi] only use sequence iterators for generating outputs in stre…

Pull Request - State: closed - Opened by siddvenk 20 days ago

#2371 - [awscurl] Prints inter token latency

Pull Request - State: closed - Opened by frankfliu 22 days ago

#2370 - [ci] llama-2-13b on inf2 requires additional config removing from LCNC

Pull Request - State: closed - Opened by tosterberg 23 days ago

#2369 - [fix] Format input text to avoid error

Pull Request - State: closed - Opened by xyang16 23 days ago

#2368 - [ci][fix] LCNC model tagging and accelerator count

Pull Request - State: closed - Opened by tosterberg 23 days ago

#2367 - [ci] Neuron LCNC tests small models

Pull Request - State: closed - Opened by tosterberg 24 days ago

#2366 - [Neo][vLLM] Fix quantization failure caused by improperly loaded mode…

Pull Request - State: closed - Opened by tosterberg 24 days ago

#2365 - Model conversion process failed. Unable to find bin files

Issue - State: open - Opened by joshight 24 days ago
Labels: bug

#2364 - [CI] Add llama-3.1 lmi-dist test, with secure mode enabled

Pull Request - State: closed - Opened by ethnzhng 24 days ago

#2363 - [serving] add request id logging on invocations/predictions path

Pull Request - State: closed - Opened by siddvenk 24 days ago

#2362 - Mistral7b custom inference with LMI not working: java.lang.IllegalStateException: Read chunk timeout.

Issue - State: open - Opened by jeremite 25 days ago
Labels: bug

#2361 - [serving] Updates dependencies version to latest

Pull Request - State: closed - Opened by frankfliu 25 days ago

#2360 - [Neo][vLLM] Fix quantization failure caused by improperly loaded model.

Pull Request - State: closed - Opened by a-ys 25 days ago

#2359 - [ci] Reformat shell script with shfmt

Pull Request - State: closed - Opened by frankfliu 25 days ago

#2358 - [ci] minor fixes in multi-node integration test

Pull Request - State: closed - Opened by sindhuvahinis 25 days ago

#2357 - [serving] Print PIPELINE_PARALLEL_DEGREE env var

Pull Request - State: closed - Opened by xyang16 26 days ago

#2356 - [fix][sf] fix bug with PyPredictor to remove worker, add specific fla…

Pull Request - State: closed - Opened by siddvenk 26 days ago

#2355 - Token metrics no longer computed when specifying a json query

Issue - State: closed - Opened by CoolFish88 26 days ago - 2 comments
Labels: bug

#2354 - Strange generation with Llama-3.1-70B on ml.inf2.48xlarge

Issue - State: open - Opened by juliensimon 27 days ago - 4 comments
Labels: bug

#2353 - [serving] Updates onnxruntime to 1.19.0

Pull Request - State: closed - Opened by frankfliu about 1 month ago

#2352 - [fix] Partition tests use python handler and avoid java only defaults

Pull Request - State: closed - Opened by tosterberg about 1 month ago

#2351 - [serving] Minor code improvement

Pull Request - State: closed - Opened by frankfliu about 1 month ago

#2350 - [test][neuron] Add gpt2 test case and infinite loop guard

Pull Request - State: closed - Opened by tosterberg about 1 month ago

#2349 - [serving] Removes form data size limit

Pull Request - State: closed - Opened by frankfliu about 1 month ago

#2348 - [docker][lmi] fix torch and flashattention dependency versions

Pull Request - State: closed - Opened by siddvenk about 1 month ago

#2347 - [ci] remove precompiled trt tests because of switch from g5 to g6

Pull Request - State: closed - Opened by siddvenk about 1 month ago

#2346 - [ci] use g6 for llm integration due to capacity issues with g5

Pull Request - State: closed - Opened by siddvenk about 1 month ago

#2345 - [lmi][neuron] Add smart defaults to LMI Neuron

Pull Request - State: closed - Opened by tosterberg about 1 month ago

#2344 - [fix] prevent requests being sent to python model until model is full…

Pull Request - State: closed - Opened by siddvenk about 1 month ago

#2343 - [docker] update vllm wheel for version required by lmi-dist

Pull Request - State: closed - Opened by siddvenk about 1 month ago

#2342 - [fix] prevent requests being sent to python model until model is full…

Pull Request - State: closed - Opened by siddvenk about 1 month ago

#2341 - [awscurl] Allows set max length by env var

Pull Request - State: closed - Opened by frankfliu about 1 month ago - 2 comments

#2340 - awscurl: Missing token metrics when -t option specified

Issue - State: open - Opened by CoolFish88 about 1 month ago - 7 comments
Labels: bug

#2339 - awscurl: WARN maxLength is not explicitly specified, use modelMaxLength: 512

Issue - State: open - Opened by CoolFish88 about 1 month ago - 2 comments
Labels: bug

#2338 - [feat] add disable_sliding_window parameter to vllm/lmi-dist engine args

Pull Request - State: closed - Opened by hommayushi3 about 1 month ago

#2337 - Add "disable-sliding-window" VLLM/LMI-dist engine argument to enable running Phi-3-Vision with Flash Attn

Issue - State: closed - Opened by hommayushi3 about 1 month ago - 1 comment
Labels: enhancement

#2336 - Add simulated multi-node test

Pull Request - State: closed - Opened by nikhil-sk about 1 month ago

#2335 - [Draft] Add simulated multi-node test

Pull Request - State: closed - Opened by nikhil-sk about 1 month ago

#2334 - [Draft] Add EKS+LWS simulated multi-node test

Pull Request - State: closed - Opened by nikhil-sk about 1 month ago

#2333 - [cherry-pick] allow list enable streaming

Pull Request - State: closed - Opened by ydm-amazon about 1 month ago

#2332 - allowlist enable streaming

Pull Request - State: closed - Opened by ydm-amazon about 1 month ago

#2331 - [ci] Fix awscurl run headers

Pull Request - State: closed - Opened by xyang16 about 1 month ago

#2330 - [Docs] Add a few missing TRT-LLM options

Pull Request - State: closed - Opened by ethnzhng about 1 month ago

#2329 - LMI release notes

Pull Request - State: closed - Opened by ydm-amazon about 1 month ago

#2328 - [ci] fix device_map auto change in hf handler

Pull Request - State: closed - Opened by sindhuvahinis about 1 month ago

#2327 - [ci] fix benchmark nightly concurrency

Pull Request - State: closed - Opened by sindhuvahinis about 1 month ago

#2326 - [awscurl] Loads AWS credentials from EKS metadata

Pull Request - State: closed - Opened by frankfliu about 1 month ago

#2325 - [lmi][rolling-batch] deprecate backwards compat input formatter support

Pull Request - State: closed - Opened by siddvenk about 1 month ago

#2324 - [serving] Change default retry_threshold to 0

Pull Request - State: closed - Opened by frankfliu about 1 month ago

#2323 - awscurl loading aws credentials in SageMaker Studio

Issue - State: closed - Opened by acere about 1 month ago
Labels: enhancement

#2322 - fix: allow chat template for non batch

Pull Request - State: closed - Opened by sindhuvahinis about 1 month ago

#2321 - [docker] LMI Neuron bump optimum-neuron version

Pull Request - State: closed - Opened by tosterberg about 1 month ago

#2320 - [ci] Add models in text embedding integration

Pull Request - State: closed - Opened by xyang16 about 2 months ago

#2319 - [ci] Add models in text embedding integration

Pull Request - State: closed - Opened by xyang16 about 2 months ago

#2318 - [feat] lmi neuronx add smart defaults context length estimates

Pull Request - State: closed - Opened by tosterberg about 2 months ago

#2317 - [cherry-pick][0.29.0-dlc] [Neo][Neuron] Various CX improvements for Neo Neuron entrypoint (#2296)

Pull Request - State: closed - Opened by a-ys about 2 months ago

#2316 - [feat] lmi neuronx add smart defaults n_positions

Pull Request - State: closed - Opened by tosterberg about 2 months ago

#2315 - Pass trust_remote_code arg to djl-convert

Pull Request - State: closed - Opened by xyang16 about 2 months ago

#2314 - [wlm] Minor refactor to remove unused parameter

Pull Request - State: closed - Opened by frankfliu about 2 months ago

#2313 - [0.28.0-dlc] fix the integration test to build on staging for gpu

Pull Request - State: closed - Opened by sindhuvahinis about 2 months ago

#2312 - [0.28.0-dlc] fix the integration test to build on staging

Pull Request - State: closed - Opened by sindhuvahinis about 2 months ago

#2311 - [cherry-pick][secure-mode] Update options allowlist for 0.29.0 (#2310)

Pull Request - State: closed - Opened by ethnzhng about 2 months ago

#2310 - [secure-mode] Update options allowlist for 0.29.0

Pull Request - State: closed - Opened by ethnzhng about 2 months ago

#2309 - [cherry-pick][secure-mode] Do not require untrusted channels env var to be set (#2306)

Pull Request - State: closed - Opened by ethnzhng about 2 months ago

#2308 - [wlm] fail fast if one of the workers dies (#2305)

Pull Request - State: closed - Opened by sindhuvahinis about 2 months ago

#2307 - [wlm] fail fast if one of the workers dies (#2305)

Pull Request - State: closed - Opened by sindhuvahinis about 2 months ago

#2306 - [secure-mode] Do not require untrusted channels env var to be set

Pull Request - State: closed - Opened by ethnzhng about 2 months ago

#2305 - [wlm] fail fast if one of the workers dies

Pull Request - State: closed - Opened by sindhuvahinis about 2 months ago

#2304 - [docs] Fixes broken links

Pull Request - State: closed - Opened by frankfliu about 2 months ago
