deepjavalibrary/djl-serving issues and pull requests

#1260 - Update trtllm toolkit path

Pull Request - State: closed - Opened by rohithkrn about 1 year ago

#1259 - [TRT partition] add realtime stream reader for the conversion script

Pull Request - State: closed - Opened by lanking520 about 1 year ago

#1258 - [TRTLLM] always setting request output length

Pull Request - State: closed - Opened by lanking520 about 1 year ago

#1257 - MME - deviceId while creating workers

Pull Request - State: closed - Opened by sindhuvahinis about 1 year ago

#1256 - [TRTLLM] add trtllm with no deps

Pull Request - State: closed - Opened by lanking520 about 1 year ago

#1255 - [TRTLLM] use tensorrt wheel

Pull Request - State: closed - Opened by lanking520 about 1 year ago

#1254 - install trtllm toolkit

Pull Request - State: closed - Opened by lanking520 about 1 year ago

#1253 - [python] Fixes build error

Pull Request - State: closed - Opened by frankfliu about 1 year ago

#1252 - Inf2 properties refactoring using pydantic

Pull Request - State: closed - Opened by sindhuvahinis about 1 year ago

#1251 - [serving] Adds token latency metric

Pull Request - State: closed - Opened by frankfliu about 1 year ago

#1250 - [feat] Add inf2 2.15 sdk and handler to 0.24.0 dlc

Pull Request - State: closed - Opened by tosterberg about 1 year ago

#1249 - [python] Buffer tokens for rolling batch

Pull Request - State: closed - Opened by frankfliu about 1 year ago

#1248 - [TRTLLM] some clean up on trtllm handler

Pull Request - State: closed - Opened by lanking520 about 1 year ago

#1247 - add trtllm cuda-compat

Pull Request - State: closed - Opened by lanking520 about 1 year ago

#1246 - [DeepSpeed DLC] separate container build with multi-layers

Pull Request - State: closed - Opened by lanking520 about 1 year ago

#1245 - remove unused components

Pull Request - State: closed - Opened by lanking520 about 1 year ago

#1244 - removing ai template installation in deepspeed container

Pull Request - State: closed - Opened by lanking520 about 1 year ago

#1240 - New PR for tensorrt llm

Pull Request - State: closed - Opened by ydm-amazon about 1 year ago - 3 comments

#1236 - Issue with serving modes section (documentation)

Issue - State: closed - Opened by segundovolante about 1 year ago - 4 comments
Labels: bug

#1235 - Add trt-llm engine build step during model initialization

Pull Request - State: closed - Opened by rohithkrn about 1 year ago - 3 comments

#1230 - [SageMaker Galactus developer experience] model load integration to DJL serving

Pull Request - State: closed - Opened by haNa-meister about 1 year ago - 3 comments

#1229 - [fix] gpt2 neuron support handler and ci

Pull Request - State: closed - Opened by tosterberg about 1 year ago

#1227 - [neuronx] bump to 2.15 for tnx container and scripts

Pull Request - State: closed - Opened by tosterberg about 1 year ago - 2 comments

#1222 - Cleans tensorParallelDegree with MultiDevice

Pull Request - State: closed - Opened by zachgk about 1 year ago - 2 comments

#1220 - Update mpirun options

Pull Request - State: closed - Opened by xyang16 about 1 year ago - 1 comment

#1218 - [TRTLLM][SAMPLE] add trtllm rough rolling batcher

Pull Request - State: closed - Opened by lanking520 about 1 year ago

#1216 - Do warmup in multiple requests

Pull Request - State: closed - Opened by xyang16 about 1 year ago

#1214 - Ability to transform model outputs in DJL Serving

Issue - State: closed - Opened by rachitchauhan43 about 1 year ago - 4 comments
Labels: enhancement

#1212 - switch to torchrun as default

Pull Request - State: closed - Opened by lanking520 about 1 year ago

#1206 - [NeuronX] add attention mask porting from optimum-neuron

Pull Request - State: closed - Opened by lanking520 about 1 year ago

#1203 - Setting default datatype for deepspeed handlers

Pull Request - State: closed - Opened by sindhuvahinis about 1 year ago

#1194 - [fix] update context estimate interface

Pull Request - State: closed - Opened by tosterberg about 1 year ago

#1193 - [python] Do not set default value for truncate

Pull Request - State: closed - Opened by xyang16 about 1 year ago

#1190 - CI Test

Pull Request - State: closed - Opened by tosterberg about 1 year ago

#1189 - [0.24.0] Fix lmi_dist garbage output issue

Pull Request - State: closed - Opened by xyang16 about 1 year ago

#1188 - djl lmi images with vllm and hf quantizaton support

Issue - State: closed - Opened by Nagarajj about 1 year ago - 1 comment
Labels: bug

#1187 - Fix lmi_dist garbage output issue

Pull Request - State: open - Opened by xyang16 about 1 year ago

#1186 - [INF2] allow neuron to load split model directly

Pull Request - State: closed - Opened by lanking520 about 1 year ago

#1185 - Adding INF2 (transformers-neuronx) compilation latencies to SageMaker Health Metrics

Pull Request - State: open - Opened by Lokiiiiii about 1 year ago - 2 comments

#1184 - Add context length estimate for Neuron handler

Pull Request - State: closed - Opened by lanking520 about 1 year ago

#1183 - Add CI performance test for deepspeed smoothquant.

Pull Request - State: closed - Opened by chen3933 about 1 year ago

#1182 - Fix max tensor_parallel_degree

Pull Request - State: closed - Opened by zachgk about 1 year ago

#1181 - [bug fix] add entrypoint camel case recovery

Pull Request - State: closed - Opened by lanking520 about 1 year ago

#1180 - Update mmap version in `deepspeed.Dockerfile`

Pull Request - State: closed - Opened by maaquib about 1 year ago

#1179 - Add aiccl support

Pull Request - State: open - Opened by maaquib about 1 year ago

#1178 - [bugfix] parsing waiting steps to integer

Pull Request - State: closed - Opened by lanking520 about 1 year ago

#1177 - [CI] change xgen to standard llama model

Pull Request - State: closed - Opened by lanking520 about 1 year ago

#1176 - [LMI][Handler] add more model support coverage

Pull Request - State: closed - Opened by lanking520 about 1 year ago

#1175 - [fix] update tp for dynamic llama2 test back to 4

Pull Request - State: closed - Opened by tosterberg about 1 year ago

#1174 - Fix flash_attn import issue

Pull Request - State: closed - Opened by xyang16 about 1 year ago

#1173 - rolling batch does not work

Issue - State: closed - Opened by prgawade about 1 year ago - 2 comments
Labels: bug

#1172 - Faster in-memory weight transfer for transformers-neuronx

Pull Request - State: closed - Opened by Lokiiiiii about 1 year ago - 1 comment

#1171 - Adding llama2 w/ SmoothQuant ci test

Pull Request - State: closed - Opened by maaquib about 1 year ago

#1170 - [Docker] free disk space for docker build

Pull Request - State: closed - Opened by lanking520 about 1 year ago

#1169 - Update java dependencies

Pull Request - State: closed - Opened by zachgk about 1 year ago

#1168 - [INF2] add neuron batch size default and support rolling batch configs

Pull Request - State: closed - Opened by lanking520 about 1 year ago

#1167 - [feat] freeze deepspeed version for release

Pull Request - State: closed - Opened by tosterberg about 1 year ago

#1166 - Enable adapters preview in llm_integration test

Pull Request - State: closed - Opened by zachgk about 1 year ago

#1165 - [Handler] disable flash attention as default as of now

Pull Request - State: closed - Opened by lanking520 about 1 year ago

#1164 - [fix] add fast loading to partition test

Pull Request - State: closed - Opened by tosterberg about 1 year ago

#1163 - [serving] Cancel request if client disconnect

Pull Request - State: open - Opened by frankfliu about 1 year ago

#1162 - installing official vLLM into container

Pull Request - State: closed - Opened by lanking520 about 1 year ago

#1161 - Update vllm wheel name

Pull Request - State: closed - Opened by xyang16 about 1 year ago

#1160 - Adds versions as labels in dockerfiles

Pull Request - State: closed - Opened by zachgk about 1 year ago

#1159 - When doing smoothquant calibration, pass tokenizer through in deepspe…

Pull Request - State: closed - Opened by davidthomas426 about 1 year ago

#1158 - [Handler] disable circular import

Pull Request - State: closed - Opened by lanking520 about 1 year ago

#1157 - Clarify error message with unsupported quantization algorithm, since …

Pull Request - State: closed - Opened by davidthomas426 about 1 year ago

#1156 - Add error message for quantization when using checkpoint loading.

Pull Request - State: closed - Opened by chen3933 about 1 year ago

#1155 - [IB] remove empty lines

Pull Request - State: closed - Opened by lanking520 about 1 year ago

#1154 - [0.24.0 branch] Release 0.24.0 changes

Pull Request - State: closed - Opened by zachgk about 1 year ago

#1153 - Assert local lora models in the handler

Pull Request - State: closed - Opened by rohithkrn about 1 year ago

#1152 - Add feature flag for adapters

Pull Request - State: closed - Opened by zachgk about 1 year ago - 1 comment

#1151 - Instance Benchmark Rev2

Pull Request - State: closed - Opened by lanking520 about 1 year ago

#1150 - [serving] Allow model_id point to djl model zoo

Pull Request - State: closed - Opened by frankfliu about 1 year ago

#1149 - Instant benchmark

Pull Request - State: closed - Opened by lanking520 about 1 year ago

#1148 - Support adapters by properties

Pull Request - State: closed - Opened by zachgk about 1 year ago

#1147 - Block remote adapter url and handler override

Pull Request - State: closed - Opened by zachgk about 1 year ago

#1146 - Give a version of seq scheduler

Pull Request - State: closed - Opened by KexinFeng about 1 year ago

#1145 - [INF2][CI] switch the model to pythia

Pull Request - State: closed - Opened by lanking520 about 1 year ago - 1 comment

#1144 - [fix] version_fix

Pull Request - State: closed - Opened by KexinFeng about 1 year ago

#1143 - [CI] allow inf2 instance to sleep longer

Pull Request - State: closed - Opened by lanking520 about 1 year ago

#1142 - [CI][Neuron] add extra timeout time for gpt neox

Pull Request - State: closed - Opened by lanking520 about 1 year ago

#1141 - [WIP][FasterTransformer] use python 3.10.0 and upgrade pytorch

Pull Request - State: closed - Opened by lanking520 about 1 year ago - 2 comments

#1140 - Update vllm_rolling_batch.py

Pull Request - State: closed - Opened by lanking520 about 1 year ago

#1139 - Adding smoothquant integ tests

Pull Request - State: closed - Opened by maaquib about 1 year ago

#1138 - [feat] Modify deepspeed handler to support smoothQuant.

Pull Request - State: closed - Opened by chen3933 about 1 year ago - 3 comments

#1137 - [fix] Gptq dependency

Pull Request - State: closed - Opened by KexinFeng about 1 year ago - 4 comments

#1136 - [vLLM][Handler] add quantization option for vLLM

Pull Request - State: closed - Opened by lanking520 about 1 year ago

#1135 - [python] Make rolling batch output not escape unicode characters

Pull Request - State: closed - Opened by xyang16 about 1 year ago

#1134 - [INF2][Handler] remove type conversion in Neuron

Pull Request - State: closed - Opened by lanking520 about 1 year ago

#1133 - Revert flash_attn v2 version back to 2.0.1

Pull Request - State: closed - Opened by xyang16 about 1 year ago

#1132 - [fix] fix hf transformer handler dependency

Pull Request - State: closed - Opened by KexinFeng about 1 year ago - 2 comments

#1131 - [fix] Fix falcon in seq_scheduler

Pull Request - State: closed - Opened by KexinFeng about 1 year ago

#1130 - [0.22.1][DeepSpeed] make deepspeed run on cpu runner

Pull Request - State: closed - Opened by lanking520 about 1 year ago

#1129 - [fix] falcon test model failure in unittest

Pull Request - State: closed - Opened by KexinFeng about 1 year ago

#1128 - [Backport][0.22.1][INF2] remove header installation

Pull Request - State: closed - Opened by lanking520 about 1 year ago

#1127 - [Backport][0.23.0] remove INF2 header installation

Pull Request - State: closed - Opened by lanking520 about 1 year ago

#1126 - [INF2] remove neuron settings on cache hit for the folder

Pull Request - State: closed - Opened by lanking520 about 1 year ago - 1 comment

#1125 - Add rolling batch gptq integration test

Pull Request - State: closed - Opened by xyang16 about 1 year ago - 2 comments

#1124 - [Handler] bump up vllm version and fix some bugs

Pull Request - State: closed - Opened by lanking520 about 1 year ago
