alibaba/BladeDISC issues and pull requests

#1315 - Evt source op lowering

Pull Request - State: open - Opened by Xinyu302 11 days ago - 1 comment

#1314 - build pytorch BladeDISC in docker

Issue - State: open - Opened by aiyxxj about 1 month ago

#1313 - auto offloading in dynamic shape IR

Pull Request - State: open - Opened by Yancey1989 about 2 months ago

#1312 - How to export compile_commands.json

Issue - State: open - Opened by foocoder about 2 months ago

#1311 - using lmhlo_disc.concatenateOp if all operands are fixed shape

Pull Request - State: closed - Opened by Yancey1989 about 2 months ago

#1310 - simplifier scalar tensor shape to scalar i32

Pull Request - State: closed - Opened by Yancey1989 2 months ago

#1309 - Fix the error when compiling with torchacc 2.3

Pull Request - State: closed - Opened by anw90 3 months ago - 1 comment

#1308 - fixup shape propagate pass

Pull Request - State: closed - Opened by Yancey1989 3 months ago

#1307 - add buffer-live-range analysis on lmhlo

Pull Request - State: closed - Opened by Yancey1989 3 months ago - 1 comment

#1306 - [WIP]Add AutoOffloadingPass

Pull Request - State: closed - Opened by Yancey1989 3 months ago - 1 comment

#1305 - use multi-stream for TensorRT Engine Op

Issue - State: closed - Opened by zhyncs 3 months ago - 2 comments

#1304 - Add DiscRematerializationPass to auto recompute

Pull Request - State: closed - Opened by eedalong 4 months ago - 1 comment

#1303 - add view op dynamic shape propagate

Pull Request - State: closed - Opened by Yancey1989 4 months ago

#1302 - Support shape_propagate for more ops

Pull Request - State: closed - Opened by eedalong 4 months ago

#1301 - Add ConvertSimplifyPattern

Pull Request - State: closed - Opened by eedalong 4 months ago

#1300 - Support bf16 constant load and collective ops

Pull Request - State: closed - Opened by eedalong 4 months ago

#1299 - Support bf16 ScatterOp

Pull Request - State: closed - Opened by eedalong 4 months ago

#1298 - add shape propagate pass on mhlo

Pull Request - State: closed - Opened by Yancey1989 4 months ago

#1297 - Some minor fixes

Pull Request - State: closed - Opened by eedalong 4 months ago

#1296 - fix yitian ci build failed

Pull Request - State: closed - Opened by Yancey1989 5 months ago

#1295 - [bugfix] fix scatter op accuracy

Pull Request - State: closed - Opened by Yancey1989 5 months ago

#1294 - How to get best performance with optimization with torch_blade

Issue - State: open - Opened by JackWeiw 5 months ago - 2 comments

#1293 - support Llama bf16 amp training

Pull Request - State: closed - Opened by Yancey1989 6 months ago

#1292 - Optimize input-output alias

Pull Request - State: closed - Opened by eedalong 6 months ago

#1291 - Add more algebra simplify rules

Pull Request - State: closed - Opened by eedalong 6 months ago

#1290 - build pytorch_blade failed

Issue - State: closed - Opened by JackWeiw 6 months ago - 3 comments

#1289 - support collective operators

Issue - State: closed - Opened by Yancey1989 7 months ago

#1288 - Add DiscCollectiveOpsPass and related collective ops

Pull Request - State: closed - Opened by Yancey1989 7 months ago

#1287 - Support async collective op execution

Pull Request - State: closed - Opened by eedalong 7 months ago

#1286 - Support optimization barrier op

Pull Request - State: closed - Opened by eedalong 7 months ago

#1284 - Add scalar reduction codegen schedule

Pull Request - State: open - Opened by Yancey1989 7 months ago

#1283 - Support mhlo.custom_call op processing

Pull Request - State: closed - Opened by eedalong 7 months ago

#1282 - Always try to lower standalone lmhlo.transpose to custom call for better performance

Pull Request - State: closed - Opened by eedalong 7 months ago

#1281 - Test

Pull Request - State: closed - Opened by aisha131996 8 months ago - 1 comment

#1280 - Warning when run tensorflow model with bladedisc

Issue - State: open - Opened by DY-TL 8 months ago - 1 comment

#1279 - add reduce-buffer-live-range pass to reduce memory peak

Pull Request - State: closed - Opened by Yancey1989 8 months ago

#1278 - README typos

Pull Request - State: open - Opened by tgenesio 9 months ago

#1277 - add one-hop buffer reuse propagation

Pull Request - State: closed - Opened by eedalong 9 months ago

#1276 - [WIP] Support Alias V2

Pull Request - State: closed - Opened by eedalong 9 months ago

#1275 - support collective all_reduce op

Pull Request - State: closed - Opened by Yancey1989 9 months ago

#1273 - Add pass to process input and output alias hint info

Pull Request - State: closed - Opened by eedalong 9 months ago

#1272 - support scatter op and update tf_community submodule

Pull Request - State: closed - Opened by eedalong 10 months ago - 1 comment

#1271 - cuda graph and multi-stream support

Issue - State: closed - Opened by zhyncs 10 months ago - 2 comments

#1270 - Support ScatterOp Codegen

Pull Request - State: closed - Opened by eedalong 11 months ago - 1 comment

#1269 - Model with custom cpu Op will hang when open BladeDISC optimize.

Issue - State: open - Opened by tuanzhangCS 11 months ago

#1267 - compile optimizer computation graph in stable diffusion finetune

Pull Request - State: closed - Opened by Yancey1989 12 months ago

#1265 - [TorchBench] Performance Signal Detected

Issue - State: open - Opened by zzpmiracle 12 months ago
Labels: Benchmark

#1264 - In TF_Blade, we found TRT_SUPPORT_LIST is not match the Onnx-Tensorrt Op Support List.

Issue - State: open - Opened by b4b4o 12 months ago

#1263 - gomp_barrier_wait_end func consumes most of cpu time

Issue - State: open - Opened by Xavier1994 12 months ago

#1262 - [TorchBench] Performance Signal Detected

Issue - State: open - Opened by zzpmiracle 12 months ago
Labels: Benchmark

#1261 - [TorchBench] Performance Signal Detected

Issue - State: open - Opened by zzpmiracle about 1 year ago
Labels: Benchmark

#1260 - add more ops for MLIR-based end-to-end GPU Tensor Core GEMM codegen

Pull Request - State: closed - Opened by Guo-Peilin about 1 year ago

#1259 - How to compile and run models from pytorch in dynamic-shape mode?

Issue - State: open - Opened by SunflowerAries about 1 year ago

#1258 - [TorchBench] Performance Signal Detected

Issue - State: open - Opened by zzpmiracle about 1 year ago
Labels: Benchmark

#1257 - Refine colreduction fusion strategy of kStitch.

Pull Request - State: closed - Opened by yunzhongOvO about 1 year ago

#1255 - support stablehlo in torch-mlir-opt binary

Pull Request - State: closed - Opened by Yancey1989 about 1 year ago

#1254 - fix stablediffusion finetune compilation failed

Pull Request - State: closed - Opened by Yancey1989 about 1 year ago

#1253 - about pdll included cpp function

Issue - State: open - Opened by callmelaoyi about 1 year ago - 2 comments

#1252 - [TorchBench] Performance Signal Detected

Issue - State: open - Opened by zzpmiracle about 1 year ago
Labels: Benchmark

#1251 - update readme

Pull Request - State: closed - Opened by qiuxiafei about 1 year ago

#1250 - Dont Support FusedBatchNorm OP

Issue - State: open - Opened by theflyfish about 1 year ago

#1249 - [TorchBench] Performance Signal Detected

Issue - State: open - Opened by zzpmiracle about 1 year ago
Labels: Benchmark

#1248 - Failed to initialize nvml unknown error inside docker

Issue - State: open - Opened by HassanAsghar1 about 1 year ago

#1246 - [TorchBench] Performance Signal Detected

Issue - State: open - Opened by zzpmiracle about 1 year ago
Labels: Benchmark

#1245 - [TorchBench] Performance Signal Detected

Issue - State: open - Opened by zzpmiracle about 1 year ago
Labels: Benchmark

#1244 - Unable to compile StableDiffusion

Issue - State: open - Opened by renderless about 1 year ago

#1237 - [TorchBench] Performance Signal Detected

Issue - State: open - Opened by zzpmiracle about 1 year ago
Labels: Benchmark

#1236 - Enable the experimental mem-intensive operator optimization by default.

Pull Request - State: closed - Opened by JamesTheZ about 1 year ago

#1234 - [TorchBench] Performance Signal Detected

Issue - State: open - Opened by zzpmiracle about 1 year ago
Labels: Benchmark

#1230 - refine disc pdll uts

Pull Request - State: closed - Opened by chenbohua3 about 1 year ago

#1229 - coredump when set tf_enable_tao be true

Issue - State: open - Opened by houjincheng1992 about 1 year ago

#1228 - update torch-pre version.

Pull Request - State: closed - Opened by qiuxiafei about 1 year ago - 1 comment

#1227 - use cudaMemsetAsync on ral memset

Pull Request - State: closed - Opened by chenbohua3 about 1 year ago

#1225 - run through inplace KV cache compilation pass pipeline in torch_blade

Pull Request - State: closed - Opened by Yancey1989 about 1 year ago

#1222 - need a flag to sync on cuda stream in BladeDISC RAL

Issue - State: open - Opened by Yancey1989 about 1 year ago

#1221 - support inplace kv cache compilation

Pull Request - State: closed - Opened by Yancey1989 about 1 year ago

#1220 - Enable cudamemset

Pull Request - State: closed - Opened by chenbohua3 about 1 year ago

#1219 - [TorchBench] Performance Signal Detected

Issue - State: open - Opened by zzpmiracle over 1 year ago
Labels: Benchmark

#1218 - support inplace operator in TorchBlade

Issue - State: closed - Opened by Yancey1989 over 1 year ago
Labels: feature

#1217 - Support setting scratch memory limit by env

Pull Request - State: closed - Opened by bikekiller over 1 year ago

#1215 - enable attention backward op by default

Pull Request - State: closed - Opened by chenbohua3 over 1 year ago

#1213 - Rebase community TensorFlow.

Pull Request - State: closed - Opened by JamesTheZ over 1 year ago

#1212 - [TorchBench] Performance Signal Detected

Issue - State: closed - Opened by zzpmiracle over 1 year ago - 1 comment
Labels: Benchmark

#1209 - using args-mutation annotation to support inplace op in torch-blade pipeline

Pull Request - State: closed - Opened by Yancey1989 over 1 year ago

#1205 - relax min node nums in pdll gpu e2e ut

Pull Request - State: closed - Opened by chenbohua3 over 1 year ago

#1204 - limit bias to be a constant for weight-only qgemm

Pull Request - State: closed - Opened by chenbohua3 over 1 year ago

#1201 - GPU memory-intensive codegen: add column reduction schedule.

Pull Request - State: closed - Opened by yunzhongOvO over 1 year ago - 2 comments

#1200 - selection of gemm kernel under different input shapes

Issue - State: open - Opened by LiZerun over 1 year ago

#1199 - The basic end-to-end GPU Tensor Core GEMM MLIR-codegen.

Pull Request - State: closed - Opened by JamesTheZ over 1 year ago - 1 comment

#1198 - GPU fusion strategy of TransformDialect-based GEMM codegen.

Pull Request - State: closed - Opened by JamesTheZ over 1 year ago

#1197 - Transform ops for GPU GEMM codegen.

Pull Request - State: closed - Opened by JamesTheZ over 1 year ago

#1196 - Fix typo for keeping tempfiles

Pull Request - State: closed - Opened by bikekiller over 1 year ago

#1194 - fix weight-only qgemm layout transformation bugs

Pull Request - State: closed - Opened by chenbohua3 over 1 year ago

#1193 - [TorchBench] Performance Signal Detected

Issue - State: closed - Opened by zzpmiracle over 1 year ago - 1 comment
Labels: Benchmark

#1192 - weight-only quant linear pdl patterns

Pull Request - State: closed - Opened by chenbohua3 over 1 year ago

#1191 - enable pdl pattern match in more place

Pull Request - State: closed - Opened by chenbohua3 over 1 year ago

#1190 - add stable diffusion fine-tune example

Pull Request - State: closed - Opened by Yancey1989 over 1 year ago - 1 comment

#1188 - [Torch-Blade] add conv_nhwc_lowering

Pull Request - State: closed - Opened by zzpmiracle over 1 year ago - 1 comment

#1187 - Pass for transforming weight layout for A16w8/A16w4 Gemm

Pull Request - State: closed - Opened by chenbohua3 over 1 year ago

#1186 - [TorchBench] Performance Signal Detected

Issue - State: closed - Opened by zzpmiracle over 1 year ago
Labels: Benchmark
