microsoft/torchscale issues and pull requests

#113 - torchscale 0.3.0 does not include LongNet

Issue - State: open - Opened by tsy935 2 months ago

#112 - train.py: error: unrecognized arguments: --flash-attention --segment-length [2048,4096] --dilated-ratio [1,2]

Issue - State: open - Opened by github2657529567 3 months ago

#111 - No module named 'sentencepiece'

Issue - State: open - Opened by github2657529567 3 months ago

#110 - torchscale 0.3.0 requires fairscale==0.4.0, but you have fairscale 0.4.13 which is incompatible.

Issue - State: open - Opened by pandayummy 5 months ago

#109 - Minecraft

Issue - State: closed - Opened by Pelaez99 7 months ago

#108 - Question about LongNet attention map overlap

Issue - State: open - Opened by RmZeta2718 9 months ago

#107 - Different batch sizes lead to different evalution results for LongVIT

Issue - State: open - Opened by HHHedo 9 months ago

#106 - How to test the model

Issue - State: open - Opened by ReloJeffrey 9 months ago

#105 - pip error

Issue - State: open - Opened by wanghaoran-ucas 10 months ago

#104 - Where is the offset implemented in Multi-head dilated attention ?

Issue - State: open - Opened by AshStuff 10 months ago

#103 - can't use longvit

Issue - State: open - Opened by abebe9849 10 months ago

#102 - Question about learnable segment lengths and dilation rates

Issue - State: open - Opened by benrousePUC 10 months ago

#101 - How to use retention in RetNet for cross-attention?

Issue - State: open - Opened by yxchng 11 months ago

#100 - renames longnet file; longnet example in readme works now

Pull Request - State: open - Opened by JacksonSearle 11 months ago - 1 comment

#99 - Checkpoint for RetNet

Issue - State: open - Opened by macsz 11 months ago

#98 - What WSI level was used for pretraining LongVit?

Issue - State: closed - Opened by jpfeil 12 months ago - 1 comment

#97 - about attention mask

Issue - State: closed - Opened by hichoe95 about 1 year ago

#96 - Bump pillow from 10.0.0 to 10.2.0 in /examples/longvit

Pull Request - State: open - Opened by dependabot[bot] about 1 year ago
Labels: dependencies

#95 - about the longnet's ppl

Issue - State: open - Opened by robotzheng about 1 year ago - 2 comments

#94 - Update requirements.txt

Pull Request - State: closed - Opened by I8dNLo about 1 year ago

#93 - Bump transformers from 4.8.1 to 4.36.0 in /examples/longvit

Pull Request - State: open - Opened by dependabot[bot] about 1 year ago
Labels: dependencies

#92 - Fix No module named 'torch._six'

Pull Request - State: open - Opened by ahmedhshahin about 1 year ago

#91 - Bump pyarrow from 9.0.0 to 14.0.1 in /examples/longvit

Pull Request - State: open - Opened by dependabot[bot] about 1 year ago
Labels: dependencies

#90 - Bump scipy from 1.6.3 to 1.10.0 in /examples/longvit

Pull Request - State: open - Opened by dependabot[bot] about 1 year ago
Labels: dependencies

#89 - Bump pillow from 10.0.0 to 10.0.1 in /examples/longvit

Pull Request - State: closed - Opened by dependabot[bot] about 1 year ago - 1 comment
Labels: dependencies

#88 - Bump transformers from 4.8.1 to 4.30.0 in /examples/longvit

Pull Request - State: closed - Opened by dependabot[bot] about 1 year ago - 1 comment
Labels: dependencies

#87 - Release LongNet and LongViT

Pull Request - State: closed - Opened by shumingma about 1 year ago

#86 - Wrong Rnm Normalization.

Issue - State: open - Opened by pdradx about 1 year ago - 1 comment

#85 - Introducing padding_mask to RetNet

Issue - State: open - Opened by xtwigs about 1 year ago - 2 comments

#84 - Question regarding the configuration of decoder_retention_heads

Issue - State: open - Opened by Kratos-Wen about 1 year ago - 2 comments

#83 - Training RetNet on A100 GPUs

Issue - State: open - Opened by Antoine-Bergerault about 1 year ago - 1 comment

#82 - [Minor issue] Discrepancy inside arxiv paper

Issue - State: open - Opened by radarFudan about 1 year ago

#81 - Question about the normalization in attention

Issue - State: closed - Opened by Cranial-XIX over 1 year ago - 2 comments

#80 - Question about RetNetRelPos

Issue - State: closed - Opened by hyunwoongko over 1 year ago - 2 comments

#79 - about gamma/decay in RetNet

Issue - State: closed - Opened by rouniuyizu over 1 year ago - 2 comments

#78 - typo in normalization denominator in parallel retention?

Issue - State: closed - Opened by XintianHan over 1 year ago - 1 comment

#77 - Chunk recurrent representation incorrect results

Issue - State: closed - Opened by N0r9st over 1 year ago - 7 comments

#76 - Query about Retentive Network's Recurrent Representation

Issue - State: closed - Opened by gopi231091 over 1 year ago - 1 comment

#75 - About training memory

Issue - State: closed - Opened by HoraceXIaoyiBao over 1 year ago - 2 comments

#74 - BEiT3 Vision-Language Expert question

Issue - State: closed - Opened by andreapdr over 1 year ago - 4 comments

#73 - AttributeError: 'EncoderDecoderConfig' object has no attribute 'normalize_output'

Issue - State: closed - Opened by 3CodeLove over 1 year ago - 3 comments

#72 - RuntimeError: The size of tensor a (5) must match the size of tensor b (2) at non-singleton dimension 0

Issue - State: closed - Opened by codinglover0111 over 1 year ago - 3 comments

#71 - Compatibility with torchsummary

Issue - State: closed - Opened by lzqlzzq over 1 year ago - 1 comment

#70 - fix fairseq example

Pull Request - State: closed - Opened by sunyt32 over 1 year ago

#69 - Update new RetNet settings

Pull Request - State: closed - Opened by sunyt32 over 1 year ago

#68 - initialization of qkv

Issue - State: closed - Opened by XintianHan over 1 year ago - 3 comments

#67 - pip package does not contain RetNet

Issue - State: closed - Opened by fabienGenhealth over 1 year ago - 2 comments

#66 - Question on decay factor for attention with xPos

Issue - State: closed - Opened by mvbakulin over 1 year ago - 1 comment

#65 - There're a confusion in torchscale

Issue - State: closed - Opened by lovekang3344 over 1 year ago - 3 comments

#64 - retnet traning config

Issue - State: open - Opened by hanlinxuy over 1 year ago - 7 comments

#63 - Could you please explain the reason behind defining TEMPERATURE_FOR_L_UAX in the code without actually using it?

Issue - State: closed - Opened by Ruiyuan-Zhang over 1 year ago - 1 comment

#62 - Question about the recurrent forward of MultiScaleRetention

Issue - State: closed - Opened by LEECHOONGHO over 1 year ago - 2 comments

#61 - Can Torchscale be applied in point cloud tasks?

Issue - State: closed - Opened by huihui0613 over 1 year ago - 2 comments

#60 - `get_moe_group` 's return is None, when building `class MOELayer(Base)` , using one gpu

Issue - State: closed - Opened by Ruiyuan-Zhang over 1 year ago - 4 comments

#59 - embed_tokens

Issue - State: open - Opened by CodeMiningCZW over 1 year ago - 4 comments

#58 - Question about is_first_step and Retnet

Issue - State: closed - Opened by tdomhan over 1 year ago - 2 comments

#57 - Retnet parameter dimension

Issue - State: closed - Opened by allanj over 1 year ago - 2 comments

#56 - "sentencepiece.bpe.model" and "dict.txt" in page below seem not available

Issue - State: closed - Opened by HuXinjing over 1 year ago - 2 comments

#55 - Retnet training is slow

Issue - State: closed - Opened by Zth9730 over 1 year ago - 2 comments

#54 - RetNet : Check consistency of each forward mode

Issue - State: closed - Opened by mmorinag127 over 1 year ago - 9 comments

#53 - Is there some example of the paper? e.g., compare of the inference latency

Issue - State: closed - Opened by LiZeng001 over 1 year ago - 1 comment

#52 - Training & Inference examples for RetNet

Issue - State: closed - Opened by jhl-Det over 1 year ago - 1 comment

#51 - fix chunkwise inconsistency bug

Pull Request - State: closed - Opened by sunyt32 over 1 year ago

#50 - Adding sqrt in the recurrent_forward of retnet to make it consistent with parallel_forward

Pull Request - State: closed - Opened by wangmengzhi over 1 year ago

#49 - RetNet: relative position

Issue - State: closed - Opened by fkodom over 1 year ago - 5 comments

#48 - Multi-Scale Retention: Why include position embeddings explicitly?

Issue - State: closed - Opened by fkodom over 1 year ago - 3 comments

#47 - scale.sqrt() in the recurrent_forward function of the multiscale_retention module

Issue - State: closed - Opened by wangmengzhi over 1 year ago - 6 comments

#46 - Update epsilon in retention

Pull Request - State: closed - Opened by sunyt32 over 1 year ago

#45 - LEX inference support and checkpoint

Issue - State: closed - Opened by RulinShao over 1 year ago - 5 comments

#44 - recurrent_forward in MultiScaleRetention

Issue - State: closed - Opened by Anker-ZX-AI over 1 year ago - 1 comment

#43 - AttributeError: 'EncoderConfig' object has no attribute 'decoder_layers'

Issue - State: closed - Opened by dedekinds over 1 year ago - 2 comments

#42 - the meaning of "incremental_state" in RetNet

Issue - State: closed - Opened by jhl-Det over 1 year ago - 3 comments

#41 - can not download dict.txt

Issue - State: closed - Opened by robotzheng over 1 year ago - 2 comments

#40 - Inconsist recurrent and parallel results for RetNet

Issue - State: closed - Opened by YirunKCL over 1 year ago - 4 comments

#39 - Config fix

Pull Request - State: open - Opened by agoryuno over 1 year ago

#38 - Remove inheritance from `object`

Pull Request - State: open - Opened by agoryuno over 1 year ago - 2 comments

#37 - Longnet Code Release

Issue - State: closed - Opened by arnavdantuluri over 1 year ago - 13 comments

#36 - testing very large attention windows

Issue - State: open - Opened by fredzannarbor over 1 year ago

#35 - About the param `scale_base`

Issue - State: closed - Opened by horizon94 over 1 year ago - 1 comment

#34 - some result plots are not show

Issue - State: closed - Opened by klae01 over 1 year ago - 1 comment

#33 - support lm prefix computation in one go

Pull Request - State: closed - Opened by XingxingZhang over 1 year ago

#32 - EncoderDecoder Configuration Issue

Issue - State: closed - Opened by klae01 over 1 year ago - 1 comment

#31 - add basic test

Pull Request - State: closed - Opened by klae01 over 1 year ago - 1 comment

#30 - make pgs global

Pull Request - State: closed - Opened by njb-ms almost 2 years ago - 2 comments

#29 - question about the number of output_projection

Issue - State: closed - Opened by violet-sto almost 2 years ago - 1 comment

#28 - xPos cross-attention change

Issue - State: closed - Opened by janEbert almost 2 years ago - 2 comments

#27 - Bump timm version to latest

Pull Request - State: closed - Opened by JonathanRayner almost 2 years ago

#25 - Fairseq version compatible with torchscale

Issue - State: closed - Opened by sjelassi almost 2 years ago - 1 comment

#24 - Swapped naive dot product attention for flash attention

Pull Request - State: open - Opened by usryokousha almost 2 years ago - 4 comments

#23 - About running speed

Issue - State: open - Opened by NieShenRuc almost 2 years ago

#22 - Could not install fairseq

Issue - State: closed - Opened by BaohaoLiao almost 2 years ago - 1 comment

#21 - v0.2.0

Pull Request - State: closed - Opened by shumingma almost 2 years ago

#20 - fx BERT + moe

Pull Request - State: closed - Opened by buaahsh almost 2 years ago

#19 - Update README.md

Pull Request - State: closed - Opened by buaahsh almost 2 years ago

#18 - Installer bug - wrong `apex` package installed

Issue - State: closed - Opened by jph00 almost 2 years ago - 2 comments

#17 - SMOE or XMOE Network how to "evaluate" and "save and resume"

Issue - State: closed - Opened by randomtutu almost 2 years ago - 2 comments

#16 - Questions about the implementation of deepnorm

Issue - State: closed - Opened by jiaohuix almost 2 years ago - 2 comments

#15 - [Question] what are the usages of multiway_network.py?

Issue - State: closed - Opened by berniewang8177 almost 2 years ago - 2 comments

#14 - Does Torchscale support vision transformers in vision tasks?

Issue - State: closed - Opened by nightsnack about 2 years ago - 5 comments

#13 - Q) Tensor parallel for magneto

Issue - State: closed - Opened by taehwakkwon about 2 years ago - 9 comments
