WorksApplications/SudachiTra issues and pull requests

#67 - Fixes #66 - sudachitra not being compatible with transformers version newer than 4.34

Pull Request - State: closed - Opened by mingboiz about 1 year ago - 5 comments

#66 - sudachitra and other custom tokenizers no longer compatible with transformers later than 4.34

Issue - State: closed - Opened by mingboiz about 1 year ago - 4 comments

#65 - Can I use a user dictionary?

Issue - State: open - Opened by mumumu09chi almost 2 years ago - 2 comments

#64 - The entry of `\n` in `vocab.txt` is causing token index shifting

Issue - State: open - Opened by hiroshi-matsuda-rit almost 2 years ago

#63 - Introduce token-based authentication for PyPI

Issue - State: open - Opened by mh-northlander almost 2 years ago

#62 - setup.py install is deprecated.

Issue - State: open - Opened by mh-northlander almost 2 years ago

#61 - Update python-publish workflow

Pull Request - State: closed - Opened by mh-northlander almost 2 years ago - 2 comments

#60 - Python publish workflow is not kicked on the release

Issue - State: closed - Opened by mh-northlander almost 2 years ago - 1 comment

#59 - Prepare for chiTra-1.1

Pull Request - State: closed - Opened by mh-northlander almost 2 years ago

#58 - Prepare for v0.1.8

Pull Request - State: closed - Opened by mh-northlander almost 2 years ago

#57 - Vocabulary file handling

Issue - State: open - Opened by mh-northlander almost 2 years ago

#56 - Add changelog file

Issue - State: closed - Opened by mh-northlander almost 2 years ago

#55 - Add patch file for the JGLUE evaluation

Pull Request - State: closed - Opened by mh-northlander almost 2 years ago

#54 - Allow to save vocab with non-consecutive indices

Pull Request - State: closed - Opened by mh-northlander almost 2 years ago - 3 comments

#53 - Allow empty line in the vocab file

Issue - State: closed - Opened by mh-northlander almost 2 years ago

#52 - Evaluate model with JGLUE

Issue - State: closed - Opened by mh-northlander almost 2 years ago

#51 - tokenizer.model_max_length is incorrect

Issue - State: open - Opened by mh-northlander about 2 years ago - 1 comment

#50 - Feather/add normalized nouns

Pull Request - State: closed - Opened by katsutan over 2 years ago

#49 - add workflow_dispatch

Pull Request - State: closed - Opened by t-yamamura over 2 years ago - 2 comments

#48 - Support 接尾辞-動詞的 and 接尾辞-形容詞的

Pull Request - State: closed - Opened by KoichiYasuoka almost 3 years ago - 4 comments
Labels: duplicate

#47 - update document with the release of pretraining models

Pull Request - State: closed - Opened by t-yamamura almost 3 years ago

#46 - fix README for pretraining

Pull Request - State: closed - Opened by t-yamamura almost 3 years ago

#45 - Update README for pretraining

Pull Request - State: closed - Opened by t-yamamura about 3 years ago

#44 - Update README for pretraing

Issue - State: closed - Opened by t-yamamura about 3 years ago

#43 - Tokenizer initializations behave differently

Issue - State: open - Opened by mh-northlander about 3 years ago

#42 - Add to the test for alignments of encoded tokens by `JapaneseBertWordPieceTokenizer`

Issue - State: open - Opened by t-yamamura about 3 years ago
Labels: bug

#41 - use `pathlib` instead of `os.path`

Issue - State: open - Opened by t-yamamura about 3 years ago

#40 - pretraining by NVIDIA

Pull Request - State: closed - Opened by katsutan about 3 years ago - 1 comment

#39 - Make `split_dataset.py` support huge file input.

Pull Request - State: closed - Opened by t-yamamura about 3 years ago - 2 comments

#38 - Feature/use huggingface compatible pretokenizer

Pull Request - State: closed - Opened by t-yamamura about 3 years ago - 1 comment

#37 - Add scripts for the model evaluation

Pull Request - State: closed - Opened by mh-northlander about 3 years ago - 2 comments

#36 - use PosMatcher instead of `part_of_speech()`

Pull Request - State: closed - Opened by t-yamamura about 3 years ago

#35 - Feature/conjugation preserving normalize for subword

Pull Request - State: closed - Opened by t-yamamura about 3 years ago

#34 - Fix/modify merged preprocessing codes

Pull Request - State: closed - Opened by t-yamamura about 3 years ago

#33 - Use scripts for pretraining implemented by NVIDIA

Issue - State: closed - Opened by t-yamamura about 3 years ago

#32 - Feature/add cleaning and preprocessing

Pull Request - State: closed - Opened by t-yamamura about 3 years ago

#31 - add normalizer that leaved conjugation

Pull Request - State: closed - Opened by katsutan about 3 years ago - 2 comments

#30 - require sudachipy>=0.6.0

Pull Request - State: closed - Opened by t-yamamura over 3 years ago

#29 - remove slow tokenizer

Pull Request - State: closed - Opened by t-yamamura over 3 years ago

#28 - remove slow tokenizer

Issue - State: closed - Opened by t-yamamura over 3 years ago

#27 - add NFKC normalization

Pull Request - State: closed - Opened by t-yamamura over 3 years ago

#26 - use NFKC as preprocessing

Issue - State: closed - Opened by t-yamamura over 3 years ago

#25 - remove lowercase normalizer

Pull Request - State: closed - Opened by t-yamamura over 3 years ago

#24 - Remove lowercase normalizer

Issue - State: closed - Opened by t-yamamura over 3 years ago

#23 - Add preprocessing for cleaning up corpus

Issue - State: closed - Opened by t-yamamura over 3 years ago

#22 - Replace SudachiPy with sudachi.rs

Issue - State: closed - Opened by t-yamamura over 3 years ago

#21 - improve default configurations

Pull Request - State: closed - Opened by hiroshi-matsuda-rit over 3 years ago

#20 - fix slow tokenizer

Pull Request - State: closed - Opened by t-yamamura over 3 years ago

#19 - add slow tokenizer

Pull Request - State: closed - Opened by t-yamamura over 3 years ago

#18 - Re-register submodule

Pull Request - State: closed - Opened by t-yamamura over 3 years ago

#17 - update submodule

Pull Request - State: closed - Opened by t-yamamura over 3 years ago

#16 - make dirs before saving vocab

Pull Request - State: closed - Opened by t-yamamura over 3 years ago

#15 - fix wrong package name

Pull Request - State: closed - Opened by t-yamamura over 3 years ago

#14 - store line_per_file as int

Pull Request - State: closed - Opened by t-yamamura over 3 years ago

#13 - Fix/train tokenizer args

Pull Request - State: closed - Opened by katsutan over 3 years ago

#12 - Adapt to the text preprocessing of SudachiPy

Pull Request - State: closed - Opened by t-yamamura over 3 years ago

#11 - fix import

Pull Request - State: closed - Opened by t-yamamura over 3 years ago

#10 - Bump bunkai from 1.3.0 to 1.4.0

Pull Request - State: closed - Opened by t-yamamura over 3 years ago

#9 - Fix typos

Pull Request - State: closed - Opened by sorami over 3 years ago - 1 comment

#8 - Create python-publish.yaml

Pull Request - State: closed - Opened by t-yamamura over 3 years ago

#7 - Rename package

Pull Request - State: closed - Opened by t-yamamura over 3 years ago

#6 - Refactor/codes for pretraining

Pull Request - State: closed - Opened by t-yamamura over 3 years ago

#5 - Feature/add documents and comments

Pull Request - State: closed - Opened by t-yamamura over 3 years ago

#4 - add pos pretokenizer

Pull Request - State: closed - Opened by t-yamamura over 3 years ago

#3 - fix import structure

Pull Request - State: closed - Opened by t-yamamura over 3 years ago

#2 - transformers should be >= 4.6.1

Pull Request - State: closed - Opened by hiroshi-matsuda-rit over 3 years ago

#1 - rename package name

Pull Request - State: closed - Opened by t-yamamura over 3 years ago
