Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / alvations/sacremoses issues and pull requests

#138 - Web and basic protected patterns by default

Pull Request - State: open - Opened by samirsalman over 1 year ago

#137 - Quiet flag has no effect on the detokenizer

Issue - State: open - Opened by XapaJIaMnu over 1 year ago

#136 - [Question] Why is <unk> token tokenized into 3 items?

Issue - State: open - Opened by MarkusSpanring almost 2 years ago

#135 - Is this package multi-threaded?

Issue - State: open - Opened by ejkitchen about 2 years ago

#134 - Clean up Python 2 compatibility-related codes

Pull Request - State: closed - Opened by BLKSerene about 2 years ago - 1 comment

#133 - compile regex objects ahead of time for improved perf.

Pull Request - State: open - Opened by erip about 2 years ago - 1 comment

#132 - Add replacement for space before special symbols

Pull Request - State: closed - Opened by dimitrityan over 2 years ago

#131 - Restrict click to be <8.1

Pull Request - State: open - Opened by alecbrick over 2 years ago - 3 comments

#130 - Error with CLI `tokenize` using `click==8.1.3`

Issue - State: closed - Opened by gsarti over 2 years ago - 10 comments
Labels: bug

#129 - New release?

Issue - State: closed - Opened by erip over 2 years ago - 4 comments

#128 - Code cleanup

Pull Request - State: closed - Opened by BLKSerene over 2 years ago - 1 comment

#127 - use setuptools instead of distutils

Pull Request - State: closed - Opened by alvations over 2 years ago

#126 - distutils is deprecated in Python 3.10

Issue - State: closed - Opened by tirkarthi over 2 years ago - 1 comment

#125 - Which of the 100 languages used in mBERT are not supported by this tokenizer?

Issue - State: closed - Opened by fake-warrior8 almost 3 years ago - 1 comment

#124 - Cache downloaded tokenizer files

Issue - State: closed - Opened by masus04 almost 3 years ago - 1 comment

#123 - Is there a way to use sacremoses in java?

Issue - State: closed - Opened by SefaZeng almost 3 years ago - 1 comment
Labels: feature-request

#122 - Trying to get in touch regarding a security issue

Issue - State: closed - Opened by JamieSlome about 3 years ago - 2 comments

#121 - Fixed protected patterns, truecase logic

Pull Request - State: open - Opened by pluiez about 3 years ago

#120 - Chinese full stop “。” can't be split.

Issue - State: open - Opened by BrightXiaoHan about 3 years ago - 1 comment

#119 - Add missing language for nonbreaking prefixes

Pull Request - State: closed - Opened by BLKSerene over 3 years ago

#118 - deep detokenizer

Issue - State: open - Opened by chessgecko over 3 years ago

#117 - can't tokenise the period properly

Issue - State: open - Opened by gdxie1 over 3 years ago

#116 - No detokenize_penn?

Issue - State: open - Opened by mfelice over 3 years ago

#115 - update release on github

Issue - State: open - Opened by PeterLijunfeng over 3 years ago

#114 - Add tokenization for Tetun (tdt)

Pull Request - State: open - Opened by raphaelmerx over 3 years ago

#112 - bug report

Issue - State: open - Opened by Brucewuzhang almost 4 years ago

#111 - fix 'int' + 'str' error, revert to what mosesdecoder does

Pull Request - State: open - Opened by Brucewuzhang almost 4 years ago - 1 comment

#109 - Bug: normalize prints extra newline

Issue - State: closed - Opened by mjpost about 4 years ago - 1 comment
Labels: bug

#108 - Fix detruecaser when the first token is all-caps

Pull Request - State: closed - Opened by yuyang-huang about 4 years ago - 2 comments

#107 - error: loading model for detruecaser

Issue - State: open - Opened by colingair about 4 years ago

#106 - detrucase Error: unsupported operand type(s)

Issue - State: open - Opened by colingair about 4 years ago

#105 - truecase training killed: 9

Issue - State: open - Opened by colingair about 4 years ago - 1 comment

#104 - License Clarification

Issue - State: closed - Opened by TRQ3 about 4 years ago - 1 comment

#103 - Improvements: virama and nukthas of Indic languages, easy way to specify basic protected patterns

Pull Request - State: closed - Opened by thammegowda about 4 years ago - 5 comments
Labels: bug, enhancement, awesome-contribution

#102 - Possible to retrain/keep training an existing model?

Issue - State: open - Opened by petulla over 4 years ago - 5 comments
Labels: question, feature-request

#101 - Change to chain.from_iterable in truecase.py

Pull Request - State: closed - Opened by cool-RR over 4 years ago - 12 comments

#100 - Use chain.from_iterable in normalize.py

Pull Request - State: closed - Opened by cool-RR over 4 years ago - 7 comments

#99 - Is there a plan to have sent_tokenize in this library?

Issue - State: open - Opened by jeremyasapp over 4 years ago - 2 comments
Labels: enhancement, feature-request

#98 - Tokenizer -x option is confusing

Issue - State: open - Opened by ZJaume over 4 years ago - 3 comments
Labels: enhancement, cli

#97 - Non-breaking prefixes at the end of the sentence

Issue - State: closed - Opened by ZJaume over 4 years ago - 2 comments
Labels: duplicate, wontfix

#96 - New pipeline feature!

Pull Request - State: closed - Opened by alvations over 4 years ago - 2 comments

#95 - Dropping Python 2.7 support.

Issue - State: closed - Opened by alvations over 4 years ago - 1 comment
Labels: pythonic

#94 - Fixes deprecation warning

Pull Request - State: closed - Opened by alvations over 4 years ago

#93 - Relicense to MIT

Pull Request - State: closed - Opened by alvations over 4 years ago - 2 comments
Labels: help wanted, license

#92 - Create LICENSE

Pull Request - State: closed - Opened by alvations over 4 years ago

#91 - PyPI tarball contains code with syntax errors

Issue - State: closed - Opened by adamjstewart over 4 years ago - 3 comments
Labels: bug

#90 - Escape option on command line

Issue - State: closed - Opened by felipealco over 4 years ago - 1 comment

#89 - normalize broken

Issue - State: closed - Opened by mjpost over 4 years ago - 2 comments
Labels: bug

#88 - Fixing typos in truecase function

Pull Request - State: closed - Opened by HaukurPall over 4 years ago - 1 comment
Labels: bug

#87 - NameError: name 'words' is not defined in truecaser

Issue - State: closed - Opened by butsugiri over 4 years ago - 2 comments

#86 - pip3 install error: AttributeError: '_io.BufferedWriter' object has no attribute 'encoding'

Issue - State: closed - Opened by youssefavx over 4 years ago - 6 comments
Labels: bug

#85 - Fix normalize command

Pull Request - State: closed - Opened by myleott over 4 years ago - 1 comment

#84 - Deprecation warning due to invalid escape sequences in Python 3.8

Issue - State: closed - Opened by tirkarthi over 4 years ago - 2 comments
Labels: pythonic

#83 - LICENSE file

Issue - State: closed - Opened by noqcks over 4 years ago - 4 comments
Labels: license

#82 - Add lowercase script?

Issue - State: open - Opened by mayhewsw over 4 years ago - 20 comments
Labels: feature-request

#81 - Fix typo, up version

Pull Request - State: closed - Opened by alvations over 4 years ago

#80 - Added more normalization script

Pull Request - State: closed - Opened by alvations over 4 years ago

#79 - Space deduplication

Issue - State: open - Opened by alvations almost 5 years ago - 3 comments
Labels: help wanted, question

#78 - Patching single quotes normalization

Pull Request - State: closed - Opened by alvations almost 5 years ago - 1 comment

#76 - Difference between regex and re library

Issue - State: open - Opened by alvations almost 5 years ago - 2 comments
Labels: help wanted, question

#75 - Tag releases

Issue - State: closed - Opened by ryandesign almost 5 years ago - 1 comment

#74 - Apostrophes in English

Issue - State: open - Opened by j0hannes almost 5 years ago - 12 comments
Labels: bug, wontfix

#73 - Weird results for Tamil and Russian tokenization

Issue - State: closed - Opened by BLKSerene almost 5 years ago - 4 comments
Labels: bug

#72 - fix detokenization to add space between non-cjk and cjk tokens

Pull Request - State: closed - Opened by brandonherzog almost 5 years ago - 1 comment

#70 - Verbosity

Pull Request - State: closed - Opened by alvations about 5 years ago

#69 - Verbosity

Issue - State: closed - Opened by mjpost about 5 years ago - 4 comments
Labels: enhancement

#68 - Truecaser for foreign languages!

Issue - State: closed - Opened by kalyangvs about 5 years ago - 5 comments

#67 - Remove duplicated HANDLE_PSEUDO_SPACES in normalize.py

Pull Request - State: closed - Opened by shijie-wu about 5 years ago - 1 comment

#66 - Add missing languages for nonbreaking prefixes

Pull Request - State: closed - Opened by BLKSerene about 5 years ago - 1 comment

#65 - Be more specific with the variable name for custom nb prefixes

Pull Request - State: closed - Opened by alvations about 5 years ago

#64 - Added feature to add custom nb prefixes

Pull Request - State: closed - Opened by alvations about 5 years ago

#63 - Added chinese features

Pull Request - State: closed - Opened by alvations about 5 years ago

#62 - Patching slowness in v0.0.22

Pull Request - State: closed - Opened by alvations about 5 years ago

#61 - first call to MosesTokenizer.tokenize is very slow

Issue - State: closed - Opened by johnfarina about 5 years ago - 14 comments
Labels: bug, fixed

#60 - Updated with perluniprops `unichars -au`

Pull Request - State: closed - Opened by alvations about 5 years ago

#59 - Patching issues on tracker

Pull Request - State: closed - Opened by alvations about 5 years ago - 1 comment

#58 - No such file or directory: nonbreaking_prefix.es

Issue - State: closed - Opened by loretoparisi about 5 years ago - 2 comments

#57 - add missing import in utils

Pull Request - State: closed - Opened by yannvgn about 5 years ago - 1 comment

#56 - fix substitutions in normalizer

Pull Request - State: closed - Opened by yannvgn over 5 years ago - 1 comment

#55 - Truecaser crashes for large corpora (>8M segments)

Issue - State: closed - Opened by pypae over 5 years ago - 2 comments

#54 - Patch #52

Pull Request - State: closed - Opened by alvations over 5 years ago

#53 - Non-Breaking Prefixes are stripped

Issue - State: closed - Opened by pypae over 5 years ago - 2 comments

#52 - Comma after a number at the end of the sentence not split

Issue - State: closed - Opened by pypae over 5 years ago - 2 comments
Labels: bug

#51 - Issue in handles_nonbreaking_prefixes

Issue - State: closed - Opened by pypae over 5 years ago - 5 comments
Labels: bug

#49 - Truecaser Test dependency on norvig.com/big.txt

Issue - State: closed - Opened by DavidHarrison over 5 years ago - 3 comments

#46 - Added protected pattern feature and basic protected patterns

Pull Request - State: closed - Opened by alvations over 5 years ago - 1 comment

#44 - Support of Moses tokenizer Perl scripts

Issue - State: closed - Opened by loretoparisi over 5 years ago - 9 comments

#42 - Tokenization for Hindi (e.g. `क्या`) is weird

Issue - State: open - Opened by alvations over 5 years ago - 6 comments
Labels: bug

#39 - Patch truecaser for #16

Pull Request - State: closed - Opened by alvations over 5 years ago - 1 comment

#38 - Added normalizer from #17

Pull Request - State: closed - Opened by alvations over 5 years ago - 1 comment

#37 - Testing CLI commands

Issue - State: open - Opened by alvations over 5 years ago - 3 comments

#35 - Flag --protected from original Moses tokenizer

Issue - State: closed - Opened by noe over 5 years ago - 5 comments
Labels: enhancement

#21 - "p.m." is not tokenized as in the original script.

Issue - State: open - Opened by pypae over 5 years ago - 2 comments

#15 - Truecaser Save to File

Issue - State: closed - Opened by pypae over 5 years ago - 6 comments

#12 - Documentation

Issue - State: open - Opened by alvations almost 6 years ago - 5 comments

#10 - Added MosesTruecaser

Pull Request - State: closed - Opened by alvations almost 6 years ago - 1 comment