Ecosyste.ms: Issues
An open API service that provides issue and pull request metadata for open source projects.
GitHub / alvations/sacremoses issues and pull requests
#138 - Web and basic protected patterns by default
Pull Request -
State: open - Opened by samirsalman over 1 year ago
#137 - Quiet flag has no effect on the detokenizer
Issue -
State: open - Opened by XapaJIaMnu over 1 year ago
#136 - [Question] Why is <unk> token tokenized into 3 items?
Issue -
State: open - Opened by MarkusSpanring almost 2 years ago
#135 - Is this package multi-threaded?
Issue -
State: open - Opened by ejkitchen about 2 years ago
#134 - Clean up Python 2 compatibility-related codes
Pull Request -
State: closed - Opened by BLKSerene about 2 years ago
- 1 comment
#133 - compile regex objects ahead of time for improved perf.
Pull Request -
State: open - Opened by erip about 2 years ago
- 1 comment
#132 - Add replacement for space before special symbols
Pull Request -
State: closed - Opened by dimitrityan over 2 years ago
#131 - Restrict click to be <8.1
Pull Request -
State: open - Opened by alecbrick over 2 years ago
- 3 comments
#130 - Error with CLI `tokenize` using `click==8.1.3`
Issue -
State: closed - Opened by gsarti over 2 years ago
- 10 comments
Labels: bug
#129 - New release?
Issue -
State: closed - Opened by erip over 2 years ago
- 4 comments
#128 - Code cleanup
Pull Request -
State: closed - Opened by BLKSerene over 2 years ago
- 1 comment
#127 - use setuptools instead of distutils
Pull Request -
State: closed - Opened by alvations over 2 years ago
#126 - distutils is deprecated in Python 3.10
Issue -
State: closed - Opened by tirkarthi over 2 years ago
- 1 comment
#125 - Which of the 100 languages used in mBERT are not supported by this tokenizer?
Issue -
State: closed - Opened by fake-warrior8 almost 3 years ago
- 1 comment
#124 - Cache downloaded tokenizer files
Issue -
State: closed - Opened by masus04 almost 3 years ago
- 1 comment
#123 - Is there a way to use sacremoses in java?
Issue -
State: closed - Opened by SefaZeng almost 3 years ago
- 1 comment
Labels: feature-request
#122 - Trying to get in touch regarding a security issue
Issue -
State: closed - Opened by JamieSlome about 3 years ago
- 2 comments
#121 - Fixed protected patterns, truecase logic
Pull Request -
State: open - Opened by pluiez about 3 years ago
#120 - Chinese full stop “。” can't be split.
Issue -
State: open - Opened by BrightXiaoHan about 3 years ago
- 1 comment
#119 - Add missing language for nonbreaking prefixes
Pull Request -
State: closed - Opened by BLKSerene over 3 years ago
#118 - deep detokenizer
Issue -
State: open - Opened by chessgecko over 3 years ago
#117 - can't tokenise the period properly
Issue -
State: open - Opened by gdxie1 over 3 years ago
#116 - No detokenize_penn?
Issue -
State: open - Opened by mfelice over 3 years ago
#115 - update release on github
Issue -
State: open - Opened by PeterLijunfeng over 3 years ago
#114 - Add tokenization for Tetun (tdt)
Pull Request -
State: open - Opened by raphaelmerx over 3 years ago
#113 - Truecaser seems do not process sentences beginning with quotation marks
Issue -
State: open - Opened by SunshineBot almost 4 years ago
#112 - bug report
Issue -
State: open - Opened by Brucewuzhang almost 4 years ago
#111 - fix "int' + 'str' error, revert to what mosesdecoder does
Pull Request -
State: open - Opened by Brucewuzhang almost 4 years ago
- 1 comment
#110 - Is this normal to tokenize "His number is No.123." to ['His', 'number', 'is', 'No.123.']. Should it be ['His', 'number', 'is', 'No.123', '.']?
Issue -
State: closed - Opened by BrightXiaoHan about 4 years ago
#109 - Bug: normalize prints extra newline
Issue -
State: closed - Opened by mjpost about 4 years ago
- 1 comment
Labels: bug
#108 - Fix detruecaser when the first token is all-caps
Pull Request -
State: closed - Opened by yuyang-huang about 4 years ago
- 2 comments
#107 - error: loading model for detruecaser
Issue -
State: open - Opened by colingair about 4 years ago
#106 - detrucase Error: unsupported operand type(s)
Issue -
State: open - Opened by colingair about 4 years ago
#105 - truecase training killed: 9
Issue -
State: open - Opened by colingair about 4 years ago
- 1 comment
#104 - License Clarification
Issue -
State: closed - Opened by TRQ3 about 4 years ago
- 1 comment
#103 - Improvements: virama and nukthas of Indic languages, easy way to specify basic protected patterns
Pull Request -
State: closed - Opened by thammegowda about 4 years ago
- 5 comments
Labels: bug, enhancement, awesome-contribution
#102 - Possible to retrain/keep training an existing model?
Issue -
State: open - Opened by petulla over 4 years ago
- 5 comments
Labels: question, feature-request
#101 - Change to chain.from_iterable in truecase.py
Pull Request -
State: closed - Opened by cool-RR over 4 years ago
- 12 comments
#100 - Use chain.from_iterable in normalize.py
Pull Request -
State: closed - Opened by cool-RR over 4 years ago
- 7 comments
#99 - Is there a plan to have sent_tokenize in this library?
Issue -
State: open - Opened by jeremyasapp over 4 years ago
- 2 comments
Labels: enhancement, feature-request
#98 - Tokenizer -x option is confusing
Issue -
State: open - Opened by ZJaume over 4 years ago
- 3 comments
Labels: enhancement, cli
#97 - Non-breaking prefixes at the end of the sentence
Issue -
State: closed - Opened by ZJaume over 4 years ago
- 2 comments
Labels: duplicate, wontfix
#96 - New pipeline feature!
Pull Request -
State: closed - Opened by alvations over 4 years ago
- 2 comments
#95 - Dropping Python 2.7 support.
Issue -
State: closed - Opened by alvations over 4 years ago
- 1 comment
Labels: pythonic
#94 - Fixes deprecation warning
Pull Request -
State: closed - Opened by alvations over 4 years ago
#93 - Relicense to MIT
Pull Request -
State: closed - Opened by alvations over 4 years ago
- 2 comments
Labels: help wanted, license
#92 - Create LICENSE
Pull Request -
State: closed - Opened by alvations over 4 years ago
#91 - PyPI tarball contains code with syntax errors
Issue -
State: closed - Opened by adamjstewart over 4 years ago
- 3 comments
Labels: bug
#90 - Escape option on command line
Issue -
State: closed - Opened by felipealco over 4 years ago
- 1 comment
#89 - normalize broken
Issue -
State: closed - Opened by mjpost over 4 years ago
- 2 comments
Labels: bug
#88 - Fixing typos in truecase function
Pull Request -
State: closed - Opened by HaukurPall over 4 years ago
- 1 comment
Labels: bug
#87 - NameError: name 'words' is not defined in truecaser
Issue -
State: closed - Opened by butsugiri over 4 years ago
- 2 comments
#86 - pip3 install error: AttributeError: '_io.BufferedWriter' object has no attribute 'encoding'
Issue -
State: closed - Opened by youssefavx over 4 years ago
- 6 comments
Labels: bug
#85 - Fix normalize command
Pull Request -
State: closed - Opened by myleott over 4 years ago
- 1 comment
#84 - Deprecation warning due to invalid escape sequences in Python 3.8
Issue -
State: closed - Opened by tirkarthi over 4 years ago
- 2 comments
Labels: pythonic
#83 - LICENSE file
Issue -
State: closed - Opened by noqcks over 4 years ago
- 4 comments
Labels: license
#82 - Add lowercase script?
Issue -
State: open - Opened by mayhewsw over 4 years ago
- 20 comments
Labels: feature-request
#81 - Fix typo, up version
Pull Request -
State: closed - Opened by alvations over 4 years ago
#80 - Added more normalization script
Pull Request -
State: closed - Opened by alvations over 4 years ago
#79 - Space deduplication
Issue -
State: open - Opened by alvations almost 5 years ago
- 3 comments
Labels: help wanted, question
#78 - Patching single quotes normalization
Pull Request -
State: closed - Opened by alvations almost 5 years ago
- 1 comment
#77 - Check for custom non-breaking prefixes file before loading default prefixes
Pull Request -
State: closed - Opened by alvations almost 5 years ago
#76 - Difference between regex and re library
Issue -
State: open - Opened by alvations almost 5 years ago
- 2 comments
Labels: help wanted, question
#75 - Tag releases
Issue -
State: closed - Opened by ryandesign almost 5 years ago
- 1 comment
#74 - Apostrophes in English
Issue -
State: open - Opened by j0hannes almost 5 years ago
- 12 comments
Labels: bug, wontfix
#73 - Weird results for Tamil and Russian tokenization
Issue -
State: closed - Opened by BLKSerene almost 5 years ago
- 4 comments
Labels: bug
#72 - fix detokenization to add space between non-cjk and cjk tokens
Pull Request -
State: closed - Opened by brandonherzog almost 5 years ago
- 1 comment
#71 - detokenization does not add a space between Chinese/Japanese characters and non-CJK characters
Issue -
State: closed - Opened by brandonherzog almost 5 years ago
- 1 comment
#70 - Verbosity
Pull Request -
State: closed - Opened by alvations about 5 years ago
#69 - Verbosity
Issue -
State: closed - Opened by mjpost about 5 years ago
- 4 comments
Labels: enhancement
#68 - Truecaser for foreign languages!
Issue -
State: closed - Opened by kalyangvs about 5 years ago
- 5 comments
#67 - Remove duplicated HANDLE_PSEUDO_SPACES in normalize.py
Pull Request -
State: closed - Opened by shijie-wu about 5 years ago
- 1 comment
#66 - Add missing languages for nonbreaking prefixes
Pull Request -
State: closed - Opened by BLKSerene about 5 years ago
- 1 comment
#65 - Be more specific with the variable name for custom nb prefixes
Pull Request -
State: closed - Opened by alvations about 5 years ago
#64 - Added feature to add custom nb prefixes
Pull Request -
State: closed - Opened by alvations about 5 years ago
#63 - Added chinese features
Pull Request -
State: closed - Opened by alvations about 5 years ago
#62 - Patching slowness in v0.0.22
Pull Request -
State: closed - Opened by alvations about 5 years ago
#61 - first call to MosesTokenizer.tokenize is very slow
Issue -
State: closed - Opened by johnfarina about 5 years ago
- 14 comments
Labels: bug, fixed
#60 - Updated with perluniprops `unichars -au`
Pull Request -
State: closed - Opened by alvations about 5 years ago
#59 - Patching issues on tracker
Pull Request -
State: closed - Opened by alvations about 5 years ago
- 1 comment
#58 - No such file or directory: nonbreaking_prefix.es
Issue -
State: closed - Opened by loretoparisi about 5 years ago
- 2 comments
#57 - add missing import in utils
Pull Request -
State: closed - Opened by yannvgn about 5 years ago
- 1 comment
#56 - fix substitutions in normalizer
Pull Request -
State: closed - Opened by yannvgn over 5 years ago
- 1 comment
#55 - Truecaser crashes for large corpora (>8M segments)
Issue -
State: closed - Opened by pypae over 5 years ago
- 2 comments
#54 - Patch #52
Pull Request -
State: closed - Opened by alvations over 5 years ago
#53 - Non-Breaking Prefixes are stripped
Issue -
State: closed - Opened by pypae over 5 years ago
- 2 comments
#52 - Comma after a number at the end of the sentence not split
Issue -
State: closed - Opened by pypae over 5 years ago
- 2 comments
Labels: bug
#51 - Issue in handles_nonbreaking_prefixes
Issue -
State: closed - Opened by pypae over 5 years ago
- 5 comments
Labels: bug
#49 - Truecaser Test dependency on norvig.com/big.txt
Issue -
State: closed - Opened by DavidHarrison over 5 years ago
- 3 comments
#46 - Added protected pattern feature and basic protected patterns
Pull Request -
State: closed - Opened by alvations over 5 years ago
- 1 comment
#44 - Support of Moses tokenizer Perl scripts
Issue -
State: closed - Opened by loretoparisi over 5 years ago
- 9 comments
#42 - Tokenization for Hindi (e.g. `क्या`) is weird
Issue -
State: open - Opened by alvations over 5 years ago
- 6 comments
Labels: bug
#39 - Patch truecaser for #16
Pull Request -
State: closed - Opened by alvations over 5 years ago
- 1 comment
#38 - Added normalizer from #17
Pull Request -
State: closed - Opened by alvations over 5 years ago
- 1 comment
#37 - Testing CLI commands
Issue -
State: open - Opened by alvations over 5 years ago
- 3 comments
#35 - Flag --protected from original Moses tokenizer
Issue -
State: closed - Opened by noe over 5 years ago
- 5 comments
Labels: enhancement
#21 - "p.m." is not tokenized as in the original script.
Issue -
State: open - Opened by pypae over 5 years ago
- 2 comments
#15 - Truecaser Save to File
Issue -
State: closed - Opened by pypae over 5 years ago
- 6 comments
#12 - Documentation
Issue -
State: open - Opened by alvations almost 6 years ago
- 5 comments
#10 - Added MosesTruecaser
Pull Request -
State: closed - Opened by alvations almost 6 years ago
- 1 comment