Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / OpenNMT/Tokenizer issues and pull requests
#333 - Bump actions/download-artifact from 3 to 4.1.7 in /.github/workflows
Pull Request -
State: open - Opened by dependabot[bot] 27 days ago
Labels: dependencies
#332 - Minor fixes addressing issue #330
Pull Request -
State: open - Opened by hatboyzero 2 months ago
#331 - ICU_INCLUDE_DIRS for cli projects
Issue -
State: open - Opened by emabiz 3 months ago
#330 - Seems min C++ standard is now C++17
Issue -
State: open - Opened by royshil 3 months ago
#329 - pyonmttok installation fails while installing OpenNMT-py with python 3.12
Issue -
State: open - Opened by ramanirudh 4 months ago
- 1 comment
#328 - How to return token ranges when tokenizing text?
Issue -
State: open - Opened by anderleich 6 months ago
#326 - How to develop a C++ tokenizer for MarianMT using the OpenNMT Tokenizer
Issue -
State: closed - Opened by Zapotecatl about 1 year ago
- 3 comments
#325 - my source file doesn't compile
Issue -
State: closed - Opened by AESTheProgrammer over 1 year ago
- 2 comments
#324 - Update ICU to 73.2
Pull Request -
State: closed - Opened by guillaumekln over 1 year ago
#323 - SentencePiece 0.1.97 changed API to take `std::vector<string_view>` "everywhere", breaking SetVocabulary here
Issue -
State: open - Opened by Micket over 1 year ago
- 2 comments
#322 - Consider escaped characters as single characters in BPE
Pull Request -
State: closed - Opened by guillaumekln over 1 year ago
#321 - Ignore undefined scripts when resolving inherited or common scripts
Pull Request -
State: closed - Opened by guillaumekln over 1 year ago
#320 - Fix BPELearner segfault when there is not a single pair of characters
Pull Request -
State: closed - Opened by guillaumekln over 1 year ago
#319 - Fix infinite loop in character_iterator when invalid code point is encountered
Pull Request -
State: closed - Opened by NM-20 over 1 year ago
#318 - Make the DLL registration generic
Pull Request -
State: closed - Opened by guillaumekln over 1 year ago
#317 - Update ICU to 72.1
Pull Request -
State: closed - Opened by guillaumekln over 1 year ago
#316 - Add tokenization option allow_isolated_marks
Pull Request -
State: closed - Opened by guillaumekln over 1 year ago
#315 - C++ library's "tokenize" function causes application to hang with specific characters
Issue -
State: closed - Opened by NM-20 over 1 year ago
- 24 comments
#314 - C++ library's "tokenize" function causes application to hang with specific characters
Issue -
State: closed - Opened by NM-20 over 1 year ago
#313 - A strange segmentation occurs with a Thai example.
Issue -
State: closed - Opened by l-k-11235 over 1 year ago
- 6 comments
#312 - case annotation combined with other external features
Issue -
State: closed - Opened by vince62s over 1 year ago
- 4 comments
#311 - Expose Python function to check if a language code is valid
Pull Request -
State: closed - Opened by guillaumekln over 1 year ago
#310 - Add Tokenizer argument `vocabulary` to directly pass a list of tokens
Pull Request -
State: closed - Opened by guillaumekln over 1 year ago
#309 - [feature request] num_workers option in BPELearner
Issue -
State: closed - Opened by vince62s almost 2 years ago
- 3 comments
#308 - Weird character on tokenizer output (C++ only)
Issue -
State: closed - Opened by A2va almost 2 years ago
- 2 comments
#307 - Update GitHub Actions to fix warnings
Pull Request -
State: closed - Opened by guillaumekln almost 2 years ago
#306 - Fix macOS build in CI
Pull Request -
State: closed - Opened by guillaumekln almost 2 years ago
#305 - Update cibuildwheel to 2.11.2
Pull Request -
State: closed - Opened by guillaumekln almost 2 years ago
#304 - Update pybind11 to 2.10.1
Pull Request -
State: closed - Opened by guillaumekln almost 2 years ago
#303 - Implement pickle support for Vocab objects
Pull Request -
State: closed - Opened by guillaumekln almost 2 years ago
#302 - Update GoogleTest to 1.12.1 to fix CMake warning
Pull Request -
State: closed - Opened by guillaumekln about 2 years ago
#301 - Fix static compilation
Pull Request -
State: closed - Opened by panosk about 2 years ago
#300 - Question about CMakeLists.txt's "create_library" condition
Issue -
State: closed - Opened by panosk about 2 years ago
- 1 comment
#299 - Update ICU to 71.1
Pull Request -
State: closed - Opened by guillaumekln about 2 years ago
#298 - Handle error cases when reading token frequencies from the vocab file
Pull Request -
State: closed - Opened by guillaumekln about 2 years ago
#297 - Build wheels for Python 3.11
Pull Request -
State: closed - Opened by guillaumekln about 2 years ago
#296 - Build ARM64 wheels for macOS
Pull Request -
State: closed - Opened by guillaumekln about 2 years ago
#295 - Fix error when --segment_alphabet option is not set
Pull Request -
State: closed - Opened by guillaumekln about 2 years ago
#294 - step-by-step to compile from source ?
Issue -
State: closed - Opened by vince62s about 2 years ago
- 3 comments
#293 - After "import pyonmttok", torch will report ERROR "Segmentation fault (core dumped) "
Issue -
State: closed - Opened by areaChun about 2 years ago
- 2 comments
#292 - Only set flag -Wno-stringop-overflow for GCC
Pull Request -
State: closed - Opened by guillaumekln about 2 years ago
#291 - Update to macOS 11 runner as macOS 10.15 runner is deprecated
Pull Request -
State: closed - Opened by guillaumekln about 2 years ago
#290 - Update cxxopts to 3.0.0
Pull Request -
State: closed - Opened by guillaumekln about 2 years ago
#289 - Update cibuildwheel to 2.8.1
Pull Request -
State: closed - Opened by guillaumekln about 2 years ago
#288 - Update pybind11 to 2.10.0
Pull Request -
State: closed - Opened by guillaumekln about 2 years ago
#287 - Expose token counters from Vocab class
Pull Request -
State: closed - Opened by guillaumekln about 2 years ago
- 1 comment
#286 - Expose frequencies for Vocab?
Issue -
State: closed - Opened by Zenglinxiao about 2 years ago
- 2 comments
Labels: enhancement
#285 - Build ARM64 wheels for macOS
Issue -
State: closed - Opened by guillaumekln over 2 years ago
- 19 comments
Labels: enhancement, help wanted
#284 - speed
Issue -
State: closed - Opened by rudyyin over 2 years ago
- 2 comments
#283 - save learned bpe model
Issue -
State: closed - Opened by rudyyin over 2 years ago
- 2 comments
#282 - question related to the new features re: vocab
Issue -
State: closed - Opened by vince62s over 2 years ago
- 5 comments
#281 - Add a Vocab class and related functions
Pull Request -
State: closed - Opened by guillaumekln over 2 years ago
- 4 comments
#280 - Add basic Tokenizer.__call__ method
Pull Request -
State: closed - Opened by guillaumekln over 2 years ago
#279 - [Question] build vocab in this library ?
Issue -
State: closed - Opened by vince62s over 2 years ago
- 3 comments
#278 - [Question] tokenize list of files ?
Issue -
State: closed - Opened by vince62s over 2 years ago
- 3 comments
#277 - Update pybind11 to 2.9.1
Pull Request -
State: closed - Opened by guillaumekln over 2 years ago
#276 - Update black to stable release
Pull Request -
State: closed - Opened by guillaumekln over 2 years ago
#275 - Improve lang validity check
Pull Request -
State: closed - Opened by guillaumekln over 2 years ago
#274 - case_markup changes critical for trained models
Issue -
State: closed - Opened by anderleich over 2 years ago
- 12 comments
#273 - Add support to release aarch64 wheels
Issue -
State: closed - Opened by odidev almost 3 years ago
#272 - Add aarch64 wheel build support
Pull Request -
State: closed - Opened by odidev almost 3 years ago
- 2 comments
#271 - Set explicit DLL load order
Pull Request -
State: closed - Opened by guillaumekln almost 3 years ago
#270 - Add tokenize_batch in Python API reference
Pull Request -
State: closed - Opened by guillaumekln almost 3 years ago
#269 - Build wheels for Python 3.10 and drop Python 3.5
Pull Request -
State: closed - Opened by guillaumekln almost 3 years ago
#268 - Add tokenize_batch method in Python
Pull Request -
State: closed - Opened by guillaumekln almost 3 years ago
#267 - Improve check for invalid escape sequences
Pull Request -
State: closed - Opened by guillaumekln about 3 years ago
#266 - Support with_separators in detokenization
Pull Request -
State: closed - Opened by guillaumekln about 3 years ago
#265 - Export __version__ variable in Python module
Pull Request -
State: closed - Opened by guillaumekln about 3 years ago
#264 - Reformat Python code with Black
Pull Request -
State: closed - Opened by guillaumekln about 3 years ago
#263 - Remove the SpaceTokenizer class
Pull Request -
State: closed - Opened by guillaumekln about 3 years ago
#262 - Expose with_separators option in Python and CLI
Pull Request -
State: closed - Opened by guillaumekln about 3 years ago
#261 - Allow setting a custom tokens delimiter when writing to files
Pull Request -
State: closed - Opened by guillaumekln about 3 years ago
#260 - Build Python wheels for Windows
Pull Request -
State: closed - Opened by guillaumekln about 3 years ago
#259 - AttributeError: module 'tempfile' has no attribute 'mkstemp'
Issue -
State: closed - Opened by BrightXiaoHan about 3 years ago
- 1 comment
#258 - Avoid a copy when returning Token objects from Python
Pull Request -
State: closed - Opened by guillaumekln about 3 years ago
#257 - Verify escape sequence during detokenization
Pull Request -
State: closed - Opened by guillaumekln about 3 years ago
#256 - Update SentencePiece to 0.1.96
Pull Request -
State: closed - Opened by guillaumekln about 3 years ago
#255 - Upgrade the Python wheels build environment
Pull Request -
State: closed - Opened by guillaumekln about 3 years ago
#254 - Cleanup shared_ptr creation in Python wrapper
Pull Request -
State: closed - Opened by guillaumekln about 3 years ago
#253 - Cleanup C++ tests
Pull Request -
State: closed - Opened by guillaumekln about 3 years ago
#252 - Fix casing resolution when some letters do not have case information
Pull Request -
State: closed - Opened by guillaumekln about 3 years ago
#251 - Fix regression in last commit for preserved tokens and BPE subword
Pull Request -
State: closed - Opened by guillaumekln over 3 years ago
#250 - Fix handling of detached spacers for single SentencePiece subword
Pull Request -
State: closed - Opened by guillaumekln over 3 years ago
#249 - Fix divergence with SentencePiece for detached leading spacers
Pull Request -
State: closed - Opened by guillaumekln over 3 years ago
#248 - Fix application of subword vocabulary for tokens with "preserve" flag
Pull Request -
State: closed - Opened by guillaumekln over 3 years ago
#247 - Set upper bound to required Python version
Pull Request -
State: closed - Opened by guillaumekln over 3 years ago
#246 - What's the difference between this tokenizer and Moses tokenizer, or Sacremoses
Issue -
State: closed - Opened by gdxie1 over 3 years ago
- 1 comment
Labels: question
#245 - Require ICU 60 or greater to use lang option
Pull Request -
State: closed - Opened by guillaumekln over 3 years ago
#244 - Minor optimization to the code point to string conversion
Pull Request -
State: closed - Opened by guillaumekln over 3 years ago
#243 - Update Unicode code point <-> string conversions
Pull Request -
State: closed - Opened by guillaumekln over 3 years ago
#242 - Remove unnecessary CI step
Pull Request -
State: closed - Opened by guillaumekln over 3 years ago
#241 - Update cibuildwheel to 1.10.0
Pull Request -
State: closed - Opened by guillaumekln over 3 years ago
#240 - Add lang argument and apply locale-dependent recasing
Pull Request -
State: closed - Opened by guillaumekln over 3 years ago
- 5 comments
#239 - Potential undesirable effects with SoftCaseRegions and normalization suggestion
Issue -
State: closed - Opened by panosk over 3 years ago
- 6 comments
#238 - Check tokenization mode when enabling case_markup
Pull Request -
State: closed - Opened by guillaumekln over 3 years ago
#237 - Implement __len__ method in Token class
Pull Request -
State: closed - Opened by guillaumekln over 3 years ago
#236 - Cleanup some manual Python <-> C++ types conversion
Pull Request -
State: closed - Opened by guillaumekln over 3 years ago
#235 - Update Python package metadata
Pull Request -
State: closed - Opened by guillaumekln over 3 years ago
#234 - Release Python GIL in tokenize method
Pull Request -
State: closed - Opened by guillaumekln over 3 years ago
#233 - Add training flag to enable/disable subword regularization
Pull Request -
State: closed - Opened by guillaumekln over 3 years ago