Ecosyste.ms: Issues

An open API service that provides issue and pull request metadata for open source projects.

GitHub / OpenNMT/Tokenizer issues and pull requests

#333 - Bump actions/download-artifact from 3 to 4.1.7 in /.github/workflows

Pull Request - State: open - Opened by dependabot[bot] 27 days ago
Labels: dependencies

#332 - Minor fixes addressing issue #330

Pull Request - State: open - Opened by hatboyzero 2 months ago

#331 - ICU_INCLUDE_DIRS for cli projects

Issue - State: open - Opened by emabiz 3 months ago

#330 - Seems min C++ standard is now C++17

Issue - State: open - Opened by royshil 3 months ago

#328 - How to return token ranges when tokenizing text?

Issue - State: open - Opened by anderleich 6 months ago

#326 - How to develop a C++ tokenizer for MarianMT using the OpenNMT Tokenizer

Issue - State: closed - Opened by Zapotecatl about 1 year ago - 3 comments

#325 - my source file doesn't compile

Issue - State: closed - Opened by AESTheProgrammer over 1 year ago - 2 comments

#324 - Update ICU to 73.2

Pull Request - State: closed - Opened by guillaumekln over 1 year ago

#322 - Consider escaped characters as single characters in BPE

Pull Request - State: closed - Opened by guillaumekln over 1 year ago

#318 - Make the DLL registration generic

Pull Request - State: closed - Opened by guillaumekln over 1 year ago

#317 - Update ICU to 72.1

Pull Request - State: closed - Opened by guillaumekln over 1 year ago

#316 - Add tokenization option allow_isolated_marks

Pull Request - State: closed - Opened by guillaumekln over 1 year ago

#313 - A strange segmentation occurs with a Thai example.

Issue - State: closed - Opened by l-k-11235 over 1 year ago - 6 comments

#312 - case annotation combined with other external features

Issue - State: closed - Opened by vince62s over 1 year ago - 4 comments

#311 - Expose Python function to check if a language code is valid

Pull Request - State: closed - Opened by guillaumekln over 1 year ago

#309 - [feature request] num_workers option in BPELearner

Issue - State: closed - Opened by vince62s almost 2 years ago - 3 comments

#308 - Weird character on tokenizer output (C++ only)

Issue - State: closed - Opened by A2va almost 2 years ago - 2 comments

#307 - Update GitHub Actions to fix warnings

Pull Request - State: closed - Opened by guillaumekln almost 2 years ago

#306 - Fix macOS build in CI

Pull Request - State: closed - Opened by guillaumekln almost 2 years ago

#305 - Update cibuildwheel to 2.11.2

Pull Request - State: closed - Opened by guillaumekln almost 2 years ago

#304 - Update pybind11 to 2.10.1

Pull Request - State: closed - Opened by guillaumekln almost 2 years ago

#303 - Implement pickle support for Vocab objects

Pull Request - State: closed - Opened by guillaumekln almost 2 years ago

#302 - Update GoogleTest to 1.12.1 to fix CMake warning

Pull Request - State: closed - Opened by guillaumekln about 2 years ago

#301 - Fix static compilation

Pull Request - State: closed - Opened by panosk about 2 years ago

#300 - Question about CMakeLists.txt's "create_library" condition

Issue - State: closed - Opened by panosk about 2 years ago - 1 comment

#299 - Update ICU to 71.1

Pull Request - State: closed - Opened by guillaumekln about 2 years ago

#298 - Handle error cases when reading token frequencies from the vocab file

Pull Request - State: closed - Opened by guillaumekln about 2 years ago

#297 - Build wheels for Python 3.11

Pull Request - State: closed - Opened by guillaumekln about 2 years ago

#296 - Build ARM64 wheels for macOS

Pull Request - State: closed - Opened by guillaumekln about 2 years ago

#295 - Fix error when --segment_alphabet option is not set

Pull Request - State: closed - Opened by guillaumekln about 2 years ago

#294 - step-by-step to compile from source ?

Issue - State: closed - Opened by vince62s about 2 years ago - 3 comments

#292 - Only set flag -Wno-stringop-overflow for GCC

Pull Request - State: closed - Opened by guillaumekln about 2 years ago

#291 - Update to macOS 11 runner as macOS 10.15 runner is deprecated

Pull Request - State: closed - Opened by guillaumekln about 2 years ago

#290 - Update cxxopts to 3.0.0

Pull Request - State: closed - Opened by guillaumekln about 2 years ago

#289 - Update cibuildwheel to 2.8.1

Pull Request - State: closed - Opened by guillaumekln about 2 years ago

#288 - Update pybind11 to 2.10.0

Pull Request - State: closed - Opened by guillaumekln about 2 years ago

#287 - Expose token counters from Vocab class

Pull Request - State: closed - Opened by guillaumekln about 2 years ago - 1 comment

#286 - Expose frequencies for Vocab?

Issue - State: closed - Opened by Zenglinxiao about 2 years ago - 2 comments
Labels: enhancement

#285 - Build ARM64 wheels for macOS

Issue - State: closed - Opened by guillaumekln over 2 years ago - 19 comments
Labels: enhancement, help wanted

#284 - speed

Issue - State: closed - Opened by rudyyin over 2 years ago - 2 comments

#283 - save learned bpe model

Issue - State: closed - Opened by rudyyin over 2 years ago - 2 comments

#282 - question related to the new features re: vocab

Issue - State: closed - Opened by vince62s over 2 years ago - 5 comments

#281 - Add a Vocab class and related functions

Pull Request - State: closed - Opened by guillaumekln over 2 years ago - 4 comments

#280 - Add basic Tokenizer.__call__ method

Pull Request - State: closed - Opened by guillaumekln over 2 years ago

#279 - [Question] build vocab in this library ?

Issue - State: closed - Opened by vince62s over 2 years ago - 3 comments

#278 - [Question] tokenize list of files ?

Issue - State: closed - Opened by vince62s over 2 years ago - 3 comments

#277 - Update pybind11 to 2.9.1

Pull Request - State: closed - Opened by guillaumekln over 2 years ago

#276 - Update black to stable release

Pull Request - State: closed - Opened by guillaumekln over 2 years ago

#275 - Improve lang validity check

Pull Request - State: closed - Opened by guillaumekln over 2 years ago

#274 - case_markup changes critical for trained models

Issue - State: closed - Opened by anderleich over 2 years ago - 12 comments

#273 - Add support to release aarch64 wheels

Issue - State: closed - Opened by odidev almost 3 years ago

#272 - Add aarch64 wheel build support

Pull Request - State: closed - Opened by odidev almost 3 years ago - 2 comments

#271 - Set explicit DLL load order

Pull Request - State: closed - Opened by guillaumekln almost 3 years ago

#270 - Add tokenize_batch in Python API reference

Pull Request - State: closed - Opened by guillaumekln almost 3 years ago

#269 - Build wheels for Python 3.10 and drop Python 3.5

Pull Request - State: closed - Opened by guillaumekln almost 3 years ago

#268 - Add tokenize_batch method in Python

Pull Request - State: closed - Opened by guillaumekln almost 3 years ago

#267 - Improve check for invalid escape sequences

Pull Request - State: closed - Opened by guillaumekln about 3 years ago

#266 - Support with_separators in detokenization

Pull Request - State: closed - Opened by guillaumekln about 3 years ago

#265 - Export __version__ variable in Python module

Pull Request - State: closed - Opened by guillaumekln about 3 years ago

#264 - Reformat Python code with Black

Pull Request - State: closed - Opened by guillaumekln about 3 years ago

#263 - Remove the SpaceTokenizer class

Pull Request - State: closed - Opened by guillaumekln about 3 years ago

#262 - Expose with_separators option in Python and CLI

Pull Request - State: closed - Opened by guillaumekln about 3 years ago

#261 - Allow setting a custom tokens delimiter when writing to files

Pull Request - State: closed - Opened by guillaumekln about 3 years ago

#260 - Build Python wheels for Windows

Pull Request - State: closed - Opened by guillaumekln about 3 years ago

#259 - AttributeError: module 'tempfile' has no attribute 'mkstemp'

Issue - State: closed - Opened by BrightXiaoHan about 3 years ago - 1 comment

#258 - Avoid a copy when returning Token objects from Python

Pull Request - State: closed - Opened by guillaumekln about 3 years ago

#257 - Verify escape sequence during detokenization

Pull Request - State: closed - Opened by guillaumekln about 3 years ago

#256 - Update SentencePiece to 0.1.96

Pull Request - State: closed - Opened by guillaumekln about 3 years ago

#255 - Upgrade the Python wheels build environment

Pull Request - State: closed - Opened by guillaumekln about 3 years ago

#254 - Cleanup shared_ptr creation in Python wrapper

Pull Request - State: closed - Opened by guillaumekln about 3 years ago

#253 - Cleanup C++ tests

Pull Request - State: closed - Opened by guillaumekln about 3 years ago

#252 - Fix casing resolution when some letters do not have case information

Pull Request - State: closed - Opened by guillaumekln about 3 years ago

#251 - Fix regression in last commit for preserved tokens and BPE subword

Pull Request - State: closed - Opened by guillaumekln over 3 years ago

#250 - Fix handling of detached spacers for single SentencePiece subword

Pull Request - State: closed - Opened by guillaumekln over 3 years ago

#249 - Fix divergence with SentencePiece for detached leading spacers

Pull Request - State: closed - Opened by guillaumekln over 3 years ago

#247 - Set upper bound to required Python version

Pull Request - State: closed - Opened by guillaumekln over 3 years ago

#246 - What's the difference between this tokenizer and Moses tokenizer, or Sacremoses

Issue - State: closed - Opened by gdxie1 over 3 years ago - 1 comment
Labels: question

#245 - Require ICU 60 or greater to use lang option

Pull Request - State: closed - Opened by guillaumekln over 3 years ago

#244 - Minor optimization to the code point to string conversion

Pull Request - State: closed - Opened by guillaumekln over 3 years ago

#243 - Update Unicode code point <-> string conversions

Pull Request - State: closed - Opened by guillaumekln over 3 years ago

#242 - Remove unnecessary CI step

Pull Request - State: closed - Opened by guillaumekln over 3 years ago

#241 - Update cibuildwheel to 1.10.0

Pull Request - State: closed - Opened by guillaumekln over 3 years ago

#240 - Add lang argument and apply locale-dependent recasing

Pull Request - State: closed - Opened by guillaumekln over 3 years ago - 5 comments

#239 - Potential undesirable effects with SoftCaseRegions and normalization suggestion

Issue - State: closed - Opened by panosk over 3 years ago - 6 comments

#238 - Check tokenization mode when enabling case_markup

Pull Request - State: closed - Opened by guillaumekln over 3 years ago

#237 - Implement __len__ method in Token class

Pull Request - State: closed - Opened by guillaumekln over 3 years ago

#236 - Cleanup some manual Python <-> C++ types conversion

Pull Request - State: closed - Opened by guillaumekln over 3 years ago

#235 - Update Python package metadata

Pull Request - State: closed - Opened by guillaumekln over 3 years ago

#234 - Release Python GIL in tokenize method

Pull Request - State: closed - Opened by guillaumekln over 3 years ago

#233 - Add training flag to enable/disable subword regularization

Pull Request - State: closed - Opened by guillaumekln over 3 years ago