Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / google/sentencepiece issues and pull requests

#798 - IndexError: Out of range: piece id is out of range.

Issue - State: closed - Opened by lytgyx about 2 years ago - 1 comment

#796 - Cannot install sentencepiece with Python 3.11 on Windows

Issue - State: closed - Opened by kbatsuren about 2 years ago - 2 comments

#795 - CMake need endif

Pull Request - State: closed - Opened by A2va about 2 years ago

#794 - sentencepiece 0.1.97 re-released?

Issue - State: closed - Opened by kenhys about 2 years ago - 2 comments

#793 - Disable shared build on windows

Pull Request - State: closed - Opened by A2va about 2 years ago

#792 - add CIFuzz GitHub action

Pull Request - State: closed - Opened by DavidKorczynski over 2 years ago

#791 - Continuous Tokenizer Training

Issue - State: closed - Opened by dszhengyu over 2 years ago - 1 comment

#790 - Recommended corpus size

Issue - State: closed - Opened by astariul over 2 years ago - 1 comment

#788 - `not a mach-o file` error on Jupyter M2 Mac

Issue - State: closed - Opened by mattlinares over 2 years ago - 2 comments

#787 - Build with protobuf in system

Issue - State: closed - Opened by acane77 over 2 years ago - 3 comments
Labels: bug, enhancement

#785 - Linkage error

Issue - State: closed - Opened by A2va over 2 years ago - 3 comments

#782 - Even with the sampling I get OOM

Issue - State: closed - Opened by lfoppiano over 2 years ago - 3 comments

#780 - Enable iOS builds

Pull Request - State: closed - Opened by jplu over 2 years ago - 1 comment

#770 - about running spm.SentencePieceTrainer.Train()?

Issue - State: closed - Opened by Joll123 over 2 years ago - 2 comments

#763 - Difficulty installing on M1 mac (solved)

Issue - State: closed - Opened by johnmcdonnell over 2 years ago - 2 comments

#756 - Fix a typo

Pull Request - State: closed - Opened by kenhys over 2 years ago

#748 - Any way to load from Huggingface `tokenizer.json` file?

Issue - State: closed - Opened by jbmaxwell almost 3 years ago - 6 comments

#741 - “sentencepiece_processor.h”: No such file or directory

Issue - State: closed - Opened by Helmsman-Lab almost 3 years ago - 3 comments

#738 - Exception occurs when reading the saved model again

Issue - State: closed - Opened by zchuz almost 3 years ago - 2 comments

#732 - Sentencepiece installation fails on Python 3.10

Issue - State: closed - Opened by tsharish almost 3 years ago - 4 comments

#726 - SentencePieceProcessor has no attribute Encode

Issue - State: closed - Opened by prashantserai about 3 years ago - 1 comment

#723 - Bug: can't co-exist with pytorch-lightning

Issue - State: closed - Opened by jordane95 about 3 years ago - 11 comments

#703 - Segmentation fault on Ubuntu with basic python test

Issue - State: closed - Opened by johntmyers over 3 years ago - 6 comments

#702 - Unigram training always crashes when making suffix array

Issue - State: closed - Opened by MatthewBieda over 3 years ago - 4 comments

#692 - user defined char set separated from "_".

Issue - State: closed - Opened by BrightXiaoHan over 3 years ago - 1 comment

#684 - How to handle multiple whitespaces or newlines

Issue - State: closed - Opened by AmrMKayid over 3 years ago - 2 comments

#683 - bazel support for C++ API

Issue - State: open - Opened by BBerabi over 3 years ago - 1 comment
Labels: feature request

#650 - Prevent sentencepiece from normalizing whitespaces

Issue - State: closed - Opened by miguelvictor almost 4 years ago - 1 comment

#628 - Is the loss computation in UnigramTrainer correct?

Issue - State: closed - Opened by mbollmann about 4 years ago - 3 comments
Labels: bug

#608 - Add Mac M1 Compatibility

Issue - State: closed - Opened by pierreia about 4 years ago - 22 comments

#604 - RuntimeError when using sentencepiece

Issue - State: closed - Opened by Serkonosand about 4 years ago - 2 comments

#591 - Cannot install sentencepiece with Python 3.9 on Windows

Issue - State: closed - Opened by seemethere about 4 years ago - 16 comments

#588 - Combine vocabularies from various languges

Issue - State: closed - Opened by JamesDConley about 4 years ago - 8 comments

#579 - Shared library use unsafe because of abseil linkage

Issue - State: closed - Opened by danieldk over 4 years ago - 6 comments

#572 - pip install failed on Linux

Issue - State: closed - Opened by zhangguanheng66 over 4 years ago - 11 comments

#571 - Sentencepiece with pre-defined vocabulary

Issue - State: open - Opened by vladmosin over 4 years ago - 6 comments
Labels: help wanted, feature request

#563 - cmake: fix FTBFS on armel, mips, powerpc, m68k and sh4

Pull Request - State: closed - Opened by kenhys over 4 years ago

#562 - cmake: use GNUInstallDirs.cmake on UNIX

Pull Request - State: closed - Opened by kenhys over 4 years ago

#555 - What is the meaning of the second column of the .vocab file (using BPE)?

Issue - State: closed - Opened by dskoo over 4 years ago - 2 comments

#516 - My training crashes with large corpus.

Issue - State: closed - Opened by Srj over 4 years ago - 6 comments

#481 - Specify protobuf version when compiling from source

Issue - State: closed - Opened by jchwenger almost 5 years ago - 4 comments
Labels: duplicate, protobuf

#480 - How to get the frequency of a subword ?

Issue - State: closed - Opened by liuyaox almost 5 years ago - 2 comments

#474 - Using `set_vocabulary` to modify vocabulary

Issue - State: closed - Opened by sshleifer almost 5 years ago - 4 comments

#464 - module 'sentencepiece' has no attribute 'SentencePieceTrainer'

Issue - State: closed - Opened by rossbrown9879 almost 5 years ago - 6 comments

#444 - Get vocab and merges file from model file

Issue - State: closed - Opened by andompesta about 5 years ago - 3 comments

#426 - How to extend tokens dictionary?

Issue - State: closed - Opened by kpe over 5 years ago - 11 comments

#425 - do_lower_case in the sentencepiece model files

Issue - State: closed - Opened by kpe over 5 years ago - 3 comments

#416 - Fix a typo

Pull Request - State: closed - Opened by kenhys over 5 years ago

#412 - Regarding `character_coverage`

Issue - State: closed - Opened by ArbinTimilsina over 5 years ago - 3 comments

#406 - Explanation of encoding method

Issue - State: closed - Opened by rmrao over 5 years ago - 2 comments

#384 - Remove duplicated if (NOT DEFINED CMAKE_INSTALL_LIBDIR) check

Pull Request - State: closed - Opened by kenhys over 5 years ago - 2 comments

#378 - Pip install sentencepiece failure

Issue - State: closed - Opened by saareliad over 5 years ago - 43 comments

#366 - can we train by Parallel Computing or Multithreading or multi-Progress

Issue - State: open - Opened by joytianya over 5 years ago - 7 comments
Labels: feature request

#346 - Possible to have arm support for Android?

Issue - State: closed - Opened by gitathrun over 5 years ago - 13 comments

#338 - Option to quite LOG(INFO) and LOG(WARNING) messages

Issue - State: closed - Opened by ArbinTimilsina over 5 years ago - 6 comments

#323 - How can i add character to existing model?

Issue - State: closed - Opened by misssprite almost 6 years ago - 3 comments

#318 - Bug in BPE algorithm

Issue - State: closed - Opened by xbelonogov almost 6 years ago - 5 comments
Labels: bug

#299 - python wrapper export vocabulary list

Issue - State: closed - Opened by xinsu626 almost 6 years ago - 3 comments

#285 - Tutorial to train a cross-language model with sentencepiece

Issue - State: closed - Opened by loretoparisi about 6 years ago - 4 comments
Labels: sample code

#263 - do not split by apostrophe character

Issue - State: closed - Opened by EgorLakomkin about 6 years ago - 3 comments

#255 - replace <unk> with custom unk token "xxunk"

Issue - State: closed - Opened by kasparlund about 6 years ago - 6 comments

#252 - Computing representative vocabularies for multiple large files

Issue - State: closed - Opened by emjotde about 6 years ago - 8 comments

#242 - Typo on paragraph #44

Pull Request - State: closed - Opened by kant over 6 years ago - 1 comment

#215 - What is the difference between --user_defined_symbols and --control_symbols

Issue - State: closed - Opened by thammegowda over 6 years ago - 3 comments

#121 - Manually modifying SentencePiece model?

Issue - State: closed - Opened by neubig over 6 years ago - 9 comments

#102 - Understanding BOS/EOS symbols

Issue - State: closed - Opened by sooheon over 6 years ago - 6 comments

#99 - Added link on string #32

Pull Request - State: closed - Opened by kant over 6 years ago - 1 comment

#27 - Typo

Pull Request - State: closed - Opened by kant over 7 years ago - 1 comment