bheinzerling/bpemb issues and pull requests

#71 - Support for python 3.12 and scipy-1.13.0

Issue - State: closed - Opened by sjschmid 9 months ago - 1 comment

#70 - Model Downloading 404 Error

Issue - State: closed - Opened by gokdumano 11 months ago - 1 comment

#69 - release tags

Issue - State: open - Opened by ViZiD 11 months ago

#68 - Error in URL

Issue - State: closed - Opened by davebulaval 11 months ago - 1 comment

#67 - util: make content header check more robust

Pull Request - State: closed - Opened by stefan-it 11 months ago

#66 - UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

Issue - State: open - Opened by srolskyi 11 months ago - 7 comments

#65 - Add project urls and move the metadata into `pyproject.toml` according to `PEP 621`.

Pull Request - State: open - Opened by KOLANICH about 2 years ago

#64 - Extends embed to allow sequence of texts

Pull Request - State: open - Opened by maxi-marufo over 2 years ago

#63 - SSLError

Issue - State: closed - Opened by davebulaval over 2 years ago - 2 comments

#62 - Is the training procedure open?

Issue - State: closed - Opened by utrobinmv over 2 years ago

#61 - Incompatibility with subword-nmt 3.8.0

Issue - State: closed - Opened by AghilesAzzoug about 3 years ago

#60 - EOFError: Compressed file ended before the end-of-stream marker was reached

Issue - State: closed - Opened by JulesBelveze over 3 years ago - 2 comments

#59 - en.wiki.bpe.op or en.wiki.bpe.vs

Issue - State: closed - Opened by zhenpingli over 3 years ago

#58 - Truecase supported.

Issue - State: closed - Opened by BrightXiaoHan over 3 years ago - 1 comment

#57 - How to decode encoded byte-pair sentences?

Issue - State: closed - Opened by chayan-dhaddha over 3 years ago - 1 comment

#56 - Issues after updating to gensim 4.0.0

Issue - State: closed - Opened by arun5309 almost 4 years ago - 1 comment

#55 - Are the word embeddings GloVe or word2vec?

Issue - State: closed - Opened by YoadTew almost 4 years ago - 1 comment

#54 - [Question] How can we use BPEmb for large documents?

Issue - State: closed - Opened by neel04 almost 4 years ago - 2 comments

#53 - adding special tokens to a BPEmb model

Issue - State: closed - Opened by tannonk about 4 years ago - 8 comments

#52 - Can I add <pad>?

Issue - State: closed - Opened by Randool about 4 years ago - 1 comment

#51 - special tokens not handled

Issue - State: closed - Opened by dunovank over 4 years ago - 2 comments

#50 - Fix type hints for 'ids' type (fix #49)

Pull Request - State: closed - Opened by cosine0 over 4 years ago - 4 comments
Labels: hacktoberfest-accepted

#49 - Incorrect type hints for encode_ids*

Issue - State: closed - Opened by cosine0 over 4 years ago

#48 - Load custom Word2Vec

Issue - State: closed - Opened by Delphine22 over 4 years ago - 1 comment

#47 - Number issue in bpemb

Issue - State: closed - Opened by aimanmutasem over 4 years ago - 1 comment

#46 - How to use BPEmb as a pre-training model

Issue - State: closed - Opened by aimanmutasem over 4 years ago

#45 - UNK words in the prediction output

Issue - State: closed - Opened by aimanmutasem over 4 years ago - 2 comments

#44 - EOFError: Compressed file ended before the end-of-stream marker was reached

Issue - State: closed - Opened by aimanmutasem over 4 years ago - 4 comments

#43 - Vocabulary size issue

Issue - State: closed - Opened by aimanmutasem over 4 years ago - 2 comments

#42 - Update the pypi package

Issue - State: closed - Opened by mauryaland over 4 years ago - 1 comment

#41 - Encode with EOS: change function call

Pull Request - State: closed - Opened by hubertkarbowy over 4 years ago - 1 comment

#40 - setup: ensure utf-8 encoding when reading README.md

Pull Request - State: closed - Opened by hartb almost 5 years ago - 1 comment

#39 - Subword vectors to word vector

Issue - State: closed - Opened by susmoy-macgill36 almost 5 years ago - 1 comment

#38 - Adding support for own models

Issue - State: closed - Opened by stephantul almost 5 years ago - 3 comments

#37 - AttributeError: module 'smart_open' has no attribute 's3'

Issue - State: closed - Opened by ssp573 almost 5 years ago - 1 comment

#36 - Continue training

Issue - State: closed - Opened by ericlingit about 5 years ago - 1 comment

#35 - question on https://nlp.h-its.org

Issue - State: closed - Opened by jwijffels about 5 years ago - 4 comments

#34 - version of sentencepiece used

Issue - State: closed - Opened by jwijffels about 5 years ago - 4 comments

#33 - Is there a way to specify the maximum number of subwords so that I can get an embedding of fixed size?

Issue - State: closed - Opened by subrahmanyap about 5 years ago - 1 comment

#32 - Difference between "en.wiki.bpe.vs50000" and "en.wiki.bpe.op50000"

Issue - State: closed - Opened by caozhen-alex over 5 years ago - 2 comments

#31 - model/embedding versioning?

Issue - State: closed - Opened by aparrish over 5 years ago - 2 comments

#30 - Why are digits always mapped to zero?

Issue - State: closed - Opened by sumyatthitsar over 5 years ago - 2 comments

#29 - Compare embeddings

Issue - State: closed - Opened by loretoparisi over 5 years ago - 2 comments

#28 - tokenization only feature

Issue - State: closed - Opened by trideeprath almost 6 years ago - 1 comment

#27 - most_similar method

Issue - State: closed - Opened by trideeprath almost 6 years ago - 1 comment

#26 - The index for <unk> is 0, so what about <pad>?

Issue - State: closed - Opened by ghost almost 6 years ago

#25 - How do you get the embedding/id for the pad token?

Issue - State: closed - Opened by derlin almost 6 years ago - 3 comments

#24 - Syntax error while importing

Issue - State: closed - Opened by amansrivastava17 almost 6 years ago - 1 comment

#23 - load vectors from path

Issue - State: closed - Opened by alejandrojcastaneira about 6 years ago - 1 comment

#22 - Training customized bpemb

Issue - State: closed - Opened by gccome about 6 years ago - 1 comment

#21 - size/source of training corpora

Issue - State: closed - Opened by joemzhao about 6 years ago - 2 comments

#20 - numbers/digits conversion

Issue - State: closed - Opened by csarron about 6 years ago - 3 comments

#19 - multilingual text

Issue - State: closed - Opened by rohitsaluja22 about 6 years ago - 2 comments

#18 - Error when loading model

Issue - State: closed - Opened by Hoiy about 6 years ago - 2 comments

#17 - Fix http_get & remove f-strings

Pull Request - State: closed - Opened by sanghoon about 6 years ago - 1 comment

#16 - Encoder not splitting words into subwords

Issue - State: closed - Opened by SamLynnEvans about 6 years ago - 2 comments

#15 - fix typo (install -> import)

Pull Request - State: closed - Opened by jfilter about 6 years ago - 1 comment

#14 - Missing tokens in German model

Issue - State: closed - Opened by maurice-g over 6 years ago - 3 comments

#13 - SentencePiece fails?

Issue - State: closed - Opened by gwohlgen over 6 years ago - 2 comments

#12 - Train --model_type=unigram

Issue - State: closed - Opened by taku910 over 6 years ago - 2 comments

#11 - How do you learn the Chinese BPE?

Issue - State: closed - Opened by Shuailong almost 7 years ago - 2 comments

#10 - On-the-fly conversion to subwords in Python

Issue - State: closed - Opened by jbingel almost 7 years ago - 2 comments

#9 - Comparison to other word vectors

Issue - State: closed - Opened by DonaldTsang about 7 years ago - 1 comment

#8 - No question marks in Russian models

Issue - State: closed - Opened by avostryakov about 7 years ago - 3 comments

#7 - Vocab length != word vector count

Issue - State: closed - Opened by tocab about 7 years ago - 5 comments

#6 - Some embeddings are invalid (majority of vectors is inf or nan)

Issue - State: closed - Opened by leezu about 7 years ago - 5 comments

#5 - Training script

Issue - State: open - Opened by lparam about 7 years ago - 10 comments
