Helsinki-NLP/OpusFilter issues and pull requests

#76 - support selection of possible languages for lingua

Pull Request - State: open - Opened by svirpioj 7 days ago

#75 - Opusfilter fails to compress data when it is downloaded via moses

Issue - State: closed - Opened by thfrkielikone 7 months ago - 3 comments

#74 - Cache behaviour

Issue - State: closed - Opened by thfrkielikone 7 months ago - 3 comments

#73 - Make some older libraries optional

Pull Request - State: closed - Opened by svirpioj 7 months ago

#72 - Installing on fedora 40

Issue - State: closed - Opened by thfrkielikone 8 months ago - 5 comments

#71 - fix score method in SentenceEmbeddingFilter

Pull Request - State: closed - Opened by svirpioj 10 months ago

#70 - SentenceEmbeddingFilter chunksize clashes with general chunksize

Issue - State: closed - Opened by miau1 10 months ago - 1 comment
Labels: bug

#69 - Issue with opus-fast-mosestokenizer dep for ARM-macs

Issue - State: open - Opened by rggdmonk 11 months ago - 3 comments

#68 - LMclassify always score 1

Issue - State: closed - Opened by wuyangjian 12 months ago - 3 comments

#67 - Add lingua-py support for language identification

Pull Request - State: closed - Opened by svirpioj about 1 year ago

#66 - Add support for fastspell for language identification

Issue - State: open - Opened by marco-c about 1 year ago

#65 - Add lingua-py support for language identification

Pull Request - State: closed - Opened by marco-c about 1 year ago - 1 comment

#64 - Refactor autogen code

Pull Request - State: closed - Opened by svirpioj over 1 year ago
Labels: enhancement

#63 - eflomal crashes during filtering

Issue - State: open - Opened by yvesscherrer over 1 year ago - 1 comment

#61 - Issue during installation

Issue - State: closed - Opened by evramnarouz over 1 year ago - 3 comments
Labels: installation

#60 - Add pyyaml to requirements

Issue - State: closed - Opened by yvesscherrer almost 2 years ago - 1 comment
Labels: invalid

#59 - insufficient documentation

Issue - State: closed - Opened by jairosg almost 2 years ago - 1 comment
Labels: documentation

#58 - Install eflomal from PyPI and use the new interface in WordAlignFilter

Pull Request - State: closed - Opened by svirpioj almost 2 years ago

#57 - switch to opus-fast-mosestokenizer

Pull Request - State: closed - Opened by svirpioj almost 2 years ago

#56 - Bump setuptools from 58.0.0 to 65.5.1

Pull Request - State: closed - Opened by dependabot[bot] about 2 years ago - 1 comment
Labels: dependencies

#55 - build documentation with sphinx

Pull Request - State: closed - Opened by svirpioj over 2 years ago

#54 - migrate docs to sphinx

Pull Request - State: closed - Opened by BrightXiaoHan over 2 years ago - 2 comments

#53 - Integration with MTData

Issue - State: open - Opened by svirpioj over 2 years ago
Labels: enhancement

#52 - Better word alignment filter

Issue - State: open - Opened by svirpioj over 2 years ago - 1 comment
Labels: enhancement

#51 - Automatic configuration generation

Pull Request - State: closed - Opened by svirpioj over 2 years ago

#50 - Improve handling whitespace in Jieba and MeCab tokenization

Pull Request - State: closed - Opened by svirpioj over 2 years ago

#49 - feature: add parallel decorator for functions preprocess, score, and filter

Pull Request - State: closed - Opened by BrightXiaoHan over 2 years ago - 6 comments

#48 - fix jieba tokenize and detokenize funcs.

Pull Request - State: closed - Opened by BrightXiaoHan over 2 years ago - 2 comments

#47 - fix: missing the checker for param

Pull Request - State: closed - Opened by BrightXiaoHan over 2 years ago - 1 comment
Labels: bug

#46 - Process Killed

Issue - State: closed - Opened by bayesrule almost 3 years ago - 2 comments

#45 - Add subword segmentation support

Pull Request - State: closed - Opened by svirpioj almost 3 years ago
Labels: enhancement

#44 - add SentenceEmbeddingFilter and ParallelNearestNeighbors model

Pull Request - State: closed - Opened by svirpioj almost 3 years ago

#43 - Add support for Japanese tokenization

Pull Request - State: closed - Opened by svirpioj almost 3 years ago
Labels: enhancement

#42 - add SimilarityFilter

Pull Request - State: closed - Opened by svirpioj almost 3 years ago
Labels: enhancement

#41 - Debug the configuration by export filtered corpus.

Issue - State: closed - Opened by BrightXiaoHan almost 3 years ago - 2 comments
Labels: question

#40 - allow per-language parameters for length filters

Pull Request - State: closed - Opened by svirpioj almost 3 years ago - 1 comment
Labels: enhancement

#39 - fix bug in classifier training and improve unit tests

Pull Request - State: closed - Opened by svirpioj almost 3 years ago
Labels: bug

#38 - Specify different "unit" types in filters.

Issue - State: closed - Opened by BrightXiaoHan almost 3 years ago - 2 comments
Labels: enhancement

#37 - Version 2.3.0 breaks train_classifier function

Issue - State: closed - Opened by wujameszj about 3 years ago - 1 comment
Labels: bug

#36 - add option to save scores in train_alignment

Pull Request - State: closed - Opened by svirpioj about 3 years ago
Labels: enhancement

#35 - add RepetitionFilter

Pull Request - State: closed - Opened by svirpioj about 3 years ago
Labels: enhancement

#34 - Is it possible to generate score file during training alignment model?

Issue - State: closed - Opened by BrightXiaoHan about 3 years ago - 6 comments
Labels: enhancement

#33 - Add LMClassifierFilter

Pull Request - State: closed - Opened by svirpioj about 3 years ago

#32 - add MonolingualSentenceSplitter

Pull Request - State: closed - Opened by svirpioj about 3 years ago

#31 - Possible bug in word_alignment accept function

Issue - State: closed - Opened by tomsbergmanis about 3 years ago - 5 comments
Labels: invalid

#30 - tokenizer ignored when creating align.priors

Issue - State: closed - Opened by tomsbergmanis about 3 years ago - 2 comments
Labels: invalid

#29 - Add method-specific options for LanguageIDFilter

Pull Request - State: closed - Opened by svirpioj about 3 years ago

#28 - Use multicore to accelerate score, filter and tokenize processes.

Issue - State: closed - Opened by BrightXiaoHan about 3 years ago - 5 comments
Labels: enhancement

#27 - add jieba tokenizer for Chinese

Pull Request - State: closed - Opened by svirpioj about 3 years ago - 1 comment
Labels: enhancement

#26 - opusfilter : command not found

Issue - State: closed - Opened by Pkscode over 3 years ago - 2 comments
Labels: installation

#25 - pandas<1.0.0 not supported in opusfilter>=2.0.0

Issue - State: closed - Opened by svirpioj over 3 years ago - 1 comment
Labels: bug

#24 - How to choose threshold for WordAlignFilter?

Issue - State: closed - Opened by BrightXiaoHan over 3 years ago

#23 - add jieba tokenizer for Chinese corpus.

Pull Request - State: closed - Opened by BrightXiaoHan over 3 years ago - 5 comments
Labels: enhancement

#22 - Installation fails on Windows

Issue - State: closed - Opened by aarnetalman over 3 years ago - 1 comment
Labels: documentation

#21 - Installation using pip fails

Issue - State: closed - Opened by aarnetalman over 3 years ago - 5 comments
Labels: bug

#20 - Add support to fasttext for language detection

Pull Request - State: closed - Opened by svirpioj over 3 years ago

#19 - Add suppress_prompts parameter for opus_read

Pull Request - State: closed - Opened by radinplaid over 3 years ago

#18 - Add option to suppress download confirmation for "opus_read" (Issue #10)

Pull Request - State: closed - Opened by radinplaid over 3 years ago
Labels: enhancement

#17 - add function for downloading a single file

Pull Request - State: closed - Opened by svirpioj over 3 years ago
Labels: enhancement

#16 - restrict build-n-publish job to pushed tags

Pull Request - State: closed - Opened by svirpioj over 3 years ago

#15 - fix build-n-publish job

Pull Request - State: closed - Opened by svirpioj over 3 years ago
Labels: bug

#14 - Add support to fasttext for language detection (Develop branch)

Pull Request - State: closed - Opened by kirianguiller over 3 years ago - 4 comments
Labels: enhancement

#13 - Extended YAML configuration

Pull Request - State: closed - Opened by svirpioj over 3 years ago
Labels: enhancement

#12 - Add support to fasttext for language detection

Pull Request - State: closed - Opened by kirianguiller over 3 years ago - 4 comments
Labels: enhancement

#11 - TypeError when processing ParaCrawl

Issue - State: closed - Opened by lefterav over 3 years ago - 1 comment
Labels: bug

#10 - Option to suppress download confirmation for "opus_read"

Issue - State: closed - Opened by lefterav over 3 years ago - 2 comments
Labels: enhancement

#9 - Tokenization behavior in WordAlignFilter

Issue - State: closed - Opened by yvesscherrer almost 4 years ago - 4 comments
Labels: bug

#8 - Additional filter suggestion: remove lines with repeated content

Issue - State: closed - Opened by yvesscherrer almost 4 years ago - 3 comments
Labels: enhancement

#7 - Language id filter comparison

Issue - State: closed - Opened by yvesscherrer almost 4 years ago - 3 comments
Labels: enhancement

#6 - LanguageIDFilter filter error

Issue - State: closed - Opened by virgulvirgul about 4 years ago - 2 comments
Labels: bug

#5 - use latest release if not provided

Pull Request - State: closed - Opened by jbrry about 4 years ago - 3 comments
Labels: enhancement

#4 - Option to keep blank lines

Issue - State: closed - Opened by jbrry about 4 years ago - 3 comments
Labels: enhancement

#3 - standardize_dataframe_scores receives empty data frame in classifier.py on nlingual-rebase branch

Issue - State: closed - Opened by jbrry over 4 years ago - 4 comments
Labels: bug

#2 - LM paths do not use output_directory

Issue - State: closed - Opened by yvesscherrer almost 5 years ago - 2 comments
Labels: bug

#1 - Infinite scores from word alignment

Issue - State: open - Opened by svirpioj almost 5 years ago
Labels: bug
