Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / kpu/preprocess issues and pull requests

#43 - Apply upstream double-conversion ifdef changes to build on macOS arm

Pull Request - State: closed - Opened by gregtatum 9 months ago - 1 comment

#42 - Trouble Extracting Monolingual Datasets from SeamlessAlign

Issue - State: closed - Opened by nassergharbi 10 months ago - 1 comment

#41 - Fail downloading Seamless align data

Issue - State: open - Opened by lzl-mt about 1 year ago - 1 comment

#40 - Error when reconstructing Seamless data

Issue - State: closed - Opened by starshine360 about 1 year ago - 7 comments

#39 - Error when compiling with cmake:

Issue - State: closed - Opened by starshine360 over 1 year ago - 2 comments

#38 - Add support for Armenian to sentence splitter

Pull Request - State: closed - Opened by kpu over 1 year ago

#37 - Add `Reserve()` to `AutoProbing`

Pull Request - State: open - Opened by jelmervdl over 1 year ago - 1 comment

#36 - Split initialisation and declaration

Pull Request - State: closed - Opened by jelmervdl over 1 year ago

#35 - -k from `cache` doesn't work when numbers are not sorted

Issue - State: closed - Opened by cgr71ii about 2 years ago - 3 comments

#34 - Rule-based cleaning

Pull Request - State: closed - Opened by kpuatfb over 2 years ago

#33 - Add column selection support for deduper

Pull Request - State: closed - Opened by kpuatfb almost 3 years ago

#32 - Catch eof exception in cache when output is empty

Pull Request - State: closed - Opened by lpla about 3 years ago - 2 comments

#31 - Change split_single_document to work on STDIN & STDOUT

Pull Request - State: open - Opened by jelmervdl over 3 years ago

#30 - Sentence splitter uses unbounded memory in -k mode

Issue - State: open - Opened by kpu over 3 years ago

#29 - Add ENOTSUP to the ignored errors in FSyncIgnoreUnsupported

Pull Request - State: closed - Opened by jelmervdl over 3 years ago - 2 comments

#28 - Use self-pipe trick to teleport execvp failure from fork to parent

Pull Request - State: closed - Opened by jelmervdl over 3 years ago - 3 comments

#27 - foldfilter still expects input even when the command is invalid

Issue - State: closed - Opened by kpu over 3 years ago

#26 - WIP: batch_dedupe tool to deduplicate across batches

Pull Request - State: closed - Opened by jelmervdl over 3 years ago - 1 comment

#25 - Add -c option to split-sentences.perl

Pull Request - State: open - Opened by jelmervdl almost 4 years ago - 1 comment

#23 - Added a simple program to compute Murmurhash in a md5sum way

Pull Request - State: closed - Opened by lpla almost 4 years ago

#22 - Compilation error if zlib is not installed

Issue - State: closed - Opened by cgr71ii almost 4 years ago - 1 comment

#21 - foldfilter breaks translation from language without spaces to language with spaces

Issue - State: open - Opened by kpu almost 4 years ago - 4 comments

#19 - b64filter: base64-encode & output documents on the go

Issue - State: closed - Opened by jelmervdl about 4 years ago - 1 comment

#18 - Reintroduce base64 mode

Pull Request - State: closed - Opened by jelmervdl about 4 years ago

#17 - b64filter, foldfilter and docenc

Pull Request - State: closed - Opened by jelmervdl about 4 years ago

#16 - Cache util::EndOfFileException

Issue - State: closed - Opened by zuny26 over 4 years ago - 1 comment

#15 - 'unicode/stringpiece.h' file not found when running

Issue - State: closed - Opened by fenimi over 4 years ago - 4 comments

#14 - Undefined reference to boost unit_test while using make

Issue - State: open - Opened by abdullahkhilji over 4 years ago - 1 comment

#13 - Don't lose scoped_memory source when realloc

Pull Request - State: closed - Opened by jelmervdl over 4 years ago - 1 comment

#12 - Add base64 encode/decode as an option to existing sentence splitter

Pull Request - State: closed - Opened by jelmervdl over 4 years ago

#11 - Fix for assertion error when writing exactly 16384 bytes to FakeOFStream

Pull Request - State: closed - Opened by jelmervdl over 4 years ago - 2 comments

#10 - truecaser not identical to perl script

Issue - State: open - Opened by kpu almost 5 years ago - 1 comment

#9 - Match regular expression with code comment

Pull Request - State: open - Opened by lpla about 5 years ago - 3 comments

#8 - cache: Column option -k/--key

Pull Request - State: closed - Opened by lpla about 5 years ago - 1 comment

#7 - Corpus Tokenization

Issue - State: closed - Opened by ndvbd over 5 years ago - 3 comments

#6 - Error reporting for `cache` program

Issue - State: closed - Opened by kpu over 5 years ago

#5 - Added support for CMake 2.6 adding FindICU.cmake file to the project

Pull Request - State: closed - Opened by lpla about 6 years ago - 2 comments

#4 - support munging filenames. For xz/gzip/other compression

Pull Request - State: closed - Opened by hieuhoang about 6 years ago - 1 comment

#3 - Error Cmake

Issue - State: closed - Opened by binhvq over 6 years ago - 2 comments

#2 - Unknown CMake command "AddExes"

Issue - State: closed - Opened by callison-burch over 6 years ago - 1 comment

#1 - compile with bjam

Issue - State: closed - Opened by windweller almost 7 years ago - 1 comment