Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / kpu/preprocess issues and pull requests
#43 - Apply upstream double-conversion ifdef changes to build on macOS arm
Pull Request - State: closed - Opened by gregtatum 9 months ago - 1 comment
#42 - Trouble Extracting Monolingual Datasets from SeamlessAlign
Issue - State: closed - Opened by nassergharbi 10 months ago - 1 comment
#41 - Fail downloading Seamless align data
Issue - State: open - Opened by lzl-mt about 1 year ago - 1 comment
#40 - Error when reconstructing Seamless data
Issue - State: closed - Opened by starshine360 about 1 year ago - 7 comments
#39 - Error when compiling with cmake:
Issue - State: closed - Opened by starshine360 over 1 year ago - 2 comments
#38 - Add support for Armenian to sentence splitter
Pull Request - State: closed - Opened by kpu over 1 year ago
#37 - Add `Reserve()` to `AutoProbing`
Pull Request - State: open - Opened by jelmervdl over 1 year ago - 1 comment
#36 - Split initialisation and declaration
Pull Request - State: closed - Opened by jelmervdl over 1 year ago
#35 - -k from `cache` doesn't work when numbers are not sorted
Issue - State: closed - Opened by cgr71ii about 2 years ago - 3 comments
#34 - Rule-based cleaning
Pull Request - State: closed - Opened by kpuatfb over 2 years ago
#33 - Add column selection support for deduper
Pull Request - State: closed - Opened by kpuatfb almost 3 years ago
#32 - Catch eof exception in cache when output is empty
Pull Request - State: closed - Opened by lpla about 3 years ago - 2 comments
#31 - Change split_single_document to work on STDIN & STDOUT
Pull Request - State: open - Opened by jelmervdl over 3 years ago
#30 - Sentence splitter uses unbounded memory in -k mode
Issue - State: open - Opened by kpu over 3 years ago
#29 - Add ENOTSUP to the ignored errors in FSyncIgnoreUnsupported
Pull Request - State: closed - Opened by jelmervdl over 3 years ago - 2 comments
#28 - Use self-pipe trick to teleport execvp failure from fork to parent
Pull Request - State: closed - Opened by jelmervdl over 3 years ago - 3 comments
#27 - foldfilter still expects input even when the command is invalid
Issue - State: closed - Opened by kpu over 3 years ago
#26 - WIP: batch_dedupe tool to deduplicate across batches
Pull Request - State: closed - Opened by jelmervdl over 3 years ago - 1 comment
#25 - Add -c option to split-sentences.perl
Pull Request - State: open - Opened by jelmervdl almost 4 years ago - 1 comment
#24 - error: cannot convert ‘size_t* {aka long unsigned int*}’ to ‘int32_t* {aka int*}’ for argument ‘2’ to ‘UChar32 utf8_nextCharSafeBody_60(const uint8_t*, int32_t*, int32_t, UChar32, UBool)’
Issue - State: closed - Opened by lpla almost 4 years ago - 5 comments
#23 - Added a simple program to compute Murmurhash in a md5sum way
Pull Request - State: closed - Opened by lpla almost 4 years ago
#22 - Compilation error if zlib is not installed
Issue - State: closed - Opened by cgr71ii almost 4 years ago - 1 comment
#21 - foldfilter breaks translation from language without spaces to language with spaces
Issue - State: open - Opened by kpu almost 4 years ago - 4 comments
#20 - Warning: Compatibility with CMake < 2.8.12 will be removed from a future version of CMake.
Issue - State: closed - Opened by lpla almost 4 years ago
#19 - b64filter: base64-encode & output documents on the go
Issue - State: closed - Opened by jelmervdl about 4 years ago - 1 comment
#18 - Reintroduce base64 mode
Pull Request - State: closed - Opened by jelmervdl about 4 years ago
#17 - b64filter, foldfilter and docenc
Pull Request - State: closed - Opened by jelmervdl about 4 years ago
#16 - Cache util::EndOfFileException
Issue - State: closed - Opened by zuny26 over 4 years ago - 1 comment
#15 - 'unicode/stringpiece.h' file not found when running
Issue - State: closed - Opened by fenimi over 4 years ago - 4 comments
#14 - Undefined reference to boost unit_test while using make
Issue - State: open - Opened by abdullahkhilji over 4 years ago - 1 comment
#13 - Don't lose scoped_memory source when realloc
Pull Request - State: closed - Opened by jelmervdl over 4 years ago - 1 comment
#12 - Add base64 encode/decode as an option to existing sentence splitter
Pull Request - State: closed - Opened by jelmervdl over 4 years ago
#11 - Fix for assertion error when writing exactly 16384 bytes to FakeOFStream
Pull Request - State: closed - Opened by jelmervdl over 4 years ago - 2 comments
#10 - truecaser not identical to perl script
Issue - State: open - Opened by kpu almost 5 years ago - 1 comment
#9 - Match regular expression with code comment
Pull Request - State: open - Opened by lpla about 5 years ago - 3 comments
#8 - cache: Column option -k/--key
Pull Request - State: closed - Opened by lpla about 5 years ago - 1 comment
#7 - Corpus Tokenization
Issue - State: closed - Opened by ndvbd over 5 years ago - 3 comments
#6 - Error reporting for `cache` program
Issue - State: closed - Opened by kpu over 5 years ago
#5 - Added support for CMake 2.6 adding FindICU.cmake file to the project
Pull Request - State: closed - Opened by lpla about 6 years ago - 2 comments
#4 - support munging filenames. For xz/gzip/other compression
Pull Request - State: closed - Opened by hieuhoang about 6 years ago - 1 comment
#3 - Error Cmake
Issue - State: closed - Opened by binhvq over 6 years ago - 2 comments
#2 - Unknown CMake command "AddExes"
Issue - State: closed - Opened by callison-burch over 6 years ago - 1 comment
#1 - compile with bjam
Issue - State: closed - Opened by windweller almost 7 years ago - 1 comment