Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / diasks2/pragmatic_tokenizer issues and pull requests

#48 - Master dev/7 numbered lists

Pull Request - State: closed - Opened by abrazzini almost 4 years ago

#47 - Master dev/2 strip tags

Pull Request - State: closed - Opened by giovannelli almost 4 years ago

#46 - & symbol and URL's downcase

Pull Request - State: closed - Opened by giovannelli almost 4 years ago - 1 comment

#45 - Adding rules for tokenization of words with apostrophes in French

Pull Request - State: closed - Opened by taha-yassine over 5 years ago

#44 - Replicated in Crystal

Issue - State: open - Opened by watzon over 5 years ago - 1 comment

#43 - Non-breaking spaces should STILL be spaces

Pull Request - State: closed - Opened by wflanagan over 5 years ago - 2 comments

#42 - downcase: false shouldn't mean upcase for contractions

Issue - State: open - Opened by sheerun over 5 years ago

#41 - Contractions don't remove dots

Issue - State: open - Opened by sheerun over 5 years ago

#40 - multiple slashes within a string not properly processed

Issue - State: open - Opened by maia almost 6 years ago

#39 - speed improvements by optimisation of regular expressions

Pull Request - State: closed - Opened by maia over 6 years ago - 1 comment

#38 - lower memory usage by reducing object allocations

Pull Request - State: closed - Opened by maia over 6 years ago - 1 comment

#37 - NoMethodError (nil.length)

Issue - State: closed - Opened by maia about 7 years ago - 2 comments

#36 - fix deprecated warning for Ruby 2.4

Pull Request - State: closed - Opened by mmacia over 7 years ago - 3 comments

#35 - EMOJI_REGEX exception on JRuby

Issue - State: open - Opened by Arvinje over 8 years ago - 1 comment

#34 - stop words not replaceable

Issue - State: closed - Opened by maia over 8 years ago - 1 comment
Labels: duplicate

#33 - urls should not be downcased

Issue - State: open - Opened by maia over 8 years ago - 1 comment
Labels: bug, help wanted

#32 - long_word_split should not split emails, urls, twitter handles

Issue - State: closed - Opened by maia over 8 years ago - 1 comment
Labels: bug

#31 - stop words and filter languages

Issue - State: closed - Opened by maia almost 9 years ago - 2 comments
Labels: bug

#30 - unifying regex, using constants

Pull Request - State: closed - Opened by maia almost 9 years ago - 1 comment

#29 - refactored PostProcessor

Pull Request - State: closed - Opened by maia almost 9 years ago - 5 comments

#28 - cleanup pre_processor.rb

Pull Request - State: closed - Opened by maia almost 9 years ago - 1 comment

#27 - Speed

Issue - State: closed - Opened by diasks2 almost 9 years ago - 3 comments
Labels: enhancement

#26 - refactoring to style guide

Pull Request - State: closed - Opened by maia almost 9 years ago - 5 comments

#25 - Properly detect emoticons

Issue - State: open - Opened by diasks2 almost 9 years ago - 2 comments
Labels: enhancement, help wanted

#24 - characters test string

Issue - State: closed - Opened by maia almost 9 years ago - 2 comments
Labels: bug

#23 - mapping of similar characters (e.g. apostrophes)?

Issue - State: open - Opened by maia almost 9 years ago - 1 comment
Labels: enhancement

#22 - more specs

Issue - State: closed - Opened by maia almost 9 years ago - 2 comments

#21 - more specs

Issue - State: closed - Opened by maia almost 9 years ago - 2 comments

#20 - Identifying emojis by unicode ranges?

Issue - State: closed - Opened by maia almost 9 years ago - 4 comments
Labels: enhancement, question

#19 - Should all TLDs be whitelisted?

Issue - State: open - Opened by diasks2 almost 9 years ago - 1 comment
Labels: question

#18 - Definition of clean

Issue - State: closed - Opened by diasks2 almost 9 years ago - 2 comments

#17 - additional specs

Issue - State: closed - Opened by maia almost 9 years ago - 10 comments

#16 - splitting of words with # prefix at hyphen

Issue - State: closed - Opened by maia almost 9 years ago - 4 comments

#15 - classic_filter and non-acronyms

Issue - State: closed - Opened by maia almost 9 years ago - 1 comment

#14 - single quotes return different result based on language setting

Issue - State: closed - Opened by maia almost 9 years ago - 1 comment

#13 - remove_numbers should keep tokens that contain letters

Issue - State: closed - Opened by maia almost 9 years ago - 1 comment

#12 - option :clean removes hashtags

Issue - State: closed - Opened by maia almost 9 years ago - 1 comment

#11 - split long words

Issue - State: closed - Opened by maia almost 9 years ago - 1 comment

#10 - three options for each kind of token

Issue - State: closed - Opened by maia almost 9 years ago - 5 comments

#9 - feature overlap with pragmatic_segmenter?

Issue - State: open - Opened by maia almost 9 years ago - 1 comment
Labels: question

#8 - Allow user to specify abbreviations and/or stop words to be used

Issue - State: closed - Opened by diasks2 almost 9 years ago - 1 comment
Labels: enhancement

#7 - slow loading time

Issue - State: closed - Opened by maia almost 9 years ago - 3 comments
Labels: enhancement, question

#6 - ActiveSupport::Multibyte::Chars causing NoMethodError

Issue - State: closed - Opened by maia almost 9 years ago - 5 comments

#5 - option to require only specific languages?

Issue - State: open - Opened by maia almost 9 years ago - 2 comments
Labels: enhancement

#4 - German contractions list

Issue - State: closed - Opened by maia almost 9 years ago - 2 comments

#3 - updated German abbreviations

Issue - State: closed - Opened by maia almost 9 years ago - 3 comments

#2 - options should (also) allow symbols

Issue - State: closed - Opened by maia almost 9 years ago - 1 comment

#1 - additional specs

Issue - State: closed - Opened by maia almost 9 years ago - 12 comments