Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

EFord36/normalise (GitHub): issues and pull requests

#125 - Eas fixes

Pull Request - State: closed - Opened by esrel about 3 years ago

#124 - Module not found 'sklearn.semi_supervised.label_propagation'

Issue - State: open - Opened by dimaelzein over 3 years ago - 6 comments

#123 - Normalizing the text often removes the last word of one sentence

Issue - State: closed - Opened by 121898 almost 4 years ago - 2 comments

#122 - Warning: Careful using a custom tokenizer...

Issue - State: open - Opened by PetrochukM about 4 years ago

#121 - IndexError: list index out of range

Issue - State: open - Opened by NouamaneTazi almost 5 years ago

#120 - UserWarning re: LabelPropagation

Issue - State: open - Opened by bbookman almost 5 years ago - 1 comment

#119 - FutureWarning re: sklearn.semi_supervised.label_propagation

Issue - State: open - Opened by bbookman almost 5 years ago - 2 comments

#118 - Add functionality to be able to disable modules

Issue - State: closed - Opened by mbalatsko over 5 years ago

#117 - wrong normalization

Issue - State: closed - Opened by cmatosve over 6 years ago - 2 comments

#116 - Squared/cubed symbols deleted, eg. 20cm² -> 'twenty'

Issue - State: closed - Opened by emmaflint27 about 7 years ago
Labels: inaccuracy

#115 - US and international phone numbers, eg. +44 (0)1223 760812, (905) 513-7480

Issue - State: open - Opened by emmaflint27 about 7 years ago
Labels: enhancement

#114 - Roman numerals, eg. Pope Leo X, Henry VIII, Elizabeth II

Issue - State: closed - Opened by emmaflint27 about 7 years ago
Labels: enhancement

#112 - '80km' -> 'eighty andeighty', getting tagged as NUMB not SPLT + incorrect expansion

Issue - State: closed - Opened by emmaflint27 about 7 years ago
Labels: inaccuracy

#111 - '°C' gets split and incorrectly expanded to 'degrees century'

Issue - State: closed - Opened by emmaflint27 about 7 years ago
Labels: inaccuracy

#109 - User abbreviation not working properly

Issue - State: closed - Opened by javidalkaruzi over 7 years ago - 4 comments

#108 - Command line improvements

Pull Request - State: closed - Opened by EFord36 about 8 years ago

#107 - Updated version number

Pull Request - State: closed - Opened by EFord36 about 8 years ago

#106 - Updated command line tool

Pull Request - State: closed - Opened by EFord36 about 8 years ago - 1 comment

#105 - Command line usage could allow multiple files

Issue - State: closed - Opened by EFord36 about 8 years ago - 1 comment
Labels: enhancement

#104 - Command line usage could allow custom abbrevs in specified file

Issue - State: closed - Opened by EFord36 about 8 years ago - 1 comment
Labels: enhancement

#103 - 11/04/1996 not tagged as NDATE

Issue - State: closed - Opened by EFord36 about 8 years ago - 1 comment
Labels: inaccuracy

#102 - tokenize_basic fails with newline

Issue - State: closed - Opened by EFord36 about 8 years ago
Labels: bug

#101 - Issue with NDATE expansion

Issue - State: closed - Opened by EFord36 about 8 years ago - 1 comment
Labels: bug

#100 - Pickle files won't load because of directory issues

Issue - State: closed - Opened by EFord36 about 8 years ago - 1 comment
Labels: bug

#99 - Tokenizer deletes final word if it ends with '!' (and presumably '?')

Issue - State: closed - Opened by EFord36 about 8 years ago - 1 comment
Labels: bug

#98 - gen_sig speed improvement and rude dict

Pull Request - State: closed - Opened by EFord36 about 8 years ago

#97 - Possibility of print statements on running normalisation.py

Issue - State: closed - Opened by EFord36 about 8 years ago - 1 comment
Labels: question

#96 - Add command line functionality to module

Issue - State: closed - Opened by EFord36 about 8 years ago - 1 comment
Labels: enhancement

#95 - tokenize_basic fails with brackets

Issue - State: closed - Opened by EFord36 about 8 years ago - 1 comment
Labels: bug

#94 - Ready readme for public release

Issue - State: closed - Opened by EFord36 about 8 years ago
Labels: style

#93 - Fixes #92

Pull Request - State: closed - Opened by EFord36 about 8 years ago

#92 - crashes when trying to split

Issue - State: closed - Opened by EFord36 about 8 years ago - 1 comment
Labels: bug

#91 - Add print statements to normalisation

Issue - State: closed - Opened by EFord36 about 8 years ago - 2 comments
Labels: style

#90 - "04:00GMT" tagged as SPLT but doesn't split?

Issue - State: closed - Opened by emmaflint27 about 8 years ago
Labels: inaccuracy

#89 - Delete all spyder created strings in files

Issue - State: closed - Opened by EFord36 about 8 years ago
Labels: style

#88 - Introduce testing

Issue - State: open - Opened by EFord36 about 8 years ago
Labels: enhancement

#87 - Add support for emails

Issue - State: closed - Opened by EFord36 about 8 years ago
Labels: enhancement

#86 - Lack of data for NDATE, NTEL, NSCI

Issue - State: closed - Opened by emmaflint27 about 8 years ago
Labels: inaccuracy

#85 - Street vs Saint

Issue - State: closed - Opened by EFord36 about 8 years ago
Labels: inaccuracy

#84 - Tidying

Pull Request - State: closed - Opened by EFord36 about 8 years ago

#83 - Tokenizer and api

Pull Request - State: closed - Opened by EFord36 about 8 years ago

#82 - Added NSCI tag

Pull Request - State: closed - Opened by emmaflint27 about 8 years ago

#81 - WDLK expansions frequently very incorrect

Issue - State: open - Opened by EFord36 about 8 years ago
Labels: inaccuracy

#80 - Abbreviations that aren't titlecase are tagged as LSEQ

Issue - State: closed - Opened by EFord36 about 8 years ago
Labels: inaccuracy

#79 - class_ALPHA improvements

Pull Request - State: closed - Opened by emmaflint27 about 8 years ago

#78 - Fixes #37

Pull Request - State: closed - Opened by EFord36 about 8 years ago

#77 - Bug fixes

Pull Request - State: closed - Opened by EFord36 about 8 years ago

#76 - Reorganisation

Pull Request - State: closed - Opened by EFord36 about 8 years ago

#75 - '+' in numbers, eg. +1, +2

Issue - State: closed - Opened by emmaflint27 about 8 years ago - 1 comment

#74 - gen_candidates('abbrv') doesn't return anything (should return at least 'abbreviation')

Issue - State: open - Opened by EFord36 about 8 years ago - 1 comment
Labels: inaccuracy

#73 - Add 'of' to date expansions, eg. 13th of January

Issue - State: closed - Opened by emmaflint27 about 8 years ago
Labels: enhancement

#72 - Add BMONEY tag and expander

Issue - State: closed - Opened by emmaflint27 about 8 years ago
Labels: enhancement

#71 - Expand expn improvements

Pull Request - State: closed - Opened by EFord36 about 8 years ago

#70 - Expand_EXPN improvements

Pull Request - State: closed - Opened by EFord36 about 8 years ago

#69 - Add support for currency ranges, eg. $5-8,000, in expand_MONEY

Issue - State: closed - Opened by emmaflint27 about 8 years ago
Labels: inaccuracy

#68 - $7.00 --> seven point zero zero dollars

Issue - State: closed - Opened by emmaflint27 about 8 years ago
Labels: inaccuracy

#67 - '6-4-2' classified as NUM and expanded as 'six hundred and forty two' - should be NDIG

Issue - State: closed - Opened by emmaflint27 about 8 years ago - 1 comment
Labels: inaccuracy

#66 - 1/16 classified as NDATE

Issue - State: closed - Opened by emmaflint27 about 8 years ago - 1 comment
Labels: inaccuracy

#65 - Gen_frame fails when near text boundary and fractional index

Issue - State: closed - Opened by EFord36 about 8 years ago
Labels: bug

#64 - expand_MONEY fails on amounts with the format eg. "$.027"

Issue - State: closed - Opened by emmaflint27 about 8 years ago
Labels: bug

#63 - expand_NUM fails to expand number ranges of the format, eg. "2:28-:33"

Issue - State: open - Opened by emmaflint27 about 8 years ago - 2 comments
Labels: inaccuracy

#62 - Problem with expand_NUM - '1056' --> "one thousand, fift and six"

Issue - State: closed - Opened by emmaflint27 about 8 years ago
Labels: inaccuracy

#61 - 'US' not tagged as NSW since it looks like 'us' capitalised

Issue - State: closed - Opened by EFord36 about 8 years ago
Labels: inaccuracy

#60 - Word_tokenized vs. word_tokenized_lowered

Issue - State: closed - Opened by EFord36 about 8 years ago - 1 comment
Labels: question

#59 - expand_PROF fails with '**b' [258604]

Issue - State: closed - Opened by emmaflint27 about 8 years ago
Labels: bug

#58 - Number Classification

Pull Request - State: closed - Opened by emmaflint27 about 8 years ago

#57 - Add support for fractions, eg. 1/16 - also solve confusion b/w dates and fractions

Issue - State: closed - Opened by emmaflint27 about 8 years ago
Labels: enhancement

#56 - 'I' never detected as an NSW (potential Roman Numeral)

Issue - State: open - Opened by EFord36 about 8 years ago - 1 comment
Labels: inaccuracy

#55 - Create a function to judge if a word is phonotactically valid

Issue - State: closed - Opened by EFord36 about 8 years ago - 1 comment
Labels: enhancement

#54 - Classifier to figure out if text is AmE or BrE (in order to better do dates)

Issue - State: open - Opened by EFord36 about 8 years ago - 1 comment
Labels: enhancement

#53 - Add features to class_NUMB for NDATE, NTEL, NADDR etc.

Issue - State: closed - Opened by emmaflint27 about 8 years ago
Labels: enhancement

#52 - expand_NUM fails with multiple '.' in a number

Issue - State: closed - Opened by EFord36 about 8 years ago - 1 comment
Labels: bug

#51 - Add feature to class_ALPHA to identify abbreviations of the form 'abbrv' etc.

Issue - State: closed - Opened by emmaflint27 about 8 years ago - 1 comment
Labels: inaccuracy

#50 - expand_PRCT fails when given '4-1/2%'

Issue - State: closed - Opened by EFord36 about 8 years ago - 2 comments
Labels: inaccuracy

#49 - Closes #24

Pull Request - State: closed - Opened by EFord36 about 8 years ago

#47 - Gen_Context doesn't work for words in first or last sentence

Issue - State: closed - Opened by EFord36 about 8 years ago
Labels: bug

#46 - Add support for HTAG

Issue - State: closed - Opened by EFord36 about 8 years ago - 1 comment
Labels: enhancement

#45 - Add support for URL

Issue - State: closed - Opened by EFord36 about 8 years ago
Labels: enhancement

#43 - Add support for NDATE

Issue - State: closed - Opened by EFord36 about 8 years ago
Labels: enhancement

#42 - Add support for NADDR

Issue - State: closed - Opened by EFord36 about 8 years ago - 1 comment
Labels: enhancement

#41 - Add support for NTEL

Issue - State: closed - Opened by EFord36 about 8 years ago
Labels: enhancement

#40 - Add support for Roman Numerals

Issue - State: closed - Opened by EFord36 about 8 years ago
Labels: enhancement

#38 - Change the way gen_candidates works so as to be more inclusive

Issue - State: closed - Opened by EFord36 about 8 years ago - 1 comment
Labels: enhancement

#37 - 5 Dec. --> 5th December, 'the 5th December' might be nicer

Issue - State: closed - Opened by emmaflint27 about 8 years ago
Labels: enhancement

#36 - Improve gen_signature relevance - could use idf, could also reduce size of signature

Issue - State: closed - Opened by EFord36 about 8 years ago
Labels: enhancement

#35 - Use idf from nltk 'TextCollection' object to improve frequency metric for expand_EXPN

Issue - State: closed - Opened by EFord36 about 8 years ago - 1 comment
Labels: enhancement

#32 - 'pre' --> LSEQ

Issue - State: closed - Opened by emmaflint27 about 8 years ago
Labels: inaccuracy

#31 - Words ending s' --> LSEQ, eg. Hughes', should be WDLK

Issue - State: closed - Opened by emmaflint27 about 8 years ago
Labels: inaccuracy

#30 - "O'Neill" tagged as LSEQ --> O N E I L L, should be WDLK

Issue - State: closed - Opened by emmaflint27 about 8 years ago - 1 comment
Labels: inaccuracy

#29 - 'multi' corrects to 'must' when in a SPLT token (e.g. 593 in word_tokenized)

Issue - State: open - Opened by EFord36 about 8 years ago - 1 comment
Labels: duplicate, inaccuracy

#21 - '&' gets tagged as MISC and then NONE

Issue - State: closed - Opened by EFord36 about 8 years ago
Labels: inaccuracy

#20 - '...' deleted (tagged as MISC, NONE), may be wanted for intonation

Issue - State: open - Opened by EFord36 about 8 years ago
Labels: question

#15 - expand_MONEY

Issue - State: open - Opened by emmaflint27 about 8 years ago - 1 comment
Labels: enhancement

#13 - Class_ALPHA misclassifies: etc., m.p.h, lbs., figs., nos., dept., r.p.m.

Issue - State: closed - Opened by EFord36 about 8 years ago
Labels: inaccuracy

#12 - in class_NUMB, can use (previous word in abbrev_dict) as a feature for classifier

Issue - State: closed - Opened by EFord36 about 8 years ago - 1 comment
Labels: enhancement

#10 - split() function doesn't split by spaces, slashes or underscores

Issue - State: closed - Opened by EFord36 about 8 years ago
Labels: inaccuracy

#8 - Create NSCI tag for co-ordinates, degrees, feet and inches

Issue - State: closed - Opened by EFord36 about 8 years ago - 1 comment
Labels: enhancement