Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / HazyResearch/fonduer issues and pull requests

#546 - CandidateExtractor doesn't scale for larger relations

Issue - State: open - Opened by robbieculkin about 3 years ago - 1 comment

#545 - Resolve a memory leak by large data on out_queue (related to #494)

Pull Request - State: closed - Opened by YasushiMiyata over 3 years ago - 2 comments

#544 - Resolve memory leaks caused by add and commit to postgres (related to #494, redo #541)

Pull Request - State: closed - Opened by YasushiMiyata over 3 years ago
Labels: enhancement

#543 - docs: pin sphinx version to <4.0.0

Pull Request - State: closed - Opened by lukehsiao over 3 years ago - 1 comment
Labels: docs

#542 - Add multiline Japanese strings support to HocrVisualParser() to fix #534 and redo #537

Pull Request - State: closed - Opened by YasushiMiyata over 3 years ago - 2 comments
Labels: enhancement

#541 - Resolve memory leaks caused by adding and committing to postgres (related to #494)

Pull Request - State: closed - Opened by YasushiMiyata over 3 years ago - 2 comments

#540 - Add multiline Japanese strings support to HocrVisualParser() to fix #534 and redo #537

Pull Request - State: closed - Opened by YasushiMiyata over 3 years ago - 15 comments

#539 - Fix sqlalchemy query error of test_postgres.py (Fix #538)

Pull Request - State: closed - Opened by YasushiMiyata over 3 years ago - 5 comments
Labels: bug

#537 - Add multiline Japanese strings support to HocrVisualParser() to fix #534

Pull Request - State: closed - Opened by YasushiMiyata over 3 years ago - 2 comments

#536 - Tables aren't redefined for re-runs of UDF apply

Issue - State: open - Opened by robbieculkin over 3 years ago - 5 comments

#535 - UDF hangs with no exception / warning

Issue - State: closed - Opened by robbieculkin over 3 years ago - 5 comments
Labels: bug

#534 - HOCRParser fails to multiline Japanese strings

Issue - State: closed - Opened by YasushiMiyata over 3 years ago - 2 comments

#533 - It's dead slow with Win10 + PY 3.6

Issue - State: closed - Opened by nageshsvs over 3 years ago - 2 comments

#532 - Parser can't handle big tables?

Issue - State: closed - Opened by linM24 almost 4 years ago - 3 comments

#531 - Update setup-miniconda to avoid the use of add-path and set-env

Pull Request - State: closed - Opened by HiromuHota almost 4 years ago - 6 comments

#530 - Update visual.py

Pull Request - State: closed - Opened by annelhote almost 4 years ago - 4 comments
Labels: docs

#528 - Use spaCy v2.3.0 or later to use HocrVisualParser

Pull Request - State: closed - Opened by HiromuHota about 4 years ago - 1 comment

#527 - Tokens not aligned error when spacy < 2.3.0

Issue - State: closed - Opened by HiromuHota about 4 years ago - 3 comments

#526 - Unwrap "ocrx_line" as well as "ocr_line" as Fonduer has no data model

Pull Request - State: closed - Opened by HiromuHota about 4 years ago - 2 comments

#525 - unable to read images in the pdf file

Issue - State: closed - Opened by ashleo25 about 4 years ago - 8 comments

#524 - Parser is not splitting multiple lines sentences properly

Issue - State: open - Opened by eng-khaled1 about 4 years ago - 3 comments

#523 - Suggestion required: Getting error while applying Featurizer

Issue - State: open - Opened by AshutoshUpadhya about 4 years ago - 3 comments

#522 - How can I extract a paragraph and all associated sentences in document

Issue - State: open - Opened by ashleo25 about 4 years ago - 1 comment
Labels: needs-info

#521 - HTMLDocPreprocessor for PDF documents is it always required

Issue - State: closed - Opened by ashleo25 about 4 years ago - 3 comments
Labels: discussion

#520 - Process the tail text only after child elements (#333)

Pull Request - State: closed - Opened by HiromuHota about 4 years ago - 2 comments

#519 - Add HOCRDocPreprocessor and HocrVisualParser

Pull Request - State: closed - Opened by HiromuHota about 4 years ago - 9 comments
Labels: enhancement

#518 - Rename "VisualLinker" to "PdfVisualParser" to welcome "HocrVisualParser"

Pull Request - State: closed - Opened by HiromuHota about 4 years ago - 2 comments
Labels: enhancement

#517 - docs: fix epub warning by adding version to conf.py

Pull Request - State: closed - Opened by lukehsiao about 4 years ago
Labels: docs

#516 - docs: configure RTD using config file

Pull Request - State: closed - Opened by lukehsiao about 4 years ago - 2 comments
Labels: docs

#515 - Add a missing requirement for ReadTheDocs (#512)

Pull Request - State: closed - Opened by HiromuHota about 4 years ago - 5 comments
Labels: docs

#513 - CORE_XX was renamed to BASIC_XX at #283

Pull Request - State: closed - Opened by HiromuHota about 4 years ago - 1 comment

#512 - ReadTheDocs error

Issue - State: closed - Opened by HiromuHota about 4 years ago - 4 comments

#511 - Is this the right way to test the saved emmental models?

Issue - State: open - Opened by saikalyan9981 about 4 years ago - 5 comments

#510 - Improve an error message

Pull Request - State: closed - Opened by HiromuHota about 4 years ago - 1 comment

#509 - Native support for hOCR

Pull Request - State: closed - Opened by HiromuHota about 4 years ago - 2 comments

#508 - Use "--use-feature=2020-resolver" to fix #390

Pull Request - State: closed - Opened by HiromuHota about 4 years ago - 2 comments

#507 - BBox value errors

Issue - State: closed - Opened by saikalyan9981 about 4 years ago - 3 comments
Labels: duplicate

#506 - Support v2.3.X of spaCy, which includes pretrained models for Chinese and Japanese

Pull Request - State: closed - Opened by HiromuHota about 4 years ago - 1 comment

#505 - Move textual functions in data_model_utils.tabular to data_model_utils.textual

Pull Request - State: closed - Opened by HiromuHota about 4 years ago - 1 comment
Labels: clean-up

#504 - get_cell_ngrams and get_neighbor_cell_ngrams yield nothing when the mention is not tabular (fix #471)

Pull Request - State: closed - Opened by HiromuHota about 4 years ago - 1 comment
Labels: clean-up

#502 - Extracting Information from tables without Borders

Issue - State: closed - Opened by saikalyan9981 about 4 years ago - 4 comments

#501 - Duplicate key error while adding two mentions which are same

Issue - State: closed - Opened by saikalyan9981 about 4 years ago - 9 comments

#500 - Use miniconda to consolidate GitHub Actions workflow

Pull Request - State: closed - Opened by HiromuHota about 4 years ago - 1 comment

#499 - Adapt to black 20.8b

Pull Request - State: closed - Opened by HiromuHota about 4 years ago - 1 comment

#498 - Setup/teardown a database every unit test for better isolation

Pull Request - State: closed - Opened by HiromuHota about 4 years ago - 1 comment

#497 - Add `nullables` to candidate_subclass()

Pull Request - State: closed - Opened by HiromuHota about 4 years ago - 2 comments

#496 - Allow NULL mention in a candidate

Issue - State: closed - Opened by HiromuHota about 4 years ago

#495 - Commit Document and related objects every doc iteration

Pull Request - State: closed - Opened by HiromuHota about 4 years ago - 1 comment

#493 - Update CHANGELOG.rst

Pull Request - State: closed - Opened by YasushiMiyata over 4 years ago

#492 - Enable RegexMatchSpan with concatenates words by sep="(separator)" option

Pull Request - State: closed - Opened by YasushiMiyata over 4 years ago - 4 comments

#491 - No need to ignore type for torch.__version__ as of PyTorch 1.6.0

Pull Request - State: closed - Opened by HiromuHota over 4 years ago - 1 comment

#490 - Persist doc only when no error happens during parsing

Pull Request - State: closed - Opened by HiromuHota over 4 years ago - 3 comments

#488 - Initialize Drawing object every page not to carry over drawings to the following pages

Pull Request - State: closed - Opened by HiromuHota over 4 years ago - 3 comments

#486 - ValueError: zero-size array during _strlib_multinary_features

Issue - State: closed - Opened by HiromuHota over 4 years ago - 2 comments

#485 - Do not access doc.name in in_thread to prevent concurrent session access

Pull Request - State: closed - Opened by HiromuHota over 4 years ago - 4 comments

#484 - Use dict instead of list for much faster lookup

Pull Request - State: closed - Opened by HiromuHota over 4 years ago - 5 comments
Labels: enhancement

#483 - Unbearable slowness in `Featurizer.get_feature_matrices`

Issue - State: closed - Opened by HiromuHota over 4 years ago - 2 comments

#482 - sqlalchemy.exc.InvalidRequestError during labeler.apply or featurizer.apply

Issue - State: closed - Opened by HiromuHota over 4 years ago - 3 comments
Labels: bug

#481 - Fix get_axis_ngrams not to return None

Pull Request - State: closed - Opened by HiromuHota over 4 years ago - 1 comment
Labels: bug

#480 - Fix #469

Pull Request - State: closed - Opened by YasushiMiyata over 4 years ago - 1 comment
Labels: bug

#479 - Log a stack trace on parsing error for better debug experience

Pull Request - State: closed - Opened by HiromuHota over 4 years ago - 2 comments

#478 - Details of a parse error

Issue - State: closed - Opened by HiromuHota over 4 years ago - 4 comments

#477 - Correct the entity type for NumberMatcher from “NUMBER” to “CARDINAL”

Pull Request - State: closed - Opened by HiromuHota over 4 years ago - 1 comment

#476 - Support hOCR

Issue - State: closed - Opened by HiromuHota over 4 years ago - 6 comments

#475 - Adapt to isort v5.0.0

Pull Request - State: closed - Opened by HiromuHota over 4 years ago - 2 comments

#474 - make check fails because of an isort's breaking change

Issue - State: closed - Opened by HiromuHota over 4 years ago - 1 comment

#473 - NUMBER is not supported entity type by spaCy

Issue - State: closed - Opened by HiromuHota over 4 years ago

#472 - Add tests for `matchers.py`

Pull Request - State: closed - Opened by HiromuHota over 4 years ago - 1 comment
Labels: enhancement

#470 - Add tests for `data_model_utils.tabular`

Pull Request - State: closed - Opened by HiromuHota over 4 years ago - 2 comments

#469 - get_max_row_num is missing

Issue - State: closed - Opened by HiromuHota over 4 years ago

#468 - Do not explicitly try to install freetype, which is already installed

Pull Request - State: closed - Opened by HiromuHota over 4 years ago - 1 comment

#467 - GitHub Actions fail on macos

Issue - State: closed - Opened by HiromuHota over 4 years ago - 3 comments

#466 - Mock imports of "cloudpickle" and "mlflow"

Pull Request - State: closed - Opened by HiromuHota over 4 years ago - 1 comment
Labels: docs

#464 - The content for "MLflow model for Fonduer" is missing from Read The Docs

Issue - State: closed - Opened by HiromuHota over 4 years ago - 5 comments
Labels: docs

#456 - Feat/multary candidates

Pull Request - State: closed - Opened by wajdikhattel over 4 years ago - 12 comments
Labels: enhancement

#455 - Add multary-candidates for feature extraction

Issue - State: closed - Opened by wajdikhattel over 4 years ago

#433 - Try to reproduce #12, but can't

Pull Request - State: closed - Opened by HiromuHota over 4 years ago - 2 comments

#369 - train arg should have been removed by #335

Pull Request - State: closed - Opened by HiromuHota almost 5 years ago
Labels: clean-up

#366 - Readthedocs issue with Emmental

Issue - State: open - Opened by senwu almost 5 years ago - 3 comments
Labels: bug, docs

#333 - Inner HTML elements are processed after the tail text

Issue - State: closed - Opened by HiromuHota about 5 years ago - 2 comments

#270 - RegexMatchSpan with sep="" concatenates words with sep="(space)"

Issue - State: closed - Opened by HiromuHota over 5 years ago - 1 comment
Labels: bug

#265 - Add Word (Token) class as another data model

Issue - State: closed - Opened by HiromuHota over 5 years ago - 1 comment

#245 - Enable pip cache even when language: generic

Pull Request - State: closed - Opened by HiromuHota over 5 years ago - 1 comment

#217 - perf: do not store redundant feature strings for each candidate

Issue - State: open - Opened by lukehsiao over 5 years ago - 1 comment
Labels: help wanted, discussion

#200 - Modify docstring of functions that return get_sparse_matrix

Pull Request - State: closed - Opened by HiromuHota almost 6 years ago
Labels: docs

#170 - Visualizer fails with a PolicyError

Issue - State: closed - Opened by lukehsiao about 6 years ago - 1 comment
Labels: wontfix, tutorials

#166 - Visual position data for words is sometimes inaccurate

Issue - State: closed - Opened by mfboulos about 6 years ago - 7 comments
Labels: bug, help wanted, needs-info

#159 - GenerativeModel was succeeded by LabelModel from metal.label_model

Pull Request - State: closed - Opened by HiromuHota about 6 years ago
Labels: docs

#12 - Word mismatch between HTML and PDF for visual linker

Issue - State: open - Opened by lukehsiao over 6 years ago - 10 comments

#3 - Integrate new parser to support pdftotree output

Issue - State: closed - Opened by lukehsiao over 6 years ago - 6 comments
Labels: enhancement