Ecosyste.ms: Issues

An open API service providing issue and pull request metadata for open source projects.

GitHub / moj-analytical-services/splink issues and pull requests

#1019 - Bug: Comparison.__deepcopy__() doesn't respect subclassing

Issue - State: open - Opened by NickCrews about 2 years ago
Labels: nice to have

#1018 - feat: Support sqlglot versions >=5.1.0

Pull Request - State: closed - Opened by NickCrews about 2 years ago

#1011 - [FEAT] Support for embedding-based similarity functions

Issue - State: open - Opened by OlivierBinette about 2 years ago - 26 comments
Labels: enhancement, comparison levels

#1007 - Examples/tutorials for custom comparisons

Issue - State: closed - Opened by samnlindsay about 2 years ago - 2 comments
Labels: documentation

#1006 - fix: Safeguard against rounding/overflow errors in great_circle_distance_km_sql()

Pull Request - State: closed - Opened by NickCrews about 2 years ago - 1 comment

#1004 - Use Ruff as a linter

Pull Request - State: closed - Opened by NickCrews about 2 years ago - 4 comments

#1003 - Drop python 3.6 support

Pull Request - State: closed - Opened by NickCrews about 2 years ago - 2 comments

#1001 - Create general function to profile clusters

Issue - State: open - Opened by RossKen about 2 years ago - 1 comment
Labels: enhancement, good first issue, profiling, clustering

#1000 - SQLite random SQL doesn't allow customised `unique_id_column_name`

Issue - State: open - Opened by ADBond about 2 years ago
Labels: bug, good first issue, sqlite

#996 - Profiling upgrades/fixes

Issue - State: open - Opened by samnlindsay about 2 years ago
Labels: good first issue, profiling

#995 - Robertwhiffin udf register fix

Pull Request - State: closed - Opened by ThomasHepworth about 2 years ago - 2 comments

#992 - feat: Support sqlglot >=5.1.0

Pull Request - State: closed - Opened by NickCrews about 2 years ago - 4 comments

#985 - Awards and citation

Pull Request - State: closed - Opened by RobinL about 2 years ago - 1 comment

#981 - `add_l_or_r_to_identifier` now has case for type exp.Lambda

Pull Request - State: closed - Opened by ThomasHepworth about 2 years ago - 1 comment

#969 - Does the completeness chart works

Issue - State: closed - Opened by RobinL about 2 years ago
Labels: profiling

#962 - (WIP) 961 ideas for improving caching

Pull Request - State: closed - Opened by RobinL about 2 years ago - 9 comments

#961 - Ideas for improving caching

Issue - State: closed - Opened by RobinL about 2 years ago - 5 comments
Labels: caching

#947 - Tf tables not being correctly referenced in 'estimate_probability_two_random_records_match'

Issue - State: open - Opened by RobinL about 2 years ago
Labels: check if still an issue, term frequency

#946 - black and bump version to 3.5.1

Pull Request - State: closed - Opened by RobinL about 2 years ago - 1 comment

#943 - Update and lint docstring

Pull Request - State: closed - Opened by RobinL about 2 years ago - 1 comment

#942 - Bump jsonschema dependency to ensure Splink works in latest jupyterlab

Pull Request - State: closed - Opened by RobinL about 2 years ago - 1 comment

#935 - [object Object] in cluster studio & comparison viewer tables

Issue - State: closed - Opened by ADBond about 2 years ago
Labels: bug, good first issue, graphs

#931 - [DOCS] Add data prep pre-requisites section to docs

Pull Request - State: closed - Opened by RobinL about 2 years ago - 1 comment

#930 - [DOCS] Add m estimation from pairwise (clerical) labels example

Pull Request - State: closed - Opened by RobinL about 2 years ago - 1 comment

#925 - Fix tests

Pull Request - State: closed - Opened by ThomasHepworth about 2 years ago - 1 comment

#916 - `__splink__df_concat_with_tf` cache reused if two separate linkers in play

Issue - State: closed - Opened by RobinL about 2 years ago - 2 comments
Labels: bug

#910 - `RANDOM()` / `RAND()` backend compatibility

Issue - State: closed - Opened by samnlindsay about 2 years ago - 1 comment
Labels: bug, spark

#907 - Return settings dict from save_settings_to_json()

Pull Request - State: closed - Opened by NickCrews about 2 years ago - 2 comments

#896 - docs: Fix link to settings_jsonschema.json

Pull Request - State: closed - Opened by NickCrews about 2 years ago - 1 comment

#885 - Comparison level logical composition

Issue - State: closed - Opened by ADBond about 2 years ago - 3 comments
Labels: enhancement, Interface/API improvement, comparison levels

#884 - Error if input dataframes already have a column named `source_dataset`

Issue - State: open - Opened by RobinL about 2 years ago - 1 comment
Labels: check if still an issue, validation

#882 - Toy Example

Issue - State: closed - Opened by firmai about 2 years ago - 5 comments
Labels: validation

#880 - InputColumn class is ignoring index

Issue - State: closed - Opened by mamonu about 2 years ago - 2 comments
Labels: documentation, good first issue, comparison levels

#879 - More detail on missing trained values in `linker.predict()`

Issue - State: open - Opened by ADBond about 2 years ago - 1 comment
Labels: enhancement, model training

#865 - Update pyproject.toml

Pull Request - State: closed - Opened by mamonu over 2 years ago - 4 comments

#852 - Unclear error if EM training blocking rule creates empty link table

Issue - State: closed - Opened by ADBond over 2 years ago - 1 comment
Labels: bug

#850 - github action for testing py3.6 compatibility

Pull Request - State: closed - Opened by mamonu over 2 years ago - 2 comments

#849 - [DOCS] Clarify best data for Splink

Pull Request - State: closed - Opened by RobinL over 2 years ago - 1 comment

#845 - `comparison_viewer_dashboard` breaks if `output_column_name` contains spaces

Issue - State: closed - Opened by ADBond over 2 years ago
Labels: bug

#839 - Comparison viewer filters doesn't use level labels

Issue - State: closed - Opened by ADBond over 2 years ago - 1 comment
Labels: enhancement, Interface/API improvement, graphs

#825 - [FEAT] Add match probability to precision recall and roc

Pull Request - State: closed - Opened by RobinL over 2 years ago - 1 comment

#824 - [FIX] Fix overlapping bars problem in match weight and m and u values charts

Pull Request - State: closed - Opened by RobinL over 2 years ago - 1 comment

#810 - [FIX] Add preceding blocking rules to eliminate dupes in `find_matches_to_new_records`

Pull Request - State: closed - Opened by RobinL over 2 years ago - 1 comment

#808 - Missingness chart fails if tables don't have same columns

Issue - State: closed - Opened by ADBond over 2 years ago - 2 comments
Labels: bug

#807 - Add F1 score to ROC and precision/recall charts

Pull Request - State: closed - Opened by NickCrews over 2 years ago - 1 comment

#802 - `compare_two_records` needs to check whether tf tables exist

Issue - State: open - Opened by RobinL over 2 years ago - 5 comments
Labels: bug, check if still an issue

#801 - [FIX] Improve poor performance of linker.prediction_errors_from_labels_table in Spark

Pull Request - State: closed - Opened by RobinL over 2 years ago - 1 comment

#793 - Accuracy analysis from labels column assumes blocking rules have perfect recall

Issue - State: open - Opened by RobinL over 2 years ago
Labels: model qa

#694 - decimal can only support precision up to 38

Issue - State: closed - Opened by KalaniStanton over 2 years ago - 4 comments

#680 - Arrays/structs break when loading pandas df into `duckdb`

Issue - State: closed - Opened by ThomasHepworth over 2 years ago - 2 comments
Labels: duckdb, check if still an issue

#666 - Unable to create `Comparison` for a function with a non-default schema

Issue - State: closed - Opened by philip-hunt-kani over 2 years ago - 6 comments
Labels: check if still an issue

#657 - `pd.NA` is treated as a string value when registered in a db.

Issue - State: closed - Opened by ThomasHepworth over 2 years ago - 5 comments
Labels: bug, duckdb, check if still an issue

#642 - Apply accessibility guidelines to splink documentation

Issue - State: open - Opened by mamonu over 2 years ago - 1 comment
Labels: documentation

#539 - Add density-based sampling to splink cluster studio

Issue - State: closed - Opened by RobinL over 2 years ago - 2 comments
Labels: enhancement, clustering

#430 - Add chart to show TF adjustments for specific values

Issue - State: closed - Opened by samnlindsay almost 3 years ago - 1 comment
Labels: enhancement, model training, term frequency

#402 - Allow `linker.train_m_from_deterministic_rule()`

Issue - State: open - Opened by RobinL almost 3 years ago - 1 comment
Labels: good first issue, model training

#251 - One-to-one matching

Issue - State: open - Opened by lucasmalherbe about 3 years ago - 6 comments
Labels: enhancement

#215 - Add default postcode comparison function

Issue - State: closed - Opened by samnlindsay over 3 years ago - 7 comments
Labels: comparison levels

#199 - Profiling of dates/quantities with a histogram

Issue - State: open - Opened by samnlindsay almost 4 years ago
Labels: good first issue, profiling