Changes in version 0.9.16
  o Fixed broken links in documentation.
  o Removed example causing spurious ASAN error on some systems.

Changes in version 0.9.15 (2025-01-10)
  o Fixed issue with zero-length 'nthreads' argument in all exported
    functions with this parameter. (Thanks to Brian Ripley for the
    notification and pointer to the problem)

Changes in version 0.9.14 (2024-12-10)
  o Fixed issue with zero-length strings in 'qgrams'. (Thanks to Brian
    Ripley for the notification and pointer to the origin of the problem)

Changes in version 0.9.12 (2023-11-28)
  o Apparently R_xlen_t is long long int on CLANG/Windows and long int on
    gcc-13/Debian.

Changes in version 0.9.11
  o Fixed a warning in gcc-13: changed specifier from %d to %ld.
    (Thanks to Kurt Hornik for the heads-up)

Changes in version 0.9.10 (2022-11-07)
  o Fixed another warning generated by new C compiler that I overlooked.
    (Thanks to the CRAN team for the heads-up)

Changes in version 0.9.9 (2022-10-20)
  o Fixed warnings generated by new C compiler (function prototypes must
    now be defined completely). (Thanks to Kurt Hornik for the heads-up.)

Changes in version 0.9.8 (2021-09-09)
  o Fixed some issues on C-level causing problems with the CLANG compiler.
    (Thanks to Brian Ripley for not only reporting this, but also sending
    updated code with fixes).

Changes in version 0.9.7 (2021-07-28)
  o Fixes in use of INTEGER() and VECTOR_ELT() after updates in R's C API.
    This affected 'afind' and 'max_length' (internally). (Thanks to Luke
    Tierney and Kurt Hornik for the notification).
  o Fix in 'amatch' causing utf-8 characters to be ignored in some cases
    (thanks to Joan Mime for reporting #78).
  o Fix: segfault when 'afind' was called with many search patterns or
    many texts to be searched.
  o Fix: stringsimmatrix was not normalized correctly (Thanks to Tamas
    Ferenci for reporting GH).

Changes in version 0.9.6.3 (2020-10-09)
  o Resubmit. Fixed a URL redirect that was detected by CRAN.

Changes in version 0.9.6.2
  o Resubmit.
    Fixed URL issues detected by CRAN, added DOI to description as per
    CRAN request.

Changes in version 0.9.6.1
  o Bugfix: afind/grab/grabl returned wrong results on macOS only.
    (thanks to Prof. Brian Ripley for the notification and for running
    tests on his personal machine and to Tomas Kalibera for making the
    ubuntu-rchk docker image available).

Changes in version 0.9.6 (2020-07-16)
  o New function 'afind': find approximate matches in text based on
    string distance.
  o New functions 'grab', 'grabl': fuzzy matching equivalent to 'grep'
    and 'grepl'.
  o New function 'extract': fuzzy matching equivalent of
    stringr::str_extract.
  o New algorithm 'running_cosine': fast fuzzy text search using cosine
    distance.
  o New function 'stringsimmatrix' (Thanks to Johannes Gruber).
  o Number of threads used is now reported when loading 'stringdist'.
  o Internal fixes (in some cases class() == 'class' was used).

Changes in version 0.9.5.5 (2019-10-21)
  o Changed two URLs to canonical form in README.md (https://) to comply
    with CRAN policy.

Changes in version 0.9.5.4
  o Some tests using seq_dist() would fail unpredictably when the input
    was defined with lazily evaluated arguments, e.g. list(1:3, 2:4); but
    only in the context of NSE by a test suite ('tinytest', 'testthat').
    Tests were replaced by literal versions, e.g.
    list(c(1,2,3), c(2,3,4)).

Changes in version 0.9.5.3 (2019-10-11)
  o Update in test suite to stay on CRAN.

Changes in version 0.9.5.2 (2019-06-06)
  o RJournal paper and C/C++ API docs are now presented as vignette.
  o Switched to tinytest framework.
  o Fix: stringdist could cause a segfault for edit distances between
    very long strings. (Thanks to GH user gllipatz)

Changes in version 0.9.5.1 (2018-06-08)
  o Fixed header file for C API.

Changes in version 0.9.5.0 (2018-06-07)
  o New contributor: Chris Muir
  o C/C++ API now exposed for packages LinkingTo stringdist.
    See `?stringdist_api`
  o Arguments 'maxDist', 'ncores', 'cluster' of functions 'stringdist'
    and 'stringdistmatrix' have been deprecated for several years and are
    now removed.
  o Fixed edge case where cosine distance with q=1, between strings of
    repeating characters yielded Inf (Thanks to Markus Dumke)

Changes in version 0.9.4.6 (2017-07-31)
  o Fixed argument passing error in lower_tri (thanks to Kurt Hornik)

Changes in version 0.9.4.5 (2017-07-27)
  o New argument 'bt' implementing Winkler's boost threshold for the
    Jaro-Winkler distance.
  o stringdist(a,b,method="qgram") returns correct value when q>nchar(a)
    (or b). (Thanks to Giora Simchoni). Also affects stringdistmatrix,
    amatch, seq_dist, and seq_distmatrix.
  o Registered native routines as now recommended by CRAN.

Changes in version 0.9.4.4 (2016-12-16)
  o Updated default number of threads to comply with CRAN policy (thanks
    to Kurt Hornik). The default number of cores now equals
    OMP_NUM_THREADS if set. See ?'stringdist-parallelization' for the
    full policy.

Changes in version 0.9.4.2 (2016-09-09)
  o bugfix in stringdistmatrix(a): value of p, for jw-distance was
    ignored (thanks to Max Fritsche)
  o bugfix in stringdistmatrix(a): Would segfault on q-gram w/input > ~7k
    strings and q>1 (thanks to Connor McKay)
  o bugfix in jaccard distance: distance not always correct when passing
    multiple strings (thanks to Robert Carlson)

Changes in version 0.9.4.1 (2016-01-02)
  o stringdistmatrix(a) now outputs long vectors (issue #45, thanks to
    Wouter Touw). For stringdistmatrix(a,b) this was already the case,
    but the length of rows and columns remains restricted to 2^31-1 since
    long input vectors are not supported (yet).
  o bugfix in osa/dl/lv distances w/unequal edit weights (thanks to
    Nathalia Potocka)

Changes in version 0.9.4 (2015-10-26)
  o bugfix: edge case for zero-size for lower tridiagonal dist matrices
    (caused UBSAN to fire, but gave correct results).
  o bugfix in jw distance: not symmetric for certain cases (thanks to
    github user gtumuluri)

Changes in version 0.9.3 (2015-08-21)
  o new function for tokenizing integer sequences: seq_qgrams
  o new function for matching integer sequences: seq_amatch
  o new functions computing distances between integer sequences:
    seq_dist, seq_distmatrix
  o q-gram based distances are now always 0 when q=0 (used to be Inf if
    at least one of the arguments was not the empty string)
  o stringdist, stringdistmatrix now emit warning when presented with
    'list' argument
  o small c-side code optimizations
  o bugfix in dl, lv, osa distance: weights were not taken into account
    properly (thanks to Zach Price)

Changes in version 0.9.2 (2015-06-24)
  o Update fixing some errors (missing documentation, tests) in the 0.9.1
    release.
  o Fixed a few possible memory leaks.

Changes in version 0.9.1 (2015-06-22)
  o Argument 'useNames' of 'stringdistmatrix' now accepts 'none',
    'strings', and 'names'
  o New function 'stringsim' computes string similarities between 0 and 1
    based on 'stringdist'
  o Calling 'stringdistmatrix' with a single argument returns an object
    of class 'dist'
  o Argument 'cluster' to stringdistmatrix is phased out. It is now
    ignored with a message.
  o Specifying 'ncores' was already ignored but now also causes a warning
  o internal: rewrite of the R/C interface, saving about 1/3 of C-code,
    making extending easier
  o bugfix in stringdistmatrix: output was transposed when length(a)==1
    (Thanks to github user cpoonolly)
  o Safer core detection to avoid a failure under Cygwin (thanks to Lauri
    Koobas)

Changes in version 0.9.0 (2015-01-10)
  o C-code underlying stringdist and amatch now automatically use
    multithreading based on openMP. The default number of threads is
    governed by options('sd_num_thread').
  o stringdist, stringdistmatrix, amatch and ain gain nthread argument
    which can overwrite the default maximum number of threads.
  o Argument 'maxDist' is phased out for 'stringdist' and
    'stringdistmatrix'.
    Specifying it causes a message.
  o Argument 'ncores' is phased out for 'stringdistmatrix'. It is now
    ignored and specifying it causes a message.
  o bugfix in amatch/dl. In certain cases, the best match went
    undetected.
  o Documentation improved and rearranged with string metrics, encoding,
    and parallelization now documented as separate topics.

Changes in version 0.8.2 (2014-12-16)
  o Fixed a few warnings issued by the CLANG compiler (thanks to Brian
    Ripley). This fixes a bug in amatch/jaccard.
  o Fixed a bug in stringdist/osa, dl: NA incorrectly returned (thanks to
    Lauri Koobas).

Changes in version 0.8.1 (2014-10-07)
  o stringdistmatrix returns dimensionless matrix when both arguments
    have length zero (thanks to Richie Cotton)
  o stringdistmatrix gains argument 'useNames' (thanks to Richie Cotton)
  o Package now 'Imports' parallel rather than 'Depends' on it.
  o bugfix in optimal string alignment distance: the number of
    transpositions was sometimes overcounted (thanks to Frank Binder)
  o Rearranged the documentation.

Changes in version 0.8.0 (2014-08-08)
  o Added soundex-based string distance (thanks to Jan van der Laan)
  o New function 'phonetic' translates strings to phonetic codes using
    soundex (thanks to Jan van der Laan)
  o New function 'printable_ascii' detects non-printable ascii or
    non-ascii characters.
  o Precision issue: cosine distance between equal strings would be
    O(1e-16) instead of 0.0 (thanks to Ben Haller).
  o Code cleaning: somewhat better performance when maxDist is
    unspecified in stringdist. It remains deprecated.
  o Row names in the output array of 'qgrams' are now in system native
    encoding (used to be utf8 for all systems).
  o Updated CITATION with page number info as the R Journal is now out.

Changes in version 0.7.3 (2014-05-16)
  o bugfix in jw-distance: out-of-range access in C-code caused R to
    crash in some cases (thanks to Carol Gan)
  o bugfix in dl distance: in some cases, distances could be one unit too
    high.
  o Updated CITATION file: paper to appear in The R Journal vol 6 (2014).
  o Some updates in documentation.

Changes in version 0.7.2 (2014-03-02)
  o function 'qgrams' gains .list argument
  o bugfix in multicore option of stringdistmatrix
  o bugfix in substitution weight of DL-distance (undercounted when
    w4 != 1 in some cases)
  o bugfix in dl.c: C-function read outside of array.

Changes in version 0.7.0 (2013-09-06)
  o added useBytes option: up to ~3-fold speed gain at the cost of
    possible encoding-dependent results.
  o new memory allocation method for q-grams increases speed between ~5%
    and ~30% depending on q and input string.
  o function 'qgrams' gains useNames option.
  o jaro-winkler distance gains weight argument.
  o C-code optimization in edit-based distances: 10~20% speed increase
    depending on input.
  o bugfix in amatch: sometimes NA was erroneously returned.
  o bugfix in amatch/lcs: hamming distance method was called erroneously.

Changes in version 0.6.1 (2013-08-09)
  o bugfix in parallel version of stringdistmatrix: parameter p was not
    passed (thanks to Ricardo Saporta)
  o bugfix in lv/osa/dl: maxDist ignored in certain cases

Changes in version 0.6.0 (2013-07-19)
  o added amatch function: approximate matching version of 'match'
  o added ain function: approximate matching version of '%in%'
  o qgrams now accepts arbitrary number of arguments. Outputs array, not
    table
  o added cosine distance
  o added Jaccard distance
  o added Jaro and Jaro-Winkler distances
  o small performance tweaks in underlying C code
  o Edge case in stringdistmatrix: output is now always of class matrix
  o Default maxDist is now Inf (this is only to make it more intuitive
    and does not break previous code)
  o BREAKING CHANGE: output -1 is replaced by Inf for all distance
    methods

Changes in version 0.5.0 (2013-06-21)
  o added qgram counting function 'qgrams'
  o faster edge case handling in osa method.
  o edge case in lv/osa/dl methods: distance returned length(b) instead
    of -1 when length(a) == 0, maxDist < length(b).
  o bugfix in lv/osa/dl method: maxDist returned when
    length(a) > maxDist > 0 (thanks to Daniel Reckhard).
  o Hamming distance (method='h') now returns -1 for strings of unequal
    lengths (used to emit error).
  o added longest common substring distance (method='lcs').
  o added qgram distance method.
  o stringdistmatrix gains cluster argument.

Changes in version 0.4.2
  o Fix in error message for hamming distance
  o Workaround for system-dependent translation of utf8 NA characters

Changes in version 0.4.0
  o First release