Some details on how BibMatch is currently used at DESY
The code is standard.
Config file
Some settings in invenio-local.conf.txt:
FUZZY_WORDLIMITS is set to:
N=2 for authors (not used yet; the code is hardcoded to use 1st_author and one of the secondary authors).
N=2 for title, which results in the following:
use the N+1 longest words of the original title search and
combine the searches for all combinations of N of these words with 'or'. Example: N=2 -> 3 words in total: W1 W2 W3.
(title:W1 title:W2) or (title:W1 title:W3) or (title:W2 title:W3)
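The fuzzy title search described above can be sketched in a few lines of Python. This is only an illustration of the combination rule, not the BibMatch implementation: the function name `fuzzy_title_query` and the plain whitespace tokenization are assumptions made for the example.

```python
from itertools import combinations


def fuzzy_title_query(title, n=2):
    """Hypothetical helper: OR together all n-word groups of the n+1 longest title words."""
    words = title.split()
    # Keep the n+1 longest words; sorted() is stable, so ties keep their title order.
    longest = sorted(words, key=len, reverse=True)[:n + 1]
    groups = [
        "(" + " ".join("title:" + word for word in combo) + ")"
        for combo in combinations(longest, n)
    ]
    return " or ".join(groups)


print(fuzzy_title_query("W1 W2 W3"))
```

With N=2 and three words this reproduces the three-pair query shown in the example above.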
VALIDATION_RULESETS
By default the title is not in the validator; it was found to be too picky.
Report number, DOI and authors are validated in lazy mode (i.e. one match yields OK).
The validator for authors is important since the search uses lastnames only.
Only the validator uses firstnames.
If there is a report number or DOI,
only this field and the authors are validated.
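As a rough illustration of what "lazy mode" means, the sketch below accepts a field as soon as one value of the new record matches one value of the candidate record. The function name `lazy_validate` and the case-insensitive exact comparison are assumptions for the example; the actual validator may compare values more loosely.

```python
def lazy_validate(new_values, candidate_values):
    """Hypothetical lazy check: a single matching value is enough for OK."""
    # Assumption: values are compared case-insensitively after trimming whitespace.
    normalized = {value.strip().lower() for value in candidate_values}
    return any(value.strip().lower() in normalized for value in new_values)


# Two DOIs on the new record, one of them present on the candidate -> OK
print(lazy_validate(["10.1000/abc", "10.1000/xyz"], ["10.1000/XYZ"]))
```

The author example shows why firstnames matter in validation: "Smith, John" and "Smith, Jane" would both be found by a lastname-only search, but they do not validate against each other here.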
SEARCH_RESULT_MATCH_LIMIT
Set to 40, i.e. if the search gives more than 40 results before the validator, no match is returned.
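The effect of the limit can be pictured as a simple guard in front of the validator. This is a sketch of the behaviour described above, not BibMatch code; the function name `candidates_for_validation` is hypothetical.

```python
SEARCH_RESULT_MATCH_LIMIT = 40  # value used at DESY, as stated above


def candidates_for_validation(search_results, limit=SEARCH_RESULT_MATCH_LIMIT):
    """Hypothetical guard: hand results to the validator only if there are not too many."""
    if len(search_results) > limit:
        # Too many hits: the search is treated as having found no match.
        return []
    return list(search_results)


print(len(candidates_for_validation(range(41))))
```

Note that exactly 40 results still go to the validator; only more than 40 results are discarded.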
Search queries
The standard author + title query uses all authors (lastname only) and the first 3 words of the title (counting only words that are longer than 3 letters, with no punctuation, no numbers, and excluding 'with', 'from', 'erratum', 'addendum', 'publisher.s note').
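The title word selection could look roughly like the following sketch. The function name `title_search_words` is hypothetical, and two details are assumptions not stated above: punctuation is replaced by spaces before splitting, and any word containing a digit is skipped entirely.

```python
import re

STOPWORDS = {"with", "from", "erratum", "addendum"}


def title_search_words(title, count=3):
    """Hypothetical helper: pick the first `count` usable words of a title."""
    # Remove the two-word stop phrase first; "." matches any apostrophe variant.
    cleaned = re.sub(r"publisher.s note", " ", title, flags=re.IGNORECASE)
    # Assumption: punctuation separates words, so replace it with spaces.
    cleaned = re.sub(r"[^\w\s]", " ", cleaned)
    words = []
    for word in cleaned.split():
        if len(word) <= 3:
            continue  # only words longer than 3 letters
        if any(ch.isdigit() for ch in word):
            continue  # assumption: words containing numbers are dropped
        if word.lower() in STOPWORDS:
            continue
        words.append(word)
        if len(words) == count:
            break
    return words


print(title_search_words("Erratum: Search for new physics with 2 jets at 13 TeV"))
```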
If the new record has a CNUM, cnum.config is used, which searches for:
arXiv-ID
DOI
1st_author and CNUM
standard author + title query
The default is allauth.config, which searches for:
arXiv-ID
DOI
standard author + title query
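The choice between the two config files can be summarized in a short sketch. The record is represented here as a plain dict with a hypothetical `cnum` key, and the function name `select_config` is likewise an assumption; the search lists simply repeat the entries given above.

```python
CNUM_SEARCHES = [
    "arXiv-ID",
    "DOI",
    "1st_author and CNUM",
    "standard author + title query",
]
ALLAUTH_SEARCHES = [
    "arXiv-ID",
    "DOI",
    "standard author + title query",
]


def select_config(record):
    """Hypothetical helper: return (config file name, searches) for a new record."""
    if record.get("cnum"):
        return "cnum.config", CNUM_SEARCHES
    # Default for records without a CNUM
    return "allauth.config", ALLAUTH_SEARCHES


print(select_config({"cnum": "C12-07-04"})[0])
```

The only difference between the two configurations is the additional "1st_author and CNUM" search for records that carry a CNUM.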