Expanded Benchmarks

This documentation provides more specific information about how an organization might apply the benchmarks or determine the level of quality for records and collections of metadata.

  • Benchmark: criterion that must be met

  • Metric(s): mechanism or measurement to determine if a record/value meets the benchmark standard; these may depend on local guidelines and field usage

  • Examples and Notes: non-exhaustive list of additional clarifications and/or examples of values that may or may not meet a metric

In this model, the benchmark and metrics set the standard (i.e., the criteria that must be met to qualify for that quality level) and the examples show some ways that the standard might be applied for different local circumstances. Also note that minimal and ideal levels are clearly defined, while all intermediary benchmarking stages are left up to local organizations. The benchmarks in the “suggested” section describe suggested priorities for organizations setting “better-than-minimal” benchmarks for their metadata that fall between minimal and ideal.

  • General benchmarks usage:

    • Each criterion is intended to be “system agnostic” but some may not apply to every situation (e.g., local field requirements)

    • Criteria are binary – i.e., the set being evaluated must meet all points or it does not meet the benchmarking standard for that level

    • These benchmarks focus solely on the quality of metadata entry, not the quality of information (i.e., all available information is entered correctly, although we might wish that additional information were known about an item to improve the record)

    • This framework is intended to be scalable (it is written in the context of 1 record, but could apply across a collection, resource type, or an entire system)
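The binary, scalable rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: a record is modeled as a dictionary mapping field names to lists of string values, each criterion is a function returning true or false, and the names `meets_level`, `set_meets_level`, and `minimal_checks` are hypothetical. The two sample criteria (including the 255-character limit) are placeholders, not part of the framework.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, List[str]]      # assumed shape: field name -> list of values
Check = Callable[[Record], bool]   # one function per criterion


def meets_level(record: Record, checks: Iterable[Check]) -> bool:
    """A record meets a level only if every applicable criterion passes."""
    return all(check(record) for check in checks)


def set_meets_level(records: Iterable[Record], checks: List[Check]) -> bool:
    """The same rule scaled up: every record in the set must pass."""
    return all(meets_level(record, checks) for record in records)


# Two toy criteria, for illustration only.
minimal_checks: List[Check] = [
    # At least one title value that is not blank.
    lambda r: any(v.strip() for v in r.get("title", [])),
    # No value longer than an assumed 255-character limit.
    lambda r: all(len(v) <= 255 for values in r.values() for v in values),
]

records = [{"title": ["Annual report, 1998"]}, {"title": ["   "]}]
print([meets_level(r, minimal_checks) for r in records])
print(set_meets_level(records, minimal_checks))
```

Because the rule is all-or-nothing, one failing record is enough for a collection to miss the level, which is why the per-record results are usually worth reporting alongside the overall answer.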

Minimal Benchmarks

Minimal-level benchmarks are mostly objective (e.g., a value is present/not present) and should apply to all records.

Table columns: Benchmark, Metrics, Examples

The record is specific/scoped correctly

  • Values in a record for an individual item (e.g., a monograph, photograph, newsletter issue, etc.) describe only that item rather than multiple collection-level or serial-level items

Values that may be correct:

  • Name(s) of the people or organizations who authored a text or issue but not all of the authors from a collection of texts

  • A publisher value that matches an individual serial issue rather than multiple possible publisher values from an entire serial set

  • Subject-type values or descriptions of the content that reflect the item rather than general collection-level descriptions or subjects describing multiple items

  • Values in a record describing multiple items, or a collection of items, reflect all of the content attached to it

Possible issues to review:

  • Multiple records that all have the same title and/or description (especially if there are a large number)

  • Relevant information is absent from a collection-level record
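One way to surface the first issue (many records sharing a title) is to count normalized titles across a set. The sketch below assumes records are dictionaries mapping field names to lists of strings and that the field is called `title`; the function name `repeated_titles` and the `min_count` threshold are hypothetical. A repeated title is only a prompt for review, since some repetition (e.g., serial issues) may be legitimate.

```python
from collections import Counter


def repeated_titles(records, min_count=2, title_field="title"):
    """Return normalized titles that occur at least min_count times."""
    counts = Counter()
    for record in records:
        for title in record.get(title_field, []):
            # Collapse whitespace and ignore case so near-identical titles group together.
            key = " ".join(title.split()).casefold()
            if key:
                counts[key] += 1
    return {title: n for title, n in counts.items() if n >= min_count}


records = [
    {"title": ["Newsletter"]},
    {"title": ["newsletter "]},
    {"title": ["Annual Report"]},
]
print(repeated_titles(records))
```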

Every record has a title

  • The title field is not empty

Possible issues to review:

  • Any record that has no value or an effectively empty title value (e.g., a value consisting exclusively of whitespace)
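A minimal sketch of this check follows, assuming records are dictionaries mapping field names to lists of strings. The names `is_effectively_empty` and `records_missing_title` are hypothetical, and treating zero-width characters as empty is an added assumption that a local system may or may not share.

```python
# Zero-width characters that str.strip() does not remove (assumed to count as "empty").
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\ufeff"))


def is_effectively_empty(value):
    """True for None, empty strings, and strings made only of whitespace or zero-width characters."""
    if value is None:
        return True
    return value.translate(ZERO_WIDTH).strip() == ""


def records_missing_title(records, title_field="title"):
    """Return the positions of records with no usable title value."""
    missing = []
    for index, record in enumerate(records):
        values = record.get(title_field, [])
        if all(is_effectively_empty(v) for v in values):
            missing.append(index)
    return missing


records = [{"title": ["Map"]}, {"title": ["  \u00a0"]}, {}, {"title": ["\u200b"]}]
print(records_missing_title(records))
```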

Value content matches the field type

  • For fields that have a specific data type, content matches the specified type

Values that may be correct:

  • A field requiring a date entry contains a date, e.g., formatted YYYY-MM-DD, rather than a text string, e.g., sometime in September

  • A field requiring a binary entry (e.g., checked/unchecked) does not contain an alphanumeric text string or other value

  • A field requiring a standard code value (e.g., language codes from ISO 639-2) does not contain other characters

  • A field requiring a numeric value does not contain letters or other symbols
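Several of these type checks can be automated. The sketch below is illustrative only: it assumes records are dictionaries mapping field names to lists of strings, that dates are expected as YYYY-MM-DD, and that a local mapping (`field_types`, hypothetical) says which type each field requires. The language check tests only the three-lowercase-letter shape of ISO 639-2 codes, not membership in the code list.

```python
import re
from datetime import date


def is_iso_date(value):
    """True only for a real calendar date written as YYYY-MM-DD."""
    if not re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", value):
        return False
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


def is_number(value):
    """True for plain integers or decimals with no letters or other symbols."""
    return re.fullmatch(r"-?[0-9]+(\.[0-9]+)?", value) is not None


def looks_like_iso639_2(value):
    """Shape check only: three lowercase ASCII letters."""
    return re.fullmatch(r"[a-z]{3}", value) is not None


VALIDATORS = {"date": is_iso_date, "number": is_number, "language": looks_like_iso639_2}


def type_mismatches(record, field_types):
    """Return (field, value) pairs whose content does not match the required type."""
    problems = []
    for field, type_name in field_types.items():
        for value in record.get(field, []):
            if not VALIDATORS[type_name](value):
                problems.append((field, value))
    return problems


record = {
    "date": ["1998-09-14", "sometime in September"],
    "language": ["eng", "English"],
    "pages": ["12", "12 p."],
}
field_types = {"date": "date", "language": "language", "pages": "number"}
print(type_mismatches(record, field_types))
```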

No values exceed applicable system character limits

  • No field in a record has more characters than any allowable limit

Possible issues to review:

  • Any value outside of a standard, required length (e.g., identifiers)

  • Any value longer than what is permitted by a local system on a technical level
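Length limits are straightforward to test once the limits are known. The following sketch assumes records are dictionaries mapping field names to lists of strings and that limits are supplied locally per field; `over_limit`, `char_limits`, and `byte_limits` are hypothetical names. It checks both characters and UTF-8 bytes because some systems enforce limits on storage size rather than on visible characters.

```python
def over_limit(record, char_limits, byte_limits=None):
    """Return (field, unit, size) for every value that exceeds a configured limit."""
    problems = []
    for field, values in record.items():
        for value in values:
            if field in char_limits and len(value) > char_limits[field]:
                problems.append((field, "chars", len(value)))
            if byte_limits and field in byte_limits:
                size = len(value.encode("utf-8"))
                if size > byte_limits[field]:
                    problems.append((field, "bytes", size))
    return problems


# Ten accented characters fit a 10-character limit but need 20 bytes in UTF-8.
record = {"title": ["é" * 10], "identifier": ["abc"]}
print(over_limit(record, {"title": 10}, {"title": 15}))
```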

There is no text encoding that “breaks” records

  • The record displays publicly without error messages

  • The record can be successfully edited/saved administratively and indexed in the system

Note:

  • This depends on local system limitations

Suggested Benchmarks

This category includes suggestions about what ought to be prioritized in local benchmarks to make records “better than minimal,” based on research and professional experience (noted in the justification column). These are mostly objective, but also include some subjective elements; suggested benchmarks are intended to be adjusted as needed and applied when applicable according to local requirements.

Table columns: Benchmark, Justification, Metrics, Examples

The record describes the item that it is attached to

(i.e., there is not a mismatch between an item and a record describing a different item)

This is a relatively fundamental need for metadata quality, but generally cannot be verified without manual review of every record (i.e., not scalable for large collections).

  • The preponderance of information in the record matches the content of the item

Note:

  • This requires manually reviewing an individual record to see if values largely match the associated item

All locally-required fields have values

By definition, required fields should have values, but which fields are required (or available for usage) varies too much among schemas to be stated in a standardized way.

  • Any field required by the governing schema is not empty

Possible issues to remediate:

  • Any record missing a value in a field that is deemed “required” by a local or relevant consortial schema, such as an identifier, language, resource type, etc.

All conditionally-required fields have values

By definition, required fields should have values, but which fields are required (or available for usage) varies too much among schemas to be stated in a standardized way.

  • Any field required by the governing schema under other conditions (e.g., “required if available”) is not empty in records meeting those conditions

Possible issues to remediate:

  • Any record missing a value in a field that is deemed “required when available” by a local or relevant consortial schema, e.g., fields labeled by DPLA as “required when available”:

    • Collection

    • Language

    • Type

  • Any field required by the governing schema for a specific material type is not empty in records for that type

Possible issues to remediate:

  • Any record for a resource type that is missing a value that is “required for <resource type>” by a local or relevant consortial schema – e.g., “creator is required for ETDs”
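Required and type-specific requirements can be checked together once the rules are written down. The sketch below assumes records are dictionaries mapping field names to lists of strings, that the resource type is stored in a field called `type`, and that the rule sets (`always_required`, `required_by_type`) are maintained locally; all of these names are hypothetical. “Required when available” rules are not covered, because availability generally cannot be determined from the record alone.

```python
def is_populated(record, field):
    """True if the field has at least one non-blank value."""
    return any(v.strip() for v in record.get(field, []))


def missing_fields(record, always_required, required_by_type, type_field="type"):
    """Return required fields that are empty, including type-specific requirements."""
    needed = list(always_required)
    for resource_type in record.get(type_field, []):
        needed.extend(required_by_type.get(resource_type, []))

    missing, seen = [], set()
    for field in needed:
        if field not in seen and not is_populated(record, field):
            missing.append(field)
        seen.add(field)
    return missing


record = {"title": ["Soil studies"], "type": ["ETD"], "creator": [""]}
always_required = ["title", "identifier"]
required_by_type = {"ETD": ["creator"]}   # e.g., "creator is required for ETDs"
print(missing_fields(record, always_required, required_by_type))
```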

Fields that require multiple parts or qualifiers have all parts

This is an extension of required field values; however, not all assessments consider values and other parts (like qualifiers) in tandem, and this benchmark also covers non-required fields when they are in use.

  • Any field that has a local governing schema requiring a qualifier has a qualifier value when content is present

  • Any field that has a local governing schema requiring multiple components has all parts

Possible issues to remediate:

  • Any value missing a qualifier field (e.g., for QDC or locally-qualified metadata fields)

  • Field values missing parts, e.g., if both publisher name & publisher location must be entered in a record and only one is present
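As a rough sketch of both metrics, the function below flags entries where some required parts are filled in and others are not. It assumes a different record shape from a flat list of strings: each field holds a list of entries, and each entry is a dictionary of named parts (e.g., `value` and `qualifier`, or `name` and `location`). The name `incomplete_entries` and the part names are hypothetical and would follow the local schema.

```python
def incomplete_entries(record, required_parts):
    """Return (field, position, absent_parts) for entries that are only partly filled in."""
    problems = []
    for field, parts in required_parts.items():
        for position, entry in enumerate(record.get(field, [])):
            filled = [p for p in parts if str(entry.get(p) or "").strip()]
            absent = [p for p in parts if p not in filled]
            # Only flag entries that have some content but are missing a required part.
            if filled and absent:
                problems.append((field, position, absent))
    return problems


record = {
    "date": [{"value": "1998", "qualifier": ""}],
    "publisher": [
        {"name": "Acme Press", "location": "Denton, TX"},
        {"name": "Other Press", "location": None},
    ],
}
required_parts = {"date": ["value", "qualifier"], "publisher": ["name", "location"]}
print(incomplete_entries(record, required_parts))
```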

Records have some type of subject value

Subject-based fields contain some of the only values that can help collocate related items across collections (including aggregations, like DPLA, Europeana, etc.) more broadly by topic. This also makes subject-based values a good candidate for review and normalization, if needed.

  • At least one subject-type field used by the governing schema has a value (e.g., subject, keyword, genre, etc.)

Values that may be correct:

  • Terms from a local or general thesaurus, like LCSH

  • Subjects from a specialized list or thesaurus like MeSH, the Art and Architecture Thesaurus, LC Medium of Performance Terms, Chenhall’s Nomenclature, Homosaurus, etc.

  • Genre terms from a local list or controlled thesaurus, like the LC Genre/Form Terms

  • Keywords relevant to the content of the item

A rights statement is present

Inclusion of rights information is broadly encouraged within the digital library community. Standardized rights statements were developed through international efforts.

  • A value describing rights associated with the item is in the record

Values that may be correct:

  • An item has a clearly-defined Creative Commons license listed in the record

  • The record contains a statement asserting copyright and/or listing the rights holder

  • When possible, a standardized rights statement is in the record; organizations should consider implementing these for shareability (see: https://rightsstatements.org/en/documentation/)
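To see how rights information is distributed across a collection, values can be sorted into rough groups. This sketch assumes records are dictionaries mapping field names to lists of strings, that rights information lives in a field called `rights`, and that standardized statements and Creative Commons licenses are recorded as URIs containing `rightsstatements.org/vocab/` or `creativecommons.org/`. The function name and the category labels are hypothetical.

```python
def classify_rights(record, rights_field="rights"):
    """Sort a record's rights information into a rough category for review."""
    values = [v.strip() for v in record.get(rights_field, []) if v.strip()]
    if not values:
        return "missing"
    joined = " ".join(values).lower()
    if "rightsstatements.org/vocab/" in joined:
        return "standardized statement"
    if "creativecommons.org/" in joined:
        return "creative commons"
    # Anything else is treated as a locally written statement.
    return "free text"


examples = [
    {"rights": ["http://rightsstatements.org/vocab/InC/1.0/"]},
    {"rights": ["https://creativecommons.org/licenses/by/4.0/"]},
    {"rights": ["Copyright 1998, Jane Doe"]},
    {"rights": [" "]},
]
print([classify_rights(r) for r in examples])
```

Any category other than “missing” satisfies the metric; the finer distinctions only help when planning a move toward standardized statements.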

Stray character encoding has been removed

This is a problem that tends to be relatively easy to find programmatically and, depending on string matching, can make a significant difference when terms are normalized.

  • Values do not include character encoding strings, mark-up values, or other non-displaying text (usually pasted in from another source)

Possible issues to remediate:

  • PDF character encoding, like “&#39;” instead of an apostrophe

  • LaTeX or other technical mark-up, like “.pi./sup +/, p”

  • MARC subfields in names or subjects, like “$c” or “|x”
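Problems of this kind can often be found with pattern matching. The patterns below are a heuristic starting point rather than a complete list: they are assumptions about what stray encoding tends to look like, they will produce some false positives, and they do not catch fragments of technical mark-up that lack a recognizable delimiter. The names `STRAY_PATTERNS` and `find_stray_encoding` are hypothetical.

```python
import re

# Heuristic patterns; matches are candidates for review, not confirmed errors.
STRAY_PATTERNS = {
    "character entity": re.compile(r"&(#\d+|#x[0-9A-Fa-f]+|[A-Za-z]+);"),
    "markup tag": re.compile(r"</?[A-Za-z][^<>]*>"),
    "MARC subfield code": re.compile(r"[$|][a-z]"),
    "control character": re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]"),
}


def find_stray_encoding(value):
    """Return the labels of every pattern found in a value."""
    return [label for label, pattern in STRAY_PATTERNS.items() if pattern.search(value)]


samples = [
    "Women&#39;s rights",
    "Smith, John, $c Dr.",
    "Texas|xHistory",
    "<i>Annual report</i>",
    "Annual report",
]
for sample in samples:
    print(sample, find_stray_encoding(sample))
```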

All “placeholder” values have been replaced/removed and are not present in the publicly accessible record

Placeholders can indicate missing information or records that need review, and they may be easy to find programmatically if they are applied consistently in local records.

  • Values do not include any strings meant to be replaced with other text

Possible issues to remediate:

  • The presence of text such as:

    • YYYY-MM

    • {{{name}}}

    • [add info]

    • <date value>

    • other placeholder text
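If placeholders are applied consistently, a short pattern list can find them. The sketch below covers only the example strings listed above; the patterns, the record shape (a dictionary mapping field names to lists of strings), and the name `placeholder_hits` are assumptions, and a real list would come from local templates and practice.

```python
import re

# Patterns modeled on the examples above; extend with local placeholder conventions.
PLACEHOLDER_PATTERNS = [
    re.compile(r"\bYYYY(-MM(-DD)?)?\b"),          # YYYY, YYYY-MM, YYYY-MM-DD
    re.compile(r"\{\{\{.*?\}\}\}"),               # {{{name}}}
    re.compile(r"\[add [^\]]*\]", re.IGNORECASE), # [add info]
    re.compile(r"<[^<>]* value>", re.IGNORECASE), # <date value>
]


def placeholder_hits(record):
    """Return (field, value) pairs that still contain placeholder text."""
    hits = []
    for field, values in record.items():
        for value in values:
            if any(pattern.search(value) for pattern in PLACEHOLDER_PATTERNS):
                hits.append((field, value))
    return hits


record = {
    "date": ["YYYY-MM"],
    "creator": ["{{{name}}}"],
    "description": ["[add info]"],
    "coverage": ["<date value>"],
    "title": ["Map of Texas"],
}
print(placeholder_hits(record))
```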

Extremely problematic/offensive terms have been removed or handled appropriately

Although comprehensive review and revision of records likely falls in the “ideal” category, it may be useful to think about that process iteratively and set first-level local priorities to address some problems more immediately.

  • Any values identified by the institution as priorities for removal or remediation are no longer present

Note:

  • This will depend on historic local practice, collection content, and decisions made based on current remediation practices; in some locations, this may also be affected by legislation or other policies

Ideal Benchmarks

Ideal-level benchmarks are intended to describe a “perfect” metadata record, i.e., one in which all available information about a specific item has been entered correctly, according to local standards. Many of these benchmarks are more subjective or item-dependent, and not every benchmark will apply, depending on system requirements. All applicable benchmarks must be met for a record to be “ideal.”

Table columns: Benchmark, Metrics, Examples

All metadata values align with expectations for the material type

  • Values in every field align with usage guidelines according to the local governing schema

Values that may be correct:

  • A creator for a photograph is labeled as a photographer and a creator for a book is labeled as an author

  • A thesis/dissertation has a creator value (rather than “unknown”)

  • A published text item has a language value (rather than “no language”)

When applicable, relationships between items and parent collections are clearly represented

  • Fields reflecting relationships are populated in line with the local governing schema

Values that may be correct:

  • “Collection” names (e.g., if this is an available field)

  • Notes referencing a larger collection or holdings

  • Link(s) to an archival finding aid, catalog record, or similar documentation for a collection

  • Series title or archival series information

  • Relation or source information referencing a collection (depending on local usage)

Relevant recommended/optional fields have values

  • Recommended and optional fields are not empty when information is available

Note:

  • This requires comparing an individual record to the item and any available supplementary information sources (e.g., catalog records, finding aids, handwritten notes in physical collections, information provided by a donor or subject expert, etc.) to determine if values have been entered

All relevant information about the item is included

  • Values accurately represent complete field information according to local guidelines

  • When applicable, multiple entries (i.e., all relevant entries) are included in a field

Non-required qualifiers or field parts are added to provide enhanced information or functionality

  • Qualifiers and field part values are not empty when information is available

“Null” values are used consistently, according to local guidelines

  • Unused/non-populated fields are empty or contain specific required text based on the local governing schema

Values that may be correct:

  • N/A, Not Applicable, Unknown (or similar) – if the schema requires one of these values

  • Blank entry – if the schema requires unused fields to be left blank

Fields/subfields that cannot be repeated occur only once

  • No non-repeatable fields occur multiple times in a record

  • No qualifier or field part that is non-repeatable occurs multiple times in a record

Possible issues to remediate:

  • A record for a single item has multiple formats

  • A record has multiple “creation” dates if only one is allowed
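Repeated values in non-repeatable fields are easy to detect when the list of such fields is known. This sketch assumes records are dictionaries mapping field names to lists of strings; the function name and the sample field names are hypothetical, and which fields are non-repeatable depends on the local governing schema.

```python
def repeated_non_repeatable(record, non_repeatable):
    """Return non-repeatable fields that hold more than one non-blank value."""
    problems = {}
    for field in non_repeatable:
        populated = [v for v in record.get(field, []) if v.strip()]
        if len(populated) > 1:
            problems[field] = populated
    return problems


record = {
    "creation_date": ["1998", "1999"],
    "format": ["image"],
    "subject": ["Cotton", "Agriculture"],  # repeatable, so not checked
}
print(repeated_non_repeatable(record, ["creation_date", "format"]))
```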

All values are appropriate lengths for their fields

  • The total number of characters and/or number of “tokens” (words or space-separated components) in each field value matches expectations of the local governing schema

Possible issues to remediate:

  • Extremely short values (e.g., subjects that are only 1 or 2 characters long)

  • Extremely long values (e.g., single name values more than 1,000 characters long)

Note:

  • Expected lengths will depend on local requirements, e.g., whether a field is repeatable (one term per entry) or if there is a single field with multiple separated terms
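Length outliers can be listed for review with simple character and token counts. The thresholds in this sketch (3 characters, 1,000 characters, and an optional token limit) are placeholders loosely based on the examples above, and `length_outliers` is a hypothetical name; appropriate values depend on the field and on local requirements.

```python
def length_outliers(values, min_chars=3, max_chars=1000, max_tokens=None):
    """Return (value, reason) pairs for values that look too short or too long."""
    flagged = []
    for value in values:
        n_chars = len(value.strip())
        n_tokens = len(value.split())  # space-separated components
        if n_chars < min_chars:
            flagged.append((value, "too short"))
        elif n_chars > max_chars or (max_tokens is not None and n_tokens > max_tokens):
            flagged.append((value, "too long"))
    return flagged


subjects = ["TX", "Agriculture", "a " * 600]
print([(value[:20], reason) for value, reason in length_outliers(subjects)])

# With one term expected per entry, a multi-term value can be flagged by token count.
print(length_outliers(["Agriculture; Cotton"], max_tokens=1))
```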

All values that ought to align with standards conform to applicable vocabularies or rules

  • Formatting for every field that aligns with a controlled vocabulary or standard is valid according to the relevant authority

Values that may be correct:

  • Date formatting matches EDTF, W3C, or other date standard in use

  • Names match LCNAF, VIAF, or other name standards in use

  • Locations align with TGN, GeoNames, or other location standard in use

  • Subjects match LCSH, AAT, TGM, LCGFT, MeSH, or other subject standard(s) in use

All values are spelled correctly

  • There are no misspelled words

  • Unusual spellings have been checked and verified

Notes:

  • If available, a spell-checker may be helpful (e.g., in a browser, text editor, etc.)

  • Some values – like names – may require manual checking or verification against other sources

Text fields use appropriate punctuation, grammar, abbreviations, etc.

  • Free-text fields meet any style requirements in the local governing schema

Values that may be correct:

  • Text that matches the expected tense (e.g., use of present or present-progressive tense)

  • Text written in “complete sentences” or written out in specific component parts, according to local requirements

Reading level & language use are appropriate for all (relevant) communities or audiences

  • If there is a defined user group, word choice and metadata values meet expectations for the audience

Values that may be correct:

  • Collections intended for students do not use language above the reading grade-level of users

  • Materials intended for scientific research have appropriate technical terminology or phrasing, based on the expectations of the particular field

Vocabulary usage aligns with the needs of the audience and material type

  • If there is a defined user group, controlled fields use values in line with audience expectations

Values that may be correct:

  • Use of MeSH terms in a medical collection (or collection intended for medical professionals) vs. LCSH or more general terms for a non-medical audience

  • Names come from the Union List of Artist Names (ULAN) for an art-related collection

Values connected to interface functionality work

  • Metadata values that drive more complex interface functionality work as intended

Values that may be correct:

  • Fields used locally for filtering searches or browsing (e.g., dates, subjects, locations, etc.) have values that are normalized to collocate information based on user selection or input

  • Values that become clickable links in local systems (e.g., names, resource types, genres, etc.) are normalized

Record language has been evaluated/updated to align with best practices related to reparative metadata, inclusive language, etc.

  • Metadata field usage and values align with local best practices

Note:

  • This will depend on historic local practice, collection content, and decisions made based on current remediation practices; in some locations, this may also be affected by legislation or other policies