Expanded Benchmarks

This documentation provides more specific information about how an organization might apply the benchmarks or determine the level of quality for records and collections of metadata.

Benchmark: criterion that must be met
Metric(s): mechanism or measurement to determine if a record/value meets the benchmark standard; these may depend on local guidelines and field usage
Examples and Notes: non-exhaustive list of additional clarifications and/or examples of values that may or may not meet a metric

In this model, the benchmark and metrics set the standard (i.e., the criteria that must be met to qualify for that quality level) and the examples show some ways that the standard might be applied for different local circumstances. Also note that minimal and ideal levels are clearly defined, while all intermediary benchmarking stages are left up to local organizations. The benchmarks in the “suggested” section describe suggested priorities for organizations setting “better-than-minimal” benchmarks for their metadata that fall between minimal and ideal.

General benchmarks usage:
- Each criterion is intended to be “system agnostic” but some may not apply to every situation (e.g., local field requirements)
- Criteria are binary – i.e., the set being evaluated must meet all points or it does not meet the benchmarking standard
- Benchmarks are cumulative – i.e., records must meet all the criteria at the chosen level and the lower levels, if relevant
- These benchmarks focus solely on the quality of metadata entry, not the quality of information – i.e., whether the available information is all entered correctly, even if additional information about an item would improve the record
- This framework is intended to be scalable (it is written in the context of 1 record, but could apply across a collection, resource type, or an entire system)
- Minimal criteria apply in all cases; suggested criteria do not rise to the level of “absolute minimum” but are suggested as priorities for “better-than-minimal” based on our research and experience; ideal criteria tend to be more subjective and may not apply in every situation
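
The binary and cumulative rules above can be sketched in a few lines of Python. This is only an illustration, not part of the framework: the record shape (a dictionary mapping field names to lists of values), the helper names, and the specific checks assigned to each level are all assumptions that a local organization would replace with its own.

```python
from typing import Callable, Dict, List

Record = Dict[str, List[str]]      # assumed shape: field name -> list of values
Check = Callable[[Record], bool]   # a criterion either passes or fails (binary)

LEVELS = ["minimal", "suggested", "ideal"]  # ordered lowest to highest


def highest_level_met(record: Record, checks: Dict[str, List[Check]]) -> str:
    """Return the highest level for which that level and all lower levels pass."""
    achieved = "below minimal"
    for level in LEVELS:
        level_checks = checks.get(level)
        # A level with no locally defined checks cannot be claimed, and a
        # single failed check means the level is not met (cumulative + binary).
        if not level_checks or not all(check(record) for check in level_checks):
            break
        achieved = level
    return achieved


def has_value(field: str) -> Check:
    """Build a check that passes when the field has a non-whitespace value."""
    def check(record: Record) -> bool:
        return any(value.strip() for value in record.get(field, []))
    return check


# Hypothetical local configuration; the field names are illustrative only.
local_checks = {
    "minimal": [has_value("title")],
    "suggested": [has_value("subject"), has_value("rights")],
}

record = {"title": ["Annual report, 1952"], "subject": ["Agriculture"]}
print(highest_level_met(record, local_checks))  # prints "minimal" (no rights value)
```

The same function could be mapped over every record in a collection to scale the evaluation beyond a single record.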

Minimal Benchmarks

Minimal-level benchmarks are mostly objective (e.g., a value is present/not present) and should apply to all records.

Benchmark	Metrics	Examples
The record is specific/scoped correctly	Values in a record for an individual item (e.g., a monograph, photograph, newsletter issue, etc.) describe only that item rather than multiple collection-level or serial-level items	Values that may be correct: Name(s) of the people or organizations who authored a text or issue, but not all of the authors from a collection of texts; A publisher value that matches an individual serial issue rather than multiple possible publisher values from an entire serial set; Subject-type values or descriptions of the content that reflect the item rather than general collection-level descriptions or subjects describing multiple items
The record is specific/scoped correctly	Values in a record describing multiple items, or a collection of items, reflect all of the content attached to it	Possible issues to review: Multiple records that all have the same title and/or description (especially if there are a large number); Relevant information is absent from a collection-level record
Every record has a title	The title field is not empty	Possible issues to review: Any record that has no value or an effectively empty title value (e.g., a value consisting exclusively of whitespace)
Value content matches the field type	For fields that have a specific data type, content matches the specified type	Values that may be correct: A field requiring a date entry contains a date, e.g., formatted YYYY-MM-DD, rather than a text string, e.g., “sometime in September”; A field requiring a binary entry (e.g., checked/unchecked) does not contain an alphanumeric text string or other value; A field requiring a standard code value (e.g., language codes from ISO 639-2) does not contain other characters; A field requiring a numeric value does not contain letters or other symbols
No values exceed applicable system character limits	No field in a record has more characters than any allowable limit	Possible issues to review: Any value outside of a standard, required length (e.g., identifiers); Any value longer than what is permitted by a local system on a technical level
There is no text encoding that “breaks” records	The record displays publicly without error messages; the record can be successfully edited/saved administratively and indexed in the system	Note: This is dependent on local system limitations
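
Several of the minimal metrics are objective enough to check programmatically. The following is a minimal sketch, assuming records are held as dictionaries mapping field names to lists of string values; the function names, field names, and character limits are hypothetical and would come from local guidelines and system documentation.

```python
import re
from datetime import date
from typing import Dict, List, Tuple

Record = Dict[str, List[str]]  # assumed shape: field name -> list of values


def title_present(record: Record) -> bool:
    """True if at least one title value contains non-whitespace characters."""
    return any(value.strip() for value in record.get("title", []))


def is_iso_date(value: str) -> bool:
    """True if the value is a real calendar date written as YYYY-MM-DD."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return False
    try:
        date.fromisoformat(value)  # rejects impossible dates such as 2023-02-30
    except ValueError:
        return False
    return True


def values_over_limit(record: Record, limits: Dict[str, int]) -> List[Tuple[str, str]]:
    """Return (field, value) pairs that exceed the character limit for the field."""
    return [
        (field, value)
        for field, values in record.items()
        for value in values
        if field in limits and len(value) > limits[field]
    ]


# Hypothetical record and limits, for illustration only.
record = {"title": ["   "], "date": ["sometime in September"], "identifier": ["ark-0001"]}
print(title_present(record))                          # False: whitespace only
print(is_iso_date(record["date"][0]))                 # False: not a date
print(values_over_limit(record, {"identifier": 6}))   # [('identifier', 'ark-0001')]
```

Checks for scope and for encoding that “breaks” records depend on the item and the local system, so they are not sketched here.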

Suggested Benchmarks

This category includes suggestions about what ought to be prioritized in local benchmarks to make records “better than minimal,” based on research and professional experience (noted in the justification column). These are mostly objective, but also include some subjective elements; suggested benchmarks are intended to be adjusted as needed and applied when applicable according to local requirements.

Benchmark	Justification	Metrics	Examples
The record describes the item that it is attached to (i.e., there is not a mismatch between an item and a record describing a different item)	This is a relatively fundamental need for metadata quality, but generally cannot be verified without manual review of every record (i.e., not scalable for large collections).	The preponderance of information in the record matches the content of the item	Note: This requires manually reviewing an individual record to see if values largely match the associated item
All locally-required fields have values	By definition, required fields should have values, but which fields are required (or available for usage) varies too much among schemas to be stated in a standardized way.	Any field required by the governing schema is not empty	Possible issues to remediate: Any record missing a value in a field that is deemed “required” by a local or relevant consortial schema, such as an identifier, language, resource type, etc.
All conditionally-required fields have values	By definition, required fields should have values, but which fields are required (or available for usage) varies too much among schemas to be stated in a standardized way.	Any field required by the governing schema under other conditions (e.g., “required if available”) is not empty in records meeting those conditions	Possible issues to remediate: Any record missing a value in a field that is deemed “required when available” by a local or relevant consortial schema, e.g., fields labeled by DPLA as “required when available”: Collection, Language, Type
All conditionally-required fields have values		Any field required by the governing schema for a specific material type is not empty in records for that type	Possible issues to remediate: Any record for a resource type that is missing a value that is “required for <resource type>” by a local or relevant consortial schema – e.g., “creator is required for ETDs”
Fields that require multiple parts or qualifiers have all parts	This is an extension of required field values; however, not all assessments consider values and other parts (like qualifiers) in tandem, and this also incorporates non-required fields when they are in use.	Any field that has a local governing schema requiring a qualifier has a qualifier value when content is present; any field that has a local governing schema requiring multiple components has all parts	Possible issues to remediate: Any value missing a qualifier field (e.g., for QDC or locally-qualified metadata fields); Field values missing parts, e.g., if both publisher name & publisher location must be entered in a record and only one is present
Records have some type of subject value	Subject-based fields contain some of the only values that can help collocate related items across collections (including aggregations, like DPLA, Europeana, etc.) more broadly by topic. This also makes subject-based values a good candidate for review and normalization, if needed.	At least one subject-type field used by the governing schema has a value (e.g., subject, keyword, genre, etc.)	Values that may be correct: Terms from a local or general thesaurus, like LCSH; Subjects from a specialized list or thesaurus like MeSH, the Art and Architecture Thesaurus, LC Medium of Performance Terms, Chenhall’s Nomenclature, Homosaurus, etc.; Genre terms from a local list or controlled thesaurus, like the LC Genre/Form Terms; Keywords relevant to the content of the item
A rights statement is present	Inclusion of rights information is broadly encouraged within the digital library community. Standardized rights statements were developed through international efforts.	A value describing rights associated with the item is in the record	Values that may be correct: An item has a clearly-defined Creative Commons license listed in the record; The record contains a statement asserting copyright and/or listing the rights holder; When possible, a standardized rights statement is in the record (organizations should consider implementing these for shareability, see: https://rightsstatements.org/en/documentation/)
Stray character encoding has been removed	This is a problem that tends to be relatively easy to find programmatically and, depending on string matching, can make a significant difference when terms are normalized.	Values do not include character encoding strings, mark-up values, or other non-displaying text (usually pasted in from another source)	Possible issues to remediate: PDF character encoding, like “'” instead of an apostrophe; LaTeX or other technical mark-up, like “.pi./sup +/, p”; MARC subfields in names or subjects, like “$c” or “|x”
All “placeholder” values have been replaced/removed and are not present in the publicly accessible record	Placeholders can be an indicator of information that is missing or records that need review; they may also be easy to find programmatically if placeholders are applied consistently in local records.	Values do not include any strings meant to be replaced with other text	Possible issues to remediate: The presence of text such as YYYY-MM, {{{name}}}, [add info], <date value>, or other placeholder text
Extremely problematic/offensive terms have been removed or handled appropriately	Although comprehensive review and revision of records likely falls in the “ideal” category, it may be useful to think about that process iteratively and set first-level local priorities to address some problems more immediately.	Any values identified by the institution as priorities to remove for remediation are no longer present	Note: This will depend on historic local practice, collection content, and decisions made based on current remediation practices; in some locations, this may also be affected by legislation or other policies
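
Because the justifications above note that missing required values, placeholders, and stray encoding tend to be findable programmatically, a rough sketch of such checks may be useful. Everything here is an assumption for illustration: the record shape (field names mapped to lists of values), the list of required fields, and the regular expressions, which are heuristics that will produce false positives and should be tuned to local practice.

```python
import re
from typing import Dict, List, Pattern, Tuple

Record = Dict[str, List[str]]  # assumed shape: field name -> list of values

# Hypothetical local requirements; a governing schema would define the real list.
REQUIRED_FIELDS = ["identifier", "language", "type"]

# Heuristic patterns based on the placeholder examples in the table above.
PLACEHOLDERS = [re.compile(p) for p in (
    r"YYYY", r"\{\{\{[^{}]*\}\}\}", r"\[add info\]", r"<date value>",
)]

# Heuristic patterns for stray encoding: HTML-style entities and MARC subfields.
STRAY_ENCODING = [re.compile(p) for p in (
    r"&(#\d+|#x[0-9A-Fa-f]+|[A-Za-z]+);", r"[$|][a-z]\b",
)]


def missing_required(record: Record, required: List[str] = REQUIRED_FIELDS) -> List[str]:
    """Return required fields that are absent or contain only whitespace."""
    return [f for f in required if not any(v.strip() for v in record.get(f, []))]


def find_matches(record: Record, patterns: List[Pattern[str]]) -> List[Tuple[str, str]]:
    """Return (field, value) pairs where any of the patterns is found."""
    return [
        (field, value)
        for field, values in record.items()
        for value in values
        if any(p.search(value) for p in patterns)
    ]


# Hypothetical record, for illustration only.
record = {
    "identifier": ["ark-0001"],
    "title": ["Letter to {{{name}}}"],
    "date": ["YYYY-MM"],
    "creator": ["Smith, John $c Dr."],
    "description": ["The author&#39;s notes"],
}
print(missing_required(record))              # ['language', 'type']
print(find_matches(record, PLACEHOLDERS))    # title and date values
print(find_matches(record, STRAY_ENCODING))  # creator and description values
```

Flagged values are candidates for review rather than confirmed errors; benchmarks such as item/record mismatch or problematic terms still require local decisions and manual review.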

Ideal Benchmarks

Ideal-level benchmarks are intended to describe a “perfect” metadata record, i.e., one in which all available information about a specific item has been entered correctly, according to local standards. Many of these benchmarks are more subjective or item-dependent, and not every benchmark will apply, depending on system requirements. All applicable benchmarks must be met for a record to be “ideal.”

Benchmark	Metrics	Examples
All metadata values align with expectations for the material type	Values in every field align with usage guidelines according to the local governing schema	Values that may be correct: A creator for a photograph is labeled as a photographer and a creator for a book is labeled as an author; A thesis/dissertation has a creator value (rather than “unknown”); A published text item has a language value (rather than “no language”)
When applicable, relationships between items and parent collections are clearly represented	Field values reflecting relationships are not empty in line with the local governing schema	Values that may be correct: “Collection” names (e.g., if this is an available field); Notes referencing a larger collection or holdings; Link(s) to an archival finding aid, catalog record, or similar documentation for a collection; Series title or archival series information; Relation or source information referencing a collection (depending on local usage)
Relevant recommended/optional fields have values	Recommended and optional fields are not empty when information is available	Note: This requires comparing an individual record to the item and any available supplementary information sources (e.g., catalog records, finding aids, handwritten notes in physical collections, information provided by a donor or subject expert, etc.) to determine if values have been entered
All relevant information about the item is included	Values accurately represent complete field information according to local guidelines; when applicable, multiple entries (i.e., all relevant entries) are included in a field
Non-required qualifiers or field parts are added to provide enhanced information or functionality	Qualifiers and field part values are not empty when information is available
“Null” values are used consistently, according to local guidelines	Unused/non-populated fields are empty or contain specific required text based on the local governing schema	Values that may be correct: N/A, Not Applicable, Unknown (or similar) – if the schema requires one of these values; Blank entry – if the schema requires unused fields to be left blank
Fields/subfields that cannot be repeated occur only once	No non-repeatable fields occur multiple times in a record; no qualifier or field part that is non-repeatable occurs multiple times in a record	Possible issues to remediate: A record for a single item has multiple formats; A record has multiple “creation” dates if only one is allowed
All values are appropriate lengths for their fields	The total number of characters and/or number of “tokens” (words or space-separated components) in each field value matches expectations of the local governing schema	Possible issues to remediate: Extremely short values (e.g., subjects that are only 1 or 2 characters long); Extremely long values (e.g., single name values more than 1,000 characters long). Note: Expected lengths will depend on local requirements, e.g., whether a field is repeatable (one term per entry) or if there is a single field with multiple separated terms
All values that ought to align with standards conform to applicable vocabularies or rules	Formatting for every field that aligns with a controlled vocabulary or standard is valid according to the relevant authority	Values that may be correct: Date formatting matches EDTF, W3C, or other date standard in use; Names match LCNAF, VIAF, or other name standards in use; Locations align with TGN, GeoNames, or other location standard in use; Subjects match LCSH, AAT, TGM, LCGFT, MeSH, or other subject standard(s) in use
All values are spelled correctly	There are no misspelled words; unusual spellings have been checked and verified	Notes: If available, a spell-checker may be helpful (e.g., in a browser, text editor, etc.); Some values – like names – may require manual checking or verification against other sources
Text fields use appropriate punctuation, grammar, abbreviations, etc.	Free-text fields meet any style requirements in the local governing schema	Possible issues to remediate: Text strings (not from controlled vocabularies or authorities) that don’t match local formatting requirements, e.g., duplicated or missing punctuation and spacing; variations in capitalization or lack of capitalization in proper names; unexpected word order (e.g., names or subjects); irregular alphanumeric patterns (e.g., identifiers). Values that may be correct: Text that matches the expected tense (e.g., use of present or present-progressive tense); Text written in “complete sentences” or written out in specific component parts, according to local requirements
Reading level & language use are appropriate for all (relevant) communities or audiences	If there is a defined user group, word choice and metadata values meet expectations for the audience	Values that may be correct: Collections intended for students do not use language above the reading grade-level of users; Materials intended for scientific research have appropriate technical terminology or phrasing, based on the expectations of the particular field
Vocabulary usage aligns with the needs of the audience and material type	If there is a defined user group, controlled fields use values in line with audience expectations	Values that may be correct: Use of MeSH terms in a medical collection (or collection intended for medical professionals) vs. LCSH or more general terms for a non-medical audience; Names come from the Union List of Artist Names (ULAN) for an art-related collection
Values connected to interface functionality work	Metadata values associated with more complex functionality function as intended	Values that may be correct: Fields used locally for filtering searches or browsing (e.g., dates, subjects, locations, etc.) have values that are normalized to collocate information based on user selection or input; Values that become clickable links in local systems (e.g., names, resource types, genres, etc.) are normalized
Record language has been evaluated/updated to align with best practices related to reparative metadata, inclusive language, etc.	Metadata field usage and values align with local best practices	Note: This will depend on historic local practice, collection content, and decisions made based on current remediation practices; in some locations, this may also be affected by legislation or other policies
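
Most ideal-level benchmarks need human judgment, but the repeatability and length metrics can be approximated in code. This is a sketch under stated assumptions: the record shape (field names mapped to lists of values), the list of non-repeatable fields, and the character bounds are hypothetical stand-ins for whatever the local governing schema specifies.

```python
from typing import Dict, List, Tuple

Record = Dict[str, List[str]]  # assumed shape: field name -> list of values


def repeated_nonrepeatable(record: Record, non_repeatable: List[str]) -> List[str]:
    """Return non-repeatable fields that occur more than once in the record."""
    return [field for field in non_repeatable if len(record.get(field, [])) > 1]


def token_count(value: str) -> int:
    """Count whitespace-separated components ("tokens") in a value."""
    return len(value.split())


def unexpected_lengths(
    record: Record, bounds: Dict[str, Tuple[int, int]]
) -> List[Tuple[str, str]]:
    """Return (field, value) pairs whose character count is outside (min, max)."""
    flagged = []
    for field, (shortest, longest) in bounds.items():
        for value in record.get(field, []):
            if not shortest <= len(value.strip()) <= longest:
                flagged.append((field, value))
    return flagged


# Hypothetical record and local expectations, for illustration only.
record = {
    "subject": ["a", "Cattle ranching history"],
    "date_created": ["1901", "1902"],
    "format": ["image"],
}
print(repeated_nonrepeatable(record, ["date_created", "format"]))  # ['date_created']
print(unexpected_lengths(record, {"subject": (3, 200)}))           # [('subject', 'a')]
print(token_count("Cattle ranching history"))                      # 3
```

As the note on value lengths indicates, sensible bounds differ depending on whether a field holds one term per entry or several separated terms, so the thresholds should be set per field.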