DOCiD™ Metadata Schema
Schema Overview
The DOCiD™ metadata schema implements a comprehensive framework for managing digital publications with persistent identifiers. Built on PostgreSQL with SQLAlchemy ORM, it provides robust data management with full referential integrity.
Key Features
- 7-step workflow for publication registration
- Multi-identifier support (DOI, Handle, CSTR, ARK)
- Hierarchical comment system with threading
- Comprehensive audit trail
- JSON metadata extensions
- External service integrations
Database Architecture
- PostgreSQL backend (JSON support required)
- SQLAlchemy ORM with migrations
- Cascading relationships
- Comprehensive indexing
- Soft delete support
- Audit logging
Core Data Models
DOCiD™ implements the following primary database models for managing publications and related entities:
Model | Table Name | Purpose | Key Relationships |
---|---|---|---|
Publications | publications | Central publication entity | → Files, Documents, Creators, Organizations, Funders, Projects |
PublicationFiles | publications_files | Uploaded publication files | ← Publications |
PublicationDocuments | publication_documents | Document metadata | ← Publications |
PublicationCreators | publication_creators | Authors/contributors | ← Publications, → CreatorsRoles |
PublicationOrganization | publication_organizations | Institutional affiliations | ← Publications |
PublicationFunders | publication_funders | Funding information | ← Publications, → FunderTypes |
PublicationProjects | publication_projects | Research projects | ← Publications |
PublicationComments | publication_comments | Hierarchical comments | ← Publications, UserAccount, self-referential |
PublicationAuditTrail | publication_audit_trail | Change tracking | ← Publications, UserAccount |
Publications Model
The central entity managing all publication metadata:
Publications Table Schema
class Publications(db.Model):
__tablename__ = 'publications'
# Primary Key
id = Column(Integer, primary_key=True, autoincrement=True)
# User Association
user_id = Column(Integer, ForeignKey('user_accounts.user_id'), nullable=False, index=True)
# Core Identifiers
document_docid = Column(String(255), nullable=False) # DOCiD identifier
doi = Column(String(50), nullable=False) # DOI
# Metadata Fields
document_title = Column(String(255), nullable=False)
document_description = Column(Text)
avatar = Column(String(255)) # Publication thumbnail
owner = Column(String(255)) # Owner name
publication_poster_url = Column(String(255)) # Poster/cover image
# Classification
resource_type_id = Column(Integer, ForeignKey('resource_types.id'), nullable=False, index=True)
# Timestamps
timestamp = Column(Integer, default=lambda: int(datetime.utcnow().timestamp()))
published = Column(DateTime, default=datetime.utcnow)
updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow)
updated_by = Column(Integer, ForeignKey('user_accounts.user_id'), nullable=True, index=True)
# Relationships (cascade="all, delete-orphan")
user_account = relationship('UserAccount', back_populates='publications', foreign_keys=[user_id])
publications_files = relationship('PublicationFiles', cascade="all, delete-orphan")
publication_documents = relationship('PublicationDocuments', cascade="all, delete-orphan")
publication_creators = relationship('PublicationCreators', cascade="all, delete-orphan")
publication_organizations = relationship('PublicationOrganization', cascade="all, delete-orphan")
publication_funders = relationship('PublicationFunders', cascade="all, delete-orphan")
publication_projects = relationship('PublicationProjects', cascade="all, delete-orphan")
PublicationCreators Model
Manages authors and contributors with ORCID integration:
PublicationCreators Schema
class PublicationCreators(db.Model):
__tablename__ = 'publication_creators'
id = Column(Integer, primary_key=True)
publication_id = Column(Integer, ForeignKey('publications.id'), nullable=False, index=True)
# Name Fields
family_name = Column(String(255), nullable=False)
given_name = Column(String(255))
# Identifiers
identifier = Column(String(500)) # Full URL (e.g., https://orcid.org/0000-0002-1981-4157)
identifier_type = Column(String(50)) # Type: 'orcid', 'isni', 'viaf'
# Role
role_id = Column(String(255), nullable=False) # References CreatorsRoles
# Relationship
publication = relationship('Publications', back_populates='publication_creators')
Supported Creator Roles
- Innovator - Innovation lead
- Director - Project director
- Researcher - Research contributor
- Principal Investigator - Lead researcher
- Librarian - Information specialist
- Vice Chancellor - Academic leadership
- Deputy Vice Chancellor - Academic administration
PublicationOrganization Model
Institutional affiliations with ROR (Research Organization Registry) integration:
PublicationOrganization Schema
class PublicationOrganization(db.Model):
__tablename__ = 'publication_organizations'
id = Column(Integer, primary_key=True)
publication_id = Column(Integer, ForeignKey('publications.id'), nullable=False, index=True)
# Organization Details
name = Column(String(255), nullable=False)
type = Column(String(255), nullable=False)
other_name = Column(String(255)) # Alternative names
country = Column(String(255))
# Identifiers
identifier = Column(String(500)) # Full URL (e.g., https://ror.org/02nr0ka47)
identifier_type = Column(String(50)) # Type: 'ror', 'grid', 'isni'
# Relationship
publication = relationship('Publications', back_populates='publication_organizations')
PublicationFunders Model
Funding information with ROR and Crossref Funder Registry integration:
PublicationFunders Schema
class PublicationFunders(db.Model):
__tablename__ = 'publication_funders'
id = Column(Integer, primary_key=True)
publication_id = Column(Integer, ForeignKey('publications.id'), nullable=False, index=True)
# Funder Details
name = Column(String(255), nullable=False)
type = Column(String(255), nullable=False)
funder_type_id = Column(Integer, ForeignKey('funder_types.id'), nullable=False, index=True)
other_name = Column(String(255))
country = Column(String(255))
# Identifiers
identifier = Column(String(500)) # Full URL (e.g., https://ror.org/01ej9dk98)
identifier_type = Column(String(50)) # Type: 'ror', 'fundref', 'isni'
# Relationships
publication = relationship('Publications', back_populates='publication_funders')
PublicationProjects Model
Research projects with RAiD (Research Activity Identifier) integration:
PublicationProjects Schema
class PublicationProjects(db.Model):
__tablename__ = 'publication_projects'
id = Column(Integer, primary_key=True)
publication_id = Column(Integer, ForeignKey('publications.id'), nullable=False, index=True)
# Project Details
title = Column(String(255), nullable=False)
description = Column(Text, nullable=False)
# Identifiers
raid_id = Column(String(255)) # Legacy RAiD ID
identifier = Column(String(500)) # Full URL (e.g., https://app.demo.raid.org.au/raids/10.80368/b1adfb3a)
identifier_type = Column(String(50)) # Type: 'raid'
# Relationship
publication = relationship('Publications', back_populates='publication_projects')
Files and Documents Management
PublicationFiles
class PublicationFiles(db.Model):
__tablename__ = 'publications_files'
id = Column(Integer, primary_key=True)
publication_id = Column(Integer, ForeignKey('publications.id'))
# File Metadata
title = Column(String(255), nullable=False)
description = Column(Text, nullable=False)
publication_type_id = Column(Integer, ForeignKey('publication_types.id'))
# File Details
file_name = Column(String(255), nullable=False)
file_type = Column(String(100), nullable=False)
file_url = Column(String(255), nullable=False)
# Identifiers
identifier = Column(String(100), nullable=False)
generated_identifier = Column(String(100), nullable=False)
handle_identifier = Column(String(100)) # CORDRA Handle
external_identifier = Column(String(100)) # DataCite/Crossref DOI
external_identifier_type = Column(String(50))
PublicationDocuments
class PublicationDocuments(db.Model):
__tablename__ = 'publication_documents'
id = Column(Integer, primary_key=True)
publication_id = Column(Integer, ForeignKey('publications.id'))
# Document Metadata
title = Column(String(255), nullable=False)
description = Column(Text, nullable=False)
publication_type_id = Column(Integer, ForeignKey('publication_types.id'))
# File Reference
file_url = Column(String(255), nullable=False)
# Identifiers
identifier_type_id = Column(Integer, ForeignKey('identifier_types.id'))
generated_identifier = Column(String(255))
identifier_cstr = Column(String(100)) # CSTR identifier
handle_identifier = Column(String(100)) # CORDRA Handle
external_identifier = Column(String(100)) # External DOI
external_identifier_type = Column(String(50))
Identifier Management
DOCiD™ supports multiple persistent identifier schemes:
Identifier Type | Format | Example | Integration |
---|---|---|---|
DOCiD™ | 20.500.[prefix]/[suffix] | 20.500.14351/b23a80ce9b1a6ef21588 | Internal generation |
DOI | 10.[prefix]/[suffix] | 10.1234/example.doi | Crossref/DataCite |
Handle | [prefix]/[suffix] | 20.500.12345/67890 | CORDRA |
CSTR | 31253.11.[suffix] | 31253.11.sciencedh.2024.123 | CSTR Service |
ORCID | 0000-0000-0000-000X | 0000-0002-1981-4157 | ORCID API |
ROR | https://ror.org/[id] | https://ror.org/02nr0ka47 | ROR API |
RAiD | 10.[prefix]/[suffix] | 10.80368/b1adfb3a | RAiD Service |
Audit Trail
Comprehensive change tracking for compliance and history:
PublicationAuditTrail Model
class PublicationAuditTrail(db.Model):
__tablename__ = 'publication_audit_trail'
id = Column(Integer, primary_key=True)
publication_id = Column(Integer, ForeignKey('publications.id'), nullable=False, index=True)
user_id = Column(Integer, ForeignKey('user_accounts.user_id'), nullable=False, index=True)
# Action Tracking
action = Column(String(50), nullable=False) # 'CREATE', 'UPDATE', 'DELETE'
field_name = Column(String(100)) # Field changed
old_value = Column(Text) # Previous value (JSON)
new_value = Column(Text) # New value (JSON)
# Context
timestamp = Column(DateTime, nullable=False, default=datetime.utcnow, index=True)
ip_address = Column(String(45)) # IPv4/IPv6
user_agent = Column(Text) # Browser/client info
# Relationships
publication = relationship('Publications', backref='audit_entries')
user = relationship('UserAccount', backref='audit_actions')
Draft Management
Session persistence for multi-step workflow:
PublicationDrafts Model
class PublicationDrafts(db.Model):
__tablename__ = 'publication_drafts'
id = Column(Integer, primary_key=True)
email = Column(String(100), nullable=False, unique=True, index=True) # One draft per user
form_data = Column(JSON, nullable=False) # Complete form state as JSON
# Timestamps
created_at = Column(DateTime, default=datetime.utcnow)
updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow)
Reference Tables (Controlled Vocabularies)
ResourceTypes
- Indigenous Knowledge
- Patent
- Cultural Heritage
- Project
- Funder
- DMP (Data Management Plan)
PublicationTypes
- Article
- Book Chapter
- Proceeding
- Monograph
- Preprint
- Edited Book
- Seminar
- Research Chapter
- Review Article
- Book Review
- Conference Abstract
- Letter to Editor
- Editorial
- Other Book Content
- Correction Erratum
FunderTypes
- Government
- Foundation
- Corporate
- Academic Institution
- International Organization
- Non-profit
IdentifierTypes
- APA Handle ID
- Crossref
- Datacite
- CSTR
- DOI
- ARK Keys
- ArXiv ID
- Handle ID
- Hand iD
- dPID
- ORCID
- ISNI
- Ringgold
User Account Management
Comprehensive user profile with social authentication:
UserAccount Model (Key Fields)
class UserAccount(db.Model):
__tablename__ = "user_accounts"
user_id = Column(Integer, primary_key=True)
# Authentication
email = Column(String(100), unique=True, nullable=False)
password = Column(String(255), nullable=True) # For local auth
social_id = Column(String(100), nullable=True) # Social auth ID
type = Column(String(20), nullable=False) # google, orcid, github
# Profile Information
user_name = Column(String(50), nullable=False)
full_name = Column(String(100), nullable=False)
avator = Column(String(255), nullable=True)
# Professional Information
affiliation = Column(String(100), nullable=True)
role = Column(String(50), nullable=True)
orcid_id = Column(String(50), nullable=True)
ror_id = Column(String(50), nullable=True)
# Location
country = Column(String(50), nullable=True)
city = Column(String(50), nullable=True)
location = Column(String(100), nullable=True)
# Social Profiles
linkedin_profile_link = Column(String(255), nullable=True)
facebook_profile_link = Column(String(255), nullable=True)
x_profile_link = Column(String(255), nullable=True)
instagram_profile_link = Column(String(255), nullable=True)
github_profile_link = Column(String(255), nullable=True)
# Timestamps
timestamp = Column(DateTime, default=datetime.utcnow)
date_joined = Column(DateTime, default=datetime.utcnow)
# Relationships
publications = relationship('Publications', cascade="all, delete-orphan")
Database Constraints & Indexes
Constraint Type | Tables | Purpose |
---|---|---|
Primary Keys | All tables | Unique record identification |
Foreign Keys | All relationship fields | Referential integrity |
Unique Constraints | email (user_accounts), doi (publications) | Prevent duplicates |
Not Null | Required fields | Data completeness |
Indexes | Foreign keys, timestamp fields | Query performance |
Cascading Delete | Publication relationships | Maintain integrity on deletion |
Database Migrations
Alembic-based migration management:
Migration Commands
# Initialize migrations
python run.py db init
# Create a new migration
python run.py db migrate -m "Description of changes"
# Apply migrations
python run.py db upgrade
# Rollback migration
python run.py db downgrade
# View migration history
python run.py db history
# Create database and seed data
python manage.py create-db
python manage.py seed-db
Comments System
Hierarchical commenting with threading and moderation:
PublicationComments Model