DOCiD™ Metadata Schema

Technical specification for publication and document identifier management system

Schema Overview

The DOCiD™ metadata schema implements a comprehensive framework for managing digital publications with persistent identifiers. Built on PostgreSQL with SQLAlchemy ORM, it provides robust data management with full referential integrity.

Key Features

  • 7-step workflow for publication registration
  • Multi-identifier support (DOI, Handle, CSTR, ARK)
  • Hierarchical comment system with threading
  • Comprehensive audit trail
  • JSON metadata extensions
  • External service integrations

Database Architecture

  • PostgreSQL backend (JSON support required)
  • SQLAlchemy ORM with migrations
  • Cascading relationships
  • Comprehensive indexing
  • Soft delete support
  • Audit logging

Core Data Models

DOCiD™ implements the following primary database models for managing publications and related entities:

Model Table Name Purpose Key Relationships
Publications publications Central publication entity → Files, Documents, Creators, Organizations, Funders, Projects
PublicationFiles publications_files Uploaded publication files ← Publications
PublicationDocuments publication_documents Document metadata ← Publications
PublicationCreators publication_creators Authors/contributors ← Publications, → CreatorsRoles
PublicationOrganization publication_organizations Institutional affiliations ← Publications
PublicationFunders publication_funders Funding information ← Publications, → FunderTypes
PublicationProjects publication_projects Research projects ← Publications
PublicationComments publication_comments Hierarchical comments ← Publications, UserAccount, self-referential
PublicationAuditTrail publication_audit_trail Change tracking ← Publications, UserAccount

Publications Model

The central entity managing all publication metadata:

Publications Table Schema

class Publications(db.Model):
    __tablename__ = 'publications'

    # Primary Key
    id = Column(Integer, primary_key=True, autoincrement=True)

    # User Association
    user_id = Column(Integer, ForeignKey('user_accounts.user_id'), nullable=False, index=True)

    # Core Identifiers
    document_docid = Column(String(255), nullable=False)  # DOCiD identifier
    doi = Column(String(50), nullable=False)              # DOI

    # Metadata Fields
    document_title = Column(String(255), nullable=False)
    document_description = Column(Text)
    avatar = Column(String(255))                          # Publication thumbnail
    owner = Column(String(255))                           # Owner name
    publication_poster_url = Column(String(255))          # Poster/cover image

    # Classification
    resource_type_id = Column(Integer, ForeignKey('resource_types.id'), nullable=False, index=True)

    # Timestamps
    timestamp = Column(Integer, default=lambda: int(datetime.utcnow().timestamp()))
    published = Column(DateTime, default=datetime.utcnow)
    updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow)
    updated_by = Column(Integer, ForeignKey('user_accounts.user_id'), nullable=True, index=True)

    # Relationships (cascade="all, delete-orphan")
    user_account = relationship('UserAccount', back_populates='publications', foreign_keys=[user_id])
    publications_files = relationship('PublicationFiles', cascade="all, delete-orphan")
    publication_documents = relationship('PublicationDocuments', cascade="all, delete-orphan")
    publication_creators = relationship('PublicationCreators', cascade="all, delete-orphan")
    publication_organizations = relationship('PublicationOrganization', cascade="all, delete-orphan")
    publication_funders = relationship('PublicationFunders', cascade="all, delete-orphan")
    publication_projects = relationship('PublicationProjects', cascade="all, delete-orphan")

PublicationCreators Model

Manages authors and contributors with ORCID integration:

PublicationCreators Schema

class PublicationCreators(db.Model):
    __tablename__ = 'publication_creators'

    id = Column(Integer, primary_key=True)
    publication_id = Column(Integer, ForeignKey('publications.id'), nullable=False, index=True)

    # Name Fields
    family_name = Column(String(255), nullable=False)
    given_name = Column(String(255))

    # Identifiers
    identifier = Column(String(500))       # Full URL (e.g., https://orcid.org/0000-0002-1981-4157)
    identifier_type = Column(String(50))   # Type: 'orcid', 'isni', 'viaf'

    # Role
    role_id = Column(String(255), nullable=False)  # References CreatorsRoles

    # Relationship
    publication = relationship('Publications', back_populates='publication_creators')

Supported Creator Roles

  • Innovator - Innovation lead
  • Director - Project director
  • Researcher - Research contributor
  • Principal Investigator - Lead researcher
  • Librarian - Information specialist
  • Vice Chancellor - Academic leadership
  • Deputy Vice Chancellor - Academic administration

PublicationOrganization Model

Institutional affiliations with ROR (Research Organization Registry) integration:

PublicationOrganization Schema

class PublicationOrganization(db.Model):
    __tablename__ = 'publication_organizations'

    id = Column(Integer, primary_key=True)
    publication_id = Column(Integer, ForeignKey('publications.id'), nullable=False, index=True)

    # Organization Details
    name = Column(String(255), nullable=False)
    type = Column(String(255), nullable=False)
    other_name = Column(String(255))              # Alternative names
    country = Column(String(255))

    # Identifiers
    identifier = Column(String(500))              # Full URL (e.g., https://ror.org/02nr0ka47)
    identifier_type = Column(String(50))          # Type: 'ror', 'grid', 'isni'

    # Relationship
    publication = relationship('Publications', back_populates='publication_organizations')

PublicationFunders Model

Funding information with ROR and Crossref Funder Registry integration:

PublicationFunders Schema

class PublicationFunders(db.Model):
    __tablename__ = 'publication_funders'

    id = Column(Integer, primary_key=True)
    publication_id = Column(Integer, ForeignKey('publications.id'), nullable=False, index=True)

    # Funder Details
    name = Column(String(255), nullable=False)
    type = Column(String(255), nullable=False)
    funder_type_id = Column(Integer, ForeignKey('funder_types.id'), nullable=False, index=True)
    other_name = Column(String(255))
    country = Column(String(255))

    # Identifiers
    identifier = Column(String(500))              # Full URL (e.g., https://ror.org/01ej9dk98)
    identifier_type = Column(String(50))          # Type: 'ror', 'fundref', 'isni'

    # Relationships
    publication = relationship('Publications', back_populates='publication_funders')

PublicationProjects Model

Research projects with RAiD (Research Activity Identifier) integration:

PublicationProjects Schema

class PublicationProjects(db.Model):
    __tablename__ = 'publication_projects'

    id = Column(Integer, primary_key=True)
    publication_id = Column(Integer, ForeignKey('publications.id'), nullable=False, index=True)

    # Project Details
    title = Column(String(255), nullable=False)
    description = Column(Text, nullable=False)

    # Identifiers
    raid_id = Column(String(255))                 # Legacy RAiD ID
    identifier = Column(String(500))              # Full URL (e.g., https://app.demo.raid.org.au/raids/10.80368/b1adfb3a)
    identifier_type = Column(String(50))          # Type: 'raid'

    # Relationship
    publication = relationship('Publications', back_populates='publication_projects')

Files and Documents Management

PublicationFiles

class PublicationFiles(db.Model):
    __tablename__ = 'publications_files'

    id = Column(Integer, primary_key=True)
    publication_id = Column(Integer, ForeignKey('publications.id'))

    # File Metadata
    title = Column(String(255), nullable=False)
    description = Column(Text, nullable=False)
    publication_type_id = Column(Integer, ForeignKey('publication_types.id'))

    # File Details
    file_name = Column(String(255), nullable=False)
    file_type = Column(String(100), nullable=False)
    file_url = Column(String(255), nullable=False)

    # Identifiers
    identifier = Column(String(100), nullable=False)
    generated_identifier = Column(String(100), nullable=False)
    handle_identifier = Column(String(100))       # CORDRA Handle
    external_identifier = Column(String(100))     # DataCite/Crossref DOI
    external_identifier_type = Column(String(50))

PublicationDocuments

class PublicationDocuments(db.Model):
    __tablename__ = 'publication_documents'

    id = Column(Integer, primary_key=True)
    publication_id = Column(Integer, ForeignKey('publications.id'))

    # Document Metadata
    title = Column(String(255), nullable=False)
    description = Column(Text, nullable=False)
    publication_type_id = Column(Integer, ForeignKey('publication_types.id'))

    # File Reference
    file_url = Column(String(255), nullable=False)

    # Identifiers
    identifier_type_id = Column(Integer, ForeignKey('identifier_types.id'))
    generated_identifier = Column(String(255))
    identifier_cstr = Column(String(100))         # CSTR identifier
    handle_identifier = Column(String(100))       # CORDRA Handle
    external_identifier = Column(String(100))     # External DOI
    external_identifier_type = Column(String(50))

Identifier Management

DOCiD™ supports multiple persistent identifier schemes:

Identifier Type Format Example Integration
DOCiD™ 20.500.[prefix]/[suffix] 20.500.14351/b23a80ce9b1a6ef21588 Internal generation
DOI 10.[prefix]/[suffix] 10.1234/example.doi Crossref/DataCite
Handle [prefix]/[suffix] 20.500.12345/67890 CORDRA
CSTR 31253.11.[suffix] 31253.11.sciencedh.2024.123 CSTR Service
ORCID 0000-0000-0000-000X 0000-0002-1981-4157 ORCID API
ROR https://ror.org/[id] https://ror.org/02nr0ka47 ROR API
RAiD 10.[prefix]/[suffix] 10.80368/b1adfb3a RAiD Service

Comments System

Hierarchical commenting with threading and moderation:

PublicationComments Model

class PublicationComments(db.Model):
    __tablename__ = 'publication_comments'

    id = Column(Integer, primary_key=True)
    publication_id = Column(Integer, ForeignKey('publications.id'), nullable=False, index=True)
    user_id = Column(Integer, ForeignKey('user_accounts.user_id'), nullable=False, index=True)

    # Hierarchical Structure
    parent_comment_id = Column(Integer, ForeignKey('publication_comments.id'), nullable=True, index=True)

    # Comment Content
    comment_text = Column(Text, nullable=False)
    comment_type = Column(String(50), default='general')  # general, review, question, suggestion

    # Status Management
    status = Column(String(20), default='active')         # active, edited, deleted, flagged
    is_edited = Column(Boolean, default=False)
    edit_count = Column(Integer, default=0)
    likes_count = Column(Integer, default=0)

    # Timestamps
    created_at = Column(DateTime, default=datetime.utcnow, nullable=False)
    updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow)

    # Relationships
    publication = relationship('Publications', backref=backref('comments', lazy='dynamic'))
    user = relationship('UserAccount', backref=backref('comments', lazy='dynamic'))
    parent_comment = relationship('PublicationComments', remote_side=[id], backref='replies')

Audit Trail

Comprehensive change tracking for compliance and history:

PublicationAuditTrail Model

class PublicationAuditTrail(db.Model):
    __tablename__ = 'publication_audit_trail'

    id = Column(Integer, primary_key=True)
    publication_id = Column(Integer, ForeignKey('publications.id'), nullable=False, index=True)
    user_id = Column(Integer, ForeignKey('user_accounts.user_id'), nullable=False, index=True)

    # Action Tracking
    action = Column(String(50), nullable=False)           # 'CREATE', 'UPDATE', 'DELETE'
    field_name = Column(String(100))                      # Field changed
    old_value = Column(Text)                              # Previous value (JSON)
    new_value = Column(Text)                              # New value (JSON)

    # Context
    timestamp = Column(DateTime, nullable=False, default=datetime.utcnow, index=True)
    ip_address = Column(String(45))                       # IPv4/IPv6
    user_agent = Column(Text)                             # Browser/client info

    # Relationships
    publication = relationship('Publications', backref='audit_entries')
    user = relationship('UserAccount', backref='audit_actions')

Draft Management

Session persistence for multi-step workflow:

PublicationDrafts Model

class PublicationDrafts(db.Model):
    __tablename__ = 'publication_drafts'

    id = Column(Integer, primary_key=True)
    email = Column(String(100), nullable=False, unique=True, index=True)  # One draft per user
    form_data = Column(JSON, nullable=False)              # Complete form state as JSON

    # Timestamps
    created_at = Column(DateTime, default=datetime.utcnow)
    updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow)

Reference Tables (Controlled Vocabularies)

ResourceTypes

  • Indigenous Knowledge
  • Patent
  • Cultural Heritage
  • Project
  • Funder
  • DMP (Data Management Plan)

PublicationTypes

  • Article
  • Book Chapter
  • Proceeding
  • Monograph
  • Preprint
  • Edited Book
  • Seminar
  • Research Chapter
  • Review Article
  • Book Review
  • Conference Abstract
  • Letter to Editor
  • Editorial
  • Other Book Content
  • Correction Erratum

FunderTypes

  • Government
  • Foundation
  • Corporate
  • Academic Institution
  • International Organization
  • Non-profit

IdentifierTypes

  • APA Handle ID
  • Crossref
  • Datacite
  • CSTR
  • DOI
  • ARK Keys
  • ArXiv ID
  • Handle ID
  • Hand iD
  • dPID
  • ORCID
  • ISNI
  • Ringgold

User Account Management

Comprehensive user profile with social authentication:

UserAccount Model (Key Fields)

class UserAccount(db.Model):
    __tablename__ = "user_accounts"

    user_id = Column(Integer, primary_key=True)

    # Authentication
    email = Column(String(100), unique=True, nullable=False)
    password = Column(String(255), nullable=True)         # For local auth
    social_id = Column(String(100), nullable=True)        # Social auth ID
    type = Column(String(20), nullable=False)             # google, orcid, github

    # Profile Information
    user_name = Column(String(50), nullable=False)
    full_name = Column(String(100), nullable=False)
    avator = Column(String(255), nullable=True)

    # Professional Information
    affiliation = Column(String(100), nullable=True)
    role = Column(String(50), nullable=True)
    orcid_id = Column(String(50), nullable=True)
    ror_id = Column(String(50), nullable=True)

    # Location
    country = Column(String(50), nullable=True)
    city = Column(String(50), nullable=True)
    location = Column(String(100), nullable=True)

    # Social Profiles
    linkedin_profile_link = Column(String(255), nullable=True)
    facebook_profile_link = Column(String(255), nullable=True)
    x_profile_link = Column(String(255), nullable=True)
    instagram_profile_link = Column(String(255), nullable=True)
    github_profile_link = Column(String(255), nullable=True)

    # Timestamps
    timestamp = Column(DateTime, default=datetime.utcnow)
    date_joined = Column(DateTime, default=datetime.utcnow)

    # Relationships
    publications = relationship('Publications', cascade="all, delete-orphan")

Database Constraints & Indexes

Constraint Type Tables Purpose
Primary Keys All tables Unique record identification
Foreign Keys All relationship fields Referential integrity
Unique Constraints email (user_accounts), doi (publications) Prevent duplicates
Not Null Required fields Data completeness
Indexes Foreign keys, timestamp fields Query performance
Cascading Delete Publication relationships Maintain integrity on deletion

Database Migrations

Alembic-based migration management:

Migration Commands

# Initialize migrations
python run.py db init

# Create a new migration
python run.py db migrate -m "Description of changes"

# Apply migrations
python run.py db upgrade

# Rollback migration
python run.py db downgrade

# View migration history
python run.py db history

# Create database and seed data
python manage.py create-db
python manage.py seed-db