System Architecture

Comprehensive overview of DOCiD™'s architectural components, design patterns, and data models

Architecture Overview

The DOCiD™ (Digital Object Container Identifier) system is a platform for managing publication and document identifiers. Its architecture centers on persistent identifier (PID) services, metadata management, and scholarly communication tools, and it is designed to support cultural heritage centres, museums, patent offices, funders, facilities, and publishing infrastructures.

Core Architecture Components

Application Layer

  • Application factory pattern with comprehensive extension integration
  • Blueprint-based modular architecture with 20+ service-specific routes
  • RESTful API design with JSON payload communication
  • Swagger/OpenAPI documentation integration via Flasgger

Service Layer

  • External service integration abstractions
  • Background processing for batch operations
  • Authentication and token management
  • Error handling and fallback mechanisms

Data Layer

  • SQLAlchemy ORM with PostgreSQL backend
  • Complex relational model with cascading relationships
  • Migration management via Alembic
  • Reference data and controlled vocabularies

Key Architectural Patterns

Identifier Management Centralization

The system implements a centralized approach to managing multiple persistent identifier types including DOIs, DOCiD™s, Handles, and other PIDs. This ensures consistency across all identifier operations and provides a unified interface for external systems.

Key Features:

  • Unified identifier assignment and management
  • Support for multiple identifier schemes simultaneously
  • Automatic identifier validation and format checking
  • Cross-reference resolution between different identifier types
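As a rough illustration of the unified validation interface described above, the sketch below registers one validator per identifier scheme and dispatches through a single function. The names `VALIDATORS` and `validate_identifier` are hypothetical, and the DOI and Handle patterns are simplified assumptions rather than DOCiD™'s actual rules. The DOCiD™ format itself is not specified here, so it is left out.

```python
import re
from typing import Callable, Dict

# Simplified patterns (assumptions, not the rules DOCiD actually applies).
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
HANDLE_PATTERN = re.compile(r"^\d+(\.\d+)*/\S+$")


def is_valid_doi(value: str) -> bool:
    """Check for the '10.<registrant>/<suffix>' shape of a DOI."""
    return DOI_PATTERN.fullmatch(value) is not None


def is_valid_handle(value: str) -> bool:
    """Check for the '<prefix>/<suffix>' shape of a Handle."""
    return HANDLE_PATTERN.fullmatch(value) is not None


# Hypothetical registry: one entry per supported identifier scheme.
VALIDATORS: Dict[str, Callable[[str], bool]] = {
    "doi": is_valid_doi,
    "handle": is_valid_handle,
}


def validate_identifier(scheme: str, value: str) -> bool:
    """Validate `value` with the validator registered for `scheme`."""
    try:
        validator = VALIDATORS[scheme.lower()]
    except KeyError:
        raise ValueError(f"Unsupported identifier scheme: {scheme!r}") from None
    return validator(value.strip())
```

Under this design, supporting a further scheme means adding one entry to the registry, and callers keep using the same function.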

Hierarchical Data Structures

The comments system stores comments as a tree, which allows threaded discussions with parent-child relationships, status management, and moderation capabilities.

Implementation Details:

  • Self-referential database relationships
  • Recursive comment retrieval algorithms
  • Status-based visibility management
  • Hierarchical permission inheritance

Service Integration Abstraction

External service connectors are abstracted into dedicated service classes, providing consistent interfaces for interacting with Crossref, CSTR, CORDRA, ROR, ORCID, and Local Contexts APIs.

Service Patterns:

  • Standardized authentication token management
  • Consistent error handling and retry logic
  • Unified data transformation interfaces
  • Configurable service endpoints and credentials
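The service patterns listed above can be sketched as a small base class that caches an authentication token until it expires and retries failed calls with exponential backoff. Everything in this sketch is hypothetical: `BaseServiceClient`, `ServiceError`, the `fetch_token` callback, and the choice of `ConnectionError` as the retryable failure are assumptions for illustration, not the actual DOCiD™ service classes.

```python
import time
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")


class ServiceError(Exception):
    """Raised when an external service call fails after all retries."""


class BaseServiceClient:
    """Hypothetical base class for connectors such as Crossref or ROR."""

    def __init__(
        self,
        base_url: str,
        fetch_token: Callable[[], Tuple[str, float]],
        max_retries: int = 3,
        backoff_seconds: float = 0.5,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.base_url = base_url
        self.max_retries = max_retries
        self.backoff_seconds = backoff_seconds
        self._fetch_token = fetch_token  # returns (token, lifetime in seconds)
        self._clock = clock
        self._sleep = sleep
        self._token: Optional[str] = None
        self._token_expires_at = 0.0

    def get_token(self) -> str:
        """Return the cached token, fetching a new one once it has expired."""
        if self._token is None or self._clock() >= self._token_expires_at:
            token, lifetime = self._fetch_token()
            self._token = token
            self._token_expires_at = self._clock() + lifetime
        return self._token

    def call_with_retry(self, operation: Callable[[str], T]) -> T:
        """Run operation(token), retrying with exponential backoff."""
        last_error: Optional[Exception] = None
        for attempt in range(self.max_retries):
            try:
                return operation(self.get_token())
            except ConnectionError as exc:
                last_error = exc
                # Wait 1x, 2x, 4x ... the base delay between attempts.
                if attempt < self.max_retries - 1:
                    self._sleep(self.backoff_seconds * 2 ** attempt)
        raise ServiceError(
            f"{self.base_url} failed after {self.max_retries} attempts"
        ) from last_error
```

Passing `clock` and `sleep` as parameters keeps the class testable without real delays. A concrete connector would subclass it and supply its own endpoint and credentials.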

Database Architecture

The system uses PostgreSQL as its primary database and relies on features such as JSON fields, full-text search, and complex indexing strategies.

Core Data Models

Publications Model
Core Entity Table: publications

Central entity managing publication metadata with DOCiD™/DOI assignment and cascading relationships to files, creators, organizations, funders, and projects.

Publications Model Schema
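from datetime import datetime

from sqlalchemy import Column, DateTime, ForeignKey, Integer, String, Text
from sqlalchemy.orm import relationship

# `db` is assumed to be the application's Flask-SQLAlchemy instance.


class Publications(db.Model):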
    __tablename__ = 'publications'

    id = Column(Integer, primary_key=True, autoincrement=True)
    user_id = Column(Integer, ForeignKey('user_accounts.user_id'), nullable=False, index=True)
    document_docid = Column(String(255), nullable=False)
    document_title = Column(String(255), nullable=False)
    document_description = Column(Text)
    resource_type_id = Column(Integer, ForeignKey('resource_types.id'), nullable=False)
    doi = Column(String(50), nullable=False)
    published = Column(DateTime, default=datetime.utcnow)

    # Relationships
    user_account = relationship('UserAccount', back_populates='publications')
    # Child rows are removed together with their publication ("all, delete-orphan")
    publication_creators = relationship('PublicationCreators', back_populates='publication', cascade="all, delete-orphan")
    publication_organizations = relationship('PublicationOrganization', cascade="all, delete-orphan")
    publication_funders = relationship('PublicationFunders', cascade="all, delete-orphan")

UserAccount Model
User Management Table: user_accounts

Comprehensive user management with social authentication support and rich profile information including ORCID and ROR ID integration.

UserAccount Model Schema
class UserAccount(db.Model):
    __tablename__ = "user_accounts"

    user_id = db.Column(db.Integer, primary_key=True, autoincrement=True)
    user_name = db.Column(db.String(50), nullable=False)
    full_name = db.Column(db.String(100), nullable=False)
    email = db.Column(db.String(100), unique=True, nullable=False)
    orcid_id = db.Column(db.String(50), nullable=True)
    ror_id = db.Column(db.String(50), nullable=True)
    affiliation = db.Column(db.String(100), nullable=True)
    role = db.Column(db.String(50), nullable=True)
    country = db.Column(db.String(50), nullable=True)
    
    # Social media profile links
    linkedin_profile_link = db.Column(db.String(255), nullable=True)
    github_profile_link = db.Column(db.String(255), nullable=True)

    # Relationships
    publications = db.relationship('Publications', back_populates='user_account')

PublicationCreators Model
Creator Management Table: publication_creators

Author/creator management with role assignment and multiple identifier scheme support.

PublicationCreators Model Schema
class PublicationCreators(db.Model):
    __tablename__ = 'publication_creators'

    id = Column(Integer, primary_key=True, autoincrement=True)
    publication_id = Column(Integer, ForeignKey('publications.id'), nullable=False, index=True)
    family_name = Column(String(255), nullable=False)
    given_name = Column(String(255))
    identifier = Column(String(500))  # Full resolvable URL (e.g., https://orcid.org/...)
    identifier_type = Column(String(50))  # Type (e.g., 'orcid', 'isni', 'viaf')
    role_id = Column(String(255), nullable=False)

    # Relationships
    publication = relationship('Publications', back_populates='publication_creators')

Supported Creator Identifiers:

  • ORCID: https://orcid.org/0000-0000-0000-0000
  • ISNI: https://isni.org/isni/0000000000000000
  • VIAF: http://viaf.org/viaf/123456789
  • ResearcherID: http://www.researcherid.com/rid/A-1234-2012
  • Scopus ID: Scopus author identifier URLs
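Because the `identifier` column stores a full resolvable URL, a creator's ORCID can be checked before it is saved. The sketch below verifies the URL shape and the ORCID check digit (ISO 7064 MOD 11-2). The function names are hypothetical, and nothing here states whether DOCiD™ performs this check.

```python
import re

# Accepts only the full HTTPS form, e.g. https://orcid.org/0000-0002-1825-0097
ORCID_URL = re.compile(r"^https://orcid\.org/(\d{4}-\d{4}-\d{4}-\d{3}[\dX])$")


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for 15 base digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid_url(identifier: str) -> bool:
    """Return True if `identifier` is a well-formed ORCID URL with a valid checksum."""
    match = ORCID_URL.fullmatch(identifier)
    if match is None:
        return False
    compact = match.group(1).replace("-", "")
    return orcid_check_digit(compact[:15]) == compact[15]
```

For example, `is_valid_orcid_url("https://orcid.org/0000-0002-1825-0097")` returns `True`. The all-zero placeholder in the list above only illustrates the format and would not pass the checksum.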

PublicationComments Model
Comments System Table: publication_comments

Hierarchical comment structure with parent-child relationships, status management, and engagement tracking.

PublicationComments Model Schema
class PublicationComments(db.Model):
    __tablename__ = 'publication_comments'
    
    id = db.Column(db.Integer, primary_key=True, autoincrement=True)
    publication_id = db.Column(db.Integer, db.ForeignKey('publications.id'), nullable=False)
    user_id = db.Column(db.Integer, db.ForeignKey('user_accounts.user_id'), nullable=False)
    parent_comment_id = db.Column(db.Integer, db.ForeignKey('publication_comments.id'), nullable=True)
    comment_text = db.Column(db.Text, nullable=False)
    comment_type = db.Column(db.String(50), default='general')
    status = db.Column(db.String(20), default='active')
    likes_count = db.Column(db.Integer, default=0)
    created_at = db.Column(db.DateTime, default=datetime.utcnow)
    
    # Self-referential relationship for hierarchical comments
    parent_comment = db.relationship('PublicationComments', remote_side=[id], backref='replies')

Comment Features:

  • Hierarchical Structure: Parent-child relationships for threaded discussions
  • Comment Types: general, review, question, suggestion
  • Status Management: active, edited, deleted, flagged states
  • Engagement: Like counts and reply threading
  • Moderation: User vs admin permissions for editing/deletion
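One way to turn flat `publication_comments` rows into threads is to index them by `id` and attach each row to its parent. The sketch below shows this under stated assumptions: `CommentNode`, `build_comment_tree`, and `render` are hypothetical names, the fields mirror the model above, and replacing deleted comments with a `[deleted]` placeholder is just one possible visibility rule.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class CommentNode:
    """In-memory stand-in for a publication_comments row."""
    id: int
    parent_comment_id: Optional[int]
    comment_text: str
    status: str = "active"
    replies: List["CommentNode"] = field(default_factory=list)


def build_comment_tree(rows: Iterable[CommentNode]) -> List[CommentNode]:
    """Link flat rows into top-level comments with nested replies."""
    nodes: Dict[int, CommentNode] = {row.id: row for row in rows}
    roots: List[CommentNode] = []
    for node in nodes.values():
        parent = None
        if node.parent_comment_id is not None:
            parent = nodes.get(node.parent_comment_id)
        if parent is None:
            # Top-level comment, or the parent is missing from this batch.
            roots.append(node)
        else:
            parent.replies.append(node)
    return roots


def render(nodes: List[CommentNode], depth: int = 0) -> List[str]:
    """Walk the tree depth-first, hiding the text of deleted comments."""
    lines: List[str] = []
    for node in nodes:
        text = "[deleted]" if node.status == "deleted" else node.comment_text
        lines.append("  " * depth + text)
        lines.extend(render(node.replies, depth + 1))
    return lines
```

Keeping a deleted comment as a placeholder instead of dropping it preserves the replies beneath it, which is why the status is applied at render time and not while building the tree.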

Authentication & Authorization

JWT-Based Authentication

  • JWT-Extended implementation
  • Access and refresh token management
  • Role-based access control
  • Social authentication integration

Social Authentication Providers

  • Google OAuth 2.0
  • ORCID OAuth 2.0
  • GitHub OAuth 2.0

Permission Management

  • User role hierarchy (user, admin, superadmin)
  • Resource-based permissions
  • Comment moderation permissions
  • Publication ownership controls
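The role hierarchy and ownership rules above could be expressed as a ranking plus a pair of checks. This is a minimal sketch: `ROLE_LEVELS`, `has_role`, and `can_modify_comment` are hypothetical names, and the rule that owners and any admin-or-higher role may modify a comment is an assumption drawn from the moderation bullet, not a confirmed policy.

```python
from typing import Optional

# Higher number = more privileges (assumed ordering of the documented roles).
ROLE_LEVELS = {"user": 0, "admin": 1, "superadmin": 2}


def has_role(user_role: Optional[str], required_role: str) -> bool:
    """True if `user_role` ranks at or above `required_role`.

    Missing or unknown roles rank below every known role.
    """
    return ROLE_LEVELS.get(user_role, -1) >= ROLE_LEVELS[required_role]


def can_modify_comment(actor_id: int, actor_role: Optional[str], owner_id: int) -> bool:
    """Owners may edit their own comments; admins and above may moderate any."""
    return actor_id == owner_id or has_role(actor_role, "admin")
```

Since `role` is nullable on `user_accounts`, the lookup treats a missing role as the lowest level instead of raising an error.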

Database Relationships

Relationship Type | Primary Entity      | Related Entity          | Description
------------------|---------------------|-------------------------|--------------------------------------------
One-to-Many       | UserAccount         | Publications            | User ownership of publications
One-to-Many       | Publications        | PublicationComments     | Publication discussions
One-to-Many       | Publications        | PublicationCreators     | Multiple authors per publication
One-to-Many       | Publications        | PublicationOrganization | Multiple affiliations
One-to-Many       | Publications        | PublicationFunders      | Multiple funding sources
Self-Referential  | PublicationComments | PublicationComments     | Parent-child hierarchy for threaded comments

Reference Data Models

DOCiD™ implements comprehensive controlled vocabularies for consistent metadata classification.

Model                      | Table             | Purpose                              | Key Fields
---------------------------|-------------------|--------------------------------------|--------------------------
ResourceTypes              | resource_types    | Publication resource classifications | id, resource_type
CreatorsRoles              | creators_roles    | Creator role taxonomies              | id, role_id, role_name
FunderTypes                | funder_types      | Funding organization categories      | id, funder_type_name
PublicationTypes           | publication_types | Document type classifications        | id, publication_type_name
PublicationIdentifierTypes | identifier_types  | Identifier scheme registry           | id, identifier_type_name