Skip to content
Merging Notion Databases: How I Built a Database Fusion Tool
tech

Merging Notion Databases: How I Built a Database Fusion Tool

NB
Nagesh Bansal
March 11, 202515 min read
notion
database
api
integration
python

Have you ever needed to combine multiple Notion databases but found yourself copying properties and data manually? Learn how I built a Python-based tool that automatically merges Notion databases, handling property conflicts, preserving data integrity, and even copying page content blocks—turning hours of tedious work into a simple automated process.

Merging Notion Databases: How I Built a Database Fusion Tool

We've all been there. You've created multiple Notion databases over time, each with its own structure and purpose. Then one day, you realize these separate islands of information would be much more powerful if combined. But merging Notion databases manually is tedious, error-prone, and frankly, a waste of time.

After several late-night sessions trying to copy properties and migrating data by hand, I decided to build a proper solution: a Notion Database Merger that could handle the complexity automatically.

In this post, I'll walk through how I built this tool using Python, FastAPI, and the Notion API. You'll learn about the challenges of database merging, how to handle property conflicts, and how to preserve rich content when migrating data.

The Database Fragmentation Problem

My personal knowledge management system in Notion had evolved organically over time. I had:

  • An "Ideas" database from my early Notion days
  • A more structured "Validated Ideas" database I created later
  • A "Quick Captures" database I used on mobile
  • Several specialized idea collections for different projects

Each database had slightly different properties, formats, and content types. Merging them manually would require:

  1. Creating a new database with all combined properties
  2. Carefully mapping fields between databases
  3. Copying each item one by one
  4. Manually transferring all content blocks for each page
  5. Handling conflicts for properties with the same name but different types

Just thinking about doing this for hundreds of pages made me want to close my laptop and go for a very long walk.

The Automated Solution

Instead, I built a Notion Database Merger as part of my Research Notes Processor system. The high-level architecture looks like this:

┌─────────────────┐     ┌────────────────┐     ┌───────────────┐
│  Source DB #1   │     │  Source DB #2  │     │  Source DB #3 │
└────────┬────────┘     └────────┬───────┘     └───────┬───────┘
         │                       │                     │
         ▼                       ▼                     ▼
┌────────────────────────────────────────────────────────────┐
│                 Database Merger Processor                  │
├────────────────────────────────────────────────────────────┤
│  ┌─────────────┐    ┌──────────────┐    ┌──────────────┐   │
│  │ Schema      │    │ Property     │    │ Block        │   │
│  │ Analyzer    │    │ Transformer  │    │ Copier       │   │
│  └─────────────┘    └──────────────┘    └──────────────┘   │
└────────────────────────────┬───────────────────────────────┘
                             │
                             ▼
                   ┌───────────────────┐
                   │   Merged Database │
                   └───────────────────┘

Let's dive into how each component works.

Building Blocks: The Notion Database Merger

I built this tool as a processor class in my FastAPI application. Here's the core structure:

class NotionDatabaseMerger(BaseProcessor):
    def __init__(self, notion_client: Client):
        self.notion = notion_client
        super().__init__()

    async def merge_databases(
        self,
        db_ids: List[str],
        target_title: str,
        parent_page_id: str
    ) -> str:
        """Main method to merge databases"""
        merged_db_id, property_sources = await self.create_merged_database(
            source_db_ids=db_ids,
            title=target_title,
            parent_page_id=parent_page_id
        )

        for idx, db_id in enumerate(db_ids):
            await self.copy_data(
                source_db_id=db_id,
                target_db_id=merged_db_id,
                db_index=idx,
                property_sources=property_sources
            )

        return merged_db_id

This gives us a simple API: pass in database IDs, specify where to create the merged database, and the processor handles the rest.

The Schema Analyzer: Understanding Database Structure

The first challenge was to understand the schema of each source database. Notion's API represents database properties as JSON objects, each with a specific type (text, select, multi-select, etc.).

Here's how I retrieve and analyze database schemas:

async def get_database_schema(self, database_id: str) -> Dict[str, Dict]:
    """Retrieve the schema (properties) of a Notion database"""
    database = await self.notion.databases.retrieve(database_id=database_id)
    return database.get('properties', {})

Merging Schemas: The Art of Conflict Resolution

Perhaps the trickiest part was merging database schemas. What happens when two databases have a "Status" property, but one is a select and one is multi-select? Or when they have different select options?

My approach was to:

  1. Preserve all unique properties
  2. Handle conflicts by creating uniquely named properties
  3. For select/multi-select, merge all options from all databases

Here's the schema merging logic:

async def merge_schemas(self, schemas: List[Dict[str, Dict]]) -> tuple[Dict[str, Dict], Dict[str, List[int]]]:
    """Merge multiple database schemas, preserving all unique properties"""
    merged_schema = {}
    property_sources = {}  # Track which DB each property came from

    for idx, schema in enumerate(schemas):
        for prop_name, prop_config in schema.items():
            if prop_name not in merged_schema:
                # New property - add it directly
                merged_schema[prop_name] = prop_config
                property_sources[prop_name] = [idx]
            else:
                # Existing property - check if configuration matches
                if merged_schema[prop_name]['type'] == prop_config['type']:
                    # Same type - record that this property exists in multiple DBs
                    property_sources[prop_name].append(idx)
                else:
                    # Different type - create new property with DB identifier
                    new_name = f"{prop_name}_db{idx + 1}"
                    merged_schema[new_name] = prop_config
                    property_sources[new_name] = [idx]

    return merged_schema, property_sources

Select and Multi-Select: The Option Synchronization Challenge

A particularly tricky issue was handling select and multi-select properties. For these to work properly, the target database needs to have all possible options defined before you can assign them to pages.

My solution was to collect all select options before copying data:

async def merge_select_options(self, schemas: List[Dict[str, Dict]]) -> Dict[str, List[Dict[str, str]]]:
    """Merge select options from all source databases"""
    merged_options = {}

    for schema in schemas:
        for prop_name, prop_config in schema.items():
            if prop_config['type'] in ['select', 'multi_select']:
                if prop_name not in merged_options:
                    merged_options[prop_name] = []

                # Get options from this schema
                options = prop_config.get('select', {}).get('options', [])

                # Add new options
                existing_names = {opt['name'] for opt in merged_options[prop_name]}
                for option in options:
                    if option['name'] not in existing_names:
                        merged_options[prop_name].append({'name': option['name']})
                        existing_names.add(option['name'])

    return merged_options

Copying Data: The Property Transformation Challenge

With our schema ready, we need to copy actual data. Each property type requires special handling, especially for select and multi-select values:

async def copy_data(self, source_db_id: str, target_db_id: str, db_index: int, property_sources: Dict[str, List[int]]) -> None:
    """Copy data from source database to target database"""
    pages = []
    start_cursor = None
    
    # Get target database schema to validate properties
    target_schema = await self.get_database_schema(target_db_id)
    valid_properties = set(target_schema.keys())
    
    try:
        # Query all pages from source database
        while True:
            response = await self.notion.databases.query(
                database_id=source_db_id,
                start_cursor=start_cursor
            )
            pages.extend(response["results"])
            if not response.get("has_more"):
                break
            start_cursor = response.get("next_cursor")

        # Copy each page
        for page in pages:
            try:
                transformed_properties = {}
                
                # Process each property
                for prop_name, prop_value in page["properties"].items():
                    # Skip if property doesn't exist in target database
                    if prop_name not in valid_properties:
                        continue
                        
                    prop_type = prop_value.get('type')
                    if not prop_type:
                        continue

                    # Handle each property type
                    if prop_type == 'title':
                        title_content = prop_value.get('title', [])
                        if title_content:
                            transformed_properties[prop_name] = {
                                'title': [{'text': {'content': item.get('plain_text', '')}} 
                                        for item in title_content]
                            }
                    
                    elif prop_type == 'rich_text':
                        rich_text = prop_value.get('rich_text', [])
                        if rich_text:
                            transformed_properties[prop_name] = {
                                'rich_text': [{
                                    'type': 'text',
                                    'text': {'content': item.get('plain_text', '')}
                                } for item in rich_text]
                            }
                    
                    elif prop_type == 'multi_select':
                        multi_select = prop_value.get('multi_select', [])
                        if multi_select:
                            transformed_properties[prop_name] = {
                                'multi_select': [{'name': item['name']} for item in multi_select]
                            }
                    
                    elif prop_type == 'select':
                        select_value = prop_value.get('select')
                        if select_value and select_value.get('name'):
                            transformed_properties[prop_name] = {
                                'select': {'name': select_value['name']}
                            }
                    
                    elif prop_type == 'url':
                        url = prop_value.get('url')
                        if url:
                            transformed_properties[prop_name] = {'url': url}
                    
                    elif prop_type == 'date':
                        date_value = prop_value.get('date')
                        if date_value:
                            transformed_properties[prop_name] = {'date': date_value}
                    
                    elif prop_type == 'files':
                        files = prop_value.get('files', [])
                        transformed_properties[prop_name] = {'files': files}
                    
                    elif prop_type == 'relation':
                        relation = prop_value.get('relation', [])
                        transformed_properties[prop_name] = {'relation': relation}

                if transformed_properties:
                    # Create the new page
                    new_page = await self.notion.pages.create(
                        parent={"database_id": target_db_id},
                        properties=transformed_properties
                    )
                    
                    # Copy blocks from source page to new page
                    source_blocks = await self.get_blocks(page["id"])
                    if source_blocks:
                        await self.copy_blocks(source_blocks, new_page["id"])
                        
            except Exception as e:
                print(f"Error copying page: {str(e)}")
                print(f"Property values: {transformed_properties}")
                continue
                
    except Exception as e:
        print(f"Error querying database: {str(e)}")

The Content Challenge: Copying Blocks Recursively

Notion pages aren't just properties - they contain rich content blocks. Copying these blocks preserves the actual knowledge within the pages.

This was perhaps the most challenging aspect of the entire project. Notion's block structure is complex, and the API has strict requirements for block creation.

Handling Block Types

Different block types require different handling. The most common blocks are text-based (paragraphs, headings, lists), but others include images, code blocks, embeds, and more.

My approach was to transform each block to match Notion's API requirements:

async def transform_rich_text(rich_text):
    """Transform rich text content to a valid format"""
    if not rich_text:
        return []
        
    transformed = []
    
    for text_item in rich_text:
        # Base annotations
        annotations = {
            "bold": False,
            "italic": False,
            "strikethrough": False,
            "underline": False,
            "code": False,
            "color": "default"
        }
        annotations.update(text_item.get("annotations", {}))
        
        if text_item["type"] == "text":
            transformed.append({
                "type": "text",
                "text": {
                    "content": text_item.get("plain_text", ""),
                    "link": {"url": text_item.get("href")} if text_item.get("href") else None
                },
                "annotations": annotations
            })
            
        elif text_item["type"] == "mention":
            mention_data = text_item.get("mention", {})
            if mention_data.get("type") == "link_mention":
                # Convert link mentions to text with link
                href = mention_data.get("link_mention", {}).get("href", "")
                transformed.append({
                    "type": "text",
                    "text": {
                        "content": href,
                        "link": {"url": href}
                    },
                    "annotations": annotations
                })
            else:
                # For other mention types, use plain text
                transformed.append({
                    "type": "text",
                    "text": {"content": text_item.get("plain_text", "")},
                    "annotations": annotations
                })
    
    return transformed

Transforming Block Content

Each block type needs specific handling:

async def transform_block(block):
    """Transform a block into a valid format for creation"""
    if not block or "type" not in block:
        return None
        
    block_type = block["type"]
    content = block.get(block_type, {})
    transformed_block = {"type": block_type}
    
    # Handle text-based blocks
    text_block_types = [
        "paragraph", "heading_1", "heading_2", "heading_3",
        "bulleted_list_item", "numbered_list_item", "quote"
    ]
    
    if block_type in text_block_types:
        transformed_block[block_type] = {
            "rich_text": await transform_rich_text(content.get("rich_text", [])),
            "color": content.get("color", "default")
        }
        
    elif block_type == "to_do":
        transformed_block["to_do"] = {
            "rich_text": await transform_rich_text(content.get("rich_text", [])),
            "checked": content.get("checked", False),
            "color": content.get("color", "default")
        }
        
    elif block_type == "code":
        transformed_block["code"] = {
            "rich_text": await transform_rich_text(content.get("rich_text", [])),
            "language": content.get("language", "plain text")
        }
        
    elif block_type == "image":
        # Get the URL from either external or file type
        image_url = None
        if "external" in content:
            image_url = content["external"]["url"]
        elif "file" in content:
            image_url = content["file"]["url"]
            
        if image_url:
            # Always treat as external when copying
            transformed_block["image"] = {
                "type": "external",
                "external": {"url": image_url}
            }
            
    elif block_type == "divider":
        transformed_block["divider"] = {}
        
    elif block_type == "bookmark":
        transformed_block["bookmark"] = {
            "url": content.get("url", "")
        }
        
    return transformed_block

The Block Copying Process

The complete block copying process handles recursive copying of nested blocks:

async def copy_blocks(self, source_blocks: List[Dict], target_page_id: str):
    """Copy blocks from source to target page with proper transformation"""
    if not source_blocks:
        return
        
    for block in source_blocks:
        try:
            # Transform the block
            transformed_block = await transform_block(block)
            
            # Skip if transformation failed
            if transformed_block is None:
                print(f"Skipping invalid block: {block}")
                continue
                
            # Handle empty paragraph blocks
            if (transformed_block["type"] == "paragraph" and 
                not transformed_block["paragraph"]["rich_text"]):
                transformed_block["paragraph"]["rich_text"] = [{
                    "type": "text",
                    "text": {"content": ""},
                    "annotations": {
                        "bold": False,
                        "italic": False,
                        "strikethrough": False,
                        "underline": False,
                        "code": False,
                        "color": "default"
                    }
                }]
            
            # Append the transformed block
            response = await self.notion.blocks.children.append(
                block_id=target_page_id,
                children=[transformed_block]
            )
            
            # Handle children if they exist
            if block.get("has_children", False):
                new_block_id = response["results"][0]["id"]
                child_blocks = await self.get_blocks(block["id"])
                await self.copy_blocks(child_blocks, new_block_id)
                
        except Exception as e:
            print(f"Error copying block: {block}")
            print(f"Error details: {str(e)}")
            continue

Lessons Learned: The Quirks of Notion's API

Building this tool taught me several important lessons about working with Notion's API:

  1. Property Types Matter: You can't simply copy property configurations - each type needs specific handling
  2. Select Options Must Pre-exist: You must create select options before assigning them to pages
  3. Block Structure Is Complex: Block content has a nested structure that must be carefully transformed
  4. Rich Text Transformation: Rich text elements like mentions and links require special handling
  5. Image Handling: When copying images, they must be treated as external URLs regardless of their original source
  6. Empty Block Handling: Empty paragraph blocks need special treatment to avoid validation errors
  7. Error Handling Is Critical: API errors for one page shouldn't stop the entire process

Another challenging aspect was handling link mentions and embeds. These are rendered as beautiful previews in Notion, but the API treats them as complex structured data.

My solution was to convert link mentions to regular text with links, preserving the URL but simplifying the structure:

# Converting link mentions to regular text with links
if mention_data.get("type") == "link_mention":
    href = mention_data.get("link_mention", {}).get("href", "")
    transformed.append({
        "type": "text",
        "text": {
            "content": href,
            "link": {"url": href}
        },
        "annotations": annotations
    })

Image Block Transformation

Final tricky aspect was handling image blocks. When copying images between pages, we need to treat all images as external URLs:

# Get the URL from either external or file type
image_url = None
if "external" in content:
    image_url = content["external"]["url"]
elif "file" in content:
    image_url = content["file"]["url"]
    
if image_url:
    # Always treat as external when copying
    transformed_block["image"] = {
        "type": "external",
        "external": {"url": image_url}
    }

Results: From Many to One

The first time I ran this on my fragmented idea databases, the results were impressive:

  • 4 source databases with 300+ pages combined into one
  • 27 unique properties properly merged and configured
  • 1,500+ content blocks correctly migrated
  • Hours saved compared to manual copying

I now had a unified view of all my ideas, properly categorized and searchable.

Handling Edge Cases and Error Recovery

No data migration is perfect. I built in several mechanisms to handle errors gracefully:

Empty Block Handling

Notion's API sometimes rejects empty blocks, so I added special handling:

# Handle empty paragraph blocks
if (transformed_block["type"] == "paragraph" and 
    not transformed_block["paragraph"]["rich_text"]):
    transformed_block["paragraph"]["rich_text"] = [{
        "type": "text",
        "text": {"content": ""},
        "annotations": {
            "bold": False,
            "italic": False,
            "strikethrough": False,
            "underline": False,
            "code": False,
            "color": "default"
        }
    }]

Error Logging and Recovery

Robust error handling ensures that failures don't stop the entire process:

try:
    # Copy operation code...
except Exception as e:
    print(f"Error copying block: {block}")
    print(f"Error details: {str(e)}")
    continue  # Continue with next block

Future Enhancements

There's always room for improvement. Here are some features I'm considering:

  1. Intelligent Property Mapping: Using AI to suggest property mappings based on names and content
  2. Selective Merging: Allowing users to choose which properties to include
  3. Bidirectional Sync: Keeping merged databases in sync over time
  4. Conflict Resolution UI: A user interface for handling property conflicts
  5. Enhanced Block Support: Better handling of complex block types like synced blocks and databases
  6. Nested Database Support: Handling databases within databases

Making the Script Even More User-Friendly

If you're interested in extending the script, here are some user-friendly features you could add:

  1. Interactive Mode: A guided CLI experience that walks users through the merge process
  2. Database Preview: Show a preview of the schemas before merging
  3. Post-Merge Statistics: Generate a report of what was merged
  4. Selective Page Copying: Allow users to specify filters for which pages to copy
  5. Property Transformation Rules: Define custom rules for transforming property values during the merge

Using the Standalone Database Merger Script

I've created a standalone script version of the Notion Database Merger that doesn't require any web framework. This makes it much easier to use for most people who just want to merge their databases quickly.

Setting Up the Script

  1. Set up a Notion integration:

  2. Share databases with your integration:

    • Go to each database you want to merge
    • Click "Share" in the top right
    • Add your integration to the share list
  3. Install required packages:

    pip install notion-client python-dotenv
  4. Create a .env file in the same directory as the script:

    NOTION_TOKEN=your_secret_notion_token
    

Running the Script

The script uses a simple command-line interface:

python notion_merger.py --sources db1_id db2_id db3_id --title "Merged Database" --parent parent_page_id

Parameters:

  • --sources: The IDs of the source databases you want to merge (space-separated)
  • --title: The title for your new merged database
  • --parent: The ID of the page where you want to create the merged database

You can find database and page IDs in Notion URLs. For example, in the URL https://www.notion.so/username/abcdef1234567890abcdef1234567890, the ID is abcdef1234567890abcdef1234567890.

Example Usage

# Merge three databases into one called "Master Ideas Database"
python notion_merger.py \
  --sources 8a12b345c678d90efgh1234ij56k789l 2a34b567c890d12efgh3456ij78k901l 3a45b678c901d23efgh4567ij89k012l \
  --title "Master Ideas Database" \
  --parent 9a89b789c678d67efgh8901ij23k456l

The script will:

  1. Create a new database with all combined properties
  2. Copy all pages with their properties from each source database
  3. Preserve all content blocks, images, and formatting
  4. Show progress logs as it works

The entire process typically takes a few minutes, depending on how many pages and blocks you're merging.

The Complete Code

I've shared the complete code for the standalone Notion Database Merger in this GitHub Gist: notion-database-merger.py

This includes all the functions discussed in this post, properly organized and documented. Feel free to use it as a starting point for your own database merger tool.

Key Features of the Standalone Script

  • No Web Framework Required: Works as a simple command-line tool
  • Comprehensive Logging: Detailed progress updates and error reporting
  • Robust Error Handling: Failures with individual pages don't stop the whole process
  • All Block Types Supported: Correctly handles text, images, embeds, and other block types
  • Proper Property Transformation: Handles all property types including selects, rich text, etc.

Conclusion

Building a Notion Database Merger taught me about both the capabilities and limitations of the Notion API. What started as a weekend project to solve my own database fragmentation problem turned into a robust tool that's now part of my larger knowledge management system.

The beauty of building custom tools is that they can precisely match your workflow. This merger doesn't just save time - it enables new ways of organizing information that weren't possible when my data was fragmented across multiple databases.