Merging Notion Databases: How I Built a Database Fusion Tool
We've all been there. You've created multiple Notion databases over time, each with its own structure and purpose. Then one day, you realize these separate islands of information would be much more powerful if combined. But merging Notion databases manually is tedious, error-prone, and frankly, a waste of time.
After several late-night sessions trying to copy properties and migrating data by hand, I decided to build a proper solution: a Notion Database Merger that could handle the complexity automatically.
In this post, I'll walk through how I built this tool using Python, FastAPI, and the Notion API. You'll learn about the challenges of database merging, how to handle property conflicts, and how to preserve rich content when migrating data.
The Database Fragmentation Problem
My personal knowledge management system in Notion had evolved organically over time. I had:
- An "Ideas" database from my early Notion days
- A more structured "Validated Ideas" database I created later
- A "Quick Captures" database I used on mobile
- Several specialized idea collections for different projects
Each database had slightly different properties, formats, and content types. Merging them manually would require:
- Creating a new database with all combined properties
- Carefully mapping fields between databases
- Copying each item one by one
- Manually transferring all content blocks for each page
- Handling conflicts for properties with the same name but different types
Just thinking about doing this for hundreds of pages made me want to close my laptop and go for a very long walk.
The Automated Solution
Instead, I built a Notion Database Merger as part of my Research Notes Processor system. The high-level architecture looks like this:
┌─────────────────┐ ┌────────────────┐ ┌───────────────┐
│ Source DB #1 │ │ Source DB #2 │ │ Source DB #3 │
└────────┬────────┘ └────────┬───────┘ └───────┬───────┘
│ │ │
▼ ▼ ▼
┌────────────────────────────────────────────────────────────┐
│ Database Merger Processor │
├────────────────────────────────────────────────────────────┤
│ ┌─────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Schema │ │ Property │ │ Block │ │
│ │ Analyzer │ │ Transformer │ │ Copier │ │
│ └─────────────┘ └──────────────┘ └──────────────┘ │
└────────────────────────────┬───────────────────────────────┘
│
▼
┌───────────────────┐
│ Merged Database │
└───────────────────┘
Let's dive into how each component works.
Building Blocks: The Notion Database Merger
I built this tool as a processor class in my FastAPI application. Here's the core structure:
class NotionDatabaseMerger(BaseProcessor): def __init__(self, notion_client: Client): self.notion = notion_client super().__init__() async def merge_databases( self, db_ids: List[str], target_title: str, parent_page_id: str ) -> str: """Main method to merge databases""" merged_db_id, property_sources = await self.create_merged_database( source_db_ids=db_ids, title=target_title, parent_page_id=parent_page_id ) for idx, db_id in enumerate(db_ids): await self.copy_data( source_db_id=db_id, target_db_id=merged_db_id, db_index=idx, property_sources=property_sources ) return merged_db_id
This gives us a simple API: pass in database IDs, specify where to create the merged database, and the processor handles the rest.
The Schema Analyzer: Understanding Database Structure
The first challenge was to understand the schema of each source database. Notion's API represents database properties as JSON objects, each with a specific type (text, select, multi-select, etc.).
Here's how I retrieve and analyze database schemas:
async def get_database_schema(self, database_id: str) -> Dict[str, Dict]: """Retrieve the schema (properties) of a Notion database""" database = await self.notion.databases.retrieve(database_id=database_id) return database.get('properties', {})
Merging Schemas: The Art of Conflict Resolution
Perhaps the trickiest part was merging database schemas. What happens when two databases have a "Status" property, but one is a select and one is multi-select? Or when they have different select options?
My approach was to:
- Preserve all unique properties
- Handle conflicts by creating uniquely named properties
- For select/multi-select, merge all options from all databases
Here's the schema merging logic:
async def merge_schemas(self, schemas: List[Dict[str, Dict]]) -> tuple[Dict[str, Dict], Dict[str, List[int]]]: """Merge multiple database schemas, preserving all unique properties""" merged_schema = {} property_sources = {} # Track which DB each property came from for idx, schema in enumerate(schemas): for prop_name, prop_config in schema.items(): if prop_name not in merged_schema: # New property - add it directly merged_schema[prop_name] = prop_config property_sources[prop_name] = [idx] else: # Existing property - check if configuration matches if merged_schema[prop_name]['type'] == prop_config['type']: # Same type - record that this property exists in multiple DBs property_sources[prop_name].append(idx) else: # Different type - create new property with DB identifier new_name = f"{prop_name}_db{idx + 1}" merged_schema[new_name] = prop_config property_sources[new_name] = [idx] return merged_schema, property_sources
Select and Multi-Select: The Option Synchronization Challenge
A particularly tricky issue was handling select and multi-select properties. For these to work properly, the target database needs to have all possible options defined before you can assign them to pages.
My solution was to collect all select options before copying data:
async def merge_select_options(self, schemas: List[Dict[str, Dict]]) -> Dict[str, List[Dict[str, str]]]: """Merge select options from all source databases""" merged_options = {} for schema in schemas: for prop_name, prop_config in schema.items(): if prop_config['type'] in ['select', 'multi_select']: if prop_name not in merged_options: merged_options[prop_name] = [] # Get options from this schema options = prop_config.get('select', {}).get('options', []) # Add new options existing_names = {opt['name'] for opt in merged_options[prop_name]} for option in options: if option['name'] not in existing_names: merged_options[prop_name].append({'name': option['name']}) existing_names.add(option['name']) return merged_options
Copying Data: The Property Transformation Challenge
With our schema ready, we need to copy actual data. Each property type requires special handling, especially for select and multi-select values:
async def copy_data(self, source_db_id: str, target_db_id: str, db_index: int, property_sources: Dict[str, List[int]]) -> None: """Copy data from source database to target database""" pages = [] start_cursor = None # Get target database schema to validate properties target_schema = await self.get_database_schema(target_db_id) valid_properties = set(target_schema.keys()) try: # Query all pages from source database while True: response = await self.notion.databases.query( database_id=source_db_id, start_cursor=start_cursor ) pages.extend(response["results"]) if not response.get("has_more"): break start_cursor = response.get("next_cursor") # Copy each page for page in pages: try: transformed_properties = {} # Process each property for prop_name, prop_value in page["properties"].items(): # Skip if property doesn't exist in target database if prop_name not in valid_properties: continue prop_type = prop_value.get('type') if not prop_type: continue # Handle each property type if prop_type == 'title': title_content = prop_value.get('title', []) if title_content: transformed_properties[prop_name] = { 'title': [{'text': {'content': item.get('plain_text', '')}} for item in title_content] } elif prop_type == 'rich_text': rich_text = prop_value.get('rich_text', []) if rich_text: transformed_properties[prop_name] = { 'rich_text': [{ 'type': 'text', 'text': {'content': item.get('plain_text', '')} } for item in rich_text] } elif prop_type == 'multi_select': multi_select = prop_value.get('multi_select', []) if multi_select: transformed_properties[prop_name] = { 'multi_select': [{'name': item['name']} for item in multi_select] } elif prop_type == 'select': select_value = prop_value.get('select') if select_value and select_value.get('name'): transformed_properties[prop_name] = { 'select': {'name': select_value['name']} } elif prop_type == 'url': url = prop_value.get('url') if url: transformed_properties[prop_name] = {'url': url} elif prop_type == 'date': date_value = prop_value.get('date') if date_value: transformed_properties[prop_name] = {'date': date_value} elif prop_type == 'files': files = prop_value.get('files', []) transformed_properties[prop_name] = {'files': files} elif prop_type == 'relation': relation = prop_value.get('relation', []) transformed_properties[prop_name] = {'relation': relation} if transformed_properties: # Create the new page new_page = await self.notion.pages.create( parent={"database_id": target_db_id}, properties=transformed_properties ) # Copy blocks from source page to new page source_blocks = await self.get_blocks(page["id"]) if source_blocks: await self.copy_blocks(source_blocks, new_page["id"]) except Exception as e: print(f"Error copying page: {str(e)}") print(f"Property values: {transformed_properties}") continue except Exception as e: print(f"Error querying database: {str(e)}")
The Content Challenge: Copying Blocks Recursively
Notion pages aren't just properties - they contain rich content blocks. Copying these blocks preserves the actual knowledge within the pages.
This was perhaps the most challenging aspect of the entire project. Notion's block structure is complex, and the API has strict requirements for block creation.
Handling Block Types
Different block types require different handling. The most common blocks are text-based (paragraphs, headings, lists), but others include images, code blocks, embeds, and more.
My approach was to transform each block to match Notion's API requirements:
async def transform_rich_text(rich_text): """Transform rich text content to a valid format""" if not rich_text: return [] transformed = [] for text_item in rich_text: # Base annotations annotations = { "bold": False, "italic": False, "strikethrough": False, "underline": False, "code": False, "color": "default" } annotations.update(text_item.get("annotations", {})) if text_item["type"] == "text": transformed.append({ "type": "text", "text": { "content": text_item.get("plain_text", ""), "link": {"url": text_item.get("href")} if text_item.get("href") else None }, "annotations": annotations }) elif text_item["type"] == "mention": mention_data = text_item.get("mention", {}) if mention_data.get("type") == "link_mention": # Convert link mentions to text with link href = mention_data.get("link_mention", {}).get("href", "") transformed.append({ "type": "text", "text": { "content": href, "link": {"url": href} }, "annotations": annotations }) else: # For other mention types, use plain text transformed.append({ "type": "text", "text": {"content": text_item.get("plain_text", "")}, "annotations": annotations }) return transformed
Transforming Block Content
Each block type needs specific handling:
async def transform_block(block): """Transform a block into a valid format for creation""" if not block or "type" not in block: return None block_type = block["type"] content = block.get(block_type, {}) transformed_block = {"type": block_type} # Handle text-based blocks text_block_types = [ "paragraph", "heading_1", "heading_2", "heading_3", "bulleted_list_item", "numbered_list_item", "quote" ] if block_type in text_block_types: transformed_block[block_type] = { "rich_text": await transform_rich_text(content.get("rich_text", [])), "color": content.get("color", "default") } elif block_type == "to_do": transformed_block["to_do"] = { "rich_text": await transform_rich_text(content.get("rich_text", [])), "checked": content.get("checked", False), "color": content.get("color", "default") } elif block_type == "code": transformed_block["code"] = { "rich_text": await transform_rich_text(content.get("rich_text", [])), "language": content.get("language", "plain text") } elif block_type == "image": # Get the URL from either external or file type image_url = None if "external" in content: image_url = content["external"]["url"] elif "file" in content: image_url = content["file"]["url"] if image_url: # Always treat as external when copying transformed_block["image"] = { "type": "external", "external": {"url": image_url} } elif block_type == "divider": transformed_block["divider"] = {} elif block_type == "bookmark": transformed_block["bookmark"] = { "url": content.get("url", "") } return transformed_block
The Block Copying Process
The complete block copying process handles recursive copying of nested blocks:
async def copy_blocks(self, source_blocks: List[Dict], target_page_id: str): """Copy blocks from source to target page with proper transformation""" if not source_blocks: return for block in source_blocks: try: # Transform the block transformed_block = await transform_block(block) # Skip if transformation failed if transformed_block is None: print(f"Skipping invalid block: {block}") continue # Handle empty paragraph blocks if (transformed_block["type"] == "paragraph" and not transformed_block["paragraph"]["rich_text"]): transformed_block["paragraph"]["rich_text"] = [{ "type": "text", "text": {"content": ""}, "annotations": { "bold": False, "italic": False, "strikethrough": False, "underline": False, "code": False, "color": "default" } }] # Append the transformed block response = await self.notion.blocks.children.append( block_id=target_page_id, children=[transformed_block] ) # Handle children if they exist if block.get("has_children", False): new_block_id = response["results"][0]["id"] child_blocks = await self.get_blocks(block["id"]) await self.copy_blocks(child_blocks, new_block_id) except Exception as e: print(f"Error copying block: {block}") print(f"Error details: {str(e)}") continue
Lessons Learned: The Quirks of Notion's API
Building this tool taught me several important lessons about working with Notion's API:
- Property Types Matter: You can't simply copy property configurations - each type needs specific handling
- Select Options Must Pre-exist: You must create select options before assigning them to pages
- Block Structure Is Complex: Block content has a nested structure that must be carefully transformed
- Rich Text Transformation: Rich text elements like mentions and links require special handling
- Image Handling: When copying images, they must be treated as external URLs regardless of their original source
- Empty Block Handling: Empty paragraph blocks need special treatment to avoid validation errors
- Error Handling Is Critical: API errors for one page shouldn't stop the entire process
Handling Link Mentions and Rich Content
Another challenging aspect was handling link mentions and embeds. These are rendered as beautiful previews in Notion, but the API treats them as complex structured data.
My solution was to convert link mentions to regular text with links, preserving the URL but simplifying the structure:
# Converting link mentions to regular text with links if mention_data.get("type") == "link_mention": href = mention_data.get("link_mention", {}).get("href", "") transformed.append({ "type": "text", "text": { "content": href, "link": {"url": href} }, "annotations": annotations })
Image Block Transformation
Final tricky aspect was handling image blocks. When copying images between pages, we need to treat all images as external URLs:
# Get the URL from either external or file type image_url = None if "external" in content: image_url = content["external"]["url"] elif "file" in content: image_url = content["file"]["url"] if image_url: # Always treat as external when copying transformed_block["image"] = { "type": "external", "external": {"url": image_url} }
Results: From Many to One
The first time I ran this on my fragmented idea databases, the results were impressive:
- 4 source databases with 300+ pages combined into one
- 27 unique properties properly merged and configured
- 1,500+ content blocks correctly migrated
- Hours saved compared to manual copying
I now had a unified view of all my ideas, properly categorized and searchable.
Handling Edge Cases and Error Recovery
No data migration is perfect. I built in several mechanisms to handle errors gracefully:
Empty Block Handling
Notion's API sometimes rejects empty blocks, so I added special handling:
# Handle empty paragraph blocks if (transformed_block["type"] == "paragraph" and not transformed_block["paragraph"]["rich_text"]): transformed_block["paragraph"]["rich_text"] = [{ "type": "text", "text": {"content": ""}, "annotations": { "bold": False, "italic": False, "strikethrough": False, "underline": False, "code": False, "color": "default" } }]
Error Logging and Recovery
Robust error handling ensures that failures don't stop the entire process:
try: # Copy operation code... except Exception as e: print(f"Error copying block: {block}") print(f"Error details: {str(e)}") continue # Continue with next block
Future Enhancements
There's always room for improvement. Here are some features I'm considering:
- Intelligent Property Mapping: Using AI to suggest property mappings based on names and content
- Selective Merging: Allowing users to choose which properties to include
- Bidirectional Sync: Keeping merged databases in sync over time
- Conflict Resolution UI: A user interface for handling property conflicts
- Enhanced Block Support: Better handling of complex block types like synced blocks and databases
- Nested Database Support: Handling databases within databases
Making the Script Even More User-Friendly
If you're interested in extending the script, here are some user-friendly features you could add:
- Interactive Mode: A guided CLI experience that walks users through the merge process
- Database Preview: Show a preview of the schemas before merging
- Post-Merge Statistics: Generate a report of what was merged
- Selective Page Copying: Allow users to specify filters for which pages to copy
- Property Transformation Rules: Define custom rules for transforming property values during the merge
Using the Standalone Database Merger Script
I've created a standalone script version of the Notion Database Merger that doesn't require any web framework. This makes it much easier to use for most people who just want to merge their databases quickly.
Setting Up the Script
-
Set up a Notion integration:
- Go to https://www.notion.so/my-integrations and create a new integration
- Give it a name like "Database Merger"
- Copy your secret token (you'll need it later)
-
Share databases with your integration:
- Go to each database you want to merge
- Click "Share" in the top right
- Add your integration to the share list
-
Install required packages:
pip install notion-client python-dotenv
-
Create a .env file in the same directory as the script:
NOTION_TOKEN=your_secret_notion_token
Running the Script
The script uses a simple command-line interface:
python notion_merger.py --sources db1_id db2_id db3_id --title "Merged Database" --parent parent_page_id
Parameters:
--sources
: The IDs of the source databases you want to merge (space-separated)--title
: The title for your new merged database--parent
: The ID of the page where you want to create the merged database
You can find database and page IDs in Notion URLs. For example, in the URL https://www.notion.so/username/abcdef1234567890abcdef1234567890
, the ID is abcdef1234567890abcdef1234567890
.
Example Usage
# Merge three databases into one called "Master Ideas Database" python notion_merger.py \ --sources 8a12b345c678d90efgh1234ij56k789l 2a34b567c890d12efgh3456ij78k901l 3a45b678c901d23efgh4567ij89k012l \ --title "Master Ideas Database" \ --parent 9a89b789c678d67efgh8901ij23k456l
The script will:
- Create a new database with all combined properties
- Copy all pages with their properties from each source database
- Preserve all content blocks, images, and formatting
- Show progress logs as it works
The entire process typically takes a few minutes, depending on how many pages and blocks you're merging.
The Complete Code
I've shared the complete code for the standalone Notion Database Merger in this GitHub Gist: notion-database-merger.py
This includes all the functions discussed in this post, properly organized and documented. Feel free to use it as a starting point for your own database merger tool.
Key Features of the Standalone Script
- No Web Framework Required: Works as a simple command-line tool
- Comprehensive Logging: Detailed progress updates and error reporting
- Robust Error Handling: Failures with individual pages don't stop the whole process
- All Block Types Supported: Correctly handles text, images, embeds, and other block types
- Proper Property Transformation: Handles all property types including selects, rich text, etc.
Conclusion
Building a Notion Database Merger taught me about both the capabilities and limitations of the Notion API. What started as a weekend project to solve my own database fragmentation problem turned into a robust tool that's now part of my larger knowledge management system.
The beauty of building custom tools is that they can precisely match your workflow. This merger doesn't just save time - it enables new ways of organizing information that weren't possible when my data was fragmented across multiple databases.