
Setting up Meilisearch on AWS for Python Projects

Nagesh Bansal
March 17, 2025 · 8 min read
aws
meilisearch
python
search
typesense

A practical guide to deploying Meilisearch on AWS EC2 for Python applications, with insights from my experience implementing search for Hyperly and StarJobs, including a comparison with Typesense.


When I started building StarJobs.dev, implementing robust search functionality was one of the core requirements. Having already successfully integrated Meilisearch with Hyperly, it was a relatively straightforward decision to stick with what I knew worked well.

That said, I did my due diligence and briefly revisited the comparison between Meilisearch and Typesense to ensure I wasn't missing out on any major improvements or features that might benefit StarJobs specifically. Let me share what I found and why Meilisearch remained the right choice for my new project.

Meilisearch vs. Typesense: The Showdown

Both search engines are fantastic options, but they have different strengths. Here's what I discovered in my comparison:

| Feature | My Experience |
| --- | --- |
| Setup | Meilisearch was slightly easier to get running with minimal configuration, which was a big plus when I just wanted to get things moving. |
| Performance | For my dataset of around 5 million jobs, both performed well, but Meilisearch seemed to handle complex queries with less tuning. |
| Multi-language | This was a deciding factor for me: Meilisearch's multi-language support was more robust out of the box, which I needed for international job listings. |
| Relevance | Meilisearch's default relevance algorithm just worked better for my specific use case without much tweaking. |

Typesense has its advantages too - especially its built-in dashboard and potentially better resource efficiency for very large datasets. But for my specific needs with StarJobs, Meilisearch was the winner.

Now, let's dive into how I set up Meilisearch on AWS for my Python application.

Setting Up Meilisearch on AWS EC2

Step 1: Launching an EC2 Instance

First, I needed a server to host Meilisearch. I went with an EC2 instance on AWS:

  1. I logged into AWS Console and navigated to EC2
  2. For StarJobs, I chose a t3a.medium instance with Ubuntu 22.04 (more RAM is helpful for search engines)
  3. Added 30GB of SSD storage (search indexes can grow quickly)
  4. Created a security group allowing SSH (port 22) and Meilisearch (port 7700) - there's a scripted version of this just below

When I did this for Hyperly, I initially chose a smaller instance (t2.micro) which worked fine for testing but needed upgrading once our document count increased. Learn from my mistake - start with something reasonable!
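
If you'd rather script that security group than click through the console, here's a minimal boto3 sketch of the same rules (the region, VPC ID, and open CIDR ranges are placeholders; tighten them for your own account):

# create_sg.py - boto3 sketch of the security group from step 4
# (region, VpcId, and CIDRs are placeholders; ideally restrict SSH to your own IP)
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

sg = ec2.create_security_group(
    GroupName="meilisearch-sg",
    Description="SSH and Meilisearch access",
    VpcId="vpc-xxxxxxxx",
)

ec2.authorize_security_group_ingress(
    GroupId=sg["GroupId"],
    IpPermissions=[
        # SSH
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        # Meilisearch HTTP API
        {"IpProtocol": "tcp", "FromPort": 7700, "ToPort": 7700,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    ],
)
print("Created security group", sg["GroupId"])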

Step 2: Installing Meilisearch

After SSH-ing into my instance, installation was surprisingly straightforward:

# Update system packages
sudo apt update && sudo apt upgrade -y

# Install necessary packages
sudo apt install curl gnupg2 -y

# Download and install Meilisearch
curl -L https://install.meilisearch.com | sh

# Verify the installation
./meilisearch --help

At this point, I had Meilisearch downloaded, but I needed it to run as a service so it would restart automatically if needed.

Step 3: Setting Up Meilisearch as a Service

This part was slightly tricky the first time I did it for Hyperly, but I had learned my lesson:

# Create a systemd service file
sudo nano /etc/systemd/system/meilisearch.service

In the editor, I added:

[Unit]
Description=Meilisearch
After=network.target

[Service]
Type=simple
User=ubuntu
ExecStart=/home/ubuntu/meilisearch --master-key YOUR_MASTER_KEY_HERE --db-path /home/ubuntu/meili_data
Restart=on-failure
LimitNOFILE=65535

[Install]
WantedBy=multi-user.target

A few important points about this configuration:

  1. I added --db-path to specify where the data would be stored, making backups easier
  2. The LimitNOFILE parameter was crucial - I learned this the hard way when Hyperly's search stopped working under heavy load
  3. Creating a strong master key is essential for security
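
On that last point, a quick way to generate a strong key is Python's standard library secrets module; 32 random bytes is plenty:

# Generate a strong random master key for Meilisearch
import secrets

# 32 random bytes, URL-safe base64 encoded (roughly 43 characters)
print(secrets.token_urlsafe(32))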

Then I started the service:

sudo systemctl daemon-reload
sudo systemctl enable meilisearch
sudo systemctl start meilisearch
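
Before moving on, it's worth confirming the service actually came up. Meilisearch exposes an unauthenticated /health endpoint, so a quick check from Python looks like this:

# check_health.py - verify the local Meilisearch service is responding
import requests

resp = requests.get("http://localhost:7700/health", timeout=5)
resp.raise_for_status()
print(resp.json())  # expected: {'status': 'available'}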

For production environments, I prefer adding Nginx as a reverse proxy with SSL:

# Install Nginx
sudo apt install nginx -y

# Create Nginx configuration
sudo nano /etc/nginx/sites-available/meilisearch

I used this configuration:

server {
    listen 80;
    server_name search.starjobs.dev;  # Replace with your domain

    location / {
        proxy_pass http://localhost:7700;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}

Then activated it and got SSL with Let's Encrypt:

# Install certbot for the SSL step
sudo apt install certbot python3-certbot-nginx -y

sudo ln -s /etc/nginx/sites-available/meilisearch /etc/nginx/sites-enabled/
sudo certbot --nginx -d search.starjobs.dev
sudo systemctl restart nginx

With this setup, I could access my Meilisearch instance securely through https://search.starjobs.dev.
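
To sanity-check that both TLS and the master key work end to end, you can hit an authenticated endpoint such as /version through the proxy. A small sketch (recent Meilisearch versions expect a Bearer token; very old ones used an X-Meili-API-Key header instead):

# check_proxy.py - confirm HTTPS termination and key auth both work
import requests

MEILISEARCH_URL = "https://search.starjobs.dev"
MASTER_KEY = "your_master_key_here"

resp = requests.get(
    f"{MEILISEARCH_URL}/version",
    headers={"Authorization": f"Bearer {MASTER_KEY}"},
    timeout=5,
)
resp.raise_for_status()
print(resp.json())  # includes the running pkgVersion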

Connecting to Meilisearch from Python

Now for the fun part - actually using Meilisearch in my Python application!

First, I installed the Python client:

pip install meilisearch

Then I created a configuration module to keep things organized:

# meilisearch_config.py
from meilisearch import Client

# Connection details
MEILISEARCH_URL = "https://search.starjobs.dev"  # or your EC2 public IP with port
MEILISEARCH_KEY = "your_master_key_here"
INDEX_NAME = "jobs"

# Initialize client and index
client = None
index = None

def init_meilisearch():
    global client, index
    client = Client(MEILISEARCH_URL, MEILISEARCH_KEY)
    
    # Configure the index if it doesn't exist yet
    # (recent clients return Index objects under 'results', hence .uid)
    existing_uids = [idx.uid for idx in client.get_indexes()['results']]
    if INDEX_NAME not in existing_uids:
        # create_index returns a task in recent client versions, so wait
        # for it to finish, then grab a handle to the new index
        task = client.create_index(INDEX_NAME, {'primaryKey': 'id'})
        client.wait_for_task(task.task_uid)
        index = client.index(INDEX_NAME)
        
        # Configure searchable attributes
        index.update_searchable_attributes([
            'title',
            'description',
            'company',
            'location',
            'topics'
        ])
        
        # Configure filterable attributes
        index.update_filterable_attributes([
            'company',
            'location',
            'experience_level',
            'job_type',
            'remote'
        ])
        
        # Configure sortable attributes
        index.update_sortable_attributes([
            'created_at',
            'salary'
        ])
    else:
        index = client.index(INDEX_NAME)
    
    return index
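
One caveat worth flagging: shipping the master key in application code works, but Meilisearch can mint scoped API keys, and a search-only key is the safer thing to hand to an app. A hedged sketch using the Python client's key-management API (field names follow recent Meilisearch versions and may differ on older ones):

# create_search_key.py - mint a search-only key so the app never needs the master key
from meilisearch import Client

client = Client("https://search.starjobs.dev", "your_master_key_here")

key = client.create_key({
    "description": "search-only key for the StarJobs API",
    "actions": ["search"],   # only the search action is allowed
    "indexes": ["jobs"],     # and only on the jobs index
    "expiresAt": None,       # no expiry
})
print(key.key)  # use this value as MEILISEARCH_KEY in the app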

With this configuration in place, I could now create an indexing script to populate Meilisearch from my MongoDB database:

# meilisearch_indexer.py
import time
from meilisearch_config import init_meilisearch
from pymongo import MongoClient

BATCH_SIZE = 10000  # Found this to be a good balance for my dataset

# MongoDB connection
mongo_client = MongoClient('mongodb://localhost:27017')
db = mongo_client['starjobs']
jobs = db['jobs']

# Index jobs in MeiliSearch
def index_jobs(index):
    batch = []
    indexed_count = 0
    
    for job in jobs.find().batch_size(BATCH_SIZE):
        document = {
            'id': str(job['_id']),
            'title': job['title'],
            'description': job['description'],
            'company': job['company_name'],
            'location': job['location'],
            'salary': job.get('salary', 0),
            'job_type': job.get('job_type', 'Full-time'),
            'experience_level': job.get('experience_level', 'Mid-Level'),
            'remote': job.get('remote', False),
            'created_at': int(job['created_at'].timestamp()),
            'topics': job.get('keywords', [])
        }
        batch.append(document)
        
        # Bulk insert once batch size is reached
        if len(batch) >= BATCH_SIZE:
            index.add_documents(batch)
            indexed_count += len(batch)
            print(f"Indexed {indexed_count} jobs")
            batch = []

    # Add any remaining documents in the last batch
    if batch:
        index.add_documents(batch)
        indexed_count += len(batch)
        print(f"Indexed {indexed_count} jobs (final batch)")

if __name__ == "__main__":
    # Initialize MeiliSearch and grab the index handle
    # (init_meilisearch returns the index; importing `index` from the config
    #  module directly would capture the None it held at import time)
    index = init_meilisearch()

    # Index jobs into MeiliSearch
    index_jobs(index)

    # Wait for indexing to complete
    # (recent clients return an IndexStats object; older ones returned a raw dict)
    while True:
        stats = index.get_stats()
        if not stats.is_indexing:
            break
        print("Still indexing... Current count:", stats.number_of_documents)
        time.sleep(5)

    print("Indexing completed successfully")

This script efficiently loads jobs from MongoDB and indexes them in Meilisearch in batches. The first time I ran this for StarJobs with about 2 million jobs, it took around 45 minutes to complete. Not bad!
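
The bulk script covers the initial load; for ongoing changes, the natural follow-up is to push individual updates as jobs are created or deleted. A minimal sketch, assuming the same document shape as the indexer above:

# incremental_sync.py - keep Meilisearch in step as individual jobs change
from meilisearch_config import init_meilisearch

index = init_meilisearch()

def upsert_job(document: dict) -> None:
    # add_documents upserts by the primary key ('id'), so this
    # handles both newly created and updated jobs
    index.add_documents([document])

def remove_job(job_id: str) -> None:
    index.delete_document(job_id)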

Searching from My FastAPI Application

Once everything was indexed, I could implement the search in my FastAPI application:

from fastapi import FastAPI, Query
from typing import Optional, List
from pydantic import BaseModel
from meilisearch_config import init_meilisearch

app = FastAPI()
index = None

# Initialize Meilisearch on startup and keep a module-level handle
# (importing `index` from meilisearch_config would capture None, since
#  init_meilisearch() only rebinds it after import)
@app.on_event("startup")
async def startup_event():
    global index
    index = init_meilisearch()

class SearchResult(BaseModel):
    hits: List[dict]
    estimatedTotalHits: int  # Meilisearch before v0.28 returned nbHits instead
    processingTimeMs: int
    query: str

@app.get("/api/search", response_model=SearchResult)
async def search_jobs(
    q: str = Query("", description="Search query"),
    location: Optional[str] = None,
    remote: Optional[bool] = None,
    experience: Optional[str] = None,
    page: int = Query(1, ge=1),
    limit: int = Query(20, ge=1, le=100)
):
    filters = []
    
    # Build filters based on query parameters
    # (values are interpolated into Meilisearch's filter string; sanitize or
    #  constrain user-supplied input in production)
    if location:
        filters.append(f"location = '{location}'")
    
    if remote:
        filters.append("remote = true")
    
    if experience:
        filters.append(f"experience_level = '{experience}'")
    
    # Combine filters with AND operator
    filter_string = " AND ".join(filters) if filters else None
    
    # Convert to offset/limit for Meilisearch
    offset = (page - 1) * limit
    
    # Perform search
    results = index.search(
        q,
        {
            'filter': filter_string,
            'limit': limit,
            'offset': offset,
            'sort': ['created_at:desc']
        }
    )
    
    return results
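
Calling the endpoint is straightforward; here's an example request, assuming the app is running locally on port 8000 under uvicorn:

# Example request against the search endpoint
import requests

resp = requests.get(
    "http://localhost:8000/api/search",
    params={"q": "python developer", "remote": "true", "limit": 5},
    timeout=5,
)
resp.raise_for_status()
data = resp.json()
print(data["estimatedTotalHits"], "matches in", data["processingTimeMs"], "ms")
for hit in data["hits"]:
    print(hit["title"], "at", hit["company"])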

Lessons Learned & Performance Observations

After implementing Meilisearch for both Hyperly and StarJobs, here are some valuable lessons I learned:

  1. Memory Matters: I initially underestimated memory requirements. For ~5 million documents, a minimum of 4GB RAM is recommended.

  2. Batch Sizes: Finding the right batch size for your data is crucial. Too small = slow indexing. Too large = timeouts. I landed on 1000 for StarJobs.

  3. Backups: I automated daily backups of the Meilisearch database folder to S3. This saved me once when I needed to migrate to a larger instance. (A sketch of the backup script appears after this list.)

  4. Filterable vs. Searchable: Be strategic about which fields you make filterable vs. searchable. I found that making too many fields filterable slowed things down.

  5. Typo Tolerance: Meilisearch's typo tolerance is amazing for job searches. People often misspell company names or technologies, and it still finds relevant results.
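
For the backups in point 3, the approach was simple: archive the --db-path directory and ship it to S3 on a daily cron job. Here's a rough sketch (the bucket name is a placeholder; note that for a busy live instance, Meilisearch's built-in snapshot and dump features are safer than copying the data directory mid-write):

# backup_meili.py - archive the Meilisearch data directory and upload to S3
# (bucket name is a placeholder; prefer Meilisearch snapshots/dumps if the
#  instance is receiving writes while the backup runs)
import tarfile
import time

import boto3

DB_PATH = "/home/ubuntu/meili_data"
BUCKET = "starjobs-meili-backups"  # hypothetical bucket name

archive = f"/tmp/meili_backup_{int(time.time())}.tar.gz"
with tarfile.open(archive, "w:gz") as tar:
    tar.add(DB_PATH, arcname="meili_data")

boto3.client("s3").upload_file(archive, BUCKET, archive.rsplit("/", 1)[-1])
print("Uploaded", archive)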

In terms of performance, Meilisearch has been outstanding. Even with complex queries across millions of documents, response times typically stay under 100ms. The search relevance has been excellent out of the box, which was a pleasant surprise compared to the extensive tuning I had to do with Elasticsearch in previous projects.

Conclusion

Setting up Meilisearch on AWS for a Python project is relatively straightforward, and the results are worth it. While both Meilisearch and Typesense are excellent choices, I found Meilisearch to be a better fit for my specific needs with StarJobs.dev.

If you're building a search-heavy application and want something that "just works" without extensive configuration, give Meilisearch a try. It's been a game-changer for both Hyperly and now StarJobs, allowing us to provide lightning-fast, relevant search results without managing complex infrastructure.

Have you implemented search in your projects? I'd love to hear about your experiences in the comments!