Setting up Meilisearch on AWS for Python Projects
When I started building StarJobs.dev, implementing robust search functionality was one of the core requirements. Having already successfully integrated Meilisearch with Hyperly, it was a relatively straightforward decision to stick with what I knew worked well.
That said, I did my due diligence and briefly revisited the comparison between Meilisearch and Typesense to ensure I wasn't missing out on any major improvements or features that might benefit StarJobs specifically. Let me share what I found and why Meilisearch remained the right choice for my new project.
Meilisearch vs. Typesense: The Showdown
Both search engines are fantastic options, but they have different strengths. Here's what I discovered in my comparison:
| Feature | My Experience |
|---|---|
| Setup | Meilisearch was slightly easier to get running with minimal configuration, which was a big plus when I just wanted to get things moving. |
| Performance | For my dataset of around 5 million jobs, both performed well, but Meilisearch seemed to handle complex queries with less tuning. |
| Multi-language | This was a deciding factor for me - Meilisearch's multi-language support was more robust out of the box, which I needed for international job listings. |
| Relevance | Meilisearch's default relevance algorithm just worked better for my specific use case without much tweaking. |
Typesense has its advantages too - especially its built-in dashboard and potentially better resource efficiency for very large datasets. But for my specific needs with StarJobs, Meilisearch was the winner.
Now, let's dive into how I set up Meilisearch on AWS for my Python application.
Setting Up Meilisearch on AWS EC2
Step 1: Launching an EC2 Instance
First, I needed a server to host Meilisearch. I went with an EC2 instance on AWS:
- I logged into the AWS Console and navigated to EC2
- For StarJobs, I chose a `t3a.medium` instance with Ubuntu 22.04 (more RAM is helpful for search engines)
- Added 30GB of SSD storage (search indexes can grow quickly)
- Created a security group allowing SSH (port 22) and Meilisearch (port 7700)
When I did this for Hyperly, I initially chose a smaller instance (`t2.micro`) which worked fine for testing but needed upgrading once our document count increased. Learn from my mistake - start with something reasonable!
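Everything above was point-and-click in the AWS Console. For anyone who prefers to script it, here's a rough boto3 sketch of the same steps - the AMI ID, key pair name, and wide-open CIDR ranges are placeholders you'd adjust for your own account:

```python
# Hypothetical boto3 equivalent of the console steps above
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # your region

# Security group allowing SSH (22) and Meilisearch (7700)
sg = ec2.create_security_group(
    GroupName="meilisearch-sg",
    Description="SSH and Meilisearch access",
)
ec2.authorize_security_group_ingress(
    GroupId=sg["GroupId"],
    IpPermissions=[
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},  # lock this down to your IP
        {"IpProtocol": "tcp", "FromPort": 7700, "ToPort": 7700,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    ],
)

# t3a.medium with a 30GB root volume
ec2.run_instances(
    ImageId="ami-XXXXXXXX",  # Ubuntu 22.04 AMI for your region
    InstanceType="t3a.medium",
    MinCount=1,
    MaxCount=1,
    KeyName="your-key-pair",
    SecurityGroupIds=[sg["GroupId"]],
    BlockDeviceMappings=[
        {"DeviceName": "/dev/sda1",
         "Ebs": {"VolumeSize": 30, "VolumeType": "gp3"}},
    ],
)
```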
Step 2: Installing Meilisearch
After SSH-ing into my instance, installation was surprisingly straightforward:
```bash
# Update system packages
sudo apt update && sudo apt upgrade -y

# Install necessary packages
sudo apt install curl gnupg2 -y

# Download and install Meilisearch
curl -L https://install.meilisearch.com | sh

# Verify the installation
./meilisearch --help
```
At this point, I had Meilisearch downloaded, but I needed it to run as a service so it would restart automatically if needed.
Step 3: Setting Up Meilisearch as a Service
This part was slightly tricky the first time I did it for Hyperly, but I had learned my lesson:
```bash
# Create a systemd service file
sudo nano /etc/systemd/system/meilisearch.service
```
In the editor, I added:
```ini
[Unit]
Description=Meilisearch
After=network.target

[Service]
Type=simple
User=ubuntu
ExecStart=/home/ubuntu/meilisearch --master-key YOUR_MASTER_KEY_HERE --db-path /home/ubuntu/meili_data
Restart=on-failure
LimitNOFILE=65535

[Install]
WantedBy=multi-user.target
```
A few important points about this configuration:
- I added `--db-path` to specify where the data would be stored, making backups easier
- The `LimitNOFILE` parameter was crucial - I learned this the hard way when Hyperly's search stopped working under heavy load
- Creating a strong master key is essential for security
Then I started the service:
```bash
sudo systemctl daemon-reload
sudo systemctl enable meilisearch
sudo systemctl start meilisearch
```
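Before going further, it's worth confirming the service actually responds. Meilisearch exposes an unauthenticated `/health` endpoint, so a quick check from Python (or just `curl`) looks something like this - the IP is a placeholder for your instance's public address:

```python
# Quick sanity check against the new Meilisearch service
import requests

# Replace with your EC2 instance's public IP
resp = requests.get("http://YOUR_EC2_IP:7700/health", timeout=5)
print(resp.json())  # expected: {'status': 'available'}
```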
Step 4: Setting Up Nginx as a Reverse Proxy (Optional but Recommended)
For production environments, I prefer adding Nginx as a reverse proxy with SSL:
```bash
# Install Nginx
sudo apt install nginx -y

# Create Nginx configuration
sudo nano /etc/nginx/sites-available/meilisearch
```
I used this configuration:
```nginx
server {
    listen 80;
    server_name search.starjobs.dev;  # Replace with your domain

    location / {
        proxy_pass http://localhost:7700;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}
```
Then activated it and got SSL with Let's Encrypt:
```bash
sudo ln -s /etc/nginx/sites-available/meilisearch /etc/nginx/sites-enabled/

# Certbot needs to be installed first (it isn't by default on Ubuntu 22.04)
sudo apt install certbot python3-certbot-nginx -y
sudo certbot --nginx -d search.starjobs.dev
sudo systemctl restart nginx
```
With this setup, I could access my Meilisearch instance securely through https://search.starjobs.dev.
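Since a master key is set, everything except `/health` now requires authentication. To confirm the proxy and the key are working end to end, something like this should list the indexes (recent Meilisearch versions expect a Bearer token; older releases used an `X-Meili-API-Key` header instead):

```python
# Verify the HTTPS endpoint and master key together
import requests

resp = requests.get(
    "https://search.starjobs.dev/indexes",
    headers={"Authorization": "Bearer YOUR_MASTER_KEY_HERE"},
    timeout=5,
)
print(resp.status_code, resp.json())
```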
Connecting to Meilisearch from Python
Now for the fun part - actually using Meilisearch in my Python application!
First, I installed the Python client:
```bash
pip install meilisearch
```
Then I created a configuration module to keep things organized:
```python
# meilisearch_config.py
from meilisearch import Client

# Connection details
MEILISEARCH_URL = "https://search.starjobs.dev"  # or your EC2 public IP with port
MEILISEARCH_KEY = "your_master_key_here"
INDEX_NAME = "jobs"

# Initialized by init_meilisearch()
client = None
index = None


def init_meilisearch():
    """Connect to Meilisearch and configure the jobs index if needed."""
    global client, index
    client = Client(MEILISEARCH_URL, MEILISEARCH_KEY)

    # Configure the index only if it doesn't exist yet
    existing = [idx['uid'] for idx in client.get_indexes()['results']]
    if INDEX_NAME not in existing:
        client.create_index(INDEX_NAME, {'primaryKey': 'id'})
        index = client.index(INDEX_NAME)

        # Fields that full-text search runs against
        index.update_searchable_attributes([
            'title', 'description', 'company', 'location', 'topics'
        ])

        # Fields that can appear in filter expressions
        index.update_filterable_attributes([
            'company', 'location', 'experience_level', 'job_type', 'remote'
        ])

        # Fields that results can be sorted by
        index.update_sortable_attributes([
            'created_at', 'salary'
        ])
    else:
        index = client.index(INDEX_NAME)

    return index
```
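To confirm the settings were actually applied, a quick one-off check like this could call the module and inspect the index (just a sanity check, not part of the app):

```python
# One-off sanity check for meilisearch_config.py
from meilisearch_config import init_meilisearch

index = init_meilisearch()
# Should list the searchable/filterable/sortable attributes configured above
print(index.get_settings())
```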
With this configuration in place, I could now create an indexing script to populate Meilisearch from my MongoDB database:
```python
# meilisearch_indexer.py
import time

from pymongo import MongoClient

from meilisearch_config import init_meilisearch

BATCH_SIZE = 10000  # Found this to be a good balance for my dataset

# MongoDB connection
mongo_client = MongoClient('mongodb://localhost:27017')
db = mongo_client['starjobs']
jobs = db['jobs']


def index_jobs(index):
    """Stream jobs out of MongoDB and index them in Meilisearch in batches."""
    batch = []
    indexed_count = 0

    for job in jobs.find().batch_size(BATCH_SIZE):
        document = {
            'id': str(job['_id']),
            'title': job['title'],
            'description': job['description'],
            'company': job['company_name'],
            'location': job['location'],
            'salary': job.get('salary', 0),
            'job_type': job.get('job_type', 'Full-time'),
            'experience_level': job.get('experience_level', 'Mid-Level'),
            'remote': job.get('remote', False),
            'created_at': int(job['created_at'].timestamp()),
            'topics': job.get('keywords', [])
        }
        batch.append(document)

        # Bulk insert once the batch size is reached
        if len(batch) >= BATCH_SIZE:
            index.add_documents(batch)
            indexed_count += len(batch)
            print(f"Indexed {indexed_count} jobs")
            batch = []

    # Add any remaining documents in the last batch
    if batch:
        index.add_documents(batch)
        indexed_count += len(batch)
        print(f"Indexed {indexed_count} jobs (final batch)")


if __name__ == "__main__":
    # init_meilisearch() returns the configured index handle
    index = init_meilisearch()

    # Index jobs into Meilisearch
    index_jobs(index)

    # Wait for Meilisearch to finish processing the queued documents
    while True:
        stats = index.get_stats()
        if not stats['isIndexing']:
            break
        print("Still indexing... Current count:", stats['numberOfDocuments'])
        time.sleep(5)

    print("Indexing completed successfully")
```
This script efficiently loads jobs from MongoDB and indexes them in Meilisearch in batches. The first time I ran this for StarJobs with about 2 million jobs, it took around 45 minutes to complete. Not bad!
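One thing worth knowing here: `add_documents` acts as an add-or-replace on the primary key, so re-running the indexer won't create duplicates. For ongoing changes after the initial load, a small sketch like this (hypothetical, not my actual sync job) covers new, updated, and removed jobs:

```python
# Hypothetical incremental sync, reusing the same config module
from meilisearch_config import init_meilisearch

index = init_meilisearch()

# New or changed jobs: any document with an existing 'id'
# (the primary key) is replaced in place
index.add_documents([{
    'id': 'abc123',  # placeholder document
    'title': 'Senior Python Developer',
    'company': 'Example Corp',
    # ...remaining fields as in the indexer above
}])

# Removed jobs: delete by primary key
index.delete_document('abc123')
```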
Searching from My FastAPI Application
Once everything was indexed, I could implement the search in my FastAPI application:
```python
from typing import List, Optional

from fastapi import FastAPI, Query
from pydantic import BaseModel

from meilisearch_config import init_meilisearch

app = FastAPI()
index = None  # set once Meilisearch is initialized


# Initialize Meilisearch on startup
@app.on_event("startup")
async def startup_event():
    global index
    index = init_meilisearch()


class SearchResult(BaseModel):
    hits: List[dict]
    nbHits: int
    processingTimeMs: int
    query: str


@app.get("/api/search", response_model=SearchResult)
async def search_jobs(
    q: str = Query("", description="Search query"),
    location: Optional[str] = None,
    remote: Optional[bool] = None,
    experience: Optional[str] = None,
    page: int = Query(1, ge=1),
    limit: int = Query(20, ge=1, le=100)
):
    filters = []

    # Build filters based on query parameters
    if location:
        filters.append(f"location = '{location}'")
    if remote is not None:  # allow filtering for both remote and on-site jobs
        filters.append(f"remote = {str(remote).lower()}")
    if experience:
        filters.append(f"experience_level = '{experience}'")

    # Combine filters with the AND operator
    filter_string = " AND ".join(filters) if filters else None

    # Convert page/limit to offset/limit for Meilisearch
    offset = (page - 1) * limit

    # Perform the search
    results = index.search(
        q,
        {
            'filter': filter_string,
            'limit': limit,
            'offset': offset,
            'sort': ['created_at:desc']
        }
    )

    return results
```
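With the server running locally (e.g. via `uvicorn main:app` - the module name and port here are assumptions), a quick test of the endpoint might look like this:

```python
# Exercise the search endpoint on a local dev server
import requests

resp = requests.get(
    "http://localhost:8000/api/search",
    params={"q": "python", "remote": True, "page": 1, "limit": 5},
)
data = resp.json()
print(data["processingTimeMs"], "ms,", data["nbHits"], "hits")
for hit in data["hits"]:
    print(hit["title"], "-", hit["company"])
```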
Lessons Learned & Performance Observations
After implementing Meilisearch for both Hyperly and StarJobs, here are some valuable lessons I learned:
- **Memory Matters**: I initially underestimated memory requirements. For ~5 million documents, a minimum of 4GB RAM is recommended.
- **Batch Sizes**: Finding the right batch size for your data is crucial. Too small = slow indexing. Too large = timeouts. I landed on 10,000 for StarJobs.
- **Backups**: I automated daily backups of the Meilisearch database folder to S3 (a sketch of this follows the list). This saved me once when I needed to migrate to a larger instance.
- **Filterable vs. Searchable**: Be strategic about which fields you make filterable vs. searchable. I found that making too many fields filterable slowed things down.
- **Typo Tolerance**: Meilisearch's typo tolerance is amazing for job searches. People often misspell company names or technologies, and it still finds relevant results.
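On the backups point: my actual cron job isn't shown here, but the approach was roughly this - tar up the `--db-path` directory and push it to S3 with boto3. For a consistent copy you'd want to stop the service briefly first (or use Meilisearch's dump feature instead); the bucket name and paths below are placeholders:

```python
# Rough sketch of the daily backup job (placeholders throughout)
import subprocess
import time

import boto3

DB_PATH = "/home/ubuntu/meili_data"
BUCKET = "starjobs-backups"  # hypothetical bucket name

# Stop Meilisearch briefly so the on-disk data is consistent
subprocess.run(["sudo", "systemctl", "stop", "meilisearch"], check=True)
try:
    archive = f"/tmp/meili_backup_{int(time.time())}.tar.gz"
    subprocess.run(["tar", "-czf", archive, DB_PATH], check=True)
finally:
    subprocess.run(["sudo", "systemctl", "start", "meilisearch"], check=True)

# Upload the archive to S3
s3 = boto3.client("s3")
s3.upload_file(archive, BUCKET, archive.split("/")[-1])
```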
In terms of performance, Meilisearch has been outstanding. Even with complex queries across millions of posts, response times typically stay under 100ms. The search relevance has been excellent out of the box, which was a pleasant surprise compared to the extensive tuning I had to do with Elasticsearch in previous projects.
Conclusion
Setting up Meilisearch on AWS for a Python project is relatively straightforward, and the results are worth it. While both Meilisearch and Typesense are excellent choices, I found Meilisearch to be a better fit for my specific needs with StarJobs.dev.
If you're building a search-heavy application and want something that "just works" without extensive configuration, give Meilisearch a try. It's been a game-changer for both Hyperly and now StarJobs, allowing us to provide lightning-fast, relevant search results without managing complex infrastructure.
Have you implemented search in your projects? I'd love to hear about your experiences in the comments!