IntelliTalent
GPT-4o-mini · FAISS · LangChain · MongoDB — 2026

Overview
IntelliTalent is a full-stack job-matching platform. It parses a PDF resume, extracts structured skills with GPT-4o-mini using FAISS-backed retrieval-augmented generation (RAG), and ranks live job listings scraped from LinkedIn and Indeed with a hybrid score that combines semantic similarity and skill overlap.
Problem
Job seekers manually scan hundreds of listings that are irrelevant to their skills. Keyword-based search misses contextually relevant roles while returning noise. There's no easy way to know which jobs you're actually qualified for.
Approach
Parse resume with PyMuPDF → extract skills via GPT-4o-mini + FAISS RAG over an 89-skill taxonomy → scrape live jobs via Apify → embed resume and all job descriptions using OpenAI text-embedding-3-small (1,536-dim) → compute hybrid score (60% cosine similarity + 40% skill overlap via regex taxonomy matching) → rank and display results with skill gap analysis.
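The taxonomy-retrieval step in this pipeline can be sketched without FAISS itself: FAISS's IndexFlatIP over L2-normalized vectors performs the same nearest-neighbor search shown here in plain NumPy. Function and variable names are illustrative rather than taken from the codebase, and the toy vectors stand in for 1,536-dim OpenAI embeddings:

```python
import numpy as np

def top_k_skills(resume_vec, taxonomy_vecs, taxonomy_names, k=10):
    # Normalize so the dot product equals cosine similarity
    # (equivalent to FAISS IndexFlatIP over unit vectors).
    q = resume_vec / np.linalg.norm(resume_vec)
    m = taxonomy_vecs / np.linalg.norm(taxonomy_vecs, axis=1, keepdims=True)
    sims = m @ q                          # similarity of each skill to the resume
    idx = np.argsort(sims)[::-1][:k]      # indices of the k most similar skills
    return [taxonomy_names[i] for i in idx]
```

Only these top-k skill names are injected into the GPT-4o-mini prompt, which is what keeps token usage low compared to sending the full taxonomy.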
System Architecture
The platform processes a resume through a multi-stage pipeline before surfacing ranked job matches:
- PDF Upload (Streamlit) → PyMuPDF text extraction
- GPT-4o-mini: resume summarization + skill extraction via FAISS RAG over 89-skill taxonomy
- Apify Actors: live LinkedIn + Indeed job scrape
- OpenAI batch embeddings — text-embedding-3-small (1,536-dim) for resume and all jobs
- NumPy vectorized cosine similarity across all jobs in <100ms
- Hybrid score = 0.6 × semantic similarity + 0.4 × skill overlap (regex taxonomy matching)
- MongoDB: user profiles, job cache with 7-day TTL indexes, match history
- Streamlit dashboard: ranked job cards + skill gap analysis panel
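The scoring stages in the middle of this pipeline reduce to a few lines of NumPy. A minimal sketch, assuming the 60/40 weighting above and treating skill overlap as the fraction of a job's required skills found on the resume (the project's exact overlap definition may differ):

```python
import numpy as np

def hybrid_scores(resume_vec, job_vecs, resume_skills, job_skill_lists,
                  w_sem=0.6, w_skill=0.4):
    # Vectorized cosine similarity: one matrix-vector product covers all jobs.
    q = resume_vec / np.linalg.norm(resume_vec)
    m = job_vecs / np.linalg.norm(job_vecs, axis=1, keepdims=True)
    semantic = m @ q                                  # shape: (n_jobs,)

    # Skill overlap: share of each job's skills present on the resume.
    resume_set = set(resume_skills)
    overlap = np.array([
        len(resume_set & set(js)) / max(len(js), 1)   # guard against empty lists
        for js in job_skill_lists
    ])
    return w_sem * semantic + w_skill * overlap
```

Because normalization and the similarity computation are vectorized, scoring a few hundred jobs stays well under the 100 ms budget noted above.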
Key Features
- FAISS-backed RAG over 89-skill taxonomy — injects only relevant skills into LLM context, reducing token usage
- Regex taxonomy matching replaces per-job LLM calls (~50ms total vs 5+ sec per LLM call)
- Batch OpenAI embeddings with vectorized cosine similarity across all jobs in <100ms
- O*NET role clustering (1,000+ occupations) with synonym expansion for smarter search
- GPT-generated role suggestions + role selection screen before job search
- Auth: bcrypt password hashing, OTP email verification, brute-force account lockout
- MongoDB TTL indexes auto-expire job listings after 7 days
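The TTL expiry is plain MongoDB index configuration. The sketch below shows the wire-level createIndexes command (the field name `scraped_at` and index name are assumptions); with PyMongo the equivalent one-liner is `db.jobs.create_index("scraped_at", expireAfterSeconds=604800)`:

```python
SEVEN_DAYS = 7 * 24 * 60 * 60  # 604,800 seconds

# What the driver sends to MongoDB: a TTL index on the scrape timestamp.
# MongoDB's background monitor deletes each document roughly
# `expireAfterSeconds` after the datetime stored in `scraped_at`.
ttl_index_command = {
    "createIndexes": "jobs",
    "indexes": [{
        "key": {"scraped_at": 1},       # single-field ascending index (required for TTL)
        "name": "scraped_at_ttl",
        "expireAfterSeconds": SEVEN_DAYS,
    }],
}
```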
Technical Stack
- LLM: GPT-4o-mini, text-embedding-3-small (OpenAI)
- Orchestration: LangChain
- Vector Store: FAISS
- PDF Parsing: PyMuPDF
- Frontend: Streamlit
- Database: MongoDB 7.0
- Job Scraping: Apify Actors (LinkedIn + Indeed)
- Containerization: Docker + Docker Compose
- Cloud: AWS EC2, GCP Cloud Run
- Secrets: AWS SSM Parameter Store, GCP Secret Manager
Deployment
Deployed on AWS EC2 (t2.micro) and on GCP Cloud Run with 0–3 auto-scaling instances and 2GB RAM. Secrets are managed via AWS SSM Parameter Store and GCP Secret Manager. Dockerized on a python:3.11-slim base image.
Challenges & Solutions
- Per-job LLM calls were too slow (5+ sec each) → replaced with regex taxonomy matching (~50ms total)
- Full taxonomy in every prompt was token-heavy → FAISS RAG injects only top-k relevant skills into context
- Job cache lost on container restart → moved to MongoDB with 7-day TTL indexes for persistence
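The first fix above, swapping per-job LLM calls for precompiled regex matching, can be sketched like this. The five-skill list is a hypothetical stand-in for the real 89-skill taxonomy; lookarounds rather than `\b` are used so that skills like "C++", whose last character is not a word character, still match at word edges:

```python
import re

# Hypothetical subset of the 89-skill taxonomy; the real list lives in the project.
TAXONOMY = ["Python", "SQL", "C++", "Node.js", "Machine Learning"]

# Precompile once at startup: word-edge lookarounds so "SQL" never matches
# inside "NoSQL-like", and re.escape handles metacharacters in "C++"/"Node.js".
PATTERNS = {
    skill: re.compile(r"(?<!\w)" + re.escape(skill.lower()) + r"(?!\w)")
    for skill in TAXONOMY
}

def match_skills(job_description: str) -> set:
    """Return the taxonomy skills mentioned in a job description."""
    text = job_description.lower()
    return {skill for skill, pat in PATTERNS.items() if pat.search(text)}
```

Compiling every pattern once and scanning each description in pure Python is what brings total matching time down to tens of milliseconds across a whole batch of jobs, versus seconds per job for an LLM call.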
Future Improvements
Application tracker, resume quality score, cross-user job alerts, and a personal analytics dashboard.