IntelliTalent
GPT-4o-mini · FAISS · LangChain · MongoDB — 2026

Overview
IntelliTalent is a full-stack job-matching platform. It parses a PDF resume, extracts structured skills with GPT-4o-mini using FAISS-backed retrieval-augmented generation (RAG), and ranks live job listings scraped from LinkedIn and Indeed with a hybrid score that combines semantic similarity and skill overlap.
Problem
Job seekers manually scan hundreds of listings that are irrelevant to their skills. Keyword-based search misses contextually relevant roles while returning noise. There's no easy way to know which jobs you're actually qualified for.
Approach
Parse resume with PyMuPDF → extract skills via GPT-4o-mini + FAISS RAG over an 89-skill taxonomy → scrape live jobs via Apify → embed resume and all job descriptions using OpenAI text-embedding-3-small (1,536-dim) → compute hybrid score (60% cosine similarity + 40% skill overlap via regex taxonomy matching) → rank and display results with skill gap analysis.
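The taxonomy-retrieval step in this pipeline can be sketched without FAISS itself: FAISS's IndexFlatIP over L2-normalized vectors performs the same nearest-neighbor search shown here in plain NumPy. Function and variable names are illustrative rather than taken from the codebase, and the toy vectors stand in for 1,536-dim OpenAI embeddings:

```python
import numpy as np

def top_k_skills(resume_vec, taxonomy_vecs, taxonomy_names, k=10):
    # Normalize so the dot product equals cosine similarity
    # (equivalent to FAISS IndexFlatIP over unit vectors).
    q = resume_vec / np.linalg.norm(resume_vec)
    m = taxonomy_vecs / np.linalg.norm(taxonomy_vecs, axis=1, keepdims=True)
    sims = m @ q                          # similarity of each skill to the resume
    idx = np.argsort(sims)[::-1][:k]      # indices of the k most similar skills
    return [taxonomy_names[i] for i in idx]
```

Only these top-k skill names are injected into the GPT-4o-mini prompt, which is what keeps token usage low compared to sending the full taxonomy.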
System Architecture
The platform processes a resume through a multi-stage pipeline before surfacing ranked job matches:
- PDF Upload (Streamlit) → PyMuPDF text extraction
- GPT-4o-mini: resume summarization + skill extraction via FAISS RAG over 89-skill taxonomy
- Apify Actors: live LinkedIn + Indeed job scrape
- OpenAI batch embeddings — text-embedding-3-small (1,536-dim) for resume and all jobs
- NumPy vectorized cosine similarity across all jobs in <100ms
- Hybrid score = 0.6 × semantic similarity + 0.4 × skill overlap (regex taxonomy matching)
- MongoDB: user profiles, job cache with 7-day TTL indexes, match history
- Streamlit dashboard: ranked job cards + skill gap analysis panel
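The scoring stages in the middle of this pipeline reduce to a few lines of NumPy. A minimal sketch, assuming the 60/40 weighting above and treating skill overlap as the fraction of a job's required skills found on the resume (the project's exact overlap definition may differ):

```python
import numpy as np

def hybrid_scores(resume_vec, job_vecs, resume_skills, job_skill_lists,
                  w_sem=0.6, w_skill=0.4):
    # Vectorized cosine similarity: one matrix-vector product covers all jobs.
    q = resume_vec / np.linalg.norm(resume_vec)
    m = job_vecs / np.linalg.norm(job_vecs, axis=1, keepdims=True)
    semantic = m @ q                                  # shape: (n_jobs,)

    # Skill overlap: share of each job's skills present on the resume.
    resume_set = set(resume_skills)
    overlap = np.array([
        len(resume_set & set(js)) / max(len(js), 1)   # guard against empty lists
        for js in job_skill_lists
    ])
    return w_sem * semantic + w_skill * overlap
```

Because normalization and the similarity computation are vectorized, scoring a few hundred jobs stays well under the 100 ms budget noted above.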
Key Features
- FAISS-backed RAG over 89-skill taxonomy — injects only relevant skills into LLM context, reducing token usage
- Regex taxonomy matching replaces per-job LLM calls (~50ms total vs 5+ sec per LLM call)
- Batch OpenAI embeddings with vectorized cosine similarity across all jobs in <100ms
- O*NET role clustering (1,000+ occupations) with synonym expansion for smarter search
- GPT-generated role suggestions + role selection screen before job search
- Auth: bcrypt password hashing, OTP email verification, brute-force account lockout
- MongoDB TTL indexes auto-expire job listings after 7 days
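The TTL expiry is plain MongoDB index configuration. The sketch below shows the wire-level createIndexes command (the field name `scraped_at` and index name are assumptions); with PyMongo the equivalent one-liner is `db.jobs.create_index("scraped_at", expireAfterSeconds=604800)`:

```python
SEVEN_DAYS = 7 * 24 * 60 * 60  # 604,800 seconds

# What the driver sends to MongoDB: a TTL index on the scrape timestamp.
# MongoDB's background monitor deletes each document roughly
# `expireAfterSeconds` after the datetime stored in `scraped_at`.
ttl_index_command = {
    "createIndexes": "jobs",
    "indexes": [{
        "key": {"scraped_at": 1},       # single-field ascending index (required for TTL)
        "name": "scraped_at_ttl",
        "expireAfterSeconds": SEVEN_DAYS,
    }],
}
```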
Technical Stack
- LLM: GPT-4o-mini, text-embedding-3-small (OpenAI)
- Orchestration: LangChain
- Vector Store: FAISS
- PDF Parsing: PyMuPDF
- Frontend: Streamlit
- Database: MongoDB 7.0
- Job Scraping: Apify Actors (LinkedIn + Indeed)
- Containerization: Docker + Docker Compose
- Cloud: AWS EC2, GCP Cloud Run
- Secrets: AWS SSM Parameter Store, GCP Secret Manager
Deployment
Deployed on AWS EC2 (t2.micro) and on GCP Cloud Run with 0–3 auto-scaling instances and 2GB RAM. Secrets are managed via AWS SSM Parameter Store and GCP Secret Manager. Dockerized on a python:3.11-slim base image.
Challenges & Solutions
- Per-job LLM calls were too slow (5+ sec each) → replaced with regex taxonomy matching (~50ms total)
- Full taxonomy in every prompt was token-heavy → FAISS RAG injects only top-k relevant skills into context
- Job cache lost on container restart → moved to MongoDB with 7-day TTL indexes for persistence
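The first fix above, swapping per-job LLM calls for precompiled regex matching, can be sketched like this. The five-skill list is a hypothetical stand-in for the real 89-skill taxonomy; lookarounds rather than `\b` are used so that skills like "C++", whose last character is not a word character, still match at word edges:

```python
import re

# Hypothetical subset of the 89-skill taxonomy; the real list lives in the project.
TAXONOMY = ["Python", "SQL", "C++", "Node.js", "Machine Learning"]

# Precompile once at startup: word-edge lookarounds so "SQL" never matches
# inside "NoSQL-like", and re.escape handles metacharacters in "C++"/"Node.js".
PATTERNS = {
    skill: re.compile(r"(?<!\w)" + re.escape(skill.lower()) + r"(?!\w)")
    for skill in TAXONOMY
}

def match_skills(job_description: str) -> set:
    """Return the taxonomy skills mentioned in a job description."""
    text = job_description.lower()
    return {skill for skill, pat in PATTERNS.items() if pat.search(text)}
```

Compiling every pattern once and scanning each description in pure Python is what brings total matching time down to tens of milliseconds across a whole batch of jobs, versus seconds per job for an LLM call.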
Future Improvements
Application tracker, resume quality score, cross-user job alerts, and a personal analytics dashboard.