IntelliTalent

GPT-4o-mini · FAISS · LangChain · MongoDB — 2026


Overview

IntelliTalent is a full-stack job matching platform that parses a PDF resume, extracts structured skills using GPT-4o-mini with FAISS-backed retrieval-augmented generation, and ranks live job listings from LinkedIn and Indeed using a hybrid scoring algorithm combining semantic similarity and skill overlap.

Problem

Job seekers manually scan hundreds of listings, most of which are irrelevant to their skills. Keyword-based search misses contextually relevant roles while returning noise. There's no easy way to know which jobs you're actually qualified for.

Approach

Parse resume with PyMuPDF → extract skills via GPT-4o-mini + FAISS RAG over an 89-skill taxonomy → scrape live jobs via Apify → embed resume and all job descriptions using OpenAI text-embedding-3-small (1,536-dim) → compute hybrid score (60% cosine similarity + 40% skill overlap via regex taxonomy matching) → rank and display results with skill gap analysis.
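
The hybrid scoring step above can be sketched in a few lines of Python. Function names are illustrative, and the exact overlap definition (fraction of a job's taxonomy skills found in the resume) is an assumption; the 0.6/0.4 weights come from the pipeline description.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def hybrid_score(resume_vec, job_vec, resume_skills, job_skills):
    """Hybrid score = 0.6 * semantic similarity + 0.4 * skill overlap."""
    semantic = cosine_similarity(resume_vec, job_vec)
    # Assumed overlap metric: share of the job's required skills the resume covers
    overlap = len(resume_skills & job_skills) / len(job_skills) if job_skills else 0.0
    return 0.6 * semantic + 0.4 * overlap
```

A job whose description embedding and required skills both match the resume perfectly scores 1.0; a job with neither scores 0.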

System Architecture

The platform processes a resume through a multi-stage pipeline before surfacing ranked job matches:

  • PDF Upload (Streamlit) → PyMuPDF text extraction
  • GPT-4o-mini: resume summarization + skill extraction via FAISS RAG over 89-skill taxonomy
  • Apify Actors: live LinkedIn + Indeed job scrape
  • OpenAI batch embeddings — text-embedding-3-small (1,536-dim) for resume and all jobs
  • NumPy vectorized cosine similarity across all jobs in <100ms
  • Hybrid score = 0.6 × semantic similarity + 0.4 × skill overlap (regex taxonomy matching)
  • MongoDB: user profiles, job cache with 7-day TTL indexes, match history
  • Streamlit dashboard: ranked job cards + skill gap analysis panel
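
The vectorized similarity step in the pipeline above amounts to a single matrix-vector product over normalized embeddings. A minimal NumPy sketch (function name and shapes are illustrative; in the real pipeline the vectors are 1,536-dim):

```python
import numpy as np

def rank_jobs(resume_emb: np.ndarray, job_embs: np.ndarray) -> np.ndarray:
    """Cosine similarity of one resume embedding (d,) against all job
    embeddings (n_jobs, d) in a single vectorized pass, no per-job loop."""
    resume_norm = resume_emb / np.linalg.norm(resume_emb)
    job_norms = job_embs / np.linalg.norm(job_embs, axis=1, keepdims=True)
    sims = job_norms @ resume_norm      # (n_jobs,) cosine similarities
    return np.argsort(sims)[::-1]       # job indices, best match first
```

Because the loop runs inside NumPy rather than Python, scoring hundreds of jobs is effectively one BLAS call.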

Key Features

  • FAISS-backed RAG over 89-skill taxonomy — injects only relevant skills into LLM context, reducing token usage
  • Regex taxonomy matching replaces per-job LLM calls (~50ms total vs 5+ sec per job)
  • Batch OpenAI embeddings with vectorized cosine similarity across all jobs in <100ms
  • O*NET role clustering (1,000+ occupations) with synonym expansion for smarter search
  • GPT-generated role suggestions + role selection screen before job search
  • Auth: bcrypt password hashing, OTP email verification, brute-force account lockout
  • MongoDB TTL indexes auto-expire job listings after 7 days
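
The regex taxonomy matching mentioned above might look like the following sketch. The taxonomy slice and the lookaround boundary pattern are illustrative (a plain `\b` boundary would break on skills like "C++"):

```python
import re

# Tiny slice of the skill taxonomy; the real one has 89 entries.
TAXONOMY = ["Python", "SQL", "AWS", "Node.js", "C++"]

# Precompile one boundary-aware pattern per skill, so each job description
# is scanned with cheap regex searches instead of a per-job LLM call.
PATTERNS = {
    skill: re.compile(rf"(?<![\w+]){re.escape(skill)}(?![\w+])", re.IGNORECASE)
    for skill in TAXONOMY
}

def match_skills(job_description: str) -> set[str]:
    """Return the taxonomy skills mentioned in a job description."""
    return {skill for skill, pat in PATTERNS.items() if pat.search(job_description)}
```

The lookarounds keep "SQL" from matching inside "MySQL" while still allowing punctuation-adjacent hits like "Python/SQL".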

Technical Stack

  • LLM: GPT-4o-mini, text-embedding-3-small (OpenAI)
  • Orchestration: LangChain
  • Vector Store: FAISS
  • PDF Parsing: PyMuPDF
  • Frontend: Streamlit
  • Database: MongoDB 7.0
  • Job Scraping: Apify Actors (LinkedIn + Indeed)
  • Containerization: Docker + Docker Compose
  • Cloud: AWS EC2, GCP Cloud Run
  • Secrets: AWS SSM Parameter Store, GCP Secret Manager

Deployment

Deployed on AWS EC2 (t2.micro) and GCP Cloud Run (auto-scaling between 0 and 3 instances, 2GB RAM each). Secrets are managed via AWS SSM Parameter Store and GCP Secret Manager. The application is Dockerized on a python:3.11-slim base image.
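
A minimal docker-compose sketch consistent with the stack above (service names, ports, and environment variable names are assumptions, not the project's actual config):

```yaml
services:
  app:
    build: .                      # Dockerfile based on python:3.11-slim
    ports:
      - "8501:8501"               # Streamlit's default port
    environment:
      - MONGO_URI=mongodb://mongo:27017/intellitalent
    depends_on:
      - mongo
  mongo:
    image: mongo:7.0
    volumes:
      - mongo-data:/data/db

volumes:
  mongo-data:
```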

Challenges & Solutions

  • Per-job LLM calls were too slow (5+ sec each) → replaced with regex taxonomy matching (~50ms total)
  • Full taxonomy in every prompt was token-heavy → FAISS RAG injects only top-k relevant skills into context
  • Job cache lost on container restart → moved to MongoDB with 7-day TTL indexes for persistence
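
The TTL fix from the last bullet is a one-line index in MongoDB; the collection and field names here are assumptions:

```javascript
// mongosh: auto-delete job documents 7 days (604,800s) after their fetched_at timestamp
db.jobs.createIndex({ fetched_at: 1 }, { expireAfterSeconds: 604800 })
```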

Future Improvements

Application tracker, resume quality score, cross-user job alerts, and a personal analytics dashboard.