Deep Tech • NLP • RAG Pipeline

ThinkJoel
Docu-Brain

Docu-Brain turns static documentation into a queryable knowledge base. This project is a full-stack implementation of Retrieval-Augmented Generation (RAG) that answers questions using your own private documents as the source of truth.

ThinkJoel AI Interface

Processing Engine

Python / Flask

Reasoning Model

Gemini Pro

Vector Memory

ChromaDB

Pipeline

LangChain

Technical Highlights

Solving the Hallucination Problem

A standalone LLM answers from its training data and may guess. This system retrieves first. By grounding each answer in passages from your own documents, it reduces the risk of fabricated responses.

Semantic Ingestion

Beyond simple text extraction, the system converts raw PDF text into high-dimensional vector embeddings using HuggingFace models, so that passages with similar meaning sit close together in vector space.

PDF Miner • Recursive Character Splitting • HuggingFace

Contextual Retrieval

Using ChromaDB for low-latency similarity search, the RAG pipeline identifies the document fragments most relevant to a question and uses them to ground AI responses in evidence from the source text.

ChromaDB • Vector Similarity Search • LangChain

Grounded Generation

Built on Google Gemini and designed to minimize hallucinations. The LLM acts as a reasoning engine: the system prompt instructs it to synthesize answers only from the context supplied by the retrieval layer.

Google Gemini Pro • System Prompt Engineering • History Buffer
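The history buffer listed above is not described in detail on this page. As a rough illustration only, a bounded buffer of recent turns could look like the sketch below; the class name `HistoryBuffer`, the `max_turns` parameter, and the rendered format are all hypothetical and not taken from the project.

```python
from collections import deque


class HistoryBuffer:
    """Keeps only the most recent question/answer turns (hypothetical helper)."""

    def __init__(self, max_turns: int = 5):
        # deque with maxlen silently drops the oldest turn when full
        self._turns = deque(maxlen=max_turns)

    def add(self, question: str, answer: str) -> None:
        self._turns.append((question, answer))

    def render(self) -> str:
        """Format the stored turns as plain text for inclusion in a prompt."""
        return "\n".join(f"User: {q}\nAssistant: {a}" for q, a in self._turns)

    def __len__(self) -> int:
        return len(self._turns)
```

Capping the number of turns keeps the prompt size predictable, at the cost of forgetting older parts of the conversation.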

The Pipeline Architecture

01

The Ingestion Engine

Documents are parsed and broken into overlapping chunks. This ensures that context isn't lost at the boundaries of a split.
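To make the idea of overlapping chunks concrete, here is a minimal sketch in plain Python. It uses fixed-size character windows, which is simpler than the recursive character splitting the project names; the function name `chunk_text` and the default sizes are assumptions chosen for illustration.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into character windows where neighbours share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks
```

Because each window starts before the previous one ends, a sentence that straddles a boundary appears whole in at least one chunk, provided it is shorter than the overlap.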

02

Vectorization

Text is converted into numerical vectors. We store these in ChromaDB, creating a 'searchable brain' of your document.
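The sketch below illustrates the shape of this step without any external dependencies. It is a toy stand-in, not the project's code: `embed` is a hashed bag-of-words function rather than a HuggingFace model, and `InMemoryVectorStore` is a hypothetical list-backed substitute for ChromaDB.

```python
import hashlib
import math

DIM = 64  # assumed toy dimension; real embedding models use hundreds of dimensions


def embed(text: str, dim: int = DIM) -> list[float]:
    """Toy embedding: count tokens into hashed buckets, then L2-normalize."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # md5 gives a hash that is stable across runs, unlike built-in hash()
        digest = hashlib.md5(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database: stores (vector, text) pairs."""

    def __init__(self):
        self.records: list[tuple[list[float], str]] = []

    def add(self, chunks: list[str]) -> None:
        for chunk in chunks:
            self.records.append((embed(chunk), chunk))
```

A real embedding model captures meaning rather than exact word overlap, but the storage pattern is the same: each chunk is kept alongside its vector so the text can be returned when the vector matches a query.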

03

The Retrieval Loop

When a user asks a question, LangChain embeds the query and searches the vector store for the text blocks whose embeddings are most similar to it.
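The retrieval step boils down to ranking stored vectors by similarity to the query vector. The sketch below shows that ranking with cosine similarity in plain Python; the function names and the `(vector, text)` record layout are assumptions for illustration, and a vector database would normally use an index rather than a full scan.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], records: list[tuple[list[float], str]], k: int = 3):
    """Return the k (score, text) pairs most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, vec), text) for vec, text in records]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

Only the top few chunks are passed on, which keeps the prompt short and limits how much irrelevant text the model sees.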

04

Reasoned Synthesis

Google Gemini receives the user's query together with the retrieved text. It synthesizes a natural-language response based strictly on the facts found in that text.
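One way to combine the query and the retrieved text is a prompt template like the sketch below. The wording of the instructions and the function name `build_grounded_prompt` are assumptions; the project's actual system prompt is not shown on this page, and the call to the model itself is left out.

```python
def build_grounded_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble a prompt that restricts the model to the retrieved passages."""
    if context_chunks:
        # Number the passages so an answer can point back to its source
        context = "\n\n".join(
            f"[{i}] {chunk}" for i, chunk in enumerate(context_chunks, start=1)
        )
    else:
        context = "(no relevant passages were retrieved)"
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say you do not know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

Giving the model an explicit way out ("say you do not know") matters as much as the restriction itself, since otherwise it may still improvise when the context is thin.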

setup-guide.sh

# Initialization
git clone https://github.com/JohnJoel4/ThinkJoel-ai-document-analyzer.git
cd ThinkJoel-ai-document-analyzer && python -m venv venv
# Activate the virtual environment (on Windows: venv\Scripts\activate)
source venv/bin/activate

# Dependencies
pip install -r requirements.txt

# Environment Configuration
# Ensure GOOGLE_API_KEY is present in your .env
export FLASK_APP=run.py && flask run
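A missing API key tends to surface late, as a failed request to the model. A small startup check can fail fast instead. The sketch below is an assumption about how such a check could look, not the project's code: `require_env` is a hypothetical helper, and it only reads the process environment, so it assumes the .env file has already been loaded by whatever mechanism the app uses.

```python
import os


def require_env(name: str, env=None) -> str:
    """Return the value of an environment variable or raise a clear error."""
    source = os.environ if env is None else env
    value = source.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env or export it before starting the app"
        )
    return value
```

Calling `require_env("GOOGLE_API_KEY")` once at startup turns a confusing runtime failure into an immediate, readable error.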