Tags: ai, architecture, python

Deep Dive into DeepTutor: An AI-Powered Personalized Learning Architecture

Analyzing the architecture of DeepTutor, an AI-powered personalized learning assistant built with FastAPI, Next.js, and a sophisticated multi-agent system.

Khoa Le

In my recent exploration of AI-powered educational tools, I came across DeepTutor, an open-source personalized learning assistant from HKUDS (the Data Intelligence Lab at the University of Hong Kong). After setting it up locally and reading through its codebase, I wanted to share my analysis of its architecture and what makes the system tick.

What is DeepTutor?

DeepTutor is an AI-powered personalized learning platform that combines multiple AI agents to create an adaptive tutoring experience. It's built on a modern full-stack architecture: FastAPI backend, Next.js 16 frontend, and a sophisticated multi-agent system powered by Large Language Models (LLMs).

System Architecture Overview

The Dual-Loop Agent System

At the heart of DeepTutor lies its Dual-Loop Agent System — a clever architectural pattern that separates learning into two distinct but interconnected phases:

  1. Analysis Loop — Understands the learner's current state, knowledge gaps, and learning style
  2. Solve Loop — Generates personalized content, exercises, and explanations based on the analysis

This separation allows the system to first deeply understand the problem space before attempting to solve it, resulting in more targeted and effective tutoring.
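To make the two phases concrete, here is a minimal sketch of the control flow. The names `LearnerAnalysis`, `analysis_loop`, and `solve_loop` are my own placeholders rather than DeepTutor's classes, and a real implementation would call an LLM where this one uses simple list logic.

```python
from dataclasses import dataclass, field


@dataclass
class LearnerAnalysis:
    """Output of the analysis loop (hypothetical shape)."""
    question: str
    knowledge_gaps: list[str] = field(default_factory=list)


def analysis_loop(question: str, known_topics: set[str], required_topics: list[str]) -> LearnerAnalysis:
    """Phase 1: work out which prerequisite topics the learner is missing."""
    gaps = [topic for topic in required_topics if topic not in known_topics]
    return LearnerAnalysis(question=question, knowledge_gaps=gaps)


def solve_loop(analysis: LearnerAnalysis) -> list[str]:
    """Phase 2: build a response plan that addresses the gaps first."""
    steps = [f"Review prerequisite: {gap}" for gap in analysis.knowledge_gaps]
    steps.append(f"Answer: {analysis.question}")
    return steps


analysis = analysis_loop(
    question="How does backpropagation work?",
    known_topics={"derivatives"},
    required_topics=["derivatives", "chain rule"],
)
plan = solve_loop(analysis)
print(plan)
```

The point of the split is that `solve_loop` never has to guess what the learner knows; it only consumes the structured result of the first phase.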

Technology Stack

| Layer | Technology |
| --- | --- |
| Frontend | Next.js 16, React 19, Tailwind CSS |
| Backend | FastAPI, Uvicorn, Python 3.11+ |
| LLM | kimi-k2.5:cloud (via Ollama) |
| Embeddings | Jina AI (jina-embeddings-v3) |
| Database | SQLite (with optional PostgreSQL) |
| Knowledge Graph | LightRAG |
| Vector Store | FAISS |

The codebase spans roughly 53,600 lines of Python and YAML across 222 Python files, which is substantial for an educational AI system.

Multi-Agent Architecture

DeepTutor implements seven specialized agents, each with a distinct responsibility:

1. Solver Agent

The core problem-solving engine. It analyzes questions and generates step-by-step explanations. Uses chain-of-thought prompting to show its reasoning.

2. Researcher Agent

Gathers and synthesizes information from the knowledge base. Implements RAG (Retrieval-Augmented Generation) to provide contextually relevant information.

3. Question Generator

Creates personalized practice questions based on learning objectives and detected knowledge gaps.

4. Co-Writer Agent

Assists with writing tasks, code reviews, and structured content creation.

5. Notebook Agent

Manages learning materials, notes, and references. Organizes content for later review.

6. Guide Agent

Provides navigation and learning path recommendations. Suggests next topics and prerequisite reviews.

7. Planner Agent

Creates personalized learning schedules and tracks progress over time.
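One common way to wire up a set of specialized agents like this is a name-to-handler registry. The sketch below is an assumption about the general pattern, not DeepTutor's actual orchestration code; `AGENTS`, `register`, and `invoke` are hypothetical names, and the handlers return canned strings where real agents would call an LLM.

```python
from typing import Callable

# Hypothetical registry mapping an agent name to a handler function.
AgentHandler = Callable[[str], str]
AGENTS: dict[str, AgentHandler] = {}


def register(name: str) -> Callable[[AgentHandler], AgentHandler]:
    """Decorator that adds a handler to the registry under `name`."""
    def decorator(handler: AgentHandler) -> AgentHandler:
        AGENTS[name] = handler
        return handler
    return decorator


@register("solver")
def solver(task: str) -> str:
    return f"[solver] step-by-step explanation for: {task}"


@register("question_generator")
def question_generator(task: str) -> str:
    return f"[question_generator] practice questions on: {task}"


def invoke(agent_type: str, task: str) -> str:
    """Route a task to the named agent, failing loudly on unknown names."""
    try:
        handler = AGENTS[agent_type]
    except KeyError:
        raise ValueError(f"unknown agent: {agent_type!r}") from None
    return handler(task)


print(invoke("solver", "integration by parts"))
```

Adding an eighth agent under this pattern means writing one decorated function; nothing in the routing code changes.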

The Embedding Layer

DeepTutor's flexibility comes from its 7 embedding adapters:

  • Jina AI (default, cloud-based)
  • OpenAI (text-embedding-3-small / text-embedding-3-large)
  • Cohere (embed-english-v3.0)
  • Ollama (local embeddings)
  • HuggingFace (sentence-transformers)
  • Google (text-embedding)
  • Azure OpenAI

This abstraction allows users to choose between cost, privacy, and performance trade-offs based on their needs.
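A pluggable embedding layer usually comes down to a small shared interface plus a factory. The sketch below illustrates that idea under my own assumptions: `EmbeddingAdapter`, `HashEmbeddingAdapter`, and `build_adapter` are hypothetical names, and the hash-based adapter is an offline stand-in whose vectors carry no semantic meaning.

```python
import hashlib
from typing import Protocol


class EmbeddingAdapter(Protocol):
    """Interface every adapter is assumed to satisfy (hypothetical)."""
    dimension: int

    def embed(self, texts: list[str]) -> list[list[float]]: ...


class HashEmbeddingAdapter:
    """Offline stand-in: derives a deterministic vector from a SHA-256 digest."""

    def __init__(self, dimension: int = 8) -> None:
        if not 1 <= dimension <= 32:
            raise ValueError("dimension must be between 1 and 32 (digest length)")
        self.dimension = dimension

    def embed(self, texts: list[str]) -> list[list[float]]:
        vectors = []
        for text in texts:
            digest = hashlib.sha256(text.encode("utf-8")).digest()
            vectors.append([byte / 255.0 for byte in digest[: self.dimension]])
        return vectors


def build_adapter(provider: str) -> EmbeddingAdapter:
    """Pick an adapter by name; real providers would be registered here too."""
    adapters = {"hash": HashEmbeddingAdapter}
    if provider not in adapters:
        raise ValueError(f"unsupported embedding provider: {provider!r}")
    return adapters[provider]()


adapter = build_adapter("hash")
vectors = adapter.embed(["neural networks", "gradient descent"])
```

Because callers only depend on `embed` and `dimension`, swapping a cloud provider for a local one becomes a configuration change rather than a code change.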

Knowledge Graph Integration

The system integrates LightRAG (Lightweight Retrieval-Augmented Generation) for knowledge graph construction:

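# Conceptual sketch. kimi_llm_handler, jina_embed, and learning_content are
# placeholders defined elsewhere; exact setup steps vary by LightRAG version.
from lightrag import LightRAG, QueryParam

rag = LightRAG(
    working_dir="./knowledge_graph",
    llm_model_func=kimi_llm_handler,  # callable that sends prompts to the LLM
    embedding_func=jina_embed,        # wraps jina-embeddings-v3
)

# Insert learning materials
rag.insert(learning_content)

# Query with context
result = rag.query(
    "Explain neural networks",
    param=QueryParam(mode="hybrid"),  # combines local and global graph retrieval
)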

This approach creates connections between concepts, enabling the system to answer questions that require understanding multiple related topics.
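To show why linked concepts help with multi-topic questions, here is a toy graph traversal. The graph contents and the `related_concepts` helper are illustrative assumptions of mine; LightRAG builds and queries its graph very differently, by extracting entities and relations with an LLM.

```python
from collections import deque

# Toy concept graph; the edges are illustrative, not extracted by LightRAG.
CONCEPT_GRAPH: dict[str, list[str]] = {
    "neural networks": ["backpropagation", "activation functions"],
    "backpropagation": ["chain rule", "gradient descent"],
    "activation functions": [],
    "chain rule": [],
    "gradient descent": [],
}


def related_concepts(start: str, max_hops: int) -> list[str]:
    """Breadth-first search returning concepts within `max_hops` of `start`."""
    seen = {start}
    queue = deque([(start, 0)])
    found: list[str] = []
    while queue:
        concept, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in CONCEPT_GRAPH.get(concept, []):
            if neighbor not in seen:
                seen.add(neighbor)
                found.append(neighbor)
                queue.append((neighbor, hops + 1))
    return found


one_hop = related_concepts("neural networks", max_hops=1)
two_hops = related_concepts("neural networks", max_hops=2)
```

With one hop the search returns only the direct neighbors; with two it also reaches "chain rule" and "gradient descent", which is the kind of context a plain vector lookup on "neural networks" could miss.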

Frontend Architecture

The Next.js 16 frontend is built with:

  • App Router — File-based routing with React Server Components
  • Client-side state — React hooks for UI state management
  • Streaming responses — Real-time token-by-token LLM output display
  • Tailwind CSS — Utility-first styling and responsive design
  • Server Actions — Direct backend function calls without API endpoints

Key UI Components

| Component | Purpose |
| --- | --- |
| ChatInterface | Main conversation window |
| AgentSelector | Switch between tutor modes |
| KnowledgeGraph | Visual concept map |
| ProgressTracker | Learning analytics |
| NotebookPanel | Note-taking interface |

Backend API Structure

The FastAPI backend exposes RESTful endpoints:

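# Key endpoints (illustrative stubs; the request model fields are assumed)
from fastapi import APIRouter, UploadFile
from pydantic import BaseModel

router = APIRouter()


class ChatMessage(BaseModel):
    content: str  # assumed field: the learner's message text


class AgentRequest(BaseModel):
    task: str  # assumed field: what the agent is asked to do


@router.post("/chat")
async def chat(message: ChatMessage, session_id: str):
    """Main chat endpoint with streaming support (session_id is a query parameter)"""
    pass


@router.post("/agents/{agent_type}/invoke")
async def invoke_agent(agent_type: str, request: AgentRequest):
    """Direct agent communication"""
    pass


@router.get("/knowledge-graph/{concept}")
async def get_concept_graph(concept: str):
    """Retrieve related concepts"""
    pass


@router.post("/upload/document")
async def upload_document(file: UploadFile):
    """Process learning materials (file uploads require python-multipart)"""
    pass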
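The "streaming support" on the chat endpoint can be pictured as an async generator that emits one frame per token. This is a stdlib-only sketch under my own assumptions: `fake_llm_tokens` stands in for a real LLM client, and both the server-sent-events framing and the `[DONE]` sentinel are conventions I picked for illustration, not something confirmed from DeepTutor's code.

```python
import asyncio
import json
from collections.abc import AsyncIterator


async def fake_llm_tokens(prompt: str) -> AsyncIterator[str]:
    """Stand-in for an LLM client that yields tokens as they are generated."""
    for token in ["Neural", " networks", " learn", " weights."]:
        await asyncio.sleep(0)  # yield control, as a network read would
        yield token


async def sse_stream(prompt: str) -> AsyncIterator[str]:
    """Wrap each token in a server-sent-events frame, then signal completion."""
    async for token in fake_llm_tokens(prompt):
        payload = json.dumps({"token": token})
        yield f"data: {payload}\n\n"
    yield "data: [DONE]\n\n"


async def collect(prompt: str) -> list[str]:
    """Drain the stream into a list so the frames can be inspected."""
    return [frame async for frame in sse_stream(prompt)]


frames = asyncio.run(collect("Explain neural networks"))
```

In FastAPI, a generator like `sse_stream` can be handed to `StreamingResponse` with `media_type="text/event-stream"`, so the frontend can render tokens as they arrive instead of waiting for the full answer.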

RAG Pipeline Flow

The Retrieval-Augmented Generation pipeline works as follows:

  1. Document Ingestion

    • Users upload PDFs, markdown, or text
    • Documents chunked into semantic segments
    • Each chunk embedded and stored in FAISS
  2. Query Processing

    • User question embedded using selected model
    • Vector similarity search retrieves top-k chunks
    • LightRAG adds related concepts from knowledge graph
  3. Context Assembly

    • Retrieved chunks combined into context window
    • System prompt constructed with agent role
    • Previous conversation history appended
  4. Generation

    • LLM generates response with streaming
    • Output token-by-token to frontend
    • Response cached for future reference
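The four steps above can be sketched end to end in plain Python. Everything here is a simplified stand-in: fixed-size word windows replace semantic chunking, a bag-of-words counter replaces the embedding model, a sorted list replaces FAISS, and the function names are my own rather than DeepTutor's.

```python
import math
import re
from collections import Counter


def chunk(text: str, max_words: int = 8) -> list[str]:
    """Split text into fixed-size word windows (a stand-in for semantic chunking)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would call an embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def assemble_prompt(role: str, context: list[str], history: list[str], question: str) -> str:
    """Combine system role, retrieved context, and prior turns into one prompt."""
    return "\n\n".join([
        f"System: {role}",
        "Context:\n" + "\n".join(context),
        "History:\n" + "\n".join(history),
        f"User: {question}",
    ])


chunks = [
    "Gradient descent updates weights to reduce loss.",
    "Photosynthesis converts light into chemical energy.",
    "Backpropagation computes gradients for each weight.",
]
top = retrieve("How are weights updated by gradient descent?", chunks, k=1)
prompt = assemble_prompt("You are a patient tutor.", top, [], "How are weights updated?")
```

The resulting `prompt` string is what would be sent to the LLM in the generation step; the knowledge-graph enrichment from step 2 is left out here to keep the sketch short.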

Configuration & Deployment

Environment Variables

# LLM Configuration
OLLAMA_URL=http://localhost:11434
OLLAMA_MODEL=kimi-k2.5:cloud

# Embeddings
JINA_API_KEY=your_jina_key_here

# Storage
VECTOR_STORE_PATH=./data/vectors
KNOWLEDGE_GRAPH_PATH=./data/kg

# Optional: PostgreSQL for production
DATABASE_URL=postgresql://...
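A backend typically reads variables like these once at startup into a typed settings object. The sketch below shows one way to do that with the standard library; the `Settings` class and `load_settings` function are hypothetical, and using the values listed above as fallbacks is my assumption, not documented DeepTutor behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    ollama_url: str
    ollama_model: str
    jina_api_key: Optional[str]
    vector_store_path: str
    knowledge_graph_path: str
    database_url: Optional[str]  # None means "fall back to SQLite"


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping, defaulting to the process environment."""
    source = os.environ if env is None else env
    return Settings(
        ollama_url=source.get("OLLAMA_URL", "http://localhost:11434"),
        ollama_model=source.get("OLLAMA_MODEL", "kimi-k2.5:cloud"),
        jina_api_key=source.get("JINA_API_KEY"),
        vector_store_path=source.get("VECTOR_STORE_PATH", "./data/vectors"),
        knowledge_graph_path=source.get("KNOWLEDGE_GRAPH_PATH", "./data/kg"),
        database_url=source.get("DATABASE_URL"),
    )


# Passing an explicit mapping keeps the example independent of the real environment.
settings = load_settings({"OLLAMA_MODEL": "kimi-k2.5:cloud"})
```

Accepting an optional mapping makes the loader easy to test, and the frozen dataclass prevents accidental mutation of configuration at runtime.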

Local Development Setup

# Clone repository
git clone https://github.com/HKUDS/DeepTutor.git
cd DeepTutor

# Backend setup
cd backend
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
uvicorn main:app --host 0.0.0.0 --port 8001  # Keeps running in the foreground

# Frontend setup (in a second terminal, from the DeepTutor directory)
cd frontend
npm install
npm run build
npm start  # Production server on port 3000

Architectural Strengths

  1. Modular Agent Design — Easy to extend or customize individual agents
  2. Multiple Embedding Sources — Vendor flexibility and cost optimization
  3. Knowledge Graph Memory — Concepts linked, not isolated
  4. Streaming UX — Real-time response improves perceived performance
  5. Local LLM Support — Privacy-preserving option via Ollama

Areas for Improvement

  1. No Authentication — Currently no user auth system
  2. Limited Persistence — Session-based, no long-term progress tracking
  3. Single Knowledge Graph — No multi-tenant isolation
  4. No Rate Limiting — API endpoints unprotected
  5. Deployment Complexity — Multiple services require orchestration

Comparison with Similar Systems

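| Feature | DeepTutor | Khanmigo | Duolingo Max |
| --- | --- | --- | --- |
| Open Source | Yes | | |
| Multi-Agent | Yes | | |
| Knowledge Graph | Yes | | |
| Local LLM | Yes | | |
| Cost | Usage-based | Free/Paid | Subscription |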

Key Takeaways

DeepTutor represents a sophisticated approach to AI-powered education. Its architecture demonstrates:

  • Separation of concerns with specialized agents
  • Flexible embedding layer for different use cases
  • Knowledge graph integration going beyond simple RAG
  • Modern frontend choices with Next.js App Router

For engineering teams building similar systems, DeepTutor offers valuable patterns: the dual-loop design, multi-agent orchestration, and pluggable embedding adapters are all concepts worth adapting.

The codebase is surprisingly mature for an academic project — well-structured, documented, and production-ready for personal or small-scale educational deployments.

Resources

  • GitHub Repository: github.com/HKUDS/DeepTutor
  • Paper: "DeepTutor: Towards Interactive AI Tutoring via Hierarchical Cooperative Multi-Agent" (arXiv)
  • Demo: Available via the repository's README

Have you experimented with AI tutoring systems? I'd love to hear your thoughts on this architecture or similar approaches you've seen in production.