In my recent exploration of AI-powered educational tools, I came across DeepTutor — an open-source personalized learning assistant from HKUDS (the Data Science Lab at The University of Hong Kong). After setting it up locally and digging into its codebase, I wanted to share my architectural insights and analysis of what makes this system tick.
What is DeepTutor?
DeepTutor is an AI-powered personalized learning platform that combines multiple AI agents to create an adaptive tutoring experience. It's built on a modern full-stack architecture: FastAPI backend, Next.js 16 frontend, and a sophisticated multi-agent system powered by Large Language Models (LLMs).
System Architecture Overview
The Dual-Loop Agent System
At the heart of DeepTutor lies its Dual-Loop Agent System — a clever architectural pattern that separates learning into two distinct but interconnected phases:
- Analysis Loop — Understands the learner's current state, knowledge gaps, and learning style
- Solve Loop — Generates personalized content, exercises, and explanations based on the analysis
This separation allows the system to first deeply understand the problem space before attempting to solve it, resulting in more targeted and effective tutoring.
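The dual-loop idea can be sketched in a few lines of Python. This is a hypothetical, minimal illustration of the pattern, not DeepTutor's actual implementation — the `LearnerState`, `analysis_loop`, and `solve_loop` names are mine, standing in for the real agents:

```python
from dataclasses import dataclass, field

@dataclass
class LearnerState:
    known_topics: set = field(default_factory=set)
    gaps: list = field(default_factory=list)

def analysis_loop(question: str, state: LearnerState) -> dict:
    # Phase 1 (hypothetical): compare the question against what the
    # learner already knows and record any detected knowledge gap
    topic = question.split()[0].lower()
    is_gap = topic not in state.known_topics
    if is_gap:
        state.gaps.append(topic)
    return {"topic": topic, "is_gap": is_gap}

def solve_loop(analysis: dict) -> str:
    # Phase 2 (hypothetical): tailor the response to the analysis result
    if analysis["is_gap"]:
        return f"Let's start with the basics of {analysis['topic']}."
    return f"Here is an advanced take on {analysis['topic']}."

def tutor(question: str, state: LearnerState) -> str:
    # Analysis always runs to completion before solving begins
    return solve_loop(analysis_loop(question, state))
```

The key property the real system shares with this toy: the solve phase never sees the raw question alone, only the question plus the analysis of the learner's state.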
Technology Stack
| Layer | Technology |
|---|---|
| Frontend | Next.js 16, React 19, Tailwind CSS |
| Backend | FastAPI, Uvicorn, Python 3.11+ |
| LLM | kimi-k2.5:cloud (via Ollama) |
| Embeddings | Jina AI (jina-embeddings-v3) |
| Database | SQLite (with optional PostgreSQL) |
| Knowledge Graph | LightRAG |
| Vector Store | FAISS |
The codebase spans some 53,646 lines of Python/YAML across 222 Python files — substantial for an educational AI system.
Multi-Agent Architecture
DeepTutor implements seven specialized agents, each with a distinct responsibility:
1. Solver Agent
The core problem-solving engine. It analyzes questions and generates step-by-step explanations. Uses chain-of-thought prompting to show its reasoning.
2. Researcher Agent
Gathers and synthesizes information from the knowledge base. Implements RAG (Retrieval-Augmented Generation) to provide contextually relevant information.
3. Question Generator
Creates personalized practice questions based on learning objectives and detected knowledge gaps.
4. Co-Writer Agent
Assists with writing tasks, code reviews, and structured content creation.
5. Notebook Agent
Manages learning materials, notes, and references. Organizes content for later review.
6. Guide Agent
Provides navigation and learning path recommendations. Suggests next topics and prerequisite reviews.
7. Planner Agent
Creates personalized learning schedules and tracks progress over time.
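A common way to wire up this kind of multi-agent split is a registry that maps agent names to handlers, which is also what an endpoint like `/agents/{agent_type}/invoke` implies. The sketch below is my own minimal version of that pattern, not DeepTutor's code; the `register`/`invoke` names are assumptions:

```python
from typing import Callable, Dict

# Global registry mapping agent names to their handler functions
AGENTS: Dict[str, Callable[[str], str]] = {}

def register(name: str):
    # Decorator that adds an agent handler to the registry
    def wrap(fn: Callable[[str], str]) -> Callable[[str], str]:
        AGENTS[name] = fn
        return fn
    return wrap

@register("solver")
def solver(task: str) -> str:
    # Stand-in for the Solver Agent's chain-of-thought answer
    return f"[solver] step-by-step answer for: {task}"

@register("planner")
def planner(task: str) -> str:
    # Stand-in for the Planner Agent's schedule output
    return f"[planner] schedule for: {task}"

def invoke(agent: str, task: str) -> str:
    # Dispatch a task to the named agent, failing loudly on unknown names
    if agent not in AGENTS:
        raise ValueError(f"unknown agent: {agent}")
    return AGENTS[agent](task)
```

Adding an eighth agent then means registering one more handler, with no changes to the dispatch path — the modularity the seven-agent design relies on.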
The Embedding Layer
DeepTutor's flexibility comes from its seven embedding adapters:
- Jina AI (default, cloud-based)
- OpenAI (text-embedding-3)
- Cohere (embed-english-v3)
- Ollama (local embeddings)
- HuggingFace (sentence-transformers)
- Google (text-embedding)
- Azure OpenAI
This abstraction allows users to choose between cost, privacy, and performance trade-offs based on their needs.
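The adapter pattern behind this is straightforward: a common interface plus one implementation per provider. Here is a minimal sketch, assuming an interface shape of my own invention (`EmbeddingAdapter`, `get_embedder`); the hash-based "local" backend is a deterministic stand-in for a real Ollama or HuggingFace call:

```python
from abc import ABC, abstractmethod
from typing import List
import hashlib

class EmbeddingAdapter(ABC):
    # Common interface every provider adapter must satisfy
    @abstractmethod
    def embed(self, texts: List[str]) -> List[List[float]]: ...

class LocalHashEmbedder(EmbeddingAdapter):
    # Stand-in "local" backend: deterministic pseudo-embeddings from a
    # SHA-256 hash; a real adapter would call Jina, OpenAI, Ollama, etc.
    def __init__(self, dim: int = 8):
        self.dim = dim

    def embed(self, texts: List[str]) -> List[List[float]]:
        out = []
        for t in texts:
            digest = hashlib.sha256(t.encode()).digest()
            out.append([b / 255 for b in digest[: self.dim]])
        return out

def get_embedder(provider: str) -> EmbeddingAdapter:
    # Factory mirroring the pluggable-adapter idea; only the stub is wired here
    if provider == "local":
        return LocalHashEmbedder()
    raise NotImplementedError(provider)
```

Swapping Jina for a local model then becomes a one-line configuration change rather than a code change, which is exactly the cost/privacy/performance trade-off the adapter layer enables.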
Knowledge Graph Integration
The system integrates LightRAG (Lightweight Retrieval-Augmented Generation) for knowledge graph construction:
```python
# Conceptual implementation
from lightrag import LightRAG

rag = LightRAG(
    embedding_model="jina-embeddings-v3",
    llm_model_func=kimi_llm_handler,
    working_dir="./knowledge_graph"
)

# Insert learning materials
rag.insert(text=learning_content)

# Query with context
result = rag.query(
    query="Explain neural networks",
    mode="hybrid"  # Combines vector + knowledge graph search
)
```

This approach creates connections between concepts, enabling the system to answer questions that require understanding multiple related topics.
Frontend Architecture
The Next.js 16 frontend is built with:
- App Router — File-based routing with React Server Components
- Client-side state — React hooks for UI state management
- Streaming responses — Real-time token-by-token LLM output display
- Tailwind CSS — Utility-first styling and responsive design
- Server Actions — Direct backend function calls without API endpoints
Key UI Components
| Component | Purpose |
|---|---|
| ChatInterface | Main conversation window |
| AgentSelector | Switch between tutor modes |
| KnowledgeGraph | Visual concept map |
| ProgressTracker | Learning analytics |
| NotebookPanel | Note-taking interface |
Backend API Structure
The FastAPI backend exposes RESTful endpoints:
```python
# Key endpoints
@router.post("/chat")
async def chat(message: ChatMessage, session_id: str):
    """Main chat endpoint with streaming support"""
    pass

@router.post("/agents/{agent_type}/invoke")
async def invoke_agent(agent_type: str, request: AgentRequest):
    """Direct agent communication"""
    pass

@router.get("/knowledge-graph/{concept}")
async def get_concept_graph(concept: str):
    """Retrieve related concepts"""
    pass

@router.post("/upload/document")
async def upload_document(file: UploadFile):
    """Process learning materials"""
    pass
```

RAG Pipeline Flow
The Retrieval-Augmented Generation pipeline works as follows:
1. Document Ingestion
   - Users upload PDFs, markdown, or text
   - Documents are chunked into semantic segments
   - Each chunk is embedded and stored in FAISS
2. Query Processing
   - The user's question is embedded using the selected model
   - Vector similarity search retrieves the top-k chunks
   - LightRAG adds related concepts from the knowledge graph
3. Context Assembly
   - Retrieved chunks are combined into the context window
   - A system prompt is constructed with the agent's role
   - Previous conversation history is appended
4. Generation
   - The LLM generates the response with streaming
   - Output is sent token by token to the frontend
   - The response is cached for future reference
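The ingestion and retrieval steps above can be sketched end to end in plain Python. This is a toy illustration under heavy simplifying assumptions: fixed-size chunking instead of semantic segmentation, a bag-of-letters "embedding" instead of Jina vectors, and a linear scan instead of a FAISS index — only the pipeline shape matches:

```python
import math
from typing import List

def chunk(text: str, size: int = 40) -> List[str]:
    # Step 1 (toy): fixed-size chunking; the real system splits on semantics
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(text: str) -> List[float]:
    # Toy bag-of-letters vector standing in for a real embedding model
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - 97] += 1
    return vec

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: List[str], k: int = 2) -> List[str]:
    # Steps 2-3: embed the query, rank all chunks by similarity,
    # and return the top-k as the assembled context
    qv = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(qv, embed(c)), reverse=True)
    return ranked[:k]
```

In the real pipeline, step 3 would additionally merge in graph neighbors from LightRAG before the context is handed to the LLM for step 4.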
Configuration & Deployment
Environment Variables
```bash
# LLM Configuration
OLLAMA_URL=http://localhost:11434
OLLAMA_MODEL=kimi-k2.5:cloud

# Embeddings
JINA_API_KEY=your_jina_key_here

# Storage
VECTOR_STORE_PATH=./data/vectors
KNOWLEDGE_GRAPH_PATH=./data/kg

# Optional: PostgreSQL for production
DATABASE_URL=postgresql://...
```

Local Development Setup
```bash
# Clone repository
git clone https://github.com/HKUDS/DeepTutor.git
cd DeepTutor

# Backend setup
cd backend
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
uvicorn main:app --host 0.0.0.0 --port 8001

# Frontend setup
cd ../frontend
npm install
npm run build
npm start  # Production server on port 3000
```

Architectural Strengths
- Modular Agent Design — Easy to extend or customize individual agents
- Multiple Embedding Sources — Vendor flexibility and cost optimization
- Knowledge Graph Memory — Concepts linked, not isolated
- Streaming UX — Real-time response improves perceived performance
- Local LLM Support — Privacy-preserving option via Ollama
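The "Streaming UX" strength is worth a concrete sketch. A common way to stream tokens from a FastAPI backend is Server-Sent Events: a generator yields SSE-framed chunks that can be handed to `fastapi.responses.StreamingResponse`. The generator below is my own minimal version, with `fake_llm_tokens` standing in for real model output:

```python
from typing import Iterator

def fake_llm_tokens(answer: str) -> Iterator[str]:
    # Stand-in for the model's incremental token output
    for word in answer.split():
        yield word + " "

def sse_stream(answer: str) -> Iterator[str]:
    # Wrap each token in Server-Sent Events framing ("data: ...\n\n");
    # in a FastAPI app this generator would be passed to StreamingResponse
    # with media_type="text/event-stream"
    for tok in fake_llm_tokens(answer):
        yield f"data: {tok}\n\n"
    yield "data: [DONE]\n\n"
```

The frontend can then render each `data:` event as it arrives, which is what makes the tutor feel responsive even when the full answer takes seconds to generate.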
Areas for Improvement
- No Authentication — Currently no user auth system
- Limited Persistence — Session-based, no long-term progress tracking
- Single Knowledge Graph — No multi-tenant isolation
- No Rate Limiting — API endpoints unprotected
- Deployment Complexity — Multiple services require orchestration
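Some of these gaps are cheap to close. Rate limiting, for instance, needs little more than a token bucket per client; the class below is a minimal sketch (my own, not from the codebase) that could be keyed by client IP inside a FastAPI `@app.middleware("http")` hook:

```python
import time

class TokenBucket:
    # Minimal in-memory rate limiter: `capacity` sets the burst size,
    # `rate` sets the steady-state requests-per-second refill
    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill proportionally to elapsed time, then try to spend one token
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A production deployment would want this state in Redis rather than process memory, but even the in-memory version protects the LLM endpoints from accidental hammering.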
Comparison with Similar Systems
| Feature | DeepTutor | Khanmigo | Duolingo Max |
|---|---|---|---|
| Open Source | ✅ | ❌ | ❌ |
| Multi-Agent | ✅ | ❌ | ❌ |
| Knowledge Graph | ✅ | ❌ | ❌ |
| Local LLM | ✅ | ❌ | ❌ |
| Cost | Usage-based | Free/Paid | Subscription |
Key Takeaways
DeepTutor represents a sophisticated approach to AI-powered education. Its architecture demonstrates:
- Separation of concerns with specialized agents
- Flexible embedding layer for different use cases
- Knowledge graph integration going beyond simple RAG
- Modern frontend choices with Next.js App Router
For engineering teams building similar systems, DeepTutor offers valuable patterns: the dual-loop design, multi-agent orchestration, and pluggable embedding adapters are all concepts worth adapting.
The codebase is surprisingly mature for an academic project — well-structured, documented, and, for personal or small-scale educational deployments, close to production-ready.
Resources
- GitHub Repository: github.com/HKUDS/DeepTutor
- Paper: "DeepTutor: Towards Interactive AI Tutoring via Hierarchical Cooperative Multi-Agent" (arXiv)
- Demo: Available via the repository's README
Have you experimented with AI tutoring systems? I'd love to hear your thoughts on this architecture or similar approaches you've seen in production.