| Documentation | Quickstart | API Guide | Adaptive Hops | Security | Changelog |
Production-Ready AI Research Agent with OSINT & Multi-Hop Reasoning

<div align="left">Current Version: 1.4.6 (Security Fixes)</div>
We welcome your ideas, bug reports, and feature requests!
A fully local, production-ready AI system with advanced intelligence features:
- REST API endpoints (/query, /plugins, /stats, /health)
- setup.bat / setup.sh with auto-configuration
- Search operators: site:, inurl:, intext:, filetype:, email:, phone:, ip:
- Memory store with clear command; stores emails/phones/IPs/usernames/domains/notes
- forget command
Cloud LLM & Provider-Based Config:

- config.json.example created during setup

Adaptive Agent Hopping System
Complete English Translation:
Major Changes:
- forget command
- run_api.bat / run_api.sh for quick FastAPI server startup

Forget Command Syntax:
forget email:test@example.com # Delete specific email
forget phone:+491234567890 # Delete phone number
forget ip:192.168.1.1 # Delete IP address
forget username:johndoe # Delete username
forget category:emails # Delete all emails
forget category:phones # Delete all phone numbers
forget all:true # Delete entire memory store
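The remember/recall/forget life cycle can be sketched as a tiny in-memory store. This is an illustrative simplification, not CrawlLama's actual implementation; the assumption that category names like "emails" are the plural of the operator ("email") is mine:

```python
class MemoryStore:
    """Toy in-memory store mirroring remember/recall/forget (illustrative only)."""

    def __init__(self):
        self.data = {}  # category -> set of stored values

    def remember(self, category, value):
        self.data.setdefault(category, set()).add(value)

    def recall(self, category):
        return sorted(self.data.get(category, set()))

    def forget(self, category=None, value=None):
        if category is None:          # forget all:true
            self.data.clear()
        elif value is None:           # forget category:emails
            self.data.pop(category, None)
        else:                         # forget email:test@example.com
            self.data.get(category, set()).discard(value)
```

Here `forget("emails", "test@example.com")` corresponds to `forget email:test@example.com`, and `forget()` with no arguments to `forget all:true`.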
Start API Server:
# Windows
run_api.bat
# Linux/macOS
./run_api.sh
# Or manually
python app.py
Then open in browser: http://localhost:8000/docs
The integrated health module offers a unified dashboard with two modes:
# Windows
health-dashboard.bat
# Linux/macOS
./health-dashboard.sh
# Directly with Python (Interactive Menu)
python health-dashboard.py
# Directly to Live Monitor
python health-dashboard.py --monitor
# Directly to Test Dashboard
python health-dashboard.py --tests
Real-time monitoring with rich terminal UI:
Tkinter-based GUI for test management:
See: Health Monitoring Guide for details and programmatic usage
OSINT Usage:
# Email intelligence
email:test@example.com
# Phone intelligence
phone:"+49 151 12345678"
# IP intelligence
ip:8.8.8.8
# Batch processing (NEW in v1.4.1!)
email:test@example.com user@domain.com admin@site.com
phone:+491234567890 +441234567890 +331234567890
# Memory Store (NEW in v1.4.2!)
remember email:test@example.com # Store email
recall emails # Retrieve all emails
forget email:test@example.com # Delete specific email
forget category:emails # Delete all emails
forget all:true # Delete entire memory store
# Advanced search
site:github.com inurl:python filetype:md
# Combined operators
email:john@example.com site:linkedin.com inurl:profile
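Operator-based queries like the ones above can be separated into structured operators and free text with a small parser. This is a hedged sketch of how such parsing might work, not CrawlLama's actual query parser:

```python
import re

# Operators supported in the examples above
OPERATORS = ("site", "inurl", "intext", "filetype", "email", "phone", "ip", "username")

def split_operators(query):
    """Split a query string into ({operator: [values]}, free_text).

    Quoted values (e.g. phone:"+49 151 12345678") are kept intact.
    """
    ops, free = {}, []
    for token in re.findall(r'\S+:"[^"]*"|\S+', query):
        key, sep, value = token.partition(":")
        if sep and key in OPERATORS:
            ops.setdefault(key, []).append(value.strip('"'))
        else:
            free.append(token)
    return ops, " ".join(free)
```

For example, `split_operators('site:github.com inurl:python filetype:md')` yields three operator entries and no free text.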
| See: OSINT Usage Guide | OSINT Module README |
Real-time monitoring with rich terminal UI displaying system metrics, component health, and performance tracking.

CrawlLama's adaptive intelligence system with automatic agent selection and interactive commands.

Tkinter-based test management interface with automatic test detection and real-time progress tracking.

Pre-built Releases (recommended for quick start):
| Version | Download | VirusTotal Check |
|---|---|---|
| v1.4 Preview | Crawllama-1.4-preview.zip | VirusTotal Scan |
All downloads are virus-free; VirusTotal scans confirm no malware.

Plug & Play: simply extract and start (Ollama + Python required)
Windows:
# Extract the release (e.g. to C:\Crawllama)
ollama serve
ollama pull qwen3:4b
setup.bat
run.bat
Linux/macOS:
wget https://github.com/arn-c0de/Crawllama/releases/download/v.1.4_Preview/Crawllama-1.4-preview.zip
unzip Crawllama-1.4-preview.zip
cd Crawllama-1.4
curl -fsSL https://ollama.ai/install.sh | sh
ollama serve &
ollama pull qwen3:4b
chmod +x setup.sh run.sh
./setup.sh
./run.sh
Windows:
setup.bat
Linux/macOS:
chmod +x setup.sh
./setup.sh
Note: During the initial setup, you must select at least one LLM model. If a model is already installed, you can skip this step; otherwise, a selection is required to avoid errors in the test program.
The setup script:
- Copies .env.example to .env

Note for initial installation:
When running pip install -r requirements.txt for the first time within the newly created virtual environment, installing all dependencies (especially packages like torch, sentence-transformers, and scientific libraries) may take 5-10 minutes or longer, depending on connection and hardware. Please wait until the process completes; afterward, the virtual environment is ready for use.
Note on disk space: After installation (including venv), the project typically requires about 1.2-1.5 GB of free disk space (v1.4: ~1.23 GB). This value may vary significantly depending on the operating system, Python packages (e.g., larger PyTorch/CUDA wheels), and additional models. Plan for ample additional space if storage is limited.
Model download sizes (approximate):
- qwen3:4b: ~2-4 GB (depending on format/quantization)
- qwen3:8b: ~8-12 GB
- deepseek-r1:8b: ~6-10 GB
- llama3:7b: ~6-9 GB
- mistral:7b: ~4-8 GB
- phi3:14b: ~12-20+ GB

Note: Model sizes vary significantly depending on the provider, format (FP16, INT8 quantization, etc.), and additional assets. Quantized models (e.g., INT8) can significantly reduce size, while FP32/FP16 or models with additional tokenizer/vocab files require more space. Plan for sufficient additional storage if using larger models or multiple models simultaneously.
Prerequisites:
Windows - Step by Step:
# 1. Clone repository
git clone https://github.com/arn-c0de/Crawllama.git
cd Crawllama
# 2. Create virtual environment
python -m venv venv
venv\Scripts\activate
# 3. Install dependencies (takes 5-10 min)
pip install -r requirements.txt
# 4. Create directories
mkdir data\cache data\embeddings data\history logs plugins
# 5. Configuration
copy .env.example .env
notepad .env # Optional: Add API keys
# 6. Start Ollama (separate terminal)
ollama serve
# 7. Load model (separate terminal)
ollama pull qwen3:4b
# 8. Start Crawllama
python main.py --interactive
Linux/macOS - Step by Step:
# 1. Clone repository
git clone https://github.com/arn-c0de/Crawllama.git
cd Crawllama
# 2. Create virtual environment
python3 -m venv venv
source venv/bin/activate
# 3. Install dependencies (takes 5-10 min)
pip install -r requirements.txt
# 4. Create directories
mkdir -p data/cache data/embeddings data/history logs plugins
# 5. Configuration
cp .env.example .env
nano .env # Optional: Add API keys
# 6. Install and start Ollama
curl -fsSL https://ollama.ai/install.sh | sh
ollama serve &
# 7. Load model
ollama pull qwen3:4b
# 8. Start Crawllama
python main.py --interactive
Troubleshooting Installation:
| Problem | Solution |
|---|---|
| python not found | Install Python 3.10+: python.org |
| pip install fails | Run python -m pip install --upgrade pip |
| ollama: command not found | Install Ollama: ollama.ai/download |
| Connection refused (Ollama) | Start Ollama: ollama serve |
| ModuleNotFoundError | Activate virtual environment: venv\Scripts\activate (Win) or source venv/bin/activate (Linux) |
| Disk space full | Ensure at least 5 GB free for venv + model |
# 1. Clone
git clone https://github.com/arn-c0de/Crawllama.git
cd Crawllama
# 2. Virtual Environment
python -m venv venv
source venv/bin/activate # Linux/macOS
venv\Scripts\activate # Windows
# 3. Dependencies
pip install -r requirements.txt
# 4. Directories
mkdir -p data/cache data/embeddings data/history logs plugins
# 5. Config
cp .env.example .env
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh # Linux/macOS
# or from https://ollama.ai/download # Windows
# Start Ollama
ollama serve
# Load model
ollama pull qwen3:4b
# Alternative: deepseek-r1:8b, llama3:7b, mistral
Note:
The first start may take significantly longer than subsequent starts!
Initialization, dependency installation, and model downloads may take several minutes, depending on hardware and internet connection.
After the first successful start, all subsequent starts are significantly faster.
python main.py --interactive
# Or with setup script
run.bat # Windows
./run.sh # Linux/macOS
╭───────────────────────────────────────────────────╮
│ CrawlLama - Local Search and Response Agent       │
│ Commands:                                         │
│   clear       - Reset session (history + cache)   │
│   clear-cache - Clear cache only                  │
│   save        - Manually save session             │
│   load        - Reload session                    │
│   stats       - Display statistics                │
│   status      - Show context usage                │
│   settings    - Show/edit settings                │
│   restart     - Restart agent (reload config)     │
│   exit, quit  - Exit                              │
╰───────────────────────────────────────────────────╯
❯ What is Machine Learning?
New Commands:
status - Shows token usage and available context capacity
❯ status

Context Usage Tracker
┏━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━┓
┃ Source           ┃ Tokens  ┃ Share  ┃
┡━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━┩
│ Conversation     │ 850     │ 8.5%   │
│ Search Results   │ 320     │ 3.2%   │
│ Total Used       │ 1,170   │ 11.7%  │
│ Available        │ 8,830   │ 88.3%  │
│ Maximum          │ 10,000  │ 100%   │
└──────────────────┴─────────┴────────┘
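The shares in a tracker like this are simply per-source token counts divided by the maximum context size. As a sketch (assuming a 10,000-token maximum, as in the example output):

```python
def context_usage(sources, maximum=10_000):
    """Compute (tokens, share) rows from per-source token counts.

    `sources` maps source names (e.g. "Conversation") to token counts.
    Illustrative only; not CrawlLama's actual tracker code.
    """
    total = sum(sources.values())
    rows = {name: (tokens, tokens / maximum) for name, tokens in sources.items()}
    rows["Total Used"] = (total, total / maximum)
    rows["Available"] = (maximum - total, (maximum - total) / maximum)
    return rows
```

With the example's numbers (850 conversation tokens, 320 search-result tokens), this reproduces the 11.7% used / 88.3% available split shown above.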
settings - Interactive configuration editor
❯ settings

Displays all settings and allows:
• Category selection (llm, search, rag, cache, osint, all)
• Change LLM model (qwen3:8b, deepseek-r1:8b, etc.)
• Adjust temperature (0.0-1.0)
• Configure max tokens (now 16,000 for RTX 3080+)
• Change search region (de-de, us-en, wt-wt)
• Configure OSINT max results & rate limits
• Enable/disable RAG
• Enable/disable cache
• Save changes directly to config.json
• Auto-restart after saving (optional)
restart - Restart agent
❯ restart

• Reloads config.json
• Fully reinitializes agent
• Optional session preservation
• No session interruption
# Windows
health-dashboard.bat
# Linux/macOS
python health-dashboard.py
The dashboard displays:
Interactive commands:
- r - Refresh (manual)
- c - Clear error log
- t - Run component tests
- q - Quit

The agent automatically decides when and how to search:
❯ Who is the current German Chancellor?

1. LLM analyzes: "Requires current info"
2. Agent performs web search
3. LLM processes search results
4. Agent delivers up-to-date response
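Step 1 of this flow is a freshness decision. The real agent delegates it to the LLM; as a rough, purely illustrative stand-in, a keyword heuristic conveys the idea:

```python
# Hypothetical hint words; the actual agent uses the LLM, not a keyword list.
CURRENT_INFO_HINTS = ("current", "today", "latest", "now", "this year")

def needs_web_search(query):
    """Crude sketch of the 'requires current info?' decision in step 1."""
    q = query.lower()
    return any(hint in q for hint in CURRENT_INFO_HINTS)
```

For example, "Who is the current German Chancellor?" triggers a search, while "Explain photosynthesis" is answered from model knowledge alone.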
OSINT Search Operators:
# Domain-specific search
❯ site:github.com machine learning
# Email Intelligence
❯ email:john.doe@company.com
# Phone Intelligence
❯ phone:"+49 151 12345678"
# IP Intelligence (NEW!)
❯ ip:8.8.8.8
❯ 192.168.1.1 # Auto-detects as IP
# Social Media Intelligence (12 Platforms)
❯ username:elonmusk
❯ @microsoft
❯ github # Auto-detects as username
# File format search
❯ site:example.com filetype:pdf
# URL filter
❯ inurl:documentation python
# Text in content
❯ intext:"contact email" site:example.com
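The auto-detection shown for bare tokens (IP addresses, @-handles, usernames) can be sketched with the standard library's ipaddress module. This is a simplified guess at the detection order, not the module's actual logic:

```python
import ipaddress

def detect_query_type(token):
    """Guess the OSINT type of a bare token (illustrative sketch)."""
    if "@" in token[1:]:            # user@domain -> email
        return "email"
    if token.startswith("@"):       # @handle -> social media username
        return "username"
    try:
        ipaddress.ip_address(token) # valid IPv4/IPv6 literal
        return "ip"
    except ValueError:
        pass
    return "username"               # fall back, as with "github" above
```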
Combined Searches:
# Multiple operators
❯ site:linkedin.com inurl:profile "software engineer"
# Exclusion with minus
❯ python programming -java
# OR conjunction
❯ site:github.com OR site:gitlab.com "machine learning"
See OSINT Usage Guide for all features.
# Standard query (agent decides automatically if web search is needed)
python main.py "What is Python?"
# Multi-Hop Reasoning (for complex queries)
python main.py --multihop "Compare Python and JavaScript for web development"
# Offline mode (no web search, only LLM knowledge)
python main.py --no-web "Explain photosynthesis"
# OSINT search with search operators
python main.py "site:github.com python projects"
python main.py "email:contact@example.com"
# With specific model
python main.py --model llama3:7b "What did Einstein discover?"
# Start server
python app.py
# Or with starter scripts
run_api.bat # Windows
./run_api.sh # Linux/macOS
# Or manually
uvicorn app:app --host 0.0.0.0 --port 8000
API Documentation: http://localhost:8000/docs
Available Endpoints:
Query & Reasoning:
- POST /query - Execute standard or multi-hop queries
- POST /osint/query - OSINT queries with operators (email:, phone:, ip:, etc.)

Memory Store (CRUD):
- GET /memory - Retrieve all stored entries
- POST /memory/remember - Store value (email, phone, ip, username, domain, note)
- GET /memory/recall/{category} - Retrieve category (emails, phones, ips, etc.)
- DELETE /memory/forget - Delete individual values, categories, or everything
- GET /memory/stats - Memory store statistics

Session Management:
- POST /session/clear - Reset session
- POST /session/save - Save session
- POST /session/load - Load session

Cache:
- POST /cache/clear - Clear cache
- GET /cache/stats - Cache statistics

Configuration:
- GET /config - Retrieve current configuration
- PATCH /config - Modify configuration (llm, search, rag, cache, osint)
- GET /context/status - Token usage & context status

Plugins & Tools:
- GET /plugins - List available plugins
- POST /plugins/{name}/load - Load plugin
- POST /plugins/{name}/unload - Unload plugin
- GET /tools - List available tools

System:
- GET /health - Health check (agent, monitoring, components)
- GET /stats - System statistics (agent stats, resources, performance)
- GET /security-info - Security configuration (rate limits, features)

API Security (v1.4.2+):
The API is protected by default with multiple security features:
Setup:
# 1. Set API key in .env
CRAWLLAMA_API_KEY=your_secure_api_key_min_32_chars
# 2. For local development (without API key)
CRAWLLAMA_DEV_MODE=true
# 3. Adjust rate limit (optional)
RATE_LIMIT=100
# 4. Configure CORS origins (optional)
ALLOWED_ORIGINS=http://localhost:3000,http://localhost:8080
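A server reading these variables might validate them roughly as follows. This is a hedged sketch of plausible semantics (key length, dev-mode bypass, comma-separated origins), not CrawlLama's actual startup code:

```python
def load_security_settings(env):
    """Parse the security env vars sketched above from a mapping like os.environ."""
    dev_mode = env.get("CRAWLLAMA_DEV_MODE", "false").lower() == "true"
    api_key = env.get("CRAWLLAMA_API_KEY", "")
    if not dev_mode and len(api_key) < 32:
        # Docs require a key of at least 32 characters outside dev mode
        raise ValueError("CRAWLLAMA_API_KEY must be at least 32 characters")
    return {
        "dev_mode": dev_mode,
        "api_key": api_key,
        "rate_limit": int(env.get("RATE_LIMIT", "100")),
        "allowed_origins": [o for o in env.get("ALLOWED_ORIGINS", "").split(",") if o],
    }
```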
Usage with API Key:
# With API key header
curl -X POST http://localhost:8000/query \
-H "Content-Type: application/json" \
-H "X-API-Key: your_api_key_here" \
-d '{"query": "test"}'
# Or in dev mode (without API key)
export CRAWLLAMA_DEV_MODE=true
python app.py
Example Requests:
# Standard query (agent uses web search automatically if needed)
curl -X POST http://localhost:8000/query \
-H "Content-Type: application/json" \
-d '{
"query": "What is Machine Learning?",
"use_multihop": false
}'
# Multi-hop query (for complex analyses)
curl -X POST http://localhost:8000/query \
-H "Content-Type: application/json" \
-d '{
"query": "Compare Python and JavaScript",
"use_multihop": true,
"max_hops": 3
}'
# OSINT search with search operators
curl -X POST http://localhost:8000/query \
-H "Content-Type: application/json" \
-d '{
"query": "site:github.com python machine-learning",
"use_multihop": false
}'
# Retrieve statistics
curl http://localhost:8000/stats
# List plugins
curl http://localhost:8000/plugins
# Load plugin
curl -X POST http://localhost:8000/plugins/example_plugin/load
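The curl requests above translate directly to Python. A small helper that builds the headers and JSON body for POST /query might look like this (a sketch mirroring the documented fields, not an official client):

```python
import json

def build_query_request(query, api_key=None, use_multihop=False, max_hops=3):
    """Build (headers, body) for POST /query, mirroring the curl examples."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["X-API-Key"] = api_key
    body = {"query": query, "use_multihop": use_multihop}
    if use_multihop:
        body["max_hops"] = max_hops
    return headers, json.dumps(body)
```

The returned pair can be passed to any HTTP client (requests, httpx, urllib) against http://localhost:8000/query.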
| Option | Description |
|---|---|
| --interactive | Interactive mode |
| --debug | Enable debug logging |
| --no-web | Offline mode (no web search) |
| --model MODEL | Choose Ollama model |
| --stats | Display system statistics |
| --clear-cache | Clear cache |
| Option | Description |
|---|---|
| --multihop | Enable multi-hop reasoning |
| --max-hops N | Max reasoning steps (1-5) |
| --api | Start API server |
| --plugins | List available plugins |
| --load-plugin NAME | Load plugin |
| --help-extended | Show extended help |
| --examples | Show usage examples |
| --setup-keys | Securely set up API keys |
| Command | Description |
|---|---|
| exit, quit | Exit program |
| clear | Clear screen |
| stats | Display statistics |
| help | Show help |
CrawlLama provides a complete REST API for integration into custom applications.
Windows:
run_api.bat
Linux/macOS:
./run_api.sh
Or manually:
uvicorn app:app --host 0.0.0.0 --port 8000
1. Start API Server
run_api.bat
2. Open API Documentation
3. Send Query
curl -X POST http://localhost:8000/query \
-H "X-API-Key: your-key" \
-H "Content-Type: application/json" \
-d '{"query": "What is Python?", "use_tools": false}'
- POST /query - Execute queries (with/without web search, multi-hop)
- GET /health - Health check
- GET /stats - System statistics
- POST /memory/remember - Store data (OSINT)
- GET /memory/recall/{category} - Retrieve data
- GET /plugins - Manage plugins
- POST /cache/clear - Clear cache

Set API key in .env:
CRAWLLAMA_API_KEY=your-secret-key-here
Or for testing:
CRAWLLAMA_DEV_MODE=true
API Usage Guide - Complete API documentation with examples

The complete and up-to-date project structure can be found here: docs/development/PROJECT_STRUCTURE.md
{
"llm": {
"base_url": "http://127.0.0.1:11434",
"model": "qwen3:8b",
"temperature": 0.7,
"max_tokens": 10000,
"stream": true
},
"search": {
"provider": "duckduckgo",
"max_results": 5,
"timeout": 10
},
"rag": {
"enabled": true,
"batch_size": 100,
"max_workers": 4
},
"cache": {
"enabled": true,
"ttl_hours": 24,
"max_size_mb": 500,
"clear_on_startup": false
},
"osint": {
"max_results": 20,
"email_search_limit": 50,
"phone_search_limit": 50,
"general_osint_limit": 100
},
"multihop": {
"enabled": true,
"max_hops": 3,
"confidence_threshold": 0.7,
"enable_critique": true
},
"plugins": {
"example_plugin": {
"enabled": true
}
},
"security": {
"rate_limit": 1.0,
"max_context_length": 8000,
"check_robots_txt": true
}
}
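Loading this config.json and merging it over defaults (so that a partial user file only overrides the keys it mentions, as PATCH /config suggests) could be sketched like this; the one-level-deep merge and the default values shown are assumptions for illustration:

```python
import json

# Hypothetical defaults; the real defaults live in config.json.example
DEFAULTS = {"llm": {"model": "qwen3:8b", "temperature": 0.7, "max_tokens": 10000}}

def load_config(text, defaults=DEFAULTS):
    """Merge a user config.json (as a string) over defaults, one level deep."""
    user = json.loads(text)
    merged = {section: dict(values) for section, values in defaults.items()}
    for section, values in user.items():
        merged.setdefault(section, {}).update(values)
    return merged
```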
Recommended max_tokens Settings:
| GPU/Hardware | Recommended max_tokens | Model |
|---|---|---|
| RTX 3080+ (10GB+) | 10,000 - 16,000 | qwen3:8b, deepseek-r1:8b |
| RTX 3060/3070 (8GB) | 6,000 - 8,000 | qwen3:4b, llama3:7b |
| CPU Only | 2,000 - 4,000 | qwen3:4b |
Tip: Use the status command to monitor your token usage in real-time!
# API Keys (optional)
BRAVE_API_KEY=your_brave_api_key
SERPER_API_KEY=your_serper_api_key
# Proxy (optional)
HTTP_PROXY=http://proxy:port
HTTPS_PROXY=https://proxy:port
# All tests
pytest tests/ -v
# With coverage
pytest --cov=core --cov=tools --cov=utils tests/
# Specific tests
pytest tests/test_multihop_reasoning.py -v
pytest tests/test_error_simulation.py -v
# With debug output
pytest tests/ -v --log-cli-level=INFO
# plugins/my_plugin.py
from core.plugin_manager import Plugin, PluginMetadata

class MyPlugin(Plugin):
    def get_metadata(self) -> PluginMetadata:
        return PluginMetadata(
            name="MyPlugin",
            version="1.0.0",
            description="My custom plugin",
            author="Your Name",
            dependencies=[],
        )

    def get_tools(self):
        return [self.my_tool]

    def my_tool(self, input: str) -> str:
        return f"Processed: {input}"
See: Plugin Tutorial for details
- See tests/ for examples

Contributions are welcome!
Development Workflow:
1. Create a feature branch (git checkout -b feature/amazing-feature)
2. Commit your changes (git commit -m 'Add amazing feature')
3. Push the branch (git push origin feature/amazing-feature)

Coding Standards:
| Operation | Average | Notes |
|---|---|---|
| Standard Query | 2-5s | Without web search |
| Query with Web Search | 5-10s | 3-5 results |
| Multi-Hop (3 Hops) | 15-30s | Complex |
| RAG Search | <1s | 5 results |
| API Request | <100ms | Without tools |
- robots.txt compliance

# Check status
curl http://127.0.0.1:11434/api/tags
# Start Ollama
ollama serve
# Reinstall dependencies
pip install -r requirements.txt
# Or re-run setup
./setup.sh # or setup.bat
# Delete embeddings
rm -rf data/embeddings/
# Restart
python main.py
# Adjust in config.json
"security": {
"rate_limit": 2.0 # 2 req/s
}
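The rate_limit value (requests per second) is typically enforced with a token bucket. A minimal deterministic sketch, assuming token-bucket semantics (the actual enforcement mechanism is not documented here):

```python
class RateLimiter:
    """Token-bucket sketch of the rate_limit setting (requests per second)."""

    def __init__(self, rate=2.0):
        self.rate = rate          # tokens added per second
        self.allowance = rate     # start with a full bucket
        self.last = 0.0           # timestamp of the previous call

    def allow(self, now):
        """Return True if a request at timestamp `now` (seconds) is permitted."""
        self.allowance = min(self.rate, self.allowance + (now - self.last) * self.rate)
        self.last = now
        if self.allowance >= 1.0:
            self.allowance -= 1.0
            return True
        return False
```

With rate=2.0, two back-to-back requests pass, a third is rejected, and the bucket refills within a second.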
Crawllama License (Non-Commercial) - Free for use and development, but no commercial sale allowed.
Allowed:
Not Allowed:
See LICENSE for full details.
Built with:
Last Updated: 2025-10-27