🚀 -> Project on GitHub <-

OSINT Module for CrawlLama

Version: 1.2.0
Status: Production Ready
Last Updated: 2025-01-24

🎯 Overview

The OSINT (Open Source Intelligence) module provides advanced search operators, email/phone/social media intelligence, AI-powered query enhancement, and built-in compliance checks for investigative research.

IMPORTANT: OSINT features are provided exclusively for legitimate purposes:

✅ Permitted Use:

❌ Prohibited Use:

All OSINT queries are logged for compliance and audit purposes.

🔍 Features

1. Advanced Search Operators

Parse advanced search queries into their operator components:

from core.osint import OSINTQueryParser

parser = OSINTQueryParser()
query = parser.parse('site:github.com inurl:python filetype:md')

print(query.site)      # 'github.com'
print(query.inurl)     # 'python'
print(query.filetype)  # 'md'
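Operator parsing of this kind can be sketched with the standard library alone. The following is not the `OSINTQueryParser` implementation: the function name `parse_operators`, the dict it returns, and the operator set (collected from the operators used elsewhere in this README) are assumptions made for illustration.

```python
import shlex

# Operators used in this README's examples (assumed list, not the parser's official one)
KNOWN_OPERATORS = {"site", "inurl", "filetype", "email", "phone", "social", "platforms"}


def parse_operators(query: str) -> dict:
    """Split a query into known name:value operators and remaining free-text terms."""
    operators = {}
    terms = []
    # shlex.split keeps "quoted phrases" together and strips the quotes
    for token in shlex.split(query):
        name, sep, value = token.partition(":")
        if sep and value and name.lower() in KNOWN_OPERATORS:
            operators[name.lower()] = value
        else:
            terms.append(token)
    return {"operators": operators, "terms": terms}
```

Because `shlex` keeps double-quoted phrases together, `phone:"+49 151 12345678"` stays a single operator. It raises `ValueError` on unbalanced quotes (for example the apostrophe in `O'Brien`), which a production parser would need to handle.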

Supported Operators:

2. Email Intelligence

Comprehensive email analysis:

from core.osint import EmailIntelligence

email_intel = EmailIntelligence()
result = email_intel.analyze_email('test@example.com')

print(result['valid'])          # True/False
print(result['domain'])         # 'example.com'
print(result['mx_records'])     # List of MX records
print(result['disposable'])     # True if disposable email
print(result['variations'])     # Email variations
print(result['confidence'])     # Confidence score (0.0-1.0)
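The offline parts of an email check (syntax, domain extraction, a disposable-domain lookup, and name-based variations) can be approximated as below. This is a hypothetical sketch, not `EmailIntelligence`: the regex, the three-entry disposable list, and the variation patterns are illustrative assumptions, and MX lookups are left out because they require a DNS resolver library.

```python
import re

# Deliberately simple syntax check; full RFC 5322 validation is far more involved
EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)$")

# Tiny illustrative sample; real disposable-domain lists contain thousands of entries
DISPOSABLE_DOMAINS = {"mailinator.com", "10minutemail.com", "guerrillamail.com"}


def basic_email_checks(email: str) -> dict:
    """Syntax, domain, disposable flag, and naive local-part variations (no DNS lookups)."""
    email = email.strip()
    match = EMAIL_RE.match(email)
    if not match:
        return {"valid": False, "domain": None, "disposable": False, "variations": []}
    local = email.split("@", 1)[0]
    domain = match.group(1).lower()

    variations = []
    parts = re.split(r"[._-]", local)
    # Only a two-part local part such as "john.doe" is treated as first/last name
    if len(parts) == 2 and all(parts):
        first, last = parts
        for candidate in (f"{first}{last}", f"{first}.{last}", f"{first}_{last}",
                          f"{first[0]}{last}", f"{first}.{last[0]}", f"{last}.{first}"):
            if candidate != local:
                variations.append(f"{candidate}@{domain}")

    return {"valid": True, "domain": domain,
            "disposable": domain in DISPOSABLE_DOMAINS, "variations": variations}
```

For `john.doe@company.com`, the variations include `jdoe@company.com` and `doe.john@company.com`, while the input address itself is excluded.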

Capabilities:

3. Phone Intelligence

Phone number analysis and validation:

from core.osint import PhoneIntelligence

phone_intel = PhoneIntelligence()
result = phone_intel.analyze_phone('+49 151 12345678', region='DE')

print(result['valid'])         # True/False
print(result['formatted'])     # '+49 151 12345678'
print(result['country'])       # 'Germany'
print(result['carrier'])       # Carrier name (if available)
print(result['type'])          # 'mobile', 'fixed_line', etc.
print(result['variations'])    # Format variations

Capabilities:

Note: Full phone intelligence requires the phonenumbers library:

pip install phonenumbers
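Without `phonenumbers`, only simple normalization is possible. A minimal stdlib sketch of generating format variations might look like the following. The function name, its parameters, and the defaults (German `+49`, trunk prefix `0`, `00` international dialing prefix) are assumptions, and the sketch performs no validity, carrier, or number-type checks.

```python
import re


def phone_variations(raw: str, country_code: str = "49", trunk_prefix: str = "0") -> list:
    """Return E.164, 00-prefixed, and national spellings of one number (no validation)."""
    digits = re.sub(r"\D", "", raw)  # drop '+', spaces, dashes, and brackets
    if raw.strip().startswith("+"):
        if not digits.startswith(country_code):
            raise ValueError(f"expected country code +{country_code}")
        subscriber = digits[len(country_code):]
    elif digits.startswith("00" + country_code):
        subscriber = digits[2 + len(country_code):]
    elif trunk_prefix and digits.startswith(trunk_prefix):
        subscriber = digits[len(trunk_prefix):]
    else:
        subscriber = digits
    return [
        f"+{country_code}{subscriber}",   # E.164
        f"00{country_code}{subscriber}",  # international form with 00 dialing prefix
        f"{trunk_prefix}{subscriber}",    # national form with trunk prefix
    ]
```

`phone_variations("+49 151 12345678")` returns `["+4915112345678", "004915112345678", "015112345678"]`, and the national spelling `"0151 1234-5678"` normalizes to the same list.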

4. AI Query Enhancement

LLM-powered query optimization:

from core.osint import QueryEnhancer
from core.llm_client import OllamaClient

llm = OllamaClient()
enhancer = QueryEnhancer(llm)

# Generate query variations
variations = enhancer.generate_variations("John Doe security researcher")
# Output: ["John Doe cybersecurity", "John Doe infosec", ...]

# Suggest operators
operators = enhancer.suggest_operators("find John Doe LinkedIn")
# Output: {'site': 'linkedin.com', 'inurl': 'profile'}

# Identify entity type
entity_type = enhancer.identify_entity_type("test@example.com")
# Output: 'email'

# Suggest sources
sources = enhancer.suggest_sources("Max Mustermann developer", "person")
# Output: ['linkedin.com', 'github.com', 'xing.de', ...]
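One common way to implement LLM-backed suggestions is to ask the model for JSON and parse the reply defensively. The sketch below illustrates that pattern only and is not how `QueryEnhancer` is necessarily built: `generate` is a hypothetical stand-in for whatever function sends a prompt to the model (no particular `OllamaClient` method is assumed), and the prompt wording is invented for illustration.

```python
import json
import re
from typing import Callable


def suggest_operators(generate: Callable[[str], str], query: str) -> dict:
    """Ask an LLM for search operators as a JSON object and parse the reply defensively."""
    prompt = (
        "Suggest web search operators for the OSINT query below. "
        'Answer with a single JSON object only, for example {"site": "linkedin.com"}.\n'
        f"Query: {query}"
    )
    reply = generate(prompt)
    # Models often wrap JSON in prose or code fences, so pull out the first {...} span
    match = re.search(r"\{.*?\}", reply, re.DOTALL)
    if not match:
        return {}
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return {}
    # Keep only non-empty string values so each pair can become a name:value token
    return {str(k): v for k, v in data.items() if isinstance(v, str) and v}
```

The non-greedy `\{.*?\}` match assumes a flat JSON object like the `{'site': ..., 'inurl': ...}` output shown above; nested objects would need a proper JSON scanner.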

5. Social Media Intelligence

Comprehensive social media profile analysis and discovery:

import asyncio

from core.osint import SocialIntelligence

async def main():
    social = SocialIntelligence()

    # Analyze username across platforms
    result = await social.analyze_username("john_doe")

    print(f"Found on {result['summary']['platforms_with_presence']} platforms")
    print(f"Confidence: {result['summary']['confidence_score']:.1f}%")

    # Generate detailed report
    report = social.generate_social_report(result)
    print(report)

    # Discover profiles by email
    email_result = await social.discover_profiles_by_email("john@example.com")
    print(f"Email-based matches: {len(email_result['username_matches'])}")

# analyze_username and discover_profiles_by_email are coroutines, so run them in an event loop
asyncio.run(main())
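A username search across platforms boils down to building one profile URL per platform and probing them concurrently. The sketch below shows that structure with `asyncio.gather`; it is not `SocialIntelligence`. The URL patterns are illustrative, and `exists` is a hypothetical injected coroutine: a real probe would make an HTTP request and interpret the response, which login walls and rate limiting often make unreliable.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Illustrative profile URL patterns; platforms change these, so treat them as assumptions
PROFILE_URLS = {
    "github": "https://github.com/{}",
    "twitter": "https://x.com/{}",  # Twitter profiles are now served from x.com
    "instagram": "https://www.instagram.com/{}/",
}


async def check_username(
    username: str,
    exists: Callable[[str], Awaitable[bool]],
    platforms: Optional[list] = None,
) -> dict:
    """Probe every selected platform concurrently and summarize where the username appears."""
    # Silently skip platforms without a known URL pattern
    selected = [p for p in (platforms or PROFILE_URLS) if p in PROFILE_URLS]
    urls = {p: PROFILE_URLS[p].format(username) for p in selected}
    # return_exceptions=True keeps one failing probe from cancelling the others
    results = await asyncio.gather(*(exists(u) for u in urls.values()),
                                   return_exceptions=True)
    found = [
        {"platform": p, "url": u}
        for (p, u), ok in zip(urls.items(), results)
        if ok is True  # exceptions and False both count as "not found"
    ]
    return {"platforms_found": found,
            "summary": {"platforms_with_presence": len(found)}}
```

Injecting `exists` keeps the concurrency logic testable without network access; swapping in a real HTTP check does not change the gathering and summarizing code.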

Supported Platforms:

Features:

6. Compliance & Rate Limiting

Built-in compliance checks and rate limiting:

from core.osint import OSINTCompliance

compliance = OSINTCompliance()

# Check if user accepted terms
if not compliance.check_terms_accepted("user123"):
    print(compliance.display_terms())

# Accept terms
compliance.accept_terms("user123")

# Check query compliance
allowed, reason = compliance.check_query(
    query="email:test@example.com",
    user_id="user123",
    query_type="email_search"
)

if not allowed:
    print(f"Query blocked: {reason}")

# Get usage stats
stats = compliance.get_usage_stats("user123")
print(f"Requests this hour: {stats['total_requests_last_hour']}")
print(f"Remaining limits: {stats['remaining_limits']}")
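Hourly limits like those in `config.json` are commonly enforced with a sliding window per user and query type. The class below is a minimal sketch of that technique, not `OSINTCompliance`; the class name, the use of the `general_osint` limit for unknown query types, and the `(allowed, reason)` return shape (borrowed from `check_query` above) are assumptions.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Optional

# Per-hour limits taken from the config.json example in this README (assumed defaults)
DEFAULT_LIMITS = {"email_search": 50, "phone_search": 50, "general_osint": 100}
WINDOW_SECONDS = 3600.0


class HourlyRateLimiter:
    """Sliding one-hour window per (user, query type). The clock is injectable for tests."""

    def __init__(self, limits: Optional[dict] = None,
                 clock: Callable[[], float] = time.monotonic):
        self.limits = limits or DEFAULT_LIMITS
        self.clock = clock
        # Timestamps of accepted requests per (user, query type), oldest first
        self._hits = defaultdict(deque)

    def check(self, user_id: str, query_type: str) -> tuple:
        """Return (allowed, reason) and record the request only if it is allowed."""
        # Unknown query types get the general_osint limit (assumption)
        limit = self.limits.get(query_type, self.limits.get("general_osint", 0))
        now = self.clock()
        hits = self._hits[(user_id, query_type)]
        # Drop timestamps that have slid out of the one-hour window
        while hits and now - hits[0] >= WINDOW_SECONDS:
            hits.popleft()
        if len(hits) >= limit:
            retry_in = WINDOW_SECONDS - (now - hits[0]) if hits else WINDOW_SECONDS
            return False, f"Rate limit for {query_type} reached; retry in {int(retry_in)}s"
        hits.append(now)
        return True, "ok"
```

A sliding window avoids the burst a fixed "top of the hour" reset would allow: a user who spends the full budget at 10:59 cannot spend it again at 11:00.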

Rate Limits (per hour):

🚀 Quick Start

Using OSINT Tool (Unified Interface)

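import json

from tools.osint_tool import OSINTTool
from core.llm_client import OllamaClient

# Load settings, including the "osint" section shown under Configuration
# (assumed: OSINTTool accepts the full parsed config dict)
with open("config.json", encoding="utf-8") as f:
    config = json.load(f)

# Initialize
llm = OllamaClient()
osint = OSINTTool(llm, config)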

# Accept terms (first time)
if not osint.check_terms():
    osint.accept_terms()

# Process OSINT query
result = osint.process_query("email:test@example.com site:linkedin.com")

print(result['query_type'])         # 'email_intelligence'
print(result['intelligence'])       # Email analysis results
print(result['suggestions'])        # AI suggestions

Using in CrawlLama Main

# In main.py or interactive mode
query = "email:max.mustermann@example.com"

# The agent will automatically detect OSINT operators
response = agent.query(query)
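Automatic detection can be as simple as scanning for `name:` prefixes. The following is a minimal routing sketch, assuming the first entity operator (`email:`, `phone:`, `social:`) decides the query type. Only `email_intelligence` appears in this README; the other type names and the function name are hypothetical.

```python
import re
from typing import Optional

# Entity operator -> query type (only 'email_intelligence' is confirmed by this README)
OPERATOR_TYPES = {
    "email": "email_intelligence",
    "phone": "phone_intelligence",
    "social": "social_intelligence",
}
SEARCH_OPERATORS = {"site", "inurl", "filetype"}

# An operator name is a word at the start of the query or after whitespace, followed by ':'
OPERATOR_RE = re.compile(r"(?:^|\s)([a-z]+):", re.IGNORECASE)


def detect_query_type(query: str) -> Optional[str]:
    """Return the OSINT query type, or None if the query uses no OSINT operators."""
    names = [m.group(1).lower() for m in OPERATOR_RE.finditer(query)]
    for name in names:  # the first entity operator wins, in the order written
        if name in OPERATOR_TYPES:
            return OPERATOR_TYPES[name]
    if any(name in SEARCH_OPERATORS for name in names):
        return "advanced_search"
    return None
```

Returning `None` lets the agent fall back to its normal (non-OSINT) handling, so plain questions and bare URLs such as `https://example.com` are not misrouted.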

Example Queries:

# Email intelligence
email:test@example.com

# Phone intelligence
phone:"+49 151 12345678"

# Social media username search
social:john_doe

# Advanced search
site:github.com inurl:python "machine learning"

# Combined searches
email:john@example.com site:linkedin.com inurl:profile
social:john_doe platforms:twitter,github,instagram

📁 Module Structure

core/osint/
├── __init__.py              # Module exports
├── query_parser.py          # Advanced operator parsing
├── email_intel.py           # Email intelligence
├── phone_intel.py           # Phone intelligence
├── social_intel.py          # Social media intelligence
├── query_enhancer.py        # AI query enhancement
├── compliance.py            # Compliance & rate limiting
└── README.md                # This file

tools/
└── osint_tool.py            # Unified OSINT tool for agent

data/osint_logs/             # Audit logs (auto-created)
├── osint_queries_YYYY-MM.jsonl
├── violations.jsonl
└── terms_accepted.json

🔧 Configuration

Add to config.json:

{
  "osint": {
    "enabled": true,
    "log_queries": true,
    "rate_limits": {
      "email_search": 50,
      "phone_search": 50,
      "general_osint": 100
    }
  }
}
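If some keys are omitted from `config.json`, a loader has to fill them in. The sketch below assumes, hypothetically, that the values in the example above act as defaults and that nested sections such as `rate_limits` are merged key by key; the actual loading logic in CrawlLama may differ, and `load_osint_config` is an illustrative name.

```python
import copy
import json

# Assumed defaults, copied from the example config above
DEFAULT_OSINT_CONFIG = {
    "enabled": True,
    "log_queries": True,
    "rate_limits": {"email_search": 50, "phone_search": 50, "general_osint": 100},
}


def load_osint_config(config_text: str) -> dict:
    """Return the "osint" section of a config.json string, filling missing keys with defaults."""
    user = json.loads(config_text).get("osint", {})
    merged = copy.deepcopy(DEFAULT_OSINT_CONFIG)  # never mutate the shared defaults
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key].update(value)  # merge nested sections such as rate_limits
        else:
            merged[key] = value
    return merged
```

With this merge, a config that only sets `"rate_limits": {"email_search": 20}` keeps the other two limits at their defaults instead of dropping them.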

📝 Examples

Example 1: Email OSINT

from core.osint import EmailIntelligence

intel = EmailIntelligence()

# Analyze email
result = intel.analyze_email("john.doe@company.com")

if result['valid']:
    print(f"Domain: {result['domain']}")
    print(f"Disposable: {result['disposable']}")
    print(f"MX Records: {result['mx_records']}")

    # List the generated variations
    print("Possible variations:")
    for var in result['variations']:
        print(f"  • {var}")

Example 2: Phone OSINT

from core.osint import PhoneIntelligence

intel = PhoneIntelligence()

# Analyze German phone number
result = intel.analyze_phone("+49 151 12345678", region="DE")

if result['valid']:
    print(f"Formatted: {result['formatted']}")
    print(f"Country: {result['country']}")
    print(f"Type: {result['type']}")
    print(f"Carrier: {result['carrier']}")

Example 3: Social Media Intelligence

import asyncio
from core.osint import SocialIntelligence

async def social_analysis_example():
    social = SocialIntelligence()
    
    # Username analysis across platforms
    result = await social.analyze_username("john_doe", 
                                          platforms=["twitter", "github", "instagram"])
    
    print("Analysis Results:")
    print(f"├─ Platforms found: {result['summary']['platforms_with_presence']}")
    print(f"├─ Confidence: {result['summary']['confidence_score']:.1f}%")
    print(f"└─ Risk level: {'HIGH' if len(result['summary']['risk_indicators']) > 2 else 'LOW'}")

    # Show found profiles
    for profile in result['platforms_found']:
        verified = "✓" if profile['profile_data'].get('verified') else ""
        print(f"  🔗 {profile['platform']}: {profile['url']} {verified}")

# Run the analysis
asyncio.run(social_analysis_example())

Example 4: AI Query Enhancement

from core.osint import QueryEnhancer, OSINTQueryParser
from core.llm_client import OllamaClient

llm = OllamaClient()
enhancer = QueryEnhancer(llm)
parser = OSINTQueryParser()

# Original query
query = "Max Mustermann security"

# Get AI suggestions
variations = enhancer.generate_variations(query)
operators = enhancer.suggest_operators(query)

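# Build enhanced query by appending the suggested operators as name:value tokens
operator_str = " ".join(f"{op}:{val}" for op, val in operators.items())
enhanced = f"{query} {operator_str}".strip()
print(f"Enhanced: {enhanced}")

# Parse the enhanced query into structured operators
parsed = parser.parse(enhanced)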

🛡️ Privacy & Security

Data Protection

GDPR Compliance

The OSINT module is designed with privacy laws in mind:

  1. Purpose Limitation: Only for legitimate purposes
  2. Data Minimization: Minimal data collection
  3. Transparency: Clear terms of use
  4. User Rights: Audit logs accessible
  5. Security: Encrypted storage and transmission

Blacklisted Queries

Queries containing these terms are automatically blocked:

📊 Audit Logs

All OSINT operations are logged:

{
  "timestamp": "2025-01-24T10:30:00",
  "user_id": "user123",
  "query": "email:test@example.com",
  "query_type": "email_search",
  "status": "approved"
}

Logs are stored in: data/osint_logs/osint_queries_YYYY-MM.jsonl
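Appending to a monthly JSON Lines file is straightforward with the standard library. The sketch below writes records in the shape of the example above to a file named after the `osint_queries_YYYY-MM.jsonl` pattern; the function names are hypothetical, and the exact fields and rotation logic of the real logger are assumptions.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Optional


def log_path(log_dir: Path, when: datetime) -> Path:
    """Monthly log file following the osint_queries_YYYY-MM.jsonl naming."""
    return log_dir / f"osint_queries_{when:%Y-%m}.jsonl"


def append_audit_record(log_dir: Path, user_id: str, query: str, query_type: str,
                        status: str, when: Optional[datetime] = None) -> Path:
    """Append one JSON object per line (JSON Lines) to the given month's log file."""
    when = when or datetime.now()
    record = {
        "timestamp": when.isoformat(timespec="seconds"),
        "user_id": user_id,
        "query": query,
        "query_type": query_type,
        "status": status,
    }
    log_dir.mkdir(parents=True, exist_ok=True)  # create the log directory on first use
    path = log_path(log_dir, when)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return path


if __name__ == "__main__":
    # Demo in a throwaway directory so no real logs are touched
    with tempfile.TemporaryDirectory() as tmp:
        p = append_audit_record(Path(tmp), "user123", "email:test@example.com",
                                "email_search", "approved", datetime(2025, 1, 24, 10, 30))
        print(p.name)
        print(p.read_text(encoding="utf-8"))
```

Opening in append mode and writing one complete line per record means a crash can at worst truncate the last line; earlier records stay parseable.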

🧪 Testing

# Run OSINT tests
pytest tests/test_osint.py -v

# Test specific module
pytest tests/test_email_intel.py -v

📦 Dependencies

# Core (required)
pip install requests beautifulsoup4

# Phone intelligence (optional but recommended)
pip install phonenumbers

# Full installation
pip install -r requirements.txt

🔮 Future Enhancements (v1.3+)

📚 Resources

❓ FAQ

Q: Do I need API keys? A: No. Basic features work without API keys; optional integrations (HaveIBeenPwned, etc.) require keys.

Q: Is the phonenumbers library required? A: No, but it enables advanced features such as carrier and number-type detection.

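Q: Are queries stored permanently? A: Only audit logs are stored. Each entry records the timestamp, user ID, query text, query type, and status (see Audit Logs); because the query text is logged verbatim, it may contain the searched email address or phone number.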

Q: What if I exceed rate limits? A: Wait 1 hour or increase limits in config.json (use responsibly).

Q: Can I use this for commercial purposes? A: Yes, but ensure compliance with local laws and terms of service of searched platforms.

📧 Support

Remember: With great power comes great responsibility. Use OSINT ethically! 🛡️