Information Gathering & Reconnaissance
Overview
Web application information gathering and reconnaissance form the critical foundation for successful web application penetration testing. This phase involves systematically collecting intelligence about target web applications, their underlying infrastructure, technologies, and potential attack vectors. Effective reconnaissance significantly increases testing efficiency and vulnerability discovery rates by providing comprehensive knowledge of the application landscape.
Web Application Fingerprinting
Purpose
Web application fingerprinting identifies the specific technologies, frameworks, and configurations used by target applications. This intelligence enables targeted testing approaches and helps prioritize vulnerabilities based on known weaknesses in identified technologies.
HTTP Response Fingerprinting
Server and Framework Headers
Manual Header Analysis:
# Basic header extraction
curl -I https://<target_domain>
wget --server-response --spider https://<target_domain>
# Header comparison across request methods (curl -I forces HEAD, so dump headers with -D - instead)
for method in GET POST PUT DELETE OPTIONS; do
echo "=== $method ==="
curl -s -o /dev/null -D - -X "$method" https://<target_domain>
done
Automated Header Enumeration:
# Using nmap for HTTP header detection
nmap --script http-headers <target_ip>
nmap --script http-server-header <target_ip>
# Using whatweb for comprehensive technology detection
whatweb -v <target_url>
whatweb --color=never --no-errors -a 3 <target_url>
# Using httpx for advanced header analysis
echo <target_domain> | httpx -title -server -tech-detect -status-code
# Using nuclei for technology detection
nuclei -u <target_url> -t technologies/
nuclei -u <target_url> -t technologies/tech-detect.yaml
Framework Identification Headers:
X-Powered-By: PHP/7.4.3                         - PHP version information
X-AspNet-Version: 4.0.30319                     - ASP.NET framework version
X-Generator: Drupal 9 (https://www.drupal.org)  - CMS identification
Server: Apache/2.4.41 (Ubuntu)                  - Web server with OS info
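These headers can be collected in one pass. A minimal sketch (the header list is illustrative, not exhaustive):
# Extract common framework-identification headers from a single HEAD request
target="https://<target_domain>"
headers=$(curl -sI "$target" | tr -d '\r')
for name in Server X-Powered-By X-AspNet-Version X-Generator; do
echo "$headers" | grep -i "^$name:"
done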
Error Message Fingerprinting
Database Error Detection with SQLMap:
# Basic database fingerprinting
sqlmap -u "https://<target>/page.php?id=1" --batch --banner
# Database enumeration without exploitation
sqlmap -u "https://<target>/page.php?id=1" --batch --fingerprint
# POST request database detection
sqlmap -r request.txt --batch --banner
# Cookie-based injection detection (sqlmap only tests cookies at --level 2 or higher)
sqlmap -u "https://<target>/page.php" --cookie="PHPSESSID=abc123; user_id=1" --batch --level=2
Error Detection with Nuclei:
# SQL injection detection templates
nuclei -u <target_url> -t vulnerabilities/sql/
# Database-specific error detection
nuclei -u <target_url> -t exposures/logs/sql-errors.yaml
Common Database Error Patterns:
MySQL:
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version
PostgreSQL:
PostgreSQL query failed: ERROR: syntax error
MSSQL:
Microsoft OLE DB Provider for SQL Server error
Oracle:
ORA-00942: table or view does not exist
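These signatures can be grepped straight out of a probed response as quick triage before running sqlmap. A minimal sketch (the single-quote probe and the pattern list are illustrative assumptions):
# Probe a parameter with a single quote and grep for known DBMS error signatures
curl -s "https://<target>/page.php?id=1'" \
| grep -iE "SQL syntax|PostgreSQL query failed|OLE DB Provider|ORA-[0-9]{5}" \
&& echo "[+] Database error signature detected"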
Technology Stack Identification
Purpose
Technology stack identification provides comprehensive mapping of all technologies, frameworks, libraries, and services used by the target application.
Automated Technology Detection
WhatWeb Technology Analysis:
# Basic technology detection
whatweb <target_url>
# Aggressive scanning with all plugins
whatweb -a 3 <target_url>
# JSON output for parsing (whatweb uses --log-json, not --output-format)
whatweb --color=never --no-errors -a 3 --log-json=results.json <target_url>
# Bulk scanning from a URL list
whatweb -i urls.txt --log-brief=results.txt
Wappalyzer CLI Usage:
# Install and use Wappalyzer
npm install -g wappalyzer
wappalyzer <target_url>
# Batch processing (the CLI takes one URL at a time)
while read -r url; do wappalyzer "$url"; done < urls.txt
Nuclei Technology Detection:
# Comprehensive technology detection
nuclei -u <target_url> -t technologies/ -o tech_results.txt
# Specific framework detection
nuclei -u <target_url> -t technologies/wordpress-detect.yaml
nuclei -u <target_url> -t technologies/drupal-version.yaml
nuclei -u <target_url> -t technologies/joomla-version.yaml
CMS-Specific Detection
WordPress Enumeration with WPScan:
# Basic WordPress scan
wpscan --url <target_url>
# Enumerate plugins and themes
wpscan --url <target_url> --enumerate p,t
# User enumeration
wpscan --url <target_url> --enumerate u
# Vulnerability detection
wpscan --url <target_url> --enumerate vp,vt,cb
Drupal Enumeration with Droopescan:
# Drupal version and module detection
droopescan scan drupal -u <target_url>
# Specific plugin enumeration
droopescan scan drupal -u <target_url> --enumerate p
Joomla Scanning with JoomScan:
# Basic Joomla enumeration
joomscan -u <target_url>
# Component enumeration
joomscan -u <target_url> -ec
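Before running the scanners above, a few requests against default paths can suggest which CMS is in play. A minimal triage sketch (the paths are common installation defaults and may be moved or blocked):
# Quick CMS triage via well-known default paths
target="https://<target>"
curl -s -o /dev/null -w "%{http_code}" "$target/wp-login.php" | grep -q 200 && echo "Likely WordPress"
curl -s -o /dev/null -w "%{http_code}" "$target/administrator/" | grep -q 200 && echo "Likely Joomla"
curl -s -o /dev/null -w "%{http_code}" "$target/core/CHANGELOG.txt" | grep -q 200 && echo "Likely Drupal"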
Directory and File Enumeration
Purpose
Directory and file enumeration systematically discovers hidden content, administrative interfaces, configuration files, and sensitive information.
Directory Discovery Tools
Gobuster Directory Enumeration:
# Basic directory discovery
gobuster dir -u https://<target> -w /usr/share/wordlists/dirb/common.txt
# Multi-extension discovery
gobuster dir -u https://<target> -w /usr/share/seclists/Discovery/Web-Content/common.txt -x php,asp,aspx,jsp,html
# Admin panel discovery
gobuster dir -u https://<target> -w /usr/share/seclists/Discovery/Web-Content/AdminPanels.txt
# Follow redirects (gobuster does not recurse; see feroxbuster below for recursive discovery)
gobuster dir -u https://<target> -w wordlist.txt -r
Ffuf Fuzzing:
# Basic directory fuzzing
ffuf -w /usr/share/seclists/Discovery/Web-Content/common.txt -u https://<target>/FUZZ
# Multi-extension fuzzing
ffuf -w wordlist.txt -u https://<target>/FUZZ -e .php,.asp,.aspx,.jsp,.html
# Parameter fuzzing
ffuf -w parameters.txt -u https://<target>/page.php?FUZZ=test
# POST data fuzzing
ffuf -w wordlist.txt -u https://<target>/login -d "username=admin&password=FUZZ" -X POST
Dirb Scanning:
# Basic dirb scan
dirb https://<target> /usr/share/dirb/wordlists/common.txt
# Extension-specific scanning
dirb https://<target> wordlist.txt -X .php,.asp,.jsp
# Recursive scanning (dirb recurses by default; -r would disable recursion)
dirb https://<target> wordlist.txt
Feroxbuster Recursive Discovery:
# Recursive directory discovery
feroxbuster -u https://<target> -w /usr/share/seclists/Discovery/Web-Content/common.txt
# Multi-threaded with extensions
feroxbuster -u https://<target> -w wordlist.txt -x php,asp,html -t 200
# Depth-limited recursion
feroxbuster -u https://<target> -w wordlist.txt -d 2
Backup and Configuration File Discovery
Configuration File Discovery:
# Config file specific enumeration (gobuster expects extensions without leading dots)
gobuster dir -u https://<target> -w config-files.txt -x conf,config,ini,xml,json,yaml,yml,env
# Backup file discovery
ffuf -w backup-extensions.txt -u https://<target>/config.FUZZ
# Common config paths
nuclei -u <target_url> -t exposures/configs/
Version Control Exposure:
# Git repository detection
curl -s https://<target>/.git/HEAD
curl -s https://<target>/.git/config
# Git file enumeration with GitTools (the Dumper component is a bash script)
bash gitdumper.sh https://<target>/.git/ output_directory
# SVN exposure check
curl -s https://<target>/.svn/entries
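The same exposure check generalizes to other version control systems. A minimal loop over standard metadata paths (the path list is an illustrative sample):
# Check common version-control metadata paths for exposure
for path in .git/HEAD .git/config .svn/entries .hg/requires .bzr/README; do
code=$(curl -s -o /dev/null -w "%{http_code}" "https://<target>/$path")
[ "$code" = "200" ] && echo "[+] Exposed: /$path"
done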
Source Code Analysis
Purpose
Source code analysis examines client-side code, exposed server-side code, and configuration files to identify vulnerabilities and sensitive information.
Client-Side Code Analysis
JavaScript Analysis:
# Extract JavaScript files
curl -s <target_url> | grep -oE 'src="[^"]*\.js"' | cut -d'"' -f2
# Analyze JavaScript for sensitive data
curl -s <target_url>/app.js | grep -E "(api|key|token|password|secret)"
# Source map discovery
curl -s <target_url>/app.js | grep "sourceMappingURL"
Link and Endpoint Extraction:
# Extract all links from page
curl -s <target_url> | grep -oE 'href="[^"]*"' | cut -d'"' -f2
# API endpoint discovery
curl -s <target_url> | grep -oE '"/api/[^"]*"'
# Extract form actions
curl -s <target_url> | grep -oE 'action="[^"]*"' | cut -d'"' -f2
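These extractions can be merged into a single endpoint inventory. A sketch that deduplicates href, src, and action targets (the output file name is arbitrary):
# Merge href/src/action targets from a page into one sorted endpoint list
page=$(curl -s <target_url>)
{ echo "$page" | grep -oE 'href="[^"]*"';
echo "$page" | grep -oE 'src="[^"]*"';
echo "$page" | grep -oE 'action="[^"]*"'; } \
| cut -d'"' -f2 | sort -u > endpoints.txt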
Configuration File Analysis
Environment File Discovery:
# .env file check
curl -s https://<target>/.env
# Configuration file discovery
for file in config.php settings.py web.config application.properties; do
echo "Testing: $file"
curl -s -f "https://<target>/$file" && echo " [FOUND]"
done
Nuclei Config Exposure Detection:
# Configuration exposure detection
nuclei -u <target_url> -t exposures/configs/
# Environment file exposure
nuclei -u <target_url> -t exposures/files/
# Backup file detection
nuclei -u <target_url> -t exposures/backups/
Subdomain Enumeration
Purpose
Subdomain enumeration discovers additional attack surfaces and services that may have different security postures than the primary domain.
Passive Subdomain Discovery
Subfinder Enumeration:
# Basic subdomain discovery
subfinder -d <target_domain>
# Verbose output with sources
subfinder -d <target_domain> -v
# Specific sources
subfinder -d <target_domain> -sources crtsh,virustotal,shodan
# Output to file
subfinder -d <target_domain> -o subdomains.txt
Amass Comprehensive Enumeration:
# Passive enumeration
amass enum -passive -d <target_domain>
# Active enumeration
amass enum -active -d <target_domain>
# Brute force enumeration
amass enum -brute -d <target_domain> -w wordlist.txt
# Historical data
amass db -d <target_domain> -since 01/01/2023
Additional Passive Tools:
# Assetfinder
assetfinder --subs-only <target_domain>
# Findomain
findomain -t <target_domain>
# Certificate transparency with ctfr
python3 ctfr.py -d <target_domain>
Active Subdomain Discovery
DNS Brute Force with Gobuster:
# DNS subdomain brute force
gobuster dns -d <target_domain> -w /usr/share/seclists/Discovery/DNS/subdomains-top1million-5000.txt
# Threaded DNS brute force
gobuster dns -d <target_domain> -w wordlist.txt -t 50
MassDNS Resolution:
# Generate subdomain list
echo <target_domain> | subfinder | tee subdomains.txt
# Resolve with massdns
massdns -r resolvers.txt -t A -o S subdomains.txt
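The massdns results can then be reduced to resolved names and probed for live web services. A sketch assuming the simple output format (-o S) and an installed httpx:
# Write massdns simple-format results to a file, extract hostnames, probe with httpx
massdns -r resolvers.txt -t A -o S -w resolved.txt subdomains.txt
awk '{print $1}' resolved.txt | sed 's/\.$//' | sort -u | httpx -silent -status-code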
Certificate Transparency Analysis
CT Log Tools:
# Certificate transparency with subfinder
subfinder -d <target_domain> -sources crtsh,certspotter
# Manual CT log query
curl -s "https://crt.sh/?q=%25.<target_domain>&output=json" | jq -r '.[].name_value' | sort -u
# Historical certificate analysis
curl -s "https://crt.sh/?q=<target_domain>&output=json" | jq -r '.[] | select(.not_after > "2023-01-01") | .name_value'
Content Discovery
Purpose
Content discovery identifies hidden pages, directories, files, and functionality that may not be linked from the main application interface.
Web Crawling and Spidering
Burp Suite Spider:
# Configure Burp Suite for comprehensive crawling
# Set scope to target domain
# Enable form submission and link following
# Configure authentication if required
Hakrawler for JavaScript Crawling:
# JavaScript-aware crawling
echo <target_url> | hakrawler
# Deeper crawl that also lists linked JavaScript files (hakrawler does not execute JavaScript)
echo <target_url> | hakrawler -depth 3 -js
Gospider Web Crawler:
# Comprehensive web crawling
gospider -s <target_url> -c 10 -d 3
# JavaScript file analysis (runs linkfinder against discovered JS files)
gospider -s <target_url> --js -c 10
API Discovery
API Endpoint Discovery:
# Common API base paths (gobuster has no multi-prefix flag; loop over them)
for prefix in /api/v1 /api/v2 /rest /graphql; do
gobuster dir -u https://<target>$prefix -w api-endpoints.txt
done
# Swagger/OpenAPI discovery
ffuf -w common-api-docs.txt -u https://<target>/FUZZ
# GraphQL endpoint discovery
nuclei -u <target_url> -t exposures/apis/graphql.yaml
API Documentation Discovery:
# Swagger UI discovery
curl -s https://<target>/swagger-ui/
curl -s https://<target>/api-docs/
# OpenAPI specification
curl -s https://<target>/openapi.json
curl -s https://<target>/swagger.json
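The documentation checks above generalize to a loop over frequent specification paths (the list below is a sample of common defaults, not exhaustive):
# Probe frequent API documentation and specification paths
for path in swagger.json swagger/v1/swagger.json openapi.json api-docs v2/api-docs graphql; do
code=$(curl -s -o /dev/null -w "%{http_code}" "https://<target>/$path")
[ "$code" = "200" ] && echo "[+] Found: /$path"
done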
Third-Party Service Identification
Purpose
Third-party service identification maps external dependencies, integrations, and services used by the application.
CDN and External Service Detection
CDN Identification:
# DNS resolution for CDN detection
dig <target_domain>
nslookup <target_domain>
# Header-based CDN detection
curl -I https://<target_domain> | grep -i "server\|via\|x-cache"
# Whatweb CDN detection
whatweb <target_url> | grep -i cdn
External Resource Analysis:
# Extract external URLs
curl -s <target_url> | grep -oE 'https?://[^/"]+' | sort -u
# JavaScript library CDN analysis
curl -s <target_url> | grep -oE 'src="https?://[^"]*\.js"'
# Font and CSS CDN discovery
curl -s <target_url> | grep -oE 'href="https?://[^"]*\.(css|woff|ttf)"'
Authentication and Payment Integration
OAuth and SSO Detection:
# OAuth endpoint discovery
curl -s <target_url> | grep -i "oauth\|auth0\|okta\|azure"
# Social login detection
curl -s <target_url> | grep -E "(facebook|google|github|linkedin).*login"
Payment Gateway Detection:
# Payment processor identification
curl -s <target_url> | grep -i "stripe\|paypal\|square\|braintree"
# E-commerce platform detection
whatweb <target_url> | grep -i "shopify\|woocommerce\|magento"
Version Disclosure Analysis
Purpose
Version disclosure analysis identifies specific software versions and configurations that can inform vulnerability assessment.
Automated Version Detection
Nuclei Version Detection:
# Comprehensive version detection
nuclei -u <target_url> -t technologies/ -o versions.txt
# CMS version detection
nuclei -u <target_url> -t technologies/wordpress-version.yaml
nuclei -u <target_url> -t technologies/drupal-version.yaml
nuclei -u <target_url> -t technologies/joomla-version.yaml
Version Disclosure Files:
# Common version files
for file in version.txt VERSION changelog.txt CHANGELOG.md readme.txt README.md; do
echo "Testing: $file"
curl -s -f "https://<target>/$file" && echo " [FOUND]"
done
# Package manager files
for file in composer.json package.json requirements.txt Gemfile pom.xml; do
echo "Testing: $file"
curl -s -f "https://<target>/$file" && echo " [FOUND]"
done
Framework-Specific Version Detection
WordPress Version Detection:
# WordPress version from generator meta tag
curl -s <target_url> | grep -i "generator.*wordpress"
# Version from readme
curl -s <target_url>/readme.html | grep -i version
# Version from RSS feed
curl -s <target_url>/feed/ | grep -i generator
Database Version Detection:
# MySQL version through error
sqlmap -u "<target_url>?id=1" --batch --banner
# PostgreSQL version detection
sqlmap -u "<target_url>?id=1" --batch --dbms=postgresql --banner
Advanced Information Gathering Techniques
Purpose
Advanced information gathering techniques go beyond basic reconnaissance to uncover hidden assets, sensitive information, and potential attack vectors through sophisticated analysis and specialized tools.
Google Dorking and Search Engine Intelligence
Advanced Google Dork Techniques
Sensitive File Discovery:
# Database dumps and backups
site:<target_domain> filetype:sql "INSERT INTO"
site:<target_domain> filetype:sql "CREATE TABLE"
site:<target_domain> filetype:bak
# Configuration and log files
site:<target_domain> filetype:log
site:<target_domain> filetype:conf
site:<target_domain> ext:env "DB_PASSWORD"
# Email and contact information
site:<target_domain> "@<target_domain>" filetype:xls
site:<target_domain> "@<target_domain>" filetype:csv
# Error messages and debug information
site:<target_domain> "fatal error" "call stack"
site:<target_domain> "mysql_connect()" "warning"
site:<target_domain> "Index of" "Parent Directory"
Framework and Technology Specific Dorks:
# WordPress specific
site:<target_domain> inurl:wp-content/uploads filetype:sql
site:<target_domain> inurl:wp-config.php.bak
site:<target_domain> "wp-config.php" "DB_PASSWORD"
# Drupal specific
site:<target_domain> inurl:sites/default/files
site:<target_domain> "settings.php" "database"
# Laravel specific
site:<target_domain> ".env" "APP_KEY"
site:<target_domain> inurl:storage/logs
Alternative Search Engines
Bing and DuckDuckGo Intelligence:
# Bing specific operators
site:<target_domain> contains:login
site:<target_domain> contains:admin
# DuckDuckGo search
site:<target_domain> filetype:pdf
site:<target_domain> intitle:"admin" OR intitle:"administrator"
Specialized Search Engines:
# Shodan queries for web applications
http.title:"<company_name>"
ssl.cert.subject.cn:<target_domain>
hostname:<target_domain> port:80,443,8080,8443
# Censys queries
parsed.names:<target_domain>
autonomous_system.organization:"<company_name>"
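Shodan queries can also be scripted with the official CLI once an API key is registered; a sketch (field selection is illustrative):
# Shodan CLI equivalents (pip install shodan; run shodan init <api_key> first)
shodan search --fields ip_str,port,hostnames "hostname:<target_domain>"
shodan domain <target_domain>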
Social Media and Professional Network Intelligence
LinkedIn Reconnaissance
Employee Enumeration with theHarvester:
# Comprehensive email and employee enumeration
theHarvester -d <target_domain> -l 500 -b linkedin
# Specific social media sources
theHarvester -d <target_domain> -b linkedin,twitter,instagram
# Export results for further analysis
theHarvester -d <target_domain> -b all -f employees.xml
Manual LinkedIn Intelligence:
# Advanced LinkedIn search operators
site:linkedin.com "<company_name>" "software engineer"
site:linkedin.com "<company_name>" "system administrator"
site:linkedin.com "<company_name>" "devops"
# Technology stack identification through employee skills
site:linkedin.com "<company_name>" "AWS" OR "Azure" OR "Docker"
GitHub and Code Repository Intelligence
GitHub Reconnaissance with GitDorker:
# Install and use GitDorker
python3 GitDorker.py -tf tokens.txt -q <target_domain> -d dorks.txt
# Manual GitHub searches
site:github.com "<target_domain>"
site:github.com "<company_name>" password
site:github.com "<company_name>" API_KEY OR secret
GitHub API Intelligence:
# GitHub API for organization discovery
curl -s "https://api.github.com/search/users?q=<company_name>+type:org"
# Repository enumeration
curl -s "https://api.github.com/orgs/<company_name>/repos"
# Recent commits analysis
curl -s "https://api.github.com/repos/<company>/<repo>/commits"
Cloud Infrastructure Discovery
AWS Infrastructure Enumeration
S3 Bucket Discovery:
# S3 bucket enumeration with aws-cli
aws s3 ls s3://<company-name>-backups --no-sign-request
aws s3 ls s3://<company-name>-logs --no-sign-request
# Automated S3 hunting with Bucket Stream
python3 bucket_stream.py --only-interesting
# S3 bucket permutation with S3Scanner
python3 s3scanner.py sites.txt
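Bucket names can also be permuted and checked over plain HTTP, where a 403 typically means the bucket exists but is private and a 404 that it does not exist. A minimal sketch (the suffix list is an assumption):
# Permute common bucket-name suffixes and check existence via HTTP status
for suffix in backups logs dev staging assets uploads; do
code=$(curl -s -o /dev/null -w "%{http_code}" "https://<company-name>-$suffix.s3.amazonaws.com")
[ "$code" != "404" ] && echo "[+] Possible bucket: <company-name>-$suffix (HTTP $code)"
done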
CloudFront and CDN Discovery:
# CloudFront distribution enumeration
dig <target_domain> | grep cloudfront
curl -I https://<target_domain> | grep -i cloudfront
# CDN detection
whatweb <target_url> | grep -i "cloudflare\|akamai\|fastly"
API Discovery and Analysis
GraphQL Intelligence
GraphQL Schema Discovery:
# GraphQL introspection query
curl -X POST -H "Content-Type: application/json" \
-d '{"query": "{__schema{types{name}}}"}' \
https://<target>/graphql
# GraphQL Voyager for schema visualization (serve a local copy of the Voyager web app)
python3 -m http.server 8080
# Then load the saved introspection result into the Voyager interface in a browser
# Automated GraphQL analysis with InQL (standalone CLI, installable via pip install inql)
inql -t https://<target>/graphql
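Saving the introspection response also allows offline schema review with jq; a sketch that lists the root query fields (assumes introspection is enabled on the endpoint):
# Request root query fields and extract their names with jq
curl -s -X POST -H "Content-Type: application/json" \
-d '{"query": "{__schema{queryType{fields{name}}}}"}' \
https://<target>/graphql | jq -r '.data.__schema.queryType.fields[].name'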
REST API Discovery with Kiterunner:
# API endpoint discovery
kr scan <target_url> -w routes-large.kite
# Wordlist-based API discovery
kr brute <target_url> -w api-wordlist.txt
# Technology-specific API patterns
kr scan <target_url> -w swagger-wordlist.kite
Mobile Application Analysis
Mobile App Intelligence
APK Analysis for Web Services:
# APK download and extraction
apktool d application.apk
# Extract API endpoints from APK
grep -r "https\?://" application/ | grep -E "\.(com|net|org)"
# Certificate pinning analysis
grep -r "certificate\|ssl\|tls" application/
iOS Application Analysis:
# IPA file analysis
unzip application.ipa
plutil -p Payload/App.app/Info.plist
# URL scheme discovery
grep -r "http" Payload/App.app/
Metadata and Document Analysis
Document Intelligence
Metadata Extraction with ExifTool:
# Download and analyze documents
wget https://<target>/document.pdf
exiftool document.pdf
# Bulk document analysis
find downloads/ -name "*.pdf" -exec exiftool {} \;
# Author and creation software analysis
exiftool -Author -Creator -Producer downloads/*
FOCA (Fingerprinting Organizations with Collected Archives):
# FOCA itself is a Windows GUI application for document metadata analysis;
# metagoofil offers comparable document harvesting and metadata extraction from the CLI
metagoofil -d <target_domain> -t pdf,doc,xls -l 100 -n 50 -o downloads -f results.html
Threat Intelligence Integration
Threat Intelligence Platforms
VirusTotal Intelligence:
# Domain intelligence (the x-apikey header belongs to the v3 API)
curl -H "x-apikey: <api_key>" \
"https://www.virustotal.com/api/v3/domains/<target_domain>"
# Passive DNS analysis (historical resolutions)
curl -H "x-apikey: <api_key>" \
"https://www.virustotal.com/api/v3/domains/<target_domain>/resolutions"
PassiveTotal Integration:
# Historical WHOIS data
curl -u "<username>:<api_key>" \
"https://api.passivetotal.org/v2/whois?query=<target_domain>"
# Passive DNS resolution
curl -u "<username>:<api_key>" \
"https://api.passivetotal.org/v2/dns/passive?query=<target_domain>"
Information Organization
Data Collection Framework:
# Create organized directory structure
mkdir -p reconnaissance/{subdomains,directories,technologies,vulnerabilities,screenshots}
# Automated data collection script
echo "Target: <target_domain>" > reconnaissance/summary.txt
subfinder -d <target_domain> > reconnaissance/subdomains/subfinder.txt
gobuster dir -u https://<target_domain> -w wordlist.txt > reconnaissance/directories/gobuster.txt
whatweb <target_domain> > reconnaissance/technologies/whatweb.txt
Validation and Verification
Multi-Tool Verification:
# Cross-validate findings with multiple tools
subfinder -d <target_domain> -o sub1.txt
amass enum -passive -d <target_domain> -o sub2.txt
findomain -t <target_domain> -u sub3.txt
# Compare results
cat sub1.txt sub2.txt sub3.txt | sort -u > verified_subdomains.txt
Quality Assurance Checklist:
Web application fingerprinting completed with multiple methods
Technology stack fully identified and documented
Directory and file enumeration comprehensive
Source code analysis completed where accessible
Subdomain enumeration exhaustive across multiple techniques
Content discovery systematic and complete
Version disclosure analysis comprehensive with vulnerability correlation
All findings verified through multiple detection methods
Security implications documented for each discovered component