This guide is currently under development, and I greatly welcome any suggestions or feedback at reaper.gitbook@gmail.com.

Information Gathering & Reconnaissance

Overview

Web application information gathering and reconnaissance form the critical foundation for successful web application penetration testing. This phase involves systematically collecting intelligence about target web applications, their underlying infrastructure, technologies, and potential attack vectors. Effective reconnaissance significantly increases testing efficiency and vulnerability discovery rates by providing comprehensive knowledge of the application landscape.


Web Application Fingerprinting

Purpose

Web application fingerprinting identifies the specific technologies, frameworks, and configurations used by target applications. This intelligence enables targeted testing approaches and helps prioritize vulnerabilities based on known weaknesses in identified technologies.

HTTP Response Fingerprinting

Server and Framework Headers

Manual Header Analysis:

# Basic header extraction
curl -I https://<target_domain>
wget --server-response --spider https://<target_domain>

# Multiple request methods for header analysis
for method in GET POST PUT DELETE OPTIONS; do
    echo "=== $method ==="
    curl -s -o /dev/null -D - -X $method https://<target_domain>
done

Automated Header Enumeration:

# Using nmap for HTTP header detection
nmap --script http-headers <target_ip>
nmap --script http-server-header <target_ip>

# Using whatweb for comprehensive technology detection
whatweb -v <target_url>
whatweb --color=never --no-errors -a 3 <target_url>

# Using httpx for advanced header analysis
echo <target_domain> | httpx -title -server -tech-detect -status-code

# Using nuclei for technology detection
nuclei -u <target_url> -t technologies/
nuclei -u <target_url> -t technologies/tech-detect.yaml

Framework Identification Headers:

  • X-Powered-By: PHP/7.4.3 - PHP version information

  • X-AspNet-Version: 4.0.30319 - ASP.NET framework version

  • X-Generator: Drupal 9 (https://www.drupal.org) - CMS identification

  • Server: Apache/2.4.41 (Ubuntu) - Web server with OS info
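
A quick way to isolate just these identification headers from a live response; a minimal sketch assuming curl and a grep with extended-regex support:

# Extract only the framework-revealing headers (case-insensitive)
curl -sI https://<target_domain> | grep -iE '^(server|x-powered-by|x-aspnet-version|x-generator):'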

Error Message Fingerprinting

Database Error Detection with SQLMap:

# Basic database fingerprinting
sqlmap -u "https://<target>/page.php?id=1" --batch --banner

# Database enumeration without exploitation
sqlmap -u "https://<target>/page.php?id=1" --batch --fingerprint

# POST request database detection
sqlmap -r request.txt --batch --banner

# Cookie-based injection detection
sqlmap -u "https://<target>/page.php" --cookie="PHPSESSID=abc123; user_id=1" --batch

Error Detection with Nuclei:

# SQL injection detection templates
nuclei -u <target_url> -t vulnerabilities/sql/

# Database-specific error detection
nuclei -u <target_url> -t exposures/logs/sql-errors.yaml

Common Database Error Patterns:

  • MySQL: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version

  • PostgreSQL: PostgreSQL query failed: ERROR: syntax error

  • MSSQL: Microsoft OLE DB Provider for SQL Server error

  • Oracle: ORA-00942: table or view does not exist
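
These signatures can be grepped for directly when probing a parameter; a minimal sketch, assuming a single-quote test payload against a hypothetical id parameter:

# Inject a single quote and search the response for known database error signatures
curl -s "https://<target>/page.php?id=1'" | grep -iE "SQL syntax|PostgreSQL query failed|OLE DB Provider|ORA-[0-9]{5}"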


Technology Stack Identification

Purpose

Technology stack identification provides comprehensive mapping of all technologies, frameworks, libraries, and services used by the target application.

Automated Technology Detection

WhatWeb Technology Analysis:

# Basic technology detection
whatweb <target_url>

# Aggressive scanning with all plugins
whatweb -a 3 <target_url>

# JSON output for parsing
whatweb --color=never --no-errors -a 3 --output-format=json <target_url>

# Bulk scanning
whatweb -i urls.txt --output-format=brief

Wappalyzer CLI Usage:

# Install and use Wappalyzer
npm install -g wappalyzer
wappalyzer <target_url>

# Batch processing (the wappalyzer CLI takes one URL at a time)
while read -r url; do wappalyzer "$url"; done < urls.txt

Nuclei Technology Detection:

# Comprehensive technology detection
nuclei -u <target_url> -t technologies/ -o tech_results.txt

# Specific framework detection
nuclei -u <target_url> -t technologies/wordpress-detect.yaml
nuclei -u <target_url> -t technologies/drupal-version.yaml
nuclei -u <target_url> -t technologies/joomla-version.yaml

CMS-Specific Detection

WordPress Enumeration with WPScan:

# Basic WordPress scan
wpscan --url <target_url>

# Enumerate plugins and themes
wpscan --url <target_url> --enumerate p,t

# User enumeration
wpscan --url <target_url> --enumerate u

# Vulnerability detection
wpscan --url <target_url> --enumerate vp,vt,cb

Drupal Enumeration with Droopescan:

# Drupal version and module detection
droopescan scan drupal -u <target_url>

# Specific plugin enumeration
droopescan scan drupal -u <target_url> --enumerate p

Joomla Scanning with JoomScan:

# Basic Joomla enumeration
joomscan -u <target_url>

# Component enumeration
joomscan -u <target_url> -ec

Directory and File Enumeration

Purpose

Directory and file enumeration systematically discovers hidden content, administrative interfaces, configuration files, and sensitive information.

Directory Discovery Tools

Gobuster Directory Enumeration:

# Basic directory discovery
gobuster dir -u https://<target> -w /usr/share/wordlists/dirb/common.txt

# Multi-extension discovery
gobuster dir -u https://<target> -w /usr/share/seclists/Discovery/Web-Content/common.txt -x php,asp,aspx,jsp,html

# Admin panel discovery
gobuster dir -u https://<target> -w /usr/share/seclists/Discovery/Web-Content/AdminPanels.txt

# Recursive discovery (gobuster dir does not recurse; use ffuf's recursion or feroxbuster below)
ffuf -w wordlist.txt -u https://<target>/FUZZ -recursion -recursion-depth 3

Ffuf Fuzzing:

# Basic directory fuzzing
ffuf -w /usr/share/seclists/Discovery/Web-Content/common.txt -u https://<target>/FUZZ

# Multi-extension fuzzing
ffuf -w wordlist.txt -u https://<target>/FUZZ -e .php,.asp,.aspx,.jsp,.html

# Parameter fuzzing
ffuf -w parameters.txt -u https://<target>/page.php?FUZZ=test

# POST data fuzzing
ffuf -w wordlist.txt -u https://<target>/login -d "username=admin&password=FUZZ" -X POST

Dirb Scanning:

# Basic dirb scan
dirb https://<target> /usr/share/dirb/wordlists/common.txt

# Extension-specific scanning
dirb https://<target> wordlist.txt -X .php,.asp,.jsp

# Dirb recurses by default; -r disables recursion
dirb https://<target> wordlist.txt -r

Feroxbuster Recursive Discovery:

# Recursive directory discovery
feroxbuster -u https://<target> -w /usr/share/seclists/Discovery/Web-Content/common.txt

# Multi-threaded with extensions
feroxbuster -u https://<target> -w wordlist.txt -x php,asp,html -t 200

# Depth-limited recursion
feroxbuster -u https://<target> -w wordlist.txt -d 2

Backup and Configuration File Discovery

Configuration File Discovery:

# Config file specific enumeration
gobuster dir -u https://<target> -w config-files.txt -x conf,config,ini,xml,json,yaml,yml,env

# Backup file discovery
ffuf -w backup-extensions.txt -u https://<target>/config.FUZZ

# Common config paths
nuclei -u <target_url> -t exposures/configs/

Version Control Exposure:

# Git repository detection
curl -s https://<target>/.git/HEAD
curl -s https://<target>/.git/config

# Git file enumeration with GitTools (gitdumper.sh from the GitTools repo)
bash gitdumper.sh https://<target>/.git/ output_directory

# SVN exposure check
curl -s https://<target>/.svn/entries

Source Code Analysis

Purpose

Source code analysis examines client-side code, exposed server-side code, and configuration files to identify vulnerabilities and sensitive information.

Client-Side Code Analysis

JavaScript Analysis:

# Extract JavaScript files
curl -s <target_url> | grep -oE 'src="[^"]*\.js"' | cut -d'"' -f2

# Analyze JavaScript for sensitive data
curl -s <target_url>/app.js | grep -E "(api|key|token|password|secret)"

# Source map discovery
curl -s <target_url>/app.js | grep "sourceMappingURL"

Link and Endpoint Extraction:

# Extract all links from page
curl -s <target_url> | grep -oE 'href="[^"]*"' | cut -d'"' -f2

# API endpoint discovery
curl -s <target_url> | grep -oE '"/api/[^"]*"'

# Extract form actions
curl -s <target_url> | grep -oE 'action="[^"]*"' | cut -d'"' -f2

Configuration File Analysis

Environment File Discovery:

# .env file check
curl -s https://<target>/.env

# Configuration file discovery
for file in config.php settings.py web.config application.properties; do
    echo "Testing: $file"
    curl -s -f "https://<target>/$file" && echo " [FOUND]"
done

Nuclei Config Exposure Detection:

# Configuration exposure detection
nuclei -u <target_url> -t exposures/configs/

# Environment file exposure
nuclei -u <target_url> -t exposures/files/

# Backup file detection
nuclei -u <target_url> -t exposures/backups/

Subdomain Enumeration

Purpose

Subdomain enumeration discovers additional attack surfaces and services that may have different security postures than the primary domain.

Passive Subdomain Discovery

Subfinder Enumeration:

# Basic subdomain discovery
subfinder -d <target_domain>

# Verbose output with sources
subfinder -d <target_domain> -v

# Specific sources
subfinder -d <target_domain> -sources crtsh,virustotal,shodan

# Output to file
subfinder -d <target_domain> -o subdomains.txt

Amass Comprehensive Enumeration:

# Passive enumeration
amass enum -passive -d <target_domain>

# Active enumeration
amass enum -active -d <target_domain>

# Brute force enumeration
amass enum -brute -d <target_domain> -w wordlist.txt

# Historical data
amass db -d <target_domain> -since 01/01/2023

Additional Passive Tools:

# Assetfinder
assetfinder --subs-only <target_domain>

# Findomain
findomain -t <target_domain>

# Certificate transparency with ctfr
python3 ctfr.py -d <target_domain>

Active Subdomain Discovery

DNS Brute Force with Gobuster:

# DNS subdomain brute force
gobuster dns -d <target_domain> -w /usr/share/seclists/Discovery/DNS/subdomains-top1million-5000.txt

# Threaded DNS brute force
gobuster dns -d <target_domain> -w wordlist.txt -t 50

MassDNS Resolution:

# Generate subdomain list
echo <target_domain> | subfinder | tee subdomains.txt

# Resolve with massdns
massdns -r resolvers.txt -t A -o S subdomains.txt

Certificate Transparency Analysis

CT Log Tools:

# Certificate transparency with subfinder
subfinder -d <target_domain> -sources crtsh,certspotter

# Manual CT log query
curl -s "https://crt.sh/?q=%25.<target_domain>&output=json" | jq -r '.[].name_value' | sort -u

# Historical certificate analysis
curl -s "https://crt.sh/?q=<target_domain>&output=json" | jq -r '.[] | select(.not_after > "2023-01-01") | .name_value'

Content Discovery

Purpose

Content discovery identifies hidden pages, directories, files, and functionality that may not be linked from the main application interface.

Web Crawling and Spidering

Burp Suite Spider:

Configure Burp Suite for comprehensive crawling:

  • Set the scope to the target domain

  • Enable form submission and link following

  • Configure authentication if required

Hakrawler for JavaScript Crawling:

# JavaScript-aware crawling
echo <target_url> | hakrawler

# Deeper crawling including JavaScript files
echo <target_url> | hakrawler -depth 3 -js

Gospider Web Crawler:

# Comprehensive web crawling
gospider -s <target_url> -c 10 -d 3

# JavaScript file analysis
gospider -s <target_url> -js -c 10

API Discovery

API Endpoint Discovery:

# Common API base paths (gobuster takes one base URL per run)
for base in /api/v1 /api/v2 /rest /graphql; do
    gobuster dir -u https://<target>$base -w api-endpoints.txt
done

# Swagger/OpenAPI discovery
ffuf -w common-api-docs.txt -u https://<target>/FUZZ

# GraphQL endpoint discovery
nuclei -u <target_url> -t exposures/apis/graphql.yaml

API Documentation Discovery:

# Swagger UI discovery
curl -s https://<target>/swagger-ui/
curl -s https://<target>/api-docs/

# OpenAPI specification
curl -s https://<target>/openapi.json
curl -s https://<target>/swagger.json

Third-Party Service Identification

Purpose

Third-party service identification maps external dependencies, integrations, and services used by the application.

CDN and External Service Detection

CDN Identification:

# DNS resolution for CDN detection
dig <target_domain>
nslookup <target_domain>

# Header-based CDN detection
curl -I https://<target_domain> | grep -i "server\|via\|x-cache"

# Whatweb CDN detection
whatweb <target_url> | grep -i cdn

External Resource Analysis:

# Extract external URLs
curl -s <target_url> | grep -oE 'https?://[^/"]+' | sort -u

# JavaScript library CDN analysis
curl -s <target_url> | grep -oE 'src="https?://[^"]*\.js"'

# Font and CSS CDN discovery
curl -s <target_url> | grep -oE 'href="https?://[^"]*\.(css|woff|ttf)"'

Authentication and Payment Integration

OAuth and SSO Detection:

# OAuth endpoint discovery
curl -s <target_url> | grep -i "oauth\|auth0\|okta\|azure"

# Social login detection
curl -s <target_url> | grep -E "(facebook|google|github|linkedin).*login"

Payment Gateway Detection:

# Payment processor identification
curl -s <target_url> | grep -i "stripe\|paypal\|square\|braintree"

# E-commerce platform detection
whatweb <target_url> | grep -i "shopify\|woocommerce\|magento"

Version Disclosure Analysis

Purpose

Version disclosure analysis identifies specific software versions and configurations that can inform vulnerability assessment.

Automated Version Detection

Nuclei Version Detection:

# Comprehensive version detection
nuclei -u <target_url> -t technologies/ -o versions.txt

# CMS version detection
nuclei -u <target_url> -t technologies/wordpress-version.yaml
nuclei -u <target_url> -t technologies/drupal-version.yaml
nuclei -u <target_url> -t technologies/joomla-version.yaml

Version Disclosure Files:

# Common version files
for file in version.txt VERSION changelog.txt CHANGELOG.md readme.txt README.md; do
    echo "Testing: $file"
    curl -s -f "https://<target>/$file" && echo " [FOUND]"
done

# Package manager files
for file in composer.json package.json requirements.txt Gemfile pom.xml; do
    echo "Testing: $file"
    curl -s -f "https://<target>/$file" && echo " [FOUND]"
done
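
If one of these manifests is exposed, dependency versions can be pulled out for vulnerability correlation; a minimal sketch, assuming jq is installed and the manifest paths actually exist on the target:

# Extract pinned dependency versions from an exposed Node.js manifest
curl -s https://<target>/package.json | jq -r '.dependencies // {} | to_entries[] | "\(.key) \(.value)"'

# Same idea for a PHP Composer manifest
curl -s https://<target>/composer.json | jq -r '.require // {} | to_entries[] | "\(.key) \(.value)"'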

Framework-Specific Version Detection

WordPress Version Detection:

# WordPress version from generator meta tag
curl -s <target_url> | grep -i "generator.*wordpress"

# Version from readme
curl -s <target_url>/readme.html | grep -i version

# Version from RSS feed
curl -s <target_url>/feed/ | grep -i generator

Database Version Detection:

# MySQL version through error
sqlmap -u "<target_url>?id=1" --batch --banner

# PostgreSQL version detection
sqlmap -u "<target_url>?id=1" --batch --dbms=postgresql --banner

Advanced Information Gathering Techniques

Purpose

Advanced information gathering techniques go beyond basic reconnaissance to uncover hidden assets, sensitive information, and potential attack vectors through sophisticated analysis and specialized tools.

Google Dorking and Search Engine Intelligence

Advanced Google Dork Techniques

Sensitive File Discovery:

# Database dumps and backups
site:<target_domain> filetype:sql "INSERT INTO"
site:<target_domain> filetype:sql "CREATE TABLE"
site:<target_domain> filetype:bak

# Configuration and log files
site:<target_domain> filetype:log
site:<target_domain> filetype:conf
site:<target_domain> ext:env "DB_PASSWORD"

# Email and contact information
site:<target_domain> "@<target_domain>" filetype:xls
site:<target_domain> "@<target_domain>" filetype:csv

# Error messages and debug information
site:<target_domain> "fatal error" "call stack"
site:<target_domain> "mysql_connect()" "warning"
site:<target_domain> "Index of" "Parent Directory"

Framework and Technology Specific Dorks:

# WordPress specific
site:<target_domain> inurl:wp-content/uploads filetype:sql
site:<target_domain> inurl:wp-config.php.bak
site:<target_domain> "wp-config.php" "DB_PASSWORD"

# Drupal specific
site:<target_domain> inurl:sites/default/files
site:<target_domain> "settings.php" "database"

# Laravel specific
site:<target_domain> ".env" "APP_KEY"
site:<target_domain> inurl:storage/logs

Alternative Search Engines

Bing and DuckDuckGo Intelligence:

# Bing specific operators
site:<target_domain> contains:login
site:<target_domain> contains:admin

# DuckDuckGo search
site:<target_domain> filetype:pdf
site:<target_domain> intitle:"admin" OR intitle:"administrator"

Specialized Search Engines:

# Shodan queries for web applications
http.title:"<company_name>"
ssl.cert.subject.cn:<target_domain>
hostname:<target_domain> port:80,443,8080,8443

# Censys queries
parsed.names:<target_domain>
autonomous_system.organization:"<company_name>"

Social Media and Professional Network Intelligence

LinkedIn Reconnaissance

Employee Enumeration with theHarvester:

# Comprehensive email and employee enumeration
theHarvester -d <target_domain> -l 500 -b linkedin

# Specific social media sources
theHarvester -d <target_domain> -b linkedin,twitter,instagram

# Export results for further analysis
theHarvester -d <target_domain> -b all -f employees.xml

Manual LinkedIn Intelligence:

# Advanced LinkedIn search operators
site:linkedin.com "<company_name>" "software engineer"
site:linkedin.com "<company_name>" "system administrator"
site:linkedin.com "<company_name>" "devops"

# Technology stack identification through employee skills
site:linkedin.com "<company_name>" "AWS" OR "Azure" OR "Docker"

GitHub and Code Repository Intelligence

GitHub Reconnaissance with GitDorker:

# Install and use GitDorker
python3 GitDorker.py -tf tokens.txt -q <target_domain> -d dorks.txt

# Manual GitHub searches
site:github.com "<target_domain>"
site:github.com "<company_name>" password
site:github.com "<company_name>" API_KEY OR secret

GitHub API Intelligence:

# GitHub API for organization discovery
curl -s "https://api.github.com/search/users?q=<company_name>+type:org"

# Repository enumeration
curl -s "https://api.github.com/orgs/<company_name>/repos"

# Recent commits analysis
curl -s "https://api.github.com/repos/<company>/<repo>/commits"

Cloud Infrastructure Discovery

AWS Infrastructure Enumeration

S3 Bucket Discovery:

# S3 bucket enumeration with aws-cli
aws s3 ls s3://<company-name>-backups --no-sign-request
aws s3 ls s3://<company-name>-logs --no-sign-request

# Automated S3 hunting with Bucket Stream
python3 bucket_stream.py --only-interesting

# S3 bucket permutation with S3Scanner
python3 s3scanner.py sites.txt

CloudFront and CDN Discovery:

# CloudFront distribution enumeration
dig <target_domain> | grep cloudfront
curl -I https://<target_domain> | grep -i cloudfront

# CDN detection
whatweb <target_url> | grep -i "cloudflare\|akamai\|fastly"

API Discovery and Analysis

GraphQL Intelligence

GraphQL Schema Discovery:

# GraphQL introspection query
curl -X POST -H "Content-Type: application/json" \
     -d '{"query": "{__schema{types{name}}}"}' \
     https://<target>/graphql

# GraphQL Voyager for schema visualization (serve a local copy of Voyager, then point it at the endpoint)
python3 -m http.server 8080

# Automated GraphQL analysis with InQL (pip-installed CLI)
inql -t https://<target>/graphql
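
The raw introspection response is easier to triage once flattened; a minimal sketch, assuming introspection is enabled on the endpoint and jq is installed:

# List all schema type names from the introspection response
curl -s -X POST -H "Content-Type: application/json" \
     -d '{"query": "{__schema{types{name}}}"}' \
     https://<target>/graphql | jq -r '.data.__schema.types[].name' | sort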

REST API Discovery with Kiterunner:

# API endpoint discovery
kr scan <target_url> -w routes-large.kite

# Wordlist-based API discovery
kr brute <target_url> -w api-wordlist.txt

# Technology-specific API patterns
kr scan <target_url> -w swagger-wordlist.kite

Mobile Application Analysis

Mobile App Intelligence

APK Analysis for Web Services:

# APK download and extraction
apktool d application.apk

# Extract API endpoints from APK
grep -r "https\?://" application/ | grep -E "\.(com|net|org)"

# Certificate pinning analysis
grep -r "certificate\|ssl\|tls" application/

iOS Application Analysis:

# IPA file analysis
unzip application.ipa
plutil -p Payload/App.app/Info.plist

# URL scheme discovery
grep -r "http" Payload/App.app/

Metadata and Document Analysis

Document Intelligence

Metadata Extraction with ExifTool:

# Download and analyze documents
wget https://<target>/document.pdf
exiftool document.pdf

# Bulk document analysis
find downloads/ -name "*.pdf" -exec exiftool {} \;

# Author and creation software analysis
exiftool -Author -Creator -Producer downloads/*

FOCA (Fingerprinting Organizations with Collected Archives):

FOCA itself is a Windows GUI application; metagoofil provides comparable document harvesting and metadata extraction from the command line:

# Document harvesting and metadata analysis with metagoofil
metagoofil -d <target_domain> -t pdf,doc,xls -l 100 -n 25 -o downloads -f results.html

Threat Intelligence Integration

Threat Intelligence Platforms

VirusTotal Intelligence:

# Domain intelligence (VirusTotal API v3)
curl -H "x-apikey: <api_key>" \
     "https://www.virustotal.com/api/v3/domains/<target_domain>"

# Passive DNS analysis
curl -H "x-apikey: <api_key>" \
     "https://www.virustotal.com/api/v3/domains/<target_domain>/resolutions"

PassiveTotal Integration:

# Historical WHOIS data
curl -u "<username>:<api_key>" \
     "https://api.passivetotal.org/v2/whois?query=<target_domain>"

# Passive DNS resolution
curl -u "<username>:<api_key>" \
     "https://api.passivetotal.org/v2/dns/passive?query=<target_domain>"

Information Organization

Data Collection Framework:

# Create organized directory structure
mkdir -p reconnaissance/{subdomains,directories,technologies,vulnerabilities,screenshots}

# Automated data collection script
echo "Target: <target_domain>" > reconnaissance/summary.txt
subfinder -d <target_domain> > reconnaissance/subdomains/subfinder.txt
gobuster dir -u https://<target_domain> -w wordlist.txt > reconnaissance/directories/gobuster.txt
whatweb <target_domain> > reconnaissance/technologies/whatweb.txt

Validation and Verification

Multi-Tool Verification:

# Cross-validate findings with multiple tools
subfinder -d <target_domain> -o sub1.txt
amass enum -passive -d <target_domain> -o sub2.txt
findomain -t <target_domain> -u sub3.txt

# Compare results
cat sub1.txt sub2.txt sub3.txt | sort -u > verified_subdomains.txt
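
Merged results usually contain stale entries; a follow-on liveness check, assuming ProjectDiscovery's httpx is installed:

# Probe merged subdomains for live HTTP services
httpx -l verified_subdomains.txt -status-code -title -o live_subdomains.txt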

Quality Assurance Checklist:

  • Web application fingerprinting completed with multiple methods

  • Technology stack fully identified and documented

  • Directory and file enumeration comprehensive

  • Source code analysis completed where accessible

  • Subdomain enumeration exhaustive across multiple techniques

  • Content discovery systematic and complete

  • Version disclosure analysis comprehensive with vulnerability correlation

  • All findings verified through multiple detection methods

  • Security implications documented for each discovered component
