Infrastructure

  • 100Cr+ DNS Responses
  • 28Cr+ Unique Domains
  • ~1,600 Domains/Second
  • ~20k Inserts/Second

System Overview

Niruddha is a distributed system that performs DNS resolution across Indian ISP nameservers at scale, processing over 100 crore responses across 28 crore domains to date.

Data Pipeline

Five-stage distributed processing pipeline

  1. Ingestion: CZDS + CT Logs → Kafka
  2. Distribution: Kafka → Worker Fleet
  3. Resolution: ZDNS + ISP Resolvers
  4. Publishing: Results → Kafka
  5. Storage: Kafka → PostgreSQL
Ingestion: Domain names sourced from CZDS (Centralized Zone Data Service) and Certificate Transparency logs are loaded into Kafka topics for parallel processing.
Distribution: Kafka consumers distribute domains across a fleet of worker nodes, each pulling from the shared topic.
Resolution: Workers use ZDNS to query domains against ISP nameservers (Jio, Airtel, BSNL, ACT, Idea, Tata), throttled to respect resolver capacity.
Publishing: DNS responses, including filtering indicators, are published back to Kafka for persistence.
Storage: A dedicated consumer writes responses to PostgreSQL at ~20,000 inserts per second using batch operations.
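The storage stage (stage 5) can be sketched as a consumer that accumulates messages into batches and writes each batch in a single transaction. This is a minimal illustration, not the production code: an in-memory SQLite database stands in for PostgreSQL, a plain list stands in for the Kafka consumer iterator, and the batch size and table schema are assumptions.

```python
import sqlite3

BATCH_SIZE = 5000  # illustrative; the real batch size is not documented here


def flush(conn, batch):
    """Write one batch of DNS responses in a single transaction."""
    conn.executemany(
        "INSERT INTO dns_responses (domain, resolver, answer) VALUES (?, ?, ?)",
        batch,
    )
    conn.commit()


def consume(messages, conn):
    """Accumulate messages into batches and flush when a batch fills.

    `messages` stands in for a Kafka consumer iterator over the results topic.
    """
    batch = []
    for msg in messages:
        batch.append(msg)
        if len(batch) >= BATCH_SIZE:
            flush(conn, batch)
            batch.clear()
    if batch:  # flush the partial tail batch
        flush(conn, batch)


# Demo: SQLite in-memory database standing in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dns_responses (domain TEXT, resolver TEXT, answer TEXT)")
consume([("example.in", "jio", "1.2.3.4")] * 12000, conn)
print(conn.execute("SELECT COUNT(*) FROM dns_responses").fetchone()[0])  # 12000
```

Batching amortises per-statement overhead across many rows, which is what makes throughput on the order of ~20,000 inserts/second feasible against a single table.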

Technology Stack

Core Infrastructure

  • Message Queue: Apache Kafka
  • Database: PostgreSQL
  • Cache Layer: Redis
  • DNS Resolution: ZDNS

Web Application

  • Framework: Next.js 15
  • UI Library: React 19
  • Local Storage: SQLite
  • Virtualization: TanStack Virtual

Worker Fleet

Horizontally scaled resolution nodes

The worker fleet is the core component of data collection. Each worker:

  • Pulls resolver addresses from Redis (dynamic resolver pool)
  • Consumes domains from Kafka topics
  • Performs DNS lookups using ZDNS (concurrent DNS scanner)
  • Publishes results back to Kafka for persistence

Workers operate independently with no shared state beyond Kafka offsets, allowing horizontal scaling. Resolution rate is throttled per-worker to stay within resolver capacity.

Collective throughput: ~1,600 domains/second

Variable based on resolver latency and throttling configuration
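The per-worker throttling described above can be modelled as a token bucket: each worker holds a budget of lookups that refills at a fixed rate, and a lookup proceeds only if a token is available. A minimal sketch (the class name, rate, and injectable clock are all illustrative; the actual throttling mechanism used by the workers is not specified here):

```python
import time


class Throttle:
    """Token-bucket limiter: at most `rate` lookups per second.

    Each worker would wrap its ZDNS calls with something like this so the
    fleet's aggregate query rate stays within resolver capacity.
    """

    def __init__(self, rate, clock=time.monotonic):
        self.rate = rate
        self.tokens = float(rate)  # start with a full bucket
        self.clock = clock
        self.last = clock()

    def acquire(self):
        now = self.clock()
        # Refill proportionally to elapsed time, capped at one second's budget.
        self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should back off and retry


# With a frozen clock, a 10 req/s bucket allows exactly 10 immediate lookups.
t = Throttle(10, clock=lambda: 0.0)
print(sum(t.acquire() for _ in range(15)))  # 10
```

Because the budget refills continuously, short bursts are absorbed while the long-run rate converges to the configured limit.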

Storage & Caching

Three-tier data architecture

PostgreSQL (Primary Storage)

Over 100 crore DNS response records stored with compound indexes optimised for index-only scans. Batch inserts achieve ~20,000 records/second.

HDD-backed storage; direct queries take 30-60 seconds due to table size.
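The value of a compound index here is that a query whose columns are all contained in the index can be answered from the index alone (an index-only scan), skipping reads of the large HDD-backed table. The effect can be demonstrated with SQLite standing in for PostgreSQL (the table and index names are illustrative, not the production schema):

```python
import sqlite3

# SQLite stand-in: a compound index that covers the query lets the engine
# answer from the index alone, never touching the table's row storage.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dns_responses (domain TEXT, resolver TEXT, answer TEXT)")
conn.execute("CREATE INDEX idx_domain_resolver ON dns_responses (domain, resolver)")

# The SELECT list and WHERE clause use only indexed columns.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT resolver FROM dns_responses WHERE domain = ?",
    ("example.in",),
).fetchall()
print(plan[0][-1])  # e.g. SEARCH dns_responses USING COVERING INDEX idx_domain_resolver (domain=?)
```

SQLite calls this a covering index; PostgreSQL's equivalent is the index-only scan, which additionally depends on the visibility map being up to date.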

Redis (Distributed Cache)

Filtered domains cached in Redis with 1-hour TTL. Web requests read from Redis first, reducing response time from 30+ seconds to ~50ms.

SQLite (Local Mirror)

Filtered domains mirrored to local SQLite for low-latency search and filtering. Used as fallback when Redis is unavailable.

Cache Refresh: Hourly automated refresh from PostgreSQL, with manual refresh available via admin interface. PostgreSQL is queried only during cache refresh, never on user requests.
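The read path above is a cache-aside pattern: try Redis, fall back to the SQLite mirror, and never touch PostgreSQL outside the refresh job. A minimal sketch, with a small in-process TTL map standing in for Redis (all names and the 1-hour TTL wiring are illustrative):

```python
import time


class TTLCache:
    """Tiny in-process stand-in for Redis SETEX/GET semantics."""

    def __init__(self, clock=time.monotonic):
        self.store = {}
        self.clock = clock

    def setex(self, key, ttl, value):
        self.store[key] = (self.clock() + ttl, value)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None or entry[0] <= self.clock():
            return None  # missing or expired
        return entry[1]


def lookup(domain, cache, sqlite_lookup):
    """Read path: cache first, local SQLite mirror as fallback.

    PostgreSQL is never queried here; only the hourly refresh job reads it
    to repopulate the cache.
    """
    value = cache.get(domain)
    if value is not None:
        return value
    return sqlite_lookup(domain)


cache = TTLCache()
cache.setex("blocked.example", 3600, "filtered")  # done by the hourly refresh
print(lookup("blocked.example", cache, lambda d: "absent"))  # filtered
print(lookup("other.example", cache, lambda d: "absent"))    # absent
```

Keeping user requests off PostgreSQL entirely is what turns a 30+ second table scan into a ~50ms cache read.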

Known Limitations

  • DNS-only observation: Does not detect IP-level or deep packet inspection filtering
  • Point-in-time data: Resolution behavior can change; cache refreshes hourly
  • Resolver coverage: Limited to publicly accessible ISP nameservers
  • Domain coverage: Limited to domains present in CZDS and CT logs