Infrastructure

  • 100Cr+ DNS Responses
  • 28Cr+ Unique Domains
  • ~1,600 Domains/Second
  • ~20k Inserts/Second

System Overview

Niruddha is a distributed system that performs DNS resolution across Indian ISP nameservers at scale, processing over 100 crore responses across 28 crore domains to date.

Data Pipeline

Five-stage distributed processing pipeline

  1. Ingestion: CZDS + CT Logs → Kafka
  2. Distribution: Kafka → Worker Fleet
  3. Resolution: ZDNS + ISP Resolvers
  4. Publishing: Results → Kafka
  5. Storage: Kafka → PostgreSQL
Ingestion: Domain names sourced from CZDS (Centralized Zone Data Service) and Certificate Transparency logs are loaded into Kafka topics for parallel processing.
Distribution: Kafka consumers distribute domains across a fleet of worker nodes, each pulling from the shared topic.
Resolution: Workers use ZDNS to query domains against ISP nameservers (Jio, Airtel, BSNL, ACT, Idea, Tata), throttled to respect resolver capacity.
Publishing: DNS responses, including filtering indicators, are published back to Kafka for persistence.
Storage: A dedicated consumer writes responses to PostgreSQL at ~20,000 inserts per second using batch operations.
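The storage stage (stage 5) can be sketched as a consumer that accumulates messages into batches and writes each batch in a single transaction. This is a minimal illustration, not the production code: an in-memory SQLite database stands in for PostgreSQL, a plain list stands in for the Kafka consumer iterator, and the batch size and table schema are assumptions.

```python
import sqlite3

BATCH_SIZE = 5000  # illustrative; the real batch size is not documented here


def flush(conn, batch):
    """Write one batch of DNS responses in a single transaction."""
    conn.executemany(
        "INSERT INTO dns_responses (domain, resolver, answer) VALUES (?, ?, ?)",
        batch,
    )
    conn.commit()


def consume(messages, conn):
    """Accumulate messages into batches and flush when a batch fills.

    `messages` stands in for a Kafka consumer iterator over the results topic.
    """
    batch = []
    for msg in messages:
        batch.append(msg)
        if len(batch) >= BATCH_SIZE:
            flush(conn, batch)
            batch.clear()
    if batch:  # flush the partial tail batch
        flush(conn, batch)


# Demo: SQLite in-memory database standing in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dns_responses (domain TEXT, resolver TEXT, answer TEXT)")
consume([("example.in", "jio", "1.2.3.4")] * 12000, conn)
print(conn.execute("SELECT COUNT(*) FROM dns_responses").fetchone()[0])  # 12000
```

Batching amortises per-statement overhead across many rows, which is what makes throughput on the order of ~20,000 inserts/second feasible against a single table.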

Technology Stack

Core Infrastructure

  • Message Queue: Apache Kafka
  • Database: PostgreSQL
  • Cache Layer: Redis
  • DNS Resolution: ZDNS

Web Application

  • Framework: Next.js 15
  • UI Library: React 19
  • Local Storage: SQLite
  • Virtualization: TanStack Virtual

Worker Fleet

Horizontally scaled resolution nodes

The worker fleet is the core component of data collection. Each worker:

  • Pulls resolver addresses from Redis (dynamic resolver pool)
  • Consumes domains from Kafka topics
  • Performs DNS lookups using ZDNS (concurrent DNS scanner)
  • Publishes results back to Kafka for persistence

Workers operate independently with no shared state beyond Kafka offsets, allowing horizontal scaling. Resolution rate is throttled per-worker to stay within resolver capacity.

Collective throughput: ~1,600 domains/second

Variable based on resolver latency and throttling configuration
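The per-worker throttling described above can be modelled as a token bucket: each worker holds a budget of lookups that refills at a fixed rate, and a lookup proceeds only if a token is available. A minimal sketch (the class name, rate, and injectable clock are all illustrative; the actual throttling mechanism used by the workers is not specified here):

```python
import time


class Throttle:
    """Token-bucket limiter: at most `rate` lookups per second.

    Each worker would wrap its ZDNS calls with something like this so the
    fleet's aggregate query rate stays within resolver capacity.
    """

    def __init__(self, rate, clock=time.monotonic):
        self.rate = rate
        self.tokens = float(rate)  # start with a full bucket
        self.clock = clock
        self.last = clock()

    def acquire(self):
        now = self.clock()
        # Refill proportionally to elapsed time, capped at one second's budget.
        self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should back off and retry


# With a frozen clock, a 10 req/s bucket allows exactly 10 immediate lookups.
t = Throttle(10, clock=lambda: 0.0)
print(sum(t.acquire() for _ in range(15)))  # 10
```

Because the budget refills continuously, short bursts are absorbed while the long-run rate converges to the configured limit.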

Storage & Caching

Three-tier data architecture

PostgreSQL (Primary Storage)

Over 100 crore DNS response records stored with compound indexes optimised for index-only scans. Batch inserts achieve ~20,000 records/second.

HDD-backed storage; direct queries take 30-60 seconds due to table size.
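The value of a compound index here is that a query whose columns are all contained in the index can be answered from the index alone (an index-only scan), skipping reads of the large HDD-backed table. The effect can be demonstrated with SQLite standing in for PostgreSQL (the table and index names are illustrative, not the production schema):

```python
import sqlite3

# SQLite stand-in: a compound index that covers the query lets the engine
# answer from the index alone, never touching the table's row storage.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dns_responses (domain TEXT, resolver TEXT, answer TEXT)")
conn.execute("CREATE INDEX idx_domain_resolver ON dns_responses (domain, resolver)")

# The SELECT list and WHERE clause use only indexed columns.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT resolver FROM dns_responses WHERE domain = ?",
    ("example.in",),
).fetchall()
print(plan[0][-1])  # e.g. SEARCH dns_responses USING COVERING INDEX idx_domain_resolver (domain=?)
```

SQLite calls this a covering index; PostgreSQL's equivalent is the index-only scan, which additionally depends on the visibility map being up to date.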

Redis (Distributed Cache)

Filtered domains cached in Redis with 1-hour TTL. Web requests read from Redis first, reducing response time from 30+ seconds to ~50ms.

SQLite (Local Mirror)

Filtered domains mirrored to local SQLite for low-latency search and filtering. Used as fallback when Redis is unavailable.

Cache Refresh: Hourly automated refresh from PostgreSQL, with manual refresh available via admin interface. PostgreSQL is queried only during cache refresh, never on user requests.
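The read path above is a cache-aside pattern: try Redis, fall back to the SQLite mirror, and never touch PostgreSQL outside the refresh job. A minimal sketch, with a small in-process TTL map standing in for Redis (all names and the 1-hour TTL wiring are illustrative):

```python
import time


class TTLCache:
    """Tiny in-process stand-in for Redis SETEX/GET semantics."""

    def __init__(self, clock=time.monotonic):
        self.store = {}
        self.clock = clock

    def setex(self, key, ttl, value):
        self.store[key] = (self.clock() + ttl, value)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None or entry[0] <= self.clock():
            return None  # missing or expired
        return entry[1]


def lookup(domain, cache, sqlite_lookup):
    """Read path: cache first, local SQLite mirror as fallback.

    PostgreSQL is never queried here; only the hourly refresh job reads it
    to repopulate the cache.
    """
    value = cache.get(domain)
    if value is not None:
        return value
    return sqlite_lookup(domain)


cache = TTLCache()
cache.setex("blocked.example", 3600, "filtered")  # done by the hourly refresh
print(lookup("blocked.example", cache, lambda d: "absent"))  # filtered
print(lookup("other.example", cache, lambda d: "absent"))    # absent
```

Keeping user requests off PostgreSQL entirely is what turns a 30+ second table scan into a ~50ms cache read.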

Known Limitations

  • DNS-only observation: Does not detect IP-level or deep packet inspection filtering
  • Point-in-time data: Resolution behavior can change; cache refreshes hourly
  • Resolver coverage: Limited to publicly accessible ISP nameservers
  • Domain coverage: Limited to domains present in CZDS and CT logs