Regional Internet Registries (RIRs) publish WHOIS, route object delegation (in Statistics Exchange files), and reverse DNS (rDNS) zone files. These are valuable resources for network researchers and engineers, yet they contain inconsistencies and are not all available long-term. In this work, we consolidate and make available longitudinal RIR-level data, aiming to lower the barrier to working with these data.
{"prefixes": ["23.219.0.0/16"], "start_address": "23.219.0.0",
"end_address": "23.219.255.255", "rfc_2317": false,
"timestamp": 1684357200, "source": "ARIN", "af": 4,
"rdns": {"name": ["219.23.in-addr.arpa."],
"origin": ["23.in-addr.arpa."], "ttl": 86400,
"rdclass": "IN", "rdatasets":
{"NS": ["ns{1-8}.reverse.deploy.akamaitechnologies.com."]}}}
We enrich the data, e.g., by adding a classless delegation flag. The prefixes in RIR-level zones largely follow octet boundaries, but CNAME records are sometimes present for classless delegation, i.e., RFC 2317. We store the consolidated rDNS data in a tiered hierarchy similar to that of the WHOIS data and key the records in the same manner.
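To illustrate how an octet-aligned reverse zone name relates to the prefix stored with it, here is a minimal sketch that converts an `in-addr.arpa.` name into an IPv4 prefix. The helper name `reverse_name_to_prefix` is our own illustration and not part of the dataset tooling. It handles IPv4 names on octet boundaries only, so classless (RFC 2317) names are rejected rather than interpreted.

```python
import ipaddress


def reverse_name_to_prefix(name: str) -> ipaddress.IPv4Network:
    """Convert an octet-aligned in-addr.arpa name to an IPv4 prefix.

    Hypothetical helper for illustration; IPv4 only.
    """
    suffix = ".in-addr.arpa."
    lowered = name.lower()
    if not lowered.endswith(suffix):
        raise ValueError(f"not an IPv4 reverse name: {name!r}")
    labels = lowered[: -len(suffix)].split(".")
    # Octet-aligned names consist of 1 to 4 purely numeric labels;
    # anything else (e.g. an RFC 2317 label such as "0/25") is rejected.
    if not 1 <= len(labels) <= 4:
        raise ValueError(f"unexpected number of labels: {name!r}")
    if not all(label.isdigit() and int(label) <= 255 for label in labels):
        raise ValueError(f"not octet-aligned: {name!r}")
    # Labels appear in reverse order; pad the missing octets with zeros.
    octets = list(reversed(labels)) + ["0"] * (4 - len(labels))
    return ipaddress.IPv4Network(f"{'.'.join(octets)}/{8 * len(labels)}")


print(reverse_name_to_prefix("219.23.in-addr.arpa."))  # 23.219.0.0/16
```

For the example record above, the name `219.23.in-addr.arpa.` yields `23.219.0.0/16`, which matches the record's `prefixes` field.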
We store the consolidated WHOIS data in a tiered (year, month, day) hierarchy, which popular data engineering tools can use for partition discovery as well as optimisation. The partitioned data contains per-record information such as the source RIR, the WHOIS serial number, and the object's created and last-modified dates.
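As a rough illustration of a tiered layout, the sketch below derives a (year, month, day) partition path from a Unix timestamp. The Hive-style `year=/month=/day=` naming, the use of UTC, and the helper name `partition_path` are all assumptions made for this example; the key names actually used in the repository may differ.

```python
from datetime import datetime, timezone


def partition_path(timestamp: int) -> str:
    """Build a Hive-style partition path (assumed layout) from a Unix timestamp."""
    day = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return f"year={day.year}/month={day.month:02d}/day={day.day:02d}"


# The "timestamp" value from the rDNS example record.
print(partition_path(1684357200))  # year=2023/month=05/day=17
```

Tools that support partition discovery can typically prune whole directories from a query when a filter refers to partition columns like these, which is where the optimisation comes from.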
{"serial": 748705, "use_route": true,
"prefixes": ["23.219.183.0/24"], "start_address": "23.219.183.0",
"end_address": "23.219.183.255", "descr": "Akamai Technologies",
"origin": 20940, "mnt-by": "MNT-AKAMAI",
"source": "ARIN", "created": 1555027200,
"last-modified": 1555027200, "status": "ALLOCATED",
"netname": null, "country": "US", "af": 4}
Field | Datatype | Description | Dataset |
---|---|---|---|
serial | INTEGER | Internal serial number for published WHOIS | WHOIS |
prefixes | ARRAY of STRING | Allocated prefixes or delegated rDNS | Both |
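Because the source data can contain inconsistencies, a simple sanity check on a record is to confirm that its `prefixes` span exactly the range from `start_address` to `end_address`. The sketch below does this with Python's standard `ipaddress` module. The helper name `covers_range` is hypothetical, and the check assumes that `prefixes` holds the minimal CIDR cover of the address range, which we have not verified for every record.

```python
import ipaddress


def covers_range(record: dict) -> bool:
    """Return True if the prefixes exactly span start_address..end_address.

    Assumes "prefixes" is the minimal CIDR cover of the range.
    """
    start = ipaddress.ip_address(record["start_address"])
    end = ipaddress.ip_address(record["end_address"])
    # Minimal list of networks that covers the range, in ascending order.
    expected = list(ipaddress.summarize_address_range(start, end))
    actual = sorted(ipaddress.ip_network(p) for p in record["prefixes"])
    return actual == expected


whois_record = {
    "prefixes": ["23.219.183.0/24"],
    "start_address": "23.219.183.0",
    "end_address": "23.219.183.255",
}
print(covers_range(whois_record))  # True
```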
We host the consolidated rDNS and WHOIS data in S3-compatible object storage. You can address our repository and load the data directly using tools such as Apache Spark. The records are stored as bzip2-compressed JSON Lines objects. To get you started, we have created a basic Jupyter notebook in Python for inspiration (see below). If you prefer, you can also browse and download the data directly here.
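If you download an object rather than loading it with Spark, the standard library is enough to read it. The following is a minimal sketch of iterating over bzip2-compressed JSON Lines; the helper name `iter_records` is our own, and the two sample records are made up in memory so that the example runs without network access.

```python
import bz2
import io
import json
from typing import BinaryIO, Iterator


def iter_records(stream: BinaryIO) -> Iterator[dict]:
    """Yield one dict per non-empty line of a bzip2-compressed JSON Lines stream."""
    with bz2.open(stream, mode="rt", encoding="utf-8") as lines:
        for line in lines:
            if line.strip():
                yield json.loads(line)


# Made-up sample data standing in for a downloaded object.
sample = b'{"source": "ARIN", "af": 4}\n{"source": "RIPE", "af": 6}\n'
compressed = io.BytesIO(bz2.compress(sample))

records = list(iter_records(compressed))
print([record["source"] for record in records])  # ['ARIN', 'RIPE']
```

For a file on disk, passing `open(path, "rb")` as the stream works the same way, and the generator keeps memory use low because records are decoded one line at a time.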
Step 1: Create a working directory: mkdir rir-data-notebook && cd rir-data-notebook
Step 2: Create a file named Dockerfile with the following content:
FROM quay.io/jupyter/pyspark-notebook:spark-3.5.3
USER root
RUN wget -q -P /usr/local/spark/jars/ \
https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/1.11.375/aws-java-sdk-bundle-1.11.375.jar
RUN wget -q -P /usr/local/spark/jars/ \
https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.2.0/hadoop-aws-3.2.0.jar
USER ${NB_UID}
RUN pip install boto3
Step 3: Build the Docker image: docker build --tag 'rir-data-notebook:spark-3.5.3' .
Step 4: Run a container: docker run -p 8888:8888 rir-data-notebook:spark-3.5.3
Step 5: Open Jupyter Lab:
The standard output from the previous command will display a web link with an authentication token. Open the link in your browser to access Jupyter Lab, or use this link: http://127.0.0.1:8888/lab and submit the token.
Step 6: Upload the example notebook:
Click the up arrow ("Upload Files") and upload the example .ipynb file. A web preview can be found here.
Alfred Arouna
Ioana Livadariu
Mattijs Jonker