Lowering the Barriers to Working with Public RIR-Level Data

About

Regional Internet Registries (RIRs) publish WHOIS, route object delegation (in Statistics Exchange files), and reverse DNS (rDNS) zone files. These are valuable resources for networking researchers and engineers, yet they contain inconsistencies and are not all available long-term. In this work, we consolidate longitudinal RIR-level data and make it available, aiming to lower the barriers to working with these data.

{"prefixes": ["23.219.0.0/16"], "start_address": "23.219.0.0", 
"end_address": "23.219.255.255", "rfc_2317": false, 
"timestamp": 1684357200, "source": "ARIN", "af": 4, 
"rdns": {"name": ["219.23.in-addr.arpa."], 
"origin": ["23.in-addr.arpa."], "ttl": 86400, 
"rdclass": "IN", "rdatasets": 
{"NS": ["ns{1-8}.reverse.deploy.akamaitechnologies.com."]}}}

Consolidated rDNS data

We enrich the data, e.g., by adding a classless delegation flag (the rfc_2317 field in the record above). The prefixes in RIR-level zones largely follow octet boundaries, but CNAME records are sometimes present for classless delegation as described in RFC 2317. We store the consolidated rDNS data in a tiered hierarchy similar to that of the WHOIS data and key records in the same manner.
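To illustrate the octet-boundary relationship between a reverse zone name and the prefix it covers, here is a minimal Python sketch. The helper name reverse_name_to_prefix is hypothetical and not part of the published tooling; it only handles octet-aligned IPv4 names and deliberately rejects RFC 2317 style labels such as "0/25".

```python
import ipaddress


def reverse_name_to_prefix(name: str) -> ipaddress.IPv4Network:
    """Map an octet-aligned in-addr.arpa name to the IPv4 prefix it covers.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    suffix = ".in-addr.arpa."
    if not name.endswith(suffix):
        raise ValueError(f"not an IPv4 reverse name: {name!r}")

    # Labels appear in reverse octet order, e.g. "219.23" for 23.219.0.0/16.
    labels = name[: -len(suffix)].split(".")
    if not 1 <= len(labels) <= 4 or not all(label.isdigit() for label in labels):
        raise ValueError(f"not an octet-aligned reverse name: {name!r}")

    octets = list(reversed(labels))
    prefix_len = 8 * len(octets)          # one label per full octet
    octets += ["0"] * (4 - len(octets))   # pad the host part with zeros
    return ipaddress.ip_network(".".join(octets) + f"/{prefix_len}")


prefix = reverse_name_to_prefix("219.23.in-addr.arpa.")
print(prefix)
```

For the sample record above, the name "219.23.in-addr.arpa." maps to 23.219.0.0/16, which matches its prefixes field.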

Consolidated WHOIS data

We store the consolidated WHOIS data in a tiered (year, month, day) hierarchy, which popular data engineering tools can use for partition discovery as well as optimisation. The partitioned data contains per-record information such as the source RIR, the WHOIS serial number, and the object's creation and last-modified dates.

{"serial": 748705, "use_route": true, 
"prefixes": ["23.219.183.0/24"],"start_address": "23.219.183.0", 
"end_address": "23.219.183.255", "descr": "Akamai Technologies", 
"origin": 20940, "mnt-by": "MNT-AKAMAI", 
"source": "ARIN", "created": 1555027200, 
"last-modified": 1555027200, "status": "ALLOCATED",
"netname": null, "country": "US", "af": 4}
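As a small sanity check on a record like the one above, the following Python sketch parses it, confirms that the prefixes field covers exactly the range from start_address to end_address, and converts the created field to a date. Interpreting created and last-modified as Unix epoch seconds in UTC is an assumption based on the sample values, not something the schema states.

```python
import ipaddress
import json
from datetime import datetime, timezone

# The sample WHOIS record shown above, as a single JSON document.
record = json.loads("""
{"serial": 748705, "use_route": true,
 "prefixes": ["23.219.183.0/24"], "start_address": "23.219.183.0",
 "end_address": "23.219.183.255", "descr": "Akamai Technologies",
 "origin": 20940, "mnt-by": "MNT-AKAMAI",
 "source": "ARIN", "created": 1555027200,
 "last-modified": 1555027200, "status": "ALLOCATED",
 "netname": null, "country": "US", "af": 4}
""")

first = ipaddress.ip_address(record["start_address"])
last = ipaddress.ip_address(record["end_address"])

# Minimal list of CIDR blocks that covers exactly [first, last].
covering = [str(net) for net in ipaddress.summarize_address_range(first, last)]
prefixes_match = covering == record["prefixes"]

# Assumption: timestamps are Unix epoch seconds, interpreted in UTC.
created = datetime.fromtimestamp(record["created"], tz=timezone.utc)

print(prefixes_match, created.date().isoformat())
```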

Data Schema

Field	Datatype	Description	Dataset
serial	INTEGER	Internal serial number for published WHOIS	WHOIS
prefixes	ARRAY of STRING	Allocated prefixes or delegated rDNS	Both

How to Obtain the Data

We host the consolidated rDNS and WHOIS data in S3-compatible object storage. You can address our repository and load the data directly using tools such as Apache Spark. The records are stored in bzip2-compressed JSON Lines objects. To get you started, we have created a basic Jupyter Python notebook for inspiration (see below). If you prefer, you can also browse and download the data directly here.
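If you download an object rather than reading it through Spark, the bzip2-compressed JSON Lines format can be read with the Python standard library alone. The sketch below is a minimal example under stated assumptions: the function name read_jsonl_bz2 and the file name sample.json.bz2 are hypothetical, and the two records are trimmed stand-ins written locally so the example runs without network access.

```python
import bz2
import json
import os
import tempfile


def read_jsonl_bz2(path):
    """Yield one dict per non-empty line of a bzip2-compressed JSON Lines file."""
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


# Stand-in records (trimmed from the samples on this page) for a local demo.
sample = [
    {"source": "ARIN", "prefixes": ["23.219.183.0/24"], "af": 4},
    {"source": "ARIN", "prefixes": ["23.219.0.0/16"], "af": 4},
]

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name; real object names in the repository may differ.
    path = os.path.join(tmp, "sample.json.bz2")
    with bz2.open(path, mode="wt", encoding="utf-8") as handle:
        for rec in sample:
            handle.write(json.dumps(rec) + "\n")

    records = list(read_jsonl_bz2(path))

print(len(records), records[0]["prefixes"])
```

Reading line by line keeps memory use flat for large objects, since each record is decoded independently.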

Example using Docker, the AWS SDK for Python and Spark

Step 1: Create an image building directory: mkdir rir-data-notebook && cd rir-data-notebook
Step 2: Create a file named Dockerfile with the following content:


FROM quay.io/jupyter/pyspark-notebook:spark-3.5.3

USER root
RUN wget -q -P /usr/local/spark/jars/ \
    https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/1.11.375/aws-java-sdk-bundle-1.11.375.jar
RUN wget -q -P /usr/local/spark/jars/ \
    https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.2.0/hadoop-aws-3.2.0.jar

USER ${NB_UID}
RUN pip install boto3

Step 3: Build the Docker image: docker build --tag 'rir-data-notebook:spark-3.5.3' .
Step 4: Run a container: docker run -p 8888:8888 rir-data-notebook:spark-3.5.3
Step 5: Open Jupyter Lab:
The standard output from the previous command will display a web link with an authentication token. Open the link in your browser to access Jupyter Lab, or use this link: http://127.0.0.1:8888/lab and submit the token.
Step 6: Upload the example notebook:
Click the up arrow ("Upload Files") and upload the example .ipynb file. A web preview can be found here.

Team Members

Alfred Arouna
alfred[at]simula[dot]no

Ioana Livadariu
ioana[at]simula[dot]no

Mattijs Jonker
m.jonker[at]utwente[dot]nl