Revocation Statuses on the Internet
Nikita Korzhitskii and Niklas Carlsson
Paper:
Nikita Korzhitskii and Niklas Carlsson,
"Revocation Statuses on the Internet",
Proc. Passive and Active Measurement Conference (PAM),
Mar/Apr. 2021.
(pdf)
Abstract:
The modern Internet is highly dependent on the trust communicated via X.509 certificates. However, in some cases certificates become untrusted and it is necessary to revoke them. In practice, the problem of secure certificate revocation has not yet been solved, and today no revocation procedure (similar to Certificate Transparency w.r.t. certificate issuance) has been adopted to provide transparent and immutable history of all revocations. Instead, the status of most certificates can only be checked with Online Certificate Status Protocol (OCSP) and/or Certificate Revocation Lists (CRLs). In this paper, we present the first longitudinal characterization of the revocation statuses delivered by CRLs and OCSP servers from the time of certificate expiration to status disappearance. The analysis captures the status history of over 1 million revoked certificates, including 773K certificates mass-revoked by Let's Encrypt. Our characterization provides a new perspective on the Internet's revocation rates, quantifies how short-lived the revocation statuses are, highlights differences in revocation practices within and between different CAs, and captures biases and oddities in the handling of revoked certificates. Combined, the findings motivate the development and adoption of a revocation transparency standard.
Datasets
To help others build on our work, we make our datasets available below.
The datasets are compressed with the LZMA2 algorithm from XZ Utils; decompression requires at least ~1536 MiB of RAM and ~267 GiB of storage.
The dataset can be downloaded
here.
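Given the storage needed for full decompression, it may be convenient to stream the compressed data instead of unpacking it to disk. The following is a minimal sketch using Python's standard lzma module. It assumes the download is a single XZ-compressed text file; the function names and any path passed to them are hypothetical and not part of the dataset.

```python
import lzma


def iter_lines(path, encoding="utf-8"):
    """Yield text lines from an .xz file without unpacking it to disk."""
    # lzma.open decompresses incrementally as the file is read.
    with lzma.open(path, mode="rt", encoding=encoding) as handle:
        for line in handle:
            yield line.rstrip("\n")


def count_lines(path):
    """Count the lines in an .xz file by streaming through it once."""
    return sum(1 for _ in iter_lines(path))
```

Streaming avoids the disk requirement, but the decompressor still needs enough memory for the LZMA2 dictionary, so the RAM figure stated above continues to apply.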
If you use our datasets in your research,
please include a reference to our PAM 2021 paper
(pdf)
in your work.
Dataset A
The dataset contains OCSP statuses of revoked certificates in a tab-separated text file with the following fields:
-
certificate: an internal certificate ID
-
first_pass_status: status code returned during the first pass
-
not_before: start of the validity period of a certificate
-
not_after: end of the validity period of a certificate
-
status: status code
-
status_time_min_unix: the first time the status code was observed for the certificate
-
status_time_max_unix: the last time the status code was observed for the certificate
-
count: number of status observations with the code during the above interval
-
certificate: an internal certificate ID (duplicate)
-
serial: serial number of a certificate
-
name: common name of a certificate
-
subject: certificate subject, JSON encoded
-
issuer: issuer, JSON encoded
-
version: certificate version, OpenSSL enumeration
-
purposes: certificate purposes, JSON encoded
-
hash: certificate hash, SHA256
-
basicConstraints:
-
keyUsage: keyUsage as interpreted by OpenSSL
-
extendedKeyUsage: extended key usage as interpreted by OpenSSL
-
authorityKeyIdentifier: reference to the key of the issuer
-
subjectAltName: alternative common names, e.g. additional domains
-
hasSCT: set to 1, if the certificate contains at least one Signed Certificate Timestamp
-
keyType: key type (e.g. RSA, EC)
-
keyBits: key length in bits
-
extensions: certificate extensions, JSON encoded
-
isEV: set to 1, if the certificate is an Extended Validation certificate
Timestamps are provided as Unix timestamps.
Status codes:
-
1 - Good (non-revoked)
-
2 - Revoked
-
6 - Unauthorized (A non-signed OCSP response)
-
10 - Unknown (Any non-standard response or an explicit OCSP response with a status Unknown)
-
124 - Timeout (It took more than 5 seconds to get a response)
Additional codes:
-
0 - Unchecked (should not appear)
-
11 - Bad file (Appears if there's an error parsing the file, should not appear)
-
12 - Bad url (Appears if something is wrong with the OCSP endpoint URL in the certificate, should not appear)
-
254 - Pending (should not appear)
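The field list and status codes above can be turned into a small reader. The following is a sketch under stated assumptions: columns appear in the order listed, the second certificate column is renamed certificate_dup (a hypothetical name) to avoid a key clash, and rows with an unexpected column count or a non-numeric status (for example a header row, if one exists) are skipped. All function and variable names are illustrative.

```python
from datetime import datetime, timezone

# Column order as listed above; the duplicate ID column is renamed here.
FIELDS = [
    "certificate", "first_pass_status", "not_before", "not_after", "status",
    "status_time_min_unix", "status_time_max_unix", "count",
    "certificate_dup", "serial", "name", "subject", "issuer", "version",
    "purposes", "hash", "basicConstraints", "keyUsage", "extendedKeyUsage",
    "authorityKeyIdentifier", "subjectAltName", "hasSCT", "keyType",
    "keyBits", "extensions", "isEV",
]

# Status codes as documented above.
STATUS_NAMES = {
    0: "Unchecked", 1: "Good", 2: "Revoked", 6: "Unauthorized",
    10: "Unknown", 11: "Bad file", 12: "Bad url", 124: "Timeout",
    254: "Pending",
}


def to_utc(value):
    """Convert a Unix timestamp string to a timezone-aware datetime."""
    return datetime.fromtimestamp(float(value), tz=timezone.utc)


def parse_rows(lines):
    """Yield one dict per well-formed data row of the tab-separated file."""
    for line in lines:
        values = line.rstrip("\r\n").split("\t")
        if len(values) != len(FIELDS):
            continue  # unexpected column count
        row = dict(zip(FIELDS, values))
        if not row["status"].isdigit():
            continue  # e.g. a header row
        row["status_name"] = STATUS_NAMES.get(int(row["status"]), "Other")
        row["first_seen"] = to_utc(row["status_time_min_unix"])
        row["last_seen"] = to_utc(row["status_time_max_unix"])
        yield row
```

Because parse_rows accepts any iterable of lines, it can be fed directly from an open file or from a streaming decompressor.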
Dataset B
The dataset contains several files with Python tuples ('hash', 'parsedCRL').
Each tuple corresponds to a snapshot of a CRL, where parsedCRL is a parsed version of the list encoded in JSON format.
Each non-empty parsedCRL contains the key 'Certificate Revocation List (CRL)',
which holds details about the list, such as validity dates, signature algorithm,
version, and extensions, and the key 'Revoked Certificates', whose value is an object
that has the serial numbers of revoked certificates as keys. The schemas of parsedCRL objects vary,
depending on the fields included in the original CRL.
The following is an example of a parsedCRL object.
{
'Revoked Certificates': {
'020047A56DC2A9A924EBFC588DE19E412871B859': {
'Revocation Date': 'Oct 14 20:30:52 2019 GMT'
},
'021821E3F0C5F9EDFB70224442A4D577A16FEF38': {
'Revocation Date': 'Oct 14 20:31:15 2019 GMT'
}
},
'Certificate Revocation List (CRL)': {
'Last Update': 'Feb 24 09:49:20 2020 GMT',
'Signature Algorithm': 'sha256WithRSAEncryption',
'Version 2 (0x1)': {},
'CRL extensions': {
'X509v3 CRL Number': '161',
'X509v3 Authority Key Identifier': 'keyid:6C:E2:B0:26:8D:5B:D6:26:08:1F:98:5D:69:E0:0E:7F:55:EC:AE:76'
},
'Next Update': 'Feb 26 09:49:20 2020 GMT',
'Issuer': '/CN=Correo Uruguayo - CA/C=UY'
}
}
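A snapshot like the one above can be read with the standard library. The following is a sketch under assumptions that the dataset description does not confirm: each tuple is stored on its own line as a Python literal, and the second element is either a dict or a JSON string (both cases are handled). The date format is taken from the example; the function names are hypothetical.

```python
import ast
import json
from datetime import datetime, timezone

# Date layout as seen in the example, e.g. 'Oct 14 20:30:52 2019 GMT'.
DATE_FORMAT = "%b %d %H:%M:%S %Y GMT"


def parse_snapshot(line):
    """Parse one ('hash', parsedCRL) tuple written as a Python literal."""
    crl_hash, parsed = ast.literal_eval(line)
    if isinstance(parsed, str):
        # Assumption: parsedCRL may be stored as a JSON string.
        parsed = json.loads(parsed) if parsed else {}
    return crl_hash, parsed


def revocation_dates(parsed):
    """Map each revoked serial number to its revocation time in UTC."""
    dates = {}
    for serial, details in parsed.get("Revoked Certificates", {}).items():
        raw = details.get("Revocation Date")
        if raw is None:
            dates[serial] = None  # schemas vary; the field may be absent
        else:
            moment = datetime.strptime(raw, DATE_FORMAT)
            dates[serial] = moment.replace(tzinfo=timezone.utc)
    return dates
```

ast.literal_eval is used rather than eval because it only accepts literals, which is safer for data files from any external source.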
Dataset C
The dataset contains 2 CSV files:
-
a map of CRL IDs to URL addresses (id, url)
-
measurement times and hashes of observed lists, which refer to Dataset B (id, hash, timestamp)
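The two files can be joined on the CRL ID to list, for each URL, the snapshots observed there. The following is a sketch that assumes the column orders given above (id, url) and (id, hash, timestamp) and no header rows. Timestamps are kept as strings because their format is not specified here. The function names and file paths are hypothetical.

```python
import csv
from collections import defaultdict


def load_csv(path):
    """Read a CSV file into a list of rows (lists of strings)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle))


def snapshots_by_url(url_rows, observation_rows):
    """Group (timestamp, hash) observations by the URL of their CRL."""
    urls = {}
    for row in url_rows:
        if len(row) >= 2:
            urls[row[0]] = row[1]
    grouped = defaultdict(list)
    for row in observation_rows:
        if len(row) < 3:
            continue  # skip malformed rows
        crl_id, crl_hash, timestamp = row[:3]
        if crl_id in urls:
            grouped[urls[crl_id]].append((timestamp, crl_hash))
    return dict(grouped)
```

Each hash in the result can then be looked up among the ('hash', 'parsedCRL') tuples of Dataset B to retrieve the corresponding list contents.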
Citation format
When citing our dataset or work, please cite the conference version of the paper:
-
Nikita Korzhitskii and Niklas Carlsson,
"Revocation Statuses on the Internet",
Proc. Passive and Active Measurement Conference (PAM),
Mar/Apr. 2021.
(pdf)