Revocation Statuses on the Internet

Nikita Korzhitskii and Niklas Carlsson


Paper: Nikita Korzhitskii and Niklas Carlsson, "Revocation Statuses on the Internet", Proc. Passive and Active Measurement Conference (PAM), Mar/Apr. 2021. (pdf)

Abstract: The modern Internet is highly dependent on the trust communicated via X.509 certificates. However, in some cases certificates become untrusted and it is necessary to revoke them. In practice, the problem of secure certificate revocation has not yet been solved, and today no revocation procedure (similar to Certificate Transparency w.r.t. certificate issuance) has been adopted to provide transparent and immutable history of all revocations. Instead, the status of most certificates can only be checked with Online Certificate Status Protocol (OCSP) and/or Certificate Revocation Lists (CRLs). In this paper, we present the first longitudinal characterization of the revocation statuses delivered by CRLs and OCSP servers from the time of certificate expiration to status disappearance. The analysis captures the status history of over 1 million revoked certificates, including 773K certificates mass-revoked by Let's Encrypt. Our characterization provides a new perspective on the Internet's revocation rates, quantifies how short-lived the revocation statuses are, highlights differences in revocation practices within and between different CAs, and captures biases and oddities in the handling of revoked certificates. Combined, the findings motivate the development and adoption of a revocation transparency standard.

Datasets

To help others build upon our work, we make our datasets available below. The datasets are compressed with the LZMA2 algorithm (XZ Utils); decompression requires at least ~1536 MiB of RAM and ~267 GiB of storage.
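Given the size of the decompressed data, it may help to decompress in a streaming fashion rather than reading an archive into memory. The following is a minimal sketch using Python's standard lzma module; the function name and the file paths are hypothetical, and the actual archive names are not specified here.

```python
import lzma
import shutil


def decompress_xz(src_path, dst_path, chunk_size=1 << 20):
    """Stream-decompress an .xz file to dst_path, one chunk at a time."""
    # lzma.open yields decompressed bytes; copyfileobj copies them in
    # fixed-size chunks, so the whole file is never held in memory.
    with lzma.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
```

For example, decompress_xz("dataset_a.tsv.xz", "dataset_a.tsv") would write the decompressed file next to the archive, assuming an archive with that (hypothetical) name exists.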

The dataset can be downloaded here.

If you use our datasets in your research, please include a reference to our PAM 2021 paper (pdf) in your work.

Dataset A

The dataset contains OCSP statuses of revoked certificates in a "Tab"-separated text file with the following fields:
  1. certificate: an internal certificate ID
  2. first_pass_status: status code returned during the first pass
  3. not_before: start of the validity period of a certificate
  4. not_after: end of the validity period of a certificate
  5. status: status code
  6. status_time_min_unix: the first time the status code was observed for the certificate
  7. status_time_max_unix: the last time the status code was observed for the certificate
  8. count: number of status observations with the code during the above interval
  9. certificate: an internal certificate ID (duplicate)
  10. serial: serial number of a certificate
  11. name: common name of a certificate
  12. subject: certificate subject, JSON encoded
  13. issuer: issuer, JSON encoded
  14. version: certificate version, OpenSSL enumeration
  15. purposes: certificate purposes, JSON encoded
  16. hash: certificate hash, SHA256
  17. basicConstraints:
  18. keyUsage: keyUsage as interpreted by OpenSSL
  19. extendedKeyUsage: extended key usage as interpreted by OpenSSL
  20. authorityKeyIdentifier: reference to the key of the issuer
  21. subjectAltName: alternative common names, e.g. additional domains
  22. hasSCT: set to 1, if the certificate contains at least one Signed Certificate Timestamp
  23. keyType: key type (e.g. RSA, EC)
  24. keyBits: key length in bits
  25. extensions: certificate extensions, JSON encoded
  26. isEV: set to 1, if the certificate is an Extended Validation certificate
The timestamps are provided as Unix timestamps.
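The field list above can be turned into a small parser. The sketch below assumes that the file has no header row, that every line carries all 26 fields, and that the two status_time fields are integer Unix timestamps in seconds; none of this is confirmed by the description. The name certificate_dup for field 9 is our own label, chosen so that the duplicate ID does not overwrite field 1.

```python
from datetime import datetime, timezone

# Column names in file order; "certificate_dup" is a hypothetical label
# for the duplicate certificate ID in field 9.
FIELDS = [
    "certificate", "first_pass_status", "not_before", "not_after",
    "status", "status_time_min_unix", "status_time_max_unix", "count",
    "certificate_dup", "serial", "name", "subject", "issuer", "version",
    "purposes", "hash", "basicConstraints", "keyUsage",
    "extendedKeyUsage", "authorityKeyIdentifier", "subjectAltName",
    "hasSCT", "keyType", "keyBits", "extensions", "isEV",
]


def parse_line(line):
    """Parse one tab-separated line of Dataset A into a dict."""
    values = line.rstrip("\r\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    record = dict(zip(FIELDS, values))
    # Assumption: timestamps are integer seconds since the Unix epoch.
    for key in ("status_time_min_unix", "status_time_max_unix"):
        record[key] = datetime.fromtimestamp(int(record[key]), tz=timezone.utc)
    return record
```

Splitting on the tab character directly, rather than using a CSV reader, avoids any quote handling interfering with the JSON-encoded fields.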

Status codes:

Additional codes:

Dataset B

The dataset contains several files with Python tuples ('hash', 'parsedCRL'). Each tuple corresponds to a snapshot of a CRL, where parsedCRL is a parsed version of the list encoded in JSON format. Each non-empty parsedCRL contains a key 'Certificate Revocation List (CRL)' that refers to details about the list, such as validity dates, signature algorithm, version, and extensions, while the key 'Revoked Certificates' refers to an object that has the serial numbers of revoked certificates as keys. The schemas of parsedCRL objects vary, depending on the fields included in the original CRL.

The following is an example of a parsedCRL object.


{
  'Revoked Certificates': {
    '020047A56DC2A9A924EBFC588DE19E412871B859': {
      'Revocation Date': 'Oct 14 20:30:52 2019 GMT'
    },
    '021821E3F0C5F9EDFB70224442A4D577A16FEF38': {
      'Revocation Date': 'Oct 14 20:31:15 2019 GMT'
    }
  },
  'Certificate Revocation List (CRL)': {
    'Last Update': 'Feb 24 09:49:20 2020 GMT',
    'Signature Algorithm': 'sha256WithRSAEncryption',
    'Version 2 (0x1)': {},
    'CRL extensions': {
      'X509v3 CRL Number': '161',
      'X509v3 Authority Key Identifier': 'keyid:6C:E2:B0:26:8D:5B:D6:26:08:1F:98:5D:69:E0:0E:7F:55:EC:AE:76'
    },
    'Next Update': 'Feb 26 09:49:20 2020 GMT',
    'Issuer': '/CN=Correo Uruguayo - CA/C=UY'
  }
}
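A snapshot of this shape could be read roughly as follows. This is a sketch under assumptions: it presumes each tuple is available as the text of a Python literal (how tuples are delimited within the files is not described here), and it accepts parsedCRL either as a dict or as a JSON string, since both readings of the description are plausible. The function names are hypothetical.

```python
import ast
import json
from datetime import datetime, timezone

# Date layout seen in the example above. %b depends on the locale, so
# this assumes English month abbreviations (the default C locale).
DATE_FORMAT = "%b %d %H:%M:%S %Y GMT"


def parse_snapshot(text):
    """Turn the text of one ('hash', parsedCRL) tuple into (hash, dict)."""
    crl_hash, parsed = ast.literal_eval(text)
    if isinstance(parsed, str):
        # Assumption: a string payload is JSON; empty means no parsed CRL.
        parsed = json.loads(parsed) if parsed else {}
    return crl_hash, parsed or {}


def revocation_dates(parsed_crl):
    """Map each revoked serial number to its revocation time in UTC."""
    revoked = parsed_crl.get("Revoked Certificates", {})
    return {
        serial: datetime.strptime(entry["Revocation Date"], DATE_FORMAT)
        .replace(tzinfo=timezone.utc)
        for serial, entry in revoked.items()
        if "Revocation Date" in entry  # schemas vary between CRLs
    }
```

ast.literal_eval is used instead of eval so that only literals are evaluated, which is the safer choice for data downloaded from elsewhere.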

Dataset C

The dataset contains two CSV files: a map of CRL IDs to URLs (id, url), and the measurement times and hashes of the observed lists (id, hash, timestamp), where each hash refers to a snapshot in Dataset B.
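The two files can be joined on the CRL ID to find where and when each Dataset B snapshot was observed. The sketch below assumes comma-separated files without header rows and with columns in the order given above; the function names are hypothetical, and timestamps are kept as strings because their exact format is not stated.

```python
import csv
from collections import defaultdict


def load_url_map(lines):
    """Build {crl_id: url} from the (id, url) file."""
    return {crl_id: url for crl_id, url in csv.reader(lines)}


def observations_by_hash(lines, url_map):
    """Group (url, timestamp) observations by CRL snapshot hash."""
    by_hash = defaultdict(list)
    for crl_id, crl_hash, timestamp in csv.reader(lines):
        # url is None if the ID is missing from the URL map.
        by_hash[crl_hash].append((url_map.get(crl_id), timestamp))
    return by_hash
```

Both functions accept any iterable of lines, so an open file object can be passed directly; the resulting hashes can then be looked up among the tuples of Dataset B.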

Citation format

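When citing our dataset or work, please cite the conference version of the paper: Nikita Korzhitskii and Niklas Carlsson, "Revocation Statuses on the Internet", Proc. Passive and Active Measurement Conference (PAM), Mar/Apr. 2021.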