
Listing files in each layer of a saved Docker image

Suppose someone gives you a Docker image as a tarball, i.e., the output of `docker save`, and you want to find a file that was deleted at some point and therefore does not exist in the final image. How do you do it?

Hint:

  1. A Docker image is composed of layers, i.e., filesystem snapshots, one for each Dockerfile instruction that changes the filesystem (such as RUN, COPY, and ADD).
  2. Each layer in an image tarball is also itself a tarball.
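
To make the two hints concrete, here is a minimal sketch (the function name `list_layers` is our own, not part of any library). It assumes the tarball contains a `manifest.json`, which `docker save` writes as a JSON list with one entry per saved image, and it only looks at the first image. Each entry's `Layers` field lists the in-archive paths of that image's layer tarballs from the base layer upward, so following it yields the layers in build order without guessing file names.

```python
import json
import tarfile


def list_layers(image):
    """Yield (layer_path, file_names) for each layer of a `docker save` tarball.

    Layers are yielded in the order listed in manifest.json, i.e., from the
    base layer up to the top layer of the first image in the archive.
    """
    with tarfile.open(image, "r") as fobj:
        # manifest.json is a JSON list with one entry per saved image; each
        # entry's "Layers" field holds the in-archive paths of its layers.
        manifest = json.load(fobj.extractfile("manifest.json"))
        for path in manifest[0]["Layers"]:
            # Each layer is itself a tarball; open it straight from the outer archive.
            with tarfile.open(fileobj=fobj.extractfile(path), mode="r") as layer:
                yield path, layer.getnames()
```

For example, `for path, names in list_layers("image.tar"): print(path, len(names))` prints each layer path with its file count (`image.tar` is a placeholder). Following `manifest.json` rather than matching the name `layer.tar` may also help with newer Docker releases that, as far as we know, store layers under `blobs/sha256/` instead; we have not checked every version.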

A TL;DR example solution (the code can be found here):

```python
#! /usr/bin/env python3
# -*- coding: utf-8 -*-
# vim:fenc=utf-8
#
# Copyright © 2022 Pi-Yueh Chuang <pychuang@pm.me>
#
# Distributed under terms of the BSD 3-Clause license.

"""List files in each layer of an image tarball created through `docker save`.
"""
import argparse
import logging
import pathlib
import tarfile


def main(image, logger):
    """Main function.
    """

    with tarfile.open(image, "r") as fobj:
        # Walk through every member of the outer image tarball.
        for member in fobj:

            # Only the per-layer archives are of interest.
            if not member.name.endswith("layer.tar"):
                continue

            logger.info(member.name)

            # Read the layer tarball directly from the outer archive instead of
            # extracting it to a temporary directory first. extractfile() also
            # follows layers stored as links to another layer.tar.
            layer = fobj.extractfile(member)
            if layer is None:  # neither a regular file nor a link
                continue

            with tarfile.open(fileobj=layer, mode="r") as subfobj:
                for submember in subfobj:
                    logger.info(f"\t{submember.name}")
            logger.info("")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="List files in each layer of a Docker image tarball (the output of `docker save`, optionally compressed)")
    parser.add_argument("image", metavar="IMAGE", action="store", type=pathlib.Path)
    parser.add_argument("logfile", metavar="LOGFILE", action="store", type=pathlib.Path)
    args = parser.parse_args()

    # Send every message, without any prefix, both to the terminal and to LOGFILE.
    logger = logging.getLogger(__name__)
    logging.basicConfig(handlers=[logging.NullHandler()], format="%(message)s", level=logging.DEBUG)
    logger.addHandler(logging.StreamHandler())
    logger.addHandler(logging.FileHandler(args.logfile, "w", "utf-8"))

    main(args.image, logger)
```

The code above writes a list of all files in each layer to a user-specified text file (and also prints it to the terminal). We can then search that text file for the desired file and note which layer contains it. Finally, we extract that single file from that layer's tarball.
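
Extracting that single file can also be done in Python. The sketch below is one possible way, assuming we already know the layer's path inside the image tarball and the file's path inside that layer, both exactly as printed in the listing; the function `extract_from_layer` and its parameter names are hypothetical. It reads everything straight from the archive, so nothing else is unpacked to disk.

```python
import shutil
import tarfile


def extract_from_layer(image, layer_path, file_path, dest):
    """Copy a single file out of a single layer of a `docker save` tarball.

    image:      path to the image tarball on the host
    layer_path: path of the layer tarball inside the image tarball
    file_path:  path of the wanted file inside that layer
    dest:       where to write the extracted file on the host
    """
    with tarfile.open(image, "r") as fobj:
        # Open the layer tarball directly from the outer archive.
        with tarfile.open(fileobj=fobj.extractfile(layer_path), mode="r") as layer:
            # Raises KeyError if file_path is not in this layer.
            src = layer.extractfile(file_path)
            if src is None:
                raise ValueError(f"{file_path} is not a regular file in {layer_path}")
            with src, open(dest, "wb") as out:
                # Stream the contents instead of reading the whole file into memory.
                shutil.copyfileobj(src, out)
```

For instance, `extract_from_layer("image.tar", "<layer>/layer.tar", "tmp/abc.tar.gz", "abc.tar.gz")`, where all four arguments are placeholders for the values found in the listing. The contents are copied with `shutil.copyfileobj` rather than `TarFile.extract` so the file lands exactly at `dest` instead of under its in-layer directory tree.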

Note that each layer is committed at the end of a Dockerfile instruction. So if a file was downloaded and removed within the same instruction, it never appears in any layer, and there is no way to recover it from the image. For example, if the Dockerfile contains something like:

```dockerfile
RUN curl -LO https://......./abc.tar.gz \
 && <compile and build> \
 && rm abc.tar.gz
```

then abc.tar.gz will not be found in any layer.
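
Conversely, if the file was created by one instruction and removed by a later one, it survives in the earlier layer, and the later layer records the deletion as a whiteout entry: an empty file named `.wh.<name>` in the directory where the file used to be (this is the whiteout convention of the Docker/OCI layer format). Searching for that marker tells us which path to look for in lower layers. Below is a minimal sketch; `find_deletions` is a hypothetical helper, and it assumes the tarball has a `manifest.json` whose first entry is the image of interest.

```python
import json
import posixpath
import tarfile

# Prefix that Docker/OCI layers use to mark a file deleted relative to lower layers.
WHITEOUT_PREFIX = ".wh."


def find_deletions(image, filename):
    """Return (layer_path, deleted_path) pairs for every whiteout of `filename`.

    `filename` is a base name such as "abc.tar.gz"; a match in any directory
    of any layer is reported, in layer order (base layer first).
    """
    hits = []
    with tarfile.open(image, "r") as fobj:
        # Assumes manifest.json is present and lists the image of interest first.
        manifest = json.load(fobj.extractfile("manifest.json"))
        for layer_path in manifest[0]["Layers"]:
            with tarfile.open(fileobj=fobj.extractfile(layer_path), mode="r") as layer:
                for name in layer.getnames():
                    directory, base = posixpath.split(name)
                    if base == WHITEOUT_PREFIX + filename:
                        # The file used to exist at this path in some lower layer.
                        hits.append((layer_path, posixpath.join(directory, filename)))
    return hits
```

An empty result means no layer recorded a deletion of that name; if the file is also absent from every layer listing, that points to the single-instruction case above.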
