{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n# Create a very large FITS file from scratch\n\n\nThis example demonstrates how to create a large file (larger than will fit in\nmemory) from scratch using `astropy.io.fits`.\n\n-------------------\n\n*By: Erik Bray*\n\n*License: BSD*\n\n-------------------\n\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Normally to create a single image FITS file one would do something like:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import os\nimport numpy as np\nfrom astropy.io import fits\ndata = np.zeros((40000, 40000), dtype=np.float64)\nhdu = fits.PrimaryHDU(data=data)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Then use the `astropy.io.fits.writeto()` method to write out the new\nfile to disk\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "hdu.writeto('large.fits')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "However, a 40000 x 40000 array of doubles is nearly twelve gigabytes! Most\nsystems won't be able to create that in memory just to write out to disk. In\norder to create such a large file efficiently requires a little extra work,\nand a few assumptions.\n\nFirst, it is helpful to anticipate about how large (as in, how many keywords)\nthe header will have in it. FITS headers must be written in 2880 byte\nblocks, large enough for 36 keywords per block (including the END keyword in\nthe final block). Typical headers have somewhere between 1 and 4 blocks,\nthough sometimes more.\n\nSince the first thing we write to a FITS file is the header, we want to write\nenough header blocks so that there is plenty of padding in which to add new\nkeywords without having to resize the whole file. Say you want the header to\nuse 4 blocks by default. Then, excluding the END card which Astropy will add\nautomatically, create the header and pad it out to 36 * 4 cards.\n\nCreate a stub array to initialize the HDU; its\nexact size is irrelevant, as long as it has the desired number of\ndimensions\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "data = np.zeros((100, 100), dtype=np.float64)\nhdu = fits.PrimaryHDU(data=data)\nheader = hdu.header\nwhile len(header) < (36 * 4 - 1):\n    header.append()  # Adds a blank card to the end"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Now adjust the NAXISn keywords to the desired size of the array, and write\nonly the header out to a file. Using the ``hdu.writeto()`` method will cause\nastropy to \"helpfully\" reset the NAXISn keywords to match the size of the\ndummy array. That is because it works hard to ensure that only valid FITS\nfiles are written. Instead, we can write just the header to a file using the\n`astropy.io.fits.Header.tofile` method:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "header['NAXIS1'] = 40000\nheader['NAXIS2'] = 40000\nheader.tofile('large.fits')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Finally, grow out the end of the file to match the length of the\ndata (plus the length of the header). This can be done very efficiently on\nmost systems by seeking past the end of the file and writing a single byte,\nlike so:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "with open('large.fits', 'rb+') as fobj:\n    # Seek past the length of the header, plus the length of the\n    # Data we want to write.\n    # 8 is the number of bytes per value, i.e. abs(header['BITPIX'])/8\n    # (this example is assuming a 64-bit float)\n    # The -1 is to account for the final byte that we are about to\n    # write:\n    fobj.seek(len(header.tostring()) + (40000 * 40000 * 8) - 1)\n    fobj.write(b'\\0')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "More generally, this can be written:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "shape = tuple(header['NAXIS{0}'.format(ii)] for ii in range(1, header['NAXIS']+1))\nwith open('large.fits', 'rb+') as fobj:\n    fobj.seek(len(header.tostring()) + (np.product(shape) * np.abs(header['BITPIX']//8)) - 1)\n    fobj.write(b'\\0')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "On modern operating systems this will cause the file (past the header) to be\nfilled with zeros out to the ~12GB needed to hold a 40000 x 40000 image. On\nfilesystems that support sparse file creation (most Linux filesystems, but not\nthe HFS+ filesystem used by most Macs) this is a very fast, efficient\noperation. On other systems your mileage may vary.\n\nThis isn't the only way to build up a large file, but probably one of the\nsafest. This method can also be used to create large multi-extension FITS\nfiles, with a little care.\n\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Finally, we'll remove the file we created:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "os.remove('large.fits')"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.6.4"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}