Installation

Installing Rust

EGI v2 is written in Rust, so you will need a Rust toolchain installed to compile it. To check if you do, run the command cargo --help in a terminal. (cargo is Rust's build tool and package manager.) You should see output like this:

$ cargo --help
Rust's package manager

Usage: cargo [+toolchain] [OPTIONS] [COMMAND]
       cargo [+toolchain] [OPTIONS] -Zscript <MANIFEST_RS> [ARGS]...

If so, you can skip the rest of this section. If not, installing Rust is simple and does not require administrative privileges. If you are working on a high performance cluster, double check that Rust/cargo aren't already available by loading a module.

To install Rust, use the rustup manager by following the instructions on the rust-lang website.

Installing GGG

EGI is a wrapper around the GGG retrieval software, developed at JPL. You will need GGG installed for EGI to work. Full instructions to install are given on the TCCON wiki. The very brief version is:

  1. Ensure you have a Fortran compiler and the conda package manager installed and available on your system. The miniconda Python installation provides a minimal Python install and the conda manager, and is ideal for this purpose. gfortran is the default Fortran compiler for GGG; using another compiler requires you to link the proper compiler script in GGG's install subdirectory.
  2. Download the latest release from https://github.com/TCCON/GGG and untar it.
  3. Set the GGGPATH and gggpath environment variables for your shell to point to the GGG directory. These should go in your ~/.bashrc or equivalent file. For example, if /home/user/ggg is the directory that you expanded from the release tarball (it should contain subdirectories such as isotopologs and linelist), and you use Bash, then add the following to ~/.bashrc:
export GGGPATH=/home/user/ggg
export gggpath=/home/user/ggg

If your shell is Zsh, then the syntax is the same, but you would modify ~/.zshrc instead. You can check which shell you have by running the command echo $SHELL in a terminal.
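
After opening a new terminal, a quick check can confirm that both variables are set and point at something that looks like a GGG directory. The following is only a minimal sketch: the function name check_ggg_env is hypothetical, and the test for the isotopologs and linelist subdirectories is simply taken from the description above, not from any official GGG validation.

```python
import os
from pathlib import Path

# Subdirectories that the text above says a GGG directory should contain.
EXPECTED_SUBDIRS = ("isotopologs", "linelist")


def check_ggg_env(environ=None):
    """Return a list of problems with GGGPATH/gggpath (empty if none found)."""
    environ = os.environ if environ is None else environ
    problems = []
    upper = environ.get("GGGPATH")
    lower = environ.get("gggpath")
    if not upper:
        problems.append("GGGPATH is not set")
    if not lower:
        problems.append("gggpath is not set")
    # Compare as paths so that a trailing slash does not count as a difference.
    if upper and lower and Path(upper) != Path(lower):
        problems.append("GGGPATH and gggpath point to different directories")
    if upper:
        for name in EXPECTED_SUBDIRS:
            if not (Path(upper) / name).is_dir():
                problems.append(f"{upper} has no '{name}' subdirectory")
    return problems
```

Calling check_ggg_env() with no arguments inspects the current environment; an empty list means nothing obviously wrong was found.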

Installing GGG-RS

EGI-RS uses some developmental replacements for the GGG post-processing programs that have more flexibility to handle EM27/SUN-specific settings. Because this is part of a repo still under development, installation is a bit annoying at present. We will do our best to simplify this in the future.

For now, the following steps will install these programs:

  1. Set the environment variable GGGRS_NCDIR to point to the environment created as part of installing GGG. Depending on your GGG installation process, this might be at $GGGPATH/install/.condaenv or a named environment managed by conda. If you do not have a $GGGPATH/install/.condaenv, run conda env list and find the path for an environment named "ggg-tccon-default". See the Installing GGG section above for how to set environment variables. After modifying your ~/.bashrc or ~/.zshrc file, run source ~/.bashrc or source ~/.zshrc to include the new environment variable in your shell.
  2. Change directory into your $GGGPATH directory (cd $GGGPATH)
  3. Clone the GGG-RS repo as src-rs (git clone https://github.com/TCCON/ggg-rs.git src-rs)
  4. Change into the src-rs directory and run the make command.

Compilation may take a few minutes. If all goes well, you should see a message similar to the following:

   Installed package `ggg-rs v0.1.0 (/home/you/ggg/src-rs)` (executables `add_nc_flags`, `apply_tccon_airmass_correction`, `bin2nc`, `change_ggg_files`, `collate_tccon_results`, `i2s_setup`, `list_spectra`, `plot_opus_spectra`, `query_output`, `strip_header`)
warning: be sure to add `/home/you/ggg/bin` to your PATH to be able to run the installed binaries

You can ignore the warning about adding a directory to your PATH. Adding the directory would let you run a program by typing only its name, but we will always call these programs with their full path, so it is not needed. If you encounter trouble, see the GGG-RS README for suggestions. To free up space, you can run the cargo clean command in $GGGPATH/src-rs to delete intermediate compilation files that are no longer needed.
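
If the build fails, one simple thing to rule out is a missing or mistyped GGGRS_NCDIR. The sketch below only reports what it finds. The function name check_ncdir is hypothetical, and the only candidate location it knows about is $GGGPATH/install/.condaenv from step 1; it does not try to query conda for a named environment.

```python
import os
from pathlib import Path


def check_ncdir(environ=None):
    """Describe the state of GGGRS_NCDIR as a short message."""
    environ = os.environ if environ is None else environ
    ncdir = environ.get("GGGRS_NCDIR")
    if ncdir:
        if Path(ncdir).is_dir():
            return f"GGGRS_NCDIR is set to existing directory {ncdir}"
        return f"GGGRS_NCDIR is set to {ncdir}, which is not a directory"
    # Not set: look for the in-tree environment mentioned in step 1.
    gggpath = environ.get("GGGPATH")
    if gggpath:
        candidate = Path(gggpath) / "install" / ".condaenv"
        if candidate.is_dir():
            return f"GGGRS_NCDIR is not set; candidate found at {candidate}"
    return "GGGRS_NCDIR is not set; try `conda env list`"
```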

Installing EGI-RS

Once you have GGG-RS installed, installing EGI-RS is easy. Simply run the following command from anywhere:

cargo install --git https://github.com/TCCON/egi-rs --root "$GGGPATH"

As with GGG-RS, you should see a message similar to:

   Installed package `egi-rs v0.1.0 (https://github.com/TCCON/egi-rs#2d7a442c)` (executables `em27-catalogue`, `em27-gfit-prep`, `em27-i2s-prep`, `em27-init`)
warning: be sure to add `/home/you/ggg/bin` to your PATH to be able to run the installed binaries

Again, we can ignore that warning. The final step is to run the em27-init program we just installed. This will add some extra EM27/SUN-specific files to your $GGGPATH. It will print out a summary; all steps should read "OK". If not, or if you got a fatal error earlier in the run, correct the issue and try again.
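
Because cargo install --root places binaries in the bin subdirectory of the given root, the four executables named in the message above should end up in $GGGPATH/bin. The following sketch checks for them; the function name missing_executables is hypothetical, and the list of names is copied from the example output above, so it may differ for other EGI-RS versions.

```python
import os
from pathlib import Path

# Executable names copied from the example `cargo install` message above.
EGI_EXECUTABLES = ("em27-catalogue", "em27-gfit-prep", "em27-i2s-prep", "em27-init")


def missing_executables(gggpath, names=EGI_EXECUTABLES):
    """Return the names that are not executable files in <gggpath>/bin."""
    bin_dir = Path(gggpath) / "bin"
    missing = []
    for name in names:
        path = bin_dir / name
        if not (path.is_file() and os.access(path, os.X_OK)):
            missing.append(name)
    return missing
```

For example, missing_executables(os.environ["GGGPATH"]) would return an empty list if all four programs were installed.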

Location

EGI v2 prefers JSON files to specify the latitude, longitude, and altitude at which the EM27 was positioned. Unlike version 1, which recommended using coordinate files stored in your EGIPATH, we recommend for version 2 that you place these JSON files alongside your interferograms. That way, if you package the interferograms to send to another computing system or institution, the necessary location data goes along with those interferograms. We do still support the original coordinate file approach; see the Coordinate file support section below for more on that.

For this tutorial, we'll assume that your data will be in a directory structure like this:

/
└── data
    └── xx
        ├── 20240401
        │   ├── coords.json
        │   ├── interferograms/
        │   └── met_source.json
        └── 20240402
            ├── coords.json
            ├── interferograms/
            └── met_source.json

where xx is your site ID. As you can see, the input data is organized by date, and each date has a subfolder containing the interferograms as well as two JSON files. We'll create the coords.json files now. These files can have different formats, but the one we will use here represents a single location. Since we usually separate our data by day, this format works for all cases where the EM27 was in one location for a given day (which should be the majority of cases).

These JSON files have the format:

{
  "longitude": LONGITUDE_DEG,
  "latitude": LATITUDE_DEG,
  "altitude": ALTITUDE_METERS
}

where LONGITUDE_DEG, LATITUDE_DEG, and ALTITUDE_METERS must be replaced with numeric values. EGI uses the convention that west and south are represented as negative values. Here is a concrete example for an EM27 operated at Caltech, which is at 118.13 W, 34.14 N, and 230 m altitude:


{
  "longitude": -118.13,
  "latitude": 34.14,
  "altitude": 230.0
}

For this tutorial, we'll assume the EM27 was in the same place for both dates, so we would enter this same information for both files. If your EM27 is stationed quasi-permanently at one location, you could create one JSON file and symbolically link it to each daily directory.
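
If you have many daily directories, writing these files by hand gets tedious. Below is a minimal sketch of a helper that writes a single-location coords.json into one directory. The function name write_coords is hypothetical, and the range checks (longitude within -180 to 180, latitude within -90 to 90) are an assumption based on the west/south-negative convention described above, not a documented EGI requirement.

```python
import json
from pathlib import Path


def write_coords(day_dir, longitude, latitude, altitude):
    """Write a single-location coords.json into day_dir and return its path."""
    # Assumed ranges, following the west/south-negative convention.
    if not -180.0 <= longitude <= 180.0:
        raise ValueError(f"longitude {longitude} is outside -180 to 180")
    if not -90.0 <= latitude <= 90.0:
        raise ValueError(f"latitude {latitude} is outside -90 to 90")
    coords = {
        "longitude": float(longitude),
        "latitude": float(latitude),
        "altitude": float(altitude),
    }
    path = Path(day_dir) / "coords.json"
    path.write_text(json.dumps(coords, indent=2) + "\n")
    return path
```

For the tutorial layout, calling write_coords("/data/xx/20240401", -118.13, 34.14, 230.0) and then the same for /data/xx/20240402 would produce the two files.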

We will see how these files are used in Running I2S.

Coordinate file support

If you have coordinate files from EGI v1, you can reuse them by making your coords.json files like so:

{
  "site_id": "xx"
}

Again, "xx" would be replaced with the site ID for these interferograms. This tells EGI v2 to look for a coordinate file at $EGIPATH/coordinates/xx_dlla.dat. The coordinate files have the format:

2   6
Date     UTCTime   Latitude  Longitude  Alt_masl  Descrip_opt
20140601 01:30:00   35.1431  -116.1042      237    Zzyxx (testing)
20140613 17:34:00   34.1362  -118.1269      237    Caltech
20140613 17:34:32   34.1361  -118.1269      237    CaltechB
20140628 18:55:05   34.1362  -118.1269      237    Caltech
20140628 18:55:30   35.1431  -116.1042      237    Zzyxx (testing)

where:

  • the first line gives the number of header lines (2) and the number of columns (6) - the number of columns is ignored by EGI v2,
  • "Date" and "UTCTime" give the starting date/time for these coordinates.
    • "UTCTime" is optional, if omitted.... TODO
  • "Latitude" and "Longitude" give the coordinates, with south and west represented as negative,
  • "Alt_masl" is the altitude in meters above sea level for this measurement, and
  • "Descrip_opt" is an optional, human-readable location description (not read by EGI).
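
To make the layout concrete, here is a rough Python sketch of reading such a file. It is not EGI's parser. The function names are hypothetical, it assumes the "UTCTime" column is always present, and the lookup in coords_at reflects one reading of "starting date/time", namely that a row applies from its start time until the next row begins.

```python
from datetime import datetime


def parse_coordinate_file(text):
    """Parse the text of a coordinate file into a list of dicts."""
    lines = text.splitlines()
    # First number on the first line is the header line count.
    n_header = int(lines[0].split()[0])
    rows = []
    for line in lines[n_header:]:
        if not line.strip():
            continue
        # Split into at most 6 fields so the description may contain spaces.
        parts = line.split(None, 5)
        date_str, time_str, lat, lon, alt = parts[:5]
        rows.append({
            "start": datetime.strptime(f"{date_str} {time_str}", "%Y%m%d %H:%M:%S"),
            "latitude": float(lat),
            "longitude": float(lon),
            "altitude": float(alt),
            "description": parts[5].strip() if len(parts) > 5 else "",
        })
    return rows


def coords_at(rows, when):
    """Return the latest row starting at or before `when`, or None."""
    candidates = [row for row in rows if row["start"] <= when]
    if not candidates:
        return None
    return max(candidates, key=lambda row: row["start"])
```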

Surface meteorology

Surface pressure is required to perform the level 2 retrievals, so it must be available when processing the EM27 interferograms. Temperature, relative humidity, wind speed, and wind direction are convenient to have, but not required. Since there is not a standard way of collecting surface meteorology data for EM27 observations, EGI must be flexible in how it accepts this data.

Note: users are discouraged from taking surface pressure from reanalysis meteorology unless absolutely necessary. Surface pressure is needed to accurately calculate the position of the EM27 relative to the atmosphere above it, so a good quality measured surface pressure will give better results.

Continuing with our tutorial, recall the directory structure:

/
└── data
    └── xx
        ├── 20240401
        │   ├── coords.json
        │   ├── interferograms/
        │   └── met_source.json
        └── 20240402
            ├── coords.json
            ├── interferograms/
            └── met_source.json

Now we are going to create the met_source.json files.

Running I2S

Now that all of the original data is prepared, it's time to set up and run I2S, which converts the interferograms into spectra. We use the em27-i2s-prep program to set up run directories and scripts for I2S. This has several subcommands which are useful in different cases. For this example, we'll use the daily-json subcommand. This allows us to set up to run each day's worth of interferograms in parallel, with the configuration we need for that included in (yet another) JSON file.

First we need to prepare the JSON file. It needs to include a number of pieces of information:

  • the directories where the interferograms are stored,
  • a glob pattern to select the interferograms in each directory,
  • the paths to the coordinate and meteorology JSON files, and
  • where we want the run directories to be created.

Continuing on from the previous example, let's assume that we're processing data for the instrument with site ID "xx", and we have a directory structure like so:

/
└── data
    └── xx
        ├── 20240401
        │   ├── coords.json
        │   ├── interferograms/
        │   └── met_source.json
        └── 20240402
            ├── coords.json
            ├── interferograms/
            └── met_source.json

We'll also assume that our interferograms include the date in their name, that we want to create run directories like /data/xx/spectra/20240401 to run I2S in, and that our EM27 has the dual InGaAs detector that supports retrieving CO. Let's create a JSON file named demo.json with the following contents:

{
  "igram_pattern": "/data/{SITE_ID}/{DATE:%Y%m%d}/interferograms/",
  "igram_glob_pattern": "*{DATE:%Y%m%d}*",
  "coord_file_pattern": "/data/{SITE_ID}/{DATE:%Y%m%d}/coords.json",
  "met_file_pattern": "/data/{SITE_ID}/{DATE:%Y%m%d}/met_source.json",
  "run_dir_pattern": "/data/{SITE_ID}/spectra/{DATE:%Y%m%d}"
}

Notice that our values have some parts in curly braces, namely {SITE_ID} and {DATE:%Y%m%d}. These are placeholders, which will have the actual value substituted in for each date processed. {SITE_ID} will be replaced with our two-letter site ID, xx, when we pass it on the command line. {DATE} will be replaced with each date that we want to process. The {DATE} placeholders also include an extra part, the :%Y%m%d. This specifies the format the date should have; "%Y" means the 4-digit year, "%m" the 2-digit month, and "%d" the 2-digit day. We use the chrono crate for dates, so all the format specifiers listed on their strftime page can be used. Other characters can be included as well, e.g. %Y.%m.%d would print "2024.04.01" for 1 Apr 2024. With no format, {DATE} defaults to %Y-%m-%d format.
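
To illustrate how the substitution works, here is a small Python imitation of it. This is only a sketch and not the code em27-i2s-prep uses: the function name expand_pattern is hypothetical, and it relies on Python's strftime rather than chrono, which should agree for common specifiers like %Y, %m, and %d but may differ for more exotic ones.

```python
import re
from datetime import date

# Matches {SITE_ID}, {DATE}, or {DATE:<format>}.
_PLACEHOLDER = re.compile(r"\{(SITE_ID|DATE)(?::([^}]*))?\}")


def expand_pattern(pattern, site_id, day):
    """Substitute {SITE_ID} and {DATE[:format]} placeholders in pattern."""
    def substitute(match):
        name, fmt = match.group(1), match.group(2)
        if name == "SITE_ID":
            return site_id
        # With no format given, fall back to the documented default.
        return day.strftime(fmt or "%Y-%m-%d")

    return _PLACEHOLDER.sub(substitute, pattern)


print(expand_pattern("/data/{SITE_ID}/{DATE:%Y%m%d}/coords.json", "xx", date(2024, 4, 1)))
# prints /data/xx/20240401/coords.json
```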

Now we can run em27-i2s-prep to create our run directories. Assuming we want to run the 1st, 2nd, and 3rd of Apr 2024, the command is:

$ em27-i2s-prep daily-json demo.json xx 2024-04-01 2024-04-03

Note that the start and end dates must be given in YYYY-MM-DD (a.k.a. %Y-%m-%d) format on the command line. This may take a minute or two to run (it inspects the headers of every interferogram, which adds up when there are a lot of them), but will create:

  • three run directories: 20240401, 20240402, and 20240403 in /data/xx/spectra, and
  • a multii2s.in file in your current directory.

The multii2s.in file is a script that will run each day's interferograms through I2S. It can be run in serial with bash multii2s.in, but if your system has the parallel tool, we can run the days in parallel with the command:

$ parallel -t --delay=1 -j4 < multii2s.in

The -j argument specifies how many concurrent tasks to run. Here we use 4, but you can use more or fewer depending on your system. Note that, if you are working over an SSH connection, it may be a good idea to run this command in something like screen so that you can disconnect and let it keep running. Depending on the number of days and interferograms per day, this step could take minutes or a few hours. When it completes, you will have spectra in each of the run directories.
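
If the parallel tool is not available, something similar can be approximated with Python's standard library. The sketch below is not part of EGI. It assumes that each non-empty line of multii2s.in is an independent shell command (which is what feeding the file to parallel implies), it runs commands with the system's default shell rather than Bash, and it omits the one-second start delay. The function name run_commands is hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_commands(script_path, jobs=4):
    """Run each non-empty line of script_path as a shell command.

    At most `jobs` commands run at once. Returns a list of
    (command, return code) pairs in the order the lines appear.
    """
    with open(script_path) as handle:
        commands = [line.strip() for line in handle if line.strip()]

    def run_one(command):
        # shell=True because each line is assumed to be a full shell command.
        return subprocess.run(command, shell=True).returncode

    with ThreadPoolExecutor(max_workers=jobs) as pool:
        codes = list(pool.map(run_one, commands))
    return list(zip(commands, codes))
```

Any pair with a non-zero return code points to a day whose I2S run should be inspected.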

Now we're ready to run the level 2 retrieval.

Troubleshooting interferogram conversion

This section describes common or difficult-to-understand errors that can occur while setting up to run I2S.

Errors during preparation

What does the error "data did not match any variant of untagged enum CoordinateSource" mean when running em27-i2s-prep?

This means that your coordinate file did not match any of the expected formats. That might mean you are missing one of the required fields (or misspelled one), or that a value is not of the proper type. For instance, if any of "longitude", "latitude", or "altitude" are strings or null, that will cause this error.
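
Since the error message does not say which field is at fault, a small standalone check can help narrow it down. The sketch below is an assumption-laden approximation, not EGI's own validation: the function name diagnose_coords is hypothetical, and it only knows the two coords.json formats described in this document (a single location, or a "site_id" pointing to a coordinate file), so a file in another supported format would be reported as having problems.

```python
import json


def diagnose_coords(text):
    """Return a list of likely problems in the text of a coords.json file."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err}"]
    if not isinstance(data, dict):
        return ["top level must be a JSON object"]
    # Format 2: reference to an EGI v1 coordinate file.
    if "site_id" in data:
        if isinstance(data["site_id"], str):
            return []
        return ["'site_id' must be a string"]
    # Format 1: a single location with three numeric fields.
    problems = []
    for key in ("longitude", "latitude", "altitude"):
        if key not in data:
            problems.append(f"missing field '{key}'")
            continue
        value = data[key]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"'{key}' must be a number, got {type(value).__name__}")
    return problems
```

For example, passing the contents of a file where "longitude" is written as the string "-118.13" would return one entry naming that field.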