Create batch of simulations

Last modified by pecl@helsinki_fi on 2024/01/26 07:24

You will often want to repeat a simulation with the same model setup but different input data, for example when modelling consecutive days in a month. Such cases always have the same input variables and similar options for the model modules, aerosol structure and so on. The Create batch tool is designed for this situation. Once you are happy with all the simulation settings and have checked that the simulation starts and runs without problems for one case (in the Run ARCA→Monitor tab), you can multiply the simulation setup. The Create batch tool creates the output directories and the individual INITFILEs for all of the experiments, with the dates and paths changed to the appropriate ones.

Using wildcards

Since the input files are typically different for each case, this is where wildcards are especially useful. Remember that you can use wildcards for the env_file, mcm_file, bg_particles and losses file. Depending on the simulation identifier indexing system (Date or Index), you can insert the year, month and day, or the index, using <y>, <m>, <d> and <i>. Each repetition of the letter inside the angle brackets inserts one more digit, counting from the right. For example, <yy> produces 17 if the year is 2017, and <yyyy> gives 2017. If the index is 0345, <iii> gives 345, and so on. The GUI replaces these tags when writing the INITFILE but also stores them, so that when the file is loaded back into the GUI the tags are back in their places.
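The tag rule above can be sketched in a few lines of Python. This is only an illustration, not ARCA's own code: the function name expand_tags and the example file names are hypothetical, the four-digit index width simply follows the 0345 example, and zero-padding a tag that asks for more digits than the value has is an assumption.

```python
import re
from datetime import date

# Matches <y>, <yy>, <mm>, <iii>, ... (one letter repeated inside angle brackets).
_TAG = re.compile(r"<(y+|m+|d+|i+)>")

def expand_tags(template, day=None, index=None):
    """Replace date/index tags with the rightmost digits of the matching value."""
    def replace(match):
        tag = match.group(1)
        letter, count = tag[0], len(tag)
        if letter == "i":
            if index is None:
                return match.group(0)      # leave the tag untouched
            digits = f"{index:04d}"        # assumed width, as in the 0345 example
        else:
            if day is None:
                return match.group(0)
            digits = {
                "y": f"{day.year:04d}",
                "m": f"{day.month:02d}",
                "d": f"{day.day:02d}",
            }[letter]
        # Take `count` digits from the right; pad with zeros if too few (assumption).
        return digits[-count:].rjust(count, "0")
    return _TAG.sub(replace, template)

# Hypothetical file names, used only to show the substitution.
print(expand_tags("env_<yyyy>-<mm>-<dd>.txt", day=date(2017, 5, 6)))
print(expand_tags("bg_<iii>.dat", index=345))
```

A tag whose value is not supplied (for example <i> when only a date is given) is left in place, which loosely mirrors how the GUI keeps the tags available for later.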

Range/Indices

An example of the Create batch tool is shown below. Here the Date/Index in the Simulation identifier is not relevant; instead, the dates/indices come from the Range/Indices in the Create Batch options. After you press Preview batch, the Batch Preview window shows the directories that will be created. The input directory contains the INITFILE and is also a good place to store other input data. The tool does not copy the input data files (env_file, mcm_file, bg_particles and losses file); you have to do this yourself afterwards. However, with clever use of the wildcards, copying can often be omitted altogether.
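To get a feel for which cases a date range covers, the consecutive dates can be listed with a short Python sketch. The function name dates_in_range is hypothetical, and treating both end dates as included in the batch is an assumption, not something stated by the tool.

```python
from datetime import date, timedelta

def dates_in_range(first, last):
    """Return every date from `first` to `last`, both ends included (assumption)."""
    if last < first:
        raise ValueError("last date precedes first date")
    n_days = (last - first).days
    return [first + timedelta(days=k) for k in range(n_days + 1)]

batch = dates_in_range(date(2017, 5, 5), date(2017, 5, 8))
print([d.isoformat() for d in batch])
```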

Pick from file

This option makes it possible to create a batch with non-consecutive dates/indices. The file should be a text file where each line contains one date in the format yyyy-mm-dd or, if indices are used, one index in the format ii. For example:


23
43
12

or


2019-04-03
2013-01-23
2015-12-13
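Before handing such a file to the tool, it can be handy to check that every line parses. The following Python sketch reads a list with one entry per line; the function name parse_pick_list is hypothetical, and skipping blank lines is an assumption about what is tolerated.

```python
from datetime import date

def parse_pick_list(text):
    """Parse one entry per line: a yyyy-mm-dd date or an integer index."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue                                  # skip blank lines (assumption)
        if "-" in line:
            entries.append(date.fromisoformat(line))  # raises ValueError if malformed
        else:
            entries.append(int(line))                 # raises ValueError if not a number
    return entries

picked = parse_pick_list("2019-04-03\n2013-01-23\n2015-12-13\n")
print(picked)
```

The sketch does not check that a file sticks to one kind of entry; a file should contain either dates or indices, not both.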


Batch4.png


The Batch Preview window also shows the files that will be written. These include only the INITFILEs and a bash file that calls all the individual simulation cases. The bash file should be run from the location where it was saved; the runs then start immediately (assuming the paths of the input files were set up correctly in the GUI).
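If the bash file is started from another script, the working directory has to be set to the folder where the bash file lives. A minimal Python sketch of this, assuming bash is available on the machine; the function names and the script name run_all.sh are hypothetical, since the actual file name is chosen by the tool:

```python
import subprocess
from pathlib import Path

def launch_spec(script_path):
    """Return the command and the working directory for running the batch script."""
    script = Path(script_path).resolve()
    # Call the script by its bare name, from the directory it was saved in.
    return ["bash", script.name], script.parent

def run_batch(script_path):
    """Start the batch script in its own directory and wait for it to finish."""
    command, workdir = launch_spec(script_path)
    return subprocess.run(command, cwd=workdir, check=True)

# Example (hypothetical path): run_batch("/path/to/batch/run_all.sh")
```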

If some of the folders to be created already exist and contain files, they are also shown in a separate section. In the example below, May 6th 2017 had already been run and its directory contains files, so the tool notes that the new runs would overwrite them.

Batch5.png
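The same kind of check can be done outside the GUI, for instance on a cluster before starting a batch. The sketch below lists the directories that already exist and are not empty; the function name nonempty_existing is hypothetical and the code is not taken from ARCA.

```python
from pathlib import Path

def nonempty_existing(directories):
    """Return the directories that already exist and contain at least one entry."""
    return [d for d in map(Path, directories) if d.is_dir() and any(d.iterdir())]
```

Directories that do not exist yet, or that exist but are empty, are not reported, since nothing in them can be overwritten.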


Batches which are meant to run on a different computer / cluster

Often the whole point of creating a batch is to run it on another computer or cluster. When a batch is moved from one place to another, all absolute file and directory paths defined in the INITFILE will break. Therefore, if the batch is to be relocated, either:

  1. Store files inside the ARCA program directory (probably the most successful way), or
  2. Define a relative path in the Common root, such as ../../../MyOutput
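The reason a relative Common root survives relocation can be illustrated with a small Python sketch. Which directory ARCA resolves a relative Common root against is not stated here, so start_dir below is only a placeholder for it; the function name and all directory names are hypothetical.

```python
import posixpath

def resolve_common_root(common_root, start_dir):
    """Resolve a Common root against the directory it is interpreted from."""
    if posixpath.isabs(common_root):
        return common_root      # absolute: identical everywhere, so it breaks when moved
    return posixpath.normpath(posixpath.join(start_dir, common_root))

# The same relative setting follows the batch to its new location.
for start in ("/home/me/ARCA/a/b/c", "/scratch/project/ARCA/a/b/c"):
    print(resolve_common_root("../../../MyOutput", start))
```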

Different supercomputers have different policies on storing large amounts of data in work directories. The ARCA batch tool cannot anticipate all possible paths on a remote computer, so you might need to add your own scripts for moving the data around the cluster directories. Luckily, the amount of data produced by a single run is still relatively small and should not be a problem to store in the work or personal directory.

Variations

The batched simulations can be further modified and used with the Variations tool, which lets you manipulate the time-dependent input variables within a range.