.. _`parallelization`:

***************************
Parallelization with Python
***************************

This page is an introduction to parallelization, specifically for use with tudatpy. Tudatpy has many applications, and many of them can be parallelized. For parallelization specifically in combination with PyGMO, further reading is available under :ref:`parallelization_with_pygmo`.

.. contents:: Content of this page
    :local:

General parallelization with Python
####################################

In Python, you can parallelize data processing in various ways. One option is to use GPUs, but this is not discussed here. For CPU-based parallelization in Python, there are generally two approaches: multi-processing and multi-threading.

Multi-processing starts multiple processes. These processes can run on separate CPU cores, each with its own independent memory. Multi-threading uses multiple threads within a single parent process, and these threads share memory. Tasks can then be run on separate threads. Many CPU cores support two hardware threads each, and the number of cores and their specifications differ from system to system. The achievable degree of parallelism is therefore determined by the system you want to run on.

Parallelizing your simulations does not always make sense. Starting parallel tasks adds overhead, so there is a break-even point beyond which parallelization is worthwhile, as shown in :ref:`multi_threading_with_batch_fitness_evaluation`.

To enable parallel behavior in Python, the ``multiprocessing`` module is used. More modern alternatives exist, but they are not as widespread or as thoroughly documented. Ray, for instance, is one such package: it is arguably more seamless, but it is also rather new and focused on AI applications.

All parallel processing should be put under ``if __name__ == "__main__":``.
This guard ensures that the code only runs when the file is executed directly, and not when it is imported. If it is omitted, child processes that import the script run it again, spawning more child processes in turn, which can result in an infinite loop.

``mp.get_context("spawn")`` returns a context object that has the same attributes as the ``multiprocessing`` module. The ``"spawn"`` argument selects the start method, that is, how a new Python process is created. ``"spawn"`` starts a fresh Python interpreter process and is the default on macOS and Windows. ``"fork"`` copies the current Python process using ``os.fork()`` and has long been the default on Linux (from Python 3.14 onwards, the Linux default is ``"forkserver"``). ``"forkserver"`` starts a server process; whenever a new process is needed, the server is asked to create it using ``"fork"``. The start method can generally be left at its default value.

A ``Pool`` object is created temporarily; it is simply a collection of worker processes that can be allocated to computational tasks. The number of processes, typically the number of cores you want to use, is given as an argument. The ``map()`` and ``starmap()`` methods then apply a function to every element of an iterable rather than to a single argument. With ``map()``, each element is passed to the function as a single argument; with ``starmap()``, each element is a tuple that is unpacked into multiple arguments. The inputs are therefore provided as a list of tuples, one tuple per set of input arguments, and this list is the iterable mentioned above. The outputs are formatted analogously, with the tuples holding the outputs rather than the input arguments.

.. use manually synchronized tabs instead of tabbed code to allow dropdowns

.. tab-set::
    :sync-group: coding-language

    .. tab-item:: Python
        :sync: python

        .. dropdown:: Required
            :color: muted

            .. code-block:: python

                import multiprocessing as mp
                import numpy as np
                from tudatpy.dynamics import environment_setup, propagation_setup
                from tudatpy.interface import spice

        .. literalinclude:: /_snippets/simulation/parallelization/general_bfe_example.py
            :language: python

    .. tab-item:: C++
        :sync: cpp

        .. literalinclude:: /_snippets/simulation/environment_setup/req_create_bodies.cpp
            :language: cpp

.. note::

    The memory is freed only after all the outputs have been collected. When running a large number of simulations, it may be wise to split the list of inputs into smaller batches to avoid running out of memory.

.. seealso::

    Other ways to specify the context or create a ``Pool`` object are also possible; more can be read on `the multiprocessing documentation page `_.

Batch Fitness Evaluation for Monte-Carlo analysis
#################################################

This section presents a basic structure for a simple, parallel Monte Carlo analysis of any problem. An astrodynamics example is used: the :ref:`Kepler satellite orbit `. With this structure, any parameter can be varied while the Monte Carlo simulations run in parallel.

BFE Monte Carlo code structure
------------------------------

The snippet below shows the implementation. It is straightforward and closely resembles the example in `General parallelization with Python`_. The role of the ``run_simulation()`` function is taken here by ``run_dynamics()``. The same concepts apply, but instead of returning two integers without further calculation, the function takes the semi-major axis and eccentricity of the initial state as inputs, which strongly influence the resulting orbit.

.. use manually synchronized tabs instead of tabbed code to allow dropdowns

.. tab-set::
    :sync-group: coding-language

    .. tab-item:: Python
        :sync: python

        .. dropdown:: Required
            :color: muted

            .. code-block:: python

                # Load bfe modules
                import multiprocessing as mp

                # Load standard modules
                import numpy as np
                from matplotlib import pyplot as plt

                # Load tudatpy modules
                from tudatpy.interface import spice
                from tudatpy import dynamics
                from tudatpy.dynamics import environment_setup, propagation_setup
                from tudatpy.astro import element_conversion
                from tudatpy import constants
                from tudatpy.util import result2array

        .. literalinclude:: /_snippets/simulation/parallelization/mc_bfe_run.py
            :language: python

    .. tab-item:: C++
        :sync: cpp

        .. literalinclude:: /_snippets/simulation/environment_setup/req_create_bodies.cpp
            :language: cpp

The basic BFE structure is shown above. The ``run_dynamics()`` function is shown below. It is almost identical to the code from the :ref:`Kepler satellite orbit `, with the small adjustment that the initial state is defined by the input arguments of the function rather than set manually.

.. use manually synchronized tabs instead of tabbed code to allow dropdowns

.. tab-set::
    :sync-group: coding-language

    .. tab-item:: Python
        :sync: python

        .. dropdown:: Required
            :color: muted

            .. code-block:: python

                # Load bfe modules
                import multiprocessing as mp

                # Load standard modules
                import numpy as np
                from matplotlib import pyplot as plt

                # Load tudatpy modules
                from tudatpy.interface import spice
                from tudatpy import dynamics
                from tudatpy.dynamics import environment_setup, propagation_setup
                from tudatpy.astro import element_conversion
                from tudatpy import constants
                from tudatpy.util import result2array

        .. literalinclude:: /_snippets/simulation/parallelization/mc_bfe_dynamics.py
            :language: python

    .. tab-item:: C++
        :sync: cpp

        .. literalinclude:: /_snippets/simulation/environment_setup/req_create_bodies.cpp
            :language: cpp

BFE Monte Carlo results
-----------------------

The table below shows a few results on the performance of the BFE. Once again, a substantial reduction in clock time is observed when conducting Monte Carlo analyses using tudatpy.

.. note::

    These simulations were run only on macOS Ventura 13.1 with a 3.1 GHz Quad-Core Intel Core i7 processor. Four CPU cores were used for the BFE.

+-----------------------+---------------------------+---------------+----------------+--------------------+
| Number of experiments | Batch Fitness Evaluation  | CPU time [s]  | CPU usage [%]  | Clock time [s]     |
+=======================+===========================+===============+================+====================+
| 500                   | no                        | 107.94        | 99             | 110.51             |
|                       +---------------------------+---------------+----------------+--------------------+
|                       | yes                       | 118.07        | 381            | 32.07              |
+-----------------------+---------------------------+---------------+----------------+--------------------+
| 2000                  | no                        | 443.83        | 99             | 457.35             |
|                       +---------------------------+---------------+----------------+--------------------+
|                       | yes                       | 475.32        | 385            | 127.11             |
+-----------------------+---------------------------+---------------+----------------+--------------------+

.. note::

    Other applications are possible and may be documented in the future. If you happen to implement any yourself, feel free to contact the developers or open a pull request.
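
The clock times in the table above can be converted into a speed-up and a parallel efficiency. The short sketch below does this arithmetic. The names ``clock_times``, ``n_cores`` and ``speedup_and_efficiency()`` are illustrative and not part of tudatpy, and parallel efficiency is assumed here to mean the speed-up divided by the number of cores (four in these tests).

.. code-block:: python

    # Clock times [s] copied from the results table: (without BFE, with BFE).
    clock_times = {
        500: (110.51, 32.07),
        2000: (457.35, 127.11),
    }
    n_cores = 4  # number of cores used for the BFE runs in the table


    def speedup_and_efficiency(serial_time, parallel_time, cores):
        """Return (speed-up, parallel efficiency) for one pair of clock times."""
        speedup = serial_time / parallel_time
        return speedup, speedup / cores


    if __name__ == "__main__":
        for n_experiments, (serial, parallel) in clock_times.items():
            speedup, efficiency = speedup_and_efficiency(serial, parallel, n_cores)
            print(
                f"{n_experiments} experiments: "
                f"speed-up {speedup:.2f}, efficiency {efficiency:.0%}"
            )

For these measurements, the speed-up is roughly 3.4 for 500 experiments and 3.6 for 2000 experiments, which corresponds to a parallel efficiency of roughly 86% and 90%. The gap to the ideal value of 4 is plausibly related to the start-up overhead of parallel tasks mentioned earlier on this page.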