The analysis of the Pareto front revealed data incompatibility, which is very common for real data due to different resolutions, sensitivities and depths of investigation.
Notwithstanding this, the multi-objective optimizers provided a complementary interpretation of the data, ensuring significant advantages with respect to the separate optimizations we carried out using the single-objective particle swarm optimization algorithm.
Joint optimization of geophysical data using multi-objective swarm intelligence. Francesca Pace, Alberto Godio, Alessandro Santilano and Cesare Comina.

SUMMARY: The joint inversion of multiple data sets encompasses the advantages of different geophysical methods but may yield conflicting solutions.

Some computational intelligence algorithms have already been adapted to run on GPU-based platforms. Tests of the scalability of these algorithms as a function of the number of dimensions were also presented.
Bastos-Filho et al. presented an interesting analysis of the parallelization process, regarding in particular the generation of ants, in order to minimize the communication overhead between CPU and GPU.
The proposals achieved remarkable speedups. We discuss some important issues regarding the implementation in order to improve the time performance. We also consider other relevant aspects, such as when and where it is necessary to set synchronization barriers. This chapter is organized as follows: in the next Section we present an overview of the FSS algorithm.
Theory and New Applications of Swarm Intelligence
Our contribution and the results are presented in Sections 4 and 5, respectively. In the last Section, we present our conclusions, where we also suggest future work. As mentioned by Bastos-Filho et al., the search guidance in FSS is driven by the success of the members of the population. The original version of the FSS algorithm has four operators, which can be grouped into two classes: feeding and swimming. Fish weight is updated once in every FSS cycle by the feeding operator, according to equation 2. Thus, this operator increases the capacity to auto-regulate the exploration-exploitation granularity.
It helps the algorithm to initialize with an exploration behavior and change dynamically to an exploitation behavior.
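The weight update can be sketched in plain C. This assumes the standard FSS feeding rule, W_i(t+1) = W_i(t) + Δf_i / max|Δf|; the function name, the W_MAX cap and the lower bound of 1 are illustrative assumptions, not the chapter's actual code:

```c
#include <math.h>

#define W_MAX 10.0  /* assumed upper bound on fish weight (illustrative) */

/* Feeding operator: every fish gains (or loses) weight in proportion to
 * its fitness improvement delta_f[i], normalized by the largest absolute
 * improvement in the school. Successful fish therefore get heavier and
 * pull the search towards exploitation. */
void feeding(double w[], const double delta_f[], int n) {
    double max_abs = 0.0;
    for (int i = 0; i < n; i++)
        if (fabs(delta_f[i]) > max_abs)
            max_abs = fabs(delta_f[i]);
    if (max_abs == 0.0)
        return;  /* no fish improved: weights stay unchanged */
    for (int i = 0; i < n; i++) {
        w[i] += delta_f[i] / max_abs;
        if (w[i] > W_MAX) w[i] = W_MAX;  /* clamp into a valid range */
        if (w[i] < 1.0)   w[i] = 1.0;
    }
}
```

In a GPU version, one thread can handle the loop body for one fish, with the maximum improvement obtained beforehand by a parallel reduction.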
Despite all the successful applications, some algorithms cannot be effectively implemented on GPU platforms. CUDA allows programs written in the C programming language to communicate directly with the GPU instructions by using minimal extensions. It has three main abstractions: a hierarchy of thread groups, shared memories and barriers for synchronization (NVIDIA b). These abstractions allow one to divide the problem into coarse sub-problems, which can be solved independently in parallel.
Each sub-problem can be further divided into minimal procedures that can be solved cooperatively in parallel by all threads within a block. Thus, each block of threads can be scheduled on any of the available processing cores, regardless of the execution order. In general, the correctness of the algorithm must be guaranteed, since race conditions in a parallel implementation may lead to outdated results. Furthermore, since we want to execute the algorithm as fast as possible, it is worth discussing where it is necessary to set synchronization barriers and in which memory we should store the algorithm information.
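The decomposition into blocks of threads can be illustrated with a sequential C emulation of CUDA's index arithmetic. The grid and block sizes and the function name are illustrative assumptions; on a real GPU each (block, thread) pair would run concurrently:

```c
#define GRID_DIM  4   /* number of blocks (illustrative)  */
#define BLOCK_DIM 8   /* threads per block (illustrative) */

/* Sequential emulation of CUDA's index arithmetic: every (block, thread)
 * pair maps to one data element through
 *     idx = blockIdx.x * blockDim.x + threadIdx.x
 * so the blocks are independent coarse sub-problems and the threads
 * within a block handle the fine-grained work. */
void scale_add(float y[], const float x[], float a, int n) {
    for (int block = 0; block < GRID_DIM; block++) {         /* grid loop  */
        for (int thread = 0; thread < BLOCK_DIM; thread++) { /* block loop */
            int idx = block * BLOCK_DIM + thread;  /* global element id */
            if (idx < n)      /* bounds guard, exactly as a kernel would */
                y[idx] += a * x[idx];
        }
    }
}
```

Because no iteration reads what another iteration writes, the blocks really can run in any order, which is what lets the hardware scheduler place them freely on the available cores.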
Any data transfer between the host and the device may reduce the execution time performance. Thus, this operation should be avoided whenever possible.
One alternative is to move some operations from the host to the device. Even when it seems unnecessary because the operation is not very parallel, generating the data on the GPU is faster than transferring huge volumes of data from the host.
Furthermore, the access times of these distinct types of memory vary. Moreover, all threads can access the same global memory. All these memory spaces follow a memory hierarchy: the fastest one is the local memory and the slowest is the global memory; accordingly, the smallest one is the local memory and the largest is the global memory.
Then, if there is data that must be accessed by many threads, the shared memory might be the best choice. However, the shared memory can only be accessed by the threads inside its block and its size is not very large. In the FSS versions, most of the variables used in kernel functions reside in global memory. Shared memory was also used to perform the barycenter calculations. Another important aspect is the need to set synchronization barriers.
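The barycenter calculation mentioned above is, in essence, a weighted average of the fish positions; the following plain-C sketch shows the arithmetic (the dimensionality and all names are illustrative assumptions):

```c
#define DIM 2  /* search-space dimensionality (illustrative) */

/* FSS barycenter: the weighted centre of the school, with the fish
 * weights acting as masses. In a GPU version the per-fish products
 * would be computed by individual threads and summed with a reduction
 * held in shared memory. */
void barycenter(double b[DIM], double pos[][DIM], const double w[], int n) {
    double total_w = 0.0;
    for (int d = 0; d < DIM; d++)
        b[d] = 0.0;
    for (int i = 0; i < n; i++) {
        total_w += w[i];
        for (int d = 0; d < DIM; d++)
            b[d] += w[i] * pos[i][d];  /* weighted position sum */
    }
    for (int d = 0; d < DIM; d++)
        b[d] /= total_w;  /* assumes total weight > 0 */
}
```

Because every fish contributes one term to each sum, the partial sums fit naturally in shared memory, which is why this step benefits from it.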
A barrier forces a thread to wait until all other threads of the same block reach the barrier. It helps to guarantee the correctness of the algorithm running on the GPU, but it can reduce the time performance.
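The need for such a barrier can be emulated sequentially in C: phase 2 below reads slots that phase 1 wrote for other threads, and the boundary between the two loops plays the role the barrier plays on the GPU (the two-phase computation and all names are illustrative assumptions):

```c
#define NT 4  /* threads in the block (illustrative) */

/* Why a block-level barrier matters: phase 2 reads slots that phase 1
 * wrote for *other* threads. In this sequential emulation, no "thread"
 * starts phase 2 before every "thread" has finished phase 1; on a GPU,
 * omitting the barrier between the phases would let a fast thread read
 * a slot its neighbour had not written yet. */
void two_phase(int out[NT]) {
    int shared[NT];                /* stands in for shared memory */
    for (int t = 0; t < NT; t++)   /* phase 1: each thread fills its slot */
        shared[t] = t * t;
    /* ---- barrier point: all writes above are visible below ---- */
    for (int t = 0; t < NT; t++)   /* phase 2: each thread reads a neighbour */
        out[t] = shared[(t + 1) % NT];
}
```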
Furthermore, threads within a block can cooperate among themselves by sharing data through shared memory, and they must synchronize their execution to coordinate the memory accesses (see Fig.).

(Fig.: Illustration of a Grid of Thread Blocks.)

Although GPUs are renowned for their highly parallel operations, there are GPUs with only single-precision capacity. Since many computational problems need double-precision computation, this limitation may lead to bad results. It turns out, therefore, that these GPUs are inappropriate for some types of problems.
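The precision limitation is easy to demonstrate on the CPU side; this sketch (not GPU code, and the function names are illustrative) accumulates 0.1 ten million times in both precisions:

```c
/* Accumulating 0.1 ten million times: in double the result stays very
 * close to 1 000 000, while in single precision the rounding error at
 * large magnitudes makes the sum drift visibly. The same effect hits
 * fitness accumulations on single-precision-only GPUs. */
float sum_single(int n) {
    float s = 0.0f;
    for (int i = 0; i < n; i++)
        s += 0.1f;  /* each add rounds to the nearest float */
    return s;
}

double sum_double(int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += 0.1;
    return s;
}
```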
The CUDA capacity to execute a high number of threads in parallel is due to the hierarchical organization of these threads as a grid of blocks. Besides, the threads of a block must synchronize among themselves to coordinate their memory accesses. Each GPU has its own limit on the number of threads per block. As a consequence, an application that needs to go beyond this limit has to be spread across more blocks, executed sequentially if necessary; otherwise it might obtain wrong or, at least, outdated results.
The cards with compute capability 2.x can run up to 1024 threads per block and have 48 KB of shared memory space; the other ones can only execute up to 512 threads per block and have 16 KB of shared memory space.

(Fig.: CUDA C program structure.)

When a kernel function is launched, the GPU executes many parallel copies of it. These copies are also known as parallel blocks and are divided into a number of execution threads. The blocks are divided into threads that can be structured in 1 to 3 dimensions. As a consequence, the kernel functions can be easily instantiated (see Fig.). On the current GPUs, a thread block may contain up to 1024 threads, and the simulations for this chapter were made with GPUs that support this upper limit. Another important concept in the CUDA architecture is the warp, which refers to a group of 32 threads executed in lockstep, i.e. all the threads of a warp execute the same instruction at the same time.
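The warp grouping follows directly from the thread index within the block, as this small sketch shows (function names are illustrative):

```c
#define WARP_SIZE 32  /* threads per warp on NVIDIA GPUs */

/* Consecutive thread indices fall into the same warp and are therefore
 * issued in lockstep; the warp id and the lane (position inside the
 * warp) are simple integer arithmetic on the thread index. */
int warp_of(int thread_idx) { return thread_idx / WARP_SIZE; }
int lane_of(int thread_idx) { return thread_idx % WARP_SIZE; }
```

This is why divergent branches inside a warp are costly: both sides of the branch must be executed by the whole warp in turn.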
In this chapter, as already mentioned, the data processing is performed directly in the GPU memories. A host function can only be called and executed by the CPU. However, when the synchronization is relaxed the results will probably be worse, since the information acquired by a thread is not necessarily the current best.
Here, we propose two different approaches for asynchronous FSS. In the second approach, called Asynchronous Version B, all the synchronization barriers were removed from the code in order to obtain a fully asynchronous version.
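The behavioural difference that barrier removal introduces can be emulated sequentially. In this sketch, the counting of "improved" fish, the integer fitness values and all names are illustrative assumptions; real asynchronous runs interleave non-deterministically:

```c
#define N_FISH 4  /* school size (illustrative) */

/* Synchronous semantics: every fish compares itself against a snapshot
 * of the best fitness frozen at a barrier before the cycle starts. */
int improved_sync(const int fitness[], int best_at_cycle_start) {
    int count = 0;
    for (int i = 0; i < N_FISH; i++)
        if (fitness[i] > best_at_cycle_start)
            count++;
    return count;
}

/* Asynchronous semantics: without the barrier, the stored best changes
 * while the cycle is still running, so each fish sees whatever value
 * happens to be in memory at the moment it reads. */
int improved_async(const int fitness[], int best_at_cycle_start) {
    int count = 0, best = best_at_cycle_start;
    for (int i = 0; i < N_FISH; i++)
        if (fitness[i] > best) {
            count++;
            best = fitness[i];  /* updated mid-cycle, no barrier */
        }
    return count;
}
```

The two functions give different counts on the same input, which is the essence of why removing barriers trades solution quality for speed.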