Sorbonne Université – SESI M2
MU5IN160 – Parallel Programming
Hands-on Session 6 – Dataflow for Motion Application
Very important, about the submission of your work
At the end of this session, you will have to upload the following files on Moodle: 1) a zip of the src folder and 2) a zip of the include folder. After that, you will have 2 weeks to complete your work and to update your first submission. You have to work in groups of two, but each of you has to upload the files on Moodle. Finally, please write your name and the name of your partner at the top of all these files.
Short introduction
In this session, we will work on a streaming application that detects and tracks moving objects in a video sequence. Contrary to the previous sessions, we will not use EasyPAP this time: the latter is not suited for streaming applications. A working streaming application will be given to you, and you will have to use StreamPU to implement the Motion application through an explicit dataflow representation.
1 Appetizer
First, you need to clone the repository of the Motion project:
git clone --recursive https://gitlab.lip6.fr/parallel-programming/motion-sesi.git
The Motion project uses CMake to generate a Makefile: follow the README instructions to compile the code.
Fig. 1 presents the different algorithms used to detect moving objects and to track them over time. To make it work, two strong assumptions are made: 1) the camera is fixed, 2) the light intensity is constant over time. First, an image is read from a camera (or a video sequence) and converted into a grayscale image. Then, the Σ∆ algorithm is triggered. This algorithm detects whether a pixel is moving over time. It returns a binary image: if a pixel value is 0, the pixel is not moving; if a pixel value is 1, it is moving. After that, morphology algorithms are applied¹. This pre-processing step groups moving pixels together and eliminates isolated pixels. Then, a connected components labeling (CCL) algorithm is performed on the binary image. The latter gives the same label to a group of pixels that are connected to each other. CCL returns an image of labels where l = 0 means no object and l > 0 means a moving object. From this image of labels, some features are extracted (connected components analysis, CCA): for each object, the center of mass $(x_G, y_G)$, the bounding box $[x_{min}, x_{max}, y_{min}, y_{max}]$ and the surface $S$ are extracted. Depending on their surface, the objects are filtered ($S_{min} < S < S_{max}$). From the two images at t − 1 and t, a matching algorithm determines which objects are the same in the two images (mainly according to their distance). Finally, the identified objects are tracked so that each one keeps a constant identifier over time. This graph of tasks is repeated until the video sequence is over. It is not mandatory to understand each algorithm perfectly. The purpose of this session is to work on a streaming application, representative of a real application, and to perform optimizations at the task graph level.
¹ Mathematical morphology: https://en.wikipedia.org/wiki/Mathematical_morphology
In this graph, two tasks cannot be replicated. The per-pixel motion algorithm requires its previous output to compute the current binary image: it detects intensity variations over time.
It is almost the same for the tracking algorithm, which maintains a list of tracks that are updated according to the last frame.
If you would like to better understand the algorithms used in this project, some of them are described in more detail in the appendix of this document. In any case, you do not need to understand exactly what these algorithms do to complete this lab.
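The CCL, CCA and surface filtering steps described above can be pictured on a tiny binary image. The following is a minimal sketch under stated assumptions, not the project's code: the labeling uses a plain breadth-first flood fill with 4-connectivity (a swapped-in technique, whereas the project uses CCL_LSL_apply), and the Features structure and the ccl_flood_fill, cca_extract and filter_surface functions are hypothetical (the project stores features in RoI_t).

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical per-object features (the project uses its own RoI_t structure).
struct Features {
    uint32_t id;                      // label l > 0
    uint32_t xmin, xmax, ymin, ymax;  // bounding box
    uint32_t S;                       // surface, in pixels
    double xG, yG;                    // center of mass
};

// CCL sketch: labels a binary image (0 = static, non-zero = moving) with a BFS
// flood fill using 4-connectivity. labels[i] = 0 for the background and 1..n
// for the n objects found. Returns n.
uint32_t ccl_flood_fill(const std::vector<uint8_t>& bin, int w, int h,
                        std::vector<uint32_t>& labels) {
    assert((int)bin.size() == w * h);
    labels.assign(bin.size(), 0);
    const int dx[4] = {1, -1, 0, 0}, dy[4] = {0, 0, 1, -1};
    uint32_t n = 0;
    std::queue<std::pair<int, int>> q;
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            if (!bin[y * w + x] || labels[y * w + x]) continue;
            labels[y * w + x] = ++n;  // first pixel of a new object
            q.push({x, y});
            while (!q.empty()) {
                std::pair<int, int> c = q.front();
                q.pop();
                for (int k = 0; k < 4; k++) {
                    int nx = c.first + dx[k], ny = c.second + dy[k];
                    if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
                    int i = ny * w + nx;
                    if (bin[i] && !labels[i]) { labels[i] = n; q.push({nx, ny}); }
                }
            }
        }
    return n;
}

// CCA sketch: surface, bounding box and center of mass of each label 1..n.
std::vector<Features> cca_extract(const std::vector<uint32_t>& labels, int w, int h,
                                  uint32_t n) {
    std::vector<Features> f(n);
    for (uint32_t k = 0; k < n; k++)
        f[k] = {k + 1, (uint32_t)w, 0, (uint32_t)h, 0, 0, 0.0, 0.0};
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            uint32_t l = labels[y * w + x];
            if (l == 0) continue;
            Features& o = f[l - 1];
            uint32_t ux = (uint32_t)x, uy = (uint32_t)y;
            if (ux < o.xmin) o.xmin = ux;
            if (ux > o.xmax) o.xmax = ux;
            if (uy < o.ymin) o.ymin = uy;
            if (uy > o.ymax) o.ymax = uy;
            o.S++;
            o.xG += x;  // sums of coordinates, divided by S below
            o.yG += y;
        }
    for (Features& o : f) {  // every label has S >= 1
        o.xG /= o.S;
        o.yG /= o.S;
    }
    return f;
}

// Surface filter sketch: keeps the objects such that S_min < S < S_max.
std::vector<Features> filter_surface(const std::vector<Features>& in, uint32_t s_min,
                                     uint32_t s_max) {
    std::vector<Features> out;
    for (const Features& o : in)
        if (o.S > s_min && o.S < s_max) out.push_back(o);
    return out;
}
```

For example, on a 5 × 3 image containing a 2 × 2 square and a vertical pair of pixels, ccl_flood_fill finds two objects, cca_extract gives the square a surface of 4 with its center of mass at (0.5, 0.5), and filter_surface(f, 3, 10) keeps only the square.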
1.1 Run Motion
To run the code, you will need some input videos. You can download a collection of videos on Moodle (see the “Artifacts” section) or from this web link: http://www.potionmagic.eu/~adrien/data/traffic.zip.
First, unzip traffic.zip, then run the code from the build directory with the following command:
./bin/motion2 --vid-in-path ./traffic/1080p_day_street_top_view_snow.mp4 \
--flt-s-min 2000 --knn-d 50 --trk-obj-min 5 --vid-out-play --vid-out-id
You should see a window with a top view of a highway and some moving cars (see Fig. 2), with green bounding boxes around the cars.
Figure 2: Motion screenshot (with --vid-out-play --vid-out-id parameters).
1.2 Architecture of the Project
Motion is mainly a C-style project, but it is compiled in C++ to use StreamPU. The sources are located in the src folder, which contains 3 sub-folders:
- common: contains implementations of the processing tasks,
- main: contains source files that correspond to a final binary executable,
- wrapper: contains C++ files to wrap the C-style processing functions into StreamPU modules
and tasks.
The headers are located in the include folder. Inside, there are two sub-folders: c/motion for the C-style headers and cpp/motion for the C++ headers.
2 From Imperative to Dataflow Programming
We will convert the motion2 main into a dataflow description (= StreamPU modules and tasks). The
motion2 main is located in src/main/motion2.c. This implementation is very close to the task graph
presented in Fig. 1.
Task #1
Understand the code, run the motion2 executable and play with the parameters (-h shows and describes the available parameters).
To help you in this task, we created another main based on motion2.c and converted some C functions into StreamPU modules for you. See the motion2_spu.cpp file.
Task #2
Understand the code, run the motion2-spu executable and play with the parameters (-h shows and describes the available parameters). Understand the code of motion2_spu.cpp by comparing it with the C-style motion2.c code.
Task #3
Create new StreamPU stateful modules. For each of them, create new .cpp and .hpp files in the wrapper folders. Only declare input and output sockets (DON'T use forward sockets at this time):
- Sigma_delta: Add a StreamPU compute task that will call the sigma_delta_compute function,
- Morpho: Add a StreamPU compute task that will call the C morpho_compute_opening3 and morpho_compute_closing3 functions,
- CCL: Add a StreamPU apply task that will call the C CCL_LSL_apply function,
- Features_CCA: Add a StreamPU extract task that will call the C features_extract function,
- Features_filter: Add a StreamPU filter task that will call the C features_filter_surface and features_shrink_basic functions (note that the maximum size of the input features differs from the maximum size of the output features: indeed, the main purpose of the shrink function is to reduce the maximum number of features and to save memory space),
- KNN: Add a StreamPU match task that will call the C kNN_match function,
- Tracking: Add a StreamPU perform task that will call the C tracking_perform function.
Add the StreamPU modules and tasks incrementally in the motion2_spu.cpp file, and test that their integration is working (you can compare the logs with a diff, see Note #2 below). Have a look at how we did this for the other StreamPU tasks that are given to you and follow the same philosophy: 1) bind the sockets to the buffers allocated in the main file and 2) call the exec() method explicitly.
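To make the role of the KNN task easier to picture, here is a minimal, hypothetical sketch of distance-based matching between the objects of frames t − 1 and t. It is not the project's kNN_match, whose exact criteria are not described here: it simply pairs mutual nearest centroids whose distance is at most d_max, a threshold presumably similar in spirit to the --knn-d parameter used in the commands of this document. The Centroid type and match_nearest function are illustrative names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical centroid type (the project stores features in RoI_t).
struct Centroid { double x, y; };

// For each object of frame t-1, returns the index of its match in frame t, or
// -1. A pair (i, j) is accepted only if j is the nearest object of i, i is the
// nearest object of j (mutual nearest neighbors) and their distance is <= d_max.
std::vector<int> match_nearest(const std::vector<Centroid>& prev,
                               const std::vector<Centroid>& cur, double d_max) {
    assert(d_max >= 0.0);
    auto dist = [](const Centroid& a, const Centroid& b) {
        return std::hypot(a.x - b.x, a.y - b.y);
    };
    // Index of the element of `set` closest to c, or -1 if `set` is empty.
    auto nearest = [&](const Centroid& c, const std::vector<Centroid>& set) {
        int best = -1;
        double best_d = 0.0;
        for (std::size_t k = 0; k < set.size(); k++) {
            double d = dist(c, set[k]);
            if (best < 0 || d < best_d) { best = (int)k; best_d = d; }
        }
        return best;
    };
    std::vector<int> match(prev.size(), -1);
    for (std::size_t i = 0; i < prev.size(); i++) {
        int j = nearest(prev[i], cur);
        if (j < 0) continue;
        if (nearest(cur[j], prev) == (int)i && dist(prev[i], cur[j]) <= d_max)
            match[i] = j;
    }
    return match;
}
```

The mutual nearest-neighbor condition is one simple way to avoid assigning two objects of t − 1 to the same object of t; the distance threshold rejects pairs that are too far apart to plausibly be the same car.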
Note #1
It is NOT possible to create sockets of the RoI_t structure: only basic C types are supported. To get around this limitation, you can count the number of bytes in the structure. For instance, you can do something like:
auto si_RoIs = this->template create_socket_in<uint8_t>(t, "in_RoIs", max_size * sizeof(RoI_t));
Note #2
motion2 is our golden model. To compare the results of motion2 and motion2-spu, you first need to generate the logs of the motion2 executable (we only process the first frames, to execute faster):
./bin/motion2 --vid-in-path ./traffic/1080p_day_street_top_view_snow.mp4 \
--vid-in-stop 20 --flt-s-min 2000 --knn-d 50 --trk-obj-min 5 --log-path logs_refs
Secondly, you need to generate the logs of the motion2-spu executable:
./bin/motion2-spu --vid-in-path ./traffic/1080p_day_street_top_view_snow.mp4 \
--vid-in-stop 20 --flt-s-min 2000 --knn-d 50 --trk-obj-min 5 --log-path logs_spu
Finally you need to compare the logs together:
diff logs_refs logs_spu
If the latter command returns nothing, motion2 and motion2-spu are equivalent (in terms of features). This is good: your new implementation is correct! If not... it is time to debug :’-(.
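Coming back to the RoI_t limitation of Note #1, the byte-counting trick can be illustrated without StreamPU. In this minimal sketch, RoI_demo, pack_RoIs and unpack_RoI are hypothetical names (the real RoI_t has other fields): an array of structures travels through a uint8_t buffer of max_size * sizeof(structure) bytes and is read back with std::memcpy, which sidesteps the alignment and aliasing pitfalls that a raw reinterpret_cast can raise.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for RoI_t (the real structure has other fields).
struct RoI_demo {
    uint32_t id;
    uint32_t S;
    float xG, yG;
};

// Serializes n structures into a byte buffer sized for max_size structures,
// i.e. max_size * sizeof(RoI_demo) bytes, like a uint8_t socket would be.
// Unused slots are zeroed.
void pack_RoIs(const RoI_demo* rois, std::size_t n, std::size_t max_size,
               std::vector<uint8_t>& bytes) {
    assert(n <= max_size);
    bytes.assign(max_size * sizeof(RoI_demo), 0);
    std::memcpy(bytes.data(), rois, n * sizeof(RoI_demo));
}

// Reads the i-th structure back from the byte buffer.
RoI_demo unpack_RoI(const std::vector<uint8_t>& bytes, std::size_t i) {
    assert((i + 1) * sizeof(RoI_demo) <= bytes.size());
    RoI_demo r;
    std::memcpy(&r, bytes.data() + i * sizeof(RoI_demo), sizeof(RoI_demo));
    return r;
}
```

Since the socket size is fixed to the maximum number of structures, the actual number of valid elements has to travel separately (in Source code 1, for instance, CCL outputs n_RoIs on its own socket).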
Task #4
At this point, you should only have StreamPU tasks that call their exec() method explicitly (no more C-style function calls). However, the code still uses the data allocated in the main function. This can be improved, because StreamPU performs the data allocation and deallocation automatically for you. In order to remove most of these allocations, you have to perform partial “output to input socket” bindings. For instance, if we only consider eliminating the IB0 buffer, it is possible to remove the bindings between the IB0 pointer and the sockets and to add “output to input socket” bindings instead, as shown in Source code 1. Do it for all the buffers, EXCEPT for IG0 and IG1. It is strongly advised to do it step by step and to check that the code gives exactly the same results after each modification (please refer to Note #2).
// [...]
// step 1: motion detection (per pixel) with Sigma-Delta algorithm
sd0["compute::in_img"].bind(IG0[0]);
// sd0["compute::out_img"].bind(IB0[0]); // this line can be removed
sd0("compute").exec();
// step 2: mathematical morphology
// mrp0["compute::in_img"].bind(IB0[0]); // this line can be removed
mrp0["compute::in_img"] = sd0["compute::out_img"]; // <-- [NEW] output to input socket binding
// mrp0["compute::out_img"].bind(IB0[0]); // this line can be removed
mrp0("compute").exec();
// step 3: connected components labeling (CCL)
uint32_t n_RoIs_tmp0;
// ccl0["apply::in_img"].bind(IB0[0]); // this line can be removed
ccl0["apply::in_img"] = mrp0["compute::out_img"]; // <-- [NEW] output to input socket binding
ccl0["apply::out_labels"].bind(L10[0]);
ccl0["apply::out_n_RoIs"].bind(&n_RoIs_tmp0);
ccl0("apply").exec();
// [...]
Source code 1: Example of partial socket binding to eliminate IB0 buffer allocation/deallocation in the
main function.
Task #5
Now, replace IG0 and IG1 buffers by the binding of the video["generate::out_img_gray8"]
socket. For this, you will need to use a Delayer module in order to keep the t − 1 image in memory
(previously kept in the IG0 buffer). If you don’t use it, the t − 1 image will always be overwritten when
executing the video("generate") task.
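The role of the Delayer can be pictured with a hypothetical FrameDelay class written in plain C++. It is not StreamPU's Delayer and does not mimic its API: it owns a private copy of the previous frame, so the producer can overwrite its own output buffer at time t while the t − 1 image stays available. Its init() method plays a role similar to initializing the delay with the first frame, as discussed in Note #3.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical one-frame delay line (NOT StreamPU's Delayer class).
class FrameDelay {
public:
    explicit FrameDelay(std::size_t n_pixels) : stored_(n_pixels, 0) {}

    // Sets the initial memory content, e.g. the first frame of the video.
    void init(const std::vector<uint8_t>& frame) {
        assert(frame.size() == stored_.size());
        stored_ = frame;
    }

    // Writes the stored frame (t-1) to out, then keeps a copy of cur (t).
    void delay(const std::vector<uint8_t>& cur, std::vector<uint8_t>& out) {
        assert(cur.size() == stored_.size());
        out = stored_;
        stored_ = cur;
    }

private:
    std::vector<uint8_t> stored_;  // private copy, not shared with the producer
};
```

With init() called on the first frame and delay() called once per frame, the first call returns the first frame itself, so a task that compares t − 1 and t can run from the very first iteration without any control flow.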
Note #3
In the motion2 executable, some tasks are not executed in the first stream (see the following
condition in the motion2.c file: “if (n_processed_frames > 0)”). To manage it you have two possible
options:
- Always execute the tasks (no control flow) but in this case you need to carefully initialize the
Delayer module to the first frame with the Delayer::set_data() method (this solution is
simpler to implement),
- Use a Switcher and a Controller_limit module to implement the control flow (= if condition).
To simplify, you will only put the Sigma_delta.compute() task in the condition. In other words,
the CCL, the CCA and the filtering will be executed anyway.
Task #6
At this point, you should not have any memory allocation or deallocation left in the main function. The next objective is to get rid of the multiple exec() calls over the tasks: you will separate the binding from the execution. To do this, the socket bindings need to be moved outside of the while(1) loop, and the while(1) loop needs to be replaced by a StreamPU Sequence. Once this is done, only one exec() call should remain: the one over the newly created Sequence object. Of course, you will check that the results are still the same (see Note #2).
Note #4
To help you with debugging, you can print the sequence graph with the export_dot method. Enable/disable the logs, enable/disable the visualization and observe the impact on the task graph. If you chose not to implement control flow, the output graph should look like the one in Fig. 3. Note that you can customize the name of a module with the set_custom_name(std::string custom_name) method.
Task #7
Before executing the sequence, enable the statistics of the tasks (use the get_modules method on the sequence object). After the sequence execution, print them (tools::Stats::show function). The application should display the statistics only if the --stats parameter is given. What do you see? Is it different from the motion2 executable? Explain.
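The separation between binding and execution (Task #6) and the per-task statistics (Task #7) can be pictured with a hypothetical MiniSequence class. It is NOT the StreamPU Sequence and does not reproduce its API: tasks are registered once (the analog of binding the sockets outside of the loop), then a single exec() call runs the chain for every frame and accumulates, for each task, the number of calls and the elapsed time measured with std::chrono.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical miniature sequence (NOT the StreamPU Sequence class).
class MiniSequence {
public:
    // Registers a task once, before execution (the analog of binding the
    // sockets outside of the while(1) loop).
    void add_task(const std::string& name, std::function<void()> fn) {
        assert(static_cast<bool>(fn));
        tasks_.push_back(Task{name, std::move(fn), 0, 0.0});
    }

    // Single exec() call: runs the whole task chain once per frame until
    // stop() returns true, and accumulates per-task statistics.
    void exec(const std::function<bool()>& stop) {
        while (!stop()) {
            for (Task& t : tasks_) {
                auto t0 = std::chrono::steady_clock::now();
                t.fn();
                auto t1 = std::chrono::steady_clock::now();
                t.n_calls++;
                t.total_ms += std::chrono::duration<double, std::milli>(t1 - t0).count();
            }
        }
    }

    // Prints one line of statistics per task.
    void show_stats() const {
        for (const Task& t : tasks_)
            std::printf("%-12s calls=%zu avg=%.3f ms\n", t.name.c_str(), t.n_calls,
                        t.n_calls ? t.total_ms / t.n_calls : 0.0);
    }

    std::size_t n_calls(std::size_t i) const { return tasks_.at(i).n_calls; }

private:
    struct Task {
        std::string name;
        std::function<void()> fn;
        std::size_t n_calls;
        double total_ms;
    };
    std::vector<Task> tasks_;
};
```

For instance, registering a "source" task that counts frames and a "sink" task, then calling exec() with a stop condition that becomes true after three frames, runs each task exactly three times; show_stats() then prints one line per task.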
[Bonus] Task #8
When you think it is necessary, create new tasks, suffixed with an f, that use forward sockets instead of a combination of input/output sockets. For instance, if we consider a task named compute without forward socket, the task that uses a forward socket will be named computef. You will NOT replace the former compute task. Using forward sockets should help you to remove useless copies. Do it incrementally to validate that the application is still working (see Note #2). Can you see an improvement in the statistics of the tasks?
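The difference between a compute task and its computef variant can be sketched with two hypothetical C++ functions (the names are illustrative, they are not part of the project): the first one reads an input buffer and writes a distinct output buffer, as an input socket plus an output socket would; the second one reads and writes the same buffer in place, which is what a forward socket allows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// "compute" style: converts a {0, 1} binary image into {0, 255}, reading from
// an input buffer and writing to a separate output buffer.
void to_255_compute(const std::vector<uint8_t>& in, std::vector<uint8_t>& out) {
    out.resize(in.size());
    for (std::size_t i = 0; i < in.size(); i++) out[i] = in[i] ? 255 : 0;
}

// "computef" style: same processing, done in place on a single buffer, so no
// output buffer is needed and nothing has to be copied downstream.
void to_255_computef(std::vector<uint8_t>& img) {
    for (uint8_t& p : img) p = p ? 255 : 0;
}
```

In-place processing is only safe when each output value depends on input data that has not been overwritten yet, as in this pixel-wise example; a naive 3 × 3 morphology stencil, for instance, would read neighbors that were already modified.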
Appendix
2.1 Sigma-Delta Algorithm (Σ∆)
The motion detection problem consists in separating moving and static areas in each frame. At each instant, each pixel must be tagged with a fixed/moving binary identifier. When the camera is fixed, such detection can be performed using the time differences computed for each pixel.
The following notations apply:
- $t$: current instant of time, used to identify the frames,
- $I_t$: grayscale source image at time $t$,
- $I_{t-1}$: grayscale source image at time $t-1$,
- $M_t$: background image (mean image),
- $O_t$: grayscale difference image,
- $V_t$: image of variance (standard deviation) computed for each pixel,
- $L_t$: binary label image (motion/background), $L_t(x) \in \{0, 1\}$ or $L_t(x) \in \{0, 255\}$ to encode {background, movement},
- $x$: the current pixel with $(i, j)$ coordinates.
Most motion detection techniques for an image sequence $I_t(x)$ are based on an estimate of the modulus of the temporal gradient $\left| \frac{\partial I}{\partial t} \right|$. If the light intensity of the scene varies slowly (= is constant between two consecutive images), then a significant variation of the pixel grayscale (above a threshold) between two images implies that there is movement at that point.
The Σ∆ algorithm assumes that the noise level can vary at any point. To achieve this, the pixel grayscale is modeled by a mean $M_t(x)$ and a variance (standard deviation) $V_t(x)$. If the difference between the current image and the background image is greater than N times the standard deviation, then movement occurs. The value of N is a parameter; in this project, N is always set to 2. This is a motion detection system based on the estimation of static background statistics using Σ∆ modulation: an iterative analog/digital conversion method that increments or decrements the digitized value by one unit according to the result of the comparison between the analog value and the current digitized value.
Algorithm 1: Sigma-Delta (Σ∆).
The algorithm initialization for $t = 0$ is the following: $M_0(x) \leftarrow I_0(x)$ and $V_0(x) \leftarrow V_{min}$. Then, the algorithm is applied to the images from $t = 1$. The $V_{min}$ and $V_{max}$ constants are used to restrict the possible values of $V_t$. Typically, $V_{min} = 1$ and $V_{max} = 254$. The complete algorithm after initialization is shown in Alg. 1.
In the Motion project, a naive Σ∆ implementation is given to you:
- Header: in the include/c/motion/sigma_delta/sigma_delta_compute.h file,
- Source: in the src/common/sigma_delta/sigma_delta_compute.c file.
See the sigma_delta_compute function.
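The steps of the Σ∆ algorithm (mean, difference, variance, label) can be sketched in C++ with the notations above ($M_t$, $O_t$, $V_t$, $L_t$, N = 2, $V_{min} = 1$, $V_{max} = 254$). This is a minimal sketch written from the classic formulation of the algorithm, not from sigma_delta_compute, which may differ in details (data layout, {0, 1} versus {0, 255} label encoding, etc.); the function names sigma_delta_init and sigma_delta_step are hypothetical. Note that $V_t$ moves toward $N \times O_t$, so the factor N is already included in $V_t$ when it is compared with $O_t$.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Initialization at t = 0: M_0(x) <- I_0(x) and V_0(x) <- V_min.
void sigma_delta_init(const std::vector<uint8_t>& I0, std::vector<uint8_t>& M,
                      std::vector<uint8_t>& V, int vmin = 1) {
    M = I0;
    V.assign(I0.size(), (uint8_t)vmin);
}

// One step of the classic Sigma-Delta algorithm for t >= 1. M and V are updated
// in place and L receives the labels: 0 = background, 1 = movement.
void sigma_delta_step(const std::vector<uint8_t>& I, std::vector<uint8_t>& M,
                      std::vector<uint8_t>& V, std::vector<uint8_t>& L,
                      int N = 2, int vmin = 1, int vmax = 254) {
    assert(I.size() == M.size() && I.size() == V.size());
    L.resize(I.size());
    for (std::size_t x = 0; x < I.size(); x++) {
        // Part #1: mean computation, M moves one unit toward I.
        if (M[x] < I[x]) M[x]++;
        else if (M[x] > I[x]) M[x]--;
        // Difference computation: O = |M - I|.
        int O = std::abs((int)M[x] - (int)I[x]);
        // Variance computation: V moves one unit toward N * O (only when
        // O != 0), then is clamped to [V_min, V_max].
        int v = V[x];
        if (O != 0) {
            if (v < N * O) v++;
            else if (v > N * O) v--;
        }
        if (v < vmin) v = vmin;
        if (v > vmax) v = vmax;
        V[x] = (uint8_t)v;
        // Label: the pixel is moving when O reaches V.
        L[x] = (O < v) ? 0 : 1;
    }
}
```

Because M and V only move by one unit per frame, a sudden intensity change (a car entering a pixel) produces a large O while V is still small, hence a "moving" label; a permanent change is slowly absorbed into the background.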
2.2 Mathematical Morphology
In this project, we consider squared elements B of size 3 × 3. Let X be the set of pixels associated with the B element. There are two basic operations: the dilation of X, noted $\delta_B(X)$, and the erosion of X, noted $\epsilon_B(X)$. The application of mathematical morphology operators is similar to filtering operators (stencils or convolutions), but with non-linear operations.
For binary images, dilation consists in computing an OR on the B neighborhood in the source image and writing the result to the destination image. Conversely, erosion consists in computing an AND on the neighborhood. So, if one point in the neighborhood is 1, the dilation produces a 1 (since x OR 1 == 1), thus dilating the binary connected component. Conversely, if only one pixel is 0 in the B neighborhood, the erosion produces a 0 (since x AND 0 == 0), thus eroding the connected component.
Erosion is used to reduce noise in images: if we consider that a small group of pixels is the noise that we are trying to remove, then applying an erosion with a B element of size 3 × 3 makes any group of pixels that cannot contain a full 3 × 3 square disappear.
Figure 4: Left: the initial binary image. Center: eroded image with a 3 × 3 squared element: the gray pixels are removed. Right: dilated image with a 3 × 3 squared element: the gray pixels are added. Source: Wikipedia.
Let r be the radius and d = 2r + 1 the diameter of a squared element B. Then, an erosion of radius r removes a thickness of r pixels from the contour of any connected component, while a dilation of radius r adds a thickness of r pixels to the contour (see Fig. 4; note that in the figure the logic is reversed: pixels at 1 are black while pixels at 0 are white).
Figure 5: Left: the initial binary image. Center: opened image with a 3 × 3 squared element: the gray pixels are removed. Right: closed image with a 3 × 3 squared element: the gray pixels are added. Source: Wikipedia.
From these two operators, two others can be defined: the closing $\phi_B(X) = \epsilon_B(\delta_B(X))$ and the opening $\gamma_B(X) = \delta_B(\epsilon_B(X))$. Closing reduces (or even completely closes) holes in connected components, while opening does the opposite, enlarging these same holes (see Fig. 5; note that in the figure the logic is reversed: pixels at 1 are black, while pixels at 0 are white).
One of the advantages of opening and closing is that they preserve the (discrete) size of the regions, unlike erosion, which reduces it, or dilation, which increases it. Depending on requirements, either a closing or an opening can be chosen. As these operators are idempotent, applying them several times does not change the result (which is identical to that obtained after a single application). On the other hand, they can be chained (opening then closing, or closing then opening) to improve the resulting image (noise reduction, filling holes, ...). By gradually increasing their radius, we obtain alternating sequential filters, which are particularly effective for removing noise.
In the Motion project, naive 3 × 3 mathematical morphology implementations are given to you:
- Header: in the include/c/motion/morpho/morpho_compute.h file,
- Source: in the src/common/morpho/morpho_compute.c file.
See the morpho_compute_opening3 and morpho_compute_closing3 functions.
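The erosion, dilation, opening and closing described above can be sketched for binary images as follows. This is a naive, hypothetical implementation (it is not morpho_compute_opening3 or morpho_compute_closing3, and the function names are illustrative). It assumes that pixels outside the image are simply ignored, which means that objects touching the border are not eroded from that side; the project may handle borders differently.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Img = std::vector<uint8_t>;  // binary image {0, 1}, row-major, w x h

// 3x3 erosion (AND over the neighborhood) or dilation (OR over the
// neighborhood). Assumption: pixels outside the image are ignored.
static Img morpho3(const Img& in, int w, int h, bool dilate) {
    assert((int)in.size() == w * h);
    Img out(in.size());
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            uint8_t acc = dilate ? 0 : 1;  // neutral element of OR / AND
            for (int dy = -1; dy <= 1; dy++)
                for (int dx = -1; dx <= 1; dx++) {
                    int nx = x + dx, ny = y + dy;
                    if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
                    uint8_t p = in[ny * w + nx] ? 1 : 0;
                    acc = dilate ? (uint8_t)(acc | p) : (uint8_t)(acc & p);
                }
            out[y * w + x] = acc;
        }
    return out;
}

Img erode3(const Img& in, int w, int h) { return morpho3(in, w, h, false); }
Img dilate3(const Img& in, int w, int h) { return morpho3(in, w, h, true); }

// Opening: erosion then dilation, removes objects that cannot contain a 3x3 square.
Img open3(const Img& in, int w, int h) { return dilate3(erode3(in, w, h), w, h); }

// Closing: dilation then erosion, fills small holes.
Img close3(const Img& in, int w, int h) { return erode3(dilate3(in, w, h), w, h); }
```

For instance, opening a 7 × 5 image that contains a 3 × 3 square and one isolated pixel keeps the square (9 pixels) and removes the isolated pixel, while closing a 3 × 3 ring fills its central hole.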