CSCI-UA.0480-051: Parallel Computing


Final Exam (May 15th, 2023)

Total: 100 points

Problem 1

Suppose we have the following two DAGs. Each DAG represents a process: DAG 1 is one process and DAG 2 is another. The two DAGs are totally independent of each other. The table shows the time taken by each task in each process.

a. [8 points] What will be the minimum time taken by each process if we execute it alone on one core, two cores, three cores, and four cores? That is, execute DAG 1 on one core, then on two cores, then on three cores, and then on four cores. Do the same for DAG 2. (Hint: You can put the result in the form of a table with three columns: #cores, time for DAG 1, and time for DAG 2. Your table will have four rows, for one, two, three, and four cores.) Assume each core does not have hyperthreading technology.
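One standard way to sanity-check these times (a general bound, not part of the original exam): for any schedule of a DAG on $p$ cores,

$$T_p \ge \max\left(\frac{T_1}{p},\; T_\infty\right)$$

where $T_1$ is the total work (the sum of all task times) and $T_\infty$ is the span (the length of the longest path through the DAG).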

b. [10 points] Based on the results you calculated in part a above, which DAG benefited more from using more cores? The two DAGs look similar, yet one benefited more from extra cores. What is the main characteristic of that DAG that makes it benefit from more cores?

c. [10 points] Suppose DAG 1 is your media player and DAG 2 is your browser. You start the two processes at exactly the same time on a machine with eight cores (with no hyperthreading technology). How long does each DAG take to finish? Justify your answer.

Problem 2

Suppose we have the following MPI+OpenMP piece of code. The whole program is not shown. The lines with dots (i.e., lines 2, 7, 10, and 12) contain other code not needed for this problem. Some variables are used but not declared (e.g., num, procID, n, and the arrays A and B); assume they have been declared in code not shown here. Assume this program runs on an eight-core processor, where each core is four-way hyperthreaded. Each core has its own private level 1 cache of 32KB and its own level 2 cache of 1MB. There is a shared level 3 cache of 8MB. The program is executed using the following command (progname is the name of the program's executable):

mpiexec -n 4 ./progname
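(For reference: eight cores, each four-way hyperthreaded, expose 8 × 4 = 32 hardware thread contexts in total.)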

1.  int main(int argc, char **argv){
2.      ...
3.      float finalresult, sigma;
4.      MPI_Init(&argc, &argv);
5.      MPI_Comm_size(MPI_COMM_WORLD, &num);
6.      MPI_Comm_rank(MPI_COMM_WORLD, &procID);
7.      ...
8.      sigma = procID / num;
9.      finalresult = findNum(A, B, n, sigma);
10.     ...
11.     MPI_Finalize();
12.     ...
13. }
14.
15. float findNum(float * A, float * B, int n, float sigma)
16. {
17.     float result = 0, total = 0;
18.     int i;
19.     #pragma omp parallel for reduction(+:result) num_threads(4)
20.     for (i = 0; i < n; i++)
21.     {
22.         float factor;
23.         factor = A[i] * B[i];
24.         result += factor * sigma * i;
25.     }
26.     MPI_Allreduce(&result, &total, 1, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);
27.     return total;
28. }
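For background on the reduction clause at line 19, here is a minimal standalone OpenMP sketch (not part of the exam; the array data and its length are hypothetical). Each thread accumulates into its own private copy of result, and the private copies are combined at the end:

#include <omp.h>
#include <stdio.h>

int main(void) {
    float data[1000], result = 0;
    for (int i = 0; i < 1000; i++)
        data[i] = 1.0f;

    /* reduction(+:result): each of the 4 threads gets a private copy of
       result initialized to 0; the copies are summed into the shared
       result when the parallel loop ends, so no race occurs on result. */
    #pragma omp parallel for reduction(+:result) num_threads(4)
    for (int i = 0; i < 1000; i++)
        result += data[i];

    printf("%f\n", result);  /* prints 1000.000000 */
    return 0;
}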

a. [20 points] Fill in the right column of the following table with a short answer to the questions on the left. Please do not write any justification unless the question asks for it explicitly.

How many processes were created in the whole system?

How many threads are generated in total?

What is the maximum number of those threads (that you mentioned above) that can execute in parallel?

How many copies of the variable ‘sigma’ were created in the whole system?

How many copies of the variable ‘factor’ were created in the whole system?

How many threads will execute the code in line 12 (the dots ...)?

How many threads will execute the code in line 24?

How many threads will execute the code in line 26?

Is there a race condition for line 23?

Justify your answer to the above question in one line.

b. [8 points] Is there a situation in the above code where threads accessing the arrays A[] and B[] can trigger the cache coherence protocol? If yes, what is that situation? If not, why not?

c. [8 points] Given the code above, how many virtual address spaces did the OS create? Justify your answer in 1-2 lines only.

d. [8 points] Is there a possibility that a process reaches line 27 before the other processes? If yes, will this cause a problem, and what is that problem? If not, why not?

e. [8 points] If one of the processes created for the above code crashed for some reason, do we risk having a deadlock? Justify.

Problem 3

For each question below, choose all correct answers. That is, a question may have one or more correct answers.

a. [4 points] Suppose a process wants to send data to a subset of processes. That process has the following options (a sketch of option 1 follows the list):

1. Split the communicator into smaller ones and use collective communication.

2. Make a series of send and receive calls, one to each of the destination processes.

3. Use a broadcast call.

4. Split each process into multiple threads and let threads communicate through shared memory.
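As an illustration of option 1, a minimal sketch using MPI_Comm_split (the subset chosen here, the even-ranked processes, is hypothetical):

#include <mpi.h>

int main(int argc, char **argv) {
    int rank, data = 0;
    MPI_Comm subcomm;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Processes with the same color end up in the same new communicator;
       here even ranks form one group and odd ranks another. */
    int color = rank % 2;
    MPI_Comm_split(MPI_COMM_WORLD, color, rank, &subcomm);

    if (color == 0) {
        if (rank == 0) data = 42;  /* world rank 0 is rank 0 in the even group */
        MPI_Bcast(&data, 1, MPI_INT, 0, subcomm);  /* collective on the subset only */
    }

    MPI_Comm_free(&subcomm);
    MPI_Finalize();
    return 0;
}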

b. [4 points] A warp in an NVIDIA GPU:

1. is transparent to the programmer.

2. consists of a maximum of 32 threads.

3. has each group of four threads share the same fetch and decode hardware.

4. may suffer from thread divergence (also called branch divergence).
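To illustrate the divergence named in option 4, a minimal CUDA sketch (a hypothetical kernel, not from the exam): threads of the same warp that take different branch paths are executed serially, one path at a time.

/* Threads in the same warp share one instruction stream; when their
   branch conditions differ, the hardware serializes the two paths. */
__global__ void divergent(int *out) {
    int i = threadIdx.x;
    if (i % 2 == 0)
        out[i] = 2 * i;   /* even lanes take this path ... */
    else
        out[i] = i + 1;   /* ... odd lanes take this one, serialized */
}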

c. [4 points] A block in CUDA can be split among two SMs.

1. This statement is always true.

2. This statement is true if the block has more threads than the number of SPs in the SM.

3. This statement is always false.

4. This statement is false only if the number of threads in the block is less than the number of SPs in the SM.

d. [4 points] The following characteristics are needed for code to be GPU-friendly.

1. computation intensive

2. independent computations

3. similar computations

4. large problem size

e. [4 points] Choose all the correct statements from the following:

1. If one program has a higher speedup than another program, for the same number of cores, it means that the program with the higher speedup also has higher efficiency than the other one.

2. MPI can run on distributed memory machines and shared memory machines.

3. OpenMP can run on distributed memory machines and shared memory machines.

4. Power consumption/dissipation is the main reason we moved from single core to multicore.
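For reference, the standard definitions behind statement 1: with $p$ cores, speedup is $S(p) = T_{\mathrm{serial}} / T_{\mathrm{parallel}}$ and efficiency is $E(p) = S(p) / p$.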
