© Distribution of this video is restricted by its owner
00:01 | All right. Yes. So I wanted to be clear about lastprivate; the explanation in the last lecture wasn't clear. |
00:18 | It is well defined what lastprivate actually brings back: what gets exported out of the parallel region is the X value associated with the last iteration of the for loop, regardless of when that iteration happened to be computed. |
00:47 | It's not the last time X was possibly touched; it's the X value from the last iteration. So we ran an experiment after class last time. This may be hard to display, but it's uploaded. |
01:22 | It basically shows the thread IDs for the threads in this particular case, a couple of iterations each, and you can see that the threads did not execute nicely in order. |
01:57 | With the block assignment of iterations, thread number three had the last iterations, but it finished earlier, so its X value was not the one touched last. |
02:31 | Nevertheless, if you look at the bottom you can see that what was exported was in fact the value created in the iteration where the loop index was at its maximum. |
02:50 | So the order in which the iteration indices were executed does not affect it; it is well defined what gets exported out of the region. Any questions on that? Okay. |
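For reference, a minimal sketch of the lastprivate rule just described. This is not the code from the lecture experiment; the loop body and bounds are made up purely to illustrate the rule.

```c
/* Minimal sketch of lastprivate (illustrative, not the lecture's experiment). */
#include <omp.h>
#include <stdio.h>

int main(void) {
    int x = -1;
    #pragma omp parallel for lastprivate(x)
    for (int i = 0; i < 10; i++) {
        x = i * i;               /* each iteration writes its own private x */
    }
    /* Regardless of which thread ran iteration 9, or when it ran,
     * the exported x is the value from the last iteration: 81. */
    printf("x after the loop = %d\n", x);
    return 0;
}
```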
03:16 | I'll stop sharing my screen and switch to today's lecture. Was there some question I should answer first? Okay. |
03:58 | Then we'll continue with OpenMP, more or less where we left off. I encourage you to do the example I didn't get to last time; the slides are hopefully self-contained, and it's a good exercise to go through and try to make work. If you have questions about it, let us know one way or another. |
04:29 | The grayed-out first bullets here are what was covered last time, so we'll move on to some more work-sharing constructs, the ones listed here. I do not think we will get to the example; instead I'll leave time for the demo. |
04:58 | So again, this is the general structure of OpenMP: the master thread creating parallel regions, the work-sharing constructs, and, as we talked about to some extent, the notion of shared versus private, where one has to be careful about race conditions. And then there is synchronization, which we didn't talk much about last time; that is what I want to talk about today. |
05:33 | The bluish items are the ones I did not cover last time, and I'll cover most of them to some degree today, except I will not cover SIMD and tasks; they will come in a future lecture. |
05:53 | So, additional work-sharing constructs. There are three I will talk about that are listed as work-sharing constructs: the master construct, the single construct, and the sections construct. For master and single, "work sharing" is kind of a misnomer, because they are actually not sharing work; it's the opposite. But they are still in the group of constructs under |
06:21 | this label of work sharing. What the master construct does is basically say that a part of the code should be executed by the master thread only. |
06:44 | So after this pragma omp master inside the parallel region, the code within the curly braces will be executed by the master thread only. |
07:08 | As an example: something almost as commonly used as matrix multiply or matrix-vector multiply is the Jacobi iteration on a mesh, and in that case the boundaries are treated differently from the internal points of the mesh. Here it was chosen that the master thread handles the boundaries, which means the other threads do not bother working on them; there is a piece of code dedicated to the master. |
07:54 | For this particular example, a typical Jacobi iteration on a mesh, it is important for the corrections of the interior points that the boundaries get updated before the threads deal with the internals. That is why there is also a barrier statement after the omp master section of the code: the master construct does not have an implicit barrier, so if one wants things to be synchronized, one has to require it explicitly by inserting a barrier; otherwise the other threads would skip ahead past the work assigned to the master and move on to what follows. |
09:02 | Part of what I'm pointing at on this fork-join picture is that the master thread essentially lives through the whole program, whereas the other threads are confined to being created at the fork and retired at the end of the region; the master thread continues through. That is, in part, the reason why the master construct doesn't have an implicit barrier at the end of the code section to which it applies. That is what I said. |
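A small sketch of the master-plus-barrier pattern along the lines of the Jacobi example above; the array names, sizes, and the update formula are placeholders, not the slide's actual code.

```c
/* Sketch of master + explicit barrier (illustrative Jacobi-style update). */
#include <omp.h>
#include <stdio.h>
#define N 16

int main(void) {
    double uold[N] = {0.0}, unew[N] = {0.0};

    #pragma omp parallel
    {
        #pragma omp master
        {
            uold[0]   = 1.0;    /* boundary values, updated by the master only */
            uold[N-1] = 1.0;
        }
        /* master has no implied barrier; without this, another thread could
         * read uold[0] or uold[N-1] before the master has written them */
        #pragma omp barrier

        #pragma omp for
        for (int i = 1; i < N - 1; i++)
            unew[i] = 0.5 * (uold[i-1] + uold[i+1]);   /* interior update */
    }
    printf("unew[1] = %f\n", unew[1]);
    return 0;
}
```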
09:47 | Then there is the single construct, which is also a work-sharing construct. It basically says that a piece of code, like for master, should be executed by a single thread. |
10:01 | It differs from master in the sense that any thread can take the job of executing the code declared to be executed by just a single thread: it could be the master thread, but it could also be any other thread. |
10:34 | Also, single, unlike the master construct, does have a barrier at the end of the section of code to which it is applied. So in this case the code will behave logically exactly the same as the master version with the explicit barrier, because the barrier is implicit with single. |
11:07 | So that was the master and the single; that's what it says here. Because of the barrier, when one thread executes this code segment, the other ones potentially don't do anything for a while, depending on where they are and what work is assigned to them. |
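A minimal sketch of the single construct and its implied barrier; the variable names and the one-off work are illustrative.

```c
/* Sketch of single: one-off work done by whichever thread gets there,
 * with the implied barrier holding the rest of the team. */
#include <omp.h>
#include <stdio.h>

int main(void) {
    int n = 0;
    #pragma omp parallel shared(n)
    {
        #pragma omp single
        {
            n = 100;   /* executed by exactly one (not necessarily master) thread */
            printf("thread %d did the single section\n", omp_get_thread_num());
        }
        /* implicit barrier here: every thread sees n == 100 below */
        printf("thread %d sees n = %d\n", omp_get_thread_num(), n);
    }
    return 0;
}
```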
11:28 | Then there is the sections construct, which is a way of dividing up work among threads. Every code block that follows an omp section construct is executed by one thread only, but the different sections can be executed in parallel. |
12:03 | So in this case the X, Y and Z calculations can proceed concurrently on different threads, with each one executed by a single thread, and then there is, as it says, the barrier at the end of the pragma omp sections construct. The division of work is explicitly managed through this breakdown into sections. |
12:49 | If one has more sections than threads, then some thread takes on more than one section. But the way this code is written there is no particular thread that is guaranteed a particular section; whatever the runtime decides is a good thread to take on a particular section is what will happen. |
13:16 | So, there is a question in the chat. Yes: the difference from single is that with single you cannot have several threads working on different sections of code at the same time. Each section behaves like the single construct, but unlike single, sections lets you have several threads working on different pieces of code at the same time. Did that answer the question? |
14:23 | So I think these are the work-sharing constructs I had planned to talk about. The barrier is at the very end of the sections construct, so in this case there would most likely be three threads executing, and the synchronization happens at the end, when these three threads have finished their respective code segments. |
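A minimal sketch of the sections construct as described; the x, y and z computations are stand-ins for the real work.

```c
/* Sketch of sections: each section runs on one thread, different sections
 * may run concurrently, and the implied barrier is at the end. */
#include <omp.h>
#include <stdio.h>

int main(void) {
    double x = 0, y = 0, z = 0;
    #pragma omp parallel
    {
        #pragma omp sections
        {
            #pragma omp section
            x = 1.0;              /* one thread computes x */
            #pragma omp section
            y = 2.0;              /* possibly another thread computes y */
            #pragma omp section
            z = 3.0;              /* and another computes z */
        }   /* implicit barrier: x, y and z are all done past this point */
    }
    printf("x=%g y=%g z=%g\n", x, y, z);
    return 0;
}
```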
15:16 | Okay, so a little bit about synchronization. The barrier itself is fairly obvious, and critical and atomic may not need that much time, so I'll go through them. What was your question? Oh, okay. |
15:39 | The barrier does the obvious thing: it ensures that everything assigning values to A is done before any thread starts assigning values to B. That is pretty much the obvious function of barriers. |
16:02 | Then a little example worth spending a bit of time on. I'm going to ask questions, and we'll see if I hear you; you may have to repeat. We can see in this code that X is a global variable and hence shared by all the threads that use it, and it is also explicitly given the shared attribute for the parallel section, so all threads that execute the region have access to it. |
16:52 | What happens in the region is that thread number zero updates the X value, but none of the other threads touch X. Then there is a printf. So the question is: what do the different threads print at the first printf, and what do they print at the second one? |
17:35 | Someone says five, because it's shared. That is correct in the sense that X is shared, but it depends on whether thread zero updated X before the other thread looked at it. If a thread other than zero is slower, it will print five; if it gets to the printf before thread zero does the update, it will print two. So before the barrier it depends on timing. |
18:19 | But after the barrier things are synced, which means at that point X is definitely set to five, so then all threads will print five. |
18:33 | So it is important to keep in mind that threads can execute things in an arbitrary order and there is no guarantee which one makes the most progress; you can assume nothing about the relative timing of different threads. And I'll talk more about flush in a moment; that is the mechanism for making sure values are synchronized between threads. |
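A sketch of the shared-x example as described above (initial value 2, thread 0 writes 5). Note that the pre-barrier print deliberately races with the write; that timing dependence is exactly the point being illustrated.

```c
/* Sketch of the shared-x barrier example: before the barrier a thread may
 * print 2 or 5 depending on timing; after the barrier everyone prints 5. */
#include <omp.h>
#include <stdio.h>

int x = 2;                       /* global, hence shared */

int main(void) {
    #pragma omp parallel shared(x)
    {
        int tid = omp_get_thread_num();
        if (tid == 0)
            x = 5;               /* only thread 0 updates x */
        printf("thread %d before barrier: x = %d\n", tid, x);  /* 2 or 5 */
        #pragma omp barrier
        printf("thread %d after barrier:  x = %d\n", tid, x);  /* always 5 */
    }
    return 0;
}
```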
19:08 | So, critical is a form of mutual exclusion. The promise is that the piece of code to which the critical construct applies can only be executed by one thread at a time, but all threads will eventually execute that piece of code. |
19:41 | So it's not like a single construct, where just one thread does the work; all of them will do it, but they will do it at different times — one thread in the critical section at a time. |
20:00 | Looking at this code example, I wanted to point out how this for loop is parallelized. In this particular case the iterations of the for loop are divided among the threads by hand: a given thread starts at whatever its thread ID is, and the next loop index it takes is the current index incremented by the number of threads. |
20:59 | So it is basically picking out iteration counts in a round-robin way. If you have a range of iteration indices and, say, ten threads, then thread number zero does iteration zero, then iteration ten, et cetera, and thread number one does iteration one, and the next one it does is eleven. |
21:47 | So it takes the range of iteration indices and assigns loop indices to threads in a round-robin way. This is a common way people sometimes decide to parallelize a loop and explicitly control which iteration indices get assigned to a particular thread. |
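A sketch of the manual round-robin partitioning plus a critical section, as described here; the array, its size and the accumulation are illustrative, not the slide's code.

```c
/* Sketch of hand-rolled round-robin work division with a critical section. */
#include <omp.h>
#include <stdio.h>
#define N 100

int main(void) {
    double a[N], sum = 0.0;
    for (int i = 0; i < N; i++) a[i] = 1.0;

    #pragma omp parallel shared(a, sum)
    {
        int id = omp_get_thread_num();
        int nthreads = omp_get_num_threads();
        /* thread id handles iterations id, id+nthreads, id+2*nthreads, ... */
        for (int i = id; i < N; i += nthreads) {
            #pragma omp critical
            sum += a[i];        /* only one thread at a time in here */
        }
    }
    printf("sum = %f\n", sum);   /* 100.0 */
    return 0;
}
```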
22:16 | Okay, so the next construct, which is similar to but still different from critical, is atomic. The difference is that atomic applies to variables, or memory locations, as opposed to code segments. |
22:54 | It assures that a particular variable is only updated by one thread at a time, so it prevents the kind of confusion and race conditions we discussed by sequencing the updates one at a time. |
23:15 | So that is the difference: critical is one thread at a time for a code segment; atomic is one thread at a time for a variable update. Any questions on the difference between critical and atomic? |
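A tiny sketch contrasting atomic with critical: here the protected thing is a single memory update, not a block of code.

```c
/* Sketch of atomic: only the single update of counter is serialized. */
#include <omp.h>
#include <stdio.h>

int main(void) {
    long counter = 0;
    #pragma omp parallel for
    for (int i = 0; i < 1000; i++) {
        #pragma omp atomic
        counter += 1;            /* one thread at a time for this update */
    }
    printf("counter = %ld\n", counter);  /* always 1000 */
    return 0;
}
```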
23:36 | Okay, the next one is the if clause, which can be used as shown in this particular example, where the segment of code is parallelized only under a certain condition. |
23:57 | If the condition evaluates to true, then the parallel region is generated and a team of threads is created for treating the code segment; if it is false, the region will not be created, which means the code will just be executed by one thread. |
24:30 | The reason for doing things like this is, as it says here and as I said before, that generating a parallel region and a team of threads has a certain amount of overhead, and if the work is small one does not really care to parallelize it; it is actually better to just go through the code sequentially. |
24:55 | And sometimes you don't necessarily know in advance what the end value n is, so you have this sort of data-dependent decision of whether to parallelize or not, using the if expression to control whether a parallel region with multiple threads should be generated. |
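A sketch of the if clause; the threshold of 1000 is an arbitrary placeholder for "enough work to be worth the overhead", and the scaling loop is illustrative.

```c
/* Sketch of the if clause on a parallel region: go parallel only when the
 * problem size makes the threading overhead worthwhile. */
#include <omp.h>
#include <stdio.h>

void scale(double *a, int n) {
    /* spawn a team only if n is large enough; otherwise run serially */
    #pragma omp parallel for if (n > 1000)
    for (int i = 0; i < n; i++)
        a[i] *= 2.0;
}

int main(void) {
    static double small[10], big[100000];
    small[0] = big[0] = 1.0;
    scale(small, 10);        /* executed by one thread */
    scale(big, 100000);      /* executed by a team of threads */
    printf("%g %g\n", small[0], big[0]);   /* 2 2 */
    return 0;
}
```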
25:17 | Okay, then there is also the nowait clause. Normally there is an implicit barrier at the end of the parallelized for loop, and nowait means progress is not held up by requiring that every thread reaches the end of the for loop before moving on. That can of course be very useful, but again it needs to be used carefully, depending on whether correctness requires things to be synchronized or not. |
26:15 | Here is just an example: the code on top basically creates the parallel region, with the array being shared and the thread ID being private for each thread; each thread gets its own ID, and different parts get updated by different threads, so there are no race conditions there. |
26:39 | But then, before moving on to using those results, one has a barrier, and then there is the next for loop; in that case one again would like to have everything done first. For the last for loop, highlighted in red, it is okay not to have the synchronization after the previous loop, because everything gets synchronized at the very end anyway, when the parallel region as a whole is done. |
27:23 | So it depends on what the logic of the code is and where the dependencies are, but it is again a way of avoiding, or cancelling, the default implicit synchronization. |
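A sketch of nowait on two independent loops, assuming (as in the red-highlighted case described above) that correctness does not require waiting between them.

```c
/* Sketch of nowait: threads finishing the first loop may start the second
 * immediately because the two loops touch independent arrays; the barrier
 * at the end of the parallel region still synchronizes everything. */
#include <omp.h>
#include <stdio.h>
#define N 1000

int main(void) {
    static double b[N], c[N];
    #pragma omp parallel
    {
        #pragma omp for nowait         /* no barrier after this loop */
        for (int i = 0; i < N; i++)
            b[i] = 2.0 * i;

        #pragma omp for                /* independent work, so waiting was unnecessary */
        for (int i = 0; i < N; i++)
            c[i] = 3.0 * i;
    }   /* implicit barrier at the end of the parallel region */
    printf("b[5]=%g c[5]=%g\n", b[5], c[5]);
    return 0;
}
```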
27:50 | All right, so the next thing I want to talk about is reduction. Reduction is a very common operation, as you probably know. This is a very simple example: you are summing up the elements of an array A and then computing their average. Sequentially, that's fine. |
28:17 | Then there is the kind of straightforward parallelization shown on this side. Let's see what happens: is there any problem with doing it this way? |
28:55 | Yes, the average variable is the issue. The problem stems from, I guess, two things, one being that the ave variable that is going to hold the average at the end is the global one. |
29:34 | In the parallel region, the different threads update ave by adding their respective A values; they each add different A values, because the loop indices are split among the threads, and that part is fine. |
30:05 | But the problem is that ave is global and shared, so if two threads happen to reach that statement at the same time, the update will not be correct: one gets a race condition by doing it this way. |
30:30 | So there are a few ways, and I think a few slides here, of working around this. The first one avoids the problem of a global shared accumulation variable by giving each thread a local accumulator: ave_local is private to each thread, so the summation into it is just fine, and one gets correct partial sums for each of the threads. |
31:32 | But then one needs to add up the partial sums generated by the different threads, and to avoid the race condition there, one can use the OpenMP critical construct, so that each thread in turn adds its partial sum to the global one. |
32:08 | So this now generates correct code. Any questions? And are there other suggestions for how to do this? |
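A sketch of the local-accumulator-plus-critical version described above, using the same 1..8 data as the verification run; names like ave_local follow the spirit of the slide, not necessarily its exact text.

```c
/* Sketch of the race-free average: private partial sums, combined under critical. */
#include <omp.h>
#include <stdio.h>
#define N 8

int main(void) {
    double a[N], ave = 0.0;
    for (int i = 0; i < N; i++) a[i] = i + 1;      /* 1..8 as in the lecture run */

    #pragma omp parallel shared(a, ave)
    {
        double ave_local = 0.0;                    /* private partial sum */
        #pragma omp for
        for (int i = 0; i < N; i++)
            ave_local += a[i];                     /* no race: ave_local is private */

        #pragma omp critical
        ave += ave_local;                          /* one thread at a time */
    }
    ave /= N;
    printf("average = %f\n", ave);                 /* 36 / 8 = 4.5 */
    return 0;
}
```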
32:25 | So here I have an example of what happens in this case, something I ran just to verify it in a simple setting, and you can see the local partial sums. The thread number is in the right-hand column, and A was basically running from 1 to 8 — eight elements in the array. |
33:05 | Thread zero takes the first element, so its local sum becomes one; then it also takes the second index, the second element of A, which is two, and adds it, so it gets three. |
33:47 | We can jump to the last thread, number three, which has the last two iterations of the loop: the second-to-last element of A, 7, and then the last one, 8, so its local sum is 15. |
34:07 | Then what happens in the critical region is that the final values of the locals — 3, 7, 11 and 15 — get added up. It doesn't show the intermediate step here, but the total is 36: 3 plus 7 is 10, plus 11 is 21, plus 15 is 36. And there were eight elements, so it is 36 divided by 8, which is 4.5. So it ended up correct. |
34:59 | So what is a slight flip side of this? I'll come to that in a moment: adding up the partial sums from the threads. Instead of the critical statement one could also use the atomic I mentioned before, because in this case it is just a single variable being updated; so instead of critical one could have used atomic for this particular example. But that doesn't change the fact that adding the partial sums is sequential — it is a sequential operation. |
35:51 | Was there a question in the chat? Yes — nothing proceeds past that point until all the threads are done with the critical section. Good question. |
36:31 | So that was the critical and atomic version. What I said I would talk about next is reduction, using the reduction clause, which manages both the parallelism among the threads — the work assignment — and the synchronization and correctness, and it does not force the combination step to be sequential; it depends on how the reduction is implemented, as long as the result is correct. |
37:32 | But how fast can it be? One talks about algorithms in terms of parallel steps in this case. So how many steps does it take, at a minimum, if you have as many threads as you want, to add up, say, eight numbers? |
38:29 | In parallel algorithms, the least number of steps it takes to add all the elements of an array comes from a sequence of pairwise additions: you have a tree, with the elements to be added as the leaves, and you combine them pairwise going up, so the minimum number of parallel steps is the height of the tree, which is about log N. |
39:19 | That is the idea behind it: by doing the pairwise additions in the right way, going up the tree from the leaves, the partial results only need to be synchronized where the left and right branches join, and different sides of the tree can operate in parallel. |
39:52 | That is the reason for the inclusion of the reduction clause: it removes the sequential part that was in the critical or atomic piece of the code for this reduction operation. |
40:09 | So here is what it does. In this case you just have this additional clause on the parallel for construct that says this is going to be a reduction loop, and that the final result should be in the variable ave; in this case it is a plus reduction, we add up the values. |
40:36 | With this, one doesn't have to declare local variables for each of the threads, and one doesn't need to worry about adding up the local values of the reduction; the implementation takes care of all of that. |
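The same average computed with the reduction clause, as just described; the data again follows the 1..8 example.

```c
/* Sketch of the reduction clause: no hand-written local sums or critical
 * section; the runtime combines the per-thread copies of ave. */
#include <omp.h>
#include <stdio.h>
#define N 8

int main(void) {
    double a[N], ave = 0.0;
    for (int i = 0; i < N; i++) a[i] = i + 1;

    #pragma omp parallel for reduction(+:ave)
    for (int i = 0; i < N; i++)
        ave += a[i];            /* each thread gets a private ave initialized to 0 */

    ave /= N;
    printf("average = %f\n", ave);   /* 4.5 again */
    return 0;
}
```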
41:01 | Any questions on that? So here are the typical reduction operators, and they have defined initial values: if you add or subtract, the value assigned to the per-thread summation variable is initialized to zero before anything is accumulated into it; you can also have multiplication as the reduction operator, and in that case it starts with one; and there are similar rules for the logical operators as to what the initial values are. |
41:48 | This is available in OpenMP and in most parallel languages or libraries, because reduction is an important operation in so many applications that an efficient implementation is sometimes performance critical. |
42:08 | And this is where I would say something about the flush operation. OpenMP has what is known as a relaxed-consistency shared memory model, so a thread can work with a local view of a variable for a while; but when you come to a synchronization point, the runtime makes sure that the state of shared variables is consistent among the threads, and one can also enforce it with an explicit flush. |
42:46 | So here is where flush happens: some of it is implicit, and you can also do it explicitly to make sure a thread sees the value it should expect to see. But you don't necessarily need that for correctness everywhere in a program, so flushing all the time is overkill and slows things down. |
43:16 | Okay, looking at the time, this is a good point to start the demo, and if there is time left we will continue afterwards. I'll stop sharing my screen then. |
44:09 | I'm just going to try to keep it interactive and answer questions as they come. Hmm — my session on Stampede ended for some reason. |
44:41 | Okay, I can see your screen now. Yeah, apparently, as you said, the session got terminated. |
44:52 | Well, that is a lesson about the tools; hopefully it won't take long to get access again. You said Stampede is quick? Yes, and you can see it gives me access again very quickly. |
45:29 | So, to start off: by now we all know what an OpenMP program looks like. You include the omp.h header file in your program, and once you have that in your source file you can use all the OpenMP constructs we've been seeing throughout the lectures. |
45:58 | Before getting into what the program does: if you want to compile your code with OpenMP using the gcc compiler, you don't have to do anything special; you just add an extra flag to your compilation command, and otherwise it is just like compiling any other program. |
46:19 | If you are using the Intel compiler, icc, then you just switch that flag from -fopenmp to -qopenmp; that's the only difference between using gcc and Intel. Once you do that, it should produce the executable as expected. |
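A minimal program of the kind used in the demo; the file name and the exact print statements are illustrative. The compile lines follow the gcc/Intel flags just mentioned.

```c
/* hello_omp.c -- minimal OpenMP program (illustrative).
 * Compile with:
 *   gcc -fopenmp hello_omp.c -o hello_omp      (GNU)
 *   icc -qopenmp hello_omp.c -o hello_omp      (Intel)  */
#include <omp.h>
#include <stdio.h>

int main(void) {
    printf("serial region: %d thread(s)\n", omp_get_num_threads());  /* prints 1 */

    #pragma omp parallel
    {
        /* reported team size depends on OMP_NUM_THREADS / omp_set_num_threads */
        if (omp_get_thread_num() == 0)
            printf("parallel region: %d thread(s)\n", omp_get_num_threads());
    }
    return 0;
}
```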
46:40 | All right. So, first question: in this program, how many threads will be reported by this omp_get_num_threads call in the serial region? Yes — in the serial region it will report one thread. |
47:18 | And how many threads will it report in the parallel region if I do not explicitly set the number of threads? |
47:42 | Right — it depends on the environment variable called OMP_NUM_THREADS, and its value depends on how the OpenMP runtime has been set up. On Stampede, if you check the value of OMP_NUM_THREADS, it is by default one, I believe, or it is 28, which corresponds to the number of cores on the processors of a node. |
48:11 | Even if you ask for a smaller number of processors on a node, it will still be that default value. Can you request more threads than there are cores? Yes, you can; in that case multiple threads run on the same core. But by default it is whatever this environment variable is set to. |
49:03 | What happens if I don't set it at all? In the slide deck I comment on this later: there is a slide that details exactly how the runtime figures out how many threads to create. So you can look in the slide deck, a few slides ahead of where I stopped, and there you will find the OpenMP spec rules for how the thread count is determined. |
49:46 | All right, so the next question was how many threads you will get if you explicitly ask for them. There was a question in the chat about this — could you explain what you mean? |
50:11 | I'm not sure it matters. Maybe I heard it wrong — you said that if the number of requested threads is more than the number of cores, then... but maybe I just heard it wrong. |
50:31 | Yes — the slide I mentioned details how the runtime decides, depending on a number of conditions, what number of threads is actually given to a request of "give me X threads", but it should not assign more threads than are available. |
51:00 | When you set a number of threads using this omp_set_num_threads call, that is a request to the operating system, not a guarantee that it will give you that number of threads. On your private laptop or a dedicated machine you may get the same number, but on shared machines that is not guaranteed. |
51:24 | So it is always a good idea to check, once you are in the parallel region, how many threads you actually got for your execution, and it is also a good idea to write code that does not depend on the number of threads you got, so that it will still be able to give you the correct answer. |
51:47 | Okay — and when the environment variable is set and you also call omp_set_num_threads? Yes, that's what I was going to say. The environment variable sets what is called, in the OpenMP runtime, an internal control variable, an ICV. |
52:04 | The environment variable sets its value, but when you call omp_set_num_threads, that function call updates the value of that internal control variable, so whatever you ask for via omp_set_num_threads takes priority over the environment variable. |
52:32 | You are requesting it, right — not telling, requesting? Yes, just keep that in mind. |
52:44 | And the fourth question is, I think, pretty clear; we already discussed it: if you ask for eight threads, you may get fewer — the OS is not always going to give you the number of threads you ask for. |
52:58 | I think if I just run it now — because I did not set the number of threads — it gives me one thread outside the parallel region and one thread inside the parallel region as well. |
53:15 | If I quickly uncomment this part here and recompile, then now the operating system gave you the threads and every thread executed this same piece of code — the code gets replicated across the threads. |
53:39 | On the clusters, the program shares the node, so you don't always get what you asked for; it depends on how the operating system schedules things. Correct — it depends on the other person's program, on the process that is using most of the resources, and on how the operating system sets the priority of the different processes it gets requests from. So it may or may not give you the number of threads you ask for, depending on the resources it has. |
54:31 | All right. So, assuming that you get the number of threads you ask for from the OS, how many total threads will be in this program? Four — okay, right. And how many iterations will be assigned per thread? |
55:16 | If the OS gave you the four threads, it will not take them away from you once they are created. Yes — each thread will get four iterations, correct, because this is static scheduling by default. With static scheduling, OpenMP will try to evenly distribute all the iterations over all the threads. |
55:50 | So if I simply run this, I get an output like this. Again, notice that there is no implicit order in the execution of the iterations or of the threads, but one thing is for sure: each thread got four iterations. |
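A sketch consistent with the question above. The iteration count of 16 is an assumption that matches the "four iterations per thread" answer for four threads; the demo's actual loop may differ.

```c
/* Sketch of the demo loop: default (static) schedule, so with 4 threads and
 * 16 iterations each thread ends up with 4 of them, in no fixed print order. */
#include <omp.h>
#include <stdio.h>

int main(void) {
    omp_set_num_threads(4);
    #pragma omp parallel for
    for (int i = 0; i < 16; i++)
        printf("iteration %2d done by thread %d\n", i, omp_get_thread_num());
    return 0;
}
```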
56:16 | All right. Now let's again assume the OS gave you the number of threads that you requested. How many threads were created in this program? Eight. And how many iterations per thread? |
57:10 | Someone says zero or one in the chat — that is the correct answer. If you ask for more threads than you have work available for, OpenMP will again try to divide the work evenly, and the threads that do not get any work will just sit idle — but they will be created, that's for sure, if the OS gave you that many threads. |
57:35 | So again, if you run it, each of the threads with work got one iteration, and the other threads pretty much didn't do anything. |
57:49 | Right, so that is more expensive. Yes — if you overshoot how many threads you ask for relative to your work, if you are not aware of the number of loop iterations, you may be hogging unnecessary resources that your program never ends up using. |
58:15 | All right. Take a few seconds to look at this program. The question is: can you expect a correct output from this parallelized loop section? |
58:29 | What this code basically does is: it has two arrays, A and B; you initialize the A array with some values; then we have four threads and eight elements for A and B, and we run this loop with four threads, where each element of B equals the element of A plus the previous element of B: b[i] = a[i] + b[i-1]. Can you expect a correct output from this if you parallelize this for loop? Why or why not? |
59:38 | Yes — what you are seeing here is what we all know as a data dependency, because a thread computing some element of B may ask for an element of B that is still being worked on by some other thread. So there is a data dependency across the loop iterations — and, one way of saying it, across the threads as well. |
60:07 | Because B is shared, different threads may be reading and updating the values at the same time, and if you run this program there is no guarantee that you will get the right output. In this run all the iterations happened to execute in sequence, so we got the right values, but in some other execution the operations may be interleaved — there is no implicit order — so you may get an incorrect output. |
60:39 | So you need to be careful about whether there are any data dependencies, or any other kind of dependencies, in the program you are trying to parallelize. There was also a correct answer given in the chat, in case you didn't see it. |
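A sketch of the loop with the loop-carried dependence just discussed (b[i] needs b[i-1]); the initial values are made up.

```c
/* Sketch of the dependent loop: parallelizing it naively is incorrect. */
#include <omp.h>
#include <stdio.h>
#define N 8

int main(void) {
    double a[N], b[N];
    for (int i = 0; i < N; i++) { a[i] = 1.0; b[i] = 0.0; }

    omp_set_num_threads(4);
    #pragma omp parallel for            /* WRONG: b[i-1] may not be ready yet */
    for (int i = 1; i < N; i++)
        b[i] = a[i] + b[i - 1];

    /* Serially the answer would be b[i] = i, but with 4 threads each chunk
     * may start from a stale b[i-1], so the output is not guaranteed. */
    for (int i = 0; i < N; i++) printf("%g ", b[i]);
    printf("\n");
    return 0;
}
```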
61:01 | All right, this is a bit of a tricky one. We have set eight threads at the top here; then we have a parallel region that spawns two threads and declares the variable outer, which is an integer, as private for this section. |
61:24 | For this outer parallel region we get the thread ID using omp_get_thread_num and print it. Then inside that parallel region we have another parallel region that spawns two threads. |
61:43 | I don't think we have discussed this clause yet, but if you add the num_threads clause to an omp parallel or parallel for, then for that particular parallel section — even if you set, say, eight threads globally and you need only two — you can specify that you want only two threads for this particular parallel region. |
62:04 | The important point here is that the number of threads set on a parallel construct has the highest priority; it overrides the environment variable setting. |
62:28 | So we have nested parallel regions in this case. The first question is: how many threads are in the outer parallel region? Two. And how many total threads do we have in the inner parallel regions? There is one answer in the chat: four. Correct. |
63:07 | Now the tricky one: based on questions one and two, how many printouts do you think will be in the output? We have two threads in the outer region and four in the inner regions. Eight? There is also an answer in the chat. Well, I'm not sure whether it's correct or not, so let's just run it. |
63:48 | So what did we get? We got two printouts from the outer region, but only one inner printout per outer thread — so only two printouts from the inner regions. |
64:12 | The reason for that is that there is another environment variable in OpenMP called OMP_NESTED. Right now it doesn't have any value, so by default it is false. If you want to have nested regions, you need to set this particular variable to true. |
64:48 | Once you set that and run it again, you get two printouts for the two outer threads, and each of them had two inner threads, so we get four printouts for the inner regions. Does that make sense? |
65:10 | All right. Now the final question: in total, how many threads do we have in this program? Six? No — there are four threads in this program. The reason is that OpenMP uses something called thread pooling. |
65:35 | When you ask for two threads here, it does spawn two threads — that's the expected behavior. But when both threads reach this inner region, they each ask for two threads, so for the inner regions we now need four threads in total: for the outer region we needed two, and each of the two outer threads needed a team of two for its inner region. |
66:15 | If you think about it in a simple way, you needed two threads for the outer region and four for the inner regions, so in total you would expect six threads. But because OpenMP uses thread pooling, it reuses the two threads that were already running in the outer region; it does not spawn two extra threads, it uses the ones it spawned previously for the outer region as well. |
66:42 | So in total, rather than having six threads, you end up with only four, and you save — maybe not performance exactly, but at least the overhead of spawning two more threads. |
66:57 | overhead of uh spawning two more Yeah, yeah, yeah, I |
|
|
67:05 | also question in the chat uh was issue in open mp necessary is set |
|
|
67:13 | false the outer threats. Do not the inter nested loops. No, |
|
|
67:18 | do enter the nested region but they not spawn extra threads. They still |
|
|
67:25 | run with a single thread. So inner parallel region is basically ignored as |
|
|
67:33 | and considered just as a sequential section the outer parallel region. Yeah. |
|
|
67:43 | . Alright. So yes, so this because in your assignment you are |
|
|
67:48 | to paralyze uh nested loops at the mr loops. So for that you |
|
|
67:54 | need to set this particular variable to . Otherwise you may not get you |
|
|
67:59 | not get uh master luke nested regions be working? Mhm. All |
|
|
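A sketch of the nested-parallelism demo as described. The lecture set OMP_NESTED in the environment; omp_set_nested(1) is assumed here as the equivalent runtime call, and the prints are illustrative.

```c
/* Sketch of nested parallel regions with num_threads clauses. */
#include <omp.h>
#include <stdio.h>

int main(void) {
    omp_set_nested(1);                 /* without this the inner region runs with 1 thread */
    omp_set_num_threads(8);            /* overridden below by the num_threads clauses */

    #pragma omp parallel num_threads(2)
    {
        int outer = omp_get_thread_num();
        printf("outer thread %d\n", outer);           /* two printouts */

        #pragma omp parallel num_threads(2)
        {
            /* inner IDs restart at 0 within each inner team */
            printf("  outer %d / inner %d\n", outer, omp_get_thread_num());  /* four printouts */
        }
    }
    return 0;
}
```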
68:08 | That was it for the basics of the parallel regions. Now just a couple of examples with the critical region. |
68:27 | All right, for this one you can tell me: can you expect the correct result from this program if you have the critical construct, and if you do not have the critical construct? What we are doing is just trying to find the maximum value in the array A. What do you think — if I remove the critical construct, will we get the right output? |
69:19 | Well, max is a shared variable — exactly. The problem is that the max variable is, as mentioned, shared, and there is a race condition on it. If I remove critical, every thread will be trying to update it with whatever max value it currently has, so you may not get the correct output. |
69:53 | When you add critical, only one thread can enter this critical section at a given time, so you are guaranteed that each thread's compare-and-update of the max variable is applied properly. |
70:10 | A sample output may look like this: with, say, four elements in the array, without critical it may print 12, and 12 is not the max value of those four; but if you run it with the critical, you are guaranteed to get the correct output. |
70:39 | The sleep call here is just something I added to make sure there is some delay between the threads' executions; even without it, you can imagine some large piece of code in its place — it is there to simulate something like that. |
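A sketch of the max-with-critical demo; the array contents are made up, and the sleep from the demo is omitted.

```c
/* Sketch of finding the maximum with a critical section: the compare and the
 * store happen as one unit, so a smaller value can never overwrite a larger one. */
#include <omp.h>
#include <stdio.h>
#define N 8

int main(void) {
    double a[N] = {3, 12, 7, 1, 9, 4, 11, 2};
    double maxval = a[0];

    #pragma omp parallel for shared(maxval)
    for (int i = 0; i < N; i++) {
        #pragma omp critical
        {
            if (a[i] > maxval)      /* without critical, this check can race */
                maxval = a[i];
        }
    }
    printf("max = %g\n", maxval);   /* 12, every run */
    return 0;
}
```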
71:10 | All right. I think everyone already knows how the scope of the variables works, so here are the questions: what happens if we use shared, private or firstprivate for the variable i in this program? In this program every thread just assigns a value plus its thread ID to the variable i. So what would you expect as the output for i with shared, private, or firstprivate? |
72:27 | Take your time and think about it. (Dr. Johnson, let me know if I am going over time.) No, it's good — go through what you have and I'll let you know when time is up; you have about five more minutes. Okay, sure. |
72:52 | All right, for now let me just show what the sample output may look like, unless you want to guess. Yes, that's correct. |
73:02 | So what happens: if we make i shared, every thread works with the same shared variable, and whichever thread performs the last update is the one whose value you see at the end. Everyone is trying to update it, but in the end whoever ran last sets the final value of i. Here thread six ran last, and it wrote 1000 plus 6, so 1006 is what you get in the end. |
73:54 | If we make it private, then none of the threads sees the initial value 10; they get either some garbage value or, as in this case, the OpenMP runtime decided to initialize it to zero, so everyone starts from zero inside the region. And in the end, because it is a private variable, the updates do not carry outside the parallel region, so you still get 10 after the region. |
74:32 | With firstprivate it is pretty much the same, but now every thread sees the initial value of i, because with firstprivate that value gets carried into the parallel region: rather than zero, you get 10 inside the region. But outside the parallel region you still get 10, because the final value is not carried out of the region. |
74:53 | So this is just a simple example to show the scoping of the variables; what you see depends on how you scope them. |
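A sketch of the scoping demo: the same update under firstprivate, with comments noting what shared and private would do instead. The "1000 + thread ID" update follows the run described above; the rest is illustrative.

```c
/* Sketch of shared vs. private vs. firstprivate for the variable i. */
#include <omp.h>
#include <stdio.h>

int main(void) {
    int i = 10;

    #pragma omp parallel firstprivate(i)   /* try also: shared(i) or private(i) */
    {
        int tid = omp_get_thread_num();
        /* firstprivate: every thread starts from i == 10
         * private:      each thread's i starts unspecified (often 0)
         * shared:       all threads see the single i == 10            */
        printf("thread %d starts with i = %d\n", tid, i);
        i = 1000 + tid;                     /* with shared(i), the last writer wins */
    }

    /* private/firstprivate copies are discarded, so i is still 10 here;
     * with shared(i) it would hold the last thread's value, e.g. 1006 */
    printf("after the region i = %d\n", i);
    return 0;
}
```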
75:15 | There is a class coming into the room, so I think I'll stop with this example — it was going to use the schedule clause. |
75:45 | There is a question: since we were on nested loops, do the threads also have the right thread IDs, local to the regions? Yes — that was part of what I forgot to mention during that example. Inside the nested parallel region the thread IDs again start from zero, and they are private to the nested region. So you would not see the IDs in an inner region run from zero to three; for each outer thread's inner region you get zero and one — for outer zero you get inner zero and one, and the same for outer one, you get zero and then one. |
76:50 | And that clause, I think, follows the hierarchy of the nested regions, so yes, it will be carried into the nested region. |
77:19 | I'm sorry I forgot to mention this when I was talking about nesting; it is an important point. One has to keep track of where you are when you get the thread ID: the ID you get is local, so it depends also on which outer thread it is related to. One has to keep track of the thread numbering when things are nested, if your code depends on that for correctness. |
78:02 | So, our time is pretty much up for what we could finish today; we'll continue with the demos next time. Sure. |
78:16 | And I will resume from where I stopped in the slides next time. One thing I wanted to ask, since I will be teaching this way also next week: whether this format is fine, or you would rather have online lectures. Okay — that is, I guess, partially my hope and expectation; even though it's not perfect, I think it's better than purely online, personally, so I appreciate that. |
79:03 | It can take some more time before people come in, but I think we have at least a couple of minutes if anyone has questions on the slides I didn't get to. |
79:22 | They also have a bit of explanation of the things demoed today, so we have a backup both in terms of the video and in terms of the slides that are on Blackboard for today. If not, I will just stop the recording at this point. |