© Distribution of this video is restricted by its owner
00:04 | What? Yeah. So today I continue to talk about OpenMP, and I will leave room for the TA to continue the demo of some features of OpenMP. I didn't cover all the slides I had the last lecture, so I will start with some of the slides that were left from the last lecture and then add some points about OpenMP.

00:45 | So, there was, I think, a bit of discussion after the last examples about work scheduling. I'll talk about that first, and then some more about the runtime functions, and then I'll talk through a simple example in OpenMP and hopefully get into a bit of something known as tasks, which is a more flexible way of parallelizing using OpenMP.

01:25 | All right, work scheduling. I showed a few slides about scheduling, but this is in terms of parallel regions: how to divide up the work among threads in a parallel region.
01:41 | It's mostly thought of in terms of loop constructs with a number of iterations, and in the example I showed earlier, I think it was the spectrum application maybe, there were 1000 iterations and there were four threads, so each thread got a block of 250. That's kind of the default way of doing business. But I can control what's known as the chunk size.

02:17 | The amount of work, in the context of a loop, that is assigned to a thread — that's known as the chunk. So the chunk size is just the number of iterations in the case of a loop, or the amount of the kind of atomic piece of work, the sort of block of work, that is assigned to a thread. And one can introduce a static or a dynamic assignment of the chunks to the threads.
02:55 | If there are more chunks than there are threads, then in the static case the chunks, or blocks of iterations, are just assigned in a kind of round-robin manner. Whereas in the dynamic case, whichever thread gets the job done first gets to grab a new chunk. So basically the chunks are kind of placed on a queue and the threads pick off the next chunk in the queue — and I have some pictures to give you some more intuition for how it actually works.

03:27 | Then there is also another version known as guided, in which case the chunk size is not fixed, but it tends to start out with the big chunks, and one can bound the chunks. Exactly how depends on how the compiler writers decided to implement the guided procedure, but the chunk sizes tend to decrease from the largest chunk as the remaining work gets subdivided.

04:11 | And then one can also let the runtime system decide what is a good chunk size. So the later options leave more work to runtime; they are more flexible, with less sort of control from the user.

04:35 | You kind of use those options when things may be highly data dependent. If the partitioning is regular and there are no conditionals in the loop, then the work is known at compile time, particularly if the loop size is known. But when there are data dependencies — when that's generally not true — then you may want to choose the more flexible scheduling and hope that the runtime system has good guidance for when it knows how to assign the chunks.
05:18 | So there's kind of a little rule of thumb in terms of when to use each. As I said, when things are known at compile time, the static option is probably the best, because then the work, or the effort, of dividing up the iterations is done at compile time, so at run time things can be very cheap. Whereas when there starts to be unpredictability in the amount of work in a loop, the dynamic scheme ends up being the better version.

05:58 | And the more one goes down the list, from the top towards guided and towards auto, the more work the runtime system has to do — and sometimes state has to be kept around. In, for instance, the auto option, what one can see from previous executions, or how the work on the various chunks has gone during the run, is used to try to make some predictions of what a good allocation of work to threads might be. Okay. So here are some graphical illustrations of these scheduling concepts.
06:35 | So at the top is the static case, and as you can see, the chunks are fixed in size, and the threads, in a round-robin way, grab the chunks — that's how the work gets divided.

07:01 | In the dynamic case, which thread gets which chunk is not known ahead of time. It depends on how quickly the work is finished, and the threads kind of get started at different times. So in this case thread number three, I guess, was ready to grab a chunk first, then thread two, and then thread one. You can also see that threads three and four, when they got a chunk, finished it very quickly and went back to grab a new chunk. And in fact, in this case, the fourth thread ended up, in the end, with three chunks.
07:48 | In terms of guided, there are a couple of options for how things may have been implemented, and as you can see, it starts out with the biggest chunks, and then, in the next round of assignments, the chunk sizes have been reduced, and they are successively reduced. The idea is to try to reduce the overhead of scheduling by starting with large chunks and then reducing the chunk size toward the end, so there will only be a few rounds of grabbing chunks.
08:28 | Then I think there are a couple of other illustrations. In this case, I don't know exactly what the code was, but as an illustration of what happened in a particular case: using the static scheduling, something that's not uncommon happened — the first thread may be loaded in a particular way and ended up, in that case, getting considerably more work than what most of the other threads are doing. So in this case, in the four-thread case, it's almost a factor of two difference between the thread with the maximum amount of work and the one with the least amount of work.

09:23 | On the other hand, the same code using dynamic scheduling ended up actually very nicely load balanced between the threads. So again, this is, you know, how the work is assigned to the threads — how to balance the work using the scheduling features in OpenMP.

10:01 | And here is just another graph of the same kind. In this case it is using chunks of five for dynamic, and I think — it's not time; it's basically showing how the iterations in the loop are divided up among the threads. So, any questions on the scheduling part? Okay, now on to runtime functions.
10:44 | I had used some of them in other examples before, and the TA used them in the demo. So, just a few comments. I wanted to comment particularly on omp_get_max_threads — and I'll talk a little bit more about how threads are assigned on the next slide or two. It's a little bit tricky. In the case of get-max-threads: if nothing has been requested in terms of threads for the region, then it actually returns the maximum number of threads available for a region. On the other hand, if, before one gets to this runtime function, one has in one way or another specified — or properly, requested — a particular number of threads, then it takes its value from that specification, the request for a number of threads. So it doesn't really tell you what happened: if get-max-threads is called after a set-num-threads has been issued, then what it returns is not necessarily the maximum available — it may also be the number of threads requested, if that happened before the runtime function is called.

12:42 | And then, I think, used in an example, there is also omp_get_num_threads, which is the number of threads that has been assigned to a parallel region. And in other examples we also got the thread id by using omp_get_thread_num, which gives the id for a particular thread.
13:12 | These are runtime query functions. And we discussed last week that there are three different ways one can request threads. One is to set what's known as the environment variable, OMP_NUM_THREADS, but it has the lowest priority. Then the runtime function omp_set_num_threads takes priority over the environment variable. And the num_threads clause on a parallel directive, a statement — that's the highest priority, so it overrides the other two if they're not the same.

13:49 | And then, as I've said a few times, a certain number of threads is not guaranteed: the number of threads you specify is a request. And the next slide is a busy one.
14:10 | I encourage you to take a look at it, and I'll try to make some notes on it, because it's probably not feasible to see on the screen — but maybe it is on the projector screen. I think I'll start from the bottom, in fact. So there is an if-else close to the bottom of this slide. The assignment can be kind of direct and set the number of threads for the region, or it can be somewhat flexible. If this variable — the dynamic-threads variable — is false, that means you want a fixed number of threads for the execution of a parallel region, and as I said, that's the last case of the if on this slide.

15:12 | So it says: if I ask for more threads than what's available — and I'll come back to "threads available" in a bit — then it's up to whoever implemented the runtime system what you are going to get. But if you request fewer threads than are available, then you get what you asked for.

15:46 | Now, on the other hand, if it's okay that the number of threads may vary, then if you ask for fewer than are available, you may get what's requested, but you may also get fewer. And if you ask for more threads than available, in the dynamic-threads scenario you get a number of threads that is at most as many as the number of threads available.

16:20 | Now, the number of threads available — it's a function of — it's kind of slightly above the middle of the slide. So here's how "available" is set: it is the maximum number of threads that the hardware can support, minus the threads that have already been used — essentially the free capability of the hardware. What the hardware is capable of as a maximum, minus what has already been taken up by thread assignments to other regions in the system.

17:11 | So it's a somewhat elaborate procedure used by the runtime system to set the number of threads, and again, what we try to stress is that it is a request — not a guarantee of what you get.

17:38 | Any questions on runtime functions or thread assignments to parallel regions? Okay.
17:52 | So here is just a simple example, I think similar to one we already talked about, so I will not comment further on it. And here's an example, again, illustrating that you turn the dynamic adjustment off — so that means now it's a fixed number of threads — and in this case the code asked for as many threads as the system can offer in terms of the number of processors.

18:26 | Then — and that's the crucial thing — we get the id for the threads, and then the one with id zero sets the global variable num_threads. Basically, I think I said this last lecture, but in this case it's a single thread, thread 0, that updates it. This is a good way of actually making certain — which may be useful, hopefully not for correctness, but in terms of understanding the performance of the code — that you check what number of threads was actually assigned, because, as I said, it may not be what was requested.

19:20 | And the assignments have suggested doing this kind of procedure: you request a number of threads, and then you check what the runtime system actually gave you.
19:40 | Just a quick comment on the distinction between a thread, which is purely a logical entity, and the master thread, which is the sort of continuation of the parent thread in the parallel region — this we talked about last time in the demo. Note that thread ids are local to a parallel region.

20:12 | This is just, given the complexity of OpenMP in its current version, a good cheat sheet, so to speak, of the most commonly used OpenMP functions that one can get a long way with. There are about 21 functions on this slide, and they are at the core of what one would need to know to do a decent job with an OpenMP program.

20:42 | Now I'm going to talk about a simple example. Any questions first? Okay. So, this example is called computing pi.
21:12 | And it turns out there is a simple function that, if one integrates it on the interval from 0 to 1, gives exactly pi. On a computer we need to do a discrete version of this integral; we don't evaluate it analytically. A simple numerical approximation of the integral is this kind of well-known approach of basically approximating, because the integral is the area under the curve.

21:54 | The curve is defined by the function four divided by one plus x squared, and the area under that curve between zero and one turns out to be pi. So to approximate the area under the curve, one can then use this collection of rectangles — the lighter blue bars. And of course it's not exact, but it's a decent approximation, in particular if the rectangles are narrow in the x direction.
22:34 | So I'm going to use this collection of rectangles, and the discrete version of the integral — here is the sequential code. In this case the interval between zero and one was divided into 100,000 rectangles, and the code picks a particular point for each of these rectangles. The left coordinate of rectangle i is i times the step, and it takes the midpoint between, you know, i and i plus one — that's the midpoint.

23:29 | Then it evaluates the integrand: it computes the height of the rectangle at this midpoint of the rectangle, and it sums up all the rectangle contributions. And since the rectangles are all of the same width, it does not sum up the areas — it sums up all the heights. Then, since they have the same width, at the end it does one multiplication with the step size, the width of the rectangles. So this is just adding up the heights of the rectangles. So this is the sequential code.
24:20 | So now to the OpenMP version. Well — any questions first? Okay. So in this case this code was hard-wired to use two threads to work on the rectangles, and as we can see, a bunch of these variables are global, among them the number of threads. One also generates, basically, an array of sum values, and there are as many entries as threads — basically each thread gets its own slot for its partial sum of the heights.

25:03 | In the two-thread case you can see the blue and reddish rectangles: one thread gets the blue ones and one thread gets the red ones — we'll talk more about that. But since each thread has its own sum slot, there are really no race conditions to worry about; the threads are not trying to update the very same sum variable.

25:31 | Then the number of threads requested was two — the two threads that get to do the work — and then we have the parallel region here for the OpenMP part. In this case there are the variables id and nthrds, without the "ea". So there are two variables which, when I say them, sound the same, but pay attention: there are two different "nthreads" in this code, and the one without the "ea", nthrds, is local to each thread in the parallel region. The usual thing: each thread gets its own id, and it also sets this local, private nthrds by reading the number of threads that was assigned. But then we have this global variable, nthreads, which was defined up here, and it gets updated by only one of the threads, to avoid race conditions.

26:51 | Then we have this loop that each one of the threads gets to work on. There is no parallel-for here, so basically all the code here is executed by all the threads — in this case two threads. And since, again, they each have their own partial-sum variable here, there is no race condition in updating the partial sums either. In this case — and this was also used in a previous example — each thread increments the loop index by the total number of threads, so it works, in a round-robin fashion, over the total range of loop iterations. And then it's the same kind of code: get the height of the respective rectangle.

27:55 | So after the loop is done, we have as many partial sums as our number of threads. Then we need to add them up, and in this case that is done outside the parallel region, which goes from here to here. So this is basically sequential code; there is no issue here in terms of race conditions, and the same thing goes for updating pi.

28:32 | All right. So any questions on this first OpenMP version versus the serial code?
28:48 | Okay, so now let's see what happens. Obviously the default number of threads is two, and then there is a table that has 1, 2, 3 and 4 threads — the code was obviously recompiled for the different thread assignments, in addition, to run it for three and four. And then we have the execution time in the right column.

29:26 | And, in comments to that: going from 1 to 2 threads, there was an obviously good reduction in the execution time, but then it didn't really pan out too well. So — we have the code here — any suggestion why it doesn't scale beyond two threads in this case?

30:08 | I wasn't very clear on my question. I know the suggestion was that if you're not getting the same number of iterations, it is a little bit trickier. So it was a good thought, but the race condition, right, was avoided, because each thread has its own sum slot. So there is no competition, and everything is perfectly load balanced in this case: if the number of iterations is a multiple of the number of threads — and in this case there were a few threads and 100,000 iterations — even if one thread got one more iteration than the others, that wouldn't really matter, since there are so few threads; they are basically even. With four threads they have, what, 25,000 iterations each, and it's actually perfectly even. So what is the tricky part? Yes?
31:28 | So in this case, with four threads: because we are using an array — the sum array — to store the one partial sum in the array for each one of the threads, that means that the four elements of the sum array are in the same cache line.

31:58 | At the level-one cache — one of the things to know in order to understand what gets updated: different cores don't share the level-one cache, and when threads are in different cores, the cache line basically has to, in a way, move, or be copied, between the different L1 caches.

32:35 | So the cache line gets invalidated when thread zero updates sum of zero. Then, when another thread, on a different core, wants to update its element of the same cache line, it needs to grab it — copy it from, in this case, the first core. That's why, every time a thread on a different core wants to do something, the line has to come from the one that happens to have the current valid version; things need to basically ping-pong back and forth all the time.

33:18 | So this is what's known as false sharing, in the sense that there is no real sharing of any data item, but the data items happen to be in the same cache line. So this is the notion of false sharing, and that's why this notion of cache lines matters. And of course a cache line — you know, level 2 or 3 may be shared among a few cores, but not level one. So knowing the computer architecture — in terms of which caches are private to each core, and also the size of the cache line and how data is allocated to cache lines — is important when it comes to understanding performance.

34:31 | Any questions on that? So that was part of the reason for, early on, talking about processor architecture, typical sizes of caches and cache line sizes. I didn't talk too much about cache coherence, but this is part of what the system needs to do. All right.
35:04 | So how does one avoid this? There are a few ways. The most naive and straightforward way is to keep this kind of sum array but pad it: basically add a pad field to the array, so that it fills out the cache line with things we don't care about. Then the different sum variables that we are interested in end up in distinct cache lines, and in that case there won't be any false sharing anymore.

35:53 | But it's kind of an ugly way of doing it, because it means that your code will depend on the particular cache line size of the platform you're using. So if you run on a different platform, a processor that has a different cache line size, you need to change your padding. So it works, but it isn't good.

36:18 | The fact that it works we can see on this slide: with the padding added to make sure the sum variables are on distinct cache lines, in this case it solved the problem, because now we get a good speedup, an even reduction in execution time — not quite a factor of four, but not bad, it's close. So one gets benefit from using more threads, and the code scales, at least to four threads, in this case. Any questions on that? Okay.
37:07 | So what does one do to try to make the code more portable? As I said, one way is to avoid the sum array and basically have a local sum variable for each thread. So now there is no variable sharing; there's no concern about a race condition in updating the variable, because each thread has its own sum variable. But it means that at the end, one needs to have a global variable, pi, that gets updated with all the local per-thread sums.

38:00 | In this case that is done inside the parallel region, but using the critical construct. So in that case pi only gets updated by one thread at a time, but all threads get to update pi. So this is the way, then, to make the code not dependent on a particular architectural feature of the processor — it's a lot more portable version of the code. And in this case it performs pretty much as well as the padded version.

38:40 | One could, since pi is a single variable, also use the atomic statement. Atomic works only on a single variable, whereas critical can protect a collection of statements; in this case either one will do. And, since this is a reduction operation, we can also use the reduction clause to have the system take care of avoiding the race condition, and also potentially combine the partial sums in parallel instead of in a sequential loop — doing the summation of the thread-wise sum variables in parallel. So that's another option for this pi code.

39:49 | In this case it turned out the reduction was actually not quite as efficient as the others, though it should scale better. For four threads in this case, atomic and critical — where we still have the serial addition over the four contributions — didn't cost too much of an overhead, whereas the reduction may not have been all that well implemented. But it should scale better to a larger number of threads than the versions with the serial summation.

40:29 | And I think that was more or less it. The take-home message about this code: avoid false sharing by instead using local variables, and then either use atomic or critical to combine the partial sums, or use the reduction clause. Now I want to switch to tasks.
41:06 | Tasks. Any comments on that first? And later we'll hand it over to the TA for a demo. So, any questions so far today, or on the previous lecture? Okay.

41:35 | So I will leave time for the TA for the demo, but make some comments first. There are a number of different aspects of tasks; in a way they are a more general version of parallel regions. And I'll talk through some examples and see how far I get, but I want to leave, I believe, 20 minutes or so for the TA to continue the demo from last time.

42:18 | All right. So, as I said, the idea is to kind of move away from the strict fork-join type of model, and have a more flexible way of creating, synchronizing and scheduling tasks than what's available in the fork-join model of parallel regions. And I will probably not be able to cover all of it today, but I will then do it next lecture.

43:06 | As you will see at first, in the examples I have, tasks look fairly similar to sections, as it says on this slide, but it is a more flexible construct than sections. So: tasks are initiated in the parallel region, and a task has its associated code and also its data environment and its own private variables — it packages up everything that normally is in the parallel region — but tasks are then independent of each other, and I'll show examples of that.

44:18 | And I think this is just pretty much a comment: already before, you know, in a way, parallel regions were in fact implemented as a form of tasks, without opening it up to the programmer to actually him- or herself initiate and manage tasks. But then the task construct came — in, I think, what was it, OpenMP version 3 — and it has been used because it solves some of the issues that the early fork-join version did not address very well.
44:57 | It starts with a pragma — "omp task", simply — and it has a number of clauses that control what happens, and I will talk about these various aspects of tasks; that's the goal.

45:20 | The way it tends to be used, I guess, most of the time, is that you have a single thread create the tasks, but the tasks then are independent of each other, and whichever threads are in the parallel region can grab the tasks. So basically, in the simplest incarnation, a thread is assigned to a task by the runtime system, and which thread runs which task is for the runtime system to decide. And if there are pending tasks, then after a thread finishes one it gets another — there may be more than one task per thread.
46:18 | So now to this simple example that I'm going to get through. There is no parallel for here; it's pretty simple, as you see.

46:34 | So here is standard OpenMP without tasks: in this parallel region we have, in this case, three print statements. Now, if one has two threads, as we do here, there is no guaranteed order between threads, so a number of things can happen in terms of what gets printed. I will just continue on my own here and say: you know, with two threads, in this case, this happened. That was pretty much what I had guessed would happen, but it's not necessarily guaranteed.

47:17 | It could also have been — again, there were two threads, and since it's replicated code (remember, the threads both have all three print statements) — that both thread one and thread two print "A" before anything else. So there could have been all kinds of different combinations of the three words "A racing car", depending on which thread gets to execute which statement when. But for serial execution they will obviously come out in the order that they appear.

48:25 | So here — maybe you can see it; it should be easy enough — any takers for this example? Obviously, since it's just one thread, it will go through the statements in order, no problem. Right.
48:51 | So now we're going to try this with the task construct. In this case there is a single thread that does the print of "A", and then it generates two tasks: one task that prints "race" and one task that prints "car". So now, what would we expect? Any guesses?

49:32 | Okay — it kind of echoes in the room, so thanks for the answer, even though I can't comment directly on it. But yes: what could happen? The reason, in this case, that "A" is printed first is guaranteed is that that thread generates these two tasks; but then there is no particular order in which the tasks are executed. They are not subject to the single-thread construct; they are subject to, or managed by, the number of threads in the parallel region.

50:16 | In this case — it was said at the bottom of the slide — one specified two threads; it doesn't say so in the code example, but they ran it in this case with just two threads. And it may well happen that the thread that gets the "race" task gets to its statement first, but it could also be different: the thread with the "car" task prints first. So there is no order between "race" and "car" even in this task example.

51:05 | Later on — next time — I will tell you about some ways you can organize or sequence tasks as you may want them, because there are ways for controlling whether tasks depend on each other or wait for each other.
51:29 | So here's another one: now I had another print statement, "is fun to watch". So, any expectations of what this code might print?

51:56 | Well, we have the same issue again: a single thread — there are just the two tasks — but this single thread has print "A" and print "is fun to watch", and then we have the two tasks that are executed independently, and in parallel by, in this case, different threads.

52:22 | So clearly we get "A" first, because "A" was printed before any one of the tasks was generated; and we discussed already on the previous slide that "race" and "car" can appear in any order. And then — we had a single thread again — the two tasks are completed at some point, and we get "is fun to watch".

52:50 | But here's what may actually happen: the single thread that prints "A" and "is fun to watch" may get its job done before any one of the tasks gets its job done. So there is no guarantee that the tasks will be completed before the single thread — the one that prints "A" and "is fun to watch" — gets its job done. Tasks may be initiated at any time the runtime system decides, and it may not be in order in terms of the sequence of the code as you see it. Okay.
53:36 | I think, well, let's start the demo, and if there's time left, I will talk about the next item for tasks. Yeah — I think you can share the screen.

53:54 | [TA:] Right, okay. For some reason my ssh session here keeps dropping off.
54:49 | [TA:] Just a couple of examples that were left from last time. So I guess pretty much everyone now knows how the scheduling works here. Obviously we are requesting a number of threads, so just assume that we get what we asked for from the operating system. Any takers: what will happen if we have 16 iterations for this for loop, if you use dynamic scheduling, and if you use static scheduling?

55:40 | Yes — with static you get an even distribution over all the threads, and with dynamic it will depend on how fast each thread finished its iterations: if there is any iteration available, it will take it, and the chunk size defines how many iterations each thread will get at a given time.

56:05 | There's one more thing: you can use static with a chunk size as well. In that case the difference will be — let's say you have 16 iterations and you had two threads; that means you get eight iterations per thread, right? So iterations 0 to 7 would be executed on thread 0 in the normal case, if you don't specify any chunk size, and the rest of them would go to thread one. But if you specify a chunk size of two, then iterations zero and one will be executed on thread zero, two and three will be executed on thread one, and so on. So then the distribution will be more in a round-robin fashion, rather than one contiguous block per thread. So there's a difference if you use a chunk size with static and if you don't.
57:04 | example here. So as you can that static, we would expect Uh |
|
|
57:10 | threads and 16 iterations the even distribution the tracks for inspiration and with |
|
|
57:16 | you can expect such um execution as . So zero executed quite a few |
|
|
57:24 | on them. Uh The last train executed two of the patricians, there's |
|
|
57:29 | distribution between the threats regarding their Yeah. Yeah. Yeah, that's |
|
|
57:42 | good question. It's not a Um the the use case for dynamic |
|
|
57:49 | mostly when you're uncertain of the amount time that it will take for |
|
|
57:55 | So let's say in in a simple multiplication case or matrix vector multiplication. |
|
|
58:00 | that case you are expected to work the same piece or the same amount |
|
|
58:05 | data on with each thread. So you can expect the traditions to |
|
|
58:09 | the same amount of time. But you're performing some other words, that |
|
|
58:12 | on how much data that each thread . In that case, dynamic maybe |
|
|
58:16 | little bit faster because if one thread finished then it's better that it executes |
|
|
58:20 | other piece of situations as well rather the that actually might have gotten that |
|
|
58:27 | situation. So it depends how your is. Mhm. All right. |
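
The 16-iteration, chunk-size-2 scenario just discussed can be sketched as below. The function name and the owner-map bookkeeping are mine; the schedule clauses are the standard OpenMP ones. With `#ifdef _OPENMP` guards the file also compiles serially, where one "thread" does everything.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#else
static int omp_get_thread_num(void) { return 0; }  /* serial fallback */
#endif

/* schedule(static, 2) deals chunks of 2 round-robin in a fixed pattern:
 * with 2 threads, thread 0 gets iterations {0,1,4,5,8,9,12,13} and
 * thread 1 the rest. schedule(dynamic, 2) hands the next free chunk to
 * whichever thread finishes first, so the mapping varies run to run.
 * Either way, every iteration runs exactly once. */
int run_schedule_demo(void) {
    int owner[16];
    int done = 0;

    #pragma omp parallel for schedule(static, 2) reduction(+:done)
    for (int i = 0; i < 16; i++) {
        owner[i] = omp_get_thread_num();   /* fixed mapping */
        done++;
    }
    printf("static,2 owners:  ");
    for (int i = 0; i < 16; i++) printf("%d ", owner[i]);
    printf("\n");

    #pragma omp parallel for schedule(dynamic, 2) reduction(+:done)
    for (int i = 0; i < 16; i++) {
        owner[i] = omp_get_thread_num();   /* varies between runs */
        done++;
    }
    printf("dynamic,2 owners: ");
    for (int i = 0; i < 16; i++) printf("%d ", owner[i]);
    printf("\n");

    return done;   /* 32: 16 iterations per loop, each run once */
}
```

Printing the owner maps side by side makes the round-robin static pattern and the run-to-run variation of dynamic visible.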
|
|
58:36 | I think everyone knows the single construct as well. So let's see.


58:47 | With single, I just have two questions on this example. So where do you


58:54 | think the implied barriers are in this scope? Is it here, or is


59:00 | it at the second one? Well, all right.


59:28 | So single, if you remember, comes with an implied barrier after the


59:33 | section that it executes. So you can expect all the threads


59:38 | to synchronize with each other. So no thread will go past the


59:43 | second single statement, or at least past that line in the


59:48 | source code, before one of the threads has finished performing the first print.


59:55 | Now, with the single statement, you can add a nowait clause. And


60:00 | what that's going to do is tell all the other threads that they don't


60:04 | really need to wait for the thread that might be performing the second statement to finish,


60:11 | and the others can simply move on with whatever is the next piece of work.


60:16 | So you can expect an output something like this. For the first print statement,


60:23 | you're guaranteed that no other thread will be performing the second or third print statement before


60:27 | the first print statement is executed, because there's an implied barrier after it.
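
A sketch of this single/nowait behaviour is below. It is a guess at the slide's code, not a copy of it; the counter and the return value are mine, added so the barrier guarantee is checkable. Serially, the pragmas are ignored and the checks pass trivially.

```c
#include <stdio.h>

/* The first single keeps its implied barrier, so every thread waits
 * until its body is done before continuing; the second single adds
 * nowait, so threads are free to run the last printf before the
 * second single body has finished. */
int single_demo(void) {
    int count_first = 0;
    int barrier_worked = 1;
    #pragma omp parallel
    {
        #pragma omp single
        {
            printf("first print\n");     /* exactly one thread runs this */
            count_first++;
        }
        /* implied barrier here: every thread now sees count_first == 1 */
        if (count_first != 1) {
            #pragma omp atomic write
            barrier_worked = 0;
        }
        #pragma omp single nowait
        {
            printf("second print\n");    /* nobody has to wait for this */
        }
        printf("third print\n");         /* may appear before the second */
    }
    return count_first * barrier_worked; /* 1 when things behave as described */
}
```

The "third print" lines can interleave before "second print" precisely because of the nowait; removing it restores the barrier and the ordering.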
|
|
60:33 | In the case of the second print statement, we specified a nowait clause on


60:39 | the single statement. That means some of the threads went ahead and printed the third


60:44 | print statement even before the thread got to print the second print statement. So there's an


60:50 | implied barrier after the single construct, but you can skip it by adding this no


60:56 | wait clause after it. Okay. This is a tricky
|
|
61:09 | one. Alright, so take a minute to look at this code here. And the question


61:15 | here is: what happens if you specify j as private? And what's going to happen if


61:24 | you specify j as shared, or rather don't specify it as anything and let it be shared by


61:31 | default, because it's defined outside the parallel region? Good. So would you expect the


61:38 | correct output for this program if you set j private? Or


61:46 | would you expect the correct output if you let it be shared? So while you're


61:51 | thinking, I'll just quickly comment: we have four iterations in the outer loop,


61:55 | because we have n defined as four. And then in the inner loop


62:00 | we have five iterations, and we are setting two threads here. In total,


62:08 | we obviously have four times five, that's 20 iterations. And you can expect


62:14 | 10 of those iterations to be divided to each of the two threads.


62:24 | So what might happen if we set j to private or shared?


62:40 | [inaudible audience response]


63:01 | Is it because it's in the scope? Yeah. Well, you don't have nested


63:08 | parallelism here; we're just parallelizing the outer loop in this case. So two


63:15 | of the four outer iterations will be distributed to each of the two threads.


63:22 | And then on the inner side, for the inner loop, you can expect both of


63:28 | the threads to iterate over the five iterations for j. But would that work


63:35 | if you have a shared j? I don't think so, because
|
|
63:42 | if you have a pragma omp parallel for in front of a


63:47 | loop, let's say this outer loop here, by default that makes the


63:52 | loop variable private to each thread. But if you have another loop nested


64:00 | inside that same loop, that privateness does not get inherited by the inner loop


64:08 | variable. So by default j will be shared if you don't declare it private.


64:12 | And what could happen is, sure, both the threads might run the


64:18 | outer loop, but because they are sharing the j variable, one thread might update it to


64:26 | some value, and the other one might read some value that it has not


64:31 | yet worked on. Say thread 1 increased j from 1 to 2, and then thread


64:37 | 0 comes in and reads the two, so it starts from 2 to 5 rather than


64:42 | going from 1 to 5. So that means they're sharing that value, correct? So you


64:48 | may not get all the 20 iterations. For these two


64:54 | loops of four times five, you may not get all 20 iterations.


64:58 | So here's a sample output; you can expect something like this. As you can


65:04 | see, here we got i equals 0 going from j equals 1 to j equals 5, then i equals 1 went


65:11 | from j equals 1 to j equals 5 as well. But again, because j was shared, you apparently ended


65:18 | up running j equals 2 again for i equals 1, and that might have happened


65:23 | because i equals 0 may have updated j to 2 at the same time. So


65:28 | that means j is being shared, so you may not get all 20 iterations,


65:32 | so if you look at this count here, it won't go all the way to


65:36 | 20. I think, yes, it just ended up executing 18 iterations in total. But if


65:44 | you set j to be private explicitly, then you can expect the correct output,


65:53 | and then you get all the 20 iterations here. So you have to be careful


65:58 | in terms of the loop indices: they're not private for the inner loop;


66:03 | the pragma omp parallel for only applies to the outer loop.
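
The fix being described can be sketched as below; the counts (four outer, five inner, two threads) follow the discussion, while the function name and the reduction-based counter are mine. Serially, the pragma is ignored and the loops simply run in order.

```c
#include <stdio.h>

/* parallel for only privatizes the OUTER loop variable i. An inner
 * loop counter declared outside the region must be listed in
 * private(), or it is shared and iterations can be lost. */
int count_inner_iterations(void) {
    int count = 0;
    int i, j;            /* j declared outside: shared unless listed */
    #pragma omp parallel for num_threads(2) private(j) reduction(+:count)
    for (i = 0; i < 4; i++) {
        for (j = 0; j < 5; j++) {   /* private(j): each thread's own j */
            count++;
        }
    }
    return count;        /* always 4 * 5 = 20 with private(j);
                            with a shared j, some can be skipped */
}
```

Dropping `private(j)` reproduces the lecture's broken run, where the shared inner counter lets one thread's increment skip iterations for the other.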
|
|
66:12 | What about the k variable? Okay.


66:22 | The k variable will be private, because it's defined inside the parallel region. So if


66:28 | you want to check the scope of a variable, just check where it is defined:


66:33 | if it's defined inside the parallel region, it will be private to each of the threads.


66:37 | If it's defined outside the parallel region, then by default it will be shared, unless


66:41 | you explicitly state it to be private, firstprivate, lastprivate, any of those.


66:48 | Okay. So yes, k is private in this case. All right.
|
|
66:58 | Let's see. Yes, I think I have the same example for tasks as the


67:04 | professor just showed on the slide. So this is what he is going to cover


67:10 | in the coming slides. But one thing that I wanted to mention is, let's


67:15 | say you had a variable declared as private for the parallel region,


67:22 | say you had a variable named j private for this parallel region. Then if


67:32 | all the threads, let's say, work on j or update j, whatever they


67:39 | do, all the threads do that with their own copy of j.


67:44 | But when the tasks are created, that private variable is upgraded to a firstprivate in


67:53 | the task. Okay, so if you had a private j and you did not make


68:02 | any updates, that means you may have a garbage value, or whatever value the thread


68:08 | may have set for that variable, and that value will be copied


68:13 | into the tasks. So any private variable is upgraded to firstprivate here.


68:24 | All right, I think that's most of the examples I had to show. Professor
|
|
68:32 | , maybe you can continue from here. Okay, so I wanted to ask


68:36 | if there were any questions before I share my screen. No questions on


68:55 | this? Okay, I don't see any. Right, okay.


69:03 | This example will be a little bit about exactly how variables are inherited in


69:10 | tasks, which is similar but not identical to the case for parallel regions, as was


69:16 | just mentioned. So, the global variables are


69:25 | behaving the same as in parallel regions,


69:35 | and local variables are private, and then they become firstprivate, as
|
|
69:46 | was just mentioned. So here is an example, I think, that tries to illustrate


69:51 | that. It is a little bit contrived, in the sense that there are two OpenMP parallel regions


70:01 | and one of them, I think, declares b as private. And


70:11 | then we can see a, b and c are all defined before one gets into


70:19 | any one of the parallel regions; of the other variables, d is defined in


70:28 | the second parallel region, so it is clearly private per thread in this parallel


70:37 | region. And then in this region there is a task as well, and then e


70:47 | is defined and assigned within the task. So now the question is,


70:56 | what's the scope of the five different variables defined here? So the scope of a


71:06 | is pretty straightforward. It's only defined outside, and it doesn't appear in any clause elsewhere in


71:12 | this code. So within the task it inherits, as was said, the


71:19 | global property, so that's shared. How about b? What does b become in


71:34 | terms of the task? Is it still firstprivate? Yes, I


71:47 | think I heard it correct. So b, it's firstprivate, because


71:56 | inside the task it's both private and it also inherits whatever value the


72:08 | thread's copy had in the parallel region. So that's why it's firstprivate,


72:14 | not just private. How about c? Well, c is defined before both


72:30 | of the parallel regions, so it's the same: shared, right? Similar to


72:40 | a. How about d then? Well, what can we say about d in


72:55 | the task? Well, d is defined in the parallel region, so it is private to each thread


73:13 | in the parallel region. So in the task it is firstprivate. And e


73:23 | is simple: it is just private, because it's defined within the task. Okay. All
|
|
73:30 | right. So now, what are the printed values? a is pretty straightforward: that's


73:36 | shared, and it was assigned the value of one. So what's b? What's the


73:44 | value of b when it is printed?


73:57 | Can you repeat? Oh, I'm sorry: some junk value, since there's no assignment


74:03 | carried out. Well, calling on the audience doesn't work too well


74:13 | today. So, thank you. b in the code that is visible was never


74:22 | assigned a value, even though it's private, because in the second parallel region


74:33 | b was never assigned a value. Because it's private, it's not b equals two,


74:42 | because the b in the private region is a different b, with local storage for each


74:46 | thread, and it wasn't firstprivate in the parallel statement. So it doesn't inherit anything:


74:52 | it's a new memory location that is private. So it was never assigned.


74:58 | And the fact that b becomes firstprivate within the task just imports the


75:05 | value it had before entering the task. And since it was never assigned a value,


75:11 | that's why it's basically junk. c is pretty straightforward: three. d


75:22 | is defined, too, because even though it is firstprivate, it was actually assigned


75:26 | the value. So we know what d is, right? And then e is


75:32 | assigned in the task region, so that's no problem. So, as was pointed


75:40 | out, and as I have tried to stress all along: it's really tricky, and one has


75:45 | to be very careful to keep track of the status of variables, whether they are shared


75:54 | or private and whether they are initialized or not. So there's ample opportunity


76:05 | to make mistakes, unfortunately. All right, let's see.
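
A well-defined variant of the scoping example can be sketched as below. The variable names follow the slide, and a = 1 and c = 3 match the values discussed; d = 2 and the use of single/shared(result) are my additions so the task's view of the variables can be checked, and b is deliberately given a value inside the region, since reading the slide's never-assigned private b would be undefined.

```c
#include <stdio.h>

int a = 1;     /* global: shared everywhere, including inside the task */

int task_scope_demo(void) {
    int b = 0, c = 3, result = 0;
    #pragma omp parallel private(b) num_threads(2)
    {
        int d = 2;   /* declared in the region: private per thread */
        b = 0;       /* each thread must initialize its own private b;
                        the slide's version skips this, so b is junk */
        #pragma omp single   /* one thread creates one task */
        {
            #pragma omp task shared(result)
            {
                /* d (and b) are private in the region, so they become
                 * firstprivate in the task; c stays shared; a is global
                 * and shared; e is declared here, so simply private. */
                int e = d + c + a;
                #pragma omp atomic write
                result = e;          /* 2 + 3 + 1 = 6 */
            }
        }
    }   /* implicit barrier: the task has completed */
    return result;
}
```

The point the lecture makes survives in the sketch: the task sees d's value only because firstprivate snapshots the creating thread's copy, and b would snapshot garbage if the thread never wrote to it.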
|
|
76:13 | This is just a comment on that. As I said earlier on, in


76:24 | order to manage, or be clear about, whether variables are private or shared, it's good to


76:33 | always state exactly what you intend variables to be, and not rely on the default


76:46 | status of the variables. Okay, then I had something about synchronization; I'll


76:54 | just mention, probably, the first two,


77:04 | because I think I have a couple of minutes only. So barrier is nothing other


77:09 | than what we're used to, and taskwait is, I guess, the opposite of the nowait clause


77:16 | in the parallel region statements that was also shown for the single statement. So


77:24 | here is the old example again. And now, with this taskwait


77:34 | statement, in this case, even though we don't know in which order car and race


77:43 | will be executed, we know that both will be executed before is fun to watch


77:50 | is executed, because of the taskwait statement. So in this case we should see


77:58 | exactly that: is fun to watch is always the last thing to get printed, because


78:04 | of this taskwait statement. And there is taskgroup as well.
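
The taskwait version can be sketched as below; as before, the function name and the counter are mine, added to make the guarantee checkable. Serially, the pragmas are ignored and the code runs in source order.

```c
#include <stdio.h>

/* "race " and "car " may still print in either order, but taskwait
 * blocks the generating thread until both child tasks have finished,
 * so "is fun to watch" is always last. */
int race_car_with_taskwait(void) {
    int tasks_done = 0, done_before_final = -1;
    #pragma omp parallel
    {
        #pragma omp single
        {
            printf("A ");
            #pragma omp task shared(tasks_done)
            {
                printf("race ");
                #pragma omp atomic
                tasks_done++;
            }
            #pragma omp task shared(tasks_done)
            {
                printf("car ");
                #pragma omp atomic
                tasks_done++;
            }
            #pragma omp taskwait          /* wait for both child tasks */
            done_before_final = tasks_done;   /* guaranteed to be 2 */
            printf("is fun to watch\n");
        }
    }
    return done_before_final;
}
```

Compare this with the earlier version without taskwait, where the final print could overtake the tasks.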
|
|
78:18 | And yes, I had this example to talk about too, so now maybe we can see.


78:27 | At least I'll wait and see if someone can tell me what gets printed in


78:36 | this case. Okay. So here we have x as a global variable, then we have


78:45 | the single thread that generates the first task, and then we have a taskwait and


78:52 | then generate the second task. Both tasks access the global variable. Now what


79:03 | do you expect to get printed? Any takers? Okay, so the


79:30 | first task is generated in this case. x started out at zero and gets incremented.


79:41 | So when the first print is encountered, in task one, it prints


79:50 | x. And the second task doesn't get initiated, because of the


80:00 | taskwait statement. So task one is going to print x equals


80:09 | one. And then the second task increments x again, so we'll get x equals one and


80:15 | then x equals two. On the other hand, if we change the


80:30 | code to use the firstprivate clause, then what would one expect?


80:48 | Yes, I think I heard one and one, and that's what's going


80:52 | to happen. So firstprivate: that means each task gets initialized with x equal


81:01 | to zero. So that's why each print statement, even though there is a


81:07 | taskwait, is going to print the same value, one. Excellent. And I think that's it.
|
|
81:17 | Let's see. Right. So the


81:34 | thing is that, in this case, because the first task initializes x, which is


81:46 | also private to that task, if that task finishes, then it's not clear what's going


81:54 | to get printed. So I think my time is up, and I guess


82:02 | the next item is for me to talk about, but that's for next time.


82:11 | Okay. Any questions? No. So I'll stop recording now.
|
|
82:23 | |
|