© Distribution of this video is restricted by its owner
00:00 Okay, so this is the last lecture on OpenMP, at least for the moment. At the end of the lecture, Josh will give a demo of some of the OpenMP constructs that we have been talking about. Um, these are the topics for today; you should be seeing the next slide. I'll talk about data scoping and some of the work-sharing and other constructs that apply to tasks. Most of it is similar to what we already talked about, since there are corresponding constructs for parallel regions. And then I will talk about affinity, in terms of both tasks and threads. I'm not asking you to do anything with either task or thread affinity, though you're welcome to try it. These are viewed as advanced concepts in OpenMP that I just want you to be aware of. This introductory course covers OpenMP in some depth, but one would pretty much need a full course to cover all of OpenMP. So this part is mostly for information, and there is not much for you to do with it as an assignment, but maybe some of the concepts I will talk about are something you want to use in a project.

01:51 So, with that said, data scoping for tasks is very similar to the three rules for parallel regions, except for one distinction that matters when you use the same variable names. The rule is that a variable that is private in the enclosing context is treated as a firstprivate variable for the task, which means it gets initialized from the calling context. So here is an example; let's see if we can get some participation in sorting out what the scopes of the various variables are. We'll start with a and go through things in alphabetical order. As we can see, a is defined in the sequential part of the code, before any of the parallel regions are defined. So, anyone willing to volunteer about the scope of a: shared, private, or firstprivate? [Student: Private?] Well, maybe that's too quick. It was declared in the sequential part, and it's a global variable, so it is shared.

03:34 Now we have b. It's a bit confusing the way this example was written: b is first declared shared, and then it is declared private in the second parallel region statement. Now, what is it inside the task directive? No volunteers? It is private in the parallel region, because it's declared as such, but inside the task, because of the inheritance rules for the task directive, it is treated as firstprivate.

04:36 Then we have c. Anyone want to tell me what the scope of c is: shared, private, or firstprivate? Let's step back and think about it a little longer; that's okay. As in the earlier case, c is defined in the sequential part of the code, and it is not mentioned anywhere else, so it's just shared. Then we come to d; anyone want to take a stab? d is not declared in the sequential part; it only appears inside the parallel region, so d is private in the second parallel region. Inside the task, it is inherited from the calling parallel region as firstprivate, given the inheritance rules for tasks. And I guess it's simple to see that e is declared inside the task, so it is private to the task.

07:08 I also made a little quiz, so let's try this one. The value of a is one. So the question is about b: what is its value? Maybe someone can say what it is, and why. [Student: That was my initial guess, but I thought it was too far-fetched. Is it a trick question?] Yes. b is private, so could it possibly be referring to the same b above? No. There is no assignment to b in the parallel region, so even though it's firstprivate in the task region, it was never assigned a value in the calling region. That's why it ends up being undefined. [Student: When we say undefined, could it be zero, just because that's the default for ints?] From a programming perspective, one should treat it as unknown. It may be that the compiler, when it allocates storage for it, is kind enough to set it to zero, but there's no guarantee that will be done. It is just memory space, and whatever happens to be in that memory location at the time is its value. Unless the variable is explicitly initialized, you can't count on it being initialized to nice values for you. Then c: we saw that it was shared, so we agree on that. And for d, given the discussion we just had: it is initialized, and it's firstprivate inside the task, so inside the task construct it inherits the value four.
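The scoping quiz above can be reproduced in a small C sketch. It assumes the slide's code looked roughly like this, simplified to a single parallel region with private(b); the function scoping_demo and its output parameters are hypothetical, added only so that the values seen inside the task can be checked. Compile with an OpenMP flag such as -fopenmp.

```c
/* Sketch of the scoping quiz; a-d follow the lecture, the rest is hypothetical. */
int a = 1;                            /* file scope: shared */

void scoping_demo(int *seen_a, int *seen_c, int *seen_d)
{
    int b = 2, c = 3;                 /* declared in the sequential part */

    #pragma omp parallel private(b)   /* b: private and uninitialized here */
    {
        int d = 4;                    /* declared inside the region: private */

        #pragma omp single            /* one thread creates the task */
        {
            /* Implicit scoping in the task: a shared, b firstprivate,
               c shared, d firstprivate. A variable declared inside the
               task body (e on the slide) would be private to the task. */
            #pragma omp task
            {
                /* b is deliberately not read: its value is undefined. */
                *seen_a = a;
                *seen_c = c;
                *seen_d = d;
            }
        }
    }   /* the barrier at the end of the region waits for the task */
    (void)b;
}
```

Inside the task, a and c refer to the original variables, while d is a copy initialized to 4 when the task is created.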

10:13 There's nothing magic about this; it's a little bit of the scoping rules for how shared and private variables are treated when it comes to the task construct. So here is the recommendation, as many times before: to avoid being taken by surprise, always be explicit about what you want to do. By using default(none), you then have to declare the scope of all the variables. In terms of synchronization, the barrier works exactly as in the parallel regions we talked about.

11:02 But there are two new synchronization constructs that I want to mention: one is taskwait and the other is taskgroup. Here is an example of how to use taskwait. taskwait is pretty much the same as a barrier, but it is restricted to the child tasks created before the statement. In this case it applies to the two tasks, and it doesn't apply to anything else. So when we execute this piece of code, "A" is printed before any of the tasks are spawned, so it gets printed first. Then, because of the taskwait, both tasks have to be completed before "is fun to watch" gets printed. The order between the two tasks is undefined, so it could be either "race car" or "car race". So this is what's going on: as I said, taskwait acts as a barrier just for the tasks.
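A sketch of the taskwait example, assuming it follows the classic "A race car is fun to watch" pattern. Instead of printing, the hypothetical function race_car records the words in order so that the guarantees can be checked: only the first and last entries are fixed, while "race" and "car" may appear in either order.

```c
#include <stddef.h>

/* Records the order in which the four phrases are produced (out holds 4 entries). */
size_t race_car(const char *out[4])
{
    size_t n = 0;                       /* shared: declared before the region */

    #pragma omp parallel
    {
        #pragma omp single
        {
            out[n++] = "A";             /* produced before any task exists */

            #pragma omp task
            {
                size_t slot;
                #pragma omp atomic capture
                slot = n++;             /* the two tasks may run concurrently */
                out[slot] = "race";
            }
            #pragma omp task
            {
                size_t slot;
                #pragma omp atomic capture
                slot = n++;
                out[slot] = "car";
            }

            #pragma omp taskwait        /* waits for the two child tasks above */
            out[n++] = "is fun to watch";
        }
    }
    return n;
}
```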

12:23 And then I had another example here. So if I can get participation in sorting out what gets printed. Anyone? I intended to be the one, but anyone else is welcome to as well. Of course, everybody is a little bit shy. So if we look at the code: x is shared by the inheritance rules defined for the parallel regions, and for the task constructs the first rule applies, so x is shared in the tasks too. That means the first print statement ought to print "task 1" with x going from 0 to 1, and that happens before the second task is executed. Then we have the second task, where x is again inherited as shared. Anyone want to take a guess at what x the second print statement will print? Because there is a taskwait statement between the tasks, x has been incremented by the first task before the second task executes. So x equals one when the second task starts, and that's why x gets incremented one more time, to 2. Without the taskwait, both could have printed one, or task 2 may have printed one and task 1 may have printed two, depending on which gets executed first: without the taskwait, the second task could have updated x first, and then task 1 does its increment.

15:05 Now, would this change make any difference? Would we get the same output or a different one? It's a 50 percent chance. Same or different? Here's what we got. Anyone want to comment on why? What happens is that now we have the firstprivate clause on the task directive. That means a private copy of x gets generated for each task, and it's initialized from the global x. So even though we have the taskwait between the two tasks, they both start with x equal to zero because of the firstprivate clause, and both print one. So again, this is just to point out the tricks in managing memory and memory scoping in OpenMP, at least in my view.
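The two variants of the x example can be sketched as follows, again recording values instead of printing them; the function names and the seen array are hypothetical. The openmp_enabled constant exists only because, when the code is compiled without OpenMP, the pragmas are ignored and firstprivate has no effect.

```c
#ifdef _OPENMP
static const int openmp_enabled = 1;
#else
static const int openmp_enabled = 0;    /* pragmas ignored: runs serially */
#endif

static int x;                           /* global, hence shared by default */

/* Both tasks use the shared x; taskwait orders them, so they see 1 then 2. */
void tasks_shared_x(int seen[2])
{
    x = 0;
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task
        { x++; seen[0] = x; }
        #pragma omp taskwait
        #pragma omp task
        { x++; seen[1] = x; }
    }
}

/* Each task gets its own copy of x, initialized from the global x (0). */
void tasks_firstprivate_x(int seen[2])
{
    x = 0;
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task firstprivate(x)
        { x++; seen[0] = x; }
        #pragma omp taskwait
        #pragma omp task firstprivate(x)
        { x++; seen[1] = x; }
    }
}
```

With OpenMP enabled, the first function yields {1, 2} and the second {1, 1}, matching the two outputs discussed.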

17:00 So this is one more case that I'll talk through. The issue here is that there is no guaranteed order of execution or completion between tasks. A task that creates other tasks may in fact finish before whatever is in the tasks it created, so it could be the case that the parent task has actually finished and terminated before the child task gets to do its work. The taskgroup construct is similar to taskwait, but it says that everything within the taskgroup region, including tasks created by those tasks, has to finish at the synchronization point at the end of the taskgroup, regardless of what happens within the task directives. All right, so it's clear: it acts like a barrier.

18:34 This just shows an example of how things can be done with a taskgroup: all of the spawned tasks have to be finished before execution continues past the end of the taskgroup. But on the other hand, the first task, reading from top to bottom, could very well have finished its own work before the others.
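A minimal sketch of why taskgroup is stronger than taskwait, assuming a child task that itself spawns a grandchild: a taskwait in the creating thread would only wait for the child, while the end of the taskgroup also waits for the grandchild. The function name and the counter are hypothetical.

```c
static int finished;                    /* how many descendant tasks completed */

int taskgroup_waits_for_descendants(void)
{
    int seen = -1;                      /* shared: written by the single thread */
    finished = 0;

    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp taskgroup
        {
            #pragma omp task            /* child */
            {
                #pragma omp task        /* grandchild of the single thread */
                {
                    #pragma omp atomic
                    finished++;
                }
                #pragma omp atomic
                finished++;
            }
        }   /* waits for the child and all of its descendants */

        #pragma omp atomic read
        seen = finished;
    }
    return seen;
}
```

Replacing the taskgroup with a taskwait after the child task would make a result of 1 possible, because the child can complete before its own child does.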

19:12 There's another task construct, taskloop, that is pretty much the same as the parallel for statement in that it parallelizes loops. But there are some options in terms of clauses for the taskloop construct that let one define a little more how one wants the for loop to be broken up. As it says towards the bottom of this slide, and I have it on the next slide as well, one can specify the grain size, which is like the chunk size in the schedule clause for the parallel for, where there was static and dynamic. Here it's a little more explicit: one either specifies the grain size or the number of tasks one wants the loop to be split into. The grainsize clause gives the minimum number of iterations a particular task should have, and the runtime system is allowed to use anything from that minimum up to just under twice the minimum.
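A taskloop sketch, assuming a simple saxpy loop rather than the slide's code (the function name is hypothetical). With grainsize(64), each generated task gets at least 64 iterations (or all of them, if fewer remain) and fewer than 128. num_tasks(k) could be used instead to fix the number of tasks; the two clauses cannot appear on the same taskloop.

```c
/* y = a*x + y, split by taskloop into chunks of 64 to 127 iterations. */
void saxpy_taskloop(int n, float a, const float *x, float *y)
{
    #pragma omp parallel
    #pragma omp single                  /* one thread generates the tasks */
    #pragma omp taskloop grainsize(64)  /* alternative: num_tasks(8) */
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```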

20:50 All right, a little bit about scheduling. In terms of scheduling, a lot of it has to deal with relationships between tasks, since tasks are independent and unordered with respect to each other. For task scheduling there are additional clauses that didn't show up for parallel regions. I will talk about them very quickly, because again this is extra, and there will be a few examples that you can look at. These are the clauses available for tasks. Some, like the if clause, behave as you would expect, but others, like untied, may not be intuitive at first; the other ones at least probably have some intuitive meaning.

22:08 The tied notion has to do with the fact that, when it comes to tasks, the piece of code that is defined to be a task is not necessarily executed by a given thread in its entirety. The runtime system can decide to have several threads work on it, one at a time. Even though one has a single construct outside the task directive, the task may be executed by a single thread at a time, but not necessarily by the same thread for the entire task: partway through executing the task, another thread can take over. If one wants to ensure that a given task is executed by one and the same thread throughout, one needs to declare it as tied to the starting thread, and by default tasks are tied. As it says here, one has to watch out when tasks are untied: certain clauses and constructs may behave in unexpected ways. For example, if you do things that depend on the thread ID, the ID is not likely to stay the same. The thread team for a parallel region is defined by the runtime, and threads keep their IDs throughout the parallel region, but if a different thread takes over an untied task, the thread ID changes, and the same goes for threadprivate data: the new thread does not get the original thread's copy. So a lot of caution has to be exercised when writing code that relies on the same thread being used for the entire task. And then there's the if clause: when its expression is false, the task is not deferred but executed immediately by the encountering thread. Again, I'm going through this fairly quickly because these are advanced concepts, and I don't expect you to work on them for any assignment. But beyond this particular course, I want you to know that there is a lot of flexibility in how to use OpenMP.
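A sketch of the if clause, using the common recursive Fibonacci pattern rather than code from the slides (CUTOFF and the function names are hypothetical). Below the cutoff, if(n > CUTOFF) is false, so the task is executed immediately by the encountering thread instead of being deferred, which limits task-creation overhead. The tasks are tied by default; adding untied would let a suspended task resume on a different thread.

```c
#define CUTOFF 20   /* hypothetical threshold for deferring tasks */

static long fib_task(int n)
{
    if (n < 2)
        return n;
    long a, b;
    /* a and b must be shared: inside a task they would default to firstprivate. */
    #pragma omp task shared(a) if(n > CUTOFF)
    a = fib_task(n - 1);
    #pragma omp task shared(b) if(n > CUTOFF)
    b = fib_task(n - 2);
    #pragma omp taskwait                /* both children finish before a + b */
    return a + b;
}

long fib_parallel(int n)
{
    long result = 0;
    #pragma omp parallel
    #pragma omp single                  /* one thread starts the recursion */
    result = fib_task(n);
    return result;
}
```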

25:40 This is another way of saying, or I guess reminding you, that tasks are generated when they are encountered and, depending on the execution rules set for the task, they can be deferred or executed immediately, basically creating a pool of tasks that have to be executed. Then the runtime system decides how to deal with the tasks: they can start execution as soon as they are created, or they can be scheduled at some future time. Again, it's the runtime system that decides what to do, depending on the availability of threads. And there may be system concerns, in terms of, uh, heat conditions, where the system decides it needs to hold back some threads for environmental reasons.

26:51 taskyield is something where one can declare in the code that it's OK to potentially suspend a given task in favor of some other tasks, depending on which ones one feels are most important. It gives the runtime system more flexibility in scheduling the tasks, and this is just a simple example of what can be done where one wants some part to potentially be prioritized. There is also the priority clause for assigning priorities to tasks, which leaves the choices to the runtime system but gives hints from the programmer as to which tasks one would prefer to have executed earlier as opposed to later. And then there's an explicit way of declaring dependences, that is, which tasks need to be executed before other tasks get scheduled, along chains of input and output dependences between tasks.

28:30 Here is a simple example that tries to illustrate that. In this case, the variable x involved in task number one generates something that other tasks depend on, so it is declared as an output (out): the final value of x in task number one is something the other tasks may depend on. The other two tasks, which can potentially fire concurrently, need x as an input (in). So one can basically build a dataflow graph based on the x variable and have things ordered as required by the dependences.
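The dependence example can be sketched as below, assuming task 1 writes x and tasks 2 and 3 only read it; the function name and output parameters are hypothetical. depend(out: x) orders task 1 before both readers, while the two depend(in: x) tasks do not depend on each other and may run concurrently.

```c
/* Task 1 produces x; tasks 2 and 3 consume it and may run in either order. */
void depend_demo(int *r2, int *r3)
{
    int x = 0;
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task shared(x) depend(out: x)   /* task 1: writer */
        x = 42;

        #pragma omp task shared(x) depend(in: x)    /* task 2: reader */
        *r2 = x + 1;

        #pragma omp task shared(x) depend(in: x)    /* task 3: reader */
        *r3 = x * 2;
    }   /* implicit barrier: all three tasks are complete here */
}
```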

29:36 The slide says one of three here, and this is, I think, the only one I'll show. There are a couple of variations on this theme that I decided not to talk about, which you may want to take a look at. Again, I'm just highlighting things that are advanced concepts and give the flexibility to create quite rich constructs in OpenMP for the respective applications.

30:19 These are three codes that I'm not going to talk about today, but hopefully they are reasonably well documented. They are in the slide deck that is uploaded, and after the lecture, if there's interest, I'll try to go through these examples in detail and explain what's going on in the different codes. Just for your information: some of you may not be familiar with the first two; I bet you are with the last one, even though, as it turns out from my experience in teaching, you may not be. The first one is a very simple iterative method for solving linear systems of equations, the Jacobi example; the method was used for the Laplace problem in one of the assignments you've already done. Gauss-Seidel is just a more efficient iterative method; again, some of you may be familiar with it, but maybe not all. And LU decomposition is just a different name for Gaussian elimination, and I'm assuming that at some point in your studies you have been exposed to Gaussian elimination. So these are code examples of how to use the task constructs and the other constructs in writing efficient OpenMP codes. The difference between Jacobi and Gauss-Seidel in this case is that, if one looks at the dependences in Gauss-Seidel, there is a particular notion that is common in parallel computing codes: one discovers that it has a wavefront type of model in it, where the points on a wavefront can be computed in parallel, and then different wavefronts that follow each other can also be computed in a pipelined fashion. So in that sense it's more complicated than Jacobi, but also more flexible.
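A sketch of the wavefront idea for Gauss-Seidel, assuming a 5-point stencil for the 2-D Laplace equation on a small fixed grid (N and all function names are hypothetical; this is not the slide deck's code). Sweeping the interior by anti-diagonals d = i + j reproduces the lexicographic sweep exactly, because each point reads updated values from diagonal d - 1 and old values from diagonal d + 1, so the points on one diagonal are independent. This version only parallelizes within a diagonal; the pipelining of successive wavefronts is not shown.

```c
#include <string.h>

#define N 10   /* grid size including the fixed boundary */

/* One lexicographic Gauss-Seidel sweep (row by row). */
static void gs_sweep_lexicographic(double u[N][N])
{
    for (int i = 1; i < N - 1; i++)
        for (int j = 1; j < N - 1; j++)
            u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]);
}

/* The same sweep ordered by anti-diagonals d = i + j; points on one
   diagonal are independent, so the inner loop runs in parallel. */
static void gs_sweep_wavefront(double u[N][N])
{
    for (int d = 2; d <= 2 * (N - 2); d++) {
        int ilo = d - (N - 2) > 1 ? d - (N - 2) : 1;
        int ihi = d - 1 < N - 2 ? d - 1 : N - 2;
        #pragma omp parallel for        /* one short parallel loop per diagonal */
        for (int i = ilo; i <= ihi; i++) {
            int j = d - i;
            u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]);
        }
    }
}

/* Hypothetical self-check: both orderings give bit-identical results. */
int wavefront_matches_lexicographic(void)
{
    double a[N][N] = {{0}}, b[N][N];
    for (int j = 0; j < N; j++)
        a[0][j] = 1.0;                  /* hot top boundary */
    memcpy(b, a, sizeof a);
    for (int sweep = 0; sweep < 5; sweep++) {
        gs_sweep_lexicographic(a);
        gs_sweep_wavefront(b);
    }
    return memcmp(a, b, sizeof a) == 0;
}
```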

32:54 These are code examples that you'll find in the slide deck. For the sake of time, I will not cover them unless you send emails or tell me, in the next lecture or down the road, that you would like me to go through them in detail. So the next thing I wanted to cover today, and I will do that somewhat quickly as well, is the notion of thread affinity: basically, how to control where threads are placed. First, I'll try to motivate the need for it, and its importance, with an example. Then I'll talk about the default allocation of data in memory and then some of the binding constructs, and after this, on to the demo.

33:56 I have two examples. The first one is the common matrix-matrix multiplication, and this was done on a single processor; I guess it was probably done on my laptop, by colleagues. It shows two allocations: one is scatter and the other is compact, or close. With scatter you spread things out over the cores, whereas compact tries to use the minimum number of cores, close to each other. The top graph shows the performance of the matrix multiplication on this particular Intel CPU, and the bottom graph on the left shows the energy consumption. In terms of performance, the dark blue is the scatter version: spreading things out over as many cores as were available for the test is clearly beneficial. In this case it even gave more than twice the performance, depending on the clock rate: about 2.5 times, from 40 to 100 gigaflops in this particular case. So for performance, scatter was the winner. And if one looks at the energy consumption for doing this computation, compact was more energy hungry than scatter, so in this case scatter gave both higher performance and lower energy. If one works out the energy efficiency, scatter comes to roughly 60-plus megaflops per watt, while compact is a lot less. So scatter was the winner in both performance and energy efficiency, by a factor of about 3 to 3.5.

36:44 The other example I had was matrix-vector multiplication, and in this case it turns out to be kind of the opposite. Depending on the clock rate, scatter performed better at low clock rates and didn't perform as well at high clock rates. But in terms of energy, scatter was the loser all along. So if you look at these numbers, in this case scatter was the loser and compact was a significant winner. This just justifies that how threads are allocated has a significant impact on both performance and energy efficiency. The next slide just says what I already said, so I'll skip it.

37:54 This depends, of course, also on how the data was allocated, and what the operating system does is use what's known as the first-touch policy: a memory page is placed in the memory closest to the thread that first accesses it. In this case, it's an example from a dual-socket system that has two pieces of memory, one attached to each socket, and the cartoon basically shows two cores for each socket. If it's just a single thread, everything is allocated in the memory associated with that single socket. On the other hand, if you do a two-threaded initialization, the data gets split up and allocated across the two memories. So, any questions, I guess, on this first-touch principle? That's part of what was behind the matrix-multiply and matrix-vector behavior in the examples, and this, as I said, is what common operating systems do.
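A first-touch sketch, assuming a Linux-like first-touch page placement policy and threads pinned with OMP_PROC_BIND (for example OMP_PROC_BIND=close). The array is initialized in parallel with the same static schedule later used to process it, so each page tends to land in the memory of the socket whose thread uses it. The function names are hypothetical.

```c
#include <stdlib.h>

/* Allocate and initialize in parallel so that pages are first touched by
   the threads that will later work on them. */
double *alloc_first_touch(size_t n)
{
    double *v = malloc(n * sizeof *v);  /* malloc alone does not place pages */
    if (v == NULL)
        return NULL;
    #pragma omp parallel for schedule(static)
    for (size_t i = 0; i < n; i++)
        v[i] = 0.0;                     /* first touch happens here */
    return v;
}

/* Same static schedule: each thread mostly reads pages it touched first. */
double sum_static(const double *v, size_t n)
{
    double s = 0.0;
    #pragma omp parallel for schedule(static) reduction(+:s)
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}
```

Initializing the array serially instead would place all pages on the socket of the initializing thread, which is the single-threaded case in the cartoon.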

39:40 So now a little bit about what one can do to control where threads are placed. There are some slides here with a fair amount of text, and that's mostly meant as documentation; I'll try to explain it through some very simple examples and give you some idea of how the constructs work. OpenMP uses the notion of places, and places, as said on the previous slide and repeated on the coming slides, can mean different things: hardware threads, cores, or sockets. You then have ways of defining which threads or cores you want to use, through environment variables, in different ways. There's the specification; as I said, this is more for documentation, and I'll cover it in the examples, but it is here to give you some idea if you want to try it out.

41:08 These are the different ways I used in the examples on the following slides. The first statement defines places as threads: the unit being considered as a place is a hardware thread. You can also define the number of places by putting a number within parentheses, as in the second statement, or you can use explicit lists, as in the last three statements here, which are different ways of specifying places. The middle statement of the five is an explicit list, but you can also use a triplet-type construct, as is common in array programming languages. The second statement from the bottom says: start at 0 with length four, then start at 4 with length four, and so on. So you expect 0, 1, 2, 3 for the first place, 4, 5, 6, 7 for the second, etcetera. You can also give a stride, as in the bottom statement: start at 0, length four, with a stride of four between the elements. That means the first place becomes 0, 4, 8, 12.
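To make the interval notation concrete: in OMP_PLACES, {0:4},{4:4} denotes the places {0,1,2,3},{4,5,6,7}, {0:4:4} denotes the single place {0,4,8,12}, and an omitted stride defaults to 1. The hypothetical helper below only expands lower:length:stride by hand to show the arithmetic; the OpenMP runtime parses OMP_PLACES itself.

```c
/* Hypothetical helper: expand "lower:length:stride" into explicit IDs.
   ids must have room for `length` entries; returns the number written. */
int expand_interval(int lower, int length, int stride, int *ids)
{
    for (int k = 0; k < length; k++)
        ids[k] = lower + k * stride;    /* lower, lower+stride, ... */
    return length;
}
```

For example, expand_interval(0, 4, 4, ids) fills ids with 0, 4, 8, 12, and expand_interval(4, 4, 1, ids) with 4, 5, 6, 7.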

42:55 Thread allocation starts wherever the master thread of the parallel region is, and then it depends on whether close or spread is in effect, as illustrated in the next few slides. The close attribute of the proc_bind clause tries to place threads as close as possible to the master thread. Here are the rules for what happens when there are fewer threads than places, or more threads than places, and how collections of threads get allocated to each of the available places. Here is an illustration of what happens in the first case. If there are fewer threads than places, there is one thread per place: the master thread is on its place, and then there is a round-robin assignment of threads to places as you keep going. On the other hand, if there are more threads than places, some places get several threads. In this case, I think there were 12 threads and eight places, so certain places get two threads and other places a single thread. Exactly how threads are spread out among places when the number of threads is not a multiple of the number of places is up to the implementation. But this again shows close: two threads are allocated to the first places, starting where the master thread is, and it keeps going until the remaining places basically have only one thread each.

45:39 Now, spread affinity tries to spread things out. This is the basic idea given in the OpenMP specification. The examples I will show you are not entirely consistent with it, but apparently there is some implementation freedom in how vendors do it. So, if there are fewer threads than places, the place list is basically divided into as many groups as there are threads, with one thread per group. Otherwise, if there are more threads than places, things happen, according to the standard definition, as in the close case. Here is one way in which a spread allocation might happen: threads again start with the master thread, on place number five, and in this case there are three threads, so the places get carved up into three groups, with a group around where the master thread is allocated, and then things are done in round-robin fashion. Which groups get three places and which get two is up to the implementation to decide. In this case, the three threads are allocated based on partitions, or groups of places, in round-robin fashion, and the bottom one shows a case that ends up being similar to the close allocation.
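A simplified model of the close and spread policies for the case with T threads and P places, T <= P. It assumes P is a multiple of T and that the master thread sits at the start of a subpartition; uneven cases have the implementation freedom described above. The helpers are hypothetical and only compute the expected place numbers; in a real program the policy is selected with OMP_PROC_BIND=close or OMP_PROC_BIND=spread, or with the proc_bind clause.

```c
/* Hypothetical model: place of thread i (0 = master) when the master is on
   place m, with T threads, P places, T <= P and P a multiple of T. */
int place_close(int i, int m, int P)
{
    return (m + i) % P;                 /* consecutive places after the master's */
}

int place_spread(int i, int m, int P, int T)
{
    int width = P / T;                  /* places per subpartition */
    return (m + i * width) % P;         /* first place of the i-th subpartition */
}
```

For four threads on eight cores with the master on core 0, close gives cores 0 to 3 and spread gives 0, 2, 4, 6.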

47:51 The allocations I was showing are illustrations; vendors, or the operating systems for their platforms, make their own choices, and I'll try to cover that fairly quickly in the next few slides, in order to leave time for the demo. So here is a little bit of what might happen. This is a two-socket server with 12 cores per socket and hyperthreading enabled, and it shows how the numbering of hardware threads is done: one hardware thread per core is numbered before the second hardware threads are numbered. These are things that are not predefined by the standard; it's up to vendors and implementations how to do this numbering. I hope to be able to cover this quickly, towards the end, before I hand over to Josh. So with this particular numbering, as you can see at the bottom, you first number across all the cores on one socket, then move on to the next socket, and then you come back and number the second hardware thread on the first socket; this is the way Intel numbers things. Here is now close vs spread. In this case there were four threads and eight cores, so close uses up the first four cores, and spread divides the eight cores into four groups, with one thread for each group. In this case, the master thread was on the first core.

50:14 This is again a different way of doing it, using close and ... what? Right, that was compact. So what did I do differently here? What happened? I get confused; let me move on. So this is compact again, but now with a direct assignment of the four threads: places 0, 1, 2, 3 on the left-hand side, and other places on the right-hand side. Here is another way of doing the direct assignment, using the repeat, or triplet, notation. When the stride is not given, it is assumed to be one. That's why you get the same thing as the close statement on the left-hand side, and exactly the same as spread on the right-hand side. Now, with hyperthreading: this is what happens in this case, again using the Intel numbering. With the close allocation there is one thread for each of the cores, going from 0 to 7, using all eight cores but one thread per core, because, as I said, the way Intel does it is to treat cores as the unit even though hyperthreading is enabled, and then it allocates the eight threads using one per core. And this is just another example, which I will skip talking about, because I want to cover a little more and show what Intel does. Hopefully the slides and examples are self-explanatory, so I will jump ahead.

52:56 So here are the tools Intel provides for showing where threads are bound; you can use them to find out what the binding is, but I will skip this and talk to you about how Intel handles these things. In their binding policy, KMP_AFFINITY, they have a lot of flexibility, and I will show you very quickly some examples that illustrate it; if you're interested, you can go and look at the details.

53:36 So here is how Intel describes things, quickly. A node is basically the starting point of their tree, and the node then has sockets on it. In this case, I have no idea why they number the first package as two and the second package as three, but that's what they do. There are basically two sockets; a CPU is plugged into a socket, and in the Intel vocabulary the thing that gets plugged into the socket is treated as a package. So there are two packages, each of the CPUs in this case has two cores, and each of the cores has hyperthreading enabled, so there is a total of eight hardware threads possible, and I wanted to allocate eight threads. With compact, threads get allocated as shown here, starting from the left, again if the master thread is on that particular processor. Now, as I mentioned, with scatter Intel instead goes from the root towards the leaves, so it first spreads threads across the packages, or sockets. In this case, thread number one ends up on the second socket, and then it goes back to the first socket, but uses the second core on it. Same thing as you keep going: it alternates left and right, starting from the root. So this is the way the scatter setting works for Intel. They also have more options: you can specify a different priority than the order of the levels given by this binary tree, so you don't necessarily need to use the sockets as the first level to alternate on as you go down the tree. So they have various ways of ordering the levels for the thread assignment; this is what they call permute and offset.
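A simplified model of the two Intel orders on the uniform tree described above (P packages, C cores per package, H hardware threads per core; 2 x 2 x 2 in the lecture's example). compact fills the leaves first, while scatter alternates at the root first. The type and functions are hypothetical and ignore the permute and offset modifiers and any non-uniform topology; the actual assignment comes from the Intel runtime, configured through KMP_AFFINITY.

```c
/* Hypothetical location of an OpenMP thread in the machine tree. */
typedef struct { int package, core, hwthread; } hw_slot;

/* compact: both hardware threads of a core, then the next core, then the next package. */
hw_slot slot_compact(int t, int C, int H)
{
    hw_slot s = { t / (C * H), (t / H) % C, t % H };
    return s;
}

/* scatter: alternate packages first, then cores, then hardware threads. */
hw_slot slot_scatter(int t, int P, int C)
{
    hw_slot s = { t % P, (t / P) % C, t / (P * C) };
    return s;
}
```

With scatter on the 2 x 2 x 2 tree, thread 1 lands on the second package and thread 2 on the second core of the first package, as in the walk-through.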

57:04 I will skip talking about that; you can look at it again, and I can take questions either at the end of today or in the next lecture, so that Josh has time to do the demo. There are a couple more examples, and with that I will stop here and let Josh continue with the demo. I'd also say that, in addition to the examples I didn't talk about for the task statement, there are additional slides in today's slide deck about performance optimization that you may find useful. Again, if there's interest, I can talk about those at the next lecture, but otherwise, next week I'll talk a little bit about vectorization and compilers, and I will hand out the midterm; it will be a take-home midterm. So, Josh, start sharing.

58:05 . Yeah. Okay, so it's screen visible. Yep. Okay.

58:17 So we have just a few examples: first, examples about thread affinity,

58:24 and then some very simple examples about the OpenMP clauses that we

58:30 saw in the last class. The example here is just about the functions

58:38 that OpenMP provides related to thread affinity.

58:43 As you can see, I have a simple program here that calls

58:47 omp_get_num_procs, which returns the number of hardware threads

58:54 on your system. Then there's omp_get_num_places, which

59:01 tells you the number of places that are being used, and that depends on

59:07 the environment variable OMP_PLACES, as we saw in the

59:12 slides. Then you can also use omp_get_place_num_procs:

59:20 you pass it the ID of any of the places, and it gives you the

59:27 number of processors inside that place. And then you can do the same thing

59:32 by using the function omp_get_place_proc_ids to get the IDs

59:39 of the processors inside a particular place. I will not be running the code.
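
A minimal sketch of such a query program (not Josh's actual source), assuming an OpenMP 4.5 or newer compiler; print_place_list is a hypothetical name. It walks the place list with the four calls just described and prints each place's processor IDs. The _OPENMP guard keeps it compilable, with nothing to report, when OpenMP is disabled.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Print the place list the runtime built (from OMP_PLACES or its own
   default) and return how many processor IDs it contains in total. */
int print_place_list(void)
{
    int total = 0;
#ifdef _OPENMP
    int nplaces = omp_get_num_places();
    printf("omp_get_num_procs()  = %d\n", omp_get_num_procs());
    printf("omp_get_num_places() = %d\n", nplaces);
    for (int p = 0; p < nplaces; p++) {
        int nprocs = omp_get_place_num_procs(p);
        printf("place %d: %d procs:", p, nprocs);
        if (nprocs > 0) {
            int *ids = malloc((size_t)nprocs * sizeof *ids);
            assert(ids != NULL);
            omp_get_place_proc_ids(p, ids);   /* fills nprocs IDs */
            for (int k = 0; k < nprocs; k++)
                printf(" %d", ids[k]);
            free(ids);
        }
        printf("\n");
        total += nprocs;
    }
#else
    printf("built without OpenMP: no place list to query\n");
#endif
    return total;
}
```

Running it once with OMP_PLACES=sockets and once with OMP_PLACES=cores should produce listings like the ones walked through next; the actual IDs depend on the machine.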

59:44 I already have the outputs, which I collected using a batch job,

59:51 so I'll just go through them. Here's the output of

59:57 the places test source code. Initially, as you can

60:03 see, I've set OMP_PLACES to be sockets. As you

60:08 recall, on the compute nodes of Stampede2 we have two sockets per node.

60:15 So when you query the number of processors, you can see there are 96 hardware threads,

60:21 due to hyperthreading; that's what we get. And since we've set

60:27 OMP_PLACES to be sockets, we get the number of places as two,

60:33 because we have two sockets. And if you call omp_get_place_num_procs

60:40 for place ID zero, out of these two places, that is the number of processors in that place,

60:47 you can see 48, because we have 48 threads per

60:53 socket. That means our 96 threads were divided into two places, each containing 48.

61:01 And also, if you query the IDs of the processors inside one

61:09 place, say place zero out of the two, you can get all the IDs,

61:15 and from this you can see that processor zero was allocated to place zero.

61:22 And then there's a gap, because processor one was

61:29 allocated to the second place, which is why it's not shown inside place

61:34 ID zero. And as you can see, all the even-numbered processors belong

61:41 to place ID zero. Now, if you change OMP_PLACES to be

61:50 equal to cores (initially we did sockets), recall that we have 48

61:56 cores on these compute nodes. Again, the total number of processors remains 96.

62:02 But now, since we set cores, we can see we have 48

62:06 places. And also recall that each core on these Stampede2

62:13 compute nodes has two hardware threads because of hyperthreading. So that's why, in

62:20 each of these 48 places, there are two processors. And

62:28 if you query the IDs for place zero, you can see processors zero

62:35 and 48 are part of place ID zero. And as Dr. Johnson just

62:40 mentioned, these processor IDs are allocated in a round-robin fashion. So since we

62:45 have 48 physical cores, IDs 0 to 47 were allocated across

62:56 all the 48 physical cores, and 48 is the next one, which

63:03 belongs to core zero. That means 0 and 48 reside on the same physical

63:08 core; these are the two hardware threads of core zero. And again, if we

63:15 move to the next level of granularity, OMP_PLACES set to threads,

63:20 you already had the maximum number of processors, that is 96, and since we have

63:26 hyperthreading, we have 96 places as well, because we set our granularity to be

63:34 threads. Then each of these places has one processor, because the granularity is threads,

63:42 and if you get the ID of the processor inside one of these places,

63:48 for place ID zero you get processor zero inside that place.
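
The round-robin numbering above can be written down as arithmetic. This is a sketch under an assumption read off the demo output, not a general rule: on this node, hardware-thread ID p sits on core p mod 48, its sibling is p + 48 (mod 96), and even cores belong to socket 0. The helpers core_of, socket_of, sibling_of and place_of are hypothetical, not OpenMP API; on a real machine the authoritative mapping comes from omp_get_place_proc_ids or the OS topology.

```c
#include <assert.h>

/* Hypothetical numbering for the demo node: 48 cores, 2 sockets,
   2 hardware threads per core, 96 logical processor IDs. */
enum { CORES = 48, SOCKETS = 2, THREADS_PER_CORE = 2 };

typedef enum { PLACES_THREADS, PLACES_CORES, PLACES_SOCKETS } place_kind;

/* IDs 0..47 are the first hardware thread of cores 0..47 and IDs 48..95
   their hyperthread siblings, so ID p lives on core p % 48. */
int core_of(int proc_id)
{
    assert(proc_id >= 0 && proc_id < CORES * THREADS_PER_CORE);
    return proc_id % CORES;
}

/* Even cores on socket 0, odd cores on socket 1 (matching the even IDs
   seen in place 0 with OMP_PLACES=sockets). */
int socket_of(int proc_id) { return core_of(proc_id) % SOCKETS; }

/* The other hardware thread on the same core. */
int sibling_of(int proc_id)
{
    assert(proc_id >= 0 && proc_id < CORES * THREADS_PER_CORE);
    return (proc_id + CORES) % (CORES * THREADS_PER_CORE);
}

/* Place index holding proc_id for each OMP_PLACES setting, assuming the
   places are listed in thread, core, or socket order. */
int place_of(int proc_id, place_kind kind)
{
    switch (kind) {
    case PLACES_THREADS: return proc_id;
    case PLACES_CORES:   return core_of(proc_id);
    case PLACES_SOCKETS: return socket_of(proc_id);
    }
    return -1;
}
```

With these assumptions, place_of(48, PLACES_CORES) is 0, which is why processors 0 and 48 show up together in place zero.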

63:55 There is one more way you can check the affinity of threads, by using the

64:03 amask tool. So again, I have output of that command for different settings.

64:09 So to use amask, you need to first load its module

64:16 on Stampede2. Then for this example I set OMP_NUM_THREADS to be four,

64:22 and for the first case, we set OMP_PLACES to be

64:28 cores, so the granularity is cores. And we see the OMP_PROC_BIND environment variable,

64:35 which determines the binding of threads to different resources on the

64:41 system. For this example, I set it to spread. So remember,

64:46 the granularity is cores, and we set the distribution of threads to be

64:53 spread. And if you run the amask command now, you can see each

65:00 core has two hardware threads. So this axis is the

65:07 core number, and each line is a hardware thread.

65:13 As you can see, thread zero can be bound to either

65:20 of the hardware threads on core zero, right? Because this is core zero

65:27 here, these two. And since we had our process binding set equal to spread,

65:34 for the next thread, the OpenMP runtime tried to assign it as far

65:41 across the board as possible. So the next thread, thread one, was assigned to core

65:47 number 24. So you see 2 and 4; that's core

65:51 24. So it can be bound to any of the hyperthreads inside core

65:58 number 24. That's why you see two 4s here. And similarly,

66:04 since we have four threads, the third thread can be assigned

66:10 to core number one. So again, it tried to spread as evenly as possible.

66:15 And the fourth thread can be pinned to any of the hyperthreads of core number 25.
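
The same spread placement can also be requested from inside a program instead of through OMP_PROC_BIND. A minimal sketch, assuming an OpenMP 4.5 compiler (for omp_get_place_num) and that OMP_PLACES is set, for example to cores; record_spread_places and the MAX_THREADS bound are illustrative names. The proc_bind clause applies to this one parallel region and takes precedence over OMP_PROC_BIND for it.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define MAX_THREADS 256

/* Run a team with proc_bind(spread) and store, for each thread, the index
   of the place it is bound to (omp_get_place_num() returns -1 when the
   thread is not bound). Returns the team size actually obtained. */
int record_spread_places(int nthreads, int place_of_thread[MAX_THREADS])
{
    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    int team = 1;
    for (int t = 0; t < MAX_THREADS; t++)
        place_of_thread[t] = -1;
#ifdef _OPENMP
#pragma omp parallel num_threads(nthreads) proc_bind(spread)
    {
        int t = omp_get_thread_num();
        place_of_thread[t] = omp_get_place_num();
#pragma omp single
        team = omp_get_num_threads();
    }
#endif
    for (int t = 0; t < team; t++)
        printf("thread %d -> place %d\n", t, place_of_thread[t]);
    return team;
}
```

Changing the clause to proc_bind(close) and comparing the printed place indices is the in-program analogue of the spread versus close comparison in the amask output.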

66:23 Now, keeping these two parameters the same, OMP_NUM_THREADS

66:29 at four and OMP_PLACES at cores, we change the

66:35 process binding from spread to close. Then, as you can see,

66:41 thread zero is still bound to core zero here. Now the second thread

66:49 comes closer in terms of distance in the NUMA architecture; it can

66:56 now be bound to core four. The third thread can be bound

67:02 to either of the hyperthreads of core eight, and the fourth thread can be bound

67:10 to either of the hyperthreads of core 10. So that's 1 and 0,

67:15 so that's core 10 here. Does that make sense? It's a little

67:24 bit tricky the first time you read it, but

67:28 read it a couple more times and it starts to make sense.

67:35 So now let's change OMP_PLACES from cores to threads,

67:41 and I should go a little bit faster. So now we change the granularity

67:47 to threads, and, as you can see, if we have the spread process binding,

67:54 rather than having two hardware threads inside a core where the thread can reside, now, since the

68:02 granularity has changed to just threads, each thread can only exist on one of the

68:07 hyperthreads. But again, it's the same: thread zero can reside on

68:13 core zero, thread one on core 24, thread two on core one, and thread three on core

68:21 25, but now they can only reside on one of the hyperthreads, as opposed

68:25 to the previous case, where they could have resided on either of the

68:29 hyperthreads. And similarly, if we change our process binding from spread to close,

68:36 again thread zero comes to core zero's first hyperthread,

68:44 and thread one comes to the second hyperthread of core zero. Similarly, thread

68:56 two goes to core four's first hyperthread, that's this one here,

69:02 and thread three goes to the second hyperthread of core four. So that's a tool you

69:10 can use to see how the OpenMP runtime is behaving with different thread binding

69:16 and place granularity. Does anyone have any questions about that?

69:26 So I think this relates to when someone asked how to separate, or how to

69:31 assign all your tasks or processes to cores, to make sure that you're not

69:39 ... right? Right. So think of hyperthreads, if you don't want them,

69:42 right, in some sense, because they're going to be there, right?

69:46 Right, right. So you can use different levels of granularity of places and

69:51 of process binding to do that. Yeah.
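
To make the spread/close difference concrete, here is a simplified model of how the two policies map a team of T threads onto P places. It is a sketch under stated assumptions, not the full rule in the OpenMP specification: the master thread is assumed to start on place 0, the divisions are assumed to be exact, and the results are indices into the place list. How those indices correspond to the core numbers amask prints depends on how the place list is ordered on the node, so they need not reproduce the 0, 24, 1, 25 pattern in the demo. The function names are hypothetical.

```c
#include <assert.h>

/* Simplified model of OpenMP's close and spread policies for T threads over
   P places: master on place 0, exact divisions only. Results are indices
   into the place list. */

static int packed_place(int thread, int T, int P)
{
    /* T > P: each place receives T/P threads with consecutive numbers */
    assert(T % P == 0);
    return thread / (T / P);
}

int close_place(int thread, int T, int P)
{
    assert(thread >= 0 && thread < T && P > 0);
    if (T <= P)
        return thread;             /* neighbouring places, one thread each */
    return packed_place(thread, T, P);
}

int spread_place(int thread, int T, int P)
{
    assert(thread >= 0 && thread < T && P > 0);
    if (T <= P) {
        assert(P % T == 0);
        return thread * (P / T);   /* first place of each of T subpartitions */
    }
    return packed_place(thread, T, P);
}
```

With 4 threads on 48 core places, spread_place gives places 0, 12, 24 and 36 while close_place gives 0, 1, 2 and 3; with 8 threads on 4 places, both policies pack two consecutive threads per place.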

70:00 So those were the two examples about thread affinity. Now, the next few

70:05 are mostly Q&A. So can anyone tell me what would happen to

70:11 this code if I remove this critical clause? See that

70:17 I'm initializing an array here, and what we're trying to do is

70:23 find the maximum value in this array. So does anyone

70:31 see a problem if I remove the clause here? Say, if I remove

70:37 this, what's going to happen? So, critical makes it such that only one

70:48 thread can enter at a time, right? Right. It's sort

70:53 of like a multi-line atomic, or that's the way I think of it.

70:57 So it will be the max of the last thread that entered that section.

71:04 So it'll be the max of a partition, but it might not be

71:10 what you want. Exactly, right, right. So here's the

71:15 output that you might get without the critical, and here's the one with the critical

71:21 statement. So here, without the critical, what may have happened

71:26 is that the iteration that was using the value 27 and the iteration that was

71:34 using the value 12 both got inside this if

71:41 statement, because, let's say, 27 got in, went to sleep, and did not update max yet.

71:47 And at the same time, the thread that was handling value 12 could have

71:52 also gotten inside this if statement. And whichever one came out

71:58 of the section last updates max at the end,

72:06 and that's the value that you will see, from that particular thread. And

72:12 if you just add a critical clause to that section, what it is going

72:15 to do is allow only one thread at a time inside that section,

72:20 inside that if condition, and you can make sure that your output is correct.
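
A minimal sketch of the fixed pattern, assuming an int array; the function names are hypothetical. The second function is an alternative the demo did not use: a max reduction, available since OpenMP 3.1, which avoids funnelling every iteration through the critical section.

```c
#include <assert.h>
#include <limits.h>

/* Maximum of a[0..n-1] with a critical section: the test and the update
   of the shared variable happen together, one thread at a time. Without
   it, two threads can both pass the test and the one that writes last
   wins, which is how 12 can overwrite 27. */
int max_with_critical(const int *a, int n)
{
    assert(n > 0);
    int best = INT_MIN;
#pragma omp parallel for shared(best)
    for (int i = 0; i < n; i++) {
#pragma omp critical
        {
            if (a[i] > best)
                best = a[i];
        }
    }
    return best;
}

/* Same result with a max reduction: each thread keeps a private maximum
   and the runtime combines them at the end, so the body needs no lock. */
int max_with_reduction(const int *a, int n)
{
    assert(n > 0);
    int best = INT_MIN;
#pragma omp parallel for reduction(max : best)
    for (int i = 0; i < n; i++)
        if (a[i] > best)
            best = a[i];
    return best;
}
```

The critical version is correct but serializes the whole loop body; the reduction keeps the per-thread work lock-free and only combines results once.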

72:28 Now, again, another simple example. So the question is, what would

72:32 happen if I change this clause here for i to shared, private, and firstprivate?

72:40 So, first question: what's going to happen if I make it shared?

72:49 If you make it shared, it'll be the last thread that ran; the value of i

72:56 will be 1000 plus the number of that thread, so it could be different.

73:03 Okay. And what would you get at the end of the

73:07 parallel region, outside the parallel region? Well, you would get

73:12 10, because you declared it as private. What's going to happen if you

73:18 set it as shared? Sorry, go ahead. Yeah,

73:25 go ahead. If you set it as shared, well, it's what I said

73:29 before, right? So it will be the last thread that accessed the

73:33 parallel region, or really, the last thread that accessed the statement, the

73:36 assignment statement in the parallel region, because we have no guarantee that the threads

73:40 within a parallel region run one after another like a single thread. And what's going to

73:46 happen if it's firstprivate? If it's firstprivate, then it wouldn't

73:55 have made a difference, right? Firstprivate just makes it so that it's

73:59 initialized to the value of the variable where it was declared,

74:03 right, and that's going to get overwritten by the i equals 1000 plus the thread number.

74:09 So here's the output that you can get. First, shared: as you can

74:13 see, the value of i was shared amongst the threads. So there

74:19 were three threads that got to see the value updated by thread zero, which

74:26 would have made it 1000 plus 0, so 1000. And as it turns

74:32 out, not all the threads saw the updated value. There were still

74:37 threads that ran after but did not see an updated value.

74:43 There was thread two, thread one, and ...; those saw that updated value.

74:50 And in the end, since it was shared, the one that ran last

74:55 decided the final value. For private, as you can see, it's

75:00 dependent on the OpenMP runtime how it initializes the value of i inside

75:04 the parallel region, since the private copy is not initialized; in this case, it started at zero.

75:09 Whatever happened inside the parallel region, it did not affect the final value, because i was

75:15 private. With firstprivate, as you can see, the initial value was carried inside the parallel

75:22 region. So inside the parallel region, you have 10, and again, since it was firstprivate, it

75:28 stayed 10 outside the parallel region.
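
A minimal sketch of the shared and firstprivate cases, assuming the slide's pattern of starting with i = 10 and assigning i = 1000 plus the thread number inside the region; the function names are hypothetical. private is left out on purpose: its copies start uninitialized, so reading one before assigning it is not well defined, which is why the zero seen in the demo is runtime-dependent. The shared store uses atomic write so each store is indivisible; which thread stores last is still unpredictable.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

static int thread_num(void)
{
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;                      /* serial build: one thread, number 0 */
#endif
}

/* shared(i): all threads write the one variable, so the value left after
   the region comes from whichever store happened last. */
int shared_final_value(void)
{
    int i = 10;
#pragma omp parallel shared(i)
    {
        int mine = 1000 + thread_num();
#pragma omp atomic write
        i = mine;                  /* indivisible store; last writer wins */
    }
    return i;                      /* 1000 + some thread number */
}

/* firstprivate(i): each thread's copy starts at the value i had before the
   region, and assignments to the copy never reach the original. */
int firstprivate_mismatches(int init)
{
    int i = init;
    int mismatches = 0;
#pragma omp parallel firstprivate(i) shared(mismatches)
    {
        if (i != init) {           /* never taken: copies start at init */
#pragma omp atomic
            mismatches++;
        }
        i = 1000 + thread_num();   /* changes this thread's copy only */
    }
    /* Built with OpenMP, the original i still equals init here. */
    return mismatches;
}
```

Built with OpenMP, shared_final_value returns 1000 plus a thread number that can change from run to run, while firstprivate_mismatches always reports 0.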

75:36 Now, with the reduction clause, here's a simple example, and an overview of what happens. So we

75:43 have three arrays, A, B, and C. We try to add these two

75:46 inside our parallel region. To make sure that this works correctly,

75:53 there are a few things that you need to make sure of: you need to make

75:56 A, B, and C shared. In this case, they will be implicitly shared,

76:01 because they're defined outside the parallel region. You need to make the

76:07 variable that holds the value of these two elements, you need

76:14 to make sure that it's private. Making i private is again redundant, because,

76:21 if you recall, loop indexes are private to the threads. And there's this clause

76:29 for the sum variable. Since the variable was defined outside the parallel region,

76:35 it would be shared. However, the reduction clause, as defined in the OpenMP

76:40 documentation, means that for the identifier it takes, each thread gets a private

76:46 copy of this identifier for the reduction, so there will be a

76:53 copy for each thread. So you can perform whatever operation you

76:59 want to perform with sum here in the parallel region, and after the parallel region,

77:05 OpenMP applies this operator on the copies of sum for all

77:12 threads and gathers it outside the parallel region. So the output you will get will be

77:18 a reduction operation on all the threads' copies. So here's the output that you

77:25 can expect. Here we had two threads; C was computed

77:35 correctly, and since sum was over i from 0 to 7,

77:39 it should be 28, and because we used a reduction on it, we got the correct output.
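
A minimal sketch of the three-array example, assuming int arrays and a sum taken over the elements of C; add_and_sum and the temporary t are illustrative names. It spells out the scoping just described: A, B and C shared, the per-iteration temporary private, the loop index private automatically, and sum handled by reduction(+:sum).

```c
#include <assert.h>

/* c[i] = a[i] + b[i] in parallel, returning the sum of c. The pointers are
   shared (declared outside the region), i is private as the loop index,
   and t is private because it is declared inside the loop body.
   reduction(+:sum) gives each thread a private sum starting at 0 and adds
   the copies into the original sum when the loop ends. */
int add_and_sum(const int *a, const int *b, int *c, int n)
{
    assert(n >= 0);
    int sum = 0;
#pragma omp parallel for shared(a, b, c) reduction(+ : sum)
    for (int i = 0; i < n; i++) {
        int t = a[i] + b[i];
        c[i] = t;
        sum += t;
    }
    return sum;
}
```

With a[i] = i and b[i] = 0 for i from 0 to 7, the returned sum is 28, the same value as in the demo output.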

77:46 Now, another example, about schedules. So the question here is

77:54 what will happen if we set a static or a dynamic schedule? So we have eight

78:01 threads and a loop that has 16 iterations. So the question is, what would

78:06 happen if we have a static schedule? Okay, anyone? Oh, okay.

78:27 So what's going to happen with static scheduling, if you recall from the lectures,

78:33 is that the OpenMP runtime will evenly distribute all the iterations

78:38 across all the threads. So, as you can see, each thread got

78:43 two iterations. For example, thread one got one here and a

78:49 second one here; thread two got two iterations as well. If we switch

78:54 to dynamic, then the OpenMP runtime assigns work to the threads that have finished

79:02 their work when there is still some left to be done. So in this

79:08 case, it's not guaranteed that each thread will perform the same amount of work.

79:12 As you can see from the output for dynamic scheduling, thread three performed

79:18 six iterations, and all the others performed two iterations each, except

79:25 threads five and six, as you can see, which were not allocated any

79:30 work, because their work was allocated to different threads.
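
A sketch that records which thread ran each iteration, so the static and dynamic outputs can be compared directly; it assumes int arrays sized to the loop and uses hypothetical helper names. The static version takes an explicit chunk size, because for schedule(static, chunk) the specification fixes the assignment: chunks go to threads round-robin in thread-number order. With no chunk size, as in the demo, the chunk sizes are left to the implementation, although 16 iterations on 8 threads typically come out as two consecutive iterations per thread.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

static int current_thread(void)
{
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

static int team_size(void)
{
#ifdef _OPENMP
    return omp_get_num_threads();
#else
    return 1;
#endif
}

/* schedule(static, chunk): chunks of `chunk` iterations are dealt out
   round-robin in thread-number order. Returns the team size. */
int owners_static(int *owner, int n, int chunk)
{
    assert(chunk > 0);
    int team = 1;
#pragma omp parallel
    {
#pragma omp single
        team = team_size();        /* implicit barrier follows */
#pragma omp for schedule(static, chunk)
        for (int i = 0; i < n; i++)
            owner[i] = current_thread();
    }
    return team;
}

/* schedule(dynamic): one-iteration chunks (the default) go to whichever
   thread asks next, so some threads may take several and others none. */
void owners_dynamic(int *owner, int n)
{
#pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < n; i++)
        owner[i] = current_thread();
}

/* Number of iterations that `thread` executed. */
int iterations_of(const int *owner, int n, int thread)
{
    int count = 0;
    for (int i = 0; i < n; i++)
        count += (owner[i] == thread);
    return count;
}
```

Called as owners_static(owner, 16, 2) with OMP_NUM_THREADS=8, every thread gets exactly two iterations; rerunning owners_dynamic and counting with iterations_of shows the split changing between runs.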

79:39 So, for example, loop scoping. This is again from the

79:46 lecture slides. So if you set j to be shared in this

79:55 case, or if you don't do anything and j is shared by default, then j

80:00 will be shared by all the threads running the outer loop. And in

80:07 that case, you may end up in a situation where the

80:12 values are a little bit messy. As you can see, i equals one,

80:20 for j equals two. As you can see, i equals ...

80:24 is missing its j values zero, one, and two,

80:30 its iterations. So the solution that was in the slides was

80:36 to make j private. In that case, all the

80:46 outputs are correct. I'll go quickly; I have just two more examples to show.
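
Returning to the loop-scoping fix, here is a minimal sketch of the private(j) version, assuming a small int grid whose ROWS and COLS sizes are illustrative (COLS = 3 echoes the j values 0, 1 and 2 above); fill_grid is a hypothetical name.

```c
#include <assert.h>

#define ROWS 4
#define COLS 3

/* The outer loop is parallel. j is declared before the loop, so without
   a clause it would be shared and threads would clobber each other's
   inner loop counter. private(j) gives each thread its own j; declaring
   it in the inner for statement would have the same effect. */
void fill_grid(int grid[ROWS][COLS])
{
    int i, j;
#pragma omp parallel for private(j)
    for (i = 0; i < ROWS; i++)         /* i is private as the loop index */
        for (j = 0; j < COLS; j++)
            grid[i][j] += 1;           /* each cell visited exactly once */
}
```

Starting from a zeroed grid, every cell ends at 1; with a shared j, some cells could be skipped, like the missing j values in the slide output.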

80:55 Now, the question here is: we have this parallel region,

81:23 and these two single statements. The question is, is there

81:29 an implied barrier here? We have a single statement here. Is there an

81:37 implied barrier with the single? Yes? No? Okay. So the

82:01 answer is yes: for the first one, there is an implied barrier here,

82:07 because when you use single, it's defined so that it has an implicit

82:15 barrier at the end. As you can see, the first print statement

82:20 executes first, no matter what. In

82:25 this case, for the second one, there is no implied barrier, because

82:29 we have a nowait clause on it, and because of that, it can happen that the second print statement

82:35 executes after the third print statement in some of the threads. So you need

82:41 to add the nowait clause, or remove it,

82:47 depending on what your application needs from the single statement.
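
A minimal sketch of the barrier behaviour, counting instead of printing so it can be checked; singles_demo and its parameters are hypothetical names. Because the first single keeps its implicit barrier, every thread that passes it sees the update made inside it; the nowait on the second single removes that guarantee for whatever follows it.

```c
#include <assert.h>

/* Returns the team size. *first and *second report how many times each
   single body ran (always 1), and *saw_first how many threads had already
   seen the first single's update when they passed its barrier. */
int singles_demo(int *first, int *second, int *saw_first)
{
    int team = 0;
    *first = *second = *saw_first = 0;
#pragma omp parallel
    {
#pragma omp single
        (*first)++;                /* one thread; implicit barrier follows */

        if (*first == 1) {         /* safe: the write completed before the barrier */
#pragma omp atomic
            (*saw_first)++;
        }

#pragma omp single nowait
        (*second)++;               /* one thread; no barrier, others move on */

#pragma omp atomic
        team++;
    }
    return team;
}
```

Moving code that depends on *second right after the nowait single would not get the same guarantee; that is the case where the third print can overtake the second.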

82:52 And this is again an example from the slides. So what would be the output of this program

83:04 I have, if I run it this way, with all these single and

83:08 task statements removed? There is no guaranteed ordering across the threads, so you

83:20 might get something like "a race to watch race car." Yeah.

83:25 And what happens if I remove the comments here? So what happens

83:29 if I do this? What would the output be now? So the tasks,

83:43 um, I thought it could be any order. So, ah, you'll

83:50 always get your set of words before another set of words, since

83:56 you have a single clause, but within

84:01 that set of words, "A" would always be printed first, and then "race" and "car"

84:06 and "is fun to watch" would be in jumbled order, I think.

84:10 Okay. Yeah. So the only incorrect thing about that is that, since we

84:15 have a taskwait here, "is fun to watch" will always be printed

84:21 after all the other tasks have completed. So "is fun to watch" will always be

84:26 at the end, in all cases. So that's what's going to happen.

84:32 So without all the single and task clauses, you can see

84:38 there are outputs from two parallel threads, not in any order. And with

84:46 the task statements, "A" is always first, and "race" and "car" can have any position,

84:53 depending on whichever task runs first, and "is fun to watch" will always be at the end.
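
A sketch of the tasking version, assuming the slide's words are "A", "race", "car" and "is fun to watch"; it appends to a buffer under critical instead of calling printf so the ordering can be checked, and race_car is a hypothetical name. single makes one thread create the tasks, the two tasks may run in either order, and taskwait holds back the last phrase until both have finished.

```c
#include <assert.h>
#include <string.h>

/* Writes "A race car is fun to watch" or "A car race is fun to watch"
   into out. Only the thread inside single appends "A " and the final
   phrase; the two task appends are serialized by critical. */
void race_car(char *out, size_t size)
{
    assert(size >= 32);            /* the sentence needs 27 bytes */
    out[0] = '\0';
#pragma omp parallel
    {
#pragma omp single
        {
            strcat(out, "A ");
#pragma omp task
            {
#pragma omp critical
                strcat(out, "race ");
            }
#pragma omp task
            {
#pragma omp critical
                strcat(out, "car ");
            }
#pragma omp taskwait
            strcat(out, "is fun to watch");   /* both tasks are done */
        }
    }
}
```

Removing the taskwait would let the last phrase appear before the task words, and in this sketch it would also race with them, since that append is outside the critical.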

84:57 That's all the examples I had. Thank you. Okay.

85:07 Any questions? And then one: you would be releasing the midterm...?

85:15 No, next week. Next week. Okay. And then you get one week

85:20 to do it. All right. Okay. So, and after that, there will

85:43 be an assignment to work on. Okay, if someone wants...

86:22 Thank you. I'll stop it myself.
