© Distribution of this video is restricted by its owner
00:04 | So, um, we'll continue with OpenMP that I started last |
|
|
00:13 | So now, discussing or talking about various constructs, um, the basic |
|
|
00:23 | as well as what's known as clauses, that is, a way of telling the |
|
|
00:31 | a bit about what should be done in response to the constructs. And |
|
|
00:39 | there's time, which I probably don't, I'll get to go through an example |
|
|
00:44 | what Detect? Because I will stop a little bit early for us to give |
|
|
00:50 | a demo for the basic OpenMP and how to use it. And |
|
|
00:57 | we'll talk a little bit more about for them or more next time. |
|
|
01:03 | first, just a quick recap of what OpenMP is all about. It's |
|
|
01:07 | shared-memory programming model, so practically it's for single-node multithreaded or multi |
|
|
01:18 | programming on the streets around it. For this class, just think of |
|
|
01:24 | as something for programming individual nodes, and it works through compiler directives. |
|
|
01:31 | then it has the support of runtime functions, and one can define the execution environment |
|
|
01:38 | environment variables, and it is a fork-join type model. Uh, |
|
|
01:44 | conceptually it is similar to the master-slave model, as was discussed last lecture. But |
|
|
01:51 | different in the sense that for OpenMP, it's just one address space, |
|
|
01:58 | separate address spaces, and by having one address space, then it can |
|
|
02:05 | done on a thread level. So it is a lot less heavyweight, so |
|
|
02:11 | speak, than typical master-slave processes spawned on potentially different nodes or |
|
|
02:23 | And the very basic one is the OpenMP parallel construct that we'll talk |
|
|
02:30 | about today, and the other part I brought up last time is that it |
|
|
02:36 | shared memory. So, uh, it requires a fair amount of attention to |
|
|
02:43 | one wants shared or not between the threads in parallel regions. And we're talking |
|
|
02:49 | that today. And this is a bit of the structure I brought up. |
|
|
02:54 | there are the control structures and work-sharing constructs. There are things |
|
|
03:02 | managing data sharing, and, of course, since there's parallelism we need |
|
|
03:07 | also synchronizing threads. And then there are runtime functions. We'll cover a little bit |
|
|
03:14 | most of these aspects today, and a little bit of the point of it |
|
|
03:22 | OpenMP. It started out very simple, uh, about, I |
|
|
03:27 | 20 years ago now. It was just a way of trying to make programming parallel |
|
|
03:34 | . Oh, shared-memory systems easier, that is, not having to deal with |
|
|
03:39 | threads directly, but creating a layer on top of it. And it's if |
|
|
03:44 | fairly simple standard, as it shows on the graph to the right. That was |
|
|
03:49 | more than about 15 pages. And now, it's more than 500 |
|
|
03:54 | just the specification of the standard. In some ways it's lucky that there |
|
|
04:00 | not that many constructs one needs to know in order to get a reasonable program |
|
|
04:04 | work. So it says here it's 21 functions or constructs one needs to |
|
|
04:11 | aware of, and we'll cover most of them today and the rest of them |
|
|
04:16 | lecture. So here's something that I showed at the end of the last |
|
|
04:22 | which, uh, shows the typical structure of a code. That one is |
|
|
04:30 | include statement for the OpenMP libraries. And then there is |
|
|
04:38 | very basic command, or directive I should say, the pragma omp parallel. That basically |
|
|
04:47 | that the next code block, in this case enclosed within the curly |
|
|
04:54 | is something that you, the programmer, want parallelized. So in this |
|
|
05:02 | it doesn't specify any particular number of threads to use for this parallel |
|
|
05:08 | And that means the operating system has freedom in deciding how many threads |
|
|
05:13 | wants to assign to this parallel region. Uh, then it shows one of |
|
|
05:23 | the runtime commands that we'll talk more about towards the end of today's lecture, |
|
|
05:30 | it's a command that gets the ID, the identity, of each thread, since they all have |
|
|
05:38 | unique identity that is many times useful. So you can make things conditional depending |
|
|
05:45 | what thread is doing what. So this is the omp_get_thread_num |
|
|
05:51 | which is the thread ID, as opposed to the number of threads. |
|
|
05:55 | a command for that too that we'll talk about. And then there's a simple |
|
|
06:00 | know, Hello World program that everybody uses as a canonical example for, um, |
|
|
06:07 | to programming on anything. So in this case, one can see what the |
|
|
06:12 | output happened to be, and what it sort of illustrates is that the different |
|
|
06:21 | are not ordered in any way. So whoever gets to anything first gets |
|
|
06:27 | chance to basically, in this case, print out whatever the statement says |
|
|
06:31 | should do. So in this case, the thread that had the identity |
|
|
06:40 | one was the one that first reached the print hello statement. But before |
|
|
06:46 | managed to print the world part of hello world, thread number zero actually ended |
|
|
06:52 | printing out hello. And so things are sort of jumbled, and it depends |
|
|
07:00 | how the different threads progress through what they are doing. So there is no |
|
|
07:05 | implied synchronization or ordering between threads unless one forces it. And the other part |
|
|
07:14 | that each thread basically executes a copy of the code in the parallel region, |
|
|
07:23 | they all have, as I said last lecture, they all have their own |
|
|
07:28 | counter, stacks, registers, etcetera. So they are each executing |
|
|
07:37 | , the code that has been And they, if there is workload sharing |
|
|
07:45 | that is happening through other mechanisms, but in principle each thread gets a copy of |
|
|
07:52 | code to execute and, um, in several slides to come, |
|
|
08:01 | talk about how variables are shared or not when it comes |
|
|
08:07 | uh, what happens in the parallel region. Mhm. So here is an |
|
|
08:15 | of all the types of constructs. There are the directives and constructs, that's the |
|
|
08:21 | flow constructs. And then there are attribute clauses that are kind of modifiers to the |
|
|
08:29 | , instructing the compiler of what, and the other compiled into the runtime |
|
|
08:38 | what one wants to happen within the regions or in response to the construct. |
|
|
08:45 | I will cover most of the constructs in these three columns on this slide |
|
|
08:52 | we go, so I will not comment on them here, but as we |
|
|
08:56 | through them. So first the work-sharing constructs, and there is the |
|
|
09:03 | one that is very commonly used. This is how to, um, parallelize |
|
|
09:11 | . And in this case, one can use the OpenMP parallel construct |
|
|
09:18 | to direct the compiler to try to parallelize, in this case, the for loop. And |
|
|
09:23 | a couple of ways of doing it. One can use omp parallel for |
|
|
09:27 | specifically and direct the compiler that the following loop is the thing that should be |
|
|
09:36 | . So that means each thread in this case gets a copy of the, um, |
|
|
09:42 | the statement, and on the next few slides, uh, I have a little bit |
|
|
09:47 | than this. Loop work is divided among threads, but after the parallel region |
|
|
09:55 | back in sequential code. So the flow on the right basically shows when there's |
|
|
10:00 | serial thing where in this Step 2000 then follows on the loop that then |
|
|
10:09 | a number of threads. Again, no specific number of threads is |
|
|
10:15 | . So the degree of parallelism, that is something determined by the operating system |
|
|
10:23 | talk about how that happens. Hopefully, we'll get to it later tonight. And |
|
|
10:30 | there's a sequential part. And then a new kind of loop doing something |
|
|
10:34 | again, and whatever follows. Here's a concrete example that tries to show |
|
|
10:41 | little bit of all these things. The sequential |
|
|
10:46 | is the simple loop with a bunch of iterations, and now to create an OpenMP |
|
|
10:56 | version of this one, one has the include for including the libraries necessary for OpenMP |
|
|
11:03 | and then the pragma that defines that the for loop now should be parallelized. |
|
|
11:11 | that means the compiler then generates code for a number of threads, as |
|
|
11:16 | illustrated in last lecture. So the programmer doesn't really have to worry |
|
|
11:23 | uh, either creating threads or synchronizing them. So another point, I guess I'll talk |
|
|
11:34 | later, is the so-called SPMD programming model that this is a good |
|
|
11:42 | of. So that means single program, multiple data. And that's |
|
|
11:48 | most common way of programming, whether shared-memory systems or distributed-memory |
|
|
11:54 | it doesn't mean that, um, each thread follows the same execution path through |
|
|
12:00 | program, because there are conditionals and such that cause the execution path to depend on |
|
|
12:08 | data that is being worked on. So even though the program is a single |
|
|
12:14 | of the same program for all threads, it doesn't necessarily mean that exactly the |
|
|
12:19 | instructions are being executed by each thread. Oh, Walter Johnson, yes. |
|
|
12:25 | this, of course, another name, single instruction instead of single program? |
|
|
12:29 | those two terms interchangeable? No. There is SIMD, that is, single |
|
|
12:42 | So that means, as the generic instruction, you add two arrays, and in |
|
|
12:50 | single add instruction that works on all pairs of elements, say, of the two arrays |
|
|
12:58 | . So it's much more restricted than single program. So in the single |
|
|
13:05 | , as I said, the code looks the same or is identical, |
|
|
13:10 | it enables conditionals and is much more flexible than SIMD. We'll talk about vectorization |
|
|
13:18 | and talk a little bit about compilation in some lecture. There is, you |
|
|
13:24 | conditionals also in SIMD. But it tends to be that, um, branches |
|
|
13:31 | should not be taken are still being executed, and then you just ignore the |
|
|
13:37 | . That's not the case when you have a single program; it just executes |
|
|
13:41 | after one. So coming back to this particular example, then, here is |
|
|
13:51 | what happens. So in this case the loop there has 1000 iterations, and by |
|
|
13:59 | default, unless you ask for anything else, it gets split up evenly among |
|
|
14:04 | threads. So in this case, it shows that the first thread gets the |
|
|
14:08 | first 250 iterations in this loop, and the next thread gets the next |
|
|
14:14 | 250, the next 250, and the last 250. So it does, |
|
|
14:21 | otherwise told anything, uh, just split up the loop index range evenly |
|
|
14:29 | the threads. Several science on this . Four threads. So that was |
|
|
14:37 | to say. And this slide basically shows that one can do the loop parallelization |
|
|
14:44 | by doing, kind of, um, an OpenMP instruction first to |
|
|
14:51 | the region and then parallelize the for inside the parallel region. Or one can, |
|
|
14:59 | on the right-hand side, combine the two, parallel and for, into parallel |
|
|
15:04 | if it's just a single loop body one wants parallelized. So let's see |
|
|
15:14 | questions on that before I talk a little bit about this data scoping |
|
|
15:23 | . Uh, so whenever you say it makes a copy, are you saying that, |
|
|
15:28 | , the preprocessor directive, we're saying that it makes literal copies of the |
|
|
15:33 | , right? That's a good question. I've actually been trying |
|
|
15:38 | find out the real answer to that and failed. So I am not |
|
|
15:44 | whether the standard forces the copying of code, but, uh, it |
|
|
15:49 | a bit unnecessary since it is a shared-memory system, so every thread has, |
|
|
15:55 | principle should have access to the code, and then it has its own instruction |
|
|
16:02 | so it knows what instruction it actually executes. So it's not clear to me |
|
|
16:09 | it's necessary to copy the code. I'm sorry, I was trying, but so |
|
|
16:14 | failed in finding a concrete answer whether it's an implementation option for the |
|
|
16:20 | writers or it is required that the code be copied. So sorry, I don't |
|
|
16:30 | a better answer at this point. Any other question? It doesn't matter for |
|
|
16:43 | for the execution of the program, but it obviously has an impact on the memory |
|
|
16:52 | . And it could have a minor impact on execution if several threads tried to access |
|
|
17:00 | same instruction at the same time, in which case they have to be |
|
|
17:10 | Unless it's so depending upon how the system works. Uh huh. |
|
|
17:18 | yes, about the scopes. So the default rule is that anything declared in |
|
|
17:28 | sequential region is available to any thread in the parallel region. Conversely, anything |
|
|
17:43 | inside the parallel region is private to the threads in the region, so things do not |
|
|
17:52 | kind of get exported out of parallel into sequential regions. But parallel |
|
|
17:58 | regions inherit things from, hmm, sequential regions. Mhm. And then there's also the |
|
|
18:08 | that for function calls or subroutines inside the region, things are private, |
|
|
18:16 | um, whatever goes on in the function or subroutine. So one needs to |
|
|
18:22 | very careful in managing what is shared and what is not, to make sure one |
|
|
18:30 | both correctness as well as more performance-related behaviors of the |
|
|
18:38 | on it. To me, this, I'd say, is the most tricky part in terms |
|
|
18:44 | OpenMP, to get it right and have it perform well. You say |
|
|
18:51 | Do you mean it's not that each has its own, just one copy amongst all of |
|
|
18:55 | . So shared in this case means there's just one storage location for the variable |
|
|
19:01 | anyone can get to, and if it's private, then there is a separate memory |
|
|
19:07 | for each thread for the variable. This is what I said. So |
|
|
19:16 | the attribute clauses then tell how variables should be treated in the parallel |
|
|
19:26 | , so I'll talk about them on the following slides. So shared is the |
|
|
19:31 | That's something that's one storage location that gets accessed by the different threads. |
|
|
19:40 | it's, ah, then storage associated with the sequential parts. Private means there's a |
|
|
19:48 | , um, memory location being allocated for the variable, even if it has |
|
|
19:54 | the same name as in the sequential region. I'll show in examples how some of |
|
|
19:59 | things work. And then there are the other two here, firstprivate and |
|
|
20:06 | private. And I'm sorry, that . What happened then, son? |
|
|
20:27 | so you guys have my share right? So I just want to |
|
|
20:29 | sure that I haven't that stuff. okay. So on So where we |
|
|
20:40 | so the firstprivate and, um, lastprivate. It |
|
|
20:49 | a little bit how things are managed when the private clause is in effect. First |
|
|
20:57 | essentially has to deal with how variables are initialized, and lastprivate, how |
|
|
21:03 | private variables are kind of exported out of parallel regions. And then the |
|
|
21:12 | , um, rule for what's shared and what's private can be overridden |
|
|
21:17 | by having the default statement that defines what the default should be a |
|
|
21:22 | to that default. So, um, shared is, again, as |
|
|
21:36 | said already, I think, in response to the question, that there is a storage |
|
|
21:41 | There's a single one that everybody can read or write, but the fact that |
|
|
21:47 | can deal with it, since threads are kind of unordered in their |
|
|
21:54 | that means they can also try to read or write something at the same |
|
|
22:01 | . And then it could also be that the outcome is undetermined in terms of |
|
|
22:06 | got access to do what. So a race condition is one problem when there are |
|
|
22:16 | , uh, variables. And it's the user's responsibility to make |
|
|
22:24 | that things end up being correct. Here is just a tiny example. In |
|
|
22:32 | case, x is defined in the sequential region, so by default it's a |
|
|
22:38 | variable, so that means all the threads in the parallel region defined through the |
|
|
22:43 | omp parallel now can update this variable, you know, increment it by |
|
|
22:51 | um, and then what x ends up being afterwards, if one |
|
|
22:57 | lucky, the threads do it one at a time, and then it just |
|
|
23:05 | the value of five plus the number of threads being used, but |
|
|
23:09 | can also be that they're trying to do it at the same time, in |
|
|
23:13 | case only one gets access and the value may not reflect the number of |
|
|
23:19 | were actually being used in the parallel region. So this is what it |
|
|
23:25 | So now, when one declares a variable to be private, that means |
|
|
23:35 | variable that most likely is in the sequential part, that has the |
|
|
23:41 | same name. But you want to use that same name in the private |
|
|
23:47 | but not share it with other threads, you declare it to be private. And |
|
|
23:53 | means a new memory location is allocated for the variable in there for each thread in |
|
|
24:00 | parallel region. The point to be noted: although it's allocated, |
|
|
24:07 | not initialized, so one has to take care to initialize it in one way |
|
|
24:12 | or another. We'll talk about ways to do it. Um, one can |
|
|
24:18 | it inherit the value that exists in the sequential section, or otherwise |
|
|
24:28 | do some form of assignment. And, as I said, because it's a |
|
|
24:32 | memory location, it is the thing to be aware of. It |
|
|
24:37 | deallocated at the end, and if you want to keep the value at |
|
|
24:41 | end of the parallel region, it's up to you. So if you |
|
|
24:46 | to export it, one uses the lastprivate clause that I was talking about. |
|
|
24:53 | here in the next couple of slides, I think. So here is now |
|
|
25:04 | clearly stated: this is not a good example; it is not something you should |
|
|
25:08 | . to copy and remember; on the contrary. So given what it says |
|
|
25:15 | top and what I have said, does anyone see the issues with |
|
|
25:20 | particular code, or piece of code? Accessing a variable that has been |
|
|
25:28 | deallocated, right? If it's been flushed by the time you leave the, or, I |
|
|
25:33 | guess it'll print the zero one, because the one that you created |
|
|
25:37 | will have been destroyed. Temp, when you exit the parallel region. Again, |
|
|
25:49 | says omp parallel for, so only the for loop |
|
|
25:53 | parallelized. At the end of the for loop, the temp that was used inside the |
|
|
26:00 | is lost, and it is the temp that was defined in the sequential region |
|
|
26:10 | is still valid. So that's why it prints out what temp was initialized to. It's the |
|
|
26:17 | And whatever happened in the parallel region is lost. Any other comments to |
|
|
26:26 | particular code? So it follows the scoping rules. Yes. Okay, |
|
|
26:35 | the other thing is that temp was not initialized. So that means who knows what |
|
|
26:39 | value is. Each thread will increment it and add 1000 or whatever value |
|
|
26:46 | was having one. It was, you know, sometimes, um, |
|
|
26:52 | default, compiler writers may actually ensure that things are initialized to zero, but |
|
|
26:57 | absolutely no guarantee what that value might be. All right, so this is |
|
|
27:06 | way, as I mentioned, to inherit the value if you use the same |
|
|
27:12 | name in the parallel regions. In this case, increments is used both inside the |
|
|
27:21 | region and outside. Um, but in the region, each thread gets its |
|
|
27:28 | , uh, memory location for increments, so every thread has its own increments variable. |
|
|
27:35 | by the firstprivate clause added to the parallel for construct, that means all |
|
|
27:43 | memory locations for increments, for each one of the threads in the parallel region, |
|
|
27:48 | get initialized to what it was in the global region. So there's 600. |
|
|
27:55 | anyone can go through and see what the loops are doing here and the |
|
|
28:01 | on the firstprivate. Okay, lastprivate is the opposite. That's in |
|
|
28:11 | to export. So this brings the value from something that has been unique |
|
|
28:24 | a thread inside the parallel region, to get it out, then one |
|
|
28:28 | the lastprivate, that is, then the last value of x among all |
|
|
28:33 | threads in that parallel region is the one that gets, um, then copied |
|
|
28:41 | into the global variable x, or sequential x. Um, they're not storage |
|
|
28:52 | associated, but that's why it's kind of a copy from the parallel region |
|
|
28:57 | to the sequential. All right, that's what I said. So now I'll try |
|
|
29:06 | ask a little bit more questions with a little example. So there are three variables, |
|
|
29:13 | B and C, that are initialized in the sequential region. And then there is |
|
|
29:19 | parallel region started, with the clauses that B is private and C is the |
|
|
29:27 | private variable. So now I'll see if you guys can help me decide what |
|
|
29:36 | are in the parallel region. So start with, I guess, A, and any |
|
|
29:46 | on what A might be in the region? Well, the default is |
|
|
29:56 | , right? So it would be shared amongst all the threads. Right. |
|
|
30:03 | , general, that's an equal Next they do, it would be |
|
|
30:07 | C. Any comments on those guys? No, no one volunteers |
|
|
30:25 | accept Island is good and commenting anyone or Yeah. Then you're welcome to |
|
|
30:35 | . B is, each one has its own copy of B, which |
|
|
30:38 | initialized. I'm sorry, uninitialized, because it doesn't have the first before it, |
|
|
30:44 | ? Yeah. Okay. Yes, . So what's he did? So |
|
|
30:49 | means, um now, let's So once one is done with the |
|
|
30:56 | region, then what are the values of A, B and C? Let's |
|
|
31:10 | after the parallel region, uh, would not have, uh I guess |
|
|
31:16 | is dependent on the operating system, the last one that wrote A, which |
|
|
31:22 | arbitrary order unless you specified it. I think it would be, uh |
|
|
31:26 | can't exactly say. Um, for B, it would be one. And |
|
|
31:36 | , C, it would also be, because we didn't use any of the |
|
|
31:40 | private clauses. Correct. So this is correct. So because |
|
|
31:48 | was exported for B and C out of the parallel region, that means whatever |
|
|
31:53 | might have, values they may have had, are kind of lost, and what there is |
|
|
31:59 | existed in the sequential region. And for A, correctly, depending on what happened to |
|
|
32:06 | in the parallel region, if nothing happened, it will retain its value. |
|
|
32:10 | . Otherwise, it will be whatever was last assigned to it in the parallel |
|
|
32:20 | . So this is, um, a critical aspect of, um, OpenMP |
|
|
32:36 | . So the rule is that the loop variable, in this little |
|
|
32:47 | , the i index in the for loop, is private to each thread. That |
|
|
32:57 | not something that has to be explicitly declared because, uh, hopefully you can |
|
|
33:03 | if it ended up being a shared variable, things would be a mess. |
|
|
33:11 | even if the loop is parallelized, different threads, you know, |
|
|
33:17 | any thread that's supposed to do, you know, a range within the |
|
|
33:24 | uh, iteration count, it wants to have its own notion |
|
|
33:30 | which i it's at. But if some other thread has another idea what i should |
|
|
33:37 | , then things will become a royal mess. So for that reason, the |
|
|
33:42 | indices, um, for loops that are parallelized, by specification have to be private |
|
|
33:54 | in this case, everything else, since nothing is defined in the |
|
|
34:00 | regions, is all shared. And, um, so this means, I |
|
|
34:09 | , which is the point that is perhaps easy to miss, is in this |
|
|
34:18 | , it's the j loop that is parallelized. If one wants nested loops, |
|
|
34:27 | there were, uh, inner loops, so to speak, in nested loops to |
|
|
34:32 | be parallelized, one has to explicitly say so. Otherwise, it parallelizes the loop |
|
|
34:38 | follows the parallel for statement. So that means, in this particular case, that |
|
|
34:47 | thread that executes some range of j indices is going to carry out all |
|
|
34:55 | iterations of the i's as well. That kind of makes sense from what the |
|
|
35:05 | programmer intended, right? For each j, you're supposed to go through the |
|
|
35:10 | range of i indices, but in this case, everything is, um |
|
|
35:22 | are maybe not literally, as we discussed earlier, but it's executed as |
|
|
35:29 | each thread has a copy of the loop, but it's not a private |
|
|
35:40 | is so. So this is what I'm trying to say. So now I think |
|
|
35:49 | have an example here we'll spend a little bit of time on, to try |
|
|
35:53 | you figure out if this thing may work or not. I think it's a |
|
|
35:59 | thing to start with trying to figure out which variables are shared and which |
|
|
36:05 | are private to threads. So let's see if anyone else is volunteering |
|
|
36:19 | time to try. You can start with i, j and |
|
|
36:28 | whether they are shared or private when it comes to the parallel regions. So |
|
|
36:50 | they are defined in the shared region, so they are shared. i, |
|
|
36:59 | on. Then we'll come to the loop. As we just went through, |
|
|
37:03 | the i in the for loop is private to each thread in order not |
|
|
37:10 | to create a total mess. is defined in the parallel region. |
|
|
37:18 | . Is private to each thread. And we have the for j loop. |
|
|
37:27 | volunteers about j, is it private or, mhm? That should be shared because it |
|
|
37:39 | declare a new j. And yes, correct, because it's not |
|
|
37:44 | loop that is parallelized, and that's why it ends up being a shared |
|
|
37:51 | So here is a little bit, we listen. That plant, a |
|
|
37:54 | one is, hopefully, to sit down on your own and look at this particular |
|
|
37:59 | and see what's shared and what's private. So, um, so as was pointed |
|
|
38:09 | since j is shared, as commented on the previous slide, that means the |
|
|
38:15 | threads, even though they have their specific range of indices that they |
|
|
38:24 | on, they all have a copy of the for j loop, and |
|
|
38:31 | is a shared variable, all of the threads can, uh, read and |
|
|
38:38 | the j variable. So that's why it becomes a mess. So this is |
|
|
38:46 | little bit hard to see. you can see it on your |
|
|
38:52 | It's a little bit faint. So this is, um, just a suggestion: |
|
|
38:58 | enough to run this code as it is with two threads and put n equal |
|
|
39:03 | four and just take it as it is. Now, if you look at this |
|
|
39:13 | , so in principle they had n equal four. That means, uh, |
|
|
39:19 | i in this goes from zero through three. For each of these i in |
|
|
39:26 | , we have five j indices. That means, in total, the |
|
|
39:36 | function should be called four times five times. But as you can |
|
|
39:43 | things terminated after seven calls, but was 10 also in there, |
|
|
39:50 | So they're not ordered, as we said, in terms of what gets printed, |
|
|
39:54 | it's not 20. So that is different threads. And, you |
|
|
39:59 | there is no kind of order somebody , Um, not there are somewhere |
|
|
40:07 | J equals one in five and then four or something. But then some |
|
|
40:14 | threads may reset the j, so then it goes back. And |
|
|
40:17 | you look at the sequence of j for a given i iteration, |
|
|
40:26 | not necessarily incremental. Um, in case, they, um what they |
|
|
40:39 | . But there is one that got j equals six, but again |
|
|
40:45 | also some i iterations didn't get all the j iterations as |
|
|
40:50 | , right? So it's kind of useful to take a look at it and see |
|
|
40:56 | the i's and j's. As I said, um, so the |
|
|
41:05 | in this case I had two threads. So thread zero gets, uh, i equals zero |
|
|
41:10 | and one, and thread one gets i index two and three, and |
|
|
41:17 | since he for I constitute, that just one guy that immediately got terminated |
|
|
41:25 | they got j six, right? So this is what |
|
|
41:34 | just went through. And then So just did this thing also then to |
|
|
41:39 | try and make j private. And you can, in fact, see |
|
|
41:46 | things are getting orderly. So for each i index, all the five j |
|
|
41:54 | iterations were done. Um, if have it story from 1 to 5 |
|
|
42:01 | did check it. So I think in this case, everything got done as |
|
|
42:05 | was intended, in terms of, most likely intended. I didn't think they wanted |
|
|
42:11 | be random. So this is it, then. And when j is private, each |
|
|
42:22 | index got its five iterations of j. Any other thing |
|
|
42:30 | you might spot. If you can see on your screen that weak |
|
|
42:38 | There's actually one more problem. Things show that a was not |
|
|
42:44 | , that it turns out in the situation, whatever was there before, |
|
|
42:47 | it's not guaranteed, as I mentioned, to be a nice value, like |
|
|
42:52 | . It could be anything at all. Any questions? Comments on that? |
|
|
43:09 | But again, it shows how easy it potentially is to make mistakes in writing OpenMP |
|
|
43:15 | code in terms of variable sharing among threads, and it can be kind of |
|
|
43:24 | to debug. That's why it's recommended that one is very explicit in |
|
|
43:31 | what variables are private and shared. If nothing else, it should make |
|
|
43:37 | debugging somewhat easier. Hmm. Declaring things that are private, or making things |
|
|
43:48 | just to be safe, however, has some costs because it means that |
|
|
43:53 | memory locations get allocated for things that don't necessarily need to be private. So |
|
|
43:59 | may have a cost in memory. And this is just, uh, what this |
|
|
44:07 | clause does. That clause overrides whatever the compiler writers |
|
|
44:13 | to be. There, default make things being very as the fault |
|
|
44:20 | being shared, and then you don't have to specify it at each of the constructs |
|
|
44:27 | may involve the variable. So this is pretty much what this clause does. So |
|
|
44:38 | , in terms of variables being private or not, there is also the |
|
|
44:46 | private. So we said that private means that the variable is private to each |
|
|
44:53 | . In a parallel region, threadprivate is somewhat different. So it |
|
|
45:01 | that variables declared to be threadprivate are private to, um, each |
|
|
45:10 | so each thread gets a memory location for that variable, but |
|
|
45:16 | is preserved outside the parallel region, and that's the main difference. So, |
|
|
45:32 | , to some degree, behaves as a global variable, but it's not identical |
|
|
45:38 | this sense: there is separate memory for each one of the threads, |
|
|
45:42 | it's not just a single variable in the sequential region, and then they also |
|
|
45:51 | to be initialized. And then one can use, um, copyin as a |
|
|
45:58 | of initializing, for example. And also the data statement that we'll talk |
|
|
46:04 | uh, later. So this one can be compared to the traditional fork call that |
|
|
46:12 | would use if you were doing this manually. Yes, the thing |
|
|
46:17 | that when you create a parallel region you have a bunch of threads, |
|
|
46:23 | ? So all of a sudden, you have a bunch of copies of |
|
|
46:31 | with the same variable name as in sequential region. So maybe, you |
|
|
46:36 | , sequential region with one variable name, and in a parallel region then you |
|
|
46:41 | say, 10. Um, and now it turns out in this particular case that |
|
|
46:52 | master's threadprivate variable is storage associated with the sequential region. But the |
|
|
46:57 | nine are not, and they still need to be initialized somehow, except the |
|
|
47:05 | thread. So, um, so one has to deal with, again, management of |
|
|
47:12 | data. And the forking does not say, uh, anything explicitly about how |
|
|
47:21 | various, uh, kind of replicated variables are allocated or initialized. So |
|
|
47:34 | have an example. Hopefully, it'll it shed some light on what this |
|
|
47:37 | means. So I guess, example. So in sequential religion, |
|
|
47:46 | defined maybe variable I and the third d and the Variable X and then |
|
|
47:55 | have this threat private statement for A X. That means now an |
|
|
48:00 | Um, we'll have dedicated copies for threat inside the parallel region that |
|
|
48:09 | but they will not be the allocated the exit off the parallel region. |
|
|
48:20 | then there is some statement here, I can have the fragment open |
|
|
48:25 | parallel, private. In that for B and the Fed, I'd |
|
|
48:30 | now local to the first parallel A, um is also has a |
|
|
48:39 | copy. Um, in the parallel , Be definitely because it was declared |
|
|
48:48 | be private, and X is a global guy through the threadprivate, |
|
|
48:56 | though each thread has its own copy. So here is now what the print |
|
|
49:01 | is generating, So that prints out A, B and X values. |
|
|
49:11 | , Hopefully, there's kind of no . So both a and B get |
|
|
49:16 | i. D and X. And one plus 1.1 multiplied by the |
|
|
49:22 | ID again. Whichever thread reaches the print statement first, we don't |
|
|
49:33 | know. So it's sort of a jumble in terms of the ordering of what the |
|
|
49:37 | are. And it's not just 123 then because the ordering is not |
|
|
49:44 | But we can see A, B and thread IDs, and they are zero |
|
|
49:51 | 30 and then they are kind of . And then look at the X |
|
|
49:57 | . That also makes sense given the thread ID. For thread zero, |
|
|
50:02 | , it's basically one plus zero. it's the one and the other one's |
|
|
50:07 | probably incurred. Well, explain this , properly, not supplied by the |
|
|
50:14 | idea and added the one. So not two. Remarkable, I guess |
|
|
50:22 | . Then there is a sequential and then I guess it's the more |
|
|
50:26 | part: what happened in the, uh, second parallel region. So |
|
|
50:39 | willing to comment a little bit on whether this print out for the second parallel region |
|
|
50:45 | makes sense. So now, since and X were decided to be or |
|
|
51:09 | sorry, to be threadprivate, that means there is now a copy for each thread. |
|
|
51:17 | And because it was threadprivate and not private, they are preserved after the |
|
|
51:24 | parallel region. So that means all A and X values that ended up |
|
|
51:32 | um, the value for each one of the threads at the end of the |
|
|
51:38 | part of the region are still in . So that's why if you look |
|
|
51:45 | the A and X values for the threads, they are exactly the same |
|
|
51:51 | they were in the first parallel region. So more or less the difference between |
|
|
51:59 | private and threadprivate is persistence among regions. And however, for B, |
|
|
52:09 | that was the private variable, so it was deallocated and basically the value is unknown when |
|
|
52:17 | you come to the second parallel region. So whatever happens to be in that |
|
|
52:23 | , uh, at the time. So here it is zero. But it depends |
|
|
52:30 | actually is there. So it's not guaranteed that it would be zero. |
|
|
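The persistence described above can be sketched in C roughly as follows. This is a minimal sketch and not the code from the slide: the helper name `threadprivate_mismatches`, the team size of 4, and the serial fallback stubs are assumptions made for illustration. The variables `a` and `x` and the expression `1.0 + 1.1 * tid` follow the example as described.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks (assumption: lets the sketch build without OpenMP). */
static int omp_get_thread_num(void) { return 0; }
static void omp_set_dynamic(int flag) { (void)flag; }
#endif

/* One copy of a and x per thread; the copies outlive a parallel region. */
static int a;
static double x;
#pragma omp threadprivate(a, x)

/* Hypothetical helper: returns how many threads find a or x changed
   when they enter a second parallel region. */
int threadprivate_mismatches(void)
{
    int mismatches = 0;

    /* Persistence needs the same team size in both regions. */
    omp_set_dynamic(0);

    #pragma omp parallel num_threads(4)
    {
        int tid = omp_get_thread_num();
        a = tid;
        x = 1.0 + 1.1 * tid;
    }

    /* A variable listed in private(...) would hold an undefined value here;
       the threadprivate copies still hold what each thread stored above. */
    #pragma omp parallel num_threads(4) reduction(+:mismatches)
    {
        int tid = omp_get_thread_num();
        double diff = x - (1.0 + 1.1 * tid);
        if (diff < 0.0)
            diff = -diff;
        if (a != tid || diff > 1e-12)
            mismatches++;
    }
    return mismatches;
}
```

Calling `threadprivate_mismatches()` from a `main` built with OpenMP enabled (for GCC, `-fopenmp`) is expected to return 0, since every thread finds its own `a` and `x` unchanged.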
52:38 | So this is just showing the copyin, just initializing the threadprivate variable |
|
|
52:48 | using copyin, and thereby copying the value from the global to each, |
|
|
52:55 | , one of the threads. And there is another one that allows threads to |
|
|
53:04 | share values assigned so one thread can . Ah, it's trend private |
|
|
53:15 | too. Or rather, I should say not threadprivate but private variables, to |
|
|
53:21 | other threads. So there is a way of copying or broadcasting values among threads in |
|
|
53:30 | what's known as a team, which I haven't mentioned all that much. But I |
|
|
53:34 | talk a little bit more about Maybe not so much today, but |
|
|
53:38 | time. But it's basically the team the collection of threads created for a |
|
|
53:44 | region when you have nested things that a little bit more complicated. But |
|
|
53:51 | more so when we talked about Andi, that's what we're talking |
|
|
53:57 | hopefully next time. Otherwise the lecture next there is more work sharing constructs |
|
|
54:09 | the parallel for we talked about. There are three more that are good to |
|
|
54:13 | . The master, single and sections. The master simply says that whatever |
|
|
54:21 | of code, that is, designated to be, um, or |
|
|
54:32 | enclosed by what they call the pragma, which in this case is the omp |
|
|
54:35 | master, that says that that code should only be executed by the |
|
|
54:41 | thread. There may be reasons why one wants to designate some piece of code |
|
|
54:49 | to effectively be sequential, but more so than that. Um, |
|
|
54:55 | it is only the master thread that can execute it. If one is more flexible with |
|
|
55:00 | one of the threads, even though one at the time or only single |
|
|
55:05 | , it's not one at the Sorry, I shouldn't have said that |
|
|
55:09 | the other threads ignore it and just skip that piece of code and continue. |
|
|
55:16 | The other construct: if one doesn't care which thread executes it, then one uses a single |
|
|
55:22 | construct that is similar to the master, except it doesn't earmark the master thread |
|
|
55:28 | be the one. So in this case, any one of the threads can be |
|
|
55:33 | , and the sections construct is giving out blocks of code to different |
|
|
55:40 | . So this shows an example that has three pieces of code, X, Y and |
|
|
55:45 | Z, and it designates the X calculation to one thread, Y to another |
|
|
55:52 | and the Z calculation to a third. And as it says here, |
|
|
55:58 | if there are, um, more sections than threads, then the threads take turns until |
|
|
56:03 | they are exhausted in terms of sections. And if there are fewer sections than |
|
|
56:10 | threads, then some threads become idle because they don't get assigned anything. |
|
|
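The three constructs just described (master, single, sections) can be sketched together in C as follows. This is an illustrative sketch only: the helper name `worksharing_demo`, the team size of 4, and the placeholder values standing in for the X, Y and Z calculations are assumptions, not the code from the slide.

```c
/* Hypothetical helper: out[0..2] stand in for the X, Y and Z calculations;
   the two counters record how often the master and single blocks ran. */
void worksharing_demo(double out[3], int *master_runs, int *single_runs)
{
    int m = 0, s = 0;

    #pragma omp parallel num_threads(4)
    {
        /* Executed by the master thread only; the other threads skip it. */
        #pragma omp master
        m++;

        /* Executed by exactly one thread, whichever gets there first. */
        #pragma omp single
        s++;

        /* Each section is handed to one thread of the team. */
        #pragma omp sections
        {
            #pragma omp section
            out[0] = 1.0;   /* stands in for the X calculation */
            #pragma omp section
            out[1] = 2.0;   /* stands in for the Y calculation */
            #pragma omp section
            out[2] = 3.0;   /* stands in for the Z calculation */
        }
    }
    *master_runs = m;
    *single_runs = s;
}
```

With a team of four threads and three sections, one thread gets no section, which matches the idle-thread case mentioned above. Both counters end up at 1 regardless of the team size.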
56:22 | there's the flow control constructs Uh, , and that's maybe I'll go through |
|
|
56:31 | on. Then I would probably leave to so yesterday. The demo. |
|
|
56:39 | the flow control constructs Is there one at least the first one the barrier |
|
|
56:47 | may be obvious in probable execution Come that month. But we need a |
|
|
56:55 | threads occasionally, depending upon the logic the program to synchronize before anything else |
|
|
57:01 | . And the barrier is the most way of doing that. Basically, |
|
|
57:07 | , um, it's a synchronization point. Threads that get there earlier wait |
|
|
57:16 | until the last thread has arrived at the same point in the code. Critical |
|
|
57:23 | and atomic have to deal with things that can only be executed by one thread at a time. |
|
|
57:28 | talk more about what these things are, and then a conditional if and nowait. |
|
|
57:35 | So nowait is for when synchronization is not required. Um, one can change |
|
|
57:42 | default behavior for some other constructs, , not have an implicit barrier and |
|
|
57:52 | about these things. So the barrier a simple example Here, that one |
|
|
57:59 | , uh, parallel section original Um, but in that case, |
|
|
58:07 | before I get to do the B equals, one wants A to be fully computed. |
|
|
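The dependence just described, where B must not be computed until A is complete, can be sketched in C as follows. This is a minimal sketch under assumptions: the helper name `barrier_demo`, the array length `N`, the team size of 4, and the particular formulas for `a` and `b` are illustrative and not taken from the slide.

```c
#define N 1000

/* Hypothetical helper: returns the number of wrong entries in b,
   which should be 0 when the barrier is in place. */
int barrier_demo(void)
{
    static double a[N], b[N];
    int wrong = 0;

    #pragma omp parallel num_threads(4)
    {
        /* nowait drops the barrier that normally ends a worksharing loop. */
        #pragma omp for nowait
        for (int i = 0; i < N; i++)
            a[i] = 2.0 * i;

        /* No thread continues until every thread has finished its part of a. */
        #pragma omp barrier

        /* b[i] reads an element of a that another thread may have written. */
        #pragma omp for
        for (int i = 0; i < N; i++)
            b[i] = a[N - 1 - i] + 1.0;
    }

    for (int i = 0; i < N; i++)
        if (b[i] != 2.0 * (N - 1 - i) + 1.0)
            wrong++;
    return wrong;
}
```

Removing the `barrier` line while keeping `nowait` would allow a fast thread to read parts of `a` that a slower thread has not written yet.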
58:14 | that means all threads working on A have to have done their work before you |
|
|
58:19 | do anything with B. So that's what the barrier says. So I think there's supposed |
|
|
58:27 | to be no magic, um, about this. Let's see what I had in |
|
|
58:33 | for this one. Yes. So this case, um, when man |
|
|
58:41 | so there's a conditional: only, um, the thread with ID zero is the |
|
|
58:49 | that incumbent find that is a global . So depending upon what happens between |
|
|
58:58 | threads between the prince statement, uh, it generates to sort of |
|
|
59:07 | depending upon whether thread zero has incremented or not. Whereas after the barrier |
|
|
59:16 | guaranteed that things will be fine. , so that's what is that the |
|
|
59:27 | think there's a mutual exclusion type construct. So that means one piece of |
|
|
59:34 | can only be executed by one thread at a time. So in this |
|
|
59:41 | it's not the case that thread saying except one that get to do |
|
|
59:46 | uh, all threads get to do it, but only one at a time. |
|
|
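The mutual exclusion described here can be sketched in C as follows. The helper name `critical_sum`, the team size of 4, and the summed values are assumptions for illustration; the shared variable is called `res` after the variable mentioned in the lecture.

```c
/* Hypothetical helper: every iteration updates the shared variable res,
   and the critical construct lets only one thread do so at a time. */
long critical_sum(int n)
{
    long res = 0;

    #pragma omp parallel for num_threads(4)
    for (int i = 1; i <= n; i++) {
        #pragma omp critical
        res += i;   /* all threads execute this, but never simultaneously */
    }
    return res;
}
```

For example, `critical_sum(1000)` returns 500500 for any team size. Without the `critical` line the updates of `res` would race and the result could be too small.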
59:53 | in this case, res is a shared variable. So everybody wants to update |
|
|
60:02 | . So what? This construct, , enables. Or it gives correctness |
|
|
60:10 | in this case, the race condition is eliminated. Otherwise, all the different |
|
|
60:15 | threads may, or at least some of the threads may, try to update res at |
|
|
60:19 | same time, and only one will . In this case, all threats |
|
|
60:24 | do the increments of the res variable in the critical, but only one at a |
|
|
60:31 | and then thistle is probably I'll talk that. Next time I'll just finish |
|
|
60:39 | the atomic one. And then I suggest to the demo. So the |
|
|
60:45 | is somewhat similar to the critical, except it refers to individual memory locations. Critical |
|
|
60:56 | can be used for a code segment that is executed by just one thread at a |
|
|
61:01 | time, whereas atomic is just, um, updates or reads or actions on a |
|
|
61:11 | single memory location. So it could have been used in the other example |
|
|
61:16 | since it was only one variable, the res variable. But so I guess maybe |
|
|
61:23 | was a for example. So I then, uh, stop here and |
|
|
61:29 | talk about this next time. So get time to do the demo, |
|
|
61:38 | then it comments a question while so , maybe getting himself ready. |
|
|
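The atomic construct discussed just before the demo can be sketched in C as follows. The helper name `atomic_count` and the team size of 4 are assumptions for illustration. The update touches a single memory location, which is the case atomic is meant for.

```c
/* Hypothetical helper: counts n loop iterations in a shared variable,
   protecting the single memory location with atomic instead of critical. */
long atomic_count(int n)
{
    long res = 0;

    #pragma omp parallel for num_threads(4)
    for (int i = 0; i < n; i++) {
        #pragma omp atomic
        res++;      /* one indivisible update of one memory location */
    }
    return res;
}
```

A call such as `atomic_count(100000)` returns 100000. A critical section would give the same result here, since only one variable is updated.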
61:44 | So you talked about the difference between process and a thread yesterday? |
|
|
61:49 | but what kind of, uh what we be aware of? When when |
|
|
61:55 | , Um, in the sense that I was going through operating systems, |
|
|
62:00 | would say the TID and the PID, for process ID versus thread ID. |
|
|
62:05 | there any nuances aside from the ones you mentioned on Monday that we should |
|
|
62:10 | aware of in terms of the difference a process. And if you |
|
|
62:15 | the only thing I can think of that threads have a lot less baggage |
|
|
62:20 | processes. So that's why they tend to, um, be more efficient in execution and |
|
|
62:29 | also in terms of memory required because process has a lot more information associated |
|
|
62:36 | it than a thread. Okay, so sort of the lightweight thing that you were |
|
|
62:43 | about last time. Six. So why I, for instance, one |
|
|
62:51 | use, which I was talking about in the lectures, the Message Passing Interface, or |
|
|
62:57 | MPI for short, that is process based. So one can certainly use |
|
|
63:04 | P. I for doing, no programming and have multiple threads on |
|
|
63:13 | parallel processing by using MPI instead of OpenMP, but it is at the process |
|
|
63:19 | level. So it's much more overhead than using OpenMP. Okay, I |
|
|
63:31 | can take it up on that, next time. And let's just do |
|
|
63:38 | thing here. It's, um So everyone see my screen? Yeah. |
|
|
63:48 | , great. So today I will give some basic demo, uh, |
|
|
63:54 | for the open MP constructs in the next lecture, we'll see some of |
|
|
64:00 | more advanced constructs that Johnson talk towards end. Like critical and atomic. |
|
|
64:05 | those stuff, we'll see them in next lecture again. So this demo |
|
|
64:10 | be mostly Q and A based. I will ask questions based on the |
|
|
64:17 | codes I will show you, and feel free to guess what the |
|
|
64:21 | . What outputs will be eso before start with that, uh, some |
|
|
64:27 | information about the code when you want to have OpenMP |
|
|
64:33 | support with it and run multithreaded. So with OpenMP, the first |
|
|
64:38 | you need to know is you need header file up here. So you |
|
|
64:43 | your C programs. So that's omp.h. You need to include that and |
|
|
64:50 | you can start using all the OpenMP constructs and OpenMP function calls. To |
|
|
64:58 | compile a program with OpenMP, you can use either the Intel compiler |
|
|
65:05 | or the GNU GCC compiler. With the Intel compiler, you can, uh, compile |
|
|
65:12 | your code just like you would normally. But you need to add one extra flag |
|
|
65:16 | , which is -qopenmp. If you happen to use |
|
|
65:24 | GCC, then this flag becomes -fopenmp. So that's just the only difference |
|
|
65:30 | between Intel and, uh, GCC. We'll use icc with |
|
|
65:39 | -qopenmp, provide our source code and the output file name, and just run |
|
|
65:46 | it. So first question here uh, to have all of you |
|
|
65:53 | what will be the print state output this print statement and what and how |
|
|
66:01 | will be the outputs for the second statements Notice that this this call here |
|
|
66:09 | being has been commented out. So guesses what would be doubt, But |
|
|
66:19 | first print statement, uh, it would get the number of threads that the |
|
|
66:28 | machine is capable of, right? It's in the serial, um, |
|
|
66:33 | of the program. Okay, I'm saying it's right or wrong, but |
|
|
66:38 | , but what about the second print ? Um, not sure exactly how |
|
|
66:45 | works. But assuming parallel defaults to max number of threads Okay, then |
|
|
66:50 | would print the number of the thread total. So you should you should |
|
|
66:57 | the number of print statement should match output that you got from the serial |
|
|
67:02 | statement. Okay, so a Z see this. I'm giving this them |
|
|
67:07 | the Bridges compute nodes, which have, uh, of course, |
|
|
67:11 | two processors with 14 cores each. So in total, we have 28 hardware |
|
|
67:18 | . So let's see what happens when we run this. So this is the |
|
|
67:27 | with the first print statement. So a serial region will always print the number of threads |
|
|
67:33 | as one when you call omp_get_num_threads because, |
|
|
67:38 | in a serial region your program, or process, whatever you want to call it, it only |
|
|
67:43 | has the master thread. So whenever you call it in a serial section, it will |
|
|
67:47 | only return one. Now, when you have this pragma omp parallel |
|
|
67:55 | , it actually depends. It depends on what the value of, uh, this environment |
|
|
68:08 | variable, that is OMP_NUM_THREADS, is. So see here that this |
|
|
68:15 | environment variable is not set on Bridges. On Stampede, if you go and check, |
|
|
68:19 | it will be set to one. Since here it is not set, the |
|
|
68:26 | omp parallel section will take the number of threads to be the |
|
|
68:31 | number of hardware threads that are available. So in this case, that's 28. On |
|
|
68:38 | Stampede, if you just simply go and run it, it will likely print |
|
|
68:43 | second print statement only once, because default, open MP has been configured |
|
|
68:49 | have this environment variable as one on to. So it depends on how |
|
|
68:55 | the Open MP has been configured. , let's say if I go ahead |
|
|
69:03 | remove, uh, this comment here, and let's set the number of |
|
|
69:09 | threads explicitly. Now, what that does is, the, uh, omp_set_num_threads |
|
|
69:18 | call supersedes the environment variable, so it can override the value that is in |
|
|
69:26 | the environment variable. And now if you run it, your parallel region will only have |
|
|
69:32 | eight threads. So it does not matter if your environment variable had a different |
|
|
69:39 | value. The omp_set_num_threads call will always override the value held for that |
|
|
69:45 | variable. Does that make sense? this this goes off or I'm not |
|
|
69:53 | it just has hyper threading enabled enabled . Bridges does not have hyper threading |
|
|
69:58 | . So on stampede, it will 96 I believe, uh, print |
|
|
70:06 | If you don't set anything. Yeah. The second example here is |
|
|
70:16 | a few OpenMP calls. So omp_get_max_threads: |
|
|
70:23 | it prints the maximum hardware threads that you can have. omp_get_num_procs |
|
|
70:30 | here is, um, ah, it returns the number of cores that |
|
|
70:40 | you have. omp_get_thread_limit: that's the maximum number of threads that you can |
|
|
70:46 | spawn. And number of places: OpenMP defines places by default as a |
|
|
70:55 | . So can either have a value thread a socket. Oh, are |
|
|
71:04 | thread, core, socket or node. So by default, that's, uh, set to |
|
|
71:10 | . And since we have only one access right now it will count number |
|
|
71:17 | places as one. And then inside , uh, that place I'd It |
|
|
71:27 | then tell you the number of procs, the number of cores that are present. |
|
|
71:32 | compiling this one again eyes same as previous one. And as you can |
|
|
71:38 | , Max Threads, we have 28 threads. We have 28 cores. |
|
|
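Two of the runtime calls from the first demo program can be sketched in C as follows. This is not the demo source: the helper name `thread_counts` and the serial fallback stubs are assumptions for illustration, while `omp_get_num_threads` and `omp_set_num_threads` are the calls named in the demo.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks (assumption: lets the sketch build without OpenMP). */
static int omp_get_num_threads(void) { return 1; }
static void omp_set_num_threads(int n) { (void)n; }
#endif

/* Hypothetical helper: reports the team size seen in the serial part
   and inside a parallel region after requesting a thread count. */
void thread_counts(int requested, int *serial, int *in_parallel)
{
    int team = 0;

    /* Outside a parallel region only the master thread exists. */
    *serial = omp_get_num_threads();

    /* Takes precedence over the OMP_NUM_THREADS environment variable. */
    omp_set_num_threads(requested);

    #pragma omp parallel
    {
        /* One thread records the size of the team that was created. */
        #pragma omp single
        team = omp_get_num_threads();
    }
    *in_parallel = team;
}
```

After `thread_counts(8, &s, &p)`, `s` is 1 and `p` is normally 8, although the runtime is allowed to create a smaller team.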
71:45 | is the thread limit supported by the OpenMP runtime. We have the number of |
|
|
71:51 | places. So since we have only one node, and the ID of the |
|
|
71:56 | place number in the whole place so since we only have one |
|
|
72:00 | so it's ideas, said Thio. uh now the next exam called It's |
|
|
72:11 | one. So again, question for . Uh, we have this call |
|
|
72:18 | setting the number of threads here to 28, and we have this for loop that's parallelized |
|
|
72:25 | with 16 iterations. So my first question is, how many threads will be spawned for |
|
|
72:32 | case, since the index is lower than the number of threads, it would probably spawn 16 |
|
|
72:45 | of them and give one iteration to each one, right? So it's, |
|
|
72:51 | actually not true, because it will spawn 28 threads. And when it |
|
|
72:59 | or reaches this parallel section, the OpenMP runtime will |
|
|
73:03 | determine how much work can be, uh, distributed amongst the threads. |
|
|
73:10 | in this case, only 16 threads get to perform the work because it tries |
|
|
73:15 | to evenly distribute work across the threads, and the rest of the threads will stay |
|
|
73:22 | idle or, depending on what the OpenMP runtime does with them, they may, |
|
|
73:28 | uh, just get deleted. Let's go ahead and run it. |
|
|
73:36 | As you can see, we have 16 threads that are performing work, and each |
|
|
73:42 | thread has been assigned one iteration of those 16 iterations of the for loop. |
|
|
73:53 | let's say we set the number of threads lower than the number of iterations. Now what's |
|
|
74:01 | going to happen? It'll give four per thread. Correct, |
|
|
74:14 | so it will spawn four threads. And again it will try to evenly distribute |
|
|
74:19 | the work across all the threads. in this case that, uh, |
|
|
74:25 | , um, there will be four threads and each will perform four iterations of the |
|
|
74:31 | for loop. And notice that here the iterations that are assigned to each |
|
|
74:38 | thread are sequential. So 0, 1, 2 and 3 were assigned to thread zero. And so |
|
|
74:44 | on: 4, 5, 6, 7 to thread one. So for the previous example, |
|
|
74:50 | I noticed the iteration and the thread ID matching, but I'm assuming that's implementation |
|
|
74:54 | defined, right, not part of the . It doesn't matter which thread gets which |
|
|
74:58 | portion of the work. Yes. there is a option. Uh, |
|
|
75:06 | modify this behavior, uh, which can be set by the clause that's called |
|
|
75:12 | schedule, which we will discuss, I believe, in a coming lecture. So by default, |
|
|
75:18 | it tries to do a static schedule. So in that case, the iteration |
|
|
75:25 | numbers and threads match. But if you set the scheduling to dynamic, that will |
|
|
75:31 | change. Okay. Well, so we'll see that clause next time. |
|
|
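The block-wise assignment of iterations to threads seen in this demo can be sketched in C as follows. This is a sketch under assumptions: the helper name `record_owners`, the explicit `schedule(static, 4)` clause, and the serial fallback stubs are illustrative choices. The demo itself relied on the default schedule, and the explicit chunk size is used here only so that the mapping is fully specified.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks (assumption: lets the sketch build without OpenMP). */
static int omp_get_thread_num(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

#define ITERS 16

/* Hypothetical helper: stores in owner[i] the ID of the thread that ran
   iteration i and returns the size of the team that was actually created. */
int record_owners(int owner[ITERS])
{
    int team = 1;

    #pragma omp parallel num_threads(4)
    {
        #pragma omp single
        team = omp_get_num_threads();

        /* Chunks of 4 consecutive iterations, dealt out in thread order. */
        #pragma omp for schedule(static, 4)
        for (int i = 0; i < ITERS; i++)
            owner[i] = omp_get_thread_num();
    }
    return team;
}
```

With four threads, iterations 0 to 3 belong to thread 0, iterations 4 to 7 to thread 1, and so on. Changing the clause to `schedule(dynamic)` would make the mapping vary from run to run.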
75:41 | Uh, so yeah, So this example is a little bit interesting, |
|
|
75:45 | I'll give one minute for everyone to . Look through it, um, |
|
|
75:51 | see what's going on. So just to give a summary, we have, ah, we |
|
|
75:57 | have two arrays, A and B. I set, uh, the first element of this array to |
|
|
76:04 | zero. This is the initialization loop, where we initialize A with some values. Then |
|
|
76:11 | we come out of that and set the number of threads as four. Uh, |
|
|
76:17 | we have our for loop, which has N iterations. In our case, the iterations are |
|
|
76:22 | eight. So four threads and eight iterations. Now, the interesting part is this |
|
|
76:28 | operation. So can anyone tell me might be wrong with this with this |
|
|
76:38 | or operation here? So it seems i is a shared variable, |
|
|
76:49 | Mhm. So, uh, it be It will be updating and unexpected |
|
|
76:57 | because, for all we know, that access to a[i] might be a different |
|
|
77:01 | i than the assignment to b[i], right? So i actually is a |
|
|
77:07 | private variable because since we signed the that loop indexes are private two |
|
|
77:16 | But yes, your answer is partly . Because let's say a zero, |
|
|
77:24 | , went to thread zero, but let's say b[0] and, |
|
|
77:34 | b[1] and b[0] went to thread zero, and b[2] went to the |
|
|
77:42 | next thread. But the next thread also need access to be one that |
|
|
77:48 | I minus one and in case the thread happens to execute this instruction after |
|
|
77:58 | before the first thread, then the output may be messed up. To simply say |
|
|
78:04 | that there is a data dependency amongst the loop iterations. And if the |
|
|
78:12 | threads do not operate in sequence, uh, your output may not |
|
|
78:17 | be correct. Let's just go ahead and run it. So as you can see, |
|
|
78:25 | you just in the 1st 1st, , you can see that there's for |
|
|
78:32 | thread two, it calculates b[5] using a[5] and b[4]. But |
|
|
78:40 | if you notice, b[4] has not been calculated yet; it gets calculated down here by |
|
|
78:46 | thread one, and the value that the thread took for b[4] was zero, which should |
|
|
78:54 | not, which is not correct. that's why our output bees, they |
|
|
78:59 | not come out correct. So the takeaway here is that if you have data dependencies |
|
|
79:04 | in your loop iterations and you are not careful about that, your outputs may not |
|
|
79:11 | come out correct. Does that make ? Yes. So you said the |
|
|
79:19 | indices are always private. 44 right in This is that private. But |
|
|
79:27 | if one thread tries to access an, um, element that belongs to some other |
|
|
79:33 | thread, then there is a data dependency between those two threads. Okay? |
|
|
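The loop-carried dependence discussed in this example can be sketched in C as follows. This is a reconstruction under assumptions, not the demo source: the function names, the array length `LEN`, and the use of `int` arrays are illustrative, and the update `b[i] = a[i] + b[i - 1]` is inferred from the description that b[5] is computed from a[5] and b[4].

```c
#define LEN 8

/* Serial reference: iteration i needs the b[i-1] written by iteration i-1. */
void prefix_serial(const int a[LEN], int b[LEN])
{
    b[0] = 0;
    for (int i = 1; i < LEN; i++)
        b[i] = a[i] + b[i - 1];
}

/* The same loop with a worksharing directive. This version is incorrect on
   purpose: a thread may read b[i-1] before the thread that owns iteration
   i-1 has written it, so the result is not reliable. */
void prefix_racy(const int a[LEN], int b[LEN])
{
    b[0] = 0;
    #pragma omp parallel for num_threads(4)
    for (int i = 1; i < LEN; i++)
        b[i] = a[i] + b[i - 1];
}
```

With `a[i] = i`, `prefix_serial` produces 0, 1, 3, 6, 10, 15, 21, 28. The output of `prefix_racy` depends on thread timing and should not be relied on.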
79:39 | that happens in this case because of the, uh, i minus one. Right. |
|
|
79:46 | . Uh uh. Next sounds are . Is this one? So we |
|
|
79:55 | , in this case, a simple nested, uh, parallel region. |
|
|
80:02 | as you can see, uh, there are two main threads that are |
|
|
80:07 | spawned for the outer region, and then each of the, uh, threads in the |
|
|
80:15 | outer region spawns two more threads using a clause, which is num_threads. |
|
|
80:23 | So my first question is, how many threads, uh, do you think |
|
|
80:29 | in total? I want to stay , but I'm not sure if that's |
|
|
80:39 | too obvious. It was mhm. . Anyone else? So intuitively. |
|
|
80:53 | it again. Six. Uh, , so it wants. Uh, |
|
|
81:00 | was right. There will be four spawned. So but yes, |
|
|
81:05 | logically, if you just look at code, you would you would say |
|
|
81:08 | six threads. But that's not how the OpenMP runtime works. So if |
|
|
81:13 | if you just think logically, there's two outer threads, each spawning two |
|
|
81:17 | threads. So that's four more, uh, plus the two |
|
|
81:24 | main threads, that becomes six threads. But the OpenMP runtime, what |
|
|
81:30 | does is, or I should fortunately, what it does is since |
|
|
81:35 | tries to reuse the already spawned threads. So since it already spawned two threads |
|
|
81:41 | for the outer section, it just spawns two more threads to fulfill the requirement of |
|
|
81:46 | four threads inside, uh, the nested section. So what it does is |
|
|
81:52 | saves the all the software stack or and everything for the outer outer region |
|
|
82:00 | use it reuses those two threads that for the outer outer region for the |
|
|
82:07 | the region as well. So the is, in total, there will |
|
|
82:11 | four threads. Now, a second . What do you think will be |
|
|
82:18 | output off off this program if I run it like this? How many |
|
|
82:26 | statements will you see? There is one print statement here and one print statement |
|
|
82:31 | . So how many print statements will ? Will come out on console in |
|
|
82:40 | case would be six, right? Uh, I wouldn't have asked if |
|
|
82:47 | it was so simple. So in case, what happens is you got |
|
|
82:54 | out zero, out one. So you got two print statements for the outer |
|
|
82:59 | threads. But you only got, uh, two print statements for the inner |
|
|
83:07 | region. The reason being, uh, to enable working of nested parallel regions |
|
|
83:14 | , you need to set this environment variable, which is OMP_NESTED, |
|
|
83:19 | to true. Otherwise nested regions will not work, and this is important because |
|
|
83:25 | it will be used in your next . So make sure you do that |
|
|
83:30 | you work with nested parallel regions. Now, once you set that, you will |
|
|
83:34 | see the correct output. So you see out zero and out one for the |
|
|
83:39 | print statement for the outer two threads, and then you get, for outer zero, |
|
|
83:47 | you see in zero and in one the same for outer first thread. |
|
|
83:53 | see, uh, in zero and one, uh, one more thing |
|
|
83:59 | take away from this example is that thread IDs inside, ah, parallel regions or |
|
|
84:07 | a nested parallel region are private, so as you can see, outer |
|
|
84:12 | zero has in zero and in one as well. And outer one has in zero |
|
|
84:16 | and in one as well. These are not two and three; these are zero |
|
|
84:20 | and one. So, uh, the IDs that are assigned to threads inside a nest |
|
|
84:27 | of regions are unique for each region. Any questions on that? |
|
|
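The nested region from this demo can be sketched in C as follows. This is a sketch under assumptions, not the demo source: the helper name `nested_demo`, the counters, and the serial fallback stubs are illustrative. The demo enabled nesting through the `OMP_NESTED` environment variable; the sketch swaps in the runtime call `omp_set_max_active_levels(2)` so that it does not depend on the environment.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks (assumption: lets the sketch build without OpenMP). */
static int omp_get_thread_num(void) { return 0; }
static void omp_set_max_active_levels(int n) { (void)n; }
#endif

/* Hypothetical helper: counts how many times the outer and inner print
   statements execute. */
void nested_demo(int *outer_hits, int *inner_hits)
{
    int outer = 0, inner = 0;

    /* Swapped-in alternative to exporting OMP_NESTED=true. */
    omp_set_max_active_levels(2);

    #pragma omp parallel num_threads(2)
    {
        int out_id = omp_get_thread_num();
        printf("out %d\n", out_id);
        #pragma omp atomic
        outer++;

        #pragma omp parallel num_threads(2)
        {
            /* Thread IDs restart at 0 inside each inner team. */
            printf("out %d, in %d\n", out_id, omp_get_thread_num());
            #pragma omp atomic
            inner++;
        }
    }
    *outer_hits = outer;
    *inner_hits = inner;
}
```

With nesting active, the expected counts are two outer and four inner print statements. If nesting is disabled, each inner region runs with a single thread and only two inner lines appear, as in the first run of the demo.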
84:35 | might. We're kind of out of . So thank you to kind of |
|
|
84:40 | problem we can. It was too . Can really do it a little |
|
|
84:45 | next time. Uh, structure, there's a quick one that you want |
|
|
84:51 | comment. I had one more question on the assignment. Um, so RAPL |
|
|
84:58 | , uh, both CPU energy and main memory energy DRAM. Um, |
|
|
85:07 | the assignment asks us to compare it to the thermal design power that's provided by Intel. |
|
|
85:14 | , and Intel provides it at the level. So are we more or |
|
|
85:18 | throwing away our value for DRAM and using the one for the CPU? |
|
|
85:26 | guess it provides it for the But we're using one core right on |
|
|
85:31 | being said if we're trying to make accurate measurements, we would divide that |
|
|
85:37 | CPU thermal density by 24. Since only using one core on our |
|
|
85:48 | Yes, on. That's, um okay. Uh, Thio do that |
|
|
85:55 | long as you explain what to do Yeah, later on, I think |
|
|
86:04 | some of becoming assignments, it will or course. So I guess it's |
|
|
86:08 | long way of saying that in addition the core energy, there is also |
|
|
86:14 | parts of the chip that consumes So the energy consumption, um, |
|
|
86:20 | the packages in the inter concept is , um, directly, um, |
|
|
86:30 | to the number. Of course, a bunch of off. Um, |
|
|
86:35 | what's known as the uncore parts of the chip, that, uh, consume power that |
|
|
86:43 | is not proportional to the number of cores that are being used. |
|
|
86:47 | So But for now, for this , you know, long as you |
|
|
86:52 | us what your reasoning is, it's . Later on, we'll kind of |
|
|
86:56 | it a bit. Okay, in terms of the DRAM, that's a |
|
|
87:02 | separate product, so Intel doesn't give you any power number or |
|
|
87:08 | max power number for the memory. it's at this point most of for |
|
|
87:15 | information to see a little bit the between chip processor chip memory, power |
|
|
87:23 | the memory power, and it's possible that one goes on the web and finds what |
|
|
87:30 | the power consumption is for one of the DIMMs. But, uh, then you |
|
|
87:35 | need to go and look at how many DIMMs there are on this particular platform and |
|
|
87:40 | find the specs for those DIMMs. And that was more than we intended, or I wanted |
|
|
87:46 | you to do for this assignment. , yes, that's their own |
|
|
87:51 | Depending on about generation of memory chips is using, um, then it's |
|
|
87:57 | just the size on how many gigabytes is that you have, but also |
|
|
88:02 | scene. Most technology generation being used typically is in the 4 to |
|
|
88:08 | watt range for DIMMs when they are active; the idle energy for DIMMs is |
|
|
88:18 | lower. Okay, yeah, it's true, the General and today |
|
|
88:24 | more than expected. Yeah, that's mouthful, but it's good to know |
|
|
88:31 | had a questions following. So in data, it it depends upon just |
|
|
88:38 | data miss rate, the total cache misses. It doesn't depend upon the load |
|
|
88:45 | Lord of Wars to Mrs So barely the cash Lord Mrs and distort Mrs |
|
|
89:01 | . I'm trying to see if I got your question correct. So the |
|
|
89:06 | question was how to estimate the, the memory traffic. Is that a |
|
|
89:18 | Effectively The question waas Yeah, the rate off the cash. It depends |
|
|
89:25 | the total misses, right? Total cache misses. They don't, they don't depend upon |
|
|
89:31 | load or the store misses. Well, the total cache misses depend on |
|
|
89:39 | the load and store misses, since it's kind of the aggregate, including |
|
|
89:44 | This is so that's why I think advice to dio last level cash a |
|
|
89:51 | three cache misses and look at the number of misses, because each one of them |
|
|
89:57 | , whether it's for instructions or for loads or stores, will cause memory traffic. |
|
|
90:05 | We are just going to consider the one, right, because it considers |
|
|
90:10 | the mother Mrs Right. It's included the total and for the level three |
|
|
90:18 | , um, stampede processors a Sfar started to call Poppy can get the |
|
|
90:27 | number of Mrs but they couldn't separately instructions, stores and our breed and |
|
|
90:33 | , mrs. But it presumably collects total number. Mhm. Okay, |
|
|
90:47 | it doesn't. Yes. Is that the separate statistics for lows, Read |
|
|
90:51 | instructions. It just gives the But it's not bad because again it |
|
|
90:59 | each one of the misses generates memory traffic. It's a little bit different. So a |
|
|
91:11 | write may actually cause more memory traffic than a load. So it's not exactly capturing |
|
|
91:22 | memory traffic. Because when I talked about caches, I said that most, |
|
|
91:29 | cache policies today have a so-called write allocate. So that means a write miss, |
|
|
91:36 | in itself, causes both the read and the write. So in that case, |
|
|
91:43 | write misses may generate more traffic than instruction, read or |
|
|
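The estimate discussed in this answer can be sketched in C as follows. This is a rough sketch under assumptions: the function names are hypothetical, the cache line size is passed in by the caller (64 bytes is a common value, but it is not stated in the lecture), and the second function assumes separate load and store miss counts are available, which the answer says was not the case for the level three cache on that platform.

```c
/* Hypothetical helper: each last-level cache miss is assumed to move one
   cache line between memory and the cache. */
unsigned long long traffic_bytes(unsigned long long llc_misses,
                                 unsigned line_bytes)
{
    return llc_misses * (unsigned long long)line_bytes;
}

/* Hypothetical variant for a write-allocate cache: a write miss first reads
   the line and later writes it back, so it is counted twice. */
unsigned long long traffic_bytes_write_allocate(unsigned long long read_misses,
                                                unsigned long long write_misses,
                                                unsigned line_bytes)
{
    return (read_misses + 2ULL * write_misses)
           * (unsigned long long)line_bytes;
}
```

For instance, 1000 total misses with an assumed 64-byte line give 64000 bytes, and the total-miss estimate is a lower bound whenever some of those misses are write misses.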
91:49 | No. Okay. Okay. Thank for glad your finger, you |
|
|
92:08 | And yet that question that was all the |
|