© Distribution of this video is restricted by its owner
00:00 | Okay. So I'll continue talking about tasks in OpenMP |
|
|
00:16 | and then I'll talk about the concept of affinity. I guess before I actually start |
|
|
00:26 | lecture if there's any comments, questions you so yes I want to make |
|
|
00:33 | student stuff. I think there's anything . Uh start so I started to |
|
|
00:48 | about these concepts of tasks that OpenMP has supported since version 3.0 of the OpenMP |
|
|
00:55 | standard. I talked about the construct itself, the scope of variables, and what I |
|
|
01:08 | to call inheritance rules, so what gets propagated kind of downward in the |
|
|
01:20 | . Today I'll talk a little more about synchronization and then I'll |
|
|
01:26 | with the other patterns. So with the barrier as well as the task |
|
|
01:35 | uh that is again is synchronization. it's kind of the opposite of the |
|
|
01:40 | wait for parallel regions in that case is a way for making sure that |
|
|
01:47 | again synchronized before things move on. I gave examples last time. So today |
|
|
01:56 | this notion of taskgroup, that is a way of having tasks synchronized. So |
|
|
02:03 | is simply what it looks like to declare a taskgroup, and then there is |
|
|
02:11 | potential collection of tasks, and then being a taskgroup construct, that means |
|
|
02:23 | code segment to which the taskgroup applies; that means the tasks generated within that |
|
|
02:33 | all get synchronized before things move on. So at the end of this task |
|
|
02:40 | all the tasks will be completed, and nothing past the taskgroup proceeds before all the |
|
|
02:47 | are complete. So that was just the construct; this is not too |
|
|
03:02 | from the barrier but it device them potentially you can hire a test being |
|
|
03:11 | . Um It was one example and forgot that. Uh So this is |
|
|
03:23 | just another example of, again, the taskgroup that applies to a segment of |
|
|
03:30 | code that was similar to what's shown on the previous slide, but not |
|
|
03:35 | to all tasks. So there are some generated before this taskgroup is |
|
|
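As an illustration of the taskgroup idea described above, here is a minimal sketch in C. It is not the code from the slides; the function name `chunked_sum` and the chunking scheme are assumptions made for this example. Tasks generated inside the `taskgroup` block are guaranteed to be complete at its closing brace, so the shared total can be read right after it.

```c
/* Hypothetical helper (not from the lecture): sums a[0..n) using one
   task per chunk. Assumes chunk > 0. */
long chunked_sum(const int *a, int n, int chunk)
{
    long total = 0;   /* shared accumulator updated by the tasks */
    long result = 0;

    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp taskgroup
        {
            for (int start = 0; start < n; start += chunk) {
                int end = (start + chunk < n) ? start + chunk : n;
                #pragma omp task firstprivate(start, end) shared(total)
                {
                    long local = 0;
                    for (int i = start; i < end; i++)
                        local += a[i];
                    #pragma omp atomic
                    total += local;
                }
            }
        }
        /* End of the taskgroup: every task generated inside it has finished,
           so reading total here is safe. */
        result = total;
    }
    return result;
}
```

Tasks created before the `taskgroup` block would not be covered by this wait, which is the difference from a barrier that the lecture points at. The code only uses pragmas, so it also compiles and gives the same answer without OpenMP enabled.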
03:46 | So there are lots of different ways controlling what gets done in parallel and |
|
|
03:56 | to again coordinate and synchronize tasks. It's quite rich and it's richer than |
|
|
04:03 | using the fork-join of the parallel regions; that's the point about the tasks. |
|
|
04:09 | of course it also gets more complicated to keep track of the logic that it's |
|
|
04:16 | So there's also a worksharing construct that is simply taskloop, and it's pretty |
|
|
04:24 | the first instance acts as a parallel for construct for regions, but it has more |
|
|
04:33 | in how the iteration range is divided up among tasks, and I think I |
|
|
04:45 | samples on the carbon slides, I'm to kill me. So uh and |
|
|
04:56 | . So yes, so I'll simplify this. Now there are grainsize |
|
|
05:03 | the number of tasks generated. So just a little bit more flexibility again |
|
|
05:10 | how to divide up the iteration domain. And |
|
|
05:20 | yes. Okay. It wasn't on this slide, it will be on the |
|
|
05:24 | slide, but the taskloop can also have clauses like the for construct and have |
|
|
05:34 | in terms of how the variables are then shared between the task and the generating |
|
|
05:45 | that but it's of course that Tesco . Mhm. So I will talk |
|
|
05:55 | think hopefully in the next few session these various entities about two uh manage |
|
|
06:04 | . So this is the grainsize. So in this |
|
|
06:18 | there's a loop range and it says the minimum number of iterations in |
|
|
06:23 | block that gets assigned to a task is . So that's the notion of the |
|
|
06:31 | size, and I think maybe a strip, but the grain size then can be |
|
|
06:39 | and it has a range between the minimum but no more than twice the minimum |
|
|
06:45 | specified. So that's the only thing about this particular code segment. Just |
|
|
06:51 | illustrate how the grainsize clause can be used for the taskloop. |
|
|
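A minimal sketch of a taskloop with a grainsize clause, assuming a simple scaling loop. The function name `scale` and the grain size of 4 are made up for this example and are not the values on the slide. With `grainsize(4)` each generated task gets at least 4 iterations and fewer than 8, as long as the loop has at least 4 iterations in total.

```c
/* Hypothetical example (not from the lecture): multiply x[0..n) by factor.
   The grain size 4 is an arbitrary illustrative value. */
void scale(double *x, int n, double factor)
{
    #pragma omp parallel
    #pragma omp single
    {
        /* The loop iterations are packaged into tasks of 4 to 7 iterations;
           the taskloop waits for those tasks before continuing. */
        #pragma omp taskloop grainsize(4)
        for (int i = 0; i < n; i++)
            x[i] *= factor;
    }
}
```

The `num_tasks` clause is the alternative way to steer the split: instead of a minimum block size, one asks for a number of tasks.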
07:03 | next, scheduling, and this is where I think there's a lot more flexibility than in |
|
|
07:11 | style of regions. I'll try to cover each one of these scheduling options. |
|
|
07:24 | . In the next few slides. |
|
|
07:34 | So at first I guess I should which says on the slide that |
|
|
07:40 | tasks can be executed in any order I think was just the last |
|
|
07:49 | but it's also what I tried to say last time, that the tasks |
|
|
07:58 | be deferred. So there is no timeline for when a particular task |
|
|
08:10 | starts to get executed; it's up to the runtime system and the operating system to |
|
|
08:16 | out when to take on tasks and which ones to take on. So |
|
|
08:24 | can be, as it says on top, executed immediately, |
|
|
08:31 | there's no guarantee that that's the case; they can be deferred, which means |
|
|
08:36 | essentially that the task is basically placed in a pool, and then threads, whenever they |
|
|
08:42 | to, or rather when the runtime system decides it's time to do something about the task |
|
|
08:48 | threads go and take tasks from the pool of tasks to be |
|
|
09:00 | . It also is the case that it's not guaranteed that a thread |
|
|
09:13 | do all the work for a particular . So unless one kind of controls |
|
|
09:23 | , the runtime system may switch a task between threads. Okay, so |
|
|
09:33 | that's what's said towards the bottom of the slide, that one can tie the |
|
|
09:40 | task to make sure that the same thread that is used to start the |
|
|
09:48 | , will do the entire work for the task, or again, it can be untied |
|
|
09:54 | the opposite. So that's another thing it is different compared to dealing with |
|
|
10:02 | regions and on as tasks. Um yes, I guess just some cautions |
|
|
10:14 | john dealing with tasks because of the and what might happen. So, |
|
|
10:21 | this is, I don't think if remember correctly in the assignments, the |
|
|
10:29 | are not doing too much good is that correct? Yes. |
|
|
10:36 | So this is just for you to be aware of, and maybe it's something |
|
|
10:42 | if you do a project for the of the course, that |
|
|
10:48 | OpenMP at that point, there may be some of these concepts that they |
|
|
10:52 | to explore and so that's why and an important part of OpenMP. |
|
|
10:57 | I just want you to be aware of this task construct. |
|
|
11:09 | , so this is an if clause, and the if clause is similar to what we |
|
|
11:14 | an example for parallel regions, where one uses the if clause, that says |
|
|
11:20 | well, if there are enough iterations in the loop, for instance, then fine, |
|
|
11:28 | this loop. But if there are only a few iterations, it's probably not |
|
|
11:34 | the overhead of parallelizing, using a parallel region and having threads deal |
|
|
11:41 | it, and similarly we can do with the if clause, that one can control whether |
|
|
11:46 | tasks should be generated or not, basically depending on whether the if clause is |
|
|
11:53 | or false. And here again, in terms of the scheduling |
|
|
12:03 | that, as I mentioned before, they can be deferred or not, and you can |
|
|
12:15 | have this inclusion option, that it is in that case handled by the thread |
|
|
12:28 | creates the task. So I think that's enough commentary on this point. |
|
|
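To make the if clause on a task concrete, here is a small sketch using the usual recursive Fibonacci illustration. This is not the example from the slides; the names `fib` and `fib_task` and the cutoff of 15 are assumptions. When the if expression is false the task is not deferred: the thread that encounters it runs it right away, which avoids the task overhead for small subproblems.

```c
/* Hypothetical example (not from the lecture). The cutoff 15 is arbitrary. */
static long fib_task(int n)
{
    if (n < 2)
        return n;

    long x, y;
    /* For small n the if clause is false, so the encountering thread
       executes the task body immediately instead of deferring it. */
    #pragma omp task shared(x) if(n > 15)
    x = fib_task(n - 1);
    #pragma omp task shared(y) if(n > 15)
    y = fib_task(n - 2);
    #pragma omp taskwait
    return x + y;
}

long fib(int n)
{
    long r = 0;
    #pragma omp parallel
    #pragma omp single
    r = fib_task(n);
    return r;
}
```

The same clause on a `parallel` construct plays the role described earlier in the lecture: below some problem size, the region simply runs on one thread.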
12:43 | And here's where I think it gets more interesting in terms of |
|
|
12:51 | flexibility. There is taskyield, and the next construct is depend. |
|
|
13:00 | So this is the way in the . Yield is simply basically telling the runtime |
|
|
13:07 | again that the task can effectively be suspended in favor of another task. |
|
|
13:15 | So depending on what the of your code is, sort of |
|
|
13:20 | task may be less critical in some sense than some other tasks. So one |
|
|
13:30 | want in that case that threads get to dealing with another task. In |
|
|
13:38 | it again, the one with a higher priority gets executed. And |
|
|
13:48 | is just a code example where taskyield is used for a particular function. And |
|
|
14:02 | I will talk about the next construct then I'll stop and see also. |
|
|
14:07 | questions and comments on the task . So, well, there's also the |
|
|
14:15 | priority clause. Sorry, I forgot that. Priority, again, is |
|
|
14:20 | advisory in some sense; it's an expression of importance, but the |
|
|
14:32 | system doesn't necessarily obey the desires. It basically then gets a priority and |
|
|
14:40 | upon the resources available at that given . Then it's most likely it will |
|
|
14:47 | to comply with the priority given through the , but like with the number of |
|
|
14:54 | requested, as we talked about several times, these are hints, suggestions to the runtime |
|
|
15:02 | what the programmer believes is a good thing to do. And then there |
|
|
15:10 | is the depend clause that I mentioned, where one can specify which tasks are dependent |
|
|
15:20 | other tasks. So that means a task cannot proceed until the task |
|
|
15:29 | which it is dependent has been completed. It's a more flexible mechanism to order |
|
|
15:39 | than the fork-join model in the standard parallel region. So one |
|
|
15:45 | basically build a dependence tree that is more flexible through the taskyield and |
|
|
15:53 | depend clauses. And I think there's some in this case that best is |
|
|
16:06 | Again, what tasks are depending on what. In this case task one, that the |
|
|
16:15 | task then is also serving as input to the other tasks. So in that |
|
|
16:23 | way one builds sort of the logic for how the tasks are supposed to |
|
|
16:29 | . So for depend in and depend out there can be a list of |
|
|
16:37 | So it doesn't need to be just single one and I think that was |
|
|
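A small sketch of the depend clause, assuming a four-task diamond where the first task's output feeds two others and a last task needs both. The function name `pipeline` and the arithmetic are invented for this example; they are not from the slides. The last task shows that a depend clause can carry a list of variables.

```c
/* Hypothetical example (not from the lecture): a tiny dependence graph.
        a
       / \
      b   c
       \ /
      result
*/
int pipeline(int input)
{
    int a = 0, b = 0, c = 0, result = 0;

    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task depend(out: a)
        a = input + 1;                 /* task one: produces a */

        #pragma omp task depend(in: a) depend(out: b)
        b = a * 2;                     /* waits for task one */

        #pragma omp task depend(in: a) depend(out: c)
        c = a * 3;                     /* also waits for task one */

        #pragma omp task depend(in: b, c)
        result = b + c;                /* list of dependences: needs b and c */
    }
    /* The implicit barrier at the end of the parallel region guarantees
       that all four tasks have completed here. */
    return result;
}
```

The two middle tasks have no dependence on each other, so the runtime is free to run them concurrently or in either order.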
16:49 | correct. So there is, it's third example that just to look at |
|
|
16:56 | case you get to the point where you want to experiment with the tasks and again |
|
|
17:02 | dependence graphs. Oh, a test then of the task and the task |
|
|
17:10 | , arguments are good uh things to around. Yes, so I'll stop |
|
|
17:22 | a minute here and see if. you should want to comment on more |
|
|
17:26 | the task constructs and not to manage or not really. Nothing new. |
|
|
17:45 | , so I think I'm going to these examples for the moment and I'm |
|
|
17:54 | come back to it. Uh the of the lecture depending upon how quickly |
|
|
18:01 | go, because I want to talk about thread affinity, and that |
|
|
18:10 | an important concept, whether one uses tasks or not. So I went through |
|
|
18:17 | bunch of slides here, get the part now and I will start an |
|
|
18:27 | . So I think once I've talked thread affinity, I hope I've answered |
|
|
18:35 | question that I think was asked in the first lecture about how one controls what |
|
|
18:46 | do. And in part we talked about using the thread ID to |
|
|
18:55 | of control which thread ends up doing what work. But as I also |
|
|
19:03 | probably the most critical thing is the relationship between threads and where the |
|
|
19:12 | lives that they operate on, as well as how threads are allocated. |
|
|
19:22 | course depending upon whether the application is compute bound or I/O or memory |
|
|
19:34 | And that was part of the early of the lectures and assignments too, |
|
|
19:40 | this. Matrix multiply, that is usually compute limited, and matrix-vector and STREAM and |
|
|
19:47 | of the others that are memory so then thread allocation becomes important, |
|
|
19:58 | this is not what I'm going to into. So I have first to |
|
|
20:06 | example again, the first one is matrix multiplication again. So yes, |
|
|
20:18 | way one can do things is what's usually known as scatter or spread; there are |
|
|
20:23 | a couple of different names that mean the same thing. So the vocabulary is |
|
|
20:29 | kind of unified, but it means spreading things out, whether it's |
|
|
20:35 | scatter or spread. And compact, sometimes called close, that means I'm trying |
|
|
20:42 | kind of pack things and make threads as close to each other as |
|
|
20:49 | So now this matrix multiplication example was for a single server. In this |
|
|
20:58 | , there were two versions of one is the old example, so |
|
|
21:04 | was only four cores in one case and six cores in the other, and then |
|
|
21:12 | a couple of graphs, and the pretty much totally straight |
|
|
21:21 | is compute rates, measured in terms of gigabytes per second for this server, and |
|
|
21:28 | is a blue line that is for scattered allocation and the red line, |
|
|
21:33 | is for the compact allocation. So with scatter, basically all cores were |
|
|
21:40 | over four threads and in this case cores, so basically each core got |
|
|
21:49 | thread, and that then got much better performance, as we can see, even you |
|
|
21:59 | them more than double the performance. And the compact then one used |
|
|
22:10 | the hyper-threading option. So that gives then two threads to each core |
|
|
22:20 | the hyper-threading option. But as you can see from the performance graph, that |
|
|
22:30 | very successful in terms of performance. The point of hyper-threading is that |
|
|
22:42 | can be quite useful when functional units, arithmetic units, are and and not |
|
|
22:56 | but since functional units are shared among threads in a hyper-threading environment, |
|
|
23:03 | or simultaneous multithreading, as it's called by some other vendors. So |
|
|
23:09 | don't get really more compute capability by hyper-threading. So this is the |
|
|
23:17 | example: by using the compact allocation, basically the number of functional |
|
|
23:26 | that got used got cut in half. So yeah, that's why it's important |
|
|
23:36 | understand general application what's critical now in of also the lower graph looks at |
|
|
23:45 | energy efficiency of the computation, and unfortunately the color coding that these folks did that |
|
|
23:52 | their plot a little bit messed up to you. The following point |
|
|
23:58 | So now blue is still the scatter; compact ended up being yellow. So |
|
|
24:11 | and in this case lower is better: less energy to do the job. |
|
|
24:17 | clearly better. So that says that scatter was both faster more than twice |
|
|
24:25 | performance but and also more energy So it's about 12. About 60% |
|
|
24:38 | on the energy using the scatter allocation to the compact allocation. So any |
|
|
24:55 | on this. So understanding how to use thread allocation is important, |
|
|
25:09 | for performance and for energy efficiency in computation. One question is |
|
|
25:19 | in case of compact happen if the has multiple AVX units, |
|
|
25:26 | say on Skylake, in that case shouldn't , uh improving complex nothing as |
|
|
25:35 | Yes. Yes. Yes. So you have replicated functional units, like |
|
|
25:43 | the case that was just mentioned, you have two AVX units, for instance. |
|
|
25:50 | , then it may be a bit more complex than just using whole |
|
|
25:56 | , whole compact representation, but you would like to have a thread allocation that |
|
|
26:01 | both AVX units in the and there is, I've tried to |
|
|
26:16 | some comments in a bit on but one comment I will make now |
|
|
26:22 | that is the john just trying to them, Make sure my uses all |
|
|
26:33 | units. If there is more than in each core is the case that |
|
|
26:44 | one uses all functional units in the , what might happen is that the |
|
|
26:51 | , it gets reduced because it gets hot, so then power dissipation may play a |
|
|
26:59 | so one probably might not get one get more than a single functional unit |
|
|
27:06 | , but one is not likely to double. So it's probably advisable |
|
|
27:14 | to use one AVX unit on all the cores and go back and |
|
|
27:22 | up with the second one if one more threads. All |
|
|
27:31 | So this was in terms of energy, in terms of arithmetic work per watt, |
|
|
27:42 | flops per watt in the two . So scatter was almost four times |
|
|
27:52 | good. Next is matrix-vector multiplication, and that behaves quite differently as |
|
|
28:02 | can see. So in this case compact was the better choice except for |
|
|
28:11 | clock rate soil front. So I guess I should have said that on |
|
|
28:14 | horizontal axis is the clock rate. So at the low clock rate scatter |
|
|
28:24 | still okay, but at a higher clock rate actually the compact representation is the winner |
|
|
28:31 | terms of performance, and it's also the winner in terms of the energy efficiency |
|
|
28:42 | this case, forget it. Yes. So, so it again |
|
|
28:53 | because, maybe as you'd expect, this application tends to be limited by memory accesses, not |
|
|
28:59 | the functional units. And that's why one sees the different behavior in this |
|
|
29:04 | that the compact in fact ended up better. And so this is measurements |
|
|
29:11 | this is from the work group game group at UC Berkeley. And the |
|
|
29:18 | in part why the compact also is from the energy point of view is |
|
|
29:24 | the firmware in modern processors: if the cores are not used, it shuts them |
|
|
29:31 | in a very low power state. that reduces the power consumption for the |
|
|
29:38 | of cores, compared to the case where you use just one thread per core |
|
|
29:43 | that case you didn't get the performance scattering things out on. So it |
|
|
29:49 | better off to in this case save by shutting down cores and just limiting |
|
|
29:55 | number of cores engaged in this job; again, it's not generally compute |
|
|
30:08 | Mhm So any questions and so these again two Very simple computations that illustrates |
|
|
30:18 | difference and how one would manage thread allocation both for performance and energy |
|
|
30:32 | . Mm and this is just a of what I already said. |
|
|
30:42 | so now I'm going to talk more about how you then control not only thread allocation but |
|
|
30:50 | also the data allocation part. And think so cigars mentioned that in the |
|
|
30:58 | in the last lecture: this so-called first-touch principle. |
|
|
31:08 | that is simply that the thread and kind of get allocated together. So |
|
|
31:20 | get looking to or rather the thread is allocated to where the data |
|
|
31:33 | that it works on first. So in this case it's just one |
|
|
31:40 | then the data gets allocated in the um for the thread is allocated to |
|
|
31:47 | core for which the data was So that's uh obviously not. So |
|
|
31:56 | if one, I'll give examples in next few slides but okay. For |
|
|
32:02 | single-threaded code, that's the right thing to do. But when you have |
|
|
32:12 | threads then it may be an So if you later on in this |
|
|
32:19 | were to have a parallel region that on the data but and then the |
|
|
32:23 | for when you increase the number of gets allocated to other cores then the |
|
|
32:31 | is longer farther away and it gets slower to access and more energy consuming |
|
|
32:39 | access because all of them then need to talk to a single memory |
|
|
32:49 | So now in case one were to in this case two-threaded code and |
|
|
32:56 | you have the parallel for loop. That now has two threads and then |
|
|
33:02 | allocation of the array is then between where the two threads are |
|
|
33:16 | So this is kind of a better . So this this is what there |
|
|
33:22 | systems too. So so this it's an example. Yeah, so |
|
|
33:35 | that's the son that if there is well, I think yes, I |
|
|
33:38 | say, but if the single thread all the data, then it ends |
|
|
33:43 | in the memory associated with a core the thread runs and then as I |
|
|
33:48 | then everybody else will have to go grab and work with that piece of |
|
|
33:54 | . So obviously the thing that was of done on the second part of |
|
|
33:58 | previous slide is then to make sure the initialization of the array is done in |
|
|
34:06 | parallel region, so then automatically it gets split up among where the threads |
|
|
34:14 | are allocated. So the design is things to pay attention to and do |
|
|
34:25 | even basic OpenMP assignment is to keep track of where data is and where the |
|
|
34:34 | end up being allocated. And there is just another example again where the data |
|
|
34:44 | initialized in the best sequential one, it's just in a single place and |
|
|
34:49 | things uh it's not very efficient. , as I said, there's nothing |
|
|
34:55 | logically with the code, it will run, as I said, it's not normally |
|
|
35:00 | because data is allocated where the single thread was running that generated the |
|
|
35:09 | . So in this case one can again, similar to what was |
|
|
35:12 | any other example, that one has initialization in parallel regions with multiple threads. |
|
|
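The first-touch point above can be sketched as follows, assuming an operating system that places a memory page near the core of the thread that first writes to it. The function names `alloc_first_touch` and `sum_array` are made up for this example. The idea is only that the initialization loop and the compute loop use the same static schedule, so each thread later works on the data it touched first.

```c
#include <stdlib.h>

/* Hypothetical helpers (not from the lecture). */

/* Allocate n doubles and initialize them in parallel, so that each thread
   first-touches the part of the array it will work on later. */
double *alloc_first_touch(long n)
{
    double *x = malloc((size_t)n * sizeof *x);
    if (x == NULL)
        return NULL;

    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; i++)
        x[i] = 1.0;

    return x;
}

/* Compute loop with the same static schedule as the initialization. */
double sum_array(const double *x, long n)
{
    double s = 0.0;

    #pragma omp parallel for schedule(static) reduction(+:s)
    for (long i = 0; i < n; i++)
        s += x[i];

    return s;
}
```

If the initialization loop were sequential instead, the code would still be correct, but all pages would end up in the memory next to the single initializing thread, which is the inefficient case described in the lecture. The benefit also depends on threads staying where they are, which is what binding is for.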
35:26 | And then there was just an example commands one can use to get an |
|
|
35:31 | of what happens. And this is showing in this case book. |
|
|
35:38 | threads and if I use this particular numactl |
|
|
35:46 | it even shows expected latency between different cores or threads, how they're |
|
|
35:58 | . So there are ways of getting also how good the thread allocation is |
|
|
36:03 | using some of the commands, not . Okay. You use the L |
|
|
36:10 | E P I believe so, And one of them also might be |
|
|
36:13 | than one everything, number six Yeah, so this was um now |
|
|
36:32 | got to try to talk talk about you control with where controls them and |
|
|
36:38 | threads end up. But any questions comments so far about this first |
|
|
36:48 | and how data and threads are allocated to each other if one doesn't explicitly |
|
|
36:57 | things. Okay, But so now two parts of you control things, |
|
|
37:12 | talk about scatter and compact, or spread and close. But the first |
|
|
37:23 | then is how one prevents from wandering around, which is binding. |
|
|
37:35 | OpenMP uses a notion of places to which threads |
|
|
37:46 | be allocated and to which they can be bound. So one thing is |
|
|
37:54 | allocate things, but unless one also things to be bound to the particular |
|
|
38:02 | , the OS may have its own of doing the execution of the code |
|
|
38:09 | move threads around. So standard places to which threads can |
|
|
38:27 | executed; it's kind of confusing to some degree, but there is also |
|
|
38:35 | thread concept in terms of the So we have seen that many designs |
|
|
38:45 | have, like Intel, that often have hyper so they have up to two threads |
|
|
38:51 | core. And so one then assign threads that are being executed to |
|
|
38:57 | of these two hardware threads if you , but then I can also allocate |
|
|
39:06 | simply to cores. So the higher concepts and not bother too much |
|
|
39:12 | which of the hardware threads gets the executing thread, or we can simply allocate |
|
|
39:21 | per socket. So those are the targets for placement of threads for execution: |
|
|
39:29 | core and socket. Then I can places and I will give examples that |
|
|
39:39 | is more illustrating them but this is documentation or you want places might be |
|
|
39:46 | how to specify them. So can specific example or number identities for places |
|
|
39:56 | I can make a list of places I can also groups places into |
|
|
40:06 | So uh so the petitioners, the in order, this that this continues |
|
|
40:14 | for breast as you cannot have and in the places, so it's a |
|
|
40:20 | start to finish um with constant executive and let's see uh and I think |
|
|
40:34 | , so basically in principle like other constructs essentially triplet notation |
|
|
40:48 | is being used but then one can part of the triplet notation |
|
|
40:56 | so there is basically lower bound, looks towards the bottom of the |
|
|
41:00 | this is lower bound, length and . So if you omit the stride and |
|
|
41:05 | assumes that the stride is one and |
|
|
41:10 | just a single place if you omit the length, so there's few different ways of looking at the |
|
|
41:22 | thing and again, the last Yeah, is basically first the middle |
|
|
41:29 | . Is that the partition? So basically each partition has four places and |
|
|
41:35 | there are four different partitions, and if look at it there's a contiguous range |
|
|
41:42 | 3 and 4567. So these are contiguous for complying with the notions |
|
|
41:49 | place partitions. But then one can use this triplet notation. So and |
|
|
41:57 | second from the bottom is just zero and that means there was no third |
|
|
42:06 | so the stride is one. So that gives 0, 1, 2, 3 and there's |
|
|
42:11 | of them. So it was again the start location, the length, and then potentially |
|
|
42:18 | stride. So that's what the last uses, this stride notation, that says the stride |
|
|
42:26 | between different instances is four. So basically start at zero. The next one |
|
|
42:33 | at four and the next one starts at eight. So that was |
|
|
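The arithmetic behind the triplet notation can be sketched in a few lines of C. This is only an illustration of how a specification like `{0:4}:4:4` (places of 4 consecutive ids, 4 such places, each starting 4 further along) expands into id lists; it is not the OpenMP runtime's parser, and the function name `expand_places` and its parameters are assumptions for this example.

```c
/* Hypothetical illustration (not an OpenMP API): expand a place list of the
   form {lower:len}:count:stride.
   Place p contains the ids lower + p*stride, ..., lower + p*stride + len - 1.
   out must have room for count * len entries; out[p*len + k] is the k-th id
   of place p. */
void expand_places(int lower, int len, int count, int stride, int *out)
{
    for (int p = 0; p < count; p++)
        for (int k = 0; k < len; k++)
            out[p * len + k] = lower + p * stride + k;
}
```

With `lower = 0, len = 4, count = 4, stride = 4` this gives the four places {0,1,2,3}, {4,5,6,7}, {8,9,10,11}, {12,13,14,15}, so the first place starts at zero, the next at four, the next at eight. With `len = 1` and `stride = 2` it gives single-id places 0, 2, 4, 6, which is the kind of spread-out list a stride of two produces.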
42:41 | places are defined. And then there's the bind clause that allows you to prevent |
|
|
42:48 | operating system from moving the thread during execution. So and that's |
|
|
43:00 | master thread affinity policy as I So if for instance the place is |
|
|
43:06 | socket, that means there are many threads that can execute in the same |
|
|
43:12 | as the master thread. So in that case it potentially makes sense to |
|
|
43:18 | the master affinity policy. Yes. place is kind of rich enough to |
|
|
43:28 | manage several threads concurrently. So I . Okay, so I will give |
|
|
43:39 | particular questions. I'll take them As or 2 more slice the mental. |
|
|
43:48 | , yes, I think the next is the graphic illustration of all the |
|
|
43:52 | here because as but the OpenMP use instead of compact, compact was |
|
|
44:07 | my colleagues at Berkeley used; instead, close is the official OpenMP name |
|
|
44:19 | placing threads as close as possible, and they have spread instead of scatter. |
|
|
44:27 | now close. So then it depends: if there are fewer threads than |
|
|
44:35 | number of places then it's kind of straightforward. So one just allocates threads |
|
|
44:49 | the places, and one starts with allocating to the place where the master thread |
|
|
44:59 | is executing, right, so, I'll illustrate that on the next slide, |
|
|
45:07 | it's confusing in the text. On other hand, if there are more threads |
|
|
45:11 | places, then one basically does the of threads in order to match the |
|
|
45:20 | of places. And I also show something on the next slide, |
|
|
45:29 | so on the top there is the of fewer threads than places, and in |
|
|
45:36 | case the parent or master thread was and running on place number five. And |
|
|
45:48 | that means in this case there was threads. So the thread allocation starts |
|
|
45:55 | place five and then it goes through places in a round-robin fashion. Because there |
|
|
46:04 | fewer threads than places, that means will be in this case a couple |
|
|
46:10 | places that don't get assigned in the , there is no kind of |
|
|
46:17 | On the other hand, on the when there are more threads than places and |
|
|
46:25 | we have this close allocation policy, in case there were 13 I guess threads |
|
|
46:39 | there are eight places, so there are more threads than places, so that |
|
|
46:45 | in this case if one assigns two threads to places then one could |
|
|
46:53 | hold 16 blocks of threads. , of course there was only |
|
|
47:02 | So that doesn't quite work out, it means well first go through the |
|
|
47:08 | two threads per place, again starting where the master or parent thread was |
|
|
47:17 | So the first two threads get allocated to place five, the next two to place six |
|
|
47:23 | and then one still tries to use the places. So that means some |
|
|
47:31 | places only get a single thread, but can be implementation dependent; that could |
|
|
47:37 | be the case that place number threads 10 and 11. And |
|
|
47:43 | there's the remainder part of in that would get allocated to place number |
|
|
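The close policy described above can be mimicked with a small function that computes which place a thread would land on. This is a sketch of the rule, not what any particular runtime does: the function name `close_place` is invented, places are numbered 0 to nplaces-1, and for the more-threads-than-places case it assumes the larger groups come first, which the lecture notes is implementation dependent.

```c
/* Hypothetical model of the close policy (not an OpenMP API).
   thread:       thread number, 0 .. nthreads-1 (0 is the master)
   parent_place: place where the parent/master thread runs
   Returns the place index assigned to the thread. */
int close_place(int thread, int nthreads, int nplaces, int parent_place)
{
    int group;

    if (nthreads <= nplaces) {
        /* One thread per place, starting at the parent's place. */
        group = thread;
    } else {
        /* Consecutive threads are grouped; each place gets either
           base or base + 1 threads. Assumption: the first `extra`
           groups are the larger ones. */
        int base = nthreads / nplaces;
        int extra = nthreads % nplaces;
        int big = extra * (base + 1);  /* threads covered by larger groups */

        if (thread < big)
            group = thread / (base + 1);
        else
            group = extra + (thread - big) / base;
    }
    /* Walk the places round-robin, starting from the parent's place. */
    return (parent_place + group) % nplaces;
}
```

For example, with 4 threads, 8 places and the master on place 5, the threads go to places 5, 6, 7 and then wrap around to 0. With 13 threads and 8 places, threads 0 and 1 share place 5, threads 2 and 3 share place 6, and under the assumption above the last three places visited hold a single thread each.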
47:50 | , Okay, any questions on And then spread is then trying to |
|
|
48:04 | things out and then there's also a depending on when there are fewer threads |
|
|
48:10 | the number of places or if there more threads than the number of places |
|
|
48:15 | this case, the blocking also takes and I think I have another graph |
|
|
48:22 | that too, so when I want do this spread type allocation then and |
|
|
48:33 | are fewer threads in this case. I took an example with three threads |
|
|
48:40 | then the eight places get partitioned up into place partitions and each partition, so |
|
|
48:51 | this case one would need three partitions, since there were three threads, and three threads |
|
|
48:58 | kind of fewer than the number of , so and in this case it |
|
|
49:02 | not an even divide again, so place partitions only have two places in |
|
|
49:09 | case only one. And then the starts again with the partition in |
|
|
49:18 | the parent or master thread runs, that gets thread number zero, and then |
|
|
49:24 | and a round-robin fashion, so the partitions and each place partition gets one |
|
|
49:33 | and at the bottom is the other when there are more threads than partitions. |
|
|
49:44 | . Uh This case, in fact doesn't end up being all, that |
|
|
49:52 | not different at all compared to the close partition case because of how the |
|
|
50:01 | relationship between the number of places and being worked out. But I think |
|
|
50:13 | basic idea is perhaps best illustrated in top one and I have a few |
|
|
50:17 | examples showing again this close and allocation of threads, so the or |
|
|
50:33 | . Thanks again, you made, want to play with him in the |
|
|
50:38 | MPI assignment that you do have. . Um yes. So that's what |
|
|
50:49 | said really when things are not. then there's a bind clause that |
|
|
50:54 | prevent the OS from moving threads around. I know that at some point, |
|
|
51:05 | you're experimenting with bind. So I know if you have any comments to |
|
|
51:09 | with the bind. Yeah, Thank . Obviously by definition it prevents moving |
|
|
51:19 | threads within a place. Uh You have some outputs but I need |
|
|
51:26 | find them. Maybe I can show in next picture. Okay. |
|
|
51:32 | I'm sorry. I've been think of but I know we did talk about |
|
|
51:36 | and I can't remember what example you play with. We'll try to demonstrate |
|
|
51:42 | next time. Um, somewhere sorta instead of the homemade examples. So |
|
|
51:51 | is I think it was done on old version of Stampede. Not the |
|
|
52:00 | one I believe. So in this , as you can see there was |
|
|
52:11 | socket server, 12 cores per socket hyper-threading. So two threads per |
|
|
52:21 | and so on the right hand side , see the kind of illustration with |
|
|
52:26 | kind of rounded corner boxes being the . And then you have the cores |
|
|
52:32 | the socket, then the brownish squiggles denote, say, hyper-threads in each core. |
|
|
52:43 | the bottom you can also see you can get the listing of what |
|
|
52:50 | call the physical ID, that is the socket ID, and then within each socket |
|
|
52:56 | have the core ID that again is or it's unique for each socket. |
|
|
53:02 | it starts the lower for its And then you can see the thread |
|
|
53:10 | that in this case goes through which common default allocation of threads at. |
|
|
53:21 | there are other versions too. But this case basically you can see that |
|
|
53:27 | get allocated one thread per core, moves from core zero and then |
|
|
53:37 | next gets in core one on same socket, and it goes through until |
|
|
53:44 | has allocated a thread to each one the cores on the first socket and |
|
|
53:48 | they moved on to the second But there are also allocation schemes where you |
|
|
53:55 | between the sockets first and I'll come to that later. So there's ways |
|
|
54:01 | again figuring out how you do the and the coach, you know, |
|
|
54:09 | examples on the next side. So is there is some part illustrating different |
|
|
54:17 | of specifying The places to which one something allocated. This had not been |
|
|
54:25 | um, place statement. So by and it does, does the |
|
|
54:34 | course in this case. So it's of different examples. So basically two |
|
|
54:40 | for corporate office and this case was bind close. So the allocation is |
|
|
54:48 | going through um, the first in this case on the first socket |
|
|
54:56 | should say, whereas in the spread this case again it was going through |
|
|
55:06 | the threads out among all the cores but not alternating by the sockets But |
|
|
55:16 | place partitions of size two and placing one thread in each partition of size |
|
|
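The close and spread behaviour described here can be sketched in a few lines of Python. This is a simplified model, not the OpenMP runtime: it assumes the primary thread sits on place 0 and that there are no more threads than places, and the function name `bind_threads` is made up for illustration. In OpenMP itself the policy is selected with `OMP_PROC_BIND` or a `proc_bind` clause.

```python
def bind_threads(num_threads, num_places, policy):
    """Return, for each thread, the index of the place it is bound to.

    Simplified model (assumption): the primary thread is on place 0 and
    num_threads <= num_places.
    """
    if not 0 < num_threads <= num_places:
        raise ValueError("this sketch only covers 1 <= num_threads <= num_places")
    if policy == "close":
        # Consecutive places: thread t lands on place t.
        return list(range(num_threads))
    if policy == "spread":
        # Cut the place list into num_threads consecutive partitions and
        # put one thread on the first place of each partition.
        return [(t * num_places) // num_threads for t in range(num_threads)]
    raise ValueError(f"unknown policy: {policy!r}")


print("close :", bind_threads(4, 8, "close"))   # [0, 1, 2, 3]
print("spread:", bind_threads(4, 8, "spread"))  # [0, 2, 4, 6]
```

With 8 places and 4 threads, spread produces partitions of size two with one thread in each, which matches the description above.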
55:25 | So this is, I guess, this example here is another way of more forcefully listing |
|
|
55:37 | particular places so it gives the same as the previous ones. But instead |
|
|
55:46 | having just the close construct, it explicitly assigns threads to cores. So there's nothing unique otherwise |
|
|
55:59 | that. And here is an example of the triplet notation without the stride argument |
|
|
56:10 | assume the stride is one. So it says, well, the starting place is zero |
|
|
56:19 | I have four threads to allocate; that's the colon four. So |
|
|
56:24 | just increments by one because the stride is implicitly one. Mhm. |
|
|
56:30 | then on the right-hand side it creates a spread allocation by using, again, in this |
|
|
56:39 | the triplet notation and then having the stride be two. Mhm. Oh. |
|
|
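As a small illustration of the triplet notation just described: a place list such as `OMP_PLACES="{0}:4"` means "start from place {0} and make 4 places with stride 1", and `"{0}:4:2"` does the same with stride 2. The Python helper below (the name `expand_places` is hypothetical, not an OpenMP API) expands such a specification so the resulting places can be inspected.

```python
def expand_places(base, count, stride=1):
    """Expand '{base}:count:stride' into an explicit list of places.

    base   -- list of processor numbers forming the first place
    count  -- how many places to generate
    stride -- increment applied to every number in the place (default 1)
    """
    return [[p + i * stride for p in base] for i in range(count)]


# "{0}:4"   -> stride defaults to one
print(expand_places([0], 4))      # [[0], [1], [2], [3]]
# "{0}:4:2" -> every other processor, a spread-like layout
print(expand_places([0], 4, 2))   # [[0], [2], [4], [6]]
```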
56:49 | . So any questions so far? this is not playing with hyper |
|
|
56:53 | Well, you know, have to explicitly. I'm dealing with the |
|
|
57:04 | Okay, so now it says that the hardware thread is the target. And then |
|
|
57:12 | wanted eight threads and in this case default was to take the first kind |
|
|
57:21 | thread in each core. Yes. Then just go through all the |
|
|
57:30 | in this case that gets one thread, to get the eight threads allocated, and |
|
|
57:40 | also instead have used the cores and get the same outcome, in which case |
|
|
57:49 | the system doesn't care which one of two threads um gets allocated or taking |
|
|
57:57 | of the thread that one wants allocated to the core. Mhm. This is |
|
|
58:16 | , I guess the only interesting part is if I look at the second place |
|
|
58:22 | material that starts at um Yeah, eight, right. And then it |
|
|
58:31 | eight instances. So in that case get the second statement here to allocate |
|
|
58:39 | to the second thread in each core, I think. Or one can instead |
|
|
58:49 | using cores, which is the right side. Well, then make your place |
|
|
58:56 | of the zero and the eighth. So that means now this is basically |
|
|
59:01 | gets one thread and there's no one . eight threads so it just Create |
|
|
59:09 | such instances for the eight threads to be allocated. Uh huh. So |
|
|
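The two place lists discussed here can be checked with a short sketch. It assumes, as the slide numbering seems to imply, an 8-core layout in which core c owns hardware threads c and c + 8; that numbering, and the helper names, are assumptions for illustration only. Under that assumption a list like `{8}:8` selects the second hardware thread of every core, while `{0,8}:8` builds one place per core containing both of its hardware threads.

```python
NUM_CORES = 8  # assumption: core c owns hardware threads c and c + NUM_CORES


def expand_places(base, count, stride=1):
    """Expand '{base}:count:stride' into an explicit list of places."""
    return [[p + i * stride for p in base] for i in range(count)]


def locate(hw_id):
    """Map a hardware-thread number to (core, slot within the core)."""
    return hw_id % NUM_CORES, hw_id // NUM_CORES


second_threads = expand_places([8], 8)   # "{8}:8"
per_core = expand_places([0, 8], 8)      # "{0,8}:8"

for place in second_threads:
    print(place, "->", [locate(h) for h in place])
for place in per_core:
    print(place, "->", [locate(h) for h in place])
```

In the second list each place spans a whole core, so the runtime is free to run the bound thread on either hardware thread of that core.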
59:22 | just more example playing around with um case, again, their places on |
|
|
59:31 | left-hand side places these threads and then it's also the bind close is |
|
|
59:39 | , so in that case we'll set in the first socket or if it's |
|
|
59:47 | then it gets allocated across. So there are many different ways you can specify |
|
|
59:54 | the same thing, but the point is one can also control things down to which hardware thread |
|
|
60:02 | more. This is just the printout one gets that demonstrates which thread X to |
|
|
60:18 | executed is allocated where in terms of hardware threads. So these are our |
|
|
60:28 | different ways of illustrating the same thing. So in this case, if one looks at the |
|
|
60:35 | left most illustration here, there was of eight threads and um it was |
|
|
60:45 | cores, and it listed two sockets, cores per socket, or 16 cores. So in this |
|
|
60:54 | there is, um, the first thread gets allocated, and then the second thread |
|
|
61:05 | allocated to the next core, etcetera. Basically one thread per core, um, in |
|
|
61:13 | case, and this is just a different form of printout for the same thing and |
|
|
61:21 | certainly the more readable. So I this option are worked. Um oh |
|
|
61:28 | , 1 things printed out. So investment system and sort of eight cores |
|
|
61:36 | socket. So this is what you see in the right most one that |
|
|
61:41 | huh, since they wanted eight threads but had 16 cores, they got spread out using |
|
|
61:48 | other core. So this is, you know, core zero, then 2, 4, 6, |
|
|
61:53 | eight and then it repeats for the socket. Any questions, nope too |
|
|
62:14 | ? Um so this is another I don't think then I wanted take |
|
|
62:24 | look at it, but I think skip talking about it, I wanted |
|
|
62:30 | talk about something else. Let's Yeah, think I'm more or less |
|
|
62:41 | about these things. So let me to what I wanted to talk about |
|
|
62:46 | . So there's plenty of examples symbol put out in terms of seeing how |
|
|
62:51 | can specify where things are now um I said sometimes, well I can |
|
|
62:59 | between sockets um uh huh And during allocation when there are several sockets in |
|
|
63:09 | node, in this kind of round robin among sockets, or one can do what was done |
|
|
63:15 | example, one thread per core, uh huh, in the socket until all |
|
|
63:22 | has got the one thread, then move on to the next, um, socket |
|
|
63:28 | that can be controlled. So mm is so Intel has a fairly sophisticated |
|
|
63:39 | for how to control allocation of threads. So this is vendor specific. IBM has |
|
|
63:48 | . I haven't seen what they indeed use but I'm using this example from |
|
|
63:55 | but IBM has a similar, but not identical, way of managing how threads are |
|
|
64:04 | Two execution threads are allocated to the core, and in the case of IBM they |
|
|
64:15 | four threads per core. So that's bit more options than name them. |
|
|
64:22 | so they have this, what they call the type and the permute and the |
|
|
64:28 | and I'll try to illustrate what these are in the next couple of slides |
|
|
64:34 | certain, simply just started from the here, the position where the thread |
|
|
64:41 | starts um type talked about in this they call it compact like the birth |
|
|
64:51 | folks, not close as OpenMP, um, talks about it, and then scatter |
|
|
64:59 | then they have a couple more ways of controlling thread allocation |
|
|
65:06 | permute. Yes, I will illustrate it. But that is related to what |
|
|
65:13 | said, whether they alternate between sockets or kind of cores first. So |
|
|
65:25 | they have basically, and I'll show on the next slide, mm, a hierarchy of entities |
|
|
65:35 | then they can kind of change the order of the levels in the hierarchy. |
|
|
65:44 | it is more explaining of what the attributes are for type. But let |
|
|
65:54 | show this thing that I wanted to you. So this, it's kind |
|
|
66:02 | the default. All right. as an old and it has a |
|
|
66:11 | of sockets, and I don't know why they use package three instead of package |
|
|
66:16 | package zero and three. I have no idea why they did. If it's |
|
|
66:20 | typo in this, from where I borrowed this slide, from an |
|
|
66:26 | folks. But anyway, so logically a node consists of sockets, and each socket has |
|
|
66:34 | number of cores and then it's for in the case of to threats if |
|
|
66:41 | threading is enabled, so sure have . And this compact or close |
|
|
66:53 | And the thing that happens is what was illustrated on the previous slides. |
|
|
66:59 | Okay. In this case if not in this case, eight threads |
|
|
67:09 | allocated, starting with core zero, and in core zero, hardware thread zero. |
|
|
67:16 | then the next execution thread gets allocated to the other hardware thread in core |
|
|
67:23 | . And then you move on the and then you move on to the |
|
|
67:28 | socket and fill up the threads core by core in the next socket. So |
|
|
67:35 | is exactly how it works. And this one is the scatter |
|
|
67:45 | So in this case, one goes through the cores in order. So |
|
|
67:52 | first socket zero and core zero, then core one in the same |
|
|
67:58 | And if there are more cores in the socket, move one core at |
|
|
68:03 | the time and then move on to next socket. So that means you |
|
|
68:13 | through and basically get one thread per core per socket and then you come back |
|
|
68:22 | . You use the other hardware thread in each core, in order from socket zero on to |
|
|
68:35 | Sockets. So as you can see different. So one goes actually, |
|
|
68:44 | this one, sorry. And this alternates between sockets; I'm sorry for the confusion. |
|
|
68:51 | I misread the slides. So as you can see, first in this case, |
|
|
68:58 | this is alternating between sockets for the option, right. Oh, I don't |
|
|
69:08 | , I'm sorry. First thread in socket zero, core zero. Then the next |
|
|
69:17 | is in the next socket and then back to the first socket and then |
|
|
69:24 | the next core and then back to other socket and take the next core |
|
|
69:32 | then goes back again to the first socket. But then use the first hardware |
|
|
69:39 | or second thread in the core. Oh, and then, uh, |
|
|
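The difference between the compact and scatter types walked through above comes down to which level of the socket/core/hardware-thread hierarchy varies fastest. The sketch below is a plain Python model of that ordering, not Intel's implementation; the function name and the 2-socket, 4-core, 2-thread shape are assumptions chosen for illustration.

```python
def placement_order(sockets, cores, hw_threads, policy):
    """List (socket, core, hw_thread) slots in the order threads are assigned."""
    if policy == "compact":
        # Hardware thread varies fastest, then core, then socket:
        # fill a core completely, then a socket, before moving on.
        return [(s, c, h)
                for s in range(sockets)
                for c in range(cores)
                for h in range(hw_threads)]
    if policy == "scatter":
        # Socket varies fastest, then core, then hardware thread:
        # alternate between sockets, reuse a core only at the very end.
        return [(s, c, h)
                for h in range(hw_threads)
                for c in range(cores)
                for s in range(sockets)]
    raise ValueError(f"unknown policy: {policy!r}")


for policy in ("compact", "scatter"):
    print(policy, placement_order(2, 4, 2, policy)[:8])
```

In the scatter ordering the second hardware thread of socket 0, core 0 only shows up after every core on both sockets has received one thread, which is the behaviour described after the correction above.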
69:47 | other example here from the folks that I use the offset here, that's |
|
|
69:56 | I wanted to show it too. In this case the offset is |
|
|
70:00 | 3. So in this case, , it starts basically in the third |
|
|
70:15 | in this case, but here is where the first thread is allocated, but |
|
|
70:20 | things are progressing from there. So this is just illustrating the offset, where you |
|
|
70:28 | the thread allocation. And then, so, the first compact one didn't have |
|
|
70:33 | offset attributes. It started Back in zero. But then I think the |
|
|
70:43 | one is where they also permuted the most important layer in terms of this |
|
|
70:54 | of places to which things can get allocated, um, and then the offset |
|
|
71:02 | So it starts in this case, . And again, 4, |
|
|
71:09 | But even though it is compact, the next thread is not allocated to the |
|
|
71:18 | other hardware thread in the same core, but it went on to the next |
|
|
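A rough model of the offset and permute attributes, written as a sketch under stated assumptions rather than as a description of Intel's runtime: offset is modelled as rotating the start position in the ordered slot list, and permute=1 is modelled as making the hardware-thread level the slowest-varying one, so that consecutive threads go to different cores first. Only permute values 0 and 1 are covered, and the function name is hypothetical.

```python
def compact_slots(sockets, cores, hw_threads, permute=0, offset=0):
    """Ordered (socket, core, hw_thread) slots for a compact-style layout.

    Assumptions of this sketch:
      permute=0 -> plain compact (hardware thread varies fastest)
      permute=1 -> hardware-thread level becomes the slowest-varying level
      offset    -> index in the ordered list where assignment starts
    """
    if permute == 0:
        order = [(s, c, h) for s in range(sockets)
                 for c in range(cores) for h in range(hw_threads)]
    elif permute == 1:
        order = [(s, c, h) for h in range(hw_threads)
                 for s in range(sockets) for c in range(cores)]
    else:
        raise ValueError("this sketch only models permute 0 and 1")
    k = offset % len(order)
    return order[k:] + order[:k]  # wrap around after the last slot


print(compact_slots(2, 4, 2, offset=3)[:4])
print(compact_slots(2, 4, 2, permute=1)[:4])
```

With permute=1 the second thread lands on the next core instead of on the other hardware thread of the same core, even though the type is still compact.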
71:26 | . So basically levels got permuted. the most important part? So there's |
|
|
71:33 | different ways that the vendors have to, um, let users allocate threads |
|
|
71:48 | cores to get the maximum out of the resources, basically, whether it's again memory |
|
|
71:54 | limited or compute limited, and where data is allocated, to make sure that there is |
|
|
72:03 | between the execution and most of the data that threads interact with or use for |
|
|
72:11 | execution. And I think that's then a here's examples of performance tuning but |
|
|
72:23 | time is pretty much up, but I'll highlight the thing I skipped for |
|
|
72:30 | up for those interested in the um stuff. So I'll just tell you |
|
|
72:37 | it is. The standard examples Yeah. Uh Uh huh. It |
|
|
72:44 | further back than I thought, so . Mhm. So what it is |
|
|
72:52 | three examples, I encourage anyone interested in the task construct as well as open |
|
|
72:59 | code for can only, for in addition to matrix multiplication, a |
|
|
73:04 | factor is the Jacobi method. Finite difference on the grid, nearest-neighbor communication or |
|
|
73:14 | in the 2D array. It's common, and there are a few slides |
|
|
73:18 | shows first just the standard way of it, then how to use the |
|
|
73:24 | construct, and then this question of how one can group some blocks of rows if |
|
|
73:32 | do it by rows, or do it by columns, to reduce the number of tasks |
|
|
73:38 | get better performance and less overhead. And then the other example is so |
|
|
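The idea of grouping rows into blocks so that fewer, larger tasks are created can be sketched as follows. This is not the code from the slides: it is a Python stand-in in which a thread pool plays the role of OpenMP tasks, and the names `jacobi_rows`, `jacobi_sweep` and `rows_per_task` are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def jacobi_rows(old, new, row_start, row_end):
    """Update interior cells of rows [row_start, row_end) from the old grid."""
    ncols = len(old[0])
    for i in range(row_start, row_end):
        for j in range(1, ncols - 1):
            new[i][j] = 0.25 * (old[i - 1][j] + old[i + 1][j]
                                + old[i][j - 1] + old[i][j + 1])


def jacobi_sweep(old, rows_per_task):
    """One Jacobi sweep; each block of rows is submitted as one task."""
    nrows = len(old)
    new = [row[:] for row in old]  # boundary values are carried over
    blocks = [(r, min(r + rows_per_task, nrows - 1))
              for r in range(1, nrows - 1, rows_per_task)]
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(jacobi_rows, old, new, a, b) for a, b in blocks]
        for f in futures:
            f.result()  # wait for all tasks, similar in spirit to a taskwait
    return new, len(blocks)


grid = [[1.0] * 6] + [[0.0] * 6 for _ in range(5)]  # hot top boundary
fine, n_fine = jacobi_sweep(grid, rows_per_task=1)
coarse, n_coarse = jacobi_sweep(grid, rows_per_task=2)
print(n_fine, n_coarse, fine == coarse)
```

Because every task reads only the old grid and writes disjoint rows of the new one, the block size changes the number of tasks and the scheduling overhead but not the result.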
73:45 | Gauss-Seidel method is another algorithm for solving linear systems of equations and |
|
|
73:51 | use that to create flexible. Yeah update rules, they called my |
|
|
74:00 | If anyone is familiar with it, it's a very simple method, but it's |
|
|
74:04 | very efficient, so it doesn't converge that quickly. So this guy Alcedo |
|
|
74:10 | converges to be quickly, but then can basically create what's known as away |
|
|
74:16 | , so that's what is often used to figure out how to order computations or schedule |
|
|
74:24 | . And here's an example, then, of how to create wavefronts, they are stable |
|
|
74:30 | . And then there's also, uh, some on the task construct, and it eventually evolves to |
|
|
74:38 | to figure out how to get enough parallelism between wavefronts. |
|
|
74:44 | one can have this; the reason it is called wavefront is that you have these colored diagonal lines |
|
|
74:52 | basically show how one went up and, um, before the previous wavefront is |
|
|
75:00 | , as long as they are basically trailing each other, you can have several wavefronts |
|
|
75:05 | going on concurrently and updating the grid, and the other generic test case is LU |
|
|
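The wavefront ordering for Gauss-Seidel can be illustrated with a small sequential sketch. It groups the interior grid cells by anti-diagonal (cells with the same i + j), since a cell only needs the already-updated values of its upper and left neighbours, which lie on the previous diagonal. The function names are hypothetical, and the code only demonstrates the ordering; in a task-based version each cell or block on a front could become its own task.

```python
def wavefronts(nrows, ncols):
    """Group interior cells by anti-diagonal; cells in one front are independent."""
    fronts = {}
    for i in range(1, nrows - 1):
        for j in range(1, ncols - 1):
            fronts.setdefault(i + j, []).append((i, j))
    return [fronts[d] for d in sorted(fronts)]


def update(grid, i, j):
    """In-place Gauss-Seidel update of one cell (five-point stencil)."""
    grid[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                         + grid[i][j - 1] + grid[i][j + 1])


def sweep_rowwise(grid):
    for i in range(1, len(grid) - 1):
        for j in range(1, len(grid[0]) - 1):
            update(grid, i, j)


def sweep_wavefront(grid):
    for front in wavefronts(len(grid), len(grid[0])):
        for i, j in front:  # these updates could run concurrently
            update(grid, i, j)


def make_grid(n):
    return [[1.0] * n] + [[0.0] * n for _ in range(n - 1)]


a, b = make_grid(5), make_grid(5)
sweep_rowwise(a)
sweep_wavefront(b)
print(a == b, [len(f) for f in wavefronts(5, 5)])
```

Both orderings produce the same grid, because every cell sees updated values from the previous diagonal and old values from the next one in either case.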
75:15 | . And I believe one of the , if I remember correctly, I |
|
|
75:22 | remember which one is using L. . Well, maybe I'm just |
|
|
75:29 | Yeah, but there is also a version of LU decomposition among the slides |
|
|
75:35 | escape for that sometimes. So I leave those for you yourselves to peruse and |
|
|
75:43 | you have questions on them, if do take time to look at |
|
|
75:47 | we'll be able to answer questions on . Okay, my time is |
|
|
75:58 | So it so far amount of things have brought up in terms of open |
|
|
76:07 | . Um but much of it this , something that could be of interest |
|
|
76:15 | you in terms of being projects. that turns out to be an interest |
|
|
76:22 | otherwise, um you don't need all it for doing your assignments, n |
|
|
76:32 | comments from you. Yeah. Looking stop the recording
|