© Distribution of this video is restricted by its owner
00:02 | So last time I tried to convince you that power and energy consumption are in fact a big deal in many respects. |
00:13 | The point today is not only to talk about power and energy consumption, but to learn how you can get some information about what your codes do in that regard. |
00:32 | So today, as mentioned last time, I'll try to wrap it up. I will talk about two things in particular: something known as DVFS, dynamic voltage and frequency scaling, which is something that all the processors do these days, and which in fact affects benchmarking, in that you can get unrepeatable benchmarks unless one kind of locks things in. |
01:01 | And I'll mention a particular standard that has been developed as an interface between what the processor does and the operating system, known as ACPI. |
01:16 | Depending upon how quickly I go, I'm gonna talk a little bit about a data center case study from Facebook that was interesting; but depending on time, I may not mention that much about it, in order to move on to OpenMP. |
01:40 | So now on to DVFS, dynamic voltage and frequency scaling. What is it? I think I talked once before about CMOS properties: the power consumption is proportional to the voltage squared times the frequency. |
02:04 | That's the dominating dynamic energy for CMOS technology, whether for processors, memory, or some other functional unit. |
02:18 | So the point is, as the kind of dashed line shows here: if the application is not particularly sensitive to the clock frequency of the processor, then one may want to keep a relatively low frequency, because then it does not affect the runtime much, but it reduces the power consumption — so you gain in energy efficiency by slowing down the clock a bit. |
02:44 | That's clearly the case if you have an application that is memory bound, and that's partially why I have repeatedly mentioned trying to get an assessment of whether an application is memory limited — not in terms of size, but in terms of bandwidth and latency — or whether it is CPU limited. I'll come back to that in a bit. |
02:54 | So this is the basic idea of controlling voltage and frequency, which is done dynamically on modern processors, whether from Intel or AMD or IBM or some others. |
03:29 | Here is a little bit of an old slide, from work done many years ago by the group that started something known as the Green500 list, as a way of promoting energy efficiency in computing, following the Green Destiny cluster built at Los Alamos National Labs. |
03:53 | What this slide shows is that the light-colored bars are the slowdown, or the performance loss, from applying this dynamic voltage and frequency scaling idea. Most of the time, you can see, for the codes they tried, the impact was less than 5%, whereas the taller, dark red bars show an energy efficiency gain on the order of 10 to 20-plus percent, depending on what the application was. |
04:36 | The upper left-hand graph shows, I would say, more real applications, whereas the top center comes from the NAS Parallel Benchmarks. Those are more kernel codes — pieces of code developed by NASA many years back — and they include CG, the conjugate gradient method, if that is familiar to some of you, and LU, which is Gaussian elimination, effectively. |
05:11 | So it is a reasonable range of kernels, or algorithms, that are used in many scientific and engineering applications. But I'm not going to go further into it; it just shows that they experimented with this on a particular processor, maybe 15 or 20 years ago. |
05:32 | So this is a little bit of the concept of what I talked about, and this is an example actually running on Intel processors. Intel has algorithms that adjust frequency and voltage on their chips as a function of the workload. |
05:58 | What the red curve shows is how the CPU power consumption is affected by the frequency. It is labeled as energy, so let's assume a fixed time; it is in effect the power as a function of the clock, again following the so-called square law of CMOS. |
06:21 | The dashed bluish curve that sort of declines with increasing frequency shows behavior typical of applications that are CPU, or clock frequency, limited. For those, it tends to be that the run time decreases more quickly than the power consumption goes up at higher clock frequencies. That is the case for a well-optimized matrix multiply, for instance. |
07:01 | So one strategy that has been used to control voltage and frequency for such applications is what is known as run-to-halt, or race-to-halt: basically, you run as fast as you can and then you stop. For compute-bound applications that's a good strategy, whereas for memory-bound applications it is not the good strategy. |
07:29 | So this particular algorithm, which they called EARtH, for Energy-Aware Race to Halt, tries — based on real-time measurements as the application executes — to decide the optimal strategy for clock frequencies, as one of the greenish curves is meant to illustrate. So that is something that is operational, unless you try to prevent it, when you run code on Skylake and subsequent processors from Intel. |
08:12 | So I'll talk a little bit about this standard that has been developed as an interface between the hardware and the operating system. This is first just a very global view: the application runs on top of the operating system, which then has an idea of what the hardware can do and what the hardware's states are. |
08:41 | And these days, as I mentioned before, all processors have temperature sensors built into them, and the sensors are used in controlling voltages and frequencies in order to make sure that things don't overheat. |
09:03 | Some algorithms, like RAPL, which you're using for Assignment 3, take thermal inertia into consideration, so they allow higher power dissipation for short periods when the chip is relatively cool. So it has fairly sophisticated thermal models to make sure that things don't overheat, and this is something that happens, again, in hardware. |
09:34 | So there is, as I mentioned before, a separate processor — not the cores, but a separate piece of logic — that decides how to manage power and clock frequencies on the chip. |
09:52 | So this is a sort of high-level diagram of how ACPI is architected, if you like. It's something that primarily started to be developed for mobile processors, because they operate most of the time on batteries — a very limited energy resource — so power management has been a big deal there for a long time. |
10:26 | So, as it says on the slide, there are many different kinds of states: what they call the global system states first, at the top, and then there are system sleep states, as they are called — I'll mention those briefly — and then I will spend some more time on C-states and also P-states, which don't show on this particular slide. |
10:56 | So here is a little bit on the C-states and P-states; they are the ones that are particularly relevant for this course, since you're dealing with CPUs and memory. |
11:10 | This basically very simple cartoon shows that the operating system communicates with the ACPI subsystem, which hardware vendors do support, and with the system firmware; and the vendors decide which things they want to manage themselves and which things they want to leave to, or let, the operating system also help manage. |
11:39 | So here's a little bit more detailed picture of how things work. The yellow box shows the system states, and the way the naming works for all of them — whether CPU states, system states, or any of the other components in the system — is that the index, or number, after the letter determines the state: zero is fully operational, and as the digit after the letter increases, it is an increasingly deeper sleep state. |
12:16 | I'll talk a little bit more about that when it comes to C-states; most of the same holds for the P-states, which I'll also talk about, but those are not sleep states. |
12:34 | So this just lists what the S-states actually mean in terms of the system states. I will not talk much about it, but you can see again that S0 is the fully operational state: nothing is turned off or powered down; everything is fully powered up and fully ready to go and do things. And then, if you go to some higher system state, then things are not ready to run, in terms of CPUs or memory. |
13:14 | As I said, I'll talk a little bit more about the C-states and the P-states, and this slide just tries to show what is used to control the behavior we expect to see in C- and P-states. |
13:36 | First, the formula here on the upper half clearly shows the CMOS square law again, as mentioned before: capacitance times voltage squared times f. That's the dynamic part that is affected by the voltage and frequency control, which today is the main control of power dissipation and energy consumption. |
14:10 | A few terms are also mentioned: clock gating and clock modulation. I'll just explain them on the next slide very quickly; I won't go deep into it. Clock gating basically means turning off the clock; clock modulation is a little bit refined version of clock gating. Power gating means that you turn off power to some subsection of the chip. |
14:33 | In one of the previous lectures, I showed that today's chips have a number of what's known as power islands, which allow individual control of the power: typically, a core is its own power island, so cores can be turned on and off individually. And then there are power domains for other parts of the chip. |
15:06 | The figure on the bottom left-hand side shows what is affected by clock gating — and clock modulation, for that matter — and what is affected by power gating. |
15:19 | And here is the slide that explains the terms. Clock gating simply stops the clock for a period of time by issuing a Stop-Clock assertion; then the clock stops, and that's it. Whereas clock modulation is a bit more refined, as I said: you basically have a mechanism that enables you to turn the clock off and on for short periods of time. |
15:40 | So if you want a little more flexible, dynamic control of the clock, one may use clock modulation; but in actual practice it was more frequently used in chips before dynamic voltage and frequency scaling became the norm. It may still be used in certain cases, and it is certainly implemented on chips, but DVFS is the dominating control mode for power and energy consumption. |
16:12 | So here is a little bit of an illustration to try to give you some idea of what happens when you enter some form of sleep state. |
16:27 | The reason we have several levels of sleep state is that the various levels are used dynamically: depending upon the expected idle period, the processor or core may enter, quote unquote, a deeper sleep state. But a deeper sleep state is also associated with a longer time to get back to fully operational mode. |
16:59 | So, as the graph at the center-right of the slide shows, C0 is the fully operational state; in the first column, everything is on. But when you enter the C1 state, which is a light sleep state, if you like, then you turn off the clock to the core, but everything else is on. So the caches retain their information, or data; nothing really happens to the cache subsystem on the chip. |
17:46 | However, when you enter deeper sleep states, then you may actually flush the lower-level caches, like L1 and L2. If you have an L3 cache, it typically means that the content of L1 and L2 is written to L3, the outer cache; and eventually, as you enter the deepest sleep state — C7 on Intel processors — all the caches are flushed and the cores are fully turned off. |
18:24 | That also results in the highest energy savings, but also the highest latency to get back into business. |
18:39 | The next couple of slides show a little bit of the complexity that has been added. Processors are no longer single-core chips, so each of the cores has its own C-state, and the naming kind of follows the labeling that I showed on the previous slide; it's just that another C is added in front, as in CC, for core C-state. |
19:13 | So when you see the double C's, they signify that the state is unique for each of the cores. Otherwise, the slide pretty much says what was on the earlier slide, except for a little bit of description of the C6 and C7 states that was not on the previous slide. |
19:41 | And then there are the cores of a single processor, or package — as tends to be the word used for the complete processor that gets plugged into the socket on the board. The package also has C-states, and the package C-state depends on the C-states of the cores in the processor: essentially, the package assumes the shallowest C-state among the cores it holds. |
20:17 | The cartoon to the right of the graph basically shows that if all the cores enter C3, for instance, then the package is also allowed to go into its C3 state, and correspondingly with C6. |
20:33 | So it's good to be aware that C-states for cores are individually managed, and then, depending on the situation on the chip, common features in the package shared by all the cores are managed according to what is needed by the most active, or most lightly sleeping, core. |
21:02 | I took this as an example. |
21:11 | It is from Intel architecture documentation you can find on the Web, and it shows the normal time to exit — or enter, for that matter — the different levels of sleep states. As you can see, it is in the microsecond range: a couple of hundred microseconds if you are in the C7 state. |
21:40 | And it's good to think about: you know, a microsecond doesn't sound like much, and it isn't, but in terms of CPU cycles it is actually quite a lot of work that could have been done while you have to wait for things to wake up. So it is good to keep the clock frequency of the chip in mind when understanding what the penalty is for being in a sleep state. |
22:21 | Then the P-states. These are performance states, and this is actually about the clock frequency of the CPUs. P-states apply when the C-state is C0, the fully active state; that's the only case in which the P-states apply. And turbo mode is part of the P-states, too. |
22:58 | In this case, P0 is — I'm sorry, it is the opposite: at least for the Intel processors, the highest number is the lowest clock. This table is for a couple of Intel processors, and it shows the clock frequencies for the different P-states. |
23:26 | As you can see, it's not continuous. It is not a knob you turn where you can dial in arbitrary frequencies; that comes from how the frequencies are generated, by some kind of frequency divider or frequency multiplier that operates in discrete intervals. For these particular processors, it's about 130 to 140 megahertz for each jump in clock frequency. |
24:05 | This data is, I admit, for a rather old processor at this point, but it's kind of hard to find this information, so I'm sorry I don't have an update. The intervals haven't changed much, though: it's typically 100-plus megahertz between each of these P-states. |
24:22 | And again, the P-states are managed by algorithms like the EARtH algorithm. It's also possible from user space to set a particular clock frequency, but usually that requires some privileges on the system to be able to do that. |
24:46 | This is kind of a summary of what I was saying: there is power management, and there is an interface to go in and control P-states, but normally this is done by firmware that takes the temperatures of the various parts of the chip into consideration to determine voltages and frequencies. |
25:22 | It is the case for many of the current generation of chips that they have a few hundred temperature sensors on their pieces of silicon, to understand which part is hot, or which parts of the chip are relatively cool, so the clock frequency there can safely be raised. |
25:49 | And things are also managed by the operating system: as I mentioned in a comment in an earlier lecture, the operating system may choose to move workloads from a hot area to a cool area on the chip, unless it's restricted from doing so. And I think that is the comment on that. |
26:17 | And yes: earlier, when I talked about the need for managing power, there was this Google paper just over 10 years ago that complained that the power consumption of standard servers was not proportional to the workload. Even when the workload was pretty much nothing — when the chips were idle — they still consumed more than 50% of peak power. So there was not much correlation between power consumption and workload. |
27:04 | Since then, as I mentioned, the voltage control has moved onto the die. It used to be off the die, on the circuit board, but now it's on the chip itself, and there are now power domains for each one of the individual cores and other key parts of the chip. Also, starting only a few years back, there are individual clocks for frequency control of the individual cores; that came a few years after the power control. |
27:34 | These efforts have caused the power consumption to make a big advance toward being more proportional to the workload. Now an idle chip may consume around 20 to 25% of the full power at no work. So power is more proportional to the workload, but still not fully proportional, and that is by and large because of the static power consumption. |
28:18 | So I will stop there. That offered some background on this RAPL idea, and told you that there is dynamic control that happens; and anyone wanting to do more advanced work can also go in and control it oneself. It is a good thing to be aware of in how you develop and benchmark codes. |
28:55 | If there are no more questions, then I will just very quickly illustrate something to give a little bit of context. I'm not going to go into detail, but for anyone interested, there is on Blackboard a paper from Facebook — from which the next two slides are taken — that discusses how Facebook controls the power dissipation in its data centers. I think it is quite an interesting article. |
29:41 | I just checked: it is actually posted, I think, with the first lecture in which I started to talk about power; you can find this article from Facebook there. |
30:04 | So, the way things work — not for a small cluster, but for the big data center sites — is that, as I mentioned, they use power equivalent to a small town: they may draw 10, 20, 30 megawatts, or sometimes even more, for the giant data centers. |
30:26 | That means you have, pretty much, at least for the quote-unquote last mile, dedicated power lines, and those are expensive. So you don't over-provision: you build the supply infrastructure according to the expected demand, and that means you can't really exceed it, because then the equivalent of a fuse blows at the utility network level. |
31:00 | And if one of these big data centers goes offline, it is a significant load that disappears from the net, and that can actually cause instability in the power grid. So this is serious business at this level, and controlling the power is important. |
31:31 | Of course, the data center owners don't want to buy a whole lot of excess infrastructure capacity. They try to buy just the right capability of infrastructure; they don't want to pay for over-provisioning. |
31:57 | It's also the case that if you ask for a lot, and the utility builds out infrastructure and spends a lot of money on it, and then it turns out you're not using it much, then — in the contracts with utilities — there are also penalties if you don't use as much power as you contracted for. |
32:23 | By the way, several years back there was an article in The New York Times about Microsoft, who, I guess, misjudged. They built a new data center in a smaller community for which the utility needed to build new power lines, and then it turned out that Microsoft did not use nearly as much power as they had contracted for. So there were penalty clauses, and it turned out the penalties were stiff enough that Microsoft chose to basically run servers at full blast for nothing, just in order to avoid the penalty — with no useful work, or load, being carried out. |
33:23 | So anyway, managing power is a big deal. |
33:40 | So here is a little bit of how these things tend to work, given what I just said about infrastructure, and this slide kind of points out the ability of the infrastructure at various levels to sustain overload. |
33:56 | If you look at the red line on this graph, what it shows is that one has only a few seconds, if the load is about 30% above the designated, or contracted, load, before the main circuit breaker to the utility network trips. So there is not much time to do adjustment of power at the entry point to the data center. |
34:22 | At the rack level, however — that is the dark green line — you have a lot more time, because, you know, the central limit theorem: if you have a bunch of independent loads, then the worst case doesn't tend to happen at the same time for the whole ensemble, so you get more leeway in terms of using too much power at the rack level. |
35:01 | These slides show a little bit how the power varies within certain time windows at different levels — again, in terms of the data center: the leftmost is the rack level, the rightmost is the total data center. |
35:17 | It shows that if you look at the entry point to the data center — the main circuit breaker — then within one-second windows, at that level, the variations are not very large. If you allow a ten-minute window, most of the variations are, you know, within a few percent; but the smaller the aggregation level, the larger the variations. |
36:03 | Then, taking one slide of interest: in terms of the types of applications they run, how much variation happens over various periods of time — I'm not going to go into detail, but this graph shows what happens within a minute. See the CDFs, the cumulative distribution functions, which give you the total number of power variations that happen. |
36:49 | So in this case, if you take, say, the line for some storage applications or so, most of the variations are less than about 10-ish percent. |
37:04 | Now, so what do they do? They have a hierarchical structure for managing things. They manage things at the rack level — and if you go back and look at the first diagram of the data center power feed structure, what they do is hierarchically organized. |
37:31 | They basically use RAPL to get the information at the server level, and then the strategy is basically to have power capping. So they have a threshold, as this slide shows: the capping threshold. They allow power, for a short time period, to potentially exceed that a little bit; but when it reaches this threshold, then they do turn off services — they turn off enough services that they hope to get to the capping target. |
38:13 | And they keep the capping in effect until the load has dropped below a separate uncapping threshold, and then they remove it; and then the load tends to increase again. And this is kind of just the same thing at the rack level. |
38:31 | And then I have a couple of examples here, where the slides show a little bit of the various services. The blue curves show how many of the Web servers and the feed (newsfeed) servers were operating at any given power level, and the other line shows the capping targets — how many were affected by capping. And this shows a little bit of the performance they lost, as a fraction of production. |
39:11 | What I wanted to show on the next couple of slides, again quickly, is an idea of how this works in practice. The greenish line shows the actual power; then you have the power limit set for this group of servers, and the blue curves show how many servers were actually capped when the green line exceeded the capping target. |
39:45 | And, as one can see — follow the first blue rectangle — when the green line drops below the uncapping target, then the servers are no longer capped; and then the load comes back, and so it goes on. |
40:02 | I had a couple of other examples showing in more detail what happened; these are in the article that I pointed at. And then there was a slide on a case study that I thought was interesting, where they show how things happened when things went bad, and how, with this automated power management they have for their data centers, they then tried to recover. The recovery didn't fully work, so they had to do several kinds of restarts; but at no time did it get to the point — thanks to the power capping — that the service went down totally. The data center did not go offline altogether. |
40:46 | So for anyone interested in that type of thing, to see how they actually do things in real life, it's interesting. |
40:59 | So this is a little bit of what they claim. And there are many others — Google says the same — that now use automated power management in their data centers, and they all claim it saves a significant amount of energy and money. |
41:18 | And that's what I had in mind for talking about power, so I'll stop at that. There is an extra slide you can look at. I'll stop the screen sharing for a moment and move on to finding my OpenMP slides, while I'm happy to take questions. |
41:51 | Mm, no questions? So that was it in terms of hardware, and interfaces to manage hardware; they actually matter in terms of both performance and power. |
42:17 | switch to talk about open MP. , um, I'm sure most of |
|
|
42:31 | probably heard about open emptiness and when made and used it, Um, |
|
|
42:38 | unless along talk about what? The kind of system, as our systems |
|
|
42:48 | for open, empty, uh, water basic paradigm for using him open |
|
|
42:56 | people it is, um, and programming model and the memory models. |
|
|
43:05 | and I start to talk about open constructs, but I won't mostly cover |
|
|
43:13 | next lecture on by the a bit the victory following next lecture. So |
|
|
43:22 | , I guess against a reminder off talked about service. So this is |
|
|
43:29 | one indirect way of saying that Open . Primarily you should do it. |
|
|
43:36 | for the programming individual multicourse servers and typical server, you know, like |
|
|
43:47 | stampede to servers and dual socket servers Do they are Numa servers. |
|
|
43:55 | we mentioned before that there is memory associated with each one of the |
|
|
44:01 | um, on. But then tire has a shared at your space. |
|
|
44:11 | regardless of which processor or socket and core is it knows about the address |
|
|
44:20 | that corresponds to, uh, the chips associate ID with the entire |
|
|
44:28 | regardless to which socket there are So that means that they are kind |
|
|
44:36 | the new my architectures type variety, access to the local them's is faster |
|
|
44:46 | access to games. Um, the subject, but and then this, |
|
|
44:56 | shown that before so and again, MP is for shared memory systems. |
|
|
45:04 | not the only way of doing but it's probably the most commonly used |
|
|
45:12 | . Later on, we'll talk about p I that has message passing interface |
|
|
45:17 | you can also use for shared memory . But it's typically labels more heavyweight |
|
|
45:25 | of more overhead with it. So not necessarily what you want to use |
|
|
45:32 | a single server, and then one more directly use poll six threads for |
|
|
45:40 | shared memory programming and anyway, and MP is kind of convenience layer on |
|
|
45:45 | , proposing spreads so may dealing ah, parallel programming more easily. |
|
|
45:55 | Then, of course, there's automatic parallelization, where you take the code, um,


46:01 | that is using standard programming languages — C, C++, Fortran, what have


46:09 | you — that has no notion of threads or structures of memory or anything of


46:16 | that nature, and hope that the compiler is able to figure everything out. And so


46:23 | far, the success of automatic parallelization has been limited. So that's why one wants,


46:30 | um, some form of added information, and, um, in terms of OpenMP,


46:39 | it is by using directives — something it says on the next slide. So when


46:47 | OpenMP is employed, you use one of the standard programming languages, and then one


46:55 | adds compiler directives to it that, um, inform the compiler of what one


47:04 | wants to happen in certain pieces of code. And in order for things to work,


47:09 | then there is a collection of runtime routines that supports OpenMP. And


47:17 | then there are environment variables that one uses to define the execution context of the


47:26 | OpenMP code. Um, the common sort of misunderstanding that may happen in the
|
|
47:35 | early days, when we learn about parallel programming, is to not be aware of the fact


47:42 | that it has no notion of anything beyond a shared address space. And typically that


47:50 | is only existing on a single server. Ways of getting it beyond that — I'll


47:54 | mention that briefly. But in terms of attempting to, I don't think there's any


48:00 | layer, you know — a software layer that can kind of, um, give the


48:05 | illusion of a true shared memory. So, essentially, to a first order, it's


48:11 | something you can use for programming single servers. And I advise you to get


48:23 | to the OpenMP website. It has lots of very useful information, such as


48:31 | hints for compilers and tools, lots of presentations and videos, tutorials, and,


48:38 | um, lots of very useful information. So I encourage you to go to this


48:45 | site. Um, it also, of course, has the current standards, and


48:52 | it's very helpful to go and figure out more details about how things actually


48:58 | work, or at least what the standard says. Because there are a number of, uh,


49:08 | degrees of freedom that compiler writers have in how to implement things; not


49:15 | everything is totally locked down in the standard — some things are there as suggestions for


49:26 | implementers. The tutorials, I think, are a few years old, but they're still highly relevant, so they


49:31 | don't necessarily contain the latest improvements in the standard. But, please — for the


49:39 | assignments you're going to get, these tutorials are perfectly fine. They cover everything you


49:46 | will need to know about. And the videos are very good — I watched them,


49:52 | so I know what they are, on that note. Yeah, so this is
|
|
49:59 | a little bit of a repeat of what was brought up today and earlier about this notion of


50:08 | NUMA. That is one aspect of shared memory processors, and that's the case for


50:18 | servers today: they have memory DIMMs with each socket, so that causes there to be


50:29 | non-uniform access, depending on where things are in the address space. There used to


50:35 | be symmetric multiprocessors, but they're no longer common. This is basically the


50:43 | software structure: it shows the application is a mix of, again, code with compiler directives, and


50:51 | one may use, uh, a compiler that understands the directives — an OpenMP compiler — that,


50:59 | um, then has with it libraries to implement some of the constructs. And, as


51:06 | I mentioned, environment variables that help you define the execution context. And then there are


51:13 | runtime libraries to help manage thread execution. And that's kind of a joint effort


51:20 | between the specific OpenMP runtime libraries and the operating system — who does what — and


51:32 | that we mentioned a little bit: the memory access. So again, standard languages.


51:37 | Today there are NUMA shared memory systems, and there are lots of compilers available to support


51:47 | OpenMP. There are also what's known as cache-coherent distributed memory systems, right?


51:54 | Some clusters have had this property. I'm not so sure, um,


52:03 | how many vendors support it today. SGI — Silicon Graphics — used to be the dominating supplier


52:10 | of those, uh, but Silicon Graphics does no longer exist; it was swallowed up,


52:16 | if I remember correctly, and I'm not sure that it's supported any longer


52:21 | in any one of the new products. There may still be systems out there that have


52:29 | it. And then, as I mentioned, one can create a software layer on top


52:33 | of distributed memory systems. Distributed memory systems are just a collection of servers that


52:41 | each have shared memory, connected over a network. But then one can create the illusion


52:48 | of shared memory by a software layer. And that is, well, sometimes called distributed


52:55 | shared memory, DSM. So, um, and then, of


53:02 | course, the multithreaded systems that we talked about. So yeah, as
|
|
53:12 | I mentioned before, it's kind of a layer on top of POSIX threads to


53:18 | make programming more easy, or higher level, than directly using POSIX threads. And the


53:30 | idea is that the programmer, or the user, makes the strategic decisions of what you


53:41 | do want to be executed in parallel and which parts may be fine to keep sequential.


53:53 | And then, for these so-called parallel sections, or regions, of the code, the


54:00 | compiler is supposed to be able to figure out how to do the details of,


54:08 | um, parallelizing it and sharing the work among threads. And we'll talk about


54:15 | that when it comes to the various constructs that are being used in OpenMP. So


54:22 | here's a little bit more detail on what the division of labor is between the programmer


54:28 | and the compiler. Um, so, as I said, the programmer gives hints, or directives,


54:39 | about what the compiler should try to figure out in terms of how to parallelize


54:45 | things and how to generate multiple threads for those sections. And it's not just


54:55 | that. There are then what's known as clauses, allowing the programmer, also, to


55:05 | decide how the workload is supposed to be shared among threads. And,


55:11 | um, there are synchronization primitives. There are also constructs in the OpenMP standard


55:22 | that have implicit synchronization for the user, so one doesn't explicitly need to put synchronization
|
|
55:29 | into the code. One of the more, uh, tricky and difficult parts about


55:41 | using OpenMP, in particular when it comes to performance, but sometimes also with respect


55:48 | to correctness, is to make sure that the data sharing is correct. First, we want


56:00 | the correctness of the program and, second, sensible, uh, performance. So we'll


56:07 | talk more about that as we go through the constructs. But that's one thing


56:13 | that is non-trivial in terms of, um, OpenMP. And as it


56:22 | says here, there is no automatic parallelization in OpenMP. That means it doesn't


56:30 | take a sequential code that is not annotated through directives and turn it into parallel code,


56:41 | if you like. So it doesn't take a sequential piece of code and try to figure


56:46 | out what can safely be executed in parallel. It's only in response to


56:54 | directives from the programmer. So the compiler then takes these directives and tries to


57:06 | parallelize — basically, generate a number of threads. And we'll talk about that in


57:15 | more detail — you know, both the number of threads being used and, um, how


57:21 | that happens. And then it does the load sharing: either it has some default


57:27 | rules, or it follows the instructions provided with the clauses for how to share


57:33 | the load among the threads. And, associated with that, some synchronization.
|
|
57:46 | Now, I think I've said "threads" a few times, and I


57:52 | think I even may have, uh, defined what it is before. But just in


57:59 | case anyone is uncertain: a thread is kind of the atomic, or minimal, unit,


58:08 | in this case, that is an independent flow of control. So it's an execution context,


58:15 | if you like, that has its own program counter and has its own register


58:22 | state. It has its own associated memory — but there's a little bit of subtlety


58:28 | there when it comes to OpenMP and what that means, and we'll talk about it in


58:31 | terms of the memory model. But basically, it is a stream of instructions that is


58:37 | to be executed, with its own instruction counter and registers. So it's more minimal,


58:51 | then, than the process. A process may have one or many threads to carry


58:58 | out the task, and it has its own address space, where threads may share the address


59:07 | space, uh, designated or assigned to the process. So the process is sort of


59:13 | a higher-level concept, or broader concept, that has threads, um, to use in


59:23 | order to carry out the work. And here is just, maybe, you know, a


59:30 | little bit of a picture of it. The process has pretty much everything — a


59:36 | stack pointer, program counter, and registers — whereas, then, the individual threads have their


59:43 | own stack pointers, program counters, and registers, but they do share things that


59:50 | are common for all the threads in the process, like the total address space


59:57 | and user ID and other things, too, that may be required for I/O


60:05 | and all kinds of other purposes. So why would one, um, use Open
|
|
60:15 | MP? So the idea is that it should give a portable program, and it is —


60:24 | as long as OpenMP is supported on the platform, and the version of Open


60:39 | MP is, again, supported on the platform. So it's a little bit of trickery in


60:48 | that not all compilers support the same OpenMP version. So if you use,


61:00 | um, a version of the standard that is more recent, so to say,


61:07 | that is supported by one compiler, like ICC, for instance,


61:12 | it may not be supported by a compiler on another platform, or GCC. So there's


61:20 | a little bit of a gray zone in terms of the portability when it comes to the latest


61:25 | functionality. But the main idea is that the same source code, with


61:31 | the given set of directives, should be compatible with any platform out there.
|
|
61:41 | But there is no guarantee that you also get good performance. If one has optimized an


61:50 | OpenMP code for a particular platform — that means taking into account features of


61:57 | that platform — it may not work all that well on another platform that doesn't have the


62:04 | same set of features. And the idea, by using directives, is you typically start


62:13 | with a sequential code, and then you add directives. So, again, you have


62:21 | to use a profiling tool to figure out what part of the code is most time-consuming,


62:28 | and then also realize whether those pieces of the code could very well be executed in


62:37 | parallel. You can start annotating the program and trying to parallelize those sections of the


62:43 | code, and then, if successful, move on to parts of the code that, after parallelization, are


62:52 | maybe the new pieces of the code that are now the most time-consuming. But,


63:03 | again, optimizing things for performance is not always trivial. Right,
|
|
63:13 | so we'll talk about that more. Um, so here's the basic idea of how


63:23 | OpenMP kind of works. So, as mentioned, there are these constructs,


63:31 | in the form of directives in the source code, that inform the compiler that certain segments of


63:40 | the code are things that one would like to see parallelized, and those are known as parallel


63:47 | regions. And when a region comes to an end, then one is back in the


63:55 | sequential processing domain. And so there is one thread that runs through all the


64:04 | parallel regions that you may have in the code; that is known as the master thread,


64:09 | and it always kind of lives. And then, in a parallel region, more


64:15 | threads are spawned, as illustrated by the pieces in this diagram. [Student:]


64:23 | Yes — so this is more or less a client-server model for OpenMP?


64:30 | [Instructor:] Yeah, well, one tends to use it more as a fork model,


64:34 | since it's not different devices. Client-server tends to mean that there is one client


64:41 | that then can farm out work to different servers. So, in this case, it's in


64:50 | the same computing device — the server, or even chip — that can then support more


64:59 | than a single thread. So that's why, logically, it's typically viewed as fork-join


65:06 | parallelism. Okay. So now, it could be that the threads that are
|
|
65:15 | created in this forking action get handed out, as with clients. But this kind of


65:23 | thing, when it happens, is more often at the process level. So it's then a


65:32 | lot more heavyweight than, yes, kind of allocating a separate set of registers


65:42 | and stacks and program counters — because the threads may grab instructions from the same place


65:52 | as the master thread. So the program is not necessarily handed over to some


65:57 | other server. So that's why — even though conceptually it is similar —


66:08 | because of the difference, this is a lot more lightweight than what


66:14 | tends to happen in a client-server scenario. Um, so this is


66:26 | the basic idea in shared memory programming. Yes, there is this fork,


66:32 | and then a join of the threads. There is synchronization that happens, and we'll talk more


66:37 | about all these different constructs. Then, you know, once you have created a


66:43 | number of threads, the main thing is to decide — yeah — the work that is


66:47 | supposed to be carried out in the region: how is it supposed


66:53 | to be shared among the threads? And we will be talking in more, uh,


67:02 | detail in the next lecture. I'll talk a little bit more about what happens in the next


67:10 | few minutes. So here's the basic way, then, to make things happen. To
|
|
67:17 | create a parallel region, one uses, as I mentioned a few times, an


67:23 | OpenMP construct, and there's slightly different syntax in Fortran and C. So most


67:35 | students today, they know C and C++ and, uh, maybe even


67:40 | somebody also knows Fortran. So it is a slightly different format in Fortran that I won't


67:48 | show on this slide. So C has this pragma that is, um,


67:56 | input to the compiler; then comes the OpenMP construct, which has a name and


68:05 | a collection of clauses. And I will talk about a number of these constructs, um,


68:10 | and show one today, I think. And then the clauses, they have to


68:13 | deal with how you want the construct to be carried out. But the basic idea


68:20 | is you create this region through this pragma, and then the region is defined


68:27 | and enclosed within the curly brackets here, and then at the end of this parallel region,


68:34 | there is an implicit synchronization. One doesn't need to synchronize the threads that are


68:41 | working on this parallel region explicitly. So, the clauses can specify the number of


68:49 | threads and other behaviors, and we'll talk about that. Here is, I think,


68:56 | one example that I'll show today: it's the parallel construct in its simplest form, to start a region.


69:06 | There are different forms of the parallel construct, and this is the very generic one that


69:10 | says: what comes next, I want to be a parallel region, with a number of


69:17 | threads. It doesn't tell how many threads to use, but you can do that, and


69:21 | I'll show how that can be done later. Uh, if you don't specify anything,


69:28 | you leave it entirely to the operating system to decide how many threads to use,


69:35 | with whatever algorithm it chooses to do it, and then it can choose to do it based


69:40 | on whatever number of threads are available at that particular instance in time on the


69:48 | server you're running on. And the availability may be not just based on how many


69:57 | threads are in use; as we discussed, it also takes, um, temperature and


70:03 | heat state into account, so it also may limit the number of threads based on what it


70:09 | thinks that the hardware can take without getting too hot. So here is just
|
|
70:17 | a little bit of, uh — an example. What the left-hand side shows is an


70:26 | OpenMP code that, as you can see in this case, actually says,


70:31 | um, that for the parallel region that will be entered after this pragma,


70:39 | the programmer wanted four threads. And then, on the right-hand side,


70:45 | it shows what the compiler does in translating the code, in terms of generating the


70:50 | four threads, with the particular code to then make sure that the four threads are


70:56 | generated and eventually synchronized again. Then, as I also mentioned, the memory model


71:08 | is important, and then we're going to talk about various constructs. But in


71:17 | the shared memory model, the fact is, unless things are specified otherwise,


71:30 | all threads share the same address space. So you have — potentially, you start; the program


71:41 | is sequential; it has the program in an address space; and then you enter a


71:46 | parallel region. Yeah, uh, a number of threads are generated, and the


71:52 | threads can access any part of memory, unless things are specified otherwise. And that can


72:01 | cause headaches, in that you don't necessarily know the order and the progress of individual threads.


72:11 | So there is no guarantee that threads will execute statements in the order you want


72:19 | them to do so. That can be crazy because, um, things are no longer guaranteed


72:28 | to be deterministic compared to the sequential code, because things are not enforced in any particular


72:33 | order. So, for that reason, one may choose to have certain data being private
|
|
72:43 | to threads. And, in that case, only that thread can access the


72:49 | data. And I'll talk more about how these things work. There are then, um,


72:55 | clauses that you can use in order to define, uh, the rules for the


73:02 | data accesses. Um — yes: is the private data in its own address


73:10 | space, or is it still in the same global address space, just inaccessible to other


73:15 | threads? What happens in general, in fact, is that you, um, stay


73:25 | within the address space of the process. So it's in the global address space,


73:30 | but those memory locations are only accessible to the thread. Okay? So that means


73:38 | new memory gets allocated within the global address space for the program. So it's within


73:46 | the program's address space, but it's unique to the thread. So I'll talk


73:57 | more about that. So that is, uh — it's a safety condition. But


74:06 | if you declare data to be private, it's initially uninitialized, unless you specify


74:13 | otherwise. So you need to be careful, and not believe — even if you


74:19 | use the same name — that it automatically makes things from the global address space accessible


74:24 | to everybody, or that, at the end, the results get accessible. Because then, again, at the


74:33 | end, when the thread dies — all of this is trickiness we'll talk about.
|
|
74:41 | So this is a little bit more of the structure. So I mentioned, so far,


74:46 | the one parallel construct, basically, to create the parallel region. Then there are work-


74:53 | sharing constructs that I haven't talked about, but we'll talk about them. I just mentioned


74:58 | this data model, for the data environment, and we'll talk about how that works in


75:03 | terms of shared and private variables. The synchronization I mentioned. And then there are the


75:10 | runtime functions and environment variables that I was talking about. Um, and I thought I had a


75:17 | few minutes left, so I'll show, um, a quick example — and then


75:22 | I'll repeat it next lecture — just to show this non-determinism that happens.


75:27 | The basic block I talked about before: a basic block is a piece of code, for


75:31 | those who are not familiar with, I guess, compiler terminology. I'm told that it's,


75:37 | uh, a set of instructions that just has an entry point at the top and an exit


75:43 | point at the end, and no entry points in the middle. So here's an
|
|
75:48 | example of a piece of code. And so here's now one basic block —


75:57 | this particular block that says B1. And the reason that instruction five is not included


76:02 | is because, if you look at statement four, it has a go-to statement in it, and


76:08 | so you cannot jump into a basic block. So that's why five starts a new block.


76:14 | And then there's nothing going into any of these — six and seven, or eight —


76:19 | so this is now the second basic block. And then we can look at


76:25 | the next basic block: there's nothing going into it, and then we have a go-


76:29 | to to nine. So that's, again, why nine was not in there — one reason, as


76:36 | well, is the go-to-five statement. So that's a basic block. And then we have a


76:42 | statement: 13 goes to 23, so that's definitely the end of the block. And if we


76:48 | go to 23, we see it's basically an exit, and nothing happens there, and nothing else goes


76:54 | into that. Then statements 20 to 22, with the go-to-five — so that's —


77:00 | sorry, this one is its own block: we have an exit and a go-to, so there


77:06 | was just one statement in it. Then there are these two other basic blocks.


77:11 | So here is basically what the code looks like in terms of basic blocks.


77:17 | And I will just wait — but I will repeat it next time — the
|
|
77:22 | last couple of slides. So here is, um, a kind of generic hello world


77:28 | program, and I will leave that in the slides on the website, by the way.


77:32 | This will start a parallel region through the pragma omp parallel. It doesn't say


77:37 | any number of threads that I want to have, um, but then I


77:43 | want to print out "hello world," and it asks for the thread ID, too.


77:53 | So that's what you get: each thread instance gets its ID for the thread, and


77:58 | then you run this code, and here's an example of the output. And, as you can


78:03 | see, it's kind of jumbled, because there is no particular order enforced in which the


78:09 | threads get to print out where they are in the execution of the code. But we


78:14 | will talk about that next time. The time is up, so I'll take questions.


78:26 | Yeah. So next time I will talk more again about the execution model


78:34 | and how to control, uh, the management of what's shared and what's private,


78:44 | among other things. Because that's, to me personally, the most tricky part, and what


78:52 | can affect both correctness and performance. Yeah,
|