© Distribution of this video is restricted by its owner
00:00 | Well, so, let's get into today's topic. |

00:14 | Today is about clusters and the programming of clusters. Before that, let me talk about some simple concepts that, unfortunately, we have already used; I'll talk about them a little bit more, and then talk about cluster programming. We'll see how far we get. |

00:33 | So, a little bit first: I think we have used the concept of speedup, and I'll try to talk a bit more about it, since it is particularly relevant, and often used, in the context of parallel computing. |

01:00 | We have done parallel computing in terms of OpenMP — first OpenMP, shared-memory, single-node programming — and then GPU-accelerated nodes. But we didn't necessarily discuss this concept of strong scaling, which you will see in pretty much anything dealing with parallel computing, in papers or talks; and then parallel efficiency, and how that is dealt with in papers. |

01:39 | Clusters are known as distributed memory systems, and I will try to talk about that and relate it to what's known as massively parallel processors, which are also distributed-memory-type systems, and focus a little bit on what ties these distributed memory systems together, which is the communication network, since there are multiple nodes that need to talk to each other. |

02:16 | And if there's time left, we'll start to get into the programming paradigms for these types of systems, whether clusters or MPPs. |
02:32 | So first, some definitions. There are some terms that are going to be used in the first several slides: cycles per instruction, or, conversely, instructions per cycle; and I'll talk a bit about compilers and compiler optimization. |

02:55 | It's a good thing to know the concept of cycles per instruction, or conversely instructions per cycle, as opposed to cycles per second, which is something you get from the clock rate that we have talked about more than once. |

03:15 | Then we have talked about execution rate: basically, whatever the work is that is supposed to be done, divided by the corresponding execution time. So hopefully there is nothing really all that much new there; this is just a reminder. |

03:39 | Then a little bit of a definition of speedup. I think it's just the standard definition, the one that has been used before: basically, it's the ratio of the execution time before a change to the execution time after the change. Hopefully the change reduced the execution time, so one gets a speedup by taking that ratio. |

04:07 | Conversely, one can use execution rates before and after, which should give you the same answer. |
04:26 | So this, in general, is something one looks at: the fraction of the code that is affected by a change, whether it's parallelizing it or some other kind of improvement that one does to try to speed things up. |

04:47 | In general, one looks at the improvement on the fraction of the code that does improve in response to the change, and then there is the remaining fraction that behaves the same way as before. So one can write the equation: the execution time after the changes were made is the time for the part that is not affected, plus the time for the part of the code that did get improved. |

05:17 | And the point of this is that if one then writes out the actual speedup, you get this little expression at the bottom of the slide. It shows that even if I manage to reduce the time for the fraction of the code one worked on to pretty much nothing, the best speedup one can get is determined by what's left — one minus that fraction, in this case. |

05:52 | So this is known as Amdahl's law, and that's something you should have heard of. The easy thing to remember, if somebody talks about Amdahl's law, is that speedup is limited by the piece of the code that was not improved. That is intuitive, but people sometimes forget it. |
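The expression on the slide isn't visible in the transcript, so here is a reconstruction of the standard form of Amdahl's law consistent with the description above (the symbols $f$ for the affected fraction and $s$ for its improvement factor are my choice):

```latex
T_{\text{after}} \;=\; (1-f)\,T_{\text{before}} \;+\; \frac{f\,T_{\text{before}}}{s},
\qquad
S \;=\; \frac{T_{\text{before}}}{T_{\text{after}}}
   \;=\; \frac{1}{(1-f) + f/s}
   \;\le\; \frac{1}{1-f}.
```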
06:18 | In fact, historically, Amdahl was an engineer at IBM who came up with this a long, long time ago, and it kind of got used at the time as a way to say — IBM was at the forefront at that time of improving codes through architectural means — that instruction-level parallelism is quite limited, so there was little reason to engage in it, because the gains would be so marginal that the complexity was hardly justified. |

07:08 | The reason why things these days do not follow this particular model is that pretty much all the good speedups achieved, even in multicore processors and in particular in clusters, are due to this concept of data parallelism, which is not covered by this simple model. |
07:57 | The next thing is an example that some of you may have seen if you have taken computer architecture. It's a very simplistic example, but it illustrates the point of Amdahl's law in a nice way, and it comes from the Hennessy and Patterson book on computer architecture, which is kind of the dominating textbook for computer architecture classes. For those of you who don't know them, they are in California: Hennessy at Stanford, and Patterson at Berkeley. |

08:32 | So this little example is in the context of what's a typical thing for Californians to think about: going to Las Vegas. In this case, there is a little bit of a contrived example of getting from L.A. to Las Vegas, with a certain set of options for different parts of the trip. |

08:56 | In this case, the first section is pretty much a walk — according to these two fellows, it takes 20 hours — and then for the rest of the trip from L.A. to Las Vegas there are 200 miles, and you have a bunch of options for how you travel those 200 miles. On the next few slides, I'm going to go through this little exercise. |
09:31 | So here is the first option: first you walk for 20 hours, and then you can also walk for the rest of the trip. At, you know, four miles per hour, it takes you 50 hours to cover those 200 miles, and we call the speedup for this option — to compare it with the other options — normalized to one. So the total trip time is 70 hours. |

09:57 | If you can use a bicycle, then it goes a little bit faster: it takes basically 20 hours to do the 200 miles, and then we have the initial 20 hours, so now it's 40 hours, and that means, yes, it takes about half the time, for a speedup of about 1.8. |

10:19 | Then you can use a sort of modest car, and that's getting a little bit more speedup. And then you can do a fast car, and now the driving takes considerably less: instead of the 50 hours the walking takes, it's less than two hours. |

10:37 | And then, you know, you can do the rocket car — if you live in California, you have the places where they try rocket cars every now and then. So, yes, in that case it takes only a third of an hour instead of 50 hours. But in the end, the total speedup was no more than about a factor of three and a half. |

11:06 | So that just illustrates that the parts that don't improve in the end limit the speedup. So regardless of how hard you try on the part that you can affect — in this case it was the majority of the distance, or, for code, perhaps the majority of the lines of code — it doesn't add value to go to the extreme. That was just a simple example. |
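As a sketch of the arithmetic behind the trip example (the walking and bicycle speeds match the 50- and 20-hour figures above; the car and rocket-car speeds are my own illustrative assumptions):

```python
# Amdahl's-law illustration of the lecture's L.A. -> Las Vegas trip:
# a fixed 20-hour walking section, then 200 miles done with faster and
# faster options. Car speed (120 mph) is an assumed, illustrative value.

FIXED_HOURS = 20.0
MILES = 200.0

def trip_hours(mph: float) -> float:
    """Total trip time: the fixed walking part plus the 200-mile leg."""
    return FIXED_HOURS + MILES / mph

baseline = trip_hours(4.0)  # all walking: 70 hours
for name, mph in [("walk", 4.0), ("bicycle", 10.0),
                  ("fast car", 120.0), ("rocket car", 600.0)]:
    t = trip_hours(mph)
    print(f"{name:>10}: {t:6.2f} h, speedup {baseline / t:.2f}")
```

Even the rocket car, 150 times faster than walking on the 200-mile leg, gives barely more than a 3.4x overall speedup, because the 20-hour walk is untouched.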
11:40 | Now, focusing a little bit on what we'll cover for the rest of this class, and not least today: here, of course, the improvement is due to parallelizing code. So I use the subscript p on the speedup for the section of the code that is parallelized. |

12:06 | So the formula looks the same; it's just that the improvement on the section of code is now the symbol S with subscript p. In this case, just to note: even if the 95% of the code that is parallelized is reduced to almost nothing, the best speedup that we can hope for is about 20. |
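A small sketch of this calculation (the function name is mine; the formula is the standard Amdahl expression):

```python
# Amdahl's law: overall speedup when a fraction f of the runtime is
# improved by a factor s is S = 1 / ((1 - f) + f / s); as s grows
# without bound, S approaches the limit 1 / (1 - f).

def amdahl_speedup(f: float, s: float) -> float:
    """Overall speedup when fraction f of the time is sped up by s."""
    return 1.0 / ((1.0 - f) + f / s)

# The lecture's example: 95% of the code parallelized.
print(amdahl_speedup(0.95, 1e9))  # approaches 1/(1-0.95) = 20
print(amdahl_speedup(0.95, 10))   # with a 10x improvement: ~6.9
```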
12:39 | Okay, so now, if you think of the parallel speedup: this is a question we can hang up here, and you can have an opinion on whether, in fact, one can get a speedup — an improvement in the running time — that is more than the number of threads engaged. Suppose 10 threads: can you get more than a factor of 10 in speedup? [Student: something about caches and the like.] Yes, exactly, that's a good point. |

13:31 | So what can happen is that as you increase the number of threads — which most likely means, for it to make sense, that you use a bunch of cores — at some point, at least when it comes to level-3 caches, the per-core piece of the problem may be small enough that it fits in the level-3 cache. |

13:54 | That means that most of the memory references will be to caches — it depends on what your code is — and then you will see the runtime being proportional to accessing cache rather than accessing main memory. So that's one reason. Can anyone think of some other reasons? |
14:34 | [Student: Intuitively, if we can't imagine anything other than cache behavior, we wouldn't expect a speedup greater than the number of cores we have.] Yes, that's correct. But what I have put up on this slide is something that hopefully you will become aware of as I talk more about parallel algorithms, and it matters when it comes to clusters. |

15:10 | This may not be something you are used to thinking about at this point in the course, but I want you to be aware of it, because it's important for clusters: computation and communication may scale differently, and that can also cause cases where the speedup is higher than the number of threads, or cores, you use. And then there's a third reason — one that shouldn't really exist. |
15:44 | When you read papers, it's something you have to be aware of. When we look at speedup, if we use this notion of speedup based on rates, then you compute the maximum rate that could be achieved using the nominal, or standard, clock frequency. |

16:21 | But when the code is actually being run and you do the timing, it may turn out that the processors have been running in turbo mode, so you are not comparing against the proper maximum execution rate. |

16:34 | So when you read papers, you will occasionally find that people report a speedup that is higher than proportional to the number of threads, or efficiencies that are higher than 100%, which obviously isn't possible. But many people have no problem writing it down and getting it published, even though it obviously is not correct. So: be careful. Yes? |
17:16 | [Student: Are you aware of any utilities, similar to numactl, that allow you to tell the processor to stay at its base frequency?] There are; I don't remember exactly what they are, but you can lock the processor's frequency to prevent it from changing. |

17:31 | When we talked about power management, there were these P-states — that is, the clock rates being used — and one can basically tell the operating system not to manipulate the clock frequency. So I don't have the command at hand — we can look it up — but it is something you can do. So you can prevent these things from happening to you and know exactly what you are running at. |

18:08 | That's what should be done if one wants to do serious benchmarking. I'm not saying it's 100% guaranteed, because it may be that the quote-unquote firmware that the CPU vendors provide will sometimes override it: if the CPU is in danger of overheating, it will not honor what the operating system asked the clock to be. |

18:45 | There is sometimes this kind of tug of war between the firmware and the operating system as to who is supposed to have the final word on what's happening, and it tends to be that, you know, the vendor wants to protect the equipment, so in the end the firmware will win. |
19:18 | Okay, so now about this concept of strong and weak scaling, and parallel efficiency. These are also things where I personally take a bit of an issue with what happens out there, so I'm showing my biases here. |

19:42 | The reason these things matter, I will get to on this slide. But first, let me tell you what I wrote on this slide. So we have something that I call fair speedup, which is usually not discussed; people talk about weak and strong scaling, or speedup. |

20:08 | The reason for me putting in this notion of fair speedup — and we'll talk about that later, when I talk about parallel algorithms — is that it turns out that, unlike the cases you have seen so far (the standard matrix-vector and matrix-matrix multiplication algorithms do parallelize just fine), for many other problems the best sequential algorithms don't necessarily parallelize well. |

20:40 | So if one wants to do a parallel computation, you may choose a different algorithm, and certainly a different incarnation in terms of the code, than what you would choose for a single core or thread. |

21:03 | So the first type of speedup mentioned on this slide is there essentially to make you aware that there are different ways of looking at speedup, and sometimes the difference is just marginal, and sometimes it may be quite large in terms of the total work being performed: the parallel algorithm may in fact do a lot more work than the one designed for a single core or thread. |

21:43 | So I think, to be fair, if one wants to look at the speedup from a single thread to whatever number of threads you're using, one should look at the best case for the single-thread scenario, and then compare with what one gets for P threads. And that's what the "fair" means. |
22:14 | The other two types of speedup are what are typically called weak and strong scaling. I'll start by commenting on strong scaling. There, you basically look at a fixed workload, or problem, and you try to find out how it scales — how the execution time is reduced — with the number of threads that you are using for that problem. |

22:50 | That's certainly a very valuable way of looking at things, but still, one has to be careful and think about what one measures, because this was one of my early arguments with the creators of this Top 500 list and their use of this Gaussian elimination code. |

23:31 | At the time the list was started, the thing looked at was how fast one solves a linear system of equations with 1000 equations and 1000 unknowns, and I argued with the people behind the list that nobody is going to buy a supercomputer to solve that problem. |

23:58 | So in general, a larger machine is used to solve larger problems, and that is where this notion of weak speedup, or weak scaling, comes from: for that type of speedup, you scale the problem in proportion to the size of the cluster or parallel computer that you are using. |

24:25 | Effectively, another way of looking at it: you use about the same amount, or fraction, of total memory, whether it's one node or 10,000 nodes — say 50%, or whatever memory you use, for the problem size — and then you study how the efficiency or time behaves as you scale things up, scaling both the problem and the machine, which is what you do not do in strong scaling. |

25:02 | So I think, in terms of practicalities, people use larger clusters for larger problems. That doesn't mean it isn't still interesting, sometimes, for modest-sized problems, to find out the strong scaling. |
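The two regimes can be made concrete with the usual textbook formulas — Amdahl's law for a fixed problem and Gustafson's law for a problem grown with the machine. The lecture does not name these formulas or give numbers, so the expressions and the serial fraction below are my own illustrative assumptions:

```python
# Strong vs. weak scaling, sketched with the common simple models.
# alpha = serial (non-parallelizable) fraction of the runtime.

def strong_scaling_speedup(alpha: float, n: int) -> float:
    # Amdahl: fixed workload split across n processors.
    return 1.0 / (alpha + (1.0 - alpha) / n)

def weak_scaling_speedup(alpha: float, n: int) -> float:
    # Gustafson: workload grown in proportion to n.
    return n - alpha * (n - 1)

for n in (1, 10, 100, 1000):
    print(n, strong_scaling_speedup(0.05, n), weak_scaling_speedup(0.05, n))
```

With a 5% serial fraction, strong scaling saturates below 20 no matter how many processors you add, while the weak-scaling speedup keeps growing almost linearly — which is why larger machines are bought for larger problems.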
25:27 | So, any questions or comments when it comes to this? I want you to be aware of all these things, so that when anyone writes papers, or when you read papers about scaling, you know what is actually being discussed, and can be critical about what you read. |

25:52 | And this kind of just follows on, in terms of efficiency. This is an issue I have with most papers — not every paper, but, as it's written in red at the bottom of the slide, most people report this parallel efficiency as simply "efficiency". |

26:06 | Now, parallel efficiency is the top formula: basically, it is the ratio between the speedup you got using P threads, and the P threads or cores that you used. |

26:29 | So, you know, if you have something where the running time decreases in perfect, 100% proportion to the number of cores or threads you're using, you get a parallel efficiency of one. But it doesn't mean that your efficiency really is 100%. |

26:54 | That's my problem when people report this parallel efficiency as "efficiency", because in practice the code may use only a very small fraction of the hardware you're actually using. So that's why I really emphasize that one should look at the real efficiency. There is nothing wrong with looking at parallel efficiency, but don't take it to be the efficiency of the code. |
27:42 | So, anyone want to guess what the typical efficiency — the "real" efficiency, as I call it on the slide — of codes is, whether single-node code or, even more so, cluster code: what fraction of the total resources is typically being used? |

28:10 | Well, anyone would guess it's in the single digits, right? Even the good codes are often in the, you know, 1 to 10% range, and typical code may be, you know, 0.1%. It is not easy to get very high utilization of the resources. |

28:47 | So, as I think was on a slide borrowed from one of the early lectures: parallelism — getting parallel efficiency — is easy; single-core efficiency is hard. All right — so, any questions on these topics? |
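The distinction being made can be sketched as follows (the function names and all numbers are made-up illustrations, not measurements from the lecture):

```python
# Parallel efficiency vs. "real" efficiency, as contrasted in the lecture.

def parallel_efficiency(speedup: float, p: int) -> float:
    # E_par = S(p) / p: how well extra threads convert into speedup.
    return speedup / p

def real_efficiency(achieved_flops: float, peak_flops: float) -> float:
    # Fraction of the hardware's peak capability actually used.
    return achieved_flops / peak_flops

# A code that speeds up 9.5x on 10 threads: 95% parallel efficiency...
print(parallel_efficiency(9.5, 10))
# ...while still sustaining only 2% of the machine's peak rate.
print(real_efficiency(2.0e9, 1.0e11))
```

Both numbers are legitimate metrics; the point of the lecture is that reporting the first one as plain "efficiency" hides the second.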
29:10 | Now let's move on, in a structured approach, to talking about clusters and MPPs and things of that nature. Many of you may already have used clusters and know what they are, but in case you haven't, I'm first going to talk a little bit about clusters, and about the differences between clusters and so-called MPPs — massively parallel processors, for short — and give some examples. |

29:37 | So: a rough definition, and the level of parallelism that is typical for both clusters and MPPs, as opposed to what we have talked about so far. Clusters are defined basically as a collection of independent processors, tied together by a network, and then used as a single computing platform. And I'll talk a little bit about a particular simple performance model for understanding how these clusters can be modeled in terms of performance. |
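The exact performance model on the slides isn't visible in the transcript; a common simple model of this kind, used here as an assumed stand-in, is the latency-bandwidth ("alpha-beta") model for the time to send a message between nodes:

```python
# Latency-bandwidth ("alpha-beta") model for point-to-point messages:
#   T(n) = alpha + n * beta
# alpha = startup latency (s); beta = seconds per byte (1/bandwidth).
# The sample values below are illustrative assumptions only.

def message_time(n_bytes: float, alpha: float, beta: float) -> float:
    return alpha + n_bytes * beta

alpha = 1.0e-6       # 1 microsecond startup latency
beta = 1.0 / 10.0e9  # 10 GB/s link -> 1e-10 s per byte

# Small messages are latency-dominated, large ones bandwidth-dominated:
print(message_time(8, alpha, beta))    # 8-byte message: essentially alpha
print(message_time(1e8, alpha, beta))  # 100 MB message: essentially n*beta
```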
30:28 | So first, a little bit about the differences in how they're constructed; I'll say more about that in the next several slides. So, clusters: Stampede2 is a cluster, and so are the clusters we have had, or have — the ones I mention at the Data Science Institute, too. |

30:58 | This is the one, I think — the one that was just announced... Sabine? But that's the old one, and then there is a new one, C-something. Anyway, there are many clusters. |

31:19 | The overriding, or driving, principle in putting together clusters is, I would say, cost effectiveness more than performance. So one tends to use mass-produced parts: standard servers that are basically units that you can buy and use one by one — or buy, you know, hundreds or thousands of — and then you connect them up through communication technology. |

31:59 | It varies a little bit with how performance-conscious you are, but in general one uses some form of standard network technology out there. Ethernet is by far the dominating standard — not necessarily in terms of clusters, but in terms of networking technology in general, Ethernet is by far the dominating technology, or protocol, used to connect things up, whether in local area networks or wide area networks. |

32:46 | Because of its relatively low cost, in particular for the lower data rates, it is also used in many clusters. So I think Opuntia has Ethernet, and Sabine may also; Stampede does not — it has more of a focus on performance. |

32:59 | But there is also a kind of mass-produced interconnection network known as InfiniBand. It's an open standard, basically, even though practically it has been dominated by one vendor — a vendor that has been about as dominant in high-performance standard networks as Intel has been in CPUs — a company known as Mellanox. I'll mention it a little bit more on subsequent slides, and I think I'll comment a little bit more on Ethernet and InfiniBand in slides to come. |
33:53 | MPPs: the main difference is that MPPs tend to use proprietary network technology, sometimes even proprietary processor technology — or, at least in terms of the compute nodes, it has been a proprietary design, not an off-the-shelf server, but a node that the vendor designs and builds. |

34:22 | And it has been driven by the performance aspect. So cost has been — not a totally ignored aspect, but performance on very large workloads, on something that parallelizes to a very high degree, has mattered more. |

34:52 | But in terms of the programming model, the difference tends not to be all that visible. When you use what I've talked about for this class, the message-passing library MPI, the implementations of the library are very different, so the performance characteristics are different depending on whether you use a cluster or an MPP; but in terms of how you write the code, it is not different. So, here are some pictures. |
35:28 | Many of you may have seen clusters, but this is not all that common, so here is what things look like today. I'm a bit unsure of some of the dates for the pictures I put here, but apart from the bottom left corner — which is more like the homegrown clusters people used to put together in their lab — these show what it looks like when professionals put together clusters. |

36:01 | The top row of pictures here is from a European site, a center in Munich, Germany, that is among the largest for research. The lower left-hand corner is simply one that we have been using, and the lower right-hand corner is one from the large commercial data centers. |

36:30 | And one comment — it's not really related to the use, but it concerns the lower right-hand corner: I think in the very first lecture I used a slide that showed some aspects of large data centers, which have to deal with cooling. |

36:52 | So the lower right-hand picture and the lower left show something that the big commercial data centers do not tend to use: nice, pretty cabinets. In the lower right-hand corner there are no doors on the racks, since airflow and cooling considerations dominate in the giant data centers. They're not done up for looking pretty, or for people to be around — sometimes they run hot — so they're not designed for that. |

37:37 | And here is a little bit of what's going on: the backside that doesn't have the door, and what it looks like if you have a door. |
37:48 | So, I will talk a little bit today — and then in classes to come I will talk more — about the interconnection networks being used, but this shows a little bit of it: the yellow and orange cables are basically fiber-optic cables, and the gray ones are basically copper-wire cables that people use. And in the upper corner is a kind of wiring bundle that is totally custom made for the cluster. |

38:30 | And then here is a picture of an MPP, and it doesn't look very different from anything you saw before. That's the point: from the outside, you can't really tell much about whether something is an MPP or a cluster. But this picture happens to be from, I believe, one of IBM's Blue Gene series of computers. |

38:59 | So, any questions on that? I'll talk more about the differences between clusters and MPPs on slides to come. |
39:17 | One other comment I can make: all of these go into data centers at this point. This particular MPP, the Blue Gene, was designed to be very power efficient, and it was also the most power efficient — not only among MPPs, but more power efficient than anything else around. |

39:44 | And then there is the cooling of the data center: you can see that the tiles on the floor are perforated. So what happens is that every other aisle has these perforated floor tiles for cool air to come up, and then every other aisle is known as a hot aisle, where the hot air collects and is sucked up. |

40:15 | So there are alternating hot and cold aisles in these data centers, and many times they also use enclosures around a pair of rack rows in order to separate the hot and cold air from each other. |
40:35 | The other part I wanted to stress at this point, when it comes to clusters and MPPs, is the scale of parallelism that needs to be managed, compared to single nodes or GPUs. As you can see, this is obviously from the Top 500 list, so these are large clusters; but even the modest-sized clusters we have, as we'll show on subsequent slides — and even companies: not only the Internet giants with their giant clusters, but many other companies' clusters in industry — have significant numbers of cores and threads to manage. |

41:25 | So this is the average number of cores on the Top 500 list, and then this is the maximum number. It's again something to notice: some of the largest clusters have more than 10 million cores to be managed. That doesn't mean that every application they run uses the entire machine — that's often not the case — but applications may still use a significant fraction, millions of cores, for a single application. |

42:01 | And there's also the minimum number of cores on the list: even the smallest system, the number 500 today, has more than, or close to, 20,000 cores. |

42:14 | And this is kind of an aside that you can look at, I guess, on your own. It displays a little bit of the top 10 and the bottom 10 on the Top 500 list, and which ones use GPUs and which ones don't. It's kind of interesting to see that the machine with the largest number of cores doesn't use GPUs at all, as far as I know — even though it's a Chinese design with its own processor, I believe it pretty much uses the x86 instruction set, or something similar to that. So they have 10 million plus of those cores, not simple cores like a GPU's. |
43:20 | This was put in a little bit to point out, again — since many of you will end up in industry and, you know, may end up with the big companies that have giant clusters, as I mentioned, but also with the oil and gas companies that have very large clusters — that in those cases their codes are in many cases highly parallel. |

43:53 | So this was just put in to give you some sense of it: on the left-hand diagram, the pie chart of the Top 500 list system share, in terms of the number of systems on the Top 500 list, a substantial share of those clusters are in industry. |

44:16 | But as you can see on the right-hand side of the slide, if you look at performance share, it illustrates that the clusters used for research or academic use tend to be larger. So they may not be as numerous, but about 50% of the total capability on this Top 500 list is basically in research-type clusters. |
44:56 | So I tried to point out that this is extremely important in terms of scalability, and that applies to both the algorithms as well as the software: your software has to be extremely scalable in order for these clusters or MPPs to be effectively used. It is orders of magnitude more parallelism than you find on CPUs and GPUs — well, apart from the biggest GPUs. |

45:33 | Yes, you note that a GPU may have a few thousand cores — say, about 2000 cores — but in order to make effective use of a cluster, you need massive parallelism well beyond that in terms of the number of threads, unless you count the fact that, when it comes to NVIDIA, 32 threads — a warp, in fact — are kind of the equivalent of one CPU thread. |

46:13 | So the degree of parallelism, even on GPUs, if you kind of ignore the SIMD aspect of the architecture, is much less than what one may need to deal with in terms of clusters and MPPs. Any comments or questions on that? Then I'll have a couple more comments on the differences between clusters and MPPs, which people often find confusing. |
46:57 | So the next point, as I mentioned a few times, is what makes things different. What we talked about before is that there is a bunch of independent nodes, in some shape and form, that are wired up to form these clusters and MPPs, and one of the main differences is how this interconnection is being done. I'll try to illustrate this on this slide. |

47:30 | So the main difference is that for MPPs, the network is much more highly integrated with the nodes than what you find in clusters. |

48:02 | I mentioned, when I talked about the differences between clusters and MPPs, that MPPs are designed for highly parallel codes and for performance, and use proprietary technology; by being proprietary technology, they tend to be more expensive and pricey, whereas clusters are what kind of dominates out there. But there are still examples of MPPs: if you go and look at the Top 500 list — go back and look at the one I showed — you would find that a few of them are labeled as MPPs, but by far most of the list, 95 or 98 or more percent, is clusters. |

49:01 | So when it comes to clusters, one basically uses network adapters that plug into the I/O bus — the PCI Express bus. |
|
|
49:17 | this I also put Yes, they're not you those clusters. But there |
|
|
49:25 | this volunteer computing type networks and some you might hold known about, you |
|
|
49:32 | , folding at home, etcetera. in that case, basically this local |
|
|
49:37 | network or wide area network technology that connects the various pieces there, people |
|
|
49:45 | for, used by some in Um, costs are e I have |
|
|
49:59 | one example of an MPP, that is, again, the Blue Gene,

50:08 | the IBM computer; in this case I have their most recent one, which is known as

50:15 | Blue Gene/Q. Unfortunately, I must say, IBM stopped producing the Blue

50:25 | Gene series, so you can no longer buy one anymore. You can still

50:31 | find some Blue Gene/Q machines available out there at some centers, so they are still used,

50:38 | but they're no longer produced. But the point of using this for illustration is
|
|
50:51 | that this integration of the network is, um, much tighter; or, the network is

50:58 | much more tightly integrated into the nodes than one would find in clusters. So here's what

51:04 | they did in this case: the network ties into the node at

51:10 | the same level as the level-three caches are being done. So, the

51:18 | network is no further away than a level-three cache. Uh, when I

51:25 | talk about networks in some future lecture, I will talk a little bit more about

51:30 | what the green boxes at the bottom of the slide are, these with stores and fetches

51:34 | and barriers. It's common for MPPs that, um, networks used

51:43 | for synchronization tend to also be dedicated networks, separate from other networks for data communication between

51:53 | nodes. And those are also kind of first-class citizens, like other aspects of
|
|
52:02 | the design. Now, on the other hand, as shown before, there is the

52:09 | PCI Express bus. Um, that is the I/O bus that is used

52:14 | for interconnecting nodes using standard technologies like Ethernet

52:25 | or InfiniBand. Or sometimes there may also be proprietary technologies, but MPPs, as a rule, do not use

52:32 | the PCI Express bus. All right. So when it, uh, comes
|
|
52:47 | to dealing with clusters and trying to, eventually, get good performance, not just

52:54 | have a working code, one has to again be aware that there is now one

52:59 | more layer of communication to worry about. It's not just the memory hierarchy,

53:12 | so the main memory and the various levels of cache that I didn't include

53:19 | on this particular slide, but also the interconnection networks, which at first

53:27 | glance were limited by the I/O bus, but they may then also be limited by how

53:34 | the interconnection network is put together, and we'll talk about that a little bit in

53:42 | the next few slides. And, uh, I will stop in a few

53:48 | slides, I believe, to take questions. Let's see. Um,
|
|
53:56 | Okay. So I said I would make some comments on, um, at least two open

54:04 | standards, the Ethernet standard and the InfiniBand standard. Um, so Ethernet is by far

54:15 | the dominating standard for local area networks and even wide area networks today. And one

54:23 | reason is it tends to be very cost effective. Um, because I guess

54:37 | it's in part a less demanding, um, design, because the focus is not

54:50 | so much on latency; see, it tends to be high latency,

54:56 | and so it's a little bit, um, easier to design and build, and

55:07 | in that case, being less expensive. And then it also has the largest

55:14 | market share, and that again helps drive down, or amortize, the cost, the

55:23 | non-recurring costs for engineering. So, as the inventor of the Ethernet,

55:30 | Bob Metcalfe, has said, you know, if it can be done by Ethernet,

55:34 | it will be done by Ethernet, and that tends to be the truth.

55:40 | Um, but sometimes, when it comes to the really high bandwidths, it is not

55:46 | in fact that much less costly. It turns out that for any bandwidth jump,

55:54 | Ethernet tends to be maybe a couple of years behind before it becomes,

56:01 | um, sufficiently lower cost to compete with the alternatives.
|
|
56:11 | And, um, that's why, if you go and look at the top 500 and similar

56:14 | lists, which are then clusters that are candidates for the top 500 list, most

56:19 | of them are focused on performance, you will find, uh, that a

56:26 | very large fraction of those clusters use InfiniBand, some use proprietary, but also a

56:33 | nontrivial portion of the clusters use Ethernet interconnection technology. And then there is a

56:41 | list of different versions of the InfiniBand standard, and I think that is still

56:49 | the case, um, even though there is another proposed improvement beyond this high data

56:57 | rate, or HDR, version of the standard that is out there.
|
|
57:06 | And underneath, I also have a little bit of a listing in terms of the latencies

57:17 | that are inherent, latencies in the switches, in terms of the switch gear

57:27 | that is being used to put together the network. I will talk much more about

57:30 | that in a future lecture, as I mentioned. Um, but the latency,

57:43 | yes, in terms of clock cycles, it can still be on the

57:48 | order of 100 or more, but in some cases, depending on, again, the switch

57:59 | design, it may even come down, depending on the routing protocol, to,

58:04 | um, tens of cycles. Note that these are for the switch gear
|
|
58:11 | itself. So later on, when we talk about MPI, things are

58:18 | usually not in the nanosecond range but more in the microsecond range. And

58:23 | remember that processors operate in the few-gigahertz, 2 to 4 gigahertz, range typically. So that

58:32 | means, uh, the cycle time is less than a nanosecond. So microseconds

58:42 | means thousands to tens of thousands of processing cycles. I have a couple of
|
|
58:49 | other things mentioned on the slide at bottom on the path that was an |
|
|
58:55 | waas. Mostly I haven't say and that is the propriety of technology by |
|
|
59:04 | that it's now it's owned by uh, in tow as, |
|
|
59:14 | at various times, doubled in networking . Uh, like but until basically |
|
|
59:25 | been focused on being a component you know they've been CP use when |
|
|
59:30 | do something new they like. Early , when clusters or parallel computers started |
|
|
59:38 | emerge, they also had the product for thinking about four or five years |
|
|
59:45 | basically the practical demonstrations. So how can build clusters using CPUs, but |
|
|
59:51 | they didn't want to compete with it some Dell or HP or IBM in |
|
|
59:56 | of platforms. So then they retrenched just doing CPUs. Same thing they |
|
|
60:03 | been in the networking business, but for a while they build switches. |
|
|
60:09 | then the returns retrenched to just doing components being used in switch here and |
|
|
60:15 | they did on the past to try do interconnect for data centers, since |
|
|
60:23 | do want to be in the data business. But unfortunately they didn't get |
|
|
60:28 | for their only path. So it out. And 2019, about four |
|
|
60:35 | after the released their products, they to no longer eso on the path |
|
|
60:44 | about a month ago, they, I guess, in part, changed their minds,

60:52 | and they spun it out; ah, basically created a startup company and funded it

61:02 | outside of Intel, to try to build interconnection networks based on this Omni-Path

61:09 | technology that they developed. Right. Then, um, Cray, the computer company,
|
|
61:23 | has always been a computer company focused on the high end of clusters, not commodity

61:31 | clusters. And they have also had their own interconnection technology, I think

61:38 | all along. As well, they don't design their own CPUs, but they do design their

61:44 | own servers and how to integrate the network, and they still do. By the

61:54 | way, I think it was 7 or 8 years ago, when the interconnection network technology that

62:02 | Cray had at the time, they sold, ah, that technology to Intel, which, I

62:10 | believe, used it for the Omni-Path design. But Cray continued to develop

62:15 | new networking technology; that is, I believe, the Slingshot technology. But again,

62:21 | Cray is now owned by HPE. So now a little bit about the principles of data
|
|
62:28 | communication. Um, so, okay, I guess I should stop and ask if there are any questions

62:35 | before talking about this very high-level, simple model of data communication, or any questions

62:42 | on these notions of clusters and MPPs, and in particular the technologies, Ethernet, InfiniBand,

62:52 | or others, being used for interconnection, the cluster interconnect protocols. Um, is

63:01 | this also what is used to communicate between sockets on the same node?
|
|
63:07 | Well, ah, and the reason for saying no is that the thing I'll talk

63:21 | about, like MPI, for instance, that is used for

63:27 | what is called cluster programming, it's layered on top of the protocols, so it

63:34 | may use something native. So one then uses whatever is provided in terms of

63:44 | the CPUs, because CPU-to-CPU, or socket-to-socket, communication

63:51 | on a circuit board uses whatever the CPU vendor, uh, built into the platform; vendors like

64:03 | HP, Dell and others make multi-socket boards. So, in terms of protocols,
|
|
64:15 | I don't remember exactly, Intel may have used the Omni-Path between its

64:24 | sockets, and each vendor has its own technology, so they use their own protocols for

64:31 | communicating between sockets on the same node. And, uh, Open MPI,

64:41 | it uses whatever is native to the platform, uh, to do the communication

64:49 | between the NUMA nodes. So it maps onto the more native protocols being

64:56 | used. Now, so, but in the programming model, you can use
|
|
65:03 | MPI to, um, basically, um, treat each core as an

65:12 | independent computer, and the communication between cores uses MPI. And then it's

65:18 | up to the MPI implementer to decide what mechanisms they actually use to do the

65:24 | communication, given what they know about where the data is. So the

65:33 | MPI model treats, uh, any core on any socket as an equal citizen, and

65:41 | then the implementer handles what it looks like underneath. The difference is, um, um,

65:47 | okay, right, so it plugs the native stuff in, whereas MPI may

65:53 | have more layers on top. And then it's a question of which mechanisms the implementers

65:58 | have decided to use, and I'll come back to that as we

66:07 | go on. And the other question, the thing that starts to make a difference
|
|
66:23 | from what I've said so far is, um, that clusters are a collection of

66:33 | independent computers. So, and that's behind this notion of distributed memory systems that I have

66:39 | not commented on all that much. So when we talked about OpenACC

66:45 | and accelerated nodes and heterogeneous nodes, one had to deal with two memory

66:53 | spaces, the GPU memory and the CPU memory. When it comes to clusters, um, in

67:01 | addition, there is a separate memory space for the different nodes, and we'll come

67:11 | back to that aspect. But the other thing is that the nodes are

67:17 | tied to the interconnection network, whether it's a proprietary network or a standard one,

67:25 | and it still is, mhm, explicit in some form, as opposed to the CPU

67:34 | protocols, as we just discussed. Um, there need to be basically different protocols to

67:41 | deal with communication between different nodes. So the first level, in terms of how I'll talk
|
|
67:49 | about this, then, is to have a high-level way of capturing this notion of, or factoring

67:57 | the communication network into, what I was talking about, um, performance. And this

68:07 | is unfortunately another area, I guess, where, uh, things are ambiguous

68:16 | in how people use the terms of latency and delay.
|
|
68:24 | So, when we talked about the individual nodes, there was a bit of ambiguity, as I

68:30 | mentioned; when it comes to, we talked about what it means in terms of a

68:35 | processor or CPU: is it the core, or is it, uh,

68:40 | quote unquote, the package that you plug into a socket. So for this

68:49 | class, I will try to be consistent and use these notions, the notions of
|
|
68:59 | latency and delay, as follows. So for delay, I try to use it basically as the

69:07 | all-encompassing time for getting the message, or data, from one place to another,

69:22 | and that total delay is then dependent on the latency and the time that it

69:33 | takes to get, quote unquote, the good data, the payload, from point A to point

69:44 | B. So, so for the latency, um, then, it has a few components to

69:51 | it. There are overheads in the sender and the receiver to deal with getting the

70:03 | data from point A to point B, and then there is basically the time of flight

70:15 | that is, in many cases, on the order of the speed-of-light delay,

70:23 | maybe not quite, but one can think of it as the speed of light.
|
|
70:28 | , when one has said they're just first of a single communication links. |
|
|
70:34 | it's just the point to point type transfer. You have to go through |
|
|
70:40 | protocol stack on the sending and to out to build the packets, since |
|
|
70:47 | of them are packet oriented. But that when I talk more about networks |
|
|
70:52 | some future class, whole talk about protocols for doing it. But, |
|
|
70:57 | know, one protocol has again the one up the Ethernet has the protocol |
|
|
71:04 | that you're taking a networking class, probably know about it. And there's |
|
|
71:08 | bunch of things time being put And you may have walked Mississippi I |
|
|
71:12 | that has a header that needs to put together in terms, source and |
|
|
71:17 | and all kinds of other information that into that. And it takes time |
|
|
71:22 | build up. And And then, , it has to, you |
|
|
71:28 | get put on the warriors, so speak. And yet the first people |
|
|
71:34 | get to the other end, and the receiver has thio uh, |
|
|
71:40 | uh, figure out how to interpret it comes in on, figure out |
|
|
71:46 | to do with. So Leighton is the way I well, and |
|
|
71:53 | hope I will be consistent in using it, just this collection of overheads that is there

72:00 | no matter how small the message sent is. That's not entirely true;

72:08 | sometimes it may depend on the size of the message, but to first order

72:13 | it's something that is relatively independent, if not totally independent, of the payload. And then

72:20 | there's the other part that is definitely dependent on the payload. And then the
|
|
72:27 | next slide just illustrates a little bit the impact of, depending upon the balance of

72:34 | these two aspects, the latency and the bandwidth that you get of the link

72:41 | between the endpoints, and clearly, the higher the payload, uh, the

72:50 | less the impact is of the latency, the overhead kind of things that you have.

72:56 | And that's pretty much what the graph shows: that, you know, the effective bandwidth

73:01 | is, um, such that for small messages the effective bandwidth is low, and

73:09 | it's kind of dominated by the latency, whereas with a larger data packet,

73:19 | a larger payload, there's less influence of the latency, so that's pretty much

73:25 | the take-home message from this slide. Um, now, things get more
|
|
73:34 | complicated than this very simplistic model illustrates, because in reality, for these large

73:44 | systems, you don't interconnect every node with every other node, but you have some

73:53 | form of interconnection network, with switches, etcetera, between the two endpoints. So both latency and

74:05 | bandwidth get affected by how this network is put together and what the load is.

74:11 | So the network topology plays a role, in that it may cause some of the links to

74:19 | be shared by many paths between endpoints, as in the Internet. It depends on the
|
|
74:29 | routing protocol being used; it depends, obviously, on how much load you put on the network,

74:35 | and that, in a sense, depends on where data gets allocated in this cluster

74:41 | or MPP, and then what particular algorithm or application is being used, and

74:49 | how you access data across the network. Then you may also have message priorities that

74:58 | affect how things get done. And you can also carve up the bandwidth on links to

75:06 | give, um, different priority to different streams of data between different processes,

75:15 | depending on what is happening. And what, um, also happens in these

75:22 | large networks, if you are in the Internet or communication network business, is that

75:31 | one also gets this notion of software-defined networks, where you can have, uh huh, network capacity

75:41 | being managed and allocated, and not just use the raw capabilities of links and switches.

75:50 | And, uh, the next thing, as my time is up,
|
|
75:57 | is, um, for anyone interested, um, again, more or less an

76:05 | alert of: be careful as to what you read. Um, and then

76:13 | I have a bunch of slides here that I found kind of entertaining, more than anything

76:19 | else, in terms of people doing benchmarking, and what happens when vendors do comparative

76:29 | benchmarks between their own products and competing products. Because, you know, everybody can tweak

76:35 | their stuff, or use some customer setup to test their stuff against somebody else's.

76:42 | And the next few slides here, those I will let you peruse. I won't talk

76:48 | about them today or next time, but you can see, for some of the common
|
|
76:54 | codes that are being used in scientific and engineering computation: Fluent is a fluid dynamics

77:01 | code that is commonly used out there. There is another one, LS-DYNA, that is a

77:08 | code that, um, is commonly used in the auto industry as well as by the Department

77:14 | of Defense; trying to figure out, you know, the DoD, that was the original producer of

77:21 | this code, they always had tried to figure out what happens when you try to

77:24 | blow something up; uh, car makers want to figure out what happens in car

77:30 | crashes. So, and I think there was another one; I can't remember, I

77:38 | thought it was a third one, yes, that is a quantum chemistry
|
|
77:45 | code. For anyone interested, you can see what the companies argued about each other's

77:52 | results when doing the same set of benchmarks. And what I found, from one of them, in

77:56 | this case, was that the three commonly used benchmarks, oh, they are well defined, and

78:03 | both companies in this case used, um, a similar type of environment. And, well,

78:10 | here I'll just flip through the slides. And then there's also a little bit about this

78:14 | Slingshot network, but I'll let you peruse the slides. And then next
|
|
78:19 | time I will start to talk a little bit about this cluster programming. So I will

78:24 | end here today; I'll take questions, and I'll start with cluster programming and MPI

78:31 | next time. I will stop sharing now and take questions. So today I

78:46 | wanted to give you, yes, basic background on scalability in terms of parallelism that needs to

78:52 | be managed, and to add the additional component of the communication network in there, and

78:59 | to introduce a very simplistic model in terms of latency and bandwidth for capturing the

79:06 | notion of performance of communication networks. Dr. Johnson, one clarification:
|
|
79:24 | on the bonus problem for 1.1: so, should both inner product and saxpy be run with the

79:32 | one amongst the three cases you found to have the best performance, for all cases of,

79:37 | um... So when it says amongst the three cases, we're talking about, uh, inner-loop

79:42 | parallelization, outer-loop parallelization, and both-loops parallelization, um, and, yeah, go

79:49 | ahead. Then, for the next part, it says, for all cases; is that

79:53 | referring to the sizes? That's my understanding, yes. But that's for all the thread

80:03 | cases and all the matrix sizes. So, in terms of,
|
|
80:11 | yes, well, right, the strategies are the inner loop, the outer loop, and both

80:17 | loops: whichever you found to be the best performer, uh, from

80:22 | the normal measurements in problem one point one, I believe; whichever was

80:29 | the best performing, you use that one, with the O3 optimization flag, the compiler

80:39 | flag, and measure the performance with all the matrix sizes as well as all the thread

80:46 | cases. Okay, so, sorry, can you repeat: so when it says amongst the

80:57 | three cases, I mean, we are choosing one of the three? Yes,

81:01 | either inner-loop parallelization, outer-loop parallelization, or both-loops parallelization.

81:09 | Uh, okay, so we'll choose one of inner, outer, and both, and then

81:18 | run everything? Okay. Outside of that, a question: so it was very obvious that,
|
|
81:25 | um, the, the both, um, the first, the,

81:34 | so one-slash-24 or 24-slash-one, or whatever the case may be, the first and

81:39 | last ones are essentially the same as only parallelizing the inner and only parallelizing the

81:45 | outer. Um, is there overhead in specifying the one, or

81:52 | does it end up being exactly the same as if we had just parallelized the outer

81:58 | and just the inner? Um, in my opinion, there should not be any

82:06 | overhead because, um, the outer loop, in that case, even if you

82:13 | do not specify a pragma for the loop, it should be handled by the master

82:20 | thread. And if you define a number of threads of one for that

82:27 | loop, it's basically just a serial statement. Okay? Because they're, they're pragmas

82:35 | that end up getting expanded by the compiler anyway. Right? So I would

82:38 | imagine. Okay, yeah, so it is going to be run by the master thread,

82:44 | yeah, as long as we're using one thread in that loop. Okay,
|
|
82:57 | okay. Thanks for the very good questions. Any other question? Okay. If
|
|
83:16 | not, then I will end today's session, and thank you. We'll be back on,

83:22 | Thank you. Thank you. Thank you.
|