© Distribution of this video is restricted by its owner
00:06 | I won't say what happened there. right, so a little bit more |
|
|
00:12 | the interconnection network. So last time I talked a little bit about networks |
|
|
00:23 | are used, in particular, in clusters designed for high performance, in which the network is |
|
|
00:32 | integral and a very important part and represents a nontrivial cost of the system |
|
|
00:40 | trying to give you also some cost models, again to get some sense for how |
|
|
00:53 | cost is kind of related to the topology, the way the networks are being |
|
|
00:59 | together as well as a little bit . They're all characteristics in terms, |
|
|
01:05 | capabilities, in particular diameter, latency, and also |
|
|
01:15 | bisection, with respect to kind of congestion in some sense of the |
|
|
01:23 | And ended up, I guess, the lecture by talking about fat trees |
|
|
01:30 | has become, I will say, the dominating interconnect for logical clusters designed |
|
|
01:36 | performance. All right, so step a little bit, uh, talk |
|
|
01:46 | crossbars that have been used and so on. Then we'll talk about the |
|
|
01:54 | of so-called multistage networks. And in fact, the fat trees are mostly used |
|
|
02:00 | as multistage networks in the sense that the computers are the leaves of the |
|
|
02:06 | and all the internal nodes of the tree are in fact stages of switches |
|
|
02:13 | just used for switching. Um, then a caveat: some of the networks |
|
|
02:21 | have what's known as combining features, so they can do reduction in the network, |
|
|
02:28 | instance, as well as replicating messages broadcasting. All right, so I'll |
|
|
02:36 | about the crossbar first and then the network and then towards the end talk |
|
|
02:41 | little bit about routing in this. Yeah, all right, the |
|
|
02:48 | crossbar looks like this. So the nice part about the crossbar is that |
|
|
02:55 | crossbar like this one is also known as non-blocking. So that |
|
|
03:01 | any one of the nodes on the left-hand side can send messages to any one |
|
|
03:07 | the nodes on the top. And there's no contention. So |
|
|
03:15 | is a pathway between any pair of nodes without causing blocking of other messages. |
|
|
03:22 | that's one of the benefits of the crossbar. But it is, |
|
|
03:30 | , also relatively expensive to implement. That's why its usage has been |
|
|
03:43 | , and it depends on the technology and the size of the networks, or |
|
|
03:49 | Mom wants ability, whether it's useful or not. And I showed an |
|
|
03:54 | . I think, last lecture, on fully interconnected networks, which is effectively then the |
|
|
04:00 | equivalent of a crossbar that was or is used in some systems today, notably by Cray |
|
|
04:12 | that is now owned by HPE. Anyway, so here is kind |
|
|
04:18 | one very kind of famous example. The computer is by now more or less obsolete. But |
|
|
04:24 | it created a big revolution when it was built, the so-called Earth Simulator |
|
|
04:29 | embarrassed the US by being the most powerful computer at the time by a good |
|
|
04:35 | . And it caused the U.S. to start several programs to build |
|
|
04:44 | in the number one throne, so to speak, in high performance computing. |
|
|
04:50 | this computer, when it was built, required its own power station. It |
|
|
04:53 | very power hungry. Well, that's so much for this particular slide that |
|
|
04:59 | shows that this system effectively implemented the crossbar between 640 processing nodes. And as |
|
|
05:11 | can see in the middle of the slide, implementing this, uh, required |
|
|
05:17 | almost 3000 kilometers' worth of cabling in order to get this, um, |
|
|
05:26 | And as you can see on top of the slide, there was a building built for |
|
|
05:31 | for the computer, and again the power station was built next to it in |
|
|
05:37 | the power of the computer. Here's a little bit of what it may look |
|
|
05:41 | in terms of how this is put together. There is one floor for the |
|
|
05:46 | , and then there's two floors underneath doing the infrastructure for power and cooling |
|
|
05:54 | once or alone, Well, not height. Floor is just used for |
|
|
05:59 | of the computer. Anyway, that's one thing that anyone in this |
|
|
06:06 | at least have heard of at some point in their life. Now there is |
|
|
06:12 | example that crossbars are being used: Cray computers again. One of the |
|
|
06:20 | synonymous with supercomputers since the founding of the company by Seymour Cray, the legendary |
|
|
06:28 | architect. They design their own crossbars, which are kind of distributed |
|
|
06:37 | hierarchical crossbars, which they're trying to illustrate on this |
|
|
06:42 | So then they build networks using this switch, which is a 64-port |
|
|
06:51 | , and this picture shows a little . Tires put together as, |
|
|
06:56 | what's known as tiles, two-port tiles that are organized in |
|
|
07:05 | of kind of a mesh, four rows and eight columns of these switching |
|
|
07:12 | that are 16-by-8 crossbars. They are full crossbars, and in |
|
|
07:19 | this particular switch chip, in fact, it's not just this one, |
|
|
07:26 | but, as it says on the hand side of this slide, there are in |
|
|
07:30 | , five crossbars in one piece of silicon. And they did that in order to |
|
|
07:38 | separate different types of traffic. So the way the communication protocol is |
|
|
07:48 | implemented is that they use separate channels for setting up communication or |
|
|
07:57 | in the system. Uh, and do not interfere with the actual data |
|
|
08:04 | that has a separate data channel. In this case, so, this chip |
|
|
08:10 | designed about when he was put in two years ago. It was signed |
|
|
08:15 | from the backside. I suspect there will be something new coming up in a year |
|
|
08:19 | two. But anyway, at that time, each of the links, if |
|
|
08:26 | like, uh, each port on this particular switch, has a capability of 200 |
|
|
08:34 | gigabits per second, and the latency to get through the switch was, |
|
|
08:39 | average, 350 nanoseconds. And I'll talk a little bit more later on |
|
|
08:46 | how they use the switch to put together a network to interconnect larger systems |
|
|
08:54 | 64 nodes. So, any questions on the crossbar or this particular chip? So |
|
|
09:12 | question that was, eh? So not sure that there will be any |
|
|
09:17 | nable inside, but at work, do a lot of, um, |
|
|
09:24 | defined infrastructure. Um, just becoming popular for businesses nowadays. Um, |
|
|
09:30 | any of the virtualization aspects translate? Is it all lost? I'm talking |
|
|
09:38 | , like, the ability to um, certain, um, parameters |
|
|
09:43 | metrics. Is all of that lost as soon as we start virtualizing any |
|
|
09:49 | our resources, um, or would we have to have, like, a |
|
|
09:54 | defined allocator to really understand what's going underneath the hood and these virtual |
|
|
10:02 | I do believe that is correct. , so and I don't have, |
|
|
10:13 | know, a crisp answer for you about virtualization in these networks. So if you |
|
|
10:20 | at sort of wide area or local network or your typical things for when |
|
|
10:27 | use the Internet, and so on, then one has virtualized it and built a software |
|
|
10:40 | network layer on top of the actual infrastructure. And then there is also |
|
|
10:45 | quality of service aspect where you give priorities to different types of traffic. |
|
|
10:53 | So a form of virtualization of the network I'm not aware has been used |
|
|
11:03 | cluster settings. But the quality-of-service aspect is there. So you can |
|
|
11:09 | , different types of traffic in the . And that's in fact supported in |
|
|
11:14 | switch. And it's also supported in what is made by the other dominating, |
|
|
11:23 | , vendor in terms of networks for high performance computing, that is Mellanox, that |
|
|
11:31 | , um, they offer quality of service, or prioritization of traffic, in the |
|
|
11:41 | Okay, that makes sense. And I suppose that in any sort of HPC |
|
|
11:46 | you wouldn't be interested in virtualization anyway because of the overhead that's associated |
|
|
11:53 | with the virtualization itself, right? Um, so I can |
|
|
12:04 | both argue for and against because, you know, most of the large-scale |
|
|
12:13 | there are large scale because they tend to favor, in some sense, extreme applications that |
|
|
12:23 | either the entire system or large fractions of the system. Otherwise, they could |
|
|
12:30 | well have built, if you will, smaller, cheaper systems than these extreme |
|
|
12:35 | systems. On the other hand, you do things like, uh, |
|
|
12:42 | cloud service providers that also have very large systems, the virtualization may actually help |
|
|
12:51 | of wall off different applications from each other. So in that case, having |
|
|
12:57 | software-defined network on top of the physical infrastructure may, in fact, |
|
|
13:06 | in terms of the security aspect and interference, from a security perspective, between different jobs running |
|
|
13:13 | the system at the same time. But in terms of a system that's pushing |
|
|
13:23 | performance, it probably isn't helpful to virtualization. It may not be worth |
|
|
13:31 | effort. Okay, that makes sense. I was thinking more in terms of |
|
|
13:36 | using software to, um forced you a certain part of memory has to |
|
|
13:43 | , you know, certain thrashing or distance communication, long distance with quote |
|
|
13:50 | it, the academic sense, so that's more in terms of the |
|
|
13:56 | management and trying to figure out how to allocate, um, the processes, |
|
|
14:01 | if it's MPI, across the nodes of the system and trying to get them |
|
|
14:09 | so it does not cause too much congestion or high latency in the network. And |
|
|
14:19 | come back to the way the way is computer companies. So one is |
|
|
14:25 | the fat trees again: if they're fully built out, then effectively there's, |
|
|
14:35 | enough bandwidth to think of them as paths between each of the leaf nodes |
|
|
14:42 | causing contention in the network. and in case off using what I |
|
|
14:55 | , I believe last time. And know I have some. No one |
|
|
14:59 | in today, I guess, the Dragonfly network that this company is using in their |
|
|
15:05 | um, it is kind of an alternative to the fat tree. They use basically |
|
|
15:13 | crossbars. So in principle there should not be too much contention. But it |
|
|
15:21 | on the routing that we'll talk a bit about towards the end of today's |
|
|
15:27 | . So, um, but the resource manager, if it |
|
|
15:33 | some smarts, tries to minimize traffic in the network by suitable |
|
|
15:46 | but it is also affected by how routing is done. Okay, |
|
|
15:51 | thank you, Dr Johnson. This a question. Good question. |
|
|
15:55 | the I again for since there's no but the network and the principle of |
|
|
16:04 | these networks is a bit different, is also quite interesting in the extent |
|
|
16:10 | which companies decided that, um, they need to design and build |
|
|
16:19 | own switchgear and not rely on commodity networking parts for these |
|
|
16:29 | performance systems. That's what I was ask. It seems like this. |
|
|
16:35 | network interconnect will be a lot more than things we've seen in the past |
|
|
16:42 | this course, uh, or the network that has kind of no come |
|
|
16:54 | gone in popularity too. So don't . But I would say the last |
|
|
17:01 | years, years if, um, been just a very small part of |
|
|
17:10 | market and very much just for the , Uh, performance systems in some |
|
|
17:18 | not necessarily extreme scale, but where the applications are such that no matter how |
|
|
17:25 | try to design your algorithm, there will be a lot of interaction between |
|
|
17:31 | nodes. So that's why I mentioned the MPPs, where one wants |
|
|
17:39 | the network not on the I/O bus but basically at the same level as |
|
|
17:46 | your memory. So the network was kind of a first-class citizen in |
|
|
17:50 | sense that main memory isn't the basically the network and to the processor |
|
|
17:58 | the same level as the Level three is, um, but commodity that |
|
|
18:06 | is cheaper. Ah tends to out . His proprietary stuff, uh, |
|
|
18:14 | a few years, but then it of catches up. Oh, and |
|
|
18:21 | has designed that is, again the on the high end of the |
|
|
18:26 | In terms of performance, they pretty much always have had their own network |
|
|
18:36 | They sold off. Not this There's no nothing shot, but I |
|
|
18:46 | two generations back, they sold their network technology to Intel, which was |
|
|
18:52 | trying to integrate it into their efforts on their own network. Intel does also |
|
|
19:03 | processors, in addition to doing, to use. But it has kind |
|
|
19:08 | struggled a bit to keep up with even, uh, recently high volume |
|
|
19:18 | , but so on a zai mentioned is higher volume and cheaper and |
|
|
19:30 | uh, again, difficult to make at modest, if not low, cost, |
|
|
19:38 | even that tends to follow other protocols a few years. Delay in terms |
|
|
19:45 | the adoption in terms of even higher . But the protocol has a higher |
|
|
19:53 | , if one strictly follows the Ethernet protocol. So again, when it comes |
|
|
19:59 | the high end, it's hasn't been , um, why did Dr. |
|
|
20:08 | if you go sort of stepped down in the fairly and sort of high |
|
|
20:15 | oriented systems, then you'll find But not for applications that many fluid |
|
|
20:21 | and other applications that require a lot of interprocessor communication. That is still |
|
|
20:30 | the dominating one if you can afford it and need the performance. But it should also |
|
|
20:37 | said that the and I'll mention it on that, uh, because off |
|
|
20:49 | prevalence of Ethernet in all kinds of situations, even vendors of switch |
|
|
20:59 | for HPC systems, um, with their initial, um, focus, I |
|
|
21:09 | say, on getting the best possible performance out of their switchgear. They have chosen |
|
|
21:16 | also figure out how to support the protocol on their switchgear and, |
|
|
21:23 | by their underlying designs, being very focused on performance that can even get |
|
|
21:32 | better performance on Ethernet than your standard vanilla switch that |
|
|
21:38 | may get from, You know, not trying to put Cisco down and |
|
|
21:44 | of the other Ethernet switch vendors, their focus is not necessarily being in |
|
|
21:50 | the minimum latency but probably more in terms of throughput and being competitive in terms |
|
|
21:56 | cost. So the company I mentioned, Mellanox, they are basically focused |
|
|
22:07 | as an InfiniBand company. But in order to reach and get a bigger |
|
|
22:12 | , they also support Ethernet on their and do so competitive in terms of |
|
|
22:20 | with pure Ethernet vendors. And it turns out that Cray, for this |
|
|
22:27 | Um, kind of, they do support Ethernet on it but have their own |
|
|
22:37 | , so to speak, um, the performance sensitive applications. But they |
|
|
22:43 | also talk to standard Ethernet devices and figure out how to translate the standard Ethernet |
|
|
22:50 | into their own, which has a lower overhead than standard Ethernet. |
|
|
23:01 | So, okay, any more questions or comments? All right, so talk |
|
|
23:18 | about the multistage networks. As I said already, the way the fat trees are |
|
|
23:23 | is, in fact, as multistage networks. But there are a few of them |
|
|
23:29 | is still around. And as a bit, I guess our evolution of |
|
|
23:37 | how to build high performance systems, I wanted you to at least be familiar |
|
|
23:43 | the terms. So before that, I'm gonna start talking about shuffles and perfect |
|
|
23:50 | and talk about the shuffle-exchange network. And this is an example of what |
|
|
23:54 | a shuffle is, the one like a card shuffle, if anyone knows |
|
|
24:01 | the card shuffle is, and this is exactly what happens here. So particular |
|
|
24:05 | thing with, uh, eight cards, and typically a card player |
|
|
24:13 | cuts the deck in half and then interleaves the cards. And that's simply |
|
|
24:17 | how this shuffle happens. So there are networks that, in fact, implement that, and I |
|
|
24:22 | show you in the subsequent slides how they're used. The shuffle interconnect |
|
|
24:29 | used in many multistage networks, and you can do the first thing, |
|
|
24:34 | then, so it's very simple how you get the destination address from the source if you |
|
|
24:40 | to do, um, kind of shuffle routing in the network, |
|
|
24:45 | what you need. And sometimes you to do that. So you think |
|
|
24:50 | sounds like an odd thing. But in fact, that's often what |
|
|
24:56 | happens when you do, say, a change from row-major to column-major ordering. In fact, |
|
|
25:09 | the type of permutation that is required to do that, in fact, can be |
|
|
25:15 | described as a shuffle or unshuffle. So it's, in fact, a primitive |
|
|
25:22 | that can be very useful in many kinds of computations on a network. So a network |
|
|
25:30 | that supports this type of communication pattern tends to do well for many applications, |
|
|
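The shuffle described above can be sketched in a few lines. This is a minimal illustration, not something taken from the slides: the function names `perfect_shuffle` and `shuffle_address` are made up here, and the sketch assumes the number of items is a power of two. It shows the two views the lecture mentions: interleaving the two halves of a deck, and computing where an element ends up by rotating the bits of its address one step to the left.

```python
def perfect_shuffle(items):
    """Cut the sequence in half and interleave the two halves (card-style shuffle)."""
    half = len(items) // 2
    out = []
    for top, bottom in zip(items[:half], items[half:]):
        out.extend([top, bottom])
    return out


def shuffle_address(i, n_bits):
    """Position where element i lands: a cyclic left shift of its n_bits-bit address."""
    msb = (i >> (n_bits - 1)) & 1
    return ((i << 1) & ((1 << n_bits) - 1)) | msb


cards = list(range(8))
shuffled = perfect_shuffle(cards)

# A 2 x 4 matrix stored in row-major order (hypothetical labels) ...
row_major = ["a00", "a01", "a02", "a03", "a10", "a11", "a12", "a13"]
# ... becomes column-major order after one perfect shuffle.
column_major = perfect_shuffle(row_major)
```

With eight cards, the element at address 4 (binary 100) moves to position 1 (binary 001), which is the bit rotation at work.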
25:37 | here is an example now of, right, a 16-port |
|
|
25:44 | omega network. And if you look at how the things are drawn here |
|
|
25:52 | um, the you will see that look at the first kind of stage |
|
|
26:01 | the bottom half of the nodes or ports are interleaved when you enter |
|
|
26:09 | first blue box column. So this is, you know, a shuffle on the |
|
|
26:16 | bits, which in this case is used in each stage to build this omega network. |
|
|
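To make the omega network concrete, here is a small sketch of destination-tag routing through a 16-port omega network built from two-by-two switches. It assumes the standard textbook construction (a perfect shuffle in front of every switch column); the slide itself is not visible in the transcript, so the details may differ from the drawing. The names `rotate_left` and `omega_route` are made up for this example.

```python
def rotate_left(addr, n_bits):
    """Perfect shuffle on a port address: cyclic left shift of its bits."""
    msb = (addr >> (n_bits - 1)) & 1
    return ((addr << 1) & ((1 << n_bits) - 1)) | msb


def omega_route(src, dst, n_bits=4):
    """Return the port positions visited from src to dst (16 ports when n_bits=4).

    At every stage the wiring shuffles the position, then the 2x2 switch
    picks its upper (0) or lower (1) output from the next destination bit,
    most significant bit first.
    """
    pos = src
    path = [pos]
    for stage in range(n_bits):
        pos = rotate_left(pos, n_bits)
        bit = (dst >> (n_bits - 1 - stage)) & 1
        pos = (pos & ~1) | bit
        path.append(pos)
    return path


path = omega_route(5, 12)
```

After four stages every bit of the position has been replaced by a destination bit, so the message arrives at `dst` regardless of where it started. There is only one such path per source and destination pair in this construction.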
26:27 | , so, um yes, What wanted to say was about so sometimes |
|
|
26:37 | the way these networks were used was that you had processors on one side, |
|
|
26:44 | , on the left side, and what was on the right side was memory |
|
|
26:49 | And this was kind of known as the dance-hall kind of approach to computer architecture |
|
|
26:56 | , you know, boys on one side and girls on one side, in this |
|
|
26:59 | , CPUs on one side and memory modules on the other. And these |
|
|
27:09 | , have been used. So IBM built something they called the RP3, |
|
|
27:16 | research parallel processor prototype, and it never made it into products. It did |
|
|
27:22 | serve as a learning vehicle, but then the result, in some sense, was what's |
|
|
27:28 | as the SP2. That was the scalable parallel processor. That was the |
|
|
27:35 | , Andi, this was kind of . This started building this type of |
|
|
27:41 | that is illustrated for a few different sizes of total systems. Um, so |
|
|
27:49 | what's kind of known as the based , and it zaps you. It |
|
|
27:55 | like the same as you read. is not in terms of inter |
|
|
27:58 | but the structure is kind of similar. And, um, the red arrows |
|
|
28:06 | the top show in which direction it shuffles. So if you go like |
|
|
28:11 | to right, some interconnection stages in the network are rather an unshuffle than a |
|
|
28:18 | , and you can see that they're of different ratings of different number of |
|
|
28:22 | involved in the different stages in building network. This is something that has |
|
|
28:28 | been used on. Here is another that is similar to the previous one |
|
|
28:35 | one of the stages is what's known as a butterfly interconnect. So, anyone |
|
|
28:42 | about butterflies, except the ones flying around, in terms of either algorithms or |
|
|
28:49 | before? No takers? If anyone knows about fast Fourier transforms, no? |
|
|
29:11 | , fast Fourier transform. The data pattern is, in fact, exactly a |
|
|
29:16 | butterfly, so it's very common and it's actually a very good form of |
|
|
29:23 | networks, and I will talk a little more about it. However, this |
|
|
29:29 | network happens to use one of these a butterfly interconnection for one of its |
|
|
29:34 | And building a multistage network, here is a fully configured butterfly network, |
|
|
29:41 | and this is exactly one way of drawing the data interaction in computing |
|
|
29:49 | fast Fourier transform. So if you were to do that, this would |
|
|
29:55 | kind of a perfect interconnection network because these exact data interactions will be directly |
|
|
30:02 | by links in the network. So this network was used |
|
|
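The match between the butterfly wiring and the FFT data pattern can be seen by listing which elements interact at each step. The sketch below is an illustration under assumptions: it uses a power-of-two number of points and numbers the stages from the least significant address bit, while the actual stage order depends on which FFT variant is used. The name `butterfly_partners` is made up here.

```python
def butterfly_partners(n_points, stage):
    """Pairs of indices that exchange data at one butterfly stage.

    Element i interacts with the element whose address differs from i
    in exactly one bit, the bit numbered `stage`.
    """
    bit = 1 << stage
    return [(i, i ^ bit) for i in range(n_points) if i < (i ^ bit)]


stage0 = butterfly_partners(8, 0)   # neighbours: distance 1
stage2 = butterfly_partners(8, 2)   # far apart: distance 4
```

An 8-point transform has three such stages, each with four pairs. A butterfly network with a link for every pair in every stage therefore carries each FFT data exchange over a direct link.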
30:13 | by a company called BBN, Bolt, Beranek and Newman, as |
|
|
30:19 | I remember is what the acronym stands for. And they became perhaps |
|
|
30:30 | famous not for this butterfly machine; they were the ones that in fact |
|
|
30:38 | in the initial DARPA project for the ARPANET, the precursor to the Internet, establishing |
|
|
30:47 | connectivity between the East Coast and the West Coast. And it was based on |
|
|
30:52 | switchgear that they built, uh, this company, BBN. |
|
|
30:57 | They're not in the switch business anymore, but they are, among |
|
|
31:02 | things, a defense contractor. And here is a little bit of sort of |
|
|
31:09 | you can lay this kind of butterfly out, and I'm not going to |
|
|
31:15 | too much into the details, someone plugging in the numbers that I asked |
|
|
31:18 | a little bit about last time. it turns out it's a very good |
|
|
31:23 | trade-off in terms of diameter and bisection width and the cost of |
|
|
31:29 | building the network. So it's not only good for doing an FFT-like |
|
|
31:35 | computation, which is kind of a divide-and-conquer style algorithm, but it's also good |
|
|
31:44 | for building networks. And I think next slide shows a little bit |
|
|
31:50 | um, different drawing than I had on the previous graph, |
|
|
31:59 | it's essentially the same thing laid out , and it shows how they |
|
|
32:05 | In fact, I think it was Dally again that I mentioned before in |
|
|
32:11 | terms of being, I guess, at NVIDIA, the VP for research at the |
|
|
32:20 | Um, you also came up with I'll talk about later in terms of |
|
|
32:25 | when he was a graduate student at Caltech. So anyway, this is |
|
|
32:29 | things can be laid out in terms the Butterfly Network, and I think |
|
|
32:35 | the next slide, I know, it comes, I think on this |
|
|
32:41 | to them. So, um, that is, Bill Dally, after |
|
|
32:53 | pushing for this butterfly network instead of fat trees as an option, he, |
|
|
33:03 | , came up with this notion of what they call the Dragonfly network. |
|
|
33:09 | , he did that, I think, while consulting for the Cray computer company. |
|
|
33:17 | and here is how Cray, in fact, put together their Dragonfly |
|
|
33:27 | for their computer systems today using the switch that I mentioned early on in |
|
|
33:33 | lecture. So on the left-hand side of the slide, it shows how |
|
|
33:41 | kind of built the networks into, in some sense, three layers. The lowest |
|
|
33:50 | one, I think, closest to the computer, is just connectivity directly to the switch |
|
|
33:59 | The way they use this 64-port switch is that they primarily or typically |
|
|
34:07 | 16 of those 64 ports for connecting nodes. So that means they |
|
|
34:15 | another 48 ports left that they can use to interconnect switches. And on the |
|
|
34:22 | hand side, what is shown is what they call a group, which, |
|
|
34:30 | on the left-hand side, consists of 32 switches, and those switches are fully |
|
|
34:40 | . So of the 48 ports that are not used to connect up compute |
|
|
34:48 | they use 31 for a switch to connect to, uh, the |
|
|
34:57 | switches in the same group. So that is enough communication channels to connect |
|
|
35:05 | to 31 other switches. And that then leaves 17 ports to connect between what |
|
|
35:19 | , uh, groups of 32 switches. So if one works out all |
|
|
35:27 | the math of this kind of way of building up the |
|
|
35:33 | one ends up with up to 544 that can be connected up. And |
|
|
35:41 | more than a quarter of a million processors in all, so that you can get |
|
|
35:45 | good systems with a maximum of three hops in the network. So that |
|
|
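The port budget just described can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: the constants come from the lecture (64-port switch, 16 node ports, 32 fully connected switches per group), but the rule that every pair of groups is joined by exactly one global link, and so the resulting maximum group count, is an assumption added here. The function name `dragonfly_budget` is made up.

```python
def dragonfly_budget(switch_ports=64, node_ports=16, switches_per_group=32):
    """Work out the port split and maximum system size for one Dragonfly design."""
    local_ports = switches_per_group - 1                 # links to the other switches in the group
    global_ports = switch_ports - node_ports - local_ports
    global_links_per_group = switches_per_group * global_ports
    # Assumption: one global link between every pair of groups.
    max_groups = global_links_per_group + 1
    nodes_per_group = switches_per_group * node_ports
    return {
        "local_ports": local_ports,
        "global_ports": global_ports,
        "global_links_per_group": global_links_per_group,
        "max_groups": max_groups,
        "nodes_per_group": nodes_per_group,
        "max_nodes": max_groups * nodes_per_group,
    }


budget = dragonfly_budget()
```

With these numbers each group has 544 global links, and the total comes out above a quarter of a million nodes, in line with the figures quoted in the lecture.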
35:55 | , and since a lot of them directly connected, there is not many |
|
|
36:02 | that are being shared and can cause contention. And on the right-hand |
|
|
36:09 | it's a little bit Let's march your . So in that case, what |
|
|
36:15 | is proposed is, instead of using a single link between, say, switches in a group |
|
|
36:23 | one can use a pair or a set of links. That means instead of having |
|
|
36:32 | to 31 as on the left, you connect to 15 of the |
|
|
36:42 | , so you get basically 16 switches in a group, and then |
|
|
36:53 | the right hand side. They also then to up links to connect to |
|
|
37:01 | part group in connecting to different groups. But it's still a substantial number of |
|
|
37:11 | compute nodes that you can accommodate in a single system with three hops. So |
|
|
37:18 | this is, in fact, the type of network that is being used in |
|
|
37:26 | three so-called exascale systems that so far have been ordered in the United |
|
|
37:35 | . So this company has, in fact, won all the extreme-scale computer, |
|
|
37:44 | , biddings that have been made in the last two years in the US |
|
|
37:53 | so let's see what I had That was next. Another network. |
|
|
37:58 | any questions of this is a little related Ireland questions on this case, |
|
|
38:04 | for contention, and now I think have this slide towards the very end |
|
|
38:09 | today's lecture, coming back to this particular network and a little bit how |
|
|
38:14 | deal with routing in these Dragonfly networks. Okay, so here's another network |
|
|
38:29 | has been, um, also used order that results. And it's a |
|
|
38:36 | type network, and it was originally for building phone switches. So this |
|
|
38:43 | something that cannot on their that's way and potentially also larger phone networks, |
|
|
38:50 | definitely see in bending phone switches. it is a recursive network. So |
|
|
38:58 | it shows kind of one recursive step of how you can build these networks |
|
|
39:04 | . This was also I guess I kind of a bit of simulation of |
|
|
39:08 | recursion in the network works, basically implementing the equivalent of a full crossbar. So |
|
|
39:18 | see if I can get through this bit. Um, so there is |
|
|
39:23 | split it into two parts. then you continue, uh, sort |
|
|
39:29 | half size in the middle, and you kind of have these shuffle connections on |
|
|
39:34 | side. And then you keep doing that a few times, and then you |
|
|
39:42 | eventually get down to where they use two-by-two switches. So if you don't |
|
|
39:47 | the recursion all the way down, as was on the previous slide |
|
|
39:53 | , right? So now you have four-by-four switches, and |
|
|
39:57 | upon what the optimum size in terms of ports for a switch might be, and we |
|
|
40:04 | about that last time, depending upon technology and then bandwidth, you may |
|
|
40:11 | able to get on the switch. The number of ports may be different, and they |
|
|
40:16 | to have grown a little bit over time. This kind of simulation here shows |
|
|
40:21 | happens when you keep doing the recursion, and you can do it all |
|
|
40:25 | way to a very limited number of ports, um, in the switch. |
|
|
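The recursive construction can be turned into a short count of how many small switches the finished network needs. This sketch assumes the recursion is carried all the way down to two-by-two switches and that the port count is a power of two, which gives the classic Beneš form; the slide's exact drawing is not in the transcript. The names `benes_switch_count` and `benes_stage_count` are made up here.

```python
def benes_switch_count(n_ports):
    """Number of 2x2 switches when the recursion is carried down to 2 ports."""
    if n_ports == 2:
        return 1
    # One column of n_ports/2 switches on each side, plus two half-size
    # networks in the middle.
    outer = 2 * (n_ports // 2)
    return outer + 2 * benes_switch_count(n_ports // 2)


def benes_stage_count(n_ports):
    """Number of switch columns a message passes through."""
    if n_ports == 2:
        return 1
    return 2 + benes_stage_count(n_ports // 2)


switches_16 = benes_switch_count(16)
stages_16 = benes_stage_count(16)
```

For 16 ports this gives 7 stages of 8 switches each. Stopping the recursion earlier, for instance at four-by-four switches, trades fewer stages for larger building blocks.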
40:32 | it also has some properties that I don't think I show on this slide, that it's |
|
|
40:38 | that you have an ability also to route around congested areas. So it's |
|
|
40:46 | than a single path between source and destination. And that's also the case |
|
|
40:53 | um, Butterfly Network that I just showed you and the fat trees, so you can |
|
|
41:01 | route around congested areas on the network. I think that holds as well |
|
|
41:15 | . You can look through the slides you're interested, but it's just appointed |
|
|
41:20 | has the feature of allowing for routing around congestion. Other networks, like the butterfly |
|
|
41:33 | and some of the other ones that showed you are, in fact blocking |
|
|
41:38 | . But these are non-blocking. That's part of the, uh, benefit |
|
|
41:43 | of Clos networks as well as the Dragonfly and the fat tree. Okay, I |
|
|
41:50 | see one. Okay, so this now. So the Clos network |
|
|
41:56 | was or has been used, and may still be used. Not |
|
|
42:02 | for sure, but there was this company called Myricom that was started by Chuck |
|
|
42:14 | . A faculty member at Caltech. Last lecture I showed you this one of |
|
|
42:20 | first parallel computing systems. That was of the start of the current generation |
|
|
42:32 | parallel computing systems. There were several that would call it the third round |
|
|
42:37 | actually lasted, um, on that in the eighties. And last time |
|
|
42:44 | was this Cosmic Cube that was a bit of a landmark design that was |
|
|
42:50 | by Geoffrey Fox and Chuck Seitz, and Chuck Seitz then, together |
|
|
42:59 | one of his students, Bill Dally, designed a routing protocol that is known as |
|
|
43:05 | wormhole routing that we'll talk about in a bit. And, um, then Chuck |
|
|
43:12 | decided to go off and start a company to do low-latency, high performance switch |
|
|
43:22 | . And then he also decided for the company that they were going to |
|
|
43:29 | use the Clos network topology for their switches. So this is just |
|
|
43:36 | picture how they did it. And of the thing that is also important |
|
|
43:42 | that I have kind of skipped over so far is how you modularize the |
|
|
43:49 | so you can build large-scale networks out of standard components and not have |
|
|
43:56 | kind of rewire things, depending on system that you intend to build. |
|
|
44:01 | the Clos network has the nice property of being partitionable into standard modules |
|
|
44:08 | action, then with arbitrary a large so or switches in this case. |
|
|
44:18 | this is just a bunch of pictures the date. So I think eventually |
|
|
44:23 | did this, I think the largest , and I'm not sure if they |
|
|
44:28 | , um, did in the largest . This company has said they had |
|
|
44:34 | own Myrinet protocol that was again designed for very low latency and high performance, but |
|
|
44:44 | , like I mentioned about Mellanox and Cray, they also, after |
|
|
44:48 | few years, supported Ethernet on top its own hardware, infrastructure and low |
|
|
44:56 | protocol to get better performance on Ethernet than your typical |
|
|
45:05 | commodity Ethernet switch. And they had different, what they called, spine |
|
|
45:11 | and switch cards to build up large systems, and there is a little bit |
|
|
45:20 | on this picture of how they did it. This company, Chuck Seitz sold |
|
|
45:25 | company a few years ago. they have. They're not as visible |
|
|
45:31 | . I think the company that bought them still produces switches, but |
|
|
45:37 | Ethernet, high performance Ethernet switches, I will say, by using for AM |
|
|
45:44 | hardware underneath. Um, so I'll compare a little bit a few different |
|
|
45:56 | . I'm not talking about routing yet. I'll stop talking about topology |
|
|
46:01 | this point and see if there are any questions on what I've talked |
|
|
46:10 | So just to try to summarize what's worth kind of remembering |
|
|
46:18 | as basic constructs for doing multistage networks: the shuffle connectivity and the butterfly |
|
|
46:31 | , uh, are worth trying to remember, because those are used in building |
|
|
46:38 | kinds of networks that have slightly different properties. The Clos or Beneš network that was |
|
|
46:46 | way back for phone systems or phone switches has proven, again, to have very nice |
|
|
46:54 | and mhm being in the currency. they may be used to buy it |
|
|
47:01 | company that bought Myricom, but, in terms of building logical |
|
|
47:07 | it's the fat trees and the Dragonfly that are currently the state of the art |
|
|
47:12 | of how to do things. Okay, so the next several slides are |
|
|
47:26 | trying to make a comparison, a little bit of what I showed in terms |
|
|
47:32 | the Clos network, starting with a full crossbar and then, in that |
|
|
47:39 | , going from a full crossbar by switches of different numbers of input and |
|
|
47:45 | ports, getting more or less an equivalent interconnection network. So I'm trying to |
|
|
47:54 | a little bit, in this case using the number of switches and |
|
|
47:59 | costs and links as the cost, not necessarily volume or area, as I |
|
|
48:05 | about in the Thompson grid model. So I think the next kind |
|
|
48:12 | slide shows a little bit of what . Um, just counting switches on |
|
|
48:20 | number of links and doing this particular example with 4096 inputs and outputs to |
|
|
48:29 | the crossbar at the top. So there you get that the cost is simply the number of |
|
|
48:37 | one creature, 11 for each columns basically 40 96. And then there's |
|
|
48:45 | two unique links per switch, others input into our port sports. |
|
|
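A rough sketch of this kind of counting, under assumptions that are not taken from the slide: cost is measured in crosspoints, an N-by-N crossbar needs N² of them, and the multistage network is a butterfly-style network of k-by-k switches with log_k(N) stages of N/k switches each. The function names are hypothetical.

```python
def crossbar_crosspoints(n):
    """Crosspoints in a full n x n crossbar."""
    return n * n


def multistage_cost(n, k):
    """Switch and crosspoint counts for a butterfly-style network.

    Assumes n is an exact power of k, so the network has log_k(n) stages
    with n // k switches of size k x k in each stage.
    """
    stages, size = 0, 1
    while size < n:
        size *= k
        stages += 1
    if size != n:
        raise ValueError("n must be a power of k")
    switches = stages * (n // k)
    # Each k x k switch is itself counted as a small crossbar.
    return {"stages": stages, "switches": switches,
            "crosspoints": switches * k * k}


if __name__ == "__main__":
    n = 4096
    full = crossbar_crosspoints(n)
    print("full crossbar crosspoints:", full)
    for k in (2, 4, 8, 16, 64):
        cost = multistage_cost(n, k)
        print(k, cost, "crossbar/multistage:", full / cost["crosspoints"])
```

Under this counting the 2-by-2 and 4-by-4 designs come out with the same crosspoint total, and the advantage over the full crossbar shrinks as the switch radix grows.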
48:51 | If you then start to go through the steps that I did in terms of going |
|
|
48:56 | from the crossbar to building the network using two-by-two |
|
|
49:02 | switches, um, and building the networks that way, the relative cost of the multistage network |
|
|
49:14 | is, um, in some sense many times lower. So in this case |
|
|
49:22 | the full crossbar is on the top. And since the number is larger than one, that |
|
|
49:28 | means the thing built by two-by-two switches is, um, lower in cost |
|
|
49:38 | . You can also see the number of links gets reduced in doing these |
|
|
49:43 | , and then you can also see that four-by-four kind of results in the |
|
|
49:48 | same sort of benefits in terms of cost, but then as you increase the |
|
|
49:54 | number of ports the benefits get less, because that becomes closer to the full |
|
|
50:00 | crossbar. But you save links. So this is just that. And |
|
|
50:11 | one can do some graphs showing, you know, what the trade-offs are, and I'm |
|
|
50:15 | not going to spend time on this slide. I think the other part was the non-blocking |
|
|
50:21 | feature that I mentioned. Then that depends upon what multistage network you |
|
|
50:29 | end up with. Again, the Clos and Benes as well as fat-trees and Dragonfly |
|
|
50:36 | networks have multiple paths between each source and destination pair, so that |
|
|
50:44 | helps alleviate contention in the network. The other thing that is typical in this |
|
|
50:52 | is like what's illustrated with the kree . I talked about the use arriving |
|
|
51:02 | to connect, uh, 16, 1000 your their typical configuration. So |
|
|
51:10 | it's sometimes called bristling, where you hook a number of computing nodes into the |
|
|
51:16 | switch ports. So the interconnection network doesn't have as many leaf nodes as there |
|
|
51:22 | are computers; that's simply this bristling. And in terms of Cray, it |
|
|
51:31 | was 16 compute nodes hooking into a single switch, but the number of, so, outgoing |
|
|
51:44 | just in their switch. Waas So there's plenty or relative, |
|
|
51:53 | get data passed from the leaves. Now this, I'm not going to go |
|
|
51:59 | through it. But this is just trying to summarize the things I've talked about in terms |
|
|
52:03 | of the bisection width and potential area costs in building the network, as |
|
|
52:12 | well as, uh, yeah, the typical characteristics |
|
|
52:18 | on that one. And I have noticed I did not put in there Dragonfly, |
|
|
52:25 | the way Cray builds it, but it's to give you a little bit of a |
|
|
52:33 | sense of the networks I've talked about. Uh huh. This comes |
|
|
52:41 | back a little bit to this, and as I mentioned a few times, |
|
|
52:48 | some of the resource managers take, or have attempted, uh, to take |
|
|
52:56 | network topology into account, since in terms of moving data, um, things are |
|
|
53:05 | good at the level one caches, and it gets worse at level two and gets worse at |
|
|
53:08 | level three; it's significantly reduced when you've got to go to main memory, and then there's |
|
|
53:15 | even one more significant reduction when you end up having to move data |
|
|
53:22 | across the interconnection network. So resource managers or even the runtime |
|
|
53:36 | supporting, for instance. And depending upon what it is, it |
|
|
53:42 | gets sometimes information from the resource manager and tries to figure out a good |
|
|
53:49 | placement. Of course, many times the interactions are data dependent and one cannot |
|
|
53:58 | know at compile time or a data . And yet they have placement |
|
|
54:07 | What the best allocation is respect to . That may not be so |
|
|
54:12 | But again, the Dragonfly, as well as the fat-trees, try to |
|
|
54:19 | alleviate that and provide a lot more uniformity in terms of data accesses than some of the |
|
|
54:28 | earlier networks, which is part of the reason why they have taken hold. |
|
|
54:34 | As I mentioned, the fat-tree network has now been used for over 30 years in |
|
|
54:39 | building systems of all scales, and the Dragonfly network has been used for some time, |
|
|
54:46 | I think mostly by Cray. Anyone is free to construct their own |
|
|
54:51 | according to the Dragonfly principle, but fat-trees are something being predominantly used. |
|
|
55:01 | I do not know what the current Slurm does in terms of trying to be |
|
|
55:06 | topology aware; Slurm to some degree came out of this so-called PBS, a resource |
|
|
55:16 | manager. So that was just for your information, and that has been |
|
|
55:26 | known to be an issue that people have struggled with, but also tried to remove the |
|
|
55:30 | problem by trying to build networks that are less sensitive to the placement. |
|
|
55:41 | So next, a few minutes about routing, unless there are questions. So okay, |
|
|
56:04 | talking about routing. Okay, so there are many different ways of thinking about |
|
|
56:13 | routing. Uh, so one is deterministic routing. Basically, that tends to be |
|
|
56:20 | the case in a lot of networks, including many in the Internet, where you have to |
|
|
56:29 | set the routing tables in each switch, and that tells, depending on what the destination |
|
|
56:35 | is, what outgoing port is used to forward messages. The other one is the |
|
|
56:43 | opposite. It's randomized routing, and I'll, you know, talk about that in the next |
|
|
56:49 | slides. And of course, there is adaptive routing that tries to, uh, |
|
|
56:57 | adapt to and choose routing paths according to the latency in the network |
|
|
57:06 | or congestion, and there are other things that I did not put on here. I |
|
|
57:14 | think there's also kind of routing that just tries to route along the |
|
|
57:19 | shortest distance, uh, but that's not necessarily optimal. Then there's store |
|
|
57:27 | and forward; I will talk about that, virtual cut-through and the wormhole routing |
|
|
57:32 | I mentioned. That was something that was either invented or certainly popularized by jack |
|
|
57:43 | on bond. Ben Donny. And there's something called dimension-order routing that I'm |
|
|
57:51 | not going to talk more about, except I want to say that when |
|
|
57:55 | you have something like a mesh network, where you have very well |
|
|
58:02 | defined dimensions in terms of the dimensions of the mesh, dimension-order routing will |
|
|
58:09 | try to route the dimensions in a fixed order, you know, first X, then |
|
|
58:13 | Y, then Z, or whichever dimension order one chooses, but to do them one at a time, |
|
|
58:21 | that is, route out each dimension before switching to routing in a different dimension. In principle, you don't |
|
|
58:28 | go back and forth between them. That's dimension-order routing in a nutshell. |
|
|
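Dimension-order routing as just described can be sketched in a few lines. This is only an illustration for a mesh without wraparound links; the function name and the coordinate-tuple representation are assumptions made for the example.

```python
def dimension_order_route(src, dst):
    """Return the nodes visited from src to dst in a mesh.

    src and dst are coordinate tuples of the same length. Dimension 0 is
    fully corrected first, then dimension 1, and so on; the route never
    goes back to a dimension it has already finished.
    """
    if len(src) != len(dst):
        raise ValueError("src and dst must have the same number of dimensions")
    current = list(src)
    path = [tuple(current)]
    for dim in range(len(src)):
        step = 1 if dst[dim] > current[dim] else -1
        while current[dim] != dst[dim]:
            current[dim] += step
            path.append(tuple(current))
    return path


if __name__ == "__main__":
    # First move along X, then along Y.
    print(dimension_order_route((0, 0), (2, 1)))
```

Because every message finishes one dimension before starting the next, the path between a given source and destination is always the same, which makes this a deterministic scheme.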
58:35 | And then, of course, there are all the issues of deadlock and |
|
|
58:38 | livelock, and I won't talk about them either, except that hopefully the |
|
|
58:44 | idea is that with deadlock, it's that messages get stuck, uh, |
|
|
58:53 | waiting for each other, like I talked about in terms of, I |
|
|
58:59 | guess, um, sends and receives in MPI, in that, if you're not |
|
|
59:06 | careful, one can cause deadlock, where messages wait for some other message to move. Livelock |
|
|
59:15 | is the opposite, um, in the sense that messages get injected into the |
|
|
59:24 | network, but some never manage to arrive and basically circulate in the network. |
|
|
59:33 | And those two aspects, deadlock and livelock, are |
|
|
59:39 | a reason why there has been great caution about using adaptive routing in these networks, since |
|
|
59:55 | proving that adaptive routing is deadlock- and livelock-free is not necessarily easy. |
|
|
60:07 | But adaptive routing is, in fact, used by Cray in their networks. |
|
|
60:18 | I'll maybe come back to that at the end of the lecture. Any questions on |
|
|
60:26 | this kind of labeling of different types of routing before talking a little bit |
|
|
60:31 | about each? Okay, so here is a little bit. Anyone that is taking a |
|
|
60:42 | network class knows about TCP and IP and message structure. And, um, |
|
|
60:55 | IP headers, IPv4 and v6, et cetera. So the |
|
|
61:06 | main point for me in bringing up this slide is to talk |
|
|
61:13 | a little bit about what you may have seen before, which is flits and |
|
|
61:20 | phits. So there is a lot of attention, again, to performance and protocols for |
|
|
61:36 | networks for clusters and MPPs. So, um, the header size is a |
|
|
61:47 | big issue, so one is trying to have small headers, because many times, in particular if you |
|
|
61:56 | think about synchronizing processes across nodes, the payload is very small, so overheads are |
|
|
62:10 | a serious issue. And when I talked about the Blue Gene machine, in that one |
|
|
62:17 | they built, in fact, a dedicated network for synchronizing processes so they didn't have to |
|
|
62:23 | deal with the normal message passing in the data networks. Also, in terms of |
|
|
62:30 | the switch that Cray built, that has, in fact, five different crossbars |
|
|
62:40 | in one, dealing with different aspects of communication, in order for, for instance, |
|
|
62:47 | synchronization and control messages not to be either delayed by or interfere with the data. |
|
|
63:02 | So now, flits are commonly used in flit-oriented routing protocols, and what a flit |
|
|
63:15 | is, is kind of the smallest flow control unit. The flit is then made |
|
|
63:23 | up of a number of phits, and a phit tends to be exactly matching the |
|
|
63:33 | underlying physical interconnection structure. And as it says on the slide, so many |
|
|
63:44 | times, um, one uses a very small number of, um, physical wires, |
|
|
63:53 | if that's what you use. You can also use optics, and with optics |
|
|
63:57 | it's a little bit different. But at the node level, in the case of, |
|
|
64:05 | you know, node to switch, copper tends to be the cheapest alternative and |
|
|
64:13 | good enough in terms of performance, so one can use copper cables. So in that |
|
|
64:19 | case, you're down to physical wires. So a phit may very often be |
|
|
64:25 | four bits, but you don't do flow control on every four bits; you take |
|
|
64:32 | a bunch of them as a train going over the set of wires, and that is |
|
|
64:37 | the flit, which is what you do flow control on. So all right, and then, depending |
|
|
64:49 | upon what you do, as I said, the headers, I mean, you |
|
|
64:54 | have, like in IP, you have priority, quality of service and all kinds of |
|
|
65:01 | other attributes encoded into the header. Um, now randomized routing, uh, |
|
|
65:14 | exactly what it is, um, so, Les Valiant. And that is a well-known computer |
|
|
65:24 | scientist, a Turing Award winner among other things. Among other things |
|
|
65:31 | he came up with was this idea of randomized routing. And he's kind |
|
|
65:37 | of joked about that. It's kind of a real joke, as far as I |
|
|
65:41 | know, as he is the one who is a Brit. So the joke is |
|
|
65:47 | that the UK post office, to send mail, if you want to send a letter from you |
|
|
65:53 | to somebody else, the post office doesn't really try to get it directly to the |
|
|
65:58 | recipient, but actually sends it to an arbitrary random place first and then sends it |
|
|
66:04 | where it was supposed to go. So this is, in fact, a load |
|
|
66:11 | balancing technique. So instead of trying to find, so, the best path from source |
|
|
66:23 | to destination, one sends things to a random intermediate destination and from there to the |
|
|
66:33 | final destination, so that minimizes the risk of hot spots in the network. |
|
|
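The two-phase idea can be sketched as follows. This is an illustration only: it assumes a mesh whose nodes are coordinate tuples and uses simple dimension-order routing for each phase, neither of which is specified in the lecture, and the function names are hypothetical.

```python
import random


def minimal_route(src, dst):
    """Dimension-order minimal route in a mesh (assumed base routing)."""
    current = list(src)
    path = [tuple(current)]
    for dim in range(len(src)):
        step = 1 if dst[dim] > current[dim] else -1
        while current[dim] != dst[dim]:
            current[dim] += step
            path.append(tuple(current))
    return path


def randomized_two_phase_route(src, dst, shape, rng=random):
    """Route src -> random intermediate node -> dst.

    shape gives the mesh extent in each dimension. Returns the chosen
    intermediate node and the full path.
    """
    mid = tuple(rng.randrange(extent) for extent in shape)
    first = minimal_route(src, mid)
    second = minimal_route(mid, dst)
    # Drop the duplicated intermediate node when joining the two phases.
    return mid, first + second[1:]


if __name__ == "__main__":
    mid, path = randomized_two_phase_route((0, 0), (3, 3), (4, 4))
    print("intermediate:", mid)
    print("path:", path)
```

The path is usually longer than the minimal one; the trade is extra hops in exchange for spreading traffic so that no particular communication pattern concentrates on a few links.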
66:43 | So this is what randomized routing tends to be. So |
|
|
66:51 | this is, again, depending upon, I mentioned it a little bit in terms of partitioning and assignment |
|
|
66:57 | of partitions among nodes, that sometimes in these networks, randomized assignment of partitions to |
|
|
67:07 | nodes may in fact be beneficial, instead of trying to, you know, find |
|
|
67:12 | minimum distances, because minimum distance alone is not telling you what performance |
|
|
67:24 | you're going to get, because there may be some links that are shared between |
|
|
67:30 | routes, and then you get congestion, and it's not so trivial to figure out |
|
|
67:37 | how to do the optimum placement. Randomized routing was also used in terms of the |
|
|
67:45 | Connection Machine, and I mentioned a few times that it was the first that was using |
|
|
67:49 | fat-trees. And it's also the company I used to work for before coming here, |
|
|
67:54 | in which we used randomized routing in this fat-tree. So as you |
|
|
68:02 | can see again from the stylized picture on the right, each of the leaf nodes, the |
|
|
68:09 | round things at the bottom, had in this case two options for uplinks into the |
|
|
68:20 | tree, and then at the first level of tree internal nodes, each one of them |
|
|
68:30 | also had two uplinks, and, uh, one level above that, the switch |
|
|
68:39 | has both four uplinks and four downlinks. But the lowest layer in this |
|
|
68:43 | case has two uplinks and four downlinks. So this was a version |
|
|
68:48 | of the fat-tree that is not the full fat-tree, but the way the randomized routing |
|
|
68:53 | was used in this tree is to randomly select one of the uplinks when one sends messages |
|
|
69:01 | up, so that balances the load. And once you get to the |
|
|
69:06 | lowest common ancestor in the tree to get to the proper leaf node, |
|
|
69:11 | it's a dedicated path from that turnaround point down the tree to the leaf node, |
|
|
69:17 | so it randomized things on the way up but had deterministic routing on the way down. |
|
|
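A minimal sketch of this up-random, down-deterministic scheme follows. The model is a simplification and an assumption of the example, not the actual machine: leaves are numbered consecutively, every switch has `arity` children, and every level offers the same number of parallel uplinks (the real tree described above had different numbers at different levels).

```python
import random


def turnaround_level(src, dst, arity):
    """Levels to climb until src and dst share a common ancestor."""
    level = 0
    while src != dst:
        src //= arity
        dst //= arity
        level += 1
    return level


def fat_tree_route(src, dst, arity=4, uplinks=2, rng=random):
    """Return the hops from leaf src to leaf dst.

    Going up, one of the parallel uplinks is picked at random at each
    level. Going down, the child to take at each level is fixed by dst.
    """
    top = turnaround_level(src, dst, arity)
    up = [("up", level, rng.randrange(uplinks)) for level in range(top)]
    down = [("down", level, (dst // arity ** (level - 1)) % arity)
            for level in range(top, 0, -1)]
    return up + down


if __name__ == "__main__":
    for hop in fat_tree_route(0, 15):
        print(hop)
```

Only the upward half of the route involves a choice, so the randomization balances load on the uplinks while the downward half stays a single fixed path from the turnaround point.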
69:26 | The other machine that I'm aware of that used some form of randomization was a |
|
|
69:35 | computer, the HEP, the Heterogeneous Element Processor, that was designed by a |
|
|
69:42 | very well-known computer architect, Burton Smith, um, and he used one of these kinds |
|
|
69:51 | of dance-hall approaches to building the parallel computer, so he had an interconnection network between processors |
|
|
70:00 | on one side and memory modules on the other side. So what he did was |
|
|
70:04 | he randomized the allocation of data to memory modules in order to try to reduce |
|
|
70:11 | the chance of hot spots in the network or for the memory modules themselves. And |
|
|
70:19 | one machine that kind of used that was also this Fluent machine that was designed |
|
|
70:27 | by one of my students when I was at Yale University. And then it was |
|
|
70:31 | actually built by some people, but it never was a commercial product. |
|
|
70:37 | But one of the interesting parts of it that was adopted by others in terms of |
|
|
70:44 | programming models, I would say, was that this machine had parallel prefix as |
|
|
70:49 | the basic instruction, and that was, uh, adopted in a programming language by a |
|
|
70:57 | fellow called Guy Blelloch at Carnegie Mellon, who adopted this idea and showed the use |
|
|
71:03 | of parallel prefix as a basic instruction. Uh, and all right, so that's what I |
|
|
71:11 | wanted to say about randomized routing. Um, store and forward |
|
|
71:18 | is a typical way things are done in many networks. It's not used |
|
|
71:24 | in these high-performance networks for clusters, but certainly in local and wide area networks |
|
|
71:32 | it is very common, and I'll try to illustrate it on the next slide, I |
|
|
71:37 | believe. And then there is kind of an improvement on store and |
|
|
71:41 | forward. That is known as virtual cut-through networks. And, um, the idea |
|
|
71:48 | is, uh, that in virtual cut-through, you don't necessarily store the message |
|
|
71:57 | for each hop. So I will illustrate that on the next two slides |
|
|
72:02 | that try to put a little bit of graphic illustration of the store and forward routing, with |
|
|
72:07 | some text on it. So the store and forward routing is in the sense that the sender |
|
|
72:14 | first decides, you know, where to send the message, based on |
|
|
72:18 | some routing algorithm and knowledge of the network; either there are some |
|
|
72:27 | routing tables or it has some knowledge of the network. And if it's some kind of |
|
|
72:32 | adaptive routing, then it also knows about the traffic, possibly, |
|
|
72:40 | but it depends whether it has just a local view or a global view of what goes |
|
|
72:44 | on in the network. But the idea is, essentially, that one sends the packet to |
|
|
72:50 | the next node on the way. It gets the packet, and |
|
|
72:58 | then the first thing that happens is, in fact, that it gets put into a |
|
|
73:04 | buffer or memory, and at some point, depending on what the policy for routing is |
|
|
73:11 | in the switch, it takes a look at the header and figures out what |
|
|
73:15 | priority the packet has and where it's supposed to go. And then it puts the |
|
|
73:21 | message into an output buffer, the output buffer for where it wants |
|
|
73:27 | to go. But the point is, in each stage in the routing, the |
|
|
73:33 | packets get stored in memory, retrieved from memory and then inspected. So |
|
|
73:40 | every packet ends up enduring a round trip to memory. The virtual cut-through tries |
|
|
73:49 | to be a little bit smarter: as soon as it gets the header, it |
|
|
73:55 | then inspects the header, and if it turns out that it knows where it |
|
|
74:04 | should go, and in terms of buffering the output buffer is free in the |
|
|
74:08 | router, it does the allocation of the output buffer and sends the header on its |
|
|
74:16 | merry way, and effectively builds a kind of circuit for the trailing part of the |
|
|
74:28 | packet. So maybe there are flits that are trailing, so as long as there's nothing |
|
|
74:36 | that prevents the header from moving, all the rest of the packet gets forwarded immediately and does |
|
|
74:43 | not endure a round trip to memory. However, if the header gets stuck, as |
|
|
74:50 | it shows a little bit lower, or in the middle, I guess, of the |
|
|
74:55 | slide here, then what happens at that point is that the message, the whole message, gets |
|
|
75:01 | stored where the header can no longer move. And it then releases the buffers |
|
|
75:11 | and path it has been using before, up to the router where things get stuck. So |
|
|
75:20 | it gets faster forwarding through each of the routing switches as long as the header |
|
|
75:25 | doesn't get stuck. But when it gets stuck, the message is collected at that point and |
|
|
75:30 | stored, and then you have to restart the process to try to eventually get to |
|
|
75:35 | the destination. And wormhole routing is, um, I would say, an improvement |
|
|
75:47 | on the virtual cut-through, in the sense that |
|
|
75:57 | all the flits just get stopped where they are if the header gets stuck. |
|
|
76:06 | So the message, with all its different flits, is in fact spread out across the routers |
|
|
76:17 | in the network, so none of the pieces endures a round trip to memory in |
|
|
76:24 | any one of the switches on the way to the destination |
|
|
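The difference between storing the whole packet at every hop and forwarding it as soon as the header is routed can be illustrated with the usual contention-free textbook latency model. The formulas and parameter names below are assumptions of this sketch, not numbers or notation from the slides: `hops` is the number of links traversed, `bandwidth` is in bytes per second, and `router_delay` is a fixed per-switch routing time.

```python
def store_and_forward_latency(hops, packet_bytes, bandwidth, router_delay):
    """Each hop receives the whole packet before forwarding it."""
    return hops * (router_delay + packet_bytes / bandwidth)


def cut_through_latency(hops, packet_bytes, header_bytes, bandwidth,
                        router_delay):
    """Only the header pays the per-hop cost; the body is pipelined behind it.

    With no blocking this applies to both virtual cut-through and wormhole
    routing; the two differ in what happens when the header gets stuck.
    """
    body_bytes = packet_bytes - header_bytes
    return hops * (router_delay + header_bytes / bandwidth) + body_bytes / bandwidth


if __name__ == "__main__":
    # Example values chosen arbitrarily for illustration.
    args = dict(hops=5, packet_bytes=4096, bandwidth=10e9, router_delay=100e-9)
    print("store and forward:", store_and_forward_latency(**args))
    print("cut-through:      ", cut_through_latency(header_bytes=16, **args))
```

In this model the store-and-forward time grows with the product of hop count and packet size, while the cut-through time grows with their sum, which is why the gap widens for long packets and many hops.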
76:31 | . So what happens in this case is, um, so the good part, I |
|
|
76:36 | guess, is that it doesn't endure a round trip to memory. The potentially bad |
|
|
76:41 | part is that buffers in the routers on the way still are occupied if the |
|
|
76:50 | header gets stuck. However, if you have reasonably good congestion |
|
|
76:58 | control, it turns out that this wormhole routing has been shown to be |
|
|
77:04 | quite effective in the computer networks that are used, again, internally in clusters and |
|
|
77:12 | MPPs. So wormhole routing has kind of been the norm for quite a few |
|
|
77:18 | years in how to do high-performance networks. I think this is kind of |
|
|
77:27 | my last slide for today, and it tends to be a little bit |
|
|
77:34 | Cray biased in flavor, this slide, but it was more or less based |
|
|
77:44 | on the relative ease of finding some data for it. So what I wanted |
|
|
77:53 | to say with this slide is essentially that, in addition to building |
|
|
77:59 | their own switch that was designed for low latency, they, in |
|
|
78:07 | fact, do not use the InfiniBand protocol, which is an open standard |
|
|
78:13 | like Ethernet, but they took the Ethernet protocol and made their own, |
|
|
78:19 | what they call high performance computing, or HPC, Ethernet, that has, ah, less |
|
|
78:28 | header overhead. So as far as I remember, that is 40 bytes instead of 64 bytes |
|
|
78:33 | . And so that's one of the things they re-engineered. |
|
|
78:39 | Then they re-engineered the whole protocol on the Ethernet. Then they use adaptive |
|
|
78:46 | routing in their network. And again, they use the Dragonfly network, where there are |
|
|
78:51 | lots of redundant or optional pathways for routing messages, and there is no more than |
|
|
78:59 | a distance of three hops between endpoints in the network. Um, and this |
|
|
79:07 | shows just the effectiveness of their adaptive routing as well as separating the traffic between synchronization |
|
|
79:14 | and data movement. So, the bottom graph, it |
|
|
79:22 | shows the gain from using their particular congestion control and adaptive routing protocol, that pretty much |
|
|
79:28 | everybody wins in terms of all the tasks, regardless of whether it |
|
|
79:36 | was a synchronization or, um, kind of many-to-one, that is, kind of |
|
|
79:44 | a gather-type operation, right, and there was also another kind of |
|
|
79:50 | operation. Um, everything ended up completing sooner than it did without this |
|
|
80:00 | particular scheme for adaptive routing and congestion control. So with that, I hope I'm giving you |
|
|
80:09 | a flavor of the networks, and that there is a lot of attention paid, first, |
|
|
80:15 | to how to design networks, but also to the routing, I |
|
|
80:22 | think, and congestion control that is in the switches that they build. |
|
|
80:38 | So with that, time is up and I'll take questions. So, ah, and one more |
|
|
80:52 | thing. So if you run jobs at some point in your future on |
|
|
81:01 | these clusters, hopefully you would appreciate that the placement of the different processes that you |
|
|
81:12 | use for MPI makes a difference to the kind of best case scenario, or what you |
|
|
81:18 | observe, and the fact that in general the network is a shared resource. So that means |
|
|
81:26 | you get impacted by other jobs running on other nodes in the system, because even if |
|
|
81:31 | they do, sections of the network may be shared, so you don't necessarily get the same |
|
|
81:40 | performance. Um, and that comes to the question of whether you can kind |
|
|
81:44 | of wall off, if you want, a piece of the network so it cannot be impacted by other traffic in |
|
|
81:50 | the network. So one should expect some variability when you do benchmarking or measure |
|
|
82:02 | performance in these systems that comes from the network itself, unless you manage, or are |
|
|
82:13 | lucky enough, to get your own part of the network, to be allocated to a part of the network that is |
|
|
82:22 | not having anything in common with other parts of the system where other jobs are running. |
|
|
82:43 | Okay, thank you for today, and I will stop sharing my screen. |
|
|
82:56 | So then I guess I'll stop the as |
|