© Distribution of this video is restricted by its owner
00:04 | So today, a little bit more about MPI. I realized I may not |
00:11 | cover everything I wanted to talk about in MPI today, so there may be |
00:15 | a little bit in the next lecture too. Um, I'll start by talking about |
00:27 | one-sided communication today and various aspects of it, and hopefully also |
00:34 | about something that's known as virtual topologies. And so in the previous |
00:45 | lectures about MPI, it has been what's known as two-sided communication, where |
00:53 | there are sender and receiver processes, in one way or another, whether |
01:00 | blocking or non-blocking. But there is a symmetry between sender and |
01:08 | receiver. This is not the case when it comes to one-sided communication, |
01:15 | and we'll talk about the differences and how the one-sided version works and why |
01:21 | it was introduced, which happened with MPI-2. All right, so |
01:34 | in MPI, as you know already, remember that processes are very much self-contained, unlike |
01:42 | when it came to OpenMP. In this case, each process has |
01:48 | its own private memory. There is no sharing of anything, really, between processes. |
01:54 | They each have their own code and their own data. In the way we do things — |
02:00 | in the SPMD model that has been used for this class — they all run the same code, |
02:10 | and they all have their own data. And as we talked about so |
02:16 | far, it was message passing: the only option for exchanging information or data |
02:23 | between the various processes. So now, for this one-sided communication, one actually needs |
02:31 | some notion of at least partially remotely accessible memory, so one process can either |
02:42 | retrieve data from some other process's memory space or write into it. |
02:50 | So this is the notion of having a mechanism to create remotely accessible |
02:58 | memory, as is kind of illustrated on this slide. And as you will see |
03:04 | when we talk about it, this is again done within communicators. |
03:12 | So it is local to the communicator in which these things are being done. |
03:21 | So this is basically creating a notion of a global address space |
03:30 | for a particular communicator, and in the vocabulary of MPI |
03:40 | it is called a window. So all processes within the communicator should have |
03:50 | access to this created, globally accessible memory, and we'll talk about how it is |
03:56 | created in the next several slides. This slide just shows what you can do. |
04:04 | One can go and retrieve data from other processes' globally accessible memory, or one |
04:13 | can write to, or update, that piece of memory. It still means that |
04:17 | you decide what part of your private memory becomes this globally accessible memory, but |
04:25 | it doesn't have to be all of it. So here is a little bit of |
04:31 | a dynamic, click-through illustration of the difference between two-sided communication |
04:37 | and the one-sided kind. This is the two-sided case first. What happens is, |
04:42 | again, that there is maybe the receiver that posts a receive, and then the |
04:51 | sender says: I want to send data. Then these buffers — in the case of |
04:56 | the sender — get handed over to the MPI communications library, which takes the data |
05:05 | from the send buffer, or is told the pointer to where the data should come |
05:11 | from, and then the data gets sent via the receive buffer to the receiving process. |
05:20 | So this is again illustrating a little bit of the synchronous behavior: how |
05:29 | the communication library manages the data that the send and receive statements in your |
05:36 | code ask to happen. That's what relates a little bit to the |
05:41 | discussion of non-blocking versus blocking, and why things can proceed even |
05:47 | in the blocking version: one has to wait until the |
05:52 | communication library has taken care of the data, so that things can be overlapped. Now, in |
06:01 | the case of one-sided communication, what's done differently is that, in this case, the sending |
06:09 | side is the same, but rather than the receiver receiving the data, the sender |
06:14 | can directly deposit the data without a receive statement on the receiving process. So this is |
06:26 | the basic difference: it is, as the name says, one-sided — only one party acts — and |
06:33 | in this case it was illustrated for the send action, or the desire to write |
06:40 | to or update the memory in the receiving process's space. This just shows, to |
06:50 | illustrate the difference a little bit, that one part of why the one-sided kind |
07:00 | was introduced is performance, on the sender in particular. If you have blocking sends, |
07:08 | now there's only one process involved, so you don't need to in any way |
07:13 | wait for the receiving process to be ready to receive the data. It's also the |
07:22 | case that when we have two-sided communication, there always need to be matching |
07:29 | sends and receives, and there is no such requirement in the one-sided |
07:34 | communication. So, depending upon what your interest is, the code on the receiving |
07:43 | process, when it comes to one-sided communication, does not need to have an |
07:49 | explicit expectation — it does not have to have matching receive statements — |
08:00 | because you don't necessarily know how much data, or how many sends, to expect, |
08:06 | so it's kind of difficult to figure out how many receives you need to insert. |
08:12 | So that's the other part: creating more flexibility in how processes interact by having |
08:18 | the one-sided communication. Dr. Johnson, a question — going back a few slides, to |
08:26 | the one-sided communication example. So, yes — the arrows there on the |
08:32 | slide. Um, those are the buffers, correct? They |
08:37 | show what the MPI communications library does. Okay — so I |
08:47 | guess it could be a buffer, but not necessarily. Right, |
08:51 | so that, um, that's where the data that was sent ends up, and |
08:58 | that's there all the time. So that's what makes it possible for |
09:03 | the receiving node to say: hey, I'll get it whenever I need |
09:08 | it, but I don't need to signal to the other one — as opposed to two-sided? |
09:15 | No, it doesn't signal anything. It's kind of like a normal store |
09:25 | statement that just updates that memory. It just happens to be in another process instead of |
09:33 | in the process memory of, in this case, the sending process. Okay, |
09:41 | so it has no knowledge of whether or not the data has stayed in the buffer until |
09:46 | it's requested by the receiver? It's no buffer, basically; it is |
09:52 | a memory segment, so the receiver doesn't go and look in a receive buffer or wait |
09:58 | for the data. It just accesses memory, and that memory may have gotten updated or |
10:04 | not. And that's why you have to make sure, if you depend on the update being |
10:09 | there, that there is some other mechanism you use in order to make sure that the message |
10:14 | has gotten there. Okay. So, on this one-sided communication example: |
10:21 | can we think of it as sort of the way that you can make promises to |
10:25 | the compiler to guarantee that something is going to happen? Ah, good question. |
10:33 | I need to think about that. These are independent processes. Um, they have their |
10:47 | own program counters, and if you need synchronization and a special ordering, you need to |
10:53 | be very explicit about it in these codes. So I think — to put |
11:05 | it more towards — I'm trying to think of the differences between the guarantees you |
11:12 | have whenever you're doing two-sided versus the loss of guarantees whenever you have one- |
11:17 | sided communication. All right. Um, if, again, you need the logic |
11:31 | of the code to depend on data having been updated before you proceed to do something, |
11:41 | then the compiler does not automatically help you out by inserting stuff for |
11:49 | you; you have to make sure of it yourself, and I'll talk a little bit about |
11:54 | sort of how you can guarantee ordering or synchronization when it is not built into |
12:01 | the communication action itself. Okay? So, because it's natural to kind |
12:17 | of think of the various different processes — it's intuitive to want to think of |
12:27 | them as running on similarly capable processors, but that's not guaranteed. So one |
12:39 | process, even though it's running the same code, may run on something that is many |
12:44 | times as fast as the receiving process. So it may be, |
12:50 | you know, going through a lot more instructions in a unit of time than the other one |
12:55 | does. And if you need to ensure that, at a certain action in the code, things |
13:04 | have come to be updated, then you need to make |
13:11 | sure that that has in fact happened through some mechanism; MPI will probably not do it for you. Okay, that makes |
13:19 | sense. I was just commenting because, obviously, one-sided communication has fewer |
13:27 | handles associated with it, if you will. So that implies other stuff that you |
13:31 | have to do as a programmer, to have guarantees, to be |
13:35 | able to take advantage of the one-sided communication to begin with, right? |
13:39 | Yeah — not so much at the outset. Well, for correctness, |
13:45 | yes, I think you have to guard against that: make sure that when there is |
13:51 | an ordering required for correctness of the code, one is explicit about |
13:58 | it in how the code is written. Um, in terms of the mechanics, |
14:05 | it's the headache of the implementers of the MPI communications library to make |
14:10 | sure that the handshake between sender and receiver, and access to the particular receiver's |
14:22 | memory, can be done safely. So that |
14:33 | kind of low-level synchronization — that is all handled inside |
14:40 | the MPI communication libraries and is not something that one would need to |
14:47 | worry about in the application code. But, as I said, the logical ordering — |
14:54 | it's not built into the calls; the level of synchronization you want to |
15:02 | happen between the sending and receiving processes is now decoupled. So ordering has to |
15:08 | happen explicitly, then, when and to the extent it's necessary. Okay, |
15:17 | thanks. Thanks. Okay — you're welcome; good question. So it's a matter of |
15:24 | what correctness is guaranteed and what the mechanisms are. All right, so we |
15:31 | can move on; we're on this slide. So I think I pretty much said that: |
15:36 | it increases the flexibility and decouples communication from the |
15:45 | ordering and synchronization that may be required for correctness of the concurrent processes. |
15:53 | What the MPI window represents — um, I think I |
16:01 | pretty much also said this already, so maybe I will move on |
16:09 | to the next slide. So the next set of slides is about |
16:15 | how you basically declare, all right, what |
16:22 | parts of the process memory are accessible from other processes. So this |
16:34 | is pretty much what I already said, so I'll move quickly past it: MPI's terminology for this collection |
16:48 | of process-local memories, that is kind of pooled up into one pool that is accessible |
16:57 | by the processes in a communicator, is known as a window. |
17:07 | Okay, so maybe this will start to make sense; it shows a little bit of what |
17:11 | the structure is. It means the MPI library needs to know what the |
17:18 | addresses are, in the different processes, that are part of this quote-unquote global address space, |
17:27 | so that when there is communication, it happens in the right part |
17:36 | of the globally accessible memory space. So there are a few of the |
17:48 | MPI functions that, when you call them, within quotes, generate this global address |
17:59 | space. It depends, as I tried to say on this slide |
18:07 | here, on whether the memory for this globally accessible space has already been |
18:17 | allocated or not. So the allocate version — the second line on the slide — both |
18:27 | allocates the globally accessible memory and makes it immediately available. The other option |
18:39 | is that one can also create this space from memory that is already allocated — |
18:50 | a piece of memory one wants included in the global address space. And then there |
18:55 | is a dynamic version, where one can later attach and detach memory to and from |
19:02 | this globally accessible address space, and I'll talk about these a little bit more |
19:09 | on the next few slides. So this call basically tells, again, for which communicator |
19:20 | the window — that is, the globally accessible address space — you want to |
19:26 | be created, or to which communicator it applies. So the name of that global address |
19:37 | space and the rest of the arguments define that particular address space. So there |
19:47 | is some flexibility — you know, the base address and the size and the displacement from |
19:53 | the starting address — and there is flexibility in terms of how you do |
19:59 | it, in that these do not necessarily need to be the same displacement on all the processes |
20:07 | involved in setting up this globally accessible address space within the |
20:16 | communicator. So there's some flexibility in how this is actually |
20:24 | done, but all the participating processes in the communicator need to make |
20:35 | this call in the code. Then, I guess, the rest is just listed; I'm not going |
20:43 | through it. But the same goes for the arguments: one of them, the info argument, is there |
20:49 | to inform MPI about how you want some things managed, and one |
20:54 | can use it to tailor access rights. And then there is just a simple piece of |
21:08 | code, in this case creating a window where the variable, or the array — |
21:14 | a memory space defined by a size and a starting address — is shared. In this |
21:21 | case, the communicator covers all the processes, so it's not restricted to some subset |
21:27 | of processes. In this case, it was the entire collection of processes — the |
21:34 | MPI_COMM_WORLD communicator. And then, as it says here, one gets the allocation, and in the end |
21:42 | the memory space should also be freed up properly. Um, |
21:51 | here is a similar one — window creation together with the allocation of memory, if that wasn't |
21:57 | already done — and then it becomes immediately available as well in the global |
22:04 | address space. Let's see — and here is an example of that invocation: the |
22:10 | arguments again, with all of the processes involved, using the MPI_COMM_WORLD communicator. |
22:24 | So this is the dynamic version, which can then be used, at |
22:33 | some later point, to attach — or, after it's attached, if one wants, to |
22:43 | detach — allocated space from this globally accessible space. So initially it creates the window, |
22:53 | but it doesn't initialize it in any way, so it's kind of empty. And |
22:57 | then one can use the attach call; and, again, to use an example, |
23:04 | attach comes here also: first one creates the dynamic address space, and |
23:14 | then later on a particular piece of allocated memory is attached to that particular window. |
23:27 | So this is, I guess, how the actual function calls |
23:33 | look and what the arguments are. So, there you have it. |
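Since the slides' code is not reproduced in the transcript, here is a minimal sketch of the three window-creation styles just described. This is my own illustration, not the lecture's code; the buffer names and the size N are made up. The calls themselves (MPI_Win_create, MPI_Win_allocate, MPI_Win_create_dynamic with attach/detach) are standard MPI:

```c
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    const int N = 1000;          /* illustrative size */
    MPI_Win win1, win2, win3;

    /* 1) MPI_Win_create: expose memory you have ALREADY allocated.
       Arguments: base, size in bytes, displacement unit, info,
       communicator, resulting window handle. */
    double *buf = malloc(N * sizeof(double));
    MPI_Win_create(buf, N * sizeof(double), sizeof(double),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win1);

    /* 2) MPI_Win_allocate: MPI allocates the memory AND makes it
       immediately available in the window. */
    double *abuf;
    MPI_Win_allocate(N * sizeof(double), sizeof(double),
                     MPI_INFO_NULL, MPI_COMM_WORLD, &abuf, &win2);

    /* 3) MPI_Win_create_dynamic: an initially empty window; memory is
       attached after creation and may later be detached again. */
    MPI_Win_create_dynamic(MPI_INFO_NULL, MPI_COMM_WORLD, &win3);
    double *dbuf = malloc(N * sizeof(double));
    MPI_Win_attach(win3, dbuf, N * sizeof(double));
    /* ... one-sided communication on win3 ... */
    MPI_Win_detach(win3, dbuf);

    /* All window-creation calls are collective over the communicator,
       and every window should be freed at the end. */
    MPI_Win_free(&win1);
    MPI_Win_free(&win2);
    MPI_Win_free(&win3);
    free(buf);
    free(dbuf);
    MPI_Finalize();
    return 0;
}
```

Note how this mirrors the lecture's point: create is for pre-allocated memory, allocate does both steps at once, and the dynamic version starts empty until something is attached.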
|
|
23:41 | This goes beyond what you have to do in your assignments. These are more |
23:46 | advanced concepts, and I talk about them mostly as things you should be aware of, |
23:53 | since these one-sided communications have gotten a lot of traction. But |
23:59 | in this course I chose not to make you use them. But if |
24:05 | you want to use them, or at least end up using MPI in other |
24:12 | contexts, of course you should know about them and have some starting point for using this one- |
24:18 | sided communication. Okay. So that was a little bit about how to create |
24:27 | and initialize this globally accessible memory as a fraction of each process's |
24:40 | local memory. Whatever is not in this global memory part is then still private |
24:49 | to each one of the processes, and other processes cannot reach or update that part |
24:55 | of the memory. Now, the actions typically used for this one-sided |
25:04 | communication are what are known as gets and puts, and sometimes some associated |
25:14 | operations. And so this is also what has been common in what's known as shared- |
25:20 | memory programming. Machines indeed do exist with shared memory, as has been mentioned — |
25:32 | machines that actually physically implement what is then called a distributed shared memory. |
25:41 | Such a machine has explicit mechanisms — you know, hardware and firmware — that give the illusion |
25:49 | of a shared memory to the programmer, so you can write and retrieve variables as |
25:57 | if they were, say, in local memory. Now, as we've talked about before, |
26:04 | in the shared-memory programming models, of which there are many, a put basically writes |
26:11 | into the shared memory, and a get retrieves data from the shared memory. |
26:18 | I will talk about these operations in the next few slides and try to illustrate |
26:23 | them; here is kind of a summary of them. |
26:29 | So here is, then, the semantics of this put. That's used when one process |
26:36 | wants to write into another process's memory. And as long as the target is in the |
26:42 | globally accessible part of the memory in the remote process, the write or update |
26:53 | should proceed. There is no notion of time, so to speak: the put |
27:02 | gets handed over to the MPI library, and the process that |
27:10 | executes the put can continue, while the target process has no idea that this is |
27:20 | eventually happening or going on. So this depends again on synchronization, which I will talk |
27:31 | about in a bit. So, um, one needs to |
27:38 | be careful in using this, to make sure that one gets the correct behavior of |
27:43 | the codes. Yeah — the get is basically the opposite: the process requests data |
27:51 | from, you know, another process. And as long as the request addresses something in the |
27:56 | globally shared memory of the other process, eventually the desired data will arrive at |
28:05 | the requesting process. Um, then there is also the accumulate, |
28:17 | which is kind of a put with an operation. That means that the |
28:28 | target address then gets updated with the value that is being put being, |
28:47 | say with a plus operation, added to whatever is in that memory location. So |
28:54 | it's a write that effectively adds to what's in the target address. And |
29:04 | there are a number of op codes, like add, subtract, multiplication, and so on. And then |
29:14 | there is the reverse thing, get-accumulate, where you get the data returned and add it |
29:20 | to whatever is in the local memory. Yeah. And then there are |
29:31 | a couple of other ones: compare-and-swap and fetch-and-op. Fetch-and-op |
29:36 | is, you know, pretty much similar to the get-accumulate, and compare-and-swap |
29:43 | takes a comparison and does a conditional swap. So these are the |
29:48 | basic kinds of shared-memory-like operations that the one-sided communication model enables. |
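To make these operations concrete, here is a minimal sketch (my own, not from the slides) of put, accumulate, get, and fetch-and-op in C. The fence calls bracketing each step are one of the synchronization mechanisms the lecture turns to a bit later; the values used are arbitrary, and the sketch assumes it is launched with at least two ranks:

```c
#include <mpi.h>

/* Sketch: each rank exposes one integer in a window; rank 0 then
   demonstrates put, accumulate, get and fetch-and-op against rank 1.
   Run with at least 2 ranks, e.g. mpirun -np 2. */
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int cell = 10;                 /* the exposed memory: one int */
    MPI_Win win;
    MPI_Win_create(&cell, sizeof(int), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    int val = 5, got = 0, old = 0;

    MPI_Win_fence(0, win);         /* all ranks open an epoch */
    if (rank == 0)                 /* put: write val into rank 1's cell */
        MPI_Put(&val, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
    MPI_Win_fence(0, win);         /* the put is complete after this */

    if (rank == 0)                 /* accumulate: rank 1's cell += val */
        MPI_Accumulate(&val, 1, MPI_INT, 1, 0, 1, MPI_INT,
                       MPI_SUM, win);
    MPI_Win_fence(0, win);

    if (rank == 0)                 /* get: read rank 1's cell locally */
        MPI_Get(&got, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
    MPI_Win_fence(0, win);         /* got is valid after this fence */

    if (rank == 0)                 /* fetch-and-op: atomically read the
                                      old value and add val to it */
        MPI_Fetch_and_op(&val, &old, MPI_INT, 1, 0, MPI_SUM, win);
    MPI_Win_fence(0, win);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```

Each operation names the target rank and a displacement into the target's window, which is exactly the "write into another process's memory" semantics described above; the origin never posts a receive.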
|
|
30:00 | I think. Oh — any questions on that so far? So, |
30:14 | these communication actions, it says, behave as if |
30:23 | you are kind of the only one acting on the global memory. Now, as I |
30:34 | said, depending upon what the logic of the code requires, there's the issue of |
30:42 | the order in which things happen. And this slide says — I think the next |
30:46 | few slides may illustrate it better, because they are pictures instead of text — |
30:51 | it tells when there is an ordering implied between subsequent communication requests and when |
31:03 | there is not. But this is the documentation — the |
31:10 | text to those pictures. So this illustrates what was said on |
31:16 | the previous slide: basically, if you have some sequence of put statements, |
31:23 | there is no guaranteed ordering in which memory will be updated, or data delivered, at the receiving |
31:33 | process. So there can be many reasons things arrive out of order, or end up |
31:41 | out of order, at the receiving end. Just as an example again: if these processes |
31:51 | run on different nodes, there is no guarantee that the paths that the data |
32:04 | of the put statements take in the network are the same, and that the |
32:13 | time it takes for the first put to reach the receiving process |
32:24 | is any shorter — it could be longer — than what happens to be the |
32:30 | case for the second put statement. So, again, there is no guarantee about the |
32:38 | order at the receiving process. And mixing puts and gets, there's no guarantee |
32:48 | in terms of the ordering either. But when it comes to a sequence of accumulate |
32:56 | statements, apparently the standard says that, for that case, the ordering has |
33:04 | to be respected, in terms of the order in which the communication statements were |
33:11 | executed. So the next set of slides I have is showing a few of the |
33:23 | different ways of enforcing, I would say, ordering or synchronization between the |
33:34 | events that the processes execute. And I guess one of the important notions |
33:45 | is this notion of an epoch, which is basically a window of time, if |
33:50 | you like, between the synchronization events — I think that's the best way of describing |
34:01 | it, and hopefully the next few slides will make it somewhat more clear. |
34:10 | One construct that makes sense, that is intuitive, is this notion |
34:18 | of a fence, kind of like a barrier: you start one of these epochs |
34:26 | and say, okay, from now on there is, you know, communication expected, |
34:34 | and by some later point in the code, you want to make sure that the |
34:39 | communication actions between these two fence posts have completed — so that is an epoch. |
34:50 | As was said before, there is no particular ordering in terms of successive |
34:54 | puts and gets, but the order is guaranteed when it comes |
34:59 | to the accumulate communication instructions, or function calls. In this case, all the |
35:11 | processes involved in this window have to execute these two fence instructions. So this is |
35:19 | one way in which one can guarantee, for instance, that the put action has completed |
35:26 | on the receiving side before one starts to access that memory location that presumably was |
35:34 | meant to be updated by another process. |
35:46 | Okay — now, this one is a little bit more flexible, in the sense |
35:54 | that the target and the origins each execute their own definition of the start and finish of |
36:04 | an epoch. So, in this case, the target starts an epoch with |
36:13 | the post call and completes it with a wait call, and on the origin there are |
36:21 | also separate calls: a start by the origin and then a completion of the epoch. |
36:30 | If I remember correctly, these epochs are not allowed to be nested, and |
36:41 | they have to be sort of matching calls on the targets and the origins. |
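A sketch of this post/start/complete/wait pattern may help; this is my own illustration, not the lecture's code. It assumes rank 1 is the target, exposing its window to rank 0, and rank 0 is the origin doing a put; the groups name which ranks each side pairs with:

```c
#include <mpi.h>

/* Sketch of general active target synchronization with
   MPI_Win_post / MPI_Win_start / MPI_Win_complete / MPI_Win_wait.
   Rank 1 is the target (exposure epoch), rank 0 the origin (access
   epoch). Run with at least 2 ranks. */
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int cell = 0, val = 42;
    MPI_Win win;
    MPI_Win_create(&cell, sizeof(int), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Group world, peer = MPI_GROUP_NULL;
    MPI_Comm_group(MPI_COMM_WORLD, &world);

    if (rank == 1) {                       /* target side */
        int origin_ranks[1] = {0};
        MPI_Group_incl(world, 1, origin_ranks, &peer);
        MPI_Win_post(peer, 0, win);        /* open exposure epoch */
        MPI_Win_wait(win);                 /* returns once the matching
                                              complete has been made;
                                              cell now holds 42 */
    } else if (rank == 0) {                /* origin side */
        int target_ranks[1] = {1};
        MPI_Group_incl(world, 1, target_ranks, &peer);
        MPI_Win_start(peer, 0, win);       /* open access epoch */
        MPI_Put(&val, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
        MPI_Win_complete(win);             /* close the access epoch */
    }

    if (peer != MPI_GROUP_NULL) MPI_Group_free(&peer);
    MPI_Group_free(&world);
    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```

Unlike the fence, only the participating ranks make these calls — which is the added flexibility mentioned above — but, as said, the post/wait on the target must match the start/complete on the origin.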
|
|
36:54 | Then — what I talked about before were kind of the active modes, where both target |
37:00 | and origin are participating in defining the epochs. Whereas the passive target mode |
37:11 | means that, in this case, only the origin process wants to make |
37:19 | sure a certain communication action has completed before moving on in the code. |
37:28 | In that sense, you can think of it as creating these two fence posts out of |
37:36 | the lock and unlock pair. This slide is just the text that spells out |
37:47 | what the arguments are. And again, it applies to the window — that |
37:55 | is, the globally shared memory. And I think there's another one that, again, |
38:03 | in a similar way defines what is involved in the locking and unlocking, or what guarantees |
38:16 | about sequencing of actions in the window it gives. So this is it — I won't go through everything, |
38:27 | but just to illustrate that, for the correctness of the code, sequencing may be |
38:34 | important, and that when you use one-sided communication, the synchronization has to be made explicit |
38:42 | through one of the synchronization functions — where there is the kind of more |
38:54 | rigid fence function call, the passive-target lock and unlock, or the |
39:03 | somewhat more flexible active-target calls that are separated between the |
39:17 | origin and target processes. So I |
39:27 | hope, even though my description wasn't very clear, that it at least gave some idea — a better idea — of how the interaction |
39:41 | between sending and receiving processes can be synchronized in one-sided communication. |
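For the passive target mode, a minimal sketch (again my own, not the slides' code) shows the key difference: only the origin makes synchronization calls, and the target does not participate at all:

```c
#include <mpi.h>

/* Sketch of passive target synchronization: the origin brackets its
   one-sided operations with lock/unlock on the target's window; the
   target makes no RMA synchronization calls. Run with >= 2 ranks. */
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int cell = 0, val = 7;
    MPI_Win win;
    MPI_Win_create(&cell, sizeof(int), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    if (rank == 0) {
        /* Exclusive lock on rank 1's window portion; MPI_LOCK_SHARED
           would instead allow several origins to access it at once. */
        MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 1, 0, win);
        MPI_Put(&val, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
        MPI_Win_unlock(1, win);   /* the put is complete at the target
                                     when unlock returns */
    }

    /* This barrier is only so rank 1 does not free the window before
       rank 0 has written; it is not part of the RMA synchronization. */
    MPI_Barrier(MPI_COMM_WORLD);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```

The lock/unlock pair is exactly the two "fence posts" described above, but erected by the origin alone.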
|
|
39:56 | There was a question raised earlier by others in the course — so, did that help answer |
40:05 | it? Yeah, of course. Okay, so the next example is not so much |
40:09 | about using synchronization; the next examples are trying to illustrate some of the performance benefits |
40:17 | in terms of using one-sided communication. But I would say |
40:28 | that the flexibility has been one of the main driving forces for why puts and gets — |
40:35 | one-sided communication — were introduced in MPI. It wasn't part of |
40:39 | the first, I think, standards releases. So this is trying to |
40:50 | illustrate the performance of the one-sided communication, for both the puts and |
40:58 | gets. And there is a bunch of curves here, and I guess what they compare is |
41:03 | the standard one-sided MPI — that was the one-sided MPI curve — |
41:07 | and the other ones are different versions of, I will say, one-sided |
41:13 | interfaces from Cray — okay, a computer company that has been particularly focused |
41:23 | on high-performance computing systems, if you happen to know it. But for those who don't |
41:30 | know: Cray is synonymous with supercomputers, and that has very much been their |
41:38 | product line. Um, so, when it comes to the put, on the left side |
41:46 | of this graph, you can see a latency that is remarkably lower, in |
41:57 | particular for small messages. And the axes, I think, are both |
42:08 | logarithmic. So even though it does not show as a huge difference, the |
42:13 | scale is logarithmic, so the reduction is a sizable fraction. And when it comes to coordinating |
42:23 | processes, there's not usually a whole lot of data, so the messages, or the |
42:30 | data being communicated, may be quite small. So in that case, puts and gets |
42:36 | may be quite beneficial, in particular for small messages; once you get to large message bodies, |
42:45 | then it's not so apparent what the gain is, because then the payload dominates. |
42:53 | And again, apparently for the get, which is a round trip — you have to request |
42:59 | and receive the data — the benefit was not quite as pronounced as for the put. |
43:04 | And then there's another comparison here in terms of the |
43:11 | message rate, and in that case it's also noticeable: again, in this case, |
43:18 | a large factor for the smaller messages, and then it's not |
43:26 | that marked for very large messages. Okay — and then I think there were a |
43:35 | couple of other examples here of potential benefits. Um, and I guess one thing — on |
43:47 | the left-hand side, higher is better; in that case, it was a |
43:52 | search benchmark. And I think one thing that may be good to be aware of, |
44:01 | in terms of the blue curve in the graph on the left-hand side, is that the |
44:12 | communications software — depending on what is being used for actually sending data — |
44:20 | um, the mechanisms used may be quite different for small messages compared to |
44:35 | large messages. And if anyone has taken a networking class, you may |
44:46 | have become familiar, in terms of TCP/IP, with jumbo packets and such. |
44:52 | So the mechanisms that may be used for small messages may have different |
44:59 | buffering mechanisms compared to large messages. And that, my guess is, is the reason for |
45:04 | this very significant drop in terms of bandwidth once you get, I |
45:11 | guess, past 32, or at most 64 or whatever, in the number of elements in |
45:22 | this case. Well, that is just one |
45:28 | caveat, so to speak, to be aware of: the underlying communication mechanisms used |
45:33 | by MPI libraries may mean that, if you do some measurements, |
45:43 | you'll see behavior like the blue curve instead of a nice smooth one. |
45:51 | And this is, again, I guess, another, more application-oriented benchmark. In some |
46:00 | cases, you know, the benefits are marginal; in some cases they are quite |
46:04 | measurable. And I think that was — I have an example, I remember, just from |
46:16 | using the one-sided communication. So, any questions on these, |
46:24 | uh, performance graphs or benchmarks? If not, I'll talk a little bit more about |
46:35 | an explicit way in which the gets and puts are used. The |
46:46 | example I'm going to talk about in fact just uses puts, as far as I remember. |
46:50 | Now, um, it is the Jacobi example you have seen a couple of |
46:58 | times before already, so the notion of this computation should be |
47:05 | quite familiar. The basic operation is a relaxation operation in which a grid point |
47:16 | is updated. It's slightly different, I think, from the previous Jacobi version |
47:22 | that was used for OpenMP. That was also a five-point |
47:27 | stencil, but at that point, when updating the center point, it |
47:35 | just used the four neighboring points, and the five-point stencil in this case is |
47:41 | actually also using the center point itself. But that's kind of a minor difference. The |
47:47 | point is that one needs to access data from four neighbors. Now, the thing is |
48:00 | that now we're trying to use not just a single process but several processes. |
48:07 | And the reason for several processes is that we want to use several nodes in a cluster. |
48:12 | So now things get partitioned, um, into sub-meshes, or sub-grids, per |
48:22 | process, and those processes are most likely not going to be on the same node. |
48:29 | Some may be, uh, depending on how you want things to be mapped; |
48:37 | I think of these processes, for this exercise, I would say, as being on different |
48:44 | nodes. It doesn't necessarily matter, because this is only about the logical interaction between |
48:54 | processes. But as we can see on the right side of this graph, when it |
49:01 | comes to mesh points that are at the edge of the sub-domain of the global |
49:06 | mesh allocated to a process, one would need data from another process. So this is |
49:14 | where we come to the point that one would need to get data from the processes holding |
49:25 | adjacent sub-meshes. So, um — yes, I said that before. |
49:41 | I think, in terms of processes, the typical rule is what's called the owner-computes rule: |
49:49 | the process that got a sub-mesh assigned to it also does all the computations needed |
49:59 | to do the update on that sub-mesh. So, in that case, in this figure, |
50:06 | that means that the update of the white sub-mesh requires the yellow mesh points, or |
50:17 | cells, in order to do the full update of the white sub-mesh. So |
50:26 | sometimes these are called the boundary cells, sometimes the shadow regions, and there are many other |
50:32 | names for these yellow pieces. So what typically is being done, then, is that each |
50:42 | process allocates a somewhat larger sub-mesh that can hold the yellow pieces of the neighbors too. |
50:54 | So those yellow pieces are basically copies, or replicas, of what is in the adjacent |
51:04 | process. So I guess this is what it says here: |
51:10 | one adds these yellow shadow regions, and they can then be holding the data that |
51:22 | the white pieces need for updating their own cells, or grid points. Yeah, |
51:32 | and on this slide, the green frame shows what one process needs to send to its |
51:43 | adjacent processes, because the green cells are the things that are needed by the yellow regions of |
51:53 | the four adjacent processes to the one that has the green box. |
52:04 | Now, one can use this one-sided communication to do this. This is |
52:11 | setting up the origin and target first, to do the one-sided communication, and then one |
52:19 | can use the put statement to update those regions. And let's see if |
52:30 | there was another slide for this. Yes — this is just the code |
52:36 | that does this thing, but I won't talk through the code in detail. |
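The slide's code is not reproduced in the transcript, so here is a rough sketch of the idea — my own illustration, simplified to a 1-D decomposition with only left/right neighbors to keep it short; the slide's 2-D version adds up/down halos the same way, and the array size N is made up:

```c
#include <mpi.h>

#define N 256               /* interior points owned per process */

/* Sketch of a halo (shadow region) exchange with one-sided puts:
   each rank owns u[1..N] and keeps halo copies of its neighbors'
   edge values in u[0] and u[N+1]. Every rank puts its own edge
   values directly into the halo cells of its two neighbors. */
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double u[N + 2];        /* interior plus two halo cells */
    for (int i = 0; i <= N + 1; i++) u[i] = rank;

    MPI_Win win;            /* expose the whole array, halos included */
    MPI_Win_create(u, (N + 2) * sizeof(double), sizeof(double),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    int left  = (rank - 1 + size) % size;   /* periodic neighbors */
    int right = (rank + 1) % size;

    MPI_Win_fence(0, win);
    /* my leftmost interior value -> left neighbor's right halo (N+1) */
    MPI_Put(&u[1], 1, MPI_DOUBLE, left,  N + 1, 1, MPI_DOUBLE, win);
    /* my rightmost interior value -> right neighbor's left halo (0) */
    MPI_Put(&u[N], 1, MPI_DOUBLE, right, 0,     1, MPI_DOUBLE, win);
    MPI_Win_fence(0, win);  /* halos are now valid; do the Jacobi
                               update of u[1..N], then repeat */

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```

Note that no rank posts a receive: each rank only describes what it sends, which is the flexibility argument made earlier, and the fence pair guarantees the halos are filled before the update step reads them.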
|
|
52:44 | So — any questions on this example of what one tends to use in many codes that have |
52:51 | this notion of halo regions? Now, this was just a five-point stencil, so |
52:58 | there was kind of only one layer of cells from an adjacent process needed. But |
53:06 | sometimes one has, um, a deeper stencil. Um — so, to go on, a few |
53:16 | slides here to show such a stencil. So if you, for |
53:21 | instance, have something that goes two grid points away from the center point in |
53:29 | each direction, um, then you will need to reach deeper into your neighbors and |
53:36 | also send more of your own grid points to adjacent neighbors. And it also doesn't have |
53:44 | to be just following the coordinate axes; you can also have diagonals and more |
53:51 | complex stencils, and anyone that does numerical analysis knows that the formulas |
54:00 | for how you use grid point values can be considerably more complex. And this is |
54:11 | kind of known as higher-order stencils, which potentially gives you more |
54:19 | accuracy, or fewer iterations if you do an iterative algorithm. And, |
54:30 | for that matter, there is fundamental physics — quantum chromo- |
54:40 | dynamics — where they also compute interactions using more complicated stencils |
54:50 | that are used to represent the physics. Yes — on that: that picture |
55:00 | there: is this, of course, happening for all the stencils or the


55:06 | cells, right? Right. For the second row, all the way to


55:09 | the right, it has put arrows left, up, and down, right? So the idea


55:16 | is that, you know, basically every process executes this put statement, so there's


55:25 | only one being shown here. But if you take the middle node here,


55:29 | it means the process on the left correspondingly is, um, executing puts


55:39 | to its neighbors. So the center process on this picture will also be the


55:47 | target of puts from its four neighbors. Okay, perfect. All right, now
|
|
55:55 | I'll make one more statement relating to what was asked about earlier. So now,


56:06 | it turns out that, depending on what the algorithm is, um, if they use this kind of


56:21 | iterative Jacobi algorithm, in that case you may need to do, as


56:27 | was done in an example in some earlier lecture, old and new


56:37 | arrays, so that you don't kind of overwrite your own cells. So the processes kind


56:44 | of do things in lockstep in terms of updating. But there's also something


56:53 | that is known as chaotic relaxation, where you don't really care. So in


57:03 | that case, uh, the receiving side of the put statement, when


57:10 | it iterates, just uses whatever values it happens to have, whether the center point


57:17 | is several iterations behind or ahead; it doesn't care. So, as an example, it depends


57:27 | on what the iterative algorithm is whether you want to enforce an ordering


57:38 | of the global iteration steps, if you like, or if you don't


57:43 | care. So whether you use some form of fencing, synchronization, locks, or


57:51 | not, it depends, um, on the numerical algorithm you're actually using.
|
|
58:04 | And maybe I'll get to talk about that before the course is over. But this


58:09 | relates to, uh, asynchronous algorithms, and to elaborate a little bit further on


58:15 | this: if you think of running this on, uh, a very large


58:22 | cluster, the processes that are the neighbors of the center point may not


58:29 | be allocated to nodes in the physical machine that are anywhere close to


58:41 | where you happen to be. So the communication time for some neighbors may be


58:48 | very short, and to other neighbors it may be very long. And so that's why,


58:59 | in terms of parallel computing, this asynchronous style of algorithm has gotten a revival,


59:06 | instead of trying to enforce synchrony, because if you can use an asynchronous


59:14 | algorithm, it may, in the end, require a lot less time per iteration,


59:22 | and you can reach the answer, to the precision you want, more quickly.
|
|
59:30 | So here is just a little bit to illustrate the code for this particular


59:39 | example. And you can see there are just four put statements for the four


59:48 | neighbors, that is, for this sort of halo cells. And then there's the bookkeeping,


59:54 | of course, in terms of what the target windows are and what, for the puts,


60:01 | you know, is the source and what is the destination.


60:05 | And it uses the, uh, different windows assigned to the,


60:16 | um, different boundary cells, and so on. And I believe it's in C and perhaps Fortran.
|
|
60:36 | So I think these codes are available on Blackboard, or... I forgot to check


60:42 | on Blackboard before lecture, um, but I think they are available.


60:49 | So if you want to look at the full codes, they are there,


60:53 | not just this code segment for the put statements. I'll double check, but I


60:58 | think it's under the content section for the MPI lectures. Yeah, I think there should


61:04 | be example codes for everything, and I'll verify that they're there. If


61:09 | they're not already, I'm sorry, but thanks for reminding me to check where they are.


61:18 | But you can also, um, in fact, just Google, putting in the


61:25 | name; in most instances you will find the website and the git source as well. So
|
|
61:40 | here is just a quick example of how you can do an inner product. Um,


61:47 | if you do matrix-matrix multiplication, you can use the accumulate, where in this case,


61:54 | um, they use the accumulate with an add operation to do the,


62:04 | um, inner products.
|
|
62:10 | This is another slide, again, just showing where you can find the code. I hope it's there as well; we will double


62:14 | check. My apologies for not checking ahead of class. Um, so this


62:20 | is a little bit of discussion, which I already had, of when it's useful


62:27 | to use the different synchronization modes. And it depends again on the algorithm. And


62:35 | it could also depend on the environment, the platform, that you're actually using for the computation.
|
|
62:45 | Let's see, what else? Okay, so I'm going to switch a little bit


62:54 | away from dealing with processes, so I'll stop for a second to see if there's any


63:01 | question on one-sided communication. Then it'll be a different topic for the


63:09 | rest of the class. Okay. So this notion of virtual topologies


63:22 | is something that addresses how the processes are logically organized. And then there's


63:37 | the other part, which is the mapping, where the processes are then mapped onto


63:45 | the physical machine. So the reason for the notion of virtual topologies in the early


63:56 | days, um, of MPI was, I will say, twofold. One:


64:07 | like in the Jacobi example I mentioned, from the application's perspective it is


64:20 | kind of beneficial to think of the processes as being organized in, I


64:29 | guess in that case, a two-dimensional grid. It could be any number of dimensions:


64:36 | three, four, whatever is useful in the case of the application. But in that case, for


64:44 | the Jacobi algorithm, it's, you know, East-West, North-South neighbors,


64:48 | so the grid is kind of the logical way of writing the code. So that


64:54 | was one reason for these virtual topologies, and that was the Cartesian one.
|
|
65:03 | The other reason, well, and that's relevant depending upon the application,


65:10 | is that, in terms of scientific and engineering computations today,


65:17 | these highly structured grids may not be as dominating as they once were. There's


65:24 | more flexible partitioning of space today than maybe was the case in the early


65:31 | days of computing. Um, but the other reason was to map


65:44 | the processes onto the machine in a way that was beneficial with respect to


65:51 | how the various nodes in the machine were interconnected. And we haven't talked about interconnection


65:59 | networks yet, but I will in the next lecture. And for, I would say,


66:08 | 15, 20 years, and even still today, some form of multi-


66:18 | dimensional mesh has been quite popular, from small collections of processors to the largest machines;
|
|
66:32 | they were widely used and may still be. And, um, you may remember when


66:39 | I talked about processor architectures earlier. Um, when it


66:48 | comes to multicores today, the high core count ones tend to have a


66:53 | two-dimensional mesh interconnect. So there are some potential benefits even on a single chip


67:02 | to have mappings that kind of match the physical interconnect in order to


67:13 | reduce contention in the network, depending on what your algorithm is doing. Otherwise, in


67:21 | clustered machines, up to five-dimensional toruses have been used and are still being used by


67:29 | vendors. But there are also other topologies of interconnection networks for processors,


67:38 | for which the Cartesian partitioning, um, may not be particularly relevant. But,


67:48 | anyway, I will talk more about that when I talk about interconnection networks. And then


67:51 | the other topology that was supported as a virtual topology is to be able to define


67:59 | graphs. All right, so the next thing is... so I guess this is
|
|
68:04 | what I already talked about. In the upper left-hand corner is


68:12 | the typical 2D regular mesh that used to be used quite frequently and is still


68:20 | relevant for two- or three-dimensional grids in terms of physical things being simulated. Then


68:26 | the lower right-hand corner shows more of a dependence graph between, um,


68:33 | uh, you know, functions or tasks. Actually, the lower left


68:37 | shows a not atypical, you know, finite element grid,


68:42 | or maybe a finite volume type grid, that one may use for doing some simulations,


68:48 | say, of air flow around a wing or, uh, you know, some other thing like


69:00 | your car. And I'll talk about how to carve data structures up and allocate them to nodes in a later lecture. So now, this is what
|
|
69:10 | I'll talk more about next time: the mapping of the processes, um,


69:18 | onto the actual machine. But let me move on to talking about the


69:25 | MPI Cartesian topology first. So here are some of the Cartesian topology routines. So what


69:35 | MPI has, then, is routines that allow you to create a Cartesian topology.


69:46 | And I'll talk a little bit about the different routines and what they do.


69:53 | So this is kind of a little summary of what I want to talk


69:57 | about. So I think I have slides for each one of them separately, but


70:02 | the first one, Cart create, creates a Cartesian topology from an existing


70:12 | communicator. So you have one communicator as an input to the Cart create call,


70:17 | and then it creates another communicator. And I'll talk about the other ones as well.
|
|
70:24 | So there's that create, um, function call, and there are a couple of arguments,


70:38 | and as you will see, I'll go through the different lines here. As


70:43 | I said, there is one input communicator, and then you get


70:57 | a new one. So the input could be the communicator that has all the processes


71:04 | but has no structure, and then you want to create a sort of mesh


71:12 | configuration of processes. So you specify the number of dimensions; it could be two, three,


71:18 | or whatever number of dimensions you want. And then, for each of the dimensions, you


71:27 | also define the extent of that dimension by putting in an argument to say how many processes


71:34 | go along the quote-unquote X axis and how many processes along the Y axis, etc.


71:42 | Then it comes, again motivated by a lot of the numerical analysis codes:


71:51 | sometimes you want periodic grids or meshes, sometimes you don't. So you can also


71:57 | specify for each of the dimensions if you want it to wrap around or not.


72:06 | And then there is the reorder attribute you can specify. And that is whether


72:17 | MPI is told to preserve the process rank from what it is in


72:29 | the input communicator, so it will have the same rank in the output, or


72:37 | if MPI is free to assign a different rank in the output


72:43 | communicator. And I think that's pretty much what it says on this slide.
|
|
72:52 | The reorder is a little bit, perhaps, not so easy to understand, in the sense that


73:00 | it does not in itself do any assignment or mapping of processes to processors or execution


73:17 | units of any flavor. But it means that MPI, and MPI libraries that


73:22 | may have information about what the machine looks like, can, based on other steps


73:29 | in the mapping process, come up with new ranks that potentially benefit the application.


73:41 | Um, so here's just an example showing the case where there's wraparound along one


73:48 | axis and no wraparound along the other one. Okay, now there is also
|
|
73:57 | the Dims create routine, which is trying to offer some service when you don't quite


74:10 | know what you really want for the shape of the Cartesian grid that is


74:20 | being created. So if, in the, um, dims array, you put the


74:35 | value zero for the extent, the number of processes,


74:40 | in a particular dimension, MPI will take the, ah, opportunity to try


74:48 | and figure out what, from a performance point of view, with respect to what it


74:57 | knows about the cluster, to assign as a length or number of processes. And what it


75:10 | tends to do is it tries to make things as, kind of, so-called square as possible.


75:23 | Whether that indeed is beneficial to the application... it doesn't really have enough information


75:29 | to know, like the usage patterns in the code, to guarantee that that's, in


75:36 | fact, the case. Now, the other thing I wanted to
|
|
75:43 | say, and I was trying to figure it out: I think this is


75:51 | a function that is not supported in all MPI implementations. I could not find it


76:00 | in Open MPI; on the other hand, in another MPI implementation


76:05 | such as MPICH, as far as I know, it's still there. Um, so here's a


76:16 | little bit more concrete example of what it can do. And as it said on the previous


76:21 | slide, yeah, um, things need to be evenly divisible:


76:32 | the total number of processes by the dimension lengths. So that's why things may not


76:39 | come up the way you hoped, because there's this restriction that things need


76:45 | to be evenly divisible, including the total number of processes that you use.


76:54 | You can also specify, through the dims array, if you like, dimension


77:02 | extents whose product is smaller than the total number of processes; you can


77:08 | have that, in which case, kind of, that's fine. But you cannot exceed


77:13 | it. Basically, it has to fit in the communicator that you're given.
|
|
77:20 | We'll get to the examples, and I encourage you to take a look at these


77:23 | things, but you can see here that, if you have, it was six


77:32 | nodes in this case, in two dimensions in the first row, then it tries


77:36 | to do it as evenly as it can, and then it comes up


77:43 | with a three-by-two grid for two dimensions, since, uh, that divides


77:51 | everything. So two by two is too small and three by three is too


77:55 | big; you get three by two. Um, and I guess this may be a good
|
|
78:05 | point to note: the ordering in terms of processes it uses, as I


78:18 | said, is fairly undetermined. It's not specified in the standard, and an MPI implementation is


78:25 | free to use whatever it wants if reorder is set. Um, and one


78:34 | way it may do it is to use row-major ordering. And if


78:38 | you look at a lot of examples out there, that's what people show you


78:45 | on their slides. But it's not necessarily true that that is, in fact,


78:49 | what happens, because again, MPI implementers are free to use whichever way they want,


78:55 | and potentially use machine configuration info to do the mapping between processes in


79:07 | the input communicator and the processes in the output communicator. Let's see.
|
|
79:19 | There are a few more examples, and we'll try to do that when I talk about the


79:23 | mapping next lecture, as well as the combination of OpenMP and MPI. There


79:36 | are example materials where you can use mpirun and,


79:42 | you know, ask for eight processes, and in this case you want to partition with two


79:50 | dimensions in a four-by-two grid. Then it tells you which rank gets assigned


79:56 | to which coordinates in the two dimensions. And there are a couple of examples,


80:03 | and you can look at them. I think there are, uh, and


80:08 | there are basically inquiry routines that let you get information about the Cartesian grid:


80:18 | um, the number of dimensions, the extent of each of the axes, and whether they're periodic or


80:21 | not, so you can see what your grid looks like. There are also routines that do the conversions between rank and coordinates


80:34 | in the communicator, and this one is called Cart coords. You can get the, um,


80:42 | Cartesian coordinates for the rank that a process has.
|
|
80:51 | I think there's an example that you have, and this is one, I think the


80:57 | last one I wanted to try to cover, and then I will take questions. And


81:01 | then I can answer more questions about the Cartesian topology. And I will also talk about


81:06 | the graph one next time, since I didn't get time to do that. And


81:09 | also, sometimes, um, instead of using the puts of the one-sided


81:15 | communication, one can sometimes find it useful to use what is basically a shift operator:


81:22 | shift left, shift right, up, down, etc. And there are


81:28 | algorithms that kind of tend to use this, and with the coordinate organization one also has a


81:35 | shift routine, and this is kind of an example of that.
|
|
81:41 | Okay, there is more I should talk about, but my time is up. So I'll make a few more


81:45 | comments about the Cartesian topology next time, and talk about how you define graphs. It


81:50 | is in the slides I've uploaded. So I'll stop there and take questions. If


82:11 | not, I will stop the recording.
|