© Distribution of this video is restricted by its owner
00:01 Okay. I'll show some examples

00:10 and go through a couple of things that might be useful,

00:15 not just for the assignment but also for your projects as well. So I'll start

00:22 with some simple examples and a couple of things related to process binding,

00:28 mapping and so on.

00:38 So, starting with a really simple example, nothing fancy.

00:47 Yes. So this is the basic skeleton of an MPI program. It looks

00:53 like this: you need to make sure you have the header file, mpi.h,

00:58 included in your program, and then any standard C or C++ header

01:03 files that you may need. Most important of all, you need to start

01:10 with MPI_Init. Right now we just do a default initialization without

01:16 passing any extra parameters to it. This program here is similar to

01:22 the first OpenMP program, where we just requested the number of threads

01:30 spawned in the parallel section and what their IDs were, and printed them.

01:35 Here with MPI you can do similar things.

01:38 First, this call, MPI_Comm_size, tells you the number of processes

01:44 that belong to this communicator. And this is the first

01:52 communicator that MPI provides you, MPI_COMM_WORLD, which contains all the processes

01:57 that you spawn when you run the program. So this call gives you the

02:02 number of processes in MPI_COMM_WORLD. You can also request the

02:08 rank of each process by using MPI_Comm_rank, and again you use the communicator

02:14 as the context for the set of processes that you're referring to when requesting

02:20 ranks, and this is the variable that gets the result of

02:26 this function call. You can also get some information about the processor name

02:32 using MPI_Get_processor_name. I'll show you how the output looks.

02:36 And then there is just another simple hello-world print statement. And when you are done

02:44 with your program you need to call MPI_Finalize to remove all the

02:49 MPI setup that was done for your program. So any MPI program will

02:54 look very similar to this. It should have one MPI_Init

02:58 and one MPI_Finalize. As we

03:05 talked about in the last lecture, MPI is a standard and there are

03:10 several implementations of it. In this case I'm right now on a

03:15 Bridges-2 cluster compute node, and you guys have the Open MPI

03:23 implementation of MPI. To use that implementation you

03:29 need to do a module load of Open MPI, and that lets you use that

03:35 particular module. This is the Open MPI module that you need to use on

03:40 this cluster. You also have access to the Intel MPI implementation, for which the package

03:47 name goes something like impi plus whatever version of that package

03:53 happens to be there. Now, when you want to compile your MPI programs, for a C

03:59 program you simply do it with mpicc rather than

04:04 gcc or icc. If I just hit Tab here a

04:08 couple of times, it will show the other available commands.

04:13 mpic++ is the C++ compiler for MPI. If you're using

04:18 Fortran then you have the Fortran compiler wrapper as well, and then mpirun

04:23 is what you will use to execute MPI programs and specify how many processes

04:30 you want to spawn for your program. So here I simply do mpicc.

04:37 And so there's nothing special about the compilation. It's simply just replacing

04:44 your general gcc compiler with the MPI compiler wrapper, and that generates your executable.

04:52 Now, there's nothing special that you have done in your program. You have only added

04:59 MPI calls to your program. It's not like OpenMP, where you specify the

05:03 number of threads or anything inside your program. The only time

05:09 your program will be replicated across processes is when you run it using

05:16 mpirun. So you provide the -np flag, which stands for

05:20 number of processes. And here, let's say I ask for four processes and then run the

05:25 program. So what that's going to do is replicate

05:30 the program, the entire program, on four processes. And when I run it,

05:39 that is weird. I didn't specify any bindings. Okay.
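
A minimal sketch of the skeleton described above, assuming a C program built with an MPI compiler wrapper. The file name hello.c and the wording of the printed message are illustrative and are not taken from the lecture's source file.

```c
/* hello.c: minimal MPI skeleton (illustrative sketch).
 * Build and run, assuming an MPI module is already loaded:
 *   mpicc hello.c -o hello
 *   mpirun -np 4 ./hello
 */
#include <mpi.h>
#include <stdio.h>

int main(void)
{
    int world_size, world_rank, name_len;
    char name[MPI_MAX_PROCESSOR_NAME];

    /* Default initialization: no extra parameters are passed. */
    MPI_Init(NULL, NULL);

    /* Number of processes in the MPI_COMM_WORLD communicator. */
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    /* Rank of this process within MPI_COMM_WORLD. */
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    /* Name of the processor (host) this process is running on. */
    MPI_Get_processor_name(name, &name_len);

    printf("Hello world from processor %s, rank %d out of %d processes\n",
           name, world_rank, world_size);

    /* Tear down the MPI environment; no MPI calls are allowed after this. */
    MPI_Finalize();
    return 0;
}
```

Every one of the four processes runs this whole program, so the print statement appears four times, in no guaranteed order.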

06:08 Okay. I don't know what's going on here. Yeah, I

06:17 know. I know I asked for 65 when I requested these resources. I'm

06:29 sure I have 65. Yeah, I requested 65 cores when I was

06:33 getting these resources. Interesting. Let me see if I can do this

06:44 stamp duty up the there was a . All right. Mhm. I'm

06:58 about that. Okay. Yeah, . This number is coming here.

07:17 Okay. I'll have to check. The binding example was on the Bridges

07:21 cluster, so I'll need to see what can be done to show that. But

07:25 I'll start with these simpler examples here. So here, rather than

07:30 Open MPI, I'm using the Intel MPI, so you get to see both of them

07:36 right now because of this problem. Okay, so mpirun -np

07:42 4. Hopefully this works. There we go.

07:49 So you get four processes, each of which got the entire program

07:56 to be executed there. The second simple example that we saw last time was how

08:05 to do communication between two processes. So again I simply have

08:10 MPI_Init, then size and rank, and I chose one of the processes to be the

08:16 source. There's one if condition that checks whether the world rank equals the

08:21 source rank, and that process becomes the sender in this case. And then I have the other

08:27 process, which is process one, the destination, which posts a receive statement.

08:37 So, as we saw last time, we need to have both,

08:40 a pair of MPI_Send and MPI_Recv, for data communication with these blocking calls,

08:47 and then in the end you have MPI_Finalize.

08:50 Here the sender is sending just a value of five to the receiver

08:56 in this case, and then compilation will be the same. And

09:07 let's do this. Yep. Process zero sent five

09:20 in this one, and the receiver received that one data element here.
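
The matched pair of blocking calls described above can be sketched as follows. This is an assumption-laden reconstruction, not the lecture's file: the tag value, the variable names and the check for at least two processes are illustrative.

```c
#include <mpi.h>
#include <stdio.h>

int main(void)
{
    const int source = 0, dest = 1, tag = 0;  /* illustrative choices */
    int world_size, world_rank;

    MPI_Init(NULL, NULL);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    if (world_size < 2) {
        if (world_rank == 0)
            fprintf(stderr, "Run with at least 2 processes.\n");
        MPI_Finalize();
        return 1;
    }

    if (world_rank == source) {
        int value = 5;
        /* Blocking send of one int to the destination rank. */
        MPI_Send(&value, 1, MPI_INT, dest, tag, MPI_COMM_WORLD);
        printf("Process %d sent %d\n", world_rank, value);
    } else if (world_rank == dest) {
        int value = 0;
        /* Matching blocking receive from the source rank. */
        MPI_Recv(&value, 1, MPI_INT, source, tag, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("Process %d received %d\n", world_rank, value);
    }

    MPI_Finalize();
    return 0;
}
```

Ranks other than the source and the destination skip both branches and go straight to MPI_Finalize.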

09:28 This is another one of the examples. Again, we saw last time what happens

09:35 if you have sends and receives that do not match, because that causes deadlocks.

09:42 As you can see, both ranks post a receive first, to expect

09:48 data from the other rank. It is ranks zero and one performing communication here,

09:54 and in this case, as we saw last time, that will cause a deadlock. So

10:00 if I run this program here with two processes, it will just keep

10:09 waiting. It will not show any output. It just sits in that deadlock, because

10:13 both processes are just waiting to receive some data from the other process.
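
The deadlock described above comes from both ranks entering a blocking receive before either one has sent anything. The sketch below keeps that ordering in a comment and shows one hedged way around it using non-blocking calls. The variable names are illustrative, and the MPI_Waitall call is an addition here: non-blocking requests have to be completed before their buffers are read or reused.

```c
#include <mpi.h>
#include <stdio.h>

int main(void)
{
    int rank, size;

    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size == 2) {
        int other = 1 - rank;              /* the partner rank */
        int send_val = rank, recv_val = -1;
        MPI_Request reqs[2];

        /* Deadlock-prone ordering: both ranks block in MPI_Recv and
         * neither ever reaches its MPI_Send.
         *
         *   MPI_Recv(&recv_val, 1, MPI_INT, other, 0, MPI_COMM_WORLD,
         *            MPI_STATUS_IGNORE);
         *   MPI_Send(&send_val, 1, MPI_INT, other, 0, MPI_COMM_WORLD);
         */

        /* Non-blocking version: both calls return immediately. */
        MPI_Irecv(&recv_val, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(&send_val, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &reqs[1]);

        /* Complete both requests before touching the buffers. */
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
        printf("Rank %d received %d\n", rank, recv_val);
    } else if (rank == 0) {
        fprintf(stderr, "Run with exactly 2 processes.\n");
    }

    MPI_Finalize();
    return 0;
}
```

Reordering the blocking calls so that one rank sends first while the other receives first would also remove the deadlock without any non-blocking calls.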

10:23 And the solution, I believe, is in this one, where either you can rearrange your sends

10:31 and receives to be correctly posted, or you can use the non-blocking calls. So the

10:36 earlier ones, MPI_Send and MPI_Recv, are blocking calls. That means the program

10:41 does not progress until those calls have finished. With the non-blocking calls, the sends

10:47 get posted, so the data gets to some internal buffer, a local buffer, of the

10:52 processes, and then they progress with whatever instructions they need to execute. Similarly, with

10:59 MPI_Irecv, the receive instructions are posted, and then there might be some

11:05 other runtime mechanisms that handle any data that was received from

11:11 the sender process at a later point in time. So it does not wait to see whether

11:15 the data was actually sent from that process or not. Yeah, so if

11:27 you run this now, it should hopefully finish. As you can see, there's

11:34 no implied order between any of the processes here as well. The receive and send

11:40 for rank one was posted before rank zero, so you need to keep in mind,

11:45 similar to the case of OpenMP, that your outcome or correctness should not be dependent on

11:52 the order in which the processes execute. If you want to

11:57 make sure of an order, then you can use any of the barrier constructs that MPI provides.

12:02 Okay. So

12:17 this is one simple example of how you perform a broadcast and also how you

12:23 can do some reductions in this example here. So here what I'm doing

12:29 is just having a buffer which is initialized with some character values. It's an array of

12:36 characters, and then the MPI_Bcast function call. The MPI_Bcast function is a collective call.

12:44 So all the processes that are involved in this broadcast function need to call it

12:48 together. Well, not exactly together, but at least they all need to

12:53 call it. And you pass the buffer that

13:01 will be broadcast to all the processes. And you need to remember that the

13:07 receiving processes should also have declared the same buffer in their memory.

13:15 Let's say here, when initializing this buffer, there was an if condition so that

13:22 only rank zero initialized it, only rank zero declared it. If the

13:28 other processes try to receive some data into it, it will result in an error

13:33 because that memory location will not exist unless you declare it. So make sure

13:39 that memory location is accessible on all the processes. And here, this

13:45 is the number of elements. So that's not the total size of the

13:51 array. The total size is computed by MPI, and I'm using the

13:56 MPI types that you provide in these calls. So in this case that's MPI_CHAR,

14:01 which corresponds to a character, which is one byte. Right? So n times one byte

14:06 will be the number of bytes that will be transferred or broadcast to all the processes.

14:12 And then you also need to specify the root process for your broadcast.

14:17 And here process zero, the one having rank zero, is the

14:22 root, and all of these processes belong to the MPI_COMM_WORLD context,

14:28 so I also need to provide that. Yeah, you need to pass a pointer

14:38 to a memory location. It could be an array of any type, or

14:45 it could be a pointer to a struct, in which case you need to define

14:49 custom data types, as we saw last time, to define the sizes and types of the fields

14:54 in the structure, right. Anything, as long as it points to a memory

15:07 location. Right. It needs to be a memory location.

15:16 Right, so this is the way you broadcast. Now, one other interesting thing

15:20 that's going on here is these calls, which are MPI_Wtime calls,

15:27 that are being used here. So each process is computing this local time

15:34 locally. What I'm doing here is trying to compute

15:42 the maximum, minimum and total time taken by all the processes to finish

15:50 this MPI_Bcast function. And this MPI_Bcast function is again a blocking call. So

15:56 even though process zero may send data to some process, that process may take

16:01 some time. There will be a latency involved in receiving that data. The receiving process

16:07 will not reach this second MPI_Wtime call here until the broadcast finishes, which gives the

16:14 time that it takes to finish. Since these local times are different for each

16:18 process here, by using MPI_Reduce you can also check what was the maximum,

16:25 minimum and total time across all the processes that were

16:29 involved in this broadcast function. Here is the simplest way you can do it, which

16:33 is by providing the address of the local variable that contains the

16:40 time for each process. Then there is a global max time

16:47 variable, which will end up having the

16:53 maximum reduction value only on process zero. Process zero will get the

17:02 maximum value of local time out of all the processes there. So let's say you

17:09 have four processes. One process took 10 seconds, the other ones 20, 30

17:13 and 40 seconds. Right? So with this reduction, process zero will get to

17:17 know that 40 seconds was the maximum time for one of the processes.

17:22 So that's the maximum reduction. With the minimum reduction you can get what was the

17:27 minimum time, so 10 seconds for one of the processes. And then if you want

17:33 to get the average time taken by all the processes to finish this broadcast

17:38 function, we can simply do a sum reduction on this local time variable,

17:44 and then towards the end you can do the sum of times divided by

17:53 the number of processes to get the average time across

17:58 all the processes. Does that make sense? Okay.
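
The broadcast-and-timing pattern described above can be sketched as follows. The buffer size N, the fill character and the variable names are illustrative assumptions, not the lecture's actual values.

```c
#include <mpi.h>
#include <stdio.h>
#include <string.h>

#define N 1024  /* number of elements; an illustrative size */

int main(void)
{
    int rank, size;
    char buffer[N];               /* declared on every rank, not only the root */
    double t_start, local_time;
    double max_time, min_time, sum_time;

    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0)
        memset(buffer, 'a', N);   /* only the root fills in the data */

    t_start = MPI_Wtime();
    /* Collective call: the count is in elements (N chars), root is rank 0. */
    MPI_Bcast(buffer, N, MPI_CHAR, 0, MPI_COMM_WORLD);
    local_time = MPI_Wtime() - t_start;

    /* The results of these three reductions are only defined on rank 0. */
    MPI_Reduce(&local_time, &max_time, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    MPI_Reduce(&local_time, &min_time, 1, MPI_DOUBLE, MPI_MIN, 0, MPI_COMM_WORLD);
    MPI_Reduce(&local_time, &sum_time, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("bcast time: max %g s, min %g s, avg %g s\n",
               max_time, min_time, sum_time / size);

    MPI_Finalize();
    return 0;
}
```

With the 10, 20, 30 and 40 second figures used in the explanation, rank 0 would end up with a maximum of 40, a minimum of 10, and an average of 100 / 4 = 25 seconds.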

18:09 And now, when I run it... Right. But how do

18:18 you compute averages across the processes? You need to get everyone's

18:25 times and then compute the average. No, no,

18:42 each process is running in its own memory. Remember the address space.

18:47 So one process does not know the local time of another process,

18:53 or the total time, in its memory. Each process, again, as I said,

19:07 only knows about its local time. So one process may finish early.

19:11 Another process may finish later. All right. So let's say process zero

19:17 finished in 10 seconds. Some other process finished in 40 seconds. How

19:22 does process zero know that the other one finished in 40? Right. All

19:31 right. Yeah. You'll see the difference between the timings they take,

19:41 so it's not significant right now. I was hoping to use the Bridges one,

19:47 so I could show you some different results. But let's see

19:50 when I get to the end. But, yeah, as you can see,

19:53 there's some difference in the amount of time taken between different processes to finish

19:59 this broadcast operation here, and it depends on where your process ends up on

20:06 the node, or whether you are on multiple nodes. Right now I'm only on a single

20:11 node, but you can also do communication across nodes, and then the latency

20:15 will include the interconnect latencies, the InfiniBand or whatever is

20:22 being used between the nodes of your cluster for communication across the nodes.

20:27 Today, on these nodes, Slurm is

20:36 taking quite a while to actually give access to resources. Otherwise I would have

20:40 shown it. If I give up my resources right now, I might

20:45 not get access again quickly. Yeah, but again, it's

20:54 a little bit different, but not much. I'll see

20:58 if I can run it on Bridges towards the end and show some

21:03 results there. All right. So

21:13 this one is just another version of a reduction. I can show you that,

21:21 but it's not very interesting. Basically what's going on is every process is doing

21:31 an addition of some elements in a vector, and then with MPI

21:36 you compute a reduction, pretty much the same way that we did the reduction

21:41 of timings. So it wasn't that interesting. So this

21:51 is one of the ways to create groups,

21:57 groups of the processes that you have in your program. So what

22:04 I'm trying to do is, I have spawned eight processes, and then I'll try

22:09 to divide them into two groups, and then each group will

22:14 get its own local communicator rather than going through the world communicator. So, to

22:23 do that, first I have two arrays of ranks. ranks1

22:30 includes 0, 1, 2 and 3, and ranks2 includes 4, 5, 6 and 7. So

22:36 these will be the ranks that will be divided into two different groups from the

22:41 processes that I spawned. Okay. So, first I need

22:50 to get access to the handle that MPI has set for the

22:57 original group, and to do that I simply call this function, MPI_Comm_group,

23:02 and provide the communicator for that, and that gives you the handle inside this

23:10 variable, which is of type MPI_Group. So that's where you get access

23:16 to the handle of that group. Once you have got that, you

23:24 can have an if condition to decide which

23:33 ranks are going to which group. Whichever rank has an ID less than the number

23:39 of processes divided by two, so whichever

23:45 process has the smaller ID, will end up

23:48 in this if condition here. And I'm creating the group here by inclusion,

23:55 so that means we call MPI_Group_incl, as in I-N-C-L.

24:00 And then you provide the handle of the original group first, which we set

24:05 up here. You also need to specify how many ranks will be part

24:12 of this new group. And then you need to provide the IDs of

24:18 these ranks here that will be part of this new group. And

24:23 you also need to provide an uninitialized handle for this new group, which

24:33 is also of type MPI_Group here.

24:38 So any process that has an ID

24:45 of four or more here will go to this else condition

24:49 and will be part of the other group. The interesting thing is you don't need

24:54 to use two handles, one for each of these groups, because the program

25:02 gets replicated. Everyone will have one copy of this new group variable, basically. So

25:09 whatever they initialize in their new group variable will be local to them. All right.

25:14 So you only need one variable for all the processes. Once you

25:20 have called these MPI group functions, what you need to do is create a communicator

25:27 for this new group. That you can do by calling

25:31 MPI_Comm_create, providing the communicator for the original group, the handle for the new

25:38 group, and a new communicator for this new group, which is new_comm, and which is

25:43 of type MPI_Comm here. So basically you get the handle and then you

25:49 create a new communicator for the new groups. And

25:57 this one is an all-reduce, which you can do on any local buffer within

26:02 these newly created groups here. MPI_Group_rank gives you the rank

26:11 of a particular process inside a particular group. Initially we were doing

26:17 MPI_Comm_rank, which gives you the global rank using the world

26:25 communicator. It gives you the rank of the process. You

26:29 can do MPI_Comm_rank again on the new communicator that you got, to get the group

26:34 rank as well. But this is another way to get the rank, by using the group

26:37 handle here. And this is similar to OpenMP,

26:50 where the group ranks are private to each of the groups here. The

26:56 global ranks, as you can see, go from 0 to 7. And then in the group

27:00 ranks you have 0, 1, 2 and 3, and again 0, 1, 2 and 3. So they

27:03 are local to each of the groups. A very simple example here is the

27:17 pi computation that we saw in OpenMP... Yeah, so the all-reduce.

27:31 All-reduce basically gets all the values from

27:41 all the processes, and rather than the reduced value ending up only on the

27:48 root, it ends up on all the processes that were involved in the reduction operation.

27:54 So in a normal reduction, let's say you had five processes, right? Whatever

28:00 type of reduction you perform, the reduced value ends up on the root

28:05 process, let's say process zero. With all-reduce, the reduced value ends

28:10 up on all the processes that were involved, not just on the root

28:15 process. That's right. So that's the only difference between reduce and all-reduce.

28:20 Oh, oh, okay. Right. Contact. Okay.

28:40 Uh huh, mm hmm. mhm mm. Okay, mm

28:55 But yes, it computes the total of the global

29:04 ranks in each group. So 0 plus 1 plus 2 plus 3 is 6, and 4, 5, 6 and 7 added up is

29:12 22.
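
The group example described above can be sketched as follows. This is a reconstruction under assumptions: the identifiers orig_group, new_group and new_comm are illustrative spellings, the check for exactly eight processes is an addition, and the all-reduce is assumed to sum the global ranks, which is what the totals of 6 and 22 suggest.

```c
#include <mpi.h>
#include <stdio.h>

#define NPROCS 8  /* the example assumes exactly 8 processes */

int main(void)
{
    int ranks1[4] = {0, 1, 2, 3};
    int ranks2[4] = {4, 5, 6, 7};
    int rank, size, group_rank, sum = 0;
    MPI_Group orig_group, new_group;
    MPI_Comm new_comm;

    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size != NPROCS) {
        if (rank == 0)
            fprintf(stderr, "Run with %d processes.\n", NPROCS);
        MPI_Finalize();
        return 1;
    }

    /* Handle of the group behind MPI_COMM_WORLD. */
    MPI_Comm_group(MPI_COMM_WORLD, &orig_group);

    /* One new_group variable per process; its contents are local. */
    if (rank < NPROCS / 2)
        MPI_Group_incl(orig_group, 4, ranks1, &new_group);
    else
        MPI_Group_incl(orig_group, 4, ranks2, &new_group);

    /* Communicator for the new group this process belongs to. */
    MPI_Comm_create(MPI_COMM_WORLD, new_group, &new_comm);

    /* Sum the global ranks inside each group; every member gets the sum. */
    MPI_Allreduce(&rank, &sum, 1, MPI_INT, MPI_SUM, new_comm);

    /* Rank of this process inside its new group. */
    MPI_Group_rank(new_group, &group_rank);
    printf("global rank %d, group rank %d, group sum %d\n",
           rank, group_rank, sum);

    MPI_Group_free(&new_group);
    MPI_Group_free(&orig_group);
    MPI_Comm_free(&new_comm);
    MPI_Finalize();
    return 0;
}
```

Run with eight processes, the first group should report a sum of 6 and the second a sum of 22, matching the totals quoted above.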

29:26 Right. So, in MPI... yeah, the reason I'm showing this example:

29:31 there's nothing special going on here. It's basically

29:37 an MPI_Bcast, which tells all the processes how many iterations we are going

29:42 to run for this pi computation. So that's the n in

29:48 the loop iterations here. The important point I want to make from this example

29:55 is that, unlike OpenMP, which will distribute all the indexes for you,

30:00 so you don't need to care about which index goes to which thread

30:05 in the general case in OpenMP, for MPI you always need

30:12 to use this methodology of computing the index through these process IDs, because

30:22 the program is replicated across all the processes. Right. So even though the processes

30:29 know how many processes are in the execution environment, the

30:34 runtime is not dividing up the loop iterations into smaller chunks. You have to

30:40 give them some iteration IDs that they compute without any involvement from the runtime.

30:47 So in MPI, whenever you want to parallelize a for loop

30:52 or any iterative structure, you always need to use that: start at

30:59 the process ID, and then increment by the number of processes up to the

31:04 number of loop iterations, or do it in blocks with some chunk size as the increment, to get

31:12 the elements that you are to compute. Yeah. Cool.
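
The index computation described above, starting at the process ID and striding by the number of processes, can be sketched in plain C without the MPI calls. The assumptions here are loud ones: rank and size are ordinary parameters standing in for what MPI_Comm_rank and MPI_Comm_size would return, the integrand 4 / (1 + x^2) with the midpoint rule is the common textbook choice rather than something confirmed by the lecture, and reduce_pi is a hypothetical serial stand-in for an MPI_Reduce with MPI_SUM.

```c
#include <assert.h>

/* Partial midpoint-rule estimate of pi computed by one process.
 * rank and size stand in for the values MPI_Comm_rank and MPI_Comm_size
 * would return; n is the total number of intervals. */
double partial_pi(int rank, int size, long n)
{
    assert(size > 0 && rank >= 0 && rank < size && n > 0);
    double h = 1.0 / (double)n;
    double sum = 0.0;

    /* Start at this rank's first index and stride by the process count,
     * stopping once the index exceeds n. */
    for (long i = rank + 1; i <= n; i += size) {
        double x = h * ((double)i - 0.5);
        sum += 4.0 / (1.0 + x * x);
    }
    return h * sum;
}

/* Serial stand-in for an MPI_Reduce with MPI_SUM: adds up the partial
 * results that the individual ranks would have computed. */
double reduce_pi(int size, long n)
{
    double pi = 0.0;
    for (int rank = 0; rank < size; ++rank)
        pi += partial_pi(rank, size, n);
    return pi;
}
```

For example, reduce_pi(4, 1000) adds up the four partial sums a 4-process run would produce; every index from 1 to n is visited by exactly one rank, so the total is the same estimate a single process would compute.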

31:20 Yeah. But for this example, the index computation is

31:26 exactly the same as you saw for OpenMP. Again, the iterations are computed

31:32 based on the process rank, and each process stops when it reaches

31:39 values larger than n. And the increment that each process

31:47 uses is the number of processes that are involved in the parallel section

31:53 of your MPI program, rather than only one. And each process computes

32:00 mypi, which is a local part of the value of pi that they

32:06 compute. And towards the end, MPI_Reduce just adds up the locally computed

32:14 values, mypi, into a variable pi, and that reduced value ends

32:23 up on process zero, which is the root of this reduce operation.

32:28 , yeah, yeah. Uh remind . They say that only you can

32:41 hard-code it, but in this case I didn't do it.

32:47 Generally, yeah. But what generally happens in an MPI program

32:53 is that, for the entire argument list, let's say you

33:00 have arguments that you pass to your program, right, like the

33:05 matrix size, let's say you're doing matrix multiplication, the grid size, what's the block size

33:10 that each process should get, and so on. Basically any

33:15 kind of arguments, right? Generally only process zero initializes those arguments that

33:23 it gets from the user when they execute the program. The next

33:29 step always is to broadcast those arguments to all the processes, as a

33:37 general way of programming in MPI. You will very, very rarely see

33:43 that all the arguments are initialized on all the processes. That's

33:49 generally what I see. I'm not sure what the exact reason for that

33:53 is, but that's the general programming paradigm that people use when writing MPI programs:

33:57 arguments are parsed on process zero, and it then broadcasts them to all

34:03 the rest of the processes. Yes. Yeah,

34:18 yeah, it's okay we're going And uh huh Come on Northern as

34:36 , a mountain is Yeah. Okay. Yeah. Running it

35:02 That's Yeah, I only chose the of processes times for a number of

35:07 that should have been performed so that steps in each process. Just only

35:12 four steps for the entire thing. Okay, let me see if I

35:18 can show you guys the binding examples I wanted to show.

35:26 I'm not sure what went wrong with that execution environment on Bridges.

35:33 Everything works fine until you start to demo it. Okay, let me

35:43 Okay. So with Open MPI there

35:51 is a very simple way to check how your processes are bound or pinned to the

35:57 resources, right, similar to what we saw for OpenMP. All

36:03 right. So, these are the mpirun parameters that we will be using. And I should

36:10 say, in Intel MPI there is a different way of doing it. There you do it using

36:14 environment variables, which I won't get into now, but we can see that later.

36:21 It's a good question, I know. So in Intel MPI

36:29 you do it using some environment variables that it provides.

36:34 Oh, someone was trying to

36:40 get in. I think I let them in.

36:46 With Open MPI you can simply do that with some of the flags that

36:52 mpirun provides you, and to get some information about what those flags are,

36:58 you simply do mpirun -h. And these are the

37:04 general flags that you can use, but there are some other categories that you

37:09 can also get some information about. A couple of them are interesting for us

37:14 now: one of them is mapping and the other one is binding. And to

37:20 get some extra information about those kinds of options that mpirun

37:27 has, this message here says to run mpirun --help mapping. And these are

37:36 the flags that you can use here for mapping. What it means is, when you

37:43 have more than one process, mapping tells the runtime where

37:50 the next process should be spawned or bound to. So let's say,

37:56 if I specify mapping as core, the first process will be spawned or bound

38:03 on core 0, and the second process will be bound to the

38:08 next core, as in the next physical core. If I specify mapping as socket,

38:16 let's say you have two sockets, then the first process will be bound to socket

38:20 zero, and then the second process will be bound to socket one. All

38:27 right. So mapping decides where your process ends up physically on

38:33 the system. Right. And then there's another one. Okay.

38:39 That's the binding. Right. So mapping is which physical location, out of your sockets or

38:48 cores, your process will end up in, right, where it will execute. There

38:58 is a default value for everything. Right? So, if you

39:02 don't specify it, whatever the default is for the runtime, as it was configured when

39:08 installing the MPI packages, will be used. Generally it's either core or socket.

39:14 Does that make sense? All right, I'll show you what

39:21 that looks like. I'm not going to get into that before I get to

39:25 it. I was just giving the definitions there. So that was mapping.

39:30 Binding means: what are all the locations your process can move between? That is, where can the

39:37 operating system move your process once it is spawned. So,

39:44 as you remember, the operating system has the authority to move your processes to different

39:49 places in the system. So if you, let's say, specify binding

39:56 as core, that means the process will stay only on that particular core for the

40:01 entire execution period. The operating system cannot move it. If you specify binding as

40:09 socket, that means the process can move between all the cores inside that socket. The

40:14 operating system has the authority to move the process onto any of the cores

40:19 in that socket. So let me show you a couple of examples and then

40:23 finish with that. Okay? So let's say... Okay,

40:36 with mpirun, if you want to see what the mapping is, you just need to

40:40 provide this --display-map flag. Then you need to provide, first, your mapping.

40:46 Okay, that's not good. Let me just look at my allocation.

41:02 Sorry? Yeah. Okay. All

41:17 right. Okay, mpirun --display-map --map-by,

41:27 which I will choose as core for now, and I'll also choose the binding,

41:34 --bind-to core. And then, I had asked for 65 cores, so 65

41:42 processes, and then execute that. There we go.

41:52 So what this output shows you here... Okay, it will be fine if I

42:01 make these things smaller. That will actually make it more readable, I think.

42:09 I know it looks small, but it will make sense when I show you the

42:12 output. Yes. So I did map-by core. Yeah, I know

42:23 it's hard to see, but bear with me here. Okay. And in

42:29 this case, apparently, what I got between these square brackets is only one core

42:36 here, as you can see. So that means I only got one core

42:40 in the first socket when Slurm gave me access to it. And then I

42:45 got a bunch of cores in the second socket. That's between these two square brackets

42:50 here. So, because I specified map-by core, that means each consecutive process ends

42:58 up on the next physical core. You can see from this that they end

43:04 up on these locations. So each process is bound

43:10 to the next available core. And by using --bind-to core, it also shows

43:16 you that each process is only bound to a single core. It's not going to move

43:22 to any other physical core or any other socket here. Right. And so

43:27 all of these 65 processes that I spawned ended up getting bound like this,

43:36 all the way up till the end of the second socket, one process

43:40 per core. All right. The difference I will show you now

43:50 is if I say --map-by socket. And in this case, and I'll show you why

43:59 I'm doing this, multiple processes will be bound to the same

44:05 core. So you also need to tell it that you are

44:10 allowed to overload a core. That means you are allowed to have multiple

44:18 processes on a single core. So you need to tell it that,

44:22 otherwise it gives you an error that you are trying to map multiple processes to the

44:26 same physical core. This flag lets you do that: --bind-to

44:32 core:overload-allowed. All right. So here you can see, because I did

44:40 --map-by socket, the first process went to socket zero. The second process

44:47 went to socket one. But then, rather than going to the next core

44:52 in the second socket, the third process now came back

44:58 to socket zero. Earlier

45:07 we were mapping by core, so that each process went to the next available physical core.

45:14 Now, because we are doing --map-by socket, you can see that

45:20 the next process ends up on the alternate

45:27 socket here. Does that make sense? Because we're mapping by socket.

45:35 So, as I said, mapping decides where your next process ends

45:40 up on the physical resources. It goes to the core in the

46:00 next mapping place. So since I said map-by socket, it went to socket

46:07 zero for the first process, and it alternated between socket zero and socket

46:13 one. Now, because I only got access to

46:19 one core there, the processes all land on the same core in socket

46:23 zero, like the third process. But the fourth process actually had access to more

46:29 places, one more core on top of core zero. So that's why,

46:35 rather than going to the same core in socket one, it went to the

46:39 next available core on socket one. So process one went to

46:45 core 0 on socket one. Process three went to core one on socket

46:50 one. So whichever available core is there, it will go there.

46:58 There will be a time when you'll see this process

47:04 come back, once it is done with all the available cores on that socket,

47:11 and wrap around back to core zero on socket 1,

47:16 if it has more processes to place. Is that making sense?
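
The placement pattern narrated above can be mimicked with a small toy model in plain C. This is only a sketch of the round-robin behavior as described in the walkthrough, not Open MPI's actual mapper: the placement struct, the function names and the avail array (cores available per socket in the allocation) are all hypothetical, and wrapping around onto already used cores is the model's reading of what overloading allows.

```c
#include <assert.h>

#define NUM_SOCKETS 2

typedef struct {
    int socket;  /* socket the process is mapped to */
    int core;    /* index among the cores available on that socket */
} placement;

/* Toy "map by socket" model: process p goes to socket p % NUM_SOCKETS
 * and takes that socket's next available core, wrapping around (that is,
 * overloading cores) once the socket's cores are used up.
 * avail[s] is the number of cores the allocation has on socket s. */
placement map_by_socket(int p, const int avail[NUM_SOCKETS])
{
    assert(p >= 0);
    placement where;
    where.socket = p % NUM_SOCKETS;
    assert(avail[where.socket] > 0);
    where.core = (p / NUM_SOCKETS) % avail[where.socket];
    return where;
}

/* Toy "map by core" model: fill the cores of socket 0 first, then those
 * of socket 1, one process per core, wrapping around if there are more
 * processes than cores. */
placement map_by_core(int p, const int avail[NUM_SOCKETS])
{
    int total = 0;
    for (int s = 0; s < NUM_SOCKETS; ++s)
        total += avail[s];
    assert(p >= 0 && total > 0);

    placement where = {0, p % total};
    /* Walk past sockets whose cores all come before this slot. */
    while (where.core >= avail[where.socket]) {
        where.core -= avail[where.socket];
        where.socket++;
    }
    return where;
}
```

With avail set to {1, 64}, the allocation in the walkthrough, map_by_socket places process 2 back on core 0 of socket 0 and process 3 on core 1 of socket 1, which is the pattern pointed out above.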

47:33 Yeah, so... you know what it is?

47:40 Slurm gave me access to only one core there, because I asked for 65 cores and

47:46 on these nodes you have 64 cores per socket. So I

47:50 got one core on socket zero and 64 on socket one.

48:01 Correct. Yes, if I had all the 128 cores. I didn't request them

48:05 because, again, on Bridges I'm getting long wait times because of that. But

48:10 if you were requesting all 128 cores, then you would have seen two

48:15 lines going all the way towards the end. Right?

48:24 And just one last example I'll show, then I'll let Dr. Johnson continue.

48:30 What was it? Yes, --map-by socket, and trying --bind-to socket as well.

48:36 So far it was binding to cores, so inside the socket no process can

48:43 move between cores. There's no movement at all,

48:47 basically. So if I do --bind-to socket, then the output

48:53 looks something like this. I know it's a little bit hard to look at, but I'll

48:56 guide you through it. All

49:02 right. So this here is process zero, for which, again, there's only one core,

49:08 so it's mapped and bound to socket zero.

49:15 Yeah. Okay.

49:25 That will make sense if I scroll here. So process zero has

49:35 one core on socket zero. But if you look at process one,

49:40 which is here, I think, yes, it's bound to all

49:52 the cores that were available on the second socket. So the mapping is

49:57 by socket, so each process goes to an alternate socket. But now the processes

50:02 are bound to a socket rather than a core. So the operating system can

50:07 move the process between any of the physical cores there. So depending on what

50:15 you want to do in your application, mappings and bindings can be a

50:20 deciding factor in what performance you get.

50:28 Let's see if this makes sense. So what I did

50:36 was run the broadcast operation I showed you, with 100 megabytes of data

50:42 to be exchanged. So this is the execution call that I performed.

50:47 Now, if you see my assignment in between the two processes, you can

50:54 see the difference, and now you can see what happens when you go to different sockets.

51:01 I'm still on the same node, and the only communication is happening between those two processes.

51:06 So even with different sockets, the minimum time is at least three times

51:11 less than the maximum time.

51:25 Here also the time is very close to that.

51:35 Yeah, 100 is really not the . All right. Yeah.

51:44 I credit. Oh exactly. Uh Yeah, that's pretty much what

52:02 have, Yeah, yeah. Oh, mhm Yeah, can't do

52:19 . Mm hmm. Mhm Right. , especially him up over the final

52:28 . I will. Mhm Yeah. . And it's on us so greatly

52:34 time and it should be on the represents in terms of projects, upload

52:44 again, if it's not there, think they all done this, that's

52:51 Change. The old one. Is still? Yeah, yes, that's

52:58 I was up to. The deadline on top here. So I the

53:04 to send it up and and since started to talk about it and it's

53:08 to get started early that I wanted be this project description and went

53:17 I think today, no, the structures, so it's not very onerous

53:24 do the project description is just, just think about it. Um and

53:30 there's a bunch of text here, but the point is we need to

53:37 write no more than a page in terms of what you want to do, what resources

53:48 need to do the project and then dramatic 16 years clusters or computers being

53:55 in the class and even have their and I want to reject it,

54:00 not something that is in process and , there's a good chance you can

54:05 it. And then, as I mentioned last time, you need to describe what

54:12 data sets you are going to use and how you're going to verify correctness and then

54:25 , so part of the thing that is common among students is that the data

54:33 sets that they have in mind are too small to make sense of when they start to

54:40 either parallelize or try to tune. If the running time is milliseconds, then it

54:47 doesn't matter what you do, you're going to measure overhead. So it

54:52 needs to be enough of a workload to collect some decent observations about performance.
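When enlarging the data set is not practical, one alternative (an assumption here, not something stated in the lecture) is to repeat the kernel until the total measured time is long enough. A minimal sketch, with `repeats_needed` as a hypothetical helper name:

```c
/* Hypothetical helper: how many times to repeat a kernel so that the total
 * measured time reaches at least target_seconds. single_run_seconds is a
 * rough estimate of one run, for example taken from a trial execution. */
long repeats_needed(double single_run_seconds, double target_seconds)
{
    if (single_run_seconds <= 0.0 || target_seconds <= single_run_seconds)
        return 1; /* one run is already long enough, or the estimate is unusable */

    long n = (long)(target_seconds / single_run_seconds);
    if ((double)n * single_run_seconds < target_seconds)
        ++n; /* round up so the total is not below the target */
    return n;
}
```

For example, a kernel that takes a quarter of a second needs `repeats_needed(0.25, 2.0)` repetitions, which is 8, to give a two-second measurement. Repetition keeps the data in cache between runs, so it can flatter memory-bound code; a larger data set remains the more honest option.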

55:02 that's what I'm trying to get out the description. So I'll give your

55:07 but it's a final guy. I you to just our ideas and that's

55:14 much I think all these taxes and I'm here in the long list

55:22 projects students in this past over the have done um some are and it

55:31 basically what their students interests have Uh huh. So it comes from

55:40 different aspects of different applications or some it is more just there have been

55:49 fifties, uh huh. You might interested in another something where this aspect

55:57 it and there's something three slides So there's lots of things to

56:03 Some of it comes from engineering scientific and someone with the systems come from

56:10 image based applications and um thank you some, you know, competition

56:21 second physics or dynamics or what many and disciplines. So these size,

56:32 are kind of the old examples, they should be on the website of

56:36 problem again and so on. This just to give you a deal.

56:42 right. Yes. And I'm sure they have things in mind that do

56:48 classes or from it or thesis that this system vote from the class

56:55 perfect. The violence of work on them next to the donald.

57:00 That's about me. We're gonna do you're interested in. Mr um so

57:11 can do it. You know, OpenMP, MPI, clusters,

57:15 OpenACC or whatever. Uh programming and choose whatever is

57:23 the most appropriate for the project. And as we said in the last

57:32 that the focus is to learn to use the techniques taught in the

57:44 If you get fabulous performance in the sense of close to peak performance or high

57:56 , that's terrific. But even if you don't, as long as you understand

58:01 what you have to fix, that's fine too. So long

58:12 . And high performance as a fraction of peak, high efficiency, is

58:17 not always easy; most of the time it's not. But the

58:23 of the class is that we should have an idea how to approach it. And

58:27 can tell you no one steps from first inclination what, but, you

58:32 , efficiency of that and then where are, it's time to wrap up

58:37 project. And then see what else you would have done if you had

58:43 more time. So it's yeah, guess I did say that, did

58:49 say, I think that's the beginning the chorus sets, um the written

58:56 on the presentation, so the exam , final presentation time and we want

59:02 get both the written report and I want to say at least 24

59:10 at the end of the presentation, everything to cut it preparing for feedback

59:15 time from the presentation. Mhm. the I think you said on the

59:25 one here a check, but still the example is on december time.

59:35 it's a monday, I have it to fight. So it's actually,

59:41 think the first thing they sound Oh, but I also know that

59:52 uh students take off of the holiday and if the class of the whole

60:04 to earlier, you can certainly do process, but that's what we kind

60:12 agreed upon the students in the past to what happened for. So I

60:25 questions in relation to projects, that's last time, you know, I

60:35 the individuals projects but to his final , some type of students, Children

60:43 could come back 13 smile, that very nice. The Syrians.

60:51 that's fine. Yes, some students have done MPI and

60:59 OpenMP, or OpenMP and OpenACC and so on, or the serial versus

61:09 , it's fine. So I just to understand we have whatever problem we

61:17 uh to understand. What efficiency is cold yet. Um it's when it's

61:33 good from the first information, not look under cold. What steps did

61:38 you take and why did you take them? And did it actually pay off?
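One simple way to record whether a step paid off is to compare run times before and after it. The sketch below uses the standard definitions of speedup and parallel efficiency; the function names are illustrative and not from the course material.

```c
/* Speedup of an optimized or parallel run relative to a baseline run. */
double speedup(double baseline_seconds, double new_seconds)
{
    return baseline_seconds / new_seconds;
}

/* Parallel efficiency: speedup divided by the number of workers
 * (threads or processes) used for the new run. */
double parallel_efficiency(double baseline_seconds, double new_seconds,
                           int workers)
{
    return speedup(baseline_seconds, new_seconds) / (double)workers;
}
```

If a serial run takes 8 seconds and the run on 8 threads takes 2 seconds, `speedup(8.0, 2.0)` is 4 and `parallel_efficiency(8.0, 2.0, 8)` is 0.5, which says the step helped but half of the added resources are not contributing.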

61:45 . So we have talked about some it, I haven't talked about all

61:49 much yet, but the talks a bit, at least some of the

61:54 to write the check to the memory and try to act efficiently. So

62:02 is a little bit, I'll talk about that turns out four months structure

62:08 Soul. The thing we didn't talk was a couple of OpenMP

62:14 you are to share, divide up work among threads and how to also

62:25 of manage memory in the sense of having private variables versus shared variables, race conditions,

62:31 it's also memory utilization those kind of . We talked about talk a little

62:38 when you use GPUs, you worry about transfer between the two and my complaint

62:42 producing and they spread a little bit the demos that compilers have gotten better

62:48 last year, whether you use the versus non managed memory yourself and you

62:55 try and see if it can be . Uh that's sort of uh I

63:03 play around, but uh scalability in of others escaped in a number of

63:09 you use or um number of threads also in terms of the problem size

63:21 for manifest the common things wow. the comments of americans, process

63:32 yes, process not being talked about . Both OpenMP and MPI

63:41 and in particular it matters for those, you know, as shown. If you have

63:51 few processes and pretty much everything is on one or two nodes, then frankly it

63:57 doesn't matter too much. And she to reduce the number of nodes.

64:04 , so in the election, find can do it. I'm not to

64:10 whatever. Since nowadays there's a fair number of cores and there's no also say I

64:16 to use 100 or whatever the number of cores, and you can use it on the

64:21 minimum number of nodes or you can spread it thin. Yes it did.

64:26 will take only a few times to a bunch of notes industrial one or

64:30 courses now but you can experiment and the uh huh spending things out

64:39 So that was, I think I about it in terms of connectivity in

64:44 of the scatter or compact, or with different names for the same things: minimize

64:51 the number of nodes you use for the number of threads that we want to use, or processes

64:57 you want to spread it out and spread it out to the advantage because

65:02 advantage case for the number of Uh huh Sometimes that's a good

65:08 Sometimes the gun culture cluster, sometimes extra communication as those that otherwise you

65:15 the game became then we're done with communication that manage it from the

65:22 Also fixed and also spreading moving selective . But we're combat things again because

65:39 Level one and on both Bridges Center some speed 11111 to our private and

65:51 . So that means how you might sometimes they're sharing, compete for the

65:59 resource and it's better to spread things again that a memory about that he

66:05 up but it also can result in one so cache lines bouncing around and

66:13 false sharing. So that can degrade performance even when the intention was good. So

66:19 some things again, playing around with . And I'm just so again,

66:27 understanding of some of the tools brought , but it works for the application

66:34 a hand stuff. All right. were also comments and seeing them past

66:47 in europe with an application make justice times. Remember the access by them

66:56 have a significant empire. So the applications in china. Yeah.

67:03 I have a reputation. No. huh. Yeah, memory bound provides

67:11 stream or fixed artists that you have your assignment. What kind of situation

67:19 devices, execution that's not the Keep the change. Well mm

67:27 Find an application of it because it not do what? Optimize the application

67:34 that for shoes. Oh, uh . So coming insane when the other

67:50 needs extreme groups. And the point to, so in that case,

67:55 , about the memory system, I a fairly arbitrary actually planning a number

68:01 distribution of performing well compared to something with the race so that we also

68:09 something that could potentially be a Winchester playing on and see the difference between

68:17 does make its operations as far as operation, different potential structural things.

68:27 huh. Again, good performance at sparse matrix. Um, and again

68:37 and other things so forth. Whenever is some good um, hold up

68:46 open source, or even if it's some vendor library installed on systems you

68:53 have access to, I encourage students to use those that are presumably highly engineered to

69:02 get good performance. That's a reference point. Compared to what we're talking about

69:09 when it comes to 50 service. , sign for the rentals or I

69:17 corresponding math library. You are well and it doesn't mean that you can't

69:24 them. But in general they work really well. So it's a

69:30 sense of getting the sense for our cold complexion. What could be expected

69:41 um, it can't all this expect Yes, 100% efficiency. Well,

69:51 kind of ridiculous statements of most But um, in a certain computations

70:00 simply can't do it, bring it because matrix operations and genetics,

70:07 So matrix vector has a balance of adds and multiplies, which is perfect for what dominates today,

70:13 the fused multiply-add architecture. In a single instruction, you do

70:18 a multiply and an add. But if you don't have the same number of multiplies and adds

70:27 , you can't benefit from that. There is no way that you're going

70:32 get that. People look at the line size to there was sort of

70:36 . If this doesn't hold that this the best performance. You can.

70:41 . So then they go that's realism on the application. What's best for

70:48 maximum performance is that. You mentioned FFTs. The standard model

70:56 is complex. The complex FFT doesn't have the same number of

71:01 adds and multiplies because of the complex operations. There are four multiplies

71:06 and six adds. So if peak performance assumes you can do

71:12 a multiply and an add at the same time, there are not enough multiplies to match up with

71:17 the adds. There's no way you're going to get 100% efficiency, and there's nothing wrong

71:22 with it. But then you need to set your expectations lower.
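The argument about an unequal mix of multiplies and adds can be turned into a small calculation. The sketch below assumes a deliberately simple machine model that is not taken from the lecture: peak performance counts one multiply and one add issued together every cycle, and nothing else limits the kernel. The function name `balanced_issue_bound` is hypothetical.

```c
/* Upper bound on the fraction of peak reachable under the ASSUMED model:
 * the machine issues one multiply and one add per cycle, so a kernel needs
 * max(multiplies, adds) cycles and peak would be 2 operations per cycle. */
double balanced_issue_bound(int multiplies, int adds)
{
    int larger = multiplies > adds ? multiplies : adds;
    if (larger == 0)
        return 0.0; /* no floating-point work at all */
    return (double)(multiplies + adds) / (2.0 * (double)larger);
}
```

Under this model a kernel with four multiplies and six adds gives `balanced_issue_bound(4, 6)`, which is 10/12, or roughly 0.83, while a perfectly balanced kernel such as `balanced_issue_bound(4, 4)` gives 1.0. A different machine model, or memory traffic, will give a lower and more realistic target.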

71:29 you know, and maybe you can for instructions are best about debts and

71:35 do that is just that Make up get to the 10 opposite tastes.

71:40 complex mixed response invitation. So so they should start, you know,

71:47 you can get close to that number it comes to be about 70% and

71:52 still doing I don't know something. then um so it's again try to

72:05 the call them how to relate to architecture and what souls checks can

72:12 Uh huh. The gap right. to write, you know, drive

72:20 sports car figure out over the Yeah. So I mean 1 1

72:38 to think about it is rather than jumping into analyzing for other applications.

72:46 . About the members. The way start is the way we started doing

72:51 is can I do things that no a single first step? But then

72:56 my problem is that I don't have compute resources to get good efficiency,

73:03 there it would make sense parallelizing the otherwise. Uh huh. Uh The

73:14 class act. Thanks. Are we the structure of the Yeah.

73:26 Right. Mhm. I'm trying what's the video practice much. Uh

73:33 Well, what's up? They're not much I'm good for? Yeah,

73:40 can. Right. Yeah. So yeah, that's something I

73:50 students to start with trying to figure out, well, uh how much work did

73:57 ? But what is the work of moment? A long time we take

74:02 terms of are really successful. And then it should be on

74:07 the order of seconds at least total that I am in the best possible

74:12 initially. And they were translated to with the measure things any may end

74:18 up taking hours in the end, but as a starting point to make sure you

74:23 don't have too small data. Okay. Thank you so much. Well, and

74:32 recording. Stop recording. Mm All right.
