00:00 | Okay, so today we will talk about one more toolkit, TAU, the Tuning and Analysis Utilities. Before I do that, a few words about assignment one, which was just returned. I think everybody did quite well. The one general comment, where there were deductions or points taken off, is this: try to make sure that when you look at performance, you judge it relative to what the platform can do. Whether a result is good or bad cannot be decided from the times alone; the times by themselves do not tell you whether the platform is being used well. That theme will come back in every assignment: you are supposed to reason about how well the code actually did relative to what it could have been doing. |

01:39 | For that purpose, PAPI is one tool, and TAU, which we cover today, is another. TAU is really a tool set, as its name says, and I will briefly go over some of its capabilities. It is a very rich tool. |
|
|
02:00 | Then Joshua will demo some of the simpler uses that should be sufficient for most of the assignments in the course. So I will talk a little about the capabilities, a little about how the toolkit is actually put together — its architecture — and then about how one can use it to instrument code in many different ways. Then I will talk about how it does the measurements and what types of measurements can be collected, specifically profiling and tracing. Profiling is what you will be using; tracing you might also be able to use, but it is a bit more involved. Finally, I will say a little about trying to make sense of the data collected through instrumentation. |
|
|
03:09 | You have seen this pyramid before. We have focused on the memory system, and I put in the little graph shown in Lecture 3 on the evolution of compute capabilities versus main memory capabilities; there is a divergence between them, so a lot of the focus in getting performance is on how well the memory hierarchy is used. |

03:48 | PAPI, which we talked about last time, focuses on the processor level — registers, caches, and local memory. It has also gained capabilities over the years to cover things such as network interfaces, and there are components we will not talk about in this course that target other specific parts of the system and the memory hierarchy. |
|
|
04:21 | TAU, on the other hand, is a comprehensive tool set, and in fact it becomes comprehensive by basically acting as an interface to many other tools, including PAPI. Here is a little of what makes it comprehensive: you can do all kinds of measurements, for parallel as well as sequential programs. In this course it is currently only used for sequential code, to keep things simple at first, but it is perfectly viable both for shared-memory programs and for clusters. It has a fairly sophisticated system for managing the output and the results of measurements, as well as for analyzing the data, and for many of the analysis parts it actually uses other open-source tools that are out there. |

05:34 | As it says at the bottom, TAU is developed at the University of Oregon, which has a performance research lab, and the TAU website has lots of information, including presentations and reports, as well as downloads you can use if you want to run it elsewhere. |
|
|
05:58 | So, coming back to what I have said a few times: try to understand performance. For this course that means not just the time you ended up with for running a piece of code, but actually understanding the resource utilization. TAU can be used not only to figure out how well the code performs but also to focus on where to improve it. The typical first step is to measure the time; then, in order to understand whether that time is good or not, you need an expectation of what the platform is capable of. After that, one can look at instruction counts and cache misses and the other things PAPI provides, as I discussed last lecture. One can also look at I/O — that usually means disk. We do not worry much about it in this course, but it certainly can be measured, and it is essential for many applications, even though in this class we focus on the non-I/O part of the execution. |
|
|
07:26 | So these are the standard questions one should ask oneself when it comes to understanding code performance, and it typically comes in three steps. First there is instrumentation: making sure that the kind of data you are interested in, in order to judge the code, actually gets collected when the code executes. There are many different ways of instrumenting code that TAU can help with, and I will talk about them a little: source code instrumentation, macros and wrappers for library routines, and even — a more sophisticated approach, I would say — dynamic instrumentation of the binaries. We will not do that in this course; we will stay at the source code level. |

08:21 | Then there is the actual measurement: basically running the code for the cases that are of interest. For that you can do profiling and tracing, which I will talk about. Profiling gives summary information which, depending on how you set it up, tells you where to focus. Tracing also has sequence information, in terms of call sequences — what happens when. There are different ways of doing this: direct instrumentation, which places probes around particular code segments, or indirect instrumentation, which does not insert specific probes into the code but uses counters of various flavors to infer where things of interest may be. The data can be collected in many different ways. |
|
|
09:25 | You can instrument down to basic blocks in the code, and that can produce a lot of data — as I also mentioned when we talked about PAPI. If one is not careful, one ends up with lots of data and then has to try to make sense of it. So it definitely benefits from having some form of analysis tool. Visualization, I would say, is very useful, because with a good visualizer one can very quickly spot where there may be anomalies or outliers, which is much harder to do from a table of numbers. That is why the assignment also asks you to graph things, in addition to providing a table with the more exact numbers. And of course, for large data sets one may need more sophisticated tools such as data mining or statistical analysis. |

10:36 | I think it was also mentioned in a previous lecture, this notion of exclusive and inclusive measurements. To explain what the two terms mean: exclusive measurements focus on particular parts, whereas inclusive measurements cover a whole region without singling out particular parts. I will show an example later that illustrates how that can make a difference in what information you get. |
|
|
11:02 | This slide just illustrates the three phases of doing performance debugging or assessment of your code: the instrumentation, the measurement, and the analysis. Under each column, as you see on the slide, it lists the options I already mentioned: source code instrumentation, using libraries, or linking in wrapper libraries for static or dynamic instrumentation of your code. You can also work with executables and rewrite the binary at run time; we will talk a bit more about these. |

12:01 | When it comes to the measurement, it can be event based; you can do it for the whole program or on a per-routine basis. For parallel codes it can be done on the communication calls, and you can deal with heterogeneous architectures where you have accelerators such as GPUs, FPGAs, or other attached devices. I will talk about the profiling options in more detail — the flat and call-path profiles whose names you see on the slide — as well as about tracing. A number of the names you see associated with traces are other software packages that TAU makes use of for some of the analysis. |
|
|
12:57 | As I said, TAU is a comprehensive tool set, and this diagram tries to put everything together. On the left-hand side are the basic things we talked about today: the top left is mostly the instrumentation and measurement, and the bottom is some of the simpler analysis. The part in the center — a little of which, I would say, really should have been in the left-hand part, but I kept the way the people who drew the diagram arranged it — has to do with the instrumentation: the PDT, which stands for Program Database Toolkit. Under it, in the middle section, are the analysis tools, including the database layer, the performance data management framework (PerfDMF), and the profile viewer, ParaProf, which supports the profile analysis. The rest we will not talk about, neither today nor later; it is aimed at expert users, and we are just giving an introduction to the tool. |

14:20 | Here it is pulled out a little so you can see what it offers in terms of instrumentation: source code instrumentation and the library wrappers, which I will discuss on the next several slides. Then there is the measurement part, which allows you to define events — like in PAPI, you can use preset events or define your own, at various resolutions — and then the profiling and tracing that I will talk about. At the bottom of the slide it shows the data sources for profiling and tracing: hardware counters, system counters, and various internal timers. This last part just lists some of the analysis tools. |
|
|
15:16 | So, as I said, I will talk about the code instrumentation, starting with the different ways of doing it. The first way to do source code instrumentation is to use the Program Database Toolkit — more about that on the next slide. Your assignment will use compiler-generated instrumentation, but you can also do it manually. I will also illustrate some of the other ways of instrumenting codes, but we will not necessarily be using them in the course. |
|
|
16:06 | Here is instrumentation using the Program Database Toolkit. The principle is illustrated on this slide: it takes the source code of the application and does an analysis of the code; then, based on the instrumentation you specify, the TAU instrumentor eventually produces an instrumented source code that you can recompile. The result has the desired instrumentation built in, and the object code is produced by the compiler. The process is shown here: after the source is parsed, the front end — as typical compilers do — generates an intermediate representation of the program, which is then analyzed; the instrumentor is the piece that processes the output of that analysis together with the instrumentation you asked for, and emits an instrumented source code that is then compiled and executed. |
|
|
17:24 | The other approach is compiler instrumentation. What you do is basically put a prefix in front of the particular compiler you want to use — "tau_" and then the compiler — which means the TAU C or Fortran compiler wrapper, whichever you use. It then automatically instruments the code according to a predefined scheme that the TAU folks decided is a useful instrumentation for most codes. It is not quite as flexible and rich, but on the other hand it does not require you to go in and specify the specific things you want measured. Joshua will demo later how to use the compiler wrappers to instrument code. |

18:29 | This slide just shows that you have a number of options for controlling how much information you want in the output. Again, what I am doing is giving you a high-level overview of what the options are and alerting you to the things TAU can do; we will use only a very few options in the assignment. If you end up later in life using parallel computers in particular, you may find it useful to go back and look at the TAU and PAPI documentation in more detail. |
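
As a rough sketch of the compiler-instrumentation route just described (the wrapper names tau_cc.sh / tau_f90.sh appear later in the demo and on the slides; the TAU_OPTIONS flags shown here are common TAU conventions but should be checked against the installation you are using):

```bash
# Prefix the usual compiler with the TAU wrapper; the wrapper then
# instruments the code automatically according to TAU's predefined rules.
module load tau                        # assumes an environment-modules setup

# Optional: ask for compiler-based instrumentation and verbose build output.
export TAU_OPTIONS='-optCompInst -optVerbose'

tau_cc.sh  -O2 mycode.c   -o mycode    # C code
tau_f90.sh -O2 mycode.f90 -o mycode    # Fortran code
```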
|
|
19:11 | Wrapper instrumentation can be done in a few different ways: you can preprocess, you can wrap library routines that already exist, or you can link specific wrapper routines in place of the library routines. The preprocessing approach, as it says here, is fairly simple: it uses a preprocessor that works on the source and then does the insertions, and of course it is limited in what it can do, depending on what the preprocessor is capable of. Wrapper libraries are perhaps particularly useful when there are calls or routines that you link in for which you do not have the source — you are restricted, more or less, to a fixed binary — and you still want to collect information; some of that information can be collected using wrappers. Writing your own wrapper routines is a little harder, and we are not going to give you an exercise where you have to write your own. |

20:42 | This slide is just about how the wrapping works: you have the source, you have the parsed program, then the instrumentation, and from that the wrappers are generated around the routines to produce an instrumented code. I think Joshua will show preloading as one of the instrumentation options. The next slide concerns whether you are using shared memory, MPI, or other programming models; it will not be needed in the first assignment, but you may well want to use TAU in future assignments, and it just illustrates the various wrapping and preloading options. |
|
|
21:40 | Then there are options for binary instrumentation, which again we will not use, but I just want to highlight them: this does not touch the source code at all; it works on your binary, during run time — that is what is called DynInst, for dynamic instrumentation at run time. There are three different options listed, but we have not used them in this course, even though in other code-development efforts that kind of instrumentation has been used in various projects. So I will pause here for a minute and ask if there are any questions. |
|
|
22:28 | As I said, this is a very high-level overview, and these slides are meant to alert you to the capabilities more than to go into detail; Joshua will talk in more detail about the specific usage needed for the assignment. So these are the two conceptually different ways of instrumenting code to generate measurements. One is to insert probes — you can do that through the PDT toolkit or through the compilers — which place probes in the code that delimit the particular code segments you want measured. Or you can do it indirectly, the typical example being the hardware performance counters, which are not directly tied to specific code segments. |

23:33 | This is just an illustration of indirect performance measurement. That is what PAPI does, for instance: it uses the performance counters and counts various instructions and cache behaviors. You can then restrict that information to code blocks and threads, but by itself it does not say anything about particular segments of code, except as you define them, for example per thread or per block. |
|
|
24:11 | So, yes: you can define events, much like in PAPI, and again there are exclusive and inclusive measurements. Inclusive is like doing a start-timer call, running a routine, and then doing an end-timer call: it includes everything in between, without measuring specific statements separately. These quantities tend to be monotonically increasing, like a timer, or a counter of how many times a particular instruction is executed. |

24:59 | This is also a good point to say why you want tools like PAPI or TAU at all: if you try to get detailed performance or runtime information about a non-trivial piece of code by inserting timer calls by hand, making some runs, and then inserting more timer calls to narrow things down, it becomes quite messy and time consuming, with many different runs. By using tools like TAU or PAPI you can collect a lot of this information in single runs, without the headache of manually doing all the insertions and captures. Finally, the concept of interval events covers what happens between start and end probes, whereas atomic events may be triggered by particular statements or actions in the program, and you can collect them in different kinds of context — at the routine, loop, or statement level. |
|
|
26:20 | So, a little bit about using TAU — and I think the time has come to talk about profiling and tracing. The typical way of doing performance debugging or optimization, and what I would recommend, is to first do profiling — which I will talk about next — to find out where the most time-consuming parts of the code are. That could be done at the block level or the statement level, but typically it is best to start at the routine level, figure out which routines are the most time consuming, and then narrow things down. That is more or less what this slide says: once you have collected data, you then have to make sense of it. |
|
|
27:22 | . So here is now trying to this like this for the difference between |
|
|
27:27 | and tracing, if that's not So basically what the profiling does, |
|
|
27:35 | collects aggregate or summary information for the to decide Aziz of interest. It |
|
|
27:46 | no, um, sequence information in , so there is no particularly dependence |
|
|
27:56 | . Where is in the tracing? do have, um, the time |
|
|
28:03 | . So then you see when various happens in the execution of the |
|
|
28:11 | so I'll come back and talk more the profiling and the tracing in the |
|
|
28:17 | several slides. But this is just introduce the two concepts, and one |
|
|
28:23 | aggregated information on car attributes as to them and tracing and has the time |
|
|
28:31 | between events. So this is one that is important to really keep in |
|
|
28:43 | why you may not always want to with doing trace collection because it can |
|
|
28:50 | be overwhelming an amount of data that being generated. And that's why some |
|
|
28:57 | That is why profiling at various levels may be a good starting point. You can begin with some limited profiling: you can look at some named events in the code, or you can do what is known as a flat profile. The name by itself may not be very intuitive, but it is again summary information for whatever code segments you decide are of interest. Then you can go into more detail at the loop level, or you can do things based on the call graph and figure out how the various routines are being called and how much time is spent in those routines, following the call paths. |
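
If you want that call-path view rather than only a flat profile, TAU can be asked to record call paths at run time. A minimal sketch, assuming the standard TAU runtime variables TAU_CALLPATH and TAU_CALLPATH_DEPTH and the tau_exec launcher shown later in the demo; check the documentation of the installed TAU version:

```bash
# Record call-path profiles instead of only a flat profile.
export TAU_CALLPATH=1         # attribute time/counts along parent => child chains
export TAU_CALLPATH_DEPTH=10  # how many levels of the call chain to keep

tau_exec -T serial ./mycode   # run the instrumented binary as usual
pprof                         # call paths now show up as "a => b => c" entries
```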
|
|
29:57 | The profiling part, as I mentioned, gives this aggregate information, and you decide which aspect of the code to use as the basis for the profiling — loops, basic blocks, function calls, threads, whatever you want — and then what quantity is associated with it: time, or various forms of counts. Maybe it is bytes transferred: if the code is likely to be limited by memory accesses, you may want to focus on the data motion, that is, byte counts. If you think it is compute bound, you may want to look at instruction counters for integer or floating-point operations. Or you may start at the routine level and simply look at how many times each function or routine is executed. |

31:11 | Unlike what we have had so far — matrix multiply, where the execution is data independent and you can figure out the function call counts very easily from the code structure — in most practical codes the path through the code is data dependent. In that case, how many times functions are called depends on the data set you run on, and you may need to run with many different data sets to figure out how things behave. To summarize: the flat profile is basic summary information for the parts of the code you instrumented — routines, threads, or whatever it is — while the call-path profile follows the tree of calls. I will show some examples next. |
|
|
32:19 | I think this one is also fun: I borrowed this slide from a course given quite a few years ago, and I guess for the scientific computing folks there is still a lot of Fortran code out there. As you can see on this slide, what is used to instrument the code is the TAU Fortran compiler wrapper — in this case tau_f90 — which instruments the code using the underlying Fortran compiler; then the code is simply run. Remember from the Slurm lecture how to specify the number of cores, processors, and nodes to use — that goes in the job (srun/sbatch) statement. But again, Joshua will give a concrete example later. |
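
For reference, the slide's Fortran example boils down to something like the following; this is only a sketch — matmul.f90 is a stand-in file name and the exact srun/sbatch flags depend on the cluster:

```bash
# Instrument and compile with the TAU Fortran wrapper instead of the plain compiler.
tau_f90.sh -O2 matmul.f90 -o matmul

# Request resources in the Slurm job, e.g. one task on one node (adjust to your site).
srun -N 1 -n 1 ./matmul
```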
|
|
33:17 | Here is what the flat profile may look like for a particular code. In this case it was decided that time was the property of interest, and the profile is done on a per-routine basis: there is some routine at the top, and the profile tells us how many seconds were spent in it, then the next routine going down the list, and so on through a bunch of function calls, with the time spent in each. It does not tell you how many times each routine was invoked — whether it was called once or many times — so it does not give you the distribution of time over the different calls to the same routine depending on when it was called during the execution; it is just the aggregate. So if I try to optimize this code, the natural thing is to start with the routine that took the most time — but it does not follow that, because it took the most time, it is an inefficient piece of code, so one needs additional insight to judge that. |
|
|
34:56 | And this goes down to the loop level — the same thing, using the Fortran compiler wrapper — and this is what the results may look like. There is one loop, the one multiplying the matrices, that is by far the dominant one time-wise, and then there is a bunch of other loops. Clearly, as one goes down this profile, at some point you probably do not care whether a loop is very efficient or not, because even if it is inefficient it takes so little time that optimizing it is not going to change the run time much. |
|
|
35:43 | You can also use multiple counters — Joshua will demo this, so I will skip the details — but this slide shows the use of PAPI performance counters to collect information about instructions and, in this case, cache misses. This one is an instruction-count view, so you can see what is happening in the different loops — here, the ones that operate on the data, given the way it is read and written — alongside the time each took. So it is a composite picture that you get with multiple counters. The next slides show some call paths, to try to make sense of that. |
|
|
36:38 | This one you can look at later; you will see that the same routine appears more than once, depending on where it sits in the call path. But I will show simpler examples in the next few slides. Here is a very simple example to illustrate what a call path looks like: it shows the chain of routines being called and how much time is spent in each of them, which is fairly easy to comprehend. And this is a more serious, much larger call-path tree; when you view it with TAU it does not just show all the potential branches, it also lets you follow and see the time, as in the previous graph, spent in the various parts of the tree. |
|
|
37:52 | The next slides just try to illustrate the difference in the information you get between an exclusive and an inclusive profile. You have this profile; here is the exclusive version and here is the inclusive version, and they obviously look very different, even if that may be hard to see from the slides alone. On the next slide I pulled out the parts that are the more interesting, to show where there can be significant differences between exclusive and inclusive measurements. For one of the routines here — the light blue one — the numbers are actually the same, which shows there is essentially no difference between the inclusive and exclusive measurements: nothing else interesting is going on inside it. On the other hand, if you take the dark, almost black one, you can see that the exclusive time is very small while the inclusive time is very large, relatively speaking. In that case the exclusive measurement by itself does not tell you much about what is happening in that routine — the time is in what it calls. So these are things to be aware of: the inclusive measurement is not always all that helpful on its own, but it can be a first step, and it is more moderate in terms of data volume. |
|
|
39:43 | Student: Could you explain inclusive and exclusive with respect to a routine? As you did here, each routine is listed in the profile, right? So for one routine, how do you differentiate between exclusive and inclusive? |

40:10 | Instructor: Look at this graph, the little code insert on the left side. The inclusive measurement covers everything between the start and the finish of this whole loop. The exclusive measurement only covers the statement A = A + 1; everything else that goes on in the loop, including the call to the function, belongs to the inclusive part. |

40:50 | Student: So the individual statements would be exclusive — if there are, say, a hundred of them and they are instrumented automatically, then all of those statements count as exclusive. Is that right? Instructor: Well, if you ask for an exclusive measurement for those statements, yes — then that is what gets measured, and everything else is not included in that timing. |
|
|
41:28 | Student: Okay. So we can choose which ones are reported as exclusive and which as inclusive? |

41:39 | Joshua: TAU usually reports both inclusive and exclusive values for all the routines in your code; you do not have to select — it reports both. You can see it on this slide: there is an exclusive and an inclusive time for every row, and each entry in the name column is one subroutine in the code. Take, for example, the first one, .TAU application — that is essentially the top-level entry for the main program. That particular section by itself takes only two point something seconds, but its inclusive value also includes all the other function calls you can see below it, so including everything inside it, it accounts for about 54 seconds — the summation of all the work going on. Everything inside it adds up to 54, but that entry itself is only active for about two seconds. Does that make sense? |

43:02 | Student: Yes — but then they do not quite add up; the next one should be a subroutine of .TAU application, and it is 33-something. And what is the unit here — microseconds? Joshua: No, these are seconds. And you cannot add up all the inclusive values; if you add up all the exclusive ones, you end up with the 54.9 shown at the top. Student: Okay — so the exclusive times summed over all routines give the total. Okay, thank you. |
|
|
43:48 | Student: I had a question as well. If you have a routine that is heavy on recursion, are discrepancies introduced due to the time it takes to allocate frames on the call stack — do things of that sort have overhead that is not captured by TAU? |

44:05 | Joshua: I am not too sure about recursion. I believe TAU reports each recursive call as part of the one subroutine itself, but I am not certain about that; I will have to check. As far as recursion goes, I am not entirely sure. |
|
|
44:37 | Instructor: So, coming back to this slide and what Joshua talked about: the .TAU application row has 54.9 seconds inclusive, but in itself — the call overhead at that level — it is about two seconds. The next thing that is called, the actual main routine, took the rest. So if you add the exclusive time for .TAU application to the inclusive time for what it calls, you get to almost the 54.9: the exclusive time at the top plus the inclusive time of the first thing called comes to more or less the total for the whole run. |

45:42 | This column shows the number of calls; in this case the top entry is called only once — the main is executed once — but as you go down the calling sequence, some routines are called a very large number of times. You get that information as well, and you can see that towards the middle of the table the count pops up a bit and then goes down again in the calling sequence. And this routine appears twice: it is called from two different lines and a different number of times at the two sites — the first call site about 180,000 times and the second about 90,000 times. So you get a fair amount of information about how things are being used. That is what I meant earlier: the flat profile I showed before gives the aggregate time but not how many times a routine was called. Any more questions related to this? Otherwise we will talk a little about the tracing part. |
|
|
47:17 | So, tracing. Traces add the sequence of calls in time: you get a timestamp for each invocation of each call and the duration of its running time — not just the aggregate. Here is an illustration: with parallel programs one needs essentially synchronized clocks. The example shows two processes, A and B, that exchange information — A sends something to process B, and B does the matching receive. We will talk about how this works when we cover MPI and message-passing programming, but this is just to show that things need to be synchronized to a global timeline, so that you can map all the events of the different processes onto a common timeline, see when they happened, and reason about cause and effect. I do not think we are showing a demo of that today — maybe at a future time — and the result can be fairly massive to look at; TAU relies on visualization tools to convert and display the traces. |
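
When you do want this time-ordered view, TAU is normally switched into tracing mode through its runtime environment; a hedged sketch (TAU_TRACE is the usual switch, while the names of the merge/convert utilities for timeline viewers vary with how TAU was built):

```bash
export TAU_TRACE=1            # collect event traces instead of only profiles
tau_exec -T serial ./mycode   # run as usual; per-thread trace files are written

# The raw traces are then merged and converted for a timeline viewer;
# the exact utilities (e.g. tau_treemerge.pl, tau2slog2 / tau2otf) depend on the installation.
```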
|
|
48:56 | And here is what it may look like for a much more complicated sequence. In the middle you see a very short time window that covers about eight milliseconds, starting around 15.592 seconds into the run; the color-coded bars are the various routines being called, and the lines illustrate some of the messages in this trace between the different processes — four processes in this case. So it gets quite complicated. As it says, tracing has the temporal and the spatial aspect, which profiling does not. The problem is that tracing produces very large data sets, so it is usually not the first thing you do. But it may be something you need in order to find out, in particular for parallel codes, why there is a lot of idle time in various processes — a process sitting waiting for information from somebody else. Those things can be very hard to figure out from profiling alone; you need the time sequence for the different processes to be able to infer the causes of inefficiencies. |

50:41 | I think this would be a good point for Joshua to start talking about how to actually use TAU for the assignment, and then, if there is time left, I will show some more slides about the analysis part. But let me first — yes? |
|
|
51:12 | Student: I have a question. You talked about idle time — can we see the idle time of the code in profiling, or in tracing? |

51:32 | Instructor: Good question. I do not know if you get the idle time explicitly. For single-thread performance, the only thing I think you can use is PAPI: you can look at stall counts, which show stall time for the individual cores. When it comes to OpenMP and other forms of threading, where idle time is due to dependences between threads, whether you can get that without tracing — maybe Joshua knows? |

52:29 | Joshua: I am not sure about OpenMP either, but for plain serial programs, as you said, there are events that correspond to the number of cycles the program was stalled, whether waiting for some resource or just for a memory access. Those are available. |

52:53 | Instructor: And because there is no particular routine for idle time, if you profile by statement, by block, or by thread, it tells you the time per thread, for instance, but not the breakdown of where the time in that thread went; there is no event that represents idle time — it is a consequence. So I think you would need the trace to figure out when it happened. Student: So if a process is waiting for data to be transferred to it, we cannot attribute that waiting time from a profile? Instructor: Right — you definitely need the trace to figure out which process is waiting for what, that is, for some other process to communicate something. I wish there was a simpler way, but I do not see one. Good question, though. |
|
|
54:22 | Joshua: Okay — is my screen visible now? Instructor: I can see your screen. |

54:41 | Joshua: So this will be a short demo of how you will be using TAU for the upcoming assignment — the one that is already out. First, some context on why we will be using TAU at all. If you remember from the previous demo, when we used PAPI we had to insert PAPI calls inside our source code. Say you have a huge code — many thousands of lines, with complex structure — then inserting those calls becomes cumbersome. It is not impossible, but it gets really tedious to collect all the performance metrics for your code that way. This is where TAU comes in very handy: it does all of the insertion of probes into your source code, as we saw in the slides, so you do not have to make any changes to your code — TAU does that for you. |
|
|
55:48 | So that is the reason we will be using TAU as an interface to PAPI — it is not a substitute for PAPI, it is an interface that lets you use PAPI in a much simpler way. Here on the left you can see the code you have for this assignment. It has two multiplication functions: the classic multiplication and the loop-interchange multiplication; we will use just the classic one for this demo. As you can see, we have not added any PAPI calls to it — it does not include papi.h or any of the high-level PAPI calls; the functions and the source code are completely untouched. |
|
|
56:31 | Now, when you want to start using TAU, the first step — and I have it open here as well — is to load the module for TAU, which you do with module load and the module name. Remember, we are on a compute node; make sure you do not do any of this on a login node. Then check whether the module was loaded correctly — here you can see the TAU module is loaded. |

57:10 | When TAU is installed on one of these systems — or if you go ahead and install it on your own machine — you configure it against the packages you have, and at configure time TAU generates makefiles that it will use when compiling your code; they carry all the parameters needed to build your instrumented code. Those makefiles live in a directory pointed to by an environment variable that the module sets. If you list that directory, you will see there are two makefiles in our case. Since we are using just serial code for this assignment, we will use the makefile without the MPI tag in its name; the other one, which has MPI in its name, would require you to have MPI calls in your source code, which we do not have right now. |

58:20 | To use it, you need to set an environment variable called TAU_MAKEFILE and assign it the path to that particular makefile, which you do by exporting the variable with the full path of the file. When you have done that, go ahead and make sure it was set correctly by printing it — as you can see, it took the value. These first two steps — loading the module and setting the makefile variable — you only have to do once per SSH session; after that you do not have to worry about them. |
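
Put together, the one-time setup from the demo looks roughly like this. The module name, the variable holding the TAU install path, and the exact makefile name are whatever your system's installation provides, so treat those parts as placeholders:

```bash
# On a compute node (not a login node):
module load tau              # load the TAU module
module list                  # confirm it is loaded

ls $TAU/Makefile.*           # makefiles generated when TAU was configured
                             # (one without MPI for serial codes, one with MPI)

# Point TAU at the non-MPI makefile for this assignment:
export TAU_MAKEFILE=$TAU/Makefile.tau-papi-pdt   # placeholder name; use the one listed above
echo $TAU_MAKEFILE                               # verify it was set
```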
|
|
59:30 | Now, the next step is a couple of compiler options you can set — we will get to the metrics in a moment, but first the compiler options. If you want the wrapper to give more verbose output while it is compiling the code, you can set that option; we will not do it right now, since it makes the output a little messy, but you can — it does not make a whole lot of difference. |

60:00 | As I said, TAU is an interface to the PAPI library, so how do you tell TAU which performance metrics you want to measure? You do that through an environment variable called TAU_METRICS, which is unset so far, by setting it to whichever metric you want. Say you want to measure the number of single-precision operations in your code: you set the value of that variable to the corresponding PAPI preset event, PAPI_SP_OPS, which stands for single-precision operations. We set it, check that it went in correctly — and now the TAU_METRICS environment variable points to that event. |
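
The metric-selection step from the demo, spelled out (PAPI_SP_OPS is the PAPI preset for single-precision floating-point operations; other preset names can be listed with papi_avail):

```bash
# Tell TAU -- and, through it, PAPI -- what to measure:
export TAU_METRICS=PAPI_SP_OPS   # count single-precision floating-point operations
echo $TAU_METRICS                # verify the setting
```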
|
|
61:08 | Now, here — I already did some testing, so ignore those numbers for now — but as you can see, we do have a matmul.c source file, exactly the same as what is shown on the left. Normally, when you compile your code, you just run gcc matmul.c and give the output a name. When you want to compile your code with TAU so that it gets instrumented, you simply replace gcc — or icc, if you were using the Intel compiler — with the TAU wrapper for C compilers, which is tau_cc.sh. So you replace the compiler name with that. When you compile with this wrapper, TAU does the instrumentation by itself and ultimately generates an executable built from the instrumented source: it has all the probes and all the necessary calls already inside it, everything needed to take the performance measurements. |
62:27 | that performance measurements. So once you compiled your source scored, how do |
|
|
62:34 | run it? So again you can the command from Tao that's called a |
|
|
62:41 | exact and then provide a tag and that it's a serial program because that's |
|
|
62:49 | . Now. We're dealing with just single tragic programs on. Then simply |
|
|
62:53 | give you, uh, gave it lane. When you run that |
|
|
63:00 | execution is gonna finish normally and already this profile in there. But this |
|
|
63:07 | will be generated whenever you run your code. Now, when you want |
|
|
63:13 | read this, um, thats generated , you will use the command line |
|
|
63:20 | provided by Dow, which you can by using the command, Petrov. |
|
|
63:27 | you can run paper off in the that contains this profile file. So |
|
|
63:36 | you do that, you can see profile that was generated for single precision |
|
|
63:43 | because we said down metrics as single operations to be counted. Now, |
|
|
63:48 | you can see the main Dow uh, the main function, the |
|
|
63:56 | Manimal that was called in our function our source code. So this |
|
|
64:00 | and also the initialization of the So all four functions you can see |
|
|
64:07 | the left You can also see exclusive inclusive counts for each of these. |
|
|
64:13 | recall that for matrix multiplication, the off operations are toe to times on |
|
|
64:22 | . So and we had 1000, the total number of operations will be |
|
|
64:28 | billion operations. Ah, so as can see, if you go through |
|
|
64:35 | inclusive gowns So this classic magma actually those two billion operations. But because |
|
|
64:44 | classic Madam awas called from the main and main function was ultimately part off |
|
|
64:52 | , uh, our application, these billion operations were counted inclusively for all |
|
|
65:00 | three functions. Now Here comes the between exclusive and inclusive counts. When |
|
|
65:07 | see exclusive counts, you will see these two billion operations were actually just |
|
|
65:13 | inside the classic Matt malfunction That actually the did all the operations. So |
|
|
65:20 | that's the importance off checking exclusive counts compared to inclusive counts for any of |
|
|
65:28 | Now, in the previous example we put just one metric in the TAU_METRICS environment variable. Say you want to measure more than one metric: you can define a colon-separated list of events and set that in TAU_METRICS. The best part is that you only have to change this environment variable — you do not need to recompile your code. Once you have changed it, simply run your code again, and it will create profiles inside separate directories corresponding to the two events. |

66:45 | One thing you should always remember in this case: if you are specifying two or more events in one run, make sure you first run the command papi_event_chooser and check the compatibility of the events you want to measure at the same time. If you do not — say you end up using PAPI_L1_DCM, the level-one data cache misses, together with the single-precision operations — those two events are not compatible with each other (you can check that with papi_event_chooser), and if you use incompatible events TAU will not report any values for any of them, and you will be left wondering what is going on. So when you use multiple events, make sure they are compatible with each other. |
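
For several metrics at once, the demo's recipe is a colon-separated TAU_METRICS list plus a compatibility check with PAPI's event chooser. The event names below are the ones mentioned in class; whether a given pair can be counted together depends on the hardware counters of the machine you run on:

```bash
# Check first which presets can be counted together on this CPU.
# (In the class example, PAPI_SP_OPS and PAPI_L1_DCM turned out NOT to be compatible.)
papi_event_chooser PRESET PAPI_SP_OPS PAPI_L1_DCM

# Then list a compatible set, colon-separated -- no recompilation needed.
# PAPI_SP_OPS and PAPI_TOT_INS are only an illustration; pick events the chooser accepts.
export TAU_METRICS=PAPI_SP_OPS:PAPI_TOT_INS
tau_exec -T serial ./matmul   # one profile directory is created per metric
pprof                         # run it inside each metric's directory to read that profile
```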
|
|
67:54 | That is pretty much it — mainly what you will be using for the assignments, at least for the second, upcoming one. Any questions on this? The codes you have on Blackboard already include these steps, and I have added a bunch of comments as well, so you can go ahead and read through them; they follow pretty much the same steps I just showed you. If there are no questions, I will stop here. Instructor: Okay — ten minutes left. |
|
|
68:50 | Instructor: Can you see my screen where I want it? All right. So these are just the corresponding slides in the deck for Joshua's demo. This one shows a little of how you can use some of the other tools we are not planning on using, for example the options if you want to use the binary rewriter and the other launchers, and there is also a short list of the environment variables. Again, the best thing, if you want to do something not covered here, is to go to the TAU website, which also tells you what the defaults are. Then just a couple of slides showing what the program analysis allows you to do, and in terms of the ParaProf profiler I will very quickly show you examples of what kind of graphs can be produced to illustrate and present the data that can be collected. |
|
|
70:10 | So here is one you have already seen a bit of: a per-thread view, where things are broken down with respect to the time the various routines take, and the color coding is by routine; in this case it is a parallel code, so for each thread you see each routine's proportion of the time in that thread. There are options for how things are presented; here is a slightly different way, where you can show the threads relative to each other more clearly — for each routine, on each thread, you get the profile, so you can see the relative difference in how much a particular routine is used in the different threads, and so on. And here is yet another view, listing the inclusive and exclusive times and the number of calls; this is an OpenMP code, a parallel code. We have kept it simple for the first exercise, just a single thread, but we will get to OpenMP examples, and at that point it may be useful to come back and look at this particular view. |

72:11 | This is an example of the call graph, where you can pick a particular part of the graph and see how the time is spent in that subtree. Then there are all kinds of fancy ways of doing three-dimensional bar graphs — in this case over different routines and different threads — and you can choose how to represent things: here is one view, there is another using a triangle representation, and there are also scatter plots for trying to correlate routines. We may try to do a ParaProf session at some point, probably when we talk about OpenMP rather than now; this is just to show you the number of different plots available. This slide is an example of some of the software TAU uses for the graphical representation of profiles and traces, and this one is trace analysis — as I said, it gets somewhat complicated, but you can follow over time how the different routines are used by the different processes and how their behavior changes over time; again, routines are color coded. So I think that is pretty much what I had. |
|
|
73:51 | There is a highlight of the routines and tools that are shown but not covered in the course; if you go back and look at the full comprehensive picture of what TAU encompasses, and this central software stack, you will find these tools and the different ways of illustrating the data. As I said, this is just to highlight what you can do, more than to teach you exactly how to use each one, but I encourage you to explore if you do more complex things beyond what we do in this course. |

74:32 | This is just a reminder of some of the caveats in doing performance measurements: things are sampled, which among other things means you do not get full detail, and it also means you may not get exactly the same data every time you do a new run, because there are statistical effects in there. |
|
|
75:03 | And there are other things I will talk more about — I plan to talk about compilers, for instance. The statistical nature of your measurements was part of the awareness I wanted to create in assignment one. Compiler optimization can also change instruction counts, because things may be optimized away, so you may not get the same instruction counts depending on the optimization level used by the compiler. It is also the case that the same compiler optimization level does not give the same results for the same code on different platforms: if you run it on an Intel processor or on an AMD processor, you may not get exactly the same data, even for the same data set at the same optimization level. That is the awareness this slide is trying to raise. The next slide is just the list of references used for the lecture slides. So with that, that was the slides for today. |
|
|
76:42 | So — questions, either about Joshua's demo, how to use TAU, or anything in general? We will try to answer as best we know. We have not used TAU much beyond the class, although we have used it in some other collaborative projects — and, coming back to the idle-time question, there we used the tracing a lot to try to find dependences and where time was lost. |

77:16 | Student: When the code runs on hyperthreads — the operating system sees the logical cores as separate cores whenever we look at /proc or cpuinfo or any of those. It was not really a problem in assignment one, because we were using one node and one core. But if we were using two or three, and we were computing a theoretical max, how would we know that it used three physical cores rather than hyperthreads on fewer physical cores? |
|
|
78:02 | Instructor: Right, this is a good question. There is one way to know exactly what you get, and I will talk about it in a future class: lock the threads to cores, to prevent the operating system from moving things around. If you do not pin threads to cores, a thread may not stay on the same core through the entire execution — the operating system may decide to move it. That is also why I do not know whether, with these tools, you can get a time trace of which core a thread has been on for the duration of the execution. So there is, unfortunately, that problem: unless you lock the threads, you cannot really know where they were allocated. Intel has, at least for some of their processors, a way to specify how you want threads allocated to cores and hyperthreads, and I will be covering that when I talk about OpenMP. But it does not quite answer your question — how do you know where it ran? Unless you lock it, you do not. |

80:19 | Student: Okay. So pinning is the only mechanism we have to guarantee which hardware is being used — and it is available to us? Instructor: Yes, it is available to you. Student: Okay. So in an environment where the servers have 24 cores per socket and 48 per node, if we were to ask for a lot more than that, would the extra ones just be waiting, or does the operating system raise an error — do we get some sort of error message, or what happens? |

81:06 | Instructor: What I remember off the top of my head is that it is not defined whether things simply get time-shared when you do not have enough physical resources, or whether you get an error. If you basically ask for more threads than the hardware can support, then what happens is, I believe, undefined in the standard. But I will look it up when I get to that topic, to give you a better, more precise answer — that is what I remember. Student: Okay, thank you, Dr Johnson. |
|
|
81:57 | Instructor: The default that most of the allocation follows, I believe, is a kind of round-robin between sockets. On a two-socket node it places the first thread on socket zero, the next on socket one, then back to socket zero, and when it comes back to the same socket it takes the next physical core, and so on. Once it has filled up all the physical cores, then — if hyperthreading is enabled — it starts doing the same thing at the hyperthread level. |

83:05 | Student: Okay, so it uses the physical cores first, and then it starts on the hyperthreads? Instructor: Right, that is the default mechanism I think most systems use — essentially what the option called spread does. There is also an option that goes under different names, call it compact, in which case it keeps the threads as long as possible on the same core and the same socket before moving on to the next socket. That means, for instance, that if you have relatively small data sets and a lot of data sharing between the threads, it may be advantageous to keep them on the same socket, if the data fits, because then they do not need to go to the other socket's memory. On the other hand, if you are limited by memory bandwidth, you may want to spread things out across the sockets, so you get as much memory bandwidth as you can. You can control that in OpenMP, as far as I remember, using these attributes for how you want the threads allocated. |
|
|
84:43 | Student: I was just wondering because we ran something with four threads — I would not know whether it was hyperthreaded or not, and whether the peak we worked out on paper was correct. But if, as you said, it exhausts the physical cores before it starts hyperthreading, then we should be good to go. |

85:04 | Instructor: Yes — a very good question, and very thoughtful. In terms of the assignment itself, as long as you explain how you figured the peak, that is all fine. If you say: with two threads I will use the maximum for two cores and assume that is what happened, and with three threads, three cores, and assume that is what happened — as long as you state that that was the basis for your judgment — or if you instead compute the total for the entire processor, socket, or chip — as long as it is clear what the basis was, it is an acceptable answer for this course. In reality, if you really want to be precise and do it correctly, you need to understand how many cores were actually used. Student: Thank you, Dr Johnson. |
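
For later, threaded assignments, one common way to pin threads so you actually know which physical cores were used is through the standard OpenMP affinity variables; this is only a sketch of the placement policies discussed above (OpenMP 4.0-style runtimes; Slurm's CPU-binding options are an alternative on the course cluster):

```bash
export OMP_NUM_THREADS=4
export OMP_PLACES=cores        # one place per physical core (not per hyperthread)
export OMP_PROC_BIND=close     # "compact": pack threads onto nearby cores / the same socket
# export OMP_PROC_BIND=spread  # alternative: spread threads across sockets for more memory bandwidth

./my_threaded_program          # placeholder binary name
```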
|
|
86:09 | Instructor: Well — any more questions? If not, I will stop the recording here. |