© Distribution of this video is restricted by its owner
00:00 | Mhm. Lecture time. No idea why I get a green frame this |
|
|
00:08 | time. Mm. So we'll see; hopefully it will not, you know, I don't know where |
|
|
00:15 | it comes from. Um, so today we'll continue with another tool that is used |
|
|
00:25 | for understanding how your code performs on the system you're using, and that is known as |
|
|
00:35 | TAU. And after I give you an introduction to a little bit of the characteristics and |
|
|
00:42 | features of TAU, Josh will demo how to use TAU as well as |
|
|
00:50 | again demo some of the features of . Yeah. Oh okay. |
|
|
00:59 | So TAU stands for Tuning and Analysis Utilities. It's kind of a |
|
|
01:06 | toolkit that has a wide range of features, and it uses many other packages for specific |
|
|
01:14 | tasks. Like we mentioned last time, it uses PAPI for getting access to the registers that |
|
|
01:23 | collect data about the processor behavior and memory system, for instance. TAU can do |
|
|
01:32 | more, and we can also do more than that. This is just an example of other packages |
|
|
01:38 | that are out there, used to accomplish the broader scope of things that it has. |
|
|
01:46 | So there are many ways of instrumenting, and we'll talk about that as well. |
|
|
01:50 | Depending upon what you want, it uses different underlying or supporting packages, and most |
|
|
01:58 | of what I will give examples of now is the latter part, the profiling and |
|
|
02:04 | tracing, since we talked about a fair amount of features on Tuesday this week. Mhm. |
|
|
02:14 | This picture, shown before, shows the kind of broad scope of TAU |
|
|
02:21 | and that it uses various other packages for specific functions or tasks. And as it says, |
|
|
02:31 | it's a comprehensive toolkit, and as you'll see on the coming slides, it follows the |
|
|
02:40 | methodology I talked about last time. There is the instrumentation part, and then there is the |
|
|
02:46 | data collection part, and then there's the analysis part. It also has a good set |
|
|
02:53 | of tools for the analysis part. And this is pretty much |
|
|
03:02 | just examples of what I talked about last time, plus some things that PAPI does not |
|
|
03:08 | do. So with this you can also look at I/O and other features, um, network |
|
|
03:17 | and so on, that PAPI does not support, at least not in a very wide |
|
|
03:26 | sense. Okay, so I'll talk, over the next few slides, about different ways of instrumenting |
|
|
03:33 | codes. And as Josh mentioned last time, uh huh, TAU provides nice ways |
|
|
03:40 | of instrumenting the code kind of automatically. If not automatically, sometimes it |
|
|
03:50 | can be done manually too. So instead of having to insert statements in your |
|
|
03:56 | code, like you mentioned for PAPI, you can have TAU do that for you. And |
|
|
04:07 | the instrumentation can be controlled, so you can control how the measurements are |
|
|
04:12 | being done: whether you do direct measurements, where you have probes or some such |
|
|
04:17 | feature, or do some more indirect measurement. And there are various kinds of scopes: you |
|
|
04:25 | can do the whole program, or you can do things by functions or by loops. So |
|
|
04:31 | in many ways you can control how data is being collected. And then, as I said, |
|
|
04:37 | it has a good set of analysis tools for trying to make sense of the data |
|
|
04:41 | that is being produced during the measurement. And I was talking about some of them, |
|
|
04:47 | and I think Josh will demo some of the tools you can use |
|
|
04:51 | for trying to make sense of the data that you collect. And this is just more |
|
|
04:59 | or less what I described: this set of three phases, instrumentation, measurement, and |
|
|
05:05 | analysis. And a kind of characteristic is that the instrumentation can be done by working |
|
|
05:11 | on the source code at one end, and at the other end, at the bottom here in the blue box, |
|
|
05:16 | it is to directly work with the binary, the executable. So there are |
|
|
05:22 | levels in between, in terms of using, for instance, all kinds of wrappers to |
|
|
05:29 | try to instrument the code. And what we're doing today, I think, is |
|
|
05:36 | just using the compiler approach to instrument the code. But you can correct me |
|
|
05:43 | when I come to that. And as I said, we talked about events last time in the |
|
|
05:49 | form of PAPI-type events. That basically, in that case, means |
|
|
05:54 | aspects of your code, like number of instructions, number of cycles, cache misses, |
|
|
06:01 | that are then labeled as events, and you can define exactly what type of information you |
|
|
06:08 | would want to collect about your code. Then there are somewhat, I guess, more high- |
|
|
06:14 | level aspects of profiling and tracing that I'll give examples of today, and then again, on |
|
|
06:23 | to the analysis section. And the next slide is a very busy one that you can study |
|
|
06:30 | if you look at it on your own, and you can probably make a little more |
|
|
06:35 | sense of it. But the point is essentially just to give you the impression that |
|
|
06:40 | it is a very versatile and comprehensive tool that has support for many |
|
|
06:46 | aspects of getting information about your code, analyzing the code, and then analyzing, to |
|
|
06:55 | try to make sense of it, the data collected. We'll talk about some of these aspects. |
|
|
07:01 | Um, TAU is again a very comprehensive tool, so we just want, in this |
|
|
07:09 | introductory course, to give you a chance to pay attention to it, and in case you |
|
|
07:16 | use it beyond the course, then you know where to find more information and can |
|
|
07:20 | exploit more of the features than we do in assignments. So one, um, thing |
|
|
07:29 | I guess I highlighted last time is that the data volume that various tools |
|
|
07:39 | give you can get overwhelming. So one should both be conscientious about what tests |
|
|
07:48 | you use, and also needs to pay attention to how much data you |
|
|
07:54 | need to be able to answer the questions you are asking yourself about your code. And this |
|
|
08:00 | just shows that, you know, at one end, if you look at profile information, it's |
|
|
08:06 | not usually overwhelming. But if you go to the other end to get trace information, |
|
|
08:13 | that means it also has the time dimension of what happens in your code. |
|
|
08:18 | Then, in that case, it can be quite substantial in terms of the data |
|
|
08:24 | volume that you get. So this is just trying to make you aware |
|
|
08:29 | that what you're asking for has consequences for the output, and then also has consequences |
|
|
08:37 | for how you can actually get some information out of all the data you collect. |
|
|
08:42 | Mhm. And this is just the instrumentation. So TAU provides means for automatically instrumenting |
|
|
08:53 | your code. It uses what they call PDT, or the Program |
|
|
08:58 | Database Toolkit, or compiler instrumentation. But basically you compile your code, um, using TAU, and then |
|
|
09:07 | it works with the compilers for the platform you're on, and then it does the instrumentation for you. |
|
|
09:12 | And that's, I think, the way we want you to do instrumentation for the assignment. |
|
|
09:18 | Of course you can always do it manually. But then there are also ways of |
|
|
09:24 | linking the code to various libraries that target specific features that you are interested in, that |
|
|
09:32 | you may not get through the automatic instrumentation that TAU provides. And then, if you want |
|
|
09:42 | a minimal impact on your code, then you may in fact use tools that rewrite |
|
|
09:50 | the binary to then collect information from the execution of the code. And so this |
|
|
10:03 | is kind of just a little bit to get a picture of how |
|
|
10:08 | automatic instrumentation, um, works when it comes to TAU. So it's based on parsing |
|
|
10:20 | and analyzing of the source code, and it uses, um, its kind of library of |
|
|
10:28 | instrumentation capabilities. And depending upon how you specify the particular instrumentation, then it generates |
|
|
10:38 | an instrumented code that you then use for the execution. And this |
|
|
10:45 | is kind of just an example of compiler instrumentation, and I think that's what we will use |
|
|
10:50 | for the demo later. It just shows that compiling is straightforward: where you would use |
|
|
10:56 | your Fortran compiler, you then use TAU to do the compilation, and you get code instrumented |
|
|
11:07 | according to the directives you set. This particular example says MPI in it. |
|
|
11:12 | So that means this particular example happens to be for somebody doing parallel code for a |
|
|
11:20 | cluster using MPI, a message passing library, but the principle is the same whether it's |
|
|
11:28 | a sequential code or a parallel code of some flavor. And then you have various ways of |
|
|
11:36 | controlling how instrumentation is done. And here is just a list of some of the options |
|
|
11:41 | that you have for telling the compiler about the instrumentation to be done and how much |
|
|
11:49 | information you want in the output. I believe that Josh will talk more about some |
|
|
11:57 | of the examples and what they mean. Sure. So yeah, |
|
|
12:07 | you're actually welcome to interrupt if there are comments on any of the particular slides. Yeah. |
|
|
12:17 | And here's just an example of some of the other software packages that help in the |
|
|
12:23 | case where you're rewriting the binary produced by the compiler. So Dyninst is |
|
|
12:30 | one, and that stands for dynamic instrumentation, and it has been around for quite some |
|
|
12:35 | time. And then there are some other packages, again for rewriting binaries, that |
|
|
12:43 | you can use. Mhm. And the example on the bottom is again from a |
|
|
12:49 | parallel code; this is the mpirun, um, that's similar to srun for Slurm |
|
|
13:02 | that you used, uh, just in the first assignment. Yeah. Yeah. And this |
|
|
13:11 | is just to get an example. . Uh support an example. You |
|
|
13:16 | some of you do use it in similarity about you see uh in the |
|
|
13:23 | most codes are parallel. We'll get to that later in the course. So |
|
|
13:30 | for the measurement part, one way is again to do direct observation by using probes that do |
|
|
13:39 | checks on the status of the code at various points, and then it's just, um, |
|
|
13:52 | and again, depending on what approach you're using, you can then collect different types of |
|
|
13:57 | information. But it's under your control, as it is direct, what information you will have if you instrument |
|
|
14:04 | your code in that way. The other one is the indirect performance measurement. And in |
|
|
14:10 | some ways, if you just use that, it gives, in a sense, a |
|
|
14:17 | very high-level kind of indirect view of how the code is performing. But |
|
|
14:27 | you can do that, like with PAPI, uh, and also in terms of other packages that |
|
|
14:34 | support event-based sampling, or EBS, and so on, a few of which we will be |
|
|
14:41 | using. So those were the two approaches. And the other example, I guess, |
|
|
14:50 | is what is used to define events, um, with a begin and an end: the simplest example was just using the |
|
|
14:57 | timer to read the clock before and after, to figure out what time it took between the two |
|
|
15:00 | calls. Um, but generally they are monotonically increasing, except, as, uh, |
|
|
15:10 | I warned, with some of the timers, they may reset in case you |
|
|
15:18 | call them. So one has to be careful. It's not necessarily true that they are monotonic for |
|
|
15:22 | all, um, metrics that you collect, depending on the function you're using for monitoring, |
|
|
15:33 | and then you can also do it just for events in the code. This |
|
|
15:39 | is what's called atomic events, as opposed to, say, a timer for a routine, which is not |
|
|
15:46 | atomic, or a loop nest or something. So this is more capturing specific events at points in |
|
|
15:56 | the code. The example here is, in particular, how much memory, for instance, is used at a |
|
|
16:07 | particular point in the code. And then you can use different, what I'll refer to as, scopes: |
|
|
16:15 | you can do it for routines, you can do classes, |
|
|
16:19 | you can do loops, and so on. So there are different ways of doing it, |
|
|
16:24 | so that it meets your needs as well. So, um, now a |
|
|
16:34 | few examples of using TAU, and then I'll give examples of outputs that |
|
|
16:43 | TAU generates, and after that I will turn to Josh for a demo of how to actually use |
|
|
16:52 | TAU. And this is just a preamble, a little bit on, um, how to both instrument and then |
|
|
17:02 | run, but then also how to look at what, um, TAU collects about your |
|
|
17:12 | code. Mm hmm. So as I said, now we're talking about the building |
|
|
17:20 | blocks: in the class we'll use some features of TAU, uh, |
|
|
17:30 | essentially for providing profiling and tracing. In the class here we're using it |
|
|
17:39 | as a complement to PAPI, and as a way of using PAPI without manually |
|
|
17:50 | instrumenting the code. So the difference, if you're not familiar, is that a profile |
|
|
18:04 | gives aggregate information about the execution. It has no notion of time. |
|
|
18:11 | It's just for whatever segment of code, or the entire code, that's running. It gives |
|
|
18:17 | you the global or total picture of what happened in the execution. Whereas the trace |
|
|
18:26 | also has the events in time: what happens with the code and what is |
|
|
18:33 | called at various points in time in the code. So that's why execution traces generate |
|
|
18:43 | substantially more data than just profiling your code. And I'll talk about the |
|
|
18:54 | example here. So, just on this slide, under the profiling column, |
|
|
19:03 | you see a number of functions in that particular code that was profiled. |
|
|
19:09 | And it tells you, in this case, as it says on top of the bar chart, |
|
|
19:14 | that, um, the unit that you see for the bars is seconds. So you can |
|
|
19:22 | control the resolution: for instance, if you want to use time, whether you want |
|
|
19:27 | seconds or milliseconds or whatever the unit of time is that is relevant for the |
|
|
19:33 | code. You can also have other things, like number of instructions or number of calls, or |
|
|
19:38 | many other aspects of your code. But this gives the total amount of time spent. |
|
|
19:45 | The time that is spent in, for instance, the first routine on top, |
|
|
19:50 | this sweep routine, um. Note it also says exclusive on top. |
|
|
19:59 | I'll come back to that in future slides. But it means that |
|
|
20:05 | the time, as it is exclusive, is just for that particular function, the unique pieces of |
|
|
20:14 | code. So if it calls other functions, that time is not included in exclusive |
|
|
20:22 | time. But if you have inclusive time, it would report the time of all, uh, calls made |
|
|
20:32 | by this routine as well. And it's again the total time for |
|
|
20:40 | all the different calls, not just per call. So different instances of |
|
|
20:48 | the call to the routine may take different amounts of time. So it doesn't give |
|
|
20:51 | you any detailed information, just the total for the whole run. And in some sense |
|
|
20:57 | it sorts them nicely. So in this case it's easy to focus: if you want |
|
|
21:03 | to optimize the code, you go for the ones, most likely, where, uh, |
|
|
21:08 | the code spends most of the time. So this is just a little bit of |
|
|
21:16 | a summary. And as mentioned already, you can do it per |
|
|
21:22 | function, for basic blocks, for loops, for threads and processes. We haven't talked |
|
|
21:29 | much about the difference between threads and processes, but we will in subsequent lectures. And |
|
|
21:37 | you have the whole range of attributes for those particular scopes that you can collect |
|
|
21:45 | using TAU, and it in turn uses either PAPI or some other tool that we |
|
|
21:51 | haven't talked about in detail, but TAU would know what to use if you |
|
|
21:59 | give it the attributes and scope. In profiling there is what the slide |
|
|
22:10 | I showed and talked about in terms of profile; that's known as a flat profile. |
|
|
22:15 | Um, I guess it's because it's kind of one-dimensional in some sense; it doesn't |
|
|
22:23 | give you much insight, as I said, into what actually happens in the code, except the |
|
|
22:28 | aggregate information. Whereas a call path profile gives you also the control flow information about what |
|
|
22:38 | happens in the code. And then you can also, uh, define special |
|
|
22:46 | profiles. And I'll give examples of call path profiles as well as flat profiles. |
|
|
22:56 | So this is just another flat profile, and in this case again it's time, |
|
|
23:06 | and I guess it's shown here, the exclusive/inclusive difference. And this is again another |
|
|
23:12 | plot. And, uh, I don't think there's much more to comment on that. You |
|
|
23:18 | read it like you did for the other flat profile, except it just lists a bunch of |
|
|
23:25 | calls. And again, this is a parallel code written for a cluster, because it |
|
|
23:32 | shows MPI, that is, this message passing interface. So you can tell, if you're familiar, |
|
|
23:38 | that this is a parallel code. Mm. And here is just an example |
|
|
23:48 | of how, uh, you can tell TAU what instrumentation you want in terms of |
|
|
24:01 | loops. And I think we'll demo it later on. It also shows |
|
|
24:06 | your options in terms of how much output you want from TAU, in |
|
|
24:12 | terms of using, for instance, the verbose, uh, option. Now this is, |
|
|
24:23 | now, just a different type of flat profile, this time at a kind of loop- |
|
|
24:29 | level thing. So in this case it's a different unit of time, microseconds, and the |
|
|
24:36 | metric that was used was time also in this particular case. But it shows |
|
|
24:40 | that this, uh, loop that multiplies matrices, for this particular code, |
|
|
24:47 | was clearly very dominating for where the time was spent. But then there's also |
|
|
24:51 | other loops that by themselves don't spend too much time. |
|
|
24:58 | That means maybe these are shallow loops, and there are other functions, perhaps, uh, in |
|
|
25:10 | the other bars too. And I'll give another example of things, more based on the |
|
|
25:18 | call graph, to start to decipher what happens. And I think this is just an example |
|
|
25:26 | of how to control what you want in terms of output, and this is instrumented, now, |
|
|
25:37 | with TAU. And I think again we will spend more time going through this |
|
|
25:44 | particular, or a related, example. Um, so, uh, that's another flat profile. |
|
|
25:59 | Instead of doing time, in this case you can also use composite measures, so it's |
|
|
26:06 | not just time. In this case, the user wanted to |
|
|
26:12 | get the instruction rate, so it needs the number of instructions and then the time |
|
|
26:19 | it takes. But it's an aggregate: you get the average for the whole run. |
|
|
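As a rough illustration of the composite measure just described (instruction count divided by time), here is a minimal Python sketch. The routine names and numbers are invented for illustration and are not taken from the slide; the only point is that dividing two aggregated totals gives an average rate over the whole run, not a rate at any particular moment.

```python
from dataclasses import dataclass


@dataclass
class FlatEntry:
    """One row of a hypothetical flat profile (names and values are made up)."""
    name: str
    instructions: int  # total instructions counted in this routine
    seconds: float     # total exclusive time spent in this routine


def instruction_rate(entry: FlatEntry) -> float:
    """Composite metric: average instructions per second over the whole run."""
    return entry.instructions / entry.seconds


# Hypothetical aggregated measurements for two routines.
profile = [
    FlatEntry("multiply", 4_000_000_000, 2.0),
    FlatEntry("init", 10_000_000, 0.5),
]

rates = {entry.name: instruction_rate(entry) for entry in profile}
for name, rate in rates.items():
    print(f"{name}: {rate:.3e} instructions/s (average over the run)")
```

Because both inputs are totals, a routine that alternates between fast and slow phases would still show a single averaged number here.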
26:26 | Uh huh. And now the next example is to try to |
|
|
26:35 | get the call path, and I think the next slide will basically show you what you |
|
|
26:39 | get in that case. Um, so the call path profile in this case |
|
|
26:47 | gives the fraction of total time, and for this example, in the barriers, |
|
|
27:00 | and it's again exclusive. So I'll come to the inclusive example, I think, on the |
|
|
27:06 | next slide or two. But this just shows you different things you can get |
|
|
27:13 | out. But now here is actually a representation of the call sequence, that is, what you |
|
|
27:22 | get when you have the call graph. It just shows in this case that the |
|
|
27:28 | main routine potentially does different calls, to function one and to function |
|
|
27:34 | five. Um, and then how the control flow works in this particular code, |
|
|
27:45 | what function is being called: you can call, from function one, either function |
|
|
27:50 | four or two, and function two in turn calls a further function. |
|
|
27:57 | Now that's a kind of simple example, but sometimes it's useful to get some understanding of |
|
|
28:02 | what the control flow is, as well as how much time is spent in the different |
|
|
28:07 | functions. And the next slide shows a little bit of what is perhaps more typical for a real |
|
|
28:15 | application, as opposed to some simple kernel, what things may look like. And then it |
|
|
28:22 | becomes obviously a lot more complex to start to analyze it. But |
|
|
28:30 | TAU can help you get the call sequence for your code and help |
|
|
28:41 | you analyze it. Um, and here is yet another example, and I'm not so sure about |
|
|
28:56 | this one; I think I have a plot that is the exclusive time flat |
|
|
29:03 | profile. And so in this case, pay attention to a few of these, so I |
|
|
29:10 | can show you the difference coming up. So here's one routine, and it's clearly the |
|
|
29:16 | one that consumes most of the time here. And |
|
|
29:22 | there is a bunch of other ones down the ladder here. Now the next slide, |
|
|
29:29 | it shows, if you will, the same sort of profile, okay, using time |
|
|
29:36 | as the attribute for the code, but using now the inclusive time. The |
|
|
29:45 | function that was the one that took most of the time, if you just look |
|
|
29:50 | at the routine itself, um, is now not topmost. So that means it's called somewhere in |
|
|
29:57 | the code, and even though the majority, or a big fraction, of the time for the |
|
|
30:02 | whole execution is spent in that routine, it's called by other ones. So this |
|
|
30:08 | is kind of the whole application, and if you go down here then you find |
|
|
30:13 | this function down a bit, and you can see that it's the same number, in |
|
|
30:20 | fact, as, um, the exclusive time. So in some sense, from looking at both the profiles |
|
|
30:28 | of inclusive and exclusive, you can in fact conclude that this function doesn't really call |
|
|
30:37 | any other function. But there are ways of looking at it in other ways using TAU. |
|
|
30:47 | So I think on this slide I just extracted some of the things to show |
|
|
30:52 | the difference between exclusive and inclusive, and kind of, like, the blue arrows point to |
|
|
31:01 | what I just talked about in terms of that function. And there are other ones |
|
|
31:09 | that, uh, are kind of the opposite, like this function that is at |
|
|
31:17 | the bottom of the exclusive list, or profile. That in itself doesn't have much |
|
|
31:26 | unique code for this function. But when you do the inclusive timing, it turns out that |
|
|
31:33 | it calls a bunch of other functions. So in the inclusive timing, it does |
|
|
31:39 | actually consume a good fraction of the time, more or less 60% of |
|
|
31:43 | the total time. But again, that's because it calls other functions to do the |
|
|
31:49 | job. So it is useful to look at both, I would say. So exclusive is |
|
|
31:56 | clearly a good thing to use when you try to figure out where you want to spend |
|
|
32:03 | the effort to improve performance. But it's also sometimes useful to look at the inclusive |
|
|
32:09 | times to narrow the search for what to look for. And this graph shows a |
|
|
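The relationship between exclusive and inclusive time discussed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the call tree, and the timings are hypothetical (they are not the routines on the slide), each function is assumed to be called from a single parent, and there is no recursion.

```python
def inclusive_time(name, exclusive, calls):
    """Inclusive time = own (exclusive) time plus inclusive time of all callees."""
    return exclusive[name] + sum(
        inclusive_time(callee, exclusive, calls) for callee in calls.get(name, [])
    )


# Hypothetical exclusive times in seconds and a hypothetical call tree.
exclusive = {"main": 1.0, "driver": 0.5, "helper": 2.0, "kernel": 20.0}
calls = {"main": ["driver"], "driver": ["helper"], "helper": ["kernel"]}

inclusive = {name: inclusive_time(name, exclusive, calls) for name in exclusive}

# Sorting by each metric gives a different "top" routine.
top_exclusive = max(exclusive, key=exclusive.get)
top_inclusive = max(inclusive, key=inclusive.get)

# A leaf routine (no callees) has identical exclusive and inclusive time.
leaf_routines = [name for name in exclusive if exclusive[name] == inclusive[name]]

print("inclusive:", inclusive)
print("top by exclusive:", top_exclusive, "| top by inclusive:", top_inclusive)
print("routines that call nothing else:", leaf_routines)
```

In this made-up example the kernel dominates the exclusive ranking, while a routine with very little code of its own ("driver") still shows a large inclusive time because of what it calls, which mirrors the two cases pointed out on the slides.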
32:24 | little more detail. Now basically, in the left column here, you |
|
|
32:30 | have the nesting of the calls being used in this application, as a |
|
|
32:39 | call tree. And we saw this function that was at the bottom of the exclusive |
|
|
32:47 | profile. And you see that the exclusive time is very small, for instance, but the inclusive |
|
|
32:55 | time is definitely not small. And it calls, uh, one function, |
|
|
33:04 | and then that calls the function I highlighted a few times here, in looking |
|
|
33:09 | at these graphs. And one can see that, uh, if you look at the exclusive |
|
|
33:21 | and inclusive time for this function, they're the same. So that means, as |
|
|
33:25 | I mentioned, it does not have embedded calls in it. It's just its |
|
|
33:33 | own code. And yes, one can look at the numbers here to |
|
|
33:40 | see what happens. If you look at this function here, it took, |
|
|
33:44 | inclusive, 22-something, um, seconds I believe is the unit here, and then you |
|
|
33:54 | can see that corresponds to, oh sorry, if you add up certain numbers |
|
|
34:06 | you should get the right numbers. So if you look at basically this |
|
|
34:12 | function here, and add these two times here, and then the |
|
|
34:19 | time for the function that it calls, they add up to basically the inclusive |
|
|
34:25 | time that you have here. And if you go through some of these other ones, |
|
|
34:29 | you can decipher how the times make sense in terms of what's inclusive and |
|
|
34:38 | exclusive, given the call tree that you have on the left-hand side. This output also |
|
|
34:45 | gives you the number of calls. So in this case you can see that |
|
|
34:50 | this function is called a lot, in this case about 180,000 times, |
|
|
34:59 | um, in going through this, or completing the run. So you get a lot of |
|
|
35:05 | information about both the total time spent in the function and the number of times it's |
|
|
35:13 | called. So, on the other hand, the time per call is not very |
|
|
35:17 | large. So, you know, you may not have thought of it, if you just look at |
|
|
35:23 | one instance of the routine, as being that critical, because it's fairly quick. |
|
|
35:29 | But on the other hand, if you look at the aggregate time, then you may want |
|
|
35:34 | to pay attention to it. And we talked a little bit about how to |
|
|
35:42 | structure code to make efficient codes, and, if you aren't already aware of it, function |
|
|
35:47 | calls are expensive. So, for instance, if you do this many function calls, and the |
|
|
35:52 | time per function call is perhaps not very |
|
|
35:58 | large, you may want to look at ways of reducing the number of function calls. |
|
|
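To make the point about many cheap calls concrete, here is a small Python sketch of the arithmetic. The call count of 180,000 is the figure mentioned above; the total time of 9.0 seconds is a made-up value for illustration, since the actual total is not stated in the lecture.

```python
import math


def time_per_call(total_seconds: float, num_calls: int) -> float:
    """Average time per call, derived from the aggregate profile numbers."""
    return total_seconds / num_calls


num_calls = 180_000   # number of calls quoted in the lecture
total_seconds = 9.0   # hypothetical aggregate time, not from the slide

per_call = time_per_call(total_seconds, num_calls)
print(f"average time per call: {per_call * 1e6:.1f} microseconds")
print(f"aggregate time:        {total_seconds:.1f} seconds")
```

A single call at this rate looks negligible, but the aggregate is what shows up in the profile, which is why the call count is worth reading alongside the times.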
36:06 | So this is just an example of what you can get out of TAU in terms of |
|
|
36:10 | statistics and getting insight into what, uh, happens in the code. If you wrote |
|
|
36:16 | the code yourself you may know it well, but if you're given some code |
|
|
36:23 | and you try to figure out what goes on with it, this type of information is |
|
|
36:27 | very helpful to help you understand the code. Um, then some examples of tracing: |
|
|
36:40 | again, it gives a timestamp, in contrast to what the profiling does, and shows the |
|
|
36:53 | sequence of events in time, and what events take place for each one of them. |
|
|
37:00 | So, do you remember the first slide I showed of the |
|
|
37:06 | difference between profiling and tracing? And I'll show you some more examples of what this |
|
|
37:15 | looks like. But this is kind of what it does, and this example is |
|
|
37:22 | again for a more difficult situation than we have done so far. That means you |
|
|
37:28 | have a number of threads or processes. This one is for processes: if you program a |
|
|
37:36 | cluster using MPI, you have processes, and this is for two different processes. And then TAU |
|
|
37:46 | tracks what happens in each process. So you get basically traces |
|
|
37:54 | per process, and the timestamps when events happen for that particular process. |
|
|
38:04 | And then you can get the global view by having the traces for all the different processes |
|
|
38:09 | merged. So TAU manages all of this for you, and again, it's more complex. |
|
|
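The idea of per-process traces with timestamps, merged into one global view, can be sketched as follows. This is a toy model in Python, not TAU's trace format or merge tool: the event tuples, function names, and timestamps are invented, and the sketch assumes non-recursive functions and distinct timestamps.

```python
import heapq
from collections import defaultdict

# Each event: (timestamp, process_rank, kind, function). All values are made up.
trace_p0 = [
    (0.0, 0, "enter", "main"),
    (1.0, 0, "enter", "compute"),
    (4.0, 0, "exit", "compute"),
    (5.0, 0, "exit", "main"),
]
trace_p1 = [
    (0.5, 1, "enter", "main"),
    (1.5, 1, "enter", "compute"),
    (3.5, 1, "exit", "compute"),
    (4.5, 1, "exit", "main"),
]


def merge_traces(*traces):
    """Merge per-process traces (each already time-ordered) into one global timeline."""
    return list(heapq.merge(*traces))


def inclusive_profile(trace):
    """Collapse a trace into a profile: total inclusive time per function.

    The timestamps are used only to compute durations, so the result has
    no notion of when things happened, which is the profile/trace difference.
    """
    totals = defaultdict(float)
    open_calls = {}
    for timestamp, rank, kind, function in trace:
        if kind == "enter":
            open_calls[(rank, function)] = timestamp
        else:
            totals[function] += timestamp - open_calls.pop((rank, function))
    return dict(totals)


merged = merge_traces(trace_p0, trace_p1)
profile = inclusive_profile(merged)

for event in merged:
    print(event)
print("profile derived from the trace:", profile)
```

The merged trace keeps every event with its timestamp and rank, so its size grows with the number of events, while the derived profile is just one number per function.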
38:20 | Yeah. And this is just a way of, uh, making sure you again get |
|
|
38:28 | the trace as an output. And so, I think this is not a |
|
|
38:36 | very good example, but it's still there. I'm going to have a better example, I |
|
|
38:41 | think, next, that gives you a little of the different aspects of output, depending on |
|
|
38:48 | what underlying software TAU uses for helping produce, uh, both the profile and the trace. |
|
|
38:59 | So in this case you can see, in the middle here, a kind of trace where the |
|
|
39:03 | different functions are color coded, and in this case again it's a parallel code. |
|
|
39:09 | So you have threads and processes, and then you can have, as it says here, views |
|
|
39:16 | also, in this case, for nodes, in the top picture, but also for threads |
|
|
39:22 | here. So you can follow things for threads, you can follow things over |
|
|
39:27 | time, and of course, if you have a high time resolution, the graphs are |
|
|
39:35 | somewhat nontrivial to, um, interpret. So one has to be, again, conscientious about |
|
|
39:42 | what one is asking for, in order not to get so much data that one |
|
|
39:48 | can't really see what happens. And the top is kind of just the profile |
|
|
39:53 | type of information, where again things are color coded on a per-thread or per-process |
|
|
39:59 | basis in terms of the parallel code. Uh huh. So I think this is |
|
|
40:08 | what I have already said, but maybe it's good as a reminder. I |
|
|
40:14 | don't think I need to repeat it maybe go the federal cuts of |
|
|
40:19 | Okay. And then I think I'll leave it to Josh to do the |
|
|
40:28 | demo. I think that's the proper thing, and if there's time left, then let me show |
|
|
40:33 | some more slides about visualization tools. But it's best for the demo to |
|
|
40:41 | go first, so I will hand it over. Yes, so I will stop sharing my |
|
|
40:50 | screen, I guess. Yes, I'll take over, I think. Okay. |
|
|
40:57 | I should have done it. Well, uh, the demo |
|
|
41:08 | I'll be showing will be in the context of using TAU, uh, with PAPI, |
|
|
41:16 | and then I'll show some examples of how to use ParaProf and how to |
|
|
41:24 | navigate through it to check the profiles from TAU. Right. But first, on |
|
|
41:31 | Stampede2, uh, what you need is to get onto a compute node, because you need to be on compute |
|
|
41:37 | nodes to do all the experiments. You can do it on login nodes, |
|
|
41:41 | but don't do it. Uh, but, uh, starting with using TAU, first you need to make |
|
|
41:50 | sure that you load the module, uh, for TAU, and pay close attention here. There are |
|
|
41:58 | two or three different versions of TAU on Stampede2, uh, and the one |
|
|
42:03 | that we will be using is, uh, dot two. And that one we know |
|
|
42:08 | doesn't have any problems, uh, working with PAPI. The other versions have had some issues |
|
|
42:14 | working with PAPI and may crash sometimes. So make sure you use this particular version. |
|
|
42:19 | Yeah. Uh, once you have done that, make sure, uh, that it's |
|
|
42:25 | loaded. There, we've got it loaded. So the first thing, uh, that you need |
|
|
42:30 | to do is to set one of the environment variables for TAU, called the TAU |
|
|
42:37 | makefile. And the way you do it is by simply, uh, setting |
|
|
42:44 | that variable to the right location. And the way you would know the right location |
|
|
42:49 | is, uh, that the makefile is located in the location pointed to by the environment variable |
|
|
42:58 | for TAU. I think you simply print, uh, that dollar TAU variable; that should give you the |
|
|
43:05 | location, and if you do an ls of that, mm, it should give you the listing |
|
|
43:13 | of that directory, and it will show you the makefiles here. Now the makefile |
|
|
43:18 | that you want to use is the one that has Intel OMP |
|
|
43:24 | in its name, not the Intel MPI one. We will not be |
|
|
43:28 | using the Intel MPI one, because if you use the Intel MPI one, it |
|
|
43:33 | will require your program to be an MPI program, which currently we're not using. So |
|
|
43:39 | what you need to do here is set the, uh, TAU_MAKEFILE variable to, |
|
|
43:47 | uh, to point to that, uh, file. That you can do by using this |
|
|
43:52 | command here. So TAU_MAKEFILE equals that TAU location, slash, the name of the makefile. |
|
|
44:03 | And once you've done that, just make sure that it's set correctly. Here, the variable |
|
|
44:14 | is pointing to the correct makefile now. The other thing that you need |
|
|
44:19 | to do is set what PAPI metrics you want to measure. And |
|
|
44:26 | to set that, what you need to do is set the environment variable called TAU_METRICS |
|
|
44:33 | equal to the, uh, PAPI event you want to measure. So let's say |
|
|
44:40 | we want to measure, uh, single-precision floating-point operations. So I just set that in the |
|
|
44:48 | TAU_METRICS environment variable to that PAPI event. Uh, now, here the example that I'm using |
|
|
44:55 | is the matrix multiplication code that you've used for your assignment. And here I have set |
|
|
45:04 | N equals 1000. So that means, for matrix multiplication, the total number of operations |
|
|
45:11 | is 2N cubed. So it will be two giga-operations, or two billion operations. |
|
|
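The operation count quoted above follows from the triple loop of a standard matrix multiplication: N cubed inner iterations, each doing one multiply and one add. The Python sketch below checks the 2N^3 formula by counting on a small matrix and then evaluates it for N = 1000. It is a counting illustration only, not the assignment code, and the function names are invented.

```python
def matmul_count_ops(a, b):
    """Naive square matrix multiply that also counts floating-point operations."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    ops = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]  # one multiply and one add
                ops += 2
    return c, ops


def expected_flops(n: int) -> int:
    """Closed-form operation count for the naive algorithm: 2 * N^3."""
    return 2 * n**3


# Verify the formula by direct counting on a small case.
n_small = 4
identity = [[1.0 if i == j else 0.0 for j in range(n_small)] for i in range(n_small)]
product, counted = matmul_count_ops(identity, identity)

print("counted ops for N=4:", counted, "| formula:", expected_flops(n_small))
print("formula for N=1000:", expected_flops(1000))
```

Comparing this expected count with the measured hardware counter value is a quick sanity check on whether the counter is reporting what you think it is.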
45:19 | So now that you have set up the two things, the TAU makefile and the |
|
|
45:24 | metrics, what you need to do is to, uh, compile your program using one of |
|
|
45:32 | TAU's compiler wrappers. So what you usually may do is use the compiler, like |
|
|
45:38 | gcc, and so on, with your source code and the, uh, output binary name. |
|
|
45:44 | The simplest thing you need to do is replace gcc with the wrapper |
|
|
45:51 | from TAU for the C compiler. That's all you need to do, once you |
|
|
45:56 | have set up the makefile and the metrics, uh, environment variables. |
|
|
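The setup steps described so far (point a makefile variable at the right TAU makefile, choose a PAPI event, then compile through TAU's wrapper instead of gcc) can be summarized in a small dry-run sketch. Everything concrete here is an assumption rather than something read off the demo screen: `TAU_MAKEFILE` and `TAU_METRICS` are the variable names as I understand them from TAU's documentation, `tau_cc.sh` is assumed to be the C compiler wrapper, `PAPI_SP_OPS` is assumed to be the single-precision preset event, and the makefile path and file names are placeholders. The sketch only prints the equivalent shell commands; it does not run anything.

```python
import shlex


def tau_build_plan(makefile_path, papi_event, source, output):
    """Return the environment settings and compile command for an instrumented build.

    Nothing is executed here; the caller decides what to do with the plan.
    """
    env = {
        "TAU_MAKEFILE": makefile_path,  # assumed variable name; selects the TAU configuration
        "TAU_METRICS": papi_event,      # assumed variable name; event(s) to measure at run time
    }
    # Same arguments you would give gcc, with the wrapper in place of the compiler.
    command = ["tau_cc.sh", source, "-o", output]
    return env, command


env, command = tau_build_plan(
    makefile_path="/path/to/tau/lib/Makefile.tau-placeholder",  # hypothetical path
    papi_event="PAPI_SP_OPS",                                   # assumed preset event name
    source="matmul.c",                                          # hypothetical file name
    output="matmul",                                            # hypothetical binary name
)

for key, value in env.items():
    print(f"export {key}={shlex.quote(value)}")
print(shlex.join(command))
```

To actually build, one could pass the plan to `subprocess.run(command, env={**os.environ, **env})` on a system where the TAU module is loaded; on the cluster, check the real makefile name in the TAU library directory first.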
46:03 | A question from me, if it's okay: how do you know which compiler it |
|
|
46:11 | uses, gcc? It uses Intel. From the makefile |
|
|
46:22 | name, you can tell mostly. Yes. Okay. Yeah. |
|
|
46:25 | Generally, in the makefile name, when you configure and install TAU, |
|
|
46:30 | it has to be built with a particular compiler, and when it is configured and installed, generally the |
|
|
46:37 | makefile name contains the name of the compiler as well. And it also consists |
|
|
46:43 | of all the features that that particular makefile supports. |
|
|
46:48 | It obviously supports PDT, the Program Database Toolkit. |
|
|
46:56 | It also supports OpenMP programs as well, using this makefile. |
|
|
47:02 | Okay. Yeah, in general, if the vendor compiler is available on |
|
|
47:09 | the platform you're using, I would recommend using the vendor compiler. It's likely, |
|
|
47:16 | but no guarantee, that the code will be more efficient in using the resources than with |
|
|
47:25 | gcc. But gcc sometimes beats the vendor compiler, so it's no guarantee, |
|
|
47:29 | as I said. But as a starting point, I would advise using the vendor compiler |
|
|
47:34 | if it's available. Yeah. And in case somebody doesn't know, gcc is the |
|
|
47:41 | GNU compiler. And if you want to use the Intel compiler for C, it's |
|
|
47:46 | icc: replace gcc with icc, and then you would be using the Intel compiler. |
|
|
47:53 | Good. Okay. So now, coming back to TAU. |
|
|
47:59 | You use the TAU compiler wrappers to compile your code. And when you do |
|
|
48:04 | that, that basically invokes TAU, and TAU starts instrumenting your source. |
|
|
48:13 | It starts with parsing your code using the Program Database Toolkit. It adds all the |
|
|
48:20 | instrumentation calls inside your source code using an internal module called tau_instrumentor. And then |
|
|
48:30 | it performs the linking with all the other libraries, like PAPI and OpenMP or |
|
|
48:35 | whatever you're using. And then, after linking of the object files, you get your executable |
|
|
48:43 | name, whatever you provided in the command, for the matmul example here. |
|
|
48:50 | So in the end you get an instrumented executable. So this is not a |
|
|
48:56 | simple executable; this is an instrumented one now. Yeah. Now the good thing |
|
|
49:02 | about TAU is, as you may have noticed by now: with the PAPI demo, |
|
|
49:09 | you had to insert the PAPI calls into your source code. And the good thing about |
|
|
49:15 | TAU, as you may have noticed, is that you did not have to change your source |
|
|
49:18 | code. The only thing that you needed to do was to compile |
|
|
49:22 | your source with the TAU compiler. And that also comes with the advantage |
|
|
49:31 | that once you have compiled your code, you don't necessarily need to recompile your |
|
|
49:36 | source to get data about any other events. And I'll show you how that |
|
|
49:41 | works. But first, getting back to how to now get some data about |
|
|
49:48 | the event that we set: we set PAPI single-precision ops as |
|
|
49:56 | the event that we want to measure. Now, when you want to execute your program, |
|
|
50:03 | you use another wrapper from TAU called tau_exec, tau underscore exec, |
|
|
50:10 | with a flag, minus capital T, and tell it that it's a |
|
|
50:15 | serial program. When we use OpenMP or MPI or others, |
|
|
50:22 | then we need to provide it with the tag for OpenMP or MPI. But |
|
|
50:27 | for now, since we're just dealing with serial programs, tell it that it's a serial |
|
|
50:31 | program. Okay. And then just provide the name of your |
|
|
50:39 | executable, and the execution goes generally as when you would run your program. But the difference |
|
|
50:48 | is that now, in the same execution directory, you will see a new file that's called |
|
|
50:54 | profile.0.0.0. This profile file actually contains |
|
|
51:00 | information about the event that we wanted to collect, for all the functions that |
|
|
51:08 | we have in the program, for that event. Now, the way you view |
|
|
51:14 | this particular profile file on the console is by using a console-based profiler provided by |
|
|
51:21 | TAU. It's called pprof, and you simply call this pprof profiler in |
|
|
51:29 | the directory where this profile is located. That should open the |
|
|
51:37 | profile file with all the information about that particular event for each of the functions |
|
|
51:46 | that was executed in our program. In this program, I only enabled the |
|
|
51:51 | classic matrix multiplication. I did not enable the interchanged matrix multiplication, so the data I |
|
|
51:58 | have collected only contains the classic matmul function and the initialization |
|
|
52:05 | function. Yeah. Now, here you can see the clear difference between the |
|
|
52:12 | inclusive and exclusive events. Now, exclusive events, as we defined them, |
|
|
52:19 | show only the events that were generated by the specific function and not by any |
|
|
52:25 | of its children. So we know that the classic matmul function, |
|
|
52:31 | that was the one that performs the two billion operations needed for the matrix multiplication. |
|
|
52:38 | So we see the exclusive count for matmul here is a little bit more |
|
|
52:44 | than two billion. And since the main function calls the classic matrix |
|
|
52:53 | multiplication, you see those two billion as inclusive operations for the main function as well, |
|
|
53:02 | although we know that the main function is not actually responsible for the classic matmul operations. So you |
|
|
53:07 | need to make sure what you're looking at. And in the assignment, I believe we ask you |
|
|
53:13 | to measure the exclusive counts for these functions so that you get a clear |
|
|
53:22 | idea of what those specific functions are actually doing. So you don't end up |
|
|
53:28 | adding whatever the initialization function or other functions might be doing. |
|
|
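The relationship between exclusive and inclusive counts can be illustrated with a small Python sketch. The function names and numbers below are made up for illustration and are not the demo's measured values; the only point is that a function's inclusive count is its own exclusive count plus the inclusive counts of everything it calls.

```python
# Hypothetical exclusive counts (events generated by the function body itself).
exclusive = {
    "main": 1_000,
    "init_matrices": 3_000,
    "classic_matmul": 2_000_000_000,
}

# Hypothetical call tree: main calls the other two functions.
children = {
    "main": ["init_matrices", "classic_matmul"],
    "init_matrices": [],
    "classic_matmul": [],
}


def inclusive(fn: str) -> int:
    """Exclusive count of fn plus the inclusive counts of all its callees."""
    return exclusive[fn] + sum(inclusive(child) for child in children[fn])


for name in exclusive:
    print(f"{name:15s} exclusive={exclusive[name]:>13,} inclusive={inclusive(name):>13,}")
```

In this toy tree, `main` shows roughly two billion inclusive operations even though almost none of them are its own, which is why the exclusive column is the one to read when attributing work to a specific function.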
53:34 | Any questions on that? Now, I have one, but let the students ask first. |
|
|
53:40 | Yeah. If not, while they may think about questions, the one question that came |
|
|
53:49 | to my mind was: it would be interesting to see, and I think it's supported on |
|
|
53:57 | Skylake, what type of vector instructions were actually used, if any, |
|
|
54:03 | whether they used maybe 512 or 256, or just scalar |
|
|
54:09 | instructions. And I don't remember exactly what the PAPI event name was for that. |
|
|
54:16 | I think it's something we can get from PAPI. |
|
|
54:22 | Yeah, it's called PAPI_VEC_SP, vector single precision. So if you want to be able |
|
|
54:30 | to see what happens. Yeah, it helps in trying to figure out, you |
|
|
54:35 | know, when one looks at performance and efficiency, whether it's just, uh. |
|
|
54:42 | Okay, let me come back to, yeah. Before I go to |
|
|
54:53 | that, the good thing about TAU, coming back to that: now I don't |
|
|
54:58 | need to recompile the code. I only need to set the TAU_METRICS environment variable |
|
|
55:02 | to be able to collect information about a new event. And now what I need to do |
|
|
55:07 | is simply execute my instrumented executable. And that should replace the profile that |
|
|
55:18 | I had in the directory with the information about this new, different event, and |
|
|
55:25 | a currency, a single provisions But the that every single decision looks |
|
|
55:31 | much the same. It doesn't seem be much different sort of now at |
|
|
55:37 | . That's good. Right. But doesn't tell you how many uh instructions |
|
|
55:46 | issued on that type. Uh I think the instruction went where Uh event |
|
|
55:56 | available on the 60 years I Uh only tells you about the |
|
|
56:02 | Oh yeah. These are the instructions . Yeah. Yeah, should |
|
|
56:08 | Yeah those are instructions. So instructions operations apparently are equal in this |
|
|
56:14 | Yeah, that's when I saw the numbers and that surprised me. We |
|
|
56:25 | to find exactly what this uh that mean? What yep we got some |
|
|
56:31 | to. Yeah. Okay. So that was only collecting one |
|
|
56:40 | event for a given execution. You can also collect multiple events in a |
|
|
56:46 | single execution, and that we can do by setting the TAU_METRICS environment variable |
|
|
56:53 | equal to a colon-separated list of events. Now, one thing I should mention |
|
|
57:00 | here is that if you're collecting multiple events, you need to make sure |
|
|
57:07 | you're not ending up collecting events that are not compatible with each other. So remember |
|
|
57:13 | that we saw a tool from PAPI last time in the demo called |
|
|
57:20 | papi_event_chooser. And that tells you whether two events are compatible with |
|
|
57:32 | each other or not. So let's again, just as a reminder, check if |
|
|
57:36 | PAPI single-precision ops and PAPI L1 total cache misses are compatible with each |
|
|
57:41 | other. It will tell you that they are not compatible with each other and cannot |
|
|
57:46 | be counted together. So just make sure you don't use any events that |
|
|
57:53 | are incompatible. And even if you end up doing it, it's likely that when you |
|
|
57:58 | run the code, the tau_exec command will throw an error |
|
|
58:05 | saying that it cannot count your events. So yes, coming back to how you |
|
|
58:12 | do that for compatible events: just simply provide, like, |
|
|
58:21 | PAPI_SP_OPS, a colon, and let's say we want to do PAPI total cycles. |
|
|
58:29 | I know these two are compatible with each other. And just set TAU_METRICS to |
|
|
58:33 | a colon-separated list of these metrics. And again, we don't |
|
|
58:40 | need to recompile your code. Just do tau_exec and run the executable. |
|
|
58:46 | And now the difference is that TAU will write two directories in the |
|
|
58:52 | execution directory, and it will name those by the name of the events that it collected, |
|
|
58:59 | so it has a PAPI SP ops directory and it has a PAPI total cycles |
|
|
59:04 | directory now. And if you go into one of these directories, you can see |
|
|
59:11 | the actual profile for that particular event, and you can read that by using the |
|
|
59:17 | command-line profiler, pprof. Now that's pretty much what you |
|
|
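The multi-event run described above can be sketched in Python as follows. This only builds the `TAU_METRICS` value and the `tau_exec` command line; nothing is executed. The executable name `./matmul` is a placeholder, and the `MULTI__` directory prefix is an assumption based on how TAU commonly names per-metric directories, so verify it against what appears in your own run directory.

```python
def tau_metrics_value(events):
    """Join PAPI event names into the colon-separated form used for TAU_METRICS."""
    return ":".join(events)


def expected_profile_dirs(events, prefix="MULTI__"):
    """Per-event directory names to look for after a multi-metric run.

    The prefix is an assumption; the lecture only says the directories
    are named after the events.
    """
    return [prefix + event for event in events]


events = ["PAPI_SP_OPS", "PAPI_TOT_CYC"]  # a pair the demo reports as compatible
env_overrides = {"TAU_METRICS": tau_metrics_value(events)}
run_cmd = ["tau_exec", "-T", "serial", "./matmul"]  # ./matmul is hypothetical

print("TAU_METRICS=" + env_overrides["TAU_METRICS"])
print(" ".join(run_cmd))
print(expected_profile_dirs(events))
```

Because only the environment variable changes between runs, the same instrumented executable can be reused for every event combination.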
59:27 | need to do in terms of getting TAU running, as background for how to |
|
|
59:32 | get all the event information you want. Now, if you want to |
|
|
59:45 | use ParaProf to get a GUI-based profile, what you need to |
|
|
59:54 | do is go to the visualization portal, and there's a URL |
|
|
60:01 | for it, vis.tacc.utexas.edu. And here, when you |
|
|
60:06 | go there, you will just go to the home page. When you go to the Jobs tab, |
|
|
60:12 | it shows that I currently have one of my jobs running, but it will give you |
|
|
60:16 | some configuration options, which are in the slides; you can follow those steps to |
|
|
60:21 | get one of your jobs running there. It takes a while to get |
|
|
60:26 | a job for this particular VNC session on the visualization portal. |
|
|
60:33 | But once it's running, it should give you a window that looks like this, and |
|
|
60:40 | it logs you into one of the compute nodes. And here again, you can access |
|
|
60:44 | your home directory and all the files that you have on the cluster. |
|
|
60:50 | But here it allows you to use the ParaProf GUI-based profiler, |
|
|
60:57 | so let's go into one of these directories again. So again, this is one |
|
|
61:02 | of the profiles that we just collected. Here again, you need to |
|
|
61:09 | load the TAU module, since it's a different SSH session. |
|
|
61:16 | You need to set all your environment variables and everything again, but we don't need |
|
|
61:21 | that for now; we only need the TAU one. All right. So |
|
|
61:26 | rather than doing pprof, as we did on the command line |
|
|
61:31 | in the other session, here you can type paraprof and press enter, and |
|
|
61:39 | that should open the ParaProf GUI. And so it tells you all the information |
|
|
61:46 | about the node that you're on, the call graph, or the event |
|
|
61:55 | for your program. And then, in the smaller window here, it tells you the |
|
|
62:03 | information about the event we collected in a more graphical format. Now, |
|
|
62:10 | this is similar to the screenshots that you saw in the slides. |
|
|
62:13 | You have the metric name at the top. You have it telling you whether |
|
|
62:18 | it's showing you exclusive values or inclusive, your standard deviation, mean, max and min. |
|
|
62:26 | And since it was a single-threaded program, it only has this one |
|
|
62:32 | row, node 0, showing. And this has been a problem with |
|
|
62:38 | their visualization for the let your internet get some big money. Uh but |
|
|
62:45 | So if you click on this node 0 name, which is for a |
|
|
62:51 | single thread, it should open a more detailed window that shows you the |
|
|
62:58 | exclusive count for each of the functions in a graphical format, similar to pprof but in |
|
|
63:04 | a graphical format here. So here we have the exclusive counts for the |
|
|
63:09 | matmul. You can also choose the inclusive events by going into Options, Select Metric, and |
|
|
63:19 | choosing inclusive. And that way you see the inclusive counts for your |
|
|
63:25 | events. Yeah. And then you also go back and try uh Come |
|
|
63:39 | Yeah. And there's the Windows menu, which lets you have a 3D visualization and |
|
|
63:45 | all the other stuff that you can do with ParaProf. It depends on how much |
|
|
63:52 | you have collected in your profile. For example, I can show you, for |
|
|
63:58 | another program that you may get at some point as an assignment. |
|
|
64:05 | It won't give away anything; there's no solution jumping out there. So this |
|
|
64:11 | is one of the codes that you will be asked to profile, I believe, in |
|
|
64:18 | one of the assignments, and it has a lot more function calls as compared to the |
|
|
64:26 | simple code that we saw, similar to one of the screenshots. |
|
|
64:33 | So again, you see all the function calls here, and the exclusive counts |
|
|
64:38 | for PAPI single-precision ops. And looking at this, you can tell which |
|
|
64:43 | function is performing more work compared to the other functions. In case you have a |
|
|
64:47 | really, really complicated code, this helps you sort out the most compute-intensive, |
|
|
64:55 | or not compute-intensive, but the busiest functions out of your complex code. |
|
|
65:07 | Any questions? So there's a question in the chat. Yeah: is there a way to do |
|
|
65:17 | multiple trials? I see the numbers: max, min, standard deviation. |
|
|
65:24 | No, unfortunately, it doesn't allow you to do multiple trials. You'll have to |
|
|
65:31 | collect the profile multiple times and copy the numbers from it. Unless, I guess, |
|
|
65:46 | you want to run things in a loop. If you run it in a loop, it actually overwrites |
|
|
65:53 | the old profile. But then you've got the total or the average, |
|
|
66:04 | but you don't get the histograms, so to speak, for instance. |
|
|
66:08 | Yeah. Yeah, I think in that case you make a run in |
|
|
66:25 | the loop. Yeah. So what's the choice of output formats for TAU, so that you can then |
|
|
66:38 | have some analysis program run on it, so you don't have to manually do it? |
|
|
66:45 | Right. I don't remember the command, but I believe it allows you |
|
|
66:51 | to export these profiles in CSV or some other easy-to-process format. |
|
|
66:58 | I can look into that. It's a good question, since we're asking them to do multiple trials; |
|
|
67:05 | there may be outlier executions. Yes, I can try to look up |
|
|
67:11 | the command, but I believe there's a way to do it. Yeah. Or |
|
|
67:20 | that, you know, some other program does it. Yeah. That can compute the |
|
|
67:26 | statistics. Right? All right. That was pretty much it for the demo. |
|
|
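Since repeated trials have to be collected by hand, a few lines of Python can compute the summary statistics once the per-trial counts have been copied out of the profiles. This is a sketch only: the three numbers below are invented placeholders, not measurements from the demo, and `summarize` is an illustrative helper name.

```python
import statistics


def summarize(trials):
    """Min, max, mean, and sample standard deviation of per-trial counts."""
    return {
        "min": min(trials),
        "max": max(trials),
        "mean": statistics.mean(trials),
        "stdev": statistics.stdev(trials),  # needs at least two trials
    }


# Hypothetical exclusive PAPI_SP_OPS counts from three separate runs.
trials = [2_000_100_000, 2_000_300_000, 2_000_200_000]
summary = summarize(trials)
for key, value in summary.items():
    print(f"{key:>5s}: {value:,.0f}")
```

Keeping the raw per-trial values alongside the summary makes outlier executions easy to spot in the report table.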
67:39 | You know, one thing I should mention, it's not necessarily regarding the demo, |
|
|
67:46 | but in terms of presenting the information in your assignment report: with your assignment, |
|
|
67:53 | you will get some of the scripts that I've written, and they |
|
|
67:57 | just go through different combinations of events to get your data much more easily. |
|
|
68:04 | But try to extract the information from that script output. Don't just |
|
|
68:13 | take a screenshot of the output and paste it into your report. That doesn't |
|
|
68:19 | make any sense when you try to analyze the data. Try to extract the relevant |
|
|
68:23 | information, put it in a table, and then analyze it. That's one thing I |
|
|
68:28 | would want to mention but in terms I don't know if you have to |
|
|
68:37 | any Things are doing the three day capabilities. I have not done. |
|
|
68:45 | you can Just use the 3D visualization take a look at it about in |
|
|
68:51 | words, since you're at it. show that various ways of doing |
|
|
68:59 | Yeah. Do more of a instagram instagram but multi in terms of |
|
|
69:09 | Right. Yeah. There's all these are in the better propaganda, interesting |
|
|
69:22 | truth. The right one. Right . The ones that I showed towards |
|
|
69:27 | uh and the flat profile that you call it. We can all presumably |
|
|
69:32 | pretty visualizations. Okay. And they more questions on course you josh. |
|
|
69:49 | someone just simply show a couple of three D plots that you can do |
|
|
69:55 | me. And there's one question: the table must then have three trials for each |
|
|
70:02 | PAPI event, repeated for the two versions of the matmul algorithm? Yes, I believe so. |
|
|
70:09 | What we are expecting, what we're asking you, is to collect information for all the |
|
|
70:27 | events that are available, or the ones that make sense for those programs. |
|
|
70:37 | I agree it's quite a bit of data, but that's why we've tried to provide you |
|
|
70:41 | with scripts to automate that, and you're free to write your own scripts to do that. |
|
|
70:55 | . But let's stop shouting at that . Okay. Uh huh. Mm |
|
|
71:07 | ground ones. It screams my name first one where I was yeah so |
|
|
71:16 | just slipped through on this so there's stuff related to what so yes I |
|
|
71:22 | so mhm mhm um tell the cornel yeah so yeah here's I guess things |
|
|
71:35 | you can control what it does and just want to the show a little |
|
|
71:40 | and yeah we were just shut up profile and and this one parents again |
|
|
71:49 | can control under the options, what call a graph that you can |
|
|
71:52 | You know. Met months is a simple programs it won't show much um |
|
|
71:58 | useful thing in here I can just where as well as you can choose |
|
|
72:03 | attributes you want on the three access this case for a three D. |
|
|
72:08 | on the cold and there is again things may disclose um what you want |
|
|
72:21 | get out of it uh you know the best way so they're just different |
|
|
72:26 | of getting a 3D visualization, and it's quite helpful in figuring out what |
|
|
72:35 | to focus on. So I just wanted to highlight the various 3D plotting options that may |
|
|
72:41 | not give you much for matmul, but maybe later on, either in |
|
|
72:48 | the assignments or if you choose to do a project, you may have something more complex that |
|
|
72:54 | you want to be able to comprehend more easily than with just a flat profile. |
|
|
73:04 | So I think that's what I just wanted to show in terms of the |
|
|
73:08 | 3D plotting capabilities. I don't think I have much more, except there are a few |
|
|
73:14 | slides about some of the other tools TAU is using. So if you're interested |
|
|
73:19 | in some of them, then look at the last few slides, and they will have pointers |
|
|
73:26 | to where you can get more information on the specific tools, or you can go to |
|
|
73:31 | the tile website there, the crowd landscape garden that for us more pointers |
|
|
73:41 | various aspects of TAU. Yeah, so tracing is something that's more |
|
|
73:48 | sophisticated, depending on what tools you use. In terms of parallel programming, they are |
|
|
73:57 | actually quite useful, because then you can see things on a per-process and per-thread basis, |
|
|
74:04 | so you can have much more of an idea of what is the slowest part, or what thread |
|
|
74:12 | or process is performance-limiting. But the more degrees of freedom |
|
|
74:22 | you have in your code in terms of threads, processes, and communication routines and |
|
|
74:30 | parts on us to be careful in what we're asking for. Two not |
|
|
74:37 | in order to make visualization tools being to extract or expose what you hope |
|
|
74:46 | find. But the tools generally do a pretty good job of taking a multivariate data set and turning it |
|
|
75:00 | into something that can represent it into whether with this screen format even as |
|
|
75:08 | , the three D plot, you in the end, the third dimension |
|
|
75:15 | kind of amazing and you have to and finesse. Okay, I will |
|
|
75:23 | there and see if there are any questions. Stop sharing screen, oh, stop |
|
|
75:40 | |
|