© Distribution of this video is restricted by its owner
00:02 | Okay. So today, what my goal is, is to give you an overview of the aspects of the platforms you're on, now and in the future, that are important for understanding whether the code you write uses the resources well or not. |
|
00:32 | So the aim is to give you the elements of processor architecture that are important for understanding resource use. |
|
00:47 | Okay. So that's the way the lecture is organized. It's with respect to types of CPUs first — as I mentioned at the beginning of the first lecture, trying to understand what's important for conventional server CPUs — and then we look a little closer at GPUs, or accelerators; for today the focus will only be on GPUs, trying to understand them a little bit more. |
01:38 | For a long time it was the so-called x86-type architectures that dominated for the instruction set. The story goes way back: Intel had models of processors labeled something-86, and that instruction set is now run not only by Intel but also by AMD and a few others. |
|
02:13 | A number of things have started to happen in recent years: there is a greater diversity of architectures, and in particular the ARM architecture, known for being energy efficient, has now entered the general computing market, not just the mobile computing market. |
|
02:34 | So I'll give a bit of background on these things. In general, platforms and processors have become what's known as heterogeneous; I'll try to bring that up just to give you an orientation of what's out there, and to help you assess the things you will be using in the assignments. |
03:03 | High performance, as far as I'm concerned, is synonymous with high efficiency, and getting high performance, or good resource efficiency, means you need to understand what the capabilities of the platform you're using are. |
|
03:25 | You also have to understand both the application and the code. They are not synonymous: you have an application in mind, you find some algorithms for it, and then you code it up, and there are several different steps and choices along the way; given your choice of algorithms and their implementation, you produce the code. So one needs to understand both ends — the platform at the bottom and the application at the top — and how the one is mapped onto the other. |
03:58 | So the things I'm going to stress today — it's kind of a bit of a laundry list. It's important to understand the degree of parallelism that exists even in a single core or a single processor; there is more to it than just the number of cores. We'll bring that up as I talk about the various processors today. |
|
04:27 | There are things to pay attention to when one looks at the specifications of the various processors: what's shared, what's not, and what's private to the cores. Typically that has to do with parts of the memory hierarchy, and with the buses that feed the cores, which are shared. |
04:57 | Another important aspect is whether the processor has some form of SIMD instructions — single instructions that operate on multiple data — typically called vector instructions. There are also VLIW-like (very long instruction word) designs, which have a little bit more flexibility than SIMD instructions, so the two are not identical or synonymous. |
05:33 | Another thing you need to look at is the kind of description you get, often from marketing materials, where the facts are sometimes hard to come by. Because power plays such an important role in modern processors, most of them keep the clock frequency firmly under their own control. |
|
05:55 | So, as I mentioned last time, you don't have full control: the clock frequency is being managed by the processor itself. In trying to understand whether the code is making good use of the platform, one also needs to understand, to some degree, what's going on under the hood. And then there is the memory system, which, as I mentioned, is the weakest part of the system. |
06:25 | Depending on the application, different aspects of the memory system are critical. With respect to the various levels in the memory hierarchy, there are the bandwidth issues; there are also things like the cache line size that are important, and what happens when there are capacity or conflict misses — the rules for writing into the cache, and for replacing what's in the cache. |
|
07:00 | Then there are other things, such as what is known as cache associativity, which has to do with how many places in a given cache a data item can be written to. Today I will mostly mention these attributes of a processor; I'll talk more in a later lecture about the details of the memory hierarchy. And then we also need to pay attention to the data paths that are available for moving data around between caches, on the same chip. |
07:51 | I guess it will still be hard to figure out, in this virtual environment, how many of you are familiar with these concepts; for many, these may be kind of new. I don't quite know how to gauge that — maybe, if you have a suggestion, I can monitor it to some degree. The next lecture will be face to face again, so maybe we'll deal with it at that time. It's hard to say. |
08:29 | Okay, so this is what I'm going to try to point out when I talk about the various processors in the next slides today. It's a preamble to prepare you for what I'm going to focus on, but this is just a cartoon, and hopefully most of you have seen it in one form or another before. |
|
08:58 | I'll do my best to illustrate it. In each processor there tends to be a collection of cores, each with some form of processing logic: floating-point units, integer units, comparators, branching logic, instruction decoders, and all kinds of other things — basically a complete computer, in terms of being able to decode and make use of instructions and reference memory. |
09:33 | Then there are the hierarchical caches. Today, when it comes to server processors, most of them have three levels of cache included in the processor, and they're part of the same piece of silicon. The data rates between the different levels of cache and main memory vary quite a bit, and so does the latency — the distance, in terms of processor cycles, to where the data lives. I'll talk in detail about that when I discuss the various processors. |
10:18 | You can see, in this case, if you look at a single core — which is one column in this picture — that the data rate between the functional units of the processing logic and the level-one cache is more or less the same as the data rate to main memory; typically a little bit less. |
|
10:58 | But if you look at the numbers in terms of gigabytes per second, you will notice that it doesn't take many cores needing access to main memory before main memory becomes the limiting factor. The numbers on this slide are peak numbers: if you don't have any data reuse, you basically saturate the data rate to memory. |
|
11:34 | It's also about two orders of magnitude — actually more — between the latency, or the distance in terms of compute cycles, to main memory compared to the level-one cache. So these are the quantitative numbers, and one does best to try to understand how codes can work on the data without being bandwidth limited, in the way we talked about with the roofline model in the last lecture. |
12:11 | Here is a more or less similar picture, and a reminder of terminology: registers are the storage closest to the execution units, and the functional units — adders, multipliers, load/store units — all operate on data in registers. The register file, as the collection is sometimes called, is basically a single cycle away: you can grab things out of registers and put data back into registers in a single cycle. |
|
12:51 | That also applies to bandwidth: the bandwidth between the register file and the functional units is high, in order to be able to keep the functional units fed. But as you move away from the registers, the ability to move data is reduced, and, as discussed before, wiring complexity comes into it. We'll talk a little bit more about that in some slides to come, but there are tens of thousands of wires or more on a single piece of silicon, so the wires, not the logic, tend to be what defines the chip area. |
13:51 | Then there are the different levels of cache, and I'll say a little bit more about that on the next slide. One thing I wanted to add: as I mentioned last time when I talked about STREAM, processors don't strictly follow the notion that data moves from one level of cache to the next, and so on, from main memory to the execution units and then back. Cache policies are sometimes not helpful and can actually hurt performance, and for that reason many architectures have a way of bypassing caches; sometimes that's called non-temporal stores — for instance, when you want to write things back to memory and bypass the caches. |
14:58 | Perhaps not everyone is familiar with main memory technology. Main memory is built from what is called DRAM, for dynamic random access memory — though it's by no means random access, and I'll talk about that in a subsequent lecture on the design of those chips. The main memories use something called double-data-rate (DDR) designs, where DDR is followed by a number giving the generation of the design specification, or standard, for these double-data-rate memories. |
|
15:34 | So recent-year PCs and server processors use DDR4; before that it was DDR3, and there's still some of it around, for various reasons, depending on what the market for the processors is, so some use an older memory technology. There is now indeed a DDR5 standard, but it is not yet supported; I believe the new processors coming out towards the end of the year are expected to support DDR5, but they are not yet in production. Beyond main memory there is basically disk, and we're not dealing with that kind of storage in this course. |
16:30 | Here is, again, a little bit of the same picture as before, but something I have encouraged everyone to do to try to understand performance: follow the data. Where does it start and where does it end up? For pretty much most applications, data starts in main memory — not in any part of the cache hierarchy. It may start somewhere else: in the case of real-time or streaming processing, maybe some I/O input. And as for the results, obviously the registers are not the end point; the results have to come out of the system. |
17:23 | In most applications it is the case that there is more input data than there is output data. That's why the read arrows are the thick arrows in my graph here. |
|
17:45 | As a simple example of more data being read than written, think about a matrix-vector multiplication, where you have A, x and y. The vector y is the result; that's the thing that gets written back to memory, or to some other form of output, and the input can be by far the largest part. If the matrix is 1000 by 1000, it has a million elements, but y has just 1000 elements. So it's not just a small factor; there can be a huge difference in the amount of data to be loaded compared to what is stored. |
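As a back-of-the-envelope check of this read/write asymmetry, here is a small sketch; the 1000-by-1000 size is the lecture's example, and the 8-byte double-precision element size is an assumption:

```python
# Input vs. output traffic for a matrix-vector product y = A x,
# with an n-by-n matrix A of 8-byte (double-precision) elements.
n = 1000
elem_bytes = 8

bytes_read = (n * n + n) * elem_bytes   # read all of A and all of x
bytes_written = n * elem_bytes          # write only y

print(bytes_read, bytes_written, bytes_read // bytes_written)
```

The ratio of loaded to stored bytes comes out around a thousand to one, which is the asymmetry the arrows in the figure express.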
18:22 | As mentioned last time, a lot of processors are designed to do very well on matrix multiplication. Of course, not everything is matrix multiplication, but it has been a core operation for a long time and still is: if you look at machine learning and many other types of applications, it is still the case that input data tends to be more voluminous than output data. |
19:05 | The black arrows show the capabilities of the system, and the red arrows point to the needs, rather than the capabilities. The capabilities — the black arrows — are a function of design trade-offs, what's costly and what's not, and of the assumption that many applications have some opportunity for data reuse. |
|
19:32 | So if you design your algorithm — pick the algorithm well — and the realization or implementation of it in software is such that you can make effective use of the memory hierarchy, the data traffic can stay down. That's why the point is to try to get all the pieces to work together, so that the performance is ideally determined, I'd say, by the level-one cache and not by main memory. That is again the roofline, and the arithmetic intensity we talked about last time, where all of that plays out. |
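The roofline reasoning referenced here can be written down in a few lines; the peak and bandwidth figures below are illustrative round numbers, not any particular chip's:

```python
# Roofline model: attainable performance is the lesser of peak compute and
# memory bandwidth times arithmetic intensity (flops per byte moved).
PEAK_FLOPS = 2.0e12   # flop/s, illustrative
MEM_BW = 200e9        # bytes/s, illustrative per-socket bandwidth

def attainable(ai: float) -> float:
    """Attainable flop/s at arithmetic intensity `ai` (flops per byte)."""
    return min(PEAK_FLOPS, MEM_BW * ai)

for ai in (0.125, 1.0, 10.0, 100.0):
    print(f"AI={ai:7.3f} flop/B -> {attainable(ai)/1e9:8.1f} Gflop/s")
```

At low arithmetic intensity the bandwidth term dominates (the sloped part of the roofline); only at high intensity does the flat compute ceiling apply — which is the lecture's point about data reuse.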
20:25 | Basically, how things work: as I mentioned, registers are part of the architecture, so they're just a cycle away — it takes one cycle to move data from registers to the functional units. There is usually enough wiring to move all the operands needed for an instruction from the register file to the functional units and back, and all these data paths can be operated in parallel. So if the data is in registers, one can operate the functional units at full speed. |
|
21:16 | But the other point I try to make here is that this means the data rates — the ability to move data — are quite extreme on modern processor chips: on the order of tens of terabytes per second at the innermost level, if one adds up all the functional units. |
21:45 | As I said, that means tens of thousands of wires. We also know — I think I showed some pictures of processor packages — that when you buy a processor from Intel or AMD or somebody else, it's something that, if it's a big one, is maybe an inch by an inch, or a little bit more. You cannot realistically fit tens of thousands of external connections on that kind of area. So it's just not feasible to bring everything out, for mechanical reasons. |
22:32 | So, in order to cope with all these limitations of the technology, caches exist, and they have been in existence for a long, long time. What has happened over the years is that we've gotten more and more of them — not just bigger sizes, but a hierarchy, with levels serving different scopes. As I said, there are typically three levels of cache on servers and PCs today. |
|
23:01 | Level one is designed to be pretty much as close to the functional units as the registers — well, not quite: sometimes it's just a single cycle between level one and the register file, but sometimes it's a couple of cycles. As you get to levels two and three, they are further away. The reason is also that the level-one caches, by being designed to operate at full speed, tend both to consume more power and to take a bit more area per bit than levels two and three. So it's again a trade-off based on the properties of the technology used to implement caches — and why one has this hierarchy, and not just one level of cache. |
24:06 | Another thing about bandwidth, in terms of the processor specifications I will talk about later: what one does is look at the width of the data path and the rate at which it operates. In most chips today, different parts of the same piece of silicon operate at different clock frequencies. That means different levels of cache may operate at different clock frequencies, and the buses, or data paths, on the chip may operate at different clock frequencies than the caches or the functional units. |
|
24:45 | So one needs to pay attention not only to the width of the buses but also to the data rate at which they operate, in order to actually understand what's feasible and whether the resources are being used well or not — because, after all, main memory is several hundred cycles away. |
25:15 | So this is a bit of a recap — I didn't talk much about it, but it was on the slides in the last lecture — just to get some perspective on things. One can look at it from the bandwidth perspective, which is the upper graph, or from the latency perspective, which is the lower graph on this slide. |
|
25:46 | This plot was first made, not all that long ago, by John McCalpin, who is now at the University of Texas in Austin. He is generally known as "Mr. STREAM": he was the one who first came up with the idea of the STREAM memory benchmark, which has become widely used even today, many years after he came up with the idea. |
|
26:10 | The point of the slide, in terms of state-of-the-art processors, is that memory doesn't quite keep up, in some sense, with the capabilities of the processing, or functional, units on a chip — the ability to feed the chip. If one takes a recent generation of Intel processors — the 8380, released this year — you can do about 100,000 floating-point operations in the time it takes just to get one data word out of memory. |
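That ratio can be reproduced roughly with a few assumed numbers; the core count, SIMD width, frequency, and latency below are illustrative guesses, not the actual specification of any chip:

```python
# How many floating-point operations fit in one main-memory latency?
# All figures are illustrative assumptions for a modern server chip.
cores = 40                 # cores per chip
lanes = 8                  # 64-bit lanes in a 512-bit SIMD unit
fma_units = 2              # FMA pipes per core
flops_per_cycle = lanes * fma_units * 2   # an FMA counts as 2 flops
freq_hz = 2.3e9            # clock frequency
mem_latency_s = 90e-9      # main-memory latency (several hundred cycles)

peak = cores * flops_per_cycle * freq_hz   # chip-wide peak flop/s
flops_per_access = peak * mem_latency_s
print(f"peak {peak/1e12:.1f} Tflop/s, "
      f"~{flops_per_access:,.0f} flops per memory latency")
```

With these guesses the answer lands in the hundreds of thousands of flops per memory access — the same order of magnitude as the lecture's "about 100,000" figure.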
27:00 | That gives some sense of how hard it is, or how good one has to be at using memory hierarchies, and of what it takes of an application, or of the computation, to actually sustain performance close to what the functional units on a chip can do. So most applications, as I mentioned the other day, end up being memory-bandwidth limited. |
|
27:37 | If you look at the latency, it's not quite as bad. In the bottom graph, the red line is really looking at the number of floating-point operations you can do in the time it takes to fetch a single data item from memory, and that hasn't changed much since John McCalpin's first graph. The lower curve is just for reference, putting the upper blue curve on the same axes. But this again gives perspective on what it takes, both in picking algorithms, if that's possible for the application you have, and then in watching what the code does, in terms of actually realizing what's possible. |
28:22 | A little bit more, and then I'll stop for a second and ask if there are questions. So here is, again, the simple STREAM triad, or what you do in a matrix-vector multiply: the inner loop of matrix multiplication is basically a multiply-add instruction. Let's try to work out what it takes in terms of, again, moving data — the theme being, consistently, that memory bandwidth, or the memory system, is the weakest part — so here is a way of putting some numbers to that claim. |
|
29:05 | Suppose the operation takes three loads and one store: A, B and C need to be loaded, and then you store the result. You also need addresses for the operands; for simplicity, assume the addresses are also 64 bits, like the 64-bit data — a common format today. Worked out, that comes to 56 bytes per cycle to sustain this single operation. |
29:48 | So if you run something at 2.5 GHz — which is fairly typical; some processors are faster, as you will see today, and some may actually run at 4 GHz or even a little more, so this is by no means the worst case — this is what a single thread needs when doing this operation. |
|
30:12 | Now think about what a single processor chip needs: if you have 256 threads running on the same piece of silicon, you need about 36 terabytes per second in order to sustain the ability to execute this instruction every cycle. |
30:36 | Now, if you look at typical servers, they use so-called dual in-line memory modules, or DIMMs — we'll talk more about that in a later lecture, but that's the kind of memory module you plug onto the motherboard. These are now DDR4 — again, the fourth generation of the double-data-rate memories — and the 3200 gives you the clock rate of that memory; again, we'll talk more about it later. This is the top-of-the-line memory module supported by PCs and servers today, and it can do 25.6 GB per second. |
|
31:24 | So there is basically more than three orders of magnitude of difference. If you tried to feed the functional units directly from main memory at 36 terabytes per second, you would actually need 1400 memory modules of this type for a single processor chip. That's obviously not realistic — in the first place, you can't get that many memory channels onto a single processor chip, and if you multiply it out, 64 bits times 1400, that is close to 100,000 wires you would need to get out of the processor. So that doesn't really work. |
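The arithmetic in this passage can be checked in a few lines, using the lecture's own figures (56 bytes per cycle, a 2.5 GHz clock, 256 threads per chip, and 25.6 GB/s per DDR4-3200 module):

```python
# Memory bandwidth demanded by a multiply-add moving 56 bytes per cycle.
bytes_per_cycle = 56
freq_hz = 2.5e9
threads = 256
dimm_bw = 25.6e9   # bytes/s for one DDR4-3200 module

per_thread = bytes_per_cycle * freq_hz   # demand of a single thread
per_chip = per_thread * threads          # demand of the whole chip
dimms = per_chip / dimm_bw               # modules needed to keep up

print(f"{per_thread/1e9:.0f} GB/s per thread, "
      f"{per_chip/1e12:.2f} TB/s per chip, {dimms:.0f} DIMMs")
```

This reproduces the roughly 36 TB/s per chip and the 1400-module count quoted in the lecture.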
32:25 | So this also says that, in order to actually use such a chip, we need applications that have potential for data reuse, and then we need to figure out how to realize that potential. But there's also another aspect that is critical, and that's the other part of the story: the energy consumption. The energy cost for this type of memory module is about 5 picojoules per bit, which doesn't sound like much. |
|
32:56 | But if you were to operate it at these 36 terabytes per second, it means the memory would consume about 1.4 kW. So that's again not realizable in anything you would use; this is, I would say, four, five, six times as power hungry as the processor itself. In the end it's just not feasible. |
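The power figure follows directly from those two numbers; both the ~36 TB/s demand and the ~5 pJ/bit energy cost are the lecture's rough figures:

```python
# Power needed to stream 36 TB/s out of DRAM at roughly 5 pJ per bit.
bw_bytes_per_s = 36e12     # bytes/s, the chip-wide demand
energy_pj_per_bit = 5.0    # rough DRAM access energy

watts = bw_bytes_per_s * 8 * energy_pj_per_bit * 1e-12
print(f"{watts/1e3:.2f} kW")
```

That lands around 1.4 kW for memory traffic alone, several times the power budget of the processor itself.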
33:34 | So I'll stop after this slide. This just gives a little bit of comment, in terms of energy, on why it's necessary to use caches. You don't need to absorb the whole slide; the relevant rows in this little table are the ones that say "SRAM" — that is the kind of technology caches are made of. In the best case, using caches versus having to go to main memory is a factor of 100-plus difference in energy consumption. |
|
34:24 | So, from an energy perspective, it's necessary to make good use of caches, just as it is from a performance perspective. I can stop here and see if there are any questions, and then I will switch and give you some processor examples. |
34:56 | Okay, so let's move on to processor examples. Here is just a little overview of recent generations of processors; the last rows are GPUs. The thing I wanted you to pay attention to here is the memory bandwidth that these processor chips support. The top-of-the-line ones support up to eight memory channels today, and they use DDR4 memories — the 3200 version is the fastest they tend to support — and that means they get about 200 gigabytes per second per socket. |
|
35:46 | The bottom ones are the GPUs, and the thing to notice for them is that, depending on the model, they have four to six times the memory bandwidth — such is the gap relative to a CPU. |
|
36:27 | The one in the middle, the Intel Atom, I just put in there as an example of something typically designed for mobile use — processors that were, and in some cases still are, used in cell phones. They don't have as much memory bandwidth or as many memory channels, and they also run a lot cooler: they may use on the order of ten watts, where the rest of them run at maybe a couple of hundred. |
|
37:05 | The other thing I should comment on on this slide, in terms of memory technology: most of the entries say DDR and a version number, and then there's HBM. That stands for high-bandwidth memory, which I will also talk about in a subsequent lecture when I discuss memory in more detail; these memories are integrated with the processors in a different way, and that allows for the higher bandwidth that gives the name to the technology. |
37:43 | So this is a little bit of a summary of what I will talk about in terms of the number of threads, and the level of parallelism, in each processor. Today there are up to tens of thousands of threads — that is for GPUs, where it's in the several thousands and up — far more than for typical CPU processors. |
|
38:12 | But one also needs to remember the limited type of parallelism in GPUs, because they are SIMD-like. Things need to be regular: you need an application in which you can organize a single instruction to operate on many data items at the same time. That is not necessarily the case in a CPU that has VLIW-style instructions, so there is more flexibility there in what you can do in a single instruction. |
|
38:48 | So now on to the examples — and questions are welcome meanwhile. Thanks. |
39:10 | The first example is, I would say, among the most ambitious, complex and feature-rich processors out there today: the IBM POWER series of processors. They have lots of features in them, and it's a big-core design, which means they don't tend to have a very high core count on a single piece of silicon, because each core requires a fair amount of real estate. |
40:01 | One thing — again coming back to data paths, which are one important aspect of processors and of what you can do with them: the little graph in the middle shows you the widths of the data paths, in this case between the level-two and level-one caches. It doesn't show the path between level one and the core, because the way IBM does it for this chip, the level one is included in what they label "core", so it's inside that piece of silicon. |
|
40:42 | But it says the ability to move data from L2 to L1 is 256 bits wide, whereas the ability to write things from L1 back to L2 is only 64 bits. So there is a factor of four between the ability to load data and to store data between L1 and L2. Between L2 and L3 it's kind of balanced; I think it depends on the applications they have targeted and what they see as important. In this case they obviously came to the conclusion that between L2 and L3 it was fine to have pretty much the same capability for loads and stores. When it comes to L3 and main memory it's 2-to-1; 2-to-1 is common in most processors, but the asymmetries show up a little differently in this case. |
41:57 | The market IBM targets with this chip is mostly transaction processing and database processing — not so much scientific and engineering computation, nor the internet-type applications: you don't see many POWER-type architectures at Amazon or Google or Microsoft, for that matter. |
42:37 | Other things to pay attention to are the cache sizes. In this case the level one is 32 kilobytes. Most processors have separate instruction and data caches at level one, but at level two they tend to be combined — or unified, as it is sometimes called — so data and instructions share it. At level one they tend to have their own, and 32 kilobytes is a fairly common size for L1 data and instruction caches. |
|
43:17 | When you come to L2, as I mentioned, they tend to be bigger; in this case L2 is 512 kilobytes, so considerably bigger than the 32 — a factor of 16, so it's noticeable. The other thing that is somewhat unusual for this architecture is the cache line size — that is, the amount of data that is moved together between the cache levels and main memory, managed as a block that is kind of atomic in terms of moving data around. Here it is 128 bytes, which is bigger than the most common size, 64 bytes. |
44:17 | One more thing to point out, as I mentioned, in terms of the caches, is what's known as associativity. Maybe you know what that is, but if you don't: it's the number of places in the cache to which a cache line can be assigned. In a four-way set-associative cache, it means a cache line taken from memory can be placed in one of four places in the cache; it cannot go in an arbitrary place. |
|
45:01 | At one extreme of associativity are direct-mapped caches: there is a one-to-one correspondence between a location in memory and the place in the cache where it can go — it can only go in one place. So if you need to load something, whatever is in that location may be overwritten, if that's allowed; otherwise it first needs to be written back to memory before you can load the new data that goes in that place. At the other extreme is the fully associative cache, in which whatever comes from memory can be stored in any place in the cache. Then there are the different cache policies — I will talk later about how, when you have a choice, you choose where to store the data. |
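The placement rule being described can be sketched concretely; the cache size and line size below match the POWER-like figures from this slide (512 KiB, 128-byte lines), but the function itself is a generic textbook set-index computation, not IBM's actual implementation:

```python
# Which cache set can hold the line containing a given memory address?
# A minimal sketch for a hypothetical 512 KiB cache with 128-byte lines.
CACHE_BYTES = 512 * 1024
LINE_BYTES = 128

def cache_set(addr: int, ways: int) -> int:
    """Return the set index the line containing `addr` maps to.

    Each set holds `ways` lines, so higher associativity means fewer,
    larger sets; fully associative collapses everything into set 0.
    """
    total_lines = CACHE_BYTES // LINE_BYTES   # 4096 lines in total
    num_sets = total_lines // ways            # sets shrink in number as ways grow
    block = addr // LINE_BYTES                # line-aligned block number
    return block % num_sets

addr = 0x12345680
print("direct-mapped set:", cache_set(addr, 1))     # exactly one legal slot
print("4-way set:        ", cache_set(addr, 4))     # one of 4 slots in this set
print("fully assoc. set: ", cache_set(addr, 4096))  # any slot is legal
```

In the direct-mapped case, two addresses that share a set index evict each other even if the rest of the cache is empty — the conflict misses mentioned earlier; associativity is the knob that softens this.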
46:05 | Any other things? So, in this case, it's a power-hungry processor. The number given is what's known as the TDP, the thermal design power: that is the power dissipation that the processor is designed to be able to sustain. It's not necessarily, as I mentioned last time, the maximum power that the processor will ever use and that needs to be cooled. The TDP is a commonly quoted number, but it's important to realize that it does not stand for the maximum power consumption of the chip — as I mentioned last time, that has caught out computer vendors before. |
47:07 | I guess one more comment on this chip: as I said, it's been targeted at transaction processing and database applications, so the memory bandwidth is particularly critical. For that reason the chip itself has channels to memory, but they actually added a kind of buffer memory, if you like, that replicates each one of the memory channels on the processor chip four times. So in effect it has a total of 32 channels to main memory. That also means you can have a lot more main memory on a single processor than most other chips can, and when you run things on these systems, you have a better chance of keeping things cached in main memory than in most other designs. |
48:09 | So here is an Intel Skylake now. I'm probably not going to spend as much time on each of the other processors; I'll try to point at the same kinds of things and just call out some differences. Skylake is the one being used on Stampede2. It's not the most recent Intel generation — I have a slide on that — but this is what you encounter. It has 28 cores, so a little bit more than the number of cores on the IBM processor I talked about. |
48:51 | about Yeah, the other is instead forgot to mention on this uh IBM |
|
|
48:59 | But let me go back and bring up one thing I forgot to mention on the IBM slide; it is up here in the corner. There is a little bit of confusing terminology in terms of threads when it comes to processors, and that is because slightly different mechanisms are used to handle multiple threads on a single core. IBM calls it simultaneous multithreading, or SMT, and the cores on the IBM system are basically designed to be able to manage four threads concurrently within the same core.

49:54 | Well, for now that's good enough.
|
|
50:06 | On the other hand, if you look at the x86-type architectures (unlike the Power processor, which runs its own instruction set), that is, the instruction set that originated with Intel way, way back: the common thing there is that the cores are designed to be capable of managing two threads at the same time. Intel calls it hyper-threading, and so does AMD, but the mechanisms for handling the threads are different, and that is partially why the different terminology is fair, so one doesn't confuse the different mechanisms. We don't need to get into the mechanisms in this course, but the thing to be aware of is that both the chips on Bridges-2 as well as Stampede2 are designed to manage two threads per core.
|
|
51:14 | Now, system admins can configure whether they enable hyper-threading or not, and I don't quite remember the settings here; I was about to ask. Many sites turn off the use of hyper-threading, and the reason, going back to an earlier slide today, is that not everything in the core is private to the core or replicated for the threads; most things are in fact shared between the threads operating in the same core. So that means if you have more than one thread, the threads compete for the same resources and may in fact degrade performance rather than improve it.

52:18 | Hyper-threading is good when, for instance, one thread is waiting for memory to deliver things; maybe the other thread has its data, so it can proceed, and then there is not really contention for, say, functional units and what have you, and multiple threads may be a win. But for many well-designed codes that is not the case, and then multithreading loses. So I think in the past, if I remember correctly, Pittsburgh turned off multithreading for Bridges, and maybe Stampede2 had it enabled.
|
|
53:07 | Is that correct? Yes, Stampede2 has hyper-threading enabled and Bridges doesn't, right. I don't know for Bridges-2 what the status is; I think it still has hyper-threading turned off. But this is just a point to be aware of: again, when you try to understand code and code behavior with this notion of hyper-threading, one shouldn't make the mistake of believing that when you use more than one thread per core, the performance doubles. As I said, it shouldn't be expected to be great, because so many resources are shared and there may be contention for the same resources when you enable more threads per core.
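The intuition above (hyper-threading helps stalled, memory-bound threads and does little for well-tuned ones) can be sketched with a toy model. This is an illustration of the argument, not a measurement, and it ignores the shared-cache and functional-unit contention that can make real SMT slower:

```python
# Toy model of SMT: a thread issues work only (1 - stall_fraction) of the time.
# Two threads share one core's issue slots, so combined utilization caps at 1.0.
# Illustrative only; it ignores the contention effects discussed above.

def smt2_speedup(stall_fraction: float) -> float:
    """Throughput of two SMT threads relative to one thread on the same core."""
    busy = 1.0 - stall_fraction
    return min(1.0, 2.0 * busy) / busy

print(smt2_speedup(0.7))  # heavily memory-bound: the second thread fully pays off
print(smt2_speedup(0.4))  # partially stalled: a more modest gain
print(smt2_speedup(0.0))  # well-tuned compute-bound code: no gain at all
```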
|
|
53:57 | The other thing I didn't comment so much on for the IBM processor is this notion of very long instruction words, or SIMD features. Basically, similar to what is the case for GPUs, there is replication of the floating-point units that allows you to do many of the same multiply-adds in a single instruction in the same cycle. So you get very wide instructions, and if the application and the code are such that you can pack the operations into a single instruction, then you get close to the peak performance.

54:54 | And one thing to notice in this case: the level-one caches are very small on the IBM Power chip; they have a little bit more, I believe, at level two, and so on. It is also a power-intensive chip, at about 200 watts peak, or TDP, the thermal design power.
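To see how these wide multiply-add instructions connect to quoted peak numbers: peak FLOP/s is cores times clock times SIMD lanes times two (a fused multiply-add counts as two operations) times FMA units per core. The figures below are illustrative assumptions, not this chip's specs:

```python
# Peak FLOP/s = cores * clock * SIMD lanes * 2 ops per FMA * FMA units per core.
# All figures are illustrative, not taken from a vendor datasheet.

def peak_gflops(cores: int, ghz: float, simd_doubles: int, fma_units: int = 2) -> float:
    return cores * ghz * simd_doubles * 2 * fma_units

# A hypothetical 28-core chip with two 8-wide (512-bit) FMA units at 2.0 GHz:
print(peak_gflops(28, 2.0, 8))               # 1792.0 GFLOP/s
# The same chip if the code never vectorizes and uses one scalar FMA pipe:
print(peak_gflops(28, 2.0, 1, fma_units=1))  # 112.0 GFLOP/s
```

The large gap between the two numbers is exactly why packing operations into wide instructions matters so much.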
|
|
55:32 | The other thing that is important: this chip has six memory channels, and in the next generation chip after this they increased it to eight. Remember that the Power9 had eight channels, and it was about the same age as Skylake, or even a little bit older; they had already put more emphasis on memory bandwidth at the processor level. And for comparison, AMD, the main competitor to Intel in terms of running a similar instruction set, has also had more memory channels and better bandwidth to memory than many other processors. Now things are more comparable, but for a long time Intel got a lot of criticism for not having enough memory bandwidth in their processors.
|
|
56:32 | The other part I wanted to call attention to on this particular slide is the left-hand part, and that goes back to the notion I mentioned earlier on, that it is firmware that controls the clocking on the chip. What the left-hand graph intends to tell you is that when one uses these very wide instructions, the chip consumes considerably more power, and the power dissipated is related to the clock frequency. So in order not to overheat the chip, if you try to make use of all the resources on the chip, it gets clocked down, and there's nothing you can do about it; no one will let you burn up their chip. So they try to protect you from being too ambitious in trying to squeeze performance out of your processor. In this particular case, if you try to use all the cores, and use them to their full extent, that is, with the AVX-512 instructions, the cores operate at about half the clock rate they otherwise would. So if you just have a single thread in a single core, the clock rate may be more than twice that of a core that uses the AVX-512 instructions.
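Whether AVX-512 still pays off despite the clock drop comes down to clock times elements per instruction. The clock figures below are hypothetical, chosen only to mirror the roughly-half-the-clock remark above:

```python
# Relative throughput ~ clock * elements per instruction. The clocks are
# hypothetical, chosen to mirror the "about half the clock" behavior.

def rel_throughput(ghz: float, lanes: int) -> float:
    return ghz * lanes

scalar = rel_throughput(3.0, 1)  # one double per instruction at full clock
avx512 = rel_throughput(1.5, 8)  # eight doubles per instruction at half clock

print(avx512 / scalar)  # 4.0: vectorized code still wins despite the downclock
```

So fully vectorized code still comes out well ahead; the throttling hurts most when only a small fraction of the code can use the wide instructions but the whole core gets downclocked anyway.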
|
|
58:12 | So here is just a little bit more, an additional comment on this slide, to be a little bit more precise than the previous picture. There are nowadays so many cores on a single processor die that using just a single bus, with everybody talking over that one bus, doesn't work. So each chip today has a network on it, and with the number of cores that are on the die today it is most common to use a network that is fairly simple in some sense, a mesh, compared to the more sophisticated networks that have also been used. So this is just pointing out that that is the case.
|
|
59:02 | The other point to make here is in terms of the last-level cache, that is, the LLC in this case. Basically, each core has a little bit of it, even though it is shared, from an access point of view, among all of the cores. But that also means that some pieces of the third-level cache are closer to a given core than others. So the access time to the third-level cache is not uniform on the chip; it depends on the relative location between the data and the core that wants the data on the die.
|
|
59:49 | And this is a little bit just so you understand that these kinds of designs are fairly complex. Here is a little bit of the data path, in this case at the lowest level, and the steps of doing instruction decoding and breaking instructions down into what they call micro-ops. Some people in the architecture community have always had, not a war, but a tussle about whether one should have a complex instruction set or a simple instruction set, known as RISC. RISC has gained a lot of ground in recent years. RISC designs tend to have a much more limited instruction set, and that means it takes perhaps more instructions to get a particular operation done than in a complex instruction set, but it also means that the architecture can potentially be simpler. So what Intel has done over the years, since theirs has been an evolution of CISC instructions, is to generate what are now called micro-ops, that is, more RISC-like instructions: they break down the complex instructions into RISC-like instructions before they get executed.
|
|
61:16 | And I would also recommend, if you haven't already done it, a very useful thing, not for the details but to get the big picture of what is happening: listen to the Turing Award lecture given by John Hennessy and David Patterson a few years back. I think it is easily found by Googling.
|
|
61:41 | And here is a little bit of how things are put together into nodes, as you can imagine. That is kind of the way things are put together in the Stampede2 nodes. There are higher socket counts, but as I mentioned before, two-socket nodes are by far the most common. In particular when you have need for lots of memory in a single node, though, vendors tend to do four-socket or eight-socket configurations.
|
|
62:21 | Then I also put in, just a little bit for reference, the most recent server processors that were announced this year. I don't know if they are fully out yet, but they are supposed to be available before the end of the year at least. What Intel has predominantly done is to up the core count, like everybody else, on this chip compared to Skylake. Now there are more cores, not quite double the 28 cores, but significantly more. They still design them to do two threads per core if you choose to, and you still have the AVX-512 instructions. They upped the size a little bit for the level-one data cache, but the instruction cache is about the same as in previous generations, and the amount of level-two cache per core is more or less the same. Then they increased from six to eight memory channels, and they also increased the supported data rate for the memory, to DDR4-3200. But it is also more power-hungry: instead of around 200 watts TDP, it is now going closer to 300.
|
|
63:46 | So then let me talk a little bit about the AMD side, the main competitor to Intel. It is not as widely used, because they have had a few misfortunes over the years: they were serious competitors, then they made some missteps and almost went out of business, then they came back and started to become competitive, then made some mistakes again and again almost went out of business. But now, with the current generation of their architecture known as Zen, they are very competitive and gaining a lot of market. So at this point it is well worth being familiar also with the AMD processors.
|
|
64:36 | Now, if you use clouds in some other context, pretty much all the cloud providers now also allow you to choose between instances that run on Intel or on AMD. If your job runs on an AMD processor, it is likely cheaper for you than if it runs on an Intel one; there is a long story about that.
|
|
65:07 | Part of the reason for this difference in cost is that AMD, I would say, has been leading, or pioneered, designs that later have come to be adopted by Intel. I already mentioned that they have led in terms of emphasizing memory bandwidth; they have also led in terms of focusing on power and energy consumption, so they have always been lower power than Intel CPUs. There are many design differences too, but one that has helped them to be price-competitive is that they have tended to get the core count up by using what is known as chiplets, which is now becoming common in the industry. So they do not have one piece of silicon in their package that has a large number of cores on it; instead they put a bunch of pieces of silicon, each of which repeats a modest number of cores, together in the same package. So in fact what you are getting is a high-core-count processor, but because the pieces of silicon they are using are smaller, they get a higher yield, and that means the cost per core is lower. So they can in that case compete on price.
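The yield argument can be illustrated with the classic Poisson die-yield model, yield = exp(-defect density x die area). The defect density and die areas below are made-up illustrative numbers, not AMD's actual process data:

```python
import math

# Classic Poisson die-yield model: yield = exp(-defect_density * area).
# The 0.1 defects/cm^2 density and the die areas are illustrative only.

def die_yield(defects_per_cm2: float, area_mm2: float) -> float:
    return math.exp(-defects_per_cm2 * area_mm2 / 100.0)  # convert mm^2 to cm^2

print(f"one 400 mm^2 monolithic die: {die_yield(0.1, 400):.2f}")  # ~0.67
print(f"one 100 mm^2 chiplet:       {die_yield(0.1, 100):.2f}")  # ~0.90
```

Four small chiplets can deliver the same core count as the big die, but each one is far more likely to be defect-free, which is where the cost-per-core advantage comes from.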
|
|
66:49 | As you can see, in this case things are very similar. They actually have, in this case, a slightly larger level-one cache, but otherwise they are kind of in the same ballpark, as you can see. They have been trailing a little bit in terms of the width of the SIMD instructions: in this case they have 256-bit-wide instructions, so half as wide, meaning half as many things can be done in a single instruction, but they have tried in some sense to make up for it by having more cores. There are pluses and minuses, so it is a complex game figuring out how to stay competitive, but you can also see that, compared to Intel, it has lower power dissipation.
|
|
67:52 | And this is just, for your reference, the data for the AMD processors used in Bridges-2. Compared to the Intel chip, I guess one thing I should point out is that they have in fact a little more level-three cache than Intel has, so they have more memory on the die, even though they also have a higher core count. But, like the others, it is about the same in power dissipation.
|
|
68:42 | long time. Everything became kind of until because of stumbles by and they |
|
|
68:48 | above all everything was kind of there instructions that the X- 86 that is |
|
|
68:54 | run by MD processes but those processes power hungry. So the mobile community |
|
|
69:04 | the end use either of them. used arm processors that have been designed |
|
|
69:12 | be very power efficient and this is the most recent but I think fairly |
|
|
69:21 | of what you can get in today arm and arm is not building their |
|
|
69:27 | processes. They design processors and the the designs, so Samsung and apple |
|
|
69:34 | that better. Um, and many have used arm processor designs and building |
|
|
69:45 | processor chips. Right? So as can see in this case they are |
|
|
69:50 | As you can see in this case, even though they are designed to be very energy efficient, it is not that they are skimping on the size of the caches. The level-one cache is in fact even larger than it was on the very sophisticated Power8 or Power9 processors, and the same thing is true at level two; the level-three cache is by no means substandard either. It is just that the focus has been different, and there are other features that they don't have. But instead of, you know, a couple of hundred watts, this particular one is a factor of 10, actually 40 or 50, less power than the other ones.
|
|
70:42 | So that has caused a number of companies to try to figure out how to use these designs to compete with Intel and AMD, and that has happened in the last five to ten years as power became such a constraining factor for everyone. So Amazon went off and did their own processors using the ARM designs. This is just an example of taking the designs you saw on the previous slide and putting together a system using those kinds of processors and the other components that the ARM folks design. So Amazon made a piece of silicon, put it together, and actually made a complete server. So today, if you use Amazon, you can also opt to have your virtual machine run on what Amazon calls the Graviton, and the core of it is the ARM design. And it is again cheaper than using the AMD instances at Amazon.
|
|
72:02 | And here is just another company that based their chips on ARM, in this case with some 60 cores, and I'm not going to go into the details, but a number of companies started out trying and playing around with using Marvell, which is the company behind this design. And then this is yet another company, one that actually builds their own silicon starting from the ARM designs. But unlike Amazon, whose chips you can only use on Amazon (they don't sell their chips, they just use them for running their own services), this is a startup founded by people from Intel a few years ago, about a five-year-old company, that this year released their first processor. And the first release has 80 cores, so that is quite a high core count CPU. As you can recognize, the sizes of caches and what have you are similar to the numbers we had before, and again they use eight memory channels to get to memory.
|
|
73:17 | And this is a little bit of the summary; I want to say something about GPUs too in the few minutes I have left today. So here is the summary: in terms of threads, of course, most of them have the ability to run two threads per core. The ARM design I brought up before does four or eight threads per core, depending; it is a bit of a flexible design, so you can configure the cores as the atomic versions, so to speak, or you can gang two of them together and treat them as if they were a single core, but in the end what the chip can do is eight threads. And in SIMD terms, the 32 arithmetic operations per instruction is what you get with, say, the AVX-512 version, and similarly for the Power cores. So that is pretty much the summary. Now, about the GPUs.
|
|
74:24 | So this is the GPU that is in Bridges-2. I think there is a more recent release from NVIDIA past the one that is installed in Bridges-2, but this one is still quite typical and not far off from what the most recent edition from NVIDIA is. They have a few different versions depending upon whether they target AI or machine-learning workloads or more floating-point-intensive workloads. But the point of this slide, what I want to stress, is that the number of threads, that is, the number of parallel instruction streams, which as we saw before for CPUs tends to be in the tens or lower, is now in the low thousands to several thousands. But there is the restriction that the threads need to have a lot of commonality; one needs to be able to use this SIMD feature. So if you have one instruction, like a multiply-add, and you have lots of operands for it, then you can make good use of one of these GPUs.
|
|
75:57 | So what else do I want to stress? This one is something that I also mentioned early on: there is a difference, and part of the popularity or success of GPUs is that they have higher memory bandwidth than CPUs. The CPUs of today, as I said, have up to eight 64-bit-wide memory channels, so that is a 512-bit-wide data path to main memory per socket, whereas the GPUs tended to have more bits. In particular now, when they have started to use this high-bandwidth memory, they have, like the one chosen for this slide, about 4000 bits, 4096 to be precise. So it is a factor of eight wider data path to the memory. But it also means you don't bring these things out and use modules on the motherboard; it actually has to be in a single package. So you never get out on the motherboard, and then you can get this considerably higher bandwidth, about five to six times higher than what you see on the CPU. But it also means you are kind of restricted in the size of the memory. So yes, you gain memory bandwidth, but you lose in the size of the memory. Typically what you see is up to 32 gigabytes of memory on a GPU, whereas a typical server, like the Bridges-2 servers, I think uses 256 gigabytes per server for the regular memory, and for the extreme-memory nodes it may be a few terabytes. So it is a huge difference in the size of the memory.
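The factor-of-five-to-six claim follows directly from bus width times transfer rate. The DDR4 and HBM2 transfer rates below are illustrative assumptions for the 512-bit socket and the 4096-bit HBM stack mentioned above:

```python
# Bandwidth ~ bus width * effective transfer rate. The DDR4-2666 and HBM2
# rates below are illustrative assumptions, not measured figures.

def bandwidth_gb_s(bus_bits: int, gt_per_s: float) -> float:
    return bus_bits / 8 * gt_per_s  # bytes per transfer * GT/s -> GB/s

cpu = bandwidth_gb_s(512, 2.666)  # 8 x 64-bit DDR4-2666 channels per socket
gpu = bandwidth_gb_s(4096, 1.75)  # 4096-bit HBM2 stack

print(f"CPU socket: {cpu:.0f} GB/s")    # ~171 GB/s
print(f"GPU HBM2:   {gpu:.0f} GB/s")    # ~896 GB/s
print(f"ratio:      {gpu / cpu:.1f}x")  # in the 5-6x range quoted above
```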
|
|
78:14 | And another aspect of the GPUs is that they are actually designed to need a CPU to run; they are kind of an attached processor and not self-contained. That means the program usually starts and ends on the CPU, and then the CPU and GPU need to talk to each other to solve the entire problem, because the memory is too small on the GPU.
|
|
78:41 | And here I have, for comparison, the competitor to NVIDIA; the number-one competitor is AMD. They have again broadened their activity now that their CPUs are doing well and they have some money. They used to dominate, I would say, in terms of the gaming market, whereas NVIDIA early on went after the server and scientific computing or data-center market. Now AMD is starting to do the same, and this is their release from last year, which is fully competitive with the recent NVIDIA releases in terms of floating-point performance and pretty much anything else. So it is good to know about those chips too. My time is almost up.
|
|
79:41 | But I wanted to just mention one of the things that the internet companies have done, in particular Google. They have been designing their own servers for a long time, and then, several years ago, they started to design their own silicon. Instead of using GPUs for machine learning, they designed what is known as a tensor processing unit, a TPU, that has the kinds of functional units that are particularly helpful in getting good performance for machine learning. And again, matrix multiply is one of the core operations, so there are matrix-multiply units on the TPU chip. The graph up to the right tends to show the performance, and it is a roofline diagram: you see the slanted part, and then you see the peak performance once you get to being compute-bound. This is a log scale on the vertical axis, so the TPU can be an order of magnitude or more better for machine learning than the alternatives.
|
|
81:05 | So that is what I wanted to point out, and this is coming back to the point that, in order to get good energy efficiency, and that is what is behind many of the current efforts in doing custom processors, you can gain orders of magnitude in both performance and energy efficiency by tailoring your designs to your workloads.
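The roofline diagram just described can be captured in one line: attainable performance is the minimum of the compute roof and bandwidth times arithmetic intensity. The peak and bandwidth numbers below are made up for illustration:

```python
# Minimal roofline model: attainable FLOP/s = min(compute roof, bandwidth * intensity).
# The peak and bandwidth figures are made up for illustration.

def roofline(intensity_flop_per_byte: float, peak_gflops: float, bw_gb_s: float) -> float:
    return min(peak_gflops, bw_gb_s * intensity_flop_per_byte)

PEAK, BW = 90_000.0, 600.0  # hypothetical accelerator: 90 TFLOP/s, 600 GB/s

print(roofline(10, PEAK, BW))   # 6000.0: low intensity sits on the slanted, bandwidth-bound part
print(roofline(500, PEAK, BW))  # 90000.0: high intensity hits the flat compute roof
```

Dense matrix multiply has high arithmetic intensity, which is why an accelerator built around it can actually reach the compute roof.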
|
|
81:31 | And with that, I think I will mostly skip the last few slides today, because time is up, and take some questions. This one just gives the layout of one processor we worked with in my group; it shows the different widths of the data paths on a particular processor and the different clock rates for the different units being used in the system. And with that I think I will just stop. This last one is something that Josh worked on; it also shows the heterogeneous processors that have become common in the mobile market, where you have different pieces of silicon that are tailored for particular functions, like crypto engines and display engines and audio engines, and some that are more floating-point oriented.
|
|
82:17 | So you have different types of processors, and here is an example of a Snapdragon. You can see they have the GPU, a display processor, a DPU, a vector extension of the CPU, a digital signal processor, a DSP, and so on. So that is another thing to also be aware of. We won't deal with that in the class, except for the attached processor in the form of the GPU, but if you end up working on something related to mobile processing, you end up with chips that have different functional units with different instruction sets, and programming becomes much more complex than what we deal with in the class.
|
|
83:03 | Okay, I will stop there. So, any questions? So again, this was a quick run-through, not getting into all the details, but the details are important for understanding performance for the assignments you are going to do, plus giving you a little bit of what is out there and generally used in other contexts, with focus on the elements that matter for resource use. Again, unfortunately, it is not just one thing.
|
|
84:13 | All right. So, for whoever has not been added and has access issues: I see your username, Rick; I will make sure that happens right now after class, you know, getting you the added access. I am sorry if I missed your email earlier; I had not seen it. Okay. I will stop sharing the screen.
|