© Distribution of this video is restricted by its owner
00:00 | Yes, that's it. Um, the lecture. So, actually, |
|
|
00:07 | see if we can figure it out with chats or raising hands or something as |
|
|
00:16 | well. I asked, too, if anyone has had a computer architecture course. |
|
|
00:34 | one. Alright, that's OK. Today I'll talk about the memory system, |
|
|
00:46 | on, which is probably a little bit of a repeat for those of |
|
|
00:53 | you who have had computer architecture, since a lot of performance issues have to |
|
|
01:01 | deal with the memory system. To get performance, one needs to understand the memory |
|
|
01:09 | system. So that's why today I'm basically going to talk about it, that |
|
|
01:15 | is, kind of two levels of this: caches and the local memory. I'll |
|
|
01:20 | try to explain how they work, because that helps in trying to understand |
|
|
01:28 | what you wish your code would do in order to get good performance. And |
|
|
01:34 | the reason is that the memory system tends to be the limiting factor for so many |
|
|
01:40 | codes. So first caches, and then I'll talk about main memory. And in assignment one you |
|
|
01:47 | were, uh, doing these memory benchmarks, STREAM and GUPS. For those of |
|
|
01:54 | you who actually did them correctly, figuring out the gigabytes per |
|
|
02:00 | second, you probably discovered that there was a factor of 10 or more difference in the |
|
|
02:07 | of the bandwidth you got in the two cases. And the question is really |
|
|
02:12 | why? So I'll try to give some insight into why that is in this |
|
|
02:16 | lecture. First, caches: um, terms that are hopefully familiar to most of |
|
|
02:21 | you, even if you haven't taken the classes: hits and misses in caches. A |
|
|
02:27 | hit is when the processor tries to retrieve data, or write data |
|
|
02:35 | to cache, that is present there. That's nice: one doesn't have to go to the |
|
|
02:42 | main memory to find the data. And a miss is the opposite, when things are not |
|
|
02:48 | there. So that means that you do have to load the data from main memory into the |
|
|
02:55 | cache. And I'm just talking about this in the simplest context, in which |
|
|
02:59 | there is just one level of cache, but it's all effectively the |
|
|
03:05 | same, even though it's more complicated between several levels of cache. But for now, |
|
|
03:10 | the concept of hit and miss, and the miss penalty is the second. And the miss rate is simply just |
|
|
03:20 | one minus the hit rate. The hit rate is the frequency of instructions that actually hit, |
|
|
03:28 | or data references that hit in cache. Then the cache line concept has been |
|
|
03:34 | talked about. I did that when I talked about the processor architectures and talked about caches, |
|
|
03:39 | and I talked about cache line sizes, about latencies and bandwidths, and I'll |
|
|
03:45 | try to shed some more light on why these things are important. And there's |
|
|
03:51 | Dr. Johnson? Yeah? Could you go back a slide? Um, |
|
|
03:55 | the miss penalty: so that does not include the time it takes to determine |
|
|
04:00 | the hit or miss, right? It's the actions thereafter. Good point. |
|
|
04:06 | To be precise, um, the penalty, yes, should not include the |
|
|
04:13 | time taken to discover whether the item is there or not. And I'll talk a |
|
|
04:18 | little bit about that on several slides today: how these things actually |
|
|
04:25 | work. Um, so there is kind of an overhead, regardless of whether it's a |
|
|
04:31 | hit or miss, that is related to finding out whether things are present or not, |
|
|
04:40 | and for that overhead, I'm not sure how you actually measure it. Ah, |
|
|
04:53 | hits and misses, right, you can measure through PAPI, for instance. |
|
|
05:00 | So we'll see if it gets a bit clearer. But please otherwise come back |
|
|
05:05 | to the question. OK, thank you. So then there is |
|
|
05:11 | the notion of locality, which I have talked a lot about in terms of both architecture |
|
|
05:16 | and caches and codes. And there are two kinds. There's one known as |
|
|
05:22 | temporal, and one known as spatial. Temporal is simply that a data item is |
|
|
05:32 | used again, whether it's read or written, but it's access to the same |
|
|
05:38 | item again before too long, so it's not precise exactly what short or long |
|
|
05:46 | is. But the idea is that there aren't too many instructions before it's touched |
|
|
05:55 | again, and we'll talk about that in some future lecture. There's something called |
|
|
06:02 | reuse distance that tells how many instructions go by between touches of the same |
|
|
06:07 | item. And then the spatial locality is, uh, locality with respect to |
|
|
06:16 | items that are nearby, loosely speaking, so it's not the same item |
|
|
06:22 | again, but it's like, you know, walking down the line, and |
|
|
06:27 | as you touch things down the line, the ones that are next will be addressed soon, |
|
|
06:32 | unlike if you jump around. So spatial locality is what you have in the STREAM |
|
|
06:37 | benchmark, where you had in fact stride one. So you go to the next |
|
|
06:41 | item in the next instruction, whereas in GUPS you don't have spatial locality because you |
|
|
06:47 | jump around all over memory. Yeah. But, um, there are |
|
|
06:55 | two or three types of misses, and one talks about compulsory, capacity and conflict. |
|
|
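The STREAM versus GUPS contrast described above can be made concrete with a small sketch. It is only an illustration: the line size of 8 words (a 64-byte line with 8-byte words), the access count and the address range are assumptions chosen for the example, not values from the lecture.

```python
import random

WORDS_PER_LINE = 8  # assumed: 64-byte cache line holding 8-byte words


def same_line_pairs(addresses, words_per_line=WORDS_PER_LINE):
    """Count consecutive accesses that fall in the same cache line."""
    lines = [a // words_per_line for a in addresses]
    return sum(1 for prev, cur in zip(lines, lines[1:]) if prev == cur)


n = 64
stride_one = list(range(n))  # STREAM-like walk down memory
rng = random.Random(0)       # fixed seed so the run is repeatable
scattered = [rng.randrange(1 << 20) for _ in range(n)]  # GUPS-like jumps

print("stride one:", same_line_pairs(stride_one), "of", n - 1)
print("scattered: ", same_line_pairs(scattered), "of", n - 1)
```

With stride one, most consecutive accesses land in a line that was just touched, which is the spatial locality a cache line exploits. With scattered addresses that almost never happens.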
07:01 | I'll talk a little bit more about them, not so much about the compulsory, |
|
|
07:06 | since that is simply that data has to be loaded into cache at least once, because you |
|
|
07:16 | in principle do not reach for things beyond the first level of cache. So |
|
|
07:25 | even though there is a little bit of a gray area, compulsory is simply that data |
|
|
07:31 | has to be read at least once from, or written to, main memory. So there's |
|
|
07:37 | no way around it: main memory has to be accessed at least once for any |
|
|
07:42 | item being used in the computation. So that's the compulsory part. The capacity part is |
|
|
07:48 | that caches are small compared to main memory, and they're small because they're |
|
|
07:55 | expensive relative to main memory designs, and towards the end of the lecture I hope to get |
|
|
08:00 | to that. So caches are small, so there's no way, typically, that the |
|
|
08:06 | data set could fit in cache. So that's why you run out of cache |
|
|
08:14 | capacity. Then when you load new data not currently in cache, it basically overwrites what |
|
|
08:20 | is in cache. So that's the capacity miss: there's no room for |
|
|
08:26 | the data. And then, of course, if you overwrite some data, you have |
|
|
08:30 | to decide how to do it, how to deal with the overwritten |
|
|
08:34 | data in order to preserve correctness. But that's a different story. So when it |
|
|
08:40 | comes to what to do, which, um, cache line |
|
|
08:49 | to overwrite, then there is a replacement policy in place that I'll |
|
|
08:55 | talk about. And then there's yet another concept, of conflict misses. And that |
|
|
09:03 | has to do with where the cache line from memory is being placed in the |
|
|
09:12 | cache. So I'll talk about the strategy for doing that, as well as what's known |
|
|
09:19 | as associativity, on the next few slides. Um, so this has to do |
|
|
09:26 | with the placement part. So where does a cache line, which is again a |
|
|
09:32 | group of memory locations treated as one unit when you reference memory, where |
|
|
09:40 | does it end up in the cache? So in the direct-mapped cache, there's |
|
|
09:45 | only one place, a unique place, the cache line from memory can go. It has no |
|
|
09:50 | choice. Um, in the fully associative cache, the cache line from memory can |
|
|
09:58 | go anywhere in the cache. And then the most common ones are the set |
|
|
10:06 | associative caches. That means there is a group of cache locations to which the cache |
|
|
10:17 | line can be mapped. So it has a few choices. And when I talked |
|
|
10:23 | about the processors, uh, if you go back and look at those slides, |
|
|
10:27 | you will find that most of them have eight-way or 16-way or four-way |
|
|
10:34 | associative caches. That means a k-way cache has k different locations in the |
|
|
10:41 | cache to which a particular cache line in memory can, um, go. So, as I |
|
|
10:51 | said a moment ago, one extreme, uh, the direct-mapped cache, |
|
|
10:57 | is the one-way associative cache, because there's only one place a line can go. |
|
|
11:02 | A fully associative cache is simply k-way, with k equal to the number |
|
|
11:07 | of cache lines that can fit in the cache. Um, so when it comes |
|
|
11:15 | to replacement strategies, for the direct-mapped cache there's really no choice, so |
|
|
11:22 | it's not an issue to be decided. Whereas when you have fully associative or |
|
|
11:28 | set-associative caches, there is more than one place a, uh, cache line can |
|
|
11:34 | go to. So then you have to decide which cache line should be overwritten |
|
|
11:42 | or evicted, as it is sometimes called, which means that you will have to make sure |
|
|
11:47 | that the content of that cache line, if it's not already updated there, |
|
|
11:51 | needs to be updated in memory or in other levels of cache. So the most |
|
|
12:00 | common strategy, I would say, is for sure LRU, least recently used. |
|
|
12:04 | So that means the thing that has been, um, touched the longest time |
|
|
12:11 | ago, so the least recently used, is the thing that is selected for eviction or |
|
|
12:20 | replacement. Other ones that are not uncommon sort of randomly pick one of the |
|
|
12:24 | lines, regardless of whether it was used recently or quite some time ago. |
|
|
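As a minimal sketch of the least-recently-used policy just described, here is one cache set modeled in Python. The class name `LRUSet`, the two-way size and the tags "A", "B", "C" are made up for the illustration; real hardware usually approximates LRU rather than tracking exact order.

```python
from collections import OrderedDict


class LRUSet:
    """One cache set with a fixed number of ways and LRU replacement."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()  # least recently used entry comes first

    def access(self, tag):
        """Return True on a hit, False on a miss (evicting the LRU line if full)."""
        if tag in self.lines:
            self.lines.move_to_end(tag)       # now the most recently used
            return True
        if len(self.lines) == self.ways:
            self.lines.popitem(last=False)    # evict the least recently used
        self.lines[tag] = None
        return False


s = LRUSet(ways=2)
hits = [s.access(t) for t in ["A", "B", "A", "C", "B"]]
print(hits)           # [False, False, True, False, False]
print(list(s.lines))  # ['C', 'B']
```

Touching "A" again makes "B" the least recently used line, so "C" evicts "B", and the later access to "B" misses and evicts "A".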
12:31 | And there are other ones, and in practice things are not quite as simple and clean as these, |
|
|
12:39 | um, strategies. You tend to have all kinds of |
|
|
12:48 | tweaks to try to improve program behavior. I'll talk a little bit more and then |
|
|
12:56 | I'll stop for more questions. So that was kind of the loading |
|
|
13:03 | part: where does it go? And if you have choices, how do you |
|
|
13:08 | choose where it goes? Then there's the other side, when you want to write |
|
|
13:14 | data. Yeah, today, in all of the multicore systems, at least |
|
|
13:23 | level-one caches are definitely private to a core, level two tends to be private too, |
|
|
13:31 | level threes are not. But if a cache line is in, say, level one, if you |
|
|
13:36 | write to it, that means, if it so happens that that data is also |
|
|
13:43 | residing in some other cache, then when you write to it in your |
|
|
13:50 | cache, it becomes invalid in all other caches. Or if somehow it |
|
|
13:59 | is modified in memory, then whatever is in caches is no longer valid. And |
|
|
14:06 | of course, conversely, for a particular cache: if somebody else writes in their |
|
|
14:11 | cache to a line that you hold, then your copy becomes invalid. So the cache |
|
|
14:18 | coherence mechanisms are trying to deal with that, to make sure that data, if it |
|
|
14:24 | lives in several caches, is in sync, and that everybody accesses correct |
|
|
14:34 | data. We'll talk more about that when we talk about OpenMP, because it's |
|
|
14:38 | an issue, uh, in multicore systems and sharing of cache lines. |
|
|
14:47 | Uh-huh. There are then policies associated with how you write to cache, and there are two |
|
|
14:56 | that are probably, more or less, I would say, equally frequent. |
|
|
15:03 | It depends, in fact; on systems with a cache hierarchy, it's not |
|
|
15:09 | necessarily the same policy for all levels of the cache. But the basic approaches are |
|
|
15:17 | what's known as write-through or write-back. Write-through is, as one |
|
|
15:22 | kind of intuitively can think of it: when you write to cache, main memory |
|
|
15:26 | also gets updated. So that's why it's kind of written through to main memory. |
|
|
15:32 | So that's kind of nice in that everything is in sync. On the |
|
|
15:37 | other hand, it can generate a lot of excess traffic to main memory, and that, |
|
|
15:43 | well, is very slow compared to the cache, so that's why it tends not to |
|
|
15:52 | be used as much as the write-back. And in the write-back, |
|
|
16:00 | the cache line is only written to main memory when it needs to be evicted |
|
|
16:05 | or overwritten, so that saves a lot of traffic to main memory. |
|
|
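The traffic difference between the two write policies can be sketched with a deliberately simplified model: one cache line receives a number of stores and is then evicted. The function name `memory_writes` and the single-line setup are assumptions for illustration only; the model also ignores that a write-back moves a whole line while a write-through typically moves just the written word.

```python
def memory_writes(num_stores, policy):
    """Main-memory write transactions for repeated stores to one cached line
    that is evicted afterwards (simplified single-line model)."""
    if policy == "write-through":
        return num_stores                  # every store also goes to main memory
    if policy == "write-back":
        return 1 if num_stores > 0 else 0  # only the dirty line at eviction
    raise ValueError(f"unknown policy: {policy}")


for policy in ("write-through", "write-back"):
    print(policy, memory_writes(100, policy))
```

With 100 stores to the same line, the write-through model reports 100 memory writes and the write-back model reports one.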
16:12 | Of course, that causes a little bit of a problem, because when you write |
|
|
16:18 | through, other caches can watch the bus and figure out what's |
|
|
16:23 | happening, and draw inferences as to whether the cache line is invalid or not. |
|
|
16:31 | With the write-back, other caches in principle don't know, except through other |
|
|
16:36 | mechanisms, what has happened to cache lines. There is even more to the |
|
|
16:43 | story, and there's something known as write-allocate and no-write-allocate. So, as |
|
|
16:51 | I think I mentioned when I talked about processors, most processors, in fact, use a |
|
|
16:57 | write-allocate policy. So what that means is, when you want to write |
|
|
17:03 | to memory, then if the cache line that holds the data |
|
|
17:10 | is not present in the cache, you first have to load that cache line into |
|
|
17:16 | the cache, update it in the cache, and then potentially write it back, depending on whether |
|
|
17:23 | it's a write-through or write-back cache. So that means in the write-allocate case |
|
|
17:30 | there may be a potential extra load of cache lines from memory. So that's |
|
|
17:37 | the case, for instance, in STREAM: there's no data reuse. So the |
|
|
17:43 | cache line is not present when you write the output from STREAM, so it has |
|
|
17:47 | to go ahead and load the cache line, and STREAM does not count for that. |
|
|
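The uncounted load can be quantified with a short sketch. The kernel definitions follow the standard STREAM kernels; the 8-byte word (double precision) and the helper name `bytes_per_iteration` are assumptions made for this illustration.

```python
WORD_BYTES = 8  # assumed: double-precision array elements

# (arrays read, arrays written) per loop iteration
KERNELS = {
    "copy":  (1, 1),  # c[i] = a[i]
    "scale": (1, 1),  # b[i] = q * c[i]
    "add":   (2, 1),  # c[i] = a[i] + b[i]
    "triad": (2, 1),  # a[i] = b[i] + q * c[i]
}


def bytes_per_iteration(kernel, write_allocate):
    """Bytes moved between memory and cache per loop iteration."""
    reads, writes = KERNELS[kernel]
    # With write-allocate, the destination line is loaded before it is written.
    extra_loads = writes if write_allocate else 0
    return (reads + writes + extra_loads) * WORD_BYTES


for name in KERNELS:
    counted = bytes_per_iteration(name, write_allocate=False)
    moved = bytes_per_iteration(name, write_allocate=True)
    print(f"{name:5s} counted {counted} B, moved {moved} B, ratio {counted / moved:.2f}")
```

Under these assumptions the copy kernel is credited with 16 bytes per iteration while 24 bytes actually move, so the reported bandwidth is at most two thirds of what the memory system delivered; for triad the ratio is three quarters.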
17:54 | So that's why, when you look at max STREAM performance, it's not likely to |
|
|
18:03 | ever get to the peak memory bandwidth: there's an extra load that is not accounted |
|
|
18:10 | for. But there are tricks around that, |
|
|
18:20 | that many processors use to get better performance for streaming-type applications. And that's what's |
|
|
18:27 | known as non-temporal stores, which are kind of listed under the no-write-allocate, |
|
|
18:32 | where you basically bypass the cache. Through some mechanism or other, you don't |
|
|
18:41 | have to load something into cache, modify it and write it back to the |
|
|
18:47 | address in memory; you bypass the cache. Dr. Johnson? Yes? |
|
|
18:55 | Will that be used in a situation where you're fairly certain that the overhead introduced by |
|
|
19:00 | the cache is going to cost more time than, yeah, what it's saving us? |
|
|
19:05 | So, one can find out from the instructions that the compiler generates |
|
|
19:15 | whether, in the analysis that the compiler does of your source code, it has figured out that |
|
|
19:20 | it's safe to use these bypass instructions instead of the typical instruction that |
|
|
19:28 | accesses cache. Okay. And so has what they call a non-temporal |
|
|
19:37 | store. The ARM processors have it; most processors today have it because of the |
|
|
19:42 | need to get good performance when it is fairly clear and safe to |
|
|
19:51 | bypass the cache. So, a little bit then about how the caches are kind of |
|
|
20:03 | architected or designed, and I'll try to address now a little bit the |
|
|
20:12 | question that was asked before. So the structure of the memory address |
|
|
20:23 | is as shown on this slide when you have a cache system, which pretty |
|
|
20:28 | much every computer today has. So there's a cache tag, a cache index and a cache |
|
|
20:35 | line offset, and the cache line offset is perhaps the easiest to understand. With |
|
|
20:45 | respect to the memory system, it's a group of memory locations that is actually the atomic unit, |
|
|
20:52 | but obviously you don't need all the bits in the cache line. So many of |
|
|
21:02 | the processors today have, most of them, 64-byte cache |
|
|
21:07 | lines. Some have 128 bytes. If you use single precision, a word is four bytes, |
|
|
21:13 | or in double precision it's eight bytes, so that means there are a few words in each cache |
|
|
21:18 | line, so you need a way to tell which one you are interested in. |
|
|
21:22 | So that's the cache line offset, to figure out where you want to go within the |
|
|
21:29 | cache line. The other two I'll try to explain on the next several slides, what they |
|
|
21:34 | do. Yeah, but the cache index has to do with figuring out where in |
|
|
21:46 | the cache a particular cache line goes. The cache tag, then, is the part |
|
|
21:53 | of the address that is used to figure out if the data item is somewhere in |
|
|
22:00 | cache or not, and hopefully the next several slides will give some, |
|
|
22:07 | at least pictorial, understanding of how this kind of fits together. So here's a |
|
|
22:14 | little bit more on the same thing. So in the little image box on the |
|
|
22:22 | left-hand side, a little bit below the middle, it shows that the cache line offset |
|
|
22:29 | is kind of one part. And then there's the cache line address. The cache line |
|
|
22:33 | address is the part that is sent to the memory system to retrieve the |
|
|
22:38 | cache line, and then the cache line address is interpreted in these two parts: |
|
|
22:44 | the tag and the index, where the index tells where in cache things |
|
|
22:53 | go. And the tags go into what's known as a directory. That is the |
|
|
22:56 | thing that is searched when trying to figure out if the data is in cache or not, |
|
|
23:03 | and then the index is used to figure out where it lives in the cache. |
|
|
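The interpretation of an address as tag, index and offset described above can be sketched with a few shifts and masks. The function name `split_address` and the small field widths used in the demonstration (3 offset bits, 2 index bits) are assumptions picked to keep the numbers readable.

```python
def split_address(addr, offset_bits, index_bits):
    """Split a memory address into (tag, index, offset) fields."""
    offset = addr & ((1 << offset_bits) - 1)  # position within the cache line
    line_address = addr >> offset_bits        # what identifies the cache line
    index = line_address & ((1 << index_bits) - 1)  # where in the cache it goes
    tag = line_address >> index_bits          # kept in the directory for lookup
    return tag, index, offset


# 0b101_11_010: tag = 0b101, index = 0b11, offset = 0b010
print(split_address(0b10111010, offset_bits=3, index_bits=2))  # (5, 3, 2)
```

The low bits select a word within the line, the middle bits select the location (or set) in the cache, and the remaining high bits are the tag that is compared against the directory.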
23:09 | So here are now a couple of slides that try to explain this mapping, |
|
|
23:18 | and I'm afraid that "cache line" is something I'm using somewhat ambiguously in talking. I try to |
|
|
23:26 | keep it for the piece, that is, use it for the piece of memory, |
|
|
23:33 | or collection of memory locations, in main memory that has the actual data. |
|
|
23:42 | And then that content gets moved into the cache. And on this |
|
|
23:47 | slide, the notion of frame is used for locations in the cache where a cache line |
|
|
23:52 | from memory goes. But it's sometimes hard to keep them separate, so sometimes "cache line" |
|
|
23:58 | means the data in the frame in the cache. So here is the |
|
|
24:07 | first illustration, of the fully associative cache. In this case, the cache can |
|
|
24:17 | hold eight cache lines. So in this case there are eight so-called frame locations here, |
|
|
24:25 | that is, eight different places for the cache lines. And since it's fully associative, that |
|
|
24:31 | means any one of the cache lines in memory can use any one of the frames |
|
|
24:37 | 0 through 7. In the direct-mapped cache, and I'll show on the next slide |
|
|
24:43 | how these things actually work, any one cache line |
|
|
24:51 | in memory can only use one of the eight locations. It cannot be mapped to |
|
|
24:59 | any one of the other seven. So that's the other extreme, from the fully associative to the direct-mapped. |
|
|
25:06 | Then there's the set-associative cache. In this case it is illustrated by having four |
|
|
25:12 | sets, and within each set you have two options, two frames, so this |
|
|
25:20 | is kind of a two-way, if you like, set-associative cache. So the |
|
|
25:27 | next slide shows kind of an example. And if one takes the cache line |
|
|
25:33 | that has address 15 in this simple example, it can choose, as I mentioned, any |
|
|
25:39 | of the eight locations in the fully associative cache. In the direct-mapped cache |
|
|
25:45 | it only has one choice, and it's figured out by just taking the cache line |
|
|
25:53 | address and doing a mod with the size of the cache. So if you |
|
|
26:00 | take 15 in this case, the line address in main memory, mod the number of |
|
|
26:05 | locations, eight, then you get seven. So that's its only choice. |
|
|
26:12 | And in the figure for the set-associative cache, now you do sort of a mod |
|
|
26:23 | with the number of sets instead. And the sets, you have zero through three. |
|
|
26:29 | So if you do 15 mod 4 you get the number three, so that means it |
|
|
26:33 | goes to set three. But it's not from this exercise determined which one of the |
|
|
26:39 | locations in that set is going to be used. That comes down to using the replacement |
|
|
26:46 | policy to figure that out. So now, um, one more comment I |
|
|
26:55 | want to make on this slide, and then I'll take questions if there are questions on |
|
|
27:02 | this. But both in the direct-mapped and set-associative caches, this restrictive placement can |
|
|
27:15 | be quite a drawback if the code has a very regular memory access pattern. |
|
|
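The placement rule from the slide, and the drawback with regular access patterns, can be checked with a few lines. The eight frames, four sets and line address 15 come from the example above; the stride of 8 line addresses is an assumed illustration of a "very regular" pattern.

```python
def direct_mapped_frame(line_address, num_frames):
    """Frame used by a line in a direct-mapped cache."""
    return line_address % num_frames


def set_index(line_address, num_sets):
    """Set used by a line in a set-associative cache."""
    return line_address % num_sets


print(direct_mapped_frame(15, 8))  # 7: the only frame line 15 can use
print(set_index(15, 4))            # 3: line 15 may use either frame of set 3

# A regular stride equal to the number of frames: every line collides.
strided_lines = range(0, 64, 8)
frames_used = {direct_mapped_frame(a, 8) for a in strided_lines}
sets_used = {set_index(a, 4) for a in strided_lines}
print(frames_used, sets_used)      # {0} {0}
```

All eight strided lines compete for frame 0 in the direct-mapped cache, and for the two frames of set 0 in the two-way cache, while the rest of the cache stays unused.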
27:31 | My canonical example is the fast Fourier transform. For those of you who are familiar with |
|
|
27:39 | it, um, the data access pattern tends to have strides that are powers |
|
|
27:45 | of two. And that means, in a direct-mapped cache, it tends to be that the, |
|
|
27:55 | um, data requests made, in fact, map to the very same cache |
|
|
27:58 | line. So then you get poor cache utilization. You get that a bit |
|
|
28:03 | better in the set-associative. But it could still be the case that |
|
|
28:10 | the whole range or sequence of accesses ends up going to the |
|
|
28:15 | same set, and that means very poor cache behavior. So I said that I'd |
|
|
28:21 | take questions at this point, if anyone has any. I will do an example |
|
|
28:31 | later. Huh? Was there something in the chat? |
|
|
28:38 | Can you tell me? No. I said there is nothing in the |
|
|
28:42 | chat. All right. So I guess before doing the example, I will illustrate how |
|
|
28:49 | this address interpretation works in the three cases: direct-mapped, fully associative and |
|
|
28:54 | set-associative. Um, so in direct-mapped, we had these three parts: |
|
|
29:03 | the tag, the index and the offset. And, um, in a direct |
|
|
29:09 | map, the index basically just tells what cache line, or what location in |
|
|
29:18 | the cache, the cache line is assigned to. So when you try to figure |
|
|
29:22 | out whether the data you want is in the cache or not, the only |
|
|
29:31 | thing you have to do is to inspect the tag for that particular cache location and compare it |
|
|
29:41 | to the tag associated with the data you want to use. So it's a |
|
|
29:45 | simple comparison to figure out whether the data is indeed present in the cache or |
|
|
29:52 | not. And then the offset within the line is pretty straightforward. So this is the |
|
|
30:00 | direct-mapped cache. It's very simple and it's inexpensive, but |
|
|
30:07 | as already mentioned, it has potential problems in being able to benefit from the cache. |
|
|
30:18 | Take the other extreme, and that's the fully associative cache. Then there is, |
|
|
30:25 | um, basically, no location in the cache that needs to be interpreted. So the |
|
|
30:32 | tag is all the bits other than the bits used to find |
|
|
30:39 | where in the cache line you are. So then the picture looks more like |
|
|
30:44 | this, for what's supposed to be happening. Now, since the cache line can go |
|
|
30:48 | anywhere in the cache, one has to look up all the different tags |
|
|
30:57 | for all the frames, or locations in cache where a cache line can go, and |
|
|
31:06 | figure out if any one of those happens to match the tag associated with the cache line |
|
|
31:11 | you're trying to access data from or write to. So basically, you have |
|
|
31:18 | to do the search, or comparison, of the target tag with all the tags. |
|
|
31:26 | So this is a lot more expensive in terms of, um, deciding whether |
|
|
31:35 | it's there or not. So time-wise, it's more cumbersome and |
|
|
31:41 | expensive. So yes, and then, I guess, the set-associative cache is kind of |
|
|
31:48 | a trade-off with the fully associative, offering some flexibility compared to the |
|
|
31:56 | direct-mapped cache, but limiting the search that is required, ah, compared to |
|
|
32:04 | the fully associative cache. So the cache index is the bits |
|
|
32:12 | used to identify which set is going to be used to map the cache line |
|
|
32:23 | in memory. So then the picture looks somewhat like this, where you |
|
|
32:29 | have a tag, and the search for whether the cache line is present in the cache |
|
|
32:34 | or not is just searching the tags associated with the cache lines, or the frames, |
|
|
32:42 | in the set that is targeted by the index. So, right, any questions |
|
|
32:55 | on that? So it's just the trade-offs, and this overhead is sort of, |
|
|
33:07 | yeah, depending upon what the processor designers decided to do and how they implement |
|
|
33:15 | the search mechanisms, which sets how many cycles it takes to determine what's in cache |
|
|
33:22 | or not. Now, um, a little bit of comments here on |
|
|
33:32 | what one can do to improve cache performance, and then I'll do a |
|
|
33:37 | little bit of a walk-through of an example. Um, the access time is the hit time |
|
|
33:44 | plus miss rate times the miss penalty. Of course, one wants caches to be fast. You |
|
|
33:50 | know, one wants to reduce the time for the hit, one wants to reduce |
|
|
33:55 | miss rates, one wants to reduce the miss penalty, all of these things. But |
|
|
34:01 | how do you do it? And there's a trade-off, so I think |
|
|
34:03 | on the next slide I'm trying to comment on a few of those. So, of |
|
|
34:08 | course, larger cache sizes are beneficial in many regards, um, because that means |
|
|
34:16 | more data is closer to the functional units. So, you know, it reduces the miss |
|
|
34:27 | rate, or, well, the chances of a miss, because there's more stuff in |
|
|
34:33 | cache. On the other hand, the larger the cache is, then the |
|
|
34:38 | hit time increases, because the search process becomes longer. So, on the |
|
|
34:47 | other hand, you can also increase associativity; that reduces the conflict |
|
|
34:55 | misses, because there are more places a cache line can go to. Um, but |
|
|
35:02 | then the search time increases to figure out whether it's there or not. Another |
|
|
35:09 | option is to increase the cache line size, so you load a larger |
|
|
35:15 | chunk at a time. And so that means the number of misses may decrease, since |
|
|
35:27 | you get more, if you have spatial locality. Yeah, clearly |
|
|
35:35 | you want to use all of the data that comes in with a given cache line fetch. On the |
|
|
35:42 | other hand, for a fixed size cache, that means fewer cache lines |
|
|
35:47 | in the cache, and that may increase the conflict misses. And of |
|
|
35:55 | course, since you load a larger number of data items, the miss penalty is |
|
|
36:01 | bigger for large cache lines. So that's why it tends to be that cache lines |
|
|
36:07 | have stayed for a very long time at, not unusually, 64-byte cache |
|
|
36:13 | lines. And if you go back and look, I think it was the |
|
|
36:19 | POWER9 that actually used, for, uh, some levels of the cache, |
|
|
36:24 | 128 bytes. This is a bit just about the trade-off between size |
|
|
36:35 | and distance. Um, on the trade-off between L1 and L2: level |
|
|
36:42 | one tends to have stayed rather small, even though you can get more |
|
|
36:48 | um, uh, caches of larger sizes on the chip. |
|
|
36:56 | The level ones tend to have stayed where they are in size, and that is for |
|
|
37:02 | speed reasons: by being small in size they are close to the logic units, at short |
|
|
37:09 | distance, and distance means time in this case. I'll come to that |
|
|
37:17 | towards the end of today's lecture; it matters, not quite speed of light, |
|
|
37:22 | but related. So small means it tends to be fast. And so that's why, |
|
|
37:30 | as, um, it is kept small, what they have to do is not |
|
|
37:36 | far away, and the penalty for a miss in L1 is modest. Then the |
|
|
37:42 | speed of L1 has been kind of the driving factor. Yeah, and the other thing to |
|
|
37:49 | be aware of is that there are moves between levels of cache, unless you have bypass |
|
|
37:57 | , which does exist. But it's the normal mode of business. Okay, |
|
|
38:06 | on to the example. Any questions? All right. So I'll try to |
|
|
38:18 | work through an example of how the cache, with LRU, kind of works, |
|
|
38:27 | and I think it is useful to understand how the caches work. So in |
|
|
38:32 | this case, the word size is 16 bits, or two-byte |
|
|
38:39 | words. So this is kind of a word-addressable little system that has 256 |
|
|
38:49 | Ki words. Ki means things are powers of two instead of, as with k, |
|
|
38:56 | powers of 1000. Ki is, um, two to the power 10. |
|
|
39:05 | So, in this case, 256 Ki, uh, words in main |
|
|
39:10 | memory. Then it has a 4 Ki-word cache, and it's set-associative with |
|
|
39:18 | four cache lines per set, and the cache line size is 64 words. |
|
|
39:27 | A word again was two bytes, which means it's a 128-byte cache line in |
|
|
39:32 | this little example. And then we're going to do an exercise to figure out the |
|
|
39:37 | benefit of the cache, by assuming the cache is 10 times faster than main memory. |
|
|
39:46 | That's way underestimated; in reality it's more like a factor of 100. |
|
|
39:54 | But now the numbers become smaller, and it works for illustration purposes. |
|
|
40:00 | The code in this case kind of does a loop: it loops 15 times, and in |
|
|
40:08 | each loop it goes and gets 4352 words from successive memory locations. So it's kind |
|
|
40:16 | of simple, like using STREAM. We have little A's, B's and C's |
|
|
40:26 | going through this example. So first, to try to understand how the address is |
|
|
40:35 | going to be partitioned. So we'll see if this can work by question and |
|
|
40:44 | answer, uh, so hopefully you can type in the chat, or you guys can just speak up. |
|
|
40:51 | So the first question is: how many bits do we need for the address? |
|
|
41:01 | 256 K. Is that just the memory? I |
|
|
41:07 | mean, the memory address, right, 256 K? Yes, it is. And |
|
|
41:14 | how many bits must the address have to be able to address the 256 |
|
|
41:21 | K? So how many bits is that? 16, so 16-bit words? |
|
|
41:32 | So the address, so 16, is that what you're saying it will be? |
|
|
41:41 | Well, it would be divided, uh, by how many words could fit |
|
|
41:45 | in that many bits. Mhm. So there are 256 K items to |
|
|
41:57 | be retrieved, right, each 16 bits. So it doesn't tell you how |
|
|
42:03 | many bits there are in memory, but you need to be able to distinguish between |
|
|
42:08 | 256 K addresses, or words. It should be the log of 256 |
|
|
42:19 | times 1024, right? Right, base two. Yeah, I guess |
|
|
42:28 | so. So that's 18. So 256 is two to the power eight, and |
|
|
42:35 | the Ki is 10. So there are 18 bits. So, |
|
|
42:44 | so the next question, then: we need an offset field in terms of the address. |
|
|
42:52 | So how many bits do we need to find a word within the cache line? |
|
|
43:03 | Okay. How many choices are there in the cache line? 256? Not |
|
|
43:32 | the cache line. I'm sorry. Four, right? So we |
|
|
43:40 | need two bits if we have four. The cache line has 64 words. I'm |
|
|
43:49 | sorry. Yes, it's confusing. That's why I was going through this. |
|
|
43:53 | So, uh, there are a few different things to keep track of. |
|
|
43:57 | One is the number of memory addresses; we went through that and figured out there are |
|
|
44:03 | two to the 18 different memory addresses we need to keep track of. Then the |
|
|
44:11 | cache line has 64 words. So we need to be able to distinguish between 64 |
|
|
44:16 | items. So, okay, it should be log base two of 64 this time. |
|
|
44:23 | Yes, correct. So that's two to the six. Sorry, I'm so used to |
|
|
44:28 | using these things. I'm always thinking of things as powers of two, |
|
|
44:32 | so I have them easily in my head. So then, the size |
|
|
44:40 | of the whole address field is 18 bits, and now we have one part of |
|
|
44:46 | the address field, the offset, that is six bits. Then the next |
|
|
44:50 | thing we're going to try to figure out is how many bits we need to |
|
|
44:57 | figure out which set a cache line is going to be mapped into. So |
|
|
45:08 | how many sets do we have? Well, it wasn't stated, so we have |
|
|
45:22 | to figure it out. And how do we figure |
|
|
45:26 | out how many sets there are? Well, we do have some information: |
|
|
45:36 | we know that one cache line... Go ahead. Yeah, four cache lines per set, and |
|
|
45:44 | 64 cache lines, so 64 divided by four, that many sets would be |
|
|
45:51 | there, because four cache lines occupy a set. Yes. So, say that again. |
|
|
46:03 | So how many words are in one set? |
|
|
46:17 | So each set holds four cache lines, each |
|
|
46:37 | of which is 64 words. So a set, in fact, has four times |
|
|
46:46 | 64, or 256, words. So the set size is 256 words. |
|
|
47:04 | Now the total number of words in cache is four times 1024, which is |
|
|
47:14 | 4096. So 4096 is the total, and each set holds |
|
|
47:21 | 256. So that means, in fact, that there are 16 sets, which |
|
|
47:31 | is 4096 divided by 256. So 16 sets, that's four |
|
|
47:43 | bits. So here is a little picture of what that is. |
|
|
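The field widths worked out in the discussion above can be reproduced with a few lines of arithmetic. The variable names are made up for the sketch; the tag width is not stated in the discussion and is derived here as the bits left over after the offset and index.

```python
memory_words = 256 * 1024  # 256 Ki words of main memory, word-addressable
cache_words = 4 * 1024     # 4 Ki-word cache
line_words = 64            # words per cache line
ways = 4                   # cache lines per set


def bits_for(n):
    """Bits needed to distinguish n items, for n a power of two."""
    return n.bit_length() - 1


address_bits = bits_for(memory_words)          # 18
offset_bits = bits_for(line_words)             # 6
num_sets = cache_words // (line_words * ways)  # 4096 / 256 = 16
index_bits = bits_for(num_sets)                # 4
tag_bits = address_bits - offset_bits - index_bits  # derived: 8

print(address_bits, offset_bits, num_sets, index_bits, tag_bits)  # 18 6 16 4 8
```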
47:51 | I hope everyone is with me on how this exercise works. Yeah, whenever we say |
|
|
47:57 | word, that's the unit that we cannot break down any further, right? |
|
|
48:02 | Yes. Okay, that's, yeah, in this context. And a word in most |
|
|
48:15 | cases, actually, so like in single precision, the word would be 32 |
|
|
48:25 | bits. It doesn't mean that the computer couldn't be byte-addressable or bit-addressable, |
|
|
48:30 | but in this simple example, the word is the smallest addressable unit. All right, |
|
|
48:40 | so now, on the same example, first we do it with no |
|
|
48:46 | cache. And then we're going to figure out how it works with the cache. |
|
|
48:53 | So no cache is very simple. We said that a memory access takes 10 times as |
|
|
49:00 | long as the cache access. With no cache, every reference is a main memory reference, |
|
|
49:08 | and we had 15 iterations, and in each iteration 4352 accesses. And so you |
|
|
49:16 | just multiply the numbers together and we get 652,800 |
|
|
49:24 | times t, whatever that time unit is. So that's |
|
|
49:33 | simple and not so interesting. So now, trying to figure out how, as I said |
|
|
49:36 | to you, it actually works with the cache. So the issue is that the cache line could hold |
|
|
49:48 | 64 or the cash line size was . Um, and it's the figure |
|
|
50:00 | and they go through these 40 to references. You know, a cache |
|
|
50:07 | at the time. That means that have to retrieve 68 cash lines to |
|
|
50:20 | or loads all the 43 52 And now there was, Ah, |
|
|
50:33 | cash. So this reads wrongly. sorry. So there's, um But |
|
|
50:46 | slides are consistent and not matching the in the previous slides. And now |
|
|
50:50 | using cash lines and earlier was 64 cash line size in bytes. It's |
|
|
50:58 | 68 of this, and now I'm going to use for this exercise that |
|
|
51:04 | of a cache line size, UM, 64 items and not worry |
|
|
51:13 | much about how this works out. see you on the next slide. |
|
|
51:21 | it will make it more clear. What it has is four sets and, I guess, 64, |
|
|
51:32 | no, 16 sets with four cache lines per set and 64 words per cache |
|
|
51:46 | line. So, getting the picture on the board straight. Sorry, I |
|
|
51:50 | got myself confused. So the time for fetches from cache is very |
|
|
51:57 | simple. If it hits in the cache, it is whatever the cache access time is. And |
|
|
52:04 | then the number of accesses per iteration, and then the number of iterations. |
|
|
52:10 | So it's just a simple product. Then the issue is figuring out how many misses |
|
|
52:17 | are going to happen when the cache doesn't fit or hold all the |
|
|
52:27 | data, so we'll figure these things out. So in this case, the miss |
|
|
52:33 | penalty is again the block size, in this case, or cache line size actually. |
|
|
52:39 | Block is very often used as a synonym for cache line. |
|
|
52:46 | So basically, the cache line size, 64, times the memory access time, which was |
|
|
52:54 | 10. So that's the penalty for each miss. Then we need to figure out |
|
|
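A small sketch of the timing model as it reads from the lecture: every access pays the cache access time, and every cache line miss additionally pays the line size times the memory access time. Combining the two terms into one total is an interpretation of the spoken description, and the function names are illustrative.

```python
CACHE_ACCESS_TIME = 1    # one unit of t per cache access
MEMORY_ACCESS_TIME = 10  # main memory is 10 times slower, per the lecture
LINE_WORDS = 64          # words per cache line


def miss_penalty(line_words, memory_access_time):
    """Cost of bringing one whole cache line in from main memory."""
    return line_words * memory_access_time


def total_time(accesses, line_misses, hit_time, penalty):
    """Hit time for every access plus the penalty for every line miss."""
    return accesses * hit_time + line_misses * penalty


penalty = miss_penalty(LINE_WORDS, MEMORY_ACCESS_TIME)
# One pass over 4352 words with only the 68 compulsory line misses.
one_pass = total_time(4352, 68, CACHE_ACCESS_TIME, penalty)
print(penalty, one_pass)
```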
52:58 | the number of misses, and this is what's on the next few slides here. |
|
|
53:07 | So basically, first you get these cold, or compulsory, misses, because there's nothing |
|
|
53:13 | in the cache. When you start, you have to read everything. And so |
|
|
53:17 | in this case, we had the 16 sets that we figured out, and each |
|
|
53:25 | set had four, um, cache lines, and each cache line had, I |
|
|
53:34 | guess, um, the 64 bytes. So, you know, as you read this, |
|
|
53:48 | so we had 64 loadings of cache lines. So what you see in black |
|
|
54:04 | is that you start to load the first cache line, and it goes into set |
|
|
54:10 | zero, and then the next cache line goes into set one. And after you have read 16 |
|
|
54:19 | cache lines, you have put one cache line into each one of the 16 |
|
|
54:25 | sets, and then you use kind of the next location or frame in a given |
|
|
54:31 | set. So then the subsequent cache lines go into, um, the second |
|
|
54:39 | slot, so to speak, in each one of the cache sets, and it's all |
|
|
54:46 | fine until you run out of room and you have basically filled up all |
|
|
54:52 | the slots in the cache. So when you read the 65th item, or item |
|
|
55:00 | number 64, then you have to overwrite something. So in this case, what's the |
|
|
55:08 | policy? LRU. So you overwrite, um, cache line zero. And then |
|
|
55:15 | you overwrite cache line one. And so on, so you have four misses when you |
|
|
55:24 | finish the first iteration of the loop of 15, and the next time around, |
|
|
55:33 | you start again. So now you no longer have cache line zero in the |
|
|
55:42 | cache, because that was overwritten towards the end of the first iteration. So now |
|
|
55:48 | you need to reload cache line zero. And now again, with the LRU |
|
|
55:57 | policy, you then replace cache line 16. You go to basically sort of the |
|
|
56:05 | next slot, in a sense. So you end up, when you load cache |
|
|
56:12 | lines 0, 1, 2, 3, overwriting the slot in the first four sets, |
|
|
56:20 | and you're good again for a while until you need to load cache line number |
|
|
56:28 | 16. Well, 16 was, in fact, overwritten, um, just now by |
|
|
56:36 | cache line zero. So now you need to load cache line 16. And with |
|
|
56:41 | the LRU policy, it evicts cache line 32. So now |
|
|
56:47 | you get a sequence of four more misses; 16 through 19 were overwritten. And |
|
|
56:56 | then you're OK again for 20 to 31, and then you miss again. |
|
|
57:04 | So in this case, uh, in fact, you miss on five groups of the cache |
|
|
57:13 | line loads in this iteration. You miss the first four, and then you miss |
|
|
57:19 | another four because, um, what was in these sets before was overwritten because of |
|
|
57:29 | LRU. You keep doing that. And then again, like in |
|
|
57:33 | the first iteration, yeah, you again have to, uh, find a place for |
|
|
57:42 | cache lines 64 through 67. And that again goes by the LRU policy. So |
|
|
57:50 | it again overwrites what is in the second, um, slot in |
|
|
57:57 | the first four sets. So you basically get this behavior because of LRU, the worst |
|
|
58:05 | possible behavior. So in the end, you end up figuring out that, even |
|
|
58:11 | when you add it all up in terms of the number of hits and misses, |
|
|
58:16 | yes, things improved, but even though you were only four cache lines short |
|
|
58:22 | in this simple example, and the cache is 10 times faster, ah, it only |
|
|
58:28 | improved the performance by about a factor of two, because of the expense of |
|
|
58:32 | cache line misses and the replacement policy. So any questions on that? Otherwise |
|
|
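The walkthrough above can be checked with a short simulation. This is a sketch under stated assumptions rather than the lecture's own code: cache lines are numbered 0 to 67, line k maps to set k mod 16, each set is 4-way with LRU replacement, and the 68 lines are swept in order 15 times. The timing model (one unit per access plus 64 times 10 units per line miss) is the interpretation used earlier in the lecture.

```python
from collections import OrderedDict


def count_line_misses(num_lines, num_sets, ways, iterations):
    """Count cache line misses for repeated sequential sweeps under LRU."""
    # One OrderedDict per set; insertion order tracks recency (oldest first).
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for _ in range(iterations):
        for line in range(num_lines):
            cache_set = sets[line % num_sets]
            if line in cache_set:
                cache_set.move_to_end(line)        # hit: mark most recently used
            else:
                misses += 1
                if len(cache_set) == ways:
                    cache_set.popitem(last=False)  # evict least recently used
                cache_set[line] = True
    return misses


misses = count_line_misses(num_lines=68, num_sets=16, ways=4, iterations=15)
accesses = 15 * 4352
cached_time = accesses * 1 + misses * 64 * 10   # units of t
uncached_time = accesses * 10
print(misses, cached_time, uncached_time / cached_time)
```

Under these assumptions the first sweep misses on all 68 lines, and every later sweep misses on the 20 lines that map to sets 0 through 3, because five lines compete for four slots there and LRU always evicts the one needed next.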
58:40 | I will talk about main memory. So that was what I had planned to talk |
|
|
58:45 | about in terms of caching and an illustration of how things work. So I'll move |
|
|
59:03 | on, then. So a little bit about this; it's a bit of an old slide, and I |
|
|
59:09 | haven't found any good, you know, data. But I'm sure things have |
|
|
59:13 | not become more relaxed than they were at the time this slide was done, in |
|
|
59:19 | terms of the importance of speed. So Amazon claims 100 milliseconds cost them 1% in |
|
|
59:25 | in terms of delay or latency, and at the time half a |
|
|
59:31 | second cost them 20% in, uh, and for brokers and stock traders, you guys |
|
|
59:43 | know that it may cost them, you know, millions per millisecond, uh, |
|
|
59:50 | So in some businesses, delays are expensive. This is just a |
|
|
59:58 | slide that I took from somewhere, actually from John McCalpin, |
|
|
60:03 | basically pointing out again the gap between CPUs and main memory. And |
|
|
60:11 | I'll try to just talk a little bit about why that is and how one is trying |
|
|
60:18 | to address it, to keep the gap as small as possible. So in terms |
|
|
60:28 | of bandwidth, I talked about it before: it is that processors over time |
|
|
60:34 | have gotten more and more memory channels. So basically the bandwidth to main memory |
|
|
60:42 | has increased by adding sort of parallelism in terms of memory accesses. Another big part |
|
|
60:52 | of why memory tends to be slow relative to processors is how things are packaged, |
|
|
61:00 | and I will talk about that as well in the next several slides. So |
|
|
61:07 | GPUs have had an advantage compared to server processors because they have had, |
|
|
61:12 | um, more or wider paths to memory and use a different kind of packaging known as |
|
|
61:20 | GDDR, as opposed to the other memories, and I'll talk more about |
|
|
61:25 | that later. And more recently, one has again tried to increase the width of |
|
|
61:32 | the data paths to memory by using these so-called high bandwidth memories, and I |
|
|
61:37 | will be talking about that. So I'll focus a little on servers, but also point out |
|
|
61:43 | , too, other types of computing, and then talk about the memory |
|
|
61:48 | itself. So first, several pieces and the things one talks about, and the |
|
|
61:54 | one that, for server-based computers, one knows about is DIMMs. And that's what's |
|
|
61:59 | known as dual in-line memory modules. So that's basically packaging of memory chips |
|
|
62:06 | in order to, um, I would say, match the memory bus width. |
|
|
62:14 | So most memory buses are 64 bits wide, whereas memory chips rarely |
|
|
62:21 | output 64 bits, so they have much fewer, typically four, eight or |
|
|
62:27 | 16, but sometimes 32 bits. So that means you need a number of memory chips |
|
|
62:31 | in order to match the width of the memory channels. And what's on this |
|
|
62:37 | slide from top to bottom is different generations, and the current generation is known as |
|
|
62:43 | DDR4 memories, and I'll talk more about that. Uh, you see these kinds |
|
|
62:49 | of DIMMs, and it is just a little board with a bunch of chips on |
|
|
62:55 | it that makes up the memory, and I'll come to how these things are structured on |
|
|
63:01 | this little circuit board. And a curiosity, perhaps more than anything else, is that in |
|
|
63:08 | order not to plug the wrong kind of memory into your memory slot, |
|
|
63:13 | there is a little notch that makes things sit in properly. Yeah, |
|
|
63:20 | there is also a concept that is important to understand in terms of these DIMMs, that |
|
|
63:25 | is something known as ranks, and the rank is the collection of memory chips that |
|
|
63:34 | matches the width of the memory bus, which, as I said, today is pretty |
|
|
63:41 | much 64 bits wide. So a rank is a collection of memory chips |
|
|
63:48 | that ends up being able to accept or deliver 64 bits at a time in |
|
|
63:56 | terms of data. These chips can be on one side of the circuit board, |
|
|
64:04 | or they can be mounted, uh, together when they are sort of too big for one |
|
|
64:10 | side. So if, uh, the room on one side is not sufficient to |
|
|
64:15 | hold the memory, then some of them also go on the back side, |
|
|
64:19 | so to speak, of this, um, circuit board. So that's common. |
|
|
64:25 | That's why I said double-sided card: the rank covers both sides, as opposed |
|
|
64:30 | to a single side of the card. So even though they have chips on both sides |
|
|
64:35 | of the card, it doesn't mean that it's a single rank, because the chips on |
|
|
64:40 | one side may be fully capable of matching the memory bus width. So in that |
|
|
64:46 | case, it becomes a double-sided, dual-rank or two-rank memory module. |
|
|
64:54 | And there are also these DIMMs that basically have several of these ranks on the same side |
|
|
65:00 | of the card. So this tells a bit of how these things work. The |
|
|
65:06 | point of these ranks is that it's partially a packaging matter, but only one |
|
|
65:14 | rank at a time can talk to the bus. So when you have several ranks |
|
|
65:21 | on a given DIMM, you need to select which one of these ranks you want |
|
|
65:29 | to communicate with. So there's a selection that is necessary, known as chip |
|
|
65:38 | select. How many chips you need depends on the width of each one of these chips; |
|
|
65:42 | the width is the number of bits it outputs, and as I mentioned, typically |
|
|
65:46 | four through 16 are the most common. And that's why, you know, it takes |
|
|
65:55 | eight chips that are eight bits wide to make up a 64-bit word. So if |
|
|
66:01 | you look at this slide, you can see 1, 2, 3, 4, 5, 6, 7, and it's |
|
|
66:11 | eight or nine. Actually, um, there's a big thing in the middle, but |
|
|
66:15 | there are four equal-size pieces of silicon, effectively, on the right and |
|
|
66:23 | four on the left. So that is likely something that is using x8 |
|
|
66:30 | memory chips, so that means eight chips. And that also means this DIMM uses |
|
|
66:37 | error-correcting code that uses eight bits. So 64 plus eight, that gives you |
|
|
66:44 | 72, or nine chips. Um, then a little bit about how this is |
|
|
66:50 | used in configuring your computer systems: typically the number of bits that is |
|
|
66:59 | on the memory chip is not tied to the width of the chip. |
|
|
67:08 | So, all right, like in this case, a 2-gigabit chip may be |
|
|
67:12 | times four or times eight. And that tells you how many gigabytes of memory your |
|
|
67:21 | DIMM holds. Because if you have times-four memory chips, then for a 64-bit |
|
|
67:29 | bus, you need 16 of them to match the bus. And if it's |
|
|
67:33 | two gigabits per chip, then you get four gigabytes. On the other |
|
|
67:38 | hand, if you use times-16 chips, you only need a quarter as many chips. |
|
|
67:44 | So for the same size memory chip, you only get a quarter of the amount |
|
|
67:49 | of memory. So these are things that one chooses among when one configures, uh, |
|
|
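The chip-count and capacity arithmetic for one rank can be sketched as follows. The 64-bit bus, the extra 8 bits for error-correcting code, the chip widths and the 2-gigabit chip size are the lecture's numbers; the function names are illustrative.

```python
def chips_per_rank(bus_bits, chip_width_bits):
    """Number of chips needed so their combined width matches the bus."""
    return bus_bits // chip_width_bits


def rank_capacity_gbytes(chip_gbits, num_chips):
    """Capacity of one rank in gigabytes (8 bits per byte)."""
    return chip_gbits * num_chips / 8


x4_chips = chips_per_rank(64, 4)     # times-four chips on a 64-bit bus
x16_chips = chips_per_rank(64, 16)   # times-16 chips on a 64-bit bus
ecc_x8_chips = chips_per_rank(64 + 8, 8)  # 64 data bits plus 8 ECC bits

print(x4_chips, rank_capacity_gbytes(2, x4_chips))    # chips, gigabytes
print(x16_chips, rank_capacity_gbytes(2, x16_chips))  # chips, gigabytes
print(ecc_x8_chips)
```

For the same 2-gigabit chip, the times-16 organization needs a quarter of the chips and therefore yields a quarter of the capacity per rank.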
67:54 | systems. And here is a little bit more of how things may look, as a picture |
|
|
68:00 | of a circuit board, and it shows basically two CPUs in a dual-socket thing, and it's |
|
|
68:07 | the blue things that are framed in red that are where these DIMMs go on the |
|
|
68:13 | circuit board. In this case, each one of the two CPUs can host four DIMMs. |
|
|
68:21 | It just so happens that they are placed two on each side of the CPUs |
|
|
68:28 | issues. So how you configure these, I'll go through quickly; it's, |
|
|
68:34 | um, not something that you do when you write your code. But if you |
|
|
68:38 | configure a computer system, it's something one should be aware of, how things work. |
|
|
68:44 | So these are examples with a different number of memory channels per, um, |
|
|
68:50 | CPU. So on the top are basically the memory channels, and it shows what's really |
|
|
68:55 | the logic: you can either have two or three of these DIMMs per memory |
|
|
69:01 | channel, but there are also cases where you can only have one. And there is more |
|
|
69:09 | to this story because, as it tries to illustrate on the top row here, |
|
|
69:17 | when you have two of these DIMMs per channel, their speed is like |
|
|
69:24 | 33 megahertz, whereas on the other side, where you have three |
|
|
69:30 | DIMMs per channel, it is 800 megahertz. That's a reflection of the fact that |
|
|
69:36 | the more DIMMs you put on a memory channel, the harder it gets to service |
|
|
69:43 | them. So it tends to be that the more DIMMs you put on the channel, |
|
|
69:49 | or even the more ranks you put on a channel, um, the lower the |
|
|
69:56 | data rate ends up being. So here's another one from HP just showing different configurations that, |
|
|
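To see what a lower data rate costs, peak channel bandwidth can be estimated as bus width in bytes times transfers per second. This is a rough sketch under assumptions: it treats the quoted 800 figure as millions of transfers per second on a 64-bit channel, and the channel count passed in is a hypothetical input (six is used only because the lecture mentions processors with six memory channels).

```python
BUS_BYTES = 8  # a 64-bit memory channel moves 8 bytes per transfer


def peak_bandwidth_mbytes_per_s(mega_transfers_per_s, channels=1):
    """Theoretical peak bandwidth in megabytes per second."""
    return BUS_BYTES * mega_transfers_per_s * channels


one_channel = peak_bandwidth_mbytes_per_s(800)
six_channels = peak_bandwidth_mbytes_per_s(800, channels=6)
print(one_channel, six_channels)  # megabytes per second
```

Measured bandwidth, for example from the STREAM benchmark, is lower than this peak because of refresh, bank conflicts and other overheads.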
70:03 | again, show that they tend to, um, for the memory, you can |
|
|
70:09 | have three per channel and a total of 18 DIMMs; this is what they call maximum |
|
|
70:16 | capacity and maximum number of DIMMs. And at the bottom you have something where speed |
|
|
70:21 | is more important, but it also limits the number of, uh, DIMMs you can put in. |
|
|
70:28 | And here is just another example that shows that typically the number of ranks it |
|
|
70:34 | can support per channel is eight, and it doesn't matter how the ranks are |
|
|
70:40 | distributed among DIMMs, and that has in part to do with the addressing of |
|
|
70:46 | which rank you want. So basically eight ranks is three bits. So this is what |
|
|
70:55 | I pretty much said already. So here is a little bit more, |
|
|
71:02 | a few pictures showing you on the , and these kids, they will |
|
|
71:06 | DIMMs, and so on, and how this, uh, selection |
|
|
71:14 | of, um, ranks is being carried out. And here are just a few fairly |
|
|
71:22 | simple illustrations of how these things work, and of the number of memory channels on |
|
|
71:27 | some of these processors out there. And one more important aspect in trying to |
|
|
71:35 | understand how things actually work, um, when it comes to the main memory, |
|
|
71:40 | and that is, it depends a bit: there may be more than one memory |
|
|
71:46 | controller, and it's also typical that a memory controller serves more than one memory channel on these |
|
|
71:58 | processors. Uh, in the one case, where there's one memory controller per memory |
|
|
72:02 | channel, which is not typical; it's more typical that the memory controller serves two |
|
|
72:12 | or three memory channels. So, for example, Intel has two memory controllers for their six memory |
|
|
72:21 | channels, and I'll come back to that in a little bit. This is kind |
|
|
72:24 | of the NUMA aspect that I talked about in terms of the processors in the |
|
|
72:29 | socket, and that's why some, um, memory is a bit further away, because |
|
|
72:38 | in order to get to it you go via another socket. So even though the address space is known |
|
|
72:43 | as shared memory, it's not at an equal distance in time or |
|
|
72:51 | latency. So here is kind of the thing I was referring to that I was coming |
|
|
72:55 | to, in terms of what the poor memory controller, even if it's just a single |
|
|
73:01 | one, needs to deal with. So there is basically a number of cores; |
|
|
73:06 | each has a number of threads running that need to access parts of memory, |
|
|
73:14 | so they all send their requests to the memory controller, which then needs to |
|
|
73:19 | try and figure out on which channel this data is located, then, on |
|
|
73:26 | that channel, on which rank it is, and then within that rank where it |
|
|
73:33 | is located. And that comes down to something called banks, rows and columns, which has to do |
|
|
73:39 | with how memory chips are actually designed. So there's a whole lot |
|
|
73:44 | of selection that needs to be done. And then the memory controller also needs to |
|
|
73:48 | worry about latencies and outstanding requests, keeping track of what belongs to whom |
|
|
73:55 | when things come back from memory. Um, all right, well, |
|
|
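The chain of selections the memory controller makes (channel, rank, bank, row, column) can be illustrated by slicing an address into bit fields. The field order and every width except the three rank bits (eight ranks per channel, as mentioned in the lecture) are hypothetical choices made for this sketch; real controllers use vendor-specific mappings, often with interleaving or hashing.

```python
# Hypothetical layout, lowest-order field first: (name, number of bits).
FIELDS = [
    ("column", 10),
    ("channel", 2),   # assumed: 4 channels
    ("bank", 3),      # assumed: 8 banks
    ("rank", 3),      # 8 ranks per channel needs 3 bits
    ("row", 14),
]


def decode(address):
    """Split an address into the fields a memory controller would select on."""
    fields = {}
    for name, bits in FIELDS:
        fields[name] = address & ((1 << bits) - 1)  # take the low `bits` bits
        address >>= bits                            # move to the next field
    return fields


# Build an example address from known field values, then decode it again.
example = (3 << 18) | (5 << 15) | (2 << 12) | (1 << 10) | 7
print(decode(example))
```

Placing the channel bits low in the address, as assumed here, spreads consecutive blocks across channels so that sequential accesses can proceed in parallel.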
74:02 | then, this is just another way of dealing with memory that is more |
|
|
74:07 | common in embedded and mobile systems, where you don't have these slots for modular memories like DIMMs |
|
|
74:17 | to plug into slots. Instead you take the memory chips and basically solder |
|
|
74:21 | them onto, um, the circuit board. So here the white boxes are in fact |
|
|
74:28 | memory chips that are soldered onto the board, and the shiny bigger |
|
|
74:33 | things are processors or other things. Another thing that is now used to get high bandwidth is |
|
|
74:39 | to take memory chips, stack them on top of each other, and then you get |
|
|
74:46 | communication paths to get data from the chips that are stacked onto each |
|
|
74:51 | other, uh, through something known as through-silicon vias, TSVs. So this |
|
|
74:58 | is a relatively expensive way of doing things. And the technology hasn't been affordable until |
|
|
75:05 | the last couple of years, but it's now being used for high-end processors, |
|
|
75:10 | in particular for GPUs. And here is a little bit of that. So if you go back |
|
|
75:18 | and look at the processor lecture slides for GPUs, you will get some of |
|
|
75:23 | the data on the bandwidth for HBM. I will |
|
|
75:29 | try to cover a couple more things in the last few minutes here, and then, I |
|
|
75:36 | guess, follow up next lecture to finish up. But memory is using the same |
|
|
75:43 | technology as the processors are. Everything today is using what is known, |
|
|
75:49 | as CMOS, or complementary metal-oxide semiconductor. There are two different designs, one for dynamic |
|
|
75:58 | random access memory, DRAM, and one for static random access memory, which is |
|
|
76:04 | SRAM. Um, and so here, um, DRAM is the thing that, for density, is the main-memory |
|
|
76:11 | type of memory cell, and SRAM is the cache type of memory cell. So |
|
|
76:19 | these are the DRAM cells, essentially one transistor per cell, and, uh, SRAM is |
|
|
76:27 | six transistors, so that means it's larger than the DRAM. So |
|
|
76:34 | that means it's not as dense, which is part of the reason why it's more expensive. The |
|
|
76:40 | other one is, since it's used for caches, where you want speed, it is also |
|
|
76:47 | designed for speed, so other design constraints also make it more expensive. |
|
|
76:53 | Then, perhaps more importantly, well, one property is that the DRAM, |
|
|
77:03 | it actually forgets, which is the dynamic part, so it doesn't keep the information |
|
|
77:09 | for long. So that's why, in order for the DRAM to retain |
|
|
77:17 | data, it needs to go through what's known as a refresh cycle. And let me see |
|
|
77:22 | what my next slide was supposed to be. So this is the way things |
|
|
77:30 | are being laid out. So each one of these cross points is in effect kind of |
|
|
77:34 | a memory location in the DRAM. So it's organized as a matrix. So you |
|
|
77:42 | have both the row and column address of an item. But my time is |
|
|
77:49 | up. So I would like to start next time and try to cover that and try |
|
|
77:54 | to explain addressing, and this is part of the reason why the DRAM is so |
|
|
78:03 | slow. I'll try to explain that next time, and then I'll take questions. |
|
|
78:15 | And you're welcome, obviously, to ask questions next time when I continue talking about memory. |
|
|
78:25 | Uh-huh. Okay. If there are no questions this time, then I will stop |
|
|
|