Big and Little Endian
轉
Big and Little Endian
Basic Memory ConceptsIn order to understand the concept of big and little endian, you need to understand memory. Fortunately, we only need a very high level abstraction for memory. You don't need to know all the little details of how memory works.All you need to know about memory is that it's one large array. But one large array containing what? The array contains bytes. In computer organization, people don't use the term "index" to refer to the array locations. Instead, we use the term "address". "address" and "index" mean the same, so if you're getting confused, just think of "address" as "index".
Each address stores one element of the memory "array". Each element is typically one byte. There are some memory configurations where each address stores something besides a byte. For example, you might store a nybble or a bit. However, those are exceedingly rare, so for now, we make the broad assumption that all memory addresses store bytes.
I will sometimes say that memory is byte-addresseable. This is just a fancy way of saying that each address stores one byte. If I say memory is nybble-addressable, that means each memory address stores one nybble. Storing Words in MemoryWe've defined a word to mean 32 bits. This is the same as 4 bytes. Integers, single-precision floating point numbers, and MIPS instructions are all 32 bits long. How can we store these values into memory? After all, each memory address can store a single byte, not 4 bytes.
The answer is simple. We split the 32 bit quantity into 4 bytes. For example, suppose we have a 32 bit quantity, written as 90AB12CD16, which is hexadecimal. Since each hex digit is 4 bits, we need 8 hex digits to represent the 32 bit value.
So, the 4 bytes are: 90, AB, 12, CD where each byte requires 2 hex digits.
It turns out there are two ways to store this in memory. Big EndianIn big endian, you store the most significant byte in the smallest address. Here's how it would look:
Little EndianIn little endian, you store the least significant byte in the smallest address. Here's how it would look:
Notice that this is in the reverse order compared to big endian. To remember which is which, recall whether the least significant byte is stored first (thus, little endian) or the most significant byte is stored first (thus, big endian).
Notice I used "byte" instead of "bit" in least significant bit. I sometimes abbreciated this as LSB and MSB, with the 'B' capitalized to refer to byte and use the lowercase 'b' to represent bit. I only refer to most and least significant byte when it comes to endianness. Which Way Makes Sense?Different ISAs use different endianness. While one way may seem more natural to you (most people think big-endian is more natural), there is justification for either one.
For example, DEC and IBMs(?) are little endian, while Motorolas and Suns are big endian. MIPS processors allowed you to select a configuration where it would be big or little endian.
Why is endianness so important? Suppose you are storing int values to a file, then you send the file to a machine which uses the opposite endianness and read in the value. You'll run into problems because of endianness. You'll read in reversed values that won't make sense.
Endianness is also a big issue when sending numbers over the network. Again, if you send a value from a machine of one endianness to a machine of the opposite endianness, you'll have problems. This is even worse over the network, because you might not be able to determine the endianness of the machine that sent you the data.
The solution is to send 4 byte quantities using network byte order which is arbitrarily picked to be one of the endianness (not sure if it's big or little, but it's one of them). If your machine has the same endianness as network byte order, then great, no change is needed. If not, then you must reverse the bytes. History of Endian-nessWhere does this term "endian" come from? Jonathan Swift was a satirist (he poked fun at society through his writings). His most famous book is "Gulliver's Travels", and he talks about how certain people prefer to eat their hard boiled eggs from the little end first (thus, little endian), while others prefer to eat from the big end (thus, big endians) and how this lead to various wars.
Of course, the point was to say that it was a silly thing to debate over, and yet, people argue over such trivialities all the time (for example, should braces line in parallel or not? vi or emacs? UNIX or Windows). MisconceptionsEndianness only makes sense when you want to break a large value (such as a word) into several small ones. You must decide on an order to place it in memory.
However, if you have a 32 bit register storing a 32 bit value, it makes no sense to talk about endianness. The register is neither big endian nor little endian. It's just a register holding a 32 bit value. The rightmost bit is the least significant bit, and the leftmost bit is the most significant bit.
There's no reason to rearrange the bytes in a register in some other way.
Endianness only makes sense when you are breaking up a multi-byte quantity, and attempting to store the bytes at consecutive memory locations. In a register, it doesn't make sense. A register is simply a 32 bit quantity, b31....b0, and endianness does not apply to it.
With regard to endianness, You may argue there's a very natural way to store 4 bytes in 4 consecutive addresses, and that the other way looks strange. In particular, it looks "backwards". However, what's natural to you may not be natural to someone else. The fact of the matter is that the word is split in 4 bytes, and most people would agree that you need some order to place it in memory. C-style stringsOnce you start thinking about endianness, you begin to think it applies to everything. Before you see big or little endian, you may have had no idea it even existed. That's because it's reasonably well-hidden from you.
If you do bitwise/bitshift operations on an int, you don't notice the endianness. The machine arranges the multiple bytes so the least significant byte is still the least significant byte (e.g., b7-0) and the most significant byte is still the most significant byte (e.g., b31-24).
So, it's natural to think whether strings might be saved in some sort of strange order, depending on the machine.
This is where it's useful to think about all the facts you know about arrays. A C-style string, after all, is still an array of characters.
Here are some facts you should know about C-style strings and arrays.
- C-style strings are stored in arrays of characters.
- Each character requires one byte of memory, since characters are represented in ASCII (in the future, this could change, as Unicode becomes more popular).
- In an array, the address of consecutive array elements increases. Thus, & arr[ i ] is less than & arr[ i + 1 ].
- What's not as obvious is that if something is stored in increasing addresses in memory, it's going to be stored in increasing "addresses" in a file. When you write to a file, you usually specify an address in memory, and the number of bytes you wish to write to the file starting at that address.
Since C-style strings are arrays of characters, they follow the rules of characters. Unlike int or long, you can easily see the individual bytes of a C-style string, one byte at a time. You use array indexing to access the bytes (i.e., characters) of a string. You can't easily index the bytes of an int or long, without playing some pointer tricks (using reinterpret cast, for example, in C++). The individual bytes of an int are more or less hidden from you.
Now imagine writing out this string to a file using some sort of write() method. You specify a pointer to 'c', and the number of bytes you wish to print (in this case 4). The write() method proceeds byte by byte in the character string and writes it to the file, starting with 'c' and working to the null character.
Given that explanation, is it clear whether endianness matters with C-style strings? Hopefully, it is clear.
As an aside, since C++ strings are objects, it may have complicated inner structures, and so it's less obvious what a C++ string would look like when print out to a file. It's well-known what a C-style string looks like (a sequence of characters ending in a null character), which is why I've been careful to call them C-style strings.
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/30088583/viewspace-2139273/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- Little Endian & Big Endian
- Big-Endian Little-Endian
- little-endian VS. big-endian
- 大端(Big Endian)與小端(Little Endian)簡介
- 【引用】『轉』【大端(Big Endian)與小端(Little Endian)簡介】
- big_endian和little_endian的說明(轉載)
- Little_endian和Big_endian的區別和C測試程式
- 關於Big-Endian/Little-Endian 位元組順序的簡單記錄
- 徹底搞懂字元編碼(unicode,mbcs,utf-8,utf-16,utf-32,big endian,little endian...)字元Unicode
- ELF file data encoding not little-endianEncoding
- 符號編碼-ASCII、Unicode、Unicode big endian、UTF-8之間的關係(轉)符號ASCIIUnicode
- 相信大家在閱讀有關通訊資料傳輸、PLC資料儲存等技術文件時,經常會碰到“Big-Endian”(大端對齊)與Little-Endian(小端對齊)術語。很多朋友不理解大端和小端模式,本文給大家寫一下此知識點。模式
- The Little JavaScript ClosuresJavaScript
- v$transportable_platform.endian_formatPlatform
- little skill---ping
- VM虛擬機器增加磁碟空間的擴容操作(little by little)虛擬機
- 小飛賊Little Snitch 5啟用最新版+Little Snitch 5註冊碼
- sysaux bigUX
- 最新 Little Snitch 5 for mac小飛賊防火牆「 Little Snitch 5啟用碼資源」Mac防火牆
- 什麼是Little定律(littles law)
- The furniture is playing with the little wall shelves
- 好書推薦《The-Little-Prince》
- 作業系統ENDIAN(位元組儲存次序)作業系統
- redis教程(The little redis book中文版)Redis
- The Little Redis Book中文版 關於Redis
- The Little Redis Book中文版 入門Redis
- BZOJ4374 : Little Elephant and Boxes
- RMAN跨平臺傳輸表空間(different Endian)
- RMAN跨平臺傳輸表空間(same endian)
- 小飛賊Little Snitch 5 啟用最新版+Little Snitch 5 註冊啟用碼「支援m晶片」晶片
- 2021CISCN逆向-little_evil
- Little Magnet BT Pro 4.4破解版
- http-little-toy v0.0.4-preview 更新HTTPView
- Efficient DevSecOps Workflows with a Little Help from AIdevAI
- 想要升級Big Sur?你的Mac與Big Sur相容嗎?Mac
- C# split big file into small files as, and merge the small files into big oneC#
- macos big sur正式版釋出,macos big sur安裝教程Mac
- aix lvm big vgAILVM