Byte Order
From PortaWiki
Different CPUs store multi-byte numbers in memory in different byte orders. The order a particular CPU uses is called its "endianness", and in practice it is almost always either big endian or little endian. A big endian CPU stores a multi-byte integer with its most significant byte at the lowest address, so the bytes appear in memory in the same order as they are written in hexadecimal. A little endian CPU does the reverse and stores the least significant byte first. This can be demonstrated by running the following code on different platforms.
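#include <sys/types.h>
#include <stdio.h>

int
main(int argc, char *argv[])
{
    u_int32_t n = 0x11223344;
    u_int8_t *b;
    size_t i;

    /* view the integer as an array of bytes, lowest address first */
    b = (u_int8_t *)&n;
    for (i = 0; i < sizeof(n); i++)
        printf("byte %zu: 0x%02x\n", i, b[i]);
    return (0);
}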
On a big endian CPU the following output is produced:
byte 0: 0x11
byte 1: 0x22
byte 2: 0x33
byte 3: 0x44
While on a little endian CPU you get this:
byte 0: 0x44
byte 1: 0x33
byte 2: 0x22
byte 3: 0x11
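The same test can be made at run time by looking only at the byte stored at the lowest address. The following is a minimal sketch; the helper name host_is_little_endian is hypothetical and not part of any standard library.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not a standard function): returns 1 when the
 * host stores the least significant byte first, 0 otherwise.
 */
int
host_is_little_endian(void)
{
    uint32_t n = 1;
    uint8_t first;

    /* copy the byte at the lowest address of n */
    memcpy(&first, &n, sizeof(first));
    return (first == 1);
}
```

On a little endian host the lowest byte of the value 1 is 0x01, so the function returns 1; on a big endian host that byte is 0x00 and the function returns 0.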
The endianness of a platform is generally transparent to the programmer until they need to exchange data with another computer. For example, a program that needs to store a string on disk could use an integer to store the length of the string. The following two programs will work on machines of the same endianness:
write.c:
#include <sys/types.h>
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main(int argc, char *argv[])
{
    u_int32_t len;
    char data[64] = "this is my data";
    int f;

    len = strlen(data);
    printf("length: %u\n", len);
    printf("data: \"%s\"\n", data);
    if ((f = open("myfile", O_WRONLY|O_CREAT|O_TRUNC, 0666)) == -1)
        err(1, "open");
    /* write the length of the string in the data */
    if (write(f, (void *)&len, sizeof(len)) == -1)
        err(1, "write");
    /* write the data */
    if (write(f, (void *)data, sizeof(data)) == -1)
        err(1, "write");
    close(f);
    return (0);
}
read.c:
#include <sys/types.h>
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(int argc, char *argv[])
{
    u_int32_t len;
    char data[64];
    int f;

    if ((f = open("myfile", O_RDONLY)) == -1)
        err(1, "open");
    /* read the length of the string in the data */
    if (read(f, (void *)&len, sizeof(len)) == -1)
        err(1, "read");
    /* read the data into memory */
    if (read(f, (void *)data, sizeof(data)) == -1)
        err(1, "read");
    close(f);
    printf("length: %u\n", len);
    data[len] = '\0'; /* null terminate our data for printing */
    printf("data: \"%s\"\n", data);
    return (0);
}
Running these programs on the same computer gives the following output:
user@bigendian ~$ ./write
length: 15
data: "this is my data"
user@bigendian ~$ ./read
length: 15
data: "this is my data"
However, copying the data file from the bigendian host to a little endian machine and running read does this:
user@lilendian ~$ ./read
length: 251658240
Segmentation fault (core dumped)
The little endian host interpreted the four bytes of the length in the opposite order. The big endian host wrote 15 as the bytes 00 00 00 0f, which the little endian host read as 0x0f000000, or 251658240. It then tried to write the \0 at that offset from the start of data, which is far outside the memory mapped into the process, so the operating system killed it.
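The number in that output can be reproduced by reversing the four bytes of the length by hand. The following is a small sketch; swap32 is a hypothetical helper written for this illustration, not a library function.

```c
#include <stdint.h>

/*
 * Hypothetical helper: reverse the order of the four bytes in a
 * 32-bit value, which is what reading an integer with the wrong
 * endianness amounts to.
 */
uint32_t
swap32(uint32_t v)
{
    return (((v & 0x000000ffU) << 24) |
        ((v & 0x0000ff00U) << 8) |
        ((v & 0x00ff0000U) >> 8) |
        ((v & 0xff000000U) >> 24));
}
```

Calling swap32(15) moves the byte 0x0f from the least significant position to the most significant one, giving 0x0f000000, which is the 251658240 printed by read on the little endian host.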
Fortunately this integer byte order problem has been around for a while, so someone has already come up with a solution for us: a set of functions called htonl, htons, ntohl, and ntohs. These functions convert 16-bit and 32-bit quantities between host and "network" byte order, which is big endian. The h and n in the function names refer to "host" and "network" respectively, while the s and l are "short" (16-bit) and "long" (32-bit). So you can read the function names as "host to network long", "host to network short", and so on. On a big endian host they return their argument unchanged, and on a little endian host they swap the bytes. As shown below, using them consistently ensures that your on-disk or on-the-wire integer representation can be read correctly on any platform.
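Before the full programs, it may help to see what these functions do to the bytes in memory. The following is a minimal sketch assuming a POSIX system that provides <arpa/inet.h>; the helper names wire_bytes32 and wire_bytes16 are hypothetical.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: convert v to network byte order and copy the
 * resulting in-memory bytes to out, lowest address first.
 */
void
wire_bytes32(uint32_t v, uint8_t out[4])
{
    uint32_t net = htonl(v);

    memcpy(out, &net, sizeof(net));
}

/* The same for a 16-bit value, using htons. */
void
wire_bytes16(uint16_t v, uint8_t out[2])
{
    uint16_t net = htons(v);

    memcpy(out, &net, sizeof(net));
}
```

For the value 0x11223344, wire_bytes32 fills out with 11 22 33 44 on both big and little endian hosts, and ntohl(htonl(x)) always gives back x.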
newwrite.c:
#include <sys/types.h>
#include <arpa/inet.h>
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main(int argc, char *argv[])
{
    u_int32_t len;
    char data[64] = "this is my data";
    int f;

    len = strlen(data);
    printf("length: %u\n", len);
    /* convert the length to network byte order before writing it */
    len = htonl(len);
    printf("data: \"%s\"\n", data);
    if ((f = open("myfile", O_WRONLY|O_CREAT|O_TRUNC, 0666)) == -1)
        err(1, "open");
    /* write the length of the string in the data */
    if (write(f, (void *)&len, sizeof(len)) == -1)
        err(1, "write");
    /* write the data */
    if (write(f, (void *)data, sizeof(data)) == -1)
        err(1, "write");
    close(f);
    return (0);
}
newread.c:
#include <sys/types.h>
#include <arpa/inet.h>
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(int argc, char *argv[])
{
    u_int32_t len;
    char data[64];
    int f;

    if ((f = open("myfile", O_RDONLY)) == -1)
        err(1, "open");
    /* read the length of the string in the data */
    if (read(f, (void *)&len, sizeof(len)) == -1)
        err(1, "read");
    /* read the data into memory */
    if (read(f, (void *)data, sizeof(data)) == -1)
        err(1, "read");
    close(f);
    /* convert the length from network byte order to host byte order */
    len = ntohl(len);
    printf("length: %u\n", len);
    /* refuse a length that would index past the end of the buffer */
    if (len >= sizeof(data))
        errx(1, "length %u is too large", len);
    data[len] = '\0'; /* null terminate our data for printing */
    printf("data: \"%s\"\n", data);
    return (0);
}
On the big endian host the output remains the same:
user@bigendian ~$ ./newwrite
length: 15
data: "this is my data"
user@bigendian ~$ ./newread
length: 15
data: "this is my data"
And on the little endian host it magically works:
user@lilendian ~$ ./newread
length: 15
data: "this is my data"
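An alternative to htonl and ntohl is to build the on-disk bytes with shifts, which never depends on how the host lays out integers in memory. The following is a sketch of that approach rather than part of the programs above; the names put_be32 and get_be32 are hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: store v into out[0..3] in big endian
 * (network) order, most significant byte first.
 */
void
put_be32(uint32_t v, uint8_t out[4])
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Hypothetical helper: rebuild the value from four big endian bytes. */
uint32_t
get_be32(const uint8_t in[4])
{
    return (((uint32_t)in[0] << 24) |
        ((uint32_t)in[1] << 16) |
        ((uint32_t)in[2] << 8) |
        (uint32_t)in[3]);
}
```

A writer would pass the four bytes filled in by put_be32 to write, and a reader would pass the four bytes it got from read to get_be32. For the length 15 the bytes are 00 00 00 0f on every host, the same bytes newwrite produces.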
