Byte Order

From PortaWiki

Different CPUs store the bytes of a multi-byte number in memory in different orders. The order a particular CPU uses is called its "endianness", and it is generally either big endian or little endian. A big endian CPU stores the most significant byte of a multi-byte integer at the lowest address, so the bytes sit in memory in the same order they are written in hexadecimal. A little endian CPU stores the least significant byte first, which reverses that order. This can be demonstrated by running the following code on different platforms.

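 #include <sys/types.h>
 
 #include <stdio.h>
 
 int
 main(int argc, char *argv[])
 {
         u_int32_t n = 0x11223344;
         u_int8_t *b;
         int i;
 
         /* view the integer as a sequence of individual bytes */
         b = (u_int8_t *)&n;
         for (i = 0; i < (int)sizeof(n); i++)
                 printf("byte %d: 0x%02x\n", i, b[i]);
 
         return (0);
 }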

On a big endian CPU the following output is produced:

 byte 0: 0x11
 byte 1: 0x22
 byte 2: 0x33
 byte 3: 0x44

While on a little endian CPU you get this:

 byte 0: 0x44
 byte 1: 0x33
 byte 2: 0x22
 byte 3: 0x11

The endianness of a platform is generally transparent to the programmer until they need to exchange data with another computer. For example, a program that needs to store a string on disk could use an integer to store the length of the string. The following two programs will work on machines of the same endianness:

write.c:

 #include <sys/types.h>
 
 #include <err.h>
 #include <fcntl.h>
 #include <stdio.h>
 #include <string.h>
 #include <unistd.h>
 
 int
 main(int argc, char *argv[])
 {
         u_int32_t len;
         char data[64] = "this is my data";
         int f;
 
         len = strlen(data);
         printf("length: %d\n", len);
         printf("data: \"%s\"\n", data);
 
         if ((f = open("myfile", O_WRONLY|O_CREAT|O_TRUNC, 0666)) == -1)
                 err(1, "open");
 
         /* write the length of the string in the data */
         if (write(f, (void *)&len, sizeof(len)) == -1)
                 err(1, "write");
         /* write the data */
         if (write(f, (void *)data, sizeof(data)) == -1)
                 err(1, "write");
 
         close(f);
 
         return (0);
 }

read.c:

 #include <sys/types.h>
 
 #include <err.h>
 #include <fcntl.h>
 #include <stdio.h>
 #include <unistd.h>
 
 int
 main(int argc, char *argv[])
 {
         u_int32_t len;
         char data[64];
         int f;
 
         if ((f = open("myfile", O_RDONLY, 0666)) == -1)
                 err(1, "open");
 
         /* read the length of the string in the data */
         if (read(f, (void *)&len, sizeof(len)) == -1)
                 err(1, "read");
         /* read the data into memory */
         if (read(f, (void *)data, sizeof(data)) == -1)
                 err(1, "read");
 
         close(f);
 
         printf("length: %d\n", len);
 
         data[len] = '\0'; /* null terminate our data for printing */
         printf("data: \"%s\"\n", data);
 
         return (0);
 }

Running these programs on the same computer gives the following output:

 user@bigendian ~$ ./write
 length: 15
 data: "this is my data"
 user@bigendian ~$ ./read
 length: 15
 data: "this is my data"

However, copying the data file from the big endian host to a little endian machine and running read there does this:

 user@lilendian ~$ ./read
 length: 251658240
 Segmentation fault (core dumped)

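The little endian host interpreted the four length bytes in the opposite order: the bytes 00 00 00 0f that mean 15 on the big endian host read as 0x0f000000 (251658240) on the little endian one. It therefore tried to write the \0 far beyond the end of the 64-byte buffer. The address it ended up trying to write to was outside its address space, so the operating system killed it.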

Fortunately this integer byte order problem has been around for a while, so someone has already come up with a solution for us. The solution is a set of functions called htonl, htons, ntohl, and ntohs. These functions convert 16-bit and 32-bit quantities between host and "network" byte ordering. The h and n in the function names refer to "host" and "network" respectively, while the s and l are "short" (16-bit) and "long" (32-bit). So you can read the function names as "host to network long", "host to network short", and so on. Network byte order is big endian, so on a big endian host these functions return their argument unchanged, while on a little endian host they swap the bytes. As shown below, they can guarantee that your on-disk or on-the-wire integer representation will work on any platform.

newwrite.c:

 #include <sys/types.h>
 
 #include <arpa/inet.h>  /* htonl() */
 #include <err.h>
 #include <fcntl.h>
 #include <stdio.h>
 #include <string.h>
 #include <unistd.h>
 
 int
 main(int argc, char *argv[])
 {
         u_int32_t len;
         char data[64] = "this is my data";
         int f;
 
         len = strlen(data);
         printf("length: %d\n", len);
         len = htonl(len);
         printf("data: \"%s\"\n", data);
 
         if ((f = open("myfile", O_WRONLY|O_CREAT|O_TRUNC, 0666)) == -1)
                 err(1, "open");
 
         /* write the length of the string in the data */
         if (write(f, (void *)&len, sizeof(len)) == -1)
                 err(1, "write");
         /* write the data */
         if (write(f, (void *)data, sizeof(data)) == -1)
                 err(1, "write");
 
         close(f);
 
         return (0);
 }

newread.c:

 #include <sys/types.h>
 
 #include <arpa/inet.h>  /* ntohl() */
 #include <err.h>
 #include <fcntl.h>
 #include <stdio.h>
 #include <unistd.h>
 
 int
 main(int argc, char *argv[])
 {
         u_int32_t len;
         char data[64];
         int f;
 
         if ((f = open("myfile", O_RDONLY, 0666)) == -1)
                 err(1, "open");
 
         /* read the length of the string in the data */
         if (read(f, (void *)&len, sizeof(len)) == -1)
                 err(1, "read");
         /* read the data into memory */
         if (read(f, (void *)data, sizeof(data)) == -1)
                 err(1, "read");
 
         close(f);
 
         len = ntohl(len);
         printf("length: %d\n", len);
 
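         /* a damaged file could still carry a length that does not fit */
         if (len >= sizeof(data))
                 errx(1, "bad length: %u", len);
         data[len] = '\0'; /* null terminate our data for printing */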
         printf("data: \"%s\"\n", data);
 
         return (0);
 }

On the big endian host the output remains the same:

 user@bigendian ~$ ./newwrite
 length: 15
 data: "this is my data"
 user@bigendian ~$ ./newread
 length: 15
 data: "this is my data"

And on the little endian host it magically works:

 user@lilendian ~$ ./newread
 length: 15
 data: "this is my data"