Why Does Winsock Keep Corrupting My Data?

by Warren Young

Newcomers to network programming almost always run into problems early on where it looks like the network or the TCP/IP stack is munging their data. This usually comes as quite a shock, because newcomers have usually just been told that TCP is a reliable data transport protocol. In fact, TCP and Winsock are quite reliable; they just sometimes don't behave the way that would be most convenient for your program. Below are several of the most common problems that at first appear to be data corruption.

Problem 1: Packets Are Illusions

This problem comes up in various guises. A canonical example is, "My client program sent 100 bytes, but the server program only got 50. What gives?" I think that understanding this issue is one of TCP/IP's rites of passage.

The critical issue is that TCP is a stream protocol: it guarantees that your bytes arrive intact and in order, but it does not preserve the boundaries between your send() calls. This means that if you send 100 bytes, the receiving end could receive all 100 bytes at once, or 100 separate single bytes, or four 25-byte chunks. The receiver could even get that 100-byte block in a single recv() along with some data from the previous send and some from the following one. Re-read this paragraph until you truly understand it. I'm not kidding.

"Okay, so now that I understand stream protocols, what do I do if I want the receiving end to read whole packets only?" you ask. The two most common ways are to either prefix each packet with a length value or terminate it with something unique. (For what it's worth, a length prefix works best as a binary-encoded number; skip ahead to Problem 2 below for more information on doing this properly.) For example, you could design your protocol so that every packet is preceded by a 2-byte unsigned integer giving the packet's length. An example of the other method is the CRLF (carriage return, line feed) terminator used by the NNTP, POP3, SMTP and HTTP protocols. I prefer the former method, because the latter requires your program to blindly read and scan the data until it finds the end of the packet, whereas the former tells the program exactly how many more bytes to read as soon as the length prefix comes in.
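Here is a minimal sketch of the length-prefix approach, assuming the 2-byte unsigned length from the example above is sent in network (big-endian) byte order. The names frame_encode() and frame_decode() are hypothetical, made up for this illustration. The decoder works on whatever bytes you have accumulated from recv() so far, and it returns 0 until a complete packet is sitting in the buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire format: a 2-byte unsigned length in network byte
   order (most significant byte first), followed by that many bytes. */

/* Build one frame in out.  Returns the number of bytes written, or 0
   if out_size is too small to hold the prefix plus the payload. */
size_t frame_encode(const unsigned char *payload, uint16_t len,
                    unsigned char *out, size_t out_size)
{
    if (out_size < (size_t)len + 2)
        return 0;
    out[0] = (unsigned char)(len >> 8);     /* high byte first */
    out[1] = (unsigned char)(len & 0xFF);   /* then low byte */
    memcpy(out + 2, payload, len);
    return (size_t)len + 2;
}

/* Check whether buf, holding `have` bytes received so far, starts with
   a complete frame.  If so, point *payload at its data, store its
   length in *payload_len, and return the whole frame's size (prefix
   included).  Otherwise return 0: keep calling recv() and try again. */
size_t frame_decode(const unsigned char *buf, size_t have,
                    const unsigned char **payload, uint16_t *payload_len)
{
    uint16_t len;

    if (have < 2)
        return 0;                           /* prefix still incomplete */
    len = (uint16_t)((buf[0] << 8) | buf[1]);
    if (have < (size_t)len + 2)
        return 0;                           /* payload still incomplete */
    *payload = buf + 2;
    *payload_len = len;
    return (size_t)len + 2;
}
```

After you process a frame, discard its bytes from the front of your buffer (memmove() works) and call frame_decode() again, because a single recv() may have delivered more than one packet, or the first part of the next one.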

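In any case, the most important thing is to check the return value of recv(), which tells you how many bytes it actually placed in your buffer. This can be fewer than you asked for, even when more data is on the way. (A return value of 0 means the remote side closed the connection, and SOCKET_ERROR means something went wrong.) Between this and your new packet-finding code, you should be able to reliably read complete packets from the TCP stream.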
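One common way to put this into practice is a loop that keeps calling recv() until it has exactly as many bytes as you need, such as the payload length you just read from a prefix. The recv_all() function and sock_t type below are hypothetical names for this sketch. It is written against the BSD sockets interface that Winsock is modeled on, so it should compile with either; it does not handle nonblocking sockets, and under POSIX you may also want to retry when errno is EINTR.

```c
#ifdef _WIN32
#include <winsock2.h>
typedef SOCKET sock_t;
#else
#include <sys/types.h>
#include <sys/socket.h>
typedef int sock_t;
#endif

/* Read exactly len bytes from a stream socket into buf.
   Returns len on success, 0 if the peer closed the connection before
   len bytes arrived, or -1 on error (see WSAGetLastError() on Winsock,
   errno elsewhere). */
int recv_all(sock_t s, char *buf, int len)
{
    int got = 0;

    while (got < len) {
        /* recv() may return fewer bytes than requested; loop for the rest. */
        int n = (int)recv(s, buf + got, len - got, 0);
        if (n < 0)
            return -1;      /* SOCKET_ERROR on Winsock, -1 elsewhere */
        if (n == 0)
            return 0;       /* orderly shutdown by the other side */
        got += n;
    }
    return got;
}
```

With the length-prefix scheme, you would call recv_all() once for the 2-byte prefix, convert it to a host integer (Problem 2 covers how), and then call it again for the payload.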

Problem 2: Byte Ordering

You have probably noticed all that ntohs() and htonl() stuff required in Winsock programming, but you might not know why it's required. The reason is that there are two major ways of storing integers on a computer: big-endian and little-endian. Big-endian systems store the most significant byte of a number at the lowest memory address ("big end first"), whereas little-endian systems reverse this. (There are even bizarre "middle-endian" systems!) Two computers must obviously agree on a common number format if they are to communicate, so the TCP/IP specifications define a "network byte order" that all the protocol headers use. Winsock expects that same order in values that end up in those headers, such as the port number and IP address in a sockaddr_in.
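To see which camp your own machine is in, you can look at how a known 32-bit value is laid out in memory. This is a minimal sketch: host_is_little_endian() is a hypothetical helper, and it assumes the machine is either big-endian or little-endian, not one of the middle-endian oddities.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 on a big-endian one.
   (Middle-endian machines are not considered.) */
int host_is_little_endian(void)
{
    uint32_t value = 0x01020304;
    unsigned char bytes[sizeof value];

    memcpy(bytes, &value, sizeof value);  /* look at the raw memory layout */
    return bytes[0] == 0x04;              /* least significant byte stored first? */
}
```

In real protocol code you rarely need this test yourself, because htonl() and friends already do the right thing for the platform they were built on.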

The upshot is that if you send bare binary integers as part of your network protocol and the receiving end runs on a platform with a different byte order, it will see garbled values. To fix this, follow the lead of the TCP/IP protocols and always use network byte order.
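Here is one way to apply that advice when you build a message buffer by hand. The helpers put_u32() and get_u32() are hypothetical names for this sketch; they rely only on the standard htonl() and ntohl() calls, and they copy through memcpy() so the position in the buffer need not be aligned for a 32-bit integer.

```c
#ifdef _WIN32
#include <winsock2.h>       /* htonl(), ntohl() */
#else
#include <arpa/inet.h>      /* htonl(), ntohl() */
#endif
#include <stdint.h>
#include <string.h>

/* Store a 32-bit host integer into buf in network byte order.
   memcpy() is used so buf need not be aligned for a 32-bit type. */
void put_u32(unsigned char *buf, uint32_t host_value)
{
    uint32_t net = htonl(host_value);
    memcpy(buf, &net, sizeof net);
}

/* Read a 32-bit network-byte-order integer from buf into host order. */
uint32_t get_u32(const unsigned char *buf)
{
    uint32_t net;
    memcpy(&net, buf, sizeof net);
    return ntohl(net);
}
```

htons() and ntohs() do the same job for 16-bit values, such as the length prefix from Problem 1.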

For what it's worth, network byte order is big-endian, though you should never take advantage of this fact. Some programmers working on big-endian machines skip the byte-order conversions (which do nothing on those machines anyway), but this is bad style, if for no other reason than that it creates bad habits that can bite you later. Other interesting trivia: the most common little-endian machines are the Intel x86 and the Digital Alpha. Most everything else, including the Motorola 680x0, the Sun SPARC and the MIPS Rx000, is big-endian. Oddly enough, there are a few "bi-endian" devices that can operate in either mode, like the PowerPC and the HP PA-RISC 8000. Most PowerPCs always run in big-endian mode, however, and I suspect that the same is true of the PA-RISC.

Problem 3: Structure Padding

To illustrate the structure padding problem, consider this C declaration:

    struct foo {
        char a;
        int b;
        char c;
    } foo_instance;

Assuming 32-bit ints, you might guess that the structure occupies 6 bytes. The problem is that many compilers "pad" structures so that each member starts at an address suited to its type: here, 3 bytes of padding after a so that b falls on a 4-byte boundary, and 3 more after c so that b stays aligned in an array of these structures. Compilers do this because modern CPUs can fetch data from properly-aligned memory locations faster than from unaligned ones, and some cannot do unaligned accesses at all. With this padding, the structure above actually takes up 12 bytes. This issue rears its head when you try to send a structure over Winsock whole, like this:

    send(sd, (char*)&foo_instance, sizeof(foo_instance), 0);

Unless the receiving program was compiled for the same machine architecture with the same compiler and the same compiler options, you have no guarantee that it will interpret the received bytes correctly: the padding may differ, and so may the byte order of b.
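If you'd like to see what your own compiler does with struct foo, offsetof() and sizeof will tell you. In this small sketch, show_foo_layout() is just an illustrative name. With common x86 compilers at their default settings you would typically see b at offset 4, c at offset 8 and a total size of 12, but that is exactly the kind of detail another compiler or option setting is free to change.

```c
#include <stddef.h>
#include <stdio.h>

struct foo {
    char a;
    int b;
    char c;
};

/* Print where this compiler actually placed each member of struct foo. */
void show_foo_layout(void)
{
    printf("a at %lu, b at %lu, c at %lu, sizeof = %lu\n",
           (unsigned long)offsetof(struct foo, a),
           (unsigned long)offsetof(struct foo, b),
           (unsigned long)offsetof(struct foo, c),
           (unsigned long)sizeof(struct foo));
}
```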

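The solution is to always send structures "packed" by sending the data members one at a time (or by copying them one at a time into a buffer that you send with a single call). Alternatively, you can force your compiler to pack the structures for you. Visual C++ can do this with the /Zp command line option or the #pragma pack directive, and Borland C++ can do this with the -a command line option. Keep in mind that packing only removes the padding; multi-byte members like b still need to be converted to network byte order, as described in Problem 2.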
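The member-at-a-time approach might look like the following sketch, which copies each member into a 6-byte buffer that you then hand to a single send() call. foo_pack(), foo_unpack() and FOO_WIRE_SIZE are hypothetical names; the code assumes 32-bit ints, as the example above does, and writes b in network byte order using shifts, so it also takes care of Problem 2.

```c
#include <stdint.h>

struct foo {
    char a;
    int b;          /* assumed to be 32 bits, as in the text */
    char c;
};

#define FOO_WIRE_SIZE 6  /* 1 + 4 + 1 bytes: no padding on the wire */

/* Copy each member into out, one at a time; b is written most
   significant byte first, i.e. in network byte order. */
void foo_pack(const struct foo *f, unsigned char out[FOO_WIRE_SIZE])
{
    uint32_t b = (uint32_t)f->b;

    out[0] = (unsigned char)f->a;
    out[1] = (unsigned char)(b >> 24);
    out[2] = (unsigned char)(b >> 16);
    out[3] = (unsigned char)(b >> 8);
    out[4] = (unsigned char)b;
    out[5] = (unsigned char)f->c;
}

/* Reverse of foo_pack(): rebuild a struct foo from the 6 wire bytes. */
void foo_unpack(const unsigned char in[FOO_WIRE_SIZE], struct foo *f)
{
    uint32_t b = ((uint32_t)in[1] << 24) | ((uint32_t)in[2] << 16) |
                 ((uint32_t)in[3] << 8)  |  (uint32_t)in[4];

    f->a = (char)in[0];
    /* Converting values above INT32_MAX is implementation-defined in C,
       but yields the expected two's complement result on mainstream
       compilers. */
    f->b = (int32_t)b;
    f->c = (char)in[5];
}
```

Sending then becomes send(sd, (char*)wire, FOO_WIRE_SIZE, 0), and the receiver reads exactly FOO_WIRE_SIZE bytes before calling foo_unpack(), no matter how either compiler lays out struct foo in memory.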

Conclusion

The moral of the story is, trust Winsock to send your data correctly, but don't trust that it works the way you think that it ought to!

Copyright © 1998 by Warren Young. All rights reserved.

Back to the Winsock Programmer's FAQ...



Please send updates and corrections to <tangent@cyberport.com>.