Thursday, July 28, 2011

Where Did These Charges Come From?

Last month we received the following queries from two customers:
Customer #1:  “I know that my devices only send 40 bytes and the portal is reporting they sent anywhere from 397-767 bytes. Where are the extra bytes coming from?”
Customer #2:  “My IIS is reporting that I have sent/received a total of 3,075 bytes, but when I look at the portal it says 6,000 bytes. I am trying to figure out where the other 2,925 bytes are coming from.”
Unfortunately, these are typical queries we get from customers who have launched their first wireless application. Their first reaction is that something must be wrong with the carrier’s service or billing system and that they are being improperly charged.
Let me start off stating categorically what we tell customers every time these discussions start:
·         No carrier injects billable bytes or data into a customer data stream to inflate the billable data volume (the data reported in the management portals). In fact, all of the carrier’s interactions with the device take place over the control channel (e.g., device registration and authentication, channel assignment and cell site hand-off, packet data protocol (PDP) context session initiation and termination). They are not counted as billable transactions and do not show up in the count of billable data.
·         Carrier billing systems are ruthlessly accurate at counting the data incurred by communicating devices, down to the byte. They do not make mistakes. We have confirmed this numerous times by sniffing the raw packet traffic just outside the carrier’s point-of-presence at their GGSN.

So ALL of this measured data traffic is caused by the device and the central application with which it is communicating. But clearly it is more data volume than the customer expected. In fact it is a lot more: the 2x to 20x underestimates in the customer examples above are unfortunately typical. And since carriers charge for all data usage, these underestimates are expensive.
So what is causing the extra data that these applications are being charged for?
I will start by pointing out that carriers charge for every byte that is transmitted over the data channel. All of the carrier’s management functions are conducted over the control channel, so all data traffic flowing over the data channel – whether sent by the device or sent by the central application – is presumed to be caused by the customer application and is considered billable.
Another consideration is that carriers measure data by the “session.” In the wireless world, a “session” runs from the time the PDP context is first established until it is closed. Within this PDP context session, data can flow back and forth between the device and the central application in multiple transactions or packets. At the close of the session, the carrier simply measures and reports the total number of bytes sent over the air by the mobile device and the total sent by the central application. Neither carriers nor their MVNOs capture or record any more detail than that, which contributes to the customer’s confusion about what makes up the session totals.
Yet another consideration is that the carriers charge for all data that is transmitted over the air. They do not guarantee the receipt and processing of data by the destination end point. If an action by an application node causes data to be sent over the wireless channel, it is billable even if it is not correctly “received” by the destination end point address.
So again, what is causing the extra data that these applications are being charged for? There are some principal causes.
Communications Protocol Overhead
A number of novice wireless application developers use TCP as the fundamental transport for their application, because it is what they are used to using in wireline communications and it is the prevailing communications protocol in an Internet-dominated world. Unfortunately, TCP imposes a high amount of data volume overhead on most M2M applications, where payloads are small. Every TCP segment wraps the customer’s payload in a TCP header of at least 20 bytes, carried inside an IP header of at least another 20 bytes (for IPv4). Between them these headers carry the source and destination addresses and ports, packet sequence and acknowledgement numbers, control flags, flow control window sizes and checksums. For a 40-byte payload, the headers alone double the size of the packet.
In addition, TCP is a chatty protocol that causes a number of messages to be exchanged between the sender and destination in addition to the principal message carrying the customer’s intended payload. These additional messages include connection establishment (the three-way handshake), acknowledgements of receipt, retransmission of lost or unacknowledged segments, flow control window updates, optional keep-alives, and connection teardown. If encryption such as TLS is layered on top of TCP, its negotiation adds further exchanges. With the protocol stack often embedded in the radio module itself, much of this overhead cannot be turned off in the configuration of TCP. But even when it is configurable, many novice wireless developers leave the defaults in place, because tuning isn’t needed over wireline connections and it “gives up” functionality.
All of the bytes involved in operating these communications protocols and transactions are billable data.
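To make the overhead concrete, here is a rough back-of-the-envelope sketch in Python. It assumes minimum-size IPv4, TCP and UDP headers with no options, a payload that fits in one packet, and a short-lived TCP connection that is opened, used once and closed. Real stacks usually add TCP options to the opening packets, so treat the TCP figure as a lower bound and not as a measurement. The function names are ours, for illustration only.

```python
# Rough lower-bound estimate of the bytes needed to deliver one payload.
# Assumption: minimum header sizes, no IP or TCP options, no retransmissions.
IPV4_HEADER = 20
TCP_HEADER = 20
UDP_HEADER = 8


def tcp_session_bytes(payload: int) -> int:
    """Bytes for one short-lived TCP connection carrying a single payload."""
    per_packet = IPV4_HEADER + TCP_HEADER
    handshake = 3 * per_packet       # SYN, SYN-ACK, ACK
    data = per_packet + payload      # one data segment
    ack = per_packet                 # receiver acknowledges the data
    teardown = 4 * per_packet        # FIN, ACK, FIN, ACK
    return handshake + data + ack + teardown


def udp_datagram_bytes(payload: int) -> int:
    """Bytes for the same payload sent as a single UDP datagram."""
    return IPV4_HEADER + UDP_HEADER + payload


if __name__ == "__main__":
    print(tcp_session_bytes(40))   # 400
    print(udp_datagram_bytes(40))  # 68
```

Under these assumptions a 40-byte payload costs about 400 bytes over a fresh TCP connection, ten times the payload, against 68 bytes as a single UDP datagram. That is the same order of magnitude as the low end of what the first customer saw in the portal, although only a packet capture can confirm what actually happened in that case.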
Application Overhead
The application itself can be designed to cause additional billable data traffic that is not directly involved in moving “application data” between the device and the central application. Sometimes these transactions are designed in intentionally, and sometimes unwittingly. For example, sometimes the application polls remote devices for the data it wants. Or the remote device sends heartbeats to signal to the application that it is still operational, to keep its PDP context session active, or to communicate its dynamically assigned IP address. The device or central application may be designed to require the receiving node to send an acknowledgement that the message or data was received correctly. All of these actions will generate billable traffic on the wireless network, and if they are being conducted using TCP and its headers, then the data volume can balloon.
Any transmissions which are used to configure or manage the device, or to troubleshoot the application will also cause billable traffic. This will include all commands sent to a remote device to change its configuration, release it to sleep mode, or clear or reset alarms. “Pinging” the device, collecting configuration parameters (e.g., device serial number, firmware version, current settings), or uploading logs will cause billable data traffic. Synchronizing state between the remote device and the central application will also generate additional traffic, as will having the application synchronize the time base with the device. All of these actions will generate data traffic that is measured and billed.
All of the bytes that the application causes to be sent between the device and the central application are billable traffic, not just the bytes that may be displayed in the application or stored in its database.
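A simple estimate shows how quickly heartbeats and acknowledgements can outweigh the payload itself. The Python sketch below is illustrative only: the reporting schedule, the heartbeat interval, the 1-byte heartbeat and the 28 bytes of per-packet overhead (IPv4 plus UDP headers) are all assumptions chosen for the example, and the function name is hypothetical.

```python
def monthly_bytes(payload, reports_per_day, heartbeats_per_day,
                  heartbeat_payload=1, ack_payload=0,
                  per_packet_overhead=28, days=30):
    """Estimate monthly traffic for reports, their acks, and heartbeats.

    Assumption: every report is answered by one application-level ack,
    and every message travels in its own packet.
    """
    report = per_packet_overhead + payload
    ack = per_packet_overhead + ack_payload
    heartbeat = per_packet_overhead + heartbeat_payload
    per_day = reports_per_day * (report + ack) + heartbeats_per_day * heartbeat
    return per_day * days


if __name__ == "__main__":
    # One 40-byte report per hour, plus a heartbeat every 5 minutes.
    total = monthly_bytes(payload=40, reports_per_day=24, heartbeats_per_day=288)
    payload_only = 40 * 24 * 30
    print(total)         # 319680
    print(payload_only)  # 28800
```

In this example the developer would expect about 28,800 bytes of “application data” per month, while the traffic on the wire is roughly 320,000 bytes, most of it heartbeats.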
Communications Errors
Problems with the overall communication system may cause data to be sent over the wireless link that is billable, but which may not reach the destination application. Establishing a PDP context usually means that an end-to-end connection is established at the Layer 3 protocol level before any application communication starts, so this should usually not be an issue. Bugs do occur in customer applications or configurations, however, that do result in network traffic that is invisible to the customer application.
The most common cause of this type of unexpected data volume is a defect in the application or device that causes messages to be sent repeatedly and frequently in error. The receiving node is not expecting the messages, or does not even recognize the contents, and so they are not “recorded” at the application. The sender keeps sending because it is not receiving an acknowledgement, because of a corruption of a timer setting, or from some other application defect. But since this bug produces data transmissions over the air, they are all billable.
Other communications errors can be the result of poor wireless application design. For example, a device may be designed to “recover” and restart a communications session if it does not receive an acknowledgement within a timeout period. But the variable latency in the wireless network may have caused a delay in the device registering or establishing the PDP context. The originally submitted data transmission may still be “in process” when the device decides to reset and start over. Any data sent before the PDP context was terminated will still be chargeable, even if it is not usable to the destination application.
In marginal coverage, the wireless link may fail and the PDP context may terminate before a transmission that is in process can “complete.” Any data transmitted before the link dropped would still be billable, even if the partial transmission is unrecognizable and unusable to the destination node. All retries would also be billable transmissions, and these could add up if the coverage is marginal enough that the connection link cannot stay up long enough to complete a transaction. This phenomenon is fairly uncommon, but it can cause a significant volume of traffic if the application is designed to just keep trying. It is worth noting that marginal coverage, or even lack of coverage, is not a “bug” of wireless networks. These networks do not guarantee any coverage, and they only provide service on a “best efforts” basis.
All data sent over the wireless channel is billable, whether or not a transaction or session completed normally from the standpoint of the application.
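The cost of an application that “just keeps trying” can be sketched with a little arithmetic. The Python below compares a fixed-interval retry loop with a capped exponential backoff, which is one common way (not the only one) to bound retry traffic. The 400 bytes per attempt, the 30-second interval, the one-hour outage and the function names are all assumptions for illustration.

```python
def fixed_interval_attempts(outage_s, interval_s):
    """Number of retries fired during an outage when retrying at a fixed interval."""
    return outage_s // interval_s


def backoff_delays(base_s=30, factor=2, max_delay_s=3600, max_attempts=6):
    """Delays (in seconds) before each retry, growing geometrically up to a cap."""
    delays = []
    delay = base_s
    for _ in range(max_attempts):
        delays.append(min(delay, max_delay_s))
        delay *= factor
    return delays


def retry_bytes(attempt_bytes, attempts):
    """Bytes generated if every attempt puts attempt_bytes on the air."""
    return attempt_bytes * attempts


if __name__ == "__main__":
    attempt_bytes = 400  # assumed cost of one failed attempt
    naive = fixed_interval_attempts(outage_s=3600, interval_s=30)
    print(naive, retry_bytes(attempt_bytes, naive))     # 120 48000
    delays = backoff_delays()
    print(delays)                                       # [30, 60, 120, 240, 480, 960]
    print(retry_bytes(attempt_bytes, len(delays)))      # 2400
```

With these numbers, a device retrying every 30 seconds through a one-hour outage generates up to 48,000 billable bytes without delivering anything, while the capped schedule gives up after six attempts and 2,400 bytes.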
Rounding
Another potential reason for the difference between what a user thinks they are sending, and the data volume for which they are billed, is rounding. Rounding is a practice followed by some carriers in which they “round up” the measured data traffic in a session to a fixed increment for billing purposes. Usually the measured data traffic in a session is rounded up to the next whole kilobyte, although some international carriers round up to the next 10 kilobytes. Rounding can undermine the efforts of diligent application developers who are selective about the data they send, send it in a compressed binary form, and use a low overhead protocol (such as UDP). Rounding can have a significant impact on the billable data volume when very small amounts of data are sent during a session, but its impact is less noticeable in applications that send higher data volumes.
Rounding usually shows up in the invoice or even the summary list of sessions. Most carrier portals provide the underlying real byte count of data sent to and from a remote device, so the effect of rounding can be identified.
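The effect of per-session rounding is easy to compute. The sketch below assumes a kilobyte of 1,024 bytes and that a session with no traffic is billed as zero; both are assumptions, since carriers differ, and the function name is hypothetical.

```python
def billed_bytes(session_bytes, increment=1024):
    """Round a session's measured bytes up to the next billing increment."""
    if session_bytes <= 0:
        return 0
    # Ceiling division using integers only.
    return -(-session_bytes // increment) * increment


if __name__ == "__main__":
    print(billed_bytes(68))              # 1024
    print(billed_bytes(1025))            # 2048
    print(billed_bytes(68, 10 * 1024))   # 10240
```

A device that opens a new session for each 68-byte transmission would be billed 1,024 bytes per session under 1 KB rounding, about fifteen times what was measured, and 10,240 bytes under 10 KB rounding.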
Summary
The portal traffic statistics do not lie. They report the actual billable traffic that the application is placing on the wireless network.
Analysis of the communications link is necessary to find the specific root cause of the unexpected data volume. The best way to diagnose these communications issues is to run a protocol “sniffer” on the communications feed from the carrier before it reaches the destination node. For some communications problems, data communications analysis may even need to be done in front of the firewall. Instrumenting the device and the central application to capture communications traffic and facilitate diagnostics is also usually helpful.
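Once a capture has been taken, a first sanity check is simply to total the bytes in it and compare that with the portal figure. The Python sketch below reads a capture saved in the classic pcap file format (not pcapng) using only the standard library. It assumes the capture was taken on an Ethernet segment, so it subtracts a 14-byte link-layer header from each packet to approximate the IP-level byte count; whether that matches exactly what a given carrier counts is an assumption to verify. The function name is ours.

```python
import struct


def total_captured_bytes(stream, link_header_len=14):
    """Return (packet_count, total_bytes) for a classic pcap byte stream.

    stream is a binary file-like object. link_header_len is subtracted from
    each packet's original length (assumption: 14-byte Ethernet framing).
    """
    header = stream.read(24)  # pcap global header is 24 bytes
    if len(header) < 24:
        raise ValueError("not a pcap file: global header too short")
    magic = struct.unpack("<I", header[:4])[0]
    if magic == 0xA1B2C3D4:
        endian = "<"   # file written little-endian
    elif magic == 0xD4C3B2A1:
        endian = ">"   # file written big-endian
    else:
        raise ValueError("unsupported capture format")

    packets = 0
    total = 0
    while True:
        record = stream.read(16)  # per-packet record header is 16 bytes
        if len(record) < 16:
            break
        _ts_sec, _ts_frac, incl_len, orig_len = struct.unpack(endian + "IIII", record)
        stream.read(incl_len)     # skip the captured packet data
        packets += 1
        total += max(orig_len - link_header_len, 0)
    return packets, total
```

For a file on disk, usage would look like `with open("capture.pcap", "rb") as f: print(total_captured_bytes(f))`, where `capture.pcap` is a placeholder name. If the total is close to the portal figure, the next step is to break the capture down by packet type to see which protocol or application exchanges account for the volume.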
The bottom line in this type of situation is this:
The application developer’s surprise about the difference between the payload size and the chargeable data volume only indicates that they do not understand some fundamental principles about how their application works, how data communications protocols work, and how the cellular communications network works.