NAT Is the Enemy of Low Power Devices

vv_ • 5 months ago

The problem(-s) described in the blog post are really acute for IoT in general, especially if you want your device to run on batteries or you have a limited data budget.

> Therefore, when you try to continue talking to the server over a previously established session, it will not recognize you. This means you’ll have to re-establish the session, which typically involves expensive cryptographic operations and sending a handful of messages back and forth before actually delivering the data you were interested in sending originally.

The blog post mentions Session IDs as a solution, but these require servers to be stateful, which can be challenging in most deployments. An alternative is Session Tickets (https://datatracker.ietf.org/doc/html/rfc5077), but these may cause issues when offloading networking to another device, such as a cellular modem, as their implementation may be non-standard or faulty.

incoming rant

These issues could be mitigated—or even solved—by using mature software platforms like Zephyr RTOS and their native networking stacks, which receive more frequent updates than traditional “all-in-one-package” SoCs. However, many corporations choose to save a few dollars on hardware at the expense of incurring thousands in software engineering costs and bug-hunting. It is often seen as more cost-effective to purchase a cellular modem with an internal MCU rather than a separate cellular modem and a host MCU to run the networking stack. It is one of the many reasons why many IoT devices are utter garbage.

hasheddan • 5 months ago

Author here -- thanks for engaging in the discussion! You won't find any pushback from us on using Zephyr -- we are contributors, the firmware example in the post is using it (or Nordic's NCS distribution of it), and we offer free Zephyr training [0] every month :)

[0]: https://training.golioth.io/

bsder • 5 months ago

> It is often seen as more cost-effective to purchase a cellular modem with an internal MCU rather than a separate cellular modem and a host MCU to run the networking stack.

This one isn't just cost--the compliance restrictions that the cellular carriers place on you are idiotic.

The big one we bumped into is "must allow carrier initiated firmware updates with no restrictions on scheduling" which translates to "the carrier will eat your battery often and without warning".

In addition, many IoT devices may not call home more than once every couple of months. And the carrier will happily roll out tower firmware that will kill those being able to call home.

If I use a module with my own firmware, the modem folks will simply point fingers at me. If I use a module with integrated SoC and firmware and it gets updeathed, I get the "joy" of yelling at the cellular module manufacturer.

(I had the wonderful experience of watching a cellular IoT project go gradually dead over 3 days as the carrier rolled out an "upgrade" across its system. We got a front seat as the module manufacturer was screaming bloody murder at the carrier who simply did "We Don't Care. We Don't Have To. We're the Phone Company.")

vv_ • 4 months ago

Your experience is bizarre. Was it Verizon (or AT&T/T-Mobile) and did you use a Cat-1bis/Cat-M/NB-IoT or higher class modem?

Normally US carriers require Firmware-Over-The-Air (FOTA) capability for the modem firmware. This is not the case for deployments outside of the United States, to my knowledge.

It would be interesting to hear more about your story!

immibis • 4 months ago

Sounds like you need a more watertight contract and more lawyers.

bsder • 4 months ago

A cellular carrier has a legal budget whose rounding errors are likely larger than your company's total worth.

Good luck litigating.

immibis • 4 months ago

All the lawyers in the world won't protect you from paying the penalties in the contract you signed when you overtly and deliberately failed to deliver.

shipp02 • 5 months ago

What makes a separate cellular modem better than an internal cellular modem? Is it because software updates are available for the separate modems?

I am evaluating some Nordic semiconductor parts for a project. They seem to have an internal modem but Nordic uses zephyr. Any thoughts?

gwbas1c • 5 months ago

> What makes a separate cellular modem better than an internal cellular modem?

The US 3G shutdown required some rather expensive and unexpected upgrades. Vendors signed long-term contracts with 3G providers, and then "someone" was on the hook, to replace something, when the 3G vendors terminated their contracts prematurely.

The deeper the modem was integrated into a product, the harder it was to change. The shallower the modem was integrated, the easier it was to change.

For example:

One of my cars just lost its internet connectivity, and the automaker never offered any way to fix it. (I didn't care, I only used Android Auto in that car.)

My employer (IOT) sent out free chips to our customers. They had to arrange someone to go do a site visit and swap a chip while on a phone call with us. We're small and early enough that it wasn't a big deal.

My solar panel vendor wanted me to $pend big buck$ on a new smart meter and refused to honor their warranty. I told them to run a cable to the ethernet port in my meter.

vv_ • 4 months ago

Car manufacturers should abandon developing their own entertainment systems and instead collaborate with Apple (CarPlay) and Google (Android Auto) to improve integration and support a wider range of use cases. Unfortunately, they seem to revive these efforts every 4-6 years (e.g. Mercedes Benz)

The only feature I need to control remotely in my car is preheating during winter—I wonder how they could achieve that without using cellular connectivity as paying a subscription for such a service would make it less attractive to me.

gwbas1c • 4 months ago

vv_ • 5 months ago

> What makes a separate cellular modem better than an internal cellular modem?

When using a separate cellular modem, you can connect it to your MCU via either a USB or UART interface. In IoT applications, UART is the more common choice. Then you can multiplex the interface with CMUX, allowing simultaneous use of AT commands and PPP.

With frameworks like lwIP or Zephyr supporting PPP, you can get your network running really quickly and have full control over the networking and crypto stacks. Using Zephyr you get POSIX-compliant sockets which allows you to leverage existing protocol implementations.

In contrast, using an SoC often requires reintegrating the entire network stack, as they typically do not support POSIX sockets. I've worked on SoCs that only support TLS 1.1 and the vendor refused to upgrade it, as it would require them to re-certify their modem. Switching to a different SoC can mean repeating this process from scratch as different vendors implement their own solutions (sometimes even the same vendor will have different implementation(-s) for different modems).

> I am evaluating some Nordic semiconductor parts for a project. They seem to have an internal modem but Nordic uses zephyr. Any thoughts?

It runs on Zephyr RTOS and can be built as a standalone modem (https://docs.nordicsemi.com/bundle/ncs-latest/page/nrf/appli...). You can either use Zephyr's native networking stack or offload it to the modem while retaining the same API. This means you get access to all of Zephyr’s protocol implementations without additional effort. The design makes it feel as though you have a completely independent MCU connected to an external modem.

That said, it does have quirks and bugs, particularly when offloading networking. It also has relatively limited resources, with only 256 kB of RAM and 1 MB of flash storage.

Overall, it is the best SoC I’ve worked with, but it is still an SoC. Whether it suits your project depends on your specific use case. If you plan to integrate additional radios in the future (e.g., UWB, BLE, Wi-Fi), I’d recommend using a separate MCU if your budget allows. This will provide significantly more flexibility. Otherwise it is definitely one of the better SoCs currently on the market, to my knowledge.

PS. It only supports Cat-M (& NB-IoT but I'm going to skip over it intentionally!) which is not globally supported, so you should make sure the region you want to deploy in supports that technology.

baobun • 5 months ago

Security.

On one hand, licensing requirements and regulation often mean that modems are locked down in terms of firmware updates, reference documentation, source, and capabilities. This often translates into a larger "black box" area, and one embedded inside your SoC instead of physically separate and connected over a serial bus.

On the other, on-chip modems often (not sure about those Nordics) have DMA.

The combination of those two is scary.

vv_ • 4 months ago

> The combination of those two is scary.

Could you clarify what you mean by this?

bluGill • 5 months ago

cellular modems go nonfunctional/obsolete much faster than other systems. 3g is almost entirely gone worldwide. 4g is still around, but providers are already reducing how much their towers dedicate to it. The standards body is working on 6g, who knows when that will come and push out older stuff.

In the case of my car I don't care - I have never found a use for the cellular connectivity it has (if any). However there are lots of other devices where the cellular connectivity is important and users will want to upgrade the modem to keep it working. If cellular connectivity is just a marketing bullet point nobody cares about then integrated is good enough, but if your device really isn't useful without the modem, make that modem replaceable for somewhat cheap.

toast0 • 5 months ago

4g should survive better than 2g and 3g, because the 5g standard allows for mixed mode deployments where the coordination channel runs as 4g, and the slots can be 4g or 5g depending on what the client device is capable of. Running the coordination channel with 5g encoding could be a little more efficient, but it's not a big loss compared to running a minimum size 2g/3g allocation.

vv_ • 4 months ago

EDGE and E-GPRS is going to survive for at least another 5-10 years. Although this statement might not be globally accurate.

vv_ • 4 months ago

Generally the modem would support EDGE/E-GPRS (2G) and 3G simultaneously. In Europe, EDGE/E-GPRS is still quite popular and unlikely to be sunsetted in the next 5 years. It is a pity that many manufacturers tightly integrated the modem into their PCB, instead of creating a separate "communication box" - especially as some use-cases have the budget for it.

jasonjayr • 5 months ago

> The blog post mentions Session IDs as a solution, but these require servers to be stateful, which can be challenging in most deployments.

Doesn't this just move the 'state' into the operating system, or networking layer, in the form of an active TCP connection?

vv_ • 4 months ago

You can use Session Tickets w/ UDP too.

Nevertheless, with modern cloud providers, moving the state into the OS/Networking layer is still easier to scale. You don't need to write your own services to handle it.

oakwhiz • 5 months ago

Is DTLS a workaround for the session issue? Haven't had much experience with it myself but it does cut down some of the statefulness.

vv_ • 5 months ago

You don’t want to redo the handshake (which, as far as I know, is identical to TLS) every time you send a packet using DTLS. Therefore, you still need to retain state information. In my opinion, using DTLS (UDP) is considerably more nuanced than using TLS (TCP) in an embedded IoT context.

hasheddan • 5 months ago

The post details the use of CoAP over DTLS, employing Connection IDs.

marsven_422 • 5 months ago

[flagged]

SirSavary • 5 months ago

Could you elaborate for someone who is unfamiliar?

shipp02 • 5 months ago

Zephyr RTOS[1] is an RTOS supported by the Linux foundation. It has similar structures to Linux for device configuration like a device tree and tries to emulate POSIXish APIs. I think some embedded people are put off by this configuration structure.

Notably, Zephyr RTOS is the basis of Nordic Semiconductor's SDK[2]. Nordic is a major manufacturer of cellular and wireless MCUs.

[1]:https://www.zephyrproject.org/ [2]:https://docs.nordicsemi.com/bundle/ncs-latest/page/nrf/index...

Edit: GP seems to have a history of posting 1 line hot takes with no elaboration

marsven_422 • 5 months ago

[dead]

RobotToaster • 5 months ago

You prefer ThreadX?

pwdisswordfishz • 5 months ago

Is it used by the army of Israel against civilians? Because I can't think of any other circumstance that would justify such a strong condemnation.

dent9876543 • 5 months ago

That NAT is a problem presumes that we actually want our IoT devices reaching out to the out-of-intranet zone.

NAT gets the blame, and the intranet as a concept is generally a big corp term.

But I prefer my IoT devices not to need to reach out of my network. For me, NAT is an unwitting ally in the fight against such nonsense.

vollbrecht • 5 months ago

The mere existence of Tailscale should give a hint that NAT is only a speedbump and not any protection whatsoever. It protects you against nothing. Every method that Tailscale uses to traverse NAT can be used in isolation by any other piece of software. For more info about that you can read the following article.

https://tailscale.com/blog/how-nat-traversal-works

immibis • 5 months ago

What people really want is a firewall, and since NAT acts as a firewall, they confuse it with that.

My university has a public IP for every computer, but you could still only connect to the servers, not random computers, from the outside. Because they had a firewall.

username332211 • 5 months ago

What ordinary people (as opposed to IT departments) really want is a firewall that can't be accidentally disabled by pushing an overly permissive firewall rule.

NAT/port forwarding, for all their faults, make it rather difficult to write rules allowing traffic to a machine you didn't intend to expose to the world.

Symbiote • 5 months ago

Consumer routers have very similar UI for managing an IPv6 firewall as IPv4 NAT port forwarding.

This is not in any way a benefit of NAT.

lxgr • 5 months ago

Then... make the firewall UI so that you can't accidentally push an overly permissive firewall rule?

Just because NAT accidentally achieves some good outcomes doesn't in any way imply that said good outcomes are somehow exclusive to NAT.

lupusreal • 5 months ago

Yeah but the average person wouldn't know to set up a firewall (and can't count on their ISP to have their best interests at heart.) Therefore the general public benefits from the degree of protection that NAT provides.

Symbiote • 5 months ago

lxgr • 5 months ago

Then just enable the firewall by default, or don't even provide a way to disable it unless the user enters "developer/advanced/Pro (tm)" mode. None of these are valid excuses for NAT.

phendrenad2 • 5 months ago

"not any protection whatsoever" is way too strong a statement. NAT does raise the bar to exploiting a random smart lightbulb in your house significantly higher.

kccqzy • 5 months ago

The big distinction is that for Tailscale both endpoints know they want to talk to each other, and that both have Internet access. That's not the usual case firewalls are designed for.

Tailscale doesn't strictly need NAT traversal. They can run only their DERP servers and still continue to work. If your firewall tries to block two devices from communicating and yet allows both devices internet access, you have already lost.

lxgr • 5 months ago

Sounds like you like the idea of a stateful firewall, and good news: There are stateful firewalls for IPv6!

They have all the upsides of NATs (i.e. an option to block inbound connections by default), with none of the downsides (they preserve port numbers, can be implemented statelessly, they greatly simplify cooperative firewall traversal, you can allow inbound connections for some hosts).

Spivak • 5 months ago

I found it weird that IPv6 folks are so against NAT as a cultural thing when it works perfectly well on IPv6. They're not fundamentally opposed.

I could have all of my servers in public subnets and give them all public IP addresses, but I still prefer to put everything I can in private. Not only does the firewall not allow traffic in, but you can't even route to them. It now becomes really hard to accidentally grant more access than you intended.

I would hazard that most devices on the internet are in the boat of wanting to talk to the internet but not be reachable on it.

kccqzy • 5 months ago

Yea IPv6 folks are indeed against NAT philosophically because it's considered one of the big mistakes of IPv4.

There is a distinction between being publicly addressable and publicly routable. You can have the former without having the latter.

If you want more private addresses, IPv6 has a solution too: use ULAs and not GUAs. Design your internal network so it has mostly ULAs for application servers, database servers and the like, except for the reverse proxy having both publicly accessible GUAs as well as ULAs for talking to the rest of the network.

I personally use ULAs and GUAs concurrently on my network, because I have a residential ISP where my GUA prefix is not fixed.
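To make the ULA/GUA split above concrete, here is a minimal sketch using Python's standard `ipaddress` module. The `classify` helper and the sample addresses are made up for illustration (2001:db8::/32 is the documentation prefix, which happens to sit inside the 2000::/3 global unicast range); this is not how any particular router decides reachability.

```python
import ipaddress

# fc00::/7 is the unique local address (ULA) range from RFC 4193.
ULA = ipaddress.ip_network("fc00::/7")
# 2000::/3 is the range global unicast addresses (GUAs) are currently allocated from.
GUA = ipaddress.ip_network("2000::/3")


def classify(addr: str) -> str:
    """Return 'ULA', 'GUA', or 'other' for an IPv6 address string (hypothetical helper)."""
    ip = ipaddress.IPv6Address(addr)
    if ip in ULA:
        return "ULA"
    if ip in GUA:
        return "GUA"
    return "other"


# Example host carrying both kinds of address, like the reverse proxy described above.
for address in ("fd12:3456:789a::10", "2001:db8::1", "fe80::1"):
    print(address, classify(address))
```

A database server in that design would only ever carry an address from the first bucket, so it has no globally routable address to expose by accident.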

lxgr • 5 months ago

I'm not opposed to anyone voluntarily using a NAT at all. I just hate it when somebody makes that decision for me, and that unfortunately still happens all the time.

If it's a well-reasoned decision, sure, but I do suspect that more often than not it's a lack of knowledge about alternatives that makes people still opt for NATs, and that just makes me sad on top of being annoyed with the inconvenience of having to tunnel when a direct connection seems so close at hand.

> I would hazard that most devices on there internet are in the boat of want to talk to the internet but not be reachable on it.

I highly doubt that. One big example is VoIP: Incredibly common these days, yet so much of it is going through centralized relays, and often for absolutely no technical reason.

chgs • 5 months ago

How do you feel about NPT?

Akronymus • 5 months ago

For a personal network where you decide to use NAT on ipv6? Sure, go ahead.

Being forced into a CGNAT on ipv6 is just a dick move though. And I believe that's the kinda NAT that has coloured the opinions of most NAT for ipv6 detractors.

rcxdude • 5 months ago

If you don't want that, then complain about the lack of such a configuration option and configure your firewall so that they can't. But don't cheer on something that's breaking functionality that others might want (especially if it doesn't actually achieve your own goals reliably).

dent9876543 • 5 months ago

Oh, I do that too.

But to your point about not cheering on NAT, well I will because I see NAT as useful tool.

It is not an opinion well aligned with the preferences of the IETF. But the purist model of transparent end-to-end networking has never sat well with me. It’s just not a thing we want.

lxgr • 5 months ago

So because you don't want it means that nobody can get it?

Because that's what happens if you advocate for NAT by default. Conversely, with "IPv6 + inbound-blocking-firewall on CPEs by default", everybody gets the same behavior by default, and people that want something else can get that instead.

vv_ • 5 months ago

A telematics tracker in a vehicle that logistic companies use (e.g. Amazon, FedEx) is also considered as an IoT device. I don't believe that the author is talking about Smart Home appliances exclusively.

kstrauser • 5 months ago

What NAT are you using that doesn’t have a firewall? I haven’t personally used one of those since the ‘90s.

kazinator • 5 months ago

The first NAT I used in the middle 90's was IP Masquerading in the Linux kernel, by Pauline Middelink. That had a firewall.

kstrauser • 5 months ago

Same, but ISTR we had some Cisco gear where NAT (or PAT as they insisted; yes, I know the difference; no, no one cared) had a different license or hardware requirement or something from firewall rules.

bluGill • 5 months ago

I agree until I discover I'm doing something where I want to access/change that device. It is really nice when I'm returning home early that I can change my thermostat out of vacation mode. I've often wished I had a way to tell if I left a door unlocked.

Security and privacy are of course critical to all this, but the concept of the internet itself is not wrong.

craftkiller • 5 months ago

That's what a VPN is for. Every router I've had in the past decade has had support for running a VPN server so you can have one running 24/7 without any additional hardware. Even my retired elderly parents run a VPN server on their home router.

timewizard • 5 months ago

> retired elderly parents run a VPN

Does that VPN use certificates or a pre shared key? Do they understand the different security implications between those two choices?

craftkiller • 5 months ago

procaryote • 5 months ago

Especially if the data is unencrypted and only authenticated by source ip and a long lived token-like thing

raggi • 5 months ago

What I want from platforms, and I fought for at one time in Google with no success: a platform API that provides applications a way to schedule packets when the radio turns on.

The mobile platforms in particular continue to assume that you can live with their high level HTTP stuff, and it's just not good enough. The non-mobile platforms largely don't even approach the problem at all.

freedomben • 5 months ago

Indeed, that sounds like an obvious feature. Hard to believe it hasn't been implemented! I'd love to have that feature on Linux desktop/laptops. I think you could make lots of applications behave a whole lot better.

raggi • 5 months ago

The arguments from folks on the "architecture review boards" were that multiple connections are always bad and that developers can't be trusted. I'm willing to accept that they did get beat up quite a bit over power at various points when at times applications were a big part of the problem. That said, this is also a gross misunderstanding of the problem and overall solution space, as well as very much gatekeeping.

blitzar • 5 months ago

Dump the entire queue to a google server (confirm receipt and shut down the connection) and then have the google server forward all the data to its destinations?

Seems to me that would be the lowest power, lowest developer trust, lowest number of connections, maximum gatekeeping method.

apple1417 • 5 months ago

I have worked on a device with this exact same "send a tiny sensor reading every 30 minutes" use case, and this has not been my experience at all. We can run an STM32 and a few sensors at single digit microamps, add an LCD display and a few other niceties and it's one or two dozen. Simply turning on a modem takes hundreds of microamps, if not milliamps. In my experience it's always been better for power consumption to completely shut down the modem and start from scratch each time [1] - which means you're paying to start a new session every time anyway. Now I'll agree it's still inefficient to start up a full TLS session, and a protocol like the one in the post will have its uses, but I wouldn't blame it on NAT.

[1] Doing this of course kills any chance at server-to-device comms, you can only ever apply changes when the device next checks in. This does cause us complaints from time to time, especially for those with longer intervals.
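The trade-off described above is a time-weighted average, which is easy to sketch. Every current and duration below is a hypothetical placeholder rather than a measurement from any real board or modem; only the 30 minute period comes from the comment. Plug in your own datasheet or bench numbers.

```python
def average_current_ua(sleep_ua, active_ua, active_s, period_s):
    """Time-weighted average current in microamps over one reporting period."""
    sleep_s = period_s - active_s
    return (sleep_ua * sleep_s + active_ua * active_s) / period_s


PERIOD_S = 30 * 60  # one sensor reading every 30 minutes

# Hypothetical option A: modem fully powered off between reports.
# Sleep current is the MCU and sensors alone; the active window is longer
# because the modem must boot, register, and set up a session each time.
modem_off = average_current_ua(sleep_ua=10, active_ua=50_000, active_s=10, period_s=PERIOD_S)

# Hypothetical option B: modem left in a low power idle state between reports.
# Sleep current includes an assumed 500 uA modem floor; the active window is shorter.
modem_idle = average_current_ua(sleep_ua=10 + 500, active_ua=50_000, active_s=3, period_s=PERIOD_S)

print(f"modem off between reports:  {modem_off:.0f} uA average")
print(f"modem idle between reports: {modem_idle:.0f} uA average")
```

With these made-up figures the sleep floor dominates, so powering off wins despite the longer wake-up. A lower idle current or a more expensive reconnect would flip the result, which is why the answer depends on the specific modem and network.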

vv_ • 5 months ago

Power Saving Mode (PSM), a power-saving mechanism in LTE, was specifically designed to address such issues. It allows the device to inform the eNB (base station) that it will be offline for a certain period while ensuring it periodically wakes up to perform a Tracking Area Update (TAU), preventing the loss of registration. This concept is similar to Session Tickets or Session IDs in (D)TLS—or at least, that’s how I like to think about it. However, there are no guarantees that the operator will support this feature or that they will support the report-in period that you want!

Maintaining an active session for communication between the endpoint and the edge device is highly power-intensive. Even with (e)DRX, the average power consumption remains significantly higher than in sleep mode. Moreover, the vast majority of devices do not need to frequently ping a management server, as configuration and firmware updates are typically rare in most IoT deployments.

hasheddan • 5 months ago

Great pointer! My sibling post in this thread references a few other blog entries where we have detailed using eDRX and similar low power modes alongside Connection IDs. I agree that many devices don't need to be immediately responsive to cloud to device communication, and checking in for firmware updates on the order of days is acceptable in many cases.

One way to get around this in cases where devices need to be fairly responsive to cloud to device communication (on the order of minutes) but in practice infrequently receive updates is using something like eDRX with long sleep periods alongside SMS. The cloud service will not be able to talk to the device directly after the NAT entry is evicted (typically a few minutes), but it can use SMS to notify the device that the server has new information for it. On the next eDRX check in, the SMS message will be present, then the device can ping the server, and if using Connection IDs, can pull down the new data without having to establish a new session.

glowing12131 • 5 months ago

Is "Non-IP Data Delivery" (basically SMS but for raw data packets, bound to a pre-defined application server) already a thing in practice?

In theory, you get all the power saving that the cellular network stack has to offer without having to maintain a connection. While on protocol layer NIDD is almost handled like an SMS (paging, connectionless), it is not routed through a telephony core (and hence sloooow). The base station / core will directly forward it to your predefined application server.

vv_ • 4 months ago

It has been heavily advertised, but its support is inconsistent. If you are deploying devices across multiple regions, you likely want them to function the same way everywhere.

lxgr • 5 months ago

802.11 supports the same thing. A STA (client) can tell an AP that it'll be going away for some time, and the AP will queue all traffic for the STA until it actively reports back. Broadcast traffic can also be synchronized to particular intervals (but low power devices are usually not interested in that anyway for efficiency reasons).

vv_ • 4 months ago

I have very little experience with Wi-Fi, as the industry I worked in relied almost exclusively on cellular networks. However, I wonder how many Wi-Fi routers actually support this functionality in practice - as queuing traffic means you need to cache it somewhere.

hasheddan • 5 months ago

Author of this post here -- thanks for sharing your experience! One thing I'll agree with immediately is that if you can afford to power down hardware that is almost always going to be your best option (see a previous post on this topic [0]). I believe the NAT post also calls this out, though I believe I could have gone further to disambiguate "sleeping" and "turning off":

> This doesn’t solve the issue of cloud to device traffic being dropped after NAT timeout (check back for another post on that topic), but for many low power use cases, being able to sleep for an extended period of time is more important than being able to immediately push data to devices.

(edit: there was originally an unfortunate typo here where the paragraph read "less important" rather than "more important")

Depending on the device and the server, powering down the modem does not necessarily mean that a session has to be started from scratch when it is powered on again. In fact, this is one of the benefits of the DTLS Connection ID strategy. A cellular device, for example, could wake up the next time in a completely different location, connect to a new base station, be assigned a fresh IP address, and continue communication with the server without having to perform a full handshake.

In reality, there is a spectrum of low power options with modems. We have written about many of them, including a post [1] that followed this one and describes using extended discontinuous reception (eDRX) [2] with DTLS Connection IDs and analyzing power consumption.

[0]: https://blog.golioth.io/power-optimization-recommendations/ [1]: https://blog.golioth.io/turn-off-subsystems-remotely-to-redu... [2]: https://www.everythingrf.com/community/what-is-edrx
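A rough sketch of why a Connection ID lets a device resume after its IP address and port change: the server looks the session up by the ID carried in the record instead of by source address. The `SessionTable` class and its method names are hypothetical and not taken from any DTLS library, and a real server must also verify the record cryptographically before trusting the new address.

```python
class SessionTable:
    """Toy server-side session lookup (hypothetical, for illustration only)."""

    def __init__(self):
        self.by_addr = {}  # (ip, port) -> session, the classic lookup
        self.by_cid = {}   # connection ID bytes -> session

    def establish(self, addr, cid, session):
        """Record a session after a full handshake."""
        self.by_addr[addr] = session
        self.by_cid[cid] = session

    def lookup(self, addr, cid=None):
        """Find the session for an incoming record, or None if a handshake is needed."""
        if cid is not None and cid in self.by_cid:
            session = self.by_cid[cid]
            # A real implementation would only update the peer address after
            # the record has been authenticated with the session keys.
            self.by_addr[addr] = session
            return session
        return self.by_addr.get(addr)


table = SessionTable()
table.establish(("203.0.113.5", 40000), b"\x01\x02", "session-A")

# The device wakes up behind a new NAT mapping (different IP and port).
new_addr = ("198.51.100.9", 51000)
print(table.lookup(new_addr))                   # no CID: address is unknown
print(table.lookup(new_addr, cid=b"\x01\x02"))  # CID: existing session found
```

Without the ID, the second lookup would miss and the device would have to pay for a full handshake again.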

clearint • 5 months ago

This article should clarify at the start whether TCP or UDP is under consideration. NAT idle timeouts for both are typically very different. RFC 5382 [0] specifies no less than 2 hours and 4 minutes for TCP. RFC 4787 [1] specifies no less than 2 minutes for UDP. Towards the end of the article it becomes clear that it's UDP.

The example diagrams also incorrectly show port numbers exceeding 65535. The port fields in TCP and UDP headers are 16 bits [2].

[0]: https://www.rfc-editor.org/rfc/rfc5382 [1]: https://www.rfc-editor.org/rfc/rfc4787 [2]: https://textbook.cs161.org/network/transport.html
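For reference, the numbers above as a small sketch. The two minimum timeouts are the RFC figures quoted in this comment; the `keepalive_interval_s` helper and its 50% safety margin are assumptions for illustration, and real NATs may not honor the RFC minimums at all.

```python
import struct

TCP_MIN_IDLE_S = 2 * 60 * 60 + 4 * 60  # RFC 5382: 2 hours 4 minutes, established TCP
UDP_MIN_IDLE_S = 2 * 60                # RFC 4787: 2 minutes for UDP mappings


def keepalive_interval_s(min_idle_s, safety=0.5):
    """Pick a keepalive period well inside the idle timeout (safety factor is an assumption)."""
    return min_idle_s * safety


def pack_port(port):
    """Encode a port as the 16-bit big-endian field used in TCP and UDP headers."""
    return struct.pack("!H", port)


print(TCP_MIN_IDLE_S, keepalive_interval_s(TCP_MIN_IDLE_S))
print(UDP_MIN_IDLE_S, keepalive_interval_s(UDP_MIN_IDLE_S))
print(pack_port(65535))

try:
    pack_port(65536)  # one past the largest value a 16-bit field can hold
except struct.error as exc:
    print("rejected:", exc)
```

The UDP case is the painful one for battery devices: staying inside a 2 minute window means waking the radio far more often than a 30 minute reporting cycle would otherwise need.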

apitman • 4 months ago

That's interesting. Do NATs in the wild tend to be spec-compliant with their timeouts?

justahuman74 • 5 months ago

Interestingly, IPv6 is not listed as the solution

LegionMammal978 • 5 months ago

An IPv6 router with a stateful firewall blocking incoming connections could have just the same issues with timeouts, I'd imagine. Switching to IPv6 doesn't just mean that anyone can make a P2P connection to anyone else (even STUN needs a third-party server to coordinate the two peers).

(D)TLS session resumption (I'm not sure if their "Connection IDs" are that or something similar) seems like the most foolproof solution to this scenario, assuming that the remote host can support it.
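A toy model of the timeout behaviour in question, which is the same whether the box is a NAT or a plain stateful IPv6 firewall: inbound traffic is only accepted while a recent outbound packet keeps the flow entry alive. The `FlowTable` class is hypothetical, and refreshing only on outbound traffic is a simplifying assumption; real devices vary.

```python
class FlowTable:
    """Toy connection-tracking table with an idle timeout (illustrative only)."""

    def __init__(self, idle_timeout_s):
        self.idle_timeout_s = idle_timeout_s
        self.last_outbound = {}  # flow key -> time of last outbound packet

    def outbound(self, flow, now):
        """The device sends a packet: create or refresh the flow entry."""
        self.last_outbound[flow] = now

    def inbound_allowed(self, flow, now):
        """The server replies: allowed only if the entry has not idled out."""
        seen = self.last_outbound.get(flow)
        if seen is None or now - seen > self.idle_timeout_s:
            self.last_outbound.pop(flow, None)  # evict the stale entry
            return False
        return True


table = FlowTable(idle_timeout_s=120)
table.outbound("device->server", now=0)
print(table.inbound_allowed("device->server", now=60))   # within the timeout
print(table.inbound_allowed("device->server", now=121))  # entry has expired
```

Nothing here depends on address translation, which is the point being made: dropping NAT does not by itself remove the idle timeout.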

namibj • 5 months ago

But it'd be trivial to tell it to free the device from it, unlike with NAT, where you pretty much have to expire sessions to not run out of memory.

LegionMammal978 • 5 months ago

Not if the end user isn't in control of the firewall. (And if they were, then they could just forward dedicated ports for the devices they need.) It might not be as bad as the CGNAT situation, but there are plenty of big WANs that can't be reconfigured at will.

chgs • 5 months ago

I have firewalls in v4 and v6 networks which don’t do any natting (well other than some 6-4 between them). They track sessions for security purposes, and they time them out for both security and memory reasons.

MerManMaid • 5 months ago

>An IPv6 router with a stateful firewall blocking incoming connections could have just the same issues with timeouts, I'd imagine.

You'd be surprised... PCP (Port Control Protocol), implemented by large vendors such as Cisco and Apple, is able to punch through a firewall for up to 24 hours in a single session.

https://github.com/Self-Hosting-Group/wiki/wiki/Port-Mapping...

api • 5 months ago

The weird fear around it is crazy. It’s mostly just bigger IPs and it makes so much complexity and ugly hacks like NAT go away.

withinboredom • 5 months ago

and also so many other things. ARP goes away, dhcp goes away -- yet people reinvented dhcp anyway and did it wrong (IMHO).

I'm of the opinion that IPV6 changed some small things just enough to get people to have to learn new stuff -- and also forgot that NAT is not a firewall, somewhere along the way.

userbinator • 5 months ago

IPv6 is its own complexity.

api • 5 months ago

Why?

positr0n • 5 months ago

Great question that I don't think the other commenters really answered. Sadly I don't really know "why" either. Second system effect?

Instead of making IPv6 just "IPv4 with 128bit instead of 32bit address space" the designers changed a bunch of things. NDP is new and different than ARP. ICMPv6 is extended and different. MLD instead of IGMP for multicast group membership. DHCP is extended and has a bunch of other alternatives, like SLAAC. IPsec is mandatory now.

All the new IPv6 protocols seem more secure, more powerful, and more efficient. So I guess that is "why". But I bet in an alternative history where backwards compatibility and the upgrade process were prioritized IPv6 would be a lot more widespread.

Joel_Mckay • 5 months ago

LegionMammal978 • 5 months ago

gnabgib • 5 months ago

It was 3 months ago: https://news.ycombinator.com/item?id=41884515

vv_ • 5 months ago

Few cellular modems commonly used in IoT support IPv6, and not all mobile network operators provide an IPv6 address. Since cellular connectivity plays a major role in the industry, IPv6 cannot be used as a blanket solution to this problem.

altairprime • 5 months ago

Either it would or it wouldn’t help in most cases, but the absence of consideration of it at all weakens the article’s arguments from a carrier perspective. IPv6 adoption was at 90% by US mobile carriers a couple years ago, and the US is not known for its telco infrastructure investment; so, while using IPv6 may not be a uniform cure for their issues, the article’s total focus on legacy IPv4 NAT issues is in stark contrast to its availability carrier-side in one of the weakest examples available. China regulates that IPv6 be supported and enabled by default on all hardware sold for use in-country since last year, and telcos have six months left until a first-stage IPv4 new-hardware prohibition goes into effect later this year, so the assumption that most cellular modems don’t support IPv6 seems unlikely as well given their regulatory climate. This deserves more research or at least an explanation of why such was not done for the initial release of the paper.

kiwijamo • 5 months ago

There are still countries with zero IPv6 on mobile, e.g. New Zealand. All three mobile carriers refuse to deploy IPv6 on mobile despite two of them doing so on their fixed line networks. Do agree with your point tho IPv6 should be considered a possible solution.

wavesound • 5 months ago

On modern firewalls/routers, NAT is only one of the cooks in the network kitchen raining on this author's parade. Stateful Packet Inspection has timeouts too!

lxgr • 5 months ago

But for TCP, you don't even need to be stateful just to prevent inbound connections, which is a huge win over NATs.

UDP still needs state tracking, unfortunately.

procaryote • 5 months ago

It seems brave to let an IoT device talk to someone over an unencrypted wan though. They're often of pretty varying software quality and rarely updated.

If you really want IoT wifi devices, put them on a separate wifi, and only let them talk to a local device that you can keep up to date. Assume they're vulnerable to local attacks over wifi and act accordingly, e.g. don't give the IoT wifi access to your other devices beyond that controller, and definitely not to the wider internet.

If they're closed source, assume they're already compromised from the factory

vv_ • 5 months ago

The issue in IoT is that most people expect to be able to control their IoT (e.g. Smart Home) devices from outside of their network. This requires you to have a central server these devices communicate with or a similar deployment on-premises with a public IP address.

I've always wondered how it is economically feasible to run these central services without a monthly subscription. If you stop selling devices you'll go under fairly quickly.

leptons • 5 months ago

I have all my "Tuya" IoT devices on a separate network, isolated from my personal intranet, and the cloud servers these devices talk to are in China. I pay nothing to "Tuya" for their cloud service, never have. If the "Tuya" service ever goes down, my home automation is also down, and that sucks. I'm trying to replace it all with "Tasmota" devices now, but it's not quite as easy or as cheap to do.

procaryote • 5 months ago

Yeah, sadly, people care about convenience and not security. So you get things like this:

https://www.malwarebytes.com/blog/news/2024/04/ring-agrees-t...

and somehow they are still in business, and popular.

If you do care about security, keeping your home-automation within your own control is probably the only sane path. Homeassistant and similar open source things like openhab are pretty good if a bit fiddly, and a wireguard vpn like tailscale is a fairly practical way to access it when away from home.

vv_ • 4 months ago

> If you do care about security, keeping your home-automation within your own control is probably the only sane path.

It is also very expensive, so only a minority of informed users would select such a route. Not to mention that it is not trivial to set up.

To my knowledge the only company that has decent security practises is Ubiquiti.

What the article describes is actually fairly standard in the IoT industry. I'm surprised there have been so few lawsuits.

lxgr • 5 months ago

NAT breaks the central idea of IP (i.e. packet switching with stateless intermediary nodes). All of its problems are essentially downstream from there.

kazinator • 5 months ago

What you can do is port forwarding. You have a bunch of devices behind a 1:N NAT, so they share one IP address. For specific services on those devices, you can pair dedicated ports with this IP address, binding them to internal IP:port pairs.

It's not a perfect solution for every scenario, and requires configuration, but there it is.

This is how people on residential lines run web servers, mail servers, ... they map TCP ports like 443, 80, 22 and 25 on their router to go to specific internal hosts.

Doh!
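
A minimal sketch of that mapping, assuming a hypothetical public address (203.0.113.7) and made-up internal hosts; a real router does this in its NAT rules, not in Python:

```python
# Hypothetical public address of the router (documentation range).
PUBLIC_IP = "203.0.113.7"

# (protocol, public port) -> (internal host, internal port).
# The internal addresses are illustrative assumptions.
PORT_FORWARDS = {
    ("tcp", 443): ("192.168.1.10", 443),  # web server (HTTPS)
    ("tcp", 80): ("192.168.1.10", 80),    # web server (HTTP)
    ("tcp", 22): ("192.168.1.20", 22),    # SSH host
    ("tcp", 25): ("192.168.1.30", 25),    # mail server
}


def translate_inbound(proto, dst_port):
    """Return the internal (ip, port) for an unsolicited inbound packet.

    Returns None when no forward is configured, meaning the packet is dropped.
    """
    return PORT_FORWARDS.get((proto, dst_port))
```

Anything without an entry (say, UDP port 5683) has nowhere to go, which is why each service needs explicit configuration.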

bb88 • 5 months ago

With CG-NAT this doesn't work. Multiple customers are sharing the same IP address, all of which are sitting behind a NAT. Further the internet gateway is a NAT sitting behind the CG-NAT. And if you prefer to use a nice Mesh WiFi router, well that's a third NAT layer.

Common suggestions I've heard:

"Use a VPN"

I tried to buy a computer from Apple directly. They detected the VPN and wouldn't let me purchase it. I turned off the VPN and the purchase worked. I then got a call from Apple asking me if I really did intend to buy the computer.

"Use Tailscale"

On my setup, tailscale can't really navigate the NAT setup correctly. I have a 200Mb downlink and I will often only get a 10th of that through tailscale. Also the routing table no longer works as routing is handled by a myriad of netfilter rules -- which may or may not conflict with docker on the same box.

"Use Ubiquiti (or other networking gear) to get rid of the last NAT"

Gah. I didn't need to make a huge cash outlay and rewire the house 10 years ago to do the same thing I'm trying to do today.

kazinator • 5 months ago

I don't have my own external IP address (even if dynamic) + I want my devices to be spontaneously contactable from the network without polling anything from the inside = does not compute

bb88 • 4 months ago

Right and that's what breaks the internet.

procaryote • 5 months ago

True in the general sense, but don't do this for IoT devices. Always remember that the S in IoT stands for Security

johnklos • 5 months ago

This has poorly considered generalizations and reads like an ad.

hasheddan • 5 months ago

An ad for... the IETF? All of the firmware discussed in this post is open source, and we even contributed DTLS Connection ID server side support to a popular open source library [0] so other folks can stand up secure cloud services for low power devices. Sure, we sell a product, but our broader mission is making the increasing number of internet connected devices more secure and reliable. When sharing knowledge and technology is in service of that mission, we do not hesitate to do so.

[0]: https://blog.golioth.io/golioth-announces-connection-id-supp...

VagabundoP • 5 months ago

Wasn't IPv6 supposed to solve all this?

I don't understand why that stalled.

Also, considering the state of IoT security, it's probably not a great idea to have everything accessible anyway. But that's a slightly different problem to solve.

chgs • 5 months ago

It’s not going to solve a stateful firewall timing out.

You try to continue a tcp session that’s timed out on my firewall and the packets will be dropped.

This applies a fair amount to me when I suspend my laptop: my ssh session will drop as both the server and the firewalls drop the session while it sits there peacefully. When it comes back the tcp packets get sent into the void.

Meanwhile my WireGuard connection which runs through two separate ipv4 nats works just fine, as it doesn’t rely on sessions or a server timing out a socket.

Nat is irrelevant to the problem.
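
A rough sketch of the behaviour described above, with no address translation anywhere. The class name, the 300 second timeout and the flow tuple are illustrative assumptions, not any particular firewall's implementation:

```python
from dataclasses import dataclass, field


@dataclass
class StatefulFirewall:
    """Toy connection tracker: no NAT, just per-flow idle timeouts."""

    idle_timeout_s: float
    flows: dict = field(default_factory=dict)  # flow tuple -> last-seen time

    def _expire(self, now):
        # Forget every flow that has been idle longer than the timeout.
        self.flows = {
            flow: seen
            for flow, seen in self.flows.items()
            if now - seen <= self.idle_timeout_s
        }

    def outbound(self, flow, now, syn=False):
        """Inside host sends a packet. Only a SYN may create new state."""
        self._expire(now)
        if syn or flow in self.flows:
            self.flows[flow] = now
            return True
        return False  # mid-session packet for a forgotten flow: dropped

    def inbound(self, flow, now):
        """Outside host sends a packet. Allowed only for tracked flows."""
        self._expire(now)
        if flow in self.flows:
            self.flows[flow] = now
            return True
        return False
```

With `idle_timeout_s=300`, a flow opened at t=0 and last used at t=100 is gone by t=1000, so packets in either direction are dropped until the client opens a new connection.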

VagabundoP • 4 months ago

Why not let through all IPv6 traffic? It seems improbable that your device is going to get caught up in an IPv6 scan; the address space is too vast - I'm assuming here.

Just have relatively secure devices using ipv6 - not shipping with standard default admin passwords would nearly be enough if there aren't obvious vulnerabilities.
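
A back-of-the-envelope check of that assumption, taking a hypothetical scanner that sends one million probes per second and walks the address space blindly:

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def seconds_to_scan(host_bits, probes_per_second):
    """Time to probe every address in a space of 2**host_bits addresses."""
    return 2 ** host_bits / probes_per_second


# Whole IPv4 internet (32 bits) versus a single IPv6 /64 subnet (64 host bits).
ipv4_minutes = seconds_to_scan(32, 1_000_000) / 60
ipv6_years = seconds_to_scan(64, 1_000_000) / SECONDS_PER_YEAR

print(f"IPv4, all addresses: about {ipv4_minutes:.0f} minutes")
print(f"IPv6, one /64: about {ipv6_years:,.0f} years")
```

That works out to roughly 72 minutes for all of IPv4 and several hundred thousand years for one /64. It only covers blind scanning, though: addresses can still leak through DNS, server logs or predictable interface identifiers, so it is not a substitute for a firewall.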

philjohn • 5 months ago

Isn't this the whole point of Thread? You have a low power mesh network for inside a deployment (house, office, factory, whatever) and it's the border router which does the "communicate with the internet/lan/wider network" piece. These are typically plugged in (so no worries about having to be low power) and have plenty of memory (so no worries about having to drop established routes like a router).

gabesullice • 5 months ago

> This doesn’t solve the issue of cloud to device traffic being dropped after NAT timeout (check back for another post on that topic), but for many low power use cases, being able to sleep for an extended period of time is less important than being able to immediately push data to devices.

Is "less important" a typo?

hasheddan • 5 months ago

Yes, thank you for calling out. It should read "more important". This has been corrected.

eternityforest • 4 months ago

Why are we still making IoT devices that talk to anything other than a local hub? They don't need to care about NAT.

arcmechanica • 5 months ago

NAT was a good solution that the IETF came up with; we wouldn't be able to have the internet at our scale without it

p_l • 5 months ago

NAT was introduced by a private company called Network Translation Inc. and successfully broke efforts to migrate off IPv4 (which was supposed to be EOLd by 1990) and permanently broke the "network of hosts" into an asymmetric one of servers and clients.

Note that we had a solution for address exhaustion by 1991, but it was just "good" and not "perfect" and worst of all it used the hated OSI protocol stack (TUBA - TCP & UDP on top of OSI CLNS - also known as IPv9). It even had at least two usable implementations at the time it was proposed (for SunOS and Cisco IOS)

fc417fc802 • 5 months ago

> permanently broke the "network of hosts" into asymmetric one of servers and clients.

That was inevitable and can't reasonably be blamed on NAT. As but a few examples: ISPs arbitrarily break things unless you pay them extra. Stateful firewalls are a good thing. Even on my local LAN I can't reliably SSH into my laptop because it's on WiFi and does funky power saving stuff with the chipset I guess. My phone is far worse than my laptop in that regard.

Setting up a reliable and widely reachable server requires deliberate effort regardless of the existence of NAT.

Hikikomori • 5 months ago

First product, but do the early RFCs and discussions predate their work?

p_l • 5 months ago

There's some work predating it, but it was very much undeployed before PIX let the genie out, because even the early NAT work notes it might be problematic even in the short term.

wmf • 5 months ago

People would have resisted TUBA the same ways they're resisting IPv6 now. It's not a technical problem.

p_l • 5 months ago

It's partly technical (BSD Sockets being a bad API that hardcodes low-level protocol details in applications) and partly business - vendors didn't want to do the work to upgrade software and hardware - especially with the advent of CEF and similar hardware routing options. And by the 1990s the government-led standardisation efforts that gave us widespread ethernet and IPv4 got axed, and efforts to make vendors update if only for federal contracts died in waiver hell.

The other kinds of problems follow from there over time.

PhilipRoman • 5 months ago

The thing I like about NAT is that it is essentially an ISP side stateful firewall. I would migrate to the ipv6 globally addressed mode immediately if my ISP had a checkbox "disallow all incoming connections".

yjftsjthsd-h • 5 months ago

> I would migrate to the ipv6 globally addressed mode immediately if my ISP had a checkbox "disallow all incoming connections".

Does your router not already do that by default?

PhilipRoman • 5 months ago

As another commenter already clarified, I meant CGNAT on the ISP side. I don't believe any ISP currently offers an equivalent firewall on their side.

lxgr • 5 months ago

ISP side? Hopefully not, or rather, only if you're behind one of those awful CG-NATs (and I'm not aware of any that let you actually configure port forwarding, although my knowledge here might be outdated; fortunately I haven't had to deal with one in a long time). Otherwise, it's usually your CPE doing the NATting.

It sounds like you want an ISP-provided stateful firewall though, upstream of your (metered, slow) connection, which I'd agree would be a great feature to have!

fc417fc802 • 5 months ago

It's funny. They'll block things I don't want them to block (email, http server, ...) but not unsolicited inbound IPv6 connections.

kstrauser • 5 months ago

Your router almost certainly has that option.

PhilipRoman • 5 months ago

Of course, it's probably the default everywhere, but with NAT the traffic never reaches me in the first place.

ianburrell • 5 months ago

kstrauser • 5 months ago

Interesting. My ISP passes both IPv4 and IPv6 inbound, expecting you to block them yourself.

nrabulinski • 5 months ago

Unless IPv6 were to be actually adopted as it was introduced

gamedever • 5 months ago

I don't know networking all that well. In my mind, I have 50 devices connected to my router behind NAT. My Mac, My Apple TV, my iPhone, My PC, My Linux Box, My partner's versions of all of those. My video games. Etc

From outside there's 1 IP address. With IPv6, every device would get its own address outside. Why do I want that? That sounds less private to me. Am I misunderstanding something? Lots of traffic on one IP address sounds more obfuscated than all separate.

jeroenhd • 5 months ago

With IPv6, every device has multiple IP addresses. One or more addresses that are rotated* to prevent you from being tracked easily, and one that's derived from your device's MAC address so you can make your devices easily accessible from WAN by opening ports in your firewall if you want to.

You could disable the rotating addresses, or disable MAC-based ones by using DHCP, but there's usually no point.

As for why you would want something like that: a whole bunch of software and hardware breaks because of NAT. Consumer NAT has some monkey patching inside of it rewriting some protocols to make them work again (which also allowed random websites to open arbitrary ports to arbitrary addresses in some Linux routers a while back, because NAT overrules firewall settings to work) but there are still limitations.

For instance, if you're having issues with your Nintendo Switch, Nintendo will tell you to forward every single port to your Switch (https://en-americas-support.nintendo.com/app/answers/detail/..., hope that IP address doesn't get reassigned to an unpatched device later). Multiple Xbox consoles behind the same NAT require tricking them into super-restricted-NAT mode to work, or enabling UPnP, which allows devices to open ports in your firewall without any authentication.

NAT just kind of sucks. IPv6 wasn't ready for deployment when NAT gained popularity, but all of the reasonable problems have been solved over a decade ago.

*=default rotation happens daily, but your OS may allow you to pick a shorter duration. I've found out the hard way that setting this to five minutes will fill up Linux' route table real fast after a few days.
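
For the MAC-derived address, here is a small sketch assuming the classic modified EUI-64 scheme from RFC 4291; many current systems generate an opaque stable identifier instead, so treat this as one possible derivation. The prefix and MAC address are made-up examples:

```python
import ipaddress


def eui64_address(prefix, mac):
    """Build a SLAAC-style address from a /64 prefix and a 48-bit MAC.

    Modified EUI-64: flip the universal/local bit of the first octet and
    insert ff:fe between the two halves of the MAC.
    """
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    octets[0] ^= 0x02  # flip the universal/local bit
    interface_id = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("expected a /64 prefix")
    # Indexing a network returns network_address + n.
    return network[int.from_bytes(interface_id, "big")]


print(eui64_address("2001:db8:1:2::/64", "00:11:22:33:44:55"))
```

The rotating addresses use a random interface identifier instead, which is why they cannot be tied back to the hardware.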

Hikikomori • 5 months ago

Marsymars • 5 months ago

> From outside there's 1 IP address. With IPv6, every device would get its own address outside. Why do I want that? That sounds less private to me. Am I misunderstanding something? Lots of traffic on one IP address sounds more obfuscated than all separate.

Having recently enabled IPv6 for my home network, the "why" was that a) IPv6 to IPv6 connections are nominally more efficient than those that have to traverse NAT and b) it enables connectivity to/from IPv6-only internet devices.

The privacy upsides of a single IPv4 IP for a household are, to me, more marginal than the above benefits.

wmf • 5 months ago

I'm pretty sure the IETF fought against NAT.

hermanubis • 5 months ago

Why not just have the server send a token to the client that the client includes in the next request?

iforgotpassword • 5 months ago

Because by that point the tcp session is already broken.

withinboredom • 5 months ago

UDP?

londons_explore • 5 months ago

> This doesn’t solve the issue of cloud to device traffic being dropped after NAT timeout (check back for another post on that topic),

This is the key problem. I think it would be best solved by NAT devices having some way of probing their timeout policy, and then notifying the endpoints when they expire a mapping.

One way to do that might be for a client to deliberately send a packet to the server with an insufficient TTL.

The reply (via ICMP) could then contain fields specifying the timeout in minutes of the mapping, together with some generation number specifying if the mapping table has been cleared due to reboot or overflowed since last queried.

There might even be a way to request a specific mapping become long lived - perhaps for months or years.

The benefit of all this is that client devices can do push notifications from servers or P2P notifications, all with no polling - allowing for example coin cell devices to last for months or years (with appropriate WiFi protocol improvements too)

mannyv • 5 months ago

People have been complaining about NAT for decades. It's time to STFU. It's a well-known problem with a bunch of well-known solutions.

If you don't like NAT you could go IPv6.

But really, why do you need to talk to your device? If it's just reporting in, NAT is irrelevant. If you want to do device management just write something into your protocol to check for updates/commands and deal with it on a periodic basis. You can even do that on startup, so you can tell the customer to power cycle the device. It's unlikely that any IoT device needs instant updates, so long periodic updates are probably fine.

dave78 • 5 months ago

There are plenty of IoT devices that people want to execute commands on (anything remotely controlled, basically). Polling for commands on a periodic basis introduces lag into that process which is irritating. Furthermore, polling at a frequent interval can end up using a lot of power as well versus waiting in a receive-only mode for an incoming command.

raggi • 5 months ago

The alternative to polling is unfortunately polling, which is what the article is about.

You can avoid polling for messages, but you have to send packets outbound regularly in order to maintain a NAT mapping & connection, so that the external side can send messages inward.

The latency is overcome this way, so latency is a solvable problem, but this need to constantly wake up a radio every <30s in order to keep a NAT session alive is a significant power draw.

In theory you might be able to avoid this with NAT-PMP / UPnP; however, their deployments are inconsistent and their server side implementations are extremely buggy.
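
To put a number on the wake-ups, a quick sketch. The 5 second safety margin is an arbitrary assumption, and the real cost depends on the radio and on how long each wake-up lasts:

```python
import math

SECONDS_PER_DAY = 24 * 3600


def keepalives_per_day(nat_timeout_s, safety_margin_s=5):
    """Radio wake-ups per day needed to refresh a NAT mapping in time.

    The device sends a keepalive safety_margin_s seconds before the
    mapping would expire.
    """
    interval_s = nat_timeout_s - safety_margin_s
    if interval_s <= 0:
        raise ValueError("safety margin must be smaller than the NAT timeout")
    return math.ceil(SECONDS_PER_DAY / interval_s)


print(keepalives_per_day(30))   # 30 s mapping timeout
print(keepalives_per_day(300))  # 5 min mapping timeout
```

Under these assumptions a 30 second timeout costs 3456 wake-ups a day, against 293 for a five minute timeout.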