2021-07-06

Storing IP's and networks in SQL Server efficiently

When you Google on how to store IP addresses efficiently in SQL server few approaches keep coming up. Most of these approaches most notably tend to 'forget' about IPv6 and work exclusively for IPv4 IP addresses. Other approaches use a (var)char to store IP addresses and this works fine, as long as your (var)char allows up to 39 characters if you want to be able to handle IPv6 IP addresses. It'll also work fine whenever all you need to do is store an IP address or maybe even search for a specific IP address, such as in logging scenarios, where you can write SELECT * FROM MYTABLE WHERE [ip] = '192.168.0.1'. However, when storing networks (such as 192.168.0.0/24 in CIDR notation) things get a bit harder. If you want to find all networks that 'contain' a given network you're in for some trouble. Another solution is to store IP's as (var)binary. Again, most solutions I found only worked with IPv4 but it's perfectly possible to store both IPv4 and IPv6 IP's in a single (var)binary field.

Now let's assume we want to search our networks for any network containing a given IP. You could retrieve all records from your table, rehydrate them all, and iterate over them to find any matching networks in code. But that may be a very expensive operation if you have lots of networks. Ideally we want SQL to do the heavy lifting and only return records we're actually interested in.

Turns out this is quite possible without even needing UDF's. For a given network we store the "Prefix" (which is, or rather, should be) the first IP in the network and the prefixlength. So for 192.168.0.0/24 we store the prefix (192.168.0.0) and the prefixlength (24) but we also store the 'last' IP in the network; in this example that would be 192.168.0.255. If we store the IP's as (var)binary it turns out that SQL Server is perfectly happy using the BETWEEN operator on these fields. And, yes, that also works with IPv6 IP addresses which are 16 bytes of length. To use this with Entity Framework Core we will need to 'massage' EF a little but nothing too wild. Let's see what it takes...

Sidenote: We'll create a project using Entity Framework Core but that is not a requirement; the technique (using (var)binary) to store IP addresses will work just fine on it's own and as long as you represent IP addresses in hexadecimal form in your queries you'll be good to go. Also we're assuming you're using Visual Studio 2019 or later and .Net 5.0 or later. Again, neither of these are a requirement for the technique to work.

First: let's set up our solution and project: Create a directory, say IPNetworkEFCore, open a command prompt or terminal and CD into the directory. Then run the following commands to create a solution, project and add the project to the solution:

dotnet new sln
dotnet new console
dotnet sln add .\IPNetworkEFCore.csproj

Let's also add some packages we'll require:

dotnet add package Microsoft.EntityFrameworkCore.Design
dotnet add package Microsoft.EntityFrameworkCore.SqlServer

Now open the solution. The following is optional, but bear with me: Double-click the project (IPNetworkEFCore) in your solution explorer. Add the following two XML elements to the PropertyGroup element:

<LangVersion>latest</LangVersion>
<Nullable>enable</Nullable>

Let's start defining our entity. Create a class named Network and add the following code:

Now, things aren't gonna be -this- easy, it won't be too bad. Let's change our entity a bit:

We added a static _none field which will be our "No IP address" value. We're also going to add a _prefixbytes backing field and _last private property. Essentially, these are going to hold the IP address bytes; the _prefixbytes being the "start" of the network and _last being the "last" IP address in the network. To calculate the last IP address we're adding the CalcLast() and CreateMask() methods. And finally, we tell EF that we don't want to map the Prefix and Last properties by adding a [NotMapped] attribute. And that's about as hairy as it gets.

What's going on here?

The _none field is just to keep the compiler happy and make sure the Prefix property is always initialized. The [NotMapped]attributes ensure the Prefix and Last properties are not mapped (d'uh) to the underlying table; instead we use the _prefixbytes and _last field/property. These contain the bytes that make up the first and last IP address in the network. Since the last address can be calculated from the Prefix and PrefixLength we can get away with a "No-op" setter (or rather: initializer in this case; did you notice we use a record instead of a class?). The bulk of the entity is now made up of the CalcLast() and CreateMask() methods (and those could be easily moved elsewhere to keep the entity clean(er)).

Now, I like to keep my entities with my EntityTypeConfiguration. So we add an internal NetworkConfig class that tells EF to map the _prefixbytes backing field and _last property to the Prefix and Last database fields and we also specify an index.

Add a new class named IPAddressExtensionMethods. We'll add an extension method to convert an IP address to a hexadecimal representation in this class:

Next up is our DbContext; add a class named IPNetworkTestDBContext and add our Networks DBSet. Nothing out of the ordinary here. Yet. Make sure you change the connectionstring to whatever you need.

Next we create our initial migration and create the database by running:

dotnet ef migrations add Initial
dotnet ef database update

We should now have a database named IPNetworkTest containing a Networks table. And... that's about it. Most of the hard work is done!

We're now ready to test it. For testing purposes we'll add two "convenience methods" to our DBContext; a ClearNetworks() method to quickly clear the Networks table and a FindNetworksContaining(IPAddress) method to find networks containing the given IP address. Usually you won't need the former method and the latter I would usually add to a repository or something similar and not to the DBContext.

Remember: from here on we're just testing and playing with our code. First we add some networks to the database. The description of every network is a string representation of what is, essentially, in the Prefix + PrefixLength fields. The Last field is only needed to simplify searching. Every time you run the program the Networks table will be cleared and the networks will be added.

Let's have a gander at our table in SSMS:

Table contents

So far, so good. Everything seems to work. Now let's see if we can find a network based on an IP address that is in one of the networks. We add a bit more code and run it to see the testresults.

Results

Seems about right to me! Now, I haven't tested this more thoroughly yet, but I am open to suggestions and/or improvements. Just send me a PR and we'll take it from there.

There is one small-ish issue with this code; technically a network like 10.0.0.15/24 doesn't make much sense; it's the same as 10.0.0.0/24 - both have a first IP of 10.0.0.0 and last IP of 10.0.0.255. This isn't handled by the current code. We could fix this pretty easily, however, I think I prefer to be able to reconstruct the original value(s) when storing a network. I don't think it's the entity's responsibility, nor the datalayers' responsibility as a matter of fact, to ensure a network makes sense. I think that should be handled elsewhere.

You can find the entire project here. You'll find that I added an example for a single IP address and you'll see it's a lot easier!

Happy coding!

P.S. Yes, I know PostgreSQL has inet and cidr datatypes - for quite some while now actually.

No comments:

Post a Comment