B-Districting: the smallest way to store data

2011-03-11

the smallest way to store data

is not to store it at all.
A unique id for a US Census block winds up being 15 decimal digits, which fits handily into an 8 byte int.
Actually there are only on the order of ten million blocks in the US, so an index into the list of blocks could easily be a 32 bit number.
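A quick sanity check of those sizes, sketched in Python. The sample ID below is made up for illustration (2 digits of state, 3 of county, 6 of tract, 4 of block), and the little-endian layout is just an assumption of this sketch, not anything the Census prescribes.

```python
import struct

# Hypothetical 15-digit block ID: state(2) + county(3) + tract(6) + block(4).
block_id = 481130001001000

# The largest 15-digit number is far below the signed 64-bit limit.
assert 10**15 - 1 < 2**63 - 1
packed_id = struct.pack("<q", block_id)       # signed 64-bit, 8 bytes

# A position in a list of roughly ten million blocks fits in 32 bits.
assert 10_000_000 < 2**32
packed_index = struct.pack("<I", 9_999_999)   # unsigned 32-bit, 4 bytes

print(len(packed_id), len(packed_index))      # prints: 8 4
```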
But if what I really want to do is store a mapping from each block to its district number (easily a 1 byte number), the smallest way to store this is just a list of district numbers. Use the Census data file as a canonical ordering of the blocks, so that a district number's position in the list says which block it belongs to.
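Here is a minimal sketch of that idea in Python. The function names and the sample block IDs are hypothetical, and it assumes both sides already hold the same block list in the same order and that district numbers fit in 0..255.

```python
def encode_districts(block_ids, assignment):
    """Return one byte per block, in the canonical order given by block_ids."""
    # bytes() raises ValueError if any district number is outside 0..255.
    return bytes(assignment[b] for b in block_ids)


def decode_districts(block_ids, payload):
    """Rebuild the block -> district mapping from the bare byte list."""
    if len(payload) != len(block_ids):
        raise ValueError("payload length does not match block list")
    return dict(zip(block_ids, payload))


# Made-up block IDs, listed in the order the Census file would give them.
canonical_order = ["480019501001000", "480019501001001", "480019501001002"]
assignment = {
    "480019501001000": 5,
    "480019501001001": 5,
    "480019501001002": 17,
}

payload = encode_districts(canonical_order, assignment)
restored = decode_districts(canonical_order, payload)
print(len(payload), restored == assignment)   # prints: 3 True
```

The block IDs never travel with the data. The cost is that sender and receiver have to agree on exactly the same ordering, so a length check like the one above is the least you want.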
CSV for this becomes 15 decimal digits, comma, one to three decimal digits, newline. That is 18 to 20 bytes per block vs 1.
For the hundreds of thousands of blocks in Texas, the gzipped CSV is a 2372 KB file. The gzipped byte list is 32 KB.
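The comparison is easy to try on synthetic data. This sketch does not use the real Texas file: the block count, the made-up IDs, the district count, and the assumption that neighboring blocks come in runs sharing a district are all placeholders, so the sizes it prints will not match the numbers above.

```python
import gzip

NUM_BLOCKS = 100_000   # synthetic; not the real Texas block count
NUM_DISTRICTS = 36     # arbitrary value for illustration
RUN_LENGTH = 500       # assume neighboring blocks usually share a district

# Made-up 15-digit IDs in a fixed canonical order.
block_ids = [480000000000000 + i for i in range(NUM_BLOCKS)]
districts = [(i // RUN_LENGTH) % NUM_DISTRICTS + 1 for i in range(NUM_BLOCKS)]

# Format 1: "block_id,district\n" per block.
csv_data = "".join(
    f"{b},{d}\n" for b, d in zip(block_ids, districts)
).encode("ascii")

# Format 2: one byte per block, order implied by the canonical list.
byte_list = bytes(districts)

gz_csv = gzip.compress(csv_data)
gz_bytes = gzip.compress(byte_list)

print("raw CSV:          ", len(csv_data), "bytes")
print("raw byte list:    ", len(byte_list), "bytes")
print("gzipped CSV:      ", len(gz_csv), "bytes")
print("gzipped byte list:", len(gz_bytes), "bytes")
```

gzip squeezes a lot of redundancy out of the CSV, but it still has to encode every block ID, while the byte list never contained them in the first place.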
Sadly, a CSV file in a .zip archive seems to be the common interchange format for these things.
At least I get to use my format between my client and my server.
