Friday, June 22, 2012

Searching point clouds - Hashes vs Trees

About a year ago I was sitting in the lawn outside of the Google buildings in Mountain View, discussing the project I was working on at the time. I had the fun job of displaying very large (60GB +) NetCDF data sets in NASA WorldWind. We were approaching it from the classic game developer perspective and considering building an octree or hyper-octree (this data had temporal slices) index over the dataset to quickly locate subsections we wanted to display at the proper detail. As an aside we began to consider what the limits of the disk to screen transmission are. A 1920x1200 screen with RGB data will have about 6.6 MB of data uncompressed, with compression this will come down a bit. At 60 FPS this requires about 3 Gbit/s disk bandwidth, compare this to what is available via SATA 3. SATA 2 will definitely be too slow adding the times required to search the index, perform seeks and read the pre-computed chunk. It might look better with highly redundant data and good compression.

The idea here comes down to searching using 2 different methods - Hash Search vs Binary Search. Octree based rendering is essentially a binary search in 3 dimensions. The other way to render would be to hash the camera position and orientations or view frustrums, create indexes and blocks of the point cloud being rendered to load chunks as the camera is moved around and data block required is located on the disk using the index + hash. The gamers keep harping on about polygon counts without looking at how the facade of the very large word-count of the internet can be quickly acessed via hashing the query, similarly the first surface of a large object count scene can be quickly accessed via hashing the camera.

All this brings me to the point of the conversation. In the last few days a bit of a drama played out in the hard-core gamer community regarding a leak from Euclideon, who make engines not websites. Hash based searching will work quite well for static data and unlike trees will not require rebalancing for dynamic data. Finding a nice unique hash for frustrums can get quite tricky and storing and compressing the results in a nice data format will also be a challenge. Building a game engine from scratch without all the blocks the graphics world has put in place is a Radical Rewrite. May be it will work, then again we live in a universe where there are 4 dimensions and no one asks how to compute the volume of a hyper-sphere. Knowing e=mc^2 does not let you build an atom bomb or a nuclear reactor, even figure out how to avoid Fukushima. Engineering simply is not the same as Mathematics.

No comments: