Optimum search with geospatial coordinates

Joshua Marsh joshua at themarshians.com
Wed Mar 19 07:30:28 MDT 2014


On Wed, Mar 19, 2014 at 12:39 AM, Tod Hansmann <plug.org at todandlorna.com>wrote:

> I wasn't suggesting a database, that's just silly.  I also work in
> resource constrained environments (sometimes as constrained as a PIC18),
> but that doesn't mean I wouldn't setup an in-memory tree of some sort
> (instead of the array/vector) if I needed to do a search a lot of times.
>  Even a linked list could be sorted in various ways. Anyway, if it's an
> array/vector there's really no avoiding checking every single item, because
> there's nothing saying the next item can't have a shorter distance result.
>  It's an easy calculation and all that, but it's still going to be O(n)
> regardless of the approach.
>
>
I think Tod has some good points about this. If the processing speed isn't
up to snuff, an O(n) loop could be very time consuming even with just a
thousand iterations. You probably won't have issues at a hundred, but you
should test the speed. Threads may help, but only if you actually have
multiple processors/cores to do the work. Otherwise, the threading will
just consume extra resources.

R-Tree and it's variants are all fairly common for this task. If the data
points change infrequently (think a list of stores for an app that may only
be updated weekly), you should be able to serialize the R-Tree so you don't
have to rebuild it every time you load the application.

Again, though, I'd suggest you do some real testing on a system that has
the resources you expect. On my laptop, the difference between R-Tree and
iterating over an array is unnoticeable for hundreds of records. If I do
millions of records though, it starts to become clear that the R-Tree is
faster. Just like most optimizations, it's a trade-off between the pain
your customers might feel and the pain you'll feel. I think though you
should wait to make a decision until you can empirically say that your
vector iteration is going to be a pain point for the average/worst case
scenarios.


More information about the PLUG mailing list