Finding trips in a mess of events, with confidence.
As part of the GPS / GSM Tracker I am working on I send data back to a server periodically to be parsed and stored in a database. This data includes things such as the time, position, speed, altitude, HDOP, etc. In the context of this project, I refer to each set of data as an event.
Currently, every event is lumped into a single database table and is identified by an **EID** (Event Identifier) and a DID (Device Identifier, the device ID which is sent as part of the event). This is all well and good, as I can look through the table and see what each device is doing at any given time. However, what I want to be able to do is pick out individual **Trips**. For example, if I put the tracker into the car and drive from my house to the local shopping centre, every event recorded in that time should form a single **Trip**. I want to be able to do this intelligently, without any human input…
The solution has so far turned out to be quite simple: write a class that does the following:
1. Load all events in the events table (I am looking at implementing a marker so that we only parse events that still need it).
2. Discover each individual device that has an event in that table (planning for multiple devices).
3. Scan through each device's events.
4. Mark the first event as the start of a trip.
5. Run a series of simple tests to determine when we have located the event that will act as the far trip boundary.
6. Create a new Trip object and return to step 4.
7. Repeat until there are no more events and we have scanned through all devices.
A Trip then becomes a subset of that data, encompassing all events between the *near trip boundary* (e.g. Event ID 231) and the far trip boundary (e.g. Event ID 276).
    near-trip boundary of trip 1: EID 231
    - EID 232 is part of trip 1
    - EID 233 is part of trip 1
    ...
    - EID 275 is part of trip 1
    far-trip boundary of trip 1: EID 276
    near-trip boundary of trip 2: EID 277
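The scan described above can be sketched roughly as follows. This is a minimal C++ sketch and not the actual class: the `Event` and `Trip` structs, the `findTrips` function and the `BoundaryTest` callback are all hypothetical names, and the decision about what counts as a far trip boundary is left entirely to the caller.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <vector>

// Hypothetical event record; the real table holds more columns.
struct Event {
    int eid;  // Event Identifier
    int did;  // Device Identifier
};

// A trip is described here only by its two boundary EIDs.
struct Trip {
    int did;
    int nearEid;  // first event of the trip
    int farEid;   // last event of the trip
};

// Caller-supplied test: should events[i] close the current trip?
using BoundaryTest =
    std::function<bool(const std::vector<Event>&, std::size_t)>;

std::vector<Trip> findTrips(const std::vector<Event>& allEvents,
                            const BoundaryTest& isFarBoundary) {
    // Group events by device, preserving their order in the table.
    std::map<int, std::vector<Event>> byDevice;
    for (const Event& e : allEvents) byDevice[e.did].push_back(e);

    std::vector<Trip> trips;
    for (const auto& entry : byDevice) {
        const std::vector<Event>& events = entry.second;
        std::size_t start = 0;  // index of the near trip boundary
        for (std::size_t i = 0; i < events.size(); ++i) {
            if (isFarBoundary(events, i)) {
                trips.push_back({entry.first, events[start].eid, events[i].eid});
                start = i + 1;  // the next event opens the next trip
            }
        }
        // Events after the last boundary belong to a trip that is
        // still open, so no Trip is emitted for them.
    }
    return trips;
}
```

Passing the boundary test in as a callback keeps the scanning loop separate from the confidence logic, so either can be changed without touching the other.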
From that information we can calculate a whole host of other data, such as average speeds and elevations, analyse the speed data, and plot the route on a map.
Seems simple enough, right? The key problem is exactly how we determine that we have located the far trip boundary. As mentioned, I run a series of simple tests to decide whether we have found the boundary. Using these tests I can assign a confidence level to a candidate trip boundary; if the confidence level is greater than a certain value, it is reasonable to assert that we have located the trip boundary (or at least are pretty close).
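As a rough illustration of the kind of per-trip data that falls out of this, here is a small sketch. The `Sample` and `TripStats` structs and the `summarise` function are hypothetical, and the units (km/h and metres) are assumptions, since the post does not say what the tracker reports.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical per-event fields; units are assumed, not confirmed.
struct Sample {
    double speedKmh;
    double altitudeM;
};

struct TripStats {
    double avgSpeedKmh;
    double minAltitudeM;
    double maxAltitudeM;
};

// Summarise the events of one trip. An empty trip yields all zeros.
TripStats summarise(const std::vector<Sample>& samples) {
    TripStats stats{0.0, 0.0, 0.0};
    if (samples.empty()) return stats;

    stats.minAltitudeM = samples.front().altitudeM;
    stats.maxAltitudeM = samples.front().altitudeM;
    double totalSpeed = 0.0;
    for (const Sample& s : samples) {
        totalSpeed += s.speedKmh;
        stats.minAltitudeM = std::min(stats.minAltitudeM, s.altitudeM);
        stats.maxAltitudeM = std::max(stats.maxAltitudeM, s.altitudeM);
    }
    // Mean of the reported speeds, not total distance over total time.
    stats.avgSpeedKmh = totalSpeed / static_cast<double>(samples.size());
    return stats;
}
```

Note that averaging the reported speeds weights every event equally, which is only a fair average if events arrive at roughly regular intervals.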
I am running two basic types of tests: time-based and location-based. The time-based tests check whether a certain amount of time has elapsed since the last event. The more time that has elapsed, the more confident we can be that the previous event is a trip boundary.
The same goes for location: if the tracker hasn’t moved very far (allowing for GPS inaccuracies), then we are probably starting a new trip. On its own this doesn’t make us confident enough to say it’s a new trip. However, if the tracker hasn’t moved more than 100 m and it’s been more than 24 hours since the last event, then we can be pretty confident that it’s a new trip (and so place the far trip boundary at the previous event).
That is my reasoning for building up confidence levels. As an example, here are the tests I am currently using:
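The location tests need a distance between two fixes. The post does not show how `calculateDistance` is implemented, so the following is only one plausible version, using the haversine great-circle formula on a spherical Earth. The argument order, the use of degrees for input and metres for output, and the Earth radius constant are all assumptions.

```cpp
#include <algorithm>
#include <cmath>

// Great-circle distance in metres between two lat/lon points given in
// degrees, using the haversine formula. Assumes a spherical Earth.
double calculateDistance(double lat1, double lon1,
                         double lat2, double lon2) {
    const double kEarthRadiusM = 6371000.0;  // mean radius (assumed)
    const double kDegToRad = 3.14159265358979323846 / 180.0;

    const double dLat = (lat2 - lat1) * kDegToRad;
    const double dLon = (lon2 - lon1) * kDegToRad;
    const double sinLat = std::sin(dLat / 2.0);
    const double sinLon = std::sin(dLon / 2.0);

    double a = sinLat * sinLat +
               std::cos(lat1 * kDegToRad) * std::cos(lat2 * kDegToRad) *
                   sinLon * sinLon;
    a = std::min(1.0, a);  // guard against rounding just above 1

    return 2.0 * kEarthRadiusM * std::asin(std::sqrt(a));
}
```

At the 100 m scale discussed here the spherical approximation is far more precise than the GPS fix itself, so a more elaborate ellipsoidal formula is unlikely to change the outcome of the tests.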
    // First we check time
    if (difftime(eventTime, lastEventTime) > TIME_1) confidence += TIME_1_CONF;
    if (difftime(eventTime, lastEventTime) > TIME_2) confidence += TIME_2_CONF;
    if (difftime(eventTime, lastEventTime) > TIME_3) confidence += TIME_3_CONF;

    // Now check location
    if (calculateDistance(lat, lon, lastLat, lastLon) < LOC_1) confidence += LOC_1_CONF;
    if (calculateDistance(lat, lon, lastLat, lastLon) < LOC_2) confidence += LOC_2_CONF;
    if (calculateDistance(lat, lon, lastLat, lastLon) < LOC_3) confidence += LOC_3_CONF;
Here LOC_*n* and LOC_*n*_CONF are the location thresholds and the confidence that should be added for each, respectively. The same goes for time.
Now this is all well and good, but what happens if, say, the event is the last in the DB table? It would be ignored, because there is no event after it to compare it with. To get around this, the last event in the table gets a fairly high confidence level (because if there are no more events, the tracker is probably disabled).
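To make the tiered scoring concrete, here is a self-contained sketch of those tests as a single function. The constant names follow the snippet above, but every numeric value is an illustrative assumption (the actual thresholds and weights are not given in this post), and `boundaryConfidence` is a hypothetical name.

```cpp
#include <ctime>

// Illustrative values only; the real thresholds are not stated.
constexpr double TIME_1 = 5 * 60.0;        // 5 minutes, in seconds
constexpr double TIME_2 = 60 * 60.0;       // 1 hour
constexpr double TIME_3 = 24 * 60 * 60.0;  // 24 hours
constexpr int TIME_1_CONF = 1, TIME_2_CONF = 2, TIME_3_CONF = 3;

constexpr double LOC_1 = 500.0;  // metres
constexpr double LOC_2 = 200.0;
constexpr double LOC_3 = 100.0;
constexpr int LOC_1_CONF = 1, LOC_2_CONF = 1, LOC_3_CONF = 2;

// Confidence that the previous event was a far trip boundary, given
// the gap in time and distance between it and the current event.
int boundaryConfidence(std::time_t eventTime, std::time_t lastEventTime,
                       double distanceM) {
    const double elapsed = std::difftime(eventTime, lastEventTime);
    int confidence = 0;

    // Longer gaps pass more tiers, so the scores accumulate.
    if (elapsed > TIME_1) confidence += TIME_1_CONF;
    if (elapsed > TIME_2) confidence += TIME_2_CONF;
    if (elapsed > TIME_3) confidence += TIME_3_CONF;

    // Smaller movements pass more tiers in the same way.
    if (distanceM < LOC_1) confidence += LOC_1_CONF;
    if (distanceM < LOC_2) confidence += LOC_2_CONF;
    if (distanceM < LOC_3) confidence += LOC_3_CONF;

    return confidence;
}
```

With these assumed values, a gap of more than 24 hours with under 100 m of movement passes every tier, while a 10 second gap with 1 km of movement passes none. Computing the elapsed time and distance once, rather than once per test, gives the same result and avoids repeating the distance calculation.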
if (isLastRow) confidence += 5;
Again, this won’t be enough on its own; it needs some time-based tests as well to push the confidence over the threshold.
We also have another problem: what if the tracker last updated 25 seconds before we stopped, and that was 1 km away? Then the location-based tests won’t be any good at all. So, if none of the distance tests are satisfied, or if it is the last row in the table, we run the same time-based tests, except this time comparing with the current time:
    if ((calculateDistance(lat, lon, lastLat, lastLon) > LOC_1) || (isLastRow)) {
        if (difftime(now, eventTime) > TIME_1) confidence += TIME_1_CONF;
        if (difftime(now, eventTime) > TIME_2) confidence += TIME_2_CONF + 1;
        if (difftime(now, eventTime) > TIME_3) confidence += TIME_3_CONF + 1;
    }
The logic behind this is that if the event is the last in the table, and that event was 12 hours ago (against the current time), we can reasonably assume that the last event is actually a far-trip boundary. Also, if none of the location tests pass but the last event was quite some time ago (compared to the current time), we can assume that the distance wasn’t entirely accurate and so boost its confidence.
In testing so far (which, to be honest, is quite limited) this has given good results:
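The last-row bonus and the comparison against the current time can be combined into one small function, sketched below. The `+ 5` bonus and the `+ 1` boosts come from the snippets above; the threshold and weight values, the `LAST_ROW_CONF` name and the `idleConfidence` function are assumptions made for illustration.

```cpp
#include <ctime>

// Illustrative values only; the real thresholds are not stated.
constexpr double TIME_1 = 5 * 60.0;        // 5 minutes, in seconds
constexpr double TIME_2 = 60 * 60.0;       // 1 hour
constexpr double TIME_3 = 24 * 60 * 60.0;  // 24 hours
constexpr int TIME_1_CONF = 1, TIME_2_CONF = 2, TIME_3_CONF = 3;
constexpr double LOC_1 = 500.0;            // metres
constexpr int LAST_ROW_CONF = 5;           // the "+= 5" from the post

// Extra confidence for an event that has been followed by silence.
// `now` is a parameter so the result is reproducible in tests.
int idleConfidence(std::time_t now, std::time_t eventTime,
                   double distanceM, bool isLastRow) {
    int confidence = 0;
    if (isLastRow) confidence += LAST_ROW_CONF;

    // Only fall back to wall-clock time when the distance tests
    // cannot help or there is no later event to compare against.
    if (distanceM > LOC_1 || isLastRow) {
        const double idle = std::difftime(now, eventTime);
        if (idle > TIME_1) confidence += TIME_1_CONF;
        if (idle > TIME_2) confidence += TIME_2_CONF + 1;
        if (idle > TIME_3) confidence += TIME_3_CONF + 1;
    }
    return confidence;
}
```

Taking the current time as an argument, instead of calling `time()` inside the function, means the same inputs always give the same score.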
    Tribal Chicken Systems Trip Processor starting up
    Discovering devices...
    Found device: 1
    -----
    Beginning scan...
    Found far trip boundary at EID 255 (Device ID 1), confidence 10
    Found far trip boundary at EID 315 (Device ID 1), confidence 8
    Found far trip boundary at EID 333 (Device ID 1), confidence 8
    Found far trip boundary at EID 360 (Device ID 1), confidence 10
    Found far trip boundary at EID 412 (Device ID 1), confidence 10
    Found far trip boundary at EID 466 (Device ID 1), confidence 8
    Found far trip boundary at EID 557 (Device ID 1), confidence 7
    Found far trip boundary at EID 558 (Device ID 1), confidence 10