Extracting GPS Data

In the case of a $GPGSV header, the number of satellites in view is the fourth entry. In the case of a $GPRMC header, we have more interesting information: the second field is the timestamp, the fourth field is the latitude, the sixth field is the longitude, and the eighth field is the velocity (speed over ground, in knots). Again, refer to the NMEA 0183 specification for more details. Table 1-1 summarizes the fields and their values in a $GPRMC line.

Table 1-1. $GPRMC Information (Excerpt)

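Field Name     Value
Header         $GPRMC (fixed)
Timestamp      hhmmss.ss (UTC)
Latitude       DDMM.MMMM
N/S            N or S
Longitude      DDDMM.MMMM
E/W            E or W
Velocity       Speed over ground, in knots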

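As a sketch of how these field positions map onto Python indices, the snippet below splits the example $GPRMC line used in this section with str.split(','). It assumes the row was produced from raw comma-separated text (the chapter's own CSV reader is not shown here), and RMC_FIELDS and split_sentence are hypothetical names chosen for illustration.

```python
# Hypothetical name-to-index map for the $GPRMC fields discussed above.
# Python counts from zero, so the "second field" is index 1, and so on.
RMC_FIELDS = {
    "header": 0,
    "timestamp": 1,
    "latitude": 3,
    "north_south": 4,
    "longitude": 5,
    "east_west": 6,
    "velocity": 7,
}

def split_sentence(line):
    """Split one raw NMEA sentence into its comma-separated fields."""
    return line.strip().split(",")

# The example $GPRMC line from this section, as raw comma-separated text.
line = "$GPRMC,140055.00,A,4454.1740,N,09325.0143,W,000.0,128.7,300508,001.1,E,A*28"
row = split_sentence(line)
for name, index in RMC_FIELDS.items():
    print(name, row[index])
```

The resulting list is exactly the one shown in the examples that follow.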
A few caveats apply to the information in $GPRMC. First, consider the timestamp of an arbitrary line:

['$GPRMC', '140055.00', 'A', '4454.1740', 'N', '09325.0143', 'W', '000.0', '128.7', '300508', '001.1', 'E', 'A*28']

In this output, the timestamp appears as '140055.00'. It follows the format hhmmss.ss, where hh is the hour as two digits (always two: 7 in the morning is written as 07), mm is the minute as two digits, and ss.ss is five characters (four digits plus the decimal point) giving the seconds and fractions of a second. (The line also contains a North/South field and an East/West field. For simplicity, we ignore them here and treat every coordinate as positive, as if the data came from the northern and eastern hemispheres; the example line above is actually west of Greenwich. You can easily apply the correct signs by reading the entire $GPRMC structure.)

■ Note In the ISO time format, we've used HHMMSS to denote hours, minutes, and seconds. Here we follow the NMEA convention, which uses hhmmss.ss for hours, minutes, and seconds, and DD and MM for angular degrees and minutes.

The timestamp string is a bit hard to work with, especially when plotting data. The first reason is that it's a string, not a number. But even if you translated it to a number, the result would not plot nicely, because there are 60 seconds in a minute and 60 minutes in an hour, not 100. So we want to "linearize" the timestamp by translating it into seconds elapsed since midnight: T = hh * 3600 + mm * 60 + ss.ss.
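A minimal sketch of the linearization, using plain numbers before any string handling enters the picture; seconds_since_midnight is a hypothetical helper name. It compares two readings one second apart across a minute boundary with the gap you would get by treating the hhmmss digits as an ordinary number.

```python
def seconds_since_midnight(hh, mm, ss):
    """Linearized time of day: T = hh*3600 + mm*60 + ss."""
    return hh * 3600 + mm * 60 + ss

# Two readings one second apart, on either side of a minute boundary.
before = seconds_since_midnight(14, 0, 59.0)   # 14:00:59
after = seconds_since_midnight(14, 1, 0.0)     # 14:01:00
print(after - before)          # 1.0

# Reading the hhmmss digits as one plain number gives the wrong gap.
print(140100.0 - 140059.0)     # 41.0
```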

The second issue is that hh, mm, and ss.ss are strings, not numbers. Multiplying a string in Python does something completely different from what we want: multiplying a string by an integer repeats the string rather than scaling a number, and multiplying it by a float is an error. So we first convert the strings to numerical values, in our case float, because of the decimal point in the string representing the seconds. This all folds nicely into the following:

['$GPRMC', '140055.00', 'A', '4454.1740', 'N', '09325.0143', 'W', '000.0', '128.7', '300508', '001.1', 'E', 'A*28']

>>> float(row[1][0:2])*3600+float(row[1][2:4])*60+float(row[1][4:6])
50455.0

The [] operator denotes indexing, so row[1] is the second field of row (counting starts at zero), which is a string. The first two characters of a string are denoted by [0:2]; this is known as string slicing (the slice includes the start index and stops just before the end index). So to access the first two characters of the second field, we write row[1][0:2]. Upcoming chapters will include more about strings and methods of slicing them.
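The string pitfalls and slices described above can be tried directly. The sketch below is illustrative: nmea_time_to_seconds is a hypothetical helper, '140055.50' is a made-up input, and the helper deliberately slices the seconds with [4:] rather than [4:6] so that fractional seconds are kept, a small departure from the expression shown earlier.

```python
ts = "140055.00"  # the timestamp field, row[1], of the example line

# Multiplying a string by an int repeats it; a float multiplier is an error.
print("14" * 3)                # 141414
try:
    "14" * 3600.0
except TypeError as exc:
    print("TypeError:", exc)

# Slicing keeps the pieces as strings: start index included, end excluded.
hh, mm, ss = ts[0:2], ts[2:4], ts[4:]
print(hh, mm, ss)              # 14 00 55.00

def nmea_time_to_seconds(stamp):
    """Seconds since midnight for an NMEA hhmmss.ss timestamp string.

    Uses [4:] instead of [4:6] so fractional seconds are kept.
    """
    return float(stamp[0:2]) * 3600 + float(stamp[2:4]) * 60 + float(stamp[4:])

print(nmea_time_to_seconds(ts))           # 50455.0
print(nmea_time_to_seconds("140055.50"))  # 50455.5
```

For whole-second timestamps such as '140055.00', both slices give the same result.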

Next we tackle latitude and longitude. We face the same issue as with the timestamp, only here we deal with degrees. Latitude follows the format DDMM.MMMM, where DD stands for degrees and MM.MMMM stands for minutes. This time we convert to decimal degrees, so we divide the minutes by 60 and add the result to the degrees:

['$GPRMC', '140055.00', 'A', '4454.1740', 'N', '09325.0143', 'W', '000.0', '128.7', '300508', '001.1', 'E', 'A*28']

>>> float(row[3][0:2])+float(row[3][2:])/60.0

For latitude we need the fourth field, hence row[3]; for the line above this gives 44 + 54.1740/60, or roughly 44.9029 degrees. This example also introduces another notation, [2:], which means the slice of the string from the third character to the end. Also notice that the code uses 60.0 and not 60. In Python 2, dividing an integer by the integer 60 performs integer division, discarding everything after the decimal point; dividing by 60.0 requests floating-point division, which keeps that information. (In Python 3, / always performs floating-point division, and // is the integer, or floor, division operator.) Here the numerator is already a float, thanks to the float() conversion, so the result is a floating-point number either way. Still, it's good practice to make the kind of division you want explicit.

Here are some examples to further illustrate the point:

>>> 100/60.0
1.6666666666666667
>>> float(100)/60
1.6666666666666667
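The division behavior depends on the Python version. Assuming Python 3, a quick comparison of the two division operators:

```python
# Assumes Python 3: / is always true division, // is floor division.
print(100 / 60)          # 1.6666666666666667
print(100 // 60)         # 1  (what 100/60 gave between two ints in Python 2)
print(100 / 60.0)        # 1.6666666666666667
print(float(100) / 60)   # 1.6666666666666667
```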

Longitude information is similar to latitude, with one difference: longitude degrees take three characters instead of two (longitude goes up to 180 degrees, latitude only up to 90), so the slices become [0:3] for the degrees and [3:] for the minutes.
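A sketch that handles both coordinates with one helper, assuming the fixed widths described above (two degree digits for latitude, three for longitude). Unlike Listing 1-6, it also reads the N/S and E/W fields and returns south and west as negative values, a common convention; nmea_coord_to_degrees is a hypothetical name.

```python
def nmea_coord_to_degrees(value, hemisphere, deg_digits):
    """Convert an NMEA DDMM.MMMM / DDDMM.MMMM string to signed decimal degrees.

    deg_digits is 2 for latitude and 3 for longitude. South and west
    come back negative (this sign handling is not part of Listing 1-6).
    """
    degrees = float(value[:deg_digits])
    minutes = float(value[deg_digits:])
    result = degrees + minutes / 60.0
    return -result if hemisphere in ("S", "W") else result

row = ['$GPRMC', '140055.00', 'A', '4454.1740', 'N', '09325.0143', 'W',
       '000.0', '128.7', '300508', '001.1', 'E', 'A*28']
lat = nmea_coord_to_degrees(row[3], row[4], 2)   # fourth and fifth fields
lon = nmea_coord_to_degrees(row[5], row[6], 3)   # sixth and seventh fields
print(round(lat, 6), round(lon, 6))   # 44.9029 -93.416905
```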

Listing 1-6 presents the entire function to process GPS data.

Listing 1-6. Function process_gps_data()

from pylab import *   # provides array() along with the plotting functions

# constant definitions
NMI = 1852.0

def process_gps_data(data):
    """
    Processes GPS data, NMEA 0183 format.

    Returns a tuple of arrays: latitude, longitude, velocity [km/h],
    time [sec] and number of satellites.

    See also: http://www.gpsinformation.org/dale/nmea.htm
    """
    latitude = []
    longitude = []
    velocity = []
    t_seconds = []
    num_sats = []

    for row in data:
        if row[0] == '$GPGSV':
            # number of satellites in view: fourth entry
            num_sats.append(float(row[3]))
        elif row[0] == '$GPRMC':
            # time of day as seconds since midnight
            t_seconds.append(float(row[1][0:2])*3600 + \
                float(row[1][2:4])*60+float(row[1][4:6]))
            # DDMM.MMMM -> decimal degrees
            latitude.append(float(row[3][0:2]) + \
                float(row[3][2:])/60.0)
            # DDDMM.MMMM -> decimal degrees
            longitude.append((float(row[5][0:3]) + \
                float(row[5][3:])/60.0))
            # speed over ground: knots -> km/h
            velocity.append(float(row[7])*NMI/1000.0)

    return (array(latitude), array(longitude), \
        array(velocity), array(t_seconds), array(num_sats))

Some notes about the process_gps_data() function:

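• NMI is defined as 1852.0, the length of one nautical mile in meters (historically, about one minute of arc of latitude). Because the GPS reports velocity in knots (nautical miles per hour), multiplying by NMI/1000.0 converts it to km/h. The constant NMI is defined outside the function because we'd like to use it outside the function as well.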

• We initialize the return values latitude, longitude, velocity, t_seconds, and num_sats by setting them to an empty list: []. Initializing the lists creates them and allows us to use the append() method, which adds values to the lists.

• The if and elif statements are self-explanatory: if is a conditional clause, and elif is short for "else if." That is, if the first condition is false but the next condition is true, the block following the elif executes.

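• The backslash (\) at the end of several calculation lines and of the first return line is a line-continuation character: the statement continues on the next line. (Inside parentheses, Python also continues lines implicitly, so in this listing the backslashes are optional.)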

• Lastly, the return value is a tuple of arrays. A tuple is an immutable sequence: once created, it cannot be changed (a list, by contrast, is mutable). We return a tuple rather than, say, a two-dimensional array because the sequences may have different lengths: the number-of-satellites list comes from $GPGSV lines and the other lists come from $GPRMC lines, so their lengths need not match.
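Two of the notes above can be checked in a few lines: the knots-to-km/h conversion that uses NMI, and a tuple holding sequences of different lengths while refusing modification. knots_to_kmh is a hypothetical helper, and the list contents are placeholders, not GPS data.

```python
NMI = 1852.0  # meters per nautical mile, as defined in Listing 1-6

def knots_to_kmh(knots):
    """Knots are nautical miles per hour: scale by NMI meters, then to km."""
    return knots * NMI / 1000.0

print(knots_to_kmh(10.0))   # 18.52

# Placeholder values: a tuple happily holds sequences of different lengths...
result = ([1.0, 2.0, 3.0], [4.0])
print(len(result[0]), len(result[1]))   # 3 1

# ...but, unlike a list, it cannot be modified after creation.
try:
    result[0] = []
except TypeError as exc:
    print("TypeError:", exc)
```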

Here's how you call process_gps_data():

>>> y = read_csv_file('../dataWGPS-2008-05-30-09-00-50.csv')
>>> (lat, long, v, t, sats) = process_gps_data(y)

The second line uses sequence unpacking, which assigns each element of the returned tuple to its own variable in a single statement. Armed with all these functions, we're ready to plot some data!
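To try the whole flow without pylab, here is a stdlib-only variant of Listing 1-6 (a sketch, not the book's code): process_gps_rows is a hypothetical name, it returns plain lists instead of arrays, and csv.reader over an in-memory string stands in for read_csv_file(), which is not shown in this section. The input is only the single example $GPRMC line, so the satellites list comes back empty while the others hold one value each.

```python
import csv
import io

NMI = 1852.0  # meters per nautical mile

def process_gps_rows(data):
    """Stdlib-only variant of Listing 1-6: returns lists instead of arrays."""
    latitude, longitude, velocity, t_seconds, num_sats = [], [], [], [], []
    for row in data:
        if row[0] == '$GPGSV':
            num_sats.append(float(row[3]))          # satellites in view
        elif row[0] == '$GPRMC':
            t_seconds.append(float(row[1][0:2]) * 3600
                             + float(row[1][2:4]) * 60
                             + float(row[1][4:6]))
            latitude.append(float(row[3][0:2]) + float(row[3][2:]) / 60.0)
            longitude.append(float(row[5][0:3]) + float(row[5][3:]) / 60.0)
            velocity.append(float(row[7]) * NMI / 1000.0)  # knots -> km/h
    return (latitude, longitude, velocity, t_seconds, num_sats)

# In-memory stand-in for the CSV file read by read_csv_file().
text = "$GPRMC,140055.00,A,4454.1740,N,09325.0143,W,000.0,128.7,300508,001.1,E,A*28\n"
rows = list(csv.reader(io.StringIO(text)))

# Sequence unpacking: one name per element of the returned tuple.
(lat, lon, v, t, sats) = process_gps_rows(rows)
print(t, sats)   # [50455.0] []
```

Because the hemisphere fields are ignored, lon[0] comes out positive here even though the line says W.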
