作业：使用Douglas-Peucker算法进行轨迹压缩

创建时间：2017-11-10 投稿人：浏览次数：487

本文参考了一篇学长的博客，链接一下找不着了。

数据预览

fd=open(r".2007-10-14-GPS.log","r")
limit=3
i=0
for line in fd:
    if i<limit:
        print(line.strip())
        i+=1
    else:
        break
fd.close()

可见文件包含了原始的GPS数据，为了后续计算方便，需要对原始数据进行处理与提取。

课题任务为路径压缩，只需要用到GPS数据中的经纬度，所以将经纬度字段进行提取。

GPS原始数据格式及其意义见百度。

数据预处理

将GPS数据提取出来并进行单位统一化，单位为度。

fd_s=open(r".2007-10-14-GPS.log","r")
fd_d=open(r".2007-10-14-GPS_transformed.txt","w")

id=0        #坐标点序号,以0开始,以文件行数结束

for line in fd_s:
    #转换为标准经纬度,ddd.mm
    longitude=float(line.split(" ")[3])/100
    latitude=float(line.split(" ")[5])/100

    #转换成以度为单位的经纬度,ddd+mm/60
    longitude_transform=longitude//1+(longitude-longitude//1)*100/60
    latitude_transform=latitude//1+(latitude-latitude//1)*100/60

    #此处注意精度,若精度不够后面会出现两个点极其相近而发生除0错误
    fd_d.write("%s,%.6f,%.6f
"%(id,longitude_transform,latitude_transform))        
    id+=1

fd_s.close()
fd_d.close()
print("data size:{}".format(id))

输出为：3150

简介

Douglas-Peucker算法常用于轨迹压缩，典型步骤如下：

在曲线首尾两点A，B之间连接一条直线AB，该直线为曲线的弦
计算离弦AB距离最远的点与最大距离
比较最大距离与阈值，若小于阈值则舍弃该点，以AB弦作为曲线的近似
若大于阈值，则以此点将曲线划分成两段，并对两段曲线分别进行1-3步操作

此处实现了四个函数，分别为：

Geodist(point1,point2)：返回两个点之间的距离
get_vertical_dist(pointA,pointB,pointX)：返回点X与弦AB的垂直距离
DP_compress(point_list,output_point_list,Dmax)：按照阈值Dmax将压缩后的point_list输出到output_point_list中
get_MeanErr(point_list,output_point_list)：返回输出相对于输入的平均误差

点间距离

import math

def Rad(d):
    return d * math.pi / 180

def Geodist(point1,point2):
    radLat1 = Rad(point1[1])
    radLat2 = Rad(point2[1])
    delta_lon = Rad(point1[0] - point2[0])
    top_1 = math.cos(radLat2) * math.sin(delta_lon)
    top_2 = math.cos(radLat1) * math.sin(radLat2) - math.sin(radLat1) * math.cos(radLat2) * math.cos(delta_lon)
    top = math.sqrt(top_1 * top_1 + top_2 * top_2)
    bottom = math.sin(radLat1) * math.sin(radLat2) + math.cos(radLat1) * math.cos(radLat2) * math.cos(delta_lon)
    delta_sigma = math.atan2(top, bottom)
    distance = delta_sigma * 6378137.0

    return round(distance,3)

#test
# point1=(116.470818,39.927558,0)
# point2=(116.470527,39.927338,1)
# print(Geodist(point1,point2))

点弦距离

使用三角形面积公式计算点到弦的距离：
三角形面积S=12dh=p(p−a)(p−b)(p−c)−−−−−−−−−−−−−−−−−√ $S=frac{1}{2}dh=sqrt{p(p-a)(p-b)(p-c)}$ ，其中d为底，h为高，p为半周长。点X到弦AB的距离可看作是三角形AB底的高，由此可计算点弦距离。

def get_vertical_dist(pointA,pointB,pointX):
    a=math.fabs(Geodist(pointA,pointB))

    #当弦两端重合时,点到弦的距离变为点间距离
    if a==0:
        return math.fabs(Geodist(pointA,pointX))

    b=math.fabs(Geodist(pointA,pointX))
    c=math.fabs(Geodist(pointB,pointX))
    p=(a+b+c)/2
    S=math.sqrt(math.fabs(p*(p-a)*(p-b)*(p-c)))

    vertical_dist=S*2/a

    return vertical_dist

#test
# pointA=(116.470818,39.927558,0)
# pointB=(116.470880,39.927403,3149)
# pointX=(116.470907,39.927267,10)
# print(get_vertical_dist(pointA,pointB,pointX))

递归压缩

def DP_compress(point_list,output_point_list,Dmax):
    start_index=0
    end_index=len(point_list)-1

    #起止点必定是关键点,但是作为递归程序此步引入了冗余数据,后期必须去除
    output_point_list.append(point_list[start_index])
    output_point_list.append(point_list[end_index])

    if start_index<end_index:
        index=start_index+1        #工作指针,遍历除起止点外的所有点
        max_vertical_dist=0        #路径中离弦最远的距离
        key_point_index=0        #路径中离弦最远的点,即划分点

        while(index<end_index):
            cur_vertical_dist=get_vertical_dist(point_list[start_index],point_list[end_index],point_list[index])
            if cur_vertical_dist>max_vertical_dist:
                max_vertical_dist=cur_vertical_dist
                key_point_index=index        #记录划分点
            index+=1

        #递归划分路径
        if max_vertical_dist>=Dmax:
            DP_compress(point_list[start_index:key_point_index],output_point_list,Dmax)
            DP_compress(point_list[key_point_index:end_index],output_point_list,Dmax)

计算误差

距离误差为被压缩的点到与其相邻的关键点之间弦的距离。

def get_MeanErr(point_list,output_point_list):
    Err=0

    start_index=0
    end_index=len(output_point_list)-1

    while(start_index<end_index):        #遍历所有关键点
        #选取两相邻关键点
        pointA_id=int(output_point_list[start_index][2])
        pointB_id=int(output_point_list[start_index+1][2])

        id=pointA_id+1        #工作指针,用于遍历非关键点
        while(id<pointB_id):        #遍历两关键点之间的非关键点
            Err+=get_vertical_dist(output_point_list[start_index],output_point_list[start_index+1],point_list[id])
            id+=1

        start_index+=1

    return Err/len(point_list)

point_list=[]
output_point_list=[]

#将处理后的数据读入内存
fd=open(r".2007-10-14-GPS_transformed.txt","r")
for line in fd:
    line=line.strip()
    id=int(line.split(",")[0])
    longitude=float(line.split(",")[1])
    latitude=float(line.split(",")[2])
    point_list.append((longitude,latitude,id))
fd.close()

DP_compress(point_list,output_point_list,Dmax=30)

output_point_list=list(set(output_point_list))        #去除递归引入的冗余数据
output_point_list=sorted(output_point_list,key=lambda x:x[2])        #按照id排序

#将压缩数据写入输出文件
fd=open(r".output.txt","w")
for point in output_point_list:
    fd.write("{},{},{}
".format(point[2],point[0],point[1]))
fd.close()

print("compression rate={}/{}={}".format(len(point_list),len(output_point_list),len(output_point_list)/len(point_list)))
print("mean error:{}".format(get_MeanErr(point_list,output_point_list)))

压缩率与平均误差分别为：

将压缩前后的轨迹进行对比，检查压缩效果。

import matplotlib.pyplot as plt

uncompressed=[[],[]]
for point in point_list[:]:
    uncompressed[0].append(point[0])
    uncompressed[1].append(point[1])

plt.plot(uncompressed[0],uncompressed[1])
plt.xlabel("longitude")
plt.ylabel("latitude")
plt.title("uncompressed")
plt.show()


compressed=[[],[]]
for point in output_point_list:
    compressed[0].append(point[0])
    compressed[1].append(point[1])

plt.plot(compressed[0],compressed[1])
plt.xlabel("longitude")
plt.ylabel("latitude")
plt.title("compressed")
plt.show()

压缩效果还是可以接受的，压缩前后的轨迹差别不大。

对于一般路径而言，Douglas-Peucker算法是有效的，但是不难想到，当出现折返路径时，往外突出的部分很容易被压缩掉，因为偏差是以垂直距离为准的。下面的图片展示了这种情况下的压缩效果。

声明：该文观点仅代表作者本人，牛骨文系教育信息发布平台，牛骨文仅提供信息存储空间服务。

上一篇： Android图片压缩加密上传 - JPEG压缩算法解析
下一篇：精讲哈夫曼压缩算法

热门文章: CTF writeup 2_南邮网络攻防训...; SSM框架——详细整合教程（...; Linux Shell脚本编程－－curl命...; HttpClient使用详解; Java面试题全集（上）; JAVA设计模式之单例模式; java.lang.OutOfMemoryError: PermGen ...; TCP协议中的三次握手和四次...; form表单的两种提交方式，su...; String,StringBuffer与StringBuilder...

最新文章: Java之品优购课程讲义_day20（7）; 剑指 Offer - 8：跳台阶; Netty权威指南_札记02_NIO编程; mysql时间属性之时间戳和datetime之...; 虚拟现实或许可以拯救古埃及的“...; spring cloud服务注册中心eureka---集群...; Java SE 第六章; HTTP请求+数据库; HIDL学习笔记之HIDL C++（第二天）; ubuntu系统下指定tomcat运行时为JDK1.8...