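Rule-based Chinese Address Segmentation and Matching Methods

A dissertation submitted in fulfillment of the requirements for the degree of Master of Science from Shandong University of Science and Technology

April 2011

Author: Tan Kankan
Enrollment date: September 2008
Major: Cartography and Geographic Information Systems
Research direction: 3S technology integration and application
Supervisor: Liu Wenbao, Professor
Supervisor: Mu Naixia, Associate Professor
College: Geomatics College
Thesis submission date: April 2011
Thesis defense date: June 2011
Degree conferral date:

Declaration

I declare that this master's thesis, submitted to Shandong University of Science and Technology, is entirely the result of research I carried out under the guidance of my supervisors, except for the listed references and publicly acknowledged literature.
The thesis has not been submitted to any other academic institution for evaluation.

Signature of the master's candidate:    Date:

AFFIRMATION

I declare that this dissertation, submitted in fulfillment of the requirements for the award of Master of Philosophy in Shandong University of Science and Technology, is wholly my own work unless otherwise referenced or acknowledged. The document has not been submitted for qualification at any other academic institute.

Signature:    Date:

Abstract

In today's information age, city departments hold large amounts of address-related location information. Most of these data are non-spatial, so they cannot be shared across sectors through a geographic information system.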
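Spatializing urban address information is therefore an important part of building a digital city.
Geocoding is the technique that achieves this: it converts a textual address description into geographic coordinates, using encoding and address matching to locate the geographic entity on an electronic map that corresponds to the address.
Through geocoding, large volumes of socio-economic data become coordinate-based spatial information, which enables faster and more effective spatial analysis and supports government decision-making.
This thesis takes an address study of Wuhan as its project background and investigates Chinese address segmentation and address matching.
Geocoding is used to achieve fast address query and matching and to spatialize socio-economic data, and a database is built to manage the data uniformly, so that city departments and industries can share data.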
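The main research contents and results are as follows:
(1) The existing address model is improved, address data are normalized according to the improved model, and a complete standard address database is built.
(2) After a study of several address segmentation and matching methods, a rule-based address segmentation and matching method is proposed. It adds mechanisms such as a rule tree and ambiguity storage, and the improved algorithm raises the matching success rate for two kinds of fuzzy addresses: incomplete addresses and ambiguous addresses.
(3) A knowledge-learning mechanism is established: an address supplement module enters addresses that failed to match or are missing from the database, so the standard address database is continuously improved.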
Keywords: geocoding, address standardization, Chinese address segmentation, address database, rule base, address matching

ABSTRACT

In today's information age, city departments hold a large amount of address information. Most of these data are non-spatial, so they cannot be shared through a geographic information system. Spatializing city address information is therefore a main part of digital city construction. Geocoding is a method for spatializing city address information: it provides a way of translating a text address into geographic coordinates. Through geocoding, a large amount of socio-economic data becomes spatial information in the form of coordinates, and data can be shared between city departments and industries, which enables more rapid and effective spatial analysis and decision-making.

The thesis takes research on Wuhan addresses as its project background. It uses geocoding to achieve rapid address query and the spatialization of socio-economic data, and builds an address database so that information can be shared among city departments. The main contents of the research are:
(1) The existing address model is improved, addresses are standardized with the new model, and the standard address database is built.
(2) Several address segmentation and geocoding methods are studied, and a rule-based Chinese address geocoding method is proposed. A rule tree and an ambiguity storage mechanism are added to improve the success rate of fuzzy address matching.
(3) A learning system is created, so that addresses that fail to match can be added to the database through an address adding module.

Keywords: Geocoding, Address standardization, Chinese address segmentation, Address database, Rule database, Address matching

Contents

1 Introduction
1.1 Research Background and Significance
1.2 Research Status at Home and Abroad
1.3 Research Contents
1.4 Structure of the Thesis
1.5 Chapter Summary
2 Key Technologies of Geocoding and Chinese Address Segmentation
2.1 Address Standardization
2.2 Chinese Address Segmentation
2.3 Address Database Matching
2.4 Chapter Summary
3 Rule-based Chinese Address Segmentation and Matching
3.1 Address Model Research
3.2 Building the Standard Address Database
3.3 The Rule Base and Rule Tree
3.4 Fuzzy Address Analysis and Processing
3.5 Rule-based Fuzzy Chinese Address Segmentation and Matching Algorithm
3.6 Improvements of the Algorithm
3.7 Chapter Summary
4 Design of the Geocoding System
4.1 Development Tools and Experimental Platform
4.2 System Design
4.3 Chapter Summary
5 Implementation of the Geocoding System
5.1 Main Control Module of the System
5.2 Building the Standard Address Database
5.3 Standard Address Database Management
5.4 Batch Address Matching
5.5 Standard Address Database Supplement
5.6 Analysis of Experimental Results
5.7 Chapter Summary
6 Conclusions and Prospects
6.1 Conclusions
6.2 Prospects
Acknowledgements
References
Main Academic Achievements during the Master's Program

1 Introduction

1.1 Research Background and Significance

With the continuous development of geographic information systems (GIS) and their wide application across industries, the demand for information sharing has become increasingly urgent.