Using web page extraction and content analysis, a program was developed to acquire station metadata from the Incorporated Research Institutions for Seismology (IRIS) automatically and periodically. An IRIS station metadata database was designed and built at the CTBT Beijing National Data Center (NDC) to store the acquired metadata, so that station metadata is synchronized from IRIS to the NDC on a regular schedule without manual intervention. As one of the basic supporting databases for NDC operations, this database lays the technical foundation for automatic station selection and automatic data requests for auxiliary (supplementary) seismic monitoring data at the NDC.
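The acquire-then-store cycle described above can be illustrated with a minimal Python sketch. It is not the authors' program: it assumes the metadata has already been downloaded as pipe-delimited text, modeled on the text layout that FDSN station web services can return, rather than scraped from HTML pages. The table schema, the function names `parse_station_text` and `sync_stations`, and the sample rows (placeholder values, not real stations) are all hypothetical.

```python
import sqlite3

# Placeholder input in a pipe-delimited layout; the rows are made up for
# illustration and do not describe real stations.
SAMPLE = """#Network | Station | Latitude | Longitude | Elevation | SiteName | StartTime | EndTime
XX|TEST1|10.0|20.0|100.0|Placeholder Site One|2000-01-01T00:00:00|2599-12-31T23:59:59
XX|TEST2|-5.5|120.25|15.0|Placeholder Site Two|2010-06-01T00:00:00|
"""


def parse_station_text(text):
    """Turn pipe-delimited station lines into tuples ready for insertion."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header/comment line
        fields = [f.strip() for f in line.split("|")]
        if len(fields) != 8:
            continue  # skip malformed lines instead of guessing at them
        net, sta, lat, lon, elev, site, start, end = fields
        # An empty end time means the station epoch is still open.
        rows.append((net, sta, float(lat), float(lon), float(elev),
                     site, start, end or None))
    return rows


def sync_stations(conn, rows):
    """Upsert station rows into a hypothetical table; return the row count."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS station (
               network    TEXT NOT NULL,
               station    TEXT NOT NULL,
               latitude   REAL,
               longitude  REAL,
               elevation  REAL,
               site_name  TEXT,
               start_time TEXT NOT NULL,
               end_time   TEXT,
               PRIMARY KEY (network, station, start_time)
           )"""
    )
    # INSERT OR REPLACE keeps repeated runs idempotent: an existing epoch is
    # overwritten with the latest values rather than duplicated.
    conn.executemany(
        "INSERT OR REPLACE INTO station VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM station").fetchone()[0]


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(sync_stations(connection, parse_station_text(SAMPLE)))
```

Keying the table on network, station, and start time means a scheduled job (for example a daily cron entry, also an assumption) can rerun the same synchronization safely, since unchanged stations are simply rewritten in place.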
2019, 40(2): 150-154. Received: 2018-06-18
DOI:10.3969/j.issn.1003-3246.2019.02.022
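About the author: Shang Jie (商杰, born 1983), male, from Macheng, Hubei; holds a master's degree; engineer, mainly engaged in CTBT verification work.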