人工智能识别植物准确率高达80%
据《自然》杂志官网日前报道,一篇发表在最新一期《进化生物学》杂志上的论文称,用成千上万份标本图像“训练”过的计算机算法,已经能自动识别被压制的、干燥植物标本的物种。这是科学家首次尝试通过深度学习,让计算机使用大型复杂数据集的神经网络,解决了识别自然物种分类的困难任务。
世界各地的自然历史博物馆正在加速藏品数字化进程,将标本图像存储在开放数据库中。比如美国国家科学基金会的iDigBio项目的一个数据库,就拥有来自全美各地收集的超过1.5亿张植物和动物图像。
目前,世界3.5亿个物种中,只有一小部分被数字化了。但是,随着计算技术的进步,哥斯达黎加理工学院计算机科学家艾瑞克·蒙塔罗和法国蒙彼利埃国际发展农业研究中心植物学家皮埃尔·邦尼特认为,为标本做大数据集已经成为可能。他们的团队已经实现了植物识别的自动化。
研究人员借助智能手机应用程序现场拍摄标本,积累了数以百万计的新鲜植物图像,然后对1000多个物种、超过26万份植物标本进行了扫描识别,采用先进算法的识别准确率高达80%。
邦尼特说,这样惊人的结果往往让植物学家担心其学术领域被轻视。“但人类的专长永远不会被消除,识别结果仍需要植物学家来检验正确与否。”
人工智能识别标本的方法,极大地减少了植物学家收集和识别标本的时间,还能帮助改进标本数据贫乏地区的植物鉴定水平,对生物多样性丰富但植物标本较少的地区特别有用。
此外,这种方法还能让研究人员对大数据进行额外的分析。一般而言,植物标本样本中含有丰富的数据信息,例如采集时间和地点,采集时在开花还是在结果,以及花群密集特征等。由于一些样本是几个世纪以前的数据,因此,可以帮助研究植物是如何适应气候变化的。
美国宾夕法尼亚州立大学博士彼得·威尔夫说:“在自然历史的进程中,这种方法预示着未来。”(来源:中国科技网 记者房琳琳)
Going deeper in the automated identification of Herbarium specimens
Abstract
Background Hundreds of herbarium collections have accumulated a valuable heritage and knowledge of plants over several centuries. Recent initiatives started ambitious preservation plans to digitize this information and make it available to botanists and the general public through web portals. However, thousands of sheets are still unidentified at the species level while numerous sheets should be reviewed and updated following more recent taxonomic knowledge. These annotations and revisions require an unrealistic amount of work for botanists to carry out in a reasonable time. Computer vision and machine learning approaches applied to herbarium sheets are promising but are still not well studied compared to automated species identification from leaf scans or pictures of plants in the field.
Results
In this work, we propose to study and evaluate the accuracy with which herbarium images can be potentially exploited for species identification with deep learning technology. In addition, we propose to study if the combination of herbarium sheets with photos of plants in the field is relevant in terms of accuracy, and finally, we explore if herbarium images from one region that has one specific flora can be used to do transfer learning to another region with other species; for example, on a region under-represented in terms of collected data.
Conclusions
This is, to our knowledge, the first study that uses deep learning to analyze a big dataset with thousands of species from herbaria. Results show the potential of Deep Learning on herbarium species identification, particularly by training and testing across different datasets from different herbaria. This could potentially lead to the creation of a semi, or even fully automated system to help taxonomists and experts with their annotation, classification, and revision works.
原文链接:https://bmcevolbiol.biomedcentral.com/articles/10.1186/s12862-017-1014-z