NRSP-8 BIOINFORMATICS COORDINATION PROGRAM
2021 ACTIVITIES
Supported by Regional Research Funds, Hatch Act
James Reecy, James Koltes, and Fiona McCarthy, Joint Coordinators

OVERVIEW: Coordination of the NIFA National Animal Genome Research Program's (NAGRP) Bioinformatics is primarily based at, and led from, Iowa State University (ISU), with additional activities at the University of Arizona (UA), and is supported by NRSP-8. The NAGRP is made up of the membership of the Animal Genome Technical Committee, including the Bioinformatic Subcommittee.

FACILITIES AND PERSONNEL: James Reecy (ISU), James Koltes (ISU), and Fiona McCarthy (UA) serve as Co-Coordinators. Iowa State University and the University of Arizona provide facilities and support.

OBJECTIVES: The NRSP-8 project was renewed as of 10/01/18, with the following objectives:
1. Advance the quality of reference genomes for all agri-animal species by providing high contiguity assemblies, deep functional annotations of these assemblies, and comparison across species to understand structure and function of animal genomes;
2. Advance genome-to-phenome prediction by implementing strategies and tools to identify and validate genes and allelic variants predictive of biologically and economically important phenotypes and traits; and
3. Advance analysis, curation, storage, application, and reuse of heterogeneous big data to facilitate genome-to-phenome research in animal species of agricultural interest.

PROGRESS TOWARD OBJECTIVE 1: Advance the quality of reference genomes for all agri-animal species by providing high contiguity assemblies, deep functional annotations of these assemblies, and comparison across species to understand structure and function of animal genomes.

PROGRESS TOWARD OBJECTIVE 2: Facilitate the development and sharing of animal populations and the collection and analysis of new, unique, and interesting phenotypes.

PROGRESS TOWARD OBJECTIVE 3: Advance analysis, curation, storage, application, and reuse of heterogeneous big data to facilitate genome-to-phenome research in animal species of agricultural interest.

The following describes the project's activities over the past year.

Multi-species support
The Animal QTLdb, CorrDB, NAGRP Bioinformatics Tools, and the NAGRP data repository continued to support research activities for multiple species. The QTLdb supports active curation of QTL/association data for seven species (cattle, catfish, chicken, horse, pig, rainbow trout, and sheep). In 2021, a total of 24,178 new QTL/association data were curated into the database, bringing the total number of curated data to 235,970 QTL/associations. Currently, there are 34,342 curated porcine QTL, 177,199 curated bovine QTL, 16,217 curated chicken QTL, 2,605 curated horse QTL, 6,072 curated sheep QTL, and 1,413 curated rainbow trout QTL in the database (https://www.animalgenome.org/QTLdb/).

An additional 3,068 correlations (increase by species: cattle: 1,554; chicken: 208; goat: 311; pig: 846; sheep: 149) and 1,237 heritability data (increase by species: cattle: 315; chicken: 32; goat: 2; pig: 843; sheep: 45) were curated into the Animal CorrDB in 2021. The CorrDB currently holds a total of 24,104 correlation data on 874 traits and 4,319 heritability data on 1,075 traits in 6 livestock animal species (these totals reflect a decrease of 1,241 data recalled as part of data quality control efforts).

A new livestock SNP ID/name matching data repository and search tool has been added to the NAGRP Bioinformatics Tools. This data collection includes 7,856,530 known SNP IDs/names and 5,055,768 SNP 'rs' to 'ss' ID matches contributed by 10 labs/research groups; these data are not found in any other public SNP data resource. We continue to welcome SNP data contributions to this repository.
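
As a quick consistency check, the per-species CorrDB increases reported above can be summed and compared with the stated 2021 totals. The sketch below uses only the figures given in this report; the variable and function names are illustrative and are not part of any NAGRP tool.

```python
# Per-species 2021 increases for the Animal CorrDB, copied from the report text.
correlation_increase = {
    "cattle": 1554, "chicken": 208, "goat": 311, "pig": 846, "sheep": 149,
}
heritability_increase = {
    "cattle": 315, "chicken": 32, "goat": 2, "pig": 843, "sheep": 45,
}


def total(counts):
    """Sum the per-species counts in a {species: count} mapping."""
    return sum(counts.values())


print(total(correlation_increase))   # 3068, matching the reported 3,068 correlations
print(total(heritability_increase))  # 1237, matching the reported 1,237 heritability data
```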

Ontology development
We have developed a hierarchy display tool to facilitate expanding and exploring the Vertebrate Trait (VT) Ontology, Livestock Product Trait (LPT) Ontology, Clinical Measurement Ontology (CMO), and other ontology hierarchies. This tool has been implemented in the web portals of the Animal QTLdb, VT, LPT, and CMO projects.

This past year we continued to focus on the integration of the Animal Trait Ontology (ATO) into the Vertebrate Trait Ontology (http://bioportal.bioontology.org/ontologies/VT). Fifteen (15) dataset updates were released to the public throughout 2021. We have continued working with the Rat Genome Database to integrate ATO terms that are not applicable to the Vertebrate Trait Ontology into the Clinical Measurement Ontology (http://bioportal.bioontology.org/ontologies/CMO). Traits specific to livestock products continue to be incorporated into the Livestock Product Trait Ontology (LPT), which is available on NCBO's BioPortal (http://bioportal.bioontology.org/ontologies/LPT). Three (3) LPT updates were released during 2021. Seven (7) updates of the Livestock Breed Ontology (LBO; https://www.animalgenome.org/bioinfo/projects/lbo/) were made. We have also continued mapping the cattle, pig, chicken, sheep, and horse QTL traits to the VT, LPT, and CMO to help standardize the trait nomenclature used in the QTLdb.

A semi-automated data release pipeline was developed to minimize the manual steps involved in uploading new data and releasing new versions to BioPortal.ORG and GitHub, with AnimalGenome.ORG serving as a new data sync hub. The VT data download is available through the GitHub portal (https://github.com/AnimalGenome/vertebrate-trait-ontology), where users can automate their data updates. Anyone interested in helping to improve the ATO/VT is encouraged to contact James Reecy (jreecy@iastate.edu), Cari Park (caripark@iastate.edu), or Zhiliang Hu (zhu@iastate.edu).
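
To illustrate what a hierarchy display involves, the following minimal sketch renders parent/child term relations as an indented tree. It is not the tool's actual implementation: the term IDs and names are hypothetical placeholders rather than real VT, LPT, or CMO records, and the sketch assumes each term has at most one parent, whereas real ontologies may allow several.

```python
from collections import defaultdict

# Hypothetical (term_id, name, parent_id) rows; NOT actual VT/LPT/CMO records.
TERMS = [
    ("EX:1", "example root trait", None),
    ("EX:2", "example growth trait", "EX:1"),
    ("EX:3", "example body weight", "EX:2"),
    ("EX:4", "example product trait", "EX:1"),
]


def render_hierarchy(terms):
    """Return the terms as indented text lines, one line per term."""
    names = {term_id: name for term_id, name, _ in terms}
    children = defaultdict(list)
    roots = []
    for term_id, _, parent_id in terms:
        if parent_id is None:
            roots.append(term_id)
        else:
            children[parent_id].append(term_id)

    lines = []

    def walk(term_id, depth):
        # Indent two spaces per level below the root.
        lines.append("  " * depth + f"{term_id} {names[term_id]}")
        for child_id in sorted(children[term_id]):
            walk(child_id, depth + 1)

    for root_id in sorted(roots):
        walk(root_id, 0)
    return lines


if __name__ == "__main__":
    print("\n".join(render_hierarchy(TERMS)))
```

Sorting the children keeps the output stable between runs, which makes successive releases of a hierarchy easier to compare.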

The VT/LPT/CMO cross-mapping is used by the Animal QTLdb, CorrDB, and VCMap tools. Annotation to the VT is also available for rat QTL data in the Rat Genome Database and for mouse strain measurements in the Mouse Phenome Database. We have also continued to integrate information from multiple resources, e.g., FAO - International Domestic Livestock Resources Information, Oklahoma State University - Breeds of Livestock web site, and Wikipedia, as well as requests from community members.

Expanded Animal QTLdb functionality
We worked to enable support for multiple genome builds for all livestock species by creating a pipeline that lifts SNPs between different assemblies using SAMtools, BEDtools, BWA, and locally developed Perl scripts. All curated QTL/association data continue to be automatically ported to NCBI, Ensembl, the UCSC genome browser, and Reuters Data Citation Index in a timely fashion. Users can fully utilize the browser and data mining tools at NCBI, Ensembl, and UCSC to explore animal QTL/association data. We continued to work with our counterparts at these institutions to eliminate any glitches that arose during the automated or semi-automated data porting process.

In addition, we have continued to improve existing QTLdb curation tools and user portal tools and to add new ones. Other improvements included the standardization of data links across species for external databases (db_xref) for both QTLdb and CorrDB; improved editor/curator tools to aid SNP name/ID lookup and batch annotation for QTL/association data curation; and further improvements to eQTL data display and batch annotation. Further improvements and developments are ongoing.

Further developments of Animal Trait Correlation Database (CorrDB)
Our efforts to overhaul and redevelop functionality within CorrDB are ongoing. We continued to strengthen the data quality control procedures to help improve data quality.
A new outcome is a redesigned web interface (the front page) that lets users access data by species more easily. Internally, standardization of program configurations for parameters and functions will help to streamline future tool development and debugging efforts. The CorrDB efforts continue to feature co-development with the QTLdb for shared use of resources and tools, such as trait ontology development and management, literature management, breed ontology management, and bug reporting tools for improved data quality control. The improved CorrDB curator tools are available to the public; any user can register for an account to curate correlation data. As reported in earlier sections, correlation data and heritability data continued to be curated in 2021. The public web portals continue to undergo improvement.

Facilitating research
The Data Repository, through which the aquaculture, cattle, chicken, horse, pig, and sheep communities share their genome analysis data, has proven to be very useful and has been actively used (https://www.animalgenome.org/repository). While new data is continually being curated, we have gradually scaled down support for hosting supplementary files for publications to make more sensible use of the NRSP-8 bioinformatics funds. We have redirected the community to a better data repository resource (Open Science Framework, OSF, https://osf.io/) for better long-term data security. In 2021, the data currently in the supplementary data repository was prepared for transfer to OSF in the coming months. Appropriate web visit forwarding will be set up on the current site to redirect to the new URL. Data downloads from the repository generated over 2.05 TB of data traffic in 2021.

Throughout the year, over 62 cases were handled through our helpdesk at AnimalGenome.ORG to help users with inquiries/requests for services affecting community research activities and the use of our services.
The assistance provided ranged from data transfer and hosting, data deposition, data curation, web presentation, and data analysis to software applications, code development, and advice on tool development.

Community support and user services at AnimalGenome.ORG
We have been maintaining and actively updating the NRSP-8 species web pages for each of the six NRSP-8 species. We continue to host mailing lists/websites for various research groups in the NAGRP community (https://www.animalgenome.org/community/). These include groups such as AnGenMap, FAANG international consortium working groups, and CRI-MAP users, as well as new meetings and user bulletin boards to facilitate these meetings, among other user forums.

The Functional Annotation of ANimal Genomes (FAANG, https://www.faang.org/) website has been continually developed and maintained to actively support the FAANG activities. The FAANG site serves not only as a FAANG-related information hub, but also as a platform for this international consortium's communication, collaboration, organization, and interaction. It serves over 760 members and 12 working groups and sub-groups, with 14 listserv mailing lists, a bulletin board, a database, and tools for membership and working group management. The actively hosted materials include meeting minutes, tools/protocols for FAANG activities, incorporation and use of the data portal hosted at EBI, presentation slides, and video records of scientific meetings and related events, all interactively available to members through the web portal.

Site maintenance
We have further consolidated services and developmental platforms onto the current Dual Quad Core Xeon Linux server. Efforts were made to improve data backup, security, and availability. This was accomplished by better use of the resources for shared workloads, better data security and network security, and improved protocols for data backup, management, and inventories.

Reaching out
We have been sending periodic updates to more than 3,000 users worldwide (https://www.animalgenome.org/community/angenmap/) to inform the animal genomics research community of news and updates regarding AnimalGenome.org. "What's New on AnimalGenome.ORG web site" emails were sent out 3 times in 2021, consistent with the pace/pattern of the past 17 years (https://www.animalgenome.org/bioinfo/updates/).

PLANS FOR THE FUTURE

OBJECTIVE 1. Advance the quality of reference genomes for all agri-animal species by providing high contiguity assemblies, deep functional annotations of these assemblies, and comparison across species to understand structure and function of animal genomes. We will continue to analyze "omics" data to help better annotate livestock genomes.

OBJECTIVE 2. Advance genome-to-phenome prediction by implementing strategies and tools to identify and validate genes and allelic variants predictive of biologically and economically important phenotypes and traits.

OBJECTIVE 3. Advance analysis, curation, storage, application, and reuse of heterogeneous big data to facilitate genome-to-phenome research in animal species of agricultural interest. We will continue to work with bovine, mouse, rat, and human QTL database curators to develop minimal information for publication standards. We will also work with these same database groups to improve phenotype and measurement ontologies, which will facilitate transfer of QTL information across species. We will continue working with U.S. and European colleagues to develop a Bioinformatics Blueprint, similar to the Animal Genomics Blueprint recently published by USDA-NIFA, to help direct future livestock-oriented bioinformatic/database efforts.

Publications:
Hu, Zhi-Liang, Carissa A. Park, and James M. Reecy (2022). Bringing the Animal QTLdb and CorrDB into the future: meeting new challenges and providing updated services. Nucleic Acids Research, Volume 50, Issue D1, Pages D956-D961. DOI: 10.1093/nar/gkab1116