Understanding Licenses
Common data licenses
- Introduction to intellectual property rights in data management: "This fact sheet provides a brief overview of some of the issues associated with managing IPR [Intellectual Property Rights] in data projects." This document is a great way to understand why you might want to use an Open Data License instead of Creative Commons and how both differ from software licenses.
- Licensing (Stanford Libraries): Describes Creative Commons and Open Data licenses.
- What is the most appropriate licence for my data? Describes common licenses used for research data products.
Attribution
- Best practices for CC attribution: Includes examples.
- Open source software attribution obligations: Includes good and bad examples.
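Attribution guidance for Creative Commons works is commonly summarized as giving the title, author, source, and license of the work. The Python sketch below assembles a credit line from those four elements. The function name `format_attribution` and all example values are hypothetical, and the guides listed above remain the authority on what a particular license requires.

```python
def format_attribution(title, author, source_url, license_name, license_url=None):
    """Build a one-line credit from title, author, source, and license."""
    # Treat all four elements as required and fail early if one is blank.
    required = [("title", title), ("author", author),
                ("source", source_url), ("license", license_name)]
    for label, value in required:
        if not value or not value.strip():
            raise ValueError(f"missing {label}")
    credit = f'"{title}" by {author} ({source_url}) is licensed under {license_name}'
    # The license link is optional but helps readers find the terms.
    if license_url:
        credit += f" ({license_url})"
    return credit + "."


# Hypothetical work used only to illustrate the format.
example = format_attribution(
    title="Example Dataset",
    author="A. Researcher",
    source_url="https://example.org/data",
    license_name="CC BY 4.0",
    license_url="https://creativecommons.org/licenses/by/4.0/",
)
print(example)
```

Raising an error on a blank element is a design choice for this sketch; a real workflow might instead flag incomplete credits for manual review.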
Citation vs attribution
Multidisciplinary
- Finding and Reusing Data on a Generalist Repository: "Generalist repositories can also accommodate data that may not have an existing disciplinary repository. Because the data can come from many fields, the common fields used to describe and categorize data can be fairly broad. Once you are familiar with the repository though, there can be ways to hone in on metadata related to specific topics."
- U.S. Government Open Data: "Under the OPEN Government Data Act, which is Title II of the Foundations for Evidence-Based Policymaking Act, government data is required to be made available in open, machine-readable formats, while continuing to ensure privacy and security."
- Racial Equity Tools: Demographics and Population Data: "Data is never neutral. [...] public databases may lag years behind how researchers and activities are reconfiguring identity groups and other classifications. Why does this matter for racial equity work? Because the people most likely to be excluded in public data sets often represent populations who are already the most marginalized and invisibilized"
- Google Dataset Search: "In addition to making datasets universally accessible and useful, Dataset Search's mission is to:
  - Foster a data sharing ecosystem that will encourage data publishers to follow best practices for data storage and publication
  - Give scientists a way to show the impact of their work through citation of datasets that they have produced
  As more dataset repositories use schema.org and similar standards to describe their datasets, the variety and coverage of datasets that users find in Dataset Search will continue to grow."
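The Dataset Search description mentions schema.org as a way repositories describe their datasets. As a rough illustration, the Python sketch below builds a small schema.org `Dataset` record as JSON-LD using only the standard library. The function name `dataset_jsonld` is hypothetical, the dataset name, URL, and creator are placeholders, and the properties shown are only a small subset; check schema.org and your repository's guidance for what is actually expected.

```python
import json


def dataset_jsonld(name, description, url, license_url, creator_name):
    """Return a schema.org Dataset description as a JSON-LD string."""
    record = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "license": license_url,
        "creator": {"@type": "Person", "name": creator_name},
    }
    return json.dumps(record, indent=2)


# Placeholder values; nothing here refers to a real dataset.
markup = dataset_jsonld(
    name="Example Stream Temperature Readings",
    description="Hourly water temperature readings from a hypothetical site.",
    url="https://example.org/datasets/stream-temperature",
    license_url="https://creativecommons.org/publicdomain/zero/1.0/",
    creator_name="A. Researcher",
)
print(markup)
```

Markup like this is typically embedded in a dataset's landing page inside a `script` element of type `application/ld+json`, where crawlers can read it.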
Spatial Datasets
- Finding GIS Data: An Oklahoma-based guide listing many of the resources below and more.
- GeoNames: "Database covers all countries and contains over eleven million placenames that are available for download free of charge."
- Physical maps available at Youngblood Energy Library: How to find maps in the Youngblood Energy (Geology) Library in Sarkeys Energy Center.
Global data
- DataONE: Geographic search, indexing scientific research studies. Some have GIS data, some do not.
- ESRI Open Data Hub: Open Data Portal hosted by the publisher of ArcGIS.
- Native Land: "Native Land Digital is a Canadian not-for-profit organization, incorporated in December 2018. Native Land Digital is Indigenous-led, with an Indigenous Executive Director and Board of Directors who oversee and direct the organization. [...] Maps potentially function as colonial artifacts and represent a very particular way of seeing the world – a way primarily concerned with ownership, exclusivity, and power relations."
- IPUMS (Institute for Social Research and Data Innovation): "Census and survey data from around the world integrated across time and space."
- Socioeconomic Data and Applications Center (SEDAC): "SEDAC, the Socioeconomic Data and Applications Center, is one of the Distributed Active Archive Centers (DAACs) in the Earth Observing System Data and Information System (EOSDIS) of the U.S. National Aeronautics and Space Administration. Focusing on human interactions in the environment, SEDAC has as its mission to develop and operate applications that support the integration of socioeconomic and earth science data and to serve as an "Information Gateway" between earth sciences and social sciences."
- Gapminder: "Gapminder produces free teaching resources making the world understandable based on reliable statistics."
- PANGAEA. Data Publisher for Earth & Environmental Science: "The information system PANGAEA is operated as an Open Access library aimed at archiving, publishing and distributing georeferenced data from earth system research. The system guarantees long-term availability of its content through a commitment of the hosting institutions."
- WorldPop: "Within the WorldPop project a range of open geospatial datasets are combined in a flexible regression tree framework to reallocate contemporary aggregated spatial population count data."
United States
- Geospatial Data Gateway: Sponsored by the USDA and Natural Resources Conservation Service.
- CEC North American Environmental Atlas: Commission for Environmental Cooperation.
- PolicyMap: "An idea in 2007 to make geographic data easily accessible and understandable to policy makers grew into what is now PolicyMap: a data warehouse of over 50,000 indicators accessible through a state-of-the-art mapping and analytics platform."
- Mapping Police Violence: "This information has been meticulously sourced from the three largest, most comprehensive and impartial crowdsourced databases on police killings in the country: FatalEncounters.org, the U.S. Police Shootings Database and KilledbyPolice.net. We've also done extensive original research to further improve the quality and completeness of the data; searching social media, obituaries, criminal records databases, police reports and other sources to identify the race of 90 percent of all victims in the database." Provides map and infographics as well as raw data download as spreadsheet.
- National Oceanic and Atmospheric Administration: Search within the research areas on the site for GIS data specific to your needs.
- Census Bureau: Learn about America's communities through our data profiles. They cover 100,000+ different geographies: states, counties, places, tribal areas, zip codes, and congressional districts. For each, we cover topics like education, employment, health, and housing just to name a few.
- Dallas-Fort Worth Federal Statistical Research Data Center: Partnership between the U.S. Census Bureau and a consortium that includes the Federal Reserve Bank of Dallas, the University of Texas at Dallas, the University of Texas Southwestern Medical Center, the University of Texas at Arlington, Southern Methodist University, the University of North Texas, and the Dallas-Fort Worth Hospital Council Foundation. OU is also a consortium member, so OU researchers can contact this branch for help with US Census data.
- NHGIS: National Historical Geographic Information System. Hosted by University of Minnesota.
- U.S. Geological Survey: Links to National Map, National Atlas and Geospatial One Stop.
- US Bureau of Land Management: Each office has its own data center.
- Mapping Inequality: Redlining in New Deal America: "Redlining directed both public and private capital to native-born white families and away from African American and immigrant families. As homeownership was arguably the most significant means of intergenerational wealth building in the United States in the twentieth century, these redlining practices from eight decades ago had long-term effects in creating wealth inequalities that we still see today. Mapping Inequality, we hope, will allow and encourage you to grapple with this history of government policies contributing to inequality." The data are available as shapefiles and GeoJSON, and are searchable by description, transcription, city, and state.
- Data 4 Black Lives: COVID-19 Data: "D4BL has worked to consolidate state level data to explore the disproportionate impact of COVID-19 on Black people in the US. The D4BL COVID-19 Dataset captures state-level COVID-19 cases and deaths for Black people in the United States. D4BL established a team of volunteer data scientists to develop a codebase for automating the data extraction from state websites and storing it into this dataset."
- Inuit Siku (sea ice) Atlas: "A group of Inuit experts, community researchers, and university researchers, have worked together over the past several years to document specialized Inuit knowledge about sea ice. Inuit elders and hunters are the local experts on sea ice. Through long term observation and experience with the sea ice, they have developed deep and rich understandings of the marine environment"
Oklahoma
Aviation & Aerospace Engineering
- MIT Global Airline Industry Program: Commercial airline data sets with a focus on revenue metrics.
- Gas Dynamics Lab: Datasets from Princeton University’s Gas Dynamics Lab.
Engineering Data
- AirNow: Provides air quality data for 500 U.S. cities, as well as U.S. consulates and embassies. Includes an interactive map.
- Citrination: Open data repository for data on materials and chemical information, including alloys, polymers, and semiconductors.
- Energy Information Administration: Provides statistics and other data relating to major energy sectors, including petroleum, natural gas, coal, nuclear, and renewables.
- Knovel: Also known as Data Search. Find materials and properties information from technical references.
- MakeItFrom: A material properties database that emphasizes ease of comparison. The data are sourced from published standards, academic literature, and supplier documentation.
- Materials Data Facility (MDF): Discover published datasets; provides resources to help you publish your own datasets.
- Materials Project: Search for materials by chemistry, composition, or property. Also offers computational tools that can predict the properties of novel materials.
Geology & Geophysics Data
- EarthChem Library: Open access repository for geochemical datasets.
- Geological Society of America (GSA) Data Repository: Supplemental data from Geological Society of America publications.
- IRIS Seismic Data Access: Sponsored by the NSF, IRIS provides wide-ranging seismic data, including time series, event, and historical data.
Environmental and Sustainability Datasets
- Chemical Effects in Biological Systems (CEBS): NIEHS-supported public data sets.
- Chemical Entities of Biological Interest (ChEBI): Chemical Entities of Biological Interest (ChEBI) is a freely available dictionary of molecular entities focused on ‘small’ chemical compounds. (European Bioinformatics Institute)
- Comparative Toxicogenomics Database (CTD): CTD illuminates how environmental chemicals affect human health.
- DataONE (Data Observation Network for Earth): Data Observation Network for Earth (DataONE) is the foundation of new innovative environmental science through a distributed framework and sustainable cyberinfrastructure that meets the needs of science and society for open, persistent, robust, and secure access to well-described and easily discovered Earth observational data.
- Environmental Genome Project: The NIEHS Environmental Genome Project is a multi-disciplinary, collaborative effort focused on examining the relationships between environmental exposures, inter-individual sequence variation in human genes and disease risk in U.S. populations.
- IASSIST Data Sources: International Association for Social Science Information Services & Technology data sources.
- National Institute of Environmental Health Sciences Databases: Links to 12 science-related data sets supported by NIEHS.
- OpenDOAR: Directory of Open Access Repositories.
Life Sciences Datasets
- EcoCyc E. coli Database: EcoCyc is a scientific database for the bacterium Escherichia coli K-12 MG1655. The EcoCyc project performs literature-based curation of the entire genome, and of transcriptional regulation, transporters, and metabolic pathways.
- Gene Ontology Resource: "The Gene Ontology (GO) knowledgebase is the world’s largest source of information on the functions of genes. This knowledge is both human-readable and machine-readable, and is a foundation for computational analysis of large-scale molecular biology and genetics experiments in biomedical research."
- Human Genome Project: The Human Genome Project (HGP) refers to the international 13-year effort, formally begun in October 1990 and completed in 2003, to discover all the estimated 20,000-25,000 human genes and make them accessible for further biological study.
- iDigBio specimens online: "Making data and images of millions of biological specimens available on the web"
- IUCN Red List of Threatened Species: "Contains over 75,000 assessments of species, subspecies, varieties and subpopulations covering a variety of taxa"
- MetaCyc: Metabolic Pathway Database: MetaCyc is a curated database of experimentally elucidated metabolic pathways from all domains of life. MetaCyc contains 2453 pathways from 2788 different organisms.
- Movebank: Animal movement data from a variety of sources (GPS tracking, geolocators, etc.).
- National Science Foundation's National Ecological Observatory Network (NEON): "NEON collects environmental data and archival samples that characterize plant, animals, soil, nutrients, freshwater and atmosphere from 81 field sites strategically located in terrestrial and freshwater ecosystems across the U.S."
- AquaDocs: AquaDocs is an open access repository covering the natural marine, estuarine/brackish and freshwater environments. It includes all aspects of the science, technology, management and conservation of these environments, their organisms and resources, and the economic, sociological and legal aspects.
- NBII (Archive Site): The National Biological Information Infrastructure website was terminated by the Federal Government in 2012. The University of North Texas has archived portions of this resource.
- Qiita Datasets: "Qiita allows users to download public data as well as the user’s own private data. This data can then be used for processing and analysis in external tools."
- TAIR: The Arabidopsis Information Resource: The Arabidopsis Information Resource (TAIR) maintains a database of genetic and molecular biology data for the model higher plant Arabidopsis thaliana.
- Smithsonian NMNH Department of Botany Collections Online: The plant collections of the Smithsonian Institution began with the acquisition of specimens collected by the United States Exploring Expedition (1838-1842). These formed the foundation of a National Herbarium which today numbers over 5 million historical plant records, placing it among the world's largest and most important. Over 4.2 million specimen records (including over 115,000 type specimens with images) are currently available in this online catalog.
National Center for Biotechnology Information (NCBI)
- NCBI: All Resources: Catalog of all NCBI databases and tools.
If you don't know which specialized database below to use, start by searching all databases.
DNA/RNA Databases
- Gene: Gene integrates information from a wide range of species. A record may include nomenclature, Reference Sequences (RefSeqs), maps, pathways, variations, phenotypes, and links to genome-, phenotype-, and locus-specific resources worldwide.
- Gene Expression Omnibus (NCBI): GEO is an international public repository that archives and freely distributes microarray, next-generation sequencing, and other forms of high-throughput functional genomics data submitted by the research community.
- Genome: This resource organizes information on genomes including sequences, maps, chromosomes, assemblies, and annotations.
- Nucleotide: The Nucleotide database is a collection of sequences from several sources, including GenBank, RefSeq, TPA and PDB. Genome, gene and transcript sequence data provide the foundation for biomedical research and discovery.
Protein Databases
- Conserved Domain Database: The Conserved Domain Database is a resource for the annotation of functional units in proteins. Its collection of domain models includes a set curated by NCBI, which utilizes 3D structure to provide insights into sequence/structure/function relationships.
- Protein: The Protein database is a collection of sequences from several sources, including translations from annotated coding regions in GenBank, RefSeq and TPA, as well as records from SwissProt, PIR, PRF, and PDB. Protein sequences are the fundamental determinants of biological structure and function.
- Structure: Contains macromolecular 3D structures derived from the Protein Data Bank, as well as tools for their visualization and comparative analysis.
Taxonomy
- Taxonomy: The Taxonomy Database is a curated classification and nomenclature for all of the organisms in the public sequence databases. This currently represents about 10% of the described species of life on the planet.