Restructured GEO 

The Gene Expression Omnibus (GEO), administered by the National Center for Biotechnology Information (NCBI), is the largest public repository of high-throughput functional genomic data and an indispensable resource for medical research. However, some of the most useful metadata for each GEO dataset is stored as unstructured English text, which is difficult for researchers to use effectively. We address this problem by developing Restructured GEO (ReGEO), which reorganizes and categorizes GEO series and makes them searchable by useful attributes, such as the disease under investigation and, for time course studies, the number of time points in the experiment. These features are curated from the unstructured metadata in GEO using text mining techniques. ReGEO is designed as a user-friendly database for integrative analytical research. By making it convenient to accurately identify thousands of potentially useful datasets, ReGEO is expected to catalyze such research in the era of Big Data.

