
These are the formats supported for gene expression data files:

General Expression Data

This format covers files generated by Illumina's GenomeStudio and Affymetrix's Expression Console, as well as other software.
The data are in tab-delimited text tables with a single header row labeling the columns, followed by one row for each probe. The first column contains the probe ID, and some or all of the remaining columns contain data for the samples.
Each sample data column must be labeled with the sample ID followed by a suffix that begins with a period and contains no other periods. Signal value columns, which are required, have suffixes that include the word "Signal" or "signal". Detection p-value columns, which are optional, have suffixes that include the word "Detection" or "detection".
Other columns (with no periods at all in the label) are allowed but ignored.
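The labeling rules above can be sketched in Python as a header check. This is only an illustration, not the parser used by this site: the function name classify_columns and the example header are made up, and the handling of labels that contain a period but neither keyword is an assumption, since the rules above do not cover that case.

```python
def classify_columns(header_line):
    """Sort the columns of a General Expression Data header row by role.

    Returns a dict with:
      "signal"    - sample ID -> column index of its signal column
      "detection" - sample ID -> column index of its detection p-value column
      "ignored"   - labels with no period at all (allowed but ignored)
      "other"     - labels with a period but neither keyword (assumption:
                    the format description does not say how these are treated)
    """
    labels = header_line.rstrip("\r\n").split("\t")
    result = {"signal": {}, "detection": {}, "ignored": [], "other": []}
    # Column 0 holds the probe ID, so start at column 1.
    for index, label in enumerate(labels[1:], start=1):
        # The suffix contains no other periods, so it starts at the last period.
        sample_id, dot, suffix = label.rpartition(".")
        if not dot:
            result["ignored"].append(label)
        elif "Signal" in suffix or "signal" in suffix:
            result["signal"][sample_id] = index
        elif "Detection" in suffix or "detection" in suffix:
            result["detection"][sample_id] = index
        else:
            result["other"].append(label)
    return result


# Hypothetical header row, for illustration only.
example_header = (
    "PROBE_ID\tSYMBOL\tS1.AVG_Signal\tS1.Detection Pval"
    "\tS2.AVG_Signal\tS2.Detection Pval"
)
columns = classify_columns(example_header)
print(columns["signal"])     # sample IDs mapped to signal column indexes
print(columns["detection"])  # sample IDs mapped to detection column indexes
print(columns["ignored"])    # labels without a period
```

Running a check like this before uploading can reveal a file with no signal columns, which the format requires.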

Plain Signal Data

This is a simple format that includes only signal data.
The data are in tab-delimited text tables with a single header row labeling the columns, followed by a row for each probe. The first column contains the probe ID, and all of the following columns contain signal values for the samples.
The signal column headings must simply be the sample IDs.
No other columns are allowed.
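As a rough sketch of what a Plain Signal Data file looks like to a program, the following Python reads such a table into memory. The function name read_plain_signal and the sample data are hypothetical, and the code assumes every signal value is a plain number that float() can parse; how missing values should be written is not covered by the description above.

```python
import csv
import io


def read_plain_signal(handle):
    """Read a Plain Signal Data table from an open text file.

    Returns (sample_ids, table), where table maps each probe ID to the
    list of signal values in sample order.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    sample_ids = header[1:]  # first column is the probe ID
    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        if len(row) != len(header):
            raise ValueError(f"row for probe {row[0]!r} has {len(row)} columns, "
                             f"expected {len(header)}")
        # Assumption: every value is numeric; float() raises ValueError otherwise.
        table[row[0]] = [float(value) for value in row[1:]]
    return sample_ids, table


# Made-up example data, for illustration only.
example = io.StringIO(
    "ProbeID\tS1\tS2\n"
    "probe_1\t10.5\t20.0\n"
    "probe_2\t7.25\t3.0\n"
)
sample_ids, table = read_plain_signal(example)
print(sample_ids)
print(table["probe_1"])
```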

GEO Family (soft)

This format is available through the NCBI Gene Expression Omnibus (GEO) website. The files are typically named GSEnnnn_family.soft (where nnnn is the series number).
Only one platform is allowed per sample set in our system. Many studies provide Family files with samples run on different chips. In these cases the best approach is to upload Series Matrix files, described below.
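One way to tell in advance whether a Family file mixes platforms is to count its platform entries. The sketch below assumes the usual SOFT convention in which each platform entry starts with a line of the form "^PLATFORM = GPL..."; the function name platform_ids and the sample lines are made up for illustration, and this is not necessarily how this site performs the check.

```python
def platform_ids(lines):
    """Return the distinct platform IDs declared in a SOFT family file.

    Assumption: each platform entry begins with a line such as
    "^PLATFORM = GPL111".
    """
    ids = []
    for line in lines:
        if line.startswith("^PLATFORM"):
            _, _, value = line.partition("=")
            platform = value.strip()
            if platform and platform not in ids:
                ids.append(platform)
    return ids


# Made-up excerpt of a family file, for illustration only.
example_lines = [
    "^SERIES = GSE0000\n",
    "^PLATFORM = GPL111\n",
    "^PLATFORM = GPL222\n",
    "^SAMPLE = GSM0001\n",
]
found = platform_ids(example_lines)
if len(found) > 1:
    print("Multiple platforms found; consider Series Matrix files:", found)
```

For a real file, the same function can be given an open file object, for example platform_ids(open("GSEnnnn_family.soft")) with the actual file name substituted.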
Files downloaded from GEO are usually compressed, with .gz extensions. They need to be decompressed (e.g. using gunzip) before uploading them here.
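Where gunzip is not available, the same decompression can be done with the Python standard library. This is a minimal sketch; the function name gunzip_file is made up, and unlike gunzip it leaves the original .gz file in place.

```python
import gzip
import shutil


def gunzip_file(gz_path, out_path=None):
    """Decompress a .gz file and return the path of the decompressed copy.

    If out_path is not given, the ".gz" extension is dropped from gz_path.
    The compressed file is left in place.
    """
    if out_path is None:
        if not gz_path.endswith(".gz"):
            raise ValueError("expected a file name ending in .gz")
        out_path = gz_path[:-len(".gz")]
    # Stream the data so large files are not loaded into memory at once.
    with gzip.open(gz_path, "rb") as source, open(out_path, "wb") as target:
        shutil.copyfileobj(source, target)
    return out_path


# Usage (substitute the name of the downloaded file):
# gunzip_file("GSEnnnn_family.soft.gz")
```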

GEO Series Matrix (txt)

This format is available through the NCBI Gene Expression Omnibus (GEO) website. The files are typically named GSEnnnn-GPLmmm_series_matrix.txt (where nnnn is the series number and mmm is the platform number).
Each of these files includes samples for only a single platform. One drawback is that they do not include detection p-values.
Files downloaded from GEO are usually compressed, with .gz extensions. They need to be decompressed (e.g. using gunzip) before uploading them here.
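The typical Series Matrix file name described in this section can be taken apart with a regular expression, which is handy when sorting several downloads by platform. The sketch below is illustrative only: the function name parse_series_matrix_name and the example file names are made up, and names that do not follow the GSEnnnn-GPLmmm pattern are simply reported as unrecognized.

```python
import re

# Matches names such as "GSE1234-GPL570_series_matrix.txt", optionally
# still carrying the ".gz" extension from the download.
SERIES_MATRIX_NAME = re.compile(r"^(GSE\d+)-(GPL\d+)_series_matrix\.txt(\.gz)?$")


def parse_series_matrix_name(file_name):
    """Return (series, platform, is_compressed), or None if the name does not match."""
    match = SERIES_MATRIX_NAME.match(file_name)
    if match is None:
        return None
    series, platform, gz_suffix = match.groups()
    return series, platform, gz_suffix is not None


# Made-up file names, for illustration only.
parsed = parse_series_matrix_name("GSE1234-GPL570_series_matrix.txt.gz")
print(parsed)
print(parse_series_matrix_name("notes.txt"))
```

A third element of True indicates that the file still needs to be decompressed before uploading.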