Query the OmicIDX Bigquery tables

Use standard SQL queries to explore omics metadata.

In this section, we will introduce SQL query concepts mainly through examples. Here, we assume a basic understanding of SQL; there are many good online tutorials for getting up-to-speed.

Opening thoughts

Bigquery uses standard SQL, but it is not a traditional relational database system. It is a columnar database, meaning that each column of data is stored separately from the other columns. This has a couple of implications. The first is that queries will generally be most efficient if you can specify which columns you want returned (i.e., no select * from ... queries!). The second is that limiting queries to return only required columns will reduce costs since Bigquery charges based on the amount of data scanned in a query.

Table exploration

As I mentioned in my Opening thoughts section, Bigquery discourages select * queries. So, how can we get a sense of what is in the table in order to construct good queries?

The OmicIDX dataset appears in the left "dataset" window.

The OmicIDX dataset appears in the left “dataset” window.

Clicking the "pin" button on the right will keep the OmicIDX dataset in your dataset navigator for easy access.

Clicking the “pin” button on the right will keep the OmicIDX dataset in your dataset navigator for easy access.

Clicking the "pin" button on the right will keep the OmicIDX dataset in your dataset navigator for easy access.

Clicking the “pin” button on the right will keep the OmicIDX dataset in your dataset navigator for easy access.

Clicking the "pin" button on the right will keep the OmicIDX dataset in your dataset navigator for easy access.

Clicking the “pin” button on the right will keep the OmicIDX dataset in your dataset navigator for easy access.

First query

Clicking the "pin" button on the right will keep the OmicIDX dataset in your dataset navigator for easy access.

Clicking the “pin” button on the right will keep the OmicIDX dataset in your dataset navigator for easy access.

Clicking the "pin" button on the right will keep the OmicIDX dataset in your dataset navigator for easy access.

Clicking the “pin” button on the right will keep the OmicIDX dataset in your dataset navigator for easy access.

SELECT count(organism) as number_of_samples, organism
FROM `isb-cgc-01-0006.omicidx.sra_sample` 
GROUP BY organism 
ORDER BY number_of_samples DESC
LIMIT 100

Last modified January 19, 2020: Adding to docs (afce765)