Quickstart
To start using biobricks, install the command line client, configure it, and install a brick:
$ pipx install biobricks
$ biobricks configure # set a path and token
$ biobricks install clinvar # or any other database
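To install several bricks from a script, you can call the CLI commands shown above from Python. The sketch below assumes only the biobricks install <brick> command from this Quickstart, and that biobricks configure has already been run. The install_bricks helper and its runner parameter are hypothetical and are not part of the biobricks package:

```python
import subprocess
from typing import Callable, Iterable, List


def install_bricks(names: Iterable[str], runner: Callable = subprocess.run) -> List[List[str]]:
    """Run `biobricks install <name>` for each brick and return the commands run.

    Hypothetical helper: `runner` defaults to subprocess.run and is a parameter
    only so that the loop can be exercised without the CLI installed.
    """
    commands = []
    for name in names:
        cmd = ["biobricks", "install", name]
        # check=True makes subprocess.run raise CalledProcessError on a non-zero exit
        runner(cmd, check=True)
        commands.append(cmd)
    return commands
```

For example, install_bricks(["clinvar"]) runs the same command as the Quickstart. Because check=True is passed, a failed install stops the script instead of being skipped silently.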
Installing a Brick
clinvar is a database of clinically relevant genetic variants. First, install it:
$ biobricks install clinvar
Each brick has a list of assets, which are filesystem paths rooted at BBLIB. List them with:
$ biobricks assets clinvar
hgvs4variation_parquet: <BBLIB>/brick/hgvs4variation.parquet
submission_summary_parquet: <BBLIB>/brick/submission_summary.parquet
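If you want these paths in a script without the Python package, you can parse the listing. The sketch below assumes each line has the form "name: path", as in the clinvar output above. The listing abbreviates the library root as <BBLIB>; the sketch substitutes a root you supply if that placeholder appears. parse_assets is a hypothetical helper, not part of biobricks:

```python
from pathlib import Path
from typing import Dict


def parse_assets(output: str, bblib: str) -> Dict[str, Path]:
    """Parse `biobricks assets <brick>` text into {asset_name: path}.

    Assumes 'name: path' lines as shown in the clinvar example; the format
    is inferred from that example, not from a specification.
    """
    assets = {}
    for line in output.splitlines():
        # split at the first colon only, so paths that contain ':' survive
        name, sep, path = line.partition(":")
        if not sep or not name.strip():
            continue  # skip blank lines and lines without a 'name:' prefix
        # replace the <BBLIB> placeholder, if present, with the given root
        assets[name.strip()] = Path(path.strip().replace("<BBLIB>", bblib))
    return assets
```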
The clinvar assets are parquet files, which can be read from Python and R.
Python Example
To use biobricks in Python, first install the command line client (see Quickstart).
The biobricks.assets function returns a namespace of brick asset paths.
These paths work with Spark, as well as with pandas and pyarrow.
>>> import biobricks, pandas
>>> clinvar = biobricks.assets('clinvar') # a namespace with paths to assets
>>> pandas.read_parquet(clinvar.allele_gene_parquet)
AlleleID GeneID Symbol Name GenesPerAlleleID ...
0 15041.0 9907.0 AP5Z1 adaptor related protein complex 5 subunit zeta 1 1.0
1 15042.0 9907.0 AP5Z1 adaptor related protein complex 5 subunit zeta 1 1.0
...
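To work with every parquet asset of a brick instead of one at a time, you can iterate over the namespace. The sketch below assumes the object returned by biobricks.assets behaves like a types.SimpleNamespace, so vars() lists its attributes. That is an assumption, not documented behaviour. It uses a hand-built stand-in namespace with hypothetical paths so that it runs without biobricks installed. parquet_assets is a hypothetical helper:

```python
from types import SimpleNamespace
from typing import Dict


def parquet_assets(ns) -> Dict[str, str]:
    """Return the assets whose names end in '_parquet'.

    Relies on the '_parquet' suffix seen in the clinvar listing, which is a
    naming pattern, not a guarantee, and on vars() working on the namespace.
    """
    return {name: path for name, path in vars(ns).items() if name.endswith("_parquet")}


# Hypothetical stand-in for biobricks.assets('clinvar'); real paths will differ.
clinvar = SimpleNamespace(
    allele_gene_parquet="/data/bblib/brick/allele_gene.parquet",
    submission_summary_parquet="/data/bblib/brick/submission_summary.parquet",
)

for name, path in sorted(parquet_assets(clinvar).items()):
    print(name, path)
```

With the real package, each returned path can be passed to pandas.read_parquet, as in the session above.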
R Example
To use biobricks in R, first install the command line client (see Quickstart), then install
the R package (see below). The bbassets function is the main entry point: it returns a named
list of asset paths for a given brickref, such as 'clinvar'.
> install.packages('biobricks')
> install.packages(c('arrow', 'dplyr')) # used below to read the parquet assets
> clinvar <- biobricks::bbassets('clinvar') # a named list of assets
> arrowds <- arrow::open_dataset(clinvar$allele_gene_parquet) # arrow loads parquet files
> arrowds |> head() |> dplyr::collect()
# A tibble: 6 × 7
# AlleleID GeneID Symbol Name GenesPerAlleleID Category Source
# * <dbl> <dbl> <chr> <chr> <dbl> <chr> <chr>
# 1 15041 9907 AP5Z1 adaptor related prot… 1 within … submi…
# 2 15042 9907 AP5Z1 adaptor related prot… 1 within … submi…
A list of available bricks can be found at status.biobricks.ai.