Metadata Dictionary

The CancerModels.org Metadata Dictionary describes the data model, which adheres to specific formats and restrictions to ensure a standard of data quality. The following sections describe the attributes and permissible values for every field in the clinical TSV files for the CancerModels.org platform.

Version 1.1 (2023-06-27)

Version 1.0 (2023-05-10)

0 new fields, 0 updated fields, 0 deleted fields
6 files with 58 fields
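
As a rough illustration of working with these files, the sketch below reads one tab-separated sheet and reports expected columns that are absent. It assumes the first row of the file holds the field names, which this page does not confirm; `read_sheet` and `missing_columns` are hypothetical helper names, not part of any CancerModels.org tooling.

```python
import csv
import io


def read_sheet(handle):
    """Read one tab-separated sheet into a list of dicts keyed by column name."""
    # Assumption: the first row is the header row with the field names.
    return list(csv.DictReader(handle, delimiter="\t"))


def missing_columns(rows, expected):
    """Return the expected column names that the sheet does not contain."""
    present = set(rows[0].keys()) if rows else set()
    return [name for name in expected if name not in present]


# Usage with an in-memory sheet; a real file would be opened with open(path, newline="").
sample = io.StringIO("patient_id\tsex\nP1\tfemale\n")
rows = read_sheet(sample)
absent = missing_columns(rows, ["patient_id", "sex", "history"])
```

Here `absent` lists only `history`, because the in-memory sheet defines the other two columns.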

Patient (patient)

7 Fields
The collection of descriptors associated with patient information for data submission to CancerModels.org.
Sheet Name Example: patient
Field & Description
Attributes
Type
Permissible Values
Notes
patient_id
Unique anonymous/de-identified ID for the patient from the provider.
Required
TEXT
Values must meet the regular expression

Examples:
Alphanumeric
sex
Sex of the patient.
Required
TEXT
Any of the following:
female
male
other
Not provided
Not collected
Any of the following values: [male, female, other, not collected, not provided]
history
Cancer-relevant comorbidity or environmental exposure.
TEXT
Values must meet the regular expression
Free text
ethnicity
Patient ethnic group. Can be derived from self-assessment or genetic analysis.
TEXT
Values must meet the regular expression
Use NCIT ontology term where possible.
ethnicity_assessment_method
Patient ethnic group assessment method.
TEXT
Any of the following:
self-assessed
genetic
Not provided
Not collected
Any of the following: ['self-assessed', 'genetic', 'not provided', 'not collected']
initial_diagnosis
Diagnosis of the patient when first diagnosed (at age_at_initial_diagnosis). This can differ from the diagnosis at the time of collection, which is recorded in the patient sample section.
TEXT
Values must meet the regular expression
Use NCIT ontology term where possible.
age_at_initial_diagnosis
Age at first diagnosis. Can be earlier than the age at which the tissue sample was collected for implant.
TEXT
Values must meet the regular expression

Examples:
Numeric. Can be exact age or binned in 10 year groups (1-9, 10-19, ...).
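
The Patient rules above could be checked along the following lines. This is a minimal sketch, not the platform's validator: the age pattern is an assumption (the dictionary's actual regular expressions are not shown on this page), enumerated values are compared case-insensitively because the page lists both "Not provided" and "not provided", and `check_patient_row` is a hypothetical helper name.

```python
import re

# Permissible values copied from the Patient table above.
SEX_VALUES = {"female", "male", "other", "not provided", "not collected"}
REQUIRED_PATIENT_FIELDS = ("patient_id", "sex")

# Assumed pattern: an exact age ("57") or a bin such as "1-9" or "10-19".
AGE_PATTERN = re.compile(r"^\d{1,3}(-\d{1,3})?$")


def check_patient_row(row):
    """Return a list of human-readable problems found in one patient row."""
    problems = []
    for field in REQUIRED_PATIENT_FIELDS:
        if not row.get(field, "").strip():
            problems.append(f"{field}: required value is missing")
    sex = row.get("sex", "").strip().lower()
    if sex and sex not in SEX_VALUES:
        problems.append(f"sex: {sex!r} is not a permissible value")
    age = row.get("age_at_initial_diagnosis", "").strip()
    if age and not AGE_PATTERN.match(age):
        problems.append(f"age_at_initial_diagnosis: {age!r} is not an age or age bin")
    return problems
```

A row such as `{"patient_id": "P1", "sex": "Female", "age_at_initial_diagnosis": "10-19"}` yields an empty list. Whether placeholder values such as "not provided" are accepted in the age field is not stated here, so this sketch would flag them.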

Patient Sample (patient_sample)

20 Fields
The collection of descriptors associated with clinical information of the tumour sample for data submission to CancerModels.org.
Sheet Name Example: patient_sample
Field & Description
Attributes
Type
Permissible Values
Notes
patient_id
Unique anonymous/de-identified ID for the patient from the provider.
Required
TEXT
Values must meet the regular expression

Examples:
Alphanumeric
sample_id
Unique ID of the patient tumour sample used to generate cancer models.
Required
TEXT
Values must meet the regular expression
Alphanumeric
collection_date
Date of collection. Important for understanding the time relationship between models generated for the same patient.
TEXT
Values must meet the regular expression

Examples:
MMM YYYY
collection_event
Collection event corresponding to each time a patient was sampled to generate a cancer model. Subsequent collection events are incremented by 1.
TEXT
Values must meet the regular expression
collection event + 'event number'
months_since_collection_1
The time difference between the 1st collection event and the current one (in months).
TEXT
Values must meet the regular expression

Examples:
Numeric. Collection event 1 should be 0; collection event 2 should be 6 if 6 months have elapsed between collection 1 and collection 2; and collection event 3 should be 9 if 9 months have elapsed between collection 1 and collection 3.
age_in_years_at_collection
Patient age at collection.
Required
TEXT
Values must meet the regular expression

Examples:
Numeric. Can be exact age or binned in 10 year groups (1-9, 10-19, ...).
diagnosis
Diagnosis at the time of collection of the patient tumour used in the cancer model.
Required
TEXT
Values must meet the regular expression
Use NCIT ontology term where possible.
tumour_type
Collected tumour type.
Required
TEXT
Any of the following:
primary
metastatic
recurrent
refractory
pre-malignant
2 more
Use NCIT ontology term where possible.
primary_site
Site of the primary tumour, where the cancer originated (may not correspond to the site of the current tissue sample).
Required
TEXT
Values must meet the regular expression

Examples:
Use NCIT ontology term where possible.
collection_site
Site of collection of the tissue sample (can be different than the primary site if tumour type is metastatic).
Required
TEXT
Values must meet the regular expression

Examples:
Use NCIT ontology term where possible.
stage
Stage of the patient at the time of collection.
TEXT
Values must meet the regular expression

Examples:
Should be in the format of the stage classification.
staging_system
Stage classification system used to describe the stage. Add the version if available.
TEXT
Values must meet the regular expression
Free text
grade
The implanted tumour grade value.
TEXT
Values must meet the regular expression

Examples:
Free text
grading_system
Grade classification system used to describe the grade. Add the version if available.
TEXT
Values must meet the regular expression
Free text
virology_status
Positive virology status at the time of collection. Any relevant virology information that can influence cancer, such as EBV, HIV, or HPV status.
TEXT
Values must meet the regular expression
Use NCIT ontology term where possible.
sharable
Is patient treatment information available and sharable? If yes, fill out the following treatment columns: treatment_naive_at_collection, treated_at_collection, treated_prior_to_collection.
Required
TEXT
Any of the following:
yes
no
Not provided
Not collected
Any of the following: ['yes', 'no', 'not provided', 'not collected']
treatment_naive_at_collection
Was the patient treatment naive at the time of collection? This covers both treatment at the time of tumour sample collection and treatment prior to the tumour sample collection. The value will be 'yes' if either treated_at_collection or treated_prior_to_collection is 'yes'.
Required
TEXT
Any of the following:
yes
no
Not provided
Not collected
Any of the following: ['yes', 'no', 'not provided', 'not collected']
treated_at_collection
Was the patient being treated for cancer at the time of tumour sample collection? This includes any of the following: radiotherapy, chemotherapy, targeted therapy, hormone therapy.
TEXT
Any of the following:
yes
no
Not provided
Not collected
Any of the following: ['yes', 'no', 'not provided', 'not collected']
treated_prior_to_collection
Was the patient treated for cancer prior to the time of tumour sample collection? This includes any of the following: radiotherapy, chemotherapy, targeted therapy, hormone therapy.
TEXT
Any of the following:
yes
no
Not provided
Not collected
Any of the following: ['yes', 'no', 'not provided', 'not collected']
model_id
Unique identifier for all the cancer models derived from the same tissue sample. Needs to be unique.
Required
TEXT
Values must meet the regular expression

Examples:
Alphanumeric
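
The relationship between collection_date and months_since_collection_1 can be sketched as follows. This assumes dates use English three-letter month abbreviations in the `MMM YYYY` form (for example `Jan 2020`) and that the first date in the list belongs to collection event 1; both function names are hypothetical.

```python
from datetime import datetime


def parse_collection_date(text):
    """Parse a 'MMM YYYY' value such as 'Jan 2020' into a datetime."""
    # %b matches abbreviated month names in the current locale (English by default).
    return datetime.strptime(text.strip(), "%b %Y")


def months_since_first(collection_dates):
    """Whole months elapsed between the first collection and each collection."""
    parsed = [parse_collection_date(d) for d in collection_dates]
    first = parsed[0]
    return [(d.year - first.year) * 12 + (d.month - first.month) for d in parsed]
```

For collections in `Jan 2020`, `Jul 2020`, and `Oct 2020`, this gives 0, 6, and 9, matching the worked example in the months_since_collection_1 note.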

Sharing (sharing)

7 Fields
The collection of descriptors associated with model data sharability for data submission to CancerModels.org.
Sheet Name Example: sharing
Field & Description
Attributes
Type
Permissible Values
Notes
model_id
Unique identifier for all the cancer models derived from the same tissue sample.
Required
TEXT
Values must meet the regular expression

Examples:
Alphanumeric
accessibility
Define any limitation of access of the model per type of users like academia, industry, academia and industry, or national limitation if needed (e.g. no specific consent for sequencing).
Required
TEXT
Any of the following:
academia
industry
academia and industry
Any of the following: ['academia', 'industry', 'academia and industry']
europdx_access_modality
If part of the EUROPDX consortium, fill this in. Designates whether a model is accessible for transnational access through the EDIReX infrastructure or only on a collaborative basis (i.e. upon approval of the proposed project by the owner of the model).
TEXT
Any of the following:
transnational access
collaboration only
Not applicable
Not provided
Not collected
Any of the following: ['transnational access', 'collaboration only', 'not provided']
email
Contact email for any requests from users about models. If multiple, include as comma separated list.
Required
TEXT
Values must meet the regular expression
Email address
name
Contact person (should match that included in email column).
Required
TEXT
Values must meet the regular expression

Examples:
Free text
form_url
If the center has a contact form rather than a contact email include the link here.
TEXT
URL
database_url
If the institution has a public database and wants that link to be shared, include it here.
TEXT
Values must meet the regular expression
URL
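
For the Sharing sheet, the comma-separated email field and the accessibility values might be handled as below. This is only a sketch: the email check is a deliberately loose shape test rather than the dictionary's own regular expression (which is not shown on this page), and all three function names are hypothetical.

```python
# Permissible values copied from the accessibility row above.
ACCESSIBILITY_VALUES = {"academia", "industry", "academia and industry"}


def split_contacts(email_field):
    """Split a comma-separated email field into a list of trimmed addresses."""
    return [part.strip() for part in email_field.split(",") if part.strip()]


def looks_like_email(address):
    """Loose shape check: one '@' with text before it and a dotted domain after it."""
    local, sep, domain = address.partition("@")
    return bool(local) and sep == "@" and "." in domain and "@" not in domain


def is_permissible_accessibility(value):
    """Case-insensitive membership test against the listed accessibility values."""
    return value.strip().lower() in ACCESSIBILITY_VALUES
```

Splitting first and checking each address separately keeps multi-contact rows readable in error reports, since a single malformed address can be named rather than rejecting the whole field.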

PDX Model (pdx_model)

9 Fields
The collection of descriptors associated with PDX models for data submission to CancerModels.org
Sheet Name Example: pdx_model
Field & Description
Attributes
Type
Permissible Values
Notes
model_id
Unique identifier for all the cancer models derived from the same tissue sample.
Required
TEXT
Values must meet the regular expression

Examples:
Alphanumeric
host_strain_name
Host mouse strain name (e.g. NOD-SCID, NSG, etc).
Required
TEXT
Values must meet the regular expression

Examples:
Mouse strain name
host_strain_nomenclature
The full nomenclature form of the host mouse strain name.
Required
TEXT
Values must meet the regular expression
Use most precise nomenclature where possible.
engraftment_site
Organ or anatomical site used for the PDX tumour engraftment (e.g. mammary fat pad, Right flank).
Required
TEXT
Values must meet the regular expression

Examples:
Use anatomical graft site, not collected, or not provided.
engraftment_type
PDX engraftment type. Use orthotopic if the tumour was engrafted at a corresponding anatomical site (e.g. a patient tumour with primary site breast was grafted into the mouse mammary fat pad). If grafted subcutaneously, use heterotopic.
Required
TEXT
Any of the following:
heterotopic
orthotopic
Not provided
Not collected
Any of the following: ['heterotopic', 'orthotopic', 'not collected', 'not provided'].
sample_type
Description of the type of material grafted into the mouse. (e.g. tissue fragments, cell suspension).
Required
TEXT
Values must meet the regular expression
Free text
sample_state
PDX Engraftment material state (e.g. fresh or frozen). If other please describe.
TEXT
Values must meet the regular expression

Examples:
Free text
passage_number
Passage number. When different host strains, PDX engraftment sites, PDX engraftment types, or PDX engraftment materials were used during PDX line generation, add one row per passage for each model as needed. Passage 0 is assumed to correspond to the first engraftment; if this is not the case, please indicate what passage 0 corresponds to. Add rows if the values in columns D, E, F, or G change between passages. If they never change, enter 'all' as the passage value to indicate that the conditions are the same in all passages.
Required
TEXT
Values must meet the regular expression

Examples:
Numeric or all.
publications
If the model has been part of a published study, include PubMed IDs separated by commas.
TEXT
Values must meet the regular expression
PMID: 8 digit format, separated by commas.
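
The passage_number and publications notes could be checked roughly as follows. The PMID pattern is an assumption: it accepts up to 8 digits with an optional `PMID:` prefix, since the exact expected form is not shown on this page. Both function names are hypothetical.

```python
import re

# Assumed form: digits only, at most 8 of them, optionally prefixed with "PMID:".
PMID_PATTERN = re.compile(r"^(?:PMID:\s*)?(\d{1,8})$", re.IGNORECASE)


def parse_passage_number(value):
    """Return 'all' or a non-negative integer passage number."""
    text = value.strip().lower()
    if text == "all":
        return "all"
    if text.isdecimal():
        return int(text)
    raise ValueError(f"passage_number must be numeric or 'all', got {value!r}")


def parse_pmids(value):
    """Split a comma-separated publications field into a list of PMID strings."""
    pmids = []
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate trailing commas and empty entries
        match = PMID_PATTERN.match(part)
        if match is None:
            raise ValueError(f"not a PMID: {part!r}")
        pmids.append(match.group(1))
    return pmids
```

Returning PMIDs as strings rather than integers avoids silently changing how an identifier was written in the submission.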

Model Validation (model_validation)

5 Fields
The collection of descriptors associated with cancer model validation techniques for data submission to CancerModels.org
Sheet Name Example: model_validation
Field & Description
Attributes
Type
Permissible Values
Notes
model_id
Unique identifier for all the cancer models derived from the same tissue sample.
Required
TEXT
Values must meet the regular expression

Examples:
Alphanumeric
validation_technique
Any technique used to validate PDX against their original patient tumour, including fingerprinting, histology, immunohistochemistry.
Required
TEXT
Values must meet the regular expression
Free text
description
Short description of what was compared and what was the result: (e.g. high, good, moderate concordance between xenograft, 'model validated against histological features of same diagnosis' or 'not determined') - It needs to be clear if the model is validated or not.
Required
TEXT
Values must meet the regular expression
Free text
passages_tested
Provide a list of all passages where validation was performed. Passage 0 corresponds to the first engraftment (if this is not the case, please define how passages are numbered).
Required
TEXT
Values must meet the regular expression

Examples:
A list of numbers separated by commas.
validation_host_strain_nomenclature
Validation host mouse strain, following mouse strain nomenclature from MGI JAX.
Required
TEXT
Values must meet the regular expression
Full host strain name, not collected, or not provided

Cell Model (cell_model)

10 Fields
The collection of descriptors associated with cell and organoid models for data submission to CancerModels.org
Sheet Name Example: cell_model
Field & Description
Attributes
Type
Permissible Values
Notes
model_id
Unique identifier for all the cancer models derived from the same tissue sample.
Required
TEXT
Values must meet the regular expression

Examples:
Alphanumeric
name
Most common name associated with the model. Please use the CCLE name if available.
Required
TEXT
Values must meet the regular expression
Free text
type
Type of organoid or cell model.
Required
TEXT
Any of the following:
organoid
CRC
3-D: other
2D: other
cell line
2 more
One of the following: ['organoid', 'CRC', '3-D: other', '2D: other', 'cell line'].
growth_properties
Observed growth properties of the related model.
Required
TEXT
Values must meet the regular expression
One of the following: ['embedded 3d culture', 'adherent', 'mix of adherent and suspension', 'suspension'].
parent_id
Please add the model ID of the model used to generate this model. If that model is not in this set, please refer to it by its external ID.
TEXT
Values must meet the regular expression

Examples:
Alphanumeric
origin_patient_sample_id
Unique ID of the patient tumour sample used to generate the model.
TEXT
Values must meet the regular expression
Alphanumeric
comments
Add crucial comments about the model that cannot be expressed by other fields.
Required
TEXT
Values must meet the regular expression
Free text
supplier
Please provide the supplier's brief acronym or name, followed by a colon and the number or name used to reference the model.
TEXT
Values must meet the regular expression
Free text
external_ids
DepMap accession, Cellosaurus accession, or other ID. Please provide as a comma-separated list.
TEXT
Values must meet the regular expression
Free text
publications
If the model has been part of a published study, include PubMed IDs separated by commas.
TEXT
Values must meet the regular expression
PMID: 8 digit format, separated by commas.
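
For the Cell Model sheet, the supplier convention (name, colon, reference) and the growth_properties values might be checked as sketched below. The supplier string `ACME: X-001` in the usage line is invented for illustration, and both function names are hypothetical.

```python
# Permissible values copied from the growth_properties note above.
GROWTH_PROPERTIES = {
    "embedded 3d culture",
    "adherent",
    "mix of adherent and suspension",
    "suspension",
}


def parse_supplier(value):
    """Split 'name:reference' into a (name, reference) tuple of trimmed strings."""
    # partition splits on the first colon only, so references may contain colons.
    name, sep, reference = value.partition(":")
    if not sep or not name.strip() or not reference.strip():
        raise ValueError(f"supplier should look like 'name:reference', got {value!r}")
    return name.strip(), reference.strip()


def is_permissible_growth(value):
    """Case-insensitive membership test against the listed growth properties."""
    return value.strip().lower() in GROWTH_PROPERTIES


# Usage with a made-up supplier entry.
supplier_name, supplier_reference = parse_supplier("ACME: X-001")
```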