This site and tool are intended for research purposes only.

Database Queries

For a sample entered manually in the Database Single Query tab, or for samples uploaded from a batch file in the Database Batch Query tab, STRprofiler generates a report of similarity scores (described below) computed against a database of known STR profiles.

The report differs depending on whether an individual sample or a batch of samples is provided.

Default Database

Current data underlying the default database were provided by The Jackson Laboratory PDX program and the NCI Patient-Derived Models Repository (PDMR).

If this app is hosted with a custom database, please contact the host for information on the database source.

CLASTR / Cellosaurus API Query

Query of the Cellosaurus (Bairoch, 2018) cell line database is also available for single and batch samples via the CLASTR (Robin, Capes-Davis, and Bairoch, 2019) REST API.


Single Query Report

For individual samples, a report is generated with the following fields when ‘STRprofiler Database’ is selected as the search type.

Output Field | Description
Mixed Sample | Flag to indicate sample mixing. Sample mixing is determined by the “‘Mixed’ Sample Threshold” option. If the number of tri+ allelic markers exceeds the threshold, the sample is flagged as potentially mixed.
Shared Markers | Number of markers shared between the query and database sample.
Shared Alleles | Number of alleles shared between the query and database sample.
Tanabe Score | Tanabe similarity score between the query and database sample (if Tanabe selected).
Master Query Score | Master ‘Query’ similarity score between the query and database sample (if Master Query selected).
Master Ref Score | Master ‘Reference’ similarity score between the query and database sample (if Master Ref selected).
Markers 1 … n | Marker alleles with mismatches highlighted.

The report includes only those database samples whose score is greater than or equal to the user-defined Similarity Score Filter Threshold, and it reports only the selected similarity score.
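The three scores named in the table can be illustrated with a short calculation. This is a minimal sketch, assuming the commonly used definitions (Tanabe: twice the shared alleles over the sum of query and reference alleles; Masters: shared alleles over query or reference alleles) and counting only markers present in both profiles. The function name and the dict-of-sets profile layout are hypothetical and are not STRprofiler's own API; its handling of missing markers and Amelogenin may differ.

```python
def similarity_scores(query, reference):
    """Return shared counts and percentage scores for two STR profiles.

    Each profile is a dict mapping marker name -> set of allele strings
    (an illustrative layout, not STRprofiler's internal format).
    """
    # Only markers typed in both profiles contribute to the scores.
    shared_markers = [m for m in query if query[m] and reference.get(m)]
    shared = sum(len(query[m] & reference[m]) for m in shared_markers)
    n_query = sum(len(query[m]) for m in shared_markers)
    n_ref = sum(len(reference[m]) for m in shared_markers)

    if n_query == 0 or n_ref == 0:
        tanabe = masters_query = masters_ref = 0.0
    else:
        tanabe = 100 * 2 * shared / (n_query + n_ref)
        masters_query = 100 * shared / n_query
        masters_ref = 100 * shared / n_ref

    return {
        "shared_markers": len(shared_markers),
        "shared_alleles": shared,
        "tanabe": tanabe,
        "masters_query": masters_query,
        "masters_ref": masters_ref,
    }


query = {"D5S818": {"11", "12"}, "TH01": {"6", "9.3"}, "TPOX": {"8"}}
reference = {"D5S818": {"11", "12"}, "TH01": {"6"}, "TPOX": {"8"}, "vWA": {"17"}}
scores = similarity_scores(query, reference)
```

In this example the query carries one allele (TH01 9.3) that the reference lacks, so the Masters (vs. reference) score is higher than the Masters (vs. query) score, and the Tanabe score falls between the two.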

When Cellosaurus Database (CLASTR) is selected as the search type, a report is generated with the following fields:

Output Field | Description
Accession | Cellosaurus cell line accession ID. Links are provided to each accession information page.
Name | Cell line name.
Score | Similarity score between the query and cell line sample. The reported score reflects the selected Similarity Score Filter.
Markers 1 … n | Marker alleles with mismatches highlighted.

The report includes only those cell lines whose score is greater than or equal to the user-defined Similarity Score Filter Threshold.


Batch Query Report

For batched samples, a summary report is generated.

Output Field | Description
Mixed Sample | Flag to indicate sample mixing. Sample mixing is determined by the “‘Mixed’ Sample Threshold” option. If the number of tri+ allelic markers exceeds the threshold, the sample is flagged as potentially mixed.
Top Match | Name and Tanabe score of top match to sample.
Next Best Match | Name and Tanabe score of next best match to sample.
Tanabe Matches | Name and Tanabe score of matches above scoring threshold to sample.
Master Query Matches | Name and Masters (vs. query) score of matches above scoring threshold to sample.
Master Ref Matches | Name and Masters (vs. reference) score of matches above scoring threshold to sample.

The report includes only those matches whose scores are greater than or equal to the user-defined Similarity Score Filter Thresholds.

When Cellosaurus Database (CLASTR) is selected as the search type, a report is generated in XLSX format and can be downloaded via the Download XLSX button. These results are not displayed in the app window; they must be downloaded.
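The Top Match, Next Best Match, and threshold-filtered match fields can be pictured as a ranking followed by a filter. The sketch below assumes Tanabe scores have already been computed for one query sample against each database sample; the function name, field keys, and sample names are hypothetical and do not reflect STRprofiler's actual output code.

```python
def summarize_matches(scores, threshold=80.0):
    """Summarize per-sample scores for one query.

    scores: dict mapping database sample name -> Tanabe score (percent).
    """
    # Rank database samples from highest to lowest score.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return {
        "top_match": ranked[0] if ranked else None,
        "next_best_match": ranked[1] if len(ranked) > 1 else None,
        # Keep only matches at or above the filter threshold.
        "tanabe_matches": [(name, s) for name, s in ranked if s >= threshold],
    }


summary = summarize_matches({"PDX-C": 61.0, "PDX-A": 95.0, "PDX-B": 82.5})
```

The same ranking and filtering would apply to the Masters (vs. query) and Masters (vs. reference) columns, each with its own threshold.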

Database File Management

Users can upload custom database files. The files must be in CSV format. A ‘Sample’ header must be present, but custom marker names may be used. Note that to score Amelogenin using the option provided, there must be an Amelogenin header in the uploaded file.
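As an illustration of the file requirements, the sketch below parses a small CSV and checks for the required headers. The example rows, the marker names, and the convention of listing multiple alleles as a comma-separated value within one cell are assumptions made for this example; `load_database` is a hypothetical helper, not part of STRprofiler.

```python
import csv
import io

# Hypothetical database file: one row per sample, one column per marker.
EXAMPLE_CSV = """Sample,Amelogenin,D5S818,TH01
Line_A,X,"11,12","6,9.3"
Line_B,"X,Y",12,7
"""


def load_database(handle):
    """Read a database CSV into {sample: {marker: set of alleles}}."""
    reader = csv.DictReader(handle)
    if reader.fieldnames is None or "Sample" not in reader.fieldnames:
        raise ValueError("database CSV must contain a 'Sample' header")
    database = {}
    for row in reader:
        sample = row.pop("Sample")
        database[sample] = {
            marker: {a.strip() for a in (value or "").split(",") if a.strip()}
            for marker, value in row.items()
            if marker is not None  # skip cells beyond the header row
        }
    # Amelogenin can only be scored if its column exists.
    return database, "Amelogenin" in reader.fieldnames


db, can_score_amelogenin = load_database(io.StringIO(EXAMPLE_CSV))
```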


Query Options

Sample Query Options

  • Amelogenin scoring is excluded by default but can be included by selecting the option.
  • ‘Mixed’ Sample Threshold: is the number of markers with more than 2 alleles (tri+ allelic) allowed before a sample is flagged for potential mixing. [default: 3]
  • Similarity Score Filter: is the similarity score used for result filtering. [default: Tanabe]
  • Similarity Score Filter Threshold: is the threshold to filter results. Only those samples with >= the threshold will appear in results. [default: 80]
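The ‘Mixed’ Sample Threshold check can be sketched as a simple count. This example assumes the "tri+ allelic" reading from the report field description (a marker counts when it has three or more alleles) and that a sample is flagged only when the count exceeds the threshold; the function name and profile layout are hypothetical.

```python
def is_potentially_mixed(profile, mixed_threshold=3):
    """Flag a profile when too many markers carry three or more alleles.

    profile: dict mapping marker name -> set of allele strings.
    """
    tri_plus = sum(1 for alleles in profile.values() if len(alleles) >= 3)
    return tri_plus > mixed_threshold


clean = {"TH01": {"6", "9.3"}, "TPOX": {"8"}, "vWA": {"17", "18"}}
suspect = {
    "TH01": {"6", "7", "9.3"},
    "TPOX": {"8", "9", "11"},
    "vWA": {"16", "17", "18"},
    "D5S818": {"10", "11", "12"},
}
flags = (is_potentially_mixed(clean), is_potentially_mixed(suspect))
```

With the default threshold of 3, the second profile is flagged because four of its markers are tri-allelic.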

Batch Query Specific

STRprofiler Database and Within File options:

  • Amelogenin scoring is excluded by default but can be included by selecting the option.
  • Tanabe Filter Threshold: is the Tanabe score threshold over which a sample is considered a match in batch and file queries. [default: 80]
  • Masters (vs. query) Filter Threshold: is the Masters (vs. query) score threshold over which a sample is considered a match in batch and file queries. [default: 80]
  • Masters (vs. reference) Filter Threshold: is the Masters (vs. reference) score threshold over which a sample is considered a match in batch and file queries. [default: 80]

Cellosaurus Database (CLASTR) options:

  • Similarity Score Filter: is the similarity score used for result filtering. [default: Tanabe]
  • Similarity Score Filter Threshold: is the threshold to filter results. Only those samples with >= the threshold will appear in results. [default: 80]

References

STRprofiler is provided under the MIT license. If you use this app in your research please cite:
Jared Andrews, Mike Lloyd, & Sam Culley. (2024). j-andrews7/strprofiler: v0.4.0. Zenodo. https://doi.org/10.5281/zenodo.10544686

Bairoch A. (2018) The Cellosaurus, a cell line knowledge resource. Journal of Biomolecular Techniques. 29:25-38. DOI: 10.7171/jbt.18-2902-002; PMID: 29805321

Robin T., Capes-Davis A., Bairoch A. (2019) CLASTR: the Cellosaurus STR Similarity Search Tool - A Precious Help for Cell Line Authentication. International Journal of Cancer. DOI: 10.1002/IJC.32639; PMID: 31444973