
Theory: Native Gaussian Entanglement

What is Protein Entanglement?

Protein entanglement refers to knots and other complex linking patterns in the three-dimensional backbone of a protein. Unlike simple geometric knots, these entanglements typically involve one part of the chain threading deeply through a loop formed by another part of the same chain. Understanding protein entanglement is crucial for:

  • Protein Folding: Entanglements may play a role in protein folding kinetics and stability
  • Functional Design: Some proteins use entanglement for mechanical strength or specialized functions
  • Evolution: Studying how entanglement patterns are conserved across species
  • Structure Prediction: Improving computational models of protein structure

Gaussian Entanglement Definition

Native Gaussian Entanglement (GE) is a method for detecting and quantifying protein entanglement based on the Gaussian linking number. For two closed curves, the Gaussian linking number is a topological invariant that counts how many times the curves wind around each other in 3D space.

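For each pair of residues (i, j), the method computes the linking number between two segments of the protein backbone:

$$Lk(i,j) = \frac{1}{4\pi} \int\!\!\int \frac{(\mathbf{r}_1 - \mathbf{r}_2) \cdot (\mathbf{r}_1' \times \mathbf{r}_2')}{\lvert \mathbf{r}_1 - \mathbf{r}_2 \rvert^{3}} \, ds_1 \, ds_2$$

Where:
  • r₁, r₂ are position vectors along the two chain segments, and r₁′ = dr₁/ds₁, r₂′ = dr₂/ds₂ are the corresponding tangent vectors
  • The integral is computed over the arc length of both segments
  • Because the segments are open curves, the result is a continuous (generally non-integer) measure of linking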

A clearly non-zero linking number between distant residues indicates the presence of entanglement, and its magnitude indicates how strongly the two segments wind around each other.
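The double integral can be approximated numerically by treating each chain segment as a polyline through consecutive Cα atoms and summing over pairs of bonds, evaluating each term at the bond midpoints. The Python sketch below is a minimal illustration of that midpoint discretization, not EntDetect's own implementation; the function name `gaussian_linking_number` and the use of Cα polylines are assumptions.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def gaussian_linking_number(curve1: Sequence[Vec], curve2: Sequence[Vec]) -> float:
    """Approximate the Gauss double integral for two polylines (hypothetical helper).

    Each bond (consecutive pair of points) is one integration element:
    its vector plays the role of r' ds and its midpoint the role of r.
    """
    total = 0.0
    for a0, a1 in zip(curve1, curve1[1:]):
        mid1 = ((a0[0] + a1[0]) / 2, (a0[1] + a1[1]) / 2, (a0[2] + a1[2]) / 2)
        d1 = _sub(a1, a0)
        for b0, b1 in zip(curve2, curve2[1:]):
            mid2 = ((b0[0] + b1[0]) / 2, (b0[1] + b1[1]) / 2, (b0[2] + b1[2]) / 2)
            d2 = _sub(b1, b0)
            r = _sub(mid1, mid2)  # r1 - r2
            dist = math.sqrt(_dot(r, r))
            total += _dot(r, _cross(d1, d2)) / dist ** 3
    return total / (4 * math.pi)
```

For two closed curves that link once, such as a pair of interlocking circles sampled finely, the sum approaches ±1 (the sign depends on curve orientation); for open protein segments it returns a real value, matching the continuous measure described above.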

High-Quality vs. Clustered Results

The analysis produces three levels of results:

  • Native_GE: all detected entanglements based on the raw Gaussian linking calculation. Use case: comprehensive analysis, research-grade data.
  • Native_HQ_GE: high-quality entanglements filtered by statistical significance and structural criteria. Use case: focus on reliable, biologically relevant entanglements.
  • Native_Clustered_HQ_GE: HQ results further clustered to remove redundancy and group similar entanglements. Use case: simplified interpretation, core entanglement regions.

Methods: EntDetect Analysis Pipeline

Analysis Steps

1. Structure Preprocessing

Input structures (PDB or AlphaFold) are first preprocessed to:

2. Gaussian Entanglement Detection

For all residue pairs (i,j) where i < j-window_size:
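As a sketch of this enumeration step, the Python function below lists every pair (i, j) that satisfies i < j - window_size and whose Cα atoms lie within a distance cutoff. The Cα-distance contact rule, the default `window_size` of 4, the 8 Å cutoff and the name `candidate_loops` are placeholder assumptions; the platform's actual contact definition may depend on the contact type options.

```python
import math
from typing import List, Sequence, Tuple


def candidate_loops(ca: Sequence[Tuple[float, float, float]],
                    window_size: int = 4,
                    cutoff: float = 8.0) -> List[Tuple[int, int]]:
    """List (i, j) with i < j - window_size whose C-alpha atoms are within `cutoff` angstroms.

    Indices are 0-based positions in `ca`; the distance-based contact rule is an assumption.
    """
    pairs = []
    for i in range(len(ca)):
        # i < j - window_size  <=>  j >= i + window_size + 1
        for j in range(i + window_size + 1, len(ca)):
            if math.dist(ca[i], ca[j]) <= cutoff:
                pairs.append((i, j))
    return pairs
```

Each returned pair would then be the starting point for the linking-number calculation between the loop it closes and the rest of the chain.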

3. High-Quality Filtering

Raw results are filtered based on:

4. Clustering

High-quality entanglements are clustered to group related findings:
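The clustering algorithm itself is not described here. As an illustration only, the Python sketch below groups entanglements by single-linkage clustering on their loop endpoints: two loops (i, j) join the same cluster when both endpoints differ by at most `cutoff` residues, and clusters merge transitively. The algorithm, the function name `cluster_entanglements` and the default cutoff of 3 residues are assumptions; on the platform, the clustering cutoff is set by the chosen organism profile.

```python
from typing import Dict, List, Tuple

Loop = Tuple[int, int]  # (i, j) residues closing an entangled loop


def cluster_entanglements(loops: List[Loop], cutoff: int = 3) -> List[List[Loop]]:
    """Single-linkage clustering of loops using a union-find structure."""
    parent = list(range(len(loops)))

    def find(x: int) -> int:
        # Follow parent pointers to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a in range(len(loops)):
        for b in range(a + 1, len(loops)):
            (i1, j1), (i2, j2) = loops[a], loops[b]
            if abs(i1 - i2) <= cutoff and abs(j1 - j2) <= cutoff:
                parent[find(a)] = find(b)  # merge the two clusters

    groups: Dict[int, List[Loop]] = {}
    for idx, loop in enumerate(loops):
        groups.setdefault(find(idx), []).append(loop)
    return list(groups.values())
```

Single linkage is chosen here for simplicity: a chain of near-identical loops ends up in one cluster even if its two ends differ by more than the cutoff.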

Contact Type Options

The analysis can use different types of inter-atomic contacts:

Entanglement Detection Methods

  • GLN (Gaussian Linking Number): classic Gaussian method. Sensitivity: standard.
  • TLN (Topological Linking Number): discrete, topology-based approach. Sensitivity: conservative.
  • Consensus (default): requires agreement between GLN and TLN. Specificity: high.
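The consensus rule can be pictured as a set intersection over the loops each method flags. The sketch below assumes each method's output has been reduced to a set of (i, j) residue pairs; that representation and the function name `compare_methods` are hypothetical. Reporting the method-specific leftovers alongside the consensus set shows what the stricter default discards.

```python
from typing import Dict, Iterable, Set, Tuple

Loop = Tuple[int, int]  # (i, j) residue indices closing a loop


def compare_methods(gln_hits: Iterable[Loop], tln_hits: Iterable[Loop]) -> Dict[str, Set[Loop]]:
    """Split entanglement calls into consensus and method-specific sets."""
    gln, tln = set(gln_hits), set(tln_hits)
    return {
        "consensus": gln & tln,  # flagged by both methods
        "gln_only": gln - tln,   # dropped under the consensus rule
        "tln_only": tln - gln,   # dropped under the consensus rule
    }
```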

Usage Guide

Single Structure Analysis

  1. Go to the Submit Analysis page
  2. Upload your PDB file or enter a PDB ID to download it from RCSB
  3. Select the structure type (Experimental or AlphaFold)
  4. Choose an organism profile, which sets the clustering cutoff
  5. Optionally enable feature generation (requires UniProt ID)
  6. Click Submit to start analysis

Batch Analysis

  1. Go to the Submit Analysis page
  2. Click the "Multiple Files" tab
  3. Select multiple PDB files at once
  4. Optionally set per-file UniProt accession IDs
  5. Configure shared analysis parameters
  6. Submit to process all files with batch tracking

Interpreting Results

Results include three main tables:

Key columns in results:

Entanglement Features

Feature Generation

When enabled (with a valid UniProt ID), the system generates detailed entanglement features, including:

  • Threading Metrics: N-terminal and C-terminal threading counts and positions
  • Crossing Counts: Number of strand crossings in different regions
  • Geometric Features: Loop sizes, threading depth, structural context
  • Sequential Analysis: Position-specific metrics and threading patterns
  • Protein Coverage: What percentage of the protein is involved in entanglement
  • Bond Classification: Identification of C-C bonds and backbone bonds in threading
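As an illustration of the protein-coverage feature, the sketch below computes the percentage of residues that fall inside at least one entangled region. It assumes regions are given as inclusive, 1-based (start, end) residue ranges and that overlapping regions are counted once; the platform's exact definition of "involved in entanglement" may differ, and `protein_coverage` is a hypothetical name.

```python
from typing import Iterable, Set, Tuple


def protein_coverage(regions: Iterable[Tuple[int, int]], n_residues: int) -> float:
    """Percentage of residues (1..n_residues) covered by at least one region."""
    covered: Set[int] = set()
    for start, end in regions:
        # Clip each inclusive range to the chain and record its residues once.
        covered.update(range(max(start, 1), min(end, n_residues) + 1))
    return 100.0 * len(covered) / n_residues
```

For example, regions (10, 19) and (15, 24) in a 100-residue protein together cover residues 10 to 24, giving 15.0%.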

UniProt Integration

Features are enriched with UniProt accession information to enable:

Frequently Asked Questions

What file formats are supported?

The platform supports PDB files (.pdb) and can download structures from RCSB PDB using PDB IDs. AlphaFold predictions should be provided in PDB format.
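For reference, a PDB-format file for a given PDB ID can be fetched from the RCSB file server at https://files.rcsb.org/download/<ID>.pdb. The sketch below builds that URL and downloads the file with the Python standard library; the function names are hypothetical, and the ID check only accepts classic four-character IDs. Very large entries are distributed only as PDBx/mmCIF, so a .pdb download may not exist for them.

```python
import re
import urllib.request

RCSB_DOWNLOAD = "https://files.rcsb.org/download/{}.pdb"


def rcsb_pdb_url(pdb_id: str) -> str:
    """Build the RCSB download URL for a classic 4-character PDB ID."""
    pdb_id = pdb_id.strip().upper()
    if not re.fullmatch(r"[0-9][A-Z0-9]{3}", pdb_id):
        raise ValueError(f"not a classic 4-character PDB ID: {pdb_id!r}")
    return RCSB_DOWNLOAD.format(pdb_id)


def download_pdb(pdb_id: str, path: str) -> str:
    """Save the PDB-format file to `path` (requires network access)."""
    urllib.request.urlretrieve(rcsb_pdb_url(pdb_id), path)
    return path
```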

How long does analysis take?

Analysis time depends on protein size:

What does the confidence score mean?

High-quality results (Native_HQ_GE) have been validated against multiple detection methods. Clustered results (Native_Clustered_HQ_GE) represent the most reliable and consolidated findings.

Can I analyze multi-chain structures?

Yes! Analysis is performed per chain, so multi-chain structures will show separate results for each chain. Each chain is treated as a separate intra-chain problem, so entanglements formed between different chains are not reported.
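Because analysis runs per chain, a multi-chain file first has to be split by chain identifier. The Python sketch below does this for Cα atoms using the fixed-column layout of PDB ATOM records (atom name in columns 13-16, alternate location in column 17, chain ID in column 22, coordinates in columns 31-54). The function name is hypothetical, and keeping only the first model, only alternate location A, and ignoring HETATM records are simplifying assumptions of this sketch, not necessarily what the platform does.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


def ca_coords_by_chain(pdb_text: str) -> Dict[str, List[Coord]]:
    """Group C-alpha coordinates by chain ID from PDB-format text."""
    chains: Dict[str, List[Coord]] = defaultdict(list)
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break  # keep only the first model of multi-model files
        if (line.startswith("ATOM")
                and line[12:16].strip() == "CA"
                and line[16] in (" ", "A")):  # skip alternate locations other than A
            chain_id = line[21]
            x, y, z = float(line[30:38]), float(line[38:46]), float(line[46:54])
            chains[chain_id].append((x, y, z))
    return dict(chains)
```

Each chain's coordinate list can then be analyzed independently, mirroring the per-chain results the platform reports.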

What does "No entanglements detected" mean?

Some proteins have no knots or complex entanglements. This is a valid biological finding. Many proteins fold successfully without topological complexity. The absence of entanglement doesn't indicate analysis failure.

How do I cite this work?

Please cite the EntDetect paper and reference this platform. For more information, contact the O'Brien Lab at Penn State University.

Can I download my results?

Yes! Once your analysis is complete, visit the Browse Results page and use the Download button to get a ZIP file containing all result files.