getpdb
Retrieve protein structure data from Protein Data Bank (PDB) database
Syntax
PDBStruct = getpdb(PDBid)
PDBStruct = getpdb(PDBid, ...'ToFile', ToFileValue, ...)
PDBStruct = getpdb(PDBid, ...'SequenceOnly', SequenceOnlyValue, ...)
PDBStruct = getpdb(PDBid, ...'TimeOut', TimeOutValue, ...)
Input Arguments
PDBid | Character vector or string specifying a unique identifier for a protein structure record in the PDB database. Note: Each structure in the PDB database is represented by a four-character alphanumeric identifier, for example, 5CYT. |
ToFileValue | Character vector or string specifying a file name or a path and file name for saving the PDB-formatted data. If you specify only a file name, that file will be saved in the MATLAB® Current Folder. |
SequenceOnlyValue | Controls the return of the protein sequence only. Choices are true or false (default). If there is one sequence, it is returned as a character array. If there are multiple sequences, they are returned as a cell array. |
TimeOutValue | Connection timeout in seconds, specified as a positive scalar. The default value is 5. |
Output Arguments
PDBStruct | MATLAB structure containing a field for each PDB record. |
Description
The Protein Data Bank (PDB) database is an archive of experimentally determined 3-D biological macromolecular structure data. getpdb retrieves protein structure data from this database.
PDBStruct = getpdb(PDBid) searches the PDB database for the protein structure record specified by the identifier PDBid and returns the MATLAB structure PDBStruct, which contains a field for each PDB record. The following table summarizes the possible PDB records and the corresponding fields in the MATLAB structure PDBStruct:
PDB Database Record | Field in the MATLAB Structure |
---|---|
HEADER | Header |
OBSLTE | Obsolete |
TITLE | Title |
CAVEAT | Caveat |
COMPND | Compound |
SOURCE | Source |
KEYWDS | Keywords |
EXPDTA | ExperimentData |
AUTHOR | Authors |
REVDAT | RevisionDate |
SPRSDE | Superseded |
JRNL | Journal |
REMARK 1 | Remark1 |
REMARK N (N equals 2 through 999) | Remarkn (n equals 2 through 999) |
DBREF | DBReferences |
SEQADV | SequenceConflicts |
SEQRES | Sequence |
FTNOTE | Footnote |
MODRES | ModifiedResidues |
HET | Heterogen |
HETNAM | HeterogenName |
HETSYN | HeterogenSynonym |
FORMUL | Formula |
HELIX | Helix |
SHEET | Sheet |
TURN | Turn |
SSBOND | SSBond |
LINK | Link |
HYDBND | HydrogenBond |
SLTBRG | SaltBridge |
CISPEP | CISPeptides |
SITE | Site |
CRYST1 | Cryst1 |
ORIGXn | OriginX |
SCALEn | Scale |
MTRIXn | Matrix |
TVECT | TranslationVector |
MODEL | Model |
ATOM | Atom |
SIGATM | AtomSD |
ANISOU | AnisotropicTemp |
SIGUIJ | AnisotropicTempSD |
TER | Terminal |
HETATM | HeterogenAtom |
CONECT | Connectivity |
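As a minimal sketch of how the record-to-field mapping in the table can be explored, the following lists the fields actually present in a returned structure. It assumes the Bioinformatics Toolbox is installed and an internet connection is available. The variable names are illustrative, and the `isfield` check is a defensive assumption rather than documented behavior, in case a record that is absent from an entry has no corresponding field.

```matlab
% Assumes Bioinformatics Toolbox and a working internet connection.
pdbstruct = getpdb('5CYT');

% List the top-level fields; each corresponds to a PDB record type
% from the table above (for example, Header, Title, Sequence, Model).
recordFields = fieldnames(pdbstruct);
disp(recordFields);

% Defensive access (an assumption): check that a field exists before
% reading it, in case the record is not part of this entry.
if isfield(pdbstruct, 'Title')
    disp(pdbstruct.Title);
end
```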
PDBStruct = getpdb(PDBid, ...'PropertyName', PropertyValue, ...) calls getpdb with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:
PDBStruct = getpdb(PDBid, ...'ToFile', ToFileValue, ...) saves the data returned from the database to a PDB-formatted file, ToFileValue.
PDBStruct = getpdb(PDBid, ...'SequenceOnly', SequenceOnlyValue, ...) controls the return of the protein sequence only. Choices are true or false (default). If there is one sequence, it is returned as a character array. If there are multiple sequences, they are returned as a cell array.
PDBStruct = getpdb(PDBid, ...'TimeOut', TimeOutValue, ...) sets the connection timeout (in seconds) to retrieve data from the PDB database.
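The property name/property value pairs can be combined in a single call. The sketch below requests only the sequence and raises the connection timeout. It assumes the Bioinformatics Toolbox and an internet connection, and the 10-second timeout is an arbitrary value chosen for illustration. Because the return type depends on how many sequences the record holds, the result is normalized to a cell array before it is used.

```matlab
% Assumes Bioinformatics Toolbox and a working internet connection.
% Return only the sequence data for record 5CYT, allowing up to
% 10 seconds for the connection (arbitrary illustrative value).
seq = getpdb('5CYT', 'SequenceOnly', true, 'TimeOut', 10);

% A single sequence is returned as a character array and multiple
% sequences as a cell array, so wrap the single case in a cell.
if ischar(seq)
    seq = {seq};
end

% Report the length of each returned sequence.
for k = 1:numel(seq)
    fprintf('Sequence %d has %d residues\n', k, numel(seq{k}));
end
```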
The Sequence Field
The Sequence field is also a structure containing sequence information in the following subfields:
NumOfResidues
ChainID
ResidueNames: Contains the three-letter codes for the sequence residues.
Sequence: Contains the single-letter codes for the sequence residues.
Note
If the sequence has modified residues, then the ResidueNames subfield might not correspond to the standard three-letter amino acid codes. In this case, the Sequence subfield will contain the modified residue code in the position corresponding to the modified residue. The modified residue code is provided in the ModifiedResidues field.
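The Sequence subfields described above might be read as in the following sketch. It assumes the Bioinformatics Toolbox and an internet connection, and it indexes Sequence as a structure array so that the same loop covers entries with one chain or several (an assumption about multi-chain records). The residue count is taken from the length of the single-letter Sequence subfield rather than from NumOfResidues.

```matlab
% Assumes Bioinformatics Toolbox and a working internet connection.
pdbstruct = getpdb('5CYT');
seqInfo = pdbstruct.Sequence;

% Loop over the chains; numel is 1 when there is a single chain.
for k = 1:numel(seqInfo)
    fprintf('Chain %s: %d single-letter codes\n', ...
        seqInfo(k).ChainID, numel(seqInfo(k).Sequence));
    disp(seqInfo(k).Sequence);      % single-letter codes
    disp(seqInfo(k).ResidueNames);  % three-letter codes
end
```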
The Model Field
The Model field is also a structure or an array of structures containing coordinate information. If the MATLAB structure contains one model, the Model field is a structure containing coordinate information for that model. If the MATLAB structure contains multiple models, the Model field is an array of structures containing coordinate information for each model. The Model field contains the following subfields:
Atom
AtomSD
AnisotropicTemp
AnisotropicTempSD
Terminal
HeterogenAtom
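Because the Model field can be either a single structure or a structure array, code that handles both cases can index it uniformly. The sketch below, which assumes the Bioinformatics Toolbox and an internet connection, counts the models in a record and the ATOM entries in each one. The variable names are illustrative.

```matlab
% Assumes Bioinformatics Toolbox and a working internet connection.
pdbstruct = getpdb('5CYT');

% numel works whether Model is one structure or a structure array.
numModels = numel(pdbstruct.Model);
fprintf('Record contains %d model(s)\n', numModels);

% Count the entries in the Atom subfield of each model.
for m = 1:numModels
    fprintf('Model %d: %d atoms\n', m, numel(pdbstruct.Model(m).Atom));
end
```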
The Atom Field
The Atom field is also an array of structures containing the following subfields:
AtomSerNo
AtomName
altLoc
resName
chainID
resSeq
iCode
X
Y
Z
occupancy
tempFactor
segID
element
charge
AtomNameStruct: Contains three subfields: chemSymbol, remoteInd, and branch.
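The Atom subfields can be gathered into ordinary arrays for numerical work. The following sketch assumes the Bioinformatics Toolbox, an internet connection, and that X, Y, and Z are stored as numeric scalars. It builds an N-by-3 coordinate matrix for the first model and selects the alpha-carbon atoms by name. Trimming AtomName before the comparison is a precaution against padded names, not documented behavior.

```matlab
% Assumes Bioinformatics Toolbox and a working internet connection.
pdbstruct = getpdb('5CYT');
atoms = pdbstruct.Model(1).Atom;

% Collect the coordinates into an N-by-3 matrix, one row per atom.
coords = [[atoms.X].', [atoms.Y].', [atoms.Z].'];

% Select alpha carbons by atom name (trimmed as a precaution).
isCA = strcmp(strtrim({atoms.AtomName}), 'CA');
caCoords = coords(isCA, :);

fprintf('%d atoms, %d of them alpha carbons\n', size(coords, 1), nnz(isCA));
```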
Examples
Retrieve the structure information for the electron transport (heme) protein that has a PDB identifier of 5CYT, read the information into a MATLAB structure pdbstruct, and save the information to a PDB-formatted file electron_transport.pdb in the MATLAB Current Folder.
pdbstruct = getpdb('5CYT', 'ToFile', 'electron_transport.pdb')
Version History
Introduced before R2006a
See Also
getembl | getgenbank | getgenpept | pdbdistplot | pdbread | pdbsuperpose | pdbtransform | pdbwrite