AffyStruct

MATLAB structure containing information from an Affymetrix data or library file for expression, genotyping (SNP), or resequencing assay types.

The following tables describe the fields in AffyStruct for the different Affymetrix file types.

EXP, DAT, CEL, CHP, CLF, BGP, CDF, and GIN Files

| Field | Description |
|---|---|
| Name | File name. |
| DataPath | Path and folder of the file. |
| LibPath | Path and folder of the CDF and GIN library files associated with the file you are reading. |
| FullPathName | Path and folder of the file. |
| ChipType | Name of the Affymetrix GeneChip array (for example, DrosGenome1 or HG-Focus). |
| Date or CreateDate | File creation date. |

EXP File

| Field | Description |
|---|---|
| ChipLot<br>Operator<br>SampleType<br>SampleDesc<br>Project<br>Comments<br>Reagents<br>ReagentLot<br>Protocol<br>Station<br>Module<br>HybridizeDate<br>ScanPixelSize<br>ScanFilter<br>ScanDate<br>ScannerID<br>NumberOfScans<br>ScannerType<br>NumProtocolSteps<br>ProtocolSteps | Information about experimental conditions and protocols captured by the Affymetrix software. |

DAT File

| Field | Description |
|---|---|
| NumPixelsPerRow | Number of pixels per row in the image created from the GeneChip array (number of columns). |
| NumRows | Number of rows in the image created from the GeneChip array. |
| MinData | Minimum intensity value in the image created from the GeneChip array. |
| MaxData | Maximum intensity value in the image created from the GeneChip array. |
| PixelSize | Size of one pixel in the image created from the GeneChip array. |
| CellMargin | Size of gaps between cells in the image created from the GeneChip array. |
| ScanSpeed | Speed of the scanner used to create the image. |
| ScanDate | Date the scan was performed. |
| ScannerID | Name of the scanning device used. |
| UpperLeftX<br>UpperLeftY<br>UpperRightX<br>UpperRightY<br>LowerLeftX<br>LowerLeftY<br>LowerRightX<br>LowerRightY | Pixel coordinates of the scanned image. |
| ServerName | Not used. |
| Image | NumRows-by-NumPixelsPerRow image of the scanned GeneChip array. |

CEL File

| Field | Description |
|---|---|
| FileVersion | Version of the CEL file format. |
| Algorithm | Algorithm used in the image-processing step that converts from DAT format to CEL format. |
| AlgParams | Character vector containing parameters used by the algorithm in the image-processing step. |
| NumAlgParams | Number of parameters in AlgParams. |
| CellMargin | Size of gaps between cells in the image created from the GeneChip array, used for computing the intensity values of the cells. |
| Rows | Number of rows of probes. |
| Cols | Number of columns of probes. |
| NumMasked | Number of masked probes, which are not used in subsequent processing. |
| NumOutliers | Number of cells identified as outliers (extremely high or extremely low intensity) by the image-processing step. |
| NumProbes | Number of probes (Rows * Cols) on the GeneChip array. |
| UpperLeftX<br>UpperLeftY<br>UpperRightX<br>UpperRightY<br>LowerLeftX<br>LowerLeftY<br>LowerRightX<br>LowerRightY | Pixel coordinates of the scanned image. |
| ProbeColumnNames | Cell array containing the eight column names in the Probes field:<br>PosX: x-coordinate of the cell<br>PosY: y-coordinate of the cell<br>Intensity: Intensity value of the cell<br>StdDev: Standard deviation of the intensity value<br>Pixels: Number of pixels in the cell<br>Outlier: True/false flag indicating if the cell was marked as an outlier<br>Masked: True/false flag indicating if the cell was masked<br>ProbeType: Integer indicating the probe type (for example, 1 = expression) |
| Probes | NumProbes-by-8 array of information about the individual probes, including intensity values. The ProbeColumnNames field contains the column names of this array. |
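
As a rough illustration of how ProbeColumnNames and Probes relate, the Python sketch below looks up columns by name rather than by hard-coded position. The two lists are made-up stand-ins for those fields, not data read from a real CEL file, and `column` is a hypothetical helper.

```python
# Made-up stand-ins for the ProbeColumnNames and Probes fields of a CEL structure.
probe_column_names = ["PosX", "PosY", "Intensity", "StdDev",
                      "Pixels", "Outlier", "Masked", "ProbeType"]
probes = [
    # PosX, PosY, Intensity, StdDev, Pixels, Outlier, Masked, ProbeType
    [0, 0, 120.0, 10.5, 16, 0, 0, 1],
    [1, 0, 5400.0, 310.2, 16, 1, 0, 1],
    [0, 1, 98.0, 8.1, 16, 0, 1, 1],
    [1, 1, 760.0, 40.0, 16, 0, 0, 1],
]


def column(name):
    """Return one named column of the probes table (hypothetical helper)."""
    idx = probe_column_names.index(name)
    return [row[idx] for row in probes]


# Keep the intensities of cells that are neither outliers nor masked.
usable = [
    intensity
    for intensity, outlier, masked in zip(
        column("Intensity"), column("Outlier"), column("Masked")
    )
    if not outlier and not masked
]
```

Looking the column up through the name list keeps the code independent of the column order, which is the purpose the ProbeColumnNames field appears to serve.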

CHP File

| Field | Description |
|---|---|
| AssayType | Type of assay associated with the GeneChip array (for example, Expression, Genotyping, or Resequencing). |
| CellFile | File name of the CEL file from which the CHP file was created. |
| Algorithm | Algorithm used to convert from CEL format to CHP format. |
| AlgVersion | Version of the algorithm used to create the CHP file. |
| NumAlgParams | Number of parameters in AlgParams. |
| AlgParams | Character vector containing parameters used in steps required to create the CHP file (for example, background correction). |
| NumChipSummary | Number of entries in ChipSummary. |
| ChipSummary | Summary information for the GeneChip array, including background average, standard deviation, max, and min. |
| BackgroundZones | Structure containing information about the zones used in the background adjustment step. |
| Rows | Number of rows of probes. |
| Cols | Number of columns of probes. |
| NumProbeSets | Number of probe sets on the GeneChip array. |
| NumQCProbeSets | Number of QC probe sets on the GeneChip array. |
| ProbeSets (Expression GeneChip array) | NumProbeSets-by-1 structure array containing information for each expression probe set, including the following fields:<br>Name: Name of the probe set.<br>ProbeSetType: Type of the probe set.<br>CompDataExists: True/false flag indicating if the probe set has additional computed information.<br>NumPairs: Number of probe pairs in the probe set.<br>NumPairsUsed: Number of probe pairs in the probe set used for calculating the probe set signal (not masked).<br>Signal: Summary intensity value for the probe set.<br>Detection: Indicator of statistically significant difference between the intensity value of the PM probes and the intensity value of the MM probes in a single probe set (Present, Absent, or Marginal).<br>DetectionPValue: P-value for the Detection indicator.<br>CommonPairs: When CompDataExists is true, contains the number of common pairs between the experiment and the baseline after the removal of outliers and masked probes.<br>SignalLogRatio: When CompDataExists is true, contains the change in signal between the experiment and baseline.<br>SignalLogRatioLow: When CompDataExists is true, contains the lowest ratios of probes between the experiment and the baseline.<br>SignalLogRatioHigh: When CompDataExists is true, contains the highest ratios of probes between the experiment and the baseline.<br>Change: When CompDataExists is true, describes how the probe changes versus a baseline experiment. Choices are Increase, Marginal Increase, No Change, Decrease, or Marginal Decrease.<br>ChangePValue: When CompDataExists is true, contains the p-value associated with Change. |
| ProbeSets (Genotyping GeneChip array) | NumProbeSets-by-1 structure array containing information for each genotyping probe set, including the following fields:<br>Name: Name of the probe set.<br>AlleleCall: Allele that is present for the probe set. Possibilities are AA (homozygous for the major allele), AB (heterozygous for the major and minor allele), BB (homozygous for the minor allele), or NoCall (unable to determine allele).<br>Confidence: Measure of the accuracy of the allele call.<br>RAS1: Relative Allele Signal 1 for the SNP site, which is calculated using sense probes.<br>RAS2: Relative Allele Signal 2 for the SNP site, which is calculated using antisense probes.<br>PValueAA: p-value for an AA call.<br>PValueAB: p-value for an AB call.<br>PValueBB: p-value for a BB call.<br>PValueNoCall: p-value for a NoCall call. |
| ProbeSets (Resequencing GeneChip array) | NumProbeSets-by-1 structure array containing information for each resequencing probe set, including the following fields:<br>CalledBases: 1-by-NumProbeSets character vector containing the bases called by the resequencing algorithm. Possible values are a, c, g, t, and n.<br>Scores: 1-by-NumProbeSets array containing the score associated with each base call. |
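
To make the expression ProbeSets layout more concrete, the following Python sketch tallies Detection calls and collects the Signal values of probe sets called Present. The list of dictionaries is a made-up stand-in for a structure array with the fields described above; the probe set names and numbers are invented for illustration.

```python
from collections import Counter

# Made-up stand-in for a few entries of an expression ProbeSets structure array.
probe_sets = [
    {"Name": "ps_1", "Signal": 812.4, "Detection": "Present", "DetectionPValue": 0.002},
    {"Name": "ps_2", "Signal": 35.1, "Detection": "Absent", "DetectionPValue": 0.410},
    {"Name": "ps_3", "Signal": 96.7, "Detection": "Marginal", "DetectionPValue": 0.055},
    {"Name": "ps_4", "Signal": 1503.9, "Detection": "Present", "DetectionPValue": 0.001},
]

# Count how many probe sets received each Detection call.
calls = Counter(ps["Detection"] for ps in probe_sets)

# Map probe set name to Signal for the probe sets called Present.
present_signals = {
    ps["Name"]: ps["Signal"] for ps in probe_sets if ps["Detection"] == "Present"
}
```

The same pattern would apply to a genotyping ProbeSets array by counting AlleleCall values instead of Detection values.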

CLF File

| Field | Description |
|---|---|
| LibSetName | Name of a collection of related library files for a given chip. There is only one LibSetName for a CLF file. For example, PGF and CLF files intended for use together must have the same LibSetName. |
| LibSetVersion | Version of a collection of related library files for a given chip. There is only one LibSetVersion for a CLF file. For example, PGF and CLF files intended for use together must have the same LibSetVersion. |
| GUID | Unique identifier for the CLF file. |
| CLFFormatVersion | Version of the CLF file format. |
| Rows | Number of rows in the CEL file. Note: The CLF file is 1-based, which means the first row and column are designated 1,1, not 0,0. |
| Cols | Number of columns in the CEL file. Note: The CLF file is 1-based, which means the first row and column are designated 1,1, not 0,0. |
| StartID | Starting number for the numbering of elements in the CLF file. Tip: This information is useful when numbering does not start with 1. |
| EndID | Ending number for the numbering of elements in the CLF file. Tip: This information is useful when numbering does not start with 1 or when there are gaps in the numbering. |
| Order | Order in which the probe IDs are numbered in the CEL file, either 'row_major' or 'col_major'. |
| DataColNames | Names of the columns in the CEL file that contain data. |
| Data | If the numbering of elements in the CLF file is sequential, this field contains a function handle that calculates the x- and y-coordinates of each element in the file from the probe ID. If the numbering of elements in the CLF file is not sequential, this field contains a matrix indicating the number value of each element in the file. |
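
For sequential numbering, the mapping from probe ID to coordinates can be sketched as below. This is an assumption-laden illustration in Python, not the function handle stored in Data: the table does not give the exact formula, so the sketch assumes that IDs run consecutively from StartID, that 'row_major' means the x-coordinate varies fastest, and that coordinates are 1-based as the note on Rows and Cols suggests. The function name `id_to_xy` and its `base` parameter are hypothetical.

```python
def id_to_xy(probe_id, rows, cols, order="row_major", start_id=1, base=1):
    """Map a sequential probe ID to (x, y) under the assumptions stated above."""
    offset = probe_id - start_id
    if not 0 <= offset < rows * cols:
        raise ValueError("probe ID is outside the array")
    if order == "row_major":
        # Fill one row completely before moving to the next row.
        y, x = divmod(offset, cols)
    elif order == "col_major":
        # Fill one column completely before moving to the next column.
        x, y = divmod(offset, rows)
    else:
        raise ValueError("order must be 'row_major' or 'col_major'")
    return x + base, y + base


# A 3-row by 4-column array, IDs starting at 1.
first = id_to_xy(1, rows=3, cols=4)
fifth_row_major = id_to_xy(5, rows=3, cols=4, order="row_major")
fifth_col_major = id_to_xy(5, rows=3, cols=4, order="col_major")
```

Passing `base=0` would give 0-based coordinates instead, which may be the more convenient convention when indexing into other data.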

BGP File

| Field | Description |
|---|---|
| LibSetName | Name of a collection of related library files for a given chip. There is only one LibSetName for a BGP file. |
| LibSetVersion | Version of a collection of related library files for a given chip. There is only one LibSetVersion for a BGP file. |
| GUID | Unique identifier for a BGP file. |
| ExecGUID<br>ExecVersion<br>Cmd | Information about the algorithm used to generate the BGP file. |
| Data | Structure containing the following fields:<br>probe_id: ID of the probe to use for background correction.<br>probeset_id: ID of the probe set in the PGF file to which the probe belongs.<br>type: Classification information for the probe.<br>gc_count: Combined number of G and C bases in the probe.<br>probe_length: Length of the probe in base pairs.<br>interrogation_position: Interrogation position of the probe. It is typically 13 for 25-mer PM/MM probes.<br>probe_sequence: Sequence of the probe on the array, going in the direction from array surface to solution. For most standard Affymetrix arrays, this direction is from 3' to 5'. For example, for a sense target (st) probe (see the probe_type field), complement the sequence in this field before looking for matches to transcript sequences. For an antisense target (at), reverse this sequence.<br>atom_id: ID of the atom to which the probe belongs.<br>x: Column coordinate of the probe in the CEL file.<br>y: Row coordinate of the probe in the CEL file.<br>probeset_type: Classification information for the probe set, such as control, affx, or spike. This type information can include multiple classifications and can also be nested.<br>probe_type: Classification information for the probe, such as pm (perfect match), mm (mismatch), st (sense target), or at (antisense target). This type information can include multiple classifications and can also be nested. |
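
The probe_sequence rule above (complement for sense target probes, reverse for antisense target probes) and the gc_count definition can be sketched in Python as follows. The helper names and the example sequence are hypothetical, and the sketch follows the rule exactly as the table states it rather than any particular library implementation.

```python
# Translation table that swaps each base for its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_count(sequence):
    """Count the G and C bases in a probe sequence."""
    upper = sequence.upper()
    return upper.count("G") + upper.count("C")


def sequence_for_matching(probe_sequence, target):
    """Prepare a probe sequence for comparison with transcript sequences.

    target is 'st' (sense target) or 'at' (antisense target), as in probe_type.
    """
    if target == "st":
        # Sense target: complement the sequence, keeping its order.
        return probe_sequence.translate(COMPLEMENT)
    if target == "at":
        # Antisense target: reverse the sequence, keeping its bases.
        return probe_sequence[::-1]
    raise ValueError("target must be 'st' or 'at'")


example = "ACGTT"  # invented sequence, far shorter than a real 25-mer probe
as_sense = sequence_for_matching(example, "st")
as_antisense = sequence_for_matching(example, "at")
```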

CDF File

| Field | Description |
|---|---|
| Rows | Number of rows of probes. |
| Cols | Number of columns of probes. |
| NumProbeSets | Number of probe sets on the GeneChip array. |
| NumQCProbeSets | Number of QC probe sets on the GeneChip array. |
| ProbeSetColumnNames | Cell array containing the six column names in the ProbePairs field in the ProbeSets array:<br>GroupNumber: Number identifying the group to which the probe pair belongs. For expression arrays, this value is always 1. For genotyping arrays, this value is typically 1 (allele A, sense), 2 (allele B, sense), 3 (allele A, antisense), or 4 (allele B, antisense).<br>Direction: Number identifying the direction of the probe pair. 1 = sense and 2 = antisense.<br>PMPosX: x-coordinate of the perfect match probe.<br>PMPosY: y-coordinate of the perfect match probe.<br>MMPosX: x-coordinate of the mismatch probe.<br>MMPosY: y-coordinate of the mismatch probe. |
| ProbeSets | NumProbeSets-by-1 structure array containing information for each probe set, including the following fields:<br>Name: Name of the probe set.<br>ProbeSetType: Type of the probe set.<br>CompDataExists: True/false flag indicating if the probe set has additional computed information.<br>NumPairs: Number of probe pairs in the probe set.<br>NumQCProbes: Number of QC probes in the probe set.<br>QCType: Type of QC probes.<br>GroupNames: Name of the group to which the probe set belongs. For expression arrays, this field contains the name of the probe set. For genotyping arrays, this field contains the name of the alleles, for example {'A' 'C' 'A' 'C'}'.<br>ProbePairs: NumPairs-by-6 array of information about the probe pairs. The column names of this array are contained in the ProbeSetColumnNames field. |
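
One common use of the ProbePairs coordinates is to look up the matching PM and MM intensities from a CEL file. The Python sketch below shows the idea with invented data: `probe_pairs` stands in for one probe set's ProbePairs array, and `intensity_at` is a hypothetical lookup that could be built from the PosX, PosY, and Intensity columns of the CEL Probes field. Joining on the coordinate pair avoids assuming any particular row order in Probes.

```python
# Column names as listed for the ProbeSetColumnNames field.
probe_set_column_names = ["GroupNumber", "Direction",
                          "PMPosX", "PMPosY", "MMPosX", "MMPosY"]

# Invented ProbePairs array for one probe set (two probe pairs).
probe_pairs = [
    [1, 1, 0, 0, 0, 1],
    [1, 1, 1, 0, 1, 1],
]

# Hypothetical (PosX, PosY) -> Intensity lookup built from CEL data.
intensity_at = {(0, 0): 500.0, (0, 1): 120.0, (1, 0): 830.0, (1, 1): 95.0}

# Resolve column positions by name instead of hard-coding them.
pm_x = probe_set_column_names.index("PMPosX")
pm_y = probe_set_column_names.index("PMPosY")
mm_x = probe_set_column_names.index("MMPosX")
mm_y = probe_set_column_names.index("MMPosY")

pm = [intensity_at[(row[pm_x], row[pm_y])] for row in probe_pairs]
mm = [intensity_at[(row[mm_x], row[mm_y])] for row in probe_pairs]

# PM minus MM difference for each probe pair.
differences = [p - m for p, m in zip(pm, mm)]
```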

GIN File

| Field | Description |
|---|---|
| Version | GIN file format version. |
| ProbeSetName | Probe set ID/name. |
| ID | Identifier for the probe set (gene ID). |
| Description | Description of the probe set. |
| SourceNames | Source or sources of the probe sets. |
| SourceURL | Source URL or URLs for the probe sets. |
| SourceID | Vector of numbers specifying which SourceNames or SourceURL each probe set is associated with. |
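
The relationship between SourceID and SourceNames can be pictured as an index lookup. In the Python sketch below all values are invented, and the SourceID entries are assumed to be 1-based positions into SourceNames, as is conventional for MATLAB indices; the table itself does not state the base.

```python
# Invented stand-ins for the GIN fields described above.
source_names = ["GenBank", "RefSeq"]
probe_set_names = ["ps_1", "ps_2", "ps_3"]
source_id = [1, 2, 1]  # assumed to be 1-based positions into source_names

# Resolve each probe set to the name of its source.
source_of = {
    name: source_names[index - 1]
    for name, index in zip(probe_set_names, source_id)
}
```

The same lookup would apply to SourceURL, since SourceID is described as referring to either field.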