matlab.io.datastore.BlockedFileSet
Blocked file-set for collection of blocks within file
Description
The matlab.io.datastore.BlockedFileSet object helps you
process a large collection of blocks within files when moving through the files iteratively.
Use the BlockedFileSet object together with the DsFileReader object
to manage and read files from your datastore.
Creation
Syntax
Description
bs = matlab.io.datastore.BlockedFileSet(location) creates a BlockedFileSet object for a collection of blocks within files
based on the specified location.
bs = matlab.io.datastore.BlockedFileSet(location,Name,Value) specifies the file extensions, whether to include subfolders, or object properties
using one or more name-value pairs. Enclose names in quotes.
Input Arguments
location
Files or folders to include in the BlockedFileSet object,
specified as a character vector, cell array of character vectors, string array, or a
structure. If the files are not in the current folder, then
location must be a full or relative path. Files within subfolders
of the specified folder are not automatically included in the
BlockedFileSet object.
Typically for a Hadoop® workflow, when you specify location as a
structure, it must contain the fields FileName,
Offset, and Size. This requirement enables you
to use the location argument directly with the initializeDatastore method of the matlab.io.datastore.HadoopLocationBased class. For an example, see Add Support for Hadoop.
You can use the wildcard character (*) when specifying
location. Specifying this character includes all matching files or
all files in the matching folders in the file-set object.
If the files are not available locally, then the full path of the files or folders
must be a uniform resource locator (URL), such as:
hdfs://hostname:portnumber/path_to_file
Data Types: char | cell | string | struct
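As a small illustration of the wildcard form of location, the following sketch builds a blocked file-set from every MAT-file in one folder. The folder is an assumption: it is the MATLAB demos folder that appears in the file paths of the example output on this page, located here through matlabroot. Substitute any folder that holds your own data.

```matlab
% Assumption: the demos folder under matlabroot contains MAT-files
% (it is the folder shown in the example output on this page).
demoFolder = fullfile(matlabroot,'toolbox','matlab','demos');

% The wildcard (*) selects every file in the folder whose name ends in .mat.
bsWild = matlab.io.datastore.BlockedFileSet(fullfile(demoFolder,'*.mat'));

% Inspect the first block to confirm which file was picked up.
firstBlk = bsWild.BlockInfo(1);
disp(firstBlk.Filename)
```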
Name-Value Arguments
Specify optional pairs of arguments as
Name1=Value1,...,NameN=ValueN, where Name is
the argument name and Value is the corresponding value.
Name-value arguments must appear after other arguments, but the order of the
pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose
Name in quotes.
Example: bs = matlab.io.datastore.BlockedFileSet(location,'IncludeSubfolders',true)
FileExtensions
File extensions, specified as a character vector, cell array of character
vectors, or string array. You can use the empty quotes '' to
represent files without extensions.
If 'FileExtensions' is not specified, then
BlockedFileSet automatically includes all file
extensions.
Example: 'FileExtensions','.jpg'
Example: 'FileExtensions',{'.txt','.csv'}
IncludeSubfolders
Subfolder inclusion flag, specified as a numeric or logical 1
(true) or 0 (false).
Specify true to include all files and subfolders within each
folder or false to include only the files within each
folder.
Example: 'IncludeSubfolders',true
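The two name-value arguments can be combined. The sketch below is illustrative only: it assumes the MATLAB demos folder under matlabroot as the data location and compares the number of blocks found with and without subfolders. The variable names are hypothetical.

```matlab
% Assumption: the demos folder under matlabroot is used as sample data.
demoFolder = fullfile(matlabroot,'toolbox','matlab','demos');

% Only MAT-files directly inside the folder.
bsTop = matlab.io.datastore.BlockedFileSet(demoFolder, ...
    'FileExtensions','.mat','IncludeSubfolders',false);

% MAT-files in the folder and in all of its subfolders.
bsAll = matlab.io.datastore.BlockedFileSet(demoFolder, ...
    'FileExtensions','.mat','IncludeSubfolders',true);

fprintf('Top level: %d blocks, with subfolders: %d blocks\n', ...
    bsTop.NumBlocks, bsAll.NumBlocks);
```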
Properties
BlockSize
Block size in bytes used to split file information, specified as one of these values:
'file': Use the size of the next file in the collection.
numeric scalar: Use the specified value in bytes.
Example: 'BlockSize',2000
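To see what a numeric block size implies, the arithmetic can be sketched without touching any files. This is a back-of-the-envelope model, not the object's internal implementation: it assumes a file is split into consecutive ranges of BlockSize bytes with a shorter final block. The file size is the one reported for earth.mat in the example output on this page.

```matlab
fileSize  = 32522;   % bytes, size of earth.mat as reported on this page
blockSize = 2000;    % bytes, the 'BlockSize' value

offsets    = 0:blockSize:fileSize-1;              % starting byte of each block
blockSizes = min(blockSize, fileSize - offsets);  % the last block holds the remainder
numBlocks  = numel(offsets);                      % equals ceil(fileSize/blockSize)

fprintf('%d blocks; last block starts at byte %d and holds %d bytes\n', ...
    numBlocks, offsets(end), blockSizes(end));
```

Under this model the final block starts at offset 32000 and holds 522 bytes, which agrees with the last block displayed in the Examples section.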
AlternateFileSystemRoots
Alternate file system root paths, specified as a string array or a cell array. Use
'AlternateFileSystemRoots' when you create a datastore on a local
machine but need to access and process the data on another machine (possibly running a
different operating system). Also, when you process data using Parallel Computing Toolbox™ and MATLAB®
Parallel Server™, and the data is stored on your local machines with a copy
available on cloud or cluster machines of a different platform, you must use
'AlternateFileSystemRoots' to associate the root paths.
To associate a set of root paths that are equivalent to one another, specify
'AlternateFileSystemRoots' as a string array. For example:
["Z:\datasets","/mynetwork/datasets"]
To associate multiple sets of root paths that are equivalent for the datastore, specify
'AlternateFileSystemRoots' as a cell array containing multiple rows where each row represents a set of equivalent root paths. Specify each row in the cell array as either a string array or a cell array of character vectors. For example:
Specify 'AlternateFileSystemRoots' as a cell array of string arrays.
{["Z:\datasets", "/mynetwork/datasets"];...
 ["Y:\datasets", "/mynetwork2/datasets","S:\datasets"]}
Alternatively, specify 'AlternateFileSystemRoots' as a cell array of cell array of character vectors.
{{'Z:\datasets','/mynetwork/datasets'};...
 {'Y:\datasets', '/mynetwork2/datasets','S:\datasets'}}
The value of 'AlternateFileSystemRoots' must satisfy these
conditions:
Contains one or more rows, where each row specifies a set of equivalent root paths.
Each row specifies multiple root paths and each root path must contain at least two characters.
Root paths are unique and are not subfolders of one another.
Contains at least one root path entry that points to the location of the files.
For more information, see Set Up Datastore for Processing on Different Machines or Clusters.
Example: ["Z:\datasets","/mynetwork/datasets"]
Data Types: string | cell
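Some of the conditions listed above can be checked informally before the value is passed to the constructor. The following is a hypothetical pre-check written for illustration, not the validation that BlockedFileSet itself performs. It covers only the row count, the number of paths per row, the minimum length, and uniqueness. It does not test the subfolder condition or whether any root actually points to the files.

```matlab
% Candidate value: a cell array of string arrays, one row per set of
% equivalent root paths (taken from the example on this page).
roots = {["Z:\datasets","/mynetwork/datasets"]; ...
         ["Y:\datasets","/mynetwork2/datasets","S:\datasets"]};

allPaths = [roots{:}];   % flatten every row into one string array

hasRows        = ~isempty(roots);                              % one or more rows
multiplePerRow = all(cellfun(@numel, roots) >= 2);             % several roots per row
longEnough     = all(strlength(allPaths) >= 2);                % at least two characters
allUnique      = numel(unique(allPaths)) == numel(allPaths);   % no repeated root

isPlausible = hasRows && multiplePerRow && longEnough && allUnique;
disp(isPlausible)
```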
NumBlocks
This property is read-only.
Number of blocks in the blocked file-set object, specified as a numeric scalar.
Example: bs.NumBlocks
Data Types: double
NumBlocksRead
This property is read-only.
Number of blocks read from the BlockedFileSet object, specified
as a numeric scalar.
Example: bs.NumBlocksRead
Data Types: double
BlockInfo
This property is read-only.
Information about blocks in the
matlab.io.datastore.BlockedFileSet object, returned as a
matlab.io.datastore.BlockedInfo object with these properties:
Filename: Name of the file in the BlockedFileSet object. The name contains the full path of the file.
FileSize: Size of the file in bytes.
Offset: Starting offset within the file to be read.
BlockSize: Size of the block in bytes.
For information about a specific block, specify the block index. For example,
bs.BlockInfo(2) returns information for the second block. If you
call bs.BlockInfo specifying (:) or without
specifying an index, it returns information for all of the blocks.
Example: bs.BlockInfo(2)
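A short sketch of indexing into BlockInfo follows. It assumes the sample MAT-files accidents.mat and census.mat, which are used in the Examples section of this page, are on the MATLAB path. The variable names are illustrative.

```matlab
% Assumption: accidents.mat and census.mat are on the MATLAB path.
files = {'accidents.mat','census.mat'};
bs = matlab.io.datastore.BlockedFileSet(files,'BlockSize',2000);

% Information for the second block only.
second = bs.BlockInfo(2);
fprintf('%s: offset %d, block size %d, file size %d bytes\n', ...
    second.Filename, second.Offset, second.BlockSize, second.FileSize);

% Information for all blocks at once.
allBlocks = bs.BlockInfo;
```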
Object Functions
hasPreviousBlock | Determine if blocked file-set has previous block |
previousblock | Information on previous block in blocked file-set |
hasNextBlock | Determine if blocked file-set has another block |
nextblock | Information on next block in blocked file-set |
progress | Determine how many blocks or files have been read |
maxpartitions | Maximum number of partitions |
partition | Partition file-set object |
subset | Create subset of datastore or FileSet |
reset | Reset the file-set object |
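The functions above are typically combined in a loop. The sketch below walks through every block once, resets the object, and then takes one partition. It assumes that accidents.mat and census.mat are on the MATLAB path, and that partition accepts the form partition(bs,n,index) used by other datastore partition functions. Check the partition reference page before relying on that signature.

```matlab
% Assumption: accidents.mat and census.mat are on the MATLAB path.
files = {'accidents.mat','census.mat'};
bs = matlab.io.datastore.BlockedFileSet(files,'BlockSize',2000);

% Visit each block in order and add up the bytes covered.
visited = 0;
totalBytes = 0;
while hasNextBlock(bs)
    blk = nextblock(bs);
    visited = visited + 1;
    totalBytes = totalBytes + blk.BlockSize;
end
fprintf('Visited %d of %d blocks (%d bytes in total)\n', ...
    visited, bs.NumBlocks, totalBytes);

% Return to the state where no blocks have been read.
reset(bs);

% Assumption: partition(bs,n,index) returns the index-th of n parts.
n = min(2, maxpartitions(bs));
part1 = partition(bs, n, 1);
fprintf('Partition 1 of %d holds %d blocks\n', n, part1.NumBlocks);
```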
Examples
Create a blocked file-set and query information for specific blocks in the blocked file-set.
Create a blocked file-set bs for a collection of files and specify the block size.
folder = {'accidents.mat','airlineResults.mat','census.mat','earth.mat'}
folder = 1×4 cell
    {'accidents.mat'}    {'airlineResults.mat'}    {'census.mat'}    {'earth.mat'}
bs = matlab.io.datastore.BlockedFileSet(folder,'BlockSize',2000)
bs =
  BlockedFileSet with properties:
                   NumBlocks: 98
               NumBlocksRead: 0
                   BlockSize: 2000
                   BlockInfo: Show BlockInfo for all 98 blocks
    AlternateFileSystemRoots: {}
Obtain information for specific blocks either by using the nextblock function or by querying the BlockInfo property with an index. Use nextblock to obtain information for consecutive blocks. For example, obtain information for the first two blocks in the set.
blk1 = nextblock(bs)
blk1 =
1×1 BlockInfo
Filename FileSize Offset BlockSize
_________________________________________________________________________________________________________________ ________ ______ _________
"/mathworks/devel/bat/filer/batfs2566-0/Bdoc25b.2988451/build/runnable/matlab/toolbox/matlab/demos/accidents.mat" 7343 0 2000
blk2 = nextblock(bs)
blk2 =
1×1 BlockInfo
Filename FileSize Offset BlockSize
_________________________________________________________________________________________________________________ ________ ______ _________
"/mathworks/devel/bat/filer/batfs2566-0/Bdoc25b.2988451/build/runnable/matlab/toolbox/matlab/demos/accidents.mat" 7343 2000 2000
Query the BlockInfo property to get information about the last block in the set.
lastblk = bs.BlockInfo(98)
lastblk =
1×1 BlockInfo
Filename FileSize Offset BlockSize
_____________________________________________________________________________________________________________ ________ ______ _________
"/mathworks/devel/bat/filer/batfs2566-0/Bdoc25b.2988451/build/runnable/matlab/toolbox/matlab/demos/earth.mat" 32522 32000 522
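Block information describes where the bytes are but does not read them. As the Description notes, the DsFileReader object does the reading. The following sketch reads the raw bytes of the first block. It assumes accidents.mat is on the MATLAB path, and it assumes the DsFileReader seek function accepts an 'Origin' option with the value 'start-of-file' and that read returns the requested number of bytes. Treat those details as assumptions to confirm against the DsFileReader reference page.

```matlab
% Assumption: accidents.mat is on the MATLAB path.
bs = matlab.io.datastore.BlockedFileSet({'accidents.mat'},'BlockSize',2000);
blk = nextblock(bs);

% Open the file that the block belongs to.
fr = matlab.io.datastore.DsFileReader(char(blk.Filename));

% Move to the start of the block, measured from the beginning of the file.
% Assumption: 'Origin','start-of-file' is a supported option of seek.
seek(fr, blk.Offset, 'Origin', 'start-of-file');

% Read exactly the number of bytes that the block covers.
bytes = read(fr, blk.BlockSize);
fprintf('Read %d bytes starting at offset %d\n', numel(bytes), blk.Offset);
```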
Version History
Introduced in R2020a