Contenuto principale

Explore and Import Data from Cassandra Database Table

This example shows how to explore the structure of an Apache™ Cassandra® database and import data from a Cassandra database table into MATLAB® using a Cassandra database connection with the Apache Cassandra database C++ interface. The Cassandra database stores database tables according to the partition key. The partition key affects how the data is filtered in the database.

In this example, the Cassandra database contains the employees_by_job database table with employee data and the job_id partition key.

Create Cassandra Database Connection

Create a Cassandra database connection using the configured data source CassandraDataSource and a blank user name and password. The apacheCassandra function returns conn as a connection object.

datasource = "CassandraDataSource";
username = "";
password = "";
conn = apacheCassandra(datasource,username,password);

Explore Cassandra Database Structure

View the keyspaces in the Cassandra database using the Cassandra database connection. The returned string array contains the keyspaces. A keyspace contains one or more database tables and defines how the database replicates the data in the tables.

conn.Keyspaces
ans = 6×1 string array
    "employeedata"
    "system"
    "system_auth"
    "system_distributed"
    "system_schema"
    "system_traces"

Return the names of the Cassandra database tables in the employeedata keyspace. t is a string array that contains the names of the database tables in the employeedata keyspace.

keyspace = "employeedata";
t = tablenames(conn,keyspace)
t = 3×1 string array
    "employees_by_id"
    "employees_by_job"
    "employees_by_name"

Return the names of the Cassandra database columns in the employees_by_job database table.

tablename = "employees_by_job";
cols = columninfo(conn,keyspace,tablename);

Display the first few names of the Cassandra database columns in the employees_by_job database table.

head(cols)
ans=8×4 table
            "job_id"      "text"    1        ""
         "hire_date"      "date"    0    "DESC"
       "employee_id"       "int"    0     "ASC"
    "commission_pct"    "double"    0        ""
     "department_id"       "int"    0        ""
             "email"      "text"    0        ""
        "first_name"      "text"    0        ""
         "last_name"      "text"    0        ""

cols is a table with these variables:

  • Name — Cassandra database column name

  • DataType — Cassandra Query Language (CQL) data type of the Cassandra database column

  • PartitionKey — Partition key indicator

  • ClusteringColumn — Clustering column indicator

The value in the PartitionKey variable indicates whether the database column is a partition key. The column job_id (job identifier) is a partition key in this database table.

Import Data from Cassandra Database

Import data from the employees_by_job database table into MATLAB. This database has data about shop clerks, so use the partition key value SH_CLERK.

keyValue = "SH_CLERK";
results = partitionRead(conn,keyspace,tablename, ...
    keyValue);

Display the first few rows of the returned employee data.

head(results)
ans=8×13 table
    "SH_CLERK"    03-Feb-2008    183    NaN    50    "GGEONI"    "Girard"    "Geoni"    120    1×2 table    2    "650.507.9879"    2800
    "SH_CLERK"    13-Jan-2008    199    NaN    50    "DGRANT"    "Douglas"    "Grant"    124    1×2 table    [9;4;5]    "650.507.9844"    2600
    "SH_CLERK"    19-Dec-2007    191    NaN    50    "RPERKINS"    "Randall"    "Perkins"    122    1×2 table    2    "650.505.4876"    2500
    "SH_CLERK"    21-Jun-2007    182    NaN    50    "MSULLIVA"    "Martha"    "Sullivan"    120    1×2 table    [1;5]    "650.507.9878"    2500
    "SH_CLERK"    21-Jun-2007    198    NaN    50    "DOCONNEL"    "Donald"    "OConnell"    124    1×2 table    [7;1;9]    "650.507.9833"    2600
    "SH_CLERK"    17-Mar-2007    195    NaN    50    "VJONES"    "Vance"    "Jones"    123    1×2 table    [8;6;2;4;2]    "650.501.4876"    2800
    "SH_CLERK"    07-Feb-2007    187    NaN    50    "ACABRIO"    "Anthony"    "Cabrio"    121    1×2 table    [6;5;2]    "650.509.4876"    3000
    "SH_CLERK"    11-Jul-2006    190    NaN    50    "TGATES"    "Timothy"    "Gates"    122    1×2 table    [3;10;1]    "650.505.3876"    2900

results is a table that contains these variables:

  • job_id — Job identifier

  • hire_date — Hire date

  • employee_id — Employee identifier

  • commission_pct — Commission percentage

  • department_id — Department identifier

  • email — Email address

  • first_name — First name

  • last_name — Last name

  • manager_id — Manager identifier

  • office — Office location (table that contains two variables for the building and room)

  • performance_ratings — Performance ratings

  • phone_number — Phone number

  • salary — Salary

Close Cassandra Database Connection

close(conn)

See Also

| | | |

Topics

External Websites