User's Guide - CDF Files

B.3 CDF Files

CDF (Common Data Format) is a data abstraction for self-describing multidimensional Arrays. It represents a simpler data model than that of Data Explorer, one similar to that of the Array Object. Data in a CDF are accessed through an application programming interface, available as C and FORTRAN libraries from the National Space Science Data Center, NASA/Goddard Space Flight Center, Greenbelt, MD. Data in a CDF may be stored in a number of physical formats (e.g., native or portable binary, single or multiple files, row or column majority), but the interface is the same in every case. Hence, data in a CDF written in a format "foreign" to the workstation on which Data Explorer is running are converted automatically during the Import process.

Data Explorer supports importing Fields stored as CDF r-variables. To import data from a CDF, specify the CDF name (not a file name, since the CDF may be in multiple-file format) as the name parameter in the Import Configuration dialog box. If the CDF has more than one variable, which is typical, Data Explorer categorizes each variable as positions, series, or data, as appropriate. Variables that vary in only one dimension and are not record-variant are considered positions and become the positions component of a Field Object. In many cases, these variables have the CDF variable mnemonics LATITUDE and LONGITUD, which, if they exist, are mapped to the first (x) and second (y) components of the positions vector. This mapping permits direct use of these data with the publicly available cartographic and other earth and space science tools for Data Explorer. Otherwise, the first n variables categorized as positions (where n is the number of CDF dimensions) are used to form the positions component. Any additional such variables are treated as data variables. If there are no positions-type variables, the positions component is a regular grid with an origin of 0 and an increment of 1 along each axis, where the number of axes equals the dimensionality of the imported CDF r-variable.
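The positions rules above can be sketched in a few lines of Python. This is only an illustration of the selection logic as described, not Data Explorer's implementation: the RVariable record and the function names are hypothetical, and nothing here calls the CDF library. The text does not say what happens to a third axis when LATITUDE and LONGITUD are present, so the sketch simply returns those two names in that case. The ordering of points in the default grid (last axis varying fastest) is likewise an assumption.

```python
import itertools
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class RVariable:
    """Hypothetical summary of a CDF r-variable (not a CDF library type)."""
    name: str
    record_variant: bool
    dim_variances: Tuple[bool, ...]  # one flag per CDF dimension


def is_positions_candidate(var: RVariable) -> bool:
    # Varies along exactly one dimension and does not vary between records.
    return (not var.record_variant) and sum(var.dim_variances) == 1


def choose_positions(variables: List[RVariable], num_dims: int) -> List[str]:
    """Return the names of the variables that would form the positions."""
    candidates = [v.name for v in variables if is_positions_candidate(v)]
    # LATITUDE and LONGITUD, when present, map to x and y in that order.
    preferred = [n for n in ("LATITUDE", "LONGITUD") if n in candidates]
    if preferred:
        return preferred
    # Otherwise take the first n candidates; the rest are treated as data.
    return candidates[:num_dims]


def default_grid(counts: Sequence[int]) -> List[Tuple[int, ...]]:
    """Regular grid with origin 0 and increment 1 along each axis."""
    return list(itertools.product(*(range(c) for c in counts)))
```

When choose_positions returns an empty list, default_grid supplies the fallback positions for an r-variable whose dimension sizes are given in counts.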

If there are records in the CDF, each record is imported as a series member. In many cases there is a variable with the mnemonic EPOCH, which holds a time stamp for each record in the CDF. If so, each EPOCH value (a double representing milliseconds since 0 AD) is stored as the series position attribute of the corresponding member. If there is no EPOCH variable, the first variable that is record-variant and not dimension-variant is considered the series variable and is imported as the series position attribute. If there is no such variable either, the series position starts at 1 and increments by 1 per series member, so that there is one member for each record in the CDF. The series position attribute, which contains the time stamp, may be accessed with the Attribute module.
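Because the series position holds the raw EPOCH double, it is often useful to turn it into a calendar date after retrieving it. The following Python sketch shows one way to do that, along with the fallback numbering described above. The function names are hypothetical. The offset constant assumes that "0 AD" means 0000-01-01 00:00:00 in the proleptic Gregorian calendar with year 0 counted as a leap year, which puts 1970-01-01 at 719528 days, or 62167219200000 milliseconds.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Sequence

# Assumed EPOCH value of 1970-01-01T00:00:00: 719528 days * 86,400,000 ms.
EPOCH_AT_1970_MS = 62167219200000.0


def epoch_to_datetime(epoch_ms: float) -> datetime:
    """Convert milliseconds since 0 AD to a naive datetime."""
    return datetime(1970, 1, 1) + timedelta(milliseconds=epoch_ms - EPOCH_AT_1970_MS)


def series_positions(num_records: int,
                     time_values: Optional[Sequence[float]] = None) -> List[float]:
    """One series position per record.

    Uses the time variable when one exists; otherwise counts 1, 2, 3, ...
    """
    if time_values is not None:
        return [float(t) for t in time_values]
    return [float(i) for i in range(1, num_records + 1)]
```

For example, epoch_to_datetime(63113904000000.0) corresponds to midnight on 1 January 2000 under the stated assumption.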

You can specify the name or names of the data variable in the variable parameter of the Import tool and the corresponding variable(s) will be imported. In the same way, you can use start, end, and delta to import a subset of CDF records.
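As a rough illustration of how start, end, and delta pick out records, the Python sketch below lists the record indices that would be kept. It assumes zero-based record indices and an inclusive end, neither of which is stated here, so check the Import module description before relying on either. The function name is hypothetical.

```python
from typing import List, Optional


def select_records(num_records: int, start: int = 0,
                   end: Optional[int] = None, delta: int = 1) -> List[int]:
    """Indices of the CDF records kept for a given start, end, and delta.

    Assumptions: indices are zero-based and end is inclusive.
    """
    if delta < 1:
        raise ValueError("delta must be a positive integer")
    # Default to the last record, and never run past it.
    last = num_records - 1 if end is None else min(end, num_records - 1)
    return list(range(start, last + 1, delta))
```

With ten records, start=2, end=8, and delta=3 keep records 2, 5, and 8.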

Variable and global attributes present in the CDF are imported as Object attributes. These attributes may be accessed through the Attribute and Inquire modules (e.g., to build metadata-driven applications).

Variables that vary in all dimensions and are record-variant are considered data variables. Any variable that is not a position or time variable is also considered a data variable, so that every variable can be imported. If you want the positions to come from a variable other than the one chosen by Data Explorer (e.g., when two or more sets of positions information are stored for different coordinate systems), you can use Replace or Rename to switch the components. Each data variable becomes the data component of a Field Object. Hence, the imported Group contains one Field for each data variable. Since Data Explorer can handle data more flexibly than CDF, some assumptions are imposed upon certain classes of data that may be imported:

Since data stored in CDF are not distinguished as cell-centered or node-based, all data components are treated as node-based (i.e., data dep positions). The Post module may be used to transform a Field to cell-centered (i.e., data dep connections).
Since CDF does not "natively" support Fields other than rank=0, all data variables are treated as scalars. The Compute module can be used to construct the appropriate vector representation from multiple scalar Fields.
The connections component depends on the dimensionality of the data variable such that 0 = none, 1 = lines, 2 = quads, 3 = cubes, and so on.
Each positions variable is considered a term of a Product Array to form the positions component.
All variables of 0 dimension are imported as the data component of a Field with no positions and no connections. If the LATITUDE and/or LONGITUD variables exist, the other variables are considered data components of Fields with positions and no connections, where the positions are those latitude and longitude variables. You can construct an appropriate Field with positions and connections from the variables that are imported through modules like Construct, Regrid, and Connect.
All variables of 1 dimension are imported as the data component of a Field of lines, where the positions would typically be a scalar (i.e., the one independent variable). If the LATITUDE and LONGITUD variables exist, then the positions are a 2-vector constructed from the latitude and longitude Arrays, but still a line.
One-dimensional variables in CDF may belong to one of three distinct classes, which are NOT distinguished in the way they are stored in a CDF file: 1) true 1-dimensional or line data; 2) indexed point data; or 3) indexed mesh data. You must know which class a variable belongs to in order to ensure that Data Explorer processes the data appropriately. The first class is handled correctly. For the second and third classes, the connections component of any imported Field(s) may be meaningless. You can use the Remove module to eliminate it and then treat the Field as scattered or point data (e.g., by using Regrid or Connect to create a more appropriate mesh).
Treating such data as a collection of points is consistent with the original design philosophy of CDF and CDF applications. The third case actually represents an irregular mesh, which Data Explorer can support directly. Unfortunately, the connectivity information (i.e., the mesh structure) is typically not stored in the CDF, so Import cannot directly reconstruct the original mesh. Hence, the data must be treated as point data unless you have information, external to the CDF, that can be used to recreate the original mesh structure.
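Two of the assumptions listed above, the choice of connections by dimensionality and the use of positions variables as terms of a Product Array, can be illustrated with a short Python sketch. It is a conceptual model only: the function names are hypothetical, a real Product Array stores its terms compactly rather than expanding them, and the point ordering (last axis varying fastest) is an assumption. Element types beyond three dimensions are covered only by "and so on" in the text, so the sketch does not guess their names.

```python
import itertools
from typing import List, Optional, Sequence, Tuple

# Connections element type by data-variable dimensionality, as listed above.
ELEMENT_TYPES = {0: None, 1: "lines", 2: "quads", 3: "cubes"}


def element_type(num_dims: int) -> Optional[str]:
    """Return the connections element type, or None for 0-dimensional data."""
    if num_dims not in ELEMENT_TYPES:
        # Higher dimensions are not named in the text.
        raise ValueError(f"no element type listed for {num_dims} dimensions")
    return ELEMENT_TYPES[num_dims]


def product_positions(*axes: Sequence[float]) -> List[Tuple[float, ...]]:
    """Expand one 1-D positions variable per axis into explicit points."""
    return [tuple(point) for point in itertools.product(*axes)]
```

For two positions variables with 2 and 3 values, product_positions yields the 6 points of the resulting grid, and element_type(2) gives the "quads" connections that join them.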
