Function Argument/Return Value Data Types

Every value that a UDF accepts as an argument or returns as a result must map to a SQL data type that you could specify for a table column.

Each data type has a corresponding struct, defined in the C++ and Java header files, with two member fields and some predefined comparison operators and constructors:

  • is_null indicates whether the value is NULL. When is_null is false, val holds the argument or return value.
  • null() is a member function that constructs an instance of the struct with the is_null flag set.
  • The SQL comparison operators <, <=, >, and >=, along with BETWEEN and ORDER BY, work automatically based on the SQL return type of each UDF.
  • Each kind of struct defines == and != operators for comparing it with structs of the same type in your own C++ code. Each kind of struct also has one or more constructors that create a filled-in instance of the struct.
  • Impala cannot process UDFs that accept or return composite or nested types. This restriction applies both to UDFs written in C++ and to Java-based Hive UDFs.
  • You can overload functions by creating multiple functions with the same SQL name but different argument types. However, each overload must use a different C++ or Java entry point name in the underlying code.
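The struct conventions above can be illustrated with a minimal sketch. It does not include impala_udf/udf.h; instead it defines simplified stand-ins for FunctionContext, IntVal, and DoubleVal that mirror only the members described above, so the example compiles on its own. AddInts and AddDoubles are hypothetical UDFs: they show the usual check on is_null, the static null() function, the == and != operators, and two overloads that use distinct C++ entry point names.

```cpp
#include <cstdint>

// Placeholder for Impala's FunctionContext (declared in impala_udf/udf.h).
struct FunctionContext {};

// Simplified stand-in for IntVal; assumption: only the members described
// above are reproduced, not the full header definition.
struct IntVal {
  bool is_null;
  int32_t val;

  IntVal(int32_t v = 0) : is_null(false), val(v) {}

  // Returns an instance with the is_null flag set.
  static IntVal null() {
    IntVal result;
    result.is_null = true;
    return result;
  }

  // Two NULLs compare equal; a NULL never equals a non-NULL value.
  bool operator==(const IntVal& other) const {
    if (is_null || other.is_null) return is_null == other.is_null;
    return val == other.val;
  }
  bool operator!=(const IntVal& other) const { return !(*this == other); }
};

// Simplified stand-in for DoubleVal, used by the DOUBLE overload below.
struct DoubleVal {
  bool is_null;
  double val;

  DoubleVal(double v = 0.0) : is_null(false), val(v) {}

  static DoubleVal null() {
    DoubleVal result;
    result.is_null = true;
    return result;
  }
};

// Hypothetical UDF for INT arguments: returns NULL if either input is NULL,
// otherwise the sum.
IntVal AddInts(FunctionContext* /*context*/, const IntVal& a, const IntVal& b) {
  if (a.is_null || b.is_null) return IntVal::null();
  return IntVal(a.val + b.val);
}

// Hypothetical overload for DOUBLE arguments. It could share a SQL function
// name with AddInts, but it needs its own C++ entry point name.
DoubleVal AddDoubles(FunctionContext* /*context*/, const DoubleVal& a,
                     const DoubleVal& b) {
  if (a.is_null || b.is_null) return DoubleVal::null();
  return DoubleVal(a.val + b.val);
}
```

With the real header, you would drop the stand-in structs, include impala_udf/udf.h, and register each entry point (AddInts, AddDoubles) under the same SQL name with its own argument types.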

The following table lists the data types defined for C++ in /usr/include/impala_udf/udf.h:

  • IntVal: Represents an INT column.
  • BigIntVal: Represents a BIGINT column.
  • SmallIntVal: Represents a SMALLINT column.
  • TinyIntVal: Represents a TINYINT column.
  • StringVal: Represents a STRING column. It has a len field that holds the length of the string and a ptr field that points to the string data. Its constructors that take a null-terminated C-style string, or a pointer plus a length, create a StringVal that refers to the original string data rather than allocating a new copy. It also has a constructor that takes a pointer to a FunctionContext struct and a length; that constructor does allocate a new buffer of that length, and is the one to use in UDFs that return string values.
  • BooleanVal: Represents a BOOLEAN column.
  • FloatVal: Represents a FLOAT column.
  • DoubleVal: Represents a DOUBLE column.
  • TimestampVal: Represents a TIMESTAMP column. It has a 32-bit integer date field that represents the Gregorian date and a 64-bit integer time_of_day field that holds the time of day as nanoseconds since midnight.
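The StringVal and TimestampVal layouts can be exercised with a short sketch. As before, it uses simplified stand-ins rather than the real impala_udf/udf.h definitions, so it compiles on its own; CountVowels and HourOfDay are hypothetical UDFs. CountVowels walks exactly len bytes from ptr, because the string data need not be null-terminated. HourOfDay derives the hour by dividing time_of_day (nanoseconds since midnight) by the number of nanoseconds in an hour.

```cpp
#include <cstdint>
#include <cstring>

// Placeholder for Impala's FunctionContext (declared in impala_udf/udf.h).
struct FunctionContext {};

// Minimal stand-in for IntVal: only what this sketch needs.
struct IntVal {
  bool is_null;
  int32_t val;
  IntVal(int32_t v = 0) : is_null(false), val(v) {}
  static IntVal null() { IntVal r; r.is_null = true; return r; }
};

// Simplified stand-in for StringVal. Both constructors point at existing
// bytes instead of copying them.
struct StringVal {
  bool is_null;
  uint8_t* ptr;
  int len;
  StringVal(uint8_t* p = nullptr, int l = 0) : is_null(false), ptr(p), len(l) {}
  StringVal(const char* s)
      : is_null(false),
        ptr(reinterpret_cast<uint8_t*>(const_cast<char*>(s))),
        len(static_cast<int>(std::strlen(s))) {}
  static StringVal null() { StringVal r; r.is_null = true; return r; }
};

// Simplified stand-in for TimestampVal.
struct TimestampVal {
  bool is_null;
  int32_t date;         // Gregorian date; its encoding is not interpreted here
  int64_t time_of_day;  // nanoseconds since midnight
  TimestampVal(int32_t d = 0, int64_t t = 0)
      : is_null(false), date(d), time_of_day(t) {}
  static TimestampVal null() { TimestampVal r; r.is_null = true; return r; }
};

// Hypothetical UDF: counts lowercase ASCII vowels in the first len bytes.
IntVal CountVowels(FunctionContext* /*context*/, const StringVal& s) {
  if (s.is_null) return IntVal::null();
  int32_t count = 0;
  for (int i = 0; i < s.len; ++i) {
    switch (s.ptr[i]) {
      case 'a': case 'e': case 'i': case 'o': case 'u':
        ++count;
        break;
      default:
        break;
    }
  }
  return IntVal(count);
}

// Hypothetical UDF: extracts the hour (0-23) from time_of_day.
IntVal HourOfDay(FunctionContext* /*context*/, const TimestampVal& ts) {
  if (ts.is_null) return IntVal::null();
  const int64_t kNanosPerHour = 3600LL * 1000 * 1000 * 1000;
  return IntVal(static_cast<int32_t>(ts.time_of_day / kNanosPerHour));
}
```

For example, a StringVal built from a pointer into "impala" with len 3 covers only "imp", so CountVowels returns 1 for it even though the underlying buffer continues past those bytes.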