CORAL expression parser
Under construction!
This page is meant to describe what users should know about the SQL expression parser in CORAL. It covers questions that are often asked by end users in their support requests. In particular, some issues in the CORAL expression parser are pointed out and some recommendations given to users about dos and donts for the
queries and
column names they use in their application.
Introduction: CORAL expression parser
Users create CORAL queries by specifying several input parameters to the appropriate interfaces. These input parameters include column names to be selected, table names and SQL fragments (a query WHERE clause passed to the setCondition method).
As the applications may need to be ported to several backends, users write the same code with the same parameters for all backends, such Oracle and SQLite. Each CORAL plugin responsible for one such backend (e.g. OracleAccess and SQLiteAccess) use an "expression parser" to process these input parameters and create the complete SQL statement appropriate for that backend.
When CORAL was originally written, each plugin was using a separate expression parser. However, as most of the code in these parsers was the same, and to ease software maintenance, a common
ExpressionParser
class that can be used by all plugins was recently introduced in
CORAL_2_1_0
. This is described for instance in
task #7973
, while some open issues are found in
task #16787
.
The original expression parsers, from which the common ExpressionParser described above was also derived, were using an algorithm where each word in the user's SQL fragments was explicitly compared to a list of table, column and bind variable names known to CORAL. One of the main problems with this implementation was that it was relying on CORAL to find out a priori which table and column names a user could possibly query. Apart from not being foolproof, this algorithm especially implied a large performance overhead, for instance in the case of Oracle from many expensive queries against the Oracle 'data dictionary' (see some examples in
task #10775
and
bug #91531
). For this reason, the common ExpressionParser is recently being replaced by a new
SimpleExpressionParser
class, which uses a very different algorithm not involving data dictionary queries. The new mechanism is discussed in
task #10844
and has been introduced in CORAL_2_3_18.
Common issues and recommendations
Two common sources of problems in this area include:
- The use of lowercase column or table names in Oracle (or more generally the use of non-default case names in case-insensitive systems). This is an issue because Oracle is case-insensitive and by default uppercase. Lowercase names in SQL statements can be used, but are automatically translated to uppercase ('create table test' will actually create table 'TEST'). Lowercase objects can be created and queried but in that case they need to be escaped ('create table "test"' will create table 'test').
- The use of protected keywords (like 'select') as column or table names. These keywords can in many cases be used (lowercase or uppercase), but in that case they also need to be escaped.
Common issues and CORAL bugs involving lowercase names include for instance:
- Missing quotes around lowercase table aliases in Oracle, leading to ORA-00904 (e.g. bug #28189
).
- Missing quotes around lowercase table aliases in Oracle, leading to ORA-00918 (e.g. bug #42457
).
- Missing quotes around lowercase column names in Frontier, leading to ORA-00904 (e.g. bug #42317
).
Common issues and CORAL bugs involving protected keywords include for instance:
- Missing quotes around protected keywords (e.g. bug #17651
in MySQL).
Many of the issues above were identified using the old plugin-specific expression parsers or the first common ExpressionParser. More recently, a few issues have also been reported for the newer SimpleExpressionParser, such as:
- Reserved words such as 'like' should not be escaped with quotes, whether lowercase or uppercase, when used as SQL operators (e.g. bug #89370
and the first part of bug #87111
).
- Reserved words such as 'like' should be escaped with quotes (whether lowercase or uppercase? only lowercase) when used as table aliases (e.g. bug #88007
).
- SQL functions such as 'SYSTIMESTAMP WITH TIME ZONE' should not be escaped with quotes (e.g. the second part of bug #87111
). [This actually was not fixed: it should probably be considered a new 'feature' that can only be addressed by one of the recommendations below: within Oracle, especially if you use special SQL functions, use uppercase unless you have very strong reasons not to do so!]
Some issues with the new SimpleExpressionParser probably still exist and have not been found. Others, as mentioned above, have been found but will be considered as 'features' that will not be fixed, as they concern very special functions and use cases.
One of the most relevant ticket where these issues were discussed is
bug #88007
.
One issue that is still not understood is
bug #91075
.
In summary, there are a few simple recommendations users could try and follow to avoid issues like the ones above:
- If possible, do not use any reserved words as table, column and alias names.
- If possible, do not use lowercase table, column and alias names, especially not in Oracle.
- Within your SQL fragments, some special SQL functions must be uppercase (e.g. SYSTIMESTAMP, see above).
We will try to keep this page updated as more issues appear. Thanks for your collaboration!
--
AndreaValassi - 01-Feb-2012
PS: Reserved words (work in progress)
This is a followup (in progress) of
bug #91075
.
It should be noted that there are two categories of 'reserved words':
- Words like 'SELECT' or 'AND' which are SQL commands and operators
- Words like 'SYSTIMESTAMP' which are SQL functions
There is a difference: with the present API, CORAL may try to do something with the first category of words if used as column names, but will never be able to do anything with the second category.
If a user calls
addToOutputList("SELECT")
, CORAL can guess that:
- There is a column called "SELECT" and the user wants to retrieve its value, hence SELECT should be escaped. This is precisely bug #91075
, for which a fix is being prepared.
However, is a user calls
addToOutputList("SYSTIMESTAMP")
, CORAL can take this to mean one of two things:
- There is a column called "SYSTIMESTAMP" and the user wants to retrieve its value, hence SYSTIMESTAMP should be escaped.
- The user wants to retrieve the current time via the "SYSTIMESTAMP" function, hence SYSTIMESTAMP should not be escaped.
As there are use cases (in COOL) for the second case, while the first case can be avoided by following the recommandation not to use reserved words as column names, the policy of CORAL will be to give precedence to the second case in doubt. This is precisely the idea behind the fix for the second part of
bug #87111
.
In summary:
- Some reserved words can in simple cases be used as column names, and CORAL will try to accomodate these if not to complex. This is the case for 'SELECT' in bug #91075
.
- Some reserved words, especially those indicating SQL functions, cannot be treated by CORAL as column names and it is the responsability of the user to avoid such words as column names or find other workarounds. This is the case for 'SYSTIMESTAMP' in the second part of bug #87111
.
- One workaround for some simple queries may be to prepend column names by table aliases (as in 'SELECT T."SYSTIMESTAMP" from T', see comment #2 in bug #91075
.
- The general recommandation remains: do NOT use reserved words as column names. If you do, you should accept CORAL "as-is", it may work or it may not work...