An article from earlier this year by Ulrike Schwinn titled Oracle Object Storage Access for all Oracle Databases with DBMS_CLOUD got me thinking about doing more with CSV data because it can be treated as a table through DBMS_CLOUD.CREATE_EXTERNAL_TABLE. In essence, provide secure RESTful access to the data, and even visualise that data through charts.
Very quickly, we’ll go from a uploading a Salary.csv dataset to rendering a chart of Job Titles for the highest education level and permitting RESTful queries on data…
Salary Data
Let’s start with a CSV dataset. For this exercise we’ll use a public domain Salary by Job Title and Country from kaggle.com. The download is a ZIP archive with two files. In this case it is just the Salary.csv which will be used.
I’m not going into all the details of downloading the archive file, unzipping it, creating a bucket, and uploading the Salary.csv but needless to say, those are important steps to get you this stage…
Note that the file has a URL for accessing and downloading it but that URL is not publicly accessible.
External Table
In the database, the DBMS_CLOUD package can create a table which has it’s data loaded from the CSV file. To get access to object storage from the database we require a credential object through DBMS_CLOUD.CREATE_CREDENTIAL. Details on achieving that are already covered in Ulrike’s Oracle Object Storage Access for all Oracle Databases with DBMS_CLOUD article and my previous article Data move mojo with ORDS 21.3.0. In fact, I’m going to reuse that DATAPUMP_CRED_NAME credential. That’s just out of convenience for me. You be you and do your own thing.
To create the SALARY table we must specify a column list with data types and sizes. That requires figuring out, based on the data and also the description on kaggle.com, what data types makes sense. Some columns are obviously numeric, but it is not always clear how big or small those columns should be.
begin
DBMS_CLOUD.CREATE_EXTERNAL_TABLE(table_name=>'SALARY',
credential_name => 'DATAPUMP_CRED_NAME',
file_uri_list => 'https://frckdrxj9oco.objectstorage.eu-frankfurt-1.oci.customer-oci.com/n/frckdrxj9oco/b/example-bucket/o/Salary.csv',
format => json_object('type' VALUE 'CSV', 'skipheaders' VALUE '1'),
column_list =>'Age NUMBER(4,1),
Gender varchar2(6),
Education_Level NUMBER(1),
Job_Title varchar2(100),
Experience NUMBER(4,1),
Salary NUMBER(9,2),
Country varchar2(200),
Race varchar2(100),
Senior NUMBER(1)'
);
end;
/
Note that the file_uri_list
parameter came directly from the object details panel in Object Storage.
Unless there’s a syntax error, creating the table will always work. You’ll only find out about problems when you run a select query. That’s when the DB attempts to read the file.
Chart
ORDS Database Actions has a charts and dashboard feature that we’re going to use now. Charts enable you to create charts from the database. The chart is constructed using the input SQL command. Dashboards enable you to group charts together to create reports. Let’s define a bar chart showing the number of job titles for the highest education level. So that the chart is not too busy I’m limiting the chart to only show job titles that have more than 10 occurrences.
For simplicity my chart definition says that it is unprotected, so it is publicly available and anyone can see it with the URL. That’s ok with this scenario because the data is in the public domain anyway but please consider the protection on your charts and dashboards.
That URL can be shared and it will render in the browser like this…
Although the chart and its data is available to anyone with the URL, the data is just the data for the chart.
For many folks the only way they can create a chart based on a CSV file and share it is through using a spreadsheet and sharing that, which means sharing the underlying data. In this case the data is not shared, just the chart. So far so cool. Now let’s look at sharing that data and allowing clients to query it.
AutoREST
This will be a short section. It is as simple as:
EXEC ORDS.ENABLE_OBJECT(p_object => 'SALARY');
That’s it. The SALARY table is now accessible through a REST service at /ords/<my schema>/salary/
. Moreover, the Filter Object query parameter provides a powerful search mechanism through the Salary.csv file. For example, let’s find people in the data set with more than 30 years experience: q={"experience":{"$gt":30}}
You can’t do that so easily in a text editor 😁
See Filter Objects in the ORDS Developer Guide for more information. Of course this RESTful URL can be protected to require a authorisation for accessing it too while the chart is left unprotected.
Note that with standard REST Enabled tables you can perform DELETE, GET, POST and PUT methods but with external tables, the only functional method is GET. All others will get a HTTP 555 response with this message: ORA-30657: operation not supported on external organized table
.
Conclusion
If you have followed the above steps you have opened up new ways of accessing, sharing and interrogating data in a CSV file.
To reiterate, when using an external table the data comes from files in the Cloud or from files in a directory. The data does not get loaded or copied into a table in the database. Drop in a new Salary.csv file with different data and it is immediately picked up on the next query.
The combination of Object Storage, Autonomous Database and Oracle REST Data Services provide some interesting options for doing more with CSV files. It has its limits though. To be realistic, large volumes of data and complex queries would require actually loading copies of the data into the database. For that one should consider Autonomous Database Data Studio to load the data. You can still use ORDS as above to share the data as charts or expose a RESTful interface to the data.
This article was inspired by Ulrike’s January blog post. More excellent articles on data management can be found at https://blogs.oracle.com/authors/ulrike-schwinn. Well worth a look.