How to get datatable content via API in workflow

Dear community,

Is there a way to fetch the content of a data table via API in a workflow (which, to my knowledge, means using a Flexible REST API processor) for further processing?
The use case behind this is that we want to dynamically fetch data table contents from different projects, domains, and maybe, in the future, even from other instances.

Thanks a lot!

Have you tried using the /data/{dataSetID}/samples call maybe?

When making the call, you have to specify the number of samples you’d like to get via the amount parameter.

Here’s the official docu for it: https://od-api.pages.intranet.onelogic.de/#data__datasetid__samples_get
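
For illustration, here is a minimal Python sketch of how such a call could look. It is untested; only the endpoint path and the amount parameter come from the docs above, while the authorization header format, base_url, and dataset_id are placeholders/assumptions:

import requests

def get_datatable_samples(base_url, dataset_id, token, amount=100):
    # /data/{dataSetID}/samples returns up to "amount" sample rows
    url = base_url + "data/" + dataset_id + "/samples"
    header = {"Authorization": "Bearer " + token}  # assumption: bearer token auth
    r = requests.get(url, headers=header, params={"amount": amount})
    r.raise_for_status()
    return r.json()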

I would give /data/content with a suitable data limiter configuration a try. Generally, keep a close eye on performance. What you are planning to do can easily become a performance bottleneck.

Thanks for the answers!

I think in this particular case the “samples” call is not what I want because I really need the entire content of the table. I could set the amount to a very high number, but that sounds a bit dangerous already :smiley:
I have been experimenting with /data/content and found a configuration that works in the processor (along with the header parameter “Content-Type” = “application/json”):

[{
    "dataId": "8cb16472-c56c-4da7-8581-ae04564a6972",
    "dataRequest": {
        "id": "3ca1b4c6-dfaa-48ad-83b7-845b847e6aba",
        "transformations": [],
        "statistics": [],
        "includeRowId": "ALWAYS",
        "suppressTotalCount": true,
        "limiter": {
            "type": "COMPLETE"
        }
    }
}]

But like @stefan.ganser mentioned, this is to be treated with caution because it loads the entire data set without a proper pagination mechanism.
@stefan.ganser do you have a recommendation as to up to which size this is most likely OK, and from which size on pagination would give a serious performance advantage?

@magdalena.murr this is difficult to say. Expressing it in a number of rows seems inappropriate to me because this measure doesn’t take the amount of data per row into account. I could also imagine that the underlying data source plays a role.


Thank you for the answer nonetheless! :slight_smile:

Thank you very much for the example @magdalena.murr. I was also trying to get the content of a Data Table via the OD API, but from a Python function, and I used the following code to do that. When formulating the request, I ran into lots of encoding problems with the body, which I solved by using data=json.dumps(request_body) :smile:

import json
import requests

#Input:  base_url: the base URL of the ONE DATA API
#        datatable_id: the ID of the data table whose content we want to read
#        token: the API token used to build the authorization header
#Output: the content of the data table whose ID was passed as input
def get_datatable_content_by_id(base_url, datatable_id, token):
    header = create_API_header(token)
    request_name = "data/content/"
    url = base_url + request_name

    request_body = [{
        "dataId": datatable_id,
        "dataRequest": {
            "id": "3ca1b4c6-dfaa-48ad-83b7-845b847e6aba",
            "transformations": [],
            "statistics": [],
            "includeRowId": "ALWAYS",
            "suppressTotalCount": True,
            "limiter": {
                "type": "COMPLETE"
            }
        }
    }]

    #Serializing the body ourselves (data=json.dumps(...)) avoids the
    #encoding problems mentioned above.
    r = requests.post(url, headers=header, verify=False, data=json.dumps(request_body))
    print(r.content)

    assert r.status_code == 200, "Could not retrieve data table " + datatable_id + " with the ONE DATA API!"
    return json.loads(r.content)
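
Calling it could then look like this (a sketch; the base URL is a placeholder, and create_API_header is the helper from our project that builds the authorization header):

token = "..."  # obtained via your usual authentication flow
content = get_datatable_content_by_id(
    "https://one-data.example.com/api/v1/",   # placeholder base URL
    "8cb16472-c56c-4da7-8581-ae04564a6972",   # the data table ID from the example above
    token,
)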

If you also want to load the data table’s content JSON into a Pandas Dataframe, I created this Python method to do that:

import pandas as pd

#Input:  json_content = the parsed JSON content of the requested data table
#        (the list returned by get_datatable_content_by_id above)
#Output: a Pandas DataFrame containing the data table's rows
def get_dt_json_content_into_pd_data_frame(json_content):

    json_dt = json_content[0]

    #Get the table headers from the schema
    list_dt_column_names = [field["name"] for field in json_dt["schema"]]

    dict_dt_columns = {column_name: [] for column_name in list_dt_column_names}

    #Iterate over all rows and add the corresponding entries to dict_dt_columns
    for row in json_dt["records"]:
        for column_name in list_dt_column_names:
            dict_dt_columns[column_name].append(row[column_name])

    return pd.DataFrame(dict_dt_columns)
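
Putting both functions together, loading a data table straight into a DataFrame is then a one-liner (same placeholder arguments as above):

df = get_dt_json_content_into_pd_data_frame(
    get_datatable_content_by_id(base_url, datatable_id, token)
)

Side note: since every entry in json_dt["records"] is already a dict keyed by column name, pd.DataFrame(json_dt["records"])[list_dt_column_names] should build the same frame in one step; the explicit loop above just makes the mapping obvious.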