Getting Sensor Data in Bulk from the One Platform


#1

I have started work as a new user with an existing Exosite application.

The project has been underway with live users since 2014.

I need to collect some past data, on the order of 1MM (one million) readings.

I can certainly narrow that down by CIK and time period to whatever is practical.

Does anyone here have a recommendation for the fastest way I can get these readings in bulk form to load into another application?

I have access to whatever resources I need in terms of compute and storage. The existing Python code uses a one-reading-at-a-time approach that cannot retrieve the data fast enough for my present needs. This is more or less a one-time need, so that I can process the data in another application.

Thanks,
-james


#2

@asllc.james,

Thanks for posting!

We would definitely recommend using the RPC API, specifically the ‘read’ procedure. That procedure lets you specify a window of time and a limit on the number of datapoints returned.

The limitation that you will have to keep in mind is that this call must be made over HTTP. We are agnostic as to what is used to make the call and where it comes from.

If your external application is capable of making requests, by all means arm it with your device’s CIK and instruct it to send RPC API requests.

Otherwise, a script running on another machine that downloads the data and then uploads it is a good way to go about this.

Here is an old example on how to do this in Python: https://github.com/exosite-garage/archive_data
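
In the same spirit as that example, here is a minimal sketch of paging through ‘read’ with the requests library. The alias, time window, and option values are placeholders, so double-check the option names against the RPC documentation:

import requests

RPC_URL = "https://m2.exosite.com/onep:v1/rpc/process"

def read_window(cik, alias, start, end, limit=1000):
    # One RPC 'read' call: newest-first datapoints within [start, end].
    body = {
        "auth": {"cik": cik},
        "calls": [{
            "id": 1,
            "procedure": "read",
            "arguments": [
                {"alias": alias},
                {"starttime": start, "endtime": end,
                 "limit": limit, "sort": "desc"},
            ],
        }],
    }
    resp = requests.post(RPC_URL, json=body, timeout=30)
    resp.raise_for_status()
    call = resp.json()[0]
    if call["status"] != "ok":
        raise RuntimeError("RPC error: %r" % call)
    return call["result"]  # list of [timestamp, value] pairs

def read_all(cik, alias, start, end, limit=1000):
    # Page backwards through the window until it is exhausted.
    while True:
        points = read_window(cik, alias, start, end, limit)
        for ts, value in points:
            yield ts, value
        if len(points) < limit:
            return
        end = points[-1][0] - 1  # resume just before the oldest point seen

With a limit of 1000 per call, 1MM readings is on the order of a thousand requests rather than a million.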

Don’t like Python? You can check here if we have a library that wraps our RPC API in a language you prefer.

This is a common use case for Exosite; let us know if you have any trouble.

Happy to help,
-Martin


#3

I currently have an inherited Python app that does this. It uploads to an MS SQL Server relational database, inserting one row at a time, and it seems to take about 5 minutes per 1,000 readings.

So, if I read your recommendation correctly, I can get the readings the same way and then just modify the code to switch to a bulk-upload strategy for SQL Server.

I am thinking of having my script store the data in a local file as tab-delimited text, then using MS SQL Server Integration Services (SSIS) to bulk-upload that data.
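
Roughly, I am picturing something like this for the file-writing side (the column layout is just a placeholder, and read_all is the paging helper sketched above):

import csv

CIK = "5fbbf00000000000000000000000000000000000"  # example CIK, as elsewhere in this thread
ALIAS = "temperature"                             # placeholder dataport alias
START, END = 1388534400, 1420070400               # e.g. 2014-01-01 to 2015-01-01, Unix seconds

with open("readings.tsv", "w") as f:
    writer = csv.writer(f, delimiter="\t", lineterminator="\n")
    writer.writerow(["cik", "alias", "timestamp", "value"])  # header row for SSIS column mapping
    for ts, value in read_all(CIK, ALIAS, START, END):
        writer.writerow([CIK, ALIAS, ts, value])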

Does that sound like a likely approach to you, based on your experience?


#4

Just reading through the exo usage text, I saw this (NOTE: the CIK is an example):

exo dump 5fbbf00000000000000000000000000000000000 clientdump.zip

Would this be a fast way to get a large chunk of data for a CIK, perhaps all 1MM entries?


#5

If your script is already using the RPC API, then yep: it should be as straightforward as changing the options sent in your request and handling the larger set of data returned.
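
For example, the options argument to ‘read’ might go from a single-point request to something like this (times and limit are placeholders):

options = {
    "starttime": 1388534400,  # window start (Unix seconds)
    "endtime":   1420070400,  # window end
    "limit":     1000,        # max datapoints per call; page through the window
    "sort":      "desc",      # newest first, so the script can walk backwards
}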

That sounds very reasonable, though I am saying this with no firsthand knowledge of SSIS.

The approach you outlined is generally what other users have set up today. The differences are just the details of the file format and the target database.

-Martin


#6

Excellent and clear guidance. Thank you Martin.


#7

Just to follow-up:

I have a Python script that gets data from Exosite via the One Platform HTTP API and then sends the data to our SQL Server database.

I dramatically increased end-to-end performance with millions of readings by doing the following:

  1. Modified the script so that, instead of inserting rows into SQL Server, it simply writes each reading as a line of tab-delimited text to a file.

  2. Used SQL Server SSIS to import data to my database.

The transfer of readings at the millions-of-data-points scale went from not feasible to a little over 2 hours end-to-end (writing the file and consuming it with SSIS).

Just FYI: using SQL Server SSIS is outside the scope of our discussion here; please see the MS SQL Server documentation for that info.

-james


#8

That sounds like a dramatic improvement – congrats.

I am glad to hear that you got things working.

-Martin