Posted by linitrex on April 10, 2018 at 5:28am
The major challenge when using Drupal to publish data from sensor measurements is how to adapt its content-based data model to information that consists of many fields collected over prolonged periods of time. I'd like to know whether anyone in this group has implemented, or is aware of, solutions where several million records with tens to hundreds of fields are imported and visualized, or exported to common formats, through integration with third-party technologies. I have implemented such a solution and would like to share what I learned with others.

Comments
Microservice driven content
To help us solve this problem with a large archive, we use microservices that drive many different things, including Drupal content, rather than trying to make the content drive the data. Here is a bit about the data service...
https://go.usa.gov/xQ2tu
We also have a search service to discover datasets that contain the needed datatypes based on your spatial and temporal range. Here is the documentation for it...
https://go.usa.gov/xQ2tJ
We are currently migrating datasets to these services and will ultimately have ~900 datasets, with time ranges that extend from 100 million years ago to today's near-real-time satellite data and derived products. We currently house ~20 petabytes of environmental information (the world's largest repository). A critical piece of orchestrating this much information is having solid metadata that describes these widely varied datasets. We are focused on ISO-19115-2 metadata records for these, using GCMD keywords and CF standard names.
Thank you so much, Phil. At that time I was looking at exactly https://www.ncei.noaa.gov, wondering what stands behind it. My case is definitely not that big. My biggest database is 3 800 000 records, each having 266 fields. I'm writing an article, and as part of my research I'm putting together an overview of the different approaches taken. I'll refer to your documents. Are you aware of any articles published about your approach?
This is a late response, but I would like to hear your findings on this.
One way to handle a large number of records would be to create files from the corresponding records, then create a custom file formatter to display them with visualization charts, since querying a very large number of items from the database will greatly affect site performance for concurrent users.
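As a rough illustration of the "create files from the records" step, here is a minimal Python sketch that streams query results into a CSV file in batches, so the site can serve a static file instead of querying millions of rows per page view. It is not taken from any Drupal module: the function name export_to_csv, the batch size, and the use of SQLite are all assumptions made for the example.

```python
import csv
import sqlite3


def export_to_csv(conn, query, path, batch_size=10_000):
    """Stream the rows returned by `query` into a CSV file.

    Hypothetical helper: rows are fetched in batches so the full
    result set is never held in memory. Returns the number of data
    rows written (the header row is not counted).
    """
    cursor = conn.execute(query)
    # Column names come from the cursor description of the query.
    header = [column[0] for column in cursor.description]
    written = 0
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        while True:
            batch = cursor.fetchmany(batch_size)
            if not batch:
                break
            writer.writerows(batch)
            written += len(batch)
    return written
```

A scheduled job could call this once per dataset and attach the resulting file to a node, leaving the chart rendering to the file formatter.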
Timeseries data framework
Another late response, but... I have developed a framework (data Helper, or dH) to flexibly store timeseries data in Drupal. https://www.drupal.org/sandbox/robertwb/2298887
It uses 3 custom related entities:
dh_variabledefinition - detailed meta-data information pertaining to variables stored in the dh_timeseries and dh_properties tables. Attributes include units, default/null values, descriptions, time-step/units (if applicable), vocabulary (text field to organize related variables), plugins (for data entry and rendering - see dH Variable Plugin System).
dh_timeseries - a lightweight table for storing time series data related to any entity. It is linked via a foreign key "varid" to the dh_variabledefinition table. It is structured like a field, with entity_type and featureid (entity_id) fields that refer back to the parent entity, but it is actually an entity. This permits more flexible use in Views: dh_timeseries can be a base table, so views can do an implicit UNION between entities that share variables in the dh_timeseries table.
dh_properties - a lightweight entity for storing flexible and persistent data related to any entity. All entries in dh_properties must have a "varid" property set, which refers to meta-data (name, type, units, plugins) in the dh_variabledefinition table. It is structured like a field, with entity_type and featureid (entity_id) fields that refer back to the parent entity, but it is actually an entity. This permits more flexible use in Views: dh_properties can be a base table, so views can do an implicit UNION between entities that share variables in the dh_properties table.
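To make the relationship between these tables concrete, here is a small Python/SQLite sketch of the idea, not the actual dH schema. Only the table names and the varid, entity_type and featureid columns come from the description above; the other columns (tid, varname, units, tstime, tsvalue), the sample data, and both function names are assumptions for illustration.

```python
import sqlite3

# Simplified, hypothetical schema. Only varid, entity_type and featureid
# are named in the description above; the remaining columns are assumed.
SCHEMA = """
CREATE TABLE dh_variabledefinition (
    varid INTEGER PRIMARY KEY,
    varname TEXT NOT NULL,
    units TEXT
);
CREATE TABLE dh_timeseries (
    tid INTEGER PRIMARY KEY,
    entity_type TEXT NOT NULL,
    featureid INTEGER NOT NULL,
    varid INTEGER NOT NULL REFERENCES dh_variabledefinition(varid),
    tstime INTEGER NOT NULL,
    tsvalue REAL
);
"""


def build_demo():
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO dh_variabledefinition VALUES (1, 'water_temp', 'degC')"
    )
    conn.executemany(
        "INSERT INTO dh_timeseries "
        "(entity_type, featureid, varid, tstime, tsvalue) "
        "VALUES (?, ?, ?, ?, ?)",
        [
            ("node", 10, 1, 1523336400, 11.5),
            ("taxonomy_term", 7, 1, 1523340000, 12.0),
            ("node", 10, 1, 1523343600, 12.4),
        ],
    )
    return conn


def values_for_variable(conn, varname):
    """Use dh_timeseries as the base table: one query returns values for
    a variable across every entity type that stores it."""
    return conn.execute(
        """
        SELECT ts.entity_type, ts.featureid, ts.tstime, ts.tsvalue, v.units
        FROM dh_timeseries AS ts
        JOIN dh_variabledefinition AS v ON v.varid = ts.varid
        WHERE v.varname = ?
        ORDER BY ts.tstime
        """,
        (varname,),
    ).fetchall()
```

Because the time series table is queried directly rather than through each parent entity, nodes and taxonomy terms that share a variable come back in a single result set, which is roughly what the "implicit UNION" in Views amounts to.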