Synchronise OBIS nodes’ scopes with their endorsed data

All OBIS nodes, whether regional or thematic, have a defined scope.

However, not all the OBIS data that falls under a node’s scope can be found when that node is selected.

This is because the nodes do not always “share” data with each other, although they should. Instead, OBIS nodes focus mainly on the datasets they manage, rather than also endorsing datasets that are managed by a different node but fall within their scope.

This results in inaccurate searches. For example:

European data from the OTN node will not be found if a user searches for European data using the EurOBIS node.

This confuses users, who intuitively rely on node scopes to find data.

Proposed solution:

All the OBIS nodes should endorse all the datasets within OBIS that fall under their scope.

An efficient implementation of the solution could have the following steps:

  • Identify OBIS nodes’ scopes

    • Regional nodes → based on geometry/layer
    • Thematic nodes → based on a query (temporal, depth, etc.)
  • Create a list of OBIS datasets that each OBIS node should endorse.

    • Based on their scope
    • List content → Datasets under node scope - Datasets already endorsed by that node.
  • Store the list in a findable place

  • Standardise endorsement communication with OBIS Secretariat

    • GitHub/email…?
    • Automated, since another OBIS node has already endorsed it?
      • This would force OBIS nodes to agree on quality standards and approaches.
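The “List content” step above is essentially a set difference per node. A minimal sketch, assuming each node’s in-scope datasets have already been resolved (by geometry for regional nodes, by a query for thematic ones) into sets of dataset identifiers; the node names and dataset IDs below are hypothetical placeholders, not real OBIS identifiers:

```python
from typing import Mapping


def endorsement_lists(
    in_scope: Mapping[str, set[str]],
    endorsed: Mapping[str, set[str]],
) -> dict[str, list[str]]:
    """For each node: datasets under its scope minus datasets it already endorses."""
    return {
        # A node with no endorsements yet gets its full in-scope list.
        node: sorted(datasets - endorsed.get(node, set()))
        for node, datasets in in_scope.items()
    }


# Hypothetical input for illustration only.
in_scope = {"EurOBIS": {"ds-a", "ds-b", "ds-c"}, "OBIS Norway": {"ds-b", "ds-d"}}
endorsed = {"EurOBIS": {"ds-a"}}
print(endorsement_lists(in_scope, endorsed))
# {'EurOBIS': ['ds-b', 'ds-c'], 'OBIS Norway': ['ds-b', 'ds-d']}
```

Sorting the output keeps the stored list stable between runs, which makes it easier to diff when it is kept in a findable place.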

Dear Ruben, I think this would be better solved by creating more intelligent filters on the OBIS portal, e.g. “I want all tracking data in EU marine waters”. Asking one node to endorse data from another node would not be very efficient. What would be interesting is an automated notification to nodes, e.g. “new data from EU waters has been published in OBIS”, so that regional data portals like EMODnet could ingest it.


Hi @WardA

Notifying nodes when data within their scope is published sounds great to me, and it would certainly solve one of the problems.

However, it would also be useful to have a list of the datasets that fall under a node’s scope, not just a one-time notification. A notification would only help regional portals gather the datasets, not necessarily OBIS users searching for data in OBIS.

Regarding endorsement, I understand that it may not be the best solution since it puts extra work on OBIS node managers, but it’s the only option I could think of to aggregate the datasets that fall within a given node’s scope.

More intelligent filters based on specific use cases sound great too. However, it still seems intuitive for users to rely on the OBIS node filters to get data based on each node’s scope.

I’ll bring this topic to the next DCG meeting this Friday to gather members’ opinions.

Hi @rubenperper, this is definitely an interesting discussion and it would be good to bring it to the DCG as well as the NCG. However, it will be a challenge: each time e.g. EurOBIS publishes a dataset, someone has to check whether it also falls under the scope of another node (who will do that?) and have it endorsed by that node (what should be done: QC the metadata, check for duplicates, …?) before ingestion. The process will also need to take into account that EurOBIS republishes datasets in each publication cycle. How many people would expect to find all EU datasets listed under EurOBIS if a more easily available regional filter existed on OBIS? We need more coordination among nodes, but I would not want to make the publication process more difficult. The criticism I hear most is that the publication process is too cumbersome.

Hi,

Alright, I propose breaking the issue down to make it more digestible, since many topics have been introduced.

First of all, I’d like to change the title of the Discourse topic to “Synchronise OBIS nodes scopes with the data available via each OBIS node metadata page”, since that was the intended purpose of the topic. Blame my poor choice of words when naming topics.

Dataset endorsement
• Agreed: no need to re-endorse data that has already been endorsed by another OBIS node.

Availability of data on OBIS node pages (the real intention of the topic)
• Proposal: each node needs an easily automatable list of DwC-A endpoint URLs (e.g. IPT resource URLs) of the OBIS datasets that fall under its scope, so that they are available via that node’s OBIS page.

How to do it? Example list for OBIS Norway:

  • All datasets in [list of Norwegian IPTs] that have OBIS Network in IPT metadata
  • All datasets that have Norwegian records (not necessarily in Norwegian IPTs):

My question is, could we all agree on a standard way to communicate this to OBIS Secretariat so it can be automated and communication burden minimized?

The example for OBIS Norway is easily automatable since it uses filters from the OBIS mapper to capture Norwegian data (apart from a conditional list of IPT resources). Would this make sense?
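The two rules in the OBIS Norway example combine into a single list by set union. A minimal sketch, assuming each dataset has already been annotated with the hypothetical fields below (in practice they would come from the IPT metadata and the OBIS mapper filters); the URLs use placeholder `example` domains, not real IPTs:

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Dataset:
    # Hypothetical fields for illustration only.
    url: str                    # DwC-A endpoint, e.g. an IPT resource URL
    ipt_host: str               # host of the IPT that publishes the dataset
    obis_network: bool          # IPT metadata lists the OBIS network
    has_records_in_scope: bool  # at least one record matches the node's filter


def node_endpoint_list(datasets: Iterable[Dataset], node_ipts: set[str]) -> list[str]:
    """Union of the two rules in the OBIS Norway example, as sorted DwC-A URLs."""
    datasets = list(datasets)
    # Rule 1: datasets on the node's own IPTs that declare the OBIS network.
    from_node_ipts = {d.url for d in datasets if d.ipt_host in node_ipts and d.obis_network}
    # Rule 2: datasets with in-scope records, wherever they are published.
    with_scope_records = {d.url for d in datasets if d.has_records_in_scope}
    # Set union, so a dataset matching both rules is listed only once.
    return sorted(from_node_ipts | with_scope_records)


datasets = [
    Dataset("https://ipt.example.no/resource?r=a", "ipt.example.no", True, False),
    Dataset("https://ipt.example.no/resource?r=b", "ipt.example.no", False, False),
    Dataset("https://ipt.example.org/resource?r=c", "ipt.example.org", True, True),
    Dataset("https://ipt.example.org/resource?r=d", "ipt.example.org", True, False),
]
print(node_endpoint_list(datasets, {"ipt.example.no"}))
# ['https://ipt.example.no/resource?r=a', 'https://ipt.example.org/resource?r=c']
```

Because the output is a plain list of endpoint URLs, the same format could in principle be shared with the OBIS Secretariat for every node, whichever rules produced it.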

@WardA Does this make more sense than my initial comments in the topic?

Apologies for my stupidity, @rubenperper.

I am not sure if I understand. How is what you need different from the following?

I think all the datasets that EurOBIS endorsed are also listed in the links above.

could we all agree on a standard way to communicate this to OBIS Secretariat so it can be automated and communication burden minimized?

Why do we need to communicate with the OBIS Secretariat? I feel like I am missing something. The dataset has already been ingested at this point, or do you mean before the dataset is ingested?

Part of the reason I don’t understand is that the mapper link does not show anything in my browser. I’m trying to understand this from the AntOBIS side; thanks for your patience.


Hi @ymgan and all,

Disclaimer: I’m using a node other than EurOBIS to showcase this because I’m looking at it from a general OBIS perspective and not only from a EurOBIS one.

The premise is: not all the OBIS data that belongs to the scope of a node can be found on its OBIS node page.

For example, the EurOBIS node page contains European data, yes, but it doesn’t contain ALL the European data in OBIS. My thinking is that it should, and I’m looking for an automated way to solve the same issue for all OBIS nodes.

Another example: I’m sure that there is Antarctic data in OBIS that is unknown to AntOBIS because it was published by a different node. That data doesn’t appear on the AntOBIS node page.

I initially used OBIS Norway as an example because their node page is empty, since they just joined. However, there is a lot of Norwegian data in OBIS that should be available via their node page:

Is it expected that the Norwegian node page stays empty until they start filling it with new datasets (I hope not), OR that we just migrate all the Norwegian datasets from the other nodes to the Norwegian node and remove them from where they are now (I also hope not)?

The scopes of OBIS nodes inherently overlap, since we have both regional and thematic nodes and since some global datasets span several nodes’ scopes.

A way of solving issues like these has already been proposed and is currently in use. For example, this dataset, although initially published and endorsed by EurOBIS, also appears on the OBIS UK node page because both OBIS nodes have been linked to it, allowing each node to keep good track of the data within its scope.

I understand that we do not want each node to re-endorse datasets each time, and I’ll agree with that for now (although re-endorsing dataset updates seems essential to avoid exposing updates that could contain any kind of data). So let’s not discuss endorsement.

My question is: how can we make sure that OBIS Norway gets all those Norwegian records on their node page? What is the best approach? Of course, we could just get a list of them and ask Pieter to link them to the node, and, just like the OBIS UK datasets, they would be linked to several nodes. But the reason I created this topic is that I see an opportunity to create logic that automatically links every dataset to its corresponding node(s) (once it has been endorsed by one of them), without having to ask Pieter to link them every time we get a case like this.

Disclaimer 2: this idea of creating scope logic for each node is easy for regional nodes, not so much for thematic nodes, but it’s a matter of testing.
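The automatic-linking idea can be sketched as one predicate per node: a spatial check for regional nodes and an attribute check (here depth) for thematic nodes. This is a minimal sketch under strong simplifying assumptions: regional scopes are approximated by lon/lat bounding boxes rather than real geometries, the boxes and the thematic node are hypothetical, and a dataset is only linked once at least one node has endorsed it, as described above:

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Record:
    lon: float
    lat: float
    depth_m: Optional[float] = None  # None when no depth is reported


Scope = Callable[[Record], bool]


def bbox_scope(min_lon: float, min_lat: float, max_lon: float, max_lat: float) -> Scope:
    """Regional node: record inside a lon/lat box (does not handle the antimeridian)."""
    return lambda r: min_lon <= r.lon <= max_lon and min_lat <= r.lat <= max_lat


def depth_scope(min_depth: float, max_depth: float) -> Scope:
    """Thematic node: record with a reported depth inside the range."""
    return lambda r: r.depth_m is not None and min_depth <= r.depth_m <= max_depth


def nodes_to_link(records: Iterable[Record], scopes: dict[str, Scope], endorsed_by: set[str]) -> set[str]:
    """Endorsing node(s) plus every node whose scope matches at least one record.

    Datasets not yet endorsed by any node are not linked anywhere.
    """
    if not endorsed_by:
        return set()
    records = list(records)
    matched = {node for node, in_scope in scopes.items() if any(in_scope(r) for r in records)}
    return set(endorsed_by) | matched


# Hypothetical scopes for illustration; rough boxes, not the nodes' real scopes.
scopes = {
    "OBIS Norway": bbox_scope(4.0, 57.0, 32.0, 72.0),
    "AntOBIS": bbox_scope(-180.0, -90.0, 180.0, -60.0),
    "example-thematic-node": depth_scope(200.0, 11000.0),
}
records = [Record(10.5, 63.4, 50.0), Record(5.0, 60.0, 350.0)]
print(sorted(nodes_to_link(records, scopes, {"EurOBIS"})))
# ['EurOBIS', 'OBIS Norway', 'example-thematic-node']
```

A real implementation would more likely test records against proper polygons (e.g. EEZ layers) and richer thematic queries; the point here is only that each node’s scope can be expressed as a reusable check applied to every endorsed dataset.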

I hope it is a bit clearer now, but if it’s not, I’d then ask: what is the purpose of having OBIS node pages? Is it only to keep track of the mobilisation and data maintenance efforts of each node team?

Thank you so much, @rubenperper! It was very thoughtful of you to consider so many aspects. I am still not completely understanding the intent. Maybe it would help if you could talk through this in one of the coordination group meetings?

I’m sure that there is Antarctic data in OBIS that is unknown to AntOBIS because it was published by a different node. That data doesn’t appear in the AntOBIS node page.

Personally, I see a region/location as a fact and a node as a role. I am worried that overlapping the two may mislead people about the ownership of the data. That being said, my personal preference is for data about Antarctica/the Southern Ocean to be kept separate from data published/endorsed by AntOBIS.

I feel that what you want is probably something like the “data about” and “data publishing” tabs on GBIF country pages, where the two are clearly separated. Example: Norway

I hope this helps! I am curious about the answers to your other questions too!
Thanks again for putting so much thought into this!


Hi all,

Personally, I think we should make country profiles for the data in each country. We should remember that, unlike GBIF, where the nodes are national and the MoUs are signed by governments, OBIS has “national” nodes, and in some places more than one node.

Country profiles would also be a good starting point for the work @WardA was mentioning on reporting for the GBF (Target 21 and maybe others).

I don’t understand the difference between the nodes and the EEZ filter for the data.

Edit: In my opinion, we could prioritise the EEZ and ABNJ zones for showing statistics.
