Date: 08 Aug 2020
Seamless and uniform access to chemical data and tools: experience gained in developing OpenTox
5th Meeting on U.S. Government Chemical Databases and Open Chemistry, 25-26 Aug 2011
Dr Nina Jeliazkova, Ideaconsult Ltd, Bulgaria (www.ideaconsult.net)

Outline: What is OpenTox? (a crash course in 2 slides); Motivation; Architecture; Technology; Current status; OpenTox API implementation matrix; Experience: the devil is in the detail (REST web services, Resource Description Framework); Implementations: AMBIT services; Dataset service examples (chemical structures QA, data query and dataset comparison examples); Conclusions.
OpenTox crash course (1), with the help of cURL (http://curl.haxx.se)
1. Find a compound by an identifier, structure similarity or substructure:
curl -X GET http://host/query/compound/searc...
Returns the URI of the compound, e.g. http://host/compound/328
curl -X GET http://host/query/smarts?search=...
Returns the URIs of the hits, e.g. http://host/compound/456
2. Find a predictive model:
curl -X GET http://host/model
Returns the URIs of the available models, e.g. http://host/model/8
3. Apply the model to the compound:
curl -X POST http://host/model/8 -d dataset_uri=http://host/compoun...
Returns the URI of the results, e.g. http://host/dataset/999
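Steps 2 and 3 can also be scripted instead of typed at the command line. The following minimal Python sketch only builds the HTTP requests with the standard library's `urllib.request.Request`, so nothing is sent; `http://host` is the placeholder host from the slide and the helper names are hypothetical. Posting the input URI as the form field `dataset_uri` mirrors curl's `-d` option, and the `Accept` header shows how a client could ask for one particular representation of a result.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE = "http://host"  # placeholder host used on the slide


def list_models() -> Request:
    """Step 2: GET the model collection (the service lists model URIs)."""
    return Request(f"{BASE}/model", method="GET")


def apply_model(model_uri: str, dataset_uri: str) -> Request:
    """Step 3: POST the input URI as a form field, like curl -d dataset_uri=..."""
    # urlencode escapes ':' and '/' inside the URI value
    body = urlencode({"dataset_uri": dataset_uri}).encode("ascii")
    return Request(
        model_uri,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def get_result(result_uri: str, mime_type: str = "application/rdf+xml") -> Request:
    """GET a result, asking for one representation via the Accept header."""
    return Request(result_uri, method="GET", headers={"Accept": mime_type})


req = apply_model(f"{BASE}/model/8", f"{BASE}/compound/328")
print(req.get_method(), req.full_url)  # POST http://host/model/8
print(req.data)  # b'dataset_uri=http%3A%2F%2Fhost%2Fcompound%2F328'
```

Against a live service, each request would be sent with `urllib.request.urlopen(req)`.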
The results can be retrieved in all chemical MIME formats, as well as in RDF/XML and N3.

OpenTox crash course (2)
4. Find a dataset:
curl -X GET http://host/dataset?search=TOXCS...
Returns the URI(s) of the dataset(s), e.g. http://host/dataset/78
5. Launch a descriptor calculation algorithm on this dataset:
curl -X POST http://host/algorithm/8 -d dataset_uri=http://host/dataset...
Returns the URI of the results, e.g. http://host/dataset/new
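Step 4 can return more than one dataset URI. Assuming the client asks for the list as `text/uri-list` (the plain-text MIME type defined in RFC 2483: one URI per line, lines starting with `#` are comments), a small parser is enough to pick the dataset that is passed on to step 5. This is a minimal Python sketch; the function name is hypothetical.

```python
from typing import List


def parse_uri_list(body: str) -> List[str]:
    """Parse a text/uri-list body: one URI per line, '#' lines are comments."""
    uris = []
    for line in body.splitlines():  # handles both CRLF and LF line endings
        line = line.strip()
        if line and not line.startswith("#"):
            uris.append(line)
    return uris


response = "# datasets matching the search\r\nhttp://host/dataset/78\r\n"
print(parse_uri_list(response))  # ['http://host/dataset/78']
```

The chosen URI would then be sent as the `dataset_uri` form field in the POST of step 5.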
6. Train a model:
curl -X POST http://host/algorithm/LinReg -d dataset_uri=http://host/dataset...
Returns the URI of the model, e.g. http://host/model/newLRmodel
7. Apply the model to the compound from the previous slide:
curl -X POST http://host/model/newLRmodel -d dataset_uri=http://host/compound...

Motivation
Predictive toxicology applications need common components: access to datasets; algorithms for descriptor calculation and model building; validation routines. The state of the art involves re-implementing these components in every new application. If these components were readily available, we could quickly build new applications for specific purposes, experiment with new combinations of algorithms, and speed up method development and testing.
OpenTox components
Compounds: structures, names, ...
Features: chemical and biological (toxicological) properties, substructures, ...
Datasets: relationships between compounds and features
Algorithms: instructions for solving problems
Models: algorithms applied to data yield models, which can be used for predictions
Validation: methods for estimating the accuracy of model predictions
Reports: reporting predictions and models, e.g. to regulatory authorities
Tasks: handling long-running calculations
Authentication and Authorization: protecting confidential data
Service registration and querying: finding services of a specific type

Requirements: platform independence; interoperability with external programs and data sources; transparency for scientific and regulatory credibility; openness to future extensions.
Technological solutions: web services (REST); communication through well-defined interfaces; ontologies for the exchange of knowledge and data; use and promotion of open standards; open source components.
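The component list describes a dataset as relationships between compounds and features. A minimal in-memory Python sketch of that idea, assuming a dataset can be modelled as a mapping from (compound URI, feature URI) pairs to values; the class and method names are hypothetical and not part of the OpenTox API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple, Union

Value = Union[float, str]


@dataclass
class Dataset:
    """Hypothetical sketch: a dataset relates compounds to feature values."""

    uri: str
    # (compound URI, feature URI) -> value; compounds and features stay separate resources
    values: Dict[Tuple[str, str], Value] = field(default_factory=dict)

    def set(self, compound: str, feature: str, value: Value) -> None:
        self.values[(compound, feature)] = value

    def get(self, compound: str, feature: str) -> Optional[Value]:
        return self.values.get((compound, feature))

    def compounds(self) -> List[str]:
        # sorted so the output is deterministic
        return sorted({c for c, _ in self.values})

    def features(self) -> List[str]:
        return sorted({f for _, f in self.values})


ds = Dataset("http://host/dataset/78")
ds.set("http://host/compound/1", "http://host/feature/logP", 1.5)
ds.set("http://host/compound/2", "http://host/feature/logP", 2.0)
print(ds.compounds())  # ['http://host/compound/1', 'http://host/compound/2']
```

Identifying compounds and features by URIs, rather than by row and column positions, is what would let different services refer to the same compound or feature.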
REST (REpresentational State Transfer)
An architectural style for distributed information systems on the web:
Simple interfaces; data transfer via the Hypertext Transfer Protocol (HTTP), a stateless client-server protocol with the methods GET, POST, PUT and DELETE.
Each resource is addressed by its own web address and can have multiple representations.
A lightweight approach to web services that simplifies the development of distributed and local applications.
Cacheability and scalability, inspired by the successful WWW architecture.
Language independent.

OpenTox API (Application Programming Interface)
How applications talk to each other; how developers implement applications: http://opentox.org/dev/apis/api
[Diagram: OpenTox API overview with Ontology, Model and Validation resources and the HTTP methods GET, POST, PUT and DELETE]

OpenTox API implementation
All components are implemented as REST web services. There can be multiple implementations of the same type of component. A subset of services can be hosted by the same provider or by multiple providers at separate locations.

Implementation first, an API later
The most common approach to scientific software and databases:
Identify the data model and functionality.
Translate the data model into a database schema.
Implement the database and user interface functionality.
Optionally, provide libraries or expose some of the functionality as web services.
Advantages: use one's favourite technology and jump directly into implementation; attract end users with a nice GUI relatively quickly; relatively easy to persuade funding organisations that this will be a useful resource.
Disadvantages: proliferation of incompatible services that provide similar functionality through incompatible programming interfaces; difficult to extract and collate data automatically.

What end users really need
The user profile: organic chemistry background, working in industry, uses computational modelling tools but is not a developer or programmer.
I can do a web search in the following databases and look for a compound, and perhaps later for some toxicity endpoint:
SciFinder (http://www.cas.org/products/sfac...)
Toxnet (http://toxnet.nlm.nih.gov)
ChemID (http://chem.sis.nlm.nih.gov/chem...)
SCCS (http://ec.europa.eu/health/index...)
NTP (http://ntp.niehs.nih.gov)
Google (http://www.google.co.uk)
PubMed (http://www.ncbi.nlm.nih.gov/site...)
PubChem (http://pubchem.ncbi.nlm.nih.gov)
(Disclaimer: the list is not comprehensive.)
But how can I retrieve results for multiple compounds and endpoints automatically, without manually visiting all these websites? And if it is technically possible, is it legal?
The Internet provides a unique example of what society can achieve by adopting common standards. Internet Engineering Task Force (IETF) working groups are responsible for developing and reviewing specifications intended as Internet Standards. The standardisation process starts by publishing a Request for Comments (RFC), a document prepared by engineers and computer scientists for peer review or to convey new concepts or information. The IETF accepts some RFCs as Internet Standards via its three-step standardisation process: an RFC labelled as a Proposed Standard needs at least two independent and interoperable implementations; after further review and correction it becomes a Draft Standard; and with a sufficient level of technical maturity a Draft Standard can become an Internet Standard. Organisations such as the World Wide Web Consortium and OASIS support collaborations on open standards for software interoperability. Although some authors have recently argued that the standardisation process is less than ideal and does not always endorse the best technical solutions, the existence of the Internet itself, based on compatible hardware and software components and services, demonstrates the opportunities offered by collaborative innovation: flexibility, interoperability, cost-effectiveness and freedom of action.
Standards in life sciences
Cheminformatics: historically, the cheminformatics world has been driven by de facto standards developed and proposed by different vendors. A number of relatively recent initiatives have adopted open standardisation procedures, most notably InChI, CML and the Blue Obelisk initiatives, but so far there are no requirements for independent interoperable implementations.
Bioinformatics: many relatively recent open data initiatives and standardisation efforts.
Comparison with network engineering practices: network hardware and software have to work together by their very essence, and reviewers in the network engineering world are likely comfortable with reviewing the computer code of the implementations. Chemistry and biology software and databases can live in their own worlds, unless we want data shared and tools interoperable. Interoperability standards may affect business models.

A common API first, multiple independent interoperable implementations later
What we have done differently in OpenTox:
Identified the data model and functionality.
Defined the REST web service application programming interface (API), which covers the data model and functionality.
Developed six independent, interoperable implementations of the API in three different languages.
Tested whether different implementations work together; if not, identified whether the API specification is ambiguous, leading to different interpretations, or the implementation is simply buggy.
Provided API libraries and developed end-user applications: web UI, standalone GUI, command-line tools, workflow components.
A common API first, multiple independent interoperable implementations later
Advantages: recall the use case; avoids a proliferation of incompatible resources (this, however, only makes sense if the API is adopted beyond a single implementation); multiple GUI applications are easy to develop once the API library functionality is in place.
Disadvantages: think first, then implement, so the GUI comes last; harder to persuade funding organisations, because reviewers usually look for nice GUIs.
OpenTox API implementation challenges
Distributed team: 7 out of 11 partners developing software, with different experience and backgrounds.
Distributed system: efficient algorithms are the key; multi-threading is important; the amount of data transferred may affect performance; network connections may fail or be slow; the perceived latency (response time) needs optimizing.
Troubleshooting a set of interoperating web services is hard and requires close cooperation and devotion from multiple developers.
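Two points from the slides meet in client code: long-running calculations are handled by Task resources, and network connections may fail or be slow. The Python sketch below polls a task URI with exponential backoff and an overall timeout. It assumes, without taking this from the slides, that a task answers HTTP 202 while running and HTTP 200 with the result URI in the body when finished; `fetch` is a hypothetical transport function injected so the logic can be tested without a network.

```python
import time
from typing import Callable, Tuple

# Hypothetical transport: GET a URI and return (HTTP status, response body).
Fetch = Callable[[str], Tuple[int, str]]


def wait_for_task(
    fetch: Fetch,
    task_uri: str,
    timeout: float = 60.0,
    initial_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll a task until it completes and return the result URI.

    Assumed protocol: 202 = still running, 200 = done (body holds the result URI).
    Connection errors (OSError, which includes urllib's URLError) are retried.
    """
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        try:
            status, body = fetch(task_uri)
        except OSError:
            status, body = None, ""  # failed or dropped connection: try again later
        if status == 200:
            return body.strip()
        if status is not None and status != 202:
            raise RuntimeError(f"task {task_uri} failed with HTTP {status}")
        if clock() + delay > deadline:
            raise TimeoutError(f"task {task_uri} not finished after {timeout} s")
        sleep(delay)
        delay = min(delay * 2, max_delay)  # exponential backoff limits polling load


# Simulated service: one dropped connection, one "still running", then the result.
replies = iter([ConnectionResetError(), (202, ""), (200, "http://host/dataset/999\n")])


def fake_fetch(uri: str) -> Tuple[int, str]:
    reply = next(replies)
    if isinstance(reply, Exception):
        raise reply
    return reply


print(wait_for_task(fake_fetch, "http://host/task/1", sleep=lambda s: None))
# http://host/dataset/999
```

Injecting `sleep` and `clock` keeps the retry logic deterministic in tests; a real client would keep the defaults.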
OpenTox API implementation challenges: a steep learning curve
Many API changes at the early stage.
Learning REST web services: a new technology for all but one partner; REST framework and browser peculiarities, bugs and instabilities; what is RESTful, what is not, and why (and whether it matters).
Learning HTTP (who said it is a simple protocol?): pay attention to stream encoding, headers, error codes, redirection and many other details of the HTTP specification.
Selecting a solution for REST security.
Learning RDF: all OpenTox components are defined by http://opentox.org/api/1.1/opent...
A new technology for all partners.

RDF: lessons learned
OpenTox-specific: it did not start as a Linked Data/RDF project.
OpenTox uses RDF for serialization only, without mandatory backend triple storage; new resources and new triples are generated dynamically.
The mix of REST and RDF was not a popular choice back in 2009, but it is natural for enabling retrieval of partial resource representations described by triples (some issues are discussed in http://www.jcheminf.com/content/3/1/1...).
RDF is verbose, the libraries are memory hungry, and streaming support is lacking.
Steep learning curve, with some hard topics: the data model vs. the format; the subject-predicate-object concept vs. a tabular or hierarchical structure; recognising the added value over XML, JSON, YAML, plain text, etc.
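The contrast between a tabular structure and subject-predicate-object triples is easiest to see on a single table row. This Python sketch turns one row (column name to value) into triples and writes them as N-Triples; the vocabulary namespace `http://example.org/vocab#` is a placeholder, not the OpenTox ontology, and all values are written as plain literals for brevity.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, object]
VOCAB = "http://example.org/vocab#"  # placeholder vocabulary, not the OpenTox ontology


def row_to_triples(subject_uri: str, row: Dict[str, object]) -> List[Triple]:
    """One table row with N columns becomes N triples sharing the same subject."""
    return [(subject_uri, VOCAB + column, value) for column, value in row.items()]


def escape_literal(value: object) -> str:
    """Escape a value for use inside an N-Triples string literal."""
    text = str(value)
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def to_ntriples(triples: List[Triple]) -> str:
    """Serialize triples as N-Triples lines: <subject> <predicate> "literal" ."""
    return "\n".join(f'<{s}> <{p}> "{escape_literal(o)}" .' for s, p, o in triples)


row = {"name": "benzene", "logP": 2.13}
print(to_ntriples(row_to_triples("http://host/compound/1", row)))
# <http://host/compound/1> <http://example.org/vocab#name> "benzene" .
# <http://host/compound/1> <http://example.org/vocab#logP> "2.13" .
```

Even this two-column row needs two complete statements that each repeat the subject, which illustrates why RDF is called verbose on the slide.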
Algorithms (GET)
Algorithms for descriptor calculation, and for the generation and selection of features for the representation of chemicals (structure-based features, chemical and biological properties).
Classification and regression algorithms for the creation of (Q)SAR models.
Rule-based algorithms.
Algorithms for the aggregation of predictions from multiple (Q)SAR models and endpoints.
General-purpose algorithms, e.g. for visualization, similarity and substructure queries, applicability domain and read-across.
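Similarity queries are listed among the general-purpose algorithms. One common similarity measure for structural fingerprints is the Tanimoto coefficient: the size of the intersection divided by the size of the union of the two sets of on-bits. A minimal Python sketch, assuming the fingerprints are already computed and stored as sets of bit positions; the slides do not say which measure or fingerprint the OpenTox services use, and the function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # positions of the bits set to 1


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: |intersection| / |union|; 0.0 here for two empty sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_search(
    query: Fingerprint, library: Dict[str, Fingerprint], threshold: float = 0.7
) -> List[Tuple[str, float]]:
    """Return (compound URI, score) pairs at or above the threshold, best first."""
    hits = [(uri, tanimoto(query, fp)) for uri, fp in library.items()]
    return sorted((h for h in hits if h[1] >= threshold), key=lambda h: (-h[1], h[0]))


library = {
    "http://host/compound/1": frozenset({1, 2, 3, 4}),
    "http://host/compound/2": frozenset({1, 2, 3, 5}),
    "http://host/compound/3": frozenset({7, 8}),
}
print(similarity_search(frozenset({1, 2, 3, 4}), library, threshold=0.5))
# [('http://host/compound/1', 1.0), ('http://host/compound/2', 0.6)]
```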
Uniform approach to data processing: read data from a web address, process it, and write the results to a web address.
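The read-process-write pattern means that services can be chained by passing URIs: the result URI of one service becomes the `dataset_uri` input of the next, as in steps 5 and 7 of the crash course. A minimal Python sketch, assuming a hypothetical `post(service_uri, dataset_uri)` function that sends the POST and returns the URI of the result; the fake transport below only records the calls.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical transport: POST dataset_uri to a service and return the result URI.
Post = Callable[[str, str], str]


def run_pipeline(post: Post, dataset_uri: str, services: Sequence[str]) -> str:
    """Read from a URI, let each service process it, and pass on the URI it writes to."""
    current = dataset_uri
    for service_uri in services:
        current = post(service_uri, current)  # only URIs travel between the steps
    return current


calls: List[Tuple[str, str]] = []


def fake_post(service_uri: str, dataset_uri: str) -> str:
    calls.append((service_uri, dataset_uri))
    return f"http://host/dataset/result{len(calls)}"


final = run_pipeline(
    fake_post,
    "http://host/dataset/78",
    ["http://host/algorithm/8", "http://host/model/newLRmodel"],
)
print(final)  # http://host/dataset/result2
```

A real `post` would send the form-encoded request and, for long-running jobs, wait for the task to finish before returning the result URI.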

