The eupolicy.social admin @admin

**Blue Obelisk** @blueobelisk@fosstodon.org · 1d

Oscar4 5.3.0 has been released: https://github.com/BlueObelisk/oscar4/releases/tag/5.3.0

"OSCAR (Open Source Chemistry Analysis Routines) is an open source extensible system for the automated annotation of chemistry in scientific articles. It can be used to identify chemical names, reaction names, ontology terms, enzymes and chemical prefixes and adjectives, and chemical data such as state, yield, IR, NMR and mass spectra and elemental analyses."

GitHub: Release 5.3.0 · BlueObelisk/oscar4: [maven-release-plugin] copy for tag 5.3.0

#cheminformatics #chemistry

**Noel O'Boyle** @baoilleach@mstdn.science · 3d

Blog post on "A new job, a postdoc opportunity, an open biological curator role, and a user group meeting"

https://baoilleach.blogspot.com/2025/07/a-new-job-postdoc-opportunity-open.html

baoilleach.blogspot.com: A new job, a postdoc opportunity, an open biological curator role, and a user group meeting. Almost exactly 6 months ago, I took over the leadership of the Chemical Biology Services team at EMBL-EBI, Hinxton, UK. This is the team th...

#cheminformatics #chemjobs #chempostdocs

Continued thread

**Egon Willighagen** @egonw@mastodon.social · 4d *

the preprint argues that a flood of mediocre articles is (partly) caused by open data making it easy to do all sorts of analyses of these.

It seems to me that the observation is correct. I have had this impression in #cheminformatics too, and, indeed, running an analysis on a data set like Tox21, ChEMBL, etc., has been getting increasingly easy.

But I do not think that #openscience and #FAIR are the problem, nor restricting access the solution.

I have some ideas about the underlying problem:

**Egon Willighagen** @egonw@social.edu.nl · 5d

as Marshall College Professor Jones could have said: "This belongs in a database!" https://doi.org/10.1021/acsomega.4c10413

Well, the first pKa column and all arsonic acids are now in @wikidata :)

#openscience #chemistry #cheminformatics

**The BridgeDb Project** @bridgedb@fosstodon.org · Jul 28

new BridgeDb Datasources release: https://github.com/bridgedb/datasources/releases/tag/20250728

"Uses UniProtKB as name, removes EcoGene, and multiple small updates"

BridgeDb Datasources is a dataset with metadata about data sources and organisms used by BridgeDb Java and downstream tools like @wikipathways, PathVisio, and others

GitHub: Release 20250728 · bridgedb/datasources. Uses UniProtKB as name, removes EcoGene, and multiple small updates. Full Changelog: 2024092...2025072

#bioinformatics #cheminformatics

**Egon Willighagen** @egonw@social.edu.nl · Jul 26

chemical compound identifiers in @wikidata: https://edu.nl/xb8y7

#chemistry #wikidata #cheminformatics

**Kohulan Rajan** @Kohulan@mastodon.social · Jul 24

New Preprint Alert!

We're excited to share our latest work on #ChemRxiv! MARCUS (Molecular Annotation and Recognition for Curating Unravelled Structures) is a web-based platform for extracting chemical information from scientific papers.

Preprint: https://doi.org/10.26434/chemrxiv-2025-9p1q1

Try it out: https://marcus.decimer.ai

ChemRxiv: MARCUS: Molecular Annotation and Recognition for Curating Unravelled Structures

The exponential growth of chemical literature necessitates the development of automated tools for extracting and curating molecular information from unstructured scientific publications into open-access chemical databases. Current optical chemical structure recognition (OCSR) and named entity recognition solutions operate in isolation, which limits their scalability for comprehensive literature curation. Here we present MARCUS (Molecular Annotation and Recognition for Curating Unravelled Structures), a tool to aid curators in performing literature curation in the field of natural products. This integrated web-based platform combines automated text annotation, multi-engine OCSR, and direct submission capabilities to the COCONUT database. MARCUS employs a fine-tuned GPT-4 model to extract chemical entities and utilises an ensemble approach integrating DECIMER, MolNexTR, and MolScribe for structure recognition. The platform aims to streamline the data extraction workflow from PDF upload to database submission, significantly reducing curation time. MARCUS bridges the gap between unstructured chemical literature and machine-actionable databases, enabling FAIR data principles and facilitating AI-driven chemical discovery. Through open-source code, accessible models, and comprehensive documentation, the web application enhances accessibility and promotes community-driven development. This approach facilitates unrestricted use and encourages the collaborative advancement of automated chemical literature curation tools. We dedicate MARCUS to Dr Marcus Ennis, the longest-serving curator of the ChEBI database, on the occasion of his 75th birthday.

#Cheminformatics #OpenScience #ChemicalDatabases

**Egon Willighagen** @egonw@social.edu.nl · Jul 23

ha, I managed to convert a CXSMILES in @wikidata via the MDL V2000 molfile and the #inchi webtool into an #InChI (1B) and InChIKey :) https://www.wikidata.org/wiki/Q66421202#P117

#cheminformatics #polymers

**Egon Willighagen** @egonw@social.edu.nl · Jul 23

heading to the 2nd day of the Technical InChI Meeting in Aachen/Germany

#cheminformatics #inchi

**Egon Willighagen** @egonw@social.edu.nl · Jul 20

heading tomorrow to the InChI Technical Exchange Meeting Summer 2025 in Aachen/DE

Looking forward to it, and particularly talking about the InChI for inorganics and trying that in @wikidata :) See https://doi.org/10.26434/chemrxiv-2025-53n0w

And also the nano InChI, see https://doi.org/10.3390/nano10122493

ChemRxiv: Scholia Chemistry: access to chemistry in Wikidata

Sharing knowledge on chemicals in the digital age has been the playground of databases such as the Chemical Abstract Services and PubChem. Wikipedia complements this field by providing context to chemicals aimed at a broad audience, but is not easily read by machines. Wikidata was started as a database service to improve the machine readability of the knowledge captured in Wikipedia. Wikidata has an open license, application programming interfaces, and a strong provenance model. Scholia uses the features to provide access to chemical knowledge. This study reviews the chemistry in Wikidata, shows how thousands of new chemicals were added, extends Wikidata with new properties for chemical representation and external links to additional databases, and shows how we extended Scholia to represent the chemistry in Wikidata.

#nanoinformatics #cheminformatics #inchi

**Egon Willighagen** @egonw@social.edu.nl · Jul 20

updating some Java #cheminformatics libraries... Euclid and CMLXOM, and possibly making a Bacting release... but need to continue working on the CDK Depiction book chapter too...

**Egon Willighagen** @egonw@social.edu.nl · Jul 16

oh boy... finally getting there... so close to finalizing and releasing a new SMARTCyp?

See for context https://chem-bla-ics.linkedchemistry.info/2024/04/07/cdk2024.html and https://chem-bla-ics.linkedchemistry.info/2024/06/16/cdk2024-3.html #cdk2024

#opensource #cheminformatics

**Egon Willighagen** @egonw@social.edu.nl · Jul 16

did you notice I started @Codeberg ?

#opensource #cheminformatics

**Egon Willighagen** @egonw@social.edu.nl · Jul 15

# Qleverfile for PubChem, use with the
# QLever CLI (`pip install qlever`)
#
# qlever get-data # ~2 hours, ~120 GB, ~19 billion triples
# qlever index # ~6 hours, ~20 GB RAM, ~350 GB disk space (for the index)
# qlever start # a few seconds

Source: https://github.com/ad-freiburg/qlever-control/blob/main/src/qlever/Qleverfiles/Qleverfile.pubchem
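
Once `qlever start` has a server running, it can be queried like any SPARQL endpoint. A minimal sketch, assuming the server listens at `http://localhost:7023/` (a placeholder; use the port from your own Qleverfile) and accepts standard SPARQL 1.1 Protocol GET requests. The names `ENDPOINT`, `COUNT_QUERY`, `build_request`, and `run_query` are made up for this illustration and are not part of QLever:

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: placeholder address; take the real port from your Qleverfile.
ENDPOINT = "http://localhost:7023/"

# Counts all triples in the index, a quick sanity check after indexing.
COUNT_QUERY = "SELECT (COUNT(*) AS ?triples) WHERE { ?s ?p ?o }"


def build_request(query: str, endpoint: str = ENDPOINT) -> Request:
    """Build a SPARQL 1.1 Protocol GET request asking for JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def run_query(query: str, endpoint: str = ENDPOINT) -> dict:
    """Send the query and return the parsed SPARQL JSON results."""
    with urlopen(build_request(query, endpoint), timeout=60) as response:
        return json.load(response)


if __name__ == "__main__":
    # Requires a running server; prints the triple count reported by it.
    result = run_query(COUNT_QUERY)
    print(result["results"]["bindings"][0]["triples"]["value"])
```

If the index built correctly, the count should be in the same range as the triple count quoted in the Qleverfile comments.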

#cheminformatics

**Egon Willighagen** @egonw@social.edu.nl · Jul 13 *

I think I am going to try to recover a bit of #cheminformatics / #chemistry #history, and make the index of the Internet Journal of Chemistry (IJC) FAIR in @wikidata

While the journal no longer exists, many articles are cited quite a few times.

I did some exploration some time ago, and for some I found full text "self-archiving" versions online.

And, TIL that Web of Science has entries for the articles too, which I just added for the 9 articles already in #Wikidata: https://w.wiki/Eide