A lack of results can be a result too, as chemists from Delft show. In Chemical Science, they publish one of the largest datasets for a model rhodium-catalysed hydrogenation reaction, a dataset that turned out to reveal surprisingly little.

For half a century, rhodium catalysts have been used for the enantioselective hydrogenation of alkenes. Countless articles have been published on this type of catalyst and the underlying reaction mechanism, so one might expect that there is a well-developed understanding of these catalytic reactions. But there is no streamlined way to quickly select the right ligands for your homogeneous catalyst when changing substrates. Adarsh Kalikadien, Evgeny Pidko and colleagues at TU Delft and Janssen Pharmaceutica wanted to see if they could use machine learning to develop a predictive model for this, but the project turned out differently than expected.

‘The idea was not that complicated’, says Kalikadien, a PhD student in Pidko’s group. ‘We set up a simple model reaction using a very well-known rhodium catalyst. The aim was to use this to build statistical models to predict which catalysts and ligands you can use, which would lead to less trial and error.’ They unleashed several machine learning models on a combination of a computational dataset and Janssen’s high-throughput experiments.

Random

One of the things the team did was to compare the performance of these models. Kalikadien: ‘We calculated all kinds of properties: quantum-chemical ones, which require the most intensive and expensive calculations, as well as 2D cheminformatics and 1D representations. These properties are different representations of the catalyst as a model would see it.’ As a control, the team also added a representation consisting of 34 random numbers between -100 and 100. ‘The bizarre thing was that all the simpler models, even the random one, performed just as well as the expensive one, so the representation turned out to be completely uninformative.’


Something that is not reflected in the paper, but which influenced the project, was a small oversight within the team. ‘On the computer, you draw the 3D structure of the catalyst you have tested under certain conditions. You then do DFT calculations on it and extract the properties’, says Kalikadien. ‘We were using the CAS numbers of the ligands. But what we did not realise was that our CAS numbers and the drawings on the vials in the lab did not match our 3D structures.’

‘For months we discussed the properties with the team and made improvements, and in the end we had really good calculations at a high level of theory’, the PhD student continues. ‘But at one meeting it turned out that the calculated structures did not correspond to the correct identifiers in the experimental data! So we had to go through all these structures one by one to see where things had gone wrong. To our surprise, once we had processed the right molecules and built a new statistical model, we got almost exactly the same results.’ One of the conclusions was therefore that, with this out-of-domain modelling approach, it does not matter what you put in: an indication that the model was not learning much from the given representation. ‘We can laugh about it now, but during the project it cost me some of my mental well-being’, he says with a smile.

Valuable

This was supposed to be a simple side project, but it did not go as expected. ‘I myself found many of the results a little disappointing’, Kalikadien admits. ‘Nevertheless, the research, and above all the data it generated, proved valuable, especially in light of advances in machine learning. So we made everything open source. Not only can all the data be viewed, but we also offer the code, including packages and manuals, so that anyone who wants to can do the same kind of research.’

In this way, they published one of the largest datasets for this particular type of hydrogenation reaction. Getting it published was still a challenge, however. ‘It was a very deep study of how machine learning works in chemistry, and not all the conclusions were positive. This led to a high-profile journal rejecting the article because they felt it “didn’t belong here”. Fortunately, Chemical Science was more open to it, so we were able to publish our data, code and even interactive figures there.’

Meaningful

What’s next? ‘Our representation was not as meaningful as we had hoped, so we are now looking for a representation of the catalyst that is perhaps a little less simplified, but still as simple as possible’, says Kalikadien. ‘You also don’t want the cost to be too high, so we are trying to get more information from the reaction mechanism into the model without making it too complicated. In other words, a dynamic version of the representation.’

Kalikadien, A.V. et al. (2024) Chem. Sci., DOI: 10.1039/D4SC03647F
