The Trends of De Novo Molecular Designs in the Twenty-First Century: A Mini-Review

The emergence of advanced bioactive agents has driven growth in sustained drug delivery and a boom in new medicines. The future of medical and chemical biology relies on the amalgamation of advanced systematic and analytical techniques, tethered together by a robust theoretical framework. De novo drug design is one such exciting strategy: it uses computational methods to generate novel molecules with good affinity for a desired biological target. This mini-review provides a basic overview of the current trends and algorithms that aid the advancement of de novo molecular design.


Social Science, Humanities and Sustainability Research, Vol. 1, No. 1, 2020. Published by SCHOLINK INC. www.scholink.org/ojs/index.php/sshsr

Introduction
Recent advances in machine learning, coupled with data science, have enabled researchers to navigate massive amounts of data and to address various problems with accuracy and precision (He, Zhang, Ren, & Sun, 2016; Silver et al., 2017). Moreover, data-driven methods have been shown to reduce the time a process takes to complete, thereby increasing overall efficiency. In particular, over the last couple of years, significant advances have been made in designing superior de novo molecules via machine learning. The ultimate goal of a de novo molecular framework is to develop, by computational means, a product whose physicochemical properties meet arbitrary given requirements (Green et al., 2017; Tabor, Rosch, & Guzik, 2018). However, most recent studies focus solely on the computational side of designing molecular frameworks, leaving various gaps in experimental verification and physical interpretation. For instance, in polymer informatics, structural complexity and molecular dynamics are often the major obstacles to reconciling theoretical and experimental results (Ramprasad, Batra, Pilania, Mannodi-Kanakkithodi, & Kim, 2017; Audus & de Pablo, 2017). Conventional molecular design relies mainly on fragment-based methods (for instance, RECAP) (Figure 1), which aim to build molecules by tethering known fragments. However, issues such as patentability and limited structural diversity are significant setbacks for these techniques in practice. To address these challenges, de novo molecular generation methods were developed, which need no such fragments to frame the algorithms.
These methods use Simplified Molecular Input Line Entry System (SMILES) strings, along with black-box optimization and deep neural networks, to generate candidate molecules (Schneider & Fechner, 2005; Lewell, Judd, Watson, & Hann, 1998).
This brief review aims to provide a fundamental outline of recent developments in de novo molecular design and its areas of application in biomedicine, sustainable development, and high-performance processing operations.

The Current Trends of De Novo Molecular Designs
The goal of developing such advanced materials, especially in the biomedical industry, is to incorporate desired and precise properties into drugs. However, integrating the desired properties requires optimizing a large number of process variables, which makes this one of the fastest-growing fields of science (Shoichet, 2004; Scior et al., 2012; Author, 2012). One reported approach designs molecular frameworks with a conditional variational autoencoder (Figure 2). The approach skips the high-throughput virtual screening step and uses deep-learning-based generative models directly to produce molecules with specific target properties. A condition vector is incorporated into the model, which regulates the target properties simultaneously under a given set of conditions (Lim et al., 2018). The authors demonstrated that five target properties (molecular weight, MW; partition coefficient, LogP; number of hydrogen bond donors, HBD; number of hydrogen bond acceptors, HBA; and topological polar surface area, TPSA) could be controlled within an error range of 10%. Moreover, a single target property can be tuned without disturbing the others, which makes the algorithm a robust technique for designing molecular platforms (Lim et al., 2018).
With the advent of various organo-metallic frameworks, polymers and polymer hybrids have emerged as potent candidates for applications in solar cells, organic light-emitting diodes, conductors, sensors, and ferroelectrics (Niu, Guo, & Wang, 2015; Kaji et al., 2015; Ueda et al., 2014; Yeung & Yam, 2015). Designing such organic frameworks often involves optimizing candidate designs and anticipating their drawbacks in order to select the most robust structure from the "chemical space" for a given environment (Kaji et al., 2015).
[...] network, which accelerated the development of SMILES. The group generated a large number of data points and then used black-box optimization to choose the high-performing molecules.
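The role of the condition vector in the conditional variational autoencoder described above can be illustrated with a small sketch. It is not the model of Lim et al. (2018): the `Condition` class, the `decoder_input` and `within_tolerance` helpers, the placeholder latent vector, and all numeric values are hypothetical. The sketch assumes that conditioning is done by concatenating the condition vector with the latent vector, and it reads the "error range of 10%" as a relative tolerance on each property.

```python
from dataclasses import dataclass, astuple, replace


@dataclass(frozen=True)
class Condition:
    """Five target properties named in the text; the values used are made up."""
    mw: float    # molecular weight
    logp: float  # partition coefficient
    hbd: float   # number of hydrogen bond donors
    hba: float   # number of hydrogen bond acceptors
    tpsa: float  # topological polar surface area


def decoder_input(latent, condition):
    """Concatenate a latent vector with the condition vector (assumed scheme)."""
    return list(latent) + list(astuple(condition))


def within_tolerance(target, achieved, rel_tol=0.10):
    """True if each achieved property lies within rel_tol of its target.

    The tolerance is relative, so a target of exactly zero needs an exact match.
    """
    return all(abs(a - t) <= rel_tol * abs(t)
               for t, a in zip(astuple(target), astuple(achieved)))


target = Condition(mw=300.0, logp=2.5, hbd=2.0, hba=4.0, tpsa=60.0)
latent = [0.1, -0.3, 0.7]              # placeholder latent code
x = decoder_input(latent, target)      # 3 latent values followed by 5 conditions

# Tune a single property; the other four entries of the condition stay fixed.
shifted = replace(target, logp=3.5)
```

In a trained model the decoder would receive `x` and emit a SMILES string; here only the bookkeeping around the condition vector is shown.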
In this context, Tsuda and colleagues reported a novel Python library (ChemTS) that bridges de novo molecular design with materials science. The set of SMILES strings is represented as a search tree in which the ith level corresponds to the ith symbol (Yang, Zhang, Yoshizoe, Terayama, & Tsuda, 2017). A SMILES string is considered complete when the path from the root reaches a terminal node. The search tree, which starts from a root node, is explored by Monte Carlo tree search (a randomized best-first search method), which builds a shallow tree and extends its paths downstream via rollouts (Figure 3). Incorporating Monte Carlo tree search allowed the library to perform more efficiently than conventional systems (creating 40 molecules per minute). Moreover, the authors believe that this technique may be well suited to alloy design, which involves combining multiple variables to achieve numerous desirable properties (Yang, Zhang, Yoshizoe, Terayama, & Tsuda, 2017).
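The tree search described above can be sketched in a few lines. This is a toy illustration, not ChemTS code or its API: the four-symbol alphabet, the `$` end symbol, the length cap, the uniform random rollout, and the `toy_reward` function are all assumptions made so that the example runs without a chemistry toolkit. Level i of the tree holds the ith symbol, and each iteration performs selection, expansion, rollout, and backpropagation.

```python
import math
import random

ALPHABET = ["C", "N", "O", "$"]  # "$" is a hypothetical end-of-string symbol
END = "$"
MAX_LEN = 8                      # assumed cap on string length


class Node:
    def __init__(self, prefix="", parent=None):
        self.prefix = prefix     # symbols chosen on the path from the root
        self.parent = parent
        self.children = {}       # symbol -> Node
        self.visits = 0
        self.total_reward = 0.0


def is_complete(s):
    """A string is complete once it ends with END or reaches MAX_LEN."""
    return s.endswith(END) or len(s) >= MAX_LEN


def toy_reward(s):
    """Hypothetical score in [0, 1]: the share of positions that are carbon."""
    return s.rstrip(END).count("C") / MAX_LEN


def ucb_child(node, c=1.4):
    """Pick the child with the highest UCB1 score."""
    return max(node.children.values(),
               key=lambda ch: ch.total_reward / ch.visits
               + c * math.sqrt(math.log(node.visits) / ch.visits))


def mcts(iterations, rng):
    root, best = Node(), ("", -1.0)
    for _ in range(iterations):
        node = root
        # Selection: descend while the node is fully expanded.
        while not is_complete(node.prefix) and len(node.children) == len(ALPHABET):
            node = ucb_child(node)
        # Expansion: add one untried symbol below the selected node.
        if not is_complete(node.prefix):
            symbol = rng.choice([s for s in ALPHABET if s not in node.children])
            node.children[symbol] = Node(node.prefix + symbol, parent=node)
            node = node.children[symbol]
        # Rollout: finish the string with uniformly random symbols.
        s = node.prefix
        while not is_complete(s):
            s += rng.choice(ALPHABET)
        reward = toy_reward(s)
        if reward > best[1]:
            best = (s, reward)
        # Backpropagation: update statistics on the path back to the root.
        while node is not None:
            node.visits += 1
            node.total_reward += reward
            node = node.parent
    return root, best


root, (best_string, best_reward) = mcts(200, random.Random(0))
```

A practical generator would replace the random rollout with a learned model and the toy reward with a property calculation; the four-phase loop stays the same.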

[Figure 3. Copyright Taylor and Francis, 2017.]
In another instance, Tsuda and his research group developed a molecule generator (ChemGE) that can produce many molecules with the help of parallel computation (Yoshikawa, Terayama, Honma, Oono, & Tsuda, 2018). The working principle of ChemGE differs slightly from that of conventional strategies. Its population-based grammatical evolution approach optimizes the number of unique molecules generated (Yoshikawa, Terayama, Honma, Oono, & Tsuda, 2018). Grammatical evolution operates on a population to optimize a set of strings defined by a context-free grammar. Such population-based evolutionary methods have been regaining popularity for solving black-box optimization problems, such as hyperparameter optimization and neural network design, because of their inherent concurrency. Compared with programs that operate directly on SMILES, the mutation operation in ChemGE increases the probability of generating a larger number of molecules.
It is true that SMILES inherently offers an organized way to describe the molecular graph; however, a linear representation of a molecular graph has various limitations (Yoshikawa, Terayama, Honma, Oono, & Tsuda, 2018).
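The grammatical evolution idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not ChemGE itself: the two-rule grammar, the `toy_fitness` score, the rule that an exhausted genome falls back to the first production, and the elitist selection scheme (keep the best `mu` of parents plus children) are all hypothetical choices. Each genome is a list of integer codons, and each codon picks a production rule for the leftmost non-terminal.

```python
import random

# Toy context-free grammar (hypothetical; far smaller than a real SMILES grammar).
GRAMMAR = {
    "<chain>": [["<atom>"], ["<atom>", "<chain>"]],
    "<atom>": [["C"], ["N"], ["O"]],
}


def decode(genome):
    """Map integer codons to a string by leftmost derivation.

    When the codons run out, rule 0 is used, which always terminates here.
    """
    stack, out, pos = ["<chain>"], [], 0
    while stack:
        symbol = stack.pop()
        if symbol not in GRAMMAR:        # terminal symbol: emit it
            out.append(symbol)
            continue
        rules = GRAMMAR[symbol]
        codon = genome[pos] if pos < len(genome) else 0
        pos += 1
        # Push the chosen rule reversed so its first symbol is expanded next.
        stack.extend(reversed(rules[codon % len(rules)]))
    return "".join(out)


def mutate(genome, rng):
    """Return a copy of the genome with one codon replaced at random."""
    child = list(genome)
    child[rng.randrange(len(child))] = rng.randrange(256)
    return child


def toy_fitness(s):
    """Hypothetical score: reward oxygen atoms, penalize length."""
    return s.count("O") - 0.1 * len(s)


def evolve(mu, lam, generations, genome_len, rng):
    population = [[rng.randrange(256) for _ in range(genome_len)]
                  for _ in range(mu)]
    history = []
    for _ in range(generations):
        children = [mutate(rng.choice(population), rng) for _ in range(lam)]
        # Keep the best mu individuals among parents and children.
        population = sorted(population + children,
                            key=lambda g: toy_fitness(decode(g)),
                            reverse=True)[:mu]
        history.append(toy_fitness(decode(population[0])))
    return population, history


population, history = evolve(mu=10, lam=20, generations=15, genome_len=12,
                             rng=random.Random(0))
```

The children of one generation are independent of each other, so their fitness evaluations could be distributed across cores, which is the concurrency the text refers to.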
Furthermore, mutation operations in evolution ensure broad diversity throughout the optimization process. In a benchmark based on a drug-likeness score, ChemGE yielded a higher number of molecules than the compared computational techniques, which include deep-learning-based approaches. Running in parallel on 32 cores, ChemGE generated 189 molecules in 26 hours whose docking scores were better than that of the best molecule in a database (DUD-E) (Figure 4) (Yoshikawa, Terayama, Honma, Oono, & Tsuda, 2018).
Waller et al. reported that long short-term memory based recurrent neural networks can potentially be applied as statistical chemical language models (Segler, Kogej, Tyrchan, & Waller, 2017). [...] physicochemical properties, this algorithm can generate a higher number of new molecules, which can further be extended to virtual screening. The model could be adapted through transfer learning when fine-tuned on smaller sets of molecules active against a specific target. Coupled with robotic synthesis and biological testing, the program could potentially run the design process autonomously, since the language model can be regenerated over multiple iterations. One of the prime advantages of the system is that it provides a robust framework for addressing various molecular generation approaches. Moreover, the model combines structure generation and optimization, thus acting as a dual-purpose algorithm. Although interpretability is one of the significant drawbacks of the system, the small remaining step of casting molecule generation as a reinforcement learning problem is a development to look forward to (Segler, Kogej, Tyrchan, & Waller, 2017).
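The notion of a statistical chemical language model that is first trained broadly and then fine-tuned on a small focused set can be shown with a deliberately simple stand-in. The sketch below uses a character bigram model in place of the long short-term memory network, so it is not the method of Segler et al. (2017). The tiny corpus, the `^` and `$` boundary symbols, and the `weight` used during fine-tuning are assumptions, and sampled strings are not guaranteed to be valid SMILES.

```python
import random
from collections import Counter, defaultdict

START, END = "^", "$"  # hypothetical boundary symbols


def count_bigrams(smiles_list):
    """Count how often each character follows each other character."""
    counts = defaultdict(Counter)
    for s in smiles_list:
        seq = START + s + END
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return counts


def fine_tune(counts, focused, weight=5):
    """Add up-weighted counts from a small focused set (a crude stand-in for
    transfer learning)."""
    for a, followers in count_bigrams(focused).items():
        for b, n in followers.items():
            counts[a][b] += weight * n
    return counts


def sample(counts, rng, max_len=40):
    """Generate one string symbol by symbol from the bigram statistics."""
    out, current = [], START
    while len(out) < max_len:
        symbols, weights = zip(*counts[current].items())
        current = rng.choices(symbols, weights=weights)[0]
        if current == END:
            break
        out.append(current)
    return "".join(out)


general = ["CCO", "CCN", "CCCO", "CC(=O)O"]   # toy "general" corpus
model = count_bigrams(general)
model = fine_tune(model, ["CCN", "CN"])       # toy "target-focused" set
generated = [sample(model, random.Random(seed)) for seed in range(5)]
```

After fine-tuning, nitrogen follows carbon more often than in the general model, which mirrors in miniature how a focused set shifts what the generator produces.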

Conclusions
Chemistry is the language of nature, and it evolves every day to improve our way of living. In medicinal chemistry, however, designing drugs that ensure our well-being is challenging (Whitesides, 2015). One of the many challenges in drug design is the vast size of the search space for novel molecules. From a large pool of synthetic chemicals, it is arduous to pick, in a short period, a drug with specific functionality for targeted treatment. To address this challenge, modern high-throughput screening techniques allow part of this molecular space to be tested in the laboratory every day. However, the larger the space, the higher the number of trials to be conducted, and hence the less cost-efficient the process becomes. Computational science therefore evolved to combine theory with experiment and narrow down the search space. De novo drug design is one of the evolving technologies that may disrupt the medical industry once it matures, owing to its potential to produce active molecules for tailor-made biological applications.

Conflict of Interest
Sayan Basak declares that he has no conflict of interest.

Human/Animal Rights
This article does not contain any studies with human or animal subjects performed by any of the authors.