In today’s world, companies’ key priorities have shifted to leveraging Artificial Intelligence, Cognitive Computing, Machine Learning, and Advanced Analytics to achieve business goals such as maximizing profit, reducing risk, improving regulatory compliance, deepening customer relationships, improving operational efficiency, and establishing market leadership.
Many areas of AI have now matured significantly and are no longer in an experimentation phase. We see more and more business functions becoming AI-driven in production. One of the main reasons for this trend is firms’ increased focus on strengthening their Data Engineering and setting up the right Data Backbone (using Big Data and Cloud).
Implementing effective AI / Machine Learning / Advanced Analytics solutions requires Data Scientists to work on huge volumes of cross-functional, multi-structured datasets: linking them together, performing exploratory analysis, and crafting the right features to feed into a model-building pipeline. In general, the quality of a model is directly proportional to the quality of the data used.
For example, consider a predictive model in an investment management firm that predicts a risk score for a company or asset, to help make the right investment decisions. This would require analyzing not only financial information (which often arrives late) but also signals from alternative datasets, such as the type of people who work at the firm, market news about the company or the sector it belongs to, and industry research documents.
If Data Scientists only start sourcing, analyzing, and leveraging such datasets at model-building time, the task becomes overwhelming: they end up spending most of their time extracting and managing the datasets rather than building the model. It is better to take a pragmatic approach and develop reusable pipelines that collect and organize such datasets in a governed manner, and that can also use Cognitive capabilities to extract information from multi-structured sources (PDFs, images, etc.). Data Scientists can then focus on leveraging the sourced data (pre-processed with cognitive technologies, validated, cleansed, profiled, and cataloged) to experiment and train models with their choice of languages, tools, IDEs, and frameworks, and deliver business value at a much faster pace.
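To make the idea of a reusable, governed ingestion pipeline concrete, here is a minimal sketch. All names (`validate`, `ingest`, `CatalogEntry`, the `market_news` dataset) are illustrative assumptions for this article, not part of any specific product; a real pipeline would add cognitive extraction, lineage, and persistence.

```python
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    """Minimal catalog record so downstream users can discover the dataset."""
    name: str
    row_count: int
    columns: list
    issues: list = field(default_factory=list)

def validate(records, required_fields):
    """Split records into valid rows and data-quality issues."""
    valid, issues = [], []
    for i, rec in enumerate(records):
        missing = [f for f in required_fields if rec.get(f) in (None, "")]
        if missing:
            issues.append(f"row {i}: missing {missing}")
        else:
            valid.append(rec)
    return valid, issues

def ingest(name, records, required_fields):
    """One governed pipeline run: validate, then profile and catalog."""
    valid, issues = validate(records, required_fields)
    columns = sorted({k for rec in valid for k in rec}) if valid else []
    entry = CatalogEntry(name=name, row_count=len(valid),
                         columns=columns, issues=issues)
    return valid, entry

# Example: ingesting a small batch of company news records.
raw = [
    {"company": "AcmeCorp", "headline": "Acme expands into new market"},
    {"company": "", "headline": "Unattributed story"},  # fails validation
]
clean, entry = ingest("market_news", raw, required_fields=["company", "headline"])
```

The point of the sketch is that validation, profiling, and cataloging run once per dataset, up front, so every Data Scientist who later discovers `market_news` in the catalog inherits cleansed data and a record of what was rejected.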
The following diagram shows a high-level reference architecture for a Modernized Data Platform that consumes AI-based cognitive services and also helps build new AI models and services. The left side of the architecture is Data Engineering heavy (extraction and transformation of streaming and batch datasets; validation, standardization, profiling, and cataloging; data-quality enhancement; lineage management; etc.).
The right side of the architecture is where ML Engineering, AI, and Data Science play a bigger role (feature engineering, exploratory analysis, model training, model governance, MLOps, model deployment, etc.).
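The right-hand stages can be sketched in a few lines as well. The toy threshold "risk model" below and every name in it (`engineer_features`, `register_model`, the debt-ratio feature) are assumptions for illustration only; in practice these steps would use a proper ML framework and a model registry.

```python
import statistics
from datetime import datetime, timezone

def engineer_features(company):
    """Turn raw company attributes into numeric features (illustrative)."""
    return {
        "debt_ratio": company["debt"] / max(company["assets"], 1),
        "news_sentiment": statistics.mean(company["sentiment_scores"]),
    }

def train_threshold_model(feature_rows, labels):
    """Toy risk model: flag a company as risky if its debt ratio reaches
    the mean debt ratio of companies labelled risky in training data."""
    risky = [f["debt_ratio"] for f, y in zip(feature_rows, labels) if y == 1]
    return {"threshold": statistics.mean(risky)}

def predict(model, features):
    return 1 if features["debt_ratio"] >= model["threshold"] else 0

def register_model(model, name):
    """Minimal governance record of the kind an MLOps registry would keep."""
    return {"name": name, "params": model,
            "trained_at": datetime.now(timezone.utc).isoformat()}

# Example run: two labelled companies, train, then register the model.
train = [
    ({"debt": 80, "assets": 100, "sentiment_scores": [-0.4]}, 1),  # risky
    ({"debt": 10, "assets": 100, "sentiment_scores": [0.5]}, 0),   # safe
]
rows = [engineer_features(c) for c, _ in train]
model = train_threshold_model(rows, [y for _, y in train])
record = register_model(model, "company_risk_v0")
```

The governance record is the piece that ties back to MLOps: every trained model carries its parameters and training timestamp, so deployments stay auditable.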
At Coforge, we have successfully implemented such end-to-end architectures at scale and delivered real business value to many of our customers. We recommend taking an incremental, MVP-based approach to implementing such architectures, and Coforge can accelerate the overall journey using our Data Engineering and AI/Cognitive accelerators.