Jun 28 · Fun with Data Science · Preface: It’s been a long time since we started publishing stories on data science and the latest technologies, and we have achieved a lot on this platform. Plus, life gets boring if you keep working seriously without taking a break. This time, we have something that will bring a smile to your… · Data Science · 2 min read
Published in INSAID · Mar 24 · Linear Regression with PySpark · By Hiren Rupchandani and Abhinav Jangir · In our previous article, we performed a basic EDA using PySpark. Now let’s try implementing a linear regression model and make some predictions. You can find the corresponding notebook here. Before we jump to linear regression, we also need to process our data and… · Linear Regression · 4 min read
Mar 24 · Deep dive into Big Data: Hadoop (Part 1) · by Pronay Ghosh and Hiren Rupchandani · Around 90% of the world’s data was created in the last two years, according to estimates. Furthermore, 80% of the data is unstructured or available in a variety of forms, making analysis challenging. You now have an idea of how much data was… · Big Data · 5 min read
Mar 23 · Building conversational AI chatbots with Amazon Lex (Part-1) · by Pronay Ghosh and Hiren Rupchandani · In the previous article, we learned in depth about Amazon Augmented AI solutions. We learned that data in any format, both structured and unstructured, is fed into Amazon’s proprietary AI services. We also learned that models are created according to standard protocol and deployed… · Big Data · 5 min read
Mar 22 · EDA with PySpark · By Hiren Rupchandani and Abhinav Jangir · PySpark is an interface for Apache Spark in Python. It not only allows you to write Spark applications using Python APIs but also provides the PySpark shell for interactively analyzing your data in a distributed environment. PySpark supports most of Spark’s features such as… · Pyspark · 7 min read
Mar 15 · Distributed Systems (Part -2) · by Pronay Ghosh and Hiren Rupchandani · In the previous article, we had a high-level overview of distributed systems. We saw why data is so important. We learned that big data is a large amount of diversified information that is arriving in ever-increasing volumes and at ever-increasing speeds. Big data… · Distributed Systems · 5 min read
Mar 14 · Amazon Augmented AI in depth · by Pronay Ghosh and Hiren Rupchandani · The fundamentals of Amazon A2I were discussed in the previous article. As we discovered, some machine learning applications require human supervision. This is done to ensure the accuracy of sensitive data, to provide ongoing improvements, and to retrain models with new predictions. In these… · AWS · 5 min read
Mar 13 · The Tidbits of Apache Spark · By Ashish Lepcha and Hiren Rupchandani · Since it began as a research project at the AMPLab at U.C. Berkeley in 2009, Apache Spark has become one of the key big data distributed processing frameworks in the world. It can be deployed in numerous ways, supports Java, Scala, Python, and R programming… · Spark · 10 min read
Mar 10 · Introduction to Distributed Systems (Part -1) · by Pronay Ghosh and Hiren Rupchandani · A distributed system is a computing environment in which diverse components are dispersed across a network of computers (or other computing devices). These devices split up the work and coordinate their efforts to complete the task more quickly than if it had been assigned… · Distributed Systems · 4 min read
Mar 9 · Introduction to Amazon Augmented AI (A2I) - A Bird View · by Pronay Ghosh and Hiren Rupchandani · In the previous article, we learned how to build, train, and deploy a machine learning model using Amazon SageMaker. Human oversight is required for some machine learning applications. This is done in order to verify the accuracy of sensitive data, provide continual improvements, and… · AWS · 5 min read