ML Application - Car Price Prediction

What else can be done by the people whose job is to extract useful information from data, as I described in the previous post? In this article, I will try to explain what data science, artificial intelligence, and machine learning mean in terms of the products that come out of data, using a real-life example.

[Image: artificial intelligence, machine learning, virtual brain]

Image from PxHere

How Much Is My Car Worth?

The data needed to develop a car-price prediction model is usually stored in relational databases or, if it is too large for them, in big data environments. To model this data, we move it from these sources into an environment such as Python or R, and the data analytics process begins there.
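As a minimal sketch of moving data from a relational database into Python, the snippet below uses Python's built-in sqlite3 module with an in-memory database. The table name `cars`, its columns, and the two sample rows are hypothetical, chosen only for illustration; a real site would connect to its own database instead.

```python
import sqlite3

def load_cars(conn):
    """Read every row of the (hypothetical) cars table into a list of dicts."""
    conn.row_factory = sqlite3.Row  # rows can then be accessed by column name
    rows = conn.execute(
        "SELECT brand, model, km, age, gear, damaged, price "
        "FROM cars ORDER BY rowid"
    ).fetchall()
    return [dict(row) for row in rows]

# Build a tiny in-memory database so the example is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cars (brand TEXT, model TEXT, km INTEGER, age INTEGER, "
    "gear TEXT, damaged INTEGER, price REAL)"
)
conn.executemany(
    "INSERT INTO cars VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("A", "A3", 40000, 3, "Automatic", 0, 30000.0),
        ("B", "B1", 120000, 9, "Manual", 1, 9000.0),
    ],
)

cars = load_cars(conn)
```

After this step, `cars` is an ordinary Python list with one dictionary per car, which is the spreadsheet-like shape the rest of the analysis works with.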

Since the goal here is to create a car-price prediction model, this work is done with statistical prediction models or machine learning techniques. Machine learning algorithms essentially produce a function that maps inputs to an output. In this example, I consider a car sales website, so let's take a look at how such a model is built as part of the system integration process.

Big data structures can be thought of as an Excel spreadsheet. Here, we have some data about the cars. When the features of a car are entered, we want the model we created to give us a price prediction. The data should be prepared before proceeding to the modeling phase, because the data set may contain structural defects. For example, if we know that the A3 model of brand A is not available with a manual gearbox, records that say otherwise should be corrected in the data set. After making these corrections, we can start creating the vehicle price prediction model.
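The kind of correction described above can be sketched as a simple rule check. The rule set, the field names, and the choice to rewrite the value to "Automatic" are all assumptions made for this illustration; in practice the right fix depends on what is known about the data.

```python
# Hypothetical domain rule: brand A's A3 model is never sold with manual gear.
AUTOMATIC_ONLY = {("A", "A3")}

def fix_gear(records):
    """Return a copy of the records with impossible gear values corrected."""
    cleaned = []
    for rec in records:
        rec = dict(rec)  # copy so the input list is left untouched
        is_automatic_only = (rec["brand"], rec["model"]) in AUTOMATIC_ONLY
        if is_automatic_only and rec["gear"] != "Automatic":
            rec["gear"] = "Automatic"
        cleaned.append(rec)
    return cleaned

raw = [
    {"brand": "A", "model": "A3", "gear": "Manual"},  # violates the rule
    {"brand": "B", "model": "B1", "gear": "Manual"},  # valid, left as is
]
clean = fix_gear(raw)
```

Only the first record is changed; the second one does not match any rule and passes through unchanged.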

We can model the structure in this data set with many algorithms. One of these techniques is the multiple linear regression model. The resulting model is a function that expresses the price as a weighted sum of the car's features.
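In general form, a multiple linear regression function for this example could be written as follows. The choice and order of the variables are assumptions made for illustration; only Beta 1 as the coefficient of KM comes from the text.

```latex
\[
\widehat{\text{price}} = \beta_0 + \beta_1\,\text{km} + \beta_2\,\text{age}
  + \beta_3\,\text{gear} + \beta_4\,\text{damage} + \dots
\]
```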

The Beta terms in this function are the coefficients learned from the data set. Learning the structure in the data set really means finding the direction and magnitude of the effects of parameters such as gear type, damage, and brand, which we know affect the price. Once we find this function, we also find the contribution of each of these parameters to the price prediction.

As humans, we probably cannot tell the price of a car with completely randomly generated features, especially if no such car exists. A machine learning model, however, can still produce a prediction for it. Considering how much artificial intelligence may be able to do in the future, this may raise some questions in your mind. I will share my thoughts on the future of artificial intelligence in another article.

Now, in our car example, we have parameters that have negative and positive effects on the price. For example, older cars are cheaper, and vehicles with automatic gear are more expensive than manual ones. So what kind of prediction algorithm do we need when we consider these two parameters together?

The modeling technique we will use here is multiple linear regression. There are many other modeling techniques, and I will touch on them in later posts. Multiple linear regression is a good starting point, because many of them build on the same idea of learning a function from the features to the target.
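As a rough sketch of how such a model can be fitted, the example below uses NumPy's least-squares solver. The feature columns, the seven sample cars, and the "true" coefficients are all made up for illustration; the prices are generated from those coefficients without noise, so we can check that the fit recovers them. Real data would be noisy and far larger.

```python
import numpy as np

# Hypothetical training data, one row per car.
# Columns: km (in thousands), age (years), automatic (1/0), damaged (1/0)
X = np.array([
    [ 20.0,  1.0, 1.0, 0.0],
    [ 60.0,  4.0, 0.0, 0.0],
    [110.0,  8.0, 1.0, 1.0],
    [150.0, 12.0, 0.0, 1.0],
    [ 35.0,  2.0, 0.0, 1.0],
    [ 90.0,  6.0, 1.0, 0.0],
    [ 75.0,  5.0, 0.0, 1.0],
])

# Made-up coefficients: intercept, km, age, automatic, damaged.
true_beta = np.array([30000.0, -50.0, -1000.0, 2500.0, -3000.0])

def add_intercept(features):
    """Prepend a column of ones so the first coefficient is the intercept."""
    return np.column_stack([np.ones(len(features)), features])

# Synthetic, noise-free prices built from the made-up coefficients.
y = add_intercept(X) @ true_beta

def fit_linear_regression(features, target):
    """Estimate the coefficients by ordinary least squares."""
    beta, *_ = np.linalg.lstsq(add_intercept(features), target, rcond=None)
    return beta

beta = fit_linear_regression(X, y)
```

Here the age coefficient comes out negative and the automatic-gear coefficient positive, which is how the model handles both effects at the same time.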

When we model the data set we have, we obtain a function of the form described earlier. The Beta values represent the magnitude of the effects of the variables, as learned from the data set. They are the result of the mathematical operations and optimization carried out in the background. For example, Beta 1 is the coefficient of the KM parameter, and the model determines its value as 0.2. Likewise, the model determines a coefficient for each of the other parameters in the equation. Categorical variables such as damage condition and gear type take the value 1 or 0. When all the values are written into this equation and it is evaluated, we get the estimated car price.
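To make the last step concrete, here is a small sketch of evaluating such an equation for one car. The coefficient values and field names are hypothetical and are not the values from the model discussed in this article; the point is only to show how categorical features become 1 or 0 and how the terms are summed.

```python
# Hypothetical coefficients, for illustration only (not from a real model).
COEFFICIENTS = {
    "intercept": 30000.0,
    "km": -0.05,          # price change per kilometre
    "age": -1000.0,       # price change per year of age
    "automatic": 2500.0,  # added when the gear type is automatic
    "damaged": -3000.0,   # added when the car has damage
}

def encode(car):
    """Turn a car record into numbers; categorical values become 1 or 0."""
    return {
        "km": float(car["km"]),
        "age": float(car["age"]),
        "automatic": 1.0 if car["gear"] == "Automatic" else 0.0,
        "damaged": 1.0 if car["damaged"] else 0.0,
    }

def predict_price(car, coef=COEFFICIENTS):
    """Evaluate intercept + sum(coefficient * feature value)."""
    features = encode(car)
    return coef["intercept"] + sum(
        coef[name] * value for name, value in features.items()
    )

car = {"km": 40000, "age": 3, "gear": "Automatic", "damaged": False}
price = predict_price(car)
```

With these made-up numbers the terms are 30000 - 2000 - 3000 + 2500 + 0, so the estimate is about 27,500.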

In this article, I have briefly explained how a machine learning approach learns from data, using the multiple linear regression technique. When the features of a car are given to the trained model, it can predict what the price of that vehicle will be.

In the next step, the results from the model we have built are compared with real values. Small adjustments are made to the coefficients to bring the estimates closer to the real values, so the model can deliver more realistic results over time.
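One common way to compare predictions with real values is to compute an error metric. The sketch below uses mean absolute error and root mean squared error; the choice of these two metrics and the sample numbers are assumptions for illustration, not something specified in this article.

```python
import math

def mean_absolute_error(actual, predicted):
    """Average size of the prediction errors, in price units."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def root_mean_squared_error(actual, predicted):
    """Like MAE, but large errors are penalised more heavily."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(actual))

# Hypothetical real sale prices and the model's estimates for the same cars.
actual = [30000.0, 9000.0, 18000.0]
predicted = [29000.0, 10000.0, 18000.0]

mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
```

If these numbers shrink after the coefficients are re-estimated on newer data, the model is getting closer to the real prices.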