Let's take a closer look at my outlier-ignoring algorithm. First, let's look at what is going on in the photo. The tiny dots are the data points gathered during a simulated left-hand test. The slightly transparent purple regions are the regions the algorithm "deletes" from the list of valid data points. The cyan dot is the average point of the available data, and the cyan lines are the axes of a Cartesian coordinate system centered at that average point (I am aware that the cyan dot in this picture is not in the exact average location, but it is close enough to illustrate how the algorithm works). The red numbers label each quadrant.
First off, the algorithm finds the average point of the collected data. That point should be inside the circle made by the arc. Next, the algorithm determines which hand is being tested. In the screenshot, it uses the average point, but I have since changed it to use the first data point, because patients start with their thumb along the side of the device and sweep their thumb towards themselves. The change was made to accommodate smaller devices, on which the test may take up the entire screen. Once the average point and the hand being tested are found, the algorithm deletes all data in the 3rd quadrant for left-handed tests and the 4th quadrant for right-handed tests. This region is represented by the semi-transparent magenta rectangle in the screenshot of the app. It matters because, on smaller devices (but not on tablets), the palm comes in contact with the screen and commonly leaves a cluster of data points along the edge of the screen.
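Here is a rough sketch of that first step in Python. This is not the app's actual code, just an illustration under a few assumptions of mine: points are `(x, y)` tuples with y increasing upward (so the quadrants are numbered the usual math-class way), the hand is guessed by comparing the first point's x to the middle of the screen, and the names `centroid`, `guess_hand`, `drop_palm_quadrant` and `screen_width` are all made up for this example.

```python
from statistics import mean


def centroid(points):
    """Return the average (x, y) of a list of (x, y) tuples."""
    xs, ys = zip(*points)
    return mean(xs), mean(ys)


def guess_hand(points, screen_width):
    """Guess the hand from the FIRST data point (assumed rule, see above):
    a sweep that starts on the left half of the screen is a left-hand test."""
    first_x = points[0][0]
    return "left" if first_x < screen_width / 2 else "right"


def drop_palm_quadrant(points, hand):
    """Remove points in quadrant 3 (left hand) or quadrant 4 (right hand),
    measured relative to the average point. Assumes y grows upward."""
    cx, cy = centroid(points)
    if hand == "left":
        def in_palm_region(p):
            return p[0] < cx and p[1] < cy  # quadrant 3
    else:
        def in_palm_region(p):
            return p[0] > cx and p[1] < cy  # quadrant 4
    return [p for p in points if not in_palm_region(p)]


# Hypothetical left-hand sweep plus two palm touches near the bottom-left corner.
sample = [(1, 9), (3, 8), (6, 6), (8, 3), (9, 1), (0.5, 0.5), (1, 0.2)]
hand = guess_hand(sample, screen_width=10)
cleaned = drop_palm_quadrant(sample, hand)
```

If your touch coordinates have y increasing downward, as most screen coordinate systems do, flip the y comparisons.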
Second, the algorithm uses some statistics to calculate the maximum allowed distances between valid data points. I have yet to take a stats class, so I used a formula that I found on Yahoo Answers. I really wish my school offered AP Stats...
Third, the algorithm checks all remaining data points to see if they have a neighbor within the allowed maximum distance. If they do, then the data point is assumed to be valid. This caused problems earlier with the palm's invalid data points: they had neighbors within the allowed maximum distance, but those neighbors were all invalid too, which is why the palm region is deleted in the first step. If a data point does not have a neighbor within the specified distance, it is discarded. In the screenshot of the app, you can see data points discarded by this part of the algorithm highlighted with semi-transparent magenta circles.
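The neighbor check could look something like the sketch below. Again, this is an illustration and not the code from the app: `keep_points_with_neighbors` and `max_dist` are hypothetical names, and `max_dist` is simply passed in here instead of being computed by the statistics formula from the second step.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def keep_points_with_neighbors(points, max_dist):
    """Keep only points that have at least one OTHER point within max_dist."""
    kept = []
    for i, p in enumerate(points):
        has_neighbor = any(
            dist(p, q) <= max_dist
            for j, q in enumerate(points)
            if j != i  # a point does not count as its own neighbor
        )
        if has_neighbor:
            kept.append(p)
    return kept


# Three points close together and one stray touch far away.
sample = [(0, 0), (1, 0), (2, 0), (10, 10)]
valid = keep_points_with_neighbors(sample, max_dist=1.5)
```

Every point is compared against every other point, so the work grows with the square of the number of data points.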
That pretty much sums up the algorithm. It can be processor intensive, especially the third part because of the number of calculations involved, but since the user starts the process by clicking on the "analyze data" button I'm not too worried if it takes a second or two to finish.
Thanks for stopping by!
Chris Konstad