Wednesday, 14 December 2016

Fake Post Detection

Bot enabled curation
After the results of the US presidential election came in, Facebook took a lot of heat for spreading fake news in its news feeds. By then, Facebook had replaced its human curators with an aggregation algorithm.

"Anything happening anywhere is simply shared as a post; even if nothing happens, that too can be represented and shared as a post. A post is based entirely on the mindset of the user."

Event Capturing and Sharing

Today’s social networking sites have become a virtual world in which all of us are connected by various means. People use this connection for many things, advertising being the most profitable one. Because one can reach a huge audience globally, it has become a very good platform for journalists and media houses to share live news updates. On the other hand, it has become very easy to mislead a huge audience with a small post containing wrong information.

"From Silicon Valley to the White House, everyone has suffered from the ill effects of hoax news, and now America stands Trumped…!"

A post on Facebook is a collection of text, an image or a video, along with information about the person posting it; every post also has a reaction counter and a list of comments. The beauty of text is that it is not restricted to describing a post: it can refer to a remote resource using a URL, emphasize the post using a hashtag (# as prefix), or refer to an entity (a user or a page on Facebook) using @ as prefix.
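To make the three roles of text concrete, here is a minimal Python sketch that pulls URLs, hashtags and mentions out of a post's text. The names `ParsedText` and `parse_post_text` and the regular expressions are my own assumptions for illustration; Facebook's real parsing rules are certainly more involved.

```python
import re
from dataclasses import dataclass, field

# Simplified patterns (assumptions, not Facebook's actual grammar).
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"(?<!\w)#(\w+)")
MENTION_RE = re.compile(r"(?<!\w)@(\w+)")


@dataclass
class ParsedText:
    urls: list[str] = field(default_factory=list)
    hashtags: list[str] = field(default_factory=list)
    mentions: list[str] = field(default_factory=list)


def parse_post_text(text: str) -> ParsedText:
    """Split a post's text into the remote resources, tags and entities it refers to."""
    urls = URL_RE.findall(text)
    # Blank out URLs first so a "#fragment" inside a link is not counted as a hashtag.
    remainder = URL_RE.sub(" ", text)
    return ParsedText(
        urls=urls,
        hashtags=[tag.lower() for tag in HASHTAG_RE.findall(remainder)],
        mentions=MENTION_RE.findall(remainder),
    )


parsed = parse_post_text("Results are out #News @BBCNews https://example.com/story#top")
print(parsed.urls, parsed.hashtags, parsed.mentions)
```

Hashtags are lower-cased here so that #News and #news count as the same tag; that is a design choice of this sketch, not something the platform guarantees.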

Curating a post:

a. Requirements for curating a post:

i. Every post must contain a textual description, in addition to any image or video.
ii. A post related to news must be tagged with at least #news.
iii. Tagging people, pages or locations gives the post proper relevance.

· Any post that doesn’t satisfy these requirements is ignored by the algorithm and flagged as ineligible.

b. Assumptions that can be made when curating an eligible post:

i. If a post is made by a verified Facebook source (a verified user or page), the post can be trusted.
ii. Any re-post from a verified source, without modifications, can also be trusted.
iii. If a post contains only a URL and its domain has a good rank, the post can be trusted.

· An assumption is not a solid basis, so these rules should never be hard-coded.

c.    To identify genuine posts from unverified sources:

i. Any event shared digitally will have at least one twin across various sources.
ii. The challenge is to identify such sources and to decide whether or not to trust them.

A. Algorithm to test a post for news feeds:

Step 1: SET isEligible = TRUE
        IF tags == null || tags dontHave "#news"
        then SET isEligible = FALSE
        END IF

Step 2: IF msgTXT == null
        then SET isEligible = FALSE
        END IF

Step 3: IF isEligible == FALSE
        then EXIT
        END IF

Step 4: IF userID is verified
        then (post must be from a trusted source)
        doPost(this, "green")
        EXIT
        END IF

Step 5: IF msgTXT contains only the URL && ( RANK(URL) is GOOD || isReputed(URL) )
        then (post must be from a trustworthy external source)
        doPost(this, "green")
        END IF

Step 6: EXIT
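Algorithm A can be sketched in Python roughly as follows. This is only an illustration under my own assumptions: the `Post` class, the `REPUTED_DOMAINS` allow-list (a stand-in for RANK(URL) / isReputed(URL)) and the returned labels are hypothetical, and the function returns a label instead of calling doPost directly.

```python
from dataclasses import dataclass
from typing import Callable, Optional
from urllib.parse import urlparse

# Hypothetical allow-list standing in for RANK(URL) / isReputed(URL).
REPUTED_DOMAINS = {"example-news.org"}


@dataclass
class Post:
    author_verified: bool
    text: Optional[str]
    tags: frozenset = frozenset()


def is_reputed(url: str) -> bool:
    return (urlparse(url).hostname or "") in REPUTED_DOMAINS


def is_bare_url(text: str) -> bool:
    """True when the text consists of exactly one http(s) URL and nothing else."""
    parts = text.split()
    return len(parts) == 1 and urlparse(parts[0]).scheme in ("http", "https")


def check_post(post: Post, reputed: Callable[[str], bool] = is_reputed) -> str:
    # Steps 1-3: a post without text or without the #news tag is ineligible.
    if not post.text or not post.text.strip() or "#news" not in post.tags:
        return "ineligible"
    # Step 4: posts from verified users or pages are trusted.
    if post.author_verified:
        return "green"
    # Step 5: a bare URL pointing at a reputed domain is trusted.
    text = post.text.strip()
    if is_bare_url(text) and reputed(text):
        return "green"
    # Step 6: nothing decided here; the post still needs twin detection.
    return "unresolved"


print(check_post(Post(False, "https://example-news.org/a", frozenset({"#news"}))))
```

The reputation check is passed in as a parameter rather than baked into the function, in keeping with the point that these assumptions should not be hard-coded.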

B. Algorithm to doPostProcess for posts from unverified sources:

Step 1: IF isEligible == FALSE
        then EXIT
        END IF

Step 2: findTwins(this)

Step 3: IF twinPost exists in any verified source ( must be a re-post from a trusted source )
        then doPost(this, "green")
        END IF

Step 4: EXIT
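A rough Python sketch of Algorithm B is below. The `Twin` class and the `find_twins` callable are hypothetical; the real twin search is whatever Algorithm C turns out to be, so here it is simply injected as a function.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Twin:
    source: str            # where the twin post was found
    source_verified: bool  # whether that source is a verified one


def post_process(
    post_text: str,
    eligible: bool,
    find_twins: Callable[[str], Iterable[Twin]],
) -> Optional[str]:
    # Step 1: ineligible posts are skipped entirely.
    if not eligible:
        return None
    # Steps 2-3: a twin in any verified source means this is effectively a trusted re-post.
    if any(twin.source_verified for twin in find_twins(post_text)):
        return "green"
    # Step 4: no verified twin found; no color is assigned at this stage.
    return None


def fake_search(text: str) -> list[Twin]:
    # Stub search used only to demonstrate the call.
    return [Twin("some-blog", False), Twin("verified-page", True)]


print(post_process("Election results announced", True, fake_search))
```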

C. Algorithm to find a twin post across the Internet:

Step 1: Use existing search engine APIs to search for the keywords, tags, mentions and descriptions of the post. (To identify the sources of inbound links to this post)

Step 2: If search results exist, verify the sources using a reputation algorithm (reputation based on page rank, popularity or hits...), ignoring redundant results.

Step 3: Based on the verification result, classify the post under an R B G category
        - R = Red = Unverified / Unable to verify (no twin exists; may be a fresh post from a fresh or less popular source)
        - B = Blue = Partially verified (found some or very few twins)
        - G = Green = Fully verified (inbound links to this post exist across reputed sites, not just Facebook post shares)

Step 4: doPost(this, COLOR)

Step 5: EXIT
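Steps 2 and 3 can be sketched in Python as below. The search itself is left out: the function takes the result URLs that some search engine API returned. The `is_reputed` callable and the `full_threshold` of three reputed domains are my own assumptions; the algorithm above does not say how many twins separate "partially" from "fully" verified.

```python
from typing import Callable, Iterable
from urllib.parse import urlparse


def classify_by_twins(
    result_urls: Iterable[str],
    is_reputed: Callable[[str], bool],
    full_threshold: int = 3,  # assumed cut-off between blue and green
) -> str:
    # Step 2: ignore redundant results by keeping one entry per domain.
    domains = {urlparse(url).hostname for url in result_urls}
    domains.discard(None)
    reputed = {domain for domain in domains if is_reputed(domain)}

    # Step 3: map the number of reputed twin sources to a color.
    if len(reputed) >= full_threshold:
        return "green"  # fully verified
    if reputed:
        return "blue"   # partially verified
    return "red"        # unverified / unable to verify


# Toy reputation rule for the demo only: treat .org domains as reputed.
print(classify_by_twins(["https://a.org/1", "https://a.org/2"], lambda d: d.endswith(".org")))
```

Counting domains instead of raw hits means ten results from the same site count as a single twin, which is one way to read "ignoring redundant results".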

NOTE: A blog/web post shared on reputed sites like Reddit or Facebook may not be considered trusted unless that blog/website has itself achieved a good reputation/rank, since the back links on the reputed sites point back to the source blog/website, which is still evolving...!

E.g. consider this post that you are reading now on my blog. My Alexa Rank at the time of writing this post is 15966794; yes, that's very poor. Ranking a page depends on many factors, and the rank can be used to measure the reputation of a domain, website or blog. If I share this post on Reddit or Facebook, there are no twins to it because it is a new post. Ironically, my algorithm will mark that post as unverified at best, as the reputation strategy is based entirely on page ranking..!

NOTE: Whether a new and genuine resource with little reputation gets marked as R or B depends on the factors we consider for the reputation of a resource. So a proper reputation assignment strategy is required, and page rank alone seems insufficient, at least in my case... :P

Finally, doPost(Post, Color) is a method that renders all eligible posts, categorized by color code, in the news feeds.

So color coding and twin post detection were my strategies for identifying the truthfulness of posts that are floating around on the net.
