Platform Overview
DeepSee looks at the nature of websites and rings of websites to discover places that harbor fraud. Where on-page trackers capture a point in time, DeepSee captures the journey, end-to-end, that a user might take. You can use our self-serve webapp or our API to access risk information on millions of websites.
Integrating With Our Platform
We offer a real-time API, batch data transfers to object storage, and a web application for analysts. Reach out to us for a demo.
Core Concepts & Terminology
Behavioral analysis is at the core of our platform, and the risk types and relationships we derive from it power each of our products. Many of these features are unique to our platform, so you may encounter terms that are unfamiliar.
This section introduces those terms, which are referenced throughout the rest of our documentation portal. They serve as filters for our advanced search, and they are part of what underlies our risk scoring system.
Risk Types
In addition to a 0-100 risk score for each domain, we offer the following individual risk assessments:
- Pop-up: Sites with a high Pop-up Risk produce suspicious pop-ups or frequently load as pop-ups.
- Redirector: Sites with a high Redirector Risk force you to visit another page when you visit them. This behavior can be triggered by an interaction with the page, or automatically as a “zero-click” redirect.
- Embedded: Sites with a high Embedded Risk have untrustworthy referral values. These embedded pages load underneath the content of another page.
- Resource Hog: These sites run scripts that require lots of communication between the client and other web servers and have elements that refresh frequently. These sites may prompt users to block ads entirely.
- Bloat: Bloated sites have a sizable initial loading time because they contain many different images or containers. Unlike Resource Hog sites, these sites may speed up after the initial page load. Bloat can cause trackers to fire in a way that produces inconsistencies when comparing measurements across vendors.
Similar Site Types
Sites that are connected by means other than loading / linking each other:
- Same Parent: When these domains appear, they are often spawned by the same process.
- Shared ads.txt: These domains have highly similar ads.txt files.
- Behaves Similarly: These domains have similar activity profiles to each other.
- Shared Google Product IDs: These domains have the same Google Analytics, AdWords, or Floodlight ID. The type of Google product is specified by the similarity type.
- Similar URL Structure: These domains have similarly constructed query strings, with many shared parameter names & types of data communicated.
Site Tag Definitions
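As a rough illustration of the Similar URL Structure idea, the sketch below compares the query parameter names of two URLs using Jaccard similarity. This is not DeepSee's actual method; the function names and the choice of Jaccard similarity are assumptions made purely for illustration, and the example ignores the "types of data communicated" part of the definition.

```python
from urllib.parse import urlsplit, parse_qsl


def param_names(url: str) -> set[str]:
    """Return the set of query parameter names used in a URL."""
    query = urlsplit(url).query
    # keep_blank_values=True so parameters with empty values are still counted
    return {name for name, _ in parse_qsl(query, keep_blank_values=True)}


def url_structure_similarity(url_a: str, url_b: str) -> float:
    """Jaccard similarity of the query parameter names of two URLs.

    Hypothetical helper: 1.0 means identical parameter names,
    0.0 means no names in common (or no query strings at all).
    """
    a, b = param_names(url_a), param_names(url_b)
    if not a and not b:
        return 0.0  # nothing to compare
    return len(a & b) / len(a | b)


score = url_structure_similarity(
    "https://a.example/p?uid=1&src=x&cb=9",
    "https://b.example/q?uid=7&src=y&ref=z",
)
print(score)  # two shared names (uid, src) out of four distinct names
```

A real comparison would likely aggregate over many observed URLs per domain rather than a single pair.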
These are unique characteristics that are not necessarily related to risk. Currently, we label sites with the following tags:
- Ad Domain: This domain is used as part of the ad serving process.
- Big Mover: This domain’s ranking has recently changed significantly, which can be a sign of sourced traffic.
- Bouncer: This domain performs an immediate redirect.
- Displays Ads: This domain has ad placements featured on its pages.
- High-Risk Advisory: This domain has one or more risk types that are extremely high.
- Hub: This domain has a large number of unique subdomains.
- Measurement Domain: This domain is seen recording information about the user during many sessions across disparate domains.
- Misinformation Domain: This domain has a Low or worse factual rating according to sources collected by iffy.news' Iffy Index.
- Misleading Content Format: This domain dynamically changes its monetization strategy in order to aggressively monetize users from paid acquisition channels. Further explanation is available in our blog post: Two Tales of One Website: How Arbitrage Sites Manipulate Metrics Using Misleading Content Formats
- MFA Domain: This domain is confirmed to source its traffic inorganically, and it has an excessive ad experience. This list is manually verified to contain the most excessive clickbait.
- Paid Inbound Traffic Profile: A high proportion of this domain's total observed links come from sponsored content boxes or other paid ads.
- Piggybacked: This domain appears as a query parameter in the URLs of other domains.
- Piracy Domain: This domain hosts movies, TV, or print media without permission from the copyright holder.
- Recently Registered: This domain has been registered for 6 months or less.
- Weak Inbound Link Profile: We've encountered fewer than 10 other domains linking to this one.
- Widget Domain: This domain has a widget that is embedded within the content of many different sites.
Traffic Flow Information
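Some of the tags above are defined by simple thresholds or patterns. The sketch below shows how three of them (Recently Registered, Weak Inbound Link Profile, and Piggybacked) could be expressed as checks. The thresholds of 6 months and 10 domains come from the definitions above; everything else, including the function names, the 183-day approximation of 6 months, and the substring match used for Piggybacked, is an assumption for illustration and not a description of how the platform computes these tags.

```python
from datetime import date, timedelta
from urllib.parse import urlsplit, parse_qsl

# Assumption: "6 months" is approximated as 183 days.
RECENT_REGISTRATION_WINDOW = timedelta(days=183)
# From the tag definition: fewer than 10 other domains linking in.
WEAK_INBOUND_THRESHOLD = 10


def is_recently_registered(registered_on: date, today: date) -> bool:
    """True if the domain was registered within the last ~6 months."""
    return today - registered_on <= RECENT_REGISTRATION_WINDOW


def has_weak_inbound_profile(inbound_domains: set[str]) -> bool:
    """True if fewer than 10 distinct other domains link to this one."""
    return len(inbound_domains) < WEAK_INBOUND_THRESHOLD


def is_piggybacked(domain: str, observed_urls: list[str]) -> bool:
    """True if `domain` appears in a query parameter value of another domain's URL.

    Uses a naive substring match on decoded parameter values.
    """
    for url in observed_urls:
        parts = urlsplit(url)
        host = parts.hostname or ""
        if host == domain or host.endswith("." + domain):
            continue  # skip URLs that belong to the domain itself
        for _, value in parse_qsl(parts.query):
            if domain in value:
                return True
    return False
```

For example, `is_piggybacked("target.example", ["https://other.example/r?u=https%3A%2F%2Ftarget.example%2Fpage"])` returns `True`, because `parse_qsl` decodes the percent-encoded value before the substring check.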
- Inbound Connections: Sites loading, or linking to, the domain of interest
- Outbound Connections: Sites loaded, or linked to, by the domain of interest during the course of a browsing session
- Connection Types:
    - Link: does not imply that a page is loaded, but shows a possible path a user could easily travel between sites
    - Referrer: signals that the page was loaded during the course of normal browsing
    - Popup: when we detect a popup window, sites loaded in that window are marked with a Popup connection type from the site that was originally crawled
    - Redirect: when the user's browser is forcibly navigated, the destination of the traffic is connected to the crawled domain with a Redirect connection type
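One way to picture the terms above is as a directed graph in which each edge carries a connection type. The following is a minimal sketch under that assumption; the class and method names are hypothetical and do not reflect the shape of the DeepSee API or its data exports.

```python
from collections import defaultdict
from enum import Enum
from typing import NamedTuple, Optional


class ConnectionType(Enum):
    LINK = "Link"
    REFERRER = "Referrer"
    POPUP = "Popup"
    REDIRECT = "Redirect"


class Connection(NamedTuple):
    source: str  # the domain that loads or links
    target: str  # the domain that is loaded or linked to
    kind: ConnectionType


class TrafficGraph:
    """Hypothetical in-memory store of typed connections between domains."""

    def __init__(self) -> None:
        self._outbound: dict[str, set[Connection]] = defaultdict(set)
        self._inbound: dict[str, set[Connection]] = defaultdict(set)

    def add(self, source: str, target: str, kind: ConnectionType) -> None:
        conn = Connection(source, target, kind)
        self._outbound[source].add(conn)
        self._inbound[target].add(conn)

    def inbound(self, domain: str, kind: Optional[ConnectionType] = None) -> list[str]:
        """Domains loading or linking to `domain`, optionally filtered by type."""
        conns = self._inbound.get(domain, ())
        return sorted({c.source for c in conns if kind is None or c.kind is kind})

    def outbound(self, domain: str, kind: Optional[ConnectionType] = None) -> list[str]:
        """Domains loaded or linked to by `domain`, optionally filtered by type."""
        conns = self._outbound.get(domain, ())
        return sorted({c.target for c in conns if kind is None or c.kind is kind})


graph = TrafficGraph()
graph.add("news.example", "ads.example", ConnectionType.REFERRER)
graph.add("news.example", "landing.example", ConnectionType.REDIRECT)
graph.add("blog.example", "landing.example", ConnectionType.LINK)

print(graph.inbound("landing.example"))
print(graph.outbound("news.example", ConnectionType.REDIRECT))
```

Here `landing.example` has two inbound connections (one Link, one Redirect), while `news.example` has two outbound connections, only one of which is a Redirect.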
Content Categories
These are the content categories we apply to sites that we visit. Currently, only English-language content is categorized.
- Adult: Depictions of graphic sexual acts
- Automotive: Consumer vehicle information like car ratings, car purchasing recommendations, owners clubs
- Business & Industrial Info: Trade publications & market reports; business & competitive intelligence
- Computer Software & Hardware: Documentation, tutorials, technical resources, product pages; B2B software; professional education
- Education & Academia: Resources for accessing education / learning
- Family & Parenting: Parenting guides; content built around family dynamics
- Food & Drink: Restaurants, cuisine reviews, recipes, and general culinary interest topics
- Gambling: Content built around allowing / instructing users to gamble
- Games: In-browser Games / Quizzes
- Government, Nonprofit, and NGO: Official sites for government / non-profit organizations; not advertising related
- Health & Fitness: Exercise, health issues, medications
- Home & Garden: Gardening appliances, gardening instructions, home improvement
- Jobs: Resources for finding work & networking
- Literature & Art: Fine art, books, authors, high culture
- Marketing/Advertising: Corporate sites & lead capture forms
- Movies, Music, TV, & Online Streaming: Movies, music, TV, theater
- News: General: News outlets covering a wide range of topics (e.g., yahoo.com, nbcnews.com, local news outlets); Celebrity Fan/Gossip
- Non-English: Catch-all for non-English-language sites; currently we only categorize English-language content
- Online Communities & User Generated Content (UGC): User-submitted content like forums, image hosting, torrent sites
- Parked/Forbidden: Unregistered / undeveloped sites
- Personal Finance & Law: Content built around getting money, saving it, and/or investing it
- Pets & Animals: Animal interest topics & suppliers of goods made for pets
- Prescription and Illicit Drugs: Content promoting the sale or consumption of illegal or prescription drugs
- Real Estate: Buying/selling homes
- Religion & Spirituality: Theological & spiritual interest topics
- Science & Technology Interest: Science and technology content designed for general consumption, in contrast to purely academic literature
- Search Engine: Search engines
- Shopping: General e-commerce; coupons
- Social Media: Social media platforms
- Sports & Outdoors: Major league sports & competition activities; hiking, nature, fishing, etc.
- Style, Fashion, and Beauty: Jewelry, clothing, accessories, makeup
- Travel & Leisure: General travel: booking trips, trip blogs, trip ideas
- Uninformative: Sites that are too short on content, or too generic to label
- Video Games & E-Sports: Video game interest; game guides
- Website Building & Design: Tools that help you build websites, and give you access to stock photos & design elements