Salvatore Stolfo

Salvatore J. Stolfo is a tenured professor of computer science at Columbia University in New York and a leading expert in computer security. He is known for his research in machine learning applied to computer security, intrusion detection systems, anomaly detection algorithms and systems, fraud detection, and parallel computing.

Born in Brooklyn, New York, Stolfo received a Bachelor of Science degree in Computer Science and Mathematics from Brooklyn College in 1974. He received his Ph.D. from NYU’s Courant Institute in 1979 and has been on the Columbia faculty ever since, teaching courses in Artificial Intelligence, Intrusion and Anomaly Detection Systems, Introduction to Programming, Fundamental Algorithms, Data Structures, and Knowledge-Based Expert Systems.

While at Columbia, Stolfo has received close to $50M in research funding. His research has focused broadly on security, intrusion detection, anomaly detection, and machine learning, and includes early work in parallel computing and artificial intelligence. He has published or co-authored over 250 papers, which have received over 21,000 citations (H-index of 67). He pioneered several areas of computer security research whose results are widely used today. In 1996 he proposed a DARPA project that applied machine learning to behavioral patterns to detect fraud or intrusions in networks; this approach has recently emerged in industry as user behavior analytics. His earlier research on machine learning algorithms for credit card fraud detection was adopted throughout the financial industry.

Academic

DADO Parallel Computer

Stolfo and students Dan Miranker, Mike van Biema, Alexander Pasik, and Steve Taylor designed the architecture and software systems for the DADO parallel computer, an example “fifth generation computer” sponsored by DARPA’s high performance parallel computing initiative in the mid-1980s. In a lab at Columbia University, the DADO research group designed and built a fully functional 1023-processor version of the machine, which was the first parallel machine to provide large-scale commercial speech recognition services. The DADO occupied about 2 cubic feet of cabinet space. It was tested at sea aboard a Navy research vessel to evaluate its acoustic analysis and detection capabilities. A parallel broadcast and resolve/report function introduced by the DADO machine reportedly influenced part of the design of the IBM Blue Gene parallel computer.

The DADO technology was the first invention claimed by Columbia University for ownership of a faculty member’s intellectual property under the 1980 Bayh-Dole Act. Columbia and outside investors formed a company called Fifth Generation Computer to commercialize the DADO machine. The company subsequently developed a commercially deployed speech recognition system operated by Qwest. A dispute among the small company, a large telecommunications provider, and Columbia University led to six years of litigation in the US courts, in which Stolfo ultimately prevailed.

DADO introduced the parallel computing primitive “Broadcast, Resolve, Report”, a hardware-implemented mechanism comparable to what is today called MapReduce.
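To make the primitive concrete, the following Python sketch simulates it in software under simplifying assumptions: each processing element (PE) holds a list of integers, “resolve” is modeled as a pairwise max-selection up a binary tree that picks a single winning PE (ties going to the lower PE id), and “report” returns only that PE’s result. The function names, data layout, and tie-breaking rule are illustrative choices, not DADO’s actual instruction set or hardware interface.

```python
from typing import Callable, List, Sequence, Tuple


def broadcast(partitions: Sequence[Sequence[int]],
              query: Callable[[Sequence[int]], int]) -> List[int]:
    """Broadcast: every PE evaluates the same query on its own local data."""
    return [query(local) for local in partitions]


def resolve(values: Sequence[int]) -> int:
    """Resolve: reduce pairwise up a binary tree and return the id of the single
    winning PE (largest value; ties go to the lower PE id)."""
    if not values:
        raise ValueError("need at least one PE")
    level: List[Tuple[int, int]] = [(v, pe_id) for pe_id, v in enumerate(values)]
    while len(level) > 1:
        # Each parent node keeps the better of its (at most two) children.
        level = [max(level[i:i + 2], key=lambda t: (t[0], -t[1]))
                 for i in range(0, len(level), 2)]
    return level[0][1]


def report(values: Sequence[int], winner: int) -> int:
    """Report: only the winning PE sends its result back to the root."""
    return values[winner]


if __name__ == "__main__":
    partitions = [[3, 9], [1], [7, 2], [5]]  # hypothetical data held by 4 PEs
    values = broadcast(partitions, max)      # [9, 1, 7, 5]
    winner = resolve(values)                 # PE 0
    print(winner, report(values, winner))    # 0 9
```

The pairwise reduction mirrors how a tree-structured machine such as DADO can select one winner in a number of steps proportional to the tree depth rather than scanning every PE in turn.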

Data Mining of Big Data

ACE Expert System: the First Deductive Database System and Application

In some of his earliest work, Stolfo, along with colleague Greg Vesonder of Bell Labs, developed a large-scale expert data analysis system called ACE (Automated Cable Expertise) for the nation’s phone system. AT&T Bell Labs distributed ACE to a number of telephone wire centers to improve the management and scheduling of repairs in the local loop. ACE was likely the first system to combine rule-based inference (an AI expert system) with a relational database management system, the AT&T CRAS system, and it served as a model for the deductive database systems that were a subject of research in the database community for many years. ACE was the first expert system of its kind to be commercialized and widely distributed.

Merge/Purge, De-duplication of Large Datasets

In other work on the “merge/purge” problem (sometimes referred to as “record linkage” or “data deduplication”), an algorithm developed by Stolfo and student Mauricio Hernandez has been used in large-scale commercial data cleansing systems. Identifying and purging duplicates from large datasets is a very important part of large-scale data analysis, especially in commercial data analytics. The algorithms provided a means of scaling to very large datasets while still producing accurate results in the presence of arbitrary noise and errors in the database. The patented technology was licensed by Informix, a company later acquired by IBM.
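The merge/purge work is commonly associated with the sorted-neighborhood method: build a key for each record, sort by that key, compare each record only with its neighbors inside a fixed-size sliding window, and then take the transitive closure of the matches. The Python sketch below illustrates that idea under stated assumptions: the record fields, the `name_key` sort key, the `same_person` matching rule, and the window size are all hypothetical, and this is not the patented or commercially licensed implementation.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple

Record = Dict[str, str]


def sorted_neighborhood(records: Sequence[Record],
                        key: Callable[[Record], str],
                        is_match: Callable[[Record, Record], bool],
                        window: int) -> Set[Tuple[int, int]]:
    """One pass: sort record indices by key, then compare each record only with
    the window - 1 records that precede it in sorted order."""
    order = sorted(range(len(records)), key=lambda i: key(records[i]))
    pairs: Set[Tuple[int, int]] = set()
    for pos, i in enumerate(order):
        for j in order[max(0, pos - window + 1):pos]:
            if is_match(records[i], records[j]):
                pairs.add((min(i, j), max(i, j)))
    return pairs


def cluster(n: int, pairs: Set[Tuple[int, int]]) -> List[List[int]]:
    """Transitive closure of matched pairs via union-find."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)
    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical sort key and matching rule, for illustration only.
def name_key(r: Record) -> str:
    return r["last"] + r["first"]


def same_person(a: Record, b: Record) -> bool:
    return (a["zip"] == b["zip"]
            and a["first"][0] == b["first"][0]
            and a["last"][:2] == b["last"][:2])


if __name__ == "__main__":
    records = [
        {"first": "John", "last": "Smith", "zip": "10027"},
        {"first": "Jon", "last": "Smith", "zip": "10027"},
        {"first": "Mary", "last": "Jones", "zip": "11201"},
        {"first": "J.", "last": "Smyth", "zip": "10027"},
    ]
    pairs = sorted_neighborhood(records, name_key, same_person, window=2)
    print(sorted(pairs))                  # [(0, 1), (1, 3)]
    print(cluster(len(records), pairs))   # [[0, 1, 3], [2]]
```

With a window of 2, records 0 and 3 are never compared directly, yet the transitive closure still places them in one group. This is the trade-off the windowing makes: far fewer comparisons than all pairs, with closure (and, in practice, multiple passes with different keys) recovering matches that a single narrow window misses.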

KDD Cup Data Set

The DARPA IDS evaluation datasets were constructed by Lincoln Labs in 1998 and 1999 for the DARPA Cyber Panel program. These network trace datasets were used to evaluate the performance of different intrusion detection systems; at the time, they were the only network trace data with ground truth available to the open research community. The data, however, were difficult for the wider community of data mining researchers to use directly. Stolfo and his associates in the IDS lab, including Wenke Lee, created the KDD Cup dataset from the DARPA IDS datasets by converting the network trace data into “connection records”, making the data more suitable for data mining researchers testing various machine learning algorithms. Created as a community service, this data is still used extensively in IDS research.
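A connection record summarizes all packets exchanged in one network connection. The Python sketch below shows the general idea with a handful of features whose names echo KDD Cup fields (`duration`, `protocol_type`, `src_bytes`, `dst_bytes`). The `Packet` layout, the 5-tuple matching, and the omission of TCP state, service, and time-window features are simplifying assumptions; this is not the tooling actually used to build the dataset.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

FlowKey = Tuple[str, int, str, int, str]


@dataclass(frozen=True)
class Packet:
    """Hypothetical, minimal packet summary (not a real capture format)."""
    ts: float
    src: str
    sport: int
    dst: str
    dport: int
    proto: str
    nbytes: int


@dataclass
class ConnectionRecord:
    protocol_type: str
    start: float
    end: float
    src_bytes: int = 0   # bytes sent by the connection originator
    dst_bytes: int = 0   # bytes sent by the responder
    packets: int = 0

    @property
    def duration(self) -> float:
        return self.end - self.start


def to_connection_records(packets: Iterable[Packet]) -> List[ConnectionRecord]:
    """Group packets into connections keyed by the originator's 5-tuple."""
    conns: Dict[FlowKey, ConnectionRecord] = {}
    for p in sorted(packets, key=lambda pkt: pkt.ts):
        fwd: FlowKey = (p.src, p.sport, p.dst, p.dport, p.proto)
        rev: FlowKey = (p.dst, p.dport, p.src, p.sport, p.proto)
        if rev in conns:             # reply traffic for a known connection
            rec = conns[rev]
            rec.dst_bytes += p.nbytes
        else:                        # originator traffic (new or existing)
            if fwd not in conns:
                conns[fwd] = ConnectionRecord(p.proto, p.ts, p.ts)
            rec = conns[fwd]
            rec.src_bytes += p.nbytes
        rec.end = p.ts
        rec.packets += 1
    return list(conns.values())


if __name__ == "__main__":
    pkts = [
        Packet(0.0, "10.0.0.1", 40000, "10.0.0.2", 80, "tcp", 100),
        Packet(0.5, "10.0.0.2", 80, "10.0.0.1", 40000, "tcp", 1500),
        Packet(1.25, "10.0.0.1", 40000, "10.0.0.2", 80, "tcp", 40),
    ]
    for rec in to_connection_records(pkts):
        print(rec.protocol_type, rec.duration, rec.src_bytes, rec.dst_bytes)
        # tcp 1.25 140 1500
```

Each output row is a fixed-length feature vector that a standard learning algorithm can consume directly, which is what made the converted data easier to use than raw packet traces.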

Machine Learning Applied to Cybersecurity

Improved Credit Card Fraud Detection

Stolfo consulted for the CTO of Citibank for several years and conducted research on machine learning algorithms applied to the credit card fraud problem. Much of that work, done with students Phil Chan and Andreas Prodromidis and published as “meta-learning”-based strategies, demonstrated how to improve the accuracy of fraud detectors and substantially reduce losses due to fraud.

Worminator

Stolfo was an early proponent of collaborative security and distributed IDS technology and systems. Stolfo and students Ke Wang and Janak Parekh developed a fully functional IDS alert exchange system that introduced a new means of sharing sensitive data in a privacy-preserving manner. Network packet content found to be anomalous or verified as an attack was exchanged only after the raw content had been converted into a statistical representation, which allowed accurate correlation of common attacks across sites. The method Stolfo and his students invented for sharing and correlating content across administrative domains without disclosing sensitive information introduced the use of Bloom filters storing the n-gram content of network packet payloads. The method was extensively studied and continues to be used in several ongoing experiments. It also formed the basis of a recent project with colleagues Steve Bellovin and Tal Malkin on securely querying encrypted document databases without decrypting any document when searching for relevant content.
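The Bloom filter idea can be sketched as follows: one site inserts every n-gram of a suspicious payload into a bit array and shares only that array; a peer site then measures what fraction of its own payload’s n-grams appear to be present. The n-gram length (5), filter size (8192 bits), number of hash functions (3), and the use of salted SHA-256 as the hash family are assumptions for illustration, and the class name `NGramBloomFilter` is hypothetical.

```python
import hashlib
from typing import Iterator


class NGramBloomFilter:
    """Bloom filter over the byte n-grams of packet payloads (illustrative sketch)."""

    def __init__(self, num_bits: int = 8192, num_hashes: int = 3, n: int = 5) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.n = n
        self.bits = bytearray((num_bits + 7) // 8)

    def _ngrams(self, payload: bytes) -> Iterator[bytes]:
        for i in range(len(payload) - self.n + 1):
            yield payload[i:i + self.n]

    def _positions(self, gram: bytes) -> Iterator[int]:
        # k hash functions simulated by salting SHA-256 with the hash index.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + gram).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add_payload(self, payload: bytes) -> None:
        """Insert every n-gram of a suspicious payload."""
        for gram in self._ngrams(payload):
            for pos in self._positions(gram):
                self.bits[pos // 8] |= 1 << (pos % 8)

    def _contains(self, gram: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(gram))

    def overlap(self, payload: bytes) -> float:
        """Fraction of the payload's n-grams that appear to be in the filter."""
        grams = list(self._ngrams(payload))
        if not grams:
            return 0.0
        return sum(self._contains(g) for g in grams) / len(grams)


if __name__ == "__main__":
    site_a = NGramBloomFilter()
    site_a.add_payload(b"GET /default.ida?NNNNNNNNNNNNNNNN")  # hypothetical worm-like payload
    shared_bits = bytes(site_a.bits)  # only this bit array leaves site A

    site_b = NGramBloomFilter()
    site_b.bits = bytearray(shared_bits)
    print(site_b.overlap(b"GET /default.ida?NNNNNNNNNNNNNNNN"))  # 1.0
    print(site_b.overlap(b"POST /login HTTP/1.1"))               # close to 0.0
```

Because a Bloom filter has no false negatives, an identical payload always scores 1.0, while unrelated payloads score near 0, with occasional false positives controlled by the filter size and hash count. Sharing only the bit array means the raw packet content itself is never transmitted.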

Decoys and FOG Computing

Stolfo coined the term FOG computing (not to be confused with fog computing, the edge computing paradigm) for technology used “to launch disinformation attacks against malicious insiders, preventing them from distinguishing the real sensitive customer data from fake worthless data.” Stolfo’s proposed approach is to confuse and confound a traitor by leveraging uncertainty, reducing the knowledge they would ordinarily have of the systems and data they have accessed without authorization. FOG computing systems integrate bait information with systems that generate alerts when a decoy is misused.

The Insider Threat: RUU?

In 2005 Stolfo received funding from the Army Research Office to hold a workshop that brought together researchers to help define a research program focused on insider threats. Since then, the IDS group at Columbia, working with other researchers at I3P, has developed several demonstration systems for detecting evidence of insider malfeasance. The work includes user profiling techniques, especially for masquerader detection (“RUU” is a spoken acronym for “Are You You?”), studied by Stolfo and student Malek Ben Salem, and a number of decoy generation facilities studied jointly with co-PI Angelos Keromytis and student Brian Bowen.

Email Mining Toolkit (EMT)

The EMT system, sponsored by DARPA contracts, was among the first machine learning systems to apply social network analysis to important security problems, including spam detection and virus propagation. The extensive set of analyses in EMT, developed by Stolfo, student Shlomo Hershkop, and others, gave analysts, forensics experts, students, and researchers the opportunity to explore large corpora of email messages and discover a wide range of important derivative knowledge about the communication dynamics of a user or an organization. Among its applications, EMT models user behavior to identify uncharacteristic email flows indicative of spam bots and viral propagation. The toolkit has been downloaded by well over 100 users, and elements of the analyses introduced by EMT serve as a model for other analytical systems. Taken together, the analyses provided a general description of IDS network and communication analysis systems, conveniently summarized by the acronym CV5.

Embedded Device Security

Symbiotic Embedded Machines (SEM) and Insecure Embedded Systems

Student Ang Cui, working with Stolfo in the IDS lab, invented a concept for embedding arbitrary code into legacy embedded devices. The symbiotic embedded machine technology has been demonstrated to inject security features directly into operational Cisco IOS routers in situ, without significant performance degradation and without any negative impact on the routers’ primary function. The Symbiote technology is being explored for use on a number of different platforms and devices (ARM, MIPS, x86) and in several applications, especially for the large population of insecure embedded devices found on the internet. This line of work is supported by the DARPA CRASH program, which has brought together a very large number of computer science researchers focused on clean-slate design of a new generation of safe and secure computer systems. In preliminary work in the IDS lab, Cui and Stolfo performed a wide-area scan of the internet to count vulnerable embedded devices; to date, over 1.1 million have been found.

Service to the US Government

  • High Tech Subcommittee of the New York City Partnership, 1987 (chaired by J. Lederberg of Rockefeller University).

  • New York State Science and Technology Foundation, New Business Evaluation, consultant, 1989.

  • DARPA IPTO Futures Panel, 2007, 2008.

  • National Academies National Research Council/Naval Studies Board Committee on Information Assurance for Network-Centric Naval Forces, 2008, 2009.

Entrepreneurial

Spin-out companies

Red Balloon Security

Red Balloon Security (RBS), founded in 2011 by Dr. Sal Stolfo and Dr. Ang Cui, is a cybersecurity company spun out of the IDS lab. Under the sponsorship of DARPA’s Cyber Fast Track program, RBS developed a Symbiote technology called FRAK as a host defense for embedded systems. FRAK provides the core capability to automatically unpack, modify, and repack embedded system firmware to install Symbiote defenses. The company is currently developing products and services based on the Software Symbiote technology.

Allure Security Technology

Dr. Sal Stolfo and Dr. Angelos Keromytis founded Allure Security Technology based on their IDS lab research for DARPA’s Active Authentication and Anomaly Detection at Multiple Scales programs. Allure brought together the active behavioral authentication and decoy technology that Stolfo pioneered and patented in 1996 into Novo, an active user behavior analytics security solution that protects devices from data loss and intrusion. Allure’s research has been supported by Columbia University, the National Science Foundation, DARPA, DHS, and others.

Founded in 2009, Allure Security Technology grew out of DARPA-sponsored work in Columbia’s IDS lab on how to detect hackers once they are inside an organization’s perimeter and how to continuously authenticate a user without a password.

Acquired companies/technologies

Electronic Digital Documents

Stolfo’s company Electronic Digital Documents produced a “DataBlade” technology, which Informix marketed as part of its strategy of acquisition and development in the mid-1990s. Stolfo’s patented merge/purge technology, called the EDD DataCleanser DataBlade, was licensed by Informix. Informix was acquired by IBM in 2001, and IBM Informix has since been one of the world’s most widely used database servers, with users ranging from the world’s largest corporations to startups.

System Detection Inc.

System Detection was one of the companies Prof. Stolfo founded to commercialize the anomaly detection technology developed in the IDS lab. The company later reorganized and was rebranded as Trusted Computer Solutions, which was subsequently acquired by Raytheon.

Media/Popular Culture

In 2013, The Washington Post interviewed Dr. Stolfo about his technology that uses decoy data to mislead hackers, a product then soon to be sold by Allure Security Technology. Also in 2013, The New York Times reported that Dr. Stolfo and his advisee Ang Cui had taken control of the operating system of Cisco’s VoIP phones to spy remotely, enabling them to transcribe conversations using Google’s speech-to-text service. In 2012, Scientific American covered the pair’s new “symbiote” program, which could detect intrusions into firmware code without slowing down the device.

In 2011, NBC News interviewed Stolfo about cybersecurity in connection with his advisee Ang Cui’s staged intrusion into university printers.

Awards and honors

  • Popular Science “Best of What’s New” Award, 2016.

  • IBM Faculty Career Development Award

  • IEEE Fellow, 2018.

  • ACM Fellow, 2019.

  • Numerous best paper awards and an IEEE Security & Privacy “most influential paper” award.