I am a Ph.D. candidate at Stanford University, advised by Dr. Pat Hanrahan. I work in computer graphics, human-computer interaction, and programming languages. I received my BS in mathematics and computer science from the California Institute of Technology. My email address is “l yang” (without the space or quotes) at cs dot stanford.edu; it is the best way to contact me. You can also find me in room 382 of the Gates building on the Stanford campus.
Lingfeng Yang, Pat Hanrahan, and Noah D. Goodman. Generating Efficient MCMC Kernels from Probabilistic Programs. In Proceedings of AISTATS 2014. PDF | BibTeX
Jerry O. Talton, Lingfeng Yang, Ranjitha Kumar, Maxine Lim, Noah D. Goodman, and Radomír Měch. Learning Design Patterns with Bayesian Grammar Induction. In Proceedings of UIST 2012. Best paper nominee. PDF
Yi-Ting Yeh, Lingfeng Yang, Matthew Watson, Noah D. Goodman, and Pat Hanrahan. Synthesizing Open Worlds with Constraints using Locally Annealed Reversible Jump MCMC. ACM Transactions on Graphics 2012, presented at SIGGRAPH 2012. PDF
Yi-Ting Yeh, Katherine Breeden, Lingfeng Yang, Matthew Fisher, and Pat Hanrahan. Synthesis of Tiled Patterns using Factor Graphs. ACM Transactions on Graphics, presented at SIGGRAPH 2013. PDF
Lingfeng Yang. Modeling Player Performance in Rhythm Games. ACM SIGGRAPH Asia 2010 Sketches. PDF
Jerry O. Talton, Daniel Gibson, Lingfeng Yang, Pat Hanrahan, and Vladlen Koltun. Exploratory Modeling with Collaborative Design Spaces. In Proceedings of SIGGRAPH Asia 2009. PDF
My long-term goal is nothing less than a complete redefinition and reconception of computation in order to better serve human lifeforms.
In the future, the different “categories” of computation, or computational representations, and the barriers built between them will seem as silly as gangs forming and wars being fought over slightly different radii of steering wheels in cars, as dismissing what someone wrote because it was written in cursive or in a language one did not understand, or as refusing to trade with another country because it uses a different currency.
To this end, my line of inquiry targets three different points in the “stack” (a term that itself highlights how fragmented the current concept of “computation” is). I give an overview of each of these avenues below.
Edsger Dijkstra, in “Substitution Processes” (EWD 28), put it best: “A machine defines (by its very structure) a language, viz. its input language; conversely, the semantic definition of a language specifies a machine that understands it. In other words, machine and language are two faces of one and the same coin.”
The field of probabilistic modeling, and probabilistic programming in particular, has shown that it is useful to approximate the processes generating real-world observations with stochastic machines: machines that, depending on their current state, move to one of several possible next states stochastically, according to the current instruction, which may incorporate non-determinism or randomness.
In this setting, probabilistic inference becomes a problem of abstracting and characterizing the behavior of such machines. In particular, inference becomes the problem of estimating the probability mass of machine states that satisfy certain properties. In most cases, however, the set of all machine states is intractable to enumerate exactly (both practically and theoretically), so we resort either to an approximating machine guaranteed to give answers quickly (static program analysis and verification) or to executing the program itself for some number of runs (dynamic analysis and symbolic execution).
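To make the dynamic-analysis view concrete, here is a minimal sketch; the toy program, the property being queried, and all numbers are illustrative assumptions of this example rather than any particular system.

```python
# A minimal sketch: estimate the probability mass of final machine states
# satisfying a property by executing a small stochastic program many times
# and counting satisfying runs. The toy program is purely illustrative.
import random

def stochastic_program(rng):
    """A toy generative process: a 10-step random walk with a biased coin."""
    position = 0
    for _ in range(10):
        position += 1 if rng.random() < 0.6 else -1
    return position

def estimate_mass(property_holds, num_runs=100_000, seed=0):
    """Dynamic analysis: run the program repeatedly, count satisfying executions."""
    rng = random.Random(seed)
    hits = sum(property_holds(stochastic_program(rng)) for _ in range(num_runs))
    return hits / num_runs

if __name__ == "__main__":
    # Query: with what probability does the walk end at a position >= 4?
    print(estimate_mass(lambda final_pos: final_pos >= 4))
```

The same scaffold applies to any property over the program's final states; the difficulty is that naive repeated execution can take a long time to drive down the variance of the estimate, which motivates the efficiency questions below.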
It is often hard to see the big picture with such traditional techniques: while individual inference algorithms, even extremely efficient ones, can readily be produced, several issues remain. For instance, different inference algorithms have wildly varying computational efficiency on different hardware substrates. General-purpose processors such as CPUs are often much slower (in terms of real time taken to reduce the variance of an estimated value) than a highly specialized logic-gate-based circuit, which may also take much less energy to perform the same operations. What kind of inference should general computing hardware such as CPUs and GPUs perform, versus ASICs and FPGAs? In the long run, is there any good reason to treat the computational and “purely statistical” aspects of an inference scheme separately? For a given program, what properties would the most efficient gate-based machine for executing it have? And what about physical processes beyond those that occur in logic-gate-based circuits?
This is precisely the concern of statistical mechanics, a field that from the outset has analyzed any process for which a state space and transition function can be formulated. I am therefore interested in useful formulations of program execution as statistical-mechanical systems. I will start by slightly tweaking models from the analysis and verification literature, incorporating energy terms and action principles that summarize program behavior. For instance, if a non-deterministic program is modeled as a machine that may, at any step, undo the last transition or jump back to the beginning, then for a given program one may hope to estimate the profitability of depth-first versus breadth-first model search in answering queries about the program, in terms of the estimated efficiency of flow from the start state to the desired states. This in turn suggests formulations of energy for a given program and machine; for example, potential energy might be greatest at states from which the most paths lead, such as the initial state.
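As a rough illustration of this framing, the sketch below (a toy construction for exposition only; the state graph, the undo/restart probabilities, and the energy definition are all assumptions of the example) models a non-deterministic program as a state graph whose walker may follow a successor, undo its last transition, or jump back to the beginning. The estimated number of steps from the start state to a goal state stands in for the efficiency of flow between them, and a state's potential energy is taken to be proportional to the number of paths leading out of it.

```python
import random

# Toy non-deterministic program as a state graph: state -> successor states.
GRAPH = {
    "start": ["a", "b"],
    "a": ["c", "d"],
    "b": ["d"],
    "c": ["goal"],
    "d": ["a", "goal"],
    "goal": [],
}

def potential_energy(state):
    """Assumed energy term: proportional to how many paths lead out of the state."""
    return len(GRAPH[state])

def steps_to_goal(p_undo=0.1, p_restart=0.05, max_steps=10_000, rng=None):
    """Random walk over the program's states; the walker may also undo its
    last transition or jump back to the beginning."""
    rng = rng or random.Random()
    state, history = "start", []
    for step in range(max_steps):
        if state == "goal":
            return step
        r = rng.random()
        if r < p_undo and history:
            state = history.pop()            # undo the last transition
        elif r < p_undo + p_restart:
            state, history = "start", []     # jump back to the beginning
        else:
            history.append(state)
            state = rng.choice(GRAPH[state])
    return max_steps

if __name__ == "__main__":
    rng = random.Random(0)
    trials = [steps_to_goal(rng=rng) for _ in range(2_000)]
    print("estimated mean steps from start to goal:", sum(trials) / len(trials))
    print("potential energy at the initial state:", potential_energy("start"))
```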
The field of knowledge compilation (KC) is concerned with computing alternative representations of logical formulae that allow previously exponential-time queries to be (repeatedly) answered in polynomial time. I am interested both in methods that apply traditional KC techniques to probabilistic inference and in probabilistic views of KC, such as the compilation of incomplete or imprecise representations of the state space (for example, traces, or formulas in satisfiability modulo theories (SMT)) that nevertheless yield fast, accurate queries, especially for classes of models with many deterministic relationships between states.
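The sketch below illustrates the basic KC contract under simplifying assumptions; the tiny CNF formula, the fixed variable order, and the caching scheme are illustrative choices, not a production compiler. The formula is compiled once, by Shannon expansion with caching on the residual clause set, into a decision DAG; afterwards, weighted model counting queries with different weight vectors each take a single pass over the compiled structure.

```python
# CNF over variables 1..3: (x1 or x2) and (not x1 or x3)
CLAUSES = frozenset([frozenset([1, 2]), frozenset([-1, 3])])

TRUE, FALSE = "T", "F"
nodes = {}  # node id -> (decision variable, low child, high child)

def simplify(clauses, literal):
    """Condition the clause set on a literal being true."""
    out = set()
    for c in clauses:
        if literal in c:
            continue                       # clause satisfied
        reduced = c - {-literal}
        if not reduced:
            return None                    # empty clause: contradiction
        out.add(frozenset(reduced))
    return frozenset(out)

def compile_cnf(clauses, var=1, cache=None):
    """Shannon-expand on variables in order; cache on the residual clause set."""
    cache = {} if cache is None else cache
    if clauses is None:
        return FALSE
    if not clauses:
        return TRUE
    key = (var, clauses)
    if key in cache:
        return cache[key]
    low = compile_cnf(simplify(clauses, -var), var + 1, cache)
    high = compile_cnf(simplify(clauses, var), var + 1, cache)
    node_id = len(nodes)
    nodes[node_id] = (var, low, high)
    cache[key] = node_id
    return node_id

def weighted_count(node, weights, memo=None):
    """One bottom-up pass over the compiled DAG per weight vector."""
    memo = {} if memo is None else memo
    if node == TRUE:
        return 1.0
    if node == FALSE:
        return 0.0
    if node in memo:
        return memo[node]
    var, low, high = nodes[node]
    p = weights[var]
    val = (1 - p) * weighted_count(low, weights, memo) \
        + p * weighted_count(high, weights, memo)
    memo[node] = val
    return val

if __name__ == "__main__":
    root = compile_cnf(CLAUSES)
    # Repeated queries against the same compiled structure:
    print(weighted_count(root, {1: 0.5, 2: 0.5, 3: 0.5}))  # uniform: model count / 8
    print(weighted_count(root, {1: 0.9, 2: 0.1, 3: 0.7}))  # probability of satisfaction
```

In the probabilistic reading, the weights are per-variable probabilities, so the same compiled object yields the probability that the formula holds under different priors, which is exactly the kind of repeated query KC is meant to accelerate.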
The fields of search algorithms of all kinds, compilation, and even complexity theory will eventually be laid to rest.
As we move on from procedurally generating objects and designs with simple representations, such as grammars for buildings and trees, the need grows for richer representations that capture much more complex objects in compact, usable forms, up to and beyond general-purpose programs. Yet increasing the complexity of the representation tends to widen the cognitive gulf between the program and the resulting output, making such representations hard to use. Moreover, it introduces a computational gap as well: many design problems, for instance, become intractable when expressed as a system of constraints in a declarative programming language. I am concerned with developing tools and algorithms to address these issues.
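As a toy illustration of that computational gap (my own example; the grammar, the constraint, and all probabilities are arbitrary), the sketch below pairs a small stochastic grammar with a declarative constraint enforced only by rejection sampling. As the constraint tightens, the acceptance rate falls and the declarative formulation quickly becomes expensive, even though the generative grammar itself stays cheap to run.

```python
import random

def generate_facade(rng, depth=0):
    """Stochastic grammar: Facade -> Floor "|" Facade | Floor ; Floor -> W+ D?"""
    floor = "W" * rng.randint(1, 4) + ("D" if rng.random() < 0.3 else "")
    if depth < 5 and rng.random() < 0.6:
        return floor + "|" + generate_facade(rng, depth + 1)
    return floor

def sample_with_constraint(constraint, max_tries=100_000, seed=0):
    """Declarative constraint handled by naive rejection sampling."""
    rng = random.Random(seed)
    for tries in range(1, max_tries + 1):
        candidate = generate_facade(rng)
        if constraint(candidate):
            return candidate, tries
    return None, max_tries

if __name__ == "__main__":
    # Constraint: the facade must contain exactly two doors.
    facade, tries = sample_with_constraint(lambda f: f.count("D") == 2)
    print(f"accepted after {tries} tries: {facade}")
```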