Dr. Jochen L. Leidner – science / engineering / education / entrepreneurship

Call for Nominations: The Microsoft BCS/BCS IRSG Karen Spärck Jones Award 2022

TLDR: Closing date: 9 September 2022

~ An award to commemorate Karen Spärck Jones ~

A pioneer of information retrieval, the computer science sub-discipline that also underpins the technology of modern Web search engines, Karen Spärck Jones was a British professor of Computers and Information at the University of Cambridge. Her contributions to the fields of Natural Language Processing (NLP) and Information Retrieval (IR), especially with regard to experimentation, have been outstanding, highly influential and lasting, and include the introduction of Inverse Document Frequency for relevance ranking. Her achievements resulted in her receiving a number of prestigious accolades such as the BCS Lovelace medal for her advancement of Information Systems, and the ACM Salton Award for her significant, sustained and continuing contributions to research in information retrieval. Karen was also an outspoken advocate for women in computing, and we encourage former advisors of talented scientists to provide the judges with a rich and diverse candidate pool to select from.

To learn more about Karen and her work, visit:
* https://en.wikipedia.org/wiki/Karen_Sp%C3%A4rck_Jones
* https://www.youtube.com/watch?v=U8FecRxSiUM
* https://www.youtube.com/watch?v=5fYeKiebpuo

In order to honour Karen’s achievements, the BCS Information Retrieval Specialist Group (BCS IRSG) in conjunction with the BCS has established an annual award to encourage and promote talented researchers who have endeavoured to advance our understanding of Natural Language Processing or Information Retrieval with significant experimental contributions.

To celebrate the commemorative event, the recipient of the 2022 award will be invited to present a keynote lecture at BCS IRSG’s annual conference — the European Conference on Information Retrieval (ECIR) next year. This forum provides an excellent venue to present and announce the award as the conference attracts many new and young researchers.

Eligibility. Open to all NLP/IR researchers who have no more than 10 years’ experience after their Ph.D. at the closing date for nominations (non-research times, e.g. parental leave or career breaks, will be taken into account to ensure equity; please point out such times in the nominee’s CV).

Criterion. The candidate ought to have substantially advanced our understanding of NLP or IR or both through experimentation.

Nominations. The following should be provided:
• Name of nominee, position, affiliation, years since completion of the Ph.D.;
• Name of person proposing the nominee, position, and affiliation;
• Short case for the award, not to exceed 2,500 words, highlighting the contributions the individual has made;
• List of the individual’s top five publications reflecting the relevant contributions, and role within these; and
• Exactly two supporting letters from people who would like to encourage/support the nomination.

Nominations should be emailed to the panel chair below. The support letters can be emailed separately by the referees. It is possible for individuals to nominate themselves, in which case they should provide three support letters. Please note that we anticipate that people who provide support letters will do so only for a single candidate.

Award Panel. The Award Panel Chair, appointed by the BCS IRSG Committee, will invite panel members from amongst representatives of the BCS main council, the BCS IRSG Committee, the European Chapter of the Association for Computational Linguistics (EACL), the Award-sponsoring organisation (unless there could be a conflict of interest), as well as seasoned experts in IR and NLP from academia and industry.

Prize. The recipient of the award will receive a certificate, a trophy, and a cash prize of £1000, plus expenses for the awardee to travel to ECIR 2023.

Note that the Karen Spärck Jones Award will now alternate between ECIR and EACL to promote integration between the IR and NLP communities that Karen Spärck Jones was an active member of. The 2022 prize award lecture will take place at ECIR 2023.

Timeline for the 2022 Award:
• 9 September 2022 — closing date for nominations;
• 17 September 2022 — deadline for support letters;
• 9 December 2022 — notification of the prize recipient;
• 2 April-6 April 2023 — recipient presents keynote at ECIR 2023 in Dublin, Ireland.

The Karen Spärck Jones Award is sponsored by Microsoft Research Cambridge; we would like to thank our generous sponsors.

Current Award Chair: Jochen L. Leidner <leidner AT acm.org>.

Microsoft-BCS/BCS-IRSG Karen Spärck Jones Award 2020

I am happy to announce that the winner of the 2020 Microsoft-BCS/BCS IRSG Karen Spärck Jones award (to be presented at ECIR 2021 next year) is Dr. Ahmed Hassan Awadallah (Principal Research Manager at Microsoft AI Research in Redmond, WA, USA).

Ahmed has accepted the award. He will give a talk at ECIR 2021 (originally in Lucca, now online only).

I would like to thank the eight independent judges for their valued contributions.

Cell Differentiation, GEB and High School Biology

James Somers wrote on his blog:

“I wish my high school biology teacher had asked the class how an embryo could possibly differentiate — and then paused to let us really think about it. The whole subject is in the answer to that question. A chemical gradient in the embryonic fluid is enough of a signal to slightly alter the gene expression program of some cells, not others; now the embryo knows “up” from “down”; cells at one end begin producing different proteins than cells at the other, and these, in turn, release more refined chemical signals; …; soon, you have brain cells and foot cells. How come we memorized chemical formulas but didn’t talk about that? It was only in college, when I read Douglas Hofstadter’s Godel, Escher, Bach, that I came to understand cells as recursively self-modifying programs. The language alone was evocative. It suggested that the embryo — DNA making RNA, RNA making protein, protein regulating the transcription of DNA into RNA — was like a small Lisp program, with macros begetting macros begetting macros, the source code containing within it all of the instructions required for life on Earth. Could anything more interesting be imagined?“

That’s exactly right, and that’s why I think all school kids should read Gödel, Escher, Bach. I was lucky to buy myself a copy at 16 (I had seen the book mentioned in very different contexts that had not much to do with each other, and that made me curious), and it is fair to say it changed my life.

Some Recommended ICLR 2021 Papers

How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks

Theoretical Analysis of Self-Training with Deep Networks on Unlabeled Data

Share or Not? Learning to Schedule Language-Specific Capacity for Multilingual Translation

Autoregressive Entity Retrieval

Predicting Infectiousness for Proactive Contact Tracing

ICLR 2021 – A Small Selection of Top Papers

These 5 also happen to be among the best-scoring set of 15 papers out of nearly 3,000 (there are of course many other good papers in the top 15, but I have an admitted positive bias towards NLP work, as that is what I work on).

Call for Nominations: The Microsoft BCS/BCS IRSG Karen Spärck Jones Award 2020

Closing date (extended): 18 September 2020 (Anywhere on Earth TZ)

~ An award to commemorate Karen Spärck Jones ~

A pioneer of information retrieval, the computer science sub-discipline that also underpins the technology of modern Web search engines, Karen Spärck Jones was the Professor of Computers and Information at the University of Cambridge in England. Her contributions to the fields of Natural Language Processing (NLP) and Information Retrieval (IR), especially with regard to experimentation, have been outstanding, highly influential and lasting, and include the introduction of Inverse Document Frequency for relevance ranking. Her achievements resulted in her receiving a number of prestigious accolades such as the BCS Lovelace medal for her advancement of Information Systems, and the ACM Salton Award for her significant, sustained and continuing contributions to research in information retrieval. Karen was also an outspoken advocate for women in computing.

To learn more about Karen and her work, see:

To celebrate the commemorative event, the recipient of the 2020 award will be invited to present a keynote lecture at BCS IRSG’s annual conference — the European Conference on Information Retrieval (ECIR) next year. This forum provides an excellent venue to present and announce the award as the conference attracts many new and young researchers.

Eligibility. Open to all NLP/IR researchers who have no more than 10 years of postdoctoral or equivalent experience at the closing date for nominations (non-research times, e.g. parental leave or career breaks, will be taken into account).

Criterion. To have endeavoured to advance our understanding of NLP and/or IR through experimentation.

Nominations. The following should be provided:
• Name of nominee, position, affiliation, years since completion of the Ph.D.;
• Name of person proposing the nominee, position, and affiliation;
• Short case for the award, not to exceed 2500 words, highlighting the contributions the individual has made;
• List of the individual’s top five publications reflecting the relevant contributions, and role within these; and
• Exactly two supporting letters from people who would like to encourage/support the nomination.

Nominations should be emailed to the panel chair below. The support letters can be emailed separately by the referees. It is possible for individuals to nominate themselves, in which case they should provide three support letters. Please note that we anticipate that people who provide support letters will do so only for a single candidate.

Award Panel. The Award Panel Chair, appointed by the BCS IRSG Committee, will invite panel members from amongst representatives of the BCS main council, the BCS IRSG Committee, sponsoring organisation(s), as well as at least two experts appointed by the BCS IRSG committee.

Prize. The recipient of the award will receive a certificate, a trophy, and a cash prize of £1000, plus expenses for the awardee to travel to ECIR.

Timeline for the 2020 Award to be presented at ECIR 2021:
• 18 September 2020 — closing date for nominations (update: has been extended by 2 days);
• 25 September 2020 — deadline for support letters (update: has been extended by 2 days);
• 16 December 2020 — notification of the prize recipient;
• 28 March-1 April 2021 — recipient presents keynote at ECIR 2021 in Lucca, Italy.

The Karen Spärck Jones Award is sponsored by Microsoft Research Cambridge.

Award Chair: Jochen L. Leidner, Refinitiv Labs and University of Sheffield.

For a list of previous recipients of the award, cf. http://irsg.bcs.org/ksjaward.php

Ubuntu 20.04 Security Vulnerability

Ubuntu Linux 20.04 LTS has made login passwords displayable with a button, in the way WiFi passwords usually are. While this may have some utility, it presupposes that the cleartext form is stored somewhere, which could be a vulnerability or at the very least could be said to increase the attack surface of the system. I think for WiFi, the risk/benefit trade-off is okay; for user and root passwords, however, I think the risks by far outweigh the benefits.

Introduction to Financial Markets – Reading Guide

Now and again, people ask me where to start if they would like to acquire knowledge about financial markets. So I have put together a little initial reading list.

Trading

Larry Harris (2012), Trading and Exchanges: Microstructure for Practitioners

Investment & Advisory

Glen Arnold (2014), FT Guide to Banking

Frank J. Fabozzi and Harry M. Markowitz (2011), The Theory and Practice of Investment Management: Asset Allocation, Valuation, Portfolio Construction, and Strategies

Giuliano Iannotta (2014), Investment Banking: A Guide to Underwriting and Advisory Services

Risk

Sébastien Billot (2020), Financial Crime Compliance: Identify and Mitigate Financial Crime Risks

Wealth Management

Charlotte B. Beyer (2014), Wealth Management Unwrapped

CoViD-19: Some Surprises

In this post, I’d like to point out a few observations that have surprised me during the current pandemic.

Country behaviors. A pandemic may require responses that are more authoritarian than a society’s normal operations, and this in itself is a controversial topic. But if we accept it for the moment, then what could be observed is that two processes were at play in parallel: the top-level authorities (e.g. the European Union, the British government, the US Federal government) made announcements, but it was lower-level authorities that were actually responsible for much of the day-to-day rules, and the inconsistent messaging kept confusing people. Furthermore, while there were a few acts of charity (like Romanian medical staff flying to Italy to help, or German hospitals taking in patients from France and Italy), overall people were quite country-focused. At the same time, each country’s population (and media) was keenly looking at others’ performance as a way to “benchmark” (for lack of a better term) its own government’s performance. This had become possible due to the Internet as a global communication enabler. Unlike a war, a pandemic attacks all of humanity in a globally connected world, so one would have hoped that countries would work together to speed up the eradication of the disease.

Organizational behavior. Many companies finally switched to online work. This should have happened 15-20 years ago, but better late than never. People taking business flights to visit colleagues in the very same company who just happen to be located on another continent has always seemed to me the biggest waste of money, and it creates huge environmental damage at the same time. It is refreshing how unproblematic this shift was, how quickly everything could be implemented (given that there was zero preparation), and how effectively things have been running, at least in businesses that are suitable for online work. The losers were schools and government administrations: those nations talking about “one laptop per child” in developing countries were often unable to organize their own pupils. In London, the first architecture office with 50 staff has reportedly canceled its office lease, not because of financial struggles from the pandemic but in response to the insight that an office is no longer needed (given the cost of London-based office space, that’s no surprise). I would not be surprised if in the future more companies were “mostly virtual”, with occasional meetings in physical spaces rented on demand by the hour or day to stay connected on a personal level. Companies will soon turn their attention towards recovery, and leave the pandemic memory behind. But there have been 60 pandemics from 2000-2020, so one would expect some kind of institutional learning to happen in advanced organizations (CMM level 5?).

People’s behavior. People’s personal beliefs and their degrees of adherence to official guidance (or mandatory rules) are interesting to observe. Generally, as is perhaps expected, earthlings are ill-equipped cognitively to deal with abstract concepts and tiny viruses invisible to the eye. So what happened is that people started to take the pandemic more and more seriously as soon as someone in their personal environment was affected, but no sooner. Actual behavior often differed from projected behavior, as evidenced by various senior scientists, advisors, or ministers who were caught, and reported in the media, violating rules they themselves promoted. Different ethical value systems also shone through, e.g. whether trading lives against business losses was seriously being considered.

Scientists’ responses. Scientists disagree with each other, and that’s fine, at least when they are among themselves. What is not fine is to present only one view when communicating with external (non-scientist) audiences, as this creates a misconception of consensus. On the side of public health policies, I am stunned that no-one has forcefully argued for more alignment and standardization in the counting of the infected and dead across countries. If enough information is collected for each case, governments could easily tally up counts in more than one way, which renders invalid the argument that a particular standard would not meet a country’s internal requirements or not appropriately address its needs. Even more stunning is that no strong voices have been speaking out in favor of recurring, national/regional random sampling for CoViD-19 testing with the aim of getting an unbiased view of the pandemic’s spread. Instead, debates were fought on the basis of data-sets known to be heavily biased, and these were attacked as invalid, but without any attempt to implement proposals to fix them.

The source of the pandemic itself. The SARS-CoV-2 virus and the pandemic it caused (CoViD-19) are remarkable in that the virus is not very deadly, at least in relative terms when compared e.g. to Ebola, yet it caused havoc on an unanticipated scale. It turns out that one of the “success” factors of the little coronavirus (roughly 30,000 bases of genetic information) is exactly that it does not kill people quickly, but lets them pass on the disease to many other individuals before symptoms get very strong.

SARS-CoV-2: A Crude Back-of-the-Envelope Estimation of Deaths

Disclaimer: I am not a medic, and not a pandemic modeling researcher. But I am a computer science researcher who has built models of various kinds since the 1990s, many published and sold, and I do have a background as a former Red Cross paramedic (yes, I know how to convert a hospital in case of an Ebola outbreak and such, and I have intubated/resuscitated folks).

This post is a response to various other models that I’ve seen and found too complicated. A complex model built while we still know so little does not strike me as credible.

Here is a very crude (back-of-the-envelope) calculation of the overall estimated deaths per country for two countries that I know a bit better and have been following online and offline since December.

The numbers cover the full Corona pandemic period (not just up to a certain date). The forecast:

United Kingdom: between 66,500 and 798,000 deaths
Germany: between 8,000 and 96,000 deaths

This “model” is based on the following assumptions:

• We don’t really know a lot, so we need wide confidence margins. Don’t believe anyone who gives you one number.
• Because of our lack of knowledge about the disease, as tempting as it may be to run a simulation, I don’t feel comfortable with that approach, as it suggests “more science” than we have.
• The % of population eventually infected is: 10%-60% (taken from expert statements).
• The % of exitus letalis outcomes (% of infected eventually dying from or in connection with SARS-CoV-2) is: 10%-20% (my own observation from JHU: 9%-22%, rounded to 10% best case and 20% worst case; thankfully, at the time of writing we’re now down from 22% to 17%).
• Country populations:
    • Germany: 80 million
    • UK: 66.5 million
• Response effectiveness OoM: Germany: 10^-2; UK: 10^-1. This is the order-of-magnitude difference to a “do nothing” approach (which would be treated as a 10^0 multiplier), based on my observations.
• Note that there are absolutely no assumptions made about the actual duration, by design: the above is a pure “part of the pie” computation.
• Existing knowledge (the model should be consistent with these):
    • UK: at least 30k dead as of May 8
    • Germany: at least 8k dead as of May 8

I will compare these numbers against the actual death counts on 2021-05-05. If the model is good, the total number of deaths (hospital and otherwise) for each of the two countries will lie within the corresponding interval given above.
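The arithmetic behind the two intervals can be sketched in a few lines of Rust. This is only an illustration of the "part of the pie" multiplication under the assumptions listed above; the names Country, death_range, INFECTED_SHARE and FATALITY_SHARE are made up for this sketch, and the response multipliers are read as 0.1 for the UK and 0.01 for Germany (an assumption, but it is the reading that reproduces the forecast numbers).

```rust
/// Inputs for one country, taken from the assumptions listed in the post.
struct Country {
    name: &'static str,
    population: f64,
    /// Response-effectiveness multiplier relative to "do nothing" (= 1.0).
    /// Assumption: read as 0.01 for Germany and 0.1 for the UK.
    response_multiplier: f64,
}

/// Share of the population eventually infected: (best case, worst case).
const INFECTED_SHARE: (f64, f64) = (0.10, 0.60);
/// Share of the infected eventually dying: (best case, worst case).
const FATALITY_SHARE: (f64, f64) = (0.10, 0.20);

/// Returns the (low, high) death estimate, rounded to whole persons.
fn death_range(c: &Country) -> (u64, u64) {
    let low = c.population * INFECTED_SHARE.0 * FATALITY_SHARE.0 * c.response_multiplier;
    let high = c.population * INFECTED_SHARE.1 * FATALITY_SHARE.1 * c.response_multiplier;
    (low.round() as u64, high.round() as u64)
}

fn main() {
    let countries = [
        Country { name: "United Kingdom", population: 66.5e6, response_multiplier: 0.1 },
        Country { name: "Germany", population: 80.0e6, response_multiplier: 0.01 },
    ];
    for c in &countries {
        let (low, high) = death_range(c);
        println!("{}: between {} and {} deaths", c.name, low, high);
    }
}
```

Each bound is just population × infected share × fatality share × response multiplier, with the best-case shares giving the lower bound and the worst-case shares the upper bound. There is deliberately no time dimension in it.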

Potential future work includes:

• apply to other countries;
• refine the “response effectiveness multiplier” based on a set of critical policy elements being present or not in a country;
• provide (separate) forecasts for the duration of the pandemic and the financial impact.

Looking into Rust

Rust is a programming language that was started in 2006 by Graydon Hoare, a Mozilla employee, as a private project; its inventor managed to convince Mozilla to make it an official project in 2009, and in recent years, Rust has consistently ranked top as the language most liked by developers. It competes with Go in their joint attempt to dethrone C/C++ as the standard language for highly performant systems programming.

I got interested in Rust because it uses strong static types and type inference. Its notation inherits some elements from the functional language ML, which is close to the mathematical notation for functions, and that in turn makes the code easy to read, e.g.

fn calculate_length(s: String) -> (String, u64) {
    // Return a tuple of the string and its length in bytes
    // as an unsigned 64-bit integer value.
    let length = s.len() as u64;
    (s, length)
}

Ownership and Explicit Ownership Transfer

Unlike Java, Python, LISP or Go, Rust doesn’t use garbage collection. Unlike C, it also does not use explicit malloc() / free() calls, which have been difficult for developers to keep track of and a source of bugs, crashes and security vulnerabilities. So how does Rust do it?

Basically, an object that owns resources (i.e. anything beyond simple copyable values such as integers) gets released when its owner leaves the scope (function, block), unless ownership has been explicitly transferred elsewhere. References can refer to an object without owning it, as can slices, which are references to contiguous ranges of container elements:

let s = String::from("hello world");
let hello = &s[0..5];
let world = &s[6..11];

For more detail, consult:
https://doc.rust-lang.org/stable/book/ch04-01-what-is-ownership.html

Rust’s compiler can also detect at compile time when there is a chance of a dangling reference, and it rejects such code. In the words of the language manual:

“The Rust language gives you control over your memory usage in the same way as other systems programming languages, but having the owner of data automatically clean up that data when the owner goes out of scope means you don’t have to write and debug extra code to get this control.”
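To make the rules above concrete, here is a minimal sketch of borrowing, slicing, ownership transfer and scope-based release in one small program. The helper first_word is a hypothetical function invented for this illustration, and calculate_length is filled in with one plausible body (returning the byte length); neither is taken from the manual.

```rust
// Takes ownership of `s` and hands it back together with its byte length.
fn calculate_length(s: String) -> (String, u64) {
    let length = s.len() as u64;
    (s, length)
}

// Hypothetical helper: only borrows `s`, so the caller keeps ownership.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

fn main() {
    let s = String::from("hello world");

    // Borrowing: `s` remains usable after the call.
    let word = first_word(&s);
    println!("first word: {}", word);

    // Slices borrow a contiguous byte range without copying it.
    let hello = &s[0..5];
    let world = &s[6..11];
    println!("{} / {}", hello, world);

    // Ownership transfer: `s` is moved into the function, so any later
    // use of `s` would be rejected by the compiler.
    let (s2, len) = calculate_length(s);
    println!("{} has {} bytes", s2, len);

    {
        let temp = String::from("scoped");
        println!("{}", temp);
    } // `temp` goes out of scope here and its memory is released.
}
```

Note that the borrows (word, hello, world) are all last used before s is moved; using any of them after the call to calculate_length would be a compile-time error, which is the kind of dangling reference the compiler guards against.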

First experiments with Rust

Downloading and trying the rustc compiler via the cargo build system command turned out to be easy. Libraries specified as dependencies (“crates”) automatically get pulled from the Rust package registry (crates.io), a far cry from the effort it takes to install/build basic C++ libraries that are not header-only. The Rust compiler’s error messages are readable and localize errors well (not hard to do better than GNU g++ on that front), and the use of colour coding distinguishes source code fragments from the error messages proper in human-friendly ways.

The Crux

The litmus tests for a new programming language are stability, community and libraries. Without a stable syntax, serious developers quickly shy away from investing their time, and making a production bet seems too risky. Without a thriving community around a language, the continuity of tool development, library development and general problem solving is in jeopardy (you want to be coding in something so that you can find the solution to your problem on StackExchange, really). And without available libraries that provide GUI frameworks, logging tools, regex engines, database abstraction layers, CSV readers, visualization toolkits and other daily needs (some general, some depending on your area), your productivity will be reduced by the distraction of needing “just one day more, I need to quickly implement a hashtable library”.

I may return with a report of my Rust story after gaining a bit more experience, and after finishing reading the manual.