Past Weekly Research Seminars

This page is used to maintain information about our regular meetings, with links to the relevant presentations, papers and other resources.

19 May 2021 – CV – Zoom meeting

Title: Open Discussion about Gesture Recognition led by Dr Anna Caute

Abstract: Aphasia is a communication disability caused by stroke or other brain injury. It affects 450,000 people in the UK and has a profound impact on quality of life. It is a heterogeneous condition, which varies in severity and can affect all aspects of communication, both verbal and non-verbal. Research shows that use and understanding of gesture can be affected in people with aphasia (PWA). Whereas healthy speakers use a wide range of gesture types (e.g. pointing, pretending to use an object), PWA use a limited range. However, PWA rely on gesture more than healthy speakers to get their message across (van Nispen, 2017).

Speech and Language Therapists often encourage gesture use during therapy. However, it is an area that has been under-explored in research, and clinicians lack evidence-based tools to assess gesture (Caute et al., 2021). Gesture assessment poses many challenges: unlike spoken language, gesture is hard to describe in written form due to its holistic, imagistic, transitory nature. In gesture research, coding categories are used to describe gesture forms and functions. Coding provides rich, descriptive data, but is prohibitively time-consuming to use in clinical practice.

In future, a potential solution to this may lie in technology. Novel research has explored the use of motion-tracking technology to analyse the kinematic features of gesture (Trujillo et al., 2019). Technology has also been employed to deliver computer gesture therapy for people with aphasia (Roper et al, 2016). This seminar will be an open discussion about the potential for gesture recognition technology to facilitate the clinical assessment of gesture.

Bio: Dr Anna Caute is a Lecturer in Speech and Language Therapy in the School of Health and Social Care. Her main research interests are in gesture and the use of technology in aphasia therapy. Her PhD investigated the benefits of gesture therapy for people with severe aphasia. She has researched a variety of technological applications in therapy. Recent studies have investigated the use of e-readers, text-to-speech software and portable smart camera technologies to facilitate reading for people with aphasia, the use of voice recognition software to facilitate writing and the development of a novel gesture screening tool.

28 April 2021 – NLP – Zoom meeting

Title: Discussion of the paper: “A Primer in BERTology: What we know about how BERT works” led by Tasos Papastylianou.

Abstract: We will be discussing the Rogers et al (2020) paper (download). From the paper: “Transformer-based models have pushed state of the art in many areas of NLP, but our understanding of what is behind their success is still limited. This paper is the first survey of over 150 studies of the popular BERT model. We review the current state of knowledge about how BERT works, what kind of information it learns and how it is represented, common modifications to its training objectives and architecture, the overparameterization issue and approaches to compression. We then outline directions for future research.”

24 March 2021 – CV – Zoom meeting

Title: Convolutional Autoencoder based Deep Learning Approach for Alzheimer’s Disease Diagnosis using Brain MRI by Ekin Yagis

Abstract: Rapid and accurate diagnosis of Alzheimer’s disease (AD) is critical for patient treatment, especially in the early stages of the disease. While computer-assisted diagnosis based on neuroimaging holds vast potential for helping clinicians detect disease sooner, there are still some technical hurdles to overcome. This study presents an end-to-end disease detection approach using convolutional autoencoders by integrating supervised prediction and unsupervised representation. The 2D neural network is based upon a pre-trained 2D convolutional autoencoder, to capture latent representations in structural brain MRI scans. Experiments on the OASIS brain MRI dataset revealed that the model outperforms a number of traditional classifiers in terms of accuracy using a single slice.

Bio: Ekin Yagis is a Ph.D. student in Computer Science and Electrical Engineering at the University of Essex. She majored in Electrical Engineering at Koc University and holds an M.Sc degree from Sabanci University, Istanbul. She works as a research officer in CSEE under the supervision of Dr. Alba Garcia and Dr. Vahid Abolghasemi. Her research interests include medical image processing, machine learning, and computer vision. She has recently been focusing on the detection of neurodegenerative diseases such as Parkinson’s and Alzheimer’s diseases using machine learning.

17 March 2021 – NLP – Zoom meeting

Title: Keys, Values, and Queries: how are the vectors generated? led by Jabir Alshehabi Al-Ani

Abstract: As part of our seminar series on word embeddings and transformers, this week we dive further into the hidden information behind the specific values used and where these values come from. Keys, values, and queries are the backbone of transformers, yet how these vectors are created remains confusing even after reading many articles. The “Attention is all you need” paper describes it in a very general form. In this presentation, I will simplify the description and show how the three vectors are generated. The presentation will focus on the role of word embeddings in generating these vectors and on how, after a sequence of operations, we obtain the three vectors.
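As a rough illustration of the idea in this abstract (not the speaker's own material), the sketch below projects toy word embeddings into queries, keys and values with three weight matrices and then applies the scaled dot-product attention described in “Attention is all you need”. The embeddings and the matrices w_q, w_k and w_v are random stand-ins for learned parameters, and all dimensions are assumptions chosen for readability.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(embeddings, w_q, w_k, w_v):
    """Project embeddings to Q, K, V and apply scaled dot-product attention."""
    q = matmul(embeddings, w_q)  # one query vector per token
    k = matmul(embeddings, w_k)  # one key vector per token
    v = matmul(embeddings, w_v)  # one value vector per token
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    # Compare every query with every key, scaled by sqrt(d_k).
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]
    # Each output row is a weighted average of the value vectors.
    return matmul(weights, v), weights


random.seed(0)
d_model, d_k = 4, 3  # assumed toy dimensions
# Hypothetical input: 3 tokens with random 4-dimensional embeddings.
embeddings = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(3)]
# Random stand-ins for the learned projection matrices.
w_q, w_k, w_v = (
    [[random.gauss(0, 1) for _ in range(d_k)] for _ in range(d_model)]
    for _ in range(3)
)
output, weights = self_attention(embeddings, w_q, w_k, w_v)
```

In a trained transformer the three matrices are learned, and several such projections run in parallel as attention heads; the sketch shows a single head only.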


10 March 2021 – CV – Zoom Meeting


Title: How do human chemosignals influence the brain network involved in decision-making? by Saideh Ferdowsi

Abstract: Chemosensory communication is known as an effective way to influence the human emotion system. Phenomena like food selection or motivation, based on chemical signals, present a unique pathway between chemosensory and emotion systems. Human chemosignals (e.g. sweat) which are produced during different emotional states contain associated distinctive odours and are able to induce the same emotions in other people. For instance, sweat is known as a social chemosignal participating in social interaction. Chemosignal perception engages a distributed neural network which has not been well characterized yet. In this talk, Dr Saideh Ferdowsi is going to illustrate how functional magnetic resonance imaging (fMRI) can be used to investigate the neural circuits underlying social emotional chemosignal processing.

Bio: Dr Saideh Ferdowsi is a senior research officer at the University of Essex working on the POTION project. POTION stands for promoting social interaction through emotional body odours. She received her PhD from the University of Surrey in biomedical signal and image processing. Her main research interests are biomedical signal and image processing, data fusion, blind source separation, machine/deep learning, Bayesian inference and brain connectivity. Saideh has been exploring the application of signal processing and statistical methods for the analysis of EEG and fMRI data of the human brain. The results of her research have been published in peer-reviewed journals.

3 March 2021 – NLP – Zoom Meeting

Title: An introduction to the core concepts behind Transformers: Self-attention (part 3) by Kakia Chatsiou

Abstract: We are continuing on the theme of BERT and transformers, focusing our discussion on positional encoding mechanisms and residuals within transformers.

We will carry on with the discussion of the following key paper (I will be leading the discussion, but bring your ideas and questions):

If we have time, we can also review the Python implementation of the paper here

24 February 2021 – CV – Zoom Meeting


Title: Automation of medical document processing by Srinidhi Karthikeyan 

Abstract: Do you know how many hours pharmacists spend processing medical documents? In the current situation, we need more healthcare workers than ever, and we can’t afford to have pharmacists tied up with paperwork. What can be done to solve this? Automate the work! We are working on a project that automates the processing of medical documents such as discharge summaries, referral letters, and accident and emergency letters. However, most of the documents are scanned images and PDFs, which prevents the OCR engine from achieving its best accuracy. What are the challenges in processing medical documents, and how can they be solved? Does using a language model help us improve the accuracy of OCR output?

Bio: Srinidhi Karthikeyan was born in India in 1997. She completed her bachelor’s degree in Computer Science and Engineering at Anna University, Chennai, in 2019, and in 2020 she completed her master’s degree in Data Science at the University of Essex, Colchester, UK. Her main areas of interest are natural language processing and computer vision.

17 February 2021 – NLP – Zoom Meeting

Title: An introduction to the core concepts behind Transformers: Self-attention (part 2) by Kakia Chatsiou

Abstract: We are continuing on the theme of BERT and transformers, focusing our discussion on self-attention mechanisms within transformers.

We will carry on with the discussion of the following key paper (I will be leading the discussion, but bring your ideas and questions):

If we have time, we can also review the Python implementation here; if not, we can carry that on at a following session.

9 February 2021 – CV – Zoom Meeting


Title: The histogram of gradient orientation for BCI processing: Capturing Waveforms by Rodrigo Ramele

Abstract: This talk presents a method to analyze Electroencephalographic (EEG) signals based on the analysis of their waveform shapes.  This method mimics what electroencephalographers have been doing clinically, visually inspecting, and categorizing phenomena within the EEG by the extraction of features from waveforms. These features are constructed based on the calculation of histograms of oriented gradients (i.e. SIFT) from pixels around the signal plot. This methodology could be potentially used to provide an objective framework to analyze, characterize and classify EEG signal waveforms.  We will explore the feasibility of this approach by detecting several signals widely used in the field of Brain Computer Interfaces, particularly the P300, an ERP elicited by the oddball paradigm of rare events.

Bio: Dr Rodrigo Ramele is a Computer Engineer from the Universidad Nacional de La Matanza (Argentina). He holds a Graduate Specialization in Cryptography from the Instituto Enseñanza Superior M.Savio (Argentina) and a Graduate Research Specialization in Robotics and Bioengineering from Tohoku University (Japan). He completed his Ph.D in Brain Computer Interfaces at the Instituto Tecnológico de Buenos Aires (Argentina) in 2018, working on the analysis of EEG using Computer Vision techniques. He currently works as a Senior Research Officer at the BCI-NE Lab of Essex University, on BCI-based collaborative decision making.

03 February 2021 – NLP – Zoom Meeting

Title: Attention is all you need: An introduction to the core concepts behind Transformers by Kakia Chatsiou

Abstract: In the first session of the term, we are continuing on the theme of BERT and transformers.

We will first watch an excerpt from Jay Alammar’s visual introduction to transformers, “The Illustrated Transformer”, to remind ourselves how transformers work, and then we will be discussing the following key paper (I will be leading the discussion, but bring your ideas and questions):

If we have time, we can also review the Python implementation here; if not, we can carry that on at a following session.

27 January 2021 – CV – Zoom Meeting


Title: Beyond Validation: Characterising Modes of Segmentation Failure by Tasos Papastylianou

Abstract: Validation typically answers the question “to what extent does a segmentation algorithm work?”. However, this fails to address the equally interesting questions of why, when, or how an algorithm succeeds or fails. In many clinical applications, these are non-trivial questions, since some failures may be more clinically relevant than others. We demonstrate how modes of segmentation failure can be investigated in an anatomically meaningful manner with the use of appropriate fuzzy maps / masks, which can be straightforwardly constructed to express meaningful anatomical relationships between a probabilistic segmentation object and its ground truth. This allows us to ask questions like “how much of the segmentation’s failure occurs near anatomical landmark X”, or “at an approximate distance Y from X”, “in the general direction of Z”, “twice as bad near X as near Z”, “how much of the failed parts are due to the presence of a particular anatomical substructure or anatomical artifact”, etc, thereby providing an extra layer of explainability to the validation process.
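One way to picture the kind of question this abstract raises, such as “how much of the segmentation’s failure occurs near anatomical landmark X”, is to weight a per-voxel failure map by a fuzzy membership mask. The sketch below is only an illustration under assumptions of our own: the function name failure_share, the use of absolute difference as the failure measure, and the toy one-dimensional data are hypothetical and are not taken from the talk.

```python
def failure_share(segmentation, ground_truth, fuzzy_mask):
    """Fraction of the total segmentation failure that falls under a fuzzy mask.

    All three arguments are flat lists of values in [0, 1] of equal length:
    a probabilistic segmentation, its ground truth, and a fuzzy membership
    map such as "near landmark X".
    """
    # Assumed failure measure: per-voxel absolute disagreement.
    failure = [abs(s - g) for s, g in zip(segmentation, ground_truth)]
    total = sum(failure)
    if total == 0:
        return 0.0  # perfect segmentation, nothing to attribute
    return sum(f * m for f, m in zip(failure, fuzzy_mask)) / total


# Hypothetical 1D example with five voxels.
segmentation = [0.9, 0.8, 0.2, 0.1, 0.6]
ground_truth = [1.0, 1.0, 0.0, 0.0, 0.0]
near_x = [0.0, 0.0, 0.2, 0.6, 1.0]  # membership grows towards landmark X

share = failure_share(segmentation, ground_truth, near_x)
```

Swapping in a different mask (for example “in the general direction of Z”) reuses the same function, which is what makes comparisons such as “twice as bad near X as near Z” straightforward to express.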

Bio: Dr Tasos Papastylianou is a Senior Research Officer at the Brain-Computer Interfaces and Neural Engineering (BCI-NE) lab at the University of Essex, working on a US-UK Bilateral Academic Research Initiative (BARI) project led by Prof. Riccardo Poli, looking at human-AI collaborative decision-making. Prior to this, he worked as a Machine Learning and Biomedical Signal Processing researcher for the Nevermind Project, involving intelligent tools and systems enabling depression self-management in patients with secondary depression, led locally by Dr Luca Citi. He was awarded his DPhil in 2017, in the area of Biomedical Engineering and Healthcare Innovation, and specifically Medical Image Analysis, at the University of Oxford. In the past he also worked as a qualified physician in the NHS and as a concert pianist. He is particularly interested in tackling problems involving applications of AI / ML in clinical practice.

9 December 2020 – NLP – Zoom Meeting

Title: BERT Applications by TBC

Abstract: TBC

2 December 2020 – CV – Zoom Meeting


Title: Essex at MediaEval Predicting Media Memorability 2020 task by Janadhip Jacutprakart

Abstract: In this presentation, Janadhip will present the approaches and results of the participation of the Essex-NLIP team to the MediaEval 2020 Predicting Media Memorability task. The task requires participants to build systems that can predict short-term and long-term memorability scores on real-world video samples provided. The Essex-NLIP team investigated the use of different pre-computed features to predict the performance of memorability scores with various regressions. We used Random Forest, Decision Tree, Gradient Boosting, Extra Tree and Sequential regression models in this experiment. Different pre-computed features were compared using regression. Additionally, feature-fusion models are proposed in this paper to explore the efficiency of possible models to provide enhanced and accurate prediction outcome for both short-term and long-term memorability. 

Bio: Janadhip Jacutprakart is currently a Computer Science PhD student at the University of Essex. She finished her MSc in Data Science with Merit at the University of Essex in September 2020, received an MBA in marketing and management from the University of Thai Chamber of Commerce in 2018 and a Bachelor of Technology in Computer Graphic and Multimedia from Bangkok University International in 2009. She is currently pursuing her PhD under the supervision of Dr Alba García Seco De Herrera. Her research focuses on computer vision in radiology imaging, with multi-disciplinary links to information retrieval and natural language processing. She is also a researcher in the ImageCLEFcaption research project with Dr Alba García Seco De Herrera.

25 November 2020 – NLP – Zoom Meeting

Title: Sesame street, Unicorns and Black holes: BERT Examples & Alternatives by Jabir Alshehabi Al-Ani & Yunfei Long

Abstract: This week we are continuing our discussion and demonstration of the Transformers Family.

18 November 2020 – CV – Zoom Meeting


Title: Simple Effective Methods for Decision-Level Fusion in Two-Stream Convolutional Neural Networks for Video Classification by Rukiye Savran Kiziltepe

Abstract: Convolutional Neural Networks (CNNs) have recently been applied for video classification applications where various methods for combining the appearance (spatial) and motion (temporal) information from video clips are considered. The most common method for combining the spatial and temporal information for video classification is averaging prediction scores at the softmax layer. Inspired by the Mycin uncertainty system for combining production rules in expert systems, this study proposes using the Mycin formula for decision fusion in two-stream convolutional neural networks. In this talk, a comparative study of different decision fusion formulas for video classification will be presented.  
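To make the contrast between the two fusion rules concrete, the sketch below compares softmax averaging with a MYCIN-style combination of the spatial and temporal stream scores. It assumes the positive-evidence form of the MYCIN certainty-factor rule, a + b - ab, applied class by class; the exact formulation used in the talk may differ, and the score vectors are invented toy values.

```python
def average_fusion(spatial, temporal):
    """Baseline: average the two streams' softmax scores per class."""
    return [(s + t) / 2 for s, t in zip(spatial, temporal)]


def mycin_fusion(spatial, temporal):
    """MYCIN-style combination for two positive scores: a + b - a*b.

    Assumption: each softmax score is treated as a positive certainty factor.
    """
    return [s + t - s * t for s, t in zip(spatial, temporal)]


def predict(scores):
    """Index of the highest fused score."""
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical softmax outputs for three classes.
spatial = [0.6, 0.3, 0.1]
temporal = [0.2, 0.7, 0.1]

avg = average_fusion(spatial, temporal)
myc = mycin_fusion(spatial, temporal)
```

Note that the MYCIN-style scores no longer sum to one; only their ranking is used for the final decision, so no renormalisation is needed for classification.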

Bio: Rukiye Savran Kiziltepe is a Ph.D. student in the School of Computer Science and Electronic Engineering at the University of Essex. She received a B.Sc. degree from Hacettepe University, Ankara in 2014, and an M.Sc. degree from the University of Essex in 2017. She is currently pursuing her Ph.D. studies under the supervision of Prof. John Gan. Her research concentrates on the study and development of deep learning schemes for video classification. Rukiye’s research interests include machine learning, video processing, and computer vision. She is particularly interested in video classification using deep learning techniques.


11 November 2020 – NLP – Zoom Meeting

Title: An Introduction to Transformers by Jabir Alshehabi Al-Ani & Yunfei Long

4 November 2020 – CV – Zoom Meeting

Presentation slides


Title: Deep neural ensembles for improved pulmonary abnormality detection in chest radiographs by Sivaramakrishnan Rajaraman

Abstract: Cardiopulmonary diseases account for a significant proportion of deaths and disabilities across the world. Chest X-rays are a common diagnostic imaging modality for confirming intra-thoracic cardiopulmonary abnormalities. However, there remains an acute shortage of expert radiologists, particularly in under-resourced settings, which results in interpretation delays and could have a global health impact. These issues can be mitigated by an artificial intelligence (AI) powered computer-aided diagnostic (CADx) system. Such a system could help supplement decision-making and improve throughput while preserving and possibly improving the standard of care. A majority of such AI-based diagnostic tools at present use data-driven deep learning (DL) models that perform automated feature extraction and classification. Convolutional neural networks (CNN), a class of DL models, have gained significant research prominence in tasks related to image classification, detection, and localization. The literature reveals that they deliver promising results that scale impressively with an increasing number of training samples and computational resources. However, the techniques may be adversely impacted due to their sensitivity to high variance or fluctuations in training data. Ensemble learning helps mitigate these by combining predictions and blending intelligence from multiple learning algorithms. Complex non-linear functions constructed within ensembles help improve robustness and generalization. Empirical results have demonstrated the superiority of ensembles over the conventional approach with stand-alone CNN models. In this talk, I will describe example work at the NLM that uses model ensembles to improve pulmonary abnormality detection in chest radiographs.

Bio: Dr. Sivaramakrishnan Rajaraman joined the Lister Hill National Center for Biomedical Communications (LHNCBC), National Library of Medicine (NLM), National Institutes of Health (NIH), as a postdoctoral researcher in 2016. Dr. Rajaraman received his Ph.D. in Information and Communication Engineering from Anna University, Chennai, India. He is involved in projects that aim to apply computational sciences and engineering techniques toward advancing life science applications. These projects involve use of medical images for aiding healthcare professionals in low-cost decision-making at the point of care screening/diagnostics. Dr. Rajaraman is a versatile researcher with expertise in machine learning, data science, biomedical image analysis/understanding, and computer vision. He has more than 15 years of experience in academia where he taught core and allied subjects in biomedical engineering. He has authored several national and international journal and conference publications in his area of expertise.

1 July 2020 – CV – Zoom Meeting

Presentation slides

Title:  Essex at ImageCLEFcaption 2020 task: Medical Concept Detection with Image retrieval by Francisco Parrilla Andrade, Luke Bentley and Arely Aceves Compean

Abstract: ImageCLEF 2020 is an evaluation campaign that is being organized as part of the CLEF initiative labs. The campaign offers several research tasks that welcome participation from teams around the world. In this seminar we will describe our participation in the ImageCLEFcaption 2020 task. Based on the visual image content, the ImageCLEFcaption 2020 task provides the building blocks for the medical image understanding step by identifying the individual components from which captions are composed. The concepts can be further applied for context-based image and information retrieval purposes. Our approach identifies the presence of relevant concepts in a large corpus of medical images with an image retrieval methodology, using features extracted via a DenseNet-121 model.

Bio: Francisco Parrilla Andrade, Luke Bentley and Arely Aceves Compean are currently studying at the University of Essex. As part of their work at Essex they have developed a solution for the ImageCLEFcaption 2020 achieving the 3rd position at the benchmark. 

24 June 2020 – NLP – Zoom Meeting

Title: Political discourse analysis with multi-scale convolutional neural networks: Introducing the COVID-19 Press Briefings Corpus by Kakia Chatsiou

Abstract: In this paper, we report on preliminary work, building a sentence-level political discourse classifier using existing annotated corpora of political manifestos from the Manifestos Project (Volkens et al., 2019) and applying it to a corpus of COVID-19 daily press briefings (Chatsiou, 2020). We use manually annotated political manifestos as training data to train a Convolutional Neural Network classifier, using BERT embeddings; we then apply it to the COVID-19 press briefings corpus to automatically classify sentences in the test corpus. We report on a series of classifier options with CNN trained on top of pre-trained word vectors for sentence-level classification tasks. We show that CNN combined with transformers like BERT outperforms CNN combined with other embeddings (Word2Vec, GloVe, ELMo). We conclude with a range of options for further exploration and further work.

Bio: Kakia is currently working as a postdoctoral Senior Research Officer, at the ESRC Business and Local Government Data Research Centre, University of Essex. She is a computational social scientist and her research focuses on automated, quantitative methods of processing large amounts of textual and other forms of unstructured data – mainly political texts and social media – and the methodology of text mining. She has published on applications of measurement and the analysis of text as data on machine learning methods and deep learning. She is also applying machine learning and natural language processing techniques to the analysis of public policy. Her substantive research interests centre on resilience and the role of public policies and institutions at different levels of governance in shaping it. See her profile page for more details.

17 June 2020 – CV – Zoom Meeting

Presentation slides

Title:  From Research to Application: Using Computer Vision-Based Neural Networks to Reduce Food Waste by Somdip Dey

Abstract: Computer scientists at the University of Essex developed an AI-powered food stock management app – nosh – to help users remember the expiry date of stocked items before they expire. Among many existing cool features such as recipe suggestions and showing user’s food buying and wasting trends to help reduce food waste in the household, computer vision-based neural network models are being introduced in the app to make it easier for the user to keep track of stocked food. This talk will present the working theory on how cutting edge research could be transferred into real-world applications to help people and society such as reducing food waste. Examples of using computer vision-based neural networks in the nosh application are discussed as case studies.

Bio: Somdip Dey is currently an Artificial Intelligence Ph.D. candidate working on embedded systems at the University of Essex, the U.K. His current research interests include affordable artificial intelligence, information security, computer systems engineering and computing resources optimization for performance, energy, temperature, reliability, and security in mobile platforms. He has also served as a Reviewer and TPC Member for several top conferences such as DATE, DAC, AAAI, CVPR, ICCV, ASAP, IEEE EdgeCom, IEEE CSCloud, and IEEE CSE. 

10 June 2020 – NLP – Zoom Meeting

Title: Events in Language and the World: Integrating Conceptual and Referential Knowledge by Gosse Minnema

3 June 2020 – CV – Zoom Meeting

Presentation slides

Title:  TMAV: Temporal Motionless Analysis of Video using CNN in MPSoC by Somdip Dey

Abstract: Analyzing video for traffic categorization is an important pillar of Intelligent Transport Systems. However, it is difficult to analyze and predict traffic based on image frames because the representation of each frame may vary significantly within a short time period. This would also inaccurately represent the traffic over a longer period of time, as in the case of video. We propose a novel human-inspired methodology that integrates analysis of the previous image frames of the video to represent the analysis of the current image frame, the same way a human being analyzes the current situation based on past experience. In our proposed methodology, called IRON-MAN (Integrated Rational prediction and Motionless ANalysis), we utilize Bayesian update on top of the individual image frame analysis in the videos, and this has resulted in highly accurate prediction of Temporal Motionless Analysis of the Videos (TMAV) for most of the chosen test cases. The proposed approach could be used for TMAV using a Convolutional Neural Network (CNN) for applications where the number of objects in an image is the deciding factor for prediction, and results also show that our proposed approach outperforms the state of the art for the chosen test case. We also introduce a new metric named Energy Consumption per Training Image (ECTI). Since different CNN-based models have different training capability and computing resource utilization, some of the models are more suitable for embedded device implementation than others, and the ECTI metric is useful to assess the suitability of using a CNN model in multi-processor systems-on-chips (MPSoCs), with a focus on energy consumption and reliability in terms of lifespan of the embedded device using these MPSoCs.
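The general idea of applying a Bayesian update on top of per-frame predictions can be sketched as a recursive posterior over traffic classes. This is a generic illustration, not the IRON-MAN implementation: it assumes each frame's CNN softmax output can be treated as an independent likelihood, and the class labels and frame scores are invented.

```python
def bayesian_update(prior, likelihood):
    """One Bayes step: posterior is proportional to prior times likelihood."""
    unnormalised = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


# Hypothetical per-frame scores for two classes: [heavy traffic, light traffic].
frame_scores = [
    [0.6, 0.4],
    [0.7, 0.3],
    [0.4, 0.6],  # a single noisy frame that disagrees with its neighbours
    [0.8, 0.2],
]

belief = [0.5, 0.5]  # assumed uniform prior before any frame is seen
for scores in frame_scores:
    belief = bayesian_update(belief, scores)
```

After the four frames the belief in the first class is roughly 0.90, so the one disagreeing frame is outweighed by the accumulated evidence from the others, which is the smoothing effect the abstract attributes to using past frames.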

Bio: Somdip Dey is currently an Artificial Intelligence Ph.D. candidate working on embedded systems at the University of Essex, the U.K. His current research interests include affordable artificial intelligence, information security, computer systems engineering and computing resources optimization for performance, energy, temperature, reliability, and security in mobile platforms. He has also served as a Reviewer and TPC Member for several top conferences such as DATE, DAC, AAAI, CVPR, ICCV, ASAP, IEEE EdgeCom, IEEE CSCloud, and IEEE CSE. 

20 May 2020 – CV – Zoom Meeting

Title: 3D Convolutional Neural Networks for Diagnosis of Alzheimer’s Disease via structural MRI by Ekin Yagis


Abstract: Alzheimer’s Disease (AD) is a widespread neurodegenerative disease that is caused by structural changes in the brain and leads to deterioration of cognitive functions. Patients usually experience diagnostic symptoms at later stages, after irreversible neural damage has occurred. Early detection of AD is crucial for patients’ quality of life and for starting treatments that slow the progress of the disease. Early detection may be possible via computer-assisted systems using neuroimaging data. Among these, deep learning utilizing magnetic resonance imaging (MRI) has become a prominent tool due to its capability to extract high-level features through local connectivity, weight sharing, and spatial invariance. In this paper, we built a 3D VGG variant convolutional neural network (CNN) to investigate the classification accuracy based on two publicly available data sets, namely ADNI and OASIS. We used 3D models to prevent the information loss that occurs when 3D MRI is sliced and analysed by 2D convolutional filters in 2D models.


Bio: Ekin is a Ph.D. student in Computer Science and Electrical Engineering at the University of Essex. She majored in Electrical Engineering at Koc University and holds an M.Sc degree from Sabanci University, Istanbul. She works as a research assistant on the Nevermind project under the supervision of Dr. Luca Citi and Dr. Alba García Seco de Herrera. Her research interests include medical image processing, machine learning, and computer vision. She has recently been focusing on the detection of neurodegenerative diseases such as Parkinson’s and Alzheimer’s diseases using machine learning.


13 May 2020 – NLP – Zoom Meeting

Title: Deep Learning Models for Multiword Expression Identification by Shiva Taslimipoor

Abstract: Multiword Expressions (MWEs) are combinations of two or more words that together show idiosyncratic behaviour. Examples are ‘take place’, ‘give up’, ‘by and large’, etc. Such expressions pose a challenge to Natural Language Processing both from syntactic and semantic points of view. Semantically, their meaning cannot be straightforwardly conveyed from the meaning of their components. Moreover, such expressions might not flexibly go through all standard syntactic changes. Identification of MWEs not only helps in lexicography to build better dictionaries but also enables us to detect actual uses of the expressions in running texts. Over the years, MWE identification in running text has been modelled similar to named entity recognition using sequence labelling. However, we believe detecting these expressions is more challenging and study one of their special characteristics which is their potential to be discontinuous. We propose a new neural methodology which combines graph convolutional networks (relying on dependency parse information) and self-attention mechanism to capture long-range dependencies between components of MWEs. The results show the outperformance of the model over previous approaches. In this talk, I give a brief overview of my research on MWEs and present the recently developed model and the results. 

Bio: Shiva Taslimipoor is a Research Associate at the University of Cambridge. Her research lies at the intersection of NLP and machine learning. She is a member of the ALTA Institute, where she works on machine learning technologies for language teaching and assessment. Shiva did her PhD at the University of Wolverhampton, with a thesis on automatic identification and translation of multiword expressions, where she extensively investigated different methodologies for sequence tagging.

6 May 2020 – CV – Zoom meeting

Title: Adaptive Vision for Human Robot Collaboration by Dimitri Ognibene 

Abstract: Unstructured social environments, e.g. building sites, release an overwhelming amount of information, yet behaviorally relevant variables may not be directly accessible. Currently proposed solutions for specific tasks, e.g. autonomous cars, usually employ overly redundant, expensive, and computationally demanding sensory systems that attempt to cover the wide set of sensing conditions which the system may have to deal with. Adaptive control of the sensors and of the perception process input is a key solution found by nature to cope with such problems, as shown by the foveal anatomy of the eye and its high mobility and control accuracy. The design principles of systems that adaptively find and select relevant information are important for both Robotics and Cognitive Neuroscience. At the same time, collaborative robotics has recently progressed to human-robot interaction in real manufacturing. Measuring and modeling task-specific gaze behaviour is mandatory to support smooth human-robot interaction. Indeed, anticipatory control for human-in-the-loop architectures, which can enable robots to proactively collaborate with humans, heavily relies on the observed gaze and action patterns of their human partners. The talk will describe several systems employing adaptive vision to support robot behavior and their collaboration with humans.

Bio: Dr Dimitri Ognibene obtained his PhD in Robotics from the University of Genoa in 2009. Before joining the University of Essex, he performed experimental studies and developed formal methods for active social perception at UPF, Barcelona, as a Marie Skłodowska-Curie COFUND Fellow; developed algorithms for active vision in industrial robotic tasks as a Research Associate (RA) at the Centre for Robotics Research, King’s College London; devised Bayesian methods and robotic models for attention in social and dynamic environments as an RA at the Personal Robotics Laboratory at Imperial College London; and studied the interaction between active vision and autonomous learning in neuro-robotic models as an RA at the Institute of Cognitive Science and Technologies of the Italian Research Council (ISTC CNR). He also collaborated with the Wellcome Trust Centre for Neuroimaging (UCL) to address the exploration issue in the currently dominant neurocomputational modelling paradigm. Dr Ognibene has also been a Visiting Researcher at the Bounded Resource Reasoning Laboratory at UMass and at the University of Reykjavik (Iceland), exploring the symmetries between active sensor control and active computation or metareasoning. Dr Ognibene has presented his work at several international conferences on artificial intelligence, adaptation, and development and has published in international peer-reviewed journals. Dr Ognibene was invited to speak at the International Symposium for Attention in Cognitive Systems (2013 and 2014) as well as at various other neuroscience, robotics and machine-learning international venues. Dr Ognibene is Associate Editor of Paladyn, Journal of Behavioral Robotics, and has been part of the Program Committee of several conferences and symposiums.

29 April 2020 – Zoom meeting

Title: HealTAC 2020 Highlights by Jon Chamberlain

Bio: Jon is the lead of the Natural Language and Information Processing research group, incorporating the Language and Computation (LAC) inter-disciplinary group, at the University of Essex, which covers a diverse range of topics during regular meetings, from statistical analysis to the Prolog programming language.

He has been a reviewer for journals and conferences such as PlosONE, JMIR, Nature, LREV, IJHCS, CHI, SIJ, ACL 2020, AACL-IJCNLP 2020, EMNLP 2020, COLING 2020 and BMC, and is Associate Editor for the journal Human Computation. He has been on the programme committee for workshops including: PlayIT (WWW2011); Disco2013; GCI2012; GamifIR 2014; GAMNLP 2019; AnnoNLP 2019; and SCSD 2019. He is a co-chair of the Games4NLP workshop series that has previously been held at EACL 2017 and LREC 2018.

He also writes for the British Computer Society and reviews proposals for EPSRC.

11 March 2020 – NLP – colloquium room (5A.540)

Title: Generating descriptions of objects in visual scenes by Micha Elsner (Department of Psychology, University of Essex)

Abstract: This volume of text about orders is an invaluable source of information which needs to be effectively summarized in a way that gives help desk agents a clear picture of the order journey, instead of having to check the order information in several places whenever a customer calls to find out about the progress of their order.

Bio: Along with his colleagues in the Clippers lab group, Micha is a computational linguist at The Ohio State University. He is also a member of the Buckeye Language Network and the Center for Cognitive and Brain Sciences. He builds computational models of infant language acquisition, especially phonetics, phonology and morphology. He also works on the relationship between language and vision. Micha graduated from Brown University in 2011 (advised by Eugene Charniak) and then worked as a postdoc at the University of Edinburgh with Sharon Goldwater before going to OSU. His thesis was on modeling discourse coherence.

22 April 2020 – CV


Title: Single Sample Augmentation using GANs for Enhancing the Performance of Image Classification by Shih-Kai Hung

Abstract: It is difficult for deep convolutional neural networks (DCNNs) to achieve high performance without sufficient training data. Data augmentation plays an important role in improving robustness and preventing overfitting in machine learning for many applications such as image classification. In this paper, a novel method for data augmentation is proposed to solve the problem of machine learning with small training datasets. The proposed method uses generative adversarial networks (GANs) to synthesize similar images with rich diversity from only a single original training sample, increasing the amount of training data. It is expected that the synthesized images possess class-informative features, which may be present in the validation or testing data but not in the training data because the training dataset is small, and thus they can be effective as augmented training data to improve the classification accuracy of DCNNs. The experimental results have demonstrated that the proposed method, with a novel GAN structure for training image data augmentation, can significantly enhance the classification performance of DCNNs for applications where original training data is limited.

Bio: Shih-Kai Hung is a first-year PhD student in CSEE, working on computer vision and deep learning. He completed his BSc in an Electrical Engineering department but shifted towards computer science in his MSc thesis on internet security and watermarking. His current research focuses on image synthesis using generative adversarial networks (GANs) to solve the problem of having very limited data in image classification, which traditionally requires a large number of labelled images to reach high performance.

04 March 2020 – CV

Title: DisplaceNet by Dr Grigorios Kalliatakis

Abstract: Every year millions of men, women and children are forced to leave their homes and seek refuge from wars, human rights violations, persecution, and natural disasters. The number of forcibly displaced people grew at a record rate of 44,400 every day throughout 2017, raising the cumulative total to 68.5 million at the year’s end, overtaking the total population of the United Kingdom. Up to 85% of the forcibly displaced find refuge in low- and middle-income countries, calling for increased humanitarian assistance worldwide. To reduce the amount of manual labour required for human-rights-related image analysis, we introduce DisplaceNet, a novel model which infers potential displaced people from images by integrating the control level of the situation and a conventional convolutional neural network (CNN) classifier into one framework for image classification.

Bio: Dr Grigorios Kalliatakis is a computer vision researcher with a computer science background in image analysis and AI, working with the ESRC Human Rights, Big Data and Technology (HRBDT) Project, housed at the University of Essex. His current research focuses on the development and application of methods for interpreting and analysing complex imagery in order to automate the visual recognition of various human rights violations. He has experience in a variety of computer vision-related topics, from image classification and image interpretation to scene understanding and big data.

19 February 2020 – CV


Title: SoCodeCNN: A new approach to teaching machines to understand program source code using computer vision methodologies by Somdip Dey

Abstract: Automated feature extraction from program source code, such that proper computing resources can be allocated to the program, is very difficult given the current state of technology. Therefore, conventional methods call for skilled human intervention in order to achieve the task of feature extraction from programs. This research work, named SoCodeCNN, is the first to propose a novel human-inspired approach to automatically convert program source code to visual images. The images can then be utilized for automated classification by a visual convolutional neural network (CNN) based algorithm.

Bio: Somdip Dey is currently an Artificial Intelligence Ph.D. candidate working on embedded systems at the University of Essex, UK. His current research interests include affordable artificial intelligence, information security, computer systems engineering and computing resources optimization for performance, energy, temperature, reliability, and security in mobile platforms. He has also served as a Reviewer and TPC Member for several top conferences such as DATE, DAC, AAAI, CVPR, ICCV, ASAP, IEEE EdgeCom, IEEE CSCloud, and IEEE CSE.

5 February 2020 – CV

Abstract: Marine conservation often relies on long-term monitoring projects to effectively set and maintain sustainability goals. Current methods are time-consuming and restrictive for large-scale areas, often relying on human annotation for classifying and quantifying substrates either in-situ or photographically. ImageCLEFcoral was set up in 2019 to challenge teams in developing systems for the automatic annotation and localisation of substrates from photographs, with the aim of greatly speeding up data collection and allowing monitoring to be expanded in scale and scope. The naturally varied morphology of reef substrates poses a greater challenge than normally faced by machine and deep learning algorithms, making this one of the more complex ImageCLEF tasks. Entrants to the 2020 task will continue to push forward in this challenge, with the aim of continually improving the accuracy of annotations year after year.

Bio: Jessica Wright is an interdisciplinary PhD student in CSEE and Biological Sciences, working in 3D reconstruction of coral reef systems. She completed her BSc and MSc in Marine Biology, but her MSc thesis shifted her towards computer science through 3D modelling as a tool for reef complexity measurements and monitoring of natural systems. Soon after her PhD began, she started working with the ImageCLEFcoral team to annotate reef substrate images in the hopes of developing an effective system for monitoring reefs and prioritising conservation where it is most needed. 

22 January 2020 – CV

Presentation slides

Title: Evaluation of Fuzzy and Probabilistic segmentation algorithms by Dr Tasos Papastylianou

Abstract: Validation is a key concept in the development and assessment of medical image segmentation algorithms. However, the proliferation of modern, non-deterministic segmentation algorithms has not been met by an equivalent improvement in validation strategies. 

In this talk, we will be making the case that extant validation practices can lead to false results in the presence of probabilistic algorithms and gold standards. We will then briefly examine the state of the art in validation, and propose an improved validation method for non-deterministic segmentations, showing that it improves validation precision and accuracy on both synthetic and clinical sets, compared to more traditional (but still widely used) methods and state of the art. 

Bio: Dr Tasos Papastylianou is a Senior Research Officer in Machine Learning and Biomedical Signal Processing, working on the Nevermind Project, which involves intelligent tools and systems enabling depression self-management in patients with secondary depression. Prior to this he was awarded his DPhil in November 2017, in the area of Biomedical Engineering, and specifically Medical Image Analysis, via the CDT in Healthcare Innovation at the University of Oxford. During this time he also co-founded Sentimoto Ltd, a company specialising in wearable and mobile analytics for older adults. Before his DPhil, Tasos was a qualified physician working in the NHS.

4 December 2019 – CV

Title: Combining Very Deep Convolutional Neural Networks and Recurrent Neural Networks for Video Classification by Rukiye Savran Kiziltepe

Abstract: Convolutional Neural Networks (CNNs) have been demonstrated to produce outstanding performance in image classification problems. Recurrent Neural Networks (RNNs) have been utilized to make use of temporal information for time series classification. The main goal of this study is to examine how temporal information between frame sequences can be used to improve the performance of video classification using RNNs. In this talk, a comparative study of seven video classification network architectures will be presented. 

Bio: Rukiye Savran Kiziltepe is a Ph.D. student in the School of Computer Science and Electronic Engineering at the University of Essex. She received a B.Sc. degree from Hacettepe University, Ankara in 2014, and an M.Sc. degree from the University of Essex in 2017. She is currently pursuing her Ph.D. studies under the supervision of Prof. John Gan. Her research concentrates on the study and development of deep learning schemes for video classification. Rukiye’s research interests include machine learning, video processing, and computer vision. She is particularly interested in video classification using deep learning techniques.

13 November 2019 – CV

Presentation slides

Title: Deep Learning for Neurological Disease Classification by Ekin Yagis

Abstract: In recent years, convolutional neural networks (CNNs) have been used to detect and classify a range of diseases from cancer to neurological disorders. In this talk, generalization performance of the networks on the classification of the two most common neurological disorders namely Parkinson’s Disease (PD) and Alzheimer’s Disease (AD) will be discussed. 

Bio: Ekin Yagis is a Ph.D. student in Computer Science and Electrical Engineering at the University of Essex. She majored in Electrical Engineering at Koc University and holds an M.Sc. degree from Sabanci University, Istanbul. She works as a research assistant on the Nevermind project under the supervision of Dr. Luca Citi and Dr. Alba García Seco de Herrera. Her research interests include medical image processing, machine learning, and computer vision. She has recently been focusing on the detection of neurodegenerative diseases such as Parkinson’s and Alzheimer’s diseases using machine learning.

21 March 2019 – 3pm, Colloquium Room (5A.540)

Title: An Interactive Image Retrieval Approach to Searching for Images on Social Media by Orland Hoeber, University of Regina, Canada

Abstract: Searching for images posted within social media services such as Twitter relies on matching textual queries to the contents of the posts that include the images. Unfortunately, social media posts may not always provide accurate or meaningful descriptions of the contents of the embedded images, making searching for images a challenging task. In this research, we augment the textual contents of the posts with new information extracted from the images using image processing and deep learning methods, and provide a visual interface to enable interactive image retrieval. A user study was conducted with 28 participants to collect evidence on how our approach was used in relation to Vakkari’s three-stage model of information seeking. We also analyzed participants’ perceptions of usefulness, ease of use, and satisfaction in comparison to a common grid-based image search interface. The results from this study highlight the value of providing visual and interactive features to enable searchers to discover images from social media sources.

Note that this is the presentation I will be giving at CHIIR in Glasgow, with some additions to more clearly explain the value of visualization and interaction in supporting information seeking tasks.

Short Bio
Dr. Orland Hoeber is a Professor in the Department of Computer Science at the University of Regina. His primary research interests are in the areas of information visualization, interactive information retrieval, visual analytics, geovisual analytics, sport analytics, social media, and mobile computing. Dr. Hoeber teaches courses on web and database programming, mobile computing, and information visualization. He has an active research team working on the design, development, and study of visual and interactive software to support exploration, analysis, reasoning, and discovery in a broad range of information-centric domains.

For the Spring Term 2016, the reading seminar sessions will be on deep learning and are convened by Deirdre Lungley. The research group meetings will be on recent work we have presented and are convened by Jon Chamberlain.

Venue: Colloquium room (next to Udo’s office)

Spring Term 2016

  • Mon 8 Feb (12-1pm) – Reading Seminar – Deep Learning

We will start by reviewing this blog: The Unreasonable Effectiveness of Recurrent Neural Networks

  • Mon 22 Feb (12-1pm) – Reading Seminar – Deep Learning

We’ll be reviewing the slides on back-propagation available here:

  • Wed 24 Feb (4-6pm) – Departmental Seminar – Sien Moens, Argumentation Mining – 1N1.4.1

Abstract: Argumentation mining is currently in the center of attention of the text mining research community. In human discourse – whether written or spoken – argumentation always plays an important role. Arguing means that you claim that something is true and you try to persuade your audience that your claim is true by providing evidence to support your claim. Argumentation mining can be defined as the detection of the argumentative discourse structure in text or speech and the recognition or functional classification of the components of the argumentation. Argumentation mining is part of the broader field that recognises rhetorical discourse structures in text, where rhetoric is the art of discourse that aims to improve the capabilities of writers and speakers to inform, persuade or motivate particular audiences in specific situations.
The lecture will focus on the text mining methods to accomplish the structuring of the discourse and classification of argumentation components and their relations. It will discuss machine learning methods that recognize structures in discourse and methods of distributional semantics that find argumentative relations between the text segments. We illustrate the talk with our own work on argumentation recognition in court decisions.
Argumentation mining refines search and information retrieval tasks or provides the end user with instructive visualizations and summaries of an argumentative structure. The idea is to build tools that help users to quickly find arguments that sustain a certain claim or conclusion without having to read tons of information.

Short Bio: Marie-Francine Moens is a full professor at the Department of Computer Science at KU Leuven, Belgium. She holds an M.Sc. and a Ph.D. degree in Computer Science from this university. She is head of the Language Intelligence and Information Retrieval (LIIR) research group and is a member of the Human Computer Interaction unit. She is currently also head of the Informatics section of the Department of Computer Science at KU Leuven. Her main interests are in the domain of automated content recognition in text and multimedia data and its application in information extraction and retrieval using statistical machine learning, and exploiting insights from linguistic and cognitive theories. She is currently a member of the Council of the Industrial Research Fund of KU Leuven and is the scientific manager of the EU COST action iV&L Net (The European Network on Integrating Vision and Language). She is a member of the editorial board of the journal Foundations and Trends® in Information Retrieval. In 2011 and 2012 she was appointed as chair of the European Chapter of the Association for Computational Linguistics (EACL) and was a member of the executive board of the Association for Computational Linguistics (ACL). From 2010 until 2014 she was a member of the Research Council of KU Leuven.

  • Mon 29 Feb (12-1pm) – Research Group Meeting directly followed by:
  • Mon 29 Feb (1-2pm) – Reading Seminar – Deep Learning

Continuing to review the above slides on back-propagation.

  • Mon 7 March (12-1pm) – Reading Seminar – Deep Learning
  • Mon 14 March (12-1pm) – Research Group Meeting (Mijail Kabadjov, CSEE)
  • Wed 16 March (4-6pm) – Departmental Seminar – Josef Steinberger – MediaGist

MediaGist: A cross-lingual analyser of aggregated news and commentaries
Dr Josef Steinberger (University of West Bohemia)
16:00 Weds 16 March 2016

MediaGist is an online system for crosslingual analysis of aggregated news and commentaries based on summarisation and sentiment analysis technologies.
It is designed to assist journalists to detect and explore news topics, which are controversially reported or discussed in different countries.
News articles from the current week are clustered separately in currently 5 languages and the clusters are then linked across languages. Sentiment analysis provides a basis to compute controversy scores and summaries help to explore the differences. Recognized entities play an important role in most of the system’s modules and provide another way to explore the data. I will describe the key modules of the system and demonstrate the capabilities of MediaGist with highlights from recent weeks.

Short Bio: Josef Steinberger is an associate professor at the Department of Computer Science and Engineering at the University of West Bohemia, Czech Republic. He holds an M.Sc. and a Ph.D. degree in Computer Science from this university. His main interests are in the domain of summarisation and sentiment analysis. From 2009 until 2012 he was part of the team at the Joint Research Centre of the European Commission, Italy, working on Europe Media Monitor. Building media monitoring solutions brought other topics of interest, like news clustering, categorisation or named entity recognition. To achieve high multilinguality, his aim is to limit dependency on a particular language or to build multilingual resources. He has been actively involved in the Multiling community, which organizes summarisation evaluation campaigns. In 2013, he received Marie Curie funding with the MediaGist project.

  • Mon 14 March (6pm) – London Text Analytics Meet up

Set up and co-organised by Professor Udo Kruschwitz, this group is for people interested in learning about and discussing topics related to text analytics (aka natural language processing). The group began life as the London GATE users group, but has since expanded to embrace other NLP platforms / toolkits and a growing interest in the use of text analytics for applications in areas such as search, social media, intelligence, life sciences, customer experience and more. They welcome both researcher and practitioner viewpoints alike.

Interests include:
Text Analytics · Natural Language Processing · Text Mining · Search, Information Retrieval · Speech Recognition

  • Mon 21 March (12-1pm) – Reading Seminar – Deep Learning

Autumn Term 2014

  • Friday, 5th December: Profile-Based Summarisation for Web Site Navigation
    Time: 2-4pm
    Room: Colloquium Room

We presented in detail the work we conducted, which has been accepted in one of the leading journals in the area of information retrieval (ACM TOIS). We also provided some information about the different types of statistical tests, in particular the tests we applied to our data.

  • Friday, 21st November: Time: 2-3pm, Room: Colloquium Room

Significance testing: what makes NLP different from other fields and what problems NLP faces. It would be good to have a look at the Søgaard et al. (2014) paper: What is in a p-value in NLP? (CoNLL).
By: Ans Alghamdi

  • Friday, 7th November: Time: 2-3pm
    Room: Colloquium Room

Everyone is invited to talk about what they are up to, plus we will plan some talks for the next couple of meetings.

Spring Term 2014

This term we looked at big data computation with Hadoop and other tools.

This is the wiki for the Language and Computation Group at the University of Essex.

Summer Term 2014

  • Tuesday, 17th June 2014: Research Group Meeting; Location and time: TBA, 12.00-2.00

Silviu Paun will give a dry run of the work he’s going to present at PGNET

  • Tuesday, 3rd June 2014: Prof. Stephen Pulman (University of Oxford) – Compositional sentiment analysis; Location and time: 1N1.4.1, 12.00-2.00

Abstract: Sentiment Analysis – recognising positive and negative attitudes expressed in text – has become a very popular application of computational linguistics techniques, spawning a large number of startups, and generating a lot of commercial interest. In this talk I summarise recent research and trends in sentiment analysis, look at some relatively novel applications, and also critically examine various claims that have been made about the role of sentiment analysis in tasks like stock market prediction and election result forecasting.


Short bio: Stephen Pulman is a Professorial Fellow of Somerville College, Oxford, and a Fellow of the British Academy. He has also held visiting professorships at the Institut für Maschinelle Sprachverarbeitung, University of Stuttgart; and at Copenhagen Business School. He is a co-founder of TheySay Ltd, a sentiment analysis company which was spun out of the department in 2010.

  • Tuesday, 20th May 2014: Research Group Meeting; Location and time: 3.501, 12.00-2.00; Richard Sutcliffe and Chris Fox (University of Essex) – C@MERATA at MediaEval 2014 – Extracting Answer Passages from Classical Music Scores using Natural Language Descriptions

Abstract: When studying musicological analyses of works of western classical art music there are frequent references to relevant passages in the printed score. Indeed, musicologists can refer to very complex aspects of a score within a text. Other experts know what passages they are talking about because they can interpret musical terminology in an appropriate fashion and can then look through scores to find the passages in question. However, this can be time consuming. So the long-term aim here is to develop tools which could facilitate the work of musicologists.

In the C@merata task, there will be a series of questions with required answers. Each question will consist of a short noun phrase in English referring to musical features in a score and a short classical music score in MusicXML. The required answer will consist of zero or more passages occurring in the score which contain the musical features specified in the question. A passage consists of a start point and an end point in the score associated with the question.

This talk will present our work up to now on the C@merata Task, including details of the task and directions for the future.

More information about the project is available from the C@MERATA 14: Question Answering on Classical Music Scores page

  • Tuesday, 8th April 2014, CS Colloquium room, 12.00-2.00

The group met to discuss ideas about what to cover in the summer term and the frequency of meetings.

  • Wednesday, 26th February 2014: Bob Carpenter (Columbia University) – Probabilistic Models of Annotation – Room and Time: CS Colloquium room, 12.00-2.00

Abstract: Standard agreement measures for inter-annotator reliability are neither necessary nor sufficient to ensure a high quality corpus. Compared to conventional agreement measures (e.g., kappa, alpha), probabilistic annotation models provide more information about quantities of interest including gold standard labels, sense prevalence, and individual annotator accuracies and biases.

I will present a large scale, open access, case study of word sense annotation using examples drawn across genres from the American National Corpus (ANC) with WordNet word senses. I will contrast in-house trained and supervised annotators with crowdsourced annotations gathered using Amazon’s Mechanical Turk.

I will conclude with an application of mutual information (i.e., expected information gain) to measure the value of gathering a label from an annotator.

This is joint work with Becky Passonneau.
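
For readers unfamiliar with the conventional agreement measures the abstract contrasts with probabilistic models, the following is a minimal sketch of Cohen's kappa for two annotators. It is an illustration only, not material from the talk; the function name and the toy labels are hypothetical.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators (Cohen's kappa).

    Hypothetical helper for illustration; assumes both lists have the same
    length and that expected agreement is below 1 (otherwise the ratio is undefined).
    """
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labelled at random
    # according to their own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Toy word-sense labels from two annotators (made-up data).
annotator_1 = ["x", "x", "y", "y"]
annotator_2 = ["x", "y", "y", "y"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

A single kappa value summarises agreement but says nothing about which annotator is more accurate or biased, which is the kind of information the abstract attributes to probabilistic annotation models.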

  • Thursday, 6th February: Gareth J. F. Jones, CNGL: Centre for Global Intelligent Content, School of Computing, Dublin City University, Ireland – Utilizing Recommender Algorithms for Enhanced Information Retrieval – Date and time: 10-12, Room 3.411

Abstract: Retrieving relevant items which meet a user’s information need is the key objective of information retrieval (IR). Current IR systems generally seek to satisfy search queries independently without considering search history information from other searchers. By contrast, algorithms used in recommender systems (RSs) are designed to predict the future popularity of an item by aggregating ratings of the reactions of previous users of an item. This observation motivates us to explore the application of RS methods in IR to increase search effectiveness. In this study, we examine the suitability of recommender algorithms (RAs) for use in IR applications and methods for combining RAs into IR systems by fusing their respective outputs. Novel methods are introduced including an adapted RA to enhance performance of RSs in our integrated application, and an approach using cluster-based link analysis. Experimental results are reported for an extended version of the FIRE 2011 personalized IR data collection.

(joint work with Wei Li)

Autumn Term 2013

This term we looked at big data computation with Hadoop and other tools.

  • Big Data sessions – MapReduce

In this first “Big Data” meeting we are going to focus on MapReduce, a programming model originally developed by Google and used to process large data sets in parallel.
The meeting will take place on Thursday 7th November between 10:00 AM – 12:00 PM
The location is: Room 3.411
Speaker: Silviu Paun
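
As a rough illustration of the programming model discussed in this session, the sketch below simulates the map, shuffle and reduce phases of a word count in plain Python on a single machine. It is not the session's material and does not use Hadoop; all function names and the sample documents are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_phase(doc_id, text):
    """Map: emit a (word, 1) pair for every token in one document."""
    return [(word.lower(), 1) for word in text.split()]

def shuffle(pairs):
    """Shuffle: group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(word, counts):
    """Reduce: combine all values emitted for one key."""
    return word, sum(counts)

def word_count(docs):
    # In a real cluster the map calls run in parallel on different machines;
    # here they are simply chained together.
    mapped = chain.from_iterable(map_phase(i, t) for i, t in docs.items())
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

counts = word_count({"d1": "big data big ideas", "d2": "data tools"})
```

Because each map call sees only one document and each reduce call sees only one key, both phases can be distributed without shared state, which is what makes the model suitable for large data sets.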

  • Big Data sessions – MapReduce part 2

We decided to split the MapReduce workshop into two consecutive sessions because of the large volume of material needed to give a comprehensive understanding of the framework.

The meeting for this second session on MapReduce will take place on Thursday 14th November between 10:00 AM – 12:00 PM
The location is: Room 3.411
Speaker: Silviu Paun

  • Big Data sessions – MapReduce part 3

The focus of this third session is on inverted indexing for text retrieval and on various index compression techniques.

The meeting will take place on Thursday 21st November between 10:00 AM – 12:00 PM

The location is: Room 3.411
Speaker: Deirdre Lungley
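
To make the topic concrete, here is a minimal sketch of an inverted index whose postings lists are compressed with gap encoding followed by variable-byte coding. This is one common textbook combination and is only assumed to be representative of the techniques covered; all function names and the toy documents are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each term to a sorted list of the document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def to_gaps(postings):
    """Keep the first doc id, then differences between consecutive ids."""
    return [postings[0]] + [b - a for a, b in zip(postings, postings[1:])]


def vbyte_encode(number):
    """Variable-byte code: 7 payload bits per byte, high bit marks the last byte."""
    chunks = [number % 128]
    number //= 128
    while number > 0:
        chunks.insert(0, number % 128)
        number //= 128
    chunks[-1] += 128  # flag the final byte of this number
    return bytes(chunks)


def vbyte_decode(data):
    """Decode a concatenation of variable-byte codes back into integers."""
    numbers, current = [], 0
    for byte in data:
        if byte < 128:
            current = current * 128 + byte
        else:
            numbers.append(current * 128 + (byte - 128))
            current = 0
    return numbers


# Hypothetical toy collection with integer document ids.
docs = {1: "hadoop index", 7: "index compression", 300: "index"}
index = build_inverted_index(docs)
gaps = to_gaps(index["index"])
encoded = b"".join(vbyte_encode(g) for g in gaps)
```

Small gaps fit in a single byte each, which is why sorting postings and storing differences pays off before the byte-level code is applied.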

  • Big Data sessions – MapReduce part 4

This is a lab session targeting the practical side of MapReduce: hands-on experience with code in an environment with Hadoop installed. I will go through some code samples and explain step by step how the Hadoop components (including MapReduce) work together.

The meeting will take place on Friday 20 December 2013 between 11:00 AM – 1:00 PM.

The location is: Room 5B.531
Speaker: Silviu Paun

  • Big Data sessions – MapReduce part 5

The session will focus on graph processing in MapReduce; more specifically, it will illustrate how algorithms such as Dijkstra's algorithm and PageRank can be written in a MapReduce style.

The meeting will take place on Thursday 30 January 2014 between 10:00 AM – 12:00 PM.

The location is: Room 3.411
Speaker: Silviu Paun
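
The following is a minimal, single-process sketch of how one PageRank iteration can be phrased as a map step and a reduce step. It is an illustration under stated assumptions rather than the code used in the session: the damping factor of 0.85, the toy graph and all function names are hypothetical, and the graph is assumed to have no dangling nodes (every node has at least one out-link).

```python
from collections import defaultdict

DAMPING = 0.85  # conventional value; an assumption, not taken from the session


def map_node(node, rank, out_links):
    """Mapper: pass the graph structure along and send rank shares to neighbours."""
    yield node, ("links", out_links)
    for target in out_links:
        yield target, ("rank", rank / len(out_links))


def reduce_node(node, values, num_nodes):
    """Reducer: recover the adjacency list and sum the incoming rank shares."""
    out_links, incoming = [], 0.0
    for kind, payload in values:
        if kind == "links":
            out_links = payload
        else:
            incoming += payload
    new_rank = (1 - DAMPING) / num_nodes + DAMPING * incoming
    return node, new_rank, out_links


def pagerank_iteration(graph, ranks):
    """One full map -> shuffle -> reduce pass over the graph."""
    grouped = defaultdict(list)
    for node, out_links in graph.items():
        for key, value in map_node(node, ranks[node], out_links):
            grouped[key].append(value)
    new_ranks = {}
    for node, values in grouped.items():
        _, rank, _ = reduce_node(node, values, len(graph))
        new_ranks[node] = rank
    return new_ranks


# Hypothetical three-node graph: node -> list of nodes it links to.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = {node: 1 / len(graph) for node in graph}
for _ in range(20):
    ranks = pagerank_iteration(graph, ranks)
```

The mapper re-emits each node's adjacency list so that the reducer's output can feed the next iteration; in a MapReduce setting each iteration is typically a separate job.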

Language and Computation Day

Monday, 30th of September 2013, all day — 1N1.4.1

This is a wiki for the Language and Computation Group at the University of Essex. This page is used to maintain information about our regular meetings, with links to relevant papers and other resources.

Language and Computation Day

Thursday, 4th October 2012, all day — 1N1.4.1

Autumn Term 2012

  • Week 8 – BUGS to LDA – Wednesday, 21st October 2012, 12-1pm – room 2.408 – Bayesian inference Using Gibbs Sampling to Latent Dirichlet Allocation

Language and Computation Day

Week One, Friday, 7th October 2011, all day — room 3.405

Autumn Term 2011

Autumn Term meetings were normally scheduled from 13.00 till 14.00 on alternating Thursdays and Fridays in rooms 2.406 and 3.406 respectively. For more details see the week-by-week schedule breakdown below. We usually continued discussions over coffee/lunch.

  • Week 8 – Thursday, 24th November 2011, 1-2pm – 2.406

This week, Lubomir Krcmar will present his recent work on “Compositionality prediction using semantic spaces”. Lubomir will first talk about the Query Expansion technique in Information Retrieval, linking it with semantic spaces and the problem of compositionality. Finally, he will discuss the ACL 2011 task and compositionality prediction.

His slides are available from this link.

  • Week 9 – Friday, 2nd December 2011, 1-2pm – 3.406

Dyaa has offered to do a short presentation on WEKA and Bayesian Networks.

  • Week 10 – Thursday, 8th December 2011, 1-2pm – 2.406

Mahmoud is giving a presentation. Further info to be confirmed soon.

  • Week 11 – Friday, 16th December 2011, 1-2pm – 3.406

Massimo is carrying on with Knight’s tutorial “Bayes with tears”. (page 13 onwards).

Saturday, 17th December 2011, tbc – Richard’s House
LAC group Christmas social. More info to be circulated to the lacworkshop list in due course.

Spring Term 2012

  • Week 16

We will look at the first chapter of the Gelman and Hill book (2006) on Data analysis using regression and multilevel/hierarchical models. ISBN: 9780521686891
DOI: 10.2277/052168689X. The table of contents is available from this link.

The Albert Sloman Library holds a printed copy (HA 31.3.G4) and an online book (record) that can be viewed here via the NetLibrary. Check this link too (essex users only).


Autumn Term meetings were normally scheduled at 12.00 till 13.00 every Thursday in Room 3.410. We usually continued discussions over coffee/lunch.

Spring Term meetings were normally scheduled at 13.00 till 14.00 every Thursday in Room 3.407 for wks 19-20 and Room 3.411 for wks 21-24. We usually met at 12.00 before the meeting for a coffee/lunch and chat.

Summer Term meetings are normally scheduled at 13.00 till 14.00 every Thursday in Room 5B.105 for wks 31-32 and Room 5A.108 for wks 33-39. We usually meet at 12.00 before the meeting for a coffee/lunch and chat.

Topics for 2010—11

  • To be scheduled, from last year:
    • Analysis of Real Data in R
    • Doug on BAe message analysis project
  • To be scheduled, from this year’s planning meeting:
    • Doing Statistics with R
    • shallow named entity recognition/disambiguation
    • adapting to new domains

Autumn Term 2010

Meeting held to decide timing and topics of LAC meetings during the Autumn Term.

The times convenient for most of those present were Thursdays 12-1pm or Thursdays 4pm-5pm (with a preference for the first slot), both followed by coffee/lunch/drinks (depending on the hour) somewhere on campus, starting from Thursday 28th October and meeting weekly.

Could those not present let the group know if the times suggested are convenient, by emailing the list?

The topics that we thought would be interesting to look at this term were added at the top of this page.

Massimo will lead the first meeting, suggesting papers to the group to read (links and further information to appear soon on this page).

  • Week 3 – Thursday, 21st October 2010

No meeting this week.

  • Week 4 – Thursday, 28th October 2010, 12pm-1pm in 3.410

Massimo will lead the first meeting, discussing the following paper: Edgar Meij, Marc Bron, Laura Hollink, Bouke Huurnink, Maarten De Rijke. Learning Semantic Query Suggestions. [citeseerX record] [pdf]

  • Week 5 – Thursday, 4th November 2010, 12pm-1pm in 3.410 – Virtual Machines for NLE.

Dyaa to give a demo of a virtual machine. Discussion of tools that could be installed on a LAC group virtual machine to provide a common NLE development environment. This would allow people to use and share a common environment without being obliged to install, upgrade or downgrade packages installed on their own machines, and without having to install a particular host operating system.

For some general information on virtual machines, see for example VirtualBox and VMware. Other virtual machines are available, but many are either not “free” or do not run under MS Windows. For licensing reasons, pre-packaged virtual machines typically run a “free” operating system, such as Linux or some version of BSD. Many virtual machines, such as VirtualBox and VMware, can host MS Windows. This may be covered by the University’s site licence. Although technically it may be possible to do so, Apple does not allow its operating systems to be installed in a virtual machine.

  • Week 6 – Thursday, 11th November 2010, 12pm-1pm in 3.410 – Review a selection of ACL 2010 papers

Deirdre – P10-1138 : Celina Santamaría; Julio Gonzalo; Javier Artiles
Wikipedia as Sense Inventory to Improve Diversity in Web Search Results [pdf]

Kakia – P10-1107 : Benjamin Snyder; Regina Barzilay; Kevin Knight
A Statistical Model for Lost Language Decipherment [pdf]

  • Week 7 – Thursday, 18th November 2010, 12pm-1pm in 3.410 – Review a selection of ACL 2010 papers

Dyaa – P10-2068 : David Vickrey; Oscar Kipersztok; Daphne Koller
An Active Learning Approach to Finding Related Terms [pdf]

Jon – P10-1030 : Fei Wu and Daniel S. Weld
Open Information Extraction Using Wikipedia [pdf]

  • Week 8 – Thursday, 25th November 2010, 12pm-1pm in 3.410

We reviewed selected papers for an upcoming information retrieval conference.

  • Week 9 – Thursday, 2nd December 2010, 12pm-1pm in 3.410

Dyaa could take us through the session track results for TREC 2010 or we can carry on reviewing papers with Udo. Or we can come up with something else.

Kakia could present (now or in the future) an overview of, or selected papers from, the NAACL HLT 2010 Fifth Workshop on Innovative Use of NLP for Building Educational Applications, or the Proceedings of the 2010 Workshop on NLP and Linguistics: Finding the Common Ground.

Please put your suggestions on the wiki.

  • Week 10 – Thursday, 9th December 2010, 12pm-1pm in 3.410 – Invited talk by Andrew Anderson, UCL – “On measuring covariation in speech production”.

Abstract: Over the past 15 years, measures of speech motor variability in repetitions of the same utterance have become popular for indexing the severity of motor disorders, charting development and aging, and probing how experimental manipulations (e.g. linguistic complexity, speech speed) tax the production process. Although they have been successful in teasing out differences between experimental groups, measures of variability have been crude in the following senses: they tend to focus on a single element or a small subset of elements contributing to speech production (most commonly lower lip displacement); they tend not to take into account spatial and temporal components of variation (which can to an extent be modified independently in production); and where covariation across effectors has been measured, this has almost exclusively been based on pointwise comparisons (e.g. covariation in the timing of turning points across effectors).

There is intuitive motivation to move to a more holistic analysis of production, taking into account continuous activity in as many components as can be measured. This talk grounds this motivation in experimental data; shows how Functional Data Analysis (statistical methods focusing on analysing curves) can be extended to derive continuous measures of spatio-temporal synchronisation and asynchrony across multiple effectors; considers how non-invasive audio/visual and MRI techniques can be applied to measure joint activity conventionally measured invasively; and outlines MATLAB software packages written by the author that allow non-specialised users to undertake complex analyses of biomechanics through a point-and-click graphical interface.

  • Week 11 – Thursday, 16th December 2010, 12pm-1pm in 3.410

Spring Term 2011

  • Weeks 16-18 – no meetings
  • Week 19 – Thursday, 10th February 2011, 1pm-2pm in 3.407 – Spring term planning meeting

We agreed that 1-2pm was a convenient time for most and that Kakia would be booking a room (the same or any in this corridor if possible) for future meetings.

The group agreed that Bayesian models are an interesting topic to look into this term. To that end, we will be going through the chapters of the following book: Title tbc, Daphne Koller

Other resources to look into suggested in the meeting include:

    • Collective Intelligence book (reference to be confirmed)
    • WEKA tools for visualising Bayesian methods


  • Week 20 – Thursday, 17th February 2011, 1pm-2pm in 3.407

At TREC 2010 Essex submitted a run to the Session Track. Dyaa will give an overview of the track and the Essex submissions, and present the results. The working paper included in the TREC notes can be found here:

  • Week 21 – Thursday, 24th February 2011, 1pm-2pm in 3.411 – (note the change of room)

Starting this week we will read through the chapters of the following book: Daphne Koller and Nir Friedman. (2009). Probabilistic Graphical Models: Principles and Techniques. MIT Press. [book site]

We started this week with the first chapter (introduction).

  • Week 22 – Thursday, 3rd March 2011, 1pm-2pm in 3.411

This week Mahmoud and Roseline are presenting CLUTO, a cross-platform graphical application for clustering low- and high-dimensional datasets.

More information about CLUTO is available here:

Roseline’s presentation

Mahmoud’s presentation

  • Week 23 – Thursday, 10th March 2011, 1pm-2pm in 3.411 – No meetings this week.
  • Week 24 – Thursday, 17th March 2011, 1pm-2pm in 3.411

We are reading the first half of chapter 3 of the Koller and Friedman (2009) book “The Bayesian Network Representation” (sections 3.1 and 3.2). Deirdre to lead the discussion.

Spreadsheet implementation of figure 3.4, p. 53 of Koller and Friedman (2009) (.ods Open Office Spreadsheet format)
(.xls MS Excel Spreadsheet format)

More information on how to get the chapter was circulated on the LAC group list.

  • Week 25 – Thursday, 24th March 2011, 1pm-2pm in 3.411

We are reading the second half of chapter 3 of the Koller and Friedman (2009) book “The Bayesian Network Representation” (sections 3.3 and 3.4). Dyaa to lead the discussion.

  • Week 26 (Easter Vacation) – Thursday, 31st March 2011, 1pm-2pm in 3.409

We are reading the remainder of the second half of chapter 3 (sections 3.3 and 3.4). Dyaa and Deirdre to lead the discussion.

  • Week 27 (Easter Vacation) – Thursday, 7th April 2011, 1pm-2pm in 3.409 – No meeting this week.

Summer Term 2011

  • Week 30 – no meetings
  • Week 31 – Thursday, 5th May 2011, 1pm-2pm in 5B.105

We are reading section of the PGM book onwards. We have also agreed to present one exercise each from the list of exercises on page 96 onwards, as follows:

Kakia: p. 96, ex. 3.5
Deirdre: p. 98, ex. 3.12
  • Week 32 – Thursday, 12th May 2011, 1pm-2pm in 5B.105 – We worked on exercise 3.5.
  • Week 33 – Thursday, 19th May 2011, 1pm-2pm in 5A.108 – We will carry on with the rest of the exercises.
  • Week 34 – Thursday, 26th May 2011, 1pm-2pm in 5A.108 – no meeting this week
  • Week 35 – Thursday, 2nd June 2011, 1pm-2pm in 5A.108 – no meeting this week
  • Week 36 – Thursday, 9th June 2011, 1pm-2pm in 5A.108 – Invited talk by Dr. Kamaran Fathulla, IA, Essex – “A richer understanding of Bayesian Network Diagrams”

Abstract: This talk introduces a richer approach for understanding Bayesian Network Diagrams. The approach states that diagrams, including Bayesian types, have an underlying higher level structure that transcends their application specifics. The way to identify these structures requires a departure from the limitations of our existing reductionist ways of understanding diagrams. The suggested approach treats diagrams as real world entities functioning in a number of Aspects. The Spatial and Symbolic Aspects are considered as the ones needed to elegantly capture the meaning of diagrams and distinguish them from other types of entities. The way the two Aspects are manipulated to generate a number of meta modes of diagrams is discussed. Aided by these modes, a rethink of Bayesian type diagrams is worked out. The topic is clearly rooted in philosophy, so a short introduction to the philosophical underpinnings is given.

Talk slides

Additional papers:

  • Week 37 – Thursday, 16th June 2011, 1pm-2pm in 5A.108 – This week we carried on with chapter 4 of the PGM book (section 4.1). Kakia to lead the discussion. Slides
  • Week 38 – Thursday, 23rd June 2011, 1pm-2pm in 5A.108 – This week we carry on with section 4.2. Kakia to lead the discussion. The resource mentioned in the meeting is available from here (essex users only).
  • Week 39 – Thursday, 30th June 2011, 1pm-2pm in 5A.108 – no meeting this week
  • Week 40 – Thursday, 7th July 2011, 1pm-2pm in 5A.108 – This week we carry on with the last part of section 4.2. Kakia to lead the discussion.

Topics for 2009—10

  • To be scheduled:
    • Web usage mining (Sharhida)
    • Anawiki Presentation (Jon)

Autumn Term 2009

  • Language and Computation Day – Week One, Thursday, 8th October 2009, all day – 5A.332 – Details available from this page
  • Week 2 – Thursday, 15th October 2009, 10am-12pm – 1NW.3.7

Meeting held to decide timing and topics of LAC meetings during the Autumn Term.

The LAC workshop meetings for this term were scheduled on Monday, 11am-1pm in room 3.406. Apologies to those that cannot make this time.

The running topic will be learning about the “R” open source statistics package. The current proposal is that we alternate between sessions on “R” and other presentations and activities.

  • Week 3 – Monday, 19th October 2009, 11am-12pm in 3.406

Introduction to R, an open source implementation of the S language.

Some of the things that the group would like to do with R this term are:

  • Hierarchical linear modelling
  • Drawing nice graphs/plots
  • Curve Fitting
  • Frequencies
  • Understanding Statistics better
  • Apply to teaching and PhD student supervision
  • Non-normal distribution
  • Apply to Data
  • Use it as a (free & open-source) platform for statistical tests

We will be reading from the following books:

  1. Baayen, R. (2008). Analyzing linguistic data: A practical introduction to statistics using R.
  2. Dalgaard, P. (2008). Introductory statistics with R.
  3. Gries, St. (2009). Quantitative corpus linguistics with R. See accompanying site.

FAQ and manuals can be found at: R Homepage.

Some resources are also available here, in the LAC group material folder (accessible by Essex users only).

  • Week 4 – Monday, 26th October 2009, 11am-12pm in 3.406 – We discussed chapters 1-3 from Dalgaard (2008). R command file for the first R lab attached to this page (click on Files at the right bottom corner of the page).
  • Week 5 – Monday, 2nd November 2009, 11am-12pm in 3.406 – We discussed chapters 3-4 from Dalgaard (2008). R command file for the second R lab attached to this page (click on Files at the right bottom corner of the page).
  • Week 6 – Monday, 9th November 2009, 11am-12pm in 3.406 – We will be discussing sections 3.5.3 and chapter 4 from Dalgaard (2008).
  • Week 7 – Monday, 16th November 2009, 11am-12pm in 3.406 – Continued discussion of R.
  • Week 8 – Monday, 23rd November 2009, 11am-12pm in 3.406 – Continued discussion of R.
  • Week 9 – Monday, 30th November 2009, 11am-12pm in 3.406 – Continued discussion of R.
  • Week 10 – Monday, 7th December 2009, 11am-12pm in 3.406 – Continued discussion of R.
  • Week 11 – Monday, 14th December 2009, 11am-12pm in 3.406 – Continued discussion of R.

Spring Term 2010

  • Week 16—19 – Monday, 11th January—8th February 2010, 11am-1pm in 3.406 – Continued discussion of R.
  • Week 20 – Monday, 15th February 2010, 11am-1pm in 3.406 – Planning meeting for future topics.
  • Week 21 – Monday, 22nd February 2010, 11am-1pm in 3.406 – Regression analysis with R.
  • Week 22 – Monday, 1st March 2010, 11am-1pm in 3.406 – Mini “Language and Computation Day” with talks by PhD students.
  • Week 23 – Monday, 8th March 2010, 11am-1pm in 3.406 – Presentation by Adela Gánem

Topic: JISC funded project on the evaluation of SIMiLLE (System for an Immersive and Mixed reality Language Learning Environment), a 3D Virtual World. More information can be found at

  • Week 24 – Monday, 15th March 2010, 11am-1pm in 3.406 – We will be reviewing papers for ACL, the ACL Student Research Workshop and ECAI. (This can be a very informative experience, and may help us write better papers, as well as giving a preview of the current trends in NLP research.)
  • Week 25 – Monday, 22nd March 2010, 11am-1pm in 3.406 – Assume there is no meeting this week unless you hear otherwise by email. Resuming meetings in the Summer Term.

Following topics to be picked up next term

  1. Analysis of real data in R
  2. Presentation by Deirdre
  3. Doug on BAe message analysis project

Summer Term meetings will normally be scheduled 9am to 11am on Friday in 5B.111A. (Please note this is not the CSEE Seminar Room.)

Language and Computation Day

Week One, Friday, 3rd October 2008, all day – 4B.531

Details available from

Autumn Term 2008

For the Autumn Term, 2008, meetings were normally on Friday at 1-3pm in the CSEE/CS Seminar Room, unless otherwise indicated. Meetings in Week Four, Five and Seven moved to Thursday, 10am-12 noon (10.30 start for Week Seven).

  • ACL 2008 papers

For much of the Autumn Term we worked through a selection of ACL 2008 papers. Here is a list indicating which papers people are willing to present. Crossed out entries have been done. Bold entries are the ones we are currently reading. Feel free to indicate which of the above papers you would be interested in presenting in the future (edit the wiki, or email Chris).

Kakia: P08-1002, P08-2008
Jon: P08-2012 (P08-1002 looks interesting too)
Chris: P08-1069, P08-1026
Deirdre: P08-1018, P08-1017
  • Week 2 – Friday, 10th October 2008, 2-4pm – 4B.531

Meeting held to decide timing and topics of LAC meetings during the Autumn Term.

We identified a number of workshops/sessions that might be of interest, and for each of them we have a volunteer who will study the program and select (and present) a paper that we will all read by next Friday.
  • W1 Human Judgements in Computational Linguistics (Udo)
  • W5 Knowledge and Reasoning for Question Answering (Chris)
  • W6 Grammar Engineering Across Frameworks (Kakia)
  • W7 2nd Information Retrieval for Question Answering Workshop (Richard)
  • Main conference: Machine Translation sessions (Doug)

Link for COLING workshops:

We could not find a slot which would suit all of us so we agreed on the following slot for now: Friday 13:00-15:00 (CS Seminar Room – the room is to be confirmed). We also identified a second slot: Thursday 11:00-13:00. We anticipate that this might be a better time in some weeks so that Dyaa and Lorna can attend as well.
Chris agreed to look into wikis and to set up a simple one so that we will manage the reading group via a wiki rather than emails or a web page. Doug will liaise with Computing Services and talk to Chris.
  • Week 3 – Friday, 17th October 2008, 1-3pm – 4B.531
We went over some topics and papers from the recent COLING conference, and identified three of interest (grammar parser, flat semantics for translation, evaluation of annotators):

Kakia to present the Santaholma (2008) paper on Friday 24.10.2008 (see below for further info).

Chris reported he had looked into various options for wikis, including in-house Microsoft software and external sites using open source wiki software. It was agreed we would initially go for a simple, externally visible wiki that supports latex markup for those that need it.
  • Week 4 – Friday, 24th October 2008, 1-3pm – 4B.531
CoLing 2008
Kakia presented the paper “Multilingual Grammar Resources in Multilingual Application Development” by M. Santaholma, from the CoLing 2008 workshop session “Grammar Engineering Across Frameworks”.
Link to the Workshop:
Link to the paper:
Link to the tool used to demonstrate the typology of Japanese, English, Finnish and Greek (World Atlas of Language Structures):
  • Week 5 – Thursday 30th October, 10am-12pm – Room 5A.540
Sonja put forward possible suggestions / directions for future research cooperation plans in the group.
For next week, we decided to meet on Thursday at 10am, as this time seemed to be the most convenient.
  • Week 6 – Thursday 6th November 2008, 10am-12pm – Room 1.NW3.7 in the Network Centre
ACL 2008
Doug and Kakia looked at the ACL 2008 papers and identified trends and / or papers of possible interest to the group.
Chris demonstrated how to use the wiki at the end of the meeting.

Here is the list of papers from ACL that people thought might be interesting.

As html on another site:

As normal (wiki) text here: ACL 2008 papers

We agreed that people would have a look at this list and select a couple of papers that they would be interested in presenting or leading the discussion on. The table of people and the papers they are willing to present is currently near the top of this page.

  • Week 7 – Friday, 14th November 2008, 1-3pm – 4B.531

Richard led a discussion on Topic Indexing and Retrieval for Factoid QA by Kisuh Ahn and Bonnie Webber, a paper from the IR4QA workshop at this year’s COLING. The complete proceedings of the IR4QA workshop can be found at

Summary: This paper is about an interesting method for going through documents, finding person names etc and associating information with all those names before doing any question answering. It is thus an extension of Prager’s Predictive Annotation and could be applicable to a variety of different tasks involving documents and the information they contain.

  • Week 8 – Thursday, 20th November 2008, 10.30-12.00, CS Seminar Room (4B.531)

[ Meeting cancelled, topic moved to Week 9. ]

  • Week 9 – Friday, 28th November 2008, 1-3pm, 4B.531

We went through some of the selected ACL papers that were identified in Week 6.

Kakia presented the P08-1002 paper by Bergsma, Lin and Goebel on Distributional Identification of Non-Referential Pronouns.

Jon presented the P08-2012 paper by Finkel and Manning on Enforcing Transitivity in Coreference Resolution.

  • Week 10 – Friday 5th December 2008

Chris gave a talk outlining some of the things he has been working on during his study leave (deontic reasoning, and the analysis of obligations and permission). This is by way of a gentle introduction to slightly more technical discussions on various deontic puzzles and paradoxes, which may form the topic of a future LAC meeting.

Notes for the talk:
Suggested background reading:

We will continue with our informal review of ACL papers next week.

  • Week 11 – 12th December 2008

Deirdre and Richard presented the first of their papers selected from ACL 2008, as follows:

  1. “Selecting Query Term Alterations for Web Search by Exploiting Query Contexts” by Guihong Cao, Stephen Robertson and Jian-Yun Nie, P08-1018 (Deirdre);
  2. “Collecting a Why-Question Corpus for Development and Evaluation of an Automatic QA-System” by Joanna Mrozinski, Edward Whittaker, Sadaoki Furui, P08-1051 (Richard).

(Chris’ paper was postponed till next term.)

  • Week 12 – 19th December 2008 – No Meeting. (Resuming next term. Time and day of week to be decided.)

Spring Term 2009

For the Spring Term, 2009, we covered a number of topics mostly relating to the current research of LAC members.

  1. Plagiarism Detection (Zdeněk)
  2. NLTK — toolkit presentation (Doug)
  3. Log analysis of UKsearch (Richard)
  4. Document Summarisation (Mahmoud)
  5. Deontic Wiki (Chris)
  6. UIMA — toolkit presentation (Dyaa)
  7. More ACL08 papers
  • Week 16 – 23rd January 2009 — 10am, CSEE Seminar Room.

A short meeting to agree the timings and some topics for the rest of the term. (Have a peek at the suggestions page for Spring Term topics and relevant information on availability.)

  • Week 17 – 23rd January 2009 — 9am, CSEE Seminar Room.

Plagiarism Detection (Zdeněk)

  • Week 18 – 30th January 2009 — 9am, CSEE Seminar Room and 10am, Room 5N.3.7.

The provisional topic “The Good Samaritan” (Chris) was postponed.

Chris chaired an alternative seminar, 10am to 11am in 5N.3.7, to which all members of LAC were invited.

The original topic is to be rescheduled for a later date.

  • Week 19 – 6th February 2009 — 9am, CSEE Seminar Room.

NLTK — toolkit presentation (Doug)

  • Week 20 – 13th February 2009 — 9am, CSEE Seminar Room.

Log analysis of UKsearch (Richard)

  • Week 21 – 20th February 2009 — 9am, CSEE Seminar Room.

Document Summarisation (Mahmoud), including a brief discussion of the problems facing Arabic NLP.

  • Week 22 – 27th February 2009 — 9am—10am, CSEE Seminar Room.

Deontic Wiki (Chris): a brief description of a Digital Library project funded by CSEE and the Philosophy Department, and some potential applications for NLP techniques.

  • Week 23 – 6th March 2009 — 9am—11am, CSEE Seminar Room.

The UIMA Framework – presentation (Dyaa)

If there is time, this may be followed by a brief session on Mendeley for managing documents (Kakia) and possibly Foxit Reader (Jon). We were hoping to schedule this part in a lab. Unfortunately no CSEE labs are available at this time, so you may want to bring your laptop if you have one.

  • Week 24 – 13th March 2009 — 9am—11am, CSEE Seminar Room.

This week we go back to discussing some of the selected ACL08 papers that were identified in Week 6.

Kakia presented paper P08-2008 by Dmitriy Dligach and Martha Palmer on Novel Semantic Features for Verb Sense Disambiguation.

  • Week 25 – 20th March 2009 — 9am—11am, CSEE Seminar Room.

Continuation of ACL08 selected papers.

  1. “A Re-examination of Query Expansion Using Lexical Resources” by Hui Fang, P08-1017 (Deirdre);

Background Reading (Fang & Zhai):

  1. “A Formal Study of Information Retrieval Heuristics”, sigir04-formal.pdf;
  2. “An Exploration of Axiomatic Approaches to Information Retrieval”, sigir05-axiom.pdf;
  3. “Semantic Term Matching in Axiomatic Approaches to Information Retrieval”, sigir06-expansion.pdf;
  • Week 30 – 24th April 2009 — 9am—11am, 5B.111A.- Please note the change in room for this event for the Summer Term. Discussed programme for the coming term.
  • Week 31 – 1st May 2009 — 9am—11am, 5B.111A. –“The Good Samaritan and the Hygienic Cook.” (Chris)

Version of a talk to be presented at Philosophy of Language and Linguistics (PhiLang2009)

Preliminary Abstract: When developing formal theories of the meaning of language, it is appropriate to consider how apparent paradoxes and conundrums of language are best resolved. Unfortunately, given the complexity of language, it is not always entirely clear how to apportion the “blame” for our intuitions about a given example, and how the interpretation of language is best factored into different aspects of meaning. Furthermore, variations in the wording sometimes appear to give rise to very different intuitions. Perhaps it is variations in the behaviour of similar examples that may help give some clues as to the appropriate factorisation of the interpretation of language, and help us to refine our understanding of problematic phenomena.

In the case of Deontic Logic, which seeks to model reasoning with obligations and permissions, there are a range of familiar paradoxes, including the so-called Good Samaritan Paradox (Prior 1958), where we wish to avoid any implication that we ought to rob someone if we assent to the obligation to “help a robbed man”. Such an obligation may be expressed by a sentence like the following.

(1) You must help a robbed man.

The fact that we do not normally take this to mean that we are obliged to rob someone leads to questions about the implicit scoping of the obligation operator with respect to modifier expressions (Castañeda 1981), or whether obligation distributes across conjunction, as in Standard Deontic Logic (e.g. McNamara 2006). In the literature, the intuitions about such examples are assumed to be clear and obvious, even if the means by which they are best captured is open to some debate.

The quality and nature of these intuitions might be undermined, or perhaps refined, if we consider examples of the same form, but with different words, such as

(2) You must use a clean knife.

In this case, we may be happy to conclude that there is an obligation to ensure the knife has been cleaned. But what then is the source of the intuition that there is no obligation to rob in (1)? Perhaps our intuitions about how such examples are best analysed are influenced by pre-existing moral assumptions and value judgements (e.g., that it is wrong to rob). If we fail to take this possibility into account, then our intuitions about specific examples may lead us astray when seeking universal rules governing deontic statements. If we are to take this possibility into account, then the question remains as to how we are to do so. We could consider obligations as defeasible (Bonevac 1998, Makinson and van der Torre 2003). More specifically, we might attribute our intuitions to some defeasible generic interpretation (Carlson & Pelletier 1995). Alternatively, we might consider the different intuitions arising from some implicit focus on “help” in (1) and “clean” in (2).

This exemplifies the difficulty in attributing our intuitions about a particular example to a particular facet of language, and the need to look for examples that challenge our intuitions when formulating general principles.

  • Week 32 – 8th May 2009 — 9am—11am, 5B.111A. – Stephen to present the Autoadapt project.
  • Week 33 – 15th May 2009 — 9am—11am, 5B.111A. – No meeting.
  • Week 34 – 22nd May 2009 — 9am—11am, 5B.111A. – Kakia to present a brief report on the recent EACL 2009 Conference in Athens, Greece. slides – Deirdre to present a brief report on the recent ECIR 2009 Conference in Toulouse, France. Slides

(Other conference reports and discussions of upcoming summer school and conference programmes also welcome.)

  • Week 35 – 29th May 2009 — 9am—11am, 5B.111A.

This week we will each present 15-minute summaries of EACL09 papers.

DBpedia – A Linked Data Hub and Data Source for Web and Enterprise Applications, Georgi Kobilarov, Chris Bizer, Sören Auer, Jens Lehmann, WWW2009 Developer’s Track Proceedings.

A Brief Report on the Semantic Search Workshop at the WWW 2009 conference, Madrid, Spain.

NLP and the humanities: the revival of an old liaison, Franciska de Jong summary slides
  • Week 36 – 5th June 2009 — 9am—11am, 5B.111A.
Learning Efficient Parsing, Gertjan van Noord

Automatic Single-Document Key Fact Extraction from Newswire Articles, Itamar Kastner, Christof Monz

Topics for Summer Term 2009

  • Scheduled
  1. 1st May: The Good Samaritan (Chris)
  2. 8th May: Autoadapt project (Stephen)
  3. 15th May: (no meeting)
  4. 22nd May: Conference reports and programmes (Deirdre, Kakia, and all?)
  5. 29th May: Adaptive Domain Models (Deirdre)
  • To be scheduled
    • Web usage mining (Sharhida)
    • Anawiki Presentation (Jon)

Language and Computation Seminar, 2005/06: 
Empirical Methods in NLP

This year’s seminar is about how to design an experiment both in general and with specific application to NLP, how to test a hypothesis and, more generally, how to evaluate an NLP system. We will try to use examples from anaphora resolution, but we will also read experimental work from other areas of NLP.

The most novel feature of this year’s seminar is that this is going to be an AUDIENCE PARTICIPATION SEMINAR, meaning that the participants (you) are all expected to present some material; Massimo will give only a few presentations, so have a look at the topics identified below and decide what you’d like to read.

This term, the seminar will meet in the Colloquium Room in Computer Science (5A.540, next to Massimo’s office), Tuesdays, 11-12:45.

  • Primary Text:
    • Paul Cohen, Empirical Methods in AI, MIT Press, 1995
  • Supplementary Readings I: Statistics
    • Woods, Fletcher, and Hughes,  Statistics in Language Studies. Cambridge.
    • R. Kirk, Experimental Design, Brooks / Cole
  • Experimental design (in psychology): a first introduction
    • October 11th, Sonja
      • Readings:
        • Sonja’s handout
        • Cohen, ch. 3
        • Kirk, chapter 1
  • Hypothesis testing: a first introduction 
  • Experimental design II: Latin Square design
  • Hypothesis testing II: the t-test and its applications
    • November 15th: t-test (Mijail)
      • Dietterich, 1998
      • For a more basic intro, see Cohen ch. 4 /  Woods and Hughes ch 8
    • November 22nd: use of t-test to compare the performance of anaphora resolution systems:
  • Evaluation in NLP & Anaphora Resolution (November 29th)
  • Hypothesis testing III: Computer-intensive methods  (December 6th)
    • General motivation (for which kinds of population parameters can’t you use the t-test?)
      • Readings: Cohen, ch. 5 (handouts still available from Massimo)
    • Further readings:
      • The standard reading is Noreen, E. W., (1989), Computer intensive methods for testing hypotheses, John Wiley and Sons (but our library doesn’t have it)
      • Alternatives: Manly, B. F., Randomization and Monte-Carlo methods in biology, Chapman and Hall, 1991
      • Edgington, E. S., Statistical inference: the distribution-free approach, McGraw-Hill, 1969.
  • Experimental design, IV:  Power calculations (December 13th)
    • December 13th: Nancy
      • Main readings: Cohen, ch. 4
      • Further reading: R. Kirk, ch. 1 (t-test)
    • January 10th: Riccardo Russo
    • January 17th: An example of power calculations – dual models vs connectionist explanations of morphology (Sonja)
  • Hypothesis Testing IV: ANOVA
    • January 24th, Theoretical introduction (Ron)
      • Readings: Cohen ch. 7? Woods-Hughes ch. 12?  Kirk chapter 5?
    • ANOVA in psychology:
      • The Poesio et al 2001 paper on underspecification in Anaphora?
      • The Spivey et al paper?
    • ANOVA in NLP: examples
  • Experimental design IV: examples of good practice in experimental design in AR & NLP
  • Hypothesis testing V: Chi-square  
    • Basic intro: Olivia? 
      • Readings: Woods and Hughes ch 9?
    • Applications to anaphora / NLP
      • Readings: Poesio to appear??
    • An alternative to Chi-square: log-likelihood (Dunning CL 1993)
  • Experimental design, III: Sample design
    • General intro
      • Readings: from R. Kirk
    • Corpora used in AR / NLP
  • Hypothesis Testing VI: Other distributions
    • Binomial & the sign test
    • Poisson
  • Additional forms of  performance assessment
    • Learning curves (Richard? Mijail?)
      • Readings: Cohen ch. 6
      • Maybe also go back to Cohen ch. 2 (advanced visualization)
    • Analysis of a decision tree
      • Readings: Shiberg et al
    • Feature selection
  • More Hypothesis Testing:
    • Linear regression
      •  Readings: Woods-Hughes ch. 13?
    • Logistic regression
    • Magnitude estimation
  • Improving the performance of ML systems

Language and Computation Seminar, 2004/05: 
Underspecification and Incrementality

The theme of this year’s L&C seminar is the role of underspecification in disambiguation  and language theorizing. We will cover both psychological evidence about language  processing – including evidence suggesting that much of language processing takes place incrementally – and formalisms that have been proposed to represent lexical, referential, and scope underspecification.

During  the Autumn term, the seminar will meet on Tuesday afternoon, 3-5, in the Colloquium Room in Computer Science (5A.540, next to Massimo’s office).

  • Background I: Ambiguity and Vagueness
    • Manfred Pinkal,  Logic and Lexicon. Kluwer, 1995.
  • Background II: Psycholinguistics
    • M. Gernsbacher (ed), The Handbook of Psycholinguistics, 1994.
    • G. Altmann (ed). Psycholinguistics: Core Concepts. Routledge, 2002 (especially volumes II and III)
  • Background III: Early psychological work (not read in seminar)
    • Bever, T. G. (1970). The cognitive basis for linguistic structure. In J. R. Hayes (ed), Cognitive Development of Language. (p. 279-363). Wiley.
  • Background IV: Latest complete  draft of Massimo’s monograph …
    • M. Poesio, Utterance Processing and Semantic Underspecification (pdf), February 2000
  • October 19th: Introduction (Ambiguity, Vagueness, Combinatorial Explosion)
    • Massimo’s handout
    • Some useful references:
      • Chierchia and McConnell-Ginet (1990), Meaning and Grammar, MIT Press. (Ch. 1, 3)
      • Hirst, G. (1987) Semantic Interpretation and the resolution of ambiguity, Cambridge. (Ch. 1)
      • Lyons (1995), Linguistic Semantics, Cambridge.  (ch. 1, 2, 3, and 9)
      • Pinkal, M. (1995), Logic and Lexicon, Kluwer.
  • Incrementality: Psychological evidence and computational models  
    • October 26th (Massimo):  The Garden Path model.
      • L. Frazier (1987), Sentence Processing: A Tutorial Review. In M. Coltheart (ed),  Attention and Performance XII, Erlbaum, 559-586. handout
      • Also relevant:
        • G. T. M. Altmann and M. J. Steedman (1988), Interaction with context during human sentence processing, Cognition 30(3), 191-238
        • Ferreira, F. and Clifton, C. (1986), The independence of syntactic processing, Journal of Memory and Language, 25, 348-368
    • November 2nd: NO MEETING THIS WEEK!!
    • November 9th and 16th (Massimo): The role of lexical and discourse information, revisited.
      • M. C. MacDonald et al (1994), Lexical nature of syntactic ambiguity resolution, Psychological Review, 101(4), 676-703
      • Tanenhaus et al (1995), Integration of visual and linguistic information in spoken language comprehension, Science
      • Other useful papers:
        • Trueswell et al (1993), Semantic influences on parsing, Journal of Memory and Language, 33(3), 285-318
    • November 23rd (Doug, Massimo): computational models of human parsing.
      • ch. 6 of Hirst 1987
      • Kempen, G. (1996). Computational models of syntactic processing in language comprehension. Ch. 8 of T. Dijkstra and K. de Smedt (ed), Computational Psycholinguistics, Taylor and Francis.
      • Other readings:
        • Steve Abney  (1989). A computational model of human parsing. Journal of Psycholinguistic Research, 18, 129-144. (pdf)
        • Fernando Pereira (1985). A new characterization of attachment preferences. In D. R. Dowty and L. Karttunen and A. M. Zwicky (eds), Natural Language Parsing–Psychological, Computational and Theoretical perspectives, pages 307-319. Cambridge University Press
        • Stuart M. Shieber (1983).  Sentence Disambiguation by a Shift-Reduce Parsing Technique.  Proc. of IJCAI,  699-703
    • December  6th: Incrementality in lexical semantics
      • ch. 4 of Hirst 1987
      • Greg Simpson (1994), Context and the processing of ambiguous words. In M.A. Gernsbacher (Ed.), Handbook of psycholinguistics (pp. 359-374). San Diego: Academic Press
      • Swinney, D. A. (1979). Lexical access during sentence comprehension: (Re)consideration of context effects. Journal of Verbal Learning and Verbal Behavior 18:645-659
      • Other readings:
        • Tanenhaus, M.K., Leiman, J.L., and Seidenberg, M.S. (1979). Evidence for multiple stages in the processing of ambiguous words in syntactic contexts. Journal of Verbal Learning and Verbal Behavior, 18, 427-440
        • Seidenberg M.S., Tanenhaus, M.K., Leiman, J.L., and Bienkowski, M.  (1982). Automatic access of the meanings of ambiguous words in context:  Some limitations of knowledge-based processing.  Cognitive Psychology, 14,  470-519
        • Tabossi, P. (1988). Accessing lexical ambiguity in different types of sentential contexts. Journal of Memory and Language, 27, 324-340
        • K. Rayner and S. A. Duffy,  (1986). Lexical complexity and fixation times in reading: Effects of word frequency, verb complexity, and lexical ambiguity. Memory and Cognition
    • December 14th: Jurafsky, D. (1996), A probabilistic model of Lexical and Syntactic Access and Disambiguation, Cognitive Science, 20, 137-194 (pdf)
    • January 20th: Incrementality in reference
      • Sedivy, Tanenhaus, Chambers & Carlson (1999). Achieving Incremental Semantic Interpretation through Contextual Representation. Cognition, 71(2): 109-147 (doc)
      • Arnold, J. E., Eisenband, J. G., Brown-Schmidt, S, and Trueswell, J. C. (2000). The immediate use of gender information: eyetracking evidence of the time-course of pronoun resolution. Cognition 76, B13-B26 (pdf)
  • Underspecification: early parsing models and psychological evidence  
    • February 3rd: General arguments for underspecification, The Moses effect
      • AJ Sanford & P. Sturt (2002) Depth of Processing in language comprehension: Not noticing the evidence. Trends in Cognitive Sciences  6(9) pp 382-386
      • S. Barton and A. J. Sanford (1993). A case study of anomaly detection: Shallow semantic processing and cohesion establishment. Memory and Cognition 21(4), 477-487.
      • Other readings:
        • Some chapters from Lyn Frazier (1999), On Sentence Interpretation, Dordrecht: Kluwer
    • February 24th: Underspecification in syntax.
      • Marcus, M., Hindle, D.,  & Fleck (1983),  Talking about talking about trees, in Proceedings of the ACL (pdf)
      • Some readings from L. Frazier and C. Clifton (1996), Construal, MIT Press.
      • Other readings:
        • P. Sturt and M. Crocker (1996).  Monotonic Syntactic Processing: A Cross-Linguistic Study of Attachment. Language and Cognitive Processes; 11(5):449-494. (pdf)
        • Fernanda Ferreira, Karl G. D. Bailey and Vittoria Ferraro (2002). Good-enough representations in Language Comprehension. Current Directions in Psychological Science, 11-15.
    • March 10th: Underspecification in lexical semantics:
      • Lyn Frazier and K. Rayner (1990).  Taking on semantic commitments: Processing multiple meanings vs. multiple senses. Journal of Memory and Language, 29, 181-200
    • Other readings: Underspecification in reference: 
      • Garrod, S., Freudenthal, D. & Boyle, E. (1994) The role of different types of anaphor in the on-line resolution of sentences in a discourse. Journal of Memory and Language. 33, 39-68
  • Evidence about scope
    • April 6th: Howard S. Kurtzman and Maryellen C. MacDonald (1993). Resolution of quantifier scope ambiguities. Cognition, 48, 243-279.
    • April 14th: R. Filik, K. B. Paterson, and S. P. Liversedge (2004). Processing doubly quantified sentences: evidence from eye movements. Psychonomic Bulletin and Review.  (pdf)
    • Other readings:
      • chap. 1 of Poesio 1994
      • Some papers by Gualmini and Musolino
  • April 21st: ACL workshop on prepositions

  • Theories of logical form
    • May 11th? Moore, 1981; Schubert and Pelletier, 1982
    • May 18th: Fenstad et al, 1987
    • Alshawi and van Eijck, 1989
    • MRS
  • Hobbs’ “flat” logical representation  
    • Hobbs 1982, 1983, 1986
    • Verbmobil 2000?
  • The logic of ambiguity
    • Pinkal (1996)
    • van Deemter (1996)
  • The semantics of logical forms
    • Alshawi & Crouch 1992, Poesio 1991, 1996, possibly van Eijck & Jaspars
  • Theoretical models: lexical underspecification 

    • Hirst (1987), ch. 5
    • Pinkal
    • Copestake & Briscoe
    • Poesio 1996
    • Pustejovsky 1998
  • Theoretical models: underspecification in reference  
    • Alshawi (1991)
    • Poesio (2001)
  • Theoretical models: syntax
    • D-tree theory, Backofen et al
  • Theoretical models: Scope
    • UDRT: Reyle, 1993
    • Muskens
    • CLLS
      • Pinkal, 1996
      • Egg et al, 2001
    • Fox & Lappin 2004
    • ?? Hole semantics??
  • Theoretical models: collective / distributive underspecification
    • Kempson & Cormack 1981?
    • Frank and Reyle 1995



  • ARRAU (in construction)

Language and Computation Seminar, 2003 / 2004: 
The Acquisition of Lexical & Ontological Knowledge

This seminar has now ended; this page will be kept around as a pointer to the literature.

  • Background I: The lexicon
    • Cruse, D.A. Lexical Semantics. Cambridge University Press, 1986.
    • J. Pustejovsky, The Generative Lexicon. MIT Press, 1995.
    • Marconi, D. Lexical Competence. MIT Press, 1997. 
    • Murphy, G. L. The Big Book of Concepts. MIT Press, 2002.
  • Background II: Hand-coded lexical resources
    • WordNet: the standard reference is the book edited by C. Fellbaum, WordNet, MIT Press, 1998. A number of papers and manuals about WordNet, as well as the system itself, can be downloaded from the project’s website
    • COMLEX and NOMLEX: C. MacLeod, R. Grishman, A. Meyers, L. Barrett, and R. Reeves. NOMLEX: A lexicon of nominalizations. Proc. of EURALEX, 1998.
    • Oxford Dictionary of English: we got the ODE as part of a joint project with Oxford. A description of the latest extensions is in McCracken’s paper at EACL 03
  • Background III: Ontologies
    There is a close connection between research on the lexicon and research on ontologies. Here are some of the many web sites dedicated to ontologies.
  • Background IV: Information Theory
  • Background V: Acquiring  lexical information about verbs (verb classes)
    A lot of work in lexical acquisition has to do with lexical properties of verbs – particularly subcategorization and selectional restrictions. The classic work in this area is covered in Manning and Schuetze, chapter 8, which we read last year.
    • Michael Brent (1993): From grammar to lexicon: unsupervised learning of lexical syntax. Computational Linguistics, 19(3):243-262.
    • Christopher D. Manning (1993): Automatic acquisition of a large subcategorization dictionary from corpora. Proceedings of the 31st Meeting of the ACL,  pp. 235-242. Columbus, Ohio.
    • Phil Resnik (1993). Selection and Information: A Class-Based Approach to Lexical Relationships. Cognition
    • Paola Merlo and Suzanne Stevenson (2001). Automatic verb classification based on statistical distributions of argument structure. Computational Linguistics, 27(3), 373-408.
    • Sabine Schulte im Walde (2003). Experiments on the choice of features for learning verb classes. Proc. of EACL
    This area remains very active. A lot of new work has appeared in connection with the FrameNet project.
  • Old work we didn’t get a chance to read:
  • October 8th: Adam Kilgarriff (CS Seminar)
    Some references for those who want to read more about Kilgarriff’s work on thesauri and about WASPS:
    • Adam Kilgarriff and David Tugwell (2001). “WORD SKETCH: Extraction and Display of Significant Collocations for Lexicography”. In Proc. workshop “COLLOCATION: Computational Extraction, Analysis and Exploitation”, pp.32-38. 39th ACL & 10th EACL, Toulouse, July
    • A. Kilgarriff & C. Yallop (2000). “What’s in a Thesaurus” Proc. Second Conf on Language Resources and Evaluation Athens, May/June. Pp 1371–1379. 
  • Vector Space Representations from Psychology (and their application in NLP)
    • October 15th (Massimo):  HAL (Lund and Burgess, 1996; Burgess, 1998)
    • October 30th (Massimo) Latent Semantic Analysis. Landauer, T. K., Foltz, P. W., & Laham, D. (1998). Introduction to Latent Semantic Analysis. Discourse Processes, 25, 259-284. (pdf)
    • November 6th (Mijail): Applications of LSA to segmentation. F.Y.Y. Choi, P. Wiemer-Hastings and J. Moore. “Latent semantic analysis for text segmentation”. In Proceedings of the 6th Conference on Empirical Methods in Natural Language Processing, pp. 109- 117, 2001. (pdf)
  • Thesaurus Acquisition 
  • December 2nd: Lexical acquisition in children (Sonja)
  • Christmas Holidays!!
  • Clustering: 
    • January 15th  (Abdulrahman / Massimo): Manning and Schuetze, chapter 14
    • January 22nd (Abdulrahman / Massimo): Manning and Schuetze, chapter 14, continued
    • February 5th (Massimo): First meeting on Distributional Clustering and Lillian Lee’s work (the papers, including her dissertation, are available from her home page)
      • Fernando Pereira, Naftali Tishby, and Lillian Lee., Distributional Clustering of English Words.  Proceedings of the 31st ACL, pp 183–190, 1993
    • February 12th (Massimo): Baker and McCallum, Distributional Clustering of Words for Text Classification, SIGIR 1998. 
    • March 4th (Massimo): More Distributional Clustering: Ido Dagan, Lillian Lee, and Fernando Pereira (1997), Similarity-Based Methods for Word Sense Disambiguation. Proceedings of the 35th ACL/8th EACL, pp 56–63.
    • March 11th (Abdulrahman): Clustering adjectives. Hatzivassiloglou & McKeown, Clustering adjectives according to meaning, ACL 1993
    • March 25th (Massimo): Clustering senses: Schuetze’s work. Schuetze, H. (1998). Automatic word sense discrimination. Computational Linguistics.
    • More word sense clustering: McCarthy & Carroll CL 2003?
    • See also: Adam Berger’s dissertation on applying similarity to IR
  • Acquisition of taxonomic knowledge using syntactic patterns
    • April 15th: (Massimo): Hearst, M.A. (1998). “Automated Discovery of WordNet Relations” in WordNet: an Electronic Lexical Database, Christiane Fellbaum Ed, MIT Press, Cambridge MA, 1998.
      • Some useful older references cited by Hearst:
        • Hiyan Alshawi (1987). Processing dictionary definitions with phrasal pattern hierarchies. Computational Linguistics, 13, 195-202. 
        • M. Chodorow, R. Byrd, and G. Heidorn (1985). Extracting semantic hierarchies from a large on-line dictionary. Proc. of the ACL, 299-304.
        • J. Markowitz, T. Ahlswede, and M. Evens (1986). Semantically significant patterns in dictionary definitions. Proc. of the 24th ACL, 112-119.
    • May 12th: Sharon A. Caraballo (1999). Automatic construction of a hypernym-labeled noun hierarchy from text. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, pages 120-126.
      • Also: Pantel & Ravichandran, Automatic Labeling of Semantic Classes, Proc. NAACL 2004
      • See also: Sharon A. Caraballo (2001). Automatic Construction of a Hypernym-Labeled Noun Hierarchy from Text. Ph.D. dissertation, Computer Science Department, Brown University.
    • More interesting papers by Pantel available from his homepage:
      • Patrick Pantel and Dekang Lin. 2002. Discovering Word Senses from Text. In Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD-02). pp. 613-619. Edmonton, Canada.
      • Dekang Lin and Patrick Pantel. 2002. Concept Discovery from Text. In Proceedings of Conference on Computational Linguistics (COLING-02). pp. 577-583. Taipei, Taiwan.
      • Dekang Lin and Patrick Pantel. 2001. DIRT – Discovery of Inference Rules from Text. In Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD-01). pp. 323-328. San Francisco, CA.
    • May 19th (Abdulrahman): Dominic Widdows – e.g., Unsupervised methods for developing taxonomies by combining syntactic and statistical information, NAACL 2003. Other papers by Widdows are available from his homepage. Of particular interest: “A Graph Model for Unsupervised Lexical Acquisition” (COLING 2002, with Beate Dorow; building on Riloff and Shepherd’s 1997 work) and “Using LSA and Noun Coordination to Improve the Precision and Recall of Automatic Hyponymy Construction” (CoNLL 2003, with Scott Cederberg; building on the Hearst 1992/1998 papers).
    • See also: Steffen Staab’s page and the many papers on ontology acquisition there by the Karlsruhe group.
  • May 26th: No meeting (LREC)
  • The acquisition of ontological information and concept hierarchies (Udo and Hala)
  • June 16th: no meeting (Massimo away) 
  • Extracting  part-of relations using Syntactic Constructions
  • The acquisition of causal and propositional  knowledge

