Time is Encoded in the Weights of Finetuned Language Models

Time vectors are a simple tool for a known challenge in language modeling: handling text from different time periods. Language changes over time. New words appear, existing words shift in meaning, and the topics and facts people write about change. As a result, a model trained on data from one period often performs worse on text from another. This problem is known as temporal misalignment.

To address this, the paper cited below introduces time vectors. The process starts from a pretrained base language model with broad, general-purpose knowledge. This model is then finetuned on text from a single time period, for example the year 2000 (the paper uses single years or single months). Finetuning specializes the model to the vocabulary, topics, and facts of that period.

A time vector is computed by subtracting the base model's weights from the finetuned model's weights (time vector = finetuned weights - base weights). This difference specifies a direction in weight space that, according to the paper's experiments, improves performance on text from that time period. Adding the full vector back to the base model simply recovers the finetuned model, so the practical value of time vectors comes from combining them with one another, as the next paragraph describes.
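The weight arithmetic behind a time vector can be sketched in a few lines of Python. This is a minimal sketch, assuming a toy representation in which each model is a dictionary mapping a parameter name to a flat list of floats; real checkpoints store tensors, the weights below are made-up numbers, and the helper names `time_vector` and `apply_vector` (and the `scale` parameter) are hypothetical, not taken from the paper's code.

```python
from typing import Dict, List

# Toy representation (an assumption): parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def time_vector(finetuned: Params, base: Params) -> Params:
    """Time vector = finetuned weights minus base weights, per parameter."""
    return {
        name: [f - b for f, b in zip(finetuned[name], base[name])]
        for name in base
    }


def apply_vector(base: Params, tau: Params, scale: float = 1.0) -> Params:
    """New weights = base weights + scale * time vector."""
    return {
        name: [b + scale * t for b, t in zip(base[name], tau[name])]
        for name in base
    }


# Made-up weights for a one-parameter "model".
base = {"w": [0.5, -1.0, 2.0]}
finetuned_2000 = {"w": [0.75, -0.5, 2.0]}  # hypothetical: after finetuning on year-2000 text

tau_2000 = time_vector(finetuned_2000, base)
print(tau_2000)                      # {'w': [0.25, 0.5, 0.0]}
print(apply_vector(base, tau_2000))  # {'w': [0.75, -0.5, 2.0]}
```

With `scale=1.0`, base plus time vector is exactly the finetuned model; the vector only becomes interesting when it is scaled or blended with other time vectors.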

Time vectors for adjacent time periods also turn out to sit closer together in weight space than vectors for distant periods. For instance, one would expect the vectors for 1999 and 2000 to be more alike than those for 1999 and 2010. This structure makes interpolation possible: blending the time vectors of two periods yields new models that perform better on intervening and even future time periods, without any additional training. The paper reports that these findings hold consistently across different tasks, domains, model sizes, and time scales (years and months).
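Similarity between time vectors and interpolation between them can be illustrated on toy data. This sketch assumes each time vector has already been flattened into a single list of floats; the 2-D vectors for 2000, 2001 and 2010 are invented for illustration, and `cosine_similarity`, `interpolate`, `add_to_base` and the mixing weight `alpha` are hypothetical names. Cosine similarity is used here as one plausible way to compare directions in weight space, and `alpha=0.5` is an arbitrary midpoint choice.

```python
import math
from typing import List

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two flattened time vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def interpolate(tau_start: Vector, tau_end: Vector, alpha: float) -> Vector:
    """Blend two time vectors: alpha * tau_start + (1 - alpha) * tau_end."""
    return [alpha * s + (1 - alpha) * e for s, e in zip(tau_start, tau_end)]


def add_to_base(base: Vector, tau: Vector) -> Vector:
    """Produce new model weights by adding a time vector to the base weights."""
    return [b + t for b, t in zip(base, tau)]


# Invented 2-D "time vectors" for three years (real ones have millions of entries).
tau = {
    2000: [3.0, 4.0],
    2001: [4.0, 3.0],
    2010: [4.0, -3.0],
}

print(cosine_similarity(tau[2000], tau[2001]))  # 0.96 (adjacent years: similar)
print(cosine_similarity(tau[2000], tau[2010]))  # 0.0  (distant years: dissimilar)

# Estimate a vector for a year between 2000 and 2010 without any training.
tau_mid = interpolate(tau[2000], tau[2010], alpha=0.5)
print(tau_mid)                                  # [3.5, 0.5]
print(add_to_base([1.0, 1.0], tau_mid))         # [4.5, 1.5]
```

The numbers only demonstrate the mechanics; whether an interpolated vector actually helps on an intervening year is an empirical question that the paper answers with real models.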

Taken together, the results suggest that time is encoded in the weight space of finetuned models. In practice, time vectors offer a lightweight way to adapt a language model to a particular period: instead of finetuning a new model for every period, one can build period-specific models by combining existing time vectors.

Ref: https://arxiv.org/abs/2312.13401