Determination of Data Identity in Intellectual Property Cases

2024 04/23

Accurate determination of data identity is crucial in intellectual property cases because it directly affects how a case is tried and decided. First, it supports evidence preservation: establishing whether data is intact or has been tampered with helps ensure the legality of the evidence. Second, determining whether the plaintiff's data and the defendant's data are identical helps prove that the alleged infringement actually occurred. Finally, determining data identity helps establish the scope of the infringement and therefore influences the assessment of its consequences. The determination of data identity thus directly supports the effective operation of the intellectual property legal system and is an important link in ensuring judicial fairness and protecting intellectual property.


Digital files are a common carrier of evidence in intellectual property cases, and software is a common object of infringement. This article introduces common methods for determining data identity, focusing on file meta information and on identity determination for software data, as a basis for discussion with readers.


1、 Determination method for identical data


Data identity has two meanings. In the narrow sense, it means that two pieces of data are exactly identical; it is used to prove that data has not been tampered with or that the contents of two pieces of data are completely the same. In the broad sense, it means that two pieces of data come from the same source; it is commonly used to prove that one piece of data is a copy of another, possibly with modifications.


Identical data, that is, identity in the narrow sense, is usually determined with methods such as hash algorithms, asymmetric decryption, and direct comparison.


A hash algorithm is an important data processing tool that maps input data of any length to output data of fixed length, commonly called a hash value or digest. Two properties make hash algorithms suitable for determining whether data is identical: determinism and collision resistance. First, a hash algorithm is deterministic: for a given input it always generates the same hash value, so identical data always produces identical hash results. If the hash values of two pieces of data differ, the data is certainly different. Second, a cryptographic hash algorithm such as SHA-256 is collision resistant: collisions exist in principle, but it is computationally infeasible to find two different inputs with the same hash value, and even a very small change to the input produces a completely different hash value. Matching hash values are therefore treated in practice as showing that the data is exactly the same. Such algorithms are also one-way, so a hash value cannot practically be used to infer or reconstruct the input data. Together, these properties allow a hash value recorded at the time of collection to show that forensic data has not been modified after the forensic process.
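
As an illustration of the hash comparison described above, the following is a minimal sketch using Python's standard hashlib module. The choice of SHA-256 and the function names sha256_hex, file_sha256, and same_content are assumptions made for this example; the article does not prescribe a particular algorithm or tool.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a byte string as hexadecimal text."""
    return hashlib.sha256(data).hexdigest()


def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large evidence files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a: str, path_b: str) -> bool:
    """Treat two files as identical when their SHA-256 digests match."""
    return file_sha256(path_a) == file_sha256(path_b)
```

In practice the digest of a file would be recorded when the evidence is collected, and file_sha256 would be run again later; any difference between the two digests shows that the file has changed.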


Asymmetric decryption relies on asymmetric cryptography, which uses a pair of related keys: a public key and a private key. The public key can be freely distributed to anyone, while the private key is kept confidential and is known only to the key holder. If the right holder's private key has not been compromised and the infringer still produces data in the same way, then data that can be successfully decrypted (or, in digital signature terms, verified) with the known public key must have been generated with the right holder's private key, which confirms the source of the data or of the method. This makes asymmetric decryption an effective tool for verifying data sources, with wide application in fields such as intellectual property cases.
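
The public-key check described above can be sketched with a toy example. The code below uses textbook RSA with a deliberately tiny key (primes 61 and 53) so that it runs with the standard library alone; the key values and the function names digest_as_int, sign, and verify are assumptions for illustration only. A real case would involve full-size keys and a vetted cryptographic library with proper padding, not this sketch.

```python
import hashlib

# Toy RSA key built from the primes 61 and 53. Illustrative only, NOT secure.
N = 3233   # modulus: 61 * 53
E = 17     # public exponent, may be given to anyone
D = 2753   # private exponent, known only to the right holder (17 * 2753 % 3120 == 1)


def digest_as_int(data: bytes) -> int:
    """Reduce the SHA-256 digest of the data to a number smaller than the modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes, private_exponent: int = D) -> int:
    """Produce a value that only the holder of the private key can compute."""
    return pow(digest_as_int(data), private_exponent, N)


def verify(data: bytes, signature: int, public_exponent: int = E) -> bool:
    """Recover the digest with the public key and compare it with the data's digest."""
    return pow(signature, public_exponent, N) == digest_as_int(data)
```

If verify returns True for data found in the defendant's product, the data was generated with the matching private key. With a modulus this small, unrelated data can occasionally pass by chance, which is one reason real keys are thousands of bits long.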


In addition, direct comparison, byte by byte, can be used to confirm that two pieces of data are exactly the same.
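
A direct comparison can be sketched as follows. The function names files_identical and first_difference are hypothetical; the second one reports where two byte strings first diverge, which can be useful when describing the extent of a modification.

```python
from typing import Optional


def files_identical(path_a: str, path_b: str, chunk_size: int = 65536) -> bool:
    """Compare two files chunk by chunk without loading them fully into memory."""
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        while True:
            chunk_a = file_a.read(chunk_size)
            chunk_b = file_b.read(chunk_size)
            if chunk_a != chunk_b:
                return False          # content or length differs
            if not chunk_a:
                return True           # both files ended at the same point


def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if a and b are equal."""
    for offset, (byte_a, byte_b) in enumerate(zip(a, b)):
        if byte_a != byte_b:
            return offset
    if len(a) != len(b):
        return min(len(a), len(b))    # one input is a prefix of the other
    return None
```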


2、 Common methods for determining data identity in the broad sense


Several methods can be used to determine data identity in the broad sense, including direct observation, keyword comparison, and file meta information comparison.


The direct observation method applies to material that can be perceived intuitively, such as images, sounds, and text. The parties and the adjudicators can judge from their own perception whether the data comes from the same source. In this situation, the parties' main burden of proof is to show which data was produced first.


The keyword comparison method is an important means of judgment, especially in cases where employees leak software projects. An infringer may replace only the original company or project name and leave special keywords untouched, which can expose the infringement. Such keywords have no meaning that can be read from the plain text; for example, a hash algorithm may be used to convert the company or project name into a string of characters that looks meaningless to the naked eye, giving the data covert protection. Infringers either fail to notice such strings or notice them but leave them unmodified, and the strings then give the copying away.
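
The covert keyword idea can be sketched as follows. This is only one possible scheme: deriving the marker from the first 16 hexadecimal characters of a SHA-256 digest, and the names covert_marker and find_marker, are assumptions for this example rather than a method described by any particular vendor.

```python
import hashlib
from typing import List


def covert_marker(owner_name: str, length: int = 16) -> bytes:
    """Derive an inconspicuous hexadecimal marker from a company or project name."""
    digest = hashlib.sha256(owner_name.encode("utf-8")).hexdigest()
    return digest[:length].encode("ascii")


def find_marker(data: bytes, marker: bytes) -> List[int]:
    """Return every byte offset at which the marker occurs in the data."""
    offsets = []
    start = data.find(marker)
    while start != -1:
        offsets.append(start)
        start = data.find(marker, start + 1)
    return offsets
```

The right holder embeds the marker in source files or resources at build time. If the same marker later appears in the other party's files, even after visible names have been replaced, the offsets returned by find_marker show where it survived.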


Meta information is additional data about the file itself, including its creation time, modification time, size, type, and owner information. In an infringement case, even if the infringing file alters the content of the right holder's original, its meta information may still retain the same values as the original file, which can prove the fact of infringement. Meta information is often used to establish infringement of data published online. For example, the EXIF information of an image can record details such as the shooting time, location, and camera equipment used, providing important clues for proving infringement.


In addition, distinctive information, fonts, and designs used by a right holder can serve as auxiliary evidence of the source of the data. Applying these methods together helps ensure that the determination of data identity in the broad sense is accurate and comprehensive.


3、 Using File Meta Information to Discriminate Identity


Common meta information includes timestamps, file system information (FS info), EXIF information of images, and meta information of Word files.


A timestamp records when data was created or modified and can be used to check the consistency of data, especially when data is transmitted across systems or networks. Timestamps can also be used to determine the order in which copies of the same data were produced.


The EXIF (Exchangeable Image File Format) data of an image file typically contains information about the shooting device, shooting time, geographic location, and more. By comparing the EXIF information of different image files, it can be determined whether they come from the same source or device, which helps determine whether the files are identical in origin.


File system information is the attribute information that the operating system records for a file, such as creation time, modification time, and owner information. It can be obtained through the file attribute viewing tools or commands provided by the operating system. Comparing the attribute information of different files can help determine their consistency.
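
A minimal sketch of reading and comparing such attributes with Python's os.stat is shown below. The function names file_attributes and differing_attributes and the choice of fields are assumptions for this example. Creation time is deliberately left out because its availability and meaning differ between operating systems.

```python
import os
from datetime import datetime, timezone
from typing import Dict, Tuple


def file_attributes(path: str) -> Dict[str, object]:
    """Collect a few file system attributes that are available on most platforms."""
    info = os.stat(path)
    return {
        "size": info.st_size,
        # Last modification time, converted to an unambiguous UTC timestamp.
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
        # Numeric owner id; typically 0 on Windows.
        "owner_id": info.st_uid,
    }


def differing_attributes(path_a: str, path_b: str) -> Dict[str, Tuple[object, object]]:
    """Return only the attributes whose values differ between the two files."""
    attrs_a = file_attributes(path_a)
    attrs_b = file_attributes(path_b)
    return {
        key: (attrs_a[key], attrs_b[key])
        for key in attrs_a
        if attrs_a[key] != attrs_b[key]
    }
```

Because file system timestamps are easily changed by copying or by ordinary tools, values obtained this way are better treated as clues to be corroborated than as conclusive proof.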

The meta information of a Word file includes descriptive data such as the author, title, subject, creation date, and modification date. This information provides important details about the content and attributes of the file, which is very useful for file management and search, and it also offers a convenient way to determine the identity of Word files.


The meta information of a Word file typically contains the following fields:
Title: The title of the file, usually a brief description of its content or theme.
Subject: The subject or main content of the file, providing a more detailed description.
Author: The creator or original author of the file.
Company: The company or organization that created the file.
Creation Date: The date and time when the file was created.
Modification Date: The date and time when the file was last modified.
Version: The version number or version information of the file, used to track its change history.
Comments: Additional explanations or notes on the content of the file.
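
Several of these fields can be read programmatically. The sketch below assumes the file is in the .docx format, which is a ZIP archive whose docProps/core.xml part holds the core properties; it does not cover the legacy .doc format, and the Company field, which is stored in a separate part, is not read here. The names FIELDS and read_core_properties are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET
from typing import Dict

# XML namespaces used by the core properties part of an Office Open XML package.
NAMESPACES = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Friendly field name -> element name inside docProps/core.xml.
FIELDS = {
    "title": "dc:title",
    "subject": "dc:subject",
    "author": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def read_core_properties(docx_file) -> Dict[str, str]:
    """Read core properties from a .docx path or file-like object.

    Fields that are missing from the document are returned as empty strings.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("docProps/core.xml"))
    return {
        name: root.findtext(tag, default="", namespaces=NAMESPACES)
        for name, tag in FIELDS.items()
    }
```

Comparing the dictionaries returned for two documents shows at a glance which fields, such as the author or creation date, were carried over unchanged.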


4、 Software Data Identity Determination


As a common subject of intellectual property infringement, software contains a wealth of data. This section discusses how to determine whether two software products are identical, in order to confirm whether software has been infringed by copying. The different components of the software are analyzed separately.


Determine identity through the file directory structure and the content of program files. The file structure covers both local program files and server-side files. Local files are relatively easy to obtain and analyze, whereas server-side files reside on the other party's machines and are difficult to obtain without judicial intervention. The directory structure and file names can indicate whether one piece of software was copied or plagiarized from another.
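
One simple way to quantify how much two directory structures have in common is to compare the sets of relative file paths. The sketch below uses the Jaccard index (shared paths divided by all paths) as the similarity measure; this measure and the names relative_paths and structure_similarity are assumptions for illustration, not an established forensic standard.

```python
import os
from typing import Set


def relative_paths(root: str) -> Set[str]:
    """Collect the path of every file under root, relative to root, with '/' separators."""
    paths = set()
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            full_path = os.path.join(folder, name)
            paths.add(os.path.relpath(full_path, root).replace(os.sep, "/"))
    return paths


def structure_similarity(root_a: str, root_b: str) -> float:
    """Return the share of relative file paths that the two trees have in common (0.0 to 1.0)."""
    paths_a = relative_paths(root_a)
    paths_b = relative_paths(root_b)
    if not paths_a and not paths_b:
        return 1.0                      # two empty trees are trivially the same
    return len(paths_a & paths_b) / len(paths_a | paths_b)
```

A high score only shows that the layouts match; whether the match is meaningful depends on how distinctive the shared directory and file names are.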


Determine identity through data structures and database structures. If the database name, table names, and table structures of one piece of software are the same as those of another, and the names are not in common use, it can be inferred that the software was plagiarized or copied. Besides databases, locally stored data files are recognizable in the same way.
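
As an illustration, the following sketch extracts table and column definitions from two databases and lists the tables that match exactly. It assumes both products store their data in SQLite, purely so that the example runs with Python's standard sqlite3 module; other database systems expose their schemas differently. The names schema_of and shared_tables are hypothetical.

```python
import sqlite3
from typing import Dict, List, Tuple

Schema = Dict[str, List[Tuple[str, str]]]


def schema_of(connection: sqlite3.Connection) -> Schema:
    """Map each table name to its ordered list of (column name, declared type)."""
    tables = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    schema = {}
    for (table,) in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, default, pk).
        columns = connection.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [(column[1], column[2]) for column in columns]
    return schema


def shared_tables(schema_a: Schema, schema_b: Schema) -> List[str]:
    """Return the tables whose name and column definitions are identical in both schemas."""
    return sorted(
        table for table in schema_a if schema_b.get(table) == schema_a[table]
    )
```

Matches on generic names such as a users table with an id column carry little weight; matches on unusual table or column names are the ones that suggest copying.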


Determine identity through software configuration files. Configuration files are generally important files that affect how the software behaves at startup, typically with a suffix such as .ini or .cfg. The section names, key names, and comments in a configuration file, as well as the path of the file itself, can all be compared for identity or similarity.
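
A comparison of section names and key names can be sketched with Python's standard configparser module. The sketch assumes the configuration files use INI-style syntax, which is not true of every .cfg file, and it ignores comments, which configparser discards and which would have to be compared on the raw text. The names config_outline and outline_overlap are hypothetical.

```python
import configparser
from typing import Dict, Set


def config_outline(text: str) -> Dict[str, Set[str]]:
    """Map each section name to the set of key names it contains; values are ignored."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str            # keep the original capitalization of key names
    parser.read_string(text)
    return {section: set(parser.options(section)) for section in parser.sections()}


def outline_overlap(text_a: str, text_b: str) -> float:
    """Return the share of (section, key) pairs that the two files have in common."""
    def pairs(outline: Dict[str, Set[str]]) -> Set[tuple]:
        return {(section, key) for section, keys in outline.items() for key in keys}

    pairs_a = pairs(config_outline(text_a))
    pairs_b = pairs(config_outline(text_b))
    if not pairs_a and not pairs_b:
        return 1.0
    return len(pairs_a & pairs_b) / len(pairs_a | pairs_b)
```

Values are left out of the comparison on purpose, because paths and ports usually change between installations while section and key names tend to survive a copy.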


Determine identity through static and dynamic library files. To allow code to be reused without disclosing the source code, and without recompilation, some core functions are built into library files on which the software depends at run time. If a hash algorithm shows that the library files of two pieces of software are the same file, there is a high probability that one was copied directly from the other.
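
Because a copied library may have been renamed, it can help to match libraries across two installations by content digest and not by file name. The sketch below does this with SHA-256; the suffix list in LIBRARY_SUFFIXES is an assumption and would need adjusting for a real case (for example, versioned names such as libfoo.so.1 are not covered), and the function names are hypothetical.

```python
import hashlib
import os
from typing import Dict, List, Tuple

# Assumed set of library file suffixes; adjust for the platform being examined.
LIBRARY_SUFFIXES = (".dll", ".so", ".dylib", ".a", ".lib")


def library_digests(root: str) -> Dict[str, List[str]]:
    """Map each SHA-256 digest to the relative paths of the library files that have it."""
    digests: Dict[str, List[str]] = {}
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            if name.lower().endswith(LIBRARY_SUFFIXES):
                path = os.path.join(folder, name)
                with open(path, "rb") as handle:
                    digest = hashlib.sha256(handle.read()).hexdigest()
                digests.setdefault(digest, []).append(os.path.relpath(path, root))
    return digests


def matching_libraries(root_a: str, root_b: str) -> List[Tuple[List[str], List[str]]]:
    """Pair up library files with identical content, even if their names differ."""
    digests_a = library_digests(root_a)
    digests_b = library_digests(root_b)
    return [
        (digests_a[digest], digests_b[digest])
        for digest in sorted(digests_a.keys() & digests_b.keys())
    ]
```

Identical third-party or open-source libraries will also match, so each pair still has to be checked against what both parties could legitimately have obtained.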


Determine identity through the software interface and interaction behavior. Compare the user interface designs of the two pieces of software, including layout, colors, icons, buttons, and other elements, and consider whether the overall interface style and user experience are similar, for example whether the same interaction design patterns and UI components are used. If the interface designs are very similar, the two pieces of software may be related. Also observe interaction behavior, including user actions, response times, page transitions, and data input and output, and consider whether the operation flow and the implementation of common functions such as user registration, login, data query, and form submission are similar. Very similar interaction patterns may likewise indicate a connection between the two.


Conclusion


This article has introduced various methods for determining data identity, including techniques such as hash algorithms and asymmetric decryption, as well as comparisons of file meta information and software data structures. Applied together, these methods provide an important basis for the trial of intellectual property cases. Protecting intellectual property rights requires not only the support of the legal system but also the continuous progress and application of technical means. I hope this article will be helpful in the judicial practice of intellectual property cases.