What's new in 10.2
Find Similar locates and ranks documents with similar concepts.
Traditional Find Similar tools look for words that documents have in common, so unimportant or overly common words can produce undesired matches. The documents may be similar, but not in a way that is meaningful with respect to their content.
Concepts, on the other hand, allow the ideas behind the search text to be considered. DWR analyzes the text and appropriate metadata to build a network of concepts based on the gist of the text. These conceptual units allow meaningful phrases and information to be directly compared to create a more relevant result based on document semantics (rather than just the presence or absence of specific common words).
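The shortfall of word-based matching can be illustrated with a minimal sketch. This is not DWR code: the `word_overlap` function, the `COMMON_WORDS` list, and the two sample sentences are hypothetical and chosen only to show how common words can make unrelated documents look similar.

```python
import re

# Hypothetical list of overly common words, for illustration only.
COMMON_WORDS = frozenset(
    {"the", "a", "of", "to", "and", "is", "in", "was", "on", "by"}
)

def words(text):
    """Lower-case the text and return its set of distinct words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def word_overlap(a, b, ignore=frozenset()):
    """Shared words divided by all distinct words (Jaccard overlap)."""
    wa, wb = words(a) - ignore, words(b) - ignore
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)

doc_a = "The contract was signed by the buyer in the office of the seller"
doc_b = "The dog was walked by the owner in the park of the town"

# Counting every word, a third of the vocabulary is shared.
raw = word_overlap(doc_a, doc_b)

# Ignoring the common words, the two documents share nothing.
filtered = word_overlap(doc_a, doc_b, ignore=COMMON_WORDS)

print(round(raw, 2), filtered)
```

The two sentences are about unrelated subjects, yet the raw word overlap is well above zero purely because of words such as "the" and "was".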
DWR displays concepts and connections in graph form to provide the reviewer with a quick overview of the driving factors behind the similarity search.
DWR compares the relative strengths of the concepts to generate a relative strength indicator (RSI), which helps reviewers understand why the comparison was made and how closely the retrieved documents relate to the initial search. A numeric value between zero and one summarizes the similarity at a glance; the larger the value, the more closely the document matches the initial concepts.
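How DWR calculates the RSI is not described here. As a rough sketch of how comparing concept strengths can yield a value between zero and one, the example below assumes each document is represented by non-negative concept weights and uses cosine similarity. The `concept_similarity` function, the concept names, and the weights are all hypothetical.

```python
import math

def concept_similarity(a, b):
    """Cosine similarity between two dicts of non-negative concept weights.

    Returns a value from 0.0 (no shared concepts) to 1.0 (same proportions).
    This is an illustrative stand-in, not DWR's actual RSI calculation.
    """
    shared = set(a) & set(b)
    dot = sum(a[c] * b[c] for c in shared)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical concept weights for a seed document and two candidates.
seed = {"breach of contract": 0.8, "late delivery": 0.6}
close = {"breach of contract": 0.6, "late delivery": 0.8}
unrelated = {"office party": 1.0}

print(round(concept_similarity(seed, close), 2))
print(concept_similarity(seed, unrelated))
```

With these sample weights, the candidate that shares both concepts scores close to one, while the candidate with no shared concepts scores zero.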
Similarity searches require no training period and can be tailored on the fly. Each user generates the concepts by selecting a single document or multiple documents, which speeds up searching and exploring the document corpus.
Modified Extract Text vs. OCR
The OCR menu has been relabeled to define the action of each request more clearly, and updated to include the vectorizing option needed for Find Similar. This process is no longer part of the production workflow and is done in review after the production is finalized.
•Make production images searchable - Create text searchable PDFs for export either in a production or as individual documents (see Exporting Documents).
•Extract and save text - Extract text (where available) from native files. Useful for productions requiring a text file for each document.
•Vectorize for Find Similar - Run this to use the "Find Similar" feature after data has been added to the matter.
•Overwrite existing text - Use this when existing text should be replaced (all other options will skip documents for which text has already been extracted).
•Attempt forced OCR of PDFs - Certain PDFs are delivered with an overlaid text file in which only the text overlay is searchable/indexed (usually a form or a produced PDF with a Bates number overlaid). This option forces the tool to use Optical Character Recognition of the PDF image and not extract the overlaid text.
•Language hint - When datasets involve additional languages, the OCR engine can be optimized to recognize those languages.
Additional Production Flag
For documents ingested with Bates numbers, a light blue flag indicates that the document has been produced, but by another party to the litigation.
Create a Custom Field Directly from Metadata Overlay
When the source file contains fields for which a custom field will be needed in DWR, right-click the field in the Source Column and choose "Create custom field".
The Add/Edit Custom Column box will appear - choose the type of column needed and select Apply.
A custom column is created and populated in the DWR column in the Overlay.